Information, Interaction, and Agency
Wiebe van der Hoek
Reprinted from Synthese 139:2 and 142:2 (2004), Special Section Knowledge, Rationality & Action


A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN-10 1 4020-3600-0 ISBN-13 978-1-4020-3600-0

Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands.

Cover design by Vincent F. Hendricks

Printed on acid-free paper

All Rights Reserved © 2005 Springer No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written permission from the publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Printed in the Netherlands.

Contents

Foreword
Wiebe van der Hoek

Logics for Epistemic Programs
Alexandru Baltag and Lawrence S. Moss

A Counterexample to Six Fundamental Principles of Belief Formation
Hans Rott

Comparing Semantics of Logics for Multi-Agent Systems
Valentin Goranko and Wojciech Jamroga

A Characterization of Von Neumann Games In Terms of Memory
Giacomo Bonanno

An Evolutionary Game Theoretic Perspective on Learning in Multi-Agent Systems
Karl Tuyls, Ann Nowé, Tom Lenaerts and Bernard Manderick

Evolution of Conventional Meaning and Conversational Principles
Robert van Rooy

Nonmonotonic Inferences and Neural Networks
Reinhard Blutner

A Model of Jury Decisions Where All Jurors Have the Same Evidence
Franz Dietrich and Christian List

A SAT-Based Approach to Unbounded Model Checking for Alternating-Time Temporal Epistemic Logic
M. Kacprzak and W. Penczek

Update Semantics of Security Protocols
Arjen Hommerson, John-Jules Meyer and Erik de Vink

Index

WIEBE VAN DER HOEK

FOREWORD

1. Introduction

It was in 2002 that the idea arose that the time was right for a journal in the area of reasoning about Knowledge, Rationality and Action: a journal that would be a platform for researchers who work on epistemic logic, belief revision, game and decision theory, rational agency, planning and theories of action. Although there are some prestigious conferences organised around these topics, it was felt that a journal in this area would add considerable value. Such a journal would typically be a platform for the kind of problems addressed by researchers from Computer Science, Game Theory, Artificial Intelligence, Philosophy, Knowledge Representation, Logic and Agents: problems that concern artificial systems that have to gather information, reason about it and then make a sensible decision about what to do next.

It is for this reason that I am very happy that Knowledge, Rationality & Action (KRA) now exists as its own Section at Springer. Because of the clear links that the scope of KRA has with Philosophy, it was decided that KRA would be launched as a series within the journal Synthese. This book collects the first two issues of KRA. Its index shows that these first two issues indeed address its ‘core business’: all the chapters refer explicitly to knowledge, for instance, and rationality is represented by the many contributions that address games, or reasoning with or about strategies. Actions are present in many chapters in this book, whether they are epistemic programs, choices by a coalition of agents, moves in a game, or votes by the members of a jury. All in all, there is an emphasis on Information and a notion of Agency. What is furthermore striking is that almost all chapters study these concepts from a multi-agent perspective. In no chapter of this book do we have an isolated decision maker who reasons only about his own information and strategies; the decision maker is always placed in a context of other agents, with some implicit or explicit assumptions about the Interaction.

Finally, this volume demonstrates that ‘classical’ approaches to Information and Agency co-exist very well with more modern trends that show how Knowledge, Rationality and Action can be given a broad and refreshing interpretation: there is a chapter on ‘classical AGM-like’ belief revision, but also two on a modern approach in the area of Dynamic Epistemic Logic. There is a chapter on von Neumann games, but also two that defend Evolutionary Game Theory, a branch of game theory that attempts to loosen the ‘classical assumptions’ about ‘hyper-rational players’ in games. There are chapters solely on logical theories, but also one that suggests how we can bridge the gap between symbolic and connectionist approaches to cognition. Finally, there is a chapter on voting that relativises Condorcet’s ‘classical’ Jury Theorem.

In the next section I will briefly give some more details about the themes of this book. While this can be conceived as a ‘top-down’ description of the contents, in Section 3 I will give a more ‘bottom-up’ picture of the individual chapters.

2. Themes in this Book

The first two chapters (Logics for Epistemic Programs and A Counterexample to Six Fundamental Principles of Belief Formation), as well as the fourth (A Characterization of Von Neumann Games) and the last two (A SAT-Based Approach to Unbounded Model Checking for Alternating-Time Temporal Epistemic Logic (Chapter 9) and Update Semantics of Security Protocols), explicitly deal with information change. It is also a subject of study in part of Chapter 3, Comparing Semantics of Logics for Multi-Agent Systems. Whereas the first chapter, in order to develop a Dynamic Epistemic Logic, uses a powerful object language incorporating knowledge, common knowledge and belief, the second chapter formulates and analyses six simple postulates in a meta- and semi-formal language. A main difference between the two approaches is that in Chapter 2 the theory has to explain how ‘expectations’ implicitly encoded in a belief set determine the next belief (when a conjunction has to be given up, say), whereas in the first chapter this is not an issue: there, the effect of the learning is exhaustively described by an epistemic program. Moreover, in the first chapter the ‘only’ facts prone to change are epistemic facts (for instance, one agent might learn that a second agent now does know whether a certain fact holds), whereas they are ‘objective’ in the second. Chapter 10 uses a mix of both kinds of information change: there, it can be both objective and ‘first order’.

Chapters 3 and 9 analyse the dynamics of knowledge in the context of Alternating Transition Systems, where we have several agents that jointly determine the transition from one state to another. These two chapters do not have operators that explicitly refer to the change of knowledge: it is encoded in the epistemic and action relations in the model. Knowledge in extensive games is treated in a similar way, where it is represented in the information partition of the underlying game trees. Chapter 3 demonstrates that, indeed, there is a close relation between Alternating Transition Systems and Concurrent Game Structures, of which the latter are often conceived as a generalisation of game trees.

Chapters 9 and 10 take a genuine Computer Science perspective on some of the issues analysed in Chapters 1 and 3. More in particular, both chapters address the problem of (automatic) verification of complex agent systems. The systems of study in Chapter 9 are the Alternating-time Transition Systems of Chapter 3, more precisely those which incorporate an epistemic component. The problem addressed in this chapter is that of model checking such systems: given a description of a transition system model, and a property expressed in an appropriate logic, can we automatically check whether that property is true in that model? Chapter 10’s aim is to develop verification methods for the epistemic program-type of action of Chapter 1, in the area of security protocols. Since the use of encryption keys in such protocols is to hide information in messages from specific agents, but make it available to others, Dynamic Epistemic Logic seems an appropriate tool here.

Games, or at least game-like structures, are the object of study in Chapters 3, 4 and 5 (An Evolutionary Game Theoretic Perspective on Learning in Multi-Agent Systems). The first two of these address knowledge or belief in such games, whereas the emphasis in the fifth chapter is more on the lack of it.
Chapter 4 studies memory of past knowledge for players playing a game in extensive form. If a player knows ϕ now, is he guaranteed to always know that he once knew ϕ? The chapter gives a necessary and sufficient condition for this, and shows that players having perfect recall is closely connected to the notion of a von Neumann game. The other two chapters (i.e., 3 and 5) have in common that they try to combine and relate different formalisms. Chapter 3 compares and relates several semantics for game-like logics, including Alternating-Time Temporal Logic (ATL) and Coalition Logic. Chapter 5 rather relates three disciplines: evolutionary game theory, reinforcement learning and multi-agent systems. The key trigger for this work is the insight that in many realistic multi-agent systems one has to weaken the ‘classical’ game theoretic assumptions about ‘hyper-rational’ agents, replacing them by players referred to as ‘boundedly rational’ agents, who only have partial information about the environment and the payoff tables, and who have to learn an optimal ‘policy’ by trial and error.

The sixth chapter questions exactly the same assumptions that classical game theory makes on ‘hyper-rational agents’ as are debated in Chapter 5. Chapter 6 addresses the question of how to explain which equilibria are chosen in signalling games, games that try to shed light on language use and language organisation. The chapter proposes to replace current explanations for such selection, which rely on strong assumptions about rationality and common knowledge (thereof) by the players, i.e., the language users, by explanations that are based on insights from evolutionary game theory.

Whereas Chapter 3 relates, on a technical level, several semantics for game-like logics, and Chapter 5 makes a case for combining three disciplines in order to study the dynamics of rationality in multi-agent systems, Chapter 7 (Nonmonotonic Inferences and Neural Networks) uses a semantics in one research paradigm, i.e., non-monotonic logic as a symbolic reasoning mechanism, to bridge a gap to another paradigm, i.e., connectionist networks in the sub-symbolic paradigm. In doing so, the chapter is a step toward bridging the gap between symbolic and sub-symbolic modes of computation, thus addressing a long-standing issue in Philosophy of Mind.

Finally, Chapter 8 (A Model of Jury Decisions Where All Jurors Have the Same Evidence) addresses the issue of rational decision making in a group, or voting. It discusses a classical result that says that, in the scenario of majority voting, if every juror is competent, the reliability of the group decision converges to certainty as the group size increases. Thus, this chapter also sits in the multi-agent context, but rather than accepting that the result of a joint action is given by some transition function, this chapter discusses the rationality of a specific way to merge specific actions, i.e., those of voting. Moreover, again, it appears that knowledge is crucial here, because the chapter proposes that, rather than assuming independence of the voters given the state of the world, we should conditionalize on the latest evidence.

3. Brief Description of the Chapters

In Logics for Epistemic Programs, by Alexandru Baltag and Lawrence S. Moss (Chapter 1), the authors take a general formal approach to changes of knowledge, or, better, changes of belief in a multi-agent context. The goal of their paper is to show how several epistemic actions can be explained as specific update operations on ‘standard’ Kripke state-models that describe ‘static’ knowledge. Updates describe how we move from one state-model to another, and an epistemic action specifies what such an update ‘looks like’ for every agent involved in it. An example of an epistemic action is that of a public announcement: if ϕ is publicly announced in a group of agents A, the information ϕ is truthfully announced to everybody in A, and that this is done is common knowledge among the members of A. However, in more private announcements, it may well be that agent b learns a new fact, while c is aware of this without coming to know the fact itself. The approach of the authors is unique in the sense that they also model epistemic actions by Kripke-like models, called action models.

In A Counterexample to Six Fundamental Principles of Belief Formation (Chapter 2), Hans Rott reconsiders six principles that are generally well accepted in the areas of non-monotonic reasoning, belief revision and belief contraction, principles of common sense reasoning. They can all be formulated just by using conjunction and disjunction over new information, or information that has to be abandoned. Rott then pictures a reasoner who initially ‘expects’ that from a set of possible alternatives a, b, or c, none will be chosen. He then sketches three possible (but different) scenarios in which the reasoner learns in fact that a ∨ b, a ∨ b ∨ c and c do hold, after all. Depending on how each new piece of information is in line with or goes against the reasoner’s ‘other expectations’, he will infer different conclusions in each scenario. Interestingly, the six principles are then tied up with principles in the theory of rational choice, most prominently the Principle of Independence of Irrelevant Choices: one’s preferences among a set of alternatives should not change (within that set of alternatives) if new options present themselves.
Rott argues that, in the setting of belief formation, the effect of this additional information should exactly be accounted for and explained. The chapter concludes negatively: logics that are closer to modelling ‘common sense reasoning’ seem to have a tendency to drift away from the nice, classical patterns that we usually ascribe to ‘standard logics’.

Alternating Transition Systems are structures in which each joint action of a group of agents determines a transition between global states of the system. Such systems draw inspiration from areas as diverse as game theory, computation models, epistemic and coalition models. Chapter 3 (Comparing Semantics of Logics for Multi-Agent Systems, by Valentin Goranko and Wojciech Jamroga) uses these structures to show how (semantically) several frameworks to reason about the abilities of agents are equivalent. One prominent framework in their analyses is that of Alternating-time Temporal Logic (ATL), a logic intended to reason about what coalitions of agents can achieve by choosing an appropriate strategy, taking into account all possible strategies for the agents outside the coalition. The chapter first of all shows that the three different semantics that were proposed for ATL are equivalent. Moreover, the authors demonstrate that ATL subsumes (Extended) Coalition Logic. Last but not least, they show that adding an epistemic component to ATL (giving ATEL) can be completely modelled within ATL, the idea being to model the epistemic indistinguishability relation for each agent as a strategic transition relation for his ‘associated epistemic agents’.

In Chapter 4 (A Characterization of Von Neumann Games in Terms of Memory), Giacomo Bonanno analyses knowledge, and the memory of it, in the context of extensive games with incomplete information. In such a game, it is assumed that each player can be uncertain about the nodes in which he has to make a move. First of all, Bonanno extends this notion to an information completion, in which a player can have uncertainties about all nodes, not just the ones that are his. For such games, he defines a notion of Memory of Past Knowledge (MPK) in terms of the structure and the information partition in that game. If a game only allows for uncertainty for every player at his decision nodes, the analogue of MPK is called Memory of Past Decision Nodes (MPD). Syntactically, MPK is related to perfect recall: it appears to be equivalent to saying that, at every node, a player remembers everything he has known before: if he knew ϕ in the past, then now he knows that he previously knew ϕ. This notion is then connected to that of von Neumann games, which, roughly, are games in which all players know the time. The main result of the chapter tells us that an extensive form game with incomplete information allows for an information completion satisfying MPK if, and only if, the game is a von Neumann game satisfying MPD.
Note that from this, it follows that if we start with a game with Memory of Past Decision Nodes, this game can be extended to a game with Memory of Past Knowledge if and only if it is a von Neumann game.

Chapter 5 (An Evolutionary Game Theoretic Perspective on Learning in Multi-Agent Systems), by Karl Tuyls, Ann Nowé, Tom Lenaerts and Bernard Manderick, is a survey paper that argues how three currently loosely connected disciplines can use and contribute to each other’s development. The authors first of all take the stance that crucial assumptions in ‘classical game theory’ make its applicability to multi-agent systems and the real world rather limited: in the latter, we cannot always assume that participants have perfect knowledge of the environment, or even of the payoff tables. Rather than assuming that players are ‘hyper-rational’ and correctly anticipate the behaviour of all other players, they propose players that are ‘boundedly rational’, who are limited in their knowledge about the game and the environment, as well as in their computational resources. Moreover, such players learn to respond better by trial and error, which adds to the dynamics of the multi-player, or multi-agent, system. Given these assumptions about the partially known dynamic environment, it seems natural to assume that learning and adaptiveness are skills that are important for the agents in that environment. The chapter argues that reinforcement learning, a theoretical framework that is already established in single-agent systems, has to solve several technical problems in order to be applicable to the multi-agent case. The problem is that in such richer systems, the reinforcement an agent receives may depend on the actions taken by the other agents. This absence of ‘Markovian behaviour’ may make the convergence properties of reinforcement learning, as they hold in the single-agent case, disappear. In order to fully understand the dynamics of learning and the effects of exploration in multi-agent systems, they propose to use evolutionary game theory in such systems, which adds a solution concept to the classical equilibria, namely that of a strategy being evolutionarily stable. A strategy has this property if it is robust against evolutionary pressure from any appearing mutant strategy. Apart from giving several examples illustrating the main concepts, they show how evolutionary game theory can be used as a foundation for modelling new reinforcement algorithms for multi-agent systems.

Evolution of Conventional Meaning and Conversational Principles by Robert van Rooy (Chapter 6) questions exactly the same assumptions that classical game theory makes on ‘hyper-rational agents’. This chapter addresses the question of how to explain which equilibria are chosen in signalling games, games that try to shed light on language use and language organisation.
The chapter proposes to replace current explanations for such selection, which rely on strong assumptions about rationality and common knowledge (thereof) by the players, i.e., the language users, by explanations that are based on insights from evolutionary game theory, especially that of an evolutionarily stable strategy. Rather than obtaining Nash equilibria in language games by relying on almost reciprocal assumptions about mutual (knowledge of) rationality, or using a psychological notion of salience to explain selection of a so-called conventional equilibrium, the chapter shows how that equilibrium will ‘naturally’ evolve in the context of evolutionary language games. It also uses evolutionary game theory to explain how conventions that enhance efficient communication are more likely to be adopted than those that do not. Finally, this chapter shows how costly signalling can account for honest communication.
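The stability notion that Chapters 5 and 6 rely on can be made concrete with a small check. The sketch below is not taken from either chapter: it assumes the textbook two-condition definition of an evolutionarily stable strategy for a symmetric two-player matrix game and, as a simplification, it only tests pure incumbents against pure mutants. The function names and the Hawk-Dove payoffs are illustrative choices.

```python
def is_pure_ess(payoff, x):
    """Return True if pure strategy x resists every pure mutant y.

    payoff[i][j] is the payoff to a player using strategy i against an
    opponent using strategy j (symmetric game, hypothetical numbers).
    Assumed definition: for every y != x, either
      payoff[x][x] > payoff[y][x], or
      payoff[x][x] == payoff[y][x] and payoff[x][y] > payoff[y][y].
    """
    for y in range(len(payoff)):
        if y == x:
            continue
        strictly_better = payoff[x][x] > payoff[y][x]
        tie_but_beats_mutant = (
            payoff[x][x] == payoff[y][x] and payoff[x][y] > payoff[y][y]
        )
        if not (strictly_better or tie_but_beats_mutant):
            return False
    return True


def hawk_dove(value, cost):
    """Payoff matrix with rows and columns ordered as (Hawk, Dove)."""
    return [
        [(value - cost) / 2, value],
        [0, value / 2],
    ]


HAWK, DOVE = 0, 1
cheap_fight = hawk_dove(value=4, cost=2)   # fighting costs less than the prize
costly_fight = hawk_dove(value=2, cost=4)  # fighting costs more than the prize

print("Hawk stable when fights are cheap:", is_pure_ess(cheap_fight, HAWK))
print("Hawk stable when fights are costly:", is_pure_ess(costly_fight, HAWK))
print("Dove stable when fights are costly:", is_pure_ess(costly_fight, DOVE))
```

When fighting is costly, neither pure strategy passes the test. The stable state of that game is a mixed one, which a check restricted to pure strategies cannot find.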

The aim of Chapter 7 (Nonmonotonic Inferences and Neural Networks), by Reinhard Blutner, is mainly a methodological one, i.e., to show that model-theoretic semantics may be useful for analysing properties of connectionist networks. In doing so, the chapter is a step toward bridging the gap between symbolic and sub-symbolic modes of computation, thus addressing a long-standing issue in philosophy of mind. The chapter demonstrates first of all that certain activities of connectionist networks can be seen as non-monotonic inferences. Secondly, it shows a correspondence between the coding of knowledge in Hopfield networks and the representation of knowledge in Poole systems. To do so, the chapter makes the latter systems weight-annotated, assigning a weight to all possible hypotheses in a Poole system. Then, roughly, links in the network are mapped to bi-implications in the logical system. In sum, the chapter contributes to its goals by encouraging us to accept that the difference between symbolic and neural computation is one of perspective: we should view symbolism as a high-level description of properties of a class of neural networks.

Chapter 8 (A Model of Jury Decisions Where All Jurors Have the Same Evidence, by Franz Dietrich and Christian List) addresses the issue of rational decision making in a group, or voting. The setting is a simple one: the decision is a jury’s decision about a binary variable (guilty or not) under the assumption that each juror is competent (predicts the right value of the variable with a probability greater than 0.5). Under this scenario, Condorcet’s Jury Theorem predicts that the reliability of a jury’s majority decision converges to 1 if the size of the jury increases unboundedly. This holds under the assumption that different jurors are independent conditional on the state of the world, requiring that for each individual juror, a new independent view on the world is available.
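The convergence claimed by Condorcet’s Jury Theorem can be checked numerically. The sketch below computes the probability of a correct majority for independent jurors from the binomial distribution. The second function is a deliberately crude, two-valued stand-in for a jury that shares one body of evidence; it is an assumption of this illustration, not the model of Chapter 8, and the parameter names `p` (juror competence) and `q` (probability that the evidence is not misleading) are hypothetical.

```python
from math import comb


def majority_reliability(n, p):
    """Probability that a strict majority of n jurors votes correctly,
    assuming independent votes that are each correct with probability p."""
    if n % 2 == 0:
        raise ValueError("use an odd jury size so that ties cannot occur")
    smallest_majority = n // 2 + 1
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(smallest_majority, n + 1)
    )


def common_evidence_reliability(n, p, q):
    """Toy shared-evidence jury (an assumption of this sketch).

    With probability q the common evidence points to the truth; each juror
    independently follows the evidence with probability p. The majority is
    correct if it follows good evidence or rejects misleading evidence.
    """
    follows = majority_reliability(n, p)
    return q * follows + (1 - q) * (1 - follows)


for size in (1, 11, 101, 501):
    print(
        size,
        round(majority_reliability(size, 0.6), 4),
        round(common_evidence_reliability(size, 0.6, 0.9), 4),
    )
```

In the first column of results the reliability climbs toward 1 as the jury grows. In the toy shared-evidence variant it levels off near `q`, because no number of jurors can repair misleading evidence.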
The authors propose a framework in which the jurors are independent conditional on the evidence, rather than on the state of the world. This evidence is called the latest common cause of the jurors’ votes. This framework seems to have a realistic underpinning: a jury typically decides on the basis of commonly presented evidence, not on independently obtained signals about the world, the latter often not even being allowed for use in the court room. The chapter’s jury theorem then shows that the probability of a correct majority decision is typically less than the corresponding probability in Condorcet’s model. It also predicts that, as the jury size increases, the probability of a correct majority decision converges to the probability that the evidence is not misleading.

Chapter 9 (A SAT-Based Approach to Unbounded Model Checking for Alternating-Time Temporal Epistemic Logic), by M. Kacprzak and W. Penczek, addresses the problem of (automatic) verification of complex agent systems. The systems of study in Chapter 9 are the Alternating-time Transition Systems of Chapter 3, more in particular those which incorporate an epistemic component. The problem addressed in this chapter is that of model checking such systems: can we, given a description of a transition system model, and a property expressed in an appropriate logic (ATEL, an epistemic extension of ATL), automatically check whether that property is true in that model? To do so, this approach fixes one of the semantics for ATL given in Chapter 3, and applies a technique from unbounded model checking to it. Then, for a given model T and ATEL-property ϕ, a procedure is given to express them as Quantified Boolean Formulas, which, using fixed point definitions, in turn yield purely propositional formulas. A main theorem of the chapter then states that ϕ is true of T if and only if the obtained propositional formula is satisfiable. Hence, model checking ATEL is reduced to a SAT-based approach, an approach that has computational advantages over model checking using, for instance, Binary Decision Diagrams.

The last chapter, Chapter 10 (Update Semantics of Security Protocols, by Arjen Hommerson, John-Jules Meyer and Erik de Vink), addresses verification of security protocols. This becomes more and more important in an era where agents send more and more private, secret or sensitive messages over an insecure medium. Decryption keys are introduced to make specific messages only readable to specific agents, which makes the need to reason about higher order information (knowledge about knowledge) in a multi-agent protocol obvious. This chapter takes three kinds of updates, or messages (or, in the terminology of Chapter 1, ‘epistemic programs’): the public announcement of an object variable, the private learning of a variable and the private learning about the knowledge of other agents about variables.
The chapter first of all gives a Dynamic Kripke Semantics for these specific actions, not unlike the semantics using action models as proposed in Chapter 1, and then puts this semantics to work to model and reason about two specific security protocols, in which encrypted messages are sent and received. This chapter might well be a first step towards applying the model checking techniques described in Chapter 9 to the dynamic logic framework of Chapter 1, in the area of security and authorisation protocols.
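The model checking question raised for Chapter 9 (is a given property true in a given transition system?) can be illustrated on a tiny example. The sketch below is an explicit-state fixed point computation for one ATL-style property, "coalition A can enforce that a goal state is eventually reached". It is not the SAT-based technique of Chapter 9 and it leaves out the epistemic component. The system, the agent names and the function names are all hypothetical.

```python
from itertools import product


def controllable_predecessors(coalition, target, states, agents, moves, delta):
    """States where `coalition` has a joint move that forces the next state
    into `target`, whatever the remaining agents choose."""
    others = [a for a in agents if a not in coalition]
    result = set()
    for s in states:
        for own in product(*(moves[a] for a in coalition)):
            choice = dict(zip(coalition, own))
            forced = all(
                delta(s, {**choice, **dict(zip(others, rest))}) in target
                for rest in product(*(moves[a] for a in others))
            )
            if forced:
                result.add(s)
                break
    return result


def can_eventually_reach(coalition, goal, states, agents, moves, delta):
    """Least fixed point: states from which `coalition` can enforce that a
    state in `goal` is eventually visited."""
    winning = set(goal)
    while True:
        grown = winning | controllable_predecessors(
            coalition, winning, states, agents, moves, delta
        )
        if grown == winning:
            return winning
        winning = grown


# Hypothetical three-state system with agents "a" and "b".
STATES = {0, 1, 2}
AGENTS = ["a", "b"]
MOVES = {"a": [0, 1], "b": [0, 1]}


def delta(state, joint):
    if state == 0:  # both agents must pick the same move to advance
        return 1 if joint["a"] == joint["b"] else 0
    if state == 1:  # only agent a decides whether to advance
        return 2 if joint["a"] == 1 else 1
    return 2        # state 2 is absorbing


for group in (["a", "b"], ["a"], ["b"]):
    reach = can_eventually_reach(group, {2}, STATES, AGENTS, MOVES, delta)
    print(group, sorted(reach))
```

In this system the two agents together can reach state 2 from everywhere, agent a alone only from state 1 onwards, and agent b alone only if the system is already there.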

ALEXANDRU BALTAG and LAWRENCE S. MOSS

LOGICS FOR EPISTEMIC PROGRAMS

ABSTRACT. We construct logical languages which allow one to represent a variety of possible types of changes affecting the information states of agents in a multi-agent setting. We formalize these changes by defining a notion of epistemic program. The languages are two-sorted sets that contain not only sentences but also actions or programs. This is as in dynamic logic, and indeed our languages are not significantly more complicated than dynamic logics. But the semantics is more complicated. In general, the semantics of an epistemic program is what we call a program model. This is a Kripke model of ‘actions’, representing the agents’ uncertainty about the current action in a similar way that Kripke models of ‘states’ are commonly used in epistemic logic to represent the agents’ uncertainty about the current state of the system. Program models induce changes affecting agents’ information, which we represent as changes of the state model, called epistemic updates. Formally, an update consists of two operations: the first is called the update map, and it takes every state model to another state model, called the updated model; the second gives, for each input state model, a transition relation between the states of that model and the states of the updated model. Each variety of epistemic actions, such as public announcements or completely private announcements to groups, gives what we call an action signature, and then each family of action signatures gives a logical language. The construction of these languages is the main topic of this paper. We also mention the systems that capture the valid sentences of our logics, but we defer the completeness proof to a separate paper. The basic operation used in the semantics is called the update product. A version of this was introduced in Baltag et al. (1998), and the presentation here improves on the earlier one. The update product is used to obtain from any program model the corresponding epistemic update, thus allowing us to compute changes of information or belief. This point is of interest independently of our logical languages. We illustrate the update product and our logical languages with many examples throughout the paper.

1. INTRODUCTION

Traditional epistemic puzzles often deal with changes of knowledge that come about in various ways. Perhaps the most popular examples are the puzzles revolving around the fact that a declaration of ignorance of some sentence A may well lead to knowledge of A. We have in mind the scenarios that go by names such as the Muddy Children, the Cheating Spouses, the Three Wisemen, and the like. The standard treatment of these matters (a) introduces the Kripke semantics of modal logic so as to formalize the informal notions of knowledge and common knowledge; (b) formalizes one of the scenarios as a particular model; and (c) finally shows how the formalized notions of knowledge and common knowledge illuminate some key aspects of the overall scenario.

Synthese 139: 165–224, 2004. Knowledge, Rationality & Action 1–60, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

The informal notion of knowledge which is closest to what is captured in the traditional semantics is probably justified true belief. But more generally, one can consider justifiable beliefs, regardless of whether they happen to be true; in many contexts, agents may be deceived by certain actions, without necessarily losing their rationality. Thus, such beliefs, and the justifiable changes affecting these beliefs, may be accepted as a proper subject for logical investigation. The successful treatment of a host of puzzles leads naturally to the following

THESIS I. Let s be a social situation involving the intuitive concepts of knowledge, justifiable beliefs and common knowledge among a group of agents. Assume that s is presented in such a way that all the relevant features of s pertaining to knowledge, beliefs and common knowledge are completely determined. Then we may associate to s a mathematical model S. (S is a multi-agent Kripke model; we call these epistemic state models.) The point of the association is that all intuitive judgements concerning s correspond to formal assertions concerning S, and vice-versa.

We are not aware of any previous formulations of this thesis. Nevertheless, some version of this thesis is probably responsible for the appeal of epistemic logic. We shall not be concerned in this paper with a defense of this thesis, but instead we return to our opening point related to change. Dynamic epistemic logic, dynamic doxastic logic, and related formalisms attempt to incorporate change from model to model in the syntax and semantics of a logical language. We are especially concerned with changes that result from information-updating actions of various sorts.
Our overall aim is to formally represent epistemic actions, and we associate to each of them a corresponding update. By “updates” we shall mean operations defined on the space of all state models, operations which are meant to represent well-defined, systematic changes in the information states of all agents. By an “epistemic action” (or program) we shall mean a representation of the way such a change “looks” to each agent. Perhaps the paradigm case of an epistemic action is a public announcement. The first goal is to say in a general way what the effect of a (public) announcement should be on a model. It is natural to model such announcements by the logical notion of relativization: publicly announcing a sentence causes all agents to restrict attention to the worlds where the sentence was true (before the announcement). Note that the informal notion of announcement takes situations to situations, and the formal notion of relativization is an operation taking models to models.

In this paper, we wish to consider many types of epistemic actions that are more difficult to handle than public announcements. These include half-transparent, half-opaque types of actions, such as announcements to groups in a completely private way, announcements that include the possibility that outsiders suspect the announcement but this suspicion is lost on the insiders, private announcements which are secretly intercepted by outsiders, etc. We may also consider types of actions exhibiting information-loss and misinformation, where agents are deceived by others or by themselves.

THESIS II. Let σ be a social “action” involving and affecting the knowledge (beliefs, common knowledge) of agents. This naturally induces a change of situation; i.e., an operation o taking situations s into situations o(s). Assume that o is presented by assertions concerning knowledge, beliefs and common knowledge facts about s and o(s), and that o is completely determined by these assertions. Then

(a) We may associate to the action σ a mathematical model Σ which we call an epistemic action model. (Σ is also a multi-agent Kripke model.) The point again is that all the intuitive features of, and judgments about, σ correspond to formal properties of Σ.

(b) There is an operation ⊗ taking a state model S and an action model Σ and returning a new state model S ⊗ Σ. So each Σ induces an update operation O on state models: O(S) = S ⊗ Σ.

(c) The update O is a faithful model of the situation change o, in the sense that for all s: if s corresponds to S as in Thesis I, then again o(s) corresponds to O(S) in the same way; i.e. all intuitive judgements concerning o(s) correspond to formal assertions concerning O(S), and vice-versa.
Our aim in this paper is not to offer a full conceptual defense of these two theses. Instead, we will justify the intuitions behind them through examples and usage. We shall use them to build logical languages and models and show how these can be applied to the analysis of natural examples of “social situations” and “social actions”. As in the case of standard possible-worlds semantics (for which a ‘strong’, ontological defense is hard, maybe even impossible, to give), the usefulness of these formal developments may provide a ‘weak’, implicit defense of the philosophical claims underlying our semantics.

Our method of defining updates is quite general and leads to logics of epistemic programs, extending standard systems of epistemic logic by adding updates as new operators. These logical languages also incorporate features of propositional dynamic logic. Special cases of our logic, dealing only with public or semi-public announcements to mutually isolated groups, have been considered in Plaza (1989), Gerbrandy (1999a, b), and Gerbrandy and Groeneveld (1997). But our overall setting is much more liberal, since it allows for all the above-mentioned types of actions. We feel it would be interesting to study further examples with an eye towards applications, but we leave this to other papers.

In our logical systems, we capture only the epistemic aspect of these real actions, disregarding other (intentional) aspects. In particular, to keep things simple we only deal with “purely epistemic” actions; i.e., the ones that do not change the facts of the world, but affect only the agents’ beliefs about the world. However, this is not an essential limitation, as our formal setting can be easily adapted to express fact-changing actions.

On the semantic side, the main original technical contribution of our paper lies in our decision to represent not only the epistemic states, but also (for the first time) the epistemic actions. For this, we use action models, which are epistemic Kripke models of “actions”, similar to the standard Kripke structures of “states”. While for states, these structures represent in the usual way the uncertainty of each agent concerning the current state of the system, we similarly use action signatures to represent the uncertainty of each agent concerning the current action taking place. For example, there will be a single action signature that represents public announcements.
There will be a different action signature representing a completely private announcement to one specified agent, etc. The intuition is that we are dealing with potentially “half-opaque/half-transparent” actions, about which the agents may be incompletely informed, or even completely misinformed. The components (“possible worlds”) of an action model are called “simple” actions, since they are meant to represent particularly simple kinds of actions, whose epistemic impact is uniform on states: the informational features of a simple action are intrinsic to the action, and thus are independent of the informational features of the states to which it can be applied. This independence is subject to only one restriction: the action’s presuppositions or conditions of possibility, which a state must satisfy in order for the action to be executable. Thus, besides the epistemic structure, simple actions have preconditions, defining their domain of applicability: not every action is possible in every state. We model the update of a state by an action as a partial update operation, given by a restricted product of the two structures: the uncertainties present in the given state and the given action are multiplied, while the “impossible” combinations of states and actions are eliminated (by testing the actions’ preconditions on the state). The underlying intuition is that, since the agent’s uncertainties concerning the state and the ones concerning the simple action are mutually independent, the two uncertainties must be multiplied, except that we insist on applying an action only to those states which satisfy its precondition.

On the syntactic side, we use a mixture of dynamic and epistemic logic, with dynamic modalities associated to each action signature, and with common-knowledge modalities for various groups of agents (in addition to the usual individual-knowledge operators). In this paper, we present a sound system for this logic. The logic includes an Action-Knowledge Axiom that generalizes similar axioms found in other papers in the area (cf. Gerbrandy 1999a, b; Plaza 1989). The main original feature of our system is an inference rule which we call the Action Rule. This allows one to infer sentences expressing common knowledge facts which hold after an epistemic action. From another point of view, the Action Rule expresses what might be called a notion of “epistemic (co)recursion”. Overall, the Action-Knowledge Axiom and the Action Rule express fundamental formal features of the interaction between action and knowledge in multi-agent systems. The logic is studied further in our paper with S. Solecki (Baltag et al. 1998). There we present the completeness and decidability of the logic, and we prove various expressivity results.

For Impatient Readers. The main logical systems of the paper are presented in Section 4.2, and to read that one would only have to read the definition in Section 4.1 first.
To understand the semantics, one should read in addition Sections 2.1, 2.3, and 3.1–3.4. But we know that our systems would not be of any interest if the motivation were not so great. For this reason, we have included many examples and discussions, particularly in the sections of the paper preceding the introduction of the logics. Readers may read as much or as little of that material as they desire. Indeed, some readers may find our examples and discussion of more interest than the logical systems themselves. The main logical systems are presented in Section 5.

Technical Results concerning our systems will appear in other papers. The completeness/decidability result for the main systems of this paper will appear in a paper (Baltag et al. 1998) written with Sławomir Solecki; this paper also contains results on the expressive power of our systems. For stronger systems of interest, there are undecidability results (cf. Miller and Moss 2003).

1.1. Scenarios

Probably the best way to enter our overall subject is to consider some “epistemic scenarios.” These give the reader some idea of what the general subject is about, and they also provide us with test problems at different points.

SCENARIO 1. The Concealed Coin. A and B enter a large room containing a remote-control mechanical coin flipper. One presses a button, and the coin spins through the air, landing in a small box on a table. The box closes. The two people are much too far to see the coin. The main contents of any representation of the relevant knowledge states of A and B are that (a) there are two alternatives, heads and tails; (b) neither party knows which alternative holds; and (c) that (a) and (b) are common knowledge. The need for the notion of common knowledge in reasoning about multi-agent interaction is by now standard in applied epistemic logic, and so we take it as unproblematic that one would want (c) to come out in any representation of this scenario. Perhaps the clearest way to represent this scenario is with a diagram:

In more standard terms, we have a set of two alternatives, call them x and y. We also have some atomic information, that x represents the possible fact that the coin is lying heads up and that y represents the other possible fact. Finally, we also have some extra apparatus needed to indicate that, no matter the actual disposition of the coin, A and B think both alternatives are possible. Following the standard practice in epistemic logic, we take this apparatus to be accessibility relations between the alternatives. The diagram should be read as saying that were the actual state of the world to be x, say, then A and B would still entertain both alternatives.

SCENARIO 2. The Coin Revealed to Show Heads. A and B sit down. One opens the box and puts the coin on the table for both to see. It’s heads. The result of this scenario is a model which again is easy to grasp. It consists of one state; call it s. Each agent knows the state in the sense that they think s is the only possible state.
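As an informal aside, the two models just described can be written down as plain data. The sketch below is only an illustration: the dictionary layout and the helper name `knows` are assumptions made here, not notation from the paper, and "knows" is read as truth in every alternative the agent entertains.

```python
# Illustrative encoding (an assumption, not the paper's notation): a model is
# a set of states, one set of arrows per agent, and the facts true at each state.
scenario1 = {
    "states": {"x", "y"},
    "arrows": {
        "A": {("x", "x"), ("x", "y"), ("y", "x"), ("y", "y")},
        "B": {("x", "x"), ("x", "y"), ("y", "x"), ("y", "y")},
    },
    "facts": {"x": {"H"}, "y": {"T"}},
}

scenario2 = {
    "states": {"s"},
    "arrows": {"A": {("s", "s")}, "B": {("s", "s")}},
    "facts": {"s": {"H"}},
}


def knows(model, agent, state, fact):
    """Hypothetical helper: `fact` holds in every state that `agent`
    considers possible from `state`."""
    return all(fact in model["facts"][t]
               for (u, t) in model["arrows"][agent] if u == state)


print(knows(scenario1, "A", "x", "H"))  # False: A still entertains y
print(knows(scenario2, "B", "s", "H"))  # True: s is the only alternative
```

In the first model neither agent knows the face of the coin at either state; in the second both do.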


We also shall be interested in keeping track of the relation of the model in Scenario 1 and the model just above. We indicate this in the following way:

The ﬁrst thing to note is that the dotted connection is a partial function; as we shall see later, this is the hallmark of a deterministic epistemic action. But we also will see quite soon that the before-after relation is not always a partial function; so there are epistemic actions which are not deterministic. Another thing to note at this point is that the dotted connection is not subscripted with an agent or a set of agents. It does not represent alternative possibility with respect to anyone, but instead stands for the before-after relation between two models: it is a transition relation, going from input states to the corresponding output states. In this example, the transition relation is in fact a partial function whose domain is the set of states which could possibly be subject to the action of revealing heads. This is possible in only one of the two starting states. SCENARIO 2.1. The Coin Revealed to Show Tails. As a variation on Scenario 2, there is a different Scenario in which the coin is revealed in the same way to both A and B but with the change that tails shows. Our full representation is:

SCENARIO 2.2. The Coin Revealed. Finally, we can consider the nondeterministic sum of publicly revealing heads and publicly revealing tails. The coin is revealed to both A and B, but all that we as external modelers can say is that either they learned that heads shows, or that they learned that tails shows. Our representation is


Observe that, although non-deterministically deﬁned, this is still a deterministic action: the relation described by the dotted connection is still a function. SCENARIO 3. A Semi-private Viewing of Heads. The following is an alternative to the scenarios above in which there is a public revelation. After Scenario 1, A opens the box herself. The coin is lying heads up. B observes A open the box but does not see the coin. And A also does not disclose whether it is heads or tails.

No matter which alternative holds, B would consider both as possible, and A would be certain which was the case.

SCENARIO 3.1. B’s Turn. After Scenario 3, B takes a turn and opens the box the same way. We expect that after both have individually opened the boxes they see the same things; moreover, they know this will happen. This time, we begin with the end of Scenario 3, and we end with the same end as in the public revelation of heads:

SCENARIO 4. Cheating. After Scenario 1, A secretly opens the box herself. The coin is lying Heads up. B does not observe A open the box, and indeed A is certain that B did not suspect that anything happened after they sat down. This is substantially more difﬁcult conceptually, and the representation is accordingly more involved. Such cheating is like an announcement that results in A’s being sure that the coin lies heads up, and B learns nothing. But the problem is how to model the fact that, after the announcement, B knows nothing (new). We cannot just delete all arrows for B to represent such lack of knowledge: this would actually increase B’s (false) ‘knowledge’, by adding to his set of beliefs new ones; for example, he’ll believe it is not possible that the coin is lying heads up. Deleting arrows always corresponds to increasing ‘information’ (even if sometimes this is just by adding false information). But this seems wrong in our case, since B’s possibilities should be unchanged by A’s secret cheating. Instead, our representation of the informational change induced by such cheating should be:


(1)

The main point is that after A cheats, B does not think that the actual state of the world is even possible. The states that B thinks are possible should only be two states which are the same as the “before” states, or states which are in some important way similar to those. In addition to the way the state actually is, where A knows the coin lies heads up, or rather as an aspect of that state, we need to consider other states to represent the possibilities that B must entertain. We have indicated names s, t, and u of the possibilities in (1). (We could have done this with the scenarios we already considered, but there was less of a reason.) State s has a different status from t and u: while t and u are the states that B thinks could hold, s is the one that A knows to be the case. Note that the substructure determined by t and u is isomorphic to the “before” structure. This is the way that we formalize the intuition that the states available to B after cheating are essentially the same as the states before cheating.

SCENARIO 5. More Cheating. After Scenario 4, B does the very same thing. That is, B opens the box quite secretly. We leave the reader the instructive exercise of working out a representation. We shall return to this matter in Section 3.5, where we solve the problem based on our general machinery. We merely mention now that part of the goal of the paper is precisely to develop tools to build representations of complex epistemic examples such as this one.

SCENARIO 6. Lying. After Scenario 1, A convinces B that the coin lies heads up and that she knows this. In fact, she is lying. Here is our representation:

SCENARIO 7. Pick a Card. As another alternative to Scenario 3, C walks in and tells both A and B at the same time that she has a card which either says H, T, or is blank. In the first two cases the card describes truly the state of the coin in the box, and in the last case the intention is that no information is given. Then C gives the card to A in the presence of B. Here is our representation:

The idea here is that t and u represent states where the card was blank, s the state where the card showed H, and t the state where the card showed T.

SCENARIO 8. Common Knowledge of (Unfounded) Suspicion. As yet another alternative to Scenario 3, suppose that after A and B make their original entrance, A has not looked, but B has some suspicion concerning A’s cheating; so B considers it possible that she (A) has secretly opened the box (but B cannot be sure of this, so he also considers it possible that nothing happened); moreover, we assume there is common knowledge of this (B’s) suspicion. That is, we want to think of one single action (a knowing glance by A, for example) that results in all of this. The representation of “before” and “after” is as follows:

One should compare this with Scenario 7. The blank card there is a parallel to no looking here; card with H is parallel to A’s looking and seeing H; and similarly for T. This accounts for the similarity in the models. The main difference is that Scenario 7 was described in such a way that we do not know what the card says; in this scenario we stipulate that A deﬁnitely did not look. This accounts for the difference in dotted lines between the two ﬁgures.


SCENARIO 8.1. Private Communication about the Other. After Scenario 8, someone stands in the doorway behind A and raises a finger. This indicates to B that A has not cheated. The communication between B and the third party was completely private. Again, we leave the representation as an exercise to the reader.

One Goal of the paper is to propose a theoretical understanding of these representations. It turns out that there are simple operations and general insights which allow one to construct them. One operation allows one to pass, for example, from the representation of Scenario 1 to that of Scenario 4; moreover, the operation is so applicable that it allows us to add private knowledge for one agent in a “private” way to any representation.

1.2. Further Contents of This Paper

Section 2 continues our introduction by both reviewing some of the main background concepts of epistemic logic, and also by situating our logical systems with respect to well-known systems. Section 2.1 presents a new “big picture” of the world of epistemic actions. While the work that we do could be understood without the conceptual part of the big picture, it probably would help the reader to work through our definitions. Section 3 begins the technical part of the paper in earnest, and here we revisit some of the Scenarios of Section 1.1 and their pictures. The idea is to get a logical system for reasoning with, and about, these kinds of models. Section 4 gives the syntax and semantics of our logical systems. These are studied further in Section 5. For example, we present sound and complete logical systems (with the proofs of soundness and completeness deferred to Baltag et al. (2003)).

Endnotes. We gather at the ends of our sections some remarks about the literature and how our work fits in with that of others. We mentioned other work in dynamic epistemic logic and dynamic doxastic logic.
Much more on these logics and many other proposals in epistemic logic may be found in Gochet and Gribomont (2003), and Meyer and van der Hoek (1995). Also, a survey of many topics in the area of information update and communication may be found in van Benthem’s papers (2000, 2001a, b). The ideas behind several of the scenarios in Section 1.1 are to be found in several places: see, e.g., Plaza (1989), Gerbrandy (1999a, b), Gerbrandy and Groeneveld (1997), and van Ditmarsch (2000, 2001). We shall discuss these papers in more detail later. Our scenarios go beyond the work of these papers. Specifically, our treatment of the actions in Scenarios 6 and 8 seems new. Also, our use of the relation between “before” and “after” (given by the dotted arrows) is new.

2. EPISTEMIC UPDATES AND OUR TARGET LOGICS

We fix a set AtSen of atomic sentences, and a set $\mathcal{A}$ of agents. All of our definitions depend on AtSen and $\mathcal{A}$, but for the most part we omit mention of these.

2.1. State Models and Epistemic Propositions

A state model is a triple $\mathbf{S} = (S, \xrightarrow{A}_{\mathbf{S}}, \|\cdot\|_{\mathbf{S}})$ consisting of a set $S$ of “states”; a family of binary accessibility relations $\xrightarrow{A}_{\mathbf{S}} \subseteq S \times S$, one for each agent $A \in \mathcal{A}$; and a “valuation” (or a “truth” map) $\|\cdot\|_{\mathbf{S}} : \mathrm{AtSen} \to \mathcal{P}(S)$, assigning to each atomic sentence $p$ a set $\|p\|_{\mathbf{S}}$ of states. When dealing with a single fixed state model $\mathbf{S}$, we often drop the subscript $\mathbf{S}$ from all the notation.

In a state model, atomic sentences are supposed to represent non-epistemic, “objective” facts of the world, which can be thought of as properties of states; the valuation tells us which facts hold at which states. The accessibility relations model the agents’ epistemic uncertainty about the current state. That is, to say that $s \xrightarrow{A} t$ in $\mathbf{S}$ means that in the model, in state $s$, agent $A$ considers it possible that the state is $t$.

DEFINITION. Let StateModels be the collection of all state models. An epistemic proposition is an operation $\varphi$ defined on StateModels such that for all $\mathbf{S} \in \mathrm{StateModels}$, $\varphi_{\mathbf{S}} \subseteq S$.

The collection of epistemic propositions is closed in various ways.

1. For each atomic sentence $p$ we have an atomic proposition $\mathbf{p}$ with $\mathbf{p}_{\mathbf{S}} = \|p\|_{\mathbf{S}}$.¹
2. If $\varphi$ is an epistemic proposition, then so is $\neg\varphi$, where $(\neg\varphi)_{\mathbf{S}} = S \setminus \varphi_{\mathbf{S}}$.
3. If $C$ is a set or class of epistemic propositions, then so is $\bigwedge C$, with $(\bigwedge C)_{\mathbf{S}} = \bigcap \{\varphi_{\mathbf{S}} : \varphi \in C\}$.
4. Taking $C$ above to be empty, we have an “always true” epistemic proposition $\mathbf{tr}$, with $\mathbf{tr}_{\mathbf{S}} = S$.
5. We also may take $C$ in part (3) to be a two-element set $\{\varphi, \psi\}$; here we write $\varphi \wedge \psi$ instead of $\bigwedge\{\varphi, \psi\}$. We see that if $\varphi$ and $\psi$ are epistemic propositions, then so is $\varphi \wedge \psi$, with $(\varphi \wedge \psi)_{\mathbf{S}} = \varphi_{\mathbf{S}} \cap \psi_{\mathbf{S}}$.

6. If $\varphi$ is an epistemic proposition and $A \in \mathcal{A}$, then $\Box_A \varphi$ is an epistemic proposition, with

$$(\Box_A \varphi)_{\mathbf{S}} = \{ s \in S : \text{if } s \xrightarrow{A} t, \text{ then } t \in \varphi_{\mathbf{S}} \}. \tag{2}$$

7. If $\varphi$ is an epistemic proposition and $\mathcal{B} \subseteq \mathcal{A}$, then $\Box^*_{\mathcal{B}} \varphi$ is an epistemic proposition, with

$$(\Box^*_{\mathcal{B}} \varphi)_{\mathbf{S}} = \{ s \in S : \text{if } s \xrightarrow{\mathcal{B}^*} t, \text{ then } t \in \varphi_{\mathbf{S}} \}.$$

Here $s \xrightarrow{\mathcal{B}^*} t$ iff there is a sequence

$$s = u_0 \xrightarrow{A_1} u_1 \xrightarrow{A_2} \cdots \xrightarrow{A_n} u_n = t$$

where $A_1, \ldots, A_n \in \mathcal{B}$. In other words, there is a sequence of arrows from the set $\mathcal{B}$ taking $s$ to $t$. We allow $n = 0$ here, so $\xrightarrow{\mathcal{B}^*}$ includes the identity relation on $S$. To see that $\Box^*_{\mathcal{B}} \varphi$ is indeed an epistemic proposition, we use parts 3 and 6 above; we may also argue directly, of course.

2.2. Epistemic Logic Recalled

In this section, we review the basic definitions of modal logic. In terms of our work in Section 2.1 above, the infinitary modal propositions are the smallest collection of epistemic propositions containing the propositions $\mathbf{p}$ corresponding to atomic sentences $p$ and closed under negation $\neg$, infinitary conjunction $\bigwedge$, and the agent modalities $\Box_A$. The finitary propositions are the smallest collection closed the same way, except that we replace $\bigwedge$ by its special cases $\mathbf{tr}$ and the binary conjunction operation.

Syntactic and Semantic Notions. It will be important for us to make a sharp distinction between syntactic and semantic notions. For example, we speak of atomic sentences and atomic propositions. The difference for us is that atomic sentences are entirely syntactic objects: we won’t treat an atomic sentence $p$ as anything except an unanalyzed mathematical object. On the other hand, this atomic sentence $p$ also has associated with it the atomic proposition $\mathbf{p}$. For us $\mathbf{p}$ will be a function whose domain is the (proper class of) state models, and it is defined by

$$\mathbf{p}_{\mathbf{S}} = \{ s \in S : s \in \|p\|_{\mathbf{S}} \}. \tag{3}$$

This difference may seem pedantic at first, and surely there are times when it is sensible to blur it. But for various reasons that will hopefully become clear, we need to insist on it.
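The closure conditions of Section 2.1 can be mirrored in a few lines of code. The following is a minimal sketch, assuming finite models and a representation chosen here for illustration (the names `StateModel`, `atom`, `neg`, `conj`, `box`, and `common` are not from the paper). An epistemic proposition is modelled as a function from state models to sets of states, and the example model is the two-state model of Scenario 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

State = str


@dataclass(frozen=True)
class StateModel:
    states: FrozenSet[State]
    arrows: Dict[str, FrozenSet[Tuple[State, State]]]  # agent -> accessibility relation
    valuation: Dict[str, FrozenSet[State]]             # atomic sentence -> truth set


# An epistemic proposition sends each state model to a subset of its states.
Prop = Callable[[StateModel], FrozenSet[State]]


def atom(p: str) -> Prop:
    return lambda S: S.valuation.get(p, frozenset()) & S.states


def neg(phi: Prop) -> Prop:
    return lambda S: S.states - phi(S)


def conj(*phis: Prop) -> Prop:
    def truth_set(S: StateModel) -> FrozenSet[State]:
        result = S.states              # the empty conjunction is "always true"
        for phi in phis:
            result = result & phi(S)
        return result
    return truth_set


def box(agent: str, phi: Prop) -> Prop:
    def truth_set(S: StateModel) -> FrozenSet[State]:
        good = phi(S)
        return frozenset(s for s in S.states
                         if all(t in good for (u, t) in S.arrows[agent] if u == s))
    return truth_set


def common(group: Iterable[str], phi: Prop) -> Prop:
    agents = tuple(group)

    def truth_set(S: StateModel) -> FrozenSet[State]:
        good = phi(S)

        def reachable(s: State) -> set:
            seen, frontier = {s}, [s]  # n = 0 is allowed, so s reaches itself
            while frontier:
                u = frontier.pop()
                for a in agents:
                    for (v, t) in S.arrows[a]:
                        if v == u and t not in seen:
                            seen.add(t)
                            frontier.append(t)
            return seen

        return frozenset(s for s in S.states if reachable(s) <= good)
    return truth_set


# Scenario 1: x is "heads", y is "tails"; neither agent can tell them apart.
full = frozenset((u, v) for u in "xy" for v in "xy")
S1 = StateModel(frozenset("xy"), {"A": full, "B": full},
                {"H": frozenset("x"), "T": frozenset("y")})

nobody_knows = conj(*(neg(box(a, atom(p))) for a in "AB" for p in "HT"))
print(sorted(box("A", atom("H"))(S1)))         # []
print(sorted(common("AB", nobody_knows)(S1)))  # ['x', 'y']
```

The same proposition object can be evaluated in any model, which is the point of treating propositions as operations on StateModels rather than as sets of states of one fixed model.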

Up until now, the only syntactic objects have been the atomic sentences $p \in \mathrm{AtSen}$. But we can build the collections of finitary and infinitary sentences by the same definitions that we have seen, and then the work of the past section is the semantics of our logical systems. For example, we have sentences $p \wedge q$, $\Box_A \neg p$, and $\Box^*_{\mathcal{B}} q$. These then have corresponding epistemic propositions as their semantics: $\mathbf{p} \wedge \mathbf{q}$, $\Box_A \neg\mathbf{p}$, and $\Box^*_{\mathcal{B}} \mathbf{q}$, respectively. Note that the latter is a properly infinitary proposition (and so $\Box^*_{\mathcal{B}} q$ is a properly infinitary sentence); it abbreviates an infinite conjunction.

The rest of this section studies examples of the semantics and it also makes the connection of the formal system with the informal notions of knowledge, belief and common knowledge. We shall study Scenario 3 of Section 1.1, where A opens the box herself to see heads in a semi-public way: B sees A open the box but not the result, A is aware of this, etc. We want to study the model after the opening. We represented this as

We first must represent this picture as a bona fide state model $\mathbf{S}_3$ in our sense.² The picture includes no explicit states, but we must fix some states to have a representation. We choose distinct objects $s$ and $t$. Then we take as our state model $\mathbf{S}_3$, as defined by

$$\begin{aligned}
S_3 &= \{s, t\} \\
\xrightarrow{A} &= \{(s,s), (t,t)\} \\
\xrightarrow{B} &= \{(s,s), (s,t), (t,s), (t,t)\} \\
\|H\| &= \{s\} \\
\|T\| &= \{t\}
\end{aligned}$$

In Figure 1, we list some sentences of English along with their translations into standard epistemic logic. We also have calculated the semantics of the formal sentences in the model $\mathbf{S}_3$. It should be stressed that the semantics is exactly the one defined in the previous section. For example,

$$\begin{aligned}
(\Box_A \mathbf{T})_{\mathbf{S}_3} &= \{ u \in S_3 : \text{if } u \xrightarrow{A} v, \text{ then } v \in \mathbf{T}_{\mathbf{S}_3} \} \\
&= \{ u \in S_3 : \text{if } u \xrightarrow{A} v, \text{ then } v = t \} \\
&= \{t\}
\end{aligned}$$

We also emphasize two other points. First, the translation of English to the formal system is based on the rendering of “A knows” (or “A has a justified belief that”) as $\Box_A$, and of “it is common knowledge that” by $\Box^*_{A,B}$. Second, the chart bears a relation to intuitions that one naturally has about the states $s$ and $t$. Recall that $s$ is the state that obtains after A looks and sees that the coin is lying heads up. The state $t$, on the other hand, is a state that would have been the case had A seen tails when she looked.

English | Formal rendering | Semantics
the coin shows heads | $H$ | $\{s\}$
A knows the coin shows heads | $\Box_A H$ | $\{s\}$
A knows the coin shows tails | $\Box_A T$ | $\{t\}$
B knows that the coin shows heads | $\Box_B H$ | $\emptyset$
A knows that B doesn’t know it’s heads | $\Box_A \neg\Box_B H$ | $\{s, t\}$
B knows that A knows that B doesn’t know it’s heads | $\Box_B \Box_A \neg\Box_B H$ | $\{s, t\}$
it is common knowledge that either A knows it’s heads or A knows that it’s tails | $\Box^*_{A,B}(\Box_A H \vee \Box_A T)$ | $\{s, t\}$
it is common knowledge that B doesn’t know the state of the coin | $\Box^*_{A,B} \neg(\Box_B H \vee \Box_B T)$ | $\{s, t\}$

Figure 1. Examples of translations and semantics.

2.3. Updates

A transition relation between state models $\mathbf{S}$ and $\mathbf{T}$ is a relation between the sets $S$ and $T$. We write $r : \mathbf{S} \to \mathbf{T}$ for this. An update $\mathbf{r}$ is a pair of operations

$$\mathbf{r} = (\mathbf{S} \mapsto \mathbf{S}(\mathbf{r}),\ \mathbf{S} \mapsto \mathbf{r}_{\mathbf{S}}),$$

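where for each $\mathbf{S} \in \mathrm{StateModels}$, $\mathbf{r}_{\mathbf{S}} : \mathbf{S} \to \mathbf{S}(\mathbf{r})$ is a transition relation. We call $\mathbf{S} \mapsto \mathbf{S}(\mathbf{r})$ the update map, and $\mathbf{S} \mapsto \mathbf{r}_{\mathbf{S}}$ the update relation.

EXAMPLE 2.1. Let $\varphi$ be an epistemic proposition. We get an update $\mathrm{Pub}\,\varphi$ which represents the public announcement of $\varphi$. For each $\mathbf{S}$, $\mathbf{S}(\mathrm{Pub}\,\varphi)$ is the sub-state-model of $\mathbf{S}$ determined by the states in $\varphi_{\mathbf{S}}$. In this submodel, information about atomic sentences and accessibility relations is simply inherited from the larger model. The update relation $(\mathrm{Pub}\,\varphi)_{\mathbf{S}}$ is the inverse of the inclusion relation of $\mathbf{S}(\mathrm{Pub}\,\varphi)$ into $\mathbf{S}$.

EXAMPLE 2.2. We also get a different update $?\varphi$ which represents testing whether $\varphi$ is true. Here we take $\mathbf{S}(?\varphi)$ to be the model whose state set is

$$(\{0\} \times \varphi_{\mathbf{S}}) \cup (\{1\} \times S).$$

The arrows are defined by $(i, s) \xrightarrow{A} (j, t)$ iff $s \xrightarrow{A} t$ in $\mathbf{S}$ and $j = 1$. (Note that states of the form $(0, s)$ are never the targets of arrows in the new model.) Finally, we set

$$\|p\|_{\mathbf{S}(?\varphi)} = \{ (i, s) \in \mathbf{S}(?\varphi) : s \in \|p\|_{\mathbf{S}} \}.$$

The relation $(?\varphi)_{\mathbf{S}}$ is the set of pairs $(s, (0, s))$ such that $s \in \varphi_{\mathbf{S}}$.

We shall study these two examples (and many others) in the sequel, and in particular we shall justify the names “public announcement” and “test”. For now, we continue our general discussion by noting that the collection of updates is closed in various ways.

1. Skip: there is an update $\mathbf{1}$ with $\mathbf{S}(\mathbf{1}) = \mathbf{S}$, and $\mathbf{1}_{\mathbf{S}}$ is the identity relation on $S$.
2. Sequential Composition: if $\mathbf{r}$ and $\mathbf{s}$ are epistemic updates, then their composition $\mathbf{r}; \mathbf{s}$ is again an epistemic update, where $\mathbf{S}(\mathbf{r}; \mathbf{s}) = \mathbf{S}(\mathbf{r})(\mathbf{s})$, and $(\mathbf{r}; \mathbf{s})_{\mathbf{S}} = \mathbf{r}_{\mathbf{S}} ; \mathbf{s}_{\mathbf{S}(\mathbf{r})}$. Here, we use on the right side the usual composition $;$ of relations.³
3. Disjoint Union (or Non-deterministic choice): If $X$ is any set of epistemic updates, then the disjoint union $\bigsqcup X$ is an epistemic update, defined as follows. The set of states of the model $\mathbf{S}(\bigsqcup X)$ is the disjoint union of all the sets of states in each model $\mathbf{S}(\mathbf{r})$:
$$\{ (s, \mathbf{r}) : \mathbf{r} \in X \text{ and } s \in \mathbf{S}(\mathbf{r}) \}.$$
Similarly, each accessibility relation $\xrightarrow{A}$ is defined as the disjoint union of the corresponding accessibility relations in each model:
$$(t, \mathbf{r}) \xrightarrow{A} (u, \mathbf{s}) \text{ iff } \mathbf{r} = \mathbf{s} \text{ and } t \xrightarrow{A} u \text{ in } \mathbf{S}(\mathbf{r}).$$
The valuation $\|p\|$ in $\mathbf{S}(\bigsqcup X)$ is the disjoint union of the valuations in each state model:
$$\|p\| = \{ (s, \mathbf{r}) : \mathbf{r} \in X \text{ and } s \in \|p\|_{\mathbf{S}(\mathbf{r})} \}.$$
Finally, the update relation $(\bigsqcup X)_{\mathbf{S}}$ between $\mathbf{S}$ and $\mathbf{S}(\bigsqcup X)$ is the union of all the update relations $\mathbf{r}_{\mathbf{S}}$:
$$t \,(\textstyle\bigsqcup X)_{\mathbf{S}}\, (u, \mathbf{s}) \text{ iff } t \,\mathbf{s}_{\mathbf{S}}\, u.$$
4. Special case: Binary Union. The (disjoint) union of two epistemic updates $\mathbf{r}$ and $\mathbf{s}$ is an update $\mathbf{r} \sqcup \mathbf{s}$, given by $\mathbf{r} \sqcup \mathbf{s} = \bigsqcup \{\mathbf{r}, \mathbf{s}\}$.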

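5. Another special case: Kleene star (iteration). We have the operation of Kleene star on updates:
$$\mathbf{r}^* = \bigsqcup \{ \mathbf{1}, \mathbf{r}, \mathbf{r}; \mathbf{r}, \ldots, \mathbf{r}^n, \ldots \}$$
where $\mathbf{r}^n$ is recursively defined by $\mathbf{r}^0 = \mathbf{1}$, $\mathbf{r}^{n+1} = \mathbf{r}^n ; \mathbf{r}$.
6. Crash: We can also take $X = \emptyset$ in part 3. This gives an update $\mathbf{0}$ such that $\mathbf{S}(\mathbf{0})$ is the empty model for each $\mathbf{S}$, and $\mathbf{0}_{\mathbf{S}}$ is the empty relation.

The operations $\mathbf{r}; \mathbf{s}$, $\mathbf{r} \sqcup \mathbf{s}$ and $\mathbf{r}^*$ are the natural analogues of the operations of relational composition, union of relations and iteration of a relation, and also of the regular operations on programs in PDL. The intended meanings are: for $\mathbf{r}; \mathbf{s}$, sequential composition (do $\mathbf{r}$, then do $\mathbf{s}$); for $\mathbf{r} \sqcup \mathbf{s}$, non-deterministic choice (do either $\mathbf{r}$ or $\mathbf{s}$); for $\mathbf{r}^*$, iteration (repeat $\mathbf{r}$ some finite number of times).

A New Operation: Dynamic Modalities for Updates. If $\varphi$ is an epistemic proposition and $\mathbf{r}$ an update, then $[\mathbf{r}]\varphi$ is an epistemic proposition defined by

$$([\mathbf{r}]\varphi)_{\mathbf{S}} = \{ s \in S : \text{if } s \,\mathbf{r}_{\mathbf{S}}\, t, \text{ then } t \in \varphi_{\mathbf{S}(\mathbf{r})} \}. \tag{4}$$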
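The update map and update relation of Section 2.3 can be sketched as a function returning a pair. The code below is an illustration under assumptions made here: finite models, a `Model` tuple, and the hypothetical names `skip`, `qtest` (for the test update of Example 2.2) and `seq` (for sequential composition). The example model is the two-state model of Scenario 1.

```python
from typing import NamedTuple


class Model(NamedTuple):
    states: frozenset
    arrows: dict   # agent -> frozenset of (source, target) pairs
    val: dict      # atomic sentence -> frozenset of states


def skip(S):
    """The update 1: same model, identity relation."""
    return S, frozenset((s, s) for s in S.states)


def qtest(phi):
    """The update ?phi of Example 2.2, for phi a function Model -> frozenset."""
    def update(S):
        yes = phi(S)
        states = frozenset((0, s) for s in yes) | frozenset((1, s) for s in S.states)
        # (i, s) -> (j, t) iff s -> t in S and j = 1
        arrows = {a: frozenset(((i, s), (1, t))
                               for (i, s) in states for (u, t) in rel if u == s)
                  for a, rel in S.arrows.items()}
        val = {p: frozenset((i, s) for (i, s) in states if s in truth)
               for p, truth in S.val.items()}
        return Model(states, arrows, val), frozenset((s, (0, s)) for s in yes)
    return update


def seq(r, s):
    """Sequential composition r; s: apply r, then apply s to the result."""
    def update(S):
        T, rel_r = r(S)
        U, rel_s = s(T)
        return U, frozenset((a, c) for (a, b) in rel_r
                            for (b2, c) in rel_s if b == b2)
    return update


full = frozenset((u, v) for u in "xy" for v in "xy")
S1 = Model(frozenset("xy"), {"A": full, "B": full},
           {"H": frozenset("x"), "T": frozenset("y")})
heads = lambda S: S.val["H"]

T, rel = qtest(heads)(S1)
print(sorted(T.states))  # [(0, 'x'), (1, 'x'), (1, 'y')]
print(sorted(rel))       # [('x', (0, 'x'))]
```

Composing with `skip` on the left leaves an update unchanged, which gives a quick sanity check of `seq`.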

We should compare (4) and (2). The point is that we may treat updates in a similar manner to other box-like modalities; the structure given by an update allows us to do this. This point leads to the formal languages which we shall construct in due course. But we can illustrate the idea even now. Suppose we want to interpret the sentence $[\mathrm{Pub}\, H]\Box_A H$ in our very first model, the model $\mathbf{S}_1$ pictured again below:

We shall call the two states $s$ (where $H$ holds) and $t$. Again, we want to determine $[\![ [\mathrm{Pub}\, H]\Box_A H ]\!]_{\mathbf{S}_1}$. We already have in Example 2.1 a general definition of $\mathrm{Pub}\, H$ as an update, so we can calculate $\mathbf{S}_1(\mathrm{Pub}\, H)$ and also $(\mathrm{Pub}\, H)_{\mathbf{S}_1}$. We indicate these in the picture

The one-state model on the right is $\mathbf{S}_1(\mathrm{Pub}\, H)$, and the dotted arrow shows $(\mathrm{Pub}\, H)_{\mathbf{S}_1}$. So we calculate:

$$[\![ [\mathrm{Pub}\, H]\Box_A H ]\!]_{\mathbf{S}_1} = \{ s \in S_1 : \text{whenever } s \,(\mathrm{Pub}\, H)_{\mathbf{S}_1}\, t, \text{ then also } t \in [\![ \Box_A H ]\!]_{\mathbf{S}_1(\mathrm{Pub}\, H)} \}.$$

In $\mathbf{S}_1(\mathrm{Pub}\, H)$, the state satisfies $\Box_A H$. Thus $[\![ [\mathrm{Pub}\, H]\Box_A H ]\!]_{\mathbf{S}_1} = \{s, t\}$. It might be more interesting to consider $\langle \mathrm{Pub}\, H \rangle \Box_A H$; this is $\neg[\mathrm{Pub}\, H]\neg\Box_A H$. Similar calculations show that

$$[\![ \langle \mathrm{Pub}\, H \rangle \Box_A H ]\!]_{\mathbf{S}_1} = \{ s \in S_1 : \text{for some } t,\ s \,(\mathrm{Pub}\, H)_{\mathbf{S}_1}\, t \text{ and } t \in [\![ \Box_A H ]\!]_{\mathbf{S}_1(\mathrm{Pub}\, H)} \}.$$

The point here is that we have a general semantics for sentences like $[\mathrm{Pub}\, H]\Box_A H$, and this semantics crucially uses Equation (4). That is, to determine the truth set of a sentence like $[\mathrm{Pub}\, H]\Box_A H$ in a particular model $\mathbf{S}$, one applies the update map to $\mathbf{S}$ and works with the update relation between $\mathbf{S}$ and $\mathbf{S}(\mathrm{Pub}\, H)$; one also uses the semantics of $\Box_A H$ in the new model. This overall point is one of the two leading features of our approach; the other is the update product which we introduce in Section 3.

2.4. The Target Logical Systems

This paper presents a number of logical systems which contain epistemic operators of various types. These operators are closely related to aspects of the scenarios of Section 1.1. The logics themselves are presented formally in Section 4, but this work takes a fair amount of preparation. We delay this until after we motivate the subject, and so we turn to an informal presentation of the syntax and semantics of some logics. The overall idea is that the operators we study correspond roughly to the shapes of the action models which we shall see in Section 3.5.

THE LOGIC OF PUBLIC ANNOUNCEMENTS. We take a two-place sentential operator $[\mathrm{Pub}\, -]-$. That is, we want an operation taking sentences $\varphi$ and $\psi$ to a new sentence $[\mathrm{Pub}\, \varphi]\psi$, and we want a logic closed under this operation. The intended interpretation of $[\mathrm{Pub}\, \varphi]\psi$ is: assuming that $\varphi$ holds, then announcing it results in a state where $\psi$ holds. The announcement here should be completely public. The semantics of every sentence $\chi$ will be an epistemic proposition $[\![\chi]\!]$ in the sense of Section 2.1. Note that we refer to this operator as a two-place one. This just means that it takes two sentences and returns another sentence.
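The intended reading of $[\mathrm{Pub}\, \varphi]\psi$ can be checked mechanically on the two-state model $\mathbf{S}_1$ used in the calculation of Section 2.3. The following is a minimal sketch, assuming finite models; the names `Model`, `pub`, `box_update` and `dia_update` are chosen here for illustration and are not the paper's. It evaluates the box form via Equation (4) and the diamond form as its dual.

```python
from typing import NamedTuple


class Model(NamedTuple):
    states: frozenset
    arrows: dict   # agent -> frozenset of (source, target) pairs
    val: dict      # atomic sentence -> frozenset of states


def atom(p):
    return lambda S: S.val[p]


def neg(phi):
    return lambda S: S.states - phi(S)


def box(agent, phi):
    return lambda S: frozenset(
        s for s in S.states
        if all(t in phi(S) for (u, t) in S.arrows[agent] if u == s))


def pub(phi):
    """Public announcement (Example 2.1): keep only the phi-states."""
    def update(S):
        keep = phi(S)
        sub = Model(keep,
                    {a: frozenset((u, t) for (u, t) in rel
                                  if u in keep and t in keep)
                     for a, rel in S.arrows.items()},
                    {p: truth & keep for p, truth in S.val.items()})
        # update relation: inverse of the inclusion of the submodel into S
        return sub, frozenset((s, s) for s in keep)
    return update


def box_update(r, phi):
    """[r]phi, following Equation (4)."""
    def truth_set(S):
        T, rel = r(S)
        good = phi(T)
        return frozenset(s for s in S.states
                         if all(t in good for (u, t) in rel if u == s))
    return truth_set


def dia_update(r, phi):
    """<r>phi, defined as not [r] not phi."""
    return neg(box_update(r, neg(phi)))


# S1 with states s (heads) and t (tails); both agents are fully uncertain.
full = frozenset((u, v) for u in "st" for v in "st")
S1 = Model(frozenset("st"), {"A": full, "B": full},
           {"H": frozenset("s"), "T": frozenset("t")})

a_knows_heads = box("A", atom("H"))
print(sorted(box_update(pub(atom("H")), a_knows_heads)(S1)))  # ['s', 't']
print(sorted(dia_update(pub(atom("H")), a_knows_heads)(S1)))  # ['s']
```

The box form holds vacuously at the tails state, since H cannot be truthfully announced there, while the diamond form holds only at the heads state.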
We also think of [Pub ϕ] as a modal operator in its own right, on a par with the knowledge operators 2A . And so we shall think of Pub as something which takes sentences into one-place modal operators. We also consider a dual Pub ϕ to [Pub ϕ]. As one expects, the semantics will arrange that Pub ϕψ and ¬[Pub ϕ]¬ψ be logically equivalent. (That is, they will hold at the same states.) Thus, the intended [ 18 ]

ALEXANDRU BALTAG AND LAWRENCE S. MOSS


interpretation of ⟨Pub ϕ⟩ψ is: ϕ holds, and announcing it results in a state where ψ holds. As one indication of the difference, in S1, y |= [Pub H]true (vacuously, as it were: it is not possible to make a true announcement of H in y). But we do have y |= ¬⟨Pub H⟩true; this is how our semantics works out. Once again we only consider true announcements. Our semantics will arrange that ϕ → (⟨Pub ϕ⟩ψ ↔ [Pub ϕ]ψ). So in working with the example scenarios, the difference between the two modalities will be small.

Further, we iterate the announcement operation, obtaining sentences such as [Pub p][Pub q]r. We also want to consider announcements about announcements, as in ⟨Pub ⟨Pub ϕ⟩ψ⟩χ. This says that it is possible to announce publicly ⟨Pub ϕ⟩ψ, and as a result of a true announcement of this sentence, χ will hold.

THE LOGIC OF COMPLETELY PRIVATE ANNOUNCEMENTS TO GROUPS. This time, the syntax is more complicated. If ϕ and ψ are sentences and B is a set of agents, then [Pri^B ϕ]ψ is again a sentence. The intended interpretation of this is: assuming that ϕ holds, then announcing it publicly to the subgroup B in such a way that outsiders do not even suspect that the announcement happened results in a state where ψ holds. (The "Pri" in the notation stands for "private.") For example, this kind of announcement occurs in the passage from Scenario 1 to Scenario 4; that is, "cheating" is a kind of private announcement to the "group" {A}. We want it to be the case that in S1, x |= ⟨Pri^{A} H⟩(□_A H ∧ ¬□_B □_A H). That is, in x, it is possible to announce H to A privately (since H is true in x), and by so doing we have a new state where A knows this fact, but B does not know that A knows it. The logic of private announcements to groups allows as modal operators [Pri^B ϕ] for all sentences ϕ of the logic.

We shall show that this logical system extends the logic of public announcements, the idea being that when we take B to be the full set A of agents, Pub ϕ and Pri^A ϕ should be equivalent in the appropriate sense.

THE LOGIC OF COMMON KNOWLEDGE OF ALTERNATIVES. If ϕ1, …, ϕn and ψ are sentences and B is a set of agents, then [Cka^B ϕ1, …, ϕn]ψ is again a sentence. The intended interpretation of this is: assuming that ϕ1 holds, then announcing it publicly to the subgroup B in such a way that it is common knowledge to the set of all agents that the announcement was one of ϕ1, …, ϕn


Syntactic: sentence ϕ; language L, L(Σ), etc.; action signature Σ; basic action expression σψ; program π, action α.

Semantic: epistemic proposition ϕ; state model S; update r; epistemic program model (Σ, Γ, ψ1, …, ψn); canonical action model.

Figure 2. The main notions in this paper.

results in a state where ψ holds. For example, consider Scenario 3. In S1, we have x |= ¬□_A H ∧ ⟨Cka^{A} H, T⟩(□_A(H ∧ ¬□_B □_A H) ∧ □_B(□_A H ∨ □_A T)). The action here is A learning that either the coin lies heads up or that it lies tails up, and this is done in such a way that B knows that A learns one of the two alternatives but not which one. Before the action, A does not know that the coin lies heads up. As a result, A knows this, and knows that B does not know it.

At this point, we have the syntax of some logical languages and also examples of what we intend the semantics to be. We do not yet have the formal semantics of any of these languages, of course. However, even before we turn to this, we want to be clear that our goal in this paper is to study a very wide class of logical systems, including ones for the representation of all possible "announcement types." The reasons to be interested in this approach rather than to study a few separate logics are as follows: (1) it is more general and elegant to have a unified presentation; and (2) it gives a formal account of the notion of an "announcement type" which would be otherwise lacking and which should be of independent interest. So what we will really be concerned with is:

THE LOGIC OF ALL POSSIBLE EPISTEMIC ACTIONS. If α is an epistemic action and ϕ is a sentence, then [α]ϕ is again a sentence. Here α is some sort of epistemic action (which of course we shall define in due course); the point is that α might involve arbitrarily complex patterns of suspicion. In this way, we shall recover the four logical systems mentioned above as fragments of the larger logic of all possible epistemic actions.

Our Claims in This Paper. Here are some of the claims of the paper about the logical languages we shall construct and about our notion of updates.


Figure 3. Languages in this paper.

1. Each type of epistemic action, such as public announcements, completely private announcements, announcements with common knowledge of suspicion, etc., corresponds to a natural collection of updates as we have defined them above.
2. Each type also gives rise to a logical language with that type of action as a primitive construct. Moreover, it is possible to precisely define the syntax of the language to insure closure under the construct. That is, we should be able to formulate a language with announcements not just about atomic facts, but about announcements themselves, announcements about announcements, etc.

2.5. A Guide to the Concepts in This Paper

In this section, we catalog the main notions that we shall introduce in due course. After this, we turn to a discussion of the particular languages that we study, and we situate them relative to existing logical languages. Recall that we insist on a distinction of syntactic and semantic objects in this paper. We list in Figure 2 the main notions that we will need. We do this mostly to help readers as they explore the different notions. We mention now that the various notions are not developed in the order listed; indeed, we have tried hard to organize this paper in a way that will be easiest to read and understand. For example, one of our main goals is to present a set of languages (syntactic objects) and their semantics (utilizing semantic objects).

Languages. This paper studies a number of languages, and to help the reader we list these in Figure 3. Actually, what we study are not individual languages, but rather families of languages parameterized by different choices of primitives. It is standard in modal logic to begin with a set of


atomic propositions, and we do the same. The difference is that we shall call these atomic sentences in order to make a distinction between these essentially syntactic objects and the semantic propositions that we study beginning in Section 2.3. This is our first parameter, a set AtSen of atomic sentences. The second is a set A of agents. Given these, L0 is ordinary modal logic with the elements of AtSen as atomic sentences and with agent-knowledge (or belief) modalities □_A for A ∈ A. We add common-knowledge operators □*_B, for sets of agents B ⊆ A, to get a larger language L1. In Figure 3, we note the fact that L0 is literally a subset of L1 by using the inclusion arrow. The syntax and semantics of L0 and L1 are presented in Figure 4.

Another close neighbor of the system in this paper is Propositional Dynamic Logic (PDL). PDL was first formulated by Fischer and Ladner (1979), following the introduction of dynamic logic in Pratt (1976). The syntax and the main clauses in the semantics of PDL are shown in Figure 5.

We may also take L0 and close under countable conjunctions (and hence also disjunctions). We call this language L^ω_0. Note that L1 is not literally a subset of L^ω_0, but there is a translation of L1 into L^ω_0 that preserves the semantics. We would indicate this in a chart with a dashed arrow. Going further, we may close under arbitrary (set-sized) boolean operations; this language is then called L^∞_0.

PDL is propositional dynamic logic, formulated with atomic programs a replacing the agents A that we have fixed. We might note that we can translate L1 into PDL. The main clauses in the translation are:

\[
(\Box_A \varphi)^t = [a]\,\varphi^t, \qquad
(\Box^*_B \varphi)^t = \Bigl[\bigl(\textstyle\bigcup_{b \in B} b\bigr)^{*}\Bigr]\varphi^t .
\]
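The two translation clauses can be illustrated with a short Python sketch. The tuple encoding of sentences, the function name `to_pdl`, and the ASCII rendering of the connectives are assumptions made only for this illustration; they are not notation from the paper. Each agent A is rendered as the lower-case atomic program a, and the union of programs is written with `U`.

```python
def to_pdl(phi):
    """Translate an L1 sentence (nested tuples) into a PDL sentence (string).

    Hypothetical encoding, chosen for this sketch only:
      ("atom", "p"), ("not", f), ("and", f, g),
      ("box", "A", f)          knowledge operator for agent A,
      ("cbox", ["A", "B"], f)  common-knowledge operator for a group.
    """
    kind = phi[0]
    if kind == "atom":
        return phi[1]
    if kind == "not":
        return "~" + to_pdl(phi[1])
    if kind == "and":
        return "(" + to_pdl(phi[1]) + " & " + to_pdl(phi[2]) + ")"
    if kind == "box":
        # Clause 1: the box for agent A becomes the program modality [a].
        return "[" + phi[1].lower() + "]" + to_pdl(phi[2])
    if kind == "cbox":
        # Clause 2: common knowledge in B becomes iteration of the union
        # of the atomic programs b for b in B.
        union = " U ".join(sorted(agent.lower() for agent in phi[1]))
        return "[(" + union + ")*]" + to_pdl(phi[2])
    raise ValueError("unknown connective: " + str(kind))
```

For instance, under this encoding the sentence "it is common knowledge among A and B that not H" is `("cbox", ["A", "B"], ("not", ("atom", "H")))`, and the sketch renders its translation as a single iterated-union modality applied to the translated body.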

Beginning in Section 4.2, we study some new languages. These will be based on a third parameter, an action signature Σ. For each "type of epistemic action" there will be an action signature Σ. For each Σ, we'll have languages L0(Σ), L1(Σ), and L(Σ). More generally, for each family of action signatures S, we have languages L0(S), L1(S), and L(S). These will be related to the other languages as indicated in the figure. So we shall extend modal logic by adding one or another type of epistemic action. The idea is that we can then generate a logic from an action signature corresponding to an intuitive action. For example, corresponding to the notion of a public announcement is a particular action signature Σ_pub, and then the languages L0(Σ_pub), L1(Σ_pub), and L(Σ_pub) will have something to do with our notion of a "logic of public announcements."


Figure 4. The languages L0 and L1 . For L0 , we drop the 2∗B construct.

Figure 5. Propositional Dynamic Logic (PDL).

In Baltag et al. (2003), we compare the expressive power of the languages mentioned so far. It turns out that all of the arrows in Figure 3 correspond to proper increases in expressive power. (There is one exception: L0 turns out to equal L0(S) in expressive power for all S.) It is even more interesting to compare expressive power as we change the action signature. For example, we would like to compare logics of public announcement to logics of private announcement to groups. Most of the natural questions in this area are open as of this writing.

2.6. Reformulation of Test-only PDL

In PDL, there are two types of syntactic objects, sentences and programs. The programs in PDL are interpreted on a structure by relations on that structure. This is not the way our semantics works, and to make this point we compare the standard semantics of (a fragment of) PDL with a language closer to what we shall ultimately study. Specifically, we consider the test-only fragment of PDL in our terms. This is the fragment built over the empty set of atomic programs. So the programs are skip, the tests ?ϕ, and compositions, choices, or iterations of these; sentences are formed as in PDL. We give a reformulation in Figure 6. The point is that in PDL the programs are interpreted by relations on a given model, and in our terms programs are interpreted by updates. We have discussed updates of the form ?ϕ in Example 2.2. Given that we have


Figure 6. Test-only PDL, with a semantics in our style.

an interpretation of the sentence ϕ as an epistemic proposition [[ϕ]], we then automatically have an update ?[[ϕ]]. For the sentences, the main thing to look at is the semantics of sentences [π]ϕ; here we use the semantic notions from Section 2.3. The way the semantics works is that we have [[π]] and [[ϕ]]; the former is an update and the latter is an epistemic proposition. Then we use both of these to get an overall semantics, using Equation (4). In more explicit terms,

\[
[\![\,[\pi]\varphi\,]\!]_{\mathbf{S}}
\;=\; \bigl([\,[\![\pi]\!]\,]\,[\![\varphi]\!]\bigr)_{\mathbf{S}}
\;=\; \{\, s \in S : \text{if } s\,[\![\pi]\!]_{\mathbf{S}}\,t, \text{ then } t \in [\![\varphi]\!]_{\mathbf{S}([\![\pi]\!])} \,\}.
\]
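The clause above can be sketched in Python as follows. This is only an illustration under stated assumptions: state models are encoded as dictionaries, propositions as functions from a model to a truth set, and updates as functions from a model to a pair (new model, update relation). The update `pub` is a submodel-style public announcement; its definition lies outside this section, so the version here is an assumption, as are all function names.

```python
def atom(p):
    """Atomic proposition: the set of states where p holds."""
    return lambda model: set(model["val"][p])

def box_agent(agent, prop):
    """Knowledge operator for `agent`, as a map from models to truth sets."""
    def truth(model):
        target = prop(model)
        return {s for s in model["states"]
                if all(t in target for (u, t) in model["arrows"][agent] if u == s)}
    return truth

def pub(prop):
    """Assumed public-announcement update: restrict the model to prop."""
    def apply(model):
        keep = prop(model)
        new_model = {
            "states": set(keep),
            "arrows": {a: {(s, t) for (s, t) in rel if s in keep and t in keep}
                       for a, rel in model["arrows"].items()},
            "val": {p: set(ext) & keep for p, ext in model["val"].items()},
        }
        relation = {(s, s) for s in keep}  # each surviving state maps to itself
        return new_model, relation
    return apply

def box_update(update, prop):
    """[r]phi: every update-successor satisfies phi in the updated model."""
    def truth(model):
        new_model, rel = update(model)
        target = prop(new_model)  # phi is evaluated in S(r), not in S
        return {s for s in model["states"]
                if all(t in target for (u, t) in rel if u == s)}
    return truth

def diamond_update(update, prop):
    """<r>phi: some update-successor satisfies phi in the updated model."""
    def truth(model):
        new_model, rel = update(model)
        target = prop(new_model)
        return {s for s in model["states"]
                if any(t in target for (u, t) in rel if u == s)}
    return truth
```

The design point mirrored here is that the sentence ϕ is evaluated in the updated model, while the resulting truth set lives in the original model. On a two-state coin model in which both agents consider both states possible (an assumed encoding of S1), the box version of "announce H, then A knows H" holds at both states, vacuously at the tails state, while the diamond version holds only at the heads state.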

2.7. Background: Bisimulation

We shall see the concept of bisimulation at various points in the paper, and this section reviews the concept and also develops some of the appropriate definitions concerning bisimulation and updates.

DEFINITION. Let S and T be state models. A bisimulation between S and T is a relation R ⊆ S × T such that whenever s R t, the following three properties hold:
1. s ∈ ‖p‖_S iff t ∈ ‖p‖_T for all atomic sentences p. That is, s and t agree on all atomic information.
2. For all agents A and states s′ such that s →_A s′, there is some state t′ such that t →_A t′ and s′ R t′.
3. For all agents A and states t′ such that t →_A t′, there is some state s′ such that s →_A s′ and s′ R t′.

EXAMPLE 2.3. This example concerns the update operation ?ϕ of Example 2.2. Fix an epistemic proposition ϕ and a state model S. Recall that


(?ϕ)_S relates each s ∈ ϕ_S to the corresponding pair (0, s). We check that the following relation R is a bisimulation between S and S(?ϕ):

\[
R \;=\; (?\varphi)_{\mathbf{S}} \,\cup\, \{(s, (1, s)) : s \in S\}.
\]

The definition of S(?ϕ) insures that the interpretations of atomic sentences are preserved by this relation. Next, suppose that s →_A t in S and s R (i, s′). Then we must have s′ = s. Further, the definition of R tells us that t R (1, t), and (i, s) →_A (1, t). Finally, suppose that s R (i, s′) and (i, s′) →_A (j, t). Then s′ = s and j = 1. In addition, the definition of →_A in S(?ϕ) implies that s →_A t in S. And as above, t R (1, t). This concludes our verifications.

Recall that L0 is ordinary modal logic, formulated as always over our fixed sets of agents and atomic sentences. The next result concerns the language L^∞_0 of infinitary modal logic. In this language, one has conjunctions and disjunctions of arbitrary sets of sentences. We have the following well-known result:

PROPOSITION 2.4. If there is a bisimulation R such that s R t, then s and t agree on all sentences in L^∞_0: for all ϕ ∈ L^∞_0, s ∈ [[ϕ]]_S iff t ∈ [[ϕ]]_T.

A pointed state model is a pair (S, s) such that s ∈ S. The state s is called the designated state (or the "point") of our pointed model, and is meant to represent the actual state of the system. Two pointed models are said to be bisimilar if there exists a bisimulation relation between them which relates their designated states. So, denoting by ≡ the relation of bisimilarity, we have:

(S, s) ≡ (T, t) iff there is a bisimulation R between S and T such that s R t.

This relation ≡ is indeed an equivalence relation. When S and T are clear from the context, we write s ≡ t instead of (S, s) ≡ (T, t). We say that a proposition ϕ preserves bisimulations if whenever (S, s) ≡ (T, t), then s ∈ ϕ_S iff t ∈ ϕ_T. We also say that an update r preserves bisimulations if the following two conditions hold:
1. If s r_S s′ and (S, s) ≡ (T, t), then there is some t′ such that t r_T t′ and (S(r), s′) ≡ (T(r), t′).
2. If t r_T t′ and (S, s) ≡ (T, t), then there is some s′ such that s r_S s′ and (S(r), s′) ≡ (T(r), t′).
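For finite state models, bisimilarity of pointed models can be decided by computing the largest bisimulation as a greatest fixed point. The following Python sketch is one naive way to do this; the dictionary encoding of state models and the function names are assumptions made for illustration, and no claim of efficiency is intended.

```python
def successors(model, agent, state):
    """States reachable from `state` along the arrow of `agent`."""
    return {v for (u, v) in model["arrows"][agent] if u == state}

def greatest_bisimulation(M, N, agents, atoms):
    """Largest relation between M and N satisfying the three clauses."""
    # Clause 1: start from all pairs that agree on atomic sentences.
    rel = {(s, t) for s in M["states"] for t in N["states"]
           if all((s in M["val"][p]) == (t in N["val"][p]) for p in atoms)}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(rel):
            ok = all(
                # Clause 2 (forth): each successor of s is matched from t.
                all(any((s2, t2) in rel for t2 in successors(N, a, t))
                    for s2 in successors(M, a, s))
                # Clause 3 (back): each successor of t is matched from s.
                and all(any((s2, t2) in rel for s2 in successors(M, a, s))
                        for t2 in successors(N, a, t))
                for a in agents)
            if not ok:
                rel.discard((s, t))
                changed = True
    return rel

def bisimilar(M, s, N, t, agents, atoms):
    """Pointed bisimilarity: (M, s) and (N, t) are related by some bisimulation."""
    return (s, t) in greatest_bisimulation(M, N, agents, atoms)
```

Pairs are removed until the relation stabilizes, so the result contains every bisimulation between the two models. A one-state model where H holds is bisimilar in this sense to a two-state model in which H holds everywhere, but not to a heads state from which an agent can reach a tails state.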


PROPOSITION 2.5. Concerning bisimulation preservation:
1. The bisimulation preserving propositions include the atomic propositions p, and they are closed under all of the (infinitary) operations on propositions.
2. The bisimulation preserving updates are closed under composition and (infinitary) sums.
3. If ϕ and r preserve bisimulations, so does [r]ϕ.

Proof. We show the last part. Suppose that s ∈ ([r]ϕ)_S, and let (S, s) ≡ (T, t). To show that t ∈ ([r]ϕ)_T, let t r_T t′. Then by condition (2) above, there is some s′ such that s r_S s′ and (S(r), s′) ≡ (T(r), t′). Since s ∈ ([r]ϕ)_S, we have s′ ∈ ϕ_{S(r)}. And then t′ ∈ ϕ_{T(r)}, since ϕ too preserves bisimulation.

Endnotes. As far as we know, the first paper to study the interaction of communication and knowledge in a formal setting is Plaza's paper "Logics of Public Communications" (Plaza 1989). As the title suggests, the epistemic actions studied are public announcements. In essence, he formalized the logic of public announcements. (In addition to this, Plaza (1989) contains a number of results special to the logic of announcements which we have not generalized, and it also studies an extension of the logic with non-rigid constants.) The same formalization was made in Gerbrandy (1999a, b), and also Gerbrandy and Groeneveld (1997). These papers further formalize the logic of completely private announcements. (However, their semantics use non-wellfounded sets rather than arbitrary Kripke models. As pointed out in Moss (1999), this restriction was not necessary.) The logic of common knowledge of alternatives was formulated in Baltag et al. (1998) and also in van Ditmarsch's dissertation (2000). Our introduction of updates is new here, as are the observations on the test-only fragment of PDL. For connections of the ideas here with coalgebra, see Baltag (2003).

One very active arena for work on knowledge is distributed systems, and the main source of this work is the book Reasoning About Knowledge (Fagin et al. 1996). We depart from Fagin et al. (1996) by introducing operators whose semantics are updates as we have defined them, and by doing without temporal logic operators. In effect, our Kripke models are simpler, since they do not incorporate all of the runs of a system; the new operators can be viewed as a compensation for that.

REMARK. Our formulation of a program model uses a designated set of simple actions. There are other equivalent formulations. Another way would be to use a disjoint union of pointed program models; it would be


possible to further reformulate some of our definitions below and thereby to give a semantics for our ultimate languages that differs from what we obtain in Section 4.2 below. However, the two semantics would be equivalent. The reason we prefer to work with designated sets is that it permits us to draw diagrams with fewer states. The cost of the simpler representations is the slightly more complicated definition, but we feel this cost is worth paying.

3. THE UPDATE PRODUCT OPERATION

In this section, we present the centerpiece of the formulation of our logical systems by introducing action models, program models, and an update product operation. The leading idea is that epistemic actions, like state models, have the property that different agents think that different things are possible. To model the effect of an epistemic update on a state, we use a kind of product of epistemic alternatives.

3.1. Epistemic Action Models

Let Φ be the collection of all epistemic propositions. An epistemic action model is a triple Σ = (Σ, →_A, pre), where Σ is a set of simple actions, →_A is an A-indexed family of relations on Σ, and pre : Σ → Φ. An epistemic action model is similar to a state model. But we call the members of the set Σ "simple actions" (instead of states). We use different notation and terminology because of a technical difference and a bigger conceptual point. The technical difference is that pre : Σ → Φ (that is, the codomain is the collection of all epistemic propositions). The conceptual point is that we think of "simple" actions as being deterministic actions whose epistemic impact is uniform on states (in the sense explained in our Introduction). So we think of "simple" actions as particularly simple kinds of deterministic actions, whose appearance to agents is uniform: the agents' uncertainties concerning the current action are independent of their uncertainties concerning the current state. This allows us to abstract away the action uncertainties and represent them as a Kripke structure of actions, in effect forgetting the state uncertainties. As announced in the Introduction, this uniformity of appearance is restricted only to the action's domain of applicability, defined by its preconditions. Thus, for a simple action σ ∈ Σ, we interpret pre(σ) as giving the precondition of σ; this is what needs to hold at a state (in a state model) in order for action σ to be "accepted" in that state. So σ will be executable in s iff its precondition pre(σ) holds at s.


At this point we have mentioned the ways in which action models and state models differ. What they have in common is that they use accessibility relations to express each agent’s uncertainty concerning something. For state models, the uncertainty has to do with which state is the real one; for action models, it has to do with which action is taking place. Usually we drop the word “epistemic” and therefore refer to “action models”. EXAMPLE 3.1. Here is an action model:

Formally, Σ = {σ, τ}; σ →_A σ, σ →_B τ, τ →_A τ, τ →_B τ; pre(σ) = H, and pre(τ) = tr, where recall that tr is the "always true" proposition. As we shall see, this action model will be used in the modeling of a completely private announcement to A that the coin is lying heads up. Further examples may be found later in this section.

3.2. Program Models

To model non-deterministic actions and non-simple actions (whose appearances to agents are not uniform on states), we define epistemic program models. In effect, this means that we decompose complex actions ("programs") into "simple" ones: they correspond to sets of simple, deterministic actions from a given action model.

An epistemic program model is defined as a pair π = (Σ, Γ) consisting of an action model Σ and a set Γ ⊆ Σ of designated simple actions. Each of the simple actions γ ∈ Γ can be thought of as being a possible "deterministic resolution" of the non-deterministic action π. As announced above, the intuition about the map pre is that an action is executable in a given state only if all its preconditions hold at that state. We often spell out an epistemic program model as (Σ, →_A, pre, Γ) rather than ((Σ, →_A, pre), Γ). When drawing the diagrams, we use doubled circles to indicate the designated actions in the set Γ. Finally, we usually drop the word "epistemic" and just refer to these as program models.

EXAMPLE 3.2. Every action model Σ and every σ ∈ Σ gives a program model by taking {σ} as the set of designated simple actions. For instance, in connection with Example 3.1, we have


with pre(σ) = H, pre(τ) = tr, as before. Program models of this type are very common in our treatment. But we need the extra ability to have sets of designated simple actions to deal with more complicated actions, as our next examples show.

EXAMPLE 3.3. A Non-deterministic Action. Let us model the non-deterministic action of either making a completely private announcement to A that the coin is lying heads up, or not making any announcement. The action is completely private, so B doesn't suspect anything: he thinks no announcement is made. The program model is obtained by choosing Γ = {σ, τ} in the action model from Example 3.1. The picture is

with pre(σ) = H, pre(τ) = tr, as before. Alternatively, one could take the disjoint union of the program model of Example 3.2 with the one-action program model with precondition tr.

EXAMPLE 3.4. A Deterministic, but Non-simple Action. Let us model the action of (completely) privately announcing to A whether the coin is lying heads up or not. Observe that this is a deterministic action; that is, the update relation is functional: at any state, the coin is either heads up or not. But the action is not simple: its appearance to A depends on the state. In states in which the coin is heads up, this action looks to A like a private announcement that H is the case; in the states in which the coin is not heads up, the action looks to A like a private announcement that ¬H. (However, the appearance to B is uniform: at any state, the action appears to him as if no announcement is made.) The only way to model this deterministic, but non-simple, action in our setting is as a non-deterministic program model, having as its "designated" actions two mutually exclusive simple actions: one corresponding to a (truthful) private announcement that H, and another one corresponding to a (truthful) private announcement that ¬H.

with pre(σ) = H, pre(τ) = tr, and pre(ρ) = ¬H.


3.3. The Update Product of a State Model with an Epistemic Action Model

The following operation plays a central role in this paper. Given a state model S = (S, →_A, ‖·‖_S) and an action model Σ = (Σ, →_A, pre), we define their update product to be the state model

\[
\mathbf{S} \otimes \mathbf{\Sigma} \;=\; (S \otimes \Sigma,\ \rightarrow_A,\ \|\cdot\|_{\mathbf{S} \otimes \mathbf{\Sigma}}),
\]

given by the following: the new states are pairs of old states s and simple actions σ which are "consistent", in the sense that all preconditions of the action σ "hold" at the state s:

\[
S \otimes \Sigma \;=\; \{(s, \sigma) \in S \times \Sigma : s \in pre(\sigma)_{\mathbf{S}}\}. \tag{5}
\]

The new accessibility relations are taken to be the "products" of the corresponding accessibility relations in the two frames; i.e., for (s, σ), (s′, σ′) ∈ S ⊗ Σ we put

\[
(s, \sigma) \rightarrow_A (s', \sigma') \quad\text{iff}\quad s \rightarrow_A s' \ \text{and}\ \sigma \rightarrow_A \sigma', \tag{6}
\]

and the new valuation map ‖·‖_{S⊗Σ} : AtSen → P(S ⊗ Σ) is essentially given by the old valuation:

\[
\|p\|_{\mathbf{S} \otimes \mathbf{\Sigma}} \;=\; \{(s, \sigma) \in S \otimes \Sigma : s \in \|p\|_{\mathbf{S}}\}. \tag{7}
\]

Intended Interpretation. The update product restricts the full Cartesian product S × Σ to the smaller set S ⊗ Σ in order to insure that states survive actions in the appropriate sense.

For each agent A, the product arrows →_A on the output frame represent agent A's epistemic uncertainty about the output state. The intuition is that the components of our action models are "simple actions", so the uncertainty regarding the action is assumed to be independent of the uncertainty regarding the current (input) state. This independence allows us to "multiply" these two uncertainties in order to compute the uncertainty regarding the output state: if whenever the input state is s, agent A thinks the input might be some other state s′, and if whenever the current action happening is σ, agent A thinks the current action might be some other action σ′, and if s′ survives σ′, then whenever the output state (s, σ) is reached, agent A thinks the alternative output state (s′, σ′) might have been reached. Moreover, all of the output states that A considers possible are of this form.
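Clauses (5), (6) and (7) translate directly into a computation on finite models. The sketch below is a minimal illustration under assumptions that are not part of the paper: models are encoded as Python dictionaries, and each precondition is a function that takes a state model and returns the set of states where the precondition holds. The function name `update_product` is likewise hypothetical.

```python
def update_product(state_model, action_model):
    """Return the update product of a state model with an action model.

    Assumed encoding: a state model has keys "states" (set), "arrows"
    (agent -> set of pairs) and "val" (atom -> set of states); an action
    model has keys "actions", "arrows" and "pre", where pre[a] maps a
    state model to the truth set of the precondition of action a.
    """
    # (5): keep exactly the pairs (s, a) where s satisfies pre(a).
    states = {(s, a)
              for a in action_model["actions"]
              for s in action_model["pre"][a](state_model)}
    # (6): an agent's arrows are the product of the state arrows and the
    # action arrows, restricted to the surviving pairs.
    arrows = {}
    for agent, state_rel in state_model["arrows"].items():
        action_rel = action_model["arrows"][agent]
        arrows[agent] = {((s, a), (s2, a2))
                         for (s, s2) in state_rel
                         for (a, a2) in action_rel
                         if (s, a) in states and (s2, a2) in states}
    # (7): atomic facts are inherited from the first component.
    val = {p: {(s, a) for (s, a) in states if s in ext}
           for p, ext in state_model["val"].items()}
    return {"states": states, "arrows": arrows, "val": val}
```

Representing preconditions as functions of the model, rather than as fixed sets of states, keeps the action model independent of any particular state model, in line with the point that pre takes values in epistemic propositions. A one-action model with a reflexive arrow for every agent and precondition H, applied to a two-state coin model, leaves only the heads state.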


As for the valuation, we essentially take the same valuation as the one in the input model. If a state s survives an action, then the same facts p hold at the output state (s, σ) as at the input state s. This means that our actions, if successful, do not change the facts. This condition can of course be relaxed in various ways, to allow for fact-changing actions. But in this paper we are primarily concerned with purely epistemic actions, such as the earlier examples in this section.

3.4. Updates Induced by Program Models

Recall that we defined updates and proper epistemic actions in Section 2.3. Right above, in Section 3.2, we defined epistemic program models. Note that there is a big difference: the updates are pairs of operations on the class of all state models, and the program models are typically finite structures. We think of program models as capturing specific mechanisms, or algorithms, for inducing updates. This connection is made precise in the following definition.

DEFINITION. Let (Σ, Γ) be a program model. We define an update which we also denote (Σ, Γ) as follows:
1. S(Σ, Γ) = S ⊗ Σ.
2. s (Σ, Γ)_S (t, σ) iff s = t and σ ∈ Γ.
We call this the update induced by (Σ, Γ).

Bisimulation Preservation. Before moving to examples, we note a simple result that will be used later.

PROPOSITION 3.5. Let Σ be an action model in which every pre(σ) is a bisimulation preserving epistemic proposition. Let Γ ⊆ Σ be arbitrary. Then the update induced by (Σ, Γ) preserves bisimulation.

Proof. Write r for the update induced by the program model (Σ, Γ). Fix S and T, and suppose that s ≡ t via the relation R0. Suppose that s r_S s′, so s′ ∈ S(r) is of the form (s, σ) for some σ ∈ Γ. Then (t, σ) ∈ T(r), because pre(σ) preserves bisimulation and so holds at t as well as at s, and clearly t r_T (t, σ). We need only show that (s, σ) ≡ (t, σ). But the following relation R is a bisimulation between S(r) and T(r):

(s′, τ1) R (t′, τ2) iff s′ R0 t′ and τ1 = τ2.

The verification of the bisimulation properties is easy. And R shows that (s, σ) ≡ (t, σ), as desired.


3.5. The Coin Scenario Models as Examples of the Update Product

We return to the coin scenarios of Section 1.1. Our purpose is partly to indicate how one may obtain the models there as examples of the update product, and at the same time to exemplify the update product construction itself. In this section, the set A of agents is {A, B}, and the set AtSen of atomic propositions is {H, T}. We remind the reader that T represents the coin lying tails up, while tr is our notation for the true epistemic proposition.

EXAMPLE 3.6. We begin with an example worked out with many details, and then the rest of our examples will omit many similar calculations. This example has to do with Scenario 4 from Section 1.1, where the coin lies heads up and A takes a look at the coin in such a way that she is certain that B does not suspect anything. We take as S1 and Σ4 the structures shown below:

(We remind the reader that T is the atomic sentence for "tails" and tr is for "true".) S1 is from Scenario 1. In Σ4, we take the set of distinguished actions to be {σ}. Σ4 comes from Examples 3.1 and 3.2.

To take the update product, we first form the cartesian product S1 × Σ4:

{(s, σ), (s, τ), (t, σ), (t, τ)}

Of these four pairs, we only want those whose first component satisfies (in S1) the precondition of the second component. We do not want (t, σ), since pre(σ) = H and t ∉ [[H]]_{S1}. But the other three pairs do satisfy our condition. So the state model S1 ⊗ Σ4 will have three states: (s, σ), (s, τ), and (t, τ). The atomic information is inherited from the first component, so we have [[H]]_{S1⊗Σ4} = {(s, σ), (s, τ)} and [[T]]_{S1⊗Σ4} = {(t, τ)}. The accessibility relations →_A and →_B are those of the product. For example, we have (s, σ) →_B (s, τ), because s →_B s and σ →_B τ. But we do not have (s, σ) →_A (s, τ), because σ →_A τ is false. Now, we rename the states as follows:

(s, σ) ↦ s, (s, τ) ↦ t, (t, τ) ↦ u.
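The calculation in this example can be replayed mechanically. The Python sketch below is an illustration only. The arrows of the action model are those listed in Example 3.1; the arrows of S1 are an assumption (both agents are taken to consider both states possible, since the diagram of S1 is not reproduced here); and all variable names are hypothetical.

```python
# State model S1. Assumption: every arrow between s and t is present
# for both agents, including the reflexive ones.
states = {"s", "t"}
full = {(x, y) for x in states for y in states}
state_arrows = {"A": full, "B": full}
heads = {"s"}  # truth set of H in S1

# Action model with simple actions sigma and tau, as listed in Example 3.1.
actions = {"sigma", "tau"}
action_arrows = {
    "A": {("sigma", "sigma"), ("tau", "tau")},
    "B": {("sigma", "tau"), ("tau", "tau")},
}
# Truth sets of the preconditions in S1: pre(sigma) = H, pre(tau) = tr.
pre = {"sigma": heads, "tau": states}
designated = {"sigma"}

# Surviving pairs: the first component satisfies the precondition of the second.
product_states = {(s, a) for s in states for a in actions if s in pre[a]}

def product_arrow(agent, source, target):
    """True iff source -> target for `agent` in the update product."""
    (s, a), (s2, a2) = source, target
    return (source in product_states and target in product_states
            and (s, s2) in state_arrows[agent]
            and (a, a2) in action_arrows[agent])

# Update relation induced by the program model with designated set {sigma}:
# a state is related to its pairing with a designated action, if that survives.
update_relation = {(s, (s, a)) for (s, a) in product_states if a in designated}
```

Under these assumptions the sketch yields the three surviving pairs named in the text, confirms the B-arrow from (s, σ) to (s, τ), and rejects the corresponding A-arrow because σ →_A τ fails.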

And then we get a picture of this state model, the same one we had in Scenario 4:


The dotted line here shows the update relation between s and (s, σ). This is the only update relation. For example, we do not relate s and (s, τ) because τ ∉ Γ = {σ}. Let S4 = S1 ⊗ Σ4.

EXAMPLE 3.7. Σ4 from Example 3.6 represents the action where A cheats, learning heads or tails in a way which is oblivious to B. It is also natural to consider an action of A privately learning the state of the coin. (This action may take place even if the state is tails.) We could use the following program model Σ4′:

This program model consists of two disjoint components. We leave it to the reader to calculate S1 ⊗ Σ4′, and also the relation between "before" and "after".

EXAMPLE 3.8. Next we construct the model corresponding to Scenario 5, where B cheats after Scenario 4. We consider the update product of the state model S4 from Example 3.6 above with the program model Σ5 shown below:

It represents cheating by B. The update product S4 ⊗ Σ5 has five states: (s, σ), (t, σ), (s, τ), (t, τ), and (u, τ). Notice that (u, σ) is omitted, since u ∉ [[H]]_{S4}. We leave it to the reader to calculate the accessibility relations in S4 ⊗ Σ5, and to draw the appropriate figure.

Incidentally, we posed in Section 1.1 the exercise of constructing this representation from first principles. Many people are able to construct the five state picture, and some others construct a related picture with seven states. The seven state picture is bisimilar to the one illustrated here.

EXAMPLE 3.9. We next look back at Scenario 2. The simplest action structure is Σ2: (8)


It represents a public announcement to A and B that the coin is lying heads up. Here, the distinguished set is the entire action structure. For the record, we formalize Σ2 as a singleton set {Pub}. We have Pub →_A Pub and Pub →_B Pub. Also, we set pre(Pub) = H. We did not put the name Pub in the representation in (8), but in situations where we want the name we would draw the same picture except that instead of H we would say Pub : H.

Let us call the same structure S2 when we view it as a state model; formally these are the same object. Let S be any model with the property that every state has both a successor in →_A where H is true, and also a successor in →_B where H is true. Then S ⊗ Σ2 is bisimilar to S2. In particular, S1 ⊗ Σ2 is bisimilar to S2.

EXAMPLE 3.10. Let Σ3 be the following program model:

Σ3 represents an announcement to A of heads in the manner of Scenario 3. That is, B knows that A either sees heads or sees tails, but not which. Similarly, let Σ′3 represent the same kind of announcement to B:

Then we have the following: S1 ⊗ Σ3 ≅ S3, where S3 is the model in Scenario 3 and ≅ denotes isomorphism. S3 ⊗ Σ′3 ≅ S2. This conforms with the intuition that successive semiprivate viewings by the two parties of the concealed coin amount to a public viewing. S3 ⊗ Σ3 ≅ S3. There is no point for A to look twice. S4 ⊗ Σ3 is a three-state model bisimilar to the model S4 from Scenario 4.
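As a rough illustration of how such products can be computed, the following Python sketch forms the update product for the public announcement of Example 3.9. It is only a sketch under stated assumptions: S1 is taken to be a two-state model (heads at s, tails at t) in which neither agent can tell the states apart, preconditions are simplified to predicates on single states rather than full epistemic propositions, and all names (`update_product`, `S1_states`, `Sigma2_pre`, and so on) are hypothetical.

```python
def update_product(states, rel, val, actions, arel, pre):
    """Restricted update product of a state model with a program model.

    states: set of states; rel: agent -> set of (s, t) arrows;
    val: state -> set of atomic sentences true there;
    actions: set of simple actions; arel: agent -> set of action arrows;
    pre: action -> predicate on (state, val). This is a simplification
    of the paper's preconditions (assumption of this sketch).
    """
    new_states = {(s, a) for s in states for a in actions if pre[a](s, val)}
    new_rel = {
        agent: {
            (x, y)
            for x in new_states
            for y in new_states
            if (x[0], y[0]) in rel[agent] and (x[1], y[1]) in arel[agent]
        }
        for agent in rel
    }
    # Atomic sentences are inherited from the state component.
    new_val = {x: val[x[0]] for x in new_states}
    return new_states, new_rel, new_val


# Assumed shape of S1: heads at s, tails at t, nobody knows which.
S1_states = {"s", "t"}
everything = {(x, y) for x in S1_states for y in S1_states}
S1_rel = {"A": everything, "B": everything}
S1_val = {"s": {"H"}, "t": {"T"}}

# Sigma_2: one action Pub with pre(Pub) = H and a loop for each agent.
Sigma2_actions = {"Pub"}
Sigma2_rel = {"A": {("Pub", "Pub")}, "B": {("Pub", "Pub")}}
Sigma2_pre = {"Pub": lambda s, val: "H" in val[s]}

result_states, result_rel, result_val = update_product(
    S1_states, S1_rel, S1_val, Sigma2_actions, Sigma2_rel, Sigma2_pre
)
# The distinguished set is all of Sigma_2, so the update relation
# pairs each surviving s with (s, Pub).
update_relation = {
    (s, (s, "Pub")) for s in S1_states if (s, "Pub") in result_states
}
```

Under these assumptions the product has the single state (s, Pub) with one loop per agent, in line with the claim that S1 ⊗ Σ2 is bisimilar to S2.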

EXAMPLE 3.11. To obtain the model in Scenario 7, we use the following program model Σ7:


with pre(ρ) = tr, pre(σ) = H, and pre(τ) = T. As illustrated, the set Γ is the entire set Σ7 of simple actions. More generally, the general shape of Σ7 is the frame for an action which has three possibilities. A learns which one happens, and B merely learns that one of the three happens. Further, these two aspects are common knowledge. We omit the calculation that shows that S1 ⊗ Σ7 is the model drawn in Scenario 7.

EXAMPLE 3.12. The program model employed in Scenario 8 is Σ8, shown below:

Again we take pre(ρ) = tr, pre(σ) = H, and pre(τ) = T. The difference between this and Σ7 above is that instead of Γ = Σ7, we take Γ to be {s}. Then S1 ⊗ Σ8 ≅ S8.

3.6. Operations on Program Models

1 and 0. We define program models 1 and 0 as follows: 1 is a one-action set {σ} with σ →_A σ for all A, pre(σ) = tr, and with distinguished set {σ}. The point here is that the update induced by this program model is exactly the update 1 from Section 2.3. We purposely use the same notation. Similarly, we let 0 be the empty program model. Then its induced update is what we called 0 in Section 2.3.

Sequential Composition. In all settings involving “actions” in some sense or other, sequential composition is a natural operation. In our setting, we would like to define a composition operation on program models, corresponding to the sequential composition of updates. Here is the relevant definition.


Let Σ = (Σ, →_A, pre_Σ, Γ_Σ) and Δ = (Δ, →_A, pre_Δ, Γ_Δ) be program models. We define the composition

Σ; Δ = (Σ × Δ, →_A, pre_{Σ;Δ}, Γ_{Σ;Δ})

to be the following program model:
1. Σ × Δ is the cartesian product of the sets Σ and Δ.
2. →_A in the composition Σ; Δ is the family of product relations, in the natural way: (σ, δ) →_A (σ′, δ′) iff σ →_A σ′ and δ →_A δ′.
3. pre_{Σ;Δ}(σ, δ) = ⟨(Σ, σ)⟩pre_Δ(δ).
4. Γ_{Σ;Δ} = Γ_Σ × Γ_Δ.

In the definition of pre, (Σ, σ) is an abbreviation for the update (Σ, {σ}), as defined in Section 3.4, i.e., the update induced by the program model (Σ, {σ}).

EXAMPLE 3.13. This example constructs a program model for lying, as we find in Scenario 6. Lying in our subject cannot be taken to be simply a case of private or public announcement: this will not work out. In our given situation, B simply knows that A doesn’t know the side of the coin, and hence cannot accept any lying announcement that would claim such knowledge. One way to make sense of the action of A (successfully) lying to B is to assume that, first, before the lying, a suspicion was aroused in B that A might have privately learnt (e.g., by opening the box, or by being told) which side of the coin was lying up; then second, that B subsequently receives an untruthful announcement that A knows the coin is lying heads up, an announcement which is known to be false by A herself (but which is believable, and now believed, by B). Obviously, we cannot express things about past actions in our logic, so we have to start right at the beginning, before the lying announcement is sent, and capture the whole action of successful lying as a sequential composition of two actions: B’s suspicion of A’s private learning, followed by B’s receiving (and believing) the lying announcement. This is what we shall do here.4 Let ϕ be ¬□A H and let ψ be H ∧ □A H. Let Σ6 be

The idea is that B “learns” a false statement, namely that A knows the state of the coin. Further, we saw Σ8 in Example 3.12. We consider Σ8; Σ6. One can check using Proposition 3.14 that S1 ⊗ (Σ8; Σ6) ≅ (S1 ⊗ Σ8) ⊗ Σ6 ≅ S8 ⊗ Σ6 ≅ S6. In addition, one can calculate Σ8; Σ6 explicitly to see what a reasonable one-step program for lying would be. The interesting point is that its preconditions are complex sentences having to do with actions in our language.

Disjoint Union. If Σ = (Σ, →_A, pre_Σ, Γ_Σ) and Δ = (Δ, →_A, pre_Δ, Γ_Δ), we take Σ ⊔ Δ to be the disjoint union of the models, with union of the distinguished actions. The intended meaning is the non-deterministic choice between the programs represented by Σ and Δ. Here is the definition in more detail, generalized to arbitrary (possibly infinite) disjoint unions: let {Σi}i∈I be a family of program models, with Σi = (Σi, →_A, prei, Γi); we define their (disjoint) union

⊔i∈I Σi = (⊔i∈I Σi, →_A, pre, Γ)

to be the model given by:
1. ⊔i∈I Σi is ⋃i∈I (Σi × {i}), the disjoint union of the sets Σi.
2. (σ, i) →_A (τ, j) iff i = j and σ →_A τ in Σi.
3. pre(σ, i) = prei(σ).
4. Γ = ⋃i∈I (Γi × {i}).

Iteration. Finally, we define an iteration operation by Σ* = ⊔{Σ^n : n ∈ N}. Here Σ^0 = 1, and Σ^(n+1) = Σ^n; Σ.

We conclude by verifying that our definitions of the operations on program models are correct in the sense that they are faithful to the corresponding operations on updates from Section 2.3.

PROPOSITION 3.14. The update induced by a composition of program models is the composition of the induced updates. Similarly for sums and iteration, mutatis mutandis.

Proof. Let (Σ, Γ_Σ) and (Δ, Γ_Δ) be program models. We denote by r the update induced by (Σ, Γ_Σ), by s the update induced by (Δ, Γ_Δ), and by t the update induced by (Σ, Γ_Σ); (Δ, Γ_Δ). We need to prove that r; s = t. Let S = (S, →_A, [[.]]_S) be a state model. Recall that

S(r; s) = S(r)(s) = (S ⊗ (Σ, Γ_Σ)) ⊗ (Δ, Γ_Δ).

We claim that this is isomorphic to S ⊗ (Σ; Δ, Γ_{Σ;Δ}), and indeed the isomorphism is (s, (σ, δ)) ↦ ((s, σ), δ). We check that (s, (σ, δ)) ∈ S ⊗ (Σ; Δ) iff ((s, σ), δ) ∈ (S ⊗ Σ) ⊗ Δ. Indeed, the following are equivalent:
1. (s, (σ, δ)) ∈ S ⊗ (Σ; Δ).
2. s ∈ pre_{Σ;Δ}(σ, δ)_S.
3. s ∈ (⟨(Σ, σ)⟩pre_Δ(δ))_S.
4. (s, σ) ∈ S ⊗ Σ and (s, σ) ∈ pre_Δ(δ)_{S⊗Σ}.
5. ((s, σ), δ) ∈ (S ⊗ Σ) ⊗ Δ.

The rest of the verification of isomorphism is fairly direct. We also need to check that t_S and (r; s)_S are related by the isomorphism. Now

t_S = {(s, (s, (σ, δ))) ∈ S ⊗ (Σ; Δ) : σ ∈ Γ_Σ, δ ∈ Γ_Δ}.

Recall that (r; s)_S = r_S; s_{S(r)} and that this is a relational composition in left-to-right order. And indeed,

r_S = {(s, (s, σ)) : (s, σ) ∈ S ⊗ Σ, σ ∈ Γ_Σ}
s_{S(r)} = {((s, σ), ((s, σ), δ)) ∈ S(r) ⊗ Δ : δ ∈ Γ_Δ}.

This completes the proof for composition. We omit the proofs for sums and iteration.

Endnotes. The work of this section explains how complex representations of naturally occurring scenarios may be computed from state models before the scenario and program models. Indeed, this is one of our main points. There are precursors to our work in special cases, most notably Hans van Ditmarsch’s dissertation (2000). That work is about examples like Σ7, where some agent or set of agents knows that the current action belongs to some set, but does not know which action it is. But to our knowledge, the work in this section is the first time that anything like this has been obtained in general. We have taught the material in this section in several courses at different levels. Our experience is that the material here is highly appealing to students and makes the case for formal representations in the first place (for students not familiar with formal logic) and for the kind of technical work that we pursue in this area (for those who are).

The idea of representing epistemic actions as Kripke models (or variants of them) was first presented in our earlier paper with S. Solecki (Baltag et al. 1998). However, the proposal of that paper was to employ Kripke models in the syntax of a logical language directly. Many readers have felt this to be confusing, since the resulting syntax looked as if it depended on the semantics. Proposals to improve the language were developed in several papers of Baltag (1999, 2001, 2002, 2003). The logics of these papers were more natural than those of Baltag et al. (1998). What is new in this paper is the overall apparatus of action signatures, epistemic actions, etc. We present what we hope is a natural syntax, and at the same time we have programs as syntactic entities in our logical languages (see Section 4 just below) linked to program models in the semantics.
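The composition operation of Section 3.6 and the isomorphism used in the proof of Proposition 3.14 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' construction verbatim: epistemic propositions are modelled as Python functions from a state model to a set of its states, distinguished sets are left out, and every name (`update_product`, `compose`, `atom`, `knows`, the example models) is hypothetical.

```python
def update_product(S, P):
    """S = (states, rel, val); P = (actions, arel, pre).
    pre[a] maps a state model to the set of its states where a may run."""
    states, rel, val = S
    actions, arel, pre = P
    new_states = {(s, a) for a in actions for s in pre[a](S)}
    new_rel = {
        ag: {(x, y) for x in new_states for y in new_states
             if (x[0], y[0]) in rel[ag] and (x[1], y[1]) in arel[ag]}
        for ag in rel
    }
    new_val = {x: val[x[0]] for x in new_states}
    return new_states, new_rel, new_val


def compose(P, Q):
    """Items 1-3 of the composition definition (distinguished sets omitted)."""
    p_actions, p_arel, p_pre = P
    q_actions, q_arel, q_pre = Q
    actions = {(a, d) for a in p_actions for d in q_actions}
    arel = {
        ag: {(x, y) for x in actions for y in actions
             if (x[0], y[0]) in p_arel[ag] and (x[1], y[1]) in q_arel[ag]}
        for ag in p_arel
    }

    def make_pre(a, d):
        # <(P, a)> pre_Q(d): a can run at s, and pre_Q(d) holds at (s, a)
        # in the updated model S (x) P.
        def prop(S):
            after = q_pre[d](update_product(S, P))
            return {s for s in p_pre[a](S) if (s, a) in after}
        return prop

    pre = {(a, d): make_pre(a, d) for (a, d) in actions}
    return actions, arel, pre


def atom(p):
    """Proposition true exactly where the atomic sentence p holds."""
    return lambda S: {s for s in S[0] if p in S[2][s]}


def knows(agent, prop):
    """Box modality: prop holds at every successor for the agent."""
    def box(S):
        states, rel, _ = S
        inner = prop(S)
        return {s for s in states
                if all(t in inner for (u, t) in rel[agent] if u == s)}
    return box


# Hypothetical example: A looks at the coin while B watches, and then
# "A knows H" is announced publicly.
states = {"s", "t"}
full = {(x, y) for x in states for y in states}
S = (states, {"A": full, "B": full}, {"s": {"H"}, "t": {"T"}})

acts = {"sigma", "tau"}
loops = {(a, a) for a in acts}
all_acts = {(a, b) for a in acts for b in acts}
P = (acts, {"A": loops, "B": all_acts},
     {"sigma": atom("H"), "tau": atom("T")})
Q = ({"d"}, {"A": {("d", "d")}, "B": {("d", "d")}},
     {"d": knows("A", atom("H"))})

left = update_product(S, compose(P, Q))
right = update_product(update_product(S, P), Q)


def iso(x):
    # (s, (sigma, delta)) |-> ((s, sigma), delta)
    return ((x[0], x[1][0]), x[1][1])
```

In this example the map `iso` carries the states and arrows of `left` onto those of `right`, which is the content of the isomorphism claimed in the proof for this one instance.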

4. LOGICAL LANGUAGES BASED ON ACTION SIGNATURES

This section presents the centerpiece of our formal work, the logical systems corresponding to types of epistemic action. Our overall goal is to abstract from the propositional language earlier, in a way which allows us to fix only the epistemic structure of the desired actions, and vary their preconditions.

4.1. Action Signatures

We have presented many examples of the update product in Section 3.5. These allow us to represent many natural updates, and this is one of our goals in this paper. But the structures which we have seen so far are not sufficient to get logical languages incorporating updates. For example, we have in Example 3.7 a program model that represents a private announcement to the agent A that a proposition H happens, and this takes place in a way that B learns nothing. The picture of this was

What we want to do now is to vary things a bit, and then to abstract them. For example, suppose we want to announce a different proposition to A, say ψ. We would use

Varying the proposition ψ, all announcements of this kind could be thought of as actions of the same type. We could then represent the type of the action by the following picture:

And the previous representations include the information that what actually happens is what A hears. To vary this, we need only change which world is designated by the doubled border. We could switch things, or double neither or both worlds. So we obtain a structure consisting of two action types:

The oval on the left represents the type PriA of a fully private announcement to agent A, while the oval on the right simply represents the type of a skip action (or of an empty, or trivial, public announcement). By inserting any given proposition ψ into the oval depicting the action type PriA , we can obtain speciﬁc private announcements PriA ψ, as depicted above. (There is no reason to insert any proposition into the right oval, since this comes already with its own precondition tr: this means that the type of a skip action uniquely determines the corresponding action skip, since it contains all the information needed to specify this action.) Another example would be the case in which we announce ψ to A in such a way that B is misled into believing ¬ψ (and is also misled into believing that everyone learns ¬ψ). Now we use

In itself, this is not general enough to give rise to an action type in our sense. But we can more generally consider the case in which ψ 1 is announced to A in such a way that B thinks that some other proposition ψ 2 is publicly announced:

By abstracting from the speciﬁc propositions, we obtain the following structure consisting of two action types:

Observe that if we are given now a sequence of two propositions (ψ 1 , ψ 2 ), we could use them to ﬁll in the oval with preconditions in two possible ways (depending on which proposition goes into the left oval). So, in order to uniquely determine how an action type will generate speciﬁc announcements, we need an enumeration without repetition of all the action types in the structure which do not come equipped with trivial preconditions (i.e., all the empty ovals in the diagram, since we assume the others have tr inside).


So at this point, we can see how to abstract things, leading to the following definition.

DEFINITION. An action signature is a structure

Σ = (Σ, →_A, (σ1, σ2, ..., σn))

where Σ = (Σ, →_A) is a finite Kripke frame, and σ1, σ2, ..., σn is a designated listing of a subset of Σ without repetitions. We obviously must have n ≤ |Σ|, but we allow the case n = 0 as well, which will produce an empty list. We call the elements of Σ action types, while the ones which are in the listing (σ1, ..., σn) will be called non-trivial action types.

The way this works for us is that an action signature Σ together with an assignment of epistemic propositions to the non-trivial action types in Σ will give us a full-fledged action model. The trivial action types will describe actions which can always happen, so they will get assigned the trivial precondition tr, by default. And this is the exact sense in which the notion of an action signature is an abstraction of the notion of action model. We shall shortly use action signatures in constructing logical languages.

EXAMPLES 4.1. Here is a very simple action signature which we call Σskip. Σ is a singleton {skip}, we put skip →_A skip for all agents A, and we take the empty list () of types as listing. So we have only one, trivial, type: skip. In a sense which we shall make clear later, this is an action in which “nothing happens”, and moreover it is common knowledge that this is the case.

The type of a public announcement can simply be obtained by only changing our listing in Σskip, to make the type non-trivial; we also change the name of the type to Pub, to distinguish it from the previous one. So the signature ΣPub of public announcements has Σ = {Pub}, Pub →_A Pub for every agent A, and the listing is just (Pub). So Pub is the non-trivial type of a public announcement action. Note once again that we have not said what is being announced. We are purposely separating out the structure of the announcement from the content, and Pub is our model of the structure.

The next simplest action signature is the “test” signature Σ?. We take Σ? = {?, skip}, with the listing (?). We also take ? →_A skip, and skip →_A skip for all A. So the test ? is the only non-trivial type of action here. This turns out to be a totally opaque form of test: ϕ is tested on the real world, but nobody knows this is happening: when it is happening, all the agents think nothing (i.e., skip) is happening. The function of this type will be to generate tests ?ϕ, which affect the states precisely in the way dynamic logic tests do.


For each set ℬ ⊆ 𝒜 of agents, we define the action signature Priℬ of completely private announcements to the group ℬ. It has Σ = {Priℬ, skip}; the listing is just (Priℬ), which makes skip into a trivial type again; and we put Priℬ →_B Priℬ for all B ∈ ℬ, Priℬ →_C skip for C ∉ ℬ, and skip →_A skip for all agents A.

The action signature Cka^ℬ_k is given by: Σ = {1, ..., k}; the listing is (1, 2, ..., k), so all types are non-trivial; i →_B i for i ≤ k and B ∈ ℬ; and finally i →_C j for i, j ≤ k and C ∉ ℬ. This action signature is called the signature of common knowledge of alternatives for an announcement to the group ℬ.

Signature-based Program Models. Now that we have a general abstract notion, we introduce some notation to regain the earlier examples. Let Σ be an action signature, let (σ1, ..., σn) be the corresponding listing of non-trivial types, let Γ ⊆ Σ, and let ψ⃗ = ψ1, ..., ψn be a list of epistemic propositions. We obtain an epistemic program model (Σ, Γ)(ψ1, ..., ψn) in the following way:
1. The set of simple actions is Σ, and the accessibility relations are those given by the action signature.
2. For j = 1, ..., n, pre(σj) = ψj. We put pre(σ) = tr for all the other (trivial) actions.
3. The set of distinguished actions is Γ.

In the special case that Γ is the singleton set {σi}, we write the resulting signature-based program model as (Σ, σi, ψ1, ..., ψn). To summarize: every action signature, set of distinguished action types in it, and corresponding tuple of epistemic propositions gives an epistemic program model in a canonical way.

4.2. The Syntax and Semantics of L(Σ)

Fix an action signature Σ. We present in Figure 7 a logic L(Σ). The first thing to notice about this is that for the first time in a great while, we have a genuine syntax. In this regard, note that n is fixed from Σ; it is the length of the given listing (σ1, ..., σn). In the programs of the form σ ψ1, ..., ψn we have sentences ψ rather than epistemic propositions (which we had written using boldface letters ψ1, ..., ψn in Section 2.3). Also, the signature Σ figures into the semantics exactly in those programs σ ψ1, ..., ψn; in those we require that σ ∈ Σ. The program model (Σ, σ, [[ψ1]], ..., [[ψn]]) is a signature-based program model as in the previous section.
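The passage from an action signature to a signature-based program model, as described in Section 4.1, can be sketched in Python along the following lines. This is an illustrative sketch only: the class `ActionSignature`, the function `program_model`, the dictionary layout of the result, and the use of plain strings such as "H" and "tr" as stand-ins for epistemic propositions are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSignature:
    types: frozenset   # action types (the carrier of the Kripke frame)
    arrows: dict       # agent -> set of (type, type) pairs
    listing: tuple     # non-trivial action types, without repetitions

    def __post_init__(self):
        if len(set(self.listing)) != len(self.listing):
            raise ValueError("listing must not contain repetitions")
        if not set(self.listing) <= self.types:
            raise ValueError("listing must enumerate action types")


def program_model(sig, distinguished, props, trivial="tr"):
    """Signature-based program model (sig, distinguished)(props)."""
    if len(props) != len(sig.listing):
        raise ValueError("need one proposition per non-trivial type")
    if not set(distinguished) <= sig.types:
        raise ValueError("distinguished actions must be action types")
    pre = {t: trivial for t in sig.types}    # trivial types get tr
    pre.update(zip(sig.listing, props))      # pre(sigma_j) = psi_j
    return {
        "actions": sig.types,
        "arrows": sig.arrows,
        "pre": pre,
        "distinguished": frozenset(distinguished),
    }


# Private announcement to A alone, with agents A and B (assumed encoding
# of the signature for the group {A}).
pri_A = ActionSignature(
    types=frozenset({"Pri", "skip"}),
    arrows={
        "A": {("Pri", "Pri"), ("skip", "skip")},
        "B": {("Pri", "skip"), ("skip", "skip")},
    },
    listing=("Pri",),
)
model = program_model(pri_A, {"Pri"}, ["H"])
```

Here the single listed type Pri receives the supplied proposition as its precondition, while skip keeps the trivial precondition, mirroring steps 1 to 3 of the construction.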


Figure 7. The language L(Σ) and its semantics.

The second thing to note is that, as in PDL, we have two sorts of syntactic objects: sentences and programs. We call programs of the form σ ψ1, ..., ψn basic actions. Note that they might not be “atomic” in the sense that the sentences ψj might themselves contain programs. The basic actions of the form σi ψ1, ..., ψn (with i ≤ n) are called non-trivial, since they are generated by non-trivial action types. We use the following standard abbreviations: false = ¬true, ϕ ∨ ψ = ¬(¬ϕ ∧ ¬ψ), ◇A ϕ = ¬□A ¬ϕ, ◇*ℬ ϕ = ¬□*ℬ ¬ϕ, and ⟨π⟩ϕ = ¬[π]¬ϕ.

The Semantics. This defines two operations by simultaneous recursion on L(Σ):
1. ϕ ↦ [[ϕ]], taking the sentences of L(Σ) into epistemic propositions; and
2. π ↦ [[π]], taking the programs of L(Σ) into program models (and hence into induced updates).

The formal definition is given in Figure 7. The main thing to note is that with one key exception, the operations on the right-hand sides are immediate applications of our general definitions of the closure conditions on epistemic propositions from Section 2.1 and the operations on program models from Section 3.6. A good example to explain this is the clause for the semantics of sentences [α]ϕ. Assuming that we have a program model [[α]], we get an induced update in Section 3.4 which we again denote [[α]]. We also have an epistemic proposition [[ϕ]]. We can therefore form the epistemic proposition [[[α]]][[ϕ]] (see equation (4) in Section 2.3). Note that we have overloaded the square bracket notation; this is intentional, and we have done the same with other notation as well. Similarly, the semantics of skip and crash are the program models 1 and 0 of Section 3.6.

We also discuss the definition of the semantics for basic actions σ ψ⃗. This is precisely where the structure of the action signature enters. For this, recall that we have a general definition of a signature-based program model (Σ, Γ, ψ1, ..., ψn), where Γ ⊆ Σ and the ψ’s are any epistemic propositions. What we have in the semantics of σ ψ⃗ is the special case of this where Γ is the singleton {σ} and ψi is [[ψi]], a proposition which we will already have defined when we come to define [[σ ψ⃗]].

At this point, it is probably good to go back to our earlier discussions in Section 2.1 of epistemic propositions and updates. What we have done overall is to give a fully syntactic presentation of languages of these epistemic propositions and updates. The constructions of the language correspond to the closure properties noted in Section 2.1. (To be sure, we have restricted to the finite case at several points because we are interested in a syntax, and at the same time we have re-introduced some infinitary aspects via the Kleene star.)

4.3. Epistemic Program Logics L(S)

We generalize now our signature logics L(Σ) to families S of signatures, in order to define a general notion of epistemic program logics.

Logics Generated by Families of Signatures. Given a family S of signatures, we would like to combine all the logics {L(Σ)}Σ∈S into a single logic. Let us assume the signatures Σ ∈ S are mutually disjoint (otherwise, just choose mutually disjoint copies of these signatures). We define the logic L(S) generated by the family S in the following way: the syntax is defined by taking the same definition we had in Figure 7 for the syntax of L(Σ), but in which on the side of the programs we take instead as basic actions all expressions of the form σ ψ1, ..., ψn where σ ∈ Σ, for some arbitrary signature Σ ∈ S, and n is the length of the listing of non-trivial action types of Σ. The semantics is again given by the same definition as in Figure 7, but in which the clause about σ ψ1, ..., ψn refers to the appropriate signature: for every Σ ∈ S, every σ ∈ Σ, if n is the length of the listing of Σ, then

[[σ ψ1, ..., ψn]] = (Σ, σ, [[ψ1]], ..., [[ψn]]).

EXAMPLE 4.2. This example constructs the logic of all epistemic programs. Take the family

S = {Σ : Σ is a finite signature}

of all finite signatures.5 The logic L(S) will be called the logic of all epistemic programs.


Preservation of Bisimulation and Atomic Propositions. We note two basic facts about the interpretations of sentences and programs in the logics L(S).

PROPOSITION 4.3. The interpretations of the sentences and programs of L(S) preserve bisimulation.

Proof. By induction on L(S), using Propositions 2.5 and 3.5.

PROPOSITION 4.4. The interpretations of programs of L(S) preserve atomic sentences in the sense that if s [[π]]_S t, then for all atomic sentences p, s ∈ p_S iff t ∈ p_{S([[π]])}.

Proof. By induction on π.

4.4. Formalization of the Target Logics in Terms of Signatures

We formalize now the target logics of Section 2.4 as epistemic program logics L(S). We use the action signatures of Examples 4.1 and the notation from there.

The Logic of Public Announcements. This is formalized as L(ΣPub). We have sentences [Pub ϕ]ψ, just as we described in Section 2.4. Note that L(ΣPub) allows announcements inside announcements. If ϕ, ψ, and χ are sentences, then so is [Pub [Pub ϕ]ψ]χ. We check that L(ΣPub) is a good formalization of the logic of public announcements. Fix a sentence ϕ of L(ΣPub) and a state model S. We calculate:

S([[Pub ϕ]]) = S(ΣPub, Pub, [[ϕ]])
= S ⊗ (ΣPub, Pub, [[ϕ]])
= {(s, Pub) : s ∈ [[ϕ]]_S}

The state model has the structure given earlier in terms of the update product operation. The update relation [[Pub ϕ]]_S relates s to (s, Pub) whenever the latter belongs to the state model S([[Pub ϕ]]). The model itself is isomorphic to the sub-state-model of S induced by {s ∈ S : s ∈ [[ϕ]]_S}. Under this isomorphism, the update relation is then the inverse of inclusion. This is just how the action of public announcement was described when we first encountered it, in Example 2.1 of Section 2.3.

Test-only PDL. This was introduced in Section 2.6. Recall that this is PDL built over the empty set of atomic actions. Although it was not one of the target languages of Section 2.4, it will be instructive to see how it is formalized in our setting. Recall that test-only PDL has actions of the form ?ϕ. We want to use our action signature Σ?. The action types of it are ? and skip, only the first one being non-trivial: n = 1. So technically we have sentences of the following forms:

(9)   [? ϕ]χ and [skip ϕ]χ

Let us study the semantics of the basic actions ? ϕ and skip ϕ. Fix a state model S. We calculate:

S([[? ϕ]]) = S(Σ?, ?, [[ϕ]])
= S ⊗ (Σ?, ?, [[ϕ]])
= {(s, ?) : s ∈ [[ϕ]]_S} ∪ {(s, skip) : s ∈ S}

The structure of S([[? ϕ]]) is that for each A, if s →_A t in S, then (s, ?) →_A (t, skip). Also (s, skip) →_A (t, skip) under the same condition, and there are no other arrows. The update relation [[? ϕ]]_S relates s to (s, ?) whenever the latter belongs to the updated structure. Overall, this update is isomorphic to what we described in Example 2.2 of Section 2.3. Turning to the update map of skip ϕ, we again fix S. The model S([[skip ϕ]]) is again literally the same as what we calculated above. However, the update relation [[skip ϕ]]_S now relates each s ∈ S to the pair (s, skip). This relation is a bisimulation. We shall formulate a notion of action equivalence later, and it will turn out that the update [[skip ϕ]] is equivalent to 1. For now, we can also consider the semantics of sentences of the form [skip ϕ]ψ. We have

[[[skip ϕ]ψ]]_S = {s ∈ S : if s [[skip ϕ]]_S t, then t ∈ [[ψ]]_{S([[skip ϕ]])}}
= {s ∈ S : (s, skip) ∈ [[ψ]]_{S([[skip ϕ]])}}
= {s ∈ S : s ∈ [[ψ]]_S}

This last equality is by Proposition 4.3 on bisimulation preservation. The upshot is that [[[skip ϕ]ψ]]_S = [[ψ]]_S. So for this reason, we might as well identify the sentences [skip ϕ]ψ and ψ. Or to put things differently, we might as well identify the basic action skip ϕ and (the constant of L(Σ?)) skip. Since we are already modifying the syntax, we might also abbreviate [skip ϕ]ψ to ψ. Doing all this leads to a language which is exactly test-only PDL as we had it in Section 2.6, and our semantics there agrees with what we have calculated in the present discussion. In conclusion, test-only PDL is equivalent to the logic L0(Σ?); i.e., the □*-free fragment of L(Σ?).

The Logic of Totally Private Announcements. Let Pri be the family

Pri = {Priℬ : ∅ ≠ ℬ ⊆ 𝒜}

of all signatures of totally private announcements to non-empty groups of agents (as introduced in Examples 4.1). Then L(Pri) formalizes one of our target logics: the logic of totally private announcements. For example, in the case of 𝒜 = {A, B}, L(Pri) will have basic actions of the forms: PriA ϕ, PriB ϕ, PriA,B ϕ, skipA ϕ, skipB ϕ, skipA,B ϕ. As before, we may as well identify all the programs skipX ϕ with skip, and abbreviate [skipX]ψ to ψ.

The Logic of Common Knowledge of Alternatives. This can be formalized as L(Cka), where

Cka = {Cka^ℬ_k : ∅ ≠ ℬ ⊆ 𝒜, 1 ≤ k}.

4.5. Other Logics

Logics Based on Frame Conditions. Many other types of logics are easily representable in our framework. For example, consider the logic of all S4 programs. To formalize this, we need only consider the disjoint union of all finite action signatures whose underlying accessibility relations are preorders. For the logic of S5 programs, we change preorders to equivalence relations. Another important class is given by the K45 axioms of modal logic. These systems are particularly important because the K45 and S5 conditions are so prominent in epistemic logic.

Announcements by Particular Agents. Our modeling of the notion of the public announcement that ϕ is impersonal in the sense that the announcement does not come from anywhere in particular. It might be best understood as coming externally, as if someone shouted ϕ into the room where the agents were standing. We also want a notion of the public announcement by A of ϕ. We shall write this as PubA ϕ. For this, we identify PubA ϕ with the (externally-made) public announcement that ϕ holds and A knows this. This identification does not represent the fact that A intends to inform the others of ϕ. But as we know, intentions are not modeled at all in our system. We claim, however, that on a purely propositional level, the identification works. And using it, we can represent announcements by A. One way to do this is via abbreviation: we take [PubA ϕ]ψ to be an abbreviation for [Pub (ϕ ∧ □A ϕ)]ψ. (A different way to formalize PubA ϕ would be to use a special signature constructed just for that purpose. But the discussion here shows that there is no need to do this. One can use ΣPub.)
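The effect of reading PubA ϕ as a public announcement of ϕ ∧ □A ϕ can be illustrated with a small Python sketch. It relies on the earlier observation that a public announcement yields, up to isomorphism, the submodel of states where the announced sentence holds. The two-state coin model, the relation names, and the helper functions `box` and `announce` are hypothetical choices for the example.

```python
def box(agent, states, rel, truth):
    """States where the agent knows 'truth' (all successors satisfy it)."""
    return {s for s in states
            if all(t in truth for (u, t) in rel[agent] if u == s)}


def announce(states, rel, truth):
    """Public announcement as restriction to the states where 'truth' holds."""
    kept = states & truth
    new_rel = {ag: {(s, t) for (s, t) in pairs if s in kept and t in kept}
               for ag, pairs in rel.items()}
    return kept, new_rel


states = {"s", "t"}
H = {"s"}                                    # heads holds only at s
full = {(x, y) for x in states for y in states}
loops = {(x, x) for x in states}

# Case 1: neither agent knows the coin, so "H and A knows H" holds nowhere.
rel_ignorant = {"A": full, "B": full}
pub_H = announce(states, rel_ignorant, H)
pubA_H = announce(states, rel_ignorant,
                  H & box("A", states, rel_ignorant, H))

# Case 2: A has seen the coin; announcing by A now agrees with Pub H.
rel_informed = {"A": loops, "B": full}
pubA_H_informed = announce(states, rel_informed,
                           H & box("A", states, rel_informed, H))
```

In the first case the impersonal announcement keeps the state s, while the announcement by A cannot be executed anywhere; in the second case the two announcements coincide.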


Lying. We can also represent misleading epistemic actions, such as lying. Again, we want to fix an agent A and then represent the action of A (successfully) lying that ϕ to the other agents. To all those others, this action should be the same thing as an announcement of ϕ by A. But to say that A lies about ϕ, we want to assume that ϕ is actually false. Further, we want to assume that A moreover knows that ϕ is false. (For if ϕ just happened to be false when A said ϕ, we would not really want to call that “lying.”) The technical details on the representation go as follows. We take a signature Σ^A_Lie given as follows.

Σ^A_Lie = {SecretA, PubA}.

We take (PubA, SecretA) as our non-repetitive list of types. The structure is given by taking SecretA →_A SecretA; for B ≠ A, SecretA →_B PubA; finally, for all B, PubA →_B PubA.

L(Σ^A_Lie) contains sentences like [SecretA ϕ, ψ]χ. The extra argument ψ is a kind of secret condition. And we can use [LieA ϕ]χ as an abbreviation of [SecretA ϕ ∧ □A ϕ, ¬ϕ ∧ □A ¬ϕ]χ. That is, for A to lie about ϕ there is a condition that ¬ϕ ∧ □A ¬ϕ. But the other agents neither need to know this ahead of time nor do they in any sense “learn” this from the announcement. Indeed, for the other agents, LieA ϕ is just like a truthful public announcement by A. As with private announcements, we take the family of signatures {Σ^A_Lie : A ∈ 𝒜}. This family then generates a program logic. In this logic we have actions which represent lying on the part of any agent, not just one fixed agent.

Other Effects: Wiretapping, Paranoia etc. It is possible to model scenarios where one player believes a communication to be private while in reality a second player intercepts the communication. We can also represent gratuitous suspicion (“paranoia”): maybe no “real” action has taken place, except that some people start suspecting some action (e.g., some private communication) has taken place. With these and other effects, the problem is not so much deciding how to model them. Once one has clear intuitions about a social scenario, it is not hard to do the modeling. The real issue in their application seems to be that in complex social situations, our intuitions are not always clear. There is no getting around this, and technical solutions are of limited value for conceptual problems.
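The lying signature described earlier in this subsection can be written down as plain data. The sketch below is illustrative only: the function names `lie_signature` and `lie_preconditions`, the dictionary layout, the shortened type names "Secret" and "Pub", and the rendering of formulas as strings are all assumptions of the example.

```python
def lie_signature(liar, agents):
    """Arrows of the lying signature for `liar` (types Secret and Pub)."""
    arrows = {}
    for agent in agents:
        arrows[agent] = {("Pub", "Pub")}           # Pub loops for everyone
        if agent == liar:
            arrows[agent].add(("Secret", "Secret"))  # the liar sees the lie
        else:
            arrows[agent].add(("Secret", "Pub"))     # others see an announcement
    return {
        "types": {"Secret", "Pub"},
        "arrows": arrows,
        "listing": ("Pub", "Secret"),
    }


def lie_preconditions(sig, liar, phi):
    """Preconditions behind the abbreviation for lying that phi."""
    truthful = f"({phi}) ∧ □{liar}({phi})"      # assigned to Pub
    secret = f"¬({phi}) ∧ □{liar}¬({phi})"      # assigned to Secret
    return dict(zip(sig["listing"], (truthful, secret)))


sig = lie_signature("A", ["A", "B"])
pre = lie_preconditions(sig, "A", "H")
```

The order of the two preconditions follows the listing (Pub, Secret), so the first argument is attached to the announcement type and the second, secret condition to the lying type.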


Endnote. This section is the centerpiece of this paper, and all of the work in it is new. We make a few points about the history of this approach. Our earlier paper with S. Solecki (Baltag et al. 1998) worked with a syntax that employed structured objects as actions. In fact, the actions were Kripke models themselves. This type of syntax is also used in other work such as van Ditmarsch et al. (2003). But the use of structured objects in syntax is felt by many readers to be awkward; see, e.g., remarks in chapter 4 of Kooi’s thesis (Kooi 2003). We disagree with the assertion that our earlier syntax blurs the distinction between syntax and semantics. But in order to make the subject more attractive, we have worked out other approaches to the syntax. Baltag (2001, 2002, 2003) developed logical systems that streamline the syntax using a repertoire of program operations, such as learning a program and variable-binding operators. This paper is the first one to formulate program logics in terms of action signatures.

5. LOGICAL SYSTEMS

We write |= ϕ to mean that for all state models S and all s ∈ S, s ∈ [[ϕ]]S . In this case, we say that ϕ is valid. In this section, we ﬁx an arbitrary family S of mutually disjoint action signatures, and we consider the generated logics. We present a sound proof system for the validities in L(S), and sound and complete proof systems for two important sublogics: the iteration-free fragment L1 (S) and the logic L0 (S) obtained by deleting both iteration and common knowledge operators. In particular, our results apply to the logics L( ), L1 ( ) and L0 ( ) given by only one signature. However, the soundness/completeness proofs will appear in Baltag (2003). So the point of this section is to just state clearly what the logical system is, and to illustrate its use. Sublanguages. We are of course interested in the languages L(S), but we also consider sublanguages L0 (S) and L1 (S). Recall that L1 (S) is the fragment without the action iteration construct π ∗ . L0 (S) is the fragment without π ∗ and 2∗B . It turns out that L0 ( ) is the easiest to study: it is of the same expressive power as ordinary multi-modal logic. On the other hand, the full logic L(S) is in general undecidable: indeed, even if we take a family consisting of only one signature, of public announcements Pub, the corresponding logic L(Pub) is undecidable (see Miller and Moss (2003)). L1 ( ) is decidable and we have a complete axiom system for it (see Baltag (2003)). In Figure 8 below we present a logic for L(S). We write ϕ if ϕ can be obtained from the axioms of the system using its inference rules. We often [ 49 ]


Figure 8. The logical system for L(S). For L1 (S), we drop the ∗∗ axioms and rule; for L0 (S), we also drop the ∗ axioms and rules.

omit the turnstile when it is clear from the context. Our presentation of the proof system uses the meta-syntactical notations associated with the notion of canonical action model (to be introduced below in Section 5.1). This could have been avoided only at the cost of complicating the presentation. We chose the simplest version, and so our logical system, as given in Figure 8, can be fully understood only after reading Section 5.1.

AXIOMS. Most of the system will be quite standard from modal logic. The Action Axioms are new, however. These include the Atomic Permanence axiom; note that in this axiom p is an atomic sentence. The axiom says


that announcements do not change the brute fact of whether or not p holds. This axiom reflects the fact that our actions do not change any kind of local state.

The Action-Knowledge Axiom gives a criterion for knowledge after an action. For non-trivial basic actions σ_i ψ⃗ (induced by a non-trivial action type σ_i ∈ Σ, for some signature Σ ∈ S), this axiom states that

\[
[\sigma_i \vec{\psi}]\,\Box_A \varphi \;\leftrightarrow\; \Bigl(\psi_i \rightarrow \bigwedge \bigl\{\Box_A [\sigma_j \vec{\psi}]\,\varphi \;:\; \sigma_i \xrightarrow{A} \sigma_j \text{ in } \Sigma\bigr\}\Bigr) \tag{9}
\]

In words, two sentences are equivalent: first, an agent A will come to know ϕ after a basic action σ_i ψ⃗ is executed; and second, whenever this action σ_i ψ⃗ is executable (i.e., its precondition ψ_i holds), A knows (already before this action) that ϕ will come to be true after every action that A considers as possibly happening (whenever σ_i ψ⃗ is in fact happening). This axiom should be compared with the Ramsey axiom in conditional logic. One should also study the special case of it for the logic of public announcements in Section 5.3.

The Action Rule then gives a necessary criterion for common knowledge after a simple action. (Simple actions are defined below in Section 5.1. They include the actions of the form σ ψ⃗.) Since common knowledge is formalized by the □∗_B construct, this rule is a kind of induction rule. (The sentences χ_β play the role of strong induction hypotheses.) (For the induction rule for common knowledge assertions without actions, see Lemma 5.5.)

5.1. The Canonical Action Model

Recall that we defined action models and program models in Sections 3.1 and 3.2, respectively. At this point, we define a new action model called the canonical action model of a language L(S).

DEFINITION. Let S be a family of mutually disjoint signatures. Recall that a basic action of L(S) is a program of the form σ ψ_1, . . . , ψ_n, where σ ∈ Σ, for some signature Σ ∈ S, and n is the length of Σ's list of non-trivial action types. A simple action of L(S) is a program of L(S) in which neither the choice operation nor the iteration operation π∗ occurs. We use letters like α and β to denote simple actions only. A simple sentence is a sentence of L1(S) in which neither action sum nor action iteration occurs. So all programs in simple sentences are simple actions.

The Canonical Action Model Ω of L(S). We define a program model Ω in several steps. The simple actions of Ω are the simple actions of L(S) as defined just above. For all A, the accessibility relation →_A is the smallest relation such that

1. skip →_A skip.
2. If σ →_A σ′ in some signature Σ ∈ S and ϕ⃗ = ψ⃗, then σ ϕ⃗ →_A σ′ ψ⃗.
3. If α →_A α′ and β →_A β′, then α; β →_A α′; β′.
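The three closure clauses for the accessibility relation can be read as a recursive successor function on simple actions. The following is only a minimal sketch under stated assumptions: the tuple encoding of actions, the two-type signature `SIGNATURE` (with an illustrative trivial type called `tau`) and the agent names `a` and `b` are hypothetical and do not come from the paper. Since no clause mentions crash, the smallest relation gives it no arrows.

```python
from collections import deque

# Hypothetical signature: SIGNATURE[agent][type] is the set of action types
# that the agent considers possible when `type` is in fact happening.
SIGNATURE = {
    "a": {"pri": {"pri"}, "tau": {"tau"}},  # insider: sees the announcement
    "b": {"pri": {"tau"}, "tau": {"tau"}},  # outsider: thinks nothing happened
}

SKIP = ("skip",)
CRASH = ("crash",)


def basic(sigma, *args):
    """A basic action: an action type applied to a tuple of sentences."""
    return ("basic", sigma, tuple(args))


def seq(alpha, beta):
    """Sequential composition alpha; beta."""
    return ("seq", alpha, beta)


def successors(agent, action):
    """All simple actions reachable from `action` by one arrow of `agent`."""
    kind = action[0]
    if kind == "skip":   # clause 1: skip -> skip
        return {SKIP}
    if kind == "crash":  # no clause mentions crash, so it has no arrows
        return set()
    if kind == "basic":  # clause 2: follow the signature, keep the arguments
        _, sigma, args = action
        return {("basic", tau, args) for tau in SIGNATURE[agent].get(sigma, set())}
    if kind == "seq":    # clause 3: componentwise, for the same agent
        _, alpha, beta = action
        return {
            ("seq", a2, b2)
            for a2 in successors(agent, alpha)
            for b2 in successors(agent, beta)
        }
    raise ValueError(f"unknown action {action!r}")


def reachable(action):
    """Every action reachable in zero or more steps along any agent's arrows."""
    seen, queue = {action}, deque([action])
    while queue:
        current = queue.popleft()
        for agent in SIGNATURE:
            for nxt in successors(agent, current):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen
```

With a finite signature, `reachable` terminates on every simple action, which is the computational content of local finiteness in this toy encoding.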

PROPOSITION 5.1. As a frame, Ω is locally finite: for each simple α, there are only finitely many β such that α →_A^∗ β.

Proof. By induction on α; we use heavily the fact that the accessibility relations on Ω are the smallest family with their defining property. For the simple action expressions σ ψ⃗, we use the assumption that all our signatures Σ ∈ S are finite and mutually disjoint.

Next, we define PRE : Ω → L(S) by recursion so that

\[
\begin{array}{rcll}
\mathrm{PRE}(\mathit{skip}) &=& \mathit{true} \\
\mathrm{PRE}(\mathit{crash}) &=& \mathit{false} \\
\mathrm{PRE}(\sigma_i \vec{\psi}) &=& \psi_i & \text{for } \sigma_i \text{ in the given listing of } \Sigma \\
\mathrm{PRE}(\sigma \vec{\psi}) &=& \mathit{true} & \text{for all trivial types } \sigma \\
\mathrm{PRE}(\alpha;\beta) &=& \langle\alpha\rangle\,\mathrm{PRE}(\beta)
\end{array}
\]
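The recursion defining PRE can be sketched as a small function that returns a precondition as a formula string. This is an illustration only: the tuple encoding of simple actions, the helper `show`, and the string rendering of ⟨α⟩ as `<...>` are assumptions made for the example, and `nontrivial` stands for the signature's ordered list of non-trivial action types.

```python
def show(action):
    """Render a simple action (hypothetical tuple encoding) as a string."""
    kind = action[0]
    if kind in ("skip", "crash"):
        return kind
    if kind == "basic":
        _, sigma, args = action
        return f"{sigma}({', '.join(args)})"
    _, alpha, beta = action  # sequential composition
    return f"{show(alpha)}; {show(beta)}"


def pre(action, nontrivial):
    """Return PRE(action) as a formula string.

    `nontrivial` is the ordered list of non-trivial action types; a basic
    action carries one argument sentence per entry of that list.
    """
    kind = action[0]
    if kind == "skip":
        return "true"
    if kind == "crash":
        return "false"
    if kind == "basic":
        _, sigma, args = action
        if sigma in nontrivial:
            # PRE(sigma_i psi_1..psi_n) = psi_i
            return args[nontrivial.index(sigma)]
        return "true"  # trivial action types are always executable
    if kind == "seq":
        _, alpha, beta = action
        # PRE(alpha; beta) = <alpha> PRE(beta)
        return f"<{show(alpha)}>{pre(beta, nontrivial)}"
    raise ValueError(f"unknown action {action!r}")
```

For instance, with `nontrivial = ["pri"]`, the action `("basic", "pri", ("p",))` has precondition `p`, while a trivial type applied to the same argument has precondition `true`.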

REMARK. This function PRE should not be confused with the function pre which is part of the structure of a program model. PRE(σ) is a sentence in our language L(S), while pre was a purely semantic notion, associating propositions to simple actions in a program model. However, there is a connection: we are in the midst of defining the program model Ω, and its pre function is defined in terms of PRE. This is also perhaps a good place to remind the reader that neither PRE nor pre is a first-class symbol in L(S); they are defined symbols.

Completing the Definition of Ω. We set pre(σ) = [[PRE(σ)]].

This action model is the canonical (epistemic) action model; it plays a somewhat similar role in our work to the canonical model in modal logic.

PROPOSITION 5.2 (see Baltag et al. (1998) and Miller and Moss (2003)). For every family S of action signatures, the logical systems for L0(S) and L1(S) presented in Figure 8 are sound, complete, and decidable. However, for every signature Σ which contains a (copy of the) “public announcement” action type Pub, the full logic L(Σ) (including iteration π∗) is undecidable and the proof system for L(Σ) presented in Figure 8 is sound but incomplete. Indeed, validity in L(Σ) is Π¹₁-complete, so there are no recursive axiomatizations which are complete. (The same obviously applies to L(S) for any family of signatures S such that Σ ∈ S.)

5.2. Some Derivable Principles

LEMMA 5.3. The Action-Knowledge Axiom is provable for all simple actions α:

\[
[\alpha]\,\Box_A \varphi \;\leftrightarrow\; \Bigl(\mathrm{PRE}(\alpha) \rightarrow \bigwedge \bigl\{\Box_A [\beta]\,\varphi \;:\; \alpha \xrightarrow{A} \beta \text{ in } \Omega\bigr\}\Bigr) \tag{10}
\]

The proof is by induction on α and may be found in Baltag et al. (2003).

LEMMA 5.4. For all A ∈ C, all simple α, and all β such that α →_A β,

1. [α]□∗_C ψ → [α]ψ.
2. [α]□∗_C ψ ∧ PRE(α) → □_A[β]□∗_C ψ.

Proof. Part (1) follows easily from the Epistemic Mix Axiom and modal reasoning. For part (2), we start with a consequence of the Epistemic Mix Axiom: □∗_C ψ → □_A □∗_C ψ. Then by modal reasoning, [α]□∗_C ψ → [α]□_A □∗_C ψ. By the Action-Knowledge Axiom generalized as we have it in Lemma 5.3, we have [α]□∗_C ψ ∧ PRE(α) → □_A[β]□∗_C ψ.

LEMMA 5.5 (The Common Knowledge Induction Rule). From χ → ψ ∧ □_A χ for all A, infer χ → □∗_A ψ.

Proof. We apply the Action Rule to the simple action skip, recalling that PRE(skip) = true, and skip →_A skip for all A.

5.3. Logical Systems for the Target Logics

We presented a number of target logics in Section 2.4, and these were then formalized in Section 4.1. In particular, we have logics L1(Σ) for a number of interesting action signatures Σ. What we want to do here is to spell out what the axioms of L1(S) come to when we specialize the general logic to the logics of special interest. In doing this, we find it convenient to adopt simpler notations tailored for the fragments.

The logic of public announcements is shown in Figure 9. We only included the axioms and rule of inference that specifically used the structure of the signature Σ_pub. So we did not include the sentential validities, the normality axiom for □_A, the composition axiom, modus ponens, etc. Also, we renamed the main axiom and rule to emphasize the “announcement” aspect of the system.


Figure 9. The main points of the logic of public announcements.

Our next logic is the logic of completely private announcements to groups. We discussed the formalization of this in Section 4.4. We have actions Pri^B ϕ and (of course) skip. The axioms and rules are just as in the logic of public announcements, with a few changes. We must of course consider the relativized operators [Pri^B ϕ] instead of their simpler counterparts [Pub ϕ]. The actions skip all have true as their precondition, and since (true → ψ) is logically equivalent to ψ, we certainly may omit these actions from the notation in the axioms and rules. The most substantive change which we need to make in Figure 9 concerns the Action-Knowledge Axiom. It splits into two axioms, noted below:

\[
\begin{array}{ll}
[\mathrm{Pri}^B \varphi]\,\Box_A \psi \;\leftrightarrow\; (\varphi \rightarrow \Box_A [\mathrm{Pri}^B \varphi]\,\psi) & \text{for } A \in B \\
[\mathrm{Pri}^B \varphi]\,\Box_A \psi \;\leftrightarrow\; (\varphi \rightarrow \Box_A \psi) & \text{for } A \notin B
\end{array}
\]

The last equivalence says: assuming that ϕ is true, then after a private announcement of ϕ to the members of B, an outsider knows ψ just in case she knew ψ before the announcement.

Finally, we study the logic of common knowledge of alternatives. This, too, was introduced in Section 2.4 and formalized in Section 4.1. The Action-Knowledge Axiom now becomes

\[
\begin{array}{ll}
[\mathrm{Cka}^B \vec{\varphi}]\,\Box_A \psi \;\leftrightarrow\; (\varphi_1 \rightarrow \Box_A [\mathrm{Cka}^B \vec{\varphi}]\,\psi) & \text{for } A \in B \\
[\mathrm{Cka}^B \vec{\varphi}]\,\Box_A \psi \;\leftrightarrow\; \bigl(\varphi_1 \rightarrow \bigwedge_{0 \le i \le k} \Box_A [\mathrm{Cka}^B \vec{\varphi}^{\,i}]\,\psi\bigr) & \text{for } A \notin B
\end{array}
\]

where in the last clause, (ϕ_1, . . . , ϕ_k)^i is the sequence ϕ_i, ϕ_1, . . . , ϕ_{i−1}, ϕ_{i+1}, . . . , ϕ_k. (That is, we bring ϕ_i to the front of the sequence.)


5.4. Examples in the Target Logics

This section studies some examples of the logic at work. We begin with an application of the Announcement Rule in the logic of public announcements. We again work with the atomic sentences H and T for heads and tails, and with the set {A, B} of agents. We show

□∗_{A,B}(H ↔ ¬T) → [Pub H]□∗_{A,B} ¬T.

That is, on the assumption that it is common knowledge that heads and tails are mutually exclusive, as a result of a public announcement of heads it will be common knowledge that the state is not tails. We give this application in detail. Recall that Σ_pub has one simple action which we call Pub. We take χ_Pub to be □∗_{A,B}(H ↔ ¬T). In addition, Pub →_A Pub for all A, and there are no other arrows in Σ_pub. We take α to be Pub H; note that this is the only action accessible from itself in the canonical action model. To use the Announcement Rule, we must show that

1. □∗_{A,B}(H ↔ ¬T) → [Pub H]¬T.
2. (□∗_{A,B}(H ↔ ¬T) ∧ H) → □_A □∗_{A,B}(H ↔ ¬T), and the same with B replacing A.

From these assumptions, we may infer [Pub H]□∗_{A,B} ¬T. For the first statement,

(a) T ↔ [Pub H]T    Atomic Permanence
(b) (H ↔ ¬T) → (H → ¬[Pub H]T)    (a), propositional reasoning
(c) [Pub H]¬T ↔ (H → ¬[Pub H]T)    Partial Functionality
(d) □∗_{A,B}(H ↔ ¬T) → (H ↔ ¬T)    Epistemic Mix
(e) □∗_{A,B}(H ↔ ¬T) → [Pub H]¬T    (d), (b), (c), propositional reasoning
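The sentence derived in this example can also be checked semantically on small models. The sketch below is an illustration, not part of the proof system: it assumes the usual reading of a public announcement as restriction of the state model to the worlds where the announced sentence holds, reads common knowledge as truth at every world reachable in zero or more steps along the agents' arrows, and exhaustively tests all two-world models for two agents. All function names are hypothetical.

```python
from itertools import product

WORLDS = (0, 1)
PAIRS = list(product(WORLDS, repeat=2))


def all_relations():
    """Enumerate every binary relation on WORLDS as a set of pairs."""
    for bits in product((False, True), repeat=len(PAIRS)):
        yield {pair for pair, keep in zip(PAIRS, bits) if keep}


def reachable(w, relations, domain):
    """Worlds reachable from w in zero or more steps, staying inside domain."""
    seen, stack = {w}, [w]
    while stack:
        u = stack.pop()
        for rel in relations:
            for (x, y) in rel:
                if x == u and y in domain and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen


def holds_everywhere(val, rel_a, rel_b):
    """Check CK(H <-> not T) -> [Pub H] CK(not T) at every world.

    val[w] is the pair (H true at w, T true at w).
    """
    relations = (rel_a, rel_b)
    for w in WORLDS:
        before = reachable(w, relations, set(WORLDS))
        if not all(val[v][0] != val[v][1] for v in before):
            continue  # antecedent fails at w
        if not val[w][0]:
            continue  # H is false at w, so [Pub H] holds vacuously
        h_worlds = {v for v in WORLDS if val[v][0]}  # model after announcing H
        if any(val[v][1] for v in reachable(w, relations, h_worlds)):
            return False
    return True


def check_all_two_world_models():
    valuations = product(product((False, True), repeat=2), repeat=len(WORLDS))
    return all(
        holds_everywhere(val, rel_a, rel_b)
        for val in valuations
        for rel_a in all_relations()
        for rel_b in all_relations()
    )
```

An exhaustive check of this kind is only a sanity test on tiny models; the derivation (a) to (e) is what establishes provability in the system.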

And the second statement is an easy consequence of the Epistemic Mix Axiom.

What Happens when a Publicly Known Fact is Announced? One intuition about public announcements and common knowledge is that if ϕ is common knowledge, then announcing ϕ publicly does not change anything. Formally, we express this by a scheme rather than a single equation:

\[
\Box^{*}\varphi \rightarrow \bigl([\mathrm{Pub}\ \varphi]\,\psi \leftrightarrow \psi\bigr) \tag{11}
\]

(In this line and in the rest of this section, we are omitting the subscripts on the □∗ operator. More formally, the subscript should be A, since we are


dealing with knowledge which is common to all agents.) What we would like to say is □∗ϕ → ∀ψ([Pub ϕ]ψ ↔ ψ), but of course this cannot be expressed in our language. So we consider only the sentences of the form (11), and we show that all of these are provable.

We argue by induction on ψ. For an atomic sentence p, (11) follows from the Epistemic Mix and Atomic Permanence Axioms. The induction steps for ∧ and ¬ are easy. Next, assume (11) for ψ. By necessitation and Epistemic Mix, we have

□∗ϕ → (□_A[Pub ϕ]ψ ↔ □_A ψ)

Note also that by the Announcement-Knowledge Axiom

□∗ϕ → ([Pub ϕ]□_A ψ ↔ □_A[Pub ϕ]ψ)

These two imply (11) for □_A ψ.

Finally, we assume (11) for ψ and prove it for □∗_B ψ. We show first that □∗ϕ ∧ □∗ψ → [Pub ϕ]□∗ψ. For this we use the Action Rule. We must show that

(a) □∗ϕ ∧ □∗ψ → [Pub ϕ]ψ.
(b) □∗ϕ ∧ □∗ψ ∧ ϕ → □_A(□∗ϕ ∧ □∗ψ).

(Actually, since our common knowledge operators □∗ here are really of the form □∗_A, we need (b) for all agents A.) Point (a) is easy from our induction hypothesis, and (b) is an easy consequence of Epistemic Mix.

To conclude, we show □∗ϕ ∧ [Pub ϕ]□∗ψ → □∗ψ. For this, we use the Common Knowledge Induction Rule of Lemma 5.5; that is, we show

(c) □∗ϕ ∧ [Pub ϕ]□∗ψ → ψ.
(d) □∗ϕ ∧ [Pub ϕ]□∗ψ → □_A(□∗ϕ ∧ [Pub ϕ]□∗ψ) for all A.

For (c), we use Lemma 5.4, part (1) to see that [Pub ϕ]□∗ψ → [Pub ϕ]ψ; and now (c) follows from our induction hypothesis. For (d), it will be sufficient to show that

ϕ ∧ [Pub ϕ]□∗ψ → □_A[Pub ϕ]□∗ψ

This follows from Lemma 5.4, part (2).


A Commutativity Principle for Private Announcements. Suppose that B and C are disjoint sets of agents. Let ϕ_1, ϕ_2, and ψ be sentences. Then we claim that

[Pri^B ϕ_1][Pri^C ϕ_2]ψ ↔ [Pri^C ϕ_2][Pri^B ϕ_1]ψ.

That is, order does not matter with private announcements to disjoint groups.

Actions Do Not Change Common Knowledge of Non-epistemic Sentences. For yet another application, let ψ be any boolean combination of atomic sentences. Then for all actions α of any of our logics, ψ ↔ [α]ψ. The proof is an easy induction on ψ. Even more, we have □∗_C ψ ↔ [α]□∗_C ψ. In one direction, we use the Action Rule, and in the other, the Common Knowledge Induction Rule (Lemma 5.5).

Endnotes. Although the general logical systems in this paper are new, there are important precursors for the target logics. Plaza (1989) constructs what we would call L0(Σ_Pub), that is, the logic of public announcements without common knowledge operators or program iteration. (He worked only on models where each accessibility relation is an equivalence relation, so his system includes the S5 axioms.) Gerbrandy (1999a, b), and also Gerbrandy and Groeneveld (1997), went a bit further. They studied the logic of completely private announcements (generalizing public announcements) and presented a logical system which included the common knowledge operators. That is, their system included the Epistemic Mix Axiom. They argued that all of the reasoning in the original Muddy Children scenario can be carried out in their system. This is important because it shows that in order to get a formal treatment of that problem and related ones, one need not posit models which maintain histories. Their system was not complete since it did not have anything like the Action Rule; this first appears in a slightly different form in Baltag (2003).

5.5. Conclusion

We have been concerned with actions in the social world that affect the intuitive concepts of knowledge, (justifiable) beliefs, and common knowledge. This paper has shown how to define and study logical languages that contain constructs corresponding to such actions. The many examples in this paper show that the logics “work”. Much more can be said about specific tricky examples, but we hope that the examples connected to our scenarios make the point that we are developing valuable tools.


The key step in the development is the recognition that we can associate to a social action α a mathematical model, namely a program model. In particular, it is a multi-agent Kripke model, so it has features in common with the state models that underlie formal work in the entire area. There is a natural operation of update product at the heart of our work. This operation is surely of independent interest because it enables one to build complex and interesting state models. The logical languages that we introduce use the update product in their semantics, but the syntax is a small variation on propositional dynamic logic. The formalization of the target languages involved the signature-based languages L(Σ) and also their generalizations L(S). These latter languages are needed to formulate the logic of private announcements, for example. We feel that presenting the update product first (before the languages) will make this paper easier to read, and having a relatively standard syntax should also help. Once we have our languages, the next natural step is to study them. This paper presented logical systems for validities, omitting many proofs due to lack of space.

NOTES

1 It is important for us that the sentence p be a syntactic object, while the proposition p be a semantic object. See Section 2.5 for further discussion.
2 The subscript 3 comes from the number of the scenario; we shall speak of corresponding models S1, S2, etc., and each time the models will be the ones pictured in Section 1.1.
3 We are writing relational composition in left-to-right order in this paper.
4 In Section 4.5, we shall consider a slightly simpler model of lying.
5 To make this into a set, instead of a proper class, we really mean to take all finite signatures whose action types are natural numbers, and then take the disjoint union of this countable set of finite signatures.

REFERENCES

Baltag, Alexandru: 1999, ‘A Logic of Epistemic Actions’, (Electronic) Proceedings of the FACAS workshop, held at ESSLLI’99, Utrecht University, Utrecht.
Baltag, Alexandru: 2001, ‘Logics for Insecure Communication’, in J. van Benthem (ed.) Proceedings of the Eighth Conference on Rationality and Knowledge (TARK’01), Morgan Kaufmann, Los Altos, pp. 111–122.
Baltag, Alexandru: 2002, ‘A Logic for Suspicious Players: Epistemic Actions and Belief Updates in Games’, Bulletin of Economic Research 54(1), 1–46.
Baltag, Alexandru: 2003, ‘A Coalgebraic Semantics for Epistemic Programs’, in Proceedings of CMCS’03, Electronic Notes in Theoretical Computer Science 82(1), 315–335.
Baltag, Alexandru: 2003, Logics for Communication: Reasoning about Information Flow in Dialogue Games. Course presented at NASSLLI’03. Available at http://www.indiana.edu/∼nasslli.
Baltag, Alexandru, Lawrence S. Moss, and Sławomir Solecki: 1998, ‘The Logic of Common Knowledge, Public Announcements, and Private Suspicions’, in I. Gilboa (ed.), Proceedings of the 7th Conference on Theoretical Aspects of Rationality and Knowledge (TARK’98), pp. 43–56.
Baltag, Alexandru, Lawrence S. Moss, and Sławomir Solecki: 2003, ‘The Logic of Epistemic Actions: Completeness, Decidability, Expressivity’, manuscript.
Fagin, Ronald, Joseph Y. Halpern, Yoram Moses, and Moshe Y. Vardi: 1996, Reasoning About Knowledge, MIT Press.
Fischer, Michael J. and Richard E. Ladner: 1979, ‘Propositional Modal Logic of Programs’, J. Comput. System Sci. 18(2), 194–211.
Gerbrandy, Jelle: 1999a, ‘Dynamic Epistemic Logic’, in Lawrence S. Moss, et al. (eds), Logic, Language, and Information, Vol. 2, CSLI Publications, Stanford University.
Gerbrandy, Jelle: 1999b, Bisimulations on Planet Kripke, Ph.D. dissertation, University of Amsterdam.
Gerbrandy, Jelle and Willem Groeneveld: 1997, ‘Reasoning about Information Change’, J. Logic, Language, and Information 6, 147–169.
Gochet, P. and P. Gribomont: 2003, ‘Epistemic Logic’, manuscript.
Kooi, Barteld P.: 2003, Knowledge, Chance, and Change, Ph.D. dissertation, University of Groningen.
Meyer, J.-J. and W. van der Hoek: 1995, Epistemic Logic for AI and Computer Science, Cambridge University Press, Cambridge.
Miller, Joseph S. and Lawrence S. Moss: 2003, ‘The Undecidability of Iterated Modal Relativization’, Indiana University Computer Science Department Technical Report 586.
Moss, Lawrence S.: 1999, ‘From Hypersets to Kripke Models in Logics of Announcements’, in J. Gerbrandy et al. (eds), JFAK. Essays Dedicated to Johan van Benthem on the Occasion of his 60th Birthday, Vossiuspers, Amsterdam University Press.
Plaza, Jan: 1989, ‘Logics of Public Communications’, Proceedings, 4th International Symposium on Methodologies for Intelligent Systems.
Pratt, Vaughn R.: 1976, ‘Semantical Considerations on Floyd-Hoare Logic’, in 7th Annual Symposium on Foundations of Computer Science, IEEE Comput. Soc., Long Beach, CA, pp. 109–121.
van Benthem, Johan: 2000, ‘Update Delights’, manuscript.
van Benthem, Johan: 2002, ‘Games in Dynamic Epistemic Logic’, Bulletin of Economic Research 53(4), 219–248.
van Benthem, Johan: 2003, ‘Logic for Information update’, in J. van Benthem (ed.) Proceedings of the Eighth Conference on Rationality and Knowledge (TARK’01), Morgan Kaufmann, Los Altos, pp. 51–68.
van Ditmarsch, Hans P.: 2000, ‘Knowledge Games’, Ph.D. dissertation, University of Groningen.
van Ditmarsch, Hans P.: 2001, ‘Knowledge Games’, Bulletin of Economic Research 53(4), 249–273.
van Ditmarsch, Hans P., W. van der Hoek, and B. P. Kooi: 2003, in V. F. Hendricks et al. (eds), Concurrent Dynamic Epistemic Logic, Synt. Lib. vol. 322, Kluwer Academic Publishers.


Alexandru Baltag
Oxford University Computing Laboratory
Oxford, OX1 3QD, U.K.

Lawrence S. Moss
Mathematics Department
Indiana University
Bloomington, IN 47405, U.S.A.

HANS ROTT

A COUNTEREXAMPLE TO SIX FUNDAMENTAL PRINCIPLES OF BELIEF FORMATION

ABSTRACT. In recent years there has been a growing consensus that ordinary reasoning does not conform to the laws of classical logic, but is rather nonmonotonic in the sense that conclusions previously drawn may well be removed upon acquiring further information. Even so, rational belief formation has up to now been modelled as conforming to some fundamental principles that are classically valid. The counterexample described in this paper suggests that a number of the most cherished of these principles should not be regarded as valid for commonsense reasoning. An explanation of this puzzling failure is given, arguing that a problem in the theory of rational choice transfers to the realm of belief formation.

Synthese 139: 225–240, 2004. Knowledge, Rationality & Action 61–76, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

1. INTRODUCTION

Part of the cognitive state of a person is characterized by the set of her beliefs and expectations. By the term ‘belief formation’, we will refer to two different kinds of processes. The first one is that of drawing inferences from a given set of sentences. Ideally, as a result of this process the reasoner arrives at a well-balanced set of beliefs in reflective equilibrium, sometimes referred to as the ‘belief set’. The second process is that of readjusting one’s belief set in response to some perturbation from outside (‘belief transformation’ might be a better term in this case). We will consider two subspecies of belief change that may be triggered by external perturbations. If there is some cognitive ‘input’, a sentence to be accepted, we speak of a belief revision. If the reasoner has to withdraw one of her beliefs, without accepting another belief in its place, she performs a belief contraction. We will be dealing with the three topics of inference, revision and contraction in turn. For each of them, we consider two qualitative principles that have up to now been regarded as very plausible ones. We shall then tell a story with a few alternative developments which is intended to show that all of these principles fail. They do not fail due to the contingencies of some particular system that is being proposed, but indeed as norms for good reasoning. Based on recent reconstructions of belief formation in terms of the theory of rational choice, we give an explanation


of why these principles fail. It turns out that well-known problems of this very general theory transfer to the special case where the theory is applied in operations of belief formation. In fact, it will turn out that this special case has features that block one of the standard excuses for the problem at hand. We end up in a quandary that poses a serious challenge to any future conception of belief formation procedures. For a long time, the notion of inference has been thought to be identical with the notion of logical consequence or deduction. Partly as a result of the problems encountered in research in artiﬁcial intelligence and knowledge representation during the 1960s and 1970s, however, logicians have come to realize that most of our reasoning proceeds on the basis of incomplete knowledge and insufﬁcient evidence. Implicit assumptions about the normal state and development of the world, also known as expectations, presumptions, prejudices or defaults, step in to ﬁll the gaps in the reasoner’s body of knowledge. These default assumptions form the context for ordinary reasoning processes. They help us to generate conclusions that are necessary for reaching decisions about how to act, but they are retractible if further evidence arises. Thus our inferences will in many contexts be defeasible or non-monotonic in the sense that an extension of the set of premises does not generally result in an increase of the set of legitimate conclusions. This, however, does not mean that the classical concept of logical consequence gets useless, or that our reasoning gets completely irregular. For the purposes of this paper, we can in fact assume1 that the set of our beliefs (and similarly: the set of our expectations) is consistent and closed with respect to some broadly classical consequence operation Cn. 
This combined notion of logical coherence (consistency-cum-closure) may be viewed as a constraint that makes the processes of inference and belief change a non-trivial task.2

2. SIX FUNDAMENTAL PRINCIPLES OF BELIEF FORMATION

2.1. In the last two decades, a great number of systems for non-monotonic reasoning have been devised that are supposed to cope with the newly discovered challenge.3 Many classical inference patterns are violated by such systems, but it is equally important to keep in mind that quite a number of classical inference patterns do remain valid. Let us now have a look at two properties that have usually been taken to be constitutive of sound reasoning with logical connectives like ‘and’ and ‘or’ even in the absence of monotonicity. First, if the premise x allows the reasoner to conclude that y is true, then y may be conjoined to the premise x, without spoiling any of the


conclusions that x alone permits to be drawn. This severely restricted form of the classical monotony condition is usually called Cumulative Monotony or Cautious Monotony:

(1) If y is in Inf(x), then Inf(x) ⊆ Inf(x ∧ y).

Here Inf(x) denotes the set of all conclusions that can be drawn if the premise is x. The reasoner may in fact possess an arbitrary finite number of premises which are conjunctively tied together in x. More importantly, Inf(x) is meant to denote what can be obtained if x is all the information available to the agent.

Second, if a reasoner wants to know what to infer from a disjunction x ∨ y, she may reason by cases. She will consider first what would hold on the assumption of x, and then consider what would hold on the assumption of y. Any sentence that may be inferred in both of these cases should be identifiable as a conclusion of an inference starting from x ∨ y. This is the content of a condition called Disjunction in the Premises:

(2) Inf(x) ∩ Inf(y) ⊆ Inf(x ∨ y).
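To see what conditions (1) and (2) demand in a concrete setting, one can test them on a toy inference operation. The sketch below is an illustration only and makes several assumptions not found in the text: sentences are identified with sets of possible worlds, a hypothetical ranking `RANK` orders worlds by plausibility, and Inf(x) is taken to contain every proposition that holds in all of the most plausible x-worlds (a ranked-model reading that is standard in the non-monotonic logic literature).

```python
from itertools import combinations

WORLDS = ("w1", "w2", "w3", "w4")
# Hypothetical plausibility ranking: lower rank means more normal.
RANK = {"w1": 0, "w2": 1, "w3": 1, "w4": 2}


def minimal(x):
    """The most plausible worlds of proposition x (empty if x is empty)."""
    if not x:
        return frozenset()
    best = min(RANK[w] for w in x)
    return frozenset(w for w in x if RANK[w] == best)


def propositions():
    """All propositions, i.e. all subsets of WORLDS."""
    return [
        frozenset(c)
        for r in range(len(WORLDS) + 1)
        for c in combinations(WORLDS, r)
    ]


def infers(x, y):
    """y is in Inf(x): y holds in every most plausible x-world."""
    return minimal(x) <= y


def inf(x):
    """Inf(x) as a set of propositions."""
    return {y for y in propositions() if infers(x, y)}


def cumulative_monotony_holds():
    """Condition (1): if y is in Inf(x), then Inf(x) is a subset of Inf(x and y)."""
    props = propositions()
    return all(inf(x) <= inf(x & y) for x in props for y in props if infers(x, y))


def disjunction_in_premises_holds():
    """Condition (2): Inf(x) intersected with Inf(y) is a subset of Inf(x or y)."""
    props = propositions()
    return all((inf(x) & inf(y)) <= inf(x | y) for x in props for y in props)
```

Both checks succeed for this ranking. That is a feature of the ranked-model construction assumed here, and it says nothing about whether the two conditions are correct norms for an actual reasoner.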

Cumulative Monotony and Disjunction in the Premises hold in most defeasible reasoning systems that have been proposed since non-monotonic logic came into being, and especially in those systems that are semantically well-motivated. There is one important and striking exception. Reiter’s (1980) seminal system of Default Logic violates both Cumulative Monotony and Disjunction in the Premises. However, no advocate of Reiter’s logic has ever argued that Cumulative Monotony and Disjunction in the Premises should be violated. These violations have usually been taken to be defects of the system that need to be remedied.4 Conditions (1) and (2) have never lost their normative force.

2.2. Let us now turn to the revision of belief sets in response to new information. If someone has to incorporate a conjunction x ∧ y, she has to accept both x and y. One idea how to go about revising by the conjunction is to revise first with x. If it so happens that y is accepted in the resulting belief set, then one should be sure that every belief contained in this set is also believed after a revision of the original belief set by x ∧ y. In the following, B ∗ x denotes the set of beliefs held after revising the initial belief set B by x (and likewise for the input sentence x ∧ y).

(3) If y is in B ∗ x, then B ∗ x ⊆ B ∗ (x ∧ y).

Another approach to circumscribing the result of a revision by the conjunction x ∧ y is to revise first with x, and then to just add y set-theoretically


and take the logical consequences of everything taken together. This is not always a good idea, since y may be inconsistent with B ∗ x, and thus the second step would leave us with the inconsistent set of all sentences. But even if we may end up with too many sentences, this strategy seems unobjectionable if it is taken as yielding an upper bound for the revision by a conjunction. This is the content of principle

(4) B ∗ (x ∧ y) ⊆ Cn((B ∗ x) ∪ {y}).

2.3. Finally, we consider the removal of beliefs. Here again, we focus on upper and lower bounds of changes with respect to conjunctions. If a person wants to remove effectively a conjunction x ∧ y, she has to remove at least one of the conjuncts, that is, either x or y. So if the second conjunct y is still retained in the result of removing the conjunction, what has happened is exactly that the first conjunct x has been removed. We will be content here with a weaker condition that replaces the identity by an inclusion. Here and elsewhere, B ∸ x denotes the set of beliefs that are retained after withdrawing x from the initial belief set B (and likewise for the case where x ∧ y is to be discarded):

(5) If y is in B ∸ (x ∧ y), then B ∸ (x ∧ y) ⊆ B ∸ x.

Another approach to circumscribing the result of a contraction with respect to the conjunction x ∧ y is to consider first what would be the result of removing x, and then consider what would be the result of removing y. It is not always necessary to take into account both possibilities, but doing so should certainly be suitable for setting a lower bound. Any sentence that survives both of these thought experiments should surely be included in the result of the removal of x ∧ y. This is the content of principle

(6) (B ∸ x) ∩ (B ∸ y) ⊆ B ∸ (x ∧ y).

Principles (3)–(6) have been endorsed almost universally in the literature on belief revision and contraction. The classic standard was set by Alchourrón, Gärdenfors and Makinson (1985). Conditions (4) and (6) are the seventh of their eight ‘rationality postulates’ for revision and contraction; conditions (3) and (5) are considerably weaker, and thus considerably less objectionable, variants of their eighth postulates.5

There exist sophisticated ‘translations’ between operations of nonmonotonic inference, belief revision and removal which show that notwithstanding different appearances, conditions (1), (3) and (5) are essentially different sides of the same (three-faced) coin, as are conditions (2), (4) and (6).6


Some methods of belief formation suggested in the literature violate one or the other of the six principles. Nevertheless, it is fair to say that these principles have retained their great intuitive appeal and have stood fast up to the present day as norms to which all good reasoning is supposed to conform. But this is wrong, or so I shall argue. The following section presents an example that shows, I think, that not a single one of the six principles listed above ought to be endorsed as a valid principle of rational belief formation.

3. THE COUNTEREXAMPLE

The story goes as follows. A well-known philosophy department has announced an open position in metaphysics. Among the applicants for the job there are a few persons that Paul, an interested bystander, happens to know. First, there is Amanda Andrews, an outstanding specialist in metaphysics. Second, we have Bernice Becker, who is also definitely a very good, though not quite as excellent a metaphysician as Andrews. Becker has in addition done some substantial work in logic. A third applicant is Carlos Cortez. He has a comparatively slim record in metaphysics, but he is widely recognized as one of the most brilliant logicians of his generation. Suppose that Paul’s initial set of beliefs and expectations includes that neither Andrews nor Becker nor Cortez will get the job (say, because Paul and everybody else thinks that Don Doyle, a star metaphysician, is the obvious candidate who is going to get the position anyway). Paul is aware of the fact that only one of the contenders can get the job.

3.1. Consider now three hypothetical scenarios, each of which describes a potential development of the selection procedure. The scenarios are not meant as describing consecutive stages of a single procedure. At most one of the potential scenarios can turn out to become real. In each of these alternative scenarios, Paul is genuinely taken by surprise, because he learns that one of the candidates he had believed to be turned down will, or at least may, be offered the position. (Doyle, by the way, has told the department that he has accepted an offer from Berkeley.)

To make things shorter, we introduce some abbreviations. Let the letters a, b and c stand for the sentences that Andrews, Becker and Cortez, respectively, will be offered the position. Paul is having lunch with the dean, a very competent, serious and profoundly honest man who is also the chairman of the selection committee.


SCENARIO 1. The dean tells Paul in confidence that either Andrews or Becker will be appointed. This message comes down to supplying Paul with the premise a ∨ b. Given this piece of information, Paul concludes that Andrews will get the job. This conclusion is based on his background assumptions that Andrews has superior qualities as a metaphysician and that expertise in the field advertised is the decisive criterion for the appointment. From his background knowledge that there is only one position available, Paul further infers that all the other candidates are going to return empty-handed.

SCENARIO 2. In this scenario the dean tells Paul that either Andrews or Becker or Cortez will get the job, thus supplying him with the premise a ∨ b ∨ c. This piece of information triggers off a rather subtle line of reasoning. Knowing that Cortez is a splendid logician, but that he can hardly be called a metaphysician, Paul comes to realize that his background assumption that expertise in the field advertised is the decisive criterion for the appointment cannot be upheld. Apparently, competence in logic is regarded as a considerable asset by the selection committee. Still, Paul keeps on believing that Cortez will not make it in the end, because his credentials in metaphysics are just too weak. Since, however, logic appears to contribute positively to a candidate’s research profile, Paul concludes that Becker, and not Andrews, will get the job.

This qualitative description should do for our purposes, but for those who prefer the precision of numbers, the following elaboration of our story can be given (see Figure 1). Suppose that the selection committee has decided to assign numerical values in order to evaluate the candidates’ work. Andrews scores 97 out of 100 in metaphysics, but she has done no logic whatsoever, so she scores 0 here. Becker scores 92 in metaphysics and a respectable 50 in logic. Cortez scores only 40 in metaphysics, but boasts of 99 in logic.
In scenario 1, Paul takes it that metaphysics is the only criterion, so clearly Andrews must be the winner in his eyes. But in scenario 2, Paul gathers that, rather unexpectedly, logic has some importance. As can easily be verified, any weight he may wish to attach to the logic score between 1/10 and 1/2 (with metaphysics taking the rest) will see Becker ending up in front of both Andrews and Cortez.

Figure 1. Becker is ahead of both Andrews and Cortez if logic is assigned a weight between 0.1 and 0.5.

SCENARIO 3. This is a very surprising scenario in which Paul is told that Cortez is actually the only serious candidate left in the competition. There is little need to invest a lot of thinking. Paul accepts c in this case.

Let us summarize the scenarios as regards the conclusions Paul would draw from the various premises that he may get from the dean of the faculty. In scenario 1, Paul infers from a ∨ b that a and ¬b (along with ¬c and ¬d, where d stands for Doyle’s being offered the position; we will not mention these any more). In scenario 2, he infers from a ∨ b ∨ c that ¬a and b. In scenario 3, he infers from c that ¬a and ¬b.

Now we first find that this situation does not conform to Cumulative Monotony. Substitute a ∨ b ∨ c for x and a ∨ b for y in (1). Even though Paul concludes that a ∨ b is true on the basis of the premise a ∨ b ∨ c, it is not the case that everything inferable from the latter is also inferable from (a ∨ b ∨ c) ∧ (a ∨ b), which is equivalent to a ∨ b. Sentences ¬a and b are counterexamples.

Second, the example at the same time shows that Disjunction in the Premises does not hold. Take (2) and substitute a ∨ b for x and c for y. Then notice that ¬b can be inferred both from a ∨ b and from c, but it cannot be inferred from a ∨ b ∨ c.

Summing up, even though Paul’s reasoning is perfectly rational and sound, it violates both Cumulative Monotony and Disjunction in the Premises.

3.2. Let us then turn to the dynamics of belief. The case of potential revisions of belief is very similar to the case of default reasoning. What we have so far considered as the set of all sentences that may be inferred from a given premise x, will now be reinterpreted as the result of revising a belief set by a new piece of information. This is best explained by looking at the concrete case of the selection procedure. Paul’s initial belief set B contains ¬a, ¬b, ¬c and d (among other things). Paying attention to the fact that the structure of (3) is very similar to the structure of (1), we can re-use the above argument. If Paul’s set of initial beliefs and expectations is revised by a ∨ b ∨ c, then the resulting belief set includes a ∨ b (because it includes b). However, the revised belief set B ∗ (a ∨ b ∨ c) is not a subset of the belief set B ∗ ((a ∨ b ∨ c) ∧ (a ∨ b)) = B ∗ (a ∨ b), as is borne out by sentences like ¬a and b. Thus (3) is violated.

In principle (4), substitute a ∨ b ∨ c for x and a ∨ b for y. Then the left-hand side is changed to B ∗ (a ∨ b), while the right-hand side consists of the set of all logical consequences of B ∗ (a ∨ b ∨ c) and a ∨ b taken together. Since a ∨ b is already included in B ∗ (a ∨ b ∨ c), we need only consider this latter set. But as we have by now seen several times, the two revised belief sets just mentioned cannot be compared in terms of the subset relation. So (4) is violated.

3.3. For the consideration of belief contractions, we have to change our story slightly. Suppose now that in the different scenarios Paul may be going through, the dean does not go so far as to tell him that Andrews or Becker (or Cortez) will get the offer, but only that one of them might get the offer. Paul’s proper response to this is to withdraw his prior belief that neither Andrews nor Becker (nor Cortez) will get the job, without at the same time acquiring any new belief instead. In all other respects the story is just the same as before. So this time, in scenario 1, when Paul is given the information that Andrews or Becker might get the job, he withdraws his belief that ¬a, but he keeps ¬b. And in the alternative scenario 2, when Paul learns from the dean that Andrews or Becker or Cortez might get the job, he again understands that competence in logic is regarded as an asset by the selection committee, and so he withdraws ¬b while retaining ¬a. Scenario 3 just leads Paul to withdraw ¬c.

Now we can see that the prescriptions of the above principles for belief contraction are not complied with.
First consider principle (5) and substitute ¬a ∧ ¬b for x and ¬c for y. Then we get ¬c in B ∸ (¬a ∧ ¬b ∧ ¬c), but this belief set is not a subset of B ∸ (¬a ∧ ¬b), since ¬a is in the former but not in the latter set. Finally, the same substitutions serve to refute principle (6). The belief ¬b is retained in both B ∸ (¬a ∧ ¬b) and B ∸ ¬c, but it is withdrawn in B ∸ (¬a ∧ ¬b ∧ ¬c).

In sum, then, we have found that Paul’s reasoning which is perfectly rational and adequate for the situations sketched leads to belief formation processes that violate each of the six fundamental principles (1)–(6). How can this be explained?
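The numerical elaboration of scenario 2 (Figure 1) can be checked mechanically. The following is a minimal sketch, not part of the original argument: the function names and the sample weight 3/10 are illustrative assumptions, while the scores are the ones given in the story. Solving the two break-even equations by hand gives 1/11 (Becker equals Andrews) and 52/101 (Becker equals Cortez), so the interval from 1/10 to 1/2 lies safely inside the range in which Becker leads.

```python
from fractions import Fraction

# Scores from the story: (metaphysics, logic), each out of 100.
SCORES = {
    "Andrews": (97, 0),
    "Becker": (92, 50),
    "Cortez": (40, 99),
}

def total(candidate, logic_weight):
    """Weighted score; metaphysics receives the remaining weight."""
    metaphysics, logic = SCORES[candidate]
    return (1 - logic_weight) * metaphysics + logic_weight * logic

def winner(candidates, logic_weight):
    """Candidate with the highest weighted score among those in the running."""
    return max(candidates, key=lambda name: total(name, logic_weight))

# Scenario 1: premise a ∨ b; Paul treats metaphysics as the only criterion.
scenario_1 = winner(["Andrews", "Becker"], Fraction(0))

# Scenario 2: premise a ∨ b ∨ c; logic now counts.
# The weight 3/10 is an arbitrary illustrative value (an assumption).
scenario_2 = winner(["Andrews", "Becker", "Cortez"], Fraction(3, 10))

# Becker leads at every sampled weight from 1/10 to 1/2, in steps of 1/100.
becker_leads_throughout = all(
    winner(list(SCORES), Fraction(k, 100)) == "Becker" for k in range(10, 51)
)

# Exact break-even weights, obtained by solving the linear equations by hand.
lower = Fraction(1, 11)    # Becker ties with Andrews
upper = Fraction(52, 101)  # Becker ties with Cortez
```

Exact fractions are used instead of floating-point numbers so that the two break-even weights can be compared for strict equality.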


4. FIRST REACTION: A PROBLEM OF FORMALIZATION

A first intuitive reaction to the puzzle is to simply deny that the example exhibits the formal structure that it has been represented as having here, and to claim instead that the various messages Paul may receive from the dean are incompatible with one another. When the dean says that either Andrews or Becker or Cortez will be offered the job, isn’t he, in some sense, saying more than when he says that either Andrews or Becker will be the winner of the competition? Namely, that it is possible that Cortez will be offered the position, while the latter message, at least implicitly, excludes that possibility. Shouldn’t we therefore represent the dean’s message in a somewhat more explicit way?

Three things can be said in reply to this objection. First, it is true that a ∨ b implicitly conveys the information that Cortez is not among the selected candidates. However, the kind of reasoning that turns implicit messages into explicit beliefs is exactly what is meant to be captured by theories of nonmonotonic reasoning and belief change. It is therefore important to insist that ¬c is not part of the dean’s message, but that it is rather inferred (perhaps subconsciously, automatically) by the reasoner. Representing the dean’s statement in scenario 1 as (a ∨ b) ∧ ¬c would simply not be adequate.

Second, it is of course true that the message a ∨ b ∨ c does not in itself exclude the possibility that c will come out true. But it is not necessary that each individual disjunct is considered to be a serious possibility by any of the interlocutors. For instance, nothing in the story commits us to the view that either the dean or Paul actually believes that Cortez stands a chance of being offered the position. So the dean’s statement in scenario 2 must not be represented as saying that each of a, b and c is possible.

As is common in the literature on belief formation, we presuppose in this paper that our language does not include the means to express autoepistemic possibility, something like ◇c (read as, ‘for all I believe, c is possible’). We just saw that we do not need such means for the present case, and we want to limit the expressiveness of the propositional language. Admitting autoepistemic operators immediately makes matters extremely complicated and invalidates almost all of the logical principles that have been envisaged in the theory of belief formation.7

We conclude that the problem is not caused by a sloppy translation of a commonsensical description of the case into regimented language. What, then, does the problem arise from?


5. PROBLEMS OF RATIONAL CHOICE ARE PROBLEMS FOR BELIEF FORMATION

Principles of nonmonotonic inference and belief change can be systematically interpreted in terms of rational choice.8 In this view, the process of belief formation is one of resolving conflicts among one’s beliefs and expectations by following through in thought the most plausible possibilities. According to a semantic modelling, the reasoner takes on as beliefs everything that is the case in all of the most plausible models that satisfy the given information, where the most plausible models are determined with the help of a selection function. A syntactic modelling, closely related to the semantic one, describes the reasoner as eliminating the least plausible sentences from a certain set of sentences that generates the conflict within her belief or expectation set. And again, the task of determining the least plausible sentences is taken over by a selection function. It is not possible here to give a description of these nicely dovetailing mechanisms even in the barest outlines. Suffice it to say that there are elaborate theories exhibiting in full mathematical detail striking parallels between the ‘theoretical reason’ at work in belief formation processes and those parts of ‘practical reason’ that manifest themselves in processes of rational choice.

On this interpretation, Disjunction in the Premises (2) and its counterparts for belief change, (4) and (6), turn out to be instantiations of one of the most fundamental conditions – perhaps the most fundamental condition – of the theory of rational choice. This condition, called Independence of Irrelevant Alternatives, the Chernoff property or Sen’s Property α, says that any element which is optimal in a certain set is also optimal in all subsets of that larger set in which it is contained. Cumulative Monotony (1) and its counterparts, (3) and (5), have turned out to be instantiations of another important condition in the theory of rational choice, namely Aizerman’s axiom.
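The two choice conditions can be tested directly against Paul’s conclusions in the three scenarios. The sketch below is an illustration under stated assumptions: the encoding of premises as ‘menus’ of candidates and the function names are not from the paper, and Aizerman’s axiom, which the text does not spell out, is taken in its usual form from the choice literature (if the choice from S lies within a subset T of S, then the choice from T lies within the choice from S).

```python
# Paul's doxastic "choices": for each menu of candidates named by the dean,
# the candidates he then expects to win. Letters follow the paper's a, b, c.
CHOICE = {
    frozenset("ab"): frozenset("a"),   # scenario 1: premise a ∨ b
    frozenset("abc"): frozenset("b"),  # scenario 2: premise a ∨ b ∨ c
    frozenset("c"): frozenset("c"),    # scenario 3: premise c
}

def alpha_violations(choice):
    """Pairs (S, T), T a proper subset of S, where something chosen from S
    lies in T but is not chosen from T (a failure of Property α)."""
    return [
        (s, t)
        for s in choice
        for t in choice
        if t < s and not (choice[s] & t) <= choice[t]
    ]

def aizerman_violations(choice):
    """Pairs (S, T) with choice(S) ⊆ T ⊂ S but choice(T) not within choice(S)."""
    return [
        (s, t)
        for s in choice
        for t in choice
        if t < s and choice[s] <= t and not choice[t] <= choice[s]
    ]

# Both conditions fail on the same pair of menus: {a, b, c} and {a, b}.
alpha_failures = alpha_violations(CHOICE)
aizerman_failures = aizerman_violations(CHOICE)
```

Only the menus that actually occur in the story are listed, so the check says nothing about menus the dean never mentions.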
The above scenarios are modelled after well-known choice situations in which Property α is violated, cases which also happen to disobey Aizerman’s axiom. These properties may fail to be satisfied if the very ‘menu’ from which an agent is invited to choose carries information which is new to the agent. The locus classicus for the problem is a passage in Luce and Raiffa (1957, p. 288). They tell a story about a customer of a restaurant who chooses salmon from a menu consisting of salmon and steak only, but changes to steak after being informed that fried snails and frog’s legs are on the menu, too. This customer is not to be blamed for irrationality. The reason why he changes his mind is that he infers from the extended menu that the restaurant must be a good one, one where no risk is involved in taking the steak (which is the customer’s ‘real’ preference as it were). Sen calls this phenomenon the ‘epistemic value’ or the ‘epistemic relevance of the menu’.9 Luce and Raiffa chose to avoid the problem by fiat:

This illustrates the important assumption implicit in axiom 6 [=essentially Sen’s Property α], namely, that adding new acts to a decision problem under uncertainty does not alter one’s a priori information as to which is the true state of nature. In what follows, we shall suppose that this proviso is satisfied. In practice this means that, if a problem is first formulated so that the availability of certain acts influences the plausibility of certain states of nature, then it must be reformulated by redefining the states of nature so that the interaction is eliminated.

Luce and Raiffa thus explain away the problem of the restaurant customer because the extended menu conveys the information that the restaurant is a good one. The customer’s choice is not really between salmon and steak, but basically between salmon-in-a-good-restaurant and steak-in-a-good-restaurant (assuming that he does not like fried snails and frog’s legs). Analogously, we may say that in the above example, Paul’s doxastic choice is not simply one between the belief that Andrews gets the job and the belief that Becker gets the job. Given the information in scenario 2, his choice is rather between the belief that Andrews gets the job and logic matters, and the belief that Becker gets the job and logic matters. So it seems that the two scenarios cannot be compared in the first place.

Have we solved the puzzle now? No, we haven’t. To see this, we have to understand first what is not responsible for the problem. In Luce and Raiffa’s example, the reason for the trouble is not that the extended menu introduces a refinement in the customer’s options, nor is it that his preferences change, nor is it that the second situation cannot be compared with the first. The customer is well aware right from the beginning that there are good restaurants and bad restaurants, and that he would prefer steak in a good, but salmon in a bad restaurant. What the availability of snails and frog’s legs signals, however, is that the customer is actually in a good restaurant, whereas he had at first been acting on the assumption that he is in a bad one.10 Luce and Raiffa are right in suggesting that the point is that the extended menu carries novel information about the state of the world.

Luce and Raiffa’s argument thus may make good sense as a rejoinder in the context of the general theory of choice and decision. It is simply not this theory’s business to explain how information is surreptitiously conveyed through the particular contents of a certain menu. So Luce and Raiffa have a justification for refusing to deal with that problem. Unfortunately, no analogous defense is available against the problem highlighted in the present paper. It is the business of theories of belief formation (which include expectation-based inference and belief change) to model how one’s prior information is affected by information received from external sources. This is precisely what these theories are devised to explain! Therefore, the anomaly cannot be pushed away into a neighbouring research field.
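The restaurant story discussed in this section can be sketched in a few lines. This is an illustrative model only: the names, the rule that exotic dishes signal a good restaurant, and the cautious default are assumptions chosen to match the story, not a reconstruction of Luce and Raiffa’s formal framework. The point it makes is the one argued above: the customer’s preferences over dishes in each state stay fixed, and only his belief about the state changes with the menu.

```python
# The customer's fixed preferences by type of restaurant, as in the story:
# steak in a good restaurant, salmon in a bad one.
PREFERRED = {"good": "steak", "bad": "salmon"}

# Assumption for this sketch: these dishes signal a good restaurant.
SIGNALS_GOOD = {"fried snails", "frog's legs"}

def believed_state(menu):
    """Cautious default 'bad', revised to 'good' if the menu carries a signal."""
    return "good" if SIGNALS_GOOD & set(menu) else "bad"

def choose(menu):
    """Pick the dish preferred in the state the menu leads the customer to believe."""
    return PREFERRED[believed_state(menu)]

short_menu = ["salmon", "steak"]
long_menu = ["salmon", "steak", "fried snails", "frog's legs"]

choice_short = choose(short_menu)
choice_long = choose(long_menu)
```

Steak is chosen from the longer menu but not from the shorter one that still contains it, which is exactly the pattern Property α excludes.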

6. CONCLUSION

What is the moral of our story? We began by reviewing six of the most important and central logical principles that have generally been taken to be valid in common-sense reasoning and that have widely been endorsed as yardsticks for evaluating the adequacy of systems of non-classical logics intended to capture such reasoning. We have seen, however, that there are situations in which these reasoning patterns should not be expected to hold. This comes down to declaring them invalid, not as a contingent property of some particular system that has been proposed in the literature, but as norms to which rational belief formation ought to conform.

We have dismissed as premature the idea that the formalization should be puffed up in order to represent what is only implicit in the dean’s message. In the context of theories of belief formation, it seems more appropriate to obey a

First maxim of formalization: Do not put into the formalization what is not part of the ordinary language expression.

Another lesson to be drawn from the above discussion is that a choice-theoretic modelling of belief formation processes does not only inherit the elegance and power of the theory of rational choice, but also its problems. This is not a trivial observation. Problems encountered in a general theory need not necessarily persist if this theory is applied to a restricted domain. The processes involved in belief formation are of a broadly logical kind, and one might expect that this domain is particularly well-behaved so that one would not encounter the strange phenomena haunting the notion of rational choice. Our example has shown that this is not the case. The problems do carry over from the general to the more specific domain. Fundamental principles of belief formation are as affected by the anomaly of the ‘informational value of the menu’ as the principles of rational choice.

I have then argued that things are even worse as regards this special domain.
The reason is that a natural defense – Luce and Raiffa’s defense – which makes sense for the general theory is not open for the special case of belief formation. The problem of the informational value of the menu may appear to be alien to the concerns of rational choice theory. However, it is clear that theories of belief formation are theories just about the processing of information that comes in propositional form. The source of the trouble concerns a paradigm problem for belief formation theories rather than something that may be discharged into some other field of research.

But perhaps the solution lies in the formalization after all. Compare the dean’s statement that a ∨ b ∨ c with other ways of obtaining this information. Suppose for example, that Paul hears about Doyle’s new job, that he observes all the other candidates except Andrews, Becker and Cortez leaving the place with long faces, but that he is not sure whether he has missed anyone else wearing a long face. The information Paul has in this scenario seems to be just the same as the one obtained in scenario 2 above: Andrews or Becker or Cortez will get the job. But this time, of course, Paul would not conclude that logic is important.11 Why is this? I suggest that the case points to the converse to the maxim above, namely to the

Second maxim of formalization: Put into the formalization everything that is part of the ordinary language expression.

By ‘expression’, I now do not mean the type, but the token. The expression is not to be taken here as separated from its context of production. What Paul really learns in scenario 2 is not just that either Andrews, Becker or Cortez will get the job, but that the dean says so. And it is part of Paul’s background assumptions that the dean would not say this without very good reasons. The most plausible reason is, according to Paul’s background beliefs, that the selection committee regards Cortez’s scientific profile as impressive and suitable. No such reasoning takes place in the alternative scenario in which a ∨ b ∨ c is obtained by inductive elimination of the other candidates.
The fact that a piece of information comes from a certain origin or source or is transmitted by a certain medium conveys information of its own. In a short slogan, there is no message without a medium. What the example seems to teach us is that at least in some cases, the reasoner should receive not just the content of a message, but take account of the message-with-the-medium.

Unfortunately, this leaves us with a dilemma. Formalization is translation into a formal language, and this translation is done with a view to processing the formulas obtained. This is the

Third maxim of formalization: The results of a formalization should be efficiently processable by an appropriate system of formal reasoning.


Usually, the processing is handled by some kind of logic that allows the reasoner to combine her bits and pieces of information and draw conclusions from them. It is easy to take a ∨ b ∨ c, say, and use it together with other statements about Andrews, Becker and Cortez to infer more about the relevant facts. For example, if the reasoner receives further information that ¬c, then she can conclude that a ∨ b. But it is much harder to work with X-says-that-(a ∨ b ∨ c), since adding Y-tells-me-that-(¬c) does not give her any logical consequences. The medium screens off the message, as it were, from the reasoner’s attempt to exploit its content. Formalization is more accurate if it keeps track of the sources, but it is not very useful any more. The reasoner needs to detach the message from the medium, in order to be able to utilize its content for unmediated inferences.

So our discussion has a negative end. Once we make logic more ‘realistic’ in the sense that it captures patterns of everyday reasoning, there is no easy way of saving any of the beautiful properties that have endeared classical logic to students of the subject from Frege on. We have identified a formidable problem, but we haven’t been able to offer an acceptable solution for it. But problems there are, and creating awareness of problems is one of the important tasks of philosophy.
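The contrast between detached contents and source-tagged reports can be made concrete in a toy model. Everything in this sketch is an assumption introduced for illustration: propositions are encoded as sets of ‘worlds’ (one per candidate who might get the job), and the `trusted` parameter is a hypothetical stand-in for whatever licenses detaching a message from its medium; the paper itself proposes no such mechanism.

```python
# One world per candidate who might get the job (d is Doyle).
WORLDS = frozenset("abcd")

def conclude(contents):
    """Combine detached message contents by intersecting the worlds they allow."""
    live = WORLDS
    for proposition in contents:
        live &= proposition
    return live

a_or_b_or_c = frozenset("abc")
not_c = WORLDS - frozenset("c")

# Detached contents combine: a ∨ b ∨ c together with ¬c leaves exactly a ∨ b.
detached = conclude([a_or_b_or_c, not_c])

# Source-tagged reports: X-says-that-(a ∨ b ∨ c), Y-tells-me-that-(¬c).
reports = [("X", a_or_b_or_c), ("Y", not_c)]

def conclude_from_reports(reports, trusted=()):
    """Use only contents whose source is explicitly trusted (hypothetical rule).
    With no trusted source, the reports yield nothing about the candidates."""
    return conclude([content for source, content in reports if source in trusted])

screened_off = conclude_from_reports(reports)
after_detaching = conclude_from_reports(reports, trusted=("X", "Y"))
```

With no source trusted, every world stays open; once both contents are detached, the result coincides with the unmediated inference to a ∨ b.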

ACKNOWLEDGEMENTS

I would like to thank Luc Bovens, Anthony Gillies, Franz Huber, Isaac Levi, David Makinson, Erik Olsson, Wlodek Rabinowicz, Krister Segerberg, Wolfgang Spohn and two anonymous referees of this journal for many helpful comments. It is only lack of space and time that prevents me from taking up more of the interesting issues they have raised.

NOTES

1 Along with Dennett (1971, pp. 10–11) and Stalnaker (1984, p. 82), as well as the majority of the more technical literature mentioned below.

2 What has been said in this little paragraph takes the position that the nonmonotonicity of commonsense reasoning is an effect of a certain way of using classical logic, rather than a result of applying some irreducibly non-classical, ampliative inference operation (see e.g., Morgan 2000 and Kyburg 2001). The counterexample below is independent of any particular philosophical stand in this matter.

3 For an excellent survey of the logical patterns underlying nonmonotonic reasoning, see Makinson (1994).

4 See for instance the discussions in Brewka (1991), Giordano and Martelli (1994) and Roos (1998).

5 For comprehensive treatments of this topic, see Gärdenfors (1988), Gärdenfors and Rott (1995) and Hansson (1999).

6 See Makinson and Gärdenfors (1991) and Rott (2001, chapter 4).

7 Some of the relevant problems are highlighted by Rott (1989) and Lindström and Rabinowicz (1999). – As for the underlying language, it is worth noting that our challenge to the theory of belief formation does not depend on any extension of standard propositional language, as other counterexamples to prominent logical principles do. Compare the much-debated riddles raised by McGee’s (1985) counterexample to modus ponens and Gärdenfors’ (1986) trivialization theorem, which both depend on the language’s including non-truth functional conditionals.

8 The theory of rational choice I am referring to here is the classical one deriving from economists like Paul Samuelson, Kenneth Arrow and Amartya Sen. A beautiful and concise summary of the most relevant ideas is given by Moulin (1985). This theory is applied to the field of belief formation by Lindström (1991) and Rott (1993, 2001).

9 Sen (1993, pp. 500–502; 1995, pp. 24–26) has brought the problem to wide attention. There are other reasons why Property α may fail without the chooser being irrational; see for example Levi (1986, pp. 32–34) and Kalai, Rubinstein and Spiegler (2002) about decision making on the basis of multiple preference relations. It remains to be seen how many of the reasons that speak against Property α as a general requirement for rational choice apply to the rather special domain of belief formation.

10 Two sorts of reasons come to mind that might account for the customer’s pessimism. Either his experience is that there are more bad restaurants than good ones, which makes it more likely that the one he is just visiting is bad. Or the pessimistic assumption is made because it is the relevant one for finding out which decision minimizes maximal damage, and the customer indeed wishes to be on the safe side.

11 I wish to thank Wlodek Rabinowicz for inventing such a scenario in a discussion at a conference in Prague. Wolfgang Spohn came up with a similar point independently a little later in an email message.

REFERENCES

Alchourrón, Carlos, Peter Gärdenfors and David Makinson: 1985, ‘On the Logic of Theory Change: Partial Meet Contraction Functions and Their Associated Revision Functions’, Journal of Symbolic Logic 50, 510–530.
Brewka, Gerd: 1991, ‘Cumulative Default Logic: In Defense of Non-Monotonic Inference Rules’, Artificial Intelligence 50, 183–205.
Dennett, Daniel: 1971, ‘Intentional Systems’, Journal of Philosophy 68, 87–106. The page reference is to the reprint in D.D.: 1978, Brainstorms, MIT Press, Cambridge, MA, pp. 3–22.
Gärdenfors, Peter: 1986, ‘Belief Revisions and the Ramsey Test for Conditionals’, Philosophical Review 95, 81–93.
Gärdenfors, Peter: 1988, Knowledge in Flux. Modeling the Dynamics of Epistemic States, Bradford Books, MIT Press, Cambridge, MA.
Gärdenfors, Peter and Hans Rott: 1995, ‘Belief Revision’, in D. M. Gabbay, C. J. Hogger, and J. A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming Volume IV: Epistemic and Temporal Reasoning, Oxford University Press, Oxford, pp. 35–132.
Giordano, Laura and Alberto Martelli: 1994, ‘On Cumulative Default Logics’, Artificial Intelligence 66, 161–179.
Hansson, Sven O.: 1999, A Textbook of Belief Dynamics: Theory Change and Database Updating, Kluwer Academic Publishers, Dordrecht.
Kalai, Gil, Ariel Rubinstein, and Ran Spiegler: 2002, ‘Rationalizing Choice Functions by Multiple Rationales’, Econometrica 70, 2481–2488.
Kyburg, Henry E., Jr.: 2001, ‘Real Logic is Nonmonotonic’, Minds and Machines 11, 577–595.
Levi, Isaac: 1986, Hard Choices, Cambridge University Press, Cambridge.
Lindström, Sten: 1991, ‘A Semantic Approach to Nonmonotonic Reasoning: Inference Operations and Choice’, Uppsala Prints and Preprints in Philosophy, Department of Philosophy, University of Uppsala.
Lindström, Sten and Wlodek Rabinowicz: 1999, ‘DDL Unlimited: Dynamic Doxastic Logic for Introspective Agents’, Erkenntnis 50, 353–385.
Luce, R. Duncan and Howard Raiffa: 1957, Games and Decisions, John Wiley & Sons, New York.
Makinson, David: 1994, ‘General Patterns in Nonmonotonic Reasoning’, in D. M. Gabbay, C. J. Hogger, and J. A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 3: Nonmonotonic Reasoning and Uncertain Reasoning, Oxford University Press, Oxford, pp. 35–110.
Makinson, David and Peter Gärdenfors: 1991, ‘Relations between the Logic of Theory Change and Nonmonotonic Logic’, in A. Fuhrmann and M. Morreau (eds.), The Logic of Theory Change, Springer LNAI 465, Berlin, pp. 185–205.
McGee, Van: 1985, ‘A Counterexample to Modus Ponens’, Journal of Philosophy 82, 462–471.
Morgan, Charles G.: 2000, ‘The Nature of Nonmonotonic Reasoning’, Minds and Machines 10, 321–360.
Moulin, Hervé: 1985, ‘Choice Functions over a Finite Set: A Summary’, Social Choice and Welfare 2, 147–160.
Reiter, Raymond: 1980, ‘A Logic of Default Reasoning’, Artificial Intelligence 13, 81–132.
Roos, Nico: 1998, ‘Reasoning by Cases in Default Logic’, Artificial Intelligence 99, 165–183.
Rott, Hans: 1989, ‘Conditionals and Theory Change: Revisions, Expansions, and Additions’, Synthese 81, 91–113.
Rott, Hans: 1993, ‘Belief Contraction in the Context of the General Theory of Rational Choice’, Journal of Symbolic Logic 58, 1426–1450.
Rott, Hans: 2001, Change, Choice and Inference, Oxford Logic Guides, Vol. 42, Clarendon Press, Oxford.
Sen, Amartya K.: 1993, ‘Internal Consistency of Choice’, Econometrica 61, 495–521.
Sen, Amartya K.: 1995, ‘Is the Idea of Purely Internal Consistency of Choice Bizarre?’, in J. E. J. Altham and R. Harrison (eds.), World, Mind, and Ethics. Essays on the Ethical Philosophy of Bernard Williams, Cambridge University Press, Cambridge, pp. 19–31.
Stalnaker, Robert C.: 1984, Inquiry, Bradford Books, MIT Press, Cambridge, MA.

Department of Philosophy, University of Regensburg
93040 Regensburg, Germany
E-mail: [email protected]


VALENTIN GORANKO and WOJCIECH JAMROGA

COMPARING SEMANTICS OF LOGICS FOR MULTI-AGENT SYSTEMS

ABSTRACT. We draw parallels between several closely related logics that combine – in different proportions – elements of game theory, computation tree logics, and epistemic logics to reason about agents and their abilities. These are: the coalition game logics CL and ECL introduced by Pauly in 2000, the alternating-time temporal logic ATL developed by Alur, Henzinger and Kupferman between 1997 and 2002, and the alternating-time temporal epistemic logic ATEL by van der Hoek and Wooldridge (2002). In particular, we establish some subsumption and equivalence results for their semantics, as well as interpretation of the alternating-time temporal epistemic logic into ATL. The focus in this paper is on models: alternating transition systems, multi-player game models (alias concurrent game structures) and coalition effectivity models turn out to be intimately related, while alternating epistemic transition systems share much of their philosophical and formal apparatus. Our approach is constructive: we present ways to transform between different types of models and languages.

1. INTRODUCTION

In this study we offer a comparative analysis of several recent logical enterprises that aim at modeling multi-agent systems. Most of all, the coalition game logic CL and its extended version ECL (Pauly 2002, 2000b, 2001), and the Alternating-time Temporal Logic ATL (Alur et al. 1997, 1998a, 2002) are studied. These turn out to be intimately related, which is not surprising since all of them deal with essentially the same type of scenarios, viz. a set of agents (players, system components) taking actions, simultaneously or in turns, on a common set of states – and thus effecting transitions between these states. The game-theoretic aspect is very prominent in both approaches; furthermore, in both frameworks the agents pursue certain goals with their actions and in that pursuit they can form coalitions. In both enterprises the objective is to develop formal tools for reasoning about such coalitions of agents and their ability to achieve specified outcomes in these action games.

An extension of ATL, called Alternating-time Temporal Epistemic Logic (ATEL), was introduced in van der Hoek and Wooldridge (2002) in order to enable reasoning about agents acting under incomplete information. Although the semantics for ATEL is still under debate, the original version of that logic is certainly worth investigating. It turns out that, while extending ATL, ATEL can be interpreted into the former in the sense that there is a translation of models and formulas of ATEL into ATL that preserves the satisfiability of formulas. This does not imply that logics like ATEL are redundant, of course – in fact, the way of expressing epistemic facts in ATL is purely technical, and the resulting formulas look rather unnatural. Similarly, each of the three alternative semantics for ECL and ATL, investigated here, has its own drawbacks and offers different advantages for practical use.

The rest of the paper is organized as follows: first, we offer a brief summary of some basic concepts from game theory; then we introduce the main “actors” of our study – logics and structures that have been used for modeling multi-agent systems in temporal perspective. In order to make the paper self-contained we have included all relevant definitions from Pauly (2002, 2001, 1998a), Alur et al. (2002), van der Hoek and Wooldridge (2002).1 In Sections 3 and 4 the relationships between these logics and structures are investigated in a formal way. We show that:

1. Specific classes of multi-player game models are equivalent to some types of alternating transition systems.
2. ATL subsumes CL as well as ECL.
3. The three alternative semantics for Alternating-time Temporal Logic and Coalition Logics (based on multi-player game models, alternating transition systems and coalition effectivity models) are equivalent.
4. Formulas and models of ATEL can be translated into its fragment ATL.

The paper partly builds on previous work of ours, included in Goranko (2001) and Jamroga (2003).

Synthese 139: 241–280, 2004. Knowledge, Rationality & Action 77–116, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

2. MODELS AND LOGICS OF STRATEGIC ABILITY

The logics studied here have several things in common. They are intended for reasoning about various aspects of multi-agent systems and multi-player games, they are multi-modal logics, they have been obviously inspired by game theory, and they are based on the temporal logic approach. We present and discuss the logics and their models in this section. A broader survey of logic-based approaches to multi-agent systems can be found in Fagin et al. (1995) and van der Hoek and Wooldridge (2003b).

LOGICS FOR MULTI-AGENT SYSTEMS


Figure 1. Extensive and strategic form of the matching pennies game: (A) perfect information case; (B) a1 does not show his coin before the end of the game.

2.1. Basic Influences

2.1.1. Classical Game Theory

Logics of agents and action build upon several important concepts from game theory, most of them going back to the 1940s and the seminal book by von Neumann and Morgenstern (1944). We will start with an informal survey of these concepts, mostly following Hart (1992). An interested reader is referred to Aumann and Hart (1992) and Osborne and Rubinstein (1994) for a more extensive introduction to game theory.

In game theory, a game is usually presented in its extensive and/or strategic form. The extensive form defines the game via a tree of possible positions in the game (states), game moves (choices) available to players, and the outcome (utility or payoff) that players gain at each of the final states. These games are usually turn-based, i.e., every state is assigned a player who controls the choice of the next move, so the players are taking turns. A strategy for player a specifies a’s choices at the states controlled by a. The strategic form consists of a matrix that presents the payoffs for all combinations of players’ strategies. It presents the whole game in a ‘snapshot’ as if it was played in one single move, while the extensive form emphasizes control and information flow in the game.
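To make the relation between the two forms concrete, the following minimal sketch derives a strategic form from a tiny turn-based tree. The game itself (a two-move coin game in which the first player scores on a match), the payoff rule and all identifiers are illustrative assumptions, not definitions taken from the paper.

```python
from itertools import product

# Hypothetical two-move coin game: player 1 moves first, player 2 replies
# after seeing that move. Assumed payoff rule: player 1 scores 1 on a match.
MOVES = ("h", "t")


def payoff(m1, m2):
    """Return (score of player 1, score of player 2)."""
    return (1, 0) if m1 == m2 else (0, 1)


# A strategy fixes one choice at every state the player controls.
# Player 1 controls only the root; player 2 controls one state per possible
# first move, so a strategy of player 2 is a dict: first move -> reply.
strategies_1 = list(MOVES)
strategies_2 = [dict(zip(MOVES, replies)) for replies in product(MOVES, repeat=2)]


def strategic_form():
    """Payoff matrix indexed by (strategy of 1, index of strategy of 2)."""
    return {
        (s1, i): payoff(s1, s2[s1])
        for s1 in strategies_1
        for i, s2 in enumerate(strategies_2)
    }


def winning_strategies_2():
    """Strategies of player 2 that score 1 against every first move."""
    return [
        s2 for s2 in strategies_2
        if all(payoff(s1, s2[s1])[1] == 1 for s1 in strategies_1)
    ]
```

The matrix has one entry per pair of strategies, which is the ‘snapshot’ view; the dictionary-valued strategies of the second player keep the conditional structure of the tree visible.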


EXAMPLE 1. Consider a variant of the matching pennies game. There are two players, each with a coin: first $a_1$ chooses to show heads (action $h$) or tails ($t$), then $a_2$ does. If both coins are heads up or both coins are tails up, then $a_1$ wins (and gets a score of 1) and $a_2$ loses (score 0). If the coins show different sides, then $a_2$ is the winner. The extensive and strategic forms for this game are shown in Figure 1A. The strategies define an agent’s choices at all ‘his’ nodes, and are labeled appropriately: $q_1 h\, q_2 t$ denotes, for instance, a strategy for $a_2$ in which the player chooses to show heads whenever the current state of the game is $q_1$, and tails at $q_2$. Note that, using this strategy, $a_2$ wins regardless of the first move from $a_1$.

The information available to agents is incomplete in many games. Classical game theory handles this kind of uncertainty through partitioning every player’s nodes into so-called information sets. An information set for player $a$ is a set of states that are indistinguishable for $a$. Traditionally, information sets are defined only for the states in which $a$ chooses the next step. Now a strategy assigns choices to information sets rather than separate states, because players are supposed to choose the same move for all the situations they cannot distinguish.

EXAMPLE 2. Suppose that $a_1$ does not show his coin to $a_2$ before the end of the game. Then nodes $q_1$ and $q_2$ belong to the same information set of $a_2$, as shown in Figure 1B. No player has a strategy that guarantees his win any more.

A general remark is in order here. The concept of coalitional game, traditionally considered in game theory, where every possible coalition is assigned a real number (its worth), differs somewhat from the one considered here. In this study we are concerned with qualitative aspects of game structures rather than with quantitative analysis of specific games. It should be clear, however, that these two approaches are in agreement and can be easily put together. Indeed, the intermediate link between them is the notion of (qualitative) effectivity function (Pauly 2002). That notion naturally transfers over to alternating transition systems, thus providing a framework for purely game-theoretic treatment of alternating temporal logics.

2.1.2. Computational Tree Logic and Epistemic Logic

Apart from game theory, the concepts investigated in this paper are strongly influenced by modal logics of computations (such as the computation tree logic CTL) and beliefs (epistemic logic). CTL (Emerson 1990; Huth and Ryan 2000) involves several operators for temporal properties of computations in transition systems: A (for all paths), E (there is a path), X (nexttime), F (sometime), G (always) and U (until). ‘Paths’ refer to alternative courses of events that may happen in the future; nodes on a path denote states of the system in subsequent moments of time along this particular course. Typically, paths are interpreted as sequences of successive states of computations.

EXAMPLE 3. As an illustration, consider a system with a binary variable x. In every step, the variable can retain or change its value. The states and possible transitions are shown in Figure 2. There are two propositions available to observe the value of x: ‘x = 0’ and ‘x = 1’. Then, for example, EF x = 1 is satisfied in every state of the system: there is a path such that x will have the value of 1 at some moment. However, the above is not true for every possible course of action: ¬AF x = 1.

Figure 2. Transitions of the variable controller/client system, together with the tree of possible computations.

It is important to distinguish between the computational structure, defined explicitly in the model, and the behavioral structure, i.e., the model of how the system is supposed to behave in time (Schnoebelen 2003). In many temporal models the computational structure is finite, while the implied behavioral structure is infinite. The computational structure can be seen as a way of defining the tree of possible (infinite) computations that may occur in the system. The way the computational structure unravels into a behavioral structure (computation tree) is shown in Figure 2, too.

Epistemic logic offers the notion of epistemic accessibility relation that generalizes information sets, and introduces operators for talking about

individual and collective knowledge. Section 4 describes them in more detail; a reader interested in a comprehensive exposition of epistemic logic can also be referred to the seminal book by Fagin, Halpern, Moses and Vardi (Fagin et al. 1995), or to van der Hoek and Verbrugge (2002) for a survey.

2.2. Coalition Game Logics and Multi-Player Game Models

Coalition logic (CL), introduced in Pauly (2000b, 2002), formalizes reasoning about powers of coalitions in strategic games. It extends classical propositional logic with a family of (non-normal) modalities $[A]$, $A \subseteq Agt$, where $Agt$ is a fixed set of players. Intuitively, $[A]\varphi$ means that coalition $A$ can enforce an outcome state satisfying $\varphi$.

2.2.1. Multi-Player Strategic Game Models

Game frames (Pauly 2002) represent multi-player strategic games where sets of players can form coalitions in attempts to achieve desirable outcomes. Game frames are based on the notion of a strategic game form: a tuple $\langle Agt, \{\Sigma_a \mid a \in Agt\}, Q, o \rangle$ consisting of:
– a non-empty finite set of agents (or players) $Agt$,
– a family of (non-empty) sets of actions (choices, strategies) $\Sigma_a$ for each player $a \in Agt$,
– a non-empty set of states $Q$,
– an outcome function $o: \prod_{a \in Agt} \Sigma_a \to Q$ which associates an outcome state in $Q$ to every combination of choices from all the players.
By a collective choice $\sigma_A$ we will denote a tuple of choices $\langle \sigma_a \rangle_{a \in A}$ (one for each player from $A \subseteq Agt$), and we will be writing $o(\sigma_A, \sigma_{Agt \setminus A})$ with the presumed meaning.

REMARK 1. Note that the notion of “strategy” in strategic game forms is local, wrapped into one-step actions. It differs from the notion of ‘strategy’ in extensive game forms (used in the semantics of ATL) which represents a global, conditional plan of action. To avoid confusion, we will refer to the local strategies as choices, and use the term collective choice instead of strategy profile from Pauly (2002) to denote a combination of simultaneous choices from several players.

REMARK 2. A strategic game form defines the choices and transitions available at a particular state of the game. If the identity of the state does not follow from the context in an obvious way, we will use indices to indicate which state they refer to.


Figure 3. Transitions of the variable controller/client system.

The set of all strategic game forms for players $Agt$ over states $Q$ will be denoted by $\Gamma_Q^{Agt}$. A multi-player game frame for a set of players $Agt$ is a pair $\langle Q, \gamma \rangle$ where $\gamma: Q \to \Gamma_Q^{Agt}$ is a mapping associating a strategic game form with each state in $Q$. A multi-player game model (MGM) for a set of players $Agt$ over a set of propositions $\Pi$ is a triple $M = \langle Q, \gamma, \pi \rangle$ where $\langle Q, \gamma \rangle$ is a multi-player game frame, and $\pi: Q \to \mathcal{P}(\Pi)$ is a valuation labeling each state from $Q$ with the set of propositions that are true at that state.

EXAMPLE 4. Consider a variation of the system with binary variable x from Example 3. There are two processes: the controller (or server) s can enforce the variable to retain its value in the next step, or let the client change the value. The client c can request the value of x to be 0 or 1. The players proceed with their choices simultaneously. The states and transitions of the system as a whole are shown in Figure 3. Again, we should make the distinction between computational and behavioral structures. The multi-player game model unravels into a computation tree in a way analogous to CTL models (cf. Figure 2).

2.2.2. Coalition Logic

Formulas of CL are defined recursively as: $\varphi := p \mid \neg\varphi \mid \varphi \vee \psi \mid [A]\varphi$, where $p \in \Pi$ is a proposition, and $A \subseteq Agt$ is a group of agents. The semantics of CL can be given via the clauses:
– $M, q \models p$ iff $p \in \pi(q)$ for atomic propositions $p$;
– $M, q \models [A]\varphi$ iff there is a collective choice $\sigma_A$ such that for every collective choice $\sigma_{Agt \setminus A}$, we have $M, o_q(\sigma_A, \sigma_{Agt \setminus A}) \models \varphi$.

EXAMPLE 5. Consider the variable client/server system from Example 4. The following CL formulas are valid in this model (i.e., true in every state of it):

1. $(x = 0 \to [s]\, x = 0) \wedge (x = 1 \to [s]\, x = 1)$: the server can enforce the value of x to remain the same in the next step;
2. $x = 0 \to \neg[c]\, x = 1$: c cannot change the value from 0 to 1 on his own;
3. $x = 0 \to \neg[s]\, x = 1$: s cannot change the value on his own either;
4. $x = 0 \to [s, c]\, x = 1$: s and c can cooperate to change the value.

2.2.3. Logics for Local and Global Effectivity of Coalitions

In CL, the operators $[A]\varphi$ can express local effectivity properties of coalitions, i.e., their powers to force outcomes in single ‘rounds’ of the game. Pauly (2000b) extends CL to the Extended Coalition Logic ECL with iterated operators for global effectivity $[A^*]\varphi$ expressing the claim that coalition $A$ has a collective strategy to maintain the truth of $\varphi$ throughout the entire game. In our view, and in the sense of Remark 1, both systems formalize different aspects of reasoning about powers of coalitions: CL can be thought of as reasoning about strategic game forms, while ECL rather deals with extensive game forms, representing sequences of moves, collectively effected by the players’ actions. Since ECL can be embedded as a fragment of ATL (as presented in Section 2.4), we will not discuss it separately here.

2.3. Alternating-Time Temporal Logic and its Models

Game-theoretic scenarios can occur in various situations, one of them being open computer systems such as computer networks, where the different components can act as relatively autonomous agents, and computations in such systems are effected by their combined actions. The Alternating-time Temporal Logics ATL and ATL∗, introduced in Alur et al. (1997), and later refined in Alur et al. (1998a, 2002), are intended to formalize reasoning about computations in such open systems which can be enforced by coalitions of agents, in a way generalizing the logics CTL and CTL∗.

2.3.1. The Logics ATL and ATL∗

In ATL∗ a class of cooperation modalities $\langle\langle A \rangle\rangle$ replaces the path quantifiers E and A. The common-sense reading of $\langle\langle A \rangle\rangle \Phi$ is:

The group of agents $A$ have a collective strategy to enforce $\Phi$ regardless of what all the other agents do.
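For a single step, this reading coincides with the CL clause for $[A]\varphi$. The following sketch checks the four formulas of Example 5 by brute force. It assumes the transition structure suggested by Example 4 (the server either keeps the current value or lets the client's request decide); the state names `q0`, `q1` and the choice labels `reject`, `accept`, `set0`, `set1` are used here as illustrative labels.

```python
from itertools import product

STATES = ("q0", "q1")  # q0: x = 0, q1: x = 1
AGENTS = ("s", "c")
CHOICES = {"s": ("reject", "accept"), "c": ("set0", "set1")}


def outcome(q, choice):
    """Assumed outcome function o_q for a full assignment {agent: choice}."""
    if choice["s"] == "reject":
        return q  # the server keeps the current value
    return "q0" if choice["c"] == "set0" else "q1"


def collective_choices(group):
    """All collective choices of a group (one empty choice for no agents)."""
    names = sorted(group)
    for combo in product(*(CHOICES[a] for a in names)):
        yield dict(zip(names, combo))


def can_enforce(q, coalition, target):
    """One-step check: some choice of the coalition forces the next state
    into `target` whatever the remaining agents choose."""
    others = [a for a in AGENTS if a not in coalition]
    return any(
        all(outcome(q, {**mine, **theirs}) in target
            for theirs in collective_choices(others))
        for mine in collective_choices(coalition)
    )
```

With this encoding, `can_enforce("q0", {"s"}, {"q0"})` corresponds to formula 1 at the state where x = 0, and `can_enforce("q0", {"s", "c"}, {"q1"})` to formula 4.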

ATL is the fragment of ATL∗ subjected to the same syntactic restrictions which define CTL as a fragment of CTL∗, i.e., every temporal operator must be immediately preceded by exactly one cooperation modality. The original CTL∗ operators E and A can be expressed in ATL∗ with $\langle\langle Agt \rangle\rangle$ and $\langle\langle \emptyset \rangle\rangle$ respectively, but between both extremes one can express much more about the abilities of particular agents and groups of agents. Since model-checking for ATL∗ requires 2EXPTIME, but it is linear for ATL, ATL is more useful for practical applications, and we will not discuss ATL∗ in this paper. Formally, the recursive definition of ATL formulas is:

$\varphi := p \mid \neg\varphi \mid \varphi \vee \psi \mid \langle\langle A \rangle\rangle X\varphi \mid \langle\langle A \rangle\rangle G\varphi \mid \langle\langle A \rangle\rangle \varphi U \psi$

The ‘sometime’ operator F can be defined in the usual way as: $\langle\langle A \rangle\rangle F\varphi \equiv \langle\langle A \rangle\rangle \top U \varphi$.

It should be noted that at least three different versions of semantic structures for ATL have been proposed by Alur and colleagues in the last 7 years. The earliest version (Alur et al. 1997) involves definitions of a synchronous turn-based structure and an asynchronous structure in which every transition is controlled by a single agent. The next paper (Alur et al. 1998a) defines general structures called alternating transition systems where the agents’ choices are identified with the sets of possible outcomes. In the concurrent game structures from Alur et al. (2002), labels for choices are introduced and the transition function is simplified. The above papers share the same title and they are often cited incorrectly in the literature as well as citation indices, which may lead to some confusion.

2.3.2. Alternating Transition Systems

Alternating transition systems, building on the concept of alternation developed in Chandra et al. (1981), formalize systems of transitions effected by collective actions of all agents involved. In the particular case of one agent (the system), alternating transition systems are reduced to ordinary transition systems, and ATL reduces to CTL.
An alternating transition system (ATS) is a tuple $T = \langle \Pi, Agt, Q, \pi, \delta \rangle$ where:
– $\Pi$ is a set of (atomic) propositions, $Agt$ is a non-empty finite set of agents, $Q$ is a non-empty set of states, and $\pi: Q \to \mathcal{P}(\Pi)$ is a valuation of propositions;
– $\delta: Q \times Agt \to \mathcal{P}(\mathcal{P}(Q))$ is a transition function mapping a pair ⟨state, agent⟩ to a non-empty family of choices of possible next states.
The idea is that at state $q$ an agent $a$ chooses a set $Q_a \in \delta(q, a)$, thus forcing the outcome state to be from $Q_a$. The resulting transition leads to a state which is in the intersection of all $Q_a$ for $a \in Agt$ and so it reflects the mutual will of all agents. Since the system is required to be deterministic (given the state and the agents’ decisions), $Q_1 \cap \ldots \cap Q_k$ must always be a singleton.2
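As a rough illustration of the definition, the sketch below stores the transition function as nested dictionaries of choice sets and checks the singleton-intersection requirement. The two-agent system, its state names and the helper names are hypothetical; it is not the ATS of the controller/client example.

```python
from itertools import product

AGENTS = ("a", "b")

# Hypothetical ATS: delta[q][agent] is the family of choice sets at state q.
# Agent a decides at q0, agent b decides at q1.
delta = {
    "q0": {"a": [frozenset({"q0"}), frozenset({"q1"})],
           "b": [frozenset({"q0", "q1"})]},
    "q1": {"a": [frozenset({"q0", "q1"})],
           "b": [frozenset({"q0"}), frozenset({"q1"})]},
}


def is_deterministic(delta):
    """Every combination of one choice set per agent must intersect in
    exactly one state."""
    for per_agent in delta.values():
        for combo in product(*(per_agent[a] for a in AGENTS)):
            if len(frozenset.intersection(*combo)) != 1:
                return False
    return True


def successors(delta, q):
    """States reachable from q in one step by some combination of choices."""
    result = set()
    for combo in product(*(delta[q][a] for a in AGENTS)):
        common = frozenset.intersection(*combo)
        if len(common) == 1:
            result |= common
    return result
```

A family such as `{"a": [{"q0"}], "b": [{"q1"}]}` at some state would violate the requirement, because the two choice sets have an empty intersection.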


Figure 4. An ATS for the controller/client problem.

DEFINITION 1. A state $q_2 \in Q$ is a successor of $q_1$ if, whenever the system is in $q_1$, the agents can cooperate so that the next state is $q_2$, i.e., there are choice sets $Q_a \in \delta(q_1, a)$, for each $a \in Agt$, such that $\bigcap_{a \in Agt} Q_a = \{q_2\}$. The set of successors of $q$ will be denoted by $Q_q^{suc}$.

DEFINITION 2. A computation in $T$ is an infinite sequence of states $q_0 q_1 \ldots$ such that $q_{i+1}$ is a successor of $q_i$ for every $i \geq 0$. A $q$-computation is a computation starting from $q$.

2.3.3. Semantics of ATL Based on Alternating Transition Systems

DEFINITION 3. A strategy for agent $a$ is a mapping $f_a: Q^+ \to \mathcal{P}(Q)$ which assigns to every non-empty sequence of states $q_0, \ldots, q_n$ a choice set $f_a(q_0 \ldots q_n) \in \delta(q_n, a)$. The function specifies $a$’s decisions for every possible (finite) history of system transitions. A collective strategy for a set of agents $A \subseteq Agt$ is just a tuple of strategies (one per agent from $A$): $F_A = \langle f_a \rangle_{a \in A}$. Now, $out(q, F_A)$ denotes the set of outcomes of $F_A$ from $q$, i.e., the set of all $q$-computations in which group $A$ has been using $F_A$.

REMARK 3. This notion of strategy can be specified as ‘perfect recall strategy’, where the whole history of the game is considered when the choice of the next move is made by the agents. The other extreme alternative is a ‘memoryless strategy’ where only the current state is taken into consideration; further variations on ‘limited memory span strategies’ are possible. While the choice of one or another notion of strategy affects the semantics of the full ATL∗, it is not difficult to see that both perfect recall strategies and memoryless strategies eventually yield equivalent semantics for ATL.


Let $\Lambda[i]$ denote the $i$th position in computation $\Lambda$. The definition of truth of an ATL formula at state $q$ of an ATS $T = \langle \Pi, Agt, Q, \pi, \delta \rangle$ follows through the clauses below. Informally speaking, $T, q \models \langle\langle A \rangle\rangle \Phi$ iff there exists a collective strategy $F_A$ such that $\Phi$ is satisfied for all computations from $out(q, F_A)$.

(A, X) $T, q \models \langle\langle A \rangle\rangle X\varphi$ iff there exists a collective strategy $F_A$ such that for every computation $\Lambda \in out(q, F_A)$ we have $T, \Lambda[1] \models \varphi$;
(A, G) $T, q \models \langle\langle A \rangle\rangle G\varphi$ iff there exists a collective strategy $F_A$ such that for every $\Lambda \in out(q, F_A)$ we have $T, \Lambda[i] \models \varphi$ for every $i \geq 0$;
(A, U) $T, q \models \langle\langle A \rangle\rangle \varphi U \psi$ iff there exists a collective strategy $F_A$ such that for every $\Lambda \in out(q, F_A)$ there is $i \geq 0$ such that $T, \Lambda[i] \models \psi$ and for all $j$ such that $0 \leq j < i$ we have $T, \Lambda[j] \models \varphi$.

EXAMPLE 6. An ATS for the variable client/server system is shown in Figure 4. The following ATL formulas are valid in this model:
1. $(x = 0 \to \langle\langle s \rangle\rangle X\, x = 0) \wedge (x = 1 \to \langle\langle s \rangle\rangle X\, x = 1)$: the server can enforce the value of x to remain the same in the next step;
2. $x = 0 \to (\neg\langle\langle c \rangle\rangle F\, x = 1 \wedge \neg\langle\langle s \rangle\rangle F\, x = 1)$: neither c nor s can change the value from 0 to 1, even in multiple steps;
3. $x = 0 \to \langle\langle s, c \rangle\rangle F\, x = 1$: s and c can cooperate to change the value.

2.3.4. Semantics of ATL, Based on Concurrent Game Structures and Multi-player Game Models

Alur et al. (2002) redefines ATL models as concurrent game structures: $M = \langle k, Q, \Pi, \pi, d, o \rangle$, where $k$ is the number of players (so $Agt$ can be taken to be $\{1, \ldots, k\}$), the decisions available to player $a$ at state $q$ are labeled with natural numbers up to $d_a(q)$ (so $\Sigma_a(q)$ can be taken to be $\{1, \ldots, d_a(q)\}$); finally, a complete tuple of decisions $\langle \alpha_1, \ldots, \alpha_k \rangle$ from all the agents in state $q$ implies a deterministic transition according to the transition function $o(q, \alpha_1, \ldots, \alpha_k)$. In a concurrent game structure the type of a strategy function is slightly different since choices are abstract entities indexed by natural numbers now, and a strategy is a mapping $f_a: Q^+ \to \mathbb{N}$ such that $f_a(\lambda q) \leq d_a(q)$. The rest of the semantics looks exactly the same as for alternating transition systems.

REMARK 4. Clearly, concurrent game structures are equivalent to Pauly’s multi-player game models; they differ from each other only in notation.3
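The strategy-based clauses can be tried out directly on a small structure. The sketch below relies on Remark 3 and enumerates only memoryless collective strategies to decide $\langle\langle A \rangle\rangle F$ on the client/server system. The transition structure (the server either keeps the value or lets the client's request decide) and the labels `reject`, `accept`, `set0`, `set1` are an assumed encoding of Example 4, and the brute-force enumeration is for illustration only, not an efficient procedure.

```python
from itertools import product

STATES = ("q0", "q1")  # q0: x = 0, q1: x = 1
AGENTS = ("s", "c")
CHOICES = {"s": ("reject", "accept"), "c": ("set0", "set1")}


def outcome(q, choice):
    """Assumed transition function for a full assignment {agent: choice}."""
    if choice["s"] == "reject":
        return q
    return "q0" if choice["c"] == "set0" else "q1"


def joint(group):
    """All collective choices of a group of agents."""
    names = sorted(group)
    return [dict(zip(names, combo))
            for combo in product(*(CHOICES[a] for a in names))]


def memoryless_strategies(coalition):
    """All maps: state -> collective choice of the coalition."""
    for picks in product(joint(coalition), repeat=len(STATES)):
        yield dict(zip(STATES, picks))


def inevitable(step, target):
    """States from which every path of the relation `step` reaches target."""
    reach = set(target)
    changed = True
    while changed:
        changed = False
        for q in STATES:
            if q not in reach and step[q] <= reach:
                reach.add(q)
                changed = True
    return reach


def can_eventually(q, coalition, target):
    """Does some memoryless strategy of the coalition guarantee F target?"""
    others = [a for a in AGENTS if a not in coalition]
    for strategy in memoryless_strategies(coalition):
        # Next states still possible once the coalition follows `strategy`.
        step = {p: {outcome(p, {**strategy[p], **rest}) for rest in joint(others)}
                for p in STATES}
        if q in inevitable(step, target):
            return True
    return False
```

Under this encoding, `can_eventually("q0", {"c"}, {"q1"})` and `can_eventually("q0", {"s"}, {"q1"})` correspond to the two negated conjuncts of formula 2 in Example 6, and `can_eventually("q0", {"s", "c"}, {"q1"})` to formula 3.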


Thus, the ATL semantics can as well be based on MGMs, and the truth definitions look exactly the same as for alternating transition systems (see Section 2.3.3). We leave rewriting the definitions of a strategy, collective strategy and outcome set in terms of multi-player game models to the reader. The next section shows how this shared semantics can be used to show that ATL subsumes coalition logics.

2.4. Embedding CL and ECL into ATL

Both CL and ECL are strictly subsumed by ATL in terms of the shared semantics based on multi-player game models.4 Indeed, there is a translation of the formulas of ECL into ATL, which becomes obvious once the ATL semantic clause (A, X) is rephrased as:

[A] $T, q \models \langle\langle A \rangle\rangle X\varphi$ iff there exists a collective choice $F_A = \{f_a\}_{a \in A}$ such that for every collective choice $F_{Agt \setminus A} = \{f_a\}_{a \in Agt \setminus A}$, we have $T, s \models \varphi$, where $\{s\} = \bigcap_{a \in A} f_a(q) \cap \bigcap_{a \in Agt \setminus A} f_a(q)$,

which is precisely the truth-condition for $[A]\varphi$ in the coalition logic CL. Thus, CL embeds in a straightforward way as a simple fragment of ATL by translating $[A]\varphi$ into $\langle\langle A \rangle\rangle X\varphi$. Accordingly, $[A^*]\varphi$ translates into ATL as $\langle\langle A \rangle\rangle G\varphi$, which follows from the fact that each of $[A^*]\varphi$ and $\langle\langle A \rangle\rangle G\varphi$ is the greatest fixpoint of the same operator over $[A]\varphi$ and $\langle\langle A \rangle\rangle X\varphi$ respectively (see Section 2.5). In consequence, ATL subsumes ECL as the fragment $ATL_{XG}$ involving only $\langle\langle A \rangle\rangle X\varphi$ and $\langle\langle A \rangle\rangle G\varphi$. We will focus on ATL, and will simply regard CL and ECL as its fragments throughout the rest of the paper.

2.5. Effectivity Functions and Coalition Effectivity Models as Alternative Semantics for ATL

As mentioned earlier, game theory usually measures the powers of coalitions quantitatively, and characterizes the possible outcomes in terms of payoff profiles. That approach can be easily transformed into a qualitative one, where the payoff profiles are encoded in the outcome states themselves and each coalition is assigned a preference order on these outcome states. Then, the power of a coalition can be measured in terms of sets of states in which it can force the actual outcome of the game (i.e., sets for which it is effective), thus defining another semantics for ATL, based on so-called coalition effectivity models (introduced by Pauly for the coalition logics CL and ECL). This semantics is essentially a monotone neighborhood semantics for non-normal multi-modal logics, and therefore it enables the results, methods and techniques already developed for modal logics to be applied here as well.


Figure 5. A coalition effectivity function for the variable client/server system.

DEFINITION 4. A (local) effectivity function is a mapping of type $e: \mathcal{P}(Agt) \to \mathcal{P}(\mathcal{P}(Q))$.

The idea is that we associate with each set of players the family of outcome sets for which their coalition is effective. However, the notion of effectivity function as defined above is abstract and not every effectivity function corresponds to a real strategic game form. Those which do can be characterized with the following conditions:
1. Liveness: for every $A \subseteq Agt$, $\emptyset \notin e(A)$.
2. Termination: for every $A \subseteq Agt$, $Q \in e(A)$.
3. Agt-maximality: if $X \notin e(Agt)$ then $Q \setminus X \in e(\emptyset)$ (if $X$ cannot be effected by the grand coalition of players, then $Q \setminus X$ is inevitable).
4. Outcome-monotonicity: if $X \subseteq Y$ and $X \in e(A)$ then $Y \in e(A)$.
5. Super-additivity: for all $A_1, A_2 \subseteq Agt$ and $X_1, X_2 \subseteq Q$, if $A_1 \cap A_2 = \emptyset$, $X_1 \in e(A_1)$, and $X_2 \in e(A_2)$, then $X_1 \cap X_2 \in e(A_1 \cup A_2)$.
We note that super-additivity and liveness imply consistency of the powers: for any $A \subseteq Agt$, if $X \in e(A)$ then $Q \setminus X \notin e(Agt \setminus A)$. Otherwise super-additivity would give $\emptyset = X \cap (Q \setminus X) \in e(Agt)$, contradicting liveness.

DEFINITION 5. An effectivity function $e$ is called playable if conditions (1)–(5) hold for $e$.

DEFINITION 6. An effectivity function $e$ is the effectivity function of a strategic game form $\gamma$ if it associates with each set of players $A$ from $\gamma$ the family of outcome sets $\{Q_1, Q_2, \ldots\}$, such that for every $Q_i$ the coalition $A$ has a collective choice to ensure that the next state will be in $Q_i$.

THEOREM 5 (Pauly 2002). An effectivity function is playable iff it is the effectivity function of some strategic game form.

EXAMPLE 7. Figure 5 presents a playable effectivity function that describes powers of all the possible coalitions for the variable server/client system from Example 4, and state $q_0$.

DEFINITION 7. A coalition effectivity frame is a triple $\mathcal{F} = \langle Agt, Q, E \rangle$ where $Agt$ is a set of players, $Q$ is a non-empty set of states and $E: Q \to (\mathcal{P}(Agt) \to \mathcal{P}(\mathcal{P}(Q)))$ is a mapping which associates an effectivity function with each state. We shall write $E_q(A)$ instead of $E(q)(A)$. A coalition effectivity model (CEM) is a tuple $\mathcal{E} = \langle Agt, Q, E, \pi \rangle$ where $\langle Agt, Q, E \rangle$ is a coalition effectivity frame and $\pi$ is a valuation of the atomic propositions over $Q$.

DEFINITION 8. A coalition effectivity frame (resp. coalition effectivity model) is standard if it contains only playable effectivity functions.

Thus, coalition effectivity models provide semantics of CL by means of the following truth definition (Pauly 2002):

$\mathcal{E}, q \models [A]\varphi$ iff $\{s \in Q \mid \mathcal{E}, s \models \varphi\} \in E_q(A)$.

This semantics can be accordingly extended to semantics for ECL (Pauly 2001) and ATL (Goranko 2001) by defining effectivity functions for the global effectivity operators in extensive game forms, where they indicate the outcome sets for which the coalitions have long-term strategies to effect. This extension can be done using the following fixpoint characterizations of $\langle\langle A \rangle\rangle G\varphi$ and $\langle\langle A \rangle\rangle \varphi U \psi$:

$\langle\langle A \rangle\rangle G\varphi := \nu Z.\, \varphi \wedge \langle\langle A \rangle\rangle X Z$,
$\langle\langle A \rangle\rangle \varphi U \psi := \mu Z.\, \psi \vee (\varphi \wedge \langle\langle A \rangle\rangle X Z)$.
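The two fixpoint characterizations can be evaluated by plain iteration on a finite model. The sketch below first computes the effectivity function of each state's game form in the spirit of Definition 6, uses membership in $E_q(A)$ as the one-step operator, and then iterates from the full state set (for $\nu$) or the empty set (for $\mu$). The client/server transition structure and all identifiers are an assumed encoding of Example 4, chosen for illustration.

```python
from itertools import chain, combinations, product

STATES = ("q0", "q1")  # q0: x = 0, q1: x = 1
AGENTS = ("s", "c")
CHOICES = {"s": ("reject", "accept"), "c": ("set0", "set1")}


def outcome(q, choice):
    """Assumed outcome function: the server keeps x or the client sets it."""
    if choice["s"] == "reject":
        return q
    return "q0" if choice["c"] == "set0" else "q1"


def joint(group):
    names = sorted(group)
    return [dict(zip(names, c)) for c in product(*(CHOICES[a] for a in names))]


def subsets(items):
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def effectivity(q, coalition):
    """E_q(A): every X such that A has a choice forcing the next state into X."""
    others = [a for a in AGENTS if a not in coalition]
    family = set()
    for mine in joint(coalition):
        forced = frozenset(outcome(q, {**mine, **rest}) for rest in joint(others))
        family.update(X for X in subsets(STATES) if forced <= X)
    return family


def pre(coalition, Z):
    """States satisfying <<A>>X Z, i.e. states q with Z in E_q(A)."""
    Z = frozenset(Z)
    return {q for q in STATES if Z in effectivity(q, coalition)}


def always(coalition, phi):
    """Greatest fixpoint of Z -> phi & <<A>>X Z."""
    Z = set(STATES)
    while True:
        new = set(phi) & pre(coalition, Z)
        if new == Z:
            return Z
        Z = new


def until(coalition, phi, psi):
    """Least fixpoint of Z -> psi | (phi & <<A>>X Z)."""
    Z = set()
    while True:
        new = set(psi) | (set(phi) & pre(coalition, Z))
        if new == Z:
            return Z
        Z = new


def eventually(coalition, psi):
    """<<A>>F psi, defined as <<A>> true U psi."""
    return until(coalition, STATES, psi)
```

Both loops terminate because the state set is finite and each iteration is monotone. Enumerating all subsets of states makes `effectivity` exponential, which is acceptable only for toy models like this one.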

3. EQUIVALENCE OF THE DIFFERENT SEMANTICS FOR ATL

In this section we compare the semantics for Alternating-time Temporal Logic, based on alternating transition systems and multi-player game models – and show their equivalence (in the sense that we can transform the models both ways while preserving satisfiability of ATL formulas). Further, we show that these semantics are both equivalent to the semantics based on coalition effectivity models. The transformation from alternating transition systems to multi-player game models is easy: in fact, for every ATS, an equivalent MGM can be constructed via re-labeling transitions (see Section 3.2). Construction the other way round is more sophisticated: first, we observe that all multi-player game models obtained from alternating transition systems satisfy a special condition we call convexity (Section 3.2); then we show that for every convex MGM, an equivalent ATS can be obtained (Section 3.3). Finally, we demonstrate that for every arbitrary multi-player game model a convex MGM can be constructed that satisfies the same formulas of ATL (Section 3.4).


We also show that the transformations we propose preserve the property of being a turn-based structure, and that they transform injective MGMs into lock-step synchronous ATSs and vice versa.

3.1. Some Special Types of ATSs and MGMs

DEFINITION 9 (Pauly 2002). A strategic game form $\langle Agt, \{\Sigma_a \mid a \in Agt\}, Q, o \rangle$ is an $a$-dictatorship if there is a player $a \in Agt$ who determines the outcome state of the game, i.e., $\forall \sigma_a \in \Sigma_a\ \exists q \in Q\ \forall \sigma_{Agt \setminus \{a\}}\ o(\sigma_a, \sigma_{Agt \setminus \{a\}}) = q$. An MGM $\langle Q, \gamma, \pi \rangle$ is turn-based if every $\gamma(q)$ is a dictatorship.5

We note that the notion of $a$-dictatorship is quite strong: it presumes that any choice of the dictator forces a chosen state as the outcome. A meaningful alternative, which one can aptly call $a$-leadership, is when some choices of $a$ can force the next state (the “wise choice of the leader”). It should be interesting to investigate whether the dictatorship-based and leadership-based strategic game forms lead to equivalent semantics for ATL.

DEFINITION 10. A strategic game form is injective if $o$ is injective, i.e., assigns different outcome states to different tuples of choices. An MGM is injective if it contains only injective game forms.

EXAMPLE 8. Note that the variable client/server game model from Figure 3 is not injective, because choices ⟨reject, set0⟩ and ⟨reject, set1⟩ always have the same outcome. The model is not turn-based either: s is a leader at both $q_0$ and $q_1$ (he can determine the next state with $\sigma_s = reject$), but the outcome of his other choice ($\sigma_s = accept$) depends on the choice of the client. On the other hand, the game tree from Figure 1A can be seen as a turn-based MGM: player $a_1$ is the dictator at state $q_0$, and player $a_2$ is the dictator at $q_1$ and $q_2$ (both players can be considered dictators at $q_3$, $q_4$, $q_5$ and $q_6$).

DEFINITION 11 (Alur et al. 1997). An ATS is turn-based synchronous if for every $q \in Q$ there is an agent $a$ who decides upon the next state, i.e., $\delta(q, a)$ consists entirely of singletons.
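Definitions 9 and 10 and the claims of Example 8 lend themselves to a mechanical check. The sketch below does so for an assumed encoding of the client/server game forms (the server's `reject` keeps the state, `accept` lets the client's `set0`/`set1` decide); the function names are hypothetical, and `is_leader` follows the informal description of $a$-leadership given above.

```python
from itertools import product

STATES = ("q0", "q1")  # q0: x = 0, q1: x = 1
AGENTS = ("s", "c")
CHOICES = {"s": ("reject", "accept"), "c": ("set0", "set1")}


def outcome(q, choice):
    """Assumed outcome function of the game form at state q."""
    if choice["s"] == "reject":
        return q
    return "q0" if choice["c"] == "set0" else "q1"


def profiles():
    """All complete tuples of choices, as dicts {agent: choice}."""
    for combo in product(*(CHOICES[a] for a in AGENTS)):
        yield dict(zip(AGENTS, combo))


def is_injective(q):
    """Different tuples of choices must lead to different outcome states."""
    outs = [outcome(q, p) for p in profiles()]
    return len(set(outs)) == len(outs)


def forcing_choices(q, agent):
    """Choices of `agent` that fix the next state whatever the others do."""
    result = []
    for own in CHOICES[agent]:
        outs = {outcome(q, p) for p in profiles() if p[agent] == own}
        if len(outs) == 1:
            result.append(own)
    return result


def is_dictator(q, agent):
    """Every choice of the agent determines the outcome (Definition 9)."""
    return len(forcing_choices(q, agent)) == len(CHOICES[agent])


def is_leader(q, agent):
    """At least one choice of the agent determines the outcome."""
    return len(forcing_choices(q, agent)) > 0
```

Under this encoding no agent is a dictator at either state, so the model is not turn-based, while `forcing_choices("q0", "s")` returns only `reject`, in line with the server being a leader.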
Every ATS can be “tightened” by removing from every choice set in $\delta(q, a)$ all states which can never be realized as successors in a transition from $q$. Every reasonably general criterion of equivalence should accept such tightening as equivalent to the original ATS.


DEFINITION 12. An ATS $T = \langle \Pi, Agt, Q, \pi, \delta \rangle$ is tight if, for every $q \in Q$, $a \in Agt$ and $Q_a \in \delta(q, a)$, we have $Q_a \subseteq Q_q^{suc}$.

COROLLARY 6. For every ATS $T$ there is a tight ATS $T'$ which satisfies the same formulas of ATL.

DEFINITION 13. An ATS is lock-step synchronous if the set of successor states $Q_q^{suc}$ of every state $q$ can be labeled with all tuples from some Cartesian product $\prod_{a \in Agt} Q_a$ so that all choice sets from $\delta(q, a)$ are ‘hyperplanes’ in $Q_q^{suc}$, i.e., sets of the form $\{q_a\} \times \prod_{b \in Agt \setminus \{a\}} Q_b$,6 where $q_a \in Q_a$. In other words, the agents act independently and each of them can only determine its ‘private’ component of the next state. It is worth emphasizing that lock-step synchronous systems closely correspond to the concept of interpreted systems from the literature on reasoning about knowledge (Fagin et al. 1995). Note that every lock-step synchronous ATS is tight.
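Tightening, as described before Definition 12, amounts to intersecting every choice set with the successor set of its state. A minimal sketch on a hypothetical three-state ATS follows; the system and the helper names are illustrative assumptions.

```python
from itertools import product

AGENTS = ("a", "b")

# Hypothetical ATS in which some choice sets mention states that can never
# be realized: q2 in a's first choice at q0, and q2 in b's choice at q1.
delta = {
    "q0": {"a": [frozenset({"q0", "q2"}), frozenset({"q1"})],
           "b": [frozenset({"q0", "q1"})]},
    "q1": {"a": [frozenset({"q1"})],
           "b": [frozenset({"q1", "q2"})]},
    "q2": {"a": [frozenset({"q2"})],
           "b": [frozenset({"q2"})]},
}


def successors(delta, q):
    """The successor set of q: states obtained as a singleton intersection."""
    result = set()
    for combo in product(*(delta[q][a] for a in AGENTS)):
        common = frozenset.intersection(*combo)
        if len(common) == 1:
            result |= common
    return result


def is_tight(delta):
    """Every choice set at q is contained in the successor set of q."""
    return all(
        choice <= successors(delta, q)
        for q, per_agent in delta.items()
        for a in AGENTS
        for choice in per_agent[a]
    )


def tighten(delta):
    """Drop from every choice set the states that are not successors."""
    return {
        q: {a: [choice & frozenset(successors(delta, q))
                for choice in per_agent[a]]
            for a in AGENTS}
        for q, per_agent in delta.items()
    }
```

Since every realized intersection already lies inside the successor set, the tightened system has the same successors at each state as the original one.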

3.2. From Alternating Transition Systems to MGMs

First, for every ATS $T = \langle \Pi, Agt, Q, \pi, \delta \rangle$ over a set of agents $Agt = \{a_1, \ldots, a_k\}$ there is an equivalent MGM $M^T = \langle Q, \gamma^T, \pi \rangle$ where, for each $q \in Q$, the strategic game form $\gamma^T(q) = \langle Agt, \{\Sigma_a^q \mid a \in Agt\}, Q, o_q \rangle$ is defined in a very simple way:
– $\Sigma_a^q = \delta(q, a)$,
– $o_q(Q_{a_1}, \ldots, Q_{a_k}) = s$ where $\bigcap_{a_i \in Agt} Q_{a_i} = \{s\}$.

EXAMPLE 9. Let us apply the transformation to the alternating transition system from Example 6. The resulting MGM is shown in Figure 6. The following proposition states that it satisfies the same ATL formulas as the original system. Note that, as $T$ and $M^T$ include the same set of states $Q$, the construction preserves validity of formulas (in the model), too.

PROPOSITION 7. For every alternating transition system $T$, a state $q$ in it, and an ATL formula $\varphi$: $T, q \models \varphi$ iff $M^T, q \models \varphi$.

The models $M^T$ defined as above share a specific property which will be defined below. First, we need an auxiliary technical notion: a fusion of $n$-tuples $(\alpha_1, \ldots, \alpha_n)$ and $(\beta_1, \ldots, \beta_n)$ is any $n$-tuple $(\gamma_1, \ldots, \gamma_n)$ where $\gamma_i \in \{\alpha_i, \beta_i\}$, $i = 1, \ldots, n$. The following is easy to check.


Figure 6. From ATS to a convex game structure: M T for the system from Figure 4.

PROPOSITION 8. For any game form $\langle Agt, \{\Sigma_a \mid a \in Agt\}, Q, o \rangle$, where $Agt = \{a_1, \ldots, a_k\}$, the following two properties of the outcome function $o : \prod_{a \in Agt} \Sigma_a \to Q$ are equivalent:

(i) If $o(\sigma_{a_1}, \ldots, \sigma_{a_k}) = o(\tau_{a_1}, \ldots, \tau_{a_k}) = s$ then $o(\varsigma_{a_1}, \ldots, \varsigma_{a_k}) = s$ for every fusion $(\varsigma_{a_1}, \ldots, \varsigma_{a_k})$ of $(\sigma_{a_1}, \ldots, \sigma_{a_k})$ and $(\tau_{a_1}, \ldots, \tau_{a_k})$.
(ii) For every $s \in Q$ there are $\Sigma'_a \subseteq \Sigma_a$ such that $o^{-1}(s) = \prod_{a \in Agt} \Sigma'_a$.

DEFINITION 14. A strategic game form $\langle Agt, \{\Sigma_a \mid a \in Agt\}, Q, o \rangle$ is convex if the outcome function $o$ satisfies (any of) the two equivalent properties above. A multi-player game model $M = (Q, \gamma, \pi)$ is convex if $\gamma(q)$ is convex for every $q \in Q$.

PROPOSITION 9. For every ATS $T$, the game model $M^T$ is convex.

Proof: Let $M^T$ be defined as above. If $o_q(Q^1_{a_1}, \ldots, Q^1_{a_k}) = o_q(Q^2_{a_1}, \ldots, Q^2_{a_k}) = s$ then $s \in Q^j_a$ for each $j = 1, 2$ and $a \in Agt$, therefore $\bigcap_{a \in Agt} Q^{j_a}_a = \{s\}$ for any fusion $(Q^{j_1}_{a_1}, \ldots, Q^{j_k}_{a_k})$ of $(Q^1_{a_1}, \ldots, Q^1_{a_k})$ and $(Q^2_{a_1}, \ldots, Q^2_{a_k})$.
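Property (i) of Proposition 8 is easy to test mechanically on a finite game form. The sketch below is an assumption-laden illustration: the name `is_convex`, the encoding of the outcome function as a dictionary over the full product of action sets, and both example games are hypothetical.

```python
from itertools import product

def is_convex(outcome):
    """Check property (i): equal outcomes are preserved by every fusion.

    outcome maps action profiles (tuples, one action per agent) to states
    and must be defined on the full Cartesian product of the action sets.
    """
    profiles = list(outcome)
    for sigma, tau in product(profiles, repeat=2):
        if outcome[sigma] != outcome[tau]:
            continue
        # A fusion picks, for each agent, the action from sigma or from tau.
        for fusion in product(*zip(sigma, tau)):
            if outcome[fusion] != outcome[sigma]:
                return False
    return True

# The first agent alone decides the outcome: convex.
dictator = {(0, 0): "s", (0, 1): "s", (1, 0): "t", (1, 1): "t"}
# Agents succeed only if they pick the same action: not convex.
coordination = {(0, 0): "win", (1, 1): "win", (0, 1): "lose", (1, 0): "lose"}

print(is_convex(dictator), is_convex(coordination))
```

In the second game, (0, 1) is a fusion of (0, 0) and (1, 1) but leads to a different state, which is exactly the kind of violation that property (i) rules out.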


REMARK 10. Pauly has pointed out that the convexity condition is known in game theory under the name of ‘rectangularity’, and that rectangular strategic game forms which are ‘tight’, in the sense that their $\alpha$- and $\beta$-effectivity functions coincide, are characterized in Abdou (1998) as the normal forms of extensive games with unique outcomes.

PROPOSITION 11.
1. Every turn-based game model is convex.
2. Every injective game model is convex.

Proof. (1) Let $M = \langle Q, \gamma, \pi \rangle$ be a turn-based MGM for a set of players $Agt$, and let $d \in Agt$ be the dictator for $\gamma(q)$, $q \in Q$. Then for every $s \in Q$, we have $o_q^{-1}(s) = \prod_{a \in Agt} \Sigma'_a$ where $\Sigma'_d = \{\sigma_d \in \Sigma_d^q \mid o_q(\ldots, \sigma_d, \ldots) = s\}$, and $\Sigma'_a = \Sigma_a^q$ for all $a \neq d$. (2) is trivial.

Note that the MGM from Figure 6 is convex, although it is neither injective nor turn-based, so the reverse implication does not hold.

3.3. From Convex Multi-Player Game Models to Alternating Transition Systems

As it turns out, convexity is a sufficient condition if we want to relabel transitions from a multi-player game model back to an alternating transition system. Let $M = \langle Q, \gamma, \pi \rangle$ be a convex MGM over a set of propositions $\Pi$, where $Agt = \{a_1, \ldots, a_k\}$, and let $\gamma(q) = \langle Agt, \{\Sigma_a^q \mid a \in Agt\}, Q, o_q \rangle$ for each $q \in Q$. We transform it to an ATS $T^M = \langle \Pi, Agt, Q, \pi, \delta^M \rangle$ with the transition function $\delta^M$ defined by

$\delta^M(q, a) = \{Q_{\sigma_a} \mid \sigma_a \in \Sigma_a^q\}$,
$Q_{\sigma_a} = \{o_q(\sigma_a, \sigma_{Agt \setminus \{a\}}) \mid \sigma_{Agt \setminus \{a\}} = \langle \sigma_{b_1}, \ldots, \sigma_{b_{k-1}} \rangle,\ b_i \neq a,\ \sigma_{b_i} \in \Sigma_{b_i}^q\}$.

Thus, $Q_{\sigma_a}$ is the set of states to which a transition may be effected from $q$ while agent $a$ has chosen to execute $\sigma_a$. Moreover, $\delta^M(q, a)$ simply collects all such sets. For purely technical reasons we will regard these $\delta^M(q, a)$ as indexed families, i.e., even if some $Q_{\sigma_1}$ and $Q_{\sigma_2}$ are set-theoretically equal, they will be considered different as long as $\sigma_1 \neq \sigma_2$. By convexity of $\gamma(q)$ it is easy to verify that $\bigcap_{a \in Agt} Q_{\sigma_a} = \{o_q(\sigma_{a_1}, \ldots, \sigma_{a_k})\}$ for every tuple $(Q_{\sigma_{a_1}}, \ldots, Q_{\sigma_{a_k}}) \in \delta^M(q, a_1) \times \cdots \times \delta^M(q, a_k)$. Furthermore, the following propositions hold.

PROPOSITION 12. For every convex MGM $M$ the ATS $T^M$ is tight.
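The definition of $\delta^M$ can be illustrated with a short Python sketch. The function name `game_form_to_ats`, the dictionary encoding, and the example game are assumptions made for illustration only; the indexed-family convention is mirrored by keying each choice set with the action that produced it.

```python
from itertools import product

def game_form_to_ats(choices, outcome):
    """Compute delta^M(q, a) as an indexed family {sigma_a: Q_sigma_a}.

    choices: dict agent -> list of actions; profiles follow sorted agent order.
    outcome: dict mapping action profiles (tuples) to successor states.
    """
    agents = sorted(choices)
    delta = {a: {} for a in agents}
    for profile in product(*(choices[a] for a in agents)):
        target = outcome[profile]
        # The target is reachable whenever agent a plays its part of the profile.
        for a, sigma_a in zip(agents, profile):
            delta[a].setdefault(sigma_a, set()).add(target)
    return delta

# Hypothetical injective (hence convex) game form with two agents.
choices = {"a": ["l", "r"], "b": ["u", "d"]}
outcome = {("l", "u"): "q0", ("l", "d"): "q1", ("r", "u"): "q2", ("r", "d"): "q3"}
delta = game_form_to_ats(choices, outcome)
```

For this example the choice sets of `a` and `b` intersect in exactly the state that `outcome` assigns to the corresponding profile, as the text claims for convex game forms.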


PROPOSITION 13. For every convex MGM $M$, a state $q$ in it, and an ATL formula $\varphi$, $M, q \models \varphi$ iff $T^M, q \models \varphi$.

Note that the above construction transforms the multi-player game model from Figure 6 exactly back to the ATS from Figure 4. More generally, the constructions converting tight ATSs into convex MGMs and vice versa are mutually inverse, thus establishing a duality between these two types of structures:

PROPOSITION 14.
1. Every tight ATS $T$ is isomorphic to $T^{M^T}$.
2. Every convex MGM $M$ is isomorphic to $M^{T^M}$.

Proof. 1. It suffices to see that $\delta^{M^T}(q, a) = \delta(q, a)$ for every $q \in Q$ and $a \in Agt$, which is straightforward from the tightness of $T$.
2. Let $M = \langle Q, \gamma, \pi \rangle$ be a convex MGM and $\gamma(q) = \langle Agt, \{\Sigma_a^q \mid a \in Agt\}, Q, o_q \rangle$ for $q \in Q$. For every $\sigma_a \in \Sigma_a^q$ we identify $\sigma_a$ with $Q_{\sigma_a}$ defined as above. We have to show that the outcome functions $o_q$ in $M$ and $o_q^{T^M}$ in $M^{T^M}$ agree under that identification. Indeed, $o_q^{T^M}(Q_{\sigma_{a_1}}, \ldots, Q_{\sigma_{a_k}}) = s$ iff $\bigcap_{a \in Agt} Q_{\sigma_a} = \{s\}$ iff $o_q(\sigma_{a_1}, \ldots, \sigma_{a_k}) = s$.

The following proposition shows the relationship between structural properties of MGMs and ATSs:

PROPOSITION 15.
1. For every ATS $T$ the game model $M^T$ is injective iff $T$ is lock-step synchronous.
2. For every convex MGM $M$, the ATS $T^M$ is lock-step synchronous iff $M$ is injective.
3. For every turn-based synchronous ATS $T$ the game model $M^T$ is turn-based. Conversely, if $M^T$ is turn-based for some tight ATS $T$ then $T$ is turn-based synchronous.
4. For every convex MGM $M$ the ATS $T^M$ is turn-based synchronous iff $M$ is turn-based.

Proof. (1) Let $T$ be lock-step synchronous and $o_q(Q_{a_1}, \ldots, Q_{a_k}) = \langle s_{a_1}, \ldots, s_{a_k} \rangle$ for some $Q_{a_i} \in \delta(q, a_i)$, $i = 1, \ldots, k$. Then $Q_{a_i} = \{s_{a_i}\} \times \prod_{a \in Agt \setminus \{a_i\}} Q_a$ where $Q_q^{suc} = \prod_{a \in Agt} Q_a$, whence the injectivity of $M^T$. Conversely, if $M^T$ is injective then every state $s \in Q_q^{suc}$ can be labeled with the unique tuple $\langle Q_{a_1}, \ldots, Q_{a_k} \rangle$ such that $o_q(Q_{a_1}, \ldots, Q_{a_k}) = s$, i.e., $Q_q^{suc}$ is represented by $\prod_{a \in Agt} \delta(q, a)$, and every $Q_{a_i} \in \delta(q, a_i)$ can be identified with $\{Q_{a_i}\} \times \prod_{a \in Agt \setminus \{a_i\}} \delta(q, a)$.


(2) If $M$ is injective then $Q_q^{suc}$ can be labeled by $\prod_{a \in Agt} \Sigma_a^q$ where every $Q_{\sigma_{a_i}} \in \delta(q, a_i)$ is identified with $\{\sigma_{a_i}\} \times \prod_{a \in Agt \setminus \{a_i\}} \delta(q, a)$. Conversely, if $T^M$ is lock-step synchronous then every two different $Q_{\sigma_a}$ and $Q_{\sigma'_a}$ from $\delta(q, a)$ must be disjoint, whence the injectivity of $M$. (3) and (4): the proofs are straightforward.

3.4. Equivalence between the Semantics for ATL Based on ATS and MGM

So far we have shown how to transform alternating transition systems to convex multi-player game models, and vice versa. Unfortunately, not every MGM is convex. However, for every MGM we can construct a convex multi-player game model that satisfies the same formulas of ATL. This can be done by creating distinct copies of the original states for different incoming transitions, and thus ‘storing’ the knowledge of the previous state and the most recent choices from the agents in the new states. Since the actual choices are present in the label of the resulting state, the new transition function is obviously injective. It is also easy to observe that the construction below preserves not only satisfiability, but also validity of formulas (in the model).

PROPOSITION 16. For every MGM $M = \langle Q, \gamma, \pi \rangle$ there is an injective (and hence convex) MGM $M' = \langle Q', \gamma', \pi' \rangle$ which satisfies the same formulas of ATL.

Proof. For every $\gamma(q) = \langle Agt, \{\Sigma_a^q \mid a \in Agt\}, Q, o_q \rangle$ we define $Q_q = \{q\} \times \prod_{a \in Agt} \Sigma_a^q$ and let $Q' = Q \cup \bigcup_{q \in Q} Q_q$. Now we define $\gamma'$ as follows:

– for $q \in Q$, we define $\gamma'(q) = \langle Agt, \{\Sigma_a^q \mid a \in Agt\}, O_q, Q' \rangle$, and $O_q(\sigma_{a_1}, \ldots, \sigma_{a_k}) = \langle q, \sigma_{a_1}, \ldots, \sigma_{a_k} \rangle$;
– for $\sigma = \langle q, \sigma_{a_1}, \ldots, \sigma_{a_k} \rangle \in Q_q$, and $s = o_q(\sigma_{a_1}, \ldots, \sigma_{a_k})$, we define $\gamma'(\sigma) = \gamma'(s)$;
– finally, $\pi'(q) = \pi(q)$ for $q \in Q$, and $\pi'(\langle q, \sigma_{a_1}, \ldots, \sigma_{a_k} \rangle) = \pi(o_q(\sigma_{a_1}, \ldots, \sigma_{a_k}))$ for $\langle q, \sigma_{a_1}, \ldots, \sigma_{a_k} \rangle \in Q_q$.

The model $M'$ is injective and it can be proved by a straightforward induction that for every ATL formula $\varphi$:

– $M', q \models \varphi$ iff $M, q \models \varphi$ for $q \in Q$, and
– $M', \langle q, \sigma_{a_1}, \ldots, \sigma_{a_k} \rangle \models \varphi$ iff $M, o_q(\sigma_{a_1}, \ldots, \sigma_{a_k}) \models \varphi$ for $\langle q, \sigma_{a_1}, \ldots, \sigma_{a_k} \rangle \in Q_q$.

Thus, the restriction of the semantics of ATL to the class of injective (and hence to convex, as well) MGMs does not introduce new validities. Since every ATS can be reduced to an equivalent tight one, we obtain the following.


Figure 7. Construction of a convex multi-player game model equivalent to the MGM from Figure 3.

Figure 8. ATS-style transition function for the convex game model from Figure 7.

COROLLARY 17. For every ATL formula $\varphi$ the following are equivalent:
1. $\varphi$ is valid in all (tight) alternating transition systems.
2. $\varphi$ is valid in all (injective) multi-player game models.

We note that the above construction preserves validity and satisfiability of ATL$^*$ formulas, too.

EXAMPLE 10. We can apply the construction to the controller from Example 4, and obtain a convex MGM equivalent to the original one in the context of ATL. The result is displayed in Figure 7. The labels for the transitions can be easily deduced from their target states. Re-writing the game model into an isomorphic ATS, according to the construction from Section 3.3 (see Figure 8), completes the transformation from an arbitrary multi-player game model to an alternating transition system for which the same ATL formulas hold.
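The state-copying construction from the proof of Proposition 16 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the name `make_injective`, the representation of a game model as a dictionary from states to outcome tables, and the two-state example are all assumptions.

```python
def make_injective(game):
    """Unfold a game model so that every outcome function becomes injective.

    game: dict state -> dict mapping action profiles to successor states.
    Returns (new_game, origin). The new states are the old ones plus copies
    (q, profile); origin maps every new state to the original state whose
    labeling it inherits.
    """
    new_game, origin = {}, {}
    for q, outcome in game.items():
        origin[q] = q
        # Each profile chosen at q now leads to its own fresh copy.
        new_game[q] = {profile: (q, profile) for profile in outcome}
    for q, outcome in game.items():
        for profile, s in outcome.items():
            copy = (q, profile)
            origin[copy] = s
            # The copy behaves exactly like the new version of its target s.
            new_game[copy] = new_game[s]
    return new_game, origin

# Hypothetical single-agent model in which two actions share one outcome.
game = {"q0": {("a",): "q1", ("b",): "q1"}, "q1": {("a",): "q0"}}
new_game, origin = make_injective(game)
```

The labeling $\pi'$ is obtained by evaluating the original $\pi$ on `origin[x]` for every new state `x`.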

3.5. ATS or MGM?

Alur has stated$^{7}$ that the authors of ATL switched from alternating transition systems to concurrent game structures mostly to improve understandability of the logic and clarity of the presentation. Indeed, identifying actions with their outcomes may make things somewhat artificial and unnecessarily complicated. In particular, we find the convexity condition which ATSs impose too strong and unjustified in many situations. For instance, consider the following variation of the ‘Chicken’ game: two cars are running against each other on a country road and each of the drivers, seeing the other car, can take any of the actions ‘drive straight’, ‘swerve to the left’ and ‘swerve to the right’. Each of the combined actions for the two drivers, $\langle$drive straight, swerve to the left$\rangle$ and $\langle$swerve to the right, drive straight$\rangle$, leads to a non-collision outcome, while each of their fusions $\langle$drive straight, drive straight$\rangle$ and $\langle$swerve to the left, swerve to the right$\rangle$ leads to a collision. Likewise, in the ‘Coordinated Attack’ scenario (Fagin et al. 1995) any non-coordinated one-sided attack leads to defeat, while the coordinated attack of both armies, which is a fusion of these, leads to a victory. Thus, the definition of outcome function in coalition games is more general and flexible in our opinion.

Let us consider the system from Example 4 again. The multi-player game model (or concurrent game structure) from Figure 3 looks natural and intuitive. Unfortunately, there is no ATS with the same number of states and transitions that fits the system description. In consequence, an ATS modeling the same situation must be larger (Jamroga 2003).

The above examples show that correct alternating transition systems are more difficult to come up with directly than multi-player game models, and usually they are more complex, too. This should be especially evident when we consider open systems. Suppose we need to add another client process to the ATS from Example 6. It would be hard to extend the existing transition function in a straightforward way so that it still satisfies the formal requirements (all the intersections of choices are singletons). Designing a completely new ATS is probably an easier solution.

Another interesting issue is extendibility of the formalisms. Game models incorporate explicit labels for agents’ choices; therefore the labels can be used, for instance, to restrict the set of valid strategies under uncertainty (Jamroga and van der Hoek 2003).


Figure 9. Coalition effectivity model for the variable client/server system.

3.6. Coalition Effectivity Models as Equivalent Alternative Semantics for ATL

Effectivity functions and coalition effectivity models were introduced in Section 2.5, including a characterization of these effectivity functions which describe abilities of agents and their coalitions in actual strategic game forms (playable effectivity functions, Theorem 5). We are going to extend the result to a correspondence between multi-player game models and standard coalition effectivity models (i.e., the coalition effectivity models that contain only playable effectivity functions).

Every MGM $M = \langle Q, \gamma, \pi \rangle$ for the set of players $Agt$ corresponds to a CEM $E^M = \langle Agt, Q, E^M, \pi \rangle$, where for every $q \in Q$, $X \subseteq Q$ and $A \subseteq Agt$, we have

$X \in E_q^M(A)$ iff $\exists \sigma_A\ \forall \sigma_{Agt \setminus A}\ \exists s \in X:\ o(\sigma_A, \sigma_{Agt \setminus A}) = s$.

The choices refer to the strategic game form $\gamma(q)$. Conversely, by Theorem 5, for every standard coalition effectivity model $E$ there is a multi-player game model $M$ such that $E$ is isomorphic to $E^M$. Again, by a straightforward induction on formulas, we obtain:

PROPOSITION 18. For every MGM $M$, a state $q$ in it, and an ATL formula $\varphi$, we have $M, q \models \varphi$ iff $E^M, q \models \varphi$.

EXAMPLE 11. Let $M$ be the multi-player game model from Example 4 (variable client/server system). Coalition effectivity model $E^M$ is presented in Figure 9.

By Proposition 9 and Corollary 17, we eventually obtain:


THEOREM 19. For every ATL formula $\varphi$ the following are equivalent:
1. $\varphi$ is valid in all (tight) alternating transition systems,
2. $\varphi$ is valid in all (injective) multi-player game models,
3. $\varphi$ is valid in all standard coalition effectivity models.

Thus, the semantics of ATL based on alternating transition systems, multi-player game models, and standard coalition effectivity models are equivalent. We note that, while the former two semantics are more concrete and natural, they are mathematically less elegant and less suitable for formal reasoning about ATL, while the semantics based on coalition effectivity models is essentially a monotone neighborhood semantics for multi-modal logics. The combination of these semantics was used in Goranko and van Drimmelen (2003) to establish a complete axiomatization of ATL.

4. ATEL: ADDING KNOWLEDGE TO STRATEGIES AND TIME

Alternating-time Temporal Epistemic Logic ATEL (van der Hoek and Wooldridge 2002, 2003a) enriches the picture with an epistemic component. ATEL adds to ATL operators for representing agents’ knowledge: $K_a \varphi$ reads as “agent $a$ knows that $\varphi$”. Additional operators $E_A \varphi$, $C_A \varphi$, and $D_A \varphi$ refer to “everybody knows”, common knowledge, and distributed knowledge among the agents from $A$. Thus, $E_A \varphi$ means that every agent in $A$ knows that $\varphi$ holds, while $C_A \varphi$ means not only that the agents from $A$ know that $\varphi$, but they also know that they know that, and know that they know that they know it, etc. The distributed knowledge modality $D_A \varphi$ denotes a situation in which, if the agents could combine their individual knowledge together, they would be able to infer that $\varphi$ is true.

4.1. AETS and Semantics of Epistemic Formulas

Models for ATEL are called alternating epistemic transition systems (AETS). They extend alternating transition systems with epistemic accessibility relations $\sim_{a_1}, \ldots, \sim_{a_k} \subseteq Q \times Q$ for modeling agents’ uncertainty:

$T = \langle Agt, Q, \Pi, \pi, \sim_{a_1}, \ldots, \sim_{a_k}, \delta \rangle$.

These are assumed to be equivalence relations. Agent $a$’s epistemic relation is meant to encode $a$’s inability to distinguish between the (global) system states: $q \sim_a q'$ means that, while the system is in state $q$, agent $a$ cannot really determine whether it is in $q$ or $q'$. Then:

$T, q \models K_a \varphi$ iff for all $q'$ such that $q \sim_a q'$ we have $T, q' \models \varphi$.


Figure 10. An AETS for the modiﬁed controller/client problem. The dotted lines display the epistemic accessibility relations for s and c.

REMARK 20. Since the epistemic relations are required to be equivalences, the epistemic layer of ATEL refers indeed to agents’ knowledge rather than beliefs in general. We suggest that this requirement can be relaxed to allow ATEL for other kinds of beliefs as well. In particular, the interpretation of ATEL into ATL we propose in Section 4.4 does not assume any specific properties of the accessibility relations.

Relations $\sim_A^E$, $\sim_A^C$ and $\sim_A^D$, used to model group epistemics, are derived from the individual accessibility relations of agents from $A$. First, $\sim_A^E$ is the union of the relations, i.e., $q \sim_A^E q'$ iff $q \sim_a q'$ for some $a \in A$. In other words, if everybody knows $\varphi$, then no agent may be unsure about the truth of it, and hence $\varphi$ should be true in all the states that cannot be distinguished from the current state by even one member of the group. Next, $\sim_A^C$ is defined as the transitive closure of $\sim_A^E$. Finally, $\sim_A^D$ is the intersection of all the $\sim_a$, $a \in A$: if any agent from $A$ can distinguish $q$ from $q'$, then the whole group can distinguish the states in the sense of distributed knowledge. The semantics of group knowledge can be defined as below (for $\mathcal{K} = C, E, D$):

$T, q \models \mathcal{K}_A \varphi$ iff for all $q'$ such that $q \sim_A^{\mathcal{K}} q'$ we have $T, q' \models \varphi$.

The time complexity of model checking for ATEL is still polynomial (van der Hoek and Wooldridge 2003a).

EXAMPLE 12. Let us consider another variation of the variable controller example: the client can try to add 1 or 2 (modulo 3) to the value of $x$; the server can still accept or reject the request (Figure 10). The dotted lines show that $c$ cannot distinguish being in state $q_0$ from being in $q_1$, while $s$ is not able to discriminate $q_0$ from $q_2$. Some formulas that are valid for this AETS are shown below:

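1. $x = 1 \to K_s\, x = 1$,
2. $x = 2 \to E_{\{s,c\}} \neg x = 1 \wedge \neg C_{\{s,c\}} \neg x = 1$,
3. $x = 0 \to \langle\langle s \rangle\rangle X\, x = 0 \wedge \neg K_s \langle\langle s \rangle\rangle X\, x = 0$,
4. $x = 2 \to \langle\langle s, c \rangle\rangle X (x = 0 \wedge \neg E_{\{s,c\}}\, x = 0)$.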
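The purely epistemic parts of these formulas can be checked with a small Python sketch. It relies on an assumption that is not stated explicitly in the text: each state $q_i$ of Figure 10 is taken to satisfy $x = i$. The helper names are hypothetical as well.

```python
def equivalence_from_classes(states, classes):
    """Relation (set of pairs) induced by the given indistinguishability
    classes; states not mentioned are related only to themselves."""
    rel = {(q, q) for q in states}
    for cls in classes:
        rel |= {(p, q) for p in cls for q in cls}
    return rel

def transitive_closure(rel):
    closure = set(rel)
    while True:
        extra = {(p, r) for (p, q) in closure for (q2, r) in closure if q == q2}
        if extra <= closure:
            return closure
        closure |= extra

def knows(rel, holds, q):
    """True iff the property holds in every state related to q."""
    return all(holds(q2) for (q1, q2) in rel if q1 == q)

states = ["q0", "q1", "q2"]
x = {"q0": 0, "q1": 1, "q2": 2}   # assumption: q_i satisfies x = i
sim_c = equivalence_from_classes(states, [["q0", "q1"]])
sim_s = equivalence_from_classes(states, [["q0", "q2"]])

sim_E = sim_s | sim_c               # "everybody knows": union
sim_C = transitive_closure(sim_E)   # common knowledge: transitive closure
sim_D = sim_s & sim_c               # distributed knowledge: intersection

not_x1 = lambda q: x[q] != 1
formula_1_at_q1 = knows(sim_s, lambda q: x[q] == 1, "q1")
formula_2_at_q2 = knows(sim_E, not_x1, "q2") and not knows(sim_C, not_x1, "q2")
```

Under the stated assumption, both `formula_1_at_q1` and `formula_2_at_q2` evaluate to `True`: from $q_2$ the union relation only reaches $q_0$ and $q_2$, while its transitive closure also reaches $q_1$.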

4.2. Extending Multi-Player Game Models and Coalition Effectivity Models to Include Knowledge

Multi-player game models and coalition effectivity models can be augmented with epistemic accessibility relations in a similar way, giving rise to multi-player epistemic game models $M = \langle Q, \gamma, \pi, \sim_{a_1}, \ldots, \sim_{a_k} \rangle$ and epistemic coalition effectivity models $E = \langle Agt, Q, E, \pi, \sim_{a_1}, \ldots, \sim_{a_k} \rangle$ for a set of agents $Agt = \{a_1, \ldots, a_k\}$ over a set of propositions $\Pi$. Semantic rules for epistemic formulas remain the same as in Section 4.1 for both kinds of structures. The equivalence results from Section 3 can be extended to ATEL and its models.

COROLLARY 21. For every ATEL formula $\varphi$ the following are equivalent:
1. $\varphi$ is valid in all (tight) alternating epistemic transition systems,
2. $\varphi$ is valid in all (injective) multi-player epistemic game models,
3. $\varphi$ is valid in all standard epistemic coalition effectivity models.

We will use multi-player epistemic game models throughout the rest of this chapter for the convenience of presentation they offer.

4.3. Problems with ATEL

One of the main challenges in ATEL is the question how, given an explicit way to represent agents’ knowledge, this should interfere with the agents’ available strategies. What does it mean that an agent has a strategy to enforce $\varphi$ if, for instance, it involves making different choices in states that are epistemically indistinguishable for the agent? Moreover, agents are assumed to have some epistemic capabilities when making decisions, and others for epistemic properties like $K_a \varphi$. The interpretation of knowledge operators refers to the agents’ capability to distinguish one state from another; the semantics of $\langle\langle A \rangle\rangle$ allows the agents to base their decisions upon sequences of states. These relations between complete vs. incomplete information on one hand, and perfect vs. imperfect recall on the other, have been studied in Jamroga and van der Hoek (2003). It was also argued that, when reasoning about what an agent can enforce, it seems more appropriate to require the agent to know his winning strategy rather than to know only that such a strategy exists.$^{8}$ Two variations of ATEL were proposed as solutions: Alternating-time Temporal Observational Logic (ATOL) for agents with bounded memory and syntax restricted in a way similar to CTL, and full Alternating-time Temporal Epistemic Logic with Recall (ATEL-R$^*$) where agents were able to memorize the whole game. The issue of a philosophically consistent semantics for Alternating-time Temporal Logic with an epistemic component is still under debate, and it is rather beyond the scope of this paper. We believe that results analogous to those presented here about ATEL can be obtained for logics like ATOL and ATEL-R$^*$ and their models.

4.4. Interpretations of ATEL into ATL

ATL is trivially embedded into ATEL since all ATL formulas are also ATEL formulas. Moreover, every multi-player game model can be extended to a multi-player epistemic game model by defining all epistemic accessibility relations to be the equality, i.e., all agents have no uncertainty about the current state of the system. This embeds the semantics of ATL in the one for ATEL, and renders the former a reduct of the latter.

Interpretation the other way is more involved. We will first construct a satisfiability preserving interpretation of the fragment of ATEL without distributed knowledge (we will call it ATEL$_{CE}$), and then we will show how it can be extended to the whole ATEL, though at the expense of some blow-up of the models. The interpretation we propose has been inspired by Schild (2000). We should also mention van Otterloo et al. (2003), as it deals with virtually the same issue. Related work is discussed in more detail at the end of the section.

4.4.1. Idea of the Interpretation

ATEL consists of two orthogonal layers. The first one, inherited from ATL, refers to what agents can achieve in temporal perspective, and is underpinned by the structure defined via transition function $o$. The other layer is the epistemic component, reflected by epistemic accessibility relations.
Our idea of the translation is to leave the original temporal structure intact, while extending it with additional transitions to ‘simulate’ epistemic accessibility links. The ‘simulation’, like the one in van Otterloo et al. (2003), is achieved through adding new ‘epistemic’ agents, who can enforce transitions to epistemically accessible states. Unlike in that paper, though, the ‘moves’ of epistemic agents are orthogonal to the original temporal transitions (‘action’ transitions): they lead to special ‘epistemic’ copies of the original states rather than to the ‘action’ states themselves, and no new states are introduced into the course of action. The ‘action’ and ‘epistemic’ states form separate strata in the resulting model, and are labeled accordingly to distinguish transitions that implement different modalities.

Figure 11. New model: ‘action’ vs. ‘epistemic’ states, and ‘action’ vs. ‘epistemic’ transitions. Note that the game frames for ‘epistemic’ states are exact copies of their ‘action’ originals: the ‘action’ transitions from the epistemic layer lead back to the ‘action’ states.

The interpretation consists of two independent parts: a transformation of models and a translation of formulas. First, we propose a construction that transforms every multi-player epistemic game model $M$ for a set of agents $\{a_1, \ldots, a_k\}$ into a (pure) multi-player game model $M^{ATL}$ over a set of agents $\{a_1, \ldots, a_k, e_1, \ldots, e_k\}$. Agents $a_1, \ldots, a_k$ are the original agents from $M$ (we will call them ‘real agents’). Agents $e_1, \ldots, e_k$ are ‘epistemic doubles’ of the real agents: the role of $e_i$ is to ‘point out’ the states that were epistemically indistinguishable from the current state for agent $a_i$ in $M$. Intuitively, $K_{a_i} \varphi$ could then be replaced with a formula like $\neg \langle\langle e_i \rangle\rangle X \neg \varphi$ that rephrases the semantic definition of the $K_a$ operator from Section 4.1. As $M^{ATL}$ inherits the temporal structure from $M$, temporal formulas might be left intact. However, it is not as simple as that.

Note that agents make their choices simultaneously in multi-player game models, and the resulting transition is a result of all these choices. In consequence, it is not possible that an epistemic agent $e_i$ can enforce an ‘epistemic’ transition to state $q$, and at the same time a group of real agents $A$ is capable of executing an ‘action’ transition to $q'$. Thus, in order to distinguish transitions referring to different modalities, we introduce additional states in model $M^{ATL}$. States $q_1^{e_i}, \ldots, q_n^{e_i}$ are exact copies of the original states $q_1, \ldots, q_n$ from $Q$ except for one thing: they satisfy a new proposition $e_i$, added to enable identifying moves of epistemic agent $e_i$. Original states $q_1, \ldots, q_n$ are still in $M^{ATL}$ to represent targets of ‘action’ moves of the real agents $a_1, \ldots, a_k$. We will use a new proposition $act$ to label these states. The type of a transition can be recognized by the label of its target state (cf. Figure 11).

Now, we must only arrange the interplay between agents’ choices, so that the results can be interpreted in a direct way. To achieve this, every epistemic agent can choose to be ‘passive’ and let the others decide upon the next move, or may select one of the states indistinguishable from $q$ for an agent $a_i$ (to be more precise, the epistemic agents do select the epistemic copies of states from $Q^{e_i}$ rather than the original action states from $Q$). The resulting transition leads to the state selected by the first non-passive epistemic agent. If all the epistemic agents decided to be passive, the ‘action’ transition chosen by the real agents follows.

For such a construction of $M^{ATL}$, we can finally show how to translate formulas from ATEL to ATL:

1. $K_{a_i} \varphi$ can be rephrased as $\neg \langle\langle \{e_1, \ldots, e_i\} \rangle\rangle X (e_i \wedge \neg \varphi)$: the epistemic moves to agent $e_i$’s epistemic states do not lead to a state where $\varphi$ fails. Note that player $e_i$ can select a state of his if, and only if, players $e_1, \ldots, e_{i-1}$ are passive (hence their presence in the cooperation modality). Note also that $K_{a_i} \varphi$ can be as well translated as $\neg \langle\langle \{e_1, \ldots, e_k\} \rangle\rangle X (e_i \wedge \neg \varphi)$ or $\neg \langle\langle \{a_1, \ldots, a_k, e_1, \ldots, e_k\} \rangle\rangle X (e_i \wedge \neg \varphi)$: when $e_i$ decides to be active, choices from $a_1, \ldots, a_k$ and $e_{i+1}, \ldots, e_k$ are irrelevant.
2. $\langle\langle A \rangle\rangle X \varphi$ becomes $\langle\langle A \cup \{e_1, \ldots, e_k\} \rangle\rangle X (act \wedge \varphi)$ in a similar way.
3. To translate other temporal formulas, we must require that the relevant part of a path runs only through ‘action’ states (labeled with the $act$ proposition). Thus, $\langle\langle A \rangle\rangle G \varphi$ can be rephrased as $\varphi \wedge \langle\langle A \cup Agt^e \rangle\rangle X \langle\langle A \cup Agt^e \rangle\rangle G (act \wedge \varphi)$. Note that a simpler translation with $\langle\langle A \cup Agt^e \rangle\rangle G (act \wedge \varphi)$ is incorrect: the initial state of a path does not have to be an action state, since $\langle\langle A \rangle\rangle G \varphi$ can be embedded in an epistemic formula. A similar method applies to the translation of $\langle\langle A \rangle\rangle \varphi\, U \psi$.
4. Translation of common knowledge refers to the definition of relation $\sim_A^C$ as the transitive closure of relations $\sim_{a_i}$: $C_A \varphi$ means that all the (finite) sequences of appropriate epistemic transitions must end up in a state where $\varphi$ is true.

The only operator that does not seem to lend itself to a translation according to the above scheme is the distributed knowledge operator $D$, for which we seem to need more ‘auxiliary’ agents. Thus, we will begin with presenting details of our interpretation for ATEL$_{CE}$, a reduced version of ATEL that includes only common knowledge and ‘everybody knows’ operators for group epistemics. Section 4.4.3 shows how to modify the translation to include distributed knowledge as well. We note that an analogous interpretation into ATL can be proposed for the propositional version of BDI logic based on CTL.

4.4.2. Interpreting Models and Formulas of ATEL$_{CE}$ into ATL

Given a multi-player epistemic game model $M = \langle Q, \gamma, \pi, \sim_{a_1}, \ldots, \sim_{a_k} \rangle$ for a set of agents $Agt = \{a_1, \ldots, a_k\}$ over a set of propositions $\Pi$, we construct a new game model $M^{ATL} = \langle Q', \gamma', \pi' \rangle$ over a set of agents $Agt' = Agt \cup Agt^e$, where:

1. $Agt^e = \{e_1, \ldots, e_k\}$ is the set of epistemic agents;
2. $Q' = Q \cup Q^{e_1} \cup \cdots \cup Q^{e_k}$, where $Q^{e_i} = \{q^{e_i} \mid q \in Q\}$. We assume that $Q, Q^{e_1}, \ldots, Q^{e_k}$ are pairwise disjoint. Further we will be using the more general notation $S^{e_i} = \{q^{e_i} \mid q \in S\}$ for any $S \subseteq Q$.
3. $\Pi' = \Pi \cup \{act, e_1, \ldots, e_k\}$, and $\pi'(p) = \pi(p) \cup \bigcup_{i=1,\ldots,k} \pi(p)^{e_i}$ for every proposition $p \in \Pi$. Moreover, $\pi'(act) = Q$ and $\pi'(e_i) = Q^{e_i}$.

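For every state $q$ in $M$, we translate the game frame $\gamma(q) = \langle Agt, \{\Sigma_a^q \mid a \in Agt\}, Q, o_q \rangle$ to $\gamma'(q) = \langle Agt', \{\Sigma'^{\,q}_a \mid a \in Agt'\}, Q', o'_q \rangle$:

1. $\Sigma'^{\,q}_a = \Sigma_a^q$ for $a \in Agt$: choices of the ‘real’ agents do not change;
2. $\Sigma'^{\,q}_{e_i} = \{pass\} \cup img(q, \sim_{a_i})^{e_i}$ for $i = 1, \ldots, k$, where $img(q, R) = \{q' \mid qRq'\}$ is the image of $q$ with respect to relation $R$;
3. the new transition function is defined as follows:

$o'_q(\sigma_{a_1}, \ldots, \sigma_{a_k}, \sigma_{e_1}, \ldots, \sigma_{e_k}) = \begin{cases} o_q(\sigma_{a_1}, \ldots, \sigma_{a_k}) & \text{if } \sigma_{e_1} = \cdots = \sigma_{e_k} = pass \\ \sigma_{e_i} & \text{if } e_i \text{ is the first active epistemic agent.} \end{cases}$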
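The case distinction in the new transition function translates directly into code. The following Python sketch is illustrative only: the names `new_outcome`, `epistemic_choices` and `epistemic_copy`, the pair encoding of epistemic copies, and the action labels in the usage lines are assumptions, not part of the paper.

```python
PASS = "pass"

def epistemic_copy(state, i):
    """Name of the copy of `state` that belongs to epistemic agent e_i."""
    return (state, f"e{i}")

def epistemic_choices(q, i, sim_i):
    """Choices of e_i at q: pass, or the e_i-copy of any state that agent
    a_i cannot tell apart from q (sim_i is a set of pairs of states)."""
    return [PASS] + [epistemic_copy(q2, i) for (q1, q2) in sorted(sim_i) if q1 == q]

def new_outcome(o_q, real_actions, epistemic_actions):
    """o'_q: the first non-passive epistemic agent decides; if all of them
    pass, the real agents' joint action is executed as in the original model.

    o_q: dict mapping tuples of real actions to successor states.
    epistemic_actions: one entry per e_1..e_k, either PASS or a copy state.
    """
    for choice in epistemic_actions:
        if choice != PASS:
            return choice
    return o_q[tuple(real_actions)]

# Hypothetical fragment: server and client actions at some state.
o_q = {("accept", "add1"): "q0", ("reject", "add1"): "q2"}
print(new_outcome(o_q, ("accept", "add1"), (PASS, PASS)))
print(new_outcome(o_q, ("accept", "add1"), (PASS, epistemic_copy("q0", 2))))
```

The loop over `epistemic_actions` encodes the priority among epistemic agents, which is why the translation of $K_{a_i}$ has to mention $e_1, \ldots, e_{i-1}$ as well.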

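The game frames for the new states are exactly the same: $\gamma'(q^{e_i}) = \gamma'(q)$ for all $i = 1, \ldots, k$, $q \in Q$.

EXAMPLE 13. A part of the resulting structure for the epistemic game model from Figure 10 is shown in Figure 12. All the new states, plus the transitions going out of $q_2$, are presented. The wildcard ‘$*$’ stands for any action of the respective agent. For instance, $\langle reject, *, pass, pass \rangle$ represents $\langle reject, set0, pass, pass \rangle$ and $\langle reject, set1, pass, pass \rangle$.

Figure 12. Construction for the multi-player epistemic game model from Figure 10.

Now, we define a translation of formulas from ATEL$_{CE}$ to ATL corresponding to the above described interpretation of ATEL models into ATL models:

$tr(p) = p$, for $p \in \Pi$
$tr(\neg \varphi) = \neg tr(\varphi)$
$tr(\varphi \vee \psi) = tr(\varphi) \vee tr(\psi)$
$tr(\langle\langle A \rangle\rangle X \varphi) = \langle\langle A \cup Agt^e \rangle\rangle X (act \wedge tr(\varphi))$
$tr(\langle\langle A \rangle\rangle G \varphi) = tr(\varphi) \wedge \langle\langle A \cup Agt^e \rangle\rangle X \langle\langle A \cup Agt^e \rangle\rangle G (act \wedge tr(\varphi))$
$tr(\langle\langle A \rangle\rangle \varphi\, U \psi) = tr(\psi) \vee (tr(\varphi) \wedge \langle\langle A \cup Agt^e \rangle\rangle X \langle\langle A \cup Agt^e \rangle\rangle (act \wedge tr(\varphi))\, U\, (act \wedge tr(\psi)))$
$tr(K_{a_i} \varphi) = \neg \langle\langle \{e_1, \ldots, e_i\} \rangle\rangle X (e_i \wedge \neg tr(\varphi))$
$tr(E_A \varphi) = \neg \langle\langle Agt^e \rangle\rangle X (\bigvee_{a_i \in A} e_i \wedge \neg tr(\varphi))$
$tr(C_A \varphi) = \neg \langle\langle Agt^e \rangle\rangle X \langle\langle Agt^e \rangle\rangle (\bigvee_{a_i \in A} e_i)\, U\, (\neg tr(\varphi) \wedge \bigvee_{a_i \in A} e_i)$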
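The translation clauses above can be read as a recursive function on formula trees. The sketch below is a hedged illustration: the nested-tuple encoding of formulas, the agent names `a1..ak` and `e1..ek`, the extra `"and"` node (used instead of expanding conjunction via negation and disjunction), and the function name `tr` with parameter `k` are all assumptions.

```python
def tr(phi, k):
    """Translate an ATEL_CE formula into ATL, clause by clause.

    Formulas are nested tuples: ("prop", p), ("not", f), ("or", f, g),
    ("X", A, f), ("G", A, f), ("U", A, f, g), ("K", i, f), ("E", A, f),
    ("C", A, f). A is a frozenset of real agent names such as "a1";
    i is the index of agent a_i; k is the total number of agents.
    """
    agt_e = frozenset(f"e{i}" for i in range(1, k + 1))
    act = ("prop", "act")

    def any_e(group):
        # Disjunction of the propositions e_i for all a_i in the group.
        props = [("prop", "e" + a[1:]) for a in sorted(group)]
        result = props[0]
        for p in props[1:]:
            result = ("or", result, p)
        return result

    op = phi[0]
    if op == "prop":
        return phi
    if op == "not":
        return ("not", tr(phi[1], k))
    if op == "or":
        return ("or", tr(phi[1], k), tr(phi[2], k))
    if op == "X":
        return ("X", phi[1] | agt_e, ("and", act, tr(phi[2], k)))
    if op == "G":
        big, t = phi[1] | agt_e, tr(phi[2], k)
        return ("and", t, ("X", big, ("G", big, ("and", act, t))))
    if op == "U":
        big = phi[1] | agt_e
        t1, t2 = tr(phi[2], k), tr(phi[3], k)
        until = ("U", big, ("and", act, t1), ("and", act, t2))
        return ("or", t2, ("and", t1, ("X", big, until)))
    if op == "K":
        i = phi[1]
        firsts = frozenset(f"e{j}" for j in range(1, i + 1))
        return ("not", ("X", firsts, ("and", ("prop", f"e{i}"), ("not", tr(phi[2], k)))))
    if op == "E":
        return ("not", ("X", agt_e, ("and", any_e(phi[1]), ("not", tr(phi[2], k)))))
    if op == "C":
        e_a = any_e(phi[1])
        return ("not", ("X", agt_e, ("U", agt_e, e_a, ("and", ("not", tr(phi[2], k)), e_a))))
    raise ValueError(f"unknown operator: {op}")
```

For instance, `tr(("K", 2, ("prop", "p")), 3)` yields the tuple encoding of $\neg \langle\langle \{e_1, e_2\} \rangle\rangle X (e_2 \wedge \neg p)$. Note that the clauses for $G$ and $U$ call `tr` once per subformula and reuse the result twice, which mirrors the doubling of formula length discussed later in the section.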

LEMMA 22. For every ATEL$_{CE}$ formula $\varphi$, model $M$, and ‘action’ state $q \in Q$, we have $M^{ATL}, q \models tr(\varphi)$ iff $M^{ATL}, q^{e_i} \models tr(\varphi)$ for every $i = 1, \ldots, k$.

Proof sketch (structural induction on $\varphi$): It suffices to note that $tr(\varphi)$ can only contain propositions $act, e_1, \ldots, e_k$ in the scope of $\langle\langle A \rangle\rangle X$ for some $A \subseteq Agt'$. Besides, the propositions from $\varphi$ are true in $q$ iff they are true in $q^{e_1}, \ldots, q^{e_k}$, and the game frames for $q, q^{e_1}, \ldots, q^{e_k}$ are the same.

LEMMA 23. For every ATEL$_{CE}$ formula $\varphi$, model $M$, and a state $q \in Q$, we have $M, q \models \varphi$ iff $M^{ATL}, q \models tr(\varphi)$.


Proof: The proof follows by structural induction on ϕ. We will show that the construction preserves the truth value of ϕ for two cases: ϕ ≡ AXψ and ϕ ≡ CA ψ. The cases of AGψ and AψUϑ can be reduced to the case for AXψ using the fact that these operators are ﬁxpoints (resp. greatest and least) of certain operators deﬁned in terms of AXψ (see Section 2.5). For lack of space we omit the details. An interested reader can tackle the other cases in an analogous way. case ϕ ≡ AXψ, ATELCE ⇒ ATL. Let M, q |= AXψ, then there is σA such that for every σAgt\A we have oq (σA , σAgt\A ) |= ψ. By induction hypothesis, M ATL , oq (σA , σAgt\A ) |= tr(ψ); also, M ATL , oq (σA , σAgt\A ) |= act. Thus, M ATL , oq (σA , σAgt\A , passAgte ) = oq (σA , σAgt\A ) |= act∧tr(ψ), where passC denotes the strategy where every agent from C ⊆ Agte decides to be passive. In consequence, M ATL , q |= A ∪ Agte Xtr(ψ). case ϕ ≡ AXψ, ATL ⇒ ATELCE. M ATL, q |= A ∪ Agte X(act ∧ tr(ψ)), so there is σA∪Agte such that for every σAgt \(A∪Agte ) = σAgt\A we have M ATL, oq (σA∪Agte , σAgt\A ) |= act ∧ tr(ψ). Note that M ATL, oq (σA∪Agte , σAgt\A ) |= act only when σA∪Agte = σA , passAgte , else the transition would lead to an epistemic state. Thus, oq (σA∪Agte , σAgt\A ) = oq (σA , σAgt\A ), and hence M ATL , oq (σA , σAgt\A ) |= tr(ψ). By the induction hypothesis, M, oq (σA , σAgt\A ) |= ψ and ﬁnally M, q |= AXψ. case ϕ ≡ CA ψ, ATELCE ⇒ ATL. We have M, q |= CA ψ, so for every sequence of states q0 = q, q1 , . . . , qn , qi ∼aji qi+1 , aji ∈ A for i = 0, . . . , n − 1, it is true that M, qn |= ψ. Consider the same q in M ATL . The shape of the construction implies that for every sequence q0 = q, q1 , . . . , qn in which every qi+1 is a successor of qi and every qi+1 ∈ Qeji , eji ∈ Ae , we have M ATL , qn |= tr(ψ) (by induction and Lemma 22). Moreover, M ATL, qi |= eji for i ≥ 1, hence M ATL , qi |= aj ∈A ej . 
Note that the above refers to all the sequences that can be enforced by the agents from $Agt^e$ and have $\bigvee_{a_j\in A} e_j$ true along the way (from $q'_1$ on). Thus, $Agt^e$ have no strategy from $q$ such that $\bigvee_{a_j\in A} e_j$ holds from the next state on and $tr(\psi)$ is eventually false: $M^{ATL}, q \not\models_{ATL} \langle\langle Agt^e\rangle\rangle X\langle\langle Agt^e\rangle\rangle(\bigvee_{a_j\in A} e_j)\,U\,(\bigvee_{a_j\in A} e_j \wedge \neg tr(\psi))$, which proves the case.

Case $\varphi \equiv C_A\psi$, ATL $\Rightarrow$ ATEL$_{CE}$. We have $M^{ATL}, q \models \neg\langle\langle Agt^e\rangle\rangle X\langle\langle Agt^e\rangle\rangle(\bigvee_{a_j\in A} e_j)\,U\,(\bigvee_{a_j\in A} e_j \wedge \neg tr(\psi))$, so for every $\sigma_{Agt^e}$ there is $\sigma_{Agt'\setminus Agt^e} = \sigma_{Agt}$ such that $o'_q(\sigma_{Agt^e}, \sigma_{Agt}) = q' \in Q'$ and $M^{ATL}, q' \models \neg\langle\langle Agt^e\rangle\rangle(\bigvee_{a_j\in A} e_j)\,U\,(\bigvee_{a_j\in A} e_j \wedge \neg tr(\psi))$. In particular, this implies that the above holds for all epistemic states $q'$ that are successors of $q$ in $M^{ATL}$, also the ones that refer to agents from $A$ $(*)$. Suppose that $M, q \not\models C_A\psi$ $(**)$. Let us now take the action counterpart $q'_{act} \in Q$ of $q'$. By $(*)$, $(**)$ and properties of the construction, $q'_{act}$ occurs also in $M$, and there must be a path $q_0 = q, q_1 = q'_{act}, \ldots, q_n$, $q_i \in Q$, such that $q_i \sim_{a_{j_i}} q_{i+1}$ and $M, q_n \not\models_{ATEL} \psi$. Then, $M^{ATL}, q_n \not\models_{ATL} tr(\psi)$ (by induction). This means also that we have a sequence $q'_0 = q, q'_1 = q', \ldots, q'_n$ in $M^{ATL}$, in which every $q'_i \in Q^{e_{j_i}}$, $a_{j_i} \in A$, is an epistemic counterpart of $q_i$. Thus, for every $i = 1, \ldots, n$: $M^{ATL}, q'_i \models e_{j_i}$, so $M^{ATL}, q'_i \models \bigvee_{a_j\in A} e_j$. Moreover, $M^{ATL}, q_n \not\models_{ATL} tr(\psi)$ implies that $M^{ATL}, q'_n \not\models_{ATL} tr(\psi)$ (by Lemma 22), so $M^{ATL}, q'_n \models \neg tr(\psi)$. Thus, $M^{ATL}, q' \models \langle\langle Agt^e\rangle\rangle(\bigvee_{a_j\in A} e_j)\,U\,(\bigvee_{a_j\in A} e_j \wedge \neg tr(\psi))$, which contradicts $(*)$.

As an immediate corollary of the last two lemmata we obtain:

THEOREM 24. For every ATEL$_{CE}$ formula $\varphi$ and model $M$, $\varphi$ is satisfiable (resp. valid) in $M$ iff $tr(\varphi)$ is satisfiable (resp. valid) in $M^{ATL}$.

Note that the construction used above to interpret ATEL$_{CE}$ in ATL has several nice complexity properties:

– The vocabulary (the set of propositions) only increases linearly (and certainly remains finite).
– The set of states in an ATEL-model grows linearly, too: if model $M$ has $n$ states, then $M^{ATL}$ has $n' = (k+1)n = O(kn)$ states.
– Let $m$ be the number of transitions in $M$. We have $(k+1)m$ action transitions in $M^{ATL}$. Since the size of every set $img(q, \sim_a)$ can be at most $n$, there may be no more than $kn$ epistemic transitions per state in $M^{ATL}$, hence at most $(k+1)n \cdot kn$ in total. Because $m \leq n^2$, we have $m' = O(k^2n^2)$.
– Only the length of formulas may suffer an exponential blow-up, because $tr(\langle\langle A\rangle\rangle G\varphi)$ involves two occurrences of $tr(\varphi)$, and the translation of $\langle\langle A\rangle\rangle\varphi\,U\psi$ involves two occurrences of both $tr(\varphi)$ and $tr(\psi)$.⁹ Thus, every nesting of $\langle\langle A\rangle\rangle G\varphi$ and $\langle\langle A\rangle\rangle\varphi\,U\psi$ roughly doubles the size of the translated formula in the technical sense.
However, the number of different subformulas in the formula only increases linearly. Note that the automata-based methods for model checking (Alur et al. 2002) or satisfiability checking (van Drimmelen 2003) for ATL are based on an automaton associated with the given formula, built from its 'subformulas closure', and their complexity depends on the number of different subformulas in the formula rather than the number of
symbols. In fact, we can avoid the exponential growth of formulas in the context of satisfiability checking by introducing a new propositional variable $p$ and requiring that it is universally equivalent to $tr(\varphi)$, i.e., adding the conjunct $\langle\langle\emptyset\rangle\rangle G(p \leftrightarrow tr(\varphi))$ to the whole translated formula. Then $\langle\langle A\rangle\rangle G\varphi$ can be simply translated as $p \wedge \langle\langle A \cup Agt^e\rangle\rangle X\langle\langle A \cup Agt^e\rangle\rangle G(act \wedge p)$. 'Until' formulas $\langle\langle A\rangle\rangle\varphi\,U\psi$ are treated analogously.

A similar method can be proposed for model checking. To translate $\langle\langle A\rangle\rangle G\varphi$, we first use the algorithm from Alur et al. (2002) and model-check $tr(\varphi)$ to find the states $q \in Q'$ in which $tr(\varphi)$ holds. Then we update the model, adding a new proposition $p$ that holds exactly in these states, and we model-check $(p \wedge \langle\langle A \cup Agt^e\rangle\rangle X\langle\langle A \cup Agt^e\rangle\rangle G(act \wedge p))$ as the translation of $\langle\langle A\rangle\rangle G\varphi$ in the new model. We tackle $tr(\langle\langle A\rangle\rangle\varphi\,U\psi)$ likewise.

Since the complexity of transforming $M$ to $M^{ATL}$ is no worse than $O(n^2)$, and the complexity of the ATL model checking algorithm from Alur et al. (2002) is $O(ml)$, the interpretation defined above can be used, for instance, for an efficient reduction of model checking of ATEL$_{CE}$ formulas to model checking in ATL.

4.4.3. Interpreting Models and Formulas of Full ATEL

Now, in order to interpret the full ATEL we modify the construction by introducing new epistemic agents (and states) indexed not only with individual agents, but with all possible non-empty coalitions:

$$Agt^e = \{e_A \mid A \subseteq Agt,\ A \neq \emptyset\}$$

$$Q' = Q \cup \bigcup_{A \subseteq Agt,\ A \neq \emptyset} Q^{e_A},$$

where $Q$ and all $Q^{e_A}$ are pairwise disjoint. Accordingly, we extend the language with new propositions $\{e_A \mid A \subseteq Agt\}$. The choices for complex epistemic agents refer to the (epistemic copies of) states accessible via distributed knowledge relations: $d'_q(e_A) = \{pass\} \cup img(q, \sim^D_A)^{e_A}$. Then we modify the transition function (putting the strategies from epistemic agents in any predefined order):

$$o'_q(\sigma_{a_1}, \ldots, \sigma_{a_k}, \ldots, \sigma_{e_A}, \ldots) = \begin{cases} o_q(\sigma_{a_1}, \ldots, \sigma_{a_k}) & \text{if all } \sigma_{e_A} = pass \\ \sigma_{e_A} & \text{if } e_A \text{ is the first active epistemic agent} \end{cases}$$

Again, the game frames for all epistemic copies of the action states are the same. The translation for all operators remains the same as well (just using $e_{\{i\}}$ instead of $e_i$), and the translation of $D_A$ is: $tr(D_A\varphi) = \neg\langle\langle Agt^e\rangle\rangle X(e_A \wedge \neg tr(\varphi))$. The following result can now be proved similarly to Theorem 24.

THEOREM 25. For every ATEL formula $\varphi$ and model $M$, $\varphi$ is satisfiable (resp. valid) in $M$ iff $tr(\varphi)$ is satisfiable (resp. valid) in $M^{ATL}$.

This interpretation requires (in general) an exponential blow-up of the original ATEL model (in the number of agents $k$). We suspect that this may be inevitable; if so, this tells something about the inherent complexity of the epistemic operators. For a specific ATEL formula $\varphi$, however, we do not have to include all the epistemic agents $e_A$ in the model, only those for which $D_A$ occurs in $\varphi$. Also, we need epistemic states only for these coalitions. Note that the number of such coalitions is never greater than the length of $\varphi$. Let $l$ be the length of formula $\varphi$, and let $\bar{m}$ be the cardinality of the "densest" modal accessibility relation, either strategic or epistemic, in $M$. In other words, $\bar{m} = \max(m, m_{\sim_1}, \ldots, m_{\sim_k})$, where $m$ is the number of transitions in $M$, and $m_{\sim_1}, \ldots, m_{\sim_k}$ are the cardinalities of the respective epistemic relations. Then, the "optimized" transformation gives us a model with $m' = O(l \cdot \bar{m})$ transitions, while the new formula $tr(\varphi)$ is again only linearly longer than $\varphi$ (in the sense explained in Section 4.4.2). In consequence, we can still use the ATL model checking algorithm for model checking of ATEL formulas that is linear in the size of the original structure: the complexity of such a process is $O(\bar{m}l^2)$.

4.4.4. Related Work

The interpretation presented in this section has been inspired by Schild (2000), in which a propositional variant of the BDI logic (Rao and Georgeff 1991) was proved to be subsumed by propositional µ-calculus.
We use a similar method here to show a translation from ATEL models and formulas to models and formulas of ATL that preserves satisfiability. ATL (just like µ-calculus) is a multimodal logic, where modalities are indexed by agents (programs in the case of µ-calculus). It is therefore possible to 'simulate' the epistemic layer of ATEL by adding new agents (and hence new cooperation modalities) to the scope. Thus, the general idea of the interpretation is to translate modalities of one kind to additional modalities of another kind. Similar translations are well known within the modal logic community, including the translation of epistemic logic into Propositional Dynamic Logic, the translation of dynamic epistemic logic without common knowledge into epistemic logic (Gerbrandy 1999), etc.

A work particularly close to ours is van Otterloo et al. (2003). In that paper, a reduction of ATEL model checking to model checking of ATL formulas is presented, and the epistemic accessibility relations are handled in a similar way to our approach, i.e., with the use of additional 'epistemic' agents. We believe, however, that our translation is more general and provides a more flexible framework in many respects:

1. The algorithm from van Otterloo et al. (2003) is intended only for turn-based acyclic transition systems, which is an essential limitation of its applicability. Moreover, the set of states is assumed to be finite (hence only finite trees are considered). There is no restriction like this in our method.
2. The language of ATL/ATEL is distinctly reduced in van Otterloo et al. (2003): it includes only 'sometime' ($F$) and 'always' ($G$) operators in the temporal part (neither 'next' nor 'until' is treated), and the individual knowledge operator $K_a$ (the group knowledge operators $C$, $E$, $D$ are absent).
3. The translation of a model in van Otterloo et al. (2003) depends heavily on the formula one wants to model-check, while in the algorithm presented here, formulas and models are translated independently (except for the sole case of efficient translation of distributed knowledge).
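The contrast drawn in Section 4.4.2 between the length of a translated formula and the number of its distinct subformulas can be checked on a toy encoding. The following is only a sketch under stated assumptions, not the authors' implementation: formulas are nested Python tuples, the names `tr`, `size`, `subformulas`, `nested_G` and the agent names are hypothetical, only atoms, negation, $\langle\langle A\rangle\rangle X$ and $\langle\langle A\rangle\rangle G$ are covered, and the clause used for $\langle\langle A\rangle\rangle G\varphi$, namely $tr(\varphi) \wedge \langle\langle A \cup Agt^e\rangle\rangle X\langle\langle A \cup Agt^e\rangle\rangle G(act \wedge tr(\varphi))$, is inferred from the optimized translation quoted in that section.

```python
# Toy encoding (an assumption for illustration, not the authors' code):
# formulas are nested tuples; coalitions are frozensets of agent names.
AGT_E = frozenset({"e1", "e2"})   # hypothetical epistemic agents
ACT = ("atom", "act")


def tr(f):
    """Translate the fragment {atoms, not, <<A>>X, <<A>>G} into ATL."""
    op = f[0]
    if op == "atom":
        return f
    if op == "not":
        return ("not", tr(f[1]))
    if op in ("X", "G"):
        coalition, body = f[1] | AGT_E, tr(f[2])
        if op == "X":
            return ("X", coalition, ("and", ACT, body))
        # tr(phi) occurs twice here, which is the source of the blow-up
        return ("and", body,
                ("X", coalition, ("G", coalition, ("and", ACT, body))))
    raise ValueError(f"unsupported operator: {op}")


def children(f):
    """Immediate subformulas (coalitions and atom names are skipped)."""
    return [c for c in f[1:] if isinstance(c, tuple)]


def size(f):
    """Number of operators and atoms in the formula tree."""
    return 1 + sum(size(c) for c in children(f))


def subformulas(f, seen=None):
    """Set of distinct subformulas of f."""
    seen = set() if seen is None else seen
    if f not in seen:
        seen.add(f)
        for c in children(f):
            subformulas(c, seen)
    return seen


def nested_G(depth, coalition=frozenset({"a1"})):
    """Build <<A>>G <<A>>G ... p with the given nesting depth."""
    f = ("atom", "p")
    for _ in range(depth):
        f = ("G", coalition, f)
    return f


for n in range(1, 6):
    t = tr(nested_G(n))
    print(n, size(t), len(subformulas(t)))
```

In this encoding each extra nesting level maps a translated formula of size $s$ to one of size $2s + 5$, while adding only four new distinct subformulas, which is the behaviour the text describes: exponential growth in symbols, linear growth in the subformula closure.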

5. CONCLUDING REMARKS

We have presented a comparative study of several logics that combine elements of game theory, temporal logics and epistemic logics, and demonstrated their relationship. Still, these enterprises differ in their motivations and agendas. We wanted to show them as parts of a bigger picture, so that one can compare them, appreciate their similarities and differences, and choose the system most suitable for the intended applications.

Notably, the systems studied here can benefit from many ideas and results, both technical and conceptual, borrowed from each other. Indeed, ATL has already benefited from being related to coalitional games, as concurrent game structures provide a more general (and natural) semantics than alternating transition systems. Moreover, coalition effectivity models are mathematically simpler and more elegant, and provide technically handier semantics, essentially based on neighborhood semantics for non-normal modal logics (Parikh 1985; Pauly 2000a). Furthermore, the pure game-theoretical perspective of coalition logics can offer new ideas to the framework of open multi-agent systems and computations formalized by ATL. For instance, fundamental concepts in game theory, such as preference relations between outcomes and Nash equilibria, have their counterparts in concurrent game structures (and, more importantly, in the alternating-time logics) which are as yet unexplored.

On the other hand, the language and framework of ATL has widened the perspective on coalitional games and logics, providing a richer and more flexible vocabulary to talk about abilities of agents and their coalitions. The alternating refinement relations (Alur et al. 1998b) offer an appropriate notion of bisimulation between ATSs and thus can suggest an answer to the question 'When are two coalition games equivalent?'.¹⁰ Also, a number of technical results on expressiveness and complexity, as well as realizability and model-checking methods from Alur et al. (1998a, 2002), can be transferred to coalition games and logics. And there are some specific aspects of computations in open systems, such as controllability and fairness constraints, which have not been explored in the light of coalition games.

There were a few attempts to generalize ATL by including imperfect information in its framework: ATL with incomplete information in Alur et al. (2002), ATEL, ATOL, ATEL$_{Rs}$, etc. It can be interesting to see how these attempts carry over to the framework of CL. Also, stronger languages like ATL$^*$ and alternating-time µ-calculus can provide more expressive tools for reasoning about coalition games.

In conclusion, we see the main contribution of the present study as building a bridge between several logical frameworks for multi-agent systems, and we hope to trigger a synergetic effect from their mutual influence.

ACKNOWLEDGEMENTS

Valentin Goranko acknowledges the financial support during this research provided by the National Research Foundation of South Africa and the SASOL Research Fund of the Faculty of Science at Rand Afrikaans University. He would like to thank Johan van Benthem for sparking his interest in logical aspects of games, Marc Pauly for stimulating discussions on coalition games and valuable remarks on this paper, Moshe Vardi for useful suggestions and references, and Philippe Balbiani for careful reading and corrections on an earlier version of the text. Wojciech Jamroga would like to thank Mike Wooldridge and Wiebe van der Hoek, who inspired him to have a closer look at logics for multi-agent systems, and Barteld Kooi as well as Wiebe van der Hoek (again) for their remarks and suggestions.

NOTES

1 We make small notational changes here and there to make the differences and common features between the models and languages clearer and easier to see.
2 Determinism is not a crucial issue here, as it can be easily imposed by introducing a new, fictitious agent, 'Nature', which settles all non-deterministic transitions.
3 The only real difference is that the set of states Q and the sets representing agents' choices are explicitly required to be finite in the concurrent game structures, while MGMs and ATSs are not constrained this way. However, these requirements are not essential and can be easily omitted if necessary.
4 Note that the coalition logic-related notions of choice and collective choice can be readily expressed in terms of alternating transition systems, which immediately leads to a semantics for CL based on ATS, too. Thus, ATL and the coalition logics share the semantics based on alternating transition systems as well.
5 In Pauly (2002) these game frames are called dictatorial, but we disagree with that term. Indeed, at every local step in such a game one player determines the move, but these players can be different for the different moves.
6 The definition in Alur et al. (1998a) requires the whole state space Q to be a Cartesian product of the 'local' state spaces; Lomuscio (1999) calls such structures 'hypercube systems'. We find that requirement unnecessarily strong.
7 Private communication.
8 This problem is closely related to the distinction between knowledge de re and knowledge de dicto. The issue is well known in the philosophy of language (Quine 1956), as well as in research on the interaction between knowledge and action (Moore 1985; Morgenstern 1991; Wooldridge 2000).
9 We thank an anonymous referee for pointing this out.
10 Cf. the paper 'When are two games the same' in van Benthem (2000).

REFERENCES

Abdou, J.: 1998, 'Rectangularity and Tightness: A Normal Form Characterization of Perfect Information Extensive Game Forms', Mathematics of Operations Research 23(3).
Alur, R., T. A. Henzinger, and O. Kupferman: 1997, 'Alternating-Time Temporal Logic', in Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS), IEEE Computer Society Press, pp. 100–109. Available at http://www.citeseer.nj.nec.com/170985.html.
Alur, R., T. A. Henzinger, and O. Kupferman: 1998a, 'Alternating-Time Temporal Logic', Lecture Notes in Computer Science 1536, 23–60. Available at http://www.citeseer.nj.nec.com/174802.html.
Alur, R., T. A. Henzinger, and O. Kupferman: 2002, 'Alternating-Time Temporal Logic', Journal of the ACM 49, 672–713. Updated, improved, and extended text. Available at http://www.cis.upenn.edu/~alur/Jacm02.pdf.
Alur, R., T. A. Henzinger, O. Kupferman, and M. Vardi: 1998b, 'Alternating Refinement Relations', in CONCUR'98.
Aumann, R. and S. Hart (eds.): 1992, Handbook of Game Theory with Economic Applications, Vol. 1, Elsevier/North-Holland.

Chandra, A., D. Kozen, and L. Stockmeyer: 1981, 'Alternation', Journal of the ACM 28(1), 114–133.
Emerson, E. A.: 1990, 'Temporal and Modal Logic', in J. van Leeuwen (ed.), Handbook of Theoretical Computer Science, Vol. B, Elsevier, pp. 995–1072.
Fagin, R., J. Y. Halpern, Y. Moses, and M. Y. Vardi: 1995, Reasoning about Knowledge, MIT Press, Cambridge, MA.
Gerbrandy, J.: 1999, 'Bisimulations on Planet Kripke', Ph.D. thesis, University of Amsterdam.
Goranko, V.: 2001, 'Coalition Games and Alternating Temporal Logics', in J. van Benthem (ed.), Proceedings of the 8th Conference on Theoretical Aspects of Rationality and Knowledge (TARK VIII), Morgan Kaufmann, pp. 259–272. Corrected version available at http://general.rau.ac.za/maths/goranko/papers/gltl31.pdf.
Goranko, V. and G. van Drimmelen: 2003, 'Complete Axiomatization and Decidability of the Alternating-Time Temporal Logic', submitted.
Hart, S.: 1992, 'Games in Extensive and Strategic Forms', in R. Aumann and S. Hart (eds.), Handbook of Game Theory with Economic Applications, Vol. 1, Elsevier/North-Holland, pp. 19–40.
Huth, M. and M. Ryan: 2000, Logic in Computer Science: Modelling and Reasoning about Systems, Cambridge University Press.
Jamroga, W.: 2003, 'Some Remarks on Alternating Temporal Epistemic Logic', in B. Dunin-Keplicz and R. Verbrugge (eds.), Proceedings of Formal Approaches to Multi-Agent Systems (FAMAS 2003), pp. 133–140.
Jamroga, W. and W. van der Hoek: 2003, 'Agents that Know how to Play', submitted.
Lomuscio, A.: 1999, 'Information Sharing among Ideal Agents', Ph.D. thesis, University of Birmingham.
Moore, R.: 1985, 'A Formal Theory of Knowledge and Action', in J. Hobbs and R. Moore (eds.), Formal Theories of the Commonsense World, Ablex Publishing Corporation.
Morgenstern, L.: 1991, 'Knowledge and the Frame Problem', International Journal of Expert Systems 3(4).
Osborne, M. and A. Rubinstein: 1994, A Course in Game Theory, MIT Press, Cambridge, MA.
Parikh, R.: 1985, 'The Logic of Games and its Applications', Ann. of Discrete Mathematics 24, 111–140.
Pauly, M.: 2000a, 'Game Logic for Game Theorists', Technical Report INS-R0017, CWI.
Pauly, M.: 2000b, 'A Logical Framework for Coalitional Effectivity in Dynamic Procedures', in Proceedings of the Conference on Logic and the Foundations of Game and Decision Theory (LOFT4). To appear in Bulletin of Economics Research.
Pauly, M.: 2001, 'Logic for Social Software', Ph.D. thesis, University of Amsterdam.
Pauly, M.: 2002, 'A Modal Logic for Coalitional Power in Games', Journal of Logic and Computation 12(1), 149–166.
Quine, W.: 1956, 'Quantifiers and Propositional Attitudes', Journal of Philosophy 53, 177–187.
Rao, A. and M. Georgeff: 1991, 'Modeling Rational Agents within a BDI-Architecture', Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning, pp. 473–484.
Schild, K.: 2000, 'On the Relationship between BDI Logics and Standard Logics of Concurrency', Autonomous Agents and Multi Agent Systems, pp. 259–283.
Schnoebelen, P.: 2003, 'The Complexity of Temporal Model Checking', Advances in Modal Logics, Proceedings of AiML 2002, World Scientific.

van Benthem, J.: 2000, 'Logic and Games', Technical Report X-2000-03, ILLC.
van der Hoek, W. and R. Verbrugge: 2002, 'Epistemic Logic: A Survey', Game Theory and Applications 8, 53–94.
van der Hoek, W. and M. Wooldridge: 2002, 'Tractable Multiagent Planning for Epistemic Goals', in C. Castelfranchi and W. Johnson (eds.), Proceedings of the First International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS-02), ACM Press, New York, pp. 1167–1174.
van der Hoek, W. and M. Wooldridge: 2003a, 'Cooperation, Knowledge and Time – Alternating-time Temporal Epistemic Logic and its Applications', Studia Logica 75(1), 125–157.
van der Hoek, W. and M. Wooldridge: 2003b, 'Towards a Logic of Rational Agency', Logic Journal of the IGPL 11(2), 135–160.
van Drimmelen, G.: 2003, 'Satisfiability in Alternating-Time Temporal Logic', in Proceedings of LICS'2003, IEEE Computer Society Press, pp. 208–217.
van Otterloo, S., W. van der Hoek, and M. Wooldridge: 2003, 'Knowledge as Strategic Ability', Electronic Lecture Notes in Theoretical Computer Science 85(2).
von Neumann, J. and O. Morgenstern: 1944, Theory of Games and Economic Behaviour, Princeton University Press, Princeton, NJ.
Wooldridge, M.: 2000, Reasoning about Rational Agents, MIT Press, Cambridge, MA.

Valentin Goranko
Department of Mathematics
Rand Afrikaans University, South Africa

Wojciech Jamroga
Parlevink Group, University of Twente, the Netherlands
Institute of Mathematics, University of Gdańsk, Poland

GIACOMO BONANNO

A CHARACTERIZATION OF VON NEUMANN GAMES IN TERMS OF MEMORY

ABSTRACT. An information completion of an extensive game is obtained by extending the information partition of every player from the set of her decision nodes to the set of all nodes. The extended partition satisﬁes Memory of Past Knowledge (MPK) if at any node a player remembers what she knew at earlier nodes. It is shown that MPK can be satisﬁed in a game if and only if the game is von Neumann (vN) and satisﬁes memory at decision nodes (the restriction of MPK to a player’s own decision nodes). A game is vN if any two decision nodes that belong to the same information set of a player have the same number of predecessors. By providing an axiom for MPK we also obtain a syntactic characterization of the said class of vN games.

1. INTRODUCTION

The standard definition of extensive game (Selten 1975) specifies a player's information only when it is her turn to move (that is, only at her decision nodes), thus providing only a partial description of what the player learns during any play of the game. For both conceptual and practical reasons (Battigalli and Bonanno 1999; van Benthem 2001), it may be desirable to express what a player knows also at nodes where she does not have to move, that is, at nodes that belong to another player. For example, one might want to model what information is (or can be) given to player i after some other player has made a move, even if it is not player i's turn to move. In order to be able to do so one needs to add, for every player, a partition of the set of all nodes, which, when restricted to that player's decision nodes, coincides with her initial information partition (thus preserving the original information sets).¹

In this paper we study one aspect of memory within the context of such extended partitions. In the philosophy literature the concept of memory has been identified with the retention of past knowledge (Malcolm 1963; Munsat 1966). In accordance with this, we define Memory of Past Knowledge (MPK) as the property that at any node the player remembers what she knew at earlier nodes. This is a natural property to consider and, indeed, the restriction of it to a player's own decision nodes is implied by the notion of perfect recall, which is routinely assumed in game theory. We show that MPK can be satisfied only within the class of games that Kuhn (1953) calls von Neumann (vN) games. An extensive game is vN if any two decision nodes of player i that belong to the same information set of player i have the same number of predecessors. We prove that a game satisfies MPK if and only if it is a vN game and, for each player, the restriction of MPK to that player's decision nodes is satisfied. We call the latter property "Memory at Decision Nodes" (MDN).

We also show that an implication of MPK is that, at every stage of the game, it is common knowledge among all the players that the play of the game has reached that stage (if node x has k predecessors, that is, if the path from the root to x has length k, then we say that x belongs to stage k). One can think of the stage of the game as the number of units of time that have elapsed since the beginning of the game. Thus MPK implies that the time is always common knowledge among the players. In this respect vN games that satisfy MPK are closely related to the synchronous systems studied in the computer science literature, where the agents have access to an external clock (Halpern and Vardi 1986).

In Section 3 we show that the proposed notion of memory on extended partitions does indeed capture the interpretation of memory as retention of past knowledge: we show that it is characterized by either of the following axioms:

1. If in the past the player knew φ, then she knows now that in the past she knew φ.
2. If the player knows φ now, then at every future time she will know that in the past she knew φ.

Thus either axiom provides a syntactic characterization of the class of von Neumann games that satisfy Memory at Decision Nodes.

Synthese 139: 281–295, 2004. Knowledge, Rationality & Action 117–131, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

2. EXTENDED PARTITIONS AND MEMORY

We use the tree-based definition of extensive game, which is due to Kuhn (1953). Since our analysis deals with the structure of moves and information, and is independent of payoffs, we shall focus on extensive forms and follow closely the definition given by Selten (1975).

The first component of an extensive form is a finite or infinite rooted tree $\langle T, \rightarrow, t_0\rangle$ where $t_0$ denotes the root and, for any two nodes $t, x \in T$, $t \rightarrow x$ denotes that $t$ is the immediate predecessor of $x$ (or $x$ is an immediate successor of $t$). For every node $t$ it is assumed that the number of immediate successors of $t$ is finite (possibly zero). We denote by $\prec$ the transitive closure of $\rightarrow$. Thus $t \prec x$ denotes that $t$ is a predecessor of $x$ or $x$ is a successor of $t$ (that is, there is a path from $t$ to $x$), and we use $t \preceq x$ to mean that either $t = x$ or $t \prec x$. For example, in the extensive form of Figure 1 we have that $t \rightarrow x$ and $t \prec z_3$. Let $Z$ be the set of terminal nodes, that is, nodes that have no successors, and $X = T \setminus Z$ the set of decision nodes. For example, in Figure 1, $Z = \{z_1, z_2, \ldots, z_7\}$ and $X = \{t_0, t, t', y, x, x'\}$.

The second component of an extensive form is a set of players $N = \{1, 2, \ldots, n\}$ and a partition $\{X_i\}_{i\in N}$ of the set of decision nodes $X$. For every player $i \in N$, $X_i$ is the set of decision nodes of player $i$. In the extensive form of Figure 1, $N = \{1, 2\}$, the set of player 1's decision nodes is $X_1 = \{t_0, y\}$ and the set of player 2's decision nodes is $X_2 = \{t, t', x, x'\}$.

The third component is, for every player $i \in N$, an equivalence relation $\sim_i \subseteq X_i \times X_i$ (that is, a binary relation that is reflexive, symmetric and transitive) satisfying the following constraint: if $t, t' \in X_i$ and $t \sim_i t'$ then the number of immediate successors of $t$ is equal to the number of immediate successors of $t'$. The interpretation of $t \sim_i t'$ is that player $i$ cannot distinguish between $t$ and $t'$, that is, as far as she knows, she could be making a decision either at node $t$ or at node $t'$. The equivalence classes of $\sim_i$ partition $X_i$ and are called the information sets of player $i$. We denote by $H_i$ the set of information sets of player $i$. In the extensive form of Figure 1, $\sim_1 = \{(t_0, t_0), (y, y)\}$ and $\sim_2 = \{(t, t), (t, t'), (t', t), (t', t'), (x, x), (x, x'), (x', x), (x', x')\}$. Thus, for example, player 2's information sets are $\{t, t'\}$ and $\{x, x'\}$, that is, $H_2 = \{\{t, t'\}, \{x, x'\}\}$. We use the graphic convention of representing an information set as a rounded rectangle enclosing the corresponding nodes, if there are at least two nodes, while if an information set is a singleton we do not draw anything around it.
Furthermore, since all the nodes in an information set belong to the same player, we write the corresponding player only once inside the rectangle.

The fourth, and last, component of an extensive form is, for every player $i \in N$, a choice partition, which, for each of her information sets, partitions the edges out of nodes in that information set (that is, the set of ordered pairs $(t, x)$ such that $t \rightarrow x$) into player $i$'s choices at that information set. If $(t, x)$ belongs to choice $c$ we write $t \rightarrow_c x$. The choice partition satisfies the following constraints: (1) if $t \rightarrow_c x$ and $t \rightarrow_c x'$ then $x = x'$, and (2) if $t \rightarrow_c x$ and $t \sim_i t'$ then there exists an $x'$ such that $t' \rightarrow_c x'$. The first condition says that a choice at a node selects a unique immediate successor, while the second condition says that if a choice is available at one node of an information set then it is available at every node in that information set. For example, in Figure 1, $x \rightarrow_g z_2$ and $x' \rightarrow_g z_4$, so that player 2's choice $g$ is $\{(x, z_2), (x', z_4)\}$. Graphically we represent choices by labeling the corresponding edges in such a way that two edges belong to the same choice if and only if they are assigned the same label.

Figure 1.

The main focus in game theory has been on games with perfect recall.² An extensive form is said to have perfect recall if "for every player $i$ and for any two information sets $g$ and $h$ of player $i$, if one vertex $y \in h$ comes after a choice $c$ at $g$ then every vertex $x \in h$ comes after this choice $c$" (Selten 1975; Kuhn 1997, 319). For example, the extensive form of Figure 1 satisfies perfect recall (both $x$ and $x'$ come after the same choice at the earlier information set $\{t, t'\}$ of player 2, namely choice $d$). It is shown in Bonanno (2003) that perfect recall is equivalent to the conjunction of two independent properties, one expressing memory of past actions and the other memory of past knowledge. In this paper we focus on the latter. We call "Memory at Decision Nodes" (MDN) the following property (which is a weakening of perfect recall): if one node in information set $h$ of player $i$ has a predecessor that belongs to information set $g$ of the same player

$i$, then every node in $h$ has a predecessor in $g$.³ Formally (recall that $H_i$ denotes the set of information sets of player $i$):

(MDN) if $x \prec y$, $x \in g \in H_i$, $y \in h \in H_i$, and $y' \in h$, then there exists an $x' \in g$ such that $x' \prec y'$.

This means that, when it is her turn to move, a player always remembers what she knew at earlier decision nodes of hers. Note that this property is considerably weaker than perfect recall, since it is independent of choices. For example, if the extensive form of Figure 1 is modified in such a way that $(t, x)$ and $(t', y)$ belong to different choices of player 2,⁴ then it will still satisfy MDN but it will violate perfect recall. In this paper we shall not assume perfect recall, although we will restrict attention to extensive forms that satisfy the weaker property MDN. Our purpose is to study an extension of this property from the set of decision nodes of player $i$ to the set of all the nodes. This requires extending the notion of information set.

DEFINITION 1. An information completion of an extensive form is an $n$-tuple $\langle K_1, \ldots, K_n\rangle$ where, for each player $i = 1, \ldots, n$, $K_i$ is a partition of the set of nodes $T$ that agrees with player $i$'s information sets, in the sense that if node $t$ belongs to information set $h$ of player $i$ then the cell of $K_i$ that contains $t$, denoted by $K_i(t)$, coincides with $h$. Formally: if $t \in h \in H_i$ then $K_i(t) = h$.

We call Memory of Past Knowledge (MPK) the extension of MDN to the extended partition $K_i$: $\forall x, y, y' \in T$, $\forall i \in N$,

(MPK) if $x \prec y$ and $y' \in K_i(y)$ then there exists an $x' \in K_i(x)$ such that $x' \prec y'$.

In Section 3 we show that MPK does indeed correspond to the syntactic notion of remembering what one knew in the past. In this section we prove that MPK can only be satisfied in von Neumann extensive forms. For every node $t \in T$, we denote by $\ell(t)$ the number of predecessors of $t$ (i.e., the length of the path from the root to $t$). The following definition is taken from Kuhn (1953, 1997, 52).

DEFINITION 2. An extensive form is von Neumann if, whenever $t$ and $x$ are decision nodes of player $i$ that belong to the same information set of player $i$, the number of predecessors of $t$ is equal to the number of predecessors of $x$. Formally: $\forall i \in N$, $\forall t, x \in T$, if $t, x \in h \in H_i$ then $\ell(x) = \ell(t)$.
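The two properties defined so far, MDN and the von Neumann condition, are easy to test mechanically on a finite tree. The sketch below is an illustration under stated assumptions rather than part of the original analysis: the tree is a hypothetical one chosen to agree with every fact the text states about Figure 1 (the node sets, $t \rightarrow x$, $t \prec z_3$, $x \rightarrow_g z_2$, $x' \rightarrow_g z_4$, and the depths of $x$ and $x'$), while the attachment points of the remaining terminal nodes are guesses, and the function names are invented for the example.

```python
# Hypothetical tree consistent with what the text states about Figure 1.
# Where z1, z5, z6 and z7 attach is NOT given in the text and is assumed.
PARENT = {
    "t": "t0", "t'": "t0",
    "x": "t", "z1": "t",
    "y": "t'", "z7": "t'",
    "x'": "y", "z6": "y",
    "z2": "x", "z3": "x",
    "z4": "x'", "z5": "x'",
}

# Information sets H_i per player, as listed in the text.
INFO_SETS = {
    1: [{"t0"}, {"y"}],
    2: [{"t", "t'"}, {"x", "x'"}],
}


def predecessors(node):
    """Set of proper predecessors of node (all t with t < node)."""
    result = set()
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result


def ell(node):
    """Number of predecessors, i.e. length of the path from the root."""
    return len(predecessors(node))


def is_von_neumann(info_sets):
    """Definition 2: nodes in one information set share the same depth."""
    return all(len({ell(t) for t in h}) == 1
               for sets in info_sets.values() for h in sets)


def satisfies_mdn(info_sets):
    """MDN: if some node of h has a predecessor in g, every node of h does."""
    for sets in info_sets.values():
        for g in sets:
            for h in sets:
                hits = [bool(predecessors(y) & g) for y in h]
                if any(hits) and not all(hits):
                    return False
    return True


print("von Neumann:", is_von_neumann(INFO_SETS))
print("MDN:", satisfies_mdn(INFO_SETS))
```

On this tree the check agrees with the discussion of Figure 1: MDN holds, but the form is not von Neumann because $x$ and $x'$ lie at different depths.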

Figure 2.

The extensive form shown in Figure 1 is not von Neumann (since $x$ and $x'$ belong to the same information set of player 2 and $\ell(x) = 2$ while $\ell(x') = 3$), while the one shown in Figure 2 is von Neumann. The proof of the following proposition requires several steps and is relegated to Appendix A. For every integer $k \geq 0$ we denote by $T^k$ the set of $k$-stage nodes: $T^k = \{t \in T : \ell(t) = k\}$.⁵

PROPOSITION 3. Let $G$ be an arbitrary extensive form and $\langle K_1, \ldots, K_n\rangle$ an information completion of it that satisfies MPK. Then
(1) $G$ is von Neumann, and
(2) for every $t \in T$, $i \in N$ and $k \geq 0$, if $t \in T^k$ then $K_i(t) \subseteq T^k$.

Part (2) of Proposition 3 implies that at every node t it is common knowledge among all the players that the play of the game has reached the stage k = ℓ(t). In fact, since Ki(t) ⊆ T^k for all i, the cell of the common knowledge partition containing t is also a subset of T^k. Thus at every node the number of moves made up to that point is common knowledge among all the players (although some players may be uncertain as to what moves have been made). The following result, due to Battigalli and Bonanno (1999), gives the converse to Proposition 3.

PROPOSITION 4. Let G be a von Neumann extensive form that satisfies property MDN. Then there exists an information completion ⟨K1, ..., Kn⟩ of it that satisfies MPK.

Typically, there will be several information completions that satisfy MPK. The finest of all such completions (capturing the maximum amount of information that can be conveyed to the players, without violating memory) is obtained as follows. First some notation. For every node t and for every player i, let Hi(t) ⊆ Hi be the set of information sets of player i that are crossed by paths starting at t: Hi(t) = {h ∈ Hi : t ⪯ y for some y ∈ h}. For example, in the extensive form of Figure 2, H3(t1) = {{t4, t5}}, H3(t2) = {{t4, t5}, {t6, t7}}, H3(t3) = {{t6, t7}, {t8}}, H3(t4) = H3(t5) = {{t4, t5}}, etc.

Next we introduce, for every player i, a binary relation on T, denoted by ≈i. Let v, w ∈ T; then v ≈i w if and only if either (1) v = w, or (2) ℓ(v) = ℓ(w) and Hi(v) ∩ Hi(w) ≠ ∅. For example, in the extensive form of Figure 2, t1 ≈3 t2 and t2 ≈3 t3 but not t1 ≈3 t3. The relation ≈i is clearly reflexive and symmetric, but, in general, it is not transitive (as in the case of Figure 2). Let ≈*i denote the transitive closure of ≈i. Thus v ≈*i w if and only if there exists a finite sequence of nodes y1, y2, ..., ym such that y1 = v, ym = w and, for all k = 1, ..., m − 1, yk ≈i yk+1. Then ≈*i is an equivalence relation on T. Let Ki(t) be the equivalence class of t generated by ≈*i and Ki the set of equivalence classes, that is, Ki(t) = {v ∈ T : t ≈*i v} and Ki = {S ⊆ T : S = Ki(t) for some t ∈ T}. For example, in the extensive form of Figure 2, K3(t0) = {t0}, K3(t1) = K3(t2) = K3(t3) = {t1, t2, t3}, K3(t4) = K3(t5) = {t4, t5}, etc. It is shown in Battigalli and Bonanno (1999) that the information completion defined above is the finest completion that satisfies MPK.
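The construction just described (link two nodes of equal depth whenever paths from them cross a common information set, then take the transitive closure) can be sketched in Python as follows. This is a minimal illustration under stated assumptions. The tree is a hypothetical reconstruction chosen to be consistent with the values of H3 and K3 quoted above, not a copy of Figure 2, and the names `crossed` and `finest_completion` are invented for this sketch. The transitive closure is computed with a simple union-find structure, which is an implementation choice rather than anything prescribed by the paper.

```python
from itertools import combinations

# Assumed tree (child -> immediate predecessor); "t0" is the root.
PARENT = {
    "t1": "t0", "t2": "t0", "t3": "t0",
    "z1": "t1", "t4": "t1",
    "t5": "t2", "t6": "t2",
    "t7": "t3", "t8": "t3",
}
NODES = {"t0"} | set(PARENT)
# Assumed information sets of player 3.
INFO_SETS_3 = [frozenset({"t4", "t5"}),
               frozenset({"t6", "t7"}),
               frozenset({"t8"})]


def weak_predecessors(node):
    """The node itself together with all of its predecessors."""
    found = {node}
    while node in PARENT:
        node = PARENT[node]
        found.add(node)
    return found


def depth(node):
    """Number of strict predecessors of node."""
    return len(weak_predecessors(node)) - 1


def crossed(t, info_sets):
    """H_i(t): information sets containing t or some successor of t."""
    return {h for h in info_sets
            if any(t in weak_predecessors(y) for y in h)}


def finest_completion(info_sets):
    """Equivalence classes of the transitive closure of the relation
    'same depth and a shared crossed information set'."""
    crossed_by = {t: crossed(t, info_sets) for t in NODES}
    leader = {t: t for t in NODES}  # union-find forest

    def find(t):
        while leader[t] != t:
            t = leader[t]
        return t

    for v, w in combinations(sorted(NODES), 2):
        if depth(v) == depth(w) and crossed_by[v] & crossed_by[w]:
            leader[find(v)] = find(w)

    classes = {}
    for t in NODES:
        classes.setdefault(find(t), set()).add(t)
    return {frozenset(c) for c in classes.values()}


K3 = finest_completion(INFO_SETS_3)
for c in sorted(sorted(c) for c in K3):
    print(c)
```

On the assumed tree, t1 and t3 end up in the same cell only through t2, which is the non-transitivity noted in the text.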
By Propositions 3 and 4, the class of von Neumann games that satisfy property MDN is precisely the class of games where there exists an information completion that satisfies MPK. By Proposition 3 an extensive form which is not von Neumann cannot have an information completion that satisfies MPK, even if it satisfies MDN. We illustrate this by means of the extensive form of Figure 1, which satisfies property MDN. Consider an information completion K2 for player 2. Since information completions preserve information sets, it must be that K2(t) = K2(t′) = {t, t′} and K2(x) = K2(x′) = {x, x′}. By MPK, since y ≺ x′ and x ∈ K2(x′) there must be a node v ∈ K2(y) such that v ≺ x. The only predecessors of x are t and t0. We cannot have t ∈ K2(y), since that would imply (by definition of partition) that y ∈ K2(t), contradicting the fact that K2(t) = {t, t′}. On the other hand, if t0 ∈ K2(y) then, since t′ ≺ y and t0 ∈ K2(y), MPK would require the existence of a v ∈ K2(t′) such that v ≺ t0. But K2(t′) = {t, t′}, and neither t nor t′ precedes the root t0.

3. SYNTACTIC CHARACTERIZATION OF MEMORY

In this section we provide a syntactic characterization of MPK. We interpret the precedence relation ≺ as a temporal relation and associate with it the standard past and future operators from basic temporal logic (Prior 1956; Burgess 1984; Goldblatt 1992). To the extended partition Ki of player i we associate a knowledge operator for player i. Given an extensive form and an information completion of it, by frame we mean the collection ⟨T, ≺, {Ki}i∈N⟩ where T is the set of nodes, ≺ the precedence relation on T and Ki is player i’s extended partition of T. We consider a propositional language with the following modal operators: the temporal operators G and H and, for every player i, the knowledge operator Ki. The intended interpretation is as follows:

Gφ: “it is Going to be the case at every future time that φ”
Hφ: “it Has always been the case that φ”
Kiφ: “player i Knows that φ”.

The formal language is built in the usual way from a countable set S of atomic propositions, the connectives ¬ (for “not”) and ∨ (for “or”) and the modal operators.6 Let Pφ =def ¬H¬φ. Thus the interpretation is:

Pφ: “at some Past time it was the case that φ”.

Given a frame ⟨T, ≺, {Ki}i∈N⟩ one obtains a model based on it by adding a function V : S → 2^T (where 2^T denotes the set of subsets of T) that associates with every atomic proposition q ∈ S the set of nodes at which q is true. Given a model and a formula φ, the truth set of φ, denoted by V(φ), is defined as usual. In particular,

V(Gφ) = {t ∈ T : ∀t′ ∈ T, if t ≺ t′ then t′ ∈ V(φ)},
V(Hφ) = {t ∈ T : ∀t′ ∈ T, if t′ ≺ t then t′ ∈ V(φ)},
V(Kiφ) = {t ∈ T : Ki(t) ⊆ V(φ)}.

An alternative notation for t ∈ V(φ) is t ⊨ φ. A formula φ is valid in a model if t ⊨ φ for all t ∈ T, that is, if φ is true at every node. A formula φ is valid in a frame if it is valid in every model based on it.
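The truth-set clauses translate directly into a small recursive evaluator. The sketch below is only an illustration. The frame is a hypothetical one (an assumed tree with one extended partition, for a player labelled 3), formulas are encoded as nested tuples, and the names `truth_set`, `precedes`, `cell_of` and `P` are choices made here, not notation from the paper.

```python
# Assumed frame: a tree (child -> immediate predecessor) plus one
# extended partition, for a player labelled 3.
PARENT = {
    "t1": "t0", "t2": "t0", "t3": "t0",
    "z1": "t1", "t4": "t1",
    "t5": "t2", "t6": "t2",
    "t7": "t3", "t8": "t3",
}
NODES = frozenset({"t0"} | set(PARENT))
CELLS = {3: [{"t0"}, {"t1", "t2", "t3"}, {"z1"},
             {"t4", "t5"}, {"t6", "t7"}, {"t8"}]}


def precedes(s, t):
    """True if s is a strict predecessor of t."""
    while t in PARENT:
        t = PARENT[t]
        if t == s:
            return True
    return False


def cell_of(i, t):
    """K_i(t): the cell of player i's extended partition containing t."""
    return next(c for c in CELLS[i] if t in c)


def truth_set(phi, val):
    """V(phi) for a formula given as a nested tuple.

    Forms: ("atom", q), ("not", f), ("or", f, g),
           ("G", f), ("H", f), ("K", i, f).
    val maps each atom name to the set of nodes where it is true.
    """
    op = phi[0]
    if op == "atom":
        return set(val[phi[1]])
    if op == "not":
        return set(NODES) - truth_set(phi[1], val)
    if op == "or":
        return truth_set(phi[1], val) | truth_set(phi[2], val)
    if op == "G":
        inner = truth_set(phi[1], val)
        return {t for t in NODES
                if all(u in inner for u in NODES if precedes(t, u))}
    if op == "H":
        inner = truth_set(phi[1], val)
        return {t for t in NODES
                if all(u in inner for u in NODES if precedes(u, t))}
    if op == "K":
        inner = truth_set(phi[2], val)
        return {t for t in NODES if cell_of(phi[1], t) <= inner}
    raise ValueError(f"unknown operator: {op!r}")


def P(phi):
    """P phi, defined as not H not phi."""
    return ("not", ("H", ("not", phi)))


q = ("atom", "q")
val = {"q": {"t1", "t2", "t3"}}
knew_q = P(("K", 3, q))          # "at some past time player 3 knew q"
print(sorted(truth_set(knew_q, val)))
```

With this valuation the player knows q exactly at the stage-1 nodes, so "she knew q in the past" holds at every stage-2 node and nowhere else.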

Finally, we say that a property of frames is characterized by an axiom if (1) the axiom is valid in any frame that satisfies the property and, conversely, (2) whenever the axiom is valid in a frame, then the frame satisfies the property. The following proposition, which is proved in Appendix B, provides a characterization of MPK.7

PROPOSITION 5. MPK is characterized by either of the following axioms:

(M1) PKiφ → KiPKiφ,
(M2) Kiφ → GKiPKiφ.

(M1) says that if, at some time in the past, player i knew φ, then she knows now that in the past she knew φ. While (M1) is backward-looking, (M2) is forward-looking: it says that if player i knows φ now, then at every future time she will know that some time in the past she knew φ. By Propositions 3 and 5, if an extensive form has an information completion that validates axiom (M1) then the extensive form is von Neumann and satisﬁes property MDN. Conversely, by Propositions 4 and 5, a von Neumann extensive form that satisﬁes MDN has an information completion that validates axiom (M1). Thus axiom (M1) provides a syntactic characterization of the class of von Neumann games that satisfy MDN. The same is true of axiom (M2).
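On a finite frame the correspondence between MPK and the two axioms can be confirmed by brute force. Since the truth set of φ can be any subset of T, it suffices to let an atom range over all subsets. The following Python sketch does this for a hypothetical ten-node tree and two hand-picked partitions; the tree, the partitions and all function names are assumptions made for illustration, and the operators act on sets of nodes rather than on formulas.

```python
from itertools import chain, combinations

# Assumed tree (child -> immediate predecessor); "t0" is the root.
PARENT = {
    "t1": "t0", "t2": "t0", "t3": "t0",
    "z1": "t1", "t4": "t1",
    "t5": "t2", "t6": "t2",
    "t7": "t3", "t8": "t3",
}
NODES = sorted({"t0"} | set(PARENT))


def predecessors(node):
    """Set of strict predecessors of node."""
    found = set()
    while node in PARENT:
        node = PARENT[node]
        found.add(node)
    return found


def subsets(items):
    """All subsets of items, as tuples."""
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))


def operators(partition):
    """Set-level versions of K, P and G for one player's partition."""
    cell = {t: set(c) for c in partition for t in c}

    def K(S):
        return {t for t in NODES if cell[t] <= S}

    def P(S):
        return {t for t in NODES if predecessors(t) & S}

    def G(S):
        return {t for t in NODES
                if all(u in S for u in NODES if t in predecessors(u))}

    return K, P, G


def m1_valid(partition):
    """PK(S) is contained in KPK(S) for every set of nodes S."""
    K, P, _ = operators(partition)
    return all(P(K(set(S))) <= K(P(K(set(S)))) for S in subsets(NODES))


def m2_valid(partition):
    """K(S) is contained in GKPK(S) for every set of nodes S."""
    K, P, G = operators(partition)
    return all(K(set(S)) <= G(K(P(K(set(S))))) for S in subsets(NODES))


def satisfies_mpk(partition):
    """Direct check of MPK on the frame."""
    cell = {t: set(c) for c in partition for t in c}
    return all(cell[x] & predecessors(y2)
               for y in NODES
               for x in predecessors(y)
               for y2 in cell[y])


GOOD = [{"t0"}, {"t1", "t2", "t3"}, {"z1"},
        {"t4", "t5"}, {"t6", "t7"}, {"t8"}]
BAD = [{"t0"}, {"t1"}, {"t2", "t3"}, {"z1"},
       {"t4", "t5"}, {"t6", "t7"}, {"t8"}]

for name, part in (("GOOD", GOOD), ("BAD", BAD)):
    print(name, satisfies_mpk(part), m1_valid(part), m2_valid(part))
```

For `BAD`, taking S to be the cell {t1} gives a counterexample of the same kind as the one built in the proof in Appendix B: the player knows S at t1, but at t4 she no longer knows that she once knew it.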

4. CONCLUSION

An information completion of an extensive form is obtained by extending the information partition of every player from the set of her decision nodes to the set of all nodes. One can then define, for the extended partition, the following notion of memory: at any node a player remembers what she knew at earlier nodes. We showed that this property can be satisfied in an extensive form if and only if the extensive form is von Neumann and satisfies the restriction of the property to a player’s own decision nodes. We also provided two equivalent axioms for the proposed notion of memory, thus obtaining a syntactic characterization of the said class of von Neumann games.


APPENDIX A. PROOFS OF SECTION 2

In this appendix we prove Proposition 3 of Section 2. For the reader’s convenience we repeat the definition of MPK: if t ≺ x and x′ ∈ Ki(x) then there exists a t′ ∈ Ki(t) such that t′ ≺ x′. We say that at node x there is “time uncertainty” for player i if the cell Ki(x) of her extended partition Ki contains a predecessor of x, that is, if there is a path in the tree that crosses the cell of player i’s extended partition that contains x more than once.8

DEFINITION 6. At x ∈ T there is time uncertainty for player i if there exists a t ∈ Ki(x) such that t ≺ x.

The following lemma states that in information completions that satisfy MPK, time uncertainty “propagates into the past”.

LEMMA 7. Fix an arbitrary extensive form and let ⟨K1, ..., Kn⟩ be an information completion of it that satisfies MPK. Then the following is true for every node x and every player i: if at x there is time uncertainty for player i, then there exists a t ∈ T such that (1) t ≺ x and (2) at t there is time uncertainty for player i.

Proof. Let x and i be such that there exists a t ∈ Ki(x) with t ≺ x. By MPK (letting x′ = t) there exists a t′ ∈ Ki(t) such that t′ ≺ t. Thus at t there is time uncertainty for player i.

The following proposition says that MPK rules out time uncertainty.

PROPOSITION 8. Fix an arbitrary extensive form and let ⟨K1, ..., Kn⟩ be an information completion of it that satisfies MPK. Then for every node x and every player i there cannot be time uncertainty at x for player i.

Proof. Suppose that there are a node t1 and a player i such that at t1 there is time uncertainty for player i. By Lemma 7 there is an infinite sequence t1, t2, ... such that, for all k ≥ 1, tk+1 ≺ tk and at tk+1 there is time uncertainty for player i. Since ⟨T, ≺⟩ is a rooted tree, it has no cycles. Thus, for all j, k ≥ 1 with j ≠ k, tj ≠ tk, contradicting the fact that in a rooted tree every node has a finite number of predecessors.
The following proposition states that a situation like the one illustrated in Figure 3 (where rounded rectangles represent cells of Ki) is not compatible with MPK.


Figure 3.

PROPOSITION 9. Let G be an arbitrary extensive form and ⟨K1, ..., Kn⟩ an information completion of it that satisfies MPK. Then the following is true for all t, t′, x, x′, y′ ∈ T and i ∈ N: if t → x, t′ ∈ Ki(t), t′ → x′, x′ ⪯ y′, and y′ ∈ Ki(x), then y′ = x′.

Proof. Suppose not. Then there exist t, t′, x, x′, y′ ∈ T and i ∈ N such that t → x (that is, t is the immediate predecessor of x), t′ ∈ Ki(t), t′ → x′ (that is, t′ is the immediate predecessor of x′), y′ ∈ Ki(x) and x′ ≺ y′. Since t′ → x′, by Proposition 8 it must be that

(1) t′ ∉ Ki(x′)

(otherwise there would be time uncertainty for player i at x′). It follows that

(2) t ∉ Ki(x′).

In fact, if it were the case that t ∈ Ki(x′), then we would have (by definition of partition) that Ki(t) = Ki(x′) and, since t′ ∈ Ki(t), t′ ∈ Ki(x′), contradicting (1). Since y′ ∈ Ki(x), Ki(y′) = Ki(x). Thus, since x ∈ Ki(x),

(3) x ∈ Ki(y′).

By MPK it follows from x′ ≺ y′ and (3) that there exists an x″ such that

(4) x″ ∈ Ki(x′)

and

(5) x″ ≺ x.

Since t is the immediate predecessor of x (t → x), it follows from (5) that either x″ = t, or x″ ≺ t. The case x″ = t yields a contradiction between (4) and (2). Suppose, therefore, that x″ ≺ t. By MPK it follows from t′ → x′ and (4) that there exists a t″ ∈ Ki(t′) such that t″ ≺ x″. From t′ ∈ Ki(t) and t″ ∈ Ki(t′) we get (by definition of partition) that

(6) t″ ∈ Ki(t).

From t″ ≺ x″ and x″ ≺ t we get (by transitivity of ≺) that t″ ≺ t. This, in conjunction with (6), yields time uncertainty at t for player i, contradicting Proposition 8.

Proof of Proposition 3. Fix an arbitrary player i and an arbitrary node x. Let k = ℓ(x). First we prove part (2), namely that Ki(x) ⊆ T^k. We do this by induction. First of all, it must be that Ki(t0) = {t0} (where t0 is the root of the tree). In fact, if there were a t ≠ t0 with t ∈ Ki(t0), then we would have Ki(t) = Ki(t0) and, since t0 ∈ Ki(t0), t0 ∈ Ki(t). Thus, since t0 ≺ t, there would be time uncertainty at t for player i, contradicting Proposition 8. Thus the statement is true for k = 0. Next we show that if it is true for all k ≤ m then it is true for k = m + 1. Fix a node x ∈ T^(m+1) and an arbitrary y′ ∈ Ki(x). Then Ki(y′) = Ki(x). By the induction hypothesis, ℓ(y′) ≥ m + 1.9 Suppose that ℓ(y′) > m + 1. Let t ∈ T^m be the immediate predecessor of x. Since t → x and y′ ∈ Ki(x), by MPK there exists a t′ ∈ Ki(t) such that t′ ≺ y′. By the induction hypothesis, Ki(t) ⊆ T^m and therefore t′ ∈ T^m. Let x′ be the immediate successor of t′ on the path from t′ to y′. Since ℓ(t′) = m, ℓ(x′) = m + 1. Thus, since ℓ(y′) > m + 1, x′ ≠ y′. Thus we have that all of the following are true, contradicting Proposition 9: t → x, t′ ∈ Ki(t), t′ → x′, x′ ⪯ y′, y′ ∈ Ki(x) and y′ ≠ x′. Thus we have shown that for every player i and node x, Ki(x) ⊆ T^ℓ(x), completing the proof of part (2) of Proposition 3. To prove part (1) it is sufficient to recall that, by definition of information completion, if node x belongs to information set h of player i, then Ki(x) = h. Hence h ⊆ T^ℓ(x), so all nodes in h have the same number of predecessors. Thus the extensive form is von Neumann.

APPENDIX B. PROOFS FOR SECTION 3

Proof of Proposition 5. Assume MPK. We show that both (M1) and (M2) are valid. For (M1): suppose that x ⊨ PKiφ. Then there exists a t such that t ≺ x and t ⊨ Kiφ, that is, Ki(t) ⊆ V(φ). Fix an arbitrary x′ ∈ Ki(x). By MPK there exists a t′ ∈ Ki(t) such that t′ ≺ x′. Since t′ ∈ Ki(t), Ki(t′) = Ki(t) and, therefore, since Ki(t) ⊆ V(φ), t′ ⊨ Kiφ. Thus x′ ⊨ PKiφ and x ⊨ KiPKiφ. For (M2): suppose that t ⊨ Kiφ. Fix arbitrary x and x′ such that t ≺ x and x′ ∈ Ki(x). By MPK there exists a t′ ∈ Ki(t) such that t′ ≺ x′. Since t′ ∈ Ki(t), Ki(t′) = Ki(t) and, therefore, t′ ⊨ Kiφ. Thus x′ ⊨ PKiφ and x ⊨ KiPKiφ and t ⊨ GKiPKiφ.

To prove the converse, assume that MPK does not hold, that is, there exist i ∈ N and t, x, x′ ∈ T such that all of the following hold:

(7) t ≺ x,
(8) x′ ∈ Ki(x),
(9) ∀t′ ∈ T, if t′ ≺ x′ then t′ ∉ Ki(t).

We want to show that both (M1) and (M2) can be falsified. Let q be an atomic sentence and construct a model where V(q) = Ki(t). Then

(10) t ⊨ Kiq.

For every t′ such that t′ ≺ x′, by (9) t′ ∉ Ki(t) = V(q) and therefore

(11) t′ ⊭ q.

It follows from (11) that

(12) t′ ⊭ Kiq.

In fact, if it were the case that Ki(t′) ⊆ V(q) = Ki(t) then, since t′ ∈ Ki(t′), we would have t′ ⊨ q, contradicting (11). It follows from (12) that x′ ⊭ PKiq. Hence, by (8),

(13) x ⊭ KiPKiq.

By (7) and (10), x ⊨ PKiq. This, together with (13), falsifies (M1) at x. By (13) and (7), t ⊭ GKiPKiq. This, together with (10), falsifies (M2) at t.

ACKNOWLEDGEMENTS

I am grateful to two anonymous referees for helpful comments and suggestions. A first draft of this paper was presented at the XIV Meeting on Game Theory and Applications, Ischia, July 2001.


NOTES

1 To avoid confusion, throughout the paper we use the expression “player i’s information partition” to refer to the standard partition of i’s decision nodes. The elements of this partition will always be referred to as “information sets”. On the other hand, player i’s partition of the set of all nodes will be called “i’s extended partition” and its elements will be called “cells”.

2 The notion of perfect recall was introduced by Kuhn (1953), who interprets it as follows: “this condition is equivalent to the assertion that each player is allowed by the rules of the game to remember everything he knew at previous moves and all of his choices at those moves” (Kuhn 1997, 65). In the computer science literature the expression “perfect recall” has been used to denote a weaker property (see the next endnote).

3 This property was first studied in the game theory literature by Okada (1987, 89). Ritzberger (1999, 77) calls it “strong ordering”, while Kline (2002, 288) calls it “occurrence memory”. An essentially identical property, called “no forgetting”, was introduced in the computer science literature by Ladner and Reif (1986) and Halpern and Vardi (1986). It was later renamed ‘perfect recall’ in Fagin et al. (1995). See also van der Meyden (1994).

4 For example, if choice c is {(t, z7), (t′, y)} and choice d is {(t, x), (t′, z1)}.

5 For example, in the game of Figure 2, T^1 = {t1, t2, t3}, T^2 = {z1, t4, t5, t6, t7, t8}, etc.

6 See, for example, Chellas (1984). The connectives ∧ (for “and”) and → (for “if ... then”) are defined as usual: φ ∧ ψ =def ¬(¬φ ∨ ¬ψ) and φ → ψ =def ¬φ ∨ ψ.

7 An alternative axiom for the property that we call ‘memory of past knowledge’ was suggested by Ladner and Reif (1986): KiGφ → GKiφ. Halpern and Vardi (1986) provided a sound and complete axiomatization of systems that satisfy ‘memory of past knowledge’ (they called this property ‘no forgetting’) and are synchronous (i.e., the agents have access to an external clock). The key axiom is Ki○φ → ○Kiφ, where ○ is the ‘next time’ operator, that is, t ⊨ ○φ if φ is true at every immediate successor of t. As pointed out in Section 1, synchronous systems are closely related to von Neumann games.

8 When restricted to a player’s information sets, time uncertainty coincides with the notion of absent-mindedness (Piccione and Rubinstein 1997, 10; Kline 2002, 289).

9 Suppose, to the contrary, that ℓ(y′) = j with j < m + 1. Then, by the induction hypothesis, Ki(y′) ⊆ T^j. Since x ∈ Ki(x) and Ki(x) = Ki(y′), x ∈ Ki(y′). Thus x ∈ T^j, contradicting the hypothesis that x ∈ T^(m+1).

REFERENCES

Battigalli, P. and G. Bonanno: 1999, ‘Synchronic Information, Knowledge and Common Knowledge in Extensive Games’, Research in Economics 53, 77–99.
van Benthem, J.: 2001, ‘Games in Dynamic Epistemic Logic’, Bulletin of Economic Research 53, 219–248.
Bonanno, G.: 2003, ‘Memory and Perfect Recall in Extensive Games’, Games and Economic Behavior, In Press.
Burgess, J.: 1984, ‘Basic Tense Logic’, in D. Gabbay and F. Guenthner (eds), Handbook of Philosophical Logic, Vol. II, D. Reidel Publishing Company, pp. 89–133.
Chellas, B.: 1984, Modal Logic: An Introduction, Cambridge University Press.
Fagin, R., J. Halpern, Y. Moses, and M. Vardi: 1995, Reasoning about Knowledge, MIT Press.
Goldblatt, R.: 1992, Logics of Time and Computation, CSLI Lecture Notes No. 7.
Halpern, J. and M. Vardi: 1986, ‘The Complexity of Reasoning about Knowledge and Time’, Proceedings 18th ACM Symposium on Theory of Computing, pp. 304–315.
Kline, J. J.: 2002, ‘Minimum Memory for Equivalence between ex ante Optimality and Time Consistency’, Games and Economic Behavior 38, 278–305.
Kuhn, H. W.: 1953, ‘Extensive Games and the Problem of Information’, in H. W. Kuhn and A. W. Tucker (eds), Contributions to the Theory of Games, Vol. II, Princeton University Press, pp. 193–216. Reprinted in Kuhn (1997), pp. 46–68.
Kuhn, H. W.: 1997, Classics in Game Theory, Princeton University Press.
Ladner, R. and J. Reif: 1986, ‘The Logic of Distributed Protocols (Preliminary Report)’, in J. Halpern (ed.), Theoretical Aspects of Reasoning about Knowledge: Proceedings of the 1986 Conference, Morgan Kaufmann, pp. 207–222.
Malcolm, N.: 1963, Knowledge and Certainty, Prentice-Hall.
van der Meyden, R.: 1994, ‘Axioms for Knowledge and Time in Distributed Systems with Perfect Recall’, Proc. IEEE Symposium on Logic in Computer Science, pp. 448–457.
Munsat, S.: 1966, The Concept of Memory, Random House.
Okada, A.: 1987, ‘Complete Inflation and Perfect Recall in Extensive Games’, International Journal of Game Theory 16, 85–91.
Piccione, M. and A. Rubinstein: 1997, ‘On the Interpretation of Decision Problems with Imperfect Recall’, Games and Economic Behavior 20, 3–24.
Prior, A. N.: 1956, Time and Modality, Oxford University Press.
Ritzberger, K.: 1999, ‘Recall in Extensive Form Games’, International Journal of Game Theory 28, 69–87.
Selten, R.: 1975, ‘Re-examination of the Perfectness Concept for Equilibrium Points in Extensive Games’, International Journal of Game Theory 4, 25–55. Reprinted in Kuhn (1997), pp. 317–354.

Department of Economics
University of California
Davis, CA 95616-8578
U.S.A.

KARL TUYLS, ANN NOWE, TOM LENAERTS and BERNARD MANDERICK

AN EVOLUTIONARY GAME THEORETIC PERSPECTIVE ON LEARNING IN MULTI-AGENT SYSTEMS

ABSTRACT. In this paper we revisit Reinforcement Learning and adaptiveness in Multi-Agent Systems from an Evolutionary Game Theoretic perspective. More precisely, we show there is a triangular relation between the fields of Multi-Agent Systems, Reinforcement Learning and Evolutionary Game Theory. We illustrate how these new insights can contribute to a better understanding of learning in MAS and to new improved learning algorithms. All three fields are introduced in a self-contained manner. Each relation is discussed in detail with the necessary background information to understand it, along with major references to relevant work.

Synthese 139: 297–330, 2004. Knowledge, Rationality & Action 133–166, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

1. INTRODUCTION

Agent-based computing is a new, evolving paradigm in computer science. Nowadays, more and more technological challenges require distributed, dynamic systems. Traditional programming paradigms make some assumptions which just do not hold for a great number of new applications. Often the environment for which a program is designed is neither static, nor completely known or deterministic (e.g., the internet). The characteristics of these environments imply that systems which need to interact in such an environment operate within rapidly changing circumstances, with an enormous growth of available information. These new requirements of today’s applications suggest that alternative programming paradigms are necessary.

Since the early 90s agent-based systems or Multi-Agent Systems have emerged as an important active area of research to support these new requirements in Information Technology (Wooldridge 2002; Luck et al. 2003; Weiss 1999). In contrast with traditional methodologies, the agent-based approach views a program as a set of one or more independent, rational agents. Typically, an agent is an autonomous computational entity with a flexible dynamic behaviour in an unpredictable environment. The uncertainty of the environment implies that an agent needs to learn from, and adapt to, its environment to be successful. Indeed, it is impossible to foresee all situations an agent can encounter beforehand. Therefore, learning and adaptiveness become crucial for the successful application of Multi-Agent Systems to contemporary technological challenges.

Robocup is a nice illustration of such a challenge. The global goal of the Robocup project is stated as follows: by the year 2050, develop a team of fully autonomous humanoid robots that can win against the human world soccer champion team (Robocup project 2003). A team of robotic players, challenging human soccer teams, must be capable of learning how to communicate, cooperate and compete. If this team of robotic players, a standard multi-agent system, wants to be competitive, it must be able to coordinate its actions as a team. Hence learning and adaptiveness become crucial.

Reinforcement Learning (RL) is already an established and profound theoretical framework for learning in stand-alone or single-agent systems. Yet, extending RL to multi-agent systems (MAS) does not guarantee the same theoretical grounding. As long as the environment an agent is experiencing is Markov,1 and the agent can experiment enough, RL guarantees convergence to the optimal strategy. In a MAS, however, the reinforcement an agent receives may depend on the actions taken by the other agents present in the system. Hence, the Markov property no longer holds, and the previous guarantees of convergence disappear. Consider for instance the problem of finding the optimal way between two points in traffic. The cost, measured as the time it takes to get from a point A to a point B using a particular route, will be influenced by the current traffic conditions, i.e., by how many other drivers decided to use the same route. Communication on these decisions is not always possible. Moreover, there is an associated cost and communication is subject to delays. Uncontrolled exploration in this situation can lead to policy oscillations (Nowe et al. 1999). When everyone decides to take the alternative route, this one becomes less interesting than the original one.
Most MAS belong to this last case of non-stationarity. Obviously, in these environments the convergence results of RL are lost. In the light of the above problem it is important to fully understand the dynamics of reinforcement learning and the effect of exploration in MAS. To this end we review Evolutionary Game Theory (EGT) as a solid basis for understanding learning and constructing new learning algorithms. The Replicator Equations will prove to be an interesting model for studying learning in various settings. This model consists of a system of differential equations describing how a population of strategies evolves over time, and plays a central role in biological and economic models. Several authors have already noticed that the Replicator Dynamics (RD) can emerge from simple learning models (Sarin et al. 1997; Redondo 2001; Tuyls et al. 2003).


Figure 1. The triangular relation between RL, MAS and EGT.

This article discusses the theoretical foundations of learning in multi-agent systems. For the moment, a theoretical framework in which learning and adaptiveness in agent-based systems can be understood profoundly is lacking. However, this paper reveals how Evolutionary Game Theory and Reinforcement Learning are connected and how insights from Evolutionary Game Theory provide a better understanding of learning in general in multi-agent systems. More precisely, this formal relation closes the triangle between the three fields and offers the necessary foundations for this missing formal framework. This comes down to an important triangular relation between the fields of MAS, Reinforcement Learning and Evolutionary Game Theory, expressed by Figure 1. Each relation of this triangle will be discussed in detail in this paper.

The outline of the paper is as follows. In the second section we present an overview of the key concepts and key results from Game Theory (GT) and Evolutionary Game Theory. This is a necessary background for the further discussion and provides an evolutionary game theoretic perspective on learning in MAS. After this we discuss the three relations in more detail. Section 3 discusses the third link of Figure 1 and makes explicit how both fields relate and what the current issues are in multi-agent learning algorithms. Section 4 discusses the second link of Figure 1, reveals how both fields relate mathematically, and is an opening toward solving the issues of the previous section. Section 5 closes the circle and elaborates on the first link, i.e., it reveals the interesting similarities between the fields and summarizes why EGT is an interesting framework to analyze and understand MAS. We end with a conclusion.


2. EVOLUTIONARY GAME THEORY

2.1. Introduction

When John Nash discovered the theory of games at Princeton, in the late 40s and early 50s, the impact was enormous. Originally, Game Theory was launched by John von Neumann and Oskar Morgenstern in 1944 in their book Theory of Games and Economic Behavior (von Neumann et al. 1944). The impact of the developments in Game Theory expressed itself especially in the field of economics, where its concepts played an important role in, for instance, the study of international trade, bargaining, the economics of information and the organization of corporations. The importance of Game Theory also became clear in other disciplines in the social and natural sciences, for instance in studies of legislative institutions, of voting behavior, of warfare, of international conflicts, and of evolutionary biology.

However, von Neumann and Morgenstern had only managed to define an equilibrium concept for 2-person zero-sum games. Zero-sum games correspond to situations of pure competition: whatever one player wins must be lost by another. John Nash addressed the case of competition with mutual gain by defining best-reply functions and using Kakutani’s fixed-point theorem. The main results of his work expressed themselves in his development of the Nash Equilibrium and the Nash Bargaining Solution concept.

Despite the great usefulness of the Nash equilibrium concept, the assumptions traditional game theory makes, like hyperrational players that correctly anticipate the other players in an equilibrium, made game theory stagnate for quite some time (Weibull 1996; Gintis 2000; Samuelson 1997). A lot of refinements of Nash equilibria came along (for instance trembling hand perfection), which made it hard to choose the appropriate equilibrium in a particular situation. Almost any Nash equilibrium could be justified in terms of some particular refinement. This made clear that the static Nash concept did not reflect the (dynamic) real world, where people do not act hyperrationally.

This is where evolutionary game theory originated. More precisely, Maynard Smith adopted the idea of evolution from biology (Maynard-Smith et al. 1973; Maynard-Smith 1982). This idea led Maynard Smith and Price to the concept of Evolutionary Stable Strategies (ESS), which in fact obeys a stricter condition than the Nash condition. In evolutionary game theory the game is no longer played exactly once by rational players who know all the details of the game. Details of the game include each other’s preferences over outcomes. Instead EGT assumes that the game is played repeatedly by players randomly drawn from large populations, uninformed of the preferences of the opponent players.

Evolutionary Game Theory offers a solid basis for rational decision making in an uncertain world: it describes how individuals make decisions and interact in complex environments in the real world. Modeling learning agents in the context of Multi-Agent Systems requires insight into the type and form of interactions with the environment and other agents in the system. Usually, these agents are modelled similarly to the different players in a standard game theoretical model. In other words, these agents assume complete knowledge of the environment, have the ability to correctly anticipate the opposing player (hyperrationality) and know that the optimal strategy in the environment is always the same (static Nash equilibrium). The intuition that in the real world people are not completely knowledgeable and hyperrational players, and that an equilibrium can change dynamically, led to the development of evolutionary game theory.

2.2. Elementary Concepts

In this section we review the key concepts of EGT and their mutual relationships. This is important to understand the further discussion in later sections. We start by defining strategic games and concepts such as Nash equilibrium, Pareto optimality and evolutionary stable strategies. Then we discuss the relationships between these concepts and provide some examples.

2.2.1. Strategic Games

In this section we define n-player normal form games as a conflict situation involving gains and losses between n players. In such a game n players interact with each other by all choosing an action (or strategy) to play. All players choose their strategy at the same time. For reasons of simplicity, we limit the pure strategy set of the players to 2 strategies. A strategy is defined as a probability distribution over all possible actions. In the 2-pure strategies case, we have: s1 = (1, 0) and s2 = (0, 1).
A mixed strategy sm is then deﬁned by sm = (x1 , x2 ) with x1 , x2 = 0 and x1 + x2 = 1. Deﬁning a game more formally we restrict ourselves to the 2-player 2-action game. Nevertheless, an extension to n-players n-actions games is straightforward, but examples in the n-player case do not show the same illustrative strength as in the 2-player case. A game G = (S1 , S2 , P1 , P2 ) is deﬁned by the payoff functions P1 , P2 and their strategy sets S1 for the ﬁrst player and S2 for the second player. In the 2-player 2-strategies case, the payoff functions P1 : S1 × S2 → R and P2 : S1 × S2 → R player and B for the second player, see Table I. The payoff tables A, B deﬁne the [ 137 ]


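TABLE I
The left matrix (A) defines the payoff for the row player, the right matrix (B) defines the payoff for the column player.

\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\qquad
B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}
\]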

instantaneous rewards. Element $a_{ij}$ is the reward the row-player (player 1) receives for choosing pure strategy $s_i$ from set $S_1$ when the column-player (player 2) chooses the pure strategy $s_j$ from set $S_2$. Element $b_{ij}$ is the reward for the column-player for choosing the pure strategy $s_j$ from set $S_2$ when the row-player chooses pure strategy $s_i$ from set $S_1$.

The family of 2 × 2 games is usually classified in three subclasses, as follows (Redondo 2001):

Subclass 1: if $(a_{11} - a_{21})(a_{12} - a_{22}) > 0$ or $(b_{11} - b_{12})(b_{21} - b_{22}) > 0$, at least one of the 2 players has a dominant strategy, therefore there is just 1 strict equilibrium.
Subclass 2: if $(a_{11} - a_{21})(a_{12} - a_{22}) < 0$ and $(b_{11} - b_{12})(b_{21} - b_{22}) < 0$, and $(a_{11} - a_{21})(b_{11} - b_{12}) > 0$, there are 2 pure equilibria and 1 mixed equilibrium.
Subclass 3: if $(a_{11} - a_{21})(a_{12} - a_{22}) < 0$ and $(b_{11} - b_{12})(b_{21} - b_{22}) < 0$, and $(a_{11} - a_{21})(b_{11} - b_{12}) < 0$, there is just 1 mixed equilibrium.

The first subclass includes those types of games where each player has a dominant strategy,² as for instance the prisoner’s dilemma. However, it includes a larger collection of games since only one of the players needs to have a dominant strategy. In the second subclass none of the players has a dominated strategy (e.g., battle of the sexes), but both players receive the highest payoff by both playing their first or both playing their second strategy. This is expressed in the condition $(a_{11} - a_{21})(b_{11} - b_{12}) > 0$. The third subclass only differs from the second in the fact that the players do not receive their highest payoff by both playing the first or the second strategy (e.g., matching pennies game). This is expressed by the condition


$(a_{11} - a_{21})(b_{11} - b_{12}) < 0$. Section 2.2.6 provides an example of each subclass.

2.2.2. Nash equilibrium

In traditional game theory it is assumed that the players are hyperrational, meaning that every player will choose the action that is best for him, given his beliefs about the other players’ actions. A basic definition of a Nash equilibrium is stated as follows. If there is a set of strategies for a game with the property that no player can increase his payoff by changing his strategy while the other players keep their strategies unchanged, then that set of strategies and the corresponding payoffs constitute a Nash equilibrium. Formally, a Nash equilibrium is defined as follows. When 2 players play the strategy profile $s = (s_i, s_j)$ belonging to the product set $S_1 \times S_2$, then s is a Nash equilibrium if $P_1(s_i, s_j) \geq P_1(s_x, s_j)$ $\forall x \in \{1, \ldots, n\}$ and $P_2(s_i, s_j) \geq P_2(s_i, s_x)$ $\forall x \in \{1, \ldots, m\}$. In Section 2.2.6 some examples illustrate this definition.

2.2.3. Pareto Optimality

Intuitively a Pareto optimal solution of a game can be defined as follows: a combination of actions of agents in a game is Pareto optimal if there is no other solution for which all players do at least as well and at least one agent is strictly better off. More formally we have: a strategy combination $s = (s_1, \ldots, s_n)$ for n agents in a game is Pareto optimal if there does not exist another strategy combination $s'$ for which each player i receives at least the same payoff $P_i$ and at least one player j receives a strictly higher payoff than $P_j$.

2.2.4. Evolutionary Stable Strategies

The core equilibrium concept of Evolutionary Game Theory is that of an Evolutionary Stable Strategy (ESS). The idea of an evolutionarily stable strategy was introduced by John Maynard Smith and Price in 1973 (Maynard-Smith et al. 1973). Imagine a population of agents playing the same strategy.
Assume that this population is invaded by a different strategy, which is initially played by a small share of the total population. If the reproductive success of the new strategy is smaller than that of the original one, it will not overrule the original strategy and will eventually disappear. In this case we say that the strategy is evolutionarily stable against this newly appearing strategy. More generally, we say a strategy is an Evolutionary Stable Strategy if it is robust against evolutionary pressure from any appearing mutant strategy.


Formally an ESS is defined as follows. Suppose that a large population of agents is programmed to play the (mixed) strategy s, and suppose that this population is invaded by a small number of agents playing strategy $s'$. The population share of agents playing this mutant strategy is $\epsilon \in\, ]0, 1[$. When an individual plays the game against a randomly chosen agent, the chance that he is playing against a mutant is $\epsilon$ and against a non-mutant is $1 - \epsilon$. The payoff for the first player, being a non-mutant, is

\[ P(s, (1 - \epsilon)s + \epsilon s') \]

and, being a mutant, is

\[ P(s', (1 - \epsilon)s + \epsilon s'). \]

Now we can state that a strategy s is an ESS if $\forall s' \neq s$ there exists some $\delta \in\, ]0, 1[$ such that $\forall \epsilon : 0 < \epsilon < \delta$,

\[ P(s, (1 - \epsilon)s + \epsilon s') > P(s', (1 - \epsilon)s + \epsilon s') \]

holds. The condition $\forall \epsilon : 0 < \epsilon < \delta$ expresses that the share of mutants needs to be sufficiently small.

2.2.5. The Relation between Nash Equilibria and ESS

This section explains how the core equilibrium concepts from classical and evolutionary game theory relate to one another. The set of Evolutionary Stable Strategies for a particular game is contained in the set of Nash equilibria for that same game,

\[ \{ESS\} \subset \{NE\}. \]

The conditions for an ESS are stricter than the Nash condition. Intuitively this can be understood as follows: as defined above, a Nash equilibrium is a best reply against the strategies of the other players. Now if a strategy $s_1$ is an ESS then it is also a best reply against itself, or optimal. If it were not optimal against itself there would be a strategy $s_2$ that leads to a higher payoff against $s_1$ than $s_1$ itself. So, if the population share $\epsilon$ of mutant strategies $s_2$ is small enough, then $s_1$ is not evolutionarily stable because

\[ P(s_2, (1 - \epsilon)s_1 + \epsilon s_2) > P(s_1, (1 - \epsilon)s_1 + \epsilon s_2). \]

An important second property for an ESS is the following. If $s_1$ is an ESS and $s_2$ is an alternative best reply to $s_1$, then $s_1$ has to be a better reply to $s_2$


TABLE II
Prisoner’s dilemma: The left matrix (A) defines the payoff for the row player, the right one (B) for the column player.

\[
A = \begin{pmatrix} 1 & 5 \\ 0 & 3 \end{pmatrix}
\qquad
B = \begin{pmatrix} 1 & 0 \\ 5 & 3 \end{pmatrix}
\]

than $s_2$ is to itself. This can easily be seen as follows: because $s_1$ is an ESS, we have for all $s_2$

\[ P(s_1, (1 - \epsilon)s_1 + \epsilon s_2) > P(s_2, (1 - \epsilon)s_1 + \epsilon s_2). \]

If $s_2$ does as well against itself as $s_1$ does, then $s_2$ earns at least as much against $(1 - \epsilon)s_1 + \epsilon s_2$ as $s_1$, and then $s_1$ is no longer evolutionarily stable. To summarize, we now have the following 2 properties for an ESS $s_1$:

1. $P(s_2, s_1) \leq P(s_1, s_1)$ $\forall s_2$.
2. $P(s_2, s_1) = P(s_1, s_1) \Rightarrow P(s_2, s_2) < P(s_1, s_2)$ $\forall s_2 \neq s_1$.

2.2.6. Examples

In this section we provide some examples of the classification of games (see Section 2.2.1) and illustrate the Nash equilibrium concept and the Evolutionary Stable Strategy concept as well as Pareto optimality.

For the first subclass we consider the prisoner’s dilemma game (Gintis 2000; Weibull 1996). In this game 2 prisoners, who committed a crime together, have a choice to either cooperate with the police (to defect) or work together and deny everything (to cooperate). If the first criminal (row player) defects and the second one cooperates, the first one gets off the hook (expressed by a maximum reward of 5) and the second one gets the most severe punishment. If they both defect, they get the second most severe punishment one can get (expressed by a payoff of 1). If both cooperate, they both get a minimum sentence. The payoffs of the game are defined in Table II. As one can see, defecting is a dominant strategy for both players and is therefore always the best reply to any strategy of the opponent. So the Nash equilibrium in this game is for both players to defect. Let’s now determine whether this equilibrium is also an evolutionary stable strategy. Suppose $\epsilon \in [0, 1]$ is the share of cooperators in the population. The expected


TABLE III
Battle of the sexes: The left matrix (A) defines the payoff for the row player, the right one (B) for the column player.

\[
A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}
\qquad
B = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}
\]

payoff of a cooperator is $3\epsilon + 0(1 - \epsilon)$ and that of a defector is $5\epsilon + 1(1 - \epsilon)$. Since for all $\epsilon$,

\[ 5\epsilon + 1(1 - \epsilon) > 3\epsilon + 0(1 - \epsilon), \]

defect is an ESS. So the number of defectors will always increase and the population will eventually consist only of defectors. In Section 2.3 this dynamical process will be illustrated by the replicator equations. This equilibrium, which is both Nash and ESS, is not a Pareto optimal solution. This can easily be seen from the payoff tables. The combination (defect, defect) yields a payoff of (1, 1), which is a smaller payoff for both players than the combination (cooperate, cooperate), which yields a payoff of (3, 3). Moreover, the combination (cooperate, cooperate) is a Pareto optimal solution.

For the second subclass we consider the battle of the sexes game (Gintis 2000; Weibull 1996). In this game a married couple loves each other so much that they want to do everything together. One night the husband wants to see a movie and the wife wants to go to the opera. This situation is described by the payoff matrices of Table III. If they do their activities separately they receive the lowest payoff. In this game there are 2 pure strategy Nash equilibria, i.e., (movie, movie) and (opera, opera), both of which are also evolutionarily stable (as demonstrated in Section 2.3.4). There is also 1 mixed Nash equilibrium, in which the row player (the husband) plays movie with probability 2/3 and opera with probability 1/3, and the column player (the wife) plays opera with probability 2/3 and movie with probability 1/3. However, this equilibrium is not an evolutionarily stable one (as demonstrated in Section 2.3.4).

The third class consists of the games with a unique mixed equilibrium. For this category we use the game defined by the matrices in Table IV. This equilibrium is not an evolutionarily stable one (see Section 2.3.4). Typical for this class of games is that the interior trajectories define closed orbits around the equilibrium point.
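The claims made above for the prisoner’s dilemma and the battle of the sexes can be checked numerically. The following is a minimal sketch, not part of the original text: the function names (`pure_nash`, `pareto_optimal`, `payoff`, `resists_invasion`) are hypothetical, index 0 stands for the first strategy of each table (defect, movie) and index 1 for the second (cooperate, opera), and Pareto optimality is only checked among pure strategy profiles.

```python
from itertools import product

# Payoff tables copied from Table II (prisoner's dilemma) and Table III
# (battle of the sexes). M[i][j]: row player plays i, column player plays j.
PD_A = [[1, 5], [0, 3]]
PD_B = [[1, 0], [5, 3]]
BOS_A = [[2, 0], [0, 1]]
BOS_B = [[1, 0], [0, 2]]


def pure_nash(A, B):
    """Pure profiles (i, j) from which no player gains by deviating alone."""
    equilibria = []
    for i, j in product(range(2), repeat=2):
        row_ok = all(A[i][j] >= A[x][j] for x in range(2))
        col_ok = all(B[i][j] >= B[i][x] for x in range(2))
        if row_ok and col_ok:
            equilibria.append((i, j))
    return equilibria


def pareto_optimal(A, B):
    """Pure profiles not dominated by another pure profile."""
    profiles = list(product(range(2), repeat=2))
    result = []
    for i, j in profiles:
        dominated = any(
            A[k][m] >= A[i][j] and B[k][m] >= B[i][j]
            and (A[k][m] > A[i][j] or B[k][m] > B[i][j])
            for k, m in profiles
        )
        if not dominated:
            result.append((i, j))
    return result


def payoff(A, s, t):
    """Expected payoff P(s, t) of mixed strategy s against mixed strategy t."""
    return sum(s[i] * A[i][j] * t[j] for i in range(2) for j in range(2))


def resists_invasion(A, s, mutant, eps):
    """ESS inequality for one mutant strategy and one mutant share eps."""
    mix = [(1 - eps) * s[k] + eps * mutant[k] for k in range(2)]
    return payoff(A, s, mix) > payoff(A, mutant, mix)


if __name__ == "__main__":
    print("PD pure Nash equilibria:", pure_nash(PD_A, PD_B))
    print("BoS pure Nash equilibria:", pure_nash(BOS_A, BOS_B))
    print("PD Pareto optimal pure profiles:", pareto_optimal(PD_A, PD_B))
    # Defect (1, 0) against a small share of cooperators (0, 1).
    print("defect resists cooperate:", resists_invasion(PD_A, [1, 0], [0, 1], 0.05))
```

The ESS check uses only the row player's matrix A because the prisoner’s dilemma is symmetric; a grid of values for `eps` only illustrates the inequality and does not prove it for all mutant shares.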


TABLE IV
The left matrix (A) defines the payoff for the row player, the right one (B) for the column player.

\[
A = \begin{pmatrix} 2 & 3 \\ 4 & 1 \end{pmatrix}
\qquad
B = \begin{pmatrix} 3 & 1 \\ 2 & 4 \end{pmatrix}
\]
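The classification of Section 2.2.1 can be applied mechanically to Tables II, III and IV. The sketch below is an illustration under stated assumptions: `classify_2x2` is a hypothetical name, subclasses 2 and 3 are read as requiring both products to be negative (the complement of subclass 1), and degenerate games in which a product is exactly zero are left unclassified. The matching pennies payoffs are the standard textbook ones and are not given in the text.

```python
def classify_2x2(A, B):
    """Return the subclass (1, 2 or 3) of a 2 x 2 game, or None if degenerate.

    A[i][j] and B[i][j] are the payoffs of the row and column player when
    the row player plays i and the column player plays j (Table I layout).
    """
    row_product = (A[0][0] - A[1][0]) * (A[0][1] - A[1][1])
    col_product = (B[0][0] - B[0][1]) * (B[1][0] - B[1][1])
    if row_product > 0 or col_product > 0:
        return 1  # at least one player has a dominant strategy
    if row_product < 0 and col_product < 0:
        cross = (A[0][0] - A[1][0]) * (B[0][0] - B[0][1])
        if cross > 0:
            return 2  # two pure equilibria and one mixed equilibrium
        if cross < 0:
            return 3  # a single mixed equilibrium
    return None  # some product is zero: not covered by the three subclasses


GAMES = {
    "prisoner's dilemma (Table II)": ([[1, 5], [0, 3]], [[1, 0], [5, 3]]),
    "battle of the sexes (Table III)": ([[2, 0], [0, 1]], [[1, 0], [0, 2]]),
    "Table IV": ([[2, 3], [4, 1]], [[3, 1], [2, 4]]),
    # Assumed standard payoffs, not taken from the text.
    "matching pennies": ([[1, -1], [-1, 1]], [[-1, 1], [1, -1]]),
}

if __name__ == "__main__":
    for name, (A, B) in GAMES.items():
        print(name, "-> subclass", classify_2x2(A, B))
```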

2.3. Population Dynamics

In this section we discuss the Replicator Dynamics in a single and a multi-population setting. We discuss the relation with concepts such as Nash equilibrium and ESS and illustrate the described ideas with some examples.

2.3.1. Single Population Replicator Dynamics

The basic concepts and techniques developed in EGT were initially formulated in the context of evolutionary biology (Maynard-Smith 1982; Weibull 1996; Samuelson 1997). In this context, the strategies of all the players are genetically encoded (called the genotype). Each genotype refers to a particular behavior which is used to calculate the payoff of the player. The payoff of each player’s genotype is determined by the frequency of other player types in the environment. One way in which EGT proceeds is by constructing a dynamic process in which the proportions of various strategies in a population evolve. Examining the expected value of this process gives an approximation which is called the RD. An abstraction of an evolutionary process usually combines two basic elements: selection and mutation. Selection favors some varieties over others, while mutation provides variety in the population. The replicator dynamics highlight the role of selection: they describe how systems consisting of different strategies change over time. They are formalized as a system of differential equations. Each replicator (or genotype) represents one (pure) strategy $s_i$. This strategy is inherited by all the offspring of the replicator. The general form of a replicator dynamic is the following:

\[ \frac{dx_i}{dt} = [(Ax)_i - x \cdot Ax]\, x_i \tag{1} \]

In Equation (1), $x_i$ represents the density of strategy $s_i$ in the population, and A is the payoff matrix which describes the different payoff values each


individual replicator receives when interacting with other replicators in the population. The state of the population (x) can be described as a probability vector $x = (x_1, x_2, \ldots, x_J)$ which expresses the different densities of all the different types of replicators in the population. Hence $(Ax)_i$ is the payoff which replicator $s_i$ receives in a population with state x, and $x \cdot Ax$ describes the average payoff in the population. The growth rate

\[ \frac{dx_i/dt}{x_i} \]

of the population share using strategy $s_i$ equals the difference between the strategy’s current payoff and the average payoff in the population. For further information we refer the reader to Weibull (1996), Hofbauer et al. (1998).

2.3.2. Multi-Population Replicator Dynamics

So far the study of population dynamics was limited to a single population. However, in many situations interaction takes place between 2 or more individuals from different populations. In this section we study this situation in the 2-player multi-population case for reasons of simplicity. Games played by individuals of different populations are commonly called evolutionary asymmetric games. Here we consider a game to be played between the members of two different populations. As a result, we need two systems of differential equations: one for the row player (R) and one for the column player (C). This setup corresponds to an RD for asymmetric games. If $A = B^t$ (the transpose of B), Equation (1) would emerge again. Player R has a probability vector p over its possible strategies and player C a probability vector q over its strategies. This translates into the following replicator equations for the two populations:

\[ \frac{dp_i}{dt} = [(Aq)_i - p \cdot Aq]\, p_i \tag{2} \]

\[ \frac{dq_i}{dt} = [(Bp)_i - q \cdot Bp]\, q_i \tag{3} \]

As can be seen in Equations (2) and (3), the growth rate of the types in each population is now determined by the composition of the other population. Note that, when calculating the rate of change using these systems of differential equations, two different payoff matrices (A and B) are used for the two different players.
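To make the two-population dynamics concrete, the sketch below integrates Equations (2) and (3) with a simple forward Euler scheme. It is an illustration only: the function names, the step size `dt` and the number of steps are arbitrary choices, and Euler integration is a crude stand-in for a proper ODE solver. One assumption deserves emphasis: B is stored in the layout of Table I (row player's strategy first), so the payoff of column strategy j against population p is computed with the transpose of B; this is one reading of the term $(Bp)_i$ in Equation (3).

```python
def mat_vec(M, y):
    """Matrix-vector product: returns the list of (M y)_i."""
    return [sum(M[i][j] * y[j] for j in range(len(y))) for i in range(len(M))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def replicator_step(A, B, p, q, dt):
    """One forward Euler step of the two-population replicator dynamics."""
    fit_p = mat_vec(A, q)             # payoff of each row strategy against q
    fit_q = mat_vec(transpose(B), p)  # payoff of each column strategy against p
    avg_p = sum(pi * fi for pi, fi in zip(p, fit_p))  # p . Aq
    avg_q = sum(qi * fi for qi, fi in zip(q, fit_q))  # average column payoff
    new_p = [pi + dt * (fi - avg_p) * pi for pi, fi in zip(p, fit_p)]
    new_q = [qi + dt * (fi - avg_q) * qi for qi, fi in zip(q, fit_q)]
    return new_p, new_q


def simulate(A, B, p, q, dt=0.01, steps=5000):
    """Integrate from the initial state (p, q) and return the final state."""
    for _ in range(steps):
        p, q = replicator_step(A, B, p, q, dt)
    return p, q


PD_A, PD_B = [[1, 5], [0, 3]], [[1, 0], [5, 3]]      # Table II
BOS_A, BOS_B = [[2, 0], [0, 1]], [[1, 0], [0, 2]]    # Table III

if __name__ == "__main__":
    # Index 0 is the first strategy of each table (defect, movie).
    print("PD from (0.5, 0.5):", simulate(PD_A, PD_B, [0.5, 0.5], [0.5, 0.5]))
    print("BoS from (0.9, 0.9):", simulate(BOS_A, BOS_B, [0.9, 0.1], [0.9, 0.1]))
    print("BoS from (0.1, 0.1):", simulate(BOS_A, BOS_B, [0.1, 0.9], [0.1, 0.9]))
```

With these settings the prisoner’s dilemma run moves toward (defect, defect), and the two battle of the sexes runs move toward the pure equilibrium nearest to their starting point.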


2.3.3. Relating Nash, ESS and the RD

Being a system of differential equations, the RD have some rest points or equilibria. An interesting question is how these RD-equilibria relate to the concepts of Nash equilibria and ESS. We briefly summarize some known results from the EGT literature (Weibull 1996; Gintis 2000; Osborne et al. 1994; Hofbauer et al. 1998; Redondo 2001). An important result is that every Nash equilibrium is an equilibrium of the RD. But the opposite is not true. This can be easily understood as follows. Let us consider the vector space or simplex of mixed strategies determined by all pure strategies. Formally the unit simplex is defined by

\[ \Delta = \Big\{ x \in \mathbb{R}^m_+ : \sum_{i=1}^{m} x_i = 1 \Big\} \]

where x is a mixed strategy in m-dimensional space (there are m pure strategies), and $x_i$ is the probability with which strategy $s_i$ is played. Calculating the RD for the unit vectors of this space (putting all the weight on a particular pure strategy) yields zero. This is simply due to the properties of the simplex $\Delta$, where the sum of all population shares remains equal to 1 and no population share can ever turn negative. So, if all pure strategies are present in the population at any time, then they always have been and always will be present, and if a pure strategy is absent from the population at any time, then it always has been and always will be absent.³ So, this means that the pure strategies are rest points of the RD, but depending on which game is played these pure strategies do not need to be a Nash equilibrium. Hence not every rest point of the RD is a Nash equilibrium. So dynamic equilibrium or stationarity alone is not enough to have a better understanding of the RD. For this reason the criterion of asymptotic stability came along, which provides a local test of dynamic robustness, local in the sense of minimal perturbations. For a formal definition of asymptotic stability, we refer to Hirsch et al. (1974). Here we give an intuitive definition. An equilibrium is asymptotically stable if the following two conditions hold:

– Any solution path of the RD that starts sufficiently close to the equilibrium remains arbitrarily close to it. This condition is called Liapunov stability.
– Any solution path that starts close enough to the equilibrium converges to the equilibrium.

Now, if an equilibrium of the RD is asymptotically stable (i.e., robust to local perturbations) then it is a Nash equilibrium. For a proof, the reader is referred to Redondo (2001). An interesting result due to Sigmund and


Hofbauer (Hofbauer et al. 1998) is the following: if s is an ESS, then the population state x = s is asymptotically stable in the sense of the RD. For a proof, see Hofbauer et al. (1998), Redondo (2001). So, this result gives a refinement of the asymptotically stable rest points of the RD and provides a way of selecting equilibria from the RD that show dynamic robustness.

2.3.4. Examples

In this section we continue with the examples of Section 2.2.6 and the classification of games of Section 2.2.1. We start again with the Prisoner’s Dilemma game (PD). In Figure 2 we plotted the direction field of the replicator equations applied to the PD. A direction field is an elegant tool for understanding and illustrating a system of differential equations. The direction fields presented here consist of a grid of arrows tangential to the solution curves of the system. Such a field is a graphical illustration of the vector field, indicating the direction of the movement at every point of the grid in the state space. Filling in the parameters for each game in Equations (2) and (3) allowed us to plot this field. The x-axis represents the probability with which the first player will play defect and the y-axis represents the probability with which the second player will play defect. So the Nash equilibrium and the ESS lie at coordinates (1, 1). As you can see from the field plot, all the movement goes toward this equilibrium. Figure 3 illustrates the direction field diagram for the battle of the sexes game. As you may recall from Section 2.2.6, this game has 2 pure Nash equilibria and 1 mixed Nash equilibrium. These equilibria can be seen in the figure at coordinates (0, 0), (1, 1) and (2/3, 1/3). The 2 pure equilibria are ESS as well. This is also easy to verify from the plot: any small perturbation away from the equilibrium leads the dynamics back to the equilibrium.
The mixed equilibrium, which is Nash, is not asymptotically stable, which is obvious from the plot. By the result of Section 2.3.3 (an ESS is asymptotically stable), we can now also conclude that this equilibrium is not evolutionarily stable either, as announced in Section 2.2.6. Figure 4 illustrates the last class of games (subclass 3). Typical for this class of games is that the interior trajectories define closed orbits around the equilibrium point, as you can see in the plot. This Nash equilibrium is not asymptotically stable, because the second condition is not met, which states that any solution path that starts close enough to the equilibrium converges to the equilibrium. However, the first condition, i.e., Liapunov stability, is met, stating that any solution path of


Figure 2. The direction field of the RD of the prisoner’s dilemma using payoff Table II.

the RD that starts sufficiently close to the equilibrium remains arbitrarily close to it. This can be intuitively understood from the plot.
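The distinction between rest points, Nash equilibria and stable equilibria discussed in Sections 2.3.3 and 2.3.4 can be probed by evaluating the right-hand sides of Equations (2) and (3) at selected points. The sketch below is illustrative: `velocity` and `is_pure_nash` are hypothetical names, and B is again assumed to be stored in the Table I layout, so the column population's payoffs are read column-wise.

```python
def velocity(A, B, p, q):
    """Right-hand sides (dp/dt, dq/dt) of the two-population RD for 2 x 2 games."""
    fit_p = [sum(A[i][j] * q[j] for j in range(2)) for i in range(2)]
    fit_q = [sum(B[i][j] * p[i] for i in range(2)) for j in range(2)]
    avg_p = sum(p[i] * fit_p[i] for i in range(2))
    avg_q = sum(q[j] * fit_q[j] for j in range(2))
    dp = [(fit_p[i] - avg_p) * p[i] for i in range(2)]
    dq = [(fit_q[j] - avg_q) * q[j] for j in range(2)]
    return dp, dq


def is_pure_nash(A, B, i, j):
    """True if the pure profile (i, j) admits no profitable unilateral deviation."""
    return (all(A[i][j] >= A[x][j] for x in range(2))
            and all(B[i][j] >= B[i][x] for x in range(2)))


PD_A, PD_B = [[1, 5], [0, 3]], [[1, 0], [5, 3]]      # Table II
BOS_A, BOS_B = [[2, 0], [0, 1]], [[1, 0], [0, 2]]    # Table III

if __name__ == "__main__":
    # (cooperate, cooperate) is a vertex of the simplex: a rest point of the
    # RD although it is not a Nash equilibrium of the prisoner's dilemma.
    print(velocity(PD_A, PD_B, [0, 1], [0, 1]), is_pure_nash(PD_A, PD_B, 1, 1))

    # The mixed equilibrium of the battle of the sexes is a rest point ...
    print(velocity(BOS_A, BOS_B, [2 / 3, 1 / 3], [1 / 3, 2 / 3]))
    # ... but a small perturbation toward (movie, movie) keeps growing.
    print(velocity(BOS_A, BOS_B, [0.7, 0.3], [0.4, 0.6]))
```

A positive first component in both velocity vectors at the perturbed point means that both populations move further away from (2/3, 1/3), which is consistent with the mixed equilibrium not being asymptotically stable.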

3. REINFORCEMENT LEARNING AND MULTI-AGENT SYSTEMS

In this part we discuss the relation between learning and MAS (see Figure 1). Recall from Section 1 that learning and adaptiveness are crucial for the successful application of Multi-Agent Systems to challenging domains such as Robotic Soccer (Stone 2000). In a first section we start with the already established theory of Single-Agent learning. We continue with the more challenging issues of Multi-Agent learning and discuss the different possible approaches.

Figure 3. The direction field of the RD of the Battle of the sexes game using payoff Table III.

3.1. Single Agent Reinforcement Learning

RL is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. A reinforcement learning model consists of:

1. A discrete set of environment states.
2. A discrete set of agent actions.
3. A set of scalar reinforcement signals.

On each step of interaction the agent receives a reinforcement, possibly zero, and some indication of the current state of the environment, and chooses an action. The agent’s job is to find a policy, mapping states to actions, that maximizes some long-run measure of reinforcement. Very often this measure is the discounted cumulative reward.
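As a small illustration of the discounted cumulative reward, the sketch below sums a finite sequence of rewards with geometrically decreasing weights. The function name and the discount factor `gamma` (a number strictly between 0 and 1) are assumptions introduced here for illustration, and the infinite sum is truncated to the rewards supplied.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k] over a finite reward sequence."""
    total = 0.0
    # Work backwards so that each earlier reward discounts everything after it.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1, 1, 1], 0.5))
```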


Figure 4. The direction field of the RD of the third category using payoff Table IV.

In its most general form, the RL problem is a problem of an agent located in an environment δ trying to maximize a long-term reward by taking actions a from different situations in δ. Figure 5 illustrates this problem statement in more detail. At time step t the agent finds itself in situation (or state) $s_t$. From $s_t$ it takes action $a_t$. The environment reacts and places the agent in situation $s_{t+1}$. By performing action $a_t$ the agent receives an immediate reward $r_t$. The immediate reward depends on either or both of the action taken and the next state. To choose an action $a_t$ from a particular state $s_t$ at time step t the agent uses a policy $\pi_t$, with $\pi_t(s, a)$ the probability that in state s at time step t action a will be performed. Common reinforcement learning methods, which can be found in Sutton et al. (2000), are structured around estimating value functions. A value


Figure 5. The reinforcement learning model.

of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state. More formally we have:

\[ V^{\pi}(s) = E_{\pi}\{R_t \mid s_t = s\} = E_{\pi}\Big\{ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\Big|\, s_t = s \Big\} \]

Rewards further away in the future are discounted by γ with 0 < γ < 1. One way to find the optimal policy is to find the optimal value function. If a perfect model of the environment as a Markov decision process is known, the optimal value function can be calculated using Dynamic Programming (DP) techniques. In DP two major approaches exist, i.e., value-iteration and policy-iteration. Both approaches have their counterparts in RL, which can be considered as model-free⁴ stochastic approximation methods of the DP techniques. Q-learning is a well-known RL technique that belongs to the value-iteration class. It learns an evaluation function for each situation-action pair. This function Q is defined by

\[ Q^{\pi}(s, a) = E_{\pi}\{R_t \mid s_t = s, a_t = a\} = E_{\pi}\Big\{ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\Big|\, s_t = s, a_t = a \Big\} \]

Thus, the Q-function expresses the expected reward if an agent takes action a in state s and then continues with policy π. Based on his experience, the agent can iteratively improve this evaluation function and as such adapt the policy π to the ideal policy $\pi^*$, which maximizes the long-term reward. The relation between $V^{\pi}(s)$ and $Q^{\pi}(s, a)$ is:

\[ V^{\pi}(s) = \max_{a} Q^{\pi}(s, a) \]
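When a perfect model is available, the value-iteration approach mentioned above can be sketched in a few lines. The example below is an illustration under stated assumptions: the three-state deterministic model `MODEL` is hypothetical and not taken from the text, as are the function name and the number of sweeps. In each state, action 0 moves back (or stays) and action 1 advances; reaching state 2 from state 1 pays a reward of 1, and state 2 is absorbing. For the optimal values computed here, the state value equals the maximum of the action values, in line with the relation stated above.

```python
# Hypothetical deterministic model: MODEL[state][action] = (next_state, reward).
MODEL = {
    0: {0: (0, 0.0), 1: (1, 0.0)},
    1: {0: (0, 0.0), 1: (2, 1.0)},
    2: {0: (2, 0.0), 1: (2, 0.0)},  # absorbing state without further reward
}


def value_iteration(model, gamma=0.9, sweeps=100):
    """Repeatedly back up V(s) = max_a [r + gamma * V(s')] over all states."""
    V = {s: 0.0 for s in model}
    for _ in range(sweeps):
        V = {
            s: max(r + gamma * V[s_next] for s_next, r in model[s].values())
            for s in model
        }
    # Action values implied by the final state values.
    Q = {
        (s, a): r + gamma * V[s_next]
        for s in model
        for a, (s_next, r) in model[s].items()
    }
    return V, Q


if __name__ == "__main__":
    V, Q = value_iteration(MODEL)
    for s in MODEL:
        print("state", s, "V =", V[s], "Q =", [Q[(s, a)] for a in (0, 1)])
```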


Figure 6. Classification of RL-algorithms.

The update rule used in standard Q-learning is given below:

\[ Q_{t+1}(s, a) \leftarrow (1 - \alpha)\, Q_t(s, a) + \alpha \big( r + \gamma \max_{a'} Q_t(s', a') \big) \tag{4} \]

where $Q_t(s', a')$ and $Q_{t+1}(s, a)$ are the estimates of the state-action value at time steps t and t + 1 respectively, and $s'$ is the state in which the agent arrives after taking action a in situation s.

Actor-critic methods are RL techniques that belong to the policy-iteration class. These methods keep track of a current policy, the actor, and an estimate of the corresponding state-value function. The critic is based on this estimated value function. It generates a Temporal Difference error, which is in its simplest form given by: $e_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$. When the error is positive the tendency to repeat the action taken in state $s_t$ is reinforced, otherwise it is weakened. A learning cycle goes as follows. The agent is at time step t in a certain state $s_t$; for that state the current best action is chosen, with some degree of exploration. The selected action brings the agent to a new state $s_{t+1}$. If the new state $s_{t+1}$ looks better, i.e., $e_t > 0$, then the action $a_t$ for state $s_t$ is strengthened, e.g., by increasing its probability of being selected; otherwise this probability is decreased (Sutton et al. 2000). Figure 6 illustrates the classification of RL-methods.
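Returning to the Q-learning update rule (4), the following sketch applies it in a small episodic task. Everything specific in it is an assumption made for illustration: the two-state deterministic environment `MODEL` with terminal state 2, the uniformly random behaviour policy (Q-learning learns about the greedy policy regardless of how actions are explored), and the parameter values for `alpha`, `gamma` and the number of episodes.

```python
import random

# Hypothetical deterministic environment: MODEL[state][action] = (next_state, reward).
# Action 1 advances toward the terminal state 2; action 0 moves back or stays.
MODEL = {
    0: {0: (0, 0.0), 1: (1, 0.0)},
    1: {0: (0, 0.0), 1: (2, 1.0)},
}
TERMINAL = 2
ACTIONS = (0, 1)


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1, TERMINAL) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            if s == TERMINAL:
                break
            a = rng.choice(ACTIONS)
            s_next, r = MODEL[s][a]
            best_next = max(Q[(s_next, b)] for b in ACTIONS)
            # Update rule (4): blend the old estimate with the new target.
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
            s = s_next
    return Q


def greedy_action(Q, s):
    """Action with the highest estimated value in state s."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])


if __name__ == "__main__":
    Q = q_learning()
    for s in (0, 1):
        print("state", s, [round(Q[(s, a)], 3) for a in ACTIONS],
              "greedy action:", greedy_action(Q, s))
```

Because the environment is deterministic, a constant learning rate is sufficient here; in stochastic environments the learning rate would normally have to decrease over time for the estimates to settle.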


3.2. Multi-Agent Reinforcement Learning

The original reinforcement learning algorithms, as mentioned above, were designed to be used in a single agent setting. When applied to Markovian decision problems most RL techniques are equipped with a formal proof stating that under very acceptable conditions they are guaranteed to converge to the optimal policy; for Q-learning, for instance, see Tsitsiklis (1993). There has also been quite some effort to extend these RL techniques to Partially Observable Markovian Decision Problems and other non-Markovian settings (Loch et al. 1998; Pendrith et al. 1998; Kaelbling et al. 1996; Perkins et al. 2002). The extension to multi-agent learning has recently received more attention. It is clear that the actions taken by one agent might affect the response characteristics of the environment. So we can no longer assume the Markovian property holds. In the domain of Learning Automata, this is referred to as state dependent non-stationarity (Narendra et al. 1989).

When applying RL to a multi-agent case, two extreme approaches can be taken. The first one totally neglects the presence of the other agents, and agents are considered to be selfish reinforcement learners. The effects caused by the other agents also acting in that same environment are considered as noise. It is clear that for problems where agents have to coordinate in order to reach a situation preferable for both agents, this will not yield satisfactory results (Hu et al. 1999). The other extreme is the joint action space approach where the state and action space are respectively defined as the Cartesian product of the agents’ individual state and action spaces. More formally, if S is the set of states and $A_1, \ldots, A_n$ the action sets of the different agents, the learning will be performed in the product space

\[ S \times A_1 \times \cdots \times A_n \to \mathbb{R} \tag{5} \]

This implies that the state information is shared amongst the agents and actions are taken and evaluated synchronously. It is obvious that this approach leads to very big state-action spaces, and assumes instant communication between the agents. Clearly this approach is in contrast with the basic principles of multi-agent systems: distributed control, asynchronous actions, incomplete information, cost of communication. In between these approaches we can find examples which try to overcome the drawbacks of the joint action approach (Litmann et al. 1994; Claus et al. 1998; Jafari et al. 2001; Nowé et al. 1999). Below we describe Cross learning, which can be considered as multi-agent RL, and is important in clarifying the relationship between RL and EGT. Cross learning is a less complex model than Q-learning and even Learning Automata (LA, see Section 3.2.2) in the sense that it does not


TABLE V
A payoff table U.

\[
U = \begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix}
\]

require an initialisation and that there are not as many parameters to fine-tune as in LA and Q-learning. Cross learning does not consider a learning rate, a discount factor and an exploration strategy⁵ as Q-learning does, nor does it need reward and penalty parameters⁶ as LA does.

3.2.1. Cross Learning

The Cross learning model is a special case of the standard reinforcement learning model (Sarin et al. 1997). The model considers several agents playing the same normal form game repeatedly in discrete time. At each point in time, each player is characterized by a probability distribution over his strategy set which indicates how likely he is to play any of his strategies. The probabilities change over time in response to experience. At each time step (indexed by n), a player chooses one of his strategies based on the probabilities which are related to each isolated strategy. Positive payoffs represent reinforcing experiences, which induce a player to increase the probability of the strategy chosen. So, the larger the payoff, the larger the increase and thus the bigger the strength of reinforcement. As a result a player can be represented by a probability vector:

\[ p(n) = (p_1(n), \ldots, p_r(n)) \]

In case of a 2-player game with payoff matrix U, player 1 gets payoff $U_{ij}$ when he chooses strategy i and player 2 chooses strategy j. We assume that

\[ 0 < U_{ij} < 1 \tag{6} \]

In this case there is no deterrence. The iterations of the game are indexed by $n \in \mathbb{N}$. Players do not observe each other’s strategies and payoffs and play the game repeatedly. After making their observations, at each stage they update their probability vector according to

\[ p_i(n + 1) = U_{ij} + (1 - U_{ij})\, p_i(n) \tag{7} \]

\[ p_{i'}(n + 1) = (1 - U_{ij})\, p_{i'}(n) \tag{8} \]
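The two update rules can be sketched directly: the strategy that was played is reinforced by the received payoff and all other strategies are scaled down, so the vector remains a probability distribution. The code below is an illustration under stated assumptions: the function names are hypothetical, the second player is updated in the same way with his own payoff table, and the prisoner’s dilemma payoffs of Table II are squeezed into the open interval (0, 1) by an arbitrary affine rescaling chosen here only to satisfy assumption (6).

```python
import random


def cross_update(p, chosen, u):
    """Apply the Cross learning update for the played strategy and payoff u."""
    return [
        u + (1 - u) * pk if k == chosen else (1 - u) * pk
        for k, pk in enumerate(p)
    ]


def rescale(M, low=0.0, high=5.0):
    """Hypothetical affine map of payoffs in [low, high] into (0, 1)."""
    return [[(v - low + 1.0) / (high - low + 2.0) for v in row] for row in M]


def play(U, V, rounds=1000, seed=0):
    """Two Cross learners playing a 2 x 2 game; U and V are their payoff tables."""
    rng = random.Random(seed)
    p, q = [0.5, 0.5], [0.5, 0.5]
    for _ in range(rounds):
        i = 0 if rng.random() < p[0] else 1  # strategy drawn by player 1
        j = 0 if rng.random() < q[0] else 1  # strategy drawn by player 2
        p = cross_update(p, i, U[i][j])
        q = cross_update(q, j, V[i][j])
    return p, q


if __name__ == "__main__":
    U = rescale([[1, 5], [0, 3]])  # Table II, row player
    V = rescale([[1, 0], [5, 3]])  # Table II, column player
    print(cross_update([0.5, 0.5], 0, 0.4))
    print(play(U, V))
```

Since there is no learning rate, a single payoff can shift the probabilities considerably, so individual runs of `play` are noisy.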

Equation (7) expresses how the probability of the selected strategy (i) is updated and Equation (8) expresses how all the other strategies $i' \neq i$ are corrected. If this player played strategy i in the nth repetition of the game, and if he received payoff $U_{ij}$, then he updates his state by taking a weighted average of the old state and of the unit vector which puts all probability on strategy i. The probability vector q(n) (for the second player),

\[ q(n) = (q_1(n), \ldots, q_s(n)) \]

is updated in an analogous manner. This entire system of equations defines a stochastic update process for the players {p(n), q(n)}. This process is called the “Cross learning process” in Sarin et al. (1997). Börgers and Sarin showed that in an appropriately constructed continuous time limit, this model converges to the asymmetric, continuous time version of the replicator dynamics (see Section 2.3).

3.2.2. Learning Automata

Learning Automata have their origins in mathematical psychology (Bush et al. 1955). Originally, Learning Automata were deterministic and based on complete knowledge of the environment. Later developments introduced uncertainties in the system and the environment and led to the stochastic automaton. More precisely, the stochastic automaton tries to provide a solution of the learning problem without having any information on the optimal action initially. It starts with equal probabilities on all actions and during the learning process these probabilities are updated based on responses from the environment. In Figure 7 a Learning Automaton is illustrated in its most general form. The environment is represented by a triple {α, c, β}, where α represents a finite action set, β represents the response set of the environment, and c is a vector of penalty probabilities, where each component $c_i$ corresponds to an action $\alpha_i$. The response β from the environment can take on 2 values, $\beta_1$ or $\beta_2$.
Often they are chosen to be 0 or 1, where 1 is associated with a penalty response and 0 with a reward. Now, the penalty probabilities can be defined as

\[ c_i = P(\beta(n) = 1 \mid \alpha(n) = \alpha_i) \tag{9} \]

So $c_i$ is the probability that action $\alpha_i$ will result in a penalty response. If these probabilities are constant, the environment is called stationary.

Figure 7. A Learning Automaton – Environment pair.

Several models are distinguished by the response set of the environment. Models in which the response β can only take 2 values are called P-models. Models which allow a finite number of values in a fixed interval are called Q-models. When β is a continuous random variable in a fixed interval, the model is called an S-model. In a variable structure stochastic automaton (note 7), action probabilities are updated at every stage using a reinforcement scheme. The vector p is the action probability vector over the possible actions, as with Cross Learning in the previous section. Important examples of update schemes are linear reward-penalty, linear reward-inaction and linear reward-ε-penalty. The philosophy of those schemes is essentially to increase the probability of an action when it results in a success and to decrease it when the response is a failure. The general algorithm at timestep n + 1 is given by:

(10)  $p_i(n+1) = p_i(n) + a(1 - \beta(n))(1 - p_i(n)) - b\,\beta(n)\,p_i(n)$  if $\alpha_i$ is the action taken at time $n$

(11)  $p_j(n+1) = p_j(n) - a(1 - \beta(n))\,p_j(n) + b\,\beta(n)\left[(r-1)^{-1} - p_j(n)\right]$  if $\alpha_j \neq \alpha_i$

where Equation (10) is the update rule for the performed action αi and Equation (11) for all the other actions. The constants a and b are the reward and penalty parameters respectively. When a = b the algorithm is referred to as linear reward-penalty ($L_{R-P}$), when b = 0 it is referred to as linear reward-inaction ($L_{R-I}$), and when b is small compared to a it is called linear reward-ε-penalty ($L_{R-\varepsilon P}$).
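The update rules (10) and (11) can be sketched as a single function. The function name `update_action_probabilities` and the numerical values below are assumptions chosen for illustration; r is taken to be the number of actions.

```python
def update_action_probabilities(p, chosen, beta, a, b):
    """One step of Equations (10) and (11).

    p      -- current action probabilities (list summing to 1)
    chosen -- index of the action taken at time n
    beta   -- environment response: 0 = reward, 1 = penalty
    a, b   -- reward and penalty parameters
    """
    r = len(p)
    if r < 2:
        raise ValueError("at least two actions are required")
    new_p = []
    for j, pj in enumerate(p):
        if j == chosen:
            # Equation (10): the performed action.
            new_p.append(pj + a * (1 - beta) * (1 - pj) - b * beta * pj)
        else:
            # Equation (11): every other action.
            new_p.append(pj - a * (1 - beta) * pj + b * beta * (1.0 / (r - 1) - pj))
    return new_p


# Linear reward-inaction (b = 0): a reward shifts probability to the chosen action,
# a penalty leaves the vector unchanged.
rewarded = update_action_probabilities([0.5, 0.5], chosen=0, beta=0, a=0.1, b=0.0)
penalized_lri = update_action_probabilities([0.5, 0.5], chosen=0, beta=1, a=0.1, b=0.0)
# Linear reward-penalty (a = b): a penalty shifts probability away from the chosen action.
penalized_lrp = update_action_probabilities([0.5, 0.5], chosen=0, beta=1, a=0.1, b=0.1)
```

Both branches together leave the sum of the probabilities unchanged, so the vector stays a probability distribution after every step.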

Figure 8. Automata Game representation.

If the penalty probabilities ci of the environment are constant, the probability p(n + 1) is completely determined by p(n), and hence $\{p(n)\}_{n>0}$ is a discrete-time homogeneous Markov process. Convergence results for the different schemes are obtained under the assumption of constant penalty probabilities, see Narendra et al. (1989). Learning automata can also be connected in useful ways. A simple example of a multi-agent system modeled as an automata game is shown in Figure 8. A play α(t) = (α^1(t), . . . , α^n(t)) of n automata is a set of strategies chosen by the automata at stage t. Correspondingly, the outcome is now a vector β(t) = (β^1(t), . . . , β^n(t)). At every time step all automata update their probability distributions based on the responses of the environment. Each automaton participating in the game operates without information concerning payoff, the number of participants, their strategies or actions.
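An automata game of this kind can be sketched for two automata that both use the linear reward-inaction scheme. This is only an illustration under stated assumptions: the function names, the table `reward_probs` (the probability that a joint play yields a reward), the step size and the number of steps are all hypothetical. Each automaton sees only its own response, never the other's action or the table.

```python
import random


def lri_update(p, chosen, beta, a):
    """Linear reward-inaction step: Equations (10) and (11) with b = 0."""
    return [
        pj + a * (1 - beta) * (1 - pj) if j == chosen else pj - a * (1 - beta) * pj
        for j, pj in enumerate(p)
    ]


def play_automata_game(reward_probs, steps, a=0.05, seed=0):
    """Let two automata repeatedly play; return their final probability vectors."""
    rng = random.Random(seed)
    n_rows, n_cols = len(reward_probs), len(reward_probs[0])
    # Both automata start with equal probabilities on all actions.
    p = [[1.0 / n_rows] * n_rows, [1.0 / n_cols] * n_cols]
    for _ in range(steps):
        # The play alpha(t): each automaton samples from its own distribution.
        play = [rng.choices(range(len(pk)), weights=pk)[0] for pk in p]
        i, j = play
        for k in range(2):
            # The outcome beta^k(t): 0 = reward, 1 = penalty, drawn per automaton.
            beta = 0 if rng.random() < reward_probs[i][j] else 1
            p[k] = lri_update(p[k], play[k], beta, a)
    return p


# Hypothetical coordination-like game: matching actions are usually rewarded.
final = play_automata_game([[0.9, 0.1], [0.1, 0.8]], steps=2000)
```

The updates are decentralized: the only coupling between the automata runs through the environment's responses.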

4. REINFORCEMENT LEARNING AND EVOLUTIONARY GAME THEORY

In this section we discuss the relation between Reinforcement Learning and EGT. We show how the two fields are formally related, with results from economics and computer science. Some examples will illustrate the strength of these results. More precisely, we show some examples of the dynamics of Q-learning and LA and show how EGT can be extended to be used as a formal foundation for the construction of new RL algorithms for MAS. The first subsection summarizes some main results from economics, and the second deals with extensions of these results. The third subsection shows how this formal relation between RL and EGT can be used as a foundation for modeling new RL algorithms for MAS, i.e., as an initial framework for RL in MAS.

Figure 9. Left: The direction field of the RD of the prisoner’s dilemma game. Right: The paths induced by the learning process.

4.1. The Formal Relation between Cross Learning and EGT

In their paper, Learning through Reinforcement and Replicator Dynamics, Börgers and Sarin prove an interesting link between EGT and Reinforcement Learning (Sarin et al. 1997). More precisely, they considered a version of Bush and Mosteller’s (Bush et al. 1955) stochastic learning theory in the context of games and proved that in a continuous time limit, the learning model converges to the asymmetric continuous time replicator equations (note 8) of EGT. With this result they provided a formalization of the relation between learning at the individual level and biological evolution. The version of the learning model of Bush and Mosteller is called Cross learning and has been thoroughly explained in Section 3.2.1. It is important to note that each time interval has to see many iterations of the game, and that the adjustments which players make between two iterations of the game are very small. If the limit is constructed in this manner, a law of large numbers can be applied, and the learning process converges, in the limit, to the replicator dynamics. It is important to understand that this result refers to arbitrary points in finite time. The result does not hold if infinite time is considered. The asymptotic behaviour for time tending to infinity of the discrete time learning can be quite different from the asymptotic behaviour of the continuous time RD. For the mathematical proof of this result we refer the interested reader to Sarin et al. (1997). Before continuing this discussion we illustrate this result with the prisoner’s dilemma game. In Figure 9 we plotted the direction field of the RD and the Cross learning process for this game. More precisely, the figure on the left illustrates the direction field of the replicator dynamics and the figure on the right shows the learning process

Figure 10. Left: The direction ﬁeld of the RD. Right: The paths induced by the learning process.

of Cross. We plotted for both players the probability of choosing their first strategy (in this case defect). As you can see, the sample paths of the reinforcement learning process approximate the paths of the RD. As mentioned before, the result of Börgers and Sarin only holds for a point in time t with t < ∞. It does not apply, however, to the asymptotic behaviour for t → ∞. Moreover, the asymptotic behaviour of the learning process may be very different from that of the continuous RD. To show this we cite a result of Börgers and Sarin concerning the discrete time learning process. This result says that, with probability 1, the learning process will converge to a limit in which both players play some pure strategy. For a mathematical proof of this proposition we refer to Sarin et al. (1997). Recall from Section 2.3.4 that in the third category of games, the RD circle around the mixed Nash equilibrium. Figure 10 now clearly illustrates that in this type of game the asymptotic behaviour is different for both models.

4.2. Extending the Formal Relation to Other RL-Models

4.2.1. Learning Automata and Evolutionary Dynamics

In Tuyls et al. (2002) it is shown by the authors that the Cross learning model is a Learning Automaton with a linear reward-inaction updating scheme. To provide the reader with an intuition on this relation we briefly describe the mathematical relation between both learning models. More details and a variety of experiments can be found in Tuyls et al. (2002). To show that the Cross learning model is a special case of LA, we need to relate Equations (7) and (8) to Equations (10) and (11).
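The correspondence between the two update rules can be checked numerically. The sketch below assumes that Equations (7) and (8) take the usual weighted-average form, with the received payoff (scaled to lie in [0, 1]) as the weight on the unit vector; that form, the function names and the numbers are assumptions made for illustration.

```python
def cross_update(p, chosen, payoff):
    """Cross learning step: weighted average of p and the unit vector on `chosen`.

    Assumes the payoff lies in [0, 1] and serves as the averaging weight.
    """
    return [
        payoff + (1 - payoff) * pj if j == chosen else (1 - payoff) * pj
        for j, pj in enumerate(p)
    ]


def lri_update(p, chosen, beta, a):
    """Linear reward-inaction step: Equations (10) and (11) with b = 0."""
    return [
        pj + a * (1 - beta) * (1 - pj) if j == chosen else pj - a * (1 - beta) * pj
        for j, pj in enumerate(p)
    ]


p = [0.2, 0.5, 0.3]
payoff = 0.4
cross = cross_update(p, chosen=1, payoff=payoff)
# With a = 1 and feedback 1 - beta equal to the payoff, the two rules coincide.
lri = lri_update(p, chosen=1, beta=1 - payoff, a=1.0)
```

Note that β is treated here as a continuous response in [0, 1] rather than a binary one, which is what allows 1 − β to carry the game payoff.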

If it is assumed that b = 0 in Equations (10) and (11), the relation between both models becomes apparent. We now see that the Cross learning model is in fact a special case of the reward-inaction update scheme. When the reward parameter a = 1, the feedback from the environment (1 − β(n)) equals the game reward UjRk. Hence the equations become equivalent. The experiments in Tuyls et al. (2002) have been conducted with both the linear reward-inaction update scheme and the linear reward-ε-penalty update scheme.

4.2.2. Q-Learning and Evolutionary Dynamics

In this paragraph we briefly (note 9) describe the relation between Q-learning and the RD. More precisely, we present the dynamical system of Q-learning. These equations are derived by constructing a continuous time limit of the Q-learning model, where Q-values are interpreted as Boltzmann probabilities for the action selection. Again we consider games between 2 players. The equations for the first player are

(12)  $\frac{dx_i}{dt} = x_i \alpha \tau \left((Ay)_i - x \cdot Ay\right) + x_i \alpha \sum_j x_j \ln\!\left(\frac{x_j}{x_i}\right)$

and, analogously, for the second player we have

(13)  $\frac{dy_i}{dt} = y_i \alpha \tau \left((Bx)_i - y \cdot Bx\right) + y_i \alpha \sum_j y_j \ln\!\left(\frac{y_j}{y_i}\right)$

Equations (12) and (13) express the dynamics of both Q-learners in terms of Boltzmann probabilities (note 10). Each agent (or player) has a probability vector over his action set, more precisely x1, . . . , xn over action set a1, . . . , an for the first player and y1, . . . , ym over b1, . . . , bm for the second player. For a complete discussion of these equations we refer to Tuyls et al. (2003b). Comparing (12) or (13) with the RD in (1), we see that the first term of (12) or (13) is exactly the RD of EGT and thus takes care of the selection mechanism, see Weibull (1996). The second term turns out to be a mutation term, and can be rewritten as:

(14)  $x_i \alpha \left( \sum_j x_j \ln(x_j) - \ln(x_i) \right)$
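The right-hand side of Equation (12), with the mutation term written as in (14), can be evaluated directly. In the sketch below the function name, the payoff matrix and the parameter values are illustrative assumptions, and every component of x is assumed to be strictly positive so that the logarithms are defined.

```python
import math


def q_learning_dynamics(x, y, A, alpha, tau):
    """Return dx_i/dt for the first player according to Equation (12)."""
    Ay = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    average_payoff = sum(xi * ayi for xi, ayi in zip(x, Ay))   # x . Ay
    entropy_sum = sum(xj * math.log(xj) for xj in x)           # sum_j x_j ln(x_j)
    return [
        xi * alpha * tau * (ayi - average_payoff)              # selection term (RD)
        + xi * alpha * (entropy_sum - math.log(xi))            # mutation term (14)
        for xi, ayi in zip(x, Ay)
    ]


# Hypothetical 2 x 2 payoff matrix and mixed strategies.
A = [[2.0, 0.0], [0.0, 1.0]]
dx = q_learning_dynamics([0.3, 0.7], [0.5, 0.5], A, alpha=0.1, tau=2.0)
# At the uniform strategy with identical payoffs, both terms vanish.
dx_uniform = q_learning_dynamics([0.5, 0.5], [0.5, 0.5], [[1.0, 1.0], [1.0, 1.0]], alpha=0.1, tau=2.0)
```

The components of the derivative sum to zero, so the dynamics keep x on the probability simplex. Equation (13) is obtained by calling the same function with the roles of x and y swapped and the matrix B, indexed by the second player's action first, in place of A.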

In Equation (14) we recognize 2 entropy terms, one over the entire probability distribution x, and one over strategy xi. Relating entropy and mutation is not new. It is a well known fact (Schneider 2000; Stauffer 1999) that mutation increases entropy. In Stauffer (1999), it is stated that the concepts

Figure 11. The direction ﬁeld plots of the battle of the sexes (subclass 2) game with τ = 1, 2, 10.

are familiar from thermodynamics in the following sense: the selection mechanism is analogous to energy and mutation to entropy. So generally speaking, mutations tend to increase entropy. Exploration can be considered as the mutation concept, as both concepts take care of providing variety. Figure 11 illustrates the dynamics of Q-learning in the battle of the sexes game. The direction field is plotted for three values of the temperature τ.

4.3. Extended Replicator Dynamics (ERD): Using the Initial Framework

In Tuyls et al. (2003c) the authors changed the RD into a new kind of dynamics, i.e., the Extended Replicator Dynamics (ERD). The reasons for changing the RD become clear from Section 4.1 and Tuyls et al. (2003a, b). In one-state games it is impossible for Cross learning and Learning Automata to guarantee convergence to a stable Nash equilibrium in all types of games. In Boltzmann Q-learning a Nash equilibrium can be attained, but there is no guarantee of stability. For the development of an adapted selection dynamics, we took the replicator dynamics and its interpretation as a starting point. In RD, the probabilities a player has over its strategies are changed greedily with respect to payoff in the present. In this section a method is shown to change these probabilities over strategies not only with respect to payoff growth in the present but also to payoff growth in the future. We call those players that act so as to optimize future payoff extended Cross learners, and the associated class of dynamics extended dynamics. There are of course different ways to build such extended players. The most obvious is to use a linear approximation of the evolution of fitness in time. This is the approach we use here. For the ERD we compose the following equation f,

(15)  $f(x) = RD(x) + \frac{dRD(x)}{dt}\,\eta$

where RD(x) is

(16)  $\frac{dx_i}{dt} = \left[(Ax)_i - x \cdot Ax\right] x_i$

and η is the parameter that determines how far in the future we need to look. The composition of Equation (15) can best be understood as follows. When using the classical replicator equations (i.e., RD(x)), we act greedily toward payoff in the present. When adding our second term,

(17)  $\frac{dRD(x)}{dt}\,\eta$

we act greedily toward payoff in the future. From an analytical point of view, the second term gives actions that are winning fitness (whether their fitness is negative or positive) a positive push toward a higher chance of getting selected. On the other hand, actions that are losing fitness (again, whether their fitness is negative or positive) are given a negative push toward a lower chance of getting selected. This extends the traditional replicator equations. This extended evolutionary dynamics succeeds in converging to a stable Nash equilibrium in all 3 categories of 2 × 2 games. In Tuyls et al. (2003c) we also constructed a model-free RL algorithm, based on Cross learning, which behaves as the ERD. Experiments confirming this can be found in Tuyls et al. (2003c). Here we show an experiment for the third category of games. Recall from Section 4.1 that in this type of game the RD circle around the mixed Nash equilibrium. The ERD behaves differently: ERD and the extended Cross learning algorithm do not circle but converge to the mixed Nash equilibrium. This is illustrated in Figure 12. Moreover, the equilibrium is stable, meaning that the learning process will not abandon it. The long-run learning dynamics are illustrated in the figure on the right.
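One possible reading of Equations (15) to (17) is sketched below. It assumes that dRD(x)/dt denotes the time derivative of the replicator vector field along its own trajectory, which by the chain rule is the Jacobian of RD applied to RD(x). This interpretation, the function names, the payoff matrix and the value of η are assumptions for illustration and may differ from the construction in Tuyls et al. (2003c).

```python
def replicator(x, A):
    """Equation (16): RD_i(x) = x_i * ((Ax)_i - x . Ax)."""
    n = len(x)
    Ax = [sum(A[i][k] * x[k] for k in range(n)) for i in range(n)]
    avg = sum(x[i] * Ax[i] for i in range(n))
    return [x[i] * (Ax[i] - avg) for i in range(n)]


def replicator_time_derivative(x, A):
    """d/dt RD(x) along the RD flow: Jacobian of RD times RD(x) (assumed reading)."""
    n = len(x)
    Ax = [sum(A[i][k] * x[k] for k in range(n)) for i in range(n)]
    ATx = [sum(A[k][i] * x[k] for k in range(n)) for i in range(n)]
    avg = sum(x[i] * Ax[i] for i in range(n))
    rd = replicator(x, A)
    result = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            # dRD_i/dx_k = delta_ik * ((Ax)_i - x.Ax) + x_i * (A_ik - (Ax)_k - (A^T x)_k)
            jac = x[i] * (A[i][k] - Ax[k] - ATx[k])
            if i == k:
                jac += Ax[i] - avg
            total += jac * rd[k]
        result.append(total)
    return result


def extended_replicator(x, A, eta):
    """Equation (15): f(x) = RD(x) + eta * dRD(x)/dt."""
    rd = replicator(x, A)
    drd = replicator_time_derivative(x, A)
    return [r + eta * d for r, d in zip(rd, drd)]


# Hypothetical payoff matrix and state on the simplex.
A = [[1.0, 5.0], [0.0, 3.0]]
x = [0.3, 0.7]
rd_value = replicator(x, A)
erd_value = extended_replicator(x, A, eta=0.5)
erd_zero = extended_replicator(x, A, eta=0.0)
```

With η = 0 the extended dynamics reduce to the classical replicator equations, and for any η the components still sum to zero on the simplex, so the state remains a probability vector.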

5. EVOLUTIONARY GAME THEORY AND MULTI-AGENT SYSTEMS

In this section we discuss the most interesting properties that link the fields of EGT and MAS. Traditional game theory is an economic theory that models interactions between rational agents as games of two or more players that can choose from a set of strategies and the corresponding preferences. It is the mathematical study of interactive decision making in the sense that the agents involved in the decisions take into account their own choices and those of others. Choices are determined by

Figure 12. Left: The direction ﬁeld of the RD. Right: The paths induced by the learning process.

1. stable preferences concerning the outcomes of their possible decisions,
2. strategic behaviour: agents take into account the relation between their own choices and the decisions of other agents.

Typical for the traditional game theoretic approach is to assume perfectly rational players who try to find the most rational strategy to play. These players have perfect knowledge of the environment and the payoff tables, and they try to maximize their individual payoff. These assumptions made by classical game theory just do not apply to the real world, and to Multi-Agent settings in particular. In contrast, EGT is descriptive and starts from more realistic views of the game and its players. A game is not played only once, but repeatedly with changing opponents that, moreover, are not completely informed, sometimes misinterpret each other’s actions, and are not completely rational but also biologically and sociologically conditioned. Under these circumstances, it becomes impossible to judge what choices are the most rational ones. The question now becomes how a player can learn to optimize its behaviour and maximize its return. For this learning process, mathematical models are developed, e.g., replicator equations. Summarizing the above, we can say that EGT describes how boundedly rational agents can make decisions in complex environments, in which they interact with other agents. Bounded rationality means that agents are limited in their computational resources and in their ability to reason, and have limited information. In such complex environments software agents must be able to learn from their environment and adapt to its non-stationarity. The basic properties of a Multi-Agent System correspond exactly with those of EGT. First of all, a MAS is made up of interactions between two

or more agents, with each trying to accomplish a certain (conflicting) goal. No agent is guaranteed to be completely informed about the other agents’ intentions or goals, nor is it guaranteed to be completely informed about the complete state of the environment. Of great importance is that EGT offers us a solid basis to understand dynamic iterative situations in the context of strategic games. A MAS has a typical dynamical character, which makes it hard to model and brings along a lot of uncertainty. At this stage EGT seems to offer us a helping hand in understanding these typical dynamical processes in a MAS and modeling them in simple settings as iterative games of two or more players.

6. CONCLUSIONS

By starting with a concise overview of key concepts from EGT and their mutual relationships, we provided an evolutionary game theoretic point of view on Learning and MAS. By introducing the triangular relation between MAS, RL and EGT we formalized this perspective into one that results in new insights in MAS and more particularly for the indispensable concept of Learning in MAS. These new insights make it possible to overcome important daily occurrences and crucial learning issues in MAS. In this relation, EGT provides the necessary basic foundations for understanding and analyzing MAS. Because of the similarities between the two fields (recall Section 5), EGT is vital to MAS as a framework for learning and today’s applications. This is the first link of Figure 1. The second link provided the mathematical formalization of the relation between RL and EGT. Again this relation offers a better understanding of learning in a MAS and provides basic mechanisms toward learning algorithms with more capabilities in the future (Tuyls et al. 2003c). More precisely, in the case of 1-state games this means stable strategies which are possibly mixed. Obviously, this opens the door toward multi-state games, and MAS. These first two links of Figure 1 are the keys toward solving the issues in the third link between RL and MAS. These main difficulties in MAS were described in depth in Section 3. It is our belief that today’s MAS applications require agents to be adaptive and hence will benefit from this evolutionary game theoretic perspective.

NOTES

∗ Author funded by a doctoral grant from the Institute for Advancement of Scientific Technological Research in Flanders (IWT).

1 The Markov property states that only the present state gives any information about the future behaviour of the learning process. Knowledge of the history of the process does not add any new information.
2 A strategy is dominant if it is always better than any other strategy, regardless of what the opponent may do.
3 Of course a solution orbit can evolve toward the boundary of the simplex as time goes to infinity, and thus in the limit, when the distance to the boundary goes to zero, a pure strategy can disappear from the population of strategies. For a more formal explanation, we refer the reader to Weibull (1996).
4 A model consists of knowledge of the state transition probability function T(s, a, s′), which is the probability of ending up in state s′ after taking action a in state s, and the reinforcement function R(s, a), which is the payoff for taking action a in state s.
5 For instance, in Boltzmann Q-learning the temperature determines the degree of exploration.
6 Parameters a and b in Equations (10) and (11).
7 As opposed to fixed structure learning automata, where state transition probabilities are fixed and have to be chosen according to the response of the environment and to perform better than a pure-chance automaton in which every action is chosen with equal probability.
8 Recall from Section 2.3.2 the definition of the asymmetric RD.
9 For the complete derivation of the dynamics of Q-learning, we refer the interested reader to Tuyls et al. (2003b).
10 Formally the Boltzmann distribution is described by $x_i(k) = \frac{e^{\tau Q_{a_i}(k)}}{\sum_{j=1}^{n} e^{\tau Q_{a_j}(k)}}$, where xi(k) is the probability of playing strategy i at time step k and τ is the temperature.

REFERENCES

Bazzan, A. L. C. and Franziska Klugl: 2003, ‘Learning to Behave Socially and Avoid the Braess Paradox in a Commuting Scenario’, in Proceedings of the First International Workshop on Evolutionary Game Theory for Learning in MAS, Melbourne, Australia.
Bazzan, A. L. C.: 1997, A Game-Theoretic Approach to Coordination of Traffic Signal Agents, Ph.D. thesis, University of Karlsruhe.
Börgers, T. and R. Sarin: 1997, ‘Learning through Reinforcement and Replicator Dynamics’, Journal of Economic Theory 77(1).
Braess, D.: 1968, ‘Über ein Paradoxon aus der Verkehrsplanung’, Unternehmensforschung 12, 258.
Bush, R. R. and F. Mosteller: 1955, Stochastic Models for Learning, Wiley, New York.
Claus, C. and C. Boutilier: 1998, ‘The Dynamics of Reinforcement Learning in Cooperative Multi-Agent Systems’, in Proceedings of the 15th International Conference on Artificial Intelligence, pp. 746–752.
Ghosh, A. and S. Sen: 2003, ‘Learning TOMs: Convergence to Non-Myopic Equilibria’, in Proceedings of the First International Workshop on Evolutionary Game Theory for Learning in MAS, Melbourne, Australia.
Gintis, C. M.: 2000, Game Theory Evolving, University Press, Princeton.

Hirsch, M. W. and S. Smale: 1974, Differential Equations, Dynamical Systems and Linear Algebra, Academic Press, Inc.
Hofbauer, J. and K. Sigmund: 1998, Evolutionary Games and Population Dynamics, Cambridge University Press.
Hu, J. and M. P. Wellman: 1998, Multiagent Reinforcement Learning in Stochastic Games, Cambridge University Press.
Jafari, C., A. Greenwald, D. Gondek, and G. Ercal: 2001, ‘On No-Regret Learning, Fictitious Play, and Nash Equilibrium’, in Proceedings of the Eighteenth International Conference on Machine Learning, pp. 223–226.
Kaelbling, L. P., M. L. Littman, and A. W. Moore: 1996, ‘Reinforcement Learning: A Survey’, Journal of Artificial Intelligence Research.
Littman, M. L.: 1994, ‘Markov Games as a Framework for Multi-Agent Reinforcement Learning’, Proceedings of the Eleventh International Conference on Machine Learning, pp. 157–163.
Loch, J. and S. Singh: 1998, ‘Using Eligibility Traces to Find the Best Memoryless Policy in a Partially Observable Markov Process’, Proceedings of the Fifteenth International Conference on Machine Learning, San Francisco.
Luck, M., P. McBurney, and C. Preist: 2003, ‘A Roadmap for Agent Based Computing’, AgentLink, Network of Excellence.
Maynard-Smith, J.: 1982, Evolution and the Theory of Games, Cambridge University Press.
Maynard Smith, J. and G. R. Price: 1973, ‘The Logic of Animal Conflict’, Nature 146, 15–18.
Narendra, K. and M. Thathachar: 1989, Learning Automata: An Introduction, Prentice-Hall.
Nowé, A., J. Parent, and K. Verbeeck: 2001, ‘Social Agents Playing a Periodical Policy’, in Proceedings of the 12th European Conference on Machine Learning, pp. 382–393.
Nowé, A. and K. Verbeeck: 1999, ‘Distributed Reinforcement Learning, Loadbased Routing a Case Study’, Notes of the Neural, Symbolic and Reinforcement Methods for Sequence Learning Workshop at ijcai99, Stockholm, Sweden.
von Neumann, J. and O. Morgenstern: 1944, Theory of Games and Economic Behaviour, Princeton University Press, Princeton.
Osborne, J. O. and A. Rubinstein: 1994, A Course in Game Theory, MIT Press, Cambridge, MA.
Pendrith, M. D. and M. J. McGarity: 1998, ‘An Analysis of Direct Reinforcement Learning in Non-Markovian Domains’, in Proceedings of the Fifteenth International Conference on Machine Learning, San Francisco.
Perkins, T. J. and M. D. Pendrith: 2002, ‘On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains’, in Proceedings of the International Conference on Machine Learning (ICML02).
Redondo, F. V.: 2001, Game Theory and Economics, Cambridge University Press.
Robocup project: 2003, ‘The Official Robocup Website’, www.robocup.org, Robocup.
Samuelson, L.: 1997, Evolutionary Games and Equilibrium Selection, MIT Press, Cambridge, MA.
Schneider, T. D.: 2000, ‘Evolution of Biological Information’, Journal of Nucleic Acids Research 28, 2794–2799.
Stauffer, D.: 1999, Life, Love and Death: Models of Biological Reproduction and Aging, Institute for Theoretical Physics, Köln, Euroland.
Stone, P.: 2000, Layered Learning in Multi-Agent Systems, MIT Press, Cambridge, MA.

Sutton, R. S. and A. G. Barto: 1998, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
Tsitsiklis, J. N.: 1993, ‘Asynchronous Stochastic Approximation and Q-Learning’, Internal Report from the Laboratory for Information and Decision Systems and the Operation Research Center, MIT Press, Cambridge, MA.
Tuyls, K., T. Lenaerts, K. Verbeeck, S. Maes, and B. Manderick: 2002, ‘Towards a Relation between Learning Agents and Evolutionary Dynamics’, in Proceedings of the Belgium-Netherlands Artificial Intelligence Conference 2002 (BNAIC), KU Leuven, Belgium.
Tuyls, K., K. Verbeeck, and S. Maes: 2003a, ‘On a Dynamical Analysis of Reinforcement Learning in Games: Emergence of Occam’s Razor’, Lecture Notes in Artificial Intelligence, Multi-Agent Systems and Applications III, Lecture Notes in AI 2691 (Central and Eastern European Conference on Multi-Agent Systems 2003), Prague, 16–18 June 2003, Czech Republic.
Tuyls, K., K. Verbeeck, and T. Lenaerts: 2003b, ‘A Selection-Mutation Model for Q-Learning in Multi-Agent Systems’, in The ACM International Conference Proceedings Series, Autonomous Agents and Multi-Agent Systems 2003, Melbourne, 14–18 July 2003, Australia.
Tuyls, K., D. Heytens, A. Nowe, and B. Manderick: 2003c, ‘Extended Replicator Dynamics as a Key to Reinforcement Learning in Multi-Agent Systems’, Proceedings of the European Conference on Machine Learning’03, Lecture Notes in Artificial Intelligence, Cavtat-Dubrovnik, 22–26 September 2003, Croatia.
Weibull, J. W.: 1996, Evolutionary Game Theory, MIT Press, Cambridge, MA.
Weibull, J. W.: 1998, ‘What we have Learned from Evolutionary Game Theory so Far?’, Stockholm School of Economics and I.U.I., May 7, 1998.
Weiss, G.: 1999, in Gerard Weiss (ed.), Multiagent Systems. A Modern Approach to Distributed Artificial Intelligence, MIT Press, Cambridge, MA.
Wooldridge, M.: 2002, An Introduction to MultiAgent Systems, John Wiley & Sons, Chichester, England.
Computational Modeling Lab Department of Computer Science Vrije Universiteit Brussel Pleinlaan 2, 1050 Brussels, Belgium E-mail: [email protected]

ROBERT VAN ROOY

EVOLUTION OF CONVENTIONAL MEANING AND CONVERSATIONAL PRINCIPLES

ABSTRACT. In this paper we study language use and language organisation by making use of Lewisean signalling games. Standard game theoretical approaches are contrasted with evolutionary ones to analyze conventional meaning and conversational interpretation strategies. It is argued that analyzing successful communication in terms of standard game theory requires agents to be very rational and fully informed. The main goal of the paper is to show that in terms of evolutionary game theory we can motivate the emergence and self-sustaining force of (i) conventional meaning and (ii) some conversational interpretation strategies in terms of weaker and, perhaps, more plausible assumptions.

Synthese 139: 331–366, 2004. Knowledge, Rationality & Action 167–202, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

1. INTRODUCTION

We all feel that information transfer is crucial for communication. But it cannot be enough: although smoke indicates that there is fire, we would not say that communication is taking place. Also, not all transfer of information between humans counts as communication. Incidental information transfer should be ruled out. Intuitively, in order for an event to mean something else, intentionality is crucial. And indeed, Grice (1957) characterizes ‘meaning’ in terms of the communicator’s intentions. To mean something by x, speaker S must intend

(1) S’s action x to produce a certain response a in a certain audience/receiver R;
(2) R to recognize S’s intention (1);
(3) R’s recognition of S’s intention (1) to function as at least part of R’s reason for R’s response a.

The first condition says basically that we communicate something in order to influence the receiver’s beliefs and/or behavior. However, for an act to be a communicative act, the response should be mediated by the audience’s recognition of the sender’s intention, i.e. condition 2. But also the recognition of the speaker’s intention is not sufficient. To see what is missing, consider the following contrasting pair (Grice 1957):

(1) a. A policeman stops a car by standing in its way.
    b. A policeman stops a car by waving.

Although in both examples the first two Gricean conditions are satisfied, we would say that only in (1b) some real communication is going on. The crucial difference between (1a) and (1b), according to Grice (1957), is that only in (1b) the audience’s recognition of the policeman’s intention to stop the car is effective in producing that response. In contrast to the case where he himself stands in the car’s way, the policeman does not regard it as a foregone conclusion that his waving will have the intended effect that the driver stops the car, whether or not the policeman’s intention is recognized. Being able to characterize the contrast between (1a) and (1b) is important for characterizing linguistic, or conventional, meaning. The difference between (2a) and (2b) seems to be of exactly the same kind.

(2) a. Feeling faint, a child lets its mother see how pale it is (hoping that she may draw her own conclusion and help).
    b. A child says to its mother, “I feel faint”.

In contrast to (1a) and (2a), in the cases (1b) and (2b) an agent communicates something by means of a sign with a conventional meaning (Lewis 1969, 152–159; Searle 1969). But Grice did not really intend to characterize situations where agents intend to influence one another by making use of signals with a conventional meaning. He aimed to account for successful communication even without conventional ways of doing so. According to Grice, third-order intentionality is required for communicative acts: the speaker intends the hearer to recognize that the speaker wants the hearer to produce a particular response. Strawson (1964) and Schiffer (1972) showed by means of some examples that this third-order intentionality is not enough. We can still construct examples where an agent wants her audience to recognize her intention in order to produce a certain effect, without it intuitively being the case that the speaker’s action means the intended response: it can be that the speaker does not want her intention, that R performs the desired action, to become mutually known (note 1). For an action to be called communicative, the action has to make the speaker’s intention common knowledge.

In this paper we will study both conventional and non-conventional meaning in terms of signalling games as invented by David Lewis and developed further in economics and biology. However, we are going to suggest that in order to successfully communicate information we do not need as much rationality, higher-order intentionality or common knowledge as (explicitly or implicitly) required by Grice, Lewis, Schiffer, and others. Building on work of economists and biologists, we will suggest that evolutionary game theory can be used to account for the emergence and self-perpetuating force of both arbitrary semantic rules and of general functional pragmatic interpretation strategies. This paper is organized as follows. In Section 2 the analysis of signalling in standard, or rational, game theory is examined. The standard problem here is that of equilibrium selection, and in Section 3 Lewis’s (1969) conventional way of solving it is discussed, together with his motivation for why these conventional solutions are self-enforcing. In Section 4 evolutionary game theory is used to provide an alternative motivation for why linguistic conventions remain stable and why some candidate conventions are more natural to emerge than others. According to this alternative motivation we do not need to assume as strong notions of rationality and (common) knowledge as Lewis does. In Sections 5 and 6 we argue that evolutionary signalling games can also be used to motivate why natural languages are organized and used in such an efficient but still reliable way. Reliability (Grice’s maxim of quality) is tackled in Section 5, efficiency (Grice’s (1967) quantity and manner) in Section 6. The paper ends with some conclusions and suggestions for further research.

2. COMMUNICATION PROBLEMS AS SIGNALLING GAMES

2.1. Signalling Games

For the study of information exchange we will consider situations where a speaker has some relevant information that the hearer lacks. The simplest games in which we see this asymmetry are signalling games. A signalling game is a two-player game with a sender, $s$, and a receiver, $r$. This is a game of private information: the sender starts off knowing something that the receiver does not know. The sender knows the state $t \in T$ she is in but has no substantive payoff-relevant actions.2 The receiver has a range of payoff-relevant actions to choose from but has no private information, and his prior beliefs concerning the state the sender is in are given by a probability distribution $P$ over $T$; these prior beliefs are common knowledge. The sender, knowing $t$ and trying to influence the action of the receiver, sends the receiver a signal: a message $m$ drawn from some set $M$. The messages do not have a pre-existing meaning. The receiver receives this signal and then takes an action $a$ drawn from a set $A$. This ends the game. Notice that the game is sequential in nature in the sense that the players do not move simultaneously: the action of the receiver might depend on the signal he received from the sender. For simplicity, we take $T$, $M$ and $A$ all to be finite. A pure sender strategy, $S$, is a (deterministic) function from states to signals (messages): $S \in [T \to M]$, and a pure receiver strategy, $R$, is a (deterministic) function from signals to actions: $R \in [M \to A]$. Mixed strategies (probabilistic functions, which allow us to account for ambiguity) will play only a minor role in this paper and can for the most part be ignored.

As an example, consider the following signalling game with two equally likely states, $t$ and $t'$; two signals that the sender can use, $m$ and $m'$; and two actions that the receiver can perform, $a$ and $a'$. Sender and receiver now each have four (pure) strategies:

Sender:

        S1    S2    S3    S4
  t     m     m     m'    m'
  t'    m'    m     m     m'

Receiver:

        R1    R2    R3    R4
  m     a     a'    a     a'
  m'    a'    a     a     a'

To complete the description of the game, we have to give the payoffs. The payoffs of the sender and the receiver are given by functions $U_s$ and $U_r$, respectively, which (for the moment) are elements of $[T \times A \to \mathbb{R}]$, where $\mathbb{R}$ is the set of reals. Just like Lewis (1969) we assume (for the moment) that sending messages is costless, which means that we are talking about cheap talk games here. Coming back to our example, we can assume, for instance, that the utilities of sender and receiver are in perfect alignment, i.e., for each agent $i$, $U_i(t, a) = 1 > 0 = U_i(t, a')$ and $U_i(t', a') = 1 > 0 = U_i(t', a)$.3

An equilibrium of a signalling game is described in terms of the strategies of both players. If the sender uses strategy $S$ and the receiver strategy $R$, it is clear how to determine the utility of this profile for the sender, $U_s^*(t, S, R)$, in any state $t$:

$$U_s^*(t, S, R) = U_s(t, R(S(t))).$$

Due to his incomplete information, things are not as straightforward for the receiver. Because the sender using strategy $S$ might send the same signal, $m$, in different states, the receiver does not necessarily know the unique state relevant to determine his utilities. Therefore, he determines his utilities, or expected utilities, with respect to the set of states in which the sender could have sent message $m$. Let us define $S_t$ to be the information state (or information set) the receiver is in after the sender, using strategy $S$, sends her signal in state $t$, i.e. $S_t = \{t' \in T : S(t') = S(t)\}$.4 With respect to this set, we can determine the (expected) utility of receiver strategy $R$ in information state $S_t$, which is $R$’s expected utility in state $t$ when the sender uses strategy $S$, $U_r^*(t, S, R)$ (where $P(t' \mid S_t)$ is the conditional probability of $t'$ given $S_t$):

$$U_r^*(t, S, R) = \sum_{t' \in T} P(t' \mid S_t) \times U_r(t', R(S(t'))).$$

A strategy profile $\langle S, R\rangle$ forms a Nash equilibrium iff neither the sender nor the receiver can do better by unilateral deviation. That is, $\langle S, R\rangle$ forms a Nash equilibrium iff for all $t \in T$ the following two conditions are obeyed:5

(i) $\neg\exists S' : U_s^*(t, S, R) < U_s^*(t, S', R)$,
(ii) $\neg\exists R' : U_r^*(t, S, R) < U_r^*(t, S, R')$.

As can be checked easily, our game has 6 Nash equilibria: $\{\langle S_1, R_1\rangle, \langle S_3, R_2\rangle, \langle S_2, R_3\rangle, \langle S_2, R_4\rangle, \langle S_4, R_3\rangle, \langle S_4, R_4\rangle\}$. This set of equilibria depends on the receiver’s probability function. If, for instance, $P(t) > P(t')$, then $\langle S_2, R_4\rangle$ and $\langle S_4, R_4\rangle$ are no longer equilibria: it is always better for the receiver to perform $a$.

In signalling games it is assumed that the messages have no pre-existing meaning. However, it is possible that meanings can be associated with them due to the sending and receiving strategies chosen in equilibrium. If in equilibrium the sender sends different messages in different states and also the receiver acts differently on different messages, we can say with Lewis (1969, 147) that the equilibrium pair $\langle S, R\rangle$ fixes the meaning of expressions in the following way: for each state $t$, the message $S(t)$ means either $S_t = \{t' \in T \mid S(t') = S(t)\}$ (in the case that the sentence is used indicatively) or $R(S(t))$ (if the sentence is used imperatively).6 Following standard terminology in economics (Crawford and Sobel 1982), let us call $\langle S, R\rangle$ a (fully) separating equilibrium if there is a one-to-one correspondence between states (meanings) and messages, i.e., if there exists a bijection between $T$ and $M$. Notice that among the equilibria in our example, two are separating: $\langle S_1, R_1\rangle$ and $\langle S_3, R_2\rangle$.
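The equilibrium count can be checked by brute force. The following Python sketch enumerates all pure strategy pairs of the two-state example and applies the two per-state conditions of the Nash definition literally; the function and variable names are illustrative and not taken from the paper.

```python
from fractions import Fraction
from itertools import product

T = ["t", "t'"]                      # states
M = ["m", "m'"]                      # messages
A = ["a", "a'"]                      # actions
BEST = {"t": "a", "t'": "a'"}        # the action that pays 1 in each state


def utility(state, action):
    """Aligned payoffs: both players get 1 on a match, 0 otherwise."""
    return 1 if BEST[state] == action else 0


# Pure strategies as dicts: S maps states to messages, R maps messages to actions.
SENDERS = [dict(zip(T, ms)) for ms in product(M, repeat=len(T))]
RECEIVERS = [dict(zip(M, acts)) for acts in product(A, repeat=len(M))]


def receiver_eu(t, S, R, prior):
    """Receiver's expected utility over the information state S_t."""
    info_state = [u for u in T if S[u] == S[t]]
    mass = sum(prior[u] for u in info_state)
    return sum(prior[u] / mass * utility(u, R[S[u]]) for u in info_state)


def is_nash(S, R, prior):
    """No strictly profitable unilateral deviation in any state."""
    for t in T:
        if any(utility(t, R[alt[t]]) > utility(t, R[S[t]]) for alt in SENDERS):
            return False
        if any(receiver_eu(t, S, alt, prior) > receiver_eu(t, S, R, prior)
               for alt in RECEIVERS):
            return False
    return True


def equilibria(prior):
    return [(S, R) for S in SENDERS for R in RECEIVERS if is_nash(S, R, prior)]


uniform = {"t": Fraction(1, 2), "t'": Fraction(1, 2)}
skewed = {"t": Fraction(3, 4), "t'": Fraction(1, 4)}   # an arbitrary prior with P(t) > P(t')
```

With the uniform prior, `equilibria(uniform)` contains six pairs, two of which use a one-to-one sender strategy. With the skewed prior, the two pairs in which the receiver always performs $a'$ drop out.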


2.2. Requirements for Successful Communication

In the introduction we have seen that according to Schiffer an action only counts as being communicative if it makes the speaker’s intention common knowledge. It can be argued that this common-knowledge requirement is met if a game has a unique solution. It is well-known (Osborne and Rubinstein, 1994) that in order for a strategy pair to be a Nash equilibrium, both the strategies that the agents can play and the preferences involved have to be common knowledge. Moreover, it is required that it is common knowledge that both agents are rational selfish payoff optimizers. If then, in addition, a particular signalling game has only one (Nash) solution, it seems only reasonable to claim that in that case the speaker’s intention becomes common knowledge after she has sent a particular signal.7 Thus we might claim communication to take place by sending message $m$ in such a game if and only if (i) the game has a (partly or fully) separating equilibrium in which message $m$ is sent; and (ii) this is the unique solution of the game.8 The first condition is prominent in the economic and biological literature on signalling games. The second condition, uniqueness, plays an important role in Schelling (1960), Lewis (1969), and Clark (1996) to solve coordination problems and is stressed in the work of Parikh (1991, 2001) on situated communication. The following example shows that in the case of non-arbitrary signals this uniqueness condition is sometimes indeed unproblematically satisfied.

Consider the following abstract situation. There are two kinds of situations: $t$, the default case where there is no danger; and $t'$, where there is danger. The sender knows which situation is the case, the receiver does not. We might assume for concreteness that it is commonly known between sender and receiver that $P(t) = 0.8 > 0.2 = P(t')$. In the normal situation, $t$, the sender does not send a message, but in the other case she might. The message will be denoted by $m$, while not sending a message will be modelled as sending the empty message $\emptyset$. The receiver can perform two kinds of actions: the default action $a$ (which is like doing nothing); and action $a'$. This latter action demands effort from the receiver, but is the only appropriate action in the case that there is danger. It does not harm the sender if it is done when there is no danger (the sender is indifferent about the receiver’s response in $t$). One way to describe this situation is by assuming the following (also commonly known) utility functions:

$$U_s(t, a) = 5, \quad U_s(t, a') = 5, \quad U_s(t', a) = -50, \quad U_s(t', a') = 50,$$
$$U_r(t, a) = 6, \quad U_r(t, a') = 0, \quad U_r(t', a) = -10, \quad U_r(t', a') = 10.$$


The strategies are as expected: $S$ is just a function from $t'$ to $\{m, \emptyset\}$, where $\emptyset$ is the empty message that is always sent in $t$; while $R$ is a function from $\{\emptyset, m\}$ to $\{a, a'\}$. Thus, we have the following strategies:

Sender:

        S1    S2
  t     ∅     ∅
  t'    ∅     m

Receiver:

        R1    R2    R3    R4
  ∅     a     a'    a     a'
  m     a     a'    a'    a

On assuming that $P(t) = 0.8$, we receive the following payoff tables (for $U_i^*(\cdot, S, R)$); each cell lists the sender’s payoff followed by the receiver’s:

  t:      R1          R2         R3          R4
  S1      5, 2.8      5, 2       5, 2.8      5, 2
  S2      5, 6        5, 2       5, 6        5, 2

  t':     R1          R2         R3          R4
  S1      −50, 2.8    50, 2      −50, 2.8    50, 2
  S2      −50, −10    50, 10     50, 10      −50, −10
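The danger example can be recomputed mechanically. The Python sketch below is illustrative only: the strategy labels follow the reading in which $S_2$ sends $m$ only in $t'$ and $R_3$ answers $m$ with $a'$, the string `"empty"` stands for the empty message, and profiles are compared by ex ante expected utility (an assumption of this sketch). Strictness is used as the filter because, with the weak inequalities of conditions (i) and (ii), the silent profile in which the sender never signals and the receiver always performs $a$ admits no strictly profitable deviation either.

```python
from fractions import Fraction
from itertools import product

T = ["t", "t'"]                                   # no danger / danger
P = {"t": Fraction(4, 5), "t'": Fraction(1, 5)}
EMPTY, MSG = "empty", "m"                         # EMPTY models sending nothing

U_S = {("t", "a"): 5, ("t", "a'"): 5, ("t'", "a"): -50, ("t'", "a'"): 50}
U_R = {("t", "a"): 6, ("t", "a'"): 0, ("t'", "a"): -10, ("t'", "a'"): 10}

# The sender is silent in t; her only choice is what to do in t'.
SENDERS = {"S1": {"t": EMPTY, "t'": EMPTY}, "S2": {"t": EMPTY, "t'": MSG}}
RECEIVERS = {
    "R1": {EMPTY: "a", MSG: "a"},
    "R2": {EMPTY: "a'", MSG: "a'"},
    "R3": {EMPTY: "a", MSG: "a'"},
    "R4": {EMPTY: "a'", MSG: "a"},
}


def ex_ante(U, S, R):
    """Expected utility of a profile before the state is known."""
    return sum(P[t] * U[(t, R[S[t]])] for t in T)


def pooled_value(action):
    """Receiver's expected utility of an action when the signal reveals nothing."""
    return sum(P[t] * U_R[(t, action)] for t in T)


def strict_equilibria():
    """Profiles where every unilateral deviation strictly lowers the deviator's payoff."""
    found = []
    for (s_name, S), (r_name, R) in product(SENDERS.items(), RECEIVERS.items()):
        sender_ok = all(ex_ante(U_S, alt, R) < ex_ante(U_S, S, R)
                        for name, alt in SENDERS.items() if name != s_name)
        receiver_ok = all(ex_ante(U_R, S, alt) < ex_ante(U_R, S, R)
                          for name, alt in RECEIVERS.items() if name != r_name)
        if sender_ok and receiver_ok:
            found.append((s_name, r_name))
    return found
```

`pooled_value("a")` and `pooled_value("a'")` reproduce the receiver’s entries 2.8 and 2 for the silent sender strategy, and `strict_equilibria()` returns only the pair labelled `("S2", "R3")`.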

These payoff tables show that our game has exactly one Nash equilibrium, $\langle S_2, R_3\rangle$, because only this strategy pair is an equilibrium in both states. Because in this game the unique-solution requirement is satisfied, we can be sure that communication is successful: if the sender sends $m$, the receiver will figure out that he is in situation $t'$ and should perform $a'$.

Our game has exactly one Nash equilibrium in which meaningful communication is taking place because the sender has an incentive to influence the hearer and the receiver has no dominating action. If either the sender sees no value in sending information, or the receiver counts any incoming information as valueless for his decision, a signalling game will (also) have so-called ‘pooling’ equilibria, in which the speaker always sends the same message, and ‘babbling’ equilibria where the receiver ignores the message sent by the speaker and always ‘reacts’ by choosing the same action. In such equilibria no information exchange is taking place.

One reason why a receiver ignores the message sent might be that he cannot (always) take the incoming information to be credible. A message is not credible if an individual might have an incentive to send this message in order to deceive her audience. In an important article, Crawford and Sobel (1982) show that the amount of credible information exchange in (cheap talk) games depends on how far the preferences of the participants are aligned.9 However, this does not mean that in all those cases successful communication takes place when the sender sends a message. The unique solution requirement has to be satisfied as well, for otherwise sender and receiver are still unclear about the strategy chosen by the other conversational participant. Above we saw that in some cases such a unique solution is indeed possible. The example discussed in Section 2.1 suggests, however, that in signalling games in which messages have no pre-existing meaning, the satisfaction of the uniqueness condition is the exception rather than the rule.10 Even limiting ourselves to separating equilibria will not do. The problem is that that game has two such equilibria: $\langle S_1, R_1\rangle$ and $\langle S_3, R_2\rangle$.11 How is communication possible in such a situation?

3. A LANGUAGE AS A CONVENTIONAL SIGNALLING SYSTEM

3.1. Conventions as Rationally Justified Equilibria

Above we assumed that the agents had no real prior expectations about what the others might do. Consider a simple symmetric two-person coordination game where both have to choose between $a$ and $b$; if they both choose the same action they earn 1 euro each and nothing otherwise. If both take either of the other’s actions to be equally likely (i.e., there are no prior expectations yet), the game has two (strict) Nash equilibria: $\langle a, a\rangle$ and $\langle b, b\rangle$. Things are different if each player takes it to be more likely that the other player will choose, say, $a$. In that case, both have an incentive to play $a$ themselves as well: the expected utility of playing $a$ is higher than that of playing $b$. But it is not yet a foregone conclusion that both also actually should play $a$: the first agent might believe, for instance, that the other player does not believe that the first will play $a$ and she does not take the second player to be rational. That is, the beliefs of the agents need not be coherent (with themselves and/or with each other). In that case, the first agent might have an incentive not to play $a$. This will not happen, of course, when the beliefs of the two agents and their rationality are common knowledge (or common belief). In that case, action combination $\langle a, a\rangle$ is the only Nash equilibrium of the game.

In the light of the above discussion, Lewis (1969) gave a straightforward answer to the question of how agents coordinate on a particular signalling equilibrium: it is based on the commonly known expectation that the other will do so and on each other’s rationality. Confronted with the recurrent coordination problem of how to successfully communicate information, the agents involved take one of the equilibria to be the conventional way of solving the problem. This equilibrium $\langle S, R\rangle$ can be thought of as a signalling convention: a coding system that conventionally relates messages with meanings. According to Lewis (1969), a signalling convention is a partially arbitrary way to solve a recurrent signalling situation of which it is commonly assumed by both agents that the other conforms to it. Moreover, it has to be commonly known that the belief that the other conforms to it means that both have a good and decisive reason to conform to it themselves, and will want the other to conform to it as well. A linguistic convention is then defined as a generalization of such a signalling convention, where the problem is how to resolve a recurrent coordination problem to communicate information in a larger community.

We would like to explain a convention’s (i) emergence and (ii) self-perpetuating force. Thinking of a convention as a special kind of equilibrium concept of rational game theory gives Lewis a straightforward explanation of why a convention is self-sustaining. Notice that the condition requiring that the belief that the other conforms to it means that both have a good and decisive reason to conform to it themselves is stronger than that of a Nash equilibrium: it demands that if the other player chooses her equilibrium strategy, it is strictly best (i.e., payoff-maximizing) for an agent to choose the equilibrium strategy too. Thus, according to Lewis, a convention has to be a strict Nash equilibrium.12 Strict equilibria in rational game theory are sustained simply by self-interest: if one expects the other to conform to the convention, unilateral deviation makes one (strictly) worse off.13 The notion of a strict equilibrium is stronger than the standard Nash equilibrium concept used in game theory. In terms of it we can explain why some equilibria are unlikely candidates for being conventions.
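The strictness requirement can be illustrated on the two-state example of Section 2.1. The Python sketch below is a minimal illustration with names chosen here; it compares profiles by ex ante expected utility with equally likely states (an assumption of this sketch) and keeps only the profiles in which every unilateral deviation is strictly worse.

```python
from itertools import product

T, M, A = ["t", "t'"], ["m", "m'"], ["a", "a'"]
MATCH = {"t": "a", "t'": "a'"}       # the action that pays 1 in each state

SENDERS = [dict(zip(T, ms)) for ms in product(M, repeat=2)]
RECEIVERS = [dict(zip(M, acts)) for acts in product(A, repeat=2)]


def eu(S, R):
    """Ex ante expected utility, shared by both players, with equally likely states."""
    return sum(0.5 * (R[S[t]] == MATCH[t]) for t in T)


def is_strict(S, R):
    """Every unilateral deviation makes the deviator strictly worse off."""
    return (all(eu(alt, R) < eu(S, R) for alt in SENDERS if alt != S)
            and all(eu(S, alt) < eu(S, R) for alt in RECEIVERS if alt != R))


strict = [(S, R) for S in SENDERS for R in RECEIVERS if is_strict(S, R)]
```

Only two profiles pass the filter, and in both the sender uses different messages in the two states and the receiver maps each message to the matching action. The pooling profiles fail because a deviation that changes nothing on the equilibrium path leaves the payoff unchanged.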
Recall that the game discussed in Section 2.1 had 6 Nash equilibria: $\{\langle S_1, R_1\rangle, \langle S_3, R_2\rangle, \langle S_2, R_3\rangle, \langle S_2, R_4\rangle, \langle S_4, R_3\rangle, \langle S_4, R_4\rangle\}$. We have seen that only the first two are separating: different messages are sent in different states such that there exists a 1-1 correspondence between meanings and messages. According to Lewis’s (1969) definition of a convention, only these separating equilibria are appropriate candidates for being a convention, and he calls them signalling systems.

In the previous section we were confronted with what game theorists call the problem of equilibrium selection. Which of the (separating) equilibria of the game should the players coordinate on to communicate information? Lewis proposed to solve this problem by assuming that one of those equilibria is a convention. Which one of the (separating) equilibria should be chosen to communicate information is, in some sense, arbitrary, and it is this fact that makes both separating equilibria $\langle S_1, R_1\rangle$ and $\langle S_3, R_2\rangle$ equally appropriate candidates for being a convention (for solving the recurrent coordination problem at hand). In some sense, however, Lewis’s solution just pushes the equilibrium selection problem back to another level: how are we to explain which of these regularities comes about? Two natural ways to establish a convention are explicit agreement and precedent. But for linguistic conventions the first possibility is obviously ruled out (at least for a first language), while the second possibility just begs the question. Following Lewis’s (1969) proposal of how to solve coordination problems, this leaves salience as the last possibility. A salient equilibrium is one with a distinguishing psychological quality which makes it more compelling than other equilibria. With Skyrms (1996), we find this a doubtful solution for linguistic conventions: why should one of the separating equilibria be more salient than the other? But then, how can one signalling equilibrium be selected without making use of the psychological notion of salience?

Not only is Lewis’s account of equilibrium selection problematic, his explanation of the self-perpetuating force of signalling equilibria is not completely satisfactory either. His explanation crucially makes a strong rationality assumption concerning the agents engaged in communication.
Moreover, as for all equilibrium concepts in standard game theory, a lot of common knowledge is required: the rules of the game, the preferences involved, the strategies being taken (i.e., lexical and grammatical conventions), and the rationality of the players must all be common knowledge.14 Though it is unproblematic to accept that the strong requirements for being common knowledge can be met for simple pieces of information, with Skyrms (1996) we find it optimistic to assume that they are met for complicated language games played by large populations.

3.2. Natural Conventions

Lewis (1969) admits that agents can conform to a signalling (or linguistic) convention without going through the explicit justification of why they should do so, i.e. without taking into account what the others are supposed to do, or what they expect the agent herself to do. Agents can use a signalling system simply out of habit and they might have learned this habit just by imitating others. These habits are self-perpetuating as well: if each individual conforms to the signalling convention out of habit, there is no reason to change one’s own habit. Still, Lewis argues that rationality is important: the habit has a rational justification. That might be so, but, then, not any justification for a habit is necessarily the correct explanation of why the habit is followed. Although rationality considerations arguably play a crucial role in learning and in following the conventions of one’s second language, this is not so clear when one learns and speaks one’s mother tongue. But if that is so, the higher-order intentions that Grice, Lewis, and others presuppose for successful communication are perhaps not as crucial as is standardly assumed. For signal $m$ to mean $a$, a receiver does not always have to do $a$ because of its conscious ‘recognition of the sender’s intention for it to do $a$’. According to a naturalistic approach towards meaning (or intentionality) – as most forcefully defended by Millikan (1984) in philosophy and also adopted by biologists thinking about animal communication, such as Maynard Smith and Harper (1995) – all that is needed for a signal to ‘mean’ something is that the sender-receiver combination $\langle S, R\rangle$ from which this message-meaning pair follows must be selected for by the force of evolution. In this way – as stressed by Millikan (1984) – a potential distinction is made not between human and animal communication, but rather between animal (including human) communication and ‘natural’ relations of indication. In contrast to the dances of honeybees, which indicate where there is nectar to be found, smoke is not selected for by how well it indicates fire.15 Just as Crawford and Sobel (1982) show that (cheap talk) communication is possible only when signalling is advantageous for both the sender and the receiver, in the same way it is guaranteed that for a signalling pair to be stable, there must be a selective advantage both (i) in attending and responding to the signals and (ii) in making them. This seems to be a natural reason why a signalling convention has normative features as well.

Evolutionary game theory (EGT) is used to study the notion of stability under selective pressures. Where traditional game theory is a normative theory with hyperrational players, EGT is more descriptive. It starts from a realistic view of the world, where players are neither hyperrational (i.e., they are limited in their computational resources and in their ability to reason) nor fully informed.

4. STABILITY AND EVOLUTION IN GAME THEORY

Lewis (1969) proposed to explain why linguistic conventions are self-sustaining in terms of rational game theory. To do so, he was forced to make very strong assumptions concerning agents’ rationality and (common) knowledge. This suggests that we should look for another theoretical underpinning of the self-sustaining force of signalling conventions. Above, we have seen that perhaps an (unconscious) mechanism like habit is an at least as natural reason for a linguistic convention to remain what it is. In this section we will show that by adopting an evolutionary stance towards language, such a simpler mechanism might be enough for linguistic conventions to be stable. Our problem, i.e. which signalling conventions are self-sustaining, now turns into a problem of which ones are evolutionarily stable, i.e., resistant to variation/mutation.

In Section 3.1 we have thought of a sender-receiver strategy pair $\langle S, R\rangle$ as a signalling convention to resolve a recurrent coordination problem to communicate information. We assumed that all that matters for all players was successful communication and that the preferences of the agents are completely aligned. A simple way to assure this is to assume that $A = T$ and that all players have the following utility function:

$$U(t, R(S(t))) = \begin{cases} 1 & \text{if } R(S(t)) = t, \\ 0 & \text{otherwise.} \end{cases}$$

Implicitly, we still assumed that individuals have fixed roles in coordination situations: they are always either a sender or a receiver. In this sense it is an asymmetric game. It is natural, however, to give up this assumption and turn it into a symmetric game: we postulate that individuals can take both the sender- and the receiver-role. Now we might think of a pair like $\langle S, R\rangle$ as a language. We abbreviate the pair $\langle S_i, R_i\rangle$ by $L_i$ and take $U_s(t, L_i, L_j) = U(t, R_j(S_i(t)))$ and $U_r(t, L_i, L_j) = U(t, R_i(S_j(t)))$. Consider now the symmetric strategic game in which each player can choose between finitely many languages. On the assumption that individuals take both the sender and the receiver role half of the time, the following utility function, $U(L_i, L_j)$, is natural for an agent with strategy $L_i$ who plays against an agent using $L_j$ (where $EU_i(L, L')$ denotes the expected utility for $i$ to play language $L$ if the other participant plays $L'$, i.e. $\sum_t P(t) \times U_i(t, L, L')$):

$$U(L_i, L_j) = \Big[\tfrac{1}{2} \times \Big(\sum_t P(t) \times U_s(t, L_i, L_j)\Big)\Big] + \Big[\tfrac{1}{2} \times \Big(\sum_t P(t) \times U_r(t, L_i, L_j)\Big)\Big] = \tfrac{1}{2} \times \big(EU_s(L_i, L_j) + EU_r(L_i, L_j)\big).$$

Now we say that $L_i$ is a (Nash) equilibrium of the language game iff $U(L_i, L_i) \geq U(L_i, L_j)$ for all languages $L_j$. It is straightforward to show that language $L_i$ is a (strict) equilibrium of the (symmetric) language game if and only if the strategy pair $\langle S_i, R_i\rangle$ is a (strict) equilibrium of the (asymmetric) signalling game.
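The symmetrized payoff can be computed directly for a small case. The following Python sketch assumes two equally likely states, two messages, and $A = T$, and uses illustrative names. Because the payoff of this game is symmetric in its two arguments, the equilibrium test below compares $U(L_i, L_i)$ with $U(L_j, L_i)$, which gives the same result as the condition stated above.

```python
from itertools import product

T = [0, 1]                 # states; actions are guesses of the state (A = T)
M = ["m", "m'"]
P = {0: 0.5, 1: 0.5}

# A language L = (S, R): S maps states to messages, R maps messages to states.
LANGUAGES = [(dict(zip(T, s)), dict(zip(M, r)))
             for s in product(M, repeat=2) for r in product(T, repeat=2)]


def success(S, R):
    """Expected utility when S is used for sending and R for receiving."""
    return sum(P[t] * (R[S[t]] == t) for t in T)


def U(Li, Lj):
    """Payoff of Li against Lj: half the time as sender, half as receiver."""
    (Si, Ri), (Sj, Rj) = Li, Lj
    return 0.5 * (success(Si, Rj) + success(Sj, Ri))


def is_nash(L):
    return all(U(L, L) >= U(other, L) for other in LANGUAGES)


def is_strict_nash(L):
    return all(U(L, L) > U(other, L) for other in LANGUAGES if other != L)


nash = [L for L in LANGUAGES if is_nash(L)]
strict = [L for L in LANGUAGES if is_strict_nash(L)]
```

Of the sixteen languages, six are equilibria of the language game and two are strict; the strict ones are exactly those whose sender and receiver parts communicate perfectly with each other.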


Under what circumstances is language $L$ evolutionarily stable? Thinking of strategies immediately as languages, standard evolutionary game theory (Maynard Smith 1982; Weibull 1995, and others) gives the following answer.16 Suppose that all individuals of a population use language $L$, except for a fraction $\epsilon$ of ‘mutants’ which have chosen language $L'$. Assuming random pairing of strategies, the expected utility, or fitness, of language $L_i \in \{L, L'\}$, $EU_\epsilon(L_i)$, is now:

$$EU_\epsilon(L_i) = (1 - \epsilon)\, U(L_i, L) + \epsilon\, U(L_i, L').$$

In order for mutation $L'$ to be driven out of the population, the expected utility of the mutant needs to be less than the expected utility of $L$, i.e., $EU_\epsilon(L) > EU_\epsilon(L')$. To capture the idea that mutation is extremely rare, we require that a language is evolutionarily stable if and only if there is a (small) number $n$ such that $EU_\epsilon(L) > EU_\epsilon(L')$ whenever $\epsilon < n$. Intuitively, the larger $n$ is, the ‘more stable’ is language $L$, since larger ‘mutations’ are resisted.17 As is well-known (Maynard Smith 1982), this definition comes down to Maynard Smith and Price’s (1973) concept of an evolutionarily stable strategy (ESS) for our language game.

DEFINITION 1 (Evolutionarily Stable Strategy, ESS). Language $L$ is Evolutionarily Stable in the language game with respect to mutations if
1. $\langle L, L\rangle$ is a Nash equilibrium, and
2. $U(L', L') < U(L, L')$ for every best response $L'$ to $L$ for which $L' \neq L$.

We see that $\langle L, L\rangle$ can be a Nash equilibrium without $L$ being evolutionarily stable (see Tuyls et al. (this volume) for more discussion). This means that the standard equilibrium concept in evolutionary game theory is a refinement of its counterpart in standard game theory (see Tuyls et al. (this volume) for more on the relation between the different equilibrium concepts). As it turns out, this refinement gives us a way, alternative to that of Lewis (1969), to characterize the Nash equilibria that are good candidates for being a convention.
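Definition 1 can be tested mechanically on a small language game. The Python sketch below is only an illustration under stated assumptions: two equally likely states, two messages, $A = T$, and names chosen here rather than taken from the paper.

```python
from itertools import product

T = [0, 1]                 # states; the receiver's action is a guess of the state
M = ["m", "m'"]

LANGUAGES = [(dict(zip(T, s)), dict(zip(M, r)))
             for s in product(M, repeat=2) for r in product(T, repeat=2)]


def success(S, R):
    """Probability that the receiver recovers the state (equally likely states)."""
    return sum(0.5 * (R[S[t]] == t) for t in T)


def U(Li, Lj):
    """Payoff of Li against Lj, averaging over the sender and receiver roles."""
    (Si, Ri), (Sj, Rj) = Li, Lj
    return 0.5 * (success(Si, Rj) + success(Sj, Ri))


def is_ess(L):
    # Condition 1: no language does strictly better against L than L itself.
    if any(U(mutant, L) > U(L, L) for mutant in LANGUAGES):
        return False
    # Condition 2: every other best response to L does worse against itself
    # than L does against it.
    for mutant in LANGUAGES:
        if mutant != L and U(mutant, L) == U(L, L):
            if not U(mutant, mutant) < U(L, mutant):
                return False
    return True


ess = [L for L in LANGUAGES if is_ess(L)]
```

In this small game `ess` contains two languages, both with a one-to-one sender strategy and a receiver strategy that inverts it. A pooling language fails condition 2: a perfectly communicating mutant is a best response to it and earns more against itself than the pooling language earns against the mutant.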
In an interesting article, Wärneryd (1993) proves the following result: for any sender-receiver game of the kind introduced above, with the same number of signals as states and actions, a language $\langle S, R\rangle$ is evolutionarily stable if and only if it is a (fully) separating Nash equilibrium.18 In fact, this result follows immediately from more general game theoretical considerations. First, it follows already directly from the definition above that being a strict Nash equilibrium is a sufficient condition for being an ESS. Given that in our asymmetric cooperative signalling games the separating equilibria are the strict ones, a general result due to Selten (1980) – which states that in asymmetric games all and only the strict equilibria are ESS – shows that this is also a necessary condition. Thus we have the following

FACT 1 (Wärneryd (and Selten)). In a pure coordination language game, $L$ is an ESS if and only if $\langle L, L\rangle$ is a separating Nash equilibrium.

In this way Wärneryd (and Selten) has given an appealing explanation of why Lewisean signalling systems are self-sustaining without making use of a strong assumption of rationality or (common) knowledge. But this is not enough for the evolutionary stance to be a real alternative to Lewis’s approach towards conventions. It should also be able to solve the equilibrium selection problem. Which of the potential candidates is actually selected as the convention? As it turns out, this problem also has an appealing evolutionary solution, if we take into account the dynamic process by which such stable states can be reached.

Taylor and Jonker (1978) defined their replicator dynamics to provide a continuous dynamics for evolutionary game theory. It tells us how the distribution of strategies playing against each other changes over time.19 A dynamic equilibrium is a fixed point of the dynamics under consideration. A dynamic equilibrium is said to be asymptotically stable if (intuitively) a solution path where a small fraction of the population starts playing a mutant strategy still converges to the stable point (for more discussion, see Tuyls et al. (this volume) and references therein). Asymptotic stability is a refinement of the Nash equilibrium concept, and one that is closely related to the concept of ESS. Taylor and Jonker (1978) show that every ESS is asymptotically stable. Although in general it is not the case that all asymptotically stable strategies are ESS, on our assumption that a language game is a cooperative game (and thus doubly symmetric)20 this is the case.
Thus, we have the following

FACT 2. A language $L$ is an ESS in our language game if and only if it is asymptotically stable in the replicator dynamics.

The ‘proof’ of this fact follows immediately from some important more general results provided by Weibull (1995, section 3.6). First, he shows that Fisher’s (1930) so-called fundamental theorem of natural selection – according to which evolutionary selection induces a monotonic increase over time in the average population fitness – applies to all doubly symmetric games. This means that in such games the dynamic process will always result in a ‘local maximum’ or ‘locally efficient’ strategy.21 From this it follows that in such games being a locally efficient strategy – which is itself already equivalent to being an ESS – is equivalent to asymptotic stability in the replicator dynamics.22

Fact 2 shows that a separating Nash equilibrium – i.e., a signalling equilibrium that according to Lewis is a potential linguistic convention – will evolve in our evolutionary language games (almost) by necessity.23 The particular one that will evolve depends solely on the initial distribution of states and strategies (languages). With Skyrms (1996) we can conclude that if the evolution of linguistic conventions proceeds as in replicator dynamics, there is no need to make use of the psychological notion of salience to explain selection of conventional equilibria.
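How the initial distribution selects the convention can be illustrated with a small simulation. The Python sketch below is not the continuous replicator dynamics of Taylor and Jonker: it swaps in a discrete-time update in which each language’s share is rescaled by its fitness relative to the population average. The two-state, two-message game with equally likely states, the starting distribution, and all names are assumptions made for illustration.

```python
from itertools import product

T = [0, 1]
M = ["m", "m'"]
LANGUAGES = [(dict(zip(T, s)), dict(zip(M, r)))
             for s in product(M, repeat=2) for r in product(T, repeat=2)]


def success(S, R):
    return sum(0.5 * (R[S[t]] == t) for t in T)


def U(Li, Lj):
    (Si, Ri), (Sj, Rj) = Li, Lj
    return 0.5 * (success(Si, Rj) + success(Sj, Ri))


PAYOFF = [[U(Li, Lj) for Lj in LANGUAGES] for Li in LANGUAGES]
# Indices of the languages that communicate perfectly with themselves.
SEPARATING = [i for i, (S, R) in enumerate(LANGUAGES) if success(S, R) == 1.0]


def step(x):
    """One discrete replicator update; returns the new shares and the old mean fitness."""
    n = len(x)
    fitness = [sum(PAYOFF[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean = sum(xi * fi for xi, fi in zip(x, fitness))
    return [xi * fi / mean for xi, fi in zip(x, fitness)], mean


def evolve(x, generations=300):
    history = []
    for _ in range(generations):
        x, mean = step(x)
        history.append(mean)
    return x, history


# Start with a modest head start for one of the two signalling systems.
favoured = SEPARATING[0]
start = [0.8 / 15] * len(LANGUAGES)
start[favoured] = 0.2
final, mean_fitness = evolve(start)
```

In this run the favoured signalling system takes over almost the whole population and the average fitness never decreases along the way. Giving the head start to `SEPARATING[1]` instead selects the other signalling system, with no appeal to salience.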

5. RELIABILITY AND COSTLY SIGNALLING

Until now we have assumed that conventional languages are used only when the preferences of the agents involved are aligned. But, of course, we use natural language also if this pre-condition is (known) not (to be) met. As we have seen in Section 2.2, however, in that case the sender might have an incentive to lie and/or mislead and the receiver has no incentive to trust what the sender claims. But even in these situations, agents – human or animal – sometimes send messages to each other, even if the preferences are less harmoniously aligned.24 Why would they do that? In particular, how could it be that natural language could be used for cooperative honest communication even in these unfavourable circumstances? Perhaps the first answer that comes to mind involves reputation and an element of reciprocity. These notions are standardly captured in terms of the theory of repeated games (Axelrod and Hamilton, 1981).25 The standard answer to our problem of how communication can take place if the preferences are not perfectly aligned, both in economics (starting with Spence (1973)) and in biology (Zahavi 1975; Grafen 1990; Hurd 1995), does not make use of such repeated games. Instead, it is assumed that reliable communication is also possible in these circumstances, if we assume that signals can be too costly to fake.26 The utility function of the sender no longer takes into account only the benefit of the receiver’s action for a particular type of sender, but also the cost of sending the message. The aim of this section is to show that this standard solution in biology and economics can, in fact, be thought of as being very close to our intuitive solution involving reputation.


We will assume that the sender’s utility function $U_s$ can be decomposed into a benefit function, $B_s$, and a cost function, $C$. Consider now a two-type two-action game with the following benefit table, in which each cell lists the sender’s benefit followed by the receiver’s:

two-type, two-action:

          aH      aL
  tH      1, 1    0, 0
  tL      1, 0    0, 1

In this game, the informed player (the sender) prefers, irrespective of her type, that the column player chooses $a_H$, while the column player wants to play $a_H$ if and only if the sender is of type $t_H$. For a separating equilibrium to exist, individuals of type $t_L$ must not benefit by adopting the signal typical of individuals of type $t_H$, even if they would elicit a more favorable response by doing so. Hurd (1995) shows that when we assume that the cost of sending a message can depend on the sender’s type, an appealing separating equilibrium exists. Assume that the cost of message $m$, saying that the sender is of type $t_H$, is denoted by $C(t_i, m)$ for individuals of type $i$ and that sending the empty message $\emptyset$ is costless for both types of individuals. Provided that $C(t_L, m) > 1 > C(t_H, m)$, the cost of sending $m$ will outweigh the benefit of its production for individuals of type $t_L$, but not for individuals of type $t_H$, so that the following separating equilibrium exists: individuals of type $t_H$ send message $m$, while individuals of type $t_L$ send $\emptyset$. Notice that on Hurd’s characterization, in the equilibrium play of the game it is possible that not only $t_L$ sends a costless message, but that the high type individual $t_H$ does so as well!27 This suggests that the theory of costly signalling can be used to account for honest communication between humans who make use of a conventional language with cost-free messages. Moreover, an evolutionary argument shows that Hurd’s characterization with cost-free messages sent in equilibrium is actually the most plausible one.28 The only thing that really matters is that the cost of sending a deceiving message is higher than its potential benefit (so that such messages are sent only by individuals who deviate from equilibrium play). How can we guarantee this to be possible? In the example discussed in this section, as in the examples discussed in the economic and biological literature, it is advantageous to pretend to be better than one actually is.
This is crucially based on the assumption that messages are not (immediately) verifiable. This assumption opens the possibility that low-quality individuals could try to masquerade as being of high quality. And this assumption makes sense: if all messages could immediately be verified, the game being played would be one of complete information, in which it makes no sense to send messages about one’s type (i.e. private information) at all. However, the assumption that messages are
completely unverifiable is for many applications unnatural as well: an individual can sometimes be unmasked as a liar, and she can be punished for it. Thus, making a statement can be costly: one can be punished (perhaps in terms of reputation) when one has claimed to be better than one actually is.29,30 If this punishment is severe enough, even a small probability of getting unmasked can already provide a strong enough incentive not to lie.31 The analysis of truthful human communication sketched above suggests that although natural language expressions are cheap to produce, the theory of costly signalling can still be used to account for communicative behavior between humans. With Lachmann et al. (manuscript) I take this to be an important insight: it suggests a way to overcome the limitations of both cheap talk signalling and the adoption of the cooperative assumption by Grice and Lewis. By assuming that sending signals can be costly, we can account for successful communication even if the preferences of the agents involved do not seem to be well aligned. Perhaps the most appealing way to think of Hurd’s result is that it explains why the agents’ preferences are aligned in more situations than it appears at first sight, so that the possibility of communication is the rule rather than the exception.32
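The separating-equilibrium condition C(tL, m) > 1 > C(tH, m) discussed above can be checked mechanically. The following is a minimal sketch, not part of Hurd’s analysis: the function name, the label "e" for the costless message, and the dictionary encoding of the benefit table are assumptions made for illustration only.

```python
TYPES = ("tH", "tL")
ACTIONS = ("aH", "aL")

# Benefit table from the text: (sender benefit, receiver benefit).
BENEFIT = {
    ("tH", "aH"): (1, 1), ("tH", "aL"): (0, 0),
    ("tL", "aH"): (1, 0), ("tL", "aL"): (0, 1),
}


def is_separating_equilibrium(cost_m):
    """Check the profile: tH sends "m", tL sends the costless message "e",
    and the receiver answers "m" with aH and "e" with aL.

    cost_m maps each type to its cost C(t, m); the message "e" is free
    (the label "e" is a hypothetical name for the costless message).
    """
    send = {"tH": "m", "tL": "e"}
    respond = {"m": "aH", "e": "aL"}

    def sender_utility(sender_type, message):
        cost = cost_m[sender_type] if message == "m" else 0
        return BENEFIT[(sender_type, respond[message])][0] - cost

    # No sender type may gain strictly by switching to the other message.
    for sender_type in TYPES:
        other = "e" if send[sender_type] == "m" else "m"
        if sender_utility(sender_type, other) > sender_utility(sender_type, send[sender_type]):
            return False

    # Since each message reveals the type, the receiver's response must be
    # a best reply to that type.
    for sender_type in TYPES:
        chosen = respond[send[sender_type]]
        if any(BENEFIT[(sender_type, a)][1] > BENEFIT[(sender_type, chosen)][1]
               for a in ACTIONS):
            return False
    return True
```

For instance, `is_separating_equilibrium({"tH": 0.5, "tL": 1.5})` satisfies the condition, whereas lowering the cost for tL below 1 lets the low type profit from imitating the high type.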

6. THE EFFICIENT USE OF LANGUAGE

Until now we have discussed how an expression m of the language used could come to have (and maintain) its conventional meaning [[m]]. This does not mean, however, that if a speaker uses m she just wants to inform the receiver that [[m]] is the case. It is well established that a speaker normally wants to communicate more by the use of a sentence than just its conventional meaning. Sometimes this is the case because the conventional meaning of an expression underspecifies its actual truth-conditional interpretation; at other times the speaker implicates more by the use of a sentence than its truth-conditional conventional meaning. It is standard to assume that both ways of enriching conventional meaning are possible because we assume that the speaker conforms to Grice’s (1967) maxims of conversation: she speaks the truth (quality), the whole truth (quantity), though only the relevant part of it (relevance), and does so in a clear and efficient way (manner). Grice argued that because the speakers are taken to obey these maxims, a sentence can give rise to conversational implicatures: things that can be inferred from an utterance that are not conditions for the truth of the utterance. Above, we already discussed the maxim of quality, which has a somewhat special status. Grice argues that the implicatures generated by the other maxims come in two sorts: particularized ones, where the implicature is generated by features of the context; and generalized ones, where (loosely speaking) implicatures are seen as default rules possibly overridden by contextual features. There is general agreement that both kinds of implicatures exist, but the classification of the various implicatures remains controversial within pragmatics. Whereas relevance theorists (Sperber and Wilson, 1986) tend to think that implicatures depend predominantly on features of the particular context, Levinson (2000), for example, takes generalized implicatures to be the rule rather than the exception. Similar controversies can be observed on the issue of how to resolve underspecified meanings: whereas Parikh (1991, 2001) argues optimistically that indeterminacy in natural language can be solved easily in many cases through the existence of a unique (Pareto-Nash) solution of the coordination problem of how to resolve the underspecification, proponents of centering theory (Grosz et al. 1995), for example, argue that pronoun resolution is, or needs to be, governed by structural (default) rules. Except for the maxim of quality, Horn (1984), Levinson (2000), and others argue that the Gricean maxims can be reduced to two general principles: the I-principle, which tells the hearer to interpret a sentence in its most likely or stereotypical way, and the Q-principle, which requires the speaker to give as much (relevant) information as possible. In this section it will be argued that two general pragmatic rules which closely correspond with these two principles can be given an evolutionary motivation, which suggests that ‘on the spot’ reasoning need not play the overloaded role in natural language interpretation that is sometimes assumed.

6.1. Iconicity in Natural Languages

In Section 3.1 we have seen that Lewis (1969) proposes to explain the semantic/conventional meaning of expressions in terms of separating equilibria of signalling games. However, we also saw that simple costless signalling games have many such equilibria. Lewis assumed that each of these equilibria is equally good and thus that it is completely arbitrary which one will be chosen as a convention. In Section 4 we have seen that all separating equilibria satisfy the ESS condition and that which one will in the end emerge is a matter of chance and depends only on the initial distribution of states and strategies (languages). Although natural at the level of individual words and the objects they refer to, at a higher organizational level the assumption of pure arbitrariness or chance can hardly be sustained. It cannot explain why conventions that enhance efficient communication are more likely than others that do not.

Consider a typical case of communication where two meanings t1 and t2 can be expressed by two linguistic messages m1 and m2. We have here a case of underspecification: the same message can receive two different interpretations. In principle this gives rise to two possible codings: {⟨t1, m1⟩, ⟨t2, m2⟩} and {⟨t1, m2⟩, ⟨t2, m1⟩}. In many communicative situations, however, the underspecification does not really exist, and is resolved due to the general pragmatic principle, referred to as the pragmatic iconicity principle, that a lighter (heavier) form will be interpreted by a more (less) salient, or stereotypical, meaning:
(i) It is a general defeasible principle, for instance, in centering theory (Grosz et al., 1995) that if a certain object/expression is referred to by a pronoun, another more salient object/expression should be referred to by a pronoun too;
(ii) Levinson (2000) seeks to reduce Chomsky’s B and C principles of the binding theory to pragmatic maxims. In particular, disjoint reference of lexical noun phrases throughout the sentence is explained by pointing to the possibility of the use of a lighter expression, viz. an anaphor or pronoun;
(iii) The preference for stereotypical interpretations (Atlas and Levinson, 1981);
(iv) and perhaps most obviously, Horn’s (1984) division of pragmatic labor, according to which an (un)marked expression (morphologically complex and less lexicalized) typically gets an (un)marked meaning (cf. John made the car stop versus John stopped the car).
Horn (1984), Levinson (2000), Parikh (1991, 2001) and Blutner (2000) correctly suggest that because this generalized pragmatic iconicity principle allows us to use language in an efficient way, it is not an arbitrary convention among language users. There is no alternative rule which would do equally well for the same class of interactions if people generally conformed to this alternative.
This can be seen most simply if we think of languages that are separating equilibria in our language game as coding systems of meanings distributed with respect to a particular probability function.33 This suggests that the rule should follow from more general economic principles. Indeed, Parikh gives a game theoretical analysis of why this principle of iconicity is observed. However, he treats it as a particularized conversational implicature. Here we want to argue that it should rather be seen as a generalized default rule.34

6.1.1. Underspecification and Pragmatic Interpretation Rules

In Section 2.2 we saw that in cheap talk games meaningful communication is possible only in so far as the preferences of the participants coincide. But in Section 5 we showed that by making use of costly messages we can overcome this limitation. It is standardly assumed that this is the only reason why costs of messages are taken into account: to turn games in
which the preferences are not aligned to ones where they are. We have suggested that in this way we can account for Grice’s maxim of quality. In this section we will see, however, that costly messages can also serve another purpose and give a motivation for the pragmatic iconicity principle. To be able to do so, we should allow for underspecification or context dependence. In different contexts, the same message can receive a different interpretation.35 In our description of signalling games so far it is not really possible to represent a conventional language with underspecified meanings that are resolved by context. The best thing we could do is to represent underspecification as real ambiguity: sender strategy S is a function that assigns the same message to different states, while receiver strategy R is a mixed strategy assigning to certain messages a non-trivial probability distribution over the states. Such a sender-receiver strategy combination can never be evolutionarily stable (Wärneryd, 1993): one can show that a group of individuals using a mutant language without such ambiguity has no problem invading and taking over a population of ambiguous language users (if there exists an unused message). To account for underspecification, we have to enrich our models and take contexts into account. For the purpose of this section we can think of a context as a probability distribution over the state space T. For simplicity (but without loss of generality) we assume that T = {t, t′} and M = {m, m′}. Communication takes place in two kinds of contexts: in one context where P(t) = 0.9 (and thus P(t′) = 0.1) and in one where P(t) = 0.1. We assume that both contexts are equally likely. Call the first context ρ1 and the second ρ2. We will assume that it is common knowledge among the conversational partners in which context they are, but only the sender knows in each context in which state she is.
The messages do not have a pre-existing meaning, but differ in terms of (production) costs: we assume that C(m) < C(m′). However, we assume that also for the sender it is always better to have successful communication with a costly message than unsuccessful communication with a cheap message. Thus, in contrast to Section 5, we assume that the cost of sending a message can never exceed the benefit of communication. To assure this, we will take the sender’s utility function to be decomposable into a benefit and a cost function, Us(ti, mj, tk) = Bs(ti, tk) − C(mj), with C(m) = 0 and C(m′) = 1/3, and adopt the following benefit function:

    Bs(ti, tk) = 1, if tk = ti
               = 0, otherwise.

A sender strategy is now a function mapping a state and a context to a message, while a receiver strategy is now a function from message-context pairs to states. This game has of course two separating equilibria, call them L and L′, with no underspecification: L gives rise to the mapping {⟨t, m⟩, ⟨t′, m′⟩} in both contexts, while L′ gives rise to {⟨t, m′⟩, ⟨t′, m⟩}. These languages are also evolutionarily stable in the sense of being an ESS. However, our new game also allows for languages with underspecification to be evolutionarily stable. Given that it is common knowledge between sender and receiver in which context they are, the only requirement is that they give rise to a separating equilibrium in each context. We can distinguish two such underspecified languages: (i) the Horn language LH with the mappings {⟨t, m⟩, ⟨t′, m′⟩} and {⟨t, m′⟩, ⟨t′, m⟩} in contexts ρ1 and ρ2, respectively; and (ii) the anti-Horn language LAH, where the two mappings are used in the opposite contexts. It is easily seen that both languages are evolutionarily stable, because ⟨LH, LH⟩ and ⟨LAH, LAH⟩ are also strict Nash equilibria. Both languages do better against themselves than against the other, or against L or L′. Our above discussion shows that underspecification is possible. However, we want to explain something more: why is underspecification useful, and why is the underspecified Horn language LH, which incorporates the pragmatic iconicity principle, more likely to emerge than the underspecified anti-Horn language LAH? As it turns out, the problem we encountered is a well-known one in game theory: how to select among a number of strict Nash equilibria the one that has the highest expected utility, i.e., is (in our games) Pareto optimal? In our language game described above we had four strict equilibria: ⟨L, L⟩, ⟨L′, L′⟩, ⟨LH, LH⟩, and ⟨LAH, LAH⟩. These equilibria correspond to our four evolutionarily stable languages L, L′, LH, and LAH, respectively.
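The expected utility of each of the four languages in self-interaction can be worked out directly from the parameters given above (two equally likely contexts with P(t) = 0.9 and P(t) = 0.1, benefit 1 for successful communication, C(m) = 0 and C(m′) = 1/3). The following is a minimal sketch; the dictionary encoding and all identifier names are assumptions made for illustration, and the sender’s utility is used as the measure of a language’s payoff.

```python
from fractions import Fraction

# Contexts: probability of state t (state t' has the complementary probability).
CONTEXTS = {"rho1": Fraction(9, 10), "rho2": Fraction(1, 10)}
CONTEXT_WEIGHT = Fraction(1, 2)          # both contexts are equally likely
COST = {"m": Fraction(0), "m'": Fraction(1, 3)}

PLAIN = {"t": "m", "t'": "m'"}           # cheap message used for t
SWAPPED = {"t": "m'", "t'": "m"}         # cheap message used for t'

LANGUAGES = {
    "L":   {"rho1": PLAIN,   "rho2": PLAIN},
    "L'":  {"rho1": SWAPPED, "rho2": SWAPPED},
    "LH":  {"rho1": PLAIN,   "rho2": SWAPPED},   # Horn language
    "LAH": {"rho1": SWAPPED, "rho2": PLAIN},     # anti-Horn language
}


def expected_utility(language):
    """Expected sender utility when both partners use `language`.

    Every mapping is separating, so the benefit is always 1 and only the
    expected message cost differs between the languages.
    """
    total = Fraction(0)
    for context, p_t in CONTEXTS.items():
        probabilities = {"t": p_t, "t'": 1 - p_t}
        for state, probability in probabilities.items():
            message = language[context][state]
            total += CONTEXT_WEIGHT * probability * (1 - COST[message])
    return total


utilities = {name: expected_utility(lang) for name, lang in LANGUAGES.items()}
best = max(utilities, key=utilities.get)
```

Under these assumptions the values come out as 5/6 for L and L′, 29/30 for LH, and 7/10 for LAH: the Horn language pays the cost of m′ only in the rare state of each context.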
A simple calculation shows that the Horn language is the one with the highest expected utility.36 Thus, if we can find a natural explanation of why our evolutionary dynamics tends to select such optimal equilibria, we have provided a naturalistic explanation for why languages (i) make use of underspecification, and (ii) respect the iconicity principle.

6.1.2. Correlated and Stochastic Evolution

In van Rooy (in press), two (relatively) standard explanations of why Pareto optimal solutions (in coordination games) tend to evolve are discussed. According to both, we should give up an assumption behind the stability concepts used so far. According to the first explanation (Skyrms 1994, 1996) we give up the assumption that individuals pair randomly with other individuals in
the population. Random pairing is assumed in the calculation of the expected utility of a language. The probability with which individuals using language Li interact with individuals using Lj depends simply on the proportion of individuals using the latter language: EU(Li) = Σj P(Lj) × U(Li, Lj). This expected utility was used both to determine the ESS concept and to state the replicator dynamics. By giving up random pairing (a well-known strategy taken in biology to account for kin-selection, and in cultural evolution to account for clustering), we have to postulate the existence of an additional function which determines the likelihood that an individual playing Li encounters an individual playing Lj, π(Lj/Li), such that Σj π(Lj/Li) = 1. What counts then is the following expected utility: EUπ(Li) = Σj π(Lj/Li) × U(Li, Lj). The other definitions used in the dynamic system follow the standard definitions in replicator dynamics. Although this generalization seems to be minor, it can have significant effects on the resulting stable states. Assume a form of correlation: a tendency of individuals to interact more with other individuals playing the same strategy (i.e., using the same language). Formally, positive correlation comes down to the condition that for any language Li, π(Li/Li) > P(Li) (Skyrms 1994). In the extreme case, i.e. π(Li/Li) = 1, the only stable state in the replicator dynamics is the one which has the highest expected utility in self-interaction. In our case the Pareto optimal language LH is selected and we have an evolutionary explanation for the existence of underspecification and the use of iconicity.37 For our evolutionary language game there is another, and perhaps more natural, possibility to ensure the emergence of Pareto efficient languages.
It is to give up the assumption that the transition from one generation to the next in the dynamic model is completely determined by the distribution of strategies played in a population and their expected utilities. We can assume that the transition is (mildly) stochastic in nature.38 As shown by Kandori, Mailath and Rob (1993) and Young (1993), this results in the selection of the so-called risk-dominant strict Nash equilibria in the (very) long term.39,40 In general, a risk-dominant equilibrium need not be Pareto efﬁcient, but in cooperative games the two concepts coincide. A natural way to allow for stochastic adjustment in our evolutionary language game is to give up the assumption that an individual simply adopts the strategy from its parent with probability 1. Giving up this assumption makes sense: the inheritance of language is imperfect, possibly due to non-optimal learning.41 It is not obvious which of the proposals to motivate the attraction of Pareto optimal solutions is more plausible to assume for natural languages.

In fact, both factors (clustering and innovation) seem to play a role in the evolution and change of languages.
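The effect of full correlation described above can be illustrated numerically. The sketch below rests on several assumptions that are not in the text: it uses a discrete-time version of the replicator dynamics, starts from a uniform population, and takes the self-interaction utilities U(Li, Li) to be the values that follow from the cost structure of Section 6.1.1 (benefit 1, C(m) = 0, C(m′) = 1/3). In the extreme case π(Li/Li) = 1, EUπ(Li) reduces to U(Li, Li), so no cross-language utilities are needed.

```python
# Self-interaction utilities U(Li, Li), derived from the parameters of
# Section 6.1.1; under full correlation these are the only payoffs that matter.
SELF_UTILITY = {
    "L": 5 / 6,
    "L'": 5 / 6,
    "LH": 29 / 30,
    "LAH": 7 / 10,
}


def replicator_step(shares, fitness):
    """One discrete-time replicator update (an assumed discretisation):
    each share grows in proportion to its fitness relative to the average."""
    mean = sum(shares[name] * fitness[name] for name in shares)
    return {name: shares[name] * fitness[name] / mean for name in shares}


def evolve_fully_correlated(shares, steps):
    """Iterate the dynamics for pi(Li/Li) = 1, where EU_pi(Li) = U(Li, Li)."""
    for _ in range(steps):
        shares = replicator_step(shares, SELF_UTILITY)
    return shares


start = {name: 0.25 for name in SELF_UTILITY}   # hypothetical initial population
final = evolve_fully_correlated(start, steps=200)
```

Starting from equal shares, the share of LH grows at every step because its self-interaction utility exceeds the population average, and the other languages die out; this matches the claim that the language with the highest expected utility in self-interaction is selected.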

6.2. Exhaustive Interpretation

According to Grice’s (1967) maxim of Quantity, or Horn’s (1984) and Levinson’s (2000) Q-principle, speakers should give as much (relevant) information as possible. In this section we will give a motivation for that, and suggest a way to explain the convention according to which the effort required for this optimal transfer of information is divided between speaker and hearer such that the speaker does not have to be fully explicit and the receiver interprets answers exhaustively. Let us assume that the relevant information is the information needed for the receiver to resolve his decision problem. Suppose the receiver r is commonly known to be a Bayesian utility maximizer and he asks question Q because this question ‘corresponds’ closely to his decision problem. What is then the best answer-strategy S to give relevant information? Let us look at some candidates. According to strategy S1, the speaker gives as much relevant information as she can. Assume that K(t) denotes the set of states that the sender thinks are possible in t. Assume, furthermore, that question Q gives rise to (or means) a partition of T: Q. Thus, each element q of Q is also a set of states. We will refer both to the interrogative sentence and to the induced partition as a question. Then we define Q_K(t) to be {q ∈ Q | q ∩ K(t) ≠ ∅}, i.e. the elements of Q which the sender takes to be possible. Then S1 is the strategy that gives in every state t the following proposition: ⋃Q_K(t), i.e., the union of the elements of Q_K(t). Suppose, for instance, that Q denotes the question corresponding to ‘Who came?’, i.e., ‘Which individuals have property P?’, and that K(t) = {t, t′} such that in t (only) John came, and in t′, John and Mary. The message that then expresses proposition ⋃Q_K(t) is ‘John, perhaps Mary, and nobody else’.
If we assume that the answerer knows exactly who came, i.e., is known to be fully competent about the question-predicate P, the proposition expressed is λt′[P(t′) = P(t)]. According to strategy S2, the speaker gives the set of individuals of whom she is certain that they satisfy the question-predicate. Thus, for the question Who has property P?, the answer is going to be λt′[K_P(t) ⊆ P(t′)], where K_P(t) is the set {d ∈ D | K(t) |= P(d)} and K(t) |= P(d) iff ∀t′ ∈ K(t): d ∈ P(t′). In the example discussed for strategy S1, the answerer would now use a message like ‘(At least) John came’. Notice that if the answerer is known to be competent about the extension of P, the answer reduces to λt′[P(t) ⊆ P(t′)].
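The two answer-strategies can be made concrete on the ‘Who came?’ example. The following is a minimal sketch under stated assumptions: states are identified with the extension of P (the set of people who came), a third individual, Sue, is added hypothetically so that the two answers differ, and the function names are illustrative only.

```python
from itertools import combinations

D = ("John", "Mary", "Sue")   # hypothetical domain; Sue is not in the text

# A state is identified with the extension of P in it: the set of people who came.
STATES = [frozenset(c) for n in range(len(D) + 1) for c in combinations(D, n)]


def answer_s1(K):
    """S1: the union of the cells of the partition Q that are compatible with K.

    For 'Who came?' each cell of Q fixes the extension of P. With states
    identified with extensions, every cell is a singleton, so the union of
    the compatible cells coincides with K itself.
    """
    cells = [frozenset([state]) for state in STATES]
    compatible = [cell for cell in cells if cell & K]
    return frozenset().union(*compatible)


def answer_s2(K):
    """S2: the states whose extension contains everyone known to have P."""
    known = frozenset(d for d in D if all(d in state for state in K))
    return frozenset(state for state in STATES if known <= state)


john = frozenset({"John"})
john_and_mary = frozenset({"John", "Mary"})
K = frozenset({john, john_and_mary})   # the sender cannot rule out that Mary came too

s1 = answer_s1(K)   # 'John, perhaps Mary, and nobody else'
s2 = answer_s2(K)   # '(At least) John came'
```

Here `s1` contains exactly the two states in K, while `s2` contains every state in which John came, so the S1 answer is strictly more informative than the S2 answer.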

These answer-strategies closely correspond with some well-known analyses of questions in the semantic literature: on the assumption of competence, S1 gives rise to Groenendijk and Stokhof’s (1984) partition semantics: {S1(t) | t ∈ T} = {λt′[P(t′) = P(t)] | t ∈ T}, while S2 gives rise to {S2(t) | t ∈ T} = {λt′[P(t) ⊆ P(t′)] | t ∈ T} = {⋂{λt′[d ∈ P(t′)] | d ∈ P(t)} | t ∈ T}, which corresponds to Karttunen’s (1977) semantics for questions. In order to investigate which of S1, S2, or some other strategies can be part of a Nash equilibrium together with receiver strategy RB, which implements a Bayesian rational agent, we have to assume that, in equilibrium, the receiver knows S. Thus he is not going to update his belief with [[S(t)]], but rather with St = {t′ ∈ T | S(t′) = S(t)}. A speaker who uses S1 would give in each state at least as much (relevant) information as speakers using the alternative strategies. In particular, for each state t (where P has a non-empty extension), S1,t ⊆ S2,t. This is obvious for the propositions that would be given in state t on the assumption of full competence: S1: λt′[P(t′) = P(t)]; S2: λt′[P(t) ⊆ P(t′)], but the same is true if we do not make our assumption of competence. A well-known fact of decision theory (Blackwell 1953) states that an agent with an information structure, or possibility operator, K is able to make at least as good decisions as an agent with possibility operator K′ iff for each t ∈ T: K(t) ⊆ K′(t). But this means that if our questioner is Bayesian rational and is going to believe what the answerer tells him, S1 is the answer-strategy he prefers. On an assumption of perfect cooperation, this is also true for the answerer herself. We can conclude that if RB is the strategy adopted by a perfect Bayesian, ⟨S1, RB⟩ is the only Nash equilibrium of the game induced by question Q.
In fact, it is (on average) strictly better than S2 and other alternatives, which means that ⟨S1, RB⟩ is the only ESS on the same assumptions. Thus, in cooperative games it is optimal to obey the Gricean maxim to give as much relevant information as possible, i.e. give an answer with meaning ⋃Q_K(t). Suppose that the question under discussion, Q, is Who has property P?. How should the answer be coded? Given that it is only propositions involving question-predicate P that count, the optimal answer ⋃Q_K(t) given in state t equals the intersection of the following propositions (where D is the set of individuals, ◇P(d) means that the speaker thinks it is possible that d has property P, and P̄ denotes the complement of P):

    W = ⋂_{d∈D} {[[P(d)]] : K(t) |= P(d)};
    X = ⋂_{d∈D} {[[◇P(d)]] : K(t) ∩ [[P(d)]] ≠ ∅};
    Y = ⋂_{d∈D} {[[◇P̄(d)]] : K(t) ∩ [[P̄(d)]] ≠ ∅};
    Z = ⋂_{d∈D} {[[P̄(d)]] : K(t) |= P̄(d)}.

Now suppose that the answerer is mutually known to be fully competent about predicate P. That is, she knows of each individual whether it has property P or property P̄. In that case ⋃Q_K(t) = W ∩ Z, i.e., the proposition that states for each individual whether it has property P or property P̄. We have seen above that the combination ⟨S1, RB⟩ is an equilibrium strategy as far as information transfer is concerned. But it is a somewhat unfair equilibrium as well: the sender is (normally) required to give a very complex answer, while the receiver can sit back. As far as information transfer is concerned, however, our above reasoning does not force us to this equilibrium. If it is common knowledge between speaker and hearer that the latter will infer more from the use of a message than its conventional meaning, many other sender-receiver strategies give rise to the same optimal information transfer as well. For example, on our assumption of full competence again, the speaker does not have to explicitly state of each individual whether it satisfies question-predicate P or satisfies P̄. If it is mutually known between the two that the sender only mentions the positive instances, i.e., uses strategy S2, she can just express the above proposition W, leaving proposition Z for the hearer to infer. This would obviously be favorable to the sender, but is a natural consequence of evolution only if the extra effort transferred to the hearer is relatively small. This is possible if the task left to the hearer (i.e., to infer from a message with conventional meaning W the above proposition ⋃Q_K(t) = W ∩ Z) can be captured by a simple but still general interpretation mechanism. As it turns out, this is, in fact, the case. Assuming that the hearer receives message m as answer to the question Who has property P?, he can interpret it as follows (where t

WIEBE VAN DER HOEK

FOREWORD

1. Introduction

It was in 2002 that the idea arose that the time was right for a journal in the area of reasoning about Knowledge, Rationality and Action; a journal that would be a platform for those researchers who work on epistemic logic, belief revision, game and decision theory, rational agency, planning and theories of action. Although there are some prestigious conferences organised around these topics, it was felt that a journal in this area would have a lot of added value. Such a journal would typically be a platform for the kind of problems addressed by researchers from Computer Science, Game Theory, Artificial Intelligence, Philosophy, Knowledge Representation, Logic and Agents: problems that concern artificial systems that have to gather information, reason about it and then make a sensible decision about what to do next. It is for this reason that I am very happy that Knowledge, Rationality & Action (KRA) now exists as its own Section at Springer. Because of the clear and obvious links that the scope of KRA has with Philosophy, it was decided that KRA would be launched as a series within the journal Synthese. This book collects the first two issues of KRA. Its index shows that these first two issues indeed address its ‘core business’: all the chapters refer explicitly to knowledge, for instance, and rationality is represented by the many contributions that address games, or reasoning with or about strategies. Actions are present in many chapters in this book: whether they are epistemic programs, or choices by a coalition of agents, or moves in a game, or votes by the members of a jury. All in all, there is an emphasis on Information and a notion of Agency. What is furthermore striking is that almost all chapters study these concepts in a multi-agent perspective.
In no paper in this book do we have an isolated decision maker who only reasons about his own information and strategies; the decision maker is always placed in a context of other agents, with some implicit or explicit assumptions about the Interaction.

Finally, this volume demonstrates that ‘classical’ approaches to Information and Agency co-exist very well with more modern trends that show how Knowledge, Rationality and Action can achieve a broad and refreshing interpretation; there is a chapter on ‘classical AGM-like’ belief revision, but also two on a modern approach in the area of Dynamic Epistemic Logic. There is a chapter on von Neumann games, but also two that defend Evolutionary Game Theory, a branch of game theory that attempts to loosen the ‘classical assumptions’ about ‘hyper-rational players’ in games. There are chapters solely on logical theories, but also one that suggests how we can bridge the gap between symbolic and connectionist approaches to cognition. Finally, there is a chapter on voting that relativises Condorcet’s ‘classical’ Jury Theorem. In the next section I will briefly give some more details about the themes of this book. While this can be conceived as a ‘top-down’ description of the contents, in Section 3 I will give a more ‘bottom-up’ picture of the individual chapters.

2. Themes in this Book

The first two chapters (Logics for Epistemic Programs and A Counterexample to Six Fundamental Principles of Belief Formation), as well as the fourth (A Characterization of Von Neumann Games) and the last two (A SAT-Based Approach to Unbounded Model Checking for Alternating-Time Temporal Epistemic Logic (Chapter 9) and Update Semantics of Security Protocols) explicitly deal with information change. It is also a subject of study in part of Chapter 3, Comparing Semantics of Logics for Multi-Agent Systems. Whereas the first chapter, in order to develop a Dynamic Epistemic Logic, uses a powerful object language incorporating knowledge, common knowledge and belief, the second chapter formulates and analyses six simple postulates, in a meta- and semi-formal language. A main difference between the two approaches is that in Chapter 2, the theory has to explain how ‘expectations’ implicitly encoded in a belief set determine the next belief (when a conjunction has to be given up, say), whereas in the first chapter, this is not an issue: there, the effect of the learning is exhaustively described by an epistemic program. Moreover, in the first chapter, the ‘only’ facts prone to change are epistemic facts (for instance, one agent might learn that a second agent now does know whether a certain fact holds), whereas they are ‘objective’ in the second. Chapter 10 uses a mix of both kinds of information change: there, it can be both objective and ‘first order’. Chapters 3
and 9 analyse the dynamics of knowledge in the context of Alternating Transition Systems, where we have several agents that jointly determine the transition from one state to another. These two chapters do not have operators that explicitly refer to the change of knowledge: it is encoded in the epistemic and action relations in the model. Knowledge in extensive games is treated in a similar way, where it is represented in the information partition of the underlying game trees. Chapter 3 demonstrates that, indeed, there is a close relation between Alternating Transition Systems and Concurrent Game Structures, of which the latter are often conceived as a generalisation of game trees. Chapters 9 and 10 take a genuine Computer Science perspective on some of the issues analysed in Chapters 1 and 3. More in particular, both chapters address the problem of (automatic) verification of complex agent systems. The systems of study in Chapter 9 are the Alternating-time Transition Systems of Chapter 3, more precisely those which incorporate an epistemic component. The problem addressed in this chapter is that of model checking such systems: given a description of a transition system model, and a property expressed in an appropriate logic, can we automatically check whether that property is true in that model? Chapter 10’s aim is to develop verification methods for the epistemic program-type of action of Chapter 1, in the area of security protocols. Since the use of encryption keys in such protocols is to hide information in messages from specific agents, but make it available to others, Dynamic Epistemic Logic seems an appropriate tool here. Games, or at least game-like structures, are the object of study in Chapters 3, 4 and 5 (An Evolutionary Game Theoretic Perspective on Learning in Multi-Agent Systems). The first two address knowledge or belief in such games, whereas the emphasis of the fifth chapter is more on the lack of it.
Chapter 4 studies memory of past knowledge for players playing a game in extensive form. If a player knows ϕ now, is he guaranteed to always know that he ever knew ϕ? The chapter gives a necessary and sufficient condition for it, and shows that perfect recall on the part of the players is closely connected to the notion of a von Neumann game. The other two chapters (i.e., 3 and 5) have in common that they try to combine and relate different formalisms. Chapter 3 compares and relates several semantics for game-like logics, including Alternating-Time Temporal Logic (ATL) and Coalition Logic. Chapter 5 rather relates three disciplines: evolutionary game theory, reinforcement learning and multi-agent systems. The key trigger for this work is the insight that in many realistic multi-agent systems one has to replace the ‘hyper-rational’ agents assumed in ‘classical’ game theory by players
referred to as ‘boundedly rational’ agents, who only have partial information about the environment and the payoff tables, and who have to learn an optimal ‘policy’ by trial and error.

The sixth chapter questions exactly the same assumptions that classical game theory makes on ‘hyper-rational agents’ as are debated in Chapter 5. Chapter 6 addresses the question of how to explain which equilibria are chosen in signalling games, games that try to shed light on language use and language organisation. The chapter proposes to replace current explanations for such selection, which rely on strong assumptions about rationality and common knowledge (thereof) by the players, i.e., the language users, by explanations that are based on insights from evolutionary game theory.

Whereas Chapter 3 relates, on a technical level, several semantics for game-like logics, and Chapter 5 makes a case to combine three disciplines in order to study the dynamics of rationality in multi-agent systems, Chapter 7 (Nonmonotonic Inferences and Neural Networks) uses a semantics in one research paradigm, i.e., non-monotonic logic as a symbolic reasoning mechanism, to bridge a gap to another paradigm, i.e., to connectionist networks in the sub-symbolic paradigm. In doing so, the chapter is a step toward bridging the gap between symbolic and sub-symbolic modes of computation, thus addressing a long-standing issue in Philosophy of Mind.

Finally, Chapter 8 (A Model of Jury Decisions Where All Jurors Have the Same Evidence) addresses the issue of rational decision making in a group, or voting. It discusses a classical result that says that, in the scenario of majority voting, if every juror is competent, the probability that the group decision is correct converges to certainty as the group size increases.
Thus, this chapter also sits in the multi-agent context, but rather than accepting that the result of a joint action is given by some transition function, this chapter discusses the rationality of a specific way to merge specific actions, i.e., those of voting. Moreover, again, it appears that knowledge is crucial here, because the chapter proposes that, rather than assuming independence of the voters given the state of the world, we should conditionalize on the latest evidence.

3. Brief Description of the Chapters

In Logics for Epistemic Programs, by Alexandru Baltag and Lawrence S. Moss (Chapter 1), the authors take a general formal approach to changes of knowledge, or, better, changes of belief in a multi-agent context. The goal of their paper is to show how several epistemic actions
can be explained as specific update operations on ‘standard’ Kripke state-models that describe ‘static’ knowledge. Updates describe how we move from one state-model to another, and an epistemic action specifies what such an update ‘looks like’ for every agent involved in it. An example of an epistemic action is that of a public announcement: if ϕ is publicly announced in a group of agents A, the information ϕ is truthfully announced to everybody in A, and that this is done is common knowledge among the members of A. However, in more private announcements, it may well be that agent b learns a new fact, whereby c is aware of this without coming to know the fact itself. The approach of the authors is unique in the sense that they also model epistemic actions by Kripke-like models, called action models.

In A Counterexample to Six Fundamental Principles of Belief Formation (Chapter 2), Hans Rott reconsiders six principles that are generally well accepted in the areas of non-monotonic reasoning, belief revision and belief contraction, principles of common sense reasoning. They can all be formulated just by using conjunction and disjunction over new information, or information that has to be abandoned. Rott then pictures a reasoner who initially ‘expects’ that from a set of possible alternatives a, b, or c, none will be chosen. He then sketches three possible (but different) scenarios in which the reasoner learns that in fact a ∨ b, a ∨ b ∨ c and c do hold, after all. Depending on whether each new piece of information is in line with or goes against the reasoner’s ‘other expectations’, he will infer different conclusions in each scenario. Interestingly, the six principles are then tied up with principles in the theory of rational choice, most prominently to the Principle of Independence of Irrelevant Choices: one’s preferences among a set of alternatives should not change (within that set of alternatives) if new options present themselves.
Rott argues that, in the setting of belief formation, the eﬀect of this additional information should exactly be accounted for and explained. The chapter concludes negatively: logics that are closer to modelling ‘common sense reasoning’ seem to have a tendency to drift away from the nice, classical patterns that we usually ascribe to ‘standard logics’. Alternating Transition Systems are structures in which each joint action of a group of agents determines a transition between global states of the system. Such systems have inspirations from areas as diverse as game theory, computation models, epistemic and coalition models. Chapter 3 (Comparing Semantics of Logics for Multi-Agent Systems, by Valentin Goranko and Wojciech Jamroga) uses these structures to show how (semantically) several frameworks to reason about the abilities of agents are equivalent. One prominent framework in their analyses is that of Alternating-time Temporal Logic (ATL), a logic intended to reason about what coalitions of agents can achieve, by choosing an
appropriate strategy, and taking into account all possible strategies for the agents outside the coalition. The chapter first of all shows that the three different semantics that were proposed for ATL are equivalent. Moreover, the authors demonstrate that ATL subsumes (Extended) Coalition Logic. Last but not least, they show that adding an epistemic component to ATL (giving ATEL) can be completely modelled within ATL, the idea being to model the epistemic indistinguishability relation for each agent as a strategic transition relation for his ‘associated epistemic agents’.

In Chapter 4 (A Characterization of Von Neumann Games in Terms of Memory), Giacomo Bonanno analyses knowledge, and the memory of it, in the context of extensive games with incomplete information. In such a game, it is assumed that each player can be uncertain about the nodes in which he has to make a move. First of all, Bonanno extends this notion to an information completion, in which a player can have uncertainties about all nodes, not just the ones that are his. For such games, he defines a notion of Memory of Past Knowledge (MPK) in terms of the structure and the information partition in that game. If a game only allows for uncertainty for every player at his decision nodes, then the analogue of MPK is called Memory of Past Decision Nodes (MPD). Syntactically, MPK is related to perfect recall: it appears to be equivalent to saying that, at every node, a player remembers everything he has known before: if he knew ϕ in the past, then now he knows that he previously knew ϕ. This notion is then connected to that of von Neumann games, which, roughly, are games in which all players know the time. The main result of the chapter tells us that an extensive form game with incomplete information allows for an information completion satisfying MPK if, and only if, the game is von Neumann satisfying MPD.
Note that from this, it follows that if we start with a game with Memory of Past Decision Nodes, this game can be extended to a game with Memory of Past Knowledge if and only if it is a von Neumann game.

Chapter 5 (An Evolutionary Game Theoretic Perspective on Learning in Multi-Agent Systems), by Karl Tuyls, Ann Nowé, Tom Lenaerts and Bernard Manderick, is a survey paper that argues how three currently loosely connected disciplines can use and contribute to each other’s development. They first of all take the stance that crucial assumptions in ‘classical game theory’ make its applicability to multi-agent systems and the real world rather limited: in the latter, we cannot always assume that participants have perfect knowledge of the environment, or even of the payoff tables. Rather than assuming that players are ‘hyper-rational’ and correctly anticipate the behaviour of all other players, they propose players that are ‘boundedly
rational’, who are limited in their knowledge about the game and the environment, as well as in their computational resources. Moreover, such players learn to respond better by trial and error, which adds to the dynamics of the multi-player, or multi-agent system. Given these assumptions about the partially known dynamic environment, it seems natural to assume that learning and adaptiveness are skills that are important for the agents in that environment. The chapter argues how reinforcement learning, a theoretical framework that is already established in single-agent systems, has to solve several technical problems in order to be applicable to the multi-agent case. The problem is that in such richer systems, the reinforcement an agent receives may depend on the actions taken by the other agents. This absence of ‘Markovian behaviour’ may make convergence properties of reinforcement learning, as they hold in the single agent case, disappear. In order to fully understand the dynamics of learning and the effects of exploration in multi-agent systems, they propose to use evolutionary game theory in such systems, which adds a solution concept to the classical equilibria, namely that of a strategy being evolutionarily stable. A strategy has this property if it is robust against evolutionary pressure from any appearing mutant strategy. Apart from giving several examples illustrating the main concepts, they show how evolutionary game theory can be used as a foundation for modelling new reinforcement algorithms for multi-agent systems.

Evolution of Conventional Meaning and Conversational Principles by Robert van Rooy (Chapter 6) questions exactly the same assumptions that classical game theory makes on ‘hyper-rational agents’. This chapter addresses the question of how to explain which equilibria are chosen in signalling games, games that try to shed light on language use and language organisation.
The chapter proposes to replace current explanations for such selection, which rely on strong assumptions about rationality and common knowledge (thereof) by the players, i.e., the language users, by explanations that are based on insights from evolutionary game theory, especially that of an evolutionarily stable strategy. Rather than obtaining Nash equilibria in language games by relying on almost reciprocal assumptions about mutual (knowledge of) rationality, or using a psychological notion of salience to explain selection of a so-called conventional equilibrium, the chapter shows how that equilibrium will ‘naturally’ evolve in the context of evolutionary language games. It also uses evolutionary game theory to explain how conventions that enhance efficient communication are more likely to be adopted than those that do not. Finally, this chapter shows how costly signalling can account for honest communication.
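
The notion of an evolutionarily stable strategy invoked in the descriptions of Chapters 5 and 6 can be made concrete with a small sketch. It is only an illustration under stated assumptions: a symmetric two-player game given by a payoff matrix, pure strategies only, and the standard Maynard Smith stability conditions. The function name `is_ess` and both example matrices are hypothetical and are not taken from either chapter.

```python
def is_ess(payoff, s):
    """Check whether pure strategy s is evolutionarily stable.

    payoff[i][j] is the payoff to a player using strategy i against
    an opponent using strategy j (symmetric game, pure strategies only).
    s is stable if every mutant t != s either does strictly worse against
    s than s does, or ties against s and does strictly worse against itself
    than s does against it.
    """
    for t in range(len(payoff)):
        if t == s:
            continue
        if payoff[s][s] > payoff[t][s]:
            continue  # mutant t is strictly worse against the incumbent
        if payoff[s][s] == payoff[t][s] and payoff[s][t] > payoff[t][t]:
            continue  # tie against incumbent, but incumbent beats mutant on mutant
        return False
    return True


# Hypothetical coordination game: both conventions are stable.
coordination = [[2, 0],
                [0, 1]]

# Hypothetical hawk-dove game with value 2 and cost 4: no pure ESS.
hawk_dove = [[-1, 2],
             [0, 1]]

print([is_ess(coordination, s) for s in range(2)])
print([is_ess(hawk_dove, s) for s in range(2)])
```

In the coordination game each convention resists invasion by the other, which is the kind of situation in which an evolutionary account has to explain which convention is selected.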

The aim of Chapter 7 (Nonmonotonic Inferences and Neural Networks), by Reinhard Blutner, is mainly a methodological one, i.e., to show that model-theoretic semantics may be useful for analysing properties of connectionist networks. Doing so, the chapter is a step toward bridging the gap between symbolic and sub-symbolic modes of computation, thus addressing a long standing issue in philosophy of mind. The chapter demonstrates ﬁrst of all that certain activities of connectionist networks can be seen as non-monotonic inferences. Secondly, it shows a correspondence between the coding of knowledge in Hopﬁeld networks, and the representation of knowledge in Poole systems. To do so, the chapter makes the latter systems weight-annotated, assigning a weight to all possible hypotheses in a Poole system. Then, roughly, links in the network are mapped to bi-implications in the logical system. In sum, the chapter contributes to its goals by encouraging us to accept that the diﬀerence between symbolic and neural computation is one of perspective: we should view symbolism as a high-level description of properties of a class of neural networks. Chapter 8 (A Model of Jury Decisions Where All Jurors Have the Same Evidence by Franz Dietrich and Christian List) addresses the issue of rational decision making in a group, or voting. The setting is a simple one: the decision is a jury’s decision about a binary variable (guilty or not) under the assumption that each juror is competent (predicts the right value of the variable with a probability greater than 0.5). Under this scenario, Condorcet’s Jury Theorem predicts that the reliability of a jury’s majority decision converges to 1 if the size of the jury increases unboundedly. This holds under the assumption that diﬀerent jurors are independent conditional on the state of the world, requiring that for each individual juror, a new independent view on the world is available. 
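
The classical Condorcet setting just described can be checked numerically. The following is a minimal sketch, assuming an odd number of jurors who vote independently given the state of the world and share the same competence p; the function name `majority_reliability` is illustrative only and does not come from the chapter.

```python
from math import comb


def majority_reliability(n: int, p: float) -> float:
    """Probability that a strict majority of n jurors votes correctly.

    Assumes n is odd (no ties) and that each juror is independently
    correct with probability p, as in the classical Condorcet model.
    """
    if n % 2 == 0:
        raise ValueError("use an odd jury size to avoid ties")
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(needed, n + 1))


# With competence 0.6, reliability grows with jury size.
for n in (1, 3, 11, 101):
    print(n, majority_reliability(n, 0.6))
```

This sketch covers only the classical model; the common-evidence framework proposed in the chapter changes the independence assumption and is not modelled here.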
The authors propose a framework in which the jurors are independent conditional on the evidence, rather than on the world. This evidence is called the latest common cause of the jurors’ votes. This framework seems to have a realistic underpinning: a jury typically decides on the basis of commonly presented evidence, not on independently obtained signals about the world, the latter often not even being allowed for use in the court room. The chapter’s jury theorem then shows that the probability of a correct majority decision is typically less than the corresponding probability in Condorcet’s model. It also predicts that, as the jury size increases, the probability of a correct majority decision converges to the probability that the evidence is not misleading.

Chapter 9 (A SAT-Based Approach to Unbounded Model Checking for Alternating-Time Temporal Epistemic Logic), by M. Kacprzak and W. Penczek, addresses the problem of (automatic)
verification of complex agent systems. The systems of study in Chapter 9 are the Alternating Transition Systems of Chapter 3, more specifically those which incorporate an epistemic component. The problem addressed in this chapter is that of model checking such systems: can we, given a description of a transition system model, and a property expressed in an appropriate logic (ATEL, an epistemic extension of ATL), automatically check whether that property is true in that model? To do so, this approach fixes one of the semantics for ATL given in Chapter 3, and applies a technique from unbounded model checking to it. Then, for a given model T and ATEL-property ϕ, a procedure is given to express them as Quantified Boolean Formulas, which, using fixed point definitions, in turn yield purely propositional formulas. A main theorem of the chapter then states that ϕ is true of T if and only if the obtained propositional formula is satisfiable. Hence, model checking ATEL is reduced to a SAT-based approach, an approach that has computational advantages over model checking using, for instance, Binary Decision Diagrams.

The last chapter, Chapter 10 (Update Semantics of Security Protocols, by Arjen Hommersom, John-Jules Meyer and Erik de Vink), addresses verification of security protocols. This becomes more and more important in an era where agents send more and more private, secret or sensitive messages over an insecure medium. Decryption keys are introduced to make specific messages only readable to specific agents, which makes the need to reason about higher order information (knowledge about knowledge) in a multi-agent protocol obvious. This chapter takes three kinds of updates, or messages (or, in the terminology of Chapter 1, ‘epistemic programs’): the public announcement of an object variable, the private learning of a variable and the private learning about the knowledge of other agents about variables.
The chapter ﬁrst of all gives a Dynamic Kripke Semantics for these speciﬁc actions, not unlike the semantics using action models as proposed in Chapter 1, and then puts this semantics to work to model and reason about two speciﬁc security protocols, in which encrypted messages are sent and received. This chapter might well be a ﬁrst step to apply the model checking techniques described in Chapter 9 to the dynamic logic framework of Chapter 1, in the area of security and authorisation protocols.

ALEXANDRU BALTAG and LAWRENCE S. MOSS

LOGICS FOR EPISTEMIC PROGRAMS

ABSTRACT. We construct logical languages which allow one to represent a variety of possible types of changes affecting the information states of agents in a multi-agent setting. We formalize these changes by deﬁning a notion of epistemic program. The languages are two-sorted sets that contain not only sentences but also actions or programs. This is as in dynamic logic, and indeed our languages are not signiﬁcantly more complicated than dynamic logics. But the semantics is more complicated. In general, the semantics of an epistemic program is what we call a program model. This is a Kripke model of ‘actions’, representing the agents’ uncertainty about the current action in a similar way that Kripke models of ‘states’ are commonly used in epistemic logic to represent the agents’ uncertainty about the current state of the system. Program models induce changes affecting agents’ information, which we represent as changes of the state model, called epistemic updates. Formally, an update consists of two operations: the ﬁrst is called the update map, and it takes every state model to another state model, called the updated model; the second gives, for each input state model, a transition relation between the states of that model and the states of the updated model. Each variety of epistemic actions, such as public announcements or completely private announcements to groups, gives what we call an action signature, and then each family of action signatures gives a logical language. The construction of these languages is the main topic of this paper. We also mention the systems that capture the valid sentences of our logics. But we defer to a separate paper the completeness proof. The basic operation used in the semantics is called the update product. A version of this was introduced in Baltag et al. (1998), and the presentation here improves on the earlier one. 
The update product is used to obtain from any program model the corresponding epistemic update, thus allowing us to compute changes of information or belief. This point is of interest independently of our logical languages. We illustrate the update product and our logical languages with many examples throughout the paper.

1. INTRODUCTION

Synthese 139: 165–224, 2004. Knowledge, Rationality & Action 1–60, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

Traditional epistemic puzzles often deal with changes of knowledge that come about in various ways. Perhaps the most popular examples are the puzzles revolving around the fact that a declaration of ignorance of some sentence A may well lead to knowledge of A. We have in mind the scenarios that go by names such as the Muddy Children, the Cheating Spouses, the Three Wisemen, and the like. The standard treatment of these matters (a) introduces the Kripke semantics of modal logic so as to formalize the
informal notions of knowledge and common knowledge; (b) formalizes one of the scenarios as a particular model; (c) and finally shows how the formalized notions of knowledge and common knowledge illuminate some key aspects of the overall scenario. The informal notion of knowledge which is closest to what is captured in the traditional semantics is probably justified true belief. But more generally, one can consider justifiable beliefs, regardless of whether they happen to be true; in many contexts, agents may be deceived by certain actions, without necessarily losing their rationality. Thus, such beliefs, and the justifiable changes affecting these beliefs, may be accepted as a proper subject for logical investigation. The successful treatment of a host of puzzles leads naturally to the following

THESIS I. Let s be a social situation involving the intuitive concepts of knowledge, justifiable beliefs and common knowledge among a group of agents. Assume that s is presented in such a way that all the relevant features of s pertaining to knowledge, beliefs and common knowledge are completely determined. Then we may associate to s a mathematical model S. (S is a multi-agent Kripke model; we call these epistemic state models.) The point of the association is that all intuitive judgements concerning s correspond to formal assertions concerning S, and vice-versa.

We are not aware of any previous formulations of this thesis. Nevertheless, some version of this thesis is probably responsible for the appeal of epistemic logic. We shall not be concerned in this paper with a defense of this thesis, but instead we return to our opening point related to change. Dynamic epistemic logic, dynamic doxastic logic, and related formalisms attempt to incorporate change from model to model in the syntax and semantics of a logical language. We are especially concerned with changes that result from information-updating actions of various sorts.
Our overall aim is to formally represent epistemic actions, and we associate to each of them a corresponding update. By “updates” we shall mean operations defined on the space of all state models, operations which are meant to represent well-defined, systematic changes in the information states of all agents. By an “epistemic action” (or program) we shall mean a representation of the way such a change “looks” to each agent. Perhaps the paradigm case of an epistemic action is a public announcement. The first goal is to say in a general way what the effect of a (public) announcement should be on a model. It is natural to model such announcements by the logical notion of relativization: publicly announcing a sentence causes all agents to restrict attention to the worlds where the
sentence was true (before the announcement). Note that the informal notion of announcement takes situations to situations, and the formal notion of relativization is an operation taking models to models. In this paper, we wish to consider many types of epistemic actions that are more difficult to handle than public announcements. These include half-transparent, half-opaque types of actions, such as announcements to groups in a completely private way, announcements that include the possibility that outsiders suspect the announcement but this suspicion is lost on the insiders, private announcements which are secretly intercepted by outsiders, etc. We may also consider types of actions exhibiting information-loss and misinformation, where agents are deceived by others or by themselves.

THESIS II. Let σ be a social “action” involving and affecting the knowledge (beliefs, common knowledge) of agents. This naturally induces a change of situation; i.e., an operation o taking situations s into situations o(s). Assume that o is presented by assertions concerning knowledge, beliefs and common knowledge facts about s and o(s), and that o is completely determined by these assertions. Then

(a) We may associate to the action σ a mathematical model Σ which we call an epistemic action model. (Σ is also a multi-agent Kripke model.) The point again is that all the intuitive features of, and judgments about, σ correspond to formal properties of Σ.

(b) There is an operation ⊗ taking a state model S and an action model Σ and returning a new state model S ⊗ Σ. So each Σ induces an update operation O on state models: O(S) = S ⊗ Σ.

(c) The update O is a faithful model of the situation change o, in the sense that for all s: if s corresponds to S as in Thesis I, then again o(s) corresponds to O(S) in the same way; i.e., all intuitive judgements concerning o(s) correspond to formal assertions concerning O(S), and vice-versa.
Our aim in this paper is not to offer a full conceptual defense of these two theses. Instead, we will justify the intuitions behind them through examples and usage. We shall use them to build logical languages and models and show how these can be applied to the analysis of natural examples of “social situations” and “social actions”. As in the case of standard possible-worlds semantics (for which a ‘strong’, ontological defense is hard, maybe even impossible, to give), the usefulness of these formal developments may
provide a ‘weak’, implicit defense of the philosophical claims underlying our semantics. Our method of defining updates is quite general and leads to logics of epistemic programs, extending standard systems of epistemic logic by adding updates as new operators. These logical languages also incorporate features of propositional dynamic logic. Special cases of our logic, dealing only with public or semi-public announcements to mutually isolated groups, have been considered in Plaza (1989), Gerbrandy (1999a, b), and Gerbrandy and Groeneveld (1997). But our overall setting is much more liberal, since it allows for all the above-mentioned types of actions. We feel it would be interesting to study further examples with an eye towards applications, but we leave this to other papers. In our logical systems, we capture only the epistemic aspect of these real actions, disregarding other (intentional) aspects. In particular, to keep things simple we only deal with “purely epistemic” actions; i.e., the ones that do not change the facts of the world, but affect only the agents’ beliefs about the world. However, this is not an essential limitation, as our formal setting can be easily adapted to express fact-changing actions.

On the semantic side, the main original technical contribution of our paper lies in our decision to represent not only the epistemic states, but also (for the first time) the epistemic actions. For this, we use action models, which are epistemic Kripke models of “actions”, similar to the standard Kripke structures of “states”. While for states, these structures represent in the usual way the uncertainty of each agent concerning the current state of the system, we similarly use action signatures to represent the uncertainty of each agent concerning the current action taking place. For example, there will be a single action signature that represents public announcements.
There will be a different action signature representing a completely private announcement to one specified agent, etc. The intuition is that we are dealing with potentially “half-opaque/half-transparent” actions, about which the agents may be incompletely informed, or even completely misinformed. The components (“possible worlds”) of an action model are called “simple” actions, since they are meant to represent particularly simple kinds of actions, whose epistemic impact is uniform on states: the informational features of a simple action are intrinsic to the action, and thus are independent of the informational features of the states to which it can be applied. This independence is subject to only one restriction: the action’s presuppositions or conditions of possibility, which a state must satisfy in order for the action to be executable. Thus, besides the epistemic structure, simple actions have preconditions, defining their domain of applicability: not every action is possible in every state. We model the update
of a state by an action as a partial update operation, given by a restricted product of the two structures: the uncertainties present in the given state and the given action are multiplied, while the “impossible” combinations of states and actions are eliminated (by testing the actions’ preconditions on the state). The underlying intuition is that, since the agent’s uncertainties concerning the state and the ones concerning the simple action are mutually independent, the two uncertainties must be multiplied, except that we insist on applying an action only to those states which satisfy its precondition.

On the syntactic side, we use a mixture of dynamic and epistemic logic, with dynamic modalities associated to each action signature, and with common-knowledge modalities for various groups of agents (in addition to the usual individual-knowledge operators). In this paper, we present a sound system for this logic. The logic includes an Action-Knowledge Axiom that generalizes similar axioms found in other papers in the area (cf. Gerbrandy 1999a, b; Plaza 1989). The main original feature of our system is an inference rule which we call the Action Rule. This allows one to infer sentences expressing common knowledge facts which hold after an epistemic action. From another point of view, the Action Rule expresses what might be called a notion of “epistemic (co)recursion”. Overall, the Action-Knowledge Axiom and the Action Rule express fundamental formal features of the interaction between action and knowledge in multi-agent systems. The logic is studied further in our paper with S. Solecki (Baltag et al. 1998). There we present the completeness and decidability of the logic, and we prove various expressivity results.

For Impatient Readers. The main logical systems of the paper are presented in Section 4.2, and to read that one would only have to read the definition in Section 4.1 first.
To understand the semantics, one should read in addition Sections 2.1, 2.3, and 3.1–3.4. But we know that our systems would not be of any interest if the motivation were not so great. For this reason, we have included many examples and discussions, particularly in the sections of the paper preceding the introduction of the logics. Readers may read as much or as little of that material as they desire. Indeed, some readers may find our examples and discussion of more interest than the logical systems themselves. The main logical systems are presented in Section 5.

Technical results concerning our systems will appear in other papers. The completeness/decidability result for the main systems of this paper will appear in a paper (Baltag et al. 1998) written with Sławomir Solecki;
this paper also contains results on the expressive power of our systems. For stronger systems of interest, there are undecidability results; (cf. Miller and Moss (2003)).
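
The restricted product of a state model and an action model described in this introduction can be sketched in a few lines. This is a rough illustration rather than the formal definition given later in the paper. It assumes that models are finite, that accessibility relations are sets of pairs, and that preconditions are simple tests on the atomic facts true at a state. All names (`update_product`, `reveal_heads`, and so on) are hypothetical.

```python
from itertools import product


def update_product(states, rel, val, actions, act_rel, pre):
    """Restricted product of a state model and an action model.

    states, actions: finite sets
    rel, act_rel:    dict agent -> set of (source, target) pairs
    val:             dict state -> set of atomic facts true there
    pre:             dict action -> test on a set of atomic facts
    """
    # Keep only the pairs where the action's precondition holds at the state.
    new_states = {(s, a) for s, a in product(states, actions) if pre[a](val[s])}
    # Multiply the uncertainties: both components must be indistinguishable.
    new_rel = {}
    for agent in rel:
        new_rel[agent] = {
            (u, v)
            for u in new_states
            for v in new_states
            if (u[0], v[0]) in rel[agent] and (u[1], v[1]) in act_rel[agent]
        }
    # Purely epistemic actions leave the atomic facts unchanged.
    new_val = {u: val[u[0]] for u in new_states}
    return new_states, new_rel, new_val


# State model: a coin that is heads (x) or tails (y); A and B cannot tell.
states = {"x", "y"}
val = {"x": {"heads"}, "y": {"tails"}}
everything = {(s, t) for s in states for t in states}
rel = {"A": everything, "B": everything}

# Action model: a single public action, executable only where heads holds.
actions = {"reveal_heads"}
loop = {("reveal_heads", "reveal_heads")}
act_rel = {"A": loop, "B": loop}
pre = {"reveal_heads": lambda atoms: "heads" in atoms}

new_states, new_rel, new_val = update_product(states, rel, val, actions, act_rel, pre)
print(new_states)
```

With a one-point action model the product reduces to relativization: only the states satisfying the precondition survive, which matches the treatment of public announcements sketched above.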

1.1. Scenarios

Probably the best way to enter our overall subject is to consider some “epistemic scenarios.” These give the reader some idea of what the general subject is about, and they also provide us with test problems at different points.

SCENARIO 1. The Concealed Coin. A and B enter a large room containing a remote-control mechanical coin flipper. One presses a button, and the coin spins through the air, landing in a small box on a table. The box closes. The two people are much too far to see the coin. The main contents of any representation of the relevant knowledge states of A and B are that (a) there are two alternatives, heads and tails; (b) neither party knows which alternative holds; and (c) that (a) and (b) are common knowledge. The need for the notion of common knowledge in reasoning about multi-agent interaction is by now standard in applied epistemic logic, and so we take it as unproblematic that one would want (c) to come out in any representation of this scenario. Perhaps the clearest way to represent this scenario is with a diagram:

In more standard terms, we have a set of two alternatives, call them x and y. We also have some atomic information, that x represents the possible fact that the coin is lying heads up and that y represents the other possible fact. Finally, we also have some extra apparatus needed to indicate that, no matter the actual disposition of the coin, A and B think both alternatives are possible. Following the standard practice in epistemic logic, we take this apparatus to be accessibility relations between the alternatives. The diagram should be read as saying that were the actual state of the world to be x, say, then A and B would still entertain both alternatives.

SCENARIO 2. The Coin Revealed to Show Heads. A and B sit down. One opens the box and puts the coin on the table for both to see. It's heads. The result of this scenario is a model which again is easy to grasp. It consists of one state; call it s. Each agent knows the state in the sense that they think s is the only possible state.

We also shall be interested in keeping track of the relation of the model in Scenario 1 and the model just above. We indicate this in the following way:

The ﬁrst thing to note is that the dotted connection is a partial function; as we shall see later, this is the hallmark of a deterministic epistemic action. But we also will see quite soon that the before-after relation is not always a partial function; so there are epistemic actions which are not deterministic. Another thing to note at this point is that the dotted connection is not subscripted with an agent or a set of agents. It does not represent alternative possibility with respect to anyone, but instead stands for the before-after relation between two models: it is a transition relation, going from input states to the corresponding output states. In this example, the transition relation is in fact a partial function whose domain is the set of states which could possibly be subject to the action of revealing heads. This is possible in only one of the two starting states. SCENARIO 2.1. The Coin Revealed to Show Tails. As a variation on Scenario 2, there is a different Scenario in which the coin is revealed in the same way to both A and B but with the change that tails shows. Our full representation is:

SCENARIO 2.2. The Coin Revealed. Finally, we can consider the nondeterministic sum of publicly revealing heads and publicly revealing tails. The coin is revealed to both A and B, but all that we as external modelers can say is that either they learned that heads shows, or that they learned that tails shows. Our representation is

Observe that, although non-deterministically deﬁned, this is still a deterministic action: the relation described by the dotted connection is still a function. SCENARIO 3. A Semi-private Viewing of Heads. The following is an alternative to the scenarios above in which there is a public revelation. After Scenario 1, A opens the box herself. The coin is lying heads up. B observes A open the box but does not see the coin. And A also does not disclose whether it is heads or tails.

No matter which alternative holds, B would consider both as possible, and A would be certain which was the case.

SCENARIO 3.1. B's Turn. After Scenario 3, B takes a turn and opens the box the same way. We expect that after both have individually opened the box they see the same things; moreover, they know this will happen. This time, we begin with the end of Scenario 3, and we end with the same end as in the public revelation of heads:

SCENARIO 4. Cheating. After Scenario 1, A secretly opens the box herself. The coin is lying Heads up. B does not observe A open the box, and indeed A is certain that B did not suspect that anything happened after they sat down. This is substantially more difﬁcult conceptually, and the representation is accordingly more involved. Such cheating is like an announcement that results in A’s being sure that the coin lies heads up, and B learns nothing. But the problem is how to model the fact that, after the announcement, B knows nothing (new). We cannot just delete all arrows for B to represent such lack of knowledge: this would actually increase B’s (false) ‘knowledge’, by adding to his set of beliefs new ones; for example, he’ll believe it is not possible that the coin is lying heads up. Deleting arrows always corresponds to increasing ‘information’ (even if sometimes this is just by adding false information). But this seems wrong in our case, since B’s possibilities should be unchanged by A’s secret cheating. Instead, our representation of the informational change induced by such cheating should be:

(1)

The main point is that after A cheats, B does not think that the actual state of the world is even possible. The states that B thinks are possible should only be two states which are the same as the "before" states, or states which are in some important way similar to those. In addition to the way the state actually is, where A knows the coin lies heads up or rather as an aspect of that state, we need to consider other states to represent the possibilities that B must entertain. We have indicated names s, t, and u of the possibilities in (1). (We could have done this with the scenarios we already considered, but there was less of a reason.) State s has a different status from t and u: while t and u are the states that B thinks could hold, s is the one that A knows to be the case. Note that the substructure determined by t and u is isomorphic to the "before" structure. This is the way that we formalize the intuition that the states available to B after cheating are essentially the same as the states before cheating.

SCENARIO 5. More Cheating. After Scenario 4, B does the very same thing. That is, B opens the box quite secretly. We leave the reader the instructive exercise of working out a representation. We shall return to this matter in Section 3.5, where we solve the problem based on our general machinery. We merely mention now that part of the goal of the paper is precisely to develop tools to build representations of complex epistemic examples such as this one.

SCENARIO 6. Lying. After Scenario 1, A convinces B that the coin lies heads up and that she knows this. In fact, she is lying. Here is our representation:

SCENARIO 7. Pick a Card. As another alternative to Scenario 3, C walks in and tells both A and B at the same time that she has a card which either says H, T, or is blank. In the first two cases the card describes truly the state of the coin in the box, and in the last case the intention is that no information is given. Then C gives the card to A in the presence of B. Here is our representation:

The idea here is that t and u represent states where the card was blank, s the state where the card showed H, and t the state where the card showed T.

SCENARIO 8. Common Knowledge of (Unfounded) Suspicion. As yet another alternative to Scenario 3, suppose that after A and B make their original entrance, A has not looked, but B has some suspicion concerning A's cheating; so B considers it possible that she (A) has secretly opened the box (but B cannot be sure of this, so he also considers it possible that nothing happened); moreover, we assume there is common knowledge of this (B's) suspicion. That is, we want to think of one single action (a knowing glance by A, for example) that results in all of this. The representation of "before" and "after" is as follows:

One should compare this with Scenario 7. The blank card there is a parallel to no looking here; card with H is parallel to A’s looking and seeing H; and similarly for T. This accounts for the similarity in the models. The main difference is that Scenario 7 was described in such a way that we do not know what the card says; in this scenario we stipulate that A deﬁnitely did not look. This accounts for the difference in dotted lines between the two ﬁgures.

SCENARIO 8.1. Private Communication about the Other. After Scenario 8, someone stands in the doorway behind A and raises a finger. This indicates to B that A has not cheated. The communication between B and the third party was completely private. Again, we leave the representation as an exercise to the reader.

One goal of the paper is to propose a theoretical understanding of these representations. It turns out that there are simple operations and general insights which allow one to construct them. One operation allows one to pass, for example, from the representation of Scenario 1 to that of Scenario 4; moreover, the operation is so widely applicable that it allows us to add private knowledge for one agent in a "private" way to any representation.

1.2. Further Contents of This Paper

Section 2 continues our introduction by both reviewing some of the main background concepts of epistemic logic, and also by situating our logical systems with respect to well-known systems. Section 2.1 presents a new "big picture" of the world of epistemic actions. While the work that we do could be understood without the conceptual part of the big picture, it probably would help the reader to work through our definitions. Section 3 begins the technical part of the paper in earnest, and here we revisit some of the Scenarios of Section 1.1 and their pictures. The idea is to get a logical system for reasoning with, and about, these kinds of models. Section 4 gives the syntax and semantics of our logical systems. These are studied further in Section 5. For example, we present sound and complete logical systems (with the proofs of soundness and completeness deferred to Baltag et al. (2003)).

Endnotes. We gather at the ends of our sections some remarks about the literature and how our work fits in with that of others. We mentioned other work in dynamic epistemic logic and dynamic doxastic logic.
Much more on these logics and many other proposals in epistemic logic may be found in Gochet and Gribomont (2003), and Meyer and van der Hoek (1995). Also, a survey of many topics in the area of information update and communication may be found in van Benthem's papers (2000, 2001a, b). The ideas behind several of the scenarios in Section 1.1 are to be found in several places: see, e.g., Plaza (1989), Gerbrandy (1999a, b), Gerbrandy and Groeneveld (1997), and van Ditmarsch (2000, 2001). We shall discuss these papers in more detail later. Our scenarios go beyond the work of these papers. Specifically, our treatment of the actions in Scenarios 6 and 8 seems new. Also, our use of the relation between "before" and "after" (given by the dotted arrows) is new.

2. EPISTEMIC UPDATES AND OUR TARGET LOGICS

We fix a set AtSen of atomic sentences, and a set $\mathcal{A}$ of agents. All of our definitions depend on AtSen and $\mathcal{A}$, but for the most part we omit mention of these.

2.1. State Models and Epistemic Propositions

A state model is a triple $\mathbf{S} = (S, \xrightarrow{A}_{\mathbf{S}}, \|\cdot\|_{\mathbf{S}})$ consisting of a set $S$ of "states"; a family of binary accessibility relations $\xrightarrow{A}_{\mathbf{S}} \subseteq S \times S$, one for each agent $A \in \mathcal{A}$; and a "valuation" (or a "truth" map) $\|\cdot\|_{\mathbf{S}} : \mathrm{AtSen} \to \mathcal{P}(S)$, assigning to each atomic sentence $p$ a set $\|p\|_{\mathbf{S}}$ of states. When dealing with a single fixed state model $\mathbf{S}$, we often drop the subscript $\mathbf{S}$ from all the notation.

In a state model, atomic sentences are supposed to represent nonepistemic, "objective" facts of the world, which can be thought of as properties of states; the valuation tells us which facts hold at which states. The accessibility relations model the agents' epistemic uncertainty about the current state. That is, to say that $s \xrightarrow{A} t$ in $\mathbf{S}$ means that in the model, in state $s$, agent $A$ considers it possible that the state is $t$.

DEFINITION. Let StateModels be the collection of all state models. An epistemic proposition is an operation $\varphi$ defined on StateModels such that for all $\mathbf{S} \in$ StateModels, $\varphi_{\mathbf{S}} \subseteq S$.

The collection of epistemic propositions is closed in various ways.

1. For each atomic sentence $p$ we have an atomic proposition $\mathbf{p}$ with $\mathbf{p}_{\mathbf{S}} = \|p\|_{\mathbf{S}}$ (see note 1).
2. If $\varphi$ is an epistemic proposition, then so is $\neg\varphi$, where $(\neg\varphi)_{\mathbf{S}} = S \setminus \varphi_{\mathbf{S}}$.
3. If $C$ is a set or class of epistemic propositions, then so is $\bigwedge C$, with $(\bigwedge C)_{\mathbf{S}} = \bigcap \{\varphi_{\mathbf{S}} : \varphi \in C\}$.
4. Taking $C$ above to be empty, we have an "always true" epistemic proposition $\mathbf{tr}$, with $\mathbf{tr}_{\mathbf{S}} = S$.
5. We also may take $C$ in part (3) to be a two-element set $\{\varphi, \psi\}$; here we write $\varphi \wedge \psi$ instead of $\bigwedge \{\varphi, \psi\}$. We see that if $\varphi$ and $\psi$ are epistemic propositions, then so is $\varphi \wedge \psi$, with $(\varphi \wedge \psi)_{\mathbf{S}} = \varphi_{\mathbf{S}} \cap \psi_{\mathbf{S}}$.

6. If $\varphi$ is an epistemic proposition and $A \in \mathcal{A}$, then $\Box_A \varphi$ is an epistemic proposition, with

$$(\Box_A \varphi)_{\mathbf{S}} = \{s \in S : \text{if } s \xrightarrow{A} t, \text{ then } t \in \varphi_{\mathbf{S}}\}. \tag{2}$$

7. If $\varphi$ is an epistemic proposition and $B \subseteq \mathcal{A}$, then $\Box^*_B \varphi$ is an epistemic proposition, with

$$(\Box^*_B \varphi)_{\mathbf{S}} = \{s \in S : \text{if } s \xrightarrow{B^*} t, \text{ then } t \in \varphi_{\mathbf{S}}\}.$$

Here $s \xrightarrow{B^*} t$ iff there is a sequence

$$s = u_0 \xrightarrow{A_1} u_1 \xrightarrow{A_2} \cdots \xrightarrow{A_n} u_n = t$$

where $A_1, \ldots, A_n \in B$. In other words, there is a sequence of arrows from the set $B$ taking $s$ to $t$. We allow $n = 0$ here, so $\xrightarrow{B^*}$ includes the identity relation on $S$. To see that $\Box^*_B \varphi$ is indeed an epistemic proposition, we use parts 3 and 6 above; we may also argue directly, of course.

2.2. Epistemic Logic Recalled

In this section, we review the basic definitions of modal logic. In terms of our work in Section 2.1 above, the infinitary modal propositions are the smallest collection of epistemic propositions containing the propositions $\mathbf{p}$ corresponding to atomic sentences $p$ and closed under negation $\neg$, infinitary conjunction $\bigwedge$, and the agent modalities $\Box_A$. The finitary propositions are the smallest collection closed the same way, except that we replace $\bigwedge$ by its special cases $\mathbf{tr}$ and the binary conjunction operation.

Syntactic and Semantic Notions. It will be important for us to make a sharp distinction between syntactic and semantic notions. For example, we speak of atomic sentences and atomic propositions. The difference for us is that atomic sentences are entirely syntactic objects: we won't treat an atomic sentence $p$ as anything except an unanalyzed mathematical object. On the other hand, this atomic sentence $p$ also has associated with it the atomic proposition $\mathbf{p}$. For us $\mathbf{p}$ will be a function whose domain is the (proper class of) state models, and it is defined by

$$\mathbf{p}_{\mathbf{S}} = \{s \in S : s \in \|p\|_{\mathbf{S}}\}. \tag{3}$$

This difference may seem pedantic at first, and surely there are times when it is sensible to blur it. But for various reasons that will hopefully become clear, we need to insist on it.

Up until now, the only syntactic objects have been the atomic sentences $p \in$ AtSen. But we can build the collections of finitary and infinitary atomic sentences by the same definitions that we have seen, and then the work of the past section is the semantics of our logical systems. For example, we have sentences $p \wedge q$, $\Box_A \neg p$, and $\Box^*_B q$. These then have corresponding epistemic propositions as their semantics: $\mathbf{p} \wedge \mathbf{q}$, $\Box_A \neg\mathbf{p}$, and $\Box^*_B \mathbf{q}$, respectively. Note that the latter is a properly infinitary proposition (and so $\Box^*_B q$ is a properly infinitary sentence); it abbreviates an infinite conjunction.

The rest of this section studies examples of the semantics and it also makes the connection of the formal system with the informal notions of knowledge, belief and common knowledge. We shall study Scenario 3 of Section 1.1, where A opens the box herself to see heads in a semi-public way: B sees A open the box but not the result, A is aware of this, etc. We want to study the model after the opening. We represented this as

We first must represent this picture as a bona fide state model $\mathbf{S}_3$ in our sense (see note 2). The picture includes no explicit states, but we must fix some states to have a representation. We choose distinct objects $s$ and $t$. Then we take as our state model $\mathbf{S}_3$, as defined by

$$\begin{aligned}
S_3 &= \{s, t\} \\
\xrightarrow{A} &= \{(s, s), (t, t)\} \\
\xrightarrow{B} &= \{(s, s), (s, t), (t, s), (t, t)\} \\
\|\mathsf{H}\| &= \{s\} \\
\|\mathsf{T}\| &= \{t\}
\end{aligned}$$

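In Figure 1, we list some sentences of English along with their translations into standard epistemic logic. We also have calculated the semantics of the formal sentences in the model $\mathbf{S}_3$. It should be stressed that the semantics is exactly the one defined in the previous section. For example,

$$\begin{aligned}
[\![\Box_A \mathsf{T}]\!]_{\mathbf{S}_3} &= \{u \in S_3 : \text{if } u \xrightarrow{A} v, \text{ then } v \in [\![\mathsf{T}]\!]_{\mathbf{S}_3}\} \\
&= \{u \in S_3 : \text{if } u \xrightarrow{A} v, \text{ then } v = t\} \\
&= \{t\}
\end{aligned}$$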
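The calculation above can be replayed mechanically. The following is a minimal sketch, not part of the original paper: the names `STATES`, `ARROWS`, `VAL`, `box`, `common`, and `neg` are hypothetical, and the encoding of $\mathbf{S}_3$ as Python sets is an assumption made only for illustration. It implements clause 6 (agent box) and clause 7 (common-knowledge box, via reachability with $n = 0$ allowed).

```python
# Hypothetical encoding of the state model S3 (semi-private viewing of heads).
STATES = {"s", "t"}
ARROWS = {
    "A": {("s", "s"), ("t", "t")},
    "B": {("s", "s"), ("s", "t"), ("t", "s"), ("t", "t")},
}
VAL = {"H": {"s"}, "T": {"t"}}


def box(agent, prop):
    """Clause 6: states all of whose agent-successors lie in prop."""
    return {u for u in STATES
            if all(v in prop for (w, v) in ARROWS[agent] if w == u)}


def reachable(group, start):
    """States reachable from start by finitely many arrows of agents in group."""
    seen, frontier = {start}, [start]  # n = 0 is allowed, so start is included
    while frontier:
        u = frontier.pop()
        for agent in group:
            for (w, v) in ARROWS[agent]:
                if w == u and v not in seen:
                    seen.add(v)
                    frontier.append(v)
    return seen


def common(group, prop):
    """Clause 7: states from which every group-reachable state lies in prop."""
    return {u for u in STATES if reachable(group, u) <= prop}


def neg(prop):
    """Clause 2: complement relative to the state set."""
    return STATES - prop


print(sorted(box("A", VAL["T"])))   # ['t'], as in the worked calculation
print(sorted(box("B", VAL["H"])))   # [], B does not know heads anywhere
```

Under these assumptions, `common({"A", "B"}, box("A", VAL["H"]) | box("A", VAL["T"]))` evaluates to the full state set, which matches the reading that it is common knowledge that A knows the state of the coin.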

We also emphasize two other points. First, the translation of English to the formal system is based on the rendering of "A knows" (or "A has a justified belief that") as $\Box_A$, and of "it is common knowledge that" by $\Box^*_{A,B}$. Second, the chart bears a relation to intuitions that one naturally has about the states $s$ and $t$. Recall that $s$ is the state that obtains after A looks and sees that the coin is lying heads up. The state $t$, on the other hand, is a state that would have been the case had A seen tails when she looked.

| English | Formal rendering | Semantics |
|---|---|---|
| the coin shows heads | $\mathsf{H}$ | $\{s\}$ |
| A knows the coin shows heads | $\Box_A \mathsf{H}$ | $\{s\}$ |
| A knows the coin shows tails | $\Box_A \mathsf{T}$ | $\{t\}$ |
| B knows that the coin shows heads | $\Box_B \mathsf{H}$ | $\emptyset$ |
| A knows that B doesn't know it's heads | $\Box_A \neg\Box_B \mathsf{H}$ | $\{s, t\}$ |
| B knows that A knows that B doesn't know it's heads | $\Box_B \Box_A \neg\Box_B \mathsf{H}$ | $\{s, t\}$ |
| it is common knowledge that either A knows it's heads or A knows that it's tails | $\Box^*_{A,B} (\Box_A \mathsf{H} \vee \Box_A \mathsf{T})$ | $\{s, t\}$ |
| it is common knowledge that B doesn't know the state of the coin | $\Box^*_{A,B} \neg(\Box_B \mathsf{H} \vee \Box_B \mathsf{T})$ | $\{s, t\}$ |

Figure 1. Examples of translations and semantics.

2.3. Updates

A transition relation between state models $\mathbf{S}$ and $\mathbf{T}$ is a relation between the sets $S$ and $T$. We write $r : \mathbf{S} \to \mathbf{T}$ for this. An update $r$ is a pair of operations

$$r = (\mathbf{S} \mapsto \mathbf{S}(r),\ \mathbf{S} \mapsto r_{\mathbf{S}}),$$

where for each $\mathbf{S} \in$ StateModels, $r_{\mathbf{S}} : \mathbf{S} \to \mathbf{S}(r)$ is a transition relation. We call $\mathbf{S} \mapsto \mathbf{S}(r)$ the update map, and $\mathbf{S} \mapsto r_{\mathbf{S}}$ the update relation.

EXAMPLE 2.1. Let $\varphi$ be an epistemic proposition. We get an update $\mathsf{Pub}\ \varphi$ which represents the public announcement of $\varphi$. For each $\mathbf{S}$, $\mathbf{S}(\mathsf{Pub}\ \varphi)$ is the sub-state-model of $\mathbf{S}$ determined by the states in $\varphi_{\mathbf{S}}$. In this submodel, information about atomic sentences and accessibility relations is simply inherited from the larger model. The update relation $(\mathsf{Pub}\ \varphi)_{\mathbf{S}}$ is the inverse of the inclusion relation of $\mathbf{S}(\mathsf{Pub}\ \varphi)$ into $\mathbf{S}$.

EXAMPLE 2.2. We also get a different update $?\varphi$ which represents testing whether $\varphi$ is true. Here we take $\mathbf{S}(?\varphi)$ to be the model whose state set is

$$(\{0\} \times \varphi_{\mathbf{S}}) \cup (\{1\} \times S).$$

The arrows are defined by $(i, s) \xrightarrow{A} (j, t)$ iff $s \xrightarrow{A} t$ in $\mathbf{S}$ and $j = 1$. (Note that states of the form $(0, s)$ are never the targets of arrows in the new model.) Finally, we set

$$\|p\|_{\mathbf{S}(?\varphi)} = \{(i, s) \in \mathbf{S}(?\varphi) : s \in \|p\|_{\mathbf{S}}\}.$$

The relation $(?\varphi)_{\mathbf{S}}$ is the set of pairs $(s, (0, s))$ such that $s \in \varphi_{\mathbf{S}}$.

We shall study these two examples (and many others) in the sequel, and in particular we shall justify the names "public announcement" and "test". For now, we continue our general discussion by noting that the collection of updates is closed in various ways.

1. Skip: there is an update $1$ with $\mathbf{S}(1) = \mathbf{S}$, and $1_{\mathbf{S}}$ is the identity relation on $S$.
2. Sequential Composition: if $r$ and $s$ are epistemic updates, then their composition $r; s$ is again an epistemic update, where $\mathbf{S}(r; s) = \mathbf{S}(r)(s)$, and $(r; s)_{\mathbf{S}} = r_{\mathbf{S}}; s_{\mathbf{S}(r)}$. Here, we use on the right side the usual composition $;$ of relations (see note 3).
3. Disjoint Union (or Non-deterministic choice): If $X$ is any set of epistemic updates, then the disjoint union $\bigsqcup_{r \in X} r$ is an epistemic update, defined as follows. The set of states of the model $\mathbf{S}(\bigsqcup_{r \in X} r)$ is the disjoint union of all the sets of states in each model $\mathbf{S}(r)$:

$$\{(s, r) : r \in X \text{ and } s \in \mathbf{S}(r)\}.$$

Similarly, each accessibility relation $\xrightarrow{A}$ is defined as the disjoint union of the corresponding accessibility relations in each model:

$$(t, r) \xrightarrow{A} (u, s) \quad\text{iff}\quad r = s \text{ and } t \xrightarrow{A} u \text{ in } \mathbf{S}(r).$$

The valuation $\|p\|$ in $\mathbf{S}(\bigsqcup_{r \in X} r)$ is the disjoint union of the valuations in each state model:

$$\|p\| = \{(s, r) : r \in X \text{ and } s \in \|p\|_{\mathbf{S}(r)}\}.$$

Finally, the update relation $(\bigsqcup_{r \in X} r)_{\mathbf{S}}$ between $\mathbf{S}$ and $\mathbf{S}(\bigsqcup_{r \in X} r)$ is the union of all the update relations $r_{\mathbf{S}}$:

$$t\ \Big(\bigsqcup_{r \in X} r\Big)_{\mathbf{S}}\ (u, s) \quad\text{iff}\quad t\ s_{\mathbf{S}}\ u.$$

4. Special case: Binary Union. The (disjoint) union of two epistemic updates $r$ and $s$ is an update $r \sqcup s$, given by $r \sqcup s = \bigsqcup \{r, s\}$.

5. Another special case: Kleene star (iteration). We have the operation of Kleene star on updates:

$$r^* = \bigsqcup \{1, r, r \cdot r, \ldots, r^n, \ldots\}$$

where $r^n$ is recursively defined by $r^0 = 1$, $r^{n+1} = r^n; r$.
6. Crash: We can also take $X = \emptyset$ in part 3. This gives an update $0$ such that $\mathbf{S}(0)$ is the empty model for each $\mathbf{S}$, and $0_{\mathbf{S}}$ is the empty relation.

The operations $r; s$, $r \sqcup s$ and $r^*$ are the natural analogues of the operations of relational composition, union of relations and iteration of a relation, and also of the regular operations on programs in PDL. The intended meanings are: for $r; s$, sequential composition (do $r$, then do $s$); for $r \sqcup s$, non-deterministic choice (do either $r$ or $s$); for $r^*$, iteration (repeat $r$ some finite number of times).

A New Operation: Dynamic Modalities for Updates. If $\varphi$ is an epistemic proposition and $r$ an update, then $[r]\varphi$ is an epistemic proposition defined by

$$([r]\varphi)_{\mathbf{S}} = \{s \in S : \text{if } s\ r_{\mathbf{S}}\ t, \text{ then } t \in \varphi_{\mathbf{S}(r)}\}. \tag{4}$$
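To make definition (4) concrete, here is a minimal sketch, under stated assumptions, of the public announcement update of Example 2.1 together with the dynamic modality. It is not the authors' implementation: the dictionary encoding of state models and the names `S1`, `box`, `pub`, `box_update`, and `dia_update` are all hypothetical. Epistemic propositions are modeled as Python functions from a model to a set of its states, and an update as a function returning the pair (updated model, update relation). The model `S1` encodes the two-state concealed-coin model of Scenario 1, with `s` the heads state.

```python
# Hypothetical encoding of the Scenario 1 model: neither agent knows the coin.
S1 = {
    "states": {"s", "t"},
    "arrows": {a: {(u, v) for u in ("s", "t") for v in ("s", "t")}
               for a in ("A", "B")},
    "val": {"H": {"s"}, "T": {"t"}},
}


def box(model, agent, prop):
    """Clause (2): states whose agent-successors all lie in prop."""
    return {u for u in model["states"]
            if all(v in prop for (w, v) in model["arrows"][agent] if w == u)}


def pub(model, prop_fn):
    """Public announcement: restrict to the announced states.

    Returns the updated model and the update relation, which pairs each
    surviving state with its copy in the submodel.
    """
    keep = prop_fn(model)
    new_model = {
        "states": set(keep),
        "arrows": {a: {(u, v) for (u, v) in rel if u in keep and v in keep}
                   for a, rel in model["arrows"].items()},
        "val": {p: ext & keep for p, ext in model["val"].items()},
    }
    relation = {(u, u) for u in keep}
    return new_model, relation


def box_update(model, update, prop_fn):
    """Definition (4): every update-successor satisfies prop in the new model."""
    new_model, rel = update(model)
    target = prop_fn(new_model)
    return {u for u in model["states"]
            if all(v in target for (w, v) in rel if w == u)}


def dia_update(model, update, prop_fn):
    """Dual of (4): some update-successor satisfies prop in the new model."""
    new_model, rel = update(model)
    target = prop_fn(new_model)
    return {u for u in model["states"]
            if any(v in target for (w, v) in rel if w == u)}


def H(model):
    return model["val"]["H"]


def knows_A_H(model):
    return box(model, "A", H(model))


def pub_H(model):
    return pub(model, H)


print(sorted(box_update(S1, pub_H, knows_A_H)))  # ['s', 't']
print(sorted(dia_update(S1, pub_H, knows_A_H)))  # ['s']
```

The box version holds vacuously at `t`, where heads cannot be truthfully announced, while the dual holds only at `s`. Treating propositions as functions of the model, rather than as fixed sets, is what lets the same proposition be re-evaluated in the updated model.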

We should compare (4) and (2). The point is that we may treat updates in a similar manner to other box-like modalities; the structure given by an update allows us to do this. This point leads to the formal languages which we shall construct in due course. But we can illustrate the idea even now. Suppose we want to interpret the sentence $[\mathsf{Pub}\ \mathsf{H}]\Box_A \mathsf{H}$ in our very first model, the model $\mathbf{S}_1$ pictured again below:

We shall call the two states $s$ (where $\mathsf{H}$ holds) and $t$. Again, we want to determine $[\![[\mathsf{Pub}\ \mathsf{H}]\Box_A \mathsf{H}]\!]_{\mathbf{S}_1}$. We already have in Example 2.1 a general definition of $\mathsf{Pub}\ \mathsf{H}$ as an update, so we can calculate $\mathbf{S}_1(\mathsf{Pub}\ \mathsf{H})$ and also $(\mathsf{Pub}\ \mathsf{H})_{\mathbf{S}_1}$. We indicate these in the picture

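The one-state model on the right is $\mathbf{S}_1(\mathsf{Pub}\ \mathsf{H})$, and the dotted arrow shows $(\mathsf{Pub}\ \mathsf{H})_{\mathbf{S}_1}$. So we calculate:

$$[\![[\mathsf{Pub}\ \mathsf{H}]\Box_A \mathsf{H}]\!]_{\mathbf{S}_1} = \{s \in S_1 : \text{whenever } s\ (\mathsf{Pub}\ \mathsf{H})_{\mathbf{S}_1}\ t, \text{ then also } t \in [\![\Box_A \mathsf{H}]\!]_{\mathbf{S}_1(\mathsf{Pub}\ \mathsf{H})}\}.$$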

In $\mathbf{S}_1(\mathsf{Pub}\ \mathsf{H})$, the state satisfies $\Box_A \mathsf{H}$. Thus $[\![[\mathsf{Pub}\ \mathsf{H}]\Box_A \mathsf{H}]\!]_{\mathbf{S}_1} = \{s, t\}$. It might be more interesting to consider $\langle\mathsf{Pub}\ \mathsf{H}\rangle\Box_A \mathsf{H}$; this is $\neg[\mathsf{Pub}\ \mathsf{H}]\neg\Box_A \mathsf{H}$. Similar calculations show that

$$[\![\langle\mathsf{Pub}\ \mathsf{H}\rangle\Box_A \mathsf{H}]\!]_{\mathbf{S}_1} = \{s \in S_1 : \text{for some } t,\ s\ (\mathsf{Pub}\ \mathsf{H})_{\mathbf{S}_1}\ t \text{ and } t \in [\![\Box_A \mathsf{H}]\!]_{\mathbf{S}_1(\mathsf{Pub}\ \mathsf{H})}\}.$$

The point here is that we have a general semantics for sentences like $[\mathsf{Pub}\ \mathsf{H}]\Box_A \mathsf{H}$, and this semantics crucially uses Equation (4). That is, to determine the truth set of a sentence like $[\mathsf{Pub}\ \mathsf{H}]\Box_A \mathsf{H}$ in a particular model $\mathbf{S}$, one applies the update map to $\mathbf{S}$ and works with the update relation between $\mathbf{S}$ and $\mathbf{S}(\mathsf{Pub}\ \mathsf{H})$; one also uses the semantics of $\Box_A \mathsf{H}$ in the new model. This overall point is one of the two leading features of our approach; the other is the update product which we introduce in Section 3.

2.4. The Target Logical Systems

This paper presents a number of logical systems which contain epistemic operators of various types. These operators are closely related to aspects of the scenarios of Section 1.1. The logics themselves are presented formally in Section 4, but this work takes a fair amount of preparation. We delay this until after we motivate the subject, and so we turn to an informal presentation of the syntax and semantics of some logics. The overall idea is that the operators we study correspond roughly to the shapes of the action models which we shall see in Section 3.5.

THE LOGIC OF PUBLIC ANNOUNCEMENTS. We take a two-place sentential operator $[\mathsf{Pub}\ -]-$. That is, we want an operation taking sentences $\varphi$ and $\psi$ to a new sentence $[\mathsf{Pub}\ \varphi]\psi$, and we want a logic closed under this operation. The intended interpretation of $[\mathsf{Pub}\ \varphi]\psi$ is: assuming that $\varphi$ holds, then announcing it results in a state where $\psi$ holds. The announcement here should be completely public. The semantics of every sentence $\chi$ will be an epistemic proposition $[\![\chi]\!]$ in the sense of Section 2.1.

Note that we refer to this operator as a two-place one. This just means that it takes two sentences and returns another sentence. We also think of $[\mathsf{Pub}\ \varphi]$ as a modal operator in its own right, on a par with the knowledge operators $\Box_A$. And so we shall think of $\mathsf{Pub}$ as something which takes sentences into one-place modal operators.

We also consider a dual $\langle\mathsf{Pub}\ \varphi\rangle$ to $[\mathsf{Pub}\ \varphi]$. As one expects, the semantics will arrange that $\langle\mathsf{Pub}\ \varphi\rangle\psi$ and $\neg[\mathsf{Pub}\ \varphi]\neg\psi$ be logically equivalent. (That is, they will hold at the same states.) Thus, the intended interpretation of $\langle\mathsf{Pub}\ \varphi\rangle\psi$ is: $\varphi$ holds, and announcing it results in a state where $\psi$ holds. As one indication of the difference, in $\mathbf{S}_1$, $y \models [\mathsf{Pub}\ \mathsf{H}]\mathsf{true}$ (vacuously, as it were: it is not possible to make a true announcement of $\mathsf{H}$ in $y$). But we do have $y \models \neg\langle\mathsf{Pub}\ \mathsf{H}\rangle\mathsf{true}$; this is how our semantics works out. Once again we only consider true announcements. Our semantics will arrange that $\varphi \to (\langle\mathsf{Pub}\ \varphi\rangle\psi \leftrightarrow [\mathsf{Pub}\ \varphi]\psi)$. So in working with the example scenarios, the difference between the two modalities will be small.

Further, we iterate the announcement operation, obtaining sentences such as $[\mathsf{Pub}\ p][\mathsf{Pub}\ q]r$. We also want to consider announcements about announcements, as in $\langle\mathsf{Pub}\ \langle\mathsf{Pub}\ \varphi\rangle\psi\rangle\chi$. This says that it is possible to announce publicly $\langle\mathsf{Pub}\ \varphi\rangle\psi$, and as a result of a true announcement of this sentence, $\chi$ will hold.

THE LOGIC OF COMPLETELY PRIVATE ANNOUNCEMENTS TO GROUPS. This time, the syntax is more complicated. If $\varphi$ and $\psi$ are sentences and $B$ is a set of agents, then $[\mathsf{Pri}_B\ \varphi]\psi$ is again a sentence. The intended interpretation of this is: assuming that $\varphi$ holds, then announcing it publicly to the subgroup $B$ in such a way that outsiders do not even suspect that the announcement happened results in a state where $\psi$ holds. (The "Pri" in the notation stands for "private.") For example, this kind of announcement occurs in the passage from Scenario 1 to Scenario 4; that is, "cheating" is a kind of private announcement to the "group" $\{A\}$. We want it to be the case that in $\mathbf{S}_1$,

$$x \models \langle\mathsf{Pri}_{\{A\}}\ \mathsf{H}\rangle(\Box_A \mathsf{H} \wedge \neg\Box_B \Box_A \mathsf{H}).$$

That is, in $x$, it is possible to announce $\mathsf{H}$ to A privately (since $\mathsf{H}$ is true in $x$), and by so doing we have a new state where A knows this fact, but B does not know that A knows it. The logic of private announcements to groups allows as modal operators $[\mathsf{Pri}_B\ \varphi]$ for all sentences $\varphi$ of the logic.
We shall show that this logical system extends the logic of public announcements, the idea being that when we take $B$ to be the full set $\mathcal{A}$ of agents, $\mathsf{Pub}\ \varphi$ and $\mathsf{Pri}_{\mathcal{A}}\ \varphi$ should be equivalent in the appropriate sense.

THE LOGIC OF COMMON KNOWLEDGE OF ALTERNATIVES. If $\vec{\varphi} = \varphi_1, \ldots, \varphi_n$ and $\psi$ are sentences and $B$ is a set of agents, then $[\mathsf{Cka}_B\ \vec{\varphi}]\psi$ is again a sentence. The intended interpretation of this is: assuming that $\varphi_1$ holds, then announcing it publicly to the subgroup $B$ in such a way that it is common knowledge to the set of all agents that the announcement was one of $\vec{\varphi}$ results in a state where $\psi$ holds. For example, consider Scenario 3. In $\mathbf{S}_1$, we have

$$x \models \neg\Box_A \mathsf{H} \wedge \langle\mathsf{Cka}_{\{A\}}\ \mathsf{H}, \mathsf{T}\rangle(\Box_A(\mathsf{H} \wedge \neg\Box_B \Box_A \mathsf{H}) \wedge \Box_B(\Box_A \mathsf{H} \vee \Box_A \mathsf{T})).$$

The action here is A learning that either the coin lies heads up or that it lies tails up, and this is done in such a way that B knows that A learns one of the two alternatives but not which one. Before the action, A does not know that the coin lies heads up. As a result, A knows this, and knows that B does not know it.

Syntactic: sentence $\varphi$; language $\mathcal{L}$, $\mathcal{L}(\Sigma)$, etc.; action signature $\Sigma$; basic action expression $\sigma\vec{\psi}$; program $\pi$, action $\alpha$.
Semantic: epistemic proposition $\varphi$; state model $\mathbf{S}$; update $r$; epistemic program model $(\Sigma, \Gamma, \psi_1, \ldots, \psi_n)$; canonical action model.

Figure 2. The main notions in this paper.

At this point, we have the syntax of some logical languages and also examples of what we intend the semantics to be. We do not yet have the formal semantics of any of these languages, of course. However, even before we turn to this, we want to be clear that our goal in this paper is to study a very wide class of logical systems, including ones for the representation of all possible "announcement types." The reasons to be interested in this approach rather than to study a few separate logics are as follows: (1) it is more general and elegant to have a unified presentation; and (2) it gives a formal account of the notion of an "announcement type" which would be otherwise lacking and which should be of independent interest. So what we will really be concerned with is:

THE LOGIC OF ALL POSSIBLE EPISTEMIC ACTIONS. If $\alpha$ is an epistemic action and $\varphi$ is a sentence, then $[\alpha]\varphi$ is again a sentence. Here $\alpha$ is some sort of epistemic action (which of course we shall define in due course); the point is that $\alpha$ might involve arbitrarily complex patterns of suspicion. In this way, we shall recover the four logical systems mentioned above as fragments of the larger logic of all possible epistemic actions.

Our Claims in This Paper. Here are some of the claims of the paper about the logical languages we shall construct and about our notion of updates.

Figure 3. Languages in this paper.

1. Each type of epistemic action, such as public announcements, completely private announcements, announcements with common knowledge of suspicion, etc, corresponds to a natural collection of updates as we have deﬁned them above. 2. Each type also gives rise to a logical language with that type of action as a primitive construct. Moreover, it is possible to precisely deﬁne the syntax of the language to insure closure under the construct. That is, we should be able to formulate a language with announcements not just about atomic facts, but about announcements themselves, announcements about announcements, etc. 2.5. A Guide to the Concepts in This Paper In this section, we catalog the main notions that we shall introduce in due course. After this, we turn to a discussion of the particular languages that we study, and we situate them with existing logical languages. Recall the we insist on a distinction of syntactic and semantic objects in this paper. We list in Figure 2 the main notions that we will need. We do this mostly to help readers as they explore the different notions. We mention now that the various notions are not developed in the order listed; indeed, we have tried hard to organize this paper in a way that will be easiest to read and understand. For example, one of our main goals is to present a set of languages (syntactic objects) and their semantics (utilizing semantic objects). Languages. This paper studies a number of languages, and to help the reader we list these in Figure 2. Actually, what we study are not individual languages, but rather families of languages parameterized by different choices of primitives. It is standard in modal logic to begin with a set of [ 21 ]

186

LOGICS FOR EPISTEMIC PROGRAMS

atomic propositions, and we do the same. The difference is that we shall call these atomic sentences in order to make a distinction between these essentially syntactic objects and the semantic propositions that we study beginning in Section 2.3. This is our first parameter, a set AtSen of atomic sentences. The second is a set $\mathcal{A}$ of agents. Given these, $L_0$ is ordinary modal logic with the elements of AtSen as atomic sentences and with agent-knowledge (or belief) modalities $\Box_A$ for $A \in \mathcal{A}$. We add common-knowledge operators $\Box^*_{\mathcal{B}}$, for sets of agents $\mathcal{B} \subseteq \mathcal{A}$, to get a larger language $L_1$. In Figure 2, we note the fact that $L_0$ is literally a subset of $L_1$ by using the inclusion arrow. The syntax and semantics of $L_0$ and $L_1$ are presented in Figure 4.

Another close neighbor of the system in this paper is Propositional Dynamic Logic (PDL). PDL was first formulated by Fischer and Ladner (1979), following the introduction of dynamic logic in Pratt (1976). The syntax and the main clauses in the semantics of PDL are shown in Figure 5.

We may also take $L_0$ and close under countable conjunctions (and hence also disjunctions). We call this language $L_0^{\omega}$. Note that $L_1$ is not literally a subset of $L_0^{\omega}$, but there is a translation of $L_1$ into $L_0^{\omega}$ that preserves the semantics. We would indicate this in a chart with a dashed arrow. Going further, we may close under arbitrary (set-sized) boolean operations; this language is then called $L_0^{\infty}$.

PDL is formulated with atomic programs $a$ replacing the agents $A$ that we have fixed. We might note that we can translate $L_1$ into PDL. The main clauses in the translation are:

$$(\Box_A \varphi)^t = [a]\varphi^t, \qquad (\Box^*_{\mathcal{B}} \varphi)^t = \Big[\Big(\bigcup_{b \in \mathcal{B}} b\Big)^{*}\Big]\varphi^t.$$
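The two translation clauses can be read as a recursive function on formulas. The following is a minimal sketch, assuming formulas are represented as nested tuples; the constructor tags (`atom`, `box`, `cbox`, `pbox`, `prog`, `choice`, `star`) are illustrative names introduced here and do not come from the paper.

```python
# Formulas are nested tuples. All constructor tags are hypothetical names
# chosen for this sketch, not notation from the paper.

def translate(phi):
    """Translate an L1 formula into PDL using the two main clauses."""
    tag = phi[0]
    if tag == "atom":
        return phi
    if tag == "not":
        return ("not", translate(phi[1]))
    if tag == "and":
        return ("and", translate(phi[1]), translate(phi[2]))
    if tag == "box":
        # (Box_A phi)^t = [a] phi^t, with one atomic program per agent
        _, agent, body = phi
        return ("pbox", ("prog", agent), translate(body))
    if tag == "cbox":
        # (Box*_B phi)^t = [(union of b in B)*] phi^t
        _, group, body = phi
        choice = ("choice", tuple(("prog", b) for b in sorted(group)))
        return ("pbox", ("star", choice), translate(body))
    raise ValueError(f"unknown constructor: {tag!r}")


knows_h = ("box", "A", ("atom", "H"))
common_h = ("cbox", frozenset({"A", "B"}), knows_h)
print(translate(common_h))
```

The boolean clauses are assumed to commute with the translation, which is the usual reading of "main clauses" but is not spelled out in the text.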

Beginning in Section 4.2, we study some new languages. These will be based on a third parameter, an action signature $\Sigma$. For each “type of epistemic action” there will be an action signature $\Sigma$. For each $\Sigma$, we'll have languages $L_0(\Sigma)$, $L_1(\Sigma)$, and $L(\Sigma)$. More generally, for each family $\mathcal{S}$ of action signatures, we have languages $L_0(\mathcal{S})$, $L_1(\mathcal{S})$, and $L(\mathcal{S})$. These will be related to the other languages as indicated in the figure. So we shall extend modal logic by adding one or another type of epistemic action. The idea is that we can generate a logic from an action signature corresponding to an intuitive action. For example, corresponding to the notion of a public announcement is a particular action signature $\Sigma_{pub}$, and then the languages $L_0(\Sigma_{pub})$, $L_1(\Sigma_{pub})$, and $L(\Sigma_{pub})$ will have something to do with our notion of a “logic of public announcements.”


Figure 4. The languages $L_0$ and $L_1$. For $L_0$, we drop the $\Box^*_{\mathcal{B}}$ construct.

Figure 5. Propositional Dynamic Logic (PDL).

In Baltag et al. (2003), we compare the expressive power of the languages mentioned so far. It turns out that all of the arrows in Figure 3 correspond to proper increases in expressive power. (There is one exception: $L_0$ turns out to equal $L_0(\mathcal{S})$ in expressive power for all $\mathcal{S}$.) It is even more interesting to compare expressive power as we change the action signature. For example, we would like to compare logics of public announcement to logics of private announcement to groups. Most of the natural questions in this area are open as of this writing.

2.6. Reformulation of Test-only PDL

In PDL, there are two types of syntactic objects, sentences and programs. The programs in PDL are interpreted on a structure by relations on that structure. This is not the way our semantics works, and to make this point we compare the standard semantics of (a fragment of) PDL with a language closer to what we shall ultimately study. We consider the test-only fragment of PDL in our terms. This is the fragment built over the empty set of atomic programs. So the programs are skip, the tests $?\varphi$, and compositions, choices, or iterations of these; sentences are formed as in PDL. We give a reformulation in Figure 6. The point is that in PDL the programs are interpreted by relations on a given model, and in our terms programs are interpreted by updates. We have discussed updates of the form $?\varphi$ in Example 2.2. Given that we have


Figure 6. Test-only PDL, with a semantics in our style.

an interpretation of the sentence $\varphi$ as an epistemic proposition $[\![\varphi]\!]$, we then automatically have an update $?[\![\varphi]\!]$. For the sentences, the main thing to look at is the semantics of sentences $[\pi]\varphi$; here we use the semantic notions from Section 2.3. The way the semantics works is that we have $[\![\pi]\!]$ and $[\![\varphi]\!]$; the former is an update and the latter is an epistemic proposition. Then we use both of these to get an overall semantics, using Equation (4). In more explicit terms,

$$[\![[\pi]\varphi]\!]_{\mathbf{S}} \;=\; \big([\,[\![\pi]\!]\,]\,[\![\varphi]\!]\big)_{\mathbf{S}} \;=\; \{\, s \in S : \text{if } s\,[\![\pi]\!]_{\mathbf{S}}\,t, \text{ then } t \in [\![\varphi]\!]_{\mathbf{S}([\![\pi]\!])} \,\}.$$

2.7. Background: Bisimulation

We shall see the concept of bisimulation at various points in the paper, and this section reviews the concept and also develops some of the appropriate definitions concerning bisimulation and updates.

DEFINITION. Let $\mathbf{S}$ and $\mathbf{T}$ be state models. A bisimulation between $\mathbf{S}$ and $\mathbf{T}$ is a relation $R \subseteq S \times T$ such that whenever $s\,R\,t$, the following three properties hold:
1. $s \in \|p\|_{\mathbf{S}}$ iff $t \in \|p\|_{\mathbf{T}}$ for all atomic sentences $p$. That is, $s$ and $t$ agree on all atomic information.
2. For all agents $A$ and states $s'$ such that $s \xrightarrow{A} s'$, there is some state $t'$ such that $t \xrightarrow{A} t'$ and $s'\,R\,t'$.
3. For all agents $A$ and states $t'$ such that $t \xrightarrow{A} t'$, there is some state $s'$ such that $s \xrightarrow{A} s'$ and $s'\,R\,t'$.

EXAMPLE 2.3. This example concerns the update operation $?\varphi$ of Example 2.2. Fix an epistemic proposition $\varphi$ and a state model $\mathbf{S}$. Recall that


$(?\varphi)_{\mathbf{S}}$ relates each $s \in \varphi_{\mathbf{S}}$ to the corresponding pair $(0, s)$. We check that the following relation $R$ is a bisimulation between $\mathbf{S}$ and $\mathbf{S}(?\varphi)$:

$$R \;=\; (?\varphi)_{\mathbf{S}} \,\cup\, \{(s, (1, s)) : s \in S\}.$$

The definition of $\mathbf{S}(?\varphi)$ ensures that the interpretations of atomic sentences are preserved by this relation. Next, suppose that $s \xrightarrow{A} t$ in $\mathbf{S}$ and $s\,R\,(i, s')$. Then we must have $s' = s$. Further, the definition of $R$ tells us that $t\,R\,(1, t)$, and $(i, s) \xrightarrow{A} (1, t)$. Finally, suppose that $s\,R\,(i, s')$ and $(i, s') \xrightarrow{A} (j, t)$. Then $s' = s$ and $j = 1$. In addition, the definition of $\xrightarrow{A}$ in $\mathbf{S}(?\varphi)$ implies that $s \xrightarrow{A} t$ in $\mathbf{S}$. And as above, $t\,R\,(1, t)$. This concludes our verifications.

Recall that $L_0$ is ordinary modal logic, formulated as always over our fixed sets of agents and atomic sentences. The next result concerns the language $L_0^{\infty}$ of infinitary modal logic. In this language, one has conjunctions and disjunctions of arbitrary sets of sentences. We have the following well-known result:

PROPOSITION 2.4. If there is a bisimulation $R$ such that $s\,R\,t$, then $s$ and $t$ agree on all sentences in $L_0^{\infty}$: for all $\varphi \in L_0^{\infty}$, $s \in [\![\varphi]\!]_{\mathbf{S}}$ iff $t \in [\![\varphi]\!]_{\mathbf{T}}$.

A pointed state model is a pair $(\mathbf{S}, s)$ such that $s \in S$. The state $s$ is called the designated state (or the “point”) of our pointed model, and is meant to represent the actual state of the system. Two pointed models are said to be bisimilar if there exists a bisimulation relation between them which relates their designated states. So, denoting by $\equiv$ the relation of bisimilarity, we have:

$(\mathbf{S}, s) \equiv (\mathbf{T}, t)$ iff there is a bisimulation $R$ between $\mathbf{S}$ and $\mathbf{T}$ such that $s\,R\,t$.

This relation $\equiv$ is indeed an equivalence relation. When $\mathbf{S}$ and $\mathbf{T}$ are clear from the context, we write $s \equiv t$ instead of $(\mathbf{S}, s) \equiv (\mathbf{T}, t)$. We say that a proposition $\varphi$ preserves bisimulations if whenever $(\mathbf{S}, s) \equiv (\mathbf{T}, t)$, then $s \in \varphi_{\mathbf{S}}$ iff $t \in \varphi_{\mathbf{T}}$. We also say that an update $r$ preserves bisimulations if the following two conditions hold:
1. If $s\,r_{\mathbf{S}}\,s'$ and $(\mathbf{S}, s) \equiv (\mathbf{T}, t)$, then there is some $t'$ such that $t\,r_{\mathbf{T}}\,t'$ and $(\mathbf{S}(r), s') \equiv (\mathbf{T}(r), t')$.
2. If $t\,r_{\mathbf{T}}\,t'$ and $(\mathbf{S}, s) \equiv (\mathbf{T}, t)$, then there is some $s'$ such that $s\,r_{\mathbf{S}}\,s'$ and $(\mathbf{S}(r), s') \equiv (\mathbf{T}(r), t')$.
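The three clauses of the definition of bisimulation translate directly into a finite check. The sketch below is only illustrative: the dictionary layout of a state model, the helper names, the two-state coin model `S1`, and in particular the construction of $\mathbf{S}(?\varphi)$ (two tagged copies, with all arrows leading into copy 1) are assumptions reconstructed from the verification in Example 2.3, not definitions quoted from the paper.

```python
def successors(model, agent, state):
    """States reachable from `state` by one `agent` arrow."""
    return {y for (x, y) in model["arrows"][agent] if x == state}


def is_bisimulation(R, S, T, agents, atoms):
    """Check the three clauses of the definition for every pair in R."""
    for (s, t) in R:
        # Clause 1: agreement on atomic sentences.
        if any((s in S["val"][p]) != (t in T["val"][p]) for p in atoms):
            return False
        for a in agents:
            s_next, t_next = successors(S, a, s), successors(T, a, t)
            # Clause 2 (forth): every move of s is matched by a move of t.
            if any(all((s2, t2) not in R for t2 in t_next) for s2 in s_next):
                return False
            # Clause 3 (back): every move of t is matched by a move of s.
            if any(all((s2, t2) not in R for s2 in s_next) for t2 in t_next):
                return False
    return True


def question_update(S, phi, agents, atoms):
    """Assumed shape of S(?phi): copy 0 of the phi-states, copy 1 of all states."""
    states = {(0, s) for s in phi} | {(1, s) for s in S["states"]}
    arrows = {a: {((i, s), (1, t)) for (i, s) in states
                  for t in successors(S, a, s)} for a in agents}
    val = {p: {(i, s) for (i, s) in states if s in S["val"][p]} for p in atoms}
    return {"states": states, "arrows": arrows, "val": val}


AGENTS, ATOMS = ["A", "B"], ["H", "T"]
ALL = {(x, y) for x in "st" for y in "st"}
# Assumed coin model: s is heads, t is tails, neither agent can tell them apart.
S1 = {"states": {"s", "t"},
      "arrows": {"A": set(ALL), "B": set(ALL)},
      "val": {"H": {"s"}, "T": {"t"}}}
PHI = {"s"}  # extension of the proposition H in S1
S1_Q = question_update(S1, PHI, AGENTS, ATOMS)
R = {(s, (0, s)) for s in PHI} | {(s, (1, s)) for s in S1["states"]}
print(is_bisimulation(R, S1, S1_Q, AGENTS, ATOMS))
```

A relation that pairs a heads state with a tails state fails clause 1 immediately, which is a quick way to see the checker reject a non-bisimulation.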


PROPOSITION 2.5. Concerning bisimulation preservation:
1. The bisimulation preserving propositions include the atomic propositions $p$, and they are closed under all of the (infinitary) operations on propositions.
2. The bisimulation preserving updates are closed under composition and (infinitary) sums.
3. If $\varphi$ and $r$ preserve bisimulations, so does $[r]\varphi$.

Proof. We show the last part. Suppose that $s \in ([r]\varphi)_{\mathbf{S}}$, and let $(\mathbf{S}, s) \equiv (\mathbf{T}, t)$. To show that $t \in ([r]\varphi)_{\mathbf{T}}$, let $t\,r_{\mathbf{T}}\,t'$. Then by condition (2) above, there is some $s'$ such that $s\,r_{\mathbf{S}}\,s'$ and $(\mathbf{S}(r), s') \equiv (\mathbf{T}(r), t')$. Since $s \in ([r]\varphi)_{\mathbf{S}}$, we have $s' \in \varphi_{\mathbf{S}(r)}$. And then $t' \in \varphi_{\mathbf{T}(r)}$, since $\varphi$ too preserves bisimulation.

Endnotes. As far as we know, the first paper to study the interaction of communication and knowledge in a formal setting is Plaza's paper “Logics of Public Communications” (Plaza 1989). As the title suggests, the epistemic actions studied are public announcements. In essence, he formalized the logic of public announcements. (In addition to this, Plaza (1989) contains a number of results special to the logic of announcements which we have not generalized, and it also studies an extension of the logic with non-rigid constants.) The same formalization was made in Gerbrandy (1999a, b), and also Gerbrandy and Groeneveld (1997). These papers further formalize the logic of completely private announcements. (However, their semantics use non-wellfounded sets rather than arbitrary Kripke models. As pointed out in Moss (1999), this restriction was not necessary.) The logic of common knowledge of alternatives was formulated in Baltag et al. (1998) and also in van Ditmarsch's dissertation (2000). Our introduction of updates is new here, as are the observations on the test-only fragment of PDL. For connections of the ideas here with coalgebra, see Baltag (2003).

One very active arena for work on knowledge is distributed systems, and the main source of this work is the book Reasoning About Knowledge (Fagin et al. 1996). We depart from Fagin et al. (1996) by introducing operators whose semantics are updates as we have defined them, and by doing without temporal logic operators. In effect, our Kripke models are simpler, since they do not incorporate all of the runs of a system; the new operators can be viewed as a compensation for that.

REMARK. Our formulation of a program model uses a designated set of simple actions. There are other equivalent formulations. Another way would be to use a disjoint union of pointed program models; it would be


possible to further reformulate some of our deﬁnitions below and thereby to give a semantics for our ultimate languages that differs from what we obtain in Section 4.2 below. However, the two semantics would be equivalent. The reason we prefer to work with designated sets is that it permits us to draw diagrams with fewer states. The cost of the simpler representations is the slightly more complicated deﬁnition, but we feel this cost is worth paying.

3. THE UPDATE PRODUCT OPERATION

In this section, we present the centerpiece of the formulation of our logical systems by introducing action models, program models, and an update product operation. The leading idea is that epistemic actions, like state models, have the property that different agents think that different things are possible. To model the effect of an epistemic update on a state, we use a kind of product of epistemic alternatives.

3.1. Epistemic Action Models

Let $\Phi$ be the collection of all epistemic propositions. An epistemic action model is a triple $\boldsymbol{\Sigma} = (\Sigma, \xrightarrow{A}, pre)$, where $\Sigma$ is a set of simple actions, $\xrightarrow{A}$ is an $\mathcal{A}$-indexed family of relations on $\Sigma$, and $pre : \Sigma \to \Phi$.

An epistemic action model is similar to a state model. But we call the members of the set $\Sigma$ “simple actions” (instead of states). We use different notation and terminology because of a technical difference and a bigger conceptual point. The technical difference is that $pre : \Sigma \to \Phi$ (that is, the codomain is the collection of all epistemic propositions). The conceptual point is that we think of “simple” actions as being deterministic actions whose epistemic impact is uniform on states (in the sense explained in our Introduction). So we think of “simple” actions as particularly simple kinds of deterministic actions, whose appearance to agents is uniform: the agents' uncertainties concerning the current action are independent of their uncertainties concerning the current state. This allows us to abstract away the action uncertainties and represent them as a Kripke structure of actions, in effect forgetting the state uncertainties.

As announced in the Introduction, this uniformity of appearance is restricted only to the action's domain of applicability, defined by its preconditions. Thus, for a simple action $\sigma \in \Sigma$, we interpret $pre(\sigma)$ as giving the precondition of $\sigma$; this is what needs to hold at a state (in a state model) in order for action $\sigma$ to be “accepted” in that state. So $\sigma$ will be executable in $s$ iff its precondition $pre(\sigma)$ holds at $s$.


At this point we have mentioned the ways in which action models and state models differ. What they have in common is that they use accessibility relations to express each agent’s uncertainty concerning something. For state models, the uncertainty has to do with which state is the real one; for action models, it has to do with which action is taking place. Usually we drop the word “epistemic” and therefore refer to “action models”. EXAMPLE 3.1. Here is an action model:

Formally, $\Sigma = \{\sigma, \tau\}$; $\sigma \xrightarrow{A} \sigma$, $\sigma \xrightarrow{B} \tau$, $\tau \xrightarrow{A} \tau$, $\tau \xrightarrow{B} \tau$; $pre(\sigma) = H$, and $pre(\tau) = tr$, where recall that tr is the “always true” proposition. As we shall see, this action model will be used in the modeling of a completely private announcement to A that the coin is lying heads up. Further examples may be found later in this section.

3.2. Program Models

To model non-deterministic actions and non-simple actions (whose appearances to agents are not uniform on states), we define epistemic program models. In effect, this means that we decompose complex actions (“programs”) into “simple” ones: they correspond to sets of simple, deterministic actions from a given action model.

An epistemic program model is defined as a pair $\pi = (\boldsymbol{\Sigma}, \Gamma)$ consisting of an action model $\boldsymbol{\Sigma}$ and a set $\Gamma \subseteq \Sigma$ of designated simple actions. Each of the simple actions $\gamma \in \Gamma$ can be thought of as being a possible “deterministic resolution” of the non-deterministic action $\pi$. As announced above, the intuition about the map pre is that an action is executable in a given state only if all its preconditions hold at that state. We often spell out an epistemic program model as $(\Sigma, \xrightarrow{A}, pre, \Gamma)$ rather than $((\Sigma, \xrightarrow{A}, pre), \Gamma)$. When drawing the diagrams, we use doubled circles to indicate the designated actions in the set $\Gamma$. Finally, we usually drop the word “epistemic” and just refer to these as program models.

EXAMPLE 3.2. Every action model $\boldsymbol{\Sigma}$ and every $\sigma \in \Sigma$ gives a program model by taking $\{\sigma\}$ as the set of designated simple actions. For instance, in connection with Example 3.1, we have


with $pre(\sigma) = H$, $pre(\tau) = tr$, as before. Program models of this type are very common in our treatment. But we need the extra ability to have sets of designated simple actions to deal with more complicated actions, as our next examples show.

EXAMPLE 3.3. A Non-deterministic Action. Let us model the non-deterministic action of either making a completely private announcement to A that the coin is lying heads up, or not making any announcement. The action is completely private, so B doesn't suspect anything: he thinks no announcement is made. The program model is obtained by choosing $\Gamma = \{\sigma, \tau\}$ in the action model from Example 3.1. The picture is

with pre(σ ) = H, pre(τ ) = tr, as before. Alternatively, one could take the disjoint union of with the one-action program model with precondition tr. EXAMPLE 3.4. A Deterministic, but Non-simple Action. Let us model the action of (completely) privately announcing to A whether the coin is lying heads up or not. Observe that this is a deterministic action; that is, the update relation is functional: at any state, the coin is either heads up or not. But the action is not simple: its appearance to A depends on the state. In states in which the coin is heads up, this action looks to A like a private announcement that H is the case; in the states in which the coin is not heads up, the action looks to A like a private announcement that ¬H. (However, the appearance to B is uniform: at any state, the action appears to him as if no announcement is made.) The only way to model this deterministic, but non-simple, action in our setting is as a non-deterministic program model, having as its ‘designated’ actions two mutually exclusive simple actions: one corresponding to a (truthful) private announcement that H, and another one corresponding to a (truthful) private announcement that ¬H.

with pre(σ) = H, pre(τ) = tr, and pre(ρ) = ¬H.

3.3. The Update Product of a State Model with an Epistemic Action Model

The following operation plays a central role in this paper. Given a state model $\mathbf{S} = (S, \xrightarrow{A}_{\mathbf{S}}, \|\cdot\|_{\mathbf{S}})$ and an action model $\boldsymbol{\Sigma} = (\Sigma, \xrightarrow{A}, pre)$, we define their update product to be the state model

$$\mathbf{S} \otimes \boldsymbol{\Sigma} \;=\; (S \otimes \Sigma, \xrightarrow{A}, \|\cdot\|),$$

given by the following: the new states are pairs of old states $s$ and simple actions $\sigma$ which are “consistent”, in the sense that all preconditions of the action $\sigma$ “hold” at the state $s$:

$$S \otimes \Sigma \;=\; \{(s, \sigma) \in S \times \Sigma : s \in pre(\sigma)_{\mathbf{S}}\}. \tag{5}$$

The new accessibility relations are taken to be the “products” of the corresponding accessibility relations in the two frames; i.e., for $(s, \sigma), (s', \sigma') \in S \otimes \Sigma$ we put

$$(s, \sigma) \xrightarrow{A} (s', \sigma') \quad\text{iff}\quad s \xrightarrow{A} s' \text{ and } \sigma \xrightarrow{A} \sigma', \tag{6}$$

and the new valuation map $\|\cdot\|_{\mathbf{S} \otimes \boldsymbol{\Sigma}} : \mathrm{AtSen} \to \mathcal{P}(S \otimes \Sigma)$ is essentially given by the old valuation:

$$\|p\|_{\mathbf{S} \otimes \boldsymbol{\Sigma}} \;=\; \{(s, \sigma) \in S \otimes \Sigma : s \in \|p\|_{\mathbf{S}}\}. \tag{7}$$

Intended Interpretation. The update product restricts the full Cartesian product $S \times \Sigma$ to the smaller set $S \otimes \Sigma$ in order to ensure that states survive actions in the appropriate sense.

For each agent $A$, the product arrows $\xrightarrow{A}$ on the output frame represent agent $A$'s epistemic uncertainty about the output state. The intuition is that the components of our action models are “simple actions”, so the uncertainty regarding the action is assumed to be independent of the uncertainty regarding the current (input) state. This independence allows us to “multiply” these two uncertainties in order to compute the uncertainty regarding the output state: if whenever the input state is $s$, agent $A$ thinks the input might be some other state $s'$, and if whenever the current action happening is $\sigma$, agent $A$ thinks the current action might be some other action $\sigma'$, and if $s'$ survives $\sigma'$, then whenever the output state $(s, \sigma)$ is reached, agent $A$ thinks the alternative output state $(s', \sigma')$ might have been reached. Moreover, all of the output states that $A$ considers possible are of this form.
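Equations (5), (6), and (7) can be computed mechanically on finite models. The following is a minimal sketch under stated assumptions: models are plain dictionaries, preconditions are modelled as functions from a state model to a set of states, and `S1` is an assumed two-state coin model (heads at `s`, tails at `t`, both agents uncertain). The action model is the one from Example 3.1; all Python names are illustrative.

```python
def update_product(S, Sigma, agents):
    """Compute the update product of a state model and an action model."""
    # Equation (5): keep pairs (s, sigma) where s satisfies pre(sigma).
    states = {(s, a) for s in S["states"] for a in Sigma["actions"]
              if s in Sigma["pre"][a](S)}
    # Equation (6): product of the two accessibility relations.
    arrows = {ag: {(x, y) for x in states for y in states
                   if (x[0], y[0]) in S["arrows"][ag]
                   and (x[1], y[1]) in Sigma["arrows"][ag]}
              for ag in agents}
    # Equation (7): atomic facts are inherited from the first component.
    val = {p: {x for x in states if x[0] in ext}
           for p, ext in S["val"].items()}
    return {"states": states, "arrows": arrows, "val": val}


AGENTS = ["A", "B"]
ALL = {(x, y) for x in "st" for y in "st"}
# Assumed coin model: s is heads, t is tails, full uncertainty for A and B.
S1 = {"states": {"s", "t"},
      "arrows": {"A": set(ALL), "B": set(ALL)},
      "val": {"H": {"s"}, "T": {"t"}}}

# Action model of Example 3.1: pre(sigma) = H, pre(tau) = tr.
SIGMA = {"actions": {"sigma", "tau"},
         "arrows": {"A": {("sigma", "sigma"), ("tau", "tau")},
                    "B": {("sigma", "tau"), ("tau", "tau")}},
         "pre": {"sigma": lambda M: set(M["val"]["H"]),
                 "tau": lambda M: set(M["states"])}}

RESULT = update_product(S1, SIGMA, AGENTS)
print(sorted(RESULT["states"]))
```

On this input the product has three states, since the pair of the tails state with the heads announcement is discarded by equation (5).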


As for the valuation, we essentially take the same valuation as the one in the input model. If a state $s$ survives an action, then the same facts $p$ hold at the output state $(s, \sigma)$ as at the input state $s$. This means that our actions, if successful, do not change the facts. This condition can of course be relaxed in various ways, to allow for fact-changing actions. But in this paper we are primarily concerned with purely epistemic actions, such as the earlier examples in this section.

3.4. Updates Induced by Program Models

Recall that we defined updates and proper epistemic actions in Section 2.3. Right above, in Section 3.2, we defined epistemic program models. Note that there is a big difference: the updates are pairs of operations on the class of all state models, and the program models are typically finite structures. We think of program models as capturing specific mechanisms, or algorithms, for inducing updates. This connection is made precise in the following definition.

DEFINITION. Let $(\boldsymbol{\Sigma}, \Gamma)$ be a program model. We define an update which we also denote $(\boldsymbol{\Sigma}, \Gamma)$ as follows:
1. $\mathbf{S}(\boldsymbol{\Sigma}, \Gamma) = \mathbf{S} \otimes \boldsymbol{\Sigma}$.
2. $s\,(\boldsymbol{\Sigma}, \Gamma)_{\mathbf{S}}\,(t, \sigma)$ iff $s = t$ and $\sigma \in \Gamma$.
We call this the update induced by $(\boldsymbol{\Sigma}, \Gamma)$.

Bisimulation Preservation. Before moving to examples, we note a simple result that will be used later.

PROPOSITION 3.5. Let $\boldsymbol{\Sigma}$ be an action model in which every $pre(\sigma)$ is a bisimulation preserving epistemic proposition. Let $\Gamma \subseteq \Sigma$ be arbitrary. Then the update induced by $(\boldsymbol{\Sigma}, \Gamma)$ preserves bisimulation.

Proof. Write $r$ for the update induced by the program model $(\boldsymbol{\Sigma}, \Gamma)$. Fix $\mathbf{S}$ and $\mathbf{T}$, and suppose that $s \equiv t$ via the relation $R_0$. Suppose that $s\,r_{\mathbf{S}}\,s'$, so $s' \in \mathbf{S}(r)$ is of the form $(s, \sigma)$ for some $\sigma \in \Gamma$. Since $pre(\sigma)$ preserves bisimulation, $(t, \sigma) \in \mathbf{T}(r)$, and clearly $t\,r_{\mathbf{T}}\,(t, \sigma)$. We need only show that $(s, \sigma) \equiv (t, \sigma)$. But the following relation $R$ is a bisimulation between $\mathbf{S}(r)$ and $\mathbf{T}(r)$:

$$(s', \tau_1)\,R\,(t', \tau_2) \quad\text{iff}\quad s'\,R_0\,t' \text{ and } \tau_1 = \tau_2.$$

The verification of the bisimulation properties is easy. And $R$ shows that $(s, \sigma) \equiv (t, \sigma)$, as desired.
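The second clause of the definition of an induced update is a one-line relation once the extension of each precondition is known. Here is a small sketch under stated assumptions: preconditions are given directly as the sets of states where they hold, the states `s` (heads) and `t` (tails) come from an assumed coin model, and the function name is hypothetical.

```python
def induced_update(states, pre, designated):
    """Relate s to (s, sigma) for each designated sigma executable at s."""
    return {(s, (s, a)) for s in states for a in designated if s in pre[a]}


STATES = {"s", "t"}
# Extensions of pre(sigma) = H and pre(tau) = tr in the assumed coin model.
PRE = {"sigma": {"s"}, "tau": {"s", "t"}}

# One designated action: the update relation is a partial function.
single = induced_update(STATES, PRE, {"sigma"})
# Two designated actions, as in Example 3.3: state s has two outcomes.
choice = induced_update(STATES, PRE, {"sigma", "tau"})
print(sorted(single), sorted(choice))
```

With both actions designated, the heads state is related to two output states, which is the sense in which the program model of Example 3.3 is non-deterministic.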


3.5. The Coin Scenario Models as Examples of the Update Product

We return to the coin scenarios of Section 1.1. Our purpose is partly to indicate how one may obtain the models there as examples of the update product, and at the same time to exemplify the update product construction itself. In this section, the set $\mathcal{A}$ of agents is $\{A, B\}$, and the set AtSen of atomic sentences is $\{H, T\}$. We remind the reader that T represents the coin lying tails up, while tr is our notation for the true epistemic proposition.

EXAMPLE 3.6. We begin with an example worked out with many details, and then the rest of our examples will omit many similar calculations. This example has to do with Scenario 4 from Section 1.1, where the coin lies heads up and A takes a look at the coin in such a way that she is certain that B does not suspect anything. We take as $\mathbf{S}_1$ and $\boldsymbol{\Sigma}_4$ the structures shown below:

(We remind the reader that T is the atomic sentence for “tails” and tr is for “true”.) $\mathbf{S}_1$ is from Scenario 1. In $\boldsymbol{\Sigma}_4$, we take the set of distinguished actions to be $\{\sigma\}$. $\boldsymbol{\Sigma}_4$ comes from Examples 3.1 and 3.2.

To take the update product, we first form the cartesian product $S_1 \times \Sigma_4$:

$$\{(s, \sigma), (s, \tau), (t, \sigma), (t, \tau)\}$$

Of these four pairs, we only want those whose first component satisfies (in $\mathbf{S}_1$) the precondition of the second component. We do not want $(t, \sigma)$, since $pre(\sigma) = H$ and $t \notin [\![H]\!]_{\mathbf{S}_1}$. But the other three pairs do satisfy our condition. So the state model $\mathbf{S}_1 \otimes \boldsymbol{\Sigma}_4$ will have three states: $(s, \sigma)$, $(s, \tau)$, and $(t, \tau)$. The atomic information is inherited from the first component, so we have $[\![H]\!]_{\mathbf{S}_1 \otimes \boldsymbol{\Sigma}_4} = \{(s, \sigma), (s, \tau)\}$ and $[\![T]\!]_{\mathbf{S}_1 \otimes \boldsymbol{\Sigma}_4} = \{(t, \tau)\}$. The accessibility relations $\xrightarrow{A}$ and $\xrightarrow{B}$ are those of the product. For example, we have $(s, \sigma) \xrightarrow{B} (s, \tau)$, because $s \xrightarrow{B} s$ and $\sigma \xrightarrow{B} \tau$. But we do not have $(s, \sigma) \xrightarrow{A} (s, \tau)$, because $\sigma \xrightarrow{A} \tau$ is false. Now, we rename the states as follows:

$$(s, \sigma) \mapsto s, \qquad (s, \tau) \mapsto t, \qquad (t, \tau) \mapsto u.$$

And then we get a picture of this state model, the same one we had in Scenario 4:


The dotted line here shows the update relation between $s$ and $(s, \sigma)$. This is the only update relation. For example, we do not relate $s$ and $(s, \tau)$ because $\tau \notin \Gamma = \{\sigma\}$. Let $\mathbf{S}_4 = \mathbf{S}_1 \otimes \boldsymbol{\Sigma}_4$.

EXAMPLE 3.7. $\boldsymbol{\Sigma}_4$ from Example 3.6 represents the action where A cheats, learning heads or tails in a way which is oblivious to B. It is also natural to consider an action of A privately learning the state of the coin. (This action may take place even if the state is tails.) We could use the following program model $\boldsymbol{\Sigma}_4'$:

This program model consists of two disjoint components. We leave it to the reader to calculate $\mathbf{S}_1 \otimes \boldsymbol{\Sigma}_4'$, and also the relation between “before” and “after”.

EXAMPLE 3.8. Next we construct the model corresponding to Scenario 5, where B cheats after Scenario 4. We consider the update product of the state model $\mathbf{S}_4$ from Example 3.6 above with the program model $\boldsymbol{\Sigma}_5$ shown below:

It represents cheating by B. The update product $\mathbf{S}_4 \otimes \boldsymbol{\Sigma}_5$ has five states: $(s, \sigma)$, $(t, \sigma)$, $(s, \tau)$, $(t, \tau)$, and $(u, \tau)$. Notice that $(u, \sigma)$ is omitted, since $u \notin [\![H]\!]_{\mathbf{S}_4}$. We leave it to the reader to calculate the accessibility relations in $\mathbf{S}_4 \otimes \boldsymbol{\Sigma}_5$, and to draw the appropriate figure.

Incidentally, we posed in Section 1.1 the exercise of constructing this representation from first principles. Many people are able to construct the five-state picture, and some others construct a related picture with seven states. The seven-state picture is bisimilar to the one illustrated here.

EXAMPLE 3.9. We next look back at Scenario 2. The simplest action structure is $\boldsymbol{\Sigma}_2$: (8)


It represents a public announcement to A and B that the coin is lying heads up. Here, the distinguished set is the entire action structure. For the record, we formalize $\Sigma_2$ as a singleton set $\{\mathrm{Pub}\}$. We have $\mathrm{Pub} \xrightarrow{A} \mathrm{Pub}$ and $\mathrm{Pub} \xrightarrow{B} \mathrm{Pub}$. Also, we set $pre(\mathrm{Pub}) = H$. We did not put the name Pub in the representation in (8), but in situations where we want the name we would draw the same picture except that instead of H we would say Pub : H.

Let us call the same structure $\mathbf{S}_2$ when we view it as a state model; formally these are the same object. Let $\mathbf{S}$ be any model with the property that every state has both a successor in $\xrightarrow{A}$ where H is true, and also a successor in $\xrightarrow{B}$ where H is true. Then $\mathbf{S} \otimes \boldsymbol{\Sigma}_2$ is bisimilar to $\mathbf{S}_2$. In particular, $\mathbf{S}_1 \otimes \boldsymbol{\Sigma}_2$ is bisimilar to $\mathbf{S}_2$.

EXAMPLE 3.10. Let $\boldsymbol{\Sigma}_3$ be the following program model:

$\boldsymbol{\Sigma}_3$ represents an announcement to A of heads in the manner of Scenario 3. That is, B knows that A either sees heads or sees tails, but not which. Similarly, let $\boldsymbol{\Sigma}_3'$ represent the same kind of announcement to B:

Then we have the following:
$\mathbf{S}_1 \otimes \boldsymbol{\Sigma}_3 \cong \mathbf{S}_3$, where $\mathbf{S}_3$ is the model in Scenario 3 and $\cong$ denotes isomorphism.
$\mathbf{S}_3 \otimes \boldsymbol{\Sigma}_3' \cong \mathbf{S}_2$. This conforms with the intuition that successive semiprivate viewings by the two parties of the concealed coin amount to a public viewing.
$\mathbf{S}_3 \otimes \boldsymbol{\Sigma}_3 \cong \mathbf{S}_3$. There is no point for A to look twice.
$\mathbf{S}_4 \otimes \boldsymbol{\Sigma}_3$ is a three-state model bisimilar to the model $\mathbf{S}_4$ from Scenario 4.

EXAMPLE 3.11. To obtain the model in Scenario 7, we use the following program model $\boldsymbol{\Sigma}_7$:


with $pre(\rho) = tr$, $pre(\sigma) = H$, and $pre(\tau) = T$. As illustrated, the set $\Gamma$ is the entire set $\Sigma_7$ of simple actions. More generally, the general shape of $\boldsymbol{\Sigma}_7$ is the frame for an action which has three possibilities. A learns which one happens, and B merely learns that one of the three happens. Further, these two aspects are common knowledge. We omit the calculation that shows that $\mathbf{S}_1 \otimes \boldsymbol{\Sigma}_7$ is the model drawn in Scenario 7.

EXAMPLE 3.12. The program model employed in Scenario 8 is $\boldsymbol{\Sigma}_8$, shown below:

Again we take $pre(\rho) = tr$, $pre(\sigma) = H$, and $pre(\tau) = T$. The difference between this and $\boldsymbol{\Sigma}_7$ above is that instead of $\Gamma = \Sigma_7$, we take $\Gamma$ to be {s}. Then $\mathbf{S}_1 \otimes \boldsymbol{\Sigma}_8 \cong \mathbf{S}_8$.

3.6. Operations on Program Models

1 and 0. We define program models 1 and 0 as follows: 1 is a one-action set $\{\sigma\}$ with $\sigma \xrightarrow{A} \sigma$ for all $A$, $pre(\sigma) = tr$, and with distinguished set $\{\sigma\}$. The point here is that the update induced by this program model is exactly the update 1 from Section 2.3. We purposely use the same notation. Similarly, we let 0 be the empty program model. Then its induced update is what we called 0 in Section 2.3.

Sequential Composition. In all settings involving “actions” in some sense or other, sequential composition is a natural operation. In our setting, we would like to define a composition operation on program models, corresponding to the sequential composition of updates. Here is the relevant definition.


Let $\boldsymbol{\Sigma} = (\Sigma, \xrightarrow{A}_{\Sigma}, pre_{\Sigma}, \Gamma_{\Sigma})$ and $\boldsymbol{\Delta} = (\Delta, \xrightarrow{A}_{\Delta}, pre_{\Delta}, \Gamma_{\Delta})$ be program models. We define the composition

$$\boldsymbol{\Sigma} ; \boldsymbol{\Delta} \;=\; (\Sigma \times \Delta, \xrightarrow{A}, pre_{\Sigma;\Delta}, \Gamma_{\Sigma;\Delta})$$

to be the following program model:
1. $\Sigma \times \Delta$ is the cartesian product of the sets $\Sigma$ and $\Delta$.
2. $\xrightarrow{A}$ in the composition $\boldsymbol{\Sigma};\boldsymbol{\Delta}$ is the family of product relations, in the natural way: $(\sigma, \delta) \xrightarrow{A} (\sigma', \delta')$ iff $\sigma \xrightarrow{A} \sigma'$ and $\delta \xrightarrow{A} \delta'$.
3. $pre_{\Sigma;\Delta}(\sigma, \delta) = \langle(\boldsymbol{\Sigma}, \sigma)\rangle\, pre_{\Delta}(\delta)$.
4. $\Gamma_{\Sigma;\Delta} = \Gamma_{\Sigma} \times \Gamma_{\Delta}$.

In the definition of pre, $(\boldsymbol{\Sigma}, \sigma)$ is an abbreviation for the update $(\boldsymbol{\Sigma}, \{\sigma\})$, as defined in Section 3.4, i.e., the update induced by the program model $(\boldsymbol{\Sigma}, \{\sigma\})$.

EXAMPLE 3.13. This example constructs a program model for lying, as we find in Scenario 6. Lying in our subject cannot be taken to be simply a case of private or public announcement: this will not work out. In our given situation, B simply knows that A doesn't know the side of the coin, and hence cannot accept any lying announcement that would claim such knowledge. One way to make sense of the action of A (successfully) lying to B is to assume that, first, before the lying, a suspicion was aroused in B that A might have privately learnt (e.g., by opening the box, or being told) which side of the coin was lying up; then second, that B subsequently receives an untruthful announcement that A knows the coin is lying heads up, an announcement which is known to be false by A herself (but which is believable, and now believed, by B). Obviously, we cannot express things about past actions in our logic, so we have to start right at the beginning, before the lying announcement is sent, and capture the whole action of successful lying as a sequential composition of two actions: B's suspicion of A's private learning, followed by B's receiving (and believing) the lying announcement. This is what we shall do here.4 Let $\varphi$ be $\neg\Box_A H$ and let $\psi$ be $H \wedge \Box_A H$. Let $\boldsymbol{\Sigma}_6$ be

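The idea is that B “learns” a false statement, namely that A knows the state of the coin. Further, we saw $\boldsymbol{\Sigma}_8$ in Example 3.12. We consider $\boldsymbol{\Sigma}_8 ; \boldsymbol{\Sigma}_6$. One can check using Proposition 3.14 that

$$\mathbf{S}_1 \otimes (\boldsymbol{\Sigma}_8 ; \boldsymbol{\Sigma}_6) \;\cong\; (\mathbf{S}_1 \otimes \boldsymbol{\Sigma}_8) \otimes \boldsymbol{\Sigma}_6 \;\cong\; \mathbf{S}_8 \otimes \boldsymbol{\Sigma}_6 \;\cong\; \mathbf{S}_6.$$

In addition, one can calculate $\boldsymbol{\Sigma}_8 ; \boldsymbol{\Sigma}_6$ explicitly to see what a reasonable one-step program for lying would be. The interesting point is that its preconditions are complex sentences having to do with actions in our language.

Disjoint Union. If $\boldsymbol{\Sigma} = (\Sigma, \xrightarrow{A}_{\Sigma}, pre_{\Sigma}, \Gamma_{\Sigma})$ and $\boldsymbol{\Delta} = (\Delta, \xrightarrow{A}_{\Delta}, pre_{\Delta}, \Gamma_{\Delta})$, we take $\boldsymbol{\Sigma} \sqcup \boldsymbol{\Delta}$ to be the disjoint union of the models, with union of the distinguished actions. The intended meaning is the non-deterministic choice between the programs represented by $\boldsymbol{\Sigma}$ and $\boldsymbol{\Delta}$. Here is the definition in more detail, generalized to arbitrary (possibly infinite) disjoint unions: let $\{\boldsymbol{\Sigma}_i\}_{i \in I}$ be a family of program models, with $\boldsymbol{\Sigma}_i = (\Sigma_i, \xrightarrow{A}_i, pre_i, \Gamma_i)$; we define their (disjoint) union

$$\bigsqcup_{i \in I} \boldsymbol{\Sigma}_i \;=\; \Big(\bigsqcup_{i \in I} \Sigma_i,\; \xrightarrow{A},\; pre,\; \Gamma\Big)$$

to be the model given by:
1. $\bigsqcup_{i \in I} \Sigma_i$ is $\bigcup_{i \in I} (\Sigma_i \times \{i\})$, the disjoint union of the sets $\Sigma_i$.
2. $(\sigma, i) \xrightarrow{A} (\tau, j)$ iff $i = j$ and $\sigma \xrightarrow{A}_i \tau$.
3. $pre(\sigma, i) = pre_i(\sigma)$.
4. $\Gamma = \bigcup_{i \in I} (\Gamma_i \times \{i\})$.

Iteration. Finally, we define an iteration operation by $\boldsymbol{\Sigma}^{*} = \bigsqcup \{\boldsymbol{\Sigma}^{n} : n \in \mathbb{N}\}$. Here $\boldsymbol{\Sigma}^{0} = 1$, and $\boldsymbol{\Sigma}^{n+1} = \boldsymbol{\Sigma}^{n} ; \boldsymbol{\Sigma}$.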
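The four clauses of the disjoint union amount to tagging each simple action with the index of its component. The sketch below covers finite families only; the dictionary layout, the use of strings as stand-ins for preconditions, and the one-action `SKIP` model are assumptions made for illustration.

```python
def disjoint_union(models, agents):
    """Disjoint union of finitely many program models, tagged by index."""
    tagged = list(enumerate(models))
    # Clause 1: the carrier is the union of the sets Sigma_i x {i}.
    actions = {(a, i) for i, m in tagged for a in m["actions"]}
    # Clause 2: arrows never cross between components.
    arrows = {ag: {((a, i), (b, i)) for i, m in tagged
                   for (a, b) in m["arrows"][ag]} for ag in agents}
    # Clause 3: preconditions are inherited from the component.
    pre = {(a, i): m["pre"][a] for i, m in tagged for a in m["actions"]}
    # Clause 4: designated actions are the tagged union of the Gamma_i.
    designated = {(a, i) for i, m in tagged for a in m["designated"]}
    return {"actions": actions, "arrows": arrows,
            "pre": pre, "designated": designated}


AGENTS = ["A", "B"]
# Private announcement of H to A (Examples 3.1 and 3.2); preconditions are
# kept as plain labels here because the union does not inspect them.
PRIVATE = {"actions": {"sigma", "tau"},
           "arrows": {"A": {("sigma", "sigma"), ("tau", "tau")},
                      "B": {("sigma", "tau"), ("tau", "tau")}},
           "pre": {"sigma": "H", "tau": "tr"},
           "designated": {"sigma"}}
# Hypothetical one-action model with precondition tr.
SKIP = {"actions": {"skip"},
        "arrows": {"A": {("skip", "skip")}, "B": {("skip", "skip")}},
        "pre": {"skip": "tr"},
        "designated": {"skip"}}

UNION = disjoint_union([PRIVATE, SKIP], AGENTS)
print(sorted(UNION["designated"]))
```

Iteration is an infinite union, so it is outside what this finite sketch computes directly; a bounded approximation would take the union of the first few powers under composition.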

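We conclude by verifying that our definitions of the operations on program models are correct in the sense that they are faithful to the corresponding operations on updates from Section 2.3.

PROPOSITION 3.14. The update induced by a composition of program models is the composition of the induced updates. Similarly for sums and iteration, mutatis mutandis.

Proof. Let $(\boldsymbol{\Sigma}, \Gamma_{\Sigma})$ and $(\boldsymbol{\Delta}, \Gamma_{\Delta})$ be program models. We denote by $r$ the update induced by $(\boldsymbol{\Sigma}, \Gamma_{\Sigma})$, by $s$ the update induced by $(\boldsymbol{\Delta}, \Gamma_{\Delta})$, and by $t$ the update induced by $(\boldsymbol{\Sigma}, \Gamma_{\Sigma}) ; (\boldsymbol{\Delta}, \Gamma_{\Delta})$. We need to prove that $r ; s = t$.

Let $\mathbf{S} = (S, \xrightarrow{A}_{\mathbf{S}}, [\![\cdot]\!]_{\mathbf{S}})$ be a state model. Recall that

$$\mathbf{S}(r ; s) \;=\; \mathbf{S}(r)(s) \;=\; (\mathbf{S} \otimes (\boldsymbol{\Sigma}, \Gamma_{\Sigma})) \otimes (\boldsymbol{\Delta}, \Gamma_{\Delta}).$$

We claim that this is isomorphic to $\mathbf{S} \otimes (\boldsymbol{\Sigma};\boldsymbol{\Delta}, \Gamma_{\Sigma;\Delta})$, and indeed the isomorphism is $(s, (\sigma, \delta)) \mapsto ((s, \sigma), \delta)$. We check that $(s, (\sigma, \delta)) \in \mathbf{S} \otimes (\boldsymbol{\Sigma};\boldsymbol{\Delta})$ iff $((s, \sigma), \delta) \in (\mathbf{S} \otimes \boldsymbol{\Sigma}) \otimes \boldsymbol{\Delta}$. Indeed, the following are equivalent:
1. $(s, (\sigma, \delta)) \in \mathbf{S} \otimes (\boldsymbol{\Sigma};\boldsymbol{\Delta})$.
2. $s \in pre_{\Sigma;\Delta}(\sigma, \delta)_{\mathbf{S}}$.
3. $s \in (\langle(\boldsymbol{\Sigma}, \sigma)\rangle\, pre_{\Delta}(\delta))_{\mathbf{S}}$.
4. $(s, \sigma) \in \mathbf{S} \otimes \boldsymbol{\Sigma}$ and $(s, \sigma) \in pre_{\Delta}(\delta)_{\mathbf{S} \otimes \boldsymbol{\Sigma}}$.
5. $((s, \sigma), \delta) \in (\mathbf{S} \otimes \boldsymbol{\Sigma}) \otimes \boldsymbol{\Delta}$.

The rest of the verification of isomorphism is fairly direct. We also need to check that $t_{\mathbf{S}}$ and $(r ; s)_{\mathbf{S}}$ are related by the isomorphism. Now

$$t_{\mathbf{S}} \;=\; \{(s, (s, (\sigma, \delta))) : (s, (\sigma, \delta)) \in \mathbf{S} \otimes (\boldsymbol{\Sigma};\boldsymbol{\Delta}),\ \sigma \in \Gamma_{\Sigma},\ \delta \in \Gamma_{\Delta}\}.$$

Recall that $(r ; s)_{\mathbf{S}} = r_{\mathbf{S}} ; s_{\mathbf{S}(r)}$ and that this is a relational composition in left-to-right order. And indeed,

$$r_{\mathbf{S}} \;=\; \{(s, (s, \sigma)) : (s, \sigma) \in \mathbf{S} \otimes \boldsymbol{\Sigma},\ \sigma \in \Gamma_{\Sigma}\},$$
$$s_{\mathbf{S}(r)} \;=\; \{((s, \sigma), ((s, \sigma), \delta)) : ((s, \sigma), \delta) \in \mathbf{S}(r) \otimes \boldsymbol{\Delta},\ \delta \in \Gamma_{\Delta}\}.$$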

This completes the proof for composition. We omit the proofs for sums and iteration.

Endnotes. The work of this section explains how complex representations of naturally occurring scenarios may be computed from state models before the scenario and program models. Indeed, this is one of our main points. There are precursors to our work in special cases, most notably Hans van Ditmarsch's dissertation (2000). That work is about examples like $\boldsymbol{\Sigma}_7$, where some agent or set of agents knows that the current action belongs to some set, but does not know which action it is. But to our knowledge, the work in this section is the first time that anything like this has been obtained in general.

We have taught the material in this section in several courses at different levels. Our experience is that the material here is highly appealing to students and makes the case for formal representations in the first place (for students not familiar with formal logic) and for the kind of technical work that we pursue in this area (for those who are).

The idea of representing epistemic actions as Kripke models (or variants of them) was first presented in our earlier paper with S. Solecki (Baltag et al. 1998). However, the proposal of that paper was to employ Kripke models in the syntax of a logical language directly. Many readers have felt this to be confusing, since the resulting syntax looked as if it depended on the semantics. Proposals to improve the language were developed in several papers of Baltag (1999, 2001, 2002, 2003). The logics of these papers were more natural than those of Baltag et al. (1998). What is new in this paper is the overall apparatus of action signatures, epistemic actions, etc. We present what we hope is a natural syntax, and at the same time we have programs as syntactic entities in our logical languages (see Section 4 just below) linked to program models in the semantics.

4. LOGICAL LANGUAGES BASED ON ACTION SIGNATURES

This section presents the centerpiece of our formal work, the logical systems corresponding to types of epistemic action. Our overall goal is to abstract from the propositional language used earlier, in a way which allows us to fix only the epistemic structure of the desired actions, and vary their preconditions.

4.1. Action Signatures

We have presented many examples of the update product in Section 3.5. These allow us to represent many natural updates, and this is one of our goals in this paper. But the structures which we have seen so far are not sufficient to get logical languages incorporating updates. For example, we have in Example 3.7 a program model that represents a private announcement to the agent A that a proposition H happens, and this takes place in a way that B learns nothing. The picture of this was

What we want to do now is to vary things a bit, and then to abstract them. For example, suppose we want to announce a different proposition to A, say ψ. We would use

Varying the proposition ψ, all announcements of this kind could be thought of as actions of the same type. We could then represent the type of the action by the following picture:

And the previous representations include the information that what actually happens is what A hears. To vary this, we need only change which world is designated by the doubled border. We could switch things, or double neither or both worlds. So we obtain a structure consisting of two action types:

The oval on the left represents the type PriA of a fully private announcement to agent A, while the oval on the right simply represents the type of a skip action (or of an empty, or trivial, public announcement). By inserting any given proposition ψ into the oval depicting the action type PriA , we can obtain speciﬁc private announcements PriA ψ, as depicted above. (There is no reason to insert any proposition into the right oval, since this comes already with its own precondition tr: this means that the type of a skip action uniquely determines the corresponding action skip, since it contains all the information needed to specify this action.) Another example would be the case in which we announce ψ to A in such a way that B is misled into believing ¬ψ (and is also misled into believing that everyone learns ¬ψ). Now we use

In itself, this is not general enough to give rise to an action type in our sense. But we can more generally consider the case in which ψ1 is announced to A in such a way that B thinks that some other proposition ψ2 is publicly announced:

By abstracting from the speciﬁc propositions, we obtain the following structure consisting of two action types:

Observe that if we are now given a sequence of two propositions (ψ1, ψ2), we could use them to fill in the ovals with preconditions in two possible ways (depending on which proposition goes into the left oval). So, in order to uniquely determine how an action type will generate specific announcements, we need an enumeration without repetition of all the action types in the structure which do not come equipped with trivial preconditions (i.e., all the empty ovals in the diagram, since we assume the others have tr inside).


So at this point, we can see how to abstract things, leading to the following definition.

DEFINITION. An action signature is a structure

\[
\Sigma = \left(\Sigma, \xrightarrow{A}, (\sigma_1, \sigma_2, \ldots, \sigma_n)\right)
\]

where Σ = (Σ, →_A) is a finite Kripke frame, and σ1, σ2, ..., σn is a designated listing of a subset of Σ without repetitions. We obviously must have n ≤ |Σ|, but we allow the case n = 0 as well, which will produce an empty list. We call the elements of Σ action types, while the ones which are in the listing (σ1, ..., σn) will be called non-trivial action types.

The way this works for us is that an action signature together with an assignment of epistemic propositions to the non-trivial action types in Σ will give us a full-fledged action model. The trivial action types will describe actions which can always happen, so they will get assigned the trivial precondition tr, by default. And this is the exact sense in which the notion of an action signature is an abstraction of the notion of action model. We shall shortly use action signatures in constructing logical languages.

EXAMPLES 4.1. Here is a very simple action signature which we call Σ_skip. Σ is a singleton {skip}, we put skip →_A skip for all agents A, and we take the empty list () of types as listing. So we have only one, trivial, type: skip. In a sense which we shall make clear later, this is an action in which “nothing happens”, and moreover it is common knowledge that this is the case.

The type of a public announcement can simply be obtained by only changing our listing in Σ_skip, to make the type non-trivial; we also change the name of the type to Pub, to distinguish it from the previous one. So the signature Σ_Pub of public announcements has Σ = {Pub}, Pub →_A Pub for every agent A, and the listing is just (Pub). So Pub is the non-trivial type of a public announcement action. Note once again that we have not said what is being announced. We are purposely separating out the structure of the announcement from the content, and Pub is our model of the structure.

The next simplest action signature is the “test” signature Σ_?. We take Σ_? = {?, skip}, with the listing (?). We also take ? →_A skip, and skip →_A skip for all A. So the test ? is the only non-trivial type of action here. This turns out to be a totally opaque form of test: ϕ is tested on the real world, but nobody knows this is happening: when it is happening, all the agents think nothing (i.e., skip) is happening. The function of this type will be to generate tests ?ϕ, which affect the states precisely in the way dynamic logic tests do.


For each set ℬ ⊆ 𝒜 of agents, we define the action signature Pri^ℬ of completely private announcements to the group ℬ. It has Σ = {Pri^ℬ, skip}; the listing is just (Pri^ℬ), which makes skip into a trivial type again; and we put Pri^ℬ →_B Pri^ℬ for all B ∈ ℬ, Pri^ℬ →_C skip for C ∉ ℬ, and skip →_A skip for all agents A.

The action signature Cka^ℬ_k is given by: Σ = {1, ..., k}; the listing is (1, 2, ..., k), so all types are non-trivial; i →_B i for i ≤ k and B ∈ ℬ; and finally i →_C j for i, j ≤ k and C ∉ ℬ. This action signature is called the signature of common knowledge of alternatives for an announcement to the group ℬ.

Signature-based Program Models. Now that we have a general abstract notion, we introduce some notation to regain the earlier examples. Let Σ be an action signature, let (σ1, ..., σn) be the corresponding listing of non-trivial types, let Γ ⊆ Σ, and let ψ⃗ = ψ1, ..., ψn be a list of epistemic propositions. We obtain an epistemic program model (Σ, Γ)(ψ1, ..., ψn) in the following way:

1. The set of simple actions is Σ, and the accessibility relations are those given by the action signature.
2. For j = 1, ..., n, pre(σj) = ψj. We put pre(σ) = tr for all the other (trivial) actions.
3. The set of distinguished actions is Γ.

In the special case that Γ is the singleton set {σi}, we write the resulting signature-based program model as (Σ, σi, ψ1, ..., ψn). To summarize: every action signature, set of distinguished action types in it, and corresponding tuple of epistemic propositions gives an epistemic program model in a canonical way.

4.2. The Syntax and Semantics of L(Σ)

Fix an action signature Σ. We present in Figure 7 a logic L(Σ). The first thing to notice about this is that for the first time in a great while, we have a genuine syntax. In this regard, note that n is fixed from Σ; it is the length of the given listing (σ1, ..., σn). In the programs of the form σψ1, ..., ψn we have sentences ψ rather than epistemic propositions (which we had written using boldface letters ψ1, ..., ψn in Section 2.3). Also, the signature Σ figures into the semantics exactly in those programs σψ1, ..., ψn; in those we require that σ ∈ Σ. The program model (Σ, σ, [[ψ1]], ..., [[ψn]]) is a signature-based program model as in the previous section.


Figure 7. The language L(Σ) and its semantics.

The second thing to note is that, as in PDL, we have two sorts of syntactic objects: sentences and programs. We call programs of the form σψ1, ..., ψn basic actions. Note that they might not be “atomic” in the sense that the sentences ψj might themselves contain programs. The basic actions of the form σiψ1, ..., ψn (with i ≤ n) are called non-trivial, since they are generated by non-trivial action types. We use the following standard abbreviations: false = ¬true, ϕ ∨ ψ = ¬(¬ϕ ∧ ¬ψ), ◇_A ϕ = ¬□_A ¬ϕ, ◇*_ℬ ϕ = ¬□*_ℬ ¬ϕ, and ⟨π⟩ϕ = ¬[π]¬ϕ.

The Semantics. We define two operations by simultaneous recursion on L(Σ):

1. ϕ ↦ [[ϕ]], taking the sentences of L(Σ) into epistemic propositions; and
2. π ↦ [[π]], taking the programs of L(Σ) into program models (and hence into induced updates).

The formal definition is given in Figure 7. The main thing to note is that with one key exception, the operations on the right-hand sides are immediate applications of our general definitions of the closure conditions on epistemic propositions from Section 2.1 and the operations on program models from Section 3.6. A good example to explain this is the clause for the semantics of sentences [α]ϕ. Assuming that we have a program model [[α]], we get an induced update as in Section 3.4, which we again denote [[α]]. We also have an epistemic proposition [[ϕ]]. We can therefore form the epistemic proposition [[[α]]][[ϕ]] (see equation (4) in Section 2.3). Note that we have overloaded the square bracket notation; this is intentional, and we have done the same with other notation as well. Similarly, the semantics of skip and crash are the program models 1 and 0 of Section 3.6.

We also discuss the definition of the semantics for basic actions σψ⃗. This is precisely where the structure of the action signature enters. For this, recall that we have a general definition of a signature-based program model (Σ, Γ, ψ1, ..., ψn), where Γ ⊆ Σ and the ψ’s are any epistemic propositions. What we have in the semantics of σψ⃗ is the special case of this where Γ is the singleton {σ} and ψi is [[ψi]], a proposition which we will already have defined when we come to define [[σψ⃗]].

At this point, it is probably good to go back to our earlier discussions in Section 2.1 of epistemic propositions and updates. What we have done overall is to give a fully syntactic presentation of languages of these epistemic propositions and updates. The constructions of the language correspond to the closure properties noted in Section 2.1. (To be sure, we have restricted to the finite case at several points because we are interested in a syntax, and at the same time we have re-introduced some infinitary aspects via the Kleene star.)

4.3. Epistemic Program Logics L(S)

We generalize now our signature logics L(Σ) to families S of signatures, in order to define a general notion of epistemic program logics.

Logics Generated by Families of Signatures. Given a family S of signatures, we would like to combine all the logics {L(Σ)}_{Σ ∈ S} into a single logic. Let us assume the signatures Σ ∈ S are mutually disjoint (otherwise, just choose mutually disjoint copies of these signatures). We define the logic L(S) generated by the family S in the following way: the syntax is defined by taking the same definition we had in Figure 7 for the syntax of L(Σ), but in which on the side of the programs we take instead as basic actions all expressions of the form σψ1, ..., ψn where σ ∈ Σ, for some arbitrary signature Σ ∈ S, and n is the length of the listing of non-trivial action types of Σ. The semantics is again given by the same definition as in Figure 7, but in which the clause about σψ1, ..., ψn refers to the appropriate signature: for every Σ ∈ S, every σ ∈ Σ, if n is the length of the listing of Σ, then

\[
[\![\sigma\psi_1, \ldots, \psi_n]\!] = (\Sigma, \sigma, [\![\psi_1]\!], \ldots, [\![\psi_n]\!]).
\]
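The passage from an action signature to a signature-based program model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names (`ActionSignature`, `ProgramModel`, `program_model`, `private_announcement_signature`) are hypothetical, propositions are represented by plain strings, and the string `"tr"` stands in for the trivial precondition.

```python
from dataclasses import dataclass

TR = "tr"  # assumed placeholder for the trivial precondition


@dataclass
class ActionSignature:
    types: frozenset   # the action types
    arrows: dict       # agent -> set of (source type, target type) pairs
    listing: tuple     # the non-trivial types, listed without repetition


@dataclass
class ProgramModel:
    actions: frozenset
    arrows: dict
    pre: dict          # action type -> precondition
    designated: frozenset


def program_model(sig, designated, props):
    """Build the signature-based program model from a signature,
    a set of designated types, and one proposition per listed type."""
    if len(set(sig.listing)) != len(sig.listing):
        raise ValueError("the listing must not repeat types")
    if not set(sig.listing) <= sig.types or not set(designated) <= sig.types:
        raise ValueError("listed and designated types must belong to the signature")
    if len(props) != len(sig.listing):
        raise ValueError("need exactly one proposition per non-trivial type")
    pre = {t: TR for t in sig.types}      # trivial types get tr by default
    pre.update(zip(sig.listing, props))   # pre(sigma_j) = psi_j
    return ProgramModel(sig.types, sig.arrows, pre, frozenset(designated))


def private_announcement_signature(group, agents):
    """Signature of completely private announcements to `group`."""
    arrows = {}
    for agent in agents:
        # insiders see Pri as Pri; outsiders think nothing (skip) happens
        target = "Pri" if agent in group else "skip"
        arrows[agent] = {("Pri", target), ("skip", "skip")}
    return ActionSignature(frozenset({"Pri", "skip"}), arrows, ("Pri",))


sig = private_announcement_signature({"A"}, {"A", "B"})
model = program_model(sig, {"Pri"}, ["H"])
print(model.pre)
```

Here the single listed type `Pri` receives the supplied proposition, and `skip` receives the trivial precondition by default, mirroring the three clauses of the construction.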

EXAMPLE 4.2. This example constructs the logic of all epistemic programs. Take the family

\[
S = \{\Sigma : \Sigma \mbox{ is a finite signature}\}
\]

of all finite signatures.⁵ The logic L(S) will be called the logic of all epistemic programs.


Preservation of Bisimulation and Atomic Propositions. We note two basic facts about the interpretations of sentences and programs in the logics L(S).

PROPOSITION 4.3. The interpretations of the sentences and programs of L(S) preserve bisimulation.

Proof. By induction on L(S), using Propositions 2.5 and 3.5.

PROPOSITION 4.4. The interpretations of programs of L(S) preserve atomic sentences in the sense that if s [[π]]_S t, then for all atomic sentences p, s ∈ p_S iff t ∈ p_{S([[π]])}.

Proof. By induction on π.

4.4. Formalization of the Target Logics in Terms of Signatures

We formalize now the target logics of Section 2.4 as epistemic program logics L(S). We use the action signatures of Examples 4.1 and the notation from there.

The Logic of Public Announcements. This is formalized as L(Σ_Pub). We have sentences [Pub ϕ]ψ, just as we described in Section 2.4. Note that L(Σ_Pub) allows announcements inside announcements. If ϕ, ψ, and χ are sentences, then so is ⟨Pub [Pub ϕ]ψ⟩χ. We check that L(Σ_Pub) is a good formalization of the logic of public announcements. Fix a sentence ϕ of L(Σ_Pub) and a state model S. We calculate:

\[
\begin{array}{rcl}
S([\![\mathrm{Pub}\ \varphi]\!]) & = & S(\Sigma_{\mathrm{Pub}}, \mathrm{Pub}, [\![\varphi]\!]) \\
 & = & S \otimes (\Sigma_{\mathrm{Pub}}, \mathrm{Pub}, [\![\varphi]\!]) \\
 & = & \{(s, \mathrm{Pub}) : s \in [\![\varphi]\!]_S\}
\end{array}
\]

The state model has the structure given earlier in terms of the update product operation. The update relation [[Pub ϕ]]_S relates s to (s, Pub) whenever the latter belongs to the state model S([[Pub ϕ]]). The model itself is isomorphic to the sub-state-model of S induced by {s ∈ S : s ∈ [[ϕ]]_S}. Under this isomorphism, the update relation is then the inverse of inclusion. This is just how the action of public announcement was described when we first encountered it, in Example 2.1 of Section 2.3.

Test-only PDL. This was introduced in Section 2.6. Recall that this is PDL built over the empty set of atomic actions. Although it was not one of the target languages of Section 2.4, it will be instructive to see how it is formalized in our setting. Recall that test-only PDL has actions of the form ?ϕ. We want to use our action signature Σ_?. The action types of it are ? and skip, only the first one being non-trivial: n = 1. So technically we have sentences of the following forms:

\[
[?\,\varphi]\chi \quad \mbox{and} \quad [\mathrm{skip}\ \varphi]\chi \qquad (9)
\]

Let us study the semantics of the basic actions ? ϕ and skip ϕ. Fix a state model S. We calculate:

\[
\begin{array}{rcl}
S([\![?\,\varphi]\!]) & = & S(\Sigma_{?}, ?, [\![\varphi]\!]) \\
 & = & S \otimes (\Sigma_{?}, ?, [\![\varphi]\!]) \\
 & = & \{(s, ?) : s \in [\![\varphi]\!]_S\} \cup \{(s, \mathrm{skip}) : s \in S\}
\end{array}
\]

The structure of S([[? ϕ]]) is that for each A, if s →_A t in S, then (s, ?) →_A (t, skip). Also (s, skip) →_A (t, skip) under the same condition, and there are no other arrows. The update relation [[? ϕ]]_S relates s to (s, ?) whenever the latter belongs to the updated structure. Overall, this update is isomorphic to what we described in Example 2.2 of Section 2.3.

Turning to the update map of skip ϕ, we again fix S. The model S([[skip ϕ]]) is again literally the same as what we calculated above. However, the update relation [[skip ϕ]]_S now relates each s ∈ S to the pair (s, skip). This relation is a bisimulation. We shall formulate a notion of action equivalence later, and it will turn out that the update [[skip ϕ]] is equivalent to 1. For now, we can also consider the semantics of sentences of the form [skip ϕ]ψ. We have

\[
\begin{array}{rcl}
[\![[\mathrm{skip}\ \varphi]\psi]\!]_S & = & \{s \in S : \mbox{if } s\ [\![\mathrm{skip}\ \varphi]\!]_S\ t, \mbox{ then } t \in [\![\psi]\!]_{S([\![\mathrm{skip}\ \varphi]\!])}\} \\
 & = & \{s \in S : (s, \mathrm{skip}) \in [\![\psi]\!]_{S([\![\mathrm{skip}\ \varphi]\!])}\} \\
 & = & \{s \in S : s \in [\![\psi]\!]_S\}
\end{array}
\]

This last equality is by Proposition 4.3 on bisimulation preservation. The upshot is that [[[skip ϕ]ψ]]_S = [[ψ]]_S. So for this reason, we might as well identify the sentences [skip ϕ]ψ and ψ. Or to put things differently, we might as well identify the basic action skip ϕ and (the constant of L(Σ_?)) skip. Since we are already modifying the syntax, we might also abbreviate [skip ϕ]ψ to ψ. Doing all this leads to a language which is exactly test-only PDL as we had it in Section 2.6, and our semantics there agrees with what we have calculated in the present discussion. In conclusion, test-only PDL is equivalent to the logic L0(Σ_?); i.e., the □*-free fragment of L(Σ_?).

The Logic of Totally Private Announcements. Let Pri be the family

\[
\mathrm{Pri} = \{\mathrm{Pri}^{\mathcal{B}} : \emptyset \neq \mathcal{B} \subseteq \mathcal{A}\}
\]

of all signatures of totally private announcements to non-empty groups of agents (as introduced in Examples 4.1). Then L(Pri) formalizes one of our target logics: the logic of totally private announcements. For example, in the case of 𝒜 = {A, B}, L(Pri) will have basic actions of the forms: Pri^A ϕ, Pri^B ϕ, Pri^{A,B} ϕ, skip^A ϕ, skip^B ϕ, skip^{A,B} ϕ. As before, we may as well identify all the programs skip^X ϕ with skip, and abbreviate [skip^X]ψ to ψ.

The Logic of Common Knowledge of Alternatives. This can be formalized as L(Cka), where

\[
\mathrm{Cka} = \{\mathrm{Cka}^{\mathcal{B}}_k : \emptyset \neq \mathcal{B} \subseteq \mathcal{A},\ 1 \leq k\}.
\]
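The two update calculations of Section 4.4 (for Pub and for the test ?) can be replayed on a small state model. The sketch below is an illustration only: the function names `update_product` and `update_relation` are hypothetical, each precondition is represented directly by the set of states at which it holds, and the valuation of atomic sentences is left out since it is simply inherited from the first component.

```python
def update_product(states, arrows, actions, action_arrows, pre):
    """States and arrows of the update product of a state model with an
    action model. `pre[a]` is the set of states where a's precondition holds."""
    new_states = {(s, a) for s in states for a in actions if s in pre[a]}
    new_arrows = {
        agent: {
            ((s, a), (t, b))
            for (s, a) in new_states
            for (t, b) in new_states
            if (s, t) in arrows[agent] and (a, b) in action_arrows[agent]
        }
        for agent in arrows
    }
    return new_states, new_arrows


def update_relation(new_states, designated):
    """Relate s to (s, a) for every designated action a executable at s."""
    return {(s, (s, a)) for (s, a) in new_states if a in designated}


# A coin lies heads (h) or tails (t); neither A nor B can tell which.
states = {"h", "t"}
full = {(s, t) for s in states for t in states}
arrows = {"A": full, "B": full}
H = {"h"}  # extension of the sentence H in this model

# Public announcement of H: one type Pub, seen as Pub by everyone.
pub_arrows = {"A": {("Pub", "Pub")}, "B": {("Pub", "Pub")}}
pub_states, _ = update_product(states, arrows, {"Pub"}, pub_arrows, {"Pub": H})

# Test ?H: every agent takes both ? and skip to be skip.
opaque = {("?", "skip"), ("skip", "skip")}
test_states, test_arrows = update_product(
    states, arrows, {"?", "skip"}, {"A": opaque, "B": opaque},
    {"?": H, "skip": states},
)
test_rel = update_relation(test_states, {"?"})
```

In the test case every arrow of the product points into a skip-pair, which is the sense in which the test is totally opaque to the agents.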

4.5. Other Logics

Logics Based on Frame Conditions. Many other types of logics are easily representable in our framework. For example, consider the logic of all S4 programs. To formalize this, we need only consider the disjoint union of all finite action signatures whose underlying accessibility relations are preorders. For the logic of S5 programs, we change preorders to equivalence relations. Another important class is given by the K45 axioms of modal logic. These systems are particularly important because the K45 and S5 conditions are so prominent in epistemic logic.

Announcements by Particular Agents. Our modeling of the notion of the public announcement that ϕ is impersonal in the sense that the announcement does not come from anywhere in particular. It might be best understood as coming externally, as if someone shouted ϕ into the room where the agents were standing. We also want a notion of the public announcement by A of ϕ. We shall write this as Pub^A ϕ. For this, we identify Pub^A ϕ with the (externally made) public announcement that ϕ holds and A knows this. This identification does not represent the fact that A intends to inform the others of ϕ. But as we know, intentions are not modeled at all in our system. We claim, however, that on a purely propositional level, the identification works. And using it, we can represent announcements by A. One way to do this is via abbreviation: we take ⟨Pub^A ϕ⟩ψ to be an abbreviation for ⟨Pub ϕ ∧ □_A ϕ⟩ψ. (A different way to formalize Pub^A ϕ would be to use a special signature constructed just for that purpose. But the discussion here shows that there is no need to do this. One can use Σ_Pub.)


Lying. We can also represent misleading epistemic actions, such as lying. Again, we want to fix an agent A and then represent the action of A (successfully) lying that ϕ to the other agents. To all those others, this action should be the same thing as an announcement of ϕ by A. But to say that A lies about ϕ, we want to assume that ϕ is actually false. Further, we want to assume that A moreover knows that ϕ is false. (For if ϕ just happened to be false when A said ϕ, we would not really want to call that “lying.”)

The technical details on the representation go as follows. We take a signature Σ^A_Lie given as follows.

\[
\Sigma^A_{\mathrm{Lie}} = \{\mathrm{Secret}^A, \mathrm{Pub}^A\}.
\]

We take (Pub^A, Secret^A) as our non-repetitive list of types. The structure is given by taking Secret^A →_A Secret^A; for B ≠ A, Secret^A →_B Pub^A; finally, for all B, Pub^A →_B Pub^A. L(Σ^A_Lie) contains sentences like [Secret^A ϕ, ψ]χ. The extra argument ψ is a kind of secret condition. And we can use [Lie^A ϕ]χ as an abbreviation of [Secret^A ϕ ∧ □_A ϕ, ¬ϕ ∧ □_A ¬ϕ]χ. That is, for A to lie about ϕ there is a condition that ¬ϕ ∧ □_A ¬ϕ. But the other agents neither need to know this ahead of time nor do they in any sense “learn” this from the announcement. Indeed, for the other agents, Lie^A ϕ is just like a truthful public announcement by A.

As with private announcements, we take the family of signatures {Σ^A_Lie : A ∈ 𝒜}. This family then generates a program logic. In this logic we have actions which represent lying on the part of any agent, not just one fixed agent.

Other Effects: Wiretapping, Paranoia etc. It is possible to model scenarios where one player believes a communication to be private while in reality a second player intercepts the communication. We can also represent gratuitous suspicion (“paranoia”): maybe no “real” action has taken place, except that some people start suspecting some action (e.g., some private communication) has taken place. With these and other effects, the problem is not so much deciding how to model them. Once one has clear intuitions about a social scenario, it is not hard to do the modeling. The real issue in their application seems to be that in complex social situations, our intuitions are not always clear. There is no getting around this, and technical solutions are of limited value for conceptual problems.
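As an illustration of the lying signature described above, the following Python sketch writes it down as plain data and instantiates the two preconditions used in the abbreviation for lying. Everything here is an assumption made for the example: the function names are hypothetical, the types are named `"Secret"` and `"Pub"` without the agent superscript, and sentences are ordinary strings in which `K_A(...)` stands for the knowledge operator of agent A, `~` for negation, and `&` for conjunction.

```python
def lie_signature(liar, agents):
    """Types, arrows, and listing of the signature for `liar` lying."""
    arrows = {}
    for agent in agents:
        arrows[agent] = {("Pub", "Pub")}            # everyone sees Pub as Pub
        if agent == liar:
            arrows[agent].add(("Secret", "Secret"))  # the liar knows what happens
        else:
            arrows[agent].add(("Secret", "Pub"))     # the others are misled
    return {
        "types": {"Secret", "Pub"},
        "arrows": arrows,
        "listing": ("Pub", "Secret"),                # both types are non-trivial
    }


def lie_preconditions(liar, phi):
    """Preconditions in listing order (Pub, Secret) for the liar lying that phi."""
    know = f"K_{liar}"
    return {
        "Pub": f"({phi}) & {know}({phi})",
        "Secret": f"~({phi}) & {know}(~({phi}))",
    }


sig = lie_signature("A", {"A", "B"})
pre = lie_preconditions("A", "p")
```

The designated type in the lying action is `Secret`, so the action is executable exactly when the second precondition holds, while the other agents reason as if the first one did.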


Endnote. This section is the centerpiece of this paper, and all of the work in it is new. We make a few points about the history of this approach. Our earlier paper with S. Solecki (Baltag et al. 1998) worked with a syntax that employed structured objects as actions. In fact, the actions were Kripke models themselves. This type of syntax is also used in other work such as van Ditmarsch et al. (2003). But the use of structured objects in syntax is felt by many readers to be awkward; see, e.g., the remarks in chapter 4 of Kooi’s thesis (Kooi 2003). We disagree with the assertion that our earlier syntax blurs the distinction between syntax and semantics. But in order to make the subject more attractive, we have worked out other approaches to the syntax. Baltag (2001, 2002, 2003) developed logical systems that streamline the syntax using a repertoire of program operations, such as learning a program and variable-binding operators. This paper is the first one to formulate program logics in terms of action signatures.

5. LOGICAL SYSTEMS

We write ⊨ ϕ to mean that for all state models S and all s ∈ S, s ∈ [[ϕ]]_S. In this case, we say that ϕ is valid. In this section, we fix an arbitrary family S of mutually disjoint action signatures, and we consider the generated logics. We present a sound proof system for the validities in L(S), and sound and complete proof systems for two important sublogics: the iteration-free fragment L1(S) and the logic L0(S) obtained by deleting both iteration and common knowledge operators. In particular, our results apply to the logics L(Σ), L1(Σ) and L0(Σ) given by only one signature. However, the soundness/completeness proofs will appear in Baltag (2003). So the point of this section is just to state clearly what the logical system is, and to illustrate its use.

Sublanguages. We are of course interested in the languages L(S), but we also consider sublanguages L0(S) and L1(S). Recall that L1(S) is the fragment without the action iteration construct π*. L0(S) is the fragment without π* and □*_ℬ. It turns out that L0(Σ) is the easiest to study: it is of the same expressive power as ordinary multi-modal logic. On the other hand, the full logic L(S) is in general undecidable: indeed, even if we take a family consisting of only one signature, that of public announcements Σ_Pub, the corresponding logic L(Σ_Pub) is undecidable (see Miller and Moss (2003)). L1(Σ) is decidable and we have a complete axiom system for it (see Baltag (2003)).

In Figure 8 below we present a logic for L(S). We write ⊢ ϕ if ϕ can be obtained from the axioms of the system using its inference rules. We often omit the turnstile when it is clear from the context. Our presentation of the proof system uses the meta-syntactical notations associated with the notion of canonical action model (to be introduced below in Section 5.1). This could have been avoided only at the cost of complicating the presentation. We chose the simplest version, and so our logical system, as given in Figure 8, can be fully understood only after reading Section 5.1.

Figure 8. The logical system for L(S). For L1(S), we drop the ∗∗ axioms and rule; for L0(S), we also drop the ∗ axioms and rules.

AXIOMS. Most of the system will be quite standard from modal logic. The Action Axioms are new, however. These include the Atomic Permanence axiom; note that in this axiom p is an atomic sentence. The axiom says that announcements do not change the brute fact of whether or not p holds. This axiom reflects the fact that our actions do not change any kind of local state.

The Action-Knowledge Axiom gives a criterion for knowledge after an action. For non-trivial basic actions σiψ⃗ (induced by a non-trivial action type σi ∈ Σ, for some signature Σ ∈ S), this axiom states that

\[
[\sigma_i\vec{\psi}]\Box_A\varphi \leftrightarrow \Big(\psi_i \to \bigwedge\{\Box_A[\sigma_j\vec{\psi}]\varphi : \sigma_i \xrightarrow{A} \sigma_j \mbox{ in } \Sigma\}\Big) \qquad (9)
\]

In words, two sentences are equivalent: first, an agent A will come to know ϕ after a basic action σiψ⃗ is executed; and second, whenever this action σiψ⃗ is executable (i.e., its precondition ψi holds), A knows (already before this action) that ϕ will come to be true after every action that A considers as possibly happening (whenever σiψ⃗ is in fact happening). This axiom should be compared with the Ramsey axiom in conditional logic. One should also study the special case of it for the logic of public announcements in Section 5.3.

The Action Rule then gives a necessary criterion for common knowledge after a simple action. (Simple actions are defined below in Section 5.1. They include the actions of the form σψ⃗.) Since common knowledge is formalized by the □*_ℬ construct, this rule is a kind of induction rule. (The sentences χ_β play the role of strong induction hypotheses.) (For the induction rule for common knowledge assertions without actions, see Lemma 5.5.)

5.1. The Canonical Action Model

Recall that we defined action models and program models in Sections 3.1 and 3.2, respectively. At this point, we define a new action model called the canonical action model of a language L(S).

DEFINITION. Let S be a family of mutually disjoint signatures. Recall that a basic action of L(S) is a program of the form σψ1, ..., ψn, where σ ∈ Σ, for some signature Σ ∈ S, and n is the length of Σ’s list of non-trivial action types. A simple action of L(S) is a program of L(S) in which neither the choice operation nor the iteration operation π* occurs. We use letters like α and β to denote simple actions only. A simple sentence is a sentence of L1(S) in which neither action sum nor action iteration occurs. So all programs in simple sentences are simple actions.

The Canonical Action Model of L(S). We define a program model in several steps. The simple actions of the canonical action model are the simple actions of L(S) as defined just above. For all A, the accessibility relation →_A is the smallest relation such that

1. skip →_A skip.
2. If σ →_A σ′ in some signature Σ ∈ S and ϕ⃗ = ψ⃗, then σϕ⃗ →_A σ′ψ⃗.
3. If α →_A α′ and β →_A β′, then α; β →_A α′; β′.

PROPOSITION 5.1. As a frame, the canonical action model is locally finite: for each simple α, there are only finitely many β such that α →^{𝒜*} β.

Proof. By induction on α; we use heavily the fact that the accessibility relations on the canonical action model are the smallest family with their defining property. For the simple action expressions σψ⃗, we use the assumption that all our signatures Σ ∈ S are finite and mutually disjoint.

Next, we define PRE, a function from the simple actions to the sentences of L(S), by recursion so that

\[
\begin{array}{rcll}
\mathrm{PRE}(\mathrm{skip}) & = & \mathrm{true} & \\
\mathrm{PRE}(\mathrm{crash}) & = & \mathrm{false} & \\
\mathrm{PRE}(\sigma_i\vec{\psi}) & = & \psi_i & \mbox{for } \sigma_i \mbox{ in the given listing of } \Sigma \\
\mathrm{PRE}(\sigma\vec{\psi}) & = & \mathrm{true} & \mbox{for all trivial types } \sigma \\
\mathrm{PRE}(\alpha; \beta) & = & \langle\alpha\rangle\mathrm{PRE}(\beta) &
\end{array}
\]
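The recursion defining PRE can be sketched as a small Python function over syntax trees of simple actions. This is a minimal sketch under stated assumptions: the classes `Skip`, `Crash`, `Basic`, and `Seq` and the helper `show` are hypothetical names, sentences are plain strings, and the diamond ⟨α⟩ is printed as `<...>`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skip:
    pass


@dataclass(frozen=True)
class Crash:
    pass


@dataclass(frozen=True)
class Basic:
    sigma: str       # the action type
    listing: tuple   # non-trivial types of the signature that sigma belongs to
    args: tuple      # the sentences psi_1, ..., psi_n, one per listed type


@dataclass(frozen=True)
class Seq:
    first: object    # a simple action
    second: object   # a simple action


def show(action):
    """Print a simple action in the concrete syntax assumed for this sketch."""
    if isinstance(action, Skip):
        return "skip"
    if isinstance(action, Crash):
        return "crash"
    if isinstance(action, Basic):
        return " ".join([action.sigma, ", ".join(action.args)]).strip()
    return f"{show(action.first)}; {show(action.second)}"


def PRE(action):
    """Return the sentence PRE(action), following the five recursion clauses."""
    if isinstance(action, Skip):
        return "true"
    if isinstance(action, Crash):
        return "false"
    if isinstance(action, Basic):
        if len(action.args) != len(action.listing):
            raise ValueError("one sentence per non-trivial type is required")
        if action.sigma in action.listing:            # non-trivial type sigma_i
            return action.args[action.listing.index(action.sigma)]
        return "true"                                 # trivial type
    return f"<{show(action.first)}>({PRE(action.second)})"


pub_h = Basic("Pub", ("Pub",), ("H",))
print(PRE(Seq(pub_h, Skip())))
```

Note that the result is a piece of syntax, not a proposition; interpreting it in a state model is a separate step.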

REMARK. This function PRE should not be confused with the function pre which is part of the structure of a program model. PRE(σ) is a sentence in our language L(S), while pre was a purely semantic notion, associating propositions to simple actions in a program model. However, there is a connection: we are in the midst of defining the canonical action model as a program model, and its pre function is defined in terms of PRE. This is also perhaps a good place to remind the reader that neither PRE nor pre is a first-class symbol in L(S); they are defined symbols.

Completing the Definition of the Canonical Action Model. We set

\[
\mathrm{pre}(\sigma) = [\![\mathrm{PRE}(\sigma)]\!].
\]

This action model is the canonical (epistemic) action model; it plays a somewhat similar role in our work to the canonical model in modal logic.

PROPOSITION 5.2 (see Baltag et al. (1998) and Miller and Moss (2003)). For every family S of action signatures, the logical systems for L0(S) and L1(S) presented in Figure 8 are sound, complete, and decidable. However, for every signature Σ which contains a (copy of the) “public announcement” action type Pub, the full logic L(Σ) (including iteration π*) is undecidable and the proof system for L(Σ) presented in Figure 8 is sound but incomplete. Indeed, validity in L(Σ) is Π¹₁-complete, so there are no recursive axiomatizations which are complete. (The same obviously applies to L(S) for any family of signatures S such that Σ ∈ S.)

5.2. Some Derivable Principles

LEMMA 5.3. The Action-Knowledge Axiom is provable for all simple actions α:

\[
[\alpha]\Box_A\varphi \leftrightarrow \Big(\mathrm{PRE}(\alpha) \to \bigwedge\{\Box_A[\beta]\varphi : \alpha \xrightarrow{A} \beta \mbox{ in the canonical action model}\}\Big) \qquad (10)
\]

The proof is by induction on α and may be found in Baltag et al. (2003).

LEMMA 5.4. For all A ∈ 𝒞, all simple α, and all β such that α →_A β,

1. [α]□*_𝒞 ψ → [α]ψ.
2. [α]□*_𝒞 ψ ∧ PRE(α) → □_A[β]□*_𝒞 ψ.

Proof. Part (1) follows easily from the Epistemic Mix Axiom and modal reasoning. For part (2), we start with a consequence of the Epistemic Mix Axiom: □*_𝒞 ψ → □_A □*_𝒞 ψ. Then by modal reasoning, [α]□*_𝒞 ψ → [α]□_A □*_𝒞 ψ. By the Action-Knowledge Axiom generalized as we have it in Lemma 5.3, we have [α]□*_𝒞 ψ ∧ PRE(α) → □_A[β]□*_𝒞 ψ.

LEMMA 5.5 (The Common Knowledge Induction Rule). From χ → ψ ∧ □_A χ for all A, infer χ → □*_𝒜 ψ.

Proof. We apply the Action Rule to the simple action skip, recalling that PRE(skip) = true, and skip →_A skip for all A.

5.3. Logical Systems for the Target Logics

We presented a number of target logics in Section 2.4, and these were then formalized in Section 4.1. In particular, we have logics L1(Σ) for a number of interesting action signatures Σ. What we want to do here is to spell out what the axioms of L1(S) come to when we specialize the general logic to the logics of special interest. In doing this, we find it convenient to adopt simpler notations tailored for the fragments.

The logic of public announcements is shown in Figure 9. We only included the axioms and rule of inference that specifically used the structure of the signature Σ_Pub. So we did not include the sentential validities, the normality axiom for □_A, the composition axiom, modus ponens, etc. Also, we renamed the main axiom and rule to emphasize the “announcement” aspect of the system.


Figure 9. The main points of the logic of public announcements.

Our next logic is the logic of completely private announcements to groups. We discussed the formalization of this in Section 4.4. We have actions $\mathrm{Pri}^B \varphi$ and (of course) skip. The axioms and rules are just as in the logic of public announcements, with a few changes. (We must of course consider the relativized operators $[\mathrm{Pri}^B \varphi]$ instead of their simpler counterparts $[\mathrm{Pub}\,\varphi]$.) The actions skip all have true as their precondition, and since (true → ψ) is logically equivalent to ψ, we certainly may omit these actions from the notation in the axioms and rules. The most substantive change which we need to make in Figure 9 concerns the Action-Knowledge Axiom. It splits into two axioms, noted below:

\[ [\mathrm{Pri}^B \varphi]\Box_A \psi \leftrightarrow (\varphi \to \Box_A[\mathrm{Pri}^B \varphi]\psi) \quad \text{for } A \in B \]
\[ [\mathrm{Pri}^B \varphi]\Box_A \psi \leftrightarrow (\varphi \to \Box_A \psi) \quad \text{for } A \notin B \]

The last equivalence says: assuming that $\varphi$ is true, then after a private announcement of $\varphi$ to the members of $B$, an outsider knows $\psi$ just in case she knew $\psi$ before the announcement.

Finally, we study the logic of common knowledge of alternatives. This, too, was introduced in Section 2.4 and formalized in Section 4.1. The Action-Knowledge Axiom now becomes

\[ [\mathrm{Cka}^B \vec{\varphi}]\Box_A \psi \leftrightarrow (\varphi_1 \to \Box_A[\mathrm{Cka}^B \vec{\varphi}]\psi) \quad \text{for } A \in B \]
\[ [\mathrm{Cka}^B \vec{\varphi}]\Box_A \psi \leftrightarrow \Big(\varphi_1 \to \bigwedge_{0 \le i \le k} \Box_A[\mathrm{Cka}^B \vec{\varphi}^{\,i}]\psi\Big) \quad \text{for } A \notin B \]

where in the last clause, $(\varphi_1, \ldots, \varphi_k)^i$ is the sequence $\varphi_i, \varphi_1, \ldots, \varphi_{i-1}, \varphi_{i+1}, \ldots, \varphi_k$. (That is, we bring $\varphi_i$ to the front of the sequence.)
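The two Action-Knowledge axioms for private announcements can be checked semantically on a small example. The following is a minimal sketch, assuming a hypothetical two-world state model and a two-action program model (the announcement and skip) combined by the update product; the names `private_announcement`, `knows`, `STATES` and `REL` are illustrative and do not come from the paper.

```python
# Sketch of the update product for a completely private announcement.
AGENTS = ("A", "B")

# Hypothetical state model: p holds at world "s" only; nobody can tell s from t.
STATES = {"s": {"p"}, "t": set()}
REL = {ag: {(u, v) for u in STATES for v in STATES} for ag in AGENTS}


def private_announcement(states, rel, insiders, pre):
    """Return the worlds and arrows of the state model updated by the announcement."""
    # Two simple actions: the announcement itself and the trivial action skip.
    preconditions = {"pri": pre, "skip": lambda world: True}
    # Insiders see the announcement; outsiders think that nothing happened.
    arrows = {
        ag: {("pri", "pri" if ag in insiders else "skip"), ("skip", "skip")}
        for ag in rel
    }
    # Keep only the (world, action) pairs whose precondition is satisfied.
    worlds = [(s, a) for s in states for a in preconditions if preconditions[a](s)]
    new_rel = {
        ag: {
            (u, v)
            for u in worlds
            for v in worlds
            if (u[0], v[0]) in rel[ag] and (u[1], v[1]) in arrows[ag]
        }
        for ag in rel
    }
    return worlds, new_rel


def knows(rel, agent, world, prop):
    """prop holds at every world the agent considers possible from world."""
    return all(prop(v) for (u, v) in rel[agent] if u == world)


def p_before(world):
    return "p" in STATES[world]


def p_after(pair):
    return "p" in STATES[pair[0]]  # atomic facts are unchanged by the action


# Announce p privately to the group {A}, with B as the outsider.
WORLDS2, REL2 = private_announcement(STATES, REL, insiders={"A"}, pre=p_before)
insider_learns = knows(REL2, "A", ("s", "pri"), p_after)
outsider_learns = knows(REL2, "B", ("s", "pri"), p_after)
outsider_knew = knows(REL, "B", "s", p_before)
```

In this toy model the insider A comes to know p, while the outsider B knows p afterwards exactly when she knew it before, which is what the second axiom predicts for this instance.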


5.4. Examples in the Target Logics

This section studies some examples of the logic at work. We begin with an application of the Announcement Rule in the logic of public announcements. We again work with the atomic sentences H and T for heads and tails, and with the set {A, B} of agents. We show

\[ \Box^*_{A,B}(H \leftrightarrow \neg T) \to [\mathrm{Pub}\,H]\Box^*_{A,B} \neg T. \]

That is, on the assumption that it is common knowledge that heads and tails are mutually exclusive, then as a result of a public announcement of heads it will be common knowledge that the state is not tails.

We give this application in detail. Recall that $\Sigma_{pub}$ has one simple action which we call Pub. We take $\chi_{\mathrm{Pub}}$ to be $\Box^*_{A,B}(H \leftrightarrow \neg T)$. In addition $\mathrm{Pub} \to_A \mathrm{Pub}$ for all $A$, and there are no other arrows in $\Sigma_{pub}$. We take $\alpha$ to be Pub H; note that this is the only action accessible from itself in the canonical action model. To use the Announcement Rule, we must show that

1. $\Box^*_{A,B}(H \leftrightarrow \neg T) \to [\mathrm{Pub}\,H]\neg T$.
2. $(\Box^*_{A,B}(H \leftrightarrow \neg T) \wedge H) \to \Box_A \Box^*_{A,B}(H \leftrightarrow \neg T)$, and the same with B replacing A.

From these assumptions, we may infer $[\mathrm{Pub}\,H]\Box^*_{A,B} \neg T$. For the first statement, $\Box^*_{A,B}(H \leftrightarrow \neg T) \to [\mathrm{Pub}\,H]\neg T$, we argue as follows:

(a) $T \leftrightarrow [\mathrm{Pub}\,H]T$ (Atomic Permanence)
(b) $(H \leftrightarrow \neg T) \to (H \to \neg[\mathrm{Pub}\,H]T)$ ((a), propositional reasoning)
(c) $[\mathrm{Pub}\,H]\neg T \leftrightarrow (H \to \neg[\mathrm{Pub}\,H]T)$ (Partial Functionality)
(d) $\Box^*_{A,B}(H \leftrightarrow \neg T) \to (H \leftrightarrow \neg T)$ (Epistemic Mix)
(e) $\Box^*_{A,B}(H \leftrightarrow \neg T) \to [\mathrm{Pub}\,H]\neg T$ ((d), (b), (c), propositional reasoning)

And the second statement is an easy consequence of the Epistemic Mix Axiom.

What Happens when a Publicly Known Fact is Announced? One intuition about public announcements and common knowledge is that if $\varphi$ is common knowledge, then announcing $\varphi$ publicly does not change anything. Formally, we express this by a scheme rather than a single equation:

\[ \Box^* \varphi \to ([\mathrm{Pub}\,\varphi]\psi \leftrightarrow \psi) \tag{11} \]

(In this line and in the rest of this section, we are omitting the subscripts on the $\Box^*$ operator. More formally, the subscript should be $\mathcal{A}$, since we are dealing with knowledge which is common to all agents.) What we would like to say is $\Box^* \varphi \to \forall\psi\,([\mathrm{Pub}\,\varphi]\psi \leftrightarrow \psi)$, but of course this cannot be expressed in our language. So we consider only the sentences of the form (11), and we show that all of these are provable. We argue by induction on $\psi$. For an atomic sentence p, (11) follows from the Epistemic Mix and Atomic Permanence Axioms. The induction steps for ∧ and ¬ are easy. Next, assume (11) for $\psi$. By necessitation and Epistemic Mix, we have

\[ \Box^* \varphi \to (\Box_A[\mathrm{Pub}\,\varphi]\psi \leftrightarrow \Box_A \psi) \]

Note also that by the Announcement-Knowledge Axiom

\[ \Box^* \varphi \to ([\mathrm{Pub}\,\varphi]\Box_A \psi \leftrightarrow \Box_A[\mathrm{Pub}\,\varphi]\psi) \]

These two imply (11) for $\Box_A \psi$.

Finally, we assume (11) for $\psi$ and prove it for $\Box^*_B \psi$. We show first that $\Box^* \varphi \wedge \Box^* \psi \to [\mathrm{Pub}\,\varphi]\Box^* \psi$. For this we use the Action Rule. We must show that

(a) $\Box^* \varphi \wedge \Box^* \psi \to [\mathrm{Pub}\,\varphi]\psi$.
(b) $\Box^* \varphi \wedge \Box^* \psi \wedge \varphi \to \Box_A(\Box^* \varphi \wedge \Box^* \psi)$.

(Actually, since our common knowledge operators $\Box^*$ here are really of the form $\Box^*_{\mathcal{A}}$, we need (b) for all agents A.) Point (a) is easy from our induction hypothesis, and (b) is an easy consequence of Epistemic Mix.

To conclude, we show $\Box^* \varphi \wedge [\mathrm{Pub}\,\varphi]\Box^* \psi \to \Box^* \psi$. For this, we use the Common Knowledge Induction Rule of Lemma 5.5; that is, we show

(c) $\Box^* \varphi \wedge [\mathrm{Pub}\,\varphi]\Box^* \psi \to \psi$.
(d) $\Box^* \varphi \wedge [\mathrm{Pub}\,\varphi]\Box^* \psi \to \Box_A(\Box^* \varphi \wedge [\mathrm{Pub}\,\varphi]\Box^* \psi)$ for all A.

For (c), we use Lemma 5.4, part (1) to see that $[\mathrm{Pub}\,\varphi]\Box^* \psi \to [\mathrm{Pub}\,\varphi]\psi$; and now (c) follows from our induction hypothesis. For (d), it will be sufficient to show that

\[ \varphi \wedge [\mathrm{Pub}\,\varphi]\Box^* \psi \to \Box_A[\mathrm{Pub}\,\varphi]\Box^* \psi \]

This follows from Lemma 5.4, part (2).
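The two public-announcement examples of this section are proved syntactically, but they can also be checked semantically on a small model. The sketch below assumes a hypothetical two-world coin model in which neither agent can tell heads from tails, treats a public announcement as deletion of the worlds where the announced sentence fails, and reads common knowledge as truth at every world reachable in zero or more steps. All function names are illustrative only.

```python
AGENTS = ("A", "B")

# Hypothetical coin model: world "h" satisfies H, world "t" satisfies T.
WORLDS = {"h": {"H"}, "t": {"T"}}
REL = {ag: {(u, v) for u in WORLDS for v in WORLDS} for ag in AGENTS}


# Sentences are represented as functions (worlds, rel, w) -> bool.
def atom(p):
    return lambda worlds, rel, w: p in worlds[w]


def neg(phi):
    return lambda worlds, rel, w: not phi(worlds, rel, w)


def iff(phi, psi):
    return lambda worlds, rel, w: phi(worlds, rel, w) == psi(worlds, rel, w)


def implies(phi, psi):
    return lambda worlds, rel, w: (not phi(worlds, rel, w)) or psi(worlds, rel, w)


def reachable(rel, w):
    """Worlds reachable from w in zero or more steps along any agent's arrows."""
    seen, frontier = {w}, [w]
    while frontier:
        u = frontier.pop()
        for pairs in rel.values():
            for x, y in pairs:
                if x == u and y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return seen


def common(phi):
    """Common knowledge of phi among all agents."""
    return lambda worlds, rel, w: all(phi(worlds, rel, v) for v in reachable(rel, w))


def after_pub(phi, psi):
    """[Pub phi]psi: if phi holds at w, then psi holds at w once phi is announced."""
    def holds(worlds, rel, w):
        if not phi(worlds, rel, w):
            return True  # the announcement cannot be executed at w
        kept = {v: atoms for v, atoms in worlds.items() if phi(worlds, rel, v)}
        new_rel = {
            ag: {(x, y) for (x, y) in pairs if x in kept and y in kept}
            for ag, pairs in rel.items()
        }
        return psi(kept, new_rel, w)
    return holds


def valid(phi):
    """Truth at every world of the example model (not validity in general)."""
    return all(phi(WORLDS, REL, w) for w in WORLDS)


H, T = atom("H"), atom("T")
exclusive = common(iff(H, neg(T)))
# Common knowledge of exclusivity implies common knowledge of not-T after Pub H.
claim = implies(exclusive, after_pub(H, common(neg(T))))
# One instance of scheme (11): announcing a commonly known fact changes nothing.
scheme_11 = implies(exclusive, iff(after_pub(iff(H, neg(T)), common(H)), common(H)))
```

Evaluating `valid(claim)` and `valid(scheme_11)` only confirms these instances in this one model; the provability results in the text are of course much stronger.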


A Commutativity Principle for Private Announcements. Suppose that B and C are disjoint sets of agents. Let $\varphi_1$, $\varphi_2$, and $\psi$ be sentences. Then we claim that

\[ [\mathrm{Pri}^B \varphi_1][\mathrm{Pri}^C \varphi_2]\psi \leftrightarrow [\mathrm{Pri}^C \varphi_2][\mathrm{Pri}^B \varphi_1]\psi. \]

That is, order does not matter with private announcements to disjoint groups.

Actions Do Not Change Common Knowledge of Non-epistemic Sentences. For yet another application, let $\psi$ be any boolean combination of atomic sentences. Then for all actions $\alpha$ of any of our logics, $\psi \leftrightarrow [\alpha]\psi$. The proof is an easy induction on $\psi$. Even more, we have $\Box^*_C \psi \leftrightarrow [\alpha]\Box^*_C \psi$. In one direction, we use the Action Rule, and in the other, the Common Knowledge Induction Rule (Lemma 5.5).

Endnotes. Although the general logical systems in this paper are new, there are important precursors for the target logics. Plaza (1989) constructs what we would call $L_0(\Sigma_{\mathrm{Pub}})$, that is, the logic of public announcements without common knowledge operators or program iteration. (He worked only on models where each accessibility relation is an equivalence relation, so his system includes the S5 axioms.) Gerbrandy (1999a, b), and also Gerbrandy and Groeneveld (1997) went a bit further. They studied the logic of completely private announcements (generalizing public announcements) and presented a logical system which included the common knowledge operators. That is, their system included the Epistemic Mix Axiom. They argued that all of the reasoning in the original Muddy Children scenario can be carried out in their system. This is important because it shows that in order to get a formal treatment of that problem and related ones, one need not posit models which maintain histories. Their system was not complete since it did not have anything like the Action Rule; this first appears in a slightly different form in Baltag (2003).

5.5. Conclusion

We have been concerned with actions in the social world that affect the intuitive concepts of knowledge, (justifiable) beliefs, and common knowledge. This paper has shown how to define and study logical languages that contain constructs corresponding to such actions. The many examples in this paper show that the logics “work”. Much more can be said about specific tricky examples, but we hope that the examples connected to our scenarios make the point that we are developing valuable tools.


The key step in the development is the recognition that we can associate to a social action $\alpha$ a mathematical model $\Sigma$. $\Sigma$ is a program model. In particular, it is a multi-agent Kripke model, so it has features in common with the state models that underlie formal work in the entire area. There is a natural operation of update product at the heart of our work. This operation is surely of independent interest because it enables one to build complex and interesting state models. The logical languages that we introduce use the update product in their semantics, but the syntax is a small variation on propositional dynamic logic. The formalization of the target languages involved the signature-based languages $L(\Sigma)$ and also their generalizations $L(S)$. These latter languages are needed to formulate the logic of private announcements, for example. We feel that presenting the update product first (before the languages) will make this paper easier to read, and having a relatively standard syntax should also help. Once we have our languages, the next natural step is to study them. This paper presented logical systems for validities, omitting many proofs for lack of space.

NOTES

1 It is important for us that the sentence p be a syntactic object, while the proposition p be a semantic object. See Section 2.5 for further discussion.
2 The subscript 3 comes from the number of the scenario; we shall speak of corresponding models S1, S2, etc., and each time the models will be the ones pictured in Section 1.1.
3 We are writing relational composition in left-to-right order in this paper.
4 In Section 4.5, we shall consider a slightly simpler model of lying.
5 To make this into a set, instead of a proper class, we really mean to take all finite signatures whose action types are natural numbers, and then take the disjoint union of this countable set of finite signatures.

REFERENCES

Baltag, Alexandru: 1999, ‘A Logic of Epistemic Actions’, (Electronic) Proceedings of the FACAS workshop, held at ESSLLI’99, Utrecht University, Utrecht.
Baltag, Alexandru: 2001, ‘Logics for Insecure Communication’, in J. van Benthem (ed.) Proceedings of the Eighth Conference on Rationality and Knowledge (TARK’01), Morgan Kaufmann, Los Altos, pp. 111–122.
Baltag, Alexandru: 2002, ‘A Logic for Suspicious Players: Epistemic Actions and Belief Updates in Games’, Bulletin of Economic Research 54(1), 1–46.
Baltag, Alexandru: 2003, ‘A Coalgebraic Semantics for Epistemic Programs’, in Proceedings of CMCS’03, Electronic Notes in Theoretical Computer Science 82(1), 315–335.


Baltag, Alexandru: 2003, Logics for Communication: Reasoning about Information Flow in Dialogue Games. Course presented at NASSLLI’03. Available at http://www.indiana.edu/~nasslli.
Baltag, Alexandru, Lawrence S. Moss, and Sławomir Solecki: 1998, ‘The Logic of Common Knowledge, Public Announcements, and Private Suspicions’, in I. Gilboa (ed.), Proceedings of the 7th Conference on Theoretical Aspects of Rationality and Knowledge (TARK’98), pp. 43–56.
Baltag, Alexandru, Lawrence S. Moss, and Sławomir Solecki: 2003, ‘The Logic of Epistemic Actions: Completeness, Decidability, Expressivity’, manuscript.
Fagin, Ronald, Joseph Y. Halpern, Yoram Moses, and Moshe Y. Vardi: 1996, Reasoning About Knowledge, MIT Press.
Fischer, Michael J. and Richard E. Ladner: 1979, ‘Propositional Modal Logic of Programs’, J. Comput. System Sci. 18(2), 194–211.
Gerbrandy, Jelle: 1999a, ‘Dynamic Epistemic Logic’, in Lawrence S. Moss, et al. (eds), Logic, Language, and Information, Vol. 2, CSLI Publications, Stanford University.
Gerbrandy, Jelle: 1999b, Bisimulations on Planet Kripke, Ph.D. dissertation, University of Amsterdam.
Gerbrandy, Jelle and Willem Groeneveld: 1997, ‘Reasoning about Information Change’, J. Logic, Language, and Information 6, 147–169.
Gochet, P. and P. Gribomont: 2003, ‘Epistemic Logic’, manuscript.
Kooi, Barteld P.: 2003, Knowledge, Chance, and Change, Ph.D. dissertation, University of Groningen.
Meyer, J.-J. and W. van der Hoek: 1995, Epistemic Logic for AI and Computer Science, Cambridge University Press, Cambridge.
Miller, Joseph S. and Lawrence S. Moss: 2003, ‘The Undecidability of Iterated Modal Relativization’, Indiana University Computer Science Department Technical Report 586.
Moss, Lawrence S.: 1999, ‘From Hypersets to Kripke Models in Logics of Announcements’, in J. Gerbrandy et al. (eds), JFAK. Essays Dedicated to Johan van Benthem on the Occasion of his 60th Birthday, Vossiuspers, Amsterdam University Press.
Plaza, Jan: 1989, ‘Logics of Public Communications’, Proceedings, 4th International Symposium on Methodologies for Intelligent Systems.
Pratt, Vaughn R.: 1976, ‘Semantical Considerations on Floyd-Hoare Logic’, in 7th Annual Symposium on Foundations of Computer Science, IEEE Comput. Soc., Long Beach, CA, pp. 109–121.
van Benthem, Johan: 2000, ‘Update Delights’, manuscript.
van Benthem, Johan: 2002, ‘Games in Dynamic Epistemic Logic’, Bulletin of Economic Research 53(4), 219–248.
van Benthem, Johan: 2003, ‘Logic for Information update’, in J. van Benthem (ed.) Proceedings of the Eighth Conference on Rationality and Knowledge (TARK’01), Morgan Kaufmann, Los Altos, pp. 51–68.
van Ditmarsch, Hans P.: 2000, ‘Knowledge Games’, Ph.D. dissertation, University of Groningen.
van Ditmarsch, Hans P.: 2001, ‘Knowledge Games’, Bulletin of Economic Research 53(4), 249–273.
van Ditmarsch, Hans P., W. van der Hoek, and B. P. Kooi: 2003, in V. F. Hendricks et al. (eds), Concurrent Dynamic Epistemic Logic, Synt. Lib. vol. 322, Kluwer Academic Publishers.


Alexandru Baltag
Oxford University Computing Laboratory
Oxford, OX1 3QD, U.K.

Lawrence S. Moss
Mathematics Department
Indiana University
Bloomington, IN 47405, U.S.A.


HANS ROTT

A COUNTEREXAMPLE TO SIX FUNDAMENTAL PRINCIPLES OF BELIEF FORMATION

ABSTRACT. In recent years there has been a growing consensus that ordinary reasoning does not conform to the laws of classical logic, but is rather nonmonotonic in the sense that conclusions previously drawn may well be removed upon acquiring further information. Even so, rational belief formation has up to now been modelled as conforming to some fundamental principles that are classically valid. The counterexample described in this paper suggests that a number of the most cherished of these principles should not be regarded as valid for commonsense reasoning. An explanation of this puzzling failure is given, arguing that a problem in the theory of rational choice transfers to the realm of belief formation.

1. INTRODUCTION

Part of the cognitive state of a person is characterized by the set of her beliefs and expectations. By the term ‘belief formation’, we will refer to two different kinds of processes. The first one is that of drawing inferences from a given set of sentences. Ideally, as a result of this process the reasoner arrives at a well-balanced set of beliefs in reflective equilibrium, sometimes referred to as the ‘belief set’. The second process is that of readjusting one’s belief set in response to some perturbation from outside (‘belief transformation’ might be a better term in this case). We will consider two subspecies of belief change that may be triggered by external perturbations. If there is some cognitive ‘input’, a sentence to be accepted, we speak of a belief revision. If the reasoner has to withdraw one of her beliefs, without accepting another belief in its place, she performs a belief contraction.

We will be dealing with the three topics of inference, revision and contraction in turn. For each of them, we consider two qualitative principles that have up to now been regarded as very plausible ones. We shall then tell a story with a few alternative developments which is intended to show that all of these principles fail. They do not fail due to the contingencies of some particular system that is being proposed, but indeed as norms for good reasoning. Based on recent reconstructions of belief formation in terms of the theory of rational choice, we give an explanation of why these principles fail. It turns out that well-known problems of this very general theory transfer to the special case where the theory is applied in operations of belief formation. In fact, it will turn out that this special case has features that block one of the standard excuses for the problem at hand. We end up in a quandary that poses a serious challenge to any future conception of belief formation procedures.

For a long time, the notion of inference has been thought to be identical with the notion of logical consequence or deduction. Partly as a result of the problems encountered in research in artificial intelligence and knowledge representation during the 1960s and 1970s, however, logicians have come to realize that most of our reasoning proceeds on the basis of incomplete knowledge and insufficient evidence. Implicit assumptions about the normal state and development of the world, also known as expectations, presumptions, prejudices or defaults, step in to fill the gaps in the reasoner’s body of knowledge. These default assumptions form the context for ordinary reasoning processes. They help us to generate conclusions that are necessary for reaching decisions about how to act, but they are retractable if further evidence arises. Thus our inferences will in many contexts be defeasible or non-monotonic in the sense that an extension of the set of premises does not generally result in an increase of the set of legitimate conclusions. This, however, does not mean that the classical concept of logical consequence becomes useless, or that our reasoning becomes completely irregular. For the purposes of this paper, we can in fact assume¹ that the set of our beliefs (and similarly: the set of our expectations) is consistent and closed with respect to some broadly classical consequence operation Cn. This combined notion of logical coherence (consistency-cum-closure) may be viewed as a constraint that makes the processes of inference and belief change a non-trivial task.²

Synthese 139: 225–240, 2004. Knowledge, Rationality & Action 61–76, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

2. SIX FUNDAMENTAL PRINCIPLES OF BELIEF FORMATION

2.1. In the last two decades, a great number of systems for non-monotonic reasoning have been devised that are supposed to cope with the newly discovered challenge.³ Many classical inference patterns are violated by such systems, but it is equally important to keep in mind that quite a number of classical inference patterns do remain valid. Let us now have a look at two properties that have usually been taken to be constitutive of sound reasoning with logical connectives like ‘and’ and ‘or’ even in the absence of monotonicity.

First, if the premise x allows the reasoner to conclude that y is true, then y may be conjoined to the premise x, without spoiling any of the conclusions that x alone permits to be drawn. This severely restricted form of the classical monotony condition is usually called Cumulative Monotony or Cautious Monotony:

(1) If y is in Inf(x), then Inf(x) ⊆ Inf(x ∧ y).

Here Inf(x) denotes the set of all conclusions that can be drawn if the premise is x. The reasoner may in fact possess an arbitrary finite number of premises which are conjunctively tied together in x. More importantly, Inf(x) is meant to denote what can be obtained if x is all the information available to the agent.

Second, if a reasoner wants to know what to infer from a disjunction x ∨ y, she may reason by cases. She will consider first what would hold on the assumption of x, and then consider what would hold on the assumption of y. Any sentence that may be inferred in both of these cases should be identifiable as a conclusion of an inference starting from x ∨ y. This is the content of a condition called Disjunction in the Premises:

(2) Inf(x) ∩ Inf(y) ⊆ Inf(x ∨ y).
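To see what conditions (1) and (2) demand, it can help to test them by brute force on a simple inference operation. The following is a minimal sketch under assumptions that are not part of the paper: sentences are identified with sets of four possible worlds, a hypothetical fixed ranking `RANK` says how normal each world is, and z is in Inf(x) when z holds in all most normal x-worlds. For an operation of this kind both conditions hold; the counterexample in Section 3 turns on reasoning that is not of this kind.

```python
from itertools import combinations

WORLDS = (0, 1, 2, 3)
# Hypothetical plausibility ranking: a lower rank means a more normal world.
RANK = {0: 0, 1: 1, 2: 1, 3: 2}

# Every sentence is represented by the set of worlds in which it is true.
PROPS = [frozenset(c) for r in range(len(WORLDS) + 1) for c in combinations(WORLDS, r)]


def minimal(x):
    """The most normal worlds among those satisfying x."""
    if not x:
        return frozenset()
    best = min(RANK[w] for w in x)
    return frozenset(w for w in x if RANK[w] == best)


def infers(x, z):
    """z is in Inf(x): z holds in all most normal x-worlds."""
    return minimal(x) <= z


def cumulative_monotony():
    """Condition (1): if y is in Inf(x), then Inf(x) is included in Inf(x and y)."""
    return all(
        infers(x & y, z)
        for x in PROPS
        for y in PROPS
        if infers(x, y)
        for z in PROPS
        if infers(x, z)
    )


def disjunction_in_premises():
    """Condition (2): whatever follows from x and from y follows from x or y."""
    return all(
        infers(x | y, z)
        for x in PROPS
        for y in PROPS
        for z in PROPS
        if infers(x, z) and infers(y, z)
    )
```

Both checks succeed for this ranking. The point of the design is that the ranking is fixed once and for all and does not depend on the premise.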

Cumulative Monotony and Disjunction in the Premises hold in most defeasible reasoning systems that have been proposed since non-monotonic logic came into being, and especially in those systems that are semantically well-motivated. There is one important and striking exception. Reiter’s (1980) seminal system of Default Logic violates both Cumulative Monotony and Disjunction in the Premises. However, no advocate of Reiter’s logic has ever argued that Cumulative Monotony and Disjunction in the Premises should be violated. These violations have usually been taken to be defects of the system that need to be remedied.⁴ Conditions (1) and (2) have never lost their normative force.

2.2. Let us now turn to the revision of belief sets in response to new information. If someone has to incorporate a conjunction x ∧ y, she has to accept both x and y. One idea how to go about revising by the conjunction is to revise first with x. If it so happens that y is accepted in the resulting belief set, then one should be sure that every belief contained in this set is also believed after a revision of the original belief set by x ∧ y. In the following, B ∗ x denotes the set of beliefs held after revising the initial belief set B by x (and likewise for the input sentence x ∧ y).

(3) If y is in B ∗ x, then B ∗ x ⊆ B ∗ (x ∧ y).

Another approach to circumscribing the result of a revision by the conjunction x ∧ y is to revise first with x, and then to just add y set-theoretically and take the logical consequences of everything taken together. This is not always a good idea, since y may be inconsistent with B ∗ x, and thus the second step would leave us with the inconsistent set of all sentences. But even if we may end up with too many sentences, this strategy seems unobjectionable if it is taken as yielding an upper bound for the revision by a conjunction. This is the content of principle

(4) B ∗ (x ∧ y) ⊆ Cn((B ∗ x) ∪ {y}).

2.3. Finally, we consider the removal of beliefs. Here again, we focus on upper and lower bounds of changes with respect to conjunctions. If a person wants to remove effectively a conjunction x ∧ y, she has to remove at least one of the conjuncts, that is, either x or y. So if the second conjunct y is still retained in the result of removing the conjunction, what has happened is exactly that the first conjunct x has been removed. We will be content here with a weaker condition that replaces the identity by an inclusion. Here and elsewhere, B ∸ x denotes the set of beliefs that are retained after withdrawing x from the initial belief set B (and likewise for the case where x ∧ y is to be discarded):

(5) If y is in B ∸ (x ∧ y), then B ∸ (x ∧ y) ⊆ B ∸ x.

Another approach to circumscribing the result of a contraction with respect to the conjunction x ∧ y is to consider first what would be the result of removing x, and then consider what would be the result of removing y. It is not always necessary to take into account both possibilities, but doing so should certainly be suitable for setting a lower bound. Any sentence that survives both of these thought experiments should surely be included in the result of the removal of x ∧ y. This is the content of principle

(6) (B ∸ x) ∩ (B ∸ y) ⊆ B ∸ (x ∧ y).

Principles (3)–(6) have been endorsed almost universally in the literature on belief revision and contraction. The classic standard was set by Alchourrón, Gärdenfors and Makinson (1985). Conditions (4) and (6) are the seventh of their eight ‘rationality postulates’ for revision and contraction, while conditions (3) and (5) are considerably weaker – and thus considerably less objectionable – variants of their eighth postulates.⁵ There exist sophisticated ‘translations’ between operations of nonmonotonic inference, belief revision and removal which show that, notwithstanding different appearances, conditions (1), (3) and (5) are essentially different sides of the same (three-faced) coin, as are conditions (2), (4) and (6).⁶


Some methods of belief formation suggested in the literature violate one or the other of the six principles. Nevertheless, it is fair to say that these principles have retained their great intuitive appeal and have stood fast up to the present day as norms to which all good reasoning is supposed to conform. But this is wrong, or so I shall argue. The following section presents an example that shows, I think, that not a single one of the six principles listed above ought to be endorsed as a valid principle of rational belief formation.

3. THE COUNTEREXAMPLE

The story goes as follows. A well-known philosophy department has announced an open position in metaphysics. Among the applicants for the job there are a few persons that Paul, an interested bystander, happens to know. First, there is Amanda Andrews, an outstanding specialist in metaphysics. Second, we have Bernice Becker, who is also definitely a very good, though not quite as excellent a metaphysician as Andrews. Becker has in addition done some substantial work in logic. A third applicant is Carlos Cortez. He has a comparatively slim record in metaphysics, but he is widely recognized as one of the most brilliant logicians of his generation. Suppose that Paul’s initial set of beliefs and expectations includes that neither Andrews nor Becker nor Cortez will get the job (say, because Paul and everybody else thinks that Don Doyle, a star metaphysician, is the obvious candidate who is going to get the position anyway). Paul is aware of the fact that only one of the contenders can get the job.

3.1. Consider now three hypothetical scenarios, each of which describes a potential development of the selection procedure. The scenarios are not meant as describing consecutive stages of a single procedure. At most one of the potential scenarios can turn out to become real. In each of these alternative scenarios, Paul is genuinely taken by surprise, because he learns that one of the candidates he had believed to be turned down will – or at least may – be offered the position. (Doyle, by the way, has told the department that he has accepted an offer from Berkeley.)

To make things shorter, we introduce some abbreviations. Let the letters a, b and c stand for the sentences that Andrews, Becker and Cortez, respectively, will be offered the position. Paul is having lunch with the dean, a very competent, serious and profoundly honest man who is also the chairman of the selection committee.


SCENARIO 1. The dean tells Paul in confidence that either Andrews or Becker will be appointed. This message comes down to supplying Paul with the premise a ∨ b. Given this piece of information, Paul concludes that Andrews will get the job. This conclusion is based on his background assumptions that Andrews has superior qualities as a metaphysician and that expertise in the field advertised is the decisive criterion for the appointment. From his background knowledge that there is only one position available, Paul further infers that all the other candidates are going to return empty-handed.

SCENARIO 2. In this scenario the dean tells Paul that either Andrews or Becker or Cortez will get the job, thus supplying him with the premise a ∨ b ∨ c. This piece of information triggers off a rather subtle line of reasoning. Knowing that Cortez is a splendid logician, but that he can hardly be called a metaphysician, Paul comes to realize that his background assumption that expertise in the field advertised is the decisive criterion for the appointment cannot be upheld. Apparently, competence in logic is regarded as a considerable asset by the selection committee. Still, Paul keeps on believing that Cortez will not make it in the end, because his credentials in metaphysics are just too weak. Since, however, logic appears to contribute positively to a candidate’s research profile, Paul concludes that Becker, and not Andrews, will get the job.

This qualitative description should do for our purposes, but for those who prefer the precision of numbers, the following elaboration of our story can be given (see Figure 1). Suppose that the selection committee has decided to assign numerical values in order to evaluate the candidates’ work. Andrews scores 97 out of 100 in metaphysics, but she has done no logic whatsoever, so she scores 0 here. Becker scores 92 in metaphysics and a respectable 50 in logic. Cortez scores only 40 in metaphysics, but boasts of 99 in logic. In scenario 1, Paul takes it that metaphysics is the only criterion, so clearly Andrews must be the winner in his eyes. But in scenario 2, Paul gathers that, rather unexpectedly, logic has some importance. As can easily be verified, any weight he may wish to attach to the logic score between 1/10 and 1/2 (with metaphysics taking the rest) will see Becker ending up in front of both Andrews and Cortez.

Figure 1. Becker is ahead of both Andrews and Cortez if logic is assigned a weight between 0.1 and 0.5.

SCENARIO 3. This is a very surprising scenario in which Paul is told that Cortez is actually the only serious candidate left in the competition. There is little need to invest a lot of thinking. Paul accepts c in this case.

Let us summarize the scenarios as regards the conclusions Paul would draw from the various premises that he may get from the dean of the faculty. In scenario 1, Paul infers from a ∨ b that a and ¬b (along with ¬c and ¬d which we will not mention any more). In scenario 2, he infers from a ∨ b ∨ c that ¬a and b. In scenario 3, he infers from c that ¬a and ¬b.

Now we first find that this situation does not conform to Cumulative Monotony. Substitute a ∨ b ∨ c for x and a ∨ b for y in (1). Even though Paul concludes that a ∨ b is true on the basis of the premise a ∨ b ∨ c, it is not the case that everything inferable from the latter is also inferable from (a ∨ b ∨ c) ∧ (a ∨ b), which is equivalent to a ∨ b. Sentences ¬a and b are counterexamples.

Second, the example at the same time shows that Disjunction in the Premises does not hold. Take (2) and substitute a ∨ b for x and c for y. Then notice that ¬b can be inferred both from a ∨ b and from c, but it cannot be inferred from a ∨ b ∨ c. Summing up, even though Paul’s reasoning is perfectly rational and sound, it violates both Cumulative Monotony and Disjunction in the Premises.

3.2. Let us then turn to the dynamics of belief. The case of potential revisions of belief is very similar to the case of default reasoning. What we have so far considered as the set of all sentences that may be inferred from a given premise x, will now be reinterpreted as the result of revising a belief set by a new piece of information. This is best explained by looking at the concrete case of the selection procedure.

Paul’s initial belief set B contains ¬a, ¬b, ¬c and d (among other things). Paying attention to the fact that the structure of (3) is very similar to the structure of (1), we can re-use the above argument. If Paul’s set of initial beliefs and expectations is revised by a ∨ b ∨ c, then the resulting belief set includes a ∨ b (because it includes b). However, the revised belief set B ∗ (a ∨ b ∨ c) is not a subset of the belief set B ∗ ((a ∨ b ∨ c) ∧ (a ∨ b)) = B ∗ (a ∨ b), as is borne out by sentences like ¬a and b. Thus (3) is violated.

In principle (4), substitute a ∨ b ∨ c for x and a ∨ b for y. Then the left-hand side is changed to B ∗ (a ∨ b), while the right-hand side consists of the set of all logical consequences of B ∗ (a ∨ b ∨ c) and a ∨ b taken together. Since a ∨ b is already included in B ∗ (a ∨ b ∨ c), we need only consider this latter set. But as we have by now seen several times, the two revised belief sets just mentioned cannot be compared in terms of the subset relation. So (4) is violated.

3.3. For the consideration of belief contractions, we have to change our story slightly. Suppose now that in the different scenarios Paul may be going through, the dean does not go so far as to tell him that Andrews or Becker (or Cortez) will get the offer, but only that one of them might get the offer. Paul’s proper response to this is to withdraw his prior belief that neither Andrews nor Becker (nor Cortez) will get the job, without at the same time acquiring any new belief instead. In all other respects the story is just the same as before.

So this time, in scenario 1′, when Paul is given the information that Andrews or Becker might get the job, he withdraws his belief that ¬a, but he keeps ¬b. And in the alternative scenario 2′, when Paul learns from the dean that Andrews or Becker or Cortez might get the job, he again understands that competence in logic is regarded as an asset by the selection committee, and so he withdraws ¬b while retaining ¬a. Scenario 3′ just leads Paul to withdraw ¬c.

Now we can see that the prescriptions of the above principles for belief contraction are not complied with. First consider principle (5) and substitute ¬a ∧ ¬b for x and ¬c for y. Then we get ¬c in B ∸ (¬a ∧ ¬b ∧ ¬c), but this belief set is not a subset of B ∸ (¬a ∧ ¬b), since ¬a is in the former but not in the latter set.

Finally, the same substitutions serve to refute principle (6). The belief ¬b is retained in both B ∸ (¬a ∧ ¬b) and B ∸ ¬c, but it is withdrawn in B ∸ (¬a ∧ ¬b ∧ ¬c).

In sum, then, we have found that Paul’s reasoning, which is perfectly rational and adequate for the situations sketched, leads to belief formation processes that violate each of the six fundamental principles (1)–(6). How can this be explained?
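The numerical elaboration of scenario 2 and the failures of principles (1) and (2) can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the scores are the ones given in the story, while the function names and the choice of a logic weight of 1/4 whenever Cortez is named by the dean are our own modelling assumptions, chosen only because 1/4 lies inside the interval mentioned in the text.

```python
from fractions import Fraction

# (metaphysics, logic) scores as given in the elaboration of scenario 2.
SCORES = {"Andrews": (97, 0), "Becker": (92, 50), "Cortez": (40, 99)}


def winner(candidates, logic_weight):
    """Candidate with the highest weighted score; metaphysics takes the rest."""
    def total(name):
        metaphysics, logic = SCORES[name]
        return (1 - logic_weight) * metaphysics + logic_weight * logic
    return max(candidates, key=total)


def paul_expects(named):
    """Whom Paul expects to be appointed, given the candidates the dean names.

    Modelling assumption (not from the paper): Paul gives logic a weight of
    1/4 if Cortez is named, and a weight of 0 otherwise.
    """
    weight = Fraction(1, 4) if "Cortez" in named else Fraction(0)
    return winner(named, weight)


scenario_1 = paul_expects(["Andrews", "Becker"])            # premise a or b
scenario_2 = paul_expects(["Andrews", "Becker", "Cortez"])  # premise a or b or c
scenario_3 = paul_expects(["Cortez"])                       # premise c

# (1) fails: a-or-b is inferred in scenario 2, yet taking it as the premise
# (scenario 1) yields a different winner.
cumulative_monotony_fails = (
    scenario_2 in ("Andrews", "Becker") and scenario_1 != scenario_2
)

# (2) fails: not-b is inferred in scenarios 1 and 3, but not in scenario 2.
disjunction_fails = (
    scenario_1 != "Becker" and scenario_3 != "Becker" and scenario_2 == "Becker"
)
```

Calling `winner(SCORES, w)` for weights `w` between 1/10 and 1/2 gives Becker each time, in line with Figure 1. The menu-dependent weight in `paul_expects` is what makes the expected winner depend on which alternatives are mentioned.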


4. FIRST REACTION: A PROBLEM OF FORMALIZATION

A first intuitive reaction to the puzzle is to simply deny that the example exhibits the formal structure that it has been represented as having here, and to claim instead that the various messages Paul may receive from the dean are incompatible with one another. When the dean says that either Andrews or Becker or Cortez will be offered the job, isn’t she, in some sense, saying more than when she says that either Andrews or Becker will be the winner of the competition? Namely, that it is possible that Cortez will be offered the position, while the latter message, at least implicitly, excludes that possibility. Shouldn’t we therefore represent the dean’s message in a somewhat more explicit way?

Three things can be said in reply to this objection. First, it is true that a ∨ b implicitly conveys the information that Cortez is not among the selected candidates. However, the kind of reasoning that turns implicit messages into explicit beliefs is exactly what is meant to be captured by theories of nonmonotonic reasoning and belief change. It is therefore important to insist that ¬c is not part of the dean’s message, but that it is rather inferred (perhaps subconsciously, automatically) by the reasoner. Representing the dean’s statement in scenario 1 as (a ∨ b) ∧ ¬c would simply not be adequate.

Second, it is of course true that the message a ∨ b ∨ c does not in itself exclude the possibility that c will come out true. But it is not necessary that each individual disjunct is considered to be a serious possibility by any of the interlocutors. For instance, nothing in the story commits us to the view that either the dean or Paul actually believes that Cortez stands a chance of being offered the position. So the dean’s statement in scenario 2 must not be represented as saying that each of a, b and c is possible.
As is common in the literature on belief formation, we presuppose in this paper that our language does not include the means to express autoepistemic possibility, something like ◇c (read as, ‘for all I believe, c is possible’). We just saw that we do not need such means for the present case, and we want to limit the expressiveness of the propositional language. Admitting autoepistemic operators immediately makes matters extremely complicated and invalidates almost all of the logical principles that have been envisaged in the theory of belief formation.7 We conclude that the problem is not caused by a sloppy translation of a commonsensical description of the case into regimented language. What, then, does the problem arise from?


5. PROBLEMS OF RATIONAL CHOICE ARE PROBLEMS FOR BELIEF FORMATION

Principles of nonmonotonic inference and belief change can be systematically interpreted in terms of rational choice.8 In this view, the process of belief formation is one of resolving conflicts among one’s beliefs and expectations by following through in thought the most plausible possibilities. According to a semantic modelling, the reasoner takes on as beliefs everything that is the case in all of the most plausible models that satisfy the given information, where the most plausible models are determined with the help of a selection function. A syntactic modelling, closely related to the semantic one, describes the reasoner as eliminating the least plausible sentences from a certain set of sentences that generates the conflict within her belief or expectation set. And again, the task of determining the least plausible sentences is taken over by a selection function. It is not possible here to give a description of these nicely dovetailing mechanisms even in the barest outlines. Suffice it to say that there are elaborate theories exhibiting in full mathematical detail striking parallels between the ‘theoretical reason’ at work in belief formation processes and those parts of ‘practical reason’ that manifest themselves in processes of rational choice.

On this interpretation, Disjunction in the Premises (2) and its counterparts for belief change, (4) and (6), turn out to be instantiations of one of the most fundamental conditions – perhaps the most fundamental condition – of the theory of rational choice. This condition, called Independence of Irrelevant Alternatives, the Chernoff property or Sen’s Property α, says that any element which is optimal in a certain set is also optimal in all subsets of that larger set in which it is contained. Cumulative Monotony (1) and its counterparts, (3) and (5), have turned out to be instantiations of another important condition in the theory of rational choice, namely Aizerman’s axiom.
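
The semantic modelling just described can be illustrated with a minimal sketch in Python. It assumes a numeric plausibility ranking over four mutually exclusive outcomes (lower means more plausible) and a selection function that picks the best-ranked outcomes compatible with the information received. The names `WORLDS`, `most_plausible` and `reproduces_paul` are hypothetical and not taken from the literature cited. The brute-force search suggests why no single ranking that is fixed independently of the incoming information can reproduce Paul’s two revisions: what is optimal in the set {A, B, C} would have to remain optimal in the subset {A, B}.

```python
from itertools import product

# Possible worlds: who gets the job (exactly one outcome holds).
WORLDS = ("A", "B", "C", "other")


def most_plausible(rank, candidates):
    """Selection function: the candidates with the lowest (best) rank."""
    best = min(rank[w] for w in candidates)
    return {w for w in candidates if rank[w] == best}


def reproduces_paul(rank):
    """Does this fixed ranking yield Paul's beliefs in both scenarios?"""
    after_a_or_b = most_plausible(rank, {"A", "B"})            # revise by a ∨ b
    after_a_or_b_or_c = most_plausible(rank, {"A", "B", "C"})  # revise by a ∨ b ∨ c
    # Scenario 1: Paul comes to believe a. Scenario 2: he comes to believe b.
    return after_a_or_b == {"A"} and after_a_or_b_or_c == {"B"}


# Enumerate every assignment of ranks 0..3 to the four worlds (ties allowed).
rankings = [
    dict(zip(WORLDS, ranks))
    for ranks in product(range(4), repeat=len(WORLDS))
]
matches = [rank for rank in rankings if reproduces_paul(rank)]

print(len(rankings), len(matches))  # 256 0
```

The search finds no match because the first scenario requires A to be strictly more plausible than B, while the second requires the opposite.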
The above scenarios are modelled after well-known choice situations in which Property α is violated, cases which also happen to disobey Aizerman’s axiom. These properties may fail to be satisfied if the very ‘menu’ from which an agent is invited to choose carries information which is new to the agent. The locus classicus for the problem is a passage in Luce and Raiffa (1957, p. 288). They tell a story about a customer of a restaurant who chooses salmon from a menu consisting of salmon and steak only, but changes to steak after being informed that fried snails and frog’s legs are on the menu, too. This customer is not to be blamed for irrationality. The reason why he changes his mind is that he infers from the extended menu that the restaurant must be a good one, one where no risk is involved in taking the steak (which is the customer’s ‘real’ preference as it were). Sen calls this phenomenon the ‘epistemic value’ or the ‘epistemic relevance of the menu’.9 Luce and Raiffa chose to avoid the problem by fiat:

This illustrates the important assumption implicit in axiom 6 [=essentially Sen’s Property α], namely, that adding new acts to a decision problem under uncertainty does not alter one’s a priori information as to which is the true state of nature. In what follows, we shall suppose that this proviso is satisfied. In practice this means that, if a problem is first formulated so that the availability of certain acts influences the plausibility of certain states of nature, then it must be reformulated by redefining the states of nature so that the interaction is eliminated.

Luce and Raiffa thus explain away the problem of the restaurant customer because the extended menu conveys the information that the restaurant is a good one. The customer’s choice is not really between salmon and steak, but basically between salmon-in-a-good-restaurant and steak-in-a-good-restaurant (assuming that he does not like fried snails and frog’s legs). Analogously, we may say that in the above example, Paul’s doxastic choice is not simply one between the belief that Andrews gets the job and the belief that Becker gets the job. Given the information in scenario 2, his choice is rather between the belief that Andrews gets the job and logic matters, and the belief that Becker gets the job and logic matters. So it seems that the two scenarios cannot be compared in the first place.

Have we solved the puzzle now? No, we haven’t. To see this, we have to understand first what is not responsible for the problem. In Luce and Raiffa’s example, the reason for the trouble is not that the extended menu introduces a refinement in the customer’s options, nor is it that his preferences change, nor is it that the second situation cannot be compared with the first. The customer is well aware right from the beginning that there are good restaurants and bad restaurants, and that he would prefer steak in a good, but salmon in a bad restaurant. What the availability of snails and frog’s legs signals, however, is that the customer is actually in a good restaurant, whereas he had at first been acting on the assumption that he is in a bad one.10 Luce and Raiffa are right in suggesting that the point is that the extended menu carries novel information about the state of the world. Luce and Raiffa’s argument thus may make good sense as a rejoinder in the context of the general theory of choice and decision. It is simply not this theory’s business to explain how information is surreptitiously conveyed through the particular contents of a certain menu.
So Luce and Raiffa have a justification for refusing to deal with that problem. Unfortunately, no analogous defense is available against the problem highlighted in the present paper. It is the business of theories of belief formation (which include expectation-based inference and belief change) to model how one’s prior information is affected by information received from external sources. This is precisely what these theories are devised to explain! Therefore, the anomaly cannot be pushed away into a neighbouring research field.
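
The restaurant story can be replayed as a check of Property α. The following Python sketch is an illustration under stated assumptions rather than a formalization found in Luce and Raiffa or Sen: the customer’s preferences depend on the state (good or bad restaurant), and a hypothetical signalling rule, `inferred_state`, lets the menu itself reveal the state. All function names are invented for this example.

```python
EXOTIC = {"fried snails", "frog's legs"}


def best_dish(menu, state):
    """State-dependent preference from the story: steak in a good
    restaurant, salmon in a bad one; the exotic dishes are never chosen."""
    order = ["steak", "salmon"] if state == "good" else ["salmon", "steak"]
    return next(dish for dish in order if dish in menu)


def inferred_state(menu):
    """Hypothetical signalling rule: exotic dishes reveal a good restaurant;
    otherwise the customer acts on the pessimistic assumption."""
    return "good" if EXOTIC & set(menu) else "bad"


def customer_choice(menu):
    """The choice actually observed: preference applied to the inferred state."""
    return best_dish(menu, inferred_state(menu))


def satisfies_alpha(choice, larger, smaller):
    """Property α for a single-valued choice: an option chosen from the
    larger menu must also be chosen from any smaller menu containing it."""
    picked = choice(larger)
    return picked not in smaller or choice(smaller) == picked


short_menu = frozenset({"salmon", "steak"})
long_menu = short_menu | EXOTIC

print(customer_choice(short_menu), customer_choice(long_menu))  # salmon steak
print(satisfies_alpha(customer_choice, long_menu, short_menu))  # False
# With the state held fixed, as in Luce and Raiffa's reformulation, α holds.
print(satisfies_alpha(lambda m: best_dish(m, "good"), long_menu, short_menu))  # True
```

The violation arises only in `customer_choice`, where the menu doubles as evidence about the state. Once the state is held fixed, the same preferences satisfy Property α, which is the point of Luce and Raiffa’s proviso.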

6. CONCLUSION

What is the moral of our story? We began by reviewing six of the most important and central logical principles that have generally been taken to be valid in common-sense reasoning and that have widely been endorsed as yardsticks for evaluating the adequacy of systems of non-classical logics intended to capture such reasoning. We have seen, however, that there are situations in which these reasoning patterns should not be expected to hold. This comes down to declaring them invalid, not as a contingent property of some particular system that has been proposed in the literature, but as norms to which rational belief formation ought to conform. We have dismissed as premature the idea that the formalization should be puffed up in order to represent what is only implicit in the dean’s message. In the context of theories of belief formation, it seems more appropriate to obey a First maxim of formalization: Do not put into the formalization what is not part of the ordinary language expression.

Another lesson to be drawn from the above discussion is that a choice-theoretic modelling of belief formation processes does not only inherit the elegance and power of the theory of rational choice, but also its problems. This is not a trivial observation. Problems encountered in a general theory need not necessarily persist if this theory is applied to a restricted domain. The processes involved in belief formation are of a broadly logical kind, and one might expect that this domain is particularly well-behaved so that one would not encounter the strange phenomena haunting the notion of rational choice. Our example has shown that this is not the case. The problems do carry over from the general to the more specific domain. Fundamental principles of belief formation are as affected by the anomaly of the ‘informational value of the menu’ as the principles of rational choice. I have then argued that things are even worse as regards this special domain.
The reason is that a natural defense – Luce and Raiffa’s defense – which makes sense for the general theory is not open for the special case of belief formation. The problem of the informational value of the menu may appear to be alien to the concerns of rational choice theory. However, it is clear that theories of belief formation are theories just about the processing of information that comes in propositional form. The source of the trouble concerns a paradigm problem for belief formation theories rather than something that may be discharged into some other field of research.

But perhaps the solution lies in the formalization after all. Compare the dean’s statement that a ∨ b ∨ c with other ways of obtaining this information. Suppose for example, that Paul hears about Doyle’s new job, that he observes all the other candidates except Andrews, Becker and Cortez leaving the place with long faces, but that he is not sure whether he has missed anyone else wearing a long face. The information Paul has in this scenario seems to be just the same as the one obtained in scenario 2 above: Andrews or Becker or Cortez will get the job. But this time, of course, Paul would not conclude that logic is important.11 Why is this? I suggest that the case points to the converse to the maxim above, namely to the Second maxim of formalization: Put into the formalization everything that is part of the ordinary language expression.

By ‘expression’, I now do not mean the type, but the token. The expression is not to be taken here as separated from its context of production. What Paul really learns in scenario 2 is not just that either Andrews, Becker or Cortez will get the job, but that the dean says so. And it is part of Paul’s background assumptions that the dean would not say this without very good reasons. The most plausible reason is, according to Paul’s background beliefs, that the selection committee regards Cortez’s scientific profile as impressive and suitable. No such reasoning takes place in the alternative scenario in which a ∨ b ∨ c is obtained by inductive elimination of the other candidates.
The fact that a piece of information comes from a certain origin or source or is transmitted by a certain medium conveys information of its own. In a short slogan, there is no message without a medium. What the example seems to teach us is that at least in some cases, the reasoner should receive not just the content of a message, but take account of the message-with-the-medium. Unfortunately, this leaves us with a dilemma. Formalization is translation into a formal language, and this translation is done with a view to processing the formulas obtained. This is the Third maxim of formalization: The results of a formalization should be efﬁciently processable by an appropriate system of formal reasoning.


Usually, the processing is handled by some kind of logic that allows the reasoner to combine her bits and pieces of information and draw conclusions from them. It is easy to take a ∨ b ∨ c, say, and use it together with other statements about Andrews, Becker and Cortez to infer more about the relevant facts. For example, if the reasoner receives further information that ¬c, then she can conclude that a ∨ b. But it is much harder to work with X-says-that-(a ∨ b ∨ c), since adding Y-tells-me-that-(¬c) does not give her any logical consequences. The medium screens off the message, as it were, from the reasoner’s attempt to exploit its content. Formalization is more accurate if it keeps track of the sources, but it is not very useful any more. The reasoner needs to detach the message from the medium, in order to be able to utilize its content for unmediated inferences.

So our discussion has a negative end. Once we make logic more ‘realistic’ in the sense that it captures patterns of everyday reasoning, there is no easy way of saving any of the beautiful properties that have endeared classical logic to students of the subject from Frege on. We have identified a formidable problem, but we haven’t been able to offer an acceptable solution for it. But problems there are, and creating awareness of problems is one of the important tasks of philosophy.
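
The contrast between a message and a message-with-the-medium can be made concrete with a small truth-table check. The following Python sketch assumes one particularly simple encoding, chosen here only for illustration: a report such as X-says-that-(a ∨ b ∨ c) is treated as an opaque propositional atom with no logical link to a, b or c. The helper `entails` and the atom names are hypothetical.

```python
from itertools import product


def entails(premises, conclusion, atoms):
    """Classical entailment by brute force over all truth assignments."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False  # counter-model found
    return True


def a_or_b_or_c(v):
    return v["a"] or v["b"] or v["c"]


def not_c(v):
    return not v["c"]


def a_or_b(v):
    return v["a"] or v["b"]


# Unmediated content: the two messages combine as expected.
plain_atoms = ["a", "b", "c"]
print(entails([a_or_b_or_c, not_c], a_or_b, plain_atoms))  # True

# Mediated content (assumed encoding): each report is an opaque atom.
report_atoms = plain_atoms + ["X_says_a_or_b_or_c", "Y_says_not_c"]


def x_says(v):
    return v["X_says_a_or_b_or_c"]


def y_says(v):
    return v["Y_says_not_c"]


print(entails([x_says, y_says], a_or_b, report_atoms))  # False
```

Under this encoding the two reports have no joint consequences about the candidates, which is one way of seeing how the medium screens off the message. A richer logic of testimony would add bridge principles linking reports to their contents, and nothing in this sketch is meant to rule that out.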

ACKNOWLEDGEMENTS

I would like to thank Luc Bovens, Anthony Gillies, Franz Huber, Isaac Levi, David Makinson, Erik Olsson, Wlodek Rabinowicz, Krister Segerberg, Wolfgang Spohn and two anonymous referees of this journal for many helpful comments. It is only lack of space and time that prevents me from taking up more of the interesting issues they have raised.

NOTES

1 Along with Dennett (1971, pp. 10–11) and Stalnaker (1984, p. 82), as well as the majority of the more technical literature mentioned below.

2 What has been said in this little paragraph takes the position that the nonmonotonicity of commonsense reasoning is an effect of a certain way of using classical logic, rather than a result of applying some irreducibly non-classical, ampliative inference operation (see e.g., Morgan 2000 and Kyburg 2001). The counterexample below is independent of any particular philosophical stand in this matter.

3 For an excellent survey of the logical patterns underlying nonmonotonic reasoning, see Makinson (1994).

4 See for instance the discussions in Brewka (1991), Giordano and Martelli (1994) and Roos (1998).


5 For comprehensive treatments of this topic, see Gärdenfors (1988), Gärdenfors and Rott (1995) and Hansson (1999).

6 See Makinson and Gärdenfors (1991) and Rott (2001, chapter 4).

7 Some of the relevant problems are highlighted by Rott (1989) and Lindström and Rabinowicz (1999). – As for the underlying language, it is worth noting that our challenge to the theory of belief formation does not depend on any extension of standard propositional language, as other counterexamples to prominent logical principles do. Compare the much-debated riddles raised by McGee’s (1985) counterexample to modus ponens and Gärdenfors’ (1986) trivialization theorem, which both depend on the language’s including non-truth functional conditionals.

8 The theory of rational choice I am referring to here is the classical one deriving from economists like Paul Samuelson, Kenneth Arrow and Amartya Sen. A beautiful and concise summary of the most relevant ideas is given by Moulin (1985). This theory is applied to the field of belief formation by Lindström (1991) and Rott (1993, 2001).

9 Sen (1993, pp. 500–502; 1995, pp. 24–26) has brought the problem to wide attention. There are other reasons why Property α may fail without the chooser being irrational; see for example Levi (1986, pp. 32–34) and Kalai, Rubinstein and Spiegler (2002) about decision making on the basis of multiple preference relations. It remains to be seen how many of the reasons that speak against Property α as a general requirement for rational choice apply to the rather special domain of belief formation.

10 Two sorts of reasons come to mind that might account for the customer’s pessimism. Either his experience is that there are more bad restaurants than good ones, which makes it more likely that the one he is just visiting is bad. Or the pessimistic assumption is made because it is the relevant one for finding out which decision minimizes maximal damage, and the customer indeed wishes to be on the safe side.

11 I wish to thank Wlodek Rabinowicz for inventing such a scenario in a discussion at a conference in Prague. Wolfgang Spohn came up with a similar point independently a little later in an email message.

REFERENCES

Alchourrón, Carlos, Peter Gärdenfors and David Makinson: 1985, ‘On the Logic of Theory Change: Partial Meet Contraction Functions and Their Associated Revision Functions’, Journal of Symbolic Logic 50, 510–530.

Brewka, Gerd: 1991, ‘Cumulative Default Logic: In Defense of Non-Monotonic Inference Rules’, Artificial Intelligence 50, 183–205.

Dennett, Daniel: 1971, ‘Intentional Systems’, Journal of Philosophy 68, 87–106. The page reference is to the reprint in D.D.: 1978, Brainstorms, MIT Press, Cambridge, MA, pp. 3–22.

Gärdenfors, Peter: 1986, ‘Belief Revisions and the Ramsey Test for Conditionals’, Philosophical Review 95, 81–93.

Gärdenfors, Peter: 1988, Knowledge in Flux. Modeling the Dynamics of Epistemic States, Bradford Books, MIT Press, Cambridge, MA.

Gärdenfors, Peter and Hans Rott: 1995, ‘Belief Revision’, in D. M. Gabbay, C. J. Hogger, and J. A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming Volume IV: Epistemic and Temporal Reasoning, Oxford University Press, Oxford, pp. 35–132.


Giordano, Laura and Alberto Martelli: 1994, ‘On Cumulative Default Logics’, Artificial Intelligence 66, 161–179.

Hansson, Sven O.: 1999, A Textbook of Belief Dynamics: Theory Change and Database Updating, Kluwer Academic Publishers, Dordrecht.

Kalai, Gil, Ariel Rubinstein, and Ran Spiegler: 2002, ‘Rationalizing Choice Functions by Multiple Rationales’, Econometrica 70, 2481–2488.

Kyburg, Henry E., Jr.: 2001, ‘Real Logic is Nonmonotonic’, Minds and Machines 11, 577–595.

Levi, Isaac: 1986, Hard Choices, Cambridge University Press, Cambridge.

Lindström, Sten: 1991, ‘A Semantic Approach to Nonmonotonic Reasoning: Inference Operations and Choice’, Uppsala Prints and Preprints in Philosophy, Department of Philosophy, University of Uppsala.

Lindström, Sten and Wlodek Rabinowicz: 1999, ‘DDL Unlimited: Dynamic Doxastic Logic for Introspective Agents’, Erkenntnis 50, 353–385.

Luce, R. Duncan and Howard Raiffa: 1957, Games and Decisions, John Wiley & Sons, New York.

Makinson, David: 1994, ‘General Patterns in Nonmonotonic Reasoning’, in D. M. Gabbay, C. J. Hogger, and J. A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 3: Nonmonotonic Reasoning and Uncertain Reasoning, Oxford University Press, Oxford, pp. 35–110.

Makinson, David and Peter Gärdenfors: 1991, ‘Relations between the Logic of Theory Change and Nonmonotonic Logic’, in A. Fuhrmann and M. Morreau (eds.), The Logic of Theory Change, Springer LNAI 465, Berlin, pp. 185–205.

McGee, Van: 1985, ‘A Counterexample to Modus Ponens’, Journal of Philosophy 82, 462–471.

Morgan, Charles G.: 2000, ‘The Nature of Nonmonotonic Reasoning’, Minds and Machines 10, 321–360.

Moulin, Hervé: 1985, ‘Choice Functions over a Finite Set: A Summary’, Social Choice and Welfare 2, 147–160.

Reiter, Raymond: 1980, ‘A Logic of Default Reasoning’, Artificial Intelligence 13, 81–132.

Roos, Nico: 1998, ‘Reasoning by Cases in Default Logic’, Artificial Intelligence 99, 165–183.
Rott, Hans: 1989, ‘Conditionals and Theory Change: Revisions, Expansions, and Additions’, Synthese 81, 91–113.

Rott, Hans: 1993, ‘Belief Contraction in the Context of the General Theory of Rational Choice’, Journal of Symbolic Logic 58, 1426–1450.

Rott, Hans: 2001, Change, Choice and Inference, Oxford Logic Guides, Vol. 42, Clarendon Press, Oxford.

Sen, Amartya K.: 1993, ‘Internal Consistency of Choice’, Econometrica 61, 495–521.

Sen, Amartya K.: 1995, ‘Is the Idea of Purely Internal Consistency of Choice Bizarre?’, in J. E. J. Altham and R. Harrison (eds.), World, Mind, and Ethics. Essays on the Ethical Philosophy of Bernard Williams, Cambridge University Press, Cambridge, pp. 19–31.

Stalnaker, Robert C.: 1984, Inquiry, Bradford Books, MIT Press, Cambridge, MA.

Department of Philosophy, University of Regensburg
93040 Regensburg, Germany


VALENTIN GORANKO and WOJCIECH JAMROGA

COMPARING SEMANTICS OF LOGICS FOR MULTI-AGENT SYSTEMS

ABSTRACT. We draw parallels between several closely related logics that combine – in different proportions – elements of game theory, computation tree logics, and epistemic logics to reason about agents and their abilities. These are: the coalition game logics CL and ECL introduced by Pauly in 2000, the alternating-time temporal logic ATL developed by Alur, Henzinger and Kupferman between 1997 and 2002, and the alternating-time temporal epistemic logic ATEL by van der Hoek and Wooldridge (2002). In particular, we establish some subsumption and equivalence results for their semantics, as well as interpretation of the alternating-time temporal epistemic logic into ATL. The focus in this paper is on models: alternating transition systems, multi-player game models (alias concurrent game structures) and coalition effectivity models turn out to be intimately related, while alternating epistemic transition systems share much of their philosophical and formal apparatus. Our approach is constructive: we present ways to transform between different types of models and languages.

1. INTRODUCTION

In this study we offer a comparative analysis of several recent logical enterprises that aim at modeling multi-agent systems. Most of all, the coalition game logic CL and its extended version ECL (Pauly 2002, 2000b, 2001), and the Alternating-time Temporal Logic ATL (Alur et al. 1997, 1998a, 2002) are studied. These turn out to be intimately related, which is not surprising since all of them deal with essentially the same type of scenarios, viz. a set of agents (players, system components) taking actions, simultaneously or in turns, on a common set of states – and thus effecting transitions between these states. The game-theoretic aspect is very prominent in both approaches; furthermore, in both frameworks the agents pursue certain goals with their actions and in that pursuit they can form coalitions. In both enterprises the objective is to develop formal tools for reasoning about such coalitions of agents and their ability to achieve specified outcomes in these action games.

Synthese 139: 241–280, 2004. Knowledge, Rationality & Action 77–116, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

An extension of ATL, called Alternating-time Temporal Epistemic Logic (ATEL), was introduced in van der Hoek and Wooldridge (2002) in order to enable reasoning about agents acting under incomplete information. Although the semantics for ATEL is still under debate, the original version of that logic is certainly worth investigating. It turns out that, while extending ATL, ATEL can be interpreted into the former in the sense that there is a translation of models and formulas of ATEL into ATL that preserves the satisfiability of formulas. This does not imply that logics like ATEL are redundant, of course – in fact, the way of expressing epistemic facts in ATL is purely technical, and the resulting formulas look rather unnatural. Similarly, each of the three alternative semantics for ECL and ATL, investigated here, has its own drawbacks and offers different advantages for practical use.

The rest of the paper is organized as follows: first, we offer a brief summary of some basic concepts from game theory; then we introduce the main “actors” of our study – logics and structures that have been used for modeling multi-agent systems in temporal perspective. In order to make the paper self-contained we have included all relevant definitions from Pauly (2002, 2001, 1998a), Alur et al. (2002), van der Hoek and Wooldridge (2002).1 In Sections 3 and 4 the relationships between these logics and structures are investigated in a formal way. We show that:

1. Specific classes of multi-player game models are equivalent to some types of alternating transition systems.
2. ATL subsumes CL as well as ECL.
3. The three alternative semantics for Alternating-time Temporal Logic and Coalition Logics (based on multi-player game models, alternating transition systems and coalition effectivity models) are equivalent.
4. Formulas and models of ATEL can be translated into its fragment ATL.

The paper partly builds on previous work of ours, included in Goranko (2001) and Jamroga (2003).

2. MODELS AND LOGICS OF STRATEGIC ABILITY

The logics studied here have several things in common. They are intended for reasoning about various aspects of multi-agent systems and multi-player games, they are multi-modal logics, they have obviously been inspired by game theory, and they are based on the temporal logic approach. We present and discuss the logics and their models in this section. A broader survey of logic-based approaches to multi-agent systems can be found in Fagin et al. (1995) and van der Hoek and Wooldridge (2003b).


Figure 1. Extensive and strategic form of the matching pennies game: (A) perfect information case; (B) a1 does not show his coin before the end of the game.

2.1. Basic Influences

2.1.1. Classical Game Theory

Logics of agents and action build upon several important concepts from game theory, most of them going back to the 40s and the seminal book (von Neumann and Morgenstern 1944). We will start with an informal survey of these concepts, mostly following Hart (1992). An interested reader is referred to Aumann and Hart (1992), Osborne and Rubinstein (1994) for a more extensive introduction to game theory.

In game theory, a game is usually presented in its extensive and/or strategic form. The extensive form defines the game via a tree of possible positions in the game (states), game moves (choices) available to players, and the outcome (utility or payoff) that players gain at each of the final states. These games are usually turn-based, i.e., every state is assigned a player who controls the choice of the next move, so the players are taking turns. A strategy for player a specifies a’s choices at the states controlled by a. The strategic form consists of a matrix that presents the payoffs for all combinations of players’ strategies. It presents the whole game in a ‘snapshot’ as if it was played in one single move, while the extensive form emphasizes control and information flow in the game.
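
The passage from the extensive to the strategic form can be sketched in a few lines of Python. The example assumes an illustrative two-player, turn-based coin game with perfect information in which the first mover scores when the two coins match; the payoff rule, the move labels and all identifiers are assumptions made for this sketch, and the state names of Figure 1 are not reproduced.

```python
from itertools import product

MOVES = ("h", "t")


def payoff(first_move, second_move):
    """Assumed rule: the first mover scores 1 if the coins match,
    otherwise the second mover scores 1. Returns (first, second)."""
    return (1, 0) if first_move == second_move else (0, 1)


# A strategy of the second mover fixes a reply at each node that player
# controls, i.e. one reply for every possible move of the first player.
second_strategies = [
    dict(zip(MOVES, replies))
    for replies in product(MOVES, repeat=len(MOVES))
]

# Strategic form: one payoff pair per combination of strategies.
strategic_form = {
    (first, tuple(strategy[m] for m in MOVES)): payoff(first, strategy[first])
    for first in MOVES
    for strategy in second_strategies
}

# Strategies of the second mover that win against every first move.
winning_replies = [
    strategy
    for strategy in second_strategies
    if all(payoff(first, strategy[first]) == (0, 1) for first in MOVES)
]

print(len(strategic_form))  # 8
print(winning_replies)      # [{'h': 't', 't': 'h'}]
```

The resulting matrix has two rows (the first mover’s moves) and four columns (the second mover’s conditional plans), so the whole game is indeed presented as if it were played in one simultaneous move. Under the assumed rule, the only winning plan for the second mover is to reply with the opposite side.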


EXAMPLE 1. Consider a variant of the matching pennies game. There are two players, each with a coin: first a1 chooses to show the heads (action h) or tails (t), then a2 does. If both coins are heads up or both coins are tails up, then a1 wins (and gets score of 1) and a2 loses (score 0). If the coins show different sides, then a2 is the winner. The extensive and strategic forms for this game are shown in Figure 1A. The strategies define an agent’s choices at all ‘his’ nodes, and are labeled appropriately: q1hq2t denotes, for instance, a strategy for a2 in which the player chooses to show heads whenever the current state of the game is q1, and tails at q2. Note that – using this strategy – a2 wins regardless of the first move from a1.

The information available to agents is incomplete in many games. Classical game theory handles this kind of uncertainty through partitioning every player’s nodes into so called information sets. An information set for player a is a set of states that are indistinguishable for a. Traditionally, information sets are defined only for the states in which a chooses the next step. Now a strategy assigns choices to information sets rather than separate states, because players are supposed to choose the same move for all the situations they cannot distinguish.

EXAMPLE 2. Suppose that a1 does not show his coin to a2 before the end of the game. Then nodes q1 and q2 belong to the same information set of a2, as shown in Figure 1B. No player has a strategy that guarantees his win any more.

A general remark is in order here. The concept of coalitional game, traditionally considered in game theory, where every possible coalition is assigned a real number (its worth), differs somewhat from the one considered here. In this study we are concerned with qualitative aspects of game structures rather than with quantitative analysis of specific games. It should be clear, however, that these two approaches are in agreement and can be easily put together.
Indeed, the intermediate link between them is the notion of (qualitative) effectivity function (Pauly 2002). That notion naturally transfers over to alternating transition systems, thus providing a framework for purely game-theoretic treatment of alternating temporal logics. 2.1.2. Computational Tree Logic and Epistemic Logic Apart from game theory, the concepts investigated in this paper are strongly inﬂuenced by modal logics of computations (such as the computation tree logic CTL) and beliefs (epistemic logic). CTL (Emerson 1990; [ 80 ]

LOGICS FOR MULTI-AGENT SYSTEMS

245

Figure 2. Transitions of the variable controller/client system, together with the tree of possible computations.

Huth and Ryan 2000) involves several operators for temporal properties of computations in transition systems: E (for all paths), A (there is a path), X (nexttime), F (sometime), G (always) and U (until). ‘Paths’ refer to alternative courses of events that may happen in the future; nodes on a path denote states of the system in subsequent moments of time along this particular course. Typically, paths are interpreted as sequences of successive states of computations. EXAMPLE 3. As an illustration, consider a system with a binary variable x. In every step, the variable can retain or change its value. The states and possible transitions are shown in Figure 2. There are two propositions available to observe the value of x: ‘x = 0’ and ‘x = 1’. Then, for example, EF x = 1 is satisﬁed in every state of the system: there is a path such that x will have the value of 1 at some moment. However, the above is not true for every possible course of action: ¬AF x = 1. It is important to distinguish between the computational structure, deﬁned explicitly in the model, and the behavioral structure, i.e., the model of how the system is supposed to behave in time (Schnoebelen 2003). In many temporal models the computational structure is ﬁnite, while the implied behavioral structure is inﬁnite. The computational structure can be seen as a way of deﬁning the tree of possible (inﬁnite) computations that may occur in the system. The way the computational structure unravels into a behavioral structure (computation tree) is shown in Figure 2, too. Epistemic logic offers the notion of epistemic accessibility relation that generalizes information sets, and introduces operators for talking about [ 81 ]


individual and collective knowledge. Section 4 describes them in more detail; a reader interested in a comprehensive exposition of epistemic logic is referred to the seminal book by Fagin, Halpern, Moses and Vardi (Fagin et al. 1995), or to van der Hoek and Verbrugge (2002) for a survey.

2.2. Coalition Game Logics and Multi-Player Game Models

Coalition logic (CL), introduced in Pauly (2000b, 2002), formalizes reasoning about powers of coalitions in strategic games. It extends classical propositional logic with a family of (non-normal) modalities [A], A ⊆ Agt, where Agt is a fixed set of players. Intuitively, [A]ϕ means that coalition A can enforce an outcome state satisfying ϕ.

2.2.1. Multi-Player Strategic Game Models

Game frames (Pauly 2002) represent multi-player strategic games where sets of players can form coalitions in attempts to achieve desirable outcomes. Game frames are based on the notion of a strategic game form: a tuple ⟨Agt, {Σ_a | a ∈ Agt}, Q, o⟩ consisting of:
– a non-empty finite set of agents (or players) Agt,
– a family of (non-empty) sets of actions (choices, strategies) Σ_a, one for each player a ∈ Agt,
– a non-empty set of states Q,
– an outcome function o: ∏_{a∈Agt} Σ_a → Q which associates an outcome state in Q to every combination of choices from all the players.

By a collective choice σ_A we will denote a tuple of choices ⟨σ_a⟩_{a∈A} (one for each player from A ⊆ Agt), and we will write o(σ_A, σ_{Agt\A}) with the presumed meaning.

REMARK 1. Note that the notion of ‘strategy’ in strategic game forms is local, wrapped into one-step actions. It differs from the notion of ‘strategy’ in extensive game forms (used in the semantics of ATL), which represents a global, conditional plan of action. To avoid confusion, we will refer to the local strategies as choices, and use the term collective choice instead of strategy profile from Pauly (2002) to denote a combination of simultaneous choices from several players.

REMARK 2. A strategic game form defines the choices and transitions available at a particular state of the game. If the identity of the state does not follow from the context in an obvious way, we will use indices to indicate which state they refer to.
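The definition of a strategic game form, and the intuitive reading of [A]ϕ as "coalition A can enforce an outcome satisfying ϕ", can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `GameForm`, `collective_choices`, `can_enforce` and the two-player example `pennies` are hypothetical and do not come from the paper.

```python
from itertools import product


class GameForm:
    """A strategic game form <Agt, {Sigma_a}, Q, o>; all names here are illustrative."""

    def __init__(self, agents, choices, outcome):
        self.agents = list(agents)   # Agt, in a fixed order
        self.choices = choices       # dict: agent -> list of available choices
        self.outcome = outcome       # dict: full tuple of choices -> outcome state

    def collective_choices(self, group):
        """All collective choices sigma_A of `group`, each as a dict agent -> choice."""
        members = [a for a in self.agents if a in group]
        options = (self.choices[a] for a in members)
        return [dict(zip(members, combo)) for combo in product(*options)]

    def o(self, sigma_group, sigma_rest):
        """Outcome of the complete profile obtained by merging both collective choices."""
        full = {**sigma_group, **sigma_rest}
        return self.outcome[tuple(full[a] for a in self.agents)]


def can_enforce(form, group, target):
    """True iff `group` has a collective choice whose outcome lies in `target`
    for every collective choice of the remaining players."""
    rest = [a for a in form.agents if a not in group]
    return any(
        all(form.o(s_a, s_r) in target for s_r in form.collective_choices(rest))
        for s_a in form.collective_choices(group)
    )


# Hypothetical example: two players each pick "h" or "t".
pennies = GameForm(
    agents=["1", "2"],
    choices={"1": ["h", "t"], "2": ["h", "t"]},
    outcome={("h", "h"): "same", ("t", "t"): "same",
             ("h", "t"): "diff", ("t", "h"): "diff"},
)
alone = can_enforce(pennies, {"1"}, {"same"})          # player 1 on her own
together = can_enforce(pennies, {"1", "2"}, {"same"})  # the grand coalition
```

In this toy form neither player controls the outcome alone, while the two together do; the empty coalition can only "enforce" outcome sets that contain every possible outcome.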


Figure 3. Transitions of the variable controller/client system.

The set of all strategic game forms for players Agt over states Q will be denoted by Γ^Agt_Q. A multi-player game frame for a set of players Agt is a pair ⟨Q, γ⟩ where γ: Q → Γ^Agt_Q is a mapping associating a strategic game form with each state in Q. A multi-player game model (MGM) for a set of players Agt over a set of propositions Π is a triple M = ⟨Q, γ, π⟩ where ⟨Q, γ⟩ is a multi-player game frame, and π: Q → P(Π) is a valuation labeling each state from Q with the set of propositions that are true at that state.

EXAMPLE 4. Consider a variation of the system with binary variable x from Example 3. There are two processes: the controller (or server) s can enforce the variable to retain its value in the next step, or let the client change the value. The client c can request the value of x to be 0 or 1. The players proceed with their choices simultaneously. The states and transitions of the system as a whole are shown in Figure 3. Again, we should make the distinction between computational and behavioral structures. The multi-player game model unravels into a computation tree in a way analogous to CTL models (cf. Figure 2).

2.2.2. Coalition Logic

Formulas of CL are defined recursively as:

ϕ := p | ¬ϕ | ϕ ∨ ψ | [A]ϕ,

where p ∈ Π is a proposition, and A ⊆ Agt is a group of agents. The semantics of CL can be given via the clauses:
– M, q |= p iff p ∈ π(q) for atomic propositions p;
– M, q |= [A]ϕ iff there is a collective choice σ_A such that for every collective choice σ_{Agt\A}, we have M, o_q(σ_A, σ_{Agt\A}) |= ϕ.

EXAMPLE 5. Consider the variable client/server system from Example 4. The following CL formulas are valid in this model (i.e., true in every state of it):


1. (x = 0 → [s]x = 0) ∧ (x = 1 → [s]x = 1): the server can enforce the value of x to remain the same in the next step;
2. x = 0 → ¬[c]x = 1: c cannot change the value from 0 to 1 on his own;
3. x = 0 → ¬[s]x = 1: s cannot change the value on his own either;
4. x = 0 → [s, c]x = 1: s and c can cooperate to change the value.

2.2.3. Logics for Local and Global Effectivity of Coalitions

In CL, the operators [A]ϕ can express local effectivity properties of coalitions, i.e., their powers to force outcomes in single ‘rounds’ of the game. Pauly (2000b) extends CL to the Extended Coalition Logic ECL with iterated operators for global effectivity [A∗]ϕ, expressing the claim that coalition A has a collective strategy to maintain the truth of ϕ throughout the entire game. In our view, and in the sense of Remark 1, the two systems formalize different aspects of reasoning about powers of coalitions: CL can be thought of as reasoning about strategic game forms, while ECL rather deals with extensive game forms, representing sequences of moves, collectively effected by the players’ actions. Since ECL can be embedded as a fragment of ATL (as presented in Section 2.4), we will not discuss it separately here.

2.3. Alternating-Time Temporal Logic and its Models

Game-theoretic scenarios can occur in various situations, one of them being open computer systems such as computer networks, where the different components can act as relatively autonomous agents, and computations in such systems are effected by their combined actions. The Alternating-time Temporal Logics ATL and ATL∗, introduced in Alur et al. (1997), and later refined in Alur et al. (1998a, 2002), are intended to formalize reasoning about computations in such open systems which can be enforced by coalitions of agents, in a way generalizing the logics CTL and CTL∗.

2.3.1. The Logics ATL and ATL∗

In ATL∗ a class of cooperation modalities ⟨⟨A⟩⟩ replaces the path quantifiers E and A. The common-sense reading of ⟨⟨A⟩⟩Φ is:

The group of agents A has a collective strategy to enforce Φ regardless of what all the other agents do.

ATL is the fragment of ATL∗ subjected to the same syntactic restrictions which define CTL as a fragment of CTL∗, i.e., every temporal operator must be immediately preceded by exactly one cooperation modality. The original CTL∗ operators E and A can be expressed in ATL∗ with


⟨⟨Agt⟩⟩ and ⟨⟨∅⟩⟩ respectively, but between both extremes one can express much more about the abilities of particular agents and groups of agents. Since model checking for ATL∗ requires 2EXPTIME, whereas it is linear for ATL, ATL is more useful for practical applications, and we will not discuss ATL∗ in this paper. Formally, the recursive definition of ATL formulas is:

ϕ := p | ¬ϕ | ϕ ∨ ψ | ⟨⟨A⟩⟩Xϕ | ⟨⟨A⟩⟩Gϕ | ⟨⟨A⟩⟩ϕUψ

The ‘sometime’ operator F can be defined in the usual way as: ⟨⟨A⟩⟩Fϕ ≡ ⟨⟨A⟩⟩⊤Uϕ.

It should be noted that at least three different versions of semantic structures for ATL have been proposed by Alur and colleagues in the last 7 years. The earliest version (Alur et al. 1997) involves definitions of a synchronous turn-based structure and an asynchronous structure in which every transition is controlled by a single agent. The next paper (Alur et al. 1998a) defines general structures called alternating transition systems where the agents’ choices are identified with the sets of possible outcomes. In the concurrent game structures from Alur et al. (2002), labels for choices are introduced and the transition function is simplified. The above papers share the same title and they are often cited incorrectly in the literature as well as in citation indices, which may lead to some confusion.

2.3.2. Alternating Transition Systems

Alternating transition systems, building on the concept of alternation developed in Chandra et al. (1981), formalize systems of transitions effected by collective actions of all agents involved. In the particular case of one agent (the system), alternating transition systems reduce to ordinary transition systems, and ATL reduces to CTL.

An alternating transition system (ATS) is a tuple T = ⟨Π, Agt, Q, π, δ⟩ where:
– Π is a set of (atomic) propositions, Agt is a non-empty finite set of agents, Q is a non-empty set of states, and π: Q → P(Π) is a valuation of propositions;
– δ: Q × Agt → P(P(Q)) is a transition function mapping a pair ⟨state, agent⟩ to a non-empty family of choices of possible next states.

The idea is that at state q an agent a chooses a set Q_a ∈ δ(q, a), thus forcing the outcome state to be from Q_a. The resulting transition leads to a state which is in the intersection of all Q_a for a ∈ Agt, and so it reflects the mutual will of all agents. Since the system is required to be deterministic (given the state and the agents’ decisions), Q_1 ∩ . . . ∩ Q_k must always be a singleton.²
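The determinism requirement on δ can be checked mechanically on a finite ATS. The sketch below is a minimal illustration, assuming δ is stored as a dictionary from (state, agent) pairs to lists of frozensets; the function name `is_deterministic` and the two-agent example are hypothetical.

```python
from itertools import product


def is_deterministic(states, agents, delta):
    """Check that, at every state, any combination of one choice set per agent
    intersects in exactly one state."""
    for q in states:
        for combo in product(*(delta[(q, a)] for a in agents)):
            if len(frozenset.intersection(*combo)) != 1:
                return False
    return True


states = ["q0", "q1"]
agents = ["a", "b"]
everything = frozenset(states)

# Hypothetical ATS: agent a picks the next state, agent b has no say.
delta = {(q, "a"): [frozenset({"q0"}), frozenset({"q1"})] for q in states}
delta.update({(q, "b"): [everything] for q in states})

# A broken variant: at q0 agent b insists on q0, so a's choice {q1} clashes with it.
delta_bad = dict(delta)
delta_bad[("q0", "b")] = [frozenset({"q0"})]

ok = is_deterministic(states, agents, delta)
clash = is_deterministic(states, agents, delta_bad)
```

The second structure fails the check because one combination of choice sets has an empty intersection, so it would not qualify as an ATS in the sense defined above.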


Figure 4. An ATS for the controller/client problem.

DEFINITION 1. A state q2 ∈ Q is a successor of q1 if, whenever the system is in q1, the agents can cooperate so that the next state is q2, i.e., there are choice sets Q_a ∈ δ(q1, a), for each a ∈ Agt, such that ⋂_{a∈Agt} Q_a = {q2}. The set of successors of q will be denoted by Q^suc_q.

DEFINITION 2. A computation in T is an infinite sequence of states q0 q1 . . . such that q_{i+1} is a successor of q_i for every i ≥ 0. A q-computation is a computation starting from q.

2.3.3. Semantics of ATL Based on Alternating Transition Systems

DEFINITION 3. A strategy for agent a is a mapping f_a: Q+ → P(Q) which assigns to every non-empty sequence of states q0, . . . , qn a choice set f_a(q0 . . . qn) ∈ δ(qn, a). The function specifies a’s decisions for every possible (finite) history of system transitions. A collective strategy for a set of agents A ⊆ Agt is just a tuple of strategies (one per agent from A): F_A = ⟨f_a⟩_{a∈A}. Now, out(q, F_A) denotes the set of outcomes of F_A from q, i.e., the set of all q-computations in which group A has been using F_A.

REMARK 3. This notion of strategy can be specified as ‘perfect recall strategy’, where the whole history of the game is considered when the choice of the next move is made by the agents. The other extreme alternative is a ‘memoryless strategy’ where only the current state is taken into consideration; further variations on ‘limited memory span strategies’ are possible. While the choice of one or another notion of strategy affects the semantics of the full ATL∗, it is not difficult to see that both perfect recall strategies and memoryless strategies eventually yield equivalent semantics for ATL.


Let Λ[i] denote the ith position in computation Λ. The truth of an ATL formula at state q of an ATS T = ⟨Π, Agt, Q, π, δ⟩ is defined by the clauses below. Informally speaking, T, q |= ⟨⟨A⟩⟩Φ iff there exists a collective strategy F_A such that Φ is satisfied for all computations from out(q, F_A).

(A, X) T, q |= ⟨⟨A⟩⟩Xϕ iff there exists a collective strategy F_A such that for every computation Λ ∈ out(q, F_A) we have T, Λ[1] |= ϕ;
(A, G) T, q |= ⟨⟨A⟩⟩Gϕ iff there exists a collective strategy F_A such that for every Λ ∈ out(q, F_A) we have T, Λ[i] |= ϕ for every i ≥ 0;
(A, U) T, q |= ⟨⟨A⟩⟩ϕUψ iff there exists a collective strategy F_A such that for every Λ ∈ out(q, F_A) there is i ≥ 0 such that T, Λ[i] |= ψ and for all j such that 0 ≤ j < i we have T, Λ[j] |= ϕ.

EXAMPLE 6. An ATS for the variable client/server system is shown in Figure 4. The following ATL formulas are valid in this model:
1. (x = 0 → ⟨⟨s⟩⟩X x = 0) ∧ (x = 1 → ⟨⟨s⟩⟩X x = 1): the server can enforce the value of x to remain the same in the next step;
2. x = 0 → (¬⟨⟨c⟩⟩F x = 1 ∧ ¬⟨⟨s⟩⟩F x = 1): neither c nor s can change the value from 0 to 1, even in multiple steps;
3. x = 0 → ⟨⟨s, c⟩⟩F x = 1: s and c can cooperate to change the value.

2.3.4. Semantics of ATL Based on Concurrent Game Structures and Multi-Player Game Models

Alur et al. (2002) redefines ATL models as concurrent game structures: M = ⟨k, Q, Π, π, d, o⟩, where k is the number of players (so Agt can be taken to be {1, . . . , k}), the decisions available to player a at state q are labeled with natural numbers up to d_a(q) (so Σ_a(q) can be taken to be {1, . . . , d_a(q)}); finally, a complete tuple of decisions ⟨α_1, . . . , α_k⟩ from all the agents in state q implies a deterministic transition according to the transition function o(q, α_1, . . . , α_k). In a concurrent game structure the type of a strategy function is slightly different, since choices are now abstract entities indexed by natural numbers: a strategy is a mapping f_a: Q+ → N such that f_a(λq) ≤ d_a(q), where λq is a history ending in state q. The rest of the semantics looks exactly the same as for alternating transition systems.

REMARK 4. Clearly, concurrent game structures are equivalent to Pauly’s multi-player game models; they differ from each other only in notation.³
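On a finite model the clauses above can be evaluated without enumerating strategies. The following Python sketch uses one standard technique, fixpoint iteration over a one-step "controllable predecessor" operator, and applies it to the client/server system. Several things here are assumptions rather than material from the paper: the transition function `o` is reconstructed from the verbal description in Example 4 (the figures are not reproduced in this text), the choice names `reject`, `accept`, `set0`, `set1` follow Example 8, and all function names are hypothetical.

```python
from itertools import product

STATES = ["q0", "q1"]                  # q0: x = 0, q1: x = 1
AGENTS = ["s", "c"]
CHOICES = {"s": ["reject", "accept"], "c": ["set0", "set1"]}


def o(q, sigma):
    """Assumed transition function: the server keeps x, or lets the client set it."""
    if sigma["s"] == "reject":
        return q
    return "q0" if sigma["c"] == "set0" else "q1"


def profiles(group):
    """All collective choices of `group`, as dicts agent -> choice."""
    members = [a for a in AGENTS if a in group]
    return [dict(zip(members, combo))
            for combo in product(*(CHOICES[a] for a in members))]


def pre(group, target):
    """States where `group` has a collective choice forcing the next state into `target`."""
    rest = [a for a in AGENTS if a not in group]
    return {q for q in STATES
            if any(all(o(q, {**s_a, **s_r}) in target for s_r in profiles(rest))
                   for s_a in profiles(group))}


def sat_G(group, phi):
    """<<A>>G phi as the greatest fixpoint of Z = phi & pre(A, Z)."""
    z = set(STATES)
    while True:
        new = phi & pre(group, z)
        if new == z:
            return z
        z = new


def sat_U(group, phi, psi):
    """<<A>> phi U psi as the least fixpoint of Z = psi | (phi & pre(A, Z))."""
    z = set()
    while True:
        new = psi | (phi & pre(group, z))
        if new == z:
            return z
        z = new


def sat_F(group, psi):
    """<<A>>F psi, i.e. <<A>> true U psi."""
    return sat_U(group, set(STATES), psi)


x0, x1 = {"q0"}, {"q1"}
server_keeps = "q0" in pre({"s"}, x0) and "q1" in pre({"s"}, x1)   # formula 1
alone_fails = "q0" not in sat_F({"c"}, x1) and "q0" not in sat_F({"s"}, x1)  # formula 2
together_ok = "q0" in sat_F({"s", "c"}, x1)                        # formula 3
```

Under the assumed transition function, the three checks correspond to the three formulas of Example 6. The iteration terminates because the state set is finite and each step moves monotonically towards the fixpoint.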


Thus, the ATL semantics can as well be based on MGMs, and the truth definitions look exactly the same as for alternating transition systems (see Section 2.3.3). We leave rewriting the definitions of a strategy, collective strategy and outcome set in terms of multi-player game models to the reader. The next section uses this shared semantics to show that ATL subsumes coalition logics.

2.4. Embedding CL and ECL into ATL

Both CL and ECL are strictly subsumed by ATL in terms of the shared semantics based on multi-player game models.⁴ Indeed, there is a translation of the formulas of ECL into ATL, which becomes obvious once the ATL semantic clause (A, X) is rephrased as:

[A] T, q |= ⟨⟨A⟩⟩Xϕ iff there exists a collective choice F_A = {f_a}_{a∈A} such that for every collective choice F_{Agt\A} = {f_a}_{a∈Agt\A}, we have T, s |= ϕ, where {s} = ⋂_{a∈A} f_a(q) ∩ ⋂_{a∈Agt\A} f_a(q),

which is precisely the truth-condition for [A]ϕ in the coalition logic CL. Thus, CL embeds in a straightforward way as a simple fragment of ATL by translating [A]ϕ into ⟨⟨A⟩⟩Xϕ. Accordingly, [C∗]ϕ translates into ATL as ⟨⟨C⟩⟩Gϕ, which follows from the fact that [C∗]ϕ and ⟨⟨C⟩⟩Gϕ are the greatest fixpoints of the same operator over [C]ϕ and ⟨⟨C⟩⟩Xϕ respectively (see Section 2.5). In consequence, ATL subsumes ECL as the fragment ATL_XG involving only ⟨⟨A⟩⟩Xϕ and ⟨⟨A⟩⟩Gϕ. We will focus on ATL, and will simply regard CL and ECL as its fragments throughout the rest of the paper.

2.5. Effectivity Functions and Coalition Effectivity Models as Alternative Semantics for ATL

As mentioned earlier, game theory usually measures the powers of coalitions quantitatively, and characterizes the possible outcomes in terms of payoff profiles. That approach can be easily transformed into a qualitative one, where the payoff profiles are encoded in the outcome states themselves and each coalition is assigned a preference order on these outcome states. Then, the power of a coalition can be measured in terms of sets of states into which it can force the actual outcome of the game (i.e., sets for which it is effective), thus defining another semantics for ATL, based on so-called coalition effectivity models (introduced by Pauly for the coalition logics CL and ECL). This semantics is essentially a monotone neighborhood semantics for non-normal multi-modal logics, and therefore it enables the results, methods and techniques already developed for modal logics to be applied here as well.


Figure 5. A coalition effectivity function for the variable client/server system.

DEFINITION 4. A (local) effectivity function is a mapping of type e: P(Agt) → P(P(Q)).

The idea is that we associate with each set of players the family of outcome sets for which their coalition is effective. However, the notion of effectivity function as defined above is abstract and not every effectivity function corresponds to a real strategic game form. Those which do can be characterized by the following conditions:
1. Liveness: for every A ⊆ Agt, ∅ ∉ e(A).
2. Termination: for every A ⊆ Agt, Q ∈ e(A).
3. Agt-maximality: if X ∉ e(Agt) then Q \ X ∈ e(∅) (if X cannot be effected by the grand coalition of players, then Q \ X is inevitable).
4. Outcome-monotonicity: if X ⊆ Y and X ∈ e(A) then Y ∈ e(A).
5. Super-additivity: for all A1, A2 ⊆ Agt and X1, X2 ⊆ Q, if A1 ∩ A2 = ∅, X1 ∈ e(A1), and X2 ∈ e(A2), then X1 ∩ X2 ∈ e(A1 ∪ A2).

We note that super-additivity and liveness imply consistency of the powers: for any A ⊆ Agt, if X ∈ e(A) then Q \ X ∉ e(Agt \ A). (Otherwise super-additivity would give ∅ = X ∩ (Q \ X) ∈ e(Agt), contradicting liveness.)

DEFINITION 5. An effectivity function e is called playable if conditions (1)–(5) hold for e.

DEFINITION 6. An effectivity function e is the effectivity function of a strategic game form γ if it associates with each set of players A from γ the family of outcome sets {Q1, Q2, . . .}, such that for every Qi the coalition A has a collective choice to ensure that the next state will be in Qi.

THEOREM 5 (Pauly 2002). An effectivity function is playable iff it is the effectivity function of some strategic game form.

EXAMPLE 7. Figure 5 presents a playable effectivity function that describes powers of all the possible coalitions for the variable server/client system from Example 4, and state q0.

DEFINITION 7. A coalition effectivity frame is a triple F = ⟨Agt, Q, E⟩ where Agt is a set of players, Q is a non-empty set of states and E: Q → (P(Agt) → P(P(Q))) is a mapping which associates an effectivity


function with each state. We shall write E_q(A) instead of E(q)(A). A coalition effectivity model (CEM) is a tuple ℰ = ⟨Agt, Q, E, π⟩ where ⟨Agt, Q, E⟩ is a coalition effectivity frame and π is a valuation of the atomic propositions over Q.

DEFINITION 8. A coalition effectivity frame (resp. coalition effectivity model) is standard if it contains only playable effectivity functions.

Thus, coalition effectivity models provide semantics of CL by means of the following truth definition (Pauly 2002):

ℰ, q |= [A]ϕ iff {s ∈ Q | ℰ, s |= ϕ} ∈ E_q(A).

This semantics can be accordingly extended to semantics for ECL (Pauly 2001) and ATL (Goranko 2001) by defining effectivity functions for the global effectivity operators in extensive game forms, where they indicate the outcome sets for which the coalitions have long-term strategies to effect. This extension can be done using the following fixpoint characterizations of ⟨⟨A⟩⟩Gϕ and ⟨⟨A⟩⟩ϕUψ:

⟨⟨A⟩⟩Gϕ := νZ.ϕ ∧ ⟨⟨A⟩⟩XZ,
⟨⟨A⟩⟩ϕUψ := µZ.ψ ∨ (ϕ ∧ ⟨⟨A⟩⟩XZ).
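To make the effectivity-function view concrete, the sketch below computes the effectivity function of a small strategic game form in the sense of Definition 6 and tests the five playability conditions by brute force. It is an illustration under assumptions: the game form is the client/server form at state q0 as reconstructed from the description in Example 4 (Figure 5 itself is not reproduced here), and the names `effectivity`, `is_playable` and `outcome_q0` are hypothetical.

```python
from itertools import chain, combinations, product

AGENTS = ["s", "c"]
STATES = ["q0", "q1"]
CHOICES = {"s": ["reject", "accept"], "c": ["set0", "set1"]}


def outcome_q0(sigma):
    """Assumed outcome function at q0 (x = 0): reject keeps x, accept obeys the client."""
    if sigma["s"] == "reject":
        return "q0"
    return "q0" if sigma["c"] == "set0" else "q1"


def powerset(xs):
    xs = list(xs)
    subsets = chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))
    return [frozenset(c) for c in subsets]


def profiles(group):
    members = [a for a in AGENTS if a in group]
    return [dict(zip(members, combo))
            for combo in product(*(CHOICES[a] for a in members))]


def effectivity(outcome):
    """e(A) = all X such that A has a collective choice forcing the outcome into X."""
    e = {}
    for coalition in powerset(AGENTS):
        rest = frozenset(AGENTS) - coalition
        e[coalition] = {
            x for x in powerset(STATES)
            if any(all(outcome({**s_a, **s_r}) in x for s_r in profiles(rest))
                   for s_a in profiles(coalition))
        }
    return e


def is_playable(e):
    """Brute-force check of conditions (1)-(5) on a finite effectivity function."""
    q_all, agt = frozenset(STATES), frozenset(AGENTS)
    subsets = powerset(STATES)
    liveness = all(frozenset() not in e[a] for a in e)
    termination = all(q_all in e[a] for a in e)
    maximality = all(x in e[agt] or (q_all - x) in e[frozenset()] for x in subsets)
    monotone = all(y in e[a] for a in e for x in e[a] for y in subsets if x <= y)
    superadd = all((x1 & x2) in e[a1 | a2]
                   for a1 in e for a2 in e if not (a1 & a2)
                   for x1 in e[a1] for x2 in e[a2])
    return liveness and termination and maximality and monotone and superadd


E_Q0 = effectivity(outcome_q0)
playable = is_playable(E_Q0)
```

With this reconstruction, the server alone is effective for {q0} but not for {q1}, and only the grand coalition is effective for {q1}, which agrees with the CL formulas of Example 5 evaluated at x = 0.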

3. EQUIVALENCE OF THE DIFFERENT SEMANTICS FOR ATL

In this section we compare the semantics for Alternating-time Temporal Logic based on alternating transition systems and on multi-player game models, and show their equivalence (in the sense that we can transform the models both ways while preserving satisfiability of ATL formulas). Further, we show that these semantics are both equivalent to the semantics based on coalition effectivity models. The transformation from alternating transition systems to multi-player game models is easy: in fact, for every ATS, an equivalent MGM can be constructed via re-labeling transitions (see Section 3.2). Construction the other way round is more sophisticated: first, we observe that all multi-player game models obtained from alternating transition systems satisfy a special condition we call convexity (Section 3.2); then we show that for every convex MGM, an equivalent ATS can be obtained (Section 3.3). Finally, we demonstrate that for every multi-player game model a convex MGM can be constructed that satisfies the same formulas of ATL (Section 3.4).


We also show that the transformations we propose preserve the property of being a turn-based structure, and that they transform injective MGMs into lock-step synchronous ATSs and vice versa.

3.1. Some Special Types of ATSs and MGMs

DEFINITION 9 (Pauly 2002). A strategic game form ⟨Agt, {Σ_a | a ∈ Agt}, Q, o⟩ is an a-dictatorship if there is a player a ∈ Agt who determines the outcome state of the game, i.e., ∀σ_a ∈ Σ_a ∃q ∈ Q ∀σ_{Agt\{a}}: o(σ_a, σ_{Agt\{a}}) = q. An MGM ⟨Q, γ, π⟩ is turn-based if every γ(q) is a dictatorship.⁵

We note that the notion of a-dictatorship is quite strong: it presumes that any choice of the dictator forces a chosen state as the outcome. A meaningful alternative, which one can aptly call a-leadership, is when some choices of a can force the next state (the “wise choice of the leader”). It would be interesting to investigate whether the dictatorship-based and leadership-based strategic game forms lead to equivalent semantics for ATL.

DEFINITION 10. A strategic game form is injective if o is injective, i.e., assigns different outcome states to different tuples of choices. An MGM is injective if it contains only injective game forms.

EXAMPLE 8. Note that the variable client/server game model from Figure 3 is not injective, because choices ⟨reject, set0⟩ and ⟨reject, set1⟩ always have the same outcome. The model is not turn-based either: s is a leader at both q0 and q1 (he can determine the next state with σ_s = reject), but the outcome of his other choice (σ_s = accept) depends on the choice of the client. On the other hand, the game tree from Figure 1A can be seen as a turn-based MGM: player a1 is the dictator at state q0, and player a2 is the dictator at q1 and q2 (both players can be considered dictators at q3, q4, q5 and q6).

DEFINITION 11 (Alur et al. 1997). An ATS is turn-based synchronous if for every q ∈ Q there is an agent a who decides upon the next state, i.e., δ(q, a) consists entirely of singletons.
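The properties discussed in Definitions 9 and 10 and in Example 8 are easy to test on a finite game form. The sketch below does so for the client/server model; the transition function is an assumption reconstructed from the prose of Example 4 and Example 8 (Figure 3 is not reproduced in this text), and the helper names are hypothetical.

```python
from itertools import product

STATES = ["q0", "q1"]
AGENTS = ["s", "c"]
CHOICES = {"s": ["reject", "accept"], "c": ["set0", "set1"]}


def outcome(q, profile):
    """Assumed outcome function o_q: reject keeps x, accept lets the client set it."""
    if profile["s"] == "reject":
        return q
    return "q0" if profile["c"] == "set0" else "q1"


def all_profiles():
    return [dict(zip(AGENTS, combo))
            for combo in product(*(CHOICES[a] for a in AGENTS))]


def is_injective(q):
    """Different tuples of choices must lead to different outcome states."""
    outs = [outcome(q, p) for p in all_profiles()]
    return len(set(outs)) == len(outs)


def forces(q, agent, choice):
    """True if `choice` of `agent` fixes the next state whatever the others do."""
    outs = {outcome(q, p) for p in all_profiles() if p[agent] == choice}
    return len(outs) == 1


def is_dictator(q, agent):
    """Every choice of the agent determines the outcome (a-dictatorship)."""
    return all(forces(q, agent, ch) for ch in CHOICES[agent])


def is_leader(q, agent):
    """Some choice of the agent determines the outcome (a-leadership)."""
    return any(forces(q, agent, ch) for ch in CHOICES[agent])


injective_anywhere = any(is_injective(q) for q in STATES)
server_leads = all(is_leader(q, "s") for q in STATES)
turn_based = all(any(is_dictator(q, a) for a in AGENTS) for q in STATES)
```

Under the assumed transitions, the checks reproduce the observations of Example 8: the model is neither injective nor turn-based, while the server is a leader at both states.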
Every ATS can be “tightened” by removing from every Q_a ∈ δ(q, a) all states which can never be realized as successors in a transition from q. Every reasonably general criterion should accept such tightening as equivalent to the original ATS.


DEFINITION 12. An ATS T = ⟨Π, Agt, Q, π, δ⟩ is tight if, for every q ∈ Q, a ∈ Agt and Q_a ∈ δ(q, a), we have Q_a ⊆ Q^suc_q.

COROLLARY 6. For every ATS T there is a tight ATS T′ which satisfies the same formulas of ATL.

DEFINITION 13. An ATS is lock-step synchronous if the set of successor states Q^suc_q of every state q can be labeled with all tuples from some Cartesian product ∏_{a∈Agt} Q_a so that all choice sets from δ(q, a) are ‘hyperplanes’ in Q^suc_q, i.e., sets of the form {q_a} × ∏_{b∈Agt\{a}} Q_b, where q_a ∈ Q_a.⁶

In other words, the agents act independently and each of them can only determine its ‘private’ component of the next state. It is worth emphasizing that lock-step synchronous systems closely correspond to the concept of interpreted systems from the literature on reasoning about knowledge (Fagin et al. 1995). Note that every lock-step synchronous ATS is tight.
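The tightening operation behind Corollary 6 can be sketched as follows. This is a minimal illustration, assuming δ is given as a dictionary from (state, agent) pairs to lists of frozensets and that the ATS is deterministic; the function names and the three-state example are hypothetical.

```python
from itertools import product


def successors(q, agents, delta):
    """Q^suc_q: states that result from some combination of the agents' choices at q."""
    succ = set()
    for combo in product(*(delta[(q, a)] for a in agents)):
        succ |= frozenset.intersection(*combo)   # a singleton in a deterministic ATS
    return succ


def tighten(states, agents, delta):
    """Remove from every choice set the states that are never realized as successors."""
    tight = {}
    for q in states:
        succ = successors(q, agents, delta)
        for a in agents:
            tight[(q, a)] = [choice & succ for choice in delta[(q, a)]]
    return tight


def is_tight(states, agents, delta):
    return all(choice <= successors(q, agents, delta)
               for q in states for a in agents for choice in delta[(q, a)])


states = ["q0", "q1", "q2"]
agents = ["a", "b"]
# Hypothetical ATS: at q0, state q2 appears in a's choice sets but can never be reached.
delta = {
    ("q0", "a"): [frozenset({"q0", "q2"}), frozenset({"q1", "q2"})],
    ("q0", "b"): [frozenset({"q0", "q1"})],
    ("q1", "a"): [frozenset({"q1"})], ("q1", "b"): [frozenset({"q1"})],
    ("q2", "a"): [frozenset({"q2"})], ("q2", "b"): [frozenset({"q2"})],
}
tight_delta = tighten(states, agents, delta)
```

Choice sets are kept in lists so that their order (and hence their identity as choices) is preserved; only unreachable states are dropped, so every combination of choices still intersects in the same successor as before.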

3.2. From Alternating Transition Systems to MGMs

First, for every ATS T = ⟨Π, Agt, Q, π, δ⟩ over a set of agents Agt = {a1, . . . , ak} there is an equivalent MGM M^T = ⟨Q, γ^T, π⟩ where, for each q ∈ Q, the strategic game form γ^T(q) = ⟨Agt, {Σ^q_a | a ∈ Agt}, Q, o_q⟩ is defined in a very simple way:
– Σ^q_a = δ(q, a),
– o_q(Q_a1, . . . , Q_ak) = s where ⋂_{ai∈Agt} Q_ai = {s}.
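The relabeling just described is short enough to write down directly. The sketch below is an illustration only, assuming the same dictionary encoding of δ as a map from (state, agent) pairs to lists of frozensets; `ats_to_mgm` and the two-state example are hypothetical names and data.

```python
from itertools import product


def ats_to_mgm(states, agents, delta):
    """Build gamma^T: for each state q, the choices Sigma_a^q = delta(q, a) and the
    outcome function o_q that maps a tuple of choice sets to their unique common state."""
    gamma = {}
    for q in states:
        choices = {a: list(delta[(q, a)]) for a in agents}
        outcome = {}
        for combo in product(*(choices[a] for a in agents)):
            (s,) = frozenset.intersection(*combo)   # fails unless the intersection is a singleton
            outcome[combo] = s
        gamma[q] = (choices, outcome)
    return gamma


states = ["q0", "q1"]
agents = ["a", "b"]
everything = frozenset(states)
# Hypothetical ATS: agent a picks the next state, agent b has a single trivial choice.
delta = {(q, "a"): [frozenset({"q0"}), frozenset({"q1"})] for q in states}
delta.update({(q, "b"): [everything] for q in states})

gamma = ats_to_mgm(states, agents, delta)
choices_q0, outcome_q0 = gamma["q0"]
```

The choices of the resulting game forms are literally the choice sets of the ATS, so the set of states and the valuation π carry over unchanged.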

EXAMPLE 9. Let us apply the transformation to the alternating transition system from Example 6. The resulting MGM is shown in Figure 6. The following proposition states that it satisfies the same ATL formulas as the original system. Note that, as T and M^T include the same set of states Q, the construction preserves validity of formulas (in the model), too.

PROPOSITION 7. For every alternating transition system T, a state q in it, and an ATL formula ϕ: T, q |= ϕ iff M^T, q |= ϕ.

The models M^T defined as above share a specific property which will be defined below. First, we need an auxiliary technical notion: a fusion of n-tuples (α1, . . . , αn) and (β1, . . . , βn) is any n-tuple (γ1, . . . , γn) where γi ∈ {αi, βi}, i = 1, . . . , n. The following is easy to check.


Figure 6. From ATS to a convex game structure: M T for the system from Figure 4.

PROPOSITION 8. For any game form ⟨Agt, {Σ_a | a ∈ Agt}, Q, o⟩, where Agt = {a1, . . . , ak}, the following two properties of the outcome function o: ∏_{a∈Agt} Σ_a → Q are equivalent:
(i) If o(σ_a1, . . . , σ_ak) = o(τ_a1, . . . , τ_ak) = s then o(ς_a1, . . . , ς_ak) = s for every fusion (ς_a1, . . . , ς_ak) of (σ_a1, . . . , σ_ak) and (τ_a1, . . . , τ_ak).
(ii) For every s ∈ Q there are Σ^s_a ⊆ Σ_a such that o⁻¹(s) = ∏_{a∈Agt} Σ^s_a.

DEFINITION 14. A strategic game form ⟨Agt, {Σ_a | a ∈ Agt}, Q, o⟩ is convex if the outcome function o satisfies (any of) the two equivalent properties above. A multi-player game model M = ⟨Q, γ, π⟩ is convex if γ(q) is convex for every q ∈ Q.

PROPOSITION 9. For every ATS T, the game model M^T is convex.

Proof: Let M^T be defined as above. If o_q(Q^1_a1, . . . , Q^1_ak) = o_q(Q^2_a1, . . . , Q^2_ak) = s then s ∈ Q^j_a for each j = 1, 2 and a ∈ Agt; therefore ⋂_{a∈Agt} Q^{j_a}_a = {s} for any fusion (Q^{j_1}_a1, . . . , Q^{j_k}_ak) of (Q^1_a1, . . . , Q^1_ak) and (Q^2_a1, . . . , Q^2_ak), since this intersection contains s and, by determinism of T, is a singleton.
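Property (i) of Proposition 8 gives a direct test for convexity of a finite game form. The following sketch is illustrative only: outcome functions are assumed to be dictionaries from choice tuples to states, `is_convex` is a hypothetical name, and the client/server form at q0 is reconstructed from the description in Example 4.

```python
from itertools import product


def is_convex(outcome):
    """Check property (i): whenever two profiles share an outcome, so do all their fusions.
    `outcome` maps every tuple of choices to a state."""
    for sigma, tau in product(outcome, repeat=2):
        if outcome[sigma] != outcome[tau]:
            continue
        # product(*zip(sigma, tau)) enumerates every tuple whose i-th entry
        # is either sigma[i] or tau[i], i.e. every fusion of sigma and tau.
        for fusion in product(*zip(sigma, tau)):
            if outcome[fusion] != outcome[sigma]:
                return False
    return True


# Assumed client/server game form at q0 (choices ordered as server, client).
client_server_q0 = {
    ("reject", "set0"): "q0", ("reject", "set1"): "q0",
    ("accept", "set0"): "q0", ("accept", "set1"): "q1",
}

# Hypothetical dictatorship: the first player alone decides the outcome.
dictatorship = {
    ("keep", "x"): "q0", ("keep", "y"): "q0",
    ("flip", "x"): "q1", ("flip", "y"): "q1",
}

convex_cs = is_convex(client_server_q0)
convex_dict = is_convex(dictatorship)
```

In the client/server form, ⟨reject, set1⟩ and ⟨accept, set0⟩ both lead to q0, but their fusion ⟨accept, set1⟩ leads to q1, so the form is not convex. The dictatorship passes the test, in line with the first claim of Proposition 11.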


REMARK 10. Pauly has pointed out that the convexity condition is known in game theory under the name of ‘rectangularity’, and rectangular strategic game forms which are ‘tight’ in the sense that their α- and β-effectivity functions coincide are characterized in Abdou (1998) as the normal forms of extensive games with unique outcomes.

PROPOSITION 11.
1. Every turn-based game model is convex.
2. Every injective game model is convex.

Proof. (1) Let M = ⟨Q, γ, π⟩ be a turn-based MGM for a set of players Agt, and let d ∈ Agt be the dictator for γ(q), q ∈ Q. Then for every s ∈ Q, we have o_q⁻¹(s) = ∏_{a∈Agt} Σ^s_a where Σ^s_d = {σ_d ∈ Σ^q_d | o_q(. . . , σ_d, . . .) = s}, and Σ^s_a = Σ^q_a for all a ≠ d. (2) is trivial.

Note that the MGM from Figure 6 is convex, although it is neither injective nor turn-based, so the reverse implication does not hold.

3.3. From Convex Multi-Player Game Models to Alternating Transition Systems

As it turns out, convexity is a sufficient condition if we want to relabel transitions from a multi-player game model back to an alternating transition system. Let M = ⟨Q, γ, π⟩ be a convex MGM over a set of propositions Π, where Agt = {a1, . . . , ak}, and let γ(q) = ⟨Agt, {Σ^q_a | a ∈ Agt}, Q, o_q⟩ for each q ∈ Q. We transform it to an ATS T^M = ⟨Π, Agt, Q, π, δ^M⟩ with the transition function δ^M defined by

δ^M(q, a) = {Q_σa | σ_a ∈ Σ^q_a},
Q_σa = {o_q(σ_a, σ_{Agt\{a}}) | σ_{Agt\{a}} = ⟨σ_b1, . . . , σ_bk−1⟩, b_i ≠ a, σ_bi ∈ Σ^q_bi}.

Thus, Q_σa is the set of states to which a transition may be effected from q while agent a has chosen to execute σ_a. Moreover, δ^M(q, a) simply collects all such sets. For purely technical reasons we will regard these δ^M(q, a) as indexed families, i.e., even if some Q_σ1 and Q_σ2 are set-theoretically equal, they will be considered different as long as σ1 ≠ σ2. By convexity of γ(q) it is easy to verify that ⋂_{a∈Agt} Q_σa = {o_q(σ_a1, . . . , σ_ak)} for every tuple (Q_σa1, . . . , Q_σak) ∈ δ^M(q, a1) × · · · × δ^M(q, ak). Furthermore, the following propositions hold.

PROPOSITION 12. For every convex MGM M the ATS T^M is tight.
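The construction of δ^M, and the role convexity plays in it, can be illustrated with a short sketch. It is not taken from the paper: outcome functions are assumed to be dictionaries from choice tuples to states, the indexed family is represented as a dictionary keyed by choices, the function names are hypothetical, and the client/server form is reconstructed from the description in Example 4.

```python
from itertools import product


def choice_sets(agents, choices, outcome):
    """For each agent a and choice sigma_a, collect Q_{sigma_a}: every state reachable
    when a plays sigma_a. Keyed by the choice, so equal sets stay distinct entries."""
    return {
        a: {sigma: frozenset(s for prof, s in outcome.items() if prof[i] == sigma)
            for sigma in choices[a]}
        for i, a in enumerate(agents)
    }


def intersections_match(agents, choices, outcome):
    """Check that the choice sets of every profile intersect exactly in its outcome."""
    family = choice_sets(agents, choices, outcome)
    for prof in product(*(choices[a] for a in agents)):
        sets = [family[a][prof[i]] for i, a in enumerate(agents)]
        if frozenset.intersection(*sets) != {outcome[prof]}:
            return False
    return True


AGENTS = ["s", "c"]

# Hypothetical convex form: the first player alone decides the outcome.
dict_choices = {"s": ["keep", "flip"], "c": ["x", "y"]}
dictatorship = {("keep", "x"): "q0", ("keep", "y"): "q0",
                ("flip", "x"): "q1", ("flip", "y"): "q1"}

# Assumed (non-convex) client/server form at q0.
cs_choices = {"s": ["reject", "accept"], "c": ["set0", "set1"]}
client_server_q0 = {("reject", "set0"): "q0", ("reject", "set1"): "q0",
                    ("accept", "set0"): "q0", ("accept", "set1"): "q1"}

works_for_convex = intersections_match(AGENTS, dict_choices, dictatorship)
works_for_cs = intersections_match(AGENTS, cs_choices, client_server_q0)
```

For the non-convex client/server form the sets Q_accept and Q_set1 both equal {q0, q1}, so their intersection is not a singleton and the relabeling does not yield a deterministic ATS; this is why the construction is restricted to convex models.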


PROPOSITION 13. For every convex MGM M, a state q in it, and an ATL formula ϕ, M, q |= ϕ iff T^M, q |= ϕ.

Note that the above construction transforms the multi-player game model from Figure 6 exactly back to the ATS from Figure 4. More generally, the constructions converting tight ATSs into convex MGMs and vice versa are mutually inverse, thus establishing a duality between these two types of structures:

PROPOSITION 14.
1. Every tight ATS T is isomorphic to T^(M^T).
2. Every convex MGM M is isomorphic to M^(T^M).

Proof. 1. It suffices to see that δ^(M^T)(q, a) = δ(q, a) for every q ∈ Q and a ∈ Agt, which is straightforward from the tightness of T.
2. Let M = ⟨Q, γ, π⟩ be a convex MGM and γ(q) = ⟨Agt, {Σ^q_a | a ∈ Agt}, Q, o_q⟩ for q ∈ Q. For every σ_a ∈ Σ^q_a we identify σ_a with Q_σa defined as above. We have to show that the outcome functions o_q in M and o^(T^M)_q in M^(T^M) agree under that identification. Indeed, o^(T^M)_q(Q_σa1, . . . , Q_σak) = s iff ⋂_{a∈Agt} Q_σa = {s} iff o_q(σ_a1, . . . , σ_ak) = s.

The following proposition shows the relationship between structural properties of MGMs and ATSs:

PROPOSITION 15.
1. For every ATS T the game model M^T is injective iff T is lock-step synchronous.
2. For every convex MGM M, the ATS T^M is lock-step synchronous iff M is injective.
3. For every turn-based synchronous ATS T the game model M^T is turn-based. Conversely, if M^T is turn-based for some tight ATS T then T is turn-based synchronous.
4. For every convex MGM M the ATS T^M is turn-based synchronous iff M is turn-based.

Proof. (1) Let T be lock-step synchronous and o_q(Q_a1, . . . , Q_ak) = ⟨s_a1, . . . , s_ak⟩ for some Q_ai ∈ δ(q, ai), i = 1, . . . , k. Then Q_ai = {s_ai} × ∏_{a∈Agt\{ai}} Q_a where Q^suc_q = ∏_{a∈Agt} Q_a, whence the injectivity of M^T. Conversely, if M^T is injective then every state s ∈ Q^suc_q can be labeled with the unique tuple ⟨Q_a1, . . . , Q_ak⟩ such that o_q(Q_a1, . . . , Q_ak) = s, i.e., Q^suc_q is represented by ∏_{a∈Agt} δ(q, a), and every Q_ai ∈ δ(q, ai) can be identified with {Q_ai} × ∏_{a∈Agt\{ai}} δ(q, a).


(2) If M is injective then Q^suc_q can be labeled by ∏_{a∈Agt} Σ^q_a where every Q_σai ∈ δ(q, ai) is identified with {σ_ai} × ∏_{a∈Agt\{ai}} δ(q, a). Conversely, if T^M is lock-step synchronous then every two different Q_σa and Q_σ′a from δ(q, a) must be disjoint, whence the injectivity of M. (3) and (4): the proofs are straightforward.

3.4. Equivalence between the Semantics for ATL Based on ATS and MGM

So far we have shown how to transform alternating transition systems to convex multi-player game models, and vice versa. Unfortunately, not every MGM is convex. However, for every MGM we can construct a convex multi-player game model that satisfies the same formulas of ATL. This can be done by creating distinct copies of the original states for different incoming transitions, and thus ‘storing’ the knowledge of the previous state and the most recent choices from the agents in the new states. Since the actual choices are present in the label of the resulting state, the new transition function is obviously injective. It is also easy to observe that the construction below preserves not only satisfiability, but also validity of formulas (in the model).

PROPOSITION 16. For every MGM M = ⟨Q, γ, π⟩ there is an injective (and hence convex) MGM M′ = ⟨Q′, γ′, π′⟩ which satisfies the same formulas of ATL.

Proof. For every γ(q) = ⟨Agt, {Σ^q_a | a ∈ Agt}, Q, o_q⟩ we define Q_q = {q} × ∏_{a∈Agt} Σ^q_a and let Q′ = Q ∪ ⋃_{q∈Q} Q_q. Now we define γ′ as follows:
– for q ∈ Q, we define γ′(q) = ⟨Agt, {Σ^q_a | a ∈ Agt}, Q′, O_q⟩, where O_q(σ_a1, . . . , σ_ak) = ⟨q, σ_a1, . . . , σ_ak⟩;
– for σ = ⟨q, σ_a1, . . . , σ_ak⟩ ∈ Q_q, and s = o_q(σ_a1, . . . , σ_ak), we define γ′(σ) = γ′(s);
– finally, π′(q) = π(q) for q ∈ Q, and π′(⟨q, σ_a1, . . . , σ_ak⟩) = π(o_q(σ_a1, . . . , σ_ak)) for ⟨q, σ_a1, . . . , σ_ak⟩ ∈ Q_q.

The model M′ is injective and it can be proved by a straightforward induction that for every ATL formula ϕ:
– M′, q |= ϕ iff M, q |= ϕ for q ∈ Q, and
– M′, ⟨q, σ_a1, . . . , σ_ak⟩ |= ϕ iff M, o_q(σ_a1, . . . , σ_ak) |= ϕ for ⟨q, σ_a1, . . . , σ_ak⟩ ∈ Q_q.

Thus, the restriction of the semantics of ATL to the class of injective (and hence to convex, as well) MGMs does not introduce new validities. Since every ATS can be reduced to an equivalent tight one, we obtain the following.


Figure 7. Construction of a convex multi-player game model equivalent to the MGM from Figure 3.

Figure 8. ATS-style transition function for the convex game model from Figure 7.

COROLLARY 17. For every ATL formula ϕ the following are equivalent:
1. ϕ is valid in all (tight) alternating transition systems.
2. ϕ is valid in all (injective) multi-player game models.

We note that the above construction preserves validity and satisfiability of ATL∗ formulas, too.

EXAMPLE 10. We can apply the construction to the controller from Example 4, and obtain a convex MGM equivalent to the original one in the context of ATL. The result is displayed in Figure 7. The labels for the transitions can be easily deduced from their target states. Re-writing the game model into an isomorphic ATS, according to the construction from Section 3.3 (see Figure 8), completes the transformation from an arbitrary

262

VALENTIN GORANKO AND WOJCIECH JAMROGA

multi-player game model to an alternating transition system for which the same ATL formulas hold.

3.5. ATS or MGM?

Alur has stated⁷ that the authors of ATL switched from alternating transition systems to concurrent game structures mostly to improve understandability of the logic and clarity of the presentation. Indeed, identifying actions with their outcomes may make things somewhat artificial and unnecessarily complicated. In particular, we find the convexity condition which ATSs impose too strong and unjustified in many situations. For instance, consider the following variation of the ‘Chicken’ game: two cars are running against each other on a country road, and each of the drivers, seeing the other car, can take any of the actions ‘drive straight’, ‘swerve to the left’ and ‘swerve to the right’. Each of the combined actions for the two drivers, ⟨drive straight, swerve to the left⟩ and ⟨swerve to the right, drive straight⟩, leads to a non-collision outcome, while each of their fusions ⟨drive straight, drive straight⟩ and ⟨swerve to the left, swerve to the right⟩ leads to a collision. Likewise, in the ‘Coordinated Attack’ scenario (Fagin et al. 1995) any non-coordinated one-sided attack leads to defeat, while the coordinated attack of both armies, which is a fusion of these, leads to a victory. Thus, the definition of outcome function in coalition games is more general and flexible in our opinion.

Let us consider the system from Example 4 again. The multi-player game model (or concurrent game structure) from Figure 3 looks natural and intuitive. Unfortunately, there is no ATS with the same number of states and transitions that fits the system description. In consequence, an ATS modeling the same situation must be larger (Jamroga 2003).

The above examples show that correct alternating transition systems are more difficult to come up with directly than multi-player game models, and usually they are more complex, too. This should be especially evident when we consider open systems. Suppose we need to add another client process to the ATS from Example 6. It would be hard to extend the existing transition function in a straightforward way so that it still satisfies the formal requirements (all the intersections of choices are singletons). Designing a completely new ATS is probably an easier solution.

Another interesting issue is extendibility of the formalisms. Game models incorporate explicit labels for agents’ choices, and therefore the labels can be used, for instance, to restrict the set of valid strategies under uncertainty (Jamroga and van der Hoek 2003).
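The ‘Chicken’ argument can be checked mechanically. The sketch below assumes that convexity amounts to the set of choice profiles with a given outcome being closed under fusion; the collision rule in `collides` is our own reading of the road geometry, and all names are hypothetical.

```python
from itertools import product

def collides(driver1, driver2):
    """Assumed geometry: the cars face each other, so one driver's left is
    the other's right. They crash if both go straight or if both end up
    on the same side of the road."""
    if driver1 == driver2 == "straight":
        return True
    return {driver1, driver2} == {"left", "right"}

def fusions(p1, p2):
    """All profiles in which each agent keeps its choice from p1 or from p2."""
    return set(product(*zip(p1, p2)))

safe_1 = ("straight", "left")
safe_2 = ("right", "straight")
mixed = fusions(safe_1, safe_2)
bad = {p for p in mixed if collides(*p)}
```

Both `safe_1` and `safe_2` avoid a collision, yet `bad` is non-empty, so the non-collision profiles are not closed under fusion in this encoding.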


Figure 9. Coalition effectivity model for the variable client/server system.

3.6. Coalition Effectivity Models as Equivalent Alternative Semantics for ATL

Effectivity functions and coalition effectivity models were introduced in Section 2.5, including a characterization of those effectivity functions which describe abilities of agents and their coalitions in actual strategic game forms (playable effectivity functions, Theorem 5). We are going to extend the result to a correspondence between multi-player game models and standard coalition effectivity models (i.e., the coalition effectivity models that contain only playable effectivity functions).

Every MGM $M = \langle Q, \gamma, \pi\rangle$ for the set of players $Agt$ corresponds to a CEM $E^M = \langle Agt, Q, E^M, \pi\rangle$, where for every $q \in Q$, $X \subseteq Q$ and $A \subseteq Agt$, we have

$X \in E_q^M(A)$ iff $\exists\sigma_A\, \forall\sigma_{Agt\setminus A}\, \exists s \in X:\ o(\sigma_A, \sigma_{Agt\setminus A}) = s$.

The choices refer to the strategic game form $\gamma(q)$. Conversely, by Theorem 5, for every standard coalition effectivity model $E$ there is a multi-player game model $M$ such that $E$ is isomorphic to $E^M$. Again, by a straightforward induction on formulas, we obtain:

PROPOSITION 18. For every MGM $M$, a state $q$ in it, and an ATL formula $\varphi$, we have $M, q \models \varphi$ iff $E^M, q \models \varphi$.

EXAMPLE 11. Let $M$ be the multi-player game model from Example 4 (variable client/server system). Coalition effectivity model $E^M$ is presented in Figure 9.

By Proposition 9 and Corollary 17, we eventually obtain:
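The defining condition for $E_q^M(A)$ is a plain exists/forall check over choice profiles, which the following Python sketch spells out. The encoding of a game frame as dictionaries, the function name `is_effective`, and the small client/server frame are assumptions made for illustration only.

```python
from itertools import product

def is_effective(choices, outcome, coalition, X):
    """X is in E_q(A): some joint choice of A forces the next state into X,
    whatever the remaining agents do.

    choices : dict agent -> list of actions (insertion order fixes the
              order of components in a profile)
    outcome : dict profile (tuple) -> successor state
    """
    agents = list(choices)
    inside = [a for a in agents if a in coalition]
    outside = [a for a in agents if a not in coalition]
    for own in product(*(choices[a] for a in inside)):
        pick = dict(zip(inside, own))
        forced = True
        for other in product(*(choices[a] for a in outside)):
            pick.update(zip(outside, other))
            if outcome[tuple(pick[a] for a in agents)] not in X:
                forced = False
                break
        if forced:
            return True
    return False

# Hypothetical frame at state q0 of a two-valued client/server system.
choices = {"s": ["reject", "accept"], "c": ["set0", "set1"]}
outcome = {("reject", "set0"): "q0", ("reject", "set1"): "q0",
           ("accept", "set0"): "q0", ("accept", "set1"): "q1"}
```

In this frame the server alone is effective for {q0} (by rejecting), while only the two agents together are effective for {q1}.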


THEOREM 19. For every ATL formula $\varphi$ the following are equivalent:
1. $\varphi$ is valid in all (tight) alternating transition systems,
2. $\varphi$ is valid in all (injective) multi-player game models,
3. $\varphi$ is valid in all standard coalition effectivity models.

Thus, the semantics of ATL based on alternating transition systems, multi-player game models, and standard coalition effectivity models are equivalent. We note that, while the former two semantics are more concrete and natural, they are mathematically less elegant and less suitable for formal reasoning about ATL, whereas the semantics based on coalition effectivity models is essentially a monotone neighborhood semantics for multi-modal logics. The combination of these semantics was used in Goranko and van Drimmelen (2003) to establish a complete axiomatization of ATL.

4. ATEL: ADDING KNOWLEDGE TO STRATEGIES AND TIME

Alternating-time Temporal Epistemic Logic ATEL (van der Hoek and Wooldridge 2002, 2003a) enriches the picture with an epistemic component. ATEL adds to ATL operators for representing agents’ knowledge: $K_a\varphi$ reads as “agent $a$ knows that $\varphi$”. Additional operators $E_A\varphi$, $C_A\varphi$, and $D_A\varphi$ refer to “everybody knows”, common knowledge, and distributed knowledge among the agents from $A$. Thus, $E_A\varphi$ means that every agent in $A$ knows that $\varphi$ holds, while $C_A\varphi$ means not only that the agents from $A$ know that $\varphi$, but also that they know that they know it, and know that they know that they know it, etc. The distributed knowledge modality $D_A\varphi$ denotes a situation in which, if the agents could combine their individual knowledge together, they would be able to infer that $\varphi$ is true.

4.1. AETS and Semantics of Epistemic Formulas

Models for ATEL are called alternating epistemic transition systems (AETS). They extend alternating transition systems with epistemic accessibility relations $\sim_{a_1}, \dots, \sim_{a_k} \subseteq Q \times Q$ for modeling agents’ uncertainty:

$T = \langle Agt, Q, \Pi, \pi, \sim_{a_1}, \dots, \sim_{a_k}, \delta\rangle$.

These are assumed to be equivalence relations. Agent $a$’s epistemic relation is meant to encode $a$’s inability to distinguish between the (global) system states: $q \sim_a q'$ means that, while the system is in state $q$, agent $a$ cannot really determine whether it is in $q$ or $q'$. Then:

$T, q \models K_a\varphi$ iff for all $q'$ such that $q \sim_a q'$ we have $T, q' \models \varphi$.


Figure 10. An AETS for the modiﬁed controller/client problem. The dotted lines display the epistemic accessibility relations for s and c.

REMARK 20. Since the epistemic relations are required to be equivalences, the epistemic layer of ATEL refers indeed to agents’ knowledge rather than beliefs in general. We suggest that this requirement can be relieved to allow ATEL for other kinds of beliefs as well. In particular, the interpretation of ATEL into ATL we propose in Section 4.4 does not assume any specific properties of the accessibility relations.

Relations $\sim_A^E$, $\sim_A^C$ and $\sim_A^D$, used to model group epistemics, are derived from the individual accessibility relations of agents from $A$. First, $\sim_A^E$ is the union of the relations, i.e., $q \sim_A^E q'$ iff $q \sim_a q'$ for some $a \in A$. In other words, if everybody knows $\varphi$, then no agent may be unsure about the truth of it, and hence $\varphi$ should be true in all the states that cannot be distinguished from the current state by even one member of the group. Next, $\sim_A^C$ is defined as the transitive closure of $\sim_A^E$. Finally, $\sim_A^D$ is the intersection of all the $\sim_a$, $a \in A$: if any agent from $A$ can distinguish $q$ from $q'$, then the whole group can distinguish the states in the sense of distributed knowledge. The semantics of group knowledge can be defined as below (for $K = C, E, D$):

$T, q \models K_A\varphi$ iff for all $q'$ such that $q \sim_A^K q'$ we have $T, q' \models \varphi$.

The time complexity of model checking for ATEL is still polynomial (van der Hoek and Wooldridge 2003a).

EXAMPLE 12. Let us consider another variation of the variable controller example: the client can try to add 1 or 2 (modulo 3) to the value of $x$; the server can still accept or reject the request (Figure 10). The dotted lines show that $c$ cannot distinguish being in state $q_0$ from being in $q_1$, while $s$ isn’t able to discriminate $q_0$ from $q_2$. Some formulas that are valid for this AETS are shown below:
1. $x = 1 \rightarrow K_s\, x = 1$,
2. $x = 2 \rightarrow E_{\{s,c\}} \neg x = 1 \wedge \neg C_{\{s,c\}} \neg x = 1$,
3. $x = 0 \rightarrow \langle\langle s\rangle\rangle X\, x = 0 \wedge \neg K_s \langle\langle s\rangle\rangle X\, x = 0$,
4. $x = 2 \rightarrow \langle\langle s, c\rangle\rangle X(x = 0 \wedge \neg E_{\{s,c\}}\, x = 0)$.
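The purely epistemic formulas 1 and 2 can be verified with a short Python sketch of the group relations. It assumes, as the example suggests but does not state in this passage, that $x = i$ holds in state $q_i$; the helper names are our own.

```python
from functools import reduce

STATES = ["q0", "q1", "q2"]
X = {"q0": 0, "q1": 1, "q2": 2}          # assumed: x = i in state q_i

def equivalence(blocks):
    """Pairs of an equivalence relation given by its classes."""
    return {(s, t) for block in blocks for s in block for t in block}

SIM = {
    "c": equivalence([{"q0", "q1"}, {"q2"}]),   # c confuses q0 and q1
    "s": equivalence([{"q0", "q2"}, {"q1"}]),   # s confuses q0 and q2
}

def transitive_closure(rel):
    closure = set(rel)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:
            return closure
        closure |= step

def group_relation(kind, group):
    """Relation for E (union), C (its transitive closure) or D (intersection)."""
    rels = [SIM[a] for a in group]
    if kind == "D":
        return reduce(set.intersection, rels)
    everybody = reduce(set.union, rels)
    return everybody if kind == "E" else transitive_closure(everybody)

def holds_in_all(rel, q, prop):
    """Semantics of K, E, C, D: prop holds in every state related to q."""
    return all(prop(t) for (s, t) in rel if s == q)

not_one = lambda q: X[q] != 1
```

At $q_2$ everybody knows that $x \neq 1$, but this is not common knowledge, because $q_1$ is reachable through the chain $q_2 \sim_s q_0 \sim_c q_1$.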

4.2. Extending Multi-Player Game Models and Coalition Effectivity Models to Include Knowledge

Multi-player game models and coalition effectivity models can be augmented with epistemic accessibility relations in a similar way, giving rise to multi-player epistemic game models $M = \langle Q, \gamma, \pi, \sim_{a_1}, \dots, \sim_{a_k}\rangle$ and epistemic coalition effectivity models $E = \langle Agt, Q, E, \pi, \sim_{a_1}, \dots, \sim_{a_k}\rangle$ for a set of agents $Agt = \{a_1, \dots, a_k\}$ over a set of propositions $\Pi$. Semantic rules for epistemic formulas remain the same as in Section 4.1 for both kinds of structures. The equivalence results from Section 3 can be extended to ATEL and its models.

COROLLARY 21. For every ATEL formula $\varphi$ the following are equivalent:
1. $\varphi$ is valid in all (tight) alternating epistemic transition systems,
2. $\varphi$ is valid in all (injective) multi-player epistemic game models,
3. $\varphi$ is valid in all standard epistemic coalition effectivity models.

We will use multi-player epistemic game models throughout the rest of this chapter for the convenience of presentation they offer.

4.3. Problems with ATEL

One of the main challenges in ATEL is the question of how, given an explicit way to represent agents’ knowledge, this knowledge should interfere with the agents’ available strategies. What does it mean that an agent has a strategy to enforce $\varphi$, if it involves making different choices in states that are epistemically indistinguishable for the agent, for instance? Moreover, agents are assumed to have some epistemic capabilities when making decisions, and different ones for epistemic properties like $K_a\varphi$. The interpretation of knowledge operators refers to the agents’ capability to distinguish one state from another; the semantics of $\langle\langle A\rangle\rangle$ allows the agents to base their decisions upon sequences of states. These relations between complete vs. incomplete information on one hand, and perfect vs. imperfect recall on the other, have been studied in Jamroga and van der Hoek (2003). It was also argued that, when reasoning about what an agent can enforce, it seems more appropriate to require the agent to know his winning strategy rather than to know only that such a strategy exists.⁸ Two variations of ATEL were proposed as solutions:


Alternating-time Temporal Observational Logic (ATOL) for agents with bounded memory and syntax restricted in a way similar to CTL, and full Alternating-time Temporal Epistemic Logic with Recall (ATEL-R*) where agents were able to memorize the whole game. The issue of a philosophically consistent semantics for Alternating-time Temporal Logic with an epistemic component is still under debate, and it is rather beyond the scope of this paper. We believe that results analogous to those presented here about ATEL can be obtained for logics like ATOL and ATEL-R* and their models.

4.4. Interpretations of ATEL into ATL

ATL is trivially embedded into ATEL since all ATL formulas are also ATEL formulas. Moreover, every multi-player game model can be extended to a multi-player epistemic game model by defining all epistemic accessibility relations to be the equality, i.e. all agents have no uncertainty about the current state of the system, thus embedding the semantics of ATL in the one for ATEL, and rendering the former a reduct of the latter.

Interpretation the other way is more involved. We will first construct a satisfiability preserving interpretation of the fragment of ATEL without distributed knowledge (we will call it ATEL$_{CE}$), and then we will show how it can be extended to the whole ATEL, though at the expense of some blow-up of the models. The interpretation we propose has been inspired by Schild (2000). We should also mention van Otterloo et al. (2003), as it deals with virtually the same issue. Related work is discussed in more detail at the end of the section.

4.4.1. Idea of the Interpretation

ATEL consists of two orthogonal layers. The first one, inherited from ATL, refers to what agents can achieve in temporal perspective, and is underpinned by the structure defined via transition function $o$. The other layer is the epistemic component, reflected by epistemic accessibility relations.
Our idea of the translation is to leave the original temporal structure intact, while extending it with additional transitions to ‘simulate’ epistemic accessibility links. The ‘simulation’ – like the one in van Otterloo et al. (2003) – is achieved through adding new ‘epistemic’ agents, who can enforce transitions to epistemically accessible states. Unlike in that paper, though, the ‘moves’ of epistemic agents are orthogonal to the original temporal transitions (‘action’ transitions): they lead to special ‘epistemic’ copies of the original states rather than to the ‘action’ states themselves, and no new states are introduced into the course of action. The ‘action’ and ‘epistemic’ states form separate strata in the resulting model, and are labeled accordingly to distinguish transitions that implement different modalities.

Figure 11. New model: ‘action’ vs. ‘epistemic’ states, and ‘action’ vs. ‘epistemic’ transitions. Note that the game frames for ‘epistemic’ states are exact copies of their ‘action’ originals: the ‘action’ transitions from the epistemic layer lead back to the ‘action’ states.

The interpretation consists of two independent parts: a transformation of models and a translation of formulas. First, we propose a construction that transforms every multi-player epistemic game model $M$ for a set of agents $\{a_1, \dots, a_k\}$ into a (pure) multi-player game model $M^{ATL}$ over a set of agents $\{a_1, \dots, a_k, e_1, \dots, e_k\}$. Agents $a_1, \dots, a_k$ are the original agents from $M$ (we will call them ‘real agents’). Agents $e_1, \dots, e_k$ are ‘epistemic doubles’ of the real agents: the role of $e_i$ is to ‘point out’ the states that were epistemically indistinguishable from the current state for agent $a_i$ in $M$. Intuitively, $K_{a_i}\varphi$ could then be replaced with a formula like $\neg\langle\langle e_i\rangle\rangle X\neg\varphi$ that rephrases the semantic definition of the $K_a$ operator from Section 4.1. As $M^{ATL}$ inherits the temporal structure from $M$, temporal formulas might be left intact. However, it is not as simple as that.

Note that agents make their choices simultaneously in multi-player game models, and the resulting transition is a result of all these choices. In consequence, it is not possible that an epistemic agent $e_i$ can enforce an ‘epistemic’ transition to state $q$, and at the same time a group of real agents $A$ is capable of executing an ‘action’ transition to $q'$. Thus, in order to distinguish transitions referring to different modalities, we introduce additional states in model $M^{ATL}$. States $q_1^{e_i}, \dots, q_n^{e_i}$ are exact copies of the original states $q_1, \dots, q_n$ from $Q$ except for one thing: they satisfy a new proposition $e_i$, added to enable identifying moves of epistemic agent $e_i$. Original states $q_1, \dots, q_n$ are still in $M^{ATL}$ to represent targets of ‘action’ moves of the real agents $a_1, \dots, a_k$. We will use a new proposition $act$ to label these states. The type of a transition can be recognized by the label of its target state (cf. Figure 11).

Now, we must only arrange the interplay between agents’ choices, so that the results can be interpreted in a direct way. To achieve this, every epistemic agent can choose to be ‘passive’ and let the others decide upon the next move, or may select one of the states indistinguishable from $q$ for an agent $a_i$ (to be more precise, the epistemic agents do select the epistemic copies of states from $Q^{e_i}$ rather than the original action states from $Q$). The resulting transition leads to the state selected by the first non-passive epistemic agent. If all the epistemic agents decided to be passive, the ‘action’ transition chosen by the real agents follows.

For such a construction of $M^{ATL}$, we can finally show how to translate formulas from ATEL to ATL:
1. $K_{a_i}\varphi$ can be rephrased as $\neg\langle\langle\{e_1, \dots, e_i\}\rangle\rangle X(e_i \wedge \neg\varphi)$: the epistemic moves to agent $e_i$’s epistemic states do not lead to a state where $\varphi$ fails. Note that player $e_i$ can select a state of his if, and only if, players $e_1, \dots, e_{i-1}$ are passive (hence their presence in the cooperation modality). Note also that $K_{a_i}\varphi$ can be as well translated as $\neg\langle\langle\{e_1, \dots, e_k\}\rangle\rangle X(e_i \wedge \neg\varphi)$ or $\neg\langle\langle\{a_1, \dots, a_k, e_1, \dots, e_k\}\rangle\rangle X(e_i \wedge \neg\varphi)$: when $e_i$ decides to be active, choices from $a_1, \dots, a_k$ and $e_{i+1}, \dots, e_k$ are irrelevant.
2. $\langle\langle A\rangle\rangle X\varphi$ becomes $\langle\langle A \cup \{e_1, \dots, e_k\}\rangle\rangle X(act \wedge \varphi)$ in a similar way.
3. To translate other temporal formulas, we must require that the relevant part of a path runs only through ‘action’ states (labeled with the $act$ proposition). Thus, $\langle\langle A\rangle\rangle G\varphi$ can be rephrased as $\varphi \wedge \langle\langle A \cup Agt^e\rangle\rangle X\langle\langle A \cup Agt^e\rangle\rangle G(act \wedge \varphi)$.
Note that a simpler translation with $\langle\langle A \cup Agt^e\rangle\rangle G(act \wedge \varphi)$ is incorrect: the initial state of a path does not have to be an action state, since $\langle\langle A\rangle\rangle G\varphi$ can be embedded in an epistemic formula. A similar method applies to the translation of $\langle\langle A\rangle\rangle\varphi U\psi$.
4. Translation of common knowledge refers to the definition of relation $\sim_A^C$ as the transitive closure of relations $\sim_{a_i}$: $C_A\varphi$ means that all the (finite) sequences of appropriate epistemic transitions must end up in a state where $\varphi$ is true.

The only operator that does not seem to lend itself to a translation according to the above scheme is the distributed knowledge operator $D$, for which we seem to need more ‘auxiliary’ agents. Thus, we will begin with presenting details of our interpretation for ATEL$_{CE}$ – a reduced version of ATEL that includes only common knowledge and ‘everybody knows’ operators for group epistemics. Section 4.4.3 shows how to modify the translation to include distributed knowledge as well. We note that an analogous interpretation into ATL can be proposed for the propositional version of BDI logic based on CTL.

4.4.2. Interpreting Models and Formulas of ATEL$_{CE}$ into ATL

Given a multi-player epistemic game model $M = \langle Q, \gamma, \pi, \sim_{a_1}, \dots, \sim_{a_k}\rangle$ for a set of agents $Agt = \{a_1, \dots, a_k\}$ over a set of propositions $\Pi$, we construct a new game model $M^{ATL} = \langle Q', \gamma', \pi'\rangle$ over a set of agents $Agt' = Agt \cup Agt^e$, where:
1. $Agt^e = \{e_1, \dots, e_k\}$ is the set of epistemic agents;
2. $Q' = Q \cup Q^{e_1} \cup \dots \cup Q^{e_k}$, where $Q^{e_i} = \{q^{e_i} \mid q \in Q\}$. We assume that $Q, Q^{e_1}, \dots, Q^{e_k}$ are pairwise disjoint. Further we will be using the more general notation $S^{e_i} = \{q^{e_i} \mid q \in S\}$ for any $S \subseteq Q$.
3. $\Pi' = \Pi \cup \{act, e_1, \dots, e_k\}$, and $\pi'(p) = \pi(p) \cup \bigcup_{i=1,\dots,k} \pi(p)^{e_i}$ for every proposition $p \in \Pi$. Moreover, $\pi'(act) = Q$ and $\pi'(e_i) = Q^{e_i}$.

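For every state $q$ in $M$, we translate the game frame $\gamma(q) = \langle Agt, \{\Sigma_a^q \mid a \in Agt\}, Q, o_q\rangle$ to $\gamma'(q) = \langle Agt', \{\Sigma_a'^q \mid a \in Agt'\}, Q', o'_q\rangle$:
1. $\Sigma_a'^q = \Sigma_a^q$ for $a \in Agt$: choices of the ‘real’ agents do not change;
2. $\Sigma_{e_i}'^q = \{pass\} \cup img(q, \sim_{a_i})^{e_i}$ for $i = 1, \dots, k$, where $img(q, R) = \{q' \mid qRq'\}$ is the image of $q$ with respect to relation $R$.
3. the new transition function is defined as follows:

$$o'_q(\sigma_{a_1}, \dots, \sigma_{a_k}, \sigma_{e_1}, \dots, \sigma_{e_k}) = \begin{cases} o_q(\sigma_{a_1}, \dots, \sigma_{a_k}) & \text{if } \sigma_{e_1} = \dots = \sigma_{e_k} = pass \\ \sigma_{e_i} & \text{if } e_i \text{ is the first active epistemic agent.} \end{cases}$$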
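The choice sets of the epistemic agents and the new transition function can be sketched as follows. The encoding of an epistemic copy $q^{e_i}$ as a pair `(state, i)` with a 0-based index, the action names, and the data for state q0 are assumptions chosen to resemble the client/server example; they are not taken from the paper's figures.

```python
PASS = "pass"

def epistemic_choices(q, indist, agents):
    """Choices of e_1..e_k at q: stay passive, or pick the e_i-copy of a state
    that a_i cannot tell apart from q. A copy is encoded as (state, i),
    with i counted from 0."""
    return [[PASS] + [(q2, i) for q2 in sorted(indist[a][q])]
            for i, a in enumerate(agents)]

def new_outcome(old_outcome, real_profile, epistemic_profile):
    """Transition function of the extended frame."""
    for choice in epistemic_profile:      # e_1, ..., e_k in their fixed order
        if choice != PASS:
            return choice                 # first active epistemic agent decides
    return old_outcome[real_profile]      # all passive: 'action' transition

# Assumed data for state q0 (server s first, client c second).
agents = ["s", "c"]
indist = {"s": {"q0": {"q0", "q2"}}, "c": {"q0": {"q0", "q1"}}}
old_outcome = {("reject", "add1"): "q0", ("reject", "add2"): "q0",
               ("accept", "add1"): "q1", ("accept", "add2"): "q2"}
```

For instance, `new_outcome(old_outcome, ("accept", "add1"), (PASS, ("q1", 1)))` ignores the real agents' choices and moves to the client's epistemic copy of q1.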

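The game frames for the new states are exactly the same: $\gamma'(q^{e_i}) = \gamma'(q)$ for all $i = 1, \dots, k$, $q \in Q$.

EXAMPLE 13. A part of the resulting structure for the epistemic game model from Figure 10 is shown in Figure 12. All the new states, plus the transitions going out of $q_2$ are presented. The wildcard ‘∗’ stands for any action of the respective agent. For instance, ⟨reject, ∗, pass, pass⟩ represents ⟨reject, set0, pass, pass⟩ and ⟨reject, set1, pass, pass⟩.

Figure 12. Construction for the multi-player epistemic game model from Figure 10.

Now, we define a translation of formulas from ATEL$_{CE}$ to ATL corresponding to the above described interpretation of ATEL models into ATL models:

\begin{align*}
tr(p) &= p, \quad \text{for } p \in \Pi \\
tr(\neg\varphi) &= \neg tr(\varphi) \\
tr(\varphi \vee \psi) &= tr(\varphi) \vee tr(\psi) \\
tr(\langle\langle A\rangle\rangle X\varphi) &= \langle\langle A \cup Agt^e\rangle\rangle X(act \wedge tr(\varphi)) \\
tr(\langle\langle A\rangle\rangle G\varphi) &= tr(\varphi) \wedge \langle\langle A \cup Agt^e\rangle\rangle X\langle\langle A \cup Agt^e\rangle\rangle G(act \wedge tr(\varphi)) \\
tr(\langle\langle A\rangle\rangle \varphi U\psi) &= tr(\psi) \vee \big(tr(\varphi) \wedge \langle\langle A \cup Agt^e\rangle\rangle X\langle\langle A \cup Agt^e\rangle\rangle (act \wedge tr(\varphi))\,U\,(act \wedge tr(\psi))\big) \\
tr(K_{a_i}\varphi) &= \neg\langle\langle\{e_1, \dots, e_i\}\rangle\rangle X(e_i \wedge \neg tr(\varphi)) \\
tr(E_A\varphi) &= \neg\langle\langle Agt^e\rangle\rangle X\Big(\bigvee_{a_i \in A} e_i \wedge \neg tr(\varphi)\Big) \\
tr(C_A\varphi) &= \neg\langle\langle Agt^e\rangle\rangle X\langle\langle Agt^e\rangle\rangle \Big(\bigvee_{a_i \in A} e_i\Big)\,U\,\Big(\neg tr(\varphi) \wedge \bigvee_{a_i \in A} e_i\Big)
\end{align*}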
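The clauses of the translation map directly onto a recursive function. The sketch below uses our own encoding of formulas as nested tuples and names the epistemic double of agent `a` as `"e_" + a`; these conventions, like the helper `subformulas`, are assumptions for illustration, and groups in E and C nodes are assumed to be non-empty.

```python
ACT = ("prop", "act")

def tr(formula, agents):
    """Translate an ATEL_CE formula (nested tuples) into ATL.

    agents is the ordered list a_1..a_k. Node shapes (our own encoding):
    ("prop", p), ("not", f), ("or", f, g), ("X", A, f), ("G", A, f),
    ("U", A, f, g), ("K", a, f), ("E", A, f), ("C", A, f).
    The output additionally uses ("and", f, g).
    """
    double = {a: "e_" + a for a in agents}
    all_e = frozenset(double.values())

    def some_e(group):                    # disjunction of e_i for a_i in group
        props = [("prop", double[a]) for a in agents if a in group]
        result = props[0]
        for p in props[1:]:
            result = ("or", result, p)
        return result

    def go(f):
        op = f[0]
        if op == "prop":
            return f
        if op == "not":
            return ("not", go(f[1]))
        if op == "or":
            return ("or", go(f[1]), go(f[2]))
        if op == "X":
            return ("X", frozenset(f[1]) | all_e, ("and", ACT, go(f[2])))
        if op == "G":
            team, t = frozenset(f[1]) | all_e, go(f[2])
            return ("and", t, ("X", team, ("G", team, ("and", ACT, t))))
        if op == "U":
            team, t1, t2 = frozenset(f[1]) | all_e, go(f[2]), go(f[3])
            until = ("U", team, ("and", ACT, t1), ("and", ACT, t2))
            return ("or", t2, ("and", t1, ("X", team, until)))
        if op == "K":
            upto = frozenset(double[a] for a in agents[:agents.index(f[1]) + 1])
            body = ("and", ("prop", double[f[1]]), ("not", go(f[2])))
            return ("not", ("X", upto, body))
        if op == "E":
            return ("not", ("X", all_e, ("and", some_e(f[1]), ("not", go(f[2])))))
        if op == "C":
            d = some_e(f[1])
            return ("not", ("X", all_e, ("U", all_e, d, ("and", ("not", go(f[2])), d))))
        raise ValueError(f"unknown operator: {op}")

    return go(formula)

def subformulas(f):
    """Yield every occurrence of a subformula in the syntax tree of f."""
    yield f
    for child in f[1:]:
        if isinstance(child, tuple):
            yield from subformulas(child)
```

Because the G and U clauses use the translated argument twice, the syntax tree grows quickly under nesting, while the number of distinct subformulas stays small; comparing `sum(1 for _ in subformulas(t))` with `len(set(subformulas(t)))` on a nested G formula makes this visible.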

LEMMA 22. For every ATEL$_{CE}$ formula $\varphi$, model $M$, and ‘action’ state $q \in Q$, we have $M^{ATL}, q \models tr(\varphi)$ iff $M^{ATL}, q^{e_i} \models tr(\varphi)$ for every $i = 1, \dots, k$.

Proof sketch (structural induction on $\varphi$): It suffices to note that $tr(\varphi)$ can only contain propositions $act, e_1, \dots, e_k$ in the scope of $\langle\langle A\rangle\rangle X$ for some $A \subseteq Agt'$. Besides, the propositions from $\varphi$ are true in $q$ iff they are true in $q^{e_1}, \dots, q^{e_k}$, and the game frames for $q, q^{e_1}, \dots, q^{e_k}$ are the same.

LEMMA 23. For every ATEL$_{CE}$ formula $\varphi$, model $M$, and a state $q \in Q$, we have $M, q \models \varphi$ iff $M^{ATL}, q \models tr(\varphi)$.


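Proof: The proof follows by structural induction on $\varphi$. We will show that the construction preserves the truth value of $\varphi$ for two cases: $\varphi \equiv \langle\langle A\rangle\rangle X\psi$ and $\varphi \equiv C_A\psi$. The cases of $\langle\langle A\rangle\rangle G\psi$ and $\langle\langle A\rangle\rangle\psi U\vartheta$ can be reduced to the case for $\langle\langle A\rangle\rangle X\psi$ using the fact that these operators are fixpoints (resp. greatest and least) of certain operators defined in terms of $\langle\langle A\rangle\rangle X\psi$ (see Section 2.5). For lack of space we omit the details. An interested reader can tackle the other cases in an analogous way.

case $\varphi \equiv \langle\langle A\rangle\rangle X\psi$, ATEL$_{CE}$ ⇒ ATL. Let $M, q \models \langle\langle A\rangle\rangle X\psi$, then there is $\sigma_A$ such that for every $\sigma_{Agt\setminus A}$ we have $o_q(\sigma_A, \sigma_{Agt\setminus A}) \models \psi$. By induction hypothesis, $M^{ATL}, o_q(\sigma_A, \sigma_{Agt\setminus A}) \models tr(\psi)$; also, $M^{ATL}, o_q(\sigma_A, \sigma_{Agt\setminus A}) \models act$. Thus, $M^{ATL}, o'_q(\sigma_A, \sigma_{Agt\setminus A}, pass_{Agt^e}) = o_q(\sigma_A, \sigma_{Agt\setminus A}) \models act \wedge tr(\psi)$, where $pass_C$ denotes the strategy where every agent from $C \subseteq Agt^e$ decides to be passive. In consequence, $M^{ATL}, q \models \langle\langle A \cup Agt^e\rangle\rangle X(act \wedge tr(\psi))$.

case $\varphi \equiv \langle\langle A\rangle\rangle X\psi$, ATL ⇒ ATEL$_{CE}$. $M^{ATL}, q \models \langle\langle A \cup Agt^e\rangle\rangle X(act \wedge tr(\psi))$, so there is $\sigma_{A\cup Agt^e}$ such that for every $\sigma_{Agt'\setminus(A\cup Agt^e)} = \sigma_{Agt\setminus A}$ we have $M^{ATL}, o'_q(\sigma_{A\cup Agt^e}, \sigma_{Agt\setminus A}) \models act \wedge tr(\psi)$. Note that $M^{ATL}, o'_q(\sigma_{A\cup Agt^e}, \sigma_{Agt\setminus A}) \models act$ only when $\sigma_{A\cup Agt^e} = \langle\sigma_A, pass_{Agt^e}\rangle$, else the transition would lead to an epistemic state. Thus, $o'_q(\sigma_{A\cup Agt^e}, \sigma_{Agt\setminus A}) = o_q(\sigma_A, \sigma_{Agt\setminus A})$, and hence $M^{ATL}, o_q(\sigma_A, \sigma_{Agt\setminus A}) \models tr(\psi)$. By the induction hypothesis, $M, o_q(\sigma_A, \sigma_{Agt\setminus A}) \models \psi$ and finally $M, q \models \langle\langle A\rangle\rangle X\psi$.

case $\varphi \equiv C_A\psi$, ATEL$_{CE}$ ⇒ ATL. We have $M, q \models C_A\psi$, so for every sequence of states $q_0 = q, q_1, \dots, q_n$ with $q_i \sim_{a_{j_i}} q_{i+1}$, $a_{j_i} \in A$ for $i = 0, \dots, n-1$, it is true that $M, q_n \models \psi$. Consider the same $q$ in $M^{ATL}$. The shape of the construction implies that for every sequence $q'_0 = q, q'_1, \dots, q'_n$ in which every $q'_{i+1}$ is a successor of $q'_i$ and every $q'_{i+1} \in Q^{e_{j_i}}$, $e_{j_i} \in A^e$, we have $M^{ATL}, q'_n \models tr(\psi)$ (by induction and Lemma 22). Moreover, $M^{ATL}, q'_i \models e_{j_i}$ for $i \geq 1$, hence $M^{ATL}, q'_i \models \bigvee_{a_j\in A} e_j$.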
Note that the above refers to all the sequences that can be enforced by the agents from $Agt^e$, and have $\bigvee_{a_j\in A} e_j$ true along the way (from $q'_1$ on). Thus, $Agt^e$ have no strategy from $q$ such that $\bigvee_{a_j\in A} e_j$ holds from the next state on, and $tr(\psi)$ is eventually false: $M^{ATL}, q \not\models_{ATL} \langle\langle Agt^e\rangle\rangle X\langle\langle Agt^e\rangle\rangle (\bigvee_{a_j\in A} e_j)\,U\,(\bigvee_{a_j\in A} e_j \wedge \neg tr(\psi))$, which proves the case.

case $\varphi \equiv C_A\psi$, ATL ⇒ ATEL$_{CE}$. We have $M^{ATL}, q \models \neg\langle\langle Agt^e\rangle\rangle X\langle\langle Agt^e\rangle\rangle (\bigvee_{a_j\in A} e_j)\,U\,(\bigvee_{a_j\in A} e_j \wedge \neg tr(\psi))$, so for every $\sigma_{Agt^e}$ there is $\sigma_{Agt'\setminus Agt^e} = \sigma_{Agt}$ such that $o'_q(\sigma_{Agt^e}, \sigma_{Agt}) = q' \in Q'$ and $M^{ATL}, q' \models \neg\langle\langle Agt^e\rangle\rangle (\bigvee_{a_j\in A} e_j)\,U\,(\bigvee_{a_j\in A} e_j \wedge \neg tr(\psi))$. In particular, this implies that the above holds for all epistemic states $q'$ that are successors of $q$ in $M^{ATL}$, also the ones that refer to agents from $A$ (∗). Suppose that $M, q \not\models C_A\psi$ (∗∗). Let us now take the action counterpart $q'_{act} \in Q$ of $q'$. By (∗), (∗∗) and properties of the construction, $q'_{act}$ occurs also in $M$, and there must be a path $q_0 = q, q_1 = q'_{act}, \dots, q_n$, $q_i \in Q$, such that $q_i \sim_{a_{j_i}} q_{i+1}$ and $M, q_n \not\models_{ATEL} \psi$. Then, $M^{ATL}, q_n \not\models_{ATL} tr(\psi)$ (by induction). This means also that we have a sequence $q'_0 = q, q'_1 = q', \dots, q'_n$ in $M^{ATL}$, in which every $q'_i \in Q^{e_{j_i}}$, $a_{j_i} \in A$, is an epistemic counterpart of $q_i$. Thus, for every $i = 1, \dots, n$: $M^{ATL}, q'_i \models e_{j_i}$, so $M^{ATL}, q'_i \models \bigvee_{a_j\in A} e_j$. Moreover, $M^{ATL}, q_n \not\models_{ATL} tr(\psi)$ implies that $M^{ATL}, q'_n \not\models_{ATL} tr(\psi)$ (by Lemma 22), so $M^{ATL}, q'_n \models \neg tr(\psi)$. Thus, $M^{ATL}, q' \models \langle\langle Agt^e\rangle\rangle (\bigvee_{a_j\in A} e_j)\,U\,(\bigvee_{a_j\in A} e_j \wedge \neg tr(\psi))$, which contradicts (∗).

As an immediate corollary of the last two lemmata we obtain:

THEOREM 24. For every ATEL$_{CE}$ formula $\varphi$ and model $M$, $\varphi$ is satisfiable (resp. valid) in $M$ iff $tr(\varphi)$ is satisfiable (resp. valid) in $M^{ATL}$.

Note that the construction used above to interpret ATEL$_{CE}$ in ATL has several nice complexity properties:
– The vocabulary (set of propositions $\Pi$) only increases linearly (and certainly remains finite).
– The set of states in an ATEL-model grows linearly, too: if model $M$ has $n$ states, then $M^{ATL}$ has $n' = (k + 1)n = O(kn)$ states.
– Let $m$ be the number of transitions in $M$. We have $(k + 1)m$ action transitions in $M^{ATL}$. Since the size of every set $img(q, \sim_a)$ can be at most $n$, there may be no more than $kn$ epistemic transitions per state in $M^{ATL}$, hence at most $(k + 1)n \cdot kn$ in total. Because $m \leq n^2$, we have $m' = O(k^2n^2)$.
– Only the length of formulas may suffer an exponential blow-up, because $tr(\langle\langle A\rangle\rangle G\varphi)$ involves two occurrences of $tr(\varphi)$, and the translation of $\langle\langle A\rangle\rangle\varphi U\psi$ involves two occurrences of both $tr(\varphi)$ and $tr(\psi)$.⁹ Thus, every nesting of $\langle\langle A\rangle\rangle G\varphi$ and $\langle\langle A\rangle\rangle\varphi U\psi$ roughly doubles the size of the translated formula in the technical sense.
However, the number of different subformulas in the formula only increases linearly. Note that the automata-based methods for model checking (Alur et al. 2002) or satisfiability checking (van Drimmelen 2003) for ATL are based on an automaton associated with the given formula, built from its ‘subformulas closure’ – and their complexity depends on the number of different subformulas in the formula rather than the number of symbols.

In fact, we can avoid the exponential growth of formulas in the context of satisfiability checking by introducing a new propositional variable $p$ and requiring that it is universally equivalent to $tr(\varphi)$, i.e., adding conjunct $\langle\langle\emptyset\rangle\rangle G(p \leftrightarrow tr(\varphi))$ to the whole translated formula. Then $\langle\langle A\rangle\rangle G\varphi$ can be simply translated as $p \wedge \langle\langle A \cup Agt^e\rangle\rangle X\langle\langle A \cup Agt^e\rangle\rangle G(act \wedge p)$. ‘Until’ formulas $\langle\langle A\rangle\rangle\varphi U\psi$ are treated analogously.

A similar method can be proposed for model checking. To translate $\langle\langle A\rangle\rangle G\varphi$, we first use the algorithm from Alur et al. (2002) and model-check $tr(\varphi)$ to find the states $q \in Q$ in which $tr(\varphi)$ holds. Then we update the model, adding a new proposition $p$ that holds exactly in these states, and we model-check $p \wedge \langle\langle A \cup Agt^e\rangle\rangle X\langle\langle A \cup Agt^e\rangle\rangle G(act \wedge p)$ as the translation of $\langle\langle A\rangle\rangle G\varphi$ in the new model. We tackle $tr(\langle\langle A\rangle\rangle\varphi U\psi)$ likewise.

Since the complexity of transforming $M$ to $M^{ATL}$ is no worse than $O(n^2)$, and the complexity of the ATL model checking algorithm from Alur et al. (2002) is $O(ml)$, the interpretation defined above can be used, for instance, for an efficient reduction of model checking of ATEL$_{CE}$ formulas to model checking in ATL.

4.4.3. Interpreting Models and Formulas of Full ATEL

Now, in order to interpret the full ATEL we modify the construction by introducing new epistemic agents (and states) indexed not only with individual agents, but with all possible non-empty coalitions:

$$Agt^e = \{e_A \mid A \subseteq Agt, A \neq \emptyset\}, \qquad Q' = Q \cup \bigcup_{A\subseteq Agt,\, A\neq\emptyset} Q^{e_A},$$

where $Q$ and all $Q^{e_A}$ are pairwise disjoint. Accordingly, we extend the language with new propositions $\{e_A \mid A \subseteq Agt\}$. The choices for complex epistemic agents refer to the (epistemic copies of) states accessible via distributed knowledge relations: $\Sigma'_{e_A} = \{pass\} \cup img(q, \sim_A^D)^{e_A}$. Then we modify the transition function (putting the strategies from epistemic agents in any predefined order):

$$o'_q(\sigma_{a_1}, \dots, \sigma_{a_k}, \dots, \sigma_{e_A}, \dots) = \begin{cases} o_q(\sigma_{a_1}, \dots, \sigma_{a_k}) & \text{if all } \sigma_{e_A} = pass \\ \sigma_{e_A} & \text{if } e_A \text{ is the first active epistemic agent.} \end{cases}$$
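The coalition-indexed epistemic agents and their choice sets can be enumerated with a few lines of Python. The sketch is illustrative only: the function names and the indistinguishability data for state q0 (modeled on the client/server example) are our own assumptions.

```python
from itertools import combinations

def nonempty_coalitions(agents):
    """All non-empty subsets of the agents, smallest first."""
    return [frozenset(c)
            for size in range(1, len(agents) + 1)
            for c in combinations(agents, size)]

def distributed_image(q, indist, coalition):
    """States that no member of the coalition can tell apart from q,
    i.e. the image of q under the intersection of the members' relations."""
    return set.intersection(*(set(indist[a][q]) for a in coalition))

agents = ["s", "c"]
indist = {"s": {"q0": {"q0", "q2"}}, "c": {"q0": {"q0", "q1"}}}
epistemic_agents = nonempty_coalitions(agents)
```

With $k$ real agents there are $2^k - 1$ such coalitions, so the list grows exponentially in $k$; for the pair {s, c} the distributed image of q0 shrinks to q0 itself.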


Again, the game frames for all epistemic copies of the action states are the same. The translation for all operators remains the same as well (just using $e_{\{i\}}$ instead of $e_i$) and the translation of $D_A$ is:

$tr(D_A\varphi) = \neg\langle\langle Agt^e\rangle\rangle X(e_A \wedge \neg tr(\varphi))$.

The following result can now be proved similarly to Theorem 24.

THEOREM 25. For every ATEL formula $\varphi$ and model $M$, $\varphi$ is satisfiable (resp. valid) in $M$ iff $tr(\varphi)$ is satisfiable (resp. valid) in $M^{ATL}$.

This interpretation requires (in general) an exponential blow-up of the original ATEL model (in the number of agents $k$). We suspect that this may be inevitable – if so, this tells something about the inherent complexity of the epistemic operators. For a specific ATEL formula $\varphi$, however, we do not have to include all the epistemic agents $e_A$ in the model – only those for which $D_A$ occurs in $\varphi$. Also, we need epistemic states only for these coalitions. Note that the number of such coalitions is never greater than the length of $\varphi$. Let $l$ be the length of formula $\varphi$, and let $\bar{m}$ be the cardinality of the ‘densest’ modal accessibility relation – either strategic or epistemic – in $M$. In other words, $\bar{m} = \max(m, m_{\sim_1}, \dots, m_{\sim_k})$, where $m$ is the number of transitions in $M$, and $m_{\sim_1}, \dots, m_{\sim_k}$ are cardinalities of the respective epistemic relations. Then, the ‘optimized’ transformation gives us a model with $m' = O(l \cdot \bar{m})$ transitions, while the new formula $tr(\varphi)$ is again only linearly longer than $\varphi$ (in the sense explained in Section 4.4.2). In consequence, we can still use the ATL model checking algorithm for model checking of ATEL formulas that is linear in the size of the original structure: the complexity of such a process is $O(\bar{m}l^2)$.

4.4.4. Related Work

The interpretation presented in this section has been inspired by Schild (2000) in which a propositional variant of the BDI logic (Rao and Georgeff 1991) was proved to be subsumed by propositional µ-calculus.
We use a similar method here to show a translation from ATEL models and formulas to models and formulas of ATL that preserves satisfiability. ATL (just like µ-calculus) is a multimodal logic, where modalities are indexed by agents (programs in the case of µ-calculus). It is therefore possible to ‘simulate’ the epistemic layer of ATEL by adding new agents (and hence new cooperation modalities) to the scope. Thus, the general idea of the interpretation is to translate modalities of one kind to additional modalities of another kind.

Similar translations are well known within the modal logics community, including translation of epistemic logic into Propositional Dynamic Logic, translation of dynamic epistemic logic without common knowledge into epistemic logic (Gerbrandy 1999) etc. A work particularly close to ours is included in van Otterloo et al. (2003). In that paper, a reduction of ATEL model checking to model checking of ATL formulas is presented, and the epistemic accessibility relations are handled in a similar way to our approach, i.e., with use of additional ‘epistemic’ agents. We believe, however, that our translation is more general, and provides a more flexible framework in many respects:
1. The algorithm from van Otterloo et al. (2003) is intended only for turn-based acyclic transition systems, which is an essential limitation of its applicability. Moreover, the set of states is assumed to be finite (hence only finite trees are considered). There is no restriction like this in our method.
2. The language of ATL/ATEL is distinctly reduced in van Otterloo et al. (2003): it includes only ‘sometime’ ($F$) and ‘always’ ($G$) operators in the temporal part (neither ‘next’ nor ‘until’ are treated), and the individual knowledge operator $K_a$ (the group knowledge operators $C$, $E$, $D$ are absent).
3. The translation of a model in van Otterloo et al. (2003) depends heavily on the formula one wants to model-check, while in the algorithm presented here, formulas and models are translated independently (except for the sole case of efficient translation of distributed knowledge).

5. CONCLUDING REMARKS

We have presented a comparative study of several logics that combine elements of game theory, temporal logics and epistemic logics, and demonstrated their relationship. Still, these enterprises differ in their motivations and agendas. We wanted to show them as parts of a bigger picture, so that one can compare them, appreciate their similarities and differences, and choose the system most suitable for the intended applications.

Notably, the systems studied here can benefit from many ideas and results, both technical and conceptual, borrowed from each other. Indeed, ATL has already benefited from being related to coalitional games, as concurrent game structures provide a more general (and natural) semantics than alternating transition systems. Moreover, coalition effectivity models are mathematically simpler and more elegant, and provide a technically handier semantics, essentially based on neighborhood semantics for non-normal modal logics (Parikh 1985; Pauly 2000a). Furthermore, the purely game-theoretical perspective of coalition logics can offer new ideas to the framework of open multi-agent systems and computations formalized by ATL. For instance, fundamental concepts in game theory, such as preference relations between outcomes and Nash equilibria, have their counterparts in concurrent game structures (and, more importantly, in the alternating-time logics) which are as yet unexplored.

On the other hand, the language and framework of ATL have widened the perspective on coalitional games and logics, providing a richer and more flexible vocabulary to talk about abilities of agents and their coalitions. The alternating refinement relations (Alur et al. 1998b) offer an appropriate notion of bisimulation between ATSs and thus can suggest an answer to the question ‘When are two coalition games equivalent?’.¹⁰ Also, a number of technical results on expressiveness and complexity, as well as realizability and model-checking methods from Alur et al. (1998a, 2002), can be transferred to coalition games and logics. And there are some specific aspects of computations in open systems, such as controllability and fairness constraints, which have not been explored in the light of coalition games.

There have been a few attempts to generalize ATL by including imperfect information in its framework: ATL with incomplete information in Alur et al. (2002), ATEL, ATOL, ATELRs etc. It would be interesting to see how these attempts carry over to the framework of CL. Also, stronger languages like ATL∗ and alternating-time µ-calculus can provide more expressive tools for reasoning about coalition games.

In conclusion, we see the main contribution of the present study as building a bridge between several logical frameworks for multi-agent systems, and we hope to trigger a synergetic effect from their mutual influence.

ACKNOWLEDGEMENTS

Valentin Goranko acknowledges the financial support during this research provided by the National Research Foundation of South Africa and the SASOL Research Fund of the Faculty of Science at Rand Afrikaans University. He would like to thank Johan van Benthem for sparking his interest in logical aspects of games, Marc Pauly for stimulating discussions on coalition games and valuable remarks on this paper, Moshe Vardi for useful suggestions and references, and Philippe Balbiani for careful reading and corrections on an earlier version of the text. Wojciech Jamroga would like to thank Mike Wooldridge and Wiebe van der Hoek who inspired him to have a closer look at logics for multi-agent systems; and Barteld Kooi as well as Wiebe van der Hoek (again) for their remarks and suggestions.


NOTES

1. We make small notational changes here and there to make the differences and common features between the models and languages clearer and easier to see.
2. Determinism is not a crucial issue here, as it can be easily imposed by introducing a new, fictitious agent, ‘Nature’, which settles all non-deterministic transitions.
3. The only real difference is that the set of states Q and the sets representing agents’ choices are explicitly required to be finite in the concurrent game structures, while MGMs and ATSs are not constrained this way. However, these requirements are not essential and can be easily omitted if necessary.
4. Note that the coalition logic-related notions of choice and collective choice can be readily expressed in terms of alternating transition systems, which immediately leads to a semantics for CL based on ATS, too. Thus, ATL and the coalition logics share the semantics based on alternating transition systems as well.
5. In Pauly (2002) these game frames are called dictatorial, but we disagree with that term. Indeed, at every local step in such a game one player determines the move, but these players can be different for the different moves.
6. The definition in Alur et al. (1998a) requires the whole state space Q to be a Cartesian product of the ‘local’ state spaces; Lomuscio (1999) calls such structures ‘hypercube systems’. We find that requirement unnecessarily strong.
7. Private communication.
8. This problem is closely related to the distinction between knowledge de re and knowledge de dicto. The issue is well known in the philosophy of language (Quine 1956), as well as in research on the interaction between knowledge and action (Moore 1985; Morgenstern 1991; Wooldridge 2000).
9. We thank an anonymous referee for pointing this out.
10. Cf. the paper ‘When are two games the same’ in van Benthem (2000).

REFERENCES

Abdou, J.: 1998, ‘Rectangularity and Tightness: A Normal Form Characterization of Perfect Information Extensive Game Forms’, Mathematics of Operations Research 23(3).
Alur, R., T. A. Henzinger, and O. Kupferman: 1997, ‘Alternating-Time Temporal Logic’, in Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS), IEEE Computer Society Press, pp. 100–109. Available at http://www.citeseer.nj.nec.com/170985.html.
Alur, R., T. A. Henzinger, and O. Kupferman: 1998a, ‘Alternating-Time Temporal Logic’, Lecture Notes in Computer Science 1536, 23–60. Available at http://www.citeseer.nj.nec.com/174802.html.
Alur, R., T. A. Henzinger, and O. Kupferman: 2002, ‘Alternating-Time Temporal Logic’, Journal of the ACM 49, 672–713. Updated, improved, and extended text. Available at http://www.cis.upenn.edu/~alur/Jacm02.pdf.
Alur, R., T. A. Henzinger, O. Kupferman, and M. Vardi: 1998b, ‘Alternating Refinement Relations’, in CONCUR’98.
Aumann, R. and S. Hart (eds.): 1992, Handbook of Game Theory with Economic Applications, Vol. 1, Elsevier/North-Holland.


Chandra, A., D. Kozen, and L. Stockmeyer: 1981, ‘Alternation’, Journal of the ACM 28(1), 114–133.
Emerson, E. A.: 1990, ‘Temporal and Modal Logic’, in J. van Leeuwen (ed.), Handbook of Theoretical Computer Science, Vol. B, Elsevier, pp. 995–1072.
Fagin, R., J. Y. Halpern, Y. Moses, and M. Y. Vardi: 1995, Reasoning about Knowledge, MIT Press, Cambridge, MA.
Gerbrandy, J.: 1999, ‘Bisimulations on Planet Kripke’, Ph.D. thesis, University of Amsterdam.
Goranko, V.: 2001, ‘Coalition Games and Alternating Temporal Logics’, in J. van Benthem (ed.), Proceedings of the 8th Conference on Theoretical Aspects of Rationality and Knowledge (TARK VIII), Morgan Kaufmann, pp. 259–272. Corrected version available at http://general.rau.ac.za/maths/goranko/papers/gltl31.pdf.
Goranko, V. and G. van Drimmelen: 2003, ‘Complete Axiomatization and Decidability of the Alternating-Time Temporal Logic’, submitted.
Hart, S.: 1992, ‘Games in Extensive and Strategic Forms’, in R. Aumann and S. Hart (eds.), Handbook of Game Theory with Economic Applications, Vol. 1, Elsevier/North-Holland, pp. 19–40.
Huth, M. and M. Ryan: 2000, Logic in Computer Science: Modelling and Reasoning about Systems, Cambridge University Press.
Jamroga, W.: 2003, ‘Some Remarks on Alternating Temporal Epistemic Logic’, in B. Dunin-Keplicz and R. Verbrugge (eds.), Proceedings of Formal Approaches to Multi-Agent Systems (FAMAS 2003), pp. 133–140.
Jamroga, W. and W. van der Hoek: 2003, ‘Agents that Know how to Play’, submitted.
Lomuscio, A.: 1999, ‘Information Sharing among Ideal Agents’, Ph.D. thesis, University of Birmingham.
Moore, R.: 1985, ‘A Formal Theory of Knowledge and Action’, in J. Hobbs and R. Moore (eds.), Formal Theories of the Commonsense World, Ablex Publishing Corporation.
Morgenstern, L.: 1991, ‘Knowledge and the Frame Problem’, International Journal of Expert Systems 3(4).
Osborne, M. and A. Rubinstein: 1994, A Course in Game Theory, MIT Press, Cambridge, MA.
Parikh, R.: 1985, ‘The Logic of Games and its Applications’, Ann. of Discrete Mathematics 24, 111–140.
Pauly, M.: 2000a, ‘Game Logic for Game Theorists’, Technical Report INS-R0017, CWI.
Pauly, M.: 2000b, ‘A Logical Framework for Coalitional Effectivity in Dynamic Procedures’, in Proceedings of the Conference on Logic and the Foundations of Game and Decision Theory (LOFT4). To appear in Bulletin of Economics Research.
Pauly, M.: 2001, ‘Logic for Social Software’, Ph.D. thesis, University of Amsterdam.
Pauly, M.: 2002, ‘A Modal Logic for Coalitional Power in Games’, Journal of Logic and Computation 12(1), 149–166.
Quine, W.: 1956, ‘Quantifiers and Propositional Attitudes’, Journal of Philosophy 53, 177–187.
Rao, A. and M. Georgeff: 1991, ‘Modeling Rational Agents within a BDI-Architecture’, Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning, pp. 473–484.
Schild, K.: 2000, ‘On the Relationship between BDI Logics and Standard Logics of Concurrency’, Autonomous Agents and Multi Agent Systems, pp. 259–283.
Schnoebelen, P.: 2003, ‘The Complexity of Temporal Model Checking’, Advances in Modal Logics, Proceedings of AiML 2002, World Scientific.


van Benthem, J.: 2000, ‘Logic and Games’, Technical Report X-2000-03, ILLC.
van der Hoek, W. and R. Verbrugge: 2002, ‘Epistemic Logic: A Survey’, Game Theory and Applications 8, 53–94.
van der Hoek, W. and M. Wooldridge: 2002, ‘Tractable Multiagent Planning for Epistemic Goals’, in C. Castelfranchi and W. Johnson (eds.), Proceedings of the First International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS-02), ACM Press, New York, pp. 1167–1174.
van der Hoek, W. and M. Wooldridge: 2003a, ‘Cooperation, Knowledge and Time – Alternating-time Temporal Epistemic Logic and its Applications’, Studia Logica 75(1), 125–157.
van der Hoek, W. and M. Wooldridge: 2003b, ‘Towards a Logic of Rational Agency’, Logic Journal of the IGPL 11(2), 135–160.
van Drimmelen, G.: 2003, ‘Satisfiability in Alternating-Time Temporal Logic’, in Proceedings of LICS’2003, IEEE Computer Society Press, pp. 208–217.
van Otterloo, S., W. van der Hoek, and M. Wooldridge: 2003, ‘Knowledge as Strategic Ability’, Electronic Lecture Notes in Theoretical Computer Science 85(2).
von Neumann, J. and O. Morgenstern: 1944, Theory of Games and Economic Behaviour, Princeton University Press, Princeton, NJ.
Wooldridge, M.: 2000, Reasoning about Rational Agents, MIT Press, Cambridge, MA.

Valentin Goranko
Department of Mathematics
Rand Afrikaans University, South Africa
E-mail: [email protected]

Wojciech Jamroga
Parlevink Group, University of Twente, the Netherlands
Institute of Mathematics, University of Gdańsk, Poland
E-mail: [email protected]


GIACOMO BONANNO

A CHARACTERIZATION OF VON NEUMANN GAMES IN TERMS OF MEMORY

ABSTRACT. An information completion of an extensive game is obtained by extending the information partition of every player from the set of her decision nodes to the set of all nodes. The extended partition satisﬁes Memory of Past Knowledge (MPK) if at any node a player remembers what she knew at earlier nodes. It is shown that MPK can be satisﬁed in a game if and only if the game is von Neumann (vN) and satisﬁes memory at decision nodes (the restriction of MPK to a player’s own decision nodes). A game is vN if any two decision nodes that belong to the same information set of a player have the same number of predecessors. By providing an axiom for MPK we also obtain a syntactic characterization of the said class of vN games.

1. INTRODUCTION

The standard definition of extensive game (Selten 1975) specifies a player’s information only when it is her turn to move (that is, only at her decision nodes), thus providing only a partial description of what the player learns during any play of the game. For both conceptual and practical reasons (Battigalli and Bonanno 1999; van Benthem 2001), it may be desirable to express what a player knows also at nodes where she does not have to move, that is, at nodes that belong to another player. For example, one might want to model what information is (or can be) given to player i after some other player has made a move, even if it is not player i’s turn to move. In order to be able to do so one needs to add, for every player, a partition of the set of all nodes, which – when restricted to that player’s decision nodes – coincides with her initial information partition (thus preserving the original information sets).¹

In this paper we study one aspect of memory within the context of such extended partitions. In the philosophy literature the concept of memory has been identified with the retention of past knowledge (Malcolm 1963; Munsat 1966). In accordance with this, we define Memory of Past Knowledge (MPK) as the property that at any node the player remembers what she knew at earlier nodes. This is a natural property to consider and, indeed, the restriction of it to a player’s own decision nodes is implied by the notion of perfect recall, which is routinely assumed in game theory.

We show that MPK can be satisfied only within the class of games that Kuhn (1953) calls von Neumann (vN) games. An extensive game is vN if any two decision nodes of player i that belong to the same information set of player i have the same number of predecessors. We prove that a game satisfies MPK if and only if it is a vN game and, for each player, the restriction of MPK to that player’s decision nodes is satisfied. We call the latter property “Memory at Decision Nodes” (MDN). We also show that an implication of MPK is that, at every stage of the game, it is common knowledge among all the players that the play of the game has reached that stage (if node x has k predecessors, that is, if the path from the root to x has length k, then we say that x belongs to stage k). One can think of the stage of the game as the number of units of time that have elapsed since the beginning of the game. Thus MPK implies that the time is always common knowledge among the players. In this respect vN games that satisfy MPK are closely related to the synchronous systems studied in the computer science literature, where the agents have access to an external clock (Halpern and Vardi 1986).

In Section 3 we show that the proposed notion of memory on extended partitions does indeed capture the interpretation of memory as retention of past knowledge: we show that it is characterized by either of the following axioms:

1. If in the past the player knew φ then she knows now that in the past she knew φ,
2. If the player knows φ now, then at every future time she will know that in the past she knew φ.

Thus either axiom provides a syntactic characterization of the class of von Neumann games that satisfy Memory at Decision Nodes.

Synthese 139: 281–295, 2004. Knowledge, Rationality & Action 117–131, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

2. EXTENDED PARTITIONS AND MEMORY

We use the tree-based definition of extensive game, which is due to Kuhn (1953). Since our analysis deals with the structure of moves and information, and is independent of payoffs, we shall focus on extensive forms and follow closely the definition given by Selten (1975).

The first component of an extensive form is a finite or infinite rooted tree ⟨T, →, t0⟩ where t0 denotes the root and, for any two nodes t, x ∈ T, t → x denotes that t is the immediate predecessor of x (or x is an immediate successor of t). For every node t it is assumed that the number of immediate successors of t is finite (possibly zero). We denote by ≺ the transitive closure of →. Thus t ≺ x denotes that t is a predecessor of x or x is a successor of t (that is, there is a path from t to x) and we use t ≼ x to mean that either t = x or t ≺ x. For example, in the extensive form of Figure 1 we have that t → x and t ≺ z3. Let Z be the set of terminal nodes, that is, nodes that have no successors, and X = T\Z the set of decision nodes. For example, in Figure 1, Z = {z1, z2, ..., z7} and X = {t0, t, t′, y, x, x′}.

The second component of an extensive form is a set of players N = {1, 2, ..., n} and a partition {Xi}i∈N of the set of decision nodes X. For every player i ∈ N, Xi is the set of decision nodes of player i. In the extensive form of Figure 1, N = {1, 2}, the set of player 1’s decision nodes is X1 = {t0, y} and the set of player 2’s decision nodes is X2 = {t, t′, x, x′}.

The third component is, for every player i ∈ N, an equivalence relation ∼i ⊆ Xi × Xi (that is, a binary relation that is reflexive, symmetric and transitive) satisfying the following constraint: if t, t′ ∈ Xi and t ∼i t′ then the number of immediate successors of t is equal to the number of immediate successors of t′. The interpretation of t ∼i t′ is that player i cannot distinguish between t and t′, that is, as far as she knows, she could be making a decision either at node t or at node t′. The equivalence classes of ∼i partition Xi and are called the information sets of player i. We denote by Hi the set of information sets of player i. In the extensive form of Figure 1, ∼1 = {(t0, t0), (y, y)} and ∼2 = {(t, t), (t, t′), (t′, t), (t′, t′), (x, x), (x, x′), (x′, x), (x′, x′)}. Thus, for example, player 2’s information sets are {t, t′} and {x, x′}, that is, H2 = {{t, t′}, {x, x′}}. We use the graphic convention of representing an information set as a rounded rectangle enclosing the corresponding nodes, if there are at least two nodes, while if an information set is a singleton we do not draw anything around it.
Furthermore, since all the nodes in an information set belong to the same player, we write the corresponding player only once inside the rectangle.

The fourth, and last, component of an extensive form is, for every player i ∈ N, a choice partition, which, for each of her information sets, partitions the edges out of nodes in that information set (that is, the set of ordered pairs (t, x) such that t → x) into player i’s choices at that information set. If (t, x) belongs to choice c we write t →c x. The choice partition satisfies the following constraints: (1) if t →c x and t →c x′ then x = x′, and (2) if t →c x and t ∼i t′ then there exists an x′ such that t′ →c x′. The first condition says that a choice at a node selects a unique immediate successor, while the second condition says that if a choice is available at one node of an information set then it is available at every node in that information set. For example, in Figure 1, x →g z2 and x′ →g z4, so that player 2’s choice g is {(x, z2), (x′, z4)}. Graphically we represent choices by labeling the corresponding edges in such a way that two edges belong to the same choice if and only if they are assigned the same label.

Figure 1.

The main focus in game theory has been on games with perfect recall.² An extensive form is said to have perfect recall if “for every player i and for any two information sets g and h of player i, if one vertex y ∈ h comes after a choice c at g then every vertex x ∈ h comes after this choice c” (Selten 1975; Kuhn 1997, 319). For example, the extensive form of Figure 1 satisfies perfect recall (both x and x′ come after the same choice at the earlier information set {t, t′} of player 2, namely choice d). It is shown in Bonanno (2003) that perfect recall is equivalent to the conjunction of two independent properties, one expressing memory of past actions and the other memory of past knowledge. In this paper we focus on the latter.

We call “Memory at Decision Nodes” (MDN) the following property (which is a weakening of perfect recall): if one node in information set h of player i has a predecessor that belongs to information set g of the same player i, then every node in h has a predecessor in g.³ Formally (recall that Hi denotes the set of information sets of player i):

(MDN) if x ≺ y, x ∈ g ∈ Hi, y ∈ h ∈ Hi, and y′ ∈ h, then there exists an x′ ∈ g such that x′ ≺ y′.

This means that, when it is her turn to move, a player always remembers what she knew at earlier decision nodes of hers. Note that this property is considerably weaker than perfect recall, since it is independent of choices. For example, if the extensive form of Figure 1 is modified in such a way that (t, x) and (t′, y) belong to different choices of player 2,⁴ then it will still satisfy MDN but it will violate perfect recall. In this paper we shall not assume perfect recall, although we will restrict attention to extensive forms that satisfy the weaker property MDN. Our purpose is to study an extension of this property from the set of decision nodes of player i to the set of all the nodes. This requires extending the notion of information set.

DEFINITION 1. An information completion of an extensive form is an n-tuple ⟨K1, ..., Kn⟩ where, for each player i = 1, ..., n, Ki is a partition of the set of nodes T that agrees on player i’s information sets, in the sense that if node t belongs to information set h of player i then the cell of Ki that contains t – denoted by Ki(t) – coincides with h. Formally: if t ∈ h ∈ Hi then Ki(t) = h.

We call Memory of Past Knowledge (MPK) the extension of MDN to the extended partition Ki: ∀x, y, y′ ∈ T, ∀i ∈ N,

(MPK) if x ≺ y and y′ ∈ Ki(y) then there exists an x′ ∈ Ki(x) such that x′ ≺ y′.

In Section 3 we show that MPK does indeed correspond to the syntactic notion of remembering what one knew in the past. In this section we prove that MPK can only be satisfied in von Neumann extensive forms. For every node t ∈ T, we denote by ℓ(t) the number of predecessors of t (i.e., the length of the path from the root to t). The following definition is taken from Kuhn (1953, 1997, 52).

DEFINITION 2.
An extensive form is von Neumann if, whenever t and x are decision nodes of player i that belong to the same information set of player i, the number of predecessors of t is equal to the number of predecessors of x. Formally: ∀i ∈ N, ∀t, x ∈ T, if t, x ∈ h ∈ Hi then ℓ(x) = ℓ(t).
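The von Neumann condition can be checked mechanically by counting predecessors. The following Python sketch is illustrative only: the tree is a hypothetical reduced tree that reproduces the facts stated in the text about Figure 1 (terminal nodes are omitted), and the names `PARENT`, `INFO_SETS`, `num_predecessors` and `is_von_neumann` are assumptions of this sketch, not notation from the paper.

```python
# Hypothetical tree, stored as child -> immediate predecessor.
# It reproduces only the relations the text states for Figure 1
# (t -> x and t' -> y -> x'); terminal nodes are omitted.
PARENT = {"t": "t0", "t'": "t0", "x": "t", "y": "t'", "x'": "y"}

# Information sets per player, as listed in the text for Figure 1.
INFO_SETS = {
    1: [{"t0"}, {"y"}],
    2: [{"t", "t'"}, {"x", "x'"}],
}


def num_predecessors(node, parent):
    """Return the number of predecessors of node (its distance from the root)."""
    count = 0
    while node in parent:
        node = parent[node]
        count += 1
    return count


def is_von_neumann(parent, info_sets_by_player):
    """Check that all nodes in each information set have equally many predecessors."""
    for info_sets in info_sets_by_player.values():
        for h in info_sets:
            if len({num_predecessors(t, parent) for t in h}) > 1:
                return False
    return True


print(num_predecessors("x", PARENT), num_predecessors("x'", PARENT))  # 2 3
print(is_von_neumann(PARENT, INFO_SETS))  # False
```

Under these assumptions the check fails because the two nodes of player 2's second information set lie at different distances from the root.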


Figure 2.

The extensive form shown in Figure 1 is not von Neumann (since x and x′ belong to the same information set of player 2 and ℓ(x) = 2 while ℓ(x′) = 3), while the one shown in Figure 2 is von Neumann. The proof of the following proposition requires several steps and is relegated to Appendix A. For every integer k ≥ 0 we denote by T^k the set of k-stage nodes: T^k = {t ∈ T : ℓ(t) = k}.⁵

PROPOSITION 3. Let G be an arbitrary extensive form and ⟨K1, ..., Kn⟩ an information completion of it that satisfies MPK. Then
(1) G is von Neumann, and
(2) for every t ∈ T, i ∈ N and k ≥ 0, if t ∈ T^k then Ki(t) ⊆ T^k.

Part (2) of Proposition 3 implies that at every node t it is common knowledge among all the players that the play of the game has reached the stage k = ℓ(t). In fact, since Ki(t) ⊆ T^k for all i, the cell of the common knowledge partition containing t is also a subset of T^k. Thus at every node the number of moves made up to that point is common knowledge among all the players (although some players may be uncertain as to what moves have been made). The following result, due to Battigalli and Bonanno (1999), gives the converse to Proposition 3.
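The step from individual cells to the common knowledge partition can be made concrete. The sketch below assumes the usual reading of the common knowledge partition as the finest partition that is coarser than every player's partition, and computes it by merging nodes that share a cell for some player. The two-player example (root r with successors a, b, c) and all identifiers are hypothetical.

```python
def common_knowledge_cells(nodes, partitions):
    """Merge nodes that share a cell in any player's partition."""
    root = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the representative of n's merged class.
        while root[n] != n:
            n = root[n]
        return n

    for partition in partitions:
        for cell in partition:
            members = sorted(cell)
            for other in members[1:]:
                root[find(other)] = find(members[0])

    cells = {}
    for n in nodes:
        cells.setdefault(find(n), set()).add(n)
    return sorted(sorted(c) for c in cells.values())


# Hypothetical example: stage (number of predecessors) of each node.
STAGE = {"r": 0, "a": 1, "b": 1, "c": 1}
K1 = [{"r"}, {"a", "b"}, {"c"}]   # player 1's extended partition
K2 = [{"r"}, {"a"}, {"b", "c"}]   # player 2's extended partition

cells = common_knowledge_cells(list(STAGE), [K1, K2])
print(cells)  # [['a', 'b', 'c'], ['r']]
# Every common knowledge cell lies within a single stage.
print(all(len({STAGE[n] for n in cell}) == 1 for cell in cells))  # True
```

Because each player's cells stay within one stage, merging them can never join nodes from different stages, which is the observation made in the text.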


PROPOSITION 4. Let G be a von Neumann extensive form that satisfies property MDN. Then there exists an information completion ⟨K1, ..., Kn⟩ of it that satisfies MPK.

Typically, there will be several information completions that satisfy MPK. The finest of all such completions (capturing the maximum amount of information that can be conveyed to the players, without violating memory) is obtained as follows. First some notation. For every node t and for every player i, let Hi(t) ⊆ Hi be the set of information sets of player i that are crossed by paths starting at t: Hi(t) = {h ∈ Hi : t ≼ y for some y ∈ h}. For example, in the extensive form of Figure 2, H3(t1) = {{t4, t5}}, H3(t2) = {{t4, t5}, {t6, t7}}, H3(t3) = {{t6, t7}, {t8}}, H3(t4) = H3(t5) = {{t4, t5}}, etc. Next we introduce, for every player i, a binary relation on T, denoted by ≈i. Let v, w ∈ T; then v ≈i w if and only if either (1) v = w, or (2) ℓ(v) = ℓ(w) and Hi(v) ∩ Hi(w) ≠ ∅. For example, in the extensive form of Figure 2, t1 ≈3 t2 and t2 ≈3 t3 but not t1 ≈3 t3. The relation ≈i is clearly reflexive and symmetric, but, in general, it is not transitive (as in the case of Figure 2). Let ≈∗i denote the transitive closure of ≈i. Thus v ≈∗i w if and only if there exists a finite sequence of nodes y1, y2, ..., ym such that y1 = v, ym = w and, for all k = 1, ..., m − 1, yk ≈i yk+1. Then ≈∗i is an equivalence relation on T. Let Ki(t) be the equivalence class of t generated by ≈∗i and Ki the set of equivalence classes, that is, Ki(t) = {v ∈ T : t ≈∗i v} and Ki = {S ⊆ T : S = Ki(t) for some t ∈ T}. For example, in the extensive form of Figure 2, K3(t0) = {t0}, K3(t1) = K3(t2) = K3(t3) = {t1, t2, t3}, K3(t4) = K3(t5) = {t4, t5}, etc. It is shown in Battigalli and Bonanno (1999) that the information completion defined above is the finest completion that satisfies MPK.
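The construction just described (collect the information sets crossed from each node, relate nodes of equal stage whose collections overlap, then take the transitive closure) can be sketched in Python as follows. The tree below is hypothetical: it was chosen only so that the values the text quotes for player 3 come out the same, and it is not claimed to be Figure 2. Successors of t4, ..., t8 are omitted, and the function and variable names are assumptions of this sketch.

```python
def finest_completion(parent, info_sets):
    """Map each node to its cell under the transitive closure of the relation above."""
    nodes = sorted(set(parent) | set(parent.values()))

    def depth(node):
        count = 0
        while node in parent:
            node = parent[node]
            count += 1
        return count

    # crossed[t]: indices of information sets reachable from t (including t itself).
    crossed = {n: set() for n in nodes}
    for index, h in enumerate(info_sets):
        for y in h:
            node = y
            crossed[node].add(index)
            while node in parent:
                node = parent[node]
                crossed[node].add(index)

    root = {n: n for n in nodes}

    def find(n):
        while root[n] != n:
            n = root[n]
        return n

    # Merge related nodes; repeated merging yields the transitive closure.
    for v in nodes:
        for w in nodes:
            if depth(v) == depth(w) and crossed[v] & crossed[w]:
                root[find(w)] = find(v)

    cells = {}
    for n in nodes:
        cells.setdefault(find(n), set()).add(n)
    return {n: frozenset(cells[find(n)]) for n in nodes}


# Hypothetical tree (child -> immediate predecessor) and player 3's information sets.
PARENT = {"t1": "t0", "t2": "t0", "t3": "t0",
          "t4": "t1", "t5": "t2", "t6": "t2", "t7": "t3", "t8": "t3"}
H3 = [{"t4", "t5"}, {"t6", "t7"}, {"t8"}]

K3 = finest_completion(PARENT, H3)
print(sorted(K3["t0"]))  # ['t0']
print(sorted(K3["t1"]))  # ['t1', 't2', 't3']
print(sorted(K3["t4"]))  # ['t4', 't5']
```

In this example t1 and t3 end up in the same cell only through t2, which is why the transitive closure is needed.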
By Propositions 3 and 4, the class of vN games that satisfy property MDN is precisely the class of games where there exists an information completion that satisfies MPK.

By Proposition 3 an extensive form which is not von Neumann cannot have an information completion that satisfies MPK, even if it satisfies MDN. We illustrate this by means of the extensive form of Figure 1, which satisfies property MDN. Consider an information completion K2 for player 2. Since information completions preserve information sets, it must be that K2(t) = K2(t′) = {t, t′} and K2(x) = K2(x′) = {x, x′}. By MPK, since y ≺ x′ and x ∈ K2(x′) there must be a node v ∈ K2(y) such that v ≺ x. The only predecessors of x are t and t0. We cannot have t ∈ K2(y), since that would imply (by definition of partition) that y ∈ K2(t), contradicting the fact that K2(t) = {t, t′}. On the other hand, if t0 ∈ K2(y) then, since t′ ≺ y and t0 ∈ K2(y), MPK would require the existence of a v ∈ K2(t′) such that v ≺ t0. But K2(t′) = {t, t′}, and neither t nor t′ is a predecessor of the root t0.
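For small trees, the non-existence claim can be confirmed by exhaustive search. The sketch below enumerates every partition of the nodes that keeps a player's information sets as cells and tests MPK directly. Both trees are hypothetical: `NON_VN` is a reduced tree that reproduces the relations the text states for Figure 1 (terminal nodes omitted), and `VN` is a small von Neumann tree added for comparison. All identifiers are assumptions of this sketch.

```python
def partitions(items):
    """Yield every partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part
        for i, block in enumerate(part):
            yield part[:i] + [[first] + block] + part[i + 1:]


def predecessors(node, parent):
    """Return all (strict) predecessors of node."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def satisfies_mpk(parent, cell_of):
    """MPK: if x precedes y and y2 is in K(y), some x2 in K(x) precedes y2."""
    for y in cell_of:
        for x in predecessors(y, parent):
            for y2 in cell_of[y]:
                if not any(x2 in cell_of[x] for x2 in predecessors(y2, parent)):
                    return False
    return True


def mpk_completions(parent, info_sets):
    """All partitions that keep info_sets as cells and satisfy MPK."""
    nodes = sorted(set(parent) | set(parent.values()))
    fixed = [frozenset(h) for h in info_sets]
    free = [n for n in nodes if not any(n in h for h in fixed)]
    found = []
    for part in partitions(free):
        cells = fixed + [frozenset(block) for block in part]
        cell_of = {n: cell for cell in cells for n in cell}
        if satisfies_mpk(parent, cell_of):
            found.append(cells)
    return found


# Hypothetical reduced tree (child -> immediate predecessor), player 2's view.
NON_VN = {"t": "t0", "t'": "t0", "x": "t", "y": "t'", "x'": "y"}
print(len(mpk_completions(NON_VN, [{"t", "t'"}, {"x", "x'"}])))  # 0

# Hypothetical von Neumann tree with one information set {a, b}.
VN = {"a": "r", "b": "r", "c": "a", "d": "b"}
print(len(mpk_completions(VN, [{"a", "b"}])))  # 2
```

The search is exponential in the number of nodes, so it is meant only as a check on toy examples, not as a general method.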

3. SYNTACTIC CHARACTERIZATION OF MEMORY

In this section we provide a syntactic characterization of MPK. We interpret the precedence relation ≺ as a temporal relation and associate with it the standard past and future operators from basic temporal logic (Prior 1956; Burgess 1984; Goldblatt 1992). To the extended partition Ki of player i we associate a knowledge operator for player i. Given an extensive form and an information completion of it, by frame we mean the collection ⟨T, ≺, {Ki}i∈N⟩ where T is the set of nodes, ≺ the precedence relation on T and Ki is player i’s extended partition of T. We consider a propositional language with the following modal operators: the temporal operators G and H and, for every player i, the knowledge operator Ki. The intended interpretation is as follows:

Gφ: “it is Going to be the case at every future time that φ”
Hφ: “it Has always been the case that φ”
Kiφ: “player i Knows that φ”.

The formal language is built in the usual way from a countable set S of atomic propositions, the connectives ¬ (for “not”) and ∨ (for “or”) and the modal operators.⁶ Let Pφ =def ¬H¬φ. Thus the interpretation is:

Pφ: “at some Past time it was the case that φ”.

Given a frame ⟨T, ≺, {Ki}i∈N⟩ one obtains a model based on it by adding a function V : S → 2^T (where 2^T denotes the set of subsets of T) that associates with every atomic proposition q ∈ S the set of nodes at which q is true. Given a model and a formula φ, the truth set of φ – denoted by V(φ) – is defined as usual. In particular,

V(Gφ) = {t ∈ T : ∀t′ ∈ T, if t ≺ t′ then t′ ∈ V(φ)},
V(Hφ) = {t ∈ T : ∀t′ ∈ T, if t′ ≺ t then t′ ∈ V(φ)},
V(Kiφ) = {t ∈ T : Ki(t) ⊆ V(φ)}.

An alternative notation for t ∈ V(φ) is t |= φ. A formula φ is valid in a model if t |= φ for all t ∈ T, that is, if φ is true at every node. A formula φ is valid in a frame if it is valid in every model based on it.
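The truth-set clauses translate directly into a small evaluator. The sketch below handles a single player and represents formulas as nested tuples; this encoding, the helper names, and the two tiny frames are all assumptions made for illustration. It evaluates the two memory properties stated informally in the Introduction, written here as `M1` (past knowledge is known now) and `M2` (present knowledge will always be remembered).

```python
def truth_set(phi, nodes, before, cell_of, valuation):
    """Return the set of nodes at which phi is true.

    before[t] is the set of strict predecessors of t;
    cell_of[t] is the cell of the player's extended partition containing t.
    """
    op = phi[0]
    if op == "atom":
        return set(valuation[phi[1]])
    sub = truth_set(phi[1], nodes, before, cell_of, valuation)
    if op == "not":
        return set(nodes) - sub
    if op == "or":
        return sub | truth_set(phi[2], nodes, before, cell_of, valuation)
    if op == "G":  # phi[1] holds at every successor
        return {t for t in nodes if all(u in sub for u in nodes if t in before[u])}
    if op == "H":  # phi[1] holds at every predecessor
        return {t for t in nodes if before[t] <= sub}
    if op == "K":  # phi[1] holds throughout the player's cell
        return {t for t in nodes if cell_of[t] <= sub}
    raise ValueError(f"unknown operator: {op}")


def P(phi):
    """'At some past time phi', defined as not-H-not phi."""
    return ("not", ("H", ("not", phi)))


def implies(a, b):
    return ("or", ("not", a), b)


q = ("atom", "q")
M1 = implies(P(("K", q)), ("K", P(("K", q))))
M2 = implies(("K", q), ("G", ("K", P(("K", q)))))

# Frame 1 (hypothetical): r -> a, r -> b, with cells {r} and {a, b}.
nodes1 = ["r", "a", "b"]
before1 = {"r": set(), "a": {"r"}, "b": {"r"}}
cells1 = {"r": {"r"}, "a": {"a", "b"}, "b": {"a", "b"}}
print(truth_set(M1, nodes1, before1, cells1, {"q": {"r"}}) == set(nodes1))  # True

# Frame 2 (hypothetical): chain r -> a with the single cell {r, a}.
nodes2 = ["r", "a"]
before2 = {"r": set(), "a": {"r"}}
cells2 = {"r": {"r", "a"}, "a": {"r", "a"}}
print(sorted(truth_set(M1, nodes2, before2, cells2, {"q": {"r", "a"}})))  # ['r']
```

In the second frame the player cannot tell the root from its successor, and the first formula fails at node a for this valuation. A single valuation can refute validity in a frame but cannot establish it, since validity quantifies over all models based on the frame.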


Finally, we say that a property of frames is characterized by an axiom if (1) the axiom is valid in any frame that satisfies the property and, conversely, (2) whenever the axiom is valid in a frame, then the frame satisfies the property. The following proposition, which is proved in Appendix B, provides a characterization of MPK.⁷

PROPOSITION 5. MPK is characterized by either of the following axioms:

(M1) P Ki φ → Ki P Ki φ,
(M2) Ki φ → G Ki P Ki φ.

(M1) says that if, at some time in the past, player i knew φ, then she knows now that in the past she knew φ. While (M1) is backward-looking, (M2) is forward-looking: it says that if player i knows φ now, then at every future time she will know that some time in the past she knew φ. By Propositions 3 and 5, if an extensive form has an information completion that validates axiom (M1) then the extensive form is von Neumann and satisﬁes property MDN. Conversely, by Propositions 4 and 5, a von Neumann extensive form that satisﬁes MDN has an information completion that validates axiom (M1). Thus axiom (M1) provides a syntactic characterization of the class of von Neumann games that satisfy MDN. The same is true of axiom (M2).

4. CONCLUSION

An information completion of an extensive form is obtained by extending the information partition of every player from the set of her decision nodes to the set of all nodes. One can then define, for the extended partition, the following notion of memory: at any node a player remembers what she knew at earlier nodes. We showed that this property can be satisfied in an extensive form if and only if the extensive form is von Neumann and satisfies the restriction of the property to a player’s own decision nodes. We also provided two equivalent axioms for the proposed notion of memory, thus obtaining a syntactic characterization of the said class of von Neumann games.


APPENDIX A. PROOFS OF SECTION 2

In this appendix we prove Proposition 3 of Section 2. For the reader’s convenience we repeat the definition of MPK: if t ≺ x and x′ ∈ Ki(x) then there exists a t′ ∈ Ki(t) such that t′ ≺ x′. We say that at node x there is “time uncertainty” for player i if the cell Ki(x) of her extended partition Ki contains a predecessor of x, that is, if there is a path in the tree that crosses the cell of player i’s extended partition that contains x more than once.⁸

DEFINITION 6. At x ∈ T there is time uncertainty for player i if there exists a t ∈ Ki(x) such that t ≺ x.

The following lemma states that in information completions that satisfy MPK, time uncertainty “propagates into the past”.

LEMMA 7. Fix an arbitrary extensive form and let ⟨K1, ..., Kn⟩ be an information completion of it that satisfies MPK. Then the following is true for every node x and every player i: if at x there is time uncertainty for player i, then there exists a t ∈ T such that (1) t ≺ x and (2) at t there is time uncertainty for player i.

Proof. Let x and i be such that there exists a t ∈ Ki(x) with t ≺ x. By MPK (letting x′ = t) there exists a t′ ∈ Ki(t) such that t′ ≺ t. Thus at t there is time uncertainty for player i.

The following proposition says that MPK rules out time uncertainty.

PROPOSITION 8. Fix an arbitrary extensive form and let ⟨K1, ..., Kn⟩ be an information completion of it that satisfies MPK. Then for every node x and every player i there cannot be time uncertainty at x for player i.

Proof. Suppose that there are a node t1 and a player i such that at t1 there is time uncertainty for player i. By Lemma 7 there is an infinite sequence t1, t2, ... such that, for all k ≥ 1, tk+1 ≺ tk and at tk+1 there is time uncertainty for player i. Since ⟨T, ≺⟩ is a rooted tree, it has no cycles. Thus, for all j, k ≥ 1 with j ≠ k, tj ≠ tk, contradicting the fact that in a rooted tree every node has a finite number of predecessors.
The following proposition states that a situation like the one illustrated in Figure 3 (where rounded rectangles represent cells of $K_i$) is not compatible with MPK.

Figure 3.

PROPOSITION 9. Let $G$ be an arbitrary extensive form and $K_1, \ldots, K_n$ an information completion of it that satisfies MPK. Then the following is true for all $t, t', x, x', y' \in T$ and $i \in N$: if $t \to x$, $t' \in K_i(t)$, $t' \to x'$, $x' \preceq y'$, and $y' \in K_i(x)$, then $y' = x'$.

Proof. Suppose not. Then there exist $t, t', x, x', y' \in T$ and $i \in N$ such that $t \to x$ (that is, $t$ is the immediate predecessor of $x$), $t' \in K_i(t)$, $t' \to x'$ (that is, $t'$ is the immediate predecessor of $x'$), $y' \in K_i(x)$ and $x' \prec y'$. Since $t' \to x'$, by Proposition 8 it must be that

$$t' \notin K_i(x') \tag{1}$$

(otherwise there would be time uncertainty for player $i$ at $x'$). It follows that

$$t \notin K_i(x'). \tag{2}$$

In fact, if it were the case that $t \in K_i(x')$, then we would have (by definition of partition) that $K_i(t) = K_i(x')$ and, since $t' \in K_i(t)$, $t' \in K_i(x')$, contradicting (1). Since $y' \in K_i(x)$, $K_i(y') = K_i(x)$. Thus, since $x \in K_i(x)$,

$$x \in K_i(y'). \tag{3}$$

By MPK it follows from $x' \prec y'$ and (3) that there exists an $x''$ such that

$$x'' \in K_i(x') \tag{4}$$

and

$$x'' \prec x. \tag{5}$$


Since $t$ is the immediate predecessor of $x$ ($t \to x$), it follows from (5) that either $x'' = t$ or $x'' \prec t$. The case $x'' = t$ yields a contradiction between (4) and (2). Suppose, therefore, that $x'' \prec t$. By MPK it follows from $t' \to x'$ and (4) that there exists a $t'' \in K_i(t')$ such that $t'' \prec x''$. From $t' \in K_i(t)$ and $t'' \in K_i(t')$ we get (by definition of partition) that

$$t'' \in K_i(t). \tag{6}$$

From $t'' \prec x''$ and $x'' \prec t$ we get (by transitivity of $\prec$) that $t'' \prec t$. This, in conjunction with (6), yields time uncertainty at $t$ for player $i$, contradicting Proposition 8.

Proof of Proposition 3. Fix an arbitrary player $i$ and an arbitrary node $x$. Let $k = \ell(x)$. First we prove part (2), namely that $K_i(x) \subseteq T^k$. We do this by induction. First of all, it must be that $K_i(t_0) = \{t_0\}$ (where $t_0$ is the root of the tree). In fact, if there were a $t \neq t_0$ with $t \in K_i(t_0)$, then we would have $K_i(t) = K_i(t_0)$ and, since $t_0 \in K_i(t_0)$, $t_0 \in K_i(t)$. Thus, since $t_0 \prec t$, there would be time uncertainty at $t$ for player $i$, contradicting Proposition 8. Thus the statement is true for $k = 0$.

Next we show that if it is true for all $k \leq m$ then it is true for $k = m + 1$. Fix a node $x \in T^{m+1}$ and an arbitrary $y' \in K_i(x)$. Then $K_i(y') = K_i(x)$. By the induction hypothesis, $\ell(y') \geq m + 1$ (see note 9). Suppose that $\ell(y') > m + 1$. Let $t \in T^m$ be the immediate predecessor of $x$. Since $t \to x$ and $y' \in K_i(x)$, by MPK there exists a $t' \in K_i(t)$ such that $t' \prec y'$. By the induction hypothesis, $K_i(t) \subseteq T^m$ and therefore $t' \in T^m$. Let $x'$ be the immediate successor of $t'$ on the path from $t'$ to $y'$. Since $\ell(t') = m$, $\ell(x') = m + 1$. Thus, since $\ell(y') > m + 1$, $x' \neq y'$. Thus we have that all of the following are true, contradicting Proposition 9: $t \to x$, $t' \in K_i(t)$, $t' \to x'$, $x' \preceq y'$, $y' \in K_i(x)$ and $y' \neq x'$. Thus we have shown that for every player $i$ and node $x$, $K_i(x) \subseteq T^{\ell(x)}$, completing the proof of part (2) of Proposition 3.

To prove part (1) it is sufficient to recall that, by definition of information completion, if node $x$ belongs to information set $h$ of player $i$, then $K_i(x) = h$. Thus the extensive form is von Neumann.

APPENDIX B. PROOFS FOR SECTION 3

Proof of Proposition 5. Assume MPK. We show that both (M1) and (M2) are valid. For (M1): suppose that $x \models P K_i \varphi$. Then there exists a $t$ such that $t \prec x$ and $t \models K_i \varphi$, that is, $K_i(t) \subseteq V(\varphi)$. Fix an arbitrary $x' \in K_i(x)$. By MPK there exists a $t' \in K_i(t)$ such that $t' \prec x'$. Since $t' \in K_i(t)$, $K_i(t') = K_i(t)$ and, therefore, since $K_i(t) \subseteq V(\varphi)$, $t' \models K_i \varphi$. Thus $x' \models P K_i \varphi$ and $x \models K_i P K_i \varphi$.

For (M2): suppose that $t \models K_i \varphi$. Fix arbitrary $x$ and $x'$ such that $t \prec x$ and $x' \in K_i(x)$. By MPK there exists a $t' \in K_i(t)$ such that $t' \prec x'$. Since $t' \in K_i(t)$, $K_i(t') = K_i(t)$ and, therefore, $t' \models K_i \varphi$. Thus $x' \models P K_i \varphi$ and $x \models K_i P K_i \varphi$ and $t \models G K_i P K_i \varphi$.

To prove the converse, assume that MPK does not hold, that is, there exist $i \in N$ and $t, x, x' \in T$ such that all of the following hold:

$$t \prec x, \tag{7}$$

$$x' \in K_i(x), \tag{8}$$

$$\forall t' \in T, \text{ if } t' \prec x' \text{ then } t' \notin K_i(t). \tag{9}$$

We want to show that both (M1) and (M2) can be falsified. Let $q$ be an atomic sentence and construct a model where $V(q) = K_i(t)$. Then

$$t \models K_i q. \tag{10}$$

For every $t'$ such that $t' \prec x'$, by (9) $t' \notin K_i(t) = V(q)$ and therefore

$$t' \not\models q. \tag{11}$$

It follows from (11) that

$$t' \not\models K_i q. \tag{12}$$

In fact, if it were the case that $K_i(t') \subseteq V(q) = K_i(t)$ then, since $t' \in K_i(t')$, we would have $t' \models q$, contradicting (11). It follows from (12) that $x' \not\models P K_i q$. Hence, by (8),

$$x \not\models K_i P K_i q. \tag{13}$$

By (7) and (10), $x \models P K_i q$. This, together with (13), falsifies (M1) at $x$. By (13) and (7), $t \not\models G K_i P K_i q$. This, together with (10), falsifies (M2) at $t$.
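The two directions of this proof can be illustrated by brute force on a finite tree. The sketch below assumes the semantics used in the proof ($P$ quantifies over strict predecessors, $G$ over strict successors, $K_i$ over the cell of the extended partition); the tree, the two partitions and the function names are hypothetical. Since every formula has some set of nodes as its truth set, checking all subsets of nodes as truth sets covers all instances of (M1) and (M2) on the given frame.

```python
from itertools import combinations


def ancestors(parent, node):
    """Strict predecessors of `node` in the rooted tree."""
    found = set()
    while node in parent:
        node = parent[node]
        found.add(node)
    return found


def K(cells, truth):
    """Nodes where K_i phi holds: the whole cell lies inside the truth set."""
    return {x for cell in cells if cell <= truth for x in cell}


def P(parent, nodes, truth):
    """Nodes where P phi holds: phi is true at some strict predecessor."""
    return {x for x in nodes if ancestors(parent, x) & truth}


def G(parent, nodes, truth):
    """Nodes where G phi holds: phi is true at every strict successor."""
    return {t for t in nodes
            if all(x in truth for x in nodes if t in ancestors(parent, x))}


def axioms_valid(parent, nodes, cells):
    """Return (M1 valid, M2 valid), trying every subset of nodes as V(phi)."""
    m1 = m2 = True
    for size in range(len(nodes) + 1):
        for chosen in combinations(nodes, size):
            k_phi = K(cells, set(chosen))
            pk_phi = P(parent, nodes, k_phi)
            kpk_phi = K(cells, pk_phi)
            m1 = m1 and pk_phi <= kpk_phi                   # P K phi -> K P K phi
            m2 = m2 and k_phi <= G(parent, nodes, kpk_phi)  # K phi -> G K P K phi
    return m1, m2


# Hypothetical tree: root r with children a and b; c follows a, d follows b.
parent = {"a": "r", "b": "r", "c": "a", "d": "b"}
nodes = ["r", "a", "b", "c", "d"]

by_level = [{"r"}, {"a", "b"}, {"c", "d"}]     # satisfies MPK
crossing = [{"r"}, {"a", "c"}, {"b"}, {"d"}]   # violates MPK (take t = a, x = c, x' = a)

print(axioms_valid(parent, nodes, by_level))
print(axioms_valid(parent, nodes, crossing))
```

For the `crossing` partition the falsifying valuation found by the search is the one used in the proof, $V(q) = K_i(t)$ with $t$ the node `a`.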

ACKNOWLEDGEMENTS

I am grateful to two anonymous referees for helpful comments and suggestions. A first draft of this paper was presented at the XIV Meeting on Game Theory and Applications, Ischia, July 2001.

NOTES

1. To avoid confusion, throughout the paper we use the expression "player i's information partition" to refer to the standard partition of i's decision nodes. The elements of this partition will always be referred to as "information sets". On the other hand, player i's partition of the set of all nodes will be called "i's extended partition" and its elements will be called "cells".

2. The notion of perfect recall was introduced by Kuhn (1953), who interprets it as follows: "this condition is equivalent to the assertion that each player is allowed by the rules of the game to remember everything he knew at previous moves and all of his choices at those moves" (Kuhn 1997, 65). In the computer science literature the expression "perfect recall" has been used to denote a weaker property (see the next endnote).

3. This property was first studied in the game theory literature by Okada (1987, 89). Ritzberger (1999, 77) calls it "strong ordering", while Kline (2002, 288) calls it "occurrence memory". An essentially identical property, called "no forgetting", was introduced in the computer science literature by Ladner and Reif (1986) and Halpern and Vardi (1986). It was later renamed as "perfect recall" in Fagin et al. (1995). See also van der Meyden (1994).

4. For example, if choice c is {(t, z ), (t , y)} and choice d is {(t, x), (t , z )}. 7 1

5. For example, in the game of Figure 2, $T^1 = \{t_1, t_2, t_3\}$, $T^2 = \{z_1, t_4, t_5, t_6, t_7, t_8\}$, etc.

6. See, for example, Chellas (1984). The connectives $\wedge$ (for "and") and $\to$ (for "if ... then") are defined as usual: $\varphi \wedge \psi \overset{\text{def}}{=} \neg(\neg\varphi \vee \neg\psi)$ and $\varphi \to \psi \overset{\text{def}}{=} \neg\varphi \vee \psi$.

7. An alternative axiom for the property that we call "memory of past knowledge" was suggested by Ladner and Reif (1986): $K_i G\varphi \to G K_i \varphi$. Halpern and Vardi (1986) provided a sound and complete axiomatization of systems that satisfy "memory of past knowledge" (they called this property "no forgetting") and are synchronous (i.e., the agents have access to an external clock). The key axiom is $K_i \bigcirc \varphi \to \bigcirc K_i \varphi$, where $\bigcirc$ is the "next time" operator, that is, $t \models \bigcirc\varphi$ if $\varphi$ is true at every immediate successor of $t$. As pointed out in Section 1, synchronous systems are closely related to von Neumann games.

8. When restricted to a player's information sets, time uncertainty coincides with the notion of absent-mindedness (Piccione and Rubinstein 1997, 10; Kline 2002, 289).

9. Suppose, to the contrary, that $\ell(y') = j$ with $j < m + 1$. Then, by the induction hypothesis, $K_i(y') \subseteq T^j$. Since $x \in K_i(x)$ and $K_i(x) = K_i(y')$, $x \in K_i(y')$. Thus $x \in T^j$, contradicting the hypothesis that $x \in T^{m+1}$.

REFERENCES

Battigalli, P. and G. Bonanno: 1999, 'Synchronic Information, Knowledge and Common Knowledge in Extensive Games', Research in Economics 53, 77–99.
van Benthem, J.: 2001, 'Games in Dynamic Epistemic Logic', Bulletin of Economic Research 53, 219–248.
Bonanno, G.: 2003, 'Memory and Perfect Recall in Extensive Games', Games and Economic Behavior, In Press.
Burgess, J.: 1984, 'Basic Tense Logic', in D. Gabbay and F. Guenthner (eds), Handbook of Philosophical Logic, Vol. II, D. Reidel Publishing Company, pp. 89–133.
Chellas, B.: 1984, Modal Logic: An Introduction, Cambridge University Press.
Fagin, R., J. Halpern, Y. Moses, and M. Vardi: 1995, Reasoning about Knowledge, MIT Press.
Goldblatt, R.: 1992, Logics of Time and Computation, CSLI Lecture Notes No. 7.
Halpern, J. and M. Vardi: 1986, 'The Complexity of Reasoning about Knowledge and Time', Proceedings 18th ACM Symposium on Theory of Computing, pp. 304–315.
Kline, J. J.: 2002, 'Minimum Memory for Equivalence between ex ante Optimality and Time Consistency', Games and Economic Behavior 38, 278–305.
Kuhn, H. W.: 1953, 'Extensive Games and the Problem of Information', in H. W. Kuhn and A. W. Tucker (eds), Contributions to the Theory of Games, Vol. II, Princeton University Press, pp. 193–216. Reprinted in Kuhn (1997), pp. 46–68.
Kuhn, H. W.: 1997, Classics in Game Theory, Princeton University Press.
Ladner, R. and J. Reif: 1986, 'The Logic of Distributed Protocols (Preliminary Report)', in J. Halpern (ed.), Theoretical Aspects of Reasoning about Knowledge: Proceedings of the 1986 Conference, Morgan Kaufmann, pp. 207–222.
Malcolm, N.: 1963, Knowledge and Certainty, Prentice-Hall.
van der Meyden, R.: 1994, 'Axioms for Knowledge and Time in Distributed Systems with Perfect Recall', Proc. IEEE Symposium on Logic in Computer Science, pp. 448–457.
Munsat, S.: 1966, The Concept of Memory, Random House.
Okada, A.: 1987, 'Complete Inflation and Perfect Recall in Extensive Games', International Journal of Game Theory 16, 85–91.
Piccione, M. and A. Rubinstein: 1997, 'On the Interpretation of Decision Problems with Imperfect Recall', Games and Economic Behavior 20, 3–24.
Prior, A. N.: 1956, Time and Modality, Oxford University Press.
Ritzberger, K.: 1999, 'Recall in Extensive Form Games', International Journal of Game Theory 28, 69–87.
Selten, R.: 1975, 'Re-examination of the Perfectness Concept for Equilibrium Points in Extensive Games', International Journal of Game Theory 4, 25–55. Reprinted in Kuhn (1997), 317–354.

Department of Economics
University of California
Davis, CA 95616-8578
U.S.A.
E-mail: [email protected]


KARL TUYLS, ANN NOWE, TOM LENAERTS and BERNARD MANDERICK

AN EVOLUTIONARY GAME THEORETIC PERSPECTIVE ON LEARNING IN MULTI-AGENT SYSTEMS

ABSTRACT. In this paper we revise Reinforcement Learning and adaptiveness in Multi-Agent Systems from an Evolutionary Game Theoretic perspective. More precisely, we show there is a triangular relation between the fields of Multi-Agent Systems, Reinforcement Learning and Evolutionary Game Theory. We illustrate how these new insights can contribute to a better understanding of learning in MAS and to new improved learning algorithms. All three fields are introduced in a self-contained manner. Each relation is discussed in detail with the necessary background information to understand it, along with major references to relevant work.

1. INTRODUCTION

Agent-based computing is a new, evolving paradigm in computer science. Nowadays, more and more technological challenges require distributed, dynamic systems. Traditional programming paradigms make some assumptions which just do not hold for a great number of new applications. Often the environment for which a program is designed is neither static nor completely known or deterministic (e.g., the internet). The characteristics of these environments imply that systems which need to interact in such an environment operate within rapidly changing circumstances, with an enormous growth of available information. These new requirements of today's applications suggest that alternative programming paradigms are necessary. Since the early 90s agent-based systems or Multi-Agent Systems have emerged as an important active area of research to support these new requirements in Information Technology (Wooldridge 2002; Luck et al. 2003; Weiss 1999).

In contrast with traditional methodologies the agent-based approach views a program as a set of one or more independent, rational agents. Typically, an agent is an autonomous computational entity with a flexible dynamic behaviour in an unpredictable environment. The uncertainty of the environment implies that an agent needs to learn from, and adapt to, its environment to be successful. Indeed, it is impossible to foresee all situations an agent can encounter beforehand. Therefore, learning and adaptiveness become crucial for the successful application of Multi-Agent Systems to contemporary technological challenges.

Synthese 139: 297–330, 2004. Knowledge, Rationality & Action 133–166, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

Robocup is a nice illustration of such a challenge. The stated global goal of the Robocup project is to develop, by the year 2050, a team of fully autonomous humanoid robots that can win against the human world soccer champion team (Robocup project 2003). A team of robotic players, challenging human soccer teams, must be capable of learning how to communicate, cooperate and compete. If this team of robotic players, a standard multi-agent system, wants to be competitive, its members must be able to coordinate their actions as a team. Hence learning and adaptiveness become crucial.

Reinforcement Learning (RL) is already an established and profound theoretical framework for learning in stand-alone or single-agent systems. Yet, extending RL to multi-agent systems (MAS) does not guarantee the same theoretical grounding. As long as the environment an agent is experiencing is Markov (see note 1), and the agent can experiment enough, RL guarantees convergence to the optimal strategy. In a MAS, however, the reinforcement an agent receives may depend on the actions taken by the other agents present in the system. Hence, the Markovian property no longer holds, and the previous guarantees of convergence disappear.

Consider for instance the problem of finding the optimal way between two points in traffic. The cost, measured as the time it takes to get from a point A to a point B using a particular route, will be influenced by the current traffic conditions, i.e., by how many other drivers decided to use the same route. Communication on these decisions is not always possible; moreover, communication has an associated cost and is subject to delays. Uncontrolled exploration in this situation can lead to policy oscillations (Nowe et al. 1999): when everyone decides to take the alternative route, this one becomes less interesting than the original one.
Most MAS belong to this last case of non-stationarity. Obviously, in these environments the convergence results of RL are lost. In the light of the above problem it is important to fully understand the dynamics of reinforcement learning and the effect of exploration in MAS. To this end we review Evolutionary Game Theory (EGT) as a solid basis for understanding learning and constructing new learning algorithms. The Replicator Equations will appear to be an interesting model to study learning in various settings. This model consists of a system of differential equations describing how a population of strategies evolves over time, and plays a central role in biological and economic models. Several authors have already noticed that the Replicator Dynamics (RD) can emerge from simple learning models (Sarin et al. 1997; Redondo 2001; Tuyls et al. 2003).

Figure 1. The triangular relation between RL, MAS and EGT.

This article discusses the theoretical foundations of learning in multi-agent systems. For the moment, a theoretical framework in which learning and adaptiveness in agent-based systems can be understood profoundly is lacking. However, this paper reveals how Evolutionary Game Theory and Reinforcement Learning are connected and how insights from Evolutionary Game Theory provide a better understanding of learning in general in multi-agent systems. More precisely, this formal relation closes the triangle between the three fields and offers the necessary foundations for this missing formal framework. This comes down to an important triangular relation between the fields of MAS, Reinforcement Learning and Evolutionary Game Theory, expressed by Figure 1. Each relation of this triangle will be discussed in detail in this paper.

The outline of the paper is as follows. In the second section we present an overview of the key concepts and key results from Game Theory (GT) and Evolutionary Game Theory. This is a necessary background for the further discussion and provides an evolutionary game theoretic perspective on learning in MAS. After this we continue with discussing the three relations in more detail. Section 3 discusses the third link of Figure 1 and makes explicit how both fields relate and what the current issues are in multi-agent learning algorithms. Section 4 discusses the second link of Figure 1, reveals how both fields relate mathematically, and is an opening toward solving the issues of the previous section. Section 5 closes the circle and elaborates on the first link, i.e., reveals the interesting similarities between the fields and summarizes why EGT is an interesting framework to analyze and understand MAS. We end with a conclusion.

2. EVOLUTIONARY GAME THEORY

2.1. Introduction

When John Nash discovered the theory of games at Princeton, in the late 40s and early 50s, the impact was enormous. Originally, Game Theory was launched by John von Neumann and Oskar Morgenstern in 1944 in their book Theory of Games and Economic Behavior (von Neumann et al. 1944). The impact of the developments in Game Theory expressed itself especially in the field of economics, where its concepts played an important role in, for instance, the study of international trade, bargaining, the economics of information and the organization of corporations. But the importance of Game Theory also became clear in other disciplines in the social and natural sciences, for instance in studies of legislative institutions, of voting behavior, of warfare, of international conflicts, and of evolutionary biology.

However, von Neumann and Morgenstern had only managed to define an equilibrium concept for 2-person zero-sum games. Zero-sum games correspond to situations of pure competition: whatever one player wins must be lost by another. John Nash addressed the case of competition with mutual gain by defining best-reply functions and using Kakutani's fixed-point theorem. The main results of his work expressed themselves in his development of the Nash Equilibrium and the Nash Bargaining Solution concept.

Despite the great usefulness of the Nash equilibrium concept, the assumptions traditional game theory makes, like hyperrational players that correctly anticipate the other players in an equilibrium, made game theory stagnate for quite some time (Weibull 1996; Gintis 2000; Samuelson 1997). A lot of refinements of Nash equilibria came along (for instance trembling hand perfection), which made it hard to choose the appropriate equilibrium in a particular situation. Almost any Nash equilibrium could be justified in terms of some particular refinement. This made clear that the static Nash concept did not reflect the (dynamic) real world, where people do not act hyperrationally.
This is where evolutionary game theory originated. More precisely, Maynard Smith adopted the idea of evolution from biology (Maynard-Smith et al. 1973; Maynard-Smith 1982). This idea led Smith and Price to the concept of Evolutionary Stable Strategies (ESS), which in fact obeys a stricter condition than the Nash condition. In evolutionary game theory the game is no longer played exactly once by rational players who know all the details of the game. Details of the game include each other's preferences over outcomes. Instead EGT assumes that the game is played repeatedly by players randomly drawn from large populations, uninformed of the preferences of the opponent players.

Evolutionary Game Theory offers a solid basis for rational decision making in an uncertain world; it describes how individuals make decisions and interact in complex environments in the real world. Modeling learning agents in the context of Multi-Agent Systems requires insight into the type and form of interactions with the environment and other agents in the system. Usually, these agents are modelled similarly to the different players in a standard game theoretical model. In other words, these agents assume complete knowledge of the environment, have the ability to correctly anticipate the opposing player (hyperrationality) and know that the optimal strategy in the environment is always the same (static Nash equilibrium). The intuition that in the real world people are not completely knowledgeable and hyperrational players, and that an equilibrium can change dynamically, led to the development of evolutionary game theory.

2.2. Elementary Concepts

In this section we review the key concepts of EGT and their mutual relationships. This is important to understand the further discussion in later sections. We start by defining strategic games and concepts such as Nash equilibrium, Pareto optimality and evolutionary stable strategies. Then we discuss the relationships between these concepts and provide some examples.

2.2.1. Strategic Games

In this section we define n-player normal form games as a conflict situation involving gains and losses between n players. In such a game n players interact with each other by all choosing an action (or strategy) to play. All players choose their strategy at the same time. For reasons of simplicity, we limit the pure strategy set of the players to 2 strategies. A strategy is defined as a probability distribution over all possible actions. In the 2-pure strategies case, we have: s1 = (1, 0) and s2 = (0, 1).
A mixed strategy $s_m$ is then defined by $s_m = (x_1, x_2)$ with $x_1, x_2 \geq 0$ and $x_1 + x_2 = 1$. In defining a game more formally we restrict ourselves to the 2-player 2-action game. An extension to n-player n-action games is straightforward, but examples in the n-player case do not show the same illustrative strength as in the 2-player case. A game $G = (S_1, S_2, P_1, P_2)$ is defined by the payoff functions $P_1$, $P_2$ and their strategy sets, $S_1$ for the first player and $S_2$ for the second player. In the 2-player 2-strategies case, the payoff functions $P_1 : S_1 \times S_2 \to \mathbb{R}$ and $P_2 : S_1 \times S_2 \to \mathbb{R}$ are defined by the payoff matrices $A$ for the first player and $B$ for the second player; see Table I. The payoff tables $A$, $B$ define the instantaneous rewards. Element $a_{ij}$ is the reward the row player (player 1) receives for choosing pure strategy $s_i$ from set $S_1$ when the column player (player 2) chooses the pure strategy $s_j$ from set $S_2$. Element $b_{ij}$ is the reward for the column player for choosing the pure strategy $s_j$ from set $S_2$ when the row player chooses pure strategy $s_i$ from set $S_1$.

TABLE I. The left matrix (A) defines the payoff for the row player, the right matrix (B) defines the payoff for the column player.

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$$

The family of 2 × 2 games is usually classified in three subclasses, as follows (Redondo 2001):

Subclass 1: if $(a_{11} - a_{21})(a_{12} - a_{22}) > 0$ or $(b_{11} - b_{12})(b_{21} - b_{22}) > 0$, at least one of the 2 players has a dominant strategy, therefore there is just 1 strict equilibrium.

Subclass 2: if $(a_{11} - a_{21})(a_{12} - a_{22}) < 0$, $(b_{11} - b_{12})(b_{21} - b_{22}) < 0$, and $(a_{11} - a_{21})(b_{11} - b_{12}) > 0$, there are 2 pure equilibria and 1 mixed equilibrium.

Subclass 3: if $(a_{11} - a_{21})(a_{12} - a_{22}) < 0$, $(b_{11} - b_{12})(b_{21} - b_{22}) < 0$, and $(a_{11} - a_{21})(b_{11} - b_{12}) < 0$, there is just 1 mixed equilibrium.

The first subclass includes those types of games where each player has a dominant strategy (see note 2), as for instance the prisoner's dilemma. However, it includes a larger collection of games, since only one of the players needs to have a dominant strategy. In the second subclass none of the players has a dominated strategy (e.g., battle of the sexes), but both players receive the highest payoff by both playing their first or both playing their second strategy. This is expressed in the condition $(a_{11} - a_{21})(b_{11} - b_{12}) > 0$. The third subclass only differs from the second in the fact that the players do not receive their highest payoff by both playing the first or the second strategy (e.g., the matching pennies game). This is expressed by the condition $(a_{11} - a_{21})(b_{11} - b_{12}) < 0$. Section 2.2.6 provides an example of each subclass.

2.2.2. Nash Equilibrium

In traditional game theory it is assumed that the players are hyperrational, meaning that every player will choose the action that is best for him, given his beliefs about the other players' actions. A basic definition of a Nash equilibrium is stated as follows. If there is a set of strategies for a game with the property that no player can increase his payoff by changing his strategy while the other players keep their strategies unchanged, then that set of strategies and the corresponding payoffs constitute a Nash equilibrium. Formally, a Nash equilibrium is defined as follows. When 2 players play the strategy profile $s = (s_i, s_j)$ belonging to the product set $S_1 \times S_2$, then $s$ is a Nash equilibrium if $P_1(s_i, s_j) \geq P_1(s_x, s_j)$ $\forall x \in \{1, \ldots, n\}$ and $P_2(s_i, s_j) \geq P_2(s_i, s_x)$ $\forall x \in \{1, \ldots, m\}$. In Section 2.2.6 some examples illustrate this definition.

2.2.3. Pareto Optimality

Intuitively a Pareto optimal solution of a game can be defined as follows: a combination of actions of agents in a game is Pareto optimal if there is no other solution for which all players do at least as well and at least one agent is strictly better off. More formally we have: a strategy combination $s = (s_1, \ldots, s_n)$ for n agents in a game is Pareto optimal if there does not exist another strategy combination $s'$ for which each player $i$ receives at least the same payoff $P_i$ and at least one player $j$ receives a strictly higher payoff than $P_j$.

2.2.4. Evolutionary Stable Strategies

The core equilibrium concept of Evolutionary Game Theory is that of an Evolutionary Stable Strategy (ESS). The idea of an evolutionarily stable strategy was introduced by John Maynard Smith and Price in 1973 (Maynard-Smith et al. 1973). Imagine a population of agents playing the same strategy.
Assume that this population is invaded by a different strategy, which is initially played by a small share of the total population. If the reproductive success of the new strategy is smaller than that of the original one, it will not overrule the original strategy and will eventually disappear. In this case we say that the strategy is evolutionary stable against this newly appearing strategy. More generally, we say a strategy is an Evolutionary Stable Strategy if it is robust against evolutionary pressure from any appearing mutant strategy.
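The subclass conditions of Section 2.2.1 and the definitions of Nash equilibrium and Pareto optimality from Sections 2.2.2 and 2.2.3 can be sketched in a few lines of code. This is only an illustrative sketch: the function names are hypothetical, indices are 0-based, and both `pure_nash` and `pareto_optimal` look at pure strategy profiles only (mixed equilibria are not searched). The payoff matrices are those of Tables II, III and IV.

```python
def classify_2x2(A, B):
    """Return the subclass (1, 2 or 3) of a 2x2 game, or None if degenerate."""
    row = (A[0][0] - A[1][0]) * (A[0][1] - A[1][1])
    col = (B[0][0] - B[0][1]) * (B[1][0] - B[1][1])
    if row > 0 or col > 0:
        return 1                      # at least one player has a dominant strategy
    if row < 0 and col < 0:
        cross = (A[0][0] - A[1][0]) * (B[0][0] - B[0][1])
        return 2 if cross > 0 else 3  # cross is nonzero because row, col are nonzero
    return None                       # payoff ties: outside the three subclasses


def pure_nash(A, B):
    """Pure profiles (i, j) from which no player gains by deviating alone."""
    rows, cols = range(len(A)), range(len(A[0]))
    return [(i, j) for i in rows for j in cols
            if all(A[i][j] >= A[k][j] for k in rows)
            and all(B[i][j] >= B[i][l] for l in cols)]


def pareto_optimal(A, B):
    """Pure profiles that no other pure profile weakly improves for both players
    while strictly improving for at least one."""
    cells = [(i, j) for i in range(len(A)) for j in range(len(A[0]))]

    def dominated(i, j):
        return any(A[k][l] >= A[i][j] and B[k][l] >= B[i][j]
                   and (A[k][l] > A[i][j] or B[k][l] > B[i][j])
                   for k, l in cells)

    return [cell for cell in cells if not dominated(*cell)]


# Payoff matrices of Table II (index 0 = defect, index 1 = cooperate).
A_pd, B_pd = [[1, 5], [0, 3]], [[1, 0], [5, 3]]
# Payoff matrices of Table III (battle of the sexes) and Table IV.
A_bos, B_bos = [[2, 0], [0, 1]], [[1, 0], [0, 2]]
A_iv, B_iv = [[2, 3], [4, 1]], [[3, 1], [2, 4]]

for A, B in [(A_pd, B_pd), (A_bos, B_bos), (A_iv, B_iv)]:
    print(classify_2x2(A, B), pure_nash(A, B), pareto_optimal(A, B))
```

For the prisoner's dilemma the only pure Nash equilibrium found is (defect, defect), and that profile is absent from the Pareto optimal list, which matches the discussion in Section 2.2.6.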


Formally an ESS is defined as follows. Suppose that a large population of agents is programmed to play the (mixed) strategy $s$, and suppose that this population is invaded by a small number of agents playing strategy $s'$. The population share of agents playing this mutant strategy is $\epsilon \in \,]0, 1[$. When an individual plays the game against a randomly chosen agent, the chance that he is playing against a mutant is $\epsilon$ and against a non-mutant $1 - \epsilon$. The payoff for the first player, being a non-mutant, is

$$P(s, (1 - \epsilon)s + \epsilon s')$$

and, being a mutant, is

$$P(s', (1 - \epsilon)s + \epsilon s').$$

Now we can state that a strategy $s$ is an ESS if $\forall s' \neq s$ there exists some $\delta \in \,]0, 1[$ such that $\forall \epsilon : 0 < \epsilon < \delta$,

$$P(s, (1 - \epsilon)s + \epsilon s') > P(s', (1 - \epsilon)s + \epsilon s')$$

holds. The condition $\forall \epsilon : 0 < \epsilon < \delta$ expresses that the share of mutants needs to be sufficiently small.

2.2.5. The Relation between Nash Equilibria and ESS

This section explains how the core equilibrium concepts from classical and evolutionary game theory relate to one another. The set of Evolutionary Stable Strategies for a particular game is contained in the set of Nash Equilibria for that same game,

$$\{\text{ESS}\} \subset \{\text{NE}\}.$$

The conditions for an ESS are stricter than the Nash condition. Intuitively this can be understood as follows: as defined above, a Nash equilibrium is a best reply against the strategies of the other players. Now if a strategy $s_1$ is an ESS then it is also a best reply against itself, or optimal. If it were not optimal against itself there would be a strategy $s_2$ that would lead to a higher payoff against $s_1$ than $s_1$ itself. So, if the population share $\epsilon$ of mutant strategies $s_2$ is small enough, then $s_1$ is not evolutionary stable because

$$P(s_2, (1 - \epsilon)s_1 + \epsilon s_2) > P(s_1, (1 - \epsilon)s_1 + \epsilon s_2).$$

An important second property for an ESS is the following. If $s_1$ is an ESS and $s_2$ is an alternative best reply to $s_1$, then $s_1$ has to be a better reply to $s_2$ than $s_2$ to itself. This can easily be seen as follows: because $s_1$ is an ESS, we have for all $s_2$

$$P(s_1, (1 - \epsilon)s_1 + \epsilon s_2) > P(s_2, (1 - \epsilon)s_1 + \epsilon s_2).$$

If $s_2$ does as well against itself as $s_1$ does, then $s_2$ earns at least as much against $(1 - \epsilon)s_1 + \epsilon s_2$ as $s_1$, and then $s_1$ is no longer evolutionary stable. To summarize, we now have the following 2 properties for an ESS $s_1$:

1. $P(s_2, s_1) \leq P(s_1, s_1)$ $\forall s_2$.
2. $P(s_2, s_1) = P(s_1, s_1) \Rightarrow P(s_2, s_2) < P(s_1, s_2)$ $\forall s_2 \neq s_1$.

2.2.6. Examples

In this section we provide some examples of the classification of games (see Section 2.2.1) and illustrate the Nash equilibrium concept and Evolutionary Stable Strategy concept as well as Pareto optimality. For the first subclass we consider the prisoner's dilemma game (Gintis 2000; Weibull 1996). In this game 2 prisoners, who committed a crime together, have a choice to either cooperate with the police (to defect) or work together and deny everything (to cooperate). If the first criminal (row player) defects and the second one cooperates, the first one gets off the hook (expressed by a maximum reward of 5) and the second one gets the most severe punishment. If they both defect, they get the second most severe punishment one can get (expressed by a payoff of 1). If both cooperate, they both get a minimum sentence. The payoffs of the game are defined in Table II.

TABLE II. Prisoner's dilemma: the left matrix (A) defines the payoff for the row player, the right one (B) for the column player.

$$A = \begin{pmatrix} 1 & 5 \\ 0 & 3 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ 5 & 3 \end{pmatrix}$$

As one can see, both players have one dominant strategy, namely defect. Defecting is therefore always the best reply toward any strategy of the opponent, so the Nash equilibrium in this game is for both players to defect. Let us now determine whether this equilibrium is also an evolutionary stable strategy. Suppose $\epsilon \in [0, 1]$ is the share of cooperators in the population. The expected payoff of a cooperator is $3\epsilon + 0(1 - \epsilon)$ and that of a defector is $5\epsilon + 1(1 - \epsilon)$. Since for all $\epsilon$,

$$5\epsilon + 1(1 - \epsilon) > 3\epsilon + 0(1 - \epsilon),$$

defect is an ESS. So the number of defectors will always increase and the population will eventually only consist of defectors. In Section 2.3 this dynamical process will be illustrated by the replicator equations. This equilibrium, which is both Nash and ESS, is not a Pareto optimal solution. This can easily be seen if we look at the payoff tables. The combination (defect, defect) yields a payoff of (1, 1), which is a smaller payoff for both players than the combination (cooperate, cooperate), which yields a payoff of (3, 3). Moreover, the combination (cooperate, cooperate) is a Pareto optimal solution.

For the second subclass we consider the battle of the sexes game (Gintis 2000; Weibull 1996). In this game a married couple loves each other so much they want to do everything together. One night the husband wants to see a movie and the wife wants to go to the opera. This situation is described by the payoff matrices of Table III. If they both do their activities separately they receive the lowest payoff.

TABLE III. Battle of the sexes: the left matrix (A) defines the payoff for the row player, the right one (B) for the column player.

$$A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$$

In this game there are 2 pure strategy Nash equilibria, i.e., (movie, movie) and (opera, opera), which both are also evolutionary stable (as demonstrated in Section 2.3.4). There is also 1 mixed Nash equilibrium, i.e., where the row player (the husband) plays movie with 2/3 probability and opera with 1/3 probability and the column player (the wife) plays opera with 2/3 probability and movie with 1/3 probability. However, this equilibrium is not an evolutionary stable one (as demonstrated in Section 2.3.4).

The third class consists of the games with a unique mixed equilibrium. For this category we use the game defined by the matrices in Table IV. This equilibrium is not an evolutionary stable one (see Section 2.3.4). Typical for this class of games is that the interior trajectories define closed orbits around the equilibrium point.
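Two of the claims in these examples can be reproduced numerically. The sketch below is hedged in two ways: `is_ess` only samples the ESS inequality of Section 2.2.4 on a grid of mutant strategies for one fixed small $\epsilon$ in a single-population setting, so it is a sanity check rather than a proof; and `mixed_equilibrium` uses the standard indifference conditions for an interior mixed equilibrium of a 2 × 2 game, which the text does not spell out. All function names are hypothetical.

```python
def payoff(A, s, t):
    """Expected payoff P(s, t) of mixed strategy s against mixed strategy t."""
    return sum(s[i] * A[i][j] * t[j] for i in range(2) for j in range(2))


def is_ess(A, s, eps=1e-3, steps=100):
    """Sampled check of P(s, mix) > P(s', mix) with mix = (1 - eps) s + eps s'."""
    for n in range(steps + 1):
        mutant = (n / steps, 1 - n / steps)
        if max(abs(mutant[0] - s[0]), abs(mutant[1] - s[1])) < 1e-12:
            continue  # the condition only concerns mutants s' different from s
        mix = tuple((1 - eps) * s[k] + eps * mutant[k] for k in range(2))
        if payoff(A, s, mix) <= payoff(A, mutant, mix):
            return False
    return True


def mixed_equilibrium(A, B):
    """Interior mixed equilibrium (p, q) of a 2x2 game from indifference:
    p = probability that the row player plays the first strategy,
    q = probability that the column player plays the first strategy."""
    p = (B[1][1] - B[1][0]) / (B[0][0] - B[0][1] - B[1][0] + B[1][1])
    q = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    return p, q


# Table II, single-population view (index 0 = defect, index 1 = cooperate).
A_pd = [[1, 5], [0, 3]]
defect, cooperate = (1.0, 0.0), (0.0, 1.0)
print(is_ess(A_pd, defect), is_ess(A_pd, cooperate))

# Table III (index 0 = movie, index 1 = opera) and Table IV.
A_bos, B_bos = [[2, 0], [0, 1]], [[1, 0], [0, 2]]
A_iv, B_iv = [[2, 3], [4, 1]], [[3, 1], [2, 4]]
print(mixed_equilibrium(A_bos, B_bos))
print(mixed_equilibrium(A_iv, B_iv))
```

For the battle of the sexes the indifference conditions give the husband playing movie with probability 2/3 and the wife playing movie with probability 1/3 (hence opera with 2/3), as stated above.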


TABLE IV
The left matrix (A) defines the payoff for the row player, the right one (B) for the column player.

$$A = \begin{pmatrix} 2 & 3 \\ 4 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 3 & 1 \\ 2 & 4 \end{pmatrix}$$

2.3. Population Dynamics

In this section we discuss the Replicator Dynamics in a single and a multi-population setting. We discuss the relation with concepts such as Nash equilibrium and ESS and illustrate the described ideas with some examples.

2.3.1. Single Population Replicator Dynamics

The basic concepts and techniques developed in EGT were initially formulated in the context of evolutionary biology (Maynard-Smith 1982; Weibull 1996; Samuelson 1997). In this context, the strategies of all the players are genetically encoded (called genotype). Each genotype refers to a particular behavior which is used to calculate the payoff of the player. The payoff of each player’s genotype is determined by the frequency of other player types in the environment. One way in which EGT proceeds is by constructing a dynamic process in which the proportions of various strategies in a population evolve. Examining the expected value of this process gives an approximation which is called the RD.

An abstraction of an evolutionary process usually combines two basic elements: selection and mutation. Selection favors some varieties over others, while mutation provides variety in the population. The replicator dynamics highlight the role of selection: they describe how systems consisting of different strategies change over time. They are formalized as a system of differential equations. Each replicator (or genotype) represents one (pure) strategy $s_i$. This strategy is inherited by all the offspring of the replicator. The general form of a replicator dynamic is the following:

$$\frac{dx_i}{dt} = [(Ax)_i - x \cdot Ax]\, x_i \qquad (1)$$

In Equation (1), $x_i$ represents the density of strategy $s_i$ in the population, and A is the payoff matrix which describes the different payoff values each individual replicator receives when interacting with other replicators in the population. The state of the population (x) can be described as a probability vector $x = (x_1, x_2, \ldots, x_J)$ which expresses the densities of all the different types of replicators in the population. Hence $(Ax)_i$ is the payoff which replicator $s_i$ receives in a population with state x, and $x \cdot Ax$ describes the average payoff in the population. The growth rate $\frac{dx_i}{dt} / x_i$ of the population share using strategy $s_i$ equals the difference between the strategy’s current payoff and the average payoff in the population. For further information we refer the reader to Weibull (1996) and Hofbauer et al. (1998).

2.3.2. Multi-Population Replicator Dynamics

So far the study of population dynamics was limited to a single population. However, in many situations interaction takes place between 2 or more individuals from different populations. For reasons of simplicity, we study this situation in the 2-player multi-population case. Games played by individuals of different populations are commonly called evolutionary asymmetric games. Here we consider a game to be played between the members of two different populations. As a result, we need two systems of differential equations: one for the row player (R) and one for the column player (C). This setup corresponds to a RD for asymmetric games. If $A = B^t$ (the transpose of B), Equation (1) would emerge again. Player R has a probability vector p over its possible strategies and player C a probability vector q over its strategies. This translates into the following replicator equations for the two populations:

$$\frac{dp_i}{dt} = [(Aq)_i - p \cdot Aq]\, p_i \qquad (2)$$

$$\frac{dq_i}{dt} = [(Bp)_i - q \cdot Bp]\, q_i \qquad (3)$$

As can be seen in Equations (2) and (3), the growth rate of the types in each population is now determined by the composition of the other population. Note that, when calculating the rate of change using these systems of differential equations, two different payoff matrices (A and B) are used for the two different players.
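As an illustration, Equations (2) and (3) can be integrated numerically. The following is a minimal sketch, not the authors' implementation: it uses a plain explicit Euler scheme, the function names `replicator_step` and `simulate` are hypothetical, and it assumes that B is indexed by the column player's own strategy first so that $(Bp)_i$ reads exactly as in Equation (3). The payoffs are those printed in Table III, assuming the first strategy is "movie".

```python
def replicator_step(p, q, A, B, dt=0.01):
    """One explicit Euler step of Equations (2) and (3).

    p, q    : mixed strategies of the row and the column population.
    A[i][j] : payoff to row strategy i against column strategy j.
    B[i][j] : payoff to column strategy i against row strategy j
              (assumed indexing, so that (Bp)_i matches Equation (3)).
    """
    Aq = [sum(a * q_j for a, q_j in zip(row, q)) for row in A]
    Bp = [sum(b * p_j for b, p_j in zip(row, p)) for row in B]
    avg_row = sum(p_i * f for p_i, f in zip(p, Aq))  # p . Aq
    avg_col = sum(q_i * f for q_i, f in zip(q, Bp))  # q . Bp
    new_p = [p_i + dt * (f - avg_row) * p_i for p_i, f in zip(p, Aq)]
    new_q = [q_i + dt * (f - avg_col) * q_i for q_i, f in zip(q, Bp)]
    return new_p, new_q


def simulate(p, q, A, B, steps=5000, dt=0.01):
    """Repeat the Euler step; step size and horizon are illustrative."""
    for _ in range(steps):
        p, q = replicator_step(p, q, A, B, dt)
    return p, q


# Battle of the sexes payoffs as printed in Table III.
A = [[2, 0], [0, 1]]
B = [[1, 0], [0, 2]]
p_end, q_end = simulate([0.8, 0.2], [0.6, 0.4], A, B)
print(p_end, q_end)
```

From this starting point both populations drift toward their first strategy, while the mixed point p = (2/3, 1/3), q = (1/3, 2/3) is left unchanged by a step, in line with its being a rest point.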


2.3.3. Relating Nash, ESS and the RD

Being a system of differential equations, the RD have rest points, or equilibria. An interesting question is how these RD-equilibria relate to the concepts of Nash equilibrium and ESS. We briefly summarize some known results from the EGT literature (Weibull 1996; Gintis 2000; Osborne et al. 1994; Hofbauer et al. 1998; Redondo 2001).

An important result is that every Nash equilibrium is an equilibrium of the RD. The converse is not true, which can be understood as follows. Consider the simplex of mixed strategies determined by all pure strategies. Formally, the unit simplex is defined by

$$\Delta = \Big\{x \in \mathbb{R}^m_+ : \sum_{i=1}^{m} x_i = 1\Big\}$$

where x is a mixed strategy in m-dimensional space (there are m pure strategies), and $x_i$ is the probability with which strategy $s_i$ is played. Calculating the RD for the unit vectors of this space (putting all the weight on a particular pure strategy) yields zero. This is simply due to the properties of the simplex Δ, where the sum of all population shares remains equal to 1 and no population share can ever turn negative. So, if all pure strategies are present in the population at any time, then they always have been and always will be present, and if a pure strategy is absent from the population at any time, then it always has been and always will be absent.3 This means that the pure strategies are rest points of the RD, but, depending on which game is played, these pure strategies need not be Nash equilibria. Hence not every rest point of the RD is a Nash equilibrium.

Dynamic equilibrium, or stationarity, alone is therefore not enough to understand the RD. For this reason the criterion of asymptotic stability was introduced, which provides a local test of dynamic robustness, local in the sense of minimal perturbations. For a formal definition of asymptotic stability, we refer to Hirsch et al. (1974). Here we give an intuitive definition. An equilibrium is asymptotically stable if the following two conditions hold:

– Any solution path of the RD that starts sufficiently close to the equilibrium remains arbitrarily close to it. This condition is called Liapunov stability.
– Any solution path that starts close enough to the equilibrium converges to the equilibrium.

Now, if an equilibrium of the RD is asymptotically stable (i.e., robust to local perturbations) then it is a Nash equilibrium. For a proof, the reader is referred to Redondo (2001). An interesting result due to Sigmund and Hofbauer (Hofbauer et al. 1998) is the following: if s is an ESS, then the population state x = s is asymptotically stable in the sense of the RD. For a proof, see Hofbauer et al. (1998) and Redondo (2001). This result gives a refinement of the asymptotically stable rest points of the RD, and it provides a way of selecting equilibria of the RD that show dynamic robustness.

2.3.4. Examples

In this section we continue with the examples of Section 2.2.6 and the classification of games of Section 2.2.1. We start with the Prisoner’s Dilemma game (PD). In Figure 2 we plotted the direction field of the replicator equations applied to the PD. A direction field is an elegant tool for understanding and illustrating a system of differential equations. The direction fields presented here consist of a grid of arrows tangential to the solution curves of the system. A direction field is a graphical illustration of the vector field, indicating the direction of the movement at every point of the grid in the state space. Filling in the parameters for each game in Equations (2) and (3) allowed us to plot this field. The x-axis represents the probability with which the first player will play defect and the y-axis represents the probability with which the second player will play defect. So the Nash equilibrium and the ESS lie at coordinates (1, 1). As the field plot shows, all the movement goes toward this equilibrium.

Figure 2. The direction field of the RD of the prisoner’s dilemma using payoff Table II.

Figure 3 illustrates the direction field diagram for the battle of the sexes game. As you may recall from Section 2.2.6, this game has 2 pure Nash equilibria and 1 mixed Nash equilibrium. These equilibria can be seen in the figure at coordinates (0, 0), (1, 1) and (2/3, 1/3). The 2 pure equilibria are ESS as well. This is also easy to verify from the plot: any small perturbation away from such an equilibrium leads the dynamics back to it. The mixed equilibrium, which is Nash, is not asymptotically stable, which is obvious from the plot. Since every ESS is asymptotically stable (Section 2.3.3), this confirms the claim of Section 2.2.6 that this equilibrium is not evolutionarily stable either.

Figure 4 illustrates the last class of games (subclass 3). Typical for this class of games is that the interior trajectories define closed orbits around the equilibrium point, as you can see in the plot. This Nash equilibrium is not asymptotically stable, because the second condition, which states that any solution path that starts close enough to the equilibrium converges to the equilibrium, is not met. However, the first condition, i.e., Liapunov stability, is met: any solution path of the RD that starts sufficiently close to the equilibrium remains arbitrarily close to it. This can be intuitively understood from the plot.
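The arrows of such a direction field can be computed directly. With two strategies per player, Equations (2) and (3) reduce to $dp_1/dt = p_1(1 - p_1)[(Aq)_1 - (Aq)_2]$ and $dq_1/dt = q_1(1 - q_1)[(Bp)_1 - (Bp)_2]$, where $p_1$ and $q_1$ are the probabilities of the first strategy. The sketch below evaluates these expressions on a small grid. The function name `field_2x2` is hypothetical, and the prisoner's dilemma matrix is an assumption reconstructed from the payoffs quoted in the text (1 for mutual defection, 3 for mutual cooperation, 5 and 0 for the mixed outcomes, with defect as first strategy), since Table II is not reproduced here.

```python
def field_2x2(p1, q1, A, B):
    """Right-hand sides of Equations (2) and (3) for a 2 x 2 game.

    p1, q1 : probability that the row / column player uses its first strategy.
    B is indexed by the column player's own strategy first (assumption).
    """
    gain_row = (A[0][0] * q1 + A[0][1] * (1 - q1)) - (A[1][0] * q1 + A[1][1] * (1 - q1))
    gain_col = (B[0][0] * p1 + B[0][1] * (1 - p1)) - (B[1][0] * p1 + B[1][1] * (1 - p1))
    return p1 * (1 - p1) * gain_row, q1 * (1 - q1) * gain_col


# Assumed prisoner's dilemma payoffs, first strategy = defect.
PD = [[1, 5], [0, 3]]
# Battle of the sexes payoffs as printed in Table III.
BOS_A = [[2, 0], [0, 1]]
BOS_B = [[1, 0], [0, 2]]

grid = [0.1, 0.3, 0.5, 0.7, 0.9]

# In the PD every interior arrow points toward more defection.
pd_points_toward_defect = all(
    dp > 0 and dq > 0
    for p1 in grid
    for q1 in grid
    for dp, dq in [field_2x2(p1, q1, PD, PD)]
)

# In the battle of the sexes the mixed point is a rest point,
# but a small nudge is amplified rather than corrected.
mixed_rest = field_2x2(2 / 3, 1 / 3, BOS_A, BOS_B)
nudged = field_2x2(2 / 3 + 0.05, 1 / 3 + 0.05, BOS_A, BOS_B)
print(pd_points_toward_defect, mixed_rest, nudged)
```

The nudged point has both components positive, so the flow moves further away from (2/3, 1/3), which is the numerical counterpart of the observation that the mixed equilibrium is not asymptotically stable.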

3. REINFORCEMENT LEARNING AND MULTI-AGENT SYSTEMS

In this part we discuss the relation between learning and MAS (see Figure 1). Recall from Section 1 that learning and adaptiveness are crucial for the successful application of Multi-Agent Systems to challenging domains such as Robotic Soccer (Stone 2000). In the first section we start with the already established theory of Single-Agent learning. We continue with the more challenging issues of Multi-Agent learning and discuss the different possible approaches.

Figure 3. The direction field of the RD of the Battle of the sexes game using payoff Table III.

3.1. Single Agent Reinforcement Learning

RL is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. A reinforcement learning model consists of:

1. A discrete set of environment states.
2. A discrete set of agent actions.
3. A set of scalar reinforcement signals.

On each step of interaction the agent receives a reinforcement, possibly zero, and some indication of the current state of the environment, and chooses an action. The agent’s job is to find a policy, mapping states to actions, that maximizes some long-run measure of reinforcement. Very often this measure is the discounted cumulative reward.


Figure 4. The direction field of the RD of the third category using payoff Table IV.

In its most general form, the RL problem is that of an agent located in an environment δ trying to maximize a long-term reward by taking actions a from different situations in δ. Figure 5 illustrates this problem statement in more detail. At time step t the agent finds itself in situation (or state) $s_t$. From $s_t$ it takes action $a_t$. The environment reacts and places the agent in situation $s_{t+1}$. By performing action $a_t$ the agent receives an immediate reward $r_t$. The immediate reward depends on the action taken, on the next state, or on both. To choose an action $a_t$ from a particular state $s_t$ at time step t the agent uses a policy $\pi_t$, with $\pi_t(s, a)$ the probability that in state s at time step t action a will be performed.

Figure 5. The reinforcement learning model.

Common reinforcement learning methods, which can be found in Sutton et al. (2000), are structured around estimating value functions. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state. More formally we have:

$$V^{\pi}(s) = E_{\pi}\{R_t \mid s_t = s\} = E_{\pi}\Big\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\Big|\, s_t = s\Big\}$$

Rewards further away in the future are discounted by γ with 0 < γ < 1. One way to find the optimal policy is to find the optimal value function. If a perfect model of the environment as a Markov decision process is known, the optimal value function can be calculated using Dynamic Programming (DP) techniques. In DP two major approaches exist, i.e., value-iteration and policy-iteration. Both approaches have their counterparts in RL, which can be considered as model-free4 stochastic approximation methods of the DP techniques.

Q-learning is a well-known RL technique that belongs to the value-iteration class. It learns an evaluation function for each situation-action pair. This function Q is defined by

$$Q^{\pi}(s, a) = E_{\pi}\{R_t \mid s_t = s, a_t = a\} = E_{\pi}\Big\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\Big|\, s_t = s, a_t = a\Big\}$$

Thus, the Q-function expresses the expected reward if an agent takes action a in state s and then continues with policy π. Based on its experience, the agent can iteratively improve this evaluation function and as such adapt the policy π to the ideal policy $\pi^*$, which maximizes the long-term reward. The relation between $V^{\pi}(s)$ and $Q^{\pi}(s, a)$ is:

$$V^{\pi}(s) = \max_a Q^{\pi}(s, a)$$


Figure 6. Classification of RL-algorithms.

The update rule used in standard Q-learning is given below:

$$Q_{t+1}(s, a) \leftarrow (1 - \alpha)\, Q_t(s, a) + \alpha \big(r + \gamma \max_{a'} Q_t(s', a')\big) \qquad (4)$$

where $Q_t(s', a')$ and $Q_{t+1}(s, a)$ are the estimates of the state-action value at time steps t and t + 1 respectively, α is the learning rate, and s' is the state in which the agent arrives at time step t + 1 after taking action a in situation s.

Actor-critic methods are RL techniques that belong to the policy-iteration class. These methods keep track of a current policy, the actor, and an estimate of the corresponding state-value function. The critic is based on this estimated value function. It generates a Temporal Difference error, which in its simplest form is given by $e_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$. When the error is positive the tendency to repeat the action taken in state $s_t$ will be reinforced, otherwise it will be weakened. A learning cycle goes as follows. At time step t the agent is in a certain state $s_t$; for that state the current best action is chosen, with some degree of exploration. The selected action brings the agent to a new state $s_{t+1}$. If the new state $s_{t+1}$ looks better, i.e., $e_t > 0$, then the action $a_t$ for state $s_t$ will be strengthened, e.g., by increasing its probability of being selected; otherwise this probability will be decreased (Sutton et al. 2000). Figure 6 illustrates the classification of RL-methods.
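To make the two update signals concrete, the following minimal sketch implements Equation (4) for a tabular Q-function and the simplest Temporal Difference error used by the critic. The dictionary layout, the function names `q_update` and `td_error`, and the numerical values are illustrative assumptions, not part of the original text.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply Equation (4) to a tabular Q-function stored as Q[state][action]."""
    best_next = max(Q[s_next].values())  # max over a' of Q_t(s', a')
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * best_next)
    return Q[s][a]


def td_error(V, s, r, s_next, gamma=0.9):
    """Simplest critic signal: e_t = r_{t+1} + gamma * V(s_{t+1}) - V(s_t)."""
    return r + gamma * V[s_next] - V[s]


# Hypothetical two-state example.
Q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
new_value = q_update(Q, "s0", "right", r=1.0, s_next="s1", alpha=0.5, gamma=0.9)

V = {"s0": 0.5, "s1": 2.0}
error = td_error(V, "s0", r=1.0, s_next="s1", gamma=0.9)
print(new_value, error)
```

Here the Q-value of ("s0", "right") moves halfway toward the target 1 + 0.9 · 2 = 2.8, and the positive error would lead an actor-critic learner to strengthen the action taken in "s0".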


3.2. Multi-Agent Reinforcement Learning

The original reinforcement learning algorithms mentioned above were designed to be used in a single agent setting. When applied to Markovian decision problems, most RL techniques come with a formal proof stating that under very acceptable conditions they are guaranteed to converge to the optimal policy; for Q-learning, for instance, see Tsitsiklis (1993). There has also been considerable effort to extend these RL techniques to Partially Observable Markovian Decision Problems and other non-Markovian settings (Loch et al. 1998; Pendrith et al. 1998; Kaelbling et al. 1996; Perkins et al. 2002).

The extension to multi-agent learning has recently received more attention. It is clear that the actions taken by one agent might affect the response characteristics of the environment. So we can no longer assume that the Markovian property holds. In the domain of Learning Automata, this is referred to as state dependent non-stationarity (Narendra et al. 1989).

When applying RL to a multi-agent case, two extreme approaches can be taken. The first one totally neglects the presence of the other agents, and agents are considered to be selfish reinforcement learners. The effects caused by the other agents also acting in that same environment are considered as noise. It is clear that for problems where agents have to coordinate in order to reach a situation that is preferable for both agents, this will not yield satisfactory results (Hu et al. 1999). The other extreme is the joint action space approach, where the state and action space are respectively defined as the Cartesian product of the agents’ individual state and action spaces. More formally, if S is the set of states and $A_1, \ldots, A_n$ are the action sets of the different agents, the learning will be performed in the product space

$$S \times A_1 \times \cdots \times A_n \to \mathbb{R} \qquad (5)$$

This implies that the state information is shared amongst the agents and actions are taken and evaluated synchronously. It is obvious that this approach leads to very big state-action spaces, and assumes instant communication between the agents. Clearly this approach is in contrast with the basic principles of multi-agent systems: distributed control, asynchronous actions, incomplete information, cost of communication. In between these approaches we can find examples which try to overcome the drawbacks of the joint action approach (Litmann et al. 1994; Claus et al. 1998; Jafari et al. 2001; Nowé et al. 1999).

Below we describe Cross learning, which can be considered as multi-agent RL and is important in clarifying the relationship between RL and EGT. Cross learning is a less complex model than Q-learning and even Learning Automata (LA, see Section 3.2.2), in the sense that it does not require an initialisation and that there are not as many parameters to fine-tune as in LA and Q-learning. Cross learning does not consider a learning rate, a discount factor and an exploration strategy5 as Q-learning does, nor does it need reward and penalty parameters6 as LA does.

3.2.1. Cross Learning

The Cross learning model is a special case of the standard reinforcement learning model (Sarin et al. 1997). The model considers several agents playing the same normal form game repeatedly in discrete time. At each point in time, each player is characterized by a probability distribution over his strategy set which indicates how likely he is to play any of his strategies. The probabilities change over time in response to experience. At each time step (indexed by n), a player chooses one of its strategies based on the probabilities attached to each individual strategy. Positive payoffs represent reinforcing experiences, which induce a player to increase the probability of the strategy chosen. So, the larger the payoff, the larger the increase and thus the stronger the reinforcement. As a result a player can be represented by a probability vector:

$$p(n) = (p_1(n), \ldots, p_r(n))$$

In case of a 2-player game with payoff matrix U, player 1 gets payoff $U_{ij}$ when he chooses strategy i and player 2 chooses strategy j.

TABLE V
A payoff table U

$$U = \begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix}$$

We assume that

$$0 < U_{ij} < 1 \qquad (6)$$

In this case there is no deterrence. The iterations of the game are indexed by n ∈ N. Players do not observe each other’s strategies and payoffs and play the game repeatedly. After making their observations at each stage, they update their probability vector according to

$$p_i(n + 1) = U_{ij} + (1 - U_{ij})\, p_i(n) \qquad (7)$$

$$p_{i'}(n + 1) = (1 - U_{ij})\, p_{i'}(n) \qquad (8)$$
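Equations (7) and (8) can be sketched in a few lines; here i is the strategy just played and i' ranges over all other strategies. The function names `cross_update` and `play_round`, the payoff matrices `U_row` and `U_col`, and the convention that `U_col[i][j]` is the column player's payoff when the row player uses i and the column player uses j are assumptions made for this illustration. All payoffs are taken to lie strictly between 0 and 1, as required by Equation (6).

```python
import random


def cross_update(p, chosen, payoff):
    """Equations (7) and (8): reinforce `chosen`, shrink every other strategy.

    `payoff` plays the role of U_ij and is assumed to lie in (0, 1).
    """
    return [
        payoff + (1 - payoff) * p_k if k == chosen else (1 - payoff) * p_k
        for k, p_k in enumerate(p)
    ]


def play_round(p, q, U_row, U_col, rng=random):
    """One repetition of a two-player game between two Cross learners."""
    i = rng.choices(range(len(p)), weights=p)[0]  # row player's sampled strategy
    j = rng.choices(range(len(q)), weights=q)[0]  # column player's sampled strategy
    return cross_update(p, i, U_row[i][j]), cross_update(q, j, U_col[i][j])


# Hypothetical payoffs in (0, 1), for illustration only.
U_row = [[0.2, 0.9], [0.1, 0.6]]
U_col = [[0.2, 0.1], [0.9, 0.6]]

updated = cross_update([0.5, 0.5], chosen=0, payoff=0.2)
p_next, q_next = play_round([0.5, 0.5], [0.5, 0.5], U_row, U_col, random.Random(0))
print(updated, p_next, q_next)
```

The update is a convex combination of the old vector and the unit vector of the chosen strategy, so the result remains a probability vector without any explicit renormalisation.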

Equation (7) expresses how the probability of the selected strategy (i) is updated and Equation (8) expresses how all the other strategies i' ≠ i are corrected. If the player played strategy i in the nth repetition of the game, and if he received payoff $U_{ij}$, then he updates his state by taking a weighted average of the old state and of the unit vector which puts all probability on strategy i. The probability vector q(n) for the second player,

$$q(n) = (q_1(n), \ldots, q_s(n))$$

is updated in an analogous manner. This entire system of equations defines a stochastic update process for the players {p(n), q(n)}. This process is called the “Cross learning process” in Sarin et al. (1997). Börgers and Sarin showed that, in an appropriately constructed continuous time limit, this model converges to the asymmetric, continuous time version of the replicator dynamics (see Section 2.3).

3.2.2. Learning Automata

Learning Automata have their origins in mathematical psychology (Bush et al. 1955). Originally, Learning Automata were deterministic and based on complete knowledge of the environment. Later developments introduced uncertainties in the system and the environment and led to the stochastic automaton. More precisely, the stochastic automaton tries to provide a solution to the learning problem without initially having any information on the optimal action. It starts with equal probabilities on all actions, and during the learning process these probabilities are updated based on responses from the environment.

In Figure 7 a Learning Automaton is illustrated in its most general form. The environment is represented by a triple {α, c, β}, where α represents a finite action set, β represents the response set of the environment, and c is a vector of penalty probabilities, where each component $c_i$ corresponds to an action $\alpha_i$. The response β from the environment can take on 2 values, $\beta_1$ or $\beta_2$. Often they are chosen to be 0 and 1, where 1 is associated with a penalty response and 0 with a reward. Now, the penalty probabilities can be defined as

$$c_i = P(\beta(n) = 1 \mid \alpha(n) = \alpha_i) \qquad (9)$$

So $c_i$ is the probability that action $\alpha_i$ will result in a penalty response. If these probabilities are constant, the environment is called stationary.


Figure 7. A Learning Automaton – Environment pair.

Several models are distinguished by the response set of the environment. Models in which the response β can only take 2 values are called P-models. Models which allow a finite number of values in a fixed interval are called Q-models. When β is a continuous random variable in a fixed interval, the model is called an S-model.

In a variable structure stochastic automaton7 action probabilities are updated at every stage using a reinforcement scheme. The vector p is the action probability vector over the possible actions, as with Cross learning in the previous section. Important examples of update schemes are linear reward-penalty, linear reward-inaction and linear reward-ε-penalty. The philosophy of those schemes is essentially to increase the probability of an action when it results in a success and to decrease it when the response is a failure. The general algorithm at time step n + 1 is given by:

$$p_i(n + 1) = p_i(n) + a(1 - \beta(n))(1 - p_i(n)) - b\,\beta(n)\,p_i(n) \qquad (10)$$

if $\alpha_i$ is the action taken at time n, and

$$p_j(n + 1) = p_j(n) - a(1 - \beta(n))\,p_j(n) + b\,\beta(n)\,[(r - 1)^{-1} - p_j(n)] \qquad (11)$$

if $\alpha_j \neq \alpha_i$, where r is the number of actions. Equation (10) is the update rule for the performed action $\alpha_i$ and Equation (11) for all the other actions. The constants a and b are the reward and penalty parameters respectively. When a = b the algorithm is referred to as linear reward-penalty ($L_{R-P}$), when b = 0 it is referred to as linear reward-inaction ($L_{R-I}$), and when b is small compared to a it is called linear reward-ε-penalty ($L_{R-\varepsilon P}$).
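The scheme of Equations (10) and (11) can be sketched as a single function. The name `la_update` and the numerical values are illustrative assumptions; β = 0 stands for a reward and β = 1 for a penalty, as in the P-model above, and intermediate values correspond to an S-model.

```python
def la_update(p, chosen, beta, a, b):
    """Linear reward-penalty family, Equations (10) and (11).

    p      : action probability vector of length r.
    chosen : index of the action taken at time n.
    beta   : environment response (0 = reward, 1 = penalty).
    a, b   : reward and penalty parameters.
    """
    r = len(p)
    new_p = []
    for j, p_j in enumerate(p):
        if j == chosen:
            # Equation (10): performed action.
            new_p.append(p_j + a * (1 - beta) * (1 - p_j) - b * beta * p_j)
        else:
            # Equation (11): all other actions.
            new_p.append(p_j - a * (1 - beta) * p_j + b * beta * (1 / (r - 1) - p_j))
    return new_p


p = [0.5, 0.3, 0.2]
after_reward = la_update(p, chosen=0, beta=0, a=0.1, b=0.05)
after_penalty = la_update(p, chosen=0, beta=1, a=0.1, b=0.05)

# Linear reward-inaction (b = 0) with a = 1 and feedback 1 - beta = 0.2.
cross_like = la_update([0.5, 0.5], chosen=0, beta=0.8, a=1.0, b=0.0)
print(after_reward, after_penalty, cross_like)
```

With b = 0 and a = 1, and with the feedback 1 − β(n) read as a payoff in (0, 1), the last call returns the same vector as the Cross learning update of Equations (7) and (8) for a payoff of 0.2.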


Figure 8. Automata Game representation.

If the penalty probabilities $c_i$ of the environment are constant, the probability p(n + 1) is completely determined by p(n) and hence $\{p(n)\}_{n>0}$ is a discrete-time homogeneous Markov process. Convergence results for the different schemes are obtained under the assumption of constant penalty probabilities, see Narendra et al. (1989).

Learning automata can also be connected in useful ways. A simple example of a multi-agent system modeled as an automata game is shown in Figure 8. A play $\alpha(t) = (\alpha^1(t), \ldots, \alpha^n(t))$ of n automata is a set of strategies chosen by the automata at stage t. Correspondingly, the outcome is now a vector $\beta(t) = (\beta^1(t), \ldots, \beta^n(t))$. At every instance all automata update their probability distributions based on the responses of the environment. Each automaton participating in the game operates without information concerning payoff, the number of participants, their strategies or actions.

4. REINFORCEMENT LEARNING AND EVOLUTIONARY GAME THEORY

In this section we discuss the relation between Reinforcement Learning and EGT. We show how the 2 fields are formally related, with results from economics and computer science. Some examples will illustrate the strength of these results. More precisely, we show some examples of the dynamics of Q-learning and LA and show how EGT can be extended to be used as a formal foundation for the construction of new RL algorithms for MAS. The first subsection summarizes some main results from economics, and the second deals with extensions of these results. The third subsection shows how these formal relations between RL and EGT can be used as a foundation for modeling new RL algorithms for MAS, i.e., as an initial framework for RL in MAS.


Figure 9. Left: The direction field of the RD of the prisoner’s dilemma game. Right: The paths induced by the learning process.

4.1. The Formal Relation between Cross Learning and EGT

In their paper, Learning through Reinforcement and Replicator Dynamics, Börgers and Sarin prove an interesting link between EGT and Reinforcement Learning (Sarin et al. 1997). More precisely, they considered a version of the Bush and Mosteller (Bush et al. 1955) stochastic learning theory in the context of games and proved that, in a continuous time limit, the learning model converges to the asymmetric continuous time replicator equations8 of EGT. With this result they provided a formalization of the relation between learning at the individual level and biological evolution. The version of the learning model of Bush and Mosteller is called Cross learning and has been explained in Section 3.2.1.

It is important to note that each time interval has to see many iterations of the game, and that the adjustments which players make between two iterations of the game are very small. If the limit is constructed in this manner, a law of large numbers can be applied, and the learning process converges, in the limit, to the replicator dynamics. It is important to understand that this result refers to arbitrary points in finite time. The result does not hold if infinite time is considered. The asymptotic behaviour of the discrete time learning process, for time tending to infinity, can be quite different from the asymptotic behaviour of the continuous time RD. For the mathematical proof of this result we refer the interested reader to Sarin et al. (1997).

Before continuing this discussion we illustrate this result with the prisoner’s dilemma game. In Figure 9 we plotted the direction field of the RD and the Cross learning process for this game. More precisely, the figure on the left illustrates the direction field of the replicator dynamics and the figure on the right shows the learning process of Cross. We plotted for both players the probability of choosing their first strategy (in this case defect). As you can see, the sample paths of the reinforcement learning process approximate the paths of the RD.

As mentioned before, the result of Börgers and Sarin only holds for a point in time t with t < ∞. It does not apply to the asymptotic behaviour for t → ∞. Moreover, the asymptotic behaviour of the learning process may be very different from that of the continuous RD. To show this we recall a result of Börgers and Sarin concerning the discrete time learning process. This result says that, with probability 1, the learning process will converge to a limit in which both players play some pure strategy. For a mathematical proof of this proposition we refer to Sarin et al. (1997). Recall from Section 2.3.4 that in the third category of games, the RD circle around the mixed Nash equilibrium. Figure 10 now clearly illustrates that in this type of game the asymptotic behaviour differs between the two models.

Figure 10. Left: The direction field of the RD. Right: The paths induced by the learning process.

4.2. Extending the Formal Relation to Other RL-Models

4.2.1. Learning Automata and Evolutionary Dynamics

In Tuyls et al. (2002) the authors show that the Cross learning model is a Learning Automaton with a linear reward-inaction updating scheme. To provide the reader with an intuition for this relation we briefly describe the mathematical relation between both learning models. More details and a variety of experiments can be found in Tuyls et al. (2002). To show that the Cross learning model is a special case of LA, we need to relate Equations (7) and (8) to Equations (10) and (11).


If it is assumed that b = 0 in Equations (10) and (11), the relation between both models becomes apparent. We now see that the Cross learning model is in fact a special case of the reward-inaction update scheme. When the reward parameter a = 1, the feedback from the environment (1 − β(n)) equals the game reward $U^R_{jk}$. Hence the equations become equivalent. The experiments in Tuyls et al. (2002) have been conducted with both the linear reward-inaction update scheme and the linear reward-ε-penalty update scheme.

4.2.2. Q-Learning and Evolutionary Dynamics

In this paragraph we briefly9 describe the relation between Q-learning and the RD. More precisely, we present the dynamical system of Q-learning. These equations are derived by constructing a continuous time limit of the Q-learning model, where Q-values are interpreted as Boltzmann probabilities for the action selection. Again we consider games between 2 players. The equations for the first player are

$$\frac{dx_i}{dt} = x_i \alpha \tau \big((Ay)_i - x \cdot Ay\big) + x_i \alpha \sum_j x_j \ln\frac{x_j}{x_i} \qquad (12)$$

and analogously for the second player we have

$$\frac{dy_i}{dt} = y_i \alpha \tau \big((Bx)_i - y \cdot Bx\big) + y_i \alpha \sum_j y_j \ln\frac{y_j}{y_i} \qquad (13)$$

Equations (12) and (13) express the dynamics of both Q-learners in terms of Boltzmann probabilities.10 Each agent (or player) has a probability vector over his action set, more precisely $x_1, \ldots, x_n$ over action set $a_1, \ldots, a_n$ for the first player and $y_1, \ldots, y_m$ over $b_1, \ldots, b_m$ for the second player. For a complete discussion of these equations we refer to Tuyls et al. (2003b). Comparing (12) or (13) with the RD in (1), we see that the first term of (12) or (13) is exactly the RD of EGT and thus takes care of the selection mechanism, see Weibull (1996). The second term turns out to be a mutation term, and can be rewritten as:

$$x_i \alpha \Big(\sum_j x_j \ln(x_j) - \ln(x_i)\Big) \qquad (14)$$
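The split of Equation (12) into a selection term and a mutation term can be checked numerically. The sketch below evaluates the right-hand side of Equation (12) for the first player, using the rewritten mutation term of Equation (14). The function name `q_learning_dynamics` and the parameter values are illustrative assumptions, and every $x_i$ must be strictly positive for the logarithms to exist.

```python
import math


def q_learning_dynamics(x, y, A, alpha, tau):
    """Right-hand side of Equation (12) for the first player.

    x, y  : probability vectors of the first and second player (all x_i > 0).
    A     : payoff matrix of the first player.
    alpha : learning rate; tau : temperature parameter of the Boltzmann selection.
    """
    Ay = [sum(a * y_j for a, y_j in zip(row, y)) for row in A]
    avg = sum(x_i * f for x_i, f in zip(x, Ay))       # x . Ay
    x_log_x = sum(x_j * math.log(x_j) for x_j in x)   # sum_j x_j ln x_j
    selection = [x_i * alpha * tau * (f - avg) for x_i, f in zip(x, Ay)]
    mutation = [x_i * alpha * (x_log_x - math.log(x_i)) for x_i in x]  # Equation (14)
    return [s + m for s, m in zip(selection, mutation)]


A = [[2, 0], [0, 1]]  # row player's battle of the sexes payoffs (Table III)

# With tau = 0 only the mutation term acts: mass flows from the
# frequent strategy to the rare one.
only_mutation = q_learning_dynamics([0.9, 0.1], [0.5, 0.5], A, alpha=0.1, tau=0.0)

# At a uniform x the mutation term vanishes, and against y = (1/3, 2/3)
# the selection term vanishes as well.
at_rest = q_learning_dynamics([0.5, 0.5], [1 / 3, 2 / 3], A, alpha=0.1, tau=2.0)
print(only_mutation, at_rest)
```

The mutation term sums to zero over all strategies, so it redistributes probability mass without leaving the simplex, pushing the distribution toward uniformity.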

In Equation (14) we recognize 2 entropy terms, one over the entire probability distribution x, and one over strategy $x_i$. Relating entropy and mutation is not new. It is a well-known fact (Schneider 2000; Stauffer 1999) that mutation increases entropy. In Stauffer (1999), it is stated that the concepts are familiar from thermodynamics in the following sense: the selection mechanism is analogous to energy and mutation to entropy. So generally speaking, mutations tend to increase entropy. Exploration can be considered as the mutation concept, as both concepts take care of providing variety. Figure 11 illustrates the dynamics of Q-learning in the battle of the sexes game. The direction field is plotted for three values of the temperature τ.

Figure 11. The direction field plots of the battle of the sexes (subclass 2) game with τ = 1, 2, 10.

4.3. Extended Replicator Dynamics (ERD): Using the Initial Framework

In Tuyls et al. (2003c) the authors changed the RD into a new kind of dynamics, i.e., the Extended Replicator Dynamics (ERD). The reasons for changing the RD become clear from Section 4.1 and Tuyls et al. (2003a, b). In one-state games it is impossible for Cross learning and Learning Automata to guarantee convergence to a stable Nash equilibrium in all types of games. In Boltzmann Q-learning a Nash equilibrium can be attained, but there is no guarantee of stability.

For the development of an adapted selection dynamics, we took the Replicator dynamics and its interpretation as a starting point. In RD, the probabilities a player has over its strategies are changed greedily with respect to payoff in the present. In this section a method is shown to change these probabilities over strategies not only with respect to payoff growth in the present but also with respect to payoff growth in the future. We call those players that act so as to optimize future payoff extended Cross learners, and the associated class of dynamics extended dynamics. There are of course different ways to build such extended players. The most obvious is to use a linear approximation of the evolution of fitness in time. This is the approach we use here. For the ERD we compose the following equation f:

$$f(x) = RD(x) + \eta\, \frac{dRD(x)}{dt} \qquad (15)$$

EVOLUTIONARY GAME THEORETIC PERSPECTIVE

325

where RD(x) is, (16)

dxi = [(Ax)i − x · Ax]xi dt

and η is the parameter that determines how far in the future we need to look. The composition of Equation (15) can best be understood as follows. When using the classical replicator equations (i.e., RD(x)), we act greedily toward payoff in the present. When adding our second term, (17)

(dRD(x)/dt) ∗ η

we act greedily toward payoff in the future. From an analytical point of view, the second term gives actions that are gaining fitness (whether their fitness is negative or positive) a positive push toward a higher chance of getting selected. On the other hand, actions that are losing fitness (again, whether their fitness is negative or positive) are given a negative push toward a lower chance of getting selected. This extends the traditional replicator equations. This extended evolutionary dynamics succeeds in converging to a stable Nash equilibrium in all three categories of 2 × 2 games. In Tuyls et al. (2003c) we also constructed a model-free RL algorithm, based on Cross learning, which behaves as the ERD. Experiments confirming this can be found in Tuyls et al. (2003c). Here we show an experiment for the third category of games. Recall the behaviour of this type of game from Section 4.1; our ERD differs from it in an important way: the ERD and the extended Cross learning algorithm will not circle but converge to the mixed Nash equilibrium. This is illustrated in Figure 12. Moreover the equilibrium is stable, meaning that the learning process will not abandon it. The long-run learning dynamics are illustrated in the figure on the right.
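The construction in Equations (15) to (17) can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it uses the single-population form of Equation (16) exactly as written, and it assumes that the time derivative dRD(x)/dt is evaluated along the replicator flow by the chain rule, i.e. as the Jacobian of RD multiplied by RD(x). The function names and the example payoff matrices are hypothetical.

```python
def replicator(A, x):
    """Equation (16): RD_i(x) = [(Ax)_i - x.Ax] * x_i."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    avg = sum(x[i] * Ax[i] for i in range(n))
    return [(Ax[i] - avg) * x[i] for i in range(n)]


def replicator_jacobian(A, x):
    """Partial derivatives d RD_i / d x_k of Equation (16)."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    ATx = [sum(A[j][k] * x[j] for j in range(n)) for k in range(n)]
    avg = sum(x[i] * Ax[i] for i in range(n))
    # d(x.Ax)/dx_k = (Ax)_k + (A^T x)_k
    J = [[x[i] * (A[i][k] - Ax[k] - ATx[k]) for k in range(n)] for i in range(n)]
    for i in range(n):
        J[i][i] += Ax[i] - avg
    return J


def extended_replicator(A, x, eta):
    """Equation (15): f(x) = RD(x) + eta * dRD(x)/dt.

    Assumption: dRD/dt is taken along the replicator flow, so that
    dRD/dt = J(x) RD(x) by the chain rule.
    """
    n = len(x)
    rd = replicator(A, x)
    J = replicator_jacobian(A, x)
    drd_dt = [sum(J[i][k] * rd[k] for k in range(n)) for i in range(n)]
    return [rd[i] + eta * drd_dt[i] for i in range(n)]


# Hypothetical 2 x 2 payoff matrix and starting point.
A = [[1, 0], [0, 2]]
x = [0.25, 0.75]
print(replicator(A, x))                # [-0.234375, 0.234375]
print(extended_replicator(A, x, 0.5))  # [-0.22705078125, 0.22705078125]
```

With η = 0 the sketch reduces to the classical replicator equations, and the components of f(x) sum to zero on the simplex, so the extended dynamics still describe a probability distribution over strategies.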

Figure 12. Left: The direction field of the RD. Right: The paths induced by the learning process.

5. EVOLUTIONARY GAME THEORY AND MULTI-AGENT SYSTEMS

In this section we discuss the most interesting properties that link the fields of EGT and MAS. Traditional game theory is an economic theory that models interactions between rational agents as games of two or more players that can choose from a set of strategies and the corresponding preferences. It is the mathematical study of interactive decision making in the sense that the agents involved in the decisions take into account their own choices and those of others. Choices are determined by

1. stable preferences concerning the outcomes of their possible decisions,
2. strategic behaviour: agents take into account the relation between their own choices and the decisions of other agents.

Typical for the traditional game theoretic approach is to assume perfectly rational players who try to find the most rational strategy to play. These players have perfect knowledge of the environment and the payoff tables and they try to maximize their individual payoff. These assumptions made by classical game theory simply do not apply to the real world, and to Multi-Agent settings in particular.

In contrast, EGT is descriptive and starts from more realistic views of the game and its players. A game is not played only once, but repeatedly with changing opponents that, moreover, are not completely informed, sometimes misinterpret each other’s actions, and are not completely rational but also biologically and sociologically conditioned. Under these circumstances, it becomes impossible to judge what choices are the most rational ones. The question now becomes how a player can learn to optimize its behaviour and maximize its return. For this learning process, mathematical models are developed, e.g., replicator equations.

Summarizing the above we can say that EGT describes how boundedly rational agents can make decisions in complex environments, in which they interact with other agents. Bounded rationality means that agents are limited in their computational resources and in their ability to reason, and have limited information. In such complex environments software agents must be able to learn from their environment and adapt to its non-stationarity.

The basic properties of a Multi-Agent System correspond exactly with those of EGT. First of all, a MAS is made up of interactions between two or more agents, with each trying to accomplish a certain (conflicting) goal. No agent is guaranteed to be completely informed about the other agents’ intentions or goals, nor is it guaranteed to be completely informed about the complete state of the environment. Of great importance is that EGT offers us a solid basis to understand dynamic iterative situations in the context of strategic games. A MAS has a typical dynamical character, which makes it hard to model and brings along a lot of uncertainty. At this stage EGT seems to offer us a helping hand in understanding these typical dynamical processes in a MAS and modeling them in simple settings as iterative games of two or more players.

6. CONCLUSIONS

By starting with a concise overview of key concepts from EGT and their mutual relationships, we provided an evolutionary game theoretic point of view on learning and MAS. By introducing the triangular relation between MAS, RL and EGT we formalized this perspective into one that results in new insights in MAS and more particularly for the indispensable concept of learning in MAS. These new insights make it possible to overcome important daily occurrences and crucial learning issues in MAS. In this relation, EGT provides the necessary basic foundations for understanding and analyzing MAS. Because of the similarities between the two fields (recall Section 5), EGT is vital to MAS as a framework for learning and today’s applications. This is the first link of Figure 1. The second link provided the mathematical formalization of the relation between RL and EGT. Again this relation offers a better understanding of learning in a MAS and provides basic mechanisms toward learning algorithms with more capabilities in the future (Tuyls et al. 2003c). More precisely, in the case of 1-state games this means stable strategies which are possibly mixed. Obviously, this opens the door toward multi-state games, and MAS. These two first links of Figure 1 are the keys toward solving the issues in the third link between RL and MAS. These main difficulties in MAS were described in depth in Section 3. It is our belief that today’s MAS applications require agents to be adaptive and hence will benefit from this evolutionary game theoretic perspective.

NOTES

∗ Author funded by a doctoral grant from the Institute for Advancement of Scientific Technological Research in Flanders (IWT).
1 The Markov property states that only the present state gives any information of the future behaviour of the learning process. Knowledge of the history of the process does not add any new information.
2 A strategy is dominant if it is always better than any other strategy, regardless of what the opponent may do.
3 Of course a solution orbit can evolve toward the boundary of the simplex as time goes to infinity, and thus in the limit, when the distance to the boundary goes to zero, a pure strategy can disappear from the population of strategies. For a more formal explanation, we refer the reader to Weibull (1996).
4 A model consists of knowledge of the state transition probability function T(s, a, s′), which is the probability of ending up in state s′ after taking action a in state s, and the reinforcement function R(s, a), which is the payoff for taking action a in state s.
5 For instance in Boltzmann Q-learning the temperature determines the degree of exploration.
6 Parameters a and b in Equations (10) and (11).
7 As opposed to fixed structure learning automata, where state transition probabilities are fixed and have to be chosen according to the response of the environment and to perform better than a pure-chance automaton in which every action is chosen with equal probability.
8 Recall from Section 2.3.2 the definition of the asymmetric RD.
9 We refer the reader who is interested in the complete derivation of the dynamics of Q-learning to Tuyls et al. (2003b).
10 Formally the Boltzmann distribution is described by

$$x_i(k) = \frac{e^{\tau Q_{a_i}(k)}}{\sum_{j=1}^{n} e^{\tau Q_{a_j}(k)}}$$

where x_i(k) is the probability of playing strategy i at time step k and τ is the temperature.

REFERENCES

Bazzan, A. L. C. and F. Klugl: 2003, ‘Learning to Behave Socially and Avoid the Braess Paradox in a Commuting Scenario’, in Proceedings of the First International Workshop on Evolutionary Game Theory for Learning in MAS, Melbourne, Australia.
Bazzan, A. L. C.: 1997, A Game-Theoretic Approach to Coordination of Traffic Signal Agents, Ph.D. thesis, University of Karlsruhe.
Börgers, T. and R. Sarin: 1997, ‘Learning through Reinforcement and Replicator Dynamics’, Journal of Economic Theory 77(1).
Braess, D.: 1968, ‘Über ein Paradoxon aus der Verkehrsplanung’, Unternehmensforschung 12, 258.
Bush, R. R. and F. Mosteller: 1955, Stochastic Models for Learning, Wiley, New York.
Claus, C. and C. Boutilier: 1998, ‘The Dynamics of Reinforcement Learning in Cooperative Multi-Agent Systems’, in Proceedings of the 15th International Conference on Artificial Intelligence, pp. 746–752.
Ghosh, A. and S. Sen: 2003, ‘Learning TOMs: Convergence to Non-Myopic Equilibria’, in Proceedings of the First International Workshop on Evolutionary Game Theory for Learning in MAS, Melbourne, Australia.
Gintis, C. M.: 2000, Game Theory Evolving, University Press, Princeton.


Hirsch, M. W. and S. Smale: 1974, Differential Equations, Dynamical Systems and Linear Algebra, Academic Press, Inc.
Hofbauer, J. and K. Sigmund: 1998, Evolutionary Games and Population Dynamics, Cambridge University Press.
Hu, J. and M. P. Wellman: 1998, Multiagent Reinforcement Learning in Stochastic Games, Cambridge University Press.
Jafari, C., A. Greenwald, D. Gondek, and G. Ercal: 2001, ‘On No-Regret Learning, Fictitious Play, and Nash Equilibrium’, in Proceedings of the Eighteenth International Conference on Machine Learning, pp. 223–226.
Kaelbling, L. P., M. L. Littman, and A. W. Moore: 1996, ‘Reinforcement Learning: A Survey’, Journal of Artificial Intelligence Research.
Littman, M. L.: 1994, ‘Markov Games as a Framework for Multi-Agent Reinforcement Learning’, Proceedings of the Eleventh International Conference on Machine Learning, pp. 157–163.
Loch, J. and S. Singh: 1998, ‘Using Eligibility Traces to Find the Best Memoryless Policy in a Partially Observable Markov Process’, Proceedings of the Fifteenth International Conference on Machine Learning, San Francisco.
Luck, M., P. McBurney, and C. Preist: 2003, ‘A Roadmap for Agent Based Computing’, AgentLink, Network of Excellence.
Maynard-Smith, J.: 1982, Evolution and the Theory of Games, Cambridge University Press.
Maynard Smith, J. and G. R. Price: 1973, ‘The Logic of Animal Conflict’, Nature 146, 15–18.
Narendra, K. and M. Thathachar: 1989, Learning Automata: An Introduction, Prentice-Hall.
Nowé, A., J. Parent, and K. Verbeeck: 2001, ‘Social Agents Playing a Periodical Policy’, in Proceedings of the 12th European Conference on Machine Learning, pp. 382–393.
Nowé, A. and K. Verbeeck: 1999, ‘Distributed Reinforcement Learning, Loadbased Routing a Case Study’, Notes of the Neural, Symbolic and Reinforcement Methods for Sequence Learning Workshop at ijcai99, Stockholm, Sweden.
von Neumann, J. and O. Morgenstern: 1944, Theory of Games and Economic Behaviour, Princeton University Press, Princeton.
Osborne, J. O. and A. Rubinstein: 1994, A Course in Game Theory, MIT Press, Cambridge, MA.
Pendrith, M. D. and M. J. McGarity: 1998, ‘An Analysis of Direct Reinforcement Learning in Non-Markovian Domains’, in Proceedings of the Fifteenth International Conference on Machine Learning, San Francisco.
Perkins, T. J. and M. D. Pendrith: 2002, ‘On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains’, in Proceedings of the International Conference on Machine Learning (ICML02).
Redondo, F. V.: 2001, Game Theory and Economics, Cambridge University Press.
Robocup project: 2003, ‘The Official Robocup Website’ at www.robocup.org, Robocup.
Samuelson, L.: 1997, Evolutionary Games and Equilibrium Selection, MIT Press, Cambridge, MA.
Schneider, T. D.: 2000, ‘Evolution of Biological Information’, Journal of Nucleic Acids Research 28, 2794–2799.
Stauffer, D.: 1999, Life, Love and Death: Models of Biological Reproduction and Aging, Institute for Theoretical Physics, Köln, Euroland.
Stone, P.: 2000, Layered Learning in Multi-Agent Systems, MIT Press, Cambridge, MA.


Sutton, R. S. and A. G. Barto: 1998, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
Tsitsiklis, J. N.: 1993, ‘Asynchronous Stochastic Approximation and Q-Learning’, Internal Report from the Laboratory for Information and Decision Systems and the Operation Research Center, MIT Press, Cambridge, MA.
Tuyls, K., T. Lenaerts, K. Verbeeck, S. Maes, and B. Manderick: 2002, ‘Towards a Relation between Learning Agents and Evolutionary Dynamics’, in Proceedings of the Belgium-Netherlands Artificial Intelligence Conference 2002 (BNAIC), KU Leuven, Belgium.
Tuyls, K., K. Verbeeck, and S. Maes: 2003a, ‘On a Dynamical Analysis of Reinforcement Learning in Games: Emergence of Occam’s Razor’, Lecture Notes in Artificial Intelligence, Multi-Agent Systems and Applications III, Lecture Notes in AI 2691, (Central and Eastern European conference on Multi-Agent Systems 2003), Prague, 16–18 June 2003, Czech Republic.
Tuyls, K., K. Verbeeck, and T. Lenaerts: 2003b, ‘A Selection-Mutation Model for Q-Learning in Multi-Agent Systems’, in The ACM International Conference Proceedings Series, Autonomous Agents and Multi-Agent Systems 2003, Melbourne, 14–18 July 2003, Australia.
Tuyls, K., D. Heytens, A. Nowe, and B. Manderick: 2003c, ‘Extended Replicator Dynamics as a Key to Reinforcement Learning in Multi-Agent Systems’, Proceedings of the European Conference on Machine Learning’03, Lecture Notes in Artificial Intelligence, Cavtat-Dubrovnik, 22–26 September 2003, Croatia.
Weibull, J. W.: 1996, Evolutionary Game Theory, MIT Press, Cambridge, MA.
Weibull, J. W.: 1998, ‘What we have Learned from Evolutionary Game Theory so Far?’, Stockholm School of Economics and I.U.I., May 7, 1998.
Weiss, G.: 1999, in Gerard Weiss (ed.), Multiagent Systems. A Modern Approach to Distributed Artificial Intelligence, MIT Press, Cambridge, MA.
Wooldridge, M.: 2002, An Introduction to MultiAgent Systems, John Wiley & Sons, Chichester, England.
Computational Modeling Lab Department of Computer Science Vrije Universiteit Brussel Pleinlaan 2, 1050 Brussels, Belgium E-mail: [email protected]


ROBERT VAN ROOY

EVOLUTION OF CONVENTIONAL MEANING AND CONVERSATIONAL PRINCIPLES

ABSTRACT. In this paper we study language use and language organisation by making use of Lewisean signalling games. Standard game theoretical approaches are contrasted with evolutionary ones to analyze conventional meaning and conversational interpretation strategies. It is argued that analyzing successful communication in terms of standard game theory requires agents to be very rational and fully informed. The main goal of the paper is to show that in terms of evolutionary game theory we can motivate the emergence and self-sustaining force of (i) conventional meaning and (ii) some conversational interpretation strategies in terms of weaker and, perhaps, more plausible assumptions.

1. INTRODUCTION

Synthese 139: 331–366, 2004. Knowledge, Rationality & Action 167–202, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

We all feel that information transfer is crucial for communication. But it cannot be enough: although smoke indicates that there is fire, we would not say that communication is taking place. Also not all transfer of information between humans counts as communication. Incidental information transfer should be ruled out. Intuitively, in order for an event to mean something else, intentionality is crucial. And indeed, Grice (1957) characterizes ‘meaning’ in terms of the communicator’s intentions. To mean something by x, speaker S must intend

(1) S’s action x to produce a certain response a in a certain audience/receiver R;
(2) R to recognize S’s intention (1);
(3) R’s recognition of S’s intention (1) to function as at least part of R’s reason for R’s response a.

The first condition says basically that we communicate something in order to influence the receiver’s beliefs and/or behavior. However, for an act to be a communicative act, the response should be mediated by the audience’s recognition of the sender’s intention, i.e. condition 2. But also the recognition of the speaker’s intention is not sufficient. To see what is missing, consider the following contrasting pair (Grice 1957):

(1) a. A policeman stops a car by standing in its way.
    b. A policeman stops a car by waving.

Although in both examples the first two Gricean conditions are satisfied, we would say that only in (1b) some real communication is going on. The crucial difference between (1a) and (1b), according to Grice (1957), is that only in (1b) the audience’s recognition of the policeman’s intention to stop the car is effective in producing that response. In contrast to the case where he himself stands in the car’s way, the policeman does not regard it as a foregone conclusion that his waving will have the intended effect that the driver stops the car, whether or not the policeman’s intention is recognized. To be able to characterize the contrast between (1a) and (1b) is important to characterize linguistic, or conventional meaning. The difference between (2a) and (2b) seems to be of exactly the same kind.

(2) a. Feeling faint, a child lets its mother see how pale it is (hoping that she may draw her own conclusion and help).
    b. A child says to its mother, “I feel faint”.

In contrast to (1a) and (2a), in the cases (1b) and (2b) an agent communicates something by means of a sign with a conventional meaning (Lewis 1969, 152–159; Searle 1969). But Grice did not really intend to characterize situations where agents intend to influence one another by making use of signals with a conventional meaning. He aimed to account for successful communication even without conventional ways of doing so. According to Grice, third-order intentionality is required for communicative acts: the speaker intends the hearer to recognize that the speaker wants the hearer to produce a particular response. Strawson (1964) and Schiffer (1972) showed by means of some examples that this third-order intentionality is not enough. We can still construct examples where an agent wants her audience to recognize her intention in order to produce a certain effect, without it intuitively being the case that the speaker’s action means the intended response: it can be that the speaker does not want her intention, that R performs the desired action, to become mutually known.1 For an action to be called communicative, the action has to make the speaker’s intention common knowledge.

In this paper we will study both conventional and non-conventional meaning in terms of signalling games as invented by David Lewis and developed further in economics and biology. However, we are going to suggest that in order to successfully communicate information we do not need as much rationality, higher-order intentionality or common knowledge as (explicitly or implicitly) required by Grice, Lewis, Schiffer, and others. Building on work of economists and biologists, we will suggest that evolutionary game theory can be used to account for the emergence and self-perpetuating force of both arbitrary semantic rules and of general functional pragmatic interpretation strategies.

This paper is organized as follows. In Section 2 the analysis of signalling in standard, or rational, game theory is examined. The standard problem here is that of equilibrium selection and in Section 3 Lewis’s (1969) conventional way of solving it is discussed, together with his motivation for why these conventional solutions are self-enforcing. In Section 4 evolutionary game theory is used to provide an alternative motivation for why linguistic conventions remain stable and why some candidate conventions are more natural to emerge than others. According to this alternative motivation we do not need to assume as strong notions of rationality and (common) knowledge as Lewis does. In Sections 5 and 6 we argue that evolutionary signalling games can also be used to motivate why natural languages are organized and used in such an efficient but still reliable way. Reliability (Grice’s maxim of quality) is tackled in Section 5, efficiency (Grice’s (1967) quantity and manner) in Section 6. The paper ends with some conclusions and suggestions for further research.

2. COMMUNICATION PROBLEMS AS SIGNALLING GAMES

2.1. Signalling Games

For the study of information exchange we will consider situations where a speaker has some relevant information that the hearer lacks. The simplest games in which we see this asymmetry are signalling games. A signalling game is a two-player game with a sender, s, and a receiver, r. This is a game of private information: the sender starts off knowing something that the receiver does not know. The sender knows the state t ∈ T she is in but has no substantive payoff-relevant actions.2 The receiver has a range of payoff-relevant actions to choose from but has no private information, and his prior beliefs concerning the state the sender is in are given by a probability distribution P over T; these prior beliefs are common knowledge. The sender, knowing t and trying to influence the action of the receiver, sends to the latter a signal of a certain message m drawn from some set M. The messages do not have a pre-existing meaning. The other player receives this signal, and then takes an action a drawn from a set A. This ends the game. Notice that the game is sequential in nature in the sense that the players do not move simultaneously: the action of the receiver might depend on the signal he received from the sender. For simplicity, we take T, M and A all to be finite. A pure sender strategy, S, is a (deterministic) function from states to signals (messages): S ∈ [T → M], and a pure receiver strategy, R, a (deterministic) function from signals to actions: R ∈ [M → A]. Mixed strategies (probabilistic functions, which allow us to account for ambiguity) will play only a minor role in this paper and can for the most part be ignored.

As an example, consider the following signalling game with two equally likely states: t and t′; two signals that the sender can use: m and m′; and two actions that the receiver can perform: a and a′. Sender and receiver each now have four (pure) strategies:

Sender:      t     t′
    S1       m     m′
    S2       m     m
    S3       m′    m
    S4       m′    m′

Receiver:    m     m′
    R1       a     a′
    R2       a′    a
    R3       a     a
    R4       a′    a′

To complete the description of the game, we have to give the payoffs. The payoffs of the sender and the receiver are given by functions U_s and U_r, respectively, which (for the moment) are elements of [T × A → R], where R is the set of reals. Just like Lewis (1969) we assume (for the moment) that sending messages is costless, which means that we are talking about cheap talk games here. Coming back to our example, we can assume, for instance, that the utilities of sender and receiver are in perfect alignment, i.e., for each agent i, U_i(t, a) = 1 > 0 = U_i(t, a′) and U_i(t′, a′) = 1 > 0 = U_i(t′, a).3

An equilibrium of a signalling game is described in terms of the strategies of both players. If the sender uses strategy S and the receiver strategy R, it is clear how to determine the utility of this profile for the sender, U_s*(t, S, R), in any state t: U_s*(t, S, R) = U_s(t, R(S(t))). Due to his incomplete information, things are not as straightforward for the receiver. Because it might be that the sender using strategy S sends the same signal, m, in different states, the receiver does not necessarily know the unique state relevant to determine his utilities. Therefore, he determines his utilities, or expected utilities, with respect to the set of states in which the speaker could have sent message m. Let us define S_t to be the information state (or information set) the receiver is in after the sender, using strategy S, sends her signal in state t, i.e. S_t = {t′ ∈ T : S(t′) = S(t)}.4 With respect to this set, we can determine the (expected) utility of receiver strategy R in information state S_t, which is R’s expected utility in state t when the sender uses strategy S, U_r*(t, S, R) (where P(t′|S_t) is the conditional probability of t′ given S_t):

$$U_r^*(t, S, R) = \sum_{t' \in T} P(t' \mid S_t) \times U_r(t', R(S(t')))$$

A strategy profile ⟨S, R⟩ forms a Nash equilibrium iff neither the sender nor the receiver can do better by unilateral deviation. That is, ⟨S, R⟩ forms a Nash equilibrium iff for all t ∈ T the following two conditions are obeyed:5

(i) ¬∃S′ : U_s*(t, S, R) < U_s*(t, S′, R),
(ii) ¬∃R′ : U_r*(t, S, R) < U_r*(t, S, R′).

As can be checked easily, our game has 6 Nash equilibria: {⟨S1, R1⟩, ⟨S3, R2⟩, ⟨S2, R3⟩, ⟨S2, R4⟩, ⟨S4, R3⟩, ⟨S4, R4⟩}. This set of equilibria depends on the receiver’s probability function. If, for instance, P(t) > P(t′), then ⟨S2, R4⟩ and ⟨S4, R4⟩ are no equilibria anymore: it is always better for the receiver to perform a.

In signalling games it is assumed that the messages have no pre-existing meaning. However, it is possible that meanings can be associated with them due to the sending and receiving strategies chosen in equilibrium. If in equilibrium the sender sends different messages in different states and also the receiver acts differently on different messages, we can say with Lewis (1969, 147) that the equilibrium pair ⟨S, R⟩ fixes the meaning of expressions in the following way: for each state t, the message S(t) means either S_t = {t′ ∈ T | S(t′) = S(t)} (in the case that the sentence is used indicatively) or R(S(t)) (if the sentence is used imperatively).6 Following standard terminology in economics (Crawford and Sobel 1982), let us call ⟨S, R⟩ a (fully) separating equilibrium if there is a one-to-one correspondence between states (meanings) and messages, i.e., if there exists a bijection between T and M. Notice that among the equilibria in our example, two of them are separating: ⟨S1, R1⟩ and ⟨S3, R2⟩.
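The equilibrium count for the example game can be checked by brute force. The sketch below is only an illustration under stated assumptions: the strategy tables are a reconstruction consistent with the equilibria listed in the text (the primes on t′, m′ and a′ are written as `t'`, `m'`, `a'`), and all function and variable names are hypothetical. It applies conditions (i) and (ii) in every state, using exact fractions for the prior.

```python
from fractions import Fraction
from itertools import product

STATES = ("t", "t'")

# Assumed strategy tables, reconstructed from the equilibria named in the text.
SENDERS = {
    "S1": {"t": "m", "t'": "m'"},
    "S2": {"t": "m", "t'": "m"},
    "S3": {"t": "m'", "t'": "m"},
    "S4": {"t": "m'", "t'": "m'"},
}
RECEIVERS = {
    "R1": {"m": "a", "m'": "a'"},
    "R2": {"m": "a'", "m'": "a"},
    "R3": {"m": "a", "m'": "a"},
    "R4": {"m": "a'", "m'": "a'"},
}


def utility(state, action):
    """Aligned utilities: 1 if the action fits the state, 0 otherwise."""
    return 1 if (state, action) in {("t", "a"), ("t'", "a'")} else 0


def sender_payoff(t, S, R):
    return utility(t, R[S[t]])


def receiver_payoff(t, S, R, prior):
    # Information set S_t: all states in which S sends the same message as in t.
    info_set = [u for u in STATES if S[u] == S[t]]
    mass = sum(prior[u] for u in info_set)
    return sum(prior[u] / mass * utility(u, R[S[u]]) for u in info_set)


def is_nash(S, R, prior):
    for t in STATES:
        if any(sender_payoff(t, S2, R) > sender_payoff(t, S, R)
               for S2 in SENDERS.values()):
            return False
        if any(receiver_payoff(t, S, R2, prior) > receiver_payoff(t, S, R, prior)
               for R2 in RECEIVERS.values()):
            return False
    return True


def nash_equilibria(prior):
    return [(s, r) for s, r in product(SENDERS, RECEIVERS)
            if is_nash(SENDERS[s], RECEIVERS[r], prior)]


uniform = {"t": Fraction(1, 2), "t'": Fraction(1, 2)}
skewed = {"t": Fraction(3, 5), "t'": Fraction(2, 5)}  # hypothetical P(t) > P(t')

print(nash_equilibria(uniform))
# [('S1', 'R1'), ('S2', 'R3'), ('S2', 'R4'), ('S3', 'R2'), ('S4', 'R3'), ('S4', 'R4')]
print(nash_equilibria(skewed))
# [('S1', 'R1'), ('S2', 'R3'), ('S3', 'R2'), ('S4', 'R3')]
```

Filtering the result for sender strategies that map the two states to different messages leaves exactly the two separating equilibria, ⟨S1, R1⟩ and ⟨S3, R2⟩.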


2.2. Requirements for Successful Communication

In the introduction we have seen that according to Schiffer an action only counts as being communicative if it makes the speaker’s intention common knowledge. It can be argued that this common-knowledge requirement is met if a game has a unique solution. It is well-known (Osborne and Rubinstein, 1994) that in order for a strategy pair to be a Nash equilibrium, both the strategies that the agents can play and the preferences involved have to be common knowledge. Moreover, it is required that it is common knowledge that both agents are rational selfish payoff optimizers. If then, in addition, a particular signalling game has only one (Nash) solution, it seems only reasonable to claim that in that case the speaker’s intention becomes common knowledge after she sent a particular signal.7 Thus we might claim communication to take place by sending message m in such a game if and only if (i) the game has a (partly or fully) separating equilibrium in which message m is sent; and (ii) this is the unique solution of the game.8 The first condition is prominent in the economic and biological literature on signalling games. The second, uniqueness, condition plays an important role in Schelling (1960), Lewis (1969), and Clark (1996) to solve coordination problems and is stressed in the work of Parikh (1991, 2001) on situated communication.

The following example shows that in case of non-arbitrary signals this uniqueness condition is sometimes indeed unproblematically satisfied. Consider the following abstract situation. There are two kinds of situations: t, the default case where there is no danger; and t′, where there is danger. The sender knows which situation is the case, the receiver does not. We might assume for concreteness that it is commonly known between sender and receiver that P(t) = 0.8 > 0.2 = P(t′). In the normal situation, t, the sender does not send a message, but in the other case she might. The message will be denoted by m, while not sending a message will be modelled as sending ε. The receiver can perform two kinds of actions: the default action a (which is like doing nothing); and action a′. This latter action demands effort from the receiver, but is the only appropriate action in the case that there is danger. It does not harm the sender if it is done when there is no danger (the sender is ambivalent about the receiver’s response in t). One way to describe this situation is by assuming the following (also commonly known) utility functions:

U_s(t, a) = 5, U_s(t, a′) = 5, U_s(t′, a) = −50, U_s(t′, a′) = 50,
U_r(t, a) = 6, U_r(t, a′) = 0, U_r(t′, a) = −10, U_r(t′, a′) = 10.


The strategies are as expected: S is just a function from t′ to {m, ε}, where ε is the empty message that is always sent in t; while R is a function from {ε, m} to {a, a′}. Thus, we have the following strategies:

Sender:      t     t′
    S1       ε     ε
    S2       ε     m

Receiver:    ε     m
    R1       a     a
    R2       a′    a′
    R3       a     a′
    R4       a′    a

Assuming that P(t) = 0.8, we obtain the following payoff tables (for U_i*(·, S, R)):

t:       R1          R2        R3          R4
S1       5, 2.8      5, 2      5, 2.8      5, 2
S2       5, 6        5, 2      5, 6        5, 2

t′:      R1          R2        R3          R4
S1       −50, 2.8    50, 2     −50, 2.8    50, 2
S2       −50, −10    50, 10    50, 10      −50, −10
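The receiver entries 2.8 and 2 in the S1 rows come from averaging over the information set {t, t′}, since under S1 the empty message is sent in both states. A minimal sketch of that arithmetic follows, using exact fractions; the function name is hypothetical, and the primes are written as `t'` and `a'`.

```python
from fractions import Fraction

# Prior and receiver utilities as given in the text.
PRIOR = {"t": Fraction(4, 5), "t'": Fraction(1, 5)}
U_R = {("t", "a"): 6, ("t", "a'"): 0, ("t'", "a"): -10, ("t'", "a'"): 10}


def receiver_expected_utility(action, info_set=("t", "t'")):
    """Expected receiver utility of an action over an information set."""
    mass = sum(PRIOR[s] for s in info_set)
    return sum(PRIOR[s] / mass * U_R[(s, action)] for s in info_set)


# Pooling on the empty message (S1): the receiver cannot tell t from t'.
print(float(receiver_expected_utility("a")))   # 2.8
print(float(receiver_expected_utility("a'")))  # 2.0

# After message m under S2 the information set is {t'} only.
print(receiver_expected_utility("a'", info_set=("t'",)))  # 10
```

The first value is 0.8 × 6 + 0.2 × (−10) and the second is 0.8 × 0 + 0.2 × 10, which is why the default action a is the better reply when no distinguishing signal is sent.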

These payoff tables show that our game has exactly one Nash equilibrium: ⟨S2, R3⟩, because only this strategy pair is an equilibrium (is boxed) in both states. Because in this game the unique-solution requirement is satisfied, we can be sure that communication is successful: if the sender sends m, the receiver will figure out that he is in situation t′ and should perform a′.

Our game has exactly one Nash equilibrium in which meaningful communication is taking place because the sender has an incentive to influence the hearer and the receiver has no dominating action. If either the sender sees no value in sending information, or the receiver counts any incoming information as valueless for his decision, a signalling game will (also) have so-called ‘pooling’ equilibria, in which the speaker always sends the same message, and ‘babbling’ equilibria where the receiver ignores the message sent by the speaker and always ‘reacts’ by choosing the same action. In such equilibria no information exchange is taking place. One reason why a receiver ignores the message sent might be that he cannot (always) take the incoming information to be credible. A message is not credible if an individual might have an incentive to send this message in order to deceive her audience. In an important article, Crawford and Sobel (1982) show that the amount of credible information exchange in (cheap talk) games depends on how far the preferences of the participants are aligned.9 However, this does not mean that in all those cases successful communication takes place when the sender sends a message. The unique solution requirement has to be satisfied as well, for otherwise sender and receiver are still unclear about the strategy chosen by the other conversational participant.

Above we saw that in some cases such a unique solution is indeed possible. The example discussed in Section 2.1 suggests, however, that in signalling games in which messages have no pre-existing meaning, the satisfaction of the uniqueness condition is the exception rather than the rule.10 Even limiting ourselves to separating equilibria will not do. The problem is that that game has two such equilibria: ⟨S1, R1⟩ and ⟨S3, R2⟩.11 How is communication possible in such a situation?

3. A LANGUAGE AS A CONVENTIONAL SIGNALLING SYSTEM

3.1. Conventions as Rationally Justified Equilibria

Above we assumed that the agents had no real prior expectations about what the others might do. Consider a simple symmetric two-person coordination game where both have to choose between a and b; if they both choose the same action they earn 1 euro each and nothing otherwise. If both take either of the other’s actions to be equally likely (i.e., there are no prior expectations yet), the game has two (strict) Nash equilibria: ⟨a, a⟩ and ⟨b, b⟩. Things are different if each player takes it to be more likely that the other player will choose, say, a. In that case, both have an incentive to play a themselves as well: the expected utility of playing a is higher than that of playing b. But it is not yet a foregone conclusion that both also actually should play a: the first agent might believe, for instance, that the other player does not believe that the first will play a and she does not take the second player to be rational. That is, the beliefs of the agents need not be coherent (with themselves and/or with each other). In that case, the first agent might have an incentive not to play a. This will not happen, of course, when the beliefs of the two agents and their rationality are common knowledge (or common belief). In that case, action combination ⟨a, a⟩ is the only Nash equilibrium of the game. In the light of the above discussion, Lewis (1969) gave a straightforward answer to the question of how agents coordinate on a particular signalling equilibrium: it is based on the commonly known expectation that the other will do so and on each other’s rationality. Confronted with the recurrent coordination problem of how to successfully communicate information, the agents involved take one of the equilibria to be the conventional way of solving the problem. This equilibrium ⟨S, R⟩ can be thought of as a signalling convention: a coding system that conventionally relates messages with meanings. According to Lewis (1969), a signalling convention is a partially arbitrary way to solve a recurrent signalling situation of which it is commonly assumed by both agents that the other conforms to it. Moreover, it has to be commonly known that the belief that the other conforms to it means that both have a good and decisive reason to conform to it themselves, and will want the other to conform to it as well. A linguistic convention is then defined as a generalization of such a signalling convention, where the problem is how to resolve a recurrent coordination problem to communicate information in a larger community.

We would like to explain a convention’s (i) emergence and (ii) self-perpetuating force. Thinking of a convention as a special kind of equilibrium concept of rational game theory gives Lewis a straightforward explanation of why a convention is self-sustaining. Notice that the condition requiring that the belief that the other conforms to it means that both have a good and decisive reason to conform to it themselves is stronger than that of a Nash equilibrium: it demands that if the other player chooses her equilibrium strategy, it is strictly best (i.e., payoff-maximizing) for an agent to choose the equilibrium strategy too. Thus, according to Lewis, a convention has to be a strict Nash equilibrium.12 Strict equilibria in rational game theory are sustained simply by self-interest: if one expects the other to conform to the convention, unilateral deviation makes one (strictly) worse off.13 The notion of a strict equilibrium is stronger than the standard Nash equilibrium concept used in game theory. In terms of it we can explain why some equilibria are unlikely candidates for being conventions.
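The coordination game and the strictness condition discussed above can be illustrated with a small Python sketch. It is not part of the original text: the function names `expected_utility` and `is_strict_nash`, and the belief value 3/5 used in the usage note, are assumptions chosen purely for illustration.

```python
from fractions import Fraction

ACTIONS = ("a", "b")


def payoff(own, other):
    """Each player earns 1 euro if both choose the same action, else nothing."""
    return 1 if own == other else 0


def expected_utility(own, belief_other_plays_a):
    """Expected payoff of `own` given the probability that the other plays a."""
    p = Fraction(belief_other_plays_a)
    return p * payoff(own, "a") + (1 - p) * payoff(own, "b")


def is_strict_nash(profile):
    """True if each action is the unique best reply to the other's action."""
    first, second = profile
    first_ok = all(payoff(first, second) > payoff(alt, second)
                   for alt in ACTIONS if alt != first)
    second_ok = all(payoff(second, first) > payoff(alt, first)
                    for alt in ACTIONS if alt != second)
    return first_ok and second_ok


# Enumerate all action pairs and keep the strict equilibria.
strict_equilibria = [(x, y) for x in ACTIONS for y in ACTIONS
                     if is_strict_nash((x, y))]
```

With no prior expectations (belief 1/2) both actions have the same expected utility, whereas with an assumed belief of 3/5 that the other plays a, `expected_utility("a", Fraction(3, 5))` exceeds `expected_utility("b", Fraction(3, 5))`. The list `strict_equilibria` contains exactly the two pairs in which both players choose the same action.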
Recall that the game discussed in Section 2.1 had 6 Nash equilibria: {⟨S1, R1⟩, ⟨S3, R2⟩, ⟨S2, R3⟩, ⟨S2, R4⟩, ⟨S4, R3⟩, ⟨S4, R4⟩}. We have seen that only the first two are separating: different messages are sent in different states such that there exists a 1-1 correspondence between meanings and messages. According to Lewis’s (1969) definition of a convention, only these separating equilibria are appropriate candidates for being a convention, and he calls them signalling systems. In the previous section we were confronted with what game theorists call the problem of equilibrium selection. Which of the (separating) equilibria of the game should the players coordinate on to communicate information? Lewis proposed to solve this problem by assuming that one of those equilibria is a convention. Which one of the (separating) equilibria should be chosen to communicate information is, in some sense, arbitrary, and it is this fact that makes both separating equilibria ⟨S1, R1⟩ and ⟨S3, R2⟩ equally appropriate candidates for being a convention (for solving the recurrent coordination problem at hand). In some sense, however, Lewis’s solution just pushes the equilibrium selection problem back to another level: How are we to explain which of these regularities comes about? Two natural ways to establish a convention are explicit agreement and precedent. But for linguistic conventions the first possibility is obviously ruled out (at least for a first language), while the second possibility just begs the question. Following Lewis’s (1969) proposal of how to solve coordination problems, this leaves salience as the last possibility. A salient equilibrium is one with a distinguishing psychological quality which makes it more compelling than other equilibria. With Skyrms (1996), we find this a doubtful solution for linguistic conventions: why should one of the separating equilibria be more salient than the other? But then, how can one signalling equilibrium be selected without making use of the psychological notion of salience? Not only is Lewis’s account of equilibrium selection problematic, his explanation of the self-perpetuating force of signalling equilibria is not completely satisfactory either. His explanation crucially relies on a strong rationality assumption concerning the agents engaged in communication.
Moreover, as for all equilibrium concepts in standard game theory, a lot of common knowledge is required: the rules of the game, the preferences involved, the strategies being taken (i.e., lexical and grammatical conventions), and the rationality of the players must all be common knowledge.14 Though it is unproblematic to accept that the strong requirements for being common knowledge can be met for simple pieces of information, with Skyrms (1996) we find it optimistic to assume that they are met for complicated language games played by large populations.

3.2. Natural Conventions

Lewis (1969) admits that agents can conform to a signalling (or linguistic) convention without going through the explicit justification of why they should do so, i.e. without taking into account what the others are supposed to do, or what they expect the agent herself to do. Agents can use a signalling system simply out of habit and they might have learned this habit just by imitating others. These habits are self-perpetuating as well: if each individual conforms to the signalling convention out of habit, there is no reason to change one’s own habit. Still, Lewis argues that rationality is important: the habit has a rational justification. That might be so, but, then, not any justification for a habit is necessarily the correct explanation of why the habit is followed. Although rationality considerations arguably play a crucial role in learning and in following the conventions of one’s second language, this is not so clear when one learns and speaks one’s mother tongue. But if that is so, the higher-order intentions that Grice, Lewis, and others presuppose for successful communication are perhaps not as crucial as is standardly assumed. For signal m to mean a, a receiver does not always have to do a because of its conscious ‘recognition of the sender’s intention for it to do a’. According to a naturalistic approach towards meaning (or intentionality) – as most forcefully defended by Millikan (1984) in philosophy and also adopted by biologists thinking about animal communication, such as Maynard Smith and Harper (1995) – all that is needed for a signal to ‘mean’ something is that the sender-receiver combination ⟨S, R⟩ from which this message-meaning pair follows must be selected for by the force of evolution. In this way – as stressed by Millikan (1984) – a potential distinction is made not between human and animal communication, but rather between animal (including human) communication and ‘natural’ relations of indication. In contrast to the dances of honeybees, which indicate where nectar is to be found, smoke is not selected for by how well it indicates fire.15 Just as Crawford and Sobel (1982) show that (cheap talk) communication is possible only when signalling is advantageous for both the sender and the receiver, in the same way it is guaranteed that for a signalling pair to be stable, there must be a selective advantage both (i) in attending and responding to the signals and (ii) in making them. This seems to be a natural reason why a signalling convention has normative features as well. Evolutionary game theory (EGT) is used to study the notion of stability under selective pressures. Where traditional game theory is a normative theory with hyperrational players, EGT is more descriptive. It starts from a realistic view of the world, where players are neither hyperrational (i.e., they are limited in their computational resources and in their ability to reason) nor fully informed.

4. STABILITY AND EVOLUTION IN GAME THEORY

Lewis (1969) proposed to explain why linguistic conventions are self-sustaining in terms of rational game theory. To do so, he was forced to make very strong assumptions concerning agents’ rationality and (common) knowledge. This suggests that we should look for another theoretical underpinning of the self-sustaining force of signalling conventions. Above, we have seen that perhaps an (unconscious) mechanism like habit is at least as natural a reason for a linguistic convention to remain what it is. In this section we will show that by adopting an evolutionary stance towards language, such a simpler mechanism might be enough for linguistic conventions to be stable. Our problem, i.e. which signalling conventions are self-sustaining, now turns into a problem of which ones are evolutionarily stable, i.e., resistant to variation/mutation.

In Section 3.1 we have thought of a sender-receiver strategy pair ⟨S, R⟩ as a signalling convention to resolve a recurrent coordination problem to communicate information. We assumed that all that matters for all players was successful communication and that the preferences of the agents are completely aligned. A simple way to assure this is to assume that A = T and that all players have the following utility function:

\[
U(t, R(S(t))) =
\begin{cases}
1 & \text{if } R(S(t)) = t \\
0 & \text{otherwise.}
\end{cases}
\]

Implicitly, we still assumed that individuals have fixed roles in coordination situations: they are always either a sender or a receiver. In this sense it is an asymmetric game. It is natural, however, to give up this assumption and turn it into a symmetric game: we postulate that individuals can take both the sender- and the receiver-role. Now we might think of a pair like ⟨S, R⟩ as a language. We abbreviate the pair ⟨Si, Ri⟩ by Li and take Us(t, Li, Lj) = U(t, Rj(Si(t))) and Ur(t, Li, Lj) = U(t, Ri(Sj(t))). Consider now the symmetric strategic game in which each player can choose between finitely many languages. On the assumption that individuals take both the sender and the receiver role half of the time, the following utility function, U(Li, Lj), is natural for an agent with strategy Li who plays against an agent using Lj (where EUi(L, L′) denotes the expected utility for i to play language L if the other participant plays L′, i.e. ∑t P(t) × Ui(t, L, L′)):

\[
U(L_i, L_j) = \Big[\tfrac{1}{2} \times \Big(\sum_t P(t) \times U_s(t, L_i, L_j)\Big)\Big] + \Big[\tfrac{1}{2} \times \Big(\sum_t P(t) \times U_r(t, L_i, L_j)\Big)\Big] = \tfrac{1}{2} \times \big(EU_s(L_i, L_j) + EU_r(L_i, L_j)\big).
\]

Now we say that Li is a (Nash) equilibrium of the language game iff U(Li, Li) ≥ U(Li, Lj) for all languages Lj. It is straightforward to show that language Li is a (strict) equilibrium of the (symmetric) language game if and only if the strategy pair ⟨Si, Ri⟩ is a (strict) equilibrium of the (asymmetric) signalling game.
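To make the symmetrized utility function concrete, the following Python sketch enumerates all languages of a very small signalling game and checks which of them are (strict) equilibria of the language game. The setting is an assumption made for illustration only: two states, two messages, A = T, and a uniform prior P(t); the names `success`, `U`, `is_nash` and `is_strict` are likewise hypothetical.

```python
from fractions import Fraction
from itertools import product

STATES = (0, 1)      # T; actions coincide with states (A = T)
MESSAGES = (0, 1)    # M
PRIOR = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # assumed uniform P(t)

# A sender strategy maps each state to a message, a receiver strategy maps
# each message to a state; both are written as tuples indexed by their input.
SENDERS = list(product(MESSAGES, repeat=len(STATES)))
RECEIVERS = list(product(STATES, repeat=len(MESSAGES)))
LANGUAGES = list(product(SENDERS, RECEIVERS))  # L = (S, R)


def success(sender, receiver):
    """Expected value of U(t, R(S(t))): probability that R(S(t)) = t."""
    return sum(PRIOR[t] for t in STATES if receiver[sender[t]] == t)


def U(li, lj):
    """Utility of language li against lj, each role taken half of the time."""
    (s_i, r_i), (s_j, r_j) = li, lj
    eu_sender = success(s_i, r_j)    # li speaks, lj interprets
    eu_receiver = success(s_j, r_i)  # lj speaks, li interprets
    return Fraction(1, 2) * (eu_sender + eu_receiver)


def is_nash(li):
    # U is symmetric here, so comparing U(li, li) with U(lj, li) is the same
    # as comparing it with U(li, lj), as in the text.
    return all(U(li, li) >= U(lj, li) for lj in LANGUAGES)


def is_strict(li):
    return all(U(li, li) > U(lj, li) for lj in LANGUAGES if lj != li)


nash_languages = [l for l in LANGUAGES if is_nash(l)]
strict_languages = [l for l in LANGUAGES if is_strict(l)]
```

In this toy setting `strict_languages` contains only the two languages in which the receiver strategy inverts the sender strategy, i.e. the separating ones, while `nash_languages` additionally contains languages in which the sender pools and the receiver ignores the message.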


Under what circumstances is language L evolutionarily stable? Thinking of strategies immediately as languages, standard evolutionary game theory (Maynard Smith 1982; Weibull 1995, and others) gives the following answer.16 Suppose that all individuals of a population use language L, except for a fraction ε of ‘mutants’ which have chosen language L′. Assuming random pairing of strategies, the expected utility, or fitness, of language Li ∈ {L, L′}, EU(Li), is now:

\[
EU(L_i) = (1 - \varepsilon)\, U(L_i, L) + \varepsilon\, U(L_i, L').
\]

In order for mutation L′ to be driven out of the population, the expected utility of the mutant needs to be less than the expected utility of L, i.e., EU(L) > EU(L′). To capture the idea that mutation is extremely rare, we require that a language is evolutionarily stable if and only if there is a (small) number n such that EU(L) > EU(L′) whenever ε < n. Intuitively, the larger n is, the ‘more stable’ is language L, since larger ‘mutations’ are resisted.17 As is well-known (Maynard Smith 1982), this definition comes down to Maynard Smith and Price’s (1973) concept of an evolutionarily stable strategy (ESS) for our language game.

DEFINITION 1 (Evolutionarily Stable Strategy, ESS). Language L is Evolutionarily Stable in the language game with respect to mutations if
1. ⟨L, L⟩ is a Nash equilibrium, and
2. U(L′, L′) < U(L, L′) for every best response L′ to L for which L′ ≠ L.

We see that ⟨L, L⟩ can be a Nash equilibrium without L being evolutionarily stable (see Tuyls et al. (this volume) for more discussion). This means that the standard equilibrium concept in evolutionary game theory is a refinement of its counterpart in standard game theory (see Tuyls et al. (this volume) for more on the relation between the different equilibrium concepts). As it turns out, this refinement gives us a way, alternative to that of Lewis (1969), to characterize the Nash equilibria that are good candidates for being a convention.
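Definition 1 and the fitness comparison EU(L) > EU(L′) can be checked mechanically on a toy game. The sketch below is only an illustration under assumptions that are not in the text: two states, two messages, a uniform prior, and the hypothetical helper names `is_ess` and `fitness_gap`.

```python
from fractions import Fraction
from itertools import product

STATES = MESSAGES = (0, 1)  # assumed toy game with A = T
PRIOR = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # assumed uniform P(t)
LANGUAGES = list(product(product(MESSAGES, repeat=2),
                         product(STATES, repeat=2)))  # pairs (S, R)


def success(sender, receiver):
    """Probability that the receiver recovers the state from the message."""
    return sum(PRIOR[t] for t in STATES if receiver[sender[t]] == t)


def U(li, lj):
    """Utility of li against lj, sender and receiver roles equally likely."""
    (s_i, r_i), (s_j, r_j) = li, lj
    return Fraction(1, 2) * (success(s_i, r_j) + success(s_j, r_i))


def is_ess(lang):
    """Definition 1: Nash condition plus the condition on best responses."""
    if any(U(m, lang) > U(lang, lang) for m in LANGUAGES):
        return False  # <lang, lang> is not a Nash equilibrium
    best_replies = [m for m in LANGUAGES
                    if m != lang and U(m, lang) == U(lang, lang)]
    return all(U(m, m) < U(lang, m) for m in best_replies)


def fitness_gap(lang, mutant, eps):
    """EU(L) - EU(L') when a fraction eps of the population is mutant."""
    eu_lang = (1 - eps) * U(lang, lang) + eps * U(lang, mutant)
    eu_mutant = (1 - eps) * U(mutant, lang) + eps * U(mutant, mutant)
    return eu_lang - eu_mutant


ess_languages = [l for l in LANGUAGES if is_ess(l)]
L1 = ((0, 1), (0, 1))  # one separating language
L2 = ((1, 0), (1, 0))  # the other separating language
```

Here `ess_languages` consists of the two separating languages only. For the pair `L1`, `L2` the gap works out to 1 − 2ε, so in this toy game `L1` resists the mutant `L2` exactly when ε < 1/2, which illustrates the role of the bound n in the text.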
In an interesting article, Wärneryd (1993) proves the following result: For any sender-receiver game of the kind introduced above, with the same number of signals as states and actions, a language ⟨S, R⟩ is evolutionarily stable if and only if it is a (fully) separating Nash equilibrium.18 In fact, this result follows immediately from more general game theoretical considerations. First, it follows already directly from the definition above that being a strict Nash equilibrium is a sufficient condition for being an ESS. Given that in our asymmetric cooperative signalling games the separating equilibria are the strict ones, a general result due to Selten (1980) – which states that in asymmetric games all and only the strict equilibria are ESS – shows that this is also a necessary condition. Thus we have the following

FACT 1 (Wärneryd (and Selten)). In a pure coordination language game, L is an ESS if and only if ⟨L, L⟩ is a separating Nash equilibrium.

In this way Wärneryd (and Selten) has given an appealing explanation of why Lewisean signalling systems are self-sustaining without making use of a strong assumption of rationality or (common) knowledge. But this is not enough for the evolutionary stance to be a real alternative to Lewis’s approach towards conventions. It should also be able to solve the equilibrium selection problem. Which of the potential candidates is actually selected as the convention? As it turns out, this problem also has an appealing evolutionary solution, if we take into account the dynamic process by which such stable states can be reached. Taylor and Jonker (1978) defined their replicator dynamics to provide a continuous dynamics for evolutionary game theory. It tells us how the distribution of strategies playing against each other changes over time.19 A dynamic equilibrium is a fixed point of the dynamics under consideration. A dynamic equilibrium is said to be asymptotically stable if (intuitively) a solution path where a small fraction of the population starts playing a mutant strategy still converges to the stable point (for more discussion, see Tuyls et al. (this volume) and references therein). Asymptotic stability is a refinement of the Nash equilibrium concept, and one that is closely related to the concept of ESS. Taylor and Jonker (1978) show that every ESS is asymptotically stable. Although in general it is not the case that all asymptotically stable strategies are ESS, on our assumption that a language game is a cooperative game (and thus doubly symmetric)20 this is the case.
Thus, we have the following

FACT 2. A language L is an ESS in our language game if and only if it is asymptotically stable in the replicator dynamics.

The ‘proof’ of this fact follows immediately from some important more general results provided by Weibull (1995, section 3.6). First, he shows that Fisher’s (1930) so-called fundamental theorem of natural selection – according to which evolutionary selection induces a monotonic increase over time in the average population fitness – applies to all doubly symmetric games. This means that in such games the dynamic process will always result in a ‘locally maximal’ or ‘locally efficient’ strategy.21 From this it follows that in such games being a locally efficient strategy – which is itself already equivalent to being an ESS – is equivalent to asymptotic stability in the replicator dynamics.22 Fact 2 shows that a separating Nash equilibrium – i.e., a signalling equilibrium that according to Lewis is a potential linguistic convention – will evolve in our evolutionary language games (almost) by necessity.23 The particular one that will evolve depends solely on the initial distribution of states and strategies (languages). With Skyrms (1996) we can conclude that if the evolution of linguistic conventions proceeds as in replicator dynamics, there is no need to make use of the psychological notion of salience to explain selection of conventional equilibria.
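How the initial distribution selects one of the separating equilibria can be illustrated with a small simulation. The Python sketch below is an assumption-laden illustration rather than the dynamics used in the cited literature: it swaps the continuous replicator dynamics of Taylor and Jonker for a simple discrete-time replicator update, and it assumes a toy game with two states, two messages and a uniform prior. The names `step`, `evolve` and `biased_start`, the starting weight 0.4 and the 300 generations are all hypothetical choices.

```python
from itertools import product

STATES = MESSAGES = (0, 1)  # assumed toy game with A = T
PRIOR = (0.5, 0.5)          # assumed uniform P(t)
LANGUAGES = list(product(product(MESSAGES, repeat=2),
                         product(STATES, repeat=2)))  # pairs (S, R)


def success(sender, receiver):
    """Probability that the receiver recovers the state from the message."""
    return sum(PRIOR[t] for t in STATES if receiver[sender[t]] == t)


def U(li, lj):
    """Utility of li against lj, sender and receiver roles equally likely."""
    (s_i, r_i), (s_j, r_j) = li, lj
    return 0.5 * (success(s_i, r_j) + success(s_j, r_i))


PAYOFF = [[U(li, lj) for lj in LANGUAGES] for li in LANGUAGES]


def step(shares):
    """One discrete-time replicator update: x_i <- x_i * f_i / mean fitness."""
    fitness = [sum(u * x for u, x in zip(row, shares)) for row in PAYOFF]
    mean = sum(x * f for x, f in zip(shares, fitness))
    return [x * f / mean for x, f in zip(shares, fitness)], mean


def evolve(shares, generations=300):
    """Iterate the update; also record the mean fitness before each update."""
    history = []
    for _ in range(generations):
        shares, mean = step(shares)
        history.append(mean)
    return shares, history


def biased_start(favoured, weight=0.4):
    """Give one language a head start and spread the rest evenly."""
    rest = (1.0 - weight) / (len(LANGUAGES) - 1)
    return [weight if lang == favoured else rest for lang in LANGUAGES]


L1 = ((0, 1), (0, 1))  # one separating language
L2 = ((1, 0), (1, 0))  # the other separating language
final_1, fitness_1 = evolve(biased_start(L1))
final_2, fitness_2 = evolve(biased_start(L2))
```

Starting with a head start for `L1` the population ends up almost entirely on `L1`, and with a head start for `L2` on `L2`, so which signalling system emerges depends only on the starting distribution. The recorded mean fitness does not decrease along the run, in line with the monotonicity property mentioned above for doubly symmetric games.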

5. RELIABILITY AND COSTLY SIGNALLING

Until now we have assumed that conventional languages are used only when the preferences of the agents involved are aligned. But, of course, we use natural language also if this pre-condition is (known) not (to be) met. As we have seen in Section 2.2, however, in that case the sender (might) have an incentive to lie and/or mislead and the receiver has no incentive to trust what the sender claims. But even in these situations, agents – human or animal – sometimes send messages to each other, even if the preferences are less harmoniously aligned.24 Why would they do that? In particular, how could it be that natural language could be used for cooperative honest communication even in these unfavourable circumstances? Perhaps the first answer that comes to mind involves reputation and an element of reciprocity. These notions are standardly captured in terms of the theory of repeated games (Axelrod and Hamilton, 1981).25 The standard answer to our problem of how communication can take place if the preferences are not perfectly aligned, both in economics (starting with Spence (1973)) and in biology (Zahavi 1975; Grafen 1990; Hurd 1995), does not make use of such repeated games. Instead, it is assumed that reliable communication is also possible in these circumstances, if we assume that signals can be too costly to fake.26 The utility function of the sender no longer takes into account only the benefit of the receiver’s action for a particular type of sender, but also the cost of sending the message. The aim of this section is to show that this standard solution in biology and economics can, in fact, be thought of as being very close to our intuitive solution involving reputation.


We will assume that the sender’s utility function Us can be decomposed into a benefit function, Bs, and a cost function, C. Consider now a two-type, two-action game with the following benefit table (in each cell, the sender’s benefit is listed first and the receiver’s payoff second):

        aH      aL
tH     1, 1    0, 0
tL     1, 0    0, 1

In this game, the informed player (the sender) prefers, irrespective of her type, that the column player chooses aH, while the column player wants to play aH if and only if the sender is of type tH. For a separating equilibrium to exist, individuals of type tL must not benefit by adopting the signal typical of individuals of type tH, even if they would elicit a more favorable response by doing so. Hurd (1995) shows that when we assume that the cost of sending a message can depend on the sender’s type, an appealing separating equilibrium exists. Assume that the cost of message m saying that the sender is of type tH is denoted by C(ti, m) for individuals of type ti and that sending ε is costless for both types of individuals. Provided that C(tL, m) > 1 > C(tH, m), the cost of sending m will outweigh the benefit of its production for individuals of type tL, but not for individuals of type tH, so that the following separating equilibrium exists: individuals of type tH send message m, while individuals of type tL send ε. Notice that on Hurd’s characterization, in the equilibrium play of the game it is possible that not only tL sends a costless message, but that the high type individual tH does so as well!27 This suggests that the theory of costly signalling can be used to account for honest communication between humans who make use of a conventional language with cost-free messages. Moreover, an evolutionary argument shows that Hurd’s characterization with cost-free messages sent in equilibrium is actually the most plausible one.28 The only thing that really matters is that the cost of sending a deceiving message is higher than its potential benefit (so that such messages are sent only by individuals who deviate from equilibrium play). How can we guarantee this to be possible? In the example discussed in this section, as in the examples discussed in the economic and biological literature, it is advantageous to pretend to be better than one actually is.
This is crucially based on the assumption that messages are not (immediately) verifiable. This assumption opens the possibility that low-quality individuals could try to masquerade as being of a high quality. And this assumption makes sense: if all messages could immediately be verified, the game being played is one of complete information in which it makes no sense to send messages about one’s type (i.e. private information) at all. However, the assumption that messages are completely unverifiable is for many applications unnatural as well: an individual can sometimes be unmasked as a liar, and she can be punished for it. Thus, making a statement can be costly: one can be punished (perhaps in terms of reputation) when one has claimed to be better than one actually is.29,30 If this punishment is severe enough, even a small probability of getting unmasked can already provide a strong enough incentive not to lie.31 The above sketched analysis of truthful human communication suggests that although natural language expressions are cheap in production, the theory of costly signalling can still be used to account for communicative behavior between humans. With Lachmann et al. (manuscript) I take this to be an important insight: it suggests a way to overcome the limitations of both cheap talk signalling and the adoption of the cooperative assumption by Grice and Lewis. By assuming that sending signals can be costly, we can account for successful communication even if the preferences of the agents involved do not seem to be well aligned. Perhaps the most appealing way to think of Hurd’s result is that it explains why the agents’ preferences are aligned in more situations than appears at first sight, so that the possibility of communication is the rule rather than the exception.32
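The equilibrium condition of this section, and the idea that an expected punishment can play the role of a signalling cost, can be sketched in Python as follows. This is an illustration under stated assumptions, not a rendering of Hurd's model: the costless message is written `"eps"`, the receiver is assumed to answer m with aH and the costless message with aL, and the helper names `is_separating_equilibrium` and `lying_cost`, as well as the numerical costs used in the usage note, are hypothetical.

```python
from fractions import Fraction

# (type, action) -> (sender's benefit, receiver's payoff), as in the table.
BENEFIT = {
    ("tH", "aH"): (1, 1), ("tH", "aL"): (0, 0),
    ("tL", "aH"): (1, 0), ("tL", "aL"): (0, 1),
}
ACTIONS = ("aH", "aL")
RESPONSE = {"m": "aH", "eps": "aL"}  # assumed receiver strategy
SIGNAL = {"tH": "m", "tL": "eps"}    # candidate separating sender strategy


def sender_utility(sender_type, message, cost):
    """Benefit of the induced action minus the type-dependent message cost."""
    benefit, _ = BENEFIT[(sender_type, RESPONSE[message])]
    return benefit - cost.get((sender_type, message), 0)


def is_separating_equilibrium(cost):
    """No type gains by switching message; the receiver best-responds."""
    sender_ok = all(
        sender_utility(t, SIGNAL[t], cost) > sender_utility(t, other, cost)
        for t in SIGNAL for other in RESPONSE if other != SIGNAL[t])
    receiver_ok = all(
        BENEFIT[(t, RESPONSE[SIGNAL[t]])][1] >= BENEFIT[(t, a)][1]
        for t in SIGNAL for a in ACTIONS)
    return sender_ok and receiver_ok


def lying_cost(probability_unmasked, punishment):
    """Cost table in which m is free to produce, but a low type who claims
    to be high pays the expected punishment for being unmasked."""
    return {("tL", "m"): probability_unmasked * punishment}


costly = {("tH", "m"): Fraction(1, 2), ("tL", "m"): Fraction(3, 2)}
too_cheap = {("tH", "m"): Fraction(1, 2), ("tL", "m"): Fraction(1, 2)}
```

With `costly` the condition C(tL, m) > 1 > C(tH, m) holds and `is_separating_equilibrium(costly)` is true, while with `too_cheap` the low type would profit from imitating the high type. With `lying_cost(Fraction(1, 10), 20)` the message is free for honest senders, yet the expected punishment of 2 exceeds the benefit of 1, so the separating equilibrium survives even with a small chance of being unmasked.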

6. THE EFFICIENT USE OF LANGUAGE

Until now we have discussed how an expression m of the language used could come to have (and maintain) its conventional meaning [[m]]. This does not mean, however, that if a speaker uses m she just wants to inform the receiver that [[m]] is the case. It is well established that a speaker normally wants to communicate more by the use of a sentence than just its conventional meaning. Sometimes this is the case because the conventional meaning of an expression underspecifies its actual truth-conditional interpretation; at other times the speaker implicates more by the use of a sentence than its truth-conditional conventional meaning. It is standard to assume that both ways of enriching conventional meaning are possible because we assume that the speaker conforms to Grice’s (1967) maxims of conversation: she speaks the truth (quality), the whole truth (quantity), though only the relevant part of it (relevance), and does so in a clear and efficient way (manner). Grice argued that because the speakers are taken to obey these maxims, a sentence can give rise to conversational implicatures: things that can be inferred from an utterance that are not conditions for the truth of the utterance. Above, we discussed already the maxim of quality, which has a somewhat special status. Grice argues that the implicatures generated by the other maxims come in two sorts: particularized ones, where the implicature is generated by features of the context; and generalized ones, where (loosely speaking) implicatures are seen as default rules possibly overridden by contextual features. There exists general agreement that both kinds of implicatures exist, but the classification of the various implicatures remains controversial within pragmatics. Whereas relevance theorists (Sperber and Wilson, 1986) tend to think that implicatures depend predominantly on features of the particular context, Levinson (2000), for example, takes generalized implicatures to be the rule rather than the exception. Similar controversies can be observed on the issue of how to resolve underspecified meanings: whereas Parikh (1991, 2001) argues optimistically that indeterminacy in natural language can be solved easily in many cases through the existence of a unique (Pareto-Nash) solution of the coordination problem of how to resolve the underspecification, proponents of centering theory (Grosz et al. 1995), for example, argue that pronoun resolution is, or needs to be, governed by structural (default) rules. Except for the maxim of quality, Horn (1984), Levinson (2000), and others argue that the Gricean maxims can be reduced to two general principles: the I-principle, which tells the hearer to interpret a sentence in its most likely or stereotypical way, and the Q-principle, which demands that the speaker give as much (relevant) information as possible. In this section it will be argued that two general pragmatic rules which closely correspond with these two principles can be given an evolutionary motivation which suggests that ‘on the spot’ reasoning need not play the overloaded role in natural language interpretation that is sometimes assumed.

6.1. Iconicity in Natural Languages

In Section 3.1 we have seen that Lewis (1969) proposes to explain the semantic/conventional meaning of expressions in terms of separating equilibria of signalling games. However, we also saw that simple costless signalling games have many such equilibria. Lewis assumed that each of these equilibria is equally good and thus that it is completely arbitrary which one will be chosen as a convention. In Section 4 we have seen that all separating equilibria satisfy the ESS condition and that which one will in the end emerge is a matter of chance and depends only on the initial distribution of states and strategies (languages). Although natural at the level of individual words and the objects they refer to, at a higher organizational level the assumption of pure arbitrariness or chance can hardly be sustained. It cannot explain why conventions that enhance efficient communication are more likely than others that do not.


Consider a typical case of communication where two meanings t1 and t2 can be expressed by two linguistic messages m1 and m2. We have here a case of underspecification: the same message can receive two different interpretations. In principle this gives rise to two possible codings: {⟨t1, m1⟩, ⟨t2, m2⟩} and {⟨t1, m2⟩, ⟨t2, m1⟩}. In many communicative situations, however, the underspecification does not really exist, and is resolved due to the general pragmatic principle – referred to as the pragmatic iconicity principle – that a lighter (heavier) form will be interpreted by a more (less) salient, or stereotypical, meaning: (i) It is a general defeasible principle, for instance, in centering theory (Grosz et al., 1995) that if a certain object/expression is referred to by a pronoun, another more salient object/expression should be referred to by a pronoun too; (ii) Levinson (2000) seeks to reduce Chomsky’s B and C principles of the binding theory to pragmatic maxims. In particular, disjoint reference of lexical noun phrases throughout the sentence is explained by pointing to the possibility of the use of a lighter expression, viz. an anaphor or pronoun; (iii) The preference for stereotypical interpretations (Atlas and Levinson, 1981); (iv) and perhaps most obviously, Horn’s (1984) division of pragmatic labor according to which an (un)marked expression (morphologically complex and less lexicalized) typically gets an (un)marked meaning (cf. John made the car stop versus John stopped the car). Horn (1984), Levinson (2000), Parikh (1991, 2001) and Blutner (2000) correctly suggest that because this generalized pragmatic iconicity principle allows us to use language in an efficient way, it is not an arbitrary convention among language users. There is no alternative rule which would do equally well for the same class of interactions if people generally conformed to this alternative.
This can be seen most simply if we think of languages that are separating equilibria in our language game as coding systems of meanings distributed with respect to a particular probability function.33 This suggests that the rule should follow from more general economic principles. Indeed, Parikh gives a game theoretical analysis of why this principle of iconicity is observed. However, he treats it as a particularized conversational implicature. Here we want to argue that it should rather be seen as a generalized default rule.34

6.1.1. Underspecification and Pragmatic Interpretation Rules

In Section 2.2 we saw that in cheap talk games meaningful communication is possible only in so far as the preferences of the participants coincide. But in Section 5 we showed that by making use of costly messages we can overcome this limitation. It is standardly assumed that this is the only reason why costs of messages are taken into account: to turn games in which the preferences are not aligned into ones where they are. We have suggested that in this way we can account for Grice’s maxim of quality. In this section we will see, however, that costly messages can also serve another purpose and give a motivation for the pragmatic iconicity principle. To be able to do so, we should allow for underspecification or context dependence. In different contexts, the same message can receive a different interpretation.35 In our description of signalling games so far it is not really possible to represent a conventional language with underspecified meanings that are resolved by context. The best thing we could do is to represent underspecification as real ambiguity: sender strategy S is a function that assigns the same message to different states, while receiver strategy R is a mixed strategy assigning to certain messages a non-trivial probability distribution over the states. Such a sender-receiver strategy combination can never be evolutionarily stable (Wärneryd, 1993): one can show that a group of individuals using a mutant language without such ambiguity has no problem invading and taking over a population of ambiguous language users (if there exists an unused message). To account for underspecification, we have to enrich our models and take contexts into account. For the purpose of this section we can think of a context as a probability distribution over the state space T. For simplicity (but without loss of generality) we assume that T = {t, t′} and M = {m, m′}. Communication takes place in two kinds of contexts: in one context where P(t) = 0.9 (and thus P(t′) = 0.1) and in one where P(t) = 0.1. We assume that both contexts are equally likely. Call the first context ρ1 and the second ρ2. We will assume that it is common knowledge among the conversational partners in which context they are, but only the sender knows in each context in which state she is.
The messages do not have a pre-existing meaning, but differ in terms of (production) costs: we assume that $C(m) < C(m')$. However, we assume that also for the sender it is always better to have successful communication with a costly message than unsuccessful communication with a cheap message. Thus, in contrast to Section 5, we assume that the cost of sending a message can never exceed the benefit of communication. To assure this, we will take the sender’s utility function to be decomposable into a benefit and a cost function, $U_s(t_i, m_j, t_k) = B_s(t_i, t_k) - C(m_j)$, with $C(m) = 0$, $C(m') = \frac{1}{3}$, and adopt the following benefit function:

\[ B_s(t_i, t_k) = \begin{cases} 1 & \text{if } t_k = t_i \\ 0 & \text{otherwise.} \end{cases} \]


A sender strategy is now a function mapping a state and a context to a message, while a receiver strategy is now a function from message-context pairs to states. This game has of course two separating equilibria, call them $L$ and $L'$, with no underspecification: $L$ gives rise to the mapping $\{\langle t, m\rangle, \langle t', m'\rangle\}$ in both contexts, while $L'$ gives rise to $\{\langle t, m'\rangle, \langle t', m\rangle\}$. These languages are also evolutionarily stable in the sense of being an ESS. However, our new game also allows for languages with underspecification to be evolutionarily stable. Given that it is common knowledge between sender and receiver in which context they are, the only requirement is that they give rise to a separating equilibrium in each context. We can distinguish two such underspecified languages: (i) the Horn language $L_H$ with the mappings $\{\langle t, m\rangle, \langle t', m'\rangle\}$ and $\{\langle t, m'\rangle, \langle t', m\rangle\}$ in contexts $\rho_1$ and $\rho_2$, respectively; and (ii) the anti-Horn language $L_{AH}$ where the two mappings are used in the other contexts. It is easily seen that both languages are evolutionarily stable, because also $\langle L_H, L_H\rangle$ and $\langle L_{AH}, L_{AH}\rangle$ are strict Nash equilibria. Both languages do better against themselves than against the other, or against $L$ or $L'$.

Our above discussion shows that underspecification is possible. However, we want to explain something more: why is underspecification useful, and why is the underspecified Horn language $L_H$, which incorporates the pragmatic iconicity principle, more natural to emerge than the underspecified anti-Horn language $L_{AH}$? As it turns out, the problem we encountered is a well-known one in game theory: how to select among a number of strict Nash equilibria the one that has the highest expected utility, i.e., is (in our games) Pareto optimal? In our language game described above we had four strict equilibria: $\langle L, L\rangle$, $\langle L', L'\rangle$, $\langle L_H, L_H\rangle$, and $\langle L_{AH}, L_{AH}\rangle$. These equilibria correspond to our four evolutionarily stable languages $L$, $L'$, $L_H$, and $L_{AH}$, respectively.
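The comparison between the four languages can be checked with a few lines of Python. This is only a sketch under the assumptions of the running example (cost 0 for the cheap message, cost 1/3 for the costly one, two equally likely contexts, benefit 1 for successful communication). The names `LANGUAGES`, `self_utility` and `UTILITIES`, and the string labels for states, messages and contexts, are hypothetical and introduced purely for illustration.

```python
from fractions import Fraction

# Assumed parameters, taken from the running example of this section.
COST = {"m": Fraction(0), "m'": Fraction(1, 3)}   # C(m) = 0, C(m') = 1/3
CONTEXTS = {                                      # both contexts are equally likely
    "rho1": {"t": Fraction(9, 10), "t'": Fraction(1, 10)},
    "rho2": {"t": Fraction(1, 10), "t'": Fraction(9, 10)},
}

PLAIN = {"t": "m", "t'": "m'"}      # mapping {<t, m>, <t', m'>}
SWAPPED = {"t": "m'", "t'": "m"}    # mapping {<t, m'>, <t', m>}

# Each language fixes one separating mapping per context.
LANGUAGES = {
    "L": {"rho1": PLAIN, "rho2": PLAIN},
    "L'": {"rho1": SWAPPED, "rho2": SWAPPED},
    "LH": {"rho1": PLAIN, "rho2": SWAPPED},    # Horn: cheap message for the likely state
    "LAH": {"rho1": SWAPPED, "rho2": PLAIN},   # anti-Horn: the reverse
}


def self_utility(language):
    """Expected sender utility when the language is used against itself.

    In self-interaction communication always succeeds (benefit 1),
    so the languages differ only in their expected message cost.
    """
    total = Fraction(0)
    for context, prior in CONTEXTS.items():
        for state, prob in prior.items():
            message = language[context][state]
            total += Fraction(1, 2) * prob * (1 - COST[message])
    return total


UTILITIES = {name: self_utility(lang) for name, lang in LANGUAGES.items()}

if __name__ == "__main__":
    for name, value in UTILITIES.items():
        print(f"U({name}, {name}) = {value}")
```

Under these assumptions the sketch gives 29/30 for the Horn language, 5/6 for the two languages without underspecification, and 7/10 for the anti-Horn language.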
A simple calculation shows that the Horn language is the one with the highest expected utility.36 Thus, if we can find a natural explanation of why our evolutionary dynamics tends to select such optimal equilibria, we have provided a naturalistic explanation for why languages (i) make use of underspecification, and (ii) respect the iconicity principle.

6.1.2. Correlated and Stochastic Evolution

In van Rooy (in press), two (relatively) standard explanations of why Pareto optimal solutions (in coordination games) tend to evolve are discussed. According to both, we should give up an assumption behind the stability concepts used so far. According to the first explanation (Skyrms 1994, 1996) we give up the assumption that individuals pair randomly with other individuals in
the population. Random pairing is assumed in the calculation of the expected utility of a language. The probability with which individuals using language $L_i$ interact with individuals using $L_j$ depends simply on the proportion of individuals using the latter language: $EU(L_i) = \sum_j P(L_j) \times U(L_i, L_j)$. This expected utility was used both to determine the ESS concept and to state the replicator dynamics. By giving up random pairing (a well-known strategy taken in biology to account for kin-selection, and in cultural evolution to account for clustering), we have to postulate the existence of an additional function which determines the likelihood that an individual playing $L_i$ encounters an individual playing $L_j$, $\pi(L_j/L_i)$, such that $\sum_j \pi(L_j/L_i) = 1$. What counts then is the following expected utility: $EU_\pi(L_i) = \sum_j \pi(L_j/L_i) \times U(L_i, L_j)$. The other definitions used in the dynamic system follow the standard definitions in replicator dynamics. Although this generalization seems to be minor, it can have significant effects on the resulting stable states. Assume a form of correlation: a tendency of individuals to interact more with other individuals playing the same strategy (i.e., using the same language). Formally, positive correlation comes down to the condition that for any language $L_i$, $\pi(L_i/L_i) > P(L_i)$ (Skyrms 1994). In the extreme case, i.e. $\pi(L_i/L_i) = 1$, the only stable state in the replicator dynamics is the one which has the highest expected utility in self-interaction. In our case the Pareto optimal language $L_H$ is selected and we have an evolutionary explanation for the existence of underspecification and the use of iconicity.37

For our evolutionary language game there is another, and perhaps more natural, possibility to ensure the emergence of Pareto efficient languages.
It is to give up the assumption that the transition from one generation to the next in the dynamic model is completely determined by the distribution of strategies played in a population and their expected utilities. We can assume that the transition is (mildly) stochastic in nature.38 As shown by Kandori, Mailath and Rob (1993) and Young (1993), this results in the selection of the so-called risk-dominant strict Nash equilibria in the (very) long term.39,40 In general, a risk-dominant equilibrium need not be Pareto efﬁcient, but in cooperative games the two concepts coincide. A natural way to allow for stochastic adjustment in our evolutionary language game is to give up the assumption that an individual simply adopts the strategy from its parent with probability 1. Giving up this assumption makes sense: the inheritance of language is imperfect, possibly due to non-optimal learning.41 It is not obvious which of the proposals to motivate the attraction of Pareto optimal solutions is more plausible to assume for natural languages.


In fact, both factors (clustering and innovation) seem to play a role in the evolution and change of languages.
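
The effect of correlation on the replicator dynamics can be illustrated with a small simulation. The following is a sketch under stated assumptions, not the analysis of the works cited above: encounter probabilities are assumed to take the one-parameter form $\pi(L_j/L_i) = (1 - e) \times P(L_j) + e$ if $i = j$ and $(1 - e) \times P(L_j)$ otherwise; the payoff $U(L_i, L_j)$ is assumed to average over the sender and receiver roles, with the receiver sharing the sender's payoff; and the dynamics is discretised with simple Euler steps. All identifiers (`sender_payoff`, `replicator`, `correlation`, and so on) are hypothetical.

```python
COST = {"m": 0.0, "m'": 1.0 / 3.0}
CONTEXTS = {"rho1": {"t": 0.9, "t'": 0.1}, "rho2": {"t": 0.1, "t'": 0.9}}
PLAIN = {"t": "m", "t'": "m'"}
SWAPPED = {"t": "m'", "t'": "m"}
NAMES = ["L", "L'", "LH", "LAH"]
LANGUAGES = [
    {"rho1": PLAIN, "rho2": PLAIN},
    {"rho1": SWAPPED, "rho2": SWAPPED},
    {"rho1": PLAIN, "rho2": SWAPPED},    # Horn language
    {"rho1": SWAPPED, "rho2": PLAIN},    # anti-Horn language
]


def sender_payoff(sender, receiver):
    """Expected utility of a sender using `sender` whose receiver uses `receiver`."""
    total = 0.0
    for context, prior in CONTEXTS.items():
        decode = {msg: state for state, msg in receiver[context].items()}
        for state, prob in prior.items():
            msg = sender[context][state]
            benefit = 1.0 if decode[msg] == state else 0.0
            total += 0.5 * prob * (benefit - COST[msg])
    return total


# Assumption: each individual is sender half of the time and receiver otherwise,
# and the receiver's payoff equals the sender's (perfect cooperation).
U = [[0.5 * (sender_payoff(a, b) + sender_payoff(b, a)) for b in LANGUAGES]
     for a in LANGUAGES]


def replicator(shares, correlation, steps=5000, dt=0.1):
    """Euler steps of dP(L)/dt = P(L) * (EU(L) - average EU).

    `correlation` is the parameter e: 0 means random pairing,
    1 means individuals only meet users of their own language.
    """
    p = list(shares)
    n = len(p)
    for _ in range(steps):
        fitness = []
        for i in range(n):
            fitness.append(sum(
                ((1 - correlation) * p[j] + (correlation if i == j else 0.0)) * U[i][j]
                for j in range(n)))
        average = sum(p[i] * fitness[i] for i in range(n))
        p = [p[i] + dt * p[i] * (fitness[i] - average) for i in range(n)]
        total = sum(p)
        p = [x / total for x in p]   # guard against numerical drift
    return p


# Start from a population that almost exclusively uses the anti-Horn language.
mostly_anti_horn = [0.02, 0.02, 0.02, 0.94]
random_pairing = replicator(mostly_anti_horn, correlation=0.0)
full_correlation = replicator(mostly_anti_horn, correlation=1.0)
```

In this sketch the anti-Horn population resists invasion under random pairing, whereas with full correlation the Horn language takes over from the same starting point. The choice of starting shares, step size and number of steps is arbitrary.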

6.2. Exhaustive Interpretation

According to Grice’s (1967) maxim of Quantity, or Horn’s (1984) and Levinson’s (2000) Q-principle, speakers should give as much (relevant) information as possible. In this section we will give a motivation for that, and suggest a way to explain the convention according to which the effort required for this optimal transfer of information is divided between speaker and hearer such that the speaker does not have to be fully explicit and the receiver interprets answers exhaustively.

Let us assume that the relevant information is the information needed for the receiver to resolve his decision problem. Suppose the receiver $r$ is commonly known to be a Bayesian utility maximizer and he asks question $Q$ because this question ‘corresponds’ closely to his decision problem. What is then the best answer-strategy $S$ to give relevant information? Let us look at some candidates.

According to strategy $S_1$, the speaker gives as much relevant information as she can. Assume that $K(t)$ denotes the set of states that the sender thinks are possible in $t$. Assume, furthermore, that question $Q$ gives rise to (or means) a partition of $T$: $Q$. Thus, each element $q$ of $Q$ is also a set of states. We will refer both to the interrogative sentence and to the induced partition as a question. Then we define $Q_{K(t)}$ to be $\{q \in Q \mid q \cap K(t) \neq \emptyset\}$, i.e. the elements of $Q$ which the sender takes to be possible. Then $S_1$ is the strategy that gives in every state $t$ the following proposition: $\bigcup Q_{K(t)}$, i.e., the union of the elements of $Q_{K(t)}$. Suppose, for instance, that $Q$ denotes the question corresponding to ‘Who came?’, i.e., ‘Which individuals have property $P$?’, and that $K(t) = \{t, t'\}$ such that in $t$ (only) John came, and in $t'$, John and Mary. The message that then expresses proposition $\bigcup Q_{K(t)}$ is ‘John, perhaps Mary, and nobody else’.
If we assume that the answerer knows exactly who came, i.e., is known to be fully competent about the question-predicate $P$, the proposition expressed is $\lambda t'[P(t') = P(t)]$.

According to strategy $S_2$, the speaker gives the set of individuals of whom she is certain that they satisfy the question-predicate. Thus, for the question Who has property P?, the answer is going to be $\lambda t'[K_P(t) \subseteq P(t')]$, where $K_P(t)$ is the set $\{d \in D \mid K(t) \models P(d)\}$ and $K(t) \models P(d)$ iff $\forall t' \in K(t): d \in P(t')$. In the example discussed for strategy $S_1$, the answerer would now use a message like ‘(At least) John came’. Notice that if the answerer is known to be competent about the extension of $P$, the answer reduces to $\lambda t'[P(t) \subseteq P(t')]$.
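
The two answer strategies can be made concrete with a small model. The sketch below is an illustration only: it assumes a hypothetical domain of three individuals, identifies each state with the extension of $P$ in that state (so that every cell of the partition induced by ‘Who has property P?’ is a singleton), and uses the invented names `s1`, `s2`, `ANSWER_S1` and `ANSWER_S2`.

```python
from itertools import combinations

D = ("John", "Mary", "Sue")   # hypothetical domain of individuals

# A state is identified with the extension of P in it (the set of people who came).
STATES = [frozenset(c) for r in range(len(D) + 1) for c in combinations(D, r)]


def s1(K):
    """S1: union of the cells of the partition 'Who has P?' compatible with K.

    Cells group states with the same extension of P; because states are
    identified with extensions here, every cell is a singleton.
    """
    cells = [frozenset({state}) for state in STATES]
    return frozenset().union(*(cell for cell in cells if cell & K))


def s2(K):
    """S2: the states in which everybody known to have P does have P."""
    known = frozenset(d for d in D if all(d in state for state in K))
    return frozenset(state for state in STATES if known <= state)


# The speaker considers two states possible: only John came, or John and Mary came.
K = frozenset({frozenset({"John"}), frozenset({"John", "Mary"})})
ANSWER_S1 = s1(K)   # 'John, perhaps Mary, and nobody else'
ANSWER_S2 = s2(K)   # '(At least) John came'
```

In this model the proposition given by `s1` is a proper subset of the one given by `s2`, which is one way of seeing that the first strategy is the more informative one.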


These answer-strategies closely correspond with some well-known analyses of questions in the semantic literature: on the assumption of competence, $S_1$ gives rise to Groenendijk and Stokhof’s (1984) partition semantics: $\{S_1(t) \mid t \in T\} = \{\lambda t'[P(t') = P(t)] \mid t \in T\}$, while $S_2$ gives rise to $\{S_2(t) \mid t \in T\} = \{\lambda t'[P(t) \subseteq P(t')] \mid t \in T\} = \{\bigcap\{\lambda t'[d \in P(t')] \mid d \in P(t)\} \mid t \in T\}$, which corresponds to Karttunen’s (1977) semantics for questions.

In order to investigate which of $S_1$, $S_2$, or some other strategy can be part of a Nash equilibrium together with receiver strategy $R_B$, which implements a Bayesian rational agent, we have to assume that, in equilibrium, the receiver knows $S$. Thus he is not going to update his belief with $[[S(t)]]$, but rather with $S_t = \{t' \in T \mid S(t') = S(t)\}$. A speaker who uses $S_1$ would give in each state at least as much (relevant) information as speakers using the alternative strategies. In particular, for each state $t$ (where $P$ has a non-empty extension), $S_{1,t} \subseteq S_{2,t}$. This is obvious for the propositions that would be given in state $t$ on the assumption of full competence, $S_1$: $\lambda t'[P(t') = P(t)]$; $S_2$: $\lambda t'[P(t) \subseteq P(t')]$, but the same is true if we do not make our assumption of competence. A well-known fact of decision theory (Blackwell 1953) states that an agent with an information structure, or possibility operator, $K$, is able to make at least as good decisions as an agent with possibility operator $K'$ iff for each $t \in T$: $K(t) \subseteq K'(t)$. But this means that if our questioner is Bayesian rational and is going to believe what the answerer tells him, $S_1$ is his preferred answer-strategy. On an assumption of perfect cooperation, this is also true for the answerer herself. We can conclude that if $R_B$ is the strategy adopted by a perfect Bayesian, $\langle S_1, R_B\rangle$ is the only Nash equilibrium of the game induced by question $Q$.
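
The decision-theoretic fact appealed to here can be illustrated with a toy decision problem. This is a minimal sketch of one direction of the claim, namely that finer information never hurts a Bayesian decision maker, and not a proof. The three states, the uniform prior, the guessing task and the names `value_of_information`, `FINE` and `COARSE` are all assumptions made for the example.

```python
def value_of_information(partition, prior, actions, utility):
    """Expected utility of a Bayesian agent who learns which cell of
    `partition` contains the true state and then picks the action that
    is best on average within that cell."""
    total = 0.0
    for cell in partition:
        # For each cell, take the action with the highest (unnormalised)
        # expected utility over the states in that cell.
        total += max(sum(prior[s] * utility(a, s) for s in cell) for a in actions)
    return total


STATES = ["t1", "t2", "t3"]
PRIOR = {s: 1.0 / 3.0 for s in STATES}
ACTIONS = STATES   # the agent has to guess the true state


def guess_utility(action, state):
    """Utility 1 for a correct guess, 0 otherwise."""
    return 1.0 if action == state else 0.0


FINE = [{"t1"}, {"t2"}, {"t3"}]      # fully informative answers
COARSE = [{"t1"}, {"t2", "t3"}]      # t2 and t3 are not distinguished

V_FINE = value_of_information(FINE, PRIOR, ACTIONS, guess_utility)
V_COARSE = value_of_information(COARSE, PRIOR, ACTIONS, guess_utility)
```

With these numbers the agent with the fine partition always guesses correctly, while the agent with the coarse partition is right two times out of three.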
In fact, $S_1$ is (on average) strictly better than $S_2$ and other alternatives, which means that $\langle S_1, R_B\rangle$ is the only ESS on the same assumptions. Thus, in cooperative games it is optimal to obey the Gricean maxim to give as much relevant information as possible, i.e. give an answer with meaning $\bigcup Q_{K(t)}$.

Suppose that the question under discussion, $Q$, is Who has property P?. How should the answer be coded? Given that it is only propositions involving question-predicate $P$ that count, the optimal answer $\bigcup Q_{K(t)}$ given in state $t$ equals the intersection of the following propositions (where $D$ is the set of individuals, $\Diamond P(d)$ means that the speaker thinks it is possible that $d$ has property $P$, and $\bar{P}$ denotes the complement of $P$):

\[ W = \bigcap_{d \in D} \{[[P(d)]] : K(t) \models P(d)\}; \]
\[ X = \bigcap_{d \in D} \{[[\Diamond P(d)]] : K(t) \cap [[P(d)]] \neq \emptyset\}; \]
\[ Y = \bigcap_{d \in D} \{[[\Diamond \bar{P}(d)]] : K(t) \cap [[\bar{P}(d)]] \neq \emptyset\}; \]
\[ Z = \bigcap_{d \in D} \{[[\bar{P}(d)]] : K(t) \models \bar{P}(d)\}. \]

Now suppose that the answerer is mutually known to be fully competent about predicate $P$. That is, she knows of each individual whether it has property $P$ or property $\bar{P}$. In that case $\bigcup Q_{K(t)} = W \cap Z$, i.e., the proposition that states for each individual whether it has property $P$ or property $\bar{P}$.

We have seen above that the combination $\langle S_1, R_B\rangle$ is an equilibrium strategy as far as information transfer is concerned. But it is a somewhat unfair equilibrium as well: the sender is (normally) required to give a very complex answer, while the receiver can sit back. As far as information transfer is concerned, however, our above reasoning does not force us to this equilibrium. If it is common knowledge between speaker and hearer that the latter will infer more from the use of a message than its conventional meaning, many other sender-receiver strategies give rise to the same optimal information transfer as well. For example, on our assumption of full competence again, the speaker does not have to explicitly state of each individual whether it satisfies question-predicate $P$ or satisfies $\bar{P}$. If it is mutually known between the two that the sender only mentions the positive instances, i.e., uses strategy $S_2$, she can just express the above proposition $W$, leaving proposition $Z$ for the hearer to infer. This would obviously be favorable to the sender, but it is a natural consequence of evolution only if the extra effort transferred to the hearer is relatively small. This is possible if the task left to the hearer, i.e., to infer from a message with conventional meaning $W$ the above proposition $\bigcup Q_{K(t)} = W \cap Z$, can be captured by a simple but still general interpretation mechanism. As it turns out, this is, in fact, the case. Assuming that the hearer receives message $m$ as answer to the question Who has property P?, he can interpret it as follows (where $t' <_P t$ iff $P(t') \subset P(t)$):

\[ R^P_{exh}(m) = \{t \in [[m]] \mid \neg\exists t' \in [[m]] : t' <_P t\} \]

This interpretation strategy of answers is known as predicate circumscription (of $m$ with respect to $P$) in Artificial Intelligence (McCarthy 1980). In linguistics it is known as the exhaustive interpretation of an answer (Groenendijk and Stokhof 1984). In both fields it has become clear that such an interpretation method can account for many inferences concerning what the speaker meant, but did not say: what is implicitly
conveyed by message $m$ is, according to this mechanism, everything that follows from $R^P_{exh}(m)$ but not yet from $[[m]]$. In van Rooy and Schulz (manuscript) it is shown that it can account for many implicatures discussed in the semantic/pragmatic literature, most importantly the ones that are usually accounted for in terms of Grice’s (first sub)maxim of quantity.42 Thus, it appears that the general strategy to interpret answers exhaustively is simple enough to make sense from an economical/evolutionary perspective, and can account for many things we infer from the use of a sentence on top of its conventional meaning.43 We do not need as much ‘on the spot’ reasoning involving a strong notion of rationality to account for these inferences as is sometimes assumed: we only have to apply the interpretation rule and do not have to reason that we should apply it.
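
How little machinery exhaustive interpretation needs can be seen from a direct implementation. The sketch below reads the interpretation rule as predicate circumscription in the usual sense, that is, as keeping those states compatible with the message in which the extension of $P$ is minimal; this reading, the three-person domain and the names `exhaustive` and `at_least` are assumptions made for illustration.

```python
from itertools import combinations

D = ("John", "Mary", "Sue")   # hypothetical domain of individuals

# A state is identified with the extension of P in it.
STATES = [frozenset(c) for r in range(len(D) + 1) for c in combinations(D, r)]


def exhaustive(meaning):
    """Keep the states of `meaning` whose extension of P is minimal:
    no other state in `meaning` has a strictly smaller extension."""
    return frozenset(t for t in meaning if not any(other < t for other in meaning))


def at_least(*individuals):
    """Conventional meaning of an answer that only lists positive instances of P."""
    return frozenset(state for state in STATES if set(individuals) <= state)


JOHN_CAME = at_least("John")                          # 'John came'
JOHN_OR_MARY = at_least("John") | at_least("Mary")    # 'John or Mary came'

ONLY_JOHN = exhaustive(JOHN_CAME)
EXACTLY_ONE = exhaustive(JOHN_OR_MARY)
```

In this model the answer ‘John came’ is strengthened to ‘John and nobody else came’, and the disjunctive answer to ‘either only John or only Mary came’. The rule is a single filter over the states compatible with the message.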

7. CONCLUSION AND OUTLOOK

In this paper we have discussed conventional and non-conventional interpretation strategies, and contrasted two ways of answering the question of why these strategies are chosen: in terms of standard, or rational, and evolutionary game theory. We have argued that natural language interpretation does not need to rely as much on principles of rationality and assumptions of common knowledge as is sometimes assumed, and that taking an evolutionary stance solves some problems which rational game theory leaves unresolved. First, building on work of economists and biologists, it was shown that Lewis’s (1969) rationalistic analysis of semantic conventions could be given a natural evolutionary alternative. In the second part of the paper we suggested the same for some general pragmatic principles. Although we agree with Grice and others that the principles of truthfulness, iconicity, and exhaustive interpretation are not arbitrary and should be based on extra-linguistic economic principles, this does not necessarily mean that agents observe these principles because they are fully rational and come with a lot of common knowledge in each particular conversational situation. We argued that also the process of pragmatic interpretation might to a large extent be rule-governed, and motivated some of these rules in terms of evolutionary game theory. This does not mean that pragmatics makes no use of ‘on the spot’ reasoning, but its task is perhaps not as important as is sometimes assumed. A major task for the future is to delve more deeply into the question of how the interpretation task should be divided between linguistic rules and on the spot reasoning.

A further task is to extend our analysis of signalling. The strategies in the signalling games we have discussed until now have obvious limitations and can hardly be called languages. Most obviously, we assumed that
messages are just unstructured wholes, although it is standardly assumed that the meanings of natural language sentences are determined compositionally in terms of their parts.44 It is this compositionality that allows speakers and hearers to continually produce and understand new sentences, sentences that have never before been uttered or heard by that speaker or hearer. How did humans come from our simple signalling systems to complicated languages and what was their incentive to do so? Despite the fact that the Académie des Science declared around 1850 any proposal to answer these questions as pure speculation and thus unscientiﬁc, very recently we saw a renewed interest from authors with scientiﬁc aspirations to address exactly these issues. In the literature, two kinds of answers are given to the above two questions. The ﬁrst answer to the question why compositionality arose that comes to mind is that compositionality allows speakers to communicate about a greater number of situations with obvious adaptive advantages. This answer seems to presuppose that memory is very limited. Perhaps more limited than it actually is. The second, and most standard, answer is that learnability rather than communicative success is the stimulating factor (Kirby and Hurford, 2001). If we want to talk about lots of different (aspects of) situations, a conventional language needs to distinguish many messages. In order for such a language to remain stable over generations, precedent or imitation are unnatural explanatory mechanisms: there are just too many message-meaning combinations to be learned. Once languages have a compositional structure, this problem disappears: we only need a small ﬁnite number of conventions which can plausibly be learned by children, but which together deductively entail a possibly inﬁnite number of conventional associations between messages and meanings. 
According to the, perhaps, standard answer to the question how compositionality arose, it is assumed that meaningful messages are really words and that different kinds of words, ones that denote objects (‘nouns’) and ones that denote actions (‘verbs’), can be combined together to form (subject-predicate) sentences. This analysis assumes that both the set of states (our set $T$) and the set of messages ($M$) are already partitioned into objects and actions, and nouns and verbs, respectively. This approach is worked out in some detail by Nowak and Krakauer (1999), among others, and dubbed synthetic in Hurford (2000). Wray (1998), in the footsteps of the Quinean “radical translation” tradition, has objected to this approach. The meaningful messages in primitive communication systems should not be thought of as words, but rather as whole utterances that describe particular kinds of situations.


Compositional systems do not arise through the combination of meaningful words, but rather through the correlation between (i) features of meaningful messages and (ii) aspects of the situations that these utterances describe. Instead of assuming that the sets of states and messages are already partitioned, it is better to suppose that the states and messages themselves are structured and represented by something like vectors. Ideas of this kind are worked out (in terms of computer simulations) by Steels (2000), Kirby and Hurford (2001), and others, and such approaches are called analytic by Hurford (2000). This work is very appealing, but it has not yet been given a theoretical underpinning within EGT. Such an underpinning would be very useful, because it would provide the experimental results with an analytic justification. But this only means that we have something to look forward to!

ACKNOWLEDGEMENTS

The research for this paper has been made possible by a fellowship of the Royal Netherlands Academy of Arts and Sciences (KNAW). I am grateful to Wiebe van der Hoek for inviting me to submit a paper to the current volume of KRA. I would like to thank Johan van Benthem, Matina Donaldson, Gerhard Jäger, Martin Stokhof, and two anonymous reviewers of this journal for critical comments on and useful suggestions to an earlier version of this paper.

NOTES

1 Here is Schiffer’s (1972) example. Suppose S wants R to think that the house he is thinking of buying is rat-infested. S decides to bring about this belief in R by letting loose a rat in the house. He knows that R is watching him and knows that R believes that S is unaware that R is watching him. S intends R to infer, wrongly, from the fact that he let the rat loose that he did so with the intention that R should see the rat, take the rat as ‘natural’ evidence, and infer therefrom that the house is rat-infested. S further intends R to realize that the presence of the rat cannot be taken as genuine evidence; but S knows that R will think that S would not so contrive to get R to believe the house is rat-infested unless S had good reasons for thinking it was, and so intends R to infer that the house is rat-infested from the fact that S is letting the rat loose with the intention of getting R to believe that the house is rat-infested. In this example, S’s action does intuitively not ‘mean’ that the house is rat-infested, although the Gricean conditions are all met. See Parikh (1991, 2001) for an interesting game-theoretical discussion of this example.
2 In game theory, it is standard to say that $t$ is the type of the sender.
3 This assumption allows Hurford (1989), Oliphant (1996), Nowak and Krakauer (1999) and others to represent sender and receiver strategies by convenient transmission and reception matrices.


4 Throughout the paper we will assume that communication is ‘noiseless’. Although interesting possibilities arise when we give it up (as shown by Nowak and Krakauer (1999), it typically leads to more distinctive, or discrete, signals), we will for simplicity assume that the receiver has no difficulties to perceptually distinguish the message being sent. Detecting its meaning is already hard enough.
5 Strictly speaking, this is not just a Nash equilibrium, but rather a perfect Bayesian equilibrium, the standard equilibrium concept for sequential, or extensive form, games with observable actions but incomplete information.
6 In our Lewisean games the two moods always come together. Searle (1969, 42–50) argued that the concept of ‘meaning’ can only be applied to illocutionary effects, not to perlocutionary ones. In the main text I will limit myself normally only to ‘indicative’ meaning, which might well be in accordance with Searle’s proposal.
7 See also Lewis’s (1969, 152–159) proof that if a signal is used conventionally (in Lewis’s sense), the Gricean (1957) requirements for non-natural meaning are met (though not necessarily the other way around).
8 The second condition is a little bit too strong. It is enough to require that all solutions of the game assign to $m$ the same meaning.
9 See, among many others, van Rooy (2003) for more discussion.
10 Parikh (1991, 2001) assumes a stronger solution concept (Pareto optimality) than that of a Nash equilibrium as I assume here (for a definition of Pareto optimality, see Tuyls et al. (this volume)). With the help of this concept, more games satisfy the unique-solution condition (though not the one discussed in 2.1). In van Rooy (in press) I argue against the use of this concept in rational game theory, but show that the emergence of Pareto optimal solutions can be explained if games are thought of from an evolutionary point of view. See also Section 6.1 of this paper for more discussion.
11 Lewis (1969, 133) calculates that similar signalling problems with $m$ states and $n$ signals have $\frac{n!}{(n-m)!}$ separating equilibria.
12 In Lewis (1969, 8–24) an even stronger requirement is made. It is required for every player $i$ that if all the other players choose their equilibrium strategies, it is best for every player that $i$ chooses her equilibrium strategy too. This equilibrium concept is called a coordinating equilibrium and in terms of it Lewis wants to rule out the possibility that an equilibrium in games of (partly) conflicting interests (e.g. the game of Chicken) can be called a convention (and explain why conventions tend to become norms (ibid., 97–100)). Vanderschraaf (1995) argued, convincingly we think, that there is a more natural way to rule out equilibria in such games to be called conventions: conventions have to satisfy a public intentions criterion, PIC: At a convention, each player will desire that her choice of strategy is common knowledge among all agents engaged in the game. Vanderschraaf also extends Lewis’s notion of a convention by thinking of it as a correlated equilibrium. In this way, also some ‘unfair’ equilibria (as in the Battle of Sexes game) are ruled out as candidates for conventions. We will not come back to Vanderschraaf’s PIC or his latter extension in this paper.
13 The strength of this self-sustaining force of an equilibrium depends crucially on the strength of the expectations on what others will do. With weaker expectations, ‘safer’ equilibria are more attractive.
14 A proposition $p$ is common knowledge for a set of agents if and only if (i) each agent $i$ knows that $p$, and (ii) each agent $j$ knows that each agent $i$ knows that $p$, each agent $k$ knows that each agent $j$ knows that each agent $i$ knows that $p$, and so on.


15 In the information theoretic account of content as developed by Dretske (1981) and others, our concept of evolution is replaced by that of learning. Though the two are not the same, they are related (cf. the paper of Tuyls et al. in this volume): both take the history of the information carrying device to be crucial.
16 Although evolutionary game theory was first used to model replication through genetic inheritance, it can and has been successfully applied to the evolution of social institutions as well, where replication goes by imitation, memory and education. For linguistic conventions we think of evolution in cultural rather than genetic terms. Fortunately, as shown by Tuyls et al. (this volume) and others, there are at least some learning mechanisms (e.g. multi-agent reinforcement learning, social learning) that provide a justification for our use of the replicator dynamics that underlies the evolutionary stability concept we use, in the sense that (in the limit) they give rise to the same dynamic behavior. Also the Iterated Learning Mechanism used by Hurford, Kirby and associates shows at least in some formulations a great similarity with that of evolutionary games.
17 The fact that linguistic conventions need not be resistant to larger mutations enables the theory to allow for language change from one ‘stable’ state to another.
18 This result does not hold anymore when there are more signals than states (and actions). We will have some combinations $\langle S, R_i\rangle$ and $\langle S, R_j\rangle$ which in equilibrium give rise to the same behavior, and thus payoff, although there will be an unused message $m$ where $R_i(m) \neq R_j(m)$. Now these combinations are separating though not ESS. Wärneryd defines a more general (and weaker) evolutionary stability concept, that of an evolutionarily stable set, and shows that a strategy combination is separating if and only if it is an element of such a set.
19 For our language game this can be done as follows: On the assumption of random pairing, the expected utility, or fitness, of language $L_i$ at time $t$, $EU_t(L_i)$, is defined as:
\[ EU_t(L_i) = \sum_j P_t(L_j) \times U(L_i, L_j). \]
The expected, or average, utility of a population of languages $\mathcal{L}$ with probability distribution $P_t$ is then:
\[ EU_t(\mathcal{L}) = \sum_{L \in \mathcal{L}} P_t(L) \times EU_t(L). \]
The replicator dynamics (for our language game) is then defined as follows:
\[ \frac{dP(L)}{dt} = P(L) \times (EU(L) - EU(\mathcal{L})). \]
20 Our symmetric language games are doubly symmetric because for all $L_i$, $L_j$: $U(L_i, L_j) = U(L_j, L_i)$.

21 Weibull also explains why this does not mean that in such games we will always reach the ‘global’ (or Pareto) optimal solution. As we will see in Section 6, extra assumptions have to be made to guarantee this. Of course, Fisher’s theorem holds in our games only because we made some idealizations, e.g. a simple form of reproduction (or learning) and perfect cooperation.
22 This result generalizes to the evolutionarily stable set concept used by Wärneryd (1993).


23 It is not an absolute necessity: if we start with two equally probable states and two messages, the mixture of strategies where all 16 possible languages are equally distributed is a stable state as well. In independent research, the (almost) necessity of emerging message-meaning relations is demonstrated also in simulation-models as those of Hurford (1989) and Oliphant (1996).
24 To my surprise, Skyrms (manuscript) shows that some kind of information exchange is possible in evolutionary cheap talk games even in bargaining games where preferences are not aligned.
25 Gintis (2000) argues that such an explanation of cooperative behavior fails to predict cooperation when a group is threatened with extinction. He argues (with many others) that, instead, we should assume a form of correlation between individuals playing alike strategies to explain the evolution of cooperative behavior. Correlation can also be of help to resolve a worry one of the anonymous reviewers has: punishment itself can be thought of as being altruistic. We will come back to correlation in Section 6.
26 Also Asher et al. (2001) propose an analysis of Grice’s quality maxim in terms of (a somewhat unusual version of) costly signalling. They do not relate it, however, with the standard literature in economics and biology.
27 See Hurd (1995) for a more general characterization. This characterization differs from the one given by Grafen (1990), which seems to be the one Zahavi (1975) had in mind, according to which certain messages cannot be cost-free.
28 Our game above not only has a separating equilibrium, but also a pooling one in which both types of individuals send and the receiver performs $a_L$. As it turns out, this pooling equilibrium cannot be evolutionarily stable if $C(t_H, m) < 1$. The same holds for separating equilibria where $C(t_H, m) > 0$.
29 This way of looking at costs was brought to the author’s attention by Carl Bergstrom (p.c.) and it’s this way which brings us close to the conception of reciprocity.
30 Of course, we do not need the theory of costly signalling to explain why no individual would say that she is worse than she actually is. Lying is not just not truly revealing one’s type, but also doing this in such a way that it is (potentially) in one’s own advantage.
31 Lachmann et al. (manuscript) argue, correctly we think, that the fact that the signalling costs are imposed socially by the receiver has two important consequences. First, the signaller now does not pay the costs associated with the signal level that she chose but rather with the signal level that the receiver thinks that she chose. As a consequence, in conventional signalling systems there will be selection for precise and accurate signals, in order to reduce costly errors. Second, in contrast to cases where costs are sender’s responsibility, receivers have no incentive to reduce signal costs in the case we consider. As a consequence, the destabilizing pressure of selection for reduced signal costs will not be a threat to signalling systems in which cost is imposed by the signal receiver.
32 Even if we assume that agents make use of signals with a pre-existing meaning and always tell the truth, this does not guarantee that language cannot be used to mislead one’s audience. Take a familiar Gricean example. If an agent answers the question where John is by saying John is somewhere in the South of France, one might conclude that the agent does not know exactly where John is (see Section 6.2 for the reason why) or that she does not think the exact place is relevant. However, it might be that she does know the exact place and knows that this is relevant, but just does not want to share this knowledge with the questioner. It all depends on the sender strategy taken, and this, in turn, depends on in how far the preferences of speaker and hearer are aligned. Look at the two-type-two-action game of this section again, assume that the expected utility for $r$ to perform $a_H$ is higher than that
of aL , and suppose that we demand truth: t ∈ [[S(t)]]. In that case, the rational message for a high-type individual to send is one that conventionally expresses {tH }, while a lowtype individual has an incentive to send a message with meaning {tH , tL }. If the receiver is naive he will choose aH after hearing the signal that expresses {tH , tL }, because aH has the highest expected utility. A receiver who knows the sender’s strategy S, however, will realize that the proposition {tH , tL } is only sent by a low type individual tL , i.e., S −1 ({tH , tL }) = {tL }, and thus will perform action aL . Obviously, when a hearer knows the sender-strategy being used by a speaker, deception is impossible. However, just as the uniqueness solution for coordination signalling problems, this is an unreasonably strong requirement to assume if it had to be determined anew for every separate conversational situation. Things would be much easier if for messages with a completely speciﬁed conventional meaning we can be assured that [[m]] = Sm , if S is the sender’s strategy used in the particular conversation at hand. Without going into detail, we would like to suggest that this is again guaranteed by high costs of messages sent by individuals who deviate from equilibrium play, just like in the main text of this section. 33 Suppose that S is the sender strategy of a signalling system and P the probability function over states. A general fact of Shannon’s (1948) information theory is that if S is an optimal coding of the meanings, it will be the case that if P (t) < P (t ) < P (t ) then l(S(t)) ≥ l(S(t )) ≥ l(S(t )), where l(S(t)) is the length of expression S(t). 34 See van Rooy (to appear, though dating back to 2001) for a more extensive argument against Parikh’s analysis. There it is also argued that Horn’s division of pragmatic labor follows from an evolutionary stance on signalling games once we have underspeciﬁcation. 
In this paper we go a step further, and also give an evolutionary motivation for why underspeciﬁcation itself is so useful. 35 In lexical semantic terms, we have to account for the fact that homonymy and polysemy are natural in languages. 36 Taking C(m ) = 1 = c, the expected utility of L in self-interaction can be determined 3 as follows: 1 1 U(L, L) = [ × U(ρ1 , L, L)] + [ × U(ρ2 , L, L)] 2 2 1 1 = [ × (0.9 + 0.1(1 − c)] + [ × (0.9(1 − c) + 0.1)] 2 2 1 27 2 1 18 3 = [ × ( + )] + [ × ( + )] 2 30 30 2 30 30 =

29 21 25 + = . 60 60 30

Similar calculations show that U(L , L ) U(LAH , LAH ) = 21 30 .

=

25 , U(L , L ) H H 30

=

29 , and 30
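The arithmetic in note 36 can be checked with exact fractions. The following is only a sketch of that single calculation: the function name and the parameter `p_frequent` are hypothetical labels introduced here for the 0.9 weight in the note, and nothing beyond the displayed formula is assumed about the underlying game.

```python
from fractions import Fraction


def self_interaction_utility(c, p_frequent=Fraction(9, 10)):
    """Evaluate the expression for U(L, L) displayed in note 36.

    c is the signalling cost; p_frequent is the 0.9 weight in the note
    (a hypothetical parameter name). The two bracketed terms stand for
    U(rho_1, L, L) and U(rho_2, L, L), each weighted by 1/2.
    """
    p_rare = 1 - p_frequent
    u_rho1 = p_frequent + p_rare * (1 - c)   # 0.9 + 0.1(1 - c)
    u_rho2 = p_frequent * (1 - c) + p_rare   # 0.9(1 - c) + 0.1
    return Fraction(1, 2) * u_rho1 + Fraction(1, 2) * u_rho2


u = self_interaction_utility(Fraction(1, 3))
print(u)                      # 5/6
print(u == Fraction(25, 30))  # True
```

Using `Fraction` rather than floats keeps the comparison with 25/30 exact.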

37 The assumption that $\pi(L_i / L_i) = 1$ is unnecessarily strong. Computer simulations suggest that much milder forms of correlation already have the desired effect, albeit on a larger time scale. See http://signalgame.blehq.org for an implementation (mainly due to Wouter Koolen) of the evolutionary language game with correlation and mutation, and play with it yourself!

38 Very recently, Gerhard Jäger started to make use of this assumption as well in his manuscript ‘Evolutionary Game Theory and Typology: A Case Study’ to account for the restricted distribution of case-marking systems among natural languages.


39 Consider Rousseau’s famous Stag Hunt game as described by Lewis (1969): a simple two-player symmetric game with two strict equilibria, both hunting Stag or both hunting Rabbit. The first is Pareto optimal because it gives both players a utility of, let us say, 6, while the second gives only 4. However, assume that if one hunts Stag but the other Rabbit, the payoff is (4,0) in ‘favor’ of the Rabbit-hunter. In this case, the non-Pareto equilibrium where both are hunting Rabbit is called risk-dominant, because the ‘resistance’ of ⟨R, R⟩ against ⟨S, S⟩ is greater than the other way around. The reason for this, intuitively, is that if one player is equally likely to play either strategy, hunting Rabbit gives the other player the higher expected utility.

40 As Gerhard Jäger reminded me, this has only been proved for 2-by-2 games, although I assume it also holds for the more general case. Fortunately, Jäger also assures me that simulation models suggest that the statement holds more generally in our doubly symmetric games.

41 Lightfoot (1991) and others show that this might have interesting consequences for language change. In a number of papers by Nowak and colleagues (Komarova et al., 2001), evolutionary dynamics is used to study the requirements on the language acquisition device in order for languages to remain stable. According to many historical linguists (Croft, 2001), however, the influence of innovative language use by adults should not be underestimated.

42 It is also shown how some apparent counterexamples to the use of this mode of interpretation to account for implicatures can be overcome when we bring it together with some modern developments in semantics/pragmatics.

43 Of course, a combined speaker-hearer strategy such that the sender only gives the negative instances and the receiver adopts a strategy just like the one in the main text, except that the order on T is based on $\bar{P}$ instead of on P, would do equally well. This would, however, be more costly if there are more negative than positive instances of predicate P, which is normally, though not always, the case. Therefore, the rule which only gives the positive instances is more natural from an economical point of view.

44 For a fuller list of limitations, see Lewis (1969, 160–161).
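The risk-dominance claim in note 39 can be verified numerically. The sketch below uses only the payoffs stated in that note; the function names are hypothetical, and the comparison of squared deviation losses is the standard Harsanyi–Selten criterion for symmetric 2-by-2 games, which is assumed here to be the notion of ‘resistance’ intended in the note.

```python
STAG, RABBIT = "S", "R"

# Payoffs from note 39, as (row player, column player).
PAYOFF = {
    (STAG, STAG): (6, 6),
    (STAG, RABBIT): (0, 4),
    (RABBIT, STAG): (4, 0),
    (RABBIT, RABBIT): (4, 4),
}


def expected_utility(own, p_stag):
    """Row player's expected payoff if the opponent hunts Stag with probability p_stag."""
    return (p_stag * PAYOFF[(own, STAG)][0]
            + (1 - p_stag) * PAYOFF[(own, RABBIT)][0])


def deviation_loss(equilibrium, deviation):
    """Payoff lost by deviating unilaterally from a symmetric equilibrium."""
    return PAYOFF[(equilibrium, equilibrium)][0] - PAYOFF[(deviation, equilibrium)][0]


# Against an opponent who is equally likely to play either strategy:
print(expected_utility(RABBIT, 0.5))  # 4.0
print(expected_utility(STAG, 0.5))    # 3.0

# In a symmetric game the product of both players' deviation losses is a square.
rabbit_resistance = deviation_loss(RABBIT, STAG) ** 2   # (4 - 0)^2 = 16
stag_resistance = deviation_loss(STAG, RABBIT) ** 2     # (6 - 4)^2 = 4
print(rabbit_resistance > stag_resistance)              # True
```

Both checks point the same way: hunting Rabbit is the safer choice even though both hunting Stag is Pareto optimal.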

REFERENCES

Asher, N., I. Sher and M. Williams: 2001, ‘Game Theoretical Foundations for Gricean Constraints’, in R. van Rooy and M. Stokhof (eds.), Proceedings of the Thirteenth Amsterdam Colloquium, ILLC, Amsterdam.
Axelrod, R. and W. Hamilton: 1981, ‘The Evolution of Cooperation’, Science 211, 1390–1396.
Blackwell, D.: 1953, ‘Equivalent Comparisons of Experiments’, Annals of Mathematical Statistics 24, 265–272.
Blutner, R.: 2000, ‘Some Aspects of Optimality in Natural Language Interpretation’, Journal of Semantics 17, 189–216.
Clark, H.: 1996, Using Language, Cambridge University Press, Cambridge.
Crawford, V. and J. Sobel: 1982, ‘Strategic Information Transmission’, Econometrica 50, 1431–1451.
Croft, W.: 2000, Explaining Language Change: An Evolutionary Approach, Longman Linguistic Library, Harlow.
Dretske, F.: 1981, Knowledge and the Flow of Information, MIT Press, Cambridge, MA.


Fisher, R. A.: 1930, The Genetical Theory of Natural Selection, Oxford University Press, Oxford.
Grafen, A.: 1990, ‘Biological Signals as Handicaps’, Journal of Theoretical Biology 144, 517–546.
Gintis, H.: 2000, Game Theory Evolving, Princeton University Press, Princeton.
Grice, H. P.: 1957, ‘Meaning’, Philosophical Review 66, 377–388.
Grice, H. P.: 1967, ‘Logic and Conversation’, typescript from the William James Lectures, Harvard University. Published in P. Grice (1989), Studies in the Way of Words, Harvard University Press, Cambridge, MA, pp. 22–40.
Groenendijk, J. and M. Stokhof: 1984, Studies in the Semantics of Questions and the Pragmatics of Answers, Ph.D. thesis, University of Amsterdam.
Grosz, B., A. Joshi, and S. Weinstein: 1995, ‘Centering: A Framework for Modeling the Local Coherence of Discourse’, Computational Linguistics 21, 203–226.
Horn, L.: 1984, ‘Towards a New Taxonomy of Pragmatic Inference: Q-based and R-based Implicature’, in D. Schiffrin (ed.), Meaning, Form, and Use in Context: Linguistic Applications, GURT84, Georgetown University Press, Washington, pp. 11–42.
Hurd, P.: 1995, ‘Communication in Discrete Action-Response Games’, Journal of Theoretical Biology 174, 217–222.
Hurford, J.: 1989, ‘Biological Evolution of the Saussurian Sign as a Component of the Language Acquisition Device’, Lingua 77, 187–222.
Hurford, J.: 2000, ‘The Emergence of Syntax’, in C. Knight, M. Studdert-Kennedy and J. Hurford (eds.), The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form (editorial introduction to section on syntax), Cambridge University Press, pp. 219–230.
Kandori, M., G. J. Mailath, and R. Rob: 1993, ‘Learning, Mutation, and Long Run Equilibria in Games’, Econometrica 61, 29–56.
Karttunen, L.: 1977, ‘Syntax and Semantics of Questions’, Linguistics and Philosophy 1, 3–44.
Kirby, S. and J. Hurford: 2001, ‘The Emergence of Linguistic Structure: An Overview of the Iterated Learning Model’, in D. Parisi and A. Cangelosi (eds.), Simulating the Evolution of Language, Springer Verlag, Berlin.
Komarova, N., P. Niyogi, and M. Nowak: 2001, ‘The Evolutionary Dynamics of Grammar Acquisition’, Journal of Theoretical Biology 209, 43–59.
Lachmann, M., Sz. Szamado, and C. Bergstrom: manuscript, ‘The Peacock, the Sparrow, and Evolution of Human Language’, Santa Fe Institute working paper nr. 00-12-74.
Levinson, S.: 2000, Presumptive Meanings. The Theory of Generalized Conversational Implicatures, MIT Press, Cambridge, MA.
Lewis, D.: 1969, Convention, Harvard University Press, Cambridge, MA.
Lightfoot, D. W.: 1991, How to Set Parameters: Arguments from Language Change, MIT Press, Cambridge, MA.
Maynard-Smith, J. and G. R. Price: 1973, ‘The Logic of Animal Conflict’, Nature 246, 15–18.
Maynard-Smith, J.: 1982, Evolution and the Theory of Games, Cambridge University Press, Cambridge.
Maynard-Smith, J. and D. Harper: 1995, ‘Animal Signals: Models and Terminology’, Journal of Theoretical Biology 177, 305–311.
McCarthy, J.: 1980, ‘Circumscription – A Form of Non-Monotonic Reasoning’, Artificial Intelligence 13, 27–39.


Millikan, R. G.: 1984, Language, Thought, and Other Biological Categories, MIT Press, Cambridge, MA.
Nowak, M. and D. Krakauer: 1999, ‘The Evolution of Language’, Proc. Natl. Acad. Sci. U.S.A. 96, 8028–8033.
Oliphant, M.: 1996, ‘The Dilemma of Saussurean Communication’, BioSystems 37, 31–38.
Osborne, M. and A. Rubinstein: 1994, A Course in Game Theory, MIT Press, Cambridge, MA.
Parikh, P.: 1991, ‘Communication and Strategic Inference’, Linguistics and Philosophy 14, 473–513.
Parikh, P.: 2001, The Use of Language, CSLI Publications, Stanford, CA.
Rooy, R. van: 2003, ‘Quality and Quantity of Information Exchange’, Journal of Logic, Language and Information 12, 423–451.
Rooy, R. van: in press, ‘Signaling Games Select Horn Strategies’, Linguistics and Philosophy, to appear.
Rooy, R. van and K. Schulz: manuscript, ‘Pragmatic Meaning and Non-Monotonic Reasoning: The Case of Exhaustive Interpretation’, University of Amsterdam.
Schelling, T.: 1960, The Strategy of Conflict, Oxford University Press, New York.
Schiffer, S.: 1972, Meaning, Clarendon Press, Oxford.
Selten, R.: 1980, ‘A Note on Evolutionary Stable Strategies in Asymmetric Animal Contests’, Journal of Theoretical Biology 84, 93–101.
Searle, J. R.: 1969, Speech Acts, Cambridge University Press, Cambridge.
Shannon, C.: 1948, ‘The Mathematical Theory of Communication’, Bell System Technical Journal 27, 379–423, 623–656.
Skyrms, B.: 1994, ‘Darwin Meets The Logic of Decision: Correlation in Evolutionary Game Theory’, Philosophy of Science 61, 503–528.
Skyrms, B.: 1996, Evolution of the Social Contract, Cambridge University Press, Cambridge.
Skyrms, B.: manuscript, ‘Signals, Evolution and the Explanatory Power of Transient Information’, University of California, Irvine.
Spence, M.: 1973, ‘Job Market Signalling’, Quarterly Journal of Economics 87, 355–374.
Sperber, D. and D. Wilson: 1986, Relevance, Harvard University Press, Cambridge.
Steels, L.: 2000, ‘The Emergence of Grammar in Communicating Autonomous Robotic Agents’, in W. Horn (ed.), Proceedings of European Conference on Artificial Intelligence, ECAI2000, nr. CONF14, IOS Press, Amsterdam, pp. 764–769.
Strawson, P. F.: 1964, ‘Intention and Convention in Speech Acts’, Philosophical Review 73, 439–460.
Taylor, P. and L. Jonker: 1978, ‘Evolutionary Stable Strategies and Game Dynamics’, Mathematical Biosciences 40, 145–156.
Tuyls, K. et al.: this volume, ‘An Evolutionary Game Theoretical Perspective on Learning in Multi-Agent Systems’, Knowledge, Rationality and Action.
Vanderschraaf, P.: 1995, ‘Convention as Correlated Equilibrium’, Erkenntnis 42, 65–87.
Wärneryd, K.: 1993, ‘Cheap Talk, Coordination, and Evolutionary Stability’, Games and Economic Behavior 5, 532–546.
Weibull, J. W.: 1995, Evolutionary Game Theory, MIT Press, Cambridge.
Wray, A.: 1998, ‘Protolanguage as a Holistic System for Social Interaction’, Language and Communication 18, 47–67.
Young, H. P.: 1993, ‘The Evolution of Conventions’, Econometrica 61, 57–84.
Zahavi, A.: 1975, ‘Mate Selection – A Selection for a Handicap’, Journal of Theoretical Biology 53, 205–214.


ILLC
Department of Humanities
University of Amsterdam
Nieuwe Doelenstraat 15
1012 CP Amsterdam, The Netherlands
E-mail: [email protected]


REINHARD BLUTNER

NONMONOTONIC INFERENCES AND NEURAL NETWORKS

ABSTRACT. There is a gap between two diﬀerent modes of computation: the symbolic mode and the subsymbolic (neuron-like) mode. The aim of this paper is to overcome this gap by viewing symbolism as a high-level description of the properties of (a class of) neural networks. Combining methods of algebraic semantics and nonmonotonic logic, the possibility of integrating both modes of viewing cognition is demonstrated. The main results are (a) that certain activities of connectionist networks can be interpreted as non-monotonic inferences, and (b) that there is a strict correspondence between the coding of knowledge in Hopﬁeld networks and the knowledge representation in weight-annotated Poole systems. These results show the usefulness of non-monotonic logic as a descriptive and analytic tool for analyzing emerging properties of connectionist networks. Assuming an exponential development of the weight function, the present account relates to optimality theory – a general framework that aims to integrate insights from symbolism and connectionism. The paper concludes with some speculations about extending the present ideas.

1. INTRODUCTION

Synthese 142: 143–174, 2004. Knowledge, Rationality & Action 203–234. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

A puzzle in the philosophy of mind concerns the gap between symbolic and subsymbolic (neuron-like) modes of computation/processing. Complex symbolic systems like those of grammar and logic are essential when we try to understand the general features and the peculiarities of natural language, reasoning and other cognitive domains. On the other hand, most of us believe that cognition resides in the brain and that neuronal activity forms its basis. Yet neuronal computation appears to be numerical, not symbolic; parallel, not serial; distributed over a gigantic number of different elements, not as highly localized as in symbolic systems. Moreover, the brain is an adaptive system that is very sensitive to the statistical character of experience. Hard-edged rule systems are not suitable to deal with this aspect of behavior.

The methodological position pursued in this article is an integrative one, which looks for unification. In the case under discussion the point is to assume that symbols and symbol processing are a macro-level description of what is considered as a connectionist system at the micro level. This position is analogous to the one taken in theoretical physics, relating for example thermodynamics and statistical physics (or, in a slightly different way, Newtonian mechanics and quantum mechanics). Hence, the idea is that the symbolic and the subsymbolic mode of computation can be integrated within a unified theory of cognition. If successful, this theory is able to overcome the gap between the two modes of computation and it assigns the proper roles to symbolic, neural and statistical computation (e.g., Smolensky 1988, 1996; Balkenius and Gärdenfors 1991; Kokinov 1997).

It should be stressed that the integrative methodology is not the only one. Alternatively, some researchers like to play down the neuronal perspective to a pure issue of implementation. Representatives of this position are, inter alia, Fodor and Pylyshyn (1988), who insist that the proper role of connectionism in cognitive science is merely to implement existing symbolic theory. At the other extreme, there is a school that Pinker and Prince (1988) and Smolensky et al. (1992) call eliminative connectionism. The approach to the gap taken by these researchers is simply to ignore it and to deny the existence of the symbolic perspective and higher-level cognition. Much connectionist research falls into this category and some of its representatives make a major virtue out of this denial. Finally, there is a movement towards hybrid systems (e.g., Hendler 1989, 1991; Boutsinas and Vrahatis 2001). According to this approach two computationally separate components are assumed, one connectionist and the other symbolic, and an interface component has to be constructed that allows the two components to interact. It is not unfair to say the hybrid approach is an eclectic one. In particular, it requires extra stipulations to construct the interface.

In my opinion, it is not a good idea to develop models where separate modules correspond to separate cognitive processes and are described within separate paradigms, like a connectionist model of perception (and apperception) combined with a symbolic model of reasoning. Instead, both aspects should basically be integrated and contribute at every level to every cognitive process. While I regard the hybrid approach as eclectic, I cannot consider the eliminative position to be especially helpful, either. It is evident that such a complex object as the human mind (and human reasoning in particular) is too complex to be fully described by a single formal theory or model, and therefore several different and possibly contradicting perspectives are needed.


Downplaying connectionism as a pure issue of implementation is a point that deserves careful attention. I think there are two different reactions that show this position to be untenable. The first one is to show that non-classical cognitive architectures may emerge by assuming connectionist ideas at the micro-level. The development of optimality theory (Prince and Smolensky 1993) may be a case in point, and so is the demonstration of the close correspondence between certain kinds of neural networks and belief networks (e.g., Glymour 2001). The second possible reaction is to demonstrate that aspects of possibly old-fashioned, “classical” architectures can be explained by assuming an underlying (connectionist) micro-level. If this is achievable we have got much more than what usually is connected with the conception of implementation. In this article, I wish to demonstrate both lines of argumentation.

In a nutshell, the aim of this paper is to demonstrate that the gap between symbolic and neuronal computation can be overcome when we view symbolism as a high-level description of the properties of (a class of) neural networks. The important methodological point is to illustrate that the instruments of model-theoretic (algebraic) semantics and non-monotonic logic may be very useful in realizing this goal. In this connection it is important to stress that the algebraic perspective is entirely neutral with respect to foundational questions such as whether a “content” is in the head or is a platonic abstract entity (cf. Partee and Hendriks 1997, p. 18). Consequently, the kind of “psycho-logic” we pursue here isn’t necessarily in conflict with the general setting of model-theoretic semantics.

Information states are the fundamental entities in the construction of propositions. In the next section, a reinterpretation of information states is given as representing states of activations in a connectionist network. In Section 3 we consider how activation spreads out and how it reaches, at least for certain types of networks, asymptotically stable output states. Following and extending ideas of Balkenius and Gärdenfors (1991), it is shown that the fast dynamics of the system can be described asymptotically as a non-monotonic inferential relation between information states. Section 4 introduces the notion of weight-annotated Poole systems, and Section 5 explains how these systems bring about the correspondence between connectionist and symbolic knowledge bases. Finally, in Section 6 we relate the present account to optimality theory (Prince and Smolensky 1993), a general framework that aims to integrate insights from symbolism and connectionism. The paper concludes with some speculations about some extensions of the integrative story.

2. INFORMATION STATES IN HOPFIELD NETWORKS

Connectionist networks are complex systems of simple neuron-like processing units (usually called “nodes”) which adapt to their environments. In fact, the nodes of most connectionist models are vastly simpler than real neurons. However, such networks can behave with surprising complexity and subtlety. This is because processing occurs in parallel and interactively. In many cases, the way the units are connected is much more important for the behavior of the complete system than the details of the single units.

There are different kinds of connectionist architectures. In multilayer perceptrons, for instance, we have several layers of nodes (typically an input layer, one or more layers of hidden nodes, and an output layer). A fundamental characteristic of these networks is that they are feedforward networks, which means that units at level i may not affect the activity of units at levels lower than i. In typical cases, there are only connections from level i to level i+1. In contrast to feedforward networks, recurrent networks allow connections in both directions. A nice property of such networks is that they are able to gather and utilize information about a sequence of activations. Further, some types of recurrent nets can be used for modelling associative memories. If we consider how activation spreads out, we find that feedforward networks always stabilize. In contrast, there are some recurrent networks that never stabilize. Rather, they behave as chaotic systems that oscillate between different states of activation.

One particular type of recurrent network is the Hopfield network (Hopfield 1982). Such networks always stabilize, and Hopfield proved this by demonstrating the analogy between this sort of network and the physical system of spin glasses, and by showing that one could calculate a very useful measure of the overall state of the network that is equivalent to the measure of energy in the spin glass system.
A Hopfield net tends to move toward a state of equilibrium that is equivalent to a state of lowest energy in a thermodynamic system. For a good introduction to connectionist networks the reader is referred to the two volumes by Rumelhart, McClelland, and the PDP group (1986a), which are still a touchstone for a wide variety of work on parallel distributed networks. Other excellent introductions are Bechtel (2002) and Smolensky and Legendre (to appear).

For the following, we start by considering neural networks as systems of connected units. Each unit has a certain working range of activity. Let this be the set $[-1, +1]$ (+1: maximal firing rate, 0: resting, −1: minimal firing rate). A possible state s of the system describes the activities of each neuron: $s \in [-1, +1]^n$, with n = number of units. A possible configuration of the network is characterized by a connection matrix w. Hopfield networks are defined by symmetric configurations and zero diagonals ($-1 \le w_{ij} \le +1$, $w_{ij} = w_{ji}$, $w_{ii} = 0$). That means node i has the same effect on node j as node j on node i, and nodes do not affect themselves. The fast dynamics describes how neuron activities spread through that network. In the simplest case, this is described by the following update function:

(1) $f(s)_i = \theta\bigl(\sum_j w_{ij} s_j\bigr)$ ($\theta$ a nonlinear function, typically a step function or a sigmoid function).
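As a concrete reading of the update function (1), the following minimal sketch applies it once to every unit of a small network. The weight values, the function names, and the convention that the step function returns 0 for a net input of exactly zero are assumptions made for illustration; they are not taken from the text.

```python
import math


def step(x):
    """Three-valued step: +1, -1, or 0 when the net input is exactly zero (an assumption)."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def is_hopfield_configuration(w):
    """Check the constraints on w: square, symmetric, zero diagonal, weights in [-1, +1]."""
    n = len(w)
    if any(len(row) != n for row in w):
        return False
    return all(
        w[i][i] == 0
        and all(w[i][j] == w[j][i] and -1 <= w[i][j] <= 1 for j in range(n))
        for i in range(n)
    )


def update(w, s, theta=step):
    """Apply (1) once to every unit: f(s)_i = theta(sum_j w_ij * s_j)."""
    return [theta(sum(w_ij * s_j for w_ij, s_j in zip(row, s))) for row in w]


# Illustrative weights (hypothetical values, not taken from the text).
W = [[0.0, 0.5, 0.0],
     [0.5, 0.0, -0.5],
     [0.0, -0.5, 0.0]]

s = [1.0, 0.0, 0.0]                    # node 1 active, nodes 2 and 3 resting
print(is_hopfield_configuration(W))    # True
print(update(W, s))                    # [0.0, 1.0, 0.0]
print(update(W, s, theta=math.tanh))   # same rule with a sigmoid-like nonlinearity
```

Passing the nonlinearity as a parameter mirrors the remark that θ may be either a step function or a sigmoid.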

Equation (1) describes a linear threshold unit. This activation rule is the same as that of Rosenblatt’s perceptron. It is applied many times to each unit. Hopfield (1982) employed an asynchronous update procedure in which each unit, at its own randomly determined times, would update its activation (depending on its current net input).¹

For the following, it is important to interpret activations as indicating information specification: the activations +1 and −1 indicate maximal specification; the resting activation 0 indicates (complete) underspecification. It is this interpretation of the activation states that allows us to introduce information as an observer-dependent notion. Though this interpretation is not an arbitrary one, from a philosophical point of view it is important to stress the idea that information is not a purely objective, observer-independent unit. Instead, the observer of the network decides, at least in part, which aspects of the network are worth considering and which abstractions are appropriate for the observer’s aims.

Generalizing an idea introduced by Balkenius and Gärdenfors (1991), the set $S = [-1, +1]^n$ of activation states can be partially ordered in accordance with their informational content:

(2) $s \ge t$ iff $s_i \ge t_i \ge 0$ or $s_i \le t_i \le 0$, for all $1 \le i \le n$.


$s \ge t$ can be read as: s is at least as specific as t. The poset $\langle S, \ge \rangle$ does not form a lattice. However, it can be extended to a lattice by introducing a set $\bot$ of impossible activation states: $\bot = \{s : s_i = \text{nil for } 1 \le i \le n\}$, where nil designates the “impossible” activation of a unit.² It can be shown that the extended poset of activation states $\langle S \cup \bot, \ge \rangle$ forms a DeMorgan lattice when we replace the former definition of the informational ordering as follows:

(3) $s \ge t$ iff ($s_i = \text{nil}$ or $s_i \ge t_i \ge 0$ or $s_i \le t_i \le 0$, for all $1 \le i \le n$).
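The informational ordering (3) can be sketched as a componentwise test. Representing nil by Python's `None` and the function name `at_least_as_specific` are assumptions of this sketch.

```python
NIL = None  # stands for the "impossible" activation nil (representation assumed here)


def at_least_as_specific(s, t):
    """Ordering (3): True iff s >= t, i.e. s is at least as specific as t."""
    if len(s) != len(t):
        raise ValueError("states must have the same number of units")
    for s_i, t_i in zip(s, t):
        if s_i is NIL:
            continue              # nil is at least as specific as anything
        if t_i is NIL:
            return False          # a proper activation never dominates nil
        if s_i >= t_i >= 0 or s_i <= t_i <= 0:
            continue
        return False
    return True


print(at_least_as_specific((1, 1, -1), (1, 0, 0)))   # True
print(at_least_as_specific((1, 0, 0), (1, 1, -1)))   # False
print(at_least_as_specific((0.5, 0), (-0.5, 0)))     # False: opposite signs are incomparable
print(at_least_as_specific((NIL, 0), (1, 0)))        # True
```

Two states that assign opposite signs to the same unit are incomparable in both directions, which is the gap that the nil activation closes.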

The operation $s \circ t = \sup\{s, t\}$ (CONJUNCTION) can be interpreted as the simultaneous realization of two activation states; the operation $s \oplus t = \inf\{s, t\}$ (DISJUNCTION) can be interpreted as some kind of generalization of two instances of information states. The COMPLEMENT $s^*$ reflects a lack of information. The operations come out as follows:

(4) $(s \circ t)_i = \begin{cases} \max(s_i, t_i) & \text{if } s_i, t_i \ge 0, \\ \min(s_i, t_i) & \text{if } s_i, t_i \le 0, \\ \text{nil} & \text{elsewhere.} \end{cases}$

(5) $(s \oplus t)_i = \begin{cases} \min(s_i, t_i) & \text{if } s_i, t_i \ge 0, \\ \max(s_i, t_i) & \text{if } s_i, t_i \le 0, \\ s_i & \text{if } t_i = \text{nil}, \\ t_i & \text{if } s_i = \text{nil}, \\ 0 & \text{elsewhere.} \end{cases}$

(6) $(s^*)_i = \begin{cases} 1 - s_i & \text{if } s_i > 0, \\ -1 - s_i & \text{if } s_i < 0, \\ \text{nil} & \text{if } s_i = 0, \\ 0 & \text{if } s_i = \text{nil}. \end{cases}$

The fact that the extended poset of activation states forms a DeMorgan lattice gives the opportunity to interpret these states as propositional objects (“information states”).

3. ASYMPTOTIC UPDATES AND NON-MONOTONIC INFERENCE

In general, updating an information state s may result in an information state $f \ldots f(s)$ that does not include the information of s. However, in what follows it is important to interpret updating as specification. If we want s to be informationally included in the resulting update, we have to “clamp” s somehow in the network. A technical way to do that has been proposed by Balkenius and Gärdenfors (1991). Let f designate the original update function (1) and $\bar{f}$ the clamped one, which can be defined as follows (including iterations):

(7) $\bar{f}(s) = f(s) \circ s$;

(8) $\bar{f}^{n+1}(s) = f(\bar{f}^n(s)) \circ s$.
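The clamped dynamics (7) and (8) can be sketched by combining the update function (1) with the conjunction (4). Everything named below is an assumption of the sketch: the weight values are hypothetical, updates are applied synchronously for simplicity, the step function returns 0 for zero net input, and nil is represented by `None`.

```python
NIL = None  # representation of the "impossible" activation nil (assumed)


def theta(x):
    """Step nonlinearity for (1): returns +1, 0, or -1."""
    return (x > 0) - (x < 0)


def update(w, s):
    """One synchronous application of (1) to every unit."""
    if any(x is NIL for x in s):
        raise ValueError("nil has no defined net input in this sketch")
    return tuple(theta(sum(w_ij * s_j for w_ij, s_j in zip(row, s))) for row in w)


def conjunction(s, t):
    """Componentwise conjunction (4); conflicting signs yield nil."""
    out = []
    for s_i, t_i in zip(s, t):
        if s_i is NIL or t_i is NIL:
            out.append(NIL)
        elif s_i >= 0 and t_i >= 0:
            out.append(max(s_i, t_i))
        elif s_i <= 0 and t_i <= 0:
            out.append(min(s_i, t_i))
        else:
            out.append(NIL)
    return tuple(out)


def clamped_iterate(w, s, steps):
    """Iterate (7)-(8): each step computes f(current state) and conjoins it with s."""
    state = s
    for _ in range(steps):
        state = conjunction(update(w, state), s)
    return state


# Illustrative weights (hypothetical values, not taken from the text).
W = [[0.0, 0.5, 0.0],
     [0.5, 0.0, -0.5],
     [0.0, -0.5, 0.0]]

s = (1, 0, 0)
print(clamped_iterate(W, s, 1))   # (1, 1, 0)
print(clamped_iterate(W, s, 2))   # (1, 1, -1)
print(clamped_iterate(W, s, 3))   # (1, 1, -1), a fixed point
```

Without the conjunction with s, the first update would already lose the activation of node 1; clamping keeps the input information present in every later state.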

Hopfield networks (and other so-called resonance systems) exhibit a desirable property: for each given input state s the system stabilizes in a well-defined output state (it is of no importance here whether the dynamics is clamped or not). The notion of resonance is a very universal one and can be defined for a dynamic system $[S, f]$ in general (here S denotes the space of possible states and f denotes the dynamics of the system, i.e., the activation function with regard to a specific configuration w of the network).

(9) A state $s \in S$ is called a resonance of a dynamic system $[S, f]$ iff
(i) $f(s) = s$ (Equilibrium)
(ii) For each $\varepsilon > 0$ there exists a 0

The existence of resonances is an emergent collective effect in neural nets. Intuitively, resonances are the stable states of the network and they attract other states. When each state develops into a resonance, then the system produces a content-addressable memory. Such memories have emergent collective properties (capacity, error correction, familiarity recognition; for details see Hopfield 1982). A neural network is called a resonance system iff $\lim_{n \to \infty} f^n(s)$ exists and is a resonance for each state $s \in S$ and each activation function f (relative to any network configuration $w \in W$). Cohen and Grossberg (1983) were the first to prove that Hopfield networks are resonance systems. The same proves true for a large class of other systems: the McCulloch–Pitts model (McCulloch and Pitts 1943), the Cohen–Grossberg model (1983), Rumelhart’s Interactive Activation model (Rumelhart et al. 1986b), Smolensky’s (1986b) Harmony networks, etc. (for details see Grossberg 1989). In the present context, the classical results can be borrowed to establish that the following set of asymptotic updates of s is well-defined:

(10) $ASUP_w(s) = \{t : t = \lim_{n \to \infty} \bar{f}^n(s)\}$.

In the case of asynchronous (non-deterministic) updates, the function $E(s) = -\sum_{i>j} w_{ij} s_i s_j$ is a Ljapunov function (energy function) of the dynamic system (Hopfield 1982), i.e., when the activation state of the network changes, E can either decrease or remain the same. Hence, the output states $\lim_{n \to \infty} f^n(s)$ can be characterized as the local minima of the Ljapunov function. Usually, the stable state is not the state that would yield the lowest possible value of E (the global minimum). The Boltzmann machine (Hinton and Sejnowski 1983, 1986) is an adaptation of the Hopfield net that realizes the global minima, i.e., its output states $\lim_{n \to \infty} f^n(s)$ can be characterized as the global minima of the Ljapunov function. Like the Hopfield net, the Boltzmann machine updates its units by means of an asynchronous update procedure. However, it employs a stochastic activation function rather than a deterministic one. This activation function can be considered to realize some stochastic noise (“faults”), at a decreasing rate during the processing of a single pattern.³ The latter observation enables us to characterize the asymptotic updates of s as the set of all specifications of s that minimize the energy E of the system:

(11) $ASUP_w(s) = \min_E(s)$.

The propositional objects called information states are related by the partial ordering $\ge$. It is obvious that this relation can be interpreted as a strict entailment relation. In any case it satisfies the Tarskian restrictions for such a relation:

(12)
(i) $s \ge s$ (REFLEXIVITY);
(ii) if $s \ge t$ and $s \circ t \ge u$, then $s \ge u$ (CUT);
(iii) if $s \ge u$, then $s \circ t \ge u$ (MONOTONICITY).
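Returning to the energy function E introduced above, its Ljapunov property under asynchronous updates can be checked numerically. The sketch below rests on stated assumptions: the weights are illustrative values chosen for the example, a unit whose net input is exactly zero keeps its current activation, units are picked uniformly at random, and the dynamics is left unclamped.

```python
import random


def energy(w, s):
    """E(s) = -sum over i > j of w_ij * s_i * s_j."""
    n = len(s)
    return -sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(i))


def sign(x):
    return 1 if x > 0 else (-1 if x < 0 else 0)


def async_run(w, s, steps, rng):
    """Asynchronous updates: one randomly chosen unit is updated per step."""
    s = list(s)
    trace = [energy(w, s)]
    for _ in range(steps):
        i = rng.randrange(len(s))
        net = sum(w[i][j] * s[j] for j in range(len(s)))
        if net != 0:              # assumption: zero net input leaves the unit unchanged
            s[i] = sign(net)
        trace.append(energy(w, s))
    return s, trace


# Illustrative symmetric weights with zero diagonal (values chosen for this sketch).
W = [[0.0, 0.2, 0.1],
     [0.2, 0.0, -1.0],
     [0.1, -1.0, 0.0]]

rng = random.Random(0)
final, trace = async_run(W, [1, 0, 0], 30, rng)
print(final)
print(all(b <= a + 1e-12 for a, b in zip(trace, trace[1:])))  # True: E never increases
```

The check succeeds for any start state in $[-1, +1]^n$ because switching unit i to the sign of its net input changes E by minus the net input times the change in $s_i$, and that product is never negative.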

More interestingly, Balkenius and Gärdenfors (1991) have made clear that it is possible to define a nonmonotonic inference relation that reflects asymptotic updating of information states. Let $\langle S, \ge \rangle$ be a poset of activation states, and w the connection matrix. Then the notion of asymptotic updates naturally leads to a non-monotonic inferential relation between information states:

(13) $s \mathrel{|\!\sim}_w t$ iff $s' \ge t$ for each $s' \in ASUP_w(s)$.
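Definition (13) can be made concrete by brute force if, following the equivalence (11), the asymptotic updates of s are taken to be the energy-minimal specifications of s. The sketch below assumes discrete activations in {−1, 0, +1}, ignores nil, and uses illustrative weights chosen for the example; the names `asup` and `entails` are hypothetical.

```python
from itertools import product


def energy(w, s):
    """E(s) = -sum over i > j of w_ij * s_i * s_j."""
    n = len(s)
    return -sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(i))


def at_least_as_specific(s, t):
    """Ordering (2) on states without nil."""
    return all(a >= b >= 0 or a <= b <= 0 for a, b in zip(s, t))


def asup(w, s, tol=1e-9):
    """Energy-minimal specifications of s among the states in {-1, 0, 1}^n, as in (11)."""
    specs = [u for u in product((-1, 0, 1), repeat=len(s))
             if at_least_as_specific(u, s)]
    e_min = min(energy(w, u) for u in specs)
    return [u for u in specs if energy(w, u) <= e_min + tol]


def entails(w, s, t):
    """Definition (13): every asymptotic update of s is at least as specific as t."""
    return all(at_least_as_specific(u, t) for u in asup(w, s))


# Illustrative symmetric weights with zero diagonal (values chosen for this sketch).
W = [[0.0, 0.2, 0.1],
     [0.2, 0.0, -1.0],
     [0.1, -1.0, 0.0]]

print(asup(W, (1, 0, 0)))                  # [(1, 1, -1)]
print(entails(W, (1, 0, 0), (0, 1, 0)))    # True: node 1 active yields node 2 active
print(entails(W, (1, 0, 1), (0, 1, 0)))    # False: the conclusion is withdrawn
print(entails(W, (1, 0, 1), (0, -1, 0)))   # True
```

The second and third lines show the non-monotonic behaviour: ⟨1 0 1⟩ is a specification of ⟨1 0 0⟩, yet adding the information that node 3 is active retracts the conclusion that node 2 is active.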

It is with the help of the equivalence (11) that the usual traits of nonmonotonic consequence relations can be shown: THEOREM 1. Let jw be a relation between information states as deﬁned in (13). Then we have (SUPRACLASSICALITY); (i) if t; then sjw t (REFLEXIVITY); (ii) sjw s (iii) if sjw t and s tjw u; then sjw u (CUT); (iv) if sjw t and sjw u; then s tjw u: (CAUTIOUSMONOTONICITY): ð14Þ The proofs for SUPRACLASSICALITY and REFLEXIVITY (clampedness!) are obvious. For CUT, suppose all E-minimal speciﬁcations of s are speciﬁcations of t and all E-minimal speciﬁcations of s t are speciﬁcations of u. Let s0 be an E-minimal speciﬁcation of s. s0 speciﬁes both s and t; consequently, it speciﬁes s t. Since s t ‡ s, it follows that s0 is also an E-minimal speciﬁcation of s t. Consequently, it is a speciﬁcation of u. For CAUTIOUS MONOTONICITY, suppose all E-minimal speciﬁcations of s are speciﬁcations of t and u. We have to prove v ‡ u for each E-minimal speciﬁcation v of s t. Let m be any E-minimal speciﬁcation of s t. Of course, m is a speciﬁcation of s. We shall prove now that m is an E-minimal speciﬁcation of s. If this were wrong, there would be an E-minimal speciﬁcation m0 of s such that E(m0 ) E(m). But all E-minimal speciﬁcations of s are speciﬁcations of t, therefore m0 ‡ t and m0 ‡ s t. This contradicts the E-minimality of m with respect to the speciﬁcations of s t. Therefore m must be an Eminimal speciﬁcation of s. Since all E-minimal speciﬁcations of s are speciﬁcations of u, one concludes that m ‡ u. The results found so far correspond to the ﬁndings of Balkenius and Ga¨rdenfors (1991), who have considered information states for cases where they form a Boolean algebra. The inferential notion that is adequate to describe the fast dynamics of the neural system (how neuron activities spread through the network) can be characterized in terms of the general postulates that Gabbay (1985) and Kraus et al. [211]


(1990) have seen as constituting a cumulative (non-monotonic) consequence relation.

A simple example may help to illustrate the ideas introduced so far and to simplify the subsequent explanations. Let’s consider a Hopfield network with a set of states $S = [-1, +1]^3$ and the connection matrix (15).

(15) $w = \begin{pmatrix} 0 & 0.2 & 0.1 \\ 0.2 & 0 & -1 \\ 0.1 & -1 & 0 \end{pmatrix}$

Figure 3 shows the activation states of the network before and after updating. For the input state it is assumed that node 1 is activated

Figure 1. Stable, asymptotically stable, and unstable temporal developments of information states.

Figure 2. Local and global minima of the Ljapunov function $E$ (paths from the start state: A, asynchronous updates; B, asynchronous updates with faults). Local minima can be realized by asynchronous updates with a deterministic activation function, while reaching the global minima often requires a stochastic activation function.

Figure 3. Activation states of a network with connection matrix (15) before updating (Input) and after updating (Output). White, gray, and black nodes indicate activation, resting, and inhibition, respectively. Link weights: 0.2 between nodes 1 and 2, 0.1 between nodes 1 and 3, and −1 between nodes 2 and 3.

and the other two nodes are resting (indicating underspecification). Clamping node 1, the fast dynamics yields an output state where node 2 is activated and node 3 is inhibited (minimal firing rate). In Table I, nine different possible specifications of the initial state $\langle 1\ 0\ 0 \rangle$ are shown, and their energy is calculated with regard to the connection matrix (15). The energy-minimal state is indicated by ☞. It corresponds to the output state represented in Figure 3. Using definition (13) and the equivalence (11) it is obvious that the inferences listed in (16) are valid.

TABLE I. States and their energy in a network with connection matrix (15). ☞ indicates the energy-minimum state.

s [state]                        E(s) [energy]
$\langle 1\ 0\ 0 \rangle$         0
$\langle 1\ 0\ 1 \rangle$         -0.1
$\langle 1\ 0\ {-1} \rangle$      0.1
$\langle 1\ 1\ 0 \rangle$         -0.2
$\langle 1\ 1\ 1 \rangle$         0.7
$\langle 1\ 1\ {-1} \rangle$      -1.1   ☞
$\langle 1\ {-1}\ 0 \rangle$      0.2
$\langle 1\ {-1}\ 1 \rangle$      -0.9
$\langle 1\ {-1}\ {-1} \rangle$   1.3
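The energies in Table I can be recomputed mechanically. The following Python sketch is an illustration only: it assumes the energy function $E(s) = -\sum_{i>j} w_{ij} s_i s_j$ (the form stated in (27)(v)), restricts nodes 2 and 3 to the values 0, +1 and −1, and assumes a componentwise reading of the specification ordering $\geq$. The names `energy` and `specifies` are hypothetical helpers, not taken from the text.

```python
from itertools import product

# Connection matrix (15): symmetric, zero diagonal.
W = [
    [0.0, 0.2, 0.1],
    [0.2, 0.0, -1.0],
    [0.1, -1.0, 0.0],
]

def energy(state, w=W):
    """Hopfield energy E(s) = -sum_{i>j} w[i][j] * s_i * s_j (assumed form)."""
    n = len(state)
    total = sum(w[i][j] * state[i] * state[j]
                for i in range(n) for j in range(i))
    return round(-total, 10) + 0.0  # "+ 0.0" turns -0.0 into 0.0

def specifies(s, t):
    """Assumed ordering s >= t: wherever t is active, s has the same sign
    and at least the same magnitude."""
    return all(ti == 0 or (si * ti > 0 and abs(si) >= abs(ti))
               for si, ti in zip(s, t))

# Rows of Table I: node 1 clamped to 1, nodes 2 and 3 in {0, 1, -1}.
rows = [(1, s2, s3) for s2, s3 in product((0, 1, -1), repeat=2)]
table = {s: energy(s) for s in rows}
minimal = min(table, key=table.get)

for s in rows:
    print(s, table[s], "<- minimum" if s == minimal else "")

# The inferences (16) hold if the minimal state specifies each conclusion.
conclusions = [(1, 1, -1), (1, 1, 0), (0, 1, 0)]
print(all(specifies(minimal, t) for t in conclusions))
```

Under these assumptions the minimum is reached at $\langle 1\ 1\ {-1} \rangle$, in agreement with Table I.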

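(16)
(i) $\langle 1\ 0\ 0 \rangle \mathrel{|\!\sim_w} \langle 1\ 1\ {-1} \rangle$
(ii) $\langle 1\ 0\ 0 \rangle \mathrel{|\!\sim_w} \langle 1\ 1\ 0 \rangle$
(iii) $\langle 1\ 0\ 0 \rangle \mathrel{|\!\sim_w} \langle 0\ 1\ 0 \rangle$

The latter two inferences can be derived from the first one by taking into account that $\langle 1\ 1\ {-1} \rangle \geq \langle 1\ 1\ 0 \rangle \geq \langle 0\ 1\ 0 \rangle$.

4. WEIGHT-ANNOTATED POOLE SYSTEMS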

In connectionist systems “knowledge” is encoded in the connection matrix $w$ (or, alternatively, the energy function $E$). Symbolic systems usually take default logic and represent knowledge as a database consisting of expressions having default status. A prominent example of such a framework has been proposed by Poole (1988, 1996). In this section, we introduce a variant of Poole’s systems, which we call weight-annotated Poole systems. This variant will prove useful for relating the different types of coding knowledge (see Section 5). Let us consider the language $L_{At}$ of propositional logic (referring to the alphabet $At$ of atomic symbols). A triple T =

(18) $\alpha \mathrel{|\!\sim_T} \beta$ iff $\beta$ is an ordinary consequence of each maximal scenario of $\alpha$ in $T$.

It is important to give a preference semantics for weight-annotated Poole systems. This preference semantics may be seen as the decisive link for establishing the correspondence between connectionist and symbolic systems. Let $m$ denote an ordinary (total) interpretation for the language $L_{At}$ ($m: At \to \{-1, 1\}$). The usual clauses apply for the evaluation of the formulas of $L_{At}$ relative to $m$. The following function indicates how strongly an interpretation $m$ conflicts with the space of hypotheses $\Delta$:

(19) $\mathcal{E}(m) = -\sum_{\delta \in \Delta} g(\delta)\, [\![\delta]\!]_m$ ($\mathcal{E}$ is called the “energy” of the interpretation, or Poole’s system energy)

An interpretation $m$ is called a model of $\alpha$ just in case $[\![\alpha]\!]_m = 1$. A preferred model of $\alpha$ is a model of $\alpha$ with minimal energy $\mathcal{E}$ (with regard to the other models of $\alpha$). As a semantic counterpart to the syntactic notion $\alpha \mathrel{|\!\sim_T} \beta$, let’s take the following relation:

(20) $\alpha \mathrel{|\!\approx_T} \beta$ iff each preferred model of $\alpha$ is a model of $\beta$.

As a matter of fact, the syntactic notion (18) and the semantic notion (20) coincide:

THEOREM 2. For all formulae $\alpha$ and $\beta$ of $L_{At}$: $\alpha \mathrel{|\!\sim_T} \beta$ iff $\alpha \mathrel{|\!\approx_T} \beta$.

For the proof, it is sufficient to show that the following two clauses are equivalent:

(A) There is a maximal scenario $\Delta'$ of $\alpha$ in $T$ such that $\{\alpha\} \cup \Delta' \cup \{\neg\beta\}$ is consistent.
(B) There is a preferred model $m$ of $\alpha$ such that $[\![\beta]\!]_m = -1$.

In order to prove this equivalence we have to state some simple facts which are immediate consequences of the corresponding definitions. Let T =

(the scenario associated with the model $m$).

(22) $sc(\Delta, m) = \Delta'$ in case $\Delta'$ is a maximal scenario of $\alpha$ in $T$ and $m$ is a model of $\{\alpha\} \cup \Delta'$.

Now we are ready to prove the equivalence between (A) and (B).

(A) $\Rightarrow$ (B): Let’s assume that $\Delta'$ is a maximal scenario of $\alpha$ in $T$ and $m$ is a model of $\{\alpha\} \cup \Delta' \cup \{\neg\beta\}$. We have to show that $m$ is a preferred model of $\alpha$ in $T$, i.e., we have to show that for each model $m'$ of $\alpha$ in $T$, $\mathcal{E}(m') \geq \mathcal{E}(m)$. From fact (21), it follows that $\mathcal{E}(m') = -G(sc(\Delta, m'))$, and the facts (21) & (22) necessitate $\mathcal{E}(m) = -G(\Delta')$. Since $sc(\Delta, m')$ is a scenario of $\alpha$ in $T$ and $\Delta'$ is a maximal scenario, it follows that $\mathcal{E}(m') \geq \mathcal{E}(m)$.

(B) $\Rightarrow$ (A): We assume a preferred model $m$ of $\alpha$ and assume $[\![\beta]\!]_m = -1$. Obviously, the set $sc(\Delta, m) \cup \{\alpha\} \cup \{\neg\beta\}$ is consistent ($m$ is a model of it). We have to show now that the scenario $sc(\Delta, m)$ is a maximal scenario of $\alpha$ in $T$. If it were not, then there would exist a maximal scenario $\Delta'$ with $G(\Delta') > G(sc(\Delta, m))$. Because we have $-G(\Delta') = \mathcal{E}(m')$ for any model $m'$ of $\{\alpha\} \cup \Delta'$ and $-G(sc(\Delta, m)) = \mathcal{E}(m)$ [facts (21) & (22)], this would contradict the assumption that $m$ is a preferred model of $\alpha$ in $T$. End of proof.

5. RELATING CONNECTIONISM AND SYMBOLISM

In investigating the correspondence between connectionist and symbolic knowledge bases, we first have to look for a symbolic representation of information states. Let’s consider again the propositional language $L_{At}$, but now let us take this language as a symbolic means to speak about information states. Following usual practice in algebraic semantics, we can do this formally by interpreting (some subset of the) expressions of the propositional language by the corresponding elements of the DeMorgan algebra of information states. More precisely, let’s call the triple $M$ a Hopfield model (for $L_{At}$) iff its interpretation function $[\![\cdot]\!]$ assigns some element of $S \cup \{\bot\}$ to each atomic symbol and satisfies the following conditions:

(23)
(i) $[\![\alpha \wedge \beta]\!] = [\![\alpha]\!] \circ [\![\beta]\!]$
(ii) $[\![\neg\beta]\!] = -[\![\beta]\!]$ (“$-$” converts positive into negative activation and vice versa).

A Hopfield model is called local (for $L_{At}$) iff it realizes the following assignments:

(24)
$[\![p_1]\!] = \langle 1\ 0 \ldots 0 \rangle$
$[\![p_2]\!] = \langle 0\ 1 \ldots 0 \rangle$
...
$[\![p_n]\!] = \langle 0\ 0 \ldots 1 \rangle$

An information state $s$ is said to be represented by a formula $\alpha$ of $L_{At}$ (relative to a Hopfield model $M$) iff $[\![\alpha]\!] = s$. With regard to our earlier example, the following formulae represent proper activation


states: $p_1$ represents $\langle 1\ 0\ 0 \rangle$, $p_2$ represents $\langle 0\ 1\ 0 \rangle$, $p_3$ represents $\langle 0\ 0\ 1 \rangle$, $p_1 \wedge p_2$ represents $\langle 1\ 1\ 0 \rangle$, $\neg p_1$ represents $\langle {-1}\ 0\ 0 \rangle$, and $p_1 \wedge p_2 \wedge p_3$ represents $\langle 1\ 1\ 1 \rangle$. With regard to local Hopfield models it is obvious that each state can be represented by a conjunction of literals (atoms or their inner negation). In other words, for local models each information state can be considered symbolic.

Local Hopfield models give us the opportunity to relate connectionist and symbolic knowledge bases in a straightforward way and to represent non-monotonic inferences in neural (Hopfield) networks by inferences in weight-annotated Poole systems. The crucial point is the translation of the connection matrix $w$ into an associated Poole system $T_w$. Let’s consider a Hopfield system ($n$ neurons) with connection matrix $w$, and let $At = \{p_1, \ldots, p_n\}$ be the set of atomic symbols. Take the following formulae $\alpha_{ij}$ of $L_{At}$:

(25) $\alpha_{ij} =_{def} (p_i \leftrightarrow \mathrm{sign}(w_{ij})\, p_j)$, for $1 \leq i < j \leq n$,

where $\mathrm{sign}(w_{ij})\, p_j$ stands for $p_j$ if $w_{ij}$ is positive and for $\neg p_j$ if $w_{ij}$ is negative.

For each connection matrix $w$ the associated Poole system is defined as $T_w = \langle At, \Delta_w, g_w \rangle$, where

(26)
(i) $\Delta_w = \{\alpha_{ij} : 1 \leq i < j \leq n\}$
(ii) $g_w(\alpha_{ij}) = |w_{ij}|$

Updating information states was treated as a kind of specification in Section 3. Under certain conditions (viz., where there are no isolated nodes) it can be shown that each (partial) information state is completed asymptotically; i.e., in the asymptotic state the corresponding node activities are either $+1$ or $-1$. Consequently, $\mathrm{ASUP}_w(s)$ contains only total information states. As a matter of fact, each total information state $t$ corresponds to a total propositional interpretation function $v/t$ where $v/t(p_i) = t_i$. Now we have the following facts:

(27)
(i) $[\![p_i]\!]_{v/t} = t_i$
(ii) $[\![\neg\alpha]\!]_{v/t} = -[\![\alpha]\!]_{v/t}$
(iii) $[\![\alpha \leftrightarrow \beta]\!]_{v/t} = [\![\alpha]\!]_{v/t} \cdot [\![\beta]\!]_{v/t}$
(iv) $t \geq [\![\alpha]\!]$ iff $[\![\alpha]\!]_{v/t} = 1$, in case $\alpha$ is a conjunction of literals
(v) $\mathcal{E}(v/t) = E(t)$, where $E(t) = -\sum_{i>j} w_{ij}\, t_i t_j$ is the energy function of a Hopfield network with the connection matrix $w$ and $\mathcal{E}$ is Poole’s system energy for the weight-annotated Poole system $T_w$; cf. (19)


(27) (i–iv) are direct consequences of the corresponding definitions. (27) (v) expresses the equivalence between Poole’s system energy and the Hopfield energy of an information state. In order to prove this equivalence we start with the definitions (25) and (26) and we get $\Delta_w = \{(p_i \leftrightarrow \mathrm{sign}(w_{ij})\, p_j) : 1 \leq i < j \leq n\}$. Next we use the definition (19) for $\mathcal{E}$ with regard to $\Delta_w$ and we employ the results (27) (i–iii). The equality (27) (v) is obvious if we take into account the equation $|w_{ij}| = g_w(p_i \leftrightarrow \mathrm{sign}(w_{ij})\, p_j)$, which results from (26) (ii).

Together with Theorem 2 (expressing that the proof procedure in weight-annotated Poole systems is sound and complete) and the fact that in a local Hopfield model each state is symbolic and can be represented by a conjunction of literals, the statement (27) (v) allows us to prove that non-monotonic inferences based upon asymptotic updates can be represented by inferences in weight-annotated Poole systems:

THEOREM 3. Let $\alpha$ and $\beta$ be formulas that are conjunctions of literals. Assume further that the Poole system $T$ is associated with the connection matrix $w$. Then

(28) $\alpha \mathrel{|\!\sim_w} \beta$ iff $\alpha \mathrel{|\!\sim_T} \beta$ (iff $\alpha \mathrel{|\!\approx_T} \beta$).
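The equality (27)(v) can be written out step by step. The following derivation is a sketch under the reading used here: truth values are +1 and −1, Poole’s system energy is written $\mathcal{E}$ and carries the minus sign of (19), and $w$ is symmetric with zero diagonal.

```latex
\begin{align*}
\mathcal{E}(v/t)
  &= -\sum_{\delta \in \Delta_w} g_w(\delta)\, [\![\delta]\!]_{v/t}
     && \text{definition (19)} \\
  &= -\sum_{i<j} |w_{ij}|\, [\![\, p_i \leftrightarrow \mathrm{sign}(w_{ij})\, p_j \,]\!]_{v/t}
     && \text{definitions (25), (26)} \\
  &= -\sum_{i<j} |w_{ij}|\, \mathrm{sign}(w_{ij})\, t_i\, t_j
     && \text{facts (27)(i)--(iii)} \\
  &= -\sum_{i<j} w_{ij}\, t_i\, t_j \;=\; E(t)
     && \text{since } |w_{ij}|\, \mathrm{sign}(w_{ij}) = w_{ij} \text{ and } w_{ij} = w_{ji}.
\end{align*}
```

Links with $w_{ij} = 0$ contribute nothing on either side, so they can be left out of the sum.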

For the simple proof we start with (27)(v), the equivalence between Poole’s system energy and the Hopfield energy of an information state. Then it is straightforward that the semantic notion $\alpha \mathrel{|\!\approx_T} \beta$ (entailment in preferred models, cf. Definition (20)) coincides with the non-monotonic inferential relation $s \mathrel{|\!\sim_w} t$ between information states (energy-minimal specifications, cf. Equation (11) and Definition (13)), provided that we have a local Hopfield model that realizes the correspondences $[\![\alpha]\!] = s$ and $[\![\beta]\!] = t$.

The result expressed by Theorem 3 shows that we can use nonmonotonic logic to characterize asymptotically how neuron activities spread through the connectionist network. In particular, a weighted variant of Poole’s logical framework for default reasoning has proven to be essential. Hence, the usefulness of non-monotonic logic as a descriptive and analytic tool for analyzing emerging properties of connectionist networks has been illustrated. Going back to our earlier example, Figure 4 illustrates the close relation between the connection matrix in Hopfield systems and the corresponding default system (weight-annotated Poole system). Using the connection matrix $w$ as given in (15), the

Figure 4. The correspondence between the connection matrix and a weight-annotated Poole system. Nodes 1, 2, 3 correspond to the atoms $p_1$, $p_2$, $p_3$; the link weights are 0.2 (nodes 1 and 2), 0.1 (nodes 1 and 3), and −1 (nodes 2 and 3).

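corresponding Poole system is given by the following weight-annotated defaults.⁴

(29) $T_w = \langle At, \Delta_w, g_w \rangle$
(i) $At = \{p_1, p_2, p_3\}$
(ii) $\Delta_w = \{p_1 \leftrightarrow p_2,\ p_1 \leftrightarrow p_3,\ p_2 \leftrightarrow \neg p_3\}$
(iii) $g_w = \{[p_1 \leftrightarrow p_2, 0.2],\ [p_1 \leftrightarrow p_3, 0.1],\ [p_2 \leftrightarrow \neg p_3, 1]\}$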
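For this small example, the translation from the matrix (15) into a Poole system and the energy equality (27)(v) can be checked by brute force. The Python sketch below is hypothetical helper code, not part of the original treatment: it encodes truth values as +1 and −1, represents a default $p_i \leftrightarrow \mathrm{sign}(w_{ij})\, p_j$ as a tuple `(i, j, sign, weight)` with zero-based indices, and assumes that Poole’s system energy is minus the weighted sum of the truth values of the defaults.

```python
from itertools import product

# Connection matrix (15).
W = [[0.0, 0.2, 0.1],
     [0.2, 0.0, -1.0],
     [0.1, -1.0, 0.0]]

def poole_system(w):
    """Defaults (i, j, sign, weight), read as p_i <-> sign * p_j with
    weight |w_ij|, for i < j and w_ij != 0."""
    n = len(w)
    return [(i, j, 1 if w[i][j] > 0 else -1, abs(w[i][j]))
            for i in range(n) for j in range(i + 1, n) if w[i][j] != 0]

def poole_energy(defaults, t):
    """Assumed form of (19): a biconditional over +1/-1 values is a product."""
    return -sum(g * (t[i] * sign * t[j]) for i, j, sign, g in defaults)

def hopfield_energy(w, t):
    n = len(w)
    return -sum(w[i][j] * t[i] * t[j] for i in range(n) for j in range(i))

def preferred_models(defaults, n, premise):
    """premise maps node indices to +1/-1 (a conjunction of literals)."""
    models = [t for t in product((1, -1), repeat=n)
              if all(t[k] == v for k, v in premise.items())]
    best = min(poole_energy(defaults, t) for t in models)
    return [t for t in models
            if abs(poole_energy(defaults, t) - best) < 1e-12]

def entails(defaults, n, premise, conclusion):
    """Relation (20): every preferred model of the premise satisfies the conclusion."""
    return all(all(t[k] == v for k, v in conclusion.items())
               for t in preferred_models(defaults, n, premise))

defaults = poole_system(W)
print(defaults)
print(preferred_models(defaults, 3, {0: 1}))   # premise p1
print(entails(defaults, 3, {0: 1}, {1: 1}))    # does p1 yield p2?
print(entails(defaults, 3, {0: 1}, {2: -1}))   # does p1 yield not-p3?
```

With the premise $p_1$, the only preferred model in this sketch is the interpretation corresponding to $\langle 1\ 1\ {-1} \rangle$, which matches the energy-minimal state of Table I.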

The translation mechanism can be read off from Figure 4. It simply translates a node $i$ into the atomic symbol $p_i$, translates an activating link in the network into the logical biconditional $\leftrightarrow$, and translates an inhibitory link into the biconditional $\leftrightarrow$ plus an internal negation of one of its arguments. Furthermore, the weights of the defaults have to be taken as the absolute value of the corresponding matrix elements.

There is another perspective from which one may look at Theorem 3. One possible application of Theorem 3 may be the use of connectionist techniques (such as “simulated annealing”) to implement nonmonotonic inferences.⁵ Seen in isolation, the latter point would favour the position of implementative connectionism: that connectionist ideas may be used only for the implementation of established symbolist systems. In the next section it is argued that already by considering local representations (where symbols correspond to single nodes in the network) connectionism provides remarkable insights that go beyond what usually is associated with the conception of implementation: connectionism is able to explain some peculiarities of symbolist systems.


6. EXPONENTIAL WEIGHTS AND OPTIMALITY THEORY

Non-trivial examples of weight-annotated Poole systems may be extracted from intra-segmental phonology. Intra-segmental phonology has been a source of inspiration for developing theories of markedness (e.g., Chomsky and Halle 1968; Kean 1975, 1981). In the present context it is used for demonstrating some aspects of integrative connectionism.

Table II presents a fragment of the vowel system of English (adapted from Kean 1975), which is a bit simplified for the present purpose. It contains a classification of the vowels in terms of the binary phonemic features back, high and low. The feature round has to be added in order to distinguish the segment /ɔ/ (rounded) from the segment /a/ (not rounded). For the purpose of formalization, the phonological features may be represented by the atomic symbols BACK, LOW, HIGH, ROUND. The knowledge of the phonological agent concerning this fragment may be represented explicitly as in Table III (left hand part), a list that enumerates the feature specifications for each vowel segment. It is evident that this list contains strong (absolute) and weak (or “probabilistic”) redundancies. For example, all segments that are classified as +HIGH are correlated with the specification −LOW (strong redundancy) and most segments that are classified as +BACK are correlated with the specification +LOW (weak redundancy).⁶ Let’s assume the following two hard constraints:

(30)
(i) $\mathrm{LOW} \to \neg\mathrm{HIGH}$
(ii) $\mathrm{ROUND} \to \mathrm{BACK}$.⁷

The generic knowledge of the phonological agent concerning this fragment may be expressed with regard to the hierarchy of features:

(31) BACK ≻ LOW ≻ HIGH ≻ ROUND

TABLE II. Fragment of the vowel system of English

              −Back     +Back
+High         /i/       /u/
−High/−Low    /e/       /o/
+Low          /æ/       /ɔ/, /a/

TABLE III. Left: Feature specifications for a vowel fragment. Right: Hopfield network with exponential weights representing the generic knowledge of a phonological agent

         /a/   /i/   /o/   /u/   /ɔ/   /e/   /æ/
BACK      +     −     +     +     +     −     −
LOW       +     −     −     −     +     −     +
HIGH      −     +     −     +     −     −     −
ROUND     −     −     +     +     +     −     −

Network (right): nodes VOC, BACK, LOW, HIGH, ROUND, with links between VOC and BACK (weight $\varepsilon^1$), BACK and LOW ($\varepsilon^2$), BACK and HIGH ($-\varepsilon^3$), LOW and ROUND ($-\varepsilon^4$).

Markedness conventions in the sense of Kean (1975, 1981) are rules for determining the (un)marked value of a feature F for a segment x given its value for a feature G preceding F in the feature hierarchy.

(32)
(i) +BACK is an unmarked property of the class VOC of vowels
(ii) +LOW is an unmarked property of +BACK (the class of back vowels)
(iii) −HIGH is an unmarked property of +BACK
(iv) −ROUND is an unmarked property of +LOW.⁸

There is a very general principle that allows one to calculate the unmarked values of the complements:

(33) If αF is an unmarked property of βG, then −αF is an unmarked property of −βG, where “−” turns the value of α from + into −, and vice versa.

For example, (33) allows us to determine +HIGH as an unmarked property of −BACK. It is intuitively clear now how the feature specifications in Table III can be calculated if the shaded elements are taken to be given (marked) explicitly. For instance, in order to calculate the specifications for the unmarked vowel /a/ we go stepwise through the feature hierarchy and calculate +BACK by applying (32)(i), +LOW by applying (32)(ii), −HIGH by applying the hard constraint (30)(i), and −ROUND by applying (32)(iv).


Taking the feature hierarchy (31) into account, the markedness conventions (32) can be represented as the elementary Hopfield network shown on the right-hand side of Table III. The technical means of expressing the hierarchy is the use of exponential weights with base $0 < \varepsilon \leq 0.5$.⁹ It should be mentioned that the implementation of the hard constraints (30) is not possible within a localist Hopfield network, and we will simply assume that these constraints are external and restrict the possible activations of the nodes in the network in this way. The assigned Poole system in the case under discussion also has to make use of exponential weights:

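(34)
(i) $\mathrm{VOC} \leftrightarrow_{\varepsilon^1} \mathrm{BACK}$
(ii) $\mathrm{BACK} \leftrightarrow_{\varepsilon^2} \mathrm{LOW}$
(iii) $\mathrm{BACK} \leftrightarrow_{\varepsilon^3} \neg\mathrm{HIGH}$
(iv) $\mathrm{LOW} \leftrightarrow_{\varepsilon^4} \neg\mathrm{ROUND}$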
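The effect of the exponential weights in (34) can be illustrated by enumerating candidate vowels. The Python sketch below rests on several assumptions: feature values are encoded as +1 and −1, VOC is clamped to +1, the hard constraints (30) are applied as an external filter, and ε is set to 0.5. All function names are hypothetical. The sketch compares the numerical optimum (minimal energy) with the optimum obtained by comparing violation profiles lexicographically, which is how strict ranking is read here.

```python
from itertools import product

FEATURES = ("BACK", "LOW", "HIGH", "ROUND")

def hard_constraints_ok(c):
    """Hard constraints (30): LOW -> not HIGH, ROUND -> BACK."""
    return (not (c["LOW"] == 1 and c["HIGH"] == 1)
            and not (c["ROUND"] == 1 and c["BACK"] == -1))

def violations(c):
    """Violation profile for the defaults (34), highest rank first (VOC = +1)."""
    return (
        int(c["BACK"] != 1),            # VOC <-> BACK
        int(c["BACK"] != c["LOW"]),     # BACK <-> LOW
        int(c["BACK"] != -c["HIGH"]),   # BACK <-> not HIGH
        int(c["LOW"] != -c["ROUND"]),   # LOW <-> not ROUND
    )

def energy(c, eps=0.5):
    """A satisfied default contributes -eps**k, a violated one +eps**k."""
    return sum(eps ** (k + 1) * (2 * v - 1)
               for k, v in enumerate(violations(c)))

def candidates(**clamped):
    """All feature assignments that pass (30) and agree with the clamped values."""
    for values in product((1, -1), repeat=len(FEATURES)):
        c = dict(zip(FEATURES, values))
        if hard_constraints_ok(c) and all(c[f] == v for f, v in clamped.items()):
            yield c

def optimum_by_energy(eps=0.5, **clamped):
    return min(candidates(**clamped), key=lambda c: energy(c, eps))

def optimum_by_ranking(**clamped):
    # Tuples compare lexicographically, so one violation of a higher-ranked
    # default outweighs any number of lower-ranked violations.
    return min(candidates(**clamped), key=violations)

print(optimum_by_energy())          # unmarked vowel
print(optimum_by_energy(HIGH=1))    # optimal high vowel
print(optimum_by_ranking(HIGH=1))
```

In this sketch the two procedures pick the same candidate: the specification of /a/ when only VOC is given, and the specification of /u/ when +HIGH is clamped as well.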

Theorem 3 ensures the equivalence between the connectionist and the symbolic treatment. Obviously, in the symbolic case the exponential ranking corresponds to a linear ranking of the defaults, with the hard constraints (30) on top. Using finite state transducers, the computational consequences of this view have been investigated by Frank and Satta (1998) and Karttunen (1998).

In cognitive psychology, the distinction between automatic processing and controlled processing has been demonstrated to be very useful (e.g., Shiffrin and Schneider 1977). Automatic processing is highly parallel, but limited in power. Controlled processing has powerful operations, but is limited in capacity. The distinction between these two types of processing has been used, for example, for modelling lexical access, visual perception, problem solving and parsing strategies in natural language processing. In the present context it may be helpful to assume that the distinction correlates with the shape of the Ljapunov function. In a case where the network is built with exponential weights, the energy landscape is structured like high mountains with an obvious path to the valley of the global energy minimum. Moreover, the stochastic update procedure yields robust results given some fluctuation of the parameters corresponding to attention. Hence we have the tentative characterization of automatic processing. In the other case, however, the energy landscape is flat and there are many walls of comparable height to cross


before we see the valley of minimum energy. In this case, it is much more difficult for the adiabatic freezing mechanism to find the global optimum. Processing is slow and the attentional parameters may become much more influential. That corresponds to the traits of controlled processing. Although the present story has several elements of speculation, it is obvious that the assumption of exponential weights has important consequences for the crucial characteristics of processing.

Theorem 2 opens a third way to calculate the consequences of activating a structure $\alpha$, namely by determining the preferred (or optimal) model(s) of $\alpha$. Table IV shows a so-called OT tableau for an input $\alpha = \mathrm{VOC}$, which precisely illustrates this calculation. As with weight-annotated Poole systems, OT looks for an optimal satisfaction of a system of conflicting constraints. Most importantly, the exponential weights of the constraints result in a strict ranking of the constraints, meaning that violations of many lower ranked constraints invariably count less than one violation of a higher ranked constraint (Prince and Smolensky 1993). The candidates can be seen as information (activation) states (left hand side of Table IV). The Harmony (or NegEnergy $H = -E$) can be recognized immediately from the violations of the (strictly ranked) constraints. Violations are marked by * (right hand side of Table IV) and the optimal candidate is indicated by ☞. The non-monotonicity of the OT framework corresponds to the fact that the optimal candidate(s) for a subset may be different from the optimal candidate(s) of the original set. This is demonstrated in Table V. Whereas in Table IV the segment /a/ comes out as the optimal vowel, in Table V the segment /u/ comes out as the optimal high vowel. Table IV shows that all constraints are satisfied for the optimal candidate. Table V demonstrates a case where two constraints conflict. The conflicting constraints are VOC ↔ BACK and BACK ↔ LOW. Whereas the first candidate /i/ violates the first constraint and satisfies the second constraint, the second candidate /u/ satisfies the first constraint and violates the second. The constraint conflict is resolved via a notion of differential strength. The second candidate wins because the first constraint is stronger (ranked higher) than the second.¹⁰

TABLE IV. OT tableau for calculating the optimal vowels. Input: +VOC.

      BACK  LOW  HIGH  ROUND | VOC↔BACK  BACK↔LOW  BACK↔¬HIGH  LOW↔¬ROUND
       +     +    −     +    |                                     *
 ☞     +     +    −     −    |
       −     −    +     −    |    *                                *
       +     −    +     +    |               *          *
       +     −    +     −    |               *          *          *
       +     −    −     +    |               *
       +     −    −     −    |               *                     *
       −     +    −     −    |    *          *          *
       −     −    −     −    |    *                     *          *

Optimality theory was originally proposed by Prince and Smolensky (1993) as a kind of symbolic approximation to the patterns of activation constituting mental representation in the domain of language. It relates to Harmony Theory (Smolensky 1986), a mathematical framework for the theory of connectionist computation. This framework makes it possible to abstract away from the details of connectionist networks. The present study contrasts with this approach. It is based on a particular type of network, which exhibits simple attractive mathematical properties: Hopfield networks. Hence, it cannot be our aim to contribute to the explanation of the general correspondence between connectionism and optimality theory. Instead, the present study brings about the correspondence in a very singular case only. Using the idea of local representations (Hopfield models), we were able to

TABLE V. OT tableau for calculating the optimal high vowels. Input: +VOC ∧ +HIGH.

      BACK  LOW  HIGH  ROUND | VOC↔BACK  BACK↔LOW  BACK↔¬HIGH  LOW↔¬ROUND
       −     −    +     −    |    *                                *
 ☞     +     −    +     +    |               *          *
       +     −    +     −    |               *          *          *

provide an explicit connection between Hopfield networks and particular systems of constraints and their ranking (Poole systems). The present study gives a concrete model that may be helpful in deciding which defining principles of OT are derivable from connectionism and which are not. Following the general exposition in Smolensky (2000), it is obvious that the following principles can be derived from the connectionist setting:

Optimality: The correct output representation is the one that maximizes Harmony. This can be derived by taking harmony as $H = -E$, where $E$ is the Ljapunov function of the connectionist system.

Containment: Competition for optimality is between outputs that include the given input. This point is derived from the idea that clamping the input units restricts the optimization in a network to those patterns including the input.

Parallelism: Harmony measures the degree of simultaneous satisfaction of constraints. This is clearly expressed by Definition (19).

Conflict: Constraints conflict: it is typically impossible to simultaneously satisfy them all. This point is derived from the fact that positive and negative connections typically put conflicting pressures on a unit’s activity.

Domination: Constraint conflict is resolved via a notion of differential strength: stronger constraints prevail over weaker ones in cases of conflict (consider the expression (19) again).

As is made clear in Smolensky (2000), all these principles hold independently of the particular connectionist networks that have been considered. The present treatment demonstrates that there is at least one principle that is derivable from the peculiarities of Hopfield networks only. This principle concerns the observation that all weak (violable) constraints have the form of biconditionals. As already mentioned, this principle follows from the symmetry of the weight matrix (Section 2) and from the idea of local representations (Section 5). Kean’s (1975, 1981) general principle (33) is a consequence of this fact. This principle is crucial for giving Kean’s markedness theory its restrictive power, and it is amazing that it is motivated by the general structure of the underlying neural network. Finally, there are at least two principles that are not derivable from connectionism but need some extra motivation.


Strictness of domination: Each constraint is stronger than all weaker constraints combined. This principle makes it possible to determine optimality without numerical computation. As we have seen, this principle may be motivated by the assumption of an automatic processing mode.

Universality: The constraints are the same in all human grammars. This principle corresponds to a strong restriction on the content of the constraints. At the moment, it expresses an empirical generalization and it is absolutely unclear how to explain it (Smolensky 2000).

In Section 1, the implementationalist position was mentioned: the position that downplays connectionism as a pure issue of the implementation of existing symbolic systems. Without doubt, connectionism can be used to implement existing symbolic systems. In the present context, for example, the Boltzmann machine can be used in order to implement non-monotonic inferences in weight-annotated Poole systems. However, connectionism is much more than a device to implement existing systems. Assuming connectionist ideas at the micro-level, the development of optimality theory (Prince and Smolensky 1993) has shown that non-classical cognitive architectures may emerge with broad applications in linguistics.¹¹ Another important point is the demonstration that aspects of possibly old-fashioned, “classical” architectures can be explained by assuming an underlying (connectionist) micro-level. In one case, we have demonstrated that the underlying symmetric architecture of the Hopfield networks helps to motivate the general form of the symbolic system (33). Furthermore, an underlying connectionist system may help to motivate the connection between the automatic processing mode and the strictness assumption of domination in ranked default systems. Hence, there are good reasons to accept the integrative methodology elucidated in Section 1.

7. RELATED WORK

In the introduction to this paper I stressed the integrative methodology the present account is pursuing. Due to the pioneering work by Balkenius and Gärdenfors (1991), the results of such a programme can be useful both for traditional connectionists and for traditional symbolists. On the one hand, the results can help connectionists to better understand their networks and to solve the so-called extraction problem (e.g., d’Avila Garcez et al. 2001). On the other hand, the outcomes of the integrative methodology can help symbolists to find more efficient implementations, for instance by using connectionist methods for solving “hard” problems such as optimization problems and constraint satisfaction problems.

Most of the authors who aim to bridge the gap between connectionism and symbolism concentrate either on the extraction problem or on the problem of implementation. An example of the first approach is d’Avila Garcez et al. (2001). These authors investigate feedforward networks and provide a complete and sound algorithm for extracting the underlying logical function that maps each vector of input activation to the corresponding output vector. The second approach is pursued by Derthick (1990) and Pinkas (1995), inter alia.

The work by Pinkas (1995) is particularly relevant. Very similarly to the present account, it maps preferred models into the minima of an energy function. However, there are also important differences. The first difference concerns the symbolic reasoning systems. Pinkas introduced penalty logic. As in knowledge representation in weight-annotated Poole systems, a positive real number (the penalty) is assigned to every propositional formula representing domain knowledge. Importantly, these weighted beliefs are used differently in the two reasoning systems. In the case of weight-annotated Poole systems both the beliefs that are included in the scenario and the beliefs that are not included count, cf. the expression (17) repeated here:

(17) $G(\Delta') = \sum_{\delta \in \Delta'} g(\delta) - \sum_{\delta \in (\Delta - \Delta')} g(\delta)$

In contrast, in penalty logic, a penalty function $F$ is constructed that counts the missing beliefs only (i.e., the beliefs that are not included in the scenario/theory):

(35) $F(\Delta') = \sum_{\delta \in (\Delta - \Delta')} g(\delta)$

Another important difference concerns the mapping between the expressions of the propositional default logic and the assigned symmetric (Hopfield) network.
Pinkas seeks a connectionist implementation of his penalty logic. Most importantly, he is able to define a function that translates every set of standard propositional formulas (paired with penalties) into a strongly equivalent symmetric network. However, this function is not one-to-one since different logical systems can be connected with the same network. For instance, the two penalty logical systems $w_1 = \{\langle 1 : p \to q \rangle\}$ and $w_2 = \{\langle 1 : q \to p \rangle, \langle 1 : q \rangle, \langle 1 : \neg p \rangle\}$ realize the same (two-node) network, a network that can be characterized by the energy function $E = p - pq$. Surely, this fact is not relevant when it comes to the connectionist implementation of penalty logic, since in this case a unique symmetric network can be constructed for each set of expressions in penalty logic. The fact, however, becomes highly relevant when it comes to dealing with the extraction problem, where we have to extract the corresponding system of expressions in penalty logic for a given symmetric net. Usually, there is no unique solution to this task, and this leads to a complication of the extraction procedure since it is not clear which system should be extracted.

This contrasts with the unique translation procedure proposed in the present article. In this case the proposed mapping between Hopfield nets and (a subset of) weight-annotated Poole systems is one-to-one. The mapping is a very transparent one and simply translates the links in the network to biconditionals in the logic. From the view of implementation, of course, it is a disadvantage that the present default system is restricted to simple biconditionals $\alpha \leftrightarrow \beta$ where $\alpha$ and $\beta$ are literals. However, such expressions seem to have a privileged status in certain cognitive systems, for example in theories of markedness in intrasegmental phonology (see Section 6). This raises two important questions: the first asks how to justify the special status of such restricted systems; the second asks how to overcome the restrictions by introducing hidden units and/or distributed representations.

8. CONCLUSIONS

Hopfield networks are, for (integrative) connectionists, to some extent what harmonic oscillators are for physicists and what propositional logic is for logicians. They are simple to study; their mathematical properties are pretty clear, but they have very restricted applications. The main advantage of concentrating on this simple type of network is a methodological one: it helps to clarify the important notions, it sharpens the mathematical instruments, and it provides a starting point for extending and modifying the simplistic framework. In this vein, it has to be stressed that the primary aim of the present investigation is a methodological one: the demonstration that model-theoretic semantics may be very useful for analyzing (emerging properties of) connectionist networks.

The main finding was that certain activities of connectionist networks can be interpreted as nonmonotonic inferences. In particular, there is a strict correspondence between Hopfield networks and particular non-monotonic inferential systems (Poole systems). The relation between nonmonotonic inferences and neural computation was established to be of the type that holds between higher level and lower level systems of analysis in the physical sciences. (For example, statistical mechanics explains significant parts of thermodynamics from the hypothesis that matter is composed of molecules, but the concepts of thermodynamic theory, like “temperature” and “entropy,” involve no reference whatever to molecules.) Hence, our approach is a reductionist one, understanding reductionism in a way that sees unification as the primary aim and not elimination (cf. Dennett (1995) for a philosophical rehabilitation of reductionism).

Admittedly, the results found so far are much too simplistic to count as a real contribution to closing the gap between symbolism and connectionism. There are two main limitations. The first concerns the consideration of local representations only, where symbols correspond to single nodes in the network. Further, the present paper did not make use of hidden nodes, which considerably restricts the capacity of knowledge representation. The second limitation concerns the rather static conception of node activity studied here, which precludes the opportunity to exploit the idea of coherent firing of neurons (temporal synchrony). As a consequence, we are confronted with a very pure symbolic system that fails to express constituent structure, variable binding, quantification, and the realization that consciousness and intentionality are prerequisites for cognition and knowledge (cf. Bartsch 2002).
There are several possible ways to overcome these shortcomings. First, a straightforward extension is to incorporate Pinkas’ ideas concerning the role of hidden units into the present framework. Another option is to introduce distributed representations for realizing constituent structures and binding (e.g., Smolensky’s 1990 tensor product representation). An alternative possibility is to adopt the so-called coherence view, where a dynamic binding is realized by some sort of coherence (e.g., coherent firing of neurons – temporal synchrony: Shastri and Ajjanagadde 1993; Shastri and Wendelken 2000; circuit activity organized by attractors: Grossberg 1996; Bartsch 2002).

Optimality Theory (OT) has proposed a new computational architecture for cognition which claims to integrate connectionist and symbolic computation. Though too simple to give a full justification of OT’s basic principles, the present account of unifying connectionism and symbolism can help to understand them. In particular, it may help to understand the hierarchical encoding of constraint strengths in OT. The solution to this particular problem “may create a rapprochement between network models and symbolic accounts that triggers an era of dramatic progress in which alignments are found and used all the way from the neural level to the cognitive/linguistic level.” (Bechtel 2002, 17)

In conclusion, it is important to establish an active dialogue between the traditional symbolic approaches to logic, information and language and the connectionist paradigm. Perhaps this dialogue may stimulate the present discussion of founding the basic principles of Optimality Theory, and likewise it may shed new light on old notions like partiality, underspecification, learning, genericity, probabilistic logic, and prototypicality.

ACKNOWLEDGEMENTS

My special thanks go to Johan van Benthem, Michiel van Lambalgen and Larry Moss, who have encouraged me to pursue this line of research and gave valuable impulses and stimulation. Furthermore, I have to thank Anton Benz, Paul Doherty, Jason Mattausch, Oren Schwarz, Paul Smolensky, Anatoli Strigin, and Henk Zeevat. I am grateful to two anonymous referees for very helpful comments.

NOTES

1 The use of asynchronous updates helps to prevent the network from falling into unstable oscillations, see Section 3.
2 Intuitively, impossible activation states express clashes between positive and negative activation.
3 The procedure is called “simulated annealing” (based on an analogy from physics).
4 In Figure 4, the weights are represented as indexing the biconditionals in the corresponding defaults.
5 For details cf. Derthick (1990).
6 From the perspective of Universal Grammar, absolute constraints are universal in the sense that they are actually inviolate in every language. Weak constraints, on the other hand, embody universality in a soft sense. The idea is that each feature has two values or specifications, one of which is marked, the other unmarked. Unmarked values are crosslinguistically preferred and basic in all grammars, while marked values are crosslinguistically avoided and used by grammars only to create contrasts.
7 The first constraint is universally valid for obvious reasons. Although the second constraint is satisfied for the present vowel fragment, it is not valid generally. Considering it as a hard constraint despite this fact is a rough stipulation that makes sense only in order to simplify the subsequent discussion.
8 Kean (1975, 1981) considers the more general case where more than one feature preceding F in the feature hierarchy is necessary to determine the (un)marked value of the feature F. An example is (iv′) ROUND is an unmarked property of BACK ∧ LOW. The expression (iv) comes close to the formulation of (iv′) if the hard constraint (30)(ii) is taken into account (see Note 7).
9 e = 1/2 or smaller is a proper base in the case of binary features, which can be applied only one time.
10 The third candidate violates the same constraints as the second one plus the additional constraint LOW ↔ ROUND. It can be seen as irrelevant for determining the optimal candidate.
11 Obviously, optimality theory has markedness theory as one of its predecessors. However, OT goes far beyond classical markedness theories. It includes a powerful learning theory (Tesar and Smolensky 2000) and applies to the phenomenon of gradedness (e.g., Boersma and Hayes 2001).
12 For a critical review of this approach the reader is referred to Bartsch (2002), Section 2.3.3.

REFERENCES

d’Avila Garcez, A., K. Broda and D. Gabbay: 2001, ‘Symbolic Knowledge Extraction from Trained Neural Networks: A Sound Approach’, Artificial Intelligence 125, 153–205.
Balkenius, C. and P. Gärdenfors: 1991, ‘Nonmonotonic Inferences in Neural Networks’, in J. A. Allen, R. Fikes and E. Sandewall (eds.), Principles of Knowledge Representation and Reasoning, Morgan Kaufmann, San Mateo, CA.
Bartsch, R.: 2002, Consciousness Emerging, John Benjamins, Amsterdam & Philadelphia.
Bechtel, W.: 2002, Connectionism and the Mind, Blackwell, Oxford.
Boersma, P. and B. Hayes: 2001, ‘Empirical Tests of the Gradual Learning Algorithm’, Linguistic Inquiry 32, 45–86.
Boutsinas, B. and M. Vrahatis: 2001, ‘Artificial Nonmonotonic Neural Networks’, Artificial Intelligence 132, 1–38.
Chomsky, N. and M. Halle: 1968, The Sound Pattern of English, Harper and Row, New York.
Cohen, M. A. and S. Grossberg: 1983, ‘Absolute Stability of Global Pattern Formation and Parallel Memory Storage by Competitive Neural Networks’, IEEE Transactions on Systems, Man, and Cybernetics SMC-13, 815–826.
Dennett, D. C.: 1995, Darwin’s Dangerous Idea, Simon & Schuster, New York.
Derthick, M.: 1990, ‘Mundane Reasoning by Settling on a Plausible Model’, Artificial Intelligence 46, 107–157.

Fodor, J. A. and Z. W. Pylyshyn: 1988, ‘Connectionism and Cognitive Architecture: A Critical Analysis’, Cognition 28, 3–71.
Frank, R. and G. Satta: 1998, ‘Optimality Theory and the Generative Complexity of Constraint Violability’, Computational Linguistics 24, 307–315.
Gabbay, D.: 1985, ‘Theoretical Foundations for Non-monotonic Reasoning in Expert Systems’, in K. Apt (ed.), Logics and Models of Concurrent Systems, Springer-Verlag, Berlin, pp. 439–459.
Glymour, C.: 2001, The Mind’s Arrows, The MIT Press, Cambridge & London.
Grossberg, S.: 1989, ‘Nonlinear Neural Networks: Principles, Mechanisms, and Architectures’, Neural Networks 1, 17–66.
Grossberg, S.: 1996, ‘The Attentive Brain’, American Scientist 83, 438–449.
Hendler, J. A.: 1989, ‘Special Issue: Hybrid Systems (Symbolic/Connectionist)’, Connection Science 1, 227–342.
Hendler, J. A.: 1991, ‘Developing Hybrid Symbolic/Connectionist Models’, in J. Barnden and J. Pollack (eds.), High-level Connectionist Models, Advances in Connectionist and Neural Computation Theory, Vol. 1, Ablex Publ. Corp, Norwood, NJ.
Hinton, G. E. and T. J. Sejnowski: 1983, ‘Optimal Perceptual Inference’, in Proceedings of the Institute of Electronic and Electrical Engineers Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, Washington, DC, pp. 448–453.
Hinton, G. E. and T. J. Sejnowski: 1986, ‘Learning and Relearning in Boltzmann Machines’, in D. E. Rumelhart, J. L. McClelland, and the PDP research group, pp. 282–317.
Hopfield, J. J.: 1982, ‘Neural Networks and Physical Systems with Emergent Collective Computational Abilities’, Proceedings of the National Academy of Sciences 79, 2554–2558.
Karttunen, L.: 1998, The Proper Treatment of Optimality in Computational Phonology, Manuscript, Xerox Research Centre, Europe.
Kean, M. L.: 1975, The Theory of Markedness in Generative Grammar, Ph.D. thesis, MIT, Cambridge, Mass.
Kean, M. L.: 1981, ‘On a Theory of Markedness’, in R. Bandi, A. Belletti and L. Rizzi (eds.), Theory of Markedness in Generative Grammar, Estratto, Pisa, pp. 559–604.
Kokinov, B.: 1997, ‘Micro-level Hybridization in the Cognitive Architecture DUAL’, in R. Sun and F. Alexander (eds.), Connectionist-symbolic Integration: From Unified to Hybrid Approaches, Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 197–208.
Kraus, S., D. Lehmann and M. Magidor: 1990, ‘Nonmonotonic Reasoning, Preferential Models and Cumulative Logics’, Artificial Intelligence 44, 167–207.
McCulloch, W. S. and W. Pitts: 1943, ‘A Logical Calculus of the Ideas Immanent in Nervous Activity’, Bulletin of Mathematical Biophysics 5, 115–133.
Partee, B. with H. L. W. Hendriks: 1997, ‘Montague Grammar’, in J. van Benthem and A. ter Meulen (eds.), Handbook of Logic and Language, MIT Press, Cambridge, pp. 5–91.
Pinkas, G.: 1995, ‘Reasoning, Nonmonotonicity and Learning in Connectionist Networks that Capture Propositional Knowledge’, Artificial Intelligence 77, 203–247.

Pinker, S. and A. Prince: 1988, ‘On Language and Connectionism: Analysis of a Parallel Distributed Processing Model of Language Acquisition’, Cognition 28, 73–193.
Poole, D.: 1988, ‘A Logical Framework for Default Reasoning’, Artificial Intelligence 36, 27–47.
Poole, D.: 1996, ‘Who Chooses the Assumptions?’, in P. O’Rorke (ed.), Abductive Reasoning, MIT Press, Cambridge.
Prince, A. and P. Smolensky: 1993, Optimality Theory: Constraint Interaction in Generative Grammar, Technical Report CU-CS-696-93, Department of Computer Science, University of Colorado at Boulder, and Technical Report TR-2, Rutgers Center for Cognitive Science, Rutgers University, New Brunswick, NJ.
Rumelhart, D. E., J. L. McClelland, and the PDP research group: 1986, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume I and II, MIT Press/Bradford Books, Cambridge, MA.
Rumelhart, D. E., P. Smolensky, J. L. McClelland and G. E. Hinton: 1986, ‘Schemata and Sequential Thought Processes in PDP Models’, in D. E. Rumelhart, J. L. McClelland and the PDP research group (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume II, MIT Press/Bradford Books, Cambridge, MA, pp. 7–57.
Shastri, L. and V. Ajjanagadde: 1993, ‘From Simple Associations to Systematic Reasoning’, Behavioral and Brain Sciences 16, 417–494.
Shastri, L. and C. Wendelken: 2000, ‘Seeking Coherent Explanations – A Fusion of Structured Connectionism, Temporal Synchrony, and Evidential Reasoning’, Proceedings of Cognitive Science, Philadelphia, PA.
Shiffrin, R. M. and W. Schneider: 1977, ‘Controlled and Automatic Human Information Processing: II. Perceptual Learning, Automatic Attending, and A General Theory’, Psychological Review 84, 127–190.
Smolensky, P.: 1986, ‘Information Processing in Dynamical Systems: Foundations of Harmony Theory’, in D. E. Rumelhart, J. L. McClelland, and the PDP research group (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume I, MIT Press/Bradford Books, Cambridge, MA, pp. 194–281.
Smolensky, P.: 1988, ‘On the Proper Treatment of Connectionism’, Behavioral and Brain Sciences 11, 1–23.
Smolensky, P.: 1990, ‘Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Networks’, Artificial Intelligence 46, 159–216.
Smolensky, P.: 1996, ‘Computational, Dynamical, and Statistical Perspectives on the Processing and Learning Problems in Neural Network Theory’, in P. Smolensky, M. C. Mozer, and D. E. Rumelhart (eds.), Mathematical Perspectives on Neural Networks, Lawrence Erlbaum Publishers, Mahwah, NJ, pp. 1–13.
Smolensky, P.: 2000, ‘Grammar-Based Connectionist Approaches to Language’, Cognitive Science 23, 589–613.
Smolensky, P. and G. Legendre (to appear), The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar, Blackwell, Oxford.
Smolensky, P., G. Legendre, and Y. Miyata: 1992, ‘Principles for An Integrated Connectionist/Symbolic Theory of Higher Cognition’, Technical Report CU-CS-600-92, Department of Computer Science, Institute of Cognitive Science, University of Colorado Boulder.

Tesar, B. and P. Smolensky: 2000, Learnability in Optimality Theory, MIT Press, Cambridge, MA.

Department of Philosophy
University of Amsterdam
Nieuwe Doelenstraat 15
Amsterdam
The Netherlands
E-mail: [email protected]

FRANZ DIETRICH and CHRISTIAN LIST

A MODEL OF JURY DECISIONS WHERE ALL JURORS HAVE THE SAME EVIDENCE

ABSTRACT. Under the independence and competence assumptions of Condorcet’s classical jury model, the probability of a correct majority decision converges to certainty as the jury size increases, a seemingly unrealistic result. Using Bayesian networks, we argue that the model’s independence assumption requires that the state of the world (guilty or not guilty) is the latest common cause of all jurors’ votes. But often – arguably in all courtroom cases and in many expert panels – the latest such common cause is a shared ‘body of evidence’ observed by the jurors. In the corresponding Bayesian network, the votes are direct descendants not of the state of the world, but of the body of evidence, which in turn is a direct descendant of the state of the world. We develop a model of jury decisions based on this Bayesian network. Our model permits the possibility of misleading evidence, even for a maximally competent observer, which cannot easily be accommodated in the classical model. We prove that (i) the probability of a correct majority verdict converges to the probability that the body of evidence is not misleading, a value typically below 1; (ii) depending on the required threshold of ‘no reasonable doubt’, it may be impossible, even in an arbitrarily large jury, to establish guilt of a defendant ‘beyond any reasonable doubt’.

1. INTRODUCTION

Suppose a jury (committee, expert panel, etc.) has to determine whether or not a defendant is guilty (whether or not some factual proposition is true). There are two possible states of the world: $x = 1$ (the defendant is guilty) and $x = 0$ (the defendant is not guilty). Given that the state of the world is $x$, each juror has the same probability (competence) $p > 1/2$ of voting for $x$, and the votes of different jurors are independent from one another. Then the probability that a majority of the jurors votes for $x$, given the state of the world $x$, converges to 1 as the number of jurors increases. This is the classical Condorcet jury theorem (e.g., Grofman, Owen and Feld 1983). The theorem implies that the reliability of majority decisions can be made arbitrarily close to certainty by increasing the jury size.

Synthese 142: 175–202, 2004. Knowledge, Rationality & Action 235–262. 2004 Kluwer Academic Publishers. Printed in the Netherlands.

This result may seem puzzling. What if all jurors are tricked by the same evidence, which seems ever so compelling? What if, against all odds, the wind blows an innocent person’s hair to the crime scene and the jurors believe that it could not have arrived there without the person’s presence? What if the evidence is so confusing that, no matter how many jurors are consulted, there is not enough evidence to solve a case conclusively? The classical Condorcet jury theorem suggests that we can rule out such scenarios by increasing the jury size sufficiently.

Suppose each juror views the crime scene from a different perspective and obtains a separate item of evidence about the state of the world. This requires that, for each additional juror, a new independent item of evidence is available. So there must exist arbitrarily many items of evidence as the jury size tends to infinity, which are confirmationally independent regarding the hypothesis that the defendant is guilty (on confirmational independence, see Fitelson 2001). Call this case A. Then the jury would be able to reach a correct decision with a probability approaching 1, by aggregating arbitrarily many independent items of evidence into a single verdict.

But often there are not arbitrarily many independent items of evidence. Rather, the jury as a whole reviews the same body of evidence, such as that presented in the courtroom, which does not increase with the jury size. Each juror has to decide whether he or she believes that this evidence supports the hypothesis that the defendant is guilty. Call this case B. Arguably, decisions in most real-world juries and many committees and expert panels are instances of case B. Moreover, in most legal systems, there are ‘rules of evidence’ specifying what evidence is admissible in a court’s decision and what evidence is not.
Jurors are legally required to use only the evidence presented in the courtroom (typically the only evidence about a case jurors come to see) and to ignore any evidence obtained through other channels (in those rare cases where they have such evidence).

We argue that, while case A might satisfy the conditions of the classical Condorcet jury theorem, case B does not. We represent each case using Bayesian networks (Bovens and Olsson 2000; Pearl 2000; Corfield and Williamson 2001). Case A satisfies Condorcet’s independence assumption, so long as a demanding condition holds: the state of the world is the latest common cause of the jurors’ votes. In the corresponding Bayesian network, votes are direct causal descendants of the state of the world. This assumption, although implicit in the classical Condorcet jury model, is not usually acknowledged.

Case B, by contrast, violates the classical independence assumption, as there exists an intermediate common cause between the state of the world and the jurors’ votes, namely the body of evidence. In the corresponding Bayesian network, the jurors’ votes are direct descendants of the body of evidence, which in turn is a direct descendant of the state of the world. This dependency structure has radical implications for the Condorcet jury theorem.

The model developed in this paper is based on the Bayesian network of case B. The main novelty of our model is that different jurors are independent not conditional on the state of the world, but conditional on the evidence. This follows from the requirement, formulated in terms of the Parental Markov Condition (defined below), that independence should be assumed conditional on the latest common cause. While in case A the latest common cause of the jurors’ votes is the state of the world, in case B it is the shared body of evidence.

Our model shows that, irrespective of the jury size and juror competence, the overall jury reliability at best approaches the probability that the evidence is not misleading, i.e., the probability that the evidence points to the truth from the perspective of a maximally competent ‘ideal’ observer, a value typically below one. We prove further that, depending on the required threshold of ‘no reasonable doubt’, it may be impossible, even in an arbitrarily large jury and even when there is unanimity, to establish guilt of a defendant ‘beyond any reasonable doubt’. The results imply that, if real-world jury, committee or expert panel decisions are more similar to case B than to case A, the classical Condorcet jury theorem fails to apply to such decisions.
Previous work on dependencies between jurors’ votes has focused on, first, opinion leaders – jurors who influence other jurors – (Grofman, Owen and Feld 1983; Nitzan and Paroush 1984; Owen 1986; Boland 1989; Boland, Proschan and Tong 1989; Estlund 1994) and, secondly, a lack of free speech that makes votes dependent on a few dominant ‘schools of thought’ (e.g., Ladha 1992). These sources of dependence differ from the one in our model. In the first case, the votes themselves are causally interdependent. In the second, some votes have an additional common cause: a common ‘school of thought’ that is independent from the state of the world. But in both cases, unlike in our model, votes are still direct descendants of the state of the world. As a consequence, existing models with dependencies have preserved the result that the probability of a correct majority decision converges to 1 as the jury size increases, so long as different jurors’ votes are not too highly correlated. Further, these

models do not impose an upper bound on the total evidence available to the jury, and they usually suggest that the difference between Condorcet’s classical model and one with dependencies lies in a different (slower) convergence rate, but not in a different limit, as in our model. By contrast, the dependency structure of our Bayesian network model has been unexplored so far.

2. THE MODEL

2.1. The Classical Jury Model

There are $n$ jurors, labelled $i = 1, 2, \ldots, n$. The state of the world is represented by a binary variable $X$ taking the values 0 (not guilty) or 1 (guilty). The jurors’ votes are represented by the binary random variables $V_1, V_2, \ldots, V_n$. Each $V_i$ takes the values 0 (a ‘not guilty’ vote) or 1 (a ‘guilty’ vote). A juror $i$’s judgment is correct if and only if the value of $V_i$ coincides with that of $X$. We use capital letters to denote random variables and small letters to denote particular values. Condorcet’s classical model assumes the following.¹

Independence Given the State of the World (I|X). The votes $V_1, V_2, \ldots, V_n$ are independent from one another, conditional on the state of the world $X$.

As argued later, this implicitly assumes that each juror’s vote is directly probabilistically caused by the state of the world,² and is therefore independent from the other jurors’ votes once the state of the world is given.

Competence Given the State of the World (C|X). For each state of the world $x \in \{0, 1\}$ and all jurors $i = 1, 2, \ldots, n$, $p = P(V_i = x \mid X = x) > 1/2$.

Each juror’s vote is thus a signal about the state of the world, where the signal is noisy, but biased towards the truth, as $p > 1/2$. The Condorcet jury theorem states that majority voting over such independent signals reduces the noise. More precisely, let $V = \sum_{i=1,\ldots,n} V_i$ be the number of votes for ‘guilty’. Then $V > n/2$ means that there is a majority for ‘guilty’, and $V < n/2$ means that there is a majority for ‘not guilty’.

THEOREM 1. (Condorcet jury theorem) If (I|X) and (C|X) hold, then $P(V > n/2 \mid X = 1)$ and $P(V < n/2 \mid X = 0)$ converge to 1 as $n$ tends to infinity.³
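The convergence stated in Theorem 1 can be checked numerically. The following is a minimal sketch, not part of the original text: under (I|X) and (C|X) the number of ‘guilty’ votes given guilt is binomial with parameters $n$ and $p$, so the majority probability is a finite sum. The function name `majority_correct_prob` and the competence value 0.6 are illustrative assumptions; odd jury sizes are used to avoid ties.

```python
from math import comb


def majority_correct_prob(n: int, p: float) -> float:
    """P(V > n/2 | X = 1) when V is Binomial(n, p), i.e. under (I|X) and (C|X).

    n is the jury size (illustrative: odd, so that no tie can occur) and
    p is the common juror competence.
    """
    # Sum the binomial probabilities of every strict majority for 'guilty'.
    return sum(
        comb(n, v) * p**v * (1 - p) ** (n - v)
        for v in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical competence p = 0.6; the printed values should rise towards 1.
    for n in (1, 11, 101, 501):
        print(n, majority_correct_prob(n, 0.6))
```

By the symmetry of the model, the same function also gives $P(V < n/2 \mid X = 0)$ for odd $n$.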

2.2. Bayesian Networks

Bayesian networks can graphically represent the (probabilistic) causal relations between the different variables such as $X$ and $V_1, V_2, \ldots, V_n$. A Bayesian network is a directed acyclic graph, consisting of nodes and (unidirectional) arrows connecting nodes. The nodes represent the variables, and arrows (→) between nodes represent direct causal dependencies.⁴ The direction of an arrow represents the direction of causality. For example, a connection of the form $X \to V_1$ means ‘$X$ (directly) causally affects $V_1$’. Here $X$ is a parent of $V_1$, and $V_1$ is a child of $X$. One node is a descendant of another, the ancestor, if there exists a sequence of arrows connecting the two nodes, where each arrow points away from the ancestor node and towards the descendant node. One node is a non-descendant of another if there exists no such sequence. So the descendant relation is the transitive closure of the child relation. Acyclicity of the graph means that no node is its own descendant. A Bayesian tree is a Bayesian network in which every variable has at most one parent.

Many joint probability distributions of the variables at the nodes are consistent with a given Bayesian network. Here, consistency with the network means that the following condition is satisfied (for details on Bayesian networks, see Pearl 2000, ch. 1):

Parental Markov Condition (PM). Any variable is independent from its non-descendants (except itself), conditional on its parents.⁵

For example, consider a medical condition (say a flu) that can cause two symptoms in a patient (a sore throat and a fever). Consider the Bayesian tree of Figure 1, which contains three variables $D$, $S_1$ and $S_2$, each of which takes the value 0 or 1: $D$ is 1 if the patient has the condition and 0 otherwise; $S_1$ is 1 if the patient has the first symptom and 0 otherwise; and $S_2$ is 1 if the patient has the second symptom and 0 otherwise.
This Bayesian tree, in which the symptoms $S_1$ and $S_2$ are direct descendants of condition $D$, expresses that both symptoms are direct consequences of condition $D$, rather than being commonly caused by some intermediate symptom $S$ of the condition. The two symptoms are not independent unconditionally: a sore throat increases the chance of having a flu, which in turn increases the chance of having a fever. The Parental Markov Condition says that the two symptoms are independent conditional on their common cause: given that you

Figure 1. A simple Bayesian tree.

have a flu ($D = 1$), having a sore throat and having a fever are independent from each other, and given that you have no flu ($D = 0$), having a sore throat and having a fever are also independent from each other.

2.3. The Classical Jury Model Revisited

Figure 2 shows the Bayesian tree corresponding to the classical Condorcet jury model. The votes $V_1, V_2, \ldots, V_n$ are non-descendants of each other and each has $X$ as a parent. So the Parental Markov Condition holds if and only if $V_1, V_2, \ldots, V_n$ are independent from each other, conditional on $X$, which is exactly the independence condition of the classical jury model. So an alternative statement of that model can be given in terms of the Bayesian tree in Figure 2 together with conditions (PM) and (C|X). The Bayesian tree in Figure 2 has the property that the state of the world $X$ is the latest common cause of the jurors’ votes. In case B in the introduction, this property is violated. So, if real-world jury decisions are more like case B than case A, they are not adequately captured by the classical model.

2.4. The New Model

The new model gives up the assumption that the state of the world is the latest common cause of the jurors’ votes. Instead, we assume that

Figure 2. Bayesian tree for the classical Condorcet jury model.

there exists an intermediate common cause between the state of the world and the votes. For simplicity, we describe that intermediate common cause as the body of evidence.

To illustrate why introducing a common body of evidence creates a dependency between votes that contradicts Condorcet’s independence assumption, imagine that you, an external observer, know that the defendant is truly guilty, and you learn that the first 10 jurors have wrongly voted for ‘not guilty’. From this, you will infer that the jurors’ common evidence is highly misleading, which in turn implies that the 11th juror is also likely to vote for ‘not guilty’. This contradicts the classical condition of independence given the state of the world, according to which the first 10 votes provide no information for predicting the 11th vote once you know what the true state of the world is.

We represent the common body of evidence by a random variable, $E$, which takes values in some set $\mathcal{E}$. Figure 3 shows the Bayesian tree corresponding to the new model. The value of $E$ can be interpreted as the totality of available information about the state of the world the jurors are exposed to, including the testimony of witnesses, jury deliberation, the appearance of the defendant in court (relaxed or stressed, smiling or serious etc.). In Bayesian tree terms, $E$ is a child of the state of the world and a parent of the jurors’ votes. What matters is not the particular nature of $E$, which will usually be complex, but the fact that every juror is exposed to the same body of evidence.⁶ We do not make any particular assumption about the set $\mathcal{E}$ of possible

Figure 3. Bayesian tree for the new model.

bodies of evidence, which may be finite, countably infinite, or even uncountably infinite. The probability distribution of $E$ depends on the state of the world: the distribution of $E$ given guilt ($X = 1$) is different from that given innocence ($X = 0$). In the case of guilt, it is usually more likely that the body of evidence will point towards guilt than in the case of innocence. For instance, the defendant might be more likely to fail a lie detector test in the case of guilt than in the case of innocence. We prove that the Parental Markov Condition, when applied to the Bayesian tree in Figure 3, has two implications:

Common Signal (S). The joint probability distribution of the votes $V_1, V_2, \ldots, V_n$ given both the evidence $E$ and the state of the world $X$ is the same as that given just the evidence $E$.

So, the votes are only indirectly caused by the state of the world: they depend on the state of the world only through the body of evidence. Once the evidence is given, what the state of the world is makes no difference to the probabilities of the jurors’ votes.⁷

Independence Given the Evidence (I|E). The votes $V_1, V_2, \ldots, V_n$ are independent from each other, conditional on the body of evidence $E$.

So, the votes are independent from each other not once the state of the world is given, but once the evidence is given. Technically, this is described by saying that the consequences are screened off by their common cause, which means that the consequences (here the votes) become independent when we conditionalize on their common cause. We have:

PROPOSITION 1. (PM) holds if and only if (S) and (I|E) hold.

Proof. All proofs are given in the appendix.
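The eleventh-juror illustration can be made concrete with a small calculation. This is a minimal sketch under simplifying assumptions that are not in the original text: the evidence takes only two values (it either points to guilt or to innocence), the evidence points to guilt with probability `q` when the defendant is guilty, and each juror independently follows what the evidence suggests with probability `p`. The function names and the values 0.9 and 0.7 are hypothetical. Conditions (S) and (I|E) are what allow the probabilities to be written as a mixture over the two evidence values.

```python
def prob_first_k_wrong(k: int, q: float, p: float) -> float:
    """P(V_1 = ... = V_k = 0 | X = 1) in the assumed two-valued evidence model.

    q: probability that the evidence points to guilt, given guilt (assumption).
    p: probability that a juror votes as the evidence suggests (assumption).
    """
    # Mixture over the evidence: by (I|E) the votes are independent given E,
    # and by (S) they depend on X only through E.
    return q * (1 - p) ** k + (1 - q) * p**k


def prob_next_wrong(k: int, q: float, p: float) -> float:
    """P(V_{k+1} = 0 | V_1 = ... = V_k = 0, X = 1), as a ratio of joint probabilities."""
    return prob_first_k_wrong(k + 1, q, p) / prob_first_k_wrong(k, q, p)


if __name__ == "__main__":
    q, p = 0.9, 0.7  # hypothetical parameter values
    print("no votes observed:  ", prob_next_wrong(0, q, p))
    print("10 wrong votes seen:", prob_next_wrong(10, q, p))
```

With these assumed values, observing ten wrong votes raises the probability that the next juror also votes ‘not guilty’, even though the state of the world is known. Under (I|X) the two printed numbers would have to coincide.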

The important part of Proposition 1 is that (PM) entails (S) and (I|E), which provides a justification for using (S) and (I|E) in our jury model. We have also proved the reverse entailment to show that all theorems using (S) and (I|E) could equivalently use (PM).

In the new model, each juror’s vote is a signal, not primarily about the state of the world, but about the body of evidence, which in turn is a signal about the state of the world.⁸ Both signals are noisy: the body of evidence is a noisy signal about the state of the world; and a juror’s vote is a noisy signal about the body of evidence. But both signals are typically biased towards the truth: the body of evidence is more likely to suggest guilt than innocence in cases of guilt; and an individual juror’s vote is more likely to reflect an ‘ideal’ interpretation of the evidence than not. We address these issues below. In essence, our new jury theorem shows that majority voting reduces the noise in one set of signals – in the jurors’ interpretation of the body of evidence – but not in the other – in the body of evidence as a signal about the state of the world.⁹

Let us introduce our assumption about juror competence formally. Recall that, in the classical model, competence was modelled by each juror’s probability $p > 1/2$ of making a correct decision, conditional on the state of the world. We here define competence as the probability of giving an ‘ideal’ interpretation of the evidence, conditional on that evidence. Specifically, we assume that, for any body of evidence $e \in \mathcal{E}$, there exists an ‘ideal’ interpretation, denoted $f(e)$, that a hypothetical ideal observer of $e$ would give. This ideal observer does not know the true state of the world, but gives the ideal (best possible) interpretation of the available evidence; $f(e) = 1$ means that the ideal observer would vote for ‘guilty’, and $f(e) = 0$ means that the ideal observer would vote for ‘not guilty’. We call $f(e)$ the ideal vote – as opposed to the correct vote, which is the vote matching the true state of the world.¹⁰ While knowledge of the true state of the world $x$ would allow a correct vote, the ideal vote results from the best possible interpretation of the evidence $e$. The ideal vote and the correct vote differ in the case of misleading evidence, such as when an innocent person’s hair is blown to the crime scene (and the person has no other alibi, etc.). Our competence assumption states that the probability that juror $i$’s vote matches the ideal vote $f(e)$ given the evidence $e$ exceeds 1/2.
Informally, each juror is better than random at arriving at an ‘ideal’ interpretation of the evidence.$^{11}$

Competence Given the Evidence (C|E). For all jurors $i = 1, 2, \ldots, n$ and each body of evidence $e \in E$,

$$p_e := P(V_i = f(e) \mid E = e) > 1/2.$$

The value of $p_e$ may depend on $e$ but not on $i$.$^{12}$

If the body of evidence $e$ is easily interpretable, for instance in the case of overwhelming evidence for innocence, the probability that an individual juror’s vote matches the ideal vote – here $f(e) = 0$ – might be high, say $p_e = 0.95$. If the body of evidence $e$ is confusing or ambiguous, that probability might be only $p_e = 0.55$. Thus competence is a family of probabilities, containing one $p_e$ for each $e \in E$. The term ‘competence’ here corresponds to the ability to interpret the different possible bodies of evidence $e \in E$ in a way that matches the ideal interpretation. For simplicity, one might replace (C|E) with the stronger (and less realistic) assumption of homogeneous competence, according to which $p_e$ is the same for all possible $e \in E$.

Homogeneous Competence Given the Evidence (HC|E). For all jurors $i = 1, 2, \ldots, n$ and each body of evidence $e \in E$,

$$p := P(V_i = f(e) \mid E = e) > 1/2.$$

The value of $p$ depends neither on $e$ nor on $i$.

3. THE PROBABILITY DISTRIBUTION OF THE JURY’S VOTE

We consider the model based on Figure 3 – assuming (PM) and hence (S) and (I|E) – and derive the probability distribution of the jury’s vote $V = \sum_{i=1,\ldots,n} V_i$ given the state of the world. This distribution depends crucially on two parameters: $p^{(1)} := P(f(E) = 1 \mid X = 1)$ and $p^{(0)} := P(f(E) = 0 \mid X = 0)$. The first is the probability that the evidence is not misleading (that it points to the truth for an ideal observer) in the case of guilt; the second is the probability that the evidence is not misleading in the case of innocence. Our first result addresses the case of homogeneous competence (HC|E).

THEOREM 2. If we have (S), (I|E) and (HC|E), the probability of obtaining precisely $v$ out of $n$ votes for ‘guilty’ given guilt is

$$P(V = v \mid X = 1) = p^{(1)} \binom{n}{v} p^{v} (1-p)^{n-v} + (1 - p^{(1)}) \binom{n}{v} p^{n-v} (1-p)^{v},$$

and the probability of obtaining precisely $v$ out of $n$ votes for ‘guilty’ given innocence is

$$P(V = v \mid X = 0) = p^{(0)} \binom{n}{v} p^{n-v} (1-p)^{v} + (1 - p^{(0)}) \binom{n}{v} p^{v} (1-p)^{n-v}.$$
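The two formulas of theorem 2 can be evaluated directly. The following is a minimal sketch, not part of the original paper: the function name `vote_pmf` and its argument names are hypothetical, and `p_not_misleading` stands for $p^{(1)}$ when `guilty=True` and for $p^{(0)}$ when `guilty=False`.

```python
from math import comb


def vote_pmf(v, n, p, p_not_misleading, guilty=True):
    """P(V = v | X = x) under (S), (I|E) and (HC|E), following theorem 2.

    p is the homogeneous juror competence; p_not_misleading is p^(1) if
    guilty is True (X = 1) and p^(0) otherwise (X = 0).
    """
    # Binomial term when the ideal vote is 'guilty' (each juror votes 1 w.p. p).
    ideal_guilty = comb(n, v) * p**v * (1 - p) ** (n - v)
    # Binomial term when the ideal vote is 'not guilty' (each juror votes 1 w.p. 1 - p).
    ideal_innocent = comb(n, v) * p ** (n - v) * (1 - p) ** v
    if guilty:
        return p_not_misleading * ideal_guilty + (1 - p_not_misleading) * ideal_innocent
    return p_not_misleading * ideal_innocent + (1 - p_not_misleading) * ideal_guilty


# Illustrative values only: an 11-member jury with p = 0.7 and p^(1) = 0.9.
distribution = [vote_pmf(v, 11, 0.7, 0.9) for v in range(12)]
```

With `p_not_misleading=1.0` the second term vanishes and the function returns the ordinary binomial probability of the classical model.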


By theorem 2, if there is a non-zero probability of misleading evidence – specifically if $0 < p^{(1)} < 1$ or $0 < p^{(0)} < 1$ – the jury’s vote $V$ given the state of the world $X$ does not have a binomial distribution, in contrast to the classical Condorcet jury model. The reason for this is that the votes $V_1, V_2, \ldots, V_n$, while independent given the evidence, are dependent given the state of the world. The sum of dependent Bernoulli variables does not in general have a binomial distribution. If, on the other hand, the probability of misleading evidence is zero – i.e., $p^{(1)} = 1$ and $p^{(0)} = 1$ – the probabilities in theorem 2 reduce to those in the classical Condorcet jury model.

Our next result describes the probability $P(V = v \mid X = x)$ for the more general case where we assume (C|E) rather than (HC|E). Since $E$ is a random variable, $E$ induces a random variable $p_E$ which takes as its value the competence $p_e$ associated with the value $e$ of $E$. To avoid confusion with the random variable $E$, we write the expected value operator as $\mathrm{Exp}(\cdot)$ rather than $E(\cdot)$.

THEOREM 3. If we have (S), (I|E) and (C|E), the probability of obtaining precisely $v$ out of $n$ votes for ‘guilty’ given guilt is

$$P(V = v \mid X = 1) = p^{(1)} \binom{n}{v} \mathrm{Exp}\big(p_E^{v} (1 - p_E)^{n-v} \mid f(E) = 1 \text{ and } X = 1\big) + (1 - p^{(1)}) \binom{n}{v} \mathrm{Exp}\big(p_E^{n-v} (1 - p_E)^{v} \mid f(E) = 0 \text{ and } X = 1\big),$$

and the probability of obtaining precisely $v$ out of $n$ votes for ‘guilty’ given innocence is

$$P(V = v \mid X = 0) = (1 - p^{(0)}) \binom{n}{v} \mathrm{Exp}\big(p_E^{v} (1 - p_E)^{n-v} \mid f(E) = 1 \text{ and } X = 0\big) + p^{(0)} \binom{n}{v} \mathrm{Exp}\big(p_E^{n-v} (1 - p_E)^{v} \mid f(E) = 0 \text{ and } X = 0\big).$$

Note that, in theorems 2 and 3, by summing $P(V = v \mid X = 1)$ over all $v > n/2$, we obtain the probability of a simple majority for ‘guilty’ given guilt; and, by summing $P(V = v \mid X = 0)$ over all $v < n/2$, we obtain the probability of a simple majority for ‘not guilty’ given innocence.

The present results allow us to compare the probability of a correct jury verdict in our model – specifically in the case of homogeneous competence – with that in the classical Condorcet jury model for the same fixed level of juror competence $p$.

COROLLARY 1. Suppose we have (S), (I|E) and (HC|E). Let $v > n/2$. Then the probability of obtaining precisely $v$ out of $n$ votes for ‘guilty’ given guilt satisfies

$$P(V = v \mid X = 1) \le \binom{n}{v} p^{v} (1-p)^{n-v},$$

and so the probability of obtaining a majority for ‘guilty’ given guilt satisfies

$$P(V > n/2 \mid X = 1) \le \sum_{v > n/2} \binom{n}{v} p^{v} (1-p)^{n-v}.$$

The left-hand sides of the two inequalities correspond to our new model, the right-hand sides to the classical model. So corollary 1 implies that the probability of a majority for ‘guilty’ given guilt in our new model is less than or equal to that in Condorcet’s model. Similarly, the probability of a majority for ‘not guilty’ given innocence in our new model is less than or equal to that in the classical model. The probability of a correct jury verdict is equal in the two models if and only if the probability of misleading evidence is zero. Unless the evidence always ‘tells the truth’ – unless $p^{(1)} = p^{(0)} = 1$ – the jury in our new model will reach a correct verdict with a lower probability than in the classical model.

4. A MODIFIED JURY THEOREM

We now state our modified jury theorem. Its first part is concerned with the probability that the majority of jurors matches the ideal vote, and its second part with the more important probability that the majority of jurors matches the true state of the world.

THEOREM 4. Suppose we have (S), (I|E) and (C|E).
(i) Let $W$ be the number of jurors $i$ whose vote $V_i$ coincides with the ideal vote $f(E)$. For each $x \in \{0, 1\}$, $P(W > n/2 \mid X = x)$ converges to 1 as $n$ tends to infinity.
(ii) $P(V > n/2 \mid X = 1)$ converges to $p^{(1)}$ as $n$ tends to infinity, and $P(V < n/2 \mid X = 0)$ converges to $p^{(0)}$ as $n$ tends to infinity.
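Part (ii) can be illustrated numerically in the special case of homogeneous competence (HC|E), where the majority probability is a finite sum of the mixture-of-binomials terms given in theorem 2. The sketch below is an illustration under that assumption only; the function name and the parameter values ($p = 0.6$, $p^{(1)} = 0.9$) are hypothetical choices, not taken from the paper.

```python
from math import comb


def majority_guilty_given_guilt(n, p, p1):
    """P(V > n/2 | X = 1) under (HC|E), summing the theorem 2 formula over v > n/2.

    n is assumed odd so that ties cannot occur; p1 stands for p^(1).
    """
    total = 0.0
    for v in range(n // 2 + 1, n + 1):
        ideal_guilty = comb(n, v) * p**v * (1 - p) ** (n - v)
        ideal_innocent = comb(n, v) * p ** (n - v) * (1 - p) ** v
        total += p1 * ideal_guilty + (1 - p1) * ideal_innocent
    return total


# As n grows, the printed values should approach p^(1) = 0.9 rather than 1.
for n in (1, 11, 101, 501):
    print(n, round(majority_guilty_given_guilt(n, p=0.6, p1=0.9), 4))
```

Very large `n` would need logarithms or a statistics library to avoid floating-point overflow; the plain arithmetic above is kept for readability.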


Part (i) states that, given the state of the world, the probability that the majority verdict matches the ideal interpretation of the evidence converges to 1 as $n$ tends to infinity. But the ideal interpretation may not be correct. Part (ii) states that the probability that the majority verdict matches the true state of the world (given that state) converges to the probability that the ideal interpretation of the evidence is correct, i.e., that the evidence is not misleading. Reformulating part (i), the probability of no simple majority for the ideal interpretation of the evidence converges to 0. Reformulating part (ii), the probability of no simple majority matching the true state of the world converges to the probability that the evidence is misleading, i.e., that the ideal interpretation of the evidence is incorrect.

This theorem allows the interpretation that, by increasing the jury size, it is possible to approximate the ideal interpretation of the evidence, no more and no less. The problem of insufficient or misleading evidence cannot be avoided by adding jurors. Irrespective of the jury size, the probability of a correct majority decision at most approaches the probability that the evidence ‘tells the truth’, i.e., that its ideal interpretation matches the state of the world. Since there is typically a non-zero probability of misleading evidence – i.e., a non-zero probability that the evidence, even when ideally interpreted, points to ‘guilt’ when the defendant is innocent or vice versa – the probability that the jury will fail to track the truth converges to a non-zero value as the jury size increases, regardless of how large the competence parameters $p_e$ are in condition (C|E).$^{13}$

5. REASONABLE DOUBT

We now discuss the implications of our findings for the Bayesian question of when a jury is capable of establishing guilt of a defendant ‘beyond any reasonable doubt’. So far we have been concerned with the ‘classical’ probability of a particular voting outcome – for instance, a majority for ‘guilty’ – conditional on the state of the world. But in a jury context, we may also be interested in the Bayesian probability of a particular state of the world – for instance, the guilt of the defendant – conditional on a particular voting outcome. Suppose we initially attach a certain prior probability to the hypothesis that the defendant is guilty. We may then ask: given that the jury has produced a particular majority for ‘guilty’, what is the posterior probability that the defendant is truly guilty? Reformulated in degree of belief terms, the question is this: what degree of belief can we attach to the hypothesis that the defendant is truly guilty, given that we have observed a particular voting outcome in the jury, such as an overwhelming majority for ‘guilty’? Formally, the probability we are concerned with here is not $P(V = v \mid X = x)$, but $P(X = x \mid V = v)$. Note the reversed order of conditionalization.

Let $r = P(X = 1)$ denote the prior probability that the defendant is guilty. We assume that there is prior uncertainty about the guilt of the defendant, i.e., $0 < r < 1$. Below we also assume non-trivial probabilities of misleading evidence, i.e., $0 < p^{(1)}, p^{(0)} < 1$. In the classical model – assuming (I|X) and (C|X) – we have:

$$P(X = 1 \mid V = v) = \frac{r\, p^{2v-n}}{r\, p^{2v-n} + (1 - r)(1 - p)^{2v-n}}$$

(List 2004a).
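As a small numerical companion to the classical formula, the sketch below rewrites it in the algebraically equivalent odds form $1/(1 + \frac{1-r}{r}((1-p)/p)^{2v-n})$. The function name `classical_posterior` is a hypothetical label introduced here for illustration.

```python
def classical_posterior(v, n, p, r):
    """P(X = 1 | V = v) in the classical Condorcet model with prior r."""
    # Dividing numerator and denominator of the formula by r * p**(2v - n)
    # leaves 1 / (1 + prior odds against guilt * likelihood ratio).
    odds_against = ((1 - r) / r) * ((1 - p) / p) ** (2 * v - n)
    return 1.0 / (1.0 + odds_against)


# Illustrative values: a unanimous 50-member jury with p = 0.6 and prior r = 0.5.
unanimous_guilty = classical_posterior(50, 50, 0.6, 0.5)
unanimous_innocent = classical_posterior(0, 50, 0.6, 0.5)
```

Only the margin $2v - n$ between ‘guilty’ and ‘not guilty’ votes enters the result, so an evenly split jury leaves the prior $r$ unchanged.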

We can easily see that, for a sufficiently large jury and a sufficiently large majority, $P(X = 1 \mid V = v)$ can take a value arbitrarily close to 1. In the limiting case where all jurors vote unanimously, the posterior belief converges to the alternative (‘guilty’ or ‘innocent’) supported by all jurors: $P(X = 1 \mid V = n)$ converges to 1, and $P(X = 1 \mid V = 0)$ converges to 0, as $n$ tends to infinity. It is important to keep this implication of the classical model in mind when we see the results for our modified model.

To simplify the exposition, we only consider the case of homogeneous competence here, i.e., (HC|E). The general case is technically more involved, but essentially analogous.

THEOREM 5. If we have (S), (I|E) and (HC|E), then the probability that the defendant is guilty given that precisely $v$ out of $n$ jurors have voted for ‘guilty’ is

$$P(X = 1 \mid V = v) = \frac{1}{1 + \dfrac{1-r}{r} \cdot \dfrac{(1 - p^{(0)})\,(p/(1-p))^{2v-n} + p^{(0)}}{p^{(1)}\,(p/(1-p))^{2v-n} + (1 - p^{(1)})}}.$$

How confident in the correctness of a jury verdict can we ever be, given these Bayesian considerations? More formally, how close to 1 or 0 can the posterior probability $P(X = 1 \mid V = v)$ ever get? Possibly never very close, unlike in the classical model. Consider the best-case scenario, where all jurors vote unanimously, either for ‘guilty’ or for ‘innocent’. These two cases correspond to $V = n$ and $V = 0$. Using theorem 5 we can determine the posterior probability of guilt given $V = n$ and the posterior probability of guilt given $V = 0$.

COROLLARY 2. Suppose we have (S), (I|E) and (HC|E). Then:
(a) The probability that the defendant is guilty given a unanimous ‘guilty’ vote is

$$P(X = 1 \mid V = n) = \frac{1}{1 + \dfrac{1-r}{r} \cdot \dfrac{(1 - p^{(0)})\,(p/(1-p))^{n} + p^{(0)}}{p^{(1)}\,(p/(1-p))^{n} + (1 - p^{(1)})}},$$

which converges to

$$\frac{1}{1 + \dfrac{1-r}{r} \cdot \dfrac{1 - p^{(0)}}{p^{(1)}}} = P(X = 1 \mid f(E) = 1) \quad (< 1)$$

as $n$ tends to infinity.
(b) The probability that the defendant is guilty given a unanimous ‘not guilty’ vote is

$$P(X = 1 \mid V = 0) = \frac{1}{1 + \dfrac{1-r}{r} \cdot \dfrac{(1 - p^{(0)})\,((1-p)/p)^{n} + p^{(0)}}{p^{(1)}\,((1-p)/p)^{n} + (1 - p^{(1)})}},$$

which converges to

$$\frac{1}{1 + \dfrac{1-r}{r} \cdot \dfrac{p^{(0)}}{1 - p^{(1)}}} = P(X = 1 \mid f(E) = 0) \quad (> 0)$$

as $n$ tends to infinity.

By contrast, in the classical model $P(X = 1 \mid V = n)$ converges to 1 and $P(X = 1 \mid V = 0)$ converges to 0, as $n$ tends to infinity. So, as $n$ increases, [the probability that the defendant is guilty given a unanimous vote for ‘guilty’] converges to [the probability that the defendant is guilty given that the evidence points towards guilt]. Likewise, as $n$ increases, [the probability that the defendant is guilty given a unanimous vote for ‘not guilty’] converges to [the probability that the defendant is guilty given that the evidence points towards innocence].

Corollary 2 describes the bounds on the posterior probability that $X = 1$ or $X = 0$, given the verdict of a large jury, by assuming the unrealistic case that $V/n$ tends to 1 or 0. But this case occurs with probability 0 (unless $p = 1$), since with probability 1 the proportion of ‘guilty’-votes $V/n$ converges to either $p$ or $1-p$ (by the law of large numbers). The former is the case if $f(E) = 1$, the latter if $f(E) = 0$. However, even in these two realistic cases – $V/n$ converging to $p$ and $V/n$ converging to $1-p$ – the posterior probability of guilt, given the jury verdict, converges to exactly the same limits as in corollary 2.

COROLLARY 3. Suppose we have (S), (I|E) and (HC|E). Let $v_1, v_2, \ldots, v_n \in \{0, 1\}$ and put $q_n := (v_1 + v_2 + \cdots + v_n)/n$ for all $n$. Then the probability that the defendant is guilty given that a proportion $q_n$ of the jurors have voted for ‘guilty’ – where $q_n$ converges to either $p$ or $1-p$ as $n$ tends to infinity – is as follows:
(a) If $q_n$ converges to $p$, then $P(X = 1 \mid V/n = q_n)$ converges to $P(X = 1 \mid f(E) = 1)$ $(< 1)$, as $n$ tends to infinity (as in case (a) of corollary 2).
(b) If $q_n$ converges to $1-p$, then $P(X = 1 \mid V/n = q_n)$ converges to $P(X = 1 \mid f(E) = 0)$ $(> 0)$, as $n$ tends to infinity (as in case (b) of corollary 2).

The convergence results of corollaries 2 and 3 are identical, showing that in sufficiently large juries it is irrelevant whether the jury supports an alternative unanimously or by a proportion close to $p$ (the exact meaning of ‘close’ depends on $n$ and on the distance of $p$ to 1/2).

Now we are in a position to state the key implication of these results: it may be impossible, even in an arbitrarily large jury and even when there is unanimity for ‘guilty’, to establish guilt of a defendant ‘beyond any reasonable doubt’. More precisely, suppose that the jury’s overall decision (or the judge’s decision based on the jury vote) is required to satisfy the following decision principle:

Convict the defendant if and only if the posterior probability of guilt, given the jury vote, exceeds $c$, where $c$ is some fixed parameter close to 1 (e.g., $c = 0.95$).

The parameter $c$ captures the threshold of reasonable doubt: only a posterior probability of guilt above $c$ is interpreted as representing a degree of belief beyond reasonable doubt. By corollary 2, we can immediately see that, if $P(X = 1 \mid f(E) = 1) \le c$, then conviction will never be possible according to the decision principle just introduced. No matter how large the jury is and no matter how large the majority for ‘guilty’ is, the jury vote will never justify a degree of belief greater than $c$ that the defendant is guilty, and hence will never establish guilt of the defendant beyond any reasonable doubt. So, if $P(X = 1 \mid f(E) = 1) \le c$, even a unanimous vote for ‘guilty’ in a ten-million-member jury will be insufficient for conviction – in sharp contrast to what Condorcet’s classical model implies.
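The bound can be made concrete with a short calculation. The sketch below evaluates the posterior of theorem 5 and the limit of corollary 2(a) and compares the latter with the threshold $c$. All function names and all parameter values ($r = 0.5$, $p = 0.6$, $p^{(1)} = p^{(0)} = 0.9$, $c = 0.95$) are illustrative assumptions introduced here, not values from the paper.

```python
def posterior_guilt(v, n, p, r, p1, p0):
    """P(X = 1 | V = v) under (S), (I|E) and (HC|E), following theorem 5.

    p1 and p0 stand for p^(1) and p^(0).
    """
    a = (p / (1 - p)) ** (2 * v - n)
    ratio = ((1 - p0) * a + p0) / (p1 * a + (1 - p1))
    return 1.0 / (1.0 + ((1 - r) / r) * ratio)


def unanimous_guilty_limit(r, p1, p0):
    """Limit of P(X = 1 | V = n) as n grows, i.e. P(X = 1 | f(E) = 1) (corollary 2a)."""
    return 1.0 / (1.0 + ((1 - r) / r) * ((1 - p0) / p1))


r, p, p1, p0, c = 0.5, 0.6, 0.9, 0.9, 0.95
limit = unanimous_guilty_limit(r, p1, p0)
# If the limit does not exceed c, no jury vote can justify conviction.
conviction_possible_in_principle = limit > c
posterior_unanimous_10 = posterior_guilt(10, 10, p, r, p1, p0)
posterior_unanimous_200 = posterior_guilt(200, 200, p, r, p1, p0)
```

With these values the limit equals 0.9, which lies below $c = 0.95$, so even a unanimous ‘guilty’ vote would not meet the decision principle regardless of jury size.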


6. SUMMARY

Using Bayesian networks, we have developed a new model of jury decisions. The model can represent a jury, committee or expert panel deciding on whether or not some factual proposition is true, and where the decision is made on the basis of shared evidence. We have suggested that our model is more realistic than the classical Condorcet jury model. First, it captures the empirical fact that in real-world jury, committee or expert panel decisions the state of the world is typically not the latest common cause of the jurors’ votes, but there exists some intermediate common cause: the body of evidence, as described here. Secondly, in legal contexts, the model captures the requirement that jurors must not use any evidence other than that presented in the courtroom. This means that, even if, hypothetically, the jurors could each obtain an independent signal about the state of the world (without any intermediate common cause between different such signals), they would be required by law not to use such information.

Our model makes two key assumptions:

• The Parental Markov Condition, applied to the Bayesian tree in Figure 3, which has two implications:
  – Common signal: the jurors’ votes depend on the true state of the world only through the available body of evidence.
  – Independence given the evidence: the votes of different jurors are independent from each other given the body of evidence.
• Competence given the evidence: for each possible body of evidence, each juror has a probability greater than 1/2 of matching the ideal interpretation of that evidence. In the homogeneous case, juror competence is the same for all possible bodies of evidence; in the heterogeneous case, it may depend on the evidence.

Then:

• The probability of a correct majority decision (given the state of the world) is typically less than, and at most equal to, the corresponding probability in the classical Condorcet jury model.
• As the jury size increases, the probability of a correct majority decision (given the state of the world) converges to the probability that the evidence is not misleading. Unless the evidence is never misleading, the limiting probability of a correct majority decision is strictly less than one.
• Depending on the required threshold of ‘no reasonable doubt’, it may be impossible, even in an arbitrarily large jury and even when the jury unanimously votes for ‘guilty’, to establish guilt of a defendant ‘beyond any reasonable doubt’.

Our model reduces to the classical Condorcet jury model if and only if we assume both that the evidence is never misleading and that juror competence is the same for all possible bodies of evidence (homogeneous competence). If these assumptions are inadequate in real-world jury, committee or expert panel decisions, then the classical Condorcet jury model, as it stands, fails to apply to such decisions.

ACKNOWLEDGMENTS

Previous versions of this paper were presented at the Summer School on Philosophy and Probability, University of Konstanz, September 2002, and at the GAP5 Workshop on Philosophy and Probability, University of Bielefeld, September 2003. We thank the participants at these events and Luc Bovens, Branden Fitelson, Jon Williamson and the anonymous reviewers of this paper for comments and discussion. Franz Dietrich also thanks the Alexander von Humboldt Foundation, the German Federal Ministry of Education and Research, and the Program for the Investment in the Future (ZIP) of the German Government, for supporting this research.

APPENDIX: PROOFS

Proof of proposition 1. (i) First assume (PM). Let $e \in E$ be any body of evidence. We show that given $E = e$ the variables $V_1, \ldots, V_n, X$ (votes and state of the world) are independent, which implies in particular that given $E = e$ the votes $V_1, \ldots, V_n$ are independent (Independence Given the Evidence (I|E)) and that given $E = e$ the vote vector $(V_1, \ldots, V_n)$ is independent of $X$ (which is equivalent to Common Signal (S)). To show that given $E = e$ the variables $V_1, \ldots, V_n, X$ are independent, let $v_1, \ldots, v_n, x \in \{0, 1\}$ be any possible realizations of these variables. First, we apply (PM) to the first juror’s vote $V_1$: since $E$ is the only parent of $V_1$ and all of $V_2, \ldots, V_n, X$ are non-descendants of $V_1$, by (PM), given $E = e$, $V_1$ is independent of the vector of variables $(V_2, \ldots, V_n, X)$, i.e.,

$$P(V_1 = v_1, \ldots, V_n = v_n, X = x \mid E = e) = P(V_1 = v_1 \mid E = e)\, P(V_2 = v_2, \ldots, V_n = v_n, X = x \mid E = e). \tag{1}$$

Next, we apply (PM) to $V_2$ to decompose the second term of the last product: since $E$ is the only parent of $V_2$ and all of $V_3, \ldots, V_n, X$ are non-descendants of $V_2$, by (PM), given $E = e$, $V_2$ is independent of the vector of variables $(V_3, \ldots, V_n, X)$, i.e.,

$$P(V_2 = v_2, \ldots, V_n = v_n, X = x \mid E = e) = P(V_2 = v_2 \mid E = e)\, P(V_3 = v_3, \ldots, V_n = v_n, X = x \mid E = e).$$

Substituting this into (1), we obtain

$$P(V_1 = v_1, \ldots, V_n = v_n, X = x \mid E = e) = P(V_1 = v_1 \mid E = e)\, P(V_2 = v_2 \mid E = e)\, P(V_3 = v_3, \ldots, V_n = v_n, X = x \mid E = e).$$

By continuing to decompose joint probabilities, one finally arrives at

$$P(V_1 = v_1, \ldots, V_n = v_n, X = x \mid E = e) = P(V_1 = v_1 \mid E = e) \cdots P(V_n = v_n \mid E = e)\, P(X = x \mid E = e),$$

which establishes the independence of $V_1, \ldots, V_n, X$.

(ii) Now assume (S) and (I|E). To show (PM) we have to go through all nodes of the tree. What (PM) states for the top node $X$ is vacuously true since $X$ has no non-descendants (except itself). Regarding $E$, its only non-descendant (except itself) is its parent $X$, and of course, given $X$, $E$ is independent of $X$ since $X$ is deterministic. Finally, consider vote $V_1$ (the proof for any other vote $V_2, \ldots, V_n$ is analogous). We have to show that $V_1$ is independent of its vector of non-descendants $(V_2, \ldots, V_n, X)$ given its parent $E = e$. (We have excluded $E$ from the vector of non-descendants because $E$ is deterministic given $E = e$.) Let $v_2, \ldots, v_n, x \in \{0, 1\}$ be any realizations of $V_2, \ldots, V_n, X$. By (S), given $E = e$, $(V_2, \ldots, V_n)$ is independent of $X$, and so

$$P(V_2 = v_2, \ldots, V_n = v_n, X = x \mid E = e) = P(V_2 = v_2, \ldots, V_n = v_n \mid E = e)\, P(X = x \mid E = e).$$

Now we can apply (I|E) to decompose the first factor in the last product, which yields

$$P(V_2 = v_2, \ldots, V_n = v_n, X = x \mid E = e) = P(V_2 = v_2 \mid E = e) \cdots P(V_n = v_n \mid E = e)\, P(X = x \mid E = e).$$

This shows the independence of $(V_2, \ldots, V_n, X)$ given $E = e$. □

An alternative proof of proposition 1 might be given using the criterion of d-separation or the theory of semi-graphoids.

Proof of theorem 2. By (HC|E), each body of evidence $e \in E$ is equally easy to interpret ideally, and so we assume for simplicity that $E = \{0, 1\}$, where $e = 0$ is the evidence ideally interpreted as suggesting innocence, $f(e) = 0$, and $e = 1$ is the evidence ideally interpreted as suggesting guilt, $f(e) = 1$. By (HC|E) and (I|E), if $E = 1$ then the votes $V_1, V_2, \ldots, V_n$ are independently Bernoulli distributed, with a probability $p$ of $V_i = 1$ and a probability $1-p$ of $V_i = 0$ for each $i$. If $E = 0$ then the votes $V_1, V_2, \ldots, V_n$ are also independently Bernoulli distributed, with a probability $p$ of $V_i = 0$ and a probability $1-p$ of $V_i = 1$ for each $i$. Hence, given $E = 1$, the jury’s vote $V = \sum_{i=1,\ldots,n} V_i$ has a binomial distribution with parameters $n$ and $p$. And given $E = 0$, $V$ has a binomial distribution with parameters $n$ and $1-p$:

$$P(V = v \mid E = 1) = \binom{n}{v} p^{v} (1-p)^{n-v}, \qquad P(V = v \mid E = 0) = \binom{n}{v} p^{n-v} (1-p)^{v}. \tag{2}$$

Now, the probability of obtaining precisely $v$ out of $n$ votes for ‘guilty’ given the state of the world $x$ is:

$$P(V = v \mid X = x) = P(V = v \mid E = 1 \text{ and } X = x)\, P(E = 1 \mid X = x) + P(V = v \mid E = 0 \text{ and } X = x)\, P(E = 0 \mid X = x).$$

By (S), conditionalizing on both $E = e$ and $X = x$ is equivalent to conditionalizing only on $E = e$, so that:

$$P(V = v \mid X = x) = P(V = v \mid E = 1)\, P(E = 1 \mid X = x) + P(V = v \mid E = 0)\, P(E = 0 \mid X = x).$$

Explicitly, taking the two cases $x = 0$ and $x = 1$,

$$P(V = v \mid X = 1) = P(V = v \mid E = 1)\, p^{(1)} + P(V = v \mid E = 0)\, (1 - p^{(1)}),$$
$$P(V = v \mid X = 0) = P(V = v \mid E = 0)\, p^{(0)} + P(V = v \mid E = 1)\, (1 - p^{(0)}).$$

Recall that $p^{(1)} := P(f(E) = 1 \mid X = 1)$ and $p^{(0)} := P(f(E) = 0 \mid X = 0)$, and here $f(E) = E$. Now theorem 2 in the case $E = \{0, 1\}$ follows from (2). The general case follows from theorem 3 below. □
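The two-evidence construction used in this proof lends itself to a quick Monte Carlo check. The sketch below simulates the chain from state to evidence to votes for $X = 1$ with $E \in \{0, 1\}$ and $f(E) = E$, and compares the empirical vote distribution with the formula of theorem 2. Function names, the seed and the parameter values are assumptions made for illustration, and `p1` stands for $p^{(1)}$.

```python
import random
from math import comb


def simulate_votes_given_guilt(n, p, p1, trials, seed=0):
    """Empirical distribution of V given X = 1 in the model with E in {0, 1}."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        # Given guilt, the evidence points to guilt (e = 1) with probability p1.
        e = 1 if rng.random() < p1 else 0
        # Given the evidence, each juror independently casts the ideal vote
        # f(e) = e with probability p and the opposite vote otherwise.
        v = sum(e if rng.random() < p else 1 - e for _ in range(n))
        counts[v] += 1
    return [count / trials for count in counts]


def theorem2_pmf_given_guilt(n, p, p1):
    """Exact P(V = v | X = 1) for v = 0, ..., n, as stated in theorem 2."""
    return [
        p1 * comb(n, v) * p**v * (1 - p) ** (n - v)
        + (1 - p1) * comb(n, v) * p ** (n - v) * (1 - p) ** v
        for v in range(n + 1)
    ]


empirical = simulate_votes_given_guilt(5, 0.7, 0.8, trials=100_000, seed=1)
exact = theorem2_pmf_given_guilt(5, 0.7, 0.8)
largest_gap = max(abs(a - b) for a, b in zip(empirical, exact))
```

A small `largest_gap` is consistent with the theorem but does not prove it; the simulation only checks the special case treated in this proof.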


Proof of theorem 3. First, we use the law of iterated expectations to write

$$P(V = v \mid X = x) = \mathrm{Exp}\big(P(V = v \mid E \text{ and } X = x) \mid X = x\big).$$

By (S) we have $P(V = v \mid E \text{ and } X = x) = P(V = v \mid E)$, so that we deduce

$$P(V = v \mid X = x) = \mathrm{Exp}\big(P(V = v \mid E) \mid X = x\big). \tag{3}$$

By (C|E) and (I|E), conditional on $E$ the votes $V_1, V_2, \ldots, V_n$ are independent and Bernoulli distributed with parameter $p_E$ if $f(E) = 1$ and $1 - p_E$ if $f(E) = 0$. Hence the sum $V$ has a binomial distribution with first parameter $n$ and second parameter $p_E$ if $f(E) = 1$ and $1 - p_E$ if $f(E) = 0$:

$$P(V = v \mid E) = \begin{cases} \binom{n}{v} p_E^{v} (1 - p_E)^{n-v} & \text{if } f(E) = 1, \\ \binom{n}{v} p_E^{n-v} (1 - p_E)^{v} & \text{if } f(E) = 0. \end{cases}$$

In other words,

$$P(V = v \mid E) = \binom{n}{v} p_E^{v} (1 - p_E)^{n-v}\, 1_{\{f(E)=1\}} + \binom{n}{v} p_E^{n-v} (1 - p_E)^{v}\, 1_{\{f(E)=0\}},$$

where $1_{\{f(E)=1\}}$ and $1_{\{f(E)=0\}}$ are characteristic functions ($1_A$ is the random variable defined as 1 if the event $A$ holds and as 0 if it does not). So, by (3) and the linearity of the (conditional) expectation operator $\mathrm{Exp}(\cdot \mid X = x)$,

$$P(V = v \mid X = x) = P(f(E) = 1 \mid X = x) \binom{n}{v} \mathrm{Exp}\big(p_E^{v} (1 - p_E)^{n-v} \mid f(E) = 1 \text{ and } X = x\big) + P(f(E) = 0 \mid X = x) \binom{n}{v} \mathrm{Exp}\big(p_E^{n-v} (1 - p_E)^{v} \mid f(E) = 0 \text{ and } X = x\big). \qquad \square$$

Proof of corollary 1. Suppose (HC|E) holds. Assume that $v > n/2$ (a majority for ‘guilty’). Then


$$p^{n-v} (1-p)^{v} = p^{v} (1-p)^{n-v} \big((1-p)/p\big)^{2v-n} \le p^{v} (1-p)^{n-v},$$

since $2v - n > 0$ and $p > 1/2$. So, by the formula for $P(V = v \mid X = 1)$ in theorem 2, we deduce

$$P(V = v \mid X = 1) \le p^{(1)} \binom{n}{v} p^{v} (1-p)^{n-v} + (1 - p^{(1)}) \binom{n}{v} p^{v} (1-p)^{n-v} = \binom{n}{v} p^{v} (1-p)^{n-v},$$

as required. □

Proof of theorem 4. (i) We conditionalize on $E$. By (C|E) and (I|E), $W$ is the sum of $n$ independent Bernoulli variables with parameter $p_E$. The weak law of large numbers implies that the average $W/n$ converges in probability to $p_E$. Since $p_E > 1/2$, it follows that

$$\lim_{n \to \infty} P(W > n/2 \mid E) = 1.$$

Applying the (conditional) expectation operator on both sides (which corresponds to averaging with respect to $E$), we obtain

$$\mathrm{Exp}\big(\lim_{n \to \infty} P(W > n/2 \mid E) \mid X = x\big) = \mathrm{Exp}(1 \mid X = x) = 1.$$

By the dominated convergence theorem, we can interchange the expectation operator with the limit operator on the left-hand side, so that

$$\lim_{n \to \infty} \mathrm{Exp}\big(P(W > n/2 \mid E) \mid X = x\big) = 1.$$

By (S) we can replace $P(W > n/2 \mid E)$ by $P(W > n/2 \mid E \text{ and } X = x)$. This leads to

$$\lim_{n \to \infty} \mathrm{Exp}\big(P(W > n/2 \mid E \text{ and } X = x) \mid X = x\big) = 1,$$

and hence by the law of iterated expectations

$$\lim_{n \to \infty} P(W > n/2 \mid X = x) = 1.$$

(ii) Using the weak law of large numbers in a similar way as in (i), it is possible to prove that the probability $P(V > n/2 \mid E) = P(V/n > 1/2 \mid E)$ converges to 1 if $f(E) = 1$ and to 0 if $f(E) = 0$ (as $n$ tends to infinity). Hence


$$\lim_{n \to \infty} P(V > n/2 \mid E) = 1_{\{f(E)=1\}}, \tag{4}$$

where $1_{\{f(E)=1\}}$ is the random variable defined as 1 if $f(E) = 1$ and as 0 if $f(E) = 0$. By the law of iterated expectations, $P(V > n/2 \mid X = 1) = \mathrm{Exp}\big(P(V > n/2 \mid E \text{ and } X = 1) \mid X = 1\big)$, which by (S) simplifies to:

$$P(V > n/2 \mid X = 1) = \mathrm{Exp}\big(P(V > n/2 \mid E) \mid X = 1\big). \tag{5}$$

Further, we have

$$P(f(E) = 1 \mid X = 1) = \mathrm{Exp}\big(1_{\{f(E)=1\}} \mid X = 1\big) = \mathrm{Exp}\big(\lim_{n \to \infty} P(V > n/2 \mid E) \mid X = 1\big),$$

where the last step uses (4). We now interchange the expectation operator with the limit (by the dominated convergence theorem) and then use (5) to obtain

$$P(f(E) = 1 \mid X = 1) = \lim_{n \to \infty} \mathrm{Exp}\big(P(V > n/2 \mid E) \mid X = 1\big) = \lim_{n \to \infty} P(V > n/2 \mid X = 1).$$

As for the case $X = 0$, it can be shown similarly that $P(f(E) = 0 \mid X = 0) = \lim_{n \to \infty} P(V < n/2 \mid X = 0)$. □

The complexity of this proof is due to the fact that the set of possible evidences $E$ is arbitrarily large (and endowed with some $\sigma$-algebra). For finite or countable $E$, (conditional) expectation operators could be replaced by summations.

Proof of theorem 5. By Bayes’s theorem, for any $v$,

$$P(X = 1 \mid V = v) = \frac{r\, P(V = v \mid X = 1)}{r\, P(V = v \mid X = 1) + (1 - r)\, P(V = v \mid X = 0)}.$$

Dividing numerator and denominator by $r\, P(V = v \mid X = 1)$, we get

$$P(X = 1 \mid V = v) = \frac{1}{1 + \dfrac{1-r}{r} \cdot \dfrac{P(V = v \mid X = 0)}{P(V = v \mid X = 1)}}.$$


We use theorem 2 for expressing $P(V = v \mid X = 1)$ and $P(V = v \mid X = 0)$, and we then simplify:

$$\frac{P(V = v \mid X = 0)}{P(V = v \mid X = 1)} = \frac{p^{(0)} \binom{n}{v} p^{n-v} (1-p)^{v} + (1 - p^{(0)}) \binom{n}{v} p^{v} (1-p)^{n-v}}{p^{(1)} \binom{n}{v} p^{v} (1-p)^{n-v} + (1 - p^{(1)}) \binom{n}{v} p^{n-v} (1-p)^{v}} = \frac{(1 - p^{(0)})\,(p/(1-p))^{2v-n} + p^{(0)}}{p^{(1)}\,(p/(1-p))^{2v-n} + (1 - p^{(1)})}. \qquad \square$$

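Proof of corollary 2. To prove part (a), note that the convergence to

$$\frac{1}{1 + \dfrac{1-r}{r} \cdot \dfrac{1 - p^{(0)}}{p^{(1)}}}$$

is clear because $(p/(1-p))^{n} \to \infty$, so that the ratio

$$\frac{(1 - p^{(0)})\,(p/(1-p))^{n} + p^{(0)}}{p^{(1)}\,(p/(1-p))^{n} + (1 - p^{(1)})}$$

is asymptotically equivalent to

$$\frac{(1 - p^{(0)})\,(p/(1-p))^{n} + 0}{p^{(1)}\,(p/(1-p))^{n} + 0} = \frac{1 - p^{(0)}}{p^{(1)}}.$$

The rest follows from

$$\frac{1}{1 + \dfrac{1-r}{r} \cdot \dfrac{1 - p^{(0)}}{p^{(1)}}} = \frac{1}{1 + \dfrac{P(X = 0)}{P(X = 1)} \cdot \dfrac{P(f(E) = 1 \mid X = 0)}{P(f(E) = 1 \mid X = 1)}} = \frac{P(X = 1)\, P(f(E) = 1 \mid X = 1)}{P(X = 1)\, P(f(E) = 1 \mid X = 1) + P(X = 0)\, P(f(E) = 1 \mid X = 0)},$$

which by Bayes’s theorem equals $P(X = 1 \mid f(E) = 1)$. Part (b) has an analogous proof. □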

Proof of corollary 3. In the formula of theorem 5, we replace $v$ by $n q_n$. If $q_n \to p$ $(> 1/2)$, then the term $[p/(1-p)]^{2nq_n - n} = [p/(1-p)]^{2n(q_n - 1/2)}$ tends to $\infty$. So the ratio

$$\frac{(1 - p^{(0)})\,(p/(1-p))^{2nq_n - n} + p^{(0)}}{p^{(1)}\,(p/(1-p))^{2nq_n - n} + (1 - p^{(1)})}$$

is asymptotically equivalent to

$$\frac{(1 - p^{(0)})\,(p/(1-p))^{2nq_n - n} + 0}{p^{(1)}\,(p/(1-p))^{2nq_n - n} + 0} = \frac{1 - p^{(0)}}{p^{(1)}},$$

which proves (a). The proof of (b) is analogous. □

NOTES

1 All conditions are formulated for a given group size $n$ rather than beginning with ‘For all $n$’. However, in many of our results, the group size is not fixed and tends to infinity. In these results, we implicitly assume that all conditions begin with ‘For all $n$’ (and that the competence parameter in the competence conditions is the same for all $n$). Compare List (2004b).

2 We hereafter mean ‘probabilistically caused’ when we use the expression ‘caused’. Probabilistic causation means that the cause affects the probabilities of consequences, whereas deterministic causation means that the cause determines the consequence with certainty. Probabilistic causation can arise for at least two reasons. Metaphysical reasons: the process in question may be genuinely indeterministic; causes determine consequences only with probabilities strictly between 0 and 1, but not with certainty. Epistemic reasons: the process in question may or may not be deterministic at the most fundamental level, but due to its complexity we may not be able to include, or fully describe, all relevant causal factors in the network representation; thus probabilities come into play. We here remain neutral on which of these two reasons apply, though it is obvious that any theoretical representation of jury decisions will be underdescribed and thus epistemically limited. (Our definition of probabilistic causation allows the special case where the net causal effect on probabilities is zero, because positive and negative causes may cancel each other out.)

3 Several generalizations of the classical Condorcet jury model have been discussed in the literature. We have already referred to existing discussions of dependencies between different jurors’ votes. Cases where different jurors have different competence levels are discussed, for instance, in Grofman, Owen and Feld (1983), Boland (1989) and Dietrich (2003). Cases where jurors vote strategically rather than sincerely are discussed, for instance, in Austen-Smith and Banks (1996). Cases where choices are not binary are discussed, for instance, in List and Goodin (2001). Cases where juror competence depends on the jury size are discussed, for instance, in List (2004b).

4 Sometimes Bayesian networks are assumed to contain more information: each node in the graph is endowed with a probability distribution of the variable at this node conditional on the node’s parents (or unconditionally if there are no parents).

5 To specify a joint probability distribution of the variables satisfying the Parental Markov Condition, it is sufficient to specify a distribution of each variable conditional on its parents (an unconditional distribution if there are no parents). The product of all these conditional probability functions then yields a joint distribution of all variables that satisfies the Parental Markov Condition.

6 So all jurors base their votes solely on the same value $e$ of $E$. Differences between jurors’ votes are not the result of the jurors’ independent – and thus potentially different – access to the state of the world (as in the classical model), but the result of different interpretations of the same evidence $e$. One juror might interpret the defendant’s smile as a sign of innocence, whereas another might give the opposite interpretation.

7 An equivalent statement of (S) is the following: given $E$, the vector of votes $(V_1, V_2, \ldots, V_n)$ is independent of the state of the world $X$.

8 One can imagine cases where (part of) the evidence $E$ is not caused by the state of the world $X$. For instance, if $X$ is the fact of whether or not the defendant has committed a given crime, then the information that the defendant bought a gun in a nearby shop two days before the crime may be evidence for guilt. But this evidence cannot be caused by the crime since the gun purchase happened before the crime. Rather, the causal link between the gun purchase and the crime goes in the other direction. To capture such cases, one might want to replace our causal relation $X \to E$ by some other causal relation between $X$ and $E$, e.g., by $X \leftarrow E$, or by a bidirectional causal relation $X \leftrightarrow E$, or by a common parent of $X$ and $E$. The theorems and corollaries of this paper still apply to such modified Bayesian trees (provided that the state $X$ remains related to the votes only through the evidence $E$). The reason is that the Parental Markov Condition (PM) still implies Common Signal (S) and Independence Given the Evidence (I|E), so that (S) and (I|E) remain justified assumptions.

9 This model captures not only the empirical fact that in real world jury decisions the available evidence is usually finite and limited, but also the legal norm, mentioned in the introduction, that jurors are not allowed to obtain or use any evidence other than that presented in the courtroom, or to discuss the case with any persons other than the other jurors.

10 Different interpretations of the ideal vote $f(e)$ may be given. One is that the ideal vote is 1 if and only if the objective probability of guilt given the evidence $e$ exceeds some threshold. Here the ideal interpreter is assumed to know the objective likelihoods (of the evidence given guilt and given innocence) and the objective prior probability of guilt. Another interpretation, which requires not an objective prior of guilt but only a shared one, is to assume that the ideal interpreter uses the group’s shared (perhaps not objective) prior probability of guilt to calculate the posterior probability of guilt given the evidence. We can give a Bayesian account of both interpretations. Assume that the set $\mathcal{E}$ of all possible bodies of evidence is countable. Suppose that, by knowing the evidence-generating stochastic process, the ideal observer knows the probabilities $P(E=e \mid X=1)$ and $P(E=e \mid X=0)$. Suppose, further, that the ideal observer assigns the (objective or shared) prior probability $r := P(X=1)$ to the proposition that the defendant is guilty. Then, using Bayes’s theorem, the ideal observer can calculate the posterior probability that the defendant is guilty, given the evidence $e$, i.e.,

$$P(X=1 \mid E=e) = \frac{r\,P(E=e \mid X=1)}{r\,P(E=e \mid X=1) + (1-r)\,P(E=e \mid X=0)}.$$

Furthermore, the group (or the ideal observer) might set a (normative) threshold for when to vote for ‘guilty’. Now the ideal vote is a ‘guilty’ vote if $P(X=1 \mid E=e) > 1 - \varepsilon$ (for a suitable $\varepsilon > 0$) and a ‘not guilty’ vote otherwise. The prior probability $r$ represents the degree of belief the ideal observer assigns to the guilt of the defendant before having seen any evidence. The value of $\varepsilon$ represents how demanding the threshold for voting for ‘guilty’ is.

11 We also allow that not all jurors have observed the entire evidence $e$. For instance, some jurors might have missed the smile of the defendant. What matters is not that all jurors base their vote on the full evidence $e$, but that they use information contained in $e$. A juror’s information is thus limited by $e$, which represents the maximally available information for any jury size.

12 This assumption is a technical simplification, but involves no real loss of generality. As in the classical Condorcet jury model (e.g., Boland 1989), our model can be generalized by allowing differently competent jurors, so that the competence $P(V_i = f(e) \mid E = e)$ depends also on $i$, denoted $p_{e,i}$. Our asymptotic results then remain true if we replace (C|E) (respectively (HC|E)) by the weaker competence assumption that the limiting average competence, $\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} p_{e,i}$, exceeds 1/2. In corollary 3 one has to interpret $p$ as the limiting average competence across jurors; since corollary 3 requires (HC|E), this limiting average competence does not depend on $e$ here.

13 It is possible to prove a slightly stronger result than theorem 4. Given the state of the world $x$, the ratio $V/n$ converges with probability 1 to the random variable defined by $p_E$ ($> 1/2$) if $f(E) = 1$ and $1 - p_E$ ($< 1/2$) if $f(E) = 0$. Among these two possible limits the one that corresponds to a majority for the correct alternative happens with probability $p(x) = P(f(E) = x \mid X = x)$. Hence, with probability 1, there is convergence to a stable majority as the jury size increases, where this majority supports the correct alternative with the probability that the evidence ‘tells the truth’.
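The Bayesian calculation described in note 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `posterior_guilt` and `ideal_vote` and all numerical values are hypothetical and do not come from the paper; exact fractions are used to avoid rounding effects.

```python
from fractions import Fraction


def posterior_guilt(r, lik_guilty, lik_innocent):
    """Posterior P(X=1 | E=e) from the prior r = P(X=1) and the two
    likelihoods P(E=e | X=1) and P(E=e | X=0), by Bayes's theorem."""
    numerator = r * lik_guilty
    denominator = numerator + (1 - r) * lik_innocent
    if denominator == 0:
        raise ValueError("the evidence has probability zero under the model")
    return numerator / denominator


def ideal_vote(r, lik_guilty, lik_innocent, eps):
    """Ideal vote f(e): 1 ('guilty') iff the posterior exceeds 1 - eps."""
    return 1 if posterior_guilt(r, lik_guilty, lik_innocent) > 1 - eps else 0


# Hypothetical numbers, chosen only for illustration.
prior = Fraction(1, 2)
lik_g = Fraction(9, 10)   # assumed P(E=e | X=1)
lik_i = Fraction(1, 10)   # assumed P(E=e | X=0)

print(posterior_guilt(prior, lik_g, lik_i))
print(ideal_vote(prior, lik_g, lik_i, Fraction(1, 5)))
print(ideal_vote(prior, lik_g, lik_i, Fraction(1, 20)))
```

With these assumed inputs the same evidence yields a ‘guilty’ vote under the more permissive threshold and a ‘not guilty’ vote under the more demanding one, which is the role the note assigns to $\varepsilon$.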


REFERENCES

Austen-Smith, D. and J. Banks: 1996, ‘Information Aggregation, Rationality, and the Condorcet Jury Theorem’, American Political Science Review 90, 34–45.
Boland, P.J.: 1989, ‘Majority Systems and the Condorcet Jury Theorem’, Statistician 38, 181–189.
Boland, P.J., F. Proschan, and Y.L. Tong: 1989, ‘Modelling Dependence in Simple and Indirect Majority Systems’, Journal of Applied Probability 26, 81–88.
Bovens, L. and E. Olsson: 2000, ‘Coherentism, Reliability and Bayesian Networks’, Mind 109, 685–719.
Corfield, D. and J. Williamson (eds.): 2001, Foundations of Bayesianism, Dordrecht (Kluwer).
Dietrich, F.: 2003, ‘General Representation of Epistemically Optimal Procedures’, Social Choice and Welfare, forthcoming.
Estlund, D.: 1994, ‘Opinion Leaders, Independence and Condorcet’s Jury Theorem’, Theory and Decision 36, 131–162.
Fitelson, B.: 2001, ‘A Bayesian Account of Independent Evidence with Application’, Philosophy of Science 68 (Proceedings), S123–S140.
Grofman, B., G. Owen, and S.L. Feld: 1983, ‘Thirteen Theorems in Search of the Truth’, Theory and Decision 15, 261–278.
Ladha, K.K.: 1992, ‘The Condorcet Jury Theorem, Free Speech, and Correlated Votes’, American Journal of Political Science 36, 617–634.
List, C. and R.E. Goodin: 2001, ‘Epistemic Democracy: Generalizing the Condorcet Jury Theorem’, Journal of Political Philosophy 9, 277–306.
List, C.: 2004a, ‘On the Significance of the Absolute Margin’, British Journal for the Philosophy of Science, forthcoming.
List, C.: 2004b, ‘The Epistemology of Special Majority Voting’, Social Choice and Welfare, forthcoming.
Nitzan, S. and J. Paroush: 1984, ‘The Significance of Independent Decisions in Uncertain Dichotomous Choice Situations’, Theory and Decision 17, 47–60.
Owen, G.: 1986, ‘Fair Indirect Majority Rules’, in B. Grofman and G. Owen (eds.), Information Pooling and Group Decision Making, Greenwich, CT (Jai Press).
Pearl, J.: 2000, Causality: Models, Reasoning, and Inference, Cambridge (C.U.P.).

Franz Dietrich
Philosophy, Probability and Modelling Group
Center for Junior Research Fellows
University of Konstanz, 78457 Konstanz, Germany
E-mail: [email protected]

Christian List
Department of Government
London School of Economics
Houghton Street
London WC2A 2AE, U.K.
E-mail: [email protected]


M. KACPRZAK and W. PENCZEK

A SAT-BASED APPROACH TO UNBOUNDED MODEL CHECKING FOR ALTERNATING-TIME TEMPORAL EPISTEMIC LOGIC*

ABSTRACT. This paper deals with the problem of verification of game-like structures by means of symbolic model checking. Alternating-time Temporal Epistemic Logic (ATEL) is used for expressing properties of multi-agent systems represented by alternating epistemic temporal systems as well as concurrent epistemic game structures. Unbounded model checking (a SAT-based technique) is applied for the first time to verification of ATEL. An example is given to show an application of the technique.

1. INTRODUCTION

Alur et al. (1997) proposed a succinct and expressive language for reasoning about game-like systems, called Alternating-time Temporal Logic (ATL). ATL is a generalization of Computation Tree Logic (CTL), where the path quantifiers are replaced by cooperation modalities of the form $\langle\langle A \rangle\rangle$ with $A$ being a set of agents. The intended interpretation of an ATL formula $\langle\langle A \rangle\rangle\psi$ is that the group $A$ of agents has a winning strategy for $\psi$, i.e., the agents of $A$ can cooperate to ensure that $\psi$ holds. For example, $\langle\langle A \rangle\rangle X\alpha$ specifies that the coalition $A$ of agents has a strategy ensuring that $\alpha$ is true in the next state. Van der Hoek and Wooldridge (2002b, 2003) have extended ATL to Alternating-time Temporal Epistemic Logic (ATEL) by introducing the notion of knowledge in the sense of the Fagin–Halpern–Moses–Vardi model of knowledge (Fagin et al. 1995). Such an interpretation of knowledge operators refers to the agents’ capability to distinguish global states. Some variations of ATEL, e.g., Alternating-time Temporal Observational Logic or Alternating-time Temporal Epistemic Logic with Recall, of complete and incomplete information, are studied in Jamroga and van der Hoek (2004).

Synthese 142: 203–227, 2004. Knowledge, Rationality & Action 263–287. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

Model checking is a powerful technique, widely used in verification of hardware and software systems (Clarke et al. 1999). Recently, verification of multi-agent systems (Wooldridge 2002) has become an active subject of research. In particular, recent contributions (van der Meyden and Shilov 1999; van der Hoek and Wooldridge 2002a,b) have focused on extending model checking tools and techniques usually employed for verification of reactive systems. Essentially, these study the validity of a formula representing a property of interest in the model representing all the computations of the multi-agent system (MAS) under consideration. The iterative approaches to model checking (i.e., those based on the explicit enumeration of all the possible states of computation) are known to suffer from the state explosion problem. For this reason, methods based on symbolic representation are currently seen as the most promising. In particular, the methods based on variants of Binary Decision Diagrams (BDDs) (McMillan 1993; Raimondi and Lomuscio 2003), and satisfiability (SAT-checkers) (Biere et al. 1999; Clark et al. 2001; Kacprzak et al. 2004b) are a constant focus of research. Verification via BDDs involves translating the model checking problem into operations on Boolean functions represented in a concise and canonical way, whereas SAT-checkers are used for testing satisfiability of formulas encoding the model checking problem. Our paper deals with the problem of model checking multi-agent systems with respect to their specifications written in ATEL via a translation to a SAT-problem.

The authors of ATL designed a model-checker MOCHA (Alur et al. 2000) based on BDDs, which supports the heterogeneous modelling framework of Reactive Modules. A technique of reducing the model checking problem from ATEL to ATL was given in van Otterloo et al. (2003), where epistemic relations are explicitly encoded in ATL models as dynamic transitions. The main idea of the translation consists in extending the original temporal structure with additional transitions to simulate epistemic accessibility links. The simulation is achieved through adding new “epistemic” agents, who can enforce transitions to epistemically accessible states. However, the algorithm is intended only for turn-based acyclic transition systems, where every state is assigned a player who controls the choice of the next move, so the players are taking turns. Moreover, the languages of ATL and ATEL are reduced so that in the temporal part neither “next” nor “until” operators are allowed, and the group knowledge operators cannot be used. A more general translation, not suffering from the above restrictions, was introduced in Goranko and Jamroga (2004). Both translations can be used for checking ATEL formulas exploiting model-checkers for ATL like MOCHA (Alur et al. 2000).

Unlike MOCHA, we focus on propositional decision procedures. These also operate on Boolean expressions, but do not use canonical forms and therefore sometimes do not suffer from a potential space explosion of BDDs and excessive memory requirements. McMillan (2002) introduced a SAT-based technique, called Unbounded Model Checking (UMC), designed for verifying CTL properties. McMillan’s UMC consists in translating the model checking problem of a CTL formula into the problem of satisfiability of a corresponding propositional formula. It exploits the characterization of the basic CTL modalities in terms of Quantified Boolean Formulas (QBF), and the algorithms that translate QBF and fixed point equations over QBF into propositional formulas. Model checking via UMC can be exponentially more efficient than approaches based on BDDs in two situations: whenever the resulting fixed points have compact representations in CNF but not via BDDs, and whenever the SAT-based image computation step proves to be faster than the BDD-based one (McMillan 2002).

This paper is based on our AAMAS’04 contribution (Kacprzak and Penczek 2004), where we show that UMC can be applied to ATL. Now, we extend it to ATEL. In fact, we could have used our UMC method for ATL combined with a translation from ATEL to ATL model checking, but as we mention in the conclusions, we believe that a direct approach to ATEL should be more efficient. The key issue in designing UMC for ATL consists in encoding the next time operator ($\langle\langle A \rangle\rangle X\alpha$) by a QBF formula and then translating it into a corresponding propositional formula. In the case of ATEL we need to do the same for the knowledge operators ($K_a\alpha$, agent $a$’s knowledge; $E_A\alpha$, everybody knows; $D_A\alpha$, distributed knowledge). The other operators are computed as the greatest or the least fixed points of functions defined over the basic “next time” or “everybody knows” modalities.

In order to adapt UMC for checking ATEL, we use three algorithms. The first one, implemented by the procedure forall (based on the Davis–Putnam–Logemann–Loveland approach), eliminates the universal quantifier from a QBF formula representing an ATEL formula of the form $\langle\langle A \rangle\rangle X\alpha$, $K_a\alpha$, $E_A\alpha$, or $D_A\alpha$, and returns the result in Conjunctive Normal Form (CNF). The remaining algorithms calculate the greatest and the least fixed points. Ultimately, the technique allows for an ATEL formula $\alpha$ to be translated into a propositional formula $[\alpha](w)$,¹ which characterizes all the states of the model where $\alpha$ holds.

The rest of the paper is organized as follows. In Section 2, we define alternating epistemic transition systems (AETS). Then, in Section 3, the language of ATEL is presented. Two examples of applications of ATEL are given in Section 4. Section 5 reviews Quantified Boolean Formulas and formulas in Conjunctive Normal Form, which are used in verification. The fixed point characterization of temporal and epistemic operators is described in Section 6. Section 7 deals with Unbounded Model Checking (UMC) for ATEL. Finally, in Section 8, we show an example of an application of UMC. Conclusions are given in Section 9.

1.1. State of the Art and Related Work

The recent developments in the area of model checking MAS can broadly be divided into two streams. In the first category, standard predicates are used to interpret the various intensional notions and these are paired with standard model checking techniques based on temporal logic. Following this line are, for example, Wooldridge et al. (2002) and related papers. In the other category we can place techniques that make a genuine attempt at extending the model checking techniques by adding other operators. Works along these lines include van der Hoek and Wooldridge (2002a,b), van der Meyden and Shilov (1999) and Raimondi and Lomuscio (2003). In Penczek and Lomuscio (2003a,b), and Lomuscio et al. (2003) an extension of the method of bounded model checking (one of the main SAT-based techniques) to CTLK, a language comprising both CTL and knowledge operators, was defined, implemented, and evaluated. Quite recently, an unbounded model checking method for CTLK has been defined and implemented (Kacprzak et al. 2004a,b). Furthermore, we described an application of UMC to ATL in Kacprzak and Penczek (2004). The present paper extends it to the case of ATEL by adding an encoding of epistemic operators. Since preliminary results appear largely positive for CTLK (Kacprzak et al. 2004a), we believe that the method of UMC for ATEL will be efficient as well.

Recently, ATL has received a lot of interest in the MAS community. There have been several papers considering ATL extensions by an epistemic (van der Hoek and Wooldridge 2003; Jamroga and van der Hoek 2004) or a deontic component (Jamroga et al. 2004).


2. ALTERNATING EPISTEMIC TRANSITION SYSTEMS

The semantics of Alternating-time Temporal Logic (ATL) has been slightly changed since 1997 and in consequence several versions of it are known (Alur et al. 1997; 1998; 2002). Below, we recall one of them based on alternating transition systems (ATS) extended with the notion of an initial state and epistemic relations, which results in alternating epistemic transition systems (AETS). Interpretation of ATL formulas can also be given over concurrent game structures (CGS) (Alur et al. 2002). However, in the current approach we exploit ATS with epistemic components rather than CGS. The reason is that ATS correspond to the description we use in the UMC technique, where we identify actions, i.e., agent choices, with the set of possible outcomes, exactly like in ATS. A detailed comparison of these two models is given in Goranko and Jamroga (2004).

DEFINITION 1. An alternating epistemic transition system (AETS) is a tuple $T = \langle \Pi, \mathbb{A}, Q, \iota, \pi, \delta, \{\sim_a \mid a \in \mathbb{A}\} \rangle$ where:

– $\Pi$ is a finite set of atomic propositions;
– $\mathbb{A}$ is a non-empty finite set of agents;
– $Q$ is a finite set of (global) states;
– $\iota \in Q$ is the initial state;
– $\pi : Q \to \mathcal{P}(\Pi)$² is a valuation function which specifies which propositions are true at which states;
– $\delta : Q \times \mathbb{A} \to \mathcal{P}(\mathcal{P}(Q))$ is a transition function mapping each pair (state, agent) to a non-empty family of choices of possible next states. The idea is that at a state $q$, the agent $a$ chooses a set $Q_a \in \delta(q, a)$, thus forcing the outcome state to be from $Q_a$. The resulting transition leads to a state which is in the intersection of all $Q_a$ for each $a \in \mathbb{A}$ and so it reflects the mutual will of all agents. Since the system is required to be deterministic (given the state and the agents’ decisions), $\bigcap_{a \in \mathbb{A}} Q_a$ must always be a singleton;
– $\sim_a$ is an epistemic relation defined for agent $a \in \mathbb{A}$. The relation $\sim_a$ models the agent $a$’s inability to distinguish between (global) states of the system, i.e., if $q \sim_a q'$, then while the system is in state $q$, the agent $a$ cannot really determine whether it is in $q$ or $q'$. This relation is usually assumed to be an equivalence relation.

Let $A \subseteq \mathbb{A}$. Given the epistemic relations for the agents in $A$, the union of $A$’s accessibility relations defines the epistemic relation corresponding to the modality of everybody knows: $\sim^E_A = \bigcup_{a \in A} \sim_a$. $\sim^C_A$ denotes the transitive closure of $\sim^E_A$, and corresponds to the relation used to interpret the modality of common knowledge. The intersection of $A$’s accessibility relations defines the epistemic relation corresponding to the modality of distributed knowledge: $\sim^D_A = \bigcap_{a \in A} \sim_a$. We refer to Fagin et al. (1995) for an introduction to these concepts.

Computations in AETS. We say that a state $q' \in Q$ is a successor of a state $q$ if there are choice sets $Q_a \in \delta(q, a)$, for $a \in \mathbb{A}$, such that $\bigcap_{a \in \mathbb{A}} Q_a = \{q'\}$. Thus, $q'$ is a successor of $q$ iff whenever the AETS is in the state $q$, the agents can cooperate so that the next state is $q'$. A computation of $T$ is an infinite sequence $\lambda = q_0, q_1, q_2, \ldots$ of states such that for all positions $i \geq 0$, the state $q_{i+1}$ is a successor of the state $q_i$. We refer to a computation starting at the state $q$ as a $q$-computation. For a computation $\lambda$ and a position $i \geq 0$, we use $\lambda[i]$ and $\lambda[0, i]$ to denote the $i$-th state of $\lambda$ and the finite prefix $q_0, q_1, \ldots, q_i$ of $\lambda$, respectively.

3. ALTERNATING-TIME TEMPORAL EPISTEMIC LOGIC

Now, we formally present Alternating-time Temporal Epistemic Logic (ATEL) (van der Hoek and Wooldridge 2002b, 2003) based on Alternating-time Temporal Logic (ATL) (Alur et al. 1997, 1998, 2002). ATEL is a logic which allows for expressing properties involving both knowledge and strategies. Before defining syntax and semantics, we give an intuition behind its key constructs. ATL takes its inspiration from Computation Tree Logic (CTL). Thus, it contains all the conventional connectives and tense modalities. However, the path quantifiers used in CTL are replaced by cooperation operators parameterized with sets of agents. An ATL formula $\langle\langle A \rangle\rangle X\alpha$, where $A$ is a group of agents, means that the group $A$ has a collective strategy to force that $\alpha$ is true in the next state. Similarly, $\langle\langle A \rangle\rangle G\alpha$ means that the members of $A$ can work together to ensure that $\alpha$ is always true. The formula $\langle\langle A \rangle\rangle \alpha U \beta$ means that $A$ can cooperate to ensure that $\beta$ will become true at some time in the future and until then $\alpha$ remains true. If we want to express the fact that a group of agents cannot avoid some state of affairs, we can use the dual operator $[[A]]$, e.g., $[[A]]X\alpha$ means that the group $A$ of agents cannot cooperate to enforce that $\alpha$ is not true at the next state. ATEL extends ATL with epistemic operators: $K_a\alpha$, “the agent $a$ knows that $\alpha$”; $E_A\alpha$, “every member of the group $A$ knows that $\alpha$”; $D_A\alpha$, “combined knowledge of the members of the group $A$ implies that $\alpha$”; $C_A\alpha$, “every agent belonging to the group $A$ knows that $\alpha$ and every agent belonging to the group $A$ knows that every agent belonging to the group $A$ knows that $\alpha$ and so on”.

DEFINITION 2 (Syntax of ATEL). The set of ATEL formulas FORM is defined inductively as follows:

– every member $p$ of $\Pi$ is a formula,
– if $\alpha$ and $\beta$ are formulas, then so are $\neg\alpha$ and $\alpha \vee \beta$,
– if $A \subseteq \mathbb{A}$ is a set of agents, and $\alpha$ and $\beta$ are formulas, then so are $\langle\langle A \rangle\rangle X\alpha$, $\langle\langle A \rangle\rangle G\alpha$, and $\langle\langle A \rangle\rangle \alpha U \beta$,
– if $a \in \mathbb{A}$ is an agent and $\alpha$ is a formula, then so is $K_a\alpha$,
– if $A \subseteq \mathbb{A}$ is a set of agents and $\alpha$ is a formula, then so are $E_A\alpha$, $D_A\alpha$, $C_A\alpha$.

Additional Boolean connectives $\wedge$, $\Rightarrow$, $\Leftrightarrow$ are defined from $\neg$, $\vee$ in the usual manner. Moreover $true \stackrel{def}{=} p \vee \neg p$ for some $p \in \Pi$ and $false \stackrel{def}{=} \neg true$. Furthermore we write $\langle\langle A \rangle\rangle F\alpha$ for $\langle\langle A \rangle\rangle true\, U \alpha$ and use the following abbreviations for dual formulas:

– $[[A]]X\alpha \stackrel{def}{=} \neg\langle\langle A \rangle\rangle X\neg\alpha$,
– $[[A]]G\alpha \stackrel{def}{=} \neg\langle\langle A \rangle\rangle F\neg\alpha$,
– $[[A]]F\alpha \stackrel{def}{=} \neg\langle\langle A \rangle\rangle G\neg\alpha$.

We interpret the ATEL formulas over the states of AETS defined over the same sets of propositions and agents. In order to define the semantics of ATEL formally, we first define the notion of a strategy. A strategy for an agent $a$ is a mapping $f_a : Q^+ \to \mathcal{P}(Q)$ which assigns to every non-empty sequence of states $\lambda = q_0, \ldots, q_n$ a choice set $f_a(\lambda) \in \delta(q_n, a)$. The function specifies $a$’s decisions for every possible (finite) history of system transitions. Given a state $q \in Q$, a set $A$ of agents, and a set $F_A = \{f_a \mid a \in A\}$ of strategies, one for each agent in $A$, we define the outcomes of $F_A$ from $q$ to be the set $out(q, F_A)$ of $q$-computations that the agents in $A$ enforce when they follow the strategies in $F_A$. That is, a computation $\lambda = q_0, q_1, q_2, \ldots$ is in $out(q, F_A)$ if $q_0 = q$ and for all positions $i \geq 0$ and every agent $a \in A$ there is a set $Q_a \in \delta(q_i, a)$ such that (1) $Q_a = f_a(\lambda[0, i])$, and (2) $q_{i+1} \in \bigcap_{a \in A} Q_a$.
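The transition function, the successor relation, and a single step of the outcome set can be made concrete with a small Python sketch. Everything here is an illustrative assumption rather than material from the paper: the three-state, two-agent system `DELTA`, the names `joint_outcomes`, `is_deterministic`, `successors`, and `one_step_outcomes`, and the choice to represent each choice set as a `frozenset`.

```python
from itertools import product

# Hypothetical toy system: DELTA[(state, agent)] lists the agent's choice sets.
AGENTS = ("a", "b")
STATES = ("s0", "s1", "s2")
DELTA = {
    ("s0", "a"): [frozenset({"s0"}), frozenset({"s1", "s2"})],
    ("s0", "b"): [frozenset({"s0", "s1"}), frozenset({"s0", "s2"})],
    ("s1", "a"): [frozenset({"s1"})],
    ("s1", "b"): [frozenset({"s1"})],
    ("s2", "a"): [frozenset({"s2"})],
    ("s2", "b"): [frozenset({"s2"})],
}


def joint_outcomes(delta, agents, q):
    """Yield the intersection of the chosen sets for every way of
    picking one choice set per agent at state q."""
    for combo in product(*(delta[(q, a)] for a in agents)):
        yield frozenset.intersection(*combo)


def is_deterministic(delta, agents, states):
    """Check the requirement that every joint choice yields a singleton."""
    return all(len(out) == 1
               for q in states
               for out in joint_outcomes(delta, agents, q))


def successors(delta, agents, q):
    """States q' such that some joint choice intersects to exactly {q'}."""
    return {next(iter(out))
            for out in joint_outcomes(delta, agents, q) if len(out) == 1}


def one_step_outcomes(delta, agents, q, fixed):
    """Next states possible when the agents in `fixed` (agent -> choice set)
    play the given choices and all other agents choose freely."""
    options = [[fixed[a]] if a in fixed else delta[(q, a)] for a in agents]
    result = set()
    for combo in product(*options):
        result |= frozenset.intersection(*combo)
    return result


print(is_deterministic(DELTA, AGENTS, STATES))
print(successors(DELTA, AGENTS, "s0"))
print(one_step_outcomes(DELTA, AGENTS, "s0", {"a": frozenset({"s0"})}))
```

In this toy system agent a alone can keep the system in s0, whereas agent b alone cannot fix the next state; `one_step_outcomes` corresponds to condition (2) of the outcome definition for a single position.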


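DEFINITION 3 (Interpretation of ATEL). Let $T$ be an AETS, $q \in Q$ a state, and $\alpha, \beta$ formulas of ATEL. $T, q \models \alpha$ denotes that $\alpha$ is true at the state $q$ in the system $T$. $T$ is omitted if it is implicitly understood. The relation $\models$ is defined inductively as follows:

– $q \models p$ iff $p \in \pi(q)$, for $p \in \Pi$,
– $q \models \neg\alpha$ iff $q \not\models \alpha$,
– $q \models \alpha \vee \beta$ iff $q \models \alpha$ or $q \models \beta$,
– $q \models \langle\langle A \rangle\rangle X\alpha$ iff there exists a set $F_A$ of strategies, one for each agent in $A$, such that for all computations $\lambda \in out(q, F_A)$, we have $\lambda[1] \models \alpha$,
– $q \models \langle\langle A \rangle\rangle G\alpha$ iff there exists a set $F_A$ of strategies, one for each agent in $A$, such that for all computations $\lambda \in out(q, F_A)$, and all positions $i \geq 0$, we have $\lambda[i] \models \alpha$,
– $q \models \langle\langle A \rangle\rangle \alpha U \beta$ iff there exists a set $F_A$ of strategies, one for each agent in $A$, such that for all computations $\lambda \in out(q, F_A)$, there exists a position $i \geq 0$ such that $\lambda[i] \models \beta$ and for all positions $0 \leq j < i$, we have $\lambda[j] \models \alpha$,
– $q \models K_a\alpha$ iff for every $q' \in Q$, if $q \sim_a q'$, then $q' \models \alpha$,
– $q \models E_A\alpha$ iff for every $q' \in Q$, if $q \sim^E_A q'$, then $q' \models \alpha$,
– $q \models D_A\alpha$ iff for every $q' \in Q$, if $q \sim^D_A q'$, then $q' \models \alpha$,
– $q \models C_A\alpha$ iff for every $q' \in Q$, if $q \sim^C_A q'$, then $q' \models \alpha$.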
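The four epistemic clauses all have the same shape: a state satisfies the formula when every state related to it satisfies the argument. The following Python sketch evaluates them on sets of states. It is an illustration under assumptions: the four-state model, the two partitions, and the helper names `equivalence`, `transitive_closure`, and `box` are hypothetical and not taken from the paper.

```python
def equivalence(partition):
    """Equivalence relation (set of pairs) induced by a partition of the states."""
    return {(x, y) for block in partition for x in block for y in block}


def transitive_closure(rel):
    """Smallest transitive relation containing rel."""
    closure = set(rel)
    while True:
        extra = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if extra <= closure:
            return closure
        closure |= extra


def box(rel, states, Z):
    """States q such that every q' with (q, q') in rel belongs to Z."""
    return {q for q in states if all(q2 in Z for (q1, q2) in rel if q1 == q)}


# Hypothetical model: agent a cannot tell 0 from 1, agent b cannot tell 1 from 2.
STATES = {0, 1, 2, 3}
REL = {
    "a": equivalence([{0, 1}, {2}, {3}]),
    "b": equivalence([{0}, {1, 2}, {3}]),
}
Z = {0, 1}                             # states where some formula alpha holds

rel_E = REL["a"] | REL["b"]            # union: everybody knows
rel_D = REL["a"] & REL["b"]            # intersection: distributed knowledge
rel_C = transitive_closure(rel_E)      # transitive closure: common knowledge

K_a = box(REL["a"], STATES, Z)
K_b = box(REL["b"], STATES, Z)
E_ab = box(rel_E, STATES, Z)
D_ab = box(rel_D, STATES, Z)
C_ab = box(rel_C, STATES, Z)
print(K_a, K_b, E_ab, D_ab, C_ab)
```

In this toy model distributed knowledge holds in more states than individual knowledge, and common knowledge in fewer, which matches the inclusions between the intersection, the union, and its transitive closure.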

DEFINITION 4 (Validity). An ATEL formula $\varphi$ is valid in $T$ (denoted $T \models \varphi$) iff $T, \iota \models \varphi$, i.e., $\varphi$ is true at the initial state of the system $T$.

4. EXAMPLES OF APPLICATIONS OF ATEL

In this section, we give two examples of applications of ATEL to specifying system properties.

EXAMPLE 1. Consider a system composed of three agents 1, 2, and 3. The agents assign values to the Boolean variables $x$, $y$, and $z$, respectively. When $x = 0$, then the agent 1 can leave the value of $x$ unchanged or change it to 1. When $x = 1$, then the agent 1 leaves the value of $x$ unchanged. When $y = 0$, then the agent 2 can leave the value of $y$ unchanged or change $y$ from 0 to 1 either when $x$ is already 1 or when simultaneously $x$ is set to 1. When $y = 1$, then the agent 2 leaves the value of $y$ unchanged. In a similar way, the agent 3 can leave the value of $z$ unchanged if $z = 0$ or change $z$ from 0 to 1 when $y$ is already 1. When $z = 1$, then the agent 3 leaves the value of $z$ unchanged. We assume that the agent 1 knows the value of variable $x$ only, the agent 2 knows the value of $x$ and $y$, while the agent 3 knows the value of $y$ and $z$. If the game starts when the value of all variables is 0, then it is obvious that the agents 1 and 2 can cause the agent 3 to have its variable unchanged: $\langle\langle \{1,2\} \rangle\rangle X\neg z$, where $\neg z$ means that the value of the variable $z$ is 0. However, the agent 2 has no way to learn about it: $\neg K_2 \langle\langle \{1,2\} \rangle\rangle X\neg z$, unlike the agent 3, which can deduce this fact based on its local information, i.e., this is true in all the states it considers possible given current local information about the variables $y$ and $z$: $K_3 \langle\langle \{1,2\} \rangle\rangle X\neg z$. Consequently, it is not true that every member of the group $\{2,3\}$ knows that the agents 1 and 2 can block the move of the agent 3: $\neg E_{\{2,3\}} \langle\langle \{1,2\} \rangle\rangle X\neg z$, but since the agent 3 knows that fact, this is their distributed knowledge: $D_{\{2,3\}} \langle\langle \{1,2\} \rangle\rangle X\neg z$. Other interesting properties of this system can be formulated as well. For example, does the agent 3 know that the agent 2 does not know that together with the agent 1 they can block the agent 3: $K_3 \neg K_2 \langle\langle \{1,2\} \rangle\rangle X\neg z$, or does the agent 2 know that the agent 3 knows that it can be blocked by the agents 1 and 2: $K_2 K_3 \langle\langle \{1,2\} \rangle\rangle X\neg z$? The answer can be given using our UMC technique, which we describe in the next sections.

EXAMPLE 2. The second example we discuss is inspired by Robinson (2004). Consider a group of three bored guests at a party, who are playing the following game. Each guest chooses a natural number from 0 to 2, writes it on a piece of paper, and drops it in a passed basket. The goal is to choose a number as close as possible to the group average. The players who come closest share a bottle of very expensive champagne. This kind of game is known to economists as “the beauty contest game”. It simulates people’s reasoning in many real-world situations, in particular, that of participants of the stock market. Using ATEL we try to establish whether it is possible to have a strategy resulting in winning the champagne. Observe that one player, e.g., the player 1, cannot enforce getting the prize, although of course it may happen that the player 1 wins. Consequently, the formula $\langle\langle \{1\} \rangle\rangle X win_1$ is false. It is also a very interesting question whether a group of players, e.g., 1 and 2, can cooperate to get the champagne. The answer is yes. If two players choose the same number, they certainly will be winners, which is expressed by the formula $\langle\langle \{1,2\} \rangle\rangle X(win_1 \wedge win_2)$. On the other hand, since the player 3 can also choose exactly the same number, the group $\{1,2\}$ cannot enforce that the player 3 loses: $\neg\langle\langle \{1,2\} \rangle\rangle X\neg win_3$.
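The three strategic claims about the beauty contest game are small enough to be checked by exhaustive enumeration. The Python sketch below does this for the one-shot game; it is an illustration under assumptions, not the UMC procedure of the paper. The names `winners` and `can_enforce_next` are hypothetical, and ties are assumed to make all tied players winners, following the statement that the closest players share the prize.

```python
from fractions import Fraction
from itertools import product

PLAYERS = (1, 2, 3)
NUMBERS = (0, 1, 2)


def winners(choice):
    """Players whose number is closest to the group average.
    `choice` maps each player to the chosen number; ties all win."""
    average = Fraction(sum(choice.values()), len(choice))
    distance = {p: abs(c - average) for p, c in choice.items()}
    best = min(distance.values())
    return {p for p, d in distance.items() if d == best}


def can_enforce_next(coalition, goal):
    """One-step check of <<coalition>> X goal: is there a joint choice of
    the coalition such that `goal` holds whatever the others choose?"""
    coalition = tuple(coalition)
    others = tuple(p for p in PLAYERS if p not in coalition)
    for own in product(NUMBERS, repeat=len(coalition)):
        fixed = dict(zip(coalition, own))
        if all(goal(winners({**fixed, **dict(zip(others, rest))}))
               for rest in product(NUMBERS, repeat=len(others))):
            return True
    return False


# <<{1}>> X win_1
print(can_enforce_next((1,), lambda w: 1 in w))
# <<{1,2}>> X (win_1 and win_2)
print(can_enforce_next((1, 2), lambda w: {1, 2} <= w))
# <<{1,2}>> X not win_3
print(can_enforce_next((1, 2), lambda w: 3 not in w))
```

Because the game ends after a single simultaneous move, a strategy here reduces to one choice per player, so the one-step enumeration is enough for these three formulas.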


We assume that no player knows the choices of his rivals, so each player has information only about his own number. Nevertheless, after considering all the possibilities, he can conclude that cooperation brings success: $K_1 \langle\langle \{1,2\} \rangle\rangle X(win_1 \wedge win_2)$ and $K_2 \langle\langle \{1,2\} \rangle\rangle X(win_1 \wedge win_2)$. Consequently, each member of the group of the players 1, 2 knows it: $E_{\{1,2\}} \langle\langle \{1,2\} \rangle\rangle X(win_1 \wedge win_2)$. Furthermore, everybody knows that they can cooperate to win and so on. Thus, it is their common knowledge: $C_{\{1,2\}} \langle\langle \{1,2\} \rangle\rangle X(win_1 \wedge win_2)$. Although the players become conscious of having a winning strategy, they are not required to know the strategy itself. In fact, they do not know which number to choose in order to win! This problem is considered in Jamroga and van der Hoek (2004).

5. QUANTIFIED BOOLEAN FORMULAS AND CNF FORMULAS

In this section, we give several definitions concerning Quantified Boolean Formulas (QBF) and formulas in Conjunctive Normal Form (CNF) that we use in the current approach. Moreover, we present an algorithm (developed by K. McMillan) that allows for translating QBF formulas into CNF formulas. We show an application of this procedure in the following sections.

In order to have a more succinct notation for complex operations on Boolean formulas, in what follows we use Quantified Boolean Formulas (QBF), an extension of propositional logic by means of quantifiers ranging over propositions. In BNF: $\alpha ::= p \mid \neg\alpha \mid \alpha \wedge \alpha \mid \exists p.\alpha \mid \forall p.\alpha$. The semantics of the quantifiers is defined as follows:

– $\exists p.\alpha$ iff $\alpha(p \leftarrow true) \vee \alpha(p \leftarrow false)$,
– $\forall p.\alpha$ iff $\alpha(p \leftarrow true) \wedge \alpha(p \leftarrow false)$,

where $\alpha \in QBF$, $p \in PV$ (a set of propositional variables) and $\alpha(p \leftarrow \psi)$ denotes capture-avoiding substitution with the formula $\psi$ of every occurrence of the variable $p$ in the formula $\alpha$. We use the notation $\forall v.\alpha$, where $v = (v[1], \ldots, v[m])$ is a vector of propositional variables, to denote $\forall v[1].\forall v[2] \ldots \forall v[m].\alpha$.

We usually deal with propositional formulas in Conjunctive Normal Form. A formula is in Conjunctive Normal Form (CNF) if it is a conjunction of zero or more clauses, where by a clause we mean a disjunction of zero or more literals, i.e., propositional variables as well as negations of these.
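The two semantic clauses for the quantifiers translate directly into a quantifier-expansion routine. The Python sketch below is a minimal illustration under assumptions: formulas are represented as nested tuples, an `"or"` node and a `"const"` node are added for convenience even though the BNF above lists only negation and conjunction, and the names `substitute`, `expand`, and `evaluate` are hypothetical. Only the constants true and false are ever substituted, so variable capture cannot occur.

```python
def substitute(f, p, value):
    """Replace every free occurrence of variable p in f by the constant value."""
    kind = f[0]
    if kind == "const":
        return f
    if kind == "var":
        return ("const", value) if f[1] == p else f
    if kind == "not":
        return ("not", substitute(f[1], p, value))
    if kind in ("and", "or"):
        return (kind, substitute(f[1], p, value), substitute(f[2], p, value))
    # Quantifier node (kind, q, body): an inner binding of p shadows the outer one.
    q, body = f[1], f[2]
    return f if q == p else (kind, q, substitute(body, p, value))


def expand(f):
    """Eliminate quantifiers: exists becomes a disjunction, forall a conjunction."""
    kind = f[0]
    if kind in ("const", "var"):
        return f
    if kind == "not":
        return ("not", expand(f[1]))
    if kind in ("and", "or"):
        return (kind, expand(f[1]), expand(f[2]))
    body = expand(f[2])
    pos, neg = substitute(body, f[1], True), substitute(body, f[1], False)
    return ("or", pos, neg) if kind == "exists" else ("and", pos, neg)


def evaluate(f, env):
    """Truth value of a quantifier-free formula under the assignment env."""
    kind = f[0]
    if kind == "const":
        return f[1]
    if kind == "var":
        return env[f[1]]
    if kind == "not":
        return not evaluate(f[1], env)
    if kind == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    if kind == "or":
        return evaluate(f[1], env) or evaluate(f[2], env)
    raise ValueError("quantifiers must be expanded first")


exists_f = ("exists", "p", ("and", ("var", "p"), ("var", "q")))
forall_f = ("forall", "p", ("or", ("var", "p"), ("var", "q")))
for q in (False, True):
    print(q, evaluate(expand(exists_f), {"q": q}), evaluate(expand(forall_f), {"q": q}))
```

Each eliminated quantifier doubles the size of the formula in this naive expansion, which is one reason a procedure that eliminates quantified variables during the SAT search is of interest.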


In the following, we show a standard polynomial-time algorithm, to be used later, that given a propositional formula a, constructs a CNF formula which is unsatisﬁable exactly when a is valid. The procedure works as follows. First of all, for every subformula b of the formula a, including a, we introduce a distinct variable lb . Furthermore, if b is a variable, then lb ¼ b. Next we assign a formula CN F ðbÞ to every subformula b according to the following rules: if b is a variable then CN F ðbÞ ¼ true, if b ¼ :/ then CN F ðbÞ ¼ CN F ð/Þ ^ ðlb _ l/ Þ ^ ð:lb _ :l/ Þ, if b ¼ / _ u then CN F ðbÞ ¼ CN F ð/Þ ^ CN F ðuÞ^ ðlb _ :l/ Þ^ ðlb _ :lu Þ ^ ð:lb _ l/ _ lu Þ, if b¼/^u then CN F ðbÞ ¼ CN F ð/Þ ^ CN F ðuÞ^ ð:lb _ l/ Þ ^ ð:lb _ lu Þ ^ ðlb _ :l/ _ :lu Þ, if b¼/)u then CN F ðbÞ ¼ CN F ð/Þ ^ CN F ðuÞ^ ðlb _ l/ Þ ^ ðlb _ :lu Þ ^ ð:lb _ :l/ _ lu Þ. It can be easily shown that the formula a is valid exactly when the CNF formula CN F ðaÞ ^ :la is unsatisﬁable. It is important that for a given QBF formula 8v:a, we can construct an equivalent CNF formula using the algorithm forall (McMillan 2002). Given a propositional formula a and a vector of variables v ¼ ðv½1; . . . ; v½mÞ, the algorithm forall constructs a CNF formula v equivalent to 8v:a and eliminates the quantiﬁed variables on the ﬂy. A description of this procedure is given below. Procedure forallðv; aÞ, where v ¼ ðv½1; . . . ; v½mÞ and a is a propositional formula let / ¼ CN F ðaÞ ^ :la ; v ¼ true, and A ¼ ; repeat if / contains false, return v else if conflict analyse conflict and backtrack else if current assignment satisfies / build a blocking clause c0 remove variables of form v½i and :v½i from c0 add c0 to / and v else choose a literal l such that l 62 A and :l 62 A and add l to A The procedure works as follows. Initially the algorithm assumes an empty assignment A, a formula v to be true and / to be a CNF [273]

214

KACPRZAK AND PENCZEK

formula $\mathrm{CNF}(\alpha) \wedge \neg l_\alpha$. First, the procedure finds a satisfying assignment for $\phi$. The search for an appropriate assignment is based on the Davis–Putnam–Logemann–Loveland approach (Davis et al. 1962), which makes use of two techniques: Boolean constraint propagation (BCP) and conflict-based learning (CBL). The first builds an assignment $A_\phi$ which is an extension of the assignment $A$ and is implied by $A$ and $\phi$. Next, BCP determines the consequence of $A_\phi$. The following three cases may happen:

1. A conflict exists, i.e., there exists a clause in $\phi$ such that all of its literals are false in $A_\phi$. So, the assignment $A$ cannot be extended to a satisfying one. If a conflict is detected, CBL finds the reason for the conflict and tries to resolve it. Information about the current conflict may be recorded as clauses, which are then added to the formula $\phi$ without changing its satisfiability. The algorithm then backtracks, i.e., it changes the assignment $A$ by withdrawing one of the previous decisions.
2. A conflict does not exist and $A_\phi$ is total, i.e., a satisfying assignment is obtained. In this case, we generate a new clause which is false in the current assignment $A_\phi$ (i.e., rules out the satisfying assignment) and whose complement characterizes a set of assignments falsifying the formula $\alpha$. This clause is called a blocking clause. The construction of this clause is given in McMillan (2002). Next, the variables of the form $v[i]$ and their negations are removed from the blocking clause, what remains is added to the formulas $\phi$ and $\chi$, and the algorithm again tries to find a satisfying assignment for $\phi$.
3. The first two cases do not apply. Then, the procedure makes a new assignment $A$ by giving a value to a selected variable.

On termination, when $\phi$ becomes unsatisfiable, $\chi$ is a conjunction of the blocking clauses and precisely characterizes $\forall v.\alpha$. For more details see McMillan (2002) or Kacprzak et al. (2003, 2004b).

THEOREM 1. Let $\alpha$ be a propositional formula and $v = (v[1], \ldots, v[m])$ be a vector of propositions. Then the QBF formula $\forall v.\alpha$ is logically equivalent to the CNF formula returned by $\mathit{forall}(v, \alpha)$.

The proof of the above theorem follows from the correctness of the algorithm $\mathit{forall}$ (see McMillan 2002).
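To make the role of the blocking clauses concrete, the following Python sketch computes a CNF equivalent to a universally quantified formula by brute-force enumeration instead of SAT search. It is an illustration only and not the procedure of McMillan (2002): the names `forall_bruteforce` and `eval_cnf`, the representation of the formula as a Python function, and the sample formula are assumptions made for this example, and each blocking clause here excludes a single assignment of the free variables rather than a generalized set of assignments.

```python
from itertools import product

def forall_bruteforce(quantified, free, alpha):
    """Return a CNF over `free` equivalent to: forall quantified . alpha.

    alpha maps a dict {variable: bool} to bool.
    A clause is a list of (variable, polarity) literals.
    """
    clauses = []
    for free_vals in product([False, True], repeat=len(free)):
        env = dict(zip(free, free_vals))
        # Search for values of the quantified variables that falsify alpha.
        falsified = any(
            not alpha({**env, **dict(zip(quantified, q_vals))})
            for q_vals in product([False, True], repeat=len(quantified))
        )
        if falsified:
            # Blocking clause: false exactly on this assignment of the free variables.
            clauses.append([(var, not val) for var, val in env.items()])
    return clauses

def eval_cnf(clauses, env):
    """Evaluate a CNF (list of clauses) under the assignment env."""
    return all(any(env[var] == pol for var, pol in clause) for clause in clauses)

# Hypothetical example: forall v1 . (w1 or (v1 and w2)), which is equivalent to w1.
cnf = forall_bruteforce(
    ["v1"], ["w1", "w2"], lambda e: e["w1"] or (e["v1"] and e["w2"])
)
```

In this sketch the conjunction of the collected clauses plays the part of the formula that characterizes the quantified result on termination; a SAT-based implementation would reach the same set of models with far fewer, more general clauses.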

A SAT-BASED APPROACH TO UNBOUNDED MODEL

6. FIXED-POINT REPRESENTATION OF ATEL

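In this section, we show how the set of states satisfying an ATEL formula can be characterized as a fixed point of an appropriate function. We adapt definitions given in Clarke et al. (1999). Let $T = \langle \Pi, \mathbb{A}, Q, \iota, \pi, \delta, \{\sim_a \mid a \in \mathbb{A}\}\rangle$ be an alternating epistemic transition system. Notice that the set $\mathcal{P}(Q)$ of all subsets of $Q$ forms a lattice under the set inclusion ordering. Each element $Q'$ of the lattice can also be thought of as a predicate on $Q$, where this predicate is viewed as being true for exactly the states in $Q'$. The least element in the lattice is the empty set, which we also refer to as false, and the greatest element in the lattice is the set $Q$, which we sometimes write as true. A function $\tau$ mapping $\mathcal{P}(Q)$ to $\mathcal{P}(Q)$ is called a predicate transformer. A set $Q' \subseteq Q$ is a fixed point of a function $\tau : \mathcal{P}(Q) \to \mathcal{P}(Q)$ if $\tau(Q') = Q'$.

Whenever $\tau$ is monotonic, i.e., $P_1 \subseteq P_2$ implies $\tau(P_1) \subseteq \tau(P_2)$, it has the least fixed point, denoted $\mu Z.\tau(Z)$, and the greatest fixed point, denoted $\nu Z.\tau(Z)$. When $\tau(Z)$ is also $\bigcup$-continuous, i.e., $P_1 \subseteq P_2 \subseteq \ldots$ implies $\tau(\bigcup_i P_i) = \bigcup_i \tau(P_i)$, then $\mu Z.\tau(Z) = \bigcup_{i \geq 0} \tau^i(\mathit{false})$. When $\tau(Z)$ is also $\bigcap$-continuous, i.e., $P_1 \supseteq P_2 \supseteq \ldots$ implies $\tau(\bigcap_i P_i) = \bigcap_i \tau(P_i)$, then $\nu Z.\tau(Z) = \bigcap_{i \geq 0} \tau^i(\mathit{true})$ (see Tarski 1955).

In order to obtain fixed-point characterizations of operators, we identify each ATEL formula $\alpha$ with the set $\langle\alpha\rangle_T$ of states of $T$ at which this formula is true, formally $\langle\alpha\rangle_T = \{q \in Q \mid T, q \models \alpha\}$. If $T$ is known from the context, we omit the subscript $T$. Furthermore, we define functions $\langle\langle A\rangle\rangle X(Z)$, $K_a(Z)$, $E_A(Z)$, $D_A(Z)$ for every $a \in \mathbb{A}$, $A \subseteq \mathbb{A}$, $Z \subseteq Q$ as follows:

– $\langle\langle A\rangle\rangle X(Z) = \{q \in Q \mid$ for every $a \in A$ there exists a set $Q_a \in \delta(q, a)$ such that for every state $q' \in Q$, every agent $b \in \mathbb{A} \setminus A$, and every set $Q_b \in \delta(q, b)$, if $\{q'\} = \bigcap_{i \in \mathbb{A}} Q_i$, then $q' \in Z\}$,
– $K_a(Z) = \{q \in Q \mid$ for every $q' \in Q$, if $q \sim_a q'$, then $q' \in Z\}$,
– $E_A(Z) = \{q \in Q \mid$ for every $q' \in Q$, if $q \sim^E_A q'$, then $q' \in Z\}$,
– $D_A(Z) = \{q \in Q \mid$ for every $q' \in Q$, if $q \sim^D_A q'$, then $q' \in Z\}$.

LEMMA 1. $\langle \langle\langle A\rangle\rangle X\alpha \rangle = \langle\langle A\rangle\rangle X(\langle\alpha\rangle)$.

Proof. $\langle\langle A\rangle\rangle X(\langle\alpha\rangle)$
$= \{q \in Q \mid$ for every $a \in A$ there exists a set $Q_a \in \delta(q, a)$ such that for every state $q' \in Q$, every agent $b \in \mathbb{A} \setminus A$, and every set $Q_b \in \delta(q, b)$, if $\{q'\} = \bigcap_{i \in \mathbb{A}} Q_i$, then $q' \in \langle\alpha\rangle\}$
$= \{q \in Q \mid$ for every $a \in A$ there exists a function $f_a : Q^+ \to \mathcal{P}(Q)$ such that $f_a(q) = Q_a \in \delta(q, a)$ and for every agent $b \in \mathbb{A} \setminus A$, every set $Q_b \in \delta(q, b)$, and every state $q' \in Q$, if $\{q'\} = \bigcap_{i \in \mathbb{A}} Q_i$, then $q' \models \alpha\}$
$= \{q \in Q \mid$ there exists a set $F_A$ of strategies, one for each agent in $A$, such that for every $\lambda \in \mathit{out}(q, F_A)$ and for every state $q' \in Q$, if $q' = \lambda[1]$, then $q' \models \alpha\}$
$= \{q \in Q \mid$ there exists a set $F_A$ of strategies, one for each agent in $A$, such that for every $\lambda \in \mathit{out}(q, F_A)$, $\lambda[1] \models \alpha\}$
$= \{q \in Q \mid q \models \langle\langle A\rangle\rangle X\alpha\} = \langle \langle\langle A\rangle\rangle X\alpha \rangle$.

Similarly, we can easily show that $\langle K_a\alpha\rangle = K_a(\langle\alpha\rangle)$, $\langle E_A\alpha\rangle = E_A(\langle\alpha\rangle)$, and $\langle D_A\alpha\rangle = D_A(\langle\alpha\rangle)$. Then, each of the following operators may be characterized as the least or the greatest fixed point of an appropriate monotonic ($\bigcap$-continuous or $\bigcup$-continuous) predicate transformer:

– $\langle \langle\langle A\rangle\rangle G\alpha \rangle = \nu Z.\langle\alpha\rangle \cap \langle\langle A\rangle\rangle X(Z)$,
– $\langle \langle\langle A\rangle\rangle \alpha U \beta \rangle = \mu Z.\langle\beta\rangle \cup (\langle\alpha\rangle \cap \langle\langle A\rangle\rangle X(Z))$,
– $\langle C_A\alpha\rangle = \nu Z.E_A(\langle\alpha\rangle \cap Z)$.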
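The iterative characterization of the least and greatest fixed points can be sketched in Python for explicitly represented finite state sets. This is a minimal illustration under stated assumptions: the four-state successor relation `SUCC`, the sets `ALPHA_G`, `ALPHA_U` and `BETA`, and the operator `all_next` (a simple monotone transformer that requires every successor to lie in the argument set, used here as a stand-in for a next-step transformer) are invented for the example and are not taken from the paper.

```python
def lfp(tau):
    """Least fixed point of a monotone tau: iterate from the empty set (false)."""
    current = frozenset()
    while True:
        following = frozenset(tau(current))
        if following == current:
            return current
        current = following

def gfp(tau, universe):
    """Greatest fixed point of a monotone tau: iterate from the full set (true)."""
    current = frozenset(universe)
    while True:
        following = frozenset(tau(current))
        if following == current:
            return current
        current = following

# Hypothetical transition structure: state -> set of successor states.
SUCC = {0: {1}, 1: {2}, 2: {2}, 3: {0, 3}}

def all_next(z):
    """States all of whose successors lie in z (monotone in z)."""
    return frozenset(s for s, succ in SUCC.items() if succ <= z)

ALPHA_G = frozenset({1, 2, 3})   # states satisfying alpha in the "always" example
ALPHA_U = frozenset({0, 1})      # states satisfying alpha in the "until" example
BETA = frozenset({2})            # states satisfying beta

# nu Z. alpha ∩ next(Z)  and  mu Z. beta ∪ (alpha ∩ next(Z))
always_alpha = gfp(lambda z: ALPHA_G & all_next(z), SUCC)
alpha_until_beta = lfp(lambda z: BETA | (ALPHA_U & all_next(z)))
```

Because the state space is finite and both transformers are monotone, each loop stabilizes after at most as many rounds as there are states.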

7. SYMBOLIC UNBOUNDED MODEL CHECKING ON ATEL

The Unbounded Model Checking (UMC) technique is based on the procedure $\mathit{forall}$ eliminating quantifiers in QBF formulas, as well as on the standard fixed-point algorithms, both used for a translation of ATEL formulas into propositional formulas. UMC also uses SAT-solvers to perform satisfiability checking.

Given an alternating epistemic transition system $T = \langle \Pi, \mathbb{A}, Q, \iota, \pi, \delta, \{\sim_a \mid a \in \mathbb{A}\}\rangle$, we define for every $q \in Q$, $a \in \mathbb{A}$, and $Q_a \in \delta(q, a)$, the action $(Q_1, Q_2)$, where $Q_1 = \{q' \in Q \mid Q_2 \in \delta(q', a)\}$ and $Q_2 = Q_a$. Then, we define $\mathit{Act}_a = \{(Q_1, Q_2) \mid Q_1 = \{q' \in Q \mid Q_2 \in \delta(q', a)\}\}$ to be the set of all the actions possible to agent $a$. For an action $c = (Q_1, Q_2)$, let $\mathit{pre}(c) = Q_1$ and $\mathit{post}(c) = Q_2$. Notice that all actions are defined in terms of their pre and post conditions, i.e., $\mathit{pre}(c)$ is the set of all states from which the action $c$ can be executed and $\mathit{post}(c)$ is the set of all states which can be reached after the execution of $c$. This means that the action $c$ can be executed at any state in $\mathit{pre}(c)$ and may lead to any state in $\mathit{post}(c)$.

Next, we assume $Q \subseteq \{0, 1\}^m$, where $m = \lceil \log_2(|Q|) \rceil$, i.e., every state is represented by a sequence consisting of 0's and 1's. Let $PV$ be a set of fresh propositional variables such that $PV \cap \Pi = \emptyset$, $F_{PV}$ be a set of propositional formulas over $PV$, and $\mathit{lit} : \{0, 1\} \times PV \to F_{PV}$ be a function defined as follows: $\mathit{lit}(0, p) = \neg p$ and $\mathit{lit}(1, p) = p$.

Furthermore, let $w = (w[1], \ldots, w[m])$ be a global state variable, where $w[i] \in PV$ for each $i = 1, \ldots, m$. We use elements of $Q$ as valuations³ of global state variables in formulas of $F_{PV}$. For example, $w[1] \wedge w[2]$ evaluates to true for the valuation $q = (1, \ldots, 1)$, and it evaluates to false for the valuation $q = (0, \ldots, 0)$. Now, the idea consists in using propositional formulas of $F_{PV}$ to encode sets of states of $Q$. For example, the formula $w[1] \wedge \ldots \wedge w[m]$ encodes the state represented by $(1, \ldots, 1)$, whereas the formula $w[1]$ encodes all the states whose first bit is equal to 1. Next, the following propositional formulas are defined:

– $I_q(w) := \bigwedge_{i=1}^{m} \mathit{lit}(q[i], w[i])$; this formula encodes the state $q = (q[1], \ldots, q[m])$, i.e., $q[i] = 1$ is encoded by $w[i]$, and $q[i] = 0$ is encoded by $\neg w[i]$; notice that $q = (q[1], \ldots, q[m])$ is the only valuation of a global state variable $w = (w[1], \ldots, w[m])$ that satisfies $I_q(w)$,
– $\mathit{pre}_c(w)$ and $\mathit{post}_c(w)$ for every $c \in \bigcup_{a \in \mathbb{A}} \mathit{Act}_a$; $\mathit{pre}_c(w)$ is a formula which is true for a valuation $q = (q[1], \ldots, q[m])$ of $w = (w[1], \ldots, w[m])$ iff $q \in \mathit{pre}(c)$, and $\mathit{post}_c(w)$ is a formula which is true for a valuation $q = (q[1], \ldots, q[m])$ of $w = (w[1], \ldots, w[m])$ iff $q \in \mathit{post}(c)$,
– $R_a(w, v)$ for every $a \in \mathbb{A}$; $R_a(w, v)$ is a formula which is true for valuations $q = (q[1], \ldots, q[m])$ of $w = (w[1], \ldots, w[m])$ and $q' = (q'[1], \ldots, q'[m])$ of $v = (v[1], \ldots, v[m])$ iff $q \sim_a q'$.

Next, we translate ATEL formulas into propositional formulas. Specifically, for a given ATEL formula $\phi$ we compute a corresponding propositional formula $[\phi](w)$ which is satisfied by a valuation $q = (q[1], \ldots, q[m])$ of $w = (w[1], \ldots, w[m])$ iff $q \in \langle\phi\rangle$. In so doing, we obtain a formula $[\phi](w)$ such that $\phi$ is valid in the system $T$ iff the conjunction $[\phi](w) \wedge I_\iota(w)$ is satisfiable. Notice that $[\phi](w) \wedge I_\iota(w)$ is satisfiable only if $[\phi](w)$ holds for the valuation implied by the initial state $\iota$.
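The encoding of states and sets of states can be illustrated with a short Python sketch. It is a hedged illustration, not the authors' implementation: formulas are modelled as Python functions from a bit tuple to a truth value, the helper names (`bits_needed`, `lit`, `state_formula`, `encode_set`, `models`) are assumptions of this example, and positions are 0-based, so the first global state bit corresponds to index 0.

```python
from itertools import product
from math import ceil, log2

def bits_needed(num_states):
    """Number of bits m = ceil(log2 |Q|), with at least one bit."""
    return max(1, ceil(log2(num_states)))

def lit(bit, i):
    """lit(1, w[i]) is the variable w[i]; lit(0, w[i]) is its negation."""
    if bit == 1:
        return lambda w: w[i] == 1
    return lambda w: w[i] == 0

def state_formula(q):
    """I_q(w): the conjunction of lit(q[i], w[i]) over all positions i."""
    literals = [lit(bit, i) for i, bit in enumerate(q)]
    return lambda w: all(l(w) for l in literals)

def encode_set(states):
    """A set of states encoded as the disjunction of the formulas I_q."""
    formulas = [state_formula(q) for q in states]
    return lambda w: any(f(w) for f in formulas)

def models(formula, m):
    """All valuations of an m-bit global state variable satisfying formula."""
    return {w for w in product((0, 1), repeat=m) if formula(w)}

m = bits_needed(8)                               # eight states need three bits
only_q = models(state_formula((1, 0, 1)), m)     # exactly one satisfying valuation
first_bit_set = models(lit(1, 0), m)             # all states whose first bit is 1
```

A symbolic implementation would keep these formulas in CNF for a SAT solver; the enumeration in `models` is used here only to make the meaning of the encoding visible.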
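Operationally, we work outwards from the most nested subformulas, i.e., to compute $[O\alpha](w)$, where $O$ is a modality, we work under the assumption of already having computed $[\alpha](w)$.

DEFINITION 5 (Translations). Given an ATEL formula $\phi$, the propositional translation $[\phi](w)$ is inductively defined as follows:

– $[p](w) := \bigvee_{q \in \langle p\rangle} I_q(w)$, for $p \in \Pi$,
– $[\neg\alpha](w) := \neg[\alpha](w)$,
– $[\alpha \vee \beta](w) := [\alpha](w) \vee [\beta](w)$,
– let $A = \{a_1, \ldots, a_t\} \subseteq \mathbb{A}$ and $B = \{b_1, \ldots, b_s\} = \mathbb{A} \setminus A$,
$$[\langle\langle A\rangle\rangle X\alpha](w) := \bigvee_{c_{a_1} \in \mathit{Act}_{a_1}, \ldots, c_{a_t} \in \mathit{Act}_{a_t}} \Big( \bigwedge_{i=1}^{t} \mathit{pre}_{c_{a_i}}(w) \wedge \mathit{forall}\big(v, \bigwedge_{c_{b_1} \in \mathit{Act}_{b_1}, \ldots, c_{b_s} \in \mathit{Act}_{b_s}} \big( \bigwedge_{j=1}^{s} \mathit{pre}_{c_{b_j}}(w) \wedge \bigwedge_{j=1}^{s} \mathit{post}_{c_{b_j}}(v) \wedge \bigwedge_{i=1}^{t} \mathit{post}_{c_{a_i}}(v) \Rightarrow [\alpha](v) \big)\big) \Big),$$
– $[K_a\alpha](w) := \mathit{forall}(v, (R_a(w, v) \Rightarrow [\alpha](v)))$,
– $[E_A\alpha](w) := \mathit{forall}(v, ((\bigvee_{a \in A} R_a(w, v)) \Rightarrow [\alpha](v)))$,
– $[D_A\alpha](w) := \mathit{forall}(v, ((\bigwedge_{a \in A} R_a(w, v)) \Rightarrow [\alpha](v)))$,
– $[\langle\langle A\rangle\rangle G\alpha](w) := \mathit{gfp}_A([\alpha](w))$,
– $[\langle\langle A\rangle\rangle \alpha U \beta](w) := \mathit{lfp}_A([\alpha](w), [\beta](w))$,
– $[C_A\alpha](w) := \mathit{gfp}^C_A([\alpha](w))$.

The algorithms $\mathit{gfp}$ and $\mathit{lfp}$ are based on the standard procedures computing fixed points and are given below. In order to simplify the subsequent notation, we use $[\langle\langle A\rangle\rangle X Z(w)](w)$ to denote the formula
$$\bigvee_{c_{a_1} \in \mathit{Act}_{a_1}, \ldots, c_{a_t} \in \mathit{Act}_{a_t}} \Big( \bigwedge_{i=1}^{t} \mathit{pre}_{c_{a_i}}(w) \wedge \mathit{forall}\big(v, \bigwedge_{c_{b_1} \in \mathit{Act}_{b_1}, \ldots, c_{b_s} \in \mathit{Act}_{b_s}} \big( \bigwedge_{j=1}^{s} \mathit{pre}_{c_{b_j}}(w) \wedge \bigwedge_{j=1}^{s} \mathit{post}_{c_{b_j}}(v) \wedge \bigwedge_{i=1}^{t} \mathit{post}_{c_{a_i}}(v) \Rightarrow Z(v) \big)\big) \Big),$$
where $Z(v)$ is a propositional formula over the global variable $v$ encoding a subset of $Q$.

Procedure $\mathit{gfp}_A([\alpha](w))$, where $\alpha$ is an ATEL formula
    let $Q(w) = [\mathit{true}](w)$, $Z(w) = [\alpha](w)$
    while $\neg(Q(w) \Rightarrow Z(w))$ is satisfiable
        let $Q(w) = Z(w)$,
        let $Z(w) = [\langle\langle A\rangle\rangle X Z(w)](w) \wedge [\alpha](w)$
    return $Q(w)$

The procedure $\mathit{gfp}^C_A$ is obtained by replacing $Z(w) = [\alpha](w)$ with $Z(w) = \mathit{forall}(v, ((\bigvee_{a \in A} R_a(w, v)) \Rightarrow [\alpha](v)))$, and $Z(w) = [\langle\langle A\rangle\rangle X Z(w)](w) \wedge [\alpha](w)$ with $Z(w) = \mathit{forall}(v, ((\bigvee_{a \in A} R_a(w, v)) \Rightarrow (Z(v) \wedge [\alpha](v))))$.

Procedure $\mathit{lfp}_A([\alpha](w), [\beta](w))$, where $\alpha, \beta$ are ATEL formulas
    let $Q(w) = [\mathit{false}](w)$, $Z(w) = [\beta](w)$
    while $\neg(Z(w) \Rightarrow Q(w))$ is satisfiable
        let $Q(w) = Q(w) \vee Z(w)$,
        let $Z(w) = [\langle\langle A\rangle\rangle X Q(w)](w) \wedge [\alpha](w)$,
    return $Q(w)$

The least and the greatest fixed points are computed by iterative procedures, which follows directly from the fact that $\nu Z.\tau(Z) = \bigcap_{i \geq 0} \tau^i(\mathit{true})$ and $\mu Z.\tau(Z) = \bigcup_{i \geq 0} \tau^i(\mathit{false})$. Let $Q_i(w)$ denote the $i$-th approximation of the fixed point. Then, $Q_i(w) =$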
$[\langle\langle A\rangle\rangle X Q_{i-1}(w)](w) \wedge [\alpha](w)$ is an invariant of the while-loop of the $\mathit{gfp}_A$ procedure, and $Q_i(w) = Q_{i-1}(w) \vee ([\langle\langle A\rangle\rangle X Q_{i-1}(w)](w) \wedge [\alpha](w))$, which is equivalent to $[\beta](w) \vee ([\langle\langle A\rangle\rangle X Q_{i-1}(w)](w) \wedge [\alpha](w))$, is an invariant of the while-loop of the $\mathit{lfp}_A$ procedure. The fixed point has been reached if $Q_{i+1}(w)$ is equivalent to $Q_i(w)$. However, the series of approximations is either monotonically increasing or monotonically decreasing. It is therefore sufficient to check implication instead of equivalence. In the increasing case, we check if $Q_{i+1}(w) \Rightarrow Q_i(w)$ is a tautology. In the decreasing case, we check if $Q_i(w) \Rightarrow Q_{i+1}(w)$ is a tautology. These formulas are not tautologies until we reach the fixed point, so the negation of each of the formulas is satisfiable. Let $\tau_1(Z) = \langle\alpha\rangle \cap \langle\langle A\rangle\rangle X(Z)$ and $\tau_2(Z) = \langle\beta\rangle \cup (\langle\alpha\rangle \cap \langle\langle A\rangle\rangle X(Z))$. Notice that a valuation $q = (q[1], \ldots, q[m])$ of $w = (w[1], \ldots, w[m])$ satisfies $Q_i(w)$ iff $q \in \tau_1^i(\mathit{true})$ for the $\mathit{gfp}_A$ procedure, and a valuation $q$ of $w$ satisfies $Q_i(w)$ iff $q \in \tau_2^i(\mathit{false})$ for the $\mathit{lfp}_A$ procedure.

THEOREM 2 (UMC for ATEL). Given an ATEL formula $\varphi$ and an alternating epistemic transition system $T$, the following condition holds:
$$T \models \varphi \text{ iff } [\varphi](w) \wedge I_\iota(w) \text{ is satisfiable.}$$

Proof. First, let us observe that a valuation $s = (s[1], \ldots, s[m])$ of $w = (w[1], \ldots, w[m])$ satisfies the formula $I_q(w)$ iff $s = q$. So, $\iota = (\iota[1], \ldots, \iota[m])$ is the only valuation that satisfies the formula $I_\iota(w)$. Next, we need to check that the translations given in Definition 5 are well defined. To this aim, we have to prove that for every ATEL formula $\varphi$ and every state $q$ of a system $T$ we have: $T, q \models \varphi$ iff the valuation $q = (q[1], \ldots, q[m])$ of $w = (w[1], \ldots, w[m])$ satisfies $[\varphi](w)$. The proof follows by induction on the complexity of $\varphi$ and directly stems from the construction of $[\varphi](w)$. In what follows, $q$ is always read as a valuation of $w$ and $q'$ as a valuation of $v = (v[1], \ldots, v[m])$.

The theorem follows directly for the propositional variables. Let $\varphi = p$ for $p \in \Pi$. A valuation $s = (s[1], \ldots, s[m])$ satisfies $[p](w)$ iff $s$ satisfies $\bigvee_{q \in \langle p\rangle} I_q(w)$ iff there exists $q \in \langle p\rangle$ such that $s$ satisfies $I_q(w)$ iff there exists $q \in \langle p\rangle$ such that $s = q$ iff $s \in \langle p\rangle$ iff $T, s \models p$.

Assume that the hypothesis holds for all the proper sub-formulas of $\varphi$. If $\varphi$ is equal to either $\neg\alpha$, $\alpha \wedge \beta$, or $\alpha \vee \beta$, then it is easy to check that the theorem holds. Consider $\varphi$ to be of one of the following forms.

$\varphi = \langle\langle A\rangle\rangle X\alpha$. Let $A = \{a_1, \ldots, a_t\} \subseteq \mathbb{A}$ and $B = \{b_1, \ldots, b_s\} = \mathbb{A} \setminus \{a_1, \ldots, a_t\}$. Then, for every $q \in Q$: $T, q \models \langle\langle A\rangle\rangle X\alpha$
iff (by Lemma 1) $q \in \langle\langle A\rangle\rangle X(\langle\alpha\rangle)$
iff for every agent $a \in A$ there exists a set $Q_a \in \delta(q, a)$ such that for every state $q' \in Q$, every agent $b \in B$ and every set $Q_b \in \delta(q, b)$, if $\{q'\} = \bigcap_{i \in \mathbb{A}} Q_i$, then $q' \in \langle\alpha\rangle$
iff for every $a \in A$ there exists an action $c_a \in \mathit{Act}_a$ such that $q \in \mathit{pre}(c_a)$ and for every state $q' \in Q$, every agent $b \in B$, and every action $c_b \in \mathit{Act}_b$ such that $q \in \mathit{pre}(c_b)$, if $\{q'\} = \bigcap_{i \in \mathbb{A}} \mathit{post}(c_i)$, then $q' \in \langle\alpha\rangle$
iff (by definition of the formulas $\mathit{pre}_c(w)$, $\mathit{post}_c(w)$ and the inductive assumption) for every $a \in A$ there exists an action $c_a \in \mathit{Act}_a$ such that the valuation $q$ of $w$ satisfies the formula $\mathit{pre}_{c_a}(w)$ and for every valuation $q'$ of $v$, every agent $b \in B$, and every action $c_b \in \mathit{Act}_b$, if $q$ satisfies the formula $\mathit{pre}_{c_b}(w)$ and $q'$ satisfies the conjunction $\bigwedge_{i \in \mathbb{A}} \mathit{post}_{c_i}(v)$, then $q'$ satisfies the formula $[\alpha](v)$
iff there exist actions $c_{a_1} \in \mathit{Act}_{a_1}, \ldots, c_{a_t} \in \mathit{Act}_{a_t}$ such that the valuation $q$ of $w$ satisfies the formula $\bigwedge_{i=1}^{t} \mathit{pre}_{c_{a_i}}(w)$ and for every valuation $q'$ of $v$ and all actions $c_{b_1} \in \mathit{Act}_{b_1}, \ldots, c_{b_s} \in \mathit{Act}_{b_s}$, if $q$ satisfies the formula $\bigwedge_{j=1}^{s} \mathit{pre}_{c_{b_j}}(w)$ and $q'$ satisfies the conjunction $\bigwedge_{i=1}^{t} \mathit{post}_{c_{a_i}}(v) \wedge \bigwedge_{j=1}^{s} \mathit{post}_{c_{b_j}}(v)$, then $q'$ satisfies the formula $[\alpha](v)$
iff the valuation $q$ of $w$ satisfies the QBF formula
$$\bigvee_{c_{a_1} \in \mathit{Act}_{a_1}, \ldots, c_{a_t} \in \mathit{Act}_{a_t}} \Big( \bigwedge_{i=1}^{t} \mathit{pre}_{c_{a_i}}(w) \wedge \forall v.\big( \bigwedge_{c_{b_1} \in \mathit{Act}_{b_1}, \ldots, c_{b_s} \in \mathit{Act}_{b_s}} \big( \bigwedge_{j=1}^{s} \mathit{pre}_{c_{b_j}}(w) \wedge \bigwedge_{j=1}^{s} \mathit{post}_{c_{b_j}}(v) \wedge \bigwedge_{i=1}^{t} \mathit{post}_{c_{a_i}}(v) \Rightarrow [\alpha](v) \big)\big) \Big)$$
iff (by Theorem 1) the valuation $q$ of $w$ satisfies the propositional formula
$$\bigvee_{c_{a_1} \in \mathit{Act}_{a_1}, \ldots, c_{a_t} \in \mathit{Act}_{a_t}} \Big( \bigwedge_{i=1}^{t} \mathit{pre}_{c_{a_i}}(w) \wedge \mathit{forall}\big(v, \bigwedge_{c_{b_1} \in \mathit{Act}_{b_1}, \ldots, c_{b_s} \in \mathit{Act}_{b_s}} \big( \bigwedge_{j=1}^{s} \mathit{pre}_{c_{b_j}}(w) \wedge \bigwedge_{j=1}^{s} \mathit{post}_{c_{b_j}}(v) \wedge \bigwedge_{i=1}^{t} \mathit{post}_{c_{a_i}}(v) \Rightarrow [\alpha](v) \big)\big) \Big)$$
iff (by Definition 5) the valuation $q$ of $w$ satisfies $[\langle\langle A\rangle\rangle X\alpha](w)$.

$\varphi = K_a\alpha$. Then, for every $q \in Q$: $T, q \models K_a\alpha$ iff $q \in K_a(\langle\alpha\rangle)$ iff for every $q' \in Q$, if $q \sim_a q'$, then $q' \in \langle\alpha\rangle$ iff for the valuation $q$ of $w$ and every valuation $q'$ of $v$ the formula $R_a(w, v) \Rightarrow [\alpha](v)$ is true iff for the valuation $q$ of $w$ the QBF formula $\forall v.(R_a(w, v) \Rightarrow [\alpha](v))$ is true iff (by Theorem 1) for the valuation $q$ of $w$ the propositional formula $\mathit{forall}(v, (R_a(w, v) \Rightarrow [\alpha](v)))$ is true iff for the valuation $q$ of $w$ the formula $[K_a\alpha](w)$ is true.

$\varphi = D_A\alpha \mid E_A\alpha$. The proof is similar to the former case.

$\varphi = \langle\langle A\rangle\rangle G\alpha$. The proof is based on the fixed-point characterizations of the formulas and correctness of the procedures computing fixed points. For every $q \in Q$: $T, q \models \langle\langle A\rangle\rangle G\alpha$ iff $q \in \nu Z.\langle\alpha\rangle \cap \langle\langle A\rangle\rangle X(Z)$ iff $q \in \tau_1^i(\mathit{true})$ for the least $i$ such that $\tau_1^i(\mathit{true}) = \tau_1^{i+1}(\mathit{true})$, with $\tau_1(Z) = \langle\alpha\rangle \cap \langle\langle A\rangle\rangle X(Z)$, iff the valuation $q$ of $w$ satisfies the formula $Q_i(w) = [\langle\langle A\rangle\rangle X Q_{i-1}(w)](w) \wedge [\alpha](w)$ for the least $i$ such that $Q_i(w)$ is equivalent to $Q_{i+1}(w)$ iff the valuation $q$ of $w$ satisfies the propositional formula $\mathit{gfp}_A([\alpha](w))$ iff the valuation $q$ of $w$ satisfies $[\langle\langle A\rangle\rangle G\alpha](w)$.

$\varphi = \langle\langle A\rangle\rangle \alpha U \beta \mid C_A\alpha$. The proof is analogous to the former case.

Thus, $T \models \varphi$ iff $T, \iota \models \varphi$ iff the valuation $\iota = (\iota[1], \ldots, \iota[m])$ of $w = (w[1], \ldots, w[m])$ satisfies the formula $[\varphi](w)$ iff $[\varphi](w) \wedge I_\iota(w)$ is satisfiable.

8. EXAMPLE OF AN APPLICATION OF UMC

In this section, we show an application of UMC to verification of some properties of the multi-agent system described in Example 1 in Section 4. This system is modelled by the following alternating epistemic transition system $T = \langle \Pi, \mathbb{A}, Q, \iota, \pi, \delta, \{\sim_a \mid a \in \mathbb{A}\}\rangle$ where:

– $\mathbb{A} = \{1, 2, 3\}$,
– $Q = \{q, q_x, q_y, q_z, q_{xy}, q_{xz}, q_{yz}, q_{xyz}\}$; the state $q$ corresponds to $x = y = z = 0$, the state $q_x$ corresponds to $x = 1$ and $y = z = 0$, and the remaining states have similar interpretations, i.e., the variables listed are set to 1,
– $\iota = q$,
– $\Pi = \{x, y, z\}$,
– $\pi(q) = \emptyset$, $\pi(q_x) = \{x\}$, $\pi(q_y) = \{y\}$, $\pi(q_z) = \{z\}$, $\pi(q_{xy}) = \{x, y\}$, $\pi(q_{xz}) = \{x, z\}$, $\pi(q_{yz}) = \{y, z\}$, $\pi(q_{xyz}) = \{x, y, z\}$,
– $\delta(q, 1) = \delta(q_y, 1) = \delta(q_z, 1) = \delta(q_{yz}, 1) = \{\{q, q_y, q_z, q_{yz}\}, \{q_x, q_{xy}, q_{xz}, q_{xyz}\}\}$,
  $\delta(q_x, 1) = \delta(q_{xy}, 1) = \delta(q_{xz}, 1) = \delta(q_{xyz}, 1) = \{\{q_x, q_{xy}, q_{xz}, q_{xyz}\}\}$,
  $\delta(q, 2) = \delta(q_z, 2) = \{\{q, q_x, q_z, q_{xz}\}, \{q, q_z, q_{xy}, q_{xyz}\}\}$,
  $\delta(q_x, 2) = \delta(q_{xz}, 2) = \{\{q, q_x, q_z, q_{xz}\}, \{q_y, q_{xy}, q_{yz}, q_{xyz}\}\}$,
  $\delta(q_y, 2) = \delta(q_{xy}, 2) = \delta(q_{yz}, 2) = \delta(q_{xyz}, 2) = \{\{q_y, q_{xy}, q_{yz}, q_{xyz}\}\}$,
  $\delta(q, 3) = \delta(q_x, 3) = \{\{q, q_x, q_y, q_{xy}\}\}$,
  $\delta(q_y, 3) = \delta(q_{xy}, 3) = \{\{q, q_x, q_y, q_{xy}\}, \{q_z, q_{xz}, q_{yz}, q_{xyz}\}\}$,
  $\delta(q_z, 3) = \delta(q_{xz}, 3) = \delta(q_{yz}, 3) = \delta(q_{xyz}, 3) = \{\{q_z, q_{xz}, q_{yz}, q_{xyz}\}\}$,
– $\sim_a$ for $a \in \mathbb{A}$ is defined such that
  $q' \sim_1 q''$ iff $x \in \pi(q') \Leftrightarrow x \in \pi(q'')$,
  $q' \sim_2 q''$ iff $x \in \pi(q') \Leftrightarrow x \in \pi(q'')$ and $y \in \pi(q') \Leftrightarrow y \in \pi(q'')$,
  $q' \sim_3 q''$ iff $y \in \pi(q') \Leftrightarrow y \in \pi(q'')$ and $z \in \pi(q') \Leftrightarrow z \in \pi(q'')$.

TABLE I
The pre and post conditions of the actions

Action   Pre                                         Post
a1       q, q_y, q_z, q_yz                           q, q_y, q_z, q_yz
a2       q, q_x, q_y, q_z, q_xy, q_xz, q_yz, q_xyz   q_x, q_xy, q_xz, q_xyz
b1       q, q_x, q_z, q_xz                           q, q_x, q_z, q_xz
b2       q, q_z                                      q, q_z, q_xy, q_xyz
b3       q_x, q_y, q_xy, q_xz, q_yz, q_xyz           q_y, q_xy, q_yz, q_xyz
c1       q, q_x, q_y, q_xy                           q, q_x, q_y, q_xy
c2       q_y, q_z, q_xy, q_xz, q_yz, q_xyz           q_z, q_xz, q_yz, q_xyz

The function $\delta$ determines the following actions of the agents: $\mathit{Act}_1 = \{a_1, a_2\}$, $\mathit{Act}_2 = \{b_1, b_2, b_3\}$, $\mathit{Act}_3 = \{c_1, c_2\}$. The pre and post conditions of the actions are given in Table I. The joint actions of the AETS are shown in Figure 1. We now encode the states in binary form in order to use them in model checking. We need only 3 bits to encode the states. In particular, we take the first bit to be equal to 1 for $x = 1$ and 0 for $x = 0$, the second bit to be equal to 1 for $y = 1$ and 0 for $y = 0$, and similarly the third bit describes the value of the variable $z$. Thus, $(0, 0, 0) = q$, $(1, 0, 0) = q_x$, $(0, 1, 0) = q_y$, $(0, 0, 1) = q_z$, $(1, 1, 0) = q_{xy}$, $(1, 0, 1) = q_{xz}$, $(0, 1, 1) = q_{yz}$, $(1, 1, 1) = q_{xyz}$.
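The way the actions and their pre and post conditions arise from the choice function can be checked mechanically. The Python sketch below is an illustration under stated assumptions: states are written as bit vectors (x, y, z) following the binary encoding of the example, and the helper names `where`, `delta` and `actions` are chosen for this example rather than taken from the paper. The function `delta` transcribes the choice sets of the example system, and `actions` applies the definition of an action as a pair (pre, post).

```python
from itertools import product

STATES = list(product((0, 1), repeat=3))  # a state is the bit vector (x, y, z)

def where(pred):
    """Set of states satisfying a predicate on (x, y, z)."""
    return frozenset(s for s in STATES if pred(*s))

X0, X1 = where(lambda x, y, z: x == 0), where(lambda x, y, z: x == 1)
Y0, Y1 = where(lambda x, y, z: y == 0), where(lambda x, y, z: y == 1)
Z0, Z1 = where(lambda x, y, z: z == 0), where(lambda x, y, z: z == 1)
Y_EQ_X = where(lambda x, y, z: y == x)    # the set {q, q_z, q_xy, q_xyz}

def delta(state, agent):
    """Choice sets of `agent` at `state`, transcribed from the example system."""
    x, y, z = state
    if agent == 1:
        return [X0, X1] if x == 0 else [X1]
    if agent == 2:
        if y == 1:
            return [Y1]
        return [Y0, Y_EQ_X] if x == 0 else [Y0, Y1]
    if z == 1:
        return [Z1]
    return [Z0] if y == 0 else [Z0, Z1]

def actions(agent):
    """Act_a as a set of (pre, post) pairs: pre collects every state offering post."""
    acts = set()
    for s in STATES:
        for post in delta(s, agent):
            pre = frozenset(t for t in STATES if post in delta(t, agent))
            acts.add((pre, post))
    return acts

ACT = {agent: actions(agent) for agent in (1, 2, 3)}
```

Under this transcription agent 1 obtains two actions, agent 2 three and agent 3 two, and the computed pairs agree with the rows of Table I.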

Figure 1. The joint actions of the AETS of the example.

Let $w = (w[1], w[2], w[3])$, $v = (v[1], v[2], v[3])$ be two global state variables. The following propositional formulas over $w$ and $v$ are defined:

– $I_\iota(w) := \neg w[1] \wedge \neg w[2] \wedge \neg w[3]$,
– $\mathit{pre}_{a_1}(w) := \neg w[1]$, $\mathit{post}_{a_1}(w) := \neg w[1]$,
– $\mathit{pre}_{a_2}(w) := \mathit{true}$, $\mathit{post}_{a_2}(w) := w[1]$,
– $\mathit{pre}_{b_1}(w) := \neg w[2]$, $\mathit{post}_{b_1}(w) := \neg w[2]$,
– $\mathit{pre}_{b_2}(w) := \neg w[1] \wedge \neg w[2]$, $\mathit{post}_{b_2}(w) := (\neg w[1] \wedge \neg w[2]) \vee (w[1] \wedge w[2])$,
– $\mathit{pre}_{b_3}(w) := w[1] \vee w[2]$, $\mathit{post}_{b_3}(w) := w[2]$,
– $\mathit{pre}_{c_1}(w) := \neg w[3]$, $\mathit{post}_{c_1}(w) := \neg w[3]$,
– $\mathit{pre}_{c_2}(w) := w[2] \vee w[3]$, $\mathit{post}_{c_2}(w) := w[3]$,
– $R_1(w, v) := w[1] \Leftrightarrow v[1]$,
– $R_2(w, v) := (w[1] \Leftrightarrow v[1]) \wedge (w[2] \Leftrightarrow v[2])$,
– $R_3(w, v) := (w[2] \Leftrightarrow v[2]) \wedge (w[3] \Leftrightarrow v[3])$.

Notice that the precondition of the action $a_1$ is the set of all states represented by sequences in which the first bit is 0. Therefore, the propositional formula $\mathit{pre}_{a_1}(w)$ is defined as $\neg w[1]$. The construction of the other formulas is similar.

Consider the following ATEL formula $\alpha = \langle\langle\{1, 2\}\rangle\rangle X\neg z$, which expresses that the agents 1 and 2 can cooperate to ensure that in the next state the value of $z$ will be 0. We shall prove that this formula is valid in the structure. First we translate the formula $\neg z$. Observe that $\neg z$ is satisfied at every state represented by a sequence in which the third bit is 0. Therefore, $[\neg z](w) := \neg w[3]$. Then, we translate the formula $\alpha$:
$$[\langle\langle\{1, 2\}\rangle\rangle X\neg z](w) := \bigvee_{a \in \mathit{Act}_1, b \in \mathit{Act}_2} \Big( \mathit{pre}_a(w) \wedge \mathit{pre}_b(w) \wedge \mathit{forall}\big(v, \bigwedge_{c \in \mathit{Act}_3} (\mathit{pre}_c(w) \wedge \mathit{post}_c(v) \wedge \mathit{post}_a(v) \wedge \mathit{post}_b(v) \Rightarrow \neg v[3])\big) \Big) = \neg w[2] \wedge \neg w[3].$$
Therefore, $I_\iota(w) \wedge [\alpha](w) = (\neg w[1] \wedge \neg w[2] \wedge \neg w[3]) \wedge (\neg w[2] \wedge \neg w[3]) = \neg w[1] \wedge \neg w[2] \wedge \neg w[3]$.

Now, consider the formulas $\beta = K_2(\langle\langle\{1, 2\}\rangle\rangle X\neg z)$ and $\gamma = K_3(\langle\langle\{1, 2\}\rangle\rangle X\neg z)$. The formula $\beta$ expresses that the agent 2 knows that the agents 1 and 2 can cooperate to ensure that at the next state the value of $z$ will be 0. The formula $\gamma$ says that the agent 3 has the same knowledge about the variable $z$. Then,
$$[\beta](w) = \mathit{forall}(v, R_2(w, v) \Rightarrow [\langle\langle\{1, 2\}\rangle\rangle X\neg z](v)) = \mathit{forall}(v, ((w[1] \Leftrightarrow v[1]) \wedge (w[2] \Leftrightarrow v[2]) \Rightarrow \neg v[2] \wedge \neg v[3])) = \mathit{false},$$
while
$$[\gamma](w) = \mathit{forall}(v, R_3(w, v) \Rightarrow [\langle\langle\{1, 2\}\rangle\rangle X\neg z](v)) = \mathit{forall}(v, ((w[2] \Leftrightarrow v[2]) \wedge (w[3] \Leftrightarrow v[3]) \Rightarrow \neg v[2] \wedge \neg v[3])) = \neg w[2] \wedge \neg w[3].$$
Therefore, $I_\iota(w) \wedge [\beta](w) = \mathit{false}$ and $I_\iota(w) \wedge [\gamma](w) = \neg w[1] \wedge \neg w[2] \wedge \neg w[3]$. So, the formulas $\alpha$ and $\gamma$ are valid in the structure, while $\beta$ is not. Thus, we confirm that the agent 2 does not know that it can cooperate with the agent 1 to block a move of the agent 3, while the agent 3 knows it. Therefore, it is not true that every member of the group $\{2, 3\}$ knows that the agents 1 and 2 can enforce that at the next state the value of the variable $z$ is false: $\neg E_{\{2,3\}}\langle\langle\{1, 2\}\rangle\rangle X\neg z$. Moreover, at every state $\neg\beta$ holds, since there is no state at which the formula $\beta$ is true. Consequently, the formula $K_3\neg K_2(\langle\langle\{1, 2\}\rangle\rangle X\neg z)$ is valid in the structure, which means that the agent 3 knows that the agent 2 does not know that a cooperation of the group of the agents 1 and 2 can ensure that the value of the variable $z$ is 0 at the next state.
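The results of this example can be cross-checked with an explicit-state computation. The Python sketch below is only a sanity check under stated assumptions and is not the symbolic UMC procedure: it enumerates the eight states as bit vectors (x, y, z), transcribes the choice function of the example into a hypothetical helper `delta`, and evaluates the next-step and knowledge operators directly on sets of states. The names `coalition_next`, `knows` and `VISIBLE` are introduced for this illustration.

```python
from itertools import product

AGENTS = (1, 2, 3)
STATES = list(product((0, 1), repeat=3))  # a state is the bit vector (x, y, z)
INITIAL = (0, 0, 0)

def where(pred):
    """Set of states satisfying a predicate on (x, y, z)."""
    return frozenset(s for s in STATES if pred(*s))

X0, X1 = where(lambda x, y, z: x == 0), where(lambda x, y, z: x == 1)
Y0, Y1 = where(lambda x, y, z: y == 0), where(lambda x, y, z: y == 1)
Z0, Z1 = where(lambda x, y, z: z == 0), where(lambda x, y, z: z == 1)
Y_EQ_X = where(lambda x, y, z: y == x)

def delta(state, agent):
    """Choice sets of `agent` at `state`, transcribed from the example system."""
    x, y, z = state
    if agent == 1:
        return [X0, X1] if x == 0 else [X1]
    if agent == 2:
        if y == 1:
            return [Y1]
        return [Y0, Y_EQ_X] if x == 0 else [Y0, Y1]
    if z == 1:
        return [Z1]
    return [Z0] if y == 0 else [Z0, Z1]

def coalition_next(coalition, target):
    """States where the coalition has choices forcing the next state into target."""
    others = [a for a in AGENTS if a not in coalition]

    def forces(state, own):
        return all(
            frozenset.intersection(*own, *opp) <= target
            for opp in product(*(delta(state, b) for b in others))
        )

    return frozenset(
        s for s in STATES
        if any(forces(s, own) for own in product(*(delta(s, a) for a in coalition)))
    )

# Indices of the variables each agent observes: 1 sees x, 2 sees x and y, 3 sees y and z.
VISIBLE = {1: (0,), 2: (0, 1), 3: (1, 2)}

def knows(agent, target):
    """States where every state indistinguishable for `agent` lies in target."""
    vis = VISIBLE[agent]
    return frozenset(
        s for s in STATES
        if all(t in target for t in STATES if all(s[i] == t[i] for i in vis))
    )

alpha = coalition_next((1, 2), Z0)        # <<{1,2}>> X not z
beta = knows(2, alpha)                    # K_2 alpha
gamma = knows(3, alpha)                   # K_3 alpha
k3_not_k2 = knows(3, frozenset(STATES) - beta)
```

With this transcription, `alpha` and `gamma` both come out as the states with y = 0 and z = 0, `beta` is empty, and the initial state belongs to `alpha`, `gamma` and `k3_not_k2`, in agreement with the formulas derived in the text.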

9. CONCLUSIONS

ATEL is a formalism that has been proposed to specify and verify multi-agent systems (MAS), especially systems with imperfect or incomplete information. We have shown that a SAT-based method of UMC can also be applied to this logic. This required encoding the ATEL operators in a propositional way. Applying UMC directly to ATEL allows for verifying important properties of MAS (like agent communication protocols and planning of communication) without using a translation from ATEL to ATL (van Otterloo et al. 2003; Goranko and Jamroga 2004). Although such a translation is at most quadratic in the size of the original model (Goranko and Jamroga 2004), we believe that our direct encoding of ATEL formulas should prove to be more efficient in practice. As we have mentioned before, there are several possibilities of extending ATL with an epistemic component, where various conditions on interactions between epistemic relations and actions available to the agents are imposed (Jamroga and van der Hoek 2004). Further research in this line will pursue an application of the UMC technique to these logics as well, together with an implementation of the method, an evaluation of the experimental results that can be obtained, and a comparison of these results to results obtained by standard techniques using BDDs. An extension of the technique of

UMC from a purely temporal setting of CTL to a temporal-epistemic one of CTLK was shown in Kacprzak et al. (2003). Since experimental results for CTLK (Kacprzak et al. 2004a) are very promising, we expect to achieve a similar efficiency with the current approach.

REFERENCES

Alur, R., L. de Alfaro, T. Henzinger, S. Krishnan, F. Mang, S. Qadeer, S. Rajamani, and S. Tasiran: 2000, MOCHA user manual. Technical report, University of California at Berkeley. http://www-cad.eecs.berkeley.edu/~mocha/doc/c-doc/cmanual.ps.gz.
Alur, R., T. A. Henzinger, and O. Kupferman: 1997, Alternating-time temporal logic, In Proceedings of the 38th IEEE Symposium on Foundations of Computer Science (FOCS'97), IEEE Computer Society, pp. 100–109.
Alur, R., T. A. Henzinger, and O. Kupferman: 1998, Alternating-time temporal logic, LNCS 1536, 23–60.
Alur, R., T. A. Henzinger, and O. Kupferman: 2002, Alternating-time temporal logic, Journal of the ACM 49(5), 672–713.
Biere, A., A. Cimatti, E. Clarke, and Y. Zhu: 1999, Symbolic model checking without BDDs, In Proceedings of the Fifth International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS'99), Vol. 1579 of LNCS, Springer-Verlag, pp. 193–207.
Clarke, E., A. Biere, R. Raimi, and Y. Zhu: 2001, Bounded model checking using satisfiability solving, Formal Methods in System Design 19(1), 7–34.
Clarke, E. M., O. Grumberg, and D. Peled: 1999, Model Checking, MIT Press.
Davis, M., G. Logemann, and D. Loveland: 1962, A machine program for theorem proving, Communications of the ACM 5(7), 394–397.
Fagin, R., J. Y. Halpern, Y. Moses, and M. Y. Vardi: 1995, Reasoning about Knowledge, MIT Press, Cambridge.
Goranko, V. and W. Jamroga: 2004, Comparing semantics for logics of multi-agent systems, Synthese 139(2), 241–280. Available from http://journals.kluweronline.com/article.asp?PIPS=5264999.
Jamroga, W. and W. van der Hoek: 2004, Agents that know how to play, Fundamenta Informaticae. To appear.
Jamroga, W., W. van der Hoek, and M. Wooldridge: 2004, Obligations vs. abilities of agents via deontic ATL. Accepted for the Seventh International Workshop on Deontic Logic in Computer Science (DEON'04). To appear in LNCS.
Kacprzak, M., A. Lomuscio, T. Lasica, W. Penczek, and M. Szreter: 2004a, Verifying multiagent systems via unbounded model checking, In Proceedings of the Third NASA Workshop on Formal Approaches to Agent-Based Systems (FAABS III), LNCS, Springer-Verlag. To appear.
Kacprzak, M., A. Lomuscio, and W. Penczek: 2004b, Verification of multi-agent systems via unbounded model checking, In N. R. Jennings, C. Sierra, L. Sonenberg, and M. Tambe (eds.), Proceedings of the Third International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'04), Vol. II, ACM, pp. 638–645.
Kacprzak, M., A. Lomuscio, and W. Penczek: 2003, Unbounded model checking for knowledge and time. Technical Report 966, ICS PAS, Ordona 21, 01-237 Warsaw. Also available at http://www.ipipan.waw.pl/~penczek/WPenczek/2003.html.
Kacprzak, M. and W. Penczek: 2004, Unbounded model checking for alternating-time temporal logic, In N. R. Jennings, C. Sierra, L. Sonenberg, and M. Tambe (eds.), Proceedings of the Third International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'04), Vol. II, ACM, pp. 646–653.
Lomuscio, A., T. Lasica, and W. Penczek: 2003, Bounded model checking for interpreted systems: Preliminary experimental results, In Proceedings of the Second NASA Workshop on Formal Approaches to Agent-Based Systems (FAABS'02), Vol. 2699 of LNAI, Springer-Verlag, pp. 115–125.
McMillan, K. L.: 1993, Symbolic Model Checking, Kluwer Academic Publishers.
McMillan, K. L.: 2002, Applying SAT methods in unbounded symbolic model checking, In Proceedings of the 14th International Conference on Computer Aided Verification (CAV'02), Vol. 2404 of LNCS, Springer-Verlag, pp. 250–264.
Penczek, W. and A. Lomuscio: 2003a, Verifying epistemic properties of multi-agent systems via bounded model checking, Fundamenta Informaticae 55(2), 167–185.
Penczek, W. and A. Lomuscio: 2003b, Verifying epistemic properties of multi-agent systems via bounded model checking, In T. Sandholm (ed.), Proceedings of the Second International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'03), ACM, pp. 209–216.
Raimondi, F. and A. Lomuscio: 2003, A tool for specification and verification of epistemic and temporal properties of multi-agent system, Electronic Lecture Notes in Theoretical Computer Science. To appear.
Robinson, S.: 2004, How real people think in strategic games, SIAM News 37(1).
Tarski, A.: 1955, A lattice-theoretical fixpoint theorem and its applications, Pacific Journal of Mathematics 5, 285–309.
van der Hoek, W. and M. Wooldridge: 2002a, Model checking knowledge and time, In Proceedings of the Ninth International SPIN Workshop (SPIN'02), Vol. 2318 of LNCS, Springer-Verlag, pp. 95–111.
van der Hoek, W. and M. Wooldridge: 2002b, Tractable multiagent planning for epistemic goals, In Proceedings of the First International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'02), Vol. III, ACM, pp. 1167–1174.
van der Hoek, W. and M. Wooldridge: 2003, Cooperation, knowledge, and time: Alternating-time temporal epistemic logic and its applications, Studia Logica 75(1), 125–157.
van der Meyden, R. and H. Shilov: 1999, Model checking knowledge and time in systems with perfect recall, In Proceedings of the 19th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'99), Vol. 1738 of LNCS, Springer-Verlag, pp. 432–445.
van Otterloo, S., W. van der Hoek, and M. Wooldridge: 2003, Knowledge as strategic ability, ENTCS 85(2), 1–23.
Wooldridge, M.: 2002, An introduction to multi-agent systems, John Wiley, England.
Wooldridge, M., M. Fisher, M. P. Huget, and S. Parsons: 2002, Model checking multiagent systems with MABLE, In Proceedings of the First International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'02), Vol. II, ACM, pp. 952–959.


NOTES

* This work relates to Department of the Navy Grant N00014-04-1-4080 issued by the Office of Naval Research International Field Office. The United States Government has a royalty-free license throughout the world in all copyrightable material contained herein. The first author acknowledges also support from the Polish National Committee for Scientific Research under Bialystok University of Technology (grant W/IMF/2/04).
1 Note that $w$ is a vector of propositional variables used to encode the states of the model.
2 $\mathcal{P}$ denotes the power set.
3 We identify 1 with true and 0 with false.

M. Kacprzak
Bialystok University of Technology
Institute of Mathematics and Physics
15-351 Bialystok, ul. Wiejska 45A, Poland
E-mail: [email protected]

W. Penczek
Institute of Computer Science, PAS
01-237 Warsaw, ul. Ordona 21, Poland
and
Podlasie Academy, Institute of Informatics, Siedlce, Poland
E-mail: [email protected]

ARJEN HOMMERSOM, JOHN-JULES MEYER and ERIK DE VINK

UPDATE SEMANTICS OF SECURITY PROTOCOLS

ABSTRACT. We present a model-theoretic approach for reasoning about security protocols, applying recent insights from dynamic epistemic logics. This enables us to describe exactly the subsequent epistemic states of the agents participating in the protocol, using Kripke models and transitions between these based on updates of the agents’ beliefs associated with steps in the protocol. As a case study we will consider the SRA Three Pass protocol and discuss the Wide-Mouthed Frog protocol.

1. INTRODUCTION

In today's world of e-commerce and the Internet, the role of security protocols is becoming increasingly important. The design of these security protocols is difficult and error-prone (Lowe 1996; Schneier 2000; Anderson 2001), which makes (automatic) verification of protocols of crucial importance. Since the late 1980s, one line of research, amongst others, for reasoning about security protocols is based on the use of the so-called BAN logic, proposed by Burrows, Abadi and Needham (Burrows et al. 1990). This is an epistemic logic augmented by constructs that are relevant for reasoning about security, such as the property of having a cryptographic key at one's disposal in order to be able to decode a message and therefore to know its contents. Although many useful results have been reported (e.g., Kessler and Neumann 1998; Agray et al. 2001; Stubblebine 2002), due to their complexity and their semantic underpinning the use of BAN logics to prove the correctness of security protocols has so far been of limited success (cf. Abadi and Tuttle 1991; Wedel and Kessler 1996; Bleeker and Meertens 1997).

In this paper we will apply insights from dynamic epistemic logics as recently developed by Gerbrandy (1997, 1999), Baltag and Moss (Baltag et al. 1998; Baltag 2002; Baltag and Moss 2004), van Ditmarsch (2000, 2001), and Kooi (2003). Moreover, contrary to the traditional BAN logic approach, our approach is semantic or model-theoretic. We use Kripke models to represent the epistemic state of the agents involved in a protocol, similarly to the S5 preserving approach of van Ditmarsch to analyze certain kinds of games involving knowledge. From the action models of Baltag and Moss we import the idea to describe belief updates of the agents by semantic operators transforming the Kripke models at hand by copying and deleting parts of these models, although we use traditional Kripke models rather than action models. To this end, we also need operations for unfolding models, which in turn are inspired by Gerbrandy's work on possibilities, the difference being that in our approach only partial unfolding is called for. We furthermore propose a language to express belief updates in the context of security protocols as well as properties of these updates, and give a semantics of this language in terms of the models mentioned and the operators on them. Since our approach is model-theoretic, we believe that it may serve as a starting point for the automatic verification of (properties of) security protocols.

As a case study illustrating our approach we will consider the so-called SRA Three Pass protocol. It is not our intention to prove that the protocol is completely secure (as it is not in full generality), but we will prove that if the agents participating in the protocol are honest, then an intruder watching the communication does not learn anything about the plain-text messages in a single run. Furthermore, we show what the intruder is able to learn about the agents participating. We also discuss the Wide-Mouthed Frog protocol to illustrate the operation developed in the sequel for updating the beliefs of agents.

Synthese 142: 229–267, 2004. Knowledge, Rationality & Action 289–327. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

2. PRELIMINARIES

In this section we briefly discuss some preliminaries and background regarding the semantic updates we will handle and the epistemic models we will use. First, we define the notion of an objective formula and introduce so-called o-seriality. The set of propositional variables in a model is denoted by $P$.

DEFINITION 2.1. The class of objective formulas is the smallest class such that
  all propositional variables and atoms $p \in P$ are objective;
  if $\varphi$ is objective, then $\neg\varphi$ is objective;
  if $\varphi_1$ and $\varphi_2$ are objective, then $\varphi_1 \wedge \varphi_2$ is objective.

So, objective formulas do not involve beliefs. For our purposes it is important that every agent, at every world, distinguishes a world with
the same ‘objective’ information. This leads to the notion of an o-serial model. The operations on Kripke structures discussed in the sequel degenerate for models that are not o-serial.

DEFINITION 2.2. A model $M = \langle S, \pi, R_1, \ldots, R_m \rangle$ is o-serial iff for all agents $i$ and $w \in S$, there exists $v \in S$ such that $R_i(w, v)$ and for all objective formulas $\varphi$ it holds that $(M, w) \models \varphi \Leftrightarrow (M, v) \models \varphi$.

We use $a, b, c$, etc. and $i, j$ as typical agents, taken from a class $\mathcal{A}$. Furthermore, $B$ is used as a doxastic modal operator. For example, $B_a \varphi$ should be read as ‘$a$ believes $\varphi$’. We interpret formulas on standard Kripke models $(M, s) = (\langle S, \pi, R_1, \ldots, R_m \rangle, s)$, where $(M, s) \models B_i \varphi$ iff $\forall t \in S : R_i(s, t) \rightarrow (M, t) \models \varphi$. We require the relations $R_i$ to be o-serial, transitive and euclidean. This yields a class of models that we will call Kt45, a proper subset of the class of models of the well-known doxastic logic KD45. The lower case t refers to the axiom

$$B_i \varphi \Rightarrow \varphi \qquad (t)$$
where the formula $\varphi$ ranges over objective formulas. The system Kt45 is sound with respect to the class of o-serial, transitive and euclidean models (Hommersom 2003). (We conjecture that Kt45 is complete as well for this class.) We will show that the operations we introduce preserve Kt45. The point is that in worlds of Kt45 models, we cannot have both $B_i \varphi$ and $\neg\varphi$ for an objective formula $\varphi$. This is reasonable under the assumption that agents are conscious of the protocol. Therefore, they will not infer objective falsehoods. This objectivity is captured locally for each state. As a consequence, the operations that we introduce can restrict the set of states without destroying objective information.

For the analysis of security protocols below, we assume that we are omniscient about the values of the variables in different runs of a protocol. For example, the program variable $p$ in a protocol run has the value $[\![p]\!]$. In the real world it is, obviously, always true that $p = [\![p]\!]$. However, it is cumbersome to keep track of what is the real world in the operations on Kripke structures that we employ below. Therefore, we assume that an interpretation $[\![\cdot]\!]$ is given that provides the ‘real’ values (not necessarily boolean) of the program variables when needed. It might very well be the case that $p \neq [\![p]\!]$ in a certain state. Often, we will abbreviate $p = [\![p]\!]$ to $p$ (thus transforming a program expression into a propositional variable). Similarly, $\neg p$ is an abbreviation of
$p \neq [\![p]\!]$. For example, an agent $a$ that learns $B_b p \vee B_b \neg p$ learns that agent $b$ has assigned a value to the program variable $p$.

The types of updates we consider are public announcement of a variable, private learning of a variable, and private learning about the knowledge of other agents. The first type of update typically runs as follows: in an open network, agent $a$ sends a message to agent $b$. From a security perspective, it is customary in the so-called Dolev–Yao framework (Dolev and Yao 1983) to assume that all agents in the network can read this message too. In contrast, the second type of update describes private learning. For example, agent $b$ receives a message $\{x\}_k$ from agent $a$. (Here, we use the notation $\{x\}_k$ to denote a message $x$ encrypted with the cryptographic key $k$.) If $b$ possesses the key $k$, then $b$ privately learns the message content $x$. The final type of update is probably the most interesting. It is realistic to assume that the steps in a protocol run are known to all agents. Therefore, observing that an agent receives a message will increase the knowledge of the other agents. For example, if agent $a$ sends a message $\{x\}_k$ to agent $b$, then agent $c$ learns that $b$ has learned the information contained in the message $\{x\}_k$, but typically $c$ does not learn $x$ if $c$ does not possess the key $k$.

We do not consider stronger types of updates here. For example, we will not update the beliefs of an honest agent such that it learns that an intruder has learned about others. In the present paper, we restrict ourselves to beliefs about objective formulas and the updating of such beliefs.

3. UPDATE CONSTRUCTIONS

In this section we describe various types of updates in detail. We start by defining an update for propositions in Section 3.1. In Section 3.2 we define a belief update for agents that learn something about the beliefs of others. We do this in two slightly different ways by varying the operations that describe a side-effect for an agent.

3.1. Objective Updates

The belief update of objective formulas we will use is based on the work reported in (Baltag et al., 1998; Roorda et al., 2002). The construction works as follows: We will make copies of the states of
the model such that the old worlds correspond to the information in the original model and the new worlds correspond to the new information.

DEFINITION 3.1. Let a model $(M, w) = (\langle S, \pi, R_1, \ldots, R_m \rangle, w)$, a group of agents $\mathcal{B}$, and an objective formula $\varphi$ be given. Then $\mathrm{UPDATE}_{(\varphi,\mathcal{B})}(M, w)$, the update of $(M, w)$ for agents in $\mathcal{B}$ and formula $\varphi$, is given by $\mathrm{UPDATE}_{(\varphi,\mathcal{B})}(M, w) = (\langle S', \pi', R'_1, \ldots, R'_m \rangle, w')$, where

  $S' = \{ old(s), new(s) \mid s \in S \}$;
  $w' = new(w)$;
  for all $p \in P$: $\pi'(old(u))(p) = \pi'(new(u))(p) = \pi(u)(p)$;
  for $a \in \mathcal{A}$, the binary relation $R'_a$ on $S'$ is minimal such that

$$\begin{array}{lcll}
R'_a(old(u), old(v)) & \Leftrightarrow & R_a(u, v) & \\
R'_a(new(u), new(v)) & \Leftrightarrow & R_a(u, v) \wedge (M, v) \models \varphi & \text{if } a \in \mathcal{B} \\
R'_a(new(u), old(v)) & \Leftrightarrow & R_a(u, v) & \text{if } a \notin \mathcal{B}
\end{array}$$

In order to distinguish the two copies of the states, the tagging functions $old$ and $new$ are used. In the new part of the model, agents in $\mathcal{B}$ will only consider possible worlds that verify $\varphi$. Therefore, states can become unreachable from the actual world $new(w)$, and can be dropped. The following example shows how this works on a concrete model.

EXAMPLE 3.1 (updating). Consider the model $(M, s)$ in Figure 1, where we have $\pi(s)(p) = true$ and $\pi(t)(p) = false$. The operation we execute is $\mathrm{UPDATE}_{(p,\{b\})}$, i.e. $b$ learns $p$. This results in the model $(M, u)$ in Figure 2, where $\pi(u)(p) = \pi(v)(p) = true$ and $\pi(w)(p) = false$, and where $new(s) = u$, $old(s) = v$ and $old(t) = w$. The world $new(t)$ is unreachable and is omitted. We can see that the belief of agent $a$ has not changed: it still considers its old worlds possible. The belief of agent $b$, however, has changed. It now only considers the state $u$ possible, where $p$ holds. Note that agent $b$ is aware that agent $a$ does not know about $p$.

Figure 1. $(M, s)$.

Figure 2. $(M, u)$.
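The construction of Definition 3.1 can be tried out on Example 3.1 with a small Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the tuple encoding of formulas, the function names `holds` and `update`, and the accessibility relations of Figure 1 (both agents are assumed unable to tell $s$ and $t$ apart, since the figure is not reproduced here) are all choices made for this sketch.

```python
# Formulas are nested tuples (an encoding assumed for this sketch):
#   ("atom", "p"), ("not", f), ("and", f, g), ("B", agent, f)

def holds(val, rel, s, f):
    """Truth of formula f at state s. val maps states to valuations,
    rel maps each agent to a set of (source, target) arrows."""
    tag = f[0]
    if tag == "atom":
        return val[s][f[1]]
    if tag == "not":
        return not holds(val, rel, s, f[1])
    if tag == "and":
        return holds(val, rel, s, f[1]) and holds(val, rel, s, f[2])
    if tag == "B":  # B_i f: f holds in every world agent i considers possible
        return all(holds(val, rel, t, f[2]) for (u, t) in rel[f[1]] if u == s)
    raise ValueError(f"unknown connective: {tag}")


def update(val, rel, w, phi, group):
    """UPDATE_(phi, group) of Definition 3.1 on the pointed model (val, rel, w)."""
    new_val = {}
    for s, v in val.items():
        new_val[("old", s)] = dict(v)
        new_val[("new", s)] = dict(v)
    new_rel = {}
    for a, arrows in rel.items():
        out = {(("old", u), ("old", v)) for (u, v) in arrows}
        if a in group:
            # learners keep only the new copies of worlds that verify phi
            out |= {(("new", u), ("new", v)) for (u, v) in arrows
                    if holds(val, rel, v, phi)}
        else:
            # all other agents fall back to the untouched old copy
            out |= {(("new", u), ("old", v)) for (u, v) in arrows}
        new_rel[a] = out
    return new_val, new_rel, ("new", w)


# Assumed reading of Figure 1: neither agent can tell s and t apart.
val = {"s": {"p": True}, "t": {"p": False}}
full = {(x, y) for x in val for y in val}
rel = {"a": set(full), "b": set(full)}

p = ("atom", "p")
val2, rel2, point = update(val, rel, "s", p, {"b"})

print(holds(val, rel, "s", ("B", "b", p)))       # b's belief in p before
print(holds(val2, rel2, point, ("B", "b", p)))   # b's belief in p after
print(holds(val2, rel2, point, ("B", "a", p)))   # a's belief in p after
```

The sketch keeps unreachable states such as $new(t)$ instead of dropping them; this does not affect which formulas hold at the point of the model.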

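The update operation $\mathrm{UPDATE}_{(\varphi,\mathcal{B})}$ is based on a formula $\varphi$ and a set of agents $\mathcal{B}$. Roorda et al. (2002) propose a characterization of the formulas that are altered by such an operation with a single learning agent. Here, we extend their definition for multi-agent purposes.

DEFINITION 3.2. An update function $(\cdot)[\varphi, \mathcal{B}]$ is called proper if

$$\begin{array}{lcll}
(M,w)[\varphi,\mathcal{B}] \models p & \Leftrightarrow & (M,w) \models p & \\
(M,w)[\varphi,\mathcal{B}] \models \alpha \wedge \beta & \Leftrightarrow & (M,w)[\varphi,\mathcal{B}] \models \alpha \text{ and } (M,w)[\varphi,\mathcal{B}] \models \beta & \\
(M,w)[\varphi,\mathcal{B}] \models \neg\alpha & \Leftrightarrow & (M,w)[\varphi,\mathcal{B}] \not\models \alpha & \\
(M,w)[\varphi,\mathcal{B}] \models B_a \alpha & \Leftrightarrow & (M,w) \models B_a \alpha & \text{if } a \notin \mathcal{B} \\
(M,w)[\varphi,\mathcal{B}] \models B_a \alpha & \Leftrightarrow & \forall v : ((R_a(w,v) \text{ and } (M,v) \models \varphi) \Rightarrow (M,v)[\varphi,\mathcal{B}] \models \alpha) & \text{if } a \in \mathcal{B}
\end{array}$$

for every model $M$ and state $w$.

Following Roorda et al. (2002) we have that $\mathrm{UPDATE}_{(\varphi,\mathcal{B})}$ is proper (cf. Roorda et al., 2002, Proposition 3.2). Moreover, $\mathrm{UPDATE}_{(\varphi,\mathcal{B})}$ is uniquely characterized by Definition 3.2 up to elementary equivalence, i.e., if $(\cdot)[\varphi, \mathcal{B}]$ is a proper update function, then $(M, w)[\varphi, \mathcal{B}]$ and $\mathrm{UPDATE}_{(\varphi,\mathcal{B})}(M, w)$ are elementarily equivalent. We collect the following properties of $\mathrm{UPDATE}_{(\varphi,\mathcal{B})}$.

THEOREM 3.1.
(a) For any objective formula $\varphi$ and set of agents $\mathcal{B}$, it holds that $\mathrm{UPDATE}_{(\varphi,\mathcal{B})}(M, w) \models B_b \varphi$ for all $b \in \mathcal{B}$.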

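(b) If $(M, w)$ satisfies the Kt45 properties, the formula $\varphi$ is objective and $(M, w) \models \varphi$, then $\mathrm{UPDATE}_{(\varphi,\mathcal{B})}(M, w)$ satisfies the Kt45 properties as well.
(c) Updating is commutative, i.e. the models
$$(M_1, w_1) = \mathrm{UPDATE}_{(\psi,\mathcal{C})}(\mathrm{UPDATE}_{(\varphi,\mathcal{B})}(M, w)) \quad\text{and}\quad (M_2, w_2) = \mathrm{UPDATE}_{(\varphi,\mathcal{B})}(\mathrm{UPDATE}_{(\psi,\mathcal{C})}(M, w)),$$
for objective formulas $\varphi, \psi$ and sets of agents $\mathcal{B}, \mathcal{C}$, are bisimilar.
(d) Update is idempotent, i.e., the two models
$$(M_1, w_1) = \mathrm{UPDATE}_{(\varphi,\mathcal{B})}(\mathrm{UPDATE}_{(\varphi,\mathcal{B})}(M, w)) \quad\text{and}\quad (M_2, w_2) = \mathrm{UPDATE}_{(\varphi,\mathcal{B})}(M, w),$$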
for an objective formula $\varphi$ and a set of agents $\mathcal{B}$, are bisimilar.

Proof. We prove parts (a) to (c). The proof of part (d) is similar to that of part (c). For part (a) we need to prove that for any objective $\varphi$, set of agents $\mathcal{B}$ and $b \in \mathcal{B}$, it holds that $(M', w') = \mathrm{UPDATE}_{(\varphi,\mathcal{B})}(M, w) \models B_b \varphi$. Take $b \in \mathcal{B}$. Since $w'$ is $new(w)$, we have, by definition, for all $v$, if $R'_b(w', v)$ then $(M', v) \models \varphi$. Hence, $(M', w') \models B_b \varphi$.

We prove part (b) by checking each of the properties for a Kt45 model. Assume that $M = \langle S, \pi, R_1, \ldots, R_m \rangle$ is transitive, and that
$$\mathrm{UPDATE}_{(\varphi,\mathcal{B})}(M, w) = (\langle S', \pi', R'_1, \ldots, R'_m \rangle, w')$$
is not. Then there are, for some agent $i$, $(s, t) \in R'_i$, $(t, u) \in R'_i$ and $(s, u) \notin R'_i$. Because, by definition, there are no arrows from old to new states and no $i$ has both a relation from new to new states and from new to old states, there are only three cases. Firstly, suppose the states are $old(s)$, $old(t)$ and $old(u)$. Then, by definition, we have $(s, t) \in R_i$, $(t, u) \in R_i$ and by transitivity $(s, u) \in R_i$. Then $(old(s), old(u)) \in R'_i$. Contradiction. Now suppose the states are $new(s)$, $old(t)$ and $old(u)$. Then $(s, t) \in R_i$, $(t, u) \in R_i$ and $i \notin \mathcal{B}$. Again, this implies $(new(s), old(u)) \in R'_i$, which is a contradiction. Finally, suppose the states are $new(s)$, $new(t)$ and $new(u)$; then $(s, t) \in R_i$, $(t, u) \in R_i$ and $(M, u) \models \varphi$. Thus, $(new(s), new(u)) \in R'_i$, which is again a contradiction and completes the proof for transitivity. For proving that $R'_i$ is euclidean (for all agents $i$), we need to consider exactly the same cases, and for all these cases a similar argument can be made. For o-seriality, the proof obligation is to show that for all $s \in S'$, there is a $t \in S'$ such that $(s, t) \in R_i \wedge ((M, s) \models \psi \leftrightarrow (M, t) \models \psi)$, for all objective $\psi$. Suppose $i \notin \mathcal{B}$; then this follows directly from the definition, since we will have some $t$ such
that $(s, old(t)) \in R_i$. Suppose $i \in \mathcal{B}$; then by the assumption of $M$ being o-serial, there is a $t$ such that $(s, t) \in R_i$, where $s$ and $t$ agree on the objective formulas. In particular, they agree on $\varphi$. Suppose $(M, s) \models \varphi$; then $(M, t) \models \varphi$ and therefore $(new(s), new(t)) \in R_i$. Suppose $(M, s) \not\models \varphi$; then both $s$ and $t$ will be unreachable from $w'$ and will thus have no corresponding state in $M'$. Thus, in all cases o-seriality is preserved.

For part (c), we observe that four copies of the original states are made. In both cases we have an original copy (call them the old ones), a new copy that is made in the first step is called middle, and the last copy that is made (so a copy from both the old and middle) is called new. Furthermore, the predicates have a superscript of 1 or 2 depending on the model they belong to. In addition, the elements of the model will have superscripts according to their model. We construct a bisimulation $R \subseteq S^1 \times S^2$ such that it is minimal with respect to

$$\begin{array}{lcl}
R(old^1(s), old^2(s')) & \Leftrightarrow & s = s' \\
R(middle^1(s), new^2(s')) & \Leftrightarrow & s = s' \\
R(new^1(s), middle^2(s')) & \Leftrightarrow & s = s'
\end{array}$$
(referring to equality in the original model).

1. $R$ satisfies forward-choice: Suppose $R(s, s') \wedge R^1_i(s, t)$ with $s, t \in S^1$, $s' \in S^2$; then there is a $t' \in S^2$ such that $R(t, t')$ and $(s', t') \in R^2_i$. If $s$ is an $old^1$ state, this is trivially satisfied: $t$ is then an $old^1$ state as well, so take the corresponding $old^2$ state for $t'$. Say $s$ is a $middle^1$ state and $s'$ a $new^2$ state. If $i \in \mathcal{C}$, then $t$ is apparently a copy from the $old^1$ states to the $middle^1$ states. But then, it is also copied to the $new^2$ states and it is reachable for the same reason that it was reachable in the $middle^1$ state. If $i \in \mathcal{B}$, the same argument applies as for $\mathcal{C}$. If $i \notin \mathcal{C}$ and $i \notin \mathcal{B}$, then one can go back to the old world again, as these worlds are still considered possible for these agents $i$.
2. $R$ satisfies backward-choice: The reasoning is similar to the case of forward-choice.
3. For all $s \in S$, $s' \in S'$: $R(s, s') \Rightarrow \pi(s) = \pi'(s')$. From the definition of $R$ and the definition of $\mathrm{UPDATE}_{(\varphi,\mathcal{B})}(\cdot,\cdot)$ this is trivial. $\Box$

The update operation of Definition 3.1 is restricted to objective formulas. In principle, one can carry out the same construction for non-objective ones. However, for a non-objective formula $B_i \varphi$, it can happen that an agent unintentionally increases the objective knowledge encapsulated by the formula $\varphi$. This is illustrated by the next example.

EXAMPLE 3.2 (updating of a non-objective formula). Suppose we are interested in agent $a$ learning the formula $B_b p \vee B_b \neg p$, but not the property $p$ itself. So, agent $a$ learns that agent $b$ knows about $p$ without getting information about $p$ itself. Consider the Kripke model $(M, s)$ in Figure 3, where $\pi(s)(p) = \pi(u)(p) = true$ and $\pi(t)(p) = false$. This models the state where $b$ knows that $p$ is true. Agent $a$ does not know $p$ or $\neg p$, and it does not know whether $b$ knows $p$. If we apply the definition of the update operation, it results in the model $(M', v)$ from Figure 4, where $\pi'(v)(p) = \pi'(s)(p) = \pi'(u)(p) = true$ and $\pi'(t)(p) = false$. In $(M', v)$ it holds that $B_a p$, as $\pi'(v)(p) = true$, since $v$ was copied from the state $s$ in $M$. Figure 4 illustrates that $a$ has learned $B_b p \vee B_b \neg p$, but also that $a$ has learned $p$ itself, which we wanted to avoid. The reason is that the only state in $M$ where $B_b p \vee B_b \neg p$ holds is the state $s$. Thus, all the other states have no corresponding new states. In the next subsection we will define a side-effect function such that $a$ will learn about others, but does not learn any objective formulas itself.

3.2. Side-effects

The main reason that an update of $B_b p \vee B_b \neg p$ for agent $a$ has undesired consequences is that it does not include the right arrows between the copies of the original states. The construction, in the case of the non-objective formula $B_b p \vee B_b \neg p$, deletes arrows of $a$ to single out the states that satisfy the updating formula. However, for the rest, we want $a$ to keep all the states it considers possible. Moreover, we do not want to change the knowledge of the other agents. In this subsection we define the functions that accomplish these requirements.

Figure 3. $(M, s)$.

Figure 4. $(M', v)$.
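The leak described in Example 3.2 can be reproduced with a short Python sketch that applies the construction of Definition 3.1 to the non-objective formula $B_b p \vee B_b \neg p$. The formula encoding, the function names, and in particular the arrows of Figure 3 are assumptions of this sketch: the figure is not reproduced here, so the model is taken to be a chain in which $a$ cannot tell $s$ from $t$ and $b$ cannot tell $t$ from $u$, which matches the description in the example.

```python
# Formulas are nested tuples (an encoding assumed for this sketch):
#   ("atom", "p"), ("not", f), ("or", f, g), ("B", agent, f)

def holds(val, rel, s, f):
    """Truth of formula f at state s of the model (val, rel)."""
    tag = f[0]
    if tag == "atom":
        return val[s][f[1]]
    if tag == "not":
        return not holds(val, rel, s, f[1])
    if tag == "or":
        return holds(val, rel, s, f[1]) or holds(val, rel, s, f[2])
    if tag == "B":
        return all(holds(val, rel, t, f[2]) for (u, t) in rel[f[1]] if u == s)
    raise ValueError(f"unknown connective: {tag}")


def update(val, rel, w, phi, group):
    """The copy construction of Definition 3.1, applied here without
    checking that phi is objective (this is exactly the misuse at issue)."""
    new_val = {}
    for s, v in val.items():
        new_val[("old", s)] = dict(v)
        new_val[("new", s)] = dict(v)
    new_rel = {}
    for a, arrows in rel.items():
        out = {(("old", u), ("old", v)) for (u, v) in arrows}
        if a in group:
            out |= {(("new", u), ("new", v)) for (u, v) in arrows
                    if holds(val, rel, v, phi)}
        else:
            out |= {(("new", u), ("old", v)) for (u, v) in arrows}
        new_rel[a] = out
    return new_val, new_rel, ("new", w)


# Assumed reading of Figure 3: a confuses s and t, b confuses t and u.
val = {"s": {"p": True}, "t": {"p": False}, "u": {"p": True}}
refl = {(x, x) for x in val}
rel = {"a": refl | {("s", "t"), ("t", "s")},
       "b": refl | {("t", "u"), ("u", "t")}}

p = ("atom", "p")
b_knows_whether_p = ("or", ("B", "b", p), ("B", "b", ("not", p)))

val2, rel2, point = update(val, rel, "s", b_knows_whether_p, {"a"})

print(holds(val2, rel2, point, ("B", "a", b_knows_whether_p)))  # intended effect
print(holds(val, rel, "s", ("B", "a", p)))       # a's belief in p before
print(holds(val2, rel2, point, ("B", "a", p)))   # a's belief in p after (the leak)
```

Under these assumptions the only world agent $a$ still considers possible is the new copy of $s$, which is why $a$ ends up believing $p$ as well.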


A technical obstacle is that states can be shared among agents. Clearly, if we change a state with the intention of changing the belief of one agent, then the belief of the other agents that consider this state possible is changed as well. Therefore, the first thing to do is to separate the states of learning agents from the states of agents that do not learn. This procedure will be called unfolding. The tag $new_{\mathcal{B}}$ is a generalization of $new$ and $old$ from the previous section; the tag $orig$ is only used for the point of the model, i.e. the actual world.

DEFINITION 3.3. Given a model $(M, w)$ with $M = \langle S, \pi, R_1, \ldots, R_m \rangle$, and a partitioning $\mathcal{X}$ of $\mathcal{A}$, we define the operation $\mathrm{UNFOLD}_{\mathcal{X}}(M, w)$, the unfolding of $(M, w)$ with respect to $\mathcal{X}$, by $\mathrm{UNFOLD}_{\mathcal{X}}(M, w) = (\langle S', \pi', R'_1, \ldots, R'_m \rangle, w')$, where

  $S' = \{ new_{\mathcal{B}}(s) \mid s \in S, \mathcal{B} \in \mathcal{X} \} \cup \{ orig(w) \}$;
  $w' = orig(w)$;
  $\pi'(new_{\mathcal{B}}(s))(p) = \pi(s)(p)$ and $\pi'(orig(w))(p) = \pi(w)(p)$ for all $s \in S$, $p \in P$, $\mathcal{B} \in \mathcal{X}$;
  for $a \in \mathcal{A}$, the binary relation $R'_a$ on $S'$ is minimal such that

$$\begin{array}{lcl}
R'_a(new_{\mathcal{B}}(s), new_{\mathcal{B}}(t)) & \Leftrightarrow & R_a(s, t) \\
R'_a(orig(w), new_{\mathcal{B}}(s)) & \Leftrightarrow & R_a(w, s) \text{ and } a \in \mathcal{B}
\end{array}$$

where $\mathcal{B}$ ranges over $\mathcal{X}$.

So, for every group of agents $\mathcal{B}$ there is a copy of the original states (viz. $new_{\mathcal{B}}(s)$ for every $s \in S$). The unfold operation does indeed preserve our Kt45 properties and it models the same knowledge, which is captured by the following theorem.

THEOREM 3.2.
(a) If $(M, w)$ is a Kt45 model and $\mathcal{X}$ a partitioning of $\mathcal{A}$, then it holds that $\mathrm{UNFOLD}_{\mathcal{X}}(M, w)$ is a Kt45 model too.
(b) For every model $(M, w)$ and partitioning $\mathcal{X}$, it holds that $(M, w)$ and $\mathrm{UNFOLD}_{\mathcal{X}}(M, w)$ are bisimilar.

Proof. Part (a). First, we prove that $R'_a$ is euclidean under the assumption that $R_a$ is euclidean, for any $a \in \mathcal{A}$. Assume that $R'_a(s', t') \wedge R'_a(s', u')$ for $s', t', u' \in S'$. The proof obligation is that $R'_a(t', u')$, where $s'$, $t'$ and $u'$ are either in one of the partitions or in $orig$. Suppose $s' = new_W(s)$, $t' = new_Y(t)$ and $u' = new_Z(u)$ for $W, Y, Z \in \mathcal{X}$. From the definition of $R'_a(s', t')$, it follows that
$R_a(s, t) \wedge W = Y$; from $R'_a(s', u')$ it follows that $R_a(s, u) \wedge W = Z$. Since $R_a$ is euclidean, we have $R_a(t, u)$. From $W = Y$ and $W = Z$ we have $Y = Z$. Thus we conclude $R'_a(new_Y(t), new_Z(u)) = R'_a(t', u')$. The only other case is that $s' = orig(w)$, $t' = new_W(t)$ and $u' = new_W(u)$ for some $W \in \mathcal{X}$. Then $R_a(w, t)$ and $R_a(w, u)$. Thus $R'_a(new_W(t), new_W(u))$. The proof that $R'_a$ is transitive is similar to the euclidean proof. O-seriality can be proven directly, by observing that each $new_W$ is a copy of the original model, so the property is preserved inside a partition. Since $\mathcal{A} \neq \emptyset$, it also holds in $orig$, because the world which is a copy of $orig$ in each partition is accessible (and these copies clearly have the same valuation).

Part (b). Construct a bisimulation $R \subseteq S \times S'$ such that, for $u, v \in S$ and $W \in \mathcal{X}$,

$$R(u, new_W(v)) \Leftrightarrow u = v \quad\text{and}\quad R(u, orig(w)) \Leftrightarrow u = w.$$

We check the various properties.

$R$ satisfies forward-choice: Suppose $R(s, s')$ and $R_a(s, t)$, with $s, t \in S$, $s' \in S'$, $a \in \mathcal{A}$. If $s' = new_W(s)$, then by the definition of $\mathrm{UNFOLD}_{\mathcal{X}}(\cdot,\cdot)$ and $R_a(s, t)$ we have $R'_a(new_W(s), new_W(t))$. If $s' = orig(s)$, we have $R'_a(orig(w), new_W(t))$ for some $W \in \mathcal{X}$ with $a \in W$. Furthermore, for all cases of $t'$ we get $R(t, t')$.

$R$ satisfies backward-choice: Suppose $R(s, s')$ and $R'_a(s', t')$, with $s \in S$, $s', t' \in S'$. For all cases of $R'_a(s', t')$, we immediately get $R_a(s, t)$. Also, for all cases of $t'$ we have $R(t, t')$.

$R(s, s') \Rightarrow \pi(s) = \pi'(s')$ for all $s \in S$, $s' \in S'$: This is immediate from the definition of $R$ and the definition of $\mathrm{UNFOLD}_{\mathcal{X}}(\cdot,\cdot)$. $\Box$

EXAMPLE 3.3 (unfolding). Consider the Kripke model $(M, s)$ in Figure 5 with $\pi(s)(p) = \pi(u)(p) = true$ and $\pi(t)(p) = false$. So, $b$ knows that $p$ is true, while $a$ does not. Furthermore, $a$ does not know whether $b$ knows $p$. Now the operation we perform is $\mathrm{UNFOLD}_{\{\{a\},\{b\}\}}(M, s)$, thus $\{a, b\}$ is split into $\{a\}$ and $\{b\}$, which results in the model $(M', s)$ in Figure 6. So, we have separated the knowledge of $a$ and $b$.
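The unfolding of Definition 3.3 and the two claims of Theorem 3.2 can be spot-checked on the model of Example 3.3 with the following Python sketch. It is illustrative only: the tuple encoding, the function names, and the arrows of Figure 5 (assumed to be a chain where $a$ confuses $s$ and $t$, and $b$ confuses $t$ and $u$) are assumptions. The Kt45 check uses the fact that, for a finite set of atoms, two worlds agree on all objective formulas exactly when their valuations coincide.

```python
def holds(val, rel, s, f):
    """Truth of a formula given as nested tuples:
    ("atom", p), ("not", f), ("and", f, g), ("B", agent, f)."""
    tag = f[0]
    if tag == "atom":
        return val[s][f[1]]
    if tag == "not":
        return not holds(val, rel, s, f[1])
    if tag == "and":
        return holds(val, rel, s, f[1]) and holds(val, rel, s, f[2])
    if tag == "B":
        return all(holds(val, rel, t, f[2]) for (u, t) in rel[f[1]] if u == s)
    raise ValueError(f"unknown connective: {tag}")


def unfold(val, rel, w, partition):
    """UNFOLD_X of Definition 3.3; partition is a list of disjoint agent sets."""
    blocks = [frozenset(b) for b in partition]
    new_val = {("orig", w): dict(val[w])}
    for b in blocks:
        for s, v in val.items():
            new_val[("new", b, s)] = dict(v)
    new_rel = {}
    for a, arrows in rel.items():
        out = set()
        for b in blocks:
            # every block receives a full copy of the original arrows
            out |= {(("new", b, s), ("new", b, t)) for (s, t) in arrows}
            if a in b:
                # the point only reaches the copy that belongs to a's block
                out |= {(("orig", w), ("new", b, s))
                        for (u, s) in arrows if u == w}
        new_rel[a] = out
    return new_val, new_rel, ("orig", w)


def is_kt45(val, rel):
    """Check o-seriality, transitivity and euclideanness of every relation."""
    for arrows in rel.values():
        succ = {s: {t for (u, t) in arrows if u == s} for s in val}
        for s in val:
            if not any(val[t] == val[s] for t in succ[s]):
                return False          # no successor with the same valuation
            for t in succ[s]:
                if not succ[t] <= succ[s]:
                    return False      # transitivity fails
                if not succ[s] <= succ[t]:
                    return False      # euclideanness fails
    return True


# Assumed reading of Figure 5: a confuses s and t, b confuses t and u.
val = {"s": {"p": True}, "t": {"p": False}, "u": {"p": True}}
refl = {(x, x) for x in val}
rel = {"a": refl | {("s", "t"), ("t", "s")},
       "b": refl | {("t", "u"), ("u", "t")}}

val2, rel2, point = unfold(val, rel, "s", [{"a"}, {"b"}])
p = ("atom", "p")

print(len(val2))                                  # one point plus two copies
print(is_kt45(val, rel), is_kt45(val2, rel2))     # Theorem 3.2(a) on this model
print(holds(val, rel, "s", ("B", "b", p)),
      holds(val2, rel2, point, ("B", "b", p)))    # same belief before and after
```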
Figure 5. $(M, s)$.

Figure 6. $(M', s)$.

In Figure 6, the state $s$ is the original state, the primed states model $a$’s knowledge and the double primed states model $b$’s knowledge. Thus, the upper half
of the model represents the knowledge of $a$, and the lower half represents the knowledge of $b$. Note that no states are shared, in particular because the point of the model is not reflexive.

Now, we give some preparatory definitions leading to the formulation of a side-effect in Definition 3.9. First, we define the notion of a partial submodel.

DEFINITION 3.4. A model $M = \langle S, \pi, R_1, \ldots, R_m \rangle$ is a partial submodel of $M' = \langle S', \pi', R'_1, \ldots, R'_m \rangle$, notation $M \sqsubseteq M'$, iff $S \subseteq S'$, $\pi(s)(p) = \pi'(s)(p)$ for all $s \in S$, $p \in P$, and $R_i \subseteq R'_i$.

Note that a partial submodel is not pointed. Our notion of a partial submodel is slightly more liberal than the standard notion of a submodel, as here we allow arrows to be dropped. It is for technical reasons, viz. the handling of the atom split operation and the operation for side-effects below, that we have occasion to consider partial submodels here. Next, we construct a partial submodel that represents the knowledge of a group of agents $\mathcal{B}$.

DEFINITION 3.5. Given a model $(M, w)$ such that $(M, w) = (\langle S, \pi, R_1, \ldots, R_m \rangle, w) = \mathrm{UNFOLD}_{\{\mathcal{B}, \mathcal{A} \setminus \mathcal{B}\}}(M', w')$ for some $(M', w')$, define $\mathrm{SUB}_{\mathcal{B}}(M)$, the submodel of $M$ for $\mathcal{B}$, by $\mathrm{SUB}_{\mathcal{B}}(M) = \langle S', \pi', R'_1, \ldots, R'_m \rangle$ where

  $S' = \{ new_{\mathcal{B}}(s) \mid s \in S \} \cup \{ orig(w) \}$;
  $\pi'(s)(p) \Leftrightarrow \pi(s)(p)$ for all $s \in S'$, $p \in P$;
  for all $a \in \mathcal{A}$: $R'_a(s, t) \Leftrightarrow R_a(s, t) \wedge s, t \in S'$.

Clearly a $\mathcal{B}$-submodel is a partial submodel in the sense of Definition 3.4. The restmodel is the complementary part of the model that is the
complement with respect to the accessibility relation of a given partial submodel.

DEFINITION 3.6. Given a model $M = \langle S, \pi, R_1, \ldots, R_m \rangle$ and a partial submodel $N = \langle S'', \pi'', R''_1, \ldots, R''_m \rangle$ of $M$, define $\mathrm{REST}_N(M)$, the restmodel of $N$ in $M$, by $\mathrm{REST}_N(M) = \langle S', \pi', R'_1, \ldots, R'_m \rangle$ where

  $s \in S' \Leftrightarrow s \in S \wedge \exists a \in \mathcal{A}\ \exists (u, v) \in R'_a\ (u = s \vee v = s)$;
  $\pi'(s)(p) \Leftrightarrow \pi(s)(p)$ for all $s \in S'$, $p \in P$;
  for all $a \in \mathcal{A}$: $R'_a(s, t) \Leftrightarrow R_a(s, t) \wedge \neg R''_a(s, t)$.

We can see the partial submodel and restmodel definitions in action by taking the model of Example 3.3 and applying the above definitions (see Figure 7). This exactly corresponds to the idea of two submodels that represent the belief of different agents.

Figure 7. Partial submodel and restmodel.

Now, we would like to update the belief of some agents. To this end, we want to replace the submodel that represents their belief by a new model. We will apply the following definition.

DEFINITION 3.7. Given a model $N = \langle S, \pi, R_1, \ldots, R_m \rangle$, a model $M$, and a model $N'$ such that $N \sqsubseteq N' \sqsubseteq M$ with $\mathrm{REST}_{N'}(M) = \langle S', \pi', R'_1, \ldots, R'_m \rangle$, we define the operation $\mathrm{REPLACE}_{N'}(N, M)$, the replacement of $N'$ by $N$ in $M$, by $\mathrm{REPLACE}_{N'}(N, M) = \langle S'', \pi'', R''_1, \ldots, R''_m \rangle$ where

  $s \in S'' \Leftrightarrow s \in S \vee s \in S'$;
  $\pi''(s)(p) \Leftrightarrow \pi(s)(p)$ for all $s \in S''$, $p \in P$;
  for all $a \in \mathcal{A}$: $R''_a(s, t) \Leftrightarrow (s, t) \in R_a \vee (s, t) \in R'_a$.

The idea is that once the belief is completely separated, we can not only safely change the belief of certain agents, but also preserve the
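Kt45 properties. The operation $\mathrm{ATOMSPLIT}_{(\varphi,\mathcal{B})}$ removes the arrows for agents in the group $\mathcal{B}$ between states that have a different valuation for the objective formula $\varphi$.

DEFINITION 3.8. Given a model $M = \langle S, \pi, R_1, \ldots, R_m \rangle$ and an objective formula $\varphi$, we define an operation $\mathrm{ATOMSPLIT}_{(\varphi,\mathcal{B})}(M) = \langle S', \pi', R'_1, \ldots, R'_m \rangle$ as follows:

  $S' = S$;
  $\pi'(s)(p) \Leftrightarrow \pi(s)(p)$ for all $s \in S'$, $p \in P$;
  for $a \in \mathcal{B}$: $R'_a(s, t) \Leftrightarrow R_a(s, t) \wedge ((M, s) \models \varphi \Leftrightarrow (M, t) \models \varphi)$;
  for $a \notin \mathcal{B}$: $R'_a(s, t) \Leftrightarrow R_a(s, t)$.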
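Definition 3.8 translates almost directly into code. The Python sketch below is a minimal illustration; the tuple encoding of objective formulas and the names `objective` and `atomsplit` are assumptions of this sketch, and the example model is an assumed three-state chain in which $a$ confuses $s$ and $t$ while $b$ confuses $t$ and $u$.

```python
def objective(valuation, f):
    """Truth of an objective formula under the valuation of a single state.
    Formulas are nested tuples: ("atom", p), ("not", f), ("and", f, g)."""
    tag = f[0]
    if tag == "atom":
        return valuation[f[1]]
    if tag == "not":
        return not objective(valuation, f[1])
    if tag == "and":
        return objective(valuation, f[1]) and objective(valuation, f[2])
    raise ValueError(f"not an objective formula: {tag}")


def atomsplit(val, rel, phi, group):
    """ATOMSPLIT_(phi, group): agents in `group` lose every arrow between
    states that disagree on phi; states and valuations are unchanged."""
    new_rel = {}
    for a, arrows in rel.items():
        if a in group:
            new_rel[a] = {(s, t) for (s, t) in arrows
                          if objective(val[s], phi) == objective(val[t], phi)}
        else:
            new_rel[a] = set(arrows)
    return val, new_rel


# Assumed example model: p holds at s and u, fails at t.
val = {"s": {"p": True}, "t": {"p": False}, "u": {"p": True}}
refl = {(x, x) for x in val}
rel = {"a": refl | {("s", "t"), ("t", "s")},
       "b": refl | {("t", "u"), ("u", "t")}}

_, split_rel = atomsplit(val, rel, ("atom", "p"), {"b"})

print(sorted(split_rel["b"]))   # b's arrows between t and u are gone
print(sorted(split_rel["a"]))   # a's arrows are untouched
```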

Finally we are in a position to define the actual side-effect function that ties these things together.

DEFINITION 3.9. For a model $(M', w')$, a set of agents $\mathcal{B}$ and an objective formula $\varphi$ such that $(M', w') = \mathrm{UNFOLD}_{\{\mathcal{B}, \mathcal{A} \setminus \mathcal{B}\}}(M, w)$ and $N = \mathrm{SUB}_{\mathcal{B}}(M')$, we define the operation $\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{B},\mathcal{C})}(M, w)$, the side-effect for agents in $\mathcal{B}$ with respect to the agents in $\mathcal{C}$ and the formula $\varphi$, by

$$\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{B},\mathcal{C})}(M, w) = (\mathrm{REPLACE}_N(\mathrm{ATOMSPLIT}_{(\varphi,\mathcal{C})}(N), M'), w').$$

Note that the formula $\varphi$ in Definition 3.9 is required to be objective (cf. Example 3.2).

Figure 8. $(M'', s)$.

EXAMPLE 3.4. We continue Example 3.3. Consider the $a$-submodel of $M$. We now apply $\mathrm{ATOMSPLIT}_{(p,b)}$ on this model, which results in the model $(M'', s)$ in Figure 8. The arrow $(t', u')$ has disappeared, since $\pi(t)(p) \neq \pi(u)(p)$. Therefore, $u$ is not reachable anymore, and can be
dropped. Notice that $a$ believes $B_b p \vee B_b \neg p$, while, in contrast to Example 3.2, $a$ has learned nothing about $p$ itself. A typical application of the side-effect function is of the form $\mathrm{SIDE\text{-}EFFECT}_{(p,\mathcal{A},b)}$, where all agents collectively learn that agent $b$ knows about the atom $p$.

We introduce the notion of interconnection of relations, which comes in handy for a proof of the preservation of the Kt45 properties by the side-effect operation. Two binary relations $A$ and $B$ are called interconnected iff there are $(w, v) \in A$ and $(s, t) \in B$ such that $w = s$, $w = t$, $v = s$ or $v = t$. If two binary relations are not interconnected, we call them separated. Separateness is useful because of the following properties.

LEMMA 3.1.
(a) If binary relations $A$ and $B$ are separated and are both Kt45, then $A \cup B$ is also Kt45.
(b) If $A$ and $B$ are separated and $A \cup B$ has the Kt45 properties, then both $A$ and $B$ have the Kt45 properties.

Proof. (a) We restrict ourselves to the proof that union preserves the euclidean property. Assume that $C = A \cup B$ is not euclidean. Then there are $(s, t) \in C$ and $(s, u) \in C$ with $(t, u) \notin C$. Observe that (1) $(s, t)$ and $(s, u)$ cannot both come from $A$ or both come from $B$, since those relations are both euclidean, and that would mean $(t, u) \in C$; and (2) $(s, t)$ and $(s, u)$ cannot come from distinct subsets, since that would contradict the separation of $A$ and $B$. So, in conclusion, $(s, t)$ and $(s, u)$ cannot both be elements of $C$. This directly contradicts the assumption. Hence $C$ is euclidean.
(b) Because of symmetry we only have to prove this for $A$. Suppose $(s, t) \in A$ and $(s, u) \in A$. Since $C = A \cup B$ is euclidean, $C$ must contain $(t, u)$. But because $A$ and $B$ are not interconnected, $(t, u)$ must be part of $A$. Therefore, $A$ is euclidean. Proofs for transitivity and o-seriality are similar. $\Box$

We have seen that sets that are separated can be split and joined together without changing the Kt45 properties. In the next lemma we apply this to the replace operation.

LEMMA 3.2. Let $f : \mathcal{M} \to \mathcal{M}$ be an operation on the class of models such that (i) $f(M) \sqsubseteq M$ for any model $M \in \mathcal{M}$ and (ii) $f$ preserves Kt45
properties. Suppose model $(M, w) = \mathrm{UNFOLD}_{\{\mathcal{B}, \mathcal{A} \setminus \mathcal{B}\}}(M', w')$ and $N = \mathrm{SUB}_{\mathcal{B}}(M)$. Then the operation defined by $\mathrm{REPLACE}_N(f(N), M)$ preserves the Kt45 properties too.

Proof. From the definition of UNFOLD, it is easy to see that $N$ is separated (for all $R_a$, $a \in \mathcal{A}$) from the rest, since for an agent $a$ there is only a relation between $orig$ and the partition that $a$ is in, and for the other agents the relation from $orig$ to the partition of $a$ does not exist. So from Lemma 3.1 and the fact that $M$ has the Kt45 properties, we conclude that both the $\mathcal{B}$-submodel ($N$) and the restmodel have the Kt45 properties. Since $f$ preserves the Kt45 properties, $f(N)$ has the Kt45 properties too. By observing that applying $\mathrm{REPLACE}_N$ is the same as taking the union of the accessibility relations of the $\mathcal{B}$-restmodel with the new replacement, we can now use Lemma 3.1 and conclude that $\mathrm{REPLACE}_N(f(N), M)$ has the Kt45 properties as well. $\Box$

In order to apply the above lemma we check that splitting preserves Kt45.

LEMMA 3.3. Given a Kt45-model $(M, w) = (\langle S, \pi, R_1, \ldots, R_m \rangle, w)$, then, for an objective formula $\varphi$ and subset of agents $\mathcal{B}$, the model $\mathrm{ATOMSPLIT}_{(\varphi,\mathcal{B})}(M) = \langle S', \pi', R'_1, \ldots, R'_m \rangle$ has all the Kt45 properties too.

Proof. To prove that the new model is euclidean, suppose $R_i$ is euclidean and $(s, t) \in R'_i \wedge (s, u) \in R'_i$. From $(s, t) \in R'_i$ it follows that $\pi(s)(p) = \pi(t)(p)$ and $(s, t) \in R_i$. From $(s, u) \in R'_i$ it follows that $\pi(s)(p) = \pi(u)(p)$ and $(s, u) \in R_i$. Thus, $\pi(t)(p) = \pi(u)(p)$ and therefore if $(t, u) \in R_i$, then $(t, u) \in R'_i$. Since $R_i$ is euclidean, $(t, u) \in R_i$. Hence, $(t, u) \in R'_i$. The proof for transitivity is similar. For preservation of o-seriality we suppose $s \in S'$. By definition $s \in S$ and there is some $t \in S$ such that $(s, t) \in R_i$ and $(M, s) \models \psi \leftrightarrow (M, t) \models \psi$, where $\psi$ is objective. In particular, it then holds that $\pi(s)(p) = \pi(t)(p)$. By definition $(s, t) \in R'_i$. $\Box$

We are now in a position to prove a number of properties of the side-effect operation.

THEOREM 3.3.
(a) If $(M, w)$ is a Kt45-model, then $\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{B},\mathcal{C})}(M, w)$, for any objective formula $\varphi$ and sets of agents $\mathcal{B}, \mathcal{C}$, is a Kt45-model as well.

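(b) (commutativity of side-effect) Given a model $(M, w)$, sets of agents $\mathcal{B}, \mathcal{C}, \mathcal{D}, \mathcal{E}$ and two formulas $\varphi, \psi$, it holds that $\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{D},\mathcal{E})}(\mathrm{SIDE\text{-}EFFECT}_{(\psi,\mathcal{B},\mathcal{C})}(M, w))$ and $\mathrm{SIDE\text{-}EFFECT}_{(\psi,\mathcal{B},\mathcal{C})}(\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{D},\mathcal{E})}(M, w))$ are bisimilar.
(c) (swapping update and side-effect) Given a model $(M, w)$, sets of agents $\mathcal{B}, \mathcal{C}, \mathcal{D}$, a formula $\varphi$ and an objective formula $\psi$, it holds that the models
$$\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{C},\mathcal{D})}(\mathrm{UPDATE}_{(\psi,\mathcal{B})}(M, w)) \quad\text{and}\quad \mathrm{UPDATE}_{(\psi,\mathcal{B})}(\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{C},\mathcal{D})}(M, w))$$
are bisimilar.
(d) (idempotency of side-effect) Given a model $(M, w)$, sets of agents $\mathcal{B}, \mathcal{C}, \mathcal{D}$ and a formula $\varphi$, it holds that
$$\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{B},\mathcal{C})}(\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{B},\mathcal{C})}(M, w)) \quad\text{and}\quad \mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{B},\mathcal{C})}(M, w)$$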
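are bisimilar.

Proof. (a) Clearly, $\mathrm{ATOMSPLIT}_{(\varphi,\mathcal{C})}(M) \sqsubseteq M$ holds. Therefore, the statement follows from Lemma 3.2 and Lemma 3.3.

(b) There are several tagged states. We have $orig$, $new_{\mathcal{B}}$, $new_{\mathcal{A} \setminus \mathcal{B}}$, $new_{\mathcal{D}}$, $new_{\mathcal{A} \setminus \mathcal{D}}$ in both models. Construct a bisimulation $R \subseteq S \times S'$ such that

$$\begin{array}{lcl}
R(orig(w), orig(v)) & \Leftrightarrow & w = v \\
R(new_{\mathcal{B}}(u), new_{\mathcal{B}}(v)) & \Leftrightarrow & u = v \\
R(new_{\mathcal{A} \setminus \mathcal{B}}(u), new_{\mathcal{A} \setminus \mathcal{B}}(v)) & \Leftrightarrow & u = v \\
R(new_{\mathcal{D}}(u), new_{\mathcal{D}}(v)) & \Leftrightarrow & u = v \\
R(new_{\mathcal{A} \setminus \mathcal{D}}(u), new_{\mathcal{A} \setminus \mathcal{D}}(v)) & \Leftrightarrow & u = v
\end{array}$$

We can now follow the reasoning from Theorem 3.1 to see that this bisimulation has all the desired properties. For example, $R$ satisfies forward-choice: suppose $R(s, s') \wedge R_a(s, t)$ with $s, t \in S$, $s' \in S'$. Suppose $s$ is the $orig$ state; then $t$ could be a $new_{\mathcal{B}}$ or a $new_{\mathcal{A} \setminus \mathcal{B}}$ state. But this copy of $t$ is present in $S'$ as well, since it was created after the first execution of SIDE-EFFECT on the original model. And indeed, there is an $orig(w)$ in $S'$ with an arrow to this copy of $t$. The checks for all the other options are similar.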
(c) Construct a bisimulation $R \subseteq S \times S'$ such that

$$\begin{array}{lcl}
R(orig(w), orig(v)) & \Leftrightarrow & w = v \\
R(old(u), old(v)) & \Leftrightarrow & u = v \\
R(new(u), new(v)) & \Leftrightarrow & u = v \\
R(new_{\mathcal{B}}(u), new_{\mathcal{B}}(v)) & \Leftrightarrow & u = v \\
R(new_{\mathcal{A} \setminus \mathcal{B}}(u), new_{\mathcal{A} \setminus \mathcal{B}}(v)) & \Leftrightarrow & u = v
\end{array}$$

Checking all the properties is similar to (b).

(d) The result of applying the same side-effect operation twice is the same model as applying it just once, modulo a number of unreachable states. Again, construct a bisimulation $R$ similar to all the previous proofs such that the relation exists if the states are a copy of each other and reachable from the point of the model. The unprimed variables belong to $\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{B},\mathcal{C})}(M, w)$ and the primed ones to $\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{B},\mathcal{C})}(\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{B},\mathcal{C})}(M, w))$. Now, $R$ satisfies forward-choice: suppose $R(s, s') \wedge R_a(s, t)$ with $s, t \in S$, $s' \in S'$. Suppose $s$ is a $new_{\mathcal{B}}$ state; then we indeed have such a copy $new_{\mathcal{B}}(t')$ with $R(t, t')$, since no more arrows inside $new'_{\mathcal{B}}$ were deleted, because only the states from $new_{\mathcal{B}}$ are reachable in $new'_{\mathcal{B}}$. For the other cases it is trivial. It can also be shown that $R$ satisfies the other properties that are required. $\Box$

Next, we consider how the formulas are altered by the side-effect operation. We will partially answer this by presenting a few interesting formulas that hold in the resulting model. We distinguish
  1: the group of agents $\mathcal{B}$ that learn about other agents, ranged over by $b$;
  2: the group of agents $\mathcal{C}$ that is learned about, ranged over by $c$;
  3: other agents in the group $\mathcal{D}$, ranged over by $d$.

The fact that the agents in $\mathcal{B}$ are the only agents that learn at all is clear. The other agents consider exactly (copies of) their old worlds possible; their belief has not changed. With this in mind, we present a few properties of the side-effect operation. Below, $C_{\mathcal{BCD}}$ expresses common knowledge of agents in the sets $\mathcal{B}$, $\mathcal{C}$ and $\mathcal{D}$.

LEMMA 3.4. Given a model $(M, w)$, disjoint sets of agents $\mathcal{B}, \mathcal{C}, \mathcal{D}$ and a formula $\varphi$, put $(M', w') = \mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{B},\mathcal{C})}(M, w)$. Then it holds that
(a) $(M', w') \models B_b (B_c \varphi \vee B_c \neg\varphi)$;
(b) $(M', w') \models B_b \psi$ iff $(M, w) \models B_b \psi$ for any objective formula $\psi$;
(c) $(M', w') \models B_b C_{\mathcal{BCD}} (B_c \varphi \vee B_c \neg\varphi)$;
(d) $(M', w') \models B_a \psi$ iff $(M, w) \models B_a \psi$ for any formula $\psi$ and any agent $a \notin \mathcal{B}$.

Proof. We prove the typical cases of parts (b) and (c).
(b) The unfolded model of $(M, w)$ is bisimilar with $(M', w')$, and no arrows of $b$ were deleted afterwards. Hence, the knowledge of $b$ about objective formulas has not changed.
(c) We have already seen that in every state of $\mathrm{SUB}_{\mathcal{B}}(M')$ it holds that $B_c \varphi \vee B_c \neg\varphi$. Now, what we need to prove is that the path $w' \xrightarrow{b} s_1 \xrightarrow{i} s_2 \xrightarrow{j} \cdots$ with $i, j, \ldots \in \mathcal{A}$ is a path to a state where $B_c \varphi \vee B_c \neg\varphi$ holds. But since $s_1 \in new_{\mathcal{B}}$, and since there are no arrows from $new_{\mathcal{B}}$ to other partitions, all $s_k$ are elements of $new_{\mathcal{B}}$. Thus any $s_k$ is part of $\mathrm{SUB}_{\mathcal{B}}(M')$. By the construction of $\mathrm{SIDE\text{-}EFFECT}_{(\varphi,\mathcal{B},\mathcal{C})}(M, w)$, in any state in $new_{\mathcal{B}}$ all arrows of $c$ lead either to worlds where $\varphi$ holds (at least one, by o-seriality) or to worlds where $\neg\varphi$ holds. So $B_c \varphi$ or $B_c \neg\varphi$ holds. Hence, $B_c \varphi \vee B_c \neg\varphi$ holds. $\Box$

In part (a) of the above lemma, an agent $b$ obtains derived knowledge of an agent $c$. Part (b) states that no objective knowledge is learned. Part (c) expresses that an agent $b$ considers the rest of the agents to be as smart as itself. Finally, part (d) captures that other agents do not learn.

Property (c) may or may not be a reasonable assumption of $b$ about the other agents. If one agent believes that another agent knows the value of $\varphi$, then it is reasonable to assume that another agent will believe the same. On the other hand, common knowledge might be too strong to assume. Next, we address the issue that an agent $b$ shares its belief about an agent $c$ with only some other agents. We represent this by linking $b$’s beliefs of those other agents back to the original (unmodified) states. We distinguish four different types of groups of agents:
  1: the groups $\mathcal{B}$ and $\mathcal{C}$ are as before;
  2: the group $\mathcal{D}$ of agents of which agents in $\mathcal{B}$ believe they have learned in common about agents in group $\mathcal{C}$, ranged over by $d$;
  3: the group $\mathcal{E}$ of agents of which agents in $\mathcal{B}$ believe they have learned nothing about, ranged over by $e$.

We define the new side-effect operation 0-UNFOLD that handles this refinement. Here, 0 refers to zero-knowledge for the group of agents $\mathcal{E}$. As before, we define an unfolding operation first.


ARJEN HOMMERSOM ET AL.

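DEFINITION 3.10. Given a model $(M, w)$, with $M = \langle S, \pi, R_1, \ldots, R_m \rangle$, and a partitioning $X = \{B, C, D, E\}$ of the set of agents $A$, we define the operation 0-UNFOLD$_X(M, w)$, the zero-knowledge unfolding of $(M, w)$ with respect to $X$, by 0-UNFOLD$_X(M, w) = (\langle S', \pi', R'_1, \ldots, R'_m \rangle, w')$, where

$S' = \mathit{new}_B(S) \cup \mathit{new}_{CDE}(S) \cup \{\mathit{orig}(w)\}$
$w' = \mathit{orig}(w)$
$\pi'(\mathit{new}_Y(v))(p) = \pi(v)(p)$ and $\pi'(\mathit{orig}(w))(p) = \pi(w)(p)$ for all $p \in P$, $Y \in \{\{B\}, \{CDE\}\}$

and, for $a \in A$, $R'_a$ on $S'$ is the minimal binary relation such that

$R'_a(\mathit{orig}(w), \mathit{new}_B(v)) \Leftrightarrow R_a(w, v) \wedge a \in B$
$R'_a(\mathit{orig}(w), \mathit{new}_{CDE}(v)) \Leftrightarrow R_a(w, v) \wedge a \notin B$
$R'_a(\mathit{new}_B(u), \mathit{new}_{CDE}(v)) \Leftrightarrow R_a(u, v) \wedge a \in E$
$R'_a(\mathit{new}_B(u), \mathit{new}_B(v)) \Leftrightarrow R_a(u, v) \wedge a \notin E$
$R'_a(\mathit{new}_{CDE}(u), \mathit{new}_{CDE}(v)) \Leftrightarrow R_a(u, v)$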
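As an illustration only, the zero-knowledge unfolding can be sketched in Python on a finite model. The data layout (dictionaries for the valuation and the accessibility relations, tuples such as `("newB", s)` for copied states) and the function name `zero_unfold` are assumptions of this sketch, not part of the paper. The sketch follows the reading used in the proof of Theorem 3.4, where arrows inside the $\mathit{new}_B$ copy are kept exactly for agents outside $E$.

```python
def zero_unfold(states, val, rel, w, B, C, D, E):
    """Sketch of 0-UNFOLD_X(M, w) for the partitioning X = {B, C, D, E}.

    states: iterable of state names
    val:    dict mapping state -> dict mapping proposition -> bool
    rel:    dict mapping agent -> set of (u, v) pairs
    w:      the point of the model
    Returns (states', val', rel', point').
    C and D are accepted only to mirror the partitioning; they need no
    special case in the unfolding itself.
    """
    orig = ("orig", w)
    new_b = {s: ("newB", s) for s in states}
    new_cde = {s: ("newCDE", s) for s in states}

    states2 = set(new_b.values()) | set(new_cde.values()) | {orig}

    # Every copy inherits the valuation of the state it was copied from.
    val2 = {orig: dict(val[w])}
    for s in states:
        val2[new_b[s]] = dict(val[s])
        val2[new_cde[s]] = dict(val[s])

    rel2 = {}
    for a, pairs in rel.items():
        arrows = set()
        for (u, v) in pairs:
            if u == w:
                # Arrows leaving the point: B-agents enter newB, all others newCDE.
                arrows.add((orig, new_b[v] if a in B else new_cde[v]))
            if a in E:
                # Assumed reading: E-agents are linked back to the unmodified copy.
                arrows.add((new_b[u], new_cde[v]))
            else:
                arrows.add((new_b[u], new_b[v]))
            arrows.add((new_cde[u], new_cde[v]))
        rel2[a] = arrows
    return states2, val2, rel2, orig
```

On a two-state model with agents `b`, `c`, `d`, `e` partitioned into singletons, the arrows of `e` that start in the `newB` copy all end in the `newCDE` copy, which is the structural reason why the beliefs of `b` about `e` are left untouched.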

So, instead of completely separating the knowledge of agents in $B$ from that of the other agents, we share this knowledge with the other agents. Since the other agents do not learn anything, agents in $B$ do not gain knowledge about $E$. We present a theorem similar to Theorem 3.2.

THEOREM 3.4.
(a) If $(M, w)$ is a Kt45 model, then so is 0-UNFOLD$(M, w)$.
(b) For every model $(M, w)$, it holds that $(M, w)$ and 0-UNFOLD$(M, w)$ are bisimilar.

Proof. (a) The accessibility relations of every agent other than those in $E$ are constructed in exactly the same way as in Definition 3.3. Since that operation preserves the Kt45 properties, we do not have to prove it for those agents. A proof for the agents in $E$ now follows. For transitivity assume $(s, t) \in R'_a$ and $(t, u) \in R'_a$. By the definition, we immediately see that $u$ is of the form $\mathit{new}_{CDE}(\cdot)$. Also we see that if there is an $R_a(s, u)$ then there is an $R'_a(s, u)$. Thus, we can conclude $R'_a(s, u)$. The proof that $R'_a$ is euclidean is similar. For o-seriality observe that every arrow of an agent in $E$ ends in a state that used to be its old state. Thus, o-seriality is preserved.
(b) Construct a bisimulation $R \subseteq S \times S'$ such that $R(u, \mathit{new}_Y(v)) \Leftrightarrow u = v$ and $R(u, \mathit{orig}(v)) \Leftrightarrow u = v$.


We prove that $R$ satisfies forward-choice: suppose $R(s, s')$ and $R_a(s, t)$ with $s, t \in S$ and $s' \in S'$. If $s' = \mathit{orig}(s)$, then the matching state is $\mathit{new}_B(t)$ if $a \in B$, and $\mathit{new}_{CDE}(t)$ if $a \notin B$. If $s' = \mathit{new}_{CDE}(s)$, then we immediately have $\mathit{new}_{CDE}(t)$ for all agents $a$. If $s' = \mathit{new}_B(s)$, then we have $\mathit{new}_B(t)$ if $a \notin E$, else $\mathit{new}_{CDE}(t)$. $R$ satisfies backward-choice: suppose $R(s, s')$ and $R'_a(s', t')$ with $s \in S$ and $s', t' \in S'$. We have that $R'_a(s', t')$ immediately implies $R_a(s, t)$ for some $s, t$ such that $s', t'$ are copies of $s, t$. This implies $R(t, t')$. Finally, $R(s, s') \Rightarrow \pi(s) = \pi'(s')$ for $s \in S$, $s' \in S'$: direct from the definition of $R$. □

Due to the case distinction for arrows leaving the point $\mathit{orig}(w)$ in Definition 3.10 above, the knowledge of $B$-agents about $C$-agents is separated from the knowledge of other agents about agents in $C$, or of $C$-agents themselves. So we can 'cut out' the submodel containing the $c$ arrows from the belief of $b$. In this definition we can omit the point of the model in our partial submodel, which makes it slightly easier.

DEFINITION 3.11. Given a model $(M', w')$, with $(M', w') = \langle S', \pi', R'_1, \ldots, R'_m \rangle$, an objective formula $\varphi$ and a partitioning $X$ of $A$ (as in Definition 3.10) such that $(M', w') = $ 0-UNFOLD$_X(M, w)$ for some $(M, w)$, we define SUB$_B(M') = \langle S'', \pi'', R''_1, \ldots, R''_m \rangle$ where

$S'' = \{\mathit{new}_B(s) \mid s \in S\}$
$\pi''(s)(p) \Leftrightarrow \pi(s)(p)$ for all $s \in S'$, $p \in P$
$R''_c(s, t) \Leftrightarrow R'_c(s, t)$ for $s, t \in S''$
$R''_a = \emptyset$ $(a \notin C)$.

The operation 0-SIDE-EFFECT$_{(\varphi,X)}$, the zero-knowledge side-effect of $\varphi$ with respect to the partitioning $X$, is then given by

0-SIDE-EFFECT$_{(\varphi,X)}(M, w) = ($REPLACE$_N($ATOMSPLIT$_{(\varphi,C)}(N), M'), w')$

where $N = $ SUB$_B(M')$. The operation 0-SIDE-EFFECT has algebraic properties comparable to those of the previous side-effect operation (cf. Theorem 3.3). We also have the following result, corresponding to Lemma 3.4.


LEMMA 3.5. Given a model $(M, w)$, a partitioning $X$ of $A$ and an objective formula $\varphi$ such that the model $(M', w') = $ 0-SIDE-EFFECT$_{(\varphi,X)}(M, w)$, it holds that
(a) $(M', w') \models B_b(B_c\varphi \vee B_c\neg\varphi)$.
(b) $(M', w') \models B_b\psi$ iff $(M, w) \models B_b\psi$ for any objective formula $\psi$.
(c) $(M', w') \models B_b C_{BCD}(B_c\varphi \vee B_c\neg\varphi)$.
(d) $(M', w') \models B_a\psi$ iff $(M, w) \models B_a\psi$ for $a \notin B$, for any formula $\psi$.
(e) $(M', w') \models B_b B_e\psi$ iff $(M, w) \models B_b B_e\psi$ for any formula $\psi$.

Proof. We provide the proof for (e). ($\Rightarrow$) Assume $(M', w') \models B_b B_e\psi$; then we have for all paths $w \xrightarrow{b} s \xrightarrow{e} t$ that $t \models \psi$. But, since all arrows of $e$ in $\mathit{new}_B$ point back to (a copy of) the original model, we have that $(M, t) \models \psi$. ($\Leftarrow$) Assume $(M, w) \models B_b B_e\psi$. Now, we have a path $w \xrightarrow{b} s \xrightarrow{e} t$ in the original model. In the construction of $(M, w)$ no arrows of $b$ and $e$ are ever added or deleted. □

Parts (a) to (d) correspond to the properties given in Lemma 3.4. Part (e) states that the knowledge of agent $b$ about agent $e$ has not changed, which is exactly as desired.

EXAMPLE 3.5 (alternative side-effect function). Recall the model $(M, s)$ from Example 3.3 (Figure 5). We now present this model with four agents $\{b, c, d, e\}$ in Figure 9, with a partitioning into singletons, such that $\pi(s)(p) = \pi(u)(p) = $ true and $\pi(t)(p) = $ false. Now, apply 0-SIDE-EFFECT$_{(p,b,c)}(M, w)$ and we obtain the model $(M', s)$ from Figure 10 such that $\pi(s)(p) = \pi(s')(p) = \pi(s'')(p) = \pi(u')(p) = $ true and $\pi(t)(p) = \pi(t')(p) = $ false. Note that in this model, $b$ still knows exactly the same about $e$ as it did before.

3.3. Comparison with the Action Model Approach

Baltag and Moss (Baltag et al., 1998; Baltag and Moss 2004) propose a framework for describing epistemic actions using action models. Similar to Kripke models that describe the uncertainty of agents

Figure 9. $(M, s)$.


about which world they are in, they use Kripke models to describe the uncertainty of agents about the action that is being performed. Formally, an epistemic action model is a triple $\Sigma = \langle \Sigma, \xrightarrow{A}, \mathit{pre} \rangle$, where $\Sigma$ is a set of simple actions, $\xrightarrow{A}$ is an accessibility relation of agents on actions, and the precondition $\mathit{pre}$ is a mapping $\mathit{pre}: \Sigma \to \Phi$ with $\Phi$ being the collection of all epistemic propositions. The central operation of updating an epistemic model $M = \langle S, R_1, \ldots, R_m, \pi \rangle$, as we have used so far, with such an action model $\Sigma$ is defined as $M \otimes \Sigma = \langle S \otimes \Sigma, R'_1, \ldots, R'_m, \pi' \rangle$, where

$S \otimes \Sigma = \{(s, \sigma) \in S \times \Sigma \mid (M, s) \models \mathit{pre}(\sigma)\}$
$R'_i((s, \sigma), (s', \sigma'))$ iff $R_i(s, s')$ and $\sigma \xrightarrow{A} \sigma'$
$\pi'((s, \sigma))(p)$ iff $\pi(s)(p)$

In Baltag and Moss (2004), the authors provide, based on this notion of updating, illuminating examples and study several interesting applications of this idea, such as the public and private learning of what we would call objective formulas. This raises the question whether we can describe the more complicated updates as well.
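A minimal Python sketch of this update product on finite models may help to see how the three clauses interact. The representation is an assumption of the sketch: preconditions are given as Python predicates on states rather than as epistemic formulas, the action accessibility relation is indexed per agent (`act_rel[i]`), and the name `product_update` is hypothetical.

```python
def product_update(states, val, rel, actions, pre, act_rel):
    """Sketch of the update product of a Kripke model with an action model.

    states:  iterable of state names
    val:     dict mapping state -> dict mapping proposition -> bool
    rel:     dict mapping agent -> set of (s, s') pairs
    actions: iterable of simple actions
    pre:     dict mapping action -> predicate on states (assumed encoding
             of "(M, s) satisfies pre(sigma)")
    act_rel: dict mapping agent -> set of (sigma, sigma') pairs
    """
    # Keep only the pairs whose precondition holds in the state.
    new_states = {(s, a) for s in states for a in actions if pre[a](s)}

    # An agent confuses two pairs iff it confuses both the states and the actions.
    new_rel = {
        i: {((s, a), (t, b))
            for (s, a) in new_states
            for (t, b) in new_states
            if (s, t) in rel[i] and (a, b) in act_rel[i]}
        for i in rel
    }

    # Atomic facts are inherited from the state component.
    new_val = {(s, a): dict(val[s]) for (s, a) in new_states}
    return new_states, new_val, new_rel
```

For instance, a public announcement of `p` corresponds to a single action whose precondition is `p` and which every agent can only confuse with itself; the product then simply drops the states where `p` is false.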

Figure 10. $(M', s)$.

Figure 11. 0-SIDE-EFFECT action model.


In Figure 11 the ovals depict a precondition, with $\varphi$ some objective formula. Intuitively, this action model corresponds to the operation of 0-SIDE-EFFECT. Indeed, the update product of the Kripke structure of Example 3.5 in Figure 9 and the action model of Figure 11 results in the model of Figure 10. Also, omitting the $e$-arrows from the action model in Figure 11 yields an action model that, for the concrete examples discussed above, corresponds to the side-effect operation of Definition 3.9. However, currently we have no proof that such a correspondence holds in general.

4. A LOGICAL LANGUAGE FOR SECURITY PROTOCOLS

In this section we exploit the ideas of the previous section for a logical language to reason about security protocols. The UPDATE and SIDE-EFFECT operations are used for its semantics. We introduce so-called transition rules for the modeling of security protocols, which we discuss in the next section.

DEFINITION 4.1. Fix a set of propositions $P$, ranged over by $p$, and a set of agents $A$ of $m$ elements, ranged over by $i, j$. The language $L_C$ is given by

$\varphi ::= p \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid B_i\varphi \mid [r]\varphi$
$r ::= \mathrm{Priv}(i \to j, p) \mid \mathrm{Pub}(i, p) \mid r; r'$

where $C$ is a collection of so-called transition rules.

The symbol $r$ denotes a (possibly composed) communication action. The action $\mathrm{Priv}(i \to j, p)$ is a private or peer-to-peer message $p$ from $i$ to $j$; the action $\mathrm{Pub}(i, p)$ means a public announcement or a broadcast by $i$ about $p$. In the latter, every agent on the network learns $p$, whereas in the former, only $j$ learns $p$. The bracket operator $[r]\varphi$ has the interpretation that after executing the communication action $r$, $\varphi$ holds. The subscript in $L_C$ refers to a set of so-called transition rules $C$. The transition rules capture the updates, i.e., the expansions and side-effects, necessary for the interpretation of the indirect effects of the constructs $\mathrm{Priv}(i \to j, p)$ and $\mathrm{Pub}(i, p)$. The transition rules enforce consistency among the propositions that hold. For example, if an agent believes that the value of a message $m$ is $[\![m]\!]$ and possesses a key $k$, then it must believe that the encryption $\{m\}_k$ of $m$ has a value that corresponds with $[\![m]\!]$.


A transition rule has the form $B_i p \Rightarrow b$. The condition $B_i p$ expresses that $p$ must be believed by agent $i$. The body $b$ of a transition rule is a sequence of actions $a_1, \ldots, a_n$. Actions come in three flavours, viz. $L_B p$, $S_{B,C}\, p$ and $S^0_{B,C,D}\, p$. Here, $L_B p$ expresses that $p$ is learned among the agents in the set $B$ and corresponds to belief expansion, whereas $S_{B,C}\, p$ expresses the side-effect that the agents in the set $B$ have learned that the agents in the set $C$ now know about $p$. Similarly, $S^0_{B,C,D}\, p$ is used for the side-effect where agents in $B$ assume that the agents in $D$ have learned as well.

As an example, we will have the transition rule $B_b \{x\}_k \Rightarrow L_{ab}\, x$, when agents $a$ and $b$ share the key $k$, and $a$ sends $b$ the message $\{x\}_k$. In the situation described above, agent $a$ sends the message $x$ to agent $b$ and agent $b$ returns the message $\{x\}_k$. Since it is shared, $a$ already can compute $\{x\}_k$ itself, so the delivery of $\{x\}_k$ does not teach $a$ anything about this value. However, the transition rule expresses that $a$ and $b$ commonly learn, and, in particular, $a$ learns that $b$ knows the message $x$.

The semantics for the language $L_C$, provided in the next definition, follows the set-up of, e.g., Baltag et al. (1998), and Clark and Jacob (2000). Definition 4.2 is organized in three layers. First, there is the layer of the actions of the language. The defining clauses make use of an auxiliary operation $\triangleright_p$. This operation helps in the processing of the relevant transition rules. The set $\mathrm{Mod}(M, w, p)$ collects all the transition rules that will change the model. The next layer of the definition concerns the body of a transition rule. The last part of Definition 4.2 concerns the validity of the formulas of $L_C$.

DEFINITION 4.2. Let $C$ be a finite set of transition rules. For $r \in L_C$ the relation $[\![r]\!]$ on models for $A$ over $P$ is given by

$(M, w)[\![\mathrm{Priv}(i \to j, p)]\!](M', w') \Leftrightarrow (M, w) \models B_i p \Rightarrow (\mathrm{UPDATE}_{(p,j)}(M, w) \triangleright_p (M', w'))$
$(M, w)[\![\mathrm{Pub}(i, p)]\!](M', w') \Leftrightarrow (M, w) \models B_i p \Rightarrow (\mathrm{UPDATE}_{(p,A)}(M, w) \triangleright_p (M', w'))$
$(M, w)[\![r; r']\!](M', w') \Leftrightarrow (M, w)[\![r]\!](M'', w'')[\![r']\!](M', w')$ for some model $(M'', w'')$


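$(M, w) \triangleright_p (M', w') \Leftrightarrow$ if $(x \Rightarrow b) \in \mathrm{Mod}(M, w, p)$ then $(M, w)\langle b\rangle(M'', w'') \triangleright_p (M', w')$ for some $(M'', w'')$ else $(M, w) = (M', w')$ end

$(M, w)\langle\,\rangle(M', w') \Leftrightarrow (M, w) = (M', w')$
$(M, w)\langle L_B p; b\rangle(M', w') \Leftrightarrow \mathrm{UPDATE}_{(p,B)}(M, w)\langle b\rangle(M', w')$
$(M, w)\langle S_{B,C}\, p; b\rangle(M', w') \Leftrightarrow$ SIDE-EFFECT$_{(p,B,C)}(M, w)\langle b\rangle(M', w')$
$(M, w)\langle S^0_{B,C,D}\, p; b\rangle(M', w') \Leftrightarrow$ 0-SIDE-EFFECT$_{(p,X)}(M, w)\langle b\rangle(M', w')$ where $X = \{B, C, D, A \setminus (B \cup C \cup D)\}$

$(M, w) \models p \Leftrightarrow \pi(w)(p) = $ true
$(M, w) \models \neg\varphi \Leftrightarrow (M, w) \not\models \varphi$
$(M, w) \models \varphi \wedge \psi \Leftrightarrow (M, w) \models \varphi$ and $(M, w) \models \psi$
$(M, w) \models B_i\varphi \Leftrightarrow (M, v) \models \varphi$ for all $v$ such that $R_i(w, v)$
$(M, w) \models [r]\varphi \Leftrightarrow (M', w') \models \varphi$ if $(M, w)[\![r]\!](M', w')$

where $\mathrm{Mod}(M, w, p) = \{B_i p \Rightarrow b \in C \mid (M, w) \models B_i p,\ (M, w)\langle b\rangle(M', w'),\ \exists\varphi : (M', w') \models \varphi \not\Leftrightarrow (M, w) \models \varphi\}$.

The private and public communication of $p$ can only be executed under the condition $B_i p$. The communication has the effect that the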


agent $j$, respectively all agents, get informed about $p$. Next, the transition rules are invoked as a consequence of some parties learning $p$, as expressed by the operator $\triangleright_p$. The 'modifiers', the transition rules in the set $\mathrm{Mod}(M, w, p)$, are those rules that match the learning of $p$ and, moreover, will transform $(M, w)$ into a different model, i.e., some formula $\varphi$ will have changed its truth value. As a consequence of the algebraic properties of the update and side-effect operators of Section 3, the order in which the transition rules are processed does not matter and every 'modifying' rule gets applied at most once (by idempotency). So, no rules are applied over and over again. Apart from this, the above definition also works for general formulas instead of objective ones (cf. Baltag et al., 1998; van Ditmarsch 2000).
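The static clauses of Definition 4.2 (atoms, negation, conjunction and belief) can be read as a small recursive model checker. The following Python sketch is only meant to make those four clauses concrete; the tuple encoding of formulas and the function name `holds` are assumptions of the sketch, and the dynamic operator for communication actions is deliberately left out.

```python
def holds(val, rel, w, phi):
    """Evaluate a static formula at state w of a finite Kripke model.

    val: dict mapping state -> dict mapping proposition -> bool
    rel: dict mapping agent -> set of (u, v) pairs
    phi: nested tuples (hypothetical encoding):
         ("atom", p), ("not", f), ("and", f, g), ("B", agent, f)
    """
    op = phi[0]
    if op == "atom":
        # (M, w) |= p  iff  the valuation makes p true at w
        return val[w].get(phi[1], False)
    if op == "not":
        return not holds(val, rel, w, phi[1])
    if op == "and":
        return holds(val, rel, w, phi[1]) and holds(val, rel, w, phi[2])
    if op == "B":
        # (M, w) |= B_i f  iff  f holds in every state i considers possible from w
        _, agent, sub = phi
        return all(holds(val, rel, v, sub) for (u, v) in rel[agent] if u == w)
    raise ValueError(f"unknown operator: {op!r}")
```

A formula such as "agent `a` knows neither `p` nor its negation" is then written as a conjunction of two negated belief formulas and checked at the point of the model.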

5. EXAMPLES

In this section we discuss how the machinery developed above works out for a concrete example. As preparation, in order to keep the models within reasonable size, we employ two helpful tricks. The first one is to disregard propositions not known to any agent. Thus, if a proposition is not part of the model, then the interpretation is that no agent has any knowledge about it. What we then have to specify is how to add a fresh proposition to the model. We accomplish this by making two copies of the original states. One of them we label 'positive' and the other 'negative'. In the positive states, the proposition will be true, and in the negative states, the proposition will be false.

DEFINITION 5.1. Given a model $(M, w) = \langle S, \pi, R_1, \ldots, R_m \rangle$ and a fresh proposition $p$, we define the operation ADDATOM$_p$ such that $(M', w') = $ ADDATOM$_p(M, w) = \langle S', \pi', R'_1, \ldots, R'_m \rangle$ where

$S' = \mathit{pos}(S) \cup \mathit{neg}(S)$
$\pi'(\mathit{pos}(s))(q) = $ if $p = q$ then true else $\pi(s)(q)$
$\pi'(\mathit{neg}(s))(q) = $ if $p = q$ then false else $\pi(s)(q)$
$R'_i(s, t) \Leftrightarrow R_i(s, t)$ for any agent $i$
$w' = \mathit{pos}(w)$

We suppress straightforward technicalities regarding the restriction of the domain of the valuation $\pi$ or expansion of the set of propositions $P$.
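The ADDATOM operation can be sketched in Python as follows. The sketch assumes one particular reading of the clause for the accessibility relations, namely that an arrow between two original states is copied to all four combinations of their positive and negative copies; this reading is what makes the agents ignorant about the fresh proposition. The function name `add_atom` and the tuple tags `"pos"` and `"neg"` are hypothetical.

```python
def add_atom(states, val, rel, w, p):
    """Sketch of ADDATOM_p: duplicate every state into a positive and a
    negative copy that differ only in the truth value of the fresh atom p.

    states: iterable of state names
    val:    dict mapping state -> dict mapping proposition -> bool
    rel:    dict mapping agent -> set of (u, v) pairs
    w:      the point of the model
    Returns (states', val', rel', point').
    """
    pos = {s: ("pos", s) for s in states}
    neg = {s: ("neg", s) for s in states}

    val2 = {}
    for s in states:
        val2[pos[s]] = {**val[s], p: True}
        val2[neg[s]] = {**val[s], p: False}

    # Assumed reading: arrows ignore the pos/neg tag, so every original
    # arrow (u, v) yields arrows between all copies of u and all copies of v.
    rel2 = {
        i: {(cu, cv)
            for (u, v) in pairs
            for cu in (pos[u], neg[u])
            for cv in (pos[v], neg[v])}
        for i, pairs in rel.items()
    }
    return set(val2), val2, rel2, pos[w]
```

The new point is the positive copy of the old point, so the fresh proposition is true there, while every agent with an arrow from the old point also reaches a negative copy and therefore does not believe it.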


We have the following property.

LEMMA 5.1. Given a model $(M, w)$ and a fresh proposition $p$ such that $(M', w') = $ ADDATOM$_p(M, w)$, it holds that
(a) $(M', w') \models p$;
(b) $(M, w) \models \varphi \Leftrightarrow (M', w') \models \varphi$ for $p \notin \bar\varphi$, with $\bar\varphi$ the closure under subformulas of $\varphi$;
(c) $(M', w') \not\models B_i p$ for all agents $i$.

Proof. (a) Trivial, since we make the new point the positive copy of the old point. (b) We prove the stronger property $(M, s) \models \varphi \Leftrightarrow (M', s') \models \varphi$ for $p \notin \bar\varphi$, where $s'$ is a copy of $s$. The proof is by induction on the complexity of $\varphi$. If $\varphi$ is objective, then the property is trivial (since $p \notin \bar\varphi$). Now suppose $\varphi = B_i\psi$; then each state that is reachable in the resulting model is reachable if and only if it was reachable from a copy in the original model. By induction, we have the property for $\psi$, so it follows that we have it for $\varphi$. (c) It holds that $w' = \mathit{pos}(w)$ and we have some $w'' = \mathit{neg}(w)$ such that $R_a(w', w'')$. □

The second trick helps to short-cut the application of rules, which helps to keep the model at a reasonable size.

LEMMA 5.2. Given a model $(M, w)$, an agent $i \in A$ and a set of agents $B \subseteq A$, it holds that the model $(M', w')$ such that $(M, w)\langle L_i p; S_{B,i}\, p\rangle x \langle L_A p\rangle(M', w')$ for some model $x$, and the model $(M'', w'')$ such that $(M, w)\langle L_A p\rangle(M'', w'')$, are bisimilar.

Proof. The proof is a corollary of the following two properties:
1: $L_i p; L_A p = L_A p$
2: $S_{B,i}\, p; L_A p = L_A p$
The proof of these properties is similar to the ones we have seen before. Here we prove (1). Because in both cases the last operation is an $L_A$, we can easily see that only new states are reachable. A bisimulation is constructed by associating links between states that


are copies of each other in the original model. Checking bisimulation now is trivial. For (2) it is similar. Intuitively, for the first operation, again in the last step, only the new states are reachable. The removal of $a$ in the knowledge of $B$ in the first step is redundant because of the removal that happens in any case in the second step. □

That is to say, if an agent $i$ learns $p$ and then all other agents learn about $i$ that it has learned $p$, followed by the action where everyone learns $p$ (commonly), then it is equivalent to say that they have just learned $p$ commonly.

5.1. The SRA Three Pass protocol

Shamir, Rivest and Adleman have suggested the three-pass protocol (Clark and Jacob, 1997) for the transmission of a message under the assumption of a commutative cipher. It is known to be insecure and various attacks have been suggested. However, it serves an illustrative purpose here. The protocol has the following steps:

1: $a \to b : \{x\}_{k_a}$
2: $b \to a : \{\{x\}_{k_a}\}_{k_b}$
3: $a \to b : \{x\}_{k_b}$

Here, both agents $a$ and $b$ have their own symmetric and unshared encryption key, $k_a$ and $k_b$, respectively. Agent $a$ wants to send message $x$ to agent $b$ through an insecure channel and therefore wants to send $x$ encrypted to $b$. It does this by sending $x$ protected with its own key. Next, $b$ encrypts this message with $b$'s key and sends the result back. Since the encryption is assumed to be commutative, $a$ can now decrypt this message and send the result to $b$. Finally, $b$ can decrypt the message it has just received and learn the value of $x$.

In our modeling, we consider three agents $\{a, b, c\}$. It is assumed that all agents can see the activity of the network. In particular, they see messages being sent out and received. We are interested in what agent $c$ can learn during a run of this protocol between agents $a$ and $b$. We use the notation $m_K$, for a possibly empty set of agents $K$, to denote the message $m$ encrypted with the keys of all agents in the set $K$. We have, e.g., $m_{\{a,b\}} = \{\{m\}_{k_a}\}_{k_b}$. Since the cipher is commutative, this is well defined. Also, we write $S_{,a}\,\varphi$ instead of $S_{A,\{a\}}\,\varphi$, to express that all agents learn that agent $a$ knows about formula $\varphi$.
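To make the three message exchanges concrete, the following Python sketch runs the protocol with one well-known commutative cipher, modular exponentiation over a prime field. The choice of cipher, the small prime, and the names `encrypt`, `decrypt` and `three_pass` are assumptions of this sketch and are not taken from the paper; the parameters are toy values for illustration, and the sketch says nothing about the attacks mentioned above. It requires Python 3.8 or later for modular inverses via `pow`.

```python
from math import gcd

P = 2_147_483_647  # the prime 2**31 - 1, a toy modulus for illustration only


def encrypt(value, key):
    """Commutative encryption: exponentiation modulo the prime P."""
    return pow(value, key, P)


def decrypt(value, key):
    """Invert encrypt by exponentiating with the inverse of key modulo P - 1."""
    return pow(value, pow(key, -1, P - 1), P)


def three_pass(x, ka, kb):
    """Run the three passes and return what b recovers, plus the wire messages."""
    # Keys must be invertible modulo P - 1, otherwise decryption is undefined.
    assert gcd(ka, P - 1) == 1 and gcd(kb, P - 1) == 1
    step1 = encrypt(x, ka)        # 1: a -> b : {x}_ka
    step2 = encrypt(step1, kb)    # 2: b -> a : {{x}_ka}_kb
    step3 = decrypt(step2, ka)    # 3: a -> b : {x}_kb  (a strips its own key)
    recovered = decrypt(step3, kb)
    return recovered, (step1, step2, step3)
```

Commutativity is what makes the third message meaningful: removing the key of $a$ from the doubly encrypted value leaves exactly the value encrypted under the key of $b$ alone, so an observer such as agent $c$ sees three ciphertexts but never $x$ itself.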


Next, we define the transition rules. The first transition rule models the fact that agents can encrypt with their own key:

$B_j m_K \Rightarrow L_j m_{K+j};\ S_{,j}\, m_{K+j}$.

For simplicity, it is assumed that $j \notin K$. Thus, if an agent $j$ happens to learn the value of a message encrypted with the keys of the agents in the set $K$, then agent $j$ can encrypt the message received with its own key added (provided it was not used already). Moreover, as the other agents have seen that agent $j$ has received the message, the agents collectively learn that agent $j$ knows about the result after adding its key, as expressed by $S_{,j}\, m_{K+j}$. Complementarily, we have the transition rule

$B_j m_K \Rightarrow L_j m_{K-j};\ S_{,j}\, m_{K-j}$

for every agent $j \in A$. Now, it is assumed that the agent $j$ is among the agents in $K$. By commutativity of the cipher, agent $j$ can then (partially) decrypt the message $m_K$ and learn $m_{K-j}$, whereas all others know that agent $j$ can do this.

In the modeling, we limit ourselves by defining the list of useful propositions. The propositions we want to consider here are $P = \{m, m_a, m_b, m_{ab}\}$, where $m_a$ abbreviates $\{x\}_{k_a} = [\![\{x\}_{k_a}]\!]$ and $m_{ab}$ abbreviates $\{\{x\}_{k_a}\}_{k_b} = [\![\{\{x\}_{k_a}\}_{k_b}]\!]$. Recall that $[\![y]\!]$ denotes the real value of $y$, i.e., the value of the expression $y$ in the point of the model.

Next, we must represent the initial knowledge of the agents, i.e., their knowledge before the run of the protocol. We will assume that $a$ is the only agent that knows $m$ and $m_a$. Furthermore, we will assume that the other agents know this about $a$. The corresponding Kripke structure is in Figure 12. The SRA protocol can be captured by three public announcements $\mathrm{Pub}(a, m_a)$, $\mathrm{Pub}(b, m_{ab})$ and $\mathrm{Pub}(a, m_b)$. We are curious whether at the end of the protocol

Figure 12. Starting point.


(i) agent $b$ will know $m$; (ii) agent $a$ will know that agent $b$ knows $m$; (iii) agent $c$ only knows that agents $a$ and $b$ know about $m$.

The first step is executed. That is, $m_a$ is propagated on the network, so all agents will learn its value. So, we execute the action $\mathrm{Pub}(a, m_a)$. If we discard the states that become unreachable, this results in the model of Figure 13. Note that in this model $B_b m_a$ holds. This is the condition of one of the transition rules, that is, it triggers $B_b m_a \Rightarrow L_b m_{ab};\ S_{,b}\, m_{ab}$, since its antecedent now holds in the point. For processing the operation $L_b m_{ab}$, we notice that $m_{ab}$ is not modeled yet, so this is the first thing to do. We will not repeat $m_a$ in the figure since this holds in any state of the model. The operation ADDATOM$_{m_{ab}}$ results in the model of Figure 14.

Instead of applying the body of the transition rule $L_b m_{ab};\ S_{,b}\, m_{ab}$ to this model, we observe that in the next step of the protocol $L_A m_{ab}$ is executed, as the result of the action $\mathrm{Pub}(b, m_{ab})$, since the message is being transmitted to all agents on the network. So, with an appeal to Lemma 5.2, it is justified to skip the operations that are required by the transition rules and perform $L_A m_{ab}$ only. We arrive at the model in Figure 15. In turn, this results in the triggering of the transition rule $B_a m_{ab} \Rightarrow L_a m_b;\ S_{,a}\, m_b$. This case is completely analogous to the previous step of the protocol. Since the next action of the protocol is

Figure 13. After $\mathrm{Pub}(a, m_a)$.

Figure 14. Added $m_{ab}$.


$\mathrm{Pub}(a, m_b)$, the value of $m_b$ will be learned by all agents anyway. Again, we dismiss the $m_{ab}$ proposition since every agent has learned this. We introduce the proposition $m_b$ and execute $\mathrm{Pub}(a, m_b)$. So, we end up with the model in Figure 16. The last transition rule that is triggered is $B_b m_b \Rightarrow L_b m;\ S_{,b}\, m$. Again, we discard the proposition that holds in every state, which is $m_b$, and focus on the most interesting proposition $m$. First $b$ learns $m$, as dictated by the operation $L_b m$, which results in the model in Figure 17. Next, the second action for the transition rule is that all learn $B_b m \vee B_b \neg m$. If we execute this, we get the model $(M', w')$ which is depicted in Figure 18. Recall that this model $(M', w')$ is obtained from the initial model $(M, w)$ by application of the actions $\mathrm{Pub}(a, m_a)$, $\mathrm{Pub}(b, m_{ab})$, $\mathrm{Pub}(a, m_b)$ and associated transition rules.

Figure 15. After $\mathrm{Pub}(b, m_{ab})$.

Figure 16. After $\mathrm{Pub}(a, m_b)$.

Figure 17. After $L_b m$.

Figure 18. After $S_{,b}\, m$.

Moreover, in the resulting model $(M', w')$ it holds that (i) $B_b m$, (ii) $C_{ab}\, m$, and (iii) $\neg(B_c m \vee B_c \neg m)$.

5.2. The Wide-Mouthed Frog protocol

The next example we address, to illustrate the update machinery developed above, is the well-known Wide-Mouthed Frog protocol (see, e.g., Burrows et al. 1990; Abadi and Gordon 1999). The protocol exchanges a session key $k$ from the agent $a$ to another agent $b$ via a server $s$. Then, agent $a$ sends agent $b$ a message protected with the session key $k$. It is assumed that the agents $a$ and $b$ each share a symmetric key, $k_{as}$ and $k_{bs}$ say, with the server. The protocol can be described by

1: $a \to s : \{k\}_{k_{as}}$
2: $s \to b : \{k\}_{k_{bs}}$
3: $a \to b : \{m\}_k$

The keys $k_{as}$ and $k_{bs}$ are shared among $a$ and $s$, and among $b$ and $s$, respectively. The key $k$ is fresh and initially only known to agent $a$, as is message $m$. In the analysis we want to focus on the session key $k$ and the message $m$ it protects. Therefore the protocol is represented by the sequence of actions

$\mathrm{Priv}(a \to s, k);\ \mathrm{Priv}(s \to b, k);\ \mathrm{Pub}(a, \{m\}_k)$.

Thus, the security of the channel, based on the server keys $k_{as}$ and $k_{bs}$, is expressed by private rather than public communication. We assume that the 'ports' of the channel from $a$ to $b$ can be observed, but the ones for the communication with the server are not visible to other parties.


The initial knowledge is depicted in Figure 19. As transition rule we adopt

$B_b \{m\}_k \Rightarrow L_b m;\ S_{,b}\, m;\ S_{,b}\, k$

i.e., after $b$ has received the encrypted message $\{m\}_k$ it can learn its content $m$ and everybody learns that $b$ knows about it. Moreover, if agent $b$ is known to know about the message $m$, then it must know about the session key $k$ as well. Note that there are no transition rules dealing with the communication with the server. Execution of the first action $\mathrm{Priv}(a \to s, k)$ leads to an update of the knowledge of $s$. This is represented by a model with six (reachable) states in Figure 20. Similarly, but in a more complicated way, the execution of the second action $\mathrm{Priv}(s \to b, k)$ induces the model in Figure 21. The model gets more involved because, by assumption, the

Figure 19. Starting point.

Figure 20. After $\mathrm{Priv}(a \to s, k)$.

Figure 21. After $\mathrm{Priv}(s \to b, k)$.

learning of messages exchanged with the server is private. For example, agent $a$ is not aware of agent $b$ learning about the key $k$. So far, no transition rules have been triggered. Next, we execute the last step of the protocol, viz. the public communication $\mathrm{Pub}(a, \{m\}_k)$. For this we need to add the atom $m_k$, abbreviating $\{m\}_k = [\![\{m\}_k]\!]$. This doubles the number of states of the model. However, since $m_k$ will be known to all agents, its negative part can be discarded. Now, the transition rule gets activated. So, agent $b$ learns the content $m$ and the other agents learn that $b$ knows about $m$ and $k$, resulting in the final model in Figure 22. Since agent $b$ is learning twice in a row, the differences between Figures 21 and 22 are the absence of $b$-arrows to $\neg m$-states and between states with different values for $m$ and $k$. Typical properties of this model include $B_b(m \wedge k)$: agent $b$ knows the values of the message $m$ and session key $k$; $\neg C_{ab}\, m$: $m$ is not commonly known by agents $a$ and $b$; and $\neg B_c B_s k$: agent $c$ does not know that the server knows the session key $k$. That agents $a$ and $b$ do not share

Figure 22. After $\mathrm{Pub}(a, \{m\}_k)$.

the knowledge about the session key is debatable. One way out is to modify the transition rules and have the operation $L_{ab}\, m$ instead of $L_b m$.

6. CONCLUSION

Inspired by recent work on dynamic epistemic logics, we have proposed a logical language for describing (properties of) runs of security protocols. The language contains constructs for the three basic types of epistemic actions that happen during such runs. The semantics of the language is based on traditional Kripke models representing the epistemic state of the agents. Changes in the epistemic state of the agent system as a result of the execution of a protocol are described by means of transition rules that precisely indicate what belief updates happen under certain preconditions. These belief updates give rise to modifications of the models representing the agents' epistemic


state in a way that is precisely given by semantic operations on these models. We have illustrated our approach for two well-known security protocols, viz. the SRA Three Pass protocol and the Wide-Mouthed Frog protocol. The semantic updates we used operate on traditional Kripke models, as opposed to updates in the approaches of Gerbrandy and Baltag et al. We believe that this will make it less troublesome to integrate these updates into existing model checkers, which hopefully will lead to better and new tools for verifying properties of security protocols. However, for the development of the theory, it is important to establish the precise connection between the explicit approach followed here and the approach based on action models as advocated in Baltag et al. (1998), van Ditmarsch (2000), Baltag (2002), and Baltag and Moss (2004). A first step in this direction has been presented here, but many others will have to follow. Nevertheless, it points to a promising opportunity to establish a firm relationship between logical theory and security protocol analysis, to the benefit of the latter. Although future research will have to justify this, we are confident that our method, preferably with some form of computer assistance, can be employed for a broad class of verification problems concerning security protocols because of the flexibility of our approach using transition rules for epistemic updates.

ACKNOWLEDGEMENTS

We are grateful to the anonymous reviewers for their constructive comments and to Simona Orzan for various discussions on the subject. We are also indebted to Peter Lucas for his support to this research.

REFERENCES

Abadi, M. and A. Gordon: 1999, 'A calculus for cryptographic protocols: The spi calculus', Information and Computation 148, 1–70.
Abadi, M. and M. Tuttle: 1991, 'A semantics for a logic of authentication', in Proc. PODC'91, ACM, pp. 201–216.
Agray, N., W. van der Hoek, and E. P. de Vink: 2001, 'On BAN logics for industrial security protocols', in B. Dunin-Keplicz and E. Nawarecki (eds.), From Theory to Practice in Multi-Agent Systems, LNAI 2296, pp. 29–38.
Anderson, R. J.: 2001, Security Engineering: A Guide to Building Dependable Distributed Systems, Wiley.
Baltag, A.: 2002, 'A logic for suspicious players: Epistemic actions and belief-updates in games', Bulletin of Economic Research 54, 1–46.
Baltag, A. and L. S. Moss: 2004, 'Logics for epistemic programs', Synthese: Knowledge, Rationality and Action 139, 165–224.
Baltag, A., L. S. Moss, and S. Solecki: 1998, 'The logic of public announcements, common knowledge and private suspicions', in Itzhak Gilboa (ed.), Proc. TARK'98, pp. 43–56.
Bleeker, A. and L. Meertens: 1997, 'A semantics for BAN logic', in Proceedings DIMACS Workshop on Design and Formal Verification of Protocols, DIMACS, Rutgers University, http://dimacs.rutgers.edu/Workshops/Security.
Burrows, M., M. Abadi, and R. M. Needham: 1990, 'A logic of authentication', ACM Transactions on Computer Systems 8, 16–36.
Clark, J. A. and J. L. Jacob: 1997, 'A survey of authentication protocols 1.0', Technical Report, University of York.
Dolev, D. and A. C. Yao: 1983, 'On the security of public-key protocols', IEEE Transactions on Information Theory 29, 198–208.
Gerbrandy, J.: 1997, 'Dynamic epistemic logic', Technical Report LP-97-04, ILLC.
Gerbrandy, J.: 1999, 'Bisimulations on Planet Kripke', PhD thesis, ILLC Dissertation Series 1999-01, University of Amsterdam.
Hommersom, A. J.: 2003, 'Reasoning about security', Master's thesis, Universiteit Utrecht.
Kessler, V. and H. Neumann: 1998, 'A sound logic for analyzing electronic commerce protocols', in J.-J. Quisquater, Y. Deswarte, C. Meadows, and D. Gollman (eds.), Proc. ESORICS'98, LNCS 1485, pp. 345–360.
Kooi, B.: 2003, 'Knowledge, Chance, and Change', PhD thesis, ILLC Dissertation Series 2003-01, University of Groningen.
Lowe, G.: 1996, 'Breaking and fixing the Needham-Schroeder public-key protocol using FDR', Software - Concepts and Tools 17, 93–102.
Roorda, J.-W., W. van der Hoek, and J.-J. Ch. Meyer: 2002, 'Iterated belief change in multi-agent systems', in Proceedings of the First International Joint Conference on Autonomous Agents and Multi-Agent Systems: Part 2.
Schneier, B.: 2000, Secrets and Lies: Digital Security in a Networked World, Wiley.
Stubblebine, S. G. and R. N. Wright: 2002, 'An authentication logic with formal semantics supporting synchronization, revocation and recency', IEEE Transactions on Software Engineering 28, 256–285.
van Ditmarsch, H. P.: 2000, 'Knowledge games', PhD thesis, ILLC Dissertation Series 2000-06, University of Groningen.
van Ditmarsch, H. P.: 2001, 'The semantics of concurrent knowledge actions', in M. Pauly and G. Sandu (eds.), Proc. ESSLLI Workshop on Logic and Games, Helsinki.
Wedel, G. and V. Kessler: 1996, 'Formal semantics for authentication logics', in E. Bertino, H. Kurth, G. Martello, and E. Montolivo (eds.), Proc. ESORICS'96, LNCS 1146, pp. 219–241.

Arjen Hommersom
Nijmegen Institute for Computing and Information Sciences
University of Nijmegen
Nijmegen, The Netherlands



John-Jules Meyer
Institute of Information and Computing Sciences
University of Utrecht
Utrecht, The Netherlands

Erik de Vink
Department of Mathematics and Computer Science
Technische Universiteit Eindhoven
Leiden Institute of Advanced Computer Science
Leiden University
Leiden, The Netherlands


Index

ability, 77, 85
action, 77, 152
  axiom, see axiom
  conditional plan, 82
  epistemic, see epistemic
  games, 77
  in turns, 77
  joint, 282
  model, 4, 290, 310–313
  non-deterministic, 29
  one-step, 82
  past, 120
  payoff-relevant, 169
  precondition, 4
  probability vector, 155
  rule, see rule
  signature, 4, 39–42
  simultaneous, 77
agent, 77, 82, 85, 148, 149, 220, 263, 289, 290
  honest, 292
  intruder, 292
  rational, 133
agent system, see system
Aizerman’s axiom, see axiom
alternating
  -time temporal epistemic logic, see logic, ATEL
  -time temporal logic, see logic, ATL
  epistemic transition system, 77
  transition system, 77–78, 85–87
announcement, 3
  about announcements, 19
  of particular agents, 47
  private, 4, 28, 42, 46, 57
  public, 2, 15, 19, 41, 292
  rule, see rule
  types, 19
ATEL, see alternating-time temporal epistemic logic
ATL, see alternating-time temporal logic
ATL∗, see logic
axiom
  [π]-normality, 50
  □*_C-normality, 50
  □_A-normality, 50
  action knowledge, 5
  action mix, 50
  action-knowledge, 50, 53
  Aizerman’s, 70
  announcement, 54
  atomic permanence, 50
  epistemic mix, 50
  partial functionality, 50
  Ramsey, 51
BAN logic, see logic
Bayesian network, 235–238
Bayesian tree, 239–244
BDI, see logic
belief, 2, 80, 101, 167, 214, 246, 289, 290, 298
  change, 61–62
  contraction, 61, 64, 68
  degree of, 246
  dynamics of, 67–68
  formation, 61–65, 70–72
  posterior, 246
  prior, 169
  removal, see belief contraction
  revision, 61, 63–64
  set, 61
  update, see update
  weighted, 227
Bernoulli variables, 245
bisimulation, 24–26, 31, 45, 46, 113, 295, 305–307, 316
Boolean constrained propagation, 274
Chernoff property, 70
choice, 161
  collective, 82
  rational, 62, 70–72
circumscription, 191
CL, see coalition game logic
coalgebra, 26
coalition, 77
  effectivity, 84
coalition effectivity model, 77
coalition game logic, see logic, CL
cognition, 203–205
common
  cause, 235, 241
  knowledge, see knowledge
communication, 167, 176, 312
  honest, 182
  private, 321
  public, 321, 323
  successful, 186
communicative act, 167
computation, 80–81, 87
  neuron-like, 203
  subsymbolic, 203
  symbolic, 203
computation tree logic, see logic, CTL
Condorcet jury model, 235–238, 252
conjunctive normal form, 265, 272–274
connectionism, 216–217, 220, 225
connectionist
  architecture, 206
  network, see neural
convention, 174–177
cooperation, 263
CTL, see computation tree logic
CTL∗, see logic
decision
  jury, 235
  majority, 235, 237
decision procedure, 265
decision theory, 190
disjoint union, 15, 37
disjunction in the premises, 63
distributed system, 26

ECL, see extended coalition game logic
effectivity function, 80, 88–90
entropy, 159
epistemic
  action, 2
    cheating, 8
    deterministic, 6
    lying, 8, 36, 48
    model, 3
    private communication, 11
    suspicion, 8
    type, 22
  axiom, see axiom
  logic, see logic
  operator, 18
  proposition, 12
  relevance, 71
  state, 2
  value, 71
epistemic accessibility, 264
epistemic update, 12
equilibrium, 170, 206
  convention as, 174
  dynamic, 145, 180
  Nash, see Nash
  salient, 176
  selection, 169, 175
  separating, 187
  strict, 175
extended coalition game logic, see logic, ECL
extensive form, see game
fixed point, 275
game, 117, 271, 290
  asymmetric, 178
  beauty contest, 271
  cheap talk, 170
  coalitional, 80
  concurrent structure, 77, 87, 266
  cooperative, 180
  evolutionary asymmetric, 144
  extensive form, 79, 84, 117, 125
  form, 93
  frame, 82
  matching pennies, 80, 138
  model, 87
  multi-player, 77, 78, 82–84
  prisoner’s dilemma, 141, 146, 157
  repeated, 181
  sequential, 170
  signalling, 167, 169–174
  strategic form, 79, 82, 84, 91, 93, 137–139
  structure, 80, 263
  symmetric, 178
  theory, 77–80, 118
    evolutionary, 133, 136–147, 156–163, 167
  turn-based, 79
  von Neumann, 117, 118, 121–125
Grice’s maxim, see maxim
Hopfield model, 216
Hopfield network, 203, 206–209, 225
iconicity principle, 187
implicatures, 184
independence of alternatives, 70
inference, 61, 62
  defeasible, 62
  nonmonotonic, 62, 70
information, 293
  atomic, 6
  complete, 102, 182, 263
  completion, 121–124, 126
  economics of, 136
  exchange, 169
  flow, 79
  incomplete, 77, 80, 102, 113, 152, 263
  partition, 117
  player’s, 117
  private, 169
  relevant, 189
  set, 80, 117, 119
  specification, 207
  state, 205–208
  structure, 190
  transfer, 167
intention, 168
intentionality, 167
iteration, 37
key
  cryptographic, 289, 292
  symmetric, 317, 321
Kleene star, 17
knowledge, 1–2, 100–112, 154, 214, 243, 263, 292, 318
  base, 205
  changes of, 1
  collective, 82
  common, 2, 5–10, 22, 50, 53, 57, 100, 118, 122, 168, 169, 174, 186, 268, 271
  distributed, 100, 265, 268
  individual, 82
  past, 120
  perfect, 162
Kripke semantics, 1

language, 167, 174–177
  evolutionary stable, 187
  natural, 169
  organisation, 167
  use, 183–192
learnability, 193
learning, 133, 176, 188, 290, 313
  automata, 154–156
  conflict-based, 274
  cross, 152–154
  private, 292
  Q, 150, 152, 156
  reinforcement, 133, 147–161
  stochastic, 157
linguistic convention, 175
logic
  action, 79
  agents, 79
  ATEL, 77, 100–112, 263, 268–272, 283
  ATL, 77, 84–88, 90–99, 103–112, 263, 268
  ATL∗, 84–85
  BAN, 289
  BDI, 111
  CL, 77, 82–84, 88
  common knowledge of alternatives, 47
  conditional, 51
  CTL, 77, 80–81, 83, 84, 265, 268, 285
  CTLK, 266, 285
  default, 63, 214
  dynamic, 5
  dynamic doxastic, 2, 11
  dynamic epistemic, 2, 11, 289, 324
  ECL, 77, 84, 88
  epistemic, 4, 5, 13, 77, 80–81, 289, see also ATEL
    program, 44–45
  modal, 1, 13
    infinitary, 25
  multi-modal, 78, 88
  nonmonotonic, 203
  PDL, 21, 23, 111
  penalty, 227
  private announcement, 19
  propositional dynamic, 4
  public announcement, 18
  temporal, 78
lying, 8, 36, 48, 183
markedness, 220
Markov
  decision problem, 152
  decision process, 149
Markov condition, 235, 239, 242
Markov process, 156

maxim
  formalisation, 72–74
  Grice’s of conversation, 183–184, 186, 190
  pragmatics, 185
memory, 117–124, 193, 206, 209
  of past knowledge, 117
message, 289, 292
modality
  cooperation, 84, 263
  dynamic, 17
  tense, 268
model
  action, 27
  most plausible, 70
  program, 27
    epistemic, 28
  state, 12
model checking, 85, 101, 263
  bounded, 266
  symbolic, 263, 276–281
  unbounded, 263, 265, 276–281, 285
monotony
  cautious, 63
  cumulative, 63
multi-agent system, see system
mutation, 143, 159
Nash equilibrium, 113, 136, 139, 179
neural network, 203, 206
nonmonotonic inference, 211, 217–219
operator, 81
  autoepistemic, 69
  epistemic, 266, 268
  future, 124
  knowledge, 112, 124
  past, 124
  sometime, 85
  temporal, 84, 112
optimality theory, 203–205, 220–226
Pareto optimality, 139, 187, 188
partition, 80, 119
  choice, 119
  extended, 118–124
  information, see information
payoff, see utility, 137, 143
PDL, see propositional dynamic logic
phonology, 220
player, 82, 119
  rational, 136
policy, 149
Poole system, see system
pragmatics, 184
preference, 183, 215
prior
  belief, see belief
  probability, see probability
private announcement, see announcement
probability, 238, 244–246
  prior, 245
propositional dynamic logic, see logic, PDL
protocol, 291
  security, 289, 290, 324
  SRA three pass, 289, 316–321
  Wide-Mouthed Frog, 289, 321–325
public announcement, see announcement, 18
Q-principle, 189
Quantified Boolean Formula, 265, 272–274
question, 189–191
rational agent, 162, 167, 176
rational choice, see choice
rational player, 162
reasoning
  common sense, 72
  defeasible, 63
  everyday, 74
  formal, 73
recall
  imperfect, 102
  perfect, 86, 102, 117, 120
reductionism, 229

relation
  accessibility, 6, 12, 81, 103
  epistemic, 100
  equivalence, 100, 119
relativization, 2
replicator dynamics, 143–147, 180, 188
replicator equations, 142
reputation, 181
retention, see memory
reward, 149
rule
  action, 5, 50
  announcement, 54
  common knowledge induction, 53
  modal, 50, 54
  modus ponens, 50
  necessitation, 50
  pragmatic interpretation, 185
  program induction, 50
  update, see update
rule system, see system
SAT-solver, 276
security protocol, see protocol
selection, 143
selection function, 70
Sen’s property α, 70
sequential composition, 15, 35
signalling, 182
signalling game, see game
skip, 15, 46
stability, 177–181
state
  epistemic, 289, 324
  global, 263, 277
  initial, 267
  local, 291
state explosion, 264
strategy, 79, 82, 102, 137, 143, 153, 160, 167, 263, 269
  answer-, 189
  collective, 84, 86
  dominant, 138
  equilibrium, 191
  evolutionary stable, 136, 139–142, 177–181
  interpretation, 192
  local efficient, 180
  memoryless, 86
  mixed, 145, 170
  perfect recall, 86
  population, 134
  pure, 145, 170
  winning, 272
structure
  behavioural, 81
  computational, 81
symbolism, 216–217
synchronous, 91, 118
system
  agent, 133, 134
  alternating, see alternating
  connectionist, 203
  deterministic, 85
  dynamic, 133
  interpreted, 92
  multi-agent, 78, 137, 147–156, 263
  neural, 211
  open, 98
  Poole, 203, 214–216, 222
  reactive, 263
  resonance, 209
  rule, 203
  signalling, see game
  synchronous, 118
  transition, 81, 264, 269
update, 2, 15–18, 23, 31, 290
  belief, 298
  construction, 292–312
  operation, 3
  procedure, 207
  product, 27, 30–31
update rule, 155
update scheme, 159
utility, 79
verification, 263, 289, 290, 325
vote, 235, 243–246
