Lecture Notes in Artificial Intelligence Subseries of Lecture Notes in Computer Science Edited by J. G. Carbonell and J. Siekmann
Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen
1209
Lawrence Cavedon Anand Rao Wayne Wobcke (Eds.)
Intelligent Agent Systems Theoretical and Practical Issues Based on a Workshop Held at PRICAI '96 Cairns, Australia, August 26-30, 1996
Springer
Series Editors
Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Volume Editors
Lawrence Cavedon
Royal Melbourne Institute of Technology, Computer Science Department
124 La Trobe Street, Melbourne, Victoria 3000, Australia
E-mail: [email protected]
Anand Rao
Australian Artificial Intelligence Institute
Level 6, 171 La Trobe Street, Melbourne, Victoria 3000, Australia
E-mail: [email protected]
Wayne Wobcke
University of Sydney, Basser Department of Computer Science
Sydney, NSW 2006, Australia
E-mail: [email protected]
Cataloging-in-Publication Data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Intelligent agent systems: theoretical and practical issues; based on a workshop held at PRICAI '96, Cairns, Australia, August 26-30, 1996. Lawrence Cavedon ... (ed.). - Berlin; Heidelberg; New York; Barcelona; Budapest; Hong Kong; London; Milan; Paris; Santa Clara; Singapore; Tokyo: Springer, 1997 (Lecture notes in computer science; Vol. 1209: Lecture notes in artificial intelligence) ISBN 3-540-62686-7 NE: Cavedon, Lawrence [Hrsg.]; PRICAI <4, 1996, Cairns>; GT
CR Subject Classification (1991): I.2, D.2, C.2.4
ISBN 3-540-62686-7 Springer-Verlag Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1997
Printed in Germany
Typesetting: Camera ready by author
SPIN 10550374 06/3142 - 5 4 3 2 1 0
Printed on acid-free paper
Preface

This volume emanated from a workshop on the Theoretical and Practical Foundations of Intelligent Agents, held at the Fourth Pacific Rim International Conference on Artificial Intelligence in Cairns, Australia, in August, 1996. The aim of the workshop was to bring together researchers working on formal aspects of rational agency, novel agent architectures, and principles underlying the implementation of agent-based systems. The papers presented at the workshop were revised (in some cases quite substantially) following comments from referees and workshop participants. Further papers were solicited from a number of leading researchers in the area who also served on the programme committee of the workshop: John Bell, Yves Lespérance, Jörg Müller, and Munindar Singh. The papers in this volume have been grouped around three main topics: agent architectures, formal theories of rationality, and cooperation and collaboration. The papers themselves represented a broad cross-section of topics within these areas, including software agents, BDI architectures, 1 social commitment, believable agents, and Artificial Life. The workshop also included an invited presentation by Professor Rodney Brooks of Massachusetts Institute of Technology, and a panel discussion on the role of beliefs, desires, and intentions in the design of autonomous agents (not reproduced in this volume).
Agent architectures

This section contains an extensive survey of control architectures for autonomous agent systems (Müller) and papers covering issues related to agent architectures, including software agents (Cranefield and Purvis) and believable agents (Padgham and Taylor). Such issues are gathering increasing importance, particularly with the growing interest in distributed applications and animated interfaces. Also included is a paper on the use of a logic programming language (Golog) to build agent systems (Lespérance et al.). Golog represents a significant link between formal theory (for reasoning about action) and practice. Müller's article surveys various architectures proposed for reactive agents, deliberative agents, interacting agents, hybrid approaches (i.e., combining the reactive and deliberative approaches), as well as believable agents (i.e., agents with "personality"), software agents, and softbots. This article is in the same style as (but significantly extends) part of Wooldridge and Jennings's earlier survey article. 2 Lespérance, Levesque, and Ruman describe the use of the logic programming language Golog in building a software agent system to support personal banking
over networks. Golog is based on a situation calculus approach to reasoning about actions and their effects. Programming in Golog involves writing out the preconditions and expected effects of actions as axioms. The Golog interpreter constructs proofs from these axioms, which enable the prediction of the effects of various courses of action; these are then used to select the most appropriate action to perform. Cranefield and Purvis present a software agent approach to coordinating different software tools to perform complex tasks in a specified domain. Their architecture combines a planning agent with KQML-communicating agents that control the particular software tools. Goals (tasks requiring multiple subtasks) are sent to the planner, which plans a sequence of actions, each of which is performed by some tool agent. New tool agents are incorporated into the architecture by specifying the preconditions and effects of each action that can be performed in terms of the abstract view of the data on which the actions operate. Padgham and Taylor describe an architecture for believable agents formed by extending a BDI agent architecture with a simple model of emotions and personality. In their model, the way an agent reacts to goal successes and failures affects both the agent's choice of further actions and the depiction of that agent as an animated figure. Personality is a function of the emotions and "motivational concerns" of an agent, as well as how predisposed the agent is in reacting to those emotions.
1 A number of papers in this volume are based on a BDI framework, in which rational behavior is described in terms of an agent's beliefs, desires, and intentions, drawing particularly on the work of: Bratman, M.E. Intentions, Plans and Practical Reason, Harvard University Press, Cambridge, MA, 1987.
2 Wooldridge, M. and Jennings, N.R. "Intelligent agents: theory and practice," Knowledge Engineering Review, 10, 1995.
Formal theories of rationality

The papers in this section all concern the formal foundations of agents. Two papers address the formalization of the notion of commitment: in particular, the relationship of commitments to resource-boundedness (Singh) and to goal maintenance and coherence (Bell and Huang). A further two papers address issues related to formalizing "practical" as opposed to "idealized" rational agents: one paper presents a model of a "limited reasoner" (Moreno and Sales) while the other provides a model of a particular implemented system (Morley). The final paper in this section (van der Meyden) shows some interesting connections between logical specifications of knowledge-based protocols and their implementations. 3 These last three papers reflect a current trend towards research bridging the gap between theory and practice. Singh investigates the notion of commitment within a BDI framework. After reviewing the way in which such commitment is of benefit to resource-bounded agents, he discusses and formalizes an approach to precommitments. A precommitment is effectively a long-term commitment to an intention, one that is not normally reconsidered during the agent's deliberative reasoning. Adopting a precommitment has several consequences for an agent, e.g., the cost of satisfying
the corresponding commitment may be increased, or the option of adopting that commitment (in the future) may be ruled out altogether. The potential benefit to the agent is that deliberation may be less resource-intensive after the precommitment has been made. Bell and Huang also consider the concept of strength of commitment. They present a logical framework under which an agent's goals are arranged in a hierarchy depending on the strength of the agent's commitment to them. Within this framework, Bell and Huang address issues such as the mutual coherence of an agent's goals, and the revision of goals when multiple goals are not simultaneously achievable. Moreno and Sales present a model of a "limited reasoner" within an approach to agent design based on a syntactic view of possible worlds. An agent's inference method is implemented using a semantic tableau proof procedure; imperfect reasoning arises from limiting the class of rules used in proof construction. Morley uses a logic of events to provide a detailed formal model of a particular BDI agent architecture (Georgeff and Lansky's PRS 4). Morley's logic of actions and events is specifically designed to handle parallel and composite events, making it particularly suited to modeling multi-agent and dynamic environments. Van der Meyden investigates the automatic generation of finite state implementations of knowledge-based programs, i.e., programs whose action-selection is specified in terms of their "knowledge" (represented using modal logic) of their environment. 5 In particular, he defines a sufficient condition under which a finite state implementation of a knowledge-based program exists, under the assumption of perfect recall--the assumption that a program has full knowledge of its previous observations--and defines a procedure (using concepts from the formal analysis of distributed systems) that constructs an efficient finite-state implementation for such programs.
3 This is related to the work of: Rosenschein, S.J. and Kaelbling, L.P. "The synthesis of digital machines with provable epistemic properties," in Halpern, J.Y. (Ed.) Theoretical Aspects of Reasoning About Knowledge: Proceedings of the 1986 Conference, Morgan Kaufmann, Los Altos, CA, 1986.
Cooperation and collaboration

The papers in this section address the extension of models of single agents to multiple cooperating or competing agents. The first paper addresses the formalization of commitment between agents in a multi-agent setting (Cavedon et al.). The remaining papers adopt an experimental methodology: one paper presents some interesting effects in a multi-agent version of the Tileworld that arise from simplified communication between agents (Clark et al.); the second presents an approach to the prisoner's dilemma in an Artificial Life environment (Ito). Cavedon, Rao and Tidhar investigate the notion of social commitment: i.e., commitment between agents, as opposed to the "internal" commitment of an agent to a goal (e.g. as in Singh's paper). Cavedon et al. describe preliminary
work towards formalizing Castelfranchi's notion of social commitment 6 within a BDI logic. They relate the social commitments of a team to the internal commitments of the agents (though not totally reducing the former to the latter), and formally characterize a variety of social behaviors an agent may display within a team environment. Clark, Irwig, and Wobcke describe a number of experiments using simple BDI agents in the Tileworld, a simple dynamic testbed environment, investigating the "emergence" of benefits to agents arising from simple communication of an agent's intentions to other nearby agents. Their most interesting result is that under certain circumstances, the performance of individual agents actually improves as the number of agents increases, despite a consequent increase in competition for resources due to those agents. Ito adopts an Artificial Life approach to the iterated Prisoner's Dilemma game, and shows that the "dilemma" can be overcome in a setting involving disclosure of information: i.e., in which agents are required to honestly disclose their past social behavior. The experimental results show that a population of cooperative agents eventually dominates a population of more selfish agents under these conditions.
4 Georgeff, M.P. and Lansky, A.L. "Reactive reasoning and planning," Proceedings of the Sixth National Conference on Artificial Intelligence, 1987.
5 An extensive introduction to this approach is given in: Fagin, R., Halpern, J.Y., Moses, Y. and Vardi, M.Y. Reasoning About Knowledge, MIT Press, Cambridge, MA, 1995.
Acknowledgements

The editors would like to thank the organizers of the Fourth Pacific Rim International Conference on Artificial Intelligence (PRICAI'96) for their support of the workshop, and all the workshop participants. We would particularly like to thank the programme committee members. Cavedon and Rao were partially supported by the Cooperative Research Centre for Intelligent Decision Systems.
Programme Committee

John Bell, Queen Mary and Westfield College, UK
David Israel, SRI International, USA
Yves Lespérance, York University, Canada
Jörg Müller, Mitsubishi Electric Digital Library Group, UK
Ei-Ichi Osawa, Computer Science Laboratory, Sony, Japan
Munindar Singh, North Carolina State University, USA
Liz Sonenberg, University of Melbourne, Australia
Workshop Organizers

Lawrence Cavedon, Royal Melbourne Institute of Technology, Australia
Anand Rao, Australian Artificial Intelligence Institute, Australia
Wayne Wobcke, University of Sydney, Australia

6 Castelfranchi, C. "Commitments: from individual intentions to groups and organizations," Proceedings of the First International Conference on Multi-Agent Systems, 1995.
Contents
Control Architectures for Autonomous and Interacting Agents: A Survey (Invited paper) ........ 1
Jörg Müller

An Experiment in Using Golog to Build a Personal Banking Assistant (Invited paper) ........ 27
Yves Lespérance, Hector J. Levesque and Shane J. Ruman

An Agent-Based Architecture for Software Tool Coordination ........ 44
Stephen Cranefield and Martin Purvis

A System for Modelling Agents Having Emotion and Personality ........ 59
Lin Padgham and Guy Taylor

Commitments in the Architecture of a Limited, Rational Agent (Invited paper) ........ 72
Munindar P. Singh

Dynamic Goal Hierarchies (Invited paper) ........ 88
John Bell and Zhisheng Huang

Limited Logical Belief Analysis ........ 104
Antonio Moreno and Ton Sales

Semantics of BDI Agents and Their Environment ........ 119
David Morley

Constructing Finite State Implementations of Knowledge-Based Programs with Perfect Recall ........ 135
Ron van der Meyden

Social and Individual Commitment ........ 152
Lawrence Cavedon, Anand Rao and Gil Tidhar

Emergent Properties of Teams of Agents in the Tileworld ........ 164
Malcolm Clark, Kevin Irwig and Wayne Wobcke

How do Autonomous Agents Solve Social Dilemmas? ........ 177
Akira Ito
An Experiment in Using Golog to Build a Personal Banking Assistant*

Yves Lespérance 1, Hector J. Levesque 2, and Shane J. Ruman 2

1 Department of Computer Science, Glendon College, York University,
2275 Bayview Ave., Toronto, ON, Canada M4N 3M6
[email protected]
2 Department of Computer Science, University of Toronto
Toronto, ON, Canada M5S 1A4
{hector,shane}@ai.toronto.edu
Abstract. Golog is a new programming language based on a theory of action in the situation calculus that can be used to develop multi-agent applications. The Golog interpreter automatically maintains an explicit model of the agent's environment on the basis of user supplied axioms about the preconditions and effects of actions and the initial state of the environment. This allows agent programs to query the state of the environment and consider the effects of various possible courses of action before deciding how to act. This paper discusses a substantial multi-agent application developed in Golog: a system to support personal banking over computer networks. We describe the overall system and provide more details on the agent that assists the user in responding to changes in his financial situation. The advantages and limitations of Golog for developing multi-agent applications are discussed and various extensions are suggested.
1 Introduction
Golog is a new logic programming language for developing intelligent systems that are embedded in complex environments and use a model of the environment in deciding how to act [7, 4]. It is well suited to programming expert assistants, software agents, and intelligent robots. The language is based on a formal theory of action specified in an extended version of the situation calculus. The Golog interpreter automatically maintains an explicit model of the system's environment on the basis of user supplied axioms about the preconditions and effects of actions and the initial state of the environment. This allows programs to query the state of the environment and consider the effects of various possible courses of action before committing to a particular alternative. The net effect is that programs may be written at a much higher level of abstraction than is usually possible. A prototype implementation in Prolog has been developed.
* This research was made possible by financial support from the Information Technology Research Center, the Natural Science and Engineering Research Council, and the Institute for Robotics and Intelligent Systems.
In this paper, we discuss the most substantial experiment done so far in using Golog to develop an application. The application is a system that assists its users in doing their personal banking over computer networks. The system is realized as a collection of Golog agents that interact. Users can perform transactions using the system. They can also have the system monitor their financial situation for particular conditions and take action when they arise, either by notifying them or by performing transactions on their behalf. Currently, the system only works in a simulated financial environment and has limited knowledge of the domain. But with more than 2000 lines of Golog code, it is certainly more than a toy. The personal banking application made an excellent test domain for a number of reasons. A shift from branch-based to PC-based banking would have far reaching and largely positive effects for customers and banks alike. Most banks and other financial institutions are actively investigating PC-based banking and electronic commerce. However, successful implementations of home banking will undoubtedly require more flexibility and power than a simple client-monolithic server environment can provide. Software agents can fill this need. Financial systems need to be extremely flexible given the number of options available, the volatility of markets, and the diversity of the user needs. Software agents excel at this type of flexibility. Furthermore, the distributed nature of information in the financial world (i.e. banks do not share information, users do not trust outside services, etc.) requires that applications have a distributed architecture. As well, Golog's solid logical foundations and suitability for formal analysis of the behavior of agents are attractive characteristics in domains that involve financial resources. Finally, personal banking applications raise interesting problems which vary substantially in complexity; this experiment resulted in a system that has some interesting capabilities; it is clear that it could be extended to produce a very powerful application. In the next section, we outline the theory of action on which Golog is based. Then, we show how complex actions can be defined in the framework and explain how the resulting set of complex action expressions can be viewed as a programming language. In section 5, we present the overall structure of the personal banking system and then focus on the agent that assists the user in responding to changes in his financial situation, giving more details about its design. In the last section, we reflect on this experiment in the use of Golog: what were the advantages and limitations of Golog for this kind of multi-agent application? We also mention some extensions of Golog that are under development and address some of the limitations encountered.
2 A Theory of Action
Golog is based on a theory of action expressed in the situation calculus [11], a predicate calculus dialect for representing dynamically changing worlds. In this framework, the world is taken to be in a certain state (or situation). That state can only change as a result of an action being performed. The term do(a, s) represents the state that results from the performance of action a in state s. For
example, the formula ON(B1, B2, do(PUTON(B1, B2), s)) could mean that B1 is on B2 in the state that results from the action PUTON(B1, B2) being done in state s. Predicates and function symbols whose value may change from state to state (and whose last argument is a state) are called fluents. An action is specified by first stating the conditions under which it can be performed by means of a precondition axiom using a distinguished predicate Poss. For example,
Poss(CREATEALERT(alertMsg, maxPrio, monID), s) ≡
    ¬∃alertMsg', maxPrio'. ALERT(alertMsg', maxPrio', monID, s)        (1)

means that it is possible in state s to create an alert with respect to the monitored condition monID with the alert message alertMsg and the maximum priority maxPrio, provided that there is not an alert for the condition monID already; creating an alert leads the agent to send alert messages to the user with a degree of obtrusiveness rising up to maxPrio until it gets an acknowledgement. Secondly, one specifies how the action affects the world's state with effect axioms. For example,
Poss(CREATEALERT(alertMsg, maxPrio, monID), s) ⊃
    ALERT(alertMsg, maxPrio, monID, do(CREATEALERT(alertMsg, maxPrio, monID), s))

says that if the action CREATEALERT(alertMsg, maxPrio, monID) is possible in state s, then in the resulting state an alert is in effect with respect to the monitored condition monID and the user is to be alerted using the message alertMsg sent at a priority up to maxPrio. The above axioms are not sufficient if one wants to reason about change. It is usually necessary to add frame axioms that specify when fluents remain unchanged by actions. The frame problem [11] arises because the number of these frame axioms is of the order of the product of the number of fluents and the number of actions. Our approach incorporates a treatment of the frame problem due to Reiter [14] (who extends previous proposals by Pednault [12], Schubert [16] and Haas [3]). The basic idea behind this is to collect all effect axioms about a given fluent and assume that they specify all the ways the value of the fluent may change. A syntactic transformation can then be applied to obtain a successor state axiom for the fluent, for example:
Poss(a, s) ⊃ [ALERT(alertMsg, maxPrio, monID, do(a, s)) ≡
    a = CREATEALERT(alertMsg, maxPrio, monID) ∨
    ALERT(alertMsg, maxPrio, monID, s) ∧ a ≠ DELETEALERT(monID)].        (2)

This says that an alert is in effect for monitored condition monID with message alertMsg and maximum priority maxPrio in the state that results from action a being performed in state s iff either the action a is to create an alert with these attributes, or the alert already existed in state s and the action was not that of
canceling the alert on the condition. This treatment avoids the proliferation of axioms, as it only requires a single successor state axiom per fluent and a single precondition axiom per action. 3 It yields an effective way of determining whether a condition holds after a sequence of primitive actions, given a specification of the initial state of the domain.
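As a concrete illustration (added here, not in the original text), axioms like (1) and (2) can be rendered as Prolog clauses of the kind the implementation described later expects. The predicate names (poss/2, holds/2) and the exact clausal form below are assumptions made for this sketch, not the authors' actual code.

% Hypothetical clausal rendering of axioms (1) and (2).
% poss(A, S): action A is possible in situation S.
% Negation as failure (\+) plays the role of the negated existential in (1).
poss(create_alert(_Msg, _MaxPrio, MonID), S) :-
    \+ holds(alert(_, _, MonID), S).

% holds(F, S): fluent F holds in situation S.
% Successor state axiom (2), split into the "made true" case and the
% "persists unless deleted" (frame) case.
holds(alert(Msg, MaxPrio, MonID), do(create_alert(Msg, MaxPrio, MonID), _S)).
holds(alert(Msg, MaxPrio, MonID), do(A, S)) :-
    holds(alert(Msg, MaxPrio, MonID), S),
    A \= delete_alert(MonID).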
3 Complex Actions
Actions in the situation calculus are primitive and determinate. They are like primitive computer instructions (e.g., assignments). We need complex actions to be able to describe complex behaviors, for instance that of an agent. Complex actions could be treated as first class entities, but since the tests that appear in forms like if φ then δ1 else δ2 endIf involve formulas, this means that we would have to reify fluents and formulas. Moreover, it would be necessary to axiomatize the correspondence between these reified formulas and the actual situation calculus formulas. This would result in a much more complex theory. Instead we treat complex action expressions as abbreviations for expressions in the situation calculus logical language. They may thus be thought of as macros that expand into the genuine logical expressions. This is done by defining a predicate Do as in Do(δ, s, s') where δ is a complex action expression. Do(δ, s, s') is intended to mean that the agent's doing action δ in state s leads to a (not necessarily unique) state s'. Do is defined inductively on the structure of its first argument as follows:
- Primitive actions: Do(a, s, s') ≝ Poss(a, s) ∧ s' = do(a, s).
- Test actions: Do(φ?, s, s') ≝ φ[s] ∧ s = s'.
  (φ[s] denotes the situation calculus formula obtained from φ by restoring situation variable s as the suppressed situation argument.)
- Sequences: Do([δ1; δ2], s, s') ≝ ∃s* (Do(δ1, s, s*) ∧ Do(δ2, s*, s')).
- Nondeterministic choice of action: Do((δ1 | δ2), s, s') ≝ Do(δ1, s, s') ∨ Do(δ2, s, s').
- Nondeterministic choice of argument: Do(πx δ(x), s, s') ≝ ∃x Do(δ(x), s, s').
- Nondeterministic iteration: 4
  Do(δ*, s, s') ≝ ∀P{ ∀s1 [P(s1, s1)] ∧ ∀s1, s2, s3 [P(s1, s2) ∧ Do(δ, s2, s3) ⊃ P(s1, s3)] } ⊃ P(s, s').

There is another case to the definition that handles procedure definitions (including recursive ones) and procedure calls. The complete definition appears in [7]. As in dynamic logic [13], conditionals and while-loops can be defined in terms of the above constructs as follows:

if φ then δ1 else δ2 endIf ≝ [φ?; δ1] | [¬φ?; δ2],
while φ do δ endWhile ≝ [[φ?; δ]*; ¬φ?].

We also define an iteration construct for x: φ(x) that performs δ(x) for all x's such that φ(x) holds (at the beginning of the loop). 5

3 This discussion ignores the ramification and qualification problems; a treatment compatible with our approach has been proposed by Lin and Reiter [9].
4 We use second order logic here to define Do(δ*, s, s') as the transitive closure of the relation Do(δ, s, s') -- transitive closure is not first order definable. A first order version is used in the implementation of Golog, but it is insufficient for proving that an iterative program does not terminate.
5 for x: φ(x) is defined as:
  [proc P(Q) /* where P is a new predicate variable */
     if ∃y Q(y) then π y, R [(Q(y) ∧ ∀z(R(z) ≡ Q(z) ∧ z ≠ y))?; δ(y); P(R)] endIf
   endProc;
   π Q [∀z(Q(z) ≡ φ(z))?; P(Q)]]
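As a worked illustration (added here, not in the original), the conditional abbreviation expands as follows using the sequence, test, and nondeterministic choice cases above:

Do(if φ then δ1 else δ2 endIf, s, s')
  = Do([φ?; δ1] | [¬φ?; δ2], s, s')
  = Do([φ?; δ1], s, s') ∨ Do([¬φ?; δ2], s, s')
  = (φ[s] ∧ Do(δ1, s, s')) ∨ (¬φ[s] ∧ Do(δ2, s, s')),

where the intermediate situation introduced by the sequence case disappears because a test leaves the situation unchanged.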
4 Golog
The theoretical framework developed above allows us to define a programming language called Golog (alGOl in LOGic). A Golog program includes both a declarative component and a procedural component. The declarative component specifies the primitive actions available to the agent, when these actions are possible and what effects they have on the agent's world, as well as the initial state of the world. The programmer supplies:
- precondition axioms, one per primitive action,
- successor state axioms, one per fluent,
- axioms specifying what fluents hold in the initial state S0.
The procedural part of a Golog program defines the behavior of the agent. This behavior is specified using an expression in the language of complex actions introduced in the previous section (typically involving several procedure definitions followed by one or more calls to these procedures). Here's a simple example of a Golog program to get an elevator to move to the ground floor of a building:
proc DOWN(n)
  (n = 0)? | DOWNONEFLOOR; DOWN(n - 1)
endProc;
π m [ATFLOOR(m)?; DOWN(m)]

In the next section, we will see a much more substantial example. Golog programs are evaluated with a theorem prover. In essence, to execute a program, the Golog interpreter attempts to prove
Axioms ⊨ ∃s Do(program, S0, s),

that is, that there exists a legal execution of the program. If a (constructive) proof is obtained, a binding for the variable s = do(an, ..., do(a2, do(a1, S0))...) is extracted, and then the sequence of actions a1, a2, ..., an is sent to the primitive action execution module. The declarative part of the Golog program is used by the interpreter in two ways. The successor state axioms and the axioms specifying the initial state are used to evaluate the conditions that appear in the program (test actions and if/while/for conditions) as the program is interpreted. The action precondition axioms are used (with the other axioms) to check whether the next primitive action is possible in the state reached so far. Golog programs are often nondeterministic and a failed precondition or test action causes the interpreter to backtrack and try a different path through the program. For example, given the program (a; P?) | (b; c), the Golog interpreter might determine that a is possible in the initial state S0, but upon noticing that P is false in do(a, S0), backtrack and return the final state do(c, do(b, S0)) after confirming that b is possible initially and that c is possible in do(b, S0). Another way to look at this is that the Golog interpreter automatically maintains a model of the world state using the axioms and that the program can query this state at run time. If a program is going to use such a model, it seems that having the language maintain it automatically from declarative specifications would be much more convenient and less error prone than the user having to program such model updating from scratch. The Golog programmer can work at a much higher level of abstraction. This use of the declarative part of Golog programs is central to how the language differs from superficially similar "procedural languages". A Golog program together with the definition of Do and some foundational axioms about the situation calculus is a formal theory about the possible behaviors of an agent in a given environment. And this theory is used explicitly by the Golog interpreter. In contrast, an interpreter for an ordinary procedural language does not use its semantics explicitly. Nor do standard semantics of programming languages refer to aspects of the environment in which programs are executed [1]. Note that our approach focuses on high-level programming rather than plan synthesis at run-time. But sketchy plans are allowed; nondeterminism can be
used to infer the missing details. For example, the plan

while ∃b ONTABLE(b) do π b REMOVE(b) endWhile

leaves it to the Golog interpreter to find a legal sequence of actions that clears the table. Before moving on, let us clarify one point about the interpretation of Golog programs. The account given earlier suggests that the interpreter identifies a final state for the program before any action gets executed. In practice, this is unnecessary. Golog programs typically contain fluents that are evaluated by sensing the agent's environment. In the current implementation, whenever the interpreter encounters a test on such a fluent, it commits to the primitive actions generated so far and executes them, and then does the sensing to evaluate the test. One can also add directives to Golog programs to force the interpreter to commit when it gets to that point in the program. As well, whenever the interpreter commits and executes part of the program, it rolls its database forward to reflect the execution of the actions [8, 10]. We have developed a prototype Golog interpreter that is implemented in Prolog. This implementation requires that the program's precondition axioms, successor state axioms, and axioms about the initial state be expressible as Prolog clauses. Note that this is a limitation of the implementation, not the theory. For a much more complete presentation of Golog, its foundations, and its implementation, we refer the reader to [7]. Prior to the personal banking system documented here, the most substantial application developed in Golog had been a robotics one [4]. The robot's task was mail delivery in an office environment. A high-level controller for the robot was programmed in Golog and interfaced to a software package that supports path planning and local navigation. The system currently works only in simulation mode, but we are in the process of porting it to a real robot. Golog does not in itself support multiple agents or even concurrent processes; all it provides is a sequential language for specifying the behavior of a system. A straightforward way of using it to implement a multi-agent system however, is to have each agent be a Golog process (under Unix with its own copy of the interpreter), and provide these agents with message passing primitive actions that are implemented using TCP/IP or in some other way. This is what was done for the personal banking system described below.
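To make the execution model concrete, the following is a minimal, self-contained sketch (in Prolog, the implementation language mentioned above) of an interpreter fragment that runs programs according to the Do definitions of Section 3, applied to an elevator domain like the one above. It is an illustration written for these notes, not the authors' interpreter; all clause, fluent, and action names are assumptions chosen for the sketch.

% do(E, S, S1): situation S1 is a legal outcome of executing program E in S.
do(seq(E1, E2), S, S1)    :- do(E1, S, S2), do(E2, S2, S1).
do(choice(E1, E2), S, S1) :- do(E1, S, S1) ; do(E2, S, S1).
do(test(P), S, S)         :- holds(P, S).
do(star(_), S, S).                                    % zero iterations
do(star(E), S, S1)        :- do(E, S, S2), do(star(E), S2, S1).
do(while(P, E), S, S1)    :-                          % while = [(P?; E)*; not-P?]
    do(seq(star(seq(test(P), E)), test(neg(P))), S, S1).
do(A, S, do(A, S))        :- prim_action(A), poss(A, S).

% Elevator domain, in the spirit of the example above.
prim_action(down_one_floor).
poss(down_one_floor, S) :- floor(N, S), N > 0.        % precondition axiom

% Successor state axiom for the floor fluent, in clausal form.
floor(N, do(down_one_floor, S)) :- floor(M, S), N is M - 1.
floor(N, do(A, S))              :- A \= down_one_floor, floor(N, S).
floor(3, s0).                                         % initial state: floor 3

holds(at_ground, S) :- floor(0, S).
holds(neg(P), S)    :- \+ holds(P, S).

% ?- do(while(neg(at_ground), down_one_floor), s0, S).
% S = do(down_one_floor, do(down_one_floor, do(down_one_floor, s0))).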
5 The Personal Banking Application

5.1 System Overview
As discussed in the introduction, the multi-agent paradigm is well suited to the decentralized nature of network banking applications. In our experimental system, different agents correspond to the different institutions involved and/or to major subtasks to be handled by the system. The different types of agents in our system and the kind of interactions they have are represented in figure 1. We have:
Fig. 1. System components.
- personal banking assistant agents, which perform transactions under the direction of their user and monitor his account balances for problem conditions; when the agent detects such a condition, it tries to alert the user or resolve the problem on his behalf;
- bank agents, which perform operations on users' accounts at the agent's bank in response to requests; they also provide information on the types of accounts available at the bank;
- transfer facilitator agents, which take care of funds transfer between different institutions;
- router agents, which keep track of agents' actual network addresses and dispatch messages;
- automated teller machine agents, which provide a simple ATM-like interface to bank agents (not represented on the figure).
Figure 2 shows the system's graphical user interface, which is implemented by attaching C and Tcl/Tk procedures to Golog's primitive actions for the domain. Currently, the personal banking assistant agents are the most interesting part of the system and are the only agents to have any kind of sophisticated behavior. We describe their design in detail in the next section. A complete description of
the whole system appears in [15]. A number of assumptions and simplifications were made, chiefly regarding communication: messages exchanged between agents, users, and other system components always reach their destination in a relatively timely manner; communicating processes do not change their addresses without telling the router, and reliable connections can be made quickly and at any time. Furthermore, each agent assumes that the other agents and processes it relies on are available most of the time, and respond to queries and commands within certain time bounds (often very generous). We also ignore all issues associated with security and authentication. Undoubtedly, these assumptions would have to be lifted and overall robustness improved in order to move beyond our experimental prototype.

Fig. 2. System user interface.
5.2 The Personal Banking Assistant Agents
Monitoring account balances is the primary duty of the personal banking assistant (PBA) agent. The user specifies "monitors" by giving the account and institution, the limit at which the balance going below or above triggers action, and the action to be taken. The PBA refreshes the balances of monitored accounts by sending messages to the bank agents where the accounts are held.
The frequency of account balance queries depends on the user-specified volatility of the account. The PBA agent checks all monitors on an account when an institution replies with the current balance for the account. Two types of actions in response to a monitor trigger are currently supported: alerting its owner of the trigger, and actually resolving the trigger via account transfers. When the PBA is to resolve the trigger by itself, its action depends on whether the trigger involved an account balance going above a maximum limit or below a minimum limit. In the latter case, it arranges to bring all account balances up-to-date, and then chooses an account to transfer funds from, considering: rate of return, owner instructions on moving funds out of that account, minimum allowed balance in that account (user or bank restricted), and current account balance. In response to an account being above its maximum limit, the PBA first gathers information about accounts at various financial institutions. This is accomplished by asking the router agent to broadcast a message to all financial institutions containing its owner's investment profile (risk level, liquidity needs, and growth-versus-income needs) and requesting an account recommendation. The PBA then waits for account recommendations (the length of the wait depends on the overall urgency of monitors and can be controlled by the user). When the waiting period has elapsed, the agent examines all of the relevant account recommendations and chooses the best. If the user already has an account of this type at the given institution, the PBA orders a transfer into it; if not it opens an account of this type for the user and then orders the transfer. Some of the fluents that PBA agents use to model the world are:
- USERACCOUNT(type, bank, account, balance, lastUpdate, rateOfReturn, moveFunds, minBalance, penalty, refreshRate, s): the agent's owner has an account of type at bank in situation s,
- MONITOR(type, bank, account, limit, lowerOrHigher, priority, response, monID, s): the agent is monitoring account of type at bank in situation s,
- WAITINGUPDT(bank, account, s): the agent is expecting an update on account at bank in situation s,
- ALERT(alertMessage, maxPriority, monID, s): the agent must alert its owner that monitor monID has been triggered,
- ALERTSENT(medium, priority, timeStamp, monID, s): an alert has been sent via medium at timeStamp in situation s,
- ALERTACKNOWLEDGED(monID, s): the alert on monitor monID has been acknowledged by the agent's owner in situation s,
- MESSAGE(s): the last message read by the agent in situation s.
Note that many of these fluents, for example WAITINGUPDT, represent properties of the agent's mental state. This is because the agent must be able to react quickly to incoming messages from its owner or other agents. Golog does not support interrupts or any other mechanism that would allow the current process to be suspended when an event that requires immediate attention occurs
(although the more recent concurrent version of the language, ConGolog, does [5]). So to get reactive behavior from the agent, one must explicitly program it to monitor its communications ports, take quick action when a message arrives, and return to monitoring its ports. When a new message arrives, whatever associated actions can be performed immediately are done. The agent avoids waiting in the middle of a complex action whenever possible; instead, its mental state is altered using fluents such as WAITINGUPDT. This ensures that subsequent events are interpreted correctly, timely reaction to conditions detected occurs (e.g., sending out a new alert when the old one becomes stale), and the agent is still able to respond to new messages. Among the primitive actions that PBA agents are capable of, we have:
- SENDMESSAGE(method, recipient, message): send message to recipient via method,
- STARTWAITINGUPDT(bank, account): note that the agent is expecting an update on account from bank,
- STOPWAITINGUPDT(bank, account): note that the agent is no longer expecting an update on account from bank,
- CREATEALERT(message, maxPriority, monID): note that the user must be alerted with message and keep trying until maxPriority is reached,
- SENDALERT(priority, message, medium, monID): send an alert to the user with priority via medium containing message,
- DELETEALERT(monID): stop attempting to alert the user that monitor monID was triggered,
- READCOMMUNICATIONSPORT(channel): check the communications channel (TCP/IP, e-mail, etc.) for messages and set the fluent MESSAGE appropriately,
- UPDATEBALANCE(bank, account, balance, return, time): update the balance on the user's account at bank.

PBA agents are programmed in Golog as follows. First, we provide the required axioms: a precondition axiom for each primitive action, for instance, axiom (1) from section 2 for the action CREATEALERT; a successor state axiom for each fluent, for instance, axiom (2) for ALERT; and axioms specifying the initial state of the system -- what accounts the user has, what conditions to monitor, etc. 6 Then, we specify the behavior of the agent in the complex actions language. The main procedure the agent executes is CONTROLPBA:
6 In this case, the axioms were translated into Prolog clauses by hand. We have been developing a preprocessor that does this automatically. It takes specifications in a high level notation similar to Gelfond and Lifschitz's A [2]. Successor state axioms are automatically generated from effect axioms.
proc CONTROLPBA
  while TRUE do
    REFRESHMONITOREDACCTS;
    HANDLECOMMUNICATIONS(TCP/IP);
    GENERATEALERTS
  endWhile
endProc

Thus, the agent repeatedly does the following: first request balance updates for all monitored accounts whose balance has gotten stale, second process all new messages, and third send out new messages alerting the user of monitor triggers whenever a previous alert has not been acknowledged after an appropriate period. The procedure for requesting balance updates for monitored accounts appears below:

proc REFRESHMONITOREDACCTS
  for type, where, account, lastUpdate, refreshRate :
      USERACCOUNT(type, where, account, _, lastUpdate, _, _, _, _, refreshRate) ∧
      MONITOR(type, where, account, _, _, _, _, _) ∧
      STALE(lastUpdate, refreshRate) ∧ ¬WAITINGUPDT(where, account) [
    /* ask for balance */
    COMPOSEREQUESTFORINFORMATION(type, request)?;
    SENDMESSAGE(TCP/IP, where, request);
    STARTWAITINGUPDT(where, account)
  ]endProc

Note how the fluent WAITINGUPDT is set when the agent requests an account update; this ensures that the request will only be made once. The following procedure handles the issuance of alert messages:

proc GENERATEALERTS
  for msg, maxPriority, monID : ALERT(msg, maxPriority, monID) [
    if ∃ medium, lastPriority, time (
         ALERTSENT(medium, lastPriority, time, monID) ∧
         ¬ALERTACKNOWLEDGED(monID) ∧
         STALE(lastPriority, medium, time) ∧ lastPriority < maxPriority)
    then π p, newMedium, lastPriority [
           (ALERTSENT(_, lastPriority, _, monID) ∧ p is lastPriority + 1 ∧
            APPROPRIATEMEDIUM(p, newMedium) ∧ WORTHWHILE(newMedium))?;
           SENDALERT(p, msg, newMedium, monID)]
    endIf
  ]endProc

It directs the agent to behave as follows: for every monitor trigger of which the user needs to be alerted, if the latest alert message sent has not been acknowledged and is now stale, and the maximum alert priority has not yet been reached, then send a new alert message at the next higher priority via an appropriate medium.
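The escalation condition that GENERATEALERTS tests is only described in prose above. The following is a hypothetical Prolog sketch of that test over ground facts; the predicate names, argument order, staleness rule, and sample facts are assumptions invented for illustration, not code from the system.

% Hypothetical sketch of the alert-escalation test in GENERATEALERTS.
:- dynamic alert_acknowledged/1.

alert(m1, 'balance below limit', 3).     % monitor id, message, maximum priority
alert_sent(m1, email, 1, 100).           % monitor id, medium, priority, timestamp
now(400).
stale(SentAt) :- now(T), T - SentAt > 200.   % illustrative staleness rule

% should_escalate(MonID, NextPrio): a fresher, higher-priority alert is due.
should_escalate(MonID, NextPrio) :-
    alert(MonID, _Msg, MaxPrio),
    alert_sent(MonID, _Medium, LastPrio, SentAt),
    \+ alert_acknowledged(MonID),
    stale(SentAt),
    LastPrio < MaxPrio,
    NextPrio is LastPrio + 1.

% ?- should_escalate(m1, P).    % P = 2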
proc HANDLECOMMUNICATIONS(channel)[
  READCOMMUNICATIONSPORT(channel);
  while ¬EMPTY(MESSAGE) do [
    if TYPE(MESSAGE) = STARTMONITOR ∧ SENDER(MESSAGE) = PERSONALINTERFACE then
      π type, bank, account, limit, lh, prio, resp, monID [
        ARGS(MESSAGE) = [type, bank, account, limit, lh, prio, resp, monID]?;
        STARTMONITOR(type, bank, account, limit, lh, prio, resp, monID)]
    else if TYPE(MESSAGE) = ACKNOWLEDGEALERT ∧ SENDER(MESSAGE) = PERSONALINTERFACE then
      π monID [ARGS(MESSAGE) = [monID]?; ACKNOWLEDGEALERT(monID)]
    else if TYPE(MESSAGE) = UPDATECONFIRMATION ∧ ISBANK(SENDER(MESSAGE)) then
      π from, account, amount, rate, time [
        ARGS(MESSAGE) = [from, account, amount, rate, time]?;
        HANDLEUPDATECONFIRMATION(from, account, amount, rate, time)]
    else if TYPE(MESSAGE) = UPDATEREJECT ∧ ISBANK(SENDER(MESSAGE)) then
      LOG(UPDATEREJECT, SENDER(MESSAGE), ARGS(MESSAGE))
    /* other cases handling message types STOPMONITOR, */
    /* TRANSFERCONFIRMATION, ACCOUNTOPENED, and */
    /* RECOMMENDEDACCOUNT are omitted */
    endIf
    READCOMMUNICATIONSPORT(channel)
  ]endWhile
]endProc

Fig. 3. Main message handling procedure.
40 p r o e HANDLEUPDATECONFIRMATION(bank,.account,amount, rate, time)[ UPDATEBALANCE(bank,account, amount, rate, time);
STOPWAITINGUPDT(bank,account); /* check monitors on the account */ for type, limit, lh, prio, resp, monI D: MONITOR(_,bank, account, limit, lh, prio, resp, monI D )[ if (lh = LOWER A amount < limit A ~ALERT(_, _, monlD)) t h e n if resp ----SOLVEt h e n
SOLVELowBALANCE(bank, account, amount, limit, lh, prio, monI D ) else
ALERTLowBALANCE(bank,account, amount, limit, lh, prio, monI D ) endIf else if (lh = HIGHER A amount > limit A ~ALERT(_, _, monID) t h e n if resp = SOLVE t h e n
SOLVEHIGHBALANCE(bank, account, amount, limit, lh, prio, monI D ) else ALERTHIGHBALANCE(bank,account, amount, limit, lh, prio, monI D ) endIf else if ALERT( . . . . monlD) A ((lh = LOWER A amount >_limit) V (lh = HIGHER A amount g limit)) t h e n HANDLEMONITORUNTRIP( resp, monl D ) e n d I f ]]endProc Fig. 4. Procedure treating update confirmation messages.
agent picks the account with the highest score based on the account's m i n i m u m balance and associated penalties, its rate of return, the user-specified mobility of funds in the account, whether the account's balance is sufficient, and whether a monitor exists on the account and what its limit is. If the score of the chosen account is above a certain limit that depends on the monitor's priority, a transfer request is issued, otherwise the user is alerted. It would be interesting to try to generalize this to handle cases where one must transfer funds from several accounts to solve the problem.
6
Discussion
One conclusion t h a t can be drawn from this experiment in developing the personal banking assistance system (2000 lines of Golog code), is that it is possible to build sizable applications in the language. How well did Golog work for this? Golog's mechanism for building and maintaining a world model proved to be easy to use and effective. It provides much more structure to the p r o g r a m m e r for doing this - - precondition and successor state axioms, etc. - - than ordinary p r o g r a m m i n g or scripting languages, and guarantees soundness and expressiveHess.
41
p r o c SOLVELoW BALANCE(bank, account, amount, limit, prio, m o n l D ) 7r bank F r om, account Frorn, score, amt Req [ CHOOSEBEsTACCOUNT(bankFrom, accountFrom, score, amtReq, bank, account, monID)?; if score > TRANSFERLIMIT(prio) t h e n TRANSFERFUNDS(bankFrom, accountFrom, bank, account, amtReq) else 7r msg [ COMPOSEMsG(FAILEDToSOLVE,bank, account, amt Req, mon I D , msg) ? ; CREATEALERT( msg, prio, mon l D )] endIf ]endProc Fig. 5. Procedure to resolve a low balance condition.
The current implementation of the banking assistance system did not make much use of Golog's lookahead and nondeterminism features. But this m a y be because the problem solving strategies used by its agents are relatively simple. For instance, if we wanted to deal with cases where several transactions are necessary to bring an account balance above its limit, the agent would need to search for an appropriate sequence of transactions. We are experimenting with Golog's nondeterminisin as a mechanism for doing this kind of "planning". We also used Golog's situation calculus semantics to perform correctness proofs for some of the PBA agent's procedures, for instance REFRESHMONITOREDACCTS; the proofs appear in [15]. This was relatively easy, since there was no need to move to a different formal framework and Golog programs already include much of the specifications that are required for formal analysis. From a software engineering point of view, we found Golog helpful in that it encourages a layered design where Golog is used to program the knowledgebased aspects of the solution and C or T c l / T k procedures attached to the Golog primitive actions handle low-level details. On the negative side, we found a lot of a r e a s where Golog still needs work. Some things that programmers are accustomed to be able to do easily are tricky in Golog: performing arithmetic or list processing operations, assigning a value to a variable without making it a fluent, etc. The language does not provide as much support as it should for distinctions such as fluent vs. non-fluent, fluents whose value is updated using the successor state axioms vs. sensed fluents, the effects of actions on fluents vs. their effects on unmodeled components of the state, etc. The current debugging facilities are also very limited. Some standard libraries for things like agent communication and reasoning about time would also be very useful. Perhaps the most serious limitation of Golog for agent p r o g r a m m i n g applications is t h a t it does not provide a natural way of specifying event-driven, reactive behavior. Fortunately, an extended version of the language called ConGolog which supports concurrent processes, priorities, and interrupts is under
42 development. This extended language allows event-driven behavior to be specified very naturally using interrupts. In [5], we describe how ConGolog could be used to develop a simple meeting scheduling application. Finally, there are some significant discrepancies between the Golog implementation and our theories of agency in the way knowledge, sensing, exogenous events, and the relation between planning and execution are treated. We would like to develop an account that bridges this gap. The account of planning in the presence of sensing developed in [6] is a step towards this.
References 1. Michael Dixon. Embedded Computation and the Semantics of Programs. PhD thesis, Department of Computer Science, Stanford University, Stanford, CA, 1991. Also appeared as Xerox PARC Technical Report SSL-91-1. 2. M. Gelfond and Lifschitz. Representing action and change by logic programs. Journal of Logic Programming, 17(301-327), 1993. 3. Andrew R. Haas. The case for domain-specific frame axioms. In F.M. Brown, editor, The Frame Problem in Artificial Intelligence: Proceedings of the 1987 Workshop, pages 343-348, Lawrence, KA, April 1987. Morgan Kaufmann Publishing. 4. Yves Lesp~rance, Hector J. Levesque, F. Lin, Daniel Marcu, Raymond Reiter, and Richard B. Scherl. A logical approach to high-level robot programming - a progress report. In Benjamin Knipers, editor, Control of the Physical World by Intelligent Agents, Papers from the 1994 A A A I Fall Symposium, pages 109-119, New Orleans, LA, November 1994. 5. Yves Lesp~rance, Hector J. Levesque, F. Lin, Daniel Marcu, Raymond Reiter, and Richard B. Scherl. Foundations of a logical approach to agent programming. In M. Wooldridge, J. P. Miiller, and M. Tambe, editors, Intelligent Agents Volume II Proceedings of the 1995 Workshop on Agent Theories, Architectures, and Languages (ATAL-95), Lecture Notes in Artificial Intelligence, pages 331-346. Springer-Verlag, 1996. 6. Hector J. Levesque. What is planning in the presence of sensing? In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1139-1146, Portland, OR, August 1996. 7. Hector J. Levesque, Raymond Reiter, Yves Lesp~rance, Fangzhen Lin, and Richard B. Scherl. GOLOG: A logic programming language for dynamic domains. To appear in the Journal of Logic Programming, special issue on Reasoning about Action and Change, 1996. 8. Fangzhen Lin and Raymond Reiter. How to progress a database (and why) I. logical foundations. In Jon Doyle, Erik Sandewall, and Pietro Torasso, editors, Principles of Knowledge Representation and Reasoning: Proceedings of the Fourth International Conference, pages 425-436, Bonn, Germany, 1994. Morgan Kaufmann Publishing. 9. Fangzhen Lin and Raymond Reiter. State constraints revisited. Journal of Logic and Computation, 4(5):655-678, 1994. 10. Fangzhen Lin and Raymond Reiter. How to progress a database Ih The STRIPS connection. In Chris S. MeUish, editor, Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pages 2001-2007, Montreal, August 1995. Morgan Kaufmarm Publishing. -
-
43 11. John McCarthy and Patrick Hayes. Some philosophical problems from the standpoint of artificial intelfigence. In B. Meltzer and D. Michie, editors, Machine Intelligence , volume 4, pages 463-502. Edinburgh University Press, Edinburgh, UK, 1979. 12. E. P. D. Pednault. ADL: Exploring the middle ground between STRIPS and the situation calculus. In R.J. Brachman, H.J. Levesque, and R. Reiter, editors, Proceedin9s of the First International Conference on Principles of Knowledge Representation and Reasoning, pages 324-332, Toronto, ON, May 1989. Morgan Kaufmann Publishing. 13. V.R. Pratt. Semantical considerations on Floyd-Hoare logic. In Proc. of the 17th IEEE Symp. on Foundations of Computer Science, pages 109-121, 1976. 14. Raymond Reiter. The frame problem in the situation calculus: A simple solution (sometimes) and a completeness result for goal regression. In Vladimir Lifschitz, editor, Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy, pages 359-380. Academic Press, San Diego, CA, 1991. 15. Shane J. Ruman. GOLOG as an agent-programming language: Experiments in developing banking applications. Master's thesis, Department of Computer Science, University of Toronto, Toronto, ON, 1996. 16. L.K. Schubert. Monotonic solution to the frame problem in the situation calculus: An efficient method for worlds with fully specified actions. In H.E. Kyberg, R.P. Loui, and G.N. Carlson, editors, Knowledge Representation and Defeasible Reasoning, pages 23-67. Kluwer Academic Press, Boston, MA, 1990.
A System for Modelling Agents Having Emotion and Personality
Lin Padgham, Guy Taylor
Department of Computer Science, RMIT University, Melbourne, Australia
email: [email protected], [email protected]
1 Introduction
There is currently a widespread interest in agent-oriented systems, partly because of the ability of such systems to model and successfully capture complex functionality in a robust and flexible way, suitable for difficult applications. There has also been significant interest in going beyond the rational model of agents, such as that evidenced in the literature modelling agents as systems of beliefs, desires and intentions (BDI models, e.g. [IGR92, GI89]), to include aspects such as emotions and personality (e.g. [Bat94, BLR92b, Bat93, BLR92a, Slo81]). Motivations for modelling "broader" agents than simply the rational logical agents of BDI architectures are several. Some consider it necessary to incorporate human aspects such as personality and emotion in order to make agents more engaging and believable, so that they can better play a role in various interactive systems involving simulation. Entertainment is one obvious application area for such simulation systems, but another is education and training. A simulator that was able to realistically model emotional reactions of people could, for instance, be used in training programs for staff who must deal with the public. Some people also believe that emotions play a functional role in the behaviour of humans and animals, particularly behaviour as part of complex social systems (e.g. [Tod82]). Certainly the introduction of emotions, and their interaction with goals, at various levels, increases the complexity of the agents and social systems that can be modelled. Our belief is that a successful modelling of emotion will enable us to come closer to the goal of building software agents which approach humans in their flexibility and ability to adapt and survive in complex, changing and unpredictable environments. However, significant work is needed before we can expect to understand the functional role of emotion sufficiently to successfully model it in our software agents. In this paper we explore some of the ways that we believe emotions and personality interact with goal oriented behaviour, and we describe some of the simplifications we have made in order to build an initial interactive environment
for experimentation with animated agents simulating personality and emotions as well as incorporating rational goal directed behaviour.
2 Interactions of emotions, behaviour, goals
In considering the behaviour of human agents, it is evident that emotions affect behaviour in a number of ways. One of the most straightforward ways, and one modelled in systems such as the woggles of oz-world [Bat94] and the agents of the virtual theatre project [HR95], is that emotion modifies the physical behaviour of agents. A happy agent moves faster, and more bouncily, while a sad agent is slower and flatter in its movements. In addition, emotions can clearly affect an agent's goals, hence affecting their actions. An agent that is very scared is likely to drop a goal to explore its surroundings in favour of a goal to move to a secure position. Emotional effects on goals can be via reordering, or re-prioritising, existing goals, or by introducing completely new goals. Some goals may be introduced simply to manage extreme emotional states. An agent which is feeling intense anger may instantiate a goal to harm the subject of its anger, in order to alleviate the emotional pressure. In addition to emotions affecting existence and prioritisation of goals, goals, and their success or failure, can affect emotional states. An agent which experiences a goal failure may feel unhappy, while one experiencing goal success may feel glad. Dyer [Dye87] develops a comprehensive lexicon of emotional states, based on goal success and failure. For example an agent which expects its goal to succeed will feel hopeful, an agent whose goal is achieved by the action of another agent will feel grateful, and an agent who expects its goal to be thwarted will feel apprehensive. Figure 1 shows examples of some emotions, indexed by contributing cause. Frijda and Swagerman [FS87] postulate emotions as processes which safeguard long-term persistent goals or concerns of the agents, such as survival, a desire for stimulation, or a wish to avoid cold and damp. These long term concerns differ from the usual goals of rational agent systems in that they are not things which the agent acts to achieve. Rather they are states which are continually important, and if threatened, cause sub-goals to be instantiated. According to Frijda the process which instantiates appropriate sub-goals is emotion. The earlier work of Toda [Tod82] also postulates emotions as processes which affect the rational system of the agent, and which are based on basic urges. He groups these urges into emergency urges, biological urges, cognitive urges and social urges. These are similar in principle to the long-term concerns of Frijda and Swagerman. When activated, they focus rational activity and determine the extent to which various competing issues are addressed. Emotions are seen as varying in intensity where the intensity level is an important factor in determining the effect on the rational processing of the agent.
emotion(x)     status   to   goal-situation    by   mode
happy          pos           G(x) achieved
sad            neg           G(x) thwarted
grateful       pos      y    G(x) achieved     y
hopeful        pos           G(x) achieved          expect
disappointed   neg           G(x) thwarted          expect achieved
guilty         neg      y    G(y) thwarted     x
Fig. 1. Some emotions and their causes
Emotions can also be seen as processes by which agents manipulate each other in social situations, to enable achievement of their goals. By making my child feel grateful (by acting to help them achieve their goal) I increase the chance that the child will act to achieve my goals. In the emotional model which we are developing it is necessary to capture in some way both the causes of emotion and the effects of emotion. At the most abstract level, we see emotions as being caused by a combination of the state of the world (including self and other agents) and the values of the agent. The emotional reaction caused in an agent by goal success or failure (an event in the world), will depend to some extent on the importance of that goal to the agent. An agent may have a number of motivational concerns, which reflect what is important to that agent. Threats and opportunities associated with these concerns will then generate emotions, as will success and failure of goals associated with these concerns. Emotions can affect behaviour directly, or indirectly, via the rational system. Direct effects on behaviour include such things as affecting facial expressions, stance and movement, or causing a relatively instantaneous action, as in the case of flight caused by extreme fear. Effects via the rational system include such things as re-prioritising goals, adding goals and deleting goals. An agent experiencing gratitude might delete (or prioritise lower) those goals which conflict with the goals of the agent to whom it is grateful. Or it may instantiate new goals in order to achieve directly the goals of the agent to whom gratitude is felt. Clearly there are complex interactions between goals and emotions, which involve both effect of goals on emotions, and emotions on goals. In order to explore
and increase our understanding of the dynamics of emotion, personality and goal directed behaviour we have implemented an interactive system for experimentation with various agent groupings.
3 Simplified emotional model
Our goal is to eventually develop an implemented version of a complex model of emotion and personality, incorporating all of the aspects discussed above. However in order to make a start, and to study via an implemented system, the interaction between the emotional subsystem, the cognitive subsystem, and the behavioural subsystem, we have initially developed a rather simple model of emotion. In this model emotions are seen as existing in pairs of opposites 1 such as pride and shame, happiness and sadness, love and hate. They are represented as a gauge, with a neutral point about which the emotion fluctuates in a positive or negative direction. Each agent has a specified threshold for the positive and the negative emotion. When this threshold is crossed, the agent is considered to have that emotion. The fluctuation of the emotional gauge is caused either by events which happen, or by the passage of time. The latter is relevant only when no events are actively influencing the emotion under consideration. This allows an activated emotional state to decay over time, returning eventually to the neutral state, if no events maintain the active state of the emotion. The events having an influence on emotional state can be of several different types. The first are events which universally and directly affect the emotional state. In this case the event results directly in a modification of the emotional gauge of the agent experiencing the event. Probably in a richer model there would be no need for such events, but at the current stage we have included them to allow the emotional effect of events which are not included in the other mechanisms. The second type of events are goal success and goal failure. Currently these events affect only the emotions of happiness, sadness and pride. Goal success (failure) of an agent affects the happiness (sadness) emotion of that agent. Success of the goal of another agent that the agent cares about, results in increase of the pride emotion. The third, and perhaps most important source of events which affect an agent's emotions are events which recognise threats or opportunities with respect to the motivational concerns of an agent. Motivational concerns are associated with states which are recognised as threats (or opportunities) with respect to those concerns. The motivational concerns are also associated with emotional reactions to be triggered in an agent when the event indicating the relevant state occurs.
1 Some emotions may not have an opposite, in which case it is left undefined.
A motivational concern of stimulation may have a threat associated with the agent being idle. When this state occurs it triggers an event which activates the stimulation concerns and results in an increase in the boredom emotion, which, if the idle state continues will eventually cause boredom to cross the threshold. This will then in its turn cause a bored event, which can trigger a goal to reduce the boredom. We can represent the set of emotional gauges as E, and a particular emotional gauge as e, where v_0(e) represents the neutral value on the gauge, e+ represents the name of the positive emotion associated with this gauge, and e- the name of the corresponding negative emotion. The value of the emotional gauge e for a particular agent at any time point is then determined by the previous value of that gauge, incremented (or decremented) according to what events the agent is aware of which affect the emotion, or the decay rate 2 of the emotion for that agent. We let the decay rate be a function over the emotions for each agent, written D_A(e). Thus we can say that, with respect to a particular agent, A:
V_{A,t+δ}(e) = V_{A,t}(e) + F({events}_t^{t+δ}, D_A(e, δ))

The effect of an emotion on the behavioural and cognitive subsystems of the agent comes into play when an emotion crosses the threshold for that agent. This then results in the belief that the agent experiences the emotion being asserted (or retracted) in the cognitive subsystem. The behavioural subsystem may also be notified. This in its turn can lead to goals being asserted, deleted, or reprioritised, in the cognitive subsystem, and to behaviour being modified directly in the behavioural subsystem. For example, if the belief that the emotion sadness is being experienced is asserted, it results in modifying the agent's stance and movement in the behavioural system, to reflect sadness. An assertion of boredom 3 results in instantiating a goal to relieve the boredom. Emotional thresholds are represented individually for each agent and emotion. We let P and N be functions giving the positive and negative thresholds for all emotions for a given agent. Thus P_A(e) represents the threshold at which emotion e+ is asserted, and N_A(e) represents the threshold at which e- is asserted, for agent A. Thus we can say that for a given agent, A:

(V_{A,t}(e) ≥ P_A(e)) ∧ ¬believe_A(e+) ⇒ assert believe_A(e+), which results in behavioural and cognitive effects,

and

(V_{A,t}(e) ≤ N_A(e)) ∧ ¬believe_A(e-) ⇒ assert believe_A(e-), which results in behavioural and cognitive effects.

2 A decay rate may also be negative, allowing an emotion to increase over time, if not affected by any events. This is used particularly for modelling physical urges, such as hunger, which in many ways are similar to emotions with respect to their effect on the cognitive system.
3 For ease of expression we will sometimes speak about asserting an emotion. What is strictly meant is that we assert the belief that the agent is experiencing the emotion.
Also, when values cross back into the neutral zone, emotions are retracted in a similar way.
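As a concrete illustration of the gauge mechanism just described, the following sketch (our own minimal rendering, not the system's actual code; the class name, parameters and numeric values are illustrative assumptions) updates a single emotional gauge from events or decay and asserts or retracts the corresponding emotion belief when a threshold is crossed:

class EmotionGauge:
    """One gauge e with opposite emotions e+ and e-, per the model of Section 3."""
    def __init__(self, positive, negative, pos_threshold, neg_threshold, decay):
        self.positive = positive            # name of e+ (e.g. "happiness")
        self.negative = negative            # name of e- (e.g. "sadness")
        self.pos_threshold = pos_threshold  # P_A(e)
        self.neg_threshold = neg_threshold  # N_A(e), a negative number
        self.decay = decay                  # D_A(e): movement toward neutral per tick
        self.value = 0.0                    # neutral point
        self.asserted = None                # emotion belief currently held, if any

    def step(self, event_deltas):
        """One time step: add event influences, or decay toward neutral."""
        if event_deltas:
            self.value += sum(event_deltas)
        elif self.value > 0:
            self.value = max(0.0, self.value - self.decay)
        elif self.value < 0:
            self.value = min(0.0, self.value + self.decay)
        return self._update_belief()

    def _update_belief(self):
        """Assert or retract the emotion belief when a threshold is crossed."""
        if self.value >= self.pos_threshold and self.asserted != self.positive:
            self.asserted = self.positive
            return ("assert", self.positive)
        if self.value <= self.neg_threshold and self.asserted != self.negative:
            self.asserted = self.negative
            return ("assert", self.negative)
        if self.neg_threshold < self.value < self.pos_threshold and self.asserted:
            retracted, self.asserted = self.asserted, None
            return ("retract", retracted)
        return None

The returned assert/retract events would be forwarded to the cognitive and behavioural subsystems, which is where goal changes and changes to stance or movement would be triggered.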
4 Personality
Personality is a concept which is ill-defined, but which clearly is related to some aspects of our simple emotional model of agents. In particular, the emotional model described provides three different mechanisms which can be used for modelling agent personality. These are the motivational concerns, the emotion thresholds, and the rate of decay for an emotion. One aspect of personality is a notion of what things are important to that person - a person who is always concerned about money and financial matters could be represented as a person having a motivational concern for financial well-being. This person will respond emotionally to events affecting this aspect of life, and will thus appear to have a quite different personality than an agent who does not have this motivational concern, and therefore does not respond emotionally to the particular events associated with threats and opportunities associated with this concern. Some motivational concerns will be more or less universal - such as that for stimulation or for survival. However others will be quite individual, and will be a significant aspect of the agent's personality. The threshold at which an emotion is asserted is also an important aspect of personality. The individual who experiences many anger-increasing events before becoming "angry" has a different personality to the agent who becomes angry easily - perhaps after one or two such events. Finally the decay rate for each emotion is an important aspect of personality. Two agents with the same threshold for anger, and the same motivational concerns, may still exhibit differing personalities (with respect to this emotion), based on their decay rate for anger. An agent whose anger wears off very slowly has a different personality to the agent whose anger dissipates almost directly. The former personality trait could be described as "the sort of person who bears a grudge". With these three aspects of the emotional model at our disposal, the personality of an agent can then be said to depend on the motivational concerns of that agent plus the thresholds and decay rates for each emotion, for that agent. If we let M_A be the set of motivational concerns for agent A, the personality of A can be described by:
Ψ_A = ⟨ M_A, N_A, P_A, D_A ⟩
These relatively simple mechanisms, while clearly not capturing all aspects of personality, nevertheless do give us a reasonably rich mechanism for beginning to represent varying personalities of agents.
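A personality, in this model, is therefore just a small parameter set. The sketch below is an illustrative assumption about how such a tuple might be packaged (the concern names, thresholds and decay rates are invented, not taken from the system); it shows two agents that differ only in their anger parameters, in the spirit of the "bears a grudge" example above:

from dataclasses import dataclass

@dataclass
class Personality:
    concerns: set          # M_A: motivational concerns
    neg_thresholds: dict   # N_A(e) for each emotion gauge
    pos_thresholds: dict   # P_A(e) for each emotion gauge
    decay_rates: dict      # D_A(e) for each emotion gauge

# Easily angered and slow to forgive: anger threshold close to neutral, slow decay.
grudge_bearer = Personality(
    concerns={"survival", "stimulation"},
    neg_thresholds={"anger": -2, "fear": -6},
    pos_thresholds={"anger": 5, "fear": 5},
    decay_rates={"anger": 0.1, "fear": 2.0},
)

# Placid: same concerns, but a far-from-neutral anger threshold and fast decay.
placid = Personality(
    concerns={"survival", "stimulation"},
    neg_thresholds={"anger": -8, "fear": -6},
    pos_thresholds={"anger": 5, "fear": 5},
    decay_rates={"anger": 2.0, "fear": 2.0},
)

Feeding the same stream of events through gauges configured from these two tuples would yield the kind of differing behaviour described in Section 6 for Max and Fido.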
5 System description
The system is implemented on an SGI and a Sun workstation and uses dMars (Distributed Multi Agent Reasoning System, from the Australian Artificial Intelligence Institute), a descendant of PRS [IGR92, GI89], and Open Inventor™, a set of library routines for high-level graphics coding. The system developed is similar to that reported in [PTQ96], but generalised to an interactive, menu-based system which allows users to develop arbitrary scenarios. Users can choose agents and objects from menus, in order to populate the world. Agent plans are then defined and edited via the dMars plan-editor, available via a system menu. Personality variables will also be definable via the menu interface, although they currently are read in from files.
Fig. 2. Architectural overview. (Figure: the emotional module (motivational concerns, emotion gauges, decay functions), the cognitive module (dMars: beliefs, plans), the behavioural module (Open Inventor models, capability specifications) and the system management module (world model, simulator, specification interface), linked by the connecting layer and the foreign process agent across the SGI and Sun machines.)
There are four main modules in the system, as shown in figure 2. The cognitive module (which is exactly dMars) manages the cognitive processing of all agents - their beliefs, desires, intentions and the plans available to them for rational action. The behavioural module, which uses Open Inventor, manages the graphical models for agents and objects in the world, and their animation. The emotional subsystem manages the emotional processing for each agent, while the system management module manages the interface of the interactive system, the simulation and world model, and the connecting layer which controls the communication between the cognitive, physical and emotional subsystems of agents. A part of the connecting layer is a "Foreign Process Agent", which is a module supplied by dMars for managing communication between dMars and other parts of an application. The foreign process agent supports the use of sockets, facilitating the use of different machines for the different subsystems. All communication between dMars and the emotional or physical subsystems goes through the foreign process agent. Each individual agent consists of a logical thread within dMars, the emotional subsystem, and the behavioural subsystem, with the integration of these subsystems managed by the connecting layer. Because of the difference in the level of granularity which is needed by the graphical subsystem, and that which makes sense for the cognitive (or emotional) subsystem, there is some significant management of behaviours in the connecting layer. For example, the cognitive subsystem may request the action to 'go across the room'; the connecting layer will then manage the sequence of steps required to execute this action. The connecting layer also manages the perceptions from the environment, such as sight and smell, passing these up to the cognitive and emotional subsystems whenever there is a change. While percepts are received continually, we consider them worthy of attention by the cognitive or emotional subsystem only when they change. Having briefly described the overall architecture, we now describe the architecture of the connecting layer in more detail.
5.1 Connecting Layer
The connecting layer includes two major subsystems, the action subsystem and the sensing subsystem. The action subsystem controls how the bodies of the agents move within the physical environment, and how these movements are synchronised with the behavioural control emanating from the cognitive subsystem. The sensing subsystem manages how the agents perceive the environment, and communicates these perceptions to the emotional and cognitive subsystems. The action subsystem enables high level actions, appropriate to the cognitive and emotional subsystems, to be transformed to the primitive actions needed for the animation system. The simplest actions are atomic actions which are conceptually instantaneous and have a negligible duration. As far as the cognitive system is concerned they
can either succeed or fail. These actions are at the same level of granularity as atomic actions in the animation system and can be mapped to a single command in the animation system. An example is a turn action, which would be mapped to a rotation. Other kinds of actions have duration. They may either have a known end condition, such as an action to move to a particular location, or they may be continuous, with no fixed end point, such as the action to wag a tail.
Fig. 3. Action section of Connecting Layer. (Figure: commands from the agent layer pass through the command parser as action commands or control commands; the current action processor, suspended action queue, action evaluator and action executer operate on the environment's dynamic attributes at each timer event.)
The architecture of the action subsection of the connecting layer is shown in figure 3. The command parser looks at commands received from the agent layer, of which there are two types, control commands and action commands. Action commands are placed in the 'current action' processor. This will calculate what is needed for the next frame and check to see if the action is complete. For example, with the action 'move to object X', at each frame the agent will take a step closer to the object. It will see if it is at the desired location, and if so, stop the current action. If not, this sequence will be repeated at the next frame. Control commands are used for controlling the queue in ways other than the normal execution. In order to allow the system to be fully reactive, new goals/intentions must be able to take priority over current goals at any point in time. This requires that executing actions must be able to be aborted and/or suspended, at
the animation level when requested by the cognitive or emotional subsystems. The action evaluator checks the validity of the next action. An action may be started, but at some time during its execution a condition could arise which means the action is no longer able to finish executing. Collisions with other objects are an example. In this case the action evaluator sends a message to the agent's cognitive and emotional subsystems and aborts the current activity. Thus the action evaluator checks the agent's capability with respect to the action, at each step. Finally, the action token executer sets the values of the graphical models in the world at each timer event by reading the head of the event queue. Each action token has associated with it a motion path, allowing Open Inventor to create a smooth animation for that atomic action. For example, an action like 'take a step' can have the motion path for leg movement included in the model file, allowing smooth walking to be easily realised. The action executer also considers input from the emotional subsystem regarding the current physical ramifications of the emotional state. The other subsystem of the connecting layer is the sensing subsystem. Two types of sensing are modelled - goal-directed sensing and automatic sensing. Goal-directed sensing allows an agent to explicitly ask for a piece of sensory information, receiving a response from the connecting layer. An automatic sense is triggered whenever a certain condition occurs. For example, smell may be modelled as an automatic sense. As soon as an object is within range of the agent's smelling capability, a message would be triggered, alerting the agent to the newly found smell. An automatic sense is set up by indicating what conditions would trigger the sense, for the given agent. For example, smell may be triggered by any object with a smell attribute within a certain radius of the agent. The connecting layer then monitors for the occurrence of this situation. At each timer event, the connecting layer checks to see if something is sensed, based on the agent's capabilities and the current environment. If the sensory information is different from that of the last timer event, then a message is generated and sent to the agent's emotional and cognitive subsystems. This mechanism avoids redundant messages when sensory information remains unchanged.
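The per-frame behaviour of the action and sensing subsystems can be pictured with the following sketch (our own schematic rendering, not the implemented code; the Action and sense objects and the notify interface are invented placeholders):

class ConnectingLayer:
    def __init__(self):
        self.current_action = None
        self.suspended = []            # suspended action queue
        self.last_percepts = {}        # last reading forwarded per automatic sense

    def handle_command(self, cmd):
        """Commands arriving from the cognitive/emotional layer."""
        if cmd.kind == "action":                        # e.g. 'move to object X'
            self.current_action = cmd.action
        elif cmd.kind == "suspend" and self.current_action:
            self.suspended.append(self.current_action)  # higher-priority intention
            self.current_action = cmd.action
        elif cmd.kind == "abort":
            self.current_action = None

    def on_timer(self, world, agent):
        # Action side: advance the current (possibly durative) action by one frame.
        act = self.current_action
        if act is not None:
            if not act.still_possible(world):           # e.g. a collision occurred
                agent.notify("action-failed", act)
                self.current_action = None
            elif act.step(world):                       # returns True when complete
                agent.notify("action-succeeded", act)
                self.current_action = self.suspended.pop() if self.suspended else None
        # Sensing side: forward automatic senses only when their reading changes.
        for sense in agent.automatic_senses:
            reading = sense.read(world, agent)
            if reading != self.last_percepts.get(sense.name):
                self.last_percepts[sense.name] = reading
                agent.notify("percept", (sense.name, reading))

Continuous actions such as 'wag tail' would simply never report completion from step, while an atomic action would complete on its first frame.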
6 Discussion
We have currently developed three scenarios within this system - "Dog World" containing dogs which explore a world containing food and obstacles, and engage in actions such as playing, barking, exploring and eating (see figure 4), a world containing mice, a wheel and a house where in addition to the dogworld actions the mice play, hide, and get frustrated, and a world based on the Luxo Jr animation with a parent and a child lamp, and some balls.
For our initial scenario we had two dogs, Max and Fido, both with the same motivational concerns of hunger and stimulation. Max had N_Max(e1) relatively close to the neutral point for the emotional gauge e1, where e1- is the emotion anger. D_Max(e1) was low, giving a gradual decay rate for the anger emotion. This led to Max behaving as if he had a relatively aggressive personality. For the emotional gauge e2 measuring fear, N_Max(e2) was relatively far from the neutral point and D_Max(e2) was high. Fido, on the other hand, became easily fearful with N_Fido(e2) close to the neutral point, but also lost his fear quite quickly with D_Fido(e2) being quite high. N_Fido(e1) and D_Fido(e1) were both at intermediate levels. In both the dog and the lamp scenarios the agents successfully displayed the features of different personalities, based on their emotional profiles. Despite the simplicity of the emotional model, it was possible to model agents whose differing personalities resulted in differing behaviour, given similar external situations.
Fig. 4. Max and Fido in Dog World
An example of interaction in the Max and Fido system is when Max barks at Fido (because Fido has approached the food while Max is eating), causing Fido's fear gauge to quickly reach the negative threshold, triggering the fear reaction and causing Fido to run away. Because D_Fido(e2) (Fido's decay rate for fear) is quite high, his fear disappears quickly once Max stops barking. Thus the following happens for Fido:
e2- = fear; N_Fido(e2) = -3; D_Fido(e2) = 1;
V_{Fido,t0}(e2) = 0;
F({events}_{t0}^{t3}, D_Fido(e2, δ)) = -3 (based on a bark from Max at t1, t2 and t3).
Thus, given that ¬believe_Fido(fear) at t3:
V_{Fido,t3}(e2) = -3 ≤ N_Fido(e2) ⇒ assert believe_Fido(fear), with the effect that Fido runs away;
Subsequently F({events}_{t3}^{t4}, D_Fido(e2, δ)) = -1 (a further bark from Max); F({events}_{t4}^{t5}, D_Fido(e2, δ)) = 1 (no relevant events, decay of 1); F({events}_{t5}^{t6}, D_Fido(e2, δ)) = 1 (no relevant events, decay of 1). Thus, noting that believe_Fido(fear) at t6:
V_{Fido,t6}(e2) = -2 > N_Fido(e2) ⇒ retract believe_Fido(fear), with the effect that Fido stops running away.
One noticeable shortcoming of the current emotional model is that it does not support aspects of emotion requiring more knowledge about the objects of emotion. For example the pride emotion is generated whenever a person in a close relationship to the agent achieves their goal. A more accurate representation of the pride emotion would however be proud of, where the emotion has an object. This is true of many emotions, for example angry at, love towards, fear of. This suggests the need to extend the representation of emotions to allow for this information. Currently the emotional update is purely a function of events, the decay rate for the emotion, and time. Probably in a richer model we would also want to include beliefs and goals in this emotional update equation. My emotional response to the failure of a particular goal is different if I believe that another agent caused my goal failure (leading to anger) than if I believe that the failure is my own fault (perhaps leading to frustration). These aspects are being explored in ongoing work.
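As a rough arithmetic check of the Fido trace above, a few lines of code (assuming the simplified update rule of Section 3, with each bark contributing -1 to the fear gauge and a decay of 1 toward neutral when nothing happens) reproduce the two threshold crossings:

N_fear = -3                 # N_Fido(e2)
value = 0.0                 # V_{Fido,t0}(e2)
believes_fear = False
events = [[-1], [-1], [-1], [-1], [], []]   # barks at t1..t4, quiet at t5 and t6
for t, deltas in enumerate(events, start=1):
    if deltas:
        value += sum(deltas)
    else:
        value = min(0.0, value + 1)         # decay of 1 toward the neutral point
    if value <= N_fear and not believes_fear:
        believes_fear = True
        print(f"t{t}: V = {value:+.0f} -> assert fear (Fido runs away)")
    elif value > N_fear and believes_fear:
        believes_fear = False
        print(f"t{t}: V = {value:+.0f} -> retract fear (Fido stops running)")

This prints the assertion at t3 (V = -3) and the retraction at t6 (V = -2), matching the trace.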
Acknowledgements The authors thank the Australian Artificial Intelligence Institute for use of the dMars software, and for useful discussions and technical support, particularly from Ralph Rönnquist, Anand Rao and Andrew Hodgson. The authors also acknowledge Ghassan Al Qaimari's co-supervision of the honours project that preceded this work.
References
[Bat93] J. Bates. The nature of characters in interactive worlds and the Oz project. Virtual Reality: Anthology of Industry and Culture, 1993.
[Bat94] J. Bates. The role of emotion in believable agents. Communications of the ACM (Special Issue on Agents), 1994.
[BLR92a] J. Bates, A. B. Loyall, and W. S. Reilly. An architecture for action, emotion and social behavior. Proceedings of the Fourth European Workshop on Modeling Autonomous Agents in a Multi-Agent World, 1992.
[BLR92b] J. Bates, B. Loyall, and W. S. Reilly. Broad agents. Proceedings of the AAAI Spring Symposium on Integrated Architectures, 2(4), August 1992.
[Dye87] Michael G. Dyer. Emotions and their computations: Three computer models. Cognition and Emotion, 1(3), 1987.
[FS87] Nico H. Frijda and Jaap Swagerman. Can computers feel? Theory and design of an emotional system. Cognition and Emotion, 1(3), 1987.
[GI89] M. Georgeff and F. Ingrand. Decision-making in an embedded reasoning system. In Proceedings of the International Joint Conference on Artificial Intelligence - IJCAI '89, pages 972-978, Detroit, USA, August 1989.
[HR95] Barbara Hayes-Roth. Agents on stage: Advancing the art of AI. In IJCAI '95, Montreal, 1995.
[IGR92] F. F. Ingrand, M. P. Georgeff, and A. S. Rao. An architecture for real-time reasoning and system control. IEEE Expert, pages 34-44, December 1992.
[PTQ96] L. Padgham, G. Taylor, and G. Al Qaimari. An intelligent believable agent environment. Technical Report TR-96-31, Royal Melbourne Institute of Technology, Melbourne 3001, Australia, August 1996. Presented at the AAAI-96 Workshop on Entertainment and AI/A-Life, Portland, Oregon.
[Slo81] Aaron Sloman. Why robots will have emotions. In IJCAI '81, Vancouver, 1981.
[Tod82] Masanao Toda. Man, Robot and Society. Martinus Nijhoff Publishing, Boston, 1982.
Control Architectures for Autonomous and Interacting Agents: A Survey
Jörg P. Müller
Mitsubishi Electric Digital Library Group*
18th Floor Centre Point, 103 New Oxford Street, London WC1A 1EB, U.K.
Abstract. The control architecture of an autonomous agent describes its modules and capabilities, and how they work together. Over the past few years, numerous architectures have been proposed in the literature, addressing different key features an agent should have. In this paper, we survey the state of the art in research on agent architectures. We start by presenting selected examples of three prominent research threads, i.e.: architectures for reactive agents, deliberative agents, and interacting agents. Then we describe various hybrid approaches that reconcile these three threads, aiming at a combination of different features like reactivity, deliberation, and the ability to interact with other agents. These approaches are contrasted with architectural issues of recent agent-based work, including software agents, softbots, believable agents, as well as commercial agent-based systems.
1 Introduction
The complexity and the inherent distribution of competence, control, and information in today's organizations and computer-controlled technical systems require new and flexible software architectures. Prominent examples are workflow management systems for virtual enterprises, or information management systems in the Internet (digital libraries). Robustness has replaced optimality as the main criterion for measuring the quality of these systems. An additional requirement is interoperability: heterogeneous systems need to exchange information and to work together. Today's standard software engineering approach is based on object orientation (see, e.g., [RBP91]). The object-oriented programming paradigm addresses issues of modularity and reusability; unfortunately, it is of no help as regards control and autonomy. Client-server architectures address interaction between distributed processes; however, the strong hierarchical relationship they impose is too inflexible. Furthermore, these architectures give no help in deciding how to design individual nodes to meet the above requirements. In the light of these limitations, the notion of an autonomous agent 2 appears to be a magic word in computing of the 1990s. The concept of autonomous software programs that can react to changes in their environment, that can plan and make decisions autonomously, and that can interact with each other to achieve local or global ends, has drastically changed the way Artificial Intelligence defines itself [RN95] and is about to find its way into industrial software engineering practice. The kernel of an autonomous agent is its control architecture, i.e., the description of its modules and of how they work together. Recently, numerous architectures were proposed, addressing different key features an agent should be equipped with. In this paper we survey the state of the art in the design of control architectures for autonomous agents. We start with an investigation of architectural issues raised by three influential threads of agent research, i.e.:
- reactive agents: agents that are built according to the behavior-based paradigm, that have no or only a very simple internal representation of the world, and that provide a tight coupling of perception and action.
- deliberative agents: agents in the symbolic Artificial Intelligence tradition that have a symbolic representation of the world in terms of categories such as beliefs, goals, or intentions, and that possess logical inference mechanisms to make decisions based on their world model.
- interacting agents: agents that are able to coordinate their activities with those of other agents through communication. These agents have been mainly investigated in Distributed AI; they may have explicit representations of other agents and may be able to reason about them.
* Email: [email protected]
2 For a rewarding paper on the definition of agents, see [FG97].
Each of these threads focusses on different important properties of an agent: reactivity, the ability to behave in a goal-directed manner (proactiveness, see [WJ95]), and the ability to engage in social behavior. Having said this, it is not astonishing that research on agent architectures in the 1990s mainly tried to reconcile these properties in layered (mostly hybrid 3) agent architectures. In Section 5, some important layered architectures are overviewed. Up to the early Nineties, the driving force of the development of agent architectures was robotics, and many of the architectures described in Sections 2-5 were actually developed and evaluated by using robotics applications. Recently, the term agent has been increasingly used in different contexts, e.g.:
- agents that are designed to be believable to humans
- software agents that assist users in manifold tasks
- softbots that move through the Internet, performing tasks there similar to the way robots do in a physical environment.
In Section 6 some architectural issues raised by the above areas shall be investigated. As we mentioned previously, the usage of the term agent appears in a broad variety of meanings and across the borders of scientific communities. The research area (fortunately) is highly dynamic, and it is impossible to cover all interesting work within one survey article. Rather than trying to be complete, in this survey we select and explain a few representative instances of the most important agent-related research directions. Additional references for further reading are provided in each section.
3 We call an architecture hybrid if it makes use of differing means of representation or mechanisms of control in different layers.
2 Architectures for Reactive Agents
In the mid-1980s, a new school of thought emerged that was strongly influenced by behaviorist psychology. Guided by researchers such as Brooks, Chapman and Agre, Kaelbling, and Maes, architectures were developed for agents that were often called behavior-based, situated, or reactive. These agents make their decisions at run-time, usually based on a very limited amount of information and simple situation-action rules. Some researchers, most notably Brooks with his subsumption architecture, denied the need for any symbolic representation of the world; instead, reactive agents make decisions directly based on sensory input. The design of reactive architectures is partly guided by Simon's hypothesis [Sim81] that the complexity of the behavior of an agent can be a reflection of the complexity of the environment in which the agent is operating rather than a reflection of the agent's complex internal design. The focus of this class of system is directed towards achieving robust behavior instead of correct or optimal behavior. Artificial life [Lan89] [Mae94b] is a recent research discipline that strongly builds on reactive, behavior-based agent architectures. For further reading on architectures for reactive agents, see e.g., [AC87] [Bro86] [Suc87] [Fer89] [AC90] [Bro90] [KR90] [Mae90b] [Bro91] [BA95].
2.1 Brooks: subsumption architectures
Brooks's subsumption architecture provides an activity-oriented decomposition of the system into independent activity producers which are working in parallel. Individual modules (layers) extract only those aspects of the world which are of interest to them. Thus, the representation space is cut into a set of subspaces. Between the subspaces, no representational information is passed. The lowest layers of the architecture are used to implement basic behaviors such as to avoid hitting things, or to walk around in an area. Higher layers are used to incorporate facilities such as the ability to pursue goals (e.g., looking for and grasping things while walking around). Control is based on two general mechanisms, namely inhibition and suppression. Control is layered in that higher-level layers subsume the roles of the lower-level layers when they wish to take control. Layers are able to substitute (suppress) the inputs to and to remove (inhibit) the output from lower layers for finite, pre-programmed time intervals. The ability (bias) of the robot agent to achieve its higher-level goal while still attending to its lower-level goals (e.g., the monitoring of critical situations) crucially depends on the programming of interlayer control, making use of the two control mechanisms. Brooks was successful
in building robots for room exploration, map building, and route planning. However, to our knowledge, so far there are no subsumption-based robots that can do complex tasks requiring means-end reasoning and/or cooperation.
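To make the suppression idea concrete, the following toy sketch (a loose illustration of layered control, not Brooks' actual augmented-finite-state-machine formulation; the behaviors and percepts are invented) shows a higher layer overriding the output of a lower one:

def avoid_obstacles(percepts):
    """Lowest layer: keep the robot from hitting things."""
    return "turn-away" if percepts.get("obstacle_near") else "forward"

def seek_target(percepts):
    """Higher layer: pursue a goal when a target is visible; otherwise no opinion."""
    return "steer-to-target" if percepts.get("target_visible") else None

def control(percepts):
    lower = avoid_obstacles(percepts)
    higher = seek_target(percepts)
    # Suppression: while the higher layer wishes to take control it overrides the
    # lower layer's output; here we simply let it defer when an obstacle is near.
    if higher is not None and not percepts.get("obstacle_near"):
        return higher
    return lower

print(control({"target_visible": True}))                         # steer-to-target
print(control({"target_visible": True, "obstacle_near": True}))  # turn-away

In the real architecture, suppression and inhibition operate on the connections between layers for fixed time intervals rather than through a central arbiter as in this simplification.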
2.2 Steels: behavior-based robots
Steels's approach to modeling autonomous agents described in [Ste90] forgoes any planning and instead refers to the principle of emergent functionality brought about by processes of self-organization, which play an important role in system theory [NP77], biology, physics and chemistry [Sab86]. The fundamental observation is that complex behavior of the system as a whole can emerge from the interaction of simple individuals with simple behavior. This describes the phenomenon of swarm intelligence. Steels provided the following example for self-organization in the planet scenario: robots, which have the task of collecting samples located in clusters, use simple rules to indicate regions where samples may be found: if an agent carries a sample, it drops crumbs; if it carries none and detects crumbs, it picks up the crumbs again. Thus, paths are built leading to regions with a high density of samples. On the other hand, agents take into account the information provided by the crumbs by following the highest concentration of crumbs. By a simulation, Steels shows that the performance of the agents can be remarkably improved by this reactive cooperation method. As for most other reactive approaches, Steels' model suffers from the fuzziness of the underlying terms such as self-organization and emergent behavior. The extent to which his model can be generalized, and its general usefulness as a model for intelligent agents that are able to deal with a broader range of tasks and environments, is unclear.
2.3 Arkin: the AuRA architecture
AuRA (Autonomous Robot Architecture) [Ark90] is an architecture for reactive robot navigation that extends Brooks' approach by incorporating different types of domain knowledge to achieve more flexibility and adaptability. AuRA consists of five components: (i) a perception subsystem which provides perceptual input for other modules; (ii) a cartographic subsystem, i.e., a knowledge base for maintaining both a priori and acquired world knowledge; (iii) a planning subsystem which consists of a hierarchical planner and a reactive plan execution subsystem; (iv) a motor subsystem providing the effectoric interface to the actual robot; and (v) a homeostatic control subsystem which monitors internal conditions of the robot such as its energy level, and which provides this status information both to the planning subsystem and to the motor subsystem. The coupling of planner and reacting subsystem is similar to the Procedural Reasoning System [GL86] and to Firby's RAP System (see Section 5). However, the approach focuses on reactivity; the underlying representation by a potential field is very application-specific and lacks generality. Moreover it is hard to see how models of other agents could be incorporated into the architecture apart from treating them as obstacles in a potential field. Thus, as in basically all reactive approaches, the cooperative abilities of AuRA robots do not exceed that of simple grouping or following behaviors (see also [Mat93] [BA95]). There is no way of expressing goals or even cooperative or synchronized plans.
2.4 Maes: dynamic action selection
Pattie Maes [Mae89] [Mae90b] presented a model of action selection in dynamic agents, i.e., a model the agent can use to decide what to do next. Driven by the drawbacks of both purely deliberative agents and purely situated agents, Maes argues in favor of introducing the notion of goals for situated agents. However, in contrast to traditional symbolic approaches, her model is based on the idea of describing action selection as an emergent property of a dynamics of activation and inhibition among the different actions the agent can execute. The model eschews any global control arbitrating among the different actions. An agent is described by a set of competence modules; these correspond to the notion of operators in classical AI planning. Each module is described by preconditions, expected effects (add and delete lists), and a level of activation. Modules are arranged in a network by different types of links: successor links, predecessor links, and conflictor links. A successor link a →_s b denotes that a provides the precondition for b. A predecessor link between two modules a and b is defined as a →_p b iff b →_s a. Finally, a conflictor link between two modules a and b, a →_c b, denotes that a disables b by destroying b's precondition. Modules use these links to activate or inhibit each other in three basic ways: firstly, the activation of successors occurs by an executable module spreading activation forward. This method implements the concept of enforcing sequential actions. Secondly, the activation of predecessors provides a simple backtracking mechanism in case of a failure. Thirdly, the inhibition of conflictors resolves conflicts among modules. In order to avoid cyclic inhibition, only the module with the highest activation level is able to inhibit others. Activation messages increase the activation value of a module. If the activation value of a module exceeds the threshold specified by the activation level, and if its preconditions are satisfied, the module will take action. Maes' approach extends purely reactive approaches by introducing the useful abstraction of goals. However, the underlying process of emergence is not yet fully understood, and the system behavior resulting from it is difficult to understand, predict, and verify.
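The spreading of activation can be sketched in a few lines of code (a simplified rendition for illustration only; the spreading fractions, the threshold and the seeding of activation from the state and the goals are assumptions rather than Maes' actual parameters):

THRESHOLD = 10.0

class Module:
    def __init__(self, name, preconds, adds, deletes):
        self.name = name
        self.preconds, self.adds, self.deletes = set(preconds), set(adds), set(deletes)
        self.activation = 0.0

def executable(m, state):
    return m.preconds <= state

def inject(modules, state, goals, amount=5.0):
    """Seed activation from the observed situation and from the agent's goals."""
    for m in modules:
        if m.preconds & state:
            m.activation += amount
        if m.adds & goals:
            m.activation += amount

def spread(modules, state):
    """One round of activation/inhibition along successor, predecessor, conflictor links."""
    delta = {m: 0.0 for m in modules}
    for m in modules:
        for other in modules:
            if other is m:
                continue
            if executable(m, state) and m.adds & other.preconds:
                delta[other] += 0.5 * m.activation      # successor link: spread forward
            if not executable(m, state) and other.adds & (m.preconds - state):
                delta[other] += 0.3 * m.activation      # predecessor link: spread backward
            if m.deletes & other.preconds and m.activation > other.activation:
                delta[other] -= 0.2 * m.activation      # conflictor link: inhibit
    for m in modules:
        m.activation = max(0.0, m.activation + delta[m])

def select(modules, state):
    """Fire the executable module whose activation exceeds the threshold, if any."""
    ready = [m for m in modules if executable(m, state) and m.activation >= THRESHOLD]
    return max(ready, key=lambda m: m.activation, default=None)

Repeatedly calling inject, spread and select yields the emergent selection behavior the model aims at, without any global arbiter beyond the threshold test.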
3 Architectures for Deliberative Agents
Most agent models in AI are based on Simon and Newell's physical symbol system hypothesis [NS76] in their assumption that agents maintain an internal representation of their world, and that there is an explicit mental state which can be modified by some form of symbolic reasoning. These agents are often called deliberative agents. AI planning systems [FHN71] [Sac75] [Wil88] [AIS90] can be regarded as the predecessors of today's research on deliberative agents
(see [Mül96b] for an overview). Over the past few years, an interesting research direction has explored the modeling of agents based on beliefs, desires, and intentions (BDI architectures). In this section, we give some prominent examples of deliberative agents. Further examples of deliberative agent architectures can be found in [GL86] [SH90] [WMT96] [KGR96] [Gmy96] [NL96] [MWJ97].
3.1 Bratman et al.: IRMA
IRMA [BIP87] is an architecture for resource-bounded agents that describes how an agent selects its course of action based on explicit representations of its perception, beliefs, desires, and intentions. The architecture incorporates a number of modules including an intention structure, which is basically a time-ordered set of partial, tree-structured plans, a means-end reasoner, an opportunity analyzer, a filtering process, and a deliberation procedure. As soon as the agent's beliefs are updated by its perception, the opportunity analyzer is able to suggest options for action based on the agent's beliefs. Further options are suggested by the means-end reasoner from the current intentional structure of the agent. All available options run through a filtering process where they are tested for consistency with the agent's current intentional structure. Finally, options that pass the filtering process successfully are passed to a deliberation process that modifies the intention structure by adopting a new intention, i.e., by committing to a specific partial plan. The IRMA model embodies two different views on plans: on the one hand, the plans that are stored in the plan library can be looked upon as beliefs the agent has about what actions are useful for achieving its goals. On the other hand, the set of plans the agent has currently adopted defines its local intentional structure. This second view of plans as intentions has now become the most accepted paradigm in research on BDI architectures. IRMA takes a pragmatic stance towards BDI architectures. In particular, it does not provide an explicit formal model for beliefs, goals, and intentions nor for their processing 4. Thus, the contribution of IRMA has rather been the definition of a control framework for BDI-style agents which served as a basis for many subsequent formal refinements of BDI concepts.
4 In [Bra87], Bratman gave a theory of intentions from a philosophical point of view.
3.2 Rao and Georgeff: a formal BDI model
Anand Rao and Michael Georgeff [RG91b] formalized the BDI model, including the definition of the underlying logics, the description of belief, desire, and intention as modal operators, the definition of a possible-worlds semantics for these operators, and an axiomatization defining the interrelationship and properties of the BDI operators. In contrast to most philosophical theories, Rao and Georgeff have treated intentions as first-class citizens, i.e., as a concept which has equal status to belief and desire. This allows the representation of different types of rational commitment based on different properties of the persistence of beliefs, goals, and intentions. The world is modeled using a temporal structure with branching time future and linear past, a so-called time tree. Situations are defined as particular time points in particular worlds. Time points are transformed into one another by events. There are both primitive and non-primitive events; the latter are useful to model partial and hierarchical plans that are decomposable into subplans and, finally, into primitive actions. There is a distinction between choice and chance, i.e., between the ability of an agent to deliberately select its actions from a set of alternatives and the uncertainty of the outcome of actions, where the determination is made by the environment rather than by the agent. The formal language describing these structures is a variation of the Computation Tree Logic (CTL*) [ES89]. There are two types of formulae, namely state formulae, which are evaluated at specific time points, and path formulae, which are evaluated over a path in a time tree. Semantics is defined in three parts: a semantics for state and path formulae, a semantics of events, and a semantics of beliefs, goals, and intentions. It is specified by an interpretation M that maps a standard first-order formula into a domain and into truth values, and a possible-worlds semantics for mental modalities by introducing accessibility relations for beliefs, goals, and intentions. Beliefs are axiomatized in the standard weak-S5 (KD45) modal system (see [MvdHV91]). The D and K axioms are assumed for goals and intentions. That means that goals and intentions are closed under implication and that they have to be consistent. The original axiomatization provided by [RG91a] suffers from the well-known problem of logical omniscience, since, due to the necessitation rule, an agent must believe all valid formulae (see [Var86]), intend them, and have the goal to achieve them. Different commitment strategies, i.e., relationships between the current intentions of an agent and its future intentions, were discussed in [RG91b] and empirically evaluated in [KG91]: blind commitment, single-minded commitment, and open-minded commitment.
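For readers less familiar with these modal systems, the axiom schemas in question can be written as follows (a standard textbook rendering using BEL for the belief modality; this is offered as background, not as a quotation of Rao and Georgeff's exact axiomatization):

\begin{align*}
\mathbf{K}:&\quad \mathrm{BEL}(\varphi \rightarrow \psi) \rightarrow (\mathrm{BEL}\,\varphi \rightarrow \mathrm{BEL}\,\psi)\\
\mathbf{D}:&\quad \mathrm{BEL}\,\varphi \rightarrow \neg\mathrm{BEL}\,\neg\varphi\\
\mathbf{4}:&\quad \mathrm{BEL}\,\varphi \rightarrow \mathrm{BEL}\,\mathrm{BEL}\,\varphi\\
\mathbf{5}:&\quad \neg\mathrm{BEL}\,\varphi \rightarrow \mathrm{BEL}\,\neg\mathrm{BEL}\,\varphi
\end{align*}

KD45 assumes all four schemas (plus necessitation) for belief, whereas for the goal and intention modalities only the K and D schemas are assumed, which gives exactly the closure-under-implication and consistency properties mentioned above.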
3.3 Rao and Georgeff: an interpreter for a BDI agent
A major criticism of the BDI theory as presented in [RG91a] is that the multimodal BDI logics do not have complete axiomatizations and that no efficient implementations are available for them [RG95]; hence, so far they have had little influence on the actual implementation of BDI systems. In [RG92], the authors address this criticism by providing an abstract interpreter for a BDI agent. An abstract agent interpreter is specified that embodies the essential modules of Bratman's BDI agent (see Subsection 3.1). It describes the control of an agent by a processing cycle:
BDI Interpreter
  initialize-state();
  repeat
    options := option-generator(event-queue);
    selected-options := deliberate(options);
    update-intentions(selected-options);
    execute();
    get-new-external-events();
    drop-successful-attitudes();
    drop-impossible-attitudes();
  end repeat

In each cycle, the event queue is looked up by the interpreter. A set of options is generated, i.e., goals that the agent could potentially pursue given the current state of the environment. The set of options is extended by the options that are generated by the deliberator. Finally, the intention formation step is taken in the procedure update-intentions. A subset of the options determined so far is selected as intentions, and the agent commits to the associated course of action. If the agent has committed to perform an executable action, the actual execution is initiated. The cycle ends by incorporating new events into the event queue, and by checking whether the current goals (options) and intentions have been achieved, or whether they are impossible (in the case of desires) or un-realizable (in the case of intentions). In more recent work [RG95], Rao and Georgeff have proposed a number of restricting assumptions and representation choices to their model to obtain a
practical architecture:
- Instead of allowing arbitrary formulae for beliefs and goals, these are restricted to be ground sets of literals with no disjunctions or implications.
- Only beliefs about the current state of the world are explicitly represented, denoting the agent's current beliefs that are prone to change over time.
- Information for means-end reasoning, i.e., about means of reaching certain world states and about the options available to the agent for proceeding towards the achievement of its goals, is represented by plans.
- The intentions of an agent are represented implicitly by the agent's run-time stack.
The Procedural Reasoning System ([GL86] [GI89]) and dMARS [KGR96] are implementations of a BDI architecture based on these assumptions.
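Under these restrictions, the abstract cycle above can be turned into running code fairly directly. The following sketch is our own illustrative rendering (the Plan representation, the priority-based deliberation and all names are assumptions, not the PRS or dMARS implementation):

from collections import deque
from dataclasses import dataclass, field

@dataclass
class Plan:
    context: set        # ground literals that must be believed for the plan to apply
    priority: int
    body: list = field(default_factory=list)   # remaining actions: callables on the agent
    def execute_next_action(self, agent):
        if self.body:
            self.body.pop(0)(agent)
    def succeeded(self, agent):
        return not self.body
    def impossible(self, agent):
        return not (self.context <= agent.beliefs)

class BDIAgent:
    def __init__(self, plan_library):
        self.beliefs = set()                 # ground literals about the current state
        self.intentions = []                 # adopted plan instances
        self.event_queue = deque()
        self.plan_library = plan_library     # event -> candidate plans (means-end knowledge)

    def option_generator(self):
        options = []
        while self.event_queue:
            event = self.event_queue.popleft()
            options += [p for p in self.plan_library.get(event, [])
                        if p.context <= self.beliefs]
        return options

    def deliberate(self, options):
        # Simplest possible deliberation: commit to the highest-priority option only.
        return sorted(options, key=lambda p: p.priority, reverse=True)[:1]

    def step(self, new_events):
        selected = self.deliberate(self.option_generator())
        self.intentions.extend(selected)                  # update-intentions
        if self.intentions:
            self.intentions[0].execute_next_action(self)  # execute
        self.event_queue.extend(new_events)               # get-new-external-events
        self.intentions = [i for i in self.intentions     # drop successful/impossible
                           if not i.succeeded(self) and not i.impossible(self)]

Each call to step corresponds to one pass through the interpreter loop; the commitment strategy is hidden in how deliberate and the drop conditions are written.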
3.4 Shoham and Thomas: agent-oriented programming
Shoham [Sho93] proposed the framework of agent-oriented programming (AOP). He presented a class of agent languages that are based on a model looking upon an agent as "an entity whose state is viewed as consisting of mental components such as beliefs, capabilities, choices, and commitments". Thus, Shoham adopts the notion of an intentional stance as proposed by Dennett [Den87].
An AOP system is defined by three components: (i) a formal language for describing mental states of agents; (ii) a programming language with a semantics corresponding to that of a mental state; and (iii) an agentifier, i.e., a mechanism for turning a device into what can be called an agent, and which can thus bridge the gap between low-level machine processes and the intentional level of agent programs. In the research published so far, Shoham focused on the former two elements of AOP. The formal language comprises the mental categories of beliefs, obligations, and capability. Obligation largely coincides with Rao and Georgeff's notion of intention and commitment. Capability is not directly represented as a mental concept in the BDI architecture; rather, it is covered by plans to achieve certain goals. The programming of agents is viewed as the specification of conditions for making commitments. The control of an agent is implemented by a generic agent interpreter running in a two-phase loop. In each cycle, first the agent reads current messages and updates its mental state; second, it executes its current commitments, possibly resulting in further modifications of its beliefs. AGENT0 is a simple instance of the generic interpreter. The language underlying AGENT0 comprises representations for facts, unconditional and conditional actions (both of which can be private or communicative), and commitment rules which describe conditions under which the agent will enter into new commitments based on its current mental state and on the messages received from other agents. Messages are structured according to message types; admissible message types in AGENT0 are INFORM, REQUEST, and UNREQUEST. The corresponding agent interpreter instantiates the basic loop by providing functions for updating beliefs and commitments. Beliefs are updated or revised as a result of being informed or of executing an action. Commitments are updated as a result of changing beliefs or of UNREQUEST messages received from other agents. AGENT0 is a very simple language which was not meant for building interesting applications. Important aspects of agenthood have been neglected: it does not account for motivation, i.e., for how goals of agents come into being, nor for decision-making, i.e., how the agent selects among alternative options. The PLACA language [Tho93] [Tho95] extends AGENT0 by introducing knowledge about goals the agent can achieve, and refines the basic agent cycle by adding a time-dependent step of plan construction and plan refinement. PLACA adopts Bratman's view of plans as intentions, i.e., the agent has a set of plans; its intentions are described by the subset of plans the agent has committed to. Whereas PLACA clearly extends the expressiveness of AGENT0 by providing the notion of plans, it does not address other restrictions such as motivation, decision-making, and the weak expressiveness of the underlying language.
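The two-phase loop and the commitment rules can be illustrated with a toy sketch (a simplified, invented rendering in the spirit of AGENT0, not Shoham's actual syntax or interpreter):

class Agent0Like:
    def __init__(self, commitment_rules):
        self.beliefs = set()
        self.commitments = []          # actions the agent is committed to perform
        self.rules = commitment_rules  # list of (message_condition, belief_condition, action)

    def update_mental_state(self, messages):
        for msg in messages:
            if msg["type"] == "INFORM":
                self.beliefs.add(msg["fact"])
            elif msg["type"] == "UNREQUEST":
                self.commitments = [c for c in self.commitments if c is not msg["action"]]
        for msg_cond, belief_cond, action in self.rules:
            if any(msg_cond(m) for m in messages) and belief_cond(self.beliefs):
                self.commitments.append(action)

    def execute_commitments(self):
        for action in self.commitments:
            action(self)               # a private or communicative action
        self.commitments.clear()

    def step(self, messages):          # the generic two-phase agent cycle
        self.update_mental_state(messages)
        self.execute_commitments()

A commitment rule here is just a pair of conditions (one on the incoming messages, one on the beliefs) plus the action committed to; for instance, a rule could commit to answering a REQUEST only if the requested fact is currently believed.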
4 Architectures for Interacting Agents
Distributed Artificial Intelligence (DAI) 5 deals with coordination and cooperation among distributed intelligent agents. So far, its focus has been on the coordination process itself and on mechanisms for cooperation among autonomous agents rather than on the structure of these agents. However, some recent work deals with the incorporation of cooperative abilities into an agent framework. In the following, four prominent approaches are described in more detail: Fischer's MAGSY, the GRATE* architecture by Jennings, Steiner's MECCA system, and the COSY architecture developed by Sundermeyer and Burmeister. For further reading, see e.g., [Geo83] [FF94] [RZ94] [MC95] [BF96] [CD96] [MLF96].
5 See [BG88] [GH89] for collections of papers that provide a good overview of DAI research by the end of the 1980s. More recent work on DAI can be found in the annual proceedings of the Distributed AI Workshop (until 1994) and in the proceedings of the International Conference on Multiagent Systems (ICMAS) (biennially since 1995).
4.1 Fischer: MAGSY
Klaus Fischer developed the MAGSY [Fis93] system, a language for the design of multiagent systems. The MAGSY agent architecture is fairly simple. A MAGSY agent consists of a set of facts representing its local knowledge, a set of rules representing its strategies and behavior, and a set of services that define the agent's interface. An agent can request a service offered by another agent by communication. Fischer demonstrates the applicability of MAGSY by the application of decentralized cooperative planning in an automated manufacturing environment. Agents are e.g., robots, and different types of machines like heating cells or welding machines. The domain plmls of the robots are represented as Petri nets, which are translated into a set of rules, so-cailed behaviors. These behaviors are procedures that interleave planning with execution. The MAGSY language enables the efficient and convenient implementation of multiagent systems. It provides a variety of useful services and protocols to establish lnultiagent communication links. Clearly, MAGSY inherits b o t h the positive and the negative properties of rule-based programming languages: on the one hand, there is concurrency and the suitability for modeling reactive agents; on the other, there is the flat knowledge representation and the awkward way to represent sequential programs. Cooperation between agents is hard-wired by connections between the Petri nets representing behaviors of different agents. Thus, MAGSY does not support reasoning about cooperation. 4.2
4.2 Jennings: GRATE*
The focus of Jennings' work on the GRATE* architecture [Jen92b] was on cooperation among possibly preexisting and independent intelligent systems, through an additional cooperation knowledge layer. The problem-solving capabilities of agents were extended by sharing information and tasks among each other. GRATE* is an architecture for the design of interacting problem solvers. A general description of cooperative agent behavior is represented by built-in knowledge. Domain-dependent information about other agents is stored in specific data structures (agent models). GRATE* consists of two layers, a cooperation and control layer and a domain level system. The latter can be preexisting or purpose built; it provides the necessary domain functionality of the individual problem solver. The former layer is a meta-controller operating on the domain level system in order to ensure that its activities are coordinated with those of others in the multiagent system [Jen92a]. Agent models hold different types of knowledge: the acquaintance model includes knowledge the agent has about other agents; the self model comprises an abstracted perspective of the local domain level system, i.e., of the agent's skills and capabilities. The cooperation and control layer consists of three submodules, representing the interplay between local and cooperative behavior: the control module is responsible for the planning, execution, and monitoring of local tasks; the cooperation module handles processes of cooperation and coordination with other agents; the situation assessment module forms the interface between local and social control mechanisms and is thus responsible for the decision to choose local or coordinated methods of problem solving. Clearly, Jennings' focus was on the cooperation process. However, he went beyond the work discussed before by defining a two-layer architecture that embeds cooperation into a domain level system. The architecture does not address more subtle questions of agent behavior, such as how to reconcile reactivity and deliberation. Rather, these problems are expected to be solved within the domain level system.
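The interplay between the agent models and the situation-assessment decision can be sketched as follows (an illustrative Python fragment; the model contents and the decision criterion are simplified assumptions, not GRATE*'s actual implementation):

    # Sketch of GRATE*-style agent models and situation assessment (illustrative only).
    self_model = {"skills": {"analyse-data", "filter-noise"}}
    acquaintance_models = {
        "agent-B": {"skills": {"generate-hypotheses"}},
        "agent-C": {"skills": {"analyse-data"}},
    }

    def assess(task_skill):
        # choose local problem solving if the domain level system can do it,
        # otherwise look for an acquaintance to cooperate with
        if task_skill in self_model["skills"]:
            return ("local", None)
        for agent, model in acquaintance_models.items():
            if task_skill in model["skills"]:
                return ("cooperate", agent)
        return ("fail", None)

    print(assess("analyse-data"))         # ('local', None)
    print(assess("generate-hypotheses"))  # ('cooperate', 'agent-B')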
4.3 Steiner et al.: MECCA
In the MECCA architecture [SBKL93] [Lux95] [LS95], an agent is regarded as having an application-dependent body, a head whose purpose is to actually agentify the underlying system, and a communicator which establishes physical communication links to other agents. This view supports the construction of multiagent systems from second principles. Agent modeling is addressed in the design of the agent's head. It is described by a basic agent loop consisting of four parallel processes: goal activation, planning, scheduling, and execution. In the goal activation process, relevant situations (e.g., user input) are recognized and goals are created that are input to the planning process. There, a partially ordered plan structure is generated corresponding to a set of possible courses of action the agent is allowed to take. The scheduler instantiates (serializes) this partially ordered event structure by assigning time points to actions. The execution of actions is initiated and monitored by the execution process. All control processes in the basic loop may involve coordination with other agents, leading to joint goals, plans, and commitments, and to the synchronized execution of plans. Cooperation is based on speech act theory: MECCA provides a set of cooperation primitives (e.g., INFORM, PROPOSE, ACCEPT) which are treated by the planner as actions, i.e., whose semantics can be described by preconditions and effects. This allows the planner to reason about communication with other agents as a means of achieving goals [LS95]. Moreover, cooperation primitives are the basic building blocks of communication protocols, so-called cooperation methods.
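Treating a cooperation primitive as a planning operator with preconditions and effects can be sketched as follows (an illustrative Python fragment; the predicate names are invented for the example and are not MECCA's actual vocabulary):

    # Sketch: a speech-act cooperation primitive described as a planning operator.
    propose = {
        "name": "PROPOSE(joint-plan, partner)",
        "preconditions": {"have(joint-plan)", "idle(partner)"},
        "effects": {"proposed(joint-plan, partner)"},
    }

    def applicable(operator, state):
        return operator["preconditions"] <= state

    def apply(operator, state):
        return state | operator["effects"]

    state = {"have(joint-plan)", "idle(partner)"}
    if applicable(propose, state):
        state = apply(propose, state)
    print("proposed(joint-plan, partner)" in state)  # True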
4.4 Sundermeyer et al.: COSY
The COSY agent architecture [BS92] describes an agent by behaviors, resources, and intentions. The behavior of an agent is classified into perceptual, cognitive, communicative, and effectoric, each of which is simulated by a specific component in the COSY architecture. Resources include cognitive resources such as knowledge and belief, communication resources such as low-level protocols and communication hardware, and physical resources, e.g., the gripper of a robot. Intentions are used in a sense that differs from [CL90] [RG91a]: there are strategic intentions modeling an agent's long-term goals, preferences, roles, and responsibilities, and tactical intentions that are directly tied to actions, representing an agent's commitment to its chosen course of action. The individual modules of COSY are ACTUATORS, SENSORS, COMMUNICATION, MOTIVATIONS, and COGNITION. The former three are domain-specific modules with their intuitive functionality. The motivations module implements the strategic intentions of an agent. The cognition module evaluates the current situation and selects, executes, and monitors actions of the agent in that situation. It consists of four subcomponents: a Knowledge Base, a Script Execution Component, a Protocol Execution Component, and a Reasoning and Deciding Component. The application-specific problem-solving knowledge is encoded into plans. There are two types of plans stored in a plan library: scripts, describing stereotypical courses of action to achieve certain goals, and cooperation protocols, describing patterns of communication [BHS93]. Scripts are monitored and executed by the Script Execution Component, which hands over the execution of primitive behaviors to the actuators and of protocols to the Protocol Execution Component. The Reasoning and Deciding Component is a general control mechanism, monitoring and administering the reasoning and decisions concerning task selection and plan selection, including the reasoning and decisions concerning intra-script and intra-protocol branches. In [Had96], Haddadi has provided a deeper theoretical model by extending Rao and Georgeff's BDI model with a theory of commitments and by defining mechanisms allowing agents to reason about how to exploit potentials for cooperation by communicating with each other.
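The division of the plan library into scripts and cooperation protocols, and their dispatch to different execution components, can be sketched as follows (an illustrative Python fragment; the plan names and steps are invented examples):

    # Sketch: a COSY-style plan library with scripts and cooperation protocols.
    plan_library = {
        "fetch-part": {"type": "script",
                       "steps": ["move-to-store", "grasp-part", "return"]},
        "order-part": {"type": "protocol",
                       "steps": ["PROPOSE", "ACCEPT", "INFORM"]},
    }

    def execute(plan_name, actuate, communicate):
        plan = plan_library[plan_name]
        for step in plan["steps"]:
            # scripts hand primitive behaviors to the actuators,
            # protocols are handled by the protocol execution component
            (actuate if plan["type"] == "script" else communicate)(step)

    execute("fetch-part", actuate=print, communicate=print)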
5 Hybrid Agent Architectures
The approaches discussed so far suffer from different shortcomings: whereas purely reactive systems have a limited scope insofar as they can hardly implement goal-directed behavior, most deliberative systems are based on general-purpose
reasoning mechanisms which are not tractable, and which are much less reactive. One way to overcome these limitations in practice, which has become popular over the past few years, is layered architectures. Layering is a powerful means for structuring functionalities and control, and thus is a valuable tool for system design supporting several desired properties such as reactivity, deliberation, cooperation, and adaptability. The main idea is to structure the functionalities of an agent into two or more hierarchically organized layers that interact with each other in order to achieve coherent behavior of the agent as a whole. Layering offers the following advantages:
- It supports modularization of an agent; different functionalities are clearly separated and linked by well-defined interfaces. This makes the design of agents more compact, increases robustness, and facilitates debugging.
- Since different layers may run in parallel, the agent's computational capability can be increased in principle by a linear factor. Especially, the agent's reactivity can be increased: while planning, a reactive layer can still monitor the world for contingency situations.
- Since different types and partitions of knowledge are required for the implementation of different functionalities, it is often possible to restrict the amount of knowledge an individual layer needs to consider.
These advantages have made layering a popular technique that has been mostly used to reconcile reaction and deliberation. In the following, four layered approaches are presented: architectures based on Firby's RAPs, the planner-reactor architecture proposed by Lyons and Hendriks, Ferguson's Touring Machines architecture, and the INTERRAP architecture developed by the author of this survey. For further reading, we refer to [Bro86] [Kae90] [BM91] [Fir92] [LH92] [Dab93] [BKMS96] [SP96] [Dav97].
5.1 Firby, Gat, Bonasso: reactive action packages
Firby's work [Fir89] [Fir92] has been most influential in research on integrating reaction and deliberation in the area of AI planning and robotics. In this paragraph, we outline Firby's original work and two of its recent extensions.
The RAPs System. The RAPs (Reactive Action Packages) system describes an integrated architecture for planning and control. The underlying agent architecture consists of three modules: a planning layer, the RAP executor, and a controller. The planning layer produces sketchy plans for achieving goals using a world model representation and a plan library. The RAP executor fills in the details of the sketchy plans at run-time. The expansion of vague plan steps into more detailed instructions (methods) at run-time reduces the amount of planning uncertainty and thus largely simplifies planning. If incorrect methods are selected at run-time, the RAP executor is able to recognize failure 6 and to select alternative methods to achieve the goal. Apart from controlling the process of achieving goals in a reactive manner, and thus providing the interface between subsymbolic continuous and symbolic discrete representation and reasoning, the RAP executor offers a set of abstract primitive actions to the planner. The controller provides two kinds of routines that can be activated by requests from the RAP executor and that deliver results to that module: active sensing routines and behavior control routines. Sensing routines are useful for providing missing information about the current world state. Behavior routines are continuous control processes that change the state of the physical environment. Examples of behavior routines are collision avoidance, visual tracking, or moving in a specified direction. In a later paper [Fir94], the control of continuous processes (i.e., the interplay of the RAP executor and the controller) is elaborated by describing an extension to the RAPs representation language and the semantics for task nets.
6 The underlying assumption is that of a cognizant failure: it is not required that no failure occurs, but that virtually all possible failures may be detected if they occur, and that repair methods can be applied to recognized failures.
ATLANTIS. Gat [Gat91b] [Gat92] describes the heterogeneous, asynchronous architecture ATLANTIS that combines a traditional AI planner with a reactive control mechanism for robot navigation applications. ATLANTIS consists of three control components: a controller, a sequencer, and a deliberator. The controller is responsible for executing and monitoring the primitive activities of the agent. [Gat91a] defines a language for modeling the often nonlinear and discontinuous control processes. The controller thus connects to the physical sensors and actuators of the system. The deliberator process performs deliberative computations which may be time-consuming, such as planning or world modeling. Between the two components stands the sequencer, which initiates and terminates primitive activities by activating and deactivating control processes in the controller, and which maintains the allocation of computational resources to the deliberator by initiating and terminating deliberation with respect to a specific task. As in the RAPs system, the sequencer maintains a task queue; each task is described by a set of methods together with conditions for their applicability. Methods describe either primitive activities or subtasks; in the former case, the corresponding module in the controller is activated; the latter case is handled by recursive expansion. ATLANTIS extends Firby's original work by allowing control of activities instead of primitive actions, and provides a bottom-up flow of control: in RAPs, tasks are installed by the planner, whereas they are initiated in the sequencer in Gat's architecture.
The 3T architecture. In [BKMS96], Peter Bonasso and colleagues have defined the layered architecture 3T, which enhances the RAPs system by a planner. In particular, 3T consists of three control layers: a reactive skill layer, a sequencing layer, and a deliberation layer. The reactive skill layer provides a set of situated skills. Skills are capabilities that, if placed in the proper context, achieve or maintain particular states in the world. The sequencing layer is based on the RAPs system. It maintains routine tasks that the agent has to accomplish. The sequencing layer triggers continuous control processes by activating and deactivating reactive skills. Finally, the deliberation layer provides a deliberative planning capability which selects appropriate RAPs to achieve complex tasks. This selection process may involve reasoning about goals, resources, and timing constraints. Compared to Gat's work, 3T uses a more powerful planning mechanism; moreover, reactivity, i.e., the ability to react to time-critical events, is implemented at the skill layer in 3T, whereas it is partly a task of the sequencing layer in ATLANTIS. Both of these extensions make the sequencing layer in 3T more compact and easier to handle.
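The division of labor among the three layers common to RAPs, ATLANTIS, and 3T can be sketched as follows (an illustrative Python fragment; the skill and task names are invented, and the control flow is greatly simplified relative to any of the actual systems):

    # Sketch: a three-layer (skills / sequencer / deliberator) control skeleton.
    skills = {
        "avoid-obstacles": lambda world: world.setdefault("safe", True),
        "move-to-dock":    lambda world: world.setdefault("at-dock", True),
    }

    def sequencer(task, world):
        # expand a routine task into skill activations (RAP-style methods)
        methods = {"dock-robot": ["avoid-obstacles", "move-to-dock"]}
        for skill in methods[task]:
            skills[skill](world)

    def deliberator(goal):
        # select which routine tasks achieve the goal (stands in for the planner)
        return ["dock-robot"] if goal == "recharge" else []

    world = {}
    for task in deliberator("recharge"):
        sequencer(task, world)
    print(world)   # {'safe': True, 'at-dock': True}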
5.2 Lyons and Hendriks: planning and reaction
In [LH92] a practical approach towards integrating reaction and deliberation in a robotics domain is introduced, based on the planner-reactor model proposed by Drummond and Bresina [DB90] in their ERE architecture incorporating planning, scheduling, and control 7. Whereas Drummond and Bresina's model focused on the anytime character of the architecture, Lyons and Hendriks put emphasis on the task of producing timely, relevant actions, i.e., on the task of qualitatively reasonable behavior. The basic structure of Lyons and Hendriks' architecture is described by a planner, a reactor, and a world (which is, in the spirit of control theory [DW91], looked upon as a part of the system to be described). In contrast to the hybrid approaches discussed so far, in Lyons and Hendriks' model planning is looked upon as incrementally adapting the reactive system, which is running concurrently in a separate process, by bringing it into accordance with a set of goals. Thus, the planner can iteratively improve the behavior of the reactive component. The reactor itself consists of a network of reactions, i.e., sensory processes that are coupled with action processes in the sense that the sensory processes initiate their corresponding action processes when they meet their trigger conditions. It can act at any time independently from the planner, and it acts in real time. The planner can reason about a model of the environment (EM), a description of the reactor (R), and a description of the goals (G) that are currently to be achieved by the reactor, as well as constraints imposed by these goals. The task of the planner is to continuously monitor whether the behavior of R conforms to G. If this is not the case, the planner incrementally changes the configuration of R by specifying adaptations. On the other hand, the reactor can send collected sensory data to the planner, allowing the latter to predict the future state of the environment. Adaptations of the reactor include removing reactions from the reactor and adding new reactions.
7 A similar paradigm has been proposed by McDermott [McD91].
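The idea of a reactor as a network of trigger/action reactions that the planner adapts at run time can be sketched as follows (an illustrative Python fragment; the reaction names are invented examples):

    # Sketch: a planner-reactor style reactor as a set of trigger/action reactions.
    reactor = [
        {"name": "stop-at-obstacle",
         "trigger": lambda percept: percept.get("obstacle", False),
         "action": "halt"},
    ]

    def react(percept):
        # the reactor runs independently of the planner, in (simulated) real time
        return [r["action"] for r in reactor if r["trigger"](percept)]

    def adapt(add=None, remove=None):
        # the planner adapts the reactor by adding and removing reactions
        if remove:
            reactor[:] = [r for r in reactor if r["name"] != remove]
        if add:
            reactor.append(add)

    print(react({"obstacle": True}))             # ['halt']
    adapt(add={"name": "seek-light",
               "trigger": lambda p: p.get("dark", False),
               "action": "turn-on-lamp"})
    print(react({"dark": True}))                 # ['turn-on-lamp']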
5.3 Ferguson: Touring Machines
In [Fer92], Ferguson describes a layered control architecture for autonomous, mobile agents performing constrained navigation tasks in a dynamic environment. The Touring Machines architecture consists of three layers: a reactive layer, a planning layer, and a modeling layer. These layers operate concurrently; each of them is connected to the agent's sensory subsystem, from which it receives perceptual information, and to the action subsystem, to which it sends action commands. The reactive layer is designed to compute hard-wired, domain-specific action responses to specific environmental stimuli; thus, it brings about reactive behavior. On the other hand, the planning layer is responsible for generating and executing plans for the achievement of the longer-term relocation tasks the agent has to perform. Plans are stored as hierarchical partial plans in a plan library; based on a topological map, single-agent linear plans of action are computed by the agent. Planning and execution are interleaved to cope with certain forms of expected failure. The modeling layer provides the agent's capability of modifying plans based on changes in its environment that cannot be dealt with by the replanning mechanisms provided by the planning layer. In addition, the modeling layer provides a framework for modeling the agent's environment and, especially, for building and maintaining mental and causal models of other agents. The individual layers are mediated by a control framework that coordinates their access both to sensory input and to action output. This is described by means of a set of context-activated control rules. There are two types of rules: censors and suppressors (see also [Min86]). Censors filter selected information from the input to the control layers; suppressors filter selected information (i.e., action commands) from the output of the control layers. The unrestricted concurrent access of the control layers to information and action and the global (i.e., for the agent as a whole) control rules in Ferguson's model imply a high design effort to analyze, predict, and prevent possible interactions among the layers. Since each layer may interact with any other layer in various ways, either by being activated through similar patterns of perception or by triggering contradictory or incompatible actions, a large number of control rules are necessary. Thus, for complex applications, the design of consistent control rules itself is a very hard problem.
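The mediation of the layers' inputs and outputs by censor and suppressor rules can be sketched as follows (an illustrative Python fragment; the rules themselves are invented examples and not Ferguson's actual rule set):

    # Sketch: censor and suppressor rules mediating the layers' input and output.
    def censor_rules(layer, percept):
        # censors filter selected information from a layer's sensory input
        if layer == "reactive" and percept.get("distant-obstacle"):
            percept = {k: v for k, v in percept.items() if k != "distant-obstacle"}
        return percept

    def suppressor_rules(commands):
        # suppressors filter selected action commands from the layers' output
        if "emergency-stop" in commands:
            return ["emergency-stop"]            # override everything else
        return commands

    percept = {"distant-obstacle": True, "kerb-ahead": True}
    print(censor_rules("reactive", percept))                    # {'kerb-ahead': True}
    print(suppressor_rules(["follow-plan", "emergency-stop"]))  # ['emergency-stop']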
5.4 Müller and Pischel: INTERRAP
Like the 3T and Touring Machines architectures discussed in the previous sections, INTERRAP [MP94] [Mül96b] [Mül96a] consists of three layers. However, the focus of INTERRAP was to extend the scope of agent control architectures by supporting agent interaction. For this purpose, it offers a separate cooperative planning layer on top of the behavior-based layer and the local planning layer. The cooperative planning layer implements a cooperation model (see [Mül96a]). It provides negotiation protocols and negotiation strategies (see also [RZ94]). Triggered by control messages from the lower layers, an agent can decide to
start a cooperation with other agents by selecting a protocol and a strategy. In [Mül96b], it has been shown how autonomous robots can solve conflicts by negotiating a joint plan. As regards the control flow, INTERRAP forgoes global control rules. Rather, it uses two hierarchical control mechanisms. Activity is triggered bottom-up by so-called upward activation requests, whereas execution is triggered top-down (downward commitment posting); i.e., a control layer in INTERRAP will become active only if the next lower layer cannot deal with a situation. This competence-based control allows an agent to react adequately to a situation, either by patterns of behavior, by local planning, or by cooperation. The execution output of the cooperative planning layer is partial plans with built-in synchronization commands; these are passed to the local planning layer, which outputs calls to patterns of behaviour in the behavior-based component. The latter component then produces actions. The strict control in INTERRAP considerably simplifies design; a more flexible architecture, e.g., one allowing concurrent activation of the different control layers, would certainly be useful, but the discussion of the approaches in this section shows that this will not come for free, and requires sophisticated mechanisms of coordination and pre-emption.
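The competence-based control flow (upward activation, downward commitment posting) can be sketched as follows (an illustrative Python fragment; the situations and responses are invented examples, greatly simplified relative to INTERRAP itself):

    # Sketch: INTERRAP-style competence-based control across three layers.
    def behavior_layer(situation):
        return "reactive-pattern" if situation == "obstacle" else None

    def local_planning_layer(situation):
        return "local-plan" if situation == "blocked-route" else None

    def cooperative_planning_layer(situation):
        return "negotiate-joint-plan"    # last resort: cooperate with other agents

    layers = [behavior_layer, local_planning_layer, cooperative_planning_layer]

    def handle(situation):
        # upward activation: a layer becomes active only if the layer below
        # cannot deal with the situation; execution is then posted downward
        for layer in layers:
            response = layer(situation)
            if response is not None:
                return response

    print(handle("obstacle"))        # reactive-pattern
    print(handle("conflict"))        # negotiate-joint-plan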
6 Other Approaches
Over the past few years, the notion of an agent has been used in a number of different contexts, often in confusing ways, and not only for non-experts. In this section we shall discuss architectural issues arising in some of these areas: believable agents, software agents, and softbots.
6.1 Believable Agents
The Oz project [Bat94] [Rei96] investigates agents that are both autonomous and able to act as believable characters in interactions with humans (e.g., in a computer game). Such agents may not be intelligent and may not be realistic, but they will have a strong personality. The approach to modeling these agents is what is called a broad but shallow approach (see also [SP96] [Dav97]): instead of being able to do a small number of things particularly well, the agent is required to cope with a variety of situations occurring in interactions with human users. The architectural challenges in building believable agents lie in the integration of a large set of shallow capabilities. The Tok architecture [BLR92] extends the traditional AI approach in various respects: firstly, it enhances sensing-acting with the notion of body and face changes. Secondly, it maintains an emotional state of the agent, with specific emotion types and structures and with corresponding behavioral features. Emotions can cause changing behaviour of the agent in terms of motivation, action, body state, and facial expression.
6.2 Software Agents and Softbots
The rapid development of the Web has given rise to the development of a different class of agents: software agents [Mae94a] [GK94] and softbots [EW94] [Etz96]. In these approaches, the notion of an agent is used as a metaphor for "intelligent autonomous system[s] that help[s] the user with certain computer-based tasks [... and that] offer[s] assistance to the user and automate[s] as many of the actions of the user as possible" [Mae94b]. Thus, software agents and softbots are agents that act autonomously in a software-based domain, and whose task is mostly to assist the user in dealing with information management tasks. Etzioni [Etz96] rightfully draws the analogy between the task of designing such software agents and that of designing autonomous robots. While the architectural requirements of modeling software agents seem indeed very similar to those of traditional (mostly robot-like) agents, there are a number of differences: firstly, a software environment is in many respects easier to deal with than a hardware environment, as it liberates the designer from the problems of a rough physical environment that are related to sensing and physically manipulating the world. Thus, although software agents need to cope with a changing world, the possible ways in which this world may change are somewhat more restricted in practice. Secondly, software agents often interact closely with humans. This increases the importance of dealing with communication and cooperation compared to robotic systems. Both differences seem to justify agent architectures for software agents focusing on the higher-level functionalities such as adaptation, cooperation, and planning, and simplifying the lower layers of reactivity and physical action. Current software agent research mostly operates under what Etzioni calls a useful first approach, which focuses on useful functionality and either defers or abandons the claim of (AI-style) intelligence (see also [FG97] [Pet96] for two interesting recent essays in this context). Systems like Etzioni's softbots or Maes's assistant agents do not seem to have given up the long-term plan of combining intelligent with useful behavior in Internet domains (e.g., by using AI planning techniques in the former case); software agents research at Stanford University focuses on communication between agents and is strongly based on the typed-messages paradigm [GK94]; the KQML Java agent template is an example of an implementation of a simple agent model based on JAVA [Fro96]. However, most other software agent approaches use the term agent in a very loose and shallow manner: i.e., they rely on a definition of an agent as a program that is located in a network and -- either through migration or through a service interface -- can communicate with other programs on the network. From the architectural point of view, this sort of agent does not seem to be very interesting. The challenge in their development is -- as is the case with mobile agents [SBH96] -- at a different, more technical level. This holds true especially for the existing (semi-)commercial agent-based systems, most of them JAVA- or TELESCRIPT-based. There are, however, a few notable exceptions: one is the dMARS system, which offers an agent programming toolkit based on a BDI model (see Section 3) along with a design methodology [KG97] and which has been used to model various industrial applications [RG95]. A second one is the MEDL 8 Agent Framework [FJM+96], which offers a general agent programming framework for software agent applications, based on a high-level communication model and a flexible layered agent architecture. It may be too early for such a conclusion, but it seems that in the future these systems might provide the appropriate trade-off between the sophisticated but not sufficiently useful academic approaches and today's useful but not very sophisticated software agents.
8 Mitsubishi Electric Digital Library.
7 Conclusion
In this paper we surveyed different control architectures for autonomous agents. Starting with a discussion of three research threads, i.e., reactive, deliberative, and interacting agents, we proceeded towards hybrid approaches that aimed at integrating various agent capabilities. Finally, we overviewed some recent developments in software agents. In this survey we did not say anything about the numerous parental disciplines of agent research. For the interested reader, we refer to [Mül96b, Chapter 2]. Given the variety of the approaches described in this paper, one might wonder whether there is some sort of convergence towards a generally accepted agent architecture. One quick and simple answer is that, unfortunately, this is not the case: researchers are still debating the definition of an agent (see [FG97] for a good survey of this debate), and a general agent architecture seems to be out of sight as long as there is no general agreement on the basics of an agent. A more optimistic answer, however, is that despite the ongoing debate, there are a variety of architectures that constitute operational models for implemented agent languages. Layered agent architectures and BDI architectures are widely agreed upon, and there is active research work on combining them [FMP96] [KG97]. Design methodologies for agents and agent-based systems are becoming a research issue in their own right. The aim of a unique and generally agreed-upon definition of agents and agent architectures seems to me a bit like wanting a single programming language to be used worldwide. Given this analogy, maybe the important issue is not to agree on a unique definition of agents and of their internal structure, but rather to develop uniform interfaces and communication standards that allow different and possibly heterogeneous agents to interact.
References
[AC87] P. E. Agre and D. Chapman. Pengi: An implementation of a theory of activity. In Proc. of AAAI-87, pages 268-272. Morgan Kaufmann, 1987.
[AC90] P. E. Agre and D. Chapman. What are plans for? In [Mae90a], pages 17-34. 1990.
[AHT90] J. F. Allen, J. Hendler, and A. Tate. Readings in Planning. Morgan Kaufmann, San Mateo, 1990.
[AIS90] J. A. Ambros-Ingersson and S. Steel. Integrating planning, execution, and monitoring. In [AHT90], pages 735-740. 1990.
[Ark90] R. C. Arkin. Integrating behavioral, perceptual, and world knowledge in reactive navigation. In [Mae90a], pages 105-122. 1990.
[BA95] T. Balch and R. C. Arkin. Motor schema-based formation control for multiagent robot teams. In Proceedings of the First International Conference on Multiagent Systems, San Francisco, CA, 1995.
[Bab86] A. Babloyantz. Molecules, Dynamics and Life. An Introduction to Self-Organization of Matter. John Wiley and Sons, 1986.
[Bat94] J. Bates. The role of emotions in believable agents. Communications of the ACM, 37(7):122-125, 1994.
[BF96] M. Barbuceanu and M. S. Fox. The architecture of an agent building shell. In M. Wooldridge, J. P. Müller, and M. Tambe, editors, Intelligent Agents -- Proceedings of the 1995 Workshop on Agent Theories, Architectures, and Languages (ATAL-95), volume 1037 of Lecture Notes in Artificial Intelligence, pages 235-250. Springer-Verlag, 1996.
[BG88] A. Bond and L. Gasser. Readings in Distributed Artificial Intelligence. Morgan Kaufmann, Los Angeles, CA, 1988.
[BHS93] B. Burmeister, A. Haddadi, and K. Sundermeyer. Generic configurable cooperation protocols for multi-agent systems. In Pre-Proceedings of MAAMAW-93, University of Neuchâtel, August 1993.
[BIP87] M. E. Bratman, D. J. Israel, and M. E. Pollack. Toward an architecture for resource-bounded agents. Technical Report CSLI-87-104, Center for the Study of Language and Information, SRI and Stanford University, August 1987.
[BKMS96] R. P. Bonasso, D. Kortenkamp, D. P. Miller, and M. Slack. Experiences with an architecture for intelligent, reactive agents. In M. Wooldridge, J. P. Müller, and M. Tambe, editors, Intelligent Agents -- Proceedings of the 1995 Workshop on Agent Theories, Architectures, and Languages (ATAL-95), volume 1037 of Lecture Notes in Artificial Intelligence, pages 187-202. Springer-Verlag, 1996.
[BLR92] J. Bates, A. B. Loyall, and W. S. Reilly. An architecture for action, emotion, and social behavior. In Proceedings of the Fourth European Workshop on Modeling Autonomous Agents in a Multi-Agent World (MAAMAW-92), S. Martino al Cimino, Italy, 1992.
[BM91] H.-J. Bürckert and H. J. Müller. RATMAN: Rational Agents Testbed for Multi-Agent Networks. In Y. Demazeau and J.-P. Müller, editors, Decentralized A.I., volume 2, pages 217-230. North-Holland, 1991. Also published in the Proceedings of MAAMAW-90.
[Bra87] M. E. Bratman. Intentions, Plans, and Practical Reason. Harvard University Press, 1987.
[Bro86] Rodney A. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, RA-2(1):14-23, April 1986.
[Bro90] Rodney A. Brooks. A robot that walks: Emergent behaviors from a carefully evolved network. In Patrick Henry Winston and Sarah Alexandra Shellard, editors, Artificial Intelligence at MIT, Expanding Frontiers, pages 28-39. MIT Press, Cambridge, Massachusetts, 1990.
[Bro91] R. A. Brooks. Intelligence without representation. Artificial Intelligence, 47:139-159, 1991.
[BS92] B. Burmeister and K. Sundermeyer. Cooperative problem-solving guided by intentions and perception. In Y. Demazeau and E. Werner, editors, Decentralized A.I., volume 3. North-Holland, 1992.
[CD96] B. Chaib-Draa. Interaction between agents in routine, familiar and unfamiliar situations, 1996. To appear in the International Journal of Intelligent and Cooperative Information Systems.
[CL90] P. R. Cohen and H. J. Levesque. Intention is choice with commitment. Artificial Intelligence, 42(3), 1990.
[Dab93] V. G. Dabija. Deciding Whether to Plan to React. PhD thesis, Stanford University, Department of Computer Science, December 1993.
[Dav97] D. N. Davis. Reactive and motivational agents: towards a collective minder. In J. P. Müller, M. J. Wooldridge, and N. R. Jennings, editors, Intelligent Agents III -- Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages (ATAL-96), Lecture Notes in Artificial Intelligence. Springer-Verlag, Heidelberg, 1997.
[DB90] M. Drummond and J. Bresina. Anytime synthetic projection: Maximizing the probability of goal satisfaction. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), pages 138-144. AAAI Press / MIT Press, 1990.
[Den87] D. Dennett. The Intentional Stance. MIT Press, Cambridge, MA, 1987.
[DW91] T. L. Dean and M. P. Wellman. Planning and Control. Morgan Kaufmann Publishers, San Mateo, CA, 1991.
[ES89] E. A. Emerson and J. Srinivasan. Branching time temporal logic. In J. W. de Bakker, W.-P. de Roever, and G. Rozenberg, editors, Linear Time, Branching Time and Partial Order in Logics and Models for Concurrency, pages 123-172. Springer-Verlag, Berlin, 1989.
[Etz96] O. Etzioni. Moving up the information food chain: Deploying softbots on the world wide web. In Proceedings of AAAI-96 (abstract of invited talk), 1996.
[EW94] O. Etzioni and D. Weld. A softbot-based interface to the internet. Communications of the ACM, 37(7):72-76, 1994.
[Fer89] J. Ferber. Eco-problem solving: How to solve a problem by interactions. In Proceedings of the 9th Workshop on DAI, pages 113-128, 1989.
[Fer92] I. A. Ferguson. TouringMachines: An Architecture for Dynamic, Rational, Mobile Agents. PhD thesis, Computer Laboratory, University of Cambridge, UK, 1992.
[Fer95] I. A. Ferguson. Integrated control and coordinated behaviour. In M. J. Wooldridge and N. R. Jennings, editors, Intelligent Agents -- Theories, Architectures, and Languages, volume 890 of Lecture Notes in AI. Springer, January 1995.
[FF94] T. Finin and R. Fritzson. KQML -- a language and protocol for knowledge and information exchange. In Proceedings of the 13th Intl. Distributed Artificial Intelligence Workshop, pages 127-136, Seattle, WA, USA, 1994.
[FG97] S. Franklin and A. Graesser. Is it an agent, or just a program?: A taxonomy for autonomous agents. In J. P. Müller, M. J. Wooldridge, and N. R. Jennings, editors, Intelligent Agents III -- Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages (ATAL-96), Lecture Notes in Artificial Intelligence. ECAI-96, Budapest, 1997.
[FHN71] R. E. Fikes, P. E. Hart, and N. Nilsson. STRIPS: A new approach to the application of theorem proving. Artificial Intelligence, 2:189-208, 1971.
[Fir89] R. James Firby. Adaptive Execution in Dynamic Domains. PhD thesis, Yale University, Computer Science Department, 1989. Also published as Technical Report YALEU/CSD/RR#672.
[Fir92] R. James Firby. Building symbolic primitives with continuous control routines. In J. Hendler, editor, Proceedings of the 1st International Conference on Artificial Intelligence Planning Systems (AIPS-92). Morgan Kaufmann Publishers, San Mateo, CA, 1992.
[Fir94] R. James Firby. Task networks for controlling continuous processes. In Proceedings of the 2nd International Conference on Artificial Intelligence Planning Systems (AIPS-94), pages 49-54, 1994.
[Fis93] K. Fischer. Verteiltes und kooperatives Planen in einer flexiblen Fertigungsumgebung. DISKI, Dissertationen zur Künstlichen Intelligenz. infix, 1993.
[FJM+96] I. A. Ferguson, N. R. Jennings, J. P. Müller, M. Pischel, and M. J. Wooldridge. The MEDL agent architecture. Internal Working Paper, 1996.
[FMP96] K. Fischer, J. P. Müller, and M. Pischel. A pragmatic BDI architecture. In M. Wooldridge, J. P. Müller, and M. Tambe, editors, Intelligent Agents -- Proceedings of the 1995 Workshop on Agent Theories, Architectures, and Languages (ATAL-95), volume 1037 of Lecture Notes in Artificial Intelligence, pages 203-218. Springer-Verlag, 1996.
[Fro96] H. R. Frost. The JAVA agent template. http://cdr.stanford.edu/ABE/JavaAgent.html, 1996.
[Gat91a] E. Gat. Alfa: a language for programming reactive robotic control systems. In Proceedings of the IEEE Conference on Robotics and Automation, 1991.
[Gat91b] E. Gat. Reliable Goal-directed Reactive Control for Real-World Autonomous Mobile Robots. PhD thesis, Virginia Polytechnic and State University, Blacksburg, Virginia, 1991.
[Gat92] E. Gat. Integrating planning and reacting in a heterogeneous asynchronous architecture for controlling real-world mobile robots. In Proceedings of AAAI-92, pages 809-815, 1992.
[Geo83] M. Georgeff. Communication and interaction in multi-agent plans. In Proceedings of IJCAI-83, pages 125-129, Karlsruhe, Germany, 1983.
[GH89] L. Gasser and M. N. Huhns. Distributed Artificial Intelligence, Volume II. Research Notes in Artificial Intelligence. Morgan Kaufmann, San Mateo, CA, 1989.
[GI89] M. P. Georgeff and F. F. Ingrand. Decision-making in embedded reasoning systems. In Proceedings of the 6th International Joint Conference on Artificial Intelligence, pages 972-978, 1989.
[GK94] M. R. Genesereth and S. P. Ketchpel. Software agents. Communications of the ACM, 37(7):48-53, 1994.
[GL86] M. P. Georgeff and A. L. Lansky. Procedural knowledge. In Proceedings of the IEEE Special Issue on Knowledge Representation, volume 74, pages 1383-1398, 1986.
[Gmy96] P. J. Gmytrasiewicz. On reasoning about other agents. In M. Wooldridge, J. P. Müller, and M. Tambe, editors, Intelligent Agents -- Proceedings of the 1995 Workshop on Agent Theories, Architectures, and Languages (ATAL-95), volume 1037 of Lecture Notes in Artificial Intelligence, pages 143-155. Springer-Verlag, 1996.
[Had96] A. Haddadi. Communication and Cooperation in Agent Systems: A Pragmatic Theory, volume 1056 of Lecture Notes in Artificial Intelligence. Springer-Verlag, Heidelberg, 1996.
[Jen92a] N. R. Jennings. Joint Intentions as a Model of Multi-Agent Cooperation. PhD thesis, Queen Mary and Westfield College, London, August 1992.
[Jen92b] N. R. Jennings. Towards a cooperation knowledge level for collaborative problem solving. In Proceedings of the 10th European Conference on Artificial Intelligence, pages 224-228, Vienna, 1992.
[Kae90] L. P. Kaelbling. An architecture for intelligent reactive systems. In J. Allen, J. Hendler, and A. Tate, editors, Readings in Planning, pages 713-728. Morgan Kaufmann, 1990.
[KG91] D. Kinny and M. P. Georgeff. Commitment and effectiveness of situated agents. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI-91), pages 82-88, Sydney, Australia, 1991.
[KG97] D. Kinny and M. Georgeff. Modelling and design of multi-agent systems. In J. P. Müller, M. J. Wooldridge, and N. R. Jennings, editors, Intelligent Agents III, Lecture Notes in Artificial Intelligence. Springer-Verlag, Heidelberg, 1997. To appear.
[KGR96] D. Kinny, M. P. Georgeff, and A. S. Rao. A methodology and modelling technique for systems of BDI agents. In W. van de Velde and J. W. Perram, editors, Agents Breaking Away -- 7th European Workshop on Modelling Autonomous Agents in a Multi-Agent World (MAAMAW'96), volume 1038 of Lecture Notes in Artificial Intelligence, pages 56-71. Springer-Verlag, 1996.
[KR90] L. P. Kaelbling and S. J. Rosenschein. Action and planning in embedded agents. In [Mae90a], pages 35-48. 1990.
[Lan89] C. G. Langton. Artificial life. In C. G. Langton, editor, Artificial Life. Addison-Wesley, 1989.
[LH92] D. M. Lyons and A. J. Hendriks. A practical approach to integrating reaction and deliberation. In Proceedings of the 1st International Conference on AI Planning Systems (AIPS), pages 153-162, San Mateo, CA, June 1992. Morgan Kaufmann.
[LS95] A. Lux and D. D. Steiner. Understanding cooperation: an agent's perspective. In Proceedings of the First International Conference on Multiagent Systems, San Francisco, CA, 1995.
[Lux95] A. Lux. Kooperative Mensch-Maschine Arbeit - ein Modellierungsansatz und dessen Umsetzung im Rahmen des Systems MEKKA. PhD thesis, Universität des Saarlandes, Saarbrücken, 1995.
[Mae89] P. Maes. The dynamics of action selection. In Proceedings of IJCAI-89, pages 991-997, Detroit, Michigan, August 1989.
[Mae90a] P. Maes, editor. Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back. MIT/Elsevier, 1990.
[Mae90b] P. Maes. Situated agents can have goals. In [Mae90a], pages 49-70. 1990.
[Mae94a] P. Maes. Agents that reduce work and information overload. Communications of the ACM, 37(3):31-40, 1994.
[Mae94b] P. Maes. Modeling adaptive autonomous agents. Artificial Life Journal, 1(1 & 2), 1994.
[Mat93] M. Mataric. Synthesizing group behaviors. In Proc. of IJCAI Workshop on Dynamically Interacting Robots, pages 1-10, Chambery, France, August 1993.
[MC95] F. G. McCabe and K. L. Clark. April -- agent process interaction language. In [WJ95], pages 324-340. 1995.
[McD91] D. McDermott. Robot planning. Technical Report 861, Yale University, Department of Computer Science, 1991.
[Min86] M. Minsky. The Society of Mind. Simon and Schuster (Touchstone), 1986.
[MLF96] J. Mayfield, Y. Labrou, and T. Finin. Evaluating KQML as an agent communication language. In M. Wooldridge, J. P. Müller, and M. Tambe, editors, Intelligent Agents -- Proceedings of the 1995 Workshop on Agent Theories, Architectures, and Languages (ATAL-95), volume 1037 of Lecture Notes in Artificial Intelligence, pages 347-360. Springer-Verlag, 1996.
[MP94] J. P. Müller and M. Pischel. An architecture for dynamically interacting agents. International Journal of Intelligent and Cooperative Information Systems (IJICIS), 3(1):25-45, 1994.
[Mül96a] J. P. Müller. A cooperation model for autonomous agents. In Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages (ATAL-96), Budapest, 1996.
[Mül96b] J. P. Müller. The Design of Autonomous Agents -- A Layered Approach, volume 1177 of Lecture Notes in Artificial Intelligence. Springer-Verlag, Heidelberg, 1996.
[MvdHV91] J.-J. C. Meyer, W. van der Hoek, and G. A. W. Vreeswijk. Epistemic logic for computer science: A tutorial (part one). In Bulletin of the EATCS, volume 44, pages 242-270. European Association for Theoretical Computer Science, 1991.
[MWJ97] J. P. Müller, M. J. Wooldridge, and N. R. Jennings, editors. Intelligent Agents, volume III of Lecture Notes in Artificial Intelligence. Springer-Verlag, 1997.
[NL96] T. J. Norman and D. Long. Alarms: An implementation of motivated agency. In M. Wooldridge, J. P. Müller, and M. Tambe, editors, Intelligent Agents -- Proceedings of the 1995 Workshop on Agent Theories, Architectures, and Languages (ATAL-95), volume 1037 of Lecture Notes in Artificial Intelligence, pages 219-234. Springer-Verlag, 1996.
[NP77] G. Nicolis and I. Prigogine. Self-organization in Non-equilibrium Systems. Wiley Interscience, New York, 1977.
[NS76] A. Newell and H. A. Simon. Computer science as empirical enquiry: Symbols and search. Communications of the ACM, 19(3):113-126, 1976.
[Pet96] C. Petrie. Agent-based engineering, the web, and intelligence. To appear in IEEE Expert, 1996.
[RBP91] J. E. Rumbaugh, M. Blaha, and W. Premerlani. Object-oriented Modeling and Design. Prentice Hall, 1991.
[Rei96] W. S. N. Reilly. Believable Social and Emotional Agents. PhD thesis, School of Computer Science, Carnegie Mellon University, 1996.
[RG91a] A. S. Rao and M. P. Georgeff. Modeling agents within a BDI-architecture. In R. Fikes and E. Sandewall, editors, Proc. of the 2nd International Conference on Principles of Knowledge Representation and Reasoning (KR'91), pages 473-484, Cambridge, Mass., April 1991. Morgan Kaufmann.
[RG91b] A. S. Rao and M. P. Georgeff. Modeling rational agents within a BDI-architecture. Technical Report 14, Australian AI Institute, Carlton, Australia, 1991.
[RG92] A. S. Rao and M. P. Georgeff. An abstract architecture for rational agents. In Proc. of the 3rd International Conference on Principles of Knowledge Representation and Reasoning (KR'92), pages 439-449. Morgan Kaufmann, October 1992.
[RG95] A. S. Rao and M. P. Georgeff. BDI-agents: from theory to practice. In Proceedings of the First Intl. Conference on Multiagent Systems, San Francisco, 1995.
[RN95] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.
[RZ94] J. S. Rosenschein and G. Zlotkin. Rules of Encounter: Designing Conventions for Automated Negotiation among Computers. MIT Press, 1994.
[Sac75] Earl D. Sacerdoti. The nonlinear nature of plans. In IJCAI-75, pages 206-218, 1975.
[SBH96] M. Straßer, J. Baumann, and F. Hohl. Beyond JAVA: Merging CORBA-based mobile agents and WWW. In Joint W3C/OMG Workshop on Distributed Objects and Mobile Code (accepted position paper), Boston, Massachusetts, 1996.
[SBKL93] D. D. Steiner, A. Burt, M. Kolb, and Ch. Lerin. The conceptual framework of MAI2L. In Pre-Proceedings of MAAMAW'93, Neuchâtel, Switzerland, August 1993.
[SH90] J. Sanborn and J. Hendler. A model of reaction for planning in dynamic environments. International Journal of Artificial Intelligence in Engineering, 6(1):41-60, 1990.
[Sho93] Y. Shoham. Agent-oriented programming. Artificial Intelligence, 60:51-92, 1993.
[Sim81] H. A. Simon. The Sciences of the Artificial. MIT Press, Cambridge, MA, 2nd edition, 1981.
[SP96] A. Sloman and R. Poli. SIM_AGENT: A toolkit for exploring agent designs. In M. Wooldridge, J. P. Müller, and M. Tambe, editors, Intelligent Agents -- Proceedings of the 1995 Workshop on Agent Theories, Architectures, and Languages (ATAL-95), volume 1037 of Lecture Notes in Artificial Intelligence, pages 392-407. Springer-Verlag, 1996.
[Ste90] L. Steels. Cooperation between distributed agents through self-organization. In Y. Demazeau and J.-P. Müller, editors, Decentralized A.I., pages 175-196. North-Holland, 1990.
[Suc87] L. A. Suchman. Plans and Situated Actions. Cambridge University Press, Cambridge, 1987.
[Tho93] S. R. Thomas. PLACA, an Agent Oriented Programming Language. PhD thesis, Stanford University, 1993. Available as Stanford University Computer Science Department Technical Report STAN-CS-93-1487.
[Tho95] S. R. Thomas. The PLACA agent programming language. In [WJ95], pages 355-370. 1995.
[Var86] M. Y. Vardi. On epistemic logic and logical omniscience. In Proc. of the First Conference on Theoretical Aspects of Reasoning about Knowledge (TARK'86), pages 293-306. Morgan Kaufmann Publishers, 1986.
[Wil88] D. E. Wilkins. Practical Planning: Extending the Classical AI Planning Paradigm. Morgan Kaufmann, San Mateo, CA, 1988.
[WJ95] M. J. Wooldridge and N. R. Jennings, editors. Intelligent Agents -- Theories, Architectures, and Languages, volume 890 of Lecture Notes in Artificial Intelligence. Springer-Verlag, 1995.
[WMT96] M. J. Wooldridge, J. P. Müller, and M. Tambe, editors. Intelligent Agents II, volume 1037 of Lecture Notes in Artificial Intelligence. Springer-Verlag, 1996.
An Agent-Based Architecture for Software Tool Coordination
Stephen Cranefield and Martin Purvis
Computer and Information Science, University of Otago, PO Box 56, Dunedin, New Zealand
{scranefield,mpurvis}@commerce.otago.ac.nz
Abstract. This paper presents a practical multi-agent architecture for assisting users to coordinate the use of both special and general purpose software tools for performing tasks in a given problem domain. The architecture is open and extensible, being based on the techniques of agent-based software interoperability (ABSI), where each tool is encapsulated by a KQML-speaking agent. The work reported here adds additional facilities for the user to describe the problem domain, the tasks that are commonly performed in that domain and the ways in which various software tools are commonly used by the user. Together, these features provide the computer with a degree of autonomy in the user's problem domain in order to help the user achieve tasks through the coordinated use of disparate software tools. This research focuses on the representational and planning capabilities required to extend the existing benefits of the ABSI architecture to include domain-level problem-solving skills. In particular, the paper proposes a number of standard ontologies that are required for this type of problem, and discusses a number of issues related to planning the coordinated use of agent-encapsulated tools.
1 Introduction
Every computer user must at some time have wished their machine was more "intelligent" and could reason about the task being performed by the user and the steps that must be performed to achieve that task. In particular, for many computer users, their day-to-day work involves the use of a variety of different software tools, developed independently and using different data representations and file formats. Coordinating the use of these tools, remembering the correct sequence of commands to apply each tool to the current problem and incorporating new and modified tools into their work patterns adds additional demands on the user's time and memory. This paper discusses an agent-based architecture designed to remove this drudgery from the user. The paper also includes a discussion of related planning and ontological issues. The architecture combines the techniques of agent-based software interoperability (ABSI) [1] with a planning agent and facilities for the user to describe the problem domain, the tasks that are commonly performed in that domain and the ways in which various software tools are commonly used by the user. Together, these features provide the computer with a degree of autonomy in the user's problem domain. The user can request the initiation of domain-level tasks and the system will plan and execute the
appropriate sequence of tool invocations. If a task is not completed in one session, the system will remember the current state of the task and how any intermediate data files relate to the overall task. By simply requesting the continuation of the task, the user can cause the planner to begin from where it left off previously. Providing these abilities has implications for the nature of the planning agent and also suggests the necessity of developing ontologies for certain standard information storage formats. This work places no requirements on the internal structure of agents in the system (which, apart from specialised planning and console agents, are simply encapsulations of existing software tools). Instead it focuses on the representational and planning capabilities required to extend the existing benefits of the ABSI architecture to include domain-level problem-solving skills.
2 Agent-based Software Integration
In today's heterogeneous software environments, with tools written at different times for various specific purposes, there is an increasing demand for interoperability among these tools [2]. While distributed object models such as CORBA [3] are currently gaining widespread industrial acceptance as standards for sharing data and exchanging services amongst distributed applications, research is underway towards finding higher-level, declarative models of communication and cooperation. Progress in this area has been made in the field of "Agent-Based Software Interoperability" (ABSI) [1], also known as "Agent-Based Software Engineering" [2]. This involves the encapsulation of software tools and information servers as "agents" that can receive and reply to requests for services and information using a declarative knowledge representation language, e.g. KIF (Knowledge Interchange Format), a communication language, KQML (Knowledge Query and Manipulation Language), and a library of formal ontologies defining the vocabulary of various domains. The agents are connected in a "federation architecture" which includes special 'facilitator' agents such as the one developed by the Stanford University Computer Science Department's Logic Group [1]. Facilitator agents receive messages and forward them to the most appropriate agent depending on the content of the message (this is known as "content-based routing"). Agents are responsible for registering the type of message they can handle with the facilitator. Tools can be spread across different platforms to form an open and extensible agent system: a new tool can be added to the system by providing it with a 'transducer' program (that translates between KQML and the tool's own communications protocol) or a 'wrapper' layer of code (a KQML interface written in the tool's own command or scripting language) and registering its capabilities with a facilitator agent.
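The idea of content-based routing can be sketched as follows (an illustrative Python fragment; the agent name, message content, and registration format are invented for the example and are not the actual facilitator's API):

    # Sketch: a facilitator forwarding a KQML-style request by content-based routing.
    registry = [{
        "agent": "convert-agent",
        "handles": lambda content: content.startswith("(convert-file"),
    }]

    kqml_request = {
        "performative": "achieve",
        "language": "KIF",
        "content": "(convert-file report.txt report.ps)",
    }

    def route(message):
        # forward the message to the first registered agent whose
        # advertised capability matches the message content
        for entry in registry:
            if entry["handles"](message["content"]):
                return entry["agent"]
        return None

    print(route(kqml_request))   # convert-agent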
3 Desktop Utility ABSI
ABSI projects discussed in the literature have to date largely focused on domains where the software tools to be integrated are complex and/or expensive to develop, e.g. concurrent engineering of a robotic system [4], civil engineering [5], electronic
commerce [6], and distributed civic information systems [7]. In contrast with these large-scale integration efforts, the problem addressed in this paper is how to support interoperation among an evolving workbench or toolkit of general purpose utilities and special-purpose tools. Working with such a toolkit can typically be characterised as follows:
- A number of tasks are performed in sequence, possibly over an extended period of time and with significant time intervals between some of the tasks.
- Information recording the current state of the problem domain must be kept and updated as each task is performed.
- A variety of data formats and software tools are used to perform the different tasks. Relevant new tools may appear and existing tools may be replaced or upgraded over time.
- Some general-purpose tools may be used to support work in a number of different problem domains.
Typically, the problem domain is relatively simple and can be adequately defined using standard data modelling representations (e.g. the relational data model). Also, as some of the tools used may be general purpose utilities such as relational database management systems, programmable text editors and spreadsheets, for maximum reusability these should be encapsulated by agents that communicate in terms of the data formats the tools operate on. An architecture to support this type of tool interoperation must therefore include ontologies describing low-level data formats and how these can be related to higher-level representations of the domain. As each of the tools used may only perform a small part of the overall task, coordinating the use of different tools into a coherent sequence represents a significant overhead. Thus planning support is needed to help the user select the appropriate sequence of tools to be used. We will refer to this type of ABSI problem as "Desktop utility ABSI" to distinguish it from the large scale interoperability projects referenced above. One application area that falls into this area is administrative work involving the management of information stored in common data formats such as text files, spreadsheets and relational databases.
4 The Architecture
Figure 1 shows our architecture for Desktop Utility ABSI [8]. This is based on the federation architecture with the addition of two specialised agents: a planning agent and a console agent.
4.1 The Planning Agent
For interoperating systems involving a few complex agents (large scale ABSI) the patterns of interaction between the agents are likely to be known in advance and limited in scope. In contrast, for desktop utility agent systems, there may be many simple agents each encapsulating a number of operations performed by various general-purpose tools. To achieve a user's high-level goal in the problem domain, it may be necessary for a number of different agents to perform actions in an appropriate sequence.
Fig. 1. The desktop utility agent system architecture
It would be tedious if the user were required to specify this sequence of agent invocations. However, deducing an appropriate sequence of agent actions to achieve a goal is an AI planning problem, and facilitators whose knowledge is expressed using KIF (e.g. the Stanford facilitator [1]) do not support planning well as they have no notion of state and time. Rather than attempting to extend the function of a facilitator to perform planning, our architecture contains a specialised planning agent. Each agent is required to register with the planning agent by sending specifications of the actions it can perform (consisting of the actions' preconditions and effects).
Agent: dbase-agent
Action: (add-marks ?f ?ass ?db)
Description: Add marks for assignment ?ass in file ?f to the database ?db.
Preconditions:
  (database-matches-datamodel ?db info202-dm)
  (file-format ?f (delim #\, (listof string string number)))
  (file-represents-relation ?f (select ASSESS (= cmptid ?ass)) info202-dm (stuid cmptid mark) ?state)
Postconditions:
  (database-includes-relation ?db (select ASSESS (= cmptid ?ass)) info202-dm (do (add-marks ?f ?ass ?db) ?state))
Fig. 2. Specification for a database agent action
48 Figure 2 shows a specification I for an action performed by a database agent as part of a university course administration process (the predicates appearing in the specification are explained in Sect. 5). This describes an ability of the database agent to read in a specially formatted marks file (as produced by a utility used to systematically mark student's electronically submitted programming assignments) and to update the student marks database. Note that an explicit state variable appears in some of the pre- and post-conditions. The reason for this is explained in Sect. 5.1. This agent action is designed for a specialised problem domain: its specification contains terms from the problem domain ontology. Agents may also encapsulate the general-purpose data manipulations of desktop utilities. For example, an agent might encapsulate the ability of the Excel spreadsheet to take as input a text file containing a column of numbers and produce a printed histogram of these numbers. In general, one agent wrapper must be provided for each application, and this will register with the planning agent all the functionalities of the underlying application that have been encoded in the agent. It is not envisaged that the full functionality of general packages such as spreadsheet or word processing applications could be expressed declaratively in a single specification. However, it should be possible to provide specifications for various simple computations that the application can perform.
4.2
The Console Agent
In the desktop utility agent architecture the user interacts with a console agent that accepts requests expressed using the ontology of the user's various problem domains. In particular, the user can issue a request ( p e r f o r m ? t a s k ) where task is the (possibly parameterised) name of a problem-domain level task that the user wishes to be performed. This request is packaged up as a KQML message and sent to the facilitator, which will forward it to the planning agent. Once an appropriate sequence of agent actions has been planned it is sent back to the console agent which is then responsible for invoking agents to perform these actions for the user. The console agent also keeps track of the current state of the system (i.e. what information is currently recorded and what it represents).
4.3
Using the Architecture
The initial step in using the architecture involves defining a problem domain (which need only be done once for each domain). The user must define a data model defining the entities of interest in the domain. This is done by asserting facts (using the console agent) defining the relations in the domain. It is also necessary to assert facts describing the initial state of the system, i.e. what information is recorded in data files or databases and how it relates to the domain-level data model. In conjunction with an existing ontology defining the terminology of relational data models, this defines an ontology for the user's domain. The various domain-related tasks to be performed are specified by declaring their names and parameters and their precondition and goal states (in terms of the information In this case no preconditions are retracted by the action, although in general this is possible.
49 recorded before and after the task is performed). The user may also provide a specification of possible expansions of each task into an ordered set of subtasks (see Sect. 6). In addition it is assumed that the user has previously developed or acquired agent wrappers for the various general or special-purpose tools used to support work in this domain. These wrappers must be equipped with planning operator style specifications of (a subset of) the possible actions these tools can perform. These specifications describe the actions' preconditions and effects in terms of the domain ontology or standard data storage ontologies as described in Sect. 5. When the user requests a task to be performed via the console agent, the planner generates a sequence of agent actions that can satisfy the task specification. These actions are then invoked by the console agent. Note that while these actions are considered to be atomic by the planner they may involve the invocation of an agent-encapsulated tool which may possibly require a series of user interactions. During execution the console agent keeps track of the current state of the system (in terms of the facts asserted and retracted by the actions' planning operators).
5 Required Ontologies 5.1
File Formats
In large scale ABSI projects, where there are a fixed number of interoperating tools that are specialised for a particular domain, it makes good sense to encapsulate each tool within an agent that speaks a high-level domain-related ontology. In contrast, in desktop utility ABSI there may be agents controlling various general-purpose tools that act at a relatively low data representation level (e.g. by performing manipulations on files). As these tools could be used in various problem domains it would limit their reusability in an ABSI framework if their agent wrappers could only declare their abilities in terms of a single high-level domain. Although for a general purpose tool it would be possible to generate separate agent wrappers corresponding to different domain ontologies, this would clearly be a duplication of effort. Instead, a generic tool should be described at the level at which it operates. Thus a utility that can manipulate text files should be described in terms of an ontology of text files. There are a number of different levels at which the contents of text files are typically viewed, e.g. as a sequence of characters, a sequence of lines or a sequence of records. Figure 3 shows a hierarchy of ontologies being constructed for representing the contents of a text file in a declarative fashion, along with the type of facts used to describe the file contents at each level. To relate the different viewpoints the ontologies also need to describe how to translate information between the different representational levels. For instance, the information that the file named "students.dat" is a text file with three string-valued fields delimited by commas might be represented by the following fact:
(file-format "students.dat" (delim #\, (list-of string string string))) This predicate would be defined in an ontology by a formula stating how facts about records and fields can be inferred from facts about lines in files. For instance, the fact
50
Conceptual level
Fact schema
Relational
( t u p l e ? p a i r s ? r e l ?dm) pairs is the set of attribute-value pairs in relation rel of data model dm
An ontology relating lists of records to sets of tuples, using facts asserting which relation and data model a file corresponds to and which fields correspond to which attributes.
Records and fields ( f i e l d ? i ?n ? f ?v) Field i of record n in file/has value v An ontology for string parsing, based on facts describing the record format.
Seq. oflines
( l i n e ?n ? f ? s ) Line n of f i l e / i s the string s
# An ontology describing newline terminated strings.
Seq. of chars
(char ?n ?f ?c) Character n in f i l e / i s c
Fig. 3. A hierarchy of text file ontologies and how they relate facts at different levels of abstraction
(line i0
"students.dat ....9 6 0 1 2 3 4 , J o e Smith,A0100001")
would allow the following three facts to be deduced: (field (field (field
1 10 2 10 3 10
"students.dat "students.dat "students.dat
.... 9601234") .... Joe Smith") .... A0100001")
Thus the f i 1 e - f o r m a t fact above provides the information needed to translate between the line-based view o f a file and the record-based view. Similarly, facts of the following form could be used to link the record-based and the relational views o f a file: (f i l e - r e p r e s e n t s - r e l a t i o n ?file ?rel ?data-model
?att-list
?state)
As tuples in relations are sets o f attribute-value pairs, whereas the fields in a record are ordered, the argument ? a t t - l i s t is required to specify the order in which the attributes appear in the f i l e - - this is a list which is some permutation o f the attributes o f the relation ? r e 1. The explicit mention o f a state in this fact is necessary because in the type o f tool use supported by this architecture, domain-level tasks may not be completed in a single
51 session and there may be information stored in temporary files that are more up-to-date than other information sources. For example, in the university course administration domain, the user may run a marking utility to mark student assignments and then go home for the day before updating the course database with the marks recorded in the file that was output by the marking utility. At this point the database and the text file contain information relating to different stages of the high level "mark assignment" task: the database does not reflect the fact that the latest assignment has been marked, whereas the text file does. The final argument in the fact above specifies which (possibly earlier) state in the plan the information in the file represents. It will be an expression of the form (do a s) representing the state resulting from performing action a in a state s (which may itself be represented by an expression of this form). It is important to note that the ontologies described above are only specifications of standard terminologies that may be used by agents. While some agents may perform inference using the formulae in an ontology, all that is required of an agent is that its use of the terminology in an ontology be consistent with the ontology's logical theory. Defining ontologies describing different conceptual views of files, and how to translate between these levels of abstraction, does not imply that all file processing will be performed using an inference system. In practice, these ontologies--particularly at the lower l e v e l s - - m i g h t be implemented by 'procedural attachment', so that telling a file manager agent a f i 1 e - f o r m a t fact such as the one above would result in that agent reading and parsing the file and sending a stream of f i e l d facts to all interested agents. 5.2
Data Models and Databases
In any ABSI project there needs to be at least one high-level ontology describing the problem domain (in some cases there may be several due to different agents having different views of the domain). In the example domain described above, the problem area involves student details, assignments and marks. An ontology for this domain can be created in various ways, e.g. an object-oriented model could be based on the Stanford ontology library's frame ontology [9]. However, like all applications involving a relational database, a domain model already exists: the relational data model developed for the database application. This could be easily described as an ontology if a standard ontology of relational data models were available. Such an ontology would define the concepts of base relations and their attributes and attribute domains, as well as a way of naming relations derived from the base relations. The discussion in this paper assumes the existence of such an ontology (which is currently under development) with relations described using the relational algebra, i.e. relations can be built from the base relations using operations such as select, join and project. Users will define the relational data models for their problem domains by asserting facts naming the model and describing its base relations. The relational data model ontology will also allow the user to declare semantic information such as candidate keys, foreign keys, foreign key rules and default attribute values, thus supporting agents that can update derived relations ("view updates"). For the course administration example, using a domain ontology based on a relational data model, a tuple in a base relation STUDENT (recording student ID numbers, names
52 and Novell NetWare IDs) could be represented by the following KIF fact:
(tuple
(setof (stuid "9601234") (stuname "Joe Smith") (nw_id "A0100001")) student info202-dm)
where s t u d e n t names the relation and i n f o 2 02 -din names the particular relational data model (there could be more than one problem domain, and hence more than one data model in use at a time), The relational data model ontology discussed above describes a conceptual model of a domain, whereas the text file ontologies discussed in the previous section describe the physical representations of data. Just as the text file ontologies refer to files, we need an ontology in which there is a concept of a database (as opposed to the conceptual model implemented by it). A database is a separate concept from a relational data model: a database could represent information from several relational models; conversely the information pertaining to a single data model could be split (or duplicated) across several databases. Therefore it is necessary to have an ontology for actual databases, defining (amongst others) the predicate:
(database-matches-datamodel
?database ?data-model)
This declares that the tables in the database match the relational data model specified (including any integrity checks implied by the foreign key information in the data model). This predicate can be implemented by procedural attachment to the database query language.
6
P l a n n i n g Issues
6.1 Hierarchical Task Network Planning The nature of desktop agent interoperation imposes a number of demands on the planner. The aim of this architecture is to remove as much manual coordination of tool use as possible from the user. For this multi-agent system to be as autonomous as possible, facilities need to be provided to assist the user in providing as much information about the problem domain as possible. In particular, the user may have common patterns of tool use that could improve the performance of the planner if it could make use of them. Such a facility is provided by hierarchical task-network (HTN) planners such as UMCP [10]. With this type of planner, the user can provide the planner with declarations describing possible decompositions of tasks into ordered networks of subtasks. The basic planning step involves expanding a task into one of its possible networks of subtasks and then attempting to resolve any unwanted interactions between steps in the plan that were introduced by this expansion. A task expansion can also include subtasks of the form achieve(g) where g is a goal. By providing methods for expanding 'achieve' tasks, the user can also provide the planner with (domain-specific) goal-directed planning capabilities.
53
6.2
Ontologies and Abstraction
An important characteristic of desktop utility ABSI is the mix of domain-specific and general-purpose tools. To use the information output by general-purpose tools as input to domain-specific tools, it is necessary to store facts stating how these low-level representations relate to the problem domain's data model. For example, Fig. 2 shows the specification for an action that can be performed by a database-encapsulating agent in the university course administration domain: to update a student record database from a specially formatted text file. The action specification contains some pre- and post-conditions that are inherently related to the problem domain (university course administration). Other conditions are expressed using the lower level ontologies of file formats and databases. The planner may need to switch back and forth between these different levels of description as the following example shows. Before the add-marks action can be performed a marking utility must be run to allow the tutor to systematically mark each Student's electronically submitted programming assignment and create a file containing the marks. When encapsulated by an agent, the postconditions for this "mark assignments" action are as follows: (file ?f (delimited #\, (listof string string string number))) (file-represents-relation ?f (join STUDENT (project (select ASSESS (= cmptid ?ass)) (stuid mark))) info202-dm (stuid stuname nw id mark) (do (mark ?ass ?f) ?state)) The format of the output file, and the relation it represents differ from that required by the a d d - m a r k s action. The planner must find a way to achieve the preconditions of the a d d - m a r k s action starting from the postconditions shown above. This is a text file manipulation process that takes the file produced as a result of running the marking program and converts it into a format suitable for adding information to the ASSESS table in the database. This can be performed by two operations at the text file level: deleting two fields of the file and then adding a new field containing the assignment name in each row. Figure 4 shows how these two file-level operations can be used to achieve the preconditions of the a d d - m a r k s action starting with the postconditions of the m a r k action (the state expressions are not given in full). To infer that this sequence of file manipulations can be used as a link in a plan involving higher level concepts requires the planner to drop down a level in the hierarchy of ontologies, generate a subplan at that level and then produce a specification of that subplan at the higher conceptual level. This translation process could be defined as a HTN planner 'task'; however, it would also be useful if the planner could solve this problem without such presupplied help from the user. Existing hierarchical planning techniques may be applicable to this problem. This is an issue requiring further investigation.
54
(file ?f (delimited #\, (list-of string string string number))) (file-represents-relation ?f (join STUDENT (project (select ASSESS (= cmptid ?ass)) (stuid mark))) info202-dm (stuid stuname nw_id mark)
sm~l)
~r
Delete columns 2 and 3 from file ? f
(file ?f (delimited #\, (list-of string number))) (file-represents-relation ?f (project (select ASSESS (= cmptid ?ass)) (stuid mark)) info202-dm (stuid mark)
sm~2)
Insert new column 2 with value of ?ass in every row
(file ?f (delimited #\, (list-of string string number))) (file-represents-relation ?f (select ASSESS (= cmptid ?ass)) info202-dm (stuid cmptid mark)
sm~3)
Fig. 4. The file manipulation process
6.3
Incorporating Run-Time Information
In some domains, the planning agent may not be able to expand a task in a plan network until it has information that may only be available at run time. For example, in the university course administration domain, the task of marking every student's submitted work for a given assignment can be broken into two subtasks: first to invoke the marking utility for some subset of the students and then (sometime later) to mark the assignments
55 for the rest of the students (a compound task to be expanded further). The subset of students whose assigments are marked during the first subtask can not be known at run t i m e - - it depends on how much time the tutor has. The expansion of the second subtask will also depend on this information. To address situations such as this, the planner's action description language must have a formalism for representing information that will become available during the execution of a plan. The planner must also be capable of interleaving planning and execution, and instantiating the run-time information. These capabilities have been provided in previous planning research by allowing special "run-time variables" to appear in action preconditions and effects [ 11 ] and this work has been extended to address other issues related to planning in the absence of complete information about the world [ 12, 13]. The formalism of"provisions" [ 14] was introduced to generalise the role that parameter instantiation and run-time variables play in the flow of information between actions in aplan. This is also the only work in this area to address HTN planning. The notion of a provision is powerful but seems unintuitive as a primitive concept in a planning formalism. There is much scope for further research in this area to clarify how information flow in plans can best be modelled.
6.4
Planning to Communicate
Planners traditionally model actions by the changes they make to a world state. In the ABSI framework, agents communicate by sending messages in KQML to each other and these communications do not directly change the state of the world. However, if the world model includes facts describing the planner's knowledge of the knowledge of other agents appearing in the plan, planning operators for different types of KQML messages can be specified in terms of the changes they make to these facts. This approach would require the planner to include at least some of the inference capabilities of epistemic logic and may introduce needless complexity. An alternative approach is for the planner to specify agent actions and generate plans under the hypothesis that a global shared knowledge base is available to all agents. The resulting plan could then be transformed into one including communication acts, in the same way that a program for a shared-memory parallel process can be translated into a program for a message-passing architecture. These two approaches for planning to communicate will be investigated and compared in future work.
7
Related Work
As discussed earlier, this work builds upon the ABSI federation architecture [2, 9]. SRI's Open Agent Architecture [15] allows agent-encapsulated desktop tools to interoperate in a distributed heterogeneous environment. Agents communicate via a distributed blackboard. User interface considerations are a focus of this work, while ontological issues and planning for high-level tasks are not addressed. The SIMS architecture [16] is specialised for agents encapsulating distributed information sources. The agents are relatively complex, containing planning and learning components, and use domain and information source models developed using specialised knowledge representation tools.
56 Softbots [17] are complex stand-alone agents that can invoke various tools on behalf of the user. However, there is no facility for users to extend or alter a softbot's functionality.
8
Conclusion
This paper has presented an agent-based architecture for a type of software interoperability problem (desktop utility ABSI) different from the large scale ABSI projects discussed in the literature. In particular the use of a planning agent to automate the selection of actions to jointly achieve domain tasks has been discussed. It has been suggested that hierarchical task-network planning techniques be used so the user can provide guidance on how different tools can be combined for particular tasks. Combining the use of special-purpose tools with general-purpose utilities means that at times during the performance of a task, information corresponding to different stages of the task may be stored in different formats and the implications of this on the planner has been discussed. Standard ontologies for data formats are required to facilitate the use of generalpurpose tools with this architecture and the design of database and text file format ontologies have been outlined. A prototype implementation [ 18] of the architecture has been developed using the Java Agent Template [19] to build the console agent, the facilitator (with help from Amzi! Prolog and its Java interface [20]) and KQML-speaking agent "transducers" for tools running under Windows NT and Unix. The Common Lisp HTN planner UMCP [ 10] is currently being used with a KQML interface provided by the KAPI software [21 ]. Currently work is continuing on the implementation of this architecture and example software agents to encapsulate the tools of the course administration domain. Further work involves elaborating the planning requirements for this architecture (as outlined in Sect. 6) and the ontologies discussed above. Also, ontologies for other common tools such as spreadsheets will be required ([22] may provide a useful starting point). A user interface will be developed to help non-sophisticated users compose their problem domain description, the specifications of the tasks to be performed in that domain and create agent wrappers for their existing software tools. At present our research assumes that all agents share the same ontologies for the domain description and the data formats they operate on. This work should eventually address the problem of interoperation between agents with different ontologies for the same domain [16, 23].
Acknowledgements This research is supported by an Otago Research Grant. Thanks to Aurora Diaz for her ongoing work in implementing this architecture.
57
References 1. N. Singh. A Common Lisp API and facilitator for ABSI: version 2.0.3. Technical Report Logic-93-4, Logic Group, Computer Science Department, Stanford University, 1993. 2. M. R. Genesereth and S. P. Ketchpel. Software agents. Communications of the ACM, 37(7):48-53, July 1994. 3. Los Alamos National Laboratory Advanced Computing Laboratory. Information resources for CORBA and the OMG. http://www.acl.lanl.gov/CORBA/. 4. M. R. Cutkosky, R. S. Engelmore, R. E. Fikes, M. R. Genesereth, and T. R. Gruber. PACT: An experiment in integrating engineering systems. Computer, 26(1):28-37, 1993. 5. T. Khedro and M. Genesereth. The federation architecture for interoperable agent-based concurrent engineering systems. International Journal on Concurrent Engineering, Research and Applications, 2:125-i 31, 1994. 6. W. Wong and A. Keller. Developing an Internet presence with online electronic catalogs. Stanford Center for Information Technology, http://www-db.stanford.edu/pub/keller/1994/ cnet-online-cat.ps. 7. T. Nishida and H. Takeda. Towards the knowledgeable community. In Proceedings of the
International Conference on the Building and Sharing of VeryLarge Scale Knowledge Bases, pages 157-166, 1993. http://ai-www.aist-nara.ac.jp/doc/people/takeda/doc/ps/kbks.ps. 8. S. J. S. Cranefield and M. K. Purvis. Agent-based integration of general-purpose tools. In Proceedings of the Workshop on Intelligent Information Agents, Fourth International Conference on Information and Knowledge Management, December 1995. http://www.cs. umbc.edu/-cikm/iia/proc.html. 9. Stanford Knowledge Systems Laboratory. Ontology Server Web page. http://www-ksl-svc. stanford.edu:5915/. 10. K. Erol, J. Hendler, and D. S. Nau. UMCP: A sound and complete procedure for hierarchical task-network planning. In K. Hammond, editor, Proceedings of the 2nd International Conference on AI Planning Systems, pages 249-254, 1994. 1 I. J. Ambros-lngerson and S. Steel. Integrating planning, execution and monitoring. In Proceedings of the 7th National Conference on Artificial Intelligence, pages 735-740, 1988. 12. K. Golden, O. Etzioni, and D. Weld. Omnipotence without omniscience: Efficient sensor management for planning. In Proceedings of the 12th National Conference on Artificial Intelligence, pages 1048-1054. AAAI Press, 1994. file://cs.washington.edu/pub/ai/tr94-0103.ps.Z. 13. C. A. Knoblock. Planning, executing, sensing, and replanning for information gathering. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1995. http://www.isi.edu/sims/papers/95-sage.ps. 14. M. Williamson, K. Decker, and K. Sycara. Unified information and control flow in hierarchical task networks. In Proceedings of the AAAI-96 Workshop on Theories of Action, Planning and Control, 1996. http://www.cs.cmu.edu/~softagents/papers/provisions.ps. 15. P. R. Cohen, A. Cheyer, M. Wang, and S. C. Baeg. An open agent architecture. In Proceedings of the Spring Symposium on Software Agents, Technical Report SS-94-03. AAAI Press, 1994. ftp://ftp.ai.sri.com/pub/papers/cheyer-aaai94.ps.gz. 16. C. A. Knoblock and J. L. Ambite. Agents for information gathering. In J. Bradshaw, editor, SoftwareAgents. AAA1/MITPress, 1996. forthcoming. Also http://www.isi.edu/sims/papers/ 95-agents-book.ps. 17. O. Etzioni, N. Lesh, and R. Segal. Building softbots for UNIX. Unpublished technical report, 1992. ftp://june.cs.washington.edu/pub/etzioni/softbots/softbots-tr.ps.Z. 18. University of Otago. Software Agents Research Group Web page. http://divcom.otago.ac. nz:800/COM/INFOSCI/SECML/lab/sarg.
58 19. Stanford University Agent-Based Engineering Research Group. Java Agent Template Web page. http://cdr.stanford.edu/ABE/JavaAgent.html. 20. Amzi! Inc. WWW home page. http://www.amzi.com/. 21. Lockheed/EIT/Stanford KQML API Web page. ftp://hitchhiker.space.lockheed.com/pub/aic/ shade/software/KAPI/README.html. 22. S. S. Ali and S. Hailer. Interpreting spread sheet data for human-agent interactions. In
Proceedings of the Workshop on Intelligent Information Agents, Fourth International Conference on Information and Knowledge Management, December 1995. http://www.cs.umbc. edu/'cikm/iia/proc.html. 23. G. Wiederhold. Interoperation, mediation, and ontologies. In Proceedings of the Workshop
on Heterogeneous Cooperative Knowledge Bases, International Symposium on Fifth Generation Computer Systems, pages 33-48, 1994. http:lldb.stanford.edulpubl~ol19941medont.ps.
Commitments
in t h e A r c h i t e c t u r e o f a L i m i t e d , R a t i o n a l A g e n t * ** Munindar P. Singh*** Department of Computer Science North Carolina State University Raleigh, NC 27695-8206, USA singh@ncsu, edu
A b s t r a c t . Rationality is a useful metaphor for understanding autonomous, intelligent agents. A persuasive view of intelligent agents uses cognitive primitives such as intentions and beliefs to describe, explain, and specify their behavior. These primitives are often associated with a notion of c o m m i t m e n t that is internal to the given agent. However, at first sight, there is a tension between commitments and rationality. We show how the two concepts can be reconciled for the important and interesting case of limited, intelligent agents. We show how our approach extends to handle more subtle issues such as precommitments, which have previously been assumed to be conceptually too complex. We close with a proposal to develop conative policies as a means to represent commitments in a generic, declarative manner. 1
Introduction
How can limited agents cope with a complex world? This is a question that has drawn much attention in the study of intelligent agents. As agents find application in an increasing variety of complex domains, this question continues to gain importance. There are two dominant views about intelligent agency. -
Cognitive: The cognitive view borrows folk psychological metaphors, and treats agents as loci of beliefs, desires, intentions, and so on. This view is called the knowledge level [Newell, 1982] or the i n t e n t i o n a l stance [McCarthy,
-
The economic view borrows economic metaphors, and treats agents as rational beings. It has long been realized that perfect rationality is not realizable in limited agents, and theories of bounded rationality have been proposed [Simon, 1981].
1979]. Economic:
* This paper synthesizes and enhances some ideas that were introduced in papers that appear in the Proceedings of the 13th Annual Conference of the Cognitive Science Society (1991) and the Proceedings of the IJCAI-91 Workshop on the Theoretical and Practical Design of Rational Agents. ** I am greatly indebted to Lawrence Cavedon for extensive comments. *** This work has been partially supported by the NCSU College of Engineering and by the National Science Foundation under grants IRI-9529179 and Itti-9624425.
73 (Yet another view treats intelligence as emerging from the reactive behaviors of agents. We shall discuss this view further below.) A number of agent architectures based on the above views have been proposed. We propose not a new architecture, but a knowledge-level characterization of an architecture for limited, rational agents. Our goal is to relate the above cognitive and economic metaphors, and to formalize some primitives through which limited, rational agents can be described at the knowledge level. This is a challenging goal because, as we show below, the two views appear incompatible at first glance. The primitives we propose are related to the intentions of an agent. One primitive is commitment, which is well-known in the literature; another primitive is precommitment, which has been mentioned but largely ignored in the literature. We formalize these primitives in a manner that satisfies the intuitions of both the cognitive and the economic views. There is need for both descriptive and prescriptive theories of rationality in limited agents. A descriptive theory would consider existing intelligent systems, primarily humans, and describe how they manage to cope despite their limitations. Such a theory might also be used b y artificial agents to understand each other. A prescriptive theory would define criteria by which an agent may be designed that exhibits intelligence despite its limitations. This paper first considers commitments in a new descriptive light. From this analysis, it motivates a prescriptive theory that applies to artificial, limited agents. Section 2 describes commitments as they apply to intentions. It discusses the relationship between commitments and rationality. Section 3 conceptually describes our approach, which is based on treating commitn~mnts as levels of entrenchment. Section 4 identifies and describes some key concepts that underlie a formalization of commitments. Section 5 applies the above concepts to formalize and evaluate various constraints about entrenchment, culminating in a call for general conative polices. The formal semantics is outlined in the appendix.
2
Commitments
and
Rationality
The cognitive view of agency leads to a BDI architecture--one which assigns beliefs, desires, and intentions to agents [Rao & Georgeff, 1991]. Beliefs describe the information available to an agent; desires describe an agent's wants; intentions describe what an agent wants and has decided to act upon. Our interest here is in intentions. Intentions denote an agent's pro-attitude toward a proposition or action. Intentions are usually defined to be mutually consistent, compatible with beliefs, and direct or immediate causes of action (e.g., [Brand, 1984, p. 46]). For the above reasons, intentions are distinct from desires (which may be mutually inconsistent or incompatible with beliefs, and may not lead to actions) and beliefs (which do not in themselves lead to action). This view is supported by a number of philosophers, e.g., [Brand, 1984, pp. 121-125], [Bratman, 1987, pp. 18-23], and [Harman, 1986, pp. 78-79]. We restrict ourselves to intentions that are future-directed, i.e., geared toward future actions or conditions. The literature over the past decade or so agrees on the idea that intentions
74 involve some commitment on part of the given agent. This commitment is "psychological" rather than "social" [Castelfranchi, 1995; Singh, t996]. An agent is committed privately to his intentions, independently of his public obligations. 2.1
Why Commitments
are Useful
Commitments cause an agent to continue to hold on to his intentions over time, and to try repeatedly to achieve them.
Example 1. Consider being committed to going to the airport at 6:00 PM. Then, you would make more than one attempt to hail a taxi; if no taxis are forthcoming you might walk to a better location, rent a car, request a ride, or find some other means to get to the airport on time. | From an agent's standpoint, a useful consequence of commitments is that they enable his intentional state now to influence his actions later. Commitments enable an agent to coordinate his activities, both with his other activities, and with those of other agents.
Example 2. Having a commitment to go to the airport will save you the trouble of repeatedly planning to go to a university cafe at 6:00 PM, which you wouldn't be able to act on if you keep your original commitment. | From an agent designer's standpoint, a useful consequence of commitments is that they enable a more modular design than is otherwise possible. The designer has simply to ensure that the agents being designed have the appropriate commitments at certain times or in certain situations. At the next lower level of the design, the designer must supply a set of means, e.g., a plan library, for ensuring that the commitments are met. The interactions between the processes of deliberating about commitments and the processes for acting up to them can thus be streamlined. To a large extent, the design of the commitment layer can be carried out independently of the lower layer.
Example 3. High-level considerations, e.g., communications or social norms, may lead to the adoption of a commitment, but it is up to the planning and execution module to effect the necessary actions. You can commit to going to the airport based on a phone call from a friend, but whether you get there may depend on your driving, rather than your linguistic, skills. | 2.2
How Commitments
can be Harmful
Commitments can be harmful when they cause agents to behave suboptimally or irrationally. Commitments essentially work by taking the decision away from an agent in a specific situation, by making the agent act based on his prior judgment and his dated knowledge. If treated qualitatively, commitments can lead to actions that are unduly expensive.
75
Example 4. Your commitment to be at the airport might make you go there even though your trip was canceled. | Example 5. Your commitment to be at the airport might make you hijack a bus (something that you might regret the rest of your life). | Commitments must be tempered in some way to avoid situations where an agent latches on to his commitment fanatically.
2.3
C o m m i t m e n t s versus Rationality, D e s c r i p t i v e l y
Commitments help limited agents pursue complex goals that would otherwise be beyond their capacities. Thus, while commitments might prove quite irrational in some cases, overall, at least in ordinary circumstances, they are quite rational for agents who cannot think too fast on the fly. This requires that over-commitment ought to be rare, or at least be bounded. Conceptually, we have that - if an agent has partial knowledge about the future state of the world, and has too little time to think, then, on the average, commitments are a good way of being able to get something done - it is not a good idea to over-commit. Commitments pay off in the long run, because cognitive agents can manage to commit without over-committing. This is of course a matter of good design-agents should match the world they exist in. The relevant parts of the world are stable enough that agents can monitor them in sufficiently large intervals.
Example 5. A trip would typically be canceled sufficiently in advance or be a significant enough event that you will end up deliberating (and giving up the commitment to get to the airport) before expending too much effort. | Intuitively, commitments are useful when (1) the agent cannot switch tasks quickly; (2) the cost of reasoning is high; (3) the agent cannot consider all relevant aspects of the world on the fly; or (4) the agent has a pretty good model of the world, so that the losses of opportunity are limited.
2.4
Traditional Approaches
Briefly, traditional theories, e.g., [Rao & Georgeff, 1991; Cohen & Levesque, 1990], appear to suggest that an agent ought to be committed to an intention only as long as it is beneficial, and ought to give it up as soon as it is not. However, if the agent has to decide whether a given intention is beneficial or not repeatedly, the concept of commitment is both descriptively and prescriptively r e d u n d a n t - - t h e agent can just perform the optimal action at each moment! Indeed, this unwittingly supports the position taken by [Brooks, 1991] and others that cognitive concepts can be dispensed with entirely in the study of agents. Our chief reason for including cognitive concepts, however, is that they provide a high-level, flexible, declarative means to describe agents.
76 C o m m i t m e n t s as P e r s i s t e n c e A well-known traditional approach captures commitment as a form of persistence over time (this is in the definition of "persistent goal") [Cohen ~ Levesque, 1990, p. 236]. Intentions are defined as special kinds of persistent goals (pp. 245, 248, 254-255). An intention is defined as a goal that the agent persists with precisely until he comes to believe that Pc.rts-1. it has been satisfied; PERS-2. it will never be satisfied; or PERS-3. the "reason" for adopting it is no longer valid. This characterization is obviously too strong: in many cases an agent should not persist with an intention even though the above do not hold.
Example 7. Joe intended to go to Mars. He would like to give up that intention when he realizes that he does not want to suffer through the training. By the above theory, he cannot! Note that PEas-3 would not help in :Joe's case: he might still persist with his reason for his original intention, which is to be mentioned in the history books as one of the pioneers of interplanetary travel. | Clearly, it is not easy to give up an intention in this theory. We can try to weaken the above requirements for dropping an intention by replacing the set PERs-1, PERS-2, and P E a s - 3 by the set PEas-4 and PERS-5. T h a t is, the agent persists with an intention until he comes to believe that PERS-4. succeSS is inevitable, i.e., the intended condition would hold even if the agent does not perform any (costly) actions to achieve it; or PERS-5. success is unaffordable, i.e., achieving the intention is too expensive.
Example 8. Continuing with Example 7, Joe can now give up his intention to go to Mars if he believes that the glory is not worth the pain. | P r o b l e m s w i t h P e r s i s t e n c e Reasonable though the idea of treating commitment as temporal persistence may seem, it has conceptual and practical shortcomings. Even the weakened postulates PERS-4 and PEas-5 are a special case of the maxim "intend something as long as it is useful to do so". In other words, an intention is held as long as it has a positive expected utility--the expected utility is negative or zero when Pm~s-4 and PERS-5 are satisfied. This maxim is eminently rational. It says that a rational agent should hold an intention only so long as he believes the intention to be beneficial, all things considered. However, in taking care of rationality, this maxim makes the cognitive concepts theoretically redundant. This is because it requires an agent to engage in deep reasoning about his intentions at each moment. If the agent can perform such reasoning effectively, he might just decide what actions are optimal, and save all the bother of having intentions. Thus one of the major conceptual intuitions about intentions and commitments is lost. Intuitively, commitments should ordinarily lead to persistence. The traditional approaches err in identifying commitments with persistence.
77
3
Entrenchment
The essence of commitments is in avoiding having to repeatedly reason about one's actions. We call this entrenchment. Persistence is a natural consequence of entrenchment. Conative entrenchment applies to the entrenchment corresponding to a simple c o m m i t m e n t to an intention. Deliberative entrenchment applies to the entrenchment corresponding to a precommitment, which we introduce below.
3.1
Conative E n t r e n c h m e n t
A c o m m i t m e n t is a means of making the effort and time spent on deliberation have a longer term effect than on just the current action. A c o m m i t t e d agent would certainly miss out some opportunities that he could have noticed by rethinking, but at the advantage of not being swamped by intentions to deliberate on. In m a n y eases, careful deliberation once in a while is better than poor reasoning done repeatedly. And in the long run, the limited agent ought to come out ahead (in terms of effort expended and benefits accrued) for having committed. Let us restrict ourselves to environments where this is the case. We propose that the c o m m i t m e n t of an agent to an intention is a measure of the time or effort he is willing to put in to achieve it, or of the risk he is willing to take in trying to achieve it. Once an agent has adopted an intention, and decided his level of c o m m i t m e n t for it, he would need to reconsider it only when his time or effort or risk exceeds his initial commitment. There is obviously some computation required to keep track of when to reconsider, but in our approach it is relatively small. At that point he could either drop the intention altogether or reinstate it with a fresh commitment. Thus, the greater the agent's c o m m i t m e n t to an intention, the less frequently he would need to reconsider it. Given that c o m m i t m e n t s correspond to entrenchment, the next natural question is are there any normative or prescriptive criteria for determining how much an agent should be committed to an intention ? We propose that there can be several normative criteria depending on how one chooses to design an agent. Two of the possible candidates are introduced below (other variations are of course possible). This paper concentrates on the first candidate. U t i l i t a r i a n E n t r e n c h m e n t This approach seeks to maximize expected utility. The c o m m i t m e n t of an agent to an intention depends on the utility of t h a t intention. For a real-life agent, the commitment would actually have to be set equal to the utility he subjectively expects from the intention. This approach limits the effort invested by an agent in satisfying a c o m m i t m e n t . An i m p o r t a n t special case is when the cost is set to the time taken to perform an action.
E n t r e n c h m e n t T h r o u g h R i s k A v e r s i o n This approach seeks to minimize the total risk that an agent will undertake in order to satisfy a c o m m i t m e n t . Thus it eliminates actions that are highly risky.
78 3.2
Deliberative Entrenchment
Although Bratman presents a commitment-based analysis of intentions, he explicitly rules out cases of precommitment (p. 12). An agent is precommitted to adopting (respectively, not adopting) an intention if he has decided in advance that he will (respectively, will not) adopt that intention. An agent may precommit because he wants to ensure that he will not, in the heat of the moment as it were, make the wrong decision.
Example 9. An agent may prevent himself from adopting the intention of eating ice-cream from his refrigerator by locking it up, and throwing away the key. | Bratman, however, rules out precommitments, because they would complicate the relationship between intentions and rationality. It appears, however, that precommitments are just another example of how a limited agent may try to act rationally. In our formalization, the complexity they introduce is minimal. By precommitting to a course of action, the agent makes the results of his careful reasoning carry through longer. An ice-cream addict can save himself a lot of trouble by making ice-creams inconvenient or impossible to obtain. Precommitments of this sort enable limited agents to marshal their resources for deliberation, and avoid being overwhelmed by a complex world in which their unconsidered actions would likely be suboptimal. While commitments can cause irrationality only between successive deliberations, precommitments are quite blatantly irrational even while deliberating. T h a t is, the agent may know that relative to his beliefs about the utility of the given task what his commitment should be, and yet may commit more or less resources to it. The agent appears internally irrational. However, this sense of blatant irrationality is tempered by the knowledge that the agent would have about his limitations. If the agent knows he is limited, he might prefer his careful thought to his rushed evaluations, even if the former were based on dated information or on predictions that turned out to be false. We conjecture that precommitments are useful when (1) the agent's tasks are clear cut, so he has to do them anyway; (2) the agent is a poor reasoner under time pressure; or (3) the agent has to commit to other agents about his actions in advance. While commitments hold only up to the next deliberation, precommitments persist through ordinary deliberations, and can influence them. One way in which an agent may adopt a precommitment is by taking out a side bet to do as he now thinks is right. While this idea unnecessarily involves the notion of social commitments among agents, it yields the right metaphor with which to think of precommitments.
Example 10. Intuitively, the would-be dieter can make a side bet against his eating the ice-cream, making the cost of having the ice-cream greater than the benefit. This is one of the forms of precommitment that we formalize below. | Precommitments make the associated commitments more or less entrenched, or the corresponding intentions easier or harder to adopt. When commitments
79 are themselves analyzed as the resources allocated to an intention, this makes for a simple treatment of precommitments as well. They m a y be taken as -
-
4
bounds on the commitment that the agent would assign to an intention; or the amount (positive or negative) that must be added to the utility that would have been computed to yield the actual commitment.
Primitive
Concepts
We now introduce some primitive concepts through which c o m m i t m e n t s and p r e c o m m i t m e n t s can be formalized. These concepts involve probabilistie and utilitarian generalizations of a framework previously used to give a logical characterization of intentions and know-how [Singh, 1994; Singh & Asher, 1993]. 4.1
Actions, Branching Time, Probabilities
For concepts such as intentions, commitments, and expected utility to be formalized, we need a model that includes not just time and action, but also possibility, probability, and choice. Our model is based on a set of possible moments, partially ordered by time. The moments represent possible states or snapshots of the world. A partial ordering means that a number of scenarios m a y branch out into the future of each moment, each scenario representing a different course of events along which the world may evolve. At each m o m e n t , each agent can choose from a number of action instances, one on any scenario. Of the scenarios, only one may be realized. Our model assigns probabilities to the scenarios denoting their objective chance of being realized. 4.2
Cognitive Primitives
We take commitments as primitive, and intentions as derived. Cz(p, c) means that agent x is committed to achieving p to a level of c. Then Ix(p) -= (3c > 0 : Cx(p, c)). Ix(p) means that agent x intends p. For simplicity, unlike [Singh, 1994], we assume that p has an explicit temporal component to capture the future-directedness of intentions. Note that even though c o m m i t m e n t s can be of different degrees, these degrees just represent the entrenchment of the corresponding i n t e n t i o n - - a n intention itself is treated as being either on or off, i.e., as binary. Precommitments are notated by PreC--PreCx(p, c) means that agent x has precommitted to achieving p to the extent of c. Delib(z) is true precisely when agent x deliberates. Utility is expressed by a function Uti(., .) applying to an agent and a condition, which is evaluated at a given moment. Utilities are based on the agents' value systems, and are given as primitives. Each action instance has a c o s t - this cost can vary across different instances of the same action, and is given as Cost~(a) on a given m o m e n t and scenario. Many actions, e.g., coin tosses or rolls of dice, have several possible outcomes which have (perhaps, different) objective
80 probabilities associated with them. Objective probability is given by a function, //(.), from scenarios to the unit interval [ 0 . . . 1]. The expected cost of an action can be computed based on the probabilities of the different scenarios along which the action may progress, and its cost along each scenario. A key feature of intentions is that they lead to action. Therefore, another useful primitive is acting for an intention: an agent acts for an intention when his action is a part of what he would do in order to satisfy it. An agent acting for an intention may be doing so even if it would be impossible or unlikely for him to ever succeed though that action. The same action could be performed for two different intentions; of course, several temporally isolated actions may have to be performed for a single intention. Acting for is notated by For. For,(a,p) means that agent x performs action a for condition p. Agents can have beliefs and intentions that involve objective probability and utility statements. A formal semantics is presented in Appendix A. 4.3
Formal Language
The formal language of this paper, C (for CONATE), is based on CTL* (a propositional branching time logic [Emerson, 1990]). It is augmented with (1) quantification over basic actions; (2) functions: Uti, Cost; (3) operators: B and (); (4) predicates: C, PreC, Delib, I, and For; and (5) arithmetical operators and relations. Let x be an agent; p, q propositions; a an action; and v a probability. Now we define the syntax of C. We assume that ~ is a set of atomic formulae; .4 is a set of action symbols; X is a set of agents; and 7" is a set of terms. ]R is the set of reals: IR C T . C may be defined by the following rules, which simplify the syntax of CTL* for ease of exposition. C1. C2. C3. C4. C5. C6. C7. C8. C9. C10. Cll. C12.
Atomic formulae: r E C, for all r E Conjunction: p, q E C ==~p A q E C Negation: p E C ~ -~p E C Action: a E ` 4 , P E C,x E X ~ (a)~p E C Until: p,q E C ~ pUq E C Scenario-quantifier: p E C ~ Ap E C Action-quantifier: p E C, a E `4 =t~ (V a :p) E C Belief: p E C, z E 2( ~ B,p E C Commitment: p E C, c E T, x E X ~ C,(p, c) E C Precommitment: p E C, c E 7", x E X =~ PreC,(p, c) E C Acts-for: p E C, x E X, a E .4 ::~ For,(a,p) E C Utility: p E C, r e It~ ~ (Uti(p) = r) E C
The operators A and -- are the classical boolean operators: Implication (p q) and disjunctions of formulae (p V q) are defined as the usual abbreviations. true abbreviates p V -"p, for any atomic proposition p. false abbreviates --true. An action-expression of the form (a)~p means that action a is performed at the given moment along the given scenario by agent x, and that p holds as soon
81 as a is completed. [a]ap abbreviates "-~(a)~-~p" and means that if a is performed, then p holds upon its completion. "x" is omitted when understood. V and A are restricted quantifiers that apply only to actions. (V a : p) means that there is an action which when substituted for a in p yields t r u e - - t h u s it corresponds to an existential quantifier. (A a : p) abbreviates "-~(V a : -~p)" and corresponds to a universal quantifier. pUq is satisfied at m o m e n t t on a scenario iff q holds on a future m o m e n t on that scenario and p holds at each m o m e n t between now and the given occurrence of q. pOq entails that q holds eventually. Fp denotes "p holds sometimes in the future on this scenario" and abbreviates "trueUp." g p denotes "p always holds in the future on this scenario" and abbreviates "-~F-~p." A scenario-quantifier is one of A and E. A denotes "in all scenarios at the present time," and E denotes "in some scenario at the present t i m e " - - t h a t is, Ep _= ~A-~p.
Ezample 11. (1) (a)~true means that x performs a at the given m o m e n t on the given scenario. (2) (a)C~(p, c - u) means that immediately after a, x's commitment to p is c - u. (3) B~(Utia(p) - Costa(a) = e) means that x believes that, for him, the difference in the utility of achieving p and the cost of performing a is e. (4) (V a : Forx(a,p)) means that x performs some action for p. (5) F ( V a : For~(a,p)) means that eventually x will perform some action for p. (6) AGp means that p will hold at all moments on any (future) scenario. | (a),~true captures the notion of agent x (currently) performing action a. We require that an agent who acts for a condition intends it, and immediately performs the corresponding action. T h a t is, For,.(a, p) --+ (a)~true always holds. 5
Entrenchment
5.1
Formalized
Conative Entrenchment
We consider only utilitarian entrenchment below. An i m p o r t a n t property of intentions that connects them to action is captured by the following constraint: an agent who has a positive commitment to achieving a condition must eventually act on it (unless he deliberates again in the meantime). The following constraint says that at all scenarios from any moment, if x intends p then eventually x will deliberate or eventually x will act on p. We omit the agent subscript throughout this section, because the constraints all involve only agent x. D1. A[l(p) --+ F(V a : (a)Delib(x) V For(a,p))] The following constraint essentially "uses up" an agent's c o m m i t m e n t to an intention. As the agent acts for his intention, his intention becomes progressively less entrenched. Finally when his c o m m i t m e n t for the intention is no longer positive, the constraint for motivation (D1 above) will no longer apply; so the agent will no longer be required to act for achieving that condition. Under ordinary circumstances, the agent will no longer act for that intention. He might reinstate that intention (i.e., adopt an intention for the same condition again), in which case he will again be able to act for it.
82 D2. A[(C(p, c) A For(a,p) A Cost(a) = d) --+ (a)C(p, c - d)] For contrast, PERS-1 and P g a s - 2 both translate into constraint D3 in our framework. Intuitively, constraint D3 says that i f x intends p then either he intends p forever or intends p until he comes to believe p or believe that p is impossible. The traditional theories separately require that all goals and intentions are eventually dropped. This would eliminate the GI(p) subexpression below, but causes other repercussions, which are discussed in [Singh, 1992].
D3. All(p) --* (GI(p) V (I(p)U(B(p) V B(AG~p))))] The proposed framework does not require constraint D4 either, which says that if an agent has a certain intention, and comes to believe that it has been satisfied, then he has to deliberate immediately.
D4. A[(l(p) A B(p)) --* Delib(x)] Instead, we can have the weaker constraint D5, which says that when an intention is believed to have succeeded, the agent will eventually deliberate.
D5. A[(l(p) A B(p)) ~ FDelib(x)] Essentially the same improvement can be made for the constraint calling for intentions to be dropped when it is believed that the intended condition is impossible along any scenario.
D6. A[(I(p) A B(AG-~p)) -~ FDelib(x)] This leaves one important requirement, which concerns the definition of commitment as the utility of the corresponding intention. The following constraint says that if an agent deliberates, and adopts an intention, his commitment to that intention equals his believed utility of achieving that condition. He does not have to commit to achieving every useful condition.
D7: A[(Delib(x) A
C(p,c) Ac
> 0) ~ B(Uti(p) = c)]
Constraints Putting together the above, we require constraint D1, D5, D6, and D7. The above constraints handle cases of irrational over-commitment. However, additional primitives, e.g., those of [Norman & Long, 1996], are needed to cause agents to deliberate when the reasons for an intention are invalidated. We lack the space to formalize these here. Means and Ends Rational agents must relate their means to their ends. Intentions correspond to ends, and plans to means. The plans can lead to additional intentions, which apply within the scope of the original intention. We discuss this issue briefly to outline some of the considerations a formalization of commitments would need to accommodate.
83
Example 12. Continuing with Example 7 of section 2.4, let us assume that Joe has an end of being famous, and his means for that end is to go to Mars. ] We propose the following maxim: a rational agent ought to be more committed to his (ultimate) ends than to any specific means. This m a x i m is reasonable, because there are usually more than one means to an end, after all. We define Best(p) as the best plan for achieving p. (We have not formalized plans, but one can take them below as partial orders of actions.) For a plan ~, condition p, and scalar c, we define Yield(~,p, s) to mean that the plan ~ yields a subsidiary c o m m i t m e n t of entrenchment s toward p. Then the above m a x i m translates into the following formal constraint.
D8. A[(DeliB(x) A C(p, c) A c > 0) --* (Yield(Best(p), q, 8)
_< c )
The above constraint states that the tota] of tile c o m m i t m e n t s to the subsidiary intentions are bounded by the c o m m i t m e n t to the original intention. Greater sophistication is required before the subtleties of the interplay between rationality and intentions can be fully captured. 5.2
Deliberative Entrenchment
Now we turn to a formalization of the notion of p r e c o m m i t m e n t in the above framework. We now redefine the c o m m i t m e n t s assigned by an agent to an intention to take into account the p r e c o m m i t m e n t s he might have. The following constraints shows how precommitments can override current deliberations. P r e c o m m i t m e n t b y D e l i b e r a t i v e I n e r t i a By deliberative inertia, we mean that the agent, on adopting a precommitment, simply does not reconsider the corresponding c o m m i t m e n t as often as he might have otherwise.
Example 13. An agent m a y continue to have ice-cream out of habit, even though he would not do so were he to examine his diet carefully. | By conative entrenchment, an agent's c o m m i t m e n t to an intention would peter out in due course. Deliberative inertia makes it last longer. Thus when an agent is precommitted to achieving a certain condition, he would possibly allocate more resources to it than he would have otherwise.
Lower Bound Hysteresis. An agent might precommit by setting the m i n i m u m resources that would be assigned to the given commitment.
D9. A[(Delib(x) A B(Uti(p) -- e) A PreC(p, d)) --* C(p, max(c, d))] Upper Bound Hysteresis. An agent might precommit by setting the m a x i m u m resources that would be assigned to the given c o m m i t m e n t .
D10. A[(Delib(x) A B(Uti(p) -- c) A PreC(p, d)) -~ C(p, rain(e, d))]
84
Additive Bias. An agent might precommit by adding (positive or negative) resources to a commitment when the conative entrenchment is computed. D l l . A[(Delib(x) A B(Uti(p) = c) A PreC(p, d)) --* C(p, c + d)] P r e c o m m i t m e n t b y E l i m i n a t i o n o f O p t i o n s Instead of relying on deliberative inertia, an agent may precommit by simply eliminating certain options, the availability of which might at a later time "tempt" him to consider giving up a commitment too early. An agent may thus "burn his bridges" so to speak, and lose the option he would otherwise have of crossing them. The idea is to take some drastic step that affects the cost of the actions or the utility of the intended conditions, and then to deliberate.
Cost Adjustment. An action is performed that leads the world along a scenario where the cost of the best plan to satisfy a commitment is higher than before. Interestingly, one cannot reduce the cost of the best plan through this technique, because (by Bellman's Principle) the best plan would automatically include any optimizations that might be available and known to the agent. Example 1~. In the refrigerator example of section 3.2, the agent exhibits his precommitment, not by decreasing the resources allocated to the relevant intention, but by making the actions available for achieving it more expensive: he would now need to pry open the refrigerator door, or first locate the key. | DI2. A[(PreC(p, d) A B(Ecost(Best(p)) = c)) -~ ( V a : (a)true A B((a)Ecost(Best(p)) = c 4- d) A (alDelib(x)) ]
Utility Adjustment. Conversely, an agent may perform actions that would later make certain intentions more attractive, i.e., increase their utility to him then. Example 15. Someone may leave his wallet in his office to make sure he returns later to pick it up. Thus he would have to go to his office for his wallet, even if he would not have gone otherwise. | This is formalized below. An agent with precommitment d for p performs an action after which his utility for p increases by d. D13. A[(PreC(p, d) A B(Uti(p) = c)) --+ (V a : (a>true A B((a>Uti(p) = c-4- d) A
Delib(x))] 5.3
Conative Policies and Deliberation
Generalizing our technical development, we can see that a natural extension would be to declaratively express conative policies describing not only how the agent acts and deliberates, but also how he controls his deliberations. These policies could accommodate not only the general rationality requirements studied
85 above, but also be made conditional upon properties of the domain, and the qualities of the agent, e.g., the qualities studied by [Kinny & Georgeff, 1991]. One can impose constraints on the conative policies of agents, e.g., to prevent them from adopting intentions that they believe are mutually inconsistent or inconsistent with their beliefs. The conative policies embodied in an agent would not change due to ordinary deliberations. Deliberations of a deeper nature would needed to create and modify them. These deliberations are a kind of "soul searching" that an agent may go through in deciding upon some value system. Intuitively, conative policies are commitments about commitments. Although, one can formally define arbitrarily nested commitments, we suspect that a small number of nestings would suffice in cases of practice, and in fact, greater nestings would probably prove counter-productive. 6
Conclusions
and
Future
Work
We argued that commitments as well as the allied concepts of precommitments and conative policies are crucial for the design of intelligent, autonomous, but limited, agents. However, commitments must be formalized in a manner that is compatible with rationality, and which emphasizes the fact that real agents are limited. It is with this understanding that commitments are an effective component of a descriptive theory of rationality in limited agents. It is with the same understanding that they can prove useful in prescriptive theories as well. There has been some good work in building experimental systems that capture the intentions, commitments, plans, and coordination strategies of agents, and in developing testbeds that can be declaratively customized to different situations, e.g., [Kinny & Georgeff, 1991; Sen & Durfee, 1994; Pollack et al., 1994; Singh el al., 1993]. However, a lot more good work is need to handle the kinds of examples and constraints we described above. Eventually, this work would need the expressiveness to handle the more general conative policies. Future work includes formally expressing a wide range of conative policies that capture various rationality postulates as well as characterize interactions among agents. Real-time reasoning aspects were ignored above, but are important in several applications. Similarly, considerations on deliberating about and scheduling across multiple intentions are also important. Overall, we believe the study of commitments in the architecture of limited, rational agents will prove highly fruitful. References [Brand, 1984] Brand, Myles; 1984. Intending and Acting. MIT Press, Cambridge, MA. [Bratman, 1987] Bratman, Michael E.; 1987. Intention, Plans, and Practical Reason. Harvard University Press, Cambridge, MA. [Brooks, 1991] Brooks, Rodney; 1991. Intelligence without reason. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). Computers and Thought Award Lecture.
86
[Castelfranchi, 1995] Castelfranchi, Cristiano; 1995. Commitments: From individual intentions to groups and organizations. In Proceedings of the International Conference on Multiagent Systems. [Cohen & Levesque, 1990] Cohen, Philip R. and Levesque, Hector J.; 1990. Intention is choice with commitment. Artificial Intelligence 42:213-261. [Emerson, 1990] Emerson, E. A.; 1990. Temporal and modal logic. In Leeuwen, J.van, editor, Handbook of Theoretical Computer Science, volume B. North-Holland Publishing Company, Amsterdam, The Netherlands. [Harman, 1986] Harman, Gilbert; 1986. Change in View. MIT Press, Cambridge, MA. [Kinny &: Georgeff, 1991] Kinny, David N. and Georgeff, Michael P.; 1991. Commitment and effectiveness of situated agents. In IJCAL [McCarthy, 1979] McCarthy, 3ohn; 1979. Ascribing mental qualities to machines. In Ringle, Martin, editor, Philosophical Perspectives in Artificial Intelligence. Harvester Press. Page nos. from a revised version, issued as a report in 1987. [Newell, 1982] Newell, Allen; 1982. The knowledge level. Artificial Intelligence 18(1):87-127. [Norman & Long, 1996] Norman, Timothy J. and Long, Derek; 1996. Alarms: An implementation of motivated agency. In Intelligent Agents Ih Agent Theories, Architectures, and Languages. 219-234. [Pollack et al., 1994] Pollack, Martha E.; Joslin, David; Nunes, Arthur; Ur, Sigalit; and Ephrati, Eithan; 1994. Experimental investigation of an agent commitment strategy. Technical Report 94-13, Department of Computer Science, University of Pittsburgh, Pittsburgh. [Rao ~z Georgeff, 1991] Rao, Anand S. and Georgeff, Michael P.; 1991. Modeling rational agents within a BDI-architecture. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning. 473-484. [Sen &: Durfee, 1994] Sen, Sandip and Durfee, Edmund H.; 1994. The role of commitment in cooperative negotiation. International Journal of Intelligent and Cooperative Information Systems 3(1):67-81. [Simon, 1981] Simon, Herbert; 1981. The Sciences of the Artificial. MIT Press, Cambridge, MA. [Singh & Asher, 1993] Singh, Munindar P. and Asher, Nicholas M.; 1993. A logic of intentions and beliefs. Journal of Philosophical Logic 22:513-544. [Singh et al., 1993] Singh, Munindar P.; Huhns, Michael N.; and Stephens, Larry M.; 1993. Declarative representations for multiagent systems. IEEE Transactions on Knowledge and Data Engineering 5(5):721-739. [Singh, 1992] Singh, Munindar P.; 1992. A critical examination of the Cohen-Levesque theory of intentions. In Proceedings of the lOth European Conference on Artificial Intelligence. [Singh, 1994] Singh, Munindar P.; 1994. Multiagent Systems: A Theoretical Framework for Intentions, Know-How, and Communications. Springer Verlag, Heidelberg, Germany. [Singh, 1996] Singh, Munindar P.; 1996. A conceptual analysis of commitments in multiagent systems. Technical Report TR-96-09, Department of Computer Science, North Carolina State University, Raleigh, NC. Available at http://www4.ncsu, edu/eos/info/dblab/www/mpsingh/papers/mas/ commit, ps.
87
A
Formal Semantics
The semantics of C is given relative to intensional models. The formal model is as described informally in section 4. Let M = (T, <, Jill, B, C, P, A, 11, f2} be a model. Here T is a set of possible moments ordered by <; [[]] assigns intensions to atomic propositions, predicates, and actions. B assigns an alternativeness relation to each agent that capture his beliefs. C, P, and A assign a set of commitments, precommitments, and acting-for sentences to each agent at each moment. St is the set of scenarios originating at moment t. 11 assigns a probability to each scenario at each moment, with the probabilities of the members of St adding up to 1. ~ assigns a utility (for each agent) to each condition at each moment. The intension of an atomic proposition is the set of moments where it is true. The intension of a predicate is a function that takes yields a set of moments for each tuple of the predicate's arguments. The intension of an action is, for each agent x, the set of periods in which an instance of it, is performed by x. Thus, [t,t'] E [[a]]~ means that agent x performs action a from moment t to t'. We require that action instances be nonoverlapping. We lack the space to describe additional "coherence" (i.e., well-formedness) requirements. The semantics of formulae of C is given relative to a model and a moment in it. M ~ t P expresses "M satisfies p at t." M ~s,t P expresses "M satisfies p at moment t on scenario co," and is needed for scenario-formulae as defined in section 4.3. The satisfaction conditions for the temporal operators are adapted from those in [Emerson, 1990]. Formally, we have the following definitions: $1. $2. S3. $4. $5. $6. $7. $8. $9. SIO. $11.
M ~t r ifft E [~b~ M DtpAqiffM ~tp&M
Dtq
M~t--piffM ~tp M ~tApiff(VS:SESt =>M~s,tp) M ~ t ( V a : q) iff (3b : b E A & M ~ t ql~), where qlba indicates the substitution of every occurrence of a by b in the expression q. M ~s,t pUq iff (3t' : M ~s,t, q&(Vt" : t <_ t" <_ t' => M ~s,t,, P)) M ~s,t Bp iff (Vt' : t' E B(p) => M ~t, P) M ~s,t C(p,c)iff C(p,c)C C~(S,t)) M ~s,t PreC(p,c)iff PreC(p,c) e P ~ ( S , t ) ) M #s,t For(a,p) iff For(a,p) e C ~ ( S , t ) & M #s,t (@true M #s,t P i f f M ~e P, if semantic rules $6, $7, $8, $9, and SIO do not apply on p n,(s)
x c
$12. M Dt Ecost(p) = e iff (S E st&M Ds,t (a}true A Cost(a) = c) = e The expected cost of an action is the weighted sum of its costs along the scenarios on which it occurs.
Limited Logical Belief Analysis A n t o n i o M o r e n o 1 a n d T o n Sales 2 1 Escola T~cnica Superior d'Enginyeria, Departament d'Enginyeria Informktica Universitat Rovira i Virgili Carretera de Salou, s/n; 43006-Tarragona; Spain amoreno~etse.urv.es, +34-77-559681 2 Facultat d'Inform~tica, Departament de Llenguatges i Sistemes Informgtics Universitat Polit~cnica de Catalunya Pau Gargallo, 5; 08028-Barcelona; Spain sales~lsi.upc.es, §
A b s t r a c t . The process of rational inquiry can be defined as the evolution of a rational agent's belief set as a consequence of its internal inference procedures and its interaction with t h e environment. These beliefs can be modelled in a formal way using doxastic logics. The possible worlds model and its associated Kripke semantics provide an intuitive semantics for these logics, but they seem to commit us to model agents that are logically omniscient and perfect reasoners. These problems can be avoided with a syntactic view of possible worlds, defining them as arbitrary sets of sentences in a propositional doxastic logic. In this paper this syntactic view of possible worlds is taken, and a dynamic analysis of the agent's beliefs is suggested in order to model the process of rational inquiry in which the agent is permanently engaged. One component of this analysis, the logical one, is summarily described. This dimension of analysis is performed using a modified version of the analytic tableaux method, and it models the evolution of the beliefs due to the agent's deductive power. It is shown how non-perfect reasoning can be modelled in two ways: on one hand, the agent's deductive abilities can be controlled by restricting the tautologies that may be used in the course of this logical analysis; on the other hand, it is not compulsory to perform an exhaustive analysis of the initial tableau.
1
Introduction
T h e a i m of t h i s work 3 is to m o d e l the process of rationM inquiry, i.e. t h e (rationMly c o n t r o l l e d ) t r a n s f o r m a t i o n of t h e beliefs of an intelligent agent over t i m e . T h e classical p h i l o s o p h i c a l t r a d i t i o n has considered ([REBR79]) two c o m p o n e n t s in this process: a rational one, t h a t consists in t h e a p p l i c a t i o n o f some inference p r o c e d u r e s to t h e a c t u a l beliefs (resulting in the a d d i t i o n of inferred beliefs or the discovery of s o m e i n c o n s i s t e n c y in t h e m ) , a n d an empirical one, which a d d s or r e m o v e s beliefs a c c o r d i n g to t h e result of o b s e r v a t i o n s p e r f o r m e d in t h e a g e n t ' s 3 Funded by the European Union, in the Human Capital and Mobility project VIM (A Virtual Multicomputer), under contract ERBCHRXCT930401:
t05
environment. We will also consider these two dimensions of analysis, as shown in Sect. 3. Logics of belief (doxastic logics, see e.g. [HAMO92]) are tools used to analyse in a formal way the reasoning about belief performed by an intelligent agent. These logics are modal logics (see [HUCR68]) in which the necessity operator is interpreted as belief. The formulas of these languages are usually given a semantic interpretation in the possible worlds model ([HINT62]), with its associated Kripke semantics ([KRIP63]). This semantics postulates that there is a set of possible worlds W and a binary accessibility relation R between them that connects each world w with all the worlds that could be the real one according to the agent's beliefs in w. The agent is said to believe a formula a in a world w iff a is true in all the worlds R-accessible from w (i.e. those worlds w' such t h a t
(w R w')). Some drawbacks become apparent when these logics are used to model an agent's process of rational inquiry. The main problem is that the modelled agents are logicMly omniscient (they believe all classical tautologies) and perfect reasoners (their beliefs are closed under classical deductive closure). These properties could be appropriate if the agent's designer wished to model ideal agents (e.g. the children in the m u d d y children puzzle, see [FHMV95]), but they are clearly inadequate if more realistic (human or computational) agents are considered. Some arguments against logical omniscience and perfect reasoning could be the following: - The agent can be unaware of certain facts (Fagin and Halpern tried to take this fact into account in their logic of general awareness, [FAHA85]). - The agent can have limited resources (e.g. the time required or the space needed to perform a given inference). This is the most evident justification, if rational agents have to be implemented at all in real computers. - The agent can ignore some relevant rules (e.g. the agent may not have been told what the rule of Modus Tollens is). This view was clearly considered by Konolige in his Ph.D. thesis ([KONO86]), where each agent was modelled with a base set of beliefs and a (possibly incomplete) set of inference rules. Our goal is to overcome these difficulties, by developing a way to model the process of rational inquiry (the evolution of the beliefs of a rational agent over time as a consequence of its interaction with the world and its internal inferential processes), keeping the possible worlds model and the Kripke semantics (because, after all, they seem a natural and intuitive semantics for modal logics) but trying to avoid logical omniscience and perfect reasoning. This work could for instance inspire how to build a belief module for an non-omniscient rational agent based on a BDI (Belief-Desire-Intention) architecture (see e.g. [RAGEgl], [RAGE95]).
2
Syntactic
View
of Possible
Worlds
We define a world as a set of formulas in a (subset of a) propositional doxastic language. There are no conditions imposed on this set, so it can be both partial
106
(most facts about the actual world will probably not be contained in each possible world) and inconsistent (although perhaps the inconsistency is not apparent in the set, it may be hidden in its deductive closure). Intuitively, a world is envisioned as any situation conceivable by the agent, This characterisation of possible worlds implies that a world can fail to contain some (even all) tautologies (and thus, logical omniscience is -in principle- avoided), and that it does not have to be deductively closed (and, therefore, the agent is not -in principle- a perfect reasoner). It can be argued for this kind of worlds with a number of ideas: - A logically inconsistent possible world could model for instance the situation that arises when the agent is unable to take all its beliefs into account in every inference; if it focuses in a subset of them (call that a context), it can draw conclusions which are consistent within the context but inconsistent if all the beliefs are considered (this fact was remarked by Shoham in [SHOH91]; Fagin and Halpern modelled this situation in their logic o f local reasoning, [FAHA85]; Delgrande also considered this idea in [DELG95]). - It is arguably possible to define interesting procedures of inquiry over inconsistent belief sets (as shown e.g. in [REBR79]). - The situation where no tautological information is believed could model ([JASP94]) for instance the initial state of information in the environmentalist tabula rasa theories of Locke and Rousseau. In the AI literature most authors deal with the problems of logical omniscience and perfect reasoning by introducing impossible possible worlds, e.g. possible worlds where logical connectives do not behave in the usual way, or tautologies may not be true, or inconsistent formulas may hold (see e.g. impossible possible worlds in [HINT75], non-classical worlds in [CRES72], nonstandard worlds in [REBR79] or situations in [LEVE84]). These semantic approaches somehow alleviate those problems, but they cause different ones (see e.g. [McAR88], [REIC89] for detailed reviews of these and other approaches). On the other hand, the main inconvenient of syntactic approaches (i.e. defining worlds as sets of formulas) is that they are very limiting, in the sense that the modelled agent does not seem to have a minimal set of inference capabilities (it can believe p and (p ::~ q) and not believe q, or it can believe (p A q) and not believe either p or q). Some authors (see e.g. [HALP86]) argue that, in this case, there seems to be no way to make a knowledge-based analysis of the agent's beliefs. The rest of this paper shows how syntactically-generated beliefs can be analysed and how they evolve in time as a consequence of this analysis. 3
Dynamic
Belief
Analysis
The evolution of the agent's beliefs over time is modelled with a dynamic analysis composed by two interwoven strands: - a logicM dimension that models the inferential (strictly deductive) processes carried out by the agent, and
107
- an experimental dimension that models the changes of beliefs caused by the interaction of the agent with its environment. The rest of the paper describes only the logical analysis (a description of the experimental side of the analysis and a whole picture of the belief system model can be found in [MOSA95], [MORE95]). The basic idea is that the experimental side of the analysis is used to corroborate or falsify the results obtained in the logical analysis, in a Popperian fashion ([POPP34]). If a belief obtained in the logical analysis is falsified in the experimental analysis, a belief revision procedure would be used to change the initial beliefs accordingly and the logical analysis could proceed using the revised belief set.
4
Logical
Analysis
L; is a language composed by a finite fixed set of basic propositions (7)), two logical operators (-~, V) and the modal belief operator B. The formulas of this language are restricted to the standard propositional ones prefixed by a (possibly empty) sequence of modal operators 4 (form. ~ [B*]propositional_form.). A world is defined as a set of propositional formulas of s In the logical analysis a sequence of accessibility relations is generated (see [MOSA96], [MOSA95], [MORE95]); each of them induces (via the standard Kripke semantics) a different belief set. This is the way in which the logical analysis models the evolution of the agent's beliefs over time due to its internal deductive inferences. This evolution will be illustrated with a small example, where the agent's beliefs in world w~ will be analysed. Assume that the agent's initial belief set 5 in w~ is {R, (-~R V (S V P))}. The first accessibility relation (R0) can be generated by applying the standard Kripke semantics backwards to the initial belief set: world w' is R0-accessible from world we iff every formula which is believed by the agent in we appears in w/. Moreover, the agent's beliefs in a world w are restricted in the following way: they have to be exactly all the propositions that appear in all its doxastic alternatives, i.e. there can be no proposition common to all the accessible worlds from w that is not a belief in w. The ontology of possible worlds just defined is represented in Fig. 1. Each world w is represented by a rectangle that contains four columns; from left to right, they represent propositional formulas that must be contained in w, propositional formulas that must not be contained in w, - formulas that are believed in w and -
-
4 Only one agent and a subset of propositional modal logic are considered in this paper; both restrictions will be overcome in future work. 5 Nested beliefs are not considered in this example; they would be analysed in a similar way, although an analytic tableaux tree would be necessary for each level of nesting.
108
R
-~RV(SVP)
I
LR
R0 ~ RV(SVP)
I 1 We
W~
Fig. 1. World we and its doxastic alternatives - formulas that are not believed in w 6. Classes of worlds (~) are represented in the same way (reading the expression contained~believed in w as contained~believed in all the worlds in class -~). we is R0-related to all the worlds in class ~-j, i.e. all the worlds that contain (at least) R and (~R V (S V P)). The logical analysis of the agent's beliefs is performed using (a modified version of) the analytic tableaux m e t h o d (see e.g. [SMUL68]). A tableau is divided into a left and a right column 7. The analysis starts with a tableau To t h a t contains in its left column the agent's initial belief set in we: {R, (-~R V (S V P))}. In the light of previous considerations, it can be argued that To not only represents the agent's beliefs in we; it can also be seen as a (partial) description of all the worlds initially connected to we, because all the beliefs in we are true in all of these worlds (To represents ~-j). Thus, a tableau represents a class of worlds: all the worlds in which the formulas contained in the tableau are true. More precisely, a tableau represents a11 those worlds that contain the formulas of its left column and do not contain any of the formulas of its right column. The splitting rule of the analytic tableaux method has also been modifiedS: when it is applied to analyse a disjunction (r V r contained in the left colu m n of a tableaux T, it generates three subtableaux: the first one has r and r in its left column; the second one has r in its left column and r in its right column, and the third one has r in its left column and r in its right column. Thus, this rule explores all the possibilities of accounting for the truth of the analysed disjunction, namely t h a t one (or both) of its members are true. The three subtableaux also keep all the formulas contained in T 9. 6 The last two columns are included in an inner rectangle to reinforce its doxastic interpretation. In Fig. 1 only the third column in we and the first one in ~ are used. This is the first difference with respect to classical analytic tableaux, that contain only one set of formulas. s Recall that, in the classical analytic tableaux method, the analysis of a disjunction (a V fl) causes the generation of two subtableaux, one containing a and the other one containing ft. 9 This is a minor modification with respect to the classical method; it is nevertheless convenient because later we can talk about formulas contained in all open tableaux instead of the longer and more cumbersome formulas that are contained in every branch of the tableaux tree.
109
After the application of this rule to analyse (-~R V (S V P)) (the only disjunction in To), the situation shown in Fig. 2 is reached.
R
-,RV(SVP)
I
. R
I SvP T1
SvP
]
R
I
-"1
-~R
Rv(SvP)
I svP T2
%
Fig. 2. Splitting rule applied to (-~R V (8 V P))
Consider the following partition of class w~: w~ 1 : - w~ 2 : w~3 : - w~ 4 :
worlds worlds worlds worlds
of of of of
~ ~ ~ ~
that that that that
contain -~R and (S V P). contain -q~ but do not contain (S V P). contain (S V P) but do not contain -~R. do not contain either - R or (S V P).
T h e tableau T1 represents the class of worlds WgT~, T2 represents wc,2 and Ta represents W~-7~1~ T h e worlds in class w~4 are not even considered by the analytic tableaux m e t h o d , because it looks for models of the initial set of formulas and it is not possible to have a model of the set {R, (-~R V (S V P))} in which b o t h ~R and (S V P) fail to be true. In the tableaux analysis that is being considered, a tableau is closed if any of the following circumstances holds: -
T h e tableau contains a formula and its negation 11 in its left column; the worlds belonging to the class of worlds represented by the tableau contain
10 This is a good reason for considering tableaux with positive and negative information: it makes it easier to picture the analysis of a disjunction as a partition in the class of doxastic alternatives; in the classical method, the classes of worlds represented by the two subtableaux generated by the splitting rule overlap, i.e. there are worlds that belong to both classes (the worlds that contain both disjuncts). 11 This is also the closing condition in the classical method.
110
-
an explicit contradiction, so they are considered logicMly impossible and eliminated from the analysis. The tableau contains the same formula both in its left and right columnsl2; it represents the worlds that contain and do not contain that formula, so it represents the empty set of worlds and is dismissed from the analysis.
In the example, T1 and T2 contain both R and -~R in their left columns, so the classes of worlds represented by these tableaux (~'~, w-~'~2) are considered logically impossible and they are eliminated by closing T1 and T2 and dismissing the branches that contain them from the logical analysis. The remaining tableaux tree contains only To and T3. Thus, the effect of the analysis of the formula (-~R V (S V P)) is the restriction of the doxastic alternatives of we, that shrink from to ~ . Therefore, the application of the splitting rule is modelling the generation of a new accessibility relation, R1, that restricts the set of R0-accessible worlds. This restriction is shown in Fig. 3.
R ~RV(SVP) -~R SVP 9J)~ 1
R
R
~RV(SV'P)I R !
I
-~RV(SVP) ] SVP ]
R1 We
:/R -~R] pRV(SVP) ]
|svP
I Z0Ct3
~RV(Svp)RI W'~4 Wet
Fig. 3. Reduction of doxastic alternatives 12 This situation does not arise in the example used in this paper; it would for instance if the formula R were replaced by "~R.
111 we was R0-related to all the worlds in class ~-~, but is Rl-related only to the worlds in class w ~ . The change of the accessibility relation implies a possible change in the agent's beliefs. The beliefs in we at this point would be {R, (-~R V (S V P)), (S V P)}. Notice that the restriction of the doxastic alternatives has caused the addition of a new belief, (S V P). This formula is a new belief because it was not included in all the worlds in ~ but it appears in all the worlds in w ~ . Furthermore, as -~R is not included in any of the worlds in w~3, the analysis also shows that it is not a belief in we, and this fact is also reflected in Fig. 3, with the addition of -~R in the fourth column of the representation of we. Summarizing, positive beliefs are those formulas contained in the first column of all classes of doxastic alternatives, whereas negative beliefs are those formulas contained in the second column of at least one class of doxastic alternatives. The belief analysis could proceed with the generation of the next accessibility relation, R2; it is generated by analysing the disjunction (S V P) in 2"3, obtaining the result shown in Fig. 4.
R
~Rv(svP) To
R
-~R
'-,RV(SVP) SVP
/1
t
R
RRV(SVP)
i"4
-~R
-~RV(SvP) S SvP P
P
%
%
Fig. 4. Analysis of (S V P)
A more intuitive interpretation of the restriction from R1 to R2 results if the following partition of ~ in four subclasses is considered: - w ~ 1 : worlds of ~ w ~ : worlds of ~
-
that contain S and P. that contain S but do not contain P.
112
- w ~ 3 : worlds of ~
that contain P but do not contain S.
- w~3, : worlds of ~
that do not contain either S or P.
we, that was Rl-related to the worlds in w~3, is only Re-related to the worlds in w,~3l, w ~ and w~---'~',but not to those in w ~ 0 (classes w~31, w~3~ and w ~ are represented by tableaux T4, T~ and T6). This situation is depicted in Fig. 5.
R i i
~RV(SVP) ~R
SVP
~Rv(svP) ~R ~R
~c~ 2
.RV(SVP)
R -~R ~Rv(svP)
SMP
SVP S P R2
R
i,~r131
~R
~RV(SVP) SVP
R
P
~R
~Rv(svP) SVP
~Rv(svP)
S
s
SVP
P
R
svP
~Rv(SvP)
I-
~R
~g7
Fig. 5. Further reduction of doxastic alternatives
113
Notice that the set of positive beliefs has not changed, because all the formulas common to all the R2-accessible worlds were already common to all the Rl-accessible worlds. However, the set of negative beliefs has increased, because P is not contained in some of the doxastic alternatives (the worlds in w~3~ do not contain it). The same situation happens to S, that is not contained in the worlds in class w~3, ; thus, neither P nor S are believed in w~. This fact is reflected in the fourth column of the rectangle that represents we in Fig. 5. There are no more disjunctions to be analysed in the open tableaux, so this is the end of the (purely logical) belief analysis in we (without introducing instances of tautologies, topic considered in Sect. 5). The final set of (positive and negative) beliefs of the agent in we is {BR, B(-q~ V (S V P)), B(S V P),-~B-~R,-~/gP,--,BS}. 5
Introducing
Tautologies
As shown in Sect. 4, the classical analytic tableaux method has been modified in a number of ways: - A tableau contains two sets of formulas (the left and the right columns). It represents the set of those possible worlds that contain all the formulas in its left column and do not contain any of the formulas in its right column. - When the splitting rule is applied, three subtableaux are generated. One of them contains both members of the analysed disjunction in its left column, while the other two contain one of them in the left column and the other one in the right column. These subtableaux represent all those (classes of) worlds in which the analysed disjunction holds. Although only the splitting rule (the analysis of disjunctions contained in the left column) was needed in the example shown in Sect. 4, a set of six rules is required in order to analyse all the propositional formulas (containing negations and disjunctions) that m a y appear in both columns of the tableaux. These rules are shown in Fig. 6. - After the application of any of the analytic tableaux method rules, the analysed proposition and all the other formulas in the tableau are maintained in the resulting tableau(x), they are not deleted. A tag would be added to the formula in other to recall that it had already been analysed. There is yet another important modification to the classical analytic tableaux method to be mentioned: it is forbidden to add to an open tableau any tautology (recall that, in the classical analytic tableaux method, a tautology can be introduced in any open tableau at any point). Notice that the agent's (positive) beliefs at a certain stage of the analysis are those formulas t h a t appear in all open tableaux. Thus, if any tautology could be freely introduced in any open tableau, logical omniscience could not be avoided. Moreover, in that case the agent's beliefs would be deductively closed, and therefore we would be modelling a perfect reasoner la. la If a formula c~ were a logical consequence of the initial set of beliefs I, it could be obtained in the logical analysis as a new belief by introducing the (tautological)
114
I
_,.~r
I
r162
I
r
r162
r162 r r
r
I
I I:V
r
I Fig. 6. Rules used in the analytic tableaux method Forbidding the use of tautologies is, however, a very drastic way of avoiding logical omniscience and perfect reasoning, because it puts severe limits to the a g e n t ' s deductive capabilities. Consider the following definition: D e f i n i t i o n 1 A formula ~ is derivable from a set of formulas I (I ~o ~) iff
it is possible to perform a logical analysis, using the analytic tableaux method (with the rules shown in Fig. 6), with the initial tableau To containing I in its left column, in such a way that (~ appears in the left columns of all open tableaux (at least one14). Recall t h a t the open tableaux represent classes of doxastic alternatives; therefore, if at a certain point of the logical analysis all open tableaux contain a certain formula (~ V - ~ ) in all open tableaux and exhaustively analysing all those tableaux containing I and --a in its left column, until all of them were closed. That situation would eventually be reached because the set I u {-,a} would be inconsistent. 14 There must be at least one open tableau, otherwise any formula would be derivable from an inconsistent set, which is a consequence that we want to avoid (we want our model to allow some paraconsistency, i.e. the agent's beliefs should not collapse in the presence of a single inconsistency).
115 f o r m u l a f , this formula is contained in all the doxastic alternatives so, via the Kripke semantics, it is a belief of the agent at t h a t point. Therefore, the set of potential beliefs is equal to the set of derivable formulas. This set can be characterized in the following way: D e f i n i t i o n 2 Let A be a set of formulas, and let SUBF be the following function:
- S U B F ( A ) = U~zx Sub f(5), where Subf is defined as follows: 9 Subf(Z) : {1}, being l a li e,'al 9
S u b f ( a V/3) = {(a V/3)} U S u b f ( a ) U S u b f ( e )
9 Subf(
)=
u Subf(
)
9 Subf(-~(a V Z)) = {-~(a V fl)} U Subf(-~a) U Subf(-~fl) Lemma
1 (I ~o a ) iff
there is no formula fl such thai bor fl and -1t3 belong to I, and I does nol contain the negation of a 15, and - (a e S U B F ( I ) ) and (I ~ a) -
-
The first and the second conditions are simple: if I contained a f o r m u l a and its negation, To would automatically be closed, so the set of derivable beliefs from I would be e m p t y ; if I contained a given formula, a, then -~a could not be derived, because as soon as it were generated, it would create a tableau containing a (that would be contained in all tableaux) and ~ a , and therefore it would immediately be closed. The third condition states t h a t a m u s t be a s u b f o r m u l a (in the sense of S U B F ) a n d a (classical) logical consequence of the initial set of formulas. Therefore, if no tautologies are allowed, the set of derivable formulas is quite small (a subset of the subformulas t h a t are logical consequences of the initial set of fornmlas); on the other hand, if any t a u t o l o g y is allowed, logical omniscience and perfect reasoning are unavoidable. This fact is suggesting an interesting way of controlling the agent's deductive capabilities: to allow the use o f s o m e (instances of) tautologies in the logical analysis. For instance ([MORE95]), the agent could use some instances of the Axiom of the Excluded Middle, A E M , ( a V -,a). T h e use of this particular t a u t o l o g y seems a natural way to allow the agent to have doubts, to wonder whether it believes some f o r m u l a ( a ) or its negation (-~a). This exception permits the introduction of the f o r m u l a ( a V -~a) in a tableau, which is later split into two subtableaux containing c~ in one column and -,a in the other 16. In this way, the agent can explore b o t h alternatives independently, and the logical analysis can guide the search of examples or is This clause must be read as follows: 9 -~a is not contained in I, and 9 i f a - - -'7, then 3' is not contained in 1 16 The third subtableau would be immediately closed because it would contain both a and --c~ in its left column.
116
counter-examples needed to give more credence to one side of the doubt than to the other. In fact, the possibility of adding this kind of tautologies in the analytic tableaux is a well-known idea in the tradition of classical proof theory, as described e.g. in [BEMA77]. It has also been suggested by Hintikka in his interrogative model of inquiry (see e.g. [HINT81], [HINT86], [HINT92]), in which the process of scientific inquiry is modelled with plays of the interrogative game, which is a game played by the Inquirer and Nature. The Inquirer seeks to know whether a certain conclusion C is true or not, and it can perform deductive moves (controlled with analytic tableaux) and interrogative moves, which are questions put to Nature. In these interrogative moves, Hintikka allows the introduction of (some) instances of AEM in the left side 17 of the subtableaux (those formulas (a V --a) in which a is a subsentence of a formula of the subtableaux or a substitution instance of a subformula of a formula in the subtableaux with respect to names appearing in the subtableaux). The introduction of instances of AEM is also a way to control the set of formulas that can be believed by the agent. The set of derivable formulas depends on the kind of instances of AEM that are allowed, so putting restrictions on the use of this principle is a way to constrain the set of obtainable formulas (the set of potential beliefs). This kind of ideas can be pushed even further, when the use of different tautologies is considered (see [MORE95] for a preliminary study of the use of different instances of AEM and different tautologies in the logical analysis). In fact, the main point of the logical analysis is that not only logical omniscience and perfect reasoning are avoided, but also that the agent's potential beliefs can
be controlled by restricting the (instances of the) tautologies that can be used in the logical analysis. The logical analysis suggested in this paper also introduces another way of modelling an imperfect reasoner (even if any tautology can be used). The classical analytic tableaux method is refutative, in the sense that, in order to prove that a is a consequence of a set I, it tries to fail in building a model of I U {-~a}, by exhaustively is analyzing the tableau containing I and -~a and closing all the subtableaux in the analytic tableaux tree. Our analysis has an i m p o r t a n t difference: it is not compulsory to exhaustively analyze any tableau. In this way we can model imperfect reasoning due to a lack of resources, whereas w~th the previous kind of imperfection we were modellingthose agents that do not have a perfect deductive power. 6
Summary
and
Future
Work
In this article a two-dimensional dynamic analysis of a rational agent's set of beliefs has" been suggested, and its logical dimension has been summarily de17 Hintikka uses the classical Beth-style analytic tableaux, as described in [BETH55]. 18 The tableaux obtained in the course of the analysis are just Hintikl~'s model sets waiting to be completed; they are not models (yet), so they are not useful (for the purposes of the method).
117
scribed. The logical analysis is performed using a modified version of the analytic tableaux method, and it models the deductive processes that the agent would apply to its initial belief set in order to infer more beliefs. Logical omniscience and perfect reasoning are avoided in two ways: - An agent with imperfect deductive power can be modelled through appro-
priate restrictions on the instances of tautologies that are allowed to be introduced in the open tableaux of the logical analysis. - A n agent with limited resources may be modelled with a non-exhaustive logical analysis, i.e. a logical analysis in which some open tableaux contain formulas that have not (yet) been analysed. In our future work these constraints will be studied to discover which (instances of) tautologies yield interesting sets of deductive consequences. Another key issue in this research will be the study of the interaction between the two strands of analysis (see [MORE95] for a preliminary study of these topics).
References [BEMA77] Bell, J., Machover, M., A Course in MathematicM Logic, North Holland, 1977. [BETH55] Beth, E., 'Semantic entailment and formal derivability', in Mededelingen [CRES72] [DELG95] [FAHA85]
[FHMV95]
van de Koninklijke Nederlandse Akademie van Wetenschappen, Afdeling Letterkunde 13, 309-342, 1955. Cresswell, M., 'Intensional logics and logical truth', Journal of Philosophical Logic 1, 2-15, 1972. Delgrande, J., 'A framework for logics of explicit belief', Computational Intelligence 11, 47-86, 1995. Fagin, R., Halpern, J., 'Belief, awareness and limited reasoning', Proceedings of IJCAI-85, 491-501, 1985. Fagin, R., Halpern, J., Moses, Y., Vardi, M., Reasoning about Knowledge,
MIT Press, 1995. Halpern, J., 'Reasoning about knowledge: an overview', Proceedings of TARK-86, 1-17, 1986. [HAMO92] Halpern, J., Moses Y., 'A guide to completeness and complexity for modal logics of knowledge and belief', ArtificiM Intelligence 54, 319-379, 1992. [HINT62] Hintikka, J., Knowledge and Belief, Cornell University Press, Ithaca, N.Y., 1962. [HINT75] Hintikka, J., 'Impossible possible worlds vindicated', JournM of Philosophical Logic 4,475-484, 1975. [HINTS1] Hintikka, J , 'On the logic of an interrogative model of scientific inquiry', Synthese 47, 69-83, 1981. [HINT86] Hintikka, J., 'Reasoning about knowledge in philosophy: the paradigm of epistemic logic', Proceedings of TARK-86, 63-80, 1986. [HINT92] Hintikka, J., 'The interrogative model of inquiry as a general theory of argumentation', Communication and Cognition 25, 221-242, 1992. [HUCR68] Hughes, G., Cresswell, M., An Introduction to Modal Logic, Methuen and Co., 1968.
[HALF86]
118
[JASP94] [KONO86] [KRIP63] [LEVE84] [McAR88] [MOSA96]
[MOSA95]
[MORE95] [POPP34] [RAGE91]
[RAGE95] [REIC89] [REBR79] [SHOH91] [SMUL68]
Jaspars, J., Calculi for Constructive Communication, Institute for Logic, Language and Computation, ILLC Dissertation Series 1994-4, 1994. Konolige, K., A Deduction Model of Be//ef, Morgan Kaufmann, 1986. Kripke, S., 'A semantical analysis of modal logic I: normal modal propositional calculi', Zeitschrift f//r Mathematische Logik und Grundlagen Mathematik 9, 67-96, 1963. Levesque, H., 'A logic of implicit and explicit belief', Proceedings of AAAI84,198-202, 1984. McArthur, G., 'Reasoning about knowledge and belief." a survey', Computational Intelligence 4, 223-243, 1988. Moreno, A., Sales, T., 'Dynamic belief analysis', Workshop on Agent Theories, Architectures and Languages, ATAL-96, at the European Conference on Artificial Intelligence, ECAI-96, Budapest, Hungary, August 1996. Moreno, A., Sales, T., 'Dynamic belief modelling', Fourth InternationaJ Colloquium on Cognitive Science, Donostia, May 1995. An extended version is available as Research Report 95-28-R, Department of Software, Technical University of Catalonia (UPC), 1995. Moreno, A.,'Dynamic belief analysis', Research Report 73, Human Communication Research Centre, University of Edinburgh, 1995. Popper, K., The Logic of Scientific Discovery, Hutchinson and Co. Ltd., 1934. Rao, A., Georgeff, M., 'Modelling rational agents within a BDIarchitecture', Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning, KRR-91, 1991. Rao, A., Georgeff, M., 'BDI agents: from theory to practice', Proceedings of the International Conference on Multi-Agent Systems, 1995. Reichgelt, H., 'Logics for reasoning about knowledge and belief', Knowledge Engineering Review 4, 119-139, 1989. Rescher, N., Brandom, R., The Logic of Inconsistency, Rowman and Littlefield Eds., 1979. Shoham, Y., 'Varieties of context', in Artificial Intelligence and Mathematical Theory of Computation, 393-408, Academic Press, 1991. Smullyan, R., First-order logic, Springer Verlag, 1968.
Dynamic Goal Hierarchies John Bell and Zhisheng Huang Applied Logic Group Computer Science Department Queen Mary and Westfield College University of London, London E1 4NS, UK E-mail: {jb, huang}~dcs.qmw.ac.uk Tel: +44(0)171 975 5210 Fax: +44(0)181 980 6533 A b s t r a c t . In this paper we introduce and formalise dynamic goal hierarchies. We begin with a formal definition of goals, according to which they are rational desires. In particular, we require that an agent's goals are coherent; that is, that the agent believes that each goal is jointly realisable with all of the goals which the agent considers to be more important. Thus an agent's goals form a hierarchy, and new goals are defined with reference to it. We then show how preferential entailment can be used to formalise the rational revision of goals and goal hierarchies.
1
Introduction
Stan is writing a paper for a conference. On Monday he decides to work on the p a p e r throughout the week and to finish it on Sunday. He also decides to take Saturday off in order to go to the beach with his family. He considers t h a t it is more i m p o r t a n t to finish the paper, but he believes t h a t going to the beach for the day on Saturday will not prevent him from doing so. On Tuesday Stan works on the p a p e r and his goals remain unchanged. However, on Wednesday he finds t h a t there is a lot of work left to do on the paper: if he goes to the beach on Saturday, then he cannot finish the paper on Sunday. He still considers t h a t finishing the p a p e r is more important t h a n going to the beach. So he abandons the goal of going to the beach, and decides instead to take his family to the circus or to the swimming pool on Saturday afternoon. Each of these alternatives is less pleasant t h a n going to the beach. However, doing either would mean t h a t the p a p e r can still be finished on Sunday. T h e y cannot both be done in one afternoon and Stan has no preference as to which is done. So he leaves it to his children to decide on Saturday morning. On Thursday Stan works on the p a p e r and still plans to go to the swimming pool or the circus with his family on Saturday. However, while working on the paper on Friday he discovers t h a t he has done most of the required work. He realises t h a t he can go to the beach with his family on Saturday and still finish the paper on Sunday. So he decides to do so. Despite the fact t h a t Stan is constantly changing his mind, he is clearly behaving rationally.
89
In this paper we aim to model reasoning of this kind. In order to do so, we introduce the notion of a goal hierarchy. At any point in time an agent has a set of goals, a goal set, and considers some of these goals to be more important than, or preferable to, others. The agent's preferences define a preference ordering on the agent's goal set. Typically the preference ordering is partial. For example, on Wednesday Stan prefers finishing the paper to going to the swimming pool. He also prefers finishing the paper to going to the circus. However, he does not prefer going to the swimming pool to going to the circus, nor does he prefer going to the circus to going to the swimming pool. At any point in time the agent's goals and preferences amongst them form the agent's goal hierarchy at that point in time. Typically the agent's goal hierarchy is dynamic; as time progresses the agent's goals and the preferences among them change. For example, on Wednesday Stan's beliefs change, so he abandons the goat of going to the beach on Saturday and decides instead to go to to the swimming pool or to the circus. However, the goal hierarchies of rational agents tend to exhibit a certain stability. For example, Stan does not reconsider his goals on ~hesday. This is rational as nothing relevant happens to make him change his mind. His goals and the preferences among them thus persist by default. Indeed, throughout the week we see that Stan's goal hierarchy is upwardly stable; that is, the stability of a goal tends to increase as its importance to the agent increases. For example, Stan's goal of finishing the paper is more stable than his goal of going to the beach. This reflects the principle that rational agents should keep higher-level goals in preference to lower-level goals whenever possible. The goals in the hierarchy of a rational agent should also be coherent; that is, they should, in some sense, be jointly achievable. In the sequel we suggest that in resource-bounded agents it is typically sufficient that the agent believes that each of its goals is jointly achievable with every goal that the agent prefers to it. If a rational agent realises that its goals are incoherent, then the agent should revise them in order to restore coherence. For example, Stan's goals on Monday and Tuesday are coherent. However, on Wednesday his beliefs change and he realises that he can no longer achieve both of his goals. Consequently he abandons the goal of going to the beach and adopts instead the new goals of going to the swimming pool and going to the circus as, while these are less preferable, they do not prevent him from realising his most important goal. Stan also tries to maximise his goal hierarchy by replacing goals with preferred ones whenever coherence allows. For example, on Friday he abandons the goal of going to the swimming pool and the goat of going to the circus and re-adopts the preferred goal of going to the beach as this no longer prevents him from finishing his paper. We believe that we are pioneering an interesting new definition of goals and rational goal revision, as part of the larger project of producing a formal theory of practical rationality [3]. Our ideas have been influenced by the work of the philosopher Michael Bratman [5, 6], while our formal work adds to the growing literature on qualitative decision theory begun by authors such as Cohen and Levesque [7] and Boutillier [4], in contrast with the more traditional quantitative approach adopted by authors such as Doyle and Wellman [8].
90 Our theory is expressed in the language g.A [2] which has been extended to include the preference operator of ALX [10, 11]. In order to make this p a p e r self contained, we provide a concise account of this language in the next section. In Section 3 we then give the formal definition of goals and goal hierarchies, and prove some of their properties. In the final section we consider the rational revision of goals and goal hierarchies, show how preferential entailment can be used to formalise this, and, by way of illustration, we show how Stan's reasoning during the week can be formalised.
2 Time, Actions, Beliefs, Desires, Obligations, Preferences T h e language g~4 is a many-sorted first-order language with explicit reference to time points and intervals. CA contains sentences such as OnTable(B)(3) which states t h a t block B is on the table at time point 3. Time is taken to be composed of points, and, for simplicity, we assume that it is discrete and linear. Intervals are taken to be binary multi-sets of points; thus, for example, OnTable(B)([3, 8]) states t h a t B is on the table for the interval lasting from time point 3 to time point 8. Points are thus intervals of the form It, t]. The models of CA are fairly complex possible-worlds structures. Each model comes equipped with an interpretation function ]2 which assigns an n - a r y relation to each n - a r y relation symbol at each interval at each possible world. Thus, for model M, world w in M and variable assignment g:l
M,w,g ~ r ( u l , . . . U n ) ( i ) i f f ( U l , . . . ,Un) e ~)(r,i,w) Primitive event types are combined with intervals to form event tokens. For example, Occurs(Pickup(B))(1) states that a (token of a) pickup-B type event occurs at time 1. A sentence of the form Agent(a, e)(t) states t h a t a is a, possibly unintentional, agent of event e at t, and Does(a)(e)(t) r Oecurs(e)(t) A Agent(a, e)(t). Formally, CA-models contain the relations (9 c E • I • W and .4 c A • E • I • W; where A is the set of agents, E is the set of primitive events, I is the set of intervals, and W is the set of worlds. Then:
M,w,g M,w,g
Oee rs(e)(i) iff (e,i,w) e 0 Agent(a,e)(i) iff (a,e,i,w) e ,4
Complex actions and events can be defined. For example, Does(a, e; e')(i) states t h a t a does e and then e ~ in the interval i: 2 1 For the sake of simplicity of presentation we will let the distinction between terms and their denotations in M given g take care of itself. 2 Here the relations between intervals are the analogues of those in [1]; thus Starts(i,i') ~-~rain(i) =- min(i') A max(i) <_max(i'), etc.
9]
Does(a, e; e')(i) ~-. 3i', i"(Starts(i', i) A Meets(i', i") A Ends(i", i)A Does(a, e)(i') A Does(a, e')(i")) Does(a, ele')(i ) Does(a, e)(i) V Does(a, e')(i) Does(a, elle')(i ) ~-* Does(a, e)(i) A Does(a, e')(i) Does(a, e*)(i) ~-~ Does(a, e + ) (i ) V -~3i' (Contains( i, i ~) A Does(a, e ) (i') ) Does(a, e +) (i) ~-~ 3i' (Starts( i', i) A Does(a, e) (i') A (Ends( i ~, i)V 3i'(Meets(i', i') A Ends(i', i) A Does(a, e+)(i"))) Does(a, ?r r A sentence of the form Bel(a,r states t h a t agent a believes the proposition r for the interval i. Similarly Des(a, r states t h a t a desires r for i. Obl(a, r states that r is obligatory for a for i, and Per(a, r ~ -,Obl(a, -~r The formal semantics for these operators are, for simplicity, the standard possibleworlds semantics, indexed by agent and time point. For example, for agent a interval i and world w, 7~(Bel,a#,w) is a (serial, transitive, Euclidean) binary accessibility relation on worlds which represents a's beliefs in w for i. The t r u t h condition for the belief operator is then as follows:
M, w,g ~ Bel(a, r
iff M, w',g ~ r for all (w,w')
E
~'~(Bel,a,i,w).
No conditions are placed on the accessibility relations for desires and obligations. It is therefore possible for agents to have inconsistent desires and goals. This reflects the anarchic nature of desires, and leaves open the possibility of moral dilemmas. A sentence of the form Pref(a, r r states that agent a prefers r to ~b for the interval i. The semantics of the preference operator begin with von Wright's conjunction expansion principle [16]. According to this principle, to say t h a t you prefer an apple to an orange is to say that you prefer situations in which you have an apple and no orange to those in which you have an orange and no apple. In possible-worlds terms this principle might be stated as follows: agent a prefers r to r for i if a prefers r A ~b-worlds to • A -~r for i. However, this analysis is too simple as it leads to the "paradoxes of conjunction and disjunction"; if r is preferred to ~ then r V )/ is preferred to ~, and r is preferred to ~b A X. Examples of the undesirability of these properties are:"if a prefers coffee to tea then a prefers coffer or poison to tea" and "if a prefers coffee to tea then a prefers coffee to tea and a million dollars". Clearly we need to capture the ceteris paribus nature of preferences: we should compare r A ~ r worlds and r A -~r which otherwise differ as little as possible from the actual world. In order to do so we introduce the selection function from the Stalnaker-Lewis analysis of conditionals [12, 15]. Thus the function cw is of type W x P(W) --* P(W), and, intuitively, cw(w, Ir is the set of closest worlds to w in which r is true. 3 Formally, cw is required to satisfy the conditions imposed by Lewis in [12]. The agent's preferences over time are represented by 3 As usual, ~r
~r
= {w e w :
denotes the set of worlds in M in which r is satisfied by g; i.e., M, w, 9 ~
r
92 the function ~-: A • I --* P ( P ( W ) over sets of worlds to each agent X and Y, X ~-(a,i) Y means that in Y for interwl i. Each ~-(~,i) is
• P(W)), which assigns a comparison relation at each interval. Intuitively, for sets of worlds agent a prefers each world in X to each world required to satisfy the following conditions: 4
(NORM) ~ ~t(a#) X and X ~(a,i) 0, (TRAN) If cw(w, X n 7) ~-(a#) cw(w, Y n -X) and cw(w, Y n -2) ~-(a#) cw(w, Z n Y)then
cw(w,Z
n -2) ~_s ~) cw(w, Z n
~),
(ORL) If cw(w, X__N-2) ~-(a#) cw(w, Z N X)'..._andcw(w, Y n -2) ~-(a#) cw(w, Z N Y) then cw(w, (X U Y) fl Z) ~-(a#) cw(w, Z N Z t2 Y), (ORR) If cw(w, X n Y) ~-(a#) cw(w, Y n -X) and cw(w, X n -2) ~-(~#) cw(w, Z n X) then cw(w, X N Y 0 Z) ~-(a#) cw(w, (Y U Z) N X). The truth condition for preferences is as then follows:
M, w, g ~ Pref(a, r r
iff cw(w, ~r A _~r
~-(a#) CW(W,Ir A _~r
Given these semantics, we have the following axioms:
(CEP) (N) (TR) (DISL) (DZSR)
Pref(a, r r ~ Pref(a, (r A --r (9r A r ~Pref (a, _l_,r (i), -~Pref(a, r _s Pref(a, r r A Pref(a, r x)(i) --* Pref(a, r x)(i) Pref(a, r x)(i) A Pre/(a, r x)(i) ~ Pref(a, r v r x)(i) Pref(a, r x)(i) A Pref(a, r r --~ Pref(a, r X V r
(CEP) states the conjunction expansion principle. (N) and (TR) establish the normality and transitivity of preferences. (CEP) and (N) together imply the itreflexivity (IR), contraposition (CP), and asymmetry (AS) of preferences [10]. Finally, (DISL) and (DISR) state disjunction principles for preferences. 5 It is important to note that as the semantics of preferences are based on closest-world functions they avoid the counter-intuitive properties of simpler possible-worlds semantics for preferences. It is also worth pointing out the difference between preferences and desires. Most people would probably prefer death by firing squad to death by burning at the stake, however, this does not mean that they desire either of these two options. Furthermore, it is possible to have preferences between desires, thus Pref(a, Des(a, r Des(a, r states that at time t agent a's desire for r is stronger than a's desire for r In complex sentences the same agent and temporal terms are often repeated. When abbreviating such sentences we will usually adopt the convention that a missing agent term is the same as the closest agent term to its left, and that a missing temporal term is the same as the closest temporal term to its right. For example Bel(a, Pref(a, r r can be abbreviated to Bel(a, Pref(r r 4 Where, for X C W, ~ = W - X. 5 These principles were suggested by Pierre-Yves Schobbens.
93
3
Goal Hierarchies
We now proceed to the definition of goals and goal hierarchies. For simplicity we will work almost entirely with time points; the extension to intervals is straightforward. D e f i n i t i o n 1. A plan for an agent a to achieve a state 4)6 is a, possibly complex, action e which a can execute in order to bring a b o u t r
Plan(a,e,4))(t) ~ 3t',t"(t <_ t' < t" A Does(a,e)([t',t"]) A 4){t" + 1}). We are, for convenience, a d o p t i n g a simple definition of actions and plans. As in traditional AI-planning, actions can have preconditions and posteonditions, and hierarchical non-linear plans involving a single agent can be represented. Multi-agent plans can also be represented to the extent t h a t the a g e n t ' s actions m a y include getting other agent's to act on its behalf. A more sophisticated definition can, of course, be adopted. For example, it m a y be desirable to have explicit representations of multi-agent plans, partial plans, etc. D e f i n i t i o n 2. A n agent a can realise a state 4) at time t if there is some plan e by means of which a can achieve 4):
Real(a, r
~-~ 3ePlan(a, e, r
Note t h a t realisability is not closed under conjunction; t h a t is, Real(a, 4))(t) A Real(a, ~)(t) need not imply Real(a, 4)A ~p)(t). For example (Girard) suppose t h a t drinks can be obtained from a machine for a dollar each. T h e n if I have a dollar, I can use it to buy a coffee and I can use it to b u y a tea, but I c a n n o t use it to b u y b o t h a coffee and a tea. Once again a more sophisticated definition can be adopted. For example, it m a y be desirable to represent c o m m o n s e n s e causal reasoning a b o u t actions and their effects, the capabilities of agents, etc. We now introduce the notion of a candidate goal. Intuitively, a c a n d i d a t e goal is a state which an agent considers a d o p t i n g as a goal. D e f i n i t i o n 3. A candidate goal is a state which the agent desires and believes to be b o t h realisable and permissible:
CGoal(a, 4))(t) ~-* Des(a, 4))(t) A Bel(a, Real(4)) A Per(4)))(t). 6 We refrain from calling r a goal at this stage--although this usage is common in the planning literature-as we are building up to a more complex definition of goals. 7 In this definition r denotes the result of substituting t for the temporal variables or constants in r if r is atomic, and the instance of r in which t is substituted for the temporal variable which is bound by the outermost temporal quantifier in r otherwise; for example, OnTable(A)(3){5} = OnTable(A)(5) and 3t'(t <_ t' A
OnTable(t')){5} = t < 5 A OnTable(5).)
94 The belief operator occurs in this definition as we are interested in resourcebounded agents who may well be mistaken about the realisability or the permissibility of states. We will however require that the agent's beliefs about realisability satisfy the following minimal competence constraint:
(CR) Bel(a, neaZ(r A T h e permission operator occurs in the definition as the agent should not violate any legal or moral constraints which may currently apply. It also introduces an element of approval, insofar as the agent will not permit itself to do things it disapproves of. The goals of a rational agent should, in some sense, be jointly realisable. B r a t m a n proposes a "screen-of-admissibility" condition: if an agent has a goal then it should not adopt a conflicting state as a goal. For example, if Stan plans to finish the paper on Sunday and he believes that going to the beach on Saturday will prevent this, then it is not rational for him to adopt the goal of going to the beach on Saturday. One way of unpacking this condition is to require that all of the agent's goals are jointly realisable. However, this seems to be too strong a condition for resource-bounded agents, even though we can and will assume t h a t at any point in time the agent has a finite number of goals. Consequently we will require only that the agent believes that each of its goals is jointly realisable with all of the goals that it considers to be preferable. We consider the agent's beliefs here as it might well be wrong about joint realisability. The restriction to more important, or preferred, goals greatly reduces the complexity o f the task of determining joint realisability. Finally, resource-bounded agents typically operate with partial knowledge and thus form plans containing goals about which the agent is indifferent, and which are not jointly realisable. For example, on Wednesday Stan has the goal of spending Saturday afternoon at the swimming pool with his family. He also has the goal of taking them to the circus on Saturday afternoon. These goals are not jointly realisable, as there is not time to do both in one afternoon. Stan is indifferent about which alternative is chosen, as he has no personal preference between them and he does not know which alternative his children will choose. The important point, as far as Stan is concerned, is that each alternative is jointly realisable with his preferred goal of finishing his paper. We will thus require that a goal is a coherent candidate goal; t h a t is, that the agent believes that it is jointly realisable with all of the goals which the agent considers more important. Note that, as defined, coherence implies t h a t an agent's goals are (at least) partially ordered. Coherence also implies that goals cannot be defined independently, but only with reference to the agent's existing goal hierarchy. We will therefore give a recursive definition of goals. By introducing an ordering on the agent's goal set at time t we obtain the agent's goal hierarchy at t. The ordering is represented by means of the preference operator; thus, for example, Pref(a, Goal(C), Goal(Ib))(t) states that a considers that r is a more important goal than ~b at t; that is, if, other things being equal, a had to choose between the two goals, then a would choose to adopt r rather than
95 as a goal In order to determine whether or not a candidate goal is coherent, we have to check that the agent believes that it is jointly realisable with all of the goals above it in the goal hierarchy. Consequently goals are defined, in effect, by working recursively down the hierarchy. As all goals are candidate goals, we will, for technical reasons, work with the hierarchy of candidate goals, which is required to satisfy the following conditions:
(RPC) (RPG) (PGC) (PEG)
Pref(a, CGoal(r CGoal(r )(t) --~ CGoal(a, r A CGoal(a, r Pref(a, Goal(C), Goal(~) )(t) --~ Goal(a, r A Goal(a, r Pref(a, Goal(C), Goal(r --+ Pref(a, CGoal(r CGoal(~))(t) Pref (a, CGoal(r CGoaI(r )(t) A Goal(a, r A Goal(a, r --+ Pref (a, Goal(C), Goal(r (t)
(RPC) is a realism condition on preferences between candidate goals. To say that at t, a prefers candidate goal r to candidate goal r should imply that a believes that r and r are each realisable and permissible. This condition thus helps to distinguish between rational consideration of candidate goals and wishful thinking about them. Similarly (RPG) is a realism condition on preferences between goals. (PGC) and (PCG) together ensure the agent's preferences on candidate goals are consistent with its preferences on goals. The last three conditions are, of course, equivalent to the following one: Pref(a, Goal(C), Goal(r ~-+ Pref(a, CGoal(r CGoal(r )(t) A Goal(a, r
A Goal(a, r
The axioms for the irreflexivity and transitivity of preferences, (IR) and ensure that the preference orderings on goals and candidate goals are partim orderings. It will generally not be the case that the orderings are convergent. Thus, for example, the following property will not generally hold for preferences on goals:
(TR),
Pref(a, Goal(C), Goal(r )(t) A Pref(a, Goal(x), Goal(r )(t) --+ Pref(a, Goal(~), Goal(x) )(t) V Pref(a, Goal(x), Goal(r )(t). Thus in defining the coherence of a candidate goal r we need to consider all of the goals which are preferred (as candidate goals) to r So we have to consider every goal in every chain of candidate goals which are preferred to r We therefore need three preliminary definitions. First we define what it means for a goal to be preferred to a candidate goal:
Pref(a, Goal(C), CGoal(r )(t) *-+Pref(a, CGoal(r CGoal(~b))(t)A Goal(a, r Then the sentence FGoal(a, r r states that, for agent a and time t, ~ is the first goal which is preferred to the candidate goal r in a chain of candidate goals which are preferred to r Thus: s s In keeping with the assumption of resource boundedness we are assuming that at any time the agent has a finite number of goals. In [10] section 11.6.1 (pp 179-184), there
96
FGoal(a, r r
*-~ Pref(a, Goal(r
CGoal(r -~3x( Pref (a, Goal(C), Goal(x) )(t)A Pre/(a, Goal (X), CGoal (d~))(t)).
We then recursively define the sentence PGoals(a, r r which states that, for agent a, time t and candidate goal r r is the conjunction of all of the goals in a chain of candidate goals which are preferred to r Thus:
PGoals(a, r T)(t) ~-~ -~SCPref (a, Goal( ~ ), CGoal(r )(t), PGoals(a, r r A x)(t) ~-~FGoal(a, r r A PGoals(a, 4, x)(t). T h e n in order to assess the coherence of a candidate goM r we need to consider each such conjunction. D e f i n i t i o n 4. A candidate goal is coherent if the agent believes that it is jointly realisable with every goal that the agent prefers to it:
Coherent(a, r
~-~ Vr
r r
--~ Bel(a, Real(r A r
D e f i n i t i o n 5. A goal is a candidate goal which coheres with the agent's existing goals: Goal(a, r ~ CGoal(a, r A Coherent(a, r The use of the preference operator in these definitions, ensures that goals do not have the undesirable properties usually associated with possible-worlds definitions. Thus, for example, goals are not closed under implication, and all theorems are not goals [10, 11]. P r o p o s i t i o n 6. Static properties of candidate goals and goals.
1. Any maximal candidate goal is a goal: CGoal(a, r
A -,3r Pref (a, CGoal(r
CGoal( r )(t) ) --* Goal(a, r
2. If r is a candidate goal and no more important state is a candidate goal, then r is a goal: CGoal(a, r Goal(a, r
A -~3r
CGoal(r
CGoal(r
A CGoal(a, r
-*
3. All goals are candidate goals: Goal(a, r
--* Caoal(a, r
is a theorem which ensures that higher-order quantification over preferred formulas can be reduced to a finite set (and thus to a conjunction) of preferred formulas. Furthermore, this finite set always exists if we start with a finite set of formulas and use the preference axioms.
9"/
4. If no more important state than r is realisable, then r is a candidate goal if and only if r is a goal: -,3r CGoal(r CGoal(r (CGoal(a, r ~ Goal(a, r
A Bel(a, Real(r
-~
5. Candidate goals are closed under conjunction if they are jointly realisable: Bel(a, Real(r A r CGoal(a, r A r
A CGoal(a, r
A CGoal(a, r
--~
6. Goals are decomposable under conjunction: Goal(a, r A r
- - Goal(a, r
A Goal(a, r
7. Related goals are closed under conjunction: Pref(a, Goal(C), Goal(~p))(t) -* Goal(a, r A ~p)(t). 8. Related goals are logically consistent: ~Pref (a, Goal(C), Goal(-~r Proof. Most cases are straightforward from the definitions. For (1), if r is a c a n d i d a t e goal, for an agent a at time t, 9 and there is no m o r e i m p o r t a n t candidate goal t h a n r then r is a m a x i m a l CGoal. As r is a m a x i m a l CGoal, it is coherent, and hence it is also a goal. Similarly, for (2), if there exists no more i m p o r t a n t candidate goal with respect to the C G o a l r then r is coherent. (3) is straightforward from the definition. (2) and (3) t o g e t h e r i m p l y (4). For (5) suppose t h a t r and ~p are C G o a l s and t h a t r A r is realisable. Then, since the m o d a l operators for desires, permissions and beliefs are normal, it follows t h a t r A ~p is a CGoal. For (6), just note t h a t realisability is d e c o m p o s a b l e under conjunction. So are desires, beliefs, and permissions. For (7), suppose t h a t Pref(a, Goal(r holds. T h e n by (RPG) we conclude t h a t r and r are goals. F r o m the supposition it follows t h a t for, s o m e n , m > 0, we have PGoals(a,r162 A . . . A ~bn A r A r A . . . A era). Let X denote the conjunction (4) A ~p) A ~Pl A . . . A ~Pn A r A ... A era. Then, as ~p is a goal, we have Bel(a, Real(x))(@ So, as the the desire, belief, and permission o p e r a t o r s are closed under conjunction, X is a CGoal. Moreover, since X is a m a x i m a l CGoal, it follows from p a r t (1) t h a t X is a goal. Consequently, as r A ~p is a conjunct of X, we conclude from p a r t (6) t h a t Goal(a, r A ~p)(t). For (8), suppose t h a t Pref(a, Goal(C), Goal(~r T h e n it follows from p a r t (7) t h a t Goal(a, r A -~r and thus t h a t Bel(a, Real(r A -,r B u t this contradicts the c o m p e t e n c e constraint on realisability, (CR). [] Proposition
7. Static properties of goal hierarchies.
9 In the sequel, we will often omit the agent name a and the time point t in proofs when it does not cause any ambiguity.
98
1. Goal hierarchies are coherent. 2. Goal hierarchies are maximal subject to coherence. Proof. Part (1) follows from the coherence of goals. Suppose that Goal(a, r Then for any C such that PGoals(a, r C)(t), it follows from the coherence of r that Bel(a, Real(r For (2), suppose for any r and C that Pref(a, CGoal( r CGoal(r and that Coherent(a, r but -~Coherent(a, r A r We must show that Goal(a, r and ~Goal(a, C)(t). It follows from (RPC) and the first assumption that CGoal(a, r So, from the assumption that r is coherent, it follows that Goal(a,r It remains to show that -~Goal(a,C)(t). Suppose not. Then we haveGoal(a,r A Goal(a,C)(t). So, by (PCG) we have Pref(a, Goal(r It then follows from Proposition 6(7) that Goal(a, r But then from the definition of goals it follows that Coherent(a, r Ar A contradiction. [] To conclude this section we consider the special case of linear goal hierarchies; that is, goal hierarchies which satisfy the additional condition:
Goal(a, r
^ Goal(a, C)(t) Pref(a, Goal(C), Goal(~b))(t) Y Pref (a, Goal(~b), Goal(r -~
P r o p o s i t i o n 8. If the goal hierarchy is linear, then goals are closed under con-
junction: Goal(a, r
A Goal(a, C)(t) --~ Goal(a, r A C)(t).
Proof. Suppose that Goal(a, r A Goal(a, r and that the goal hierarchy is linear. Then we have either Pref(a, Goal(C), Goal(C))(t) or Pref(a, Goal(C), Goal(r In either case we conclude by Proposition 6(7). [] 4
Goal Revision
We have given a formM account of goals as rational desires. In particular goals are defined to be desires which the agent believes to be both realisable and permissible. Furthermore, at any point in time the agent's goals form a hierarchy which is required to be both coherent and maximal subject to coherence. However so far our analysis has been concerned with the static properties of goals and goal hierarchies, with the properties of agents, goals and goal hierarchies at particular points in time. In this section we consider the dynamic properties of goals and goal hierarchies; that is, how they should be revised over time. Here we aim to satisfy two apparently conflicting requirements. On one hand, as Millgram argues [13], the goals of a rational agent should be appropriate given its circumstances, so the agent should be able to revise its goals in an appropriate way when its circumstances change. For example, on Tuesday Stan believes that he can go to the beach with his family on Saturday and still finish his paper on Sunday. However, on Wednesday he no longer believes this, so it is rational for him to abandon the goal of going to the beach and to adopt the
99 alternatives of going to the pool or to the circus. On the other hand, as Bratm a n argues [5, 6], the goals of rational agents should exhibit a certain stability or persistence. For example, given Stan's beliefs about what is realisable and permissible, and his desires and preferences, it is rational for him to persist with his goal of finishing his paper throughout the week. Inspired by B r a t m a n ' s work, Cohen and Levesque [7] a t t e m p t to give necessary and sufficient conditions for the persistence of goals in their definition of "persistent relative" goals. This leads to problems as it is difficult or even impossible to stipulate an adequate set of conditions which will apply in all circumstances; see the discussion in [2]. Instead, in keeping with the two requirements, we take the view t h a t a rational agent should revise its goals, in accordance with the rationality constraints on static goals, whenever its beliefs, desires, or preferences concerning its goals change, and t h a t otherwise the agent's goals should persist by default. As rational agents do not change their beliefs, desires or preferences without reason, their goals and goal hierarchies will thus tend to be stable. Indeed, the goal hierarchies of rational agents will tend to be upwardly stable; t h a t is, the more important the goal the more it will tend to persist in and to maintain its relative position in the hierarchy. In order to represent the persistence of attitudes and preferences, we use the affected operator, Aft, of CA. This modal operator is very similar to the Ab predicate of the Situation Calculus. Let 4~ be a meta-variable which ranges over the non-temporal component of atomic formulas and atomic modal formulas. 1~ Then a formula ~(t) is affected at t if its truth value at t differs from its t r u t h value at t + 1:
M , w , g ~ Aff(4))(t) iff M, w,g ~ ~(4)(t) ~ 4)(t + 1)). We thus have the following persistence rule:
r
A ~A#(q~)(t) ~ ~(t + 1).
Intuitively we are interested in models in which this schema is used from left-to-right only in order to reason "forwards in time" from instances of its antecedent to instances of its consequent. Typically also we want to be able to infer the second conjunct of each instance nonmonotonicMly whenever it is consistent to do so. For example, if we have Goal(a, r then we want to be able to use the rule to infer Goal(a, r + 1) if Aft(Goal(a, r cannot be inferred. In order to enforce this interpretation, we define a preference order on models [14]. D e f i n i t i o n 9. Let M and M ' be models of CA which satisfy any conditions t h a t m a y be imposed. Then M is Aft-preferred to M ~, written M -~Aff M', if M 10 Atomic formulas are formulas of the form r(ul,...,Un)(t), where r is a standard relation symbol and n > 1. Atomic modal formulas are formulas of the form op(a,r r where n > 1 and op is a modal operator other than Aft.
100
and M ~ differ at most on the interpretation of Aft and there exists a time point t such that:
M' ~ Aft(~)(t') if M ~ Aft(~)(t') for any ~ and t' < t, and M' ~ Aft(~)(t) and M ~= Aff(q~)(t) for some q~. D e f i n i t i o n 10. A CA-model M ~ r and there is no model M is an Aft-preferred model model M' such that M' ~ t~
M is an Aft-preferred model of a sentence r if M ~ such that M' ~ r and M ~ -
D e f i n i t i o n 11. A set of sentences O preferentially entails a sentence r (written ~ A f t r if, for any Aft-preferred model M of ~, M ~ r If required these definitions can easily be extended to a "multi-preference" relation, ~(AH,opl .....op.), over n modal operators. Informally, the Aft-preferred models are selected, then the preferred opl models among these are selected, then the preferred op2 models among these are selected, etc. For example, we might choose the (Aft, Bel, Des, Pref)-preferred models of a theory; as beliefs give rise to desires, which give rise to preferences. P r o p o s i t i o n 12. Dynamic properties of goals and goal hierarchies.
1. 2. 3. 4.
Goals persist by default. Preferences on goals persist by default. Goal hierarchies persist by default. Goal hierarchies are upwardly stable.
Proof. For (1), let ~ be an appropriate theory of goals such that ~ ~ A f t Goal(a, r and 6) ~ A f f -~Aff(Goal(a, r T h e n it follows from the persistence rule that O ~ A f t Goal(a, r + 1). The proof for (2) is similar. Part (3) follows from (1) and (2). Part (4) follows from the maximality and default persistence of goal hierarchies. [] As the definition of goals is recursive, the persistence rule for goals corresponds to a complicated sentence. The following proposition shows how it can be stated in progressively simple ways. P r o p o s i t i o n 13. Let q~(r to denote a formula in which the proposition r appears, then the persistence rule for goals can be stated in the following ways:
1. Goal(a, r .
Goal(a, r -~3~'(r Goal(a, r
A -~Aft(Goal(a, r
--* Goal(a, r
A -~Aft(CGoal(a, r
CGoal(r CGoal(r + 1).
A Aff(O(r
+ 1).
101
3. Goal(a, r A ~Aff(CGoal(a, r V~( Pref (a, CGoal( r ), CGoal( r ) ) (t ) -~ ~Aft(CGoal(a, r A Bel(a, Real(r A r Coal(a, r + 1).
--*
Proof. (1) is the standard persistence rule for goals, and is straightforward from the definition of Aft. Rule (2) represents a simplification of (1), as it no longer refers to the recursive definition of goals. The proof that this rule ensures the default persistence of goals is as follows. Suppose that the antecedent holds. T h e condition ~Aft(CGoal(a, r guarantees that r is still a candidate goal for a at time t + 1. We must check that the coherence condition holds for r at time t + 1. Suppose that there exists a preferred goal r which is not believed to be jointly realisable with r at time t + 1. This means that there exists a formula Bel(a, Real(r A r such that Aff(Bel(a, Real(r A r holds. But this contradicts the condition - ~ P ( r 1 6 2 Rule (3) represents a further simplification, as it uses the concrete formulas CGoal(a, r and Bel(a, Real(r A r instead of the general formula scheme ~P(r The proof for this is similar to the proof for rule (2). [] To conclude this section we show how our opening example can be formalised.
Example 1. Let 1,2, .., 7 denote Monday, ~ihesday,..., Sunday respectively. Then the given facts about Stan's beliefs, preferences and candidate goals are expressed by the following set, @, of sentences:
(A) Pref (Stan, CGoal(Paper(7) ), CGoal( Beach(6) ) ))(1), (B) Pref (Stan, CGoal(Beach (6)), CGoal (Pool (6))))(1), (C) Pref (Stan, CGoal(Beach( 6)), CGoal( Circus( 6))))(1), (D) CGoal(Stan, Paper(7))(1), (E) CGoal( Stan, Beach(6))(1), (F) CGoal( Stan, Pool)(6)) (1), (a) CGoal( Stan, Circus)(6))(1), (H) Bel( Stan, Real(Paper(7) A Beach(6)))(1), (X) Bel(Stan, ~Real(Paper(7) A Beach(6)))(3), (J) Bel( Stan, Real(Paper(7) A Pool(6)))(1), (I<) Bel( Stan, Real(Paper(7) A Circus(6)))(1), (L) Bel(Stan, Real(Paper(7) A Beach(6)))(5), (M) Bel( Stan, -~Real( Beach( 6) A Pool(6)))(1), (N) Bel( Stan, ~Real (Beach( 6 ) A Circus(6)))(1), (0) Bel( Stan, -~Real( Pool( 6) A Circus(6)))(1). For natural numbers nl and n2 such that 1 < nl N n2 _~ 7, we will use 45([nl... n2]) to denote the conjunction 4~(nl) A O(nl + 1) A ... A ~(n2). Then the following sentences are true in all Aft-preferred models of O:
102
(a) Pref(Stan, CGoal(PaperCT)) , CGoaliBeachC6))))([1. . . 7]), (b) Pref (Stan, CGoal( Beach(6) ), CGoal( Pool(6) ) ) )([1. . . 7]), (c) Pref ( Stan, CGoal( Beach( 6) ), CGoal( Circus(6) ) ) )([1. . . 7]), (d) CGoal( Stan, Paper(7))([1... 7]), (e) CGoal( Stan, Beach(6))([1... 6]), (f) CGoal(Stan, Pool)(6))([1... 6]), Cg) CGoal(Stan, Circus)C6))C[1. . . 6]), (h ) Bel( Stan, ReaIC Paper(7 ) A Beach(6)))([1... 2]), (i) Bel(Stan,-~Real(Paper(7) A Beach(6)))([3... 4]), (j) Bel( Stan, RealC Paper(7 ) A Pool(6)))([1... 6]), Ck) Bel(Stan, RealCPaper(7 ) A Circus(6)))C[1... 6]), Cl) BeICStan , Real(PaperC7 ) A BeachC6)))([5... 6]), (m) BeICStan, -~Real(Beach(6) A Pool(6)))([1... 6]), (n) Bel(Stan,-~RealCBeach(6 ) A Circus(6)))([1... 6]), (o) Bel(Stan, -~Real(Pool(6) A Circus(6)))([1... 6]). So in all Aft-preferred models of t0 Stan's goals change as follows during the week: (1) GoalCStan, Paper(7))(1) Ca), (b), (c), Cd) (2) Goal(Stan, Beach(6))(1) Ca),(b), (c), (d), (1), Ce), (h) (3) -~Goal(Stan, Pool(6))(1) (2), (a), (b), (c), (m) (4) -~Goal(Stan, Circus(6))(1) (2), (a), (b), (c), (n) (5) Goal( Stan, Paper(7) )( 2) PersistenceRule (6) Goal( Stan, Beach(6) )(2) PersistenceRule (7) ~GoalC Stan, Pool( 6) )(2 ) PersistenceRule (8) -~GoalCStan, CircusC6)) (2) PersistenceRule C9) GoalCStan , paperC7))(3 ) (a), (b), (c) (10) -,Goal(Stan, Beach(6))(3) (a), (b), (c), (9), (i) (11) Goal(Stan, Poo1(6))(3) (a), (b), (c), (9), (10), (f), (j) C12) Goal(Stan, Circus(6))(3) (a), (b), (c), C9), (10), (g), (k) C13) Goal(Stan, paperCT))C4 ) PersistenceRule (14)-~Goal(Stan, Beach( 6) )( 4) PersistenceRule (15) Goal(Stan, Poo1(6))(4) PersistenceRule (16) Goal(Stan, Circus(6) )(4) PersistenceRule (17) Goal(Stan, Paper(7))(5) (a),(b),(c) (18) Goal(Stan, Beach(6))(5) (a), (b), (c), (e), (17), (1) (19) -~Goal(Stan, Poo1(6))(5) (a), (b), (c), (18), (m) (20) -~GoalCStan , Circus(6) )C5) Ca), (b), (c), (18), (n) Acknowledgements We are grateful to everyone who has commented on earlier versions of this paper. This research forms part of the Dynamo Project and is supported by the United Kingdom Engineering and Physical Sciences Research Council under grant number GR/K19266.
103
References 1. J. Allen. Towards a General Theory of Action and Time. Artificial Intelligence 23, 1984, pp. 123-154 2. J. Bell. Changing Attitudes. In: M.J. Wooldridge and N.R. Jennings (Eds.). Intelligent Agents. Post-Proceedings of the ECAI'94 Workshop on Agent Theories, Architectures, and Languages. Springer Lecture Notes in Artificial Intelligence, No. 890. Springer, Berlin, 1995. pp. 40-55. 3. J. Bell. A Planning Theory of Practical RationMity. In [9], pp. 1-4. 4. Boutillier, C., Toward a Logic for Qualitative Decision Theory. Proceedings of KR'94, pp. 75-86. 5. M.E. Bratman. Intention, Plans and Practical Reason. Harvard University Press, Cambridge Massachusetts 1988. 6. M.E. Bratman. Planning and the Stability of Intention. Minds and Machines 2, pp 1-16,1992. 7. P. Cohen and H. Levesque. Intention is Choice with Commitment. Artificial Intelligence 42 (1990) pp. 213-261. 8. Doyle, J., and Wellman, M.P., Preferential Semantics for Goals. Proceedings of AAAI'91. pp. 698-703. 9. M. Fehling (ed.), Proceedings of the AAAI-95 Fall Symposium on Rational Agency: Concepts, Theories, Models and Applications, M.I.T, November 1995. 10. Huang, Z., Logics for Agents with Bounded Rationality, ILLC Dissertation series 1994-10, University of Amsterdam, 1994. 11. Huang, Z., Masuch, M., and P61os, L., ALX: an action logic for agents with bounded rationality, Artificial Intelligence 82 (1996), pp. 101-153. 12. Lewis, D., Counterfaetuals, Basil Blackwell, Oxford, 1973. 13. E. Millgram. Rational goal acquisition in highly adaptive agents. In [9], pp. 105107. 14. Shoham, Y. Reasoning About Change, M.I.T. Press, Cambridge, Massachusetts, 1988. 15. Stalnaker, R., A theory of conditionals, Studies in Logical Theory, American Philosophical Quarterly 2 (1968), pp. 9 8-122. 1995. 16. von Wright, G., The Logic of Preference, (Edinburgh, 1963).
Constructing Finite State Implementations of Knowledge-Based Programs with Perfect Recall* (Extended Abstract) R o n van der M e y d e n Computing Science University of Technology, Sydney PO Box 123, Broadway NSW 2007 Australia email: [email protected]
Knowledge-based programs have been proposed as an abstract formalism for the design of multi-agent protocols, based on the idea that an agent's actions are a function of its state of knowledge. The key questions in this approach concern the relationship between knowledge-based programs and their concrete implementations, in which the actions are a function of the agents' local states. In previous work we have shown that with respect to a perfect recall semantics for knowledge, finite state implementations of knowledge-based programs do not always exist. Indeed, determining the existence of such an implementation is undecidable. However, we also identified a sufficient condition under which the existence of a finite state implementation is guaranteed, although this sufficient condition is also undecidable. We show in this paper that there nevertheless exists an approach to the optimization of implementations that results in a finite state implementation just when the sufficient condition holds. These results contribute towards a theory of automated synthesis of multi-agent protocols from knowledge-based specifications. Abstract.
1
Introduction
Viewing agents as h a v i n g s t a t e s of knowledge has been f o u n d to be a useful a b s t r a c t i o n for t h e design a n d a n a l y s i s of d i s t r i b u t e d , m u l t i - a g e n t a n d s i t u a t e d s y s t e m s [HM90, RK86]. T h e central insight of this a p p r o a c h is t h a t an a g e n t ' s a c t i o n s s h o u l d d e p e n d on its e p i s t e m i e state. T h i s has lead to t h e p r o p o s a l of k n o w l e d g e - b a s e d p r o g r a m m i n g f o r m a l i s m s [FHMV95a], in which a g e n t s ' actions have as p r e c o n d i t i o n s f o r m u l a e expressing p r o p e r t i e s o f their knowledge. K n o w l e d g e - b a s e d p r o g r a m s allow for e x t r e m e l y i n t u i t i v e d e s c r i p t i o n s of agent * Work begun while the author was with the Information Sciences Laboratory, N T T Basic Research Laboratories, Kanagawa, Japan, and continued at the Department of Applied Math and Computer Science, Weizmarm Institute of Science. Thanks t o Yoram Moses for helpful discussions and to Lawrence Cavedon for his comments o n the manuscript.
136
behaviour, abstracting both from the agents' data structures and the environment in which they run. In knowledge-based programs, agents' concrete states can be arbitrary data structures, and the relationship between knowledge formulae and agents' concrete states is defined semantically. An agent knows a fact P when in a local state s if P is true in all global states of the system in which the agent is in local state s. Since this definition of knowledge is non-operationM, a knowledge-based program cannot be directly executed - it is first necessary to implement it by translating it to a standard program in which the preconditions for actions are tests on the agents' concrete states. 2 In a knowledge-based program an agent's state of knowledge depends on a number of factors, including the behaviour of the environment in which the program runs, and the actions being performed by the agents in this environment. These actions in turn depend on the agents' states of knowledge. This means that the relationship between a knowledge-based program and its implementations is subtle, involving a non-trivial fixpoint definition. Consequently, a knowledge-based program may, in general, have many behaviourally distinct implementations, or none. In this paper we focus on the class of atemporal programs and a synchronous perfect recall interpretation of knowledge, for which this problem does not arise, and for which there always exists an implementation, unique up to behavioural equivalence. The main issue is then the representation and construction of this implementation. Finite state protocols are the simplest possible type of representation. In previous work [Mey96b], we have shown that with respect to a perfect recall interpretation of knowledge, finite state implementations of atemporal programs may not exist, even when the environment is finite state. However, we have also identified a general condition sufficient for the existence of finite state implementations [Mey96a], and shown that it applies for some natural classes of systems. It turns out that, like the problem of deciding the existence of a finite state implementation of a given knowledge-based program, the question of whether the sufficient condition applies is undecidable. This still allows a finite state implementation to be automatically discovered when one exists, since the problem of determining whether a given finite state protocol is an implementation of a given knowledge-based program is decidable. However, this involves an exhaustive search through all finite state implementations. In the present paper, we show that a much more systematic procedure, involving no search, is able to construct finite state implementations when the sufficient condition applies. In particular, we show that there exists an implementation with the property that it runs in constant space just when the suf2 In this respect knowledge-based programs differ from proposals for agent oriented programming [Sho93] and BDI architectures [RG92] in which agents' local states are logical theories, updated by means of procedures provided by the designer. The epistemic level description of agents in these proposals remains operational, and there is no guarantee that knowledge states are updated in a way that conforms to their semantic interpretation.
137
ficient condition holds. This result contributes towards a theory of a u t o m a t e d synthesis of multi-agent protocols from knowledge-based specifications. The structure of the paper is as follows. Sections 2 to 4 define knowledgebased programs and their semantic interpretation. Section 5 describes a canonical implementation for a knowledge-based program. Section 6 states a sufficient condition for the existence of finite state implementations. This sufficient condition is undecidable, but Section 7 shows that a systematic procedure (based on an optimization of the canonical implementation) is able to detect that it applies, and construct a finite state implementation as a result. Section 8 concludes.
2
Knowledge-Based
Programs
To make the paper self-contained, we recall in the next three sections the definitions from [Mey96b] of knowledge-based programs, the environments in which they run, and their implementations. These definitions are a variant of the definitions in [FHMV95a], to which the reader is referred for motivation. For a description of how our framework differs from that of [FHMV95a], and motivation for the changes, see [Mey96b]. To describe the agents' states of knowledge we work with the propositional multi-modal language for knowledge s c generated from some set of basic propositions Prop by means of the usual boolean operators, the monadic modal operators Ki, where i = 1 . . . n is an agent, and the monadic operators Ca, where G is a set of two or more agents. Intuitively, Kip expresses that agent i knows p, and C a p expresses that p is common knowledge amongst the group of agents G. This language is interpreted in the standard way [FHMV95b] in a class of S5n Kripke structures of the form M = (W,/(;1,..., s V), where W is a set of worlds, for each i = 1 ... n the accessibility relation ~i is an equivalence relation on W, and V : W x Prop ~ {0, 1} is a valuation on W. If G is a group of agents then we may define the equivalence relation H a on W by u "~a v if there exists a sequence uo,...,Uk of worlds such that u0 = u, uk = v and for each j = 0 . . . k - 1 there exists an agent i E G with Uj]~iUj+ 1. The crucial clauses of the truth definition are given by 1. (M, u) ~ p, where p is an atomic proposition, if V(u, p) = 1. 2. (M, u) ~ K i p if M, v ~ p for all worlds v with us 3. (M, u) ~ CGp if M, v ~ p for all worlds v with u "~G v. We now describe the structure of knowledge-based programs. For each agent i = 1 . . . n let ACTi be the set of actions that may be performed by agent i. Similarly, let ACTe be the set of actions that may be performed by the environment in which the agents operate. If ae E ACTe and ai E ACTi for each i = 1 . . . n , we say that the tuple a = (a~,al,...,a,~) is a joint action, and we write A C T for the set of all joint actions. Call a formula of s i-subjective if it is a boolean combination of formulae of the form Kip. An atemporal knowledge-based program for agent i is a finite
138
statement P g i of the form case of i f 91 d o al
if ~ do an end case
where the ~oj are/-subjective formulae of s c and the aj E ACTi are actions of agent i. Intuitively, a program of this form is executed by repeatedly evaluating the case statement, ad infinitum. At each step of the computation, the agent determines which of the formulae ~j accurately describe its current state of knowledge. It non-deterministically executes one of the actions aj for which the formula ~oj is true, updates its state of knowledge according to any new input it receives, and then repeats the process. We will give a more precise semantics below. To describe the behaviour of the world in which agents operate, we define a finite interpreted environment to be a tuple of the form E = (Se, Ie, Pe, r, O, Ve) where the components are as follows: 1. Se is a finite set of states of the environment. Intuitively, states of the environment m a y encode such information as messages in transit, failure of components, etc. and possibly the values of certain local variables maintained by the agents. 2. Ie is a subset ofSe, representing the possible initial states of the environment. 3. P~ : S~ -+ "P(ACTe) is a function, called the protocol of the environment, m a p p i n g states to subsets of the set A CTe of actions performable by the environment. Intuitively, P~(s) represents the set of actions that m a y be performed by the environment when the system is in state s. 4. r is a function mapping joint actions a E A C T to state transition functions r ( a ) : Se --~ S~. Intuitively, when the joint action a is performed in the state s, the resulting state of the environment is v(a)(s). 5. The component O is a function m a p p i n g the set of states Se to O n, where O is some set of observations. If s is a global state then the i-th component Oi(s) of O(s) will be called the observation of agent i in the state s. 6. Ve : Sex Prop -+ {0, 1} is a valuation, assigning a truth value V(s,p) in each state s to each atomic proposition p E Prop.
A trace of an environment E is a finite sequence so...sm of states such t h a t for all i = 0 . . . m - 1 there exists a joint action a = (ae, a l , . . . , am) such t h a t Si+l = r ( a ) ( s l ) and ae E Pe(si). We write fin(r) and init(r) for the final and initial states of a trace r, respectively. Example 1. We will use as a running example a robot motion planning problem introduced in [BLMS]. The discrete variant we consider here is due to [FHMV95b]. Consider a robot that moves along a fixed track with a discrete set of positions {0..4}. Initially the robot is at the leftmost position 0. The robot moves under the control of the environment, and all moves are to the right. The
139
only action the robot can perform is to apply the brakes, which prevents any motion. The robot has a sensor that gives a reading in {0..4} of its position, but this sensor is subject to some noise, so that the reading could be the actual position, or one position to the left or right. The robot has a goal of halting within the region {1, 2, 3}. To express this situation as an environment, define the set of states Se to be the set of pairs (p, p), where 1. p G {0..4} represents the actual position of the robot, 2. p 6 {0..4} represents the sensor reading of the robot, required to satisfy [ p - p[ _< 1. Since the robot starts in position 0, but the sensor reading has some uncertainty, the set of initial states Ic = {(0, 0), (0, 1)}. For the robot (agent 1), we use two actions halt and hoop, representing the robot applying the brakes and performing no action, respectively. The environment's actions are of the form (m, n), where m G {0, 1} represents velocity of the robot, and n G { - 1 , 0, 1] represents the noise in the sensor. The environment's protocol allows any action, except that if the robot is at position 4 it must have velocity 0, and if the robot is at either endpoint then the noise can't be such as to give a reading out of the range {0..4]. That is,
={
{0,1] • {0,1] i f p = O , {0} • { - 1 , 0 } i f p = 4 , {0, 1] • { - 1 , 0 , 1} otherwise.
To describe the effect of joint actions on the states, we define the transition function r by v(((m, n), a~))[(p, p)] = (p', p' + n) where p' = p + m if al = hoop and p' = p if al = halt. In any state, the robot observes its sensor reading, so we take the set of observations O to be {0..4}, and define the observation function O1 by O~[(p,p)] --- p. We will use a single propositional constant g, representing that the robot is in the goal region, so we define the valuation function by Vc(g, (p, p)) = 1 iff p 6 {1, 2, 3}. To halt in the goal region, the robot can apply the following strategy: wait until it knows that it is in the goal region, then apply the brakes (and keep applying them thereafter.) We may represent this strategy as the following knowledgebased program P g l : case o f if Kl(g) d o halt
if-~K1 (g) do e n d case
hoop
140
3
Protocols
Next, we introduce the standard protocols which will be used as implementations of knowledge-based programs, and describe the set of traces produced by executing such a protocol. Define a protocol for agent i to be a tuple Pi = (Si, qi, cri, tti) consisting of (1) a set S~ of protocol states, (2) an element q~ E S~, the protocol's initial state, (3) a function a~ : S~ • O --~ 7)(ACTi), such that hi(s, o) is a n o n e m p t y set representing the possible next actions t h a t agent i m a y take when it is in state s and is making observation o, and (4) a function #i : Si • O --~ Si, such that tti(s, o) represents the next protocol state the agent assumes after it has been in state s making observation o. In general, we allow the set of protocol states to be infinite. A joint protocol is a tuple P = ( P 1 , . . . , Pn) such that each Pi is a protocol of agent i. Intuitively, protocol states represent the m e m o r y that agents maintain a b o u t their sequence of observations for the purpose of implementing their knowledgebased programs. For each trace r of an environment E, a protocol P~ determines a protocol state Pi(r) for agent i by means of the following recursive definition. If r is a trace of length one, consisting of an initial state of E, then P~(r) = qi, the initial state of the protocol. If r is the trace r~s, where r ~ is a trace and s is a state of the environment, then Pi(r) = tti(Pi(r'), Oi(fin(r'))), where fin(r') is the final state of r'. Thus Pi(r) represents the agent's m e m o r y of its past observations in the trace r, excluding the current observation. In executing a protocol, an agent's next action in a given trace is selected from a set determined by its m e m o r y of past observations together with its current observation. Define the set of actions of agent i enabled by the protocol Pi at a trace r with final state s to be the set acti(Pi, r) = ai(Pi(r), Oi(s)). Similarly, the set of joint actions a c t ( P , r) enabled at r by a joint protocol P contains precisely the joint actions (he, a l , . . . , a n ) s u c h that ae E P~(s~) and each ai E acti(Pi, r). We m a y now describe the set of traces t h a t results when a particular protocol is executed in a given environment. Given a joint protocol P and an environment E, we define the set of traces generated by P and E, to be the smallest set of traces 7r such that (1) for each initial state s~ E I~ of the environment, Tr contains the trace of length one composed just of the state se, and (2) if r is a trace in Tr with final state s, and a E a c t ( P , r) is a joint action enabled by P at r, then the trace r . r ( a ) ( s ) is in 7r Intuitively, this is the set of traces generated when the agents incrementally maintain their m e m o r y using the update functions #i, and select their next action at each step using the functions hi.
Example 2. Define the protocol P1 for agent 1 in the environment of Example 1 as follows. There are two states $1 = {ql, h}, where ql is the initial state. The update function is given by
ttl(ql,p) =
h ifp_>2 ql otherwise
141
and #1 (h, p) = h. The action function is given b y O~l(ql , ,o) :
{halt} if p > 2 {hoop} otherwise
and a~(h, p) = {halt}. The set of traces generated by this protocol in the environment E of Example 1 is the set of all traces (P0, Po)(P,, P l ) . . . (Pm, Pro) of E such that if k is the least number for which Pk > 2, we have Pk = Pt for all 1 > k. T h a t is, the agent performs a noop until the first reading p > 2 is obtained, and performs the action halt thereafter.
4
Implementation
of Knowledge-Based
Programs
Clearly, in order to execute a knowledge-based program according to the informal description above, we need some means to interpret the knowledge formulae it contains. To do so, we introduce a particular class of Kripke structures. These structures will be obtained from a set of traces by means of a generalization of observations. Define a joint view of an environment E to be a function {.} m a p p i n g traces of E to X '~ for some set X. We write {r}i for the i-th component of the result of applying {.} to the trace r. Intuitively, a view captures the information available to the agents in each trace. We will focus in this paper on the synchronous perfect recall view {.}pr. If r is the trace sl 9 9 Sk then the synchronous perfect recall view is defined by {r}~ r = Oi ( s l ) . . . Oi (sk). T h a t is, the perfect recall view provides the agent with a complete record of all the observations it has made in a trace. The assumption of perfect recall is frequently made in the literature, because it leads to protocols that are optimal in their use of information. Given a set 7"4 of traces of environment E and a joint view {.} of E, we m a y define the Kripke structure M(74, {.}) = (W, - , q , . . . , ~,~, V), where - the set of worlds W = T4, and - for all r , r ' E T4 we have r ~i r' iff {r}i = {r'}i, and - the valuation V : T4 x P r o p - + {0, 1} is defined by V(r,p) = Ve(fin(r),p). T h a t is, the accessibility relations are derived from the view, and truth of basic propositions at traces is determined from their final states according to the valuation Ve provided by the environment. Intuitively, r Hi r ' if, based on the information provided by the view {.}i, the agent cannot distinguish between the traces r and r'. We will call M(T4, {.}) the interpreted system obtained from T4
and {.}. We m a y now define the set of actions enabled by the program P g i at a trace r of system M to be the set P g i ( M , r) consisting of all the ai such that P g i contains a line "if ~ d o ai" where (M, r) ~ ~. Similarly, the set of joint actions P g ( M , r) enabled by a joint knowledge-b.ased program in a trace r with final state s contains precisely the joint actions (ae, a l , . . . , an) such that ae E Pc(s) and each ai E P g i (M, r).
142
This definition leaves open the question of what system we are to use when executing a knowledge-based program. The answer to this is that we should use a system that is itself generated by running the program. To avoid the circularity, we first consider a system generated by some standard protocol, and then state when such a system is equivalent to that generated by the knowledge-based program. Say that a joint protocol P implements a knowledge-based program P g in an environment E with respect to the joint view {.} if for all r in the set of traces TO(P, E) generated by P and E, we have P g ( M , r) = a c t ( P , r), where M = M(Tr E), {.}) is the system obtained from ~ ( P , E) and {-}. T h a t is, the joint actions prescribed by the knowledge-based program, when the tests for knowledge are interpreted according to the system M, are precisely those prescribed by the standard protocol P. Note that this definition (unlike that of [FHMV95a]) refers only to the behaviour of the protocol, not to its internal states. More precisely, say that two joint protocols P and P~ are behaviourally equivalent with respect to an environment E if they generate precisely the same set of traces by means of the same sets of action at each step. T h a t is, P and P~ are behaviourally equivalent when TO(P, E) = TC(P', E), and for all r E 7r E) we have a c t ( P , r) = a c t ( P ' , r). When this relation holds, P implements the knowledge-based program P g with respect to the view {.} if and only if P~ also implements P g with respect to the view {.}.
Example 3. The protocol of Example 2 implements the knowledge-based program of Example 1 in the environment of that example. (Similar results have been established in [FHMV95b, BLMS] with respect to views based on only the current observation, or the current observation plus the current time. We prove it here with respect to the perfect recall view.) Let M be the system obtained from the traces TO(P1, E) and {.}yr. To establish the implementation relation we show that actl (Pz, r) = P g l (M, r) for all traces r E TO(P1, E). Consider first a trace r = (p0, P0) ... (pro, pro) of P1 in which Pk < 2 for all k. By the definition of P1 we have aetl (P1, r) = {hoop}. To show that P g l ( M , r) = {hoop} we establish that M, r ~ -,Klg. For, let r ~ be the sequence of states (0, P 0 ) . . . (0, p,~) in which the sequence of readings is exactly the same as in r, but in which the robot never leaves position 0. Since each Pk is in {0, 1}, r ~ is in fact a trace generated by P1 in E. Now {r} pr = P0...pro = {r'} V, so r ~z r'. Clearly M, r ~ ~ --g, so by the semantics of K1 it follows that M, r ~ -,Klg, as required. Next, consider the remaining traces r = (P0, P 0 ) . . . (P,~, pro) of P1 in which Pk > 2 for some k. Take k to be the least number with this property. By the definition of P1 we have acQ(P1, r) = {halt}. To show that P g I ( M , r) = {halt} we establish that M, r ~ gig. For this, let r' = (p~, p ~ ) . . . (p~, p~) be any trace of M such that r "~1 r ~. We show M, r ~ ~ g. By definition of {.}pr and O1, we must have pl = p~ for all 1. Thus k is the least number for which p~ > 2, and we must have P~-I < 2. Since the actual position differs by no more than 1 from the reading, it follows that Pk-ll < 3. Hence p~ _< 3, because the robot can move at most one position at each step. Since p~ > 2 we must also have p~ > 1. This
143
proves that p~ is in the goal region {1, 2, 3}. By definition of P1 and 7, we have p~ = p~ for all l _> k, so p ~ is in the goal region. This proves M, r ' ~ g. [] In the framework of [FHMV95a] knowledge-based programs, even a t e m p o r a l ones, can have no, exactly one, or m a n y implementations, which m a y be behaviourally different. Our notion of implementation is more liberal, so this is also the case for our definitions. However, the notion of implementation is better behaved when we consider implementations with respect to the synchronous perfect recall view. P r o p o s i t i o n 1. For every environment E and atemporal joint knowledge-based program P g , implementations of P g in E with respect to the view {.}P~ always exist. Moreover, all such implementations are behaviourally equivalent. If "/~ is the unique set of all traces generated by any implementation of P g in E with respect to {.}pr, we refer to M(7~, {-}Pr) as the system generated by P g in E with respect to {.}P~. Although our formulation is slightly different, Proposition 1 can be viewed as a special case of Theorem 7.2.4 of [FHMV95b]. Intuitively, the set of traces of length one generated by any implementation must be the set of initial states of the environment. Because of synchrony, this set of states fixes the interpretation of formulae in L;c , so may be used to uniquely determine the actions enabled at the traces of length one in any implementation. These actions in turn uniquely determine the set of traces of length two in any implementation. By the synchrony assumption we again find that the truth values of formulae i n / : c hence the set of actions enabled at a trace of length two, is uniquely determined. This again uniquely determines the traces of length three, and so on. We formalize this intuition in the next section. 5
A Canonical
Implementation
It is convenient for the purpose of later constructions to describe the implementation promised by Proposition 1 using a new type of semantic structure. Note that the temporal slices of the Kripke structures generated by a knowledge-based program have two aspects: they are themselves Kripke structures, but each world (trace) also represents in its final state information about the possible successor traces. The following notion formalizes this. Define a progression structure for an environment E with states S~ and valuation V~ to be a pair N = (M,c~> consisting of an $5,~ Kripke structure M = (W, IC1,..., IOn, V) together with a state mapping c~ : W --+ S~, satisfying V(w, p) = V~(or(w), p) for all worlds w E W and propositions p. If w is a world of N then we will write N, w ~ ~, just when M, W ~ ~. A class of progression structures that will be central to the development below is obtained from the set of traces 7~ generated by a knowledge-based p r o g r a m P g in environment E with respect to {.}pr. Let T~,~ be the set of traces in 7~ of length n. Define Mn to be the structure M(Tt,, {-}Pr), and define the m a p p i n g c~
144
from n n to the states of E by c~(r) = fin(r). Then the structure N , = (M,~, (r) is a progression structure. In particular, note that the structure N1 is independent of the program Pg: its worlds are precisely the initial states of the environment. Progression structures carry sufficient information to represent the execution of a knowledge-based program with respect to the perfect recall view. First, note that the definition of P g ( M , r) above assumes that M is a system, in which the worlds r are traces. Observe that we may extend this definition to the worlds w of arbitrary progression structures N = (M, or). We define P g ( g , w) to be the set of joint actions P~((r(w)) • P g l ( M , w) x . . . x P g , ( M , w ) . Using this observation, we now define an operator on progression structures that captures the execution of a joint knowledge-based program P g in an environment E. If N -- (M, or) is a progression structure for E, then we define the progression structure N * P g = ( i ~, ~'), where i ' = ( W ' , / E l , . . . ,/C', V~), as follows. First, we let W' be the set of pairs (w, s) where w is a world in W and s E Se is a state of E such that there exists a joint action a E Pg(N, w) such that s = v(a)(cr(w)). We define (w, s)lC~(u,t) if and only if wlCiu and Oi(s) = Oi(t). The valuation V' is given by W((w, s),p) = V~(s,p) for all propositions p. Finally the state mapping (r is defined by or(w, s) = s. It is clear that the relations /C~ are equivalence relations, so this is a progression structure.
Example 4. We describe the structure N1 * P g for Example 1. The progression structure N1 for the environment of Example 1 consists of the Kripke structure M with set of worlds W = {(0, 0), (0, 1)}, accessibility relation/(:1 the smallest reflexive relation on this set, valuation V the same as the valuation on the states of E and state mapping ~ the identity function. For both worlds w of this structure we find that M , w ~ -~Klg, hence P g l ( M , w ) -- {hoop}. Thus, N1 * P g has 10 states, which are grouped into equivalence classes by the accessibility relation/C~ as follows: { ((0, 0), (0, 0)),
((0, 0), (1,0)) }
{ ((0, 1), (0, 0)),
((0, 1), (1, 0)) }
{ ((0, 0), (0, 1)),
((0, 0), (1, 1)) }
{ ((0, 1), (0, 1)),
((0, 1), (1, 1)) }
{ <(0,0),(1,2)> }
{ ((0,1),(1,2)) }
The state mapping is the function projecting pairs (s, t) to the state t. The operator 9 can be used to give a description of the system generated by a knowledge-based program as a union of a sequence of incrementally constructed progression structures. Say two progression structures ( i , cr) and ( i ' , cr~) are isomorphic if there exists an isomorphism g of their respective Kripke structures which preserves the state mapping, i.e. for which cr'(n(W)) = or(w) for all worlds w of M. L e m m a 2. For all n, the progression structure Nn+l is isomorphic to the product Nn * P g by means of the mapping ~ defined by t~(rs) -= (r,s), where rs
145
is a trace with final state s and initial portion r. Thus, the system M generated by P g in E with respect to {.}pr is isomorphic to the I(ripke structure of N1 U (N1 * P g ) U (N1 * P g * Pg) U .... This result enables us to give an explicit description of an implementation of P g . Let P be the joint protocol defined as follows. For agent i, the states Si of the protocol are either the initial state qi or a pair (N, X) consisting of a progression structure N together with an equivalence class X of the relation Ki of this structure. The update operation #i is given by
,,(q, o) =
Io [ O (s) = o))
when q is the initial state q~. If q is the state (N, X) we define #~(q, o) = (N * P g , X/), where X ~ is the set of pairs (w,s) where w E X and s E Se is a state of the environment such that Oi(s) = o and for which there exists a joint action a E P g ( N , w) with s = r(a)(c~(w)). Finally the action function c~i is defined by c~i(q, o) = P g i ( g , w), where #i(q, o) = (N, X ) and w is any world in X. L e m m a 3 . The protocol P is an implementation of P g in E with respect to {.}P~. Moreover, for all traces rs, if pi(Pi(rs),Oi(s)) = ( N , X ) then N is isomorphic to Nlrsl. We will refer to the protocol P as the canonical implementation of P g with respect to {.}pr. Since the number of traces of length n is potentially exponential in n, this implementation could be very inefficient. However, the definition of implementation allows considerable leeway in the choice of protocol states, so this does not preclude the existence of more efficient, or even finite state, implementations. Unfortunately, it is undecidable to determine whether a finite state implementation exists [Mey96a]. Nevertheless, we will will show that it is possible to apply to the protocol P an optimization that sometimes results in its running in constant space. Before doing so, we first recall a condition sufficient for the existence of a finite state implementation, which enables us to characterize the circumstances in which this optimization results in a finite state implementation. 6
A Sufficient
Condition
for Finite
State
Implementations
In this section we recall from [Mey96a] a sufficient condition for the existence of a finite state implementation of a knowledge-based program. Define a simulation from an Shn structure M = (W,/C1,...,1Cn, V) to an $5~ structure M ' = (W',/C~,...,/C~, V') to be a function tr : W --+ W ' such that 1. for all worlds w E W and propositions p, V ( w , p ) = V'(t~(w),p), 2. for all w, w' E W and agents i, if wl~iw' then xo(w)IC~n(w'), 3. for all w E W, w ~ E W ~ and agents i, if x~(w)tS~w~ then there exists v E W with wlCiv and n(v) = w'.
146
This is a straightforward generalization of a standard notion from modal logic (called p-morphisms in [Che80]) to $ 5 . structures. An easy induction on the construction of a formula ~ E tl c shows that if ~ is a simulation from M to M ~ then M, w ~ ~ iff M', n(w) ~ ~,. Example 5. Consider the substructure N of the progression structure N1 * P g of Example 4 consisting just of the following 5 states and accessibility relation ~ with the equivalence classes indicated:
{ ((0,0),(0,0)>,
((0,0),(1,0)> }
{ <(o,o),(o,i)>, <(o,o),(i,i)> ) { ((0,0), (1,2)> } Define the mapping ~ from N1 * P g to N by n((s, t)) = ((0, 0), t>. Then n is a simulation from N1 * P g onto N. T h e o r e m 4. [Mey96a] Let P g be an atemporal knowledge-based program. Let M be the system generated by the joint knowledge-based program P g with respect to the view {.}pr. If there exists a simulation from M to a finite Shn structure M ~ then there exists a finite state implementation of P g in E with respect to {.}pr We applied this result in [Mey96a] to show that finite state implementations are guaranteed to exist for all knowledge-based programs in environments in which communication is by synchronous broadcast to all agents. This class of environments includes all single-agent environments, so the result applies to Example 1. We note that the existence of a simulation to a finite structure is not necessary for the existence of a finite state implementation. Unfortunately, Theorem 4 does not in general provide an effective approach to the identification of finite state implementations. T h e o r e m 5. [Mey96a] Given a finite environment E and a knowledge-based program P g , the following problems are undecidable (i) to determine if there exists a finite state implementation of P g in E with respect to {.}pr. (ii) to determine if there exists a simulation from the system generated by P g in E with respect to {.}Pr to a finite $5,~ structure.
This result is not as negative as it may seem, however. It follows from results in [Mey94] that it is decidable to determine whether a given finite state protocol implements a given knowledge-based program with respect to {.}P~. Thus, it is possible to find a finite state implementation, provided one exists, simply by testing each finite state protocol. But this is hardly a computationally efficient procedure. In the next section we show that a much more tractable approach to the construction of a finite state implementation is possible. This approach does not always work, but can be shown to terminate just in case the condition of Theorem 4 applies.
147
7
An Optimization
As noted, the canonical implementation is inefficient. In general, it does not appear to be possible to improve upon the idea of maintaining the temporal slices of the system generated by a knowledge based program in its implementations. Examples such as that in [Mey96b] indicate that traces widely separated in N,~ from a given trace r may have an impact on the actions enabled at later extensions of r, so that, potentially, a representation of the whole progression structure N,~ needs to be maintained in the agents' protocol states. Nevertheless, as we now show, it may be possible to optimize the canonical implementation, on the fly, in such a way that under the appropriate conditions it requires only constant space. The key observation enabling the optimization is that in general, Kripke structures may encode information in highly redundant ways, duplicating the same information across a number of worlds. We now introduce some modifications of known concepts that help to factor out such redundancies from progression structures. A bisimulation between progression structures N = (M, @ and N ' = ( M ' , c~'}, where M = (W,/(;1,...,/(:~, V) and M = ( W ' , / C ~ , . . . , K:~, V'), is a binary relation R C W x W ' such that 1. domain(R) = W and codomain(R) = W', 2. for all i = 1 . . . n, whenever u R u ~ and uICiv there exists v~ E W ~ such that vRv' and u'lC~v', 3. for all i = 1 . . . n , whenever u R u ~ and u~lC~v~ there exists v E W such that v R v ~ and uK
148
bisimulation, and that this bisimulation is an equivalence relation. Under certain conditions, this bisimulation m a y be characterized explicitly. Say t h a t the primitive propositions Prop determine the states of the environment E if whenever s and t are distinct states then there exists a proposition p E Prop such that VE(s,p) # VE(t,p). For worlds w of N define q~y(w) = {~o E s [ N, w ~ ~} to be the set of formulae holding at w. L e m m a 7. Assume that propositions determine state in E. Suppose N is a finite progression structure over E and let R be the maximal bisimulation on N . Then for two worlds u, v of N we have wRy if and only if @N(U) = @N(V). 3 If an equivalence relation R is a bisimulation between a progression structure N and itself, then we m a y form the quotient N ~ = N / R of N by R, which is a progression structure ((W ~,/Ci,... ,/C~, W), a ' ) defined as follows. First, the worlds W ~ of the quotient are the equivalence classes [w]R of the worlds w of N. We have [u]R/C~[v]R if there exists u' E [u]R and v~ E [v]R such that u'lCiv'. The valuation V' of N ' is defined by V'([u]R,p) = V ( u , p ) . Finally, the state m a p p i n g is defined by cr~([u]R) = or(u). Note that the valuation and the state m a p p i n g are well-defined because R is a bisimulation. L e m m a 8. Let R be an equivalence relation which is a bisimulation on a progression structure N . Let tr be the mapping from N to N / R which takes every world to its R-equivalence class. Then tr is a simulation of progression structures. We are now in a position to define an operation on progression structures t h a t eliminates semantic redundancies. We write reduce(N) for the quotient N / R when R is the maximal bisimulation on N. We note that the m a x i m a l bisimulation, and hence the progression structure reduce(N), m a y be constructed very efficiently using techniques developed in [KS90, PT87] for the computation of m a x i m a l bisimulations on labeled transition structures. Before describing how reduction m a y be used to optimize the canonical implementation of a knowledgebased program, it is convenient to state one more lemma. L e m m a 9. Let N and N ~ be progression structures for an environment E and P g a knowledge-based program. I f there exists a simulation a from N to N ~ then there exists a simulation fl from N * P g to N t * P g . I f tr is surjective then 13 is surjective also. We now come to the main results of this section. Let M be the system generated by a knowledge-based program P g in an environment E with respect to {.}pr. For each k > 1 take Nk to be the progression structure whose worlds are the traces of M of length k. We now define an implementation P that is similar to the canonical implementation, but which performs a reduction of the progression structure at each 3 Similar results have been previously established for bisimulation relations on labeled transition systems [BR83, HM85].
149
step. As in the canonical implementation, for agent i, the states Si of the protocol are either the initial state qi or a pair (N, X) consisting of a progression structure N together with an equivalence class X of the relation/Ci of this structure. T h e difference lies in the particular progression structures used. In the canonical implementation, the update operation #i is defined by # i ( ( g , X), o) = ( g * P g , X'}, where X ' is some set of worlds. Instead of N * P g , the optimized implementation will use the reduced structure reduce(N, P g ) . More precisely, in the case q = qi we define pi(q, o) to be
(reduce(N1), t~({s E Ie ] Oi(s) = o})) where ~ is the canonical m a p from N1 to reduce(N1). When q is of the form (N, X), let ~ be the canonical m a p from N 9 P g to r e d u c e ( N , P g ) . We define pi(q, o) = (reduce(N, P g ) , X'), where X ' is the set of ~(w, s) for all pairs (w, s) where w E X and s E S~ is a state of the environment such that Oi(s) = o and for which there exists a joint action a e P g ( N , w) with s = r(a)(~r(w)). Finally the action function (~i is defined by c~i(q, o) = P g i ( N , w), where Pi(q, o) = (N, X ) and w is any world in X. L e m m a 10. The protocol P is an implementation of P g in E with respect to {.}pr. Moreover, for all traces rs, if #i(Pi(rs),Oi(s)) = ( N , X ) then N is iso-
morphic to reduce(Nlr~l ). This result follows quite directly from L e m m a 9, and amounts to establishing the c o m m u t a t i v i t y of the diagram in Figure 1. Whereas the canonical implementation uses the progression structures Nk, the optimized protocol uses the reduced structures Qk. Let us now consider the complexity of the optimized implementation. In general, the structures Qk may grow to arbitrary size. The following result shows that the sufficient condition of Section 6 characterizes the situation in which this is not the case.
Suppose that propositions determine state in E. Then the optimized implementation of P g runs in constant space if and only if there exists a simulation from M to some finite SSn structure.
Theorem ll.
Note that we are not required to know the existence of the simulation a
priori in order to obtain the space bound. Given an environment, the optimized implementation simply starts computing the structures Qk: if the appropriate simulation exists, these will never become larger in size than some constant. In this case there will exist some numbers K < L for which Qtr is isomorphic to QL, and we could use this in order to produce an implementation represented as a finite state automaton. (The optimized implementation itself does not consist of an a priori determined finite set of states. Nor do we have a way to predict the size of the cycle point L: this is a nonrecursive function of the size of the environment.)
150
j,
O,*Pg/
/
/
,Ii
/
Q~
/
Q/Pg
Qz*Pg i
Q2
Q3
Qk
Fig. 1. An optimized implementation
8
Conclusion
We have Mready noted that the system generated by the knowledge based prog r a m and environment of Example 1 admits a simulation to a finite $5~ structure. Thus, the optimized implementation runs in constant space in this case. Of course, the existence of a finite state implementation was already known for this example from the analysis carried out in Example 3. The significance of Theorem 11 is that this result could have been obtained as the output of an aut o m a t e d procedure, which would terminate in all cases admitting a simulation to a finite S5n structure. Potentially, implementation of this procedure could result in the discovery of new examples in which finite state implementations exist. As the undecidability results indicate, strong results enabling the synthesis of finite state implementations of knowledge-based programs whenever these exist are impossible for the class of all finite environments. In the face of this, a number of compromises are possible. We have explored one in the present paper: the identification of optimizations that sometimes generate a finite state implementation. One avenue for further research is to seek stronger methods of optimization - - one possibility would be to a t t e m p t to use the structure of the environment in the reduction process. Another avenue is to focus on restricted classes of environments - - as we have remarked in previous work [Mey94], the structure of the environments used in the lower bound proofs is very unnatural, and there appears to be scope for positive results in certain more natural classes.
151
References J. van Benthem. Correspondence theory. In Handbook of Philosophical Logic II: Extensions of Classical Logic, pages 167-247. D. Reidel, Dordrecht, 1984. [BLMS] R.I. Breffman, J-C. Latombe, Y. Moses, and Y. Shoham. Applications of a logic of knowledge to motion planning under uncertainty. Journal of the A CM. to appear. [BR83] S. Brookes and W. Rounds. Behavioural equivalence relations induced by programming logics. In Proc. ICALP, pages 97-108. Springer LNCS No. 154, 1983. [CheS0] B. Chellas. Modal Logic. Cambridge University Press, Cambridge, 1980. [FHMV95a] R. Fagin, J. Halpern, Y. Moses, and M. Vardi. Knowledge-based programs. In Proc. A CM Symposium on Principles of Distributed Computing, 1995. [FHMV95b] R. Fagin, J. Halpern, Y. Moses, and M . Y . Vardi. Reasoning about Knowledge. MIT Press, Cambridge, MA, 1995. [HM85] M. Hennessy and R. Milner. Algebraic laws for non-determinism and concurrency. Journal of the ACM, 32(1):137-161, 1985. [HM90] d. Halpern and Y. Moses. Knowledge and common knowledge in a distributed environment. Journal of the ACM, 37(3):549-587, 1990. [KSg0] P.C. Kanellakis and S. A. Smolka. CCS expressions, finite state processes and three problems of equivalence. Information and Computation, 86:43 68, 1990. [Mey94] R. van der Meyden. Common knowledge and update in finite environments I. In Proc. of the Conf. on Theoretical Aspects of Reasoning about Knowledge, pages 225-242, 1994. [Mey96a] R. van der Meyden. Finite state implementations of knowledge-based programs. In Proc. Conf. on Foundations of Software Technology and Theoretical Computer Science, December 1996. Hyderabad, lndia. Springer LNCS to appear. [Mey96b] R. van der Meyden. Knowledge-based programs: on the complexity of perfect recall in finite environments. In Proc. of the Conf. on Theoretical Aspects of Rationality and Knowledge, pages 31-50. Morgan Kaufmann, 1996. [Mil90] R. Milner. Operational and algebraic semantics of concurrent processes. In Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics, pages 1201-1242. Elsevier Science Publishers B.V., Amsterdam, 1990. [PT87] R. Paige and R. Tarjan. Three partition refinement algorithms. SIAM Journal of Computing, 16(6):973-989, 1987. [RG92] A.S. Rao and M. Georgeff. An abstract architecture for intelligent agents. In Proc. Int. Joint. Con]. on Artificial Intelligence, pages 439-449, 1992. [RK86] S.J. Rosenschein and L.P. Kaelbling. The synthesis of digital machines with provable epistemic properties. In J.Y. Halpern, editor, Theoretical Aspects of Reasoning about knowledge: Proc. 1986 Con]., pages 83-98. Morgan Kaufman, Los Altos, CA, 1986. [Sho93] Y. Shoham. Agent oriented programming. Artificial Intelligence, 60(1):51-92, 1993. [Ben84]
Semantics of BDI Agents and Their Environment David Morley Australian Artificial Intelligence Institute, Level 6/171 Latrobe Street, Melbourne, 3000 Australia Email: morley~aaii, oz. au A b s t r a c t . This paper describes an approach for reasoning about the interactions of multiple agents in moderately complex environments. The semantics of Belief Desire Intention (BDI) agents has been investigated by many researchers and the gap between theoretical specification and practical design is starting to be bridged. However, the research has concentrated on single-agent semantics rather than multi-agent semantics and has not emphasised the semantics of the environment and its interaction with the agent. This paper describes a class of simple BDI agents and uses a recently introduced logic of actions to provide semantics for these agents independent of the environment in which they may be embedded. The same logic is used to describe the semantics of the environment itself and the interactions between the agent and the environment. As there is no restriction on the number of agents the environment may interact with, the approach can be used to address the semantics of multiple interacting agents.
1
Introduction
There is a great deal of interest in developing computational agents capable of rational behaviour. One approach has been to base the agents on the mental attitudes of Beliefs, Desires (or goals), and Intentions [Bratman, 1987; Rao and Georgeff, 1995]. Much work has been done on the semantics of these BDI agents and in developing implementations of agent-based systems for solving practical problems. The wide gaps between theoretical specification of agents and the implementations is now starting to be bridged. For example, Rao [1996] provides an operational and proof-theoretic semantics for a language AgentSpeak(L), an abstraction of one particular implementation of BDI agents. One of the main benefits of the BDI agent approach is that it is well suited to situations where an agent is embedded in a complex and dynamic real-world environment. Unfortunately, the progress on the semantics of the agents themselves is not matched by progress towards understanding the semantics of the interaction of the agent with its environment. This importance of the agent-environment interaction is illustrated by the special double volume of Artificial Intelligence, on Computational Research on Interaction and Agency [Agre, 1995]. In his introduction, Agre describes the
120
"emerging style of research" of "using principled characterizations of interactions between agents and their environments to guide explanation and design". Agre maps the scope of the research along two axes, environment complexity and number of agents, and within this space identifies three areas that categorise most current research (with papers in the double volume concentrating almost exclusively on the third): complex interactions among several agents where the environment plays almost no part; - emergent behaviour from simple interactions among a multitude of agents in very simple environments; and - single agents in more complex environments.
-
This paper explores a different area, combining the environmental complexity of the first area with the multi-agent capability of the third. What enables this transition to multiple agents is a recently introduced non-modal temporal logic allowing concurrent actions [Morley and Sonenberg, 1996]. The logic provides a monotonic solution to the frame problem that avoids problems inherent in the popular Explanation Closure approach [Schubert, 1994]. By its treatment of concurrent events and actions, the logic allows the single-agent classical planning approach to representing the world to be extended to include the simultaneous actions of multiple agents, and to include the internal operations of the agents themselves. Rather than treating an agent's environment as a finite automaton that provides sensory feedback to the agent, the agent and its workings are treated as an extension of the workings of the environment. We introduce a simplified model of a BDI agent based on the Procedural Reasoning System [Ingrand et al., 1992] (restricted to the propositional case for simplicity). This model deviates from the traditional approach of having primitive actions which enable the agent to act directly on the world, by instead allowing the agent to interact only with its set of beliefs. The interaction between the beliefs and the environment is then achieved through sensors and effectors. The logic of concurrent actions of Morley and Sonenberg [1996] is then applied to the semantics of the agent model, to the semantics of the sensors and effectors, and to the semantics of the environment. Exploiting the composability of the logic, the separate pieces can be integrated, providing the means to analyse the performance of the agent situated in its environment. We can add other agents and sensor/effector processes to model multiple agents interacting through the same environment. For illustration, we show how a simple two agent system can be represented. The focus of the paper is on providing and explaining this analytical tool, rather than on its application to specific problems. Before embarking on the main theme of the paper, we begin with a brief review of the logic of Morley and Sonenberg [1996]. 2
Brief
Overview
of
Action
Logic
Morley and Sonenberg [1996] present a logic of events and actions which al-
121
lows for concurrent actions. The key difference between that work and other approaches to events and actions is in the semantics of events. Rather than a set of behaviours, an event is interpreted as a set of e v e n t instances. An event instance consists of a behaviour, a history of all changes to the state of the world over the duration of the event including changes due to other events, t o g e t h e r w i t h an i n d i c a t i o n o f w h i c h changes are due to the e v e n t in question.
Consider a behaviour where Fred and Mary are both drinking cups of coffee at the same time. We can distinguish two event instances, "Fred drinking coffee" and "Mary drinking coffee", which share this same behaviour. However, the two event instances are distinct, because the first instance includes as effects the changes in the level of coffee in Fred's cup whilst the second instance has the changes to Mary's cup. One advantage of this approach is that it distinguishes different ways of combining events that are not distinguished by other formalisms: c o n c u r r e n t c o m p o s i t i o n - adding the effects of two different events; i n t e r s e c t i o n - taking two descriptions of the s a m e event and producing a
-
more precise description; and c o - o c c u r r e n c e - which leaves open the question of whether the events referred to are the same.
-
2.1
Syntax
To simplify the presentation, the language is untyped. However, in the use of the event language we are commonly dealing with - propositions, p, q, etc., that denote propositional fluents, and are interpreted as sets of worlds in which the propositions hold; - events, e, e', etc., that denote sets of event instances; and - actions, c~,/3, etc., that denote pairs of events: the event of successful execution of the action and the event of unsuccessful execution of the action. Terms of the language, T, are constructed out of function symbols of various arities from F = F0 U F1 U F2 U ..., variable symbols from V , and a selection of special constants and operators. T is defined to be the least set such that: variables, from V C T the universal event constant, e E T arbitrary constants, from F0 C T - arbitrary functions, f ( x l , ..., x~) E T if f E F~, and xl,..., x . E T 5, e N e I , e + e I , e U e ~,ee ~ E T i f e , e ~ E T - events, [p], N , [P• [ / ] E T if p E - events, S ( a ) , F(c~) E T if a E T actions, Z, 4, a?r while(a,/~), a It fl E T if a, fl, 7 E T
-
-
-
-
e
v
e
n
t
s
,
-
Well formed formulas are constructed from terms using predicate symbols of various arities from P = P0 U P1 U P2 U ..., using the special predicates, fluent(x),
122
event(x), action(x), e C_ e', and x = x', and connectives and quantifiers. Although the language is untyped, for 3p.r will be used for Vp.(fiuent(p) --4 r Quantification over events and actions are Va.r and 3a.r
2.2
using the standard first order logical brevity the abbreviations Vp.r and and 3p.(fiuent(p) A r respectively. similarly abbreviated as Ve.r 3e.r
Informal Semantics
T h e special "type" predicates, fluent, event, and action, are true of propositions, events, and actions respectively. The _C event operator is the "more specific than" event relationship. For example, if em is the event of m y depositing some m o n e y into my account and era5 is the event of my depositing 5 dollars into m y account, then em5 C era. The -- operator is the equality relation. and e N e ~ are standard set complement and intersection operators over events. Set intersection is used to further qualify an event. Thus if e5 is the event of someone depositing 5 dollars into my account then em~ =-- em CI es. e, the "universal" event which includes all event instances, is the identity for intersection. Set union can be defined as el U e2 ~ ~'1 1"7~ . If an event instance ends in a state that another starts with, the concatenation of the two event instances consists of concatenating the behaviours and combining the effects. The event ee ~ consists of all possible concatenations of event instances in e with event instances in e ~. The concurrent composition of two event instances is defined for event instances with the same behaviour and disjoint effects. It takes the union of the effects of the two instances. For example, the concurrent composition of the "Mary drinking coffee" event instance with the "Fred drinking coffee" event instance is "Mary and Fred drinking coffee", which causes both cups to empty. The concurrent composition event e + e' contains all possible concurrent compositions of event instances in e and e'. The event em5 9- e5 necessarily increases m y bank balance by 10 dollars. The semantics suggested by Morley and Sonenberg [1996] requires disjoint effects to span different time periods. This leads to an interleaving-style semantics for concurrent composition. However, different semantics, allowing disjoint effects to occur at the same time, are also possible. The co-occurrence event, e IA e ~, contains event instances that are formed from event instances in e and e' with matching behaviours by taking the union of their effects. For example, era5 IAe5 m a y increase m y bank balance by 5 dollars or by 10 dollars, since the events referred to m a y be either distinct or the same. The event e N e' can be seen as the subset of etA e' where the effects of the two instances are identical. The event e + e' can be seen as the subset of etA e ~ where the effects of the two instances are disjoint. Propositions refer to sets of world states and are related to events via a number of operators. The event [p] (not to be confused with a modal operator) contains all event instances where p remains true. T h e event [~] contains all event
123
instances where p remains false. From these we can define [Pt] (and [p,]) where p becomes true (or false). /p• is the set of all event instances where the truth of p is not affected. T h a t is, the truth of p may change, but not as an effect o f the event. As an example, during the "Fred drinking coffee" event instance above, the level of coffee in Mary's cup happens to change, but not as an effect of Fred drinking the coffee. Thus if p refers to Mary's cup being full, the "Fred drinking coffee" event instance is in [p• Conversely, [p-r] is the set of all event instances where if the truth ofp changes, then that change m u s t be an effect of the event. These are the events which cause all observed changes to p. Combining these with [pt] (and [p,]) we get events such as [p~-] = [pt] N [pT] which are those where p is caused to become true, as opposed to incidentally becoming true. Some events lie neither in [p• nor in [pV]. For example, consider two people fighting over what channel to watch on television. The channel may change multiple times. The event corresponding to one person's actions is only responsible for some of those changes, rather than none or all. A c t i o n s represent an agent's attempt to affect the world and may succeed or fail. Giunchiglia et at. [1994] handle the potential failure of actions explicitly using a variant of process logic. In their logic an action is represented by a modal operator and is associated with two sets of behaviours: those in which the action succeeds and those in which it fails. Applying the same basic approach, Morley and Sonenberg [1996] treat an action, a, as nothing more than a pair of events, the event of successful executions of the action, S(a), and the event of failed executions, F ( a ) . Actions can be arbitrarily complex attempts to affect the world and are described by providing axioms to constrain their success and failure events. For example, we would state that all successful event instances of a "load gun" action result in the gun being loaded. We have primitive actions, S and
124
2.3
Combinatorial
Frame Problem
The combinatorial frame problem [Georgeff, 1987] is the problem of stating the multitude of non-effects of an event or action. Usually, one states t h a t an event only affects certain properties and that "others" remain unaffected (possibly using some non-monotonic logic). As pointed out by Lifschitz [1990], we need to restrict these non-effects to some limited set of propositions, a frame. For example, if an event affects p, then we d o n ' t want to deduce that the event does not affect the negation of p. For this reason we introduce terms representing systems of propositions, a, a ~, etc., into the logic, and a relation, incl(a, p), representing system a including proposition p. We can then state that an event, e, affects (at most) propositions Pl and P2 of the frame system a as follows: Vp.(ind(a,p) e C [p• V p = Pl V p = P2) Another problem is that if P3 is also in the frame system a, we can only deduce that e does not affect P3 if we know that Pl 5~ P3. Rather than writing these inequality axioms directly for all pairs of propositions in the frame system, we use the following scheme to build up a frame system from subsystems: We use system(a, pl,...,pn) as an abbreviation to state that a contains (at least) the distinct propositions Pl,.-.,Pn:
Vi.incl(a, pi) AVi 5s j.(Pi r Pj) We use system(a, al, ..., an) as an abbreviation to state that a contains ( a t least) the propositions of the distinct subsystems al,...,an:
Vi, p.(incl(ai,p) --+ incl(a,p)) A Vi 7~ j,p.-,(incl(ai,p) A incl(crj,p)) To provide a more convenient syntax for stating the limits or range of effects of an event, we use range(e, a, pl, ..., Pn) as an abbreviation to state that of the propositions in a, e affects (at most) Pl,...,Pn:
Vp.(incl(cr, p) --+ e C [p• V ~i.p = pi) Sometimes it is more convenient to state that the range of an event includes (at most) the propositions in one or more subsystems of the frame system. For this we use range(e, a, al, ..., an) as an abbreviation for:
Vp.(incl(o',p) --+ e C_ ~• V 3i.incl(o'i,p)). 3
Simple BDI Agent Model
Roughly speaking, the state of a BDI agent can be described by - a set of beliefs about the world; - a set of goals that the agent is currently trying to achieve; - a library of plans describing how to achieve goals and how to react to changes in beliefs; and - an intention structure, describing how the agent is currently achieving its goals and reacting to changes in beliefs.
125
3.1
Beliefs
In this paper, we consider all the agent's interactions with the environment to occur through its set of beliefs. Sensors update the belief set based on the state of the environment. Effectors act on the external world when the agent adopts certain beliefs and report their success or failure by modifying the belief set. An agent's "beliefs" are represented by basic belief terms. An agent's belief set is a subset of the basic belief terms for the agent - those that are currently "believed". Syntactically, these belief terms are quite distinct from the propositional fluents of the action logic. A relationship between a belief term and a proposition describing a property of the environment may be intended by the programmer and/or ascribed by some external reasoning (e.g., [Brafman and Tennenholtz, 1994]), but no such relationship exists a priori. Belief expressions are syntactic constructs formed from belief term symbols and t (which is always believed) using the unary operator, " for negation, and the binary operators, ~ for conjunction and I for disjunction. We use f as an abbreviation for ~t. Belief expressions represent properties of an agent's current belief set - the concurrent holding or not holding of a given set of beliefs. For example, pao represents the belief set containing p and Q, while -P represents the belief set not containing P. Even if the creator of an agent designs the agent with the assumption that the agent's belief set contains P only if some property p of the environment is true, it is not necessarily true that "P, the absence of belief, occurs only if p is false. The creator is free to make this assumption or, for example, to introduce a new belief term Kp corresponding to the belief that the agent knows whether p is true. In this case I~pt~'P would indicate that tile agent believes p to be false. If P is a belief term, then +p and -P are belief change terms representing P becoming believed true and false respectively. 3.2
Goals
Goals represent objectives that an agent is trying to achieve and are expressed by goal terms. A goal term is of the form, !P, where P is a belief term. These goals represent the desire to achieve the truth of the proposition that is meant to correspond to belief P. Goals are not always achieved, and so the action that an agent uses to achieve a goal may either succeed or fail. Generally, it is assumed that the action for achieving !p will succeed if P is already believed true or if a plan to achieve P succeeds and will fail otherwise. 3.3
Plan Body Terms
Plan body terms represent actions to be performed. These may be in response to changes in beliefs, or in order to achieve goals. Plan body terms are of the following form:
126
sucr Representing the action that succeeds immediately. f a i l Representing the action that fails immediately. +P,-P Belief change terms represent the action of causing P to be believed or no longer believed. If no change in belief is possible because P was already believed (or not believed), then the action fails 1. ! P Goal terms represent the action of testing belief in P and if it is not believed, executing plans to try to achieve it. The action succeeds based on the success of the plan or plans it executes. "(P, q) As well as goals to achieve something, we also have goals to wait until something is believed. A w a i t t e r m is of the form ~(P, q) where P and Q are belief expressions. This represents the desire to wait for either P or q to be believed. The action corresponding to "(P, Q) will succeed if P is believed, or becomes believed before Q, and will fail if Q is believed before P is believed. If neither is or becomes believed, then the action will not terminate. ire(T1, T2, T3) Representing conditional execution. This executes plan body term T1 and if that succeeds executes T2 otherwise it executes T3. Success or failure of the action as a whole is the same as that of whichever of T2 or T3 is executed. while(T1, T2) Representing iteration. This executes plan body terms T1 and T2 alternately until one fails. If T1 fails first then execution of the compound t e r m succeeds, or if T2 fails first the execution of the compound t e r m fails. "1~ as an abbreviation for "(P, f), i.e., to wait until P is believed. The corresponding action never fails. ?P as an abbreviation for " ( P , t ) , i.e, to succeed if P is believed and to fail otherwise. The corresponding action always succeeds or fails straight away 2. T1 ; T2 as an abbreviation for ire(T1, T2, f a i l ) . This is just conditional sequential execution of plan body terms. Execute T1 and if that succeeds, T2.
3.4
Plans
P l a n s in an agent's plan library describe how to respond to changes in beliefs
and how to execute goals. Changes in belief arise through sensory input and the execution of plans. Execution of plans leads to belief changes and posting goals. These m a y in turn lead to further plan execution. A plan consists of three parts: - invocation condition - a goal term or belief change t e r m t h a t triggers execution of the plan; - context condition - a belief expression that describes conditions under which the plan is applicable; and - plan body - a plan body term that describes how to act. 1 Note: This differs from the treatment of assert in PRS which always succeeds. 2 Note: This use of ?P is slightly different from that of PRS [Ingrand et al., 1992]. In PRS, ?P may lead to plans being executed to test the value ofp. The equivalent here is to try to achieve Kp and then test the belief ?P.
127
If T1 is a belief change term or a goal term (the invocation condition), P is a belief expression (the context condition), and T2 is a plan body term, then plan(T1, P, W2) is a plan. Plans with a belief change term as the invocation condition are to be executed when the given change in belief occurs. Plans with a goal t e r m as the invocation condition are to be executed when a plan body being executed contains this goal term (and the belief term is not already believed). 3.5
Intentions
The intention structure of an agent consists of a number of intention threads, each containing a current plan body term yet to be executed and a stack of plans that are in the process of being executed. 3.6
Effectors and Sensory Actions
All interactions of an agent with its environment occur through its belief set. The effect of any primitive action, A, can be modelled by introducing three belief terms, dA, SA, and fA- dA represents the agent's need to execute t . sA and fA represent the success or failure of an a t t e m p t to execute t. An agent wanting to immediately execute the primitive action A, asserts dA and then waits until either success or failure is reported, cleans up, and succeeds or fails appropriately: +dA ; ite(-(SA, fA), (-SA ;-dA), (-fA ;-dA ; f a i l ) ) . The effector process repeatedly waits for the belief of dA, executes the appropriate action, and reports success or failure as appropriate by changing the belief set. One possible side effect of the effector may be to update other agent beliefs as well. Such an effector would in effect be implementing a sensory action.
4
Execution
of BDI
Agent
Language
We can represent the execution of a BDI agent as an action in the action logic. This section describes constraints on that action that a valid implementation must satisfy. The belief set of an agent is represented by a system c% of propositions of the form B(P), one for each database belief terms, P. We extend B(P) to belief expressions by letting: VP.B(-P) -- -~B(P) VP, Q.B(P~O) -- B(P) A B(O) VP, Q.B(PIQ) : B(P) V B(O) Note that, as explained in Sect. 3.1, the belief expression -P represents the absence of belief term P in the belief set and hence the lack of belief in the corresponding proposition p, not the belief in the negation of proposition p.
128
Let us assume that for any belief expression, P, we have an action, fl(P) that an agent uses to test its belief in P. We state that fl(P) does not affect any of the propositions in the frame system (rf which contains (rb by system(v,1, ~%) A VP.vange(fl(P), tr/). We require VP.S(fl(P)) C [B(P)] and VP.F(fl(P)) C_ [B(P)]. All plans are executed within an intention thread. Since plan execution can be suspended while other plans in other intention threads are executed, it is possible that there might be a time delay before each plan body term is executed. To take this into account, for each intention thread, i, there is an action, a~i representing this possible delay. If execution of the intention thread is suspended, a~i waits until the intention thread is no longer suspended, c% i does not affect the belief set and never fails, so Vi.range(a~, 07) and Vi.F(aiw) = e ~ Since, the intention is longer suspended at the end of aw,i we require V~.aw~a ~ i" i = C I ' wi . The execution of a plan body term, T, within an intention thread, i, is given by the action arc(T, i) that must satisfy the following constraints:
- arc(+P,i) causes the database belief term P to be added to the belief set. It fails if P is already present. As it only affects B(P) we have
VP.range(arc(+P), ~I, B(P)). We require arc(+P, i) = a~&a where S(a) C [B(P)tr] and F ( a ) C [B(P)]. - arc(-P,i) causes the database belief term P to be deleted from the belief set. It fails if it was not already present. Similar to arc(+P,i), but here S(~) C [B(P)~] and F(~) C [B(P)]. arc(~(P, 0), i) waits until either B(P) is true or B(Q) without itself changing the belief set. It fails if B(P) stays false and B(Q) is or becomes true, otherwise it succeeds when B(P) is or becomes true. The action does not affect the belief set so VP, Q.range(arc('(P, Q)), cr]). We require arc(~(P, Q), i) = c~iw&:c~where S(a) C [B(P)] U ([B(P)t ] N [B(Q~t'P)]) and F ( a ) C [B(Q~'P)] U ([B(q)t ] U [B(P)]). - arc(!P,i) succeeds immediately if P is believed true. If not, the relevant plans, those that have invocation condition !P, are found and the context conditions are checked for applicability. If no plan is applicable, the action fails. If one plan is applicable, the action corresponding to its plan body is executed. If more than one is applicable, then one is selected by some selection function, and the action corresponding to its plan body is executed. Thus if plan(!P, 01, T1), ... plan(!P, On, In) are the relevant plans (all those with invocation condition !P), then arc(!P,i) is ~/w~5(fl(P) .gS[fl(Q1 )?are(T1, i) l...~(Q,)?are(Tn, i)Iqh) - arc(ite(T1, T2, Ts), i) = arc(T1, i)?are(T2, i)lave(T3, i) arc(while(T1, T2), i) = while(are(T1, i), arc(T2, i)) -
-
The event representing the internal operations of the agent spawns concurrently executing intention threads (concurrent actions) when the belief set of the agent changes. Whenever a belief term is added or deleted from the belief set, the set of relevant plans, those whose invocation condition is a belief change term matching the belief set change, is collected. The applicable plans are those
129
whose context conditions are believed true in the (new) belief set. An intention thread is created for each applicable plan 3 and the action corresponding to the plan body is executed in t h a t thread. The success or failure of these actions is ignored. Thus if B(P) becomes true then all relevant plans are collected: plan(+P, Ol, Wl), ... plan(+P, Q~, Tn) and those whose context conditions are true are executed, each in a new intention thread. This corresponds to the action / 3 ( q l ) & a r c ( T l , i l ) I] ." II / 3 ( Q , ) ~ a r c ( T , , i , ) being executed where il, ..., iN are new intention threads.
5
Agent and E n v i r o n m e n t
So far we have only discussed the execution of an agent with respect to its belief set. An agent is meant to operate within an environment, using sensors and effectors to interact with that environment. The execution of the agent within an environment consists of the interplay of concurrently executing events representing: - the working of the environment; the internal operations of the agent; and - the interaction between the environment and the agent mediated by the sensors and effectors. To represent the environment, we introduce a system, c%, to be part of the frame system, ~/, distinct from the belief system of the agent, Crb. Thus
system(~r / , Orb,o%). The sensors and effectors are represented by processes (on-going events) that, based on the state of properties in the environment, cause changes to the agent's beliefs and in the case of effectors, trigger events in the environment. Suppose the author of an agent's plan library intended the agent's belief in P to reflect the property, p of the environment. If the sensor was perfectly reliable and instantaneous, then the proposition, B(P), of an agent believing P would be true whenever p was true. Realistically, there will usually be discrepancies between B(P) and p due to sensor imperfection. Naturally, this will influence the "correctness" of an agent's program. By explicitly reasoning about the interaction between the environment and the agent, we have the potential to quantify the effect that this will have on the agent's behaviour. This discrepancy between B(P) and p directly affects the semantics of the wait plan body term. The wait term actually waits for the belief in a proposition to become true rather than the actual proposition to become true. Thus if we instruct an agent to wait until the telephone rings, it is important that the agent is in a position to know that the telephone rings (and does not, for example, wander away from the telephone). Thus in considering the effectiveness of a plan containing wait terms, we need to consider the ability of the agent to detect the waited-for condition. 3 Different strategies are possible when there is more than one applicable plan, however we have chosen this for its simplicity.
130
Consider the goal !P. If the relevant plans were to implement the intended linkage function correctly, then arc( !P, i) terminates with success if p is true on t e r m i n a t i o n and with failure if p is false on termination.
6
Example
Consider a scenario where two agents interact with an e n v i r o n m e n t containing a gun, a b o m b , and a fuse. Agent one is capable of lighting the fuse and of asking agent two to shoot the bomb. For simplicity, let us assume t h a t time is discrete and t h a t the b o m b explodes if the fuse has been burning for two time periods. Agent two is capable of loading the gun and of shooting the b o m b (causing it to explode). 6.1
Environment
We describe the environment by a subsystem, ~ , of the frame system, c~l, (i.e., s y s t e m ( (rl , ~e)) containing three fluents, s y s t e m ( ge , b, x , / ) : b - the fuse burning; - z - the b o m b has exploded; and - l - the gun is loaded. -
We introduce an event, e en" , corresponding to the causality law stating t h a t the b o m b explodes if the fuse has been burning for two time periods. First let us introduce the n o t a t i o n e ~ e ~ for eee~e. This event contains no event instances where (anything happens and then) e occurs and then e ~ (and then other things) do n o t occur, i.e., e is always followed i m m e d i a t e l y by e ~. Similarly, we use e ~ e ~ for ~'~e~e, i.e., e ~ is always i m m e d i a t e l y preceded by e. Let el be the set of all events t h a t take one time period. We can define e2 = elel to be the set of all events t h a t take two time periods. Let e = [b] N (e2) be the event where the fuse is burning for two consecutive time periods, and let e ~ = [z~-] A el be the event where x is caused to become true over one time period. T h e s t a t e m e n t t h a t e is sufficient for the environment to cause e ~ is translated as e env C e ~ e ~, and the s t a t e m e n t t h a t e is necessary for the e n v i r o n m e n t to cause e ~ is translated as e ~nv C e ~ e ~. Note t h a t this does not prevent the b o m b f r o m exploding if the fuse does not burn for two time periods - it j u s t prevents the e n v i r o n m e n t from causing t h a t explosion. Finally, we need to state t h a t as far as the propositions in the frame are concerned, e env only affects x, i.e., r a n g e ( e env, ~ I , x ) . 6.2
Agent 1 Beliefs, Sensors, and Effectors
A g e n t one has "environment" belief terms, b and x, which are intended to correspond to the propositions b and x of the environment. T h e two actions,
131
-
to ignite the fuse; and - A - to ask agent two to shoot the b o m b I
-
are i m p l e m e n t e d using additional "action" belief terms: -
di si fI dA sA fA -
belief belief belief belief belief belief
in in in in in in
desire to ignite fuse; successful ignition of fuse; failed ignition of fuse; desire to ask agent two to shoot the b o m b ; successful asking agent two to shoot the b o m b ; and failed asking agent two to shoot the bomb.
So we have system(crb, Bb, Bx, B d i , B s i , B f i , B d , , Bs,, BfA). First let us consider the sensor event, e~. Suppose t h a t this sensor accurately u p d a t e s Bx immediately. We can translate this as e~ C_ [x] ~ [Bx] and e, C [x] r [Bx], i.e., x is true (up to and) at a time point if and only if B x is true at (and after) the time point. We also require range(e~, c~/, Bx). T h e sensor event eb could be defined similarly or could introduce a potential delay, so t h a t only if b is true for some time period will B b be guaranteed true. We also require range(eb, c~/, Bb). In a similar manner, we can construct an event, el, to trigger an a t t e m p t to ignite the fuse whenever Bdi becomes true and u p d a t e B s i or B f I accordingly, with range(ei, or/, b, B s i , B f i ) . A simple way of representing the c o m m u n i c a t i o n action, A, is by an event CA t h a t acts directly on agent two's belief set in a similar way. In this case we have range(eA, or/, Ba, BsA, BfA). T h e combined effect of agent one's sensors and effectors is the event ei = e:c Jr- eb + e I + CA.
6.3 Agent 2 Beliefs, Sensors, and Effectors

Agent two has belief terms x, l, and a. The two actions,
- L - to load the gun; and
- S - to attempt to shoot the bomb

are implemented using the "action" belief terms d_L, s_L, f_L, d_S, s_S, and f_S. The system of propositions corresponding to agent two's beliefs is specified by system(σ_b′, B′x, B′l, B′a, B′d_L, B′s_L, B′f_L, B′d_S, B′s_S, B′f_S). The sensor and effector events are specified in the same way as for agent one. Here, the ranges of the events are: range(e′_x, σ_f, B′x), range(e′_l, σ_f, B′l), range(e′_L, σ_f, l, B′s_L, B′f_L), and range(e′_S, σ_f, x, B′s_S, B′f_S). The combined effect of agent two's sensors and effectors is the event e′_i = e′_x + e′_l + e′_L + e′_S.
6.4 Agent 1 Plans
Agent one's behaviour is governed by plans such as:
- !x : t : !b; ^(x, ~b), which translates as: one way to achieve the explosion of the bomb, which is applicable in all circumstances, is to achieve the fuse burning and then to wait either until the bomb explodes, in which case the plan is successful, or until the fuse goes out (for example if someone throws water on the fuse), in which case the plan has failed;
- !x : t : !a; ^x, which translates as: one way to achieve the explosion of the bomb, which is applicable in all circumstances, is to ask agent two to shoot the bomb and wait until the bomb explodes; and
- !b : t : +Bd_I; ite(^(Bs_I, Bf_I), (-Bs_I; -Bd_I), (-Bf_I; -Bd_I; fail)), which translates as: one way to make the fuse burn, which is always applicable, is to assert the desire to ignite the fuse, and wait until the action is believed successful or failed, etc.

The way these plans are interpreted by the plan execution mechanism leads to the agent's internal actions being described by an event e^agent, where we know that range(e^agent, σ_f, σ_b).
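For readers who prefer code to notation, the following Python fragment is a minimal, hypothetical sketch of how such goal : context : body plans might be represented and selected by a plan execution mechanism; the names Plan and select_plan are illustrative and are not part of the formalism in this paper.

# Hypothetical sketch of "goal : context : body" plans, for illustration only.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Plan:
    goal: str                       # e.g. "!x" - achieve the explosion of the bomb
    context: Callable[[set], bool]  # applicability test over the current beliefs
    body: List[str]                 # sequence of steps, e.g. ["!b", "wait(x, ~b)"]

def select_plan(goal: str, beliefs: set, library: List[Plan]) -> Optional[Plan]:
    """Return the first applicable plan for the goal, PRS-style."""
    for plan in library:
        if plan.goal == goal and plan.context(beliefs):
            return plan
    return None

# Agent one's two plans for !x; both are applicable in all circumstances.
library = [
    Plan("!x", lambda b: True, ["!b", "wait(x, ~b)"]),
    Plan("!x", lambda b: True, ["!a", "wait(x)"]),
]
print(select_plan("!x", set(), library).body)   # -> ['!b', 'wait(x, ~b)']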
6.5 Agent 2 Plans
Agent two's plans might include the following:
- +a : ~x : !x, which translates as: whenever a is newly believed, if x is not believed, try to achieve it; and
- !x : t : !l; +B′d_S; ..., which translates as: one way to achieve the explosion of the bomb, which is applicable in all circumstances, is to load the gun and then attempt to shoot the bomb.

If the second plan is a bit unwieldy, we can add another belief term s with a plan for !s attempting to shoot the bomb. In this case we could simplify the !x plan to !x : t : !l; !s. The way these plans are interpreted by the plan execution mechanism leads to agent two's internal actions being described by an event e′^agent, where we know that range(e′^agent, σ_f, σ_b′).

6.6 Putting It All Together
We need to state that the different systems we introduced are in fact distinct subsystems of the frame system, system(σ_f, σ_e, σ_b, σ_b′). The interaction of the agents and the environment is given by the single combined event, e_all = e^env + e_i + e′_i + e^agent + e′^agent. We can be reasonably sure that agent and interface events are the only events to affect the belief systems (in the absence of "hallucinations"). However, we need to explicitly state the assumption that nothing else affects the environment. Together, these amount to stating that an event instance occurs which is of type e_all and of type [p] for all p in the belief and environment systems.

The following are some examples of the types of deductions that can be made, given appropriate fairness conditions on the execution of intention threads:
- If all the sensors are accurate and act immediately, and if there are no other events which affect the environment, then we can deduce that after agent one is given the goal !x, the bomb will explode without agent one having to ask agent two for help.
- We can consider what might happen if we relax the condition that nothing else happens in the environment. If we allow the possibility of the fuse going out, then the bomb will still explode, although agent one may be required to ask agent two. If, as well, we allow the possibility of the gun becoming unloaded, then the bomb might not explode. However, this failure would only be a result of both of these events occurring.
- Alternatively, we can consider what might happen if the sensors are not perfect. Suppose agent one's sensors are sound but incomplete, in that a belief in b necessarily means that b is true, but sometimes b can be true without the sensor noticing it. That is, B(b) → b. In this case the bomb will explode, but agent one may mistakenly believe that it has to ask agent two. If agent one's sensors are just slow to respond, then the first plan for !x may incorrectly fail after successfully achieving !b due to the lack of belief in b. We can adjust the plan for !b to assert the belief b if it is successful.
7 Conclusions
By applying a logic of concurrent events and actions to the analysis of BDI agents, we provide a new approach to the study of the agent-environment interaction. This approach covers a broader area than many existing approaches. Firstly, it provides a rich logic of events for describing complex environments. Secondly, it addresses the problem of multiple interacting agents.
8 Acknowledgments
This work was partly supported by the Cooperative Research Centre for Intelligent Decision Systems, Melbourne, Australia. The author would also like to thank Liz Sonenberg and Michael Georgeff for insightful comments and helpful discussions.
References

[Agre, 1995] Philip E. Agre. Computational research on interaction and agency. Artificial Intelligence, 72(1):1-52, January 1995. Introduction to special double volume of Artificial Intelligence, volumes 72-73.
[Brafman and Tennenholtz, 1994] Ronen I. Brafman and Moshe Tennenholtz. Belief ascription and mental-level modelling. In Jon Doyle, Erik Sandewall, and Pietro Torasso, editors, Principles of Knowledge Representation and Reasoning: Proceedings of the Third International Conference (KR'94), Bonn, Germany, 1994. Morgan Kaufmann Publishers.
[Bratman, 1987] M. E. Bratman. Intentions, Plans, and Practical Reason. Harvard University Press, Cambridge, MA, 1987.
[Georgeff, 1987] M. P. Georgeff. Planning. In Annual Review of Computer Science, volume 2, pages 359-400. Annual Reviews, Inc., Palo Alto, California, 1987.
[Giunchiglia et al., 1994] F. Giunchiglia, L. Spalazzi, and P. Traverso. Planning with failure. In Proceedings of the Second International Conference on AI Planning Systems (AIPS-94), Chicago, 1994.
[Ingrand et al., 1992] F. F. Ingrand, M. P. Georgeff, and A. S. Rao. An architecture for real-time reasoning and system control. IEEE Expert, 7(6), 1992.
[Lifschitz, 1990] V. Lifschitz. Frames in the space of situations. Artificial Intelligence, 46:365-376, 1990.
[Morley and Sonenberg, 1996] D. N. Morley and E. A. Sonenberg. A logic for concurrent events and action failure. In Norman Foo and Randy Goebel, editors, Topics in Artificial Intelligence, Proceedings of the 4th Pacific Rim International Conference on Artificial Intelligence (PRICAI'96), Lecture Notes in Artificial Intelligence, Volume 1114, pages 483-494, Cairns, Australia, August 1996. Springer-Verlag.
[Rao and Georgeff, 1995] A. S. Rao and M. P. Georgeff. BDI agents: From theory to practice. In Proceedings of the International Conference on Multi-Agent Systems (ICMAS-95), San Francisco, USA, 1995.
[Rao, 1996] Anand S. Rao. AgentSpeak(L): BDI agents speak out in a logical computable language. In Walter Van de Velde and John W. Perram, editors, Agents Breaking Away, Lecture Notes in Artificial Intelligence, Volume 1038, pages 42-55. Springer-Verlag, 1996.
[Schubert, 1994] Lenhart K. Schubert. Explanation closure, action closure, and the Sandewall test suite for reasoning about change. Journal of Logic and Computation, 4(5):679-700, 1994.
Social and Individual Commitment (preliminary report)

Lawrence Cavedon†, Anand Rao‡, Gil Tidhar†
†Dept. of Computer Science, R.M.I.T., GPO Box 2476V, Melbourne, Australia, [email protected]
‡Australian Artificial Intelligence Institute, 171 LaTrobe St., Melbourne, Australia, [email protected], [email protected]

Abstract. Agents interacting in team environments make use of a social structure which, Castelfranchi has recently claimed, must be modelled by commitments which are external to any particular agent. Such social commitments are between agents, though usually with respect to some action or state which the committed agent is prepared to perform or achieve. We start with Castelfranchi's concept of social commitment and formalise aspects of it within the BDI agent framework. We use this formalisation to characterise different sorts of social behaviour in the interaction between agents.
1 Introduction
An autonomous agent reacting in a continuously changing environment needs to be committed to its goals and/or plans in order to achieve its long-term objectives and not be driven purely by changes in the external environment. Such commitments by an individual agent have been captured as persistent goals or intentions and are commonly referred to as individual commitments [1, 3, 7]. When a group of agents cooperate with each other as a team, one can argue that the team as a whole should have a commitment towards a joint goal. If this commitment is not present, there is a danger that each agent will try to satisfy its own goals to the detriment of the joint goal of the team. The collective commitment of the team has been called a joint intention or we-intention [5, 6, 9, 10]. Whether joint commitments are definable in terms of the individual commitments and beliefs of the agents or are an irreducible first-class entity is open to debate - we do not take a particular stance one way or the other in this paper. Castelfranchi [2] states that both the individual commitment and the collective commitment are internal commitments (of either an agent or a group of agents, respectively) and claims that there is a need for a commitment that captures the dependence relationship between two agents. Castelfranchi calls this commitment the social commitment. He argues that this notion is not reducible to the internal commitments of the individual agents or the groups of agents.
However, we would expect social commitment to influence and impose some constraints on them. In this paper, we formalise some aspects of social commitment and its relationship with the individual and collective commitments. In particular, we introduce collective attitudes (joint beliefs, joint goals, and joint intentions) as first-class entities, similar to individual attitudes (beliefs, desires, and intentions).¹ Unlike some of the previous work in this area [5], these collective attitudes are not defined in terms of the individual attitudes, but do imply some combination of individual attitudes. Traditionally, mental attitudes have been taken to be internal and modelled as a relation between an agent (or a team) and a proposition; e.g., BEL(A, φ) means agent A believes in the proposition φ, and JBEL(τ, φ) means that the team τ jointly believes in φ. Following Castelfranchi, we introduce the notion of "social" attitudes and model them as ternary dependency relations between two agents (or teams) and a proposition; e.g., SCOM_I(A, B, φ) means A is intention-committed to B, with respect to φ. This may lead to A taking on the intention to achieve φ for, or because of, B. Below, we discuss various conditions that could be imposed on A under which it would actually adopt such an intention.² The notion of social attitude can be used to capture a variety of behaviours with different interactions between their individual, collective, and social attitudes. For example, if an agent A commits honestly and subserviently to a team τ to perform an action of type α, then A will perform³ α whenever τ requires it - in fact, A would drop its own individual intentions if they conflicted with τ's requirements. On the other hand, a more selfish agent might commit to performing α only if doing so doesn't interfere with its own planned courses of action. Using the concepts defined below, we will show how a team's commitment to another team can be characterised in terms of its behaviour. The results described here are formulated with respect to a Belief-Desire-Intention (BDI) formal framework for agent specification [3, 7]. We do not focus on the way an individual agent's intentions constrain its commitments to performing certain actions - this is described elsewhere [7].⁴ Rather, we are concerned solely with the commitment between agents or teams and other agents or teams. Below, we focus on the syntactic axiomatisation of commitment in the BDI framework. (We do sometimes sketch the corresponding semantic conditions.)
¹ As stated above, this stance is taken for convenience, not because we feel particularly strongly towards it.
² While we have defined social versions of all three attitudes, we focus here on intention.
³ For convenience, we will sometimes abuse the distinction between action and action-type in the informal discussion.
⁴ The relationship between intention and commitment to action offers another dimension for the characterisation of agents, based on their behaviour.
2 Internal Mental Attitudes
Joint activity involves actions performed by teams, which in turn involves coordinated action by individual agents. A team of agents is effectively a collection of individuals, or of subteams. More formally:⁵

Definition 1. Given some set 𝒜 of constant symbols denoting individual agents, a team of agents is defined as follows:
- an individual agent constant a ∈ 𝒜 is a team;
- a team variable v is a team;
- if τ₁, ..., τₙ are teams, then {τ₁, ..., τₙ} is a team and τ₁, ..., τₙ are its subteams.

We define 𝒜* to be the set of all teams that can be constructed from the set of agents 𝒜. A team may consist of an individual agent - hence, each agent is a team. Further, all (relevant) attributes of individuals are also attributable to teams. In particular, we assume that teams have beliefs, desires, intentions, and plans which are not necessarily reducible to the beliefs, desires, intentions, and plans of their subteams.

Definition 2. For each team τ and formula φ, our language includes the modal formulas JBEL(τ, φ), JDES(τ, φ), and JINT(τ, φ).

When τ is a team and φ a formula, JBEL(τ, φ) can effectively be seen as a joint belief, JDES(τ, φ) as a joint desire and JINT(τ, φ) as a joint intention of that team. We do not require these notions to be reduced to the beliefs, desires and intentions of the individual subteams of τ, although we do permit it - this could be achieved by imposing certain axiomatic requirements on the joint attitudes. For much of the rest of this paper, we will focus on intention. The semantics for the above logic is given with respect to a possible-worlds model. A structure M is a tuple M = ⟨W, {S_w}, {R_w}, MB, JD, JI, Φ, L⟩, where: W is the set of worlds; w ranges over W; S_w is a set of time points in world w; R_w ⊆ S_w × S_w is a total binary temporal accessibility relation; MB is a mutual-belief accessibility relation that maps each team τ ∈ 𝒜* at world w and time t to a set of worlds; similarly, JD and JI are joint-desire and joint-intention accessibility relations; Φ is the set of primitive propositions; and L is a truth assignment function that assigns to each time point in a world the set of propositions true at that time point. The semantics of propositional and temporal formulas are as given elsewhere [8] (the main definitions are presented in the Appendix). Joint intentions are defined in a straightforward manner, similar to any normal modal operator, i.e., a team τ jointly intends φ if φ is true in all joint-intention-accessible worlds of the team τ. If this team happens to be a singleton, i.e., a single agent, then the definition reduces to the well-known definition of individual intentions. More formally:

- M, w_t ⊨ JINT(τ, φ) iff for all w′ such that JI(τ, w, t, w′), we have M, w′_t ⊨ φ.

⁵ This definition of a team is based on that of Kinny et al. [5].
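Although the formalism is purely logical, the nested team structure of Definition 1 can be pictured concretely; the following Python fragment is a minimal illustrative sketch (the names Team and subteams are ours and are not part of the logic).

# Illustrative sketch of Definition 1: a team is an agent constant or a set of teams.
from typing import FrozenSet, Union

Team = Union[str, FrozenSet["Team"]]   # an agent constant, or a set of subteams

def subteams(team):
    """Immediate subteams of a team; an individual agent has none."""
    return list(team) if isinstance(team, frozenset) else []

a, b, c = "a", "b", "c"        # individual agents (each is itself a team)
squad = frozenset({a, b})      # the team {a, b}
org = frozenset({squad, c})    # the team {{a, b}, c}
print(subteams(org))           # -> [frozenset({'a', 'b'}), 'c'] (order may vary)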
Joint desires and beliefs are defined in a similar manner. Even without requiring joint attitudes of teams to be reducible to the attitudes of the subteams, it is still useful to define conditions which constrain the behaviour of subteams based on the behaviour of the team. Some such possible conditions include the following:
- A joint intention by a single agent is the same as an internal, individual intention: JINT(a, φ) ≡ INT(a, φ) (Semantic Condition: I(a) = JI(a)).
- A joint intention of a team entails a joint intention in all its subteams: JINT(τ, φ) ⊃ JINT(τᵢ, φ), where τ₁, ..., τᵢ, ..., τₙ are subteams of τ.

The first of these conditions simply ensures that, when a single agent is involved, the notion of JINT reduces to the usual notion of intention. The second condition ensures that team intentions are propagated downwards to the team's subteams. It is debatable as to whether one wants to impose such a requirement on team-level intentions: while Levesque et al. [6] require subteams to hold the team's goals and intentions, an alternative view is to "distribute" a team's goals and intentions to subteams based on each subteam's skills and capabilities (e.g., Kinny et al. [5]).
3 Social Attitudes
As pointed out by Castelfranchi [2], logics of BDI - such as the one described in the previous section - focus on the relationship between an agent's internal mental attitudes and the state of the world, and fail to take into account the relationships between agents. In order to address this, Castelfranchi introduces the notion of social commitment - one agent's commitment to another agent to perform a certain action. Castelfranchi discusses two versions of social commitment: the first is a commitment from τ to μ to perform the action a (on a particular occasion); the second, which he calls generic commitment, is a commitment from τ to μ to perform an action of type A whenever μ requires it. We follow Castelfranchi in defining modalities corresponding to social attitudes - basically, relations between two teams, with respect to a proposition. Our social attitudes are related to Castelfranchi's generic commitment - they reflect commitments to perform actions of a given type (or, in our case, to achieve certain conditions in the world) whenever they are required. In particular, the commitment persists over time rather than simply pertaining to a given action at a given point in time. We find the generic commitment to be a more useful concept and believe it is sufficient for modelling the particular interactions we are interested in.
3.1 Persisting Social Relationships
Social attitudes can be used to define a network of inter-team relationships. Such a network is referred to as an organization. The problem of modelling organizations has been one of the objectives of Organization and Management Theory. The transaction cost analysis developed by Williamson [11] suggests that market-like organizations are based on short-term agreements that lead to specific, well-defined transactions between the two sides of the agreement. It further suggests that uncertainty about the outcome of transactions and a need to reduce the overhead inherent in the establishment of agreements leads to the development of complex organizational structures. In such organizational structures, specific short-term agreements are replaced by generic long-term agreements (e.g., employment agreements). The difference between "on the spot" agreements and persistent agreements parallels the difference between Castelfranchi's social commitment and generic commitment. As discussed above, we model here the persistent, generic commitment. We thus introduce the notion of a social commitment between two teams, τ and μ, with respect to a formula, φ. We denote this by SCOM_I(τ, μ, φ);⁶ this formula states that τ has a social intention-commitment to μ with respect to φ. Similar operators can be defined for belief-commitment and desire-commitment.⁷ We thus add modal operators to our language as follows.
Definition 3. For teams τ and μ, and formula φ, our language includes the modal formulas SCOM_B(τ, μ, φ), SCOM_D(τ, μ, φ), and SCOM_I(τ, μ, φ).

The semantics for the logic is extended by adding the extra (ternary) accessibility relations: SB is a social-belief accessibility relation that maps two teams τ, μ ∈ 𝒜* at world w and time t to a set of worlds; similarly, SD and SI are social-desire and social-intention accessibility relations, respectively. As discussed earlier, a social intention is a commitment between two teams. Hence, team τ has a social intention towards φ because of team μ iff φ is true in all worlds that are accessible by τ and μ due to the social intention relation SI. More formally:

- M, w_t ⊨ SCOM_I(τ, μ, φ) iff for all w′ ∈ W such that SI(τ, μ, w, t, w′), we have M, w′_t ⊨ φ.

Social desires and beliefs are defined in a similar manner (although we are mostly concerned with intention). The intended interpretation of the formula SCOM_I(τ, μ, φ) is that τ is committed to μ with respect to φ; by this we mean that τ will attempt to achieve φ "for μ's sake". Just how often τ will do this for μ - e.g., always, often, or sometimes - is determined by τ's level of commitment to μ (with respect to φ). This is further discussed below. The social versions of belief and desire reflect similar notions of the attitude τ may hold towards μ. For example, SCOM_B(τ, μ, φ) is intended to mean that τ will believe φ if μ believes it. How often or how readily τ would adopt such a belief basically reflects τ's trust in μ.

⁶ Note that here we use the reverse order of agents to Castelfranchi: he uses SCOMMIT(τ, μ, a) to mean that μ commits to τ, whereas we use SCOM_I(τ, μ, φ) to mean that τ commits to μ. Our reason for this choice is that our order reflects the natural order in which τ and μ are used in the English description.
⁷ For the purposes of this paper, we focus on social intention-commitment only.
3.2 Connecting Social and Internal Attitudes
The strength of the relationship between τ and μ is affected by the way in which τ intends to perform the actions that it has adopted because of some other team, as compared to its own (internal) commitments. Also, the occasions on which μ requires the commitment from τ will vary. In particular, this requirement will generally be determined by the team-level plans available to μ - μ may require other agents or teams to achieve various subgoals of plans it has adopted, depending on the skills of those other agents and teams [5]. For simplicity, we focus on the case where committed agents take on the same intentions as those of the team/agent to which they have committed. This is axiomatised below. The more general case - where the committed agent may perform a subtask of the team's overall plan - is briefly discussed later. A potential constraint that we may want to impose on the logic of commitment is the following: a social intention by τ to μ with respect to φ entails an internal joint intention in τ towards φ:

- SCOM_I(τ, μ, φ) ⊃ JINT(τ, φ) (Semantic Condition: JI(τ) ⊆ SI(τ, μ)).

In fact, this condition is more appropriate if SCOM_I was in line with Castelfranchi's notion of social commitment - i.e., if SCOM_I(τ, μ, φ) meant that τ was committed to achieving φ for μ at that moment. Since SCOM_I corresponds to persistent commitment, we need to qualify the condition under which τ adopts φ as an intention. The following axioms represent the sort of interactions that may lead to τ committing to φ:

- SCOM_I(τ, μ, φ) ∧ JINT(μ, φ) ⊃ JINT(μ, JBEL(τ, JINT(μ, φ)))
- SCOM_I(τ, μ, φ) ∧ JBEL(τ, JINT(μ, φ)) ⊃ JINT(τ, φ)

The first axiom states that, whenever μ intends φ, it will try to make τ aware of this commitment; in turn, once τ is aware of μ's intentions, then τ takes on this intention itself. We could make τ even more altruistic by having it take on any of μ's desires, even when μ itself hasn't committed to them. This is modelled by the following:

- SCOM_I(τ, μ, φ) ∧ JBEL(τ, JDES(μ, φ)) ⊃ JINT(τ, φ)

There are, of course, other possible variations - for example, we could drop the requirement that μ intends that τ believes that it has intended/desired φ, or that τ in fact believes this (this would be tantamount to assuming perfect communication of μ's internal states to τ). Note again, the above axioms are appropriate in different circumstances, depending on the mode of organisation and coordination of the particular team involved. Finally, we stress again that a team may consist of a single individual agent. As such, we will usually restrict attention to behaviour of, and commitment between, teams. However, it should be remembered that an individual can play the role of a team in any such situation - in such a case, the joint attitudes simply reduce to the standard individual attitudes. Similarly, any mention of an agent can be replaced by the mention of a team.
4 Modelling the Level of Social Commitment
In the previous section, a social commitment resulted in the committed agent τ taking on μ's intentions whenever required (or at least whenever τ came to realise the requirement). In this section, we further explore the sorts of requirements that one may expect from a social commitment. The nature of the commitment, and how willing an agent is to fulfil it, depends not only on circumstances but also on the social characteristics of the committed agent. For example, a "subservient" agent is more likely to take on an intention for some other agent than a "selfish" agent is. Using the notion of SCOM_I, we axiomatise agents by their social characteristics, depending on how well they fulfil a social contract. Below, we define a number of axioms formalising the various requirements of a social commitment. We provide only syntactic characterisations here - further work is required so as to semantically characterise these relationships in a perspicuous manner. We require some of the logical machinery of Rao and Georgeff's BDI logic [8]. These concepts mainly relate to the use of time-trees in the usual CTL* temporal logic which underlies our BDI logic.

- Aφ holds if φ is true in all possible future time-paths; Eφ holds if φ is true in some possible future time-path;
- Gφ holds if φ is true at all time-points along some specified time-path; Fφ holds if φ is true at some time-point along some specified time-path.

The temporal modalities are generally used as follows: intentions are held towards formulas of the form AFφ - the agent would intend that (eventually) φ inevitably become true; to believe in the possibility of achieving φ, the agent is required to believe in EFφ - φ could become true (in some possible future). Further details of the formal definitions are presented in the Appendix.
Subservient agents and teams

The condition required of SCOM_I in the previous section corresponds to a team τ that will always fulfil its commitment to μ. In this case, we call τ a subservient team. The only additional requirement is that τ should believe that the action required of it is achievable.
- SCOM_I(τ, μ, φ) ∧ JBEL(τ, JINT(μ, AFφ)) ∧ JBEL(τ, EFφ) ⊃ JINT(τ, AFφ)
As discussed earlier, the antecedent of the above can be weakened by dropping the first JBEL requirement (i.e., assume perfect communication) or by having τ fulfil μ's desires (not just its intentions). We could similarly weaken the conditions of all the characterisations below.

Eagerly helpful agents and teams

A subservient agent or team performs the intentions of its master. Some agents and teams are even more eager to please, taking on the very desires of the master, even before the latter has adopted them as its own intentions. We call such a team τ eagerly helpful: if μ (a team to which τ is committed) desires to have a task performed, and it is possible for τ to do it, then it adopts a joint intention towards doing it. More formally, we have
- SCOM_I(τ, μ, φ) ∧ JBEL(τ, JDES(μ, AFφ)) ∧ JBEL(τ, EFφ) ⊃ JINT(τ, AFφ)
An alternative characterisation of an eagerly helpful agent is an agent that takes on any goals that do not conflict with its own. This would require a means of representing the fact that goals conflict.

Grudgingly helpful agents and teams

A less helpful agent or team is one that will take on its master's intentions, but only if this does not conflict with its own desires. A team τ is grudgingly helpful if, when τ is required or requested to perform a task, it only performs that task if it does not desire the opposite. Note that τ need not intend the opposite for it to ignore the request.
- SCOM_I(τ, μ, φ) ∧ JBEL(τ, JINT(μ, AFφ)) ∧ JBEL(τ, EFφ) ∧ ¬JDES(τ, AF¬φ) ⊃ JINT(τ, AFφ)
Selfish agents and teams

A selfish agent or team goes even further - it will only ever perform its own desires. In particular, τ will not take on an intention for another agent unless τ itself desires that condition to be brought about.
- SCOM_I(τ, μ, φ) ∧ JBEL(τ, JINT(μ, AFφ)) ∧ JBEL(τ, EFφ) ∧ JDES(τ, AFφ) ⊃ JINT(τ, AFφ)
The main difference between a selfish agent and a subservient agent is that the former requires the agent to have a desire in performing the task, while the latter may be agnostic with respect to having a desire for the task.
Vindictive agents and teams

A vindictive agent or team is a particularly nasty agent - a team τ is vindictive if it refuses to adopt a team goal even if this is one of its own goals.
- SCOM_I(τ, μ, φ) ∧ JBEL(τ, JDES(μ, AFφ)) ⊃ ¬JINT(τ, AFφ)
An even stronger notion of vindictiveness would be for τ to actually go out of its way to prevent the required goal from being achieved - i.e., τ would attempt to achieve the opposite!
- SCOM_I(τ, μ, φ) ∧ JBEL(τ, JDES(μ, AFφ)) ∧ JBEL(τ, EF¬φ) ⊃ JINT(τ, AF¬φ)
Other characterisations

Other (perhaps more pertinent) characterisations of social commitment would be attainable if the underlying BDI framework was extended so as to incorporate intention revision (e.g., Wobcke [12]). For example, a truly subservient team could be made to drop one of its own intentions when this conflicted with an intention of a team to which it was committed. Another avenue for exploration is to consider the order in which actions are performed or intentions are achieved. For example, a selfish team may still perform the goals of the team to which it is committed, but only after it had fulfilled its own intentions first.
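To make the intent of these characterisations concrete, the following Python sketch (not part of the formal system; all names are hypothetical) encodes the adoption rule of each agent type as a simple predicate: given a social commitment from τ to μ on φ, does τ adopt the intention AFφ?

# Illustrative sketch of the characterisations above; each flag abstracts one
# conjunct of the corresponding axiom's antecedent.
def adopts_intention(kind,
                     believes_mu_intends,   # JBEL(tau, JINT(mu, AF phi))
                     believes_mu_desires,   # JBEL(tau, JDES(mu, AF phi))
                     believes_achievable,   # JBEL(tau, EF phi)
                     desires_phi,           # JDES(tau, AF phi)
                     desires_not_phi):      # JDES(tau, AF ~phi)
    if kind == "subservient":
        return believes_mu_intends and believes_achievable
    if kind == "eagerly helpful":
        return believes_mu_desires and believes_achievable
    if kind == "grudgingly helpful":
        return believes_mu_intends and believes_achievable and not desires_not_phi
    if kind == "selfish":
        return believes_mu_intends and believes_achievable and desires_phi
    if kind == "vindictive":
        return False   # refuses; the stronger variant would instead adopt AF ~phi
    raise ValueError(kind)

# A selfish team only adopts the intention when it independently desires phi:
print(adopts_intention("selfish", True, False, True,
                       desires_phi=False, desires_not_phi=False))   # -> False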
5 Discussion and Further Work
This paper describes preliminary investigations into Castelfranchi's [2] notion of social commitment and how it can be formalised within a BDI logic. In particular, our main interest was to try to model the way in which a generic social commitment - i.e., a commitment, persistent over time, from one team to another to perform actions of a certain type - may influence the committed team's behaviour. This led us to define a number of conditions, of varying strength, that one may expect of socially committed teams. These conditions were of the form of implications, rather than equivalences, since Castelfranchi clearly believes that social commitment involves issues - such as "power" and "control" - that we have not explicitly addressed at all here. However, we would hope that further investigations may lead to conditions which adequately define the notion of commitment between agents. One important aspect of social commitment that we have not addressed is the issue of how and when such commitments arise. It would seem that social commitments are often put in place after explicit negotiation and agreement between agents; alternatively, one agent (particularly a subservient one) may commit to another agent without even being asked. An interesting scenario involves the case where such commitments are persistent for some time but essentially dynamic: for example, an agent may join a team, involving a commitment to perform some task for that team (whenever required), but later relinquishing its responsibilities by dropping out of the team structure. Future work involves investigating such flexible social commitments, whereby teams are dynamically formed - and commitments made - according to the particular "skills" possessed by the potential team participants [5].
Appendix

In this appendix, we present further details of the syntax and semantics of Rao and Georgeff's [8] BDI logic. The presentation below is based on that in [8].
Syntax

The BDI logics CTL_BDI and CTL*_BDI [8] are propositional modal logics based on the branching temporal logics CTL and CTL* [4], respectively. The primitives of these languages include a non-empty set Φ of primitive propositions; propositional connectives ∨ and ¬; modal operators BEL (agent believes), DES (agent desires), and INT (agent intends); and temporal operators X (next), U (until), F (sometime in the future, or eventually) and E (some path in the future, or optionally). Other connectives and operators such as ∧, ⊃, ≡, G (all times in the future, or always) and A (all paths in the future, or inevitably) can be defined in terms of the above primitives. There are two types of well-formed formulas in these languages: state formulas (which are true in a particular world at a particular time point) and path formulas (which are true in a particular world along a certain path). State formulas are defined in the standard way as propositional formulas, modal formulas, and their conjunctions and negations. The objects of E and A are path formulas. Path formulas of CTL*_BDI can be any arbitrary combination of linear-time temporal formulas, containing negation, disjunction, and the linear-time operators X and U. Path formulas of CTL_BDI are restricted to be primitive linear-time temporal formulas, with no negations or disjunctions and no nesting of linear-time temporal operators. For example, AF(p ∨ q) is a state formula and GFp is a path formula of CTL*_BDI but not of CTL_BDI.
Possible-World Semantics
A structure M is a tuple M = ⟨W, {S_w}, {R_w}, B, D, I, L⟩, where W is the set of worlds; S_w is a set of time points in world w; R_w ⊆ S_w × S_w is a total binary temporal accessibility relation; and L is a truth assignment function that assigns to each atomic formula the set of world-time pairs at which it holds. Finally, B is a belief accessibility relation that maps a time point in a world to a set of worlds that are belief accessible to it; D and I are desire and intention accessibility relations, respectively, defined in the same way as B. A fullpath (w_t0, w_t1, ...) in w is an infinite sequence of time points such that (w_ti, w_ti+1) ∈ R_w for all i. Satisfaction of a state formula φ is defined with respect to a structure M, a world w and a time point t, denoted by M, w_t ⊨ φ.
Satisfaction of a path formula φ is defined with respect to a structure M, a world w, and a fullpath (w_t0, w_t1, ...) in world w.
- M, w_t ⊨ φ iff (w, t) ∈ L(φ), for φ an atomic formula;
- M, w_t ⊨ ¬φ iff M, w_t ⊭ φ;
- M, w_t ⊨ φ ∨ ψ iff M, w_t ⊨ φ or M, w_t ⊨ ψ;
- M, (w_t0, w_t1, ...) ⊨ φ iff M, w_t0 ⊨ φ, for φ a state formula;
- M, (w_t0, w_t1, ...) ⊨ X(φ) iff M, (w_t1, ...) ⊨ φ;
- M, (w_t0, w_t1, ...) ⊨ F(φ) iff there exists w_tk ∈ {w_t0, w_t1, ...} such that M, (w_tk, w_tk+1, ...) ⊨ φ;
- M, w_t0 ⊨ A(φ) iff M, (w_t0, w_t1, ...) ⊨ φ for all fullpaths (w_t0, w_t1, ...);
- M, (w_t0, w_t1, ...) ⊨ φ U ψ iff for some i ≥ 0, M, w_ti ⊨ ψ and for all 0 ≤ j < i, M, w_tj ⊨ φ;
- M, w_t ⊨ BEL(φ) iff M, w′_t ⊨ φ for all w′ satisfying (w, t, w′) ∈ B;
- M, w_t ⊨ DES(φ) iff M, w′_t ⊨ φ for all w′ satisfying (w, t, w′) ∈ D;
- M, w_t ⊨ INT(φ) iff M, w′_t ⊨ φ for all w′ satisfying (w, t, w′) ∈ I.

(We define E(φ) ≡ ¬A(¬φ) and G(φ) ≡ ¬F(¬φ).)
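As a rough illustration only (not the authors' implementation), the modal clauses above can be evaluated over an explicitly represented finite structure; in the hypothetical Python sketch below, B is a set of (w, t, w′) triples and L maps each atom to the world-time pairs at which it holds. The DES and INT clauses would be handled identically with D and I in place of B.

# Minimal, hypothetical evaluator for the BEL clause over an explicit structure.
def holds(atom, world, t, L):
    """M, w_t |= p  iff  (w, t) is in L(p), for an atomic formula p."""
    return (world, t) in L.get(atom, set())

def bel(atom, world, t, B, L):
    """M, w_t |= BEL(p)  iff  p holds at w'_t for every w' with (w, t, w') in B."""
    accessible = [w2 for (w1, t1, w2) in B if w1 == world and t1 == t]
    return all(holds(atom, w2, t, L) for w2 in accessible)

L = {"p": {("w1", 0), ("w2", 0)}}          # p is true at w1 and w2 at time 0
B = {("w0", 0, "w1"), ("w0", 0, "w2")}     # from w0 at time 0 the agent considers w1, w2 possible
print(bel("p", "w0", 0, B, L))             # True: p holds in all belief-accessible worlds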
References
1. M. E. Bratman. Intentions, Plans, and Practical Reason. Harvard University Press, Cambridge, MA, 1987.
2. C. Castelfranchi. Commitments: from individual intentions to groups and organizations. In Proceedings of the International Conference on Multi-Agent Systems, pages 41-48, 1995.
3. P. R. Cohen and H. J. Levesque. Intention is choice with commitment. Artificial Intelligence, 42(3), 1990.
4. E. A. Emerson. Temporal and modal logic. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science: Volume B, Formal Models and Semantics, pages 995-1072. Elsevier Science Publishers and MIT Press, Amsterdam and Cambridge, MA, 1990.
5. D. Kinny, M. Ljungberg, A. Rao, E. Sonenberg, G. Tidhar, and E. Werner. Planned team activity. In C. Castelfranchi and E. Werner, editors, Artificial Social Systems, volume 830 of Lecture Notes in Computer Science, pages 227-256. Springer-Verlag, 1994.
6. H. J. Levesque, P. R. Cohen, and J. H. T. Nunes. On acting together. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), pages 94-99, 1990.
7. A. S. Rao and M. P. Georgeff. Modeling rational agents within a BDI-architecture. In J. Allen, R. Fikes, and E. Sandewall, editors, Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning. Morgan Kaufmann Publishers, San Mateo, CA, 1991.
8. A. S. Rao and M. P. Georgeff. A model-theoretic approach to the verification of situated reasoning systems. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93), Chambéry, France, 1993.
9. J. R. Searle. Collective intentions and actions. In P. R. Cohen, J. Morgan, and M. E. Pollack, editors, Intentions in Communication. MIT Press, Cambridge, MA, 1990.
10. R. Tuomela and K. Miller. We-intentions. Philosophical Studies, 53:367-389, 1988.
11. O. E. Williamson. The economics of antitrust: Transaction cost considerations. In Economic Organization, chapter 11, pages 197-249. Wheatsheaf Books, 1986.
12. W. Wobcke. Plans and the revision of intentions. In First Australian Workshop on DAI, LNAI 1087, pages 100-114, 1995.
Emergent Properties of Teams of Agents in the Tileworld

Malcolm Clark, Kevin Irwig, Wayne Wobcke
Basser Department of Computer Science, University of Sydney, Sydney NSW 2006, Australia
Abstract. An emergent property of a system is a property of the system that is not possessed by any of its components. In this paper, we examine some emergent properties of teams of agents in the Tileworld which arise from agents communicating their intentions to other agents on the same team. As expected, teams of communicating agents outperform noncommunicating agents, and teams with two-way communication abilities outperform teams where a transaction consists of a single message only. The surprising result was that, up to a certain limit, as the size of the team increased, the individual performance of the team members actually increased, even though there were more agents in competition for the resource. This is because the increased rate of replenishment of resources due to other agents' consumption more than compensates an agent for the negative effects of competition. This property depends on the effect of communication in causing the agents to spread out and avoid interfering with one another. We give a partial explanation for this phenomenon. We also investigate the effectiveness of communication between agents, and show that the utility of communication varies logarithmically with the range of communication.
1. Introduction

An emergent property of a system is a property of the system that is not possessed by any of its components. The complex emergent property arises from, and is caused by, the properties of the component parts and the interactions of those components. While the emergent properties of neural networks and systems having very simple components are much studied, e.g. the distributed representations of Hinton, McClelland and Rumelhart (1986) and the insect robots of Brooks (1991), our interest lies in the emergent properties of collections of more complex agents based on a Belief-Desire-Intention (BDI) architecture, especially when such agents can communicate with one another. This sense of emergence is to be contrasted with one defined as a property that a population of individuals develops over a period of time,
either through each individual's learning or through the reproduction and eventual dominance of a subpopulation whose members possess the property, which presumably confers some survival advantage. Establishing a convention through individual agents' learning and communication has been studied by Shoham and Tennenholtz (1992) and Walker and Wooldridge (1995). The convergence of a population through reproduction is the subject of research into genetic and other evolutionary algorithms, e.g. Holland (1975). More recent work on the emergence of properties in Artificial Life scenarios may be found in Werner and Dyer (1991) and Ono, Ohira and Rahmani (1995). In this paper, we investigate the emergent properties of simple agents in a variation of the Tileworld, a simple domain in which agents compete for a "food" resource. In essence, the Tileworld we use is a 2-dimensional grid on which an agent scores points by moving to targets, known as holes. When the agent reaches a hole, the hole is filled, and disappears. We conducted a number of experiments in which the agents formed fixed teams, with varying styles of communication between team members. At the simplest level, agents communicate to other team members the hole for which they are headed, with no cost associated with communication. We expected, of course, that teams whose members communicated their intentions in this way would out-perform teams with no communication, since both sorts of agent must make the same decisions, with the communicating agents having more information on which to base their choice of action. We also expected that as team size (and so the number of agents) increased, the overall team performance would increase, but the individual performance of team members would decrease as team members began to compete with one another. That is, when each communicating agent knew the intentions of some of the other agents, we expected these agents would be able to avoid interfering with one another, so that although individual performance would decrease as team size increased, it would decrease more markedly for noncommunicating agents than for communicating agents. What was unexpected was that, up to a certain limit, the individual performance of the team members actually increased as the size of the teams increased: just being in a team provides the agent with an advantage. This is an emergent effect of the increased replenishment rate brought about by the increased consumption of resources, and is also due to the effects of communication in causing the agents to spread out and avoid interfering with one another. Part of the explanation of this result is that the number of holes is maintained at a constant level, making it possible for individuals to improve their performance as the number of agents increases. Another interesting property of the system concerns the utility of communication to the agents within a team, especially the effect of the range of communication on this utility. Here it was to be expected that as the range of communication increased, the utility of communication would also increase (as an agent would communicate more often to other agents), but also that the usefulness of the information sent would decrease (information about agents close to a given agent is much more likely to be relevant to an agent's decision than information about agents further away). What was found was that utility of communication varied logarithmically with the
range of communication, and this can, in turn, be explained by the 2-dimensional nature of the Tileworld grid. Our experiments were conducted in a version of the Tileworld based on that described in Kinny and Georgeff (1991). In part to validate our simulation, and in part to replicate Kinny and Georgeff's results, we conducted some experiments concerning the effectiveness of different commitment strategies of agents to their plans. Our experiments did, in fact, confirm these results: with a slow degree of change in the world, continued commitment to intentions is advantageous; however, as the degree of change in the world increases, agents that are able to react opportunistically have an advantage.

The multiagent Tileworld system and our experimental methodology are also related to those of Goldman and Rosenschein (1994), who investigate the usefulness of various "cooperation strategies" for noncompetitive agents. However, the domain used in their experiments is a static one, in which the only cost to an agent of acting cooperatively is the delay incurred in achieving its current goal. Agents do not communicate with other agents, but rather change the world in ways which are considered beneficial (to the body of agents) according to a standard heuristic measure of the utility of world states. Coordination emerges from the individual adoption of helpful strategies by the agents in the world.

The results presented here are, of necessity, partly determined by features specific to the Tileworld. Although the Tileworld is a very simple domain, it does bear some similarity to more realistic application settings such as stock order fulfilment in a warehouse and delivery of parcels in a transportation network. Our results suggest the possibility of similar emergent properties occurring in these more complex domains.
2. The Simulation

Each individual agent in the simulation is based on the following simple BDI-interpreter, defined by Rao and Georgeff (1992).
BDI-interpreter
    initialize-state();
    do
        options := option-generator(event-queue, B, G, I);
        selected-options := deliberate(options, B, G, I);
        update-intentions(selected-options, I);
        execute(I);
        get-new-external-events();
        drop-successful-attitudes(B, G, I);
        drop-impossible-attitudes(B, G, I)
    until quit
A rational agent using this interpreter is reminiscent of a discrete event simulation system. At each time cycle, each agent has a number of beliefs (B), goals (G) and intentions (I), and there are also a number of actions the agent can perform, represented in an event queue. Some of these actions are selected for possible execution, then these possibilities are further reduced by "deliberation". After execution, new facts are observed and the system state updated. Successfully fulfilled intentions and those now considered impossible are dropped before the next cycle commences. In the Tileworld, an agent's options are the holes visible to that agent. In the simplest case of noncommunicating agents, an agent's selected option is the hole closest to that agent. Getting external events corresponds to the agent's updating its state of the world, based on its perception. An agent will drop its intention to move towards a particular square if it reaches it (intention fulfilled) or the hole disappears (intention impossible).

This interpreter can very easily be incorporated into a multiagent setting. The main issue is ensuring that the simulation is "fair" in the sense that no agent or group of agents is advantaged by incidental properties of the simulation. To ensure fairness, the simulator consists of a collection of BDI-agents with an external clock. At each clock tick, each agent runs its interpreter through one cycle, accepting a new input state of the world and performing one action (although communicating agents may be allowed to both move in the world and send one message in one time cycle). In the simulation, the agents execute in sequence. This results in an "asynchronous" mode of execution, i.e. it is impossible for two agents in the simulation to act concurrently. The order in which the agents run, however, is randomized at the beginning of each clock cycle. This ensures that no agent is treated unfairly in the long run due to being positioned consistently early or late in the cycle.

The implementation consists of three main components: the simulator, the world and the agents. The simulator simply provides each agent with a chance to act at each cycle. The world consists of the position of the agents and the holes. The agents can view and manipulate the world and send messages to one another. The dynamics of the domain are implemented as the actions of a special "controller" agent, which is responsible for maintaining the state of the world, e.g. making holes spontaneously disappear or appear. This agent is different from other agents only in that it is always called by the simulator at the beginning of a cycle, and that its implementation allows it to manipulate the world in ways that ordinary agents cannot. The implementation of the agents determines the details of the simulation. An agent, when called by the simulator, can check for messages, send messages, look at the world, move around in the world and change the world. Limitations on the actions of an agent (such as only being able to see part of the world or only being able to move one step) are enforced solely by the implementation of that agent. This allows different agents to have different capabilities without the world having to be altered, and is particularly useful if one wants to simulate several different types of agent competing in a single world. As they are created, each agent is allocated to a team, which is made up of agents of the same type.
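To make the cycle-by-cycle behaviour concrete, the following Python sketch is our own reconstruction of the kind of loop described above, not the authors' code; the class and function names (Agent, controller, simulate) and the parameter defaults are assumptions, and the controller's hole-replenishment behaviour anticipates the description in the next section.

# Hypothetical reconstruction of the simulation loop described in the text.
import random

GRID = 50

class Agent:
    """Non-communicating agent: each cycle it intends the closest hole and moves one step."""
    def __init__(self, pos):
        self.pos = pos          # the agent's own position
        self.intention = None   # the hole currently intended
        self.score = 0

    def step(self, holes):
        if self.intention not in holes:      # drop impossible intentions
            self.intention = None
        if holes:                            # option generation + deliberation
            self.intention = min(holes, key=lambda h: dist(self.pos, h))
        if self.intention:                   # execute: move one square, fill on arrival
            self.pos = step_towards(self.pos, self.intention)
            if self.pos == self.intention:
                holes.discard(self.intention)
                self.score += 1
                self.intention = None

def dist(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def step_towards(pos, target):
    x, y = pos
    if x != target[0]:
        return (x + (1 if target[0] > x else -1), y)
    if y != target[1]:
        return (x, y + (1 if target[1] > y else -1))
    return pos

def controller(holes, n_holes, vanish_rate):
    """Make holes spontaneously disappear, then top the world back up to n_holes."""
    for h in list(holes):
        if random.random() < vanish_rate:
            holes.discard(h)
    while len(holes) < n_holes:
        holes.add((random.randrange(GRID), random.randrange(GRID)))

def simulate(n_agents, n_holes=20, vanish_rate=0.02, cycles=10_000):
    holes = set()
    agents = [Agent((random.randrange(GRID), random.randrange(GRID))) for _ in range(n_agents)]
    for _ in range(cycles):
        controller(holes, n_holes, vanish_rate)   # the controller always acts first
        random.shuffle(agents)                    # fairness: random agent order each cycle
        for agent in agents:
            agent.step(holes)
    return sum(a.score for a in agents) / n_agents

print(simulate(4))   # average holes filled per agent (these numbers are not the paper's results)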
3. The Tileworld

The Tileworld was first presented by Pollack and Ringuette (1990) as a testbed for multiagent systems. It has since been presented in many different forms. In essence, the Tileworld consists of a 2-dimensional grid of squares. Each square may contain a hole, a tile, or neither. Agents can move one square at a time, either north, west, east or south. In the original Tileworld, an agent's goal is to fill holes with tiles, scoring points for each hole filled. Some versions of the Tileworld include obstacles which agents must circumnavigate, holes of various depths, and even "gas stations" where agents must go to refuel. Our experiments were conducted in the simplified Tileworld based on that defined by Kinny and Georgeff (1991), which consisted only of holes and agents. In this world, an agent "fills" a hole simply by moving over it.

The world in which our experiments were conducted, defined by our controller agent, is a simple one. Our controller agent is given two parameters: the number of holes, H, in the world and the "vanishing rate," v, of the holes. Each time it is called, the controller agent ensures that there are the correct number of holes (H) in the world by randomly placing new holes whenever the number falls below that given. The vanishing rate of holes is a measure of how dynamic the world is. It is defined as the probability that any given hole will "spontaneously" disappear (i.e. be deleted by the controller agent) during a given cycle. A vanishing rate of 0 means that no holes will disappear unless filled in by one of the agents. A vanishing rate of 1 means that every hole will disappear and reappear at a random position each cycle. An important consequence of this regime is that holes do not become more scarce as the number of agents increases, although competition for the holes increases. Thus there is no inherent limitation on the performance of agents due solely to the scarcity of resources, as the holes filled by agents are replenished by the controller agent at each cycle.

The expected lifetime of a given hole for a vanishing rate v can be calculated as follows. The probability of a hole on the grid disappearing during a given cycle is v. The probability of a hole which was created at cycle 0 disappearing exactly at cycle k is v(1 − v)^(k−1). A weighted sum of this value from k = 1 to ∞ gives the expected lifetime of a hole, L. Thus:
L = v · Σ_{k=1}^{∞} k(1 − v)^(k−1) = v / [1 − (1 − v)]² = 1/v.
For the experiments we conducted, the world was a 50×50 grid containing either 20 or 30 holes. Vanishing rates varied from 0 to 0.1 (i.e. at the most dynamic, a tenth of the holes disappear at each cycle). For simplicity, the agents are capable of seeing the whole world: this has no effect on the results because with a grid size of 50 and 20 holes, there is almost certainly one hole close to the agent that it could see even with reduced vision. An agent's performance in a simulation is measured by the number of holes it fills.
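The closed form L = 1/v can also be checked empirically; the short Python sketch below (illustrative only, with hypothetical names) samples the geometric lifetime of a hole that disappears with probability v in each cycle.

# Empirical check of the expected hole lifetime L = 1/v (illustrative sketch).
import random

def sample_lifetime(v):
    """Number of cycles until a hole spontaneously disappears, deletion probability v per cycle."""
    k = 1
    while random.random() >= v:
        k += 1
    return k

v = 0.02
mean = sum(sample_lifetime(v) for _ in range(100_000)) / 100_000
print(mean)   # close to 1/v = 50 for v = 0.02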
4. The Experiments

There were three different types of agent used in the simulations. There was no cost associated with planning and each agent reconsidered its plan at every cycle. The first type of agent is our baseline. It has a simple planning strategy, at each cycle forming an intention to fill the closest hole. This agent does not communicate at all, and is hereafter referred to as a Non-Communicating Agent (NCA).

The second type of agent also plans to fill the closest hole. However, this agent has the ability to send and receive messages to and from other agents in the same team. This type of agent will be referred to as a Message Passing Agent type 1 (MPA1). When there is a team member within a certain range r, it informs that agent of its current intention. If there is more than one agent within this range, it only informs the closest one. When an MPA1 receives such a message from a team member, it never forms an intention to fill the same hole. Thus two MPA1s in the same team should never intend to fill the same hole at the same point in time. If an MPA1 does not execute its plan due to a change of intention, it sends a message to its closest team member within range r signalling its change of intention. Note that there is no cost associated with sending or receiving messages.

The third type of agent is called an MPA2. This agent is able to have a limited "discussion" with other team members. The types of messages sent are identical to those sent by MPA1s; however when an MPA2, say agent-y, is informed of agent-x's intention, it does not automatically exclude the possibility of forming an intention to fill the hole in question. If the hole is agent-y's closest hole, and agent-y is closer to the hole than agent-x, it will reply to agent-x signalling that the hole is the subject of its intention. Agent-x is then forced to abandon its intention to fill that hole. Again, there is no cost associated with sending or receiving messages.
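As a hedged paraphrase of these two message-handling policies (not the authors' implementation), the following Python sketch contrasts an MPA1, which never intends a hole claimed by a teammate, with an MPA2, which keeps its closest hole and replies when it is strictly closer to it than the claimant.

# Illustrative sketch of MPA1 vs. MPA2 reactions to an "I intend hole h" message.
def mpa1_choose(my_pos, holes, claimed):
    """MPA1: never intend a hole a teammate has claimed."""
    candidates = [h for h in holes if h not in claimed]
    return min(candidates, key=lambda h: dist(my_pos, h)) if candidates else None

def mpa2_choose(my_pos, holes, claims):
    """MPA2: if my closest hole is claimed but I am strictly closer than the claimant,
    keep it and reply so the claimant abandons it; otherwise behave like an MPA1."""
    if not holes:
        return None, []
    closest = min(holes, key=lambda h: dist(my_pos, h))
    if closest in claims and dist(my_pos, closest) < dist(claims[closest], closest):
        return closest, [("i-intend", closest)]
    return mpa1_choose(my_pos, holes, set(claims)), []

def dist(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

holes = {(2, 2), (9, 9)}
print(mpa1_choose((0, 0), holes, claimed={(2, 2)}))          # -> (9, 9)
print(mpa2_choose((0, 0), holes, claims={(2, 2): (6, 6)}))   # keeps (2, 2) and replies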
4.1. Team Size of Communicating Agents

This experiment tests the effect of communication on the performance of teams of agents. Each simulation consisted of a single team of agents of one type. Figures 1 and 2, below, plot the number of holes filled per agent in the team, for different team sizes. The Tileworld used for the experiments consisted of a 50x50 grid with 30 holes, and the scores refer to the number of holes filled by an agent in 100 000 cycles starting from a randomly generated position. Communication takes place between an agent and its closest neighbour up to a range, r, of 10. The value of 10 was chosen because, with this value, an agent almost always knows its nearest neighbour's intention. So if communication is useful at all, it is likely to be effective within this range.

We expected that teams of communicating agents would perform better than the non-communicating agents, and also that teams of MPA2s would outperform teams of MPA1s. This was indeed the case. We also expected that, while a larger team would get a total of more holes than a smaller team, the number of holes filled by each individual agent would decrease as the team size increased. This was expected as the presence of other agents sometimes forces an agent to abandon its intention to fill its closest hole due to a message from a team member. Therefore
agents are sometimes forced to head for their second or third closest hole rather than their first choice of the closest. When the vanishing rate of holes is 0.0, this is indeed the case, as shown in Figure 1. However, when the world becomes slightly dynamic (v=0.02), it is a surprise to find that, as illustrated in Figure 2, for the team of MPA2s and less so for the team of MPA1s, the average holes filled per agent actually begins to increase as the team size grows. This is an emergent property of the system. It seems that, in a dynamic world, it benefits an agent to have other agents in the world. We expected that being informed of other agents' intentions would increase the score of the team. However, we did not expect that an individual agent's performance would improve beyond what it would have been were there no other agents in the world.

The increase in average performance of MPA2s as team size increases can clearly be seen in Figures 2 and 3. Figure 2 shows the performance of agents in teams of all different types. The NCA agents show a decrease in average performance as the team size grows. The MPA1 agents show a slight increase in average performance as the team size grows from 1 to 4 agents, after which the performance begins to decrease. The increase in average performance of the MPA2s is more dramatic. Agents in teams of up to 15 members have an average performance which is better than that of a team of size 1 (an agent acting alone). Figure 3 shows the average performance of MPA2s against team size, for different hole vanishing rates. The rise in performance is evident for low vanishing rates, but not when the world is very dynamic.

We believe that this phenomenon can be explained intuitively as follows. Prima facie, there are two main factors influencing the performance of an agent in a dynamic world: the disappearance of holes due to the non-zero vanishing rate and the competition for holes from other agents. Clearly these are both negative influences on an agent's performance. However, there is a potential positive influence from the increased replenishment rate of holes (the rate at which new holes
Fig. 1: Graph showing average performance of different types of agents working in teams, for a hole vanishing rate of 0.0
Fig. 2: Graph showing average performance of different types of agents working in teams, for a hole vanishing rate of 0.02
However, there is a potential positive influence from the increased replenishment rate of holes (the rate at which new holes are created, either due to other holes disappearing or to other holes being consumed). Since our world assumes a constant availability of holes, the more holes consumed, the greater the replenishment rate. Hence as the number of agents increases, the replenishment rate also increases. For the experiments described above, it seems that this positive influence actually outweighs both negative influences. This is an indirect effect of the communication between agents: when an agent is informed of another agent's intention to fill a particular hole, the agent moves away from that hole. This means that, over time, agents on a team "spread out" so as to reduce the likelihood of their interference. Now, when an agent's intended hole disappears, it must choose a new hole to fill. The increase in replenishment rate as the number of agents increases means that the agent will not need to travel as far to reach this hole: intuitively, because new holes are created in random positions, from the point of view of a single agent, more holes appear in its local environment than disappear.

Now compare this situation to that of the static world, where, as the number of agents increases, the performance of the agents decreases. Here, competition between agents is a negative influence while an increased replenishment rate (due to increased consumption) is a positive effect on performance. The scores from Figure 1 indicate that, when the vanishing rate is zero, the negative factor outweighs the positive one. We believe that the reason the negative factor outweighs the positive factor is that competition in a static world has a more serious negative effect on performance than in a dynamic world. This is because, when an agent is forced to give up its intended hole in a static world, the competition has caused the loss of that hole to the agent's overall score. However, in a dynamic world, an agent may also fail to fill its intended hole due to the natural attrition of holes, so the impact of competition itself is not as serious.
(Plot: average holes filled per agent (x 10^4) against team size, for vanishing rates v = 0.01, 0.02, and 0.05.)
Fig. 3: Graph showing average performance of MPA2 team members for different hole vanishing rates

However, all this assumes the constancy of other influencing factors. In particular, it assumes that communication is usually efficacious in making agents aware of the other agents' intentions in time for that information to be useful. For example, with a very small range of communication, or with no communication at all, agents do not spread out to avoid interfering with one another, so the negative effect of competition always outweighs any positive effect (as shown in Figures 1 and 2 for NCAs). We might also anticipate that with a large range of communication, there may be an improvement in performance as team size increases, even in a static world. This is because the negative effect of competition may be ameliorated by the increased information available from communication (although this may still not be enough to compensate for the effects of competition). Similarly, as the vanishing rate becomes smaller and smaller, we would expect that at some point the negative effects of competition would start to balance the positive influence of replenishment, as in the static world. Also, as the number of holes (and thus the density of holes) in the world increases, we would expect similar effects, only less pronounced.

4.2. Range of Communication

As described above, MPA1s and MPA2s communicate with team members which are within a certain range r. As r increases, so does the number of messages being sent. To investigate the effect of communication, we ran simulations consisting of a team of 10 MPA2s in a world with a 50x50 grid containing 20 holes, with a vanishing rate of 0.02. As r increased, so did the effectiveness of the team, measured as the number of holes filled in 100 000 cycles, as shown in Figure 4. Figure 5 shows the same data plotted with a log scale on the range axis, confirming that the curve in Figure 4 has the shape of a logarithmic function.
Fig. 4: Graph showing the effect of range of communication on performance of a team of 10 MPA2s
Fig. 5: As for Figure 4, but plotted with a log scale on the range axis
There is only ever direct benefit in communication between agents when, if not for that communication, both agents would intend to fill the same hole. Therefore, the value of communicating with an agent depends on how far away that agent is: the further apart the agents, the less likely it is that they would both be heading for the same hole. As the distance between two agents increases, the value of their communication decreases. It seems clear that the relationship between the value of communication and the distance between the agents is not a linear one. If two agents are close, the value of communication between them is large, because they are likely to have the same closest hole. As we consider agents which are further apart, the likelihood that they have the same closest hole decreases by a non-linear amount.

Assume that the value of communicating with an agent some distance d away is proportional to 1/d^2. This is not an unreasonable assumption: the area between them likely to contain their closest holes is proportional to d^2, and therefore the chance that they have the same closest hole is approximately proportional to 1/d^2. Consider the grid as concentric "rings" of squares with an agent at the centre, as shown in Figure 6 below. The value of a square at distance d from the agent, S_d, is given by S_d = k/d^2 for some constant k. Each square in a given ring is the same distance from the agent (recall that distance refers to "Manhattan" distance, i.e. no diagonal steps are allowed). The number of squares in the d-th ring, N_d, is 4d. Thus, the value to the agent of including the d-th ring in its range of communication, R_d, is:

R_d = S_d . N_d = 4k/d = K/d    (writing K = 4k)
        4
      4 3 4
    4 3 2 3 4
  4 3 2 1 2 3 4
4 3 2 1 0 1 2 3 4
  4 3 2 1 2 3 4
    4 3 2 3 4
      4 3 4
        4
Fig. 6: Illustrates the concentric rings of equal Manhattan distance from an agent
We can now approximate the value of a range r, E_r, by summing the value of the rings at distances 1 to r. This gives the following approximation:

E_r = sum_{d=1..r} R_d + c = K . sum_{d=1..r} (1/d) + c ~ K . log r + c
By substituting points on the graph, the constants are found to be K = 685 and c = 8132. Figures 4 and 5 show that this function approximates the data well for ranges up to around 10 (the theoretical function plotted here is the summation form, not the logarithmic approximation). For ranges larger than this, the assumption that the value of communication at distance d is proportional to 1/d^2 appears to be invalid. This is reasonable, as the size of the grid means that agents this far apart will rarely be the closest agents to each other, hence communication at this distance is unlikely ever to occur.
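As a check on this approximation, the short sketch below (ours, not part of the paper) evaluates the fitted curve E_r = K . sum(1/d) + c with the constants quoted above, compares it with the K . log r + c form, and verifies that the d-th Manhattan ring indeed contains 4d squares.

```python
# Sketch (not from the paper): evaluate the fitted value-of-range curve
# E_r = K * sum_{d=1..r} 1/d + c and its logarithmic approximation, using the
# constants K = 685 and c = 8132 reported in the text.
import math

K, c = 685.0, 8132.0

def ring_size(d: int) -> int:
    # Number of grid squares at Manhattan distance exactly d from the agent (= 4d).
    return sum(1 for dx in range(-d, d + 1)
                 for dy in range(-d, d + 1) if abs(dx) + abs(dy) == d)

def value_of_range(r: int) -> float:
    # E_r: summed value of rings 1..r, each ring contributing R_d = K/d, plus offset c.
    return K * sum(1.0 / d for d in range(1, r + 1)) + c

for r in (1, 2, 5, 10, 20):
    print(f"r={r:2d}  ring size={ring_size(r):3d}  "
          f"E_r={value_of_range(r):7.1f}  K*log(r)+c={K * math.log(r) + c:7.1f}")
```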
5. Conclusion

We presented several experiments in a simplified Tileworld, using simple, reactive agents. The experiments dealt with the effect of communication between agents in accomplishing a task, assuming that such communication took no extra time. We showed that the greater the degree of communication between our agents, the better they perform. We also showed that a team of agents capable of two-way communication (where a limited "discussion" can take place) performs better than a team capable only of one-way communication (where a transaction consists of a single message only). Moreover, an emergent property of the system is that, in a dynamic world, the performance of an agent is actually enhanced by having other agents in the world. Each individual actually does better when team members are present than when it has the whole world to itself. This appears to be because an increase in replenishment rate due to an increase in consumption rate outweighs the negative effects of competition. This is possible because communication of the agents' intentions results in the agents spreading apart and avoiding interference with one another. The range over which agents can communicate also affects their performance in an interesting and predictable way: the utility of communication varies logarithmically with the range of communication. Although the results are determined in part by features specific to the Tileworld, we believe they raise the possibility of similar emergent properties in more complex multiagent systems, especially ones in which replenishment of resources is a significant factor in comparison with competition amongst agents.
Acknowledgements

We are heavily indebted to Ed Golja for technical contributions to this work and to the Australian Research Council for financial support.
References

Brooks, R.A. (1991) "Intelligence without Representation." Artificial Intelligence, 47, 139-159.
Goldman, C.V. & Rosenschein, J.S. (1994) "Emergent Coordination through the Use of Cooperative State-Changing Rules." Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), 408-413.
Hinton, G.E., McClelland, J.L. & Rumelhart, D.E. (1986) "Distributed Representations." in Rumelhart, D.E., McClelland, J.L. & The PDP Research Group (Eds) Parallel Distributed Processing, Volume 1. MIT Press, Cambridge, MA.
Holland, J.H. (1975) Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI.
Kinny, D.N. & Georgeff, M.P. (1991) "Commitment and Effectiveness of Situated Agents." Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, 82-88.
Ono, N., Ohira, T. & Rahmani, A.T. (1995) "Emergent Organization of Interspecies Communication in Q-Learning Artificial Organisms." Advances in Artificial Life: Proceedings of the Third European Conference on Artificial Life, 396-405.
Pollack, M.E. & Ringuette, M. (1990) "Introducing the Tileworld: Experimentally Evaluating Agent Architectures." Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), 183-189.
Rao, A.S. & Georgeff, M.P. (1992) "An Abstract Architecture for Rational Agents." Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning, 439-449.
Shoham, Y. & Tennenholtz, M. (1992) "Emergent Conventions in Multi-Agent Systems: Initial Experimental Results and Observations (Preliminary Report)." Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning, 225-231.
Walker, A. & Wooldridge, M.J. (1995) "Understanding the Emergence of Conventions in Multi-Agent Systems." Proceedings of the First International Conference on Multiagent Systems, 384-390.
Werner, G.M. & Dyer, M.G. (1991) "Evolution of Communication in Artificial Organisms." in Langton, C.G., Taylor, C. & Farmer, J.D. (Eds) Artificial Life II. Addison-Wesley, Reading, MA.
How Do Autonomous Agents Solve Social Dilemmas?

Akira Ito

Kansai Advanced Research Center, Communications Research Laboratory, Iwaoka, Nishiku, Kobe, 651-24, Japan

Abstract. This paper explores the problem of cooperation of autonomous agents. Why it is important for autonomous agents to solve social dilemma problems is explained. They must be solved in a way that does not restrict the autonomy of agents. For that purpose, a social sanction by the disclosure of information is proposed. Agents were made to play the Prisoner's Dilemma game repetitively, each time changing the other party of the match, under the condition that the match history would be disclosed to the public. Computer simulation shows that 1) under the disclosure of information, even a selfish agent comes to cooperate for its own sake, and 2) an agent can learn how to cooperate through interactions with other agents. Thus autonomous agents can solve the dilemma problem by themselves. The paper then discusses the role of commitment in solving social dilemma problems.
1 Introduction
Recently, autonomous agent design has been attracting much attention as a technique for building flexible and robust machine systems. Such a system can be developed, it is advocated, as a society of autonomous agents which have individual goals, can act autonomously, and work together to execute a given task. For an autonomous agent to do useful tasks, however, it must learn to cooperate with other agents. What I mean by "cooperation" is the following. If there is a situation where cooperation is more profitable for the society as a whole, can an autonomous agent cooperate without seeking its own profit? If it cannot, it surely will lose in the long run.

For example, consider the collision avoidance problem depicted in Fig. 1. Suppose that the goal of each agent is to reach a target place with a minimum cost. If A wants to minimize the sum of the cost for A and B (i.e. the total cost for the society), it does not matter whether A yields and turns aside or makes B yield. On the other hand, if the goal of each agent is to minimize its own cost (which is very probable), it is better that the other agent yield. But if both agents think in the same way and collide, or just wait forever for the other to yield, both have to pay a cost greater than that of yielding to the other agent at the beginning (the chicken game).

A similar dilemma may occur even in cooperative situations. Suppose that A and B have promised to do each other's disliked task.
Fig. 1. Collision avoidance problem
A doing B's task earnestly, however, does not guarantee that B will do A's task as earnestly. Hence, it is a good strategy for A to do B's task lukewarmly. The situation is the same for B. If both think this way, however, both will probably get less than what would be expected (the Prisoner's Dilemma game). These situations, where the profits of the society conflict with those of its individual members, are called "social dilemma" problems [Liebrand & Messick 96].

Earlier approaches to these problems are to develop rules and protocols that agents should follow in individual situations. However, this approach cannot solve the problems completely. How can we list all the situations beforehand? Even if we could, how can we force an autonomous agent to follow these rules and protocols? Is a machine system with huge lists of rules and protocols still an autonomous agent? At first thought it seems that a "principle of benevolence" (rules such as "help others whenever possible") can solve the problem. If it is guaranteed that all the members of a society follow the rule, it probably works. If someone (either intentionally or accidentally) does not follow it, however, those who do not follow it gain an advantage over the others. Bad money drives out good money. Such a system does not work by itself.

The problem is very serious for an autonomous agent that learns from the environment. A moral-less machine system has no reason to hesitate in modifying its own action rules to exploit loopholes in rules and protocols. The reaction of the society is to develop more and more sophisticated sets of rules, which completely eliminate the flexibility of autonomous agents. What is necessary is a social mechanism which is not as rigid as rule-based systems, but which can sanction uncooperative agents. This is a very difficult problem even for a human society, but it is unavoidable for making a society of autonomous agents. In the following, we explain one of our efforts in this direction, a social sanction mechanism by the disclosure of information, and show that under proper conditions autonomous agents can solve the problems by themselves.
2 The Prisoner's Dilemma game under a disclosure of information
We used the Prisoner's Dilemma game as a model of the interaction of autonomous agents. It is a standard model for investigating the problem of cooperation [Axelrod 81][Axelrod 84][Axelrod 88][Rosenschein & Genesereth 85]
[Lindgren 92]. The Prisoner's Dilemma game is a non-zero-sum game between two players. In the game, each participant plays either C (Cooperate) or D (Defect), according to which points are earned, as listed in Table 1.
Table 1. Payoff matrix of the Prisoner's Dilemma game

A\B    C             D
C      A:3, B:3      A:0, B:5
D      A:5, B:0      A:1, B:1
The main features of the game are the following: 1) Whatever the move of the other player is, it is more profitable for a player to defect. 2) The points earned when both defect, or the average points earned when one defects while the other cooperates, are lower than those earned when both cooperate. Axelrod showed the following [Axelrod 84]: 1) Cooperation emerges if both players expect to play repetitively in the future (the Iterated Prisoner's Dilemma game). 2) The only rational strategy for a single-round game is to defect, and no cooperation can emerge in such a situation. Our aim here is to break this dilemma situation by introducing a disclosure of information.

First, let us briefly explain our model [Ito & Yano 95]. An agent moves around randomly in a two-dimensional space (lattice). Agents occupying the same location can make a match. The match is equivalent to the Prisoner's Dilemma game, with the payoff matrix listed in Table 1. In the match, each agent decides either to cooperate or to defect, applying its match strategy algorithm to the match history of the opponent. Match records are made open to the public and are accessible by any agent. An agent has an initial asset of A0, and the profit calculated as follows is added each time the agent makes a match:

Profit = Payoff of the original PDG - (Match fee (constant) + Algorithm calculation cost)

where the algorithm calculation cost is Cam times the number of machine codes executed in the strategy algorithm calculator to be described in Sec. 3.
An agent whose assets become less than zero is bankrupted and is deleted from the system. On the other hand, an agent whose assets become larger than 2A0 bears a child and imparts the amount A0 to the child. The child inherits the match strategy of its parent. These rules introduce an evolutionary mechanism [Ray 92][Werner & Dyer 92] into our system. The system parameters used are given in Table 2.
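As an illustration of the bookkeeping just described, the sketch below applies the per-match profit rule and the bankruptcy and reproduction thresholds. It is our reconstruction, not the authors' code; the payoff table and parameter values are taken from Tables 1 and 2, while the Agent class and function names are assumptions.

```python
# Sketch of the per-match accounting described above (our reconstruction).
# Payoffs follow Table 1; the match fee and per-instruction cost follow Table 2.
from dataclasses import dataclass
from typing import Optional

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
MATCH_FEE = 2.5          # Cd in Table 2
COST_PER_INSTR = 0.002   # Cam in Table 2
A0 = 20.0                # initial asset

@dataclass
class Agent:
    strategy: list
    assets: float = A0
    alive: bool = True

def settle_match(agent: Agent, my_move: str, opp_move: str, instrs: int) -> Optional[Agent]:
    """Apply the profit rule; return a child agent if one is born."""
    # Profit = payoff of the original PDG - (match fee + algorithm calculation cost)
    profit = PAYOFF[(my_move, opp_move)] - (MATCH_FEE + COST_PER_INSTR * instrs)
    agent.assets += profit

    if agent.assets < 0:
        agent.alive = False                  # bankrupt agents are deleted from the system
        return None
    if agent.assets > 2 * A0:
        agent.assets -= A0                   # the parent imparts A0 to the child,
        return Agent(strategy=list(agent.strategy), assets=A0)  # which inherits its strategy
    return None
```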
Table 2. System parameters

Probability of the agent's random walk     P_rw    1/40
Size of the lattice                        N       15
Initial asset                              A0      20
Match fee                                  Cd      2.5
Algorithm calculation cost/instruction     Cam     0.002
3 Description of match strategy algorithms
The match strategy is an algorithm that calculates what move to play next based on the match history of the opponent. As the match history will be disclosed to other agents, the strategy should take into account not only the expected profit of the current match, but also the effect the play of this match might have on future matches. To investigate the learning of match strategies, we needed a framework for describing strategies. Hence, we introduced a strategy algorithm calculator as follows.

The calculator had 8 (typed) registers and 6 opcodes (all without operands). Moreover, the calculator had an infinite-length stack, and could execute reflective (recursive) calculations. Inputs to the calculator were the (current) time, the pointer to the opponent, and the match strategy algorithm. The calculator returned the next move (either C or D) as the result of the calculation. A schematic diagram of the calculator is given in Fig. 2.

First the calculator loaded the match record of the opponent's latest match into the history register. Suppose the opponent in the next match is a, that a's latest match was R_n (i.e., the match record of the match R_n was loaded), and that the opponent of a in R_n was a_n. The calculator processed the strategy algorithm step by step. The code LC or LD loaded 'C' or 'D' into the move register. BM, BY, and SL were branch instructions. BM branched according to whether the move of a in R_n (i.e., m_n) was C or D. BY branched according to whether the move of a_n in R_n (i.e., y_n) was C or D. SL branched according to the move that this algorithm would suggest if it were in place of a in R_n. TP went one record back in the opponent's match history (i.e., loaded R_(n-1)) and applied the algorithm again. The process terminated either when the algorithm reached an end, or when it ran out of the match history. Code sequences are expressed in the list forms of LISP. A branch instruction is accompanied by two lists corresponding to the code sequences after the branch. As is seen from the program flow, the output move was undefined if the algorithm terminated without ever executing LC or LD. In such a situation, the calculator was assumed to output a move randomly, i.e. either C or D with probability 1/2. Hence, the random strategy can be expressed by the algorithm of length 0, NIL.

The following are a few basic strategies together with their strategy algorithm codes (a minimal interpreter sketch for such code lists is given after the list).

1. Good-natured (CCC): (LC) Always cooperate.
Fig. 2. Schematic diagram of a strategy algorithm calculator
2. Tit for Tat (TFT): (LC BM (LC) (LD)) Mimic the opponent's play in his latest game. Cooperate if the opponent's history is empty.
3. Random (RAN): NIL Cooperate/defect with probability 1/2, irrespective of the opponent.
4. Exploiter (DDD): (LD) Always defect.
5. Revised Tit for Tat (TT3): (LC BM (LC) (BY (LD) (TP))) If both agents defected in the opponent's latest game, then go one record back in the opponent's history and apply this algorithm again. Otherwise, mimic the latest move of the opponent. Cooperate if the opponent's history is exhausted.
6. Reflective (REF): (LC BM (LC) (BY (SL (LD) (TP)) (TP))) In the record of the opponent's latest game, if both agents defected, or if the opponent defected and this algorithm also suggests D in the opponent's position, then go one record back in the opponent's history and apply this algorithm again. Otherwise mimic the opponent's latest move. Cooperate if the opponent's history is exhausted.
182
4 Learning of strategy algorithms
First of all, note that the strategy devised for the Iterated Prisoner's Dilemma (IPD) game does not work well for our model. At first thought there does not seem to be much difference between our model and IPD, because all the match history is accessible. In IPD, however, if the opponent played D first, it is obvious that the opponent is uncooperative. In our model, on the other hand, you may sometimes be forced to play D against an unknown opponent, fearing that it might play D to you, or intending to punish the seemingly uncooperative behavior of the opponent. This in turn may make others play D against you.

The fact that the simple TFT (Tit for Tat) strategy does not work well is confirmed by our experiment. When matched with the DDD (always play D) strategy, an agent with the TFT strategy starts to play D even to its comrades (i.e., agents with the TFT strategy), ultimately leading to mutual retaliation among agents with the TFT strategy. As we charged 2.5 points as a match fee, any algorithms that could not cooperate among themselves were destined to decay in the long run. This was a disappointing result. Couldn't autonomous agents cooperate by themselves? Couldn't they learn to cooperate through experience?

We wanted agents with the TFT strategy to learn a better strategy. Hence, we endowed agents with a simple learning ability and investigated how they learned through experience. As our target was to make agents learn strategies by themselves, we assumed that the agents had no prior knowledge as to what might be a good strategy. Hence, the agents had to make a blind search for a candidate strategy. The simplest search algorithm is to change the strategy code randomly, i.e., randomly insert, delete, or replace strategy codes. Of course, these random modifications of strategy codes mostly lead to a poor strategy. Hence, this algorithm should be used only when an agent has a strong reason for revising its strategy.

The timing of the strategy revision. Each agent calculated the average points it earned and the average points all the agents earned, and compared the two. If it was earning more than the others, there was no reason to revise the strategy. Conversely, if it was earning less than the others, it was a good time to revise the strategy. Note that the points of other agents were calculable from the match history. The quantities used were the following:

Av(t) = 0.75 Av(t-1) + 0.25 x (average points of all the agents at t-1),
N(t) = number of matches since adopting the current strategy,
S(t) = sum of points since adopting the current strategy,
Delta(t) = Av(t) - (S(t)/N(t) + a/SQRT(N(t))).

These were calculated every 10 steps. If it was found that Delta(t) > 0, then with probability p_m, calculated by p_m = min(10^(5 Delta(t) - 2), P_r), a random modification of strategy codes was executed. Here min(x, y) means the minimum of x and y, and P_r is the upper probability, which was set to 0.4. The term a/SQRT(N(t)) was inserted in order to make allowances for statistical
errors in the evaluation of the average S(t)/N(t). In the simulation, a was set to 1.0.

Random modification of strategy codes. The following random replacements of codes were executed, where l stands for the length of the agent's strategy algorithm code:

1) Deletion of codes: a code in the algorithm was deleted with probability p_m/l. If the deleted code was a branch instruction, one of the code sequences corresponding to the branch became unnecessary, so it was also deleted.
2) Insertion of codes: an instruction code selected randomly from the code set was inserted at any place in the algorithm with probability p_m/l. If the inserted code was a branch instruction, the algorithm then needed two code sequences corresponding to the branch, so one of the code sequences after the branch was set to NIL (NOP: no operation).
3) Replacement of NIL (NOP): due to the insertion of branch instructions, the code NIL often entered the code sequences. This was unfavorable for the system performance. Hence, with an upper probability P_r, NIL was replaced by a code randomly selected from the code set.
4) Simplification of redundant code: redundant codes brought about by the above mutation processing were deleted.

It is important to note that in the above procedure, agents sought to improve their own profit, and no cooperation or altruism was implied a priori.
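The revision trigger can be summarized in a few lines. The sketch below is ours, not the authors' code; the symbol names and the exponential form of p_m mirror the formulas given above, and the constants a = 1.0 and P_r = 0.4 are those reported in the text.

```python
# Sketch (our reconstruction) of the strategy-revision trigger described above.
import math
import random

P_R = 0.4        # upper probability for a revision
A_CONST = 1.0    # the constant a in a/sqrt(N(t))

def update_society_average(av_prev: float, society_avg_now: float) -> float:
    # Av(t) = 0.75*Av(t-1) + 0.25*(average points of all agents at t-1)
    return 0.75 * av_prev + 0.25 * society_avg_now

def should_revise(av_t: float, s_t: float, n_t: int) -> bool:
    # Delta(t) = Av(t) - (S(t)/N(t) + a/sqrt(N(t)))
    delta = av_t - (s_t / n_t + A_CONST / math.sqrt(n_t))
    if delta <= 0:
        return False                      # earning at least the society average
    p_m = min(10 ** (5 * delta - 2), P_R) # small shortfalls rarely trigger a revision
    return random.random() < p_m
```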
5 Simulation
First of all, our world needed some pressure, or incentive, for learning. In fact, in a world consisting only of cooperative strategies, each agent just played C, and there was no motivation for learning. Hence, agents with uncooperative strategies (called vaccine agents hereafter) were injected into the society of TFT agents. The cooperative strategies then had to learn to fight/retaliate against the vaccine agents, or they would be destroyed by the injected vaccines. Note that there was no difference between injected vaccine agents and other agents, and vaccine agents could also revise their own strategies.

If an injected vaccine was too strong, the learning agents were destroyed by the vaccine. On the other hand, if the injected vaccine was too weak, it had little effect on the learning. Hence, the vaccine of DDD (which always plays D) or RAN (which plays either C or D randomly) was injected with increasing intensity. Detailed information on the training procedure is given in Table 3. For example, in the interval [0, 10000], vaccine agents with strategy DDD were added every 500 steps, until the number of vaccine agents was 10% of the total population. As the RAN strategy was more difficult to detect than the DDD strategy, the RAN vaccine was injected after the training by the DDD vaccine.

To evaluate the effect of the learning, the system before the training (TFT), after the training by DDD (S1), and after the training by RAN (S2) was matched against half the population of DDD agents (Fig. 3), or against an equal population of RAN agents (Fig. 4).
Table 3. Vaccines injected for training

Time                 Vaccine   Vaccine ratio (%)
(TFT) 0-10,000       DDD       10
10,000-20,000        DDD       20
20,000-30,000 (S1)   DDD       33
30,000-40,000        RAN       20
40,000-50,000 (S2)   RAN       33
While the society of TFT agents was destroyed by the uncooperative strategies, S1 and S2 were able to destroy the uncooperative strategies and recover the original population, verifying the effect of the strategy learning. Figure 4 also shows that training by the RAN vaccine greatly increased the retaliative ability of the system against RAN.
Fig. 3 Matches of TFT, S1, S2 against DDD
Fig. 4 Matches of TFT, S1, S2 against RAN

6 Information, communication, and commitment
Readers may wonder why autonomous agents cannot negotiate for cooperation. When we are confronted with a dilemma situation, we use words in trying to negotiate for cooperation. Autonomous agents can, of course, also communicate, or, more precisely, can exchange information. If agents can negotiate, most of the dilemma games are changed to "cooperative games" and the problem is solved. Our assumption, however, is that an autonomous agent will do anything for its own profit. Therefore, it can tell a lie without regard for moral or other human considerations. Exchanges of information are not the same as communication. Communication has significance only when the speaker commits to what he says.
Our ability to communicate is thus based not only on our intelligence, but also on our ability to commit to what we say. But what does "to commit" mean? If we say something, we mean it. Even if we fail to do what we promised to do, we are ready to admit our failure. We cannot tell a lie without a guilty conscience, without betraying an emotion through facial expressions, and so on. Even if this is not true, at least people believe that it is. This is one of the reasons why people want to talk face to face on important matters. Frank [Frank 88] describes an interesting role of emotions in solving social dilemmas. A machine system, on the other hand, lacks morals and lacks facial expressions that might betray its emotions.(1) We need a mechanism that forces a machine system to "commit" to what it says. This ability, or rather the inability to act otherwise, must be rooted in low-level hardware limitations.

In our case of the Prisoner's Dilemma game, for example, the following mechanism might work as "commitment". Suppose an agent wants to make the opponent believe that it is going to play "C" in the next turn. Suppose also that it discloses its own strategy algorithm and demonstrates that its algorithm will really output C in the current situation, and that its algorithm can explain all of the moves it played in the past. Unfortunately, this in itself does not make a "commitment", for the agent can invent, on the spot, an algorithm that satisfies the above two conditions. To find such an algorithm, however, is an inverse problem, and is generally very difficult. If we can prove that this is really difficult (such as NP-complete), then by disclosing an algorithm that can explain all of its past moves, an autonomous agent proves its "commitment" to the algorithm. The opponent can then assure itself that the algorithm was not invented on the spot and is worth trusting. In summary, communication is effective for the cooperation of autonomous agents only if we can implement a mechanism assuring the "commitment" of an agent to what it says.

(1) The situation does not change with the development of techniques for the machine generation of emotions and facial expressions. The point is that people physiologically hate telling a lie, and cannot help expressing (betraying) emotions.
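The consistency test appealed to in this argument can be stated as a small procedure: given a disclosed algorithm and the discloser's public match history, an opponent replays the algorithm against each past situation and checks that it reproduces every recorded move. The sketch below is ours and only illustrates the check, not the hardness argument; `interpret` stands for any strategy interpreter (for example, the run_strategy sketch in Sec. 3), and the record format is an assumption.

```python
# Sketch of the consistency check discussed above (our illustration).
# `interpret` maps (algorithm, opponent_history) to 'C' or 'D'; `own_records`
# is the discloser's public history, each entry a pair of
# (opponent_history_seen_at_the_time, move_actually_played).
def explains_past_moves(interpret, algorithm, own_records) -> bool:
    return all(interpret(algorithm, seen) == played for seen, played in own_records)

def is_credible_commitment(interpret, algorithm, own_records, current_history) -> bool:
    # Trust the disclosed algorithm only if it announces C for the current
    # situation *and* reproduces every move the agent has played so far.
    return (interpret(algorithm, current_history) == "C"
            and explains_past_moves(interpret, algorithm, own_records))
```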
7 Summary and discussions
We have explored the problem of cooperation of autonomous agents, and explained why it is important for autonomous agents to solve social dilemma problems. They must be solved in a way that does not restrict the autonomy of agents. One solution is to introduce a social sanction mechanism by the disclosure of information. The effectiveness of this mechanism was investigated by computer simulations. Agents were made to play the Prisoner's Dilemma game repetitively, each time changing the other party of the match, under the condition that the match history was to be disclosed to the public. The aim of the agents was to increase their own profit, and no cooperation or altruism was implied a priori. What we found is the following:
1) Under the disclosure of information, even a selfish agent comes to cooperate for its own sake.
2) An agent can learn to cooperate through interactions with other agents.

Thus autonomous agents solved the dilemma problem by themselves. We then discussed the role of commitment in solving social dilemmas, and considered a possible mechanism for implementing commitment in autonomous agents.

In a previous paper [Ito & Yano 95], we showed that cooperation emerges through evolution. In this paper, we bestowed the agents with the ability to learn from experience, and investigated its effects on the emergence of cooperation. At first thought, there seems to be no problem with this additional ability. This is not, however, true for the emergence of cooperation. In a society of agents with the ability to learn (i.e., to modify their own strategies), one has to be more cautious against possible defections by other agents. In fact, the performance (measured by the average points agents can earn) is much better in a society without learning abilities than in a society with learning abilities. As far as cooperation is concerned, being intelligent does not improve the situation. The more rationally each agent behaves, the more uncooperative the society becomes. This is why these are called "social dilemmas". The main finding of this paper is that even in a society of agents with learning abilities, agents can develop strategies for cooperation by themselves.

There is a long history of research on the Prisoner's Dilemma game [Axelrod 81][Axelrod 84][Axelrod 88][Rosenschein & Genesereth 85][Lindgren 92][Angeline 94]. Most of it, however, is on the Iterated Prisoner's Dilemma (IPD) game, for Axelrod showed decisively that in a non-iterated game defection is the only rational strategy [Axelrod 84]. One of the exceptions is [Oliphant 94], in which agents were scattered randomly in space and forced to stay in place, thus preventing defectors from exploiting cooperators. Our research treats a non-iterated Prisoner's Dilemma game, i.e., agents are forced to play the Prisoner's Dilemma game, each time changing the opponent of the match. Of course, we needed a mechanism to avoid Axelrod's result, and adopted a mechanism of "information disclosure". Theoretical game theory [Osborne & Rubinstein 94] has long investigated the role of information in solving social dilemma problems, but its results are restricted to the prediction of fixed points and their stabilities, and no dynamics are known for the emergence of cooperation. There is no research we know of that treats the dynamics of the emergence of cooperation under the disclosure of information, nor is there any research mentioning a recursive algorithm, which is essential for the success of cooperative strategies. Recent research focuses on "noisy situations" [Wu & Axelrod 94], in which the players' moves are changed randomly irrespective of their intentions, "multi-player games" [Banks 94], in which some of the participants can form a coalition, and "selective play" [Batali & Kitcher 94], in which a player can somehow select its opponent. This research reveals genuinely interesting aspects of the social dilemma problems. What effect information disclosure has in these situations is a theme for our future research.
There is a considerable body of research [Rosenschein & Genesereth 85][Rosenschein & Zlotkin 94] coping with conflict resolution problems. Its aim, however, is to develop protocols and/or heuristics for negotiating among conflicting agents. As we mentioned in Sec. 6, for negotiation to be effective, it is essential that agents can be trusted, or in other words, that an agent will do what it promised to do. We pointed out that this ability to keep promises is itself a "social dilemma" problem. It must be solved without assuming the trustworthiness of the negotiating agents. Of course, we cannot calculate everything from scratch, and protocols and heuristics are very useful for everyday life. What is important for an autonomous agent, however, is the ability to doubt these "protocols and heuristics" whenever necessary. What should an autonomous agent do if nothing except itself (or its reason) can be trusted? This is the problem we tackled in this paper.

In our model, information plays an essential role in maintaining a cooperative society, which is, we think, true even for our own society. The secure maintenance of the public information is a very difficult problem, which is beyond the scope of this paper. We need not, however, assume the existence of complete knowledge of the behaviors of other agents. In our model, we charged a calculation cost (0.002 per machine code) for the processing of information. With the parameters adopted, an agent cannot process more than several tens of records of the behavior of other agents, for the processing of these records demands several hundred machine codes, whose calculation cost nearly exceeds the profit mutual cooperation can earn through the game (which, in our case, is 0.5 point). Therefore, an agent is forced to cut off its search halfway.

Our eventual goal is to develop a machine system that acts of its own will and takes responsibility for its actions. The problem of cooperation - especially that involving social dilemmas - is very difficult to solve, even for a human society. How can we prepare an environment in which autonomous agents can think and act freely and at the same time can cooperate? The mechanisms proposed in this paper define one possible approach to this question.
References

[Angeline 94] Angeline, P. J., "An Alternate Interpretation of the Iterated Prisoner's Dilemma and the Evolution of Non-Mutual Cooperation", Artificial Life IV, pp. 353-358, 1994.
[Axelrod 81] Axelrod, R., and Hamilton, W. D.: "The Evolution of Cooperation," Science, Vol. 211, Mar. 1981, pp. 1390-1396.
[Axelrod 84] Axelrod, R.: The Evolution of Cooperation, Basic Books Inc., 1984.
[Axelrod 88] Axelrod, R., and Dion, D.: "The Further Evolution of Cooperation," Science, Vol. 242, Dec. 1988, pp. 1385-1390.
[Banks 94] Banks, S., "Exploring the Foundations of Artificial Societies: Experiments in Evolving Solutions to Iterated N-player Prisoner's Dilemma."
[Batali & Kitcher 94] Batali, J., and Kitcher, P., "Evolutionary Dynamics of Altruistic Behavior in Optional and Compulsory Versions of the Iterated Prisoner's Dilemma", Artificial Life IV, pp. 343-348, 1994.
[Frank 88] Frank, R. H.: Passions within Reason: The Strategic Role of the Emotions, W. W. Norton & Company, Inc., 1988.
[Ito & Yano 95] Ito, A., and Yano, H.: "The Emergence of Cooperation in a Society of Autonomous Agents," Proc. First Intl. Conf. on Multi-Agent Systems (ICMAS'95), pp. 201-208, San Francisco, 1995.
[Liebrand & Messick 96] Liebrand, W. B. G. and Messick, D. M. (eds.), Frontiers in Social Dilemmas Research, Springer, 1996.
[Lindgren 92] Lindgren, K.: "Evolutionary Phenomena in Simple Dynamics," Artificial Life II, C. G. Langton et al. (eds.), Addison-Wesley, 1992, pp. 295-312.
[Oliphant 94] Oliphant, M., "Evolving Cooperation in the Non-Iterated Prisoner's Dilemma: The Importance of Spatial Organization", Artificial Life IV, pp. 349-352, 1994.
[Osborne & Rubinstein 94] Osborne, M. J. and Rubinstein, A., A Course in Game Theory, The MIT Press, 1994.
[Ray 92] Ray, T. S.: "An Approach to the Synthesis of Life," Artificial Life II, C. G. Langton et al. (eds.), Addison-Wesley, 1991.
[Rosenschein & Genesereth 85] Rosenschein, J. S., and Genesereth, M. R.: "Deals Among Rational Agents," Proc. 9th Intl. Joint Conf. on Artificial Intelligence (IJCAI'85), Aug. 1985, pp. 91-99.
[Rosenschein & Zlotkin 94] Rosenschein, J. S., and Zlotkin, G.: Rules of Encounter: Designing Conventions for Automated Negotiation among Computers, The MIT Press, 1994.
[Werner & Dyer 92] Werner, G. M., and Dyer, M. G., "Evolution of Communication in Artificial Organisms," C. G. Langton et al. (eds.), Artificial Life II, Addison-Wesley, 1992, pp. 659-687.
[Wu & Axelrod 94] Wu, J. and Axelrod, R., "How to Cope with Noise in the Iterated Prisoner's Dilemma", Journal of Conflict Resolution, pp. 1-5, 1994.