Lecture Notes in Artificial Intelligence Subseries of Lecture Notes in Computer Science Edited by J. G. Carbonell and J. Siekmann
Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen
1237
Magnus Boman Walter Van de Velde (Eds.)
Multi-Agent Rationality 8th European Workshop on Modelling Autonomous Agents in a Multi-Agent World, MAAMAW'97 Ronneby, Sweden, May 13-16, 1997 Proceedings
Springer
Series Editors Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Volume Editors Magnus Boman DSV, Stockholm University and The Royal Institute of Technology Electrum 230, S-16440 Kista, Sweden E-mail:
[email protected] Walter Van de Velde Artificial Intelligence Laboratory Vrije Universiteit Brussel Pleinlaan 2, B-1050 Brussels, Belgium E-mail: [email protected] Cataloging-in-Publication Data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Multi-agent rationality : proceedings / 8th European Workshop on Modelling Autonomous Agents in a Multi-Agent World, MAAMAW '97, Ronneby, Sweden, May 13-16, 1997. Magnus Boman ; Walter Van de Velde (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Budapest ; Hong Kong ; London ; Milan ; Paris ; Santa Clara ; Singapore ; Tokyo : Springer, 1997 (Lecture notes in computer science ; Vol. 1237 : Lecture notes in artificial intelligence) ISBN 3-540-63077-5
CR Subject Classification (1991): 1.2, C.2.4, D.1.3 ISBN 3-540-63077-5 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. © Springer-Verlag Berlin Heidelberg 1997 Printed in Germany Typesetting: Camera ready by author SPIN 10548791 06/3142 - 5 4 3 2 1 0
Printed on acid-free paper
Table of Contents

Invited Talks (Abstracts)

Market-Aware Agents for a Multiagent World
Michael P. Wellman ......... 1

Learning and Adoption Versus Optimization
Yuri M. Ermoliev ......... 2

Limits of Strategic Rationality for Agents and M-A Systems
Cristiano Castelfranchi ......... 3

Papers

Multiagent Coordination in Antiair Defense: A Case Study
Sanguk Noh and Piotr J. Gmytrasiewicz ......... 4

A Service-Oriented Negotiation Model between Autonomous Agents
Carles Sierra, Peyman Faratin, and Nick R. Jennings ......... 17

Norms as Constraints on Real-Time Autonomous Agent Action
Magnus Boman ......... 36

Distributed Belief Revision vs. Belief Revision in a Multi-Agent Environment: First Results of a Simulation Experiment
Aldo Franco Dragoni, Paolo Giorgini, and Marco Baffetti ......... 45

Multi-Agent Coordination by Communication of Evaluations
Edwin de Jong ......... 63

Causal Reasoning in Multi-Agent Systems
B. Chaib-draa ......... 79

The Reorganization of Societies of Autonomous Agents
Norbert Glaser and Philippe Morignot ......... 98

Adaptive Selection of Reactive/Deliberate Planning for the Dynamic Environment
Satoshi Kurihara, Shigemi Aoyagi, and Rikio Onai ......... 112

Distributed Problem-Solving as Concurrent Theorem Proving
Michael Fisher and Michael Wooldridge ......... 128

Commitments Among Autonomous Agents in Information-Rich Environments
Munindar P. Singh ......... 141

Making a Case for Multi-Agent Systems
Fredrik Ygge and Hans Akkermans ......... 156

Learning Network Designs for Asynchronous Teams
Lars Baerentzen, Patricia Avila, and Sarosh N. Talukdar ......... 177

Building Multi-Agent Systems with CORRELATE
Wouter Joosen, Stijn Bijnens, Frank Matthijs, Bert Robben, Johan Van Oeyen, and Pierre Verbaeten ......... 197

Modelling an Agent's Mind and Matter
Catholijn M. Jonker and Jan Treur ......... 210

Delegation Conflicts
Cristiano Castelfranchi and Rino Falcone ......... 234
Multi-Agent Rationality: Preface Magnus Boman* & Walter Van de Velde** *DECIDE, DSV, Stockholm University **AI Lab, Vrije Universiteit, Brussels
Theme

The theme of MAAMAW workshops has never been crucial to their success. Nevertheless, it has always played a part in the vision of the near future, as presented by the respective scientific co-chairs. We decided that multi-agent rationality would be a challenging and inspiring theme for MAAMAW'97. The challenge lies in the fact that a transition from studies of individual rationality to studies of group rationality is currently taking place, and it is a rocky road. Artificial agents have been making decisions for some years now. Their guiding principle has been that of maximising their expected utility (PMEU, for short), although one cannot ignore the influence of rival principles. There are indeed good reasons for a utilitarian focus on individual rationality. To many end users, agents are merely a metaphor for personalised decision support embodied in personal digital assistants, web crawlers, or intelligent filters. However, the multi-agent systems (MAS) community seems to hold the opinion that PMEU cannot be the operational definition of rationality that Newell sought in his oft-cited AAAI address. Wherever there is room for co-operation and co-ordination, social issues must come into play. In our immediate future lie artificial decision makers facing sequences of heterogeneous and complex decision situations, perhaps even in real time. Their rationality might be explicated by their capacity for synthesising results from evaluations that employ different evaluation functions. The extent to which their analyses govern their behaviour will vary, but the representation and modelling of their social context (in terms of their place in the MAS, in their particular coalition, and their alignments) is central. The number of treatises on the dependence of group rationality upon individual rationality is still small, and if the choice of theme for MAAMAW'97 can inspire more activity on this topic, the whole area should benefit.

Invited Speakers
The three speakers who have been invited were selected with particular regard to our workshop theme. Each of them represents an important and influential school of thought on the topic of multi-agent rationality. Abstracts of their talks are included in this volume. Michael P. Wellman (University of Michigan) has for several years been investigating the use of economic principles for studying economically rational agent behaviour, coining the term market-oriented programming in the process. This interdisciplinary sub-area has produced models for several multi-agent applications, and is currently one of the fastest growing research topics in the field of computer science. Of special interest to the theme is the speaker's positive stance on how successful economic rationality can be in real-life MAS applications.
Yuri Ermoliev (IIASA, Vienna) is one of the founders of the area of stochastic optimization, and in recent years he has used multi-agent systems for simulating decisions under risk. He is perhaps best known for his pioneering work in economics, some of which was done with Brian Arthur and is now widely accepted. Even though this work questioned some of the foundations of economics, the speaker's view on the promise of crossing economics with computer science is not entirely negative. He is more sceptical, however, about the coupling of physics and MAS as a guarantee of rational agent behaviour. Cristiano Castelfranchi (CNR Institute, Rome) should be well known to most readers of this volume. For about ten years he has stressed the importance of studying the social aspects of agent communication and behaviour in MAS. In recent years he has expressed doubts about the hopes that many researchers hold for formal theories, notably game theory, in governing and explaining rational behaviour in MAS. His criticism extends to overconfidence in the concept of economic rationality, and so in a sense completes the picture of the invited speakers' views.
Papers

Labelling papers can be difficult, but classifying papers for a scientific meeting can greatly assist the attendees. Unfortunately, it can also create confusion by clustering papers unnaturally, which can negatively affect the composition of sessions. Below, therefore, papers are briefly introduced in what is hoped is a logical order, without trying to cluster them. Fifteen papers were chosen from a total of 51 submitted papers, a further fifteen of which were selected for poster presentation (not included here). One of the purposes of the theme was to try to clarify the importance of game theory to formal representation of autonomous agents in multi-agent systems: if there is a gap between the theory and its applications, how can this gap be bridged? In this light, "Multiagent coordination in antiair defense: a case study" must be classified as advocating the classical game-theoretic approach. The work could be classified as planning for autonomous agents, where the latter have a model of the other agents. Noh and Gmytrasiewicz base their studies of optimal behaviour on payoff matrices, and their model extends the matrices by probability distributions, allowing for PMEU to be applied. The paper "A service-oriented negotiation model between autonomous agents" follows a similar path, although the presentation and emphasis of the two papers vary considerably. Where Noh and Gmytrasiewicz focus on planning, Sierra, Faratin, and Jennings choose to focus on negotiations. They provide evidence that the game-theoretic assumptions can be extended without violating the tractability of the basic questions. This extension is more or less analogous to introducing subjective probability. The authors develop a formal model of negotiation with particular concern for tactics. By weighting the tactics and applying a variant of PMEU, agents are guided through the space of possible actions that ultimately define contracts with other agents. PMEU appears in its classical form in "Norms as constraints on real-time autonomous agent action" by Boman, although probabilities are augmented by credibility and reliability
reports concerning other agents in the MAS. The normative advice given by a decision module evaluating subjective agent reports can be overridden by norms, classified here into three types, acting as constraints on actions. Hence, rational behaviour in the group is not determined by PMEU alone, but by PMEU in conjunction with the set of relevant norms, using an efficiently implemented algorithm which is part of a more ambitious anytime algorithm realising the evaluation model. Dragoni, Giorgini, and Baffetti also use credibility and reliability assessments in their paper "Distributed belief revision vs. belief revision in a multi-agent environment: first results of a simulation experiment". Instead of computing the group utility of actions, however, the authors let agents hold elections. These elections do not affect the individual agent assessments. Since the authors strive to keep the agent theories consistent, they face computational problems, such as when computing maximally consistent subsets of revised knowledge bases. Another computationally hard problem is the application of Dempster's combination rule (a non-classical rule), which effectively makes simulations the appropriate tool for strategy evaluation. Whether or not domain competence is a necessary condition for optimal behaviour has been debated for over 40 years. In his paper "Multi-agent coordination by communication of evaluations", de Jong investigates the extent to which local experts may direct other agents, whose competence is less dependent on the domain. In spite of the similarity with respect to the aim and the focus on coordination, the paper is very different from the paper by Noh and Gmytrasiewicz. Where those authors use game theory, de Jong uses so-called coordination signals that simulate real-life encounters by scalar numbers. The paper is also decidedly more related to machine learning than to planning. In "Causal reasoning in multi-agent systems", Chaib-draa recommends a relatively coarse scale for utilities, producing a model that can only give partial advice. The author prefers to stress the importance of graphically depicting cognitive maps (here called causal maps) of agent decision situations. It is further argued that the recursive modelling problem, tackled in the paper by Noh and Gmytrasiewicz, is of little practical importance in this context. This can perhaps be explained by the fact that the paper follows the AI tradition of centering on the representation of reasoning and leaving evaluation in the periphery. The same is true for "The reorganization of societies of autonomous agents", in which agents migrate between simple societies composed only of agents. Agents assume roles in their society, and are motivated to migrate by utility values pre-assigned to them as well as to the societies with which they interact. Glaser and Morignot argue that each society then establishes conventions, which encourage certain agents and punish others. Each agent also possesses four kinds of competence, which the agent itself quantifies. Its individual competence suggests its role in a particular society. The paper "Adaptive selection of reactive/deliberate planning for the dynamic environment" mixes reactive and deliberate planning with the purpose of controlling a real-world system for robot vision. Similar attempts at a synthesis between these two fundamentally different approaches were made earlier in the area of Artificial Life.
The real-time application described by Kurihara, Aoyagi, and Onai is treated in part with methods developed within parallel programming, and in part with purely multi-agent methods.
The paper "Distributed problem-solving as concurrent theorem proving" by Fisher and Wooldridge shares this connection to parallel processing. In particular, the authors' vectorized approach to theorem proving is close to research usually carried out within concurrent constraint programming. As anyone familiar with different kinds of resolution knows, the operational choices made at runtime between different rules, such as unit resolution and hyperresolution, have a significant impact on computational efficiency. This fact has turned the question into a classic AI problem, crucial to distributed planning, for example, which the authors now relate to MAS techniques. During the last decade, parts of the theory of information systems have been merged successfully with parts of the theory of knowledge-based systems, and later that of agent-based systems. In his paper "Commitments among autonomous agents in information-rich environments', Singh takes this proposed merger further by investigating the connection between the extended transaction model and the organisational aspects of MAS inspired by sociology. Singh focuses on commitments and argues that the information systems community should take organisational structures more seriously, while the MAS community should perhaps take them less seriously. Specifically, the anthropomorphic interpretation of beliefs as mental states is shown to be unnecessary in the context of co-operative information systems. In "Making a case for multi-agent systems", Ygge and Akkermans investigate in depth Huberman and Clearwater's conclusion regarding the appropriateness of a MAS alternative to classical engineering methods. In comparison to the three previous papers, this position paper is more explicit in its connection to universally accepted non-MAS approaches. Impeccable from a methodological standpoint, the paper is a reminder of the careful analysis and the comparison to traditional methods that MAS methods have to withstand. An approach to distributed problem-solving with a simple form of learning is explained in "Learning network designs for asynchronous teams" by Baerentzen, Avila, and Talukdar. The very difficult problem of using historical information for solving new problem instances is attacked here by representing the problem using a form ofprobabilistic automata. The problem is further complicated by the fact that (only partly autonomous) agents make observations of earlier processes of problem solving. A full programming language environment for multi-agent applications is introduced by Joosen, Bijnens, Matthijs, Robben, Van Oeyen, and Verbaeten in "Building multi-agent systems with CORRELATE". Based on concurrent OOP, the language was developed mainly for agent-oriented programming. The long-term goal of the research project is to provide a platform for several such languages. Techniques for representing the dynamics of models include dynamic logic, cybernetics, and conceptual modelling. The representation of state in conceptual models is traditionally augmented with a temporal transition relation that captures the dynamics of the universe of discourse. In "Modelling an agent's mind and matter", Jonker and Treur extend this with a complete set of temporal rules. The authors connect to MAS by representing the lifespan between the birth and death of multiple agent objects. 
What is unique about their approach is that the authors break up the traditional duality that in conceptual modelling separates the object level from the meta level, and that in logic forms the basis for the division into syntax and semantics. The mind of the agent is instead seen as materialised, and its physical features modelled separately from the model of the physical world. Castelfranchi and Falcone cover many topics already mentioned in their paper "Delegation conflicts". They use the mind metaphor, as did Jonker and Treur. They discuss the trade-off between autonomy and obedience, as did Boman. They also discuss roles, as did Glaser and Morignot. The authors claim that an analytical theory of delegation and adoption is central to MAS, and to cooperation in particular. Delegation is explained in terms of adopting another agent's plan, and such adoption is argued to be rational also from the point of view of an artificial agent. The paper pays special attention to the possible conflicts resulting from such delegation. We note with pleasure (and not without surprise) that most of the papers are closely related to the theme, in spite of the fact that no measures were taken in the reviewing process to secure this. This indicates that problems associated with multi-agent rationality lie at the heart of MAS.
Acknowledgements

We would like to thank those who were involved in the previous MAAMAW (documented in "Agents Breaking Away", edited by Van de Velde and Perram, Volume 1038 of this series) for sharing their experiences with us. Thanks are due to Helène Karlsson, who assisted us considerably by keeping tabs on authors and papers. We also thank our sponsors Sydkraft, ABB Network Partner, and the City of Ronneby. Last but not least, MAAMAW'97 could not have happened without the exceptional effort put in by the organisational chair in Ronneby, Staffan Hägg, and his staff and student volunteers. The program committee consisted of the following 26 researchers who, sometimes assisted by helpful associates, ensured that each submitted paper was given at least three reviews.

Magnus Boman (Stockholm Univ/RIT, Sweden)
John Campbell (University College, London, UK)
Cristiano Castelfranchi (Ist di Psicologia del CNR, Rome, Italy)
Helder Coelho (INESC, Technical Univ, Lisbon, Portugal)
Anne Collinot (LAFORIA-IBP, Paris, France)
Yves Demazeau (LEIBNIZ/IMAG, Grenoble, France)
Aldo Dragoni (Univ di Ancona, Italy)
Love Ekenberg (IIASA, Vienna, Austria)
Jacques Ferber (LIRMM, Montpellier, France)
Francisco Garijo (Telefonica, Madrid, Spain)
Marie-Pierre Gleizes (SMI, Toulouse, France)
Rune Gustavsson (IDE, Karlskrona/Ronneby, Sweden)
Nick Jennings (Queen Mary & Westfield Coll, London, UK)
Wouter Joosen (Katholieke Univ, Leuven, Belgium)
George Kiss (Open Univ, Milton Keynes, UK)
Judith Masthoff (Philips Research, Eindhoven, Netherlands)
Jean-Pierre Müller (IIIA, Neuchâtel, Switzerland)
Jürgen Müller (Univ Bremen, Germany)
Eugenio Oliveira (Univ do Porto, Portugal)
John Perram (Odense Univ, Denmark)
Jeffrey Rosenschein (Hebrew Univ, Jerusalem, Israel)
Donald Steiner (Siemens AG, Munich, Germany)
Kurt Sundermeyer (Daimler Benz AG, Berlin, Germany)
Jan Treur (Free Univ, Amsterdam, Netherlands)
Walter Van de Velde (Vrije Univ Brussel, Belgium)
Peter Wavish (Philips Research, Redhill, UK)
Market-Aware Agents for a Multiagent World Michael P. Wellman University of Michigan Ann Arbor, MI USA
[email protected] http://ai.eecs.umich.edu/people/wellman/
Abstract. The title of this aptly named workshop envisions a world populated by numerous (presumably artificial) agents, acting and interacting autonomously, producing behaviors of complexity beyond our means to predict - hence the need for the modeling effort called for in MAAMAW's first "M". For the past several years, my research group has been exploring the idea of constructing engineerable multiagent worlds based on economic principles. In this "market-oriented programming" approach, we solve multiagent decision problems by (1) casting them in terms of assigning resources to production and consumption activities of the constituent agents, and (2) running the agents within a computational market price system to determine an equilibrium allocation. To date, we have tested this approach with applications to simple problems in transportation planning, distributed engineering design, and network information services. Current work is developing more complex models in these domains, as well as investigating further applications in allocation of computational resources, and provision of distributed information services in a digital library. In this talk, I provide an overview of the approach, including some underlying economic theory and some overlying computational issues. Examples from models we have developed illustrate fundamental concepts of the methodology, including: competitive vs. strategic behavior, intertemporal allocation through futures markets, and representing uncertainty through contingent goods. I will also discuss some new infrastructure for market-oriented programming: the Michigan Internet AuctionBot is a configurable auction server implementing market-based negotiation over the World-Wide Web. Users create auctions by filling in web forms specifying their desired auction characteristics, and the AuctionBot takes over: accepting bids, calculating prices and matches according to the specified auction rules, and notifying bidders of the results. The AuctionBot is now operating at: http://auction.eecs.umich.edu/
Learning and Adoption Versus Optimization Y. M. Ermoliev IIASA, A-2361 Laxenburg [email protected]
Abstract. The aim of the talk is to discuss the role of stochastic optimization techniques in designing learning and adaptive processes. Neural and Bayesian networks, path-dependent adaptive urn processes, automata learning problems, and agent-based models are considered. A unified general framework of stochastic optimization is proposed, enabling us to derive various known and many other adaptive procedures for such different classes of models. We emphasize similarities with natural evolutionary processes, but at the same time we show that this similarity may be misleading when we deal with man-made systems. The "particles" (economic agents, enterprises, countries) of such systems do not follow strong laws like the laws in mechanics and physics (for instance, the law of gravity). Economic "particles" have the flexibility to choose different behavioral patterns (policies, decisions). Uncertainty is a key issue in the modeling of anthropogenic systems, and the main methodological challenge is to address related uncertainties explicitly within overall risk-based decision-making processes. Purely myopic, trial-and-error approaches may be expensive, time consuming and even dangerous because of the irreversible nature of decisions. The decisive feature of man-made systems is their ability to anticipate and affect possible future outcomes. Approaches facilitating our ability for making decisions in the presence of uncertainties and related risks are discussed.
Limits of Strategic Rationality for Agents and M-A Systems Cristiano Castelfranchi National Research Council - Institute of Psychology Division of "AI, Cognitive Modelling, and Interaction" PSCS-Social Simulation Project Viale Marx, 15 - 00137 Roma - ITALY tel +39 6 860 90 518 / 82 92 626 fax +39 6 82 47 37 E-mail: [email protected]
Abstract. While AI's adoption of the game-theoretic paradigm is found to be well motivated, it is shown to suffer from basic limits for modelling autonomous agents and MA systems. After a brief re-statement of game theory's role for DAI and MAS (e.g. the introduction of formal prototypical social situations ("games"), the use of formal and sound notions, a self-interested view of autonomous agents, etc.), a number of criticisms that have an impact on modelling intelligent social/individual action are examined: the economicist interpretation of rationality; its instrumentalist conception, which leaves implicit the ends of agents' choices; the consequent multiple equilibria allowed by the theory; the context-unboundedness of rationality (here, some contributions towards a more heterarchic, context-bounded architecture of rational agents are made, and a goal-based strategy, as distinct from a strictly utilitarian principle of decision-making, is proposed); and its troubles with multi-agent systems. Finally, some limits inherent to the notion of "incentive engineering" are pointed out.
Multiagent Coordination in Antiair Defense: A Case Study* Sanguk Noh and Piotr J. Gmytrasiewicz Department of Computer Science and Engineering University of Texas at Arlington Arlington, TX 76019, Box 19015 {noh, piotr}@cse.uta.edu Office: (817)272-3399, 272-3334, Fax: (817)272-3784
Abstract. This research addresses rational decision-making and coordination among antiair units whose mission is to defend a specified territory from a number of attacking missiles. The automated units have to decide which missiles to attempt to intercept, given the characteristics of the threat, and given the other units' anticipated actions, in their attempt to minimize the expected overall damages to the defended territory. Thus, an automated defense unit needs to model the other agents, either human or automated, that control the other defense batteries. For the purpose of this case study, we assume that the units cannot communicate among themselves, say, due to an imposed radio silence. We use the Recursive Modeling Method (RMM), which enables an agent to select his rational action by examining the expected utility of his alternative behaviors, and to coordinate with other agents by modeling their decision-making in a distributed multiagent environment. We describe how decision-making using RMM is applied to the antiair defense domain and show experimental results that compare the performance of coordinating teams consisting of RMM agents, human agents, and mixed RMM and human teams.
1 Introduction
This paper describes rational decision-making and rational coordination among the antiair defense units facing a missile attack. The task of automated defense units is to defend a specified territory and to coordinate their attempts to intercept the attacking missiles, given the characteristics of the threat, and given what they can expect of the other defense units. Our approach to this coordinated decision-making problem is based on the assumption that the task of each of the defense units is to minimize the overall damages to the attacked territory. Under a realistic threat situation, friendly defense units cannot expect to have an advanced knowledge of the character of the incoming attack. It is, therefore, crucial that each of the defense units make * This research has been sponsored by the Office of Naval Research Artificial Intelligence Program under contract N00014-95-1-0775, and by a research initiation grant from the CSE Department of the University of Texas at Arlington.
a decision as to which incoming threat to intercept with an available interceptor by analyzing all potential threats acquired from the radar. In such cases, coordination requires an agent to recognize the current status, and to model the actions of the other agents to decide on his own next behavior. Since in any realistic combat situation the integrity of the defense team cannot be guaranteed, and even the very existence of the other friendly units cannot be counted on, relying on pre-established coordination protocols can be suboptimal or even dangerous. Therefore, our approach is that each unit is to independently decide on and execute his action, and that coordination among the units is to emerge on-the-fly as the result of the units' individual rational actions. We begin by formulating the antiair defense as a decision-theoretic problem from the point of view of an individual defense unit. As we mentioned, the objective is to minimize damages. Since the classical notion of a symbolic goal doesn't provide a sufficient basis for choice of action in uncertain situations [13], we need the attributes quantifying the quality of choices in the design of decision-making procedures. First, each attacking missile has its threat value. We compute the missile's threat considering such attributes as the altitude of the missile and the size of its warhead. Further, the defense units should consider the probability with which their interceptors would be effective against each of the hostile missiles. Based on these attributes combined, each unit has to determine the optimal action from his probabilistic decision model. For the purpose of coordinated decision-making in a multiagent environment, our research uses the Recursive Modeling Method (RMM), previously reported in [2, 3]. RMM enables an agent to model the other agents and to rationally coordinate with them even if no protocol or overall plan can be established explicitly in advance. Using RMM as a decision-making tool, an agent rationally selects his action under uncertainty, guided by the principle of expected utility maximization. We expect RMM to be appropriate to the antiair defense domain, since the coordinating units have to be able to react to threats in previously unforeseen circumstances. These can include the changing nature of the attack, other defense units being shot at and possibly incapacitated, communication lines broken down, sudden need for complete radio silence, and so on. In these unexpected conditions, relying on a globally consistent view of the situation, achieved by communication or by pre-established protocols, is unrealistic or likely to lock the agents into suboptimal forms of behavior. Further, by being competent decision makers and able to rationally model the action of other agents, RMM agents can effectively interact with human-controlled units, in spite of a lack of a predetermined protocol that a human would have to learn and follow. In the following sections, this paper explores the overall objective of the antiair defense scenario. To compute the expected utilities of alternative actions, we elaborate on a set of attributes describing the relevant features of the problem domain. Then, we explain how RMM leads an agent to subjective decision-theoretic optimality with a concrete example of coordination in our domain, and we discuss the experimental results. In conclusion, we summarize our results and further research issues.
2 Antiair Defense Environment
Let us consider a situation depicted in Fig. 1. This scenario has six incoming missiles and two defending units in a 20 by 20 grid world. Each of the defense units independently decides to launch interceptors against the incoming missiles in the absence of communication. The incoming missiles keep going straight top-down and attack the overall ground site on which the units are located.
[Figure: missiles, an interceptor, and the defense units on the grid.]
Fig. 1. The antiair defense scenario.
2.1 Attribute Analysis
One of the major problems encountered in the decision-making process is how to model preferences and utilities. For the purpose of formalizing the decision-making problem of minimizing damages, we first consider the attributes that influence the units' decision-making. Each missile has its intrinsic threat value. In our model, threat is evaluated by considering the altitude of a missile and its warhead size. Intuitively, a defense battery should give priority to the missile which is closer to the ground and bigger than the others. An explosion of a missile close to the ground results in a more powerful blast, which increases the damages. Further, the measure of damage is proportional to the size of a missile's warhead. We calculate the missile threat, MT, using the following formula:
MT_n = W_n × (1/A_n)    (1)

where
- W_n: the warhead size of missile n
- A_n: the altitude of missile n

A factor that does not enter into the missile threat calculation, but one that should be considered when an intercept decision is made, is the probability that an interceptor would be successful against a given missile. This probability is assumed to depend on the angle between a missile's direction of motion and the battery's line-of-sight, and is maximized when this angle is 0, as follows:

P(HIT_ij) = e^(−μ·γ_ij)    (2)

where
- γ_ij: the angle between battery i's line-of-sight and missile j's direction of motion; thus γ_ij = tan⁻¹α − tan⁻¹β, such that 0 ≤ γ_ij ≤ 90
- α: the slope of missile j's direction of motion
- β: the slope of the line-of-sight with which battery i aims at missile j
- μ: an interceptor-specific constant (assumed here to be 0.01)
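For concreteness, formulas (1) and (2) translate directly into code. The following is a minimal sketch in Python; the function names and the degree-valued angle argument are our own choices, not part of the paper.

```python
import math

def missile_threat(warhead, altitude):
    """Formula (1): MT_n = W_n * (1 / A_n)."""
    return warhead * (1.0 / altitude)

def hit_probability(gamma_deg, mu=0.01):
    """Formula (2): P(HIT_ij) = exp(-mu * gamma_ij).

    gamma_deg is the angle (in degrees, 0..90) between battery i's
    line-of-sight and missile j's direction of motion; mu = 0.01 is the
    interceptor-specific constant assumed in the paper.
    """
    return math.exp(-mu * gamma_deg)
```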
We will use the values of missile threat and the hit probability to develop the decision-theoretic assessment of the agent's alternative plans of action in an antiair defense environment.
2.2 Planning and Execution Cycle
In the domain in Fig. 1, the defense units, {Battery1, Battery2}, form a set of planning and executing agents. The targets, {MissileA, MissileB, MissileC, MissileD, MissileE, MissileF}, have no plans of their own, and are assumed not to make any decisions. The ordered actions (Scan-Area, Select-Target, Launch-Interceptor) available to agents are repeatedly used to achieve the subgoal of intercepting one of the attacking missiles during the overall plan-action cycle. As we mentioned, there is no notion of a symbolic goal in this planning. Instead, the goal of minimizing the damages is represented as a quality measure assigned to plans, which is then used to coordinate plans among multiple agents. In this case study, our work addresses the rationally coordinated target selection, i.e., the Select-Target step, providing the best defense strategy that can be implemented by independent defense units.
3 Decision-Theoretic Agent
To be rational in the decision-theoretic sense, the agents follow the principle of maximum expected utility (PMEU) [10]. In this section, we will show how PMEU can be implemented in this case study using the Recursive Modeling Method (RMM). RMM [2, 3] will be used to model the other agent, and to select the most appropriate missile to intercept by a given defense battery.
3.1 An Example Scenario
Our approach is to take the agent-oriented perspective. In the example scenario (Fig. 1), we view the decision-making through the eyes of an individual defense unit, Battery1, and his radar-acquired data.¹ Fig. 2 depicts the information acquired in the example scenario by Battery1 for the missiles A through F. In Fig. 1, the left top corner of the screen is (0,0), x is pointing right, and y is pointing down. Applying formulas (1) and (2) to the acquired data, Battery1 can compute the relevant attributes of altitude, warhead size, and the angle that determines the hit probability. Battery1 also generates the expected hit probabilities for Battery2, assuming his hypothetical intercepting actions. The results are summarized in Fig. 2.

¹ Battery2 acquires the information about the environment from his point of view. Battery1 and Battery2 independently maintain their knowledge bases.
Data acquisition:
Missile (warhead, position): A: (470, (3,3)); B: (410, (5,6)); C: (350, (9,2)); D: (370, (12,5)); E: (420, (16,3)); F: (450, (18,6))
Location (position): Battery1: (7,20); Battery2: (13,20)

Attributes, computed via (1) and (2):
MT: MT_A = 27.65; MT_B = 29.29; MT_C = 19.44; MT_D = 24.67; MT_E = 24.71; MT_F = 32.14
P(HIT), Battery1: P(HIT_1A) = 0.88; P(HIT_1B) = 0.92; P(HIT_1C) = 0.94; P(HIT_1D) = 0.83; P(HIT_1E) = 0.76; P(HIT_1F) = 0.68
P(HIT), Battery2: P(HIT_2A) = 0.74; P(HIT_2B) = 0.74; P(HIT_2C) = 0.88; P(HIT_2D) = 0.96; P(HIT_2E) = 0.90; P(HIT_2F) = 0.82

Fig. 2. Radar data acquired by Battery1 for the missiles A through F.
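The printed values can be reproduced from the raw radar data. The sketch below assumes that a missile's altitude is its vertical grid distance to the batteries' row (y = 20) and that, since missiles fly straight down, γ is simply the line-of-sight angle measured from the vertical; both conventions are our inferences, but they recover the figures above (e.g. MT_A = 470/17 = 27.65 and P(HIT_1A) ≈ 0.88).

```python
import math

# Raw radar data from Fig. 2: warhead size and (x, y) grid position.
missiles = {'A': (470, (3, 3)), 'B': (410, (5, 6)), 'C': (350, (9, 2)),
            'D': (370, (12, 5)), 'E': (420, (16, 3)), 'F': (450, (18, 6))}
batteries = {1: (7, 20), 2: (13, 20)}

for name, (warhead, (mx, my)) in missiles.items():
    altitude = 20 - my  # assumed: vertical distance to the battery row
    print(f"MT_{name} = {warhead / altitude:.2f}")          # formula (1)

for b, (bx, by) in batteries.items():
    for name, (_, (mx, my)) in missiles.items():
        # Missiles move straight down, so the angle between their direction
        # of motion and the line-of-sight is measured from the vertical.
        gamma = math.degrees(math.atan2(abs(mx - bx), by - my))
        print(f"P(HIT_{b}{name}) = {math.exp(-0.01 * gamma):.2f}")  # formula (2)
```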
3.2 Generation of the Payoff Matrix
Commonly, a decision problem is represented by a version of belief network [6]. Poh and Horvitz [7] use an influence diagram that includes a decision variable, with values ranging over the possible decisions, chance variables that represent the uncertainty of the domain, and a utility node. In our work, we rely on the payoff matrix representation, used in game theory. Payoff matrices, while different from belief nets, can be seen to faithfully summarize the information contained in belief nets by listing the expected payoffs (obtained from the utility node) of possible decisions, depending on the parameters describing the domain (chance nodes). The expected payoffs, corresponding to batteries' attempting to intercept the respective missiles, can be expressed as a combination of the threat of the missiles and the probability of their interception. For example, if Battery1 is faced with n missiles at some state, and he targets a missile j, the resulting threat will be reduced by the missile threat MT_j multiplied by the probability of successful interception P(HIT_1j). If both batteries target missiles at the same time, the reduction of threat, and therefore the total payoff, is equal to the sum of the threats that each of them removes.
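A sketch of this payoff construction, using the Fig. 2 values: for distinct targets the removed threats simply add, and for a shared target we assume the missile is removed with the combined probability that at least one interceptor hits. The shared-target rule is our inference from the printed matrix, but it reproduces its entries (e.g. 26.8 when both batteries attack A, and 46.0 for the pair (A, B)).

```python
import numpy as np

# Missile threats and hit probabilities from Fig. 2.
mt = {'A': 27.65, 'B': 29.29, 'C': 19.44, 'D': 24.67, 'E': 24.71, 'F': 32.14}
p_hit = {1: {'A': 0.88, 'B': 0.92, 'C': 0.94, 'D': 0.83, 'E': 0.76, 'F': 0.68},
         2: {'A': 0.74, 'B': 0.74, 'C': 0.88, 'D': 0.96, 'E': 0.90, 'F': 0.82}}
targets = 'ABCDEF'

def payoff(i, j):
    """Expected threat reduction when Battery1 attacks i and Battery2 attacks j."""
    if i == j:
        # Shared target: removed if at least one interceptor hits (assumption).
        return mt[i] * (1 - (1 - p_hit[1][i]) * (1 - p_hit[2][i]))
    return mt[i] * p_hit[1][i] + mt[j] * p_hit[2][j]

# Battery1's payoff matrix (rows: Battery1's target, columns: Battery2's).
M1 = np.array([[payoff(i, j) for j in targets] for i in targets])
```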
3.3 Modeling Other Agent - Recursive Model Structure
In order to solve his decision-making situation, described by the payoff matrix above, Battery1 needs to hypothesize the likely actions of Battery2. In the Recursive Modeling Method, the actions of the other rational agents are anticipated using a model of their decision-making situation. If it were known that Battery2 is a rational decision-maker, then Battery2's decision-making situation could be modeled as a payoff matrix as well. In our case study, we considered a more realistic situation in which it is not known for sure that Battery2 is rationally maximizing the payoff as well. For example, it could be that Battery2 has been damaged or otherwise incapacitated, in which case there is no information as to what action he would undertake. Thus, there are two alternative models that Battery1 can use to model Battery2; one has the form of the payoff matrix, and the other one contains no information about Battery2's action. We call the latter model the No-info model. In RMM, each of the alternative models is assigned a probability indicating the likelihood of its correctness. Further, in case Battery2 is not incapacitated, it is likely that Battery2 is modeling Battery1 as well, in his own attempt to coordinate with Battery1. This leads to the nesting of models in RMM. Since, in our scenario, Battery2 also may be uncertain whether Battery1 is not damaged, there are two alternative models on this level of modeling as well. One of the models Battery2 can have of Battery1 is that of a rational maximizer, while the other one is unknown and again labeled as a No-info model. The resulting hierarchy of models, which we call the recursive model structure, terminates with a No-info model when the agent (in this case Battery2) runs out of modeling information. Fig. 3 is Battery1's model structure of depth three for the example scenario in Fig. 2. To summarize, level 1 in the recursive model structure represents the way that Battery1 observes the situation to make his own decision, shown as Battery1's payoff matrix. Level 2 depicts the models Battery1 has of Battery2's situation, and level 3 contains the models that Battery1 anticipates Battery2 may be using to model Battery1. The recursive modeling could continue into deeper levels, but in this case we assumed that the batteries have no further information. In other words, we are examining the reasoning of Battery1 in the particular case, when equipped with a finite amount of information about Battery2, and nested to the third level of modeling.²

² We do not elaborate on the important issues of learning and belief revision here. Thus, we analyze decision-making given a pre-existing state of knowledge, but we do not detail how the models of the other agents were obtained. For our current work on learning and model formation see [4].
Level 1 (Battery1's payoff matrix; rows: Battery1's target, columns: Battery2's target):

       A     B     C     D     E     F
A    26.8  46.0  41.4  48.0  46.6  50.6
B    47.4  28.7  44.2  50.7  49.4  53.4
C    38.6  40.0  19.3  42.0  40.6  44.7
D    41.0  42.3  37.7  24.5  42.9  46.9
E    39.1  40.4  35.9  42.4  24.1  45.1
F    42.3  43.7  39.1  45.7  44.3  30.3

Level 2 contains two alternative models of Battery2: Battery2's payoff matrix (the same payoffs with the batteries' roles exchanged) and the No-info1 model, represented by the set of sampled probability distributions [1, 0, 0, 0, 0, 0], ..., [1/6, 1/6, 1/6, 1/6, 1/6, 1/6], ..., [0, 0, 0, 0, 0, 1]. Level 3 contains the two models Battery2 may use of Battery1: Battery1's payoff matrix and the No-info2 model, again represented by the sampled distributions [1, 0, 0, 0, 0, 0], ..., [0, 0, 0, 0, 0, 1].

Fig. 3. The recursive model structure for Battery1.
In general, the No-info models have to differentiate among knowledge limitations of the different agents involved. For example, the situation in which Battery1 does not have any information about how he is modeled by Battery2 is different from the situation in which Battery1 knows that Battery2 has no information about Battery1. No-info1 on level 2 in Fig. 3 represents the fact that Battery1 has no information on how Battery2 models Battery1's actions if Battery1 is incapacitated. No-info2 on level 3 represents Battery1's belief that Battery2 has no information about how Battery1 models Battery2's actions, if Battery1 is not incapacitated. Since the modeling structures, such as the one in Fig. 3, express the optimal choices in the agents' decision-making situations recursively, depending on the choices of the other agents, one can use dynamic programming to solve them in a bottom-up fashion. We elaborate on the solution procedure in more detail in [2], in which we have also outlined an analytical method of dealing with the No-info models. For the purpose of this case study, however, we have designed and implemented a numerical method of solving the No-info models using logical sampling [10], according to the algorithm below. Let A_i = {a_i^1, a_i^2, ..., a_i^n} be the set of actions available to agent i, and P_i = (p_i^1, p_i^2, ..., p_i^n) be the probability distribution over the available actions, i.e., the conjecture as to agent i's action. The conjecture, P_i, is calculated based on frequencies, F_i = (f_i^1, f_i^2, ..., f_i^n), with which each action turns out to be optimal,
as follows:

Procedure No-info-PDF
Input: the payoff matrix M_i of agent i.
Output: the probability distribution, P_i, over the actions of agent i.
begin
  F_i ← (0, 0, ..., 0)
  N ← 0  // the total number of probability distributions
  for each probability distribution, P_NI, in the set of sampled probability distributions comprising the No-info model:
    Multiply the probability distribution P_NI by M_i.
    Select an action a_i^k which has the maximum expected utility.
    f_i^k ← f_i^k + 1
    N ← N + 1
  end for
  P_i ← (f_i^1/N, f_i^2/N, ..., f_i^n/N)
  return P_i
end

Using the sampling algorithm, with the sampling density of 0.1, on the third level of the recursive model structure reveals that the probability distribution over Battery1's actions becomes [0.24, 0.71, 0.0, 0.01, 0.0, 0.04] for interception of missiles A through F, respectively. This probability distribution summarizes Battery1's knowledge as to Battery2's expectations, if Battery2 is not incapacitated, about the actions of Battery1, if Battery2 thinks that Battery1 is not incapacitated. The dynamic programming bottom-up solution then proceeds to level 2. The above distribution over the actions of Battery1 is combined with each individual probability distribution of No-info1 on level 2, after each distribution is multiplied by weights 0.8 and 0.2, respectively. The sampling No-info-PDF procedure is invoked again, resulting in [0.0, 0.0, 0.0, 0.03, 0.0, 0.97] over Battery2's actions of intercepting the missiles A through F, respectively. Thus, in spite of uncertainties on the third level of modeling, Battery1's knowledge is enough to expect that, if Battery2 is rational, it will most likely attempt to intercept MissileF. Propagating these results into level 1, the probability distribution describing Battery2's actions is obtained as the combination (0.8 × [0.0, 0.0, 0.0, 0.03, 0.0, 0.97] + 0.2 × [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]), which results in [0.03, 0.03, 0.03, 0.06, 0.03, 0.82]. Now, we can compute the expected utilities of Battery1's alternative behaviors as follows:
U_A = 0.03 × 26.8 + 0.03 × 46.0 + 0.03 × 41.4 + 0.06 × 48.0 + 0.03 × 46.6 + 0.82 × 50.6 = 49.20
U_B = 0.03 × 47.4 + 0.03 × 28.7 + 0.03 × 44.2 + 0.06 × 50.7 + 0.03 × 49.4 + 0.82 × 53.4 = 51.92
U_C = 0.03 × 38.6 + 0.03 × 40.0 + 0.03 × 19.3 + 0.06 × 42.0 + 0.03 × 40.6 + 0.82 × 44.7 = 43.33
U_D = 0.03 × 41.0 + 0.03 × 42.3 + 0.03 × 37.7 + 0.06 × 24.5 + 0.03 × 42.9 + 0.82 × 46.9 = 44.85
U_E = 0.03 × 39.1 + 0.03 × 40.4 + 0.03 × 35.9 + 0.06 × 42.4 + 0.03 × 24.1 + 0.82 × 45.1 = 43.71
U_F = 0.03 × 42.3 + 0.03 × 43.7 + 0.03 × 39.1 + 0.06 × 45.7 + 0.03 × 44.3 + 0.82 × 30.3 = 32.67

Thus, if Battery1 is rational, he will attempt to maximize his own expected utility and prefer to intercept MissileB first, given that he expects it most likely that Battery2 will select MissileF simultaneously. According to the planning and execution cycle, the interceptor will be launched toward the selected missile, and the decision-making process will be repeated until there are no missile threats, or until Battery1 runs out of interceptors.
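The whole bottom-up solution can be sketched compactly. The following Python sketch implements the No-info-PDF procedure by enumerating the grid of probability vectors with density 0.1 and propagates the results up the structure of Fig. 3. Tie-breaking among equal expected utilities and the exact mixing convention at level 2 are our assumptions, so the sampled frequencies may differ slightly from the printed ones.

```python
from itertools import combinations
import numpy as np

# Battery1's payoff matrix from Fig. 3 (rows: Battery1, columns: Battery2).
M1 = np.array([[26.8, 46.0, 41.4, 48.0, 46.6, 50.6],
               [47.4, 28.7, 44.2, 50.7, 49.4, 53.4],
               [38.6, 40.0, 19.3, 42.0, 40.6, 44.7],
               [41.0, 42.3, 37.7, 24.5, 42.9, 46.9],
               [39.1, 40.4, 35.9, 42.4, 24.1, 45.1],
               [42.3, 43.7, 39.1, 45.7, 44.3, 30.3]])

def sampled_simplex(n, density=0.1):
    """All length-n probability vectors whose entries are multiples of
    `density` (the logical-sampling grid), enumerated via stars and bars."""
    steps = round(1 / density)
    for cuts in combinations(range(steps + n - 1), n - 1):
        prev, parts = -1, []
        for c in cuts:
            parts.append(c - prev - 1)
            prev = c
        parts.append(steps + n - 2 - prev)
        yield np.array(parts) / steps

def no_info_pdf(M, mixed_with=None, weight=1.0, density=0.1):
    """Procedure No-info-PDF: frequency with which each row action of M is
    optimal against the sampled conjectures, optionally mixing each sample
    with a propagated distribution (as done on level 2)."""
    freq = np.zeros(M.shape[0])
    n = 0
    for p in sampled_simplex(M.shape[1], density):
        conj = p if mixed_with is None else weight * mixed_with + (1 - weight) * p
        freq[np.argmax(M @ conj)] += 1
        n += 1
    return freq / n

# Level 3: conjecture over Battery1's actions
# (paper: ~[0.24, 0.71, 0.0, 0.01, 0.0, 0.04]).
p1 = no_info_pdf(M1)
# Level 2: Battery2's matrix is M1 transposed; mix with No-info1 at 0.8/0.2
# (paper: ~[0.0, 0.0, 0.0, 0.03, 0.0, 0.97]).
p2 = no_info_pdf(M1.T, mixed_with=p1, weight=0.8)
# Level 1: final conjecture about Battery2 and Battery1's expected utilities.
conj = 0.8 * p2 + 0.2 * np.full(6, 1 / 6)
print(M1 @ conj)   # paper: U_B = 51.92 is maximal, so intercept MissileB
```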
4 Experiments
The antiair defense simulator is written in Common LISP and built on top of the MICE simulator [1], running on a LINUX platform. In the experiments we ran, each of the two defense units could launch three interceptors, and both were faced with an attack by six incoming missiles. We put all of the trials under the following conditions. First, the initial positions of missiles were randomly generated and it was assumed that each missile must occupy a distinct position. Second, the performance assessments of agents with different policies were compared using the same threat situation. Third, the size of each missile was constant during all of the experiments and the warhead sizes were comparable. In this experiment, the warhead sizes were 470, 410, 350, 370, 420, 450 for missiles A through F, respectively. Further, each interceptor could intercept only one missile, and it was moving twice as fast as the incoming missile. Finally, although there was no communication between agents, each agent could see which threat was shot at by the other agent, and then used this information to make the next decision. Our experiments were aimed at determining the quality of modeling and coordination achieved by the RMM agents in a team, and when paired with human agents, when compared to other strategies. To evaluate the quality of the agents' performance, the results were expressed as (1) the number of targets the defense units attempted to intercept, and (2) the total expected damage to friendly forces after all six interceptors were launched. The total expected damage is defined as a sum of the residual warhead sizes of the attacking missiles. Thus, if a missile was targeted for interception, then it contributed (1 - P(HIT)) × Warhead_Size to the total damage. If a missile was not targeted, it contributes all of its warhead size value to the damage. The target selection strategies are as follows:
- Random: selection randomly generated.
- Independent, no modeling: selection of arg max_j {P(HIT_ij) × MT_j} for agent i.
- Human³: selection by human.
- RMM: selection by RMM.

³ We should remark that our human subjects were simply CSE and EE graduate students who were informed about the criteria for target selection. We would expect that antiair specialists, equipped with a modern defense doctrine, could perform substantially better than our subjects. However, the defense doctrine remains classified, and was not available to us at this point.
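The total-expected-damage measure translates directly into code. A minimal sketch, assuming a mapping from each targeted missile to the P(HIT) of the interceptor shot at it (the helper name is ours):

```python
def total_expected_damage(warheads, targeted):
    """Sum of residual warhead sizes after all interceptors are launched.

    warheads: missile -> warhead size; targeted: missile -> P(HIT) for
    missiles that were shot at. Untargeted missiles count in full.
    """
    return sum(w * (1 - targeted.get(m, 0.0)) for m, w in warheads.items())

# Example with the experiment's warhead sizes, if only A and B are targeted:
warheads = {'A': 470, 'B': 410, 'C': 350, 'D': 370, 'E': 420, 'F': 450}
print(total_expected_damage(warheads, {'A': 0.88, 'B': 0.92}))  # 1679.2
```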
The random agents were included to provide the worst base-line case of the system performance in our experiments. The performance achieved by independent agents was included to show what coordination can be expected when agents maximize but do not model each other. It turned out that the ways that human agents chose a missile were varied and sometimes quite arbitrary. For example, some of our human subjects shot at only the 3 missiles coming to their own side by dividing the grid world into left and right sides. They sometimes had difficulties in splitting the screen when the missiles were clustered at the center area, which led to much duplicated effort. Others tended to choose missiles with the largest missile size. Still others tried to consider the multiplication of the missile size and the hit probability, but did not model the other agent appropriately.

4.1 Performance Assessments
We experimented with the above policies to understand the agent interactions in two groups: agent teams with the same policy, and mixed agent teams of different policies.
[Figure: cumulative number of selected targets vs. trials, with curves for the RMM, RMM-Human, RMM-Independent, Independent, Human, and RMM-Random teams.]
Fig. 4. The number of selected targets (averages over 10 runs).
As shown by Figs. 4 and 5, we found that the all-RMM team outperformed the human and independent teams. The average number of selected targets by RMM after 100 trials is 5.5, compared to 4.8 for the independent team, and 4.7 for the all-human team. Further, the RMM-controlled coordinated defense resulted in the total expected damage of 488, which is nearly half that of the independent team (732) and that of the all-human team (772). The human performance is very similar to the performance of independent agents. The most obvious reason for this is that humans tend to depend on their intuitive strategies for coordination, and, in this case, found it hard to engage in deeper, normative, decision-theoretic reasoning. The common reason for the lower score of human teams was the simultaneous choice of the greatest threat - they made redundant efforts to attack the same missile. This, again, suggests that human agents attempted to minimize their expected damage but did not model the decision-making of the other human very well, while RMM agents were rationally optimizing given what they expected of the other agent. The performance of the RMM team is not perfect, however, since the agents were equipped with limited and uncertain knowledge of each other.
[Figure: cumulative total expected damage (×10³) vs. trials, with curves for the Random, RMM-Random, Human, Independent, RMM-Independent, RMM-Human, and RMM teams.]
Fig. 5. The total expected damage (averages over 10 runs).
The performance of the heterogeneous teams again suggests the favorable quality of coordination of RMM agents. Comparing a heterogeneous team with a homogeneous team, the average number of selected targets for the RMM-Human team is 5.1, and for the all-human team it is 4.7; that of the RMM-Independent team is 4.98 vs. 4.8 for the Independent team; that of the RMM-Random team is 4.66, and 3.77 for the all-Random team. As shown by the above comparison, the recursive modeling allows the RMM agent to improve the performance of target selection when paired with any other agent. A particularly promising facet of these results is that the recursive model structure (Fig. 3) seems to be a robust mechanism for modeling and coordination with the human subjects, and not only among RMM agents. (For the demonstration of the antiair defense domain, refer to the Web page at http://dali.uta.edu/Air.html.)
5 Related Work
The approaches related to the modeling and coordination in the multiagent environment are in the area of multiagent plan recognition and plan coordination. In work on plan recognition [9, 12], the objective is to enable an agent to model or recognize the other agents through observation to anticipate the other agents' future action, given prior knowledge about these agents. In these approaches, an agent usually compares a pre-calculated plan, or a protocol, with the on-going situation, and then chooses his next action accordingly. The process of mental-state recognition [9] assumes the correct and complete knowledge of the plans of the other agents, and it does not represent uncertainty that might be present in real-world domains. As Tambe et al. [12] pointed out, ambiguities may persist when an agent must infer unobserved actions and intentions. Multiagent coordination without communication has been dealt with in [11] and [5]. Sen, in [11], used a particular reinforcement learning methodology and concentrated on learning classifier systems where agents share no problem-solving knowledge. However, the reinforcement learning method needs a super-agent for the external reward and requires numerous training cases to converge on the desirable entries. Mor et al. [5] handled the opponent's modeling within polynomial time complexity, based on game-theoretic techniques. They represented a strategy using a deterministic finite automaton, and viewed the learning process as a means to reach equilibrium.
6 Conclusion and Further Research
This paper presented a case study in modeling and coordination in a distributed multiagent environment of antiair defense. This investigation implies that RMM can be applied to high-level tactical decision making for minimizing the overall damages. It turned out that it was relatively simple to quantify the missile threats and the hit probability to generate fairly realistic payoff matrices. Based on threat evaluation, the RMM agents we implemented were able to model the other agents and to follow the principle of maximum expected utility. We anticipate that the approach presented will also be applicable to other decision problems that require modeling and coordination. The most promising conclusion of our case study seems to be the high quality of the coordinated decision-making achieved by the RMM agents. This, in turn, suggests that modeling other agents, including humans, as rational decision-makers is a promising and viable approach to achieving plan recognition and coordination. Also, the relatively high quality achieved by the mixed human/RMM team shows how a nested decision-theoretic modeling can enable human-machine coordination without pre-established protocols that the human would have to learn and follow during the interaction.
In other experiments, we also implemented RMM in the predator-prey pursuit environment. There, the payoff matrices the agents used are 5-dimensional. The quality of coordination among the RMM agents for this domain is slightly inferior to that of human agents (http://dali.uta.edu/Pursuit.html). Further, we applied Bayesian learning for belief update to the antiair defense domain, as well as the pursuit domain. In our future research we will investigate the communication method between agents using KQML and probe the value of decision-model refinement [8].
References
1. E. H. Durfee and T. A. Montgomery. MICE: A flexible testbed for intelligent coordination experiments. In Proceedings of the 1989 Distributed AI Workshop, pages 25-40, Sept. 1989.
2. P. J. Gmytrasiewicz. On reasoning about other agents. In Intelligent Agents II: Agent Theories, Architectures, and Languages, pages 143-155. Springer, 1996.
3. P. J. Gmytrasiewicz and E. H. Durfee. A rigorous, operational formalization of recursive modeling. In Proceedings of the First International Conference on Multi-Agent Systems, pages 125-132. AAAI Press/The MIT Press, 1995.
4. T. Kellogg and P. J. Gmytrasiewicz. Bayesian belief update in multi-agent systems. In preparation, 1996.
5. Y. Mor, C. V. Goldman, and J. S. Rosenschein. Learn your opponent's strategy (in polynomial time)! In G. Weiß and S. Sen, editors, Adaptation and Learning in Multi-Agent Systems - IJCAI'95 Workshop, Lecture Notes in Artificial Intelligence, pages 164-176. Springer, New York, 1996.
6. J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
7. K. L. Poh and E. J. Horvitz. A graph-theoretic analysis of information value. In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, 1996.
8. K. L. Poh and E. J. Horvitz. Reasoning about the value of decision-model refinement: Methods and application. In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence, pages 174-182. Morgan Kaufmann Publishers, Inc., July 1993.
9. A. S. Rao and G. Murray. Multi-agent mental-state recognition and its application to air-combat modelling. In Proceedings of the 13th International Distributed Artificial Intelligence Workshop, pages 283-304, 1994.
10. S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, New Jersey, 1995.
11. S. Sen and M. Sekaran. Multiagent coordination with learning classifier systems. In G. Weiß and S. Sen, editors, Adaptation and Learning in Multi-Agent Systems - IJCAI'95 Workshop, Lecture Notes in Artificial Intelligence, pages 218-233. Springer, New York, 1996.
12. M. Tambe and P. S. Rosenbloom. Architectures for agents that track other agents in multi-agent worlds. In Intelligent Agents II: Agent Theories, Architectures, and Languages, pages 156-170. Springer, 1996.
13. M. P. Wellman and J. Doyle. Preferential semantics for goals. In Proceedings of the National Conference on Artificial Intelligence, pages 698-703, July 1991.
A Service-Oriented Negotiation Model between Autonomous Agents
Carles Sierra*, Peyman Faratin, Nick R. Jennings
Dept. Electronic Engineering, Queen Mary and Westfield College, University of London, London E1 4NS, UK. {C.A.Sierra,P.Faratin,N.R.Jennings}@qmw.ac.uk
Abstract. We present a formal model of negotiation between autonomous agents. The purpose of the negotiation is to reach an agreement about the provision of a service by one agent for another. The model defines a range of strategies and tactics that agents can employ to generate initial offers, evaluate proposals and offer counter proposals. The model is based on computationally tractable assumptions and is demonstrated in the domain of business process management. Initial proofs about the convergence of negotiation are also presented.
1
Introduction
Autonomous agents are being increasingly used in a wide range of industrial and commercial domains [2]. These agents have a high degree of self-determination - they decide for themselves what, when and under what conditions their actions should be performed. In most cases, such agents need to interact with other autonomous agents to achieve their objectives (either because they do not have sufficient capabilities or resources to complete their problem solving alone or because there are interdependencies between the agents). The objectives of these interactions are to make other agents undertake a particular course of action (e.g. perform a particular service), modify a planned course of action (e.g. delay or bring forward a particular action so that there is no longer a conflict), or come to an agreement on a common course of action. Since the agents have no direct control over one another, they must persuade their acquaintances to act in particular ways (they cannot simply instruct them). The paradigm case of persuasion is negotiation - a process by which a joint decision is made by two or more parties. The parties first verbalise contradictory demands and then move towards agreement by a process of concession making or search for new alternatives (cf. [3]). Given its pervasive nature, negotiation comes in many shapes and forms. However, in this work we are interested in a particular class of negotiation -
* On sabbatical leave from Artificial Intelligence Research Institute (IIIA), Spanish Council for Scientific Research (CSIC), 08193 Bellaterra, Barcelona, Spain. sierra@iiia.csic.es. With the support of the Spanish Ministry of Education grant PR95-313.
namely service-oriented negotiation. In this context, one agent (the client) requires a service to be performed on its behalf by some other agent (the server).² Negotiation involves determining a contract under certain terms and conditions. The negotiation may be iterative in that several rounds of offers and counter offers will occur before an agreement is reached or the negotiation is terminated. When building an autonomous agent which is capable of flexible and sophisticated negotiation, three broad areas need to be considered [7]: what negotiation protocol will be used?, what are the issues over which negotiation takes place?, and what reasoning model will the agents employ? This paper concentrates predominantly on the final point, although the protocol and negotiation object are briefly defined. A comprehensive reasoning model for service-oriented negotiation should determine: which potential servers should be contacted, whether negotiation should proceed in parallel with all servers or whether it should run sequentially, what initial offers should be sent out, what is the range of acceptable agreements, what counter offers should be generated, when negotiation should be abandoned, and when an agreement is reached. This paper presents a formal account of a negotiating agent's reasoning component - in particular, it concentrates on the processes of generating an initial offer, of evaluating incoming proposals, and of generating counter proposals. The model specifies the key structures and processes involved in this endeavour and defines their inter-relationships. The model was shaped by practical considerations and insights emanating from the development of a system of negotiating agents for business process management (see [5] and Section 2 for more details). The main contributions of this work are: (i) it allows rich and flexible negotiation schemes to be defined; (ii) it is based on assumptions which are realistic for autonomous computational agents (see Section 3.2 for the set of requirements and Section 7 for a discussion of related approaches); and (iii) it presents some initial results on the convergence of negotiation (see Section 6). In this paper we concentrate on many-parties, many-issues, single-encounter negotiations in an environment of limited resources (time among them). Section 2 gives details of the type of applications and scenarios we are interested in. Sections 3 to 5 present the proposed model. Finally, some results on negotiation convergence and future avenues of work are outlined.
2
Service-Oriented Negotiation
This section characterises a context in which service-oriented negotiation takes place. The scenario is motivated by work in the ADEPT project [5], which has developed negotiating agents for business process management applications. However, we believe that the characteristics emerging from this domain are common to a wide variety of applications. To provide a detailed context for this work, a multi-agent
² A service is a problem solving activity which has clearly defined start and end points. Examples include diagnosing a fault, buying a group of shares in the stock market, or allocating bandwidth to transmit a video-conference.
system for managing a British Telecom (BT) business process is presented (section 2.1). This scenario is then analysed in terms of its key characteristics and assumptions as they relate to the process of negotiation (section 2.2).
2.1
BT's Provide Customer Quote Business Process
This scenario is based on BT's business process of providing a quotation for designing a network to provide particular services to a customer (figure 1).³ The overall process receives a customer service request as its input and generates as its output a quote specifying how much it would cost to build a network to realise that service. It involves up to six agent types: the sales department agent, the customer service division (CSD) agent, the legal department (LD) agent, the design division (DD) agent, the surveyor department (SD) agent, and the various vet customer (VC) agents who provide the out-sourced service of vetting customers.
Fig. 1. Agent system for BT's provide customer quote business process.
³ The negotiations between the agents are denoted by arrows (arrow head toward client) and the service involved in the negotiation is juxtaposed to the respective arrow.
The process is initiated by the sales agent which negotiates with the CSD agent (mainly over time, but also over the number of invocations and the form in which the final result should be delivered) for the service of providing a customer quote. The first stages of the Provide_Customer_Quote service involve the CSD agent capturing the customer's details and vetting the customer in terms of their credit worthiness. The latter sub-service is actually performed by one of the VC agents. Negotiation is used to determine which VC agent should be selected - the main attributes which are negotiated over are the price of the service, the penalty for contract violation, the desired quality of the service and the time by which the service should be performed. If the customer fails the vetting procedure, then the quote process terminates. Assuming the customer is satisfactory, the CSD agent maps their requirements against a service portfolio. If the requirements can be met by a standard off-the-shelf portfolio item then an immediate quote can be offered based on previous examples. In the case of bespoke services, however, the process is more complex. The CSD agent negotiates with the DD agent (over time and quality) for the service of designing and costing the desired network service. In order for the DD agent to provide this service it must negotiate with the LD agent (over time) and perhaps with the SD agent. The LD agent checks the design to ensure the legality of the proposed service (e.g. it is illegal to send unauthorised encrypted messages across France). If the desired service is illegal, then the entire quote process terminates and the customer is informed. If the requested service is legal then the design phase can start. To prepare a network design it is usually necessary to have a detailed plan of the existing equipment at the customer's premises. Sometimes such plans might not exist and sometimes they may be out of date. In either case, the DD agent determines whether the customer site(s) should be surveyed. If such a survey is warranted, the DD agent negotiates with the SD agent (over price and time) for the Survey_Customer_Site service. On completion of the network design and costing, the DD agent informs the CSD agent which informs the customer of the service quote. The business process then terminates. The structure of the negotiation object is based almost directly on the legal contracts used to regulate agreements in the current manual approach to business process management. This structure is fairly rich and covers both service and meta-service attributes.
In more detail, it contains: (i) the service name; (ii) a unique agreement identifier (covering the case where there are multiple agreements for the same service); (iii) the agents involved in the agreement (client and server); (iv) the type of agreement (one off agreement for a single service invocation versus on-going agreements for multiple invocations of the same service); (v) timing information (duration represents the maximum time the server can take to finish the service, and start time and end time represent the time during which the agreement is valid); (vi) the volume of invocations permissible between the start and end times (for on-going agreements only); (vii) the price paid per invocation; (viii) the penalty the server incurs for every violation of the agreement; (ix) the information the client must provide to the server on service invocation; and (x) the policy used for disseminating the service's intermediate and final results to the client.
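To make the ten-attribute structure concrete, the following is a minimal sketch of such a negotiation object as a Python data class. This is our own illustration, not code from the ADEPT system; all type and field names are hypothetical renderings of items (i)-(x).

```python
from dataclasses import dataclass, field
from enum import Enum

class AgreementType(Enum):
    ONE_OFF = "one-off"    # single service invocation
    ON_GOING = "on-going"  # multiple invocations of the same service

@dataclass
class ServiceAgreement:
    service_name: str                 # (i)
    agreement_id: str                 # (ii) unique per agreement for a service
    client: str                       # (iii) agents involved in the agreement
    server: str                       # (iii)
    agreement_type: AgreementType     # (iv)
    duration: float                   # (v) max time the server may take
    start_time: float                 # (v) validity window of the agreement
    end_time: float                   # (v)
    volume: int                       # (vi) permissible invocations (on-going only)
    price_per_invocation: float       # (vii)
    violation_penalty: float          # (viii) penalty per violation
    invocation_info: dict = field(default_factory=dict)  # (ix) info client must supply
    dissemination_policy: str = "on-completion"          # (x) result-reporting policy
```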
2.2
Characteristics and Assumptions
The following negotiation characteristics can be noted from the ADEPT business process scenario. Moreover, it is believed that these characteristics are common to a wide range of service oriented negotiations between autonomous agents.
- A given service can be provided by more than one agent (e.g. multiple agents can provide the vet customer service to the CSD agent). The available services may be identical in their characteristics or they may vary along several dimensions (e.g. quality, price, availability, etc.).
- Individual agents can be both clients and servers for different services in different negotiation contexts.
- Negotiations can range over a number of quantitative (e.g. price, duration, and cost) and qualitative (e.g. type of reporting policy, and nature of the contract) issues. Each successful negotiation requires a range of such issues to be resolved to the satisfaction of both parties. Agents may be required to make trade-offs between issues (e.g. faster completion time for lower quality) in order to come to an agreement.
- The social context and inter-relationships of the participants influence the way agents negotiate. Some negotiations involve entities within the same organisation (e.g. between the CSD and DD agents) and hence are generally cooperative in nature. Other negotiations are inter-organisational and purely competitive - involving self interested, utility maximising agents (e.g. between the VC agents and the CSD agent). Some groups of agents often negotiate with one another for the same service (e.g. the CSD and DD agents), whereas other negotiations are more open in nature (for example, the set of VC agents changes frequently and hence the CSD agent often negotiates with unknown agents).
- As the agents are autonomous, the factors which influence their negotiation stance and behaviour are private and not available to their opponents (especially in inter-organisational settings). Thus agents do not know what utilities their opponents place on various outcomes, they do not know what reasoning models they employ, they do not know their opponents' constraints, and they do not know whether an agreement is even possible at the outset (i.e. the participants may have non-intersecting ranges of acceptability).
- Since negotiation takes place within a highly intertwined web of activity (the business process), time is a critical factor. Timings are important on two distinct levels: (i) the time it takes to reach an agreement must be reasonable; and (ii) the time by which the negotiated service must be executed is important in most cases and crucial in others. The former means that the agents should not become involved in unnecessarily complex and time consuming negotiations - the time spent negotiating should be reasonable with respect to the value of the service agreement. The latter means that the agents sometimes have hard deadlines by which agreements must be in place (this occurs mainly when multiple services need to be combined or closely coordinated).
3
The Negotiation Model
The negotiation model for autonomous agents proposed in this Section is based on a variation of the two parties, many issues value scoring system presented in [8]. That is, a model for bilateral negotiations about a set of quantitative variables. Our variation transforms that model into a many parties, many issues model (that is, multilateral negotiations about a set of variables). Multilateral negotiations are central to the application domains we are interested in. Our model of multilateral negotiations is based on a set of mutually influencing two parties, many issues negotiations. We will call the sequence of offers and counter offers in a two-party negotiation a negotiation thread. Offers and counter offers are generated by linear combinations of simple functions, called tactics. Tactics generate an offer, or counter offer, for a single component of the negotiation object using a single criterion (time, resources, etc.). Different weights in the linear combination allow the varying importance of the criteria to be modelled. For example, when determining values of slots in the negotiation object it may initially be more important to take into account the other agent's behaviour than the remaining time. In this case, the tactics that emphasize the behaviour of other agents will be given greater preference than the tactics which base their value on the amount of time remaining. To achieve flexibility in the negotiation, the agents may wish to change their ratings of the importance of the different criteria over time. For example, remaining time may become more relevant than the imitation of the other's behaviour as the time by which an agreement must be in place approaches. We use the term "strategy" to denote the way in which an agent changes the weights of the different tactics over time. Thus strategies combine tactics depending on the history of negotiations and the mental state of agents, and negotiation threads influence one another by means of strategies (see Section 6). Before presenting our model, we introduce Raiffa's basic model for bilateral negotiation [8].
3.1
The bilateral negotiation model
Let $i$ ($i \in \{a, b\}$) represent the negotiating agents and $j$ ($j \in \{1, \ldots, n\}$) the issues under negotiation. Let $x_j \in [\min_j, \max_j]$ be a value for issue $j$. Here we consider issues for which negotiation amounts to determining a value between a delimited range. Each agent has a scoring function $V_j^i : [\min_j, \max_j] \to [0, 1]$ that gives the score agent $i$ assigns to a value of issue $j$ in the range of its acceptable values. For convenience, scores are kept in the interval $[0, 1]$. The next element of the model is the relative importance that an agent assigns to each issue under negotiation: $w_j^i$ is the importance of issue $j$ for agent $i$. We assume the weights of both agents are normalized, i.e. $\sum_{1 \le j \le n} w_j^i = 1$, for all $i$ in $\{a, b\}$. With these elements in place, it is now possible to define an agent's scoring function⁴ for a contract - that is, for a value $x = (x_1, \ldots, x_n)$ in the multi-dimensional space defined by the issues' value ranges:

$$V^i(x) = \sum_{1 \le j \le n} w_j^i\, V_j^i(x_j)$$
If both negotiators use such an additive scoring function, it is possible to show how to compute the optimum value of $x$ as an element on the efficient frontier of negotiation⁵ (see [8], p. 164).
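As a concrete sketch of the additive model (ours, not the paper's code; the issue names and the linear shape of the per-issue functions are assumptions), the score of a contract is a weighted sum of per-issue scores:

```python
def additive_score(weights, scorers, contract):
    """V(x) = sum_j w_j * V_j(x_j), with the weights summing to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[j] * scorers[j](contract[j]) for j in contract)

def linear_score(lo, hi, increasing=True):
    """A simple monotone V_j clamped into [0, 1]."""
    def v(x):
        s = min(max((x - lo) / (hi - lo), 0.0), 1.0)
        return s if increasing else 1.0 - s
    return v

# A client that dislikes high prices and likes high quality:
weights = {"price": 0.7, "quality": 0.3}
scorers = {"price": linear_score(0, 20, increasing=False),
           "quality": linear_score(0, 10, increasing=True)}
print(additive_score(weights, scorers, {"price": 5, "quality": 8}))  # 0.765
```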
3.2
Service-oriented negotiation requirements
The above mentioned model for bilateral negotiation is valid for some service-oriented settings. However, the model contains several implicit assumptions that, although they permit good optimisation results, are inappropriate for our scenarios:
1. Privacy of information. To find the optimum value, the scoring functions have to be disclosed. This is, in general, inappropriate for competitive negotiation.
2. Privacy of models. Both negotiators have to use the same additive scoring model. However, the models used to evaluate offers and generate counter offers are one of the things that negotiators try to hide from one another.
3. Value restrictions. There are pre-defined value regions for discussion (they are necessary to define the limits of the scoring function). However, it is not always possible to find these common regions and in many cases negotiation actually involves determining whether such regions even exist.
4. Time restrictions. There is no notion of timing issues in the negotiation. However, time is a major constraint on the agent's behaviour [6]. This is mainly true on the client side; agents often have strict deadlines by when the negotiation must be completed. For instance, a video link has to be provided at 16:00 because at that time a conference should start; negotiation about set up cannot continue after that instant in time.
5. Resource restrictions. There is no notion of resource issues in the negotiation. However, the quantity of a particular resource has a strong and direct influence on the behaviour of agents, and, moreover, the correct appreciation of the remaining resources is an essential characteristic of good negotiators. Resources from the client point of view relate directly to the number of servers engaging in negotiations; likewise from the server's point of view. Thus, the quantity of resources has a similar effect on the agents' behaviour as time.
⁴ Non-linear approaches to modelling utility could be used if necessary without affecting the basic ideas of the model.
⁵ Any contract not on this frontier is sub-optimal (i.e. not Pareto-optimal) in that possible mutual gains are missed.
Taking the first consideration alone, it is clear that an optimal solution cannot be found in our domains: it is not possible to optimize an unknown function. Hence, we shall propose a model for individual agent negotiation that looks for deals acceptable to its acquaintances but which, nevertheless, maximises the agent's own scoring function.
3.3
A service-oriented negotiation model
In service-oriented negotiations, agents exhibit two possible behaviours that are, in principle, in conflict. Hence we shall distinguish (for notational convenience) two subsets of agents⁶, $Agents = Clients \cup Servers$. We use roman letters to represent agents: $c, c_1, c_2, \ldots$ will stand for clients, $s, s_1, s_2, \ldots$ for servers, and $a, a_1, b, d, e, \ldots$ for unspecified agents.
We adhere to an additive scoring system in which, for simplicity, the function $V_j^a$ is either monotonically increasing or monotonically decreasing. Clients and service providers may have a mutual interest in particular variables. For example, Raiffa cites an example [8, pg. 133-147] in which the Police Officers Union and the City Hall realize, in the course of their negotiations, that they both want the police commissioner fired. Having recognised this mutual interest they quickly agree that this course of action should be selected. However, in general, clients and servers have opposing interests, e.g. a client wants a low price for a service, whereas his potential servers attempt to obtain the highest price. High quality is desired by clients but not by servers, and so on. Therefore, in the space of negotiation values, negotiators represent opposing forces in each one of the dimensions. In consequence, the scoring functions verify that, given a client $c$ and a server $s$ negotiating values for issue $j$, if $x_j, y_j \in [\min_j, \max_j]$ and $x_j > y_j$, then ($V_j^c(x_j) > V_j^c(y_j)$ iff $V_j^s(x_j) < V_j^s(y_j)$). In contrast, where there is a mutual interest, a variable will be assigned one of its extreme values. Hence these variables can be taken out of the negotiation set. For instance, the act of firing the police commissioner can be removed from the set of issues under negotiation and assigned the extreme value "done". Once the agents have determined the set of variables over which they will negotiate, the negotiation process between two agents $a, b \in Agents$ consists of an alternate succession of offers and counter offers of values for these variables. This continues until an offer or counter offer is accepted by the other side or one of the partners terminates negotiation (e.g. because the time deadline is reached without an agreement being in place). Negotiation can be initiated by clients or servers. We represent by $x^t_{a \to b}$ the vector of values proposed by agent $a$ to agent $b$ at time $t$, and by $x^t_{a \to b}[j]$ the value for issue $j$ proposed from $a$ to $b$ at time $t$. The range of values acceptable to agent $a$ for issue $j$ will be represented as the interval $[\min_j^a, \max_j^a]$. For convenience, we assume a common global time (the calendar time) represented by a linearly ordered set of instants, namely
⁶ The subsets are not disjoint since an agent can participate as a client in one negotiation and as a service provider in another.
Time, and a reliable communication medium introducing no delays in message transmission (so we can assume that emission and reception times are identical). The common time assumption is not too strong for our application domains, because time granularity and offer and counter offer frequencies are not too high. Then,

Definition 1. A Negotiation Thread between agents $a, b \in Agents$, at time $t \in Time$, noted $x^t_{a \leftrightarrow b}$ or $x^t_{b \leftrightarrow a}$, is any finite sequence of the form $\{x^{t_1}_{d_1 \to e_1}, x^{t_2}_{d_2 \to e_2}, \ldots, x^{t_n}_{d_n \to e_n}\}$ where:
1. $e_i = d_{i+1}$, proposals alternate between both agents,
2. $t_k \le t_l$ if $k < l$, ordered over time,
3. $d_i, e_i \in \{a, b\}$, the thread contains only proposals between agents $a$ and $b$,
4. $d_i \ne e_i$, the proposals are between agents, and
5. $x^{t_i}_{d_i \to e_i} \in [\min_j^{d_i}, \max_j^{d_i}]$ or is one of the particles $\{accept, reject\}$.

Superindex $t_n$ represents an instant in the set $Time$ such that $t_n \le t$. We will say that a negotiation thread is active⁷ if $x^{t_n}_{d_n \to e_n} \notin \{accept, reject\}$. For simplicity in the notation, we assume in the sequel that $t_1$ corresponds to the initial time value, that is $t_1 = 0$. In other words, there is a local time for each negotiation thread that starts with the utterance of the first offer. When agent $a$ receives an offer from agent $b$ at time $t$, that is $x^t_{b \to a}$, it has to rate the offer using its scoring function. If the value of $V^a(x^t_{b \to a})$ is greater than the value of the counter offer $x^{t'}_{a \to b}$ that agent $a$ is ready to send at the time $t' > t$ when the evaluation is performed, then agent $a$ accepts. Otherwise, the counter offer is submitted. The interpretation function $I^a$ expresses this concept more formally:

Definition 2. Given an agent $a$ and its associated scoring function $V^a$, the interpretation by agent $a$ at time $t'$ of an offer $x^t_{b \to a}$ sent at time $t < t'$, is defined as:
$$I^a(t', x^t_{b \to a}) = \begin{cases} accept & \text{if } V^a(x^t_{b \to a}) \ge V^a(x^{t'}_{a \to b}) \\ x^{t'}_{a \to b} & \text{otherwise} \end{cases}$$
where $x^{t'}_{a \to b}$ is the contract that agent $a$ would offer to $b$ at the time of the interpretation. The result of $I^a(t', x^t_{b \to a})$ is used to extend the current negotiation thread between the agents. This interpretation also models the fact that a contract unacceptable today can be accepted tomorrow merely by the fact that time has passed.
⁷ We assume that any offer is valid (that is, the agent that uttered it is committed) until a counter offer is received. If the response time is relevant it can be included in the set of issues under negotiation.
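Computationally, Definition 2 is a single comparison between the score of the incoming offer and the score of the counter offer prepared for time $t'$. The sketch below is our own rendering with hypothetical names; preparing the counter offer itself is the job of the tactics introduced next.

```python
def interpret(score, incoming_offer, prepared_counter_offer):
    """I^a(t', x_{b->a}^t): accept iff the received offer scores at least as
    high as the counter offer agent a would send at evaluation time t'."""
    if score(incoming_offer) >= score(prepared_counter_offer):
        return "accept"
    return prepared_counter_offer  # the thread is extended with this counter offer
```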
In order to prepare a counter offer, $x^{t'}_{a \to b}$, agent $a$ uses a set of tactics that generate new values for each variable in the negotiation set. Based on the needs of our business process applications (Section 2), we developed the following families of tactics:
1. Time-dependent. If an agent has a time deadline by which an agreement must be in place, these tactics model the fact that the agent is likely to concede more rapidly as the deadline approaches. The shape of the curve of concession, a function depending on time, is what differentiates tactics in this set.
2. Resource-dependent. These tactics model the pressure in reaching an agreement that (i) the limited resources - e.g. remaining bandwidth to be allocated, money, or any other - and (ii) the environment - number of clients, number of servers or economic parameters - impose upon the agent's behaviour. The functions in this set are similar to the time-dependent functions except that the domain of the function is the quantity of resources available instead of the remaining time.
3. Imitative. In situations in which the agent is not under a great deal of pressure to reach an agreement, it may choose to use imitative tactics that protect it from being exploited by other agents. In this case, the counter offer depends on the behaviour of the negotiation opponent. The tactics in this family differ in which aspect of their opponent's behaviour they imitate, and to what degree the opponent's behaviour is imitated.
We do not claim that these family types are complete, nor that we have enumerated all possible instances of tactics within a given family. These are merely the types of tactics we found useful in our applications.
4
Negotiation tactics
Tactics are the set of functions that determine how to compute the value of a quantitative issue (price, volume, duration, ...) by considering a single criterion (time, resources, ...). The set of values for the negotiation issue is then the range of the function, and the single criterion is its domain. The criteria we have chosen for the application domain, as explained in the previous section, are time, resources, and previous offers and counter offers. Given that agents may want to consider more than one criterion to compute the value for a single issue, we model the generation of counter proposals as a weighted combination of different tactics covering the set of criteria. The values so computed for the different issues⁸ will be the elements of the counter proposal. For instance, if an agent wants to counter propose taking into account two criteria, the remaining time and the previous behaviour of the opponent, it can select two tactics: Boulware (Section 4.1), based on the remaining time, and Tit-For-Tat (Section 4.3), to imitate the behaviour of the opponent. Each of the selected tactics will suggest a value to counter propose for the issue under negotiation.
⁸ Values for issues may be computed by different weighted combinations of tactics.
The actual value which is counter proposed will be the weighted combination of the two suggested values. Given an issue $j$, for which a value is under negotiation, an agent $a$'s initial offer corresponds to a value in the issue's acceptable region, that is, a value in $[\min_j^a, \max_j^a]$. For instance, a client may start the negotiation process over the price $p$ to pay for a good by offering the server a value near the low end of its acceptable price range - what initial offer should be chosen is something the agent can learn by experience. The server, whose acceptable range is higher (but overlapping), may then make a high initial counter-offer. With these two initial values, the strategy of the first agent may consist of using a time-dependent tactic that makes a first concession - e.g. if it has a short time to reach an agreement, it will start conceding. And then, if the strategy of the latter agent is to use an imitative tactic, it could generate a counter-proposal imitating the shift of its opponent. And so on.
4.1
Time-dependent tactics
In these tactics, the predominant factor used to decide which value to offer next is time, $t$. Thus these tactics consist of varying the acceptance value for the issue depending on the remaining negotiation time (an important requirement in our domain - Section 2.2). This requires a constant $t^a_{\max}$ in agent $a$ that represents an instant in the future by when the negotiation must be completed. We model the initial offer as being a point in the interval of values of the issue under negotiation. Hence, agents define a constant $\kappa_j^a$ that, multiplied by the size of the interval, determines the value of issue $j$ to be offered in the first proposal by agent $a$. We model the value to be uttered by agent $a$ to agent $b$ for issue $j$ as the offer at time $t$, with $0 < t \le t^a_{\max}$, by a function $\alpha_j^a$ depending on time, as the following expression shows:
$$x^t_{a \to b}[j] = \begin{cases} \min_j^a + \alpha_j^a(t)(\max_j^a - \min_j^a) & \text{if } V_j^a \text{ is decreasing} \\ \min_j^a + (1 - \alpha_j^a(t))(\max_j^a - \min_j^a) & \text{if } V_j^a \text{ is increasing} \end{cases}$$
A wide range of time-dependent functions can be defined simply by varying the way in which $\alpha_j^a(t)$ is computed. Functions must ensure that $0 \le \alpha_j^a(t) \le 1$, $\alpha_j^a(0) = \kappa_j^a$ and $\alpha_j^a(t^a_{\max}) = 1$. That is, the offer will always be within the value range; at the beginning it will give the initial constant, and when the time deadline is reached the tactic will suggest offering the reservation value.⁹ We distinguish two families of functions with this intended behaviour, polynomial and exponential. Both families are parameterised by a value $\beta \in \mathbb{R}^+$ that determines the convexity degree (see Figure 2) of the curve. We have chosen these two families of functions because of the very different ways they model concession.
⁹ The reservation value for issue $j$ of agent $a$ represents the value that gives the smallest score for function $V_j^a$. The reservation value for agent $a$ and issue $j$ depends on the function $V_j^a$ and the range $[\min_j^a, \max_j^a]$. If $V_j^a$ is monotonically increasing, then the reservation value is $\min_j^a$; if it is decreasing, the reservation value is $\max_j^a$.
For the same big value of $\beta$ the polynomial function concedes faster at the beginning than the exponential one; thereafter they behave similarly. For a small value of $\beta$, the exponential function waits longer to start conceding than the polynomial one. Many other functions could eventually be defined.

- Polynomial: $\alpha_j^a(t) = \kappa_j^a + (1 - \kappa_j^a)\left(\frac{\min(t,\, t^a_{\max})}{t^a_{\max}}\right)^{1/\beta}$
- Exponential: $\alpha_j^a(t) = e^{\left(1 - \frac{\min(t,\, t^a_{\max})}{t^a_{\max}}\right)^{\beta} \ln \kappa_j^a}$
These families of functions represent an infinite number of possible tactics, one for each value of $\beta$. However, to better understand their behaviour we have classified them, depending on the value of $\beta$, into two extreme sets showing clearly different patterns of behaviour. Other sets in between these two could be defined:

1. Boulware tactics [[8], pg. 48]. Either exponential or polynomial functions with $\beta < 1$. This tactic maintains the offered value until the time is almost exhausted, whereupon it concedes up to the reservation value.¹⁰ The behaviour of this family of tactics with respect to $\beta$ is easily understood taking into account that

$$\lim_{\beta \to 0^+} e^{\left(1 - \frac{\min(t,\, t^a_{\max})}{t^a_{\max}}\right)^{\beta} \ln \kappa_j^a} = \kappa_j^a \quad \text{or} \quad \lim_{\beta \to 0^+} \kappa_j^a + (1 - \kappa_j^a)\left(\frac{\min(t,\, t^a_{\max})}{t^a_{\max}}\right)^{1/\beta} = \kappa_j^a$$
2. Conceder tactics [[3], pg. 20]. Either exponential or polynomial functions with $\beta > 1$. The agent goes to its reservation value very quickly. For similar reasons as before, we have

$$\lim_{\beta \to +\infty} e^{\left(1 - \frac{\min(t,\, t^a_{\max})}{t^a_{\max}}\right)^{\beta} \ln \kappa_j^a} = 1 \quad \text{or} \quad \lim_{\beta \to +\infty} \kappa_j^a + (1 - \kappa_j^a)\left(\frac{\min(t,\, t^a_{\max})}{t^a_{\max}}\right)^{1/\beta} = 1$$
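The offer rule and the two $\alpha(t)$ families reconstructed above translate directly into code. The sketch below is our own; the sample ranges and the $\kappa$ and $\beta$ values are arbitrary. $\beta = 0.2$ behaves as a Boulware tactic and $\beta = 5$ as a Conceder:

```python
import math

def alpha_poly(t, t_max, kappa, beta):
    """Polynomial family: alpha(0) = kappa, alpha(t_max) = 1."""
    r = min(t, t_max) / t_max
    return kappa + (1.0 - kappa) * r ** (1.0 / beta)

def alpha_exp(t, t_max, kappa, beta):
    """Exponential family: alpha(t) = exp((1 - r)^beta * ln kappa)."""
    r = min(t, t_max) / t_max
    return math.exp((1.0 - r) ** beta * math.log(kappa))

def offer(t, t_max, lo, hi, kappa, beta, v_decreasing, alpha=alpha_poly):
    """x_{a->b}[j] at time t; 'v_decreasing' refers to agent a's V_j."""
    a = alpha(t, t_max, kappa, beta)
    return lo + a * (hi - lo) if v_decreasing else lo + (1.0 - a) * (hi - lo)

for beta, label in ((0.2, "Boulware"), (5.0, "Conceder")):
    offers = [round(offer(t, 10, 0, 20, 0.1, beta, True), 2) for t in range(0, 11, 2)]
    print(label, offers)  # Boulware stays near 2.0 until late; Conceder concedes fast
```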
4.2
Resource-dependent tactics
These tactics are similar to the time-dependent ones. Indeed, time-dependent tactics can be seen as a type of resource-dependent tactic in which the sole resource considered is time. Whereas time vanishes constantly up to its end, other resources may have different patterns of usage. We model resource-dependent tactics in the same way as time-dependent ones, that is, by using the same functions, but by making the value $t^a_{\max}$ dynamic. Its value represents a heuristic about how many resources are in the environment. The scarcer the resource, the more urgent the need for an agreement to be reached. In our application domains the most important resource to model is the number of agents negotiating with a given agent and how keen they are to reach agreements. On one hand, the greater the number of agents who are negotiating with agent $a$ for a particular service $s$, the lower the pressure on agent $a$ to reach an agreement with any
29
a(o
1-
/
~.io
/
0,5
I
I
i
0,5
i
i 1
t/~
I
0,5
1
t / h , mx
Fig. 2. Polynomial (left) and Exponential (right) functions for the computation of a(t). Time is presented as relative to t~ax.
specific individual. While on the other hand, the longer the negotiation thread, the greater the pressure on a to come to an agreement. Hence, representing the set of agents negotiating with agent a at time t as: Na(t) = {ilx~ais active}, we define the dynamic time deadline for agent a as: t~a~ = tc + #~ IN~(t~)12
E i Ixtc ol where #~ represents the time agent a considers reasonable to negotiate with a single agent, tc is the current time and Ix~L,aI represents the length of the current thread between i and a. Notice that the number of agents is in the numerator - so quantity of time is directly proportional to it, and averaged length of negotiation thread is in the denominator - so quantity of time is inversely proportional to it. 4.3
Imitative
tactics
This family of tactics compute the next offer based on the previous attitude of the negotiation opponent. These tactics have proved important in co-operative problem-solving negotiation settings [1], and so are useful in a subset of our contexts (see Section 2.2). The main difference between the tactics in this family is in the type of imitation they perform. One family imitates proportionally, another in absolute terms, and the last one computes the average of the proportions in a number of previous offers. Hence, given a negotiation thread ~ - - 2 6 ~n--26~-1 ~tn--26~-2 l~n--2 tn--1 ~n {''',Xb-~a '~a--*b '~b--*a ,...,Xb__.a,Xa_~b,Xb__.a~, with 5 > 1, we distinguish the following families of tactics: . Relative Tit-For-Tat The agent reproduces, in percentage terms, the behaviour that its opponent performed 5 _> 1 steps ago.
30
I xo_~ [n+l [j] =
tn- 26 9
min(max(x,~2~+2[j l ~ [3] x ta_b ..... [31,"'min~), m a x , ) n > 26 min~ + ~.?(max~ - m i n ~ ) min~ + (1 - n ; ) ( m a x ~ - rain;)
n < 2~, Vj~ decr.
n < 26, Vja incr.
Depending on the value of the quotient between two consecutive counter offers, the agent exhibits a range of different behaviours: m i r r o r if the quotient is greater than 1, r e t a l i a t o r y if it is lower t h a n 1, and a type of time independent b o u l w a r e if it is exactly 1. 2. R a n d o m A b s o l u t e T i t - F o r - T a t The same as before but in absolute terms. It means t h a t if the other agent decreases its offer by s then the next response should be increased by the same s Moreover, we add a component t h a t modifies t h a t behaviour by increasing or decreasing (depending on the value of p a r a m e t e r s) the value of the answer by a random amount. (This is introduced as it can enable the agents to escape from local minima.) M is the m a x i m u m amount by which an agent can change its imitative behaviour. rain ( max
~n-1 9 (Xa.-..-,b [3]
xt,~+~[j] = ~-'~
A,_
t~--26 (Xb__+ a
9 [3]
-
-
tn 26+2 Xb----, a
+(-1)SR(M),min;),max~) mi~; + ~;(ma~ - m~,~) rnin; + (1 ~ 2 ) ( m a x ; - m i n ; )
9 [ 3 ] ) ~-
n > 2~ n < 2~, 5 o deer. n <_ 2s E a incr.
Where s e {0, 1} and R ( M ) is a function t h a t generates a random integer in the interval [0, M]. 3. A v e r a g e d T i t - F o r - T a t T h e agent computes the average of percentages of changes in a window of size 7 > 1 of its opponents history when determining its new offer. When 7 = 1 we have the relative Tit-For-Tat tactic with/~ = 1.
min(max( xt"+'[J] =
5
t rL 2 9
tT, 1[J],mi~) m a x ; ) n > 2"~ ,.x ~ ~+~b] Xa-.b
rnin~ + a ~ rnin~ + (1 - t~;)(maz; - rnin;)
n E 27, Vj~ decr. n < 2% ~ a incr.
Negotiation strategies
The aim of agent a's negotiation strategy is to determine the best course of action which will result in an agreement on a contract x t h a t maximises its scoring function V ~. In practical terms, this equates to how to prepare a new counter offer. In our model we consider t h a t the agent has a representation of its mental state containing information about its beliefs, its knowledge of the environment (time, resources, etc.), and any other attitudes (desires, goals, obligations, intentions, etc.) the agent designer considers appropriate. The mental state of agent
31 a at time t is noted as MS~a. We denote the set of all possible mental states for agent a as M Sa. When agent a receives an offer from agent b it becomes the last element in the current negotiation thread between both agents. If the offer is unsatisfactory, agent a generates a counter offer. As discussed earlier, different combinations of tactics can be used to generate counter offers for particular issues. An agent's strategy determines which combination of tactics should be used at any one instant. Hence, the following definition: D e f i n i t i o n 3. Given a negotiation thread between agents a and b at time over domain X = Xl x ... X X p , with x t" l. and a finite set ~ b = { " ' , xt,, b---,aJ, m tactics n T a = {TiIzi : M S a ~ X}ie[1,m], a w e i g h t e d c o u n t e r p r o p o s a l any lineal combination of the tactics that generates the value at time tn+l the thread. T h a t is, for issue j
Xa__+b[Jj -'- "[jlTl(iSta'~+l)[j] H- ")'j2T2(iStan+l)[j] --[-... T "~jrnTmllVlo a such that for all issues j,
EiE[1,m] ~/ji = 1 and
tn of is in
][j]
x~+~bt~+l--__{ . . . '~b--*a'~a--*bI~'tn tn+l 1
Given a set of tactics, different types of negotiation behaviour can be obtained by weighting the tactics in a different way. T h a t is, by changing the matrix F particular to each negotiation thread:
711 ~12 ''- ~/lm / [~/21 ~22 ~2m
t
/
\~pl ~/p2
~/pm
An example of when this weighted combination may be useful is when modelling a s m o o t h transition from a behaviour based on a single tactic (e.g. Boulware, because the agent has plenty ot time to reach an agreement) to another one (e.g. Conceder, because the time is running out). Smoothness is obtained by changing the weight affecting the tactics progressively (e.g. from 1 to 0 and from 0 to 1 in the example). We model many-parties negotiations by means of a set of interacting negotiation threads. The way this is done is by making a negotiation thread influence the selection of which matrix F is to be used in other negotiation threads. Thus, D e f i n i t i o n 4. Given a, b E A g e n t s , t E T i m e , a's mental state M S~, and/"Z-,b, a N e g o t i a t i o n S t r a t e g y , is any function f of the following type:
Fat+l --,b
= f(F~--+b, M S t )
11 This definition uses the natural extension of tactics to the multi-dimensional space of issues' values.
32
A simplistic example of the application of our model would be to have a matrix F built up of 0s and ls and having F a---*b t+l = F a---*b t for all t. This would correspond to using a fixed single tactic for each issue at every instant in the negotiation.
6
Convergence results
Convergence in negotiation is achieved when the scoring value of the received offer is greater t h a n the scoring value of the counter offer the agent intended to respond with. T h a t is, D e f i n i t i o n 5. A Negotiation thread x ~t~b = { ' " , x t~ b--*~} converges at time t~+l
iff V%x~L~) --> v ~ ~, '~+~ a-*b] With this definition in mind, we have obtained some preliminary results on convergence for a single variable, single tactic, bilateral negotiation. Wider convergence criteria will be forthcoming as future work. The second proposition (6.2) is particularly interesting because it allows an agent using a time-dependent tactic to know if the negotiation will converge with an agent playing relative TitFor-Tat. Knowing if an opponent is playing Tit-For-Tat can easily be guessed by using a strategy t h a t makes some initial random offers and then examines the responses. Notice, however, t h a t convergence cannot be guaranteed in general. For example, two agents using a Tit-For-Tat tactic might stay negotating forever if no limitation on time is established.
If two agents a and b negotiate values for an issue j over the value regions [min;,max;] , [m~nj, 9 b raaxj], b satisfying [rnin~, max;] A [min~, m a x b] ~ O, then the following properties hold:
Proposition6.
1. If a and b follow a time-dependent tactic with Vja decreasing (resp. inereasb lug), Vjb increasing (resp. decreasing) and tanaz = tma x then the negotiation for issue j converges. 2. If a uses a time-dependent tactic with Vja increasing (resp. decreasing) and b uses a relative Tit-For-Tat tactic with Vjb decreasing (resp. increasing), and a starts the negotiation thread, then the negotiation converges if Xta~b[jlX~a[j] < (rain;) 2 (resp. if Xa_,b[J]Xb__,a[j ,2 ] > (maxD2) Proof. (1) We prove it for ~ a decreasing, the other case is similar. We have [min~, max;] N [min b, max b] 7~ O, then m a x ; > m i n b. W h e n time runs out the a(t) functions of both agents become 1, t h a t is a~(t~ax) = a j (bt m abx ) ---- 1 and then their acceptance values will become rain; + ~ ; ( t ) ( m a x ; - m i n ; ) = m a x ; and min~ + (1 - ajb(t))(maxjb- rain b) = minb. So, b will make the offer tbLa~
.
Xb--,~ [2] = rain b at the deadline. But given t h a t a uses a monotonically decreasing function and m a x ; > min~ we have Vja(max;) > Vj~(minb). In other words,
33
by using the assumption tma ,~t~ t a ~ [J])" So a x = tbma~ we have ~,j/ ' a (~b--,a [J]) > ]"j] ' a (~Xa--*b the negotiation converges. (2) Again we prove it for V3b increasing, the other case is similar. Without loss of generality assume that the thread is: t , - 1
X a.-.b
{Xtal_~b '
t2 , tn-l l Xb--. a 9 .. , X a--. b
and that ~ = 1. By using the definition of relative Tit-For-Tat, it can be seen that: tn-3
r .1
tn-3
t n - - 5 r .1
x ~ a [ j ] = xa-~b[3J t.-z Xa-~b~J] Xa-~b[3] x t ~ , [ j ] = b t3] a t ' ' " Xb-"':' [J] -t"-'[J] ~Xa-~b z -.b [3] za-.b in--3
try--5
9
xo-,b[J] '" ,.1 : t.._: t.-3,-:: " " ~ b~,:,,L3J= t._1,-.1 b~o,[3J Xa--,b[J] X~-~bt3J Za-~bt3] Z~-~bt3J The thread converges if Vjb
t~-i
.
> Vjb(x L [j]), but given that Vjb is
t~-i monotonically decreasing, this happens only if Xtb~[j] < X~_.b [3]. Then, by subtl " ~ in-1 r , ; 1 that is tl 9 t2 9 in-1 9 2 stituting we get ~ z ~ a [ j ] Xa__,b[jj, Xa~b[3]Xb__,~[3 ] < (X~__,b [j]) . a~b
tJJ
t 9 = min~ (by Vja beBut when time approaches t~nax we have h9m t - - , t ~ X a--*b[J] tl . t2 .] < (min~) 2 the negotiation ing increasing). So, at the limit, if x~__+b[3]Xb__,a[3 converges.
7
Related
work
Research in negotiation models has been pursued in different fields of knowledge: game theory, social sciences and artificial intelligence. Each field has concentrated on different aspects of negotiation, making the assumptions that were pertinent for the goal of their study. In game theory, researchers have been interested in mechanism design: the definition of protocols that limit the possible tactics (or strategies) that can be used by players. For instance they are interested in defining protocols that give no benefit to agents t h a t mis-represent or hide information [9]. In this work disclosure of information is acceptable, because by doing so it will benefit the agent in finding an optimal solution for itself. Contrary to our model, and as we discussed in Section 2, this is an inappropriate assumption from the point of view of real applications. As has been argued elsewhere [10], these and other assumptions limit the applicability of game theory models to solve real problems. Our interests lie in invertigating the process of negotiation among agents and not only on the outcome. Hence, our study, and those forthcoming, are much more in the experimental line of [4]. Although we do not concentrate on learning, some similarities can be found with the formalism by Zeng and Sycara [10]. We have not concentrated however on the description of negotiation protocols that has been an important focus of attention for the community of distributed artificial intelligence (see [7] for extensive references).
34
8
D i s c u s s i o n a n d future work
The next stage in the development of our model is to undertake an experimental evaluation of the tactics and strategies described herein. We believe adequate strategies have to be developed in accordance with the desired properties and characteristics of the domain at hand. These strategies then need to be tested in repeated games over a range of typical scenarios to determine which are the most successful. Some initial modeling of concrete strategies has been made considering several variables in the mental state of an agent: (i) an approximation of the first and second derivatives of the other agent's behaviour, (ii) the relation between both negotiating agents (e.g. members of the same company, boss/employee, ...), and (iii) the time remaining to reach an agreement (in this case time is playing a role at both strategic and tactic levels). This model is being used in the real modeling of the domain presented in Section 2. Currently there are two versions of the model implemented in CLIPS and PROLOG. The initial results on convergence, although simple, encourage us to make a more complete analysis of the types of negotiation situations that are likely to occur. We have identified many research opportunities in extending the model. For instance, fuzzy control techniques could be used to relate a qualitative estimate of the first and second derivatives of the opponent and a qualitative value for the/~ to be used in a tactic; we imagine a rule like: If the agent concedes quite a lot (first derivative) and the agent concession ratio (second derivative) increases then Beta is Medium. Genetic algorithms could also be used to determine experimentally which weighted combinations of tactics survive better in the line of [4]. Moreover, genetic algorithms may help to determine which negotiating agents show the desired behaviour by using the strategies as the genetic code. Finally, case-based reasoning could be used to model strategies. The case memory could be used by the agent to determine which past combinations of tactics worked best in similar circumstances.
9
Acknowledgements
This project has received the support of the Spanish Research project SMASH (CICYT number, TIC96-1038-C04001) and the D T I / E P S R C Intelligent Systems Integration Programme (ISIP) project A D E P T .
References 1. R. Axelrod. The Evolution of Cooperation. Basic Books, Inc., Publishers, New York, USA., 1984. 2. B. Crabtree and N. (eds.). The Practical Application of Intelligent Agents and Multi-Agent Technology. London, UK., 1996. 3. D. G.Pruitt. Negotiation Behavior. Academic Press, 1981.
35 4. A. Ito and H. Yano. The emergence of cooperation in a society of autonomous agents - the prisoner's dilemma gamme under the disclosure of contract histories. In V. Lesser, editor, Proceedings of the First International Conference on MultiAgent Systems, pages 201-208, San Francisco, USA, 1995. AAAI Press/The MIT Press. 5. N. R. Jennings, P. Faratin, M. J. Johnson, T. J. Norman, P. O'Brien, and M. E. Wiegand. Agent-based business process management. Int Journal of Cooperative Information Systems, 5(2-3):105-130, 1996. 6. S. Krans, J. Wilkenfeld, and G. Zlotkin. Multiagent negotiation under time constraints. Artificial Intelligence Journal, 75(2):297-345, 1995. 7. H. Mueller. Negotiation principles. In G. M. P. O'Hare and N. R. Jennings, editors, Foundations of Distributed Artificial Intelligence, Sixth-Generation Computer Technology Series, pages 211-229, New York, 1996. John Wiley. 8. H. Raiffa. The Art and Science of Negotiation. Harvard University Press, Cambridge, USA, 1982. 9. J. S.Rosenschein and G. Zlotkin. Rules of Encounter. The MIT Press, Cambridge, USA, 1994. 10. D. Zeng and K. Sycara. How can an agent learn to negotiate. In J. Mueller, M. Wooldridge, and N. Jennings, editors, Intelligent Agents III. Agent Theories, Architectures, and Languages, number 1193 in LNAI, pages 233-244. Springer Verlag, 1997.
Norms as Constraints on Real-Time Autonomous Agent Action* Magnus Boman The DECIDE Research Group Department of Computer and Systems Sciences Stockholm University and the Royal Institute of Technology Electrum 230, SE-164 40 Kista, SWEDEN Phone: +46 8 16 1678 Fax: +46 8 703 9025 e-mail:
[email protected] WWW: http://www.dsv.su.se/DECIDE & The ISES Project University of Karlskrona/Rormeby
Abstract. We describe a general model of agent action, and ways of constraining action using norms. The agents under study are supersoft agents: autonomous artificial agents programmed to collect, formalise, and actively use qualitative data. A consistent, interruptible, and pre-emptive anytime algorithm that also has measurable quality and diminishing returns is discussed. The algorithm controls procedures for decision making already implemented. The model thus has computational meaning and strives to characterise real-time decision making in agents acting under uncertainty in imprecise and partially described environments.
1. Introduction True agent autonomy is sometimes described as being incompatible with the adoption of social norms by the agent. Two extremes are the fully obedient agent that never violates a social norm and the anarchic agent that ignores, is unaware of, or even deliberately chooses to violate social norms. In this view, the obedient agent is seen as being fully dependent and the anarchic agent as being fully autonomous, see Fig-1. Full u
\ \ \ \ \ \
o
\ N
None
Autonomy Full
F i g - l : The trade-off between obedience and autonomy. t The author would like to thank Love Ekenberg, Mats Danielson, Harko Verhagen, and the participants of the ICMAS'96 Workshop of Norms, Obligations, and Conventions for discussions in relation to this paper. This work was in part supported by NFR.
37 There are several reasons for questioning the truth in this purported trade-off. First, the social norms employed might not be very restrictive. In many environments, an agent could act as if unaware of the norms without actually violating them. The likelihood of this scenario seems to increase with the complexity of the agents in the multi-agent system (MAS). Simple utilitarian agents are less likely to escape restrictions on behaviour, becanse their behaviour can be concisely described and predicted with relative ease. Hence, norms are easier to specify in such a way that they keep agents in control. By contrast, more sophisticated and social agents will inescapably affect social norms via their actions--the special case being an MAS in which social norms emerge, with the set of norms being empty at the outset. A second reason, more general than that special case, is that agents sometimes act upon incomplete information. Moreover, there might be uncertainties involved that pertain to utilities, probabilities, or some other relevant metric in the domain at hand. Being forced to act in an incomplete and uncertain environment, the agent is prone to errors and ways of reducing the number of erroneous decisions made would be welcome. Modelling agents only partially aware of social norms, the norms themselves being imperfect in that they are partial and sometimes inapplicable, is an important problem. A third reason is that it is often meaningful to talk about the existence of social norms, even though they are violated occasionally. Violations may not be a display of anarchic behaviour: the deviator might in fact honour the norm, but nevertheless have good reasons for violating it. An agent might violate a social norm because the punishment is relatively low. A utility maximiser might, for instance, neutralise a physical threat in an act of self defence. These reasons not only serve as evidence for the falsity of the trade-off in Fig-1, but are interesting in their own right. In the following section, we will further motivate the introduction of a model of autonomous agent action, as well as of ways of constraining action using norms. Section 3 discusses the model itself. Special attention is given to norms, since the agents under study are part of an MAS. Section 4 discusses evaluation aspects of the model. The final section offers conclusions.
2. Background One way of studying rational agents that repeatedly act upon the result of evaluating decision situations is to think of their means of evaluation as external to the agent. Just as a human agent might pick up a calculator for computing the product of two large numbers, an artificial agent might seek assistance in the form of decision support, almost regardless of the level of sophistication of the (internal) reasoning capabilities ascribed to the agent. As a case in point: we have just initiated a small project for making a simple version of the socalled Delta Decision Tool [9] available on the Internet. Mobile Internet agents can in the future seek advice from a decision module by interacting with a Java program on a WWW page, and await advice before moving on in the network. As designers of the page, we can have no control over the agents that choose to use the page, e.g., to what extent they accept the advice given as normative.
We make the following two provisos, more concise motivations for which are available in [7] and [13] respectively.
Proviso 1: Agents act in accordance with advice obtained from their individual decision module, with which they can communicate.
Proviso 2: The decision module contains algorithms for efficiently evaluating supersoft decision data concerning probability, utility, credibility, and reliability.
The first proviso makes our presentation clearer, because every change of preference (or belief revision, or assessment adjustment) of the agent is thought of as adequately represented in the decision module. This gives us freedom from analysing the entire spectrum of reasoning capabilities that an agent might have, and its importance to the use of the decision module. The communication requirement presents only a lower bound on agent reasoning by stating that the agent must be able to present its decision situation to the decision module. This entails that the agent can represent the information in an ordinary decision tree. That the decision module is seen as customised is inessential: it is a metaphor for a situation where all agents utilise an oracle, but where the computations of the oracle depend on the individual agent input. In other words, the oracle acts as a decision tree evaluator, and the size of the oracle is the only thing that makes it inconvenient for the agent to carry it around in its rucksack. The proviso also lets us separate the important problem of agents failing to obey social norms from the other problems discussed in this paper. Finally, the proviso makes explicit that no nonconsequentialist decision bias affects the decision chosen [3]. In the eyes of a consequentialist (e.g., a person in favour of expected utility theory), artificial agents are closer to the perfect decision maker than human agents can ever hope to be.
The second proviso requires more explanation. Supersoft decision theory (SSD) is a variant of classical decision theory in which assessments are represented by vague and informal statements, such as "The outcome o is quite probable" and "The outcome o is most undesirable." The purpose of SSD is to investigate the applicability of the classical theory to real-life decision situations [19]. The obvious reason for decision analysts to be interested in SSD is that most human decision makers prefer vague and informal assessments to the use of precise numbers, as in "The probability of the outcome o is larger than 0.55"; not to mention "The probability of the outcome o is 0.32", the precise meaning of which is almost impossible to grasp for a human [17]. If the agents under study are supersoft agents: autonomous artificial agents programmed to collect, formalise, and actively use qualitative data, then these agents could benefit from using SSD tools for evaluation. Supersoft agents need not know the true state of affairs, but can describe their uncertainty by a probability distribution. In such decisions with risk, the agent typically wants a formal evaluation to result in a presentation of the action optimal with respect to its assessments, together with measures indicating whether the optimal action is much better than any other action, using some kind of distance measure. The basic requirement for normative use of such measures is that (at least) probability and utility assessments have been made, and that these can be evaluated by an evaluation function, e.g., the principle of maximising the expected utility (PMEU). Given such a function, one might contemplate its use as a choice rule generator. In fact, several people have
equated rationality with the use of a set of choice rule generators containing only the PMEU (see, e.g., [14]), but there are good reasons not to accept this equation [5]. While the PMEU can be seen as constituting the core of rational agent behaviour, it is not always sufficient in its own right [6]. The PMEU can never be the operational definition of rationality that Newell sought in his oft-cited AAAI address [16],[20]. In particular, the agent may want to exclude particular actions that are in some respect too risky, regardless of whether they are affirmed (or dismissed) by the PMEU. This can be done by setting security levels, an idea introduced in [22] that has been applied to MAS [10],[12]. In [7], our recommendation of using security levels to extend the PMEU stood fast, but with the notion of security level and policy implementation given an important new interpretation as a norm for cooperative action (cf., e.g., [8]). This idea is combined here with notions developed in [13] to give the general model for constraining actions presented in the following section.
3. A Model for Action Constraints
The model in Fig-2 below is general in that it makes relatively few assumptions about the agent architecture, language, sensors, and communication protocol. This is not to say that these matters are unimportant. Some choices of agent language would be incompatible with the model, for example. The ambition is to let the model admit a unifying view, and we therefore encourage that it be complemented by additional components as appropriate. Not even the concept of goal is necessary: agents are seen as fully autonomous and will always maximise their absolutely local and subjective utility. Bootstrapping does not present a problem, since no restrictions apply to either sense data or communication data. The only requirement is that the contents of the four bases conform to Proviso 2 above. The concept of agent credibility as used here in imprecise and uncertain domains (cf. [3]) was defined in [10], and that of reliability in [13]. We refer the reader to the latter paper for a formalisation of the four different bases in the decision module, and focus here on intuitive aspects of the representation. Before taking an action, an agent might have used its means of communicating with other agents in the MAS, its sensors, and the computational power of the decision module. If there are no norms present in the MAS, the four bases in the decision module are non-linear systems of equations representing (typically subjectively assessed) supersoft data about
• probabilities of outcomes of different consequences of actions
• utilities of outcomes of different consequences of actions
• credibilities of the reports of other agents on different assessments
• reliabilities of other agents on different agent ability assessments
The preferences of the agents can be stated as intervals of quantitative measures or by partial orderings; the formal details available in our earlier papers are not repeated here. Credibility values are used for weighting the importance of relevant assessments made by other agents in the MAS (see [10] for details on how to formalise this). Reliability values mirror the reliability of another agent as it in turn assesses the credibility of a third agent (see [13]). All bases except the utility base are normalised. Note that an MAS without norms is a social structure where group utility is irrelevant to the individual agent.
The presence of norms can manifest itself in three ways, each representing a different level of abstraction:
1. Through skewing of the equations in the four bases.
2. By filtering normative advice before it is received by the agent.
3. By disqualifying certain actions by referring to their negative impact on the global utility in the MAS.
[Figure omitted: the agent receives sense data and communication data, sends requests to and receives advice from its decision module (probability, utility, credibility, and reliability bases), and acts; norms constrain this loop by skewing the bases, filtering advice, and disqualifying actions.]
Fig-2: Constraining autonomous agent action.
The first way of constraining action through norms is by manipulation of the utilities on the lowest level, i.e. to skew assessments to have an overly positive (or negative) attitude towards some consequences. This can be viewed as a deliberate revision of the risk attitude of the agent. Note that the PMEU alone cannot model all kinds of risk attitude. To present a theory that is neither too strong, nor too weak for modelling uncertainty and risk is very difficult even in small worlds [6]. Problems with implementing company or government policy in tools for normative decision analysis are in many ways similar to the problem of constraining agent action in an MAS in order to comply with norms (see, e.g., [18]). These similarities are thoroughly investigated in [7].
The second way proceeds via the elimination of actions with disastrous consequences. A robot should not choose to turn left at a corner inside a mine, for instance, if that could lead to it being destroyed as a result of falling down a shaft, even if the probability of it actually falling is very small. If information is obtained from another agent, the credibility (and in some cases the reliability) of that agent affects the probability.
The third item in the above list requires that a global utility measure is available on the local level. It is quite possible that the agent has first skewed some of its assessments to comply with a set of norms, second that it has rejected the recommended action because it was deemed too risky (and there might be a norm stating that agents should avoid actions that violate certain preset security levels), and third that it, when considering the second best alternative, finds that even this alternative cannot be chosen. The reason is that although this action is the rational choice of the agent in view of the norms present, another kind of threshold value is violated, viz. the extent to which the global utility of the MAS is reduced should the action be taken. This form of social norm adoption was implicitly suggested in [13], in which global utility among autonomous agents is formalised.
The model is highly individualistic. In open systems, this position is easily defended by noting that agents are less inclined to work toward group goals. We claim, however, that the model applies equally to cooperating agents, thanks to the three ways of affecting agent action just explained. Jennings and Campos identify the third form of social norm adoption above as a basic requirement for autonomous agent systems that they call the individual-community balance [16]. It is a constructive and computationally efficient way of attaining socially responsible behaviour [15], and an attempt at describing the social level of agents. Note that not all norms can be described as linear constraints. As we have seen, however, such norms can play the same role as risk attitudes in that they affect security level settings, sensitivity analyses, and other ways of extending the PMEU.
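To make the three levels concrete, the following sketch applies them in sequence to a small decision situation. It is only an illustration: the decision representation, the skew factors, the security level, and the global-impact estimates are all hypothetical, and not part of the model's formalisation.

```python
# Hypothetical sketch of the three norm levels (skew, filter, disqualify)
# applied before a PMEU choice. All names and numbers are illustrative.

def expected_utility(outcomes):
    # outcomes: list of (probability, utility) pairs for one action
    return sum(p * u for p, u in outcomes)

def advise(decision, skew_factors, security_level, global_impact, impact_limit):
    # Level 1 (skew): revise utilities to reflect a norm-induced risk attitude.
    skewed = {a: [(p, u * skew_factors.get(a, 1.0)) for p, u in outs]
              for a, outs in decision.items()}
    # Level 2 (filter): drop actions with a possible outcome below the
    # security level, however improbable (e.g. the robot's fatal shaft).
    safe = {a: outs for a, outs in skewed.items()
            if all(u >= security_level for _, u in outs)}
    # Level 3 (disqualify): drop actions whose estimated reduction of the
    # MAS's global utility exceeds a preset threshold.
    admissible = {a: outs for a, outs in safe.items()
                  if global_impact.get(a, 0.0) >= impact_limit}
    if not admissible:
        return None  # no norm-compliant action survives
    # PMEU over what survives the three constraint levels.
    return max(admissible, key=lambda a: expected_utility(admissible[a]))

advice = advise(
    decision={"left": [(0.99, 5.0), (0.01, -100.0)], "right": [(1.0, 3.0)]},
    skew_factors={"left": 0.9},        # a norm discourages "left"
    security_level=-50.0,              # filters out the risky turn
    global_impact={"left": -0.2, "right": -0.05},
    impact_limit=-0.1,
)
print(advice)  # -> "right"
```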
4. On Implementing the Model
Algorithms for evaluating the contents of the different bases in the decision module have been implemented [9]. We are currently working on an anytime algorithm that realises the evaluation model of Fig-2. An outline of what that algorithm will do is given in Fig-3.
[Figure omitted: time line of the anytime evaluation, running from "evaluation initiated", through the aspiration level (a satisficing inductive answer after one cycle; no answer before it), to "evaluation complete" (the deductive answer after n cycles).]
Fig-3: Implementing bounded rational behaviour.
The aspiration level is defined as one cycle of the anytime algorithm. An agent that reaches the aspiration level is a satisficing agent [21]. By making the cycles simple enough to guarantee the completion of the first cycle, e.g. by proving that it can always be performed within one cycle of the system clock, we have set a minimum degree of agent rationality. With every repeated cycle, the decision module computes a better approximation of the deductive result, with the latter being reached after n completed cycles. Should the bounded rational agent run out of resources, e.g., on account of the decision module being prompted to leave its advice while computing approximations, the normative advice presented is a machine guess. In the case of the algorithm terminating, the advice is no longer a machine guess but a deductive result that is in some sense optimal, sound, and complete [11]. The correctness of machine guesses must be defined from the vector of possible answers since the correctness function is total but not injective.
In his survey [23], Zilberstein identifies seven desired properties of anytime algorithms. The remainder of this section is a status report on our algorithm based on these properties. First, our algorithm has measurable quality since, given that the deductive result is computed (in batch, or whenever the agent is idle), the exact distance between this result and the machine guess can be computed. The second property of recognisable quality is today only met in part. The quality of a machine guess cannot always be determined in constant time. What is known is the level of equation solver used, and how far that solver is from reaching a final result. There are currently four different solvers in our algorithm (see [7], [9]). If the most sophisticated solver is used, this final result is the deductive result. The third property is monotonicity: advice quality is a nondecreasing function of time (and possibly input quality). Whether our algorithm is monotone in this respect is currently unknown, but if one accepts our (probably non-perfect) criteria for recognising quality, monotonicity follows, since we can let the algorithm return the best approximation computed so far (as noted in [23], p.74). We have quite strong results on consistency, the fourth property: our algorithm is deterministic. The computations are independent of assumptions regarding probability distributions and hence a statistical metric, e.g. variance, is not required. Our algorithm also has the fifth property of diminishing returns. It is indisputable that this is convenient, but whether it is always a desired property has in recent years been a reason for some discussion, notably in work on increasing returns (see, e.g., [1]). As noted above, we can define our algorithm so that it always reaches the aspiration level. This fact, together with access to a memory of constant size, gives us the sixth property of interruptibility: the algorithm can be stopped at any time and provide an answer. Our algorithm also has the final property of preemptability, since the algorithm can be suspended and resumed with minimal overhead. A forthcoming paper will describe the dynamic performance profile of our algorithm, as defined in [23], p.77. Once the algorithm is implemented, studies of various sets of norms will be undertaken. These studies will have the form of simulations.
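The interruptible, aspiration-level behaviour just described can be sketched as follows. This is not the authors' implementation: the cycle body, the quality measure, and the time budget are placeholders chosen for illustration.

```python
# A minimal sketch of an interruptible anytime loop; every completed cycle
# yields a better approximation, and the first cycle is the aspiration level.
import time

def anytime_evaluate(refine_cycle, decision, budget_s):
    deadline = time.monotonic() + budget_s
    advice, quality, cycles = None, 0.0, 0
    while time.monotonic() < deadline:
        advice, quality, done = refine_cycle(decision, advice)
        cycles += 1
        if done:                              # deductive result after n cycles
            return advice, quality, "deductive answer"
    if cycles == 0:
        return None, 0.0, "no answer"         # aspiration level not reached
    return advice, quality, "machine guess"   # satisficing answer

# Toy refinement step (illustrative): here the "deductive" result is
# reached in a single cycle.
def toy_cycle(decision, prev):
    best = max(decision, key=decision.get)
    return best, decision[best], True

print(anytime_evaluate(toy_cycle, {"a": 0.4, "b": 0.7}, budget_s=0.01))
```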
The first two kinds of norm adoption have already been implemented in the Delta Decision Tool [9], but their functionality relative to the third kind of norm adoption (the group utility constraint) has yet to be examined.
5. Conclusions and Further Research
We have presented a general implementable model for constraining action using norms. The model constrains normative advice provided by a decision module and operates on three levels of abstraction. The lowest level deals with manipulation by nonbenevolent (or even malicious) agents, with modifications of assessments as a result of sensitivity analyses, and with more or less ad hoc adoption of social norms by means of very delicate belief revision. The middle level deals with the filtering of certain actions in accordance with the risk profile of the agent. This is the natural level for a coordinator of a group of agents to use if the coordinator wishes to implement policy. An agent that strives to become part of a coalition may also let the norms adhered to within the coalition act as filters on this level, but if the agent has reasons not to make its desires public, it should revise its assessments on the lowest level instead. Once it is part of the coalition it will probably have to adhere to the social norms of the group. The highest level of abstraction deals with the acceptance of social norms. Since the basis of our evaluation is the PMEU, we have adopted an idea of Ekenberg not to let agents diminish the utility of the group that they belong to by their choice of action. This is a constructive interpretation of the principle of social rationality [16]. For systems that utilise other rules for generating rational behaviour, this rule could be modified, but the important idea is that social norms usually affect rational choice only at the highest (social) level.
Given that the advice received by the agent is actually followed, i.e. if the agents are consequentialistic, the three levels together constitute a metanorm [2], a formalised procedure for enforcing norms. This is interesting in itself, since many researchers have claimed that such models are of philosophical interest only because of their high computational complexity. We have given here the first pieces of evidence that this is not the case. The procedures already implemented help in the difficult transition from precise and unrealistic quantitative decision making by artificial agents to qualitative analyses. In the terminology of [23], we have an accuracy metric: a value with computational meaning that represents the difference between a machine guess and a deductive answer. We have also sketched an anytime algorithm that has measurable quality and diminishing returns, and that is consistent, interruptible, and preemptable. To complete this algorithm is our next objective.
References
[1] B. Arthur: "Positive Feedbacks in the Economy", in Scientific American, February issue, pp.92-99, 1990.
[2] R. Axelrod: "An Evolutionary Approach to Norms", in American Political Science Review, Vol.80, No.4, pp.1095-1111, 1986.
[3] J. Baron: Morality and Rational Choice, Kluwer, Dordrecht 1993.
[4] J. Baron: "Nonconsequentialist Decisions", in Behavioral and Brain Sciences, Vol.17, No.1, 1994.
[5] M. Boman: "Rational Decisions and Multi-Agent Systems", in AAAI-95 Fall Symposium on Rational Agency, Ed. Fehling, to appear as an MIT Technical Report.
[6] M. Boman & L. Ekenberg: "Decision Making Agents with Relatively Unbounded Rationality", in Proc DIMAS'95, Ed. Nawarecki, pp.I/28-I/35, AGH, Krakow 1995.
[7] M. Boman: "Implementing Norms through Normative Advice", in Proc ICMAS'96/IMSA'96 WS on Norms, Obligations, and Conventions, Eds. Conte & Falcone, 1996.
[8] R. Conte & C. Castelfranchi: Cognitive and Social Action, UCL Press, London 1995.
[9] M. Danielson: A Computational Framework for Decision Analysis, Ph.D. thesis, in preparation.
[10] L. Ekenberg, M. Boman & M. Danielson: "A Tool for Coordinating Autonomous Agents with Conflicting Goals", in Proc ICMAS'95, Ed. Lesser, pp.89-93, AAAI/MIT Press 1995.
[11] L. Ekenberg, M. Danielson & M. Boman: "Imposing Security Constraints on Agent-Based Decision Support", to appear in Decision Support Systems Intl Journal.
[12] L. Ekenberg, M. Danielson & M. Boman: "From Local Assessments to Global Rationality", Intl Journal of Intelligent Cooperative Information Systems, Vol.5, Nos.2&3, pp.315-331, 1996.
[13] L. Ekenberg: "Modelling Decentralised Decision Making", in Proc ICMAS'96, Ed. Tokoro, pp.64-71, AAAI Press 1996.
[14] P. J. Gmytrasiewicz & E. H. Durfee: "Elements of a Utilitarian Theory of Knowledge and Action", in Proc 13th IJCAI, pp.396-402, 1993.
[15] N. R. Jennings: "Towards a Cooperation Knowledge Level for Collaborative Problem Solving", in Proc 10th ECAI, pp.224-228, 1992.
[16] N. R. Jennings & J. R. Campos: "Towards a Social Level Characterisation of Socially Responsible Agents", in IEE Proc on Software Engineering, Vol.144, No.1, pp.11-25, 1997.
[17] R. L. Keeney: "Decision Analysis: An Overview", in Operations Research, Vol.30, No.5, pp.803-838, 1982.
[18] T. J. B. Kline & L. M. Sulsky: "A Policy-Capturing Approach to Individual Decision Making", in Canadian Journal of Behavioral Science, Vol.27, 1995.
[19] P.-E. Malmnäs: "Methods of Evaluations in Supersoft Decision Theory", unpublished manuscript, 1995. Available on the WWW using URL: http://www.dsv.su.se/~mab/DECIDE.
[20] A. Newell: "The Knowledge Level", in AI Magazine, Summer issue, pp.1-20, 1981.
[21] H. A. Simon: "A Behavioral Model of Rational Choice", in Quarterly Journal of Economics, Vol.59, pp.99-118, 1955.
[22] A. Wald: Statistical Decision Functions, John Wiley and Sons, 1950.
[23] S. Zilberstein: "Using Anytime Algorithms in Intelligent Systems", in AI Magazine, Fall issue, pp.73-83, 1996.
Distributed Belief Revision vs. Belief Revision in a Multi-agent Environment: First Results of a Simulation Experiment
Aldo Franco Dragoni, Paolo Giorgini, Marco Baffetti
Istituto di Informatica, Università di Ancona, via Brecce Bianche, 60131, Ancona (Italy)
{dragon,giorgini}@inform.unian.it
Abstract. We propose a distributed architecture for belief revision/integration, where each element is conceived as a complex system able to exchange opinions with the others. Since nodes can be affected by some degree of incompetence, part of the information running through the network may be incorrect. Incorrect information may cause contradictions in the knowledge base of some nodes. To manage these contradictions, each node is equipped with a belief revision module which makes it able to discriminate among more or less credible information and more or less reliable information sources. Our aim is that of comparing on a simulation basis the performances and the characteristics of this distributed system vs. those of a centralised architecture. We report here the first results of our experiments.
1 Introduction
Mason's Seismic Event Analyzer (SEA) [1] was initially regarded as an application of DAI techniques [2]. Conceived during and after the age of the "Salt I" and "Salt II" treaties against the proliferation of nuclear weapons, the system's main task is that of integrating information coming from a geographically distributed network of seismographs (situated in Norway) in order to discriminate natural seismic events from underground nuclear explosions. Distributed Truth Maintenance System (DTMS) [3] is one of the theoretical backgrounds of the latest versions of the system. A limit of DTMS is that it presupposes the trustworthiness of the network's nodes. For cases in which the nodes may be mutually inconsistent, the authors proposed a Distributed Assumption-Based Truth Maintenance System (DATMS). By supporting multiple contexts, DATMS was a step toward what they called a "liberal belief revision policy": "it is better let agents stand by their beliefs based on their own view of the evidence". We agree that ATMS [4] is central to belief revision. Trying to develop a method to perform belief revision in a multi-source environment (MSBR) [5], we realized that three relevant items are:
• maximal consistency of the revised knowledge base
• credibility of the information items
• reliability of the information sources [6].
Achieving maximal consistency is a symbolic process and can be accomplished by an ATMS. On the other hand, the credibility of the beliefs and the reliability of the agents can hardly be estimated without numerical processing (albeit we recognize the importance of qualitative methods whenever numbers make no sense). MSBR (section 2) can be regarded as a way to integrate data coming from (eventually conflicting) sensors, databases, generic knowledge repositories and even witnesses during a trial or an inquiry [7]. However, neither MSBR nor Mason's SEA are really distributed architectures, since the data integration is attained in a centralized way, as in every apparatus for sensors' data fusion. In a previous paper [8] we introduced the idea of Distributed Belief Revision (DBR). With DBR, nodes interact with each other in order to accomplish their own tasks. Since they can be affected by some degree of incompetence (eventually also insincerity), part of the information running through the network may be incorrect. Occasionally, incorrect information may cause contradictions in the knowledge base of some nodes. To manage these contradictions, each node is equipped with an MSBR module which makes it able to discriminate among more or less credible information items and more or less reliable information sources. DBR may be seen as a generalization of MSBR since it adopts the same mechanism but in a decentralized fashion. As we'll see in section 3, a major difference is that, since the MSBR module is part of the node, its rules and its numerical methods will also be exposed to the other nodes' judgement. With DBR, not only the acquisition of information is performed in a distributed manner but also the elaboration. Comparing features and performances of DBR w.r.t. MSBR can be done only on a simulation basis and on a common ground. Our testbed case is the task of extracting as much correct data as possible from a not updated and/or unreliable database. This task can be accomplished under both the MSBR and the DBR paradigms. Section 4 illustrates this point, presenting the first results of a simulation experiment that we are currently carrying out with the DBR architecture, and outlines what will be our work in the next few months.
2 A model for Belief Revision in a Multi-Source Environment (MSBR)
Derived from research in multi-agent [9] and investigative domains [10], MSBR is a novel assembly of known techniques for the treatment of consistency and uncertainty. Let us recapitulate here the main ideas. Defined as a symbolic model-theoretical problem [11-13,17], belief revision has also been approached both as a qualitative syntactic process [14,15] and as a numerical mathematical issue [16]. Trying to give a synoptic (although approximate) perspective of this composite subject, we begin by saying that both the cognitive state and the incoming information can be represented either as sets of weighted sentences or as sets of weighted possible worlds (the models of the sets of sentences). Weights can be either reals (normally between 0 and 1), representing explicitly the credibility of the sentences/models, or ordinals, representing implicitly the believability of the sentences/models w.r.t. the other ones. Essentially, belief revision consists in the redefinition of these weights in the light of the incoming information. According to us, in a multi-agent environment, where information comes from a variety of sources with different degrees of reliability, belief revision has to depart considerably from its original framework. Particularly, the principle of "priority of the incoming information" should be abandoned. While it is acceptable when updating the representation of an evolving world, that principle is not generally justified when revising the representation of a static situation. In this case, the chronological sequence of the informative acts has nothing to do with their credibility or importance. Another point is that changes should not be irrevocable. To make belief revision practical and useful in a multi-agent environment, we substitute the priority of the incoming information with the following principle [18].
Recoverability: any previously believed information item must belong to the current cognitive state if it is consistent with it.
We will achieve recoverability by imposing the maximal consistency of the revised cognitive state. Along the paper we will represent beliefs as sentences of a propositional language L, with the standard connectives ∧, ∨, ¬ and →. E is its set of propositional letters. Beliefs introduced directly by the sources are called assumptions. Those deductively derived from the assumptions are called consequences. Each belief is embodied in an ATMS node carrying two bookkeeping fields, a Source (S) and an Origin Set (OS):
If the node represents a consequence, then Source (S) contains only the tag "derived" and Origin Set (OS) (we borrowed the name from [19]) contains the identifiers of the assumptions from which it has been derived (and upon which it ultimately depends). If the node represents an assumption, then S contains its source and OS contains the identifier of the node itself. We call Knowledge Base (KB) the set of the assumptions introduced from the various sources, and Knowledge Space (KS) the set of all the beliefs (assumptions + consequences). KB and KS grow monotonically since none of their nodes is ever erased from memory. Normally both contain contradictions. A contradiction is a pair of nodes of the form
$$\{\langle\_, \alpha, \ldots\rangle, \langle\_, \neg\alpha, \ldots\rangle\}$$
Since propositional languages are decidable, we can find all the contradictions in a finite amount of time. Inspired by de Kleer, we define as "nogood" a minimal inconsistent subset of KB, i.e., a subset of KB that supports a contradiction or an incompatibility and is not a superset of any other nogood. Dually, we define as "good" a maximal consistent subset of KB, i.e., a subset of KB that is neither a superset of any nogood nor a subset of any other good. Each good has a corresponding "context", which is the subset of KS made of all the nodes whose OS is a subset of that good. Any node belongs to more than one context. Managing multiple contexts makes it possible to compare the credibility of different goods as a whole rather than confronting the credibility of single beliefs. Procedurally, our method of belief revision consists of four steps:
S1. Generating the set NG of all the nogoods and the set G of all the goods in KB
S2. Defining a credibility ordering ≤KB over the assumptions in KB
S3. Extending ≤KB to a credibility ordering ≤G over the goods in G
S4. Selecting the preferred good CG from G as the current cognitive state
Given a collection F of sets, a hitting-set for F is a set H such that H∩S ≠ ∅ for each S∈F. A hitting-set is minimal if none of its proper subsets is a hitting-set for F. It should be clear that G can be found by calculating all the minimal hitting-sets for NG, and keeping the complement of each of them w.r.t. KB. We adopt the set-covering algorithm described in [20] to find NG and the corresponding G, i.e., to perform S1.
S2 deals with uncertainty and works with the "weight", or the "strength", of the beliefs. A question is where these weights should come from. The belief-function formalism described below works with two sets:
• reliability set = {⟨S1, R1⟩, ..., ⟨Sn, Rn⟩}, where Ri (a real in [0,1]) is the reliability of Si, interpreted as the "a priori" probability that Si is reliable.
• information set = {⟨I1, Bel1⟩, ..., ⟨In, Beln⟩}, where Beli (a real in [0,1]) is the credibility of Ii.
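For illustration, step S1 can be realised naively as follows. This brute-force enumeration is only a sketch for small KBs (the paper adopts the set-covering algorithm of [20] instead), and all names are ours.

```python
# Goods as complements of minimal hitting-sets of the nogoods (step S1).
from itertools import combinations

def minimal_hitting_sets(family, universe):
    hits = []
    for r in range(1, len(universe) + 1):
        for cand in combinations(universe, r):
            s = set(cand)
            # keep s only if it hits every nogood and no smaller hit is inside it
            if all(s & ng for ng in family) and not any(h <= s for h in hits):
                hits.append(s)
    return hits

def goods(kb, nogoods):
    # each good is the complement w.r.t. KB of a minimal hitting-set of NG
    return [kb - h for h in minimal_hitting_sets(nogoods, sorted(kb))]

kb = {"a", "b", "c", "d"}
nogoods = [{"a", "b"}, {"b", "c"}]        # minimal inconsistent subsets
print(goods(kb, nogoods))                 # [{'a','c','d'}, {'b','d'}]
```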
The reliability set is one of the two inputs of the belief-function formalism (see figure 1). The other one is the set {⟨S1, s1⟩, ..., ⟨Sn, sn⟩}, where si is the subset of I made of all the information items given by Si. The information set is the main output of the belief-function formalism. Figure 1 presents the I/O mechanism applied to the case in the previous example. Let us see now how the mechanism works. Remember that E denotes the set of the atomic propositions of L. The power set of E, Ω = 2^E, is called the frame of discernment. Each element ω of Ω is a "possible world" or an "interpretation" for L (the one in which all the propositional letters in ω are true and the others are false). Given a set of sentences s ⊆ I (i.e., a conjunction of sentences), [s] denotes the interpretations which are a model for all the sentences in s.
[Figure omitted: worked example of the belief-function mechanism. Inputs: the information supplied by sources U, W, and T, and their a priori reliabilities (U 0.9, W 0.8, T 0.7). Outputs: credibilities of information items and of goods, computed via the Theory of Evidence, and new source reliabilities (U .798, W .597, T .395), computed via Bayesian conditioning.]
Fig. 1. Basic I/O of the belief-function mechanism
The key assumption with this multi-source version of the belief-function framework is that a reliable source cannot give false information, while an unreliable source can give correct information; the hypothesis that "Si is reliable" is compatible only with [si], while the hypothesis that "Si is unreliable" is compatible with the entire set Ω. Each Si gives an evidence for Ω and generates the following basic probability assignment (bpa) mi over the elements X of 2^Ω:
$$m_i(X) = \begin{cases} R_i & \text{if } X = [s_i] \\ 1 - R_i & \text{if } X = \Omega \\ 0 & \text{otherwise} \end{cases}$$
All these bpas will be then combined through the Dempster Rule of Combination:
:c Xln...nXn=X
m(X)=ml(X)|174
~
m I ( X I ) " ... " mn( Xn)
Xln...nXnr
From the combined bpa m, the credibility of a set of sentences s is given by:
$$Bel(s) = \sum_{X \subseteq [s]} m(X)$$
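The two formulas above can be exercised with a small sketch. The two-letter language, the sources, and their reliabilities below are invented for illustration; the numbers do not reproduce those of Fig. 1.

```python
# Simple-support bpas combined via Dempster's rule, then Bel of a sentence.
from itertools import product
from functools import reduce

OMEGA = frozenset(product([True, False], repeat=2))   # worlds over letters p, q
P = frozenset(w for w in OMEGA if w[0])               # models of "p"

def simple_support(models, r):
    # m([s]) = r, m(Omega) = 1 - r, as in the bpa definition above
    return {frozenset(models): r, OMEGA: 1.0 - r}

def dempster(m1, m2):
    raw = {}
    for (x1, v1), (x2, v2) in product(m1.items(), m2.items()):
        if x1 & x2:
            raw[x1 & x2] = raw.get(x1 & x2, 0.0) + v1 * v2
    k = sum(raw.values())                 # mass not lost to conflict
    return {x: v / k for x, v in raw.items()}

def bel(m, models):
    return sum(v for x, v in m.items() if x <= frozenset(models))

# U asserts p with reliability .9; W asserts not-p with reliability .8.
m = reduce(dempster, [simple_support(P, 0.9), simple_support(OMEGA - P, 0.8)])
print(round(bel(m, P), 3))                # credibility of "p" ~ 0.643
```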
In figure 1, we see another output of the mechanism, obtained through Bayesian Conditioning: the set {⟨S1, NR1⟩, ..., ⟨Sn, NRn⟩}, where NRi is the new reliability of Si. Following Shafer and Srivastava, we defined the "a priori" reliability of a source as the probability that the source is reliable. These degrees of probability are "translated" into belief-function values on the given pieces of information. However, we may also want to estimate the sources' "a posteriori" degree of reliability from the cross-examination of their evidences. To be congruent with the "a priori" reliability, the "a posteriori" reliability must also be a probability value, not a belief-function one. This is the reason why we adopt Bayesian Conditioning instead of the Theory of Evidence to calculate it. Let us see in detail how it works here. Let us consider the hypothesis that only the sources belonging to Φ ⊆ S are reliable. If the sources are independent, then the probability of this hypothesis is:
$$R(\Phi) = \prod_{S_i \in \Phi} R_i \cdot \prod_{S_i \notin \Phi} (1 - R_i)$$
We could calculate this "combined reliability" for any subset of S. It holds that $\sum_{\Phi \in 2^S} R(\Phi) = 1$. Possibly, the sources belonging to a certain Φ cannot all be considered reliable because they gave contradictory information, i.e., a set of information items s such that [s] = ∅. In this case, the combined reliabilities of the remaining subsets of S are subjected to Bayesian Conditioning so that they sum up again to 1; i.e., if Φ1, ..., Φl are the subsets of S containing sources which cannot all be considered reliable, we divide each of the remaining combined reliabilities by 1 − (R(Φ1) + ... + R(Φl)). We define the revised reliability NRi of a source Si as the sum of the conditioned combined reliabilities of the "surviving" subsets of S containing Si. An important feature of this way to recalculate the sources' reliability is that if Si is involved in contradictions, then NRi ≤ Ri, otherwise NRi = Ri. The main problem with the belief-function formalism is the computational complexity of the Dempster Rule of Combination; the straightforward application of the rule is exponential in the frame of discernment (number of propositional letters of L, that is smaller than the number of information items in KB) and the number of evidences. However, much effort has been spent in reducing the complexity of the Dempster Rule of Combination. Such methods range from "efficient implementations" [25] to "qualitative approaches" [26] through "approximate techniques" [27]. S3 also deals with uncertainty, but at the goods' level, i.e. it extends the ordering ≤KB to an ordering ≤G over the goods. Among the qualitative methods to do so are:
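The conditioning step can be sketched as follows. The sources, reliabilities, and contradiction test are illustrative assumptions (the example deliberately uses different numbers from Fig. 1).

```python
# "A posteriori" reliabilities: enumerate the hypotheses "exactly the
# sources in phi are reliable", discard contradictory ones, renormalise,
# and sum the surviving masses per source.
from itertools import combinations

def revised_reliabilities(R, contradicts):
    sources = list(R)
    surviving = {}
    for r in range(len(sources) + 1):
        for phi in combinations(sources, r):
            if contradicts(set(phi)):
                continue
            mass = 1.0
            for s in sources:
                mass *= R[s] if s in phi else 1.0 - R[s]
            surviving[frozenset(phi)] = mass
    z = sum(surviving.values())                  # 1 minus the discarded mass
    return {s: sum(m for phi, m in surviving.items() if s in phi) / z
            for s in sources}

# U says p, W says not-p, T says q: only phi containing both U and W is
# jointly contradictory, so T's reliability stays unchanged.
R = {"U": 0.9, "W": 0.8, "T": 0.7}
nr = revised_reliabilities(R, lambda phi: {"U", "W"} <= phi)
print({s: round(v, 3) for s, v in nr.items()})   # {'U': 0.643, 'W': 0.286, 'T': 0.7}
```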
• best-out method. Let g′ and g″ be the most credible assumptions (according to ≤KB) rejected by, respectively, G′ and G″; the good rejecting the less credible of the two is preferred.
• inclusion-based method. Let G′i and G″i denote the subsets of, respectively, G′ and G″, made of all the assumptions with a priority i in ≤KB; then G″ is preferred to G′ iff there exists i such that G′i ⊂ G″i and, for any j > i, G′j = G″j. The goods with the highest priority obtained with this method are the same obtainable with the best-out method.
• lexicographic method. G″ is preferred to G′ iff there exists i such that |G′i| < |G″i| and, for any j > i, |G′j| = |G″j|; and G″ =G G′ iff, for any j, |G′j| = |G″j|.
Although the "best-out" method is easy to implement, it is also very rough since it discriminates the goods by confronting only two assumptions. The lexicographic method could be justified in some particular application domains (e.g. diagnosis). The inclusion-based method yields a partial ordering over the goods.
INPUT: set of goods G (when ≤KB is strict and complete)
OUTPUT: G ordered by ≤G

G1 := G
repeat
    G2 := G1
    stack := KB ordered by ≤KB
    repeat
        pop an assumption A from stack
        if there exists a good in G2 containing A
            then delete from G2 the goods not containing A
    until G2 contains only one good G
    put G in reverse_ordered_goods
    delete G from G1
until G1 = ∅
return reverse(reverse_ordered_goods)
Fig. 2. An algorithm to sort the "goods" of the knowledge base
Among the "quantitative" explicit methods to perform S3, ordering the goods according to the average credibility of their elements seems reasonable and easy to calculate. A main difference w.r.t. the previous methods is that the preferred good(s) may no longer necessarily contain the most credible piece(s) of information. The belief-function formalism is able to attach directly a degree of credibility to any good g, bypassing S3 in our framework. The problem is that if a good contains only part of the information items supplied by a source, then its credibility is null (see [9] for an explanation). Unfortunately, the event is far from infrequent, so that often the credibility of all the goods is null. This is the reason why we adopt the best-out or the "average" method to perform S3 in MSBR.
S4 consists of two substeps.
• Selecting a good CG from G. Normally, CG is the good with the highest priority in ≤G ... rejecting more assumptions than necessary to restore consistency. We believe that this could be avoided by simply considering ...
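A runnable rendering of the Fig. 2 procedure might look like the sketch below; it assumes, as the figure does, a strict and complete ≤KB, here given as a list of assumptions sorted from most to least credible.

```python
# Sorting goods by repeatedly filtering on the most credible assumptions
# (a sketch of the Fig. 2 algorithm; the G2 := G1 reset is implied there).

def sort_goods(goods, kb_by_credibility):
    g1 = list(goods)
    reverse_ordered = []
    while g1:
        g2 = list(g1)
        stack = list(kb_by_credibility)
        while len(g2) > 1 and stack:
            a = stack.pop(0)
            if any(a in g for g in g2):
                g2 = [g for g in g2 if a in g]
        best = g2[0]                     # the good surviving the filter
        reverse_ordered.append(best)
        g1.remove(best)
    return list(reversed(reverse_ordered))   # ordered by <=_G, as in Fig. 2

order = sort_goods([frozenset("acd"), frozenset("bd")], list("abcd"))
print(order[-1])   # the preferred good: frozenset({'a', 'c', 'd'})
```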
3 A model for Distributed Belief Revision (DBR)
MSBR is not a distributed architecture since the integration/revision of the information is accomplished in a centralized way. Its simple behavior is described in figure 3.
[Figure omitted: agents feed their information and a priori reliabilities into a single central MSBR module, which outputs the revised knowledge.]
Fig. 3. Centralized Belief Revision (MSBR)
[Figure omitted: each node runs its own MSBR module and exchanges information with the others; a global output is obtained by election over the nodes' beliefs.]
Fig. 4. Distributed Belief Revision (DBR)
Figure 4 shows what we mean by Distributed Belief Revision. Under normal operating conditions, we do not expect that the global output emergent from DBR will be better than the output of MSBR, neither for the quality nor for the quantity of the information provided. What we expect is that the distributed architecture will be:
1. more efficient: each local MSBR module should manage less information than in the centralized architecture, and this should be very important as MSBR shows exponential complexity
2. more "robust" ("fault tolerant"): it should be able to offer an acceptable output even in cases where MSBR fails due to seriously compromised nodes.
On the other hand, nowadays DBR is a viable alternative to MSBR since the prices of hardware (CPU, RAM and, especially, mass storage) and the communication costs have been dramatically cut down. In DBR nodes exchange information with each other. Thus we need to equip them with (at least) two modules:
1. a communication module (Comm), which deals with the three fundamental communication policies:
a) choice of the recipient of the communication
b) choice of the argument of the communication
c) choice of the time of the communication
2. a module for belief integration/revision in a multi-source environment (MSBR)
Communication can be either spontaneous (nodes offer information to each other) or on demand (nodes ask each other for information). Among the various thinkable criteria to select the recipient of the communication, two are worth noticing. Guided by an "esprit de corps", a node could offer its best information to the node it considers the least reliable, with the aim of increasing that node's reliability. But the same collaborative spirit could lead to the opposite conclusion: a node should send its best information item to the most reliable node since, if that node is recognized also by the others as the most reliable one, the information item will be spread over all the group. The latter criterion seems to imply that unreliable nodes will be gradually isolated from the rest of the group. In order to increase the realism of this distributed architecture, we introduced two fundamental assumptions:
1. nodes do not communicate to the others the sources from which they received the data, but present themselves as completely responsible for the knowledge they are passing on to the others; a receiver considers the sender as the source of the information it is sending
2. nodes do not exchange opinions regarding the reliability of the other nodes with whom they got in touch.
With 1 we extend the scope of responsibility: a node is responsible not only for the information that it provides to the network as its original source, but also for the information that it receives from other nodes and, considering it credible, passes on to the others. With 2 we limit the range of useful information: an agent's opinion regarding the others' (and its own) reliability is drawn from pure data regarding the knowledge domain under consideration, not from indirect opinions. By comparing its opinion with the others', each node produces its own local opinion. The effects of the others' opinions depend on the rules adopted by the MSBR module. Although not necessary, we may want to extract from the network an emergent global opinion regarding the information treated by the group. To preserve the decentralized nature of DBR, this opinion should be synthesised not by an external supervisor/decisor, but by the entire group through some form of election: the group elects what it believes the global output returned to the external world should be. However, nothing prevents a user from getting his/her information directly from a single node's output, since the election does not change the nodes' personal opinions. If the various cognitive states are quite similar, then the global output cannot differ very much from each node's one. Perhaps, the similarity
between the opinions of the various nodes could be taken as a parameter to evaluate the quality of the Comm and MSBR modules. The election of the group's emergent output could be done in several ways. We have no room here to explore this matter sufficiently; however, at the extreme positions we see two distinct kinds of election:
1. "data driven" election: the candidates are information items; only the winners will be part of the global output (direct synthesis of the global output)
2. "node driven" election: the candidates are nodes of the network; only the winners will be charged to make up the global opinion (synthesis "by proxy" of the global output)
Many strategies can be conceived by mixing these two kinds of election. We believed that the comparison of the characteristics and performances of the two paradigms illustrated in Fig. 3 and Fig. 4 could be done only on a simulation basis.
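As a small illustration of the recipient-selection criteria discussed above, a Comm module might choose between the two policies like this (node names and reliability figures are ours, not the paper's):

```python
# Two recipient-selection criteria: help the node judged least reliable
# ("esprit de corps"), or feed the most reliable one so that good
# information spreads and unreliable nodes get isolated.

def choose_recipient(self_id, estimated_reliability, esprit_de_corps=True):
    peers = {n: r for n, r in estimated_reliability.items() if n != self_id}
    pick = min if esprit_de_corps else max
    return pick(peers, key=peers.get)

est = {"n1": 0.9, "n2": 0.4, "n3": 0.7}
print(choose_recipient("n1", est))          # -> "n2" (least reliable peer)
print(choose_recipient("n1", est, False))   # -> "n3" (most reliable peer)
```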
4 The simulation experiment
The task given to the group was that of extracting as much correct knowledge as possible from a corrupted knowledge repository. The knowledge repository is trivially implemented as a couple of databases containing the same quantity of information items: one database holds correct information, the other contains the negations of the information items in the correct database. Nodes cannot distinguish the two databases. Each node is characterized by a degree of "capacity" (between 0 and 1) that will be adopted as the frequency with which it (unconsciously) accesses the correct database. Fig. 5 should clarify the structure of the experiment.
[Figure omitted: each node, equipped with its own MSBR module, draws items from the two indistinguishable databases and exchanges information with the other nodes; an election produces the global output, which is compared with the output of a single centralized MSBR module.]
Fig. 5. Structure of the experiment: comparison between MSBR and DBR
Our final goal is that of comparing the ability of the centralized and of the distributed architectures in reconstructing the two databases. In [28] we showed the results concerning the study of the effects that interaction had on each node's cognitive state, i.e., how much the cognitive state differs from the case in which the node had not interacted with the others. This study returned a measure of (eventual) convenience for each node of having been part of the network. We summarise those results in section 4.1. In section 4.2 we show the most recent results regarding the comparison between the centralized (MSBR) and distributed (DBR) architectures.
4.1 Results regarding the local knowledge in the distributed architecture
The results presented refer to the most aleatory case, in which:
• nodes access the databases once per simulation cycle;
• the communication is peer-to-peer;
• the recipient of the communication is selected randomly;
• the argument of the communication is a randomly selected datum in the preferred good;
• each node communicates once per simulation cycle.
In each node's cognitive state KB, the preferred good G is taken as the reconstruction of the correct database, while KB \ G is taken as the reconstruction of the corrupted database. For each node we evaluated three parameters: its average reliability, and the quality and the quantity of the data in its preferred good. Quality and quantity are evaluated as differences w.r.t. the case without interaction.
$$\text{Quality} = Q - Q_{\text{without communication}}, \quad \text{where } Q = \frac{|\text{true propositions in } G| + |\text{false propositions outside } G|}{|\text{propositions in } KB|}$$
$$\text{Quantity} = |\text{true propositions in } G| - |\text{true propositions in } G|_{\text{without communication}}$$
$$\text{Reliability of the } i\text{th node} = \frac{\sum_{k=1}^{|nodes|} r_{ki}}{|nodes|}$$
where r_ki is the reliability of the ith node, as estimated by the kth one.
The first series of simulations, where the agents had different capacities, showed that the interaction increases the quality of an incapable node's cognitive state and decreases the quality of a capable one (fig. 6). Moreover, the interaction always increases quantity (fig. 7), and decreases all the nodes' reliability. If the average capacity of the group is greater than 0.5, then the capable nodes lose less than the incapable ones (fig. 8). On the opposite, if the average capacity is less than 0.5, then the capable nodes lose more than the incapable ones, as in fig. 9. We called this effect the "majority effect".
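The three parameters can be sketched in code as follows. The propositions, truth values, and reliability estimates are invented; the paper's Quality and Quantity are then the differences between these values computed with and without communication.

```python
# Illustrative computation of Q, quantity, and average reliability.

def quality(kb, good, truth):
    # fraction of propositions "in the correct place"
    return (len([p for p in good if truth[p]]) +
            len([p for p in kb - good if not truth[p]])) / len(kb)

def quantity(good, truth):
    return len([p for p in good if truth[p]])

def avg_reliability(i, estimates):
    # estimates[k][i]: reliability of node i as estimated by node k
    return sum(est[i] for est in estimates.values()) / len(estimates)

truth = {"p": True, "q": False, "r": True}
kb, good = set(truth), {"p", "q"}
print(quality(kb, good, truth), quantity(good, truth))   # 0.333..., 1
```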
[Figure omitted: quality trends over simulation time for five nodes of decreasing capacity.]
Fig. 6. Decreasing capacity from 1 to 5
[Figure omitted: quantity trends over simulation time for five nodes of decreasing capacity.]
Fig. 7. Decreasing capacity from 1 to 5
[Figure omitted: reliability trends over simulation time for groups with average capacity above and below 0.5.]
Fig. 8. Average capacity more than 0.5
Fig. 9. Average capacity less than 0.5
In a second series of simulations, we introduced a "teacher" node with the following features:
• capacity = 1 (it accesses only the correct database)
• for each k, rk = 1 (all the nodes know that the teacher is absolutely reliable)
• it transmits but doesn't receive information (the teacher's cognitive state will not be contaminated by the others)
In this case the average quality is positive and slightly increasing (fig. 10), the quantity gain for any node is higher than before (fig. 11), and the majority effect is no longer appreciable.
[Figure omitted: quality and quantity trends over simulation time in the presence of the teacher node.]
Fig. 10. Decreasing capacity from 1 to 5
Fig. 11. Decreasing capacity from 1 to 5
In the last series of simulations we were curious to see whether the group would be able to realize that some of its members had changed their degree of capacity, and how long it would take the group to become aware of the change. After several simulations with a node decreasing or increasing its capacity, we realized that only high quality groups were able to perceive the change. For groups with an average capacity less than 0.6, the situation at the time of the change was sufficiently chaotic to hide it. This implies that only decreases of capacity will be perceived by the group. The graphs in fig. 12 and fig. 13 report the "quality" and "reliability" trends of five nodes with capacity 0.98, where the fifth one, at the 15th simulation cycle, decreased its capacity to 0.8.
[Figure omitted: quality and reliability trends of five nodes with capacity 0.98, the fifth dropping to 0.8 at cycle 15.]
Fig. 12
Fig. 13
4.2 Comparison between the MSBR and DBR architectures
In this section we compare the output of MSBR with the global output emerging from the DBR architecture under eight different combinations of communication policies (two) and election mechanisms (four).
Communication policy P1: each agent gives its best (i.e., most credible in the preferred good) piece of information to the agent it considers the most reliable one.
Communication policy P2: each agent gives its best piece of information to the agent it considers the least reliable one.
Hence, we didn't consider the case of communication "on demand" and we didn't consider the case of insincere agents. The global opinion is attained by merging the different local beliefs through one of the following four kinds of voting mechanism.
V1 (yes/no vote): each agent j votes "1" for a believed proposition i (i.e., propositions in its preferred good) and "0" for unbelieved pieces of information (v_ji = 0/1):
$$V_i = \sum_{j=1}^{|Nodes|} v_{ji}$$
V2 (numerical vote): each agent j votes "c_ji" for a believed proposition i (i.e., its own opinion regarding the credibility of i):
$$V_i = \sum_{j=1}^{|Nodes|} c_{ji}$$
V3 (weighted yes/no vote): each agent j's yes/no (0/1) vote is weighted with its average reliability R̄_j as estimated by the other agents:
$$V_i = \sum_{j=1}^{|Nodes|} \bar{R}_j \cdot v_{ji}, \qquad \bar{R}_j = \frac{\sum_{k \neq j} r_{kj}}{|Nodes| - 1}$$
V4 (weighted numerical vote): each agent j's numerical vote is weighted with its average reliability R̄_j as estimated by the other agents:
$$V_i = \sum_{j=1}^{|Nodes|} \bar{R}_j \cdot c_{ji}$$
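A sketch of the four mechanisms follows, with invented nodes, credibilities, and reliability estimates; for V2 and V4 we simply sum the stored credibilities, which are assumed to be low for unbelieved propositions.

```python
# The four voting mechanisms V1-V4 over a single proposition.

def vote(proposition, nodes, mechanism):
    """nodes: list of dicts with 'believes' (prop -> bool), 'cred'
    (prop -> credibility), and 'avg_rel' (mean reliability as estimated
    by the other agents)."""
    total = 0.0
    for n in nodes:
        v = 1.0 if n["believes"].get(proposition, False) else 0.0
        c = n["cred"].get(proposition, 0.0)
        if mechanism == "V1":   total += v                    # yes/no
        elif mechanism == "V2": total += c                    # numerical
        elif mechanism == "V3": total += n["avg_rel"] * v     # weighted yes/no
        elif mechanism == "V4": total += n["avg_rel"] * c     # weighted numerical
    return total

nodes = [
    {"believes": {"p": True},  "cred": {"p": 0.8}, "avg_rel": 0.9},
    {"believes": {"p": False}, "cred": {"p": 0.1}, "avg_rel": 0.4},
]
for m in ("V1", "V2", "V3", "V4"):
    print(m, vote("p", nodes, m))   # 1.0, 0.9, 0.9, 0.76
```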
Since the global list of the most credible beliefs is normally inconsistent, consistency will be restored (i.e., the globally preferred good will be obtained) through the "best-out" algorithm. We compared the two outputs through the percentage C of propositions in the correct place (believed if true and unbelieved if false) on the total number of propositions treated by the entire group. Let KBg and Gg be, respectively, the global knowledge base handled by the group and the globally preferred good obtained after the vote and the elaboration of the best-out algorithm. Then:
$$C = \frac{|\text{true propositions in } G_g| + |\text{false propositions outside } G_g|}{|\text{propositions in } KB_g|}$$
We made many simulations with different distributions of capacity among the agents. We repeated each simulation (of 15 iterations) twenty times to reduce the effects of randomness. The reported results are the average cases referred to the seven particularly significant distributions described in Tab. 1.
         Agent 1   Agent 2   Agent 3   Agent 4   Agent 5   Average
Sim. 1   0.9       0.9       0.9       0.9       0.9       0.9
Sim. 2   0.8       0.8       0.8       0.8       0.8       0.8
Sim. 3   0.6       0.6       0.6       0.6       0.6       0.6
Sim. 4   0.9       0.8       0.7       0.6       0.5       0.7
Sim. 5   0.9       0.9       0.9       0.9       0.2       0.8
Sim. 6   0.9       0.9       0.9       0.2       0.2       0.62
Sim. 7   0.8       0.8       0.8       0.2       0.2       0.56
Tab. 1. The agents' capacity in the simulations
                   DBR with the communication policy P1
         MSBR     V1               V2               V3               V4
         C(%)     C(%)    ΔC(%)    C(%)    ΔC(%)    C(%)    ΔC(%)    C(%)    ΔC(%)
Sim. 1   87.45    86.14   -1.31    86.40   -1.05    86.32   -1.13    86.30   -1.15
Sim. 2   76.34    76.47    0.13    76.71    0.37    76.93    0.59    76.84    0.50
Sim. 3   57.38    57.79    0.41    57.62    0.24    57.47    0.09    57.73    0.35
Sim. 4   69.17    68.00   -1.17    67.62   -1.55    68.41   -0.76    67.52   -1.65
Sim. 5   76.75    73.55   -3.20    74.57   -2.18    72.93   -3.82    74.53   -2.22
Sim. 6   65.69    63.47   -2.22    63.87   -1.82    63.60   -2.09    64.44   -1.25
Sim. 7   55.54    58.53    2.99    58.87    3.33    58.84    3.30    58.87    3.33
Tab. 2
                   DBR with the communication policy P2
         MSBR     V1               V2               V3               V4
         C(%)     C(%)    ΔC(%)    C(%)    ΔC(%)    C(%)    ΔC(%)    C(%)    ΔC(%)
Sim. 1   87.45    86.38   -1.07    86.52   -0.93    86.12   -1.33    86.22   -1.23
Sim. 2   76.34    76.09   -0.25    76.19   -0.15    75.66   -0.68    76.00   -0.34
Sim. 3   57.38    57.90    0.52    57.79    0.41    58.14    0.76    57.65    0.27
Sim. 4   69.17    67.79   -1.39    67.24   -1.93    67.83   -1.34    67.30   -1.87
Sim. 5   76.75    74.27   -2.48    74.92   -1.83    74.08   -2.67    74.41   -2.34
Sim. 6   65.69    62.69   -3.00    62.90   -2.79    62.93   -2.76    62.99   -2.70
Sim. 7   55.54    54.85   -0.69    55.16   -0.38    54.39   -1.15    55.15   -0.39
Tab. 3
Fig. 14 shows the temporal evolution of the parameter C for MSBR and DBR for agents with the same capacity 0.8, voting technique V1, and communication policies P1 (DBR P1-V1) and P2 (DBR P2-V1).
[Figure omitted: temporal evolution of C for MSBR, DBR P1-V1, and DBR P2-V1 over 15 simulation cycles.]
Fig. 14
Conclusions
While we are writing, we are far from the conclusion of our project. At least, we still have to test the effects of communication "on demand" on the formation of the various nodes' cognitive states, and to try other, more sophisticated communication policies. We left out other intricate questions regarding the structure of the group and the nature of authority. We should also compare the performances of the two architectures under different approaches to the treatment of uncertainty (probabilistic, possibilistic, ...). A problem with our study is that five is really a small number of agents for discriminating the performances of the various techniques.
However, trying to draw out some conclusions, we may say that the centralized and the distributed architectures seem substantially equivalent regarding the correctness of the results (at most a difference of two or three per cent). In cases of high average capacity of the group, it seems that MSBR is advantaged. The advantage is more marked when the group contains a few agents with very low capacity. This is probably due to the fact that the centralized MSBR module has more correct information than in the distributed case, hence more opportunities to discriminate the unreliable members. DBR seems advantaged in cases of groups with low average capacity (0.6). A hypothesis to explain this behaviour is that, in the presence of high degrees of uncertainty, exchanging only the most credible pieces of information can reduce the spread of false information. Of course (as expected and anticipated), the real advantage of DBR is efficiency. This is due to the fact that its local MSBR modules handle at most 30% of the information managed by the MSBR module in the centralized case. Since that module has exponential complexity (both the assumption-based reasoning and the Dempster Rule of Combination contribute to it), the gain in memory consumption and CPU time should be evident.
References
[1] Mason, C., An Intelligent Assistant for Nuclear Test Ban Treaty Verification, IEEE Expert, vol 10, no 6, 1995.
[2] Cindy L. Mason and Rowland R. Johnson, DATMS: A Framework for Distributed Assumption Based Reasoning, in L. Gasser and M. N. Huhns eds., Distributed Artificial Intelligence 2, Pitman/Morgan Kaufmann, London, pp 293-318, 1989.
[3] Huhns, M. N., Bridgeland, D. M.: Distributed Truth Maintenance. In Dean, S. M., editor, Cooperating Knowledge Based Systems, pages 133-147. Springer-Verlag, 1990.
[4] de Kleer J., An Assumption Based Truth Maintenance System, in Artificial Intelligence, 28, pp. 127-162, 1986.
[5] Dragoni A.F., A Model for Belief Revision in a Multi-Agent Environment, in Werner E. and Demazeau Y. (eds.), Decentralized A.I. 3, North Holland Elsevier Science Publisher, 1992.
[6] Dragoni A.F., Belief Revision: from theory to practice, to appear on "The Knowledge Engineering Review", Cambridge University Press, 1997.
[7] Dragoni A.F., Ceresi, C. and Pasquali, V., A System to Support Complex Inquiries, in Proc. of the "V Congreso Iberoamericano de Derecho e Informatica", La Habana, 6-11 March 1996.
[8] A.F. Dragoni, P. Giorgini and P. Puliti, Distributed Belief Revision vs. Distributed Truth Maintenance, in Proc. 6th IEEE Conf. on Tools with A.I., IEEE Computer Press, 1994.
[9] A.F. Dragoni, P. Giorgini, "Belief Revision through the Belief Function Formalism in a Multi-Agent Environment", Intelligent Agents III, LNAI No. 1193, Springer-Verlag, 1997.
[10] Dragoni, A.F., Maximal Consistency, Theory of Evidence and Bayesian Conditioning in the Investigative Domain, to appear on the "International Journal on Artificial Intelligence and Law", 1997.
[11] Alchourrón C.E., Gärdenfors P., and Makinson D., On the Logic of Theory Change: Partial Meet Contraction and Revision Functions, in The Journal of Symbolic Logic, 50, pp. 510-530, 1985.
[12] P. Gärdenfors, Knowledge in Flux: Modeling the Dynamics of Epistemic States, Cambridge, Mass., MIT Press, 1988.
[13] P. Gärdenfors, Belief Revision, Cambridge University Press, 1992.
[14] B. Nebel, Base Revision Operations and Schemes: Semantics, Representation, and Complexity, in Cohn A.G. (ed.), Proc. of the 11th European Conference on Artificial Intelligence, John Wiley & Sons, 1994.
[15] Benferhat S., Cayrol C., Dubois D., Lang J. and Prade H., Inconsistency Management and Prioritized Syntax-Based Entailment, in Proc. of the 13th Inter. Joint Conf. on Artificial Intelligence, pp. 640-645, 1993.
[16] Dubois D. and Prade H., A Survey of Belief Revision and Update Rules in Various Uncertainty Models, in International Journal of Intelligent Systems, 9, pp. 61-100, 1994.
[17] Williams M.A., Iterated Theory Base Change: A Computational Model, in Proc. of the 14th Inter. Joint Conf. on Artificial Intelligence, pp. 1541-1547, 1995.
[18] Dragoni A.F., Mascaretti F. and Puliti P., A Generalized Approach to Consistency-Based Belief Revision, in Gori, M. and Soda, G. (Eds.), Topics in Artificial Intelligence, LNAI 992, Springer Verlag, 1995.
[19] Martins J.P. and Shapiro S.C. (1988), A Model for Belief Revision, in Artificial Intelligence, 35, pp. 25-79.
[20] R. Reiter, A Theory of Diagnosis from First Principles, in Artificial Intelligence, 32, 1987.
[21] Benferhat S., Dubois D. and Prade H., How to infer from inconsistent beliefs without revising?, in Proc. of the 14th Inter. Joint Conf. on Artificial Intelligence, pp. 1449-1455, 1995.
[22] Shafer G. and Srivastava R., The Bayesian and Belief-Function Formalisms: a General Perspective for Auditing, in G. Shafer and J. Pearl (eds.), Readings in Uncertain Reasoning, Morgan Kaufmann Publishers, 1990.
[23] Shafer G. (1976), A Mathematical Theory of Evidence, Princeton University Press, Princeton, New Jersey.
[24] Shafer G., Belief Functions, in G. Shafer and J. Pearl (eds.), Readings in Uncertain Reasoning, Morgan Kaufmann Publishers, 1990.
62 [25] Kennes, R., Computational Aspects of the MObius Transform of a Graph, IEEE Transactions in Systems, Man and Cybernetics, 22, pp 201-223, 1992. [26] Parson, S., Some qualitative approaches to applying the Demster-Shafer theoo', Information and Decision Technologies, 19, pp 321-337, 1994. [27] Moral, S. and Wilson, N., Importance Sampling Monte-Carlo Algorithms for the Calculation of Dempster-Shafer Belief, Proceeding of IPMU'96, Granada, 1996. [28] A.F. Dragoni, P. Giorgini, "Learning Agents' Reliability through Bayesian Conditioning: a simulation study", in Weiss (ed.) "Learning in DAI Systems", LNAI n ~ , Springer-Verlag, 1997.
Multi-Agent Coordination by Communication of Evaluations

Edwin de Jong
Artificial Intelligence Laboratory
Vrije Universiteit Brussel
Pleinlaan 2, B-1050 Brussels, Belgium
email: edwin@arti.vub.ac.be
tel: +32 2 629 37 00 fax: +32 2 629 37 29

Abstract. A framework for coordination in multi-agent systems is introduced. The main idea of our framework is that an agent with knowledge about the desired behavior in a certain domain will direct other, domain-independent agents by means of signals which reflect its evaluation of the coordination between its own actions and theirs. Mechanisms for coordination are required to enable the construction of open multi-agent systems. The goal of this investigation was to test the feasibility of guiding an agent with coordination evaluation signals, and furthermore to gather experience with instantiating the framework on a testbed domain, the Pursuit Problem. In the testbed system, agents have been created which choose their actions by maximizing the coordination evaluation signals they will receive. The performance of these agents turned out to rank among the best results encountered in the literature; behavior guided by coordination evaluation signals can thus be concluded to be useful in this domain.
1 Introduction
Without a proper mechanism that guides interactions, the mere result of gathering many agents into a multi-agent system is chaos. Several such mechanisms exist; a short account of these is given in the next section. Most of these mechanisms assume that the agents involved have direct access to the system in which they are incorporated. This implies that only agents that were designed specifically for the application at hand can be put to use. We want to investigate whether this restriction can be overcome. To this end, a coordination mechanism for multi-agent systems has been defined that allows domain-independent agents to behave usefully in an unknown environment. To learn how to do this, they are supplied with coordination evaluation signals that other, domain-specific agents send to them. The framework defines how to model domains in a uniform way, by stating which classes have to be designed. In short, an Agent inhabits an Environment to which it is coupled by its Interaction object. Furthermore, the framework defines a way to coordinate an agent's behavior in an unknown environment by defining a coordination evaluation signal, which is sent by CoordinationSignalingAgents
to CoordinationLearningAgents. Throughout this paper, domain independence refers to designing a class or agent without knowledge of, or reliance on, the domain in which it will be put to use. We do not claim our framework to be a general solution for coordination in any domain; teaching an agent to coordinate its actions merely by sending it evaluations of the appropriateness of those actions restricts its scope of application to relatively simple domains with a limited number of possible actions. If the approach is successful, it provides a way to control the large, open multi-agent systems that may be common in the near future. In the research reported on here, the goal was to test whether coordination evaluation signals are a viable approach to guiding an agent's behavior. If this turned out not to be the case, then any attempt to teach an agent to behave solely on the basis of these signals would be doomed to fail. To test the feasibility of coordination based on coordination evaluation signals, we instantiate our framework on a testbed domain and construct a perfectly rational agent that chooses its actions by maximizing the evaluation it will receive from other agents. The test is passed if this agent's behavior turns out to be appropriate in the testbed domain. In that case, our goal of having domain-independent agents learn to act usefully in an unknown environment, by learning to maximize the evaluation signals they receive and relating them to the situation in that environment, appears attainable. The structure of the paper is as follows. First, related work is discussed. Then the coordination signal framework is described. The description includes the object-oriented model that is the basis for each application of the framework to an application domain. As an example, we demonstrate the instantiation of the framework to a domain, the pursuit problem as introduced by Benda [Benda et al., 1988]. Following that, we report on the experiments that have been conducted in this domain and discuss the results. Finally, we present the conclusions.
2 Related Work
In the literature, several mechanisms for coordination exist that can be utilized in multi-agent systems. Most coordination mechanisms for multi-agent systems rely on the exchange of structured information between agents. Whenever complex communication is used, the expectation that the recipient understands the messages implies that common knowledge is assumed. Here, we reduce the need for such common knowledge by restricting communication to scalar values representing evaluations of an agent's behavior. In the literature, the ultimate extrapolation of this idea has been investigated by examining systems without any communication at all. With cooperation without communication as described in [Genesereth et al., 1986], choices of actions depend on knowledge of other agents' payoff functions.
In [Kraus and Rosenschein, 1992] and [M. Fenster and Rosenschein, 1995], coordination without communication is investigated using focal points: points to which the attention is drawn. The features that characterize these focal points (e.g. the first or last element in a row, or one element in a series that in some aspect differs from the others) are investigated. An important aspect of our framework which is not investigated in this paper is learning. The idea that agents "learn which goals to pursue" was already mentioned in [Brazdil et al., 1991]. This idea has been investigated in [Sandip Sen and Hale, 1994] and further in [Sen and Sekaran, 1995]. Sen et al. use environmental feedback to enable agents to behave well [Sandip Sen and Hale, 1994]. Their approach is very much in line with ours, and their results of learning in a multi-agent system appear promising. The main difference is that their signals are environmental feedback, whereas ours are evaluations of coordination as perceived by other agents, and can therefore not make use of global knowledge of other agents' plans. Thus, an agent has to determine its evaluation factor locally. Nonetheless, apart from other agents' private knowledge (such as plans), agents are allowed to use global information. Furthermore, some agents may send coordination evaluations and some agents may use them to choose their actions; these groups of agents may overlap. In our setup, agents can try to influence other agents to act to their benefit. The setup also provides, at least in principle, the possibility to construct agents that act, guide other agents, and are guided simultaneously.
3 The Coordination Signal Framework
In this section, the coordination signal framework is described. The description includes the steps that have to be taken to instantiate the framework in a domain. The main design goals of the framework were that it should be as general as possible, and that it should allow agents that have no knowledge of the desired overall system behavior to choose their actions such that they contribute to achieving this behavior. To this end, the capacity of the application-specific agents to evaluate the system is used to teach the other agents which action to perform in each situation. The object-oriented design specifies two classes from which any agent in a system may inherit: the coordination-learning agent and the coordination-signaling agent. By having the application-specific agents communicate their evaluations of coordination to the other agents, these other agents should in principle be able to relate the evaluations to the situation and their recent actions, and learn this relationship.
3.1 Coordination Evaluation Signals
The coordination signals that are exchanged between agents can be seen as a metaphor for the signals that human beings use to express their thoughts about situations or about other people's actions. They can be seen as a simplification of the
rudimentary vocabulary that people who speak different languages could use to coordinate their actions. The coordination signals we use, however, are scalar numbers, and bear no comparison to the subtle information that can be read from facial expressions, gestures or nods of approval. Still, the amount of information contained in such non-verbal communication is tiny compared to what can be expressed in a few lines of text. The analogy is made because the purpose of non-verbal communication is often the same, i.e. to express to other people what we think of their behavior, encouraging them to continue or warning them of mistakes (see e.g. [Minsky, 1985], p. 280: "The function of laughing is to disrupt another person's reasoning!"). This restriction to a minimal language for agent communication may seem a self-imposed handicap; people who speak the same language are obviously better equipped to cooperate. However, we judge such a modest vocabulary for agent communication to be more realistic than one based on the exchange of text. The reason is that exchanging text presupposes that the context and intellectual capacities of the recipient are such that the message yields the effect anticipated by the sender, because the recipient understands this message. Since no existing machine can be seen to exhibit intelligence that is not restricted to quite limited domains, this assumption is clearly unrealistic. As [Van de Velde, 1996] makes clear, an essential characteristic of a framework is that it places architectural commitments on instantiations that are supposed to comply with it. In this case, the main commitment is that agents are able to function without accessing domain-specific information directly. Naturally, in order to act usefully, these agents do need domain-specific information from the environment they inhabit, and they need a way to act. Both are provided by their Interaction object, which implements the interactions between agent and environment that are considered useful in a given environment, and couples these to the agent. Another architectural commitment is that the coordination evaluation factor may not depend on global knowledge of the agents' plans. This follows from the coordination evaluation's characteristic of being an agent's evaluation of coordination, as opposed to an environmental feedback.
3.2 Discussion of the Object-Oriented Design
An object-oriented design of the framework has been made. We will proceed to discuss the design and explain how it can be used to instantiate the framework on a domain. Figure 1 shows the five central classes and follows the notation of [Rumbaugh et al., 1991]. A multi-agent system that complies with the coordination signal framework is an instantiation of a subclass of the Environment class. It can contain agents, which all have their own Interaction object. The Interaction object represents an agent's interaction with the environment. It was designed to facilitate interactions between systems according to the paradigm of structural coupling. Structural coupling [Maturana and Varela, 1992] occurs,
according to [Van de Velde, 1996], when two systems (agents in our case) "coordinate without exchange of representation, but by being mutually adapted to the influences that they experience through their common environment".
Fig. 1. OMT Object Diagram of Agents, the Environment and Interaction.
The underlying idea of the Interaction object is that, for the internal processes of an agent, no essential difference exists between signals whose source is a perceptual process, on the one hand, and signals which will further on trigger actuators, on the other. The only characteristic of a signal that matters from the point of view of an internal process is whether the process is the source or the destination of the signal. Therefore, an interaction object has input and output signals. Two classes are derived from the Agent class: CoordinationSignalingAgent and CoordinationLearningAgent. A CoordinationLearningAgent learns to coordinate its actions by relating the signals it receives from CoordinationSignalingAgents to the environment.
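To make the design concrete, here is a minimal sketch of how the five classes of Figure 1 might look, assuming Python; the class and method names follow the figure, but the attribute names and method signatures are illustrative rather than the paper's original implementation.

class Interaction:
    """Couples an agent to its environment through untyped signals."""
    def __init__(self):
        self.inputs = {}             # signals arriving from the environment
        self.outputs = {}            # signals to be interpreted as actions
        self.received_signals = []   # stored coordination evaluations

    def receive_coordination_signal(self, value):
        self.received_signals.append(value)

class Agent:
    def __init__(self, interaction):
        self.interaction = interaction

class Environment:
    def __init__(self):
        self.agents = []

    def add_agent(self, agent):
        self.agents.append(agent)

    def remove_agent(self, agent):
        self.agents.remove(agent)

    def update_interactions(self):
        # domain subclasses feed each agent's Interaction inputs here
        raise NotImplementedError

class CoordinationSignalingAgent(Agent):
    def evaluate_coordination(self, other):
        # domain-specific scalar evaluation in [0, 1] of how well
        # `other` coordinates with this agent's own actions
        raise NotImplementedError

    def send_coordination_signal(self, other):
        other.interaction.receive_coordination_signal(
            self.evaluate_coordination(other))

class CoordinationLearningAgent(Agent):
    def choose_action(self):
        # learns to relate received evaluations to its interaction
        # signals; deliberately knows nothing about the domain
        raise NotImplementedError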
3.3 Instantiating the Framework to a Domain
To make an instantiation of the framework for a chosen domain, the following classes have to be designed; a sketch of the resulting coupling follows the list.

- A subclass of Environment. This class contains all objects in the environment. The Environment class takes care of updating the interaction objects of the agents that are present. The method responsible for this can be overridden if necessary.
- A subclass of Interaction. This class should provide domain-specific methods to access and change the environment. It also determines which objects and information from the environment are influencing the agent, and what effect the actions of the agents have. This is done by continuously feeding the inputs of the interaction object with signals from the environment, and by reading the interaction object's outputs and interpreting them as actions, which may affect the environment. This mechanism allows agents to interact with unknown environments. Interaction objects furthermore store the evaluation signals that have been received.
- A subclass of CoordinationSignalingAgent. This will usually be a domain-specific agent. Agents of this class can evaluate the current situation in the environment and, as their name tells, they continuously send signals representing this evaluation to other agents, for example their neighbors.
- Optional: a subclass of CoordinationLearningAgent. Such subclasses should be reusable across domains. An instantiation of CoordinationLearningAgent has to learn to interact with the environment using an Interaction object that was designed for that environment. The agent itself is not required to know anything about this environment, and just tries to figure out a relation between its interaction signals; this corresponds to how its perceptions are related and what effects its actions have on them. Thus, the agents of this class should be able to learn to choose the right actions in unknown environments. Furthermore, domain-specific agents can also be derived from this class. In that case, the framework is used to coordinate a group of similar agents in an environment they already know.

Finally, the choice of the Coordination Evaluation Signals that will be sent by the CoordinationSignalingAgent subclass is an important factor. This signal is the only means for a CoordinationLearningAgent to learn to adapt its choice of actions to its environment. In general, it should reflect the degree to which an agent's actions are in line with those of its neighboring agents. The minimal complexity of the signal admittedly limits the scope of the framework to relatively simple problems. In our view, however, this is unavoidable for a general architecture at the current state of the art in AI. The possibility of increased insight into coordination in general compensates for this limited use in practical applications.
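The coupling mechanism can be sketched as follows, reusing the hypothetical skeleton classes above; the gridworld setting and the signal names 'neighbours' and 'move' are assumptions made purely for illustration.

class GridEnvironment(Environment):
    def neighbours_of(self, agent):
        # toy stub; a real subclass would use positions on the grid
        return [a for a in self.agents if a is not agent]

    def apply_move(self, agent, move):
        pass  # domain-specific effect of the action on the environment

    def update_interactions(self):
        for agent in self.agents:
            agent.interaction.inputs['neighbours'] = self.neighbours_of(agent)

    def step(self):
        self.update_interactions()                    # feed perceptions
        for agent in self.agents:
            if hasattr(agent, 'choose_action'):
                agent.choose_action()                 # agent writes outputs
            move = agent.interaction.outputs.get('move')
            if move is not None:
                self.apply_move(agent, move)          # interpret as action

The point of this design is that the agent only ever reads and writes its Interaction's inputs and outputs; everything domain-specific lives in the Environment and Interaction subclasses.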
3.4 Implementation Issues
The domain-specific CoordinationSignalingAgents 'know' the environment in which they live. Conceptually, they interact with it via their Interaction object. To achieve this, we could include references to objects contained in a domain-specific Environment subclass in the Interaction subclass. For reasons of efficiency, however, we allow them to access parts of the environment directly. This does not compromise the function of the Interaction class; its purpose is to provide a domain-independent way of interacting with environments, but since
domain-specific agents have no need for this, we do not restrict their interactions with the environment to the interaction object.

4 Instantiation of the Coordination Signal Framework for the Pursuit Problem
In this section, we report on the instantiation of the coordination signal framework to a testbed domain, the pursuit problem. The pursuit problem is a well-known testbed problem from the Distributed Artificial Intelligence (DAI) literature. We have implemented this environment, and an agent that can evaluate the coordination between itself and its immediate neighbors. The agent, called MaxCoordinationPredator, simply compares, for each possible action, the coordination evaluations that would result from it, and chooses the action that maximizes this evaluation. Thus, if this evaluation is a correct one, it is clear that this agent behaves rationally.
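A minimal sketch of this greedy choice rule, assuming Python and two hypothetical helpers: eval_pair, the evaluation function defined in Section 4.3, and neighbours_at, which returns the left and right neighbors the predator would have at a candidate position.

def choose_move(predator, candidate_positions, eval_pair, neighbours_at):
    """Greedy rule: pick the position maximizing the sum of the two
    coordination evaluations the predator would receive there."""
    best_pos, best_score = None, float('-inf')
    for pos in candidate_positions:
        left, right = neighbours_at(predator, pos)
        score = eval_pair(left, pos) + eval_pair(right, pos)
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos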
4.1 The Pursuit Problem
We based our implementation on the description of the game and the definitions of performance measures given in [Stephens and Merx, 1990]. We first give a short description of the game; the rules are sketched in code after this description.

- Start. A rectangular, 4-connected grid of 30 x 30 contains a single prey in the middle. This prey chooses randomly between its possible actions: staying where it is, or moving in one of the four directions (or fewer, if some directions are blocked). Diagonal moves are not allowed. The predators are placed at random positions. At each time-step, the prey may move first. Then the predators move. They move one after another, thus avoiding collisions.
- Outcomes. Possible outcomes are capture, stalemate and escape. A capture occurs when the four positions around the prey (left, right, above and below) are occupied. If the prey tries to move beyond the border, it is an escape, unless two predators occupy two of these four positions, in which case it is a stalemate.
- Performance Measures. Stephens and Merx use three performance measures: the Capture Ratio, the Success Ratio, and the Success Efficiency.

Stephens and Merx base their strategies on capture positions: the four positions surrounding the prey. They examine three strategies. The one with least communication is local control, where an agent notifies the other agents when it occupies a capture position. The second strategy is distributed control, where intentions (the capture position an agent wants to occupy) are transmitted before the move cycle. Finally, with central control, one agent commands the other agents.
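The following sketch, assuming Python and integer grid coordinates, encodes these rules; the exact moment at which the capture and escape checks are applied is a simplification of the description above.

import random

SIZE = 30  # 30 x 30 grid, 4-connected, diagonal moves not allowed

def capture_positions(prey):
    x, y = prey
    return {(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)}

def is_capture(prey, predators):
    # capture: all four orthogonal cells around the prey are occupied
    return capture_positions(prey) <= set(predators)

def prey_turn(prey, predators):
    """Random prey move; returns (new_position, terminal_outcome)."""
    x, y = prey
    options = [(x, y)] + list(capture_positions(prey))   # stay or step
    free = [p for p in options if p not in predators]    # never empty:
    choice = random.choice(free)                         # (x, y) is free
    if not (0 <= choice[0] < SIZE and 0 <= choice[1] < SIZE):
        held = len(capture_positions(prey) & set(predators))
        return prey, ('stalemate' if held >= 2 else 'escape')
    return choice, None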
Fig. 2. The Random moving Prey can operate in GridWorld Environments using a specialized Interaction class.
4.2 Design and Implementation
As a first step in constructing a multi-agent system for the pursuit problem, general classes for agents in gridworlds have been designed and implemented. This approach was taken with a view to eventual future implementations of gridworld problems other than the pursuit problem. Next, a pursuit-problem-specific environment and interaction have been designed and implemented. Again, this was quite straightforward, so no further discussion is required.
Fig. 3. The MaxCoordinationPredator can operate in PursuitWorld Environments and sends evaluation signals to its neighbors.
4.3 Coordination Signals
As mentioned in the general discussion on instantiating the framework, an important step in making an instantiation of the Coordination Signal Framework is the definition of application-specific coordination signals. These signals will in future work be used to teach new, domain-independent agents to choose the right action in each situation.

Fig. 4. Result of a high spread factor: optimal angles between predators are favored over distance minimization.

Figure 4 shows a prey surrounded by four predators, A1..A4. A CoordinationSignalingAgent sends evaluations of the coordination between itself and its left and right neighbors. Consequently, it also receives a coordination evaluation from each of its left and right neighbors. The MaxCoordinationPredator chooses its moves by maximizing the sum of the two evaluations it is given. In the figure, A1 has neighbors A2 and A3, and will therefore move to the position that maximizes
Eval(A2, A1) + Eval(A3, A1).

It should be noted that in some cases the agent will have different neighbors at a position it is investigating as a possible choice to move to. As coordination evaluation in the pursuit problem was chosen to apply to direct neighbors only, the agent will take into account the coordination evaluation of this new neighbor, and not that of its neighbor at the current position. In the Pursuit Problem domain, two factors appear to be important: moving towards the prey, and surrounding it. The coordination evaluation therefore combines a distance factor and a spread factor.
The distance factor should encourage minimizing the distance between a predator and the prey. Just minimizing the distance to the prey can be done by any individual, and requires no coordination. To coordinate moving towards the prey, we combine this factor with the degree to which an agent and its neighbor are at the same distance from the prey; the idea is that they should gradually approach the prey. All evaluation factors are in the interval [0..1], 0 representing a bad evaluation and 1 a perfect one. The equidistance parameter determines the weight that is attributed to equidistance relative to the distance factor. E.g., an equidistance parameter of 1 does not take absolute distance into account, only the degree to which the distances of the agent and its neighbor are similar, and one of 0.5 would take both factors into account evenly. An agent's position is expressed in polar coordinates relative to the prey. The spread factor can be formalized by means of a factor that takes the angles of the predators relative to the prey into account. When the agents are maximally spread, the angles between them are equal to

$$\phi = \frac{2\pi}{\#\mathrm{predators}}$$

The overall coordination evaluation signal is a single value that combines the distance factor and the spread factor.
$$\mathit{Eval}(A_p, A_q) = \delta \cdot \mathit{Dist}(A_p, A_q) + (1 - \delta) \cdot \mathit{Spread}(A_p, A_q)$$

$$\mathit{Dist}(A_p, A_q) = \epsilon \cdot \mathit{equidistance}(A_p, A_q) + (1 - \epsilon) \cdot \mathit{distance}(A_p, A_q)$$

$$\mathit{equidistance}(A_p, A_q) = 1 - \frac{|A_p.d - A_q.d|}{\max(A_p.d, A_q.d)}$$

$$\mathit{distance}(A_p, A_q) = \frac{1}{A_p.d + A_q.d}$$

$$\mathit{Spread}(A_p, A_q) = 1 - \frac{|\phi - ((A_p.\phi - A_q.\phi) \bmod 2\pi)|}{E_{max}}$$

where

$$E_{max} = 2\pi - \phi$$

For an agent $A$, $A.\phi$ is the angle and $A.d$ the distance, both relative to the prey. $\delta$ is the distance parameter (the weight of Dist relative to Spread). $\epsilon$ is the equidistance parameter (the weight of the equidistance factor relative to the distance factor). $E_{max}$ is the maximal error in the angle. The MaxCoordinationPredator makes its choices using information that is locally determined, but with a global view. This implies that no intentions of other agents (nor information in which these are implicit) are known to the agent. It is therefore difficult to place the communication complexity relative to the three control strategies of [Stephens and Merx, 1990] described above; the MaxCoordinationPredator has no knowledge of the intentions of other agents, which would put it on a par with their local control. However, it does have a global view which, as a matter of fact, it uses only to determine the positions of its two neighbors. It seems difficult to compare the value of this information to the value of knowledge of other agents' intentions.
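A sketch of this evaluation, assuming Python and that each agent is given by its polar coordinates (d, phi) relative to the prey; it follows the formulas as reconstructed above, so the exact form of the distance term in particular should be treated as an assumption.

import math

def eval_signal(ap, aq, delta, epsilon, n=4):
    """Coordination evaluation Eval(Ap, Aq); ap and aq are (d, phi)
    pairs: distance and angle relative to the prey."""
    d_p, phi_p = ap
    d_q, phi_q = aq
    equidist = 1 - abs(d_p - d_q) / max(d_p, d_q)
    distance = 1 / (d_p + d_q)          # as reconstructed; an assumption
    dist = epsilon * equidist + (1 - epsilon) * distance
    phi_opt = 2 * math.pi / n           # optimal angle between predators
    e_max = 2 * math.pi - phi_opt       # maximal error in the angle
    err = abs(phi_opt - ((phi_p - phi_q) % (2 * math.pi)))
    spread = 1 - err / e_max
    return delta * dist + (1 - delta) * spread

With the parameter values reported in Section 5, this would be called with delta = 0.96 and epsilon = 0.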
5 Results
The coordination evaluation formula contains two parameters, which had to be fixed first. The equidistance parameter and the distance parameter were both initially set to zero. With these parameters, the agents judge the coordination with their neighbors to be useful when they maximize their spreading around the prey. This yields no prey-following behavior, but causes the agents to try to maintain angles of 90 degrees. Then the distance parameter was gradually increased. The equidistance parameter was kept at 0, so that the degree to which predators are at comparable distances to the prey was not taken into account. Even at low values of the distance parameter (e.g. 0.05), the predators move towards the randomly moving prey. An interesting anomaly occurred: sometimes the agents would converge to the orthogonal positions around the prey (i.e. left, right, above and below it), but in many cases they converged to the diagonal positions. As the angles between the agents are also optimal in this situation, it is another stable configuration. However, in the 4-connected pursuit problem we are investigating, only the first of these two stable configurations is a capture situation. This implies that the coordination evaluation formula should favor the first configuration over the second. It turned out that raising the distance parameter yields just that result, which is achieved at a distance parameter value of 0.96. For more details, see [De Jong, 1997]. With these parameters, we tested the performance of the system, which turned out to be satisfactory for our purposes. Further experimentation with the equidistance parameter did not yield better results.
5.1 Analysis of Performance
Figure 5 shows how the constituent factors of the coordination evaluation for one predator evolve over time during one pursuit. Note that the best results were obtained with a distance parameter of 0.96, which results in a coordination evaluation approaching the distance term Dist very closely. The distance factor reflects the absolute distance because the equidistance parameter is set to zero.

Fig. 5. Development of the evaluation's constituent factors for one predator over time in one pursuit (equidistance factor, absolute distance factor, spread factor, and total evaluation).

Figure 6 shows the total coordination evaluations as received by the four predators over time in one pursuit. The evaluation rises gradually, and stays just below its theoretical maximum of 1. This maximum cannot be attained because the predators are not allowed to move over the prey, and consequently the distance term never reaches the optimum of 1. Our goal was not to construct an optimal predator for pursuit problems, but to test whether it is possible to define a coordination evaluation factor that, when maximized, is acceptable in this testbed domain (i.e. yields reasonable predator behavior). If this had turned out not to be the case, then the idea of teaching a CoordinationLearningAgent using this factor would have had to be rethought.
                           Capture  Stalemate  Escape
Local-control¹                  10         47      43
Distributed-control¹            83         13       3
Central-control¹               100          0       0
MaxCoordinationPredator        100          0       0

Table 1. Pursuit outcomes (percentages) of Stephens & Merx (30 trials) and MaxCoordinationPredator (100 trials)
Fig. 6. Development of the total evaluation for four predators over time in one pursuit (predators A, B, C and D).
The first table shows the relative number of captures, stalemates and escapes. The MaxCoordinationPredators yielded the same results as central control, i.e. perfection.
                           Capture ratio  Success ratio  Success efficiency
Local-control¹                     0.100          0.333               0.319
Distributed-control¹               0.833          0.900               0.697
Central-control¹                   1.000          1.000               0.641
MaxCoordinationPredator            1.000          1.000               0.667

Table 2. Performance metrics of Stephens & Merx (30 trials) and MaxCoordinationPredator (100 trials)
¹ Stephens & Merx.

The second table shows the efficiency of the predators. As all games resulted in a capture for both the MaxCoordinationPredator and central control, the capture ratio and success ratio are not of interest. The success efficiency lies just between Stephens and Merx's two most successful strategies. A game-theoretic approach to the pursuit problem encountered in the literature (see [Levy and Rosenschein, 1992]) does not yield better results. In [Korf, 1992], an elegant solution to pursuit games is given. For hexagonal and diagonal (8-connected) games, a distance factor
combined with a repulsion factor yielded very successful results. The repulsion factor is based on distances between predators, and has a function comparable to that of our spread factor, which is based on angles between predators relative to the prey. A difference is that our spread factor influenced results positively in the orthogonal (4-connected) games we are concerned with here, whereas Korf found the repulsion factor to cause stalemates in orthogonal games and therefore only applied it in the hexagonal and diagonal variants. Other differences with our experiments are that the grid is 100 x 100 instead of 30 x 30, and that the prey moves to the position that is furthest away from the nearest predator and rests in 10% of its moves, instead of randomly choosing between the allowed options. These differences make a comparison of the methods difficult; it is reasonable to assume that a randomly moving prey can be captured more easily than one that evades its predators. For the orthogonal (4-connected) case, Korf used a distance function based on the max norm which, like his other solutions and like ours, resulted in a capture in all cases. The results were less satisfactory than those for the other two types of game, since the system had to be stopped artificially when a capture occurred to prevent further motion of the predators. Our solution did not suffer from this problem. The coordination evaluation factor can be concluded to be more than sufficient for our purposes. We therefore conclude that the concept of a domain-independent CoordinationLearningAgent, which learns to act by relating a CoordinationSignalingAgent's signals to its interaction with the environment, is, at least in the case of the pursuit problem, in principle possible.
Fig. 7. A Learning Agent can act in any environment by learning to maximize the evaluations it receives.
6 Conclusions
A framework for coordination in multi-agent systems has been described. The goal of the experiments reported on here was to acquire experience with the framework's instantiation to a domain, and moreover to test whether it is possible to base an agent's behavior on the coordination signals that have been defined.
Both findings were positive. The instantiation of the subclasses for a multi-agent system was straightforward. The MaxCoordinationPredator, an agent that chooses its actions by maximizing the coordination evaluation signals it will receive, ranks among the best predators found in the literature on the pursuit problem.
7 Future work
If coordination learning agents in a finite domain, such as the pursuit problem, had a perfect memory and infinite experience, they would be able to perform just as well as agents that were designed specially for that domain. Two factors trouble this ideal picture. One is that in many domains, waiting until all situations have appeared in the learning phase is clearly unacceptable; even in a problem as simple as the pursuit problem, the number of different situations that can occur is large (in a 30 x 30 grid 900!/895! ≈ 5.8 · 10^14, assuming the individual agents can be recognized). The other is that machine learning algorithms have not reached a state of perfection. Because of the large number of possible situations, learning will have to depend on features that capture the relevant aspects of a domain, and automatic feature extraction remains a hard task. Nevertheless, in the case study of the framework's application to the pursuit problem, acting based on the coordination evaluation signals turned out to yield satisfactory results. In future research, we want to investigate CoordinationLearningAgents, which learn to relate coordination evaluations to the situation and thus learn to be useful in pursuing a prey, even though not designed for this task. Possible extensions are using multiple inheritance (agents that both send signals and learn from signals), extending the vocabulary (enabling agents to suggest actions to other agents), and using non-random preys.
8 Acknowledgements
The author wants to thank Bram Bakker, Dick de Ridder and three anonymous reviewers for useful comments on an earlier version, and Walter van de Velde and Ronald Schrooten for interesting and useful discussions. This research was partially funded by the EU Telematics Information Engineering projects MAGICA and GEOMED. GEOMED (IE 2037) stands for 'Geographical Mediation Systems'. MAGICA (IE 2069) stands for 'Multi-Media Agent-based Interactive Catalogs'. Additional funding has been provided by an OZR contract for research on models for coordination and competition in multi-agent systems.
References

[Benda et al., 1988] Benda, M., Jagannathan, V., and Dodhiawalla, R. (1988). On optimal cooperation of knowledge sources. Technical Report BCS-G2010-28, Boeing AI Center.
[Brazdil et al., 1991] Brazdil, P., Gams, M., Sian, S., Torgo, L., and Van de Velde, W. (1991). Learning in distributed systems and multi-agent environments. In Kodratoff, Y., editor, Machine Learning - EWSL-91, Lecture Notes in Artificial Intelligence, vol. 482, Berlin. Springer Verlag.
[De Jong, 1997] De Jong, E. (1997). AI-Memo 97-05. Technical report, Vrije Universiteit Brussel, AI Lab.
[Genesereth et al., 1986] Genesereth, M. R., Ginsberg, M. L., and Rosenschein, J. S. (1986). Cooperation without communication. In Proceedings of the National Conference on Artificial Intelligence, Philadelphia, Pennsylvania. AAAI-86.
[Korf, 1992] Korf, R. E. (1992). A simple solution to pursuit games. In Working Papers of the 11th International Workshop on Distributed Artificial Intelligence.
[Kraus and Rosenschein, 1992] Kraus, S. and Rosenschein, J. (1992). The role of representation in interaction: Discovering focal points among alternative solutions. Decentralized Artificial Intelligence III.
[Levy and Rosenschein, 1992] Levy, R. and Rosenschein, J. (1992). A game theoretic approach to distributed artificial intelligence and the pursuit problem. Decentralized Artificial Intelligence III.
[M. Fenster and Rosenschein, 1995] Fenster, M., Kraus, S., and Rosenschein, J. (1995). Coordination without communication: Experimental validation of focal point techniques. In ICMAS '95.
[Maturana and Varela, 1992] Maturana, H. and Varela, F. (1992). The Tree of Knowledge: The Biological Roots of Human Understanding. Shambhala, Boston and London.
[Minsky, 1985] Minsky, M. (1985). The Society of Mind. Simon & Schuster, London.
[Rumbaugh et al., 1991] Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., and Lorensen, W. (1991). Object-Oriented Modeling and Design. Prentice-Hall International, Inc.
[Sandip Sen and Hale, 1994] Sen, S., Sekaran, M., and Hale, J. (1994). Learning to coordinate without sharing information. In Proceedings of the National Conference on Artificial Intelligence, pages 426-431, Seattle, Washington.
[Sen and Sekaran, 1995] Sen, S. and Sekaran, M. (1995). Multiagent coordination with learning classifier systems. In Weiss, G. and Sen, S., editors, Adaption and Learning in Multi-Agent Systems, pages 218-233, Berlin, Heidelberg. Springer Verlag.
[Stephens and Merx, 1990] Stephens, L. M. and Merx, M. B. (1990). The effect of agent control strategy on the performance of a DAI pursuit problem. In Proceedings of the 1990 Distributed AI Workshop.
[Van de Velde, 1996] Van de Velde, W. (1996). Cognitive architectures - from knowledge level to structural coupling. In Steels, L., editor, The Biology and Technology of Intelligent Autonomous Agents, pages 197-221.
Causal Reasoning in Multi-Agent Systems

B. Chaib-draa
chaib@ift.ulaval.ca
Département d'informatique, Université Laval
Ste-Foy, Québec, Canada, G1K 7P4
Abstract. Causal knowledge involves many interacting concepts that make it difficult to deal with, and for which analytical techniques are inadequate. Usually, a causal map (CM) is employed to cope with this type of knowledge. Causal reasoning is important in multiagent environments because it allows one to model interrelationships or causalities among a set of individual and social concepts. This provides a foundation to (1) test a model's predictions of how agents will respond to (expected or unexpected) events; (2) explain why agents have performed specific actions; (3) make a decision in a distributed environment; and (4) analyze and compare the agents' causal representations. All these aspects are important for coordination, conflict solving and the emergence of cooperation between autonomous agents. In this paper, we investigate the issue of using cognitive maps (CMs) in multiagent environments. Firstly, we explain through different examples how and why these maps are useful in those environments. Then, we present a formal model which establishes the mathematical basis for the manipulation of CMs. The formal model is used to derive a cognitive map from a set of assertions, and to determine the total effect of any concept variable on any other concept variable.
1 Introduction
Causal reasoning in multiagent systems is about interrelationships between agents. It involves many relations among a variety of social and individual concepts and provides a foundation to: (1) predict (expected or unexpected) events; (2) explain past events; (3) make decisions; and (4) analyze and compare the causal representations of agents. All these aspects can be useful for coordination, conflict resolution and the emergence of cooperation between autonomous agents. Causal knowledge generally involves many interacting concepts that make it difficult to deal with, and for which analytical techniques are inadequate [23]. A causal map (CM) (or a cognitive map) is generally used to cope with this kind of problem. In this paper, we investigate the issue of using causal maps in the context of multiagent systems. Firstly, we explain why and how these maps are used in those environments. Then, we present some formal aspects of causal maps and explain briefly how to use them. Generally, the basic elements of a causal map are simple. The concepts an individual (an agent or a group of agents) uses are represented as points, and
the causal links between these concepts are represented as arrows between these points. This representation gives a graph of points and arrows, called a causal map or cognitive map. The strategic alternatives, all of the various causes and effects, goals, and the ultimate utility of the decision-making agent can all be considered as concept variables, and represented as points in the causal map. Causal relationships can take on different values based on the most basic values + (positive), − (negative), and 0 (neutral). Logical combinations of these three basic values give the following: "neutral or negative" (⊖), "neutral or positive" (⊕), "non-neutral" (±), "ambivalent" (a) and, finally, "positive, neutral or negative" (i.e., "universal") (?) [1,22]. The real power of this approach appears when a causal map is pictured in graph form. It is then relatively easy to see how concepts and causal relationships are related to each other, and to see the overall causal relationships of one concept with another, particularly if these concepts belong to several agents. For instance, Fig. 1 shows relationships between three nations (considered as agents): Taiwan, China, and Moscow (adapted from [22]). This portion of a map is in fact the following allies' belief: "Sending modern arms to Taiwan promotes the reconciliation between China and Moscow". Notice the two basic types of elements that this portion of a map has: concepts and causal beliefs. The concepts are treated as variables, and causal beliefs are treated as relationships between variables. A concept variable is something like "Fury of China". The second type of basic element in a CM is a causal assertion, like +, which means "facilitates", "promotes", etc. Causal assertions are regarded as relating variables to each other, as in the assertion "Sending modern arms to Taiwan promotes Fury of China".
Sending modern arms to Taiwan (a) --+--> Fury of China (b) --⊕--> Reconciliation between Peking and Moscow (c)

Fig. 1. A Portion of a Causal Map (adapted from [28])
Notice that any causal map can be transformed into a matrix called an adjacency or valency matrix. A valency matrix is a square matrix with one row and one column for each concept in a cognitive map. Thus, the causal map of Fig. 1 can be represented by the following matrix:

        a  b  c
    a ( 0  +  0 )
    b ( 0  0  ⊕ )
    c ( 0  0  0 )
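As an illustration of how such a matrix supports causal inference, the following is a small sketch, assuming Python and a simplified sign algebra restricted to {+, −, 0, ?} (the paper's full algebra also includes ⊕, ⊖, ± and the ambivalent value); it composes signs along acyclic paths to obtain the total effect of one concept on another.

MUL = {('+', '+'): '+', ('+', '-'): '-',
       ('-', '+'): '-', ('-', '-'): '+'}

def mul(a, b):
    """Qualitative product of two signs along a path."""
    if a == '0' or b == '0':
        return '0'
    if a == '?' or b == '?':
        return '?'
    return MUL[(a, b)]

def add(a, b):
    """Combine parallel paths: agreement keeps the sign,
    opposite signs yield the unknown value."""
    if a == '0':
        return b
    if b == '0':
        return a
    return a if a == b else '?'

def total_effect(cm, src, dst, seen=frozenset()):
    """Total effect of src on dst over all acyclic paths; cm maps each
    concept to a list of (successor, sign) pairs."""
    result = '0'
    for nxt, sign in cm.get(src, []):
        if nxt in seen:
            continue
        direct = sign if nxt == dst else mul(
            sign, total_effect(cm, nxt, dst, seen | {src}))
        result = add(result, direct)
    return result

On the map of Fig. 1 (approximating ⊕ by + in this simplified algebra), total_effect({'a': [('b', '+')], 'b': [('c', '+')]}, 'a', 'c') returns '+', matching the allies' belief that arms sales indirectly promote the reconciliation.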
It is important to note that causal maps (CMs) follow personal construct theory, first put forward by Kelly [15]. This theory provides a basis for representing an individual's multiple perspectives. Kelly suggests that understanding how individuals organize their environments requires that subjects themselves define the relevant dimensions of that environment. He proposed a set of techniques, known collectively as the repertory grid, in order to facilitate empirical research guided by the theory. Personal construct theory has spawned many fields and has been used as a first step in generating causal maps for decision support. CMs have been used for decision analysis in the fields of international relations [1,7], administrative sciences [24], management sciences [11,26], and distributed group decision support [30,31]. In the latter context, Zhang and his colleagues provided the notions of NPN (Negative-Positive-Neutral) logic, NPN relations, neurons, and neural networks. Based on these notions, they constructed D-POOL, a cognitive map architecture for the coordination of distributed cooperative agents. This architecture consists of a collection of distributed nodes, each node being a CM system coupled with a local expert system. To solve a problem, a local node first pools cognitive maps from relevant agents into an NPN relation that retains both negative and positive assertions. Then, new cognitive maps are derived and focuses of attention are generated. With the focuses, a solution is proposed by the local node and passed to the remote systems. Based on their viewpoints, the remote systems respond to the proposal, and D-POOL strives for a cooperative or compromised solution through coherent communication and perspective sharing. In doing so, Zhang and his colleagues focused on quantitative models of CMs, based on particular techniques for associating intervals with the edges of directed graphs. In their model, quantities combine by propagation along paths, but there is no other connection to the original spirit of cognitive maps, which is causal reasoning. Except for Zhang's work, all other approaches to CMs were based on ad-hoc inference mechanisms about the consequences of a CM. Thus, the definition of a precise semantic interpretation of qualitative causality has received very little attention. The only work that we are aware of in this context is Wellman's approach [29]. This author used an approach based on graphical dependency models, for probabilistic reasoning, and sign algebras, for qualitative reasoning. This type of approach is usually used in AI. However, his techniques are applicable only in the acyclic case [29]. In this paper, we propose an alternative approach based on relation algebra which takes the cyclic case into account. The paper is structured as follows. The next section explains, through different examples, why and how CMs are useful in multiagent environments. Section 3 gives formal rules that allow one to determine indirect and direct effects, and explains how to use those rules. Finally, the paper is summarized.
2 Causal Maps in Multiagent Systems: Why and How
The use of causal maps in multiagent environments is an interesting topic, because causal maps embody the following ideas that are important in this type of environment:
1) causal associations help to reason about a global view of agents;
2) causation allows explanation of past events;
3) causation allows prediction of future actions;
4) causation allows anticipation of the effects of the agents' changes;
5) causal evaluation is a way of choosing among alternative actions or goals;
6) causal maps make it possible to represent and relate the subjective perceptions of agents.
All these ideas are important for negotiation or mediation between autonomous agents. Negotiation and mediation are notions most often stressed in multiagent systems [20] and are generally used to resolve disparities and uncertainties which are generated by the bounded vision that each agent has because of its limited capacities. Notice that if these disparities and uncertainties are not resolved, they can lead to difficulties in interaction and coordination between agents, including conflicts between them. In the rest of this section, we explain how causal maps can be used in the context of coordination, decision-making, emerging cooperation, and reasoning about conflict and negotiation.
2.1 Causal Maps as a Tool for Reasoning about Coordination
In multiagent environments, the coordination process, considered as a central problem, has the following objectives: (1) to ensure that all necessary portions of the overall problem are included in the activities of at least one agent; (2) that agents interact in a manner which permits their activities to be developed and integrated into an overall solution; (3) that agents act in a purposeful and consistent manner; and (4) that all of these objectives are achievable with the available computational resources [18]. A large part of these objectives can be achieved if an agent has a wider view of the group of agents. This specific agent can then manage the activities of the other agents more efficiently. However, such an agent is rare in a realistic environment, where the bandwidth of communication is limited and hence it is difficult to keep one agent informed of all the intentions of other agents. Notice, however, that if the environment is not rapidly changing, agents' unchanging beliefs and the interrelationships between these beliefs can be represented by causal maps. In these conditions, if an agent knows the causal assertions of another agent, she can manage her actions and those of this agent. In the same context, if an agent knows the causal assertions of a group of agents, then she can coordinate the activities of this group. For instance, imagine three professors (considered as agents) $A_1$, $A_2$, $A_3$, who desire to coordinate their respective courses $C_1$, $C_2$, $C_3$ for the next semester. Each course is subdivided into two sections: $S_1^1$, $S_2^1$ for $C_1$; $S_1^2$, $S_2^2$ for $C_2$; and $S_1^3$, $S_2^3$ for $C_3$. In addition, each section has an influence on the other sections. Thus, a section can facilitate, promote, hinder or prevent the goals of another section. Influences between sections in our context are: adopting the same notation, adopting the same book, giving a course before or after another, etc. If we assume that $A_2$ is the agent which has
the widest view of the situation, that is, she knows the causal assertions of $A_1$, $A_2$ and $A_3$, then this agent can compare the causal representations and negotiate with $A_1$ and $A_3$ (according to some social laws) in order (1) to resolve problems due to the negative relations and (2) to manage the positive relations efficiently.
2.2 Modeling Subjective Views by Causal Maps
In multiagent environments, each agent is conscious of only a portion of the world, because its perception is limited. Each agent must represent its portion to reason about it, and this representation process is generally subject to problems of incompleteness. In these conditions, agents have to be able to cope with differences between their respective views. Researchers in distributed systems have used different approaches for resolving such disparities between agents. Thus, Halpern and his colleagues [12] have used reasoning about knowledge to study the knowledge of agents who reason about the world and each other's knowledge. Gmytrasiewicz and Durfee [13] have elaborated a recursive modeling method (RMM) based on a payoff matrix, which succinctly and completely expresses the subjective view of an agent about its environment, its capabilities, and its preferences. Other frameworks have been proposed for the reconciliation of disparate views of situations: for example, the contract net protocol [25] or partial global plans [8,9]. Finally, a negotiation mechanism as a process of improving agreement (reducing inconsistency and uncertainty) has also been used [17,21,27]. In this subsection, a new approach based on causal maps is proposed for reducing disparities between agents. As stated previously, there are many reasons behind this choice: reasoning on others' subjective views, prediction, explanation, etc. Note that in an uncertain multiagent environment it is fundamental to our common sense (and also to the personal construct theory of Kelly [15]) that every individual agent sees a situation through a unique set of perceptual filters that reflects its capabilities and its experience. Therefore, it is unusual for an agent to see a situation as others see it, and it is necessary to consider the interactions of individuals or groups of individuals having different appreciations of a situation and hence different CMs. Evidently, different CMs might produce a conflict between agents. A conflict is considered here as a situation in which individuals possess CMs that apparently cannot be reconciled. One method to solve this type of conflict is to allow agents to negotiate or to go to mediation in order to develop more harmonious relations. Negotiation here means that some agents try to alter others' causal maps by persuading these agents that it is in their real interest to make these alterations; if this can be achieved, then more harmonious interactions could develop. On the other hand, mediation means appealing to a mediator who negotiates with each of the participants in the conflict to arrive at a mutually satisfying arrangement. The role of this mediator consists in coming up with a negotiated agreement by proposing arrangements based on arguments. In the context of modeling subjective views by CMs, it is helpful to visualize the structure of a causal map by drawing it as a graph. This graph can be
constructed by one or many agents, in order to perform a causality-based analysis where choices are made in terms of the consequences of believed courses of action, and can be explained in terms of antecedent circumstances. In fact, we can use causal maps at different levels to represent the subjective views of agents [5]. Precisely, first-order CMs show the views of individuals (or groups of individuals) such as I and J; second-order CMs show what agent I thinks agent J is thinking, and vice versa; third-order maps show what agent I thinks agent J thinks agent X is thinking, and vice versa. Similarly, higher-order CMs can be constructed, but the reflections on reflections rapidly diminish, like mirrors facing each other. Of course, higher-order causal maps are by no means the only method of depicting such structures. However, the method proposed in this section is relevant in those cases where some sort of causal belief structure appears to be helpful in thinking about inter-agent interaction. As illustrative examples, consider the maps in Fig. 2 and 3, which represent the first- and second-order perceptions of a situation involving a typical interaction between administrators and staff in a university. Notice that, for the sake of simplicity and readability of the causal maps, we have only considered here the three
basic relationships (+, 0, −)¹. In these conditions, Fig. 2(a) indicates the point of view of the administrators and expresses that the global income of the university depends both on the number of students and upon supplementary sources of funds, such as those provided by advertising and marketing campaigns. In Fig. 2(b), administrators believe that the staff has a bounded vision of the overall situation; in particular, administrators believe that the staff does not understand the importance of advertising and marketing campaigns for improving the number of students.

¹ 0 represents here both "unrelated" and "neutral".

Fig. 2. (a) Administrators' causal map (CM_A); (b) administrators' perception of the staff's causal map (CM_SA)
Fig. 3. (a) Staff's causal map (CM_S); (b) staff's perception of the administrators' causal map (CM_AS)
The members of the university staff also have their own point of view on the overall situation, as indicated in Fig. 3(a). Their causal map CM_S emphasizes the working facilities (of the staff) and the recruitment of additional staff in order to improve teaching and research quality. Notice that this quality has an indirect positive effect on the global income. Their causal map also shows that the global income promotes precarious jobs which, in turn, negatively influence global
income. Thus, the staff's causal map specifies what is important for the members of staff: their working conditions, and the recruitment of precarious staff, which has a negative indirect effect on the recruitment of permanent staff. Finally, the causal map CM_AS of Fig. 3(b) denotes the staff's perception of the administrators. In this cognitive map, the staff believes that the administrators do not see the importance of adequate working conditions for the staff and, worse, that the administrators try to replace permanent staff by precarious staff, with few advantages, in order to improve the global income. To summarize, subjective views represented by causal maps can be used by agents to reason about others. This reasoning can bear on (1) predicting what others can do; (2) explaining what others have done; (3) trying to demonstrate to others the importance of some area of causal relationships between concepts, etc. All these aspects allow agents to coordinate their activities and to develop more coherent interactions. We now describe the reasoning process, based on subjective views, through the previous examples. To facilitate the representation of causal maps, we code the concept variables as follows: 1: Recruitment of staff; 2: Staff's wages; 3: Operation costs; 4: Wages; 5: Global income; 6: Partial income; 7: Advertising and marketing campaigns; 8: Number of students; 9: Selection based on school records; 10: Performance of university; 11: Teaching and research facilities; 12: Teaching and research quality; 13: Working facilities; 14: Precarious staff; 15: Failures (of students). In this case, CM_A, CM_SA, CM_S, and CM_AS can easily be transformed into square adjacency matrices of size 15 x 15: V_A, V_SA, V_S and V_AS. In these matrices, each element {ij} takes a value {+, 0, −} as expressed in the causal maps CM_A, CM_SA, etc. This matrix representation of CMs allows administrators and staff to construct new matrices A (merging the information from V_A and V_SA) and S (merging the information from V_S and V_AS). In matrix A (resp. S), the notation [x, y] in entry i, j expresses that the causal relation between C_i and C_j is x in the causal map CM_A (resp. CM_S) and y in the causal map CM_SA (resp. CM_AS). Thus, for example, the notation [+, −] in A expresses that the relationship between C_i and C_j is + in causal map CM_A and − in causal map CM_SA. Entries consisting of [0, 0] are left blank.
[Matrix A is displayed here: a 15 × 15 array over concepts 1-15 whose entries are pairs [x, y] such as [+, 0], [−, 0], and [+, +]; the individual entries are not recoverable from the scan.]
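To make the merge step concrete, the following Python sketch pairs an agent's own valency matrix with its view of the other party, as in the construction of A from V_A and V_SA. The function name and the tiny two-concept example are illustrative assumptions of ours; the paper gives no code.

```python
# Hypothetical sketch: build a comparison matrix such as A by pairing
# an agent's own valency matrix (e.g. V_A) with its view of the other
# party's map (e.g. V_SA). Entries are '+', '-' or '0'.

def merge_valency(own, other):
    """Return the n x n matrix of pairs [x, y] described in the text."""
    n = len(own)
    return [[(own[i][j], other[i][j]) for j in range(n)] for i in range(n)]

# Two-concept example: the agent sees C1 -> C2 as '+', but its view of
# the other party contains no such link, yielding the pair ('+', '0').
V_A = [["0", "+"],
       ["0", "0"]]
V_SA = [["0", "0"],
        ["0", "0"]]
A = merge_valency(V_A, V_SA)
print(A[0][1])  # ('+', '0'): a link the agent believes is not shared
```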
With A and S, each party can compare its causal relations with those of the other. When the time comes for negotiation between both parties (administrators and staff) to renew their "social contract", matrix A permits administrators to see on which causal links they think they agree with the members of staff (i.e., [+, +] or [−, −]) and on which causal links they think they disagree with them (i.e., [+, 0] or [−, 0]). In the latter case, administrators try to convince the members of staff to alter their CMs so that the causal links [+, 0] and [−, 0] become [+, +] and [−, −]. On the other hand, the staff defends its point of view, which is based on matrix S. Members of the staff try to convince administrators to change their beliefs so that the causal links [0, +] and [0, −], as reflected in S, become [+, +] and [−, −].
[Matrix S is displayed here: the staff's 15 × 15 counterpart of A, with paired entries such as [+, 0], [+, +], and [0, −]; the individual entries are not recoverable from the scan.]
We now examine the case where possible actions are considered by agents (prediction). Suppose that, in our university context, administrators want to construct a new building for services (including restaurants, banks, bars, libraries, movies, theater, etc.) in order to raise money. Administrators can view this action (through their causal map CM_A) as a new contribution to improving the global income of the university, whereas the university staff can consider this idea (through their subjective view of the administrators' causal map, CM_AS) as a way to recruit more precarious staff. This is a conflict, and it can be resolved by negotiation between administrators and staff. To achieve this, administrators must try to alter CM_AS by persuading the staff, with arguments, that (1) the university community needs such a new building, and (2) the staff's belief about the recruitment of more precarious staff is a misperception. The staff can counter by trying to demonstrate that just the opposite holds, or that administrators simply want to make money, etc. Now suppose that administrators and staff cannot agree on a satisfactory settlement at the time of the discussions about their social contract. In this situation, both parties might agree to appeal to a mediator from another university to find a solution to this impasse. We assume that they agree on a mediator who knows the situation very well (for example, the chosen person is a former professor who has been working in administration for ten years). In these conditions, the chosen mediator knows the concepts which are relevant to this situation and, consequently, can construct a grid from those concepts that she communicates to
both parties in order to collect their beliefs about causal links and therefore their CMs. After this collection, the mediator constructs A and S and analyzes them through a new matrix M to identify potential conflicts and potential positive relationships.
[Matrix M is displayed here: a 15 × 15 array combining A and S, whose entries carry the agreement codes A, S, U_A, U_S, U_SA, U_AS, S_A, O_A, and O_B explained below; the individual entries are not recoverable from the scan.]
The matrix M includes A and S and is based on Bryant's representation [6]. In this representation, the notation [+, 0; +, 0] indicates that the causal relationship between concepts C_i and C_j is + in the causal map CM_A, neutral in CM_SA, + in CM_S, and neutral in CM_AS. In these conditions, [+, +; +, +] and [−, −; −, −] express a general agreement (noted A) about a causal link. Similarly, [+, 0; 0, 0] and [−, 0; 0, 0] (noted U_A), or [0, 0; +, 0] and [0, 0; −, 0] (noted U_S), express a unilateral view of the administrators or of the staff. In the same way, notations such as [0, +; 0, 0] and [0, −; 0, 0] (noted U_SA), or [0, 0; 0, −] and [0, 0; 0, +] (noted U_AS), specify a unilateral view on another agent. Other causal links include: (1) [+, +; +, 0], [−, −; −, 0], [−, 0; −, −] or [+, 0; +, +];
(2) [+, +; 0, 0], [0, 0; +, +], [−, −; 0, 0] or [0, 0; −, −]; (3) [+, 0; +, 0], [−, 0; −, 0], [0, −; 0, −] or [0, +; 0, +]. In (1), both parties see the same causal relationship and one side believes that the other side shares the same perception (noted S_A or S_B). In (2), one participant holds a point of view and believes the other does too (noted O_A or O_B). Finally, in (3), both parties see the causal relation in the same way, but neither believes that the other shares this perception (noted S); a sketch of this coding is given at the end of this subsection. The matrix M allows the mediator to study: (1) areas of consensus between agents (for instance, the negative relationship between the selection based on school records and the number of students accepted at the university, the negative relationship between teaching and research facilities and students' failures, etc.); (2) unilateral views of the administrators, represented by U_A; (3) unilateral views of the university staff, U_S. The problem of how the mediator reasons on these aspects to solve conflicts between both parties will be investigated in the future. Notice that the emerging potential for cooperation can also be studied by analyzing the agents' cognitive maps. This suggests that cognitive map analysis can be used to explore the following: (1) How do an agent's characteristics (e.g., its rationality, its sincerity, its role, etc.) affect its cognitive map? (2) How can we efficiently focus on patterns of similarities and differences between agents, with
respect to both their causal beliefs and their roles? (3) How can cognitive maps be analyzed separately in terms of complexity, density, imbalance and inconsistencies, relative to some salient goals? The objective behind this analysis is to locate groupings of goals that could result in a coalition of agents on the issue of establishing a cooperation [14].
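As a rough illustration of the coding described above, the following Python sketch classifies a four-tuple (value in CM_A, in CM_SA, in CM_S, in CM_AS) into the labels of the text. The function name, the orientation of the S_A/S_B labels, and the fallback label '?' are assumptions of ours; combinations not catalogued in the text are simply left unclassified.

```python
# Hypothetical sketch of Bryant-style coding of matrix M entries.
# An entry is a 4-tuple over {'+', '-', '0'} in the order
# (CM_A, CM_SA, CM_S, CM_AS), following the text above.

def classify(entry):
    a, sa, s, as_ = entry
    signed = "+-"
    if a == s and a in signed and sa == as_ == a:
        return "A"      # general agreement
    if a in signed and sa == s == as_ == "0":
        return "U_A"    # unilateral view of administrators
    if s in signed and a == sa == as_ == "0":
        return "U_S"    # unilateral view of staff
    if sa in signed and a == s == as_ == "0":
        return "U_SA"   # unilateral view on the other agent
    if as_ in signed and a == sa == s == "0":
        return "U_AS"
    if a == s and a in signed and (sa == a) != (as_ == a):
        return "S_A" if sa == a else "S_B"  # one side believes it is shared
    if a == sa and a in signed and s == as_ == "0":
        return "O_A"    # one side holds a view and believes the other does too
    if s == as_ and s in signed and a == sa == "0":
        return "O_B"
    if a == s and a in signed and sa == as_ == "0":
        return "S"      # same view on both sides, neither believes it shared
    return "?"          # combination not catalogued in the text

print(classify(("+", "+", "+", "+")))  # A
print(classify(("+", "0", "+", "0")))  # S
```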
2.3 Causal Maps as a Tool for Qualitative Decision Making in Multiagent Environments
Causal maps can also help an agent, or a group of agents considered as a whole, to make a decision. Given a cognitive map with one or more decision variables and a utility variable, which decision should be taken and which should be rejected? To decide, the concerned agent should calculate the total effect of each decision on the utility variable. Decisions that have a positive total effect on utility should be chosen, and decisions that have a negative total effect should be rejected. Decisions with a nonnegative total effect on utility should not be rejected, and decisions with a nonpositive total effect should not be accepted. No advice can be given about decisions with a universal, a non-zero, or an ambivalent total effect on utility. To solve the decision-making problem, an agent generally needs to analyze its cognitive map, or the cognitive map of its organization if this map is available. To illustrate the decision-making process in the context of multiagent environments, consider, for example, the causal map of a professor (considered as an agent) P1 shown in Fig. 4. This professor has to choose between two courses D1 and D2 (D1 and D2 are decision variables). Furthermore, P1 works with a colleague P2 in the same research group (called here G12) and shares some students with him. P1's causal map, shown in Fig. 4, includes the following beliefs. D1 favors the theoretical knowledge of G12's students. Greater theoretical knowledge gives greater motivation to students. Greater motivation of students gives a better quality of research for group G12, which gives, in turn, a greater utility of G12. P2 gives a course C1 that improves, as D1 does, the theoretical knowledge of G12's students. This course, however, has the disadvantage of being very hard, which makes G12's students lose their motivation. Finally, the second decision variable D2 is an easy course that decreases the workload of P1. Obviously, decreasing P1's workload increases her utility. In this case, how can P1 make her choice between the two courses D1 and D2? Notice that, in the context of our example, P1 should reason about other agents (i.e., P2 and G12) to make her decision. Under some circumstances, she can also collaborate with them to develop her decision. In this sense, the decision-making process considered here is a multiagent process. To run this process, it might be useful to convert the causal map being analyzed into the form of a valency matrix V. With the valency matrix, P1 can calculate indirect paths of length 2 (i.e., V^2), 3 (i.e., V^3), etc., and the total effect matrix V_t (see Definition 10 below). In fact, V_t tells P1 how the decision variables D1 and D2 affect her utility and G12's utility. The formal model that we propose in the next section allows direct and indirect effects to be calculated and consequently allows agents to make decisions.
Fig. 4. An Illustrative Example for Decision-Making in a Multiagent Environment
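Before formalizing the machinery, the decision rule just described can be sketched directly; the total effects are assumed inputs here (Section 3 shows how to compute them), and the function name is our own.

```python
# Hypothetical sketch of the qualitative decision rule: accept a
# decision whose total effect on the utility variable is positive,
# reject a negative one, and give no advice otherwise.

def advise(total_effect_on_utility):
    if total_effect_on_utility == "+":
        return "choose"
    if total_effect_on_utility == "-":
        return "reject"
    return "no advice"

# P1's two candidate courses, with their total effects on her own
# utility as assumed from the example (D1: '-', D2: '+').
print(advise("-"))  # D1 -> reject (w.r.t. P1's own utility)
print(advise("+"))  # D2 -> choose
```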
2.4 Causal Maps as a Tool for Representing a Dynamic Wholistic Approach of an Organization of Agents
Weick [28] suggested changing the prevalent static view of an organization to a dynamic view sustained by change. More precisely, he proposed that organization and change are two sides of the same social phenomenon. His reasoning was that an organization is a process of co-evolution of agents' perceptions, cognitions and actions. In this context, Weick proposed a theory of organization and change based on the graphs of loops in evolving social systems. More recently, additional investigations guided by this approach [3,4] have tried to articulate how causal maps provide a way to identify the loops that produce and control an organization. In multiagent systems, the study of an organization of agents has generally focused on structural models such as [2]: (1) centralized and hierarchical organizations, (2) organizations as authority structures, (3) market-like organizations, (4) organizations as communities with rules of behavior. All these structures miss the dynamic aspects and influences that exist in an organization of agents. Generally, dynamic aspects and influences evolve through paths that close on themselves and form loops. We have realized that such loops are important for an organization of agents for two main reasons: (i) a change in an organization is the result of deviation-amplifying loops, and (ii) the stability of an organization is the result of deviation-countering loops [4]. As an example, consider
the organization that binds researchers, grant agencies and qualified personnel in any (science and engineering) department. The causal map representing this organization is shown in Fig. 5; its meaning is clear and we shall not explain it further.
Fig. 5. An organization of agents as loops
In this causal map, concepts link together to form loops, some of which are numbered (1) to (7). Loops (1) and (4)-(7) are deviation-amplifying loops. Change in the organization is the result of such loops, because any initial increase (or decrease) in any concept loops back to that concept as an additional increase (or decrease) which, in turn, leads to a further increase (or decrease). Thus, in loop (5), an increase in research quality improves researcher satisfaction. An increase in the satisfaction of researchers improves, in turn, the retention of the best researchers. Finally, the improvement in the retention of the best researchers improves research quality. Loops (2) and (3) are deviation-countering loops [4]. The stability of the organization is the result of such loops. In the case of loop (2), for instance, an increase in resources for research can lead to an increase in salaries which, in turn, reduces the resources allotted to research. If this reduction is not enough to compensate for the initial increase in resources, then a residual increase in salaries takes place which, in turn, reduces the resources, and so on, until a balance between the
initial increase in resources and salaries is reached. Thus, deviation-countering loops are useful for stabilizing the growth generated in an organization. To summarize, we can conceptualize an organization of agents as a "whole" composed of loops of influences. This is a wholistic approach in which the "whole" constrains the concepts and the relationships between them. By achieving this, we obtain a dynamic system in which deviation-amplifying loops are responsible for change and deviation-countering loops are responsible for the stability of the organization. Using these loops, an individual strategist can direct strategic change in the desired direction. This can be done by (1) choosing and changing a loop or (2) choosing and changing a set of loops [4]. In the next section, we provide rules that allow inferences to be made in the context of causal maps. In this way, we can determine the type of a loop in an organization of agents.
2.5 Discussion
We have shown how and why causal maps are useful in multiagent environments. The formal model presented in the next section provides a formal procedure for those maps. After presenting the formal model, we will explain how to use it by means of the previous examples. It is important, however, to point out that the causal map approach adopted here is generally applicable to environments that do not change rapidly. In other words, the causal maps considered here are relatively stable. It is also worth pointing out that the approach adopted here seems inappropriate for open environments where agents enter and leave the group dynamically. The main reason is that it is very difficult to update causal maps and to ensure coherence between concepts. There are also some other assumptions about CMs that must be considered [23]. The most important are (1) pairwise influence and (2) superposition. The first assumption requires relations in a CM to be binary relations. The second means that the result of applying C1 and C2 together is the same as applying C1 and C2 sequentially. Finally, it is important to point out that CMs can be derived from different sources, including documents, questionnaires, grids, and interaction and communication between agents.
3 Formal Aspects
For the sake of understanding and using cognitive maps in multi-agent environments, here are some definitions that should be remembered. A more rigorous version of these definitions will be given in the next section.

Definition 1. A cognitive map CM := (C, A) is a directed graph which represents an agent's assertions about its beliefs with respect to its environment. The components of this graph are the points (C) and the arrows (A) between those points (points and arrows are defined below).
Definition 2. A point represents a concept variable, which may be a goal option of any agent. It can also represent the utility of any agent or the utility of its group or organization, or any other concept that may take on different values in the context of multi-agent reasoning.
Definition 3. An arrow represents a causal relation between concepts. It represents a causal assertion of how one concept variable affects another. As stated in the Introduction, causal links are labelled by elements of the set {+, −, 0, ⊕, ⊖, ±, a, ?}.
Definition 4. A path from variable v_1 to variable v_n is a collection of distinct points v_1, v_2, ..., v_n together with the arrows v_1v_2, v_2v_3, ..., v_{n−1}v_n. A path is trivial if it consists of a single point.

Definition 5. A cycle is a nontrivial path together with an arrow from the terminal point to the initial point of the path.
Definition 6. An acyclic CM is a graph that has no cycle.
Definition 7. An indirect effect is the result of combining the direct effects of relationships that are in a sequence. For instance, from v_i →(+) v_j →(+) v_k, there is an indirect effect v_i →(+) v_k. Usually, this operation is called "multiplication" (*) and its rules are:

+ * y = y if y ∈ R,   (1)
0 * y = 0 if y ∈ R,   (2)
− * − = +,   (3)
a * y = a if y ∈ R − {0},   (4)
* distributes over ∪,   (5)
x * y = y * x if x, y ∈ R,   (6)

where R is the set of all four basic types of causal relations, i.e., R = {+, −, 0, ?}. These six rules seem reasonable. For example, rule (2) says that, if v_i has no effect on v_j (x = 0), it is natural that v_i has no effect on v_k (x * y = 0) no matter what the effect of v_j on v_k is (y ∈ R). Finally, rule (5) says that * distributes over ∪; for instance, − * ⊖ = {−} * ({0} ∪ {−}) = ({−} * {0}) ∪ ({−} * {−}) = {0} ∪ {+} = ⊕.

Definition 8. The total effect of v_i on variable v_j is the sum of the indirect effects of all the paths and cycles from v_i to v_j. The rules governing the sum (|) are the following:
0 | y = y if y ∈ R,   (7)
a | y = a if y ∈ R,   (8)
x | x = x if x ∈ R,   (9)
+ | − = ?,   (10)
x | y = y | x if x, y ∈ R,   (11)
| distributes over ∪.   (12)
Here also, R is the set of all four basic types of causal relations, i.e., R = {+, −, 0, ?}. Similarly, these rules are also reasonable.
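A minimal Python sketch of the two operations follows, restricted to the four basic relations R = {+, −, 0, ?} (the composite labels ⊕, ⊖, ± and the label a are omitted for brevity). The tables follow rules (1)-(4) and (7)-(10) as reconstructed above; the function names are ours.

```python
# Hypothetical sketch of qualitative "multiplication" (*) and "sum" (|)
# over the four basic relations R = {'+', '-', '0', '?'}.

def mul(x, y):
    """Indirect effect along a sequence of two links: x * y."""
    if x == "0" or y == "0":
        return "0"                     # rule (2), plus commutativity (6)
    if x == "?" or y == "?":
        return "?"                     # an ambivalent link stays ambivalent
    return "+" if x == y else "-"      # rules (1) and (3)

def add(x, y):
    """Combination of two parallel paths: x | y."""
    if x == "0":
        return y                       # rule (7)
    if y == "0":
        return x                       # rule (7) with commutativity (11)
    if x == y:
        return x                       # rule (9)
    return "?"                         # opposing or ambivalent paths, rule (10)

assert mul("-", "-") == "+"
assert add("+", "-") == "?"
```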
Definition 9. The valency matrix V of a CM is a square matrix of size n, where n is the number of concepts in the corresponding cognitive map. Each element v_ij can take on the values {+, −, 0, ⊕, ⊖, ±, a, ?} to characterize the relationship between elements i and j of the CM. Evidently, i and j of V correspond to two concepts of the CM, and v_ij characterizes the relationship between these two concepts.

Definition 10. The total effect matrix V_t is the matrix which has as its ij-th entry the total effect of v_i on v_j. This matrix can be calculated from the direct effects matrix with the operations of "addition" and "multiplication" defined by (here V_ik denotes the ik-th element of the matrix V):
(V | W)_ij = v_ij | w_ij,   (13)
(V * W)_ij = (v_i1 * w_1j) | (v_i2 * w_2j) | ... | (v_in * w_nj).   (14)

Notice that the operations * and |, between elements of matrices, obey rules (1)-(12) defined above. In these conditions, we have V^1 = V, V^n = V * V^{n−1}, and

V_t = V | V^2 | V^3 | ... | V^{n−1}.
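On top of mul and add above, equations (13)-(14) and the total effect matrix can be sketched as follows; this is our reading of the reconstructed formulas, not the authors' implementation.

```python
# Hypothetical sketch of Definition 10: qualitative matrix sum and
# product, and V_t = V | V^2 | ... | V^(n-1).

from functools import reduce

def mat_add(V, W):
    n = len(V)
    return [[add(V[i][j], W[i][j]) for j in range(n)] for i in range(n)]

def mat_mul(V, W):
    n = len(V)
    return [[reduce(add, (mul(V[i][k], W[k][j]) for k in range(n)))
             for j in range(n)] for i in range(n)]

def total_effect(V):
    n = len(V)
    powers, P = [V], V
    for _ in range(n - 2):             # accumulate V^2 ... V^(n-1)
        P = mat_mul(P, V)
        powers.append(P)
    return reduce(mat_add, powers)

# Three concepts with 1 -> 2 labelled '+' and 2 -> 3 labelled '-':
# the total effect of concept 1 on concept 3 is '-'.
V = [["0", "+", "0"],
     ["0", "0", "-"],
     ["0", "0", "0"]]
print(total_effect(V)[0][2])  # '-'
```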
The previous rules, which determine indirect and direct effects, can be used by autonomous agents for reasoning about coordination between agents in real-world applications such as the example of Section 2.1. Such rules are also useful when reasoning about subjective views (see Section 2.2). In that case, agents can use those rules to determine (1) areas of consensus and (2) unilateral views. Finally, the previous formal model can also be used to make decisions, particularly in a distributed environment, as shown in Section 2.3. As explained in that section, the decision maker should calculate the total effect of each decision on the utility variable. Decisions that have a positive total effect on utility should be chosen, and decisions that have a negative effect should be rejected. Advice on other total effects can be based on heuristics. Adopting this procedure for the example of Section 2.3 gives the following matrix of size 2 × 2 (keeping only the relevant entries), involving two decision concepts (DC), D1 and D2, and two utilities considered as value concepts (VC), namely Utility of G12 and Utility of P1:

DC \ VC    Utility of G12    Utility of P1
D1              +                 −
D2              −                 +
Thus, P1 perceives (1) decision D1 as having a positive effect on the Utility of G12 and a negative effect on her own utility; (2) decision D2 as having a negative
effect on the Utility of G12 and a positive effect on her own utility. In these conditions, the choice depends on how P1 and P2 want to cooperate and on how they rank the Utility of G12 and the Utility of P1. If we assume, for example, that the Utility of G12 is more important than the utility of P1, then decision D1 would be preferred. Conversely, D2 would be the preferred decision if the utility of P1 is more important than the utility of G12. Finally, our formal model can also be used on the example about the wholistic view of an organization (see Section 2.4). It allows the identification of the type of loops (deviation-amplifying or deviation-countering) and the development of a strategic plan to change a wholistic system by changing its loops. More precisely, if the ii-th entry of V_t is + (or ⊕), then there exists at least one deviation-amplifying loop through node i. Similarly, if the ii-th entry of V_t is − (or ⊖), then there exists at least one deviation-countering loop through i. Finally, if the ii-th entry of V_t is ± (or any relation containing ±, such as ?), then node i belongs to both an amplifying and a countering loop. Strategic changes to a wholistic system can be made by changing a loop or a set of loops [4]. Of course, the loop to be changed should be a weak loop which is loosely coupled to the system. Changing a loop (from deviation-amplifying to deviation-countering, or vice versa) can be done by (1) adding, removing, or replacing a node; (2) changing the label of a link. Finally, note that changing a set of loops is more subtle and requires strategies that avoid conflicts between agents. The difficult problem of changes in wholistic systems is an issue for our future work.
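The diagonal test just described can be phrased as a small helper on top of total_effect above, restricted again to the basic labels; the function name is a hypothetical one of ours.

```python
# Hypothetical sketch: read the loop type through node i off the
# diagonal of the total effect matrix, as described in the text.

def loop_type(V, i):
    e = total_effect(V)[i][i]
    if e == "+":
        return "deviation-amplifying loop through node %d" % i
    if e == "-":
        return "deviation-countering loop through node %d" % i
    if e == "?":
        return "node %d lies on both kinds of loop" % i
    return "no loop detected through node %d" % i

# Concepts 0 and 1 reinforce each other; a third concept is isolated.
V = [["0", "+", "0"],
     ["+", "0", "0"],
     ["0", "0", "0"]]
print(loop_type(V, 0))  # deviation-amplifying loop through node 0
```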
4 Conclusion and Future Work
We have shown that causal reasoning in multi-agent systems is an important topic which allows autonomous agents to reason about the interrelationships among a set of individual and social concepts. We have also explained how such a qualitative reasoning process is important for coordination, negotiation and cooperation. Finally, we have proposed some formal aspects, in particular some intuitive inference rules, that allow agents to reason from causes to effects. There are many directions in which the work presented in this paper can be extended: (1) negotiation and mediation between agents in the context of reasoning about subjective views; (2) the knowledge available or necessary to agents in the case of nested causal maps [10]; (3) reasoning in and on the wholistic approach; (4) reasoning on social laws, particularly for qualitative decision making and coordination. Another option is to study "fuzzy relations" between agents' concepts [16,30,31]. Our approach might be extended in this direction to take into account many, possibly vague, degrees of influence between agents, such as: none, very little, sometimes, a lot, usually, more or less, etc.
Acknowledgments

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada under grant OGP-0121634.
References

1. R. Axelrod, (Ed.). Structure of Decision: The Cognitive Maps of Political Elites. Princeton University Press, 1976.
2. A. H. Bond and L. Gasser, (Eds). Readings in Distributed Artificial Intelligence. Morgan Kaufmann, San Mateo, CA, 1988.
3. M. G. Bougon. Uncovering Cognitive Maps: The Self-Q Technique. Privately printed handbook, Penn State Univ., 1986.
4. M. G. Bougon and J. M. Komocar. Directing strategic change: a dynamic wholistic approach. In A. S. Huff, (Ed.), Mapping Strategic Thought. Wiley and Sons, 1990, pp. 135-163.
5. J. W. Bryant. Hypermaps: a representation of perception in conflicts. Omega, 11, pp. 575-586, 1983.
6. J. W. Bryant. Modelling alternative realities in conflict and negotiation. J. Opl. Res. Soc., 35, 11, pp. 985-993, 1984.
7. D. M. Buede and D. Ferrell. Convergence in problem solving: A prelude to quantitative analysis. IEEE Trans. Syst., Man, Cybern., vol. 23, 1993, pp. 746-765.
8. K. S. Decker. TÆMS: a framework for environment centered analysis and design of coordination mechanisms. In G. M. P. O'Hare and N. Jennings, (Eds), Foundations of Distributed Artificial Intelligence, Wiley InterScience, 1996, pp. 429-448.
9. E. H. Durfee. Planning in distributed artificial intelligence. In G. M. P. O'Hare and N. Jennings, (Eds), Foundations of Distributed Artificial Intelligence, Wiley InterScience, 1996, pp. 231-246.
10. E. H. Durfee. Blissful ignorance: knowing just enough to coordinate well. Proc. of the First Int. Conf. on Multi-Agent Systems, MIT Press, San Francisco, USA, 1995, pp. 406-413.
11. C. Eden and D. Sims. Thinking in Organizations. Macmillan, London, 1979.
12. R. Fagin, J. Y. Halpern, Y. Moses, and M. Y. Vardi. Reasoning About Knowledge. MIT Press, 1995.
13. P. J. Gmytrasiewicz and E. H. Durfee. A rigorous, operational formalization of recursive modeling. In Proc. of the First Int. Conf. on Multi-Agent Systems, 1995, pp. 125-132.
14. J. Hart. Comparative cognition. In [1].
15. G. A. Kelly. The Psychology of Personal Constructs. New York: Norton, 1955.
16. B. Kosko. Neural Networks and Fuzzy Systems. Prentice Hall, 1992.
17. B. Laasri, S. Lander, and V. Lesser. A generic model for intelligent negotiating agents. Int. J. Intell. Coop. Inf. Syst., 1(1), 1992, pp. 291-318.
18. V. R. Lesser and D. D. Corkill. Distributed problem solving. In S. C. Shapiro, (Ed.), Encyclopedia of Artificial Intelligence, Wiley, New York, 1987, pp. 245-251.
19. J. C. C. McKinsey. Postulates for the calculus of binary relations. Journ. Symbolic Logic, 5, 1940, pp. 85-97.
20. B. Moulin and B. Chaib-draa. An overview of distributed artificial intelligence. In G. M. P. O'Hare and N. Jennings, (Eds), Foundations of Distributed Artificial Intelligence, Wiley InterScience, 1996, pp. 3-55.
21. H. J. Müller. Negotiation principles. In G. M. P. O'Hare and N. Jennings, (Eds), Foundations of Distributed Artificial Intelligence, Wiley InterScience, 1996, pp. 211-230.
22. K. Nakamura, S. Iwai, and T. Sawaragi. Decision support using causation knowledge base. IEEE Trans. Syst., Man, Cybern., vol. SMC-12, 1982, pp. 765-777.
23. K. S. Park and S. H. Kim. Fuzzy cognitive maps considering time relationships. Int. J. Human-Computer Studies, 42, 1995, pp. 157-168.
24. L. L. Ross and R. I. Hall. Influence diagrams and organizational power. Admin. Sci. Q., vol. 25, 1980, pp. 57-71.
25. R. G. Smith. The contract net protocol: high-level communication and control in a distributed problem solver. IEEE Trans. on Computers, C-29(12), 1980, pp. 1104-1113.
26. T. Smithin and D. Sims. Ubi Caritas?--Modeling beliefs about charities. Eur. J. Opl Res., vol. 10, 1982, pp. 273-243.
27. K. R. Sycara. Multiagent compromise via negotiation. In L. Gasser and M. N. Huhns, (Eds), Distributed Artificial Intelligence, vol. 2, Morgan Kaufmann, Los Altos, CA / Pitman, London, 1989, pp. 119-137.
28. K. E. Weick. The Social Psychology of Organizing. Reading, MA: Addison-Wesley, 1969.
29. M. P. Wellman. Inference in cognitive maps. Mathematics and Computers in Simulation, 36, 1994, pp. 1-12.
30. W. R. Zhang, S. S. Chen, and R. S. King. A cognitive-map-based approach to the coordination of distributed cooperative agents. IEEE Trans. Syst., Man, Cybern., vol. 22(1), 1992, pp. 103-114.
31. W. R. Zhang. NPN fuzzy sets and NPN qualitative algebra: a computational framework for bipolar cognitive modeling and multiagent analysis. IEEE Trans. Syst., Man, Cybern., vol. 26(4), 1996, pp. 561-574.
The Reorganization of Societies of Autonomous Agents

Norbert Glaser¹ and Philippe Morignot²

¹ CRIN/CNRS - INRIA Lorraine, BP 239, F-54506 Vandoeuvre-lès-Nancy
Fax: +33 (0) 3 83.27.83.19, Tel: +33 (0) 3 83.59.30.83
E-mail: [email protected]
² ICS-FORTH, Science and Technology Park of Crete, Vassilika Vouton, P.O. Box 1385, GR 711 10 Heraklion, Crete, Greece
Fax: +30 81 39 16 01, Tel: +30 81 39 16 00
[email protected]
Abstract. This paper investigates the skills of autonomous agents to reorganize their society in answer to environmental changes. The reorganization of an agent society can be motivated by the desire to reduce conflicts within inter-agent cooperation and to increase the efficiency in achieving goals. Our interest is centered on situations where new agents want to join an existing agent society which has established conventions for agent cooperation. Joining an agent society means that the society can draw benefits from the interaction with the new members, whose competencies are complementary to those of the society; both the society and the new members need to agree on a new convention. We introduce a method for reorganization based on the principle of punishment: a society punishes or favors the behaviors of its new members.
Keywords: Autonomous agents modeling, reorganization, adaptation, reinforcement

1 Introduction
In the literature about multi-agent systems (MAS), people talk on the one hand about a single society of agents; on the other hand, people study the behavior of one agent, a point handled by the literature about artificial intelligence even if MAS does not cover it. These studies and models do not cover the real world, which is composed of many societies of agents among which agents can migrate. Agents belonging to a society are said to have a common cultural background; the society is itself coherent and its agents are observed to respect social norms [20]. Nevertheless, what is coherent in one society might not be accepted in another one, and that seems all right. We draw from history that
some societies grow while others disappear for lack of adaptation to new environmental conditions. At a more abstract level, this means that societies split from large old confederations, or build new confederations for synergy enhancement. In this paper we propose a scheme for the reorganization of a society of agents due to the migration of agents among societies. More precisely, we focus on the simplest event that reveals the existence of other societies to a given society: the integration into that society of a new agent coming from another society. We propose a simple model for a society, expressed by a set of roles which are assigned to the agents. A role has a double interpretation: it reflects both the competencies that an agent has and those that the agent should provide to the other agents. The process of identifying individual roles can be seen as the agreement on a convention between agents. A change of convention among agents means dynamically modifying the structure of the society in order to maintain a level of coherence. In a coherent society, the agents are able to combine their actions towards some unifying goal. We speak of deliberative agents which are aware of their roles. We furthermore propose a model for an autonomous agent that encapsulates reactive, cognitive, cooperative and social competencies. Autonomy is one of the fundamental characteristics of an agent [10]. Autonomous agents need to cooperate when they cannot achieve global goals individually, or when individual goals can be collectively achieved at lower cost [1]. Conflicts may arise during the agents' cooperation; they are generally solved by negotiation and coordination strategies [6,17,24]. Nevertheless, social competencies are useful for agents to reduce such conflicts through the reorganization of their society. The acceptance of an agent by a society depends on whether the utility of the society increases. The society has at its disposal a mechanism of punishment and favoring to make agents agree on roles. On the other hand, an agent only joins a society if its own utility increases, too. The paper is organized as follows. First, we present a model for agents, one for societies, and one explaining how agents are part of a society. Then, we present a mechanism to analyze the integration of one agent into a society. Third, an example is presented. Finally, we briefly sum up our contribution.
2 Static Models

2.1 An Individual Agent
An individual agent has four basic competencies, ranging from reactive and cognitive to cooperative and social ones. Figure 1 illustrates how the four basic competencies of an agent can be encapsulated into an agent model composed of four distinct layers [13,12]. Layered agent models are a very useful approach since we can express specific requirements for each layer during design and run time; it is also a very common approach for the design of multi-agent systems [23]. At the reactive level, an agent performs basic actions depending on its perception and received events. At the cognitive level, an agent can perform individual
Fig. 1. Competencies of an Agent
action plans to achieve some non-trivial goals. It can exhibit expert behavior during the execution of these plans; however, it cannot handle conflicts which may arise with other agents. Such an agent can be qualified as a specialist [7]. At the cooperation level, an agent has protocols for cooperation with other agents; it can build joint plans and solve conflicts. A cooperative agent is said to have a static role in a multi-agent society. Finally, at the social level an agent has full social behavior: it can agree with other agents on the assignment of roles; social agents are said to have the competence to reorganize within their society. For the definition of social knowledge, we refer the reader to [5].

Definition (Agent). An agent a_i is a quadruple having reactive, cognitive, cooperative and social competencies, C^{a_i} = {BHV_i, PSM_i, CPM_i, SCM_i}. BHV_i represents a set of behaviors, PSM_i a set of methods for problem-solving, CPM_i a set of methods for agent cooperation, and SCM_i a set of methods for the reorganization of agent societies. We associate each competence c_j ∈ C^{a_i} with a confidence value cf_i(c_j) expressing how much the agent a_i believes in this competence.

Definition (World). All defined agents represent the world W = {a_1, ..., a_n}, without considering their grouping into societies. The available competencies in the world are the union of all individual competencies, C_{B,P,C,S} = ∪_{a_i ∈ W} C^{a_i}.
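One possible encoding of these two definitions is sketched below; all class, field and function names are illustrative assumptions of ours, since the paper specifies no data structures.

```python
# Hypothetical sketch of the Agent and World definitions.

from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    behaviors: set = field(default_factory=set)        # BHV_i
    problem_solving: set = field(default_factory=set)  # PSM_i
    cooperation: set = field(default_factory=set)      # CPM_i
    social: set = field(default_factory=set)           # SCM_i
    confidence: dict = field(default_factory=dict)     # cf_i(c_j)

    def competencies(self):
        """C^{a_i}: the union of the four competence sets."""
        return (self.behaviors | self.problem_solving
                | self.cooperation | self.social)

def world_competencies(world):
    """C_{B,P,C,S}: the union of all agents' competencies."""
    out = set()
    for agent in world:
        out |= agent.competencies()
    return out
```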
2.2 Agent Society
We consider a society as a grouping of agents having common interests. An interest can be the desire or need to share resources and competencies. Within a society, agents have different roles, each reflecting the competencies necessary to fulfill it. This competence-based approach comprises deliberative agents which are aware of the roles they are playing and of those they want to play. Comparing artificial agents to real-world ones, we recognize that an agent is typically part of various societies, e.g., a company, an association and a
family, each of them more or less organized. This has to be reflected by different types of roles, which are given by the structure of a society. Moreover, we need to consider mimic agents and societies influenced by chaotic and unintended effects. The macroscopic-microscopic model of Haken [14] could be a promising and elegant solution.

Definition (Society). A society S is a set of roles which identify the positions that individual agents a_i ∈ W can play, S = {r_1, ..., r_n}. The set A represents the agents being part of the society.

An example of a society which offers three possible roles to agents is S = {r_manager, r_expert, r_operator}. Figure 2 illustrates such a society. The interpretation is the following: the manager distributes tasks to one or several experts for solution and to one or several operators for execution. The manager needs cooperation methods, the expert problem-solving methods, and the operator reactive behaviors. We assume that the definitions of the different roles belonging to a society are part of the social knowledge of an agent.
Fig. 2. Different types of roles
Definition (Role). A role r_j is defined by the competencies associated with it, r_j = {c_k | c_k ∈ C_{B,P,C,S}}.

Each individual agent is able to play different roles in a society, depending on its individual competencies C^{a_i}. An agent having reactive behaviors and cooperation methods can fulfill the role of the manager and that of an operator. On the other hand, the society to which the agent belongs can demand that the agent play the role of a manager. This example underlines that we need to distinguish between three types of roles.

Definition (Desired Role). The desired role r_d is the role that the agent wants to play. This role has a benefit for the agent. The set of desired roles of an agent is R_d^{a_i}.

Definition (Expected Role). The expected role r_e is the role that the society expects the agent to play. The expected role is the role that is of benefit for the society. The set of expected roles of a society is R_e^S.
Definition (Committed Role). The committed role r_c is the role that an agent has committed to play within the society. The set of committed roles of an agent is R_c^{a_i}.

We assume that each single agent has, as part of its social knowledge, information about societies, i.e., types of structures for a society and how to build them. Fiske [9] introduces four basic relational frames or structures: communal sharing, equality matching, authority ranking and market pricing. We view them as basic frames from which societies can be derived. An agent completes this knowledge by interacting with other agents, which can have different knowledge about structures. More complex roles can emerge through role combination from the initial role set; agents are also found to cooperate in order to fulfill a role together. Moreover, the recombination of the agents' competencies can make new roles emerge. Haken's model could be a promising approach for the emergence of roles and structures within an agent society: simple changes at a microscopic level, representing the competencies for example, would make roles emerge at a macroscopic level.
2.3 Convention between Agents and Society
We introduce the notion of a convention to describe how agents form a society. In fact, a convention is some sort of social norm which allows agents to increase the utility of a society and also their own. At a generic level, our notion of convention is close to the understanding of [20] about cultural models, which
have a powerful influence in structuring actors' interaction with the environment. The environment is in our case the world of all agents. The cultural background may be expressed by the type of utility function each agent has. We explain this point later on. We define a convention as a distribution of roles between agents. A convention can be an implicit agreement on a distribution of roles of which the agents become aware after some time. In a primitive society, agents are found to build a convention just by behaving in a similar manner. Moreover, in a XXX society, for example, a convention may emerge from the initial random activity of the agents. We consider in this paper the case of established societies where conventions have been stated by some of their members.
Definition (Convention). A convention is represented by the set of committed roles on which the agents have agreed with each other, C_S = {(a_i, r_j) | a_i ∈ W ∧ r_j ∈ R_c^{a_i}} ⊆ W × S.

This definition supposes that an agent needs to have background knowledge about possible structures of societies. This claim is reasonable, as we have discussed in the previous section. Taking the example with S = {r_manager, r_expert, r_operator} illustrated in Figure 2, a possible convention is C_S = {(a_1, r_manager), (a_2, r_expert), (a_3, r_operator)}. A convention describes the structure of a society or, in other words, its organization. The roles assigned to the agents by a convention cannot be seen as
independent labels for the agents; they are related to each other. For example, [19] identified an authority and communication relationship within an organization composed of functional entities having three possible roles, i.e., coordinator of actions, executer of actions and provider of expertise. How do agents now agree on a convention to form an organisation? This question highlights that we need something more, something to express the motivation of a society to accept or refuse an agent and the motivation of an agent to accept or refuse a role. A very useful means is the definition of utility functions.
3 Dynamic Model

3.1 The Reorganization of a Society
A typical example which requires the reorganization of a society is the situation where one agent wants to join an existing society. Several questions arise: does the new member progressively adapt itself so as to internalize the organisation of the society it now lives in? Or does the society adapt itself so as to reflect the internal organisation of the agents it is composed of? In the latter case there needs to be a common part in the agents that tend to group to form a society.

- The first question requires a mechanism where the society does not modify itself and rejects the candidate agent until it matches the already established convention. The members of the society punish and favor the candidate agent until it has adapted to the convention.
- The second question requires a mechanism where the society accepts a modification of its convention so that the candidate agent can be integrated into the society. The candidate agent rewards the members of the society for having accepted to do that.

In both situations, it is the candidate agent which performs the action of deriving an (expected) role from the society, i.e., from the convention of the society; it is not the society which proposes a role to the candidate agent. The candidate agent acts as a social and autonomous entity [3] having its own goals; it is nevertheless open to influence. The creation of a new convention includes the possibility that roles are redistributed among the remaining agents.

Definition (Reorganization). The reorganization within an agent society S happens through the modification of the established convention C_S and the creation of a new convention C_S' for the new society S'. There are two cases:

1. Agent a_i leaves the society, C_S' = C_S \ {(a_i, r_j)}.
2. Agent ai joins the society, C s, = Cs U {(ai, rj)} A rj E R~ . If the society were to modify itself due to the loss of an agent, we believe this phenomenon would be represented by reversing the model for agent's inclusion that we present here. However this points is left for future research.
104
.................... s ................................. Cs
I
puoishmen avo.og i
...--"
Fig. 3. Principles of Reorganization
The principle of our reorganization method is illustrated in Figure 3. Agent a4 wants to join the agent society A = {al, ae, a3] having organized as S = {rl, r2] with rl is played by agent a2 and r2 by agents al, 3. i.e., the agents have established the convention Cs = {(al, r j , (a~, rl), (a3, r2)}. The society either favors the integration of the new agent, i.e., it accepts the desired roles R~ 4, or its rejects it until the agent has accepted the expected roles from the society R~s. 3.2
Utility Functions
We have already mentioned that the agent and the society need some additional information which motivates their reorganization. This information is introduced through utility functions assigned to agents and the society. Utility functions were found to be a simple but elegeant model for choosing among courses of actions (plans) for an agent [8,26]. The utility function of an agent may take many variables into account, among others, its life time, its resource production and consumption, and including the degree of satisfaction and usefullness of an agent's goals. If goals lead to alternative courses of actions, an agent pursues a goal that increases its utility. On the society level the utility function may reflect the quality of the convention convention established between the agents including basic knowledge about basic social knowledge about structures of societies. D e f i n i t i o n ( U t i l i t y of a S o c i e t y ) . The utility of a society depends on the utility of its convention//(Cs), on the cost for reorganization Cr and on the absolute utility of a society Us.
U(S) :: Us + U(Os) - G
(I)
105
This definition distinguishes the intrinsic utility of the society and the cost of any change, hence enabling to model more or less conservative society. We give a simple definition of Cr; Constraint Propagation Techniques (see for example [29]) constitute one direction to perform this evaluation on a network of symbolic roleconnected agents. We give below two examples of definition of Cr and 1A(Cs).
Cr .U(Cs) :=
grolechanges
Nrole,
~ CM(ai,rj) (ai,rj)ECs
(2)
(3)
C.h/t(ai, rj) expresses the commitment of the agent ai to role rj. Definition ( U t i l i t y of an A g e n t ) . The utility of agent depends on the roles it desires, the roles it has committed to and the confidence it has its roles. It is defined by the utility of a role it.
Ll(ai, rj) := D(ai, rj) * CiT(ai, rj)
(4)
with :D(ai, rj) is the desire of agent ai to role rj. CgV(ai, rj) is the confidence of agent ai in role rj. These two definitions allow us to clarify when an agent integrates into a society and when a society accepts that. P r i n c i p l e 1. A society chooses to integrate an agent that increases the society's utility. P r i n c i p l e 2. An agent chooses to integrate into a society that increases the agent's utility. An agent integrates in a society through the establishment of a new convention between the members of the society and the new candidate. We express this by the first equation. E q u a t i o n 1. A candidate agent ai E W integrates into a society if it matches the expected role. A simple threshold mechanism between the expected role and the agent's competencies is used :
CJ:(ai,re) = ~
cf~(ck) >>se
(5)
CkEre
A society draws benefits from the integration of a candidate agent; on the other hand, an agent draws benefits from integrating into a society. This is expressed by the following two equations.
106
E q u a t i o n 2. The expected role to which the candidate agent has committed augments the utility of the society. 3rj C S CI Re8, so that U(S) 1".
E q u a t i o n 3. The desired role to which the candidate agent has committed and which was accepted by the society augments the utility of the agent. 3rj E S A Rag' , so that U(ai, rj) ~.
(7)
Given this mechanism for a society to produce an expected role for an incoming agent, an optimal policy can be drawn for a candidate agent. If the agent is able to (1) simulate the society's utility and (2) simulate the function that produces the expected role given an agent's competencies, it can plan the expected role that it will be given. Thus an optimal policy for this agent is to find how the society's utility increases most and proposes the most relevant competencies. This learning mechanism by which an agent learns an internal model of the target society requires for the agent a large amount of knowledge on this society. The agent is little likely to acquire enough of it, since it is not a member of this society. We would assume that cooperation with a small number of members of the society would improve this learning phase. 3.3
T h e M e c h a n i s m s of Integration
At the beginning of this Section we have presented two situations for the integration of an agent into a society. Briefly, the first one concerned the adaptation of the candidate agent through punishment/favoring; the second one the adaptation of the society. These mechanisms are illustrated below. In speaking of society, we have its member agents in mind. M e c h a n i s m for I n t e g r a t i o n 1 ( A u d i t i v e a g e n t ) . The candidate agent is an auditive one which tries to derive the best fitting role for its integration into the society. 1. 2. 3. 4. 5.
The The The The The
agent analyzes the society's utility. agent analyzes the society's convention. agent tries to find a modified convention increasing the society's utility. agent proposes this role for itself. society reward the kind character of the agent.
M e c h a n i s m for Integration 2 (Selfish agent). The candidate agent is a selfish one which insists on its desired roles during the integration into a society. 1. The society analyzes its utility with the desired roles of the candidate agent.
107
2. 3. 4. 5.
The The The The
society punishes/favors the desired roles of the agent. agent makes modifications to its desired roles. agent reapplies if its utility has increased during these modifications. society rewards the agent.
Punishment means that the society tells the agent that it has no utility for integrating into it; the other way round, favoring means that the society says to the agent that it has an utility for integrating into it. Both can be realized by reinforcement learning as Q-learning. We refer the reader to [30] for a description of Q-learning and to [18] for an overview about learning techniques. The use of such a learning method requires the definition of reward functions. R e w a r d F u n c t i o n s . We reward the following situations during the integration of an agent. -
Positive feedback on a society if an agent joins. Negative feedback on a society if an agent leaves. Positive feedback on an agent if it accepts to join a society. Negative feedback on an agent if it decides to leave a society.
The previous feedbacks have a positive or a negative influence on the utility of the society and of the candidate agent.
4
Proposed
Experiments
The experiments can be realized within the C o M o M A S environment which is a knowledge engineering and simulation environment for the development of multi-agent systems [12]. The C o M o M A S approach proposes a set of conceptual models at an implementation-independent level. This work extends the COMMONKADS approach [28] for the development of multi-agent systems. The knowledge engineering module produces descriptions of the conceptual models in the Conceptual Modeling Language (CML) [27] which has been extended. The simulation module integrates an updated and extended version of MICE, an agent simulation environment introduced by [21]. We have conceived within this environment a layered-agent architecture [13] encapsulating specific competencies at distinct layers. We are studying several scenarios of various complexity in which existing societies reorganize through the integration of new agents. Take the following scenario as example. S c e n a r i o . Given a startup company composed of one manager and several programmers, A = {al,a2, a3, a4} with the convention Cs = {(el, rraanager), (a2,3,4, rprogrammer)}. Now, a candidate agent a5 wants to enter the company.
108
-
P a r t A. The company has a small size, the expected role is a programmer, r~ = Vp~og. . . . . . . . A candidate agent qualifying for a programmer has no problems to get accepted; on the other hand, a higher qualified candidate agent is refused since the society does not recognize the need to pay a high salary. Nevertheless, the higher qualified agent is employed if it accepts the role of a programmer. P a r t B. Having reached a certain size, the company needs to reorganize in order to maintain its efficiency; it favors now the candidature of higher qualified agents having manager skills, r~ = rmanager.
We illustrate below the principles in applying our approach to model this scenario. It can be implemented within the C o M o M A S environment as a set of social agents using our layered agent model [13]. The society is given by Cs = {(al, r . . . . g~), (a2,3,4, rprograrnmer)} with the expected thresholds for the roles, Sm~,~ag~ = 0.7 and Sp~og~ . . . . = 0.4. The society has the utility //(S) given by the utility of the convention l l ( C s ) = ~ (ai, rj) e Csg]td(ai, rj) + Us since no reorganization has yet occured. The utility of the society depends on the commitment of the individual agents. The candidate agent has the desired roles R~ 5 = {rma,~g~, r p r o g ~ m , ~ } with an equal confidence in both roles, g ~ ( a s , r , ~ a g ~ ) = 0.8 and CN(as, rp~og. . . . . . . ) = 0.8. The agent has a greater desire to be manager than to be programmer, T~(as, r m ~ g ~ ) = 0.9 and 7)(a5, rprog~m,,,r = 0.4. The role rj increasing the utility of the agent U(as, rj) is the role of the manager, //(as, rm~,~g~) = 0.72. Given the expected threshold values by the society for admitted roles, the candidate agent can play both roles, rm~n~g~ and rprogramme r . - In part A, the society refuses the desired role of the candidate agent, r d -~ rmanager based on its social knowledge that a small society with two managers has a lower utility than a society with one manager; this is modeled by the absolute utility values Us. The refusal is expressed by a negative feedback to the desired role of the agent. - In part B, the society has increased considerably through the employment of a huge number of programmers. Agent al has decreased its commitments to play role r,,~a~ag~r and the utility of the society U(S) has decreased despite the rewards from the integration of new agents. The society favors the role of a manager rrnanager. 5
Related
work
Working on the reorganization within an agent society requires the definition of social attitudes and social knowledge of individual agents, the investigation
109
how agents build societies and how they maintain them. Noteworthy is the work of Conte and Castelfrancchi [5], Rao and Georgeff [25], Hobbes [16] and Gauthier [11], and Etzioni [8]. The design and implementation of social agents can be supported by an appropriate conceptual model; this aspect is addressed, among others in [2,12]. The definition of social commitment and belief as introduced by [5] makes us think to integrate them as sources of knowledge for the determination of roles by agents. This is in particular interesting for an agent society to determine if its members fulfill the expected roles. In fact, we have based the utility of a convention between agents on their individual commitments. The definition of belief and desire by [25] illustrates a similar understanding of the social knowledge agents have. Instead of funding the definition on the objectives and goals of an agent, we use the more abstract notion of a role; this expresses that an agent having competencies is part or wants to be part of a society. The definition of an agent society by a set of roles, derived from the agents' competencies, allows us to model the dynamic behavior of such a society. In fact, a society evolves over time, i.e. roles disappear and new ones appear. Close to our notion of conventions are the network of contracts proposed by Hobbes [16] which are in his view the only way to avoid belligerent attitudes among humans. The notion of mutual commitments [11] expresses a similar understanding. Our work is complementary to [4] which studies the conditions, the situations and the mechanisms for the emergence of organizations between agents. The notion of an organisation used in that paper is close to our notion of a convention. Autonomous agents are required to be able to modify their local knowledge in a such a way that they can agree with other agents to build a society. [22] presents a way to structure an agent so that it can easily adapt itself to new environments. Thereby an environment can be a society, since the agent's structure includes social and cooperative aspects. [15] proposes an architecture for adaptive agents. Aspects of this architecture are comparable to our layered architecture [13]. Interesting is also the work of [8] who is a successful example for having introduced economical concepts into artificial intelligence, actually planning and the design of agents. His propositions on utility functions could allow us to integrate new elements into the definition of the utility of a society, for example, to model th choice of an agents among various societies, with each requiring various plans for integration.
6
Conclusion
The rapid evolution of the world in which societies exist makes it necessary for them to adapt. We consider the reorganization of a society as a response to an evolution of their living conditions. In particular, we consider the situation where an individual agent intends to be a member of a society. This paper presents a new approach for the reorganization of a society of autonomous agents. We define a society as a collection of agents having agreed
110
on a distributed of roles through a convention. A candidate agent having reactive, cognitive, cooperative and social competencies applies for membership. Our method is based on the principle of punishment and favoring : the society of agents favors those roles of the candidate agent which augments the society's utility and punishes the others. The candidate agent is rewarded after its integration through the growth of its own utility function. A realistic scenario illustrates the application of this society model and of this reorganization method. The scenario can be straightforwardly validated within the C o M o M A S multi-agent simulation environment which we have conceived. Future work will be centered on the extension of the current method for reorganization which considers the integration of an agent into a society. The splitting and merging of societies at a more general level is an interesting and necessary topic. We are analytically characterizing our model in full decision-theoretical framework. We are also experimentally studying the use of our method within different existing agent architectures available at our laboratories. In particular, the application of social reorganization to software agents collaborating via the Internet seems promising.
7 Acknowledgments
This project was partly funded by the European Community. The first author was funded through a doctoral fellowship within the Human Capital and Mobility program, contract ERB4001GT930339, and has received the qualification of Marie Curie fellow delivered by the European Union. The second author is currently supported by an ERCIM fellowship.
References
1. M. Adler, E. Durfee, M. Huhns, W. Punch, and E. Simoundis. AAAI workshop on cooperation among heterogeneous intelligent agents. AI Magazine, 13(2):39-42, 1992.
2. T. Bouron and A. Collinot. SAM: a model to design computational social agents. In Proc. 10th ECAI, Wien, Austria, 1992.
3. C. Castelfranchi. Guarantees for autonomy in cognitive agent architectures. In ECAI-94 workshop on ATAL, pages 56-70, Amsterdam, The Netherlands, 1994. LNAI Series, 890.
4. V. Chevrier, R. Foisel, N. Glaser, and the research group MARCIA. Auto-organisation: Emergence de structures. In Journées du PRC IA sur les SMA, 1995. CRIN report 95-R-290.
5. R. Conte and C. Castelfranchi. Cognitive and Social Action. UCL Press, 1995.
6. K.S. Decker and V.R. Lesser. Designing a family of coordination algorithms. In Proc. 1st ICMAS, pages 73-80, San Francisco, California, 1995.
7. J. Erceau and J. Ferber. L'intelligence artificielle distribuée. La Recherche, 22:750-758, 1991.
8. O. Etzioni. Embedding decision-analytic control in a learning architecture. Artificial Intelligence, 49:129-159, 1991.
9. A.P. Fiske. The four elementary forms of sociality: Framework for a unified theory of social relations. Psychological Review, 99:689-723, 1992.
10. S. Franklin and A. Graesser. Is it an agent, or just a program?: A taxonomy for autonomous agents. In ECAI-96 workshop on ATAL, pages 21-36, Budapest, Hungary, 1996. LNAI Series, 1193.
11. D.P. Gauthier. The Logic of Leviathan. Oxford University Press, 1969.
12. N. Glaser. Contribution to Knowledge Acquisition and Modelling in a Multi-Agent Framework - The CoMoMAS Approach. PhD thesis, Université Henri Poincaré, Nancy I, 1996.
13. N. Glaser, V. Chevrier, and J.-P. Haton. Multi-agent modeling for autonomous but cooperative robots. In Proc. 1st DIMAS, pages 175-182, Cracow, Poland, 1995.
14. H. Haken. Synergetics. An Introduction. Springer Verlag, Berlin, 1990.
15. B. Hayes-Roth. An architecture for adaptive intelligent systems. Artificial Intelligence: Special Issue on Agents and Interactivity, 72:329-365, 1995.
16. T. Hobbes. Leviathan. Oxford University Press, 1948.
17. N.R. Jennings. Towards a cooperation knowledge level for collaborative problem-solving. In Proc. 10th ECAI, pages 224-228, Wien, Austria, 1992.
18. L. Kaelbling. Learning in Embedded Systems. MIT Press, 1993.
19. E. Le Strugeon, C. Kolski, R. Mandiau, and M. Tendjaoni. Intelligent agents. In Second International Conference on the Design of Cooperative Systems, pages 331-344, Juan-les-Pins, France, 1996.
20. G. Mantovani. Social context in HCI: A new framework for mental models, cooperation, and communication. Cognitive Science, 20:237-269, 1996.
21. T.A. Montgomery and E.H. Durfee. Using MICE to study intelligent dynamic coordination. In Proc. 2nd IEEE Conf. on Tools for AI, pages 438-444, 1990.
22. Ph. Morignot and B. Hayes-Roth. Why does an agent act: Adaptable motivations for goal selection and generation. In M. Freed and M. Cox, editors, AAAI Spring Symposium, Representing Mental States and Mechanisms, pages 97-101. Stanford, 1995.
23. J.P. Müller. The Design of Intelligent Agents: A Layered Approach, volume 1177 of Lecture Notes in Computer Science. Springer Verlag, 1996.
24. E. Oliveira, F. Mouta, and A.P. Rocha. Negotiation and conflict resolution within a community of cooperative agents. In Internat. Symposium on Autonomous Decentralized Systems, Kawasaki, Japan, 1993.
25. A.S. Rao and M.P. Georgeff. BDI agents: From theory to practice. In Proc. 1st ICMAS, pages 312-319, San Francisco, California, 1995.
26. S. Russell. Rationality and intelligence. In Proc. 14th IJCAI, Montréal, Canada, 1995.
27. G. Schreiber, B.J. Wielinga, J.M. Akkermans, W. Van de Velde, and A. Anjewierden. CML: The CommonKADS conceptual modelling language. In Proc. 8th European Knowledge Acquisition Workshop, pages 1-25, Hoegaarden, Belgium, 1994. LNAI Series, 867.
28. G. Schreiber, B.J. Wielinga, R. de Hoog, H. Akkermans, and W. Van de Velde. CommonKADS: A comprehensive methodology for KBS development. IEEE Expert, 9(6):28-37, 1994.
29. E. Tsang. Foundations of Constraint Satisfaction. Academic Press, 1994.
30. C.J.C.H. Watkins and P. Dayan. Q-learning. Machine Learning, 8:279-292, 1992.
Adaptive Selection of Reactive/Deliberate Planning for the Dynamic Environment - A Proposal and Evaluation of MRR-planning
Satoshi KURIHARA, Shigemi AOYAGI, and Rikio ONAI
NTT Basic Research Labs.
3-1 Morinosato-Wakamiya, Atsugi, Kanagawa, 243-01 JAPAN
E-mail: kurihara@square.brl.ntt.co.jp
Abstract
This paper proposes and evaluates a methodology for multi-agent real-time reactive planning. In addition to the feature of conventional real-time reactive planning, which can react in a dynamic environment, our planning can perform deliberate planning when, for example, the robot has enough time to plan its next action. The proposed planning features three kinds of agents: behavior agents that control simple behaviors, planning agents that make plans to achieve their goals, and behavior selection agents that mediate between behavior agents and planning agents. Together they coordinate a plan in an emergent way for the planning system as a whole. We confirmed the effectiveness of our planning by means of a simulation. Furthermore, we implemented an active vision system, which is the first stage of building a real-world agent, and used it to verify the real-world effectiveness of our planning.
1 Introduction
In this paper, we propose "multi-agent real-time reactive planning" (hereafter called MRR-planning), which can flexibly respond to dynamic environmental changes, in order to implement a real-world application. In a dynamic environment, goals may change dynamically and the amount of planning time available may vary. To respond reactively to environmental changes, it is essential to incorporate the features of real-time reactive planning [1] [7]. Furthermore, to undertake deliberate planning properly, it is essential to incorporate the features of classical planning [6]. Our real-world application is called the active vision system (AVS) (described in Section 5); it is designed to detect, trace, and if possible identify moving objects by using image and sound data. This system is to be mounted in an autonomous robot so the robot can chase an object. Relatively simple short-term goals as well as complicated, deliberate long-term goals are given dynamically.
For example, when a detected object moves quickly, the AVS must frequently control the video camera mechanically to trace the object. Reactive planning is therefore used to trace the moving object reactively. However, when the detected object moves slowly, the AVS has enough time to identify the object by using deliberate planning. To identify the object, several kinds of high-load image processing are necessary. Moreover, the time available for planning varies according to the speed of the object. The system thus requires both reactive and deliberate planning. We propose an adaptive planning method that can select reactive/deliberate planning efficiently in a dynamic environment. Important issues that must be addressed in order to implement this planning are (1) how the planning generates both reactive and deliberate planning results within a restricted time and (2) how the most appropriate plan is selected from among the results of the two kinds of planning. Several behavior-based approaches to perform reactive planning have been proposed [3] [8]. In those approaches a number of planning agents reactively generate individual plans by applying a "conditioned reflex response." These agents coordinate with each other and decide which plan is the best depending on the environment. It is quite difficult, however, to perform deliberate planning using these approaches. Yamada [14] proposed an efficient form of deliberate planning using the probability of changing goals, but we cannot apply this planning because it is quite difficult to calculate all the probabilities of changing goals. We therefore use the multi-agent framework: each agent generates a plan reactively or deliberately and also identifies the plan that should be selected. The most useful feature of the multi-agent framework is that the system is established in a bottom-up fashion and the complex control of the system is established by coordinating agents. If we established the above planning by applying central control, however, only one central module would control all the reactive/deliberate planning, and reactiveness would thus be lost. In MRR-planning there are two kinds of planning agents; one performs reactive planning and the other performs deliberate planning. MRR-planning basically applies the framework of real-time reactive planning based on the behavior-based approach. Moreover, efficient selection of reactive/deliberate planning is achieved by using a spreading activation model and a threshold/maximum mechanism. For example, reactive planning is selected when the robot faces an emergency situation and deliberate planning is selected when the robot has enough time to plan its next action. The most important feature of MRR-planning is that the effect of conventional close coordination of agents is achieved through non-close coordination. Therefore, in MRR-planning, even if there are a lot of planning agents, their coordination costs remain low, so that reactiveness is not lost.
[Figure 1: RTT (Real-Time TileWorld); legend: obstacle, tile, energy supply base, hole (score), enemy robot, player robot]
Section 2 explains RTT (Real-Time TileWorld), which is a simulator used to evaluate MRR-planning, and Section 3 explains MRR-planning. Section 4 offers a comparative evaluation using RTT. Section 5 explains the active vision system, which is a real-world application, and Sections 6 and 7 summarize future issues and the overall results of the research.
2 RTT: Real-Time TileWorld
We used an extended version of TileWorld [10] for evaluating the proposed MRR-planning. TileWorld was originally introduced to simulate an environment where environmental changes take place dynamically, and we added components in order to expand the number of possible dynamic environmental changes. As Figure 1 shows, RTT includes a player robot, tiles, obstacles, and holes (all of which exist in the conventional setup) as well as the newly added enemy robots and energy supply bases. Two enemy robots try to capture the player robot. The player robot has an internal store of energy, which decreases as the player robot moves. A large amount of energy is taken when the player robot is captured by the enemy robots, but the player robot can replenish its energy at the energy supply bases. When the player robot's energy store reaches zero, the game is over. The player robot has three goals to be achieved by reactive planning: escaping from the enemy robots, acquiring energy, and holding or dropping a tile which suddenly appears very close to the player robot. Furthermore, it has one goal to be achieved by deliberate planning: finding the route most suitable for getting a high score and moving for a long time without being captured by the enemy robots. To perform deliberate planning, the player robot uses the locations of tiles, the locations and scores of holes, the energy store of the player robot, and the distances between the player robot and the enemy robots.
[Figure 2: MRR-planning (for the player robot in RTT); left: the agent network, in which P-agents perform planning for deciding the next behavior and spread activation-energy to B-agents acting on the real world; right: the activation-energy measures. P-agent1: planning for escaping from enemy robots. P-agent2: planning for acquiring energy. P-agent3: planning for scoring reactively. P-agent4: planning for scoring deliberately. B-agent1: go up. B-agent2: go down. B-agent3: go left. B-agent4: go right. B-agent5: hold tile. B-agent6: embed hole. B-agent7: acquire energy.]
Moreover, the player robot cannot predict the behaviors of the enemy robots precisely and thus may have to give up the plan obtained by deliberate planning. The player robot therefore needs adaptive planning which can efficiently select reactive/deliberate planning.
3 MRR-planning
3.1 Behaviors of Agents
MRR-planning consists of three kinds of agents: planning agents (P-agents), behavior selection agents (BS-agents), and behavior agents (B-agents) (see left side of Figure 2). The algorithm for MRR-planning consists of two parts: a cycle in which P-agents perform planning and spread activation energy to the B-agents necessary to execute their plans, and a cycle in which a B-agent is selected depending on the activation energy and actuated to carry out an action in the real world.
3.1.1 The behavior of P-agents
A P-agent that performs reactive planning (called a reactive P-agent) plans, for example, which direction the robot moves in the next step during one planning cycle (called a one-step plan), as in conventional real-time reactive planning [8], and executes the one-step plan immediately. The goal is achieved by repeatedly planning and executing one-step plans. Each planning cycle is very short, so the robot can move reactively, but the total plan towards the goal may not be optimal.
A P-agent that performs deliberate planning (called a deliberate P-agent) plans a consecutive series of steps to achieve its goal during one planning cycle (called a total-steps plan), as in classical planning (similar to STRIPS [6]). An optimal plan is therefore always obtained, but this kind of P-agent cannot guarantee reactiveness. As mentioned above, MRR-planning provides reactiveness basically by applying the framework of conventional real-time reactive planning, so even if a deliberate P-agent plans a total-steps plan during one planning cycle, it executes the one-step components of that plan in turn. The deliberate P-agent may, however, have to give up the total-steps plan because of a dynamic change of the environment, and it may have to re-perform deliberate planning from the beginning. This is necessary for MRR-planning to be reactive. To execute the plan, a P-agent must request a B-agent to, for example, control the actuator. The P-agent therefore spreads activation-energy to the B-agents which achieve the P-agent's plan.
3.1.2 How to set the activation-energy?
The activation-energy is a positive real number and is calculated depending on the level of demand that must be achieved by the plan. Currently, reactive P-agents use a measure in which, as the level of demand increases, the activation-energy increases according to a quadratic function (until the activation-energy exceeds the threshold). Deliberate P-agents use a measure in which, as the level of demand increases, the activation-energy increases according to a linear function with a small slope (under this function the activation-energy never exceeds the threshold) (see right side of Figure 2). Moreover, we can change the characteristic of MRR-planning by changing the relation between these two measures: if the measure of the reactive P-agents is higher than that of the deliberate P-agents, MRR-planning becomes more reactive, and if the measure of the deliberate P-agents is higher, it becomes more deliberate. In current MRR-planning, these measures and several parameters for the spreading activation model must be set by hand.
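To make the two measures concrete, the following Python sketch shows one possible realisation. The coefficients and the threshold value are illustrative assumptions only; the paper states that these parameters are set by hand but does not report numerical values.

# Hedged sketch of the two activation-energy measures described above.
# The coefficients and THRESHOLD are invented for illustration.
THRESHOLD = 200.0

def reactive_activation(demand, k=0.5):
    # Quadratic in the level of demand: grows quickly, so in an
    # emergency the spread energy can exceed THRESHOLD and trigger
    # the threshold mechanism in the BS-agent.
    return k * demand ** 2

def deliberate_activation(demand, slope=0.8):
    # Linear with a small slope and capped below THRESHOLD: deliberate
    # P-agents can only win through the maximum mechanism, never by
    # forcing an immediate threshold selection.
    return min(slope * demand, THRESHOLD - 1.0)

As described above, raising the reactive measure relative to the deliberate one shifts the character of MRR-planning toward reactivity, and vice versa.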
3.1.3 The behavior of a BS-agent
The BS-agent determines which B-agent is to be activated. This determination is based on the planning of P-agents, using a spreading activation model and a threshold/maximum mechanism. A B-agent is in charge of a simple behavior that it can execute by itself, and it does not have a mechanism enabling it to perform planning independently. The BS-agent selects the most suitable B-agent depending only on activation-energy, which means that the BS-agent does not consider the content of a P-agent's plan.
[Figure 3: Algorithms of the P-agent and BS-agent]
Synchronization between P-agents and BS-agents is therefore not necessary. The behavior of B-agents, however, makes the state of the robot and the environment change, so P-agents reflect these changes in their planning by using information from sensors and B-agents (see left side of Figure 3). In the next section, we explain the cycle of the BS-agent. Up to now we have said that a P-agent spreads activation-energy to B-agents, but the P-agent actually spreads the activation-energy to a slot corresponding to the B-agent inside the BS-agent (see right side of Figure 2).
3.2 Selection of reactive/deliberate planning
The BS-agent makes possible efficient selection of reactive/deliberate planning. The algorithm for this is as follows (see right side of Figure 3).
[A] In the BS-agent, there are several slots, each of which corresponds to a B-agent. When a P-agent finishes its planning, it spreads activation-energy to the slot corresponding to a target B-agent.
[B] Several P-agents may spread activation-energy to the same slot. In such a case, the activation-energy is added.
[0] The BS-agent repeats the following loop until a B-agent is selected and activated.
[1] Set all slots to NULL.
[2] Execute the following two processes (CUR1, CUR2) concurrently.
CUR1-[3] Check each slot in turn until activation-energy has been spread to all slots.
CUR1-[4] If activation-energy has been spread to all slots, the B-agent whose slot has the largest activation-energy is activated by the BS-agent. This means that one (or several) of the P-agents which spread activation-energy to this B-agent achieved its planning demand deliberately.
CUR1-[5] Suspend CUR2 and go back to [0].
CUR2-[3] Check each slot in turn until a slot value becomes larger than the threshold.
[Figure 4: Selection of reactive/deliberate planning. Upper part (deliberate situation): all spreads stay below THRESHOLD[200]; the maximum, RIGHT[190], wins, and B-agent RIGHT is activated by the maximum mechanism. Lower part (reactive situation): LEFT[210] exceeds THRESHOLD[200] while P-agent4 is still planning, so B-agent LEFT is activated immediately by the threshold mechanism.]
CUR2-[4] If one of the slot values becomes larger than the threshold, the BS-agent immediately activates the B-agent corresponding to that slot. This means that one (or several) of the P-agents which spread activation-energy to this B-agent achieved its planning demand reactively.
CUR2-[5] Suspend CUR1 and go back to [0].
We can describe the features of MRR-planning as follows.
Selection of B-agents: Even though the amount of activation-energy spread to the slot of a B-agent may be small, that B-agent can be activated if its total activation-energy becomes the largest. Thus, the maximum mechanism is selected, and the demand of the P-agent which spread the small activation-energy can be achieved (by [B], CUR1-[4]) (see upper part of Figure 4). However, in an emergency situation, for example, a reactive P-agent spreads high activation-energy, so the total activation-energy in the slot of a B-agent becomes higher than the threshold. In this case the threshold mechanism is selected (by CUR2-[4]) (see lower part of Figure 4). All P-agents have equal standing, and in MRR-planning the restraint and admission control caused by a hierarchy relation among agents are not necessary, in contrast to the subsumption architecture [3]. In MRR-planning, agents do not carry out close information exchanges, so even when the number of agents increases the possibility of impairing the real-time characteristic remains low.
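The loop [0]-[5] can be summarised sequentially in Python as follows; the two concurrent checks CUR1 and CUR2 are folded into a single pass over the incoming spreads, and the slot/spread interface is invented for illustration.

THRESHOLD = 200.0

def select_b_agent(slots, incoming):
    # slots: dict mapping B-agent name -> accumulated energy (None = NULL)
    # incoming: sequence of (b_agent, energy) spreads from P-agents ([A])
    for name in slots:                        # [1] set all slots at NULL
        slots[name] = None
    for b_agent, energy in incoming:
        # [B] energy spread to the same slot is added
        slots[b_agent] = energy if slots[b_agent] is None else slots[b_agent] + energy
        if slots[b_agent] > THRESHOLD:        # CUR2-[3,4]: threshold mechanism
            return b_agent                    # reactive selection
    if all(v is not None for v in slots.values()):
        return max(slots, key=slots.get)      # CUR1-[3,4]: maximum mechanism
    return None                               # keep waiting for P-agents

With the values of Figure 4, spreads summing to RIGHT[190] stay below the threshold of 200 and RIGHT wins through the maximum mechanism once every slot has been filled, whereas a single spread of 210 to LEFT triggers the threshold mechanism immediately.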
player robot | enemy robot 1 | enemy robot 2 | TICKS
      1      |       1       |       1       | 15, 20, 25
      1      |       2       |       2       | 15, 20, 25
      1      |       3       |       3       | 15, 20, 25
The player robot can move one step during one scheduling time. "Enemy robot 1 = 2" means that enemy robot 1 can move one step during two scheduling times. Here, 1 TICK = 10 ms.
Table 1: Environmental setup in RTT
Selection of reactive/deliberate planning: When there is enough time for the robot to plan its next behavior, the amount of activation-energy spread from each P-agent to a B-agent is small, and the total activation-energy does not exceed the threshold. Therefore it is possible to select a B-agent after all P-agents have completed their planning (by CUR1-[4]). This means that in this situation a deliberate P-agent can finish its planning and spread activation-energy stably. Even in this situation, however, reactive P-agents can repeatedly plan and spread activation-energy while deliberate P-agents continue their planning; therefore, reactiveness is maintained (by [A]). However, when there is not enough time for the robot to plan its next behavior, the activation-energy from a P-agent is high and exceeds the threshold. In this case, the B-agent with this activation-energy is activated immediately, so that MRR-planning can perform reactive planning (by CUR2-[4]).
Supervising B-agents: The BS-agent supervises the aggregation of B-agents, which constantly form competitive relationships (however, competition can also take place at the BS-agent level).
4 Evaluation
In RTT, to evaluate the performance of MRR-planning, the movements of the player robot and the two enemy robots are controlled by a common scheduler, and the player robot can perform one-step planning and its execution during one scheduling time. If the player robot finishes planning early in one scheduling time, it must wait for the remaining time. The simulation was run with many environmental setups (see Table 1). In the reactive environmental setup, the enemy robots could move as fast as the player robot; in the deliberate environmental setups, the player robot could move much faster than the enemy robots. Figure 5 shows the resulting movement of P-agents. This graph indicates the alternations of P-agents that request activation of certain B-agents.
[Figure 5: P-agent movement sequence. Upper panel: reactive environmental setup (Probot=1, Erobot1=1, Erobot2=1); lower panel: deliberate environmental setup (Probot=1, Erobot1=3, Erobot2=3); horizontal axis: number of steps.]
This result enabled us to confirm that MRR-planning could select reactive/deliberate planning effectively in a dynamic environment. To evaluate MRR-planning using RTT, we prepared a planning method that used conventional reactive planning as a target of comparison. This planning is based on classical planning and, as in the case of IRMA mentioned in Ref. [10], reactiveness was enhanced by preparing in advance several plans to meet the environmental changes. We implemented four P-agents: the first performed reactive planning to run away from the enemy robots; the second performed reactive planning, depending on the remaining energy, to go to the energy supply base; the third performed reactive planning to respond reactively to the appearance of tiles and holes; and the fourth performed deliberate planning to get higher scores and move for a long time. The activation-energy was based on two distances: that between the player robot and the enemy robots, and that between the player robot and the energy supply base. The following seven B-agents were prepared: an agent to pick up a tile; an agent to put down a tile; an agent to embed a tile in a hole; and agents in charge of movement in four directions (up, down, left, and right). Our evaluation was based on (1) the scores obtained and (2) the number of steps the player robot was able to move. The scores were averaged to produce the final results (Figure 6).
[Figure 6: Points scored and number of operational steps]
Scores: Under the deliberate environmental setups, MRR-planning obtained scores as high as those obtained by the conventional reactive planning. This indicates that in the deliberate environmental setups there were no major differences between the performances of MRR-planning and the conventional planning. Under the reactive environmental setups, however, the scores obtained by the conventional reactive planning decreased rapidly, while MRR-planning achieved higher scores in all reactive environmental setups. This result enabled us to confirm that MRR-planning can select reactive/deliberate planning effectively in a dynamic environment (left side of Figure 6).
Number of steps: To evaluate the number of steps, we set up a rule such that the player robot succeeded in its mission if it could move up to 1,000 steps. Under the reactive environmental setups, conventional reactive planning required more planning time for one movement than MRR-planning required. Therefore, the energy of the player robot under the conventional reactive planning became fully depleted and the game ended quickly. Under MRR-planning, in contrast, the player robot was able to maintain stable operation in all environmental setups. This result confirmed that MRR-planning could adapt to dynamic environmental changes more flexibly (right side of Figure 6). The results of these evaluations confirm that MRR-planning can perform not only reactive planning but also deliberate planning and can achieve goals in a stable manner regardless of dynamic environmental changes.
5 Application to Real-world Systems
Making a comparative evaluation using RTT enabled us to verify the effectiveness of MRR-planning at the simulation level. We then verified its effectiveness with applications designed for the real world by putting MRR-planning to actual use.
5.1 What is the Active Vision System?
The active vision system (AVS) we have implemented is designed to detect moving objects in the real world and collect valuable image and sound data autonomously.
[Figure 7: Flow diagram of the AVS, showing ActiveCamera-I and ActiveCamera-II controlled by SGI Indigo2 and SGI Indy workstations]
[Figure 8: Photograph of the AVS]
So to speak, the AVS is an "adaptive moving object detecting system." In the movie "Jurassic Park," there was the following line: "dinosaurs detect only moving objects." The purpose of the AVS is similar to that of the vision system of those dinosaurs. The following explains the functions achieved by the AVS. The AVS consists of two ActiveCameras and one pair of stereo microphones (see Figures 7 and 8). The two cameras in the system have different functions.
- Functions of ActiveCamera-I (narrowing the focus)
  - Detect moving objects and make the camera follow them (pan/tilt functions).
  - Zoom in to magnify the movement of small objects (zoom function) when the captured object is small.
  - Judge whether the detected object is human. If so, the camera continues monitoring; if not, it does not follow the movement of the object.
- Functions of ActiveCamera-II (always focusing on a wide range)
  - Detect moving objects and make the camera follow them (no zoom functions).
  - Follow up in response to sound information (pan function only).
- Functions relating to the coordinated operation of the two cameras
  - Supplement the limited scope of each ActiveCamera.
Each ActiveCamera incorporates a lift that can be computer-controlled via RS-232C and provides pan and tilt functions; it also incorporates a computer-controlled zoom function. One workstation is used to control one camera unit, and interaction between the workstations is achieved via a UNIX socket by a communication function. The present AVS itself cannot move like a robot, so we are planning to load it onto a robot. Although many studies have been carried out to build systems similar to the AVS [5], they tended to focus on each mechanism (for example, vision processing and sound processing) rather than on mechanisms for coordination and integration.
Figure 9: AVS agent network
Our purpose is to coordinate and integrate these mechanisms so that for each kind of processing we can use suitable existing technology which has real-time characteristics. Reactive planning in the AVS is used for controlling an ActiveCamera when the degree of object movement is high, and deliberate planning is used for judging whether the detected object is human (a decision that requires a long time).
5.2 Implementing P-agents
Figure 9 shows each agent implemented in the AVS. We implemented two sets of MRR-planning, one to control each ActiveCamera.
5.2.1 Moving Object Detection Agent (P-agent1, P-agent6)
This P-agent performs reactive planning to control the lift so that the moving object comes to the center of the output frame. The P-agent first obtains the interframe differential data from the image data of the real world. Then the center of gravity of the differential data is calculated, and activation-energy is spread to the target B-agent so that the center of gravity shifts to the frame center.
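As an illustration, the differential-and-centroid step might look as follows in Python with NumPy; the motion threshold, the B-agent names, and the spread interface are assumptions made for this sketch, not values from the paper.

import numpy as np

def detect_and_track(prev_frame, frame, spread):
    # Interframe differential -> center of gravity -> spread
    # activation-energy toward the B-agents that recenter the object.
    # spread(b_agent, energy) stands in for the BS-agent's slots.
    diff = np.abs(frame.astype(int) - prev_frame.astype(int))
    mask = diff > 30                       # illustrative motion threshold
    if not mask.any():
        return                             # nothing moved
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()          # center of gravity of motion
    h, w = frame.shape
    # Energy proportional to the object's distance from the frame center.
    if cx < w / 2:
        spread("pan_left", w / 2 - cx)
    else:
        spread("pan_right", cx - w / 2)
    if cy < h / 2:
        spread("tilt_up", h / 2 - cy)
    else:
        spread("tilt_down", cy - h / 2)

The zoom decision of P-agent2 below could be driven in the same style by the spatial spread of the motion mask.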
5.2.2 Agent for Enlarging and Reducing Moving Objects (P-agent2)
This P-agent zooms in/out using the differential data in such a way that moving objects always appear on the screen at an appropriate size. The P-agent calculates the decentralized (dispersion) value of the differential data: when the decentralized value is high the camera zooms out, and when the decentralized value is low it zooms in. Then the P-agent performs planning, depending on the decentralized value, to control the lift so that the moving object comes to the center of the output frame.
[Figure 10: Processed image]
5.2.3 Human Detection Agent (P-agent3)
Whether the moving object is human can be determined by observing the color of its surface. First, this P-agent cuts out the smallest rectangle that contains a section experiencing movement by using the decentralized value. This rectangle is then divided into several sub-areas, and we determine whether each sub-area contains a color corresponding to that of skin. Lastly, if there are black sub-areas above the sub-areas containing skin color, we conclude that the moving object may be a human. Moreover, the P-agent performs planning to control the lift, depending on the locations of these sub-areas, so that the moving object comes to the center of the output frame. The processed image is shown on the right-hand side of Figure 10; there, the sub-area of the face section is assessed as possessing a skin color.
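A rough Python sketch of this grid test follows; the grid size and the RGB skin/dark ranges are invented for illustration (the paper does not report them), and the region is assumed to be at least as large as the grid.

import numpy as np

def maybe_human(rgb_region, grid=(4, 4)):
    # Divide the motion bounding box into sub-areas and apply the
    # skin-colour/dark-area heuristic described above.
    h, w, _ = rgb_region.shape
    gh, gw = h // grid[0], w // grid[1]
    skin, dark = set(), set()
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = rgb_region[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
            r, g, b = (cell[..., 0].mean(), cell[..., 1].mean(),
                       cell[..., 2].mean())
            if r > 120 and r > g > b:      # crude skin-tone test
                skin.add((i, j))
            if max(r, g, b) < 60:          # crude dark (hair) test
                dark.add((i, j))
    # Possibly human if a dark sub-area sits directly above a skin one.
    return any((i - 1, j) in dark for (i, j) in skin if i > 0)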
5.2.4 Movement Trace Agent (P-agent4, P-agent7)
This P-agent performs planning so that the two ActiveCameras operate in a manner reflecting each other's movements. For example, when ActiveCamera-I detects nothing for a long time, ActiveCamera-II performs planning to control ActiveCamera-I to move and zoom in the same way as ActiveCamera-II.
5.2.5 Sound Detection Agent (P-agent5)
First, only sound emitted with more than a certain power passes the filter. Next, the dynamic measure [2] is used so that a voice signal mixed with other noise can be detected with high accuracy. More specifically: (1) when image information is lacking, so that the camera is unable to detect moving objects, the camera can react to incidental noise; but (2) when the camera does detect moving objects, it does not react to accidental noise other than a human voice.
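A loose Python sketch of this two-stage filter is given below. The power gate and the "dynamic measure" used here (frame-to-frame spectral change) are illustrative stand-ins, not the actual measure of [2]; frames are assumed to be fixed-length NumPy arrays.

import numpy as np

def detect_voice(frames, power_gate=1e-3):
    # Stage 1: gate by signal power; Stage 2: prefer spectrally dynamic
    # signals (voice) over steady noise.
    prev_spec, voiced = None, []
    for f in frames:
        if np.mean(f ** 2) < power_gate:   # too quiet: filtered out
            prev_spec = None
            continue
        spec = np.abs(np.fft.rfft(f))
        if prev_spec is not None:
            change = np.mean(np.abs(spec - prev_spec))
            voiced.append(change > 0.1 * np.mean(spec))  # ad hoc test
        prev_spec = spec
    return any(voiced)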
[Figure 11: P-agent movement sequence; activations of P-agent1 to P-agent5 (ActiveCamera-I) and P-agent6, P-agent7 (ActiveCamera-II) over 1000 steps (3'35"); annotations mark rapid movement, slow movement, no movement detected, and idle.]
5.3 Behaviors of P-agents in the AVS
Figure 11 shows the resulting movement of P-agents. This graph indicates the alternations of P-agents that request activation of certain B-agents; Idle means that no B-agent is currently activated. P-agent1 and P-agent2 perform reactive planning and P-agent3 and P-agent4 perform deliberate planning for ActiveCamera-I, while P-agent6 performs reactive planning and P-agent5 and P-agent7 perform deliberate planning for ActiveCamera-II. When the movement of the object was rapid, P-agent1 controlled ActiveCamera-I without waiting for the results of the planning performed by P-agent3, and when the movement of the object was slow, P-agent3 could complete its planning and decide that the detected object was human. When ActiveCamera-I could not detect the moving object, P-agent7 in ActiveCamera-II controlled ActiveCamera-I. When ActiveCamera-II did not detect the movement of the object, P-agent5 performed planning to detect sound. To evaluate the robustness of MRR-planning, we tested the AVS with the function of P-agent2 turned off. Figure 12 shows the resulting movement of P-agents. The function of P-agent2 was turned off at 500 steps, so that this P-agent could not plan after 500 steps. Nevertheless, the AVS worked stably, and P-agent3 could still determine whether the detected object was human.
6 Future Issues
We are currently planning to load the AVS onto the B14 autonomous mobile robot made by Applied AI Systems, Inc. To control the robot, many P-agents, BS-agents, and B-agents are necessary to integrate the information from the AVS and the robot's many sensors. Verification of the algorithm of MRR-planning therefore becomes necessary, because in these situations the load on the BS-agents becomes large.
[Figure 12: Verifying the robustness of MRR-planning; P-agent movement sequence with P-agent2 turned off at 500 steps, over 1000 steps (3'35").]
Mechanisms for automatically learning two kinds of parameters are necessary: a parameter for the P-agent to calculate the activation-energy, and a parameter for the BS-agent to determine which B-agent is to be activated. At present, we plan to use an evolutionary approach. Moreover, if MRR-planning can store a history of the behaviors of agents, it can re-use this history as experience to deal with new situations (similar to K-lines). Lastly, more thorough studies of the deliberate planning are necessary. The deliberate P-agent in the AVS can always reset its planning when the reactive P-agent's demand is achieved at each step. In other situations, however, the deliberate P-agent may be able to, or must, continue its planning regardless of the selections made for the reactive P-agents. In this case, it may happen that the planning of the reactive P-agents influences the deliberate P-agent, which then must change its planning policy according to that influence (the so-called utility problem). A mechanism to resolve this problem is necessary.
7 Conclusion
This paper proposed multi-agent real-time reactive planning (MRR-planning), the main features of which are that (1) reactive or deliberate planning is selected efficiently in a dynamic environment and (2) the selection mechanism is established by non-close coordination of agents. In a simulation, we obtained results that compared favorably with those of a planning method that uses conventional reactive planning. We also implemented the active vision system, which is the first step in building a real-world agent, and used it to verify the real-world effectiveness of MRR-planning. We plan to focus our future study on building the real-world agent described in Section 6. We are planning to load the AVS onto an autonomous mobile robot, have the robot move around in the environment we live in, and let it interact with us. Finally, we are continuing research on building and retrieving a multimedia database [12], [15], and for this database we are currently planning to use the autonomous robot as an intelligent tool for acquiring data from the real world.
Acknowledgements
We would like to thank our manager, Dr. Kenichiro Ishii, and the researchers of the Semantic Computing Research Group.
References
[1] P.E. Agre and D. Chapman: What Are Plans For?, Designing Autonomous Agents, pp. 17-49, Bradford-MIT, 1990.
[2] Aaron E. Rosenberg and Frank K. Soong: Recent Research in Automatic Speaker Recognition, Advances in Speech Signal Processing, Sadaaki Furui and M. Mohan Sondhi (eds.), pp. 701-738, Dekker, 1989.
[3] R.A. Brooks: Intelligence Without Reason, Proceedings of IJCAI-91, pp. 569-595, 1991.
[4] Mark Boddy and Thomas Dean: Solving Time-Dependent Planning Problems, Proceedings of IJCAI-89, pp. 979-984, 1989.
[5] Christel M, Stevens S, Kanade T, Mauldin M, Reddy R, and Wactlar H: Techniques for the Creation and Exploration of Digital Video Libraries, Multimedia Tools and Applications (B. Furht, ed.), Vol. 2, ch. 17, Kluwer Academic, 1995.
[6] R.E. Fikes and N.J. Nilsson: STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving, Artificial Intelligence, vol. 2, pp. 189-208, 1971.
[7] Michael P. Georgeff and Amy L. Lansky: Reactive Reasoning and Planning, Proceedings of AAAI, pp. 677-682, 1987.
[8] Pattie Maes: The Agent Network Architecture (ANA), SIGART Bulletin, vol. 2, no. 4, pp. 115-120, 1991.
[10] Martha E. Pollack and Marc Ringuette: Introducing the TileWorld: Experimentally Evaluating Agent Architectures, Proceedings of AAAI, 1990.
[12] Toshihiro Takada, Mitsukazu Washisaka, Shigemi Aoyagi, and Rikio Onai: Two-way Linkage System between Video and Transcribed Text for Protocol Analysis, Proceedings of the International Conference on Multi Media Japan'96 (MMJ'96), 1996.
[14] Seiji Yamada: Controlling Deliberation with Success Probability in a Dynamic Environment, Proceedings of AIPS'96, pp. 251-258, 1996.
[15] Mitsukazu Washisaka, Toshihiro Takada, Shigemi Aoyagi, and Rikio Onai: Video/Text Linkage System Assisted by a Concept Dictionary and Image Recognition, Proceedings of the ICMCS'96, 1996.
Distributed Problem-Solving as Concurrent Theorem Proving*
Michael Fisher¹ and Michael Wooldridge²
¹ Department of Computing, Manchester Metropolitan University, Manchester M1 5GD, United Kingdom
M.Fisher@doc.mmu.ac.uk
² Zuno Ltd, International House, Ealing Broadway Centre, London W5 5DB, United Kingdom
mjw@dlib.com
Abstract. Our principal aim in this paper is to demonstrate that distributed problem solving may fruitfully be viewed as concurrent theorem proving. We begin by introducing a novel agent-based approach to concurrent theorem proving, and then describe Concurrent METATEM, a multi-agent programming language whose model of computation is closely related to that used within the theorem proving approach. An extended case study is then presented, wherein we demonstrate how a multi-agent planning system can be implemented within the agent-based theorem proving framework. We then show how extensions and refinements of the planning system can easily be accommodated within this framework. We conclude with a detailed discussion of related work, from both the multi-agent systems community and the (concurrent) theorem proving community.
1 Introduction
Problem solving is a fundamental issue in AI and, along with game-playing, is perhaps the oldest research topic in the discipline. The view of problem solving as theorem proving has a long and influential history in AI, going back at least to the work of Green [12]. This deductive view of problem solving has been particularly influential and useful in AI planning research. Distributed problem solving (DPS), wherein a group of decentralised semi-autonomous agents cooperate to solve problems, is perhaps the paradigm example of multi-agent activity, and is certainly the most-studied process in distributed AI [5]. However, while many logic-based approaches to distributed AI have been described in the literature, we are aware of no work that has explicitly proposed viewing distributed problem solving as concurrent, agent-based theorem proving. This is perhaps due to the lack of an appropriate agent-based computational model for concurrent theorem proving. The main aim of this paper is to show how distributed problem solving may usefully be treated as concurrent theorem proving. To this end, we utilise a recently developed general framework for agent-based theorem proving, and demonstrate how this
* This work was partially supported by EPSRC under grant GR/J48979.
framework may easily be implemented in a multi-agent programming language. As an example of distributed problem solving, we consider distributed planning, representing both the basic system and various extensions and refinements within our framework. The remainder of this paper is structured as follows. In §2 we present a general framework for concurrent theorem proving [10], and show how it can be used as the basis for distributed problem solving. In §3 we introduce Concurrent METATEM [8], a multi-agent programming language whose computational model is closely related to the agent-based theorem proving framework, making the language well-suited to implementing the technique. In §4 we present an extended case study, in which we show how a range of planning techniques may be implemented using the concurrent theorem proving technique, and finally, in §5 we discuss related work and provide concluding remarks.
2 A Framework for Concurrent Theorem Proving
In this section, we introduce a novel, agent-based approach to theorem proving. In the interests of brevity, we only give an outline of the technique here. A fuller description, together with a number of technical results associated with the technique, is given in [10]. The basic idea behind the approach is easily illustrated by means of a simple example. Consider the following set of propositional Horn clauses³:
1. p
2. ¬p ∨ q ∨ ¬r
3. ¬p ∨ ¬q ∨ ¬r
4. ¬p ∨ r
Using classical resolution, it is easy to derive the empty clause from this set. In our framework, theorem proving proceeds by allocating each clause i ∈ {1, ..., 4} to an agent, Agi. The agents we consider are self-contained reasoning systems, encapsulating both data and behaviour, which are able to execute independently of each other and communicate via broadcast message-passing [19]. In our theorem-proving context, these agents behave as follows:
- any agent representing a clause containing just a positive literal should pass that information (via broadcast message-passing) to all other agents;
- upon receipt of a message, agents add this message to their current set of clauses, and repeatedly apply classical resolution in order to transform these clauses, broadcasting any new literals generated or contradictions produced.
130
(Agl) (Ag2) (Ag3) (Ag4)
: : : :
p q V ~r ~q V ~ r r
Agent Ag4 then broadcasts r as new information. After this message reaches the other agents, they update their clauses, and the configuration becomes:
(Agl) (Ag2) (Agz) (Ag4)
: : : :
p q -~q r
Finally, agent A92 then broadcasts q, and upon receipt of this message, Ag3 generates a contradiction, which it then broadcasts. Since the empty clause has been derived, theorem proving is now at an end. Note that the theorem-proving activity is not dependent upon the order in which messages are sent. If the empty clause can be derived from the initial clauses, then it will be, as long as all messages sent are guaranteed to (eventually) arrive. Also note that since broadcast message-passing is used as the basic communication mechanism, other agents are able to view (and utilize) the intermediate deductions produced by each agent. Hence, global deductions are carried out collectively by the set of agents. While many variations of theorem-proving utilising concurrency have been developed [ 13], few use such a model of computation. Those that are related are discussed in w
2.1
Generality and Correctness
Despite its simplicity, this approach is just as powerful as classical resolution: communication patterns in the concurrent system match the proof steps in a sequential resolution refutation. In the case of Horn clauses, the messages between agents correspond to positive literals while in the case of full classical logic, the messages themselves correspond to Horn clauses. In [10], we prove that the technique is refutation complete for full classical logics, and in addition show how it can be extended to first-order classical logic. The main result from [10] can be stated as follows. Theorem 1 (Correctness [10]). If a set of clauses, A is distributed amongst a set of agents as described above, then a false message will eventually be generated by at least one of the agents if and only if, A is unsatisfiable. Agents may also represent and exchange messages about both the heuristics currently being employed and the organisation of the agent society. Such meta-level information can be used improve the efficiency of the theorem proving process, and allows cooperative, competitive, or opportunistic problem solving structures to be implemented and investigated [ 10].
131
2.2
Efficiency and I m p l e m e n t a t i o n
One potential criticism of the technique described above is the use of broadcast message passing, which is often regarded as too demanding of communication bandwidth to be used in practice. However, in spite of the use of broadcast, the system need not be flooded with messages. Not only is it possible to structure agents so that related information only occurs within one agent, but also, by grouping agents containing related parts of the problem-solving capability together, the number of messages generated can be greatly reduced [10]. Branching in the search space is replaced by additional broadcast messages. Thus, in architectures where broadcast is prohibitively expensive, the technique may prove to be inefficient. However, most contemporary architectures provide efficient multicast mechanisms (indeed, many distributed operating systems are based upon this mechanism: see, for example, [3]).
3
Implementing the Framework
Having outlined the general model of concurrent theorem proving, we now describe the high-level programming language in which problem-solving applications will be represented. A Concurrent METATEM system [8, 9] consists of a set of concurrently executing agents, which communicate through asynchronous broadcast message-passing. The internal computation mechanism for an agent is provided by the execution of temporal logic formulae [2]. We begin by giving a brief overview of temporal logic, followed by an outline of the execution mechanism for temporal formulae. Temporal logic can be seen as classical logic extended with modal operators for representing temporal aspects of logical formulae. The temporal logic we use is based on a linear, discrete model of time. Thus, time is modeled as an infinite sequence of discrete states, with an identified starting point, called 'the beginning of time'. Classical formulae are used to represent constraints within individual states, while temporal formulae represent constraints between states. As formulae are interpreted at particular states in this sequence, operators which refer to both the past and future are required. The future-time temporal operators used in this paper are as follows: the sometime in the future o p e r a t o r - - ~qa is true now if ~ is true sometime in the future; and the always in the future operator - - D ~ is true now if qo is true always in the future. Similarly, connectives are provided to enable formulae to refer to the past. The only past-time temporal operators needed for the examples in this paper are as follows: the sometime in the past operator - - ~ ~p is true now if qo was true in the past; the beginning of time o p e r a t o r - - start is only true at the beginning of time; and the strong last-time operator - - OqD is true if there was a last moment in time and, at that moment, ~p was true 4. Concurrent METATEM uses a set of 'rules', couched in temporal logic, to represent agent's intended behaviour. These rules are of the form: 'past and present formula' =~ 'present or future formula' 4 A number of other operators are provided in Concurrent METATEM,though as they are not required for this paper, they will not be mentioned here; see [2, 9].
132
Consider the following rules, forming a fragment of an example Concurrent METATEM program. start ~ achieves(a) Ogoal(X) ~ ~planned(X) Otopgoal( Y) ~ subgoal( Y) V fact(Y) Here, both ' X ' and ' Y' represent universally quantified variables. Thus, we can see that achieves(a) is made true at the beginning of time and whenever goal(X) is true in the last moment in time, a commitment to eventually make planned(X) true is given. Similarly, whenever top9oal(Y) is true in the last moment in time, then either subgoal(Y) or fact(Y) must be made true. An agent's program rules are applied at every moment in time (i.e., at every step of the execution) and thus execution in a Concurrent METATEM agent can be distinguished from the logic programming approach in that refutation is not involved in the computation process and the model for the formula contained within the agent is constructed by following the temporal rulesforwards in time. Once the agent has commenced execution, it continually follows a cycle of reading incoming messages, collecting together the rules that 'fire' (i.e., whose left-hand sides are satisfied by the current history), and executing one of the disjuncts represented by the conjunction of right-hand sides of 'fired' rules. Each agent contains an interface describing both the messages that the agent will recognise and those it may send. For example, the interface
top (9oal, achieves)[planned, subgoal] : defines top to be the name of an agent in which {9oal, achieves } is the set of messages the agent will accept, and {planned, subgoal} defines the set of messages the agent can send. For a more detailed description of the execution mechanism underlying Concurrent METATEM, see [2, 9].
4
A Case Study: Distributed
Planning
In order to illustrate our approach, we show how planning problems can be represented within our model. We begin with an overview of AI planning; our presentation is relatively standard, and is based on [14]. First, we assume a fixed set of actions Ae = { c q , . . . , c~,~}, representing the effectoric capabilities of the agent for which we are developing a plan. A descriptor for an action c~ E Ac is a triple (Pa,D,, A~), where5:
- Pa C s is a set of sentences of first-order logic that characterise the pre-condition of c~; - D , _C s is a set of sentences of first-order logic that characterise those facts made false by the performance of a (the delete list); 5 We assume a standard first-order logic s
with logical consequence relation ' ~ ' .
133
-
A,~ C_ s
is a set of sentences of first-order logic that characterise those facts made
true by the performance of a (the add list). A planning problem (over Ac) is then determined by a triple (,4, O, "y), where: "4 _C/2o is a set of sentences of first-order logic that characterise the initial state of the world; - O : {(P~, D~, As) I a E Ac} is an indexed set of operator descriptors, one for each available action c~; and - 3' _C/Z0 is a set of sentences representing the goal to be achieved. -
A plan π is a sequence of actions π = (α₁, ..., αₙ). With respect to a planning problem (Δ, O, γ), a plan π = (α₁, ..., αₙ) determines a sequence of n + 1 world models Δ₀, Δ₁, ..., Δₙ where:
Δ₀ = Δ and Δᵢ = (Δᵢ₋₁ \ Dαᵢ) ∪ Aαᵢ, for 1 ≤ i ≤ n.
A (linear) plan π = (α₁, ..., αₙ) is said to be acceptable with respect to the problem (Δ, O, γ) if, and only if, Δᵢ₋₁ ⊨ Pαᵢ for all 1 ≤ i ≤ n (i.e., if the pre-condition of every action is satisfied in the corresponding world model). A plan π = (α₁, ..., αₙ) is correct with respect to (Δ, O, γ) if, and only if, it is acceptable and Δₙ ⊨ γ (i.e., if the goal is achieved in the final world state generated by the plan). The planning problem can then be stated as follows: Given a planning problem (Δ, O, γ), find a correct plan for (Δ, O, γ).
We will now demonstrate how the planning problem can be solved using the general concurrent theorem proving paradigm we described in §2. More precisely, in §4.1 we show how a Concurrent METATEM system can be generated to solve the planning problem in a top-down (goal-driven) manner. We then prove the correctness of the approach. In §4.2 we give an alternative method for deriving a Concurrent METATEM system that will generate a solution to the planning problem in a data-driven (bottom-up) fashion, while in §4.3 we consider refinements of the two approaches.
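Reading world models as sets of ground atoms and '⊨' as set inclusion, these definitions can be rendered directly in Python; the toy operator at the end is our own illustrative example, not drawn from the paper.

def apply_plan(delta, ops, plan):
    # ops: dict action -> (pre, delete, add). Returns the world-model
    # sequence Delta_0..Delta_n, or None if the plan is not acceptable.
    models = [frozenset(delta)]
    for a in plan:
        pre, delete, add = ops[a]
        if not pre <= models[-1]:        # pre-condition not satisfied
            return None
        models.append((models[-1] - delete) | add)
    return models

def correct(delta, ops, gamma, plan):
    models = apply_plan(delta, ops, plan)
    return models is not None and gamma <= models[-1]

ops = {"pickup": (frozenset({"on_table"}),   # pre-condition
                  frozenset({"on_table"}),   # delete list
                  frozenset({"holding"}))}   # add list
print(correct({"on_table"}, ops, {"holding"}, ["pickup"]))   # True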
4.1 Goal-Driven (Top-down) Planning
In this section, we demonstrate how, from a planning problem (Δ, O, γ), a Concurrent METATEM system that will solve the problem in a top-down, goal-driven fashion can be systematically derived. We begin with a discussion of the various predicates that will be used, and an overview of the derived system structure. We use just four domain predicates (see Table 1). The predicate top-goal(...) represents the fact that its argument (a set of L₀ sentences) is the top-level goal of the system. The unary predicate goal(...) is used to represent sub-goals; its argument is also a set of L₀ sentences. The achvs(...) predicate takes two arguments, the first of which is a set of L₀ sentences, the second of which is a plan; achvs(Δ', π) represents the fact that plan π, when executed from the initial world state Δ, will achieve Δ'. Initially, we shall assume that plans are linear (see §4.3 for more complex plans), and represent them using a PROLOG-like list notation.
top-goal(γ) : γ is a top-level system goal
goal(γ) : γ is a sub-goal
achvs(Δ, π) : plan π achieves Δ
plan(γ, π) : plan π is a correct plan for γ
Table 1. Domain Predicates
Finally, the predicate plan(...) is used to communicate a plan to the originator of the top-level goal: plan(γ, π) means that plan π, if executed in the initial world, will achieve γ. Given a planning problem (Δ, O, γ), the basic generated system will contain |Δ| + |O| + 1 agents: one for each element of Δ and O, and one 'top-level' agent. The top-level agent takes in a request for a plan to achieve γ, and sends out a message that creates a sub-goal γ. For each operator description (Pα, Dα, Aα) ∈ O, an agent α is created, which encapsulates knowledge about the pre- and post-conditions of α. This knowledge is represented by the two rules (TO1) and (TO2). The first of these, (TO1), is essentially a rule for sub-goaling: it is fired when a message is received indicating that a goal has been created corresponding to the post-condition of α; in this case, the rule causes a new sub-goal to be created, corresponding to the pre-conditions of α. At some stage, a sub-goal created by this process will correspond to an initial state of the world (otherwise, the top-level goal γ is not achievable). This is where the third type of agent plays a part. For each sentence φᵢ ∈ Δ, an agent initᵢ is created, containing a rule which represents the fact that if ever φᵢ is a sub-goal, it can be achieved by the empty plan, '[]'. When such a rule fires, this information is propagated by sending the message achvs({φᵢ}, []). These base agents can also combine initial conditions, sending out composite 'achvs' messages. Within each α agent, there will be a single (TO2) rule, characterizing the effect of executing α; this rule will fire when a message achvs(Δ', π) is received such that Δ' matches the pre-condition of α. When fired, the rule will cause a message achvs(Δ'', [α | π]) to be broadcast, where Δ'' is the world-model obtained by executing α in Δ'. We shall now describe the derived system (and in particular, the agents and rules used) in more detail.
Top-level agent: For (Δ, O, γ), we create a top-level agent as follows.
top-level(top-goal, achvs)[plan, goal]:
(TG1) ●top-goal(γ) ⇒ goal(γ);
(TG2) ●achvs(Δ, π) ∧ ♦top-goal(γ) ∧ (γ ⊆ Δ) ⇒ plan(γ, π).
The agent top-level accepts 'requests' for plans in the form of a message top-goal(γ), where γ is the goal, as above. The rule (TG1) then simply propagates γ as a sub-goal. The predicate top-goal would be given to the system by a user. Rule (TG2) simply characterises the plan predicate: π is a correct plan for γ if π achieves γ. When the top-level agent is informed of a plan π that achieves the top-level goal γ, it sends a message plan(γ, π), indicating that a plan for the goal has been found. Thus, rule (TG1) represents the input to the system, whereas (TG2) represents the output.
Base agents: Given an initial world model Δ = {φ1, …, φm}, we generate m agents, init1, …, initm, each containing a rule (TB1) showing that the initial conditions are achieved by the empty plan, together with a rule allowing the combination of relevant initial conditions (TB2).

initi(goal)[achvs]:
  (TB1) goal(φi) ⇒ achvs({φi}, []);
  (TB2) ⊙achvs(Δ′, []) ∧ (φi ∉ Δ′) ⇒ achvs(Δ′ ∪ {φi}, []).

Action agents: For each operator description (Pα, Dα, Aα) ∈ O, where Aα = {φ1, …, φm} and Pα = {ψ1, …, ψn}, we create an agent α as follows.

α(goal, achvs)[goal, achvs]:
  (TO1) goal(φ1) ∨ … ∨ goal(φm) ⇒ goal(ψ1) ∧ … ∧ goal(ψn);
  (TO2) ⊙achvs(Δ′, π) ∧ (Pα ⊆ Δ′) ⇒ achvs(((Δ′ \ Dα) ∪ Aα), [α | π]).

Rule (TO1) generates sub-goals: if a sub-goal is received that matches against the post-condition of α, then this rule causes the pre-conditions of α to be propagated as sub-goals. Rule (TO2) defines the effect that action α has upon an arbitrary state that satisfies its pre-condition. This rule effectively restricts us to linear plans; we consider non-linear planning in §4.3. It is important to note that, while this approach may seem inefficient at first, achvs messages are only initiated for members of Δ that are required for one of the possible plans.

Correctness

In this section, we prove that the approach to top-down planning discussed above is correct, in that: (i) any plan generated by the system is correct, and (ii) a system is guaranteed to eventually generate a plan for the top-level goal γ. Alternatively, the correctness of this planning approach can be established via correspondence to the (complete) concurrent theorem-proving system [10].

Theorem 2. Any plan generated by the system given above is correct. More precisely, if the message achvs(Δ′, π) is broadcast in a system derived from a problem ⟨Δ, O, γ⟩, then π is a correct plan for Δ′.
Proof. By induction on the structure of plans. The base case is where π is empty; a message achvs(Δ′, []) will only be sent by an initi agent, in which case Δ′ is true in the initial world, and will clearly be achieved by the empty plan. Next, suppose that π is of the form [α | π′], and that if achvs(Δ″, π′) is sent, then π′ is correct for Δ″. If achvs(Δ′, [α | π′]) is subsequently sent, then it must originate from a (TO2) rule within the agent α. In this case, it is easy to see from inspection of (TO2) that [α | π′] is correct for Δ′.

Theorem 3. If there exists a solution to the problem ⟨Δ, O, γ⟩, then eventually, a system derived as above will find it. More precisely, if there exists a solution to ⟨Δ, O, γ⟩, and the message top-goal(γ) is sent, then eventually a message achvs(Δ′, π), with γ ⊆ Δ′, will be sent.
Proof. By induction on the length of successful plans. The base case is that γ is directly achieved by the initial conditions of the system, and thus goal(γ) generates an appropriate achvs(Δ′, []) message (where γ ⊆ Δ′). Assuming that all problems requiring plans of length n - 1 can be solved, suppose that the plan α1, …, αn achieves the goal γ. Here, the message goal(γ) reaches agent αn, which recognises γ and broadcasts appropriate sub-goals. By the induction hypothesis, the sub-goals will be solved and achvs(Δ″, π) will eventually be received by αn (where Pαn ⊆ Δ″). The αn agent will then broadcast the solution to γ.

Theorems 2 and 3 together imply that a Concurrent METATEM program derived from a problem ⟨Δ, O, γ⟩ using the above scheme will be totally correct.
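To make the derivation concrete, the sketch below simulates the goal-driven behaviour of the derived system on a toy STRIPS-style problem. It is our illustration in Python, not the authors' Concurrent METATEM code: broadcast message-passing is collapsed into a sequential regression search, and delete-list conflicts are ignored.

```python
from collections import deque

# A toy planning problem <Delta, O, gamma>: initial facts, operators, goal.
Delta = {"at_home"}
O = {  # operator -> (pre-conditions, delete-list, add-list)
    "walk": ({"at_home"}, {"at_home"}, {"at_shop"}),
    "buy":  ({"at_shop"}, set(),       {"have_milk"}),
}
gamma = {"have_milk"}

def top_down_plan(Delta, O, gamma):
    """Goal-driven search mirroring the derived system: (TO1)-style
    regression of goals through post-conditions, (TB1)-style matching
    against the initial world, (TO2)-style plan construction."""
    frontier = deque([(frozenset(gamma), [])])
    seen = set()
    while frontier:
        g, plan = frontier.popleft()
        if g <= Delta:                 # initial conditions achieve g
            return plan                # report, as (TG2) would
        if g in seen:
            continue
        seen.add(g)
        for name, (pre, _, add) in O.items():
            if add & g:                # g matches a post-condition of name
                frontier.append((frozenset((g - add) | pre), [name] + plan))
    return None                        # gamma is not achievable

print(top_down_plan(Delta, O, gamma))  # ['walk', 'buy']
```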
4.2 Data-Driven (Bottom-up) Planning

The operation of many implemented planners corresponds to the basic approach developed in the preceding section, in that they are goal-driven. Of course, there is an alternative, whereby a plan is developed in a data-driven manner. Many of the concepts are similar to those of the top-down planner (e.g., the various domain predicates retain their meaning). For this reason, our presentation will be somewhat more terse. Given a planning problem ⟨Δ, O, γ⟩, we now generate a Concurrent METATEM system containing |O| + 2 agents: one top-level agent (as above), one agent for Δ, and one agent for each element of O. The system works by forward chaining from the initial state of the world, generating all possible plans and their consequences. Eventually, the desired plan will be generated. However, given that there are |O|! simple linear plans possible for operators O, it is not difficult to see that this form of plan generation will, in general, be impractical.
Top-level agent: This agent simply awaits a plan achieving the goal γ.

top-level(top-goal, achvs)[plan]:
  (BG1) ⊙achvs(Δ′, π) ∧ ♦top-goal(γ) ∧ (γ ⊆ Δ′) ⇒ plan(γ, π).

Base agents: Given the initial world Δ = {φ1, …, φm}, we now generate a single agent, init, which broadcasts all the relevant initial information.⁶ Again, the agent contains a rule allowing the combination of relevant initial conditions (BB2).
init()[achvs]:
  (BB1) start ⇒ achvs({φ1}, []) ∧ … ∧ achvs({φm}, []);
  (BB2) ⊙achvs(Δ′, []) ∧ (φi ∉ Δ′) ⇒ achvs(Δ′ ∪ {φi}, []).
6 We here choose to use one base agent, rather than m base agents, in order to reinforce the fact that information can be distributed amongst agents in a variety of ways.
Action agents: For each operator descriptor (Pα, Dα, Aα) ∈ O we generate an agent α, as follows:

α(achvs)[achvs]:
  (BO1) ⊙achvs(Δ′, π) ∧ (Pα ⊆ Δ′) ⇒ achvs(((Δ′ \ Dα) ∪ Aα), [α | π]).

The rule (BO1) is identical to (TO2), above. Thus, there are no rules for decomposing a goal to produce sub-goals. Goals can only be solved by the required combination of plan elements being generated bottom-up. Again, the correctness of this approach can be easily shown.
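A data-driven counterpart, again a Python sketch of ours rather than the derived agent system itself, makes the combinatorial point vivid: forward chaining applies every enabled operator, so the frontier grows with every plan prefix. Plans are kept in execution order here, whereas (BO1) prepends α to the plan list.

```python
from collections import deque

def bottom_up_plan(Delta, O, gamma, max_len=6):
    """Forward chaining from the initial state, as in (BB1)/(BO1):
    start from achvs(Delta, []) and apply every enabled operator,
    until some reachable state satisfies the goal (cf. (BG1))."""
    frontier = deque([(frozenset(Delta), [])])
    seen = set()
    while frontier:
        state, plan = frontier.popleft()
        if gamma <= state:
            return plan
        if state in seen or len(plan) >= max_len:
            continue
        seen.add(state)
        for name, (pre, dele, add) in O.items():
            if pre <= state:           # pre-condition satisfied
                frontier.append(((state - dele) | add, plan + [name]))
    return None
```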
4.3 Refinements
We will now briefly outline a few possible refinements to the basic planning mechanisms discussed in the previous sub-sections.
Uni-directional top-down planning: When considering simple linear plans, generated using the top-down approach in §4.1, there is often no need to pass messages back through the action agents to achieve the final plan. We now extend the goal predicate with a second argument, in which the partial plan for the current goal is stored. If the top-level agent broadcasts goal(γ, []), then each action agent need only have the following rule for producing sub-goals.
⊙goal(γ′, π) ∧ (Aα ∩ γ′ ≠ ∅) ⇒ goal(Pα, [α | π])
Thus, if any post-condition of the action occurs within a goal, then a new subgoal corresponding to the pre-condition is generated and the current partial plan is extended with the action. Each base agent now broadcasts the plan if it can reduce the goal completely:
goal(φi, π) ⇒ plan(π).

In this way, once a goal is completely decomposed, the second argument to goal must hold the plan that achieves the goal.
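The modified sub-goaling step can be captured in a few lines; the following is our rendering of the rule above, with sets standing for goals.

```python
def subgoal_step(goal_set, partial_plan, op_name, pre, add):
    """Uni-directional sub-goaling: if the operator's add-list meets
    the goal, regress the goal to the pre-conditions and extend the
    partial plan with the action (cf. goal(Pa, [a | pi]))."""
    if add & goal_set:
        return pre, [op_name] + partial_plan
    return None  # rule does not fire
```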
Non-linear planning: The top-down and bottom-up planners sketched above generate basic linear plans. An obvious extension is to develop non-linear plans. To provide this extension, we simply allow the pre-conditions of an action to be satisfied by states derived from different routes. Thus, assuming Pα is a set of n literals, ψ1, …, ψn, we simply change (BO1)/(TO2) to be
⊙achvs(Δ1, π1) ∧ (ψ1 ∈ Δ1) ∧
    ⋮
⊙achvs(Δn, πn) ∧ (ψn ∈ Δn) ∧
(Δ′ = Δ1 ∪ … ∪ Δn) ∧ consistent(Δ′)
    ⇒ achvs(((Δ′ \ Dα) ∪ Aα), [α | (π1, …, πn)])

where consistent checks the satisfiability of a set of (usually ground) sentences.
Grouping and Efficiency: Until now, we have not considered the possibility of structuring the agent space. Here, we briefly outline a mechanism, based upon grouping [16, 9], whereby simple organisational structures can be implemented, thus limiting the extent of broadcast communication. The idea underlying the notion of an agent group is that each agent may be a member of several groups, and when an agent sends a message, that message is, by default, broadcast to all the members of its group(s), but to no other agents. Thus, if such groups are constructed appropriately, then communication between groups will be more restrained than communication within groups. (It would be natural to implement such a scheme by limiting each group to a single processor if possible.) Consider the simple top-down planning approach outlined in §4.1. Here, even if a plan never requires certain operators or initial conditions, broadcast messages will still be sent to the agents representing these irrelevant items. If we are able to partition the agent space so that agents we are certain will not be needed in the final plan are excluded from a group, then broadcast communication may be effectively limited. The smaller the group produced, the less communication is required. As an example of this, consider the set of clauses presented earlier. By grouping clauses 2 and 3 together into a sub-group which can receive messages, but not send them, we can ensure that messages relating to p and r are allowed through to Ag2 and Ag3, while q messages will never be allowed out of this sub-group. In this way, communication regarding q is localised and broadcast message-passing is reduced. While grouping has many practical advantages, it can obviously lead to incompleteness. It is important that any heuristics used to group agents together are shown to retain the correctness of the problem-solving system. Thus, much of our current work, particularly with respect to concurrent theorem-proving, is centred around the development of appropriate heuristics.
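Group-limited broadcast is easy to prototype; in this sketch (our own naming and structure), a message reaches only agents sharing a group with the sender, so a sub-group with no outward links localises its traffic.

```python
class Agent:
    def __init__(self, name, groups):
        self.name, self.groups, self.inbox = name, set(groups), []

def broadcast(sender, msg, agents):
    """Deliver msg only to agents sharing a group with the sender:
    communication between groups is more restrained than within them."""
    for a in agents:
        if a is not sender and a.groups & sender.groups:
            a.inbox.append(msg)

# Ag2 and Ag3 form a sub-group, so q-messages stay local to it.
ag1 = Agent("Ag1", {"main"})
ag2 = Agent("Ag2", {"main", "sub"})
ag3 = Agent("Ag3", {"sub"})
broadcast(ag3, "q", [ag1, ag2, ag3])   # reaches Ag2 only
```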
5 Concluding Remarks and Related Work
We have introduced a model for distributed problem-solving, based upon an agent-based approach to concurrent theorem-proving. While space precludes both a longer discussion of the elements of the model and its application to other problem-solving domains, we believe the use of this model to represent varieties of distributed planning shows the potential of this approach. In addition to providing a consistent basis for distributed problem solving, this framework allows for the development of flexible and open agent-based systems, the use of broadcast communication being vital to the latter. While the utility of a logic-based approach for proving the correctness of distributed problem-solving systems is clear in the case of planning, a wide range of further applications can be re-cast in this framework. In this way, the difficulties often encountered in establishing the correctness of dynamic distributed problem solvers may be alleviated. Our future work in this area is to continue developing implementations and refinements of the approach (a prototype system already exists), to extend it to a wider range of distributed problem-solving applications, and to show how such distributed systems can be formally derived from their single-process counterparts. In addition, we are actively developing heuristics which can be used not only to group agents appropriately,
but also to distribute information amongst agents. Finally, we briefly consider related work in the section below.
5.1 Related Work
Our approach to concurrent theorem proving and distributed problem solving is somewhat similar to the blackboard model [7]. However, there are also significant differences: perhaps most importantly, our model allows for true concurrency, in that agents execute in parallel and communicate via message-passing, rather than via a globally accessible data structure. In terms of Smith's general classification of distributed problem solving, our framework is based on result sharing (as opposed to task sharing) [18]. With respect to the underlying model of concurrent theorem-proving, while other systems share some features with our approach, the particularly close link between the operational model, communication, and deduction, the possibility of dynamic creation, and the openness of the system make it significantly different from previous systems. In the DARES distributed reasoning system [15], agents cooperate to prove theorems. As in our model, information (sets of clauses) is distributed amongst agents and local deduction occurs purely within an agent, but an agent can broadcast requests for further information. In contrast to our approach, not only is the number of agents static, but the opportunity for more sophisticated structuring of the agent space within the DARES system is absent. Further, the broadcast mechanism is not pervasive: it is only used to solicit new data when an agent stalls. While the agents within the DARES system are all of the same type, one of the suggestions of that work was to consider different 'specialist' agents within the system. This form of system has been developed using the TEAMWORK approach, a general framework for the distribution of knowledge-based search [6]. While the number of agents within TEAMWORK is more fluid than in DARES, and more sophisticated structuring is provided through the concept of 'teams', the control within the system is centralised through the use of 'supervisor' agents. Also, in contrast to our model, less reliance is placed on broadcast communication. The clause diffusion approach to concurrent theorem-proving [4] also partitions sets of clauses amongst agents. Unlike our framework, new clauses generated may be allocated to other agents. Thus, while new information generated in our approach is distributed by broadcast message-passing, this is achieved in clause diffusion via the migration of clauses. In contrast to our approach, clause diffusion is not primarily intended as a basis for the development of dynamic cooperative agent-based systems. Finally, this work obviously has links with the considerable amount of research which characterises standard planning as sequential theorem-proving, beginning with Green's development of a planning procedure based on resolution [12]. Both Green's approach and subsequent developments are described in [11] and [1].
References

1. J. Allen, J. Hendler, and A. Tate (eds). Readings in Planning. Morgan Kaufmann, 1990.
2. H. Barringer, M. Fisher, D. Gabbay, G. Gough, and R. Owens. METATEM: An Introduction. Formal Aspects of Computing, 7(5):533-549, 1995.
3. K. Birman. The Process Group Approach to Reliable Distributed Computing. TR91-1216, Dept. of Computer Science, Cornell University, 1991.
4. M. Bonacina and J. Hsiang. The Clause-Diffusion Methodology for Distributed Deduction. Fundamenta Informaticae, 24:177-207, 1995.
5. A. H. Bond and L. Gasser (eds). Readings in Distributed Artificial Intelligence. Morgan Kaufmann, 1988.
6. J. Denzinger. Knowledge-Based Distributed Search Using Teamwork. In Proceedings of the First International Conference on Multi-Agent Systems (ICMAS), San Francisco, USA, 1995.
7. R. Englemore and T. Morgan (eds). Blackboard Systems. Addison-Wesley, 1988.
8. M. Fisher. Concurrent METATEM - A Language for Modeling Reactive Systems. In Parallel Architectures and Languages, Europe (PARLE), Munich, Germany, June 1993. (Published in Lecture Notes in Computer Science, volume 694, Springer-Verlag.)
9. M. Fisher. A Survey of Concurrent METATEM - The Language and its Applications. In First International Conference on Temporal Logic (ICTL), Bonn, Germany, 1994. (Published in Lecture Notes in Computer Science, volume 827, Springer-Verlag.)
10. M. Fisher. An Alternative Approach to Concurrent Theorem-Proving. In J. Geller, H. Kitano, and C. Suttner (eds), Parallel Processing for Artificial Intelligence, 3. Elsevier B.V., 1997.
11. M. Genesereth and N. Nilsson. Logical Foundations of Artificial Intelligence. Morgan Kaufmann, 1987.
12. C. Green. Application of Theorem Proving to Problem Solving. In Proceedings of the International Joint Conference on AI, 1969. (Also in B. Webber and N. Nilsson (eds), Readings in Artificial Intelligence. Morgan Kaufmann, 1981.)
13. F. Kurfeß. Parallelism in Logic. Vieweg, 1991.
14. V. Lifschitz. On the Semantics of STRIPS. In Reasoning About Actions and Plans. Morgan Kaufmann, San Mateo, CA, 1986.
15. D. MacIntosh, S. Conry, and R. Meyer. Distributed Automated Reasoning: Issues in Coordination, Cooperation, and Performance. IEEE Transactions on Systems, Man and Cybernetics, 21(6):1307-1316, 1991.
16. T. Maruichi, M. Ichikawa, and M. Tokoro. Modelling Autonomous Agents and their Groups. In Y. Demazeau and J. P. Müller, editors, Decentralized AI 2 - Proceedings of the 2nd European Workshop on Modelling Autonomous Agents and Multi-Agent Worlds (MAAMAW'90). Elsevier/North-Holland, 1991.
17. D. A. Plaisted and S. A. Greenbaum. A Structure-Preserving Clause Form Translation. Journal of Symbolic Computation, 2(3):293-304, September 1986.
18. R. Smith and R. Davis. Frameworks for Cooperation in Distributed Problem Solving. IEEE Transactions on Systems, Man and Cybernetics, 11(1):61-70, 1981.
19. M. Wooldridge and N. Jennings. Intelligent Agents: Theory and Practice. The Knowledge Engineering Review, 10(2):115-152, 1995.
Commitments Among Autonomous Agents in Information-Rich Environments*

Munindar P. Singh**
Department of Computer Science
North Carolina State University
Raleigh, NC 27695-8206, USA
singh@ncsu.edu
Abstract. Commitments are crucial to understanding and designing autonomous agents and multiagent systems. We propose a definition of commitments that applies especially well to agents in information-rich applications, such as electronic commerce and virtual enterprises. Our approach has a number of important features, including

- not gratuitously translating social concepts to psychological concepts
- distinguishing between satisfied and inapplicable commitments
- incorporating social policies to handle the creation, satisfaction, and cancelation of commitments
- relating commitments to organizational structure in a multiagent system
- showing how commitments are acquired by agents as a consequence of adopting a role.
1 Introduction
Commitments are central to DAI. In this paper, "commitment" refers to social, not psychological, commitment. Commitments have drawn much research attention because they are an important abstraction for characterizing, understanding, analyzing, and designing multiagent systems. Commitments help coordinate and structure multiagent systems to achieve coherence in their actions. Multiagent systems are finding increasing application in heterogeneous and open information environments; such systems are called cooperative information systems (CISs) [Singh & Huhns, 1995]. CISs have increased expectations of robustness and guarantees of the atomicity, durability, and recoverability of actions. Our ongoing research program seeks to develop abstractions for building flexible CISs to the standards of robustness of traditional systems.

* This is an extended and revised version of a paper presented at the ICMAS-96 Workshop on Norms, Obligations, and Conventions. I would like to thank Rosaria Conte, Christian Lemaitre, and the anonymous reviewers for their comments. I have benefited from discussions with several people over the years, most notably Nicholas Asher, Cristiano Castelfranchi, Les Gasser, Michael Georgeff, and Michael Huhns.
** This work is supported by the NCSU College of Engineering, the National Science Foundation under grants IRI-9529179 and IRI-9624425, and IBM Corporation.
Technical Motivation. Commitments arise not only in the study of agents, but also in distributed databases. However, databases (DB) implement a procedurally hard-wired and irrevocable form of commitment. Modern DB applications, which involve heterogeneity, flexibility, and human collaboration, do not fit the traditional mold. Some of these applications have been addressed using agent-based techniques, e.g., [Wittig, 1992; Singh & Huhns, 1994]; others with advanced database techniques, e.g., [Bukhres & Elmagarmid, 1996]; and still others by bringing in organizational techniques, e.g., [Papazoglou et al., 1992]. The DB and DAI strands of research into commitments have progressed without much cross-fertilization. The DB ideas have tended to be rigid, but in a manner that facilitates robustness. The DAI ideas have been more flexible. However, with respect to information systems, they do not guarantee correctness properties comparable to the DB approaches. We submit that a conceptually well-founded synthesis can yield abstractions for effectively programming CISs. We view CISs as recursively composed loci of commitments. These commitments can be about actions, but in database settings they are typically about results that are released or "published" by different components. Whereas the traditional database approach is to release outputs only when they are definite, in the case of nonterminating computations we cannot afford to wait till they end! In general, we must allow outputs to be released prematurely. This is also essential, for example, in cases where the given activities must cooperate, so that they may exchange their partial results before they terminate. The construction of effective CISs involves the careful synthesis of three kinds of concerns:

- data integrity: correctness of data despite concurrent access and failures;
- control and data flow: how triggering (i.e., control) information and data flow through the system; and
- organizational structure: how the various components relate to each other
in achieving coherent behavior; e.g., whether a control signal is expected and would not be ignored depends on the organizational structure of the components.

Traditional nested transactions provide integrity, but restrict the other aspects. Extended Transaction Models (ETMs) also focus on integrity, but allow freer control and data flow at the cost of relaxing the integrity requirements. Database workflow approaches ignore the integrity aspects, but deliver the control and data flow required by specific applications. Workflows in groupware also provide application-specific control and data flow without regard to integrity. In contrast with the above, our approach focuses on how the different components achieve coherence in their interactions. Control and data flow serve to achieve coherence, and integrity is a consequence of it. By organizational structure, we mean not only the roles that different agents play, but also the commitments that they may enter into based on their roles. In our approach, each recursively composed CIS provides a context in which its constituent agents interact. In particular, the agents respect certain commitment policies and cancelation policies, which determine when they may adopt
or drop commitments. In some cases, these policies might help achieve a correct data state; in others, they may only guarantee that the CIS as a whole is behaving properly.
Organization. Section 2 describes traditional ways of structuring computations. Section 3 discusses our approach to commitment, shows how it handles social policies and the structure of multiagent systems, and discusses its formal aspects and implementation. Section 4 reviews the pertinent literature from three main areas.
2 Problem: Structuring Computations in Open Information Systems

We introduce our running example, which involves a simplified form of electronic commerce and virtual enterprises.
~"
valves v-id a z b
idia 21 43 43
f )dia 21 21 43
hoses h-id h12 h14
dia 21 43
J Fig. 1. Traditional (Closed) Transactions
Example 1. Suppose we need to purchase two interdependent parts: a valve and two hoses, with the requirement that their diameters match (otherwise, each is useless). Consider a composite activity that attempts to purchase a shipment of valves from Valvano & Co and matching hoses from Hoosier Inc., thus accessing the databases as shown in Figure 1 (please ignore GT, LT1, and LT2 for now). Let these subactivities be called val and hos, respectively. We imagine that Valvano and Hoosier form a virtual enterprise to provide a higher level of service to their common customers, but continue to run autonomous databases. The key requirement for the purchase is that either (a) both val and hos have an effect, or (b) neither does. ∎
Traditionally, it would be up to the application program to enforce this requirement. Although traditional database transactions have been used extensively in homogeneous settings, it is now well known that they are inappropriate for heterogeneous environments. We show why next. To avoid terminological confusion, we use the term "succeed" instead of the database term "commit" where only the success of an individual transaction is implied.
2.1 Traditional Database Transactions
Traditional transactions are computations that satisfy a number of useful properties, in particular the so-called ACID properties [Gray & Reuter, 1993]:

- atomicity: all or none of a transaction happens;
- consistency: a transaction preserves the consistency of the database;
- isolation: intermediate results of a transaction are not visible externally;
- durability: when a transaction concludes successfully, its effects are permanent.

If the individual transactions are programmed correctly, the system guarantees consistency for any arbitrary concurrent mix of transactions. Atomicity is essential to ensure that the integrity of distributed data is preserved. Consequently, the actions or subtransactions that constitute a transaction must either (a) all happen, thereby transforming the database from a consistent state to a new consistent state, or (b) each fail to happen, thereby leaving the database in its original (consistent) state.
Example 2. Continuing with Example 1, we can obtain database support for maintaining consistency as shown in Figure 1. GT is a global, closed-nested transaction corresponding to the purchase activity. It consists of local subtransactions, LT1 and LT2, corresponding to val and hos. GT preserves consistency (either both LT1 and LT2 succeed or neither does), and allows only correct purchases to be visible. ∎

Unfortunately, the above formulation proves highly undesirable. To ensure transaction atomicity, the system must ensure that both val and hos succeed, or neither does. To ensure transaction isolation, the system must ensure that no other transaction sees the intermediate results of val or hos. Further, if a transaction that runs on the same databases sees the final results of one subtransaction (e.g., val), then it also sees the final results of the other subtransaction (e.g., hos). The above requirements are stronger than our informal requirement that both or neither subtransaction should have an effect. To realize the above transactional properties requires a mutual commit protocol, e.g., two-phase commit, to be executed. However, that might be impossible, since Valvano and Hoosier are independent enterprises and their databases may not even have visible precommit states, essential to execute a mutual commit protocol. Even if the precommit states are visible, it is seldom acceptable to lock resources while communicating with a remote site. Thus traditional transactions are unacceptable in heterogeneous environments.
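For reference, the mutual commit protocol alluded to above can be sketched as a minimal two-phase commit; this is an idealized illustration in Python (ours, not a database implementation), and it presupposes exactly the visible precommit states that autonomous databases may lack.

```python
class Participant:
    """An idealized resource manager with a visible precommit state."""
    def __init__(self, name, can_succeed):
        self.name, self.can_succeed = name, can_succeed

    def prepare(self):          # phase 1: enter precommit and vote
        return self.can_succeed

    def commit(self):
        print(f"{self.name}: committed")

    def abort(self):
        print(f"{self.name}: rolled back")

def two_phase_commit(participants):
    """Commit iff every participant votes yes in the prepare phase."""
    decision = all(p.prepare() for p in participants)
    for p in participants:
        (p.commit if decision else p.abort)()
    return decision

two_phase_commit([Participant("val", True), Participant("hos", False)])
```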
2.2 Extended Database Transactions
Extended transaction models (ETMs) take some steps toward overcoming these limitations. However, they typically address only a part of the problem, chiefly by allowing results to be released prematurely. Failure recovery is typically achieved by compensating the subtransactions that erroneously recorded success (even though other related transactions did not); the compensations are, of course, domain-specific. Consider the following example, which uses a simplified version of the DOM ETM [Buchmann et al., 1992].
[Fig. 2. Extended (Open) Transactions]
Example 3. Continuing with Example 2, we now define a purchase activity as in Figure 2. Here, GT is an open-nested global transaction consisting of the val (LT1) and hos (LT2) subtransactions. GT executes LT1 and LT2 concurrently. The results of LT1 and LT2 are visible even before GT has completed. If both or neither succeed, consistency is preserved. If one succeeds and one fails, then either (a) the one that succeeded can be compensated through LT1⁻¹ or LT2⁻¹, e.g., by canceling its order, or (b) the one that failed can be retried. ∎

This assumes that (a) compensating actions are defined for some of the subtransactions, and (b) it is acceptable to allow temporary inconsistencies. Extended transaction models do not provide perspicuous means to specify and schedule activities, nor means to coordinate them. Scheduling techniques are hard-coded separately for each transaction model.
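The compensation pattern of Example 3 amounts to a saga-style scheduler; a minimal sketch (our naming), where each step pairs an action with its domain-specific compensation:

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs in order; if any action fails,
    compensate the already-succeeded steps in reverse order, e.g.,
    cancel the order recorded by LT1 when LT2 fails."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
        return True
    except Exception:
        for compensate in reversed(done):
            compensate()
        return False
```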
2.3 Agents
Agents can perform several functions in enterprise integration scenarios. They can capture the semantic constraints and apply them in order to execute or
enact workflows in an integrity-preserving manner. In this way, agents can carry out the business processes in an enterprise. For example, although database consistency is assured even if both transactions fail, the agent might encode that some progress is essential from the purchaser's standpoint.
Example 4. In the scenario of Examples 1 and 2, a purchasing agent can be used. This agent initiates val and hos concurrently. If both succeed, the purchase succeeds. However, if one or both fail, the agent can (a) retry the failed transactions a certain number of times, (b) search for alternative sources and attempt the transactions there, or (c) negotiate with the user's agent and with database agents to enable progress. ∎

The agents can thus realize workflows that correspond to generalized forms of extended transaction models. More importantly, however, the agents can form a CIS and interact with each other in an effective manner. For example, agents can coordinate workflows so that the unavoidable interactions among those workflows do not violate data integrity or prevent progress. Further, the requirements for each workflow can be locally captured by the resource administrators most familiar with the resources that the workflow involves. The formal specifications are kept modular and small, which facilitates their acquisition and verification.
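Options (a) and (b) of Example 4 can be read as a simple agent-side policy; a sketch with hypothetical attempt and supplier parameters:

```python
def purchase(attempt, suppliers, retries=3):
    """Try the transaction at each alternative supplier in turn,
    retrying a bounded number of times; negotiation (option (c))
    is omitted. 'attempt' and 'suppliers' are hypothetical."""
    for supplier in suppliers:
        for _ in range(retries):
            if attempt(supplier):
                return supplier     # progress achieved
    return None                     # escalate, e.g., negotiate
```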
Example 5. Consider ongoing activities to repeatedly stock inventory, ship goods to customers, receive purchase orders, and forecast the market demand. These activities must be coordinated. (a) Stocking up replenishes the inventory for shipping. The stocking-up and shipping agents must agree whether to build up large inventories or break up large purchase orders. (b) A purchase order must be received before it is shipped. (c) Market forecasting can either trigger stocking up, or disable it. ∎

But how can we ensure that the agents behave properly? Surely, we need better abstractions than having the application programmer supply hardcoded solutions.
2.4 The Problem
Thus the main problem is to structure activities in a manner that can respect the autonomy of the information resources. The database approaches are restrictive. The agent approaches are flexible, but there is a need for tools and formal approaches for designing them. In particular, there is a need for a notion of commitment that flexibly reflects the organizational structure of how agents interact.
3 Solution: Spheres of Commitment
We define commitments in a manner that satisfies the above requirements. We dub our approach spheres of commitment (SoCom). SoComs involve not only the data integrity issues, but also reflect the organizational structure associated
with CISs, which constrains the control and data flow as well. Each SoCom is autonomous, and has authority over some information resources, on the basis of which it can enter into commitments about those resources.
3.1 Spheres of Control
To best appreciate our approach, it is instructive to see how spheres of control (SoCs) work. SoCs, which were proposed about two decades ago [Davies, 1978], capture some of the same intuitions as the extended transaction models. The database community is seeing a resurgence of interest in SoCs as the limitations of traditional transactions are being realized [Gray & Reuter, 1993, pp. 174-180]. Intuitively, SoCs attempt to contain the effects of an action as long as there might be a necessity to undo them. Ordinarily, a result is released only when it is established that it is correct (and will remain correct). However, if a result may later have to be undone, it can be released only if an SoC can be set up that encloses the activities that consume the result. When the result needs to be undone, the enclosing SoC can undo the activities that used that result.
Example 6. Continuing with Example 2, we can define an SoC that contains the val and hos subtransactions. The results of these subtransactions can be made visible only to those activities that are also within the given SoC. If the results of val and hos are inappropriately released, they can be undone, possibly by also undoing the activities that consumed those results. ∎

SoCs generalize transactions by essentially requiring the entire execution history to be maintained. SoCs require rolling back the execution to undo the effects of erroneously committed activities, followed by rolling forward the execution to redo the necessary computations. Unfortunately, despite their generality in some respects, in a fundamental sense SoCs remain almost as restrictive as traditional transactions. This is because SoCs are also data-centric, and attempt to preserve or restore data integrity. Specifically, we believe that the problem lies in two facts:

- SoCs are not active entities, and
- SoCs view commitments in the traditional DB sense, i.e., as depending solely on the computation that commits, not on the interplay between the computation that commits and the computations that take advantage of that commitment.
3.2 Commitments
Despite its shortcomings, we find the SoC concept useful in motivating SoComs. SoComs provide a means for organizing agents and CISs. We begin by eliminating the distinction between agents and multiagent systems. We view agents as being either individuals or groups, which are recursively composed of agents. In this sense, a CIS is an agent and is potentially composed of agents. We augment
our initial definition of agents to additionally require them to be loci of social commitments. Thus, each agent or CIS can be a SoCom. Agents interact by forming commitments toward one another. We use the term commiter to refer to the agent that makes a commitment, and the term commitee (not "committee") to refer to the agent who receives the commitment. Commitments are formed in a context, which is given by the enclosing CIS (or, ultimately, by society at large). We refer to this as the context group. Concomitant with a commitment is a specification of how it may be satisfactorily discharged, and a specification of how it may be canceled. We define three main kinds of social actions, which are instantiated by the following operations on commitments:

- create
- (satisfactorily) discharge
- cancel
Based on the above intuitions, we motivate the following logical form for commitments.

Definition 1. A commitment is an expression of the form C(x, y, p, G, d), where x is the commiter, y the commitee, G the context group, p the discharge condition, and d the cancelation condition (formally, a proposition).

It is convenient to define the operations notify and release as follows: notify(x, y, q) means that x notifies y of q, and release(y, c) means that y "releases" the commiter of commitment c, essentially agreeing to its success. In a sense, these are low-level operations, which can be used to implement the above social actions. They are, however, quite natural and common to a number of domains. Where necessary, we include the release requirements in the discharge condition. For example, it is possible to commit to "making the sky green," or to "making the sky appear green to the commitee" (these are different commitments, with different chances of satisfiability). We now discuss some possible cancelation conditions, which relate to different situations. Let the given commitment be c = C(x, y, p, G, d). (Explicitly naming the commitment itself enables setting up mutual commitments.)

P1. d = false: the commitment is irrevocable.
P2. d = notify(x, y, q): the commiter is only obliged to notify the commitee, where q means that the commitment c is being canceled.
P3. d = true: the commitment can be given up at will, and is effectively not a commitment at all.
P4. d = release(y, c): the commitee must explicitly release the commiter.
P5. d = release(G, c): the context group must explicitly release the commiter.
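Definition 1 translates directly into a record structure. The sketch below is our rendering in Python: discharge and cancelation conditions become predicates over the commitment, and P1-P5 become ready-made cancelation policies (P2's notify is assumed to be handled by a messaging layer).

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Commitment:
    """C(x, y, p, G, d) from Definition 1 (our rendering)."""
    commiter: str                    # x
    commitee: str                    # y
    discharge: Callable              # p: when true, satisfactorily discharged
    context: str                     # G: the context group
    cancel: Callable                 # d: when true, may be dropped
    released_by: List[str] = field(default_factory=list)

# Cancelation conditions P1-P5 as predicates over a commitment c:
P1 = lambda c: False                          # irrevocable
P3 = lambda c: True                           # effectively no commitment
P4 = lambda c: c.commitee in c.released_by    # commitee must release
P5 = lambda c: c.context in c.released_by     # context group must release
```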
Example 7. Consider the situation of Example 2 after val has successfully completed its internal processing, but has not yet officially published its results. This can be modeled as c1 = C(val, hos, succeed(val), G, cannot_succeed(hos)). Here
G corresponds to the global transaction. The above commitment means that if val can succeed, it will, unless hos cannot succeed. Additional commitments are needed to capture the entire specification, e.g., to ensure that val does not succeed unless hos succeeds. ∎
Fig. 3. Nested Spheres of Commitment
Example 8. Continuing with Example 2, we define two SoComs, shown in Figure 3, with authority over the Valvano and Hoosier databases, respectively. These SoComs execute the corresponding subtransactions. There is also a SoCom corresponding to the Valvano-cum-Hoosier virtual enterprise (VE). As in Example 4, a customer agent carries out the desired workflow. This agent might itself be a SoCom with authority over purchases in its enterprise. A possible set of commitments could be as follows.

- The Valvano and Hoosier SoComs inform other agents as to how many units of a valve or hose they have in stock. If stock is available, they will "lay-away" up to a certain number of units for a (potential) customer; if stock is not available, they will notify the customer.
- However, if the stock falls low, the SoComs can ask a customer to decide or pay a nonrefundable deposit.
- The customer commits to releasing a lay-away if he decides against the purchase.
- The customer can request to apply the deposit for another purchase, at the selling SoCom's discretion.
- The customer can request a refund from the VE SoCom. The entire deposit or purchase price is refunded if a matching item (hose or valve) was not available.
In this setup, val or hos in general cannot be undone: customers cannot expect full refunds after the purchase. However, if val and hos are being performed as a package, i.e., in the Valvano-cum-Hoosier VE, the VE SoCom ensures that customers will get refunds if one of the subtransactions fails. Other customers who were told that stock was not available will be notified, and given an opportunity to retry their purchase. Lastly, negotiation and exceptions are allowed, although after commitments have been made, the decision might reside with one of the participants. ∎
3.3 Social Policies
Social policies are policies that govern the social actions; they characterize when the associated action occurs. It is helpful to define the order of a commitment as follows.

Definition 2. Consider a commitment c = C(x, y, p, G, d). c is 0-order iff p makes no reference to any commitments. c is (i + 1)-order iff the highest-order commitment referred to in p is i-order.

Social policies are formally represented as conditional expressions. Policies for i-order commitments are (i + 1)-order commitments. Even for policies, our fundamental correctness condition remains: if a commitment is created, then it must be satisfactorily discharged, unless it is canceled in time. A variety of policies can be defined, with applicability in different settings. Policies also have a computational significance, which is that they can lead to commitments being created, discharged, or canceled without reference to the context group. It is the locality of policies that makes them useful in practice. Consider a simple example.

Example 9. Continuing with Example 8, consider a customer who makes an E-cash deposit for some valves, but later decides not to get them. He might avoid losing his deposit by finding another purchaser for those valves. The selling SoCom would have to accept an alternative purchaser if the applicable social policies allow that, unless they were explicitly overridden by the contract. There is no need to invoke the context group, i.e., to complain to the VE SoCom, or to file a lawsuit. ∎

In the above, the actions are performed by the constituent CISs. Sometimes, however, it is useful to perform actions at a higher-level CIS. Such actions might be necessary when the actions of the member agents need to be atomically performed or undone.
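The order of a commitment (Definition 2 above) can be computed structurally, provided each commitment records which commitments its discharge condition mentions; a sketch of ours, with refs as that assumed bookkeeping attribute:

```python
def order(c):
    """Order of a commitment per Definition 2: 0 if the discharge
    condition mentions no commitments, else one more than the highest
    order among the commitments it refers to."""
    refs = getattr(c, "refs", [])
    return 0 if not refs else 1 + max(order(r) for r in refs)
```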
Example 10. Continuing with Example 8, suppose an order for matching valves and hoses is successfully placed. It turns out later that the valve manufacturer has discontinued the model that was ordered, but recommends a substitute. The substitute valve takes a different hose diameter than the original choice. Suppose the VE SoCom knows the relevant constraint, and is authorized to update the order. It would be better to undo and rerun both val and hos before notifying the customer, than to notify the customer about each subtransaction individually. This strategy assumes that the VE SoCom is responsible for performing actions to correct orders. ∎
3.4 Social versus Psychological Primitives
Some previous approaches, e.g., [Levesque et al., 1990; Grosz & Sidner, 1990], attempt to reduce social constructs to psychological constructs. They do not have an explicit construct for commitments, but postulate mutual beliefs among the committing agents. However, mutual beliefs require the agents to hold beliefs about each other to unbounded levels of nesting, which can be tricky [Singh, 1996a]. Also, mutual beliefs cannot be implemented except through additional simplifying assumptions, which is why the direct approach of using social constructs is more appealing. In fact, it is known that in settings with asynchronous, unreliable, or unboundedly delayable communication, mutual beliefs can be obtained only if they are there from the start; i.e., the mutual beliefs are the invariants of the system [Chandy & Misra, 1986]. We conjecture that named groups and named commitments, which are reminiscent of contract numbers in business dealings, provide the necessary connections among the agents. This is a reasonable conjecture, because commitments and the groups they exist in can provide the requisite context that is copresent with all of the agents. Membership in a group can require mutual commitments, which can refer to each other (by using each other's names). Thus, the effect that traditional theories attempt to achieve by using mutual beliefs can be achieved without mutual beliefs, and without reducing social primitives to psychological primitives. We believe that with further technical development, this will prove to be an important point in favor of social commitments.
3.5 Implementation
The above view of commitments can thus lead to CISs that behave flexibly. In order to make the construction of such CISs equally flexible, we are developing a generic facility for commitment specification and management. This facility would allow the specification of CISs along with the social policies that apply within them. We provide a generic set of Java classes through which abstract CISs can be specified. These specifications include the different roles in a given CIS, and the capabilities and resources required to instantiate each role. These specifications also include the social policies, expressed in terms of roles, that apply within the abstract CIS. Essentially, these are the commitments that the
role comes with. For example, the seller role presupposes that the seller will respond to requests for price quotes, and honor its quotes. The abstract CISs are instantiated with concrete agents filling each role. The concrete agents may be individuals or groups. Recalling [Gasser, 1991], a concrete agent may fill more than one role in an abstract CIS, and participate in more than one abstract CIS concurrently. The act of joining a CIS corresponds to creating commitments. The commitments associated with a role are schematic. Upon instantiation of the roles, these are instantiated into commitments by and toward concrete agents. Agents can thus autonomously enter into SoComs. Agents must make sure they have the capabilities and resources required to take on any additional role, and its concomitant commitments. Some of the inherited commitments might require overriding some prior commitments. For example, the Valvano agent must relax its refund policy when joining the above-mentioned VE. Once the concrete CISs have been instantiated, any of the member agents can initiate an activity, which can trigger additional activities. The facility provides primitives through which agents can instantiate a CIS, create commitments within the context of a CIS, and satisfy or cancel commitments. The facility takes care of the bookkeeping required for these operations, and ensures that the correctness condition is met. The underlying means of execution is based on a temporal-logic approach, which extends the results of [Singh, 1996b], to provide primitives for coordinating heterogeneous activities.
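The paper's facility is built from Java classes; purely to picture the role-instantiation step, here is a compact Python sketch with invented names.

```python
class Role:
    """A role in an abstract CIS: a name plus schematic commitments,
    expressed over role names rather than concrete agents."""
    def __init__(self, name, schemas):
        self.name, self.schemas = name, schemas

def join(agent, role, bindings):
    """Joining a CIS instantiates the role's schematic commitments into
    commitments by the joining agent; the agent is assumed to have
    checked that it has the required capabilities and resources."""
    return [dict(schema, commiter=agent, **bindings)
            for schema in role.schemas]

# The seller role presupposes responding to, and honoring, price quotes.
seller = Role("seller", [{"discharge": "respond to price-quote requests"},
                         {"discharge": "honor issued price quotes"}])
print(join("Valvano", seller, {"commitee": "customer", "context": "VE-SoCom"}))
```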
4 Comparisons with the Literature
DAI Approaches. Gasser describes some of the sociological issues underlying multiagent systems [Gasser, 1991]. His notion of the multiple simultaneous roles played by social agents inspired part of our discussion above. Castelfranchi studies concepts similar to those here [Castelfranchi, 1993]. Our context groups generalize his notion of a witness. Castelfranchi distinguishes a notion of collective commitment, which is subsumed by our concept of commitment (through the orthogonal representation of the structure of multiagent systems). Tuomela develops an interesting theory of joint action and intention that bears similarities to collective commitments [Tuomela, 1991]. [Sichman et al., 1994] develop a theory and interpreter for agents who can perform social reasoning. Their agents represent knowledge about one another to determine their relative autonomy or dependence for various goals. Dependence leads to joint plans for achieving the intended goals. This theory does not talk about commitments per se, so it is complementary to our approach. We also believe that our approach, with its emphasis on structure and context, can be married with that of [Sichman et al., 1994] to lead to more sophisticated forms of social reasoning. The approach of [Levesque et al., 1990] requires the agents to have a mutual belief about their goals. Further, it hardwires a specific approach to canceling commitments (for joint intentions): the participating agents must achieve a mutual belief that the given commitment has been canceled. The approach of
[Jennings, 1993] is closer in spirit to the present approach. Jennings postulates conventions as ways in which to reason about commitments. Thus, he can generalize on [Levesque et al., 1990]. However, for teams, he requires a "minimum" convention, which recalls the approach of [Levesque et al., 1990]. Jennings also requires a mental state as concomitant with a joint commitment. While we share many of the intuitions and motivations of [Jennings, 1993] (including applications involving heterogeneous information systems), we attain greater generality through the explicit use of the structure of multiagent systems. The agents always enter into commitments in the context of their multiagent system, and sometimes to that system. This has the pleasant effect that social concepts are not made dependent on psychological concepts. The multiagent system serves as the default repository for the cancelation and commitment policies, although these can, in several useful cases, be assigned to the member agents. We believe the relationship of our approach to open-nested transaction models and workflows will lead to superior multiagent systems for information applications. Distributed assumption-based [Mason & Johnson, 1989] or justification-based [Huhns & Bridgeland, 1991] truth maintenance systems (DTMSs) are also germane. These systems help a group of agents revise their beliefs as a consequence of messages received. On the one hand, DTMSs can be given a knowledge-level characterization in terms of commitments; on the other hand, they can be used to implement some of the reasoning required in maintaining commitments.
DB and Groupware Approaches. A number of extended transaction models have been proposed, e.g., [Bukhres & Elmagarmid, 1996]. The extended transaction models allow partial results to be released, and then attempt to restore consistency through actions to compensate for the effects of erroneously completed actions. Some workflow scheduling approaches exist that provide functionality to capture control flow among tasks. The database approaches don't provide much support for the organizational aspects. For example, they ignore social commitments altogether. Some of the groupware approaches, which study organizational structure, do not consider quite as rich a form of commitments as here. For example, information control nets are primarily geared toward control and data flow aspects [Nutt, 1993]. The notion of commitments finds applicability in some groupware tools. For example, [Medina-Mora & Cartron, 1996] shows how the flow of work in an organization is expressed through commitments in the ActionWorkflow tool. This tool comes with a fixed set of specifications from which the developer can choose. Although the participants can decide whether a given task was successfully performed, there is no notion of failure recovery, of commitments being canceled, or of commitment and cancelation policies. Still, we believe, this is an interesting system that shows how much can be achieved through the careful use of commitments.
5 Conclusions and Future Work
We sought to present the unifying principles behind commitment for single-agent and multiagent systems. Our approach marries insights from DB and DAI to yield a framework for flexible, yet robust, cooperative information systems. Our approach makes the following contributions. It

- does not require translating commitments to psychological concepts, such as beliefs;
- distinguishes between satisfied and inapplicable commitments;
- incorporates policies to handle the creation, satisfaction, and cancelation of commitments;
- relates commitments to organizational structure in a multiagent system;
- shows how commitments are acquired by agents as a consequence of membership in a group.

A practical challenge is determining classes of commitments and policies that are more relaxed than the traditional approaches, yet can be efficiently implemented. Two other technical challenges are introducing temporal aspects into the language, and relating the development of commitments to decision-theoretic analyses of rational behavior.
References

[Buchmann et al., 1992] Buchmann, Alejandro; Özsu, M. Tamer; Hornick, Mark; Georgakopoulos, Dimitrios; and Manola, Frank A.; 1992. A transaction model for active distributed object systems. In [Elmagarmid, 1992], chapter 5, 123-158.
[Bukhres & Elmagarmid, 1996] Bukhres, Omran A. and Elmagarmid, Ahmed K., editors; 1996. Object-Oriented Multidatabase Systems: A Solution for Advanced Applications. Prentice Hall.
[Castelfranchi, 1993] Castelfranchi, Cristiano; 1993. Commitments: From individual intentions to groups and organizations. In Proceedings of the AAAI-93 Workshop on AI and Theories of Groups and Organizations: Conceptual and Empirical Research.
[Chandy & Misra, 1986] Chandy, K. Mani and Misra, Jayadev; 1986. How processes learn. Distributed Computing 1:40-52.
[Davies, 1978] Davies, Charles T. Jr.; 1978. Data processing spheres of control. IBM Systems Journal 17(2):179-198.
[Elmagarmid, 1992] Elmagarmid, Ahmed K., editor; 1992. Database Transaction Models for Advanced Applications. Morgan Kaufmann.
[Gasser, 1991] Gasser, Les; 1991. Social conceptions of knowledge and action: DAI foundations and open systems semantics. Artificial Intelligence 47:107-138.
[Gray & Reuter, 1993] Gray, Jim and Reuter, Andreas; 1993. Transaction Processing: Concepts and Techniques. Morgan Kaufmann.
[Grosz & Sidner, 1990] Grosz, Barbara and Sidner, Candace; 1990. Plans for discourse. In Cohen, P.; Morgan, J.; and Pollack, M., editors, SDF Benchmark Series: Intentions in Communication. MIT Press, Cambridge, MA.
[Huhns & Bridgeland, 1991] Huhns, Michael N. and Bridgeland, David M.; 1991. Multiagent truth maintenance. IEEE Transactions on Systems, Man, and Cybernetics 21(6):1437-1445.
[Jennings, 1993] Jennings, N. R.; 1993. Commitments and conventions: The foundation of coordination in multi-agent systems. The Knowledge Engineering Review 8(3):223-250.
[Levesque et al., 1990] Levesque, H. J.; Cohen, P. R.; and Nunes, J. T.; 1990. On acting together. In Proceedings of the National Conference on Artificial Intelligence.
[Mason & Johnson, 1989] Mason, Cindy L. and Johnson, Rowland R.; 1989. DATMS: A Framework for Distributed Assumption-Based Reasoning. In Gasser, L. and Huhns, M. N., editors, Distributed Artificial Intelligence, Volume II. Pitman/Morgan Kaufmann, London. 293-318.
[Medina-Mora & Cartron, 1996] Medina-Mora, Raúl and Cartron, Kelly W.; 1996. ActionWorkflow in use: Clark County Department of Business License. In Proceedings of the 12th International Conference on Data Engineering (ICDE). 288-294.
[Nutt, 1993] Nutt, Gary J.; 1993. Using workflow in contemporary IS applications. Technical Report CU-CS-663-93, University of Colorado.
[Papazoglou et al., 1992] Papazoglou, Mike P.; Laufmann, Steven C.; and Sellis, Timothy K.; 1992. An organizational framework for cooperating intelligent information systems. International Journal on Intelligent and Cooperative Information Systems 1(1):169-202.
[Sichman et al., 1994] Sichman, Jaime Simão; Conte, Rosaria; Demazeau, Yves; and Castelfranchi, Cristiano; 1994. A social reasoning mechanism based on dependence networks. In Proceedings of the 11th European Conference on Artificial Intelligence.
[Singh & Huhns, 1994] Singh, Munindar P. and Huhns, Michael N.; 1994. Automating workflows for service provisioning: Integrating AI and database technologies. IEEE Expert 9(5). Special issue on The Best of CAIA '94, with selected papers from Proceedings of the 10th IEEE Conference on Artificial Intelligence for Applications, March 1994.
[Singh & Huhns, 1995] Singh, Munindar P. and Huhns, Michael N.; 1995. Cooperative information systems. Tutorial notes from conferences including the International Joint Conference on Artificial Intelligence (IJCAI), Montreal, 1995; the IEEE International Conference on Data Engineering (ICDE), New Orleans, 1996; and the European Conference on Artificial Intelligence (ECAI), Budapest, 1996.
[Singh, 1996a] Singh, Munindar P.; 1996a. A conceptual analysis of commitments in multiagent systems. Technical Report TR-96-09, Department of Computer Science, North Carolina State University, Raleigh, NC. http://www.csc.ncsu.edu/faculty/mpsingh/papers/mas/commit.ps.
[Singh, 1996b] Singh, Munindar P.; 1996b. Synthesizing distributed constrained events from transactional workflow specifications. In Proceedings of the 12th International Conference on Data Engineering (ICDE).
[Tuomela, 1991] Tuomela, Raimo; 1991. We will do it: An analysis of group-intentions. Philosophy and Phenomenological Research LI(2):249-277.
[Wittig, 1992] Wittig, Thies, editor; 1992. ARCHON: An Architecture for Multi-agent Systems. Ellis Horwood Limited, West Sussex, UK.
Making a Case for Multi-Agent Systems

Fredrik Ygge
EnerSearch AB and University of Karlskrona/Ronneby
Department of Computer Science (IDE)
372 25 Ronneby, Sweden
Fredrik.Ygge@ide.hk-r.se, http://www.rby.hk-r.se/~fredriky

Hans Akkermans
University of Twente
Department of Information Systems (INF/IS)
P.O. Box 217, NL-7500 AE Enschede, The Netherlands
akkermans@ecn.nl, J.M.Akkermans@cs.utwente.nl
Abstract. Multi-Agent Systems (MAS) promise to offer solutions to problems where established, older paradigms fall short. To be able to keep such promises, however, in-depth studies of the advantages and weaknesses of MAS solutions versus conventional ones in practical applications are needed. In this paper we offer one such study. Climate control in large buildings is one application area where MAS, and market-oriented programming in particular, have been reported to be very successful. We have therefore constructed and implemented a variety of market designs for this problem, as well as different standard control engineering solutions. This paper gives a detailed analysis and comparison, so as to learn about the differences between standard and MAS approaches, and to gain new insights about the benefits and limitations of computational markets.
1 Introduction
When new paradigms arise on the scientific horizon, they must prove their value in comparison and competition with existing, more established ones. The multi-agent system (MAS) paradigm is no exception. In a recent book on software agents, edited by Jeffrey Bradshaw [Bradshaw, 1997], Norman observes that perhaps "the most relevant predecessors to today's intelligent agents are servomechanisms and other control devices". And indeed, a number of applications for which MAS have recently claimed success are close to the realm of what is traditionally called control engineering. One clear example is the climate control of large buildings with many office rooms. Here, Huberman et al.
[Huberman and Clearwater, 1995] have constructed a working MAS solution based on a market approach, which has been reported to outperform existing conventional control. A key question is: in what respect and to what extent are multi-agent solutions better than their alternatives? We believe that the above-mentioned application provides a nice opportunity to study this question in more detail. It is practically very relevant, it lends itself to alternative solutions, and it is quite prototypical for a wide range of industrial applications in resource allocation (including the energy management applications we are working on [Ygge and Akkermans, 1996], the file allocation problem of Kurose and Simha [Kurose and Simha, 1989], and the flow problems investigated by Wellman [Wellman, 1993]). The contribution of this paper is a detailed analysis of a published MAS approach to building climate control and a comparison between this approach and traditional approaches. We also introduce a novel approach to this problem based on a general equilibrium market. From the analysis of these approaches we draw conclusions not only about the suitability of these particular approaches, but about many other MAS approaches as well. The structure of the paper is as follows. Section 2 introduces the application: it describes the office environment and gives the physical model for cooling power and the various temperature and outside weather influences. First, we give a standard control engineering solution, based on local and independent integral controllers (Section 3). Next, we consider the market-based approach as put forward by Huberman et al. [Huberman and Clearwater, 1995] (Section 4), and we discuss a number of variations on this market design (Section 5), providing a kind of factor analysis for its success. Section 6 then shows the results of an improved control engineering scheme that exploits global data. Finally, we propose a market design of our own, based on local data only, and make a comparison across the different approaches (Section 7). Section 8 puts our results into perspective and summarizes the conclusions.
2 The Office Environment
In this section, we present the mathematical-physical model of the office environment. For those readers who find the mathematics forbidding, we first give a conceptual summary which makes it possible to skip the equations. The offices are attached to a pipe in which the resource (cold air) is transported as in Figure 1. The characteristics of this system are similar to the characteristics of a district heating system, but with offices instead of households. We assume that there are 100 offices in total, and that they are equally distributed towards East, South, West and North. The thermodynamics of the office environment is actually quite simple. Every office is seen as a storage place for heat, but heat may dissipate to its environment. In the model, the thermodynamic behaviour of an office is equivalent to a simple electrical RC-circuit. Here, voltage is analogous to temperature, and electrical current is analogous to heat flow. C and R then respectively denote heat capacitance and heat losses to the environment. A good general reference on thermodynamic models as we use here is [Incropera and De Witt, 1990]. The heat equations are continuous in time, but are discretized according to standard procedures used in control engineering, described e.g. in [Ogata, 1990]. The AI aspects of thermodynamic model building with component libraries are discussed extensively in [Borst et al., 1997].
Figure 1: Offices and air transportation.
2.1 Basic Physical Properties
The resource treated is cooling power. Each office can only make use of a fraction, $\eta$, of the available resource at that office, $P_{aio}$, i.e.

$$P_{cio} \leq \eta \cdot P_{aio}, \quad (1)$$
where $P_{cio}$ is the consumed power. The available resource at one office is equal to the available resource at the previous office minus the consumed resource at the previous office. Throughout the paper we use an $\eta$ of 0.5. The index i denotes the time interval under observation and the index o denotes the office under observation. This naming convention is used throughout the entire paper. In addition, the index k is also sometimes used for time intervals. We treat everything in discrete time with a time interval of one minute. For all offices
the temperature, $T_{io}$, is described by

$$T_{io} = T_{0,o} + \sum_{k=1}^{i} (P_{hko} - P_{cko})/C_o, \quad (2)$$

where $P_{hko}$ is the heating power and $C_o$ is the thermal capacitance. The heating power is described by

$$P_{hio} = (T_{vio} - T_{io})/R_o, \quad (3)$$

where $R_o$ is the thermal resistance and $T_{vio}$ is a virtual outdoor temperature detailed below. From Eq. (2) and Eq. (3) we obtain

$$T_{io} = \frac{1}{1 + \frac{1}{R_o C_o}} \left( T_{i-1,o} + \frac{T_{vio}}{R_o C_o} - \frac{P_{cio}}{C_o} \right), \quad i > 0. \quad (4)$$

2.2 Weather Model
All external weather influences on the office environment are modelled by a virtual temperature, representing outdoor temperature, sun radiation etc. We assume that there is sunshine every day and that the outdoor temperature, $T^o_i$, varies from 22 to 35°C according to

$$T^o_i = 22 + 13 \cdot e^{-((i \cdot s - a) \bmod 24 - 12)^2/20}, \quad (5)$$

where s is the length of each time interval expressed in hours, i.e. here s = 1/60, and a is a phase offset (the peak of $T^o$ lies at a + 12 hours). The virtual temperature, $T_{vio}$, is described by

$$T_{vio} = T^o_i + T^r_{io} + T_{dio}, \quad (6)$$

where $T_d$ is a random disturbance, thought to represent small fluctuations caused by e.g. the wind. $T_d$ is Gaussian distributed with an average value of 0 and a standard deviation of 1. $T^r$ is a sun radiation component. For the offices located at the East side, $T^r$ is described by

$$T^{r,East}_i = 8 \cdot e^{-((i \cdot s + 4) \bmod 24 - 12)^2/5}, \quad (7)$$

and correspondingly for the South and the West offices

$$T^{r,South}_i = 15 \cdot e^{-((i \cdot s) \bmod 24 - 12)^2/5} \quad (8)$$

and

$$T^{r,West}_i = 8 \cdot e^{-((i \cdot s - 4) \bmod 24 - 12)^2/5}. \quad (9)$$

The various temperatures are plotted in Figure 2.
Figure 2: The leftmost plot shows the outdoor temperature, $T^o$. The middle plot shows the sun radiation components, $T^r$ (with peaks at time 8, 12, and 16 hours for offices located at the East, South, and West side respectively). Finally, the outdoor temperature plus the sun radiation components are plotted to the right.
2.3 Office Temperatures Without Control
In Figure 3, the temperature for a South oriented office is plotted with different thermal resistances, $R_o$, and thermal capacitances, $C_o$. For simplicity we assume all $R_o$ to be equal and all $C_o$ to be equal. From this figure we note two things: first, the higher $R_o C_o$, the bigger the lag in temperature changes, and second, the higher $R_o C_o$, the smaller the fluctuations in temperature. For the experiments in this paper we chose $R_o = 10$ and $C_o = 10$. An $R_o$ or a $C_o$ equal to zero implies that $T_{io} = T_{vio}$, as can be seen by letting $R_o$ or $C_o$ approach zero in Eq. (4).
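For concreteness, the office model of Eqs. (4)-(9) can be coded up directly. The following Python sketch (ours, in the spirit of the independent Python re-implementation mentioned in Section 2.4; the function names and the phase offset a = 3 are assumptions of this illustration, not taken from the original implementation) simulates one uncontrolled South office over a day:

```python
import math
import random

R, C = 10.0, 10.0      # thermal resistance and capacitance (Section 2.3)
s = 1.0 / 60.0         # interval length in hours: one-minute time steps

def t_out(i, a=3.0):
    # Eq. (5); the phase offset a = 3 (peak at 15:00) is our assumption
    return 22.0 + 13.0 * math.exp(-(((i * s - a) % 24.0) - 12.0) ** 2 / 20.0)

def t_sun_south(i):
    # Eq. (8): sun radiation component for South offices (peak at 12:00)
    return 15.0 * math.exp(-(((i * s) % 24.0) - 12.0) ** 2 / 5.0)

def t_virtual(i):
    # Eq. (6): outdoor temperature + radiation + Gaussian disturbance
    return t_out(i) + t_sun_south(i) + random.gauss(0.0, 1.0)

def step(t_prev, t_v, p_c=0.0):
    # Eq. (4): discrete-time update of the indoor temperature
    return (t_prev + t_v / (R * C) - p_c / C) / (1.0 + 1.0 / (R * C))

T = 22.0
for i in range(1, 24 * 60 + 1):   # one simulated day, uncontrolled (Pc = 0)
    T = step(T, t_virtual(i))
print(round(T, 2))
```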
Figure 3: The indoor temperature for an uncontrolled office is plotted for different values of the thermal resistance and the thermal capacitance (R = 0 or C = 0; R = 10, C = 10; R = 10 and C = 20 or R = 20 and C = 10; R = 20 and C = 20). Small values of the thermal resistance and capacitance give higher peaks, while higher values give smoother curves.
2.4 Simulations
All solutions discussed in this paper have been implemented and simulated by the authors in C++ on an IBM-compatible PC running Windows 95. Furthermore, to guarantee that a fair comparison was carried out on the same footing, all simulations have been independently recreated and verified from the paper, as part of a master's thesis project, in Python on an IBM-compatible PC running Linux.
3 CONTROL-A: Conventional Independent Controllers

3.1 Integral Control for the Offices
The application of regulating office temperature has a long tradition in control theory. The most widely known controllers are different variants of the PID controller. The letters PID denote that the control signal is proportional to the error, i.e. the difference between the setpoint and the actual value (the P); to the integral of the error (the I); and to the derivative of the error (the D). Here, we will use a variant of the integrating controller, i.e., an I-controller, of the form
$$F_{io} = F_{i-1,o} + \beta (T_{io} - T^{setp}_o), \quad (10)$$

where F is the output signal from the controller and $\beta$ is a gain parameter. For the simulations it is assumed that $F_{io}$ will not be below 0 and that it will not exceed 3. The control signal $F_{io}$ is sent to the actuator and the actual $P_{cio}$ is obtained from

$$P_{cio} = \begin{cases} F_{io}, & F_{io} \leq \eta \cdot P_{aio} \\ \eta \cdot P_{aio}, & F_{io} > \eta \cdot P_{aio} \end{cases} \quad (11)$$
Plots of the office temperatures with different gains are shown in Figure 4. The gain of the controller is not critical for this application. Too high a gain will result in the controller overreacting when the temperature exceeds the setpoint, after which it will be under the setpoint for quite some time. This leads to a larger error than if smaller adjustments are made. Also, the amplitude of the control signal then gets unnecessarily high, but the system does not get dangerously unstable. We note that the maximum deviation here is ±0.06°C. Thus, controllers using any of the three gains perform very well. Apart from the plots shown here, a step response has also been analyzed. From a brief analysis we chose a gain of 10.
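As an illustration, a minimal sketch of the clipped I-controller of Eqs. (10) and (11) in Python (the original implementation is in C++ and is not published; the names here are ours):

```python
ETA = 0.5   # usable fraction of the available resource, Eq. (1)

def i_controller(f_prev, temp, setpoint, gain=10.0):
    # Eq. (10), with the output clamped to [0, 3] as in the simulations
    return max(0.0, min(3.0, f_prev + gain * (temp - setpoint)))

def consumed_power(f, p_available):
    # Eq. (11): at most a fraction ETA of the available resource is usable
    return min(f, ETA * p_available)
```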
The Performance Measure

We use the same evaluation of the building control as Huberman et al. [Huberman and Clearwater, 1995], namely the standard deviation of the deviation from the setpoint,

$$StdDev(T - T^{setp}) = \sqrt{\frac{1}{N} \sum_{o=1}^{N} \left[ (T_{io} - T^{setp}_o) - (\langle T_i \rangle - \langle T^{setp} \rangle) \right]^2}, \quad (12)$$

where $T^{setp}$ is the setpoint temperature, $\langle T \rangle$ is the temperature averaged over all offices, and $\langle T^{setp} \rangle$ is the average setpoint temperature.
Figure 4: The indoor temperature for an office utilizing an integral controller is plotted for different controller gains (Gain = 100, 10, and 1).
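The measure of Eq. (12) translates directly into code; the following Python helper is our own sketch:

```python
from math import sqrt

def std_dev_from_setpoint(temps, setpoints):
    # Eq. (12): standard deviation of the per-office deviation from the
    # setpoint, relative to the building-wide average deviation
    n = len(temps)
    avg_t = sum(temps) / n
    avg_sp = sum(setpoints) / n
    return sqrt(sum(((t - sp) - (avg_t - avg_sp)) ** 2
                    for t, sp in zip(temps, setpoints)) / n)

# Example: three offices, common setpoint of 20 degrees C
print(std_dev_from_setpoint([20.1, 19.9, 20.3], [20.0, 20.0, 20.0]))
```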
Figure 5: The standard deviation, as defined in Eq. (12), for each interval i, is plotted for different amounts of available resource (130, 140, 150, and 160) with 100 offices. The lower the amount of available resource, the higher the standard deviation.
Limitations on Available Resource

So far we have assumed that the total amount of available resource is unlimited. Now we assume that there is a maximum value for the power that is inserted into the system. In such a situation, offices that are situated close to the air input will have a sufficient amount of cool air, while those near the end will suffer if totally uncoordinated controllers are used. Thus, the smaller the amount of the total available resource, the larger the standard deviation will be. This is visualized in Figure 5. Based on the figure, we chose an upper limit of 140 for the total amount of resource.
Discussion

As seen from the example, independent integrating controllers perform very well when the resource is unlimited. On the other hand, when there is a shortage of resource, the standard deviation increases dramatically.
4 MARKET-A: The Approach by Huberman et al.
A MAS solution to the problem of building control has been presented by Huberman et al. [Huberman and Clearwater, 1995, Clearwater and Huberman, 1994]. The approach taken is to model the resource allocation problem of the building environment as a computational market where agents buy and sell resource. The non-separability in terms of agents, i.e. the fact that the amount of resource the penultimate agent can use depends on how much resource the last agent receives, is ignored. The basic idea is that every office is represented by an agent that is responsible for making good use of the resource and, to this end, trades resources with other agents. The agents create bids which are sent to an auctioneer that calculates a clearing price and makes sure that no agent has to buy for a price higher than its bid nor has to sell for a price lower than its bid. Section 4.1 describes the mathematical details of this approach. The reader having no special interest in these details may skip this part and take a look at the results of the corresponding simulations in Section 4.2.
4.1 Original Formulation

All formulas in this section were taken from [Huberman and Clearwater, 1995] and [Clearwater and Huberman, 1994].
Trade Volumes

First, the decision for an agent to buy or sell is based on

$$t_{io} = \frac{T^{setp}_o}{T_{io}} \cdot \frac{\langle T_i \rangle}{\langle T^{setp} \rangle}, \quad \begin{cases} t_{io} > 1, & \text{seller} \\ t_{io} < 1, & \text{buyer} \end{cases} \quad (13)$$
Then the total trade volume, V, is calculated from

$$V_i = \sum_{o=1}^{N} |1 - t_{io}|, \quad (14)$$

where N is the number of offices.
Then every agent calculates its request volume, v, according to

$$v_{io} = \alpha \cdot \frac{|1 - t_{io}|}{V_i}. \quad (15)$$
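In code, the buy/sell decision and request volumes of Eqs. (13)-(15) amount to a few lines; the sketch below uses our own naming, with $\alpha = 64$ as found optimal in Section 4.2:

```python
def request_volumes(temps, setpoints, alpha=64.0):
    # Eqs. (13)-(15): per-office trade coefficient and request volume
    n = len(temps)
    avg_t = sum(temps) / n
    avg_sp = sum(setpoints) / n
    t = [(sp / temp) * (avg_t / avg_sp)                 # Eq. (13)
         for temp, sp in zip(temps, setpoints)]
    total = sum(abs(1.0 - tio) for tio in t)            # Eq. (14)
    vols = [alpha * abs(1.0 - tio) / total for tio in t]  # Eq. (15)
    roles = ['seller' if tio > 1.0 else 'buyer' for tio in t]
    return roles, vols
```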
When an agent buys or sells its v, the actual movement of a valve, called VAV, is computed from

$$\Delta VAV_{io} = f(flow_{io}, v_{io}, VAV_{io}), \quad (16)$$

where f is an empirically determined function which is not explicitly given in the papers [Huberman and Clearwater, 1995, Clearwater and Huberman, 1994]. From this the actual VAV position for each interval is updated according to

$$VAV_{i+1,o} = VAV_{io} + \Delta VAV_{io}. \quad (17)$$

Bids
The bids are based on the marginal utility¹ of the form described by

$$U(t_{io}/T^{setp}_o, m_{io}) = [U(0, m_{io})]^{(1 - t_{io}/T^{setp}_o)} = [U(0, m_{io})]^{\left(1 - \frac{1}{T_{io}} \cdot \frac{\langle T_i \rangle}{\langle T^{setp} \rangle}\right)}, \quad (18)$$

with

$$U(0, m_{io}) = u_3 - (u_3 - u_1) e^{-\gamma m_{io}} \quad (19)$$

and

$$\gamma = \ln \left[ \frac{u_3 - u_1}{u_3 - u_2} \right], \quad (20)$$

where $u_1 = 20$, $u_2 = 200$, and $u_3 = 2000$, and m is the amount of money that an agent has, as given by

$$m_{io} = 100(2 - VAV_{io}). \quad (21)$$

By observing Eq. (19) and Eq. (20), we note that the equations can be simplified to

$$U(0, m_{io}) = u_3 - (u_3 - u_1) e^{-\gamma m_{io}} = u_3 - (u_3 - u_1)(e^{\gamma})^{-m_{io}} = u_3 - (u_3 - u_1) \left( \frac{u_3 - u_1}{u_3 - u_2} \right)^{-m_{io}}. \quad (22)$$
Since the VAV varies between 0 and 1, the amount of money an agent has varies between 100 and 200, and thus U(0, m) varies between 1999.85632 and 1999.99999. Hence, the notion of gold does not affect the marginal utility in practice, as will be verified by the simulations in Section 5. The bids are calculated by multiplying the marginal utility with the previous price, according to

$$B_{i+1,o} = U(t_{io}/T^{setp}_o, m_{io}) \times price_i. \quad (23)$$
¹What Huberman et al. call utility in their work [Huberman and Clearwater, 1995, Clearwater and Huberman, 1994] is generally called marginal utility in micro-economic theory.
Auction

All bids are sent to an auctioneer which calculates a market price where supply meets demand. All agents which requested to buy and whose bids are above or equal to the market price are assigned the corresponding amount, and, similarly, all agents which requested to sell and whose bids are below or equal to the market price are assigned the corresponding amount of resource.

Assumptions and Simplifications

Since f in Eq. (16) above is not explicitly given, and since the relation between the VAV position and $P_c$ is specific to the office environment, we need to make an assumption about how the bid volumes relate to $P_c$. We take the simplest choice and let Eq. (16) and Eq. (17) be replaced by
$$F_{i+1,o} = F_{io} \pm v_{io}, \quad (24)$$
where plus or minus depends on whether it was an accepted buy or sell bid. $P_{cio}$ is obtained from Eq. (11). This simplification is not crucial to our central line of reasoning. As pointed out by Clearwater & Huberman [Clearwater and Huberman, 1994], the purpose of the auction is only to reallocate between the agents and not to affect the average temperature. This means that even if the valves are not opened very much, there is plenty of resource in the system, and if the offices are all above the setpoint, no valve can open further without another one closing. In order not to complicate the model further, the simulation is only performed in a time interval where there is a total lack of resource, with the total available resource assigned to the system. Thus, we do not implement a mechanism that inserts and deletes the total resource to the system. We choose to simulate between 3 p.m. and 7 p.m. We assume the amount of money to be given by
$$m_{io} = 100 \left( 2 - \frac{F_{io}}{F^{max}_o} \right). \quad (25)$$
Thus, the amount of money will vary from 100 to 200, as in the original work. Eq. (23) turns out to produce major problems in simulations. Above we have seen that $U(0, m) \approx 2000$. Equation (18) shows that $U(t_{io}/T^{setp}_o, m_{io})$ is minimized for minimized $T_{io}$ and maximized $\frac{\langle T_i \rangle}{\langle T^{setp} \rangle}$. As we can expect $T_{io}$ to be well above 10°C and $\frac{\langle T_i \rangle}{\langle T^{setp} \rangle}$ to be well below 2, we can be sure that $U(t_{io}/T^{setp}_o, m_{io})$ will be well above $2000^{1 - 2/10} \approx 437$. Thus, the bidding price will never be below 437. Then, Eq. (23) tells us that the market price will be at least $price_0 \times 437^i$. This leads to numerical overflow after only a few iterations. In practice the market price scales with approximately $price_0 \times 1300^i$, which of course is even worse. We note however that, since all agents multiply their bids with the previous price, this has no effect on the reallocation itself, but only affects the price level. Therefore, we choose to omit multiplying by the previous price in our simulations, noting that this leads to exactly the same allocations but avoids numerical overflow. Hence, the bids are based on Eq. (18) rather than Eq. (23). The price at each auction is adjusted until supply meets demand. But, since the bids are given using discrete volumes, supply will seldom match demand exactly at the clearing
price. Normally, there will be a very small excess demand or supply. If there is an excess supply, all buyers whose bids are at or above the clearing price will buy, but not all sellers whose bids are at or below the clearing price can sell. In this situation, the sellers are selected randomly from the valid candidates. The same procedure is used when there is a small excess demand.
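The following sketch shows one way (our own, simplified) to implement such an auction round with random rationing on the long side of the market:

```python
import random

def clear_auction(buy_bids, sell_bids, price):
    # buy_bids / sell_bids: lists of (agent_id, bid_price, volume).
    # Buyers bidding at or above `price` and sellers bidding at or
    # below it are valid; the long side is rationed at random.
    buyers = [b for b in buy_bids if b[1] >= price]
    sellers = [s for s in sell_bids if s[1] <= price]
    demand = sum(b[2] for b in buyers)
    supply = sum(s[2] for s in sellers)
    if supply > demand:                 # excess supply: ration sellers
        random.shuffle(sellers)
        sellers = take(sellers, demand)
    elif demand > supply:               # excess demand: ration buyers
        random.shuffle(buyers)
        buyers = take(buyers, supply)
    return buyers, sellers

def take(orders, volume):
    # keep whole orders until `volume` is used up (discrete volumes)
    out, used = [], 0.0
    for o in orders:
        if used + o[2] <= volume:
            out.append(o)
            used += o[2]
    return out
```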
4.2 Simulations
Figure 6 shows two plots. The upper plot shows the standard deviation when independent integrating controllers are used, and the lower one shows the agent-based control scheme as defined above. We found that an $\alpha$, as described by Eq. (15), of 64 led to the smallest overall standard deviation. It is seen that the agent approach offers at least one order of magnitude of improvement.
Figure 6: Standard deviation for independent controllers (top) and agent-based control (bottom).
Compared to the independent controllers this is indeed a major advance. It is clear that the market coordination of the controllers leads to a much smaller standard deviation and to higher comfort in the offices. This validates the results of Huberman et al. [Huberman and Clearwater, 1995, Clearwater and Huberman, 1994].
5 MARKET-A': A Suite of Variations
In this section we present a suite of variations on the scheme presented in Section 4. The main aim is to see if the scheme can be simplified without loss of performance.
Deleting the Gold Dependency

Plots of the standard deviation with the original scheme, as well as with a scheme where all gold dependencies have been removed (by setting U(0, m) in Eq. (19) to the constant value 2000), are shown in Figure 7. The scheme without gold dependencies performs approximately as well as the original scheme.
Figure 7: Standard deviation for the original scheme and a scheme with the gold dependencies removed.

The reason for this, as pointed out earlier, is that the gold dependency of U(0, m) is extremely weak. For the scheme without gold, an $\alpha$ of 66 turned out to be optimal.
Figure 8: Standard deviation for the scheme with the gold dependencies removed, compared to a scheme with both the gold and temperature dependencies removed.
Deleting the Temperature Dependency

The next issue is the marginal utility's dependence on the current temperature, i.e. the dependency of U on T in Eq. (18). In Figure 8 the standard deviation is plotted for the scheme without gold, mentioned above, and for a scheme where the dependency on the temperature has been removed as well. We see that also here the performance is approximately as good as that of the original scheme. In order to remove the temperature dependency from the marginal utility, we simply set all selling prices to 10 and all buying prices to 100. If there is a mismatch between supply and demand, say, supply exceeds demand, the agents that will sell are picked randomly. Here, an $\alpha$ of 65 turned out to be optimal. Note that the temperature is still used to determine both the bidding volume and the decision whether to buy or sell.
Figure 9: Standard deviation with the auction scheme removed, compared to the original MARKET-A scheme.

Deleting the Auction

Next, we let the agents assign their bids to themselves without any auction whatsoever. This means that the sum total of the controller outputs, $F_{io}$, might sometimes exceed the total amount of resource and sometimes be below it. The physical model is of course still obeyed, so that the total resource actually used, i.e. the sum of the cooling power, $P_{cio}$, will never exceed the total amount of available resource, as described by Eq. (1). The result of this simulation is shown in Figure 9. Note that this scheme is roughly ten times better than the MARKET-A scheme. Here an $\alpha$ of 17 turned out to be optimal.
Discussion

At first glance, it might seem counterintuitive that performance actually improves significantly when virtually all the core mechanisms of the market are removed. First we showed that introducing the market improves performance considerably, and then we showed that most market mechanisms are superfluous. What, then, is the big difference between the uncoordinated integrating controllers of the CONTROL-A scheme and the scheme without any auction that we ended up with? The simple answer, in our view, is the access to global data, in terms of the average temperature and the average setpoint temperature (as seen from Eq. (13) and Eq. (15)), in the MARKET-A and MARKET-A' schemes.
6 CONTROL-B: A Standard Controller Using Global Data
Having concluded that the access to global data is crucial for performance, it is of course of interest to analyze what the performance would be of an integrating controller, like the one introduced in Eq. (10), but now incorporating global data.
We would like the controller to take into account not only its own deviation from its setpoint, but also the deviations of the other offices from their setpoints. Therefore, the controller in Eq. (10) is extended to

$$F_{io} = F_{i-1,o} + \beta \left[ (T_{io} - T^{setp}_o) - (\langle T_i \rangle - \langle T^{setp} \rangle) \right], \quad (26)$$

where $\beta$ is set to 10 as previously, and where $P_{cio}$, as before, is obtained from Eq. (11). The plot from the simulation with this controller, compared to the MARKET-A simulation and to the MARKET-A' simulation where the auction was removed, is shown in Figure 10.
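A sketch of the CONTROL-B update of Eq. (26), again in Python and with our own naming:

```python
def control_b_step(f_prev, temps, setpoints, office, gain=10.0):
    # Eq. (26): I-controller driven by the office's deviation relative
    # to the building-wide average deviation (the global data)
    n = len(temps)
    avg_t = sum(temps) / n
    avg_sp = sum(setpoints) / n
    error = (temps[office] - setpoints[office]) - (avg_t - avg_sp)
    return max(0.0, min(3.0, f_prev + gain * error))
```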
Figure 10: Standard deviation with an integrating controller that utilizes global data, compared to the MARKET-A scheme and the MARKET-A' scheme with the auction removed.
We see that the standard deviation is approximately the same for the CONTROL-B and the MARKET-A' schemes. Thus, the CONTROL-B scheme also performs roughly ten times better than MARKET-A. An important difference, though, is that CONTROL-B employs the well-known integrating controller, for which there are well-understood methods and theories for e.g. stability analysis.² In contrast, the MARKET-A scheme is not easily analyzable from a formal theory perspective, since it does not rely on such well-established concepts.

²This holds under the assumption that the time scale of the variations of the average temperature is much slower than the fluctuation of the temperature. This is a very reasonable assumption in the present case, however.
7 MARKET-B: A Market with Local Data Only
In the previous sections we have seen that the computational market MARKET-A was outperformed by the global control scheme CONTROL-B. It is an interesting question whether a simple and well-performing computational market approach can be devised that does not depend on having global information available, in contrast to both these schemes. In this section we show that this is indeed the case. We adopt the concept of resource allocation as proposed by Wellman [Wellman, 1993], whereby general equilibrium theory is applied to resource allocation. We note that any resource allocation problem with a single resource, like the ones treated in [Ygge and Akkermans, 1996, Kurose and Simha, 1989], can be viewed as a proper market exchange with two commodities: the resource r and money m. We then write the utility function for each agent as u(r, m) = u(r) + m. The competitive behaviour for each agent, then, is to trade r for m whenever $\partial u / \partial r$ is below the current market price, and vice versa. The equilibrium price is a price, p, for r in terms of m, such that supply meets demand. Thus, at market equilibrium all $\partial u / \partial r = p$, except for the agents that are at their boundary values. Agents being at their upper bounds will have $\partial u / \partial r \geq p$, and agents at their lower bounds will have $\partial u / \partial r \leq p$. We note that the resulting equilibrium is identical to the Kuhn-Tucker conditions for maximizing the sum of the utilities, when the utilities are concave and independent. Under these conditions, we know that the equilibrium is unique and that it is equal to the solution maximizing the sum of the individual utilities.
7.1 The Relation Between Markets and Independent Controllers Utilizing Global Data
The performance measure for the system is given by Eq. (12). The best system is therefore the one that minimizes this equation. Hence, the most straightforward move one can think of to come to a market model is to take this measure as representing the utility function for the overall system. So, the utility function for the individual agents is ideally related to $[(T_{io} - T^{setp}_o) - (\langle T_i \rangle - \langle T^{setp} \rangle)]^2$.³ However, this is still a formulation containing global information. Thus, we want to obtain a purely local reformulation, by getting rid of the terms with $\langle T \rangle$ containing the global information. We might replace them, though, by terms relating to the changes in the local resource. In doing so, we take inspiration from the standard controller equations Eq. (10) and Eq. (26), indicating that we get good results (for unconstrained resources) with the update equation $F_{io} = F_{i-1,o} + \phi_{io}$ when $\phi_{io}$ has the form $\phi_{io} = \beta(T_{io} - T^{setp}_o)$. The intended interpretation of $\phi_{io}$ is to represent the output that the local controller would have delivered if acting independently with unconstrained resource. In the market setting each agent updates its control signal $F_{io}$ through $F_{io} = F_{i-1,o} + \Delta F_{io}$, where $\Delta F_{io}$ is determined by the outcome of the market. Since the resource is only redistributed among the agents, we have that $\sum_{o=1}^{N} \Delta F_{io} = 0$. Accordingly, as a step in the design of a MARKET-B scheme, based on local data only, we employ the following definition.

³Since utilities are expressions of preference orderings, they are invariant under monotonic transformations.
Definition 1. Assume as the utility function for the individual office agents:

$$u(\Delta F_{io}) = -\alpha_o^2 (\Delta F_{io} - \phi_{io})^2, \quad (27)$$

where $\alpha_o$ is a strength parameter for each office representing its physical properties such as $R_o$ and $C_o$. The proper choice of $\alpha_o$ is discussed in Proposition 2 later on.
Proposition 1. A general equilibrium market in which the agents hold the utility function of Eq. (27) is equivalent to an integrating controller, the resource update equation of which exploits global information and is described by:

$$\Delta F_{io} = \phi_{io} - \frac{1/\alpha_o^2}{\sum_{k=1}^{N} 1/\alpha_k^2} \sum_{k=1}^{N} \phi_{ik}, \quad (28)$$

given that no agent is at its boundary.

Proof. At equilibrium, all $u'(\Delta F_{io})$ will be equal, and thus it will hold for every office that $\Delta F_{io} - \phi_{io} = \frac{\alpha_N^2}{\alpha_o^2} (\Delta F_{iN} - \phi_{iN})$. Summing these equations over all offices yields $\sum_{o=1}^{N} \Delta F_{io} - \sum_{o=1}^{N} \phi_{io} = \alpha_N^2 \left( \sum_{o=1}^{N} \frac{1}{\alpha_o^2} \right) (\Delta F_{iN} - \phi_{iN})$. Together with the resource constraint, $\sum_{o=1}^{N} \Delta F_{io} = 0$, this gives $\Delta F_{iN} = \phi_{iN} - \frac{1/\alpha_N^2}{\sum_{o=1}^{N} 1/\alpha_o^2} \sum_{o=1}^{N} \phi_{io}$. For reasons of symmetry, this equation holds for all offices.
Corollary. For the special case where all $\alpha_o$ are equal and $\phi_{io} = \beta(T_{io} - T^{setp}_o)$, Equation (28) becomes exactly the CONTROL-B scheme as captured by Eq. (26). Consequently, a computational market based on the utility function $u(\Delta F_{io}) = -\alpha^2 [\Delta F_{io} - \beta(T_{io} - T^{setp}_o)]^2$ will always lead to a resource allocation identical to that of the global CONTROL-B scheme.

In sum, we see that local data plus market communication is equivalent to independent control utilizing global data. Even though our proof was based on the assumption that the agents are never at their boundary values, it will be a very close approximation in most practical applications. It should also be mentioned that managing the boundaries is not required for a successful implementation. As seen above, omitting the management of boundaries in the current application leads to CONTROL-B, which was shown to have a very high performance.
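Because the utilities of Eq. (27) are quadratic, the equilibrium allocation of Eq. (28) can be computed directly. The sketch below is our own illustration; with equal strengths it reproduces the CONTROL-B correction term:

```python
def market_b_allocation(phi, alpha):
    # Eq. (28): closed-form equilibrium reallocation for quadratic
    # utilities; phi[o] is the unconstrained local controller output,
    # alpha[o] the strength parameter of office o
    weights = [1.0 / a ** 2 for a in alpha]
    total_w = sum(weights)
    total_phi = sum(phi)
    return [p - (w / total_w) * total_phi for p, w in zip(phi, weights)]

# With equal strengths the correction is the average of the phis,
# i.e. exactly the global term of the CONTROL-B controller (Eq. (26))
deltas = market_b_allocation([0.3, -0.1, 0.2], [1.0, 1.0, 1.0])
print(deltas, sum(deltas))  # the deltas sum to (numerically) zero
```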
7.2 Finding an Optimal Utility Function
In this section we show how an optimal utility function is constructed in the constrained case, from an optimal controller for the unconstrained case.
Proposition 2. If $T_{io}$ is a linear function of $\Delta F_{io}$,⁴ and if $\phi_{io}$ minimizes Eq. (12) in the unconstrained case, then in a market where the utility functions are described by Eq. (27), with $\alpha_o = \frac{\partial T_{io}}{\partial \Delta F_{io}}$, the associated market equilibrium minimizes Eq. (12) in the constrained case, if the resource can be independently allocated among the agents.

Proof. Minimizing Eq. (12) boils down to minimizing $f(T_{i1}, T_{i2}, \ldots, T_{iN}) = \sum_{o=1}^{N} [(T_{io} - T^{setp}_o) - (\langle T_i \rangle - \langle T^{setp} \rangle)]^2$, which can be rewritten as $\sum_{o=1}^{N} [T_{io}(\Delta F_{io}) - T_{io}(\phi_{io})]^2$. Since $T_{io}$ is a linear function of $\Delta F_{io}$, we have that $T_{io}(\Delta F_{io}) = T_{io}(\phi_{io}) + \frac{\partial T_{io}}{\partial \Delta F_{io}} \cdot (\Delta F_{io} - \phi_{io})$, and hence f becomes $\sum_{o=1}^{N} \left( \frac{\partial T_{io}}{\partial \Delta F_{io}} \right)^2 (\Delta F_{io} - \phi_{io})^2$. Thus $f = -\sum_{o=1}^{N} u_o$. Since we know (as pointed out in the beginning of Section 7) that the equilibrium maximizes the sum of the utilities, we see that it minimizes f.

⁴From the thermal model discussed in Section 2 (especially Eqs. (4) and (11)) we note that this is indeed the case for a reasonably wide range. Another, and general, reason that this assumption is adequate resides in the fact that linearization around a working point (here, the minimum of f) is a commonly used and successful procedure in control engineering. Accordingly, the proposition can be read as giving the exact result when we cut off the Taylor expansion after the linear term. We mention in passing that the error made in f due to this cut-off is only of third order with respect to the resource.
Discussion. Previously, we saw that the independent controller CONTROL-B, which incorporates global data, viz. the average temperatures, performs very well. In this section we positively answered the question whether one can construct a market, MARKET-B, that is based on local data only and that performs as well or even better. In this section a market approach based on general equilibrium theory was used. This is of course not the only available mechanism for resource allocation in MAS. It seems interesting to try out other mechanisms, like the contract net protocol [Davis and Smith, 1983], and see if they perform better. However, Proposition 2 tells us that, if we treat this problem of building control as separable in terms of agents, as done by Huberman et al., there is no better scheme.⁵ For example, if we assigned all the resource to an auctioneer, which in its turn would iteratively assign the resource in small portions to bidders bidding with their true marginal utility, we would end up with something close to the competitive equilibrium, but we cannot do better than MARKET-B. Furthermore, this would be an extremely inefficient way to arrive at equilibrium compared to other available methods [Ygge and Akkermans, 1996, Andersson and Ygge, 1997]. That is, we can use different mechanisms for achieving the competitive equilibrium, but we can never hope to find a mechanism that would do better than the MARKET-B scheme.
⁵We note that, as mentioned earlier, the problem is actually not really separable in terms of agents, and therefore better solutions may exist if this is taken into account. However, we believe that for managing non-separability, centralized algorithms are likely to be even more competitive compared to distributed ones. Another observation is that in this paper we have only investigated the case of using the currently available resource as the interesting commodity, as done by Huberman et al., and found an optimal mechanism for that. At the same time, extending the negotiations to future resources as well could potentially increase performance, but this is a different problem setting with different demands on available local and/or global information items, such as predictions.
8 Discussion and Conclusion
We believe that both the approach and the results, as presented in this paper, pose a general challenge to the MAS community. Multi-agent systems offer a very new way of looking at information technology, with potentially a big future impact. They may lead to what Kuhn calls a 'paradigm shift' [Kuhn, 1970]. However, when new paradigms arise on the scientific horizon, they must prove their value in comparison and competition with existing, more established ones. The MAS paradigm is no exception. We have therefore deliberately played the role of the devil's advocate in this paper. Any new paradigm in science sparks off enthusiasm. But this tends to lead to a parochial view, in its turn bringing along exaggerated claims and promises, and thus to wrong expectations from society. Pride and prejudice are dangers lurking around the corner, even in the field of MAS. These are not imaginary dangers, because the history of Artificial Intelligence and computer science is full of examples. Take, as an example, the field of knowledge-based systems. Knowledge systems actually are a success story. They are now in everyday use around the world. Yet, we are very far from the 'copying and automating human experts' image that was pictured to the public in the early eighties. According to some, such ill-conceived expectations contributed to the so-called 'AI Winter' later in the decade. Still, knowledge systems constitute a positive example, as they represent the single most important commercial and industrial offspring from AI. But their role has become much more modest, as intelligent support tools or assistants, usually functioning as knowledge-intensive components embedded within much larger conventional information systems. So, the key question to be answered by the MAS community is: in what respect and to what extent are multi-agent solutions better than their more conventional alternatives? And the key message of this paper is that arguing in favour of the multi-agent systems paradigm does require much more careful analysis. As yet, there are hardly any studies in the MAS literature of the kind carried out in the present paper. But as we have shown in technical detail, old paradigms such as conventional control cannot be that easily dismissed. Paradigm shifts and scientific revolutions are brought about not only through noisy crowds (although these can indeed be very helpful), but by lots of hard everyday technical work. Abstract considerations alone, concerning what agents are, or on the general nature of autonomy, rationality, and cooperation, are not sufficient to achieve this. Therefore, we have taken a different approach, one that is bottom-up and application-oriented. In this way, we aim at obtaining experimental data points on the basis of which convincing MAS claims can be established. The data point considered in this paper is climate control within a large building consisting of many offices. This is a quite prototypical application relating to the general problem of optimal resource allocation in a highly distributed environment. This class of problems has already received much attention in the MAS area. Reportedly, this type of application is very suitable for market-oriented programming [Huberman and Clearwater, 1995]. On the other hand, we have devised some more conventional control engineering solutions, as well as alternative market designs.
The most important conclusions of our investigation are:

- The market approach by [Huberman and Clearwater, 1995] indeed outperforms a standard control solution based on local, independent controllers. So, the MAS approach indeed yields a working solution to this type of problem.
- However, if conventional control schemes are allowed to exploit global information, as does the market design of Huberman et al., they perform even better.
- We have proposed an alternative market design that uses local data only, and still performs as well as a control engineering scheme having access to global information.

So, our general conclusion is that "local data + market communication = global information". The important difference is that in computational markets this global information is an emergent property rather than a presupposed concept, as it is in standard control. These results provide the opportunity to more clearly state what the added value of market-oriented programming is in this type of application. There is a scientific role for debunking here. As we have seen, one can model the situation in the more traditional terms of control engineering. A similar argument regarding distributed resource allocation could be construed, by the way, vis-à-vis mathematical optimization (cf. our discussion in [Ygge and Akkermans, 1996]). In our analysis we have focused on the market approach. It is tempting to ask whether things are different when a non-market MAS approach is followed, say, using the contract net [Davis and Smith, 1983]. As we argued, the answer in our opinion is a straightforward no. Arguing for non-market approaches is not a way out, but a dead-end street. The goal in the considered class of problems is to find the optimal distributed solution. Alternative MAS approaches, market as well as non-market ones, only change the multi-agent dynamics on the way to this goal. This might be done in a better or poorer way, but it is not possible to change the goal itself. The goal state in any MAS approach is, however formulated, equivalent to market equilibrium; the yardstick for having achieved it is given by some quantitative performance measure as we discussed; and both are stable across different MAS approaches. The agent and market approach does give a highly intuitive and natural picture of problems in distributed environments, even when at a strictly algorithmic level it effectively leads to the same end result as alternative approaches. We do believe that this is a value in itself: for conceptual modelling, understanding, explanation, knowledge transfer. Moreover, models and pictures that have conceptual simplicity are more easily generalized to more complicated situations, e.g., to allocation of multiple resources in multicommodity markets [Wellman, 1993]. Because of its focus on local information, communication and action, the agent paradigm is more flexible than centralized approaches. This does not show up very clearly in the case analyzed here, because the underlying physics and mathematics has been strongly simplified (for example, all offices and thermodynamic processes are assumed to be equal). This will generally not be
true in reality, and that will necessitate a big modelling effort in centralized approaches such as standard control engineering. For the present case, we have shown that the MAS approach offers working solutions of equal quality (see the simulation experiments and, for a formal and general result, our Proposition 1). As a point in favour of a MAS approach, we note that it can treat resource constraints relatively easily, see especially our Proposition 2. So, when we have strong heterogeneity, large scale, and relative independence, we believe that the agent approach starts to pay off. This is visible for example in the application of power load management we are working on ourselves [Ygge and Akkermans, 1996]. It does have similarities with the application discussed in this paper, but for the reasons mentioned the current industrial state of the art based on central control approaches is quite limited. Notwithstanding this, it does not diminish the need for thorough analysis of (market) failures and successes.
Acknowledgments

We thank Rune Gustavsson at the University of Karlskrona/Ronneby and Hans Ottosson at EnerSearch AB for all their support. A special acknowledgment goes to Olle Lindeberg, whose very detailed comments on the draft papers led to significant improvements in the simulations and whose ideas also helped us in the design of the market in Section 7. We also thank Bengt Johannesson, who, as part of his master's thesis, went through our paper in detail and recreated all the simulations. We thank Arne Andersson, Eric Astor and the SoC team at the University of Karlskrona/Ronneby for useful comments and discussions of draft material.
References

[Andersson and Ygge, 1997] A. Andersson and F. Ygge. Managing large scale computational markets. 1997. In preparation.

[Borst et al., 1997] W.N. Borst, J.M. Akkermans, and J.L. Top. Engineering ontologies. International Journal of Human-Computer Studies, 46:365-406, 1997. ISSN 1071-5819.

[Bradshaw, 1997] J.M. Bradshaw. Software Agents. AAAI Press/The MIT Press, Menlo Park, CA, 1997.

[Clearwater and Huberman, 1994] S. Clearwater and B. Huberman. Thermal markets for controlling building environments. Energy Engineering, 91(3):25-56, 1994.

[Davis and Smith, 1983] R. Davis and R.G. Smith. Negotiation as a metaphor for distributed problem solving. Artificial Intelligence, (20):63-109, 1983.
[Huberman and Clearwater, 1995] B.A. Huberman and S. Clearwater. A multi-agent system for controlling building environments. In Proceedings of the First International Conference on Multi-Agent Systems ICMAS'95, pages 171-176, Menlo Park, CA, June 12-14 1995. AAAI Press/The MIT Press.

[Incropera and De Witt, 1990] F.P. Incropera and D.P. De Witt. Fundamentals of Heat and Mass Transfer, third edition. Wiley and Sons, New York, 1990. ISBN 0-471-51729-1.

[Kuhn, 1970] T.S. Kuhn. The Structure of Scientific Revolutions. The University of Chicago Press, Chicago, IL, 1970. Second Edition.

[Kurose and Simha, 1989] J.F. Kurose and R. Simha. A microeconomic approach to optimal resource allocation in distributed computer systems. IEEE Transactions on Computers, 38(5):705-717, 1989.

[Ogata, 1990] K. Ogata. Modern Control Engineering, Second Edition. Prentice-Hall, Englewood Cliffs, NJ, 1990. ISBN 0-13-589128-0.

[Wellman, 1993] M. Wellman. A market-oriented programming environment and its application to distributed multicommodity flow problems. Journal of Artificial Intelligence Research, (1):1-23, 1993.

[Ygge and Akkermans, 1996] F. Ygge and J.M. Akkermans. Power load management as a computational market. In M. Tokoro, editor, Proceedings of the Second International Conference on Multi-Agent Systems ICMAS'96, pages 393-400, Menlo Park, CA, December 9-14 1996. AAAI Press.
Learning Network Designs for Asynchronous Teams
Lars Baerentzen¹, Patricia Avila², and Sarosh N. Talukdar¹

¹ Engineering Design Research Center, Carnegie Mellon University, Pittsburgh PA 15213
e-mail: [email protected], [email protected]

² Power Systems Technology Division, Redes de Energia S.A., Lepanto, 350 3. 20., 08025 Barcelona, Spain
e-mail: [email protected]
Abstract. An asynchronous team (A-Team) is a network of agents (workers) and memories (repositories for the results of work). It is possible to design A-Teams to be effective in solving difficult computational problems. The main design issues are: "What structure should the network have?" and "What should be the complement of agents?" In the past, the structure-issue was resolved by intuition and experiment. This paper describes a procedure by which good structures can be learned from experience. The procedure is based on the use of regular expressions for encoding the capabilities of networks.
1 Introduction
An Asynchronous Team (A-Team) is a problem solving architecture consisting of collections of agents and memories connected into a strongly cyclic directed network. The memories form the nodes of the network, the agents form the arcs. Figure 1 below shows such a network. Each memory holds a population of trial solutions. The solutions are not necessarily solutions to the overall problem that the A-Team is supposed to solve. Some memories may be dedicated to some subproblem of the overall problem, or to some relaxation of it. Each agent copies solutions from its input memory, modifies them and writes the results to its output memory. Each agent is autonomous in that it decides for itself when to work and on which solutions to work. Agents in an A-Team cooperate by working on each other's results. The agents can be divided into two types: construction agents that produce new solutions, and destruction agents that erase existing solutions from their output memories. A construction agent consists of several parts:

- an operator, that transforms solutions from its input-memory type into solutions of its output-memory type;
- a scheduler, that decides when the agent should run;
Fig. 1. An A-Team consisting of memories and agents.
- a selector, that decides which solution from the input memory the operator should work on.

In a destruction agent the operator is missing. All the skill of a destruction agent lies in its ability to choose when to erase which solution from its output memory. A-Teams can be very useful for solving hard problems where lots of improvement heuristics are available, but where no single heuristic performs satisfactorily on its own. In such cases good results can often be achieved by combining the different heuristics into an A-Team. Each improvement heuristic is turned into an agent by adding a scheduler and a selector. A-Teams have been developed for a number of difficult problem areas including sets of nonlinear equations [1], [2], traveling salesman problems [3], highrise building design [4], reconfigurable robot design [5], diagnosis of faults in electric networks [6], control of electric networks [7], job-shop scheduling [8], protein structure analysis [9], train scheduling [10], steel-mill scheduling, paper-mill scheduling and constraint satisfaction [11]. Until now A-Team builders have designed the topology of the network for an A-Team for a given problem more or less by trial and error, and by hand. In this paper we explore a technique for automatically learning good A-Team network designs. The technique relies on a method for measuring agent cooperation presented in [13]. In section 2 we first give a brief presentation of an extended version of this method. Then in section 3 we develop the concept of a Regular Expression with Probabilities (REP), which is needed to calculate the effectiveness of different network topologies for an A-Team,
and show how to use the results from section 2 to obtain good network topologies. Section 4 concludes the paper with some computational results.

2 Measuring Agent Cooperation in A-Teams
In this section we present a parametric model for how the agents in an A-Team cooperate. We then show how to use maximum likelihood estimation to estimate the parameters in the model for a real A-Team.

2.1 The model
Definition 1. For a given memory let:

- S be the space of all solutions that can be stored in the memory.
- G (Goal) be the subset of S that we consider acceptable final solutions. If the A-Team can find one of these solutions we will be happy.
- $S_0, S_1, S_2, \ldots, S_\infty$ be subsets of S such that:
  - $S_0 = G$;
  - $S_1$ contains those non-acceptable solutions for which there exists some agent that will turn the solution into an acceptable final solution with positive probability;
  - $S_{n+1}$ contains those solutions not in any earlier $S_k$ ($k \leq n$) for which there exists some agent that will turn the solution into a solution in $S_n$ with positive probability.
- $S_\infty$ be those solutions in S that are not in any $S_n$. In other words, $S_\infty$ contains those solutions for which no finite sequence of agents has positive probability of turning the solution into an acceptable final solution.
- A solution in $S_n$ be said to be at a distance n from the Goal.

If, given any solution, it is possible to compute which class $S_n$ the solution belongs to, then the underlying optimization problem that the A-Team is trying to solve can be attacked with all kinds of greedy heuristics. In this paper we are concerned with the more difficult and more realistic scenario in which we can only distinguish acceptable final solutions from non-acceptable solutions. We assume:

- There is an oracle which, given a solution, answers yes or no to the question: "Is this solution an acceptable final solution?", i.e. is this solution in $S_0$ or not. No other assessments of solution quality are available.
- Whenever an agent, A, works on a solution, x, it produces a new solution A(x). There are some probabilities, depending both on the solution and on the agent, that the new solution will be closer to the goal, further away from the goal, or at the same distance from the goal as the old solution.

To keep things simple we will assume that:

- if $x \in S_n$ then A(x) will be at most one step further away from the goal, that is $A(x) \in S_{n-1} \cup S_n \cup S_{n+1}$. This automatically holds if all agents have inverses.
Definition 2. The usefulness of applying an agent $A_i$ to a solution x is the triple of probabilities (p, q, r), where

- p is the probability that the new solution, $A_i(x)$, will be one step closer to the goal than x;
- q is the probability that the new solution will be one step further away from the goal than x;
- r is the probability that the new solution will be at the same distance from the goal as x.

To model agent cooperation we define:

Definition 3. For each pair of agents $A_i$, $A_j$ let $M^{i,j}$ be a 3 by 3 matrix that models how the usefulness of agent $A_j$ is changed when a solution is modified by running agent $A_i$. If the usefulness of running agent $A_j$ on solution x is $(p_{old}, q_{old}, r_{old})$ and agent $A_i$ is used to generate solution y from solution x, then the new usefulness of running agent $A_j$ on solution y is given by

$$(p_{new}, q_{new}, r_{new}) = (p_{old}, q_{old}, r_{old}) \, M^{i,j}. \quad (1)$$
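Eq. (1) is just a row-vector by matrix product. The following sketch (our own; the matrix entries are made up for illustration) shows the update:

```python
def update_usefulness(pqr, m):
    # Eq. (1): (p_new, q_new, r_new) = (p_old, q_old, r_old) * M^{i,j}
    return tuple(sum(pqr[k] * m[k][col] for k in range(3))
                 for col in range(3))

# Hypothetical M^{i,j}: running agent i makes agent j somewhat more useful
M = [[0.8, 0.1, 0.1],
     [0.3, 0.6, 0.1],
     [0.4, 0.1, 0.5]]
print(update_usefulness((0.2, 0.3, 0.5), M))   # -> (0.45, 0.25, 0.30)
```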
To get a simple but complete parametric model describing how the A-Team works we define: D e f i n i t i o n 4. For each agent Ai let
(Pi, qi,
ri)
be the usefulness of Ai on a randomly chosen solution from S, and D e f i n i t i o n 5. Let 71"0, 7[" 1 ~ 9 . . ~ 71"N
~ 71"oo
be the probabilities that a random starting solution is in S0,S1,...,SN,S~ respectively. The model is relatively simple, and there are of course many ways agents can cooperate that can not be captured well in this model. For instance, the situation where running agent Ai is very good after a pair of agents Ajl and Aj2 have both been run, but not after running either of the two agents, cannot be modeled in our framework.
181
2.2
Estimating
the Model
Parameters
We assume that we have a lot of information about the A-Team available in the form of historic runs. We assume: -
We are given a number of sequences of agents, each sequence terminated by a No or a Yes. Each sequence of agents was applied to a random starting solution. As soon as an acceptable final solution was found the system was stopped and a Yes reported. If some maximum number of agents were applied before an acceptable solution was found the sequence was terminated with a No.
Now let the vector 00 denote a specific value of each of the parameters in the model. For a given 00 we want to compute the probability of reaching the goal in exactly the same manner as we did in the historical sequences. For sequences terminated with a No, we seek the probability that the goal will not be reached after applying that particular sequence of agents. For a sequence of length n terminated with a Yes, we seek the probability that after the first n - 1 agents have been applied the goal has not yet been reached, but when the last agent is applied the goal is reached. The higher these probabilities are for the historical sequences, the better the historical sequences are explained by the parameter values 0o. The better a particular 0o explains the historical sequences, the more likely it is, that that set of parameters is the correct set of parameters for describing the workings of the A-Team. The function that computes the probability of getting the exact observations that we have gotten as a function of the parameter values is called the likelihood function in statistics. The likelihood function in our case is rather hairy to write down. Nevertheless we have written a program (an A-Team in fact) that does a pretty good job at maximizing it. The set of parameter values that maximizes the likelihood function is known as the maximum likelihood estimate for the unknown parameters. We can show, but it is beyond the scope of this paper, that under mild conditions on how the agents used in the sequences that comprise our observations are chosen the maximum likelihood estimator will converge in probability, not necessarily to the correct underlying parameter values, but to a set of values that predict perfectly the behavior of the A-Team we are modeling. 3
Designing
a Good
Network
Topology
for an A-Team
The task of designing a good network topology consists of two major subtasks. The most important subtask is to decide on a good decomposition of the problem into related problems: subproblems, relaxed problems etc. This task is very application specific and is hard to formalize. In this section we focus on the second aspect of good topology design, which is: Given the different types of memories for the different related problems, and given that each available agent can read from and write to particular types of memories how
182
should the network of agents and memories that compose our A-Team be configured?
3.1
T o p o l o g y for doing low o v e r h e a d selection
The fundamental observation about network topologies is that different network topologies allow different sequences of agents to be applied to initially seeded solutions. With the topology in fig. 2 below, all sequences of a,b and c's are possible, with the topology in fig. 3 only sequences consisting of 0 or more a's followed by exactly one b followed by 0 or more c's are possible for solutions seeded into the left memory.
a
b
c
Memory
Fig. 2. All sequences containing a,b and c possible.
a
c
b Memory
Y
Memory
Fig. 3. Only sequences consisting of any number of a's followed by one b followed by any number of c's can be applied to solutions starting in the left memory.
Of course the individual selectors in the agents can further restrict what sequences are actually generated, but in m a n y eases, obviously bad sequences
183
can cheaply be avoided by having a network structure that makes them impossible. In principle, one can always get by with only a single memory. This memory would hold the 'sum type' of each of the individual solution types, that is the memory could hold a solution of any of the types. Then it would be up to the selector within each agent to only select solutions from the memory that fit it's required input type. Of course no one would ever want to do this. We only mention this to emphasize that having multiple memories can be seen as just a way of making the selection problem for the agents easier. 3.2
3.2 Topology effectiveness
To examine the effectiveness of different topologies for an A-Team we must fix the selection and scheduling strategies of each agent. In particular we shall assume:

- Each agent does random selection.
- Each agent is scheduled with constant frequency in each memory. Different copies of the same agent working on different memories may have different scheduling frequencies.

The result of the above assumptions is that from the solution perspective the probability that agent k will work on the solution next is some constant a_k that only depends on the relative frequency of the agents hooked up to the memory containing the solution. Given estimates for the parameters in the model described in section 2.1 we would like to compute the expected waiting time for the earliest creation of an acceptable final solution. This we currently cannot do. We shall demonstrate, however, how for a given topology one can compute the expected value of the total discounted drift towards goal:

E[ Σ_n γ^n Δ_n ]     (2)

where 0 < γ < 1 is the discounting factor, and Δ_n is a random variable measuring the drift (p_n - q_n) in iteration n. The larger the expected discounted drift, the better the topology (and/or the better the chosen agent frequencies). There are (at least) two ways to think of the discount factor γ:

- It is merely a mathematical trick to make the above sum finite, allowing us to compare two systems each having constant drift, say 0.2 and 0.6 respectively, per agent application. Without the discounting factor both would have drift ∞, but the latter is clearly preferable to the former.
- It makes us prefer drift that happens sooner rather than later. This is important since the objective really is to find a final acceptable solution as fast as possible. The sooner the drift happens, the sooner the acceptable final solution is found.
Regular Expressions and Finite State Automata. To calculate the expected discounted drift to goal we need to keep track of the set of sequences of agents that can be generated by a given topology, as these are exactly the strings we are taking the expectation over. We will need the concepts of regular languages, regular expressions and finite state automata borrowed from computer science (see for instance [14]). First some definitions:

Definition 6.
- An alphabet is a finite set of symbols {a_1, ..., a_n}. An element from the alphabet is called a letter, symbol or character.
- A string, sentence or word is a finite sequence of letters.
- A language is a set (possibly infinite) of strings.
- The concatenation of two strings x and y, written xy, is the sequence of letters formed by first writing down the letters that form x, then the letters that form y. For example the concatenation of 'after' and 'wards' is 'afterwards'. The concatenation of two languages L and M is LM = {xy | x is a string of L and y is a string of M}.
- The language L_n is L concatenated with itself n times. L_0 is the language consisting only of the empty string.
- The union of two languages L and M is L ∪ M = {x | x is a string of L or x is a string of M}.
- The language L* is L_0 ∪ L_1 ∪ L_2 ∪ ... It consists of all strings that can be made by concatenating 0 or more (only finitely many) strings from L.

We are now in a position to define regular expressions and regular languages. The definitions are recursive:

Definition 7. The regular expressions over an alphabet are exactly those expressions that can be formed using the following construction rules. For each expression there is an associated regular language. The definition of the associated language is given with each construction rule.

1. The special symbol ε is a regular expression. It denotes the language consisting only of the empty string.
2. For each letter 'a' in the alphabet, 'a' is a regular expression denoting the language consisting of the single string 'a'.
3. If R_1 and R_2 are regular expressions denoting languages L_1 and L_2 respectively then
   (a) (R_1) | (R_2) is a regular expression denoting the language L_1 ∪ L_2
   (b) (R_1)(R_2) is a regular expression denoting L_1 L_2, the concatenation of L_1 and L_2
   (c) (R_1)* is a regular expression denoting L_1*

To avoid having to write all the parentheses we define an operator hierarchy. We define * to have the highest priority, then concatenation and finally | the union operator. We allow parentheses to be omitted whenever it can be done without causing confusion as to which regular language is denoted.
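As a quick illustration of Definition 6, the set operations translate directly into code. The short sketch below is ours, and it truncates the infinite language L* to concatenations of at most a few strings:

def concat(L, M):
    # LM = { xy : x in L, y in M }
    return {x + y for x in L for y in M}

def star(L, max_parts=3):
    # finite truncation of L* = L_0 ∪ L_1 ∪ L_2 ∪ ... (L* itself is infinite)
    result, frontier = {""}, {""}
    for _ in range(max_parts):
        frontier = concat(frontier, L)
        result |= frontier
    return result

print(concat({"after"}, {"wards"}))   # {'afterwards'}
print(sorted(star({"a"})))            # ['', 'a', 'aa', 'aaa']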
Definition 8. A Nondeterministic Finite state Automaton (NFA) is a 5-tuple (Q, Σ, δ, q, F) where
- Q is a finite set called the states.
- Σ is a finite set called the alphabet.
- δ is a relation δ ⊆ Q × Σ × Q called the possible transitions.
- q ∈ Q is a special state, called the initial state.
- F ⊆ Q is a subset of the states called the accepting states.
One should think of an NFA as a machine that reads a string of letters from the alphabet. At any given time the machine is in one of the finitely many states in Q. The machine starts in the special initial state q and each time a letter is read the machine transitions from one state to another. The set of possible new states depends on the current state and the symbol read. If the current state is q_c and the symbol read is x then the possible new states are those q_n for which (q_c, x, q_n) ∈ δ. The automaton is called nondeterministic because there might be more than one such q_n for a given q_c and x.

Definition 9. An NFA (Q, Σ, δ, q, F) is said to accept a string x_1, x_2, ..., x_k if there exist states q_0, q_1, ..., q_k such that
- q_0 = q,
- (q_{i-1}, x_i, q_i) ∈ δ for all 0 < i ≤ k, and
- q_k ∈ F.
That is, the NFA is said to accept a string if it is at all possible for the machine to be in one of the accepting states F after having read the string. The set of strings that is accepted by a particular NFA M = (Q, Σ, δ, q, F) is called the language L(M) accepted by M. It is a well-known result from computer science (see for instance [14]) that:

Theorem 10. For any NFA M the language L(M) accepted by M is regular. For any regular language L there exists an NFA M such that L = L(M).
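A compact rendering of Definitions 8-9 may help. The class below stores δ as a set of (state, symbol, state) triples and checks acceptance by tracking the set of reachable states; the names, and the encoding of the fig. 3 topology used to exercise it, are ours:

class NFA:
    def __init__(self, delta, initial, accepting):
        self.delta = set(delta)          # (q_c, x, q_n) triples
        self.initial = initial
        self.accepting = set(accepting)

    def accepts(self, string):
        current = {self.initial}         # all states reachable so far
        for x in string:
            current = {qn for (qc, sym, qn) in self.delta
                       if qc in current and sym == x}
        return bool(current & self.accepting)

# The two-memory topology of fig. 3: a loops on the left memory L,
# b moves from L to the right memory R, c loops on R.
fig3 = NFA({("L", "a", "L"), ("L", "b", "R"), ("R", "c", "R")},
           initial="L", accepting={"L", "R"})

print(fig3.accepts("aabcc"))  # True:  a a b c c can be applied
print(fig3.accepts("abab"))   # False: a second b is impossible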
Representing Networks as Regular Expressions. Theorem 10 is very important as it allows us to look at a network of memories and agents in terms of regular expressions. Rather than calculating the total expected discounted drift to goal directly for a network of memories and agents, we can translate the network to an equivalent regular expression and calculate the total expected drift to goal based on the sequences of agents that can be produced by the regular expression.

Theorem 11. A network of memories and agents forms an NFA.
Proof. Consider a network of memories {m_1, m_2, ..., m_n} and distinct agent types {a_1, a_2, ..., a_m}. Define the relation δ ⊆ {m_1, ..., m_n} × {a_1, ..., a_m} × {m_1, ..., m_n} such that (m_i, a_j, m_k) ∈ δ whenever the network contains an arc with an agent of type a_j reading from memory m_i and writing to memory m_k. Define:
- Q = {m_1, m_2, ..., m_n}
- Σ = {a_1, a_2, ..., a_m}
- q = m_1
- F = Q

Then the NFA M = (Q, Σ, δ, q, F) accepts exactly those sequences of agents that can be applied to solutions seeded into memory m_1.

The way we are using NFAs they should not be seen as machines accepting strings but rather as machines generating strings. The set of strings that can be generated is the same as the set of strings that the machine would accept. The machine generates a random string in the following nondeterministic way:

1. Start by setting the current state q_c = q, the special initial state of the NFA.
2. If the current state q_c is in F you may stop.
3. If you choose not to stop, pick a random transition from δ with q_c as the first coordinate: (q_c, a_new, q_new).
4. Write a_new to the output; set q_c := q_new; goto 2.

Including Scheduling Frequencies. There is one problem with using the standard NFAs and the standard regular expressions. They do not include information about the probability of a particular string of agents being generated; they merely represent which strings can or cannot be generated. We want a way to include information about the probability a_k of agent k working on a solution in a memory. In the case of the NFA we can easily define an NFA with probabilities (NFAP):

Definition 12. A Nondeterministic Finite state Automaton with Probabilities (NFAP) is a 5-tuple (Q, Σ, δ, q, F) where
- Q, Σ, q and F are as for the NFA.
- δ is a relation δ ⊆ Q × Σ × Q × [0, 1] where (q_i, a_j, q_k, p) ∈ δ means that the transition consisting of writing a_j and moving from q_i to q_k has probability p of being applied when the machine is in state q_i.

Similarly we can add probabilities to a regular expression. As for ordinary regular expressions the definition is recursive:

Definition 13. The regular expressions with probabilities (REPs) over an alphabet are exactly those expressions that can be formed using the following construction rules. For each expression there is an associated regular language L and a probability ν(z) for each string z of the language.

1. The special symbol ε is a REP. It denotes the language consisting only of the empty string ε, which has probability ν(ε) = 1.
2. For each letter 'a' in the alphabet, 'a' is a REP denoting the language consisting of the single string 'a' with probability ν(a) = 1.
3. If R_1 and R_2 are REPs denoting (L_1, ν_1) and (L_2, ν_2) respectively then
   (a) (R_1) |_p (R_2) is a REP denoting the language (L_1 ∪ L_2, ν) where
       - if a string z is in both L_1 and L_2 then its new probability is ν(z) = p ν_1(z) + (1 - p) ν_2(z),
       - if a string z is only in L_1 then its new probability is ν(z) = p ν_1(z),
       - if a string z is only in L_2 then its new probability is ν(z) = (1 - p) ν_2(z).
   (b) (R_1)(R_2) is a REP denoting (L_1 L_2, ν), the concatenation of L_1 and L_2, with the probability of a string z in L_1 L_2 being

       ν(z) = Σ_{z_1 z_2 = z} ν_1(z_1) ν_2(z_2)

       where the sum is over those pairs of strings z_1, z_2 such that z_1 z_2 = z, z_1 ∈ L_1, and z_2 ∈ L_2.
   (c) (R_1)^{*(p)} is a REP denoting the language (L_1*, ν) with probabilities

       ν(z) = Σ_{n=0}^{∞} (1 - p) p^n Σ_{z_1 ... z_n = z} ν_1(z_1) ... ν_1(z_n)     (3)

       where the second sum is over all sets of n strings z_1, ..., z_n, all in L_1, such that z = z_1 z_2 ... z_n. One should think of these probabilities as occurring the following way:
       i. First flip a skewed coin with probability p of getting heads and (1 - p) of getting tails. If we get tails we immediately stop; otherwise goto 3(c)ii.
       ii. Generate a random string from L_1, choosing each possible string y from L_1 with probability ν_1(y).
       iii. Goto 3(c)i.
       This means that in effect we are choosing a geometrically distributed number of strings from L_1 and concatenating them. Formula (3) sums up the different ways a particular string z can be generated in this way.

The problem of converting from an NFAP to a REP and vice versa is a little complicated. (For ordinary NFAs and regular expressions there is a simple process for converting back and forth.) We can show, but we will not go into the proof here, that

Theorem 14. For any REP (R, ν) there exists an NFAP (Q, Σ, δ, q, F), and for each NFAP there exists a REP such that
1. The set of strings generated by the NFAP is the same as the set of strings generated by the REP (this we know from Theorem 10), and
2. The probability that the NFAP will generate a particular string z is exactly ν(z).

Instead let us demonstrate how to calculate the expected total drift towards goal for a REP based on the agent interaction matrices from the A-Team model.

Calculating Total Expected Discounted Drift towards Goal for a REP. Notation (reminder):

- Let γ < 1 be the discounting factor.
- Let (p_k(x), q_k(x), r_k(x)) denote the probabilities that the new solution generated by agent k will be one step closer to goal, one step further away from goal or at the same distance from goal, respectively, compared to the solution x which the agent read in.
- Let P_x be shorthand for the entire vector
  (p_1(x), q_1(x), r_1(x), ..., p_N(x), q_N(x), r_N(x))

  for a system with N different agents.
- Let M^{m,n} be the matrix describing how running agent m changes (p_n, q_n, r_n), that is:

  (p_n(m(x)), q_n(m(x)), r_n(m(x))) = (p_n(x), q_n(x), r_n(x)) M^{m,n}     (4)

- Let M^m be the big 3N by 3N matrix that describes how running agent m changes p, q and r for all N agents. M^m is block diagonal:

  M^m = diag(M^{m,1}, M^{m,2}, ..., M^{m,N})     (5)
We define

Definition 15. Let V(R, P) be the total expected discounted drift towards goal from running a random sequence of agents from the language defined by the REP R, given that the starting solution is described by the vector P.

Note that this definition is valid even for vectors P with p_k + q_k + r_k ≠ 1. Before we give a complete algorithm for computing V(R, P) we note some simple properties and calculate some simple examples:

- Let V(a, P) be the expected drift towards goal from applying agent 'a' to any solution x with P_x = P. Then V(a, P) = p_a - q_a, where p_a and q_a are the relevant coordinates of P corresponding to agent 'a'. Note that V(a, P) is linear in the second argument: V(a, γP) = γV(a, P) for any real number γ.
- Let V(ab, P) be the expected total discounted drift towards goal from first applying agent 'a' to a solution x with P_x = P, and then agent 'b' to a(x). Then

  V(ab, P) = V(a, P) + γV(b, P M^a)     (6)

  Rewriting the right hand side we get

  V(ab, P) = V(a, P) + V(b, P(γM^a))     (7)

It will ease the notation if we include the discount factor γ in the M^a matrix, so define for any agent m:

  Q^m = γM^m     (8)

in which case (7) becomes:

  V(ab, P) = V(a, P) + V(b, P Q^a)     (9)
It is perhaps somewhat surprising that formula (9) has a very nice generalization to any pair of REPs R_1 and R_2:

Theorem 16. For any pair of REPs R_1 and R_2 there exists a 3N by 3N matrix Q^{R_1}, which depends only on R_1, such that

  V(R_1 R_2, P) = V(R_1, P) + V(R_2, P Q^{R_1})     (10)
Proof. We give a constructive proof by showing how to recursively compute Q^{R_1}. There are five possibilities for R_1:

1. R_1 = ε. In this case the empty sequence is the only sequence in L(R_1), so P is not changed when a random sequence from L(R_1) is run; hence Q^{R_1} = I. Note for later use that I, the identity matrix, in particular is 3 by 3 block diagonal (in fact it is diagonal).
2. R_1 = 'a'. In this case Q^a is as defined in (8), and hence is also 3 by 3 block diagonal.
3. R_1 is of the form R_1 = R_11 |_α R_12. In this case the sequence of agents run will be a sequence from R_11 with probability α and a sequence from R_12 with probability 1 - α. Let Q^{R_11} and Q^{R_12} be the two matrices describing how P would be changed by running a random sequence from R_11 and from R_12 respectively. In the mean the new P is then P_new = αP Q^{R_11} + (1 - α)P Q^{R_12}, hence Q^{R_1} = αQ^{R_11} + (1 - α)Q^{R_12}. If both Q^{R_11} and Q^{R_12} are 3 by 3 block diagonal then so is Q^{R_1}.
4. R_1 is of the form R_11 R_12. In this case we first run a random sequence of agents from R_11, changing P to P_new = P Q^{R_11}. Then a random sequence of agents from R_12 is run, changing P_new to P_new Q^{R_12} = P Q^{R_11} Q^{R_12}; hence Q^{R_1} = Q^{R_11} Q^{R_12}. Note that if the two matrices are 3 by 3 block diagonal, then so is their product.
5. R_1 is of the form R_1 = R_11^{*(α)}. What should Q^{R_11^{*(α)}} be? Remember that R_11^{*(α)} denotes the REP in which the string consisting of n random sequences from L(R_11) has probability (1 - α)α^n of being generated; thus we get:

  V(R_11^{*(α)} R_2, P) = V(R_11^{*(α)}, P) + Σ_{n=0}^{∞} (1 - α)α^n V(R_2, P(Q^{R_11})^n)

Using the fact that V is linear in the second coordinate we get:

  V(R_11^{*(α)} R_2, P) = V(R_11^{*(α)}, P) + V(R_2, P Σ_{n=0}^{∞} (1 - α)α^n (Q^{R_11})^n)     (11)

The infinite sum in (11), which is exactly Q^{R_11^{*(α)}}, can be reduced:
  Q^{R_11^{*(α)}} = Σ_{n=0}^{∞} (1 - α)α^n (Q^{R_11})^n     (12)
                  = (1 - α)I + Σ_{n=1}^{∞} (1 - α)α^n (Q^{R_11})^n     (13)
                  = (1 - α)I + αQ^{R_11} Σ_{n=0}^{∞} (1 - α)α^n (Q^{R_11})^n     (14)
                  = (1 - α)I + αQ^{R_11} Q^{R_11^{*(α)}}     (15)

  ⇒ Q^{R_11^{*(α)}} = (1 - α)(I - αQ^{R_11})^{-1}     (16)
where Q^{R_11^{*(α)}} only depends on Q^{R_11} and α and can be computed using (16) by inverting (I - αQ^{R_11}). Note that this is easy to do if Q^{R_11} is 3 by 3 block diagonal, since then (I - αQ^{R_11}) will be so too. Note also that in that case Q^{R_11^{*(α)}} will be 3 by 3 block diagonal as well, since it is the inverse of a 3 by 3 block diagonal matrix.
Since each of the basic forms of Q^R is 3 by 3 block diagonal, and since each of the three different ways to compute a new Q^R from the subexpressions of R maintains this property, Q^R will be 3 by 3 block diagonal for any REP R.
We now show how to compute V(R, P) for any REP R. Again the formula is recursive:

1. If R = ε then V(R, P) = 0, since no agents are run.
2. If R = a then V(R, P) = p_a - q_a, where p_a and q_a are the relevant coordinates of P corresponding to agent a.
3. If R = R_1 R_2 then V(R, P) = V(R_1, P) + V(R_2, P Q^{R_1}), since after running a random sequence from R_1 starting from state P, the state of the system will be characterized by P Q^{R_1}, by definition of Q^{R_1}.
4. If R = R_1 |_α R_2 then V(R, P) = αV(R_1, P) + (1 - α)V(R_2, P), since the sequence of agents will be chosen from R_1 with probability α.
5. It is a little more complicated to compute V(R, P) when R = R_1^{*(α)}. If we let R_1^n mean running n sequences of agents from R_1 then we get:
  V(R_1^{*(α)}, P) = Σ_{n=0}^{∞} (1 - α)α^n V(R_1^n, P)     (17)

Now since

  V(R_1^n, P) = Σ_{k=1}^{n} V(R_1, P(Q^{R_1})^{k-1})     (18)
we get:

  V(R_1^{*(α)}, P) = Σ_{n=0}^{∞} (1 - α)α^n ( Σ_{k=0}^{n-1} V(R_1, P(Q^{R_1})^k) )     (19)

By changing the order of summation we get:

  V(R_1^{*(α)}, P) = Σ_{k=0}^{∞} (1 - α)( Σ_{n=k+1}^{∞} α^n ) V(R_1, P(Q^{R_1})^k)     (20)
                   = Σ_{k=0}^{∞} α^{k+1} V(R_1, P(Q^{R_1})^k)     (21)
                   = V(R_1, αP( Σ_{k=0}^{∞} α^k (Q^{R_1})^k ))     (22)
Now if we use

  X = Σ_{k=0}^{∞} α^k (Q^{R_1})^k = I + Σ_{k=1}^{∞} α^k (Q^{R_1})^k = I + αQ^{R_1} X     (23)

  X = (I - αQ^{R_1})^{-1}     (24)
to substitute in (22) we get

  V(R_1^{*(α)}, P) = V(R_1, αP(I - αQ^{R_1})^{-1})     (25)
which is also easy to calculate given Q^{R_1}. Since Q^{R_1} is 3 by 3 block diagonal, so is (I - αQ^{R_1}), which makes it easy to invert.

Applying the Method. To apply the method one must go through the following steps:
1. Hook up all the agents to read from and write to the same memory. Collect observations from many runs on several different instances of the problem that the A-Team is supposed to solve.
2. Use the observations gathered to estimate the parameters in the agent cooperation model.
3. Given the estimated parameters, the discounted drift for any REP can be calculated as described above. This step frees us from the rather tedious and time consuming task of actually experimenting with different network topologies for the A-Team.
4. When a REP with high drift has been identified, translate the REP to the equivalent NFAP, and hook up the agents in the corresponding network.
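Step 3 is mechanical enough to sketch in code. The fragment below renders the recursions of Theorem 16 and the V(R, P) algorithm directly, representing a REP as a nested tuple; for brevity it inverts the full 3N-by-3N matrix instead of exploiting the 3-by-3 block-diagonal structure noted above, and all names are ours, not the authors':

import numpy as np

# REPs as nested tuples: ("eps",), ("sym", k), ("cat", R1, R2),
# ("alt", alpha, R1, R2), ("star", alpha, R1).
# Q[k] = gamma * M^k is the discounted 3N-by-3N matrix for agent k.

def Q_of(R, Q, N):
    kind = R[0]
    if kind == "eps":
        return np.eye(3 * N)
    if kind == "sym":
        return Q[R[1]]
    if kind == "cat":
        return Q_of(R[1], Q, N) @ Q_of(R[2], Q, N)
    if kind == "alt":
        a = R[1]
        return a * Q_of(R[2], Q, N) + (1 - a) * Q_of(R[3], Q, N)
    if kind == "star":                                   # equation (16)
        a = R[1]
        return (1 - a) * np.linalg.inv(np.eye(3 * N) - a * Q_of(R[2], Q, N))

def V(R, P, Q, N):
    kind = R[0]
    if kind == "eps":
        return 0.0
    if kind == "sym":
        k = R[1]
        return P[3 * k] - P[3 * k + 1]                   # p_k - q_k
    if kind == "cat":
        return V(R[1], P, Q, N) + V(R[2], P @ Q_of(R[1], Q, N), Q, N)
    if kind == "alt":
        a = R[1]
        return a * V(R[2], P, Q, N) + (1 - a) * V(R[3], P, Q, N)
    if kind == "star":                                   # equation (25)
        a = R[1]
        Pstar = a * P @ np.linalg.inv(np.eye(3 * N) - a * Q_of(R[2], Q, N))
        return V(R[2], Pstar, Q, N)

With the estimated matrices plugged in as Q and the initial usefulness vector as P, scanning the free parameters of a candidate REP (such as α in the example of the next section) reduces to repeated calls of V.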
4 An Example
Consider the Traveling Salesman Problem and two heuristics for solving it: Lin-Kernighan (LK) and a simple heuristic, 2-city swap, which randomly swaps two cities in a tour without regard for the change in tour length. What is the best way (best network) for combining these heuristics? As humans, with our knowledge of the workings of each of these two agents, we have a feeling for the general strategy that must be used: the random 2-city swap should mainly be used for lifting LK out of its local minima; the main improvement in tour length will be done by LK. The power of the method presented in this paper is that it can uncover the optimal network for the A-Team using only the information about which sequences of agents were run, and how fast the goal was reached. We collected data by running 500 random sequences of the two agents on the 318-city instance of the TSP, terminating with a Yes when a tour of length less than 2 percent above optimum is found and terminating with a No if no such tour is found after 100 agent applications. The observations were then used to estimate the parameters in the cooperation model; we got the following maximum likelihood estimate:
pi[0] == 2.07479e-05
pi[1] == 0.180393
pi[2] == 0.0466464
pi[3] == 0.138674
pi[4] == 3.9718e-05
pi[5] == 0.00820783
pi[6] == 0.000126658
pi[7] == 0.214058
pi[8] == 0.0785221
pi[9] == 0.13753
pi[10]== 0.195782
p[0]== 2.1118e-23
q[0]== 1.12917e-14
r[0]== 1
p[1]== 0.0837636
q[1]== 0.915009
r[1]== 0.00122723

matrix[0][0]:
p_new== 0 p_old+ 0 q_old+ 1.00565e-05 r_old
q_new== 0.0216914 p_old+ 0.0190053 q_old+ 0.000291845 r_old
r_new== 0.978309 p_old+ 0.980995 q_old+ 0.999698 r_old
matrix[0][1]:
p_new== 3.34688e-06 p_old+ 8.06538e-05 q_old+ 3.29203e-09 r_old
q_new== 0.0080602 p_old+ 0.000816439 q_old+ 8.04717e-07 r_old
r_new== 0.991936 p_old+ 0.999103 q_old+ 0.999999 r_old
matrix[1][0]:
p_new== 0.999997 p_old+ 0.922703 q_old+ 0.59284 r_old
q_new== 2.43702e-06 p_old+ 0.0488165 q_old+ 0.397015 r_old
r_new== 9.4355e-07 p_old+ 0.0284802 q_old+ 0.0101451 r_old
matrix[1][1]:
p_new== 0.624745 p_old+ 1.15111e-06 q_old+ 0 r_old
q_new== 0.182888 p_old+ 3.22376e-05 q_old+ 7.38058e-11 r_old
r_new== 0.192367 p_old+ 0.999967 q_old+ 1 r_old
where agent 0 is Lin-Kernighan and agent 1 is the random city swap. The first 11 lines give the probabilities π_n that a random initial solution is in each of the S_n's. The next lines give the usefulness of each of the two agents when applied to the initial solution. Notice that the estimates are "wrong" in the sense that we would probably expect LK to actually make progress when applied to an initial solution, and we would expect the random city swap to leave the solution at the same distance from goal when applied to an initial solution. Instead the parameters, as estimated, state that LK will leave the solution at the same distance to goal with probability 1, and the 2-city swap will move the solution one step away from goal. This difference between what is actually happening and what the parameters estimate is happening is offset by the parameters stating that the solutions start closer to the goal than they actually do. As soon as either agent 0 or agent 1 has been run, agent 1, the 2-city swap agent, will have usefulness (p, q, r) = (0, 0, 1), as can be seen from the two rows:
matrix[0][1]:
r_new== 0.991936 p_old+ 0.999103 q_old+ 0.999999 r_old
matrix[1][1]:
r_new== 0.192367 p_old+ 0.999967 q_old+ 1 r_old
Running agent 0 (LK) yields a solution that is a local optimum with respect to LK. This is captured by the estimates for matrix[0][0], which move all probability mass to r:

matrix[0][0]:
p_new== 0 p_old+ 0 q_old+ 1.00565e-05 r_old
q_new== 0.0216914 p_old+ 0.0190053 q_old+ 0.000291845 r_old
r_new== 0.978309 p_old+ 0.980995 q_old+ 0.999698 r_old
Finally, when the usefulness of LK thus is (p, q, r) = (0, 0, 1), running agent 1 (2-city swap) once changes the usefulness of LK to

  (p, q, r) = (0, 0, 1) [ 0.9999   2.437e-06   9.435e-07
                          0.9227   0.04882     0.02848
                          0.5928   0.3970      0.01015  ]  ≈ (0.59, 0.40, 0.01)
This compares well with what we found by manual inspection. If only 2 cities were swapped before LK was run again, then LK had a pretty high probability of simply undoing the swap and finding the same local minimum as earlier. If however agent 1 was applied once more, then the new tour was sufficiently different that LK would find a new local minimum when run again. In terms of the estimated parameters, the usefulness of LK after running agent 1 once more is
  (p, q, r) = (0.59, 0.40, 0.01) [ 0.9999   2.437e-06   9.435e-07
                                   0.9227   0.04882     0.02848
                                   0.5928   0.3970      0.01015  ]  ≈ (0.97, 0.01, 0.02)
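These two products are easy to reproduce numerically. Using the rounded entries of the displayed matrix (so the second result deviates slightly from the full-precision figures quoted above):

import numpy as np

M = np.array([[0.9999, 2.437e-06, 9.435e-07],
              [0.9227, 0.04882,   0.02848  ],
              [0.5928, 0.3970,    0.01015  ]])

print(np.array([0.0, 0.0, 1.0]) @ M)      # ≈ (0.59, 0.40, 0.01)
print(np.array([0.59, 0.40, 0.01]) @ M)   # ≈ (0.96, 0.02, 0.01) with these rounded entries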
We tried computing V(R, P), where P is the vector of initial usefulness for each of the two agents, P = (2.11e-23, 1.129e-14, 1, 0.0837, 0.9150, 0.001227), for three different REPs. In all cases we used discount factor γ = 0.99.

1. R = (0 |_α 1)^{*(1.00)}. This REP can be implemented in a single memory with both agent 0 and agent 1 reading from and writing to it. In each iteration agent 0 is run with probability α and agent 1 with probability 1 - α. The optimal drift was found to occur when α equaled 0.36. The total discounted drift in this case was 15.1319.
2. R = (0(1^{*(α)}))^{*(1.00)}. This REP represents repeatedly running LK once followed by a random number of 2-city swaps. It requires two memories to implement. LK reads from the first memory and writes to the second. One copy of the 2-city swap agent reads from and writes to the second memory, and a second copy of the 2-city swap agent reads from the second memory and writes to the first. The optimal drift was found to occur for α = 0.64, which corresponds to running on average 1.56 applications of the 2-city swap agent between each pair of LK applications. The total discounted drift was 15.4822. The drift is slightly better than what could be achieved with a single memory.
3. R = (011)^{*(1.00)} requires 3 memories to implement. LK reads from the first and writes to the second. One copy of the 2-city swap agent reads from the second and writes to the third. A second copy reads from the third and writes to the first. This set-up yields a discounted drift towards goal of 30.785. The clear superiority of this topology comes from the fact that only the optimal sequence of agents is applied. There is no randomness in what sequence is applied.
5 Future Work
The obvious next step is to devise a search strategy for automatically searching the space of REPs for the optimal REP. We still do not have a good feel for what structure there is on the space of all REPs. Secondly, we would like to incorporate the running time required by each agent into the model. One approach is to use a different discounting factor for each agent. Agents that take longer to run would have a smaller discounting factor than agents that run quickly.
References

1. de Souza, P., Talukdar, S.N.: Genetic Algorithms in Asynchronous Teams. Proceedings of the Fourth International Conference on Genetic Algorithms. Morgan Kaufmann, Los Altos, CA, 1991.
2. Talukdar, S.N., Pyo, S.S., Giras, T.: Asynchronous Procedures for Parallel Processing. IEEE Trans. on PAS, Vol. PAS-102, No. 11, Nov. 1983.
3. de Souza, P.: Asynchronous Organizations for Multi-Algorithm Problems. Ph.D. dissertation, Dept. of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, 1993.
4. Quadrel, R.W.: Asynchronous Design Environments: Architecture and Behavior. Ph.D. dissertation, Department of Architecture, Carnegie Mellon University, Pittsburgh, PA, 1991.
5. Murthy, S.: Synergy in Cooperating Agents: Designing Manipulators from Task Specifications. Ph.D. dissertation, Dept. of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, 1992.
6. Chen, C.L.: Bayesian Nets and A-Teams for Power System Fault Diagnosis. Ph.D. dissertation, Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA, 1992.
7. Talukdar, S.N., Ramesh, V.C.: A Parallel Global Optimization Algorithm and Its Application to the CCOPF Problem. Proceedings of the Power Industry Computer Applications Conference, Phoenix, May 1993.
8. Chen, S.Y., Talukdar, S.N., Sadeh, N.M.: Job-Shop-Scheduling by a Team of Asynchronous Agents. IJCAI-93 Workshop on Knowledge-Based Production, Scheduling and Control, Chambery, France, 1993.
9. Lukin, J.A., Gove, A.P., Talukdar, S.N., Ho, C.: An Automated Probabilistic Method for Assigning Backbone Resonances of (13C, 15N)-Labelled Proteins.
10. Tsen, C.K.: Solving Train Scheduling Problems Using A-Teams. Ph.D. dissertation, Electrical and Computer Engineering Department, CMU, Pittsburgh, 1995.
11. Gorti, S.R., Humair, S., Sriram, R.D., Talukdar, S.N., Murthy, S.: Solving Constraint Satisfaction Problems Using A-Teams. To appear in AI-EDAM.
12. Talukdar, S., Baerentzen, L., Gove, A., de Souza, P.: Cooperation Schemes for Autonomous Agents. In review.
13. Baerentzen, L., Talukdar, S.N.: Improving Cooperation Among Autonomous Agents in Asynchronous Teams. Submitted to: Journal of Computational and Mathematical Organizational Theory.
14. Aho, A.V., Ullman, J.D.: Principles of Compiler Design. Addison-Wesley, 1977.
Building Multi-Agent Systems with CORRELATE

Wouter Joosen¹, Stijn Bijnens², Frank Matthijs¹, Bert Robben¹, Johan Van Oeyen², and Pierre Verbaeten¹

¹ K.U.Leuven, Dept. of Comp. Sc., Celestijnenlaan 200A, B-3001 Leuven, Belgium
² NetVision N.V., Parijsstraat 74, 3000 Leuven, Belgium

In this paper, we introduce CORRELATE, a programming language with an open implementation framework. In principle, CORRELATE exploits the know-how of concurrent object-oriented programming languages with high level synchronisation primitives. This is an essential element to support multi-agent systems, as the inherent concurrency between the computational agents must be controlled. Another specific element of CORRELATE is its capability to define autonomous operations: we illustrate how these language elements can be used to program reactive and cognitive multi-agent systems. CORRELATE is aimed to be a powerful testbed and development system at the crossroad of many disciplines: generic aspects of agent models and architectures are supported while the open implementation framework can still be customized to deal with different flavors of agent-oriented programming.
1 Introduction
In this paper, we discuss CORRELATE, a language for multi-agent systems. A major aim of the CORRELATE development team is to build a language that acts as a high level programming interface and that offers an open execution platform for a wide variety of multi-agent systems. The scope of applications we aim to support ranges from large populations of relatively simple agents that exhibit emergent behaviour (for example, see [BDA96]) to intelligent software agents on the Internet (for a challenging example, see [Ste94]). In the sequel of this paper, we will use the term agent space as a synonym for a multi-agent system. An agent space is a system composed of a set of intelligent agents which can cooperate, compete or simply coexist. Agents interact by sending messages to each other. Messages are pieces of information exchanged among agents and are treated as stimuli by the receiving agent. Different types of messages can be exchanged [ST90] (e.g. action messages, observation messages, ...). At the application level, our interest is in supporting multi-agent systems which have been classified as being either cognitive or reactive. The distinction between the two categories probably has become less and less important in itself, but an important feature associated with the two schools of thought is the number of agents that constitute an application, as well as their grain size. The initial focus in this paper is on reactive systems that may expose a relatively large number of finely grained agents. Further on, we will discuss how our work affects the full scope of multi-agent systems.
Our background is in modelling distributed systems using concurrent object-oriented technology. Concurrent object-oriented programming is a powerful paradigm to model autonomous agents that interact with each other. Agent-oriented programming [Sho93] specialises the object-oriented notions by fixing the state (now called mental state) of the objects (now called agents) to consist of instance variables such as beliefs (including beliefs about the world, about themselves, and about one another), capabilities, and decisions. A computation consists of these agents informing, requesting, offering, accepting, rejecting, competing, and assisting one another. These interaction patterns can be described in a programming language by specifying objects and their interactions. CORRELATE is both a language and a system. It offers two major facilities:

1. A programming language that enables a high level description of agents that appear in the application problem domain.
2. A programming environment that supports the engineering of complex multi-agent applications.

The basic idea is based on a separation of concerns: the application developer can program the logical distribution of information without considering the actual execution platform for the application. CORRELATE therefore offers a clear programming interface to support high level descriptions of an agent space, while platform specific optimisations are managed by offering a meta-programming interface. This paper is structured as follows. Section 2 summarises our view on multi-agent systems. Section 3 illustrates how agents can be modelled from an internal viewpoint: the focus of this section is on the modelling of concepts like goals, plans and beliefs. Section 4 illustrates how agents can be modelled from an external viewpoint: the focus of this section is on interaction and synchronisation. In section 5, we will discuss how the CORRELATE framework can be applied for building an agent space with complex and intelligent species (i.e. real cognitive agents). We conclude in section 6.
2 Developing Multi-Agent Systems
In this section, we briefly summarise our view on autonomous agents and on the development process for multi-agent systems. The first part will illustrate why not just any concurrent object-oriented language is a sufficient basis for developing multi-agent systems. The second part pictures the overall context of the CORRELATE project: we are not only interested in language design, but also in supporting the development process of complex applications. This part will also identify demands on the language framework.
2.1 Autonomous agents
Our basic view on an agent is relatively straightforward: agents are active entities (objects) with some level of autonomy. The latter may vary from reactive systems
to goal autonomous systems. The distinction between autonomy levels can be characterised as follows:

- Reactive agents perform activities when corresponding stimuli are perceived. No intelligent indirection appears between perception input and the execution of an action.
- True autonomous agents are capable of exploiting higher-order reasoning capabilities that are essential in fulfilling their role in the agent society. The actual role of an agent is described by a set of goals the agent attempts to achieve. We define a true autonomous agent as an intelligent entity that encapsulates a reactive agent by adding functionality which attempts to exploit reactive capabilities for the sake of the subject's own goals.
  - Plan autonomous systems are capable of selecting an appropriate plan in trying to achieve one predefined goal. A plan is recursively defined as a collection of subplans. Subplans may be directly executable when they consist of a collection of activities of the reactive basis.
  - Goal autonomous systems can handle multiple goals, and dynamically prioritise them. Note that one can consider the capability of an agent to dynamically change its set of goals. Such systems have been called value autonomous systems [VS96]. The relevance of refining a classification of true autonomous agents is limited for the sequel of this paper.
A goal is a description of the future state of the world [SBKL93]. An agent can only evaluate whether a goal has been satisfied by looking at itself, at other agents or at the environment. In our perspective, the environment acts as an indirection between an agent and a large collection of other agents, in fact all the agents that the observer is not directly aware of. In its basic form, a CORRELATE agent will have a notion of itself, a notion of its neighbours and a notion of large collections of agents that cannot be addressed directly. The evaluation of a goal satisfaction function will thus involve the observation of these agents, a capability that will be supported by their reactive behaviour.
2.2 A View on the Development of Multi-Agent Systems
The development of multi-agent systems is inherently difficult. Hence, a clean and powerful program development process is required to achieve reliable and efficient software that remains (trans)portable, maintainable and extensible. The object-oriented methodology has proven to be a very effective approach to software engineering, especially in an area involving substantial complexity, like designing software for a distributed environment. In our model, the top down development of applications for multi-agent systems involves two phases: first the development of a computational model which only includes abstractions from the application's problem domain, and then the development of a physical model which covers the execution environment, for instance on a distributed memory system.
1. To begin with, the application developer must describe the computational model of his application. The resulting description corresponds to the problem domain of the application. In this phase, distribution aspects are completely hidden from the application programmer, who models an application as a set of agents that operate concurrently.
2. Secondly, an optimal execution environment must be described to target the application to the specific architecture it is running on. When treating distribution aspects, the software developer will describe the physical model of his application to target the application to a specific architecture. The entities that will be added are system agents.
[Figure 1: three related views of the development process. The physical view shows the distributed system (network, agent space, files); the computational view shows concurrent object-oriented programming (libraries, reflection, classes); the compositional view shows components and software engineering.]

Fig. 1. A view on the development process
Orthogonal to this top down view of application development stands a bottom up view, which features software composition. Key abstractions and properties of the computational and/or physical models of the software may be represented by reusable components in component stores. These components may be appropriate for the specific requirements of the application, or they may simply enable rapid prototyping. Thus the compositional view becomes a third and very important element in the software development scene (Fig. 1). The description of the computational model of an application is (by definition) a portable program. The remaining challenge then is to combine this
portability advantage with efficient executions on various hardware platforms. In general, one could consider the execution environment as an extended operating system which needs to be tailored to the specific application(s) it is executing.
3 Autonomous Agents in CORRELATE
CORRELATE evolved from a class-based concurrent object-oriented language. A CORRELATE class mainly is a code template that enables the instantiation of individual agents³. In this section, we will briefly discuss in what sense CORRELATE supports key elements of concurrent object-oriented technology and we will focus on the essential elements of the language for supporting true autonomous agents. In other words, we will treat the internal viewpoint on a CORRELATE agent. In its basic form, a CORRELATE agent is pretty much comparable with a concurrent object. An important concern in concurrent environments is the synchronisation of objects. Depending on the state of the object, a specific operation may or may not be executed. In other words, in a certain state an object can accept only a subset of its entire set of operations in order to maintain its internal integrity. This issue can be expressed with synchronisation constraints that reflect application specific semantics of the object. The description of synchronisation constraints inherently forces the programmer to reveal state information of an agent. The problem of specifying synchronisation constraints therefore is related to the inherent conflict between encapsulation (one of the basic features of OO) and concurrency. Our basic approach in designing the CORRELATE language is to define a view on an active object that reveals more state information than the amount that would be required in a sequential model, while maintaining encapsulation as much as possible. CORRELATE objects therefore expose an intermediate abstraction layer which basically corresponds to an abstract state machine. At the level of a class definition, the language's syntax provides a behaviour section that describes the abstract states. Figure 2 describes a Shopper agent that buys airplane tickets for its owner. In principle, each abstract state is represented by a boolean selection operation that can determine whether an object is in the corresponding state or not. For the shopper, IsTicketBought() determines whether the agent has actually bought the ticket. The application programmer then implements the mapping of the actual implementation (the private data members) on the abstract states. A precondition can be specified for each constrained operation. This precondition can use the abstract state and the parameters of the operation. Otherwise it is

³ The code fragments that will be shown throughout this paper are based on CORRELATE classes, because they essentially model large collections of similar agents. However, note that the computational model we have to focus on is the one of interacting agents in an agent space.
active Shopper {
interface: // Reactive Behaviour
    void ShowTicket();
behaviour:
    bool IsTicketBought();
    for ShowTicket() precondition IsTicketBought();
implementation:
    ...
};

Fig. 2. The Shopper class
coded as any other operation of the class. Operations without a precondition are unconstrained. The main distinguishing feature of a CORRELATE agent is its capability to perform so-called autonomous operations. Autonomous operations reflect the autonomy of the instantiated agents. The operational semantics of an autonomous operation causes its invocation each time it is finished. In the rest of this section, we will explain why this feature is useful in supporting autonomous agents. A goal can be satisfied or not; this depends on the state of the system. We consider an autonomous operation to be goal-specific: it will evaluate whether a goal is satisfied and eventually invoke a method that will change the state of a number of agents. This may lead to goal satisfaction. The sophistication of the goal satisfaction function may be subject to variation. In any case, we assume that the execution of a plan is caused by an autonomous operation. To illustrate the use of autonomous operations, consider Figure 3. The figure features an extended version of our shopper agent, which has to fetch us an airplane ticket as cheaply as possible. In other words, the overall goal of the agent is having the correct and reasonably priced ticket. This goal can be achieved by satisfying two subgoals: knowing a TravelingAgency that is not too expensive and obtaining a ticket from such a TravelingAgency. Again, in order to satisfy its goals, the agent employs plans: gathering information in order to find good TravelingAgencies, and buying a ticket in order to obtain one. The overall plan, then, is to gather information about TravelingAgencies and buy a ticket when it knows of at least one appropriate agency. To check this, the agent periodically evaluates whether it has enough information for this decision (EvaluateInfo() in the figure). Looking at Figure 3, we see that each plan is implemented by an autonomous operation. The behaviour section contains the preconditions for each plan. This expresses that certain plans are only feasible when the agent is in a specific state (for example, the agent can only buy a ticket when it knows of a good TravelingAgency) and that plans need not be executed when the corresponding goals are achieved.
active Shopper {
autonomous: // Autonomous behaviour
    void BuyTicket() { ticket = shop $ Buy(destination); }
    void GetInfo();
    void EvaluateInfo();
interface: // Reactive Behaviour
    void ShowTicket();
behaviour:
    bool IsTicketBought();
    bool IsGoodAgencyKnown();
    for BuyTicket() precondition IsGoodAgencyKnown() && NOT IsTicketBought();
    for GetInfo() precondition IsEnoughInfoGathered();
    for ShowTicket() precondition IsTicketBought();
implementation:
    ...
};

Fig. 3. The Shopper agent
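The operational semantics sketched above (an autonomous operation is re-invoked each time it finishes, but only fires when its precondition holds) can be mimicked in a few lines. The sketch below is ours, not CORRELATE code, and collapses GetInfo()/EvaluateInfo() into trivial stand-ins:

class ShopperSketch:
    def __init__(self):
        self.ticket = None
        self.good_agency = None
        self.quotes = []

    # abstract states (the behaviour section)
    def is_ticket_bought(self):
        return self.ticket is not None

    def is_good_agency_known(self):
        return self.good_agency is not None

    def run(self, steps=10):
        # scheduler loop: every autonomous operation is offered for
        # re-invocation each cycle; its precondition gates execution
        for _ in range(steps):
            self.quotes.append("agency quote")        # GetInfo()
            if len(self.quotes) >= 3:                 # EvaluateInfo()
                self.good_agency = "cheap-air"
            if self.is_good_agency_known() and not self.is_ticket_bought():
                self.ticket = "ticket via " + self.good_agency  # BuyTicket()

agent = ShopperSketch()
agent.run()
print(agent.ticket)   # ticket via cheap-air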
4 CORRELATE agents from an external viewpoint
In this section, we discuss how autonomous CORRELATE agents constitute a multi-agent system. An agent space consists of a collection of agents that interact. In addition, advanced synchronisation means must be provided to control the inherent concurrency between autonomous agents.
4.1 Interaction
Agent interaction happens by sending messages (invoking operations). Different models for message passing exist. Synchronous message passing blocks the sender's activity until the operation is executed to completion in the receiver object. With asynchronous message passing, the sender does not have to wait for the completion of the invocation, thus increasing concurrency. CORRELATE supports both synchronous and asynchronous message passing. The application programmer must explicitly specify what kind of message passing (s)he prefers. Group communication is a straightforward extension in the sense that one agent may trigger similar behaviour (actually the same capability) in a set of recipients.

Pattern-based Interaction. Higher level interaction primitives are provided as well. Message based interaction is translated into the possibility to invoke operations on other agents. In order to express a message based interaction the sender must specify the recipient of a particular message. Most programming models
employ a name-based scheme to indicate the recipient(s). Agents will have to use neighbour lists [BJV94] to remember the partners to communicate with. The pitfalls of pure name-based interaction are caused by extensive book-keeping operations that become essential when the neighbour lists change dynamically. We have introduced powerful language constructs to specify coordinated object interaction: group based interaction can be defined on a set of agents which
share a property. These primitives are called pattern-based group communication. This facility assists in specifying sender-initiated coordination towards a group of recipients based on some of their properties. We have discussed this feature extensively in [BJV94].
4.2 Synchronisation
In section 3, we have introduced the behaviour section of a CORRELATE class. The discussion remained restricted so far: preconditions have been used to show constraints that are solely determined by the internal state of an individual agent. We now discuss multi-object synchronisation constraints to express the synchronisation constraints that are determined by the state of multiple agents in the agent space.
active LonelyPhilosopher {
autonomous: // Autonomous behaviour
    void Eat(void);
    void Think(void);
behaviour:
    bool IsHungry();
    bool IsNotHungry();
    for Eat() precondition IsHungry();
    for Think() precondition IsNotHungry();
implementation:
    ...
};

Fig. 4. The LonelyPhilosopher class
Group Synchronisation. We will illustrate the use of multi-object synchronisation constraints by treating the classical example of the dining philosophers. An individual philosopher could be modelled as being either Hungry or NotHungry. Being Hungry is a precondition for Eat(), being NotHungry is a precondition for Think(). Such a philosopher could be modelled as indicated in Figure 4.
active DiningPhilosopher : public LonelyPhilosopher {
behaviour:
    bool IsEating();
    bool IsNotEating();
    for Eat() precondition IsHungry() && _left$IsNotEating() && _right$IsNotEating();
};

Fig. 5. The DiningPhilosopher class
A real dining philosopher will not exist in isolation: more sophisticated synchronisation is needed when assembling three or more similar philosophers around the same table while each stick is shared between two neighbours. However, when sitting at the table with his colleagues, a dining philosopher is not really interested in his neighbours being hungry or not, but rather in the question whether they are actually eating (and using the shared resource) or not. The DiningPhilosopher agent therefore adds another view on the existing LonelyPhilosopher, which is modelled in the behaviour section of the class. Thus we see in Figure 5 a simple example of multi-agent synchronisation⁴.
active ElegantDiningPhilosopher : public LonelyPhilosopher {
behaviour:
    bool IsEating();
    bool IsNotEating();
    for Eat() precondition IsHungry() && ElegantDiningPhilosopher[IsNeighbour(location)]IsNotEating();
};

Fig. 6. The ElegantDiningPhilosopher class
Pattern-based specification of other agents is also possible when expressing multi-agent synchronisation: Figure 6 shows a more elegant solution of the same problem. One can see how explicit data members are removed from the high level agent description⁵. We refer to [BJV94] for a more complete description of

⁴ Note that we have used the inheritance mechanism to add an extra view to the existing agent.
⁵ In fact, CORRELATE programmers are expected to provide an indirection between the high-level description of the agent class and the implementation details. This typically results in extra observation operations.
the syntax for pattern-based interaction/synchronisation. Initially, we introduced multi-object synchronisation as a facility to specify coordination (conflict resolution) by expressing proper synchronisation constraints that restrict the inherent concurrency that results from a multi-agent model. An important added value of these primitives, however, is their capability to express goal satisfaction functions at a high level. Indeed, as goals are desired states of the world, one can write a goal satisfaction function as a precondition that refers to all the agents that are affected when expressing the goal. Note again that an agent can exploit the pattern-based features of CORRELATE to refer to agents it is not directly aware of.
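Before turning to the implementation, a tiny simulation conveys what such multi-object preconditions buy: an agent may eat only while neither neighbour is eating. This sketch is ours; it ignores fairness and the pattern-based lookup, wiring the neighbours explicitly:

import random

class Philosopher:
    def __init__(self):
        self.eating = False
        self.left = self.right = None    # neighbours, wired up below

    def can_eat(self):
        # multi-object precondition: both neighbours must not be eating
        return not self.left.eating and not self.right.eating

phils = [Philosopher() for _ in range(5)]
for i, ph in enumerate(phils):
    ph.left, ph.right = phils[i - 1], phils[(i + 1) % 5]

for _ in range(20):                      # crude sequential scheduler
    ph = random.choice(phils)
    if ph.eating:
        ph.eating = False                # finish eating
    elif ph.can_eat():
        ph.eating = True                 # precondition satisfied: Eat()

# the invariant the precondition enforces: no two adjacent eaters
assert all(not (p.eating and p.right.eating) for p in phils)
print([p.eating for p in phils])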
4.3 The Language Implementation Framework
Autonomous operations and asynchronous object invocations introduce logical concurrency into the object space. Synchronisation constraints must be taken care of. These concurrency issues must be realized by the run-time system of CORRELATE. For each object described in CORRELATE there is an individual meta-object. These individual meta-objects (also called local reference objects) exist in the run-time of CORRELATE. To enforce the synchronisation constraints we must be able to delay the execution of an operation. For this purpose the local reference object intercepts all the object invocations and stores them in a message queue. Consequently an object invocation must be reified: an invocation is a first class object with operations to store, examine, manipulate and forward messages. A local reference object executes the following algorithm:

select an operation from the message queue
if (precondition fulfilled) {
    delegate operation towards the application object
} else {
    restore the operation in the message queue
}

The local reference object encapsulates the synchronisation constraints of the application object. It is obvious that local reference objects generate run-time overhead which cannot be ignored. However, significant progress has been made in optimizing such architectures [MMWY92], and the ratio between computation and communication for one particular application can definitely result in acceptable performance.
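The algorithm above can be rendered as a small meta-object sketch. The code is an illustration under our own naming conventions, not the CORRELATE runtime: invocations are reified as queued tuples and delegated only when the target's precondition holds.

from collections import deque

class LocalReferenceObject:
    def __init__(self, target):
        self.target = target
        self.queue = deque()

    def invoke(self, op, *args):
        self.queue.append((op, args))                  # reified invocation

    def step(self):
        # one pass over the message queue, as in the algorithm above
        for _ in range(len(self.queue)):
            op, args = self.queue.popleft()
            pre = getattr(self.target, "pre_" + op, None)
            if pre is None or pre(*args):
                getattr(self.target, op)(*args)        # delegate to object
                return True
            self.queue.append((op, args))              # restore in queue
        return False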
5 Discussion
CORRELATE has been developed in a close collaboration between application domain experts and language oriented research groups (ourselves in the first place). For instance, in [BJV94] we have illustrated how CORRELATE can be exploited in the development of reactive multi-agent systems. In this context, the availability of coordination primitives has shifted the programming interface to a level which is far above the traditional stimuli-response paradigm that is often the basis for programming reactive systems. The main question which we have to address in this section is related to the overall applicability of our framework. How and to what extent do we support cognitive agents? The answer contains two elements:

- First, note that sophisticated (intelligent) agents have been built in our framework [JBR+95]. In particular, we refer to a load balancing framework that improves the execution of irregular scientific simulations: load managers that are instantiated from this framework are able to dynamically modify their policy. Appropriate cooperation protocols have been used for this purpose. Also, one can see from some of the examples in this paper (Figure 3) that the construction of sophisticated agents is realistic indeed.
- Secondly, and much more important, the CORRELATE run-time system is a strong base for embedding high level agent interpreters. For example, in [Sho93], an agent interpreter has been modelled as an execution engine that executes a basic control loop:
  1. read current messages and update the mental state of the agent;
  2. execute the commitments for the current time.
  Depending on a particular agent model/language, a variation on the basic loop of the interpreter may be defined. An excellent and comprehensive example can be found in [Rao96]: here an abstract interpreter is presented for AgentSpeak, a relative of the Procedural Reasoning System PRS [IGR92]. One of the key advantages of the CORRELATE open language framework is that the individual meta-object of an agent can be specialised to execute the agent interpreter. In other words, the algorithm that is embedded in a local reference object can be adapted to match the operational semantics of an abstract interpreter.

The second element of the above mentioned observations indicates the road we want to follow when supporting intelligent agents with CORRELATE: the language acts as an open implementation framework in which the specific semantics of a particular agent language can be embedded. Our long term goal is to enable CORRELATE to act as a core support system for multiple agent-oriented languages. We believe a two-level interpreter will be needed, consisting of a language dependent and a generic part, the latter being implemented in CORRELATE. This division will allow us to re-use a generic framework to support various AOLs. This design is considered to be essential in a large-scale
environment, since experience shows that no single agent language will ever satisfy all users. As such, the interpreter must be adaptable to newly developed agent-oriented languages.
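The generic half of such a two-level interpreter is essentially the control loop quoted from [Sho93]. A minimal, purely illustrative rendering (all names ours):

class SimpleAgent:
    def __init__(self):
        self.beliefs = set()
        self.commitments = []                  # (time, action) pairs

    def update_mental_state(self, message):
        self.beliefs.add(message)              # crude belief update
        if message == "request:ping":
            self.commitments.append((1, lambda: print("pong")))

    def commitments_due(self, now):
        due = [a for t, a in self.commitments if t <= now]
        self.commitments = [(t, a) for t, a in self.commitments if t > now]
        return due

def interpret(agent, inbox, steps=3):
    for now in range(steps):
        while inbox:                           # 1. read messages, update state
            agent.update_mental_state(inbox.pop(0))
        for action in agent.commitments_due(now):
            action()                           # 2. execute due commitments

interpret(SimpleAgent(), ["request:ping"])     # prints "pong" at time 1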
Project Status and Ongoing Work. The CORRELATE system is currently being used on DEC/Alphas running BSD-based OSF/1 (Digital Unix), on Sun/Solaris 2.x and on SGI IRIX 5.3. It is clear that many of the benefits that emerge from exploiting object technology will become more apparent when using a base level operating system that supports objects/agents as a key abstraction. CORRELATE class definitions are typically translated into C++ code for individual meta-objects. The result is a relatively efficient execution environment. This definitely is an advantage when building multi-agent systems that are expected to scale without unacceptable performance losses. We believe this currently is a problem when developing cognitive multi-agent systems.
6 Conclusion
In this paper, we have introduced CORRELATE, a programming language with an open implementation framework. In principle, CORRELATE exploits the know-how of concurrent object-oriented programming languages with high level synchronisation primitives. This is an essential element to support multi-agent systems, as the inherent concurrency between the computational agents must be controlled. Another specific element of CORRELATE is its capability to define autonomous operations: we have illustrated how these language elements can be used to program reactive and cognitive multi-agent systems. We are extending the language in order to support high-level interactions (negotiation, cooperation, ...) such that the remaining programming efforts become minimal. We believe that in the long run, multi-agent systems will be based on multi-paradigm technology in which, for instance, rule-based and constraint-based approaches will play a dominant role in domain specific agent languages. However, it seems also clear that from the software engineering point of view, core technology will be built in the distributed systems community, where concurrent object technology will be needed to master the complexity of a powerful and generic execution engine for multi-agent systems. We believe that CORRELATE is a useful testbed and development system at the crossroads of the above mentioned disciplines: generic aspects of agent models and architectures are supported while the open implementation framework can still be customised to deal with different flavours of agent-oriented programming.
References
[BDA96] Christof Baeijs, Yves Demazeau, and Luis Alvares. SIGMA: Application of Multi-Agent Systems to Cartographic Generalization. In Modelling Autonomous Agents in a Multi-Agent World: Agents Breaking Away, LNAI 1038, pages 163-176. Springer-Verlag, 1996.
[BJV94] Stijn Bijnens, Wouter Joosen, and Pierre Verbaeten. Language Constructs for Coordination in an Agent Space. In Y. Demazeau, J.-P. Muller, and J. Perram, editors, Modelling Autonomous Agents in a Multi-Agent World, Lecture Notes in Artificial Intelligence. Springer-Verlag, 1994. To be published.
[IGR92] F.F. Ingrand, M.P. Georgeff, and Anand S. Rao. An architecture for real-time reasoning and system control. IEEE Expert, 7(6), 1992.
[JBR+95] Wouter Joosen, Stijn Bijnens, Bert Robben, Johan Van Oeyen, and Pierre Verbaeten. Flexible load balancing software for parallel applications in a time-sharing environment. In Bob Hertzberger and Giuseppe Serazzi, editors, High-Performance Computing and Networking, LNCS 919, pages 398-406. Springer-Verlag, 1995.
[MMWY92] Hidehiko Masuhara, Satoshi Matsuoka, Takuo Watanabe, and Akinori Yonezawa. Object-oriented concurrent reflective languages can be implemented efficiently. In Proceedings OOPSLA '92, ACM SIGPLAN Notices, volume 27, number 10, pages 127-XXX, October 1992.
[Rao96] Anand S. Rao. AgentSpeak(L): BDI Agents Speak Out in a Logical Computable Language. In Modelling Autonomous Agents in a Multi-Agent World: Agents Breaking Away, LNAI 1038, pages 42-55. Springer-Verlag, 1996.
[SBKL93] Donald Steiner, Alastair Burt, Michael Kolb, and Christelle Lerin. The Concept MAI2L. In Modelling Autonomous Agents in a Multi-Agent World: From Reaction to Cognition, pages 144-151. Lecture Notes in Artificial Intelligence, 1993.
[Sho93] Y. Shoham. Agent-Oriented Programming. Artificial Intelligence, 60:51-92, 1993.
[ST90] T. Sueyoshi and M. Tokoro. Dynamic Modeling of Agents for Coordination. In Proceedings of the European Workshop on Modelling Autonomous Agents in a Multi-Agent World (MAAMAW'90), August 1990.
[Ste94] Luc Steels. Beyond Objects. In Object-Oriented Programming, Proceedings of ECOOP'94, pages 1-11. Lecture Notes in Computer Science, 1994.
[vs96] H.J.E. Verhagen and R.A. Smit. Modeling Social Agents in a Multi-Agent World. In Position Paper for the 7th European Workshop on Modelling Autonomous Agents in a Multi-Agent World, pages 163-176, 1996.
Modelling an Agent's Mind and Matter
Catholijn M. Jonker, Jan Treur
Vrije Universiteit Amsterdam
Department of Mathematics and Computer Science, Artificial Intelligence Group
De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
Email: {jonker,treur}@cs.vu.nl
URL: http://www.cs.vu.nl
Abstract. In agent models it is often assumed that the agent maintains internal representations of the material world (e.g., its beliefs). An overall model of the agent and the material world necessarily incorporates sub-models for physical simulation and symbolic simulation, and a formalisation of the (static and dynamic) representation relation between the two types of sub-models. If it is also taken into account that the agent's mind has a materialisation in the form of a brain, the relations between mind and matter become more complex. The question of how the different types of interaction between the mind and matter of an agent and the material world can be modelled in a semantically sound manner is the main topic of this paper. The model can be used to simulate a variety of phenomena in which (multiple) mind-matter interactions occur, such as sensing, acting, (planned) birth and death, the causing of brain damage, and psychosomatic diseases.
1 Introduction
To be able to maintain interaction with a dynamic (material) world is one of the crucial abilities of most agents. How to define, in a formal manner, the semantics of the relation of an agent with the material world is a nontrivial issue. For example, in [Pylyshyn, 1984] the relations are described by so-called transducers that connect aspects of the material world to symbolic representations and vice versa. Also in practical agent modelling projects the division and relation between agent and material world is not trivial. For example, a cup on a table can be considered part of the material world, but it is also convenient to consider material aspects of the agent as part of the world; for example, a relation between the cup and a robot gripper that has picked up the cup can then be viewed as part of the structure of the material world. This perspective can be extended to a material world describing two agents shaking hands, or even one agent the left hand of which has gripped the right hand. These external material agent aspects (the agent's matter) can be modelled as distinct from the internal mental aspects of the agent, such as its beliefs about the world, its goals and plans, and its reasoning (the agent's mind). If it is also taken into account that the agent's mind has a materialisation in the form of a brain, the relations between mind and matter become more complex. The question of how the different manners in
which the mind and matter aspects of an agent relate, and how their interaction can be modelled is the main topic of the current paper.
1.1 The Knowledge Representation Hypothesis
One of the starting points is the knowledge representation hypothesis formulated in [Smith, 1982, 1985]. The essence of this hypothesis is the strict division between (a) the meaning of a representation, which can be attributed from outside, and (b) the manipulation of representation structures independent of their meaning; that is, manipulation proceeds on the basis of form only. In logic the knowledge representation hypothesis is the basis for formal systems. These systems formally define a language (in which formulae stand for the representations of knowledge), e.g., the language of predicate logic. The attribution of semantics is formalised by formal structures (called models, standing for the world states the knowledge refers to); e.g., [Tarski, 1956; Dalen, 1980; Chang and Keisler, 1973; Hodges, 1993]. For connections to reasoning systems see, e.g., [Weyhrauch, 1980; Treur, 1988, 1991]. The manipulation of these syntactical structures is based on inference rules, such as modus ponens and conjunction introduction. These inference rules are defined in a generic manner: they do not depend on the meaning of the formulae to which they are applied. Formal systems as defined in logic can be used to formalise cognitive representation systems and their (reference) relation with the material world they represent. However, there is a second type of relation between a cognitive system and the material world: the cognitive representations themselves are embodied in a material form in the brain.
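As a deliberately minimal illustration of this division between form and meaning, modus ponens can be implemented as pure symbol manipulation that never consults an interpretation (the Python tuple encoding of formulae below is our own):

# Modus ponens operating on syntactic shape only: the rule inspects the
# pattern ("implies", A, B) and the presence of A among the facts, never
# the meaning attributed to A or B.
def modus_ponens(facts, implication):
    kind, antecedent, consequent = implication
    if kind == "implies" and antecedent in facts:
        return consequent
    return None

facts = {"raining"}
print(modus_ponens(facts, ("implies", "raining", "street_wet")))
# -> 'street_wet', derived without any attribution of semantics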
1.2 Two Types of Representations
Although it is not known in full detail how the mental activities of a human agent take place, it is clear that the brain is an essential material aspect of them. Every thought is somehow physically embodied within the brain, and every reasoning process is performed as physical brain activity. This is a completely different relation between a
Fig. 1. Dual representation relations
symbolic system and a material system: it has nothing to do with the content of the symbolic representation (i.e., the material world aspects the representations refer to), but only with the form in which the representation is materialised. In this paper we interpret the materialisation of representations as a second process of representation, to which the knowledge representation hypothesis can again be applied. For example, consider the concept of time. The symbolic representation noon can be represented in a material manner by a clock. A clock, a material piece of machinery, represents the symbol noon by the material configuration in which both hands of the clock point upward. Manipulations of these material representations take place according to physical laws that indeed (as demanded by the knowledge representation hypothesis) are independent of the content the representations refer to; i.e., the movement of the hands of the clock just follows physical laws and is not affected in any manner by our attribution of semantics to the material configurations that occur. Thus, following the knowledge representation hypothesis, it is not only possible to represent material aspects in a symbolic manner; it is also possible to represent symbolic or mental aspects in a material manner. We distinguish the two types of representation as material representation versus symbolic representation. Dual representation relations are obtained (see Figure 1): material aspects of the world have a symbolic representation, and symbolic aspects have a material representation. Note that these relations are not related in a direct manner; e.g., they are not each other's inverse. Specific and bi-directional types of mind-matter interaction do occur frequently: observations in the material world affecting the information in the brain (sensing), mental processes leading to material actions affecting the world (acting), material processes in the world affecting the brain itself (e.g., causing brain damage), or mental processes affecting the material state of the body (e.g., causing psychosomatic diseases).
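As a rough illustration, the two relations can be pictured as two independent mappings; the following sketch (in Python, with entirely hypothetical entries) makes explicit that they are not each other's inverse:

# Dual representation relations as two separate maps; one relates material
# configurations to symbols, the other relates symbols to (different)
# material configurations.
symbolic_representation = {                 # material aspect -> symbolic aspect
    "clock_hands_both_up": "noon",
}
material_representation = {                 # symbolic aspect -> material aspect
    "noon": "neural_configuration_n17",     # hypothetical brain state
}
print(symbolic_representation["clock_hands_both_up"])   # -> 'noon'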
Fig. 2. Simulating the material world and the cognitive symbolic system representing it
1.3 Simulation of Material and Symbolic Processes and their Interactions
The model developed in this paper simulates both types of mind-matter interaction. The material world in which agents live and think is depicted in Figure 2 at the bottom right. The cognitive symbolic system depicted at the top right represents the world and performs reasoning about the world (cf. [Lindsay and Norman, 1977; Newell, Rosenbloom and Laird, 1989; Newell, 1980; Laird, Newell and Rosenbloom, 1987; Simon and Kaplan, 1989]). In order to make a model of the interacting material and symbolic processes that is executable on a computer system, a (formal) simulation model can be made. The simulation model is depicted on the left-hand side of Figure 2. It formalises the following processes:
- the material processes in the physical world
- the symbolic processes in the cognitive system
- the interaction between these two types of processes
Note that a simulation does not pretend to have exactly the same behaviour as the original system: a rough approximation may be sufficient to obtain a specific insight into these processes.
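A possible skeleton for such a simulation model is sketched below; the function names and the division into four transition functions are our own illustration, not the actual DESIRE realisation used later in the paper:

# Coupled simulation of material and symbolic processes: each round first
# advances both sub-models, then applies the upward (sensing) and downward
# (acting) interactions between them.
def simulate(material, symbolic, steps,
             material_step, symbolic_step, upward, downward):
    for _ in range(steps):
        material = material_step(material)       # physical laws
        symbolic = symbolic_step(symbolic)       # reasoning
        symbolic = upward(material, symbolic)    # world -> representation
        material = downward(symbolic, material)  # representation -> world
    return material, symbolic

# trivial usage with identity transitions:
print(simulate({"t": 0}, set(), 1,
               lambda m: m, lambda s: s,
               lambda m, s: s, lambda s, m: m))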
2 An Example of Multiple Interaction Between Mind and Matter
Consider an agent walking down a street; see Figure 3 (position p1). The agent observes that he can buy an ice-cream in the supermarket across the street (the supermarket is at position p3 in Figure 3). As he has a desire for ice-cream, he sets himself the goal of visiting the supermarket. To do this he has to cross the street. Although the shrub to his left limits his view of the road, he decides to cross the street, as he does not see any cars.
Fig. 3. Initial situation
Fig. 4. Situation at the time of the accident
Unfortunately, there is a car coming down the street. The driver, being a bit in a hurry, comes around the curve with the shrub (position p2 in Figures 3 and 4) at the same moment that the agent arrives at position p2. As can be seen in Figure 4, the car hits the agent. Although the accident is a minor one (the agent has no permanent injuries), the agent is momentarily stunned and suffers from temporary amnesia (his
short-term memory is lost). One of the effects is that the agent cannot remember his goal to visit the supermarket. Furthermore, he cannot remember any of the observations made prior to crossing the street. Realising that he lacks knowledge about his present predicament, the agent decides to observe his surroundings (again).
3 DESIRE: a Modelling Framework for Multi-Agent Systems
The model described in this paper is specified within the compositional modelling framework DESIRE for multi-agent systems (framework for DEsign and Specification of Interacting REasoning components; cf. [Brazier, Dunin-Keplicz, Jennings, Treur, 1995, 1997]). A number of generic models for agents and their tasks have been developed in DESIRE and have been used for a number of applications (e.g., [Brazier, Jonker and Treur, 1997]). The architectures upon which specifications for compositional multi-agent systems are based are the result of an analysis of the tasks performed by individual agents and groups of agents. Task compositions include specifications of interaction between tasks at each level within a task composition, making it possible to explicitly model tasks which entail interaction between agents, and between agents and the world (which is modelled as a separate component). Models specified within DESIRE define the structure of compositional architectures: components in a compositional architecture are directly related to tasks in a task composition. The hierarchical structures of tasks, interaction and knowledge are fully preserved within compositional architectures. Below, the formal compositional framework DESIRE for modelling multi-agent tasks is introduced, in which the following aspects are modelled and specified:
(1) task composition,
(2) information exchange,
(3) sequencing of tasks,
(4) task delegation,
(5) knowledge structures.
3.1 Task Composition
To model and specify the composition of tasks, knowledge of the following types is required:
- a task hierarchy,
- information a task requires as input,
- information a task produces as a result of task performance,
- meta-object relations between tasks.
Within a task hierarchy composed and primitive tasks are distinguished: in contrast to primitive tasks, composed tasks are composed of other tasks, which, in turn, can be either composed or primitive. Tasks are directly related to components: composed tasks are specified as composed components and primitive tasks as primitive components.
Information required/produced by a task is defined by the input and output signatures of a component. The signatures used to name the information are defined in a predicate logic with a hierarchically ordered sort structure (order-sorted predicate logic). Units of information are represented by the ground atoms defined in the signature. The role information plays within reasoning is indicated by the level of an atom within a signature: different (meta-)levels may be distinguished. In a two-level situation the lowest level is termed object-level information and the second level meta-level information. Meta-level information contains information about object-level information and reasoning processes; for example, for which atoms the values are still unknown (epistemic information). Similarly, tasks which include reasoning about other tasks are modelled as meta-level tasks with respect to object-level tasks. Often more than two levels of information and reasoning occur, resulting in meta-meta-... information and reasoning.
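As a rough illustration (not DESIRE syntax; the Python classes are ours, and the component names are taken from the agent model of Section 5), a task hierarchy maps onto nested components with input and output signatures as follows:

# Composed tasks become composed components, primitive tasks primitive
# components; each component names the information it requires/produces.
class PrimitiveComponent:
    def __init__(self, name, inputs, outputs):
        self.name, self.inputs, self.outputs = name, inputs, outputs

class ComposedComponent:
    def __init__(self, name, subcomponents):
        self.name, self.subcomponents = name, subcomponents

agent = ComposedComponent("agent", [
    PrimitiveComponent("own_process_control",
                       inputs=["most_recent_observation"],
                       outputs=["to_be_executed", "to_be_observed"]),
    PrimitiveComponent("maintain_world_information",
                       inputs=["observed_world_info"],
                       outputs=["most_recent_observation"]),
])
print([c.name for c in agent.subcomponents])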
3.2 Information Exchange
Information exchange between tasks is specified as information links between components. Two types of information links are distinguished: private information links and mediating information links. For a given parent component, a private information link relates output of one of its components to input of another, by specifying which truth value of a specific output atom is linked with which truth value of a specific input atom. Atoms can be renamed: each component can be specified in its own language, independent of other components. In a similar manner, mediating links transfer information from the input interface of the parent component to the input interface of one of its components, or from the output interface of one of its components to the output interface of the parent component itself. Mediating links specify the relation between the information at two adjacent abstraction levels in the component hierarchy. The conditions for activation of information links are explicitly specified as task control knowledge.

3.3 Sequencing of Tasks
Task sequencing is explicitly modelled within components as task control knowledge. Task control knowledge includes not only knowledge of which tasks should be activated when and how, but also knowledge of the goals associated with task activation and the extent to which goals should be derived. These aspects are specified as component and link activation, together with task control foci and extent to define the component's goals. Components are, in principle, black boxes to the task control of an encompassing component: task control is based purely on information about the success and/or failure of component reasoning. Reasoning of a component is considered to have been successful with respect to its task control focus if it has reached the goals specified by this task control focus to the extent specified (e.g., any or every).

3.4 Delegation of Tasks
During knowledge acquisition a task as a whole is modelled. In the course of the modelling process, decisions are made as to which tasks are (to be) performed by which agent. This process, which may also be performed at run-time, results in the
delegation of tasks to the parties involved in task execution. In addition to these specific tasks, generic agent tasks, such as interaction with the world (observation) and with other agents (communication and cooperation), are often assigned.
3.5 Knowledge Structures
During knowledge acquisition an appropriate structure for domain knowledge must be devised. The meaning of the concepts used to describe a domain, and the relations between concepts and groups of concepts, are determined. Concepts are required to identify the objects distinguished in a domain (domain-oriented ontology), but also to express the methods and strategies employed to perform a task (task-oriented ontology). Concepts and relations between concepts are defined in hierarchies and rules based on order-sorted predicate logic. In a specification document, references to appropriate knowledge structures (specified elsewhere) suffice; compositional knowledge structures are composed by reference to other knowledge structures.
The semantics of the modelling language are based on temporal logic (cf. [Engelfriet and Treur, 1994; Treur, 1994; Brazier, Treur, Wijngaards and Willems, 1996]). By explicitly modelling and specifying the semantics of static and dynamic aspects of a system, a well-defined conceptual description is acquired that can be used for verification and validation, but also serves as a basis for reuse. Conceptual design is supported by graphical tools within the DESIRE software environment. Translation to an operational system is straightforward; the software environment includes implementation generators with which formal specifications can be translated into executable code. DESIRE has been successfully applied to the design of both single-agent and multi-agent systems.
4 The Material World and its Symbolic Representation
In this section the material world and its symbolic representation, as well as the concept of transducers, are discussed, applying the approach introduced above. In Figure 5 the component material_world simulates the actual material world. All changes with respect to physical aspects of objects take place within this component. The component symbolic_representation_of_material_world simulates the state of the symbolic representation of the material world over time. Both components and their interaction will be discussed in more detail in the subsequent (sub)sections.
4.1 Material World
As discussed in the Introduction, the material world is simulated by a specification in terms of executable temporal rules. The vocabulary within the component material_world in which these temporal rules are expressed is defined by a signature that has a compositional structure:

signature material_world_sig
  signatures
    generic_material_world_sig, specific_material_world_sig, specific_material_brain_sig;
end signature
Fig. 5. Transduction links between the material world and its symbolic representation

The (composed) signature material_world_sig refers to three other signatures, which are specified below. Referring to another signature means that all language elements of that other signature can be used to determine the vocabulary specified by the signature.

signature generic_material_world_sig
  sorts
    ACTION, AGENT, AGENT_PROPERTY, EVENT, OBJECT, POSITION, PROPERTY, SIGN, TIME;
  sub-sorts
    ACTION : EVENT;
    AGENT : OBJECT;
    AGENT_PROPERTY : PROPERTY;
  objects
    agent : AGENT;
    neg, pos : SIGN;
    t0, t1, t2, t3 : TIME;
  functions
    position : OBJECT * POSITION -> PROPERTY;
  relations
    at_time : PROPERTY * SIGN * TIME;
    current_time : TIME;
    currently : PROPERTY * SIGN;
    effect : EVENT * PROPERTY * SIGN;
    event_after : EVENT * TIME;
    event_to_happen : EVENT;
    next : PROPERTY * SIGN;
    next_time_point : TIME;
    precedes : TIME * TIME;
end signature

signature specific_material_world_sig
  objects
    car_to_appear : EVENT;
    car, ice_cream, supermarket : OBJECT;
    p1, p2, p3 : POSITION;
    car_present : PROPERTY;
  functions
    close_by_for : OBJECT * AGENT -> PROPERTY;
    goto : POSITION -> ACTION;
    has_hit : OBJECT * OBJECT -> PROPERTY;
    next_on_path : POSITION * POSITION * POSITION -> PROPERTY;
    sells : OBJECT * OBJECT -> PROPERTY;
  relations
    current_action_by : ACTION * AGENT;
    current_observation_by : PROPERTY * AGENT;
    current_observation_result_of : PROPERTY * SIGN * OBJECT;
end signature

signature specific_material_brain_sig
  sorts
    AGENT_ATOM, BRAIN_LOCATION, INFORMATION_OBJECT, LTM_LOCATION, STM_LOCATION;
  sub-sorts
    AGENT_ATOM : AGENT_PROPERTY;
    BRAIN_LOCATION : POSITION;
    LTM_LOCATION : BRAIN_LOCATION;
    STM_LOCATION : BRAIN_LOCATION;
    INFORMATION_OBJECT : OBJECT;
  functions
    contents_of_stm_to_ltm : INFORMATION_OBJECT * STM_LOCATION -> EVENT;
    has_amnesia : AGENT -> AGENT_PROPERTY;
    information_object : AGENT_ATOM * SIGN -> INFORMATION_OBJECT;
    ltm_location : AGENT_ATOM * TIME -> LTM_LOCATION;
    recovered : AGENT -> AGENT_PROPERTY;
    recovering : AGENT -> EVENT;
    stm_location : AGENT_ATOM * TIME -> STM_LOCATION;
  relations
    to_be_stored : AGENT_ATOM * SIGN;
end signature
The (temporal) knowledge simulating the processes within the material world is specified as follows:

/* domain dependent knowledge */
at_time(position(supermarket, p3), pos, T : TIME) ;
at_time(sells(supermarket, ice_cream), pos, T : TIME) ;
at_time(position(agent, p1), pos, t1) ;
at_time(car_present, neg, t1) ;
effect(car_to_appear, position(car, p1), neg) ;
effect(car_to_appear, position(car, p2), pos) ;
effect(goto(P : POSITION), position(agent, P : POSITION), pos) ;
effect(recovering(X : OBJECT), has_amnesia(X : OBJECT), neg) ;
event_after(car_to_appear, t1) ;
precedes(t0, t1) ;
precedes(t1, t2) ;
precedes(t2, t3) ;
if   at_time(position(A : AGENT, p1), pos, T : TIME)
   and at_time(position(O : OBJECT, p3), pos, T : TIME)
then at_time(close_by_for(O : OBJECT, A : AGENT), pos, T : TIME) ;

if   current_observation_by(P : PROPERTY, A : AGENT)
   and current_time(T : TIME)
   and at_time(P : PROPERTY, S : SIGN, T : TIME)
then current_observation_result_of(P : PROPERTY, S : SIGN, A : AGENT) ;
/* domain independent knowledge */
if   at_time(position(X : OBJECT, P : POSITION), pos, T1 : TIME)
   and at_time(position(Y : OBJECT, Q : POSITION), pos, T1 : TIME)
   and not equal(X : OBJECT, Y : OBJECT)
   and not equal(P : POSITION, Q : POSITION)
   and precedes(T1 : TIME, T2 : TIME)
   and at_time(position(X : OBJECT, R : POSITION), pos, T2 : TIME)
   and at_time(position(Y : OBJECT, R : POSITION), pos, T2 : TIME)
then at_time(has_hit(X : OBJECT, Y : OBJECT), pos, T2 : TIME) ;

if   at_time(has_hit(X : OBJECT, agent), pos, T : TIME)
then at_time(has_amnesia(agent), pos, T : TIME) ;

if   current_action_by(A : ACTION, X : AGENT)
   and effect(A : ACTION, P : PROPERTY, S : SIGN)
then next(P : PROPERTY, S : SIGN) ;

if   event_to_happen(E : EVENT)
   and effect(E : EVENT, P : PROPERTY, S : SIGN)
then next(P : PROPERTY, S : SIGN) ;

if   currently(has_amnesia(X : OBJECT), pos)
then event_to_happen(recovering(X : OBJECT)) ;

if   current_time(T2 : TIME)
   and precedes(T1 : TIME, T2 : TIME)
   and at_time(position(I : INFORMATION_OBJECT, B : STM_LOCATION), pos, T1 : TIME)
then event_to_happen(contents_of_stm_to_ltm(I : INFORMATION_OBJECT, B : STM_LOCATION)) ;

if   current_time(T1 : TIME)
   and precedes(T1 : TIME, T2 : TIME)
then effect(contents_of_stm_to_ltm(I : INFORMATION_OBJECT, B : STM_LOCATION),
            position(I : INFORMATION_OBJECT, ltm_location(I : INFORMATION_OBJECT, T2 : TIME)), pos) ;

if   currently(has_amnesia(X : AGENT), neg)
   and current_time(T2 : TIME)
   and precedes(T1 : TIME, T2 : TIME)
   and not event_to_happen(contents_of_stm_to_ltm(I : INFORMATION_OBJECT, B : STM_LOCATION))
   and at_time(position(I : INFORMATION_OBJECT, B : STM_LOCATION), pos, T1 : TIME)
then at_time(position(I : INFORMATION_OBJECT, B : STM_LOCATION), pos, T2 : TIME) ;

if   current_time(T2 : TIME)
   and precedes(T1 : TIME, T2 : TIME)
   and at_time(position(I : INFORMATION_OBJECT, B : LTM_LOCATION), pos, T1 : TIME)
then at_time(position(I : INFORMATION_OBJECT, B : LTM_LOCATION), pos, T2 : TIME) ;

if   current_time(T : TIME)
   and at_time(P : PROPERTY, S : SIGN, T : TIME)
then currently(P : PROPERTY, S : SIGN) ;

if   event_after(E : EVENT, T : TIME)
   and current_time(T : TIME)
then event_to_happen(E : EVENT) ;

if   not equal(P : POSITION, Q : POSITION)
then effect(goto(P : POSITION), position(agent, Q : POSITION), neg) ;

if   current_time(T1 : TIME)
   and precedes(T1 : TIME, T2 : TIME)
then next_time_point(T2 : TIME) ;

if   next_time_point(T2 : TIME)
   and next(X : PROPERTY, S : SIGN)
then at_time(X : PROPERTY, S : SIGN, T2 : TIME) ;
To execute the temporal rules specified above, updates are required from the current time point to the next time point. These updates are specified by an information link from the component material_world to itself.
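The effect of this feedback link can be mimicked by a toy round of execution (our own simplification, not the DESIRE execution model): facts derived as next(P, S) become at_time(P, S, T2) facts for the next time point and are fed back into the component.

# Toy round of temporal rule execution: "next" conclusions are turned
# into "at_time" facts holding at the next time point.
def advance(at_time_facts, next_facts, t_next):
    updated = set(at_time_facts)
    for prop, sign in next_facts:
        updated.add((prop, sign, t_next))
    return updated

facts = {("position(agent, p1)", "pos", "t1")}
facts = advance(facts, {("position(agent, p2)", "pos")}, "t2")
print(("position(agent, p2)", "pos", "t2") in facts)   # -> True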
4.2 Symbolic Representation of the Material World
In order to reason about the material world and its behaviour, a symbolic representation of the material world is called for. In Figure 5, the component symbolic_representation_of_material_world specifies a simulation of such a representation. The vocabulary used within this component is specified by the following signature.

signature symbolic_representation_of_world_sig
  sorts
    WORLD_TERM;
  meta-descriptions
    material_world_sig : WORLD_TERM;
  relations
    to_be_executed_by : ACTION * AGENT;
    to_be_observed_by : PROPERTY * AGENT;
    just_acquired : WORLD_TERM;
end signature
This signature introduces a new sort WORLD_TERM that is used in the construction of a meta-description of the signature material_world_sig. In the meta-description all n-ary relations of the signature are transformed into n-ary functions into the sort WORLD_TERM. This construction allows, for example, the following atom:

just_acquired(current_observation_result_of(car_present, neg, agent))

Within the component symbolic_representation_of_material_world no knowledge is specified. The component in principle only models the maintenance of representation states. Updates are also maintained within this component, i.e., whenever an observation has been performed. Updates are specified by an information link from the component symbolic_representation_of_material_world to itself.
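The meta-description construction can be illustrated as follows (a sketch; the tuple encoding of atoms as terms is our own choice):

# An object-level atom is reified into a ground term of sort WORLD_TERM,
# so that meta-level predicates such as just_acquired can range over it.
def reify(relation, *args):
    return (relation,) + args            # the atom, now a term

atom = reify("current_observation_result_of", "car_present", "neg", "agent")
meta_atom = ("just_acquired", atom)      # meta-level statement about the atom
print(meta_atom)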
4.3 Transduction Links
As discussed in the Introduction, there are two issues in using a symbolic representation of the material world. The first is how changes in the material world come to be reflected in the symbolic representation (upward transduction). The second is how changes in the symbolic representation of the world affect the material world itself (downward transduction). In Figure 5, the simulations of transducers are modelled within the framework DESIRE as information links between the output and input interfaces of the components material_world and symbolic_representation_of_material_world. The information links that model transducers are called transduction links (and are depicted in italics). The downward transducer is modelled by the transduction link material_effectuation_of_world, the upward one by the transduction link symbolic_representation_of_world. The downward link transfers actions that are to be performed to the component material_world. Given that the computer system uses one or more sensors, observations can be made. The results of observations are transferred to the component symbolic_representation_of_material_world by way of the transduction link symbolic_representation_of_world, during which a symbolic representation
of the observation results is made that can be processed by the receiving component. In Figure 5 each component has a levelled interface (denoted by the rectangles on the side of the components). The transduction link symbolic_representation_of_world transfers epistemic meta-level information on the material world (e.g., expressed by the truth of the atom true(current_observation_result_of(car_present, pos, agent))) to object-level information that can be used by the component symbolic_representation_of_material_world (expressed by the truth of the atom just_acquired(current_observation_result_of(car_present, pos, agent))):

atom links
( true(current_observation_result_of(P : PROPERTY, S : SIGN, agent)),
  just_acquired(current_observation_result_of(P : PROPERTY, S : SIGN, agent)) ) : <<true, true>> ;
( true(current_observation_result_of(P : PROPERTY, S : SIGN, agent)),
  just_acquired(current_observation_result_of(P : PROPERTY, neg, agent)) ) : <> ;
The transduction link material_effectuation_of_world links information on actions to be executed, from the component symbolic_representation_of_material_world, to meta-level information on the material world:

atom links
( to_be_executed_by(A : ACTION, agent),
  assumption(current_action_by(A : ACTION, agent), pos) )
  : <<true, true>, <false, unknown>, <unknown, unknown>> ;
( to_be_observed_by(P : PROPERTY, agent),
  assumption(current_observation_by(P : PROPERTY, agent), pos) )
  : <<true, true>, <false, unknown>, <unknown, unknown>> ;

In this example, the truth value combinations <false, unknown> and <unknown, unknown> ensure that previous actions are retracted, so that actions will not be performed ad infinitum.
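The role of such three-valued mappings can be made explicit with a small sketch (our own encoding, with None standing for the truth value unknown):

# An atom link maps the truth value of the source atom to a truth value of
# the target atom; values not covered by the mapping stay unknown.
UNKNOWN = None
def apply_atom_link(mapping, source_value):
    for src, tgt in mapping:
        if src == source_value:
            return tgt
    return UNKNOWN

# <<true, true>, <false, unknown>, <unknown, unknown>>:
mapping = [(True, True), (False, UNKNOWN), (UNKNOWN, UNKNOWN)]
print(apply_atom_link(mapping, True))    # -> True: action is asserted
print(apply_atom_link(mapping, False))   # -> None: previous action retracted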
5 An Agent's Rational Behaviour in Interaction with the Material World
As discussed in Section 4, the downward transduction link is needed for the actual execution of actions. However, the component symbolic_representation_of_material_world is not modelled as a component in which rational decisions are made on which observation or action is to be performed, and when (pro-active behaviour). Such mental decision processes are modelled in the component agent; see Figure 6.
5.1 Agent
The component agent models the cognitive symbolic reasoning system of an agent as a logical system. The rational agent can decide to perform observations and actions. The vocabulary used within the component agent is specified by the following signature.
Fig. 6. Transduction and symbolic links connecting agent and material world

signature symbolic_agent_sig
  sorts
    WORLD_TERM;
  meta-descriptions
    material_world_sig : WORLD_TERM;
  functions
    current_observation_result : PROPERTY * SIGN -> WORLD_TERM;
    close_by : OBJECT -> PROPERTY;
    own_position : POSITION -> PROPERTY;
    visiting : AGENT -> WORLD_TERM;
  relations
    belief : WORLD_TERM;
    current_belief : PROPERTY * SIGN;
    desire : OBJECT;
    goal : WORLD_TERM;
    most_recent_observation : PROPERTY * SIGN;
    observed : PROPERTY;
    observed_at : PROPERTY * SIGN * TIME;
    possible_observation : PROPERTY;
    to_be_executed : ACTION;
    to_be_observed : PROPERTY;
end signature
Note again the meta-description construct within this signature.

5.2 The Agent Components
The agent is modelled as a composed component consisting of two sub-components, own_process_control and maintain_world_information; see Figure 7. The reasoning about its goals, desires, and plans is performed within the component own_process_control. Its knowledge about the world, obtained by observations, is maintained within the component maintain_world_information.
Fig. 7. Transduction links between the agent and its material representation
The component own_process_control contains the following knowledge:

desire(ice_cream) ;
to_be_observed(own_position(P2 : POSITION)) ;
to_be_observed(car_present) ;

if   desire(G : OBJECT)
then possible_observation(position(S : OBJECT, P1 : POSITION))
   and possible_observation(close_by(S : OBJECT))
   and possible_observation(sells(S : OBJECT, G : OBJECT)) ;
if   possible_observation(P : PROPERTY)
   and not observed(P : PROPERTY)
then to_be_observed(P : PROPERTY) ;

if   current_belief(sells(S : OBJECT, G : OBJECT), pos)
   and desire(G : OBJECT)
   and current_belief(close_by(S : OBJECT), pos)
then goal(visiting(S : OBJECT)) ;

if   goal(visiting(S : OBJECT))
   and current_belief(position(S : OBJECT, P : POSITION), pos)
   and current_belief(own_position(Q : POSITION), pos)
   and current_belief(car_present, neg)
   and current_belief(next_on_path(R : POSITION, Q : POSITION, P : POSITION), pos)
then to_be_executed(goto(R : POSITION)) ;

if   current_time(T : TIME)
   and just_acquired(current_observation_result(P : PROPERTY, S : SIGN))
then observed_at(P : PROPERTY, S : SIGN, T : TIME) ;
The link observed_world_info transfers the just-acquired knowledge about the world from own_process_control to maintain_world_information. The agent obtains this knowledge by observations. The link updates the truth values of the atom most_recent_observation, ensuring that the atom indeed reflects the most recent information about the world.
The link most_recent_observation_results determines the beliefs that are to be held by the agent (within its component own_process_control).
5.3 Symbolic Links
The symbols representing the decisions to perform observations and actions are linked to the symbolic system modelled by the component symbolic_representation_of_material_world. All connections between symbolic systems are called symbolic links. Symbolic links are modelled as information links within the framework DESIRE. The symbolic link that transfers the symbolic representations of observations and actions that are to be performed is called observations_and_actions. This link connects the object level of the output interface of the component agent with the object level of the input interface of the component symbolic_representation_of_material_world:

term links
( close_by(O : OBJECT), close_by_for(O : OBJECT, agent) ) ;
( own_position(P : POSITION), position(agent, P : POSITION) ) ;

atom links
( to_be_observed(P : PROPERTY),
  to_be_observed_by(P : PROPERTY, agent) )
  : <<true, true>, <false, unknown>, <unknown, unknown>> ;
( to_be_executed(A : ACTION),
  to_be_executed_by(A : ACTION, agent) )
  : <<true, true>, <false, unknown>, <unknown, unknown>> ;
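The renaming performed by the term links can be mimicked as a simple substitution on reified terms (an illustrative sketch in the same toy encoding as above; only the two renamings shown in the link are covered):

# Term links rename between component vocabularies: the agent's
# close_by(O) becomes close_by_for(O, agent) in the world representation,
# and own_position(P) becomes position(agent, P).
def rename_term(term, agent_name="agent"):
    relation, *args = term
    if relation == "close_by":
        return ("close_by_for", *args, agent_name)
    if relation == "own_position":
        return ("position", agent_name, *args)
    return term

print(rename_term(("close_by", "supermarket")))
# -> ('close_by_for', 'supermarket', 'agent')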
The results of observations performed within material_world are transferred to the component agent through the transduction link symbolic_representation_of_world (see the previous section) and the symbolic link observation_results, which connects the component symbolic_representation_of_material_world to the component agent:

term links
( close_by_for(O : OBJECT, agent), close_by(O : OBJECT) ) ;
( position(agent, P : POSITION), own_position(P : POSITION) ) ;

atom links
( just_acquired(current_observation_result_of(X : PROPERTY, S : SIGN, agent)),
  just_acquired(current_observation_result(X : PROPERTY, S : SIGN)) )
  : <<true, true>, <false, unknown>> ;
( just_acquired(current_observation_result_of(X : PROPERTY, S : SIGN, agent)),
  observed(X : PROPERTY) ) : <<true, true>> ;
( current_time(T : TIME), current_time(T : TIME) )
  : <<true, true>, <false, unknown>, <unknown, unknown>> ;
6 An Agent and its Material Representation
In Figure 8 the cognitive symbolic system of the agent is modelled by the component agent described in the previous section. The component material_representation_of_agent models the material representation of (the symbolic system of) the agent. As discussed in the Introduction, the relation between the agent and its material representation is modelled in a manner similar to the manner in which the relation between the material world and its symbolic representation is modelled. An upward
transducer defines how symbolic aspects of the agent are represented in a material form; a downward transducer defines how properties of the material world affect the processes of the symbolic system within the agent.
Fig. 8. Transduction links between the agent and its material representation
6.1 Material Representation of an Agent
The vocabulary used within the component material_representation_of_agent is specified by the following composed signature.

signature material_representation_of_agent_sig
  signatures
    generic_material_world_sig, specific_material_brain_sig;
  meta-descriptions
    symbolic_agent_sig : AGENT_ATOM;
end signature
The signature symbolic_agent_sig is used in a meta-description construct, such that the relations of that signature can be used as functions into the sort AGENT_ATOM. The sort AGENT_ATOM is a sub-sort of the sort AGENT_PROPERTY, which in turn is a sub-sort of the sort PROPERTY. Therefore, all relations that have PROPERTY as an argument and are specified in the signatures generic_material_world_sig and specific_material_brain_sig can be applied to the new terms. Within the component material_world a simple model for memory is specified. The component material_representation_of_agent only maintains a state, from and to which information is transferred to and from the component material_world. The only exception
is the following knowledge base rule, which combines the information to be stored with the current time point and determines the actual storage of the information as a physical property:

if   current_time(T : TIME)
   and to_be_stored(A : AGENT_ATOM, S : SIGN)
then at_time(position(information_object(A : AGENT_ATOM, S : SIGN),
                      stm_location(A : AGENT_ATOM, T : TIME)), pos, T : TIME) ;
6.2 Transduction Links
The information maintained by the agent is built of atoms with an explicit reference to their truth value in the form of a sign. An atom is transformed into a term by the transition from the agent to its material representation. For example, the atom observed_at(car_present, neg, t1) that could be used within the component agent is represented by a term within the component material_representation_of_agent. If the atom is true within agent, the sign pos is added within material_representation_of_agent; if the atom is false, the sign neg is added. If the atom has the truth value unknown, it is not stored in material_representation_of_agent. If the agent, by reasoning, makes cognitive changes in its beliefs, desires, goals, or knowledge, the material representations of these changes are materialised in the brain. This process of upward transduction is modelled (see Figures 7 and 8) by the transduction links material_representation_of_agent, representation_info_from_OPC and representation_info_from_MWI. As an example, the atom links of the transduction link representation_info_from_OPC are specified as follows:

atom links
( true(A : AGENT_ATOM), to_be_stored(A : AGENT_ATOM, pos) ) : <<true, true>> ;
( false(A : AGENT_ATOM), to_be_stored(A : AGENT_ATOM, neg) ) : <<true, true>> ;
An example of an instantiated atom link of representation_info_from_OPC is:

( true(observed_at(car_present, neg, t1)),
  to_be_stored(observed_at(car_present, neg, t1), pos) ) : <<true, true>> ;
An example of an instantiated atom link of material_representation_of_agent is:

( to_be_stored(observed_at(car_present, neg, t1), pos),
  to_be_stored(observed_at(car_present, neg, t1), pos) ) : <<true, true>> ;
For simplicity, in this paper it is assumed that there exist functions that relate information in memory to locations within the brain, i.e., positions:

position(I : INFORMATION_OBJECT, B : BRAIN_LOCATION)
The simple model for memory used in this paper has a short-term memory and a long-term memory. To model this distinction, the sub-sort BRAIN_LOCATION of POSITION has two sub-sorts: STM_LOCATION and LTM_LOCATION. Given an atom of the agent (a term of the sort AGENT_ATOM) and a time point (a term of the sort TIME), the function stm_location relates information to a position within the short-term memory, whereas ltm_location
relates information to a position within the long-term memory. The time point used by the function is the moment in time at which the information is stored in the memory. An information object is specified as information_object(A : AGENT_ATOM, S : SIGN),
where the sort AGENT_ATOM contains objects that refer to atoms of the agent, e.g., observed_at(car_present, neg, t1). The current status of the memory is modelled by atoms of the form:

currently(position(information_object(A : AGENT_ATOM, S : SIGN), B : BRAIN_LOCATION), pos)

If a physical change within the component material_representation_of_agent occurs, the symbolic interpretation of the changed information is linked to the component agent by the downward transduction process, modelled by the transduction links symbolic_effectuation_of_agent, effectuation_info_to_OPC and effectuation_info_to_MWI. The atom links of the transduction link effectuation_info_to_OPC are specified as follows:

atom links
( currently(position(information_object(A : AGENT_ATOM, S : SIGN), B : STM_LOCATION), pos),
  assumption(A : AGENT_ATOM, S : SIGN) )
  : <<true, true>, <false, unknown>, <unknown, unknown>> ;
By these transduction links, object-level information from the component material_representation_of_agent is transferred to meta-level information within the component agent, which defines the current information state of the agent.
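The resulting memory behaviour, including the effect of amnesia, can be summarised by a toy model (our own drastic simplification of the temporal rules of Section 4.1):

# STM contents migrate to LTM at the next time point; amnesia clears STM,
# while LTM contents persist.
class Brain:
    def __init__(self):
        self.stm, self.ltm = {}, {}

    def store(self, agent_atom, sign, t):
        self.stm[(agent_atom, t)] = sign     # physical storage in STM

    def tick(self, amnesia=False):
        if amnesia:
            self.stm.clear()                 # has_amnesia: STM facts lost
        else:
            self.ltm.update(self.stm)        # contents_of_stm_to_ltm

brain = Brain()
brain.store("goal(visiting(supermarket))", "pos", "t1")
brain.tick(amnesia=True)
print(brain.stm)   # {}: the goal can no longer be retrieved from STM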
Fig. 9. Transduction and material links connecting material world and agent
7 The Material World's Physical Behaviour in Interaction with an Agent
The material representation of the agent is a part of the material world. Therefore, the component material_representation_of_agent is modelled as a simple component for passing on information. The material links connecting these two components (see Figure 9), update_material_world and update_material_representation_of_agent, are simple identity links, i.e.,
they only transfer information; they do not translate it. For example, the material link update_material_representation_of_agent links atoms to themselves:

( at_time(position(I : INFORMATION_OBJECT, B : BRAIN_LOCATION), S : SIGN, T : TIME),
  at_time(position(I : INFORMATION_OBJECT, B : BRAIN_LOCATION), S : SIGN, T : TIME) )
  : <<true, true>, <false, unknown>, <unknown, unknown>> ;
8 The Complete Model
As can be seen from Figures 5, 6, 8, and 9, it is possible to create a symbolic representation of a material system and to create a material representation of a symbolic system. In Figure 10 all components and all information links (transduction, symbolic and material links) of the top level of the complete model are presented. Together, they sketch two connections between the agent and the material world. The connection between material representations and symbolic representations is made by transduction links, between symbolic representations by symbolic links, and between material representations by material links.
9 Trace of the Example Interaction Patterns
In this section it is shown how the course of events in the example introduced in Section 2 is simulated as a reactive pattern using the model introduced in the previous sections. The trace is started at the moment that the agent is at position p1 and has observed that a supermarket where ice-cream is sold is at position p3, and that a path from p1 to p3 is available with p2 as the next position. Moreover, the agent has observed that no car was present. These observations were made using the transduction links symbolic_effectuation_of_world and material_representation_of_world between the material world
and its symbolic representation, and the symbolic links observations_and_actions and observation_results. As a result, the observation information is available within the agent (as current beliefs). The trace is started at time point t1. The situation at time t1 is represented in Figure 3.
- reasoning within the component agent:
  it derives the conclusions goal(visiting(supermarket)) and to_be_executed(goto(p2))
- transfer of the action to the material world:
  by the symbolic link observations_and_actions to the component symbolic_representation_of_material_world, and by the downward transduction link material_effectuation_of_world to the component material_world
r"reTaT_
update material
martial
1
of agent
~rial
representation r'] update world
o
reS;eT~nb~176
s)~aboliceffectuationof agent I material representationof agent
[El
of materialworld
II
material rep. . . . . . tion of world [ I s)mbolic effectuationof world
agent
matenaJ world
observation results
Fig. 10. Transduction, symbolic and material links connecting agent and material world
- execution of the action, and the event of the car appearing, within the material world; determination of the consequences thereof (see Figure 4):
  determination of the effect at_time(position(agent, p2), pos, t2) of the action goto(p2);
  determination of the effect at_time(position(car, p2), pos, t2) of the event car_to_appear;
  determination of at_time(has_hit(car, agent), pos, t2);
  determination of at_time(has_amnesia(agent), pos, t2);
  no determination of at_time(position(I : INFORMATION_OBJECT, B : STM_LOCATION), pos, t2), because the condition currently(has_amnesia(agent), neg) is lacking
- transfer of the effects to the agent:
  the material effects and their consequences are transferred by the material link update_material_representation_of_agent and from there by the downward transduction link symbolic_effectuation_of_agent to the component agent; because of this, no facts are available anymore that were materially represented in the STM: the agent loses facts such as goal(visiting(supermarket)), to_be_executed(goto(p2)), and all observations performed at the previous two time points
- reasoning of the agent with lost STM:
  the agent has a severe lack of knowledge about its current situation (e.g., what is its position, what was its goal); it decides to observe again. By new observations, information on the current material situation can be obtained; however, information about its (previous) goal cannot be found that easily.
And so on.
10 Other Types of Interaction Patterns Between Mind and Matter
In the previous sections the example course of events was simulated as a reactive pattern through Figure 10: from the lower left-hand component (agent) to the upper right-hand component (symbolic_representation_of_material_world), to the lower right-hand component (material_world), to the upper left-hand component (material_representation_of_agent), and back to the lower left-hand component (agent). Such patterns can also be identified for various other types of interaction between symbolic systems and material systems. In this section a number of examples are discussed.
Drug use
Using the model introduced in this paper, the process of taking a (narcotic) drug can be simulated as follows (see Figure 10):
- decision of the agent to take the drug:
  reasoning within the component agent; deriving the conclusion to_be_executed(take_drug)
- transfer of the action to the material world:
  by the symbolic link observations_and_actions to the component symbolic_representation_of_material_world, and by the downward transduction link material_effectuation_of_world to the component material_world
- execution of the action take_drug within the material world:
  determination of the effect active_brain of the action take_drug
- transfer of the effects of take_drug to the agent:
  by the material link update_material_representation_of_agent and the downward transduction link symbolic_effectuation_of_agent to the component agent
- execution of the agent with the drug effect
Agents planning and executing birth and death
Using the model introduced in this paper, the process of a rational agent creating a new child agent can be simulated by a similar pattern in Figure 10:
- decision of the agent to create a child agent:
  reasoning within the component agent; deriving the conclusion to_be_executed(create_child)
- transfer of the action to the material world:
  by the symbolic link observations_and_actions to the component symbolic_representation_of_material_world, and by the downward transduction link material_effectuation_of_world to the component material_world
- execution of the action create_child within the material world:
  determination of the effect of the action create_child
- transfer of the effects of create_child to the agent:
  by the material link update_material_representation_of_agent and the downward transduction link symbolic_effectuation_of_agent to the component agent; this link modifies the component agent by replacing it by two similar components
- execution of the agent and its child agent
In a similar manner, a rational action to kill an agent can be modelled.
Psychosomatic diseases
For psychosomatic diseases the pattern in Figure 10 proceeds in a different direction: from the lower left-hand component to the upper left-hand component to the lower right-hand component. For example, a heart attack induced by psychological factors can be modelled as follows:
- the agent reasons about highly stress-provoking information:
  stressful reasoning within the component agent
- transfer of the stress to the material representation of the agent:
  by the upward transduction link material_representation_of_agent to the component material_representation_of_agent (to the property over_active_brain), and by the material link update_material_world to the component material_world
- execution of the material world:
  determination of the effect of over_active_brain on heart functioning
11 Discussion
Internal representations of the material world, as maintained by an agent, are related to the material world by a representation/reference relation. In this paper a simulation model is introduced covering both a sub-model for the agent (simulating its mental processes) and a sub-model for the material world (simulating its physical processes). The semantic relations between the two sub-models are formalised as dual representation relations. The model takes into account that the agent's mind has a materialisation in the form of a brain. Most parts of the specification of the model are generic; although the example instantiation used to illustrate the model is kept rather simple, the generic part of the model can be (re)used to simulate a variety of phenomena in which (multiple) mind-matter interactions occur. The compositional design method DESIRE supports the replacement of specific components in the model by other components without affecting the rest of the model. For example, more sophisticated memory models can replace the rather simplistic model used as an illustration in this paper. The work in this paper is of importance for:
- foundational questions from a philosophical and logical perspective;
- research in cognitive psychology, neuro-physiology, and their relation;
- applications to dynamic multi-agent domains in which agents can be created and killed.
The relevance of the model for each of these three areas will be explained. An interesting foundational philosophical and logical issue is the semantics of dual representation relations (see also [Hofstadter, 1979]). Both from a static and from a dynamic perspective, further questions can be formulated and addressed: for example, the further development of a foundation for semantic attachments and reflection principles [Weyhrauch, 1980] in the context of dual representation relations and dynamically changing mental and physical states. Another question is the semantically sound integration of (qualitative and quantitative) simulation techniques and (temporal) logical modelling.
Cognitive and neuro-physiological models can be semantically integrated using the model introduced in this paper. The presented generic model can be instantiated with existing models of both kinds. A useful test for existing philosophical approaches to the mind-body problem (e.g., as described in [Harman, 1989]) is to investigate the possibility of operationalising them using the presented model. Among the applications of the model are agents capable of planning and executing life-affecting actions, such as giving birth to and killing (other) agents. These capabilities are essential for Internet agents that can decide on the fly to create new agents to assist them in their tasks and to remove these agents after completion of the task they were created for. This application area is one of the focuses of our current research.
References
Brazier, F.M.T., Dunin-Keplicz, B., Jennings, N.R. and Treur, J., Formal specification of Multi-Agent Systems: a real-world case. In: V. Lesser (ed.), Proceedings of the First International Conference on Multi-Agent Systems, ICMAS'95, MIT Press, Cambridge, MA, 1995, pp. 25-32.
Brazier, F.M.T., Dunin-Keplicz, B., Jennings, N.R. and Treur, J., DESIRE: modelling multi-agent systems in a compositional formal framework. International Journal of Cooperative Information Systems, M. Huhns, M. Singh (eds.), special issue on Formal Methods in Cooperative Information Systems: Multi-Agent Systems, vol. 6, 1997, pp. 67-94.
Brazier, F.M.T., Jonker, C.M., Treur, J., Formalisation of a cooperation model based on joint intentions. In: J.P. Müller, M.J. Wooldridge, N.R. Jennings (eds.), Intelligent Agents III (Proc. of the Third International Workshop on Agent Theories, Architectures and Languages, ATAL'96), Lecture Notes in AI, vol. 1193, Springer Verlag, 1997, pp. 141-155.
Brazier, F.M.T., Treur, J., Wijngaards, N.J.E. and Willems, M., Temporal semantics of complex reasoning tasks. In: B.R. Gaines, M.A. Musen (eds.), Proceedings of the 10th Banff Knowledge Acquisition for Knowledge-based Systems workshop, KAW'96, Calgary: SRDG Publications, Department of Computer Science, University of Calgary, 1996, pp. 15/1-15/17.
Chang, C.C., Keisler, H.J., Model Theory. North-Holland, 1973.
Dalen, D. van, Logic and Structure. Springer Verlag, 1980.
Engelfriet, J., Treur, J., Temporal Theories of Reasoning. In: C. MacNish, D. Pearce, L.M. Pereira (eds.), Logics in Artificial Intelligence, Proceedings of the 4th European Workshop on Logics in Artificial Intelligence, JELIA'94, Springer Verlag, 1994, pp. 279-299. Also in: Journal of Applied Non-Classical Logics 5 (1995), pp. 239-261.
Harman, G., Some Philosophical Issues in Cognitive Science: Qualia, Intentionality, and the Mind-Body Problem. In: [Posner, 1989], pp. 831-848.
Hofstadter, D., Gödel, Escher, Bach: an Eternal Golden Braid. Basic Books, New York, 1979.
Hodges, W., Model Theory. Cambridge University Press, 1993.
Laird, J.E., Newell, A., Rosenbloom, P.S., Soar: An architecture for general intelligence. Artificial Intelligence 33 (1987), pp. 1-64.
Lindsay, P.H., Norman, D.A., Human Information Processing. Academic Press, 1977.
Newell, A., Physical Symbol Systems. Cognitive Science 2 (1980), pp. 135-184.
Newell, A., Rosenbloom, P.S., Laird, J.E., Symbolic Architectures for Cognition. In: [Posner, 1989], pp. 93-132.
Posner, M.I. (ed.), Foundations of Cognitive Science. MIT Press, 1989.
Pylyshyn, Z.W., Do mental events have durations? Behavioral and Brain Sciences, vol. 2 (1979), pp. 277-278.
Pylyshyn, Z.W., Computation and Cognition: Towards a Foundation for Cognitive Science. MIT Press, 1984.
Pylyshyn, Z.W., Computing in Cognitive Science. In: [Posner, 1989], pp. 49-92.
Simon, H.A., Kaplan, C.A., Foundations of Cognitive Science. In: [Posner, 1989], pp. 1-48.
Smith, B.C., Reflection and semantics in a procedural language. MIT Computer Science Technical Report 272, Cambridge, Massachusetts, 1982.
Smith, B.C., Prologue to Reflection and Semantics in a Procedural Language. In: Brachman, R.J., Levesque, H.J. (eds.), Readings in Knowledge Representation. Morgan Kaufmann, 1985, pp. 31-40.
Tarski, A., Der Wahrheitsbegriff in den formalisierten Sprachen. Studia Philosophica, 1:261-405, 1936. English translation in: A. Tarski, Logic, Semantics, Metamathematics. Oxford University Press, 1956.
Treur, J., Completeness and definability in diagnostic expert systems. In: Proceedings European Conference on Artificial Intelligence, ECAI-88, 1988, pp. 619-624.
Treur, J., Declarative functionality descriptions of interactive reasoning modules. In: H. Boley, M.M. Richter (eds.), Proceedings International Workshop on Processing of Declarative Knowledge, PDK'91, Lecture Notes in Artificial Intelligence, vol. 567, Springer-Verlag, 1991, pp. 221-236.
Treur, J., Temporal Semantics of Meta-Level Architectures for Dynamic Control of Reasoning. In: L. Fribourg, F. Turini (eds.), Logic Program Synthesis and Transformation - Meta-Programming in Logic (Proc. of the Fourth International Workshop on Meta-Programming in Logic, META'94), Lecture Notes in Computer Science, vol. 883, Springer Verlag, 1994, pp. 353-376.
Weyhrauch, R.W., Prolegomena to a theory of mechanized formal reasoning. Artificial Intelligence 13 (1980), pp. 133-170.
Delegation Conflicts
Cristiano Castelfranchi and Rino Falcone
IP-CNR, Roma, Italy, Division of "AI, Cognitive Modelling and Interaction"
{cris, falcone}@pscs2.irmkant.rm.cnr.it
Abstract. In this paper we study possible conflicts arising between an agent (the "client") delegating some tasks to another agent, and this agent (the "contractor") adopting and/or satisfying those tasks; conflicts which are due either to the intelligence and the initiative of the delegated agent or to an inappropriate delegation by the client. We present a plan-based definition of delegation, adoption and task, and a theory of different kinds and levels of delegation and adoption. We examine two kinds of conflicts due to different cooperative attitudes of the two agents:
• conflicts that arise when the provided help does not match the intended delegation (sub-help, over-help, critical help, hyper-critical help); in particular, we examine paradoxical conflicts due to the agent's willingness to collaborate and to help the other better and more deeply, including conflicts due to the tutorial (paternalistic) attitude of the helper;
• conflicts between the reasons and motives for the adoption as planned by the client and the reasons for adopting offered by the helper.
We analyse neither conflicts due to some misunderstanding or to the helper's personal interest, nor conflicts about the delegation of control. We claim that delegation and its related conflicts are the core of the interaction with any kind of autonomous agent and are relevant for modelling MA systems, organizations, and user-agent interaction. In fact, in order to exploit local knowledge and local adaptation, delegation cannot be fully specified.
Introduction

Autonomy and intelligence are precious features in those agents one relies on for some task or function. This is why natural selection developed a mind in some organisms, so that they can flexibly and creatively guarantee their biological functions and take care of their own adaptation. In fact, if an agent just has to passively and mechanically execute a rigid and very specific task, it will not be able to solve possible problems, to adapt its behaviour to circumstances, to find better solutions, etc. But, of course, there is a tradeoff: the more intelligent and autonomous the agent (able to solve problems, to choose between alternatives, to reason and to plan by itself), the less quickly and passively

This research has been supported by Italian National Research Council Project SARI (Coord. Prof. L. Carlucci Aiello), 1996-97. A number of aspects of this model are under development within the GOLEM Project (University of Bari - IP-CNR). We would like to thank F. de Rosis for useful comments on a previous version of this paper.
"obedient" it is. The probability that the provided solution or behaviour does not correspond to what exactly we expect and delegated, increases. In this paper we study possible conflicts arising between a "client" delegating some tasks to some agent, and the "contractor" or in general the agent adopting and/or satisfying those tasks; conflicts which are either due to the intelligence and the initiative of the delegated agent or to an inappropriate delegation by the client. In particular, we examine paradoxical conflicts due to the agent's willingness to collaborate and to better and deeply help the other. In the first part, we present our definition of delegation and adoption, a plan-based definition of tasks, and of different kinds and levels of delegation and adoption. This implicitly also eharaeterises different levels of agency and autonomy (in the delegated agent). Delegation and adoption are two basic ingredients of any collaboration and organization. In fact, the huge majority of DAI and MA, CSCW and negotiation systems, communication protocols, cooperative software agents, etc. are based on the idea that cooperation works through the allocation of some task (or sub-task) of a given agent (individual or complex) to another agent, via some "request" (offer, proposal, announcement, etc.) meeting some "commitment" (bid, help, contract, adoption, etc.). This core constituent of any interactive, negotial, cooperative system is not so clear, well founded and systematically studied as it could seem. Our claim is that any support system for cooperation and any theory of cooperation require an analytic theory of delegation and adoption. We will contribute to an important aspect of this theory with a plan-based analysis of delegation. In the second part we present the most interesting conflicts that arise when the provided help does not match the intended delegation. In particular, we will examine conflicts due to over help (we will distinguish among over, critical, overcritical, and hypercritical help) and to the tutorial (paternalistic) attitude of the helper. In the third part another source of conflict relative to the relationship between delegation and adoption is examined: the conflict between the reasons and motives for the adoption as planned by the client and the reasons for adopting offered by the helper. All these conflicts might be based on some misunderstanding between the delegating and the delegated agent, but we will not consider this case, since we are interested in studying conflicts due to different cooperative attitudes between the two agents, which cannot be solved by a simple clarification, and require other kinds and levels of negotiation. All this is claimed to be important both for supporting human cooperative behaviours and the organisations, and for agents and Multi-Agent (MA) systems. This relevance becomes clear considering on the one hand that the notion of "agent" itself is very often based on the notion of delegation [Mac, Goo, Luc], on the other hand that task delegation and task adoption, and the related conflicts and negotiations are the core problem of MA systems [Carl, Ros] and of their protocols [Had]. Also our examples (see w will provide some evidence of current and future importance of this kind of conflicts in MASs. We claim that delegation and its related conflicts are the core of the interaction with any kind of autonomous agent.
1 A plan-based theory of Delegation/Adoption

1.1 Definitions

Informally, in delegation an agent A needs or likes an action of another agent B and includes it in its own plan. In other words, A is trying to achieve some of its goals through
B's actions; thus A has the goal that B performs a given action. A is constructing a MA plan [Cas1, Kin] and B has a "part", a share in this plan: B's task (either a state-goal or an action-goal). In adoption an agent B has a goal since and until it is the goal of another agent A, i.e. B has the goal of performing an action since this action is included in the plan of A. So, also in this case B plays a part in this plan. Both delegation and adoption may be unilateral: B may ignore A's delegation, while A may ignore B's adoption. In both cases A and B are, in fact, performing a MA plan. In the following we present a plan-based formal analysis of delegation/adoption. We assume that to delegate an action necessarily implies delegating some results of that action. Conversely, to delegate a goal state always implies the delegation of at least one action (possibly unknown to A) that produces such a goal state as result. Thus, we consider the couple action/goal τ=(α,g) as the real object of the delegation, and we will call it task. Then by τ we will refer to the action (α), to its resulting world state (g), or to both. The action (α) may be either elementary or complex (i.e. a plan). It is possible to delegate not only tasks but also roles [Wer], since roles are based on classes of delegated tasks (see § 1.6). In this way roles are viewed both as social relationships among agents and as defined positions in MA-plans and in organizations. Plan hierarchy affects role hierarchy, and the delegation levels correspond to different levels and entitlements of roles [Cas2]. There exists a hierarchical structure of roles which is related to the kind of task and to the plan structure (e.g. executive roles and decisional roles). Once a role is established (role-contract), this creates constraints for task-delegation: the delegation of a specific task must instantiate the role tasks. Conflicts could arise between task-delegation and role-contract, and the negotiation about specific tasks can bring the related role-contract up for discussion again.
1.2 Plan Ontology

Since the hierarchical nature of a plan and the position of τ in it is the basis of the levels of both delegation and adoption, we need to introduce in this section a more formal representation of agents, actions, plans [Pol] and action results. For a more detailed description of these concepts see [Cas2, Fal]. Let Act = {α1, ..., αn} be a finite set of actions, and let Agt = {A1, ..., An, B, C, ...} be a finite set of agents. Each agent has an action repertoire, a plan library, resources, goals, beliefs, interests [Con]. The general plan library is Π = Πa ∪ Πd, where Πa is the abstraction hierarchy rule set and Πd is the decomposition hierarchy rule set. As usual, for each action there are: body, preconditions, constraints, results. We will call α a composed action (plan) in Π if there is in Πd at least one rule: α → α1, ..., αn. The actions α1, ..., αn are called component actions of α. We will call α an abstract action (plan) in Π if there is in Πa at least one rule: α → α1. α1 is called a specialized action of α. An action α' is called an elementary action in Π if there is no rule r in Π such that α' is the left part of r. We will call BAct (Basic Actions) the set of elementary actions in Π, and CAct (Complex Actions) the remaining actions in Act: BAct ⊆ Act, CAct = Act − BAct.
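To make the ontology concrete, here is a minimal sketch (ours, not the paper's; the action names are invented, reusing the cooking example the paper introduces later) of how Πd and Πa could be stored and how BAct and CAct fall out:

    # Hypothetical encoding of the rule sets: each left part maps to a list
    # of right-hand sides (one entry per rule with that left part).
    Pi_d = {  # decomposition hierarchy: alpha -> alpha_1, ..., alpha_n
        "make-fettuccini-pesto": [["make-fettuccini", "make-pesto", "combine"]],
    }
    Pi_a = {  # abstraction hierarchy: alpha -> alpha_1 (a specialization)
        "make-pasta-dish": [["make-fettuccini-pesto"]],
    }

    def is_elementary(alpha):
        # alpha is elementary in Pi iff it is the left part of no rule
        return alpha not in Pi_d and alpha not in Pi_a

    Act = {"make-pasta-dish", "make-fettuccini-pesto",
           "make-fettuccini", "make-pesto", "combine"}
    BAct = {a for a in Act if is_elementary(a)}   # Basic Actions
    CAct = Act - BAct                             # Complex Actions
    print(sorted(BAct))  # ['combine', 'make-fettuccini', 'make-pesto']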
Given α1, α2 and Π, we will say that α1 dominates α2 (or α2 is dominated by α1) if there is a set of rules (r1, ..., rm) in Π such that: (α1 = Lr1) ∧ (α2 ∈ Rrm) ∧ (Lri ∈ Rri−1), where Lrj and Rrj are, respectively, the left part and the right part of the rule rj, and 2 ≤ i ≤ m. We will say that α1 dominates α2 at level k if the set (r1, ..., rm) includes k rules. We will call ActA the set of actions known by the agent A: ActA ⊆ Act. Calling ΠA A's plan library, the set of the irreducible actions (through decomposition or specialization rules) included in it is composed of two subsets: the set of actions that A believes to be elementary (BActA) and the set of actions that A believes to be complex but has no reduction rules for (NRActA: Non-Reduced actions). Then BActA ⊆ Act, and it might be that BActA ≠ BAct. In fact, given an elementary action, an agent knows (or not) the body of that action. We will call the skill set of an agent A, SA, the actions in BActA whose body is known by A (the action repertoire of A): SA ⊆ BActA, and ∪SAi (over all Ai ∈ Agt) ⊆ BAct.1 In sum, an agent A has his own plan library, ΠA, in which some actions (CActA and NRActA) are complex actions (and he knows the reduction rules of CActA) while some other actions (BActA) are elementary actions (and he knows the body of a subset - SA - of
them). A has a complete executable know-how of α if either α ∈ SA, or in ΠA there is a set of rules (r1, ..., rm) able to transform α into (α1, ..., αk) and, for each 1 ≤ i ≤ k, αi ∈ SA. The operator CEK(A,α) returns (r1, ..., rm); then CEK(A,α) ≠ ∅ (∅ is the empty set) when A has at least one complete executable know-how of α. In fact, CEK(A,α) characterizes the executive autonomy of the agent A relative to α. To execute an action α - which we represent with Execution(α) - means:
- to execute the body of α, if α is an elementary action, or
- to execute the body of each elementary action to which α can be reduced (through one of the possible sequences of rules in Π), if α is not an elementary action.
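A recursive test for CEK(A,α) ≠ ∅ could look as follows. This is our sketch under an assumed rule representation (each action maps to a list of alternative reductions); it only decides whether some complete executable know-how exists, rather than returning the rule chain itself:

    def has_cek(skills, rules, alpha):
        # True iff the agent has a complete executable know-how of alpha:
        # alpha is in its skill set S_A, or some reduction rule in Pi_A
        # rewrites alpha into actions for which the same property holds
        # recursively. Assumes an acyclic rule set.
        if alpha in skills:
            return True
        return any(all(has_cek(skills, rules, a) for a in rhs)
                   for rhs in rules.get(alpha, []))

    # Hypothetical example: B can make fettuccini and pesto, hence B has
    # a complete executable know-how of the composed dish.
    rules = {"make-fettuccini-pesto": [["make-fettuccini", "make-pesto"]]}
    print(has_cek({"make-fettuccini", "make-pesto"}, rules,
                  "make-fettuccini-pesto"))  # True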
From the previous assertions it follows that an action α might be an elementary action for the agent A while being a plan for another agent B. Again, the same plan α could have, for different agents, different reduction rules. Agents execute actions to achieve goals: they look up in their memory the actions fitting the goals, select them and execute them. Fixing some pertinent world states c, we will call R(α,c) the operator that, when applied to an action α and to c, returns the set of the results produced by α (when executed alone). We will assume that when the pertinent aspects of the world state in which an action is applicable change, its results change and the name of the action itself changes. Then R(α,c) may be denoted with R(α), because c is defined in a unique way.
1 Notice that we do not call, as is usual, "body" the decomposition rule of a plan, but only the procedural attachment of the elementary actions and their procedural composition in the case of complex actions.
We define P(α) = Pc(α) ∪ Cn(α), where Pc is the operator that, when applied to any action α, returns the set of the preconditions of α, and Cn(α) is the set of constraints of α. An action α can be executed if the preconditions and constraints of α are satisfied. RA(α) returns the results that A believes α will produce when executed. RA(α) might (or might not) correspond with R(α); however, when an action has been executed, each agent in Agt has the same perception of its results: exactly R(α). For each action α, Executed(α) ∈ R(α); after Execution(α), Executed(α) holds. We will call relevant results of an action for a goal (a set of world states) the subpart of the results of that action which corresponds with the goal; more formally, given α and g, we define the operator Rr such that: Rr(α,g) = {gi | gi ∈ g} if g ⊆ R(α), = ∅ otherwise. Then, the same action used for different goals has different relevant results. Let us suppose that α is a component (or specialized) action of α' and Rr(α',g) ≠ ∅; we define the pertinent results of α in α' for g, Pr(α,α',g), as the results of α useful for the plan α' aimed at the goal g; they correspond with a subset of R(α) such that:
1) if α is a component action of α': Pr(α,α',g) = {qi ∈ R(α) | (qi ∈ Rr(α',g)) ∨ ((qi ∈ P(α1)) ∧ (dominates-at-level-1 α' α1) ∧ (α ≠ α1))}; in other terms, an action α is in a plan α' (aimed at a goal g) either because some of its results are relevant results of α' (aimed at g) or because some of its results produce the preconditions of another action α1 in that plan.
2) if α is a specialized action of α': Pr(α,α',g) = {qi | (qi ∈ R(α)) ∧ (∃j | (qj ∈ Rr(α',g)) ∧ (qi is a specialization of qj))2}.
The pertinent results of an action α in α' represent the real reason for which that action α is in that plan α'. Hereafter we will call A the delegating-adopted agent (the client) and B the delegated-adopting agent (the contractor).
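As a toy rendering of the Rr operator just defined (ours, with hypothetical result atoms), relevant results reduce to a subset test:

    def relevant_results(R_alpha, g):
        # Rr(alpha, g): the goal states g themselves if g is included in
        # R(alpha), the empty set otherwise.
        return set(g) if set(g) <= set(R_alpha) else set()

    R = {"pesto-ready", "kitchen-dirty"}          # R(alpha), invented atoms
    print(relevant_results(R, {"pesto-ready"}))   # {'pesto-ready'}
    print(relevant_results(R, {"dinner-ready"}))  # set()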
1.3 Types and levels of delegation

Delegation is a "social action", and also a meta-action, since its object is an action. We define the operator of delegation with four parameters: (Delegates A B τ d), where A, B ∈ Agt, τ=(α,g), and d is a deadline. This means that A delegates to B the task τ with the deadline d. In the following we will put aside both the deadline of τ and the fact that, in
2 Let us define the temporary results of an action α in a plan α' as the results of α that are not results of α': Tr(α,α') = {qi | (qi ∈ R(α)) ∧ (qi ∉ R(α'))}. We define the transitory results (or pertinent temporary results) of an action α in a plan α' aimed at the goal g as: TRr(α,α',g) = Tr(α,α') ∩ Pr(α,α',g); they correspond with those results of α that enable another action α1 in α' but that are not results of α' aimed at the goal g: TRr(α,α',g) = {qi | (qi ∈ R(α)) ∧ (qi ∉ R(α')) ∧ (qi ∈ P(α1)) ∧ (dominates-at-level-1 α' α1) ∧ (α ≠ α1)}. Let us define the relevant results of α in α' aimed at g as: Rr(α,α',g) = {qi | (qi ∈ R(α)) ∧ (qi ∈ Rr(α',g))}.
delegating τ, A could implicitly delegate also the realization of α's preconditions (which normally implies some problem-solving and/or planning). We can consider several dimensions of the notion of delegation.
Relation-based types of delegation

Different types of delegation may be characterized on the basis of the relation between the delegating agent and the delegated one.
Weak delegation - there is no agreement, no request or even influence: A is just exploiting in its plan a fully autonomous action of B. More precisely,
a) The achievement of τ3 is a goal of A.
b) A believes that there exists another agent B that has the power [Cas3] of achieving τ.
c) A believes that B will achieve τ in time.
c-bis) A believes that B intends to achieve τ in time (in the case that B is a cognitive agent).
d) A prefers4 to achieve τ through B.
e) The achievement of τ through B is a goal of A.
f) A has the goal (relativized to (e)) of not achieving τ by itself.
We consider (a, b, c, and d) what the agent A sees as the "Potential to rely on" the agent B, and (e and f) what A sees as the "Decision to rely on" B. We consider "Potential to rely on" and "Decision to rely on" as two constructs temporally and logically related to each other. All these cognitive ingredients behind the act of delegation represent A's trust towards B.
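Conditions (a)-(f) can be read as a conjunction of boolean checks. The sketch below is our illustration, not the paper's logic; every flag name is invented, and (c-bis) is omitted for brevity:

    from dataclasses import dataclass

    @dataclass
    class WeakDelegationStance:
        achieving_tau_is_goal: bool        # (a)
        believes_B_has_power: bool         # (b)
        believes_B_will_achieve: bool      # (c)
        prefers_achieving_via_B: bool      # (d)
        goal_achieve_via_B: bool           # (e)
        goal_not_achieve_alone: bool       # (f)

        def potential_to_rely(self) -> bool:   # conditions (a)-(d)
            return (self.achieving_tau_is_goal and self.believes_B_has_power
                    and self.believes_B_will_achieve
                    and self.prefers_achieving_via_B)

        def decision_to_rely(self) -> bool:    # conditions (e)-(f)
            return self.goal_achieve_via_B and self.goal_not_achieve_alone

        def weak_delegation(self) -> bool:
            return self.potential_to_rely() and self.decision_to_rely()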
Mild delegation - there is no agreement and no request, but A is itself eliciting and inducing in B the desired behaviour in order to exploit it.
Strict delegation - is based on B's adopting A's task, normally in response to A's request/order.
Specification-based types of delegation

The object of delegation (τ) can be minimally specified (open delegation), completely specified (close delegation), or specified at any intermediate level. Let us consider the two main cases from A's point of view:
3 "Achieve τ" means either i) ∃α' ∈ Act with g ⊆ R(α') and (Executed α'), if τ=g; or ii) (Executed α), if either τ=(α,g) or τ=α.
4 This means that, either relative to the achievement of τ or relative to a broader goal g' that includes the achievement of τ, A believes itself to be dependent on B [Sic].
- Pure Executive (Close) Delegation: when either α ∈ SA, or α ∈ BActA, or g is the relevant result of α (and α ∈ SA or α ∈ BActA). In other words, the delegating agent believes it is delegating a completely specified task.
- Open Delegation: when either α ∈ CActA or α ∈ NRActA; and also when g is the relevant result of α (and α ∈ CActA or α ∈ NRActA). In other words, agent A believes it is delegating an incompletely specified task: either A is delegating a complex or abstract action, or he is delegating just a result (a state of the world). The agent B can (or must) realize the delegated task in an autonomous way.
Implicit aspects of the delegation produce various possible misunderstandings among the agents. A's perspective can be in contrast with B's point of view: τ can be considered at different levels of complexity by the two interacting agents (see table 1).
A's point of view                      corresponding B's point of view
                                       (α∈SB)∨(α∈BActB)               (α∈CActB)∨(α∈NRActB)
                                       "It is an elementary action!"  "It is a complex action!"

τ=α with (α∈SA)∨(α∈BActA)
"It is an elementary action!"          No Conflict                    Conflict

τ=α with (α∈NRActA)∨(α∈CActA)
"It is a complex action!"              Conflict                       No Conflict

table 1

It is worth understanding the great importance of open delegation in collaboration theory. On the one hand, we would like to stress that open delegation is not only due to A's preference (utility) or practical ignorance or limited ability (know-how). Of course, when A is delegating τ to B, he is always depending on B as for τ [Sic]: he needs B's action for some of his goals (either some domain goals or goals like saving time, effort, resources, etc.). However, open delegation is fundamental because it is also due to A's ignorance about the world and its dynamics. In fact, frequently enough it is not possible or convenient to fully specify τ, because some local and updated knowledge is needed in order for that part of the plan to be successfully executed. Open delegation is one of the bases of the flexibility of distributed and MA plans. To be radical, delegating actions to an autonomous agent always requires some level of "openness": the agent cannot avoid monitoring and adapting its own actions. On the other hand, we would like to show how the distributed character of MA plans derives from open delegation. As we saw, A can delegate to B either an entire plan or some part of it (partial delegation). The combination of partial delegation (where B might ignore the other parts of the plan) and open delegation (where A might ignore the sub-plan chosen and developed by B) creates the possibility that A and B (or B and C, both delegated by A) collaborate in a plan that they do not share and that nobody entirely knows: that is a distributed plan [Gro, Con]. However, for each part of the plan there will be at least one agent that knows it.
Kinds of delegation object

The object of the delegation can be a practical or domain action as well as a meta-action (searching, planning, choosing, problem solving, and so on). When A is open-delegating some domain action to B, it is necessarily also delegating to B some meta-action: at least searching for a plan and applying it, and sometimes deciding between alternative solutions. We call B's discretion about τ the fact that some decision relative to τ is delegated to B.
Control-based types of delegation

Control is an action aimed at knowing whether another action has been successfully executed or not. Controlling an action means verifying that its relevant results hold (including the execution of the action itself). Given Rr(α,g) of any action α ∈ Act, a set of actions αc ⊆ Act - which we call "control actions of α aimed at g" - may be associated to it. Each action in αc can be either an elementary or a complex action. The relevant results of each αk ∈ αc for the goal of controlling α can be indicated through Rr(αk, (control α g)). It returns the truth values of each gi ∈ g in Rr(α,g). Plans typically contain control actions of some of their actions. When A is delegating a given object-action, what about its control actions? Considering, for simplicity, that the control action is executed by a single agent, when (Delegates A B τ) there are four possibilities:
i) ∃αk ∈ αc | (Delegates A B αk) (i.e., A delegates the control to B);
ii) ∃αk ∈ αc, ∃X ∈ Agt (with (X≠A)∧(X≠B)) | (Delegates A X αk) (i.e., A delegates the control to a third agent);
iii) for each αk ∈ αc, for each X ∈ Agt: (not (Executed αk)) (i.e., A gives up the control);
iv) ∃αk ∈ αc | (Executed αk) ∧ (Agent(Execution(αk)) = A) (i.e., A maintains the control).
Given a plan α ∈ ActA with its component actions (α1, ..., αn), if A delegates to B the whole α, for each action αi ∈ (α1, ..., αn) A can apply all the previous control possibilities. For the sake of brevity, in this paper we will not consider the rich and important typology of control-related conflicts.5
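The four control arrangements can be captured as an enumeration (a sketch of ours, not notation from the paper):

    from enum import Enum

    class ControlArrangement(Enum):
        DELEGATED_TO_CONTRACTOR = "A delegates the control action to B"    # (i)
        DELEGATED_TO_THIRD_PARTY = "A delegates it to some X != A, B"      # (ii)
        GIVEN_UP = "nobody executes the control action"                    # (iii)
        RETAINED_BY_CLIENT = "A executes the control action itself"        # (iv)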
5 Apart from the control goals and actions, other collateral goals/actions normally support any action in any plan: coordination and support goals/actions. When A has a plan α with two component actions α1 and α2, and he intends to execute it, he also has the goal of coordinating the execution of α1 with the execution of α2, to avoid the creation of possible obstacles to α2 or to create favouring conditions for it (or vice versa). What happens to these coordination and support goals when A delegates α2 to B? As in the case of the control goals, there are several possibilities: i) A drops any coordination goal (which is quite risky); ii) A maintains the goal of coordinating his own action α1 with B's task, avoiding conflicts and creating favouring conditions: collaborative coordination (this notion is quite close to [Gro]'s notion of "intention that", although we derive it from a general plan property); iii) A delegates this goal to the agent C that has been delegated the action α1 (just as he can delegate to B the goal of reciprocally coordinating α2 with α1).
1.4 Types and levels of Adoption (Help)

In analogy with delegation, we introduce the corresponding operator for adoption: (Adopts B A τ). We now consider some dimensions of the notion of adoption.
Relation-based types of adoption

Weak adoption - there is no agreement, no information or even influence: B autonomously has the goal of performing a given action since and until this action is either contained in A's plan or is an interest (see § 2) of A itself. Notice that this kind of help can be completely unilateral and spontaneous from B, and/or even ignored by A. In other words, B can adopt some of A's goals independently of A's delegation or request. More precisely,
a') B believes that the achievement of τ is a goal of A.
b') B believes that B has the power of achieving τ.
c') B believes that A will not achieve τ by itself.
d') B intends to achieve τ for A (i.e., B has the goal to achieve τ relativized to the previous beliefs).
In analogy with weak delegation, we consider (a', b', and c') what the agent B sees as the "Potential for weak adoption" of the agent A, and (d') what B sees as the "Decision to
weakly adopt" A.
Strict adoption - there is an agreement between A and B about A's task delegation to B in response to B's offer (or about B's adopting A's task in response to A's request/order).
Delegation-Adoption (Contract)

In Strict Delegation, the delegated agent knows that the delegating agent is relying on it and accepts the task; in Strict Adoption, the helped agent knows about the adoption and accepts it. In other words, Strict Delegation requires Strict Adoption, and vice versa: they are two facets of a unitary social relation that we call "delegation-adoption" or "contract".6 There is a delegation-adoption relationship between A and B for τ when:
It does not seem possible to delegate coordination goals to a third agent different from the agent that has to execute α1, as it is for monitoring and control goals.
6 Our treatment of the delegation/adoption relationship can be distinguished from that of Haddadi [Had] in several aspects, such as the fact that we introduce the more basic elements of weak adoption and weak delegation. The most relevant difference seems to be the fact that, for us, in goal adoption the contractor acquires a new goal (changes his mind) and does so just because it is a goal of the other agent (the client). This makes our notion of adoption more dynamic and flexible, covering several types of social relationships, and also clearly related to the notion of influencing [Cas3].
"Potential for request of contract" from A to B: - On A's point of view: a) The achievement of x is a goal of A. b) A believes that exists an agent B that has the power of achieving x. d) A prefers to achieve x through B. - On B's point of view: b') B believes that B has the power of achieving x.
"Agreement": A series of mutual beliefs (MB) are true: (MB A, B, (a, b, c, d, e, f, h, a', b', c', d')) where: h) B is socially committed to A to achieve x for A. The delegation/adoption relation is the core of the social commitment [Cas4, Sin, Fik, Bon] relationship. Thus, it is a basic ingredient for joint intentions, true cooperation and team work [Kin, Lev,,Gro]. In other words, we claim that in collaborative activity each partner is relying on the other partners "strongly" delegating them some tasks, and, viceversa, each partner is adopting by the other partners his own tasks. Both delegations and adoptions can be either explicit or implicit.
Levels of adoption relative to the delegated task
Literal help - B adopts exactly what has been delegated by A (elementary or complex action, etc.). Overhelp - B goes beyond what has been delegated by A without changing A's plan. Critical help - B satisfies the relevant results of the requested plan/action, but modifies that plan/action. Overcritical help - B realizes an Overhelp and in addition modifies/changes that plan/action. Hyper-critical help - B adopts goals or interests of A that A itself did not take into account: by doing so, B neither performs the action/plan nor satisfies the results that were delegated.
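These levels can be read as a rough decision procedure. The following sketch is ours and deliberately simplified (the plan-based conditions of § 2 are richer); dominates is assumed to test plan-hierarchy dominance, and interests stands for A's off-plan interests:

    def classify_help(delegated, adopted, dominates, interests):
        # Classify B's adoption against A's delegation tau = (alpha, g).
        # `delegated` and `adopted` are (action, goal) pairs.
        (alpha, g), (alpha_b, g_b) = delegated, adopted
        if (alpha_b, g_b) == (alpha, g):
            return "literal help"
        if dominates(alpha, alpha_b):
            return "sub-help"            # B satisfies only part of tau (see 2.1)
        if dominates(alpha_b, alpha):
            return "overhelp"            # B goes beyond tau within A's plan
        if g_b == g:
            return "critical help"       # same goal, different action
        if g_b in interests:
            return "hyper-critical help" # B adopts an interest of A instead
        return "overcritical help"       # broader goal via a different plan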
1.5 Levels of agency

Notice that open delegation presupposes some cognitive skills in the delegated agent. The same holds for certain kinds of adoption, which presuppose in B the ability for plan abduction and recognition and for agent modelling [Cas2]. Types and levels of delegation characterize the autonomy of the delegated agent. The autonomy of the delegated agent (B) from the delegating one (A) increases along various dimensions:
- the more open the delegation, or
- the more control actions are given up or delegated to B, or
- the more decisions are delegated (discretion), or
- the less dependent B is on A as for the resources necessary for the task [Sic],
the more autonomous B is from A as for that task.
1.6 Delegation and Role
Agents delegate roles as they delegate tasks. In a broad sense, any task delegation is the creation of a role: in fact, given an occasional execution of any plan through the execution of its component actions by more than one agent, one might say that these agents have a given "role" in that plan and group. This is a "transitory" or occasional role. However, we decide to use (as usual) the term "role" only for more long-term and stable organizations, and to use just the term "task" for occasional delegation. As we saw, we can consider the couple action/result τ=(α,g) as the delegation-adoption object. We can specialize the defined contract relation into two subtypes: the task-contract and the role-contract. The task-contract concerns an occasional delegation. Let us define as Role contract or relation (ρ) the triple ρ=(A, B, T), where A is the Role Client class, B is the Role Contractor class and T is the Role Task class. More precisely:
- T is the "Role domain" or "Role competence": it is the set of the services the role can provide;
- A and B are the classes of the client agents and of the contractor agents respectively (in some cases A and B are simply individuals), and there is a relation of Delegation-Adoption between these two types of agents about the Role Task (Nomic Task).
Then, for each task τ ∈ T, if Bi ∈ B, τ is a potential task of Bi, that is to say, the agent is delegated (by A) to bring it about. Analogously, for each task τ, if Ai ∈ A, τ is its potential task to delegate to B.
Levels of delegation relative to B's role or B's offered help

By comparing the delegated task with the role tasks, or with the help spontaneously offered by B, we can characterize various kinds of delegation (a small matching sketch follows this list):
- the delegated task matches the role tasks of the agent, or its offer;
- the delegated task is an "over-task" compared with the role tasks of the agent (for example, the delegated plan contains the plan in the role task) or with its offer;
- the delegated task is a "sub-task" compared with the role tasks of the agent (for example, the plan in the role task contains the delegated plan) or with its offer;
- the delegated task does not match the role tasks of the agent, or its offer.
The above analysis permits us to consider the conflicts arising from the mismatch between the help due or offered by B and the tasks requested by A. This kind of conflict is especially important in organizations.
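A minimal matcher for these four cases (our sketch; dominates is again an assumed plan-hierarchy test):

    def classify_role_delegation(task, role_tasks, dominates):
        # Compare a delegated task with the role tasks (or with B's offer).
        if task in role_tasks:
            return "match"
        if any(dominates(task, t) for t in role_tasks):
            return "over-task"   # delegated plan contains a role-task plan
        if any(dominates(t, task) for t in role_tasks):
            return "sub-task"    # a role-task plan contains the delegated plan
        return "no match"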
2 Conflicts about the level of adoption
Given this characterisation of delegation and adoption and of their plan-based levels, we can derive a basic ontology of the conflicts arising between the two agents when there is a mismatch between the intended delegation and the intended adoption. These mismatches are due neither to simple misunderstandings of A's request/expectation or of B's offer, nor to a wrong or incomplete plan/intention recognition by B.
2.1 Sub-help

The supporter or contractor might either offer the adoption of, or in fact satisfy, just a subgoal of the delegated task. Given the situation (Delegates A B τ=(α,g)) with (dominates α' α) - in other words, A delegates (to B) τ within τ'=(α',g') - then if (Adopts B A τ1) ∧ (dominates α α1), we say that B sub-helps A (see fig. 1). In other words, B does not satisfy the delegated task. Example in conversation: A: "What time is it?", B: "I don't know". A's subgoal that B answers is satisfied, but the goal (to know the time) is not. Example in a practical domain: A delegates to B "make-fettuccini-pesto" and B just "make-pesto".

fig. 1 (figure omitted; legend: delegated, adopted)
This is due to several possible reasons: B is not able to do the whole task; it is not convenient for it; it does not want to help A as for the higher goal, because for example it believes that A is able to do it by itself; etc. Let us not go deeply into this area, since we are mainly interested in collaborative conflicts which come from B's intention of helping A beyond his request or delegation and of exploiting its own knowledge and intelligence (reasoning, problem solving, planning, and decision skills) for A.
2.2 Beyond Delegation

B's intention to really collaborate can create some problems. On the one hand, one cannot be satisfied by an agent that helps just by doing what it is literally requested to do. This is not a very collaborative agent. It has no initiative, it does not care about our interests, and it does not use its knowledge and intelligence to correct our plans and requests, which might be incomplete, wrong or self-defeating. Thus, a truly helpful agent should not be "helpful" in the sense defined by [Coh]:
∀a [(Bel x (Goal y (Eventually (Done x a)))) ∧ ¬(Goal x ¬(Done x a))] ⊃ (P-Goal x (Done x a))
This agent is just adopting an action (not its goal), and in a literal way. It is a mere executor; it obeys, but it is not very helpful. It may even be dangerous. However, on the other hand, there are dangers also when the agent takes the initiative of helping the client beyond its requests.
Conflicts of Over, Critical, Overcritical and Hypercritical Help

Given the situation (Delegates A B τ=(α,g)) with (dominates α' α), then if (Adopts B A τ1) ∧ (dominates α1 α) ∧ (dominates-or-equal α' α1), we say that B overhelps A (see fig. 2).

fig. 2 (figure omitted; legend: delegated, adopted)
Example in the conversation domain [Car2]: A: "What time is it?", B: "Be calm, it is 5pm and our meeting is at 6pm, we are in time". Both the delegated action (to inform about the time) and the higher, non-delegated results (plan) (to know whether A is late or not; to not be anxious) are adopted and satisfied by B. Practical example: A asks B to prepare the sauce for the ravioli she will prepare for dinner, and B prepares both the sauce and the "ravioli with sauce". Given the situation (Delegates A B τ=(α,g)) with (dominates α' α), then if
(Adopts B A τx) ∧ (τx=(αx,g)), we say that B gives critical help for τ (see fig. 3). In fact, what happens is that B adopts g; that is to say, it is sufficient for B to find in ActB an action αx, whatever it may be, such that g ⊆ RB(αx).

fig. 3 (figure omitted; legend: delegated, adopted, alternative)

Critical help holds in the following alternative cases:
i) (CEK(B,α)=∅) ∨ (g ⊄ RB(α)) ∨ (P(α)=false); that is to say, agent B either is not able to execute α or, on the basis of his knowledge of action results, guesses that g is not among the results of α, or the conditions of α are not true (and he is unable to realize them). Correspondingly, he must guess that there is an action αx such that: (CEK(B,αx)≠∅) ∧
(g ⊆ RB(αx)) ∧ (P(αx)=true); in other words, B finds another way to realize g, using another action αx in its action repertoire, such that: B is able to realize it, the new action contains g among its results, and its conditions are satisfied.
ii) B thinks that the other results of α (beyond g) are in conflict with other goals - in plan or off plan - or interests of the client. On the other side, he thinks that there is an action αx with: (CEK(B,αx)≠∅) ∧ (g ⊆ RB(αx)) ∧ (P(αx)=true), and the results of αx are not in conflict with other goals or interests of the client.
iii) There is also the case of optimization, where the conditions in (i) are all false but there is an action αx such that g is reached in a more profitable way for B (relative to any criteria).
Given the situation (Delegates A B τ=(α,g)) with (dominates α' α), then if (Adopts B A τx) ∧ (τx=(αx,g')), we say that B gives overcritical help for τ (see fig. 4). In fact, what happens is that B adopts g'; that is to say, it is sufficient for B to find in ActB an action αx, whatever it may be, such that g' ⊆ RB(αx). It is a mixed case in which there are overhelp and critical help at the same time.

fig. 4 (figure omitted; legend: delegated, adopted, alternative)
Overcritical help holds in the following alternative cases:
i) Pr(α,α',g')=∅ and at the same time (∃αx ∈ ActB | Pr(αx,α',g')≠∅ ∧ CEK(B,αx)≠∅ ∧ P(αx)=true). In other words, there are no pertinent results of α in α', but there exists at least one action αx which is pertinent in α' aimed at g'. This means that α is useless for τ'. It is even possible that it is noxious: i.e. that R(α) produces results that contradict those intended with τ'. A is delegating to B a plan that in B's view is wrong or self-defeating.
ii) Pr(α,α',g')≠∅ ∧ CEK(B,α)≠∅ ∧ P(α)=true, and in addition (∃αx ∈ ActB | CEK(B,αx)≠∅ ∧ P(αx)=true ∧ Pr(αx,α',g')≠∅); moreover,
ii1) R(αx) achieves the goals internal to the plan (i.e. g') in a better way (maximization). Example: A asks B "to buy second class train tickets for Naples" (action α) for her plan "to go to Naples by train cheaply" (action α'). B adopts A's goal "to be in Naples and spend little money" (goal g'), replacing the whole plan (α') with another plan: "go with Paul by car".
ii2) R(αx) achieves not only the goals of the plan (i.e. g') but also other goals of A external to that plan (e.g. g''): (g' ⊆ R(αx)) ∧ (g'' ⊆ R(αx)). Example: A asks B "to buy second class train tickets for Naples" (action α) for her plan "to go to Naples by train cheaply" (action α'). B adopts A's goal "to be in Naples and spend little money" (goal g'), replacing the whole plan (α') with another plan (αx),
"to go with Paul by car" that satisfies also another goal of A - that she did not consider or satisfy in her plan - but B knows: "to travel with friends". ii3) R(cq) achieves not only the goals of the plan but also s o m e interests (i) o f A: (g'CR(a~)) ^ (iCR(oh,)). Example: A asks B "to buy second class train tickets for Naples" (action ~x) for her plan "to go to Naples by train cheaply" (action ct'). B adopts A's goal "to be in Naples and spend little money" (goal g') replacing the whole plan (ct') with another plan (ctO "to go to Naples by bus" that satisfies an interest of A of "not risking to meet Paul that she ignores to be on the same train". Given the situation: (Delegates A B x=(ct,g)), with (dominates a' ct), then if (Adopts B A gO we say that B makes an hypercritical help of x (see fig.5). In fact, B adopts g~, where gt is an interest (or an off-plan goal) of A more important than g' (we leave here this notion just intuitive). Since there is a conflict between the result R(ct) (and/or the result R(ct')) and some g~ of A, to adopt g~ would imply to not obtain R(ct) (or R(ct')). ct'(g') ~ct(g)
1) ~ ~ . . . .
Ql(gl)
~
1
delegated adopted
fig.5 In any case of over, critical, overcritical and hypercritical help there is apparently a conflict, since A has the goal that B does a, while B is doing or intends to do something different. These conflicts can be very rapidly solved for two reasons. First, B's intention is helping A, it is a collaborative one; second, normally B is "entitled" by A (either explicitly or implicitly) to do this deeper help, and A is expecting this initiative and autonomy. Thus, normally there is no real conflict since A is ready to accept B's collaborative initiative. However, sometimes these cases initiate serious conflicts to be negotiated. In fact, A might be against B's initiative or offer for several reasons: - It is not a (better) solution. A disagrees about B's plan. They have different knowledge about domain plans and A is not persuaded (cannot revise his own beliefs converging with B's beliefs). So A do not consider a good or a better solution what B is doing/proposing. (This applies to critical, overcritical and hypercritical help). - I don't trust you. There are two cases. First, A does not believe that the proposed
solution is worse, but he does not rely too much on B's intelligence, competence, or honesty (B might have some personal interest in pursuing or suggesting that solution; see later). Second, A believes that B is not able to correctly execute the larger plan. (Critical, overcritical and hypercritical help.)
- Who entitled you? A does not like B's initiative of going his own way. He did not entitle B to help him beyond his literal request. Either what he really wants is to be "obeyed" (for example, in order not to lose control), or he does not want the other to be intrusive, or he wants to solve his own problems by himself; etc. (Over, critical, overcritical and hypercritical help.)
- It is not your job/role. This is similar to the previous point, but here B and A disagree about the institutional position of B: A challenges B's pretence of autonomy and/or of high-level collaboration. (Over, critical, overcritical and hypercritical help.)
- Don't be paternalistic / Nobody knows better than me what is good for me. This case deserves special attention, since it is based on interest adoption, which is the highest level of helpfulness, but also the riskiest. (Hypercritical help.)
As we said, several important kinds of conflict are due either to misunderstanding, or to wrong/incomplete plan abduction by B, or to B considering his personal interest. These conflicts are outside the scope of this paper.
Tutorial and paternalistic conflicts

The adoption of A's interests beyond its request and expectation is the most problematic case. In fact, B is claiming to know better than A what is good for him - not only to know better than him a solution or a plan for his current or long-term goals: B claims that A does not have the proper goals. When B takes care of A's interests and tries to induce A to pursue certain goals because this is in A's own interest ("for your benefit"), we call this social attitude and relation tutorial. Of course, this attitude is very pretentious and might be annoying for A, who perceives it as "paternalistic". Any adult agent claims to know and to be able to decide what is better for him. Nevertheless, it is an objective fact - given our cognitive and rational limits - that in many cases we ignore our interests, what is better for us, and that we do not pursue as a goal what we need. Thus this dialectic is intrinsic in any relation of deep and spontaneous help.7 Normally in tutorial adoption there is a conflict, since the contractor wants the client to have a goal different from those it currently has. In fact, objective "interests" are defined [Con] as something that an agent believes should be a goal of the other agent, but that (it believes) is not: something that is useful or necessary for the other agent, that is needed, but that the other does not understand or know, or does not prefer (thus ignoring what is better for it). When you believe that I is an interest of yours ("is in your interest"), I becomes a goal of yours, although not necessarily a pursued goal. When you believe that I is your prevailing interest (or that it is in your interest to pursue I), I not only becomes a goal but is preferred and pursued. What the tutorial agent is trying to do is to influence you to adopt your interest as a goal, or as your prevailing goal. A tutorial software agent or system would be an agent that, for example, answering your request to reserve a table in a nice restaurant, says "it is too expensive for
7 Tutorial relations are annoying not only because of this pretence of knowing my own good better than myself, but also because B might be a hypocrite: it might pass off as "my interest" and "for me" what is actually its own interest or the interest of the institution it is representing. In this case it is really paternalistic and is deceiving me (manipulating me).
you!", or requested to order a box of whisky answers "alcohol is toxic". Want we such an intrusive and father-like agent? What kind of limits have we to put to this helping intrusion? This would be very annoying, but what about a software agent that without expliciting its criteria just give us advises and plans that are based on our presumptuous interests more than on our actual goals? This might be even worst, even if it has some interesting aspects. Consider for example a tourist adviser that has the goal of avoiding crowding of tourists all in the same famous places and the goal of make them visit beautiful but not well-known monuments. This is claimed to be in the interest of the tourist themselves (but of course is also in the interest of the tourist office, of the city and of its major, etc.). Suppose you say this system in Rome that you have just one day and that you would like to see Caravaggio. Well, this system will adopt your goal, but also your interest, and send you to see some beautiful Caravaggio that is not that famous and perhaps is not what you expected. This may be a very good advice and system: you might appreciate this discover and your quiet enjoying it. But this may also be quite bad: you might be disappointed not seeing the Caravaggio your friends will ask you about when you will be back home. So, what to do? Has this kind of help to be allowed in artificial agents? With what kind of limits? There are several possible remedies: - Agents should tacitly and automatically adopt the interests of the client only when those interests are reconcilable with the client's request, and there is a plan for satisfying both. In this case the over adoption is just a better solution for a broader problem. More formally, A delegates x=(ct,g) and B adopts x'=(ct',g') with (gUi)C_g' (i is an A's interest). In their social life humans accept (and expect) this kind of help only by certain agents (mother, friends, doctor, etc.) relative to certain area of goals (for ex. health from the doctor, salvation from a priest, happiness from a friend). Similarly we could decide whether we want a tutorial attitude by our software agents or helping systems or not, and relative to which domain. In practice, we could establish various classes of tasks where the agent can assume a paternalistic behaviour. The adoption of our interests in case of conflict must be explicit, so that the adopted agent can refuse or at least discuss till is persuaded. In this case the hypercritical adoption must be communicated to the client before being applied. - The agent's refusal of the client's request for tutorial reasons is not allowed (except for some "vital" interest, like life). -
-
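The first remedy's condition, (g ∪ i) ⊆ g', is directly checkable (our sketch, with invented goal atoms):

    def tacit_interest_adoption_allowed(g, i, g_adopted):
        # Tacit adoption of interest i is allowed only if the adopted goal
        # g' covers both the requested goal g and the interest i.
        return set(g) | set(i) <= set(g_adopted)

    # Hypothetical example: seeing a Caravaggio (goal) plus avoiding crowds
    # (interest) are both covered by the adopted goal.
    print(tacit_interest_adoption_allowed(
        {"see-caravaggio"}, {"avoid-crowds"},
        {"see-caravaggio", "avoid-crowds", "one-day-visit"}))  # True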
3 Conflicts about reasons for adopting

In this section we will examine a second type of conflict due to different cooperative attitudes of the two agents. We will do this in a non-formal way, since these conflicts are relative not to the plan but to the agents' motivations. Humans not only want goal-adoption and practical help from others; they also consider very important - sometimes more important than the adoption itself - the mental attitude of the helper, the reasons for adopting. In fact, they search and plan for a given type of adoption based on specific reasons, and they ask for a given mental attitude, not just for help. Let us give the example of speech acts. The mind of the addressee has been insufficiently analysed in speech act theory [Lev], while it plays a very important role. We claim that
speech acts differ from one another as for the different minds the speaker is attempting to obtain from the hearer: the speaker attempts to induce different reasons for goal-adoption in the addressee. The difference between a prayer and a command is not only pragmatic, sociological and contextual (a prayer is used between a powerless and a powerful person, etc.; a command requires a command position, an authority, etc.; therefore it is pragmatically inappropriate that a general prays to a soldier or that a soldier gives commands to a general). In our view, the difference is in the cognitive analysis of the act: in the speaker's plan about the hearer's mind.
• A prayer is a request of adhesion out of pity, out of generosity: I am planning your mental attitude in helping me.
• A "please-request" ("could you please...") is searching for courtesy-based adhesion.
• A command is aimed at obedience: I want you to do that not for any reason whatever, but because you acknowledge my authority.
This is within the speech-act plan itself, and in its meaning, not in the context! Thus, speech acts prescribe an entire mind rather than a given behaviour only. This pretence by the delegating agent (the speaker) produces several possible conflicts. When you cry "Help me! Help me!" drowning in a river, you neither want nor expect that the guy who is supposed to help you asks "What will you give me?". There is a conflict here about the reasons for adoption: you are asking for an altruistic adoption, and the other offers you an exchange-based adoption [Con]. When, after sexual intercourse, your partner asks you for some money, or, vice versa, when the man puts some money on the table, there is a terrible disillusion: there was a misunderstanding and a conflict about reasons for adoption. In the former case, the man was searching/asking for a cooperative and symmetric adoption (based on the same reasons he has: appeal, pleasure, affect, ...) while the girl offered a commercial adoption. In the latter case, the man was searching for a money-exchange-based adoption of his sexual desires, without affective or other "complications", while the girl was adopting for other reasons (attraction, sympathy, love, etc.). This kind of conflict is relevant also among artificial intelligent agents. In fact, also among these agents there are several reasons for adopting each other, in particular in open environments (like the web) and in MASs with heterogeneous, self-interested agents. One should at least distinguish between "free" (no obligations) and "due" adoption and commitment.
3.1 Due adoptions: debts and roles

Debts. First of all, it is very reasonable that in several MA systems8 self-interested agents have to keep a memory of previous interactions, and in particular have to maintain both a memory of the reliability and honesty of the other agents (their "reputation") and a book/record of their credits (I did something for him, I am waiting for some reciprocation) and debts (she did something for me, she is waiting for some reciprocation). Of course, this would also require a "social norm" (an obligation) or a built-in goal of reciprocation in the agents.
8 In our view, this is needed also in CSCW systems (supporting commitments and collaboration among human partners).
Now, suppose agents have such debts-credits information; they are prone to another form of delegation conflict: agent A asks B to do action α as reciprocation, while B is ready to help A, but not as reciprocation. Either B believes that she has no debts with A, or she does not want to reciprocate. B might, for example, intend to help A for exchange (asking for some immediate reward or for some future reciprocation) or out of sympathy, benevolence, etc. The conflict is not about helping or not, or about the amount of the reward [Kra]: it is about the reasons for adopting. A delegates τ to B provided that B adopts his goal for specific reasons (duty), while B adopts A's task provided that A accepts her reasons (not a "due" adoption but a free one).9
Duties (role). Second, suppose an organisation, i.e. a collective activity of a group of agents based on some previous commitment among them, relative not just to a specific and extemporaneous task but to a class of possible tasks (see the notion of Organisational Commitment). These commitments about classes of goals within the organisation's plans define, as we said, the Role of the agent in that organisation. The existence of such an established Role, generic commitments, and pre-agreement deeply changes the relation between the client and the contractor. In fact, if A asks B to do something (α) that belongs to her Role, this is just an instantiation of what B already promised/agreed to do: it is her Role-duty. B has to do this "by Role", by contract. Also this source of duty raises possible conflicts of delegation. On the one side, B might disagree about α belonging to her office: (Bel A (∃ρ1 | ((A∈A1)∧(B∈B1)∧(τ∈T1)))) ∧ (Bel B (¬∃ρ1 | ((A∈A1)∧(B∈B1)∧(τ∈T1)))). On the other side, the agents might give a different interpretation of the original contract (Organisational Commitment); the conflict might also be about the reasons for adopting: A might search for a due, role-based adoption, while B is ready to help A but for other reasons. For example, A gives a "command" to B, while B does not want to "obey" but just to help in a friendly way, or to exchange.
Free adoption. Of course, also within free adoption there might be conflicts about reasons: A wants B to help him out of cooperation (since he believes that they are co-interested in a common goal), while B asks for some reward. Moreover, the possibility is not excluded of either implementing, or letting evolve in some population of artificial agents (robots, softbots), some form of "reciprocal altruism". In this case a new motive for help (a "terminal goal") is provided, and new conflicts are allowed: an agent might ask for altruistic help while the other might supply an exchange-based or reciprocation-based help; or vice versa.
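The debts-credits bookkeeping suggested above could be sketched as follows (entirely our illustration; all names are hypothetical):

    from collections import defaultdict

    class SocialLedger:
        # Per-agent memory of previous interactions: reputation of the
        # others, plus records of credits (adoptions I gave, awaiting
        # reciprocation) and debts (adoptions I received).
        def __init__(self):
            self.reputation = defaultdict(float)  # other -> reliability score
            self.credits = defaultdict(list)      # other -> tasks I adopted
            self.debts = defaultdict(list)        # other -> tasks adopted for me

        def record_adoption_given(self, other, task):
            self.credits[other].append(task)

        def record_adoption_received(self, other, task):
            self.debts[other].append(task)

        def owes_me(self, other):
            # a crude test of whether I may ask `other` for reciprocation
            return len(self.credits[other]) > len(self.debts[other])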
3.2 Role conflicts

Since throughout the paper we have had to mention conflicts relative to the role of the agents (as a set of classes of pre-established tasks), let us briefly recapitulate, for the sake of clarity, the main types of role conflicts. Given (Delegates A B τ):
9 This conflict about titles for delegating and reasons for adopting must not be mixed up with possible conflicts about the amount of the debt/credit: "I acknowledge my debt, but what you ask me as reciprocation is more than you gave me!".
• A's delegation does not match B's role: either a sub-task or an over-task is requested: (Bel B (∃ρ2 | ((A∈A2)∧(B∈B2)∧(τ'∈T2)) ∧ (((dominates-at-level-k α' α) ∨ (dominates-at-level-k α α')) ∨ (no plan relations between α and α')))).
• A's delegation does not fit A's role: A should not delegate/ask for that kind of task (e.g. A is not entitled to delegate τ): (Bel B (∀ρi such that ((B∈Bi)∧(τ∈Ti)): (A∉Ai))).
• A and B disagree about τ belonging to B's office: (Bel B (∀ρi such that (B∈Bi): (τ∉Ti))), where ((dominates-at-level-k α' α) ∨ (dominates-at-level-k α α')).
• A and B disagree about the reasons for B's adoption (whether it has to be a role adoption or not).
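For bookkeeping, these four role-conflict types can be named in an enumeration (a sketch of ours, not notation from the paper):

    from enum import Enum

    class RoleConflict(Enum):
        TASK_ROLE_MISMATCH = "delegated task is a sub- or over-task of B's role"
        CLIENT_NOT_ENTITLED = "A's own role does not entitle A to delegate tau"
        OFFICE_DISAGREEMENT = "A and B disagree whether tau belongs to B's office"
        REASON_DISAGREEMENT = "A and B disagree whether adoption is role-based"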
4 Conclusions

We claimed that delegation and its related conflicts are the core of the interaction with any kind of autonomous agent. We presented our definition of delegation and adoption, a plan-based definition of tasks, and of different kinds and levels of delegation and adoption. We attempted to show that:
i) There are several levels of cooperation - more or less "deep" and helpful - and several levels of task delegation.
ii) These levels are related to the hierarchical structure of plans or tasks.
iii) There is a non-arbitrary correspondence between levels of delegation and levels of adoption; we called this relation "contract".
iv) A "deep" cooperation needs understanding of the plans, goals, and interests of the other agent or user.
v) There is a fundamental distinction between the delegation/adoption of a domain task (practical action), of a planning or problem-solving action, and of a control action.
We illustrated the most interesting conflicts that arise when the provided help does not match the intended delegation (except conflicts relative to control, and conflicts due to misunderstanding). We also discussed conflicts due to critical and over-help, and to the tutorial (paternalistic) attitude of the helper, who takes care of our interests against our requests. Finally, another source of conflict relative to the relationship between delegation and adoption has been examined: the conflict between the reasons and motives for the adoption as planned by the client and the reasons for adopting offered by the helper; a conflict that presupposes a record of debts and credits among the interacting agents.

References
[Bon] Bond, A.H., Commitments: Some DAI insights from Symbolic Interactionist Sociology. In Proceedings of the 9th International AAAI Workshop on Distributed Artificial Intelligence, pp. 239-261, Menlo Park, 1989.
[Car1] Chu-Carroll, J., Carberry, S., Conflict detection and resolution in collaborative planning. IJCAI-95 Workshop on Agent Theories, Architectures, and Languages, Montreal, Canada, 1995.
[Car2] Chu-Carroll, J., Carberry, S., A Plan-Based Model for Response Generation in Collaborative Task-Oriented Dialogues. In Proceedings of AAAI-94, 1994.
[Cas1] Castelfranchi, C., Falcone, R., Towards a Theory of Single-Agent into Multi-Agent Plan Transformation. The 3rd Pacific Rim International Conference on Artificial Intelligence, Beijing, China, 16-18 August 1994.
[Cas2] Castelfranchi, C., Falcone, R., Levels of Help, Levels of Delegation and Agent Modeling. AAAI-96 Agent Modeling Workshop, 4 August 1996.
[Cas3] Castelfranchi, C., Social Power: A Missed Point in DAI, MA and HCI. In Decentralized AI, Y. Demazeau & J.P. Mueller (eds), 49-62. Amsterdam: Elsevier.
[Cas4] Castelfranchi, C., Commitment: From Intentions to Groups and Organizations. In Proceedings of ICMAS'96, San Francisco, June 1996, AAAI-MIT Press.
[Coh] Cohen, Ph., Levesque, H., Rational Interaction as the Basis for Communication. Technical Report, CSLI, Stanford, 1987.
[Con] Conte, R., Castelfranchi, C., Cognitive and Social Action. UCL Press, London, 1995.
[Fal] Falcone, R., Castelfranchi, C., "On Behalf of ...": Levels of Help, Levels of Delegation and Their Conflicts. 4th ModelAge Workshop: "Formal Models of Agents", Certosa di Pontignano (Siena), 15-17 January 1997.
[Fik] Fikes, R., A Commitment-Based Framework for Describing Informal Cooperative Work. Cognitive Science, 6:331-347, 1982.
[Goo] Goodwin, R., Formalizing Properties of Agents. Technical Report CMU-CS-93-159, 1993.
[Gro] Grosz, B., Kraus, S., Collaborative Plans for Complex Group Action. Artificial Intelligence 86, pp. 269-357, 1996.
[Had] Haddadi, A., Communication and Cooperation in Agent Systems. Springer-Verlag, 1996.
[Kin] Kinny, D., Ljungberg, M., Rao, A., Sonenberg, E., Tidhar, G., Werner, E., Planned Team Activity. In C. Castelfranchi and E. Werner (Eds.), Artificial Social Systems (MAAMAW'92), Springer-Verlag, LNAI 830, 1994.
[Kra] Kraus, S., Agents Contracting Tasks in Non-Collaborative Environments. In Proceedings of AAAI'93, pp. 243-248, 1993.
[Lev] Levesque, H.J., Cohen, P.R., Nunes, J.H.T., On Acting Together. In Proceedings of the 8th National Conference on Artificial Intelligence, 94-100. San Mateo, California: Kaufmann, 1990.
[Lev] Levinson, S.C., The Essential Inadequacies of Speech Act Models of Dialogue. In H. Parret (editor), Possibilities and Limitations of Pragmatics, 1981.
[Luc] Luck, M., d'Inverno, M., A Formal Framework for Agency and Autonomy. In Proceedings of the First International Conference on Multi-Agent Systems, pages 254-260. AAAI Press/MIT Press, 1995.
[Mac] Maes, P., Situated Agents Can Have Goals. In P. Maes (editor), Designing Autonomous Agents, pp. 49-70. The MIT Press, 1990.
[Ros] Rosenschein, J.S., Zlotkin, G., Rules of Encounter: Designing Conventions for Automated Negotiation among Computers. Cambridge, MA: MIT Press, 1994.
[Pol] Pollack, M., Plans as Complex Mental Attitudes. In Cohen, P.R., Morgan, J., Pollack, M.E. (eds), Intentions in Communication, MIT Press, pp. 77-103, 1990.
[Sic] Sichman, J., Conte, R., Castelfranchi, C., Demazeau, Y., A Social Reasoning Mechanism Based on Dependence Networks. In Proceedings of the 11th ECAI, 1994.
[Sin] Singh, M.P., Multiagent Systems: A Theoretical Framework for Intentions, Know-How, and Communications. Springer-Verlag, LNCS, volume 799, 1995.
[Wer] Werner, E., Cooperating Agents: A Unified Theory of Communication and Social Structure. In L. Gasser and M.N. Huhns (editors), Distributed Artificial Intelligence: Volume II. Morgan Kaufmann Publishers, 1990.
Lecture Notes in Artificial Intelligence (LNAI)
Vol. 1076: N. Shadbolt, K. O'Hara, G. Schreiber (Eds.), Advances in Knowledge Acquisition. Proceedings, 1996. XII, 371 pages. 1996.
Vol. 1079: Z.W. Ras, M. Michalewicz (Eds.), Foundations of Intelligent Systems. Proceedings, 1996. XI, 664 pages. 1996.
Vol. 1081: G. McCalla (Ed.), Advances in Artificial Intelligence. Proceedings, 1996. XII, 459 pages. 1996.
Vol. 1083: K. Sparck Jones, J.R. Galliers, Evaluating Natural Language Processing Systems. XV, 228 pages. 1996.
Vol. 1085: D.M. Gabbay, H.J. Ohlbach (Eds.), Practical Reasoning. Proceedings, 1996. XV, 721 pages. 1996.
Vol. 1087: C. Zhang, D. Lukose (Eds.), Distributed Artificial Intelligence. Proceedings, 1995. VIII, 232 pages. 1996.
Vol. 1093: L. Dorst, M. van Lambalgen, F. Voorbraak (Eds.), Reasoning with Uncertainty in Robotics. Proceedings, 1995. VIII, 387 pages. 1996.
Vol. 1095: W. McCune, R. Padmanabhan, Automated Deduction in Equational Logic and Cubic Curves. X, 231 pages. 1996.
Vol. 1104: M.A. McRobbie, J.K. Slaney (Eds.), Automated Deduction - CADE-13. Proceedings, 1996. XV, 764 pages. 1996.
Vol. 1111: J.J. Alferes, L. Moniz Pereira, Reasoning with Logic Programming. XXI, 326 pages. 1996.
Vol. 1114: N. Foo, R. Goebel (Eds.), PRICAI'96: Topics in Artificial Intelligence. Proceedings, 1996. XXI, 658 pages. 1996.
Vol. 1115: P.W. Eklund, G. Ellis, G. Mann (Eds.), Conceptual Structures: Knowledge Representation as Interlingua. Proceedings, 1996. XIII, 321 pages. 1996.
Vol. 1126: J.J. Alferes, L. Moniz Pereira, E. Orlowska (Eds.), Logics in Artificial Intelligence. Proceedings, 1996. IX, 417 pages. 1996.
Vol. 1137: G. Görz, S. Hölldobler (Eds.), KI-96: Advances in Artificial Intelligence. Proceedings, 1996. XI, 387 pages. 1996.
Vol. 1147: L. Miclet, C. de la Higuera (Eds.), Grammatical Inference: Learning Syntax from Sentences. Proceedings, 1996. VIII, 327 pages. 1996.
Vol. 1152: T. Furuhashi, Y. Uchikawa (Eds.), Fuzzy Logic, Neural Networks, and Evolutionary Computation. Proceedings, 1995. VIII, 243 pages. 1996.
Vol. 1159: D.L. Borges, C.A.A. Kaestner (Eds.), Advances in Artificial Intelligence. Proceedings, 1996. XI, 243 pages. 1996.
Vol. 1160: S. Arikawa, A.K. Sharma (Eds.), Algorithmic Learning Theory. Proceedings, 1996. XVII, 337 pages. 1996.
Vol. 1168: I. Smith, B. Faltings (Eds.), Advances in Case-Based Reasoning. Proceedings, 1996. IX, 531 pages. 1996.
Vol. 1171: A. Franz, Automatic Ambiguity Resolution in Natural Language Processing. XIX, 155 pages. 1996.
Vol. 1177: J.P. Müller, The Design of Intelligent Agents. XV, 227 pages. 1996.
Vol. 1187: K. Schlechta, Nonmonotonic Logics. IX, 243 pages. 1997.
Vol. 1188: T.P. Martin, A.L. Ralescu (Eds.), Fuzzy Logic in Artificial Intelligence. Proceedings, 1995. VIII, 272 pages. 1997.
Vol. 1193: J.P. Müller, M.J. Wooldridge, N.R. Jennings (Eds.), Intelligent Agents III. XV, 401 pages. 1997.
Vol. 1195: R. Trappl, P. Petta (Eds.), Creating Personalities for Synthetic Actors. VII, 251 pages. 1997.
Vol. 1198: H.S. Nwana, N. Azarmi (Eds.), Software Agents and Soft Computing: Towards Enhancing Machine Intelligence. XIV, 298 pages. 1997.
Vol. 1202: P. Kandzia, M. Klusch (Eds.), Cooperative Information Agents. Proceedings, 1997. IX, 287 pages. 1997.
Vol. 1208: S. Ben-David (Ed.), Computational Learning Theory. Proceedings, 1997. VIII, 331 pages. 1997.
Vol. 1209: L. Cavedon, A. Rao, W. Wobcke (Eds.), Intelligent Agent Systems. Proceedings, 1996. IX, 188 pages. 1997.
Vol. 1211: E. Keravnou, C. Garbay, R. Baud, J. Wyatt (Eds.), Artificial Intelligence in Medicine. Proceedings, 1997. XIII, 526 pages. 1997.
Vol. 1216: J. Dix, L. Moniz Pereira, T.C. Przymusinski (Eds.), Non-Monotonic Extensions of Logic Programming. Proceedings, 1996. XI, 224 pages. 1997.
Vol. 1221: G. Weiß (Ed.), Distributed Artificial Intelligence Meets Machine Learning. Proceedings, 1996. X, 294 pages. 1997.
Vol. 1224: M. van Someren, G. Widmer (Eds.), Machine Learning: ECML-97. Proceedings, 1997. XI, 361 pages. 1997.
Vol. 1227: D. Galmiche (Ed.), Automated Reasoning with Analytic Tableaux and Related Methods. Proceedings, 1997. XI, 373 pages. 1997.
Vol. 1228: S.-H. Nienhuys-Cheng, R. de Wolf, Foundations of Inductive Logic Programming. XVII, 404 pages. 1997.
Vol. 1237: M. Boman, W. Van de Velde (Eds.), Multi-Agent Rationality. Proceedings, 1997. XII, 254 pages. 1997.
Lecture Notes in Computer Science
Vol. 1202: P. Kandzia, M. Klusch (Eds.), Cooperative Information Agents. Proceedings, 1997. IX, 287 pages. 1997. (Subseries LNAI).
Vol. 1203: G. Bongiovanni, D.P. Bovet, G. Di Battista (Eds.), Algorithms and Complexity. Proceedings, 1997. VIII, 311 pages. 1997.
Vol. 1204: H. Mössenböck (Ed.), Modular Programming Languages. Proceedings, 1997. X, 379 pages. 1997.
Vol. 1205: J. Troccaz, E. Grimson, R. Mösges (Eds.), CVRMed-MRCAS'97. Proceedings, 1997. XIX, 834 pages. 1997.
Vol. 1206: J. Bigün, G. Chollet, G. Borgefors (Eds.), Audio- and Video-based Biometric Person Authentication. Proceedings, 1997. XII, 450 pages. 1997.
Vol. 1207: J. Gallagher (Ed.), Logic Program Synthesis and Transformation. Proceedings, 1996. VII, 325 pages. 1997.
Vol. 1208: S. Ben-David (Ed.), Computational Learning Theory. Proceedings, 1997. VIII, 331 pages. 1997. (Subseries LNAI).
Vol. 1209: L. Cavedon, A. Rao, W. Wobcke (Eds.), Intelligent Agent Systems. Proceedings, 1996. IX, 188 pages. 1997. (Subseries LNAI).
Vol. 1210: P. de Groote, J.R. Hindley (Eds.), Typed Lambda Calculi and Applications. Proceedings, 1997. VIII, 405 pages. 1997.
Vol. 1211: E. Keravnou, C. Garbay, R. Baud, J. Wyatt (Eds.), Artificial Intelligence in Medicine. Proceedings, 1997. XIII, 526 pages. 1997. (Subseries LNAI).
Vol. 1212: J.P. Bowen, M.G. Hinchey, D. Till (Eds.), ZUM '97: The Z Formal Specification Notation. Proceedings, 1997. X, 435 pages. 1997.
Vol. 1213: P.J. Angeline, R.G. Reynolds, J.R. McDonnell, R. Eberhart (Eds.), Evolutionary Programming VI. Proceedings, 1997. X, 457 pages. 1997.
Vol. 1214: M. Bidoit, M. Dauchet (Eds.), TAPSOFT '97: Theory and Practice of Software Development. Proceedings, 1997. XV, 884 pages. 1997.
Vol. 1215: J.M.L.M. Palma, J. Dongarra (Eds.), Vector and Parallel Processing - VECPAR'96. Proceedings, 1996. XI, 471 pages. 1997.
Vol. 1216: J. Dix, L. Moniz Pereira, T.C. Przymusinski (Eds.), Non-Monotonic Extensions of Logic Programming. Proceedings, 1996. XI, 224 pages. 1997. (Subseries LNAI).
Vol. 1217: E. Brinksma (Ed.), Tools and Algorithms for the Construction and Analysis of Systems. Proceedings, 1997. X, 433 pages. 1997.
Vol. 1218: G. Paun, A. Salomaa (Eds.), New Trends in Formal Languages. IX, 465 pages. 1997.
Vol. 1219: K. Rothermel, R. Popescu-Zeletin (Eds.), Mobile Agents. Proceedings, 1997. VIII, 223 pages. 1997.
Vol. 1220: P. Brezany, Input/Output Intensive Massively Parallel Computing. XIV, 288 pages. 1997.
Vol. 1221: G. Weiß (Ed.), Distributed Artificial Intelligence Meets Machine Learning. Proceedings, 1996. X, 294 pages. 1997. (Subseries LNAI).
Vol. 1222: J. Vitek, C. Tschudin (Eds.), Mobile Object Systems. Proceedings, 1996. X, 319 pages. 1997.
Vol. 1223: M. Pelillo, E.R. Hancock (Eds.), Energy Minimization Methods in Computer Vision and Pattern Recognition. Proceedings, 1997. XII, 549 pages. 1997.
Vol. 1224: M. van Someren, G. Widmer (Eds.), Machine Learning: ECML-97. Proceedings, 1997. XI, 361 pages. 1997. (Subseries LNAI).
Vol. 1225: B. Hertzberger, P. Sloot (Eds.), High-Performance Computing and Networking. Proceedings, 1997. XXI, 1066 pages. 1997.
Vol. 1226: B. Reusch (Ed.), Computational Intelligence. Proceedings, 1997. XIII, 609 pages. 1997.
Vol. 1227: D. Galmiche (Ed.), Automated Reasoning with Analytic Tableaux and Related Methods. Proceedings, 1997. XI, 373 pages. 1997. (Subseries LNAI).
Vol. 1228: S.-H. Nienhuys-Cheng, R. de Wolf, Foundations of Inductive Logic Programming. XVII, 404 pages. 1997. (Subseries LNAI).
Vol. 1230: J. Duncan, G. Gindi (Eds.), Information Processing in Medical Imaging. Proceedings, 1997. XVI, 557 pages. 1997.
Vol. 1231: M. Bertran, T. Rus (Eds.), Transformation-Based Reactive Systems Development. Proceedings, 1997. XI, 431 pages. 1997.
Vol. 1232: H. Comon (Ed.), Rewriting Techniques and Applications. Proceedings, 1997. XI, 339 pages. 1997.
Vol. 1233: W. Fumy (Ed.), Advances in Cryptology - EUROCRYPT '97. Proceedings, 1997. XI, 509 pages. 1997.
Vol. 1234: S. Adian, A. Nerode (Eds.), Logical Foundations of Computer Science. Proceedings, 1997. IX, 431 pages. 1997.
Vol. 1235: R. Conradi (Ed.), Software Configuration Management. Proceedings, 1997. VIII, 234 pages. 1997.
Vol. 1237: M. Boman, W. Van de Velde (Eds.), Multi-Agent Rationality. Proceedings, 1997. XII, 254 pages. 1997. (Subseries LNAI).
Vol. 1240: J. Mira, R. Moreno-Diaz, J. Cabestany (Eds.), Biological and Artificial Computation: From Neuroscience to Technology. Proceedings, 1997. XXI, 1401 pages. 1997.