AGENT ENGINEERING
World Scientific
AGENT ENGINEERING
SERIES IN MACHINE PERCEPTION AND ARTIFICIAL INTELLIGENCE* Editors:
H. Bunke (Univ. Bern, Switzerland) P. S. P. Wang (Northeastern Univ., USA)
Vol. 25: Studies in Pattern Recognition — Memorial Volume in Honor of K. S. Fu (Ed. H. Freeman)
Vol. 26: Neural Network Training Using Genetic Algorithms (Eds. L. C. Jain, R. P. Johnson and A. F. J. van Rooij)
Vol. 27: Intelligent Robots — Sensing, Modeling & Planning (Eds. B. Bolles, H. Bunke and H. Noltemeier)
Vol. 28: Automatic Bankcheck Processing (Eds. S. Impedovo, P. S. P. Wang and H. Bunke)
Vol. 29: Document Analysis II (Eds. J. J. Hull and S. Taylor)
Vol. 30: Compensatory Genetic Fuzzy Neural Networks and Their Applications (Y.-Q. Zhang and A. Kandel)
Vol. 31: Parallel Image Analysis: Tools and Models (Eds. S. Miguet, A. Montanvert and P. S. P. Wang)
Vol. 33: Advances in Oriental Document Analysis and Recognition Techniques (Eds. S.-W. Lee, Y. Y. Tang and P. S. P. Wang)
Vol. 34: Advances in Handwriting Recognition (Ed. S.-W. Lee)
Vol. 35: Vision Interface — Real World Applications of Computer Vision (Eds. M. Cheriet and Y.-H. Yang)
Vol. 36: Wavelet Theory and Its Application to Pattern Recognition (Y. Y. Tang, L. H. Yang, J. Liu and H. Ma)
Vol. 37: Image Processing for the Food Industry (E. R. Davies)
Vol. 38: New Approaches to Fuzzy Modeling and Control — Design and Analysis (M. Margaliot and G. Langholz)
Vol. 39: Artificial Intelligence Techniques in Breast Cancer Diagnosis and Prognosis (Eds. A. Jain, A. Jain, S. Jain and L. Jain)
Vol. 40: Texture Analysis in Machine Vision (Ed. M. K. Pietikainen)
Vol. 41: Neuro-Fuzzy Pattern Recognition (Eds. H. Bunke and A. Kandel)
Vol. 42: Invariants for Pattern Recognition and Classification (Ed. M. A. Rodrigues)

*For the complete list of titles in this series, please write to the Publisher.
Series in Machine Perception and Artificial Intelligence - Vol. 43
AGENT ENGINEERING Editors
Jiming Liu Hong Kong Baptist University
Ning Zhong Maebashi Institute of Technology, Japan
Yuan Y Tang Hong Kong Baptist University
Patrick S P Wang Northeastern University, USA
World Scientific
Singapore • New Jersey • London • Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
AGENT ENGINEERING Copyright © 2001 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-02-4558-0
Printed in Singapore by World Scientific Printers (S) Pte Ltd
List of Contributors
K. Suzanne Barber
Electrical and Computer Engineering, The University of Texas at Austin
201 24th Street, ACE 5.436, Austin, Texas 78712, USA

Alan D. Blair
Department of Computer Science, University of Melbourne
Parkville, Victoria 3052, Australia
email: [email protected]

Bengt Carlsson
Department of Software Engineering and Computer Science, Blekinge Institute of Technology
Box 520, S-372 25 Ronneby, Sweden
email: [email protected]

Stefan J. Johansson
Department of Software Engineering and Computer Science, Blekinge Institute of Technology
Box 520, S-372 25 Ronneby, Sweden
email: [email protected]

Sam Joseph
NeuroGrid Consulting
205 Royal Heights, 18-2 Kamiyama-cho, Shibuya-ku, Tokyo 150-0047, Japan
email: [email protected]

Takahiro Kawamura
Corporate Research & Development Center, TOSHIBA Corp.
1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki 212-8582, Japan
email: [email protected]

Chunnian Liu
School of Computer Science, Beijing Polytechnic University (BPU)
Beijing 100022, P.R. China
email: [email protected]

Jiming Liu
Department of Computer Science, Hong Kong Baptist University
Kowloon Tong, Hong Kong
email: [email protected]

Felix Lor
Intelligent & Interactive Systems, Department of Electrical & Electronic Engineering, Imperial College of Science, Technology & Medicine
London SW7 2BT, United Kingdom

Cheryl E. Martin
Electrical and Computer Engineering, The University of Texas at Austin
201 24th Street, ACE 5.436, Austin, Texas 78712, USA
Setsuo Ohsuga
Department of Information and Computer Science, School of Science and Engineering, Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169, Japan
email: [email protected]

Jordan B. Pollack
Department of Computer Science, Brandeis University
Waltham, MA 02454-9110, USA
email: [email protected]

Elizabeth Sklar
Computer Science Department, Fulton Hall, Room 460, Boston College
Chestnut Hill, MA 02467, USA
email: [email protected]

Yuan Y. Tang
Department of Computer Science, Hong Kong Baptist University
Kowloon Tong, Hong Kong
email: [email protected]

John K. Tsotsos
Department of Computer Science, York University
4700 Keele Street, Toronto, Ontario, Canada M3P 1P3

Patrick S. P. Wang
College of Computer Science, Northeastern University
Boston, MA 02115, USA

Yiming Ye
IBM T.J. Watson Research Center
30 Saw Mill River Road (Route 9A), Hawthorne, N.Y. 10532, USA
email: [email protected]

Ning Zhong
Head of Knowledge Information Systems Lab., Department of Information Engineering, Maebashi Institute of Technology
460-1 Kamisadori-Cho, Maebashi-City, 371-0816, Japan
email: [email protected]
TABLE OF CONTENTS

List of Contributors  v

Introduction to Agent Engineering  1
Jiming Liu, Ning Zhong, Yuan Y. Tang, and Patrick S. P. Wang

Chapter 1  Why Autonomy Makes the Agent  7
Sam Joseph and Takahiro Kawamura

Chapter 2  Knowledge Granularity Spectrum, Action Pyramid, and the Scaling Problem  29
Yiming Ye and John K. Tsotsos

Chapter 3  The Motivation for Dynamic Decision-Making Frameworks in Multi-Agent Systems  59
K. Suzanne Barber and Cheryl E. Martin

Chapter 4  Dynamically Organizing KDD Processes in a Multi-Agent KDD System  93
Ning Zhong, Chunnian Liu, and Setsuo Ohsuga

Chapter 5  Self-Organized Intelligence  123
Jiming Liu

Chapter 6  Valuation-Based Coalition Formation in Multi-Agent Systems  149
Stefan J. Johansson

Chapter 7  Simulating How to Cooperate in Iterated Chicken and Prisoner's Dilemma Games  175
Bengt Carlsson

Chapter 8  Training Intelligent Agents Using Human Data Collected on the Internet  201
Elizabeth Sklar, Alan D. Blair, and Jordan B. Pollack

Chapter 9  Agent Dynamics: Soap Paradigm  227
Felix W. K. Lor

Author and Subject Index  261
Introduction to Agent Engineering
Jiming Liu, Hong Kong Baptist University
Ning Zhong, Maebashi Institute of Technology, Japan
Yuan Y. Tang, Hong Kong Baptist University
Patrick S. P. Wang, Northeastern University, U.S.A.
Agent engineering is concerned with the development of autonomous computational or physical entities capable of perceiving, reasoning, adapting, learning, cooperating, and delegating in a dynamic environment. It is one of the most promising areas of research and development in information technology, computer science, and engineering today.

Motivation

Traditionally, ever since the Dartmouth workshop where the notion of AI was first defined, the field of AI has been based primarily on the following premise: for a given problem (whether in mathematics, in engineering, or even in medicine), we can always represent it in precise, formal mathematical expressions, such as logical predicates or symbolic manipulation operators. Thereafter, based on these well-formulated representations, we can derive or deduce the exact solution to the problem. One difficulty with this approach can readily be recognized, namely that the problem sometimes requires formal models from a variety of domains. Owing to this difficulty, researchers later considered the alternative of having several AI systems working
concurrently, each of which would be assigned to handle a particular task at a particular time. Note that here the fundamental "philosophy" remained unchanged: symbolic (it has to rely on logical expression manipulation) and top-down (we have to define how many subsystems are required and how they should cooperate with each other). This later task-decomposition approach represents a one-step improvement over the single-AI-system approach.

A question one may now ask is the following: if we do not have complete knowledge of, or a complete model of, the given problem, how can we formulate it into well-defined expressions or statements? This is particularly pertinent in the case of solving real-life problems. The key shortcomings of the traditional AI approaches may be summarized in one sentence: they rely on human beings to plan the exact steps for transforming and solving the problem, and to carefully distribute the task to individual AI systems. Unfortunately, most of the time, we as systems designers fail to do this job very effectively. This is in fact one of the main reasons why we have not seen many of the early-promised AI applications in real life, ever since 1956 when Herbert A. Simon, Allen Newell, Marvin L. Minsky, Seymour A. Papert, and even Alan Turing (before 1956) founded this field. In an attempt to address the above-mentioned limitations and to get rid of the rather unrealistic assumptions of AI, people have started to realize the importance of developing autonomous agent-based systems. Unfortunately, this has happened only recently, over forty years later.

Key Issues

The general questions that remain are: What is meant by "autonomous agents"? How can we build agents with autonomy? What are the desirable capabilities of agents, with respect to surviving (they will not die) and living (they will furthermore enjoy their being or existence)? How can agents cooperate among themselves?

Each autonomous agent may have its own very primitive behaviors, such as searching, following, aggregation, dispersion, filtering, homing, and wandering. Two important issues to be considered here are how to develop learning and selection mechanisms (e.g., action observation and evaluation mechanisms) for acquiring agent behaviors, and how to implement an array of parameters (e.g., search depth, life-span, and age) for controlling the behaviors.

A system of decentralized agents may contain more than one class of agents. All the agents belonging to one class may share some (but not all) of their behavioral
characteristics. The issues to be addressed here are how to develop agent collective learning and collective behavior evolution/emergence, and how to establish and demonstrate the interrelationships between the autonomy of individual agents and the emergent global properties of agent teams, which result from the dynamic interaction as well as (co)evolution among several classes of agents and their environment. In this regard, some specific questions can readily be posed. For instance, if the behaviors are finite and defined locally for the individual agents as well as for the classes of agents, then in a given environment, how will the agents evolve in time? That is, how will the dynamics of the population change over time? How will the agents dynamically diffuse within the environment? Can they converge to a finite number of observable steady states? How will the parameters (such as the initial number/distribution of the agents and their given behavioral parameters) affect the converged states?

Let us take one step further. Say we have two concurrent ways to linearly change the behavioral parameters, in order to make the above-mentioned steady-state convergence faster and also more selective (since we may be interested in only one of the states). The first way is through each of the individual agents itself; i.e., the agent records its own performance, such as the number of encounters and the number of moves, and then, based on such observations, tries to control its own behavioral parameters in order to achieve an optimal performance, such as the maximum number of encounters and the minimum number of moves. The other way is through the feedback of global information that can be observed from the entire environment, such as the change in pattern formation of different classes of agents; the control in this case comes globally. Examples of this second way of behavioral change would be: one particular class of agents switches from one behavior to another as commanded by the global control mechanism, or one behavioral parameter in a particular class changes in a certain way. The purpose of doing this is to achieve the globally optimal performance for the entire system. This leads to the following question: in order to achieve the optimal performance at the global level only, how much optimization at the local individual level and how much at the global level would be necessary?

An Overview of this Volume

The aim of this volume is to address some of the key issues and questions in agent engineering.

In Chapter 1, Joseph and Kawamura provide a definition of (mobile) agents in terms of whether or not mobile objects can autonomously decide to adjust their stated objectives and modify the ways they achieve their objectives by assessing the outcome of their decisions. The authors explain why this distinction should be made
and illustrate this idea with examples from distributed mobile agent systems for conserving network resources.

In order to select actions and exhibit goal-directed behaviors, an agent should develop its own awareness, i.e., knowledge of itself and its environment. In Chapter 2, Ye and Tsotsos explicitly address two important questions concerning this issue: how much detail the agent should include in its knowledge representation so that it can efficiently achieve its goal, and how an agent should adapt its methods of representation such that its performance can scale to different task requirements.

In Chapter 3, Barber and Martin argue that a multiagent system must maintain an organizational policy that allows for the dynamic distribution of decision-making control and authority-over relationships among agents, in order to adapt to dynamically changing run-time situations. Based on a series of simulation-based experiments and comparative studies focusing on a form of organizational restructuring called Dynamic Adaptive Autonomy (DAA), they explain why such organizational-level adaptation is desirable from the point of view of the system's performance, and how it can be effectively implemented.

By increasing autonomy and versatility, multiagent systems have much to offer in solving large-scale, high-complexity computational problems. One such example is in the area of KDD (Knowledge Discovery and Data Mining), where different techniques should be appropriately selected to achieve different discovery goals. In Chapter 4, Zhong, Liu, and Ohsuga provide a generic framework for developing a multiagent KDD system. The important feature of their framework is that the KDD processes are dynamically organized through a society of KDD agents, with coordination among the KDD agents achieved using a planning agent (i.e., a meta-agent).

Chapter 5 is concerned with the issue of inducing self-organized collective intelligence in a multi-agent system. The specific tasks that Liu uses to demonstrate his approach are: (1) cellular agents are used to efficiently search for and dynamically track a moving object, and (2) distributed robots are required to navigate in an unknown environment toward shared common goal locations.

Besides coordination among distributed agents, agent engineering must also deal with the problems of cooperation and competition. What makes agents form coalitions? Under what conditions will an agent be included in or excluded from a coalition? How can coalitions be strengthened? In Chapter 6, Johansson argues for a rational, continuous view of agent membership in coalitions, where membership is based on how valuable a coalition is for an agent and vice versa. His work results in a theoretical model for updating and correcting group values in order to have a
trustful relation.

In multiagent cooperation situations there can sometimes be a conflict of interest among agents. When this happens, how should agents cooperate with each other? Carlsson explicitly addresses this issue in Chapter 7 and examines several game strategies based on rational and evolutionary behavior.

Can a population of software agents be produced using human behavior as a basis? Sklar, Blair, and Pollack's chapter (Chapter 8) describes a method for training such a population using human data collected at two Internet gaming sites. Their work proposes and tests two different training approaches: individual and collective.

Modeling a multiagent system can be helpful in explaining and predicting certain organizational properties of the system. Lor's chapter (Chapter 9) presents an attempt at modeling multiagent dynamics by considering an analogue between an agent and a soap bubble. In the proposed model of soap agents, the interaction among agents in a system is modeled as the dynamic expansion or shrinkage of soap bubbles.

We hope you enjoy reading this book.

Acknowledgements

We wish to express our gratitude to all the contributing authors of this book, not only for submitting their research work, but also for devoting their time and expertise to the cross-review of the chapters. Our special thanks go to Ms. Lakshmi Narayanan of World Scientific for coordinating and handling the publication/production-related matters.
Chapter 1
Why Autonomy Makes the Agent

Sam Joseph, NeuroGrid Consulting
Takahiro Kawamura, Computer & Network Systems Laboratory, Toshiba
1.1 Introduction

This chapter presents a philosophical position regarding the agent metaphor that defines an agent in terms of behavioural autonomy, while autonomy is defined in terms of agents modifying the way they achieve their objectives. Why might we want to use these definitions? We try to show that learning allows different approaches to the same objective to be critically assessed, and thus the most appropriate one selected. This idea is illustrated with examples from distributed mobile agent systems, but it is suggested that the same reasoning can be applied to communication issues amongst agents operating in a single location.

The chapter is structured as follows. Section 1.2 looks at the fundamental metaphors of agents, objects and data, while section 1.3 moves on to consider more complex concepts such as autonomy and mobility. In section 1.4 we attempt to define what a mobile agent actually is, and section 1.5 addresses how one might be used to conserve network resources. Finally, we explore the relationship between autonomy and learning, and try to clear up some loose ends.
1.2 Agents, Objects & Data

This paper works on the premise that the position stated by Jennings et al. [17] is correct: specifically that, amongst other things, the agent metaphor is a useful extension of the object-oriented metaphor. Object-oriented (OO) programming [29] is programming where data abstraction is achieved by users defining their own data structures (see figure 1), or "objects". These objects encapsulate data and methods for operating on that data, and the OO framework allows new objects to be created that inherit the properties (both data and methods) of existing objects. This allows archetypal objects to be defined and then extended by different programmers, who needn't have a complete understanding of exactly how the underlying objects are implemented.

While one might develop an agent architecture using an object-oriented framework, the OO metaphor itself has little to say about the behavioural autonomy of the agents, i.e. their ability to control access to their methods. In OO, the process of hiding data and associated methods from other objects, and other developers, is achieved by specifying access permissions on object-internal data elements and methods. Ideally, object-internal data is invisible from outside the object, which offers functionality through a number of public methods. The locus of control is placed upon external entities (users, other objects) that manipulate the object through its public methods.

The agent-oriented (AO) approach pressures the developer to think about objects as agents that make requests of each other, and then grant those requests based upon who has made the request. Agent systems have been developed that rely purely on the inherited network of accessibility of OO systems [4], but ideally an AO programming environment would provide more fine-grained access/security control through an ACL (Agent Communication Language) interface (see figure 1).
Figure 1. Example specifications at each level of abstraction: 1) a user-created data structure; 2) methods (e.g. public int getHighestInt(void);) are added to allow manipulation of the underlying data, giving us an object; 3) to create an agent, the object(s) are wrapped in an ACL interface that specifies how to interact with the agent, in this case via a DTD (Document Type Definition).

Thus, the important aspect of the Agent-Oriented approach is that, in opposition to object method specification, an ACL interface requires that the communicating parties be declared, allowing the agent to control access to its internal methods, and thus its behaviour. This in itself means that the agent's objectives must be considered, even if only in terms of which other entities the agent will collaborate with. The AO framework thus supports objects with objectives, which leads us on to the subject of autonomy.
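To make the contrast concrete, here is a minimal Java sketch of the difference between an object's public method and an agent that grants the same request only to declared collaborators. The class names, the requester identifiers, and the idea of passing a caller's identity as a plain string are simplifying assumptions of ours, not part of any particular ACL standard.

```java
import java.util.Set;

// An object would simply expose: public int getHighestInt()
// The agent instead mediates access, granting requests only to the
// communicating parties it has declared it will collaborate with.
public class AclGatedAgent {
    private final int[] data = {3, 1, 4, 1, 5};

    // The ACL interface forces the permitted parties to be declared.
    private final Set<String> permittedRequesters = Set.of("plannerAgent", "searchAgent");

    public int requestHighestInt(String requesterId) {
        if (!permittedRequesters.contains(requesterId)) {
            throw new SecurityException("Request refused: " + requesterId
                    + " is not a declared collaborator");
        }
        int highest = Integer.MIN_VALUE;
        for (int v : data) {
            if (v > highest) highest = v;
        }
        return highest;
    }

    public static void main(String[] args) {
        AclGatedAgent agent = new AclGatedAgent();
        System.out.println(agent.requestHighestInt("plannerAgent")); // granted: prints 5
        try {
            agent.requestHighestInt("unknownAgent");                 // refused
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point is not the access-check mechanics, but that the set of collaborators, and hence a fragment of the agent's objective, becomes an explicit part of the interface.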
1.3 Autonomy, Messages & Mobility

Autonomy is often thought of as the ability to act without the intervention of humans [3,5,12,13]. Autonomy has also been more generally defined as an agent's possession of self-control, or self-motivation, which encompasses the broader concept of an autonomous agent being free from control by any other agent, including humans [8,10,17]. This is all well and good, but what does it mean in functional terms?
Autonomous behaviour is often thought of as goal-directed [22], with autonomous agents acting in order to achieve their goals. Pro-activeness is also thought of as a fundamental quality of autonomous agents [14], in as much as an agent will periodically take the initiative, performing actions that will support future goal achievement. Barber & Martin [2] define autonomy as the ability to pursue objectives without outside interference in the decision-making process being employed to achieve those objectives. Going further, they make the distinction that agents can have different degrees of autonomy with respect to different goals. For example, a thermostat autonomously carries out the goal to maintain a particular temperature range, but it does not autonomously determine its own set point.

To be as concrete as possible, given that an objective may be specified (e.g. transfer data X from A to B), Autonomy can be thought of as the ability of an entity to revise that objective, or the method by which it will achieve that objective (to save space, from now on when we talk about modifying an objective we also mean modification of the way in which it is achieved), and an Agent is an Object that possesses Autonomy.

We consider messaging a pre-requisite of Autonomy, in that if an agent cannot interact with its environment then it has no relevant basis upon which to modify, or even achieve, its objectives. Perhaps we should say sensing/acting rather than messaging, but the exchange of information in computer networks is arguably closer to a messaging paradigm than a sensing/acting one. This relates to the previous discussion, in which we considered how an agent communication language (ACL) forces an agent developer to specify who to collaborate with, which is part of the process of specifying an objective. For the purposes of this chapter, let us take an objective to be a goal with a set of associated security constraints.

One might argue that there is not a strong connection between being able to modify one's objectives and restrictions on who to communicate with. However, if one thinks of the different agents that one can communicate with as offering different functionalities, then the extent to which one can modify one's objective becomes dependent on what information and functionality we can gain from those around us. For example, for our hypothetical agent attempting to transfer data X from A to B, its ability to change its approach in the light of ongoing circumstances depends crucially on its continuing interaction with the drivers that support different message protocols and with the agents it has received its objectives from. If transfer cannot be achieved by available methods, then the agent will need to refer
back to other agents to get permission to access alternate transport routes, or to receive new instructions.

This might all be seen as a needless change in perspective over existing object development frameworks, but before we can demonstrate the benefits of this approach we need to consider code mobility, the ability to transfer code from one processor to another. If we start to ask questions about whether this means that a process running in one location gets suspended and continued in a new location, we head into dangerous territory. The actual advantage of mobile agent techniques (rest assured this term will soon be more concretely defined) over other remote interaction frameworks such as Remote Procedure Calls (RPC), mobile code systems, process migration, Remote Method Invocation (RMI), etc., is still highly disputed [23]. There are various studies that show advantages for mobile agents over other techniques under certain circumstances, but in general they appear to rely on assumptions about the degree of semantic compression that can be achieved by the mobile agent at a remote site [1,9,16,28,32]. In this context semantic compression refers to the ability of an agent to reduce the size of the results of an operation due to its additional understanding of what is and isn't required (e.g. disposing of copies of the same web page and further filtering them based on some user profile). However, it is difficult to predict in advance the level of semantic compression a particular agent will be able to achieve, although there are examples in Network Management applications that avoid this problem, e.g. finding the machine with the most free memory [1].

By moving into the area of mobile agents we encounter various disputes, particularly as regards the concept of a multi-hop agent: a mobile agent that moves to and performs some activity at a number of remote locations without returning to its starting location. Some researchers, such as Nwana and Ndumu [24], even go so far as to question the value of current mobile agent research. Nwana & Ndumu advocate that we should solve the problems associated with stationary agents before moving on to the more complex case of mobile agents. While there might be some truth in this, the authors of this chapter would like to suggest that it is in fact possible to gain insight into solutions that can be applied to stationary agents by investigating mobile agents. This seemingly backwards notion might become a little clearer if we allude to the possibility of constructing virtual locations within a single location.
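As a rough illustration of the hypothetical transfer agent above, the following Java sketch shows an objective held fixed (move data X from A to B) while the method of achieving it is revised, with a fall-back to the agent that issued the objective once every available method has failed. The Transport interface and all names here are our own illustrative assumptions, not a prescribed design.

```java
import java.util.List;

interface Transport {
    String name();
    boolean transfer(byte[] data, String from, String to); // true on success
}

public class TransferAgent {
    private final List<Transport> availableTransports;

    public TransferAgent(List<Transport> availableTransports) {
        this.availableTransports = availableTransports;
    }

    /** The objective stays fixed; the method of achieving it is revised. */
    public boolean achieveObjective(byte[] data, String from, String to) {
        for (Transport t : availableTransports) {
            System.out.println("Attempting transfer via " + t.name());
            if (t.transfer(data, from, to)) {
                return true; // objective achieved
            }
            // Failure observed: autonomously switch to the next method.
        }
        // All methods exhausted: refer back to the issuing agent for
        // permission to use alternate transport routes, or new instructions.
        return requestNewInstructions(from, to);
    }

    private boolean requestNewInstructions(String from, String to) {
        System.out.println("Referring back: no available route " + from + " -> " + to);
        return false;
    }
}
```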
1.4 Defining a Mobile Agent

For further clarity we shall have to dive into definitions of state, mobile code, and mobile agent; but once we have done so we hope to show the utility of all these definitions. Specifically, they help us to think about the different types of techniques that can be used to help an agent or group of agents achieve an objective. In terms of a distributed environment, possible techniques include messaging between static agents, or multi-hop mobile agents, or combinations thereof. It will hopefully become clear to the reader that these approaches can be translated into virtual agent spaces, where we consider interactions between agents in a single location.

We can perhaps rephrase the issue in terms of the question: is it more valuable to perform a serial operation (using multi-hop mobility) or a parallel operation (using messaging)? Or in other words, if we need to poll a number of knowledgeable entities in order to solve a problem, should we ask them all and potentially waste some of their time, or should we first calculate a ranking of their ability to help us and then ask each in turn, finishing when we get the result we want? Or some combination of the two? This question is especially pertinent in the distributed network environment, since transferring information around can be highly expensive; but in the case that all our agents reside in the same place (potentially on a number of adjacent processors), the same issues arise, the same kinds of tools (chain messages, serial agents, parallel messages) are available as alternate strategies, and their respective utilities need to be evaluated on a case-by-case basis.

So let's be specific and further define our terms:

- Message: a read-only data structure
- State: a read/write data structure
- Code: a set of static operations on data
Where an "operation" means something that can be used to convert one data structure into another. A data structure is taken to follow the C/C++ language idea of a data structure, a variable or set of variables that may be type-specified (e.g. float, int, hashtable, etc.) that may be arbitrarily nested (e.g. hashtable of hashtables of char arrays). State is often used to refer to the maintenance of information about the currently executing step in some code, which requires a read/write data structure.
Given that we are transmitting something from one location to another, it is possible to imagine the transmission of any of the eight possible combinations of the three types defined above (e.g. message & code, message & state, etc.). Some of the possible combinations are functionally identical, since a read/write component (state) can replicate the functionality of a read-only component (message). We might have considered write-only components as well, but they would not appear to add anything to our current analysis. In summary we can distinguish four distinct entities:

- MESSAGE (implicitly parallel): message only
- CHAIN MESSAGE (serial): message & state
- MOBILE CODE (parallel): code only
- MOBILE OBJECT (serial): code & state
We can consider each of the above entities in terms of sending them to a number of network locations in either a serial or parallel fashion (see figures 2 & 3). While there are other possibilities, such as star-shaped itineraries [31] or combinations of serial and parallel, we shall leave those for the moment. The important thing to note is that in a parallel operation state has little value, since any individual entity will only undergo a single hop (one-step migration), while state becomes essential for taking advantage of a serial operation, in order to maintain and compare the results of current processing with previous steps.
Figure 2. Serial Chain Message or Mobile Object framework. SA (Stationary Agent), MA (Mobile Agent). Arrows represent movement of the object or message.

Basically we are considering the utility of each of these entities in terms of performing distributed computation or search. If the objective is merely to gather a number of remote data items in one location, then sending a request message to each remote location will probably be sufficient. If we want to run a number of different processes on different machines, mobile code becomes necessary, if not a mobile object. However, if we think an advantage can be gained by remotely comparing and discarding the results of some processing, then chain messages and mobile objects seem more appropriate (since they can maintain state in order to know what has been achieved so far, etc.).
Figure 3. Parallel Messaging or Mobile Code framework. SA (Stationary Agent), MA (Mobile Agent). Arrows represent movement of messages or objects.

A mobile agent can be defined as a mobile object that possesses autonomy, where autonomy was previously defined as the ability to revise one's objective. In order to support autonomy in an entity we need some way of storing previous occurrences, i.e. state, which means that a message or a piece of mobile code cannot by itself support autonomy. We also require some kind of processing in order to make the decision to change an objective or the method of achieving it, which means that by itself a chain message cannot be autonomous, although by operating in tandem with the processing ability of multiple stationary agents, autonomous behaviour can be achieved. That leaves mobile objects, which carry all the components required to support autonomy. This by itself does not make a mobile object an autonomous entity, but given that it is set up with an objective and a framework for revising it (in the simplest case this could be a while() loop monitoring some environmental variable; the complexity of the decision-making process is not at issue here), it may be made autonomous, and we would suggest that in this case it is worth breaking out a new term, i.e. Mobile Agent.

So, just to be clear about the distinction we are making: in the serial itinerary of figure 2, a mobile object will visit all possible locations, while a mobile agent has the ability to stop and revise the locations it plans to visit at
any point. While the reader might disagree with the use of these particular words, there does seem to be a need to distinguish between the two concepts, particularly since, as we shall discuss in the next section, the presence of autonomy enables a more efficient usage of network resources.
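The distinction can be captured in a few lines of code. In the sketch below (the Location interface is a hypothetical stand-in for remote hosts), a mobile object mechanically completes its itinerary, while a mobile agent abandons the remaining hops as soon as its goal is met.

```java
import java.util.List;

interface Location {
    String name();
    String lookup(String query); // a result, or null if not found here
}

public class ItinerarySearch {
    /** Mobile object: visits every location on the itinerary, regardless. */
    static String visitAll(List<Location> itinerary, String query) {
        String result = null;
        for (Location loc : itinerary) {
            String r = loc.lookup(query);
            if (r != null) result = r;   // keeps going either way
        }
        return result;
    }

    /** Mobile agent: revises its plan, returning as soon as the goal is met. */
    static String visitUntilFound(List<Location> itinerary, String query) {
        for (Location loc : itinerary) {
            String r = loc.lookup(query);
            if (r != null) {
                return r; // goal achieved: abandon the remaining itinerary
            }
            // Could equally re-order the remaining locations here in
            // response to what has been learned so far.
        }
        return null; // itinerary exhausted: report failure / refer back
    }
}
```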
1.5 Efficient Use of Network Resources

It might well be the case that there is no killer application for mobile agents, or indeed for non-mobile agents. Unlike previous "killer apps" such as spreadsheets or web browsers, which introduced users to a new way of using a computer, agents should perhaps instead be considered as a development methodology with no associated killer app. There is perhaps little disagreement that software should be easy to develop, maintain, upgrade, modify, and re-use, and should fail gracefully. One might go so far as to suggest that these kinds of qualities are likely to be provided by systems based on independent autonomous modules, or indeed agents. The pertinent question is what is the associated cost of creating such a system, and will we suffer a loss of efficiency as a consequence?

What is efficiency? When we employ a system to achieve some objective on our behalf, any number of resources may be consumed, such as our patience or emotional resolve, but more quantifiably things like time (operation-, development-, preparation-, maintenance-), CPU cycles, network bandwidth, and heap memory usage. In determining whether a (mobile) agent system is helping us achieve our goals, it is important to look at all the different resources that are consumed by its operation, in comparison with alternate systems. Some are more difficult to measure than others, and different people and organisations put different premiums on different resources. The authors' research into mobile agents has focused on time and bandwidth consumption, since these are considered to currently be in short supply.

If we can keep all of this in mind then we might be able to assess the agent-oriented metaphor with a greater degree of objectivity than previously. The OO metaphor has overheads in terms of specifying and maintaining object hierarchies and permissions, but it seems to have become widely accepted that this is outweighed by the greater maintainability and flexibility of the code developed in this fashion. If we can show that the costs of constructing more complex agent-oriented systems are outweighed by some
similar advantage, then perhaps we can put some arguments about agents to rest.
Figure 4. Communicating across the network with RPC calls. Copyright General Magic 1996.

A key paper in the recent history of the mobile agent field is the Telescript white paper [33], in which some benefits of using mobile agents were introduced. There are two diagrams from this paper that have been reproduced both graphically and logically in many papers, talks, and discussions on mobile agents. The first diagram shows us the Remote Procedure Call (RPC) paradigm approach to communicating with a remote location (figure 4), while the second (figure 5) indicates how all the messy individual communication strands of the RPC can be avoided by sending out a mobile agent. The central idea is that the mobile agent can reduce the number of messages moving around the network, and the start location (perhaps a user on their home computer or Personal Data Assistant, PDA) can be disconnected from the network.
Figure 5. Communicating across the network with mobile agents. Copyright General Magic 1996.

The advantage of being able to disconnect from the network is tied up with the idea that one is paying for access to the network: connecting twice for twenty seconds, half an hour apart, will be a lot cheaper than being continuously connected for half an hour. While this might be the case for a
lot of users connecting to the network through a phone-company stranglehold, it in fact does not work well as an argument for using mobile agents throughout the network. A TCP/IP-based system will break the mobile agent up into packets in order to transmit it, so the real question becomes: "Is the agent larger than the sum of the sizes of the messages it is replacing?" Or, more generally, does encoding our communication in terms of a mobile agent gain any tangible efficiency improvements over encoding it as a sequence of messages? The problem is predicting which communication encoding will be more effective for a given task and network environment.

1.5.1 Prediction Issue

The use of an agent-oriented development methodology helps in the design and maintenance of large software systems, at least as far as making them more comprehensible. But this does not automatically mean that mobile agents will necessarily have any advantage over a group of stationary agents communicating via messages. To illustrate the point, let us imagine an example application that is representative of those often used to advocate mobile agent advantages. Let us say that we are searching for a number of web pages from a variety of different search engines, a meta-search problem so to speak (see figure 6). Search engines currently available on the web allow us to submit a set of search terms, but will not host our mobile agents. In some future situation in which we could send mobile agents out to web search engines, or in some intranet enterprise environment where database wrappers can host mobile agents [19], we might be tempted to try and send out a single mobile agent rather than lots of separate queries.
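Whether one agent is cheaper than lots of separate queries is, at bottom, the packet-level question asked above: is the agent larger than the sum of the messages it replaces? A back-of-the-envelope comparison, with every figure invented purely for illustration, might look as follows.

```java
public class EncodingCost {
    public static void main(String[] args) {
        int messageBytes = 512;   // assumed size of one request/response pair
        int exchanges = 12;       // round trips the agent would replace
        int agentBytes = 4096;    // assumed serialized size of code + state

        long messagingTotal = (long) messageBytes * exchanges;
        long agentTotal = 2L * agentBytes;  // one hop out, one hop back

        System.out.println("Messaging total: " + messagingTotal + " bytes");
        System.out.println("Agent total:     " + agentTotal + " bytes");
        System.out.println(agentTotal < messagingTotal
                ? "Agent encoding cheaper" : "Message encoding cheaper");
    }
}
```

The hard part, as the text goes on to argue, is that the number of exchanges an agent actually replaces is rarely known in advance.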
Figure 6. MetaSearch through the Internet or Intranet: a user connects, by agent or by message, to a number of search engine interfaces/database wrappers, each fronting a database.

Quite apart from whether we might benefit from a multi-hop mobile agent performing this search for us, we can ask whether we can gain anything from having our agent perform some local processing at a single remote search engine/database wrapper. For example, perhaps we are keen not to receive more than ten results from each remote location; perhaps we are searching for documents that contain the word "Microsoft", but rather than just returning the top ten hits when we get more than ten, maybe we would like the search to be narrowed by the use of additional keywords; a sort of conditional increase in search specificity, as shown by the flow chart in figure 7.
"Microsoft"?
No . ht¥.
r~" \
Matches
""•-.
>10?
\
Yes
..--
(Return) "Monopoly"'
No -
^--^ Yes
.--' Matches >10?
\
(Return) Figure 7. Conditionally increasing search specificity The flow chart summarises the kind of code that an agent might execute at a remote location as part of our meta-search. The main point is that if the number of matched documents is actually less than the threshold then all the information apart from the first search term is not needed. Sending out code has just consumed bandwidth without delivering any benefits. Of course, you cry, sometimes the rest of the code will be used, just not on every occasion. Exactly, but what is the likelihood that we will need the extra code or indeed extra information? Clearly we need to hedge our bets; in a search where we expect large numbers of results to require some semantic compression at a remote location, then we can happily send out lots of code and data just in case to make sure we don't take up too much bandwidth. However, we need to be more specific about the details of this trade-off. If we want to show any kind of non-situation specific advantage of transferring code/agents over the network, we need to be able predict the kinds of time/bandwidth efficiency savings they will create against the time/bandwidth their implementation consumes.
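The remote-side logic of figure 7 might be sketched as follows; the SearchEngine interface and all names are hypothetical, but the control flow mirrors the flow chart: extra keywords are consumed only while the result set stays above the threshold.

```java
import java.util.ArrayList;
import java.util.List;

interface SearchEngine {
    List<String> search(List<String> terms); // documents matching all terms
}

public class RefiningSearch {
    private static final int THRESHOLD = 10;

    /** Executed at the remote site; later keywords may never be needed. */
    static List<String> refine(SearchEngine engine, List<String> keywords) {
        List<String> terms = new ArrayList<>();
        List<String> matches = List.of();
        for (String keyword : keywords) {     // e.g. "Microsoft", then "Monopoly"
            terms.add(keyword);
            matches = engine.search(terms);
            if (matches.size() <= THRESHOLD) {
                return matches;               // narrow enough: return early
            }
            // Still too many matches: consume the next keyword and retry.
        }
        // Keywords exhausted but still too many matches: return the top hits.
        return matches.size() > THRESHOLD ? matches.subList(0, THRESHOLD) : matches;
    }
}
```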
Joseph et al. [18] work through a more specific example of an object search application, showing that the ability to roughly predict the location of an object allows efficient switching between two different search protocols. If we refer back to figure 3, the parallel diagram indicates either Mobile Code or Message transfer, while the serial diagram in figure 2 indicates the Mobile Object or Chain Message paradigms.

Let us re-emphasise what it means to change our mobile object into a mobile agent. In the serial diagram we can see that the presence of behavioural autonomy, the ability to adjust one's method of achieving a goal, would allow the entity being transferred around the system to return early if the desired item was found, the network environment changed, etc. The ability to adjust a plan, in this case to visit four locations in sequence, and to return to base at will, allows network resources to be conserved. For our single-hop agent performing meta-search, autonomy concerns controlling when to finish processing and return the results. All this requires is a while loop waiting for some change (achieving the goal), which can be achieved using a mobile object, you protest; but wait, the OO paradigm has nothing to say about whether or not that kind of framework should be set up. What the AO framework should provide is a way for these goals, and the circumstances under which they should be adjusted, to be easily specified [34].

The remaining issue is how an agent can make a decision to adjust its goals, or its method of achieving them, if it can't predict the effects of the change. It is the authors' humble opinion that in the absence of predictive ability, agents cannot effectively make decisions, except in relatively simple environments. If a problem is sufficiently well understood then the probabilities of any occurrence might well be known, but in all those really interesting problems where they aren't known in advance, we are forced to rely upon learning as we go along.

1.5.2 Learning

We define learning as the adjustment of a model of the environment in response to experience of the environment, with the implicit objective being that one is trying to create a model that accurately reflects the true nature of the environment, or perhaps more specifically those sections of the environment that influence the objectives an agent is trying to achieve. In these terms the simple updating of a location database to accurately reflect the contents of a network location can be considered learning, but really, what additional characteristics are required? Where learning is mentioned
one tends to think of the benefits gained through generalisation and analogy, which are in fact properties of the way in which the environment is represented in the memory of our learner. A more concrete example of this is provided by Joseph et al. [18], which we summarise here: learning about the location of objects within a distributed network environment can lead to a more efficient use of resources. Essentially, the particular learning algorithm used is not as important as the representation of the objects, although the learning algorithm needs to be able to output probabilistic estimates of an object's location (for a review of probabilistic learners see Buntine [6]); Joseph et al. [18] used a representation based on object type (in fact file type: executable, text, word file, etc.), after Segal [27].

To make a long story short, when a chain message or mobile agent is performing a serial search for an object, knowing the probability that it exists in each of the search locations allows one to estimate when the search will terminate. Even when using parallel messaging, the same estimates can be used to choose a subset of possible locations to make an initial inquiry. That information then allows the alternative methods of achieving the same objective to be quantitatively compared, and the most efficient option selected.

The natural question that follows is: "how can we be sure that our probability estimates are correct?" We can't, but the reasoning of Etzioni [11] seems sound: we should use the results of previous searches, or processing, in order to create future predictions. It might also be expedient to rank the different representational units in terms of their predictive ability, such as finding that knowing a file is a word file allows us to predict its location with some accuracy, while knowing that something is a binary file is not so useful. In the meta-search example, an estimate of how many results will be generated in response to a particular set of search terms can perform the same function, effectively setting up a profile of which search engines are experts in which domains, so that the most appropriate subset can be contacted depending on our current query. One can easily imagine a network of static agents that function as searchable databases, learning about each other's specialities and forwarding queries based on their mutual understanding of each other. It is in this kind of environment that one could practically use mobile agents and expect to make measurable gains in
efficiency, or at least be able to determine with some accuracy if there were any gains to be made.
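A minimal sketch of the kind of estimate being described: assuming independent per-location probabilities (the figures below are invented), we can compute the expected number of hops a serial search will make before terminating at the first hit, and weigh it against querying every location in parallel.

```java
public class SearchCostEstimate {
    /** Expected number of locations visited when stopping at the first hit. */
    static double expectedSerialHops(double[] p) {
        double expected = 0.0;
        double notFoundYet = 1.0; // probability the object was absent so far
        for (int i = 0; i < p.length; i++) {
            expected += notFoundYet * p[i] * (i + 1); // stop after hop i+1
            notFoundYet *= (1.0 - p[i]);
        }
        expected += notFoundYet * p.length; // absent everywhere: full tour
        return expected;
    }

    public static void main(String[] args) {
        double[] p = {0.6, 0.2, 0.1, 0.05}; // itinerary sorted most likely first
        System.out.printf("serial: ~%.2f hops vs parallel: %d messages%n",
                expectedSerialHops(p), p.length);
        // With a well-sorted itinerary the serial search usually terminates
        // early, which is where the bandwidth saving comes from.
    }
}
```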
1.6 Discussion

The main point that has been brought up in this chapter is that the Agent-Oriented approach to software might have something to offer over and above the Object-Oriented approach. We can think of an Agent-Oriented approach as offering a developer an easy way to establish an ACL and a format for specifying the objectives of individual agents. This can be thought of as just a shift in terminology, but the authors of this chapter go further, suggesting that if this Agent-Oriented framework allows for the dynamic adjustment of agents' objectives, then functional differences can be achieved in system performance and efficiency.

We have tried to present an argument that there is a quantifiable difference between those mobile objects that can decide to adjust their objectives en route and those that can't; and that if we want to take advantage of "mobile objects" they should be able to switch behaviours to suit circumstances, and make those decisions on the basis of predictions about the most effective course of action. There are of course many unanswered questions, such as what our Agent-Oriented programming languages should look like, and what functions they should provide in order to assist developers in creating agents with objectives that can be modified in the face of their ongoing experience; we hope to make these the subject of future publications. For an example of the kind of work going on in this area, we refer readers to the work of Wooldridge et al. [35].
Appendix

Now, on to some of those prickly issues we dodged in the main sections. Firstly there is the question of how we specify an agent's objective, for example in terms of the Belief-Desire-Intention (BDI) framework [25]. While this kind of framework is clearly very important in the long term, it would seem expedient in the short term at least to simply encourage agent developers to think about their agents' objectives. Insisting that agent system developers employ (and by implication learn) an unfamiliar new
objective modelling language is likely to put many people off. In the short term the objectives of agents get specified implicitly, by way of any number of restrictions on agent behaviour (security restrictions, temporal restrictions, etc.). For example, an agent may be trying to load-balance, but be restricted, for security reasons or whatever, in which resources can be used to balance the processing load over a number of machines; this creates a bounded objective, e.g. "solve this problem, but don't use machine B to do it". It is likely to be only a matter of time before agreed-upon (or at least widely used) formats for these kinds of specifications arise, but in advance of that the authors believe it is useful to work towards some kind of philosophically consistent agent-oriented metaphor before working on a detailed specification, in much the same way that the object-oriented language specifications came after the philosophical development of the OO metaphor.

Next comes the problem of definitions and their value. Throughout this paper we make a number of definitions, and the fall-out may be that, for example, some systems people would not like to think of as agents will be labelled as agents. An analogy could be drawn with trying to define a concept like "alive", which might be the quality an entity possesses if it matches a number of criteria such as growth, metabolism, energy use, nutrition, respiration, reproduction and response to stimuli (as one of the authors seems to remember from a high-school biology textbook). The point is that with any such definition there might be unfortunate side effects, such as a car-making factory being classified as more "alive" than a virus, or that kind of thing. While this might be regarded as a horrific consequence by some, it seems that rather than repeatedly modifying definitions to try and make them fit in with our "intuition" about what is meant by a particular term, we should focus on making definitions that draw a distinction of some value, e.g. whether or not a system can modify its stated objectives, and gaining insight from the categorisations that follow.

There are various issues relating to messaging protocols, since autonomy is by its nature tied up with the ability to communicate. This might not be clear at first, but if autonomy is defined as an ability to modify one's objectives, there needs to be some basis upon which to make those decisions. In the absence of any interaction with an environment (whether or not it has any other autonomous entities in it), any such decision becomes of no consequence. In a relatively static environment we might want to talk about sensing instead of communicating, but when we think about computer
networks, any sensing of the environment takes place in an active fashion; i.e. we might just be "sensing" the file system, but increasingly we are communicating with some file system agent or wrapper. In order for any sensing or communication to be useful in the computer network environment, protocols are necessary; or perhaps we mean ontologies? The distinction becomes complex, and a full investigation is beyond the scope of this paper. In order to summarise the current convoluted state of affairs, let us describe four possible outcomes of current research:

1. Everyone spontaneously agrees on some communication framework/protocol (FIPA-ACL, KQML; Labrou et al. [21]).
2. Someone works out how to formally specify all the different ACL (Agent Communication Language) dialects within one overarching framework that includes lots of helpful ontology brokering services that make communication work [30].
3. Someone figures out how to give agents enough wits to be able to infer the meanings of speech acts from the context in which they are communicated [20].
4. Some combination of the above.

While this is not a trivial issue, it is possible to overlook it in a given agent system by assuming that all the agents subscribe to a single protocol, which is often the case in implemented agent systems.

There is also the "who does what for free?" issue. Jennings et al. [17] summarise the difference between objects and agents in terms of the slogan "Objects do it for free; agents do it for money". However, due to possible semantic conflict with the saying "Professionals do it for money; amateurs do it for the love of it", we suggest the possible alternate slogan "Objects do it because they have to; agents do it because they want to", in order to directly capture the point that the agent-oriented approach advocates that software entities (i.e. agents) have a policy regarding their objectives: what they are intending to achieve, and which objectives they are prepared to collaborate in achieving.

Finally we should look more closely at our definition of code, the issue being that any piece of information could be taken to represent an operation. We can get into complex epistemological questions about whether a meteor shower, or RNA protein manufacture, constitutes data processing. However, for the current purposes we seek to define an operation as something that can
be interpreted within our current system as an operation. For example, a simple list of letters (e.g. E, F, U, S) could be taken to represent a series of operations in a system set up to recognise that representation. In summary, we are tempted to think of code as a set of operations that can be interpreted within the system in question, although a different distinction could be drawn by suggesting that code distinguishes itself from data by having control flow, i.e. that conditional statements can be interpreted so that different policies will be employed under different circumstances.

One final note is that we could actually construct a read-only chain message by having each remote stationary agent check its own ID against the read-only destination IDs in the chain message, but this is not a general solution, and it would create security issues about untrusted hosts knowing the complete itinerary of the chain message, although some Peer-to-Peer (P2P) protocols do use this approach. Also of note is that we have many more possibilities than just sending purely parallel or purely serial messages, but then our search space gets very big very quickly. Still, these possibilities do deserve further attention.
Acknowledgements

We wish to thank Takeshi Aikawa, Leader of the Computer & Network Systems Laboratory, for allowing us the opportunity to conduct this research, and Shinichi Honiden & Akihiko Ohsuga for their input and support.
References

1. Baldi M. & Picco G. P. Evaluating the Tradeoffs of Mobile Code Design Paradigms in Network Management Applications. In Kemmerer R. & Futatsugi K. (Eds.), Proc. 20th Int. Conf. Soft. Eng. (ICSE'98), ACM Press, 146-155, 1998.
2. Barber S. K. & Martin C. Specification, Measurement, and Adjustment of Agent Autonomy: Theory and Implementation. Technical Report TR99-UT-LIPS-AGENTS-04, University of Texas, 1999.
3. Beale R. & Wood A. Agent-based Interaction. Proc. People and Computers IX (HCI'94), Glasgow, UK, 239-245, 1994.
4. Binder W. Design and Implementation of the J-SEAL2 Mobile Agent Kernel. 6th ECOOP Workshop on Mobile Object Systems: Operating System Support, Security, and Programming Languages, 2000. http://cui.unige.ch/~ecoopws/ws00/index.html
5. Brown S. M., Santos Jr. E., Banks S. B. & Oxley M. E. Using Explicit Requirements and Metrics for Interface Agent User Model Correction. Proc. Second International Conference on Autonomous Agents, Minneapolis/St. Paul, MN, 1-7, 1998.
6. Buntine W. A guide to the literature on learning probabilistic networks from data. IEEE Trans. Knowl. & Data Eng., 8(2):195-210, 1996.
7. Carzaniga A., Picco G. P. & Vigna G. Designing distributed applications with mobile code paradigms. In Taylor R. (Ed.), Proc. 19th Int. Conf. Soft. Eng. (ICSE'97), ACM Press, 22-32, 1997.
8. Castelfranchi C. Guarantees for Autonomy in Cognitive Agent Architecture. In Wooldridge M. J. and Jennings N. R. (Eds.), Intelligent Agents: ECAI-94 Workshop on Agent Theories, Architectures, and Languages, Springer-Verlag, Berlin, 56-70, 1995.
9. Chia T. H. & Kannapan S. Strategically mobile agents. In Rothermel K. and Popescu-Zeletin R. (Eds.), Lecture Notes in Computer Science: Mobile Agents, Springer, 1219:174-185, 1997.
10. Covrigaru A. A. & Lindsay R. K. Deterministic Autonomous Systems. AI Magazine, 12:110-117, 1991.
11. Etzioni O. Embedding decision-analytic control in a learning architecture. Artificial Intelligence, 49:129-159, 1991.
12. Etzioni O. & Weld D. S. Intelligent Agents on the Internet: Fact, Fiction, and Forecast. IEEE Expert, 10(4):44-49, 1995.
13. Evans M., Anderson J. & Crysdale G. Achieving Flexible Autonomy in Multi-Agent Systems Using Constraints. Applied Artificial Intelligence, 6:103-126, 1992.
14. Foner L. N. What's An Agent, Anyway? A Sociological Case Study. MIT Media Lab, Boston, Technical Report, Agents Memo 93-01, 1993.
15. Fuggetta A., Picco G. P. & Vigna G. Understanding code mobility. IEEE Trans. Soft. Eng., 24(5):342-361, 1998.
16. Ismail L. & Hagimont D. A performance evaluation of the mobile agent paradigm. OOPSLA, ACM SIGPLAN Notices, 34(10):306-313, 1998.
17. Jennings N. R., Sycara K. & Wooldridge M. A roadmap of agent research and development. Autonomous Agents and Multi-Agent Systems, 1:7-38, 1998.
18. Joseph S., Hattori M. & Kase N. Efficient Search Mechanisms For Learning Mobile Agent Systems. Concurrency: Practice and Experience, in press.
19. Kawamura T., Joseph S., Hasegawa T., Ohsuga A. & Honiden S. Evaluating the Fundamental Agent Paradigms. In Kotz D. & Mattern F. (Eds.), Agent Systems, Mobile Agents, and Applications, Lecture Notes in Computer Science 1882, 2000.
20. Kirby S. Syntax out of learning: the cultural evolution of structured communication in a population of induction algorithms. In Floreano D., Nicoud J.-D. and Mondada F. (Eds.), Advances in Artificial Life, Lecture Notes in Computer Science 1674, 1999.
21. Labrou Y., Finin T. & Peng Y. Agent communication languages: the current landscape. IEEE Intelligent Systems and their Applications, 14:45-52, 1999.
22. Luck M. & D'Inverno M. A Formal Framework for Agency and Autonomy. Proc. First International Conference on Multi-Agent Systems, San Francisco, CA, 254-260, 1995.
23. Milojicic D. Mobile agent applications. IEEE Concurrency, 80-90, 1999.
24. Nwana H. S. & Ndumu D. T. A perspective on software agents research. To appear in Knowledge Engineering Review.
25. Rao A. S. & Georgeff M. P. Modeling rational agents within a BDI-architecture. In Fikes R. & Sandewall E. (Eds.), Proceedings of Knowledge Representation and Reasoning, Morgan Kaufmann, 473-484, 1991.
26. Schwartz C. Web search engines. Journal of the American Society for Information Science, 49(11):973-982, 1998.
27. Segal R. St. Bernard: the file retrieving softbot. Unpublished Technical Report FR-35, University of Washington, 1993.
28. Strasser M. & Schwehm M. A performance model for mobile agent systems. In Arabnia H. (Ed.), Proc. Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA'97), II:1132-1140, 1997.
29. Stroustrup B. What is "Object-Oriented Programming"? AT&T Bell Laboratories Technical Report, 1991.
30. Sycara K., Lu J. & Klusch M. Interoperability among heterogeneous software agents on the internet. Technical Report CMU-RI-TR-98-22, Carnegie Mellon University, PA, USA, 1998.
31. Tahara Y., Ohsuga A. & Honiden S. Agent system development method based on agent patterns. Proc. ICSE, IEEE, 1999.
32. Theilmann W. & Rothermel K. Disseminating mobile agents for distributed information filtering. Proc. ASA/MA, IEEE Press, to appear.
33. White J. Mobile agents white paper. http://wwwiiuf.ch/~chantem/white_whitepaper/whitepaper.html, 1996.
34. Wooldridge M., Jennings N. R. & Kinny D. A Methodology for Agent-Oriented Analysis and Design. Autonomous Agents '99, 69-76, 1999.
35. Wooldridge M., Jennings N. R. & Kinny D. The Gaia Methodology for Agent-Oriented Analysis and Design. Autonomous Agents and Multi-Agent Systems, 3:285-312, 2000.
Chapter 2
Knowledge Granularity Spectrum, Action Pyramid, and the Scaling Problem
Yiming Ye, IBM T. J. Watson Research Center, USA
John K. Tsotsos, York University, Canada
2.1 Introduction
This paper studies the scaling problem with respect to an agent: a computational system that inhabits dynamic, unpredictable environments. An agent has sensors to gather data about the environment and can interpret this data to reflect events in the environment. Furthermore, it can execute motor commands that produce effects in the environment. Usually, it has certain knowledge about itself and the world. This knowledge can be used to guide its action selection process when exhibiting goal-directed behaviors [1][13]. It is important for an agent to choose a reasonable representation scheme in order to scale to the task at hand. There are two extremes regarding granularity of knowledge representation. At one end of the spectrum is the scheme in which the selection of actions requires little or even no knowledge representation [3]. At the other end of the spectrum is the purely planning scheme, which requires the agent to maintain and use as much detailed knowledge as possible. Experience suggests that neither of these two extreme schemes is capable of producing the range of behaviors required by intelligent agents in a dynamic, unpredictable environment. For example, Tyrrell [18] has noted the difficulty of applying, without modification, the model of Brooks [3] to the problem of modeling
action selection in animats whose behavior is supposed to mirror that of real animals. On the other hand, although it is theoretically possible to compute the optimal action selection policy for an agent that has a fixed set of goals and that lives in a deterministic or probabilistic environment [18], it is impossible to do so in most practical situations for the following reasons: (A) resource limitations (time limits, computational complexity [20], memory limits); (B) incomplete and incorrect information (knowledge difference [21], sensor noise, etc.); (C) dynamic, non-deterministic environments. Thus, many researchers argue for hybrid architectures [19][9][15][11], a combination of classical and alternative approaches, to build agent systems. One example is the layered architecture [9][15]. In such an architecture, an agent's control subsystems are arranged into a hierarchy, with higher layers dealing with information at increasing levels of abstraction. Thus, the very lowest layer might map raw sensor data directly onto effector outputs, while the uppermost layer deals with long-term goals. Or, the upper abstract space might be used to solve a problem, and then the solution might be refined at successive levels of detail by inserting operators to achieve the conditions that were ignored in the more abstract spaces [12]
[14]. Much of the previous work on scaling emphasizes the absolute complexity (efficiency) of planning systems. We, however, believe that scaling is a relative notion, closely related to the task requirements of an agent in uncertain, dynamic or real-time environments. We will say that an agent scales to a given task if the agent's planning system and knowledge representation scheme are able to generate the range of behaviors required by the task. We consider knowledge abstraction over a spectrum based on the granularity of knowledge representation. Our approach is different from previous approaches [9][15] in that there is no logical relationship between elements of any two adjacent layers. We study the scaling problem related to different representation schemes, be it a single granularity scheme or a hybrid granularity scheme. Many factors, such as the planning engine, the way knowledge is represented, and the dynamic environment, can influence whether an agent scales to a given task. Here, we concentrate on the influence of knowledge granularity. It is obvious that knowledge granularity can influence the efficiency of a given inference engine, since granularity influences the amount of data to be processed by the engine. It has been suggested that one may increase computational efficiency by limiting the form of the statements in the knowledge base [16]
[7]. In this paper, we study the relationship between different representation schemes and the performance of an agent's planning system. The goal is to find the proper scheme for representing an agent's knowledge such that the representation allows the agent to scale to a given task. We address the following issues. The first is how to define the granularity of an agent's representation of a certain kind of knowledge. The second is how this granularity influences the agent's action selection performance. The third is how a hierarchical granularity representation influences the agent's action selection performance. The study of these issues can help an agent find a reasonable granularity or scheme of representation such that its behavior can scale to a given task.
2.2 A Case Study: the Object Search Agent
To start, we use object search as an example to study the influence of knowledge granularity on the performance of an agent. Object search is the task of searching for a given object in a given environment by a robotic agent equipped with a pan, tilt, and zoom camera (Figure 2.1). Exhaustive, brute-force blind search will clearly suffice for its solution; however, the goal of the agent is to design efficient strategies for search, because exhaustive search is computationally and mechanically prohibitive for nontrivial situations. The action selection task for the agent is the task of selecting the sensing parameters (the camera's position, viewing direction, and viewing angle size) so as to bring the target into the field of view of the sensor and to make the target in the image easily detectable by the given recognition algorithm. Sensor planning for object search is very important if a robot is to interact intelligently and effectively with its environment. In [23][20], Ye and Tsotsos systematically study the task of object search and give an explicit algorithm to control the state parameters of the camera by considering both the search agent's knowledge about the target distribution and the ability of the recognition algorithm. In this section, we first briefly describe the two dimensional object search agent and its action selection strategy (please refer to [23] for the corresponding three dimensional descriptions). Then we study the issue of knowledge granularity with respect to the object search agent and present experimental results.
2.2.1 Task Formulation
We need to formulate the agent's sensor planning task in a way that incorporates the available knowledge of the agent and the detection ability of the recognition algorithm. The search region $\Omega$ can be any two dimensional form, such as a two dimensional room with many two dimensional tables, etc. In practice, $\Omega$ is tessellated into a series of elements $c_i$: $\Omega = \bigcup_{i=1}^{n} c_i$ and $c_i \cap c_j = \emptyset$ for $i \neq j$. In the rest of the paper, it is assumed that the search region is a two dimensional office-like environment and that it is tessellated into little square cells of the same size. An operation $\mathbf{f} = \mathbf{f}(x_c, y_c, \vartheta, w, a)$ is an action of the search agent within the region $\Omega$. Here $(x_c, y_c)$ is the position of the two dimensional camera center (the origin of the camera viewing axis); $\vartheta$ is the direction of the camera viewing axis, $0 \leq \vartheta < 2\pi$; $w$ is the width of the viewing angle of the camera; and $a$ is the recognition algorithm used to detect the target.
Fig. 2.1 An example of search agent hardware and a search environment: (a) the search agent, a mobile platform equipped with a camera; (b) the pan, tilt, and zoom camera on the platform; (c) an example search region.
The agent's knowledge about the possible target position can be specified by a probability distribution function $p$, such that $p(c_i, \tau_f)$ gives the agent's knowledge about the probability that the center of the target is within square $c_i$ before an action $\mathbf{f}$ (where $\tau_f$ is the time just before $\mathbf{f}$ is applied). Note, we use $p(c_o, \tau_f)$ to represent the probability that the target is outside the search region at time $\tau_f$. The detection function on $\Omega$ is a function $b$, such that $b(c_i, \mathbf{f})$ gives the conditional probability of detecting the target given that the center of the target is located within $c_i$ and the operation is $\mathbf{f}$. For any operation, if the projection of the center of the square $c_i$ is outside the image, we assume
$b(c_i, \mathbf{f}) = 0$. If the square is occluded, or it is too far from the camera or too near to the camera, we also have $b(c_i, \mathbf{f}) = 0$. It is obvious that the probability of detecting the target by applying action $\mathbf{f}$ is given by
$$P(\mathbf{f}) = \sum_{i=1}^{n} p(c_i, \tau_f)\, b(c_i, \mathbf{f}). \qquad (2.1)$$
The reason that the term $\tau_f$ is introduced in the calculation of $P(\mathbf{f})$ is that the probability distribution needs to be updated whenever an action fails. Here we use Bayes' formula. Let $\alpha_i$ be the event that the center of the target is in square $c_i$, and $\alpha_o$ be the event that the center of the target is outside the search region. Let $\beta$ be the event that after applying a recognition action, the recognizer successfully detects the target. Then $P(\neg\beta \mid \alpha_i) = 1 - b(c_i, \mathbf{f})$. It is obvious that the updated probability distribution value after an action $\mathbf{f}$ has failed should be $P(\alpha_i \mid \neg\beta)$; thus we have $p(c_i, \tau_{f^+}) = P(\alpha_i \mid \neg\beta)$, where $\tau_{f^+}$ is the time after $\mathbf{f}$ is applied. Since the events $\alpha_1, \ldots, \alpha_n, \alpha_o$ are mutually exclusive and collectively exhaustive, from Bayes' formula we get the following probability updating rule:
$$p(c_i, \tau_{f^+}) = \frac{p(c_i, \tau_f)\,(1 - b(c_i, \mathbf{f}))}{\sum_{j=1,\ldots,n,o} p(c_j, \tau_f)\,(1 - b(c_j, \mathbf{f}))}, \qquad (2.2)$$

where $i = 1, \ldots, n, o$. The cost $t(\mathbf{f})$ gives the total time needed to perform the operation $\mathbf{f}$. Let $O_\Omega$ be the set of all the possible operations that can be applied. The effort allocation $F = \{\mathbf{f}_1, \ldots, \mathbf{f}_k\}$ gives the ordered set of operations applied in the search, where $\mathbf{f}_i \in O_\Omega$. It is clear that the probability of detecting the target by this allocation is:
$$P[F] = P(\mathbf{f}_1) + [1 - P(\mathbf{f}_1)]P(\mathbf{f}_2) + \cdots + \Big\{\prod_{i=1}^{k-1}[1 - P(\mathbf{f}_i)]\Big\}P(\mathbf{f}_k). \qquad (2.3)$$
The total cost for applying this allocation is:

$$T[F] = \sum_{i=1}^{k} t(\mathbf{f}_i). \qquad (2.4)$$
Suppose $K$ is the total time that can be allowed in applying selected actions during the search process. The task of sensor planning for object search can then be defined as finding an allocation $F \subseteq O_\Omega$ which satisfies $T[F] \leq K$ and maximizes $P[F]$. Since this task is NP-Complete [20], we consider a simpler problem: decide only which is the very next action to execute. Our objective then is to select as the next action the one that maximizes the term
$$E(\mathbf{f}) = \frac{P(\mathbf{f})}{t(\mathbf{f})}. \qquad (2.5)$$
We have proved that in some situations, the one step look ahead strategy may lead to an optimal answer.
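To make Equations (2.1)-(2.5) concrete, here is a minimal sketch of the evaluate-update-select loop. The one-dimensional region, the toy detection function, and the unit action costs are all assumptions of the sketch, not part of the chapter's system.

```python
# Sketch of the action-evaluation loop defined by Equations (2.1)-(2.5),
# under assumptions of our own: a 1-D toy region, a made-up detection
# function, and unit costs. Names mirror the text: p[i] is p(c_i, tau_f),
# b(i, f) is the detection function, and E(f) = P(f)/t(f).

N_CELLS = 20
p = [1.0 / (N_CELLS + 1)] * (N_CELLS + 1)     # index N_CELLS = "outside region"

def b(i, f):
    """Toy detection function: action f sees cells [f, f+4] with prob 0.8."""
    if i == N_CELLS:                           # target outside the region
        return 0.0
    return 0.8 if f <= i < f + 5 else 0.0

def P(f):
    """Equation (2.1): probability that action f detects the target."""
    return sum(p[i] * b(i, f) for i in range(N_CELLS + 1))

def update_after_failure(f):
    """Equation (2.2): Bayes update of p after action f fails."""
    denom = sum(p[j] * (1.0 - b(j, f)) for j in range(N_CELLS + 1))
    for i in range(N_CELLS + 1):
        p[i] = p[i] * (1.0 - b(i, f)) / denom

def next_action(candidates, t=lambda f: 1.0):
    """Equation (2.5): one-step lookahead, maximise E(f) = P(f)/t(f)."""
    return max(candidates, key=lambda f: P(f) / t(f))

candidates = range(0, N_CELLS - 4)
for step in range(3):
    f = next_action(candidates)
    print(f"step {step}: apply f={f}, P(f)={P(f):.3f}")
    update_after_failure(f)                    # assume the action failed
```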
2.2.2 The Sensor Planning Strategy
The agent needs to select the camera's viewing angle size and viewing direction for the next action $\mathbf{f}$ such that $E(\mathbf{f})$ is maximized. Normally, the space of available candidate actions is huge, and it is impossible to take it into consideration in its entirety. Based on the image formation process and geometric relations, we have developed a method that can tessellate this huge space of candidate actions into a small number of actions that must be tried. A brief description of the sensor planning strategy is as follows (please refer to [23] for details). For a given recognition algorithm, there are many possible viewing angle sizes. However, the whole search region can be examined with high probability of detection using only a small number of them. For a given angle size, the probability of successfully recognizing the target is high only when the target is within a certain range of distance. This range is called the effective range for the given angle size. Our purpose here is to select those angles whose effective ranges cover the entire depth of the search region, and at the same time there will be no overlap of their effective ranges. Suppose that the biggest viewing angle for the camera is $w_0$, and its effective range is $[N_0, F_0]$. Then the necessary angle sizes $w_i$ (where $1 \leq i \leq n_0$) and the corresponding effective ranges $[N_i, F_i]$ (where $1 \leq i \leq n_0$) are:
$$w_i = 2\arctan\left[\left(\frac{N_0}{F_0}\right)^{i} \tan\left(\frac{w_0}{2}\right)\right]; \qquad (2.6)$$
For each angle size derived above, there are an infinite number of viewing directions that could be considered. We have designed an algorithm that generates only directions whose union can cover the whole viewing sphere with minimum overlap [23]. Only the actions with the viewing angle sizes and the corresponding directions obtained by the above method are taken as the candidate actions. So, the huge space of possible sensing actions is decomposed into a finite set of actions that must be tried. Finally, $E(\mathbf{f})$ can be used to select among them the best viewing angle size and direction. After the selected action is applied, if the target is not detected, the probability distribution is updated and a new action is selected. If the search from the current position fails to find the target, the agent selects a new position and begins to search for the target there.
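A small sketch of the angle-size decomposition may help. It assumes the reconstructed form of Equation (2.6) and the convention that each effective range starts where the previous one ends; with the numbers used later in the experiments ($w_0 = 40°$, $[N_0, F_0] = [50, 150]$, depth 450) it reproduces the 14° second angle with range [150, 450] reported there.

```python
import math

# Sketch of the angle-size decomposition, assuming the reconstructed form
# of Equation (2.6): w_i = 2*arctan((N0/F0)^i * tan(w0/2)), with effective
# ranges tiling [N0, depth] without overlap (N_{i+1} = F_i, fixed ratio).
# All concrete numbers are illustrative.

def effective_angles(w0, N0, F0, depth):
    """Yield (w_i, N_i, F_i) until the ranges cover the given depth."""
    ratio = F0 / N0
    w, N, F, i = w0, N0, F0, 0
    while N < depth:
        yield w, N, min(F, depth)
        i += 1
        N, F = F, F * ratio                    # next range starts where this ends
        w = 2 * math.atan((N0 / F0) ** i * math.tan(w0 / 2))

for w, N, F in effective_angles(w0=math.radians(40), N0=50, F0=150, depth=450):
    print(f"angle {math.degrees(w):5.1f} deg  covers [{N:.0f}, {F:.0f}]")
```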
2.2.3 Knowledge Granularity for the Search Agent
As we have illustrated above, the object search agent uses its knowledge about the target position to guide its action selection process. This knowledge is encoded as a discrete probability density that is updated whenever a sensing action occurs. To do this, the search environment is tessellated into a number of small squares, and each square $c$ is associated with a probability $p(c)$. To perfectly encode the agent's knowledge, the size of the square should be infinitely small, resulting in a continuous encoding of the knowledge. But this will not work in general because an infinite amount of memory would be needed. In order to make the system work, we are forced to represent the knowledge discretely, using squares of finite size. This gives rise to an interesting question: how should we determine the granularity of the representation (the size of the square) such that the best, or at least reasonable, effects can be generated?

To make the discussion easier, we denote an object search agent $a$ as $a = (s, \mathbf{k}_g, \mathbf{k}_p, G, I, t_{select}, t_{apply}, M, T, U)$, where $s$ is the state parameters of the agent, $\mathbf{k}_g$ is the agent's knowledge about the geometric configuration of the environment, and $\mathbf{k}_p$ is the agent's knowledge about the target position, encoded as probabilities associated with tessellated cells. $G$ is the granularity function, which gives a measurement of the granularity of a certain knowledge representation scheme. $I$ is the inference engine, which selects actions and updates the agent's knowledge; by applying $I$ to $\mathbf{k}_g$ and $\mathbf{k}_p$, an action is generated. The term $t_{apply}$ is the cost function for applying actions: $t_{apply}(\mathbf{f})$ gives the time needed to apply an action $\mathbf{f}$ and is determined by the time needed to take a picture and run the recognition algorithms. The term $t_{select}$ is the cost function for selecting actions. $M$ is the agent's memory limit: the memory used to store all the knowledge and inference algorithms should not exceed this limit. $T$ is the time limit: the total time spent by the agent in selecting and executing actions should be within $T$. $U$ is the utility function, which measures how well the agent performs during its search process within $T$.

The granularity function $G$ can be defined as the total memory used by the agent to represent a certain kind of knowledge divided by the memory used to represent a basic element of the corresponding knowledge. For example, $G(\mathbf{k}_p)$ gives the granularity measurement of the knowledge representation scheme $\mathbf{k}_p$. Suppose the length of the search environment is $L$ units (the side length of a square is one unit) and the width of the search environment is $W$ units; then the environment contains $LW$ squares. The probability $p(c)$ associated with each square $c$ is a basic element in the representation scheme $\mathbf{k}_p$. Suppose $m[p(c)]$ gives the memory used by the agent to represent $p(c)$. Then the total memory used to represent $\mathbf{k}_p$ is $LW\,m[p(c)]$. Thus, $G(\mathbf{k}_p) = \frac{LW\,m[p(c)]}{m[p(c)]} = LW$.

Here we study the influence of $G(\mathbf{k}_p)$ on the performance of the search agent. This performance can be measured by the utility and time limit pair $(U, T)$, where $U = P[F]$ is calculated by Formula (2.3). The actions in $F$ are selected according to the strategy of Section 2.2.2. For a finer granularity $G(\mathbf{k}_p)$, more time will be spent on action selection, leaving less time for action execution; the selected actions are generally of better quality because the calculation of $E(\mathbf{f})$ is more accurate in most situations. For a coarser granularity $G(\mathbf{k}_p)$, less time will be spent on action selection, leaving more time for action execution; the selected actions are generally of lower quality because the calculation of $E(\mathbf{f})$ is less accurate in most situations. In the following sections, we present experiments illustrating the influence of knowledge granularity on the agent's performance.
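As a quick illustration of the definition, $G$ is just a memory ratio, so any per-element size cancels out. The byte count in the sketch below is an arbitrary assumption.

```python
# The granularity function G, as defined above, is a memory ratio: total
# memory for the scheme divided by memory per basic element. A minimal
# sketch with illustrative numbers (an L x W grid, one float per cell).

BYTES_PER_PROB = 8          # one double per p(c); an assumption of the sketch

def granularity(length_units, width_units, bytes_per_element=BYTES_PER_PROB):
    total = length_units * width_units * bytes_per_element
    return total / bytes_per_element           # = L * W, as in the text

print(granularity(1000, 1000))                  # 1000000.0, i.e. G(k_p) = LW
```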
2.2.4 Experiments
A two dimensional simulation object search system has been implemented to test the influence of the knowledge granularity on the performance of the action selection process. The system is implemented in C on an IBM RISC System/6000. The search environment is a two dimensional square, as shown in Figure 2.2(a). If we tessellate the square into $1000 \times 1000$ small square cells, the relevant data for the system are as follows. The two dimensional camera has two effective angle sizes. The width of the first angle size is $40°$, and its effective range is $[50, 150]$. Its detection function is $b(c, \mathbf{f}) = D(l)\,(1 - \frac{\alpha}{20.5°})$, where $\alpha \leq 20.5°$ is the angle between the agent's viewing direction and the line connecting the agent center and the cell center, $D(l)$ is as shown in Figure 2.2(c), and $l$ is the distance from the cell center to the agent center. According to the formulas in Section 2.2.2, the width of the second effective angle size is $14°$, and its effective range is $[150, 450]$. The initial target distribution is as follows. The outside probability is 0.05. For any cell $c$ within region A (bounded by $30 \leq x \leq 75$ and $30 \leq y \leq 75$), $p(c) = 0.000004$. For any cell $c$ within region C (bounded by $600 \leq x \leq 900$ and $600 \leq y \leq 900$), $p(c) = 0.000005$. For any other cell $c$, $p(c) = 0.000001$. The agent is at position $[10, 10]$ in the beginning. We assume that there is only one recognition algorithm; thus the time needed to execute any action is the same.
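The set-up above can be sketched directly. The reconstruction of $b(c, \mathbf{f})$ and the triangular stand-in for the plotted $D(l)$ are assumptions of the sketch; only the grid size, the region bounds, and the probability values come from the text.

```python
# Sketch of the simulation set-up described above: a 1000 x 1000 grid,
# regions A and C with elevated probability, and the first angle size's
# detection function b(c, f) = D(l) * (1 - alpha/20.5), alpha in degrees.
# D(l) is approximated by a crude triangle over [50, 150], since the
# chapter gives it only as a plot.

SIZE = 1000
OUTSIDE_P = 0.05

def initial_p(x, y):
    if 30 <= x <= 75 and 30 <= y <= 75:        # region A
        return 0.000004
    if 600 <= x <= 900 and 600 <= y <= 900:    # region C
        return 0.000005
    return 0.000001

def D(l):
    """Stand-in for the plotted D(l): peaks inside the range [50, 150]."""
    if not 50 <= l <= 150:
        return 0.0
    return 1.0 - abs(l - 100) / 50.0

def b(alpha_deg, l):
    """First angle size (40 deg): detectable only if |alpha| <= 20.5 deg."""
    if abs(alpha_deg) > 20.5:
        return 0.0
    return D(l) * (1.0 - abs(alpha_deg) / 20.5)

print(initial_p(50, 50), b(0.0, 100), b(10.0, 120))
```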
Fig. 2.2 (a) The two dimensional environment. The agent is at the lower left corner of the region; an obstacle is present within the region. (b) The two dimensional environment tessellated into a grid of size 1000 × 1000. (c) The value of D(l).
In the first group of experiments, the agent only selects actions at position [10,10]. In the second group of experiments, the agent first selects 7
actions at position [10,10], then moves to position [700, 400] to begin a new search. The following sections list the experimental results.

2.2.4.1 Knowledge Granularity and Action Selection Time
To select the next action $\mathbf{f}$, the agent needs to calculate $P(\mathbf{f})$ (Equation (2.1)) for all candidate actions (Section 2.2.2). It is obvious that the knowledge granularity $G(\mathbf{k}_p)$ has a great influence on the action selection time $t_{select}(\mathbf{f})$: the higher the value of the knowledge granularity, the longer the time needed to select an action. We have performed a series of experiments to test this influence. The results are listed in the following table.
G(k_p):        30×30   40×40   50×50   60×60   70×70    80×80
t_select (s):     15      30      41      91     121      157

G(k_p):        90×90  100×100  200×200  300×300  400×400  500×500
t_select (s):    217      289     1083     2443     4380     7467

Table 1
Note that $t_{select}(\mathbf{f})$ (measured in seconds) is obtained by taking the difference of the times reported by the command system("date") executed before the system enters the action selection module and after it finishes that module. The average value over different actions with the same granularity is taken as the value of $t_{select}$ for the corresponding granularity. The accuracy is within one second.

2.2.4.2 The Error Associated with Knowledge Granularity
Clearly the approximations involved in discretization will cause errors in calculating various values. In general, the higher the value of the knowledge granularity, the smaller the error caused by discretization. The error associated with knowledge granularity may influence the quality of the selected actions, and thus the performance of the agent. Figures 2.3(d)(e)(f)(g) show how the granularity influences the error in calculating $P(\mathbf{f})$. We notice that, in general, the higher the knowledge granularity, the smaller the error of the calculated $P(\mathbf{f})$. For example, for
Fig. 2.3 (a) The calculated probability P(f) associated with knowledge granularity G(k_p) = 30 × 30 for the selected action f, and the real detection probability for f. (b) The difference between the real and calculated probabilities for G(k_p) = 30 × 30. (c) The calculated and the real probability of detecting the target for the selected effort allocation F for G(k_p) = 30 × 30. (d), (e), (f), (g) The difference between the real and calculated probability of detecting the target for different knowledge granularities. (h), (i), (j), (k) The real probability of detecting the target for the given effort allocation at position [10,10].
$G(\mathbf{k}_p) = 40 \times 40$, the error for the first action is 0.037115, while for $G(\mathbf{k}_p) = 500 \times 500$, the error for the first action is 0.002356. Figures 2.3(h)(i)(j)(k) show the real probability of detecting the target, $P[F]$, with effort allocation $F$ for different degrees of knowledge granularity. We can see that the higher the value of $G(\mathbf{k}_p)$, the faster the system reaches its detection limit.
Fig. 2.4 The influence of t_apply and G(k_p) on the performance of the agent: (a) t_apply = 1 second; (b) t_apply = 100 seconds; (c) t_apply = 1000 seconds; (d) t_apply = 10000 seconds; (e) t_apply = 100000 seconds.
2.2.4.3 Knowledge Granularity and Agent Performance
In this section, we analyze the influence of knowledge granularity on the overall performance of the agent. Figures 2.3(h)(i)(j)(k) show that the higher the knowledge granularity, the better the quality of the selected actions. However, to realize the expected benefits, we need to execute the actions in addition to selecting them; thus, both the action selection time and the action execution time are important. For a higher knowledge granularity, although the selected actions may be of good quality, the time needed to obtain them is also longer. If the time needed to execute an action is very long, then it is worth spending more time to select good actions. However, if the time needed to execute an action is very short, it may not be beneficial to spend a lot of time on action selection, because that amount of time could be used to execute all
the possible actions. Thus, a purely reactive strategy (no planning) only wins when the action execution time is short. When the action execution time is very long, we are forced to spend more time (use a higher knowledge granularity) in order to select good quality actions. Figure 2.4 illustrates how the performance of the agent is affected when t_apply equals 1 second, 100 seconds, 1000 seconds, 10000 seconds, and 100000 seconds, respectively. The performance is represented by the probability of detecting the target for the selected effort allocation F versus the cost of selecting and executing the effort allocation F. We can see from Figure 2.4(a) that for t_apply = 1, the performance of G(k_p) = 40 × 40 is better than the performance of G(k_p) = 100 × 100 and G(k_p) = 500 × 500. As t_apply increases, the situation changes gradually. When t_apply = 10000, G(k_p) = 100 × 100 becomes the best knowledge granularity. When t_apply = 100000, G(k_p) = 500 × 500 becomes the best granularity.

2.2.4.4 When the Agent is Allowed to Move
We also performed experiments for different inference engines I, and similar results were obtained. Figure 2.5 shows the experimental results when the agent is allowed to move. The agent first selects 7 actions at position [10,10], then moves to [700,400] to continue the search process. From Figure 2.5, we can observe the same phenomena as in the previous sections.
2.3 Knowledge Granularity in General
In general, if we represent an agent as $a = (s, \mathbf{k}_1, \ldots, \mathbf{k}_m, G, I, t_{select}, t_{apply}, M, T, U)$, where $\mathbf{k}_1, \ldots, \mathbf{k}_m$ are the representation schemes for the different kinds of knowledge maintained by the agent and the other symbols are as in Section 2.2.3, then we can define the knowledge granularity $G(\mathbf{k}_i)$ for $\mathbf{k}_i$ as the total amount of memory needed to represent the corresponding knowledge by scheme $\mathbf{k}_i$ divided by the memory needed to represent a basic element of that knowledge. In this section, we study in general the influence of knowledge granularity on an agent's action selection performance. For a task oriented agent, a finer granularity usually results in a better selected action. However, the action selection time for a finer granularity is usually longer. Thus, a finer granularity requires more time for selecting actions, and leaves less time for executing actions.
Fig. 2.5 Experiments performed for another inference engine for G(k_p) = 40 × 40 and G(k_p) = 500 × 500. (a) Error in calculating P(f); (b) different effects with respect to P[F]; (c) performance when t_apply = 1; (d) performance when t_apply = 1000; (e) performance when t_apply = 100000.
On the other hand, a coarser granularity requires less time for action selection and thus leaves more time for action execution. In other words, with respect to a fixed time constraint, an agent can usually execute more low-quality actions under a coarser granularity, and fewer high-quality actions under a finer granularity. It is thus very interesting to study how the performance of a task oriented agent is influenced by the degree of knowledge granularity, and how the agent should choose a reasonable granularity from the spectrum of knowledge abstraction. Different agents use different kinds of knowledge and different kinds of action selection procedures. Because of the complexity and diversity of the world of agents, it is impossible to provide a general conclusion or solution with regard to knowledge granularity. What we can do is to group agents into different categories and study the behavior of each category. It is obvious that the performance of an agent is influenced by the action execution time $t_e$, the action selection time $t_s$, the total time constraint for the given task $T$, and the quality $Q$ of the selected and executed actions; $t_s$ and $Q$ are influenced by the knowledge granularity adopted by the agent. Suppose that for a granularity $g$, the average time needed to select an action is $t_s(g)$, and the average contribution of a selected action
to the task is $Q(g)$. Assume that the total contribution $U$ made by an agent within the time constraint $T$ can be represented by the sum of the average contributions of all the actions executed within $T$. Then $U$ can be represented as follows:

$$U(g) = \left\lfloor \frac{T}{t_s(g) + t_e} \right\rfloor Q(g).$$
In the following, we study how $U(g)$ is influenced by different $t_s(g)$ and $Q(g)$. This is done by assigning $t_s(g)$ and $Q(g)$ different functions of $g$ and studying how the value of $U(g)$ is influenced. We assume that the total time constraint for the agent to perform a certain task is $T = 100$, and that the granularity range used by the agent is $[1, 150]$. The following functions are used in our empirical study: $\gamma_1(g) = 6$; $\gamma_2(g) = \ln(g)$; $\gamma_3(g) = g + 1$; $\gamma_4(g) = g^3 + 1$; $\gamma_5(g) = \exp(g) + 1$. These functions represent different relations between the knowledge granularity $g$ and the entities to be discussed. Of course, there are all kinds of possible relations; the above functions are only a very small fraction of them. Function $\gamma_1(g)$ means that the entity is a constant, and thus is not influenced by granularity. For example, if $t_s(g) = \gamma_1(g)$, then the action selection time is not influenced by the granularity; if $Q(g) = \gamma_1(g)$, then the average contribution of a selected action is not influenced by granularity. Functions $\gamma_2(g)$, $\gamma_3(g)$, $\gamma_4(g)$, $\gamma_5(g)$ represent different degrees of influence of granularity on the entity. Figure 2.6 shows how the granularity influences the performance of the agent under different situations. Table 2 lists the indexes in Figure 2.6 with respect to their corresponding situations for $t_s(g)$ and $Q(g)$. For example, Figure 2.6(aa) corresponds to the situation that $t_s(g) = \gamma_1(g)$ and $Q(g) = \gamma_1(g)$.
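The empirical study is easy to reproduce in outline. The sketch below assumes the reconstructed utility $U(g) = \lfloor T/(t_s(g)+t_e) \rfloor\, Q(g)$ from above and an arbitrary execution time $t_e$; the $\gamma$ functions follow the text.

```python
import math

# Sketch of the empirical study described above: the total contribution
# U(g) within T = 100 for granularities g in [1, 150], assuming the
# reconstructed form U(g) = floor(T / (t_s(g) + t_e)) * Q(g). The gamma
# functions follow the text; t_e is an extra assumption of the sketch.

T = 100.0
GAMMAS = {
    1: lambda g: 6.0,
    2: lambda g: math.log(g),
    3: lambda g: g + 1.0,
    4: lambda g: g ** 3 + 1.0,
    5: lambda g: math.exp(g) + 1.0,
}

def U(g, ts_idx, q_idx, t_e=1.0):
    t_s = GAMMAS[ts_idx](g)
    q = GAMMAS[q_idx](g)
    return math.floor(T / (t_s + t_e)) * q

# Situation (cc): t_s = gamma_3, Q = gamma_3 -- scan for the best granularity.
best = max(range(1, 151), key=lambda g: U(g, 3, 3))
print(best, U(best, 3, 3))
```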
Fig.      (aa)  (ab)  (ac)  (ad)  (ae)  (ba)  (bb)  (bc)  (bd)  (be)
t_s(g)     γ1    γ1    γ1    γ1    γ1    γ2    γ2    γ2    γ2    γ2
Q(g)       γ1    γ2    γ3    γ4    γ5    γ1    γ2    γ3    γ4    γ5

Fig.      (ca)  (cb)  (cc)  (cd)  (ce)  (da)  (db)  (dc)  (dd)  (de)
t_s(g)     γ3    γ3    γ3    γ3    γ3    γ4    γ4    γ4    γ4    γ4
Q(g)       γ1    γ2    γ3    γ4    γ5    γ1    γ2    γ3    γ4    γ5

Table 2
Fig. 2.6 The influence of knowledge granularity on the performance of an agent.
From Figure 2.6, we can see that the action execution time $t_e$ is a very important factor influencing the selection of the knowledge granularity. Figures 2.6(aa)(ab)(ac)(ad)(ae) show the situation where the action selection time is not influenced by the granularity. In this special case, the finer the granularity, the better the performance, except in the first panel (aa). Figures 2.6(ca)(cb)(cc)(cd)(ce) show the situations where $t_s = g + 1$, while $Q(g)$ equals $\gamma_1, \ldots, \gamma_5$ respectively. We can see that for a large execution time ($> 50$), the granularity should be low in order to guarantee that at least one action can be executed. Figure 2.6(ca) shows the situation where the benefit of each action is not influenced by the granularity; a smaller granularity is then preferred no matter what the action execution time is. Figure 2.6(cc) shows the situation where an action's benefit is influenced linearly by the granularity. Here the result becomes complex: it depends on how the quality of the selected actions is influenced by the granularity. For example, for a small execution time, there are several granularities that can generate satisfactory results, and these reasonable granularities differ for different action execution times. Similar analysis can be applied to the other panels. In conclusion, in complex situations the selection of granularity depends on many factors, such as the task constraint (total time allowed to perform a task), the action execution time, the influence of granularity on action selection time, and the quality of actions selected with a given granularity. In general, we can construct graphs like those in Figure 2.6 to analyze the effects of granularity on the performance of the agent under different factors and constraints, and then select a favorable granularity.
2.4 Selecting Knowledge Granularity
The experiments in the above section show that the level of knowledge granularity has a big impact on the quality and speed of the agent's behavior. It is thus important for an agent to adapt its knowledge granularity based on environmental and task-specific demands. In this section, we address the following interesting question: how can we select the knowledge granularity G(k) for a given representation scheme k such that the best, or a relatively good, agent performance can be achieved?
2.4.1 Best Granularity
In some situations, we are able to select the best knowledge granularity in the sense that it maximizes the performance of the agent. Here is an example. Suppose we have an agent whose task is to collect food from a region of length $L$ within a time limit $T$. The agent can use different representation lengths $\Lambda = \{l_1, \ldots, l_q\}$ to represent the region (suppose $\frac{L}{l_i}$ is an integer, where $1 \leq i \leq q$). If the agent selects $l \in \{l_1, \ldots, l_q\}$ as its representation scheme $\mathbf{k}$ for the corresponding knowledge, then the corresponding knowledge granularity for this scheme will be $G(\mathbf{k}) = \frac{L}{l}$, and the total region is divided into $\frac{L}{l}$ units. The process of food collection is as follows. Before the collecting process, all the units of the region are in the status "not ready". When the collecting process begins, one of the units becomes "ready", and the agent searches for this unit. The time $t_s(l)$ used by the agent to locate the unit is the time for the agent to select an action under the current representation scheme; suppose $t_s(l) = \frac{L}{l}$. After the unit is located, the agent collects food from it. The total time needed to collect the food is the time needed to execute the selected action; suppose it is $t_e(l) = Cl$ (where $C$ is a constant). The total amount of food collected from the unit is $B(l) = \frac{L}{l}$. When the agent finishes its food collection at the selected unit, another unit becomes "ready"; the agent searches for this new unit and collects food from it. This process continues until the total time $T$ is used up. If the total time $T$ is exhausted while the agent is locating a unit or collecting food within a unit, then the amount of food collected from that unit is zero. It is obvious that the number of units that can be processed by the agent within $T$ is $\lfloor \frac{T}{t_s(l)+t_e(l)} \rfloor$, and the number of units available is $\frac{L}{l}$. The performance $\Psi$ of the agent is measured by the total amount of food collected by the agent and is given by the following formula:
$$\Psi = \begin{cases} \left\lfloor \dfrac{T}{t_s(l)+t_e(l)} \right\rfloor B(l), & \text{if } \left\lfloor \dfrac{T}{t_s(l)+t_e(l)} \right\rfloor \leq \dfrac{L}{l} \\[2ex] \dfrac{L}{l}\,B(l), & \text{if } \left\lfloor \dfrac{T}{t_s(l)+t_e(l)} \right\rfloor > \dfrac{L}{l} \end{cases} \qquad (2.7)$$

This is actually
$$\Psi = \begin{cases} \left\lfloor \dfrac{Tl}{L + Cl^2} \right\rfloor \dfrac{L}{l}, & \text{if } l < \dfrac{L}{\sqrt{T-CL}} \\[2ex] \dfrac{L^2}{l^2}, & \text{if } l \geq \dfrac{L}{\sqrt{T-CL}} \end{cases} \qquad (2.8)$$
The problem is to find an $l$ in $\Lambda = \{l_1, \ldots, l_q\}$ such that $\Psi$ is maximized. The set $\Lambda$ can be divided into two parts $\Lambda_A = \{l_1, \ldots, l_j\}$ and $\Lambda_B = \{l_{j+1}, \ldots, l_q\}$, such that all the elements in $\Lambda_A$ are less than $\frac{L}{\sqrt{T-CL}}$, and all the elements in $\Lambda_B$ are greater than or equal to $\frac{L}{\sqrt{T-CL}}$. It is obvious that for elements $l \in \Lambda_A$, the smallest one has the best performance, because $\Psi$ is a decreasing function there. For elements $l \in \Lambda_B$, we can calculate the value of $\frac{L^2}{l^2}$ to identify the best element. Then we compare the smallest element in $\Lambda_A$ and the best element in $\Lambda_B$ to identify the one that maximizes the performance of the system. The above example shows that in some situations, an agent is able to identify an optimum knowledge granularity based on the task requirement (here $T$) and the environmental characteristics (here $L$). The basic method is to represent the performance of the agent as a function of the agent's knowledge granularity, and then to find the granularity that maximizes the performance. In general, it is very difficult or even impossible to find a best knowledge granularity for an agent, because the performance of the agent may be influenced by many other factors in addition to the knowledge granularity. For example, there does not exist a best knowledge granularity for the object search agent, because its performance is also influenced by the initial target distribution: a granularity that is best for one distribution might not be best for another. Thus, in general, we need to relax our requirements. Instead of finding the best granularity, we search for a reasonable one such that a relatively good performance can be achieved. Because of the variations among different agent systems, it is impossible to provide a detailed selection procedure that applies to all of them. However, we can provide a general guideline for the selection of the knowledge granularity.
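A sketch of the optimisation, under the reconstructed model above ($t_s(l) = L/l$, $t_e(l) = Cl$, $B(l) = L/l$). The constants and the candidate set $\Lambda$ are illustrative.

```python
import math

# Sketch of the food-collection example, using the reconstructed model:
# t_s(l) = L/l, t_e(l) = C*l, B(l) = L/l, and Psi as in (2.7)/(2.8) above.
# L, T, C and the candidate lengths are invented for illustration.

L, T, C = 100.0, 500.0, 0.2

def psi(l):
    per_unit = L / l + C * l                  # t_s(l) + t_e(l)
    processed = math.floor(T / per_unit)      # units the agent has time for
    available = L / l                         # units that exist
    B = L / l                                 # food per unit, per the model
    return min(processed, available) * B

candidates = [1.0, 2.0, 4.0, 5.0, 10.0, 20.0, 25.0, 50.0]
best = max(candidates, key=psi)
threshold = L / math.sqrt(T - C * L)          # split between Lambda_A and Lambda_B
print(f"threshold={threshold:.2f}, best l={best}, psi={psi(best):.1f}")
```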
2.4.2 Selecting Reasonable Granularity in Complex Agent Environments
In an agent environment where the relationships among the task constraints, the environment, and the knowledge granularity are very complex, a "demand-environment-granularity" (DEG) Hash Table can be used to select a reasonable granularity. The DEG Hash Table is a hash table in which the "key" is the combination of different factors and the "value" is the granularity appropriate for the corresponding factors. When an agent is informed of task requirements, it first transforms the task requirements and the environmental factors into a key. Then it retrieves the granularity from the DEG Hash Table based on the key. This granularity will be used by the agent to represent the corresponding knowledge. A complex agent environment may have more than one task constraint $T_1, \ldots, T_{n_T}$. Each $T_i$ forms one component of the "key" of the DEG Hash Table, and can be divided into several groups $T_{i,1}, \ldots, T_{i,k_i}$ based on certain criteria. For example, the task constraint for an object search agent is the total time available for the search; this time constraint can be divided into groups like "from 1 second to 30 seconds", "from 30 seconds to 100 seconds", etc. In addition to the task constraints, we should also consider the influence of the environmental factors when selecting the granularity. Suppose $E_1, \ldots, E_{n_E}$ are the environmental factors that need to be considered. As above, each $E_j$ can be divided into several groups $E_{j,1}, \ldots, E_{j,k_j}$ based on certain criteria. The DEG Hash Table then looks like the following:
T_1  ...  T_{n_T}  |  E_1  ...  E_{n_E}  |  G
t_1  ...  t_{n_T}  |  e_1  ...  e_{n_E}  |  g

Table 3
Each row in the table, except the first one, gives a "key" $(t_1, \ldots, t_{n_T}, e_1, \ldots, e_{n_E})$ and the corresponding granularity value $g$. Here, $t_i$ is a category (group) for the task constraint factor $T_i$, and $e_i$ is a category (group) for the environmental factor $E_i$. The term $g$ is the knowledge granularity value corresponding to the "key" and should be obtained by conducting various
simulation experiments or theoretical analysis before the agent performs any task. When an agent is informed of a task, it first determines the key based on the current situation, and then uses this key to locate the knowledge granularity.
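A minimal sketch of a DEG Hash Table follows. The time buckets follow the example groupings above; the environmental factor, its buckets, and the stored granularities are invented for illustration.

```python
# Sketch of a DEG ("demand-environment-granularity") Hash Table: keys
# combine bucketed task constraints and environmental factors, values are
# granularities found offline. All bucket boundaries and stored values
# here are illustrative assumptions.

def time_bucket(seconds):
    if seconds <= 30:
        return "t<=30s"
    if seconds <= 100:
        return "30s<t<=100s"
    return "t>100s"

def clutter_bucket(obstacle_fraction):
    return "sparse" if obstacle_fraction < 0.2 else "dense"

DEG_TABLE = {
    ("t<=30s", "sparse"): (40, 40),
    ("t<=30s", "dense"): (60, 60),
    ("30s<t<=100s", "sparse"): (100, 100),
    ("30s<t<=100s", "dense"): (200, 200),
    ("t>100s", "sparse"): (300, 300),
    ("t>100s", "dense"): (500, 500),
}

def select_granularity(time_limit_s, obstacle_fraction):
    key = (time_bucket(time_limit_s), clutter_bucket(obstacle_fraction))
    return DEG_TABLE[key]

print(select_granularity(25, 0.1))    # -> (40, 40)
```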
2.5 Knowledge Granularity Spectrum and Action Pyramid
From the above discussion we know that knowledge granularity has a big influence on the quality of selected actions. Usually, with respect to the knowledge granularity spectrum, the higher the value of the knowledge granularity, the better the quality of the selected actions. However, it is not always beneficial to use a high granularity, because the cost usually increases as well. In order to benefit from both the short action selection time of low granularity and the high quality of the actions selected at high granularity, a hierarchy of granularity layers can be used to select actions. For example, we can choose several granularities for the purpose of action selection, as follows. First, the coarsest granularity is used to select a set of actions from the pool of all the actions. Then the second coarsest granularity is used to select an even smaller set of actions from the set chosen before. This procedure continues until the actual action to be applied is selected according to the finest granularity. The sets of actions selected by the different granularities form an action pyramid. In this section, we compare the performance of a single layer granularity scheme and a multi-layer granularity scheme. We restrict our discussion to two layers because of limited space; similar results can be obtained for more than two layers.
2.5.1 Adding a New Coarse Layer for Filtering Out Non-interesting Actions
Suppose after some analysis we find that granularity $g_o$ is a favorable choice. We might increase the performance by adding a coarser layer. The idea is to use the coarser granularity to select a small set of suitable actions, and then use $g_o$ to select an action to execute from this small set. Suppose originally there are $N$ actions in total to select from. Now consider adding another layer of granularity $g_c$. Suppose that for $g_c$, we need to collect the first $n_{g_c}$ actions in order to guarantee that enough good actions are within this set. In other words, the quality of the
actions selected by $g_o$ from the total $N$ actions is almost the same as the quality of the actions selected by $g_o$ from the $n_{g_c}$ actions. Suppose that for this planning system the action selection time is governed by $t(g, n)$, where $g$ is the granularity and $n$ is the number of actions to be considered. Now we compare the time needed to select one action under the two strategies. The time $T_o^s$ needed to select an action with the single layer is $T_o^s = t(g_o, N)$. The time $T_c^s$ needed to select an action with the new strategy is $T_c^s = t(g_c, N) + t(g_o, n_{g_c})$. Thus, the difference in selecting an action between the two strategies is given by:
$$T_c^s - T_o^s = t(g_c, N) + t(g_o, n_{g_c}) - t(g_o, N).$$
Figure 2.7 and Figure 2.8 show the results of experiments performed to test the performance difference under different strategies. In our experiments, we assume that $t(g, n) = t_1(g)\,t_2(n)$, where the function $t_1$ gives the sensitivity of the action selection time to the granularity $g$, and $t_2$ gives its sensitivity to the number of actions $n$ to be considered. The index $(i, j)$ in Figures 2.7 and 2.8 means that $t_1(x) = \gamma_i(x)$ and $t_2(y) = \gamma_j(y)$, where the functions $\gamma$ are defined in the previous section. For example, Figure 2.7(2,3) means that $t_1(x) = \gamma_2(x)$ and $t_2(y) = \gamma_3(y)$. In the figures, the $z$ axis is the difference in action selection time; the other axes are $g_c$ and $n_{g_c}$. In the test, we set $N = 100$ and $g_o = 100$, and let $g_c$ and $n_{g_c}$ vary, with $1 \leq n_{g_c} \leq 100$ and $15 \leq g_c \leq 100$. Figure 2.7 and Figure 2.8 show the difference $T_c^s - T_o^s = t(g_c, 100) + t(100, n_{g_c}) - t(100, 100)$ as a function of $g_c$ and $n_{g_c}$. In order to make the comparison easier, we also draw the surface $z = 0$. Figure 2.7(1,1) shows the situation where the action selection time is influenced neither by the granularity nor by the number of actions to be selected from. In this situation, the two layer strategy is always worse than the single layer strategy by a constant: the time used to pre-select the set of actions at the coarse layer. Figures 2.7(1,1)(1,2)(1,3) show the situation where the action selection time is not influenced by granularity. In this case, adding a new coarse layer does not save time, because the coarse layer itself spends the same time as the old granularity $g_o$, and extra time must then be spent by $g_o$ to select an action from the action pool pre-selected by $g_c$. Figures 2.7(2,1)(2,2)(2,3)(2,4)(2,5) show the situation where the influence of granularity on action selection is governed by $\gamma_2$, while the influence of the number of actions is governed
by $\gamma_1$, $\gamma_2$, $\gamma_3$, $\gamma_4$, and $\gamma_5$ respectively. We can see that the more sensitively the action selection time depends on the number of actions in the action pool, the better the two layer strategy performs. This is illustrated by the increase in the area of those $g_c$ and $n_{g_c}$ lying below the plane $z = 0$: a decrease in granularity in a more sensitive situation tends to yield a bigger saving in action selection time. The same analysis can be applied to Figures 2.7(3,2)(3,3)(3,4), Figures 2.8(4,1)(4,2)(4,3)(4,4)(4,5), and Figures 2.8(5,1)(5,2)(5,3)(5,4)(5,5). From Figure 2.7 and Figure 2.8 we can also see that for a fixed granularity $g_c$, the smaller the value of $n_{g_c}$, the better the two layer strategy, because a smaller $n_{g_c}$ tends to save time for $g_o$. Likewise, for a fixed $n_{g_c}$, the smaller the value of $g_c$, the better the two layer strategy. From the above experiments, we know that in some situations adding a coarse layer can increase the performance of an agent. Thus, when a single granularity does not allow the agent to scale to the task at hand, we can consider adding a coarse layer to increase the chances of scaling. To do this, we can first draw the performance figure as above, and then select the granularity corresponding to the lowest point on the surface as the granularity for the coarse layer.
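The time comparison above is straightforward to tabulate. The sketch assumes the separable form $t(g, n) = t_1(g)\,t_2(n)$ used in the experiments, with one illustrative choice of $t_1$ and $t_2$; negative differences mark the region where the coarse pre-filtering layer pays off.

```python
import math

# Sketch of the selection-time comparison above, assuming the separable
# form t(g, n) = t1(g) * t2(n). Here t1 is logarithmic and t2 is linear
# (one of the more "sensitive" cases); both are illustrative choices.

N, G_O = 100, 100
def t1(g): return math.log(g + 1)
def t2(n): return float(n)
def t(g, n): return t1(g) * t2(n)

def time_difference(g_c, n_gc):
    """T_c^s - T_o^s = t(g_c, N) + t(g_o, n_gc) - t(g_o, N)."""
    return t(g_c, N) + t(G_O, n_gc) - t(G_O, N)

# Negative values mean the two-layer strategy selects actions faster; the
# region below zero grows as t2 becomes more sensitive to n.
for g_c in (15, 50, 100):
    for n_gc in (10, 50, 100):
        print(f"g_c={g_c:3d} n_gc={n_gc:3d} diff={time_difference(g_c, n_gc):8.1f}")
```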
2.5.2 Adding a Finer Layer to Obtain Better Quality Actions
Another way to use a hierarchical representation to increase performance and the chances of scaling is to add a finer layer. The idea is to use the current granularity $g_o$ to pre-select a small set of candidate actions, and then use a finer granularity $g_f$ to choose a better quality action to execute. The utility for the single layer strategy is:

$$U(g_o) = \left\lfloor \frac{T}{t_e + t(g_o, N)} \right\rfloor Q(g_o).$$
For the two layer strategy, suppose $n_{g_o}$ is the number of actions that must be selected by $g_o$ in order to guarantee that the actions selected by $g_f$ reach a desired quality $Q(g_f)$. The time to select an action under the two layer strategy is $t_s = t(g_o, N) + t(g_f, n_{g_o})$. Suppose the total time available for the agent is $T$. The utility of the new strategy is:
$$U(g_f) = \left\lfloor \frac{T}{t_e + t(g_o, N) + t(g_f, n_{g_o})} \right\rfloor Q(g_f).$$
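A sketch comparing the two utilities, using the reconstructed formulas above. $T = 100$, $g_o = 100$, $N = 100$ and $t_e = 6$ follow the experiments reported below; the particular $t(g, n)$ and $Q(g)$ are illustrative stand-ins.

```python
import math

# Sketch of the finer-layer utility comparison, using the reconstructed
# formulas above. t(g, n) and Q(g) are invented stand-ins; T, g_o, N and
# t_e follow the experiments described in the text.

T, N, G_O, T_E = 100.0, 100, 100, 6.0
def t(g, n): return math.log(g + 1) * math.log(n + 1) / 10.0
def Q(g): return math.log(g + 1)          # quality grows with granularity

def U_single():
    return math.floor(T / (T_E + t(G_O, N))) * Q(G_O)

def U_two_layer(g_f, n_go):
    t_s = t(G_O, N) + t(g_f, n_go)        # pre-select with g_o, refine with g_f
    return math.floor(T / (T_E + t_s)) * Q(g_f)

for g_f, n_go in [(120, 10), (150, 10), (150, 60)]:
    print(g_f, n_go, round(U_two_layer(g_f, n_go) - U_single(), 2))
```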
Fig. 2.7 The performance comparison when a coarse layer of granularity is added to pre-select a small set of actions to consider.
Fig. 2.8 Continued: the performance comparison when a coarse layer of granularity is added to pre-select a small set of actions to consider.
Experiments have been performed to show the performance difference between the new strategy and the old strategy, $U_{diff} = U(g_f) - U(g_o)$. In the experiments, we assume $T = 100$, $g_o = 100$, $N = 100$. We also assume that the action execution time is $t_e = 6$. In general, $t_e$ has a big influence on the analysis; here we take $t_e = 6$ as an example to study the influence of the other factors on agent performance. As in the previous section, we assume $t_s(g, n) = t_1(g)\,t_2(n)$. $Q(g)$ is another function, which gives the quality of the action selected with granularity $g$. In the experiments, we take $g_f$ as one variable and $n_{g_o}$ as the other, and we draw the surface formed by $U_{diff}$ as $g_f$ and $n_{g_o}$ change. The domain ranges for the two variables are $100 \leq g_f \leq 150$ and $1 \leq n_{g_o} \leq 100$. In Figure 2.9, the index $(i, j, k)$ means that the figure is drawn by setting $t_s(g, n) = \gamma_i(g)\,\gamma_j(n/70)$ and $Q(g) = \gamma_k(g)$. Figure 2.9(1,1,1) shows the situation where neither the granularity nor the number of actions to be selected from influences the action selection time, and the granularity has no influence on the quality of the selected actions. In this situation, the two layer strategy is always worse than the one layer strategy, because it is a waste of effort to pre-select a set of actions for the finer layer. Figures 2.9(2,2,2)(2,2,3)(2,2,5), Figures 2.9(2,3,2)(2,3,3)(2,3,5), Figures 2.9(3,3,2)(3,3,3)(3,3,5), and Figures 2.9(3,5,2)(3,5,3) show that the more sensitive the quality of the selected actions is to granularity, the more the two layer strategy benefits. The influence on performance of the sensitivity of $n_{g_o}$ with respect to action selection time can be complex. For example, Figures 2.9(2,2,2)(2,3,2) and Figures 2.9(3,3,2)(3,5,2) show that a sensitive $n_{g_o}$ with respect to action selection time is not good for the two layer strategy; Figures 2.9(2,2,3)(2,3,3) show the opposite; while Figures 2.9(3,3,3)(3,5,3) show that it is sometimes good and sometimes not, depending on the particular $n_{g_o}$ and $g_f$. In general, many factors can influence the performance of the two strategies, and a graph needs to be drawn in order to determine which strategy is better.
2.6 Conclusion
In this paper, we introduce the concept of knowledge granularity and study the relationship between different knowledge representation schemes and the scaling problem. We promote the viewpoint that the problem of scaling is closely related to an agent's task requirements.
Fig. 2.9 The performance comparison when a finer layer of granularity is added to select good quality actions.
By scaling to a task, we mean that an agent's planning system and knowledge representation scheme are able to generate the range of behaviors required by the task in a timely fashion. Here, we study the influence of knowledge granularity and related representation schemes on an agent's scaling problem. From the study of the related issues for an object search agent, we know that knowledge granularity can greatly influence the performance and scaling of an agent. We then perform experiments to study the influence of knowledge granularity on an agent's performance under various situations. The experimental results show that many factors can influence the values of reasonable granularities that allow an agent to scale to a given task, such as the task constraint and the action execution time. In complex situations, a reasonable granularity can be selected by constructing graphs as in Figure 2.6 and analyzing the resulting graphs. Finally, we conduct experiments to compare the performance of a single layer granularity scheme and a multiple layer granularity scheme. The reason to use several granularities from the granularity spectrum, and to choose actions from the corresponding action pyramid, is that the agent can benefit from both the short action selection time of low granularities and the high quality of the actions selected at high granularities. The experimental results show that a hierarchical representation scheme can produce better performance in some situations, especially when the quality of the selected actions or the action selection time is greatly influenced by the granularity.
Chapter 3
The Motivation for Dynamic Decision-Making Frameworks in Multi-Agent Systems

K. Suzanne Barber and Cheryl E. Martin

The Laboratory for Intelligent Processes and Systems, Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
3.1 Introduction

As agent-based applications have become more widespread, the demand for robust performance and flexibility has increased. Agent characteristics required to meet these demands include the ability to adapt to changes in the environment through re-planning, re-scheduling, or re-organizing. This adaptation should occur dynamically, during agent operation, for maximum effect. The research presented here focuses on adaptation at the organizational level in multi-agent systems. Dynamic reorganization allows agents to overcome problems such as agent failure (by restructuring collaborative decision-making to exclude failed agents), communication failure (by allowing agents awaiting orders to eventually take initiative), and under-performance (by allowing agents to discover new collaborations that may work better). The idea that different organizations work better under different circumstances was first formalized with respect to human
organizations as the basis of contingency theory [15]. Contingency theory holds that the best way to organize is contingent on environmental conditions. Although the original theory considered only organizational design with respect to the organization's environment, the application of this concept has recently been extended to organizational change with respect to task-environment characteristics [1, 23]. That is, to remain effective over changing situations, an organization may need to change as well. Various types and degrees of reorganization are available for multi-agent systems (see the Related Work section for comparisons). This research focuses on reorganization in which the decision-making-control and authority-over relationships among agents are allowed to change during system operation, but the application-specific resources, task responsibilities, and capabilities of the agents may remain constant. Decision-making control and authority-over relationships dictate how agents interact to determine a solution during collaborative problem solving. A specification of "decision-making control" dictates which agents make decisions about how to achieve a goal. A specification of "authority-over" dictates to which agents the decision-makers can assign tasks (i.e. which agents the decision-makers have authority over). These relationships are described by a decision-making framework. Decision-making frameworks can vary from centralized control, to distributed control, to local control. An agent's degree of autonomy is determined by the decision-making frameworks in which it participates [4]. An agent's degree of autonomy can be described qualitatively along a spectrum as shown in Figure 1. The three discrete autonomy level categories labeled in Figure 1 define salient points along the spectrum.
Figure 1. The Autonomy Spectrum.

Command-driven — The agent does not make any decisions about how to pursue its goal and must obey orders given by some other agent(s).

True Consensus — The agent works as a team member, sharing decision-making control equally with all other decision-making agents.
Locally Autonomous / Master — The agent makes decisions alone and may or may not give orders to other (command-driven) agents.

Agents in a multi-agent system can move along the autonomy spectrum during system operation by forming, dissolving, or modifying decision-making interactions with other agents. The research presented here provides empirical motivation for this capability. Multi-agent research must often rely on experimental observation to answer questions for which no formal proof can be offered. Due to the complexity of multi-agent systems in general, the performance of agents operating under a particular decision-making framework cannot be determined a priori. The experiments presented here allow performance differences to be examined through empirical observation. Good experimental design is crucial, including the formation of a strong, refutable hypothesis, the use of statistically significant samples, and the ability to repeat experimental observations. This chapter discusses in detail the infrastructure for, and design of, the experiments presented here, as well as the results of these experiments. The authors' previous multi-agent experiments have shown that the "best" decision-making framework for a group of agents depends not only on the application and the pre-defined characteristics of the system, but also on run-time factors that can change during system operation [3]. This result remained consistent regardless of which performance metric was used to identify the "best" decision-making framework. The research presented here extends the results of those experiments. There are several differences between those previous experiments and the research presented here. The previous experiments employed a single monolithic process to produce the behavior of the system, environment, and every agent. The current experiments have been performed using a true multi-agent distributed testbed [2]. In addition, the algorithms used by the agents to solve the application's problem have been completely re-worked. Finally, the number of different types of situations the agents can encounter has been increased. This research confirms the results of the previous experiments and shows that the best decision-making framework for a group of agents depends on the agents' current run-time situation. These experiments provide a clear motivation for the implementation of dynamic decision-making frameworks in multi-agent systems.
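As a concrete (and purely hypothetical) rendering of these ideas, the sketch below models a decision-making framework as a record of decision-making-control and authority-over relationships and derives the qualitative autonomy level of Figure 1 from it; the chapter itself does not prescribe any particular data structure, so all names here are ours.

```python
from dataclasses import dataclass, field
from enum import Enum

class Autonomy(Enum):
    """The three salient points on the autonomy spectrum of Figure 1."""
    COMMAND_DRIVEN = 0
    TRUE_CONSENSUS = 1
    LOCALLY_AUTONOMOUS_MASTER = 2

@dataclass
class DecisionMakingFramework:
    """Who decides for a goal, and whom the deciders may task."""
    goal: str
    decision_makers: set = field(default_factory=set)  # decision-making control
    authority_over: set = field(default_factory=set)   # assignable agents

    def autonomy_of(self, agent):
        """Qualitative autonomy of `agent` under this framework
        (a simplification: true consensus also assumes equal sharing)."""
        if agent not in self.decision_makers:
            return Autonomy.COMMAND_DRIVEN
        if len(self.decision_makers) > 1:
            return Autonomy.TRUE_CONSENSUS
        return Autonomy.LOCALLY_AUTONOMOUS_MASTER

# Re-forming the framework at run time moves agents along the spectrum:
dmf = DecisionMakingFramework("minimize interference", {"A1"}, {"A1", "A2"})
assert dmf.autonomy_of("A2") is Autonomy.COMMAND_DRIVEN
dmf.decision_makers = {"A1", "A2"}     # dynamic reorganization
assert dmf.autonomy_of("A2") is Autonomy.TRUE_CONSENSUS
```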
3.2 Related Work

This work relates to several different areas of agent-based research. Primarily, these areas include organizational adaptation through reconfiguration and re-structuring, as well as performance comparisons across different types of organizations. This section discusses each of these areas in detail.

3.2.1 Organizational Adaptation

An organization's structure defines the pattern of information, control, and communication relationships among agents, as well as the distribution of tasks, resources, and capabilities [12, 23, 24]. There are two primary classifications for organizational adaptation: (1) organizational reconfiguration, in which the structure of the organization remains the same but the identity of the participants in this structure may vary over time, and (2) organizational restructuring, in which the structure of the organization itself changes over time. Both types of reorganization attempt to adapt the agents' organization to their current situation. An agent's situation is defined by the current characteristics of its own state, its goals or tasks, and its environment — including other agents.

3.2.1.1 Organizational Reconfiguration

Agent interaction within a problem-solving environment can be modeled as the fulfillment of certain application-specific roles [12, 16, 23]. A role specifies application-specific tasks an agent takes on. Organizations can be dynamically "reconfigured" by allowing agents to dynamically assume one or more different pre-defined roles during system operation [23]. One good example of such a system comes from research on flexible teamwork using the STEAM approach [25]. The teamwork model in STEAM allows agents to monitor their progress toward achieving team-oriented goals. Agents take on roles with associated application-specific responsibilities (e.g. "scout" or "company commander" in a synthetic battlefield attack helicopter domain). The STEAM model allows agents participating in a simulation to monitor their teammates for the failure to perform tasks associated with their respective roles. If a critical role failure is detected (e.g. the company's scout helicopter crashes into a hill before it can survey its assigned area [25]), the team can be reconfigured by substituting another agent into the failed role.
The general teamwork model implemented by STEAM additionally supports a rich set of dynamic coordination capabilities (see below). Another example of organizational reconfiguration is employed by the RETSINA approach [9]. For example, with respect to information agents, many different agent instances may fulfill the role of 'information provider' for a given problem. The RETSINA agent approach uses 'middle agents' to help route information requests to deal with the failure and recovery of agents or communication links. The set of agents participating in a particular instance of problem-solving may vary as the system operates. Reconfiguration, substituting available agents into failed roles or allowing agents to take on roles in addition to their own, is a powerful adaptation mechanism for multi-agent systems facing agent failure and unreliable communication.

3.2.1.2 Organizational Restructuring

Reorganization can also affect the structure of the organization more deeply by adding or removing roles or by changing the allocation of tasks and resources among roles, the pattern of control relationships among agents, or the available communication channels. One basic form of organizational restructuring involves composition and decomposition of agents. This type of reorganization is employed by a technique called organizational self-design (OSD), based on strategic work-allocation and load-balancing [14]. As agents become overloaded, they decompose themselves into two agents and share the work. If two agents are idle too long, they compose themselves into one agent, which frees up system resources. Restructuring of this sort affects the system at a fundamental level, allowing it to function more efficiently overall. By changing the structure of the organization in which coordination occurs, reorganization mechanisms such as OSD and Dynamic Adaptive Autonomy (explored in this chapter) also indirectly change the way that agents in a system coordinate to achieve their goals. Other types of adaptation mechanisms operate directly on agents' coordination algorithms. Dynamic coordination mechanisms allow agents to dynamically change the way interleaving agent actions are scheduled or change which agent is responsible for what task. Under dynamic coordination, the overall domain-specific role of an agent, as well as its decision-making interactions with other agents, may remain constant, but some of its lower-level tasks or actions may change to fit the situation. For example, on a "trick play" in American football a running-back may attempt to throw a forward pass,
which is a task usually reserved for the role of quarterback. Representations such as production lattices [13] and partial global plans [10, 11] support the implementation of dynamic coordination mechanisms. An example of dynamic coordination can be seen in the anti-air defense domain, where agents must coordinate to destroy incoming warheads [20]. Although the agents' decision-making framework, as defined by this chapter, is fixed (each agent locally determines and implements its own behavior), the coordination among the agents (i.e. which incoming warhead each agent shoots down) is dynamic. Each agent uses payoff matrices to determine the most useful action for it to take given its model of the environment and other agents. Overall, this research shows that allowing agents to determine their own coordination behavior (including action and message selection) on-the-fly, rather than relying on fixed protocols, performs well in dynamic, unpredictable domains [19, 20]. In addition, Decker and Lesser have shown that dynamic coordination is more effective than static coordination for distributed sensor networks [8]. Because organizational restructuring affects coordination, organizational restructuring holds similar promise.

3.2.1.3 Relationship to Dynamic Adaptive Autonomy

The experiments presented here explore a form of organizational restructuring called Dynamic Adaptive Autonomy (DAA). Under DAA, the part of the organizational structure that specifies relationships of decision-making control is allowed to vary dynamically. Other pieces of the organizational structure (application-specific resources, task responsibilities, and capabilities of the agents) are held constant. DAA varies which agents are in control of making decisions and who is bound to carry out the decisions made. This, in turn, affects the coordination mechanisms used by the agents. DAA gives agents the capability to dynamically adapt their decision-making frameworks to their situation. The experiments presented in this chapter motivate the implementation of DAA by showing that agent performance does vary across situations given fixed decision-making frameworks.

3.2.2 Performance Comparisons Across Organization Types

Which decision-making framework is most appropriate for a given situation depends on how well the agents perform in that situation under that decision-making framework. Previous research has described both advantages and
disadvantages for statically defined centralized (master/command-driven), distributed (consensus, or locally autonomous with communication), and local-control (locally autonomous with no communication) problem-solving structures [6, 7, 17, 18, 23]. Given the tradeoffs among problem-solving frameworks across various statically defined conditions, it is possible that such tradeoffs exist with respect to dynamic run-time variables as well. Osawa has described experiments showing that, for at least one situational change in the pursuit game, a dynamic change of "organizational scheme" (similar to a decision-making framework) can be beneficial [22]. The preliminary experiments on DAA showed that variations in the performance of decision-making frameworks due to differences in dynamic run-time conditions are pervasive in complex, dynamic environments [3]. The experiments presented in this chapter extend these results and further support this hypothesis.
3.3 Multi-Agent Testbed Infrastructure

The experimental data presented in this chapter was gathered using the Sensible Agent Testbed [2]. The Sensible Agent Testbed provides a completely distributed infrastructure for the implementation and execution of multi-agent systems. The testbed primarily supports agents that conform to the Sensible Agent architecture [5]. However, any type of agent can be used with the testbed as long as it (1) conforms to the formal interface definitions specified for interacting with the environment and (2) can interpret messages from, and send interpretable messages to, the other agents in the system.
Figure 2. Structure of Sensible Agent Testbed
The Sensible Agent Testbed, whose structure is depicted in Figure 2, exploits the concepts, standards, and technologies of the Object Management Group's (OMG) Common Object Request Broker Architecture (CORBA®) [21]. The CORBA Interface Definition Language (IDL) provides a means to define the public interfaces of each runtime module in an implementation-language-independent manner. Figure 2 shows the connectivity among the different components of the Sensible Agent Testbed. Each Sensible Agent is composed of a set of internal module and sub-module instances that are linked to one another and to a Sensible Agent System Interface (SASI). The SASI provides a single handle for communication, naming, and information flow into and out of an agent. There is no direct access between an agent's internal modules and the world external to the agent. Therefore, each "agent" is abstractly modeled by the rest of the testbed as the public interface of the SASI [2]. The simulation environment functions independently of the internal architectures of the agents in the system. The agents must gather all information about their system and other agents from sensors registered with the environment. The SASI objects have no direct connection to one another. All inter-agent communication is routed through the environment simulator and communication sensors. This allows experimental control of communication capabilities. A primary concern for the implementation of the Sensible Agent Testbed is support for repeatable experimentation. Given the same initialization information, initial random seeds, and repeatable agent behavior, any testbed execution is repeatable [5]. The Sensible Agent Testbed supports various types of multi-agent experiments, including those presented here.
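The routing arrangement just described can be caricatured in a few lines of Python. This is a toy analogue for exposition, not the testbed's actual CORBA/IDL implementation: agents hold no direct references to one another, every message passes through the environment process, and a fixed random seed keeps a run repeatable.

```python
import random

class EnvironmentSimulator:
    """All inter-agent traffic is routed here, so communication can be
    cut off for an experiment by flipping a single flag."""
    def __init__(self, seed, comm_up=True):
        self.rng = random.Random(seed)   # fixed seed -> repeatable execution
        self.comm_up = comm_up
        self.agents = {}                 # name -> agent public interface

    def register(self, name, agent):
        self.agents[name] = agent

    def send(self, sender, receiver, message):
        # Delivery happens only while the simulated communication
        # capabilities are up; otherwise the message is silently lost.
        if self.comm_up and receiver in self.agents:
            self.agents[receiver].receive(sender, message)

class Agent:
    """Minimal agent whose only handle on the world is the environment,
    loosely mirroring the role of the SASI public interface."""
    def __init__(self, name, env):
        self.name, self.env = name, env
        env.register(name, self)
        self.inbox = []

    def receive(self, sender, message):
        self.inbox.append((sender, message))

env = EnvironmentSimulator(seed=42)
a1, a2 = Agent("A1", env), Agent("A2", env)
env.send("A1", "A2", {"proposal": [1300, 1325, 1350]})
env.comm_up = False                    # simulate total communication failure
env.send("A2", "A1", {"vote": "A1"})   # dropped, as in the Comm-DOWN runs
```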
3.4 Multi-agent Experiments

The experiments presented here are based on the following hypothesis: The most appropriate decision-making framework for agents in a multi-agent system varies with run-time changes in situation characteristics. For the purposes of these experiments, the "most appropriate" decision-making framework is defined as that which most often performs the best for a given performance measure. This chapter supports this hypothesis through a series of simulations in the problem domain of naval radar frequency management,
where quality and cost of solution are compared across decision-making frameworks and situations.
3.4.1 Problem Domain and Agent Behavior

These experiments were performed using the problem domain of naval radar frequency management. For the purposes of this discussion, a naval radar is a radar on board a military ship. A radar detects distant objects and determines their position and velocity by emitting very high-frequency radio waves and analyzing the returning signal reflected from the targets' surfaces. Each ship in this system carries one radar. There is one agent associated with each radar. Radar interference is any form of signal energy detected by a radar that comes from some source other than a reflection of its own emitted wave, but which is indistinguishable from actual return signals. Radar interference decreases the signal-to-noise ratio of a "victim" radar, thereby making it more difficult for this radar to detect targets. The goal for the problem of naval radar frequency management is to control the frequencies of all the radars in the system such that radar interference is minimized.

[Chart: regions of Interference and No Interference for a radar pair as a function of Frequency Difference (MHz); vertical axis 0 to 200.]
Figure 3: Distance-Frequency Relationship for Radar Pair.

Radar interference occurs primarily when two radars are operating in close proximity at similar frequencies. In real-world terms, radar interference itself is not a measurable quantity. However, it is possible to determine the probability that one radar interferes with another. If a specified probability of interference (~0.001) is taken to be acceptable, then a necessary frequency-distance relationship for a pair of radars can be determined. Figure 3 shows
an example of such a relationship for radars with typical characteristics. This type of relationship is used to generate straight-line approximations for interference models in the agent-based simulation described below. This simplified interference model allows a measurement of the "level of interference" experienced by a naval radar in this simulation. Agents in this application work together, without human intervention, to determine how to manage their frequencies to control interference. Under normal system operation, each agent has the following capabilities:

• Communication — the ability to send and receive messages and information to/from other agents.
• Sensing — the ability to sense environmental factors affecting radar interference as well as the position and frequency of other radars. Agents can also "sense" how much their radar is being interfered with, but cannot sense the source.
• Environmental modeling — the ability to maintain an internal, local model of the agent's world, separate from the simulation model of the world. For example, since the agents do not sense the interference experienced by other radars in the system, they must calculate local estimates.
• Problem solving — the ability to propose, instantiate, select, and allocate subtasks or subgoals designed to achieve a higher-level goal. For example, agents attempt to achieve the goal of "minimize radar interference" by selecting and assigning appropriate frequencies. Depending on the decision-making framework the agent is operating under, it will have control over the frequency of one or more radars.
• Action execution — the ability to perform application-specific actions such as adjusting the radar's frequency.

The agents' problem-solving behaviors are directed by their assigned decision-making frameworks. An agent's decision-making style can be classified as (1) locally autonomous, (2) master, (3) consensus, or (4) command-driven. Each classification is mapped to a particular decision-making algorithm as described below.

Locally Autonomous (LA): An agent attempting to resolve interference through frequency management in a locally autonomous fashion will make decisions alone to attempt to resolve the interference. The agent will use its internal model to select a frequency that is below its determined interference threshold. The positions and frequencies of other radars in the system are treated as static constraints. If the agent generates a frequency
assignment that it calculates will reduce its own interference as well as the interference in the entire system, it will adopt that frequency. In addition, the agent has a small probability of accepting a solution that will increase the interference in the overall system. This helps keep the system from settling out at a locally optimal solution that is not globally optimal. Locally autonomous agents do not communicate with other agents, either to share information or for collaborative problem solving.

Master/Command-driven (M/CD): Only the master makes decisions in a master/command-driven relationship. A command-driven agent simply waits for task allocations from its master. For frequency management, the master attempts to eliminate interference through iterative assignments. First, it chooses its own frequency by selecting the frequency that gives it the least possible interference. Ties for best frequency, which occur often, are broken at random. During this process, the master does not consider the frequencies of its command-driven radars as constraints. Then, using its own assigned frequency and the frequencies of those radars that are not command-driven as constraints, it determines an interference-free frequency for each command-driven agent. Once a frequency assignment is made for a command-driven agent, this information is added to the list of constraints, and the process continues until all assignments have been made. If the master agent determines that the solution it has reached does not improve the level of system interference, the frequency selection process is restarted. The frequency-selection algorithm is pseudo-random, so the master may find a good set of frequencies on the next attempt. Once the master discovers a set of frequencies that reduces system interference, it changes to its selected frequency and orders all command-driven agents to implement their respective frequencies. This frequency assignment requires communication from the master to the command-driven agents. For these experiments, agents in a master/command-driven relationship do not share information. No information is communicated from agents who are command-driven to the agent who is their master. The master's problem-solving success relies entirely on the accuracy of its own sensed information and world model.

Consensus (CN): Each agent involved in consensus interaction plays an equal part in determining frequency assignments. Each agent independently carries out the same frequency selection algorithm as described for the master/command-driven situation, treating the other consensus agents as if they were command-driven. However, at the conclusion of frequency selection, each agent proposes a list of frequency assignments to the rest of
the consensus group during a synchronization phase. Along with this proposal, each agent passes its own model of itself (position, frequency, and interference) to the other agents in the group. The consensus group members then evaluate each proposal using the updated information they have received and vote on the best solution. Each member then tallies the votes and adopts the winning solution. In case of ties, the vote goes to the solution proposed by the agent with the lowest identification number.

3.4.2 Experimental Design

A system of three agents was used for these experiments. One radar per ship was modeled, with one agent controlling each radar's frequency assignment. The available frequencies are limited to integer values in the range of 1300 MHz to 1350 MHz. The agents' problem-solving performance was measured across variations in decision-making framework and situation. In general, an agent's situation is defined by characteristics of its own state, its goal or task, and its environment — including other agents. For these experiments, salient situation characteristics include (1) radar locations, (2) status of communication capabilities, and (3) status of position-sensing capabilities. For each experimental run, the system radars were arranged geographically in an equilateral triangle around a point on a two-dimensional "ocean" surface.
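Before moving to the remaining design details, the frequency-selection procedure just described for the master/command-driven case can be sketched as follows. The control flow follows the text (the master picks its own frequency first, ignoring its command-driven radars; each command-driven radar is then fixed in turn, with every assignment made so far treated as a constraint; ties for best are broken at random), but the linear interference threshold and the pair-counting interference measure are stand-in assumptions rather than the experiments' actual model, and the restart-on-no-improvement loop and the consensus voting step are omitted.

```python
import random

FREQS = range(1300, 1351)   # integer frequencies (MHz), as in the experiments

def interferes(f1, f2, distance):
    """Assumed straight-line model: a radar pair interferes when its
    frequency difference (MHz) falls below a threshold that shrinks
    linearly with separation (cf. the Figure 3 relationship)."""
    return abs(f1 - f2) < max(0.0, 50.0 - 0.1 * distance)

def interference_level(freqs, positions):
    """Count interfering pairs under an assignment {agent: frequency}."""
    names = list(freqs)
    level = 0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = ((positions[a][0] - positions[b][0]) ** 2 +
                 (positions[a][1] - positions[b][1]) ** 2) ** 0.5
            if interferes(freqs[a], freqs[b], d):
                level += 1
    return level

def master_assign(master, commanded, fixed, positions, rng):
    """One round of master/command-driven frequency assignment."""
    constraints = dict(fixed)            # radars that are not command-driven
    def best_freq(agent):
        trial, scores = dict(constraints), {}
        for f in FREQS:
            trial[agent] = f
            scores[f] = interference_level(trial, positions)
        best = min(scores.values())      # ties for best broken at random
        return rng.choice([f for f, s in scores.items() if s == best])
    constraints[master] = best_freq(master)
    for agent in commanded:              # each assignment becomes a constraint
        constraints[agent] = best_freq(agent)
    return constraints

rng = random.Random(7)                   # fixed seed for repeatability
pos = {"M": (0, 10), "C1": (-9, -5), "C2": (9, -5), "LA": (0, 0)}
plan = master_assign("M", ["C1", "C2"], {"LA": 1325}, pos, rng)
```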
Figure 4. Geographical Configuration of Radars.
In addition, a non-agent entity was incorporated into the problem. This non-agent entity represents a radar whose frequency is not controlled by an agent in the system (it approximates a fishing boat or other source of non-system radar energy). This non-agent entity maintains a fixed position at the center of the system triangle and a fixed frequency over the problem-solving time. The radar strength associated with the non-agent entity is half the radar strength associated with agent-controlled radars. The geographical configuration used for these experiments is pictured in Figure 4, where r represents the distance of each ship from the center of the triangle. As r decreases, the problem becomes more difficult due to the frequency-distance requirement (see Figure 3).
[Chart: Problem Difficulty for 3 Agents plus Non-Agent Entity; PDzero and PDavint plotted against the Radius of Radars from Center of Equilateral Triangle; radii chosen for the experimental situations are marked with X.]
Figure 5. Problem Difficulty Estimates for Given Radii.

Thirty-six experimental situations were generated based on variations in the salient situation characteristics. Nine different possibilities for radar locations were chosen (r = 0, 5, 9, 10, 27, 36, 44, 49, or 58). These radii were chosen based on estimates of the difficulty of finding optimal solutions to the frequency assignment problem at the given locations. Two measures of problem difficulty were used: (1) PDzero, one minus the number of possible frequency assignments resulting in zero radar interference divided by the total number of possible frequency assignments, and (2) PDavint, the average amount of radar interference across all possible frequency assignments at a given radius, normalized by the average amount of radar interference across all possible frequency assignments at a radius of 0. The first measure,
PDzero, gives an estimate of the sparseness of the search space. The second measure, PDavint, shows that even though no solutions may be available (as for radii 0 through 9), lower interference levels can still be achieved for some radii (e.g. 9) than for others (e.g. 0). Figure 5 shows the results of this problem-difficulty analysis. The step-wise character of the PDzero line is a result of the restriction to integer frequency assignments. Each radius chosen for use in the experiments is marked with an 'X' on the chart. Radii were chosen at the endpoints: most difficult (r = 0) and least difficult (r = 58). Radii were also chosen around the discontinuity between the highest radius for which there are no possible frequency assignment solutions that meet the interference requirements (r = 9) and the lowest radius for which minimal solutions do exist (r = 10). The remaining radii were chosen at approximately equal intervals of problem difficulty in the intervening spaces.
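Both difficulty measures lend themselves to a brute-force computation over all 51³ integer frequency assignments. The sketch below does exactly that under an assumed interference model (for instance, the linear `interferes` function sketched earlier), so the numbers it yields are illustrative rather than the chapter's; in particular, the halved radar strength of the non-agent entity is ignored here.

```python
from itertools import product
from math import cos, pi, sin

FREQS = range(1300, 1351)

def positions(r):
    """Three agent radars on an equilateral triangle of radius r,
    plus the non-agent entity fixed at the center."""
    pts = [(r * cos(2 * pi * k / 3), r * sin(2 * pi * k / 3)) for k in range(3)]
    return pts + [(0.0, 0.0)]

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def total_interference(freqs, pts, model):
    """Sum the model's interference over all radar pairs."""
    return sum(model(freqs[i], freqs[j], dist(pts[i], pts[j]))
               for i in range(len(pts)) for j in range(i + 1, len(pts)))

def difficulty(r, entity_freq, model):
    pts = positions(r)
    ints = [total_interference(list(f) + [entity_freq], pts, model)
            for f in product(FREQS, repeat=3)]        # all 51**3 assignments
    pd_zero = 1.0 - sum(1 for v in ints if v == 0) / len(ints)
    return pd_zero, sum(ints) / len(ints)

def pd_measures(r, entity_freq, model):
    """PDzero and PDavint as defined in the text (assumes nonzero
    average interference at radius 0 for the normalization)."""
    pd_zero, avg_r = difficulty(r, entity_freq, model)
    _, avg_0 = difficulty(0.0, entity_freq, model)
    return pd_zero, avg_r / avg_0
```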
In addition to the variation of radar positions, other variations in experimental runs include the following. There are two different possible communication states for the system, fully connected communication (Comm UP) or completely failed communication (Comm DOWN). There are two different possible position-sensing states for the system, fully connected (PosSen UP) or completely failed (PosSen DOWN). When position sensing is not available, the agents interpret the other radars to be located very far away. For each of the possible situations, five different decision-making frameworks were tested. Given the four possible discrete autonomy assignments (M, CD, LA, CN), with three agents there are 4x4x4 = 64 possible decision-making frameworks, of which only 14 are valid. (A master must have at least one CD agent. Any CN agent must be partnered with at least one other CN agent. No two agents can be master for the same CD agent. A CD agent must have one master agent, etc.) Of these 14 valid decision-making frameworks, only five are unique if the agents are homogeneous (i.e. eliminating order considerations by treating M, CD, CD the same as CD, M, CD). Because the three agents in these experiments are homogeneous and are arranged geographically in a uniform pattern (an equilateral triangle), order considerations are, in fact, eliminated for these experiments. Therefore, only five possible decision-making frameworks exist to test. These include (1) All LA, in which all the agents act in a locally autonomous manner; (2) All M/CD, in which one master controls two command-driven agents; (3) All CN, in which all the agents collaborate in true consensus to solve the problem; (4) 2M/CD 1LA, in which two agents are in a master/command-driven relationship and one agent acts in a locally autonomous manner; and (5) 2CN 1LA, in which two agents are in consensus and one agent acts in a locally autonomous manner.
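The counting argument above is easy to check mechanically. The short script below, a sketch rather than the authors' code, enumerates all 4 x 4 x 4 label assignments and applies the stated constraints as they specialize to three agents; it reports 14 valid frameworks and 5 equivalence classes once homogeneous agents make order irrelevant.

```python
from itertools import product

def valid(assignment):
    """Constraints from the text, specialized to three agents: a master
    needs at least one command-driven agent of its own, a command-driven
    agent needs exactly one master, and a consensus agent needs at least
    one consensus partner."""
    m = assignment.count("M")
    cd = assignment.count("CD")
    cn = assignment.count("CN")
    if cn == 1:                  # a lone CN agent has no consensus partner
        return False
    if (m > 0) != (cd > 0):      # masters and command-driven come together
        return False
    return cd >= m               # no two masters may share one CD agent

frameworks = [a for a in product(["M", "CD", "LA", "CN"], repeat=3) if valid(a)]
print(len(frameworks))                               # -> 14
print(len({tuple(sorted(a)) for a in frameworks}))   # -> 5
```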
For each of the situation/decision-making framework combinations, five different sets of initial frequency assignments were each simulated with five different sets of random number seeds for the agents' frequency selection algorithms. Table 1 summarizes the simulations that comprise this experimental set.

Simulation Parameter                                    # of Values
Radar Position: radius varies from 0 to 58              9
Communication: full or no communication                 2
Position Sensing: full or no position sensing           2
# of Situations (9*2*2)                                 36
Decision-Making Framework (DMF)                         5
# of Situation/DMF Combinations (5*36)                  180
Initial Frequency Assignments                           5
# of Unique Runs per Assignment                         5
Total Starting Conditions per Situation (5*5)           25
Total Simulation Executions (180*25)                    4,500

Table 1: Experimental Setup

Each simulation execution was completed after a time-out limit of 150 time-steps was reached (15.0 hours in simulated time, approximately 2 minutes in real time). The performance measures used for these experiments include (1) time to solution, (2) average level of interference in the system over simulation time, (3) number of frequency changes undertaken, and (4) number of messages passed. These performance measures are explained in detail in the following section.

3.4.3 Experimental Results
The figures in this section present performance data plotted against the radius of the ships from the center of their geographical configuration. Problem difficulty decreases from left to right on these charts. Each figure consists of four charts across which the status of communication and position-sensing capabilities for the ships varies. Each point represents one
experimental situation given a fixed decision-making framework. Therefore, each point represents the average performance across 25 simulation executions.
[Figure 6 charts: ASI for Comm-UP PosSen-UP; ASI for Comm-UP PosSen-DOWN; ASI for Comm-DOWN PosSen-UP; ASI for Comm-DOWN PosSen-DOWN. x-axis: Radar Distance from Center. Legend: All LA, All M/CD, All CN, 2M/CD 1LA, 2CN 1LA.]
Figure 6. Average System Interference (ASI). These four graphs depict the amount of interference experienced by system agents averaged over the entire simulation execution time. Low values indicate that agents were able to minimize interference successfully. Note that all decision-making frameworks perform similarly when communication and position-sensing are both available (ASI for Comm-UP PosSen-UP). When communication is down, the All CN decision-making framework performs poorly. In these cases, the consensus agents cannot communicate to agree on any course of action. Therefore no actions are taken at all to reduce the radar interference.
[Figure 7 charts: ATTS for Comm-UP PosSen-UP; ATTS for Comm-UP PosSen-DOWN; ATTS for Comm-DOWN PosSen-UP; ATTS for Comm-DOWN PosSen-DOWN. x-axis: Radar Distance from Center. Legend: All LA, All M/CD, All CN, 2M/CD 1LA, 2CN 1LA.]
Figure 7. Average Time to Solution (ATTS). These four graphs depict the amount of simulation time the agents took to reach a completely interference-free state for all system radars. The simulation was terminated after 150 time steps (15.0 hours in the simulated world). For the most difficult problems (small radii), the agents were not able to reach a no-interference solution at all. However, in these cases the agents may have been more or less likely to minimize the interference depending on their decision-making framework (see the ASI graphs).
[Figure 8 charts: ANFC for each of the four Comm/PosSen combinations. x-axis: Radar Distance from Center.]
Figure 8. Average Number of Frequency Changes (ANFC). These four graphs illustrate the total number of frequency changes that were initiated by agents in the system. This corresponds to the number of solutions that were attempted before the problem was actually solved and is indicative of the efficiency of the decision-making framework. Note that the agents often implement more frequency changes when operating under the All LA framework than they do when operating under other frameworks.
[Figure 9 charts: ANMP for Comm-UP PosSen-UP; ANMP for Comm-UP PosSen-DOWN; ANMP for Comm-DOWN PosSen-UP; ANMP for Comm-DOWN PosSen-DOWN. x-axis: Radar Distance from Center. Legend: All LA, All M/CD, All CN, 2M/CD 1LA, 2CN 1LA.]
Figure 9. Average Number of Messages Passed (ANMP). These graphs show the number of messages the agents pass during problem solving. As expected, consensus decision-making frameworks involve more message-passing than do other frameworks. When communication is down, no messages are passed (although the agents may have attempted to send messages, the messages would not have been received).

3.4.4 Evaluation of Relative Performance
Based on the starting condition (i.e. initial frequency assignments), performance can vary widely in this experimental domain for any one situation. Although averaging the data across multiple runs gives some indication of the usefulness of various decision-making frameworks, such averages can often be misleading. It is desirable to perform some additional data analysis to determine if the differences seen are significant. Generally, placing confidence bands around graphs of experimental data allows one to determine whether or not the differences seen are statistically significant. However, confidence bands can also be misleading for highly variable data sets such as this one. For these data sets, the absolute performance for any single decision-making framework across initial conditions can vary more than the relative performance differences across decision-making
frameworks within a single initial condition, for the same experimental situation. In these cases, confidence bands can actually mask useful differences as shown in Figure 10. In order to determine whether the differences seen in the graphs above are significant, this chapter presents an analysis based on an index of usefulness (IU) that gives an indication of how often a decision-making framework outperforms other decision-making frameworks. High values and wide separations for the IU measure indicate that a particular decision-making framework often outperforms others. Combined with the average performance data given above, the IU measure indicates how typical a given average performance difference is. The IU analysis is based on a relative ranking of the performance of each decision-making framework for a given experimental run. This relative ranking system ignores the absolute magnitude of the performance measures and retains only the information about which framework performs best (second best, and so forth) for a given experimental run. The ranks can then be averaged across runs within the same situation to yield an indication of which framework performs best most often. In this way, wide variations in performance that arise, in general, due to differences in initial starting conditions are eliminated from the analysis. However, any relative variations, with respect to which framework performs best, are retained. If no one framework performs best more often than any other, the IU measure gives each framework an equal rating. Random performance differences should also be discounted using this method because, on average, the value of one random variable is no higher or lower than the value of another from the same set. To provide further clarification, Figure 10 and Figure 11 describe some simple examples of the IU concept. Figure 10 shows a simple example describing the concept of index of usefulness. Some performance values are recorded for candidates A and B (e.g. decision-making frameworks) in trials 1 through 6. The average of these performance values shows that candidate B performs better, on average, than does candidate A. However, due to the high degree of variability in the data set, across trials, the confidence intervals placed on the data averages seem to indicate that the average performance difference between candidate A and candidate B is not significant. On the contrary, candidate B outperforms candidate A in every single trial. An index of usefulness analysis (IU) highlights this fact. The IU assigned to A and B for each trial is annotated in the graph above each column. The average IU for candidate B across trials is 1, whereas the average IU for candidate A across
trials is 0. This analysis shows that B is useful much more often than A. The large separation in the IU values indicates that the average performance difference seen is typical.

[Bar chart: performance for candidates A and B for trials 1 through 6, with average performance and standard deviations; index-of-usefulness annotations included on individual trials.]
Figure 10. A Simple Example of the Index of Usefulness Calculation.
[Bar chart: performance for candidates C and D for trials 1 through 6, with average performance and standard deviations; index-of-usefulness annotations included on individual trials.]
Figure 11. A Simple Example of the Index of Usefulness Calculation (Reversed).
Figure 11 shows another example of the IU concept. Similar to the case shown in Figure 10, candidate D outperforms candidate C on average, but the confidence intervals indicate that this average performance difference may not be significant. However, in contrast to the previous case, in this example neither framework performs best more often than the other. An IU
analysis highlights this characteristic. The average IU for candidate D is .5, and the average IU for candidate C is also .5. By this measure, candidate D is actually no more or less useful than candidate C. (Note: the IU analysis considers the ranks of candidates (i.e. better than, worse than) without considering the magnitude of the relative differences among the candidates. In some cases, particularly in risk-averse domains, considering this relative magnitude difference could be very helpful. Future work will examine this tradeoff.) The similarity of the IU measures for both candidates indicates that the average performance difference seen is not typical.

This section provides an IU analysis across decision-making frameworks based on the experimental data reported in the previous section. This analysis is based on a ranking of the relative performance of each decision-making framework in each starting condition (there are 25 starting conditions for each situation). Because there are more than two candidate decision-making frameworks, the assignment of ranks is a two-phase process. First, each decision-making framework is assigned a unique rank of 0, 1, 2, 3, or 4 for a given starting condition. During this first phase, ranks are assigned on a first-come, first-served basis. Equal performers receive consecutive ranks. In the second phase, decision-making frameworks with equal performance are re-assigned equal ranks. The actual values of these second-phase rankings are determined by the average of the ranks the equally performing decision-making frameworks received during the first phase. Both phases are required to ensure (1) that the sum of the ranks remains constant (it equals 10 for possible ranks of 0, 1, 2, 3, and 4) and (2) that equal performers get equal ranks. With constant-sum ranks, if the two worst performers performed the same, the final ranks would be .5, .5, 2, 3, and 4. If all perform equally, the final ranks are 2, 2, 2, 2, and 2. If the ranks were not constrained to a constant sum, ranks could be skewed higher or lower than they should be, and comparisons across different sets of ranks would become difficult. An example rank assignment from the experimental data should help clarify this process. For a given situation and starting condition, there exists one set of performance measures for each decision-making framework. Table 2 shows the values for the performance measure "Number of Frequency Changes" in the five experimental runs where communication is DOWN, position sensing is UP, the radius is 0.0, the initial frequencies are Agent 1 = 1300.0, Agent 2 = 1325.0, Agent 3 = 1350.0, and the planning seeds are Agent 1 = 4975988339999789512, Agent 2 = 5231336255541465468,
Agent 3 = 4661803532763251635, respectively. The IU ranks for each phase are also given in this table.

Framework    # of Frequency Changes    Phase-1 IU Rank    Phase-2 IU Rank    Normalized IU
All LA       1                         2                  1.5                .15
All M/CD     39 (worst)                0                  0                  .00
All CN       0 (best)                  4                  3.5                .35
2M/CD 1LA    1                         1                  1.5                .15
2CN 1LA      0 (best)                  3                  3.5                .35

Table 2. Example Two-Phase IU Calculation for Five Candidate Frameworks
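The two-phase procedure can be written down compactly; the sketch below (our naming, not the authors') reproduces the Table 2 example exactly.

```python
def iu_ranks(scores, lower_is_better=True):
    """Two-phase index-of-usefulness ranking for one experimental run.
    `scores` maps framework -> performance value; returns framework ->
    normalized IU, where higher means the framework performed better."""
    n = len(scores)
    # Phase 1: unique ranks 0..n-1, the best performer getting the highest
    # rank, first-come first-served among equal performers.
    order = sorted(scores, key=lambda f: scores[f], reverse=lower_is_better)
    phase1 = {f: rank for rank, f in enumerate(order)}
    # Phase 2: equal performers share the average of their phase-1 ranks,
    # which keeps the rank sum constant (0 + 1 + ... + n-1).
    phase2 = {}
    for f in scores:
        tied = [g for g in scores if scores[g] == scores[f]]
        phase2[f] = sum(phase1[g] for g in tied) / len(tied)
    total = n * (n - 1) / 2          # 10 when there are five frameworks
    return {f: phase2[f] / total for f in scores}

# The Table 2 example (number of frequency changes; lower is better):
runs = {"All LA": 1, "All M/CD": 39, "All CN": 0, "2M/CD 1LA": 1, "2CN 1LA": 0}
print(iu_ranks(runs))
# {'All LA': 0.15, 'All M/CD': 0.0, 'All CN': 0.35,
#  '2M/CD 1LA': 0.15, '2CN 1LA': 0.35}
```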
Once the second-phase assignment is complete, the IU is finally calculated by normalizing the phase-2 ranks by the sum of all ranks (10, in this case). The resulting normalized rankings are then averaged across the 25 initial starting conditions for a given situation. Each initial starting condition varies by initial frequency assignments and random planning seeds. The resulting IU averages are plotted in the following charts. High values and wide separations indicate a decision-making framework that often outperforms others in the given situation. Similar values indicate that one decision-making framework performs better no more often than the other frameworks.
[Figure 12 charts: IU ASI for Comm-UP PosSen-UP; IU ASI for Comm-UP PosSen-DOWN; IU ASI for Comm-DOWN PosSen-UP; IU ASI for Comm-DOWN PosSen-DOWN. x-axis: Radar Distance from Center. Legend: All LA, All M/CD, All CN, 2M/CD 1LA, 2CN 1LA.]
Figure 12. Index of Usefulness (IU) for Average System Interference (ASI). These charts show the Index of Usefulness with respect to Average System Interference. Note that the All CN framework often out-performs the other frameworks when communication is available but position sensing is down.
[Figure 13 charts: IU ATTS for the four Comm/PosSen combinations. x-axis: Radar Distance from Center. Legend: All LA, All M/CD, All CN, 2M/CD 1LA, 2CN 1LA.]
Figure 13. Index of Usefulness (IU) for Average Time to Solution (ATTS). These charts show the Index of Usefulness with respect to solution speed. Note that in the cases where no solution was reached under any decision-making framework (radar distance = 0, 5, or 9), no decision-making framework is reported to perform better than any other.
[Figure 14 charts: IU ANFC for the four Comm/PosSen combinations. x-axis: Radar Distance from Center. Legend: All LA, All M/CD, All CN, 2M/CD 1LA, 2CN 1LA.]
Figure 14. Index of Usefulness (IU) for Average Number of Frequency Changes (ANFC). These charts show the Index of Usefulness with respect to the number of frequency changes attempted by the agents during problem-solving. Although the All CN framework performs best in the cases where communication is down, this result is an artifact of the inability of consensus agents to take action without first communicating. No frequency changes were made at all for these cases.
[Figure 15 charts: IU ANMP for the four Comm/PosSen combinations. x-axis: Radar Distance from Center. Legend: All LA, All M/CD, All CN, 2M/CD 1LA, 2CN 1LA.]
Figure 15. Index of Usefulness (IU) for Average Number of Messages Passed (ANMP). These charts show the Index of Usefulness with respect to the number of messages passed by the agents during problem solving. Since the All LA decision-making framework passes no messages, it is the clear winner for cases where communication is available.
3.5 Discussion

The results presented in the preceding sections support the hypothesis that the most effective decision-making framework varies with changes in run-time conditions. Table 3 lists the decision-making framework that performs best most often in each situation for each of the four performance measures discussed above. For example, for the situation in which communication is up, position sensing is down, and the radars are all located at the same position (radius from center is 0.0), the decision-making framework that performs the best most often with respect to average system interference is all consensus (All CN). A table entry of '—' indicates that the best decision-making framework is too close to call. This may be because all the decision-making frameworks perform the same (as in the case when
communication is down and the performance measurement is with respect to the number of messages passed). However, the '—' symbol also indicates cases where no clear winner exists, but there may be obvious under-performers. One example of this type of situation occurs for communication down and position sensing up, at radii of 36.0, 44.0, and 49.0. Table 3 indicates that there is no clear winner for average system interference, but an examination of the IU ASI chart for these situations indicates that the All LA and 2M/CD 1LA frameworks tie for best performance, while the other three frameworks perform much worse. Tradeoffs exist among the reported performance measures. For example, the all-consensus decision-making framework (All CN) always has the lowest number of attempted frequency changes when communication is down. Although this measure is a good reflection of problem-solving cost, it is not a good indication of overall solution quality. Without communication, it is impossible to form even one consensus plan of action. Thus, no actions are taken at all, resulting in a very low implementation cost but very poor solution achievement. In addition, all locally autonomous operation (All LA) often produces the fastest solution (ATTS) when communication is down. However, the costs of this decision-making framework are higher (a higher level of interference during problem solving than 2M/CD 1LA in many cases). In practice, some weighted combination of these performance measures would be used to determine the best decision-making framework overall. Tradeoffs among the different measures serve to increase the motivation for adaptation of decision-making frameworks, because the most highly-weighted performance measure at any given time can be an important run-time condition.
Radius r; problem difficulty decreases as r increases. (A '?' entry could not be recovered.)

Communication UP, Position Sensing UP
r       ASI          ATTS         ANFC         ANMP
0.0     2M/CD 1LA    —            2M/CD 1LA    All LA
5.0     2M/CD 1LA    —            —            All LA
9.0     All M/CD     —            All CN       All LA
10.0    All M/CD     All M/CD     —            All LA
27.0    2M/CD 1LA    All M/CD     2CN 1LA      All LA
36.0    2M/CD 1LA    All M/CD     —            All LA
44.0    2M/CD 1LA    2M/CD 1LA    All LA       All LA
49.0    2M/CD 1LA    2M/CD 1LA    All LA       All LA
58.0    —            —            —            —

Communication UP, Position Sensing DOWN
r       ASI       ATTS         ANFC        ANMP
0.0     All CN    All CN       All M/CD    All LA
5.0     All CN    All CN       All M/CD    All LA
9.0     All CN    2M/CD 1LA    All M/CD    All LA
10.0    ?         2M/CD 1LA    All M/CD    All LA
27.0    ?         All CN       All M/CD    All LA
36.0    ?         All CN       All M/CD    All LA
44.0    ?         2M/CD 1LA    All M/CD    All LA
49.0    ?         2M/CD 1LA    All M/CD    All LA
58.0    ?         ?            ?           ?

Communication DOWN, Position Sensing UP
r       ASI       ATTS      ANFC      ANMP
0.0     All LA    All LA    All CN    —
5.0     All LA    All LA    All CN    —
9.0     All LA    All LA    All CN    —
10.0    All LA    All LA    All CN    —
27.0    All LA    All LA    All CN    —
36.0    —         ?         All CN    —
44.0    —         ?         All CN    —
49.0    —         ?         All CN    —
58.0    —         ?         —         —

Communication DOWN, Position Sensing DOWN
r       ASI          ATTS      ANFC      ANMP
0.0     2M/CD 1LA    —         All CN    —
5.0     2M/CD 1LA    —         All CN    —
9.0     2M/CD 1LA    —         All CN    —
10.0    2M/CD 1LA    All LA    All CN    —
27.0    All LA       All LA    All CN    —
36.0    —            All LA    All CN    —
44.0    —            All LA    All CN    —
49.0    All LA       All LA    All CN    —
58.0    —            —         —         —

Table 3. Most Appropriate Decision-Making Framework for Each Possible Situation
These experiments empirically identify the most appropriate decision-making framework for each of the situations presented in this domain under the implemented planning algorithms and coordination mechanisms. Changes in the frequency selection algorithms or interaction styles corresponding to each decision-making framework will likely result in changes in the relative performance of these decision-making frameworks in particular situations. Because many of the algorithms controlling the agents' behavior were revised for this set of experiments, these performance results should not be directly compared to those reported from the authors' previous experiments [3]. However, both sets of experiments show that for a given set of planning and coordination algorithms, no one decision-making framework performs best across all situations. The different simulated situations represent variations that could occur at run-time. During an unconstrained system run, each of these factors may change as ships move, as communication or position sensing fails and is re-established, and as Dynamic Adaptive Autonomy is exercised. The results clearly indicate that no matter which performance measure is considered most important, there is no one decision-making framework that performs best across all run-time situations. In fact, performance under a particular decision-making framework depends highly on the situation. Generalizations about the appropriateness of a particular decision-making framework across different types of situations cannot be supported by the data. These results show that the best decision-making framework for a system of agents cannot always be determined statically at design time. Agents should be given the flexibility to adapt their decision-making frameworks to achieve maximum performance in each run-time situation they face.
3.6 Conclusions

Previous work noting the effectiveness of centralized and distributed decision making across various problem contexts suggests differences that can be exploited by adaptive multi-agent systems (see the Related Work section). The experiments presented in this chapter show that the "best" type of decision-making organization for a group of agents depends not only on the application and the pre-defined characteristics of the system, but also on factors that can change during system operation.
These experiments investigate the performance of a multi-agent system under 36 different situations that the agents may encounter during run-time. For each of these situations, five different decision-making frameworks are compared. Four different performance measures assess the quality of solution as well as solution cost. No one decision-making framework is found to work best for all situations. In fact, these experiments show that which decision-making framework works best varies greatly across situations. Although the situations considered in these experiments are static, changing circumstances would move the agents through several of these situations during an unconstrained system run. The differences among the situations are differences that would be seen at run-time. Therefore, adaptation to each situation must occur at run-time. These experiments provide a clear motivation for the implementation of adaptive decision-making frameworks in multi-agent systems. These experiments indicate that overall system performance could be improved if agents were able to reason about and modify their decision-making frameworks at run-time. Future work will explore both (1) the reasoning process agents should use to select the most appropriate decision-making framework for a given run-time situation, and (2) the mechanism through which agents can agree to form, dissolve, or modify decision-making frameworks during system operation.
Acknowledgements

This research was supported in part by the Texas Higher Education Coordinating Board (#003658452) and a National Science Foundation Graduate Research Fellowship. The authors would also like to thank the reviewers, Bengt Carlsson and Sam Joseph, for their insights and revision suggestions concerning this chapter.
Bibliography

1. K. S. Barber, "Adaptive Autonomy: The Key to Dynamic, Responsive Formation of Sensible Agent Organizations," In Proceedings of the Conference on Intelligent Systems and Semiotics, Gaithersburg, MD, 1997, pp. 4146-4151.
2. K. S. Barber, A. Goel, D. Han, J. Kim, T. H. Liu, C. E. Martin, and R. M. McKay, "Simulation Testbed for Sensible Agent-based Systems in Dynamic and Uncertain Environments," Transactions of the Society for Computer Simulation International, Special Issue on Modeling and Simulation in Manufacturing, 16(4) (1999) pp. 186-203.
3. K. S. Barber, A. Goel, and C. E. Martin, "The Motivation for Dynamic Adaptive Autonomy in Agent-based Systems," in Intelligent Agent Technology: Systems, Methodologies, and Tools. Proceedings of the 1st Asia-Pacific Conference on IAT, Hong Kong, December 14-17, 1999, eds. J. Liu and N. Zhong, World Scientific, Singapore, 1999, pp. 131-140.
4. K. S. Barber and C. E. Martin, "Agent Autonomy: Specification, Measurement, and Dynamic Adjustment," In Proceedings of the Autonomy Control Software Workshop at Autonomous Agents (Agents'99), Seattle, WA, 1999, pp. 8-15.
5. K. S. Barber, R. McKay, A. Goel, D. Han, J. Kim, T. H. Liu, and C. E. Martin, "Sensible Agents: The Distributed Architecture and Testbed," IEICE Transactions on Communications, IEICE/IEEE Joint Special Issue on Autonomous Decentralized Systems, E83-B(5) (2000) pp. 951-960.
6. W. Briggs and D. Cook, "Flexible Social Laws," In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, Montreal, Quebec, Canada, 1995, pp. 688-693.
7. R. L. Daft and D. Marcic, Understanding Management, Second ed. Fort Worth, TX: The Dryden Press, 1998.
8. K. Decker and V. Lesser, "A One-shot Dynamic Coordination Algorithm for Distributed Sensor Networks," In Proceedings of the Eleventh National Conference on Artificial Intelligence, Washington, DC, 1993, pp. 210-216.
9. K. S. Decker and K. P. Sycara, "Intelligent Adaptive Information Agents," Journal of Intelligent Information Systems, 9(3) (1997) pp. 239-260.
10. E. H. Durfee, "Planning in Distributed Artificial Intelligence," in Foundations of Distributed Artificial Intelligence, Sixth-Generation Computer Technology Series, eds. G. M. P. O'Hare and N. R. Jennings, John Wiley & Sons, Inc., New York, 1996, pp. 231-245.
11. E. H. Durfee and V. R. Lesser, "Using Partial Global Plans to Coordinate Distributed Problem Solvers," In Proceedings of the Tenth International Joint Conference on Artificial Intelligence, Milan, Italy, 1987, pp. 875-883.
12. M. S. Fox, M. Barbuceanu, M. Gruninger, and J. Lin, "An Organizational Ontology for Enterprise Modeling," in Simulating Organizations, eds. M. J. Prietula, K. M. Carley, and L. Gasser, AAAI Press / The MIT Press, Menlo Park, CA, 1998, pp. 131-152.
13. L. Gasser, N. F. Rouquette, R. W. Hill, and J. Lieb, "Representing and Using Organizational Knowledge in DAI Systems," in Distributed Artificial Intelligence, vol. 2, eds. L. Gasser and M. N. Huhns, Pitman/Morgan Kaufmann, London, 1989, pp. 55-78.
14. T. Ishida, L. Gasser, and M. Yokoo, "Organization Self-Design of Distributed Production Systems," IEEE Transactions on Knowledge and Data Engineering, 4(2) (1992) pp. 123-134.
15. P. R. Lawrence and J. W. Lorsch, Organization and Environment: Managing Differentiation and Integration. Boston: Harvard Business School Press, 1967.
16. Z. Lin, "The Choice Between Accuracy and Errors: A Contingency Analysis of External Conditions and Organizational Decision Making Performance," in Simulating Organizations, eds. M. J. Prietula, K. M. Carley, and L. Gasser, AAAI Press / The MIT Press, Menlo Park, CA, 1998, pp. 67-88.
17. P. Mertens, J. Falk, and S. Spieck, "Comparisons of Agent Approaches with Centralized Alternatives Based on Logistical Scenarios," Information Systems, 19(8) (1994) pp. 699-709.
18. B. Moulin and B. Chaib-draa, "An Overview of Distributed Artificial Intelligence," in Foundations of Distributed Artificial Intelligence, Sixth-Generation Computer Technology Series, eds. G. M. P. O'Hare and N. R. Jennings, John Wiley & Sons, Inc., New York, 1996, pp. 3-55.
19. S. Noh and P. Gmytrasiewicz, "Implementation and Evaluation of Rational Communicative Behavior in Coordinated Defense," In Proceedings of the Third International Conference on Autonomous Agents, Seattle, WA, 1999, pp. 123-130.
20. S. Noh and P. Gmytrasiewicz, "Towards Flexible Multi-Agent Decision-Making Under Time Pressure," In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 1999, pp. 492-498.
21. OMG, OMG Home Page, Object Management Group, 1999.
22. E.-I. Osawa, "A Metalevel Coordination Strategy for Reactive Cooperative Planning," In Proceedings of the First International Conference on Multi-Agent Systems, San Francisco, CA, 1995, pp. 297-303.
23. Y.-p. So and E. H. Durfee, "Designing Organizations for Computational Agents," in Simulating Organizations, eds. M. J. Prietula, K. M. Carley, and L. Gasser, AAAI Press / The MIT Press, Menlo Park, CA, 1998, pp. 47-66.
24. K. P. Sycara, "Multiagent Systems," AI Magazine, 19(2) (1998) pp. 79-92.
25. M. Tambe, "Towards Flexible Teamwork," Journal of Artificial Intelligence Research, 7 (1997) pp. 83-124.
Chapter 4
Dynamically Organizing KDD Processes in a Multi-Agent KDD System
Ning Zhong¹, Chunnian Liu², Setsuo Ohsuga³
¹ Maebashi Institute of Technology, Japan
² Beijing Polytechnic University, China
³ Waseda University, Japan
4.1 Introduction
KDD (Knowledge Discovery and Data Mining) means discovering new, useful knowledge from the vast amounts of data accumulated in an organization's databases. KDD is essentially a demand-driven field. Although early work in KDD inevitably concentrated on individual mining techniques, what is really important is KDD systems that combine various KDD techniques and apply them successfully to real-world databases. KDD systems have rapidly evolved. While the first generation of KDD systems consisted of stand-alone mining applications over files, the second generation has been integrated with data management, and the third (current) generation is characterized by distribution of data and computation over enterprises' intranets or across the global Internet. We will call this kind of KDD system a multi-agent KDD system. In recent years, along with developing new KDD techniques, we have paid increasing attention to the process and architecture aspects of KDD systems. We observe that how to increase both autonomy and versatility of a knowledge discovery system is a core problem and a crucial aspect of
KDD research. Zytkow described a way of increasing cognitive autonomy in machine discovery by implementing new components of the discovery process [37]; namely, greater autonomy means more discovery steps performed in succession without external intervention, and external intervention can be replaced by automated search, reasoning, and the use of background knowledge bases. On the other hand, it has recently been recognized in the KDD community that the KDD process for real-world applications is extremely complicated [1; 5; 26; 29; 30; 31]. There are several levels, phases, and a large number of steps and alternative KDD techniques in the process; iteration can occur anywhere and at any time, and the process may repeat at different intervals when new/updated data comes. However, no one has begun to describe
• How to plan, organize, control, and manage the KDD process dynamically for different KDD tasks;
• How to get the system to know what it knows and impart that knowledge to decide which tools are appropriate for which problems and when.
Solving such issues requires developing meta levels of the KDD process by modeling the process. We argue that modeling of the KDD process constitutes an important and new research area of KDD, including formal specification of the process, its planning, scheduling, controlling, management, evolution, and reuse. The key issue is how to increase both autonomy and versatility of a KDD system. Our methodology is to create an organized society of KDD agents. This means
• To develop many kinds of KDD agents for different tasks;
• To use the KDD agents in multiple learning phases in a distributed cooperative mode;
• To manage the society of KDD agents by multiple meta-control levels.
That is, the society of KDD agents is made of many smaller components called agents. Each agent by itself can only do some simple thing, yet when we join these agents in an organized society, we can implement more complex KDD tasks. Based on this methodology, we have also designed a multi-strategy and multi-agent KDD system called GLS (Global Learning Scheme). In this chapter, we describe a way of increasing both autonomy and versatility of the GLS system by applying several AI planning techniques
that are implemented as a meta-agent to dynamically organize the KDD process [30; 31]. To be able to apply AI planning techniques, each KDD agent should be regarded as an operator and formally described. We introduce an ontology of KDD agents for this purpose in the style of OOER (the Object-Oriented Entity Relationship data model). For each type of KDD agent, the types of its input/output, the precondition and effect of its execution, and its functionality are explicitly specified in the data model. The most difficult problem in a multi-strategy and multi-agent KDD system is how to choose appropriate KDD techniques to achieve a particular discovery goal in a particular domain. In our method, the combination of the ontology of KDD agents and the planning mechanism gives an automatic solution to this problem (to some extent, at least). In such a KDD system, both autonomy and versatility are increased. The basic planning mechanism is a core domain-independent non-linear planner [9; 10] plus a KDD domain-specific layer. The two meta-agents (planner and controller) cooperate to decompose the overall KDD process into a network of KDD agents in a hierarchical manner. That is, high-level agents are gradually decomposed into networks of sub-agents (sub-plans). To facilitate this, the type of a high-level agent has a specification listing the types of its candidate sub-agents (whereas a low-level agent just has the associated KDD algorithms to carry out its task). Given a (sub)goal (to build a sub-plan to achieve the effect of a high-level KDD agent), the planner reasons on the candidate sub-agent types to choose the appropriate ones and build the (sub)plan which, when executed, will achieve the (sub)goal. In a KDD process, the data, the knowledge, and the process itself are all evolving. For instance, knowledge refinement on data change is an important component of the KDD process. To support this evolution, we use techniques such as incremental replanning and the integration of planning and execution, which have been successfully applied to the software development process [34; 11]. The chapter is organized as follows: Section 2 summarizes the requirements needed to support multi-agent KDD systems; Section 3 describes the architecture of the GLS system, which is able to dynamically organize and manage the KDD process and KDD agents; Section 4 describes how to plan and organize the KDD process and how to manage the KDD agents; and Section 5 describes how to handle iteration and changes of the KDD process.
Finally, Section 6 gives the conclusions and our future work.
4.2 Requirements for Multi-Agent Based KDD Architecture
Multi-agent based KDD faces unique challenges and needs architectural support to cope with them. The requirements for architectural support for multi-agent based KDD can be summarized as follows.
• Multiple roles: Unlike simple, stand-alone, prototype KDD work, a real-world KDD process involves multiple human roles. We can identify at least three types: the analysts (for KDD task planning and result analysis), the knowledge engineers (executing the mining tasks), and the end-users (people managing and optimizing the business process within which the KDD process occurs). Multiple people may access the data and the analytical results (the models), so the KDD system must provide multiple access points.
• Mining on data of huge size: Gigabytes or even terabytes of data have been accumulated in large organizations. Mining on data of such scale has the following implications for the KDD architecture:
— We need large computational power (high-performance servers) for mining tasks, and visualization tools for data analysis and model analysis.
— The mining operation should run close to the databases, because it is not practical to move vast data between the sites of individual analysts. This requirement can be supported either by mobile mining components traveling to the database sites and executing there, or by setting up high-performance servers close to the databases.
— The user should be allowed to browse and sample data while planning and editing his/her mining tasks.
• Mining on diverse and distributed data sources: Various types of data are accumulated on many sites in a large organization, and a user may need access to multiple datasets. So the KDD system must support distributed mining and combining partial results into
a meaningful total.
• KDD process planning: There are several stages in the KDD process (the three major stages are pre-processing, knowledge elicitation, and knowledge refinement). For each stage, there is a large number of available KDD techniques and algorithms. Some of them may soon be out-of-date while new ones arrive continuously. So, good combination of KDD techniques and easy integration of new techniques are very desirable, and this demands careful planning of the KDD tasks. Note that different kinds of data resources need different KDD techniques, so the planning involves browsing and sampling data.
• Interactions among KDD roles: Because the KDD process iterates through the cycle of data selection, pre-processing, model building, and model analysis and refinement, a high degree of interaction among analysts, knowledge engineers, and end-users is needed.
• Flexibility: A wide range of configuration options is needed to fulfill the different needs of large organizations, so that applications can be scaled from a few client workstations to high-performance server machines.
• Open-endedness for future extension.
Conceptual and architectural simplicity is important in designing such a complex system to ensure and enhance its correctness, flexibility, and openness.
On the implementation level, the rapid development of the Internet and related technologies, such as software component technology and various Java/CORBA packages, does provide solutions for multi-agent based KDD.
4.3 The GLS System
This section describes the architecture of the GLS system, which is a multi-strategy and multi-agent KDD system able to dynamically organize and manage the KDD process and KDD agents to increase both autonomy and versatility.
Fig. 4.1 The architecture of the GLS system (planning to dynamically organize the discovery process; dynamically generating KDD agents and managing their distributed cooperative use, with communication/negotiation between agents; and the pre-processing, knowledge-elicitation, and refinement phases of multi-strategy learning in multiple learning phases)

4.3.1 An Architecture of KDD Process
KDD process is a multi-step process centered on data mining algorithms to identify what is deemed knowledge from databases. In [29], we modeled the KDD process as an organized society of autonomous knowledge discovery agents (KDD agents, for short). Based on this model we have been developing a multi-strategy and multi-agent KDD system called GLS which increases both autonomy and versatility. Here we give a brief summary of the architecture of the GLS system [26; 29; 30]. The system is divided into three levels: two meta-levels and one object level, as shown in Figure 4.1. On the first meta-level, the planning meta-agent (planner, for short) sets the discovery process plan that will achieve the discovery goals when executed. On the second meta-level, the KDD agents are dynamically generated, executed, and controlled by the controlling meta-agent (controller, for short). Dynamically planning and controlling the discovery process is a key component in increasing both autonomy and versatility of our system. On the object level, the KDD agents are grouped into three learning phases:

Pre-processing agents include: agents to collect information from global information sources to generate a central large database; agents to clean
the data; and agents to decompose the large database into several local information sources (sub-databases). Examples are RSH (rough sets with heuristics for feature selection), CBK (attribute-oriented clustering using background knowledge), DDR (discretization by the division of ranges), RSBR (rough sets with Boolean reasoning for discretization of continuous-valued attributes), FSN (forming scopes/clusters by nominal or symbolic attributes), and SCT (stepwise Chow test for discovering structural changes in time-series data) [26; 24; 25; 3; 16].

Knowledge-elicitation agents include agents such as KOSI (knowledge-oriented statistical inference for discovering structural characteristics: regression models), DBI (decomposition-based induction for discovering concept clusters), GDT-RS (generalization-distribution-table and rough-sets based induction for discovering classification rules), PRM (peculiarity rule miner), and RS-ILP (rough first-order rule miner) [24; 25; 2; 33; 35; 12].

Knowledge-refinement agents acquire more accurate knowledge (hypotheses) from coarse knowledge (hypotheses) according to data change and/or the domain knowledge. KDD agents such as IIBR (inheritance-inference based refinement) and HML (hierarchical model learning) are commonly used for this purpose [28; 27].

Note that the GLS system, as a multi-strategy and multi-agent KDD system, must provide alternative KDD agents for each learning phase. On the other hand, because of the complexity of databases and the diversification of discovery tasks, it is impossible to include all known/forthcoming KDD techniques. The KDD agents listed above are by no means exhaustive: they are included here because they have been developed previously by us. More agents will enter the system as the involved techniques become mature. In terms of AI planning, no matter how many KDD agents we have, each of them is an operator. Each operator by itself can only do some simple thing; only when they are organized into a society can we accomplish more complex discovery tasks. The KDD planner reasons on these operators to build KDD process plans, networks of KDD agents that will achieve the overall discovery goals when executed. But to apply AI planning techniques, we must be able to formally describe the KDD agents as operators. This is the subject of the next section.
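As a minimal illustration only (the chapter does not specify GLS at this level of detail), the grouping of these object-level agents by learning phase could be captured in a registry like the following Python sketch; the agent acronyms and phases come from the text, while the data structure is our assumption.

    # Hypothetical registry of GLS object-level KDD agents by learning phase.
    PHASES = {
        "pre-processing":        ["RSH", "CBK", "DDR", "RSBR", "FSN", "SCT"],
        "knowledge-elicitation": ["KOSI", "DBI", "GDT-RS", "PRM", "RS-ILP"],
        "knowledge-refinement":  ["IIBR", "HML"],
    }

    def alternatives(phase):
        """Return the alternative KDD agents registered for a learning phase."""
        return PHASES.get(phase, [])

    print(alternatives("knowledge-elicitation"))
    # ['KOSI', 'DBI', 'GDT-RS', 'PRM', 'RS-ILP']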
Fig. 4.2 The ontology of the GLS system (types shown include Entity, Agent, Data, Knowledge, Automatic, Interactor, and sub-types such as RS-ILP and Clause)

4.3.2 Ontology of KDD Agents
The KDD planner, as any AI planner, needs a World State Description (WSD) and a pool of Operators (Ops). We use an ontology, which is a kind of OOER (Object-Oriented Entity Relationship) data model, to describe them. The traditional ER model has concepts of entity/relation, type/instance, instance-level attributes, and so on. The ontology further incorporates object-oriented concepts such as sub-typing, multiple inheritance, procedures, and type-level attributes/procedures. There are two kinds of types, D&K types and Agent types, for passive and active objects respectively. Figure 4.2 shows the (simplified) ontology used in the GLS system. The D&K types describe the various data and knowledge present in a KDD system. On the data part, we have RawData from the global information source, CleanData from the central large database, SelectedData (Scope or Cluster) from the sub-databases, and so on. On the knowledge part, we first distinguish among Kdomain (the background knowledge), Kdiscovered (the discovered knowledge), and Krefined (the refined knowledge). The type Kdiscovered has sub-types Regression (structural characteristics), CCluster (conceptual clusters), CRule (classification rules), FRule (peculiarity rules), Clause (first-order rules), and so on. Krefined has sub-types such as
RegreTree (family of regression models) and so on.

The Agent types describe the various KDD techniques used in the GLS system. We distinguish Automatic (KDD algorithms) from Interactor (KDD techniques that need human assistance). Kdiscover means the overall KDD task, while Preprocess, Kelicit and Krefine stand for the three learning phases: pre-processing, knowledge-elicitation, and knowledge-refinement, respectively. Collect, Clean and Select are activities in Preprocess. Most agent types take the same technical names as mentioned in Section 4.3.1, such as RSH, CBK, DDR, RSBR, FSN, SCT, KOSI, DBI, GDT-RS, PRM, RS-ILP, IIBR, HML. Note that in Figure 4.2, we show only the sub-type relations among KDD objects (a sub-type is-a special case of the super-type). For instance, all of Kdiscover, Preprocess, Kelicit, Krefine are sub-types of Interactor. We will see below how to express the sub-agent relation; for instance, Preprocess, Kelicit, Krefine are three sub-agents of Kdiscover. Types have the ordinary instance-level attributes. For instance, D&K has the attribute status describing the current processing status of the data/knowledge (created, cleaned, reviewed, stored, etc.), and this attribute is inherited by all sub-types of D&K. Kdiscovered has the attribute timestamps recording the time when the knowledge is discovered, and this attribute is inherited by all sub-types of Kdiscovered (Regression, CCluster, Rule, and Clause). As for Agent types, there are additional properties defined. For instance, we may have type/instance-level procedures expressing operations on the types or instances (creation, deletion, modification, etc.). However, the most interesting properties of Agent types are the following type-level attributes, whose information is used by the planning meta-agent:

(1) In/Out: specifying the types of the input/output of an agent type. The specified types are some sub-types of D&K, and the types of the actual input/output of any instance of the agent type must be sub-types of the specified types. For instance, the In/Out for agent type CBK is: CleanData and Kdomain → $Scope ($ means an unspecified number of).

(2) Precond/Effect: specifying the preconditions for an agent (an instance of the agent type) to execute, and the effects when executed.
Precond/Effect are logic formulas with the restrictions as in the classical STRIPS (see [20], for instance). However, we allow more readable specifications for them. In the next section we will see that the planner has a (KDD) domain-specific layer, and part of this layer will transform the high-level specifications into low-level logic formulas. As a matter of fact, a large part of the Precond/Effect, concerning constraints on the input/output of the agent type, has been specified implicitly by the In/Out attribute. This is more declarative, and the detailed form (as conjunctions of literals) may not be possible to write down at the type level. At planning time, the In/Out specification will be transformed into conjunctions of literals, then added to the Precond/Effect on which the planner reasons.

(3) Action: a sequential program performing real KDD actions upon agent execution (e.g., calling the underlying KDD algorithms). It is empty for high-level agents (see below).

(4) Decomp: describing possible sub-tasking. Instances of high-level agents (marked by shadowed boxes in Figure 4.2) should be decomposed into a network of sub-agents. Decomp specifies the candidate agent types for the sub-agents. For instance, the Decomp for agent type Kdiscover is: {Preprocess, Kelicit, Krefine}. This specifies that a Kdiscover agent should be decomposed into a sub-plan built from Preprocess, Kelicit and Krefine agents. The exact shape of this sub-plan is the result of planning (in this case, the sub-plan happens to be a sequence of the three learning phases). As we will see in Section 4.4.2 about hierarchical planning, when the controller meets a high-level agent HA in the plan, it calls the planner to make a sub-plan to achieve the effect of HA. Then the planner searches the pool of the (sub)agent types listed in the Decomp of HA, rather than Ops, the entire set of operators. Note that our method differs from [20], in which the sub-plan itself is written in Decomp, and from [4], in which the decomposition is user-guided.
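One plausible encoding of these four type-level attributes is the Python sketch below. The class layout and field names are our own assumptions; the CBK and Kdiscover entries follow the In/Out and Decomp specifications quoted above (the RawData input for Kdiscover is likewise only illustrative).

    from dataclasses import dataclass, field

    @dataclass
    class AgentType:
        """Illustrative encoding of a GLS agent type; not the actual GLS code."""
        name: str
        in_types: list                               # sub-types of D&K consumed
        out_types: list                              # sub-types of D&K produced
        precond: list = field(default_factory=list)  # extra STRIPS-style literals
        effect: list = field(default_factory=list)
        action: object = None                        # program; None if high-level
        decomp: list = field(default_factory=list)   # candidate sub-agent types

    # In/Out for CBK: CleanData and Kdomain -> $Scope
    CBK = AgentType("CBK", in_types=["CleanData", "Kdomain"], out_types=["$Scope"])

    # Kdiscover is high-level: empty Action, and Decomp lists the candidate
    # sub-agent types from which the planner builds its sub-plan.
    Kdiscover = AgentType("Kdiscover", in_types=["RawData"],
                          out_types=["Krefined"],
                          decomp=["Preprocess", "Kelicit", "Krefine"])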
4.4 KDD Process Planning

The planning meta-agent (the planner) has three layers, as shown in the left part of Figure 4.3 (the right part will be explained in Section 4.5.1). The inner
Fig. 4.3 Coupling vs. Integration of the planning and controlling meta-agents ((a) Coupling Mode and (b) Integration Mode; elements shown include the KDD process schema, the controlling meta-agent (controller), execution and monitoring, and scheduling, resource allocation, interaction and communication)
layer is a domain-independent non-linear planner, the middle layer deals with KDD-specific issues, and the outer layer interacts with the other meta-agent, the controller, to realize hierarchical planning. In the following subsections we discuss these three layers respectively and give a scenario of KDD process planning.
4.4.1 Non-linear Planning
The inner layer, as shown in the left part of Figure 4.3, is a domain-independent non-linear planner. We recite the searching strategy of the production system implementing the non-linear planning as a non-linear planning algorithm which is called by the KDD controller when it tries to "execute" a high-level KDD agent A:

ALGORITHM-1: Non-Linear Planning
INPUT:   (1) High-level agent A to be decomposed
         (2) Current WSD
OUTPUT:  Plan of A (a network of agents carrying out A's job)
METHOD:
1. Create STRIPS goal G from the Out/Effect attributes of A, consulting WSD;
2. Build the initial partial-plan of A: (START{WSD}, {G}FINISH);
3. Inspect the current partial-plan P to find all flaws, order them according to heuristics about flaw priority, and store them in AGENDA;
4. IF AGENDA is empty /* P is flaw-free, hence a plan of A */
   THEN stop /* P is returned to the KDD controller */
5. Try to fix up the flaw AGENDA[top]:
   IF there is no more applicable (meta-)rule for the flaw
   THEN backtrack
   ELSE (1) choose (using heuristics) a rule to fix up the flaw;
        (2) transform P into a new partial-plan P';
        (3) set a backtrack point;
6. Inspect the new current partial-plan P' to re-adjust AGENDA;
   /* remove the fixed flaw and add newly introduced flaws */
7. Goto step 4.
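Read as code, the agenda loop of ALGORITHM-1 might look like the toy Python sketch below. It is a deliberately simplified, greedy rendition: flaws are reduced to unsupported goal literals, each "operator" supports exactly one literal, and the backtracking of steps 4-5 is omitted; none of the names are GLS code.

    # Toy rendition of ALGORITHM-1's agenda loop (greedy, no backtracking).
    def nonlinear_plan(goal_literals, wsd_literals, operators):
        """operators: dict mapping a literal to the agent able to achieve it."""
        plan = ["START"]                    # START's effects are the WSD
        supported = set(wsd_literals)
        agenda = [g for g in goal_literals if g not in supported]  # flaws
        while agenda:                       # step 4: stop when flaw-free
            flaw = agenda.pop(0)            # step 5: fix the top flaw
            agent = operators.get(flaw)     # crude "agent selection rule"
            if agent is None:
                raise RuntimeError(f"no rule fixes {flaw!r}; would backtrack")
            plan.append(agent)              # introduce the agent into the plan
            supported.add(flaw)             # step 6: the flaw is now fixed
        plan.append("FINISH")
        return plan

    # Example: achieve a Krefined result given RawData in the WSD.
    print(nonlinear_plan(goal_literals=["exists(Krefined)"],
                         wsd_literals=["exists(RawData)"],
                         operators={"exists(Krefined)": "Krefine"}))
    # ['START', 'Krefine', 'FINISH']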
4.4.2 Hierarchical Planning
As we are dealing with real-world KDD applications, a hierarchy of abstractions is essential. The process of alternately adding detailed steps to the plan and actually executing some steps should continue until the goal is achieved. In GLS, hierarchical planning is accomplished by the cooperation of the two meta-agents, the planner and the controller. The interface between them is the outer layer of Figure 4.3(a). The two meta-agents interact as follows. At the beginning, the controller generates a high-level KDD agent HA (Kdiscover, for instance) with the discovery goal as its effect. This single agent HA can be regarded as the first coarse plan. When the controller tries to execute HA, it calls the planner to decompose it into a more detailed sub-plan. The planner works as described in the above sub-section, taking the current world state as its initial state, the effect of HA as its goal, and searching the types of sub-agents specified in the Decomp attribute of HA, instead of the whole pool of operators, to achieve the goal. The produced sub-plan is added to the original plan, with each node linked to HA by a sub-agent relationship. Then the controller resumes its work (generating, executing, and controlling KDD agents according to the sub-plan). Obviously this mechanism can work on multiple levels: if the controller meets another high-level agent while executing the sub-plan, it calls the planner again.
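A minimal sketch of this planner/controller cooperation, assuming simple dictionary-based plans (the function names are ours, not GLS's):

    # Illustrative controller loop for hierarchical planning. An agent is
    # treated as high-level when it carries a non-empty decomp and no action.
    def execute_plan(plan, planner, world):
        """Walk the plan; decompose high-level agents, execute low-level ones."""
        for agent in plan:
            if agent.get("decomp"):                     # high-level agent HA
                sub_plan = planner(goal=agent["effect"],
                                   candidates=agent["decomp"],
                                   world=world)
                execute_plan(sub_plan, planner, world)  # may recurse further
            else:
                agent["action"](world)                  # real KDD action

    # Toy usage: one high-level agent decomposed into three low-level steps.
    def toy_planner(goal, candidates, world):
        return [{"decomp": None, "effect": c,
                 "action": (lambda w, name=c: w.append(name))}
                for c in candidates]

    world = []
    execute_plan([{"decomp": ["Preprocess", "Kelicit", "Krefine"],
                   "effect": "Krefined", "action": None}],
                 toy_planner, world)
    print(world)  # ['Preprocess', 'Kelicit', 'Krefine']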
4.4.3 KDD Specific Issues
Because the core of the planner is domain-independent, we provide a middle layer, as shown in Figure 4.3(a), to deal with all KDD-specific issues:
• To transform the KDD goals into STRIPS goals (logic formulas in the style of STRIPS, that is, conjunctions of literals), and especially to translate the input/output constraints specified in the In/Out attribute into Precond/Effect (sketched after this list).
• To search the pool of operators (or, more exactly, to search the types of sub-agents specified in the Decomp attribute of a high-level agent HA in the decomposition process) to introduce suitable KDD agents into the plan.
• To consult the world state description (WSD) to see if a precondition is already satisfied by the WSD, and/or to help transform the In/Out specification into a conjunction of literals as part of Precond/Effect.
• To represent the resulting plan as a network of KDD agents, so the controller can dynamically generate and execute the KDD agents according to the network. The network can also be used by the user of the GLS system as a visualization tool.
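As a rough illustration of the first and third items (our own simplification; the literal syntax is assumed, guided by the scenario in Section 4.4.4):

    # Hypothetical translation of an In/Out specification into STRIPS-style
    # literals, e.g. CBK: CleanData and Kdomain -> $Scope. A "$" prefix
    # (unspecified number of) is instantiated by consulting the WSD, as in
    # Section 4.4.4, where $Scope becomes n Scopes.
    def in_out_to_literals(in_types, out_types, wsd_counts):
        preconds = [f"exists({t})" for t in in_types]
        effects = []
        for t in out_types:
            if t.startswith("$"):                # unspecified number of
                n = wsd_counts.get(t[1:], 1)     # instantiate from the WSD
                effects += [f"exists({t[1:]}-{i})" for i in range(1, n + 1)]
            else:
                effects.append(f"exists({t})")
        return preconds, effects

    # CBK over a WSD holding 3 sub-databases:
    print(in_out_to_literals(["CleanData", "Kdomain"], ["$Scope"], {"Scope": 3}))
    # (['exists(CleanData)', 'exists(Kdomain)'],
    #  ['exists(Scope-1)', 'exists(Scope-2)', 'exists(Scope-3)'])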
4.4.4 A Scenario
Assume that we have a central, large space science database, each record (tuple) describing a star. The interesting attributes include CD (cluster designation), ET (effective temperature), LU (luminosity), and B-V and U-B (color indexes). Facts such as that we already have a central, large database with CleanData and that the nominal attribute CD can be used to form Scopes are explicitly stated in the initial state (WSD). The discovery goal is to find structural characteristics hidden in the database and to refine them upon data change. Based on the specifications of the WSD, the goal, and the KDD agent types, the planner and the controller cooperate in the manner described in Section 4.4.2 and come up with a full KDD process plan as shown in Figure 4.4. The process goes as follows. The initial plan consists of a single KDD agent Kdiscover to produce RegreTree, which is a sub-type of Krefined. It is decomposed into the sequential phases Preprocess, Kelicit, and Krefine (in Figure 4.4, we also show the input/output types for these KDD agents). As
Fig. 4.4 A sample KDD process plan
the WSD contains the fact that we have already got CleanData, Preprocess can simply be done by Select (no need of Collect and Clean). Because the nominal attribute CD designates star clusters, and we need to cluster the other attributes as preparation for the next learning phase, Select is decomposed into FSN and CBK, which can be executed in parallel and cooperatively. The result of the execution of Select (or its sub-plan) is n sub-databases. Then the second learning phase Kelicit is under consideration. Here we show how to transform the In/Out of Kelicit into part of Precond/Effect. The original In/Out specification, SelectedData → Kdiscovered, is first refined to sub-types: $Scope → $Regression, where $ means an unspecified number of. Then, by consulting the WSD that contains n sub-databases, we get the current precondition "there are n Scopes" and the effect "there will be n regression model stores" (represented as conjunctions of literals). The result of decomposing Kelicit
as shown in Figure 4.4 is that n KOSIs are needed in the sub-plan of Kelicit to learn from the n sub-databases separately and in parallel. (In terms of implementation, however, we may install only one KOSI tool with multiple executions, just as in the software development process one compiler may be used simultaneously to compile several source files.) In the third learning phase, Krefine, we have a similar situation: there are n IIBRs to refine and manage the knowledge (regression models) discovered by the n KOSIs in parallel.
4.5 Handling Iteration and Changes of KDD Process
As stated above, the GLS system has set up the KDD process framework as an organized society of intelligent agents and solved the basic problem in a multi-strategy and multi-agent KDD system: how to choose appropriate KDD techniques to achieve a particular discovery goal in a particular domain [26; 30]. The solution is based on AI planning techniques and has increased both autonomy and versatility of the GLS system [30; 31]. In this section, we address two deeper issues in organizing/managing KDD processes:
• Process Iteration: Because the knowledge discovery process contains a cycle of hypothesis generation, evaluation, and refinement, the KDD process is essentially repetitive. In fact, iteration can occur anywhere and at any time in the process, and the process may repeat at different intervals when new/updated data comes. Formalization or automation of iteration in the KDD process is an interesting research topic with practical significance.
• Change Management: During the (long) lifetime of a KDD application session, there may be many kinds of changes which demand replanning the KDD process, such as changes in the databases, or the introduction of new KDD techniques and/or new strategies to coordinate the various discovery steps. As replanning from scratch is in most cases unpleasant and unnecessary, we need a method to reuse the existing KDD process plan, with local adjustment adapted to the changes.
In order to formalize the process iteration, we propose a mechanism integrating process planning with process controlling. As shown in
Figure 4.3(a), the process planning and process controlling are two separate (and coupled) modules, and the non-linear planner is modeled as a production system. This section extends the (meta-)rule set of the production system with new (meta-)rules representing process execution and monitoring, thus integrating the two modules. In this integrated mode, the iteration of the KDD process (that is, the re-execution of some agents) can be properly formalized and automated as follows: execution failures causing process iteration are detected by the monitoring rule; feedback paths and re-execution of the agents on the paths are determined dynamically and automatically by the cooperation of several (meta-)rules; and even the iterating number for each loop is determined dynamically and automatically. However, in the overall architecture of the GLS system, we still need two meta-agents (planning and controlling), because only a part of the functionality of the controlling meta-agent is integrated with planning. In order to manage changes, we first identify and list all possible changes (though we cannot claim the completeness of the list) which demand replanning the KDD process. We then present a general, incremental replanning algorithm which readjusts the existing KDD plan to reflect the changes. Besides, we argue that though some changes can also be handled by adding more (meta-)rules in the integrated mode, it is better to design and implement the replanning facility as an additional component of the searching strategy in our production system. We also describe (within the context of the coupling mode) the mechanism for initiating the replanning process (when, where, how and by whom).
4.5.1 Handling KDD Process Iteration by Integration of Planning and Controlling
Coupling the KDD planner and the KDD controller can realize hierarchical planning and plan execution. The two meta-agents interact as follows (see Figure 4.3(a)). At the beginning, the controller generates a high-level KDD agent HA (of Kdiscover type, for example) with the discovery goal as its Effect. This single agent HA can be regarded as the first coarse plan. When the controller tries to execute HA, it calls the planner to decompose it into a more detailed sub-plan. The planner takes the current world state as its initial state and the Effect of HA as its goal, and searches the types of sub-agents specified in the Decomp attribute of HA (instead of the whole pool of oper-
ators) to achieve the goal. The produced sub-plan is added to the original plan, with each node linked to HA by a sub-agent relationship. Then the controller resumes its work (generating, executing, and controlling KDD agents according to the sub-plan). However, the KDD process is essentially a repetitive process. In fact, iteration can occur anywhere and at any time in the process. For example, if the result of a Krefine agent is not exactly what you want, then you have to go back to the Kelicit agent or even the Preprocess agent. We could solve this problem of iteration in an ad hoc manner. For example, the user may point out which KDD agent in the plan should be re-executed if the result of the current agent is not satisfactory. But obviously, automating process iteration will give the user more intelligent assistance. We give such an automatic solution in the following. The main idea is to treat the unsatisfactory result of a KDD agent as an execution exception or failure (the KDD agent fails to achieve its Effect). The detection of exceptions is the task of the KDD controller, but it is difficult for the controller to decide where to go after a failure is detected if there are no iteration paths specified in the KDD plan. Our solution is to integrate the KDD planner with a part of the KDD controller, resulting in an augmented planning meta-agent that is directly connected to an environment. Then all relevant activities (planning, execution, monitoring, re-planning, and re-execution) are within a uniform mechanism and can be interleaved at a very fine granularity. This integration mode, shown in Figure 4.3(b), has only two layers (the domain-independent layer and the KDD domain-specific layer), in contrast to the coupling mode with three layers in the left part of the same figure. Remember that the non-linear planning (the inner layer) was implemented as a production system. Given the discovery GOAL and the world state description WSD, the starting point is an initial (empty) partial-plan with two dummy agents START and FINISH:

({emptyPrecond} START {Effect = WSD}, {Precond = GOAL} FINISH {emptyEffect})
Each (meta-)rule in the production system transforms a partial-plan into a new partial-plan, fixing up some flaw (unsupported Preconds, possible conflicts in agent ordering, etc.). The searching strategy of the production system chooses and applies suitable (meta-)rules to gradually transform the
initial partial-plan into a final, flaw-free plan which will achieve the GOAL when executed. What is needed in our integration mode is to extend the (meta-)rule set with new rules for agent execution and monitoring. Furthermore, the hierarchical planning (which was realized by the coupling of the KDD planner and controller) is now realized by introducing an extra "decomposition" (meta-)rule. We first (re)state the original planning (meta-)rules. In the rule descriptions below, some notation needs explanation: KDD agents are denoted by A, B, C, W. The functions Precond(A)/Effect(A) stand for the Precond/Effect attributes of agent A. If q is a literal in Effect(A), p is a literal in Precond(B), and q supports p (i.e., they are unifiable), then the notation q → p is called a (protection) range. Effect(C) "undoes" p means that Effect(C) includes a literal matching not(p). Finally, "there is" means "there is in the current partial-plan".

1. Agent Ordering Rules:
IF there is a range q → p /* q is in Effect(A), p is in Precond(B) */
THEN IF there is C parallel to B and Effect(C) undoes p
     THEN set B before C
     IF there is C parallel to A and Effect(C) undoes p
     THEN set C before A
     IF C is between A and B and Effect(C) undoes p
     THEN introduce a new agent W between C and B to re-establish p.

2. Agent Selection Rules:
IF no range supports p in Precond(A)
THEN IF there is B before A and q in Effect(B) supports p
     THEN set a new range q → p
     IF there is B parallel to A and q in Effect(B) supports p
     THEN set a new range q → p; set B before A
     IF q in Effect(W) supports p but W is not in the partial-plan
     THEN introduce W into the partial-plan between START and A;
          set a new range q → p.
Now we extend this planning rule set with new rules for agent execution, monitoring, and decomposition. Note that in the coupling mode the planning phase as described above is "static" in the sense that the world state description (WSD) would not change until the planner has finished its job and the controller starts to execute the agents in the plan. In the integration mode, on the other hand, planning activities and execution/monitoring activities interleave at a very fine granularity, and the WSD is changing all the time. Thus we will talk about the "current WSD" in the following description of the new rules.

3. Agent Execution Rule:
IF A is not a "high-level" agent and is ready for execution
   (Precond(A) is satisfied by the current WSD and A is not involved in any range violation)
THEN execute A; remove A from the plan when it times out (succeeds or not).

4. Decomposition Rule:
IF A is a "high-level" agent and is ready for execution
THEN decompose A into a sub-plan.

5. Monitoring Rule:
IF A is being executed
THEN monitor the execution:
     IF an expected effect q really appears
     THEN change the producer of all the relevant ranges q → p from A to START
          (i.e., put q into the WSD)
     ELSE excise all the relevant ranges q → p /* q fails to appear */

With the new rules, failed effects always cause new flaws (unsupported Preconds), which sooner or later trigger some planning activities (Rules 1 and 2). As a result, some of the executed (and thus removed) agents will be re-introduced into the plan and re-executed in due time. Obviously this can be regarded as a nice mechanism to automatically handle the problem of process iteration. The feedback paths of iteration are determined dynamically, automatically, and based on logical reasoning, which is at the
core of AI planning. We again use the stars database as an example of handling KDD process iteration. First we showed how the coupling mode works: based on the specifications of the WSD, the goal, and the KDD agent types, the planner and the controller cooperate to produce a full KDD process plan as shown in Figure 4.4. Next we show how the integration mode automatically solves the problem of process iteration. Suppose that in the above scenario the KOSI agent KOSI-1 is being executed, and that the Monitoring Rule (Rule 5) has detected that the expected effect does not appear (i.e., Regression Models Store-1 is not produced or is not acceptable). Then, on the one hand, according to the Agent Execution Rule (Rule 3), agent KOSI-1 is nevertheless removed when it times out; on the other hand, the ELSE part of the Monitoring Rule (Rule 5) excises all the relevant ranges, leaving the subsequent agent IIBR-1 with an unsupported precondition: there is no proper regression models store as its input to work on. This unsupported-Precond flaw will trigger planning activities (the Planning Rules: Rules 1 and 2) to re-introduce to the plan the previously removed KOSI agent to re-establish the precondition of IIBR-1 (that is, to fix up the unsupported-Precond flaw). But the re-introduced KOSI agent will have its own Precond unsupported, therefore some Select agent will also be re-introduced to the plan, and so on. All these re-introduced agents will be re-executed to select a better DB-1 and/or learn better regression models. In summary, we may name this mechanism automatic iteration; it possesses the following desirable features:
Dynamically
Organizing KDD Processes in a Multi-Agent
KDD System
113
is responsible for other tasks such as: scheduling, resource allocation, manmachine interaction, interaction and communication among KDD agents, and etc.
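The following toy Python sketch illustrates the behavior just described (the data structures and names are our assumptions; only the rule behavior follows the text): a missing effect excises its range, and the resulting flaw re-introduces and re-executes the producing agent.

    # Toy model of Rules 3 and 5 driving automatic iteration.
    def run_with_monitoring(plan, execute, max_rounds=5):
        """plan: list of (agent, effect) pairs; execute(agent) -> bool."""
        pending = list(plan)
        for _ in range(max_rounds):
            # Rule 5: effects that fail to appear excise their ranges; the
            # resulting unsupported preconditions re-introduce the producers.
            failed = [(a, e) for (a, e) in pending if not execute(a)]
            if not failed:
                return True               # flaw-free: iteration terminates
            pending = failed              # feedback path, found dynamically
        return False

    attempts = {}
    def execute(agent):
        """Simulated Rule 3 execution: KOSI-1 fails once, then succeeds."""
        attempts[agent] = attempts.get(agent, 0) + 1
        return agent != "KOSI-1" or attempts[agent] > 1

    print(run_with_monitoring([("Select-1", "Scope-1"),
                               ("KOSI-1", "RegressionStore-1")], execute))
    # True: KOSI-1 is re-executed automatically in the second round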
4.5.2 Change Management by Incremental Replanning
The KDD process is a long-term and evolving process. During its lifetime, many kinds of change may occur; hence change management is recognized as an important research issue with practical significance in the field of KDD process. We can identify the following kinds of changes (but we do not claim that the list is complete):
• Local Data Changes in Databases: When the KDD process is planned and executed the first time, the original database is used and the discovered regression models are stored for each sub-database. Later, whenever a local data change (a new data item is added, an old data item is deleted/modified, etc.) occurs, the planning and execution process will iterate to find and add new versions of the regression models to the stores, and each IIBR agent will manage and refine the corresponding tree of regression models. This is a universal and important problem in all real-world KDD applications, as the contents of most databases are ever changing.
• Large-Scale and/or Structural Changes in Databases: Some changes in the data could be big and structural, resulting, for example, in a different decomposition of the central database. In this case, the process plan itself should be changed accordingly. This is called process evolution.
• Changes in the Process Schema: The formal description of all available KDD techniques (i.e., the Agent types) in the KDD system is called the process schema. The process schema could change during the lifetime of the KDD process. For example, new KDD techniques can be introduced into the KDD system; existing KDD techniques could become obsolete, or remain in the system but with new parameter settings; new/modified strategies coordinating various discovery steps may be adopted; and so on. These changes should be reflected in the process schema accordingly: some new Agent types are added, while some old Agent types are either removed or modified in their type-level attributes (In/Out, Precond/Effect, Decomp). Finally,
process schema changes in turn cause process plan changes. That is, we see process evolution here again.

For some of the changes mentioned above, the integration mode presented in the previous section can be further extended to deal with them. For example, we can add the following monitoring rule:

6. (More Monitoring Rule:)
IF there is a local change in the databases
THEN restart the process according to the same process plan.

With this new (meta-)rule, the databases are under monitoring. Whenever their contents change locally (a new data item is added, or an old data item is deleted/modified), the integrated meta-agent restarts the KDD process according to the same process plan. In the case of process evolution, however, changes are difficult to handle in this way, because the problem we face is not the re-execution of (part of) the existing plan. Rather, we should replan the KDD process to reflect the changing environment. More precisely, we have the following observations:
• If we insist on solving the problem of process evolution and process replanning by further extending the set of (meta-)rules, our production system will become too complicated. As Jonsson and Backstrom point out [7], the integrated mode of planning and execution is suitable only for some restricted classes of planning problems (the 3S class, for example). As we are not sure whether the KDD planning problem in its full scale can be solved properly by an ever-expanding set of (meta-)rules, we instead realize replanning as an additional component of the searching strategy of the production system.
• As replanning from scratch is in most cases unpleasant and unnecessary, we need a method to reuse the existing KDD process plan, with local adjustment adapted to the changes. In other words, we need an incremental replanning algorithm.
• The big variety of possible changes does not mean that we need a separate replanning algorithm for each kind of change. In fact, all possible changes can disturb an existing plan only in the following ways:
— Some new preconditions come in;
— Some old preconditions become unsupported;
— Some old effects become obsolete;
— The Decomps of some agents change when new (old) Agent types are added to (removed from) the schema.
A general incremental replanning algorithm just needs to consider all these situations and take the proper replanning actions.
• Because of the hierarchical planning, the KDD process plan has a hierarchical structure. Incremental replanning always works on a particular part at particular levels of the existing plan, and at a particular time. So we should specify when, where and how to replan.

In light of the above observations, we have designed a general, incremental replanning algorithm. In the following, we present the algorithm in the context of the original coupling mode (replanning in the context of the integrated mode can be described similarly). The incremental replanning algorithm is also called by the KDD controller. Recall that one of the main tasks of the KDD controller is to monitor the execution of the process plan. Concerning change management, we charge it with the following extra responsibilities:
• Detecting changes in the databases;
• Receiving and approving changes in the process schema;
• Determining the starting point of replanning: the high-level KDD agent A that is the root of the affected part of the existing, hierarchical plan;
• Calling the replanning algorithm (ALGORITHM-2 below) with agent A and the changes as the input parameters.
ALGORITHM-2: Incremental Replanning
INPUT:   (1) High-level agent A and its existing plan
         (2) Changes demanding replanning from A
         (3) Current WSD
OUTPUT:  Re-adjusted plan of A, coping with the changes
METHOD:
1. IF there is any change in WSD (databases), or in Out/Effect of A
   THEN re-create the STRIPS goal G' for A;
2. IF there is a change in Decomp of A
   THEN delete those agents whose types disappear in the new Decomp of A;
   /* new agents of new types may be added in step 6 below */
3. For each agent Ai in the existing plan of A:
   IF there is any change in WSD, or in Out/Effect of Ai
   THEN re-adjust the Effect of Ai according to the change;
   /* specially, START will have the new WSD as its Effect */
4. For each agent Ai in the existing plan of A:
   IF there is any change in WSD, or in In/Precond of Ai
   THEN re-adjust the Precond of Ai according to the change;
   /* specially, FINISH will have the new goal G' as its Precond */
5. Delete all "dead" agents in the existing plan;
   /* an agent supporting no Precond of other agents becomes "dead" */
6. /* Now the existing plan is disturbed, because steps 1-4 above have introduced various flaws into it. */
   Invoke the planner to resume its work at step 3 of ALGORITHM-1 to find and fix up the new flaws, returning a new plan of A;
7. For each high-level sub-agent HAj in the new plan of A:
   IF HAj is newly introduced, or HAj was in the old plan but had not been expanded
   THEN do nothing here
   /* planning of HAj will be done later, when the controller tries to execute it */
   ELSE apply this ALGORITHM-2 recursively to HAj
   /* because HAj may need replanning as well as its parent A */.

Note that the replanning algorithm (ALGORITHM-2) is a recursive procedure, and it in turn calls the non-linear planning algorithm (ALGORITHM-1 shown in Section 4.4.1). Let us look at an example of replanning (see Figure 4.5). Suppose that the following events occur:
• Time-series data come in, implying possible structural changes in the central DB;
• A new KDD technique SCT (stepwise Chow test to discover structural changes in time-series data) is introduced into the KDD system;
• The Decomp of the Select type is modified from (FSN, CBK) to
Let us look at an example of replanning (see Figure 4.5). Suppose that the following events occur:

• Time-series data come in, implying possible structural changes in the central DB;
• A new KDD technique, SCT (stepwise Chow test, to discover structural changes in time-series data), is introduced into the KDD system;
• The Decomp of the Select type is modified from (FSN,CBK) to (FSN,CBK,SCT).

Fig. 4.5 A sample KDD process plan and replan

When the KDD controller detects and approves these changes, it determines that the high-level agent Kdiscover in Figure 4.5 is the starting point of
replanning, and calls ALGORITHM-2 to recursively re-adjust the existing, hierarchical plan. This results in the following changes in the plan, which are marked in Figure 4.5 by bold lines and boxes:

• The sub-plan of the Select agent has an additional SCT sub-agent to discover possible structural changes in time-series data;
• The Select agent has more sub-databases as its output;
• There are more KOSIs in the sub-plan of Kelicit to learn Regression Models from the new subDBs;
• There are more IIBRs in the sub-plan of Krefine to build Model Trees from the new Regression Models.
4.6 Concluding Remarks
We have presented a methodology for dynamically organizing KDD processes that increases both the autonomy and the versatility of a discovery system, together with the framework of the GLS system based on this methodology. Among related systems, GLS is most similar to INLEN [14]. In INLEN, a database, a knowledge base, and several existing methods of machine learning are integrated as several operators. These operators can generate diverse kinds of knowledge about the properties and regularities existing in the data. INLEN was implemented as a multi-strategy KDD system
like GLS. However, GLS can dynamically plan and organize the discovery process, performed in a distributed cooperative mode, for different discovery tasks. Moreover, knowledge refinement is one of the important capabilities of GLS that was not developed in INLEN. Since the GLS system is very large and complex, we have so far finished only several parts of it, and have undertaken to extend it toward a more integrated, organized society of autonomous knowledge discovery agents. That is, the work described here takes but one step toward a multi-strategy and multi-agent KDD system.
Acknowledgements

The authors would like to thank Prof. Jan Zytkow and Mr. Y. Kakemoto for their valuable comments and help. This work was partially supported by the Telecommunications Advancement Foundation (TAF).
Bibliography
[1] Brachman, R.J. and Anand, T. "The Process of Knowledge Discovery in Databases: A Human-Centred Approach", In Advances in Knowledge Discovery and Data Mining, MIT Press (1996) 37-58.
[2] Dong, J.Z., Zhong, N., and Ohsuga, S. "Probabilistic Rough Induction: The GDT-RS Methodology and Algorithms", Z.W. Ras and A. Skowron (eds.) Foundations of Intelligent Systems, LNAI 1609, Springer-Verlag (1999) 621-629.
[3] Dong, J.Z., Zhong, N., and Ohsuga, S. "Using Rough Sets with Heuristics for Feature Selection", Zhong, N., Skowron, A., and Ohsuga, S. (eds.) New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, LNAI 1711, Springer-Verlag (1999) 178-187.
[4] Engels, R. "Planning Tasks for Knowledge Discovery in Databases - Performing Task-Oriented User-Guidance", Proc. Second International Conference on Knowledge Discovery and Data Mining (KDD-96), AAAI Press (1996) 170-175.
[5] Fayyad, U.M., Piatetsky-Shapiro, G., and Smyth, P. "From Data Mining to Knowledge Discovery: An Overview", In Advances in Knowledge Discovery and Data Mining, MIT Press (1996) 1-36.
[6] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining, AAAI Press (1996).
[7] Jonsson, P. and Backstrom, C. "Incremental Planning", in New Directions in AI Planning, IOS Press (1996) 79-90.
[8] Klosgen, W. "Problems for Knowledge Discovery in Databases and Their Treatment in the Statistics Interpreter Explora", International Journal of Intelligent Systems, Vol.7, No.7 (1992) 649-673.
[9] Liu, C. "Software Process Planning and Execution: Coupling vs. Integration", LNCS 498, Springer (1991) 356-374.
[10] Liu, C. and Conradi, R. "Automatic Replanning of Task Networks for Process Evolution in EPOS", Proc. ESEC'93, LNCS 717, Springer (1993) 437-450.
[11] Liu, C. and Zhong, N. "Handling KDD Process Iteration by Integration of Planning and Controlling", Proc. 1998 IEEE International Conference on Systems, Man, and Cybernetics (SMC'98) (1998) 411-416.
[12] Liu, C. and Zhong, N. "Rough Problem Settings for Inductive Logic Programming", Zhong, N., Skowron, A., and Ohsuga, S. (eds.) New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, LNAI 1711, Springer-Verlag (1999) 168-177.
[13] Matheus, C.J., Chan, P.K., and Piatetsky-Shapiro, G. "Systems for Knowledge Discovery in Databases", IEEE Transactions on Knowledge and Data Engineering, Vol.5, No.6 (1993) 904-913.
[14] Michalski, R.S., Kerschberg, L., Kaufman, K.A., and Ribeiro, J.S. "Mining for Knowledge in Databases: The INLEN Architecture, Initial Implementation and First Results", Journal of Intell. Infor. Sys., Kluwer Academic Publishers, Vol.1, No.1 (1992) 85-113.
[15] Minsky, M. The Society of Mind, Simon and Schuster, New York (1986).
[16] Nguyen, S.H. and Nguyen, H.S. "Quantization of Real Value Attributes for Control Problems", Proc. Fourth European Congress on Intelligent Techniques and Soft Computing (EUFIT'96) (1996) 188-191.
[17] Ohsuga, S. "Framework of Knowledge Based Systems - Multiple Meta-Level Architecture for Representing Problems and Problem Solving Processes", Knowledge Based Systems, Vol.3, No.4 (1990) 204-214.
[18] Ohsuga, S. "A Way of Designing Knowledge Based Systems", Knowledge Based Systems, Vol.8, No.4 (1995) 211-222.
[19] Ohsuga, S. and Yamauchi, H. "Multi-Layer Logic - A Predicate Logic Including Data Structure as Knowledge Representation Language", New Generation Computing, Vol.3, No.4 (1985) 403-439.
[20] Russell, S.J. and Norvig, P. Artificial Intelligence - A Modern Approach, Prentice Hall, Inc. (1995).
[21] Piatetsky-Shapiro, G. and Frawley, W.J. (eds.) Knowledge Discovery in Databases, AAAI Press and The MIT Press (1991).
[22] Zhong, N. and Ohsuga, S. "GLS - A Methodology for Discovering Knowledge from Databases", P.S. Glaeser and M.T.L. Millward (eds.) New Data Challenges in Our Information Age (1992) A20-A30.
[23] Zhong, N. and Ohsuga, S. "The GLS Discovery System: Its Goal, Architecture and Current Results", Z.W. Ras and M. Zemankova (eds.) Methodologies for Intelligent Systems, LNAI 869, Springer-Verlag (1994) 233-244.
[24] Zhong, N. and Ohsuga, S. "Discovering Concept Clusters by Decomposing Databases", Data & Knowledge Engineering, Vol.12, No.2, Elsevier Science Publishers (1994) 223-244.
[25] Zhong, N. and Ohsuga, S. "KOSI - An Integrated Discovery System for Discovering Functional Relations from Databases", Journal of Intelligent Information Systems, Vol.5, No.1, Kluwer Academic Publishers (1995) 25-50.
[26] Zhong, N. and Ohsuga, S. "Toward A Multi-Strategy and Cooperative Discovery System", Proc. First International Conference on Knowledge Discovery and Data Mining (KDD-95), AAAI Press (1995) 337-342.
[27] Zhong, N. and Ohsuga, S. "A Hierarchical Model Learning Approach for Refining and Managing Concept Clusters Discovered from Databases", Data & Knowledge Engineering, Vol.20, No.2, Elsevier Science Publishers (1996) 227-252.
[28] Zhong, N. and Ohsuga, S. "System for Managing and Refining Structural Characteristics Discovered from Databases", Knowledge Based Systems, Vol.9, No.4, Elsevier Science Publishers (1996) 267-279.
[29] Zhong, N., Kakemoto, Y., and Ohsuga, S. "An Organized Society of Autonomous Knowledge Discovery Agents", Peter Kandzia and Matthias Klusch (eds.) Cooperative Information Agents, LNAI 1202, Springer-Verlag (1997) 183-194.
[30] Zhong, N., Liu, C., and Ohsuga, S. "A Way of Increasing both Autonomy and Versatility of a KDD System", Z.W. Ras and A. Skowron (eds.) Foundations of Intelligent Systems, LNAI 1325, Springer-Verlag (1997) 94-105.
[31] Zhong, N., Liu, C., Kakemoto, Y., and Ohsuga, S. "KDD Process Planning", Proc. Third International Conference on Knowledge Discovery and Data Mining (KDD-97), AAAI Press (1997) 291-294.
[32] Zhong, N. and Ohsuga, S. "A Multi-Phase Process for Discovering, Managing and Refining Strong Functional Relationships Hidden in Databases", Transactions of Information Processing Society of Japan, Vol.38, No.4 (1997) 698-706.
[33] Zhong, N., Dong, J.Z., and Ohsuga, S. "Data Mining: A Probabilistic Rough Set Approach", L. Polkowski and A. Skowron (eds.) Rough Sets in Knowledge Discovery, Vol.2, Physica-Verlag (1998) 127-146.
[34] Zhong, N., Liu, C., and Ohsuga, S. "Handling KDD Process Changes by Incremental Replanning", J. Zytkow and M. Quafafou (eds.) Principles of Data Mining and Knowledge Discovery, LNAI 1510, Springer-Verlag (1998) 111-120.
[35] Zhong, N., Yao, Y.Y., and Ohsuga, S. "Peculiarity Oriented Multi-Database Mining", J. Zytkow and Jan Rauch (eds.) Principles of Data Mining and Knowledge Discovery, LNAI 1704, Springer-Verlag (1999) 136-146.
[36] Ziarko, W. "The Discovery, Analysis, and Representation of Data Dependencies in Databases", Piatetsky-Shapiro and Frawley (eds.) Knowledge Discovery in Databases, The AAAI Press (1991) 195-209.
[37] Zytkow, J.M. "Introduction: Cognitive Autonomy in Machine Discovery", Machine Learning, Kluwer Academic Publishers, Vol.12, No.1-3 (1993) 7-16.
[38] Zytkow, J.M. and Zembowicz, R. "Database Exploration in Search of Regularities", Journal of Intelligent Information Systems, Kluwer Academic Publishers, Vol.2, No.1 (1993) 39-81.
Chapter 5
Self-Organized Intelligence
Jiming Liu
Department of Computer Science
Hong Kong Baptist University
5.1 Introduction
This chapter is concerned with the problem of how to induce self-organized intelligence in a multi-agent system. It addresses one of the central issues in the development and application of multi-agent robotic systems, namely, how to develop self-organized multi-agent systems that collectively accomplish certain tasks in robot vision and group navigation. In doing so, we will explicitly define and implement two multi-agent systems: one for searching and tracking digital image features, and another for controlling a group of distributed robots that navigate in an unknown task environment toward goal locations. The coordination, cooperation, and competition among the agents manifest in the ways in which the agents exchange and share certain information, such as the current status of the overall system and/or those of neighboring agents, and in which they select their own actions based on such information. We will consider the problem of goal-attainability with a group of distributed autonomous agents. The agents self-organize their behaviors based on their previously-acquired individual dynamics, called local memory-driven behavioral selection, and on their average distance to target locations, termed global performance-driven behavioral learning. The aim of our work is to show to what extent the
two types of information can affect the goal-attainability of the system. We will empirically investigate the performance of agent behavioral self-organization, incorporating local memory-driven behavioral selection and global performance-driven behavioral learning, with respect to the goal-attainability as well as the task-efficiency of the multi-agent system.
5.2 Organization of the Chapter
The remainder of this chapter is organized as follows: Section 5.3 introduces the key notions and states the general problem to be addressed from the point of view of a multi-agent approach. Section 5.4 describes the self-organized vision approach, covering the models of reactive behaviors, their adaptive self-organization, and the empirical validation of an implemented multi-agent vision system in performing image feature tracking. Section 5.5 focuses on the formulation of a robot group navigation problem as a multi-agent self-organized motion problem. Section 5.6 provides an overview of some of the related work in the areas of image processing, robot group behavior, and adaptive self-organization. Finally, Section 5.7 concludes the chapter by highlighting its key contributions and pointing out several avenues for future extension.
5.3 Problem Statement
The goal of our work is to show (1) how the tasks of robot vision and group motion can be handled collectively by classes of distributed agents that respond locally to the conditions of their environment, and (2) how the behavioral repository of the agents can be constructed. For ease of understanding our proposed approach, in the sequel we will carry out our discussions based on the following general search problem: there are several convex regions in a rectangular search space, S, each of which is composed of a number of feature elements or locations with the same physical feature characteristics, i.e., each convex region is homogeneous. Distributed agents are required to search and find all the feature (goal) locations within S. Now let us formally describe the specific problem of self-organized intelligence to be addressed in this chapter.
(1) The environment:

(a) Physical features: S contains a number of homogeneous regions composed of elements or locations with the same physical feature characteristics. The feature characteristics can be calculated and evaluated based on certain numerical measures.

(b) Geometrical characteristics: S is a rectangular grid-like search space in a two-dimensional plane, of size U x V. Each homogeneous region in S is connected and convex, and possesses a boundary of connected locations.

(2) The task: Distributed agents are dispatched in S in order to search and label all the feature locations of homogeneous regions.

(3) The behaviors of the agents:

(a) Primitive behaviors: The agents can recognize and distinguish certain physical feature locations, if encountered, based on some predefined criteria.

(b) Complex behaviors: The agents can decide and execute their next-step reactive behavior. That is, they may breed, move, or vanish in S, based on their task, previously executed behavior, and current environment characteristics.

Remark 5.1: Primitive behaviors are the fixed intrinsic operations of agents. We may create and distinguish various classes of agents based on their primitive behaviors. In our present work, we assume that the feature locations to be found correspond to the borders of certain homogeneous regions. Mathematically, we will define the feature characteristics of the border of a homogeneous region using the relative contrast of measurement values within a small region. Here, the term measurement is taken as a generic notion; the specific quantity that it refers to will depend on the nature of the application. For instance, it may refer to the grey-level intensity of an image in the case of image processing. Or, it may refer to a spatial measurement function in the case of robot environment modeling.

Remark 5.2: By complex behaviors, we mean that the agents can self-organize and make decisions on what behaviors to produce next. In this regard, we say that the agents possess the characteristics of autonomy.
Remark 5.3: The agents may vanish as soon as they leave a marker, breed the next generation of offspring agents, or leave the space geometrically described by the environment.

(4) Feature searching:

Definition 5.1 (Feature searching) Let N denote the total number of feature locations in S. The goal of the distributed agents in S is to extract all the feature regions. This problem is equivalent to the problem of extracting the borders, or all the locations on the borders, of the homogeneous regions. If the total number of feature locations detected and labeled by the distributed agents is equal to N, it is said that all the feature locations in S are reachable by the agents. In other words, the goal of the agents is attainable in the given environment.
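As a small illustration of Definition 5.1, the check below compares the number of labeled feature locations in S against N. The 0/1-grid encoding of labels is an assumption made for this sketch only.

def goal_attained(labeled, n_features):
    """Definition 5.1: the agents' goal is attainable iff the number of
    feature locations they have detected and labeled equals N.
    `labeled` is a U x V grid of 0/1 flags (1 = labeled as a feature)."""
    detected = sum(sum(row) for row in labeled)
    return detected == n_features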
5.4 Adaptive Self-Organized Vision for Image Feature Detection and Tracking
In this section, we consider our first task, i.e., applying collective behavior to solving the robot vision problem of searching and tracking image features. Here, the two-dimensional lattice, S, in which the proposed autonomous agents reside is a grey-level image of size U x V (i.e., an array of U columns by V rows of pixels). Suppose that S contains a certain number of pixels whose intensity relative to those of their neighboring pixels satisfies some specific, mathematically well-defined conditions. Furthermore, whether a pixel p in S can be classified as belonging to the feature can be decided by evaluating the outcome of a mathematical operator, D (i.e., a feature descriptor), as applied at p. The total number of feature pixels in S is denoted by M. Thus, the objective of the autonomous agents in S is to extract all the predefined features of S by finding and marking the feature pixels. This is essentially an optimization problem, as stated below.

Definition 5.2 (Optimal feature extraction) If the total number of feature pixels detected and marked by active agents, N, is equal to M, it is said that an optimal feature extraction is achieved.
Definition 5.3 (Active agents) At a certain time t in the two-dimensional lattice, autonomous agents whose ages do not exceed a given life span will continue to react to their image environment by way of evaluating the pixel grey-level intensity and selecting accordingly some of their behaviors. Such agents are called active agents at time t.
5.4.1 An Overview of Adaptive Self-Organized Vision
With respect to the image feature detection problem mentioned above, one extreme approach is to place agents over the entire plane, with each of them reacting to its immediate environment simultaneously; in the other extreme approach, a border is traced using some predefined templates. Our approach can be viewed as a compromise between these two. The main distinction lies in that, in our approach, each autonomous agent can locally reproduce and diffuse, and hence adaptively extract image features (e.g., contours in an image). Now let us take a look at the detailed formalisms of adaptive self-organizing autonomous agents, including their environment, local pixel evaluation functions, the fitness definition, and the evolution of asexual self-reproduction and diffusion.
5.4.2 Two-Dimensional Lattice of an Agent Environment
The adaptive nature of our proposed agent automata consists in the way in which generations of autonomous agents are replicated and selected. Such agents directly operate in two-dimensional rectangular grid lattices that correspond to the digitized images of natural scenes. That is, each of the 8-connected grids represents an image pixel. The grid also signifies a possible location for an autonomous agent to inhabit, either temporarily or permanently, as illustrated in Figure 5.1. Definition 5.4 (Neighboring region of an agent) The neighboring region of an agent at location (i, j) is a circular region centered at the given location with radius R(i,j).
Fig. 5.1 An autonomous agent, at location (i, j), and its local neighboring region.

5.4.3 Local Stimulus in Two-Dimensional Lattice
Definition 5.5 The local stimulus that selects and triggers the behaviors of an agent at pixel location (i,j) is computed from the sum of the pixels belonging to a neighboring region which satisfy the following condition: the difference between their grey-level intensity values and the value at (i,j) is less than a positive threshold. In other words, the stimulus is determined by the density distribution of all the pixels in its neighboring region whose grey-level intensity values are close to the intensity at (i, j). More specifically, the density distribution can be defined as follows:
D^R_(i,j) = Σ_{s=-R}^{R} Σ_{t=-R}^{R} { (s,t) ≠ (0,0) | ||m(i+s, j+t) - m(i,j)|| < δ }    (5.1)

where
R: the radius of a circular region centered at (i, j),
s, t: the indices of a neighboring pixel relative to (i, j),
m(i, j): the grey-level value at location (i, j), and
δ: a predefined positive constant.
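A direct transcription of Eq. 5.1: the stimulus counts, within the radius-R neighborhood of (i, j), the pixels whose grey level lies within δ of m(i, j). The nested-loop form and the clipping of out-of-image neighbors are choices made for this sketch; Eq. 5.1 itself leaves boundary handling unspecified.

def density(m, i, j, radius, delta):
    """Eq. 5.1: count the neighbors of (i, j) whose grey-level values are
    within delta of m[i][j]; m is a two-dimensional array of grey levels."""
    rows, cols = len(m), len(m[0])
    count = 0
    for s in range(-radius, radius + 1):
        for t in range(-radius, radius + 1):
            if s == 0 and t == 0:
                continue                       # exclude the center pixel
            y, x = i + s, j + t
            if 0 <= y < rows and 0 <= x < cols and abs(m[y][x] - m[i][j]) < delta:
                count += 1
    return count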
5.4.4 Self-Organizing Behaviors

5.4.4.1 Diffusion
When the age of an agent does not exceed its life span (i.e., it is an active agent) and the agent has not found a feature pixel whose grey-level intensity satisfies the condition set by Eq. 5.1, it will move in a certain direction to a location inside its neighboring region. Diffusion behavior plays an important role in the agent's search for feature pixels in the two-dimensional lattice. The specific stimulus that triggers this behavior is given as follows:

Definition 5.6 (Diffusion) Let φ = [φ1, φ2] be an acceptable range of the pixel count as defined by Eq. 5.1, where φ1 < φ2. An agent moves to its adjacent locations whenever the outcome of its evaluation of the density distribution falls outside the φ interval, i.e., D^R_(i,j) ∉ φ. The direction of the diffusion is selected based on an 8-element probability vector in which each value indicates the probability of becoming high-fitness if the agent diffuses in the corresponding direction.

The direction vector of the agent mentioned in the above definition is updated based on the diffusion directions of its previously selected high-fitness agents. The details of the updating computation are given in Subsection 5.4.4.5.

5.4.4.2 Self-Reproduction
When an agent detects a feature pixel, p, it will reproduce a finite number of offspring agents within its neighboring sectors. This behavior enables the agent to populate its offspring agents near a pixel location that meets the feature definition, and hence increases the likelihood of further feature extraction.

Definition 5.7 (Self-reproduction) Let φ = [φ1, φ2] be an acceptable range of the pixel count as defined by Eq. 5.1, where φ1 < φ2. An agent reproduces a finite number of offspring agents inside its neighboring region of radius R(i,j), in a direction computed from its direction probability vector, if the outcome of its evaluation of the density distribution at p falls into the φ interval, i.e., D^R_(i,j) ∈ φ. The direction vectors for self-reproduction by the parent agent and its offspring will be determined based on an updating mechanism.
5.4.4.3 Feature Marking

When an agent detects a feature pixel, p, it will place a fixed marker at p. There may be different kinds of features in an image; hence several kinds of markers can exist. The marking behavior of an autonomous agent is necessary in order to label detected image features. The stimulus for selecting this behavior is stated as follows:

Definition 5.8 (Feature marking) Let φ = [φ1, φ2] be an acceptable range of the pixel count as defined by Eq. 5.1, where φ1 < φ2. An agent places a marker at pixel p if the outcome of its evaluation of the density distribution at p falls into the φ interval, i.e., D^R_(i,j) ∈ φ.
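Definitions 5.6-5.8 share a single trigger: whether D^R_(i,j) falls inside the interval φ = [φ1, φ2]. The dispatch below sketches the resulting behavior selection; it reuses density() from the Eq. 5.1 sketch, and the agent object with i/j coordinates is a hypothetical placeholder.

def select_behaviors(agent, image, phi1, phi2, radius, delta):
    """Dispatch between feature marking (Def. 5.8) plus self-reproduction
    (Def. 5.7) when the density falls inside [phi1, phi2], and diffusion
    (Def. 5.6) when it falls outside."""
    d = density(image, agent.i, agent.j, radius, delta)
    if phi1 <= d <= phi2:
        # A feature pixel has been found: mark it and breed offspring nearby.
        return ["mark", "reproduce"]
    # Otherwise move to an adjacent location; the direction is drawn from
    # the agent's 8-element direction probability vector.
    return ["diffuse"]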
5.4.4.4 Agent Fitness Function
As mentioned before, the reactive behaviors of diffusion and self-reproduction are augmented with direction parameters. In order to select the most effective parent agents among the previously successful ones, and to copy the directions used in their diffusion and reproduction (before they found local features), we here introduce a measure of agent fitness as follows:

Definition 5.9 (Agent fitness function) Let F(ω) denote the fitness value of an agent ω. Thus,

F(ω) = (1 + steps_before_reproduction)^(-1),  if ω finds a triggering stimulus;
F(ω) = 0,  otherwise.    (5.2)
As can be noted from this definition, the fitness function measures how long it takes the agent to find a feature pixel. The maximum fitness value will be equal to one if the agent is placed directly at the feature pixel when being reproduced.
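In code, Eq. 5.2 reads as follows, under the (1 + steps)^(-1) form implied by the remark that the maximum fitness is one for an agent born directly on a feature pixel:

def fitness(steps_before_reproduction, found_stimulus):
    """Eq. 5.2: 1/(1 + steps) when a triggering stimulus was found, else 0.
    An agent reproduced directly onto a feature pixel (0 steps) gets the
    maximum fitness of 1."""
    if not found_stimulus:
        return 0.0
    return 1.0 / (1.0 + steps_before_reproduction)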
5.4.4.5 Direction Adaptation
What follows describes the updating mechanism for an autonomous agent to compute its diffusion and self-reproduction direction vectors. By definition, a direction vector for a certain behavior specifies an array of probabilities of success if respective directions are chosen for that behavior.
Assume that a parent agent ω^(g) of generation g produces a set of agents {ω_j^(g+1)}. This set will further produce offspring of generation g+2, denoted as {ω_j^(g+2)}, if any of them encounters a triggering condition in the environment. Thus, the directions for diffusion and reproduction by agent ω^(g) are determined from the directions of the selected agents from {ω_j^(g+1)} and {ω_j^(g+2)}. The selection criterion is based on their fitness values, as computed using Eq. 5.2. Specifically, the probability values associated with direction η for diffusion and for self-reproduction by agent ω^(g) can be derived, respectively, as follows. For all ω ∈ {ω_j^(g+1)} ∪ {ω_j^(g+2)} with F(ω) > 0, compute:

P(η ∈ θ)_ω = N_η / Σ_i N_i    (5.3)

P(η ∈ τ)_ω = M_η / Σ_i M_i    (5.4)

where
θ: the directions for diffusion,
τ: the directions for self-reproduction,
N_i: the number of agents that diffused to stimuli in direction i,
M_i: the number of agents reproduced in direction i.
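Eqs. 5.3 and 5.4 both reduce to normalizing, per direction, the number of positive-fitness offspring that moved (or were reproduced) in that direction. A sketch, with directions indexed 0-7 for the 8-neighborhood; the uniform fallback for the no-evidence case is an assumption:

def update_direction_vector(offspring):
    """Eqs. 5.3/5.4: estimate a probability for each of the 8 directions
    from the directions taken by positive-fitness offspring.
    `offspring` is a list of (direction_index, fitness) pairs."""
    counts = [0] * 8
    for direction, fit in offspring:
        if fit > 0:                      # only fit agents are selected
            counts[direction] += 1
    total = sum(counts)
    if total == 0:
        return [1.0 / 8] * 8             # no evidence: fall back to uniform
    return [c / total for c in counts]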
5.4.5 Experimental Studies
The preceding section has provided a formal model (e.g., rules) for agent behavioral self-organization. Now let us examine how such agents are applied in a digital image environment to extract interesting image features. Specifically, we discuss a typical image-processing experiment on feature tracking. Figure 5.2(a) illustrates a sequence of time steps over which the location of a T-shaped feature region changes. In this example, self-organizing agents may detect the borders of this region at one time step t, as shown in Figure 5.2(b), but lose some of the detected feature pixels at another time step, t + 1, simply because the feature region has moved to a new location. Thus, some of the agents previously selected at t may no longer be selected at time t + 1, as illustrated in Figure 5.2(c).

Fig. 5.2 (a) An example dynamic environment in which a T-shaped object moves in discrete space and time. (b) Assume that agents have been selected in the T-shaped environment at time step t. (c) At time step t + 1, the T-shaped object moves to a new location, causing previously high-fitness agents to become lower-fitness.

When this change
in the agent environment occurs, the low-fitness agents will actively diffuse to their adjacent locations, self-reproduce offspring agents as soon as some feature pixels are encountered again, and update their behavioral vectors accordingly in order to maximize their fitness in the new environment (i.e., local fitness optimization). As a result, the agents can quickly figure out the right diffusion and self-reproduction directions for tracking the moving target at the subsequent time steps.
5.5 Self-Organized Motion in Group Robots
Self-organized motion is concerned with the problem of how to effectively generate emergent motion (e.g., navigation) behaviors in a group of robots when complete information about the robot environment is not available or too costly to obtain. This issue is particularly relevant if we are to develop robust group robots that can work collectively and adaptively on a common task, even though each of them only senses, and hence reacts to, its environment locally.
5.5.1 The Task of Group Robot Navigation and Homing
In the task of distributed robot navigation and homing, we assume that a commonly-shared goal, i.e., a set of points p ∈ L ⊂ R^n, is given, where L satisfies certain constraints ρ. The constraints may be mathematically expressed as follows:

ρ = { p : (∂ξ/∂x)|_{ξ = x'a} = 0 }    (5.5)
In addition to the goal locations, we also delineate a closed area as the robot environment, which is denoted as follows:

S = { (x, y) | x_min ≤ x ≤ x_max, y_min ≤ y ≤ y_max },  x_min, x_max, y_min, y_max ∈ R    (5.6)

Note that a robot environment can be of various shapes, e.g., the enclosed area of a circle. The only requirement is that the environment be closed and connected. For the sake of illustration, we will consider the environment as being a convex set in R^n.
5.5.1.1 Performance Criteria
When distributed agents (i.e., group robots) with different behavioral rules are dispatched in an environment, what kind of collective behaviors, with respect to the given goal locations in the given environment, can be expected or self-organized by the agents? Before we address this question in detail, let us first define two notions.
Definition 5.10 Suppose that there is a group of N agents in environment S. The shared goal locations for the agents are specified by L, and the attributes of the agents are defined in A. We say that the agents with attributes A following behavioral self-organization rules V can attain their goal L in S iff the agents can reach goal L after their interaction with the environment. That is,

{S, I, A, V} →_{t→∞} {S, L, A, V}    (5.7)

where I denotes the initial distribution of the agents. Otherwise, we say the goal of the agents is unreachable.
The above notion can also be defined in terms of probability:

Definition 5.11 If

p({S, I, A, V} →_{t→∞} {S, L, A, V}) = 1    (5.8)

then we say the goal of the agents, L, is reachable with probability 1.
Our task is to create proper self-organizing rules (i.e., local motion controllers) for the agents that would enable them to move from their current positions toward the given goal locations L. In our system, the position of an agent will change according to its velocity. Note that here the velocity is a vector, representing both the direction and the magnitude of changes in the agent position. The velocity of the agent will change based on observations from the environment; that is, the velocity will be updated according to certain rules triggered by the signals received from other local neighboring agents. In addition to such local signals, the agent will also receive global performance feedback, denoted by gB(t), that corresponds to an overall group performance evaluation calculated and sent by a higher-level agent.
5.5.2 An Overview of the Multi-Agent System

5.5.2.1 The Attributes of Agents
For a group of robots A_i, where i is numbered from 1 to N, the attributes of A_i, i.e., A, are defined as follows:

a_1 ← classification identifier
a_2 ← x-coordinate
a_3 ← y-coordinate
a_4 ← x-moving step
a_5 ← y-moving step
a_6 ← velocity-change coefficient
a_7 ← visible-area shape
a_8 ← visible-area depth in the x-direction
a_9 ← visible-area depth in the y-direction    (5.9)
Based on the above attributes, we can define the position and velocity of A_i (i = 1, 2, ..., N) as x = (a_2, a_3) and v = (a_4, a_5), respectively. d = (a_7, a_8, a_9) specifies the visible area of an agent, where a_7 denotes a shape primitive (e.g., a circle or a square) and (a_8, a_9) represent the depths in the x- and y-directions, respectively. In our work, it is assumed that each step motion of an agent does not exceed its visible area. Furthermore, a_6 = (α, β, γ, λ) reflects the change of agent velocity; the specific definition of a_6 will be given later.

We assign the classification identifier a_1 of an individual agent A_i according to its distance from goal L. The farther A_i is away from L, the smaller its a_1 will be. In our present work, the classification identifier of agent A_i is defined in the following manner:

a_1 = 22, if D(A_i, goal L) = 0;
a_1 = 12, if D(A_i, goal L) < visible-range-of-A_i;
a_1 = 6,  if ∃A_j, D(A_j, A_i) < visible-range-of-A_i;
a_1 = 3,  if ∄A_j, D(A_j, A_i) < visible-range-of-A_i.    (5.10)

where D(·, ·) denotes a distance measure. Note that the constants 22, 12, 6, and 3 are used just as labels, and thus can be set arbitrarily, as long as they are in descending order. Depending on the value of a_1, that is, A_i's closeness to L, we can divide the agents into four groups named G_0, G_1, G_2 and G_3:

A_i ∈ G_0 iff a_1 = 22;
A_i ∈ G_1 iff a_1 = 12;
A_i ∈ G_2 iff a_1 = 6;
A_i ∈ G_3 iff a_1 = 3.

As agents move in environment S, their distances to goal L will change dynamically. Thus, the classification identifiers (the a_1's) of the agents, namely their group identification, will also change accordingly. In our present work, the initial attributes and spatial distribution of the agents are set as follows:

a_1 = 3, i.e., all the agents are initially set to be in group G_3;
(a_2, a_3) = Rand{(x_min, y_min), (x_max, y_max)};
(a_4, a_5) = Rand{±(·)/σ, ±(·)/σ},  σ > 1.    (5.11)
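Eq. 5.10 and the initialization of Eq. 5.11 translate into the sketch below. The distance measure, the velocity bound vmax (the magnitude term of Eq. 5.11 is garbled in the source), and the tuple encodings are assumptions of this sketch.

import math
import random

def classify(agent, goal_points, neighbors, visible_range):
    """Eq. 5.10: assign the classification identifier a1 from the agent's
    distance to goal L and the visibility of other agents."""
    dist = min(math.dist(agent, g) for g in goal_points)
    if dist == 0:
        return 22                                    # group G0
    if dist < visible_range:
        return 12                                    # group G1
    if any(math.dist(agent, n) < visible_range for n in neighbors):
        return 6                                     # group G2
    return 3                                         # group G3

def init_agent(xmin, xmax, ymin, ymax, sigma=2.0, vmax=1.0):
    """Eq. 5.11: a uniformly random initial position and a small random
    initial velocity scaled by 1/sigma, with sigma > 1; every agent starts
    with a1 = 3, i.e., in group G3."""
    pos = (random.uniform(xmin, xmax), random.uniform(ymin, ymax))
    vel = (random.uniform(-vmax, vmax) / sigma,
           random.uniform(-vmax, vmax) / sigma)
    return pos, vel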
5.5.2.2 Behavioral Self-Organization
The dynamical motion of each agent will be governed by the following behavioral self-organization steps:

x(t+1) = x(t) + v(t)    (5.12)

v(t+1) = α v(t) + β c(t) + γ r̃(t) - λ sign(v(t)) sat(gB(t))    (5.13)

where r̃(t) is a random excitation signal with zero mean and finite variance; gB(t) is the smoothed gradient of the average distance from goal L; sign(v(t)) is a vector, each of whose elements corresponds to the sign of a component of v(t); and sat(·) is a saturation function.

As shown in Eq. 5.13, the motion of an agent incorporates the collective influence of the group that it belongs to. Such a collective influence is reflected in the term c(t), which is calculated for the four groups as follows:

c_i(t) = 0,    if A_i ∈ G_0 {a_1 = 22}    (5.14)

c_i(t) = ( Σ_{A_j ∈ G_0{a_1 = 22}} v(A_j) ) / Number_of_A_j,    if A_i ∈ G_1 {a_1 = 12}    (5.15)
where the agents A_j denote those inside the visible range of agent A_i. As may be noted above, there is no collective effect for the agents belonging to G_0. Agents in G_1 will be pulled toward G_0. The velocities of the agents belonging to G_2 will be self-organized from those of the agents in G_1, if the G_1 agents are visible. Agents in G_3 will look around and take the collective velocity into account when they make a decision about their moving direction and speed.
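One update step of Eqs. 5.12-5.15 in Python. Here sat(·) is taken to be clipping to [-1, 1] and r̃(t) a zero-mean Gaussian; both concrete choices, and the restriction of the collective term to the G0/G1 cases, are assumptions of this sketch (the G2/G3 collective terms follow the prose above).

import random

def sat(x, limit=1.0):
    """A saturation function: clip x to [-limit, limit]."""
    return max(-limit, min(limit, x))

def step(pos, vel, c, gB, alpha, beta, gamma, lam, noise=0.1):
    """Eqs. 5.12/5.13, applied per coordinate:
       x(t+1) = x(t) + v(t)
       v(t+1) = alpha*v + beta*c + gamma*r - lam*sign(v)*sat(gB)."""
    new_pos, new_vel = [], []
    for x, v, ci in zip(pos, vel, c):
        r = random.gauss(0.0, noise)               # zero-mean excitation r(t)
        sign = 1.0 if v >= 0 else -1.0
        new_pos.append(x + v)                      # Eq. 5.12
        new_vel.append(alpha * v + beta * ci + gamma * r
                       - lam * sign * sat(gB))     # Eq. 5.13
    return tuple(new_pos), tuple(new_vel)

def collective_velocity(group_id, g0_velocities):
    """Eqs. 5.14/5.15: no collective term for G0 agents; a G1 agent averages
    the velocities of the G0 agents it can see."""
    if group_id == 0 or not g0_velocities:
        return (0.0, 0.0)
    n = len(g0_velocities)
    return (sum(v[0] for v in g0_velocities) / n,
            sum(v[1] for v in g0_velocities) / n)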
5.5.3 Local Memory-Driven Behavioral Selection and Global Performance-Driven Behavioral Learning
The localized memory-driven behavioral selection of an agent is embedded in its motion control, which corresponds to the previously-acquired dynamical behavior of the agent. This is signified in the term v(t) of Eq. 5.13. The constant α quantifies the extent to which the previous experience will be taken into account. Further, we define one important feedback signal, B(t), where t denotes a discrete time instant. B(t) is the averaged distance of all agents from goal L. We let the B(t) signal be broadcast to all agents. This is the only knowledge that the agents have about the overall performance of the whole system. In our work, the gradient of B(t), denoted by gB(t), rather than B(t) itself, will be incorporated. This is because, statistically speaking, the gradient of B(t) gives information on whether or not the agents tend to move toward goal L. In order to avoid high-frequency noisy vibration, the gradient gB(t) will be smoothed before it is used. The specific definitions are as follows:
B(t) = ( Σ_i D(A_i, goal L) ) / Number_of_A_i    (5.18)

gB(t) = 0.4 ∇B(t) + 0.3 ∇B(t-1) + 0.2 ∇B(t-2) + 0.1 ∇B(t-3)    (5.19)

∇B(t) = 0.5 (B(t) - B(t-1)) + 0.3 (B(t-1) - B(t-2)) + 0.2 (B(t-2) - B(t-3))    (5.20)
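The feedback pipeline of Eqs. 5.18-5.20 in code, computing the averaged distance B(t), its locally weighted gradient, and the smoothed gradient gB(t) that is broadcast to all agents:

def averaged_distance(distances):
    """Eq. 5.18: B(t), the average distance of all agents from goal L."""
    return sum(distances) / len(distances)

def gradient_B(B):
    """Eq. 5.20: a weighted difference over the last four B values;
    B is a history list ordered oldest-to-newest with len(B) >= 4."""
    return (0.5 * (B[-1] - B[-2]) + 0.3 * (B[-2] - B[-3])
            + 0.2 * (B[-3] - B[-4]))

def smoothed_gradient(grads):
    """Eq. 5.19: gB(t), a smoothing of the last four gradient values to
    suppress high-frequency vibration (oldest-to-newest ordering)."""
    return (0.4 * grads[-1] + 0.3 * grads[-2]
            + 0.2 * grads[-3] + 0.1 * grads[-4])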
5.5.4 Experimental Studies
We have conducted a series of simulation-based experiments to examine the goal-attainability of distributed group robots as a multi-agent system, with respect to different sets of parameters for Eqs. 5.12 and 5.13.

5.5.4.1 Experimental Design

In the simulations, agent environment S is a two-dimensional grid-like search space of size 300x300. The shared goal locations of the agents are those that satisfy a linear equation of the form x/a + y/b = 1. The visible range of each agent is defined as a circular surrounding region of radius 20. In our experiments, we will examine the effects of local memory-driven selection and global performance-driven learning in behavioral self-organization on the goal-attainability of the system. These two terms are weighted in the behavioral self-organization algorithm of Eq. 5.13 by coefficients α and λ, respectively. In addition, we will also take a look at the dynamics of the system with different numbers of agents N. We will conduct our simulations by changing one parameter at a time while keeping the other parameters unchanged. The initial position and velocity distributions of the agents will be randomly set in each empirical experiment in order to obtain statistical conclusions. Each simulation will be executed for 3000 steps; snapshots at time steps 1, 2000, and 3000 will be recorded and plotted. A sketch of this protocol follows.
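In that sketch, one parameter (here α) varies while the others stay fixed, each run lasting 3000 steps. The per-step update, simulate_one_step, is a caller-supplied function that would be assembled from the earlier sketches (classification, collective velocity, Eqs. 5.12/5.13, and the B(t) feedback); its name and signature are assumptions.

def run_experiment(simulate_one_step, alphas, n_agents=58, lam=40, steps=3000):
    """Run one batch of Experiment 1: for each alpha, simulate `steps`
    time steps and record the trace of the averaged distance B(t).
    `simulate_one_step(agents, alpha, lam)` must advance every agent once
    and return the current B(t)."""
    traces = {}
    for alpha in alphas:
        agents = [init_agent(0, 300, 0, 300) for _ in range(n_agents)]
        traces[alpha] = [simulate_one_step(agents, alpha, lam)
                         for _ in range(steps)]
    return traces

# For example: traces = run_experiment(my_step, [0.1, 0.35, 0.7, 0.95])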
Experiment 1: Varying Degree of Local Memory-Driven Behavioral Selection
In this experiment, the influence of agent memory weighted by coefficient a in Eqs. 5.12 and 5.13 on the convergence rate of averaged distance of all agents from goal C will be investigated. This coefficient may be viewed as a damping factor in the dynamics of an individual agent that reveals the degree of agent's self-confidence in determining its moving direction and speed. Figure 5.3 presents the convergence results in the following four situations: We fix the number of agents involved (58), the global performance feedback strength A (40), and the visible range of agents (20), while allowing a to vary from 0.1, 0.35, 0.7 to 0.95, respectively. Three curves are
Fig. 5.3 The dynamics of a multi-agent system with 58 agents. All the parameters are kept fixed while only the previously-acquired behavior factor α varies; it takes the value of 0.1, 0.35, 0.7, and 0.95, respectively, in (a), (b), (c), and (d). The figure shows the relationship between convergence speed ξ and the previously-acquired behavior factor α. In the four sub-figures, the line that gradually increases to 100 corresponds to the percentage of agents in group G_0 among all agents. The thin fluctuating descending line corresponds to the averaged distance from the goal of all agents not in G_0. The thicker descending line gives an exponential function that is used to fit the averaged distance.
Three curves are plotted in each of the four sub-figures. The line that gradually increases to 100 corresponds to the percentage of agents in group G_0 among all agents. The thin fluctuating descending line corresponds to the averaged distance from the goal of all agents not in G_0. The thicker descending line gives an exponential function used for fitting the averaged-distance curve. As can be noted from Figure 5.3, a lower memory level (α = 0.1) will cause the system not to attain its goal, while a higher memory level (α = 0.95) will lead the system to attain the shared goal locations very rapidly. Table 5.1 summarizes the convergence rates corresponding to various α values, obtained in three separate test runs.

        α = 0.1*    α = 0.35    α = 0.70    α = 0.95
Test 1  0.001638    0.003359    0.005018    0.007168
Test 2  0.002600    0.002572    0.003359    0.009100
Test 3  0.003359    0.002228    0.004100    0.006431

Table 5.1 Relationship between ξ (the convergence rate, i.e., the decaying rate of the averaged distance) and α (the damping level of local memory, a self-confidence measurement of the agents). * indicates that some of the agents failed to reach the goal locations; however, the dynamics of their averaged distance still obeys the exponentially-decaying law.
5.5.4.3 Experiment 2: Varying Degree of Global Performance-Driven Behavioral Learning

Figure 5.4 presents three snapshots of the agents at step 1 (upper-left plot), step 2000 (upper-right plot), and step 3000 (lower-left plot), respectively. It also shows the convergence of the agents toward the shared straight-line goal locations, in terms of the changes in the percentage of G_0 agents and the averaged distance of all agents from the goal (lower-right plot). Note that at step 1000, only a few agents have not reached the goal.

Fig. 5.4 The dynamics of a multi-agent system with 127 agents, with global performance-driven learning (λ = 60). Agents in G_0 (they have arrived at the goal locations) are labeled with the symbol '+'. The multi-agent system in this case is goal-attainable and its convergence rate is very high. The smoothed plot of the average distance of the agents away from goal L is given in the lower-right sub-figure.

Recall that the term λgB(t) in the behavioral self-organization algorithm of v(t+1) serves as a global performance feedback. In the above simulation, we have incorporated a global performance feedback signal by setting λ = 60. As shown by the results, the agents in such a case are goal-attainable. The global performance-driven learning is necessary to guarantee a fast group homing behavior. Such a necessity is most apparent from the following simulation test, in which the global performance-driven learning is removed. The results of this test are shown in Figure 5.5. We note that many agents, except those originally located near the line, are merely wandering.

Figure 5.6 presents four sets of simulation results that illustrate the general effects of global performance feedback λ on the convergence of averaged distance B(t). The λ values considered are 10, 20, 40, and 60, respectively, while memory coefficient α is kept constant at 0.7 and the total number of agents is 58. From these results, we can observe that if λ is non-zero, B(t) → 0 as t → ∞. Generally speaking, a larger λ value leads to a faster convergence speed, i.e., B(t) at t = 3000 is lower. Nevertheless, a quantitative relationship between the λ value and the resulting B(t) value at t = 3000 still remains to be explored.
Fig. 5.5 The dynamics of a multi-agent system with 127 agents, without global performance feedback (λ = 0). In this case, many agents are far away from the goal and keep wandering.

5.5.5 Discussions

Our simulations have shown that if global performance-driven learning is incorporated, the agents will move faster toward goal L and successfully attain
their shared goal (they are goal-attainable). On the other hand, if there is no global performance feedback λ sign(v(t)) sat(gB(t)) involved, the agents in G_3 are not goal-attainable from a practical point of view. If we use -λ sat(gB(t)) to modify the velocity of the agents according to Eq. 5.13, all the agents pertaining to G_0, G_1, and G_2 will be goal-attainable. Concerning the agents in G_3 when λ > 0, although our simulations support the goal-attainability conclusion, we have not analytically examined the goal-attainability of the agents in group G_3. What we can say is that it is very likely that they will be goal-attainable, due to the existence of the global performance feedback term -λ sign(v(t)) sat(gB(t)).

Fig. 5.6 The effects of the global performance feedback factor λ on the convergence of the averaged distance B(t) in the multi-agent system.

There are other factors affecting the convergence rate of the agents as they move toward the target locations. In our simulations, it has been found
that increasing the coefficient of the random term, γ, or just increasing the variance of the random signal r̃(t), has only little influence on the convergence rate ξ. It simply increases the variation of the curves at different time steps; the curves vibrate with a larger magnitude. A nonzero mean of r̃(t) will cause the agents to move in a fixed direction until they are reflected back by the boundary of the robot environment S.
5.6 Related Work

5.6.1 Image Feature Detection
In robot vision, detecting geometric features such as regions, edges, curves, corners, and borders can greatly facilitate the interpretation of scenes. Many theories and algorithms have been proposed and applied in the fields of computer vision and image processing. For instance, Liow [13] proposed an extended border tracing technique that combines the operations of region finding and closed contour detection. Alter and Basri [1] applied the so-called Salient Network method for extracting salient curves, and noted that this method could suffer from the problem of failing to identify any salient curve other than the most salient one (according to their proposed saliency measure). Lee and Kim [12] presented a method of extracting topographic features directly from a grey-level character image, without calculating eigenvalues and eigenvectors of the underlying image intensity surface. The method efficiently computes the directions of principal curvature. Maintz et al. [16] investigated the problem of evaluating ridge-seeking operators for multimodality medical image matching. They constructed various ridge measures related to isophote curvature in an attempt to identify useful convolution operators for CT/MRI matching of human brain scans. With conventional techniques for image feature identification, grid template-like look-up tables [13] and/or models [3; 19] are used to determine the existence of any features by tracing from a current pixel or region to its neighbors. The main disadvantage of this approach is that all the possible situations must be carefully analyzed and exhaustively searched.
5.6.2 Learning in Group Robots
Fukuda and Iritani [6] proposed a mechanism for modeling group cooperative behaviors among decentralized autonomous robots, called CEBOT (i.e., Cellular Robots). Their work simulated the generation of group behaviors based on a globally stable attractor and the identification of new group behaviors based on bifurcation-generated new attractors. Mataric [18; 17] studied the problem of group behaviors such as coordination among robots, and developed a group behavioral learning method in which heterogeneous reward function-based reinforcement learning (RL)
was applied to synthesize collective behaviors, such as flocking, foraging, and docking by means of direct/temporal summation and switching of some basic behaviors.
5.6.3 Adaptive Self-Organization
Adaptation is concerned with applying the computational models of evolutionary processes (e.g., [2]) either to achieving intelligent agent behaviors, where intelligence is measured in terms of the agent's ability to contribute to its self-maintenance at the genetic, structural, individual, as well as group levels [21], or to solving real-life computation-intensive engineering problems, such as numerical optimization. Fogel [5] has provided a thorough treatment of the foundation and scope of this field (also see [7; 8]). Adaptive self-organizing agents as applied to digital image processing is a newly-explored area of research that studies the emergent behaviors in a lattice of finite automata in which agents react locally according to a set of behavioral rules [4; 9; 10; 11; 14; 15]. Each of the agents may be viewed as a learning automaton [20]; the probabilities of individual actions are updated whenever the output of a certain action is observed and evaluated using a performance criterion.
5.7 Concluding Remarks
In this chapter, we have investigated how to apply a multi-agent approach to tackling robot vision and group motion problems. The key to the emergence of collective agent behavior to solve those problems lies in the utilization of bottom-up, self-organizing rules by autonomous agents. We have presented and demonstrated an approach to image feature searching and tracking that utilizes adaptive self-organizing agents. In our approach, an adaptive agent, being a distributed computational entity, resides in the two-dimensional lattice of the digital image and exhibits a number of reactive behaviors.

Also presented in this chapter is a self-organized motion approach applicable to cases where a group of distributed robots is required to navigate in an unknown environment. While providing the detailed formulations and self-organizing rules for each individual robot, i.e., an agent in the self-organized multi-agent system, we have also carried out various case studies. It is evident from our simulations that if a global performance feedback signal is introduced, the distributed agents can quickly navigate toward the shared common goal L.
Acknowledgements

The author wishes to acknowledge the support provided by Hong Kong Baptist University throughout this research project. Special thanks go to Mr. Y. Lei for his assistance and help in part of the experimentation.
Bibliography
[1] T. D. Alter and R. Basri. Extracting salient curves from images: An analysis of the saliency network. Memo 1550, MIT AI Lab, 1995.
[2] W. Banzhaf and F. H. Eeckman, editors. Evolution and Biocomputation: Computational Models of Evolution. Springer-Verlag, Berlin, 1995.
[3] M. Barzohar and D. B. Cooper. Automatic finding of main roads in aerial images by using geometric-stochastic models and estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(7):707-721, 1996.
[4] F. Dellaert and R. D. Beer. Toward an evolvable model of development for autonomous agent synthesis. In R. A. Brooks and P. Maes, editors, Artificial Life IV: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, pages 246-257. The MIT Press, Cambridge, MA, 1994.
[5] D. B. Fogel. Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press, Piscataway, NJ, 1995.
[6] T. Fukuda and G. Iritani. Construction mechanism of group behavior with cooperation. In Proceedings of the 1995 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 535-542, Pennsylvania, USA, 1995.
[7] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Publishing Company, Reading, MA, 1989.
[8] J. H. Holland. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, 1975.
[9] C. G. Langton. Self-reproduction in cellular automata. Physica D, 10:135-144, 1984.
[10] C. G. Langton. Studying artificial life with cellular automata. Physica D, 22:120-140, 1986.
[11] C. G. Langton. Artificial life. In Artificial Life: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Los Alamos, New Mexico, pages 1-47. Addison-Wesley Publishing Company, Inc., Redwood City, CA, 1988.
[12] S. Lee and C. Yi. Assemblability evaluation based on tolerance propagation. In Proceedings of the 1995 IEEE International Conference on Robotics and Automation, pages 1593-1598, 1995.
[13] Y. Liow. A contour tracing algorithm that preserves common boundaries between regions. CVGIP - Image Understanding, 53(3):313-321, 1991.
[14] M. W. Lugowski. Computational metabolism: Towards biological geometries for computing. In Artificial Life: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Los Alamos, New Mexico, pages 341-368. Addison-Wesley Publishing Company, Inc., Redwood City, CA, 1988.
[15] P. Maes. Modeling adaptive autonomous agents. Artificial Life, 1(1-2):135-162, 1994.
[16] J. B. A. Maintz, P. A. van den Elsen, and M. A. Viergever. Evaluation of ridge seeking operators for multimodality medical image matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):353-365, 1996.
[17] M. J. Mataric. Reinforcement learning in the multi-robot domain. Autonomous Robots, 4(1):73-83, 1997.
[18] M. J. Mataric and D. Cliff. Challenges in evolving controllers for physical robots. Robotics and Autonomous Systems, 19(1), 1996.
[19] N. Merlet and J. Zerubia. New prospects in line detection by dynamic programming. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):426-431, 1996.
[20] K. S. Narendra and M. A. L. Thathachar. Learning Automata. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1989.
[21] L. Steels. Intelligence - dynamics and representations. In L. Steels, editor, The Biology and Technology of Intelligent Autonomous Agents, pages 72-89. Springer-Verlag, Berlin, 1995.
Chapter 6
Valuation-Based Coalition Formation in Multi-Agent Systems
Stefan J. Johansson Department of Software Engineering and Computer Science, Blekinge Institute of Technology, Sweden
6.1 Introduction
The notions of coalitions, norms and agents raise a lot of interesting questions. What makes agents form coalitions? When are new agents in a Multi-Agent System (MAS) considered to be members of a coalition? And when does a coalition think it might be time for certain agents to leave? Can we design agents in such a way that they will continuously improve and strengthen the coalitions that they are part of? Does the size of the coalition matter? How are the norms of a coalition updated? Not all of these questions will be treated here, but we will try to provide some thoughts on issues such as the value of a coalition having a certain agent as a member, continuous degrees of membership, and whether cheating is possible in such models or not. The main contribution of this work is a discussion of design principles for coalition formation based on some (of many) possible value-based models of choice of actions. The question of whether to cooperate or not is not new. Game theorists such as Lloyd Shapley discussed the matter of values and alternative costs in n-person games in the fifties [14]. Shapley argued that the value of a cooperating agent is directly associated with the alternative cost of it leaving the coalition of cooperators (i.e. its Shapley value), and thus, it would be fair for that agent to have a share of the profit that is proportional to its Shapley value. 15 years later, Owen showed that the Shapley values cannot be interpreted as a measure of power (i.e. the ability to bargain) of the agents [10].
Others, for example Conte and Paolucci, have tried to model situations in which social control may be reached, but where neither ultimate, utilitarian, nor normative strategies are optimal in all situations [3]. The art of building states in which people voluntarily (or by force) cooperate for the best of the state, even though it implies paying high taxes, is discussed by Iwasaki et al. [6]. Klusch and Vielhak have discussed negotiations in coalition formation and implemented the COALA environment for simulating it [8; 18].
6.1.1 Examples of coalitions
Let us give examples of some situations in which there are both agents and explicitly or implicitly stated coalitions: A person normally has several coalitions with other people he knows. His family, his employer, his neighbors, and his friends all have some expectations on how he should behave, explicitly or implicitly stated in their common norms. For instance, a forgotten birthday may result in a weakened value in the coalition with the forgotten person, and receiving help from a neighbor will hopefully increase the value of the neighbor from the perspective of the person. In the same way, we may expect computerized agents to have explicit or implicit expectations of getting paid for the services they provide. Micro-payments and similar fine-grained ways to describe debts could be of use here. Automated multiple multi-commodity markets have in some sense caught the essence of mutual valuations. If an agent is unable to find what it is looking for (at the right price) in one market, it proceeds to the next one. If the number of potential buyers has a positive impact on the utilities of the sellers, then the sellers will try to make themselves as valuable as possible for the buyers, and the buyers go to the marketplaces where the most valuable offers are available. The agents may of course include their probabilities of actually getting their hands on one of the cheap offers. Therefore, it may be the case that the market with the lowest price is not the market an agent eventually prefers when, e.g., expected delivery time etc. are taken into account.
All three examples show how coalitions of agents (the companies, the persons known and the markets) affect the choice of actions of individual agents (consultants, the person herself and the agent at the market), as well as how the individual agent may have an impact on the norms and the membership of others in the coalition. Given these examples, we will describe a theory of mutual valuation between agents and their coalitions that is able to, at least in theory, model the situations above. We take the approach of considering membership in coalitions as something continuous, where the degree of membership is decided through how the agent values the coalition as well as how the coalition values the agent. We will get back to the question of how to calculate the degree of membership later in the chapter, where we give one model (of many possible ones) of such a calculation. Of course, this leads to the possibility for a rational agent to leave a coalition, or to join more than one coalition, if there are other more tempting offers, as discussed for example by Sandholm [13]. Our approach also opens up the possibility for an agent to believe it is 42% part of one coalition, even if the coalition as such does not think that the agent is part of it to more than 17%, and for dynamic continuous coalitions to evolve both over time and in strength. Based on this point of view, we would like to make the following definition of the term coalition:

Definition 1 A coalition is a tuple $(N, M)$ consisting of a set of norms $N$ and a set of degrees of membership $M = \{m_1, \ldots, m_n\}$, where each $m_i$ is a pair $(a_i, d_i)$ in which $a_i$ is a unique agent id and $d_i$ is a number describing to what degree the agent is part of the coalition.

The agents are also supposed to be rational, or at least boundedly rational, in their decision of what to do, i.e. they will, as far as they know, do their best to reach the goals that they are designed to achieve. Such a point of view has been discussed and criticized for example by Doyle [4]. One of the advantages of the approach is the possibility of characterizing types of behaviors at a knowledge level, rather than just enumerating them, making it easier to relate them to relevant concepts in other sciences. However, in practice no agent is skilled enough to make truly rational choices, and even if it were, the choices of actions it makes are rational given a set of conditions that in turn may be inaccurate and dynamically changing. We will take a pragmatic point of view, claiming that boundedly rational agents
will do their best regardless of their knowledge, i.e. they will choose the best action, given their limited amount of knowledge and sparse reasoning capabilities.
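To make Definition 1 concrete, here is a minimal sketch in Python; the class layout and field names are our own illustration, not part of the chapter's formal model:

```python
from dataclasses import dataclass, field

@dataclass
class Coalition:
    """A coalition as in Definition 1: a set of norms N and a set of
    degrees of membership M, here kept as a map from agent id to degree."""
    norms: set = field(default_factory=set)          # the norm set N
    membership: dict = field(default_factory=dict)   # M: agent id -> degree in [0, 1]

    def degree(self, agent_id: str) -> float:
        # Agents without an entry in M are members to degree 0.
        return self.membership.get(agent_id, 0.0)

# The coalition's and the agent's views are kept separately, so an agent
# may believe it is 42% part of a coalition that only counts it as 17%.
c = Coalition(norms={"pay-your-debts"}, membership={"a1": 0.17})
agent_view = {"c": 0.42}   # a1's own (possibly diverging) estimate
```

Keeping the agent's own estimate outside the coalition's records is exactly what allows the 42%-versus-17% disagreement described above.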
6.1.2 Outline of the chapter
In the next section, we will make some definitions concerning agents, their actions, the consequences, probabilities and so on. Sec. 6.3 will introduce two different values, $V^i_j$ and $V^j_i$: the value of coalition $c_j$ for agent $a_i$, and the value of agent $a_i$ for coalition $c_j$, respectively. It also proposes a set of recurrence relations that may work as a simple model for updating these values, and refines them to employ an adjustable degree of memory loss, or forgiveness. Finally, we draw some conclusions and point out possible future trajectories of the work.
6.2 Agents and Actions
An agent $a_i \in A = \{a_1, \ldots, a_n\}$ may at each point in time choose to perform one of the actions $b_1, \ldots, b_m$, chosen from the set of possible actions $B$. We refer to the action taken by $a_i$ at time $t$ as $\beta(i, t)$, i.e. $\beta : A \times T \to B$. Three things are worth taking notice of in this description. Not doing anything is also a decision of what to do, hence an action and thus in $B$. $|B| = m$ may or may not be finite, but for reasons of simplicity we assume that it is finite and that the agents of the system may be unable to perform all actions in $B$. The actions that an agent performs may lead to intended or unintended consequences, i.e. partial descriptions of states in the environment. Regardless of whether the agent acted with the intention to cause a certain consequence or not, we will assume that the causal relations between actions and consequences are describable, at least on an a posteriori basis. Of course, we could use the notion of states instead of consequences, but by letting the consequences be partial descriptions of the environment, the current state (as interpreted by the agent) is the set of consequences that the agent believes are true.*

*For sensitive persons, the word believe may be exchanged for another word describing the data known by the agent.
Each action $b_i$ will, with probability $p^i_j$, lead to a consequence $q_j \in Q = \{q_1, \ldots, q_r\}$; thus for each action $b_i$ there is a consequence probability vector, here denoted $p^i$, describing the probabilities of each of the consequences. Also, each consequence $q_j$ is associated with a vector $p_j$ of the probabilities that each action will lead to it. In fact, all probabilities can be described in a matrix, where the $p^i$ make up the columns and the $p_j$ make up its rows. Each of these consequences is better or worse for each of the agents in the environment, and in order to capture the effect of combined consequences, we associate each of the agents $a_i$ with a utility function $u_i : 2^Q \to \mathbb{R}$ that maps each combination of consequences (i.e. each state of the MAS) to a utility value for agent $i$. We will in the rest of this text assume that an agent has full knowledge about its own utility function and the consequence probabilities of all actions that it can perform (although constraints on time, memory, etc. may limit its capability of making the most rational actions, cf. boundedly rational agents [2]), but that it may be unaware of what the utilities and consequence probabilities of other agents look like.
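As a minimal illustration (with made-up numbers, and restricting the set-valued utility function $u_i : 2^Q \to \mathbb{R}$ to single-consequence states for brevity), the consequence probabilities and the expected utility of each action can be computed as follows:

```python
import numpy as np

# Consequence probability matrix: P[i, j] = probability that action b_i
# leads to consequence q_j. Row i is thus the vector p^i of action b_i
# (the chapter's matrix is the transpose, with the p^i as columns).
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3]])

# Illustrative utilities of agent a_i for the single-consequence states
# {q_1}, {q_2}, {q_3}; the general model allows any subset of Q.
u = np.array([1.0, 0.2, -0.5])

expected_utility = P @ u          # one value per action b_i
best_action = int(np.argmax(expected_utility))
print(expected_utility, best_action)
```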
6.2.1 Coalitions and Norms
There are several ways of looking at norms and coalitions (in which agents commit themselves to certain norms). Let us start with the coalitions. One possibility is that they are static, a priori groups of agents. This may be the situation when agents are set to solve a central problem in a distributed way, as in distributed problem solving. They do their part of the task without questioning their role. In this case, the norms of the coalitions are never questioned. Durfee proposes a time scale for commitments, in which these static groups (with static commitments) correspond to the permanent commitments [5]. However, using Durfee's terminology, a dynamic environment requires dynamic norms and coalitions, and we must change our position on the time scale to one that talks in terms of plans and organizations in the perspective of days, months and years for communities and teams instead. As far as the number of agents in a coalition is concerned, we consider coalitions with two or more agents only. Each agent may be part of more than one coalition at the same time. The possibility of singular coalitions, where an agent breaks its own norms, is a subject that might be interesting, but we disregard that possibility here. Also, the scale used by Durfee may not always be applicable as a guideline for future multi-agent systems, where we may
have short-term coalitions lasting for parts of a second; but in order to get a working metaphor, we would like to take the more human-like perspective of days-months-years. Another perspective is to let the agents continuously reconsider memberships of groups. An agent may choose to try to persuade the other members of a coalition that it should be accepted in it (or at least be regarded as a prospect for future membership). It can also, on certain occasions, choose an action that is out of the range of permitted behavior of the group, i.e. break one or more of the norms of the coalition. It may be the case that it values the outcome of a certain action more than the membership of a group (a group with a set of norms that does not support the action). Concerning the norms, we may from one point of view consider them to be definite laws that have to be obeyed by the agents in order for them to be accepted. To avoid situations where such a norm may prohibit the most beneficial choice for both the agent and the group, conditions may be set to decide when a norm is applicable or not, a point of view discussed e.g. by Boman, where the norms are set to be global constraints that the agents never break [1]. Another way of treating norms is that the agents are advised to follow them, and that they should follow them with a probability that does not drop below a certain threshold. Or the agent commits itself to follow the norms according to an expected probability. Note the difference: in the former case, if the agent has broken a certain norm on one out of five occasions, and the threshold is set to 75%, it cannot break it on the next occasion, since it would then fall below the threshold. In the latter case, this would be acceptable, given that it in the long run sticks to the 75%. Neither of these approaches has to be discrete in nature, in the sense that an agent that performs an action covered by a norm is considered a norm-breaker and subject to exclusion from the coalition. For instance, the norms may be relaxed and included in a fitness function that describes how well the agent fits in the coalition. Yet another way to look upon norms is to use them as recommendations of what to do, in contrast to the former ones that put restrictions on the agents. This could be done e.g. by associating some types of actions with a positive feedback from the coalition. But really, what is the difference between punishing misbehavior and praising compliance with the norms? In the former, all non-norm-breaking actions are considered equally good by the coalition, and in the
latter, all actions not recommended are considered equally bad. Maybe we have an agent that learns the norms of the coalition by receiving feedback and decides to modify its behavior based on its perception of the coalitions and its own goals. Of course, how fit an agent is for a coalition depends both on how many actions it performs that are good for the group and how many that are bad, and, of course, on how good and bad these actions are. For that reason, we will associate each coalition $c_j$ with a norm function $\psi_j : B \to [0, 1]$, i.e. for every action, a value between 0 and 1 tells whether it fits the norms of the coalition (near 1) or not (near 0). A general model for the update of the values is presented in the next section. The set of norms of a coalition is, from our perspective, a representation of guidelines for how to behave. All of these should be explicit to the members of the coalition, but one may imagine that in some cases, some norms are hidden from members that do not reach certain thresholds of membership. Some of the norms may of course also be public for everyone, members as well as non-members, to take part of. The agent applies for membership and is then either accepted (to a certain degree) or rejected by the members of the coalition.
6.3 Models of Dynamic Groups
If the intention of the design of an agent is to make it adapt to a changing environment, membership of coalitions cannot be an end in itself for the agent. Instead, all coalitions must be built on the conviction that the coalition will lead to an advantage for its members compared to not being part of it. Not only must the individual agent be convinced that joining the coalition will be worth the trouble, but the rest of the coalition, i.e. its coalition partners, must also be convinced that the individual agent will strengthen them. Therefore, the expectation that a coalition/an agent will improve the fitness of the agent/coalition in the future may be reason enough for an agent to join, or for the coalition to make an offer. By providing some models of how coalitions may be regarded, we build a base for the discussion about how to design coalition-forming, yet rational, agents. Denote the value for an agent $a_i$ to be part of a coalition $c_j$ by $V^i_j$, and the value for $c_j$ to have $a_i$ as a member by $V^j_i$. It is clear that the higher
the values of $V^i_j$ and $V^j_i$ are, the more committed $a_i$ is to $c_j$, and of course the other way around: if $a_i$ and $c_j$ do not see any value in cooperating with each other, no such commitment will arise.
6.3.1 The equilibrium between $V^i_j$ and $V^j_i$
The interesting cases are when $V^i_j$ and $V^j_i$ differ, that is, when the agent and a coalition disagree about the value of the agent cooperating with the coalition. As we will see, there are reasons to believe that both the agent and the coalition may perform actions aimed at leveling out the differences between the valuations they make of each other.

• $V^i_j < V^j_i$
In this case, coalition $c_j$ is more interested in getting agent $a_i$ to join it than $a_i$ is interested in joining $c_j$. This may lead to the agents of $c_j$ changing its norms in order to increase $V^i_j$, the value of $a_i$ being part of $c_j$, thus making it more interesting for agent $a_i$.
It may also be the case that, since $a_i$ is of more value to $c_j$ than the other way around, it may choose to violate some of the norms and still be accepted in $c_j$ (although with a lower $V^j_i$). This may be the case when an individual goal of an agent can be achieved through an action that $c_j$ dislikes, but since the agent altogether finds it
more rewarding to perform the banned action than to follow the norms of the group, it takes the punishment (in the form of being less appreciated and thereby having less influence on the future norm shaping in the coalition).

• $V^i_j > V^j_i$
In this case, the agent is more interested in the coalition than the other way around. This may lead to $a_i$ performing actions that increase the interest of $c_j$ in having $a_i$ as a part of it, i.e. increase $V^j_i$. For instance, the agent may help the coalition in computing something, or give it some resources in order to arouse its interest. Another possible development is that the coalition for different reasons does not want $a_i$ to join, e.g. since coalition resources and assignments may be restricted by the number of participating agents. $c_j$ may then exploit $a_i$ and make it perform actions beneficial for the coalition, but not for $a_i$ itself, until it finally realizes that it is being used by the coalition (and thus its interest in the coalition, $V^i_j$, is decreased). In both of the cases above, any of the four actions taken by $c_j$ and $a_i$ strives towards reducing the difference between $V^i_j$ and $V^j_i$. Both the agents and the coalitions are able to increase and decrease their value to the others by performing appropriate actions. However, the comparisons themselves raise some interesting questions, e.g. what happens if the agent and the coalition have different opinions about the values of $V^i_j$ and $V^j_i$? The answer is that both act upon what they know, i.e. if they disagree on the relation between $V^i_j$ and $V^j_i$, they will choose actions according to their own view of the situation, in this case from different inequalities. Below, we will describe a few different models of updating the values of $V^i_j$ and $V^j_i$. For simplicity, we have used $V^i_j$ and $V^j_i$ for denoting the values for agents and coalitions, but since these are things that vary over time, and time is essential to describe dynamic systems, we will from here on let $V^i_j(t)$ and $V^j_i(t)$ denote $V^i_j$ and $V^j_i$ at time $t$ respectively. We will also from now on assume that both $V^i_j(t)$ and $V^j_i(t) \in [0, 1]$.†

†Every group and agent may scale this value to another, more suitable range if they like; this assumption does not change the expressiveness of our model. It may also be the case that different coalitions use different time scales; however, we will for reasons of simplicity keep to one time, denoted $t$, allowing agents and coalitions to be idle.
One way for the coalition to be able to judge the actions of an agent is to let the other agents estimate the role and responsibility that the agent has for the system being in the current state. Let $z^i_j : S \to [0, 1]$ be the opinion of the other agents that an agent $i$ is responsible for the system being in state $s_l \in S$. We will not try to solve the question of how to calculate this function here; examples of such calculations can be found in the literature on e.g. reinforcement learning [16], or the COIN agents by Wolpert and Turner [20]. We would also like to have a measure of how strong a coalition is: the more members it has, and the higher their degrees of membership are, the stronger is the coalition. For simplicity, we will refer to the size of $c_j$, $|c_j|$, as being the sum $\sum_i V^j_i$, i.e. the sum of all degrees of membership of the agents. This way of treating coalition strength can of course be controversial. For instance, the larger the coalition is, the harder it gets to distribute all information, but we will neglect these kinds of more practical problems in this work. All symbols used in this chapter may be found in Table 6.1.
6.3.2 Two simple models of valuation
We present two simple models of valuation: one arithmetic and one geometric.

6.3.2.1 An arithmetic model
This first arithmetic model simply adds the utility of new actions to the previous utilities and takes the arithmetic average of the values. We assume (for reasons of simplicity, and in all models) that each agent performs exactly one (possibly empty) action per time step, and we have the following value equations:

$V^j_i(0) = 0,$  (6.1)

$V^j_i(t) = \dfrac{(t-1) \cdot V^j_i(t-1) + \psi_j(\beta(i, t-1))}{t}, \quad t > 0$  (6.2)
When deciding how interested $a_i$ is in coalition $c_j$, it takes into account the actions of the "members" of $c_j$ and the effect these actions have on the
state $s$.

$V^i_j(0) = 0,$  (6.3)

$V^i_j(t) = \dfrac{(t-1) \cdot V^i_j(t-1) + \frac{1}{|c_j|} \sum_{a_k \neq a_i} \left(1 - V^j_k \left(1 - z^k_j(s)\right) u_i(s, t-1)\right)}{t}, \quad t > 0$  (6.4)
6.3.2.2 A geometric model
Instead of updating the values arithmetically, it may be done geometrically, i.e. by multiplying the values with their updates and then averaging by taking the $n$:th root of the product.

$V^j_i(0) = 0,$  (6.5)

$V^j_i(t) = \sqrt[t]{\left(V^j_i(t-1)\right)^{t-1} \cdot \psi_j(\beta(i, t-1))}, \quad t > 0$  (6.6)
In this model, not only the previous moves are averaged geometrically; the effects of the actions of the other agents are also multiplied and rooted.

$V^i_j(0) = 0,$  (6.7)

$V^i_j(t) = \sqrt[t]{\left(V^i_j(t-1)\right)^{t-1} \cdot \sqrt[|c_j|]{\prod_{a_k \neq a_i} \left(1 - V^j_k \left(1 - z^k_j(s)\right) u_i(s, t-1)\right)}}, \quad t > 0$  (6.8)
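The coalition-side updates, Eqs. (6.2) and (6.6), can be sketched directly in code; the numbers below are purely illustrative:

```python
def arithmetic_update(v_prev: float, psi: float, t: int) -> float:
    """Eq. (6.2): running arithmetic mean of the norm fits psi in [0, 1]."""
    return ((t - 1) * v_prev + psi) / t

def geometric_update(v_prev: float, psi: float, t: int) -> float:
    """Eq. (6.6): running geometric mean; a single psi near 0 pulls the
    value down much harder than in the arithmetic model."""
    return (v_prev ** (t - 1) * psi) ** (1.0 / t)

# One well-fitting action (psi = 0.9) followed by one norm break (psi = 0.1):
va = vg = 0.0
for t, psi in [(1, 0.9), (2, 0.1)]:
    va = arithmetic_update(va, psi, t)
    vg = geometric_update(vg, psi, t)
print(va, vg)   # arithmetic: 0.5, geometric: 0.3
```

The example shows the point made in the comparison below: after one good and one bad action, the arithmetic value is 0.5, while the geometric value drops to 0.3, punishing the norm break harder.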
6.3.2.3 A comparison between the models
Both models are, in one respect, like two elephants.* Every single move by every agent done so far is remembered and equally valued, no matter how long it has been since it was performed. However, note that the models cope with changes in the norms of a coalition in that an action is always judged in its actual time and environment, not in a future state (where norms may have changed the value of that action). What varies between the models is that in the former, other agents influence the final result in a way proportional to their part of the coalition.

*I assume, without proof, that elephants lack the gift of forgetting things. Since this assumption is used for strictly metaphoric reasons, we will leave the discussion of the memory function of the elephant here.
If we for instance have a coalition in which nine out of ten agents do a very good job (with $z^k_j(s)\,u_i(s, t-1)$ near 1), and the tenth behaves badly, the latter will only affect 10% of the result, and the overall impression of the coalition will be that it solves its problems quite well ($V^i_j$ at about 0.9). This may work for some domains, but in others, especially the ones where agents are highly dependent on each other, the deceit of one agent may spoil the result of the coalition as a whole. It may therefore be of interest for an agent to know whether the whole coalition works or not. In that case, the geometric model might be handy, since it focuses more on the weaknesses of the coalition. However, it is very hard for the models to forget previous mistakes, and since all previous moves are weighted equally, a "bad action" will affect the values for an (unnecessarily) long time. The elephant property makes it impossible to fully forget a "mistake". Sometimes we may prefer a model that lets the present actions have a greater impact on the valuations than the actions of a previous step in time.
6.3.3 Two forgiving models
Just as humans are able to forget and forgive, this may be a desired property in a MAS as well. It turns out that such a change is quite easy to implement, and the previous models can be changed to the following:
6.3.3.1 The arithmetic model

$V^j_i(0) = 0,$  (6.9)

$V^j_i(t) = \dfrac{\gamma \cdot V^j_i(t-1) + \psi_j(\beta(i, t-1))}{1 + \gamma}, \quad t > 0$  (6.10)

$V^i_j(0) = 0,$  (6.11)

$V^i_j(t) = \dfrac{\gamma \cdot V^i_j(t-1) + \frac{1}{|c_j|} \sum_{a_k \neq a_i} \left(1 - V^j_k \left(1 - z^k_j(s)\right) u_i(s, t-1)\right)}{1 + \gamma}, \quad t > 0$  (6.12)
We see that the $\gamma$-factor decreases the influence of past $V^j_i$ and $V^i_j$ values to the benefit of the most recent action and judgment.

Table 6.1 Symbols used in this chapter

$A = \{a_1, \ldots, a_n\}$ : The set of agents (considered in the system)
$n$ : The number of agents (considered in the system)
$B = \{b_1, \ldots, b_m\}$ : The set of possible actions
$m$ : The number of actions possible to perform in the system
$C = \{c_1, \ldots, c_e\}$ : The set of possible coalitions, $c_i = (N_i, M_i)$
$N_i$ : The set of norms of coalition $c_i$
$M_i = (m_1, \ldots, m_n)$ : The vector of degrees of membership of coalition $c_i$
$Q = \{q_1, \ldots, q_r\}$ : The set of consequences
$r$ : The number of possible consequences in a system
$S = \{s_1, s_2, \ldots, s_{2^r}\}$ : The set of states of the system, each $s_l \subseteq Q$
$\beta(i, t)$ : The action performed by $a_i$ at time point $t$
$p^i_j$ : The probability that action $b_i$ leads to $q_j$
$p^i$ : The vector (of size $r$) that describes the probabilities of each one of the consequences of action $b_i$
$p_j$ : The vector (of size $m$) that describes the probabilities that each action will lead to consequence $q_j$
$u_i(s, t)$ : The utility for agent $a_i$ of being in a state formed by $s \subseteq Q$ at time $t$
$V^i_j$ : The value of coalition $c_j$ for agent $a_i$
$V^j_i$ : The value of agent $a_i$ for coalition $c_j$
$\psi_j(b_i)$ : The fitness of coalition $c_j$'s norms for an action $b_i \in B$ performed by agent $a_i$
$z^i_j(s)$ : The opinion of coalition $c_j$ that agent $a_i$ is responsible for the system being in state $s \subseteq Q$
$|c_j|$ : The size of $c_j$, measured e.g. by the sum $\sum_i V^j_i$
$\gamma$ : The forgiveness factor, i.e. the parameter deciding the weight of long-term vs. short-term memory
$\phi$ : The payoff function used in the scenarios
6.3.3.2 The geometric model
$V^j_i(0) = 0,$  (6.13)

$V^j_i(t) = \sqrt[1+\gamma]{\left(V^j_i(t-1)\right)^{\gamma} \cdot \psi_j(\beta(i, t-1))}, \quad t > 0$  (6.14)

$V^i_j(0) = 0,$  (6.15)

$V^i_j(t) = \sqrt[1+\gamma]{\left(V^i_j(t-1)\right)^{\gamma} \cdot \sqrt[|c_j|]{\prod_{a_k \neq a_i} \left(1 - V^j_k \left(1 - z^k_j(s)\right) u_i(s, t-1)\right)}}, \quad t > 0$  (6.16)

The forgiveness factor $\gamma \geq 0$ decreases the weight of previous actions, ranging from previous actions having no impact on the calculations ($\gamma = 0$) to all previous actions being equally valued (just as in our former models, $\gamma = t - 1$).
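A sketch of the forgiving coalition-side updates, Eqs. (6.10) and (6.14); $\gamma = 1.5$ matches the value used in the illustration that follows:

```python
def forgiving_arithmetic(v_prev: float, psi: float, gamma: float) -> float:
    """Eq. (6.10): gamma weights the old value against the newest norm fit."""
    return (gamma * v_prev + psi) / (1.0 + gamma)

def forgiving_geometric(v_prev: float, psi: float, gamma: float) -> float:
    """Eq. (6.14): the geometric counterpart of Eq. (6.10)."""
    return (v_prev ** gamma * psi) ** (1.0 / (1.0 + gamma))

# With gamma = 1.5, a single norm break (psi = 0.1) after a long run of
# psi = 0.9 is punished quickly -- and forgiven quickly again:
v = 0.9
for psi in [0.1, 0.9, 0.9, 0.9]:
    v = forgiving_arithmetic(v, psi, gamma=1.5)
    print(round(v, 3))   # 0.58, 0.708, 0.785, 0.831
```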
6.3.4 An illustration of the models
So how do these models work out in practice? Let us construct an easy example in order to illustrate the differences.
6.3.4.1 Specification of the models
Imagine a situation where four agents $a_1, \ldots, a_4$ can choose between fully cooperative actions ($b_i = 1.0$) and fully selfish actions ($b_i = 0.0$). The total payoff is then built upon two payoffs, the individual payoff $\phi_{ind}(i)$ and the total coalition payoff $\phi_{coal}$, each defined through:

$\phi_{ind}(i) = \dfrac{1}{0.5 + b_i},$  (6.17)

$\phi_{coal} = |c_0|^{1.5}$  (6.18)
As can be seen in Fig. 6.1, the example requires that more than one agent join the coalition in order for it to be successful (in the sense that the agents get a higher payoff than the rest of the agents). To create a fair split of the coalition payoff between the members of the coalition, they get as much of the payoff as they are members in the coalition (relative to the
Fig. 6.1 The payoff of the coalition $c_0$ as a function of its size ($|c_0|$) compared to the maximum individual payoff $\phi_{ind} = 2$.
other members), i.e.:

$\phi_{coal}(i) = \dfrac{V^0_i}{|c_0|} \cdot \phi_{coal}, \text{ where}$  (6.19)

$|c_0| = \sum_{i=1}^{4} V^0_i, \text{ and}$  (6.20)

$\phi_{total}(i) = \phi_{coal}(i) + \phi_{ind}(i)$  (6.21)
As far as the norms are concerned, we will in this example use a norm function $b_{norm}(t)$ that is the average action in the previous time step among the members, where the influence of each member is relative to its degree of membership:

$b_{norm}(t) = \dfrac{\sum_{i=1}^{4} V^0_i \cdot b_i}{|c_0|}$  (6.22)

How well the agent's own actions correspond to the norm ($\psi_0(b_i)$) is then calculated through:

$\psi_0(b_i) = 1 - |b_i - b_{norm}|$  (6.23)
Table 6.2 The events of the scenario

Time : Event
0 : The scenario starts. Four agents are present, of which agent one is cooperating.
10 : Agent two joins the coalition.
20 : Agent three joins the coalition.
30 : The last agent joins the coalition.
60 : Agent one performs an action that differs considerably from the norm of the coalition.
80 : The scenario ends.
In this simple model, we will let the $z^i_0(s)$-function be:
$z^i_0(s) = \dfrac{\psi_0(b_i) \cdot V^0_i}{|c_0|}$  (6.24)

In the case where we look at the forgiving models, $\gamma$ is set to 1.5.
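The example specification, Eqs. (6.17)-(6.24), fits in a few lines of Python; note that Eqs. (6.17), (6.18) and (6.24) are our best-effort reconstructions of partly garbled formulas, so the exact expressions should be read as assumptions:

```python
def phi_ind(b):
    """Eq. (6.17) as reconstructed above: selfish play (b = 0) yields the
    maximum individual payoff 2, fully cooperative play (b = 1) only 2/3."""
    return 1.0 / (0.5 + b)

def coalition_size(V):                 # Eq. (6.20): |c_0| = sum of V_i^0
    return sum(V)

def phi_coal(V):                       # Eq. (6.18): coalition payoff
    return coalition_size(V) ** 1.5

def phi_coal_share(i, V):              # Eq. (6.19): fair split by membership
    size = coalition_size(V)
    return V[i] / size * phi_coal(V) if size > 0 else 0.0

def b_norm(V, b):                      # Eq. (6.22): membership-weighted norm
    size = coalition_size(V)
    return sum(v * bi for v, bi in zip(V, b)) / size if size > 0 else 0.0

def psi0(bi, bn):                      # Eq. (6.23): fit with the norm
    return 1.0 - abs(bi - bn)

def z0(i, V, b):                       # Eq. (6.24): responsibility estimate
    size = coalition_size(V)
    if size == 0:
        return 0.0
    return psi0(b[i], b_norm(V, b)) * V[i] / size
```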
6.3.5 Specification of the scenario
The chosen scenario will show us two things: firstly, how the model reacts when new agents join the coalition; secondly, how it reacts to agents trying to rip off the coalition by choosing a single very uncooperative action, $b_i = 0.01$. The scenario is described in Table 6.2.
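Reusing the functions from the previous sketch (and forgiving_arithmetic from Sec. 6.3.3), the scenario of Table 6.2 could be driven roughly as follows; the join mechanics and the norm seeding for an empty coalition are our own simplifications:

```python
gamma = 1.5
b = [1.0, 0.0, 0.0, 0.0]        # agent one cooperates from the start
V = [0.0, 0.0, 0.0, 0.0]        # degrees of membership V_i^0
joins = {10: 1, 20: 2, 30: 3}   # agent index joining at each time point

for t in range(80):
    if t in joins:
        b[joins[t]] = 1.0       # joining = switching to cooperative play
    b[0] = 0.01 if t == 60 else 1.0   # agent one breaks the norm once
    # Seed the norm with the founder's action while the coalition is empty
    # (an assumption; the chapter does not specify the initial norm).
    bn = b_norm(V, b) if coalition_size(V) > 0 else b[0]
    for i in range(4):
        V[i] = forgiving_arithmetic(V[i], psi0(b[i], bn), gamma)
    total = [phi_coal_share(i, V) + phi_ind(b[i]) for i in range(4)]
    if t in (9, 59, 60, 79):
        print(t, [round(x, 2) for x in total])
```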
6.3.5.1 The results of the scenario evaluation
So, let us move on to the results of the evaluation of the models. We have no noise in the calculations, and we will show the results in terms of the total payoff of the agents, showing how the temptation of a fast pay-back is punished by the coalition. In Fig. 6.2, we see how the arithmetic model is very slow in converging to a fair distribution of the payoff. The payoff for the agents within the coalition rises steeply as a new agent joins. That is because the new agent contributes as much as the full members of the coalition, but without being able to collect more than its relative impact (which is roughly based on its value of $V^0_i$). When agent one is uncooperative, the other agents get a temporary dip in their payoff, at the same time as the former agent collects the overhead payoff.
Fig. 6.2 The total payoff for the agents in the arithmetic non-forgiving model. At time points 10, 20 and 30, new agents join the coalition, and at time point 60, the first agent breaks the norm to make a short-term profit.
Fig. 6.3 The total payoff for the agents in the geometric non-forgiving model.
Fig. 6.3 shows the geometric version of Fig. 6.2. We recognize a slower convergence, but also that the deviation from the norms by agent one is punished harder.
Fig. 6.4 The total payoff for the agents in the arithmetic forgiving model.
We see a great difference in the shape of the payoffs as we move on to the forgiving models. In the forgiving models in Figs. 6.4 and 6.5, we see how the agents reach a convergence in payoffs in just a few rounds. Also, these models punish the breaking of norms more immediately than the previous models. The difference between these two models lies in the shape of the payoff function, but also in the ability to punish misbehavior, where the geometric forgiving model is less forgiving than the arithmetic one.
6.4 Some thoughts on agent self-contemplation
In Sec. 6.3 we argued that if the values of $V^j_i$ and $V^i_j$ are not in balance, the parties may try to level out the differences by exploiting each other. Is it then possible, as an agent system designer, to actively differentiate valuations in order to get systems that are more cooperative?
6.4.1 Agents and the law of Jante
One thing that is put forth as typical for Scandinavia is the law of Jante. The law consists of ten statements, written down by Aksel Sandemose based on studies of how people in his home town behaved [12]:
Fig. 6.5 The total payoff for the agents in the geometric forgiving model.

(1) You shall not believe you are something.
(2) You shall not believe you are as good as we.
(3) You shall not believe you are more wise than we are.
(4) You shall not fancy yourself better than we.
(5) You shall not believe you know more than we.
(6) You shall not believe you are greater than we.
(7) You shall not believe you amount to anything.
(8) You shall not laugh at us.
(9) You shall not believe that anyone is concerned about you.
(10) You shall not believe you can teach us anything.
The essence of it is that you shall not plume yourself, or think that your work is better than ours, or that you could teach us anything, etc. If applied to the agents, what would the result be? Well, firstly we must find out what it means in terms of $V^i_j$ and $V^j_i$. To underestimate one's own value to the coalition, in relation to the coalition's value to oneself, is to create a situation in which (at least from the perspective of the agent) $V^i_j > V^j_i$. In order to re-establish the equilibrium, the agent may try even harder to follow the norms, etc., so that it will be accepted by the coalition. If the coalition has the same opinion about the relationship, it may try to exploit the agent in order to decrease the agent's interest in it; but since the assumption is that it is the agent that
underestimates its own value, the coalition may think they are in balance, while the agent does not. In all, if a majority of the agents have low self-confidence, it will lead to stronger coalitions with high degrees of membership.
6.4.2 Agents with high self-confidence
The opposite of the agent underestimating its value is the one overestimating it, i.e. $V^i_j < V^j_i$. For various reasons it has a tremendous self-confidence, and it thinks that it is irreplaceable for the coalition, or at least that the coalition has much more use for it than it has use for the coalition. Such a situation leads to the agent performing actions that decrease its value for the group (that is, $V^j_i$) in order to make short-term gains, e.g. by cheating on its coalition members even though it breaks the norms of the coalition. Or the coalition increases the value for the agent to a level that fits it, e.g. by changing its norms so that they suit the agent better.§ In all, if all agents apply a self-confident strategy, the system as a whole will have trouble creating stable coalitions, since none of them will work for the sake of the coalition if they may gain more by not doing so.

§This is done for false reasons: it is mainly the opinion of the agent that it is worth more than it actually is; thus the inequality is a concern of the agent, rather than the coalition.
6.4.3 An evolutionary perspective
Although a small proportion of the law of Jante in every agent may seem to be a promising design principle (in that it strengthens coalitions), it is not the case that it automatically leads to robust systems. On the contrary, Janteists are subject to invasion and exploitation by the self-confident agents in an open system. This leaves us with two kinds of stable solutions¶:

• either self-confident agents only (if the gain of strong coalitions is low); then no agent is willing to sacrifice any of its resources for the good of the other agents, since if it did, the other agents would take the resources and never pay back;
• or a mixed equilibrium of Janteists, true valuators and self-confident agents (if the gain of strong coalitions exceeds the expected value of acting in a self-confident way); in this case, there are enough agents willing to trust each other and build a coalition in order to maintain it, but neither the self-confident agents nor the Janteists would improve their payoff by changing strategy.

¶Stable in the sense that no agent will improve its payoff if it alone changes its behavior, cf. the Hawk-and-Dove game described e.g. in [9].

Table 6.3 A Hawk-and-Dove game matrix. When two doves (D) meet, they equally share the common resource (2R), a hawk (H) will always take all of the resources when meeting a dove, and two hawks will fight over the resources at an averaged cost F. An evolutionarily stable strategy in this game is a mix of a hawk behavior 2R/F parts of the time, and a dove behavior the rest of the time.

      D          H
D     R, R       0, 2R
H     2R, 0      R-F, R-F

In Table 6.3 we see the (famous) Hawk-and-Dove (HD) game. Compared to the discussion earlier on equilibria in coalition formation, we see that there are similarities; as a matter of fact, the HD game is a formalization of the decision of whether or not to cooperate in a coalition. If F is high enough compared to R, the agents will cooperate, since the risk for a hawk of running into another hawk may make the dove behavior beneficial. If F is low, e.g. F = 0, the payoffs for two hawk (self-confident) agents meeting will be equal to those of two dove (Janteist) agents meeting, but every time the hawk meets the dove it will win over the dove, making the hawk (or self-confident) behavior the only rational choice of strategy. In the literature on evolutionary game theory, the matters of mixed strategies and equilibria are discussed thoroughly, for instance in the classic book by Maynard Smith [9]. Rosenschein and Zlotkin formulated several agent scenarios in terms of game theory in their Rules of Encounter [11], and Weibull gives a rationalistic economics perspective [19]. Given that every system possible to exploit will be exploited, we must ask ourselves whether the behaviors described above (the law of Jante and the self-confident) are exploitable or not.
6.4.3.1 Exploiting the Janteists
It would actually be enough not to underestimate your own value in the coalition in order to get an advantage over the "Janteists". By doing
so, you will have more impact on the coalition and on forming its norms‖, and this can be used to form norms that on average suit you slightly better than the others. Better norms (for an agent) in this case are interpreted as norms that suit the agent's own intentions better, so that it does not have to choose actions that contradict its own goals just because the norms of the coalition say so.

‖This is under the assumption that the more of a "member" you are, the more impact you will have on the norms of the coalition.
6.4.3.2 Exploiting the self-confident
To exploit self-confident agents is harder. We cannot approach the problem in the same way as we did in the previous section, since if we were to raise our own value above those of the self-confident, it would only make us even more self-confident, i.e. unwilling to cooperate in an altruistic fashion.
6.5 Conclusions
We have argued for a rational, continuous view of membership in coalitions, where the membership is based on how valuable the coalition is for the agent and vice versa. We have also presented a theoretical model of updating group values, both from the individual agent and the coalition perspectives, and an improvement that generalizes the notion of forgiveness and makes the model range from elephants to "forgetters". Three examples of how valuations between agents and coalitions may work have been discussed, and one of them has been explicitly expressed in the proposed models. However, the models are just examples, and we believe that several other models will fit into the discussion about exploiters and Janteists as well, e.g. the work of Verhagen [17]. The main contribution of this work is instead the discussions around the models, and that of what actually can be done by the agents themselves and what we as designers have to think about when designing agents that will form coalitions. It seems as if the law of Jante may give the coalitions extra fuel, in that agents will do a little bit more than they are expected to in order to be even more accepted in the coalition; however, that behavior is possible to exploit, and an equilibrium may be expected between exploiters and
exploited agents. As far as the self-confident agents are concerned, they do not seem to suffer from exploiters; but instead, the system in which they act might be characterized by weak (if any) coalitions, a claim that is supported e.g. by the work of Shoham and Tanaka [15].

Acknowledgements

I would like to thank Paul Davidsson, Magnus Boman, Harko Verhagen, Patrik Werle, Bengt Carlsson, Sam Joseph and the anonymous reviewers for their comments on various drafts of this work (first published at IAT '99 [7]), and the participants of IAT '99 for the discussions.
Bibliography
[1] M. Boman. Norms in artificial decision making. Artificial Intelligence and Law, 7:17-35, 1999.
[2] K. Carley and A. Newell. The nature of the social agent. Journal of Mathematical Sociology, 19(4):221-262, 1994.
[3] R. Conte and M. Paolucci. Tributes or norms? The context-dependent rationality of social control. In R. Conte, R. Hegselmann, and P. Terna, editors, Simulating Social Phenomena, volume 456 of Lecture Notes in Economics and Mathematical Systems, pages 187-193. Springer Verlag, 1997.
[4] J. Doyle. Rationality and its role in reasoning. Computational Intelligence, 8(2):376-409, 1992.
[5] E.H. Durfee. Practically coordinating. AI Magazine, 20(1):99-116, 1999.
[6] A. Iwasaki, S.H. Oda, and K. Ueda. Simulating an n-person multi-stage game for making a state. In Proceedings of Simulated Evolution and Learning, volume 2, 1998.
[7] S. Johansson. Mutual valuations between agents and their coalitions. In Proceedings of Intelligent Agent Technology '99, 1999.
[8] M. Klusch. Cooperative Information Agents on the Internet. PhD thesis, University of Kiel, Germany, 1997. In German.
[9] J. Maynard Smith. Evolution and the Theory of Games. Cambridge University Press, 1982.
[10] G. Owen. A note on the Shapley value. Management Science, 14:731-732, 1968.
[11] J.S. Rosenschein and G. Zlotkin. Rules of Encounter. MIT Press, 1994.
[12] A. Sandemose. En flykting korsar sitt spår. Forum, 1977. In Swedish; first edition in Danish 1933.
[13] T. Sandholm. Leveled commitment contracting among myopic individually rational agents. In Proceedings of the Third International Conference on Multi-Agent Systems (ICMAS) '98, pages 26-33, 1998.
[14] L.S. Shapley. A value for n-person games. Annals of Mathematics Studies, 2(28):307-317, 1953.
[15] Y. Shoham and K. Tanaka. A dynamic theory of incentives in multi-agent systems. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI) '97, volume 1, pages 626-631, 1997.
[16] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[17] H.J.E. Verhagen. Norm Autonomous Agents. PhD thesis, Department of Computer and Systems Sciences, Stockholm University and Royal Institute of Technology, 2000.
[18] T. Vielhak. COALA: A general testbed for simulation of coalition formation among autonomous agents. Master's thesis, Institute of Computer Science and Applied Mathematics, University of Kiel, Germany, 1998. User's guide.
[19] J. Weibull. Evolutionary Game Theory. MIT Press, 1996.
[20] D.H. Wolpert and K. Turner. An introduction to collective intelligence. Technical Report NASA-ARC-IC-99-63, NASA Ames Research Center, 2000.
Chapter 7
Simulating How to Cooperate in Iterated Chicken and Prisoner's Dilemma Games

Bengt Carlsson
Department of Software Engineering and Computer Science, Blekinge Institute of Technology, Sweden

7.1 Introduction
In the field of multi-agent systems (MAS), the concept of game theory is widely in use ([15]; [23]; [30]). The initial aim of game theorists was to find principles of rational behavior. When an agent behaves rationally it "will act in order to achieve its goal and will not act in such a way as to prevent its goals from being achieved without good cause" [19]. In some situations it is rational to cooperate with other agents to achieve its goal. With the introduction of "trembling hand" noise ([32]; [4]), a perfect strategy would take into account that agents occasionally do not perform the intended action¹. To learn, adapt, and evolve will be of major interest for the agent. It became a major task for game theorists to describe the dynamical outcome of model games defined by strategies, payoffs, and adaptive mechanisms, rather than to prescribe solutions based on a priori reasoning. The crucial thing is what happens if the emphasis is on a conflict of interest among
¹In this metaphor an agent chooses between two buttons. The trembling hand may, by mistake, cause the agent to press the wrong button.
agents. How should agents cooperate with one another in such situations, if at all? A central assumption of classical game theory is that the agent will behave rationally and according to some criterion of self-interest. Most analyses of iterated cooperative games have focused on the payoff environment defined as the Prisoner's dilemma ([5]; [10]), while the similar chicken game has been analyzed to a much lesser extent. In this chapter, a large number of different (Prisoner's dilemma and chicken) games are analyzed for a limited number of simple strategies.
7.2 Background
Game theory tools have been primarily applied to human behavior, but have more recently been used for the design of automated interactions. Rosenschein and Zlotkin [30] give an example of two agents, each controlling a telecommunication network with associated resources such as communication lines, routing computers, and short- and long-term storage devices. The load that each agent has to handle varies over time, making it beneficial for each if they could share the resources, but not obvious for the common good. The interaction for coordinating these loads could involve prices for renting out resources under varying message traffic on each network. An agent may have its own goal, trying to maximize its own profit. In this chapter, games with two agents, each having two choices, are considered². It is presumed that the different outcomes are measurable in terms of money, time consumed, or something equivalent.
7.2.1 Prisoner's dilemma and chicken game
Prisoner's dilemma (PD) was originally formulated as a paradox where the obviously preferable solution for both prisoners, low punishment, was unattainable. The first prisoner does not know what the second prisoner intends to do, so he has to guard himself. The paradox lies in the fact that both prisoners have to accept a high penalty, in spite of a better solution for both of them. This paradox presumes that the prisoners were unable to talk to each other or take revenge after the years in jail. It is a symmetrical game with no background information. In the original single-play PD, two agents each have two options: to cooperate or to defect (not cooperate). If both cooperate, they receive a reward, R. The payoff of R is larger than that of the punishment, P, obtained if both defect, but smaller than the temptation, T, obtained by a defector against a cooperator. If the sucker's payoff, S, where one cooperates and the other defects, is less than P, there is a Prisoner's dilemma, defined by T > R > P > S and 2R > T + S (see Fig. 7.1). The second condition means that the value of the payoff, when shared in cooperation, must be greater than it is when shared by a cooperator and a defector. Because it pays more to defect, no matter how the opponent chooses to act, an agent is bound to defect if the agents do not derive advantage from repeating the game. More generally, there will be an optimal strategy in the single-play PD (playing defect). This should be contrasted to the repeated or iterated Prisoner's dilemma, where the agents are supposed to cooperate instead. We will further discuss iterated games in the following sections. The original Chicken game (CG), according to Russell [31], was described as a car race: "It is played by choosing a long straight road with a white line down the middle and starting two very fast cars towards each other from opposite ends. Each car is expected to keep the wheels of one side of the white line. As they approach each other, mutual destruction becomes more and more imminent. If one of them swerves from the white line before the other, the other, as he passes, shouts Chicken! and the one who has swerved becomes an object of contempt..."³ The big difference compared to Prisoner's dilemma is the increased cost of playing mutually defect. The car drivers should not really risk crashing into the other car (or falling off the cliff). In a chicken game, the payoff of S is bigger than that of P, that is, T > R > S > P. Under the same conditions as in the Prisoner's dilemma, defectors will not be the optimal winners when playing the chicken game. Instead, a combination between playing defect and playing cooperate will win the game.

²Games may be generalized to more agents with more choices, an n-person game. In such games the influence of the single agent will be reduced with the size of the group. In this chapter we will simulate repeated two-person games, which enlarges the group of agents and at least partly may be treated as an n-person game (but still with two choices).

³An even earlier version of the chicken game came from the 1955 movie "Rebel Without a Cause" with James Dean. Two cars are simultaneously driving off the edge of a cliff, with the car-driving teenagers jumping out at the last possible moment. The boy who jumps out first is "chicken" and loses.

In Fig. 7.1b, R and P are assumed
to be fixed to 1 and 0, respectively. This can be done through a two-step reduction, where in the first step all variables are subtracted by P and in the second step divided by R - P. This makes it possible to describe the games with only two parameters, S' and T' (see Fig. 7.7 in the simulation section of this chapter). In fact, we can capture all possible 2 × 2 games in a two-dimensional plane⁴.

a.
            Cooperate     Defect
Cooperate   R             S
Defect      T             P

b.
            Cooperate           Defect
Cooperate   1                   (S-P)/(R-P)
Defect      (T-P)/(R-P)         0

Fig. 7.1 Pay-off matrices for 2 × 2 games, where R = reward, S = sucker, T = temptation and P = punishment. In b, the four variables R, S, T and P are reduced to two variables, S' = (S-P)/(R-P) and T' = (T-P)/(R-P).

As can be seen in Fig. 7.2, these normalized games are limited below the line S' = 1 and above the line T' = 1. CG has an open area restricted by 0 < S' < 1 and T' > 1, whereas PD is restricted by T' + S' < 2, S' < 0 and T' > 1. If T' + S' > 2 is allowed, there will be no upper limit for the value of the temptation. There is no definite reason for excluding this possibility (see also [12]). This was already pointed out when the restriction was introduced: "The question of whether the collusion of alternating unilateral defections would occur and, if so, how frequently is doubtless interesting. For the present, however, we wish to avoid the complication of multiple 'cooperative solutions'." [28]. In this study no strategy explicitly makes use of unilateral defections, so the extended area of PD is used.
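The payoff orderings and the two-step reduction are easy to operationalize. A small sketch follows; like the text above, the classification omits the optional 2R > T + S restriction, i.e. it uses the extended PD area, and the test values are the well-known Axelrod payoffs, given here only as an example:

```python
def normalize(T, R, P, S):
    """Two-step reduction: subtract P, then divide by R - P (assumes R > P),
    so that R maps to 1 and P maps to 0."""
    return (T - P) / (R - P), (S - P) / (R - P)   # (T', S')

def classify(T, R, P, S):
    if T > R > P > S:
        return "prisoner's dilemma"
    if T > R > S > P:
        return "chicken game"
    return "other 2x2 game"

print(normalize(5, 3, 1, 0))   # (2.0, -0.5): T' > 1 and S' < 0, so PD
print(classify(5, 3, 1, 0))    # prisoner's dilemma
```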
⁴Although there is an infinite number of different possible games, we may reduce this number by regarding the preference orderings of the payoffs. Each agent has 24 (4!) strict preference orderings of the payoffs between its four choices. This makes 24 × 24 different pairs of preference orderings, but not all of them represent distinct games. It is possible to interchange rows, columns and agents to obtain equal games. If all duplicates are put away, we still have 78 games left [29]. Most of these games are trivial because there is one agent with a dominating strategy winning.
Fig. 7.2 The areas covered by Prisoner's dilemma and chicken game in a two-dimensional plane (the regions marked are the chicken game, the Prisoner's dilemma with T' + S' < 2, and the Prisoner's dilemma with T' + S' > 2).

It is no coincidence that researchers have paid most interest to the Prisoner's dilemma and chicken game areas of the two-dimensional space. If we look at the left part (T' < 1), there is no temptation to play defect. If S' > 1, there is no penalty for playing cooperate against playing defect⁵.

7.2.2 Evolutionary and iterated games

In evolutionary game theory ([25], [26]), the focus has been on evolutionary stable strategies (ESS). The agent exploits its knowledge about its own payoffs, but no background information or common knowledge is assumed. An evolutionary game repeats each move, or sequence of moves, without a memory function being involved, i.e. there is no way to anticipate the future by looking back into the memory. In many MAS, however, agents frequently use knowledge about other agents. There are at least three different ways of describing an ESS from both an evolutionary and a MAS point of view. Firstly, we may define the ESS as a Nash equilibrium of different strategies. A Nash equilibrium describes a set of strategies where no agent unilaterally intends to change its choice. In MAS, however, some knowledge about the other agents may be accessible when simulating the outcome of strategies.
Of course there are other interesting 2 x 2 plays, but this is outside the scope of this article. For an overview see [29]
180
B.
Carlsson
Assume that agents can predict the behavior of their opponents from their past observations of play in "similar games", either with their current opponents or with "similar" ones. If agents observe their opponents' strategies and receive a number of observations, then each agent's expectations about the play of his opponents converges to the probability distribution corresponding to the sample average of play he has observed in the past. The problem is that this is not the same as finding a successful strategy in an iterated game where an agent must know something about the other's choice. Instead of having a single prediction we end up with allowing almost any strategy. This is a consequence of the so-called Folk Theorem (see, e.g., [16]; [23]). A game can be modeled as a strategic or an extensive game. A strategic game is a model of a situation in which each agent chooses its strategy once and for all, and all agents' decisions are made simultaneously while an extensive game specifies the possible orders of events. An agent playing a strategic game is not informed of the plan of action chosen by any other agent while an extensive agent can reconsider its plan of action whenever a decision has to be made. All the agents in this chapter are playing strategic games. According to the second way of describing the ESS, it can be described as a collection of successful strategies, given a population of different strategies. An ESS is a strategy (or possibly a set of strategies) such that if all the members of a population adopt it, then no mutant strategy (a strategy not in the current set of strategies) could invade (become a resident part of successful strategies) the population under the influence of natural selection. A successful strategy is one that dominates the population; therefore it will tend to meet copies of itself. Conversely, if it is not successful against copies of itself, it will not dominate the population. The problem is that this is not the same as finding a successful strategy in an iterated game because in such games the agents are supposed to know the history of the moves. For nontrivial MAS and evolutionary systems, it is impossible to create a complete set of strategies. Instead of finding the best one, we can try to find a possibly sub-optimal but robust strategy in a specific environment, and this strategy may be an ESS. If the given collection of strategies is allowed to compete in a population tournament, we will possibly find a winner, but not necessarily the same one for every repetition of the game. A population tournament allows successful strategies to be more common in the population of strategies when a new generation is introduced. In the simulation part of this
Simulating
How to Cooperate in Iterated Chicken and Prisoner's
Dilemma
Games
181
chapter we show some major differences between PD and CG in population tournaments. Thirdly, the ESS can be seen as a collection of genetically evolving successful strategies i.e. combining a population tournament with the ability of introducing new generation strategies. It is possible to simulate a game through such a process, consisting of two crucial steps: mutation (i.e., a variation of the ways agents act) and selection (the choice of the preferred strategies). Different kinds of genetic computations (see, e.g., [18]: [17]: [20]) have been applied within the MAS society, but it is important to remember that the similarities to natural selection are restricted.6 For PD and CG mutational changes may occur by allowing strategies to change a single move (cooperate or defect) and then be subject to population selection. This method is not further expounded in this chapter. 7.2.3 Simulating iterated games In an iterated game, unlike the repeated evolutionary game, the strategies are assumed to have a memory function. Most studies today look at the iterated Prisoner's dilemma (IPD) as a cooperative game where "nice" and "forgiving" strategies, like the Tit-for-Tat (TJT), are successful ([3]; [5]). A nice strategy is one, which never chooses to defect before the other agent defects, and a forgiving strategy does not retaliate a defect by playing defect forever. TJT simply follows the move of it's opponent drawn in the round before. In iterated chicken game (ICG), mutual cooperation is less clearly the best outcome [22] but the situation is complicated. A mixed7 strategy may favor mutual cooperation. Axelrod and Hamilton [5] introduced the concept of reciprocal altruism to game theory in their famous article "The evolution of cooperation". People were invited to submit their favorite strategy to an iterated Prisoner's dilemma game tournament [1]. The tournament was conducted as a round robin tournament where everyone met each other two by two. The only Firstly, genetic algorithms use a fitness function instead of using dominating and recessive genes in the chromosomes. Secondly, there is a crossover between parents instead of the biological meiotic crossover. With pure and mixed strategies we here refer to the set of strategies (played by individuals) winning the population game. A mixed strategy is a combination of two or more strategies from the given set of strategies i.e. an extended strategy set could include the former mixed strategy as a pure strategy.
182
B.
Carlsson
known strategy for the participators in the beginning was the strategy random. For the tournament the TJT strategy was most successful, on average beating every other strategy. TJT starts with playing cooperatively and then repeats every move done by its antagonist. Axelrod [2] informed the participators about the result and invited them to a new extended tournament. Once again TJT was the major winning strategy. Axelrod also conducted a population tournament where each strategy was allowed to survive into new generations of strategies. The proportion of the strategies depended on how successful each strategy was in the previous generation. In the end of the simulation there was typically only one successful strategy left. Again, TJT won most of the plays proving to be a robust strategy against these strategies. The conclusions drawn by Axelrod were that nice, forgiving strategies like TJT defeat strategies playing defectively using threats and punishments. This is a remarkable conclusion because of: In the single play PD playing defect is the winning strategy. In both single play and repeated PD a defecting strategy always wins against a cooperating strategy. TJT uses the advantage of being nice and forgiving when it meets itself. A defecting strategy always wins (or play even) against a nice strategy, but gets a low score when meeting other defecting strategies. A nice strategy will get a high score when meeting other nice strategies. This will compensate for the deficit in score against a defecting strategy. If there are a lot of cooperating strategies they will eliminate the defecting strategies of the competition. Another complication that can be introduced to iterated games is the presence of noise. If there is uncertainty about the outcome, TJT will be less successful. In Axelrod's simulation, TJT still won the tournament when 1 per cent chance of misperception was added [3]. In other simulations of noisy environments, TJT has instead performed poorly [7]. The uncertainty represented by the noise reduces the payoff of TJT when it plays itself in the IPD. Instead playing defect or playing a modified TJT strategy like contriteTJT (cTJT, [10]), Pavlov [16] or generous-TJT (gTJT, [27]) may be more successful. cTJT (also called Fair [21]) is a modified version of TJT where the strategy is allowed to "apologize" or "be angry" instead of just repeating the opponent's move. Pavlov or Simpleton [3] cooperates if and only if the two competitors used the same move in the previous round. gTJT always
cooperates if the other agent cooperated in the previous round, but defects with a probability less than one if the other agent defected. Ever since Axelrod presented his results there has been a lively discussion about the PD. For example, Axelrod generally uses the same payoff matrix for different simulations. Binmore [8] gives a critical review of TfT and of Axelrod's simulation. He concludes that TfT is only one out of a very large number of equilibrium strategies and that TfT is not evolutionarily stable. On the other hand, evolutionary pressures will tend to select equilibria for the IPD and ICG in which the agents cooperate in the long run.

7.2.4 Generous and greedy strategies
The principle behind the categorization of strategies into nice and forgiving versus defecting strategies that use threats and punishments is unclear. Why is TfT not just treated as a strategy repeating the action of the other strategy instead? One alternative way of categorizing strategies is to group them as generous, even-matched, or greedy ([13]; [14]). If a strategy plays sucker, nS, more often than it plays temptation, nT, then it is a generous strategy (nS > nT). An even-matched strategy has nS ≈ nT, and a greedy strategy has nS < nT; nS and nT are the numbers of times an agent plays sucker and temptation, respectively. Boerlijst et al. [9] use a similar categorization into good and bad standing. An agent is in good standing if it has cooperated in the previous round or if it has defected while provoked, i.e., if the agent is in good standing it should not be greedy unless the other agent was greedy the round before. In every other case of defection the agent is in bad standing, i.e., it tries to be greedy. The generous and greedy categorization uses a stable, once-and-for-all approach, contrary to the more dynamic good and bad standing, which deals with what happened in the previous move. The stable approach of the generous and greedy categorization makes this model easier to analyze. The basis of the partition is that it is a zero-sum game at the meta-level, in that the sum of the strategies' nS proportions must equal the sum of their nT proportions. In other words, if there is a generous strategy, then there must also be a greedy strategy. The classification of a strategy can change depending on the surrounding strategies. Let us assume we have the following four strategies:
Always Cooperate (AllC) plays 100 per cent cooperation (nR + nS) when meeting another strategy. AllC will never act as a greedy strategy. Always Defect (AllD) plays 100 per cent defection (nT + nP) when meeting another strategy. AllD will never act as a generous strategy. Tit-for-Tat (TfT) always repeats the move of the other contestant, making it a repeating strategy. TfT naturally entails that nS = nT. Random plays cooperate and defect approximately half of the time each. The proportions of nS and nT will be determined by the surrounding strategies. Random will be a greedy strategy in a surrounding of AllC and Random, and a generous strategy in a surrounding of AllD and Random. Both TfT and Random will behave as even-matched strategies in the presence of only these two strategies, as well as in a surrounding of all four strategies with AllC and AllD participating in the same proportions. All strategies are even-matched when there is only a single strategy left.
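To make the categorization concrete, the following Python sketch (our own illustration, not code from the study; the tolerance eps is an assumption, since the chapter only writes nS ≈ nT) classifies a strategy from its observed counts:

    def classify(n_s, n_t, eps=0.05):
        """Classify a strategy as generous, even-matched or greedy.

        n_s -- times the strategy played sucker (cooperated while the
               opponent defected)
        n_t -- times it played temptation (defected while the opponent
               cooperated)
        eps -- tolerance for "approximately equal" (our assumption)
        """
        total = n_s + n_t
        if total == 0 or abs(n_s - n_t) <= eps * total:
            return "even-matched"   # includes a lone surviving strategy
        return "generous" if n_s > n_t else "greedy"

    # Example: Random surrounded by AllC plays temptation far more often
    # than sucker, so it is classified as greedy in that environment.
    print(classify(n_s=120, n_t=480))   # -> greedy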
Fig. 7.3 Proportions out of 100% of R, S, T and P for different strategies.

In the next section we use a simulation tool with 15 different strategies (see Table 7.1). We interpret the proportions of these strategies in Fig. 7.3 as a kind of context-dependent fingerprint for the strategy in the given environment, independent of the actual values of the payoff matrix.
AllC definitely belongs to a group of generous strategies, and so do 95% Cooperate (95%C), Tit-for-two-Tats (Tf2T), Grofman, Fair and Simpleton, in this specific environment. The even-matched group of strategies includes TfT, Random and Anti-Tit-for-Tat (ATfT). Within the group of greedy strategies, Feld, Davis and Friedman belong to a smaller family of strategies making more cooperative moves than Random, i.e., having significantly more than 50 per cent R or S. An analogous family consists of Joss, Tester and AllD; these strategies cooperate less frequently than Random does. What happens to a particular strategy depends both on the surrounding strategies and on the characteristics of the strategy itself. For example, AllC will always be generous, while 95%C will change to a greedy strategy when these two are the only strategies left. The described relation between strategies is independent of what kind of game is played, but the actual outcome of the game is related to the payoff matrix.

7.3 The simulations
As mentioned earlier in this chapter, the repeated Prisoner's dilemma is regarded as a cooperative game, i.e., a game favoring agents that play cooperate. A typical winning strategy, like TfT, ends up with the agents cooperating all the time. In the chicken game the advantage of cooperation should be even stronger, because it costs more to defect than in the Prisoner's dilemma. Surprisingly, this is not the case when analyzing chicken games. In the hawk-and-dove game [26], consisting of one PD part and one CG part, the expected outcome for the CG is a combination of playing cooperate and playing defect. We think this new "dilemma" can be explained by a larger robustness of the chicken game. This robustness may be present if more strategies are allowed and/or noise is introduced. In this chapter, three different simulations comparing IPD and ICG are presented in an attempt to verify this hypothesis:

Variants of Axelrod's original matrix—the first simulation used Axelrod's original payoff matrix for 36 different strategies. To investigate the differences we used 11 different matrices, gradually moving from PD to CG [12].

Adding noise—5 different variants of Axelrod's matrix were used for 15 different strategies. Different levels of noise were added [13].
Normalized matrices—in all, 209 different matrices were used for 15 different memory-0 and memory-1 strategies [11].

7.3.1 Variants of Axelrod's original matrix
Axelrod found his famous Tit-for-Tat solution for the Prisoner's dilemma when he arranged and evaluated a tournament. He used the payoff matrix in Fig. 7.4a for each move of the Prisoner's dilemma:

    a)       C2      D2            b)       C2      D2
     C1     3,3     0,5             C1     3,3     1,5
     D1     5,0     1,1             D1     5,1     0,0

Fig. 7.4 Example payoff matrices for the Prisoner's dilemma (7.4a) and the chicken game (7.4b).

In our experiment we use the same total payoff sum for the matrices as Axelrod did, and a simulation tool involving 36 different strategies [24]. However, we vary the two lowest payoffs (0 and 1) continuously, so that they change order between the PD matrix and the CG matrix of Fig. 7.4b. It is a round-robin tournament between the different strategies with a fixed length of 100 iterations. Each tournament was run five times. Besides the two matrices above, we varied P and S in ten steps between 1 and 0, respectively, without changing the total payoff sum of the matrix. As an example, (0.4; 0.6) means that a cooperating agent gets 0.4 when meeting a defector, and a defector gets 0.6 when meeting another defector. The different strategies are described in Mathieu and Delahaye [24]. We used three characterizations of the different strategies:

Initial move—whether the initial move of the strategy is cooperate, defect or random.
Nice—whether the strategy never makes the first defection in the game.
Static—whether the strategy is fully or partly independent of other strategies, or randomized.

7.3.2 Adding noise to PD and CG

We developed a simulation tool in which 15 different strategies competed. Most of the strategies are described in ([1]; [2]). In Table 7.1 all the strategies
are described. All strategies react to the moves of the other agent and not to the payoff value, since the latter does not affect the strategy. In a round-robin tournament, each strategy was paired with every other strategy plus its own twin, as well as with the Random strategy. Each game in the tournament was played on average 100 times (randomly stopped) and repeated 5000 times.
             C2       D2
     C1     1.5       2
     D1      1      1.5+q

Fig. 7.5 A cost matrix for the resource allocation matrices.

The average payoff Eavg(S) for a strategy S is a function of the payoff matrix and the distribution of the payoffs among the four outcomes (Fig. 7.5):

    Eavg(S) = 1.5 p(C,C) + 2 p(C,D) + 1 p(D,C) + (1.5+q) p(D,D)        (7.1)
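Equation (7.1) is simply a weighted sum of the matrix entries by the outcome frequencies. A minimal sketch of the calculation (our own illustration; the probabilities in the example are made up):

    def e_avg(p_cc, p_cd, p_dc, p_dd, q):
        """Average score per Eq. (7.1).

        p_xy -- fraction of rounds ending in outcome (own move x,
                opponent move y); the four values must sum to 1.
        q    -- variable part of the mutual-defection entry.
        """
        return 1.5 * p_cc + 2.0 * p_cd + 1.0 * p_dc + (1.5 + q) * p_dd

    # Example: a mostly cooperating strategy in a game with q = 0.6.
    print(e_avg(0.7, 0.1, 0.1, 0.1, q=0.6))   # -> 1.56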
We ran the simulation with the value of 1.5+q equal to 1.6, 1.9, 2.1, 2.4 and 3.0, and then we introduced noise at four levels: 0.01, 0.1, 1 and 10 per cent. This means that a strategy's move was changed to the opposite move with the given probability.
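Noise of this kind can be implemented by flipping each intended move with the given probability. A minimal sketch (our illustration):

    import random

    def noisy(move, noise_level):
        """Return the intended move, flipped with probability noise_level.

        move        -- 'C' or 'D'
        noise_level -- e.g. 0.0001, 0.001, 0.01 or 0.1 for the four
                       levels (0.01%, 0.1%, 1% and 10%) used above
        """
        if random.random() < noise_level:
            return 'D' if move == 'C' else 'C'
        return move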
7.3.3 Normalized matrices
The normalized study includes two different sets of simulations. In the first set, the strategies compete in a round-robin tournament, with the aim of determining the tendency of different strategies to play cooperate and defect. In the second set, the competitive abilities of strategies in iterated population tournaments were studied within the IPD and the ICG. In the simulations of the IPD and the ICG, two sets of strategies were used. We used the strategies in Fig. 7.6, represented by finite automata [21]. The play between two automata is a stochastic process, and all finite-memory strategies can be represented by increasingly complicated finite automata. Memory-0 strategies, like AllC and AllD, do not involve any
Strategy    First move  Description
AllC        C           Cooperates all the time.
95%C        C           Cooperates 95% of the time.
Tf2T        C           Tit-for-two-Tats. Cooperates until its opponent defects twice, and then defects until its opponent starts to cooperate again.
Grofman     C           Cooperates if R or P was played; otherwise it cooperates with a probability of 2/7.
Fair        C           A strategy with three possible states — "satisfied" (C), "apologizing" (C) and "angry" (D). It starts in the satisfied state and cooperates until its opponent defects; then it switches to its angry state and defects until its opponent cooperates, before returning to the satisfied state. If Fair accidentally defects, the apologizing state is entered and it keeps cooperating until its opponent forgives the mistake and starts to cooperate again.
Simpleton   C           Like Grofman, it cooperates whenever the previous moves were the same, but it always defects when the moves differed.
TfT         C           Tit-for-Tat. Repeats the moves of the opponent.
Feld        C           Basically a Tit-for-Tat, but with a linearly increasing probability (from 0, by 0.25% per iteration, up to iteration 200) of playing D instead of C.
Davis       C           Cooperates on the first 10 moves, and then, if there is a defection, it defects until the end of the game.
Friedman    C           Cooperates as long as its opponent does so. Once the opponent defects, Friedman defects for the rest of the game.
ATfT        D           Anti-Tit-for-Tat. Plays the complementary move of the opponent.
Joss        C           A TfT variant that cooperates with a probability of 90% when the opponent cooperated, and defects when the opponent defected.
Tester      D           Alternates D and C until its opponent defects; then it plays a C and TfT.
AllD        D           Defects all the time.

Table 7.1 Description of the different strategies.

memory capacity at all. If a strategy only has to look back at one move, it is a memory-1 strategy (a choice between two states, depending
on the other agent's previous move). All the strategies in Fig. 7.6 belong to memory-0 or memory-1 strategies.
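A memory-1 strategy is fully determined by its first move plus the probability of cooperating after each possible opponent move; the automata of Fig. 7.6 then reduce to small lookup tables. The sketch below is our own reconstruction, with illustrative parameter values:

    import random

    class Memory1:
        """A memory-1 strategy: P(cooperate) depends only on the
        opponent's previous move. Memory-0 strategies are the special
        case where both probabilities are equal."""

        def __init__(self, first_move, p_c_after_c, p_c_after_d):
            self.first_move = first_move
            self.p = {'C': p_c_after_c, 'D': p_c_after_d}

        def move(self, opponent_last=None):
            if opponent_last is None:           # first round of a game
                return self.first_move
            return 'C' if random.random() < self.p[opponent_last] else 'D'

    tft  = Memory1('C', 1.0, 0.0)    # repeat the opponent's move
    atft = Memory1('D', 0.0, 1.0)    # play the complementary move
    c99  = Memory1('C', 0.99, 0.99)  # a 99C-style AllC variant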
Fig. 7.6 a) AllD (and variants), b) TfT, c) ATfT, d) AllC (and variants). On the transition edges, the left symbol corresponds to an action done by a strategy against an opponent performing the right symbol, where an X denotes an arbitrary action. Y in CY and DY denotes a probability factor for playing C and D respectively.

Both sets of strategies include AllD, AllC, TfT, ATfT and Random. In the first set of strategies, the cooperative set, five AllC variants (100, 99.99, 99.9, 99 and 90% probability of playing C) are added, and in the second set of strategies, the defective set, the corresponding five AllD variants are added. CY and DY in Fig. 7.6 denote a probability factor Y of 100, 99.99, 99.9, 99 or 90% (or, for the Random strategy, 50%) for playing C and D respectively.

                    Cooperate (C)   Defect (D)
     Cooperate (C)       1            1-s1
     Defect (D)        1+s2            0

Fig. 7.7 A payoff matrix for PD and CG. C stands for cooperate, D for defect, and s1 and s2 are cost variables. If s1 > 1 it is a PD; if s1 < 1 it is a CG.

To obtain a more general treatment of IPD and ICG, we used several variants of payoff matrices within these games, based on the general matrix of Fig. 7.7 (corresponding to Fig. 7.2).
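Since R = 1, T = 1+s2, S = 1-s1 and P = 0 in Fig. 7.7, whether the matrix is a PD or a CG reduces to the ordering of S and P. A small sketch (ours) of the check:

    def game_type(s1, s2):
        """Classify the Fig. 7.7 matrix as PD or CG.

        R = 1, T = 1 + s2, S = 1 - s1, P = 0.
        PD requires T > R > P > S, i.e. s1 > 1;
        CG requires T > R > S > P, i.e. s1 < 1.
        """
        T, R, S, P = 1 + s2, 1.0, 1 - s1, 0.0
        if T > R > P > S:
            return "PD"
        if T > R > S > P:
            return "CG"
        return "neither"

    print(game_type(s1=1.5, s2=0.5))   # -> PD
    print(game_type(s1=0.5, s2=0.5))   # -> CG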
In the first set of simulations we investigated the success of agents using different strategies (one strategy per agent) in a round-robin tournament. Since this is independent of the actual payoff values, the same round-robin tournament can be used for both IPD and ICG. Every agent was paired with all the other agents plus a copy of itself. Every meeting between agents in the tournament was repeated on average 100 times (randomly stopped) and played 5000 times. The results from the two-by-two meetings between agents using different strategies in the round-robin tournament were used in a population tournament. The population tournament starts with 100 agents for each strategy, making a total population of 900. The simulation halts when there is a winning strategy (all 900 agents use the same strategy) or when the number of generations exceeds 10,000. Agents are allowed to change strategy, and the population size remains the same during the whole contest. For the IPD the following parameters were used: s1 ∈ {1.1, 1.2, ..., 2.0} and s2 ∈ {0.1, 0.2, ..., 1.0, 2.0}, making a total of 110 different games.8 For the ICG, games with parameter settings s1 ∈ {0.1, 0.2, ..., 0.9} and s2 ∈ {0.1, 0.2, ..., 1.0, 2.0} were run, a total of 99 different games. Each game is repeated for 100 plays and the average success is calculated for each strategy. For each kind of game there is both a cooperative set and a defective set.
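The population tournament can be sketched as proportional reproduction on the round-robin scores. The chapter does not give its exact reproduction rule, so the proportional-fitness update below is an assumption on our part:

    def next_generation(pop, payoff):
        """One generation of the population tournament.

        pop    -- dict strategy -> number of agents (total size fixed)
        payoff -- dict (strategy_a, strategy_b) -> average score of a
                  against b from the round-robin results; must include
                  (s, s) pairs for self-play
        """
        total = sum(pop.values())
        # Average score of each strategy against the current population.
        fitness = {
            s: sum(payoff[(s, t)] * n for t, n in pop.items()) / total
            for s in pop
        }
        weight = sum(fitness[s] * n for s, n in pop.items())
        # Reproduce in proportion to fitness. Rounding can drift the
        # total by an agent or two; a real run would repair that.
        return {s: round(total * fitness[s] * n / weight)
                for s, n in pop.items()}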
7.4 Results

7.4.1 Variants of Axelrod's original matrix
Out of 36 different strategies, Gradual won in the PD game. Gradual cooperates on the first move, then defects n times after n defections, and then calms down its opponent with 2 cooperation moves. In the CG a strategy called Coop_puis_tc won. This strategy cooperates until the other agent defects, and then alternates between defection and cooperation for the rest of the game. TfT placed around 5th in both games. Two other interesting strategies are joss_mou (2nd place) and joss_dur (35th place). Both start with cooperation
8 For the strategies used in this simulation the constraint 2R > T + S does not affect the results, so these combinations are not excluded.
and basically play TfT. Joss_mou plays a cooperating move one time out of ten instead of defecting, and joss_dur plays defect one time out of ten instead of cooperating. This causes the large difference in scores between the two strategies.
Fig. 7.8 Comparing PD and CG. In the figure, the CG is in the foreground and the PD in the background; the best strategies are to the left and the worst to the right.

The top-scoring strategies start with cooperation and react to other strategies, i.e., they are not static. Both PD and CG have the same top strategies. A majority of the low-scoring strategies either start with defect or are static. Always defect shows the biggest difference in favor of PD, and always cooperate the biggest difference in favor of CG. The five strategies with the largest difference in favor of CG are all cooperative with a static counter. There is no such connection for the strategies in favor of PD; instead there is a mixture of cooperating, defecting and static strategies. Our simulation indicates that the chicken game rewards cooperative strategies to a higher extent than the Prisoner's dilemma does, because of the increased cost of mutual defection. The following parts of the result confirm these statements: All of the top six strategies are nice and start with cooperation. They have small or moderate differences in score between the chicken game and the Prisoner's dilemma. TfT is a successful strategy but not the best. All the 11 strategies with a lower score than Random either start with defect or, if
they start with cooperation, are not nice. All of these strategies do significantly worse in the CG than in the PD. This means that we have a game that benefits cooperators more than the PD does, namely the CG. A few of the strategies got, despite the overall decreasing average score, a better score in the CG than in the PD. They all seem to have taken advantage of the increasing score for cooperating against defect. To do that, they must, on average, play more C than D when their opponent plays D. The mimicking strategies, like TfT, cannot be in this group, since they are not that forgiving. In fact, most strategies that demand some kind of revenge for an unprovoked defection are excluded, leaving only the static strategies.9 All static strategies that cooperate on the first move, and some of the partially static ones, do better in the CG than in the PD. We interpret this result as yet another indicator of the importance of being forgiving in a CG.

7.4.2 Adding noise to PD and CG
Fig. 7.9 The four most successful strategies in PD games with increasing noise. Total represents the percentage of the population these four strategies represented.
9 In fact, extremely nice non-static strategies (e.g., a TfT-based strategy that defects with a lower probability than it cooperates on an opponent's defection) would probably also do better in a CG than in a PD, but such strategies were not part of our simulations.
Instead of looking at all the different games, we formed two groups: PD, consisting of the Axelrod, 1.6D and 1.9D matrices, and CG, consisting of the 2.1D, 2.4D and 3.0D matrices. For each group we examined the five most successful strategies at different levels of noise. Fig. 7.9 and Fig. 7.10 show these strategies for PD and CG when 0, 0.01, 0.1, 1.0 and 10.0 per cent noise is introduced. Among the four most successful strategies in PD there were three greedy strategies and one even-matched strategy (Fig. 7.9; see also Fig. 7.3). In all, these strategies constituted between 85% (1% noise) and 60% (0.1% noise) of the population. TfT did well with 0.01% and 0.1% noise; Davis was most successful with 1% noise, and AllD with 10% noise.
Fig. 7.10 The five most successful strategies in CG games with increasing noise. Total represents the percentage of the population these five strategies represented.

Three out of five of the most successful strategies in CG were generous. The total line in Fig. 7.10 shows that five strategies constituted between 50% (no noise) and nearly 100% (0.1% and 1% noise) of the population. TfT, the only even-matched strategy, was the first strategy to decline, as shown in the diagram. At a noise level of 0.1% or more, TfT never won a single population competition. Grofman increased its population until 0.1% noise, but then rapidly disappeared as noise increased. Simpleton, which declined after the 1% noise level, showed the same pattern. Only Fair continued to increase when more noise was added, making it a dominant strategy at 10% noise together with the greedy strategy AllD.
7.4.3 Normalized matrices

7.4.3.1 Playing random

If agents with a number of random strategies are allowed to compete with each other, they will find a single winning strategy after a number of generations. This has to do with genetic drift and small simulation variations between different random strategies in how they actually play their C and D moves. As can be seen in Fig. 7.11, the number of generations needed to find a winning strategy grows as the total population size increases. This almost linear increase (r = 0.99) is only marginally dependent on what game is played.

Fig. 7.11 Number of generations for finding a winning strategy among 15 random strategies with varying population size.

The simulation consists of strategies with a population size of 100 individuals each. According to Fig. 7.11, randomized strategies with 100 individuals each are expected to halt after approximately 2800 generations in a population game. There are two possible kinds of winning strategies: pure strategies, which halt, and mixed strategies (two or more pure strategies), which do
not halt. If there is an active choice of a pure strategy, it should halt before 2800 generations, because otherwise playing random could be treated as a winning pure strategy. Fig. 7.12 shows the relation between pure and mixed strategies for IPD and ICG. All 110 IPD games, each run with one cooperative set and one defective set, halted before 2800 generations. For the ICG, only one out of 99 different games halted before 2800 generations. This game (T = 1.1, S = 0.1) was very close to an IPD. For the rest of the ICG games there was a mixed-strategy outcome. There is no reason to believe that we would find a single-strategy solution by extending the simulation beyond 10000 generations; if a pure solution existed, it should turn up much earlier.

7.4.3.2 Pure and mixed strategies for cooperative and defective sets

Fig. 7.12 shows a major difference between pure and mixed strategies for IPD and ICG. IPD has no successful mixed strategies at all, while ICG favors mixed strategies in an overwhelming majority of the games. Some details not shown in Fig. 7.12 are discussed below.
                       IPD                             ICG
                       Cooperative set  Defective set  Cooperative set  Defective set
    Pure strategies    TfT 78%          TfT 75%        TfT 3%           TfT 2%
                       AllD 20%         AllD 20%
    Mixed strategies   none             none           2-strat. 61%     2-strat. 69%
                                                       3-strat. 33%     3-strat. 24%

Fig. 7.12 The difference between pure and mixed strategies in IPD and ICG. For details see text.

For the cooperative set there is a single-strategy winner after on average 167 generations. TfT wins 78% of the plays and dominates 91 out of the 110 games.10 AllD dominates the rest of the games and wins 20% of the plays. For the defective set there is a single-strategy winner after 47 generations on average. TfT dominates 84 games, AllD 21 games, and 99.99D (playing D 99.99% of the time) 5 games out of the 110. TfT wins 75% of the plays, AllD 20% and 99.99D 4%. In the cooperative set there are two formations of mixed strategies winning most of the games, one with two strategies and the other with three strategies involved.

10 A game is dominated by a certain strategy if it wins more than 50 out of 100 plays.
This means that when a play was finished after 10000 generations, not a single play could separate these strategies and find a single winner. The two-strategy set ATfT and AllD wins 61% of the plays, and the three-strategy set ATfT, AllD and AllCtot wins 33% of the plays. AllCtot means that one and just one of the strategies AllC, 99.99C, 99.9C, 99C or 90C is the winning strategy. For 3% of the games there was a single TfT winner within relatively few generations (on average 754 generations). In the defective set the same two formations win most of the games. ATfT + AllDtot wins 69% of the plays, and ATfT + AllC + AllDtot wins 24% of the plays. AllDtot means that one and just one of the strategies AllD, 99.99D, 99.9D, 99D or 90D is the winning strategy. TfT is a single winning strategy in 2% of the plays, needing on average 573 generations to win a play.

7.4.3.3 Generous and greedy strategies in IPD and ICG

In the C-variant set all AllC variants are generous and TfT is even-matched; AllD, ATfT and Random are all greedy strategies. In the D-variant set all AllD variants are greedy and TfT is still even-matched; AllC, ATfT and Random now represent generous strategies. In the IPD the even-matched TfT is a dominating strategy in both the C- and the D-variant set, with the greedy AllD as the only primary alternative. So the IPD will end up as a fully cooperative game (TfT) or a fully defecting game (AllD) after relatively few generations. This is the case both for the C-variant set and, within even fewer generations, for the D-variant set. In the ICG there is instead a mixed solution between two or three strategies. In the C-variant, ATfT and AllD form a greedy two-strategy set.11 In the three-strategy variant the generous AllCtot joins the other two. In all, generous strategies only constitute about 10% of the mixed strategies. In the D-variant the generous ATfT forms various strategy sets with the greedy AllDtot.

7.5 Discussion
In our first study of variants of Axelrod's original matrix, a CG tends to favor cooperation more than a PD does because of the values of the payoff matrix.

11 With just ATfT and AllD left, ATfT will behave as a generous strategy even though it starts off as a greedy strategy in the C-variant environment.
The payoff matrix in this first series of simulations is constant, a situation that is hardly the case in a real-world application, where agents act in environments in which they interact with other agents and human beings. This changes the context of the agent and may also affect its preferences. None of the strategies in our simulation actually analyzes its score and acts upon it, which gave us significant linear changes in score between the games. We looked at an uncertain environment, free from the assumption of perfect information between strategies, by introducing noise. Generous strategies dominated the CG, while greedy strategies were more successful in the PD. In the PD, TfT was successful in a low-noise environment, and Davis and AllD in a high-noise environment. Fair was increasingly successful in the CG as more noise was added. We conclude that the generous strategies are more stable in an uncertain environment in the CG. Especially Fair and Simpleton did well, indicating that these strategies are likely to be suitable for a particularly unreliable and dynamic environment. The same conclusion about generous strategies in the PD, for another set of strategies, has been drawn by Bendor ([6]; [7]). In our PD simulations we found TfT to be a successful strategy when a small amount of noise was added, while greedy strategies did increasingly better as the noise increased. This indicates that generous strategies are more stable in the CG part of the matrix, both with and without noise.

In the normalized matrices, stochastic memory-0 and memory-1 strategies are used. The main difference between IPD and ICG is best shown by the two strategies TfT and ATfT. TfT does the same as its opponent. This is a successful way of behaving if there is a pure-strategy solution, because it forces the winning strategy to cooperate or defect, but not to do both. ATfT does very badly in IPD because it tries to jump between playing cooperate and playing defect. In ICG we have a totally different situation, because a mixed-strategy solution is favored (at least in the present simulation). ATfT does the opposite of its opponent but cannot by itself form a mixed-strategy solution; it has to rely on other cooperating or defecting strategies. In all the different ICGs, ATfT is one of the remaining strategies, while TfT only occasionally wins a play. For a simple strategy setting like the cooperative and defective sets, ICG will not find a pure-strategy winner at all but a mixture of two or more strategies, while IPD quickly finds a single winner.
Unlike the single-play PD, which always favors defect, the IPD favors playing cooperate. In CG the advantage of cooperation should be even stronger, because it costs more to defect than in the PD, but in our simulation greedier strategies were favored with memory-0 and memory-1 strategies. We think this new paradox can be explained by a larger "robustness" of the chicken game. This robustness may be present if more strategies, like the strategies in the two other simulations, are allowed and/or noise is introduced. Robustness is expressed by two or more strategies winning the game instead of a single winner, or by a more sophisticated single winner. Such a winner could be cTfT, Pavlov or Fair in the presence of noise, instead of TfT. In Carlsson and Jonsson [14], 15 different strategies were run in a population game within different IPD and ICG settings and with different levels of noise. TfT and greedy strategies like AllD dominated the IPD, while Pavlov and two variants of cTfT dominated the ICG. For all levels of noise it took on average fewer generations to find a winner in the IPD. This winner was greedier than the winner in the ICG. If instead a lot of non-intuitive strategies were used together with AllD, AllC, TfT and ATfT, IPD very quickly terminated with TfT and AllD winning the games, while ICG did not terminate at all for most of the noise levels.

We propose that the difference between IPD and ICG can be explained by pure and mixed-strategy solutions for simple memory-0 or memory-1 strategies. For simple strategies like TfT and ATfT, ICG will not have a pure-strategy winner at all but a mixture of two or more strategies, while IPD quickly finds a single winner. For an extended set of strategies and/or when noise is present, the ICG may have more robust winners than the IPD by favoring more complex and generous strategies. Instead of TfT, a complex strategy like Fair is favored.

From an agent engineering perspective, the strategies presented in this chapter are quite simple. The presupposed agents are modeled in a predestined game-theoretic environment without a sophisticated internal representation. If we give the agents involved the ability to establish trust, the difference between the two kinds of games is easier to understand. In the PD, establishing trustworthiness between the agents means establishing trust, whereas in the CG it involves creating fear, i.e., avoiding situations where there is too much to lose. This makes CG a strong candidate for being a major cooperative game together with PD.
Acknowledgements

The author wishes to thank Paul Davidsson, Stefan Johansson, Ingemar Jonsson and the anonymous reviewers from the IAT conference for their critical reviews of previous versions of the manuscript, and Stefan Johansson for running the simulations.
Bibliography

[1] Axelrod, R., Effective Choice in the Prisoner's Dilemma. Journal of Conflict Resolution, vol. 24, No. 1, pp. 3-25, 1980a.
[2] Axelrod, R., More Effective Choice in the Prisoner's Dilemma. Journal of Conflict Resolution, vol. 24, No. 3, pp. 379-403, 1980b.
[3] Axelrod, R., The Evolution of Cooperation. Basic Books, New York, 1984.
[4] Axelrod, R. and Dion, D., The further evolution of cooperation. Science, 242:1385-1390, 1988.
[5] Axelrod, R. and Hamilton, W.D., The evolution of cooperation. Science, 211:1390-1396, 1981.
[6] Bendor, J., Kramer, R.M. and Stout, S., "When in Doubt... Cooperation in a Noisy Prisoner's Dilemma." Journal of Conflict Resolution, vol. 35, No. 4, pp. 691-719, 1991.
[7] Bendor, J., "Uncertainty and the Evolution of Cooperation." Journal of Conflict Resolution, vol. 37, No. 4, pp. 709-734, 1993.
[8] Binmore, K., Playing Fair: Game Theory and the Social Contract. The MIT Press, Cambridge, MA, 1994.
[9] Boerlijst, M.C., Nowak, M.A. and Sigmund, K., Equal Pay for All Prisoners / The Logic of Contrition. IIASA Interim Report IR-97-73, 1997.
[10] Boyd, R., Mistakes Allow Evolutionary Stability in the Repeated Prisoner's Dilemma Game. J. Theor. Biol., 136, pp. 47-56, 1989.
[11] Carlsson, B., How to Cooperate in Iterated Chicken Game and Iterated Prisoner's Dilemma. Intelligent Agent Technology, pp. 94-98, 1999.
[12] Carlsson, B. and Johansson, S., "An Iterated Hawk-and-Dove Game." In W. Wobcke, M. Pagnucco and C. Zhang (eds.), Agents and Multi-Agent Systems, Lecture Notes in Artificial Intelligence 1441, pp. 179-192, Springer-Verlag, 1998.
[13] Carlsson, B., Johansson, S. and Boman, M., Generous and Greedy Strategies. Proceedings of the Congress on Complex Systems, Sydney, 1998.
[14] Carlsson, B. and Jonsson, K.I., The fate of generous and greedy strategies in the iterated Prisoner's Dilemma and the Chicken Game under noisy conditions. Manuscript, 2000.
[15] Durfee, E.H., Practically Coordinating. AI Magazine, 20(1), pp. 99-116, 1999.
[16] Fudenberg, D. and Maskin, E., Evolution and cooperation in noisy repeated games. American Economic Review, 80, pp. 274-279, 1990.
[17] Goldberg, D., Genetic Algorithms. Addison-Wesley, Reading, MA, 1989.
[18] Holland, J.H., Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA, 1975.
[19] Jennings, N. and Wooldridge, M., "Applying Agent Technology." Applied Artificial Intelligence, vol. 9, No. 4, pp. 357-369, 1995.
[20] Koza, J.R., Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press, Cambridge, MA, 1992.
[21] Lindgren, K., Evolutionary Dynamics in Game-Theoretic Models. In The Economy as an Evolving Complex System II (Arthur, Durlauf and Lane, eds., Santa Fe Institute Studies in the Sciences of Complexity, Vol. XXVII), Addison-Wesley, 1997.
[22] Lipman, B.L., Cooperation among egoists in Prisoner's Dilemma and Chicken Game. Public Choice, 51, pp. 315-331, 1986.
[23] Lomborg, B., Game theory vs. Multiple Agents: The Iterated Prisoner's Dilemma. In Artificial Social Systems (C. Castelfranchi and E. Werner, eds., Lecture Notes in Artificial Intelligence 830), 1994.
[24] Mathieu, P. and Delahaye, J.P., http://www.lifl.fr/~mathieu/ipd/
[25] Maynard Smith, J. and Price, G.R., The logic of animal conflict. Nature, vol. 246, 1973.
[26] Maynard Smith, J., Evolution and the Theory of Games. Cambridge University Press, Cambridge, 1982.
[27] Molander, P., The optimal level of generosity in a selfish, uncertain environment. J. Conflict Resolution, 29, pp. 611-618, 1985.
[28] Rapoport, A. and Chammah, A.M., Prisoner's Dilemma: A Study in Conflict and Cooperation. Ann Arbor, The University of Michigan Press, 1965.
[29] Rapoport, A. and Guyer, M., A taxonomy of 2 x 2 games. Yearbook of the Society for General Systems Research, XI, pp. 203-214, 1966.
[30] Rosenschein, J. and Zlotkin, G., Rules of Encounter. MIT Press, Cambridge, MA, 1994.
[31] Russell, B., Common Sense and Nuclear Warfare. Simon & Schuster, 1959.
[32] Selten, R., Reexamination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory, 4:25-55, 1975.
Chapter 8
Training Intelligent Agents Using Human Data Collected on the Internet
Elizabeth Sklar
Department of Computer Science, Boston College, USA

Alan D. Blair
Department of Computer Science and Software Engineering, University of Melbourne, Australia

Jordan B. Pollack
DEMO Lab, Department of Computer Science, Brandeis University, USA
8.1 Introduction
Hidden inside every mouse click and every key stroke is valuable information that can be tapped to reveal something of the human who entered each action. On the Internet, these inputs are called clickstream data, "derived from a user's navigational choices expressed during the course of visiting a World Wide Web site or other online area." [6] Clickstream data can be analyzed in two ways: individually, as input from single users, or collectively, as input from groups of users. Individualized input may be utilized to create user profiles that can guide activities on a web site tailored to the needs of a particular person. Data mining
the clickstream to customize to individual users is nothing new. Starting as early as 1969, Teitelman began working on the automatic error correction facility that grew into DWIM (Do What I Mean) [24]. In 1991, Allen Cypher demonstrated "Eager", an agent that learned to recognise repetitive tasks in an email application and offered to jump in and take over for the user [7]. In 1994, Pattie Maes used machine learning techniques to train agents to help with email, to filter news messages and to recommend entertainment, gradually gaining confidence at predicting what a user wants to do next [12]. Today, commercial products like Microsoft Word provide context-sensitive "wizards" that observe their users and pop up to assist with current tasks. Internet sites like altavista (http://www.altavista.com) recognise keywords in search requests, offering alternate suggestions to help users hone in on desired information. At the amazon.com (http://www.amazon.com) book store, after finding one title, other books are recommended to users who might be interested in alternate or follow-up reading. On many sites, advertisements which at first seem benign slowly adapt their content to the user's input, subtly wooing unsuspecting surfers.

Input from users may also be examined collectively and grouped to illuminate trends in human behavior. Users can be clustered, based on a feature like age or gender or win rate (of a game), and the behavioral data for all humans exhibiting the same feature value can be grouped and analyzed, in an attempt to recognize characteristics of different user groups. An Internet system allows us to combine user profile knowledge with statistics on group behavior (from a potentially very large set of humans) in order to make more informed decisions about software adaptation than input from a single source would provide. These techniques may prove especially useful when applied to educational software. The work presented here examines these ideas in the context of an Internet learning community where humans and software agents play games against each other.
8.2 Motivation
Many believe that the secret to education is motivating the student. Researchers in human learning have been trying to identify the elements of electronic environments that work to captivate young learners. In 1991, Elliot Soloway wrote: "Oh, if kids were only as motivated in school as they are in playing Nintendo." [23] Two years later, Herb Brody wrote: "Children assimilate information and acquire skills with astonishing speed when playing video games. Although much of this gain is of dubious value, the phenomenon suggests a potent medium for learning more practical things." [5]

Thomas Malone is probably the most frequently referenced author on the topic of motivation in educational games. In the late 1970's and early 1980's, he conducted comprehensive experimental research to identify elements of educational games that made them intrinsically motivating [13]. He highlighted three characteristics: challenge, fantasy and curiosity. We are primarily interested in the first characteristic. Challenge involves games having an obvious goal and an uncertain outcome. Malone recommends that goals be "personally meaningful", reaching beyond simple demonstration of a certain skill; instead, goals should be intrinsically practical or creative. He emphasizes that achieving the goal should not be guaranteed and suggests several elements that can help provide this uncertainty: variable difficulty level, multiple goal levels, hidden information, randomness. He states that "involvement of other people, both cooperatively and competitively, can also be an important way of making computer-based learning more fun." [14]

We concentrate on multi-player games, particularly on the Internet because it is widely accessible. The Internet offers the additional advantage that participants can be anonymous. Indeed, participants do not even have to be human — they can be software agents. We take a population-based approach to agency [15]. Rather than building one complex agent that can play a game using many different strategies, we create a population of simple software agents, each exhibiting a single strategy. The notion of training agents to play games has been around since at least the 1950's, beginning with checkers [20] and chess [21; 4], and branching out to include backgammon [3; 25; 16], tic-tac-toe [1], Prisoner's Dilemma [2; 9] and the game of tag [18]. With these efforts, the goal was to build a champion agent capable of defeating all of its opponents.
Our work differs because our goal is to produce a population of agents exhibiting a range of behaviors that can challenge human learners at a variety of skill levels. Rather than trying to engineer sets of strategies associated with specific ability levels or to adapt to individual players, we observe the performance of humans interacting in our system and use these data to seed the population of agents. This chapter describes our efforts training agents in two domains: one is a video game and the other is an educational game. In both cases, the agents were trained using human data gathered on our web site. We use these data both individually and collectively. With the individual, or one-to-one, method, we use input from one human to train a single agent. With the collective, or many-to-one, approach, we use input from a group of humans to train a single agent. The first major section of the chapter details the video game domain, outlining the agent architecture, the specifics of the training algorithm and experimental results. The second major section provides similar discussion of the educational game and additionally compares the results obtained in the two domains. Finally, we summarize our conclusions and highlight future directions.
8.3 The first domain: Tron
Tron is a video game which became popular in the 1980's, after the release of the Disney film of the same name. In Tron, two futuristic motorcycles run at constant speed, making right-angle turns and leaving solid wall trails behind them — until one crashes into a wall and dies. In earlier work led by Pablo Funes [8], we built a Java version of the Tron game and released it on the Internet (http://www.demo.cs.brandeis.edu/tron), illustrated in Figure 8.1. Human visitors play against an evolving population of intelligent agents, controlled by genetic programs (GP) [11]. During the first 30 months online (beginning in September 1997), the Tron system collected data on over 200,000 games played by over 4000 humans and 3000 agents.

In our version of Tron, the motorcycles are abstracted and represented only by their trails. Two players — one human and one software agent — each control a motorcycle, starting near the middle of the screen and heading in the same direction. The players may move past the edges of the screen and re-appear on the opposite side in a wrap-around, or toroidal, game arena. The size of the arena is 256 x 256 pixels.
Fig. 8.1 The game of Tron.
The agents are provided with 8 simple sensors with which to perceive their environment (see Figure 8.2). The game runs in simulated real time (i.e., play is regulated by synchronised time steps), where each player selects one of three moves: left, right or straight.

Fig. 8.2 Agent sensors.
Each sensor evaluates the distance in pixels from the current position to the nearest obstacle in one direction, and returns a maximum value of 1.0 for an immediate obstacle (i.e., a wall in an adjacent pixel), a lower number for an obstacle further away, and 0.0 when there are no walls in sight.
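One simple mapping with these properties is the reciprocal of the pixel distance; the original system's exact sensor function is not given here, so the sketch below is our assumption:

    def sensor_value(distance):
        """Distance-to-obstacle reading for one of the 8 sensors.

        distance -- pixels to the nearest wall in this direction,
                    or None when no wall is in sight
        """
        if distance is None:
            return 0.0                      # no walls in sight
        return 1.0 / max(distance, 1)       # 1.0 for an adjacent pixel

    print(sensor_value(1))     # -> 1.0 (immediate obstacle)
    print(sensor_value(20))    # -> 0.05
    print(sensor_value(None))  # -> 0.0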
Our general performance measure is the win rate, calculated as the number of games won divided by the number of games played. The overall win rate of the agent population has increased from 28% at the beginning of our experiment (September 1997) to nearly 80%, as shown in Figure 8.3(a).
During this time, the number of human participants has increased. Figure 8.3(b) illustrates the distribution of performances within the human population, grouped by (human) win rate. While some segments of the population grow a bit faster than others, overall the site has maintained a mix of human performances.
Fig. 8.3 Results from the Internet experiment: (a) agent win rate; (b) distribution of human population.
The data collected on the Internet site consists of these win rate results as well as the content of each game (referred to as the moves string). This includes the length of the game (i.e., number of time steps) and, for every turn made by either player, the global direction of the turn (i.e., north, south, east or west) and the time step in which the turn was made. 8.3.1
Agent Training and
Control
We trained agents to play Tron, with the goal of approximating the behaviour of the human population in the population of trained agents. The training procedure uses supervised learning [17; 26], as follows. We designate a player to be the trainer and select a sequence of games (i.e., moves strings) that were played by that player against a series of opponents, and we replay these games. After each time step, play is suspended and the sensors of the trainer are evaluated. These values are fed to a third player, the trainee (the agent being trained), who makes a prediction of which move the trainer will make next. The move predicted by the trainee is then compared to the move made by the trainer, and the trainee's control mechanism is adjusted accordingly.
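In outline, each recorded game becomes a stream of supervised examples: the trainer's sensor readings are the input and the trainer's actual next move is the target. A sketch of the loop (ours; the object and method names are illustrative placeholders, not the authors' API):

    def train_on_game(trainee, game, trainer_id):
        """Replay one recorded game and update the trainee.

        game       -- replayable record built from a moves string
        trainer_id -- which of the two recorded players is the trainer
        """
        for state in game.replay():                   # step through the game
            sensors = state.eval_sensors(trainer_id)  # trainer's 8 sensor values
            predicted = trainee.predict(sensors)      # trainee guesses next move
            actual = state.next_move(trainer_id)      # what the trainer really did
            trainee.adjust(sensors, predicted, actual)  # e.g. backpropagation step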
The trained agents are controlled by a feed-forward neural network (see Figure 8.4). We adjust the networks during training using the backpropagation algorithm [19] with Hinton's cross-entropy cost function [10]. The results presented here were obtained with momentum = 0.9 and learning rate = 0.0002.

Fig. 8.4 Agent control architecture. Each agent is controlled by a feed-forward neural network with 8 input units (one for each of the sensors in Figure 8.2), 5 hidden units and 3 output units — representing each of the three possible actions (left, right, straight); the one with the largest value is selected as the action for the agent.
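A sketch of the controller's forward pass (our own; the figure indicates tanh hidden units and sigmoid output units, and we follow that, but the random weight initialization shown here stands in for trained weights):

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(0, 0.1, (5, 8))   # 8 sensor inputs -> 5 hidden units
    b1 = np.zeros(5)
    W2 = rng.normal(0, 0.1, (3, 5))   # 5 hidden units -> 3 output units
    b2 = np.zeros(3)

    def act(sensors):
        """Map 8 sensor values to one of the three moves."""
        h = np.tanh(W1 @ sensors + b1)             # tanh hidden layer
        o = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid output layer
        return ["left", "right", "straight"][int(np.argmax(o))]

    print(act(np.zeros(8)))   # open space on all sides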
8.3.2 Challenges
The supervised learning method described above is designed to minimize the classification error of each move (i.e., choosing left, right or straight). However, a player will typically go straight for 98% of time steps, so there is a danger that a trainee will minimize this error simply by choosing this option 100% of the time; and indeed, this behaviour is exactly what we observed in many of our experiments. Such a player will necessarily die after 256 time steps (see Figure 8.5). Conversely, if turns are emphasized too heavily, a player will turn all the time and die even faster (Figure 8.5b). The discrepancy between minimizing move classification error and playing a good game has been noted in other domains [25] and is particularly pronounced in Tron. Every left or right turn is generally preceded by a succession of straight moves and there is a natural tendency for the straight moves to drown out the turn, since they will typically occur close together in sensor space. In order to address this problem, we settled on an evaluation strategy based on the frequency of each type of move. During training, we construct a table (table 8.1) that tallies the number of times the trainer
and trainee turn, and then emphasize turns proportionally, based on these values.
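One way to realize this emphasis is to weight each training example inversely to the frequency of its move class, so that the rare turns carry as much total weight as the abundant straight moves. This sketch is our reading of "emphasize turns proportionally", not necessarily the authors' exact scheme; the counts are illustrative:

    def example_weights(move_counts):
        """Per-class weights inversely proportional to move frequency.

        move_counts -- tally of the trainer's moves gathered during
                       replay, e.g. from a Table 8.1-style table
        """
        total = sum(move_counts.values())
        return {move: total / (len(move_counts) * n)
                for move, n in move_counts.items()}

    w = example_weights({"left": 7000, "straight": 660000, "right": 6000})
    # Straight examples get a weight near 0.34, turn examples near 32-37:
    # turns are emphasized in proportion to their rarity.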
Fig. 8.5 A comparison of different trainees: (a) a trainee that makes no turns; (b) a trainee that only makes turns; (c) a trainee that learns to turn; (d) the trainer. All had the same trainer; trainee variations include using a 12-input network and different move evaluation strategies. All games are played against the same GP opponent. The player of interest is represented by the solid black line and starts on the left.
Table 8.1 Frequency of moves, for the best human trainer.

                           trainer
                left     straight    right
    trainee
      left       852       5723       123
      straight  5360     658290      4668
      right      161       5150       868

8.3.3 Experiments and Results
We trained three populations of players: one with GP trainers and two with human trainers. Although our goal is to approximate the behaviour of the human population, we initially tuned our training algorithm by training agents to emulate the behaviour of the GP players from the Internet site. These GPs are deterministic players (so their behaviour is easier to predict than humans'), thus providing a natural first step toward our goal. Separate training and evaluation sets were compiled for both
training efforts, as detailed in Figure 8.6. There were 69 GPs who had played more than 1000 games on the Internet site (agents1000); these were used as trainers. There were 135 GPs who had played between 100 and 1000 games (agents100); these were used for evaluation purposes. There were 58 humans who had played more than 500 games on the Internet site (humans500); these were used as human trainers.

    data for GP trainees:     training set   = agents1000 vs agents1000
                              evaluation set = agents1000 vs agents100
    data for human trainees:  training set   = humans500 vs GPs (Internet data)
                              evaluation set = humans500 vs agents100

    humans500  = humans with more than 500 Internet games (58 humans)
    agents100  = GPs with between 100 and 1000 Internet games (135 agents)
    agents1000 = GPs with more than 1000 Internet games (69 agents)

Fig. 8.6 Data sets for training and evaluation.
The humans500 data set was used both individually and collectively. First, 58 individual trainees were produced, based on a one-to-one correspondence between trainers and trainees. Second, 10 collective trainees were produced, based on a many-to-one correspondence between trainers and trainees, where the 58 individuals were sorted into 10 groups based on their win rates (e.g., group 1 had a 0-10% win rate, group 2 a 10-20% win rate, etc.). Each GP trainer played against agents1000 to produce a training set and against agents100 to produce an evaluation set. The games played by humans500 were alternately placed into training and evaluation sets, and then the evaluation set was culled so that it consisted entirely of games played against members of the agents100 group.

We examine our training efforts in two ways. First, we look directly at the training runs and show the improvement of the networks during training. Second, we present the win rates of the two populations of trainees, obtained by playing them against a fixed set of opponents, and compare trainers with their trainees. Our measure of improvement during training is based on the frequency of moves table and how it changes. Referring back to Table 8.1, if the trainee were a perfect clone of its trainer, then all values outside the diagonal would be 0 and the correlation coefficient between the two players would be 1. In reality, the GP trainees reach a correlation of approximately 0.5, while
the human trainees peak at around 0.14. For comparison, we computed correlation coefficients for 127 random players, i.e., players that choose a move randomly at each time step, resulting in a much smaller correlation of 0.003. Figure 8.7 shows the change in correlation coefficient during training for selected trainees.
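The chapter does not state which correlation coefficient it uses; one plausible reconstruction treats the paired moves as numerically coded samples weighted by the counts in a Table 8.1-style tally and computes Pearson's r, as sketched below (our assumption):

    import numpy as np

    CODE = {"left": -1.0, "straight": 0.0, "right": 1.0}

    def move_correlation(table):
        """Correlation of paired moves from a frequency table.

        table -- dict (trainer_move, trainee_move) -> count, as in
                 Table 8.1. A diagonal-only table gives r = 1, matching
                 the perfect-clone case described in the text. Degenerate
                 tables (one move only) have zero variance and no r.
        """
        xs, ys, ws = [], [], []
        for (a, b), n in table.items():
            xs.append(CODE[a]); ys.append(CODE[b]); ws.append(n)
        xs, ys, ws = map(np.array, (xs, ys, ws))
        mx = np.average(xs, weights=ws); my = np.average(ys, weights=ws)
        cov = np.average((xs - mx) * (ys - my), weights=ws)
        sx = np.sqrt(np.average((xs - mx) ** 2, weights=ws))
        sy = np.sqrt(np.average((ys - my) ** 2, weights=ws))
        return cov / (sx * sy)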
Fig. 8.7 Change in correlation coefficient during training runs: (a) GPs; (b) humans (one-to-one).
In the GP experiment, the best trainer gave rise to the worst trainee; hence the labels in the figure on the left, "best trainer" and "worst trainee", refer to the same player. In the human one-to-one experiment, the best trainer produced the best trainee; hence the labels in the figure on the right, "best trainer" and "best trainee", refer to the same player. The terms "best" and "worst" refer to the win rates of the players as measured in games played against the evaluation set (see Figures 8.8(b) and 8.8(d)).

The win rates in the evaluation games for the trainers and trainees are shown in Figure 8.8, for each of three training efforts: (1) GP training (Figures 8.8a and 8.8b), (2) human one-to-one training (Figures 8.8c and 8.8d), and (3) human many-to-one training (Figures 8.8e and 8.8f). There are two types of plots shown. The first column contains the first type of plot (for each training group, Figures 8.8a, 8.8c and 8.8e). Here, the players are sorted within each population according to their win rate, so the ordering of individuals is different within each trainer and trainee population. The plot demonstrates that the controllers have learned to play Tron at a variety of different levels. The second column contains the second type of plot (for each training group, Figures 8.8b, 8.8d and 8.8f). Here, we plot the win rate of individual
trainees against the win rate of their corresponding trainers. It is interesting to notice that the best human trainer (from Figure 8.8d) has given rise to the best trainee (see Figures 8.9a and 8.9b), while the best GP trainer (from Figure 8.8b) has produced the worst trainee (see Figures 8.9c and 8.9d). A few of the trainees play very poorly. These are cases where the network either fails to make any turns or makes turns at every move (in spite of the strategy described in section 8.3.2). Also, in a number of cases, the trainee outperforms its trainer.
Fig. 8.8 Win rates of trainer and trainee populations: (a), (b) GPs; (c), (d) humans, one-to-one; (e), (f) humans, many-to-one. In the left-hand column, individual players are shown in sorted order by win rate; in the right-hand column, the win rate of each trainee is plotted against the win rate of its trainer. The horizontal lines denote boundaries for grouping players (according to win rate); the human trainers produce a population of trainees with a distribution across these groupings fairly similar to their own.
Finally, we step away from statistics and highlight some of the trainers and their trainees by showing selected games against the same opponent. Note two situations: one where a trainer that is a bad player produces a trainee that plays well (Figures 8.9e and 8.9f), and one where a trainer that is a good player produces a trainee that plays poorly (Figures 8.9g and 8.9h).
Fig. 8.9 Sample games of individual trainers and trainees: (a) trainee and (b) trainer, where the best human trainer is also the best trainee; (c) trainee and (d) trainer, where the best GP trainer is also the worst trainee; (e) trainee and (f) trainer, showing that a bad player can produce a good trainee; (g) trainee and (h) trainer, showing that a good player can produce a bad trainee. All games are played against the same GP opponent. The player of interest is represented by the solid black line and starts on the left-hand side of the arena.
8.3.4 Discussion

The overwhelming dominance of the straight move inherent in the Tron domain makes it difficult for most controllers to learn when to turn. Indeed, this characteristic proved to be extremely challenging, and initially
214
E. Sklar, A. D. Blair and J. B. Pollack
we produced hundreds of networks t h a t never learned to turn. T h e evaluation strategy t h a t we settled on (based on the frequency of moves table) has allowed players to learn effectively. However, we believe t h a t this method works to produce players t h a t turn only when necessary, and cannot result in more varied behaviours such as those illustrated in Figures 8.5d, 8.9b and 8.9h. While this precise evaluation strategy is highly domain dependent, the technique may be quite valuable for training in domains where one input tends to swamp others and for learning to generalize h u m a n behaviour in more complex domains. We make several observations about the results we have obtained, speculating on the discrepancies between trainers and trainees and addressing the issues raised at the beginning of section 8.3.3. How can we explain a trainer t h a t wins 2% of the time, yet produces a trainee t h a t wins 50% of the time (see Figure 8.8b)? T h e trainee is not being trained on whether it wins or not — in fact the trainee doesn't know if it wins at all. T h e trainee learns only from a sequence of moves. If the trainer makes nine good moves and then a bad one ends the game, the trainee has still gained from 90% of this experience. Does our m e t h o d produce controllers t h a t can play a decent game of Tron? Yes — and one conclusion we can draw from our statistics is t h a t a population of h u m a n s can act as effective trainers for a graded population of agents, because there is naturally more variation in behaviour both across an entire population of humans and within a single stochastic h u m a n player. It is i m p o r t a n t for artificially trained players to experience a wide variety of behaviours, otherwise they will not be robust and will only perform well against players with styles similar to those of their trainers. Were we able to produce a population t h a t approximates the behaviour of its trainers? This is a difficult question to answer. While the correlation between individual GP trainers and trainees based on choice of move is much higher than t h a t for humans, the correlation between win rates of individual trainers and trainees against the same opponent is better for h u m a n s . We speculate t h a t the discrepancies may be due to artifacts of the domain and the nature of each type of controller. Features t h a t contribute include: GPs are deterministic players (vs. non-deterministic h u m a n s ) , and GPs share a limited view of their environment, using the same sensors t h a t are employed by the trainee networks. T h e h u m a n players, in contrast, have a global view of the playing arena which is not practical for artificial controllers in this context.
Training Intelligent
Agents Using Human Data Collected on the Internet
215
Humans often produce different responses when presented with the same situation multiple times. Clearly then, it is not possible for a deterministic controller to model the behaviour of the humans exactly. Further work is exploring adding some measure of non-determinism to the controller. Nonetheless, we propose to take advantage of networks that are able to filter out mistakes that humans make and thus achieve performance superior to that of their trainers — as was the case for 19 of the 58 human trainees.
8.4
The second domain: CEL
CEL (Community of Evolving Learners) [22] is an Internet learning community created for children. It is located on the web (http://www.demo.cs.brandeis.edu/cei) and is open to anyone with a Java-enabled browser. Inside CEL, participants engage in multi-player educational games. If not enough humans are logged into the site, then software agents act as artificial players, maintaining an active presence in the system at all times and thereby sustaining the community. The CEL system was designed as a framework to host experiments focused on learning, in humans and in machines. CEL differs from other Internet learning communities — particularly because it is more accessible, it enforces user anonymity, it is designed for children, it supports real-time multi-user activities and it offers a shareable server that can act as host to others' activities. Its basis is in computer science, not education, so the purpose is not to put forward a new pedagogical example. On the contrary, the goal is to establish a platform that others with research interests in human learning, cognitive science or artificial intelligence can use to define and implement their own studies. The work presented here represents one such study, in which human data collected in the CEL system was used to train software agents to play a simple keyboarding (typing) game.
8.4.1
A brief tour of CEL
Students log into the CEL web site with an individual user name and password. In order to maintain privacy, the user name is never shown to others; instead, participants are represented inside the system by two-dimensional graphical icons called IDsigns, which users create using a pixel editing tool. After logging in, students are shown a simple menu page containing
216
E. Sklar, A. D. Blair and J. B. Pollack
a list of available activities. Clicking- on a game icon selects that activity. Next, users are placed in an open playground, a page that contains a matrix filled with IDsigns belonging to other users who are currently logged into CEL and are playing the-same game (Figure 8.10a). These are a user's playmates; together they comprise a user's playgroup. By clicking on a playmate's IDsign, a student invites that playmate to join her in a match. The match begins when the browser displays a game page, containing a Java applet that facilitates play. Both players participate according to the particular format of the selected game. When the match is over, each player is returned to his playground and is then free-to engage in another match with another (or the same) playmate.
(a) A sample playground. Fig. 8.10
(b) The game of Keyit. CEL screens.
The work presented here is based on a simple game called Keyit (Figure 8.10b), a two-player activity in which participants are each given ten words to type and are scored based on speed and accuracy. The same set of words is presented to both players, selected from a database containing nearly 3§,000 words. Every word in the database is characterized by a
Training Intelligent
Agents Using Human Data Collected on the Internet
217
vector of seven feature values: word length, key boarding level*, Scrabble scoret, number of vowels, number of consonants and number of 2 and 3consonant clusters. For each player, a timer begins when she types the first letter of a word and stops when she presses the Enter key to terminate the word. Time is measured using the system clock on her computer. In order to protect young players, there is no chat facility inside CEL. Participants communicate only through the moves of the games they are playing. Most multi-user environments involve some type of natural language communication, even MUD's*, which generally use a restricted form of English (or other common spoken language). Deploying a believable software agent as a substitute human partner in the CEL environment is therefore a simpler task than in other settings. 8.4.2
Agent
control
Inside CEL, software agents need to exhibit three categories of behaviors: (1) system behavior, (2) playground behavior, and (3) game behavior. System behavior refers to high-level actions like logging into and out of CEL at particular times of day and selecting different playgrounds. Playground behavior refers to entering and exiting playgrounds and inviting playmates to engage in matches. Game behavior refers to the play within a specific game. A top-level controller for the agent decides which behavior to follow, based on the agent's current state (e.g., residing in a playground or playing a game). Here, we limit our discussion to game behavior, specifically for the game of Keyit. The basic task is as follows: given a word, characterized by its corresponding set of seven feature values, output the length of time to type the "Several standards define an order for introducing keys to students learning typing. We assign a value based on the highest keyboarding level of any of the letters in each word, according to the ordering listed here: http://www.absurd.org/jb/typodrome/. t Scrabble is a board game in which players take turns making interconnecting words by placing letter tiles on a grid in crossword puzzle fashion. Each letter is assigned a value, according to its frequency of use in American English. Players receive a score for each word they place — the sum of the values for each letter in the word. 'Multi-User Domain
218
E. Sklar, A. D. Blair and J. B. Pollack
word. In addition to the feature values, we also consider the amount of time that has elapsed since the previous word was typed§. The agents are controlled by feed-forward neural networks. The network architecture is shown in Figure 8.11. There are 8 input nodes, corresponding to each of the seven feature values (normalized) plus the elapsed time. The elapsed time is partially normalized to a value between 0 and (close to) 1. There are 3 hidden nodes and one output node, which contains the time to type the input word, in hundredths of a second. word length keyboarding level scrabble score
s-("
\ ^
s-( A T - V V - J ^ ^ . s»-( r^C\%?S<^/%?
^-^ \
number of vowels number of consonants number of 2-consonant clusters number of 3-consonant clusters elapsed time since last entry input Fig. 8.11
8.4.3
Agent
hidden layer
output
Neural network architecture.
training
In 1999, a pilot study was conducted in which CEL was used by 44 fourth and fifth grade children at a public primary school'. The primary objective of this study was to examine the effectiveness of the CEL mechanism, and so the activities were limited to simple games, one of which was Keyit. The last 19 days of data collection (spread over four months) were used as the basis for training the agents to play Keyit. Note that the humans were learning throughout this period, so the networks were trained to approximate the average performance of each human across the entire time period. §This only pertains to words within the same game. The first word in a game has an elapsed time value of 0. ' A l l participants had signed parental permission.
Training Intelligent
Agents Using Human Data Collected on the Internet
219
For each child involved in the pilot study, we gathered all the moves from all games of Keyit. A "move" includes a timestamp, the word being typed and the amount of time that the player took to type the word. Then we calculated the time that had elapsed between moves (based on consecutive timestamps) and, along with the seven feature values for each word, created two files — one for training and one for testing — placing moves from alternate games in each file. We conducted two training experiments. First, we defined a one-to-one correspondence between human trainers and network trainees, with the goal of producing 44 agents whose behaviors emulate their individual trainers. Second, we employed a many-to-one correspondence between groups of human trainers and network trainees. For the second method, we clustered human trainers into eight groups, based on their similarity in average typing speed. The objective here was to generate a population of agents that could challenge human learners across a range of abilities. Figure 8.12 contains data for all the humans involved in the pilot study, plotted in ascending order according to typing speed (in letters/second). We highlight four students: one fourth grade boy (id = 119), one fourth grade girl (id = 98), one fifth grade boy (id = 88) and one fifth grade girl (id = 89). The plot also indicates the groupings of players, used for the many-to-one training scenarios. The players are clustered according to typing speed, in increments of 0.5 letters/sec.
1
3.5
° ° *
tl 0.5
gr4 girts gr4 boys gr5 giris gr5 boys
«*°i o*
!
98
individual players
0
Fig. 8.12
average typing speeds of players, with standard deviation.
As with the Tron networks described in the previous section, we trained
220
E. Sklar, A. D. Blair and J. B. Pollack
the Keyit networks using supervised learning. The networks were presented with a series of moves, and they predicted the trainer's speed for those moves. Based on the accuracy of the trainees' predictions, the network weights were adjusted using backpropagation. The results presented here were obtained with a learning rate of 0.00001. All the networks were trained for 10,000 epochs, but progress generally leveled off after 2500 epochs. Throughout the training sequence, we kept track of the prediction error for the network — the difference between the typing time predicted by the network and the actual typing time of the training set. We saved one "best" network for each training sequence, corresponding to the set of weights which resulted in the smallest prediction error. After the training sequences were completed, we evaluated the best networks for each effort by comparing its prediction with the human's data, for both the training set and the (reserved) test set of data. 8.4.4
Results
We look at the results of the training efforts in several ways. First, we look at the training period and show how the network improved its predictive ability during training. Figure 8.13 shows the performance of the networks trained for the four sample players (88, 89, 98 and 119). The plots in the top row illustrate the prediction error for the networks. The solid curve plots the error based on the test data set; the dashed curve plots the error based on the training data set. The plots in the bottom row show how the error in typing speed improves over time, when the networks are confronted with the test data set (solid curve) and the training data set (dashed curve). The networks learn quite quickly, sometimes within 500 epochs. It is interesting to note that in some cases, as with players 98 and 119, the difference in prediction error between the training and test data sets is relatively marked; however the difference in typing speeds is negligible. Another way in which we examine the training effort is by studying the correlation between the trainers and the best trainees. Figure 8.14 plots the typing speed for the trainees (horizontal axes) versus their trainers (vertical axes), for both the test and training data sets, for the one-to-one and many-to-one training efforts. The correlation coefficients are listed in table 8.2, illustrating the average relationship between trainers and trainees across both populations. The correlation is much higher for the many-toone trainees than the one-to-one trainees.
Training Intelligent
""' E s
id = 89
id = 88
id = 98
id = 119 j.
Agents Using Human Data Collected on the Internet
E^
L=J
\
V „£
-
"
[.-, sr\
\
Fig. 8.13
Table 8.2
Improvement during training.
Correlation coefficients: individual training.
one-to-one many-to-one
training set 0.6364 0.9910
4
4
35
3.5
*
5 3
"•»
1. A I' *'Iff "3 •
•v
•a
t
c
test set 0.4204 0.9965
3
I2-5
•»
•
o a
2
•
«
training set test set
1 2 3 trainees (networks), letters/sec
(a) one-to-one Fig. 8.14
training set test set
|0.5
OJ
1 2 3 trainees (networks), letters/sec
(b) many-to-one
Correlation between trainers and best trainees.
221
222
E. Sklar, A. D. Blair and J. B. Pollack
The final way in which we study the results takes a collective, or manyto-one, approach. Figure 8.15 compares the average speeds of the human population with those of the agent populations, for both training schemes. The comparison is made by first sorting both populations according to speed and then calculating the correlation coefficients. In the one-to-one case, sorting the trainees re-orders the comparisons that are made when computing the correlation coefficient, and so the correlation is higher. In the many-to-one case, the population-based correlation between trainers and trainees is precisely the same as in the individual case, because the training went so well that sorting the trainees does not change their order and so the two comparisons are equivalent. Note that the average speeds for the agent population were based on data collected during the testing runs only.
Table 8.3
Correlation coefficients: collective training
test set 0.8002 0.9965
one-to-one many-to-one
J
4r
° °
humans agents
3.5
a!
3
f
o ooo
"i
8 -3 2
2 5
2
CD
_ 0 ooo
-1.5| 1
humans agents
1.5
1
„.»«"
0.5 individual players
(a) one-to-one Fig. 8.15
0
individual players
(b) many-to-one
Correlation between populations of trainers and best trainees.
Training Intelligent
8.5
Agents Using Human Data Collected on the Internet
223
Conclusion
In both domains discussed here, our goal was to approximate the behavior of the human population in a population of software agents. We make several observations from these experiments, first in regard to measuring performance of training efforts and second in regard to organizing data to be used for training. When measuring the performance of training efforts, it is extremely difficult to emulate exactly the behavior of individual humans, even in the limited domains we have studied here. Making direct comparisons after oneto-one training efforts, between trainee and trainer, does not give a reliable indication of how well the training runs have gone (e.g., Figures 8.8b, 8.8d, 8.8f and 8.14). Instead, statistical evaluations should be made by comparing features of the trainee population with features of the trainer population (e.g, Figures 8.8a, 8.8c, 8.8f and 8.15). When organizing data to be used for training, the collective approach, rather than the individual approach, produces more robust trainees. Place human trainers who exhibit similar features in a group together and pool their input data; then use this collective data set to train a smaller number of trainees. The result will be fewer, but more experienced, trainees — because they have trained on a wider data set that is the collective experience of a group of similar humans (e.g., Figures 8.8e, 8.8f, 8.14b and 8.15b). Our goal is to produce a population of graded agents, using human behaviour as the basis for constructing the graded population, and then to select opponents from this population that are appropriate learning partners for humans at various stages of advancement. The next step with this work is to deploy the agent populations that have been described here and implement selection algorithms that will choose the appropriate learning partners. Future work involves building agents that can adapt their performance on-line. One method for accomplishing this would be to train an agent using data from the first few games, deploy the agent and then continue to train it further, by incorporating moves from subsequent games of its human trainer.
224
E. Sklar, A. D. Blair and J. B. Pollack
Acknowledgements Special thanks to Pablo Funes, for his lead on the Tron project. Additional thanks to Travis Gephardt, Matthew Hugger and Maccabee Levine for implementation help, and to Tom Banaszewski and Jackie Kagey for supporting the CEL pilot study. This research was partially funded by the Office of Naval Research under grant N00014-98-1-0435 and by a University of Queensland Postdoctoral Fellowship.
Training Intelligent
Agents Using Human Data Collected on the Internet
225
Bibliography
[1] P. J. Angeline and J. B. Pollack. Competitive environments evolve better solutions for complex tasks. In S. Forrest, editor, Genetic Algorithms: Proceedings of the Fifth International Conference (GA93), 1993. [2] R. Axelrod. The Evolution of Cooperation. Basic Books, 1984. [3] H. J. Berliner. Backgammon computer program beats world champion. Artificial Intelligence, 14, 1980. [4] H. J. Berliner and C. Ebeling. Pattern knowledge and search: The suprem architecture. Artificial Intelligence, 38(2), 1989. [5] H. Brody. Video games that teach? Technology Review, November/December, 1993. [6] CAIP. Privacy code, 1996. [7] A. Cypher. Eager: Programming repetitive tasks by example. In Proceedings of CHI'91, 1991. [8] P. Funes, E. Sklar, H. Juille, and J. B. Pollack. Animal-animat coevolution: Using the animal population as fitness function. In From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, 1998. [9] T. Haynes, R. Wainwright, S. Sen, and D. Schoenefeld. Strongly typed genetic programming in evolving cooperative strategies. In L. Eshelman, editor, Genetic Algorithms: Proceedings of the Sixth International Conference (ICGA95), 1995. [10] G. Hinton. Connectionist learning procedures. Artificial Intelligence, 40, 1989. [11] J. Koza. Genetic Programming: On the Programming of Computers
226
E. Sklar, A. D. Blair and J. B. Pollack
by Means of Natural Selection. MIT Press, Cambridge, MA, 1992. [12] P. Maes. Agents that reduce work and information overload. Communications of the ACM, 37(7):31-40,146, 1994. [13] T. Malone. Toward a theory of intrinsically motivating instruction. Cognitive Science, 4:333-369, 1981. [14] T. Malone. What makes computer games fun? Byte, December 1981. [15] M. Minsky. Society of Mind. Picador, London, 1987. [16] J. B. Pollack and A. D. Blair. Co-evolution in the successful learning of backgammon strategy. Machine Learning, 32:225-240, 1998. [17] D. Pomerleau. Neural Network Perception for Mobile Robot Guidance. Kluwer Academic, 1993. [18] C. W. Reynolds. Competition, coevolution and the game of tag. In R. A. Brooks and P. Maes, editors, Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems. MIT Press, 1994. [19] D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors. Nature, 323, 1986. [20] A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3:210-229,1959. [21] C. E. Shannon. Programming a computer for playing chess. Philosophical Magazine [Series 7], 41, 1950. [22] E. Sklar. CEL: A Framework for Enabling an Internet Learning Community. PhD thesis, Brandeis University, 2000. [23] E. Soloway. How the nintendo generation learns. Communications of the ACM, 34(9), 1991. [24] W. Teitelman. A display oriented programmer's assistant. International Journal of Man-Machine Studies, 11:157-187, 1979. [25] G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8, 1992. [26] G. Wyeth. Training a vision guided robot. Machine Learning, 31, 1998.
Chapter 9
Agent Dynamics: Soap Paradigm
Felix W . K. Lor Intelligent & Interactive Systems, Department of Electrical & Electronic Engineering, Imperial College of Science, Technology and Medicine, UK 9.1
Multi-agent systems
Over the past few decades machine intelligence has been growing more attraction in research and real world applications. In the pre-historic period of time, human beings only made use of simple tools by grasping a branch or a stone in order to capture a prey or cook food. During the evolution, human beings learnt to produce composite materials, such as steel, rather than using raw materials only. Moreover, they sharpened their simple tools to create better equipment. Changing from an individual to tribes, human beings grouped together and formed a larger society. The community grows bigger and the social structure becomes more complicated. Human beings then split up into various companies to do different jobs. Different people took different roles and give different contributions to the society. People gradually own their property. Afterwards, a market for trading exists among people to provide a media to exchange their goods and services. Not only food catering, but people also earn their living by accommodation, dressing, travel and entertainment, many task-specific human agents then emerged in the society in order to ease our lives. Human society is one of the typical examples of multi-agent systems.
227
228
F. W. K. Lor
9.1.1
Development
of
computers
5000 years ago, the abacus, known as the first computer, was invented in China [17]. In 1642, Pascal invented a numerical mechanical calculator. Later in the eighteenth century, it came the industrial revolution. A large amount of machinery was brought into the industry to ease the tasks of workers and hence enhanced the performance. In 1822, Charles Babbage proposed to create the Difference Engine and the first general-purpose computer, the Analytical Engine. Although he eventually failed to manufacture them, he introduced a breakthrough in the concept of general-purpose computer. During the World War II, the first electronic computer was made by Atanasoff and Berry. Since then the capability and capacity of machines has been exponentially increasing. Machines have become more and more important in our daily life. They are not only used for performing routine jobs, but also for solving complicated tasks. In the 1960s, the Internet emerged to link up computers together and in 1991 Tim Berners-Lee firstly introduced the World Wide Web technology [3], which allows users to easily access information throughout the world. Similar to the evolution of mankind, the development of machines and artificial intelligence was initially to handle simple tasks with simple machines. It then further develops to do sophisticated jobs with complex systems. At the beginning, vacuum tubes were the basic building blocks of computers in 1940s, which later became transistors in 1950s, integrated circuit (IC) in 1960s to present ultra-large scale integrations (ULSI). Machine intelligence is also developing from centralized expert systems to distributed multi-agent systems [16]. Each agent that acts like a human in the society plays its own different role and has its own targets and attempts to interact* with one another in order to achieve the final goal. 9.1.2
Game
theory
Typically game theory is one common methodology to analyze the behaviors of agents. The origin of game theory can be traced back to the beginning of two thousand years ago around the Babylonian period [2]. In 1944, J. von Neumann and O. Morgenstern published a concrete theory of a twoperson zero sum game [30]. Then game theory has attracted considerable attention. Since the early 1950s, J. Nash proved the existence of equilibrium *An agent can either cooperate with its partners or compete against its opponents.
Agent Dynamics:
Soap Paradigm
229
points [28] and also established a bargaining theory [27], which later earned him to get a Nobel Price. Afterwards, L. Shapley [38], J. C. Harsanyi [12; 13; 14] and R. Selten [37] et al. studied intensively on equilibrium behaviors and equilibrium selection rules by trembling hand perfect equilibria [15]. Based on a traditional game theoretical approach, all information can be generally distributed to other agents by communications inside different sub-games. Therefore, every state of the systems, as well as the probability of each transition, can be easily written down. If the system is finite and the state manifold is static*, we can figure out the whole space by stating all the possible outcomes and traditional probabilities. On the contrary, if the system is infinite and the state manifold is dynamic*, it is not so trivial to express the state manifold at the time step t+1 by the state manifold at the time step t, unless the transformation among different state manifolds — the curvature — is known. Thus, traditional game theory is not appropriate to study the massive dynamic multi-agent systems. In order to avoid this pitfall, a novel dynamic model, soap froths, has been introduced to emulate the complex behavior of massive multi-agents.
9.2
Multi-soap agent model
Multi-soap agent model is mainly concerned with local interactions among agents in each agent's neighborhood, especially with the agents containing geometrical constraints. It is different from the traditional game theoretical approach. This novel model does not require that each agent perceives all information of the entire system through communications. Each of them obtains a part from the nearest neighbors and then constructs its own model to predict the whole world. In the following, the multi-soap agent model is described in detail. Before mentioning soap bubbles as multi-agents, it is essential to understand the basics of natural soap bubbles. Over centuries, scientists have been developing many interesting theories and observing many interesting phenomena that govern the dynamics of soap bubbles. ' All the states are fixed — the states at the time step t + 1 are exactly the same as the states at the previous time step t. '•The state manifold is not flat as usual, the distance measure may not be able to be well-defined if the space is inhomogeneous. Even if the state manifold is finite but the changes are rapid, the system is still not able to update or process all information within an instant of time period.
230
F. W. K. Lor
There are also many other phenomena related to the behaviors of soap bubbles. 9.2.1
Brief historical
review of soap
bubbles
Soap bubbles are one of the most beautiful objects in nature [4]. Since Leonardo da Vinci, it has drawn the interest of many scientists. He discovered the rising effect of a liquid at the interface of a capillary tube. In the eighteenth century, J. A. Segner introduced the important concept of surface tension [34] and then T. Young [45] proposed the concept of the angle of contact to explain the capillary action. In the early nineteenth century, a Belgian physicist, J. A. F. Plateau [ll], spent most of his lifetime investigating the surface properties of fluids. Because of prolonged exposure of his eyes to the sun, he eventually became blind. In his research, he observed that the surfaces of soap films have to retain a particular pattern at the equilibrium state. As a consequence, his results gave a significant contribution to the latter study of minimal surfaces. Before the budding of differential geometry [31], minimal surfaces were one of the hardest problems in mathematics. Even to date, it is not a trivial question to answer, "What is the surface of smallest area spanning a given contour?" With the help of powerful computers, scientists are able to investigate it by using computer graphical tools [5]. Minimal surfaces can be exploited to elucidate the most stable configuration of geometrical structures in soap films. Afterwards, many scientists investigated not only the static patterns of soap films, but also the dynamics of soap froths. In the late nineteenth century, J. W. Gibbs observed the draining and thinning effects of soap films. Then, J. Dewar [6] and A. S. C. Lawrence [20] studied the draining and stability of soap froths. Besides theoretical studies, many scientists also employed the theories of soap froths to study other natural phenomena and perform various applications. D'Arcy W. Thompson published a book called "On Growth and Form" [43] to discuss the shape of living organisms by the comparison of patterns to soap bubbles. Later, many chemical plants produce detergents, paints, and soap so that they had to spend more effects to investigate the surface properties in coalescing, dyeing, emulsification, foaming and wetting. In recent years, soap bubble theory has been widely exploited to study
Agent Dynamics:
Soap Paradigm
231
many natural phenomena, such as the equilibrium and growth shapes of crystals, vegetables tissues as well as the evolution of the universe. 9.2.2
Soap froth
model
The main force that governs the dynamics of soap froths is surface tension. Surface tension occurs only at the interface between the surface of a fluid and the surface of another material or fluid as shown in the Fig. 9.1.
/
Fig. 9.1
Surface tensions between a fluid droplet and a substrate
It is totally dependent on the properties of the fluid and the material of contact. Surface tension affects the angle of contact along the interface between the surfaces. In other words, surface tension can be interpreted as the surface internal energy by multiplying the length of the interface with the surface tension. Inside the soap solution, molecules near to the surface of the fluid experience a non-uniform force from the surrounding. Thus, there exists a net force to pull the molecules near the surface inwards. Then the intramolecular distance decreases and it leads to an increase of the outward force to oppose the shrinkage. In general, the surface of a fluid is convex in shape. 9.2.2.1
Macroscopic view
Identifying a bubble as an agent, the interactions among agents are similar to the dynamics of soap bubbles. The main department responsible for dealing with customers is the front office or the customer section. How
232
F. W. K. Lor
the front office interacts with the customers is dependent on its back office and what kind of services the agent company provides to the customers. In every natural phenomenon, an object must have inertia to select a most probable way with the lowest energy. This is similar to the idea of Calculus of Variation founded by J. Bernoulli and L. Euler [7] in the eighteenth century. The variational method was derived to determine the minimum area surfaces. Later J. L. Lagrange reformulated it to a well-known EulerLagrange equation, which is an alternative method of Newton equations to express the equation of motion of an object in classical mechanics. In the other words, the minimization of energy is somewhat equivalent to the maximization of entropy of the entire system. And the entropy can be interpreted as an index of the disorder of the universe. Similar to human decision making, it is to find a profitable way to perform a task. Let us map the energy as the cost and the entropy as the profit. Each agent company tries to achieve its maximum profit with minimum cost. Based on this similar criterion, a soap bubble evolves to establish an equilibrium state by balancing the internal pressure with the external pressure. It is to attain a state such that the tension along the whole surface becomes uniform and perpendicular to the interface of surfaces, otherwise the bubble with unbalanced pressure may rupture. Inside soap froths, all the bubbles are attempting to achieve their equilibrium points. As a consequence, this leads to chain reactions of morphs. This is the same in the scenario of multi-agent systems that chain reactions occur among agents due to the change of adaptation strategies of each agent based upon the environment. Neighboring agents belonging to different companies compete against one another to acquire more clients. If an agent has more resources than others, it can further expand its business to acquire more customers and vice versa. The system may achieve equilibrium by balancing the demand and supply provided there are no external stimuli to upset the system.
9.2.2.2
Microscopic view
To better understand the mechanism of morphing of soap films, let us observe the soap film structure in molecular level. Natural soap solution, which is normally made up of sodium or potassium salts of fatty acids — typical examples are sodium stearate Ci7H3sCOO _ Na + or potassium stearate Ci 7 H35COC" _ K + — has a tendency to form stable bubbles and
Agent Dynamics:
Soap Paradigm
233
films. After dissolving metal salts of fatty acids into the solution, metal cations (Na + or K + ) are mixed with water molecules (HO~H+) inside the solution bath. The amphipathic ions (Ci7H3sCOO~) form a monomolecular layer, also known as a surfactant, along the surface of a soap film. These ions are composed of two components. One is hydrophilic, that is the negatively charged carboxyl head (COO - ) which is attracted by water molecules. Another is hydrophobic, that is the neutral hydrocarbon tail (C17H35) which repels water molecules. Therefore, if we look at the solution in detail, we can find most of the negatively heads are dipped into the soap bath while the hydrophobic tails emerge out of the solution as shown in the Fig. 9.2.
Air * :•* t
<
'»
*f-
•
:-'j
Solution phase '
Cation Surfactant
Fig. 9.2
Air
*•
• J. *
v
t
•
V'"
Air
r
Interesting behavior of amphipathic ions across the boundary of the solution
This peculiar structure of soap films produces many interesting physical phenomena. The negatively charged carboxyl heads act as partners and cooperate with the solution. On the contrary, the neutral hydrocarbon tails act as opponents and compete against one another. Consequently, this leads to the amphipathic ions being able to form clusters called micelles, which are illustrated in the Fig. 9.3. The attraction and repulsion of these ions also affect the stability of
234
F. W. K. Lor
Fig. 9.3
Uneven forces exerted on surfactants and micelles
soap films. As the soap film expands, the concentration of amphipathic ions drops. Subsequently the soap solution behaves more like water, but the surface tension of water is greater than the surface tension of soap solution. This then increases the force between the interfaces in order to shrink the film. The processes of expansion and shrinkage switch alternatively to balance the pressure difference. Such a phenomenon is called the Marangoni effect, which happens under the non-equilibrium condition. The Maragoni effect results from the variation of surface tension with a change in the area of an element of the surface. (It is different from the Gibbs elasticity [9], which is found in the stable equilibrium case by balancing the excess pressure.) These interfacial hydrodynamics, which are rather complicated, can be used to study the fluctuation of agent dynamics in the transient period. 9.2.2.3
Topological properties
After understanding the factors that govern the dynamics of soap films, let us consider the geometrical structures of two-dimensional soap froths. We now equate the surface tensions at the point of contact with different
Agent Dynamics:
Soap Paradigm
235
film surfaces. When it is in equilibrium, the surface tension along different interfaces of contact must be balanced. If the films are formed in a soap solution, all the magnitudes of the components of surface tensions along different interfaces are equal. For instance, when three surfaces meet together as shown in the Fig. 9.4, the total sum of surface tensions is in general equal to zero.
CT
31
2
a
Fig. 9.4 media
23
3
The balance of surface tensions at a point of the interface of three different
0"12 + CT23 + CT31 = 0
(9.1)
Therefore the angle of contact of three different interfaces should be 120°, which also fulfills the minimal pathway that links three points. This can be argued by the minimization of energy or, alternatively, proved by a simple Cartesian geometrical derivation. This minimization problem of finding the shortest distance to link three points was firstly studied by J. Steiner in the early nineteenth century [10]. Hence it is also known as the three-point Steiner problem. 9.2.3
Three point Steiner
problems
For any three points A, B and C arranged in any way, find a fourth point P linking up three points such that the overall sum of the distance from the point P to other three points is a minimum, that is the minimal distance
236
F. W. K. LOT
of AP + 5P + CP.
Fig. 9.5
Geometrical proof of three-point Steiner problem
Let P be the point that constitutes the shortest distance linking A, B and C as illustrated in the Fig. 9.5. Let the angle AACB be the largest angle of the triangle AABC, i.e. the total length of AC + CB is shorter than the sum of any other two sides of the triangle AABC. If the triangle AABC possesses internal angle less than 120°, then the point P is located inside the triangle AABC. Otherwise, P must coincide with the point, C. Assume here the largest angle LACB is less than 120°. Let us consider an ellipse with foci A and B, for any point P on the circumference of an ellipse,
A~P + P~B = constant
(9.2)
Consider a circle centered at C with the radius CP. If AP + BP + CP is minimal, then there exists a tangent common to both the ellipse and the circle. For any point P' on the circumference of the circle, it must satisfy
AP' + BP' + CP' >AP + BP + CP
(9.3)
Agent Dynamics:
Soap Paradigm
237
Since CP and CP1 are the radius, then AP> + B~P~' > IP + B~P
(9.4)
Hence, the minimum AP + BP implies IBPC
= ICPA
(9.5)
Repeat the arguments by taking the foci of the ellipse as B, C and C, A and the centers of the circles as A and B, respectively. Thus an equality of the angles is obtained. IAPB
= IBPC
= ICPA
(9.6)
and since IAPB
+ IBPC
+ ICPA
= 360°
(9.7)
IAPB
= IBPC
= ICPA
= 120°
(9.8)
therefore,
For a general n points Steiner problem, it does not have any general proof because the degeneracy of solution occurs for joining four or more points together. Therefore, the configuration of soap froths has many different patterns with the same internal energy. This is also related to one kind of topological rearrangement, called Tl — switching, which is discussed in the section below. 9.2.4
Euler topological
characteristics
The coordination number^ of a connected network can be illustrated to be three as the most stable configuration by a similar argument of length minimization in the previous section. If four edges meet at a vertex, they §The number of edges connected to a vertex is defined as the coordination number of a graph. For instance, a network with coordination number of 3 means that every point inside t h e network has 3 edges attached.
238
F. W. K. Lor
tend to split apart into pairs of vertices with three edges. It is because the total length of four lines radiated from a vertex to connect other four points must be longer than the sum of five lines such that one of the edges connecting two vertices with three edges as shown in the Fig. 9.6.
O
Length = 2 . 7 3 Fig. 9.6 Decomposition of a four-edge vertex to a more stable configuration of two vertices with three edges
Therefore, the most stable condition for a planar structure is to form a regular hexagonal structure. This also explains many biological cellular patterns, such as honeycombs. Furthermore, the topological coordination parameter is essential to illuminate the dynamics of soap froth evolution. In a planar connected network, the Euler topological characteristic is equal to one. So the topology of two-fold soap froths can be written as
Nv,
N(
•Nt cells
(9.9)
Agent Dynamics:
Soap Paradigm
239
where Wvertices, Wedges a n d -Wells are the number of vertices, edges and cells, respectively in a two-dimensional network. Suppose the average number of sides of a cell be n and the cells be inside an infinite number of cells of a lattice neglecting the boundary effect. Because of the coordination number to be three, each vertex is shared by three cells and each cell in average has n vertices, therefore
^vertices = g « W c e l l s
(9.10)
Similarly, each edge is located besides two cells,
Wedges = 2 n W c e l l s
C9-11)
Substituting the above Equations 9.10 & 9.11 into the Euler characteristics Equation 9.9, then
-SAUis - -nNcells
+ NceUs = 1
(9.12)
If Wceus goes to infinity, n will trends to equal to six. Hence the average number of sides of a cell in an infinite number of cells of two-dimensional lattice is six. 9.2.5
Topological
rearrangements
Based on the assumption of the diffusion rate being equal to the length of the film times the pressure gradient across the interface, von Neumann's dynamics Equation 9.13 can be easily derived for the evolution of cellular structures. The rate of change of the area inscribed of a bubble with respect to the number of edges, dAe/dt, is linearly proportional to the number of edges £ of a bubble.
~
=^ - 6 )
(9.13)
where K is a positive physical constant related to the surface tension and the permeability of soap solution. From the von Neumann's dynamics Equation 9.13, a general dynamic behavior can be induced: bubbles with the number
240
F. W. K. Lor
of edges I more than six grow, while the bubbles with their number of edges I less than six shrink and eventually disappear. This disappearance of bubbles is named as the T2 process of topological rearrangements, which is dominant in the beginning of the evolution. T2 processes of 3-, 4-, and 5-sided bubbles are shown in the Fig. 9.7.
Tl Process
6 • «*•
Fig. 9.7
li + i
T2 Processes
Id
Topological rearrangements of two-dimensional connected cellular networks
Since a 2-sided bubble is rarely found in soap froths or quickly disappears even if it exists, we discard this type of rearrangement and consider only others in our analytical model. Moreover, we can conclude from the von Neumann Equation 9.13 that a stable configuration of a planar pattern is either a regular hexagonal lattice or an empty lattice. Besides T2 processes, there is another process called Tl rearrangement, which is an edge switching process illustrated in the Fig. 9.7. Tl processes do not happen so frequently as compared with T2 processes in the transition period of the evolution. Tl processes become dominant only at the latter stage of the evolution, as the occurrence of T2 processes reduces. 9.2.5.1
Agent dynamics
As above we treat the soap bubble as an analogue of an agent company, where the area of the soap bubble inscribed is treated as the resources of the agent. The perimeter of the soap bubble is labeled as the number of customers that the agent serves. Each agent competes against its neighbor-
Agent Dynamics:
Soap Paradigm
241
ing agents and the interactions are mapped into the dynamics of expansion or shrinkage of the soap bubble. When the agent acquires more clients, the agent expands with elongating the perimeter of the soap bubble^. Because all sides of a soap bubble must be convex with respect to the center of it, the area expands while the perimeter elongates. This implies that, generally, the resources of the agent are increased when the number of customers increases and vice versa. T2 topological rearrangement can also be used to explain the death of an agent due to an insufficient number of customers. If one agent acquires more customers from its neighborhood, the number of customers in its neighborhood will reduce. The switching of the choice of customers to different agents is represented by Tl topological rearrangements. In general, the customers of the agent should not be confined in a connected set and each agent does not always have the same or fixed neighboring agents in a planar configuration. To avoid considering a higher dimension of hyperspace, we label more than one bubble as in the same agent company in the system. In addition to Tl neighboring switching processes, the neighborhoods of agents can be altered as well as their customers can spread out in various locations. Hence, the two-fold soap froth is still a good model to elucidate well the behavior of multi-agent systems.
9.3
Evolution dynamics of soap froths
To further understand the evolution dynamics of soap froths, we look at the results from experiments and simulations of soap froths. Let us firstly discuss the correlations [36] that are observed from the direct experiments of real soap.
9.3.1
Empirical
laws
Briefly speaking, there are in total three kinds of correlation laws [40] found in the configuration of real soap froths. Those are related to the area, radius and the number of sides of neighboring bubbles. ^Feltham's Law [8] governs the relationships between the area and perimeter of the bubbles.
242
F. W. K. Lor
9.3.1.1
Levis' Law
F. T. Lewis [22], who studied the relationships between the epithelial cells of cucumber and soap froths in two and three dimensions, discovered the empirical rule that the average area of ^-sided cells, At, is linearly correlated to the number of sides £. Lewis' Law [23], which governs the entire process of the evolution of many two-dimensional cellular structures, can be written as
Ae = AN[l + X{£-6)}
(9.14)
where A is a fitting parameter and AN is the average area of a cell belonging to a cellular network of total N cells within the overall area A^ot • 9.3.1.2
Radius Law
In addition, a similar linear empirical relationship was established with the average radius of an ^-sided bubble ft and the number of sides £. The Radius Law is expressed as ft = Cle + c2
(9.15)
where c\ and c2 are the curve fitting parameters of slope and intercept. 9.3.1.3
Aboav-Weaire's Law
Recently, D. A. Aboav [l] and D. Weaire [44] observed another simple linear correlation function for the average number of sides of the neighbors of an £-sided cell, me, and the number of sides £ of a cell. The relationship is given as
eme = (6 - a)£ + (6a + fi2)
(9.16)
where a is a constant approximately equal to one and fi2 = J2t(^ ~ 6) Pt is the second moment of the probability distribution Pt of cells with £ sides. Alternatively, these empirical rules can also be derived using the maximum entropy argument. For completeness, the sum of all probabilities must be equal to one, Y^i^t = •*•• According to the Euler characteristics, Equation 9.9, and the careful counting of the distribution of number of sides
Agent Dynamics:
Soap Paradigm
243
I of the cells, N. Rivier et. al. proved these linearity functions following the maximum entropy argument [33]. This can also be illustrated by a Potts model11 simulation of a random lattice model [35]. Those linear correlation distribution functions are extremely important for the investigation of the dynamics of soap froths. They are well-known quantities to describe many planar cellular structures although they have been recently found to have slight distortion of the linearity due to the insignificant Tl processes. 9.3.2
Survivor
Selection
Rules
Consider a system with the initial condition of containing many numbers of bubbles inside. Then let the system evolve according to the dynamics described earlier. During the evolution, no new bubbles are introduced. As the time passes, the coalescence of bubbles happens as shown in the Fig. 9.8.
•A
i '0
*1
>2
'3
Fig. 9.8 Evolution of soap froths: the initial configuration of the system is at time toAs the time evolution, the system eventually evolves into the pattern at time 13. Yellow color indicates the survivor bubbles on the evolution from time to to time £3
Let us now consider the rate of change of the number of cells in soap froths, dN/dt. Following the Aboav-Weaire's Law 9.16, we can write down the equation for the rate of change of 3-, 4- and 5-sided cells because of the T2 processes. Since only T2 processes alter the number of cells in the system, the overall rate of change of the number of cells is equal to the summation of the rate of change of 3-, 4- and 5-sided cells. Then the rate II Potts model is a multi-state model
244
F. W. K. Lor
of vanishing £-sided cells can be expressed as dN — = -(CJ3N3
+ co4N4 + to5N5)
(9.17)
u>e is the rate of disappearance of ^-sided cells and Ne is the number of ^-sided cells. Thus, we can determine the rate of change of the probability distribution Pe of £-sided cells, dPtjdt. In general, the rate of change of the probability distribution of cells dPe/dt with sides £ greater than five is written as a simple first order differential equation of motion as
^
= cJ+1Pe+1 - (c+ + cJ)Pt + c+^Pe-i
(9.18)
where c£, cj are the transition rates of ^-sided cells gaining a side and losing a side, respectively. If there exists equilibrium in the system, the rate of change of the probability Equation 9.18 is set to be zero. Then the equilibrium solution of the probability distribution P/** is of the form AX1+BX2, where A, B are arbitrary constants and Ai, A2 are the solutions of this quadratic equation
c
m * 2 - ( 4 + C7)A + 4+i = 0
(9.19)
Therefore, the selection rules of survivors are totally dependent on the determinant of Equation 9.19
A = (c+ + a')2 - 4c7+1c+_i
(9-20)
With the use of Equation 9.17 and Aboav-Weaire's Law 9.16, we can explicitly solve the equation for the sides of cells equal to 3, 4 and 5, according to the different rates of T2 processes. Hence other probability distributions Pi for £-sided cells are easily evaluated by induction. The calculated results as well as the experimental observations are shown in the Fig. 9.9.
Agent Dynamics:
Soap Paradigm
245
Topology Distribution
u.o T
0.4 -
j\
8 c 0)
w °x Ul
"S
0.3 -
V
0.2 -
\
bi!lity
Q.
\
1—
CO X!
0.1 -
o fc_ Q.
0.0 -
-0.1 -
—W
1
i
!
4
6
8
10
£
•
12
14
Number of Sides £ of Bubbles Fig. 9.9
9.3.3
Probability distribution of different number of sides £ of bubbles
Universalities
of soap froth
configurations
As a result of the probability distribution being constant in time, we can deduce that the rate of change of the number of cells dN/dt is proportional to N2 by considering the Equation 9.13 and the mean-field approximation [21]. Hence, the average area of a bubble with £ sides Ae is a universal scaling constant [39] in the evolution. More specifically, the system is globally asymptotically stable but it is locally unstable. For the multi-agent dynamics, though the local competing behavior against other agents causes the system to fluctuate, the system has a global quasi-equilibrium state with respect to a certain distribution of various sizes of agents. Similarly, it can be shown that the average perimeter of a bubble with t sides Le is another universal scaling constant [41] by a direct experimental observation as well as a derivation from the statistical microscopic ensemble
246
F. W. K. Lor
of the surface energy of soap froths [42]. From these two universal scaling constants A( and Lg, we conclude that a multi-agent system does not precisely obey an economic theory of demand and supply. The relation of demand and supply is only invariant by taking the number of surrounding competitors into account [26]. For instance, an agent is surrounded by many other agents. If it wants to expand its business to acquire more customers, it should provide more resources to compete against others. Otherwise, it may be possible to be discarded and then vanish if its surrounding agents are very strong. On the other hand, if there are not so many agents competing, the agent does not need to provide many resources in order to still survive in the system.
9.4
Simulation model
After the discussion of analytical study of multi-soap-agent model, let us consider the simulation [24]. To simplify the simulation model, we are here only concerned with the equilibrium state or steady state of cellular networks consisting of two agent companies, so that only the color of the multi-agent system is changing. Our first investigated network is the regular hexagonal lattice. The second network, which is more realistic, is an irregular lattice generated from a source point patterns using a Voronoi construction, which form cells that partition the point patterns by the nearest distance, as illustrated in the Fig. 9.10. Let S denote the set of all source points in two-dimensional space. The Voronoi cell belonging to a point p € S is defined by
Vor(p) = {x€ R2\\\x -p\\ < \\x - q\\,Vq G S/p}
(9.21)
This pattern is the same as the configuration of soap bubbles at the equilibrium state by minimizing the surface energy. Each bubble or cell is then identified with one color amongst two. The system then undergoes color switching according to different color interactions. 9.4.1
Switching
dynamics
On a given cellular network, each cell is randomly selected to have either, for example, red or blue in color, with equal probability. This corresponds
Agent Dynamics:
Fig. 9.10
Soap Paradigm
247
Voronoi constructions and the dual of Delaunay triangulation
to the situation of two equally powerful companies. In applications, the probability of color assignment to cells can be adjusted to suit realistic situations. The energy of a bubble i is calculated as
Ei = m s a m e £ S a m e ~ "^diff^diff
(9.22) .
where m s a m e and m^is are the number of neighboring cells with the same and different colors, respectively. £ s a m e and £diff a r e related to the surface energies of a boundary between the same and different colors
£diff
oc 70-Ldiff
(9.23)
where a and 7 are the strengths of bonding energy of the same and different colors. We assume and define a strength ratio parameter of different color interaction to same color interaction as x = 7 / a . Here a is the surface tension, LSame and Ldiff are the lengths of a boundary with the same color and different color. Let Pi be the probability of the bubble i switching into another color and Qi be the probability of the bubble i remaining the same color. Because of the axiom of completeness,
248
F. W. K. Lor
Pi + Qi = l
Vi
(9.24)
Let us also assume that the switching probability P_i in the Monte Carlo simulation has the usual Boltzmann form,

\[ P_i = \frac{e^{-\beta E_i}}{\sum_{j \in M_i} e^{-\beta E_j}} \tag{9.25} \]
where M_i is the set of nearest neighbors of the bubble i. These probabilities are parameterized by the color interaction ratio x and the noise level parameter β, and we monitor the color fraction as a function of these two parameters. When β is small, P_i is close to 1/|M_i|, so color switching depends only on the number of edges of the cell and the switching dynamics is a highly random process. On the contrary, when β is large, P_i depends critically on the colors of the neighbors and the switching dynamics becomes highly selective.
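One synchronous Monte Carlo sweep under this rule might look as follows (a sketch with our own naming, built on bubble_energy above; since Eq. (9.25) is normalized over the neighbors only, the sketch clips the probability at 1):

    import numpy as np

    def switching_step(colour, neighbours, beta, rng, **energy_kwargs):
        """One synchronous sweep with the Boltzmann switching rule (9.25)."""
        new_colour = dict(colour)
        for i in colour:
            e_i = bubble_energy(i, colour, neighbours, **energy_kwargs)
            e_nbrs = np.array([bubble_energy(j, colour, neighbours, **energy_kwargs)
                               for j in neighbours[i]])
            p_switch = min(1.0, np.exp(-beta * e_i) / np.exp(-beta * e_nbrs).sum())
            if rng.random() < p_switch:
                new_colour[i] = 1 - colour[i]    # flip to the other colour
        return new_colour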
Fig. 9.11 Surface plot of phase diagram of dominant color fraction on the 50 x 50 regular hexagonal lattice under the periodic boundary condition
9.4.2 Simulation results
Firstly, we generate a 50 × 50 regular hexagonal lattice with a periodic boundary condition. The system then evolves for 10000 cycles according to the switching dynamics described in the previous section, with the whole system updated synchronously in each cycle. Ten sample runs are executed to collect statistics. The average value of the dominant color fraction is then plotted against the logarithm of the relative strength ratio of color interaction x and the logarithm of the noise level parameter β. Fig. 9.11 illustrates the three-dimensional surface plot of the phase transitions of the system.
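The experiment can be outlined as below (a sketch under our own assumptions, reusing the helpers above; the even/odd-row offset indexing of the hexagonal lattice is one of several equivalent choices, and a pure-Python sweep of this size would be vectorized in practice):

    import numpy as np

    L, CYCLES, RUNS = 50, 10_000, 10

    def hex_neighbours(r, c, size=L):
        """Six neighbours of cell (r, c) on a periodic hexagonal lattice."""
        offs = ([(-1, -1), (-1, 0), (0, -1), (0, 1), (1, -1), (1, 0)]
                if r % 2 == 0 else
                [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, 0), (1, 1)])
        return [((r + dr) % size, (c + dc) % size) for dr, dc in offs]

    def run_once(beta, x, rng):
        cells = [(r, c) for r in range(L) for c in range(L)]
        nbrs = {cell: hex_neighbours(*cell) for cell in cells}
        colour = {cell: int(rng.integers(2)) for cell in cells}
        for _ in range(CYCLES):
            colour = switching_step(colour, nbrs, beta, rng, x=x)
        counts = np.bincount(list(colour.values()), minlength=2)
        return counts.max() / counts.sum()        # dominant colour fraction

    rng = np.random.default_rng(1)
    print(np.mean([run_once(beta=10.0, x=1.0, rng=rng) for _ in range(RUNS)]))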
Fig. 9.12 Surface plot of phase diagram of dominant color fraction on the 400 Voronoi cells under partial periodic boundary condition
Secondly, 400 Voronoi cells are constructed by finding the dual of the Delaunay triangulation†† or, equivalently, the Wigner-Seitz cells** of a source point pattern generated randomly in a square centered at the origin of the xy-plane. The four boundaries of the square are complemented with identical copies of the point pattern to minimize boundary and finite-size effects. Similarly to the above case of the regular hexagonal lattice, the system follows the switching rules and evolves for 1000 cycles. Fig. 9.12 shows an ensemble average over ten runs of the dominant color fraction against the logarithm of the strength ratio of color interaction x and the logarithm of the noise level parameter β.

†† The Delaunay triangulation of a point set is a triangulation such that no point of the set lies inside the circumcircle of any of its triangles.
** The Wigner-Seitz cell of a lattice point is the region of space closer to that point than to any other lattice point.
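The boundary padding can be sketched as follows (our illustration: the point pattern is replicated across the four boundaries, and only the central copy's cells are kept for the statistics):

    import numpy as np
    from scipy.spatial import Voronoi

    def padded_voronoi(points, width=1.0):
        """Voronoi diagram of a pattern surrounded by shifted copies of itself."""
        shifts = [(0.0, 0.0)] + [(dx, dy)
                                 for dx in (-width, 0.0, width)
                                 for dy in (-width, 0.0, width)
                                 if (dx, dy) != (0.0, 0.0)]
        padded = np.vstack([points + np.array(s) for s in shifts])
        # The first len(points) cells correspond to the original pattern.
        return Voronoi(padded)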
Fig. 9.13 Phase diagram of dominant color fraction against the noise level β with equal strength of color interaction on the 50 × 50 regular hexagonal lattice under the periodic boundary condition
9.4.3 Ordered and disordered phases
From Fig. 9.11 and Fig. 9.12, we observe that the dominant color fraction is more strongly correlated with the noise level parameter β than with the relative strength ratio of color interaction x. As x increases, the system tends to become less stable: when the different-color interaction ε_diff dominates the surface energy of the bubbles, it is more favorable for the system to switch to a different color. On the other hand, if the surface energy is influenced more by the same-color interaction ε_same, the system will settle into a stationary state of same-color bubble clusters. Hence, the peak is in general higher with decreasing x. In Fig. 9.11 there is a small bump at log10 β = 2.5, which indicates that the ratio x is only important in the narrow range of the phase transition from the ordered phase into the disordered phase. A similar small bump can also be observed for the irregular Voronoi cells in Fig. 9.12. In the two-dimensional cut at a fixed x shown in Fig. 9.13, the dominant color fraction gradually increases from 0.5 up to 0.95 around log10 β = -0.3. The curve then, on average, levels off for log10 β between 0 and 2.5. At log10 β = 2.5, the dominant color fraction suddenly drops back to 0.5. A similar phase transition occurs for the irregular Voronoi cells, as illustrated in Fig. 9.14; however, the fluctuations are considerably larger and the transition noise level β is shifted lower by approximately one order of magnitude.
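This cut is easy to reproduce with the helpers from the hexagonal-lattice sketch above (the grid of β values and the ten seeds are illustrative choices of ours):

    import numpy as np

    for log_beta in np.arange(-1.0, 4.1, 0.5):
        runs = [run_once(beta=10.0 ** log_beta, x=1.0,
                         rng=np.random.default_rng(seed))
                for seed in range(10)]
        print(f"log10(beta) = {log_beta:+.1f}   "
              f"dominant colour fraction = {np.mean(runs):.3f}")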
Fig. 9.14 Phase diagram of dominant color fraction against the noise level β with equal strength of color interaction on the 400 Voronoi cells under the partial periodic boundary condition
This behavior of our multi-soap-agent system is akin to that of Ising spin models. The noise level parameter β is equivalent to the inverse of temperature. When the temperature is sufficiently low, the system is in a dormant state and switching does not occur very frequently. As the temperature increases, the system becomes more active, and the color of a soap bubble begins to switch to coincide with the colors of its neighbors. This process shows that there may be a phase transition at log10 β equal to 2.5 for hexagons and 1.0 for Voronoi cells. When the temperature increases further, color switching occurs more frequently and the system eventually becomes unstable, similar to the demagnetization process in Ising spin models. In conclusion, the noise level parameter β does not just account for the mixing of the system into a disordered phase; it is also essential in driving the system into an ordered phase.
9.5 Applications
The multi-soap-agent model can be applied to many multi-agent systems to describe their dynamics and equilibrium behaviors, particularly systems consisting of a massive number of agents, each with its own limited resources, that form different groups and compete against one another. We here briefly describe a typical example from the business management of one particular service in the commercial world. Let us consider a residential district as shown in Fig. 9.15. Inside the district there are, for example, five major supermarket-chain agent companies. Each company has branches of supermarkets distributed over various areas, and the area of an agent is labeled with one color according to its agent company. Normally, customers shop at the nearest supermarket or at the one with the largest variety of goods. If a customer wants to buy domestic groceries but the nearest supermarket provides only food and drinks, the customer will travel to a farther supermarket. Likewise, if a farther supermarket has many special offers or discounts while the prices at the nearby supermarket are high, customers may prefer to travel to the farther supermarket. Using the multi-soap-agent model, it is easy to analyze the interactions between the different supermarket chains [25]. Surface energy is related to the cost as well as the resources. As above, we can set up any relative strength ratios of the interactions among the supermarket agent companies based upon their variety of goods and special promotions.
Fig. 9.15 Scenarios of the distribution of different supermarket-chain companies in a district according to the multi-soap-agent model
Afterwards, we perform Monte Carlo simulations. By tuning the parameters discussed in the section above, the system evolves into different patterns. If the system falls in the regime of the disordered phase, a fair market exists: the different supermarket-chain agent companies are distributed approximately evenly over the district. On the contrary, if the system is within the regime of the ordered phase, it describes the colonization of monopolies: inside the district, one company dominates the market. The introduction of a new agent into the system and the death of a non-profit-making agent can be modeled as well; the change in the distribution of the market is then updated by considering topological rearrangement processes and other soap froth dynamics. Moreover, the survivors can be predicted by the selection rules discussed in the previous section.
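To make the multi-company setting concrete, here is a hypothetical Potts-like extension of Eq. (9.22) to five colors (our illustration only: a pairwise matrix replaces the single ratio x, and the example strengths are arbitrary):

    import numpy as np

    N_COMPANIES = 5
    # strength[a, b]: relative strength of the interaction between different
    # companies a and b, e.g. raised where their ranges of goods overlap.
    strength = np.ones((N_COMPANIES, N_COMPANIES))
    strength[0, 1] = strength[1, 0] = 2.0    # two chains competing head-on

    def potts_energy(i, colour, neighbours, sigma=1.0):
        """Multi-colour analogue of Eq. (9.22) with unit boundary lengths."""
        ci = colour[i]
        return sigma * sum(1.0 if colour[j] == ci else -strength[ci, colour[j]]
                           for j in neighbours[i])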
9.6 Concluding remarks
Soap froth dynamics is a very powerful analytical tool that can be exploited to study not only natural phenomena in the biological and physical worlds, but also multi-agent behavior. In the derivation above, we demonstrated the analogy to multi-agent system dynamics and found some intrinsic properties that govern the dynamics of the evolution. In general, the dynamics is out of equilibrium in local regimes. From the maximum entropy or energy minimization arguments and the fundamental topological properties of two-dimensional connected networks, two universal scaling constants, related to the amount of resources and the number of customers, are determined as functions of the number of surrounding agents. Moreover, the survivors in multi-agent systems can be predicted by considering von Neumann's dynamics equation (Equation 9.13) for cellular structures.

In conclusion, multi-agent dynamics is not completely chaotic: there also exist some global fractal scaling parameters that govern the evolution. These scaling parameters also suggest that supply is not precisely proportional to demand; it depends on the number and strength of competitors as well. It also makes good sense that an agent must provide better services, by increasing its resources, in order to acquire more customers in a competitive market.

The above example of supermarket chains is a typical business management application that contains geographical information data. Besides the distribution of agents in multi-agent systems, applications such as telecommunications, the Internet, e-commerce, distributed systems, and especially massive systems can also be analyzed with the multi-soap-agent model. Here we have mainly been concerned with the switching dynamics of colors of multi-soap-agent systems on a static cellular network. Although this novel model is simple, many interesting results, such as the stability, scalability and fairness of multi-agent systems, can be obtained. More realistically, the evolving network can be governed by the soap froth dynamics, where gas diffusion, T1 and T2 topological rearrangement processes, and other known properties of soap froths can be used for guidance. Furthermore, the analogy of the Ising model on an irregular two-dimensional lattice to our two-color multi-agent system can be generalized to multiple colors, as in the above application; the corresponding model is the Potts model on an irregular two-dimensional lattice, which is useful for the further analysis of multi-agent systems.
Many interesting questions, both static and dynamic, can be asked about multi-color multi-agent systems. Besides exploring the macroscopic properties of multi-agent systems, the analysis can also be extended to study microscopic and mesoscopic behaviors as well as the dynamics of foam rheology. Questions such as the average lifetime of an agent and the average color neighborhood of the long-living agents are definitely of interest in real applications. Color dominance can be achieved, or in many cases avoided, for the sake of abiding by social ideology. A natural avenue for investigating these questions is our simple model, with emphasis on the effects of the model parameters on stability. Finally, the scaling behavior found to be so universal in soap froths can be addressed in our model by performing a finite-size scaling analysis in the simulation.

Acknowledgements

The author would like to thank K. Y. Szeto, I. Rezek, J. Liu, N. Zhong, S. Johansson and B. Carlsson for their comments, and also acknowledges the financial support of the Multi-Agent Architecture for Distributed Intelligent Network Load Control and Overload Protection (MARINER) project.
Bibliography
[1] Aboav, D.A., The Arrangement of Grains in a Polycrystal, Metallography, vol. 3, no. 4, pp. 383-390, 1970.
[2] Aumann, R.J., and Maschler, M., Game Theoretic Analysis of a Bankruptcy Problem from the Talmud, Journal of Economic Theory, vol. 36, pp. 195-213, 1985.
[3] Berners-Lee, T., Weaving the Web, Harper San Francisco, 1999.
[4] Boys, C.V., Soap Bubbles: Their Colours and the Forces which Mould Them, Dover, New York, 1959.
[5] Callahan, M.J., Hoffman, D., and Hoffman, J.T., Computer Graphics Tool for the Study of Minimal Surfaces, Commun. ACM, vol. 31, pp. 648-661, 1988.
[6] Dewar, J., Soap Bubbles of Long Duration, Proceedings of Roy. Inst. Gt. Britain, vol. 22, pp. 179-185, 1917.
[7] Euler, L., Opera Omnia, Orell Füssli, vols. 24, 25, 1952.
[8] Feltham, P., Grain Growth in Metals, Acta Met., vol. 5, pp. 97-105, 1957.
[9] Gibbs, J.W., The Scientific Papers of J. Willard Gibbs, vols. 1 and 2, Dover, 1961.
[10] Gilbert, E.N., and Pollak, H.O., Steiner Minimal Trees, SIAM Journal on Applied Mathematics, vol. 16, pp. 1-29, 1968.
[11] Gillispie, C.C., Dictionary of Scientific Biography: J.A.F. Plateau, vol. XI, Charles Scribner, 1975.
[12] Harsanyi, J.C., A General Theory of Rational Behavior in Game Situations, Econometrica, vol. 34, pp. 613-634, 1966.
[13] Harsanyi, J.C., Games with Incomplete Information Played by 'Bayesian' Players, Parts I, II and III, Management Science, vol. 14, pp. 159-182, 320-334, 486-502, 1968.
[14] Harsanyi, J.C., Games with Randomly Disturbed Payoffs: A New Rationale for Mixed-Strategy Equilibrium Points, International Journal of Game Theory, vol. 2, pp. 1-23, 1973.
[15] Harsanyi, J.C., and Selten, R., A General Theory of Equilibrium Selection in Games, MIT Press, 1988.
[16] Jennings, N.R., Sycara, K., and Wooldridge, M., A Roadmap of Agent Research and Development, Autonomous Agents and Multi-Agent Systems, vol. 1, pp. 7-38, 1998.
[17] Jones, G.R., Jones Telecommunications and Multimedia Encyclopedia CD-ROM, Jones International, 2000.
[18] Lagrange, J.L., Oeuvres de Lagrange, vol. 1, Gauthier-Villars, Paris, pp. 335-362, 1867.
[19] Mihály, L., and Martin, M.C., Solid State Physics: Problems and Solutions, John Wiley & Sons, 1996.
[20] Lawrence, A.S.C., Soap Films, a Study of Molecular Individuality, Bell, 1929.
[21] Levitan, B., Slepyan, E., Krichevsky, O., Stavans, J., and Domany, E., Topological Distribution of Survivors in an Evolving Cellular Structure, Physical Review Letters, vol. 73, no. 5, pp. 756-759, 1994.
[22] Lewis, F.T., The Correlation between Cell Division and the Shapes and Sizes of Prismatic Cells in the Epidermis of Cucumis, Anat. Record, vol. 38, pp. 341-362, 1928.
[23] Lewis, F.T., The Analogous Shapes of Cells and Bubbles, Proceedings of A.A.A.S., vol. 77, pp. 147-186, 1948.
[24] Lor, W.K.F., and Szeto, K.Y., Switching Dynamics of Multi-Agent Systems: Soap Froths Paradigm, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks 2000, vol. 6, pp. 625-628, 2000.
[25] Lor, W.K.F., and Szeto, K.Y., Existence of Minority in Multi-Agent Systems using Voronoi Tessellation, Proceedings of IDEAL 2000, pp. 320-325, 2000.
[26] Lor, W.K.F., and De Wilde, P., Soap Bubbles and the Dynamical Behavior of Multi-Agent Systems, Proceedings of the Asia-Pacific Conference on Intelligent Agent Technology, pp. 121-130, 1999.
[27] Nash, J.F., The Bargaining Problem, Econometrica, vol. 18, pp. 155-162, 1950.
[28] Nash, J.F., Equilibrium Points in n-Person Games, Proceedings of the National Academy of Sciences of the United States of America, vol. 36, pp. 48-49, 1950.
[29] Von Neumann, J., Discussion, in Metal Interfaces, American Society for Metals, Cleveland, pp. 108-110, 1952.
[30] Von Neumann, J., and Morgenstern, O., Theory of Games and Economic Behavior, Princeton University Press, 1944.
[31] Nitsche, J.C.C., Plateau's Problems and their Modern Ramifications, American Math. Monthly, vol. 81, no. 9, pp. 945-968, 1974.
[32] Okabe, A., Boots, B., and Sugihara, K., Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, John Wiley, Chichester, England, 1992.
[33] Peshkin, M.A., Strandburg, K.J., and Rivier, N., Entropic Predictions for Cellular Networks, Physical Review Letters, vol. 67, no. 13, pp. 1803-1806, 1991.
[34] Porter, A.W., Surface Tension, Encyclopaedia Britannica, vol. 21, p. 595, 1964.
[35] Rivier, N., Statistical Crystallography: Structure of Random Cellular Networks, Phil. Mag. B, vol. 52, no. 3, pp. 795-819, 1985.
[36] Rivier, N., and Lissowski, A., On the Correlation between Sizes and Shapes of Cells in Epithelial Mosaics, Journal of Physics A, vol. 15, no. 3, pp. L143-L148, 1982.
[37] Selten, R., Reexamination of the Perfectness Concept for Equilibrium Points in Extensive Games, International Journal of Game Theory, vol. 4, pp. 25-55, 1975.
[38] Shapley, L.S., A Value for n-Person Games, Annals of Mathematics Studies, vol. 28, pp. 307-317, 1953.
[39] Stavans, J., Domany, E., and Mukamel, D., Universality and Pattern Selection in Two-Dimensional Cellular Structures, Europhysics Letters, vol. 15, no. 5, pp. 479-484, 1991.
[40] Stavans, J., and Glazier, J.A., Soap Froth Revisited: Dynamic Scaling in the Two-Dimensional Froth, Physical Review Letters, vol. 62, pp. 1318-1321, 1989.
[41] Szeto, K.Y., and Tam, W.Y., Edge Scaling of Soap Froth, Physica A, vol. 254, pp. 248-256, 1998.
[42] Szeto, K.Y., and Tam, W.Y., Universal Topological Properties of Layers in Soap Froth, Physical Review E, vol. 53, pp. 4213-4216, 1996.
[43] Thompson, D'Arcy W., On Growth and Form, 2nd edition, Cambridge University Press, Cambridge, 1942.
[44] Weaire, D., and Rivier, N., Soap, Cells and Statistics: Random Patterns in Two Dimensions, Contemp. Phys., vol. 25, no. 1, pp. 59-99, 1984.
[45] Young, T., Cohesion of Fluids, Phil. Trans. Roy. Soc. London, vol. 95, pp. 65-87, 1805.
Author Index

Barber, K. S., 59
Blair, A. D., 201
Carlsson, B., 175
Johansson, S. J., 149
Joseph, S., 7
Kawamura, T., 7
Liu, C., 93
Liu, J., 1, 123
Lor, F. W. K., 227
Martin, C. E., 59
Ohsuga, S., 93
Pollack, J. B., 201
Sklar, E., 201
Tang, Y. Y., 1
Tsotsos, J. K., 29
Wang, P. S. P., 1
Ye, Y., 29
Zhong, N., 1, 93
Subject Index
Aboav-Weaire's Law, 242-244
ACL (Agent Communication Language), 8-10, 23, 25
Action Pyramid, 29
adaptation, 59, 130, 131
agent
  capabilities, 68
  self-confident, 168
  self-contemplation, 166
agent action, 152, 153
agent coalition, 153-155
agent metaphor, 7, 8
agent planning system, 31
agent-oriented, 8, 16, 18, 24, 25
amphipathic ions, 233, 234
angle of contact, 230, 231, 234, 235
application problem, multi-agent, 67
asymptotically stable, 245
authority-over, 60
autonomous agent, 9
autonomous behaviour, 15
autonomy, 6-9, 15, 16, 21, 24, 26-28, 60
autonomy, adaptation of, 64
bandwidth, 16, 20
Bayes formula, 33
blind search, 31
boundedly rational agents, 151
chain message, 12, 14, 15, 22, 26
change management of KDD process, 113
clickstream, 201, 202
Coala, 150
coalition
  arithmetic model, 158-161
  continuous, 151
  definition of, 151
  dynamic, 151, 153, 154
  elephant model, 159, 160
  example of, 150, 151
  forgiving model, 160-162
  geometric model, 159, 160, 162
  static, 153
  strength of, 158
collective behavior, 123
color fraction, 248-251
command-driven, 60, 69
completeness, 247
complex behavior, 125
consensus, 60, 69
consequences of actions, 152, 153
contingency theory, 60
cooperation, 123
coordination, 123
coordination, dynamic, 63
coordination number, 237, 239
data-abstraction, 8
decision-making control, 60
decision-making framework, 60
Delaunay triangulation, 247, 249
detection function, 32
differential geometry, 230
diffusion, 129
disordered, 250-253
Durfee's time scale for commitments, 153
Dynamic Adaptive Autonomy (DAA), 64
dynamic groups, models of, 155-166
education, 203, 215
efficiency, 16, 18, 20, 23
effort allocation, 33
equilibrium, 228-232, 234, 235, 244-246, 252
equilibrium between Vj and V/, 156, 157
Euler characteristic, 237-239, 242
evolutionary perspective on agent coalitions, 168
experiments, multi-agent, 61, 66
fair market, 253
feature extraction, 125
Feltham's Law, 241
finite size, 252, 255
fitness function, 130
game, chicken, 175-179, 181, 183, 185-187, 189-193, 195-198
game, cooperate, 176, 185, 198
game, evolutionary, 179, 181
game, extensive, 180
game, hawk and dove, 185
game, iterated, 177, 179-182
game, Prisoner's dilemma, 175-179, 181-183, 185-187
game, strategic, 180
game theory, 175, 176, 179, 181
games, 203, 204, 207, 215
genetic programming, 204, 208, 210, 214
Gibbs elasticity, 234
goal-attainability, 123, 126, 134, 142
goal-directed, 10
group robots, 123, 133, 144
Hawk-and-Dove game, 168, 169
hierarchical representation, 51
hydrophilic, 233
hydrophobic, 233
index of usefulness (IU), 78
interfacial hydrodynamics, 234
Ising model, 251, 252, 254
Janteist agent, 166-168
  exploitation of, 169, 170
Java, 204, 215, 216
KDD agents, 94
KDD process iteration, 108
KDD process planning, 102
killer application, 16
knowledge abstraction, 30
knowledge difference, 30
Knowledge Discovery and Data Mining, 93
Knowledge Granularity, 29
Knowledge Granularity Spectrum, 29
Law of Jante, 166, 167, 170
learning, 7, 21, 22, 27, 28
Lewis' Law, 242
local stimulus, 128
locally autonomous, 61, 68
macroscopic, 231, 255
Marangoni effect, 234
master, 61, 69
matrix, normalized, 186, 187, 194, 197
matrix, payoff, 183-187, 189, 196, 197
maximum entropy, 232, 242, 243, 254
mean-field approximation, 245
membership of a coalition, 155-166
memory-driven behavioral selection, 123, 137
mesoscopic, 255
message, 12, 13, 21, 24
messaging, 10, 12, 22, 24
meta-search, 18, 20-22
micelles, 233, 234
microscopic, 232, 255
microscopic ensemble, 245
minimal distance, 235, 236
minimal surface, 230
minimization of energy, 232, 235, 246, 254
mixed equilibria, 168
mobile agent, 7, 11, 12, 15, 17-19, 21, 22, 27, 28
mobile code, 11, 12, 14, 15, 27
mobile objects, 14, 15, 23
multi-agent systems (MAS), 175, 179-181
multi-hop, 11, 12, 19
navigation, 133
neural networks, 206, 215, 218, 220
noise, 175, 182, 185-187, 192, 193, 197, 198
noise level parameter β, 248-252
norm function ψj, 155
norms
  as recommendations, 154
  dynamic, 153, 154, 156
  in coalitions, 153-155
  probability-based, 154
  static, 154
object search, 31
object search agent, 31
object-oriented, 8, 24
offspring agent, 129
ontology brokering, 25
OO (Object-Oriented) metaphor, 8, 16, 24
ordered, 250-252
organizational adaptation, 59, 62
P2P (Peer to Peer), 26
parallel operation, 12, 13
performance-driven behavioral learning, 123, 137
Potts model, 243, 254
primitive behavior, 125
probabilistic estimates, 22
probabilistic learners, 22
quasi-equilibrium, 245
Radius Law, 242
reconfiguration, organizational, 62
reorganization, 59, 62
representational units, 22
restructuring, organizational, 63
rheology, 255
RMI (Remote Method Invocation), 11
RPC (Remote Procedure Call), 11, 17
scaling, 245, 246, 254, 255
Scaling Problem, 29
search engines, 18, 22, 28
self-confident agent, 168, 171
  exploitation of, 170
self-organized intelligence, 123
self-organized motion, 133
self-organized vision, 126
self-organizing behavior, 129, 136
self-reproduction, 129
semantic compression, 11, 20
Sensible Agent (SA), 65
sensor planning, 34
serial operation, 12, 13
Shapley value, 149
simulation, 138, 178, 180, 182-187, 190, 191, 194, 195, 197-199
singular coalitions, 153
stability, 230, 233, 254, 255
star-shaped itineraries, 13
state, 12-15
stationary agents, 11, 15, 18
Steiner problem, 235-237
strategy, evolutionarily stable, 179-181
strategy, generous, 183-185, 196-198
strategy, greedy, 183-185, 193, 196-198
strategy, mixed, 194-196
strategy, pure, 194, 195
strategy, Tit-for-Tat, 181-186, 188-193, 195-198
surface tension, 231, 234, 235, 239, 247
T1 rearrangement, 237, 240, 241, 243, 254
T2 rearrangement, 240, 241, 243, 244, 254
target tracking, 132
TCP/IP, 18
Telescript, 17
testbed, Sensible Agent, 65
The GLS System, 97
tournament, population, 180-182, 187, 190
tournament, round robin, 181, 186
transition, 229, 240, 244, 249, 251, 252
Vj, 155-162
V/, 155-162
valuation
  examples of, 162-166
  forgiving models, 166
  non-forgiving models, 164, 165
virtual locations, 11
von Neumann's dynamics equation, 239, 240, 254
Voronoi construction, 246, 247, 249
Wigner-Seitz cell, 249
wrapper, 19, 25