Table 2. Asymmetry Thesis principles and their satisfaction in the basic BDI systems

        A1  A2  A3  A4  A5  A6  A7  A8  A9
    S   T   F   F   T   T   T   T   T   F
    W   T   T   T   T   T   T   T   T   T
    R   T   T   T   T   F   F   T   F   T

Table 3. Consequential Closure Principles and their satisfaction in Basic BDI Systems

        C1  C2  C3
    R   F   F   F
    W   T   T   T
the agent is very cautious, and only intends and desires propositions that it believes to be achievable. In realism, the set of intention-accessible worlds is a subset of the desire-accessible worlds, and the set of desire-accessible worlds is a subset of the belief-accessible worlds (Figure 1(ii)). The axioms are given in Table 1. An agent based on realism is an enthusiastic agent and believes that it can achieve its desires and intentions. Finally, in weak realism, the intersections of the intention- and desire-, intention- and belief-, and belief- and desire-accessible worlds are not empty, as shown in Figure 1(iii). The axiom schemas for weak realism are provided in Table 1. The agent described by weak realism is more balanced than the other two types of agents. The three systems that result from the adoption of the corresponding realism axioms will be called S-BDI, R-BDI and W-BDI respectively. Bratman 1 and Rao and Georgeff 4 discussed several properties or conditions of rationality that a BDI system should satisfy. The first set of such properties is known as the Asymmetry Thesis, or the incompleteness and inconsistency principles; they hold pairwise between desires, beliefs, and intentions. They are listed in Table 2 along with their satisfaction in the basic systems. The second set is called the Consequential Closure principles. They are provided in Table 3 along with their satisfaction in the basic BDI systems.
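For reference, the axiom schemas that these set relations induce can be sketched as follows (Table 1 itself is not reproduced here); this is a reconstruction in the style of Rao and Georgeff 4, restricted to O-formulas, and the exact entries of Table 1 should be treated as assumptions:

    % Strong realism (S-BDI): belief-accessible worlds are a subset of the
    % desire-accessible worlds, which are a subset of the intention-accessible worlds.
    \mathrm{Intend}_i(\varphi) \supset \mathrm{Des}_i(\varphi), \qquad
    \mathrm{Des}_i(\varphi) \supset \mathrm{Bel}_i(\varphi)

    % Realism (R-BDI): intention-accessible worlds are a subset of the
    % desire-accessible worlds, which are a subset of the belief-accessible worlds.
    \mathrm{Bel}_i(\varphi) \supset \mathrm{Des}_i(\varphi), \qquad
    \mathrm{Des}_i(\varphi) \supset \mathrm{Intend}_i(\varphi)

    % Weak realism (W-BDI): the pairwise intersections are non-empty.
    \mathrm{Intend}_i(\varphi) \supset \neg\mathrm{Des}_i(\neg\varphi), \qquad
    \mathrm{Intend}_i(\varphi) \supset \neg\mathrm{Bel}_i(\neg\varphi), \qquad
    \mathrm{Des}_i(\varphi) \supset \neg\mathrm{Bel}_i(\neg\varphi)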
3 Circumspect Agents
Different application domains for agents have different requirements, which need to be reflected in their conceptualisation, design and implementation. For instance, an agent that trades in the stock market may have a different reasoning model from an air-traffic controller agent that is responsible for the safe landing and take-off of aircraft. The stock market agent may be required to engage in transactions that involve risk, whereas the air-traffic controller agent needs to be strictly cautious and avoid decisions that involve uncertainty and risk. Thus, the need for heterogeneous agents stems from the fact that the cognitive model of the agent may have to vary depending on the requirements of a particular application. In the BDI paradigm this need is addressed by adopting different realism constraints. For instance, the notion of strong realism characterises a cautious agent. However, strong realism describes only one possible way of relating the three sets of accessible worlds for capturing cautious agents. Moreover, it does not satisfy three Asymmetry Thesis principles, as shown in Table 2. Here we propose alternative constraints for characterising cautious or circumspect agents. A circumspect agent is not willing to take any risks; that is, we interpret circumspect agents in the BDI framework as agents that only intend a proposition if they believe it to be achievable in all accessible worlds. Although a number of notions of realism have been uncovered, only three will be presented due to space limitations. According to our interpretation of circumspect agents, such an agent only intends to optionally achieve φ if it believes in all respective accessible worlds that φ is an achievable option. Thus, one basic restriction for such agents in terms of semantic conditions is that the set of belief-accessible worlds should be a subset of the intention-accessible worlds. Consequently, the A2 principle will not be satisfied for such an agent. Therefore we will attempt to improve on the remaining principles, namely A3 and A9. In the first notion of realism, RC1-BDI, the set of belief-accessible worlds is a subset of the intention-accessible worlds, the intersection of the desire- and intention-accessible worlds is not empty, and the intersection of the belief- and desire-accessible worlds is not empty, as shown in Figure 2(i). Although we relax the requirement of strong realism in that an agent can have desires that it may not believe in all possible worlds to be achievable, the agent remains circumspect regarding its intentions and will only adopt intentions that it believes to be achievable options. The axioms imposed according to the set relations are given in Table 4. The respective system, called RC1-BDI, consists of the
Again the application of the realism axioms of Table 4 is restricted to O-formulas.
Figure 2. i) RC1-BDI Realism, ii) RC2-BDI Realism, iii) RC3-BDI Realism
Table 4. Axioms for the notions of realism for Circumspect Agents
RC1-BDI:  Intend_i(φ) ⇒ Bel_i(φ)    Intend_i(φ) ⇒ ¬Des_i(¬φ)    Bel_i(φ) ⇒ ¬Des_i(¬φ)
RC2-BDI:  Intend_i(φ) ⇒ Bel_i(φ)    Des_i(φ) ⇒ Bel_i(φ)         Intend_i(φ) ⇒ ¬Des_i(¬φ)
RC3-BDI:  Intend_i(φ) ⇒ Bel_i(φ)    Intend_i(φ) ⇒ Des_i(φ)      Bel_i(φ) ⇒ ¬Des_i(¬φ)
basic BDI axiomatisation and the axioms for RC1-BDI realism. Another type of circumspect agent is depicted in Figure 2(ii). This agent believes that its desires and intentions are achievable options, although its intentions are loosely coupled with its desires. Thus, the set of belief-accessible worlds is a subset of the desire-accessible worlds, the set of belief-accessible worlds is a subset of the intention-accessible worlds as well, and the intersection of the intention- and desire-accessible worlds is not empty. The axioms are provided in Table 4 and the respective system is called RC2-BDI. A third variation is illustrated in Figure 2(iii). In this system, both the belief- and the desire-accessible worlds are subsets of the intention-accessible worlds, while the intersection of the desire- and belief-accessible worlds is not empty. The axioms that are imposed according to these conditions are again provided in Table 4. This system is called RC3-BDI. We now turn our attention to the evaluation of the proposed systems with regard to the desiderata for rational agents as suggested by Bratman 1 and Rao and Georgeff 4. According to our interpretation and basic condition for circumspect agents, the A2 principle is not satisfiable in these systems. Comparing the three notions of realism with strong realism, we see that RC1-BDI, RC2-BDI and RC3-BDI provide certain improvements. In strong realism three of the Asymmetry Thesis principles (A2, A3 and A9) are not satisfied, whereas in RC1-BDI only one is not satisfied, and in RC2-BDI and RC3-BDI two of them are not. In all three systems the Consequential Closure principles are satisfied. In conclusion, the three proposed systems come closer to the requirements for rational BDI agents than strong realism does.
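Summarising the semantic conditions stated above for the three proposed notions of realism (writing B_i, D_i and I_i for the belief-, desire- and intention-accessible worlds of agent i at the current world w; the compact notation is ours):

    % RC1-BDI
    B_i(w) \subseteq I_i(w), \quad D_i(w) \cap I_i(w) \neq \emptyset, \quad B_i(w) \cap D_i(w) \neq \emptyset
    % RC2-BDI
    B_i(w) \subseteq D_i(w), \quad B_i(w) \subseteq I_i(w), \quad I_i(w) \cap D_i(w) \neq \emptyset
    % RC3-BDI
    B_i(w) \subseteq I_i(w), \quad D_i(w) \subseteq I_i(w), \quad D_i(w) \cap B_i(w) \neq \emptyset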
Table 5. Asymmetry Thesis and Consequential Closure in Circumspect BDI Agents

        RC1  RC2  RC3
    A1   T    T    T
    A2   F    F    F
    A3   T    T    F
    A4   T    T    T
    A5   T    T    T
    A6   T    T    T
    A7   T    T    T
    A8   T    T    T
    A9   T    F    T
    C1   T    T    T
    C2   T    T    T
    C3   T    T    T
4 Conclusions
The research presented in this paper has been motivated by the need to formalise heterogeneous agents, and in particular circumspect agents, in the BDI paradigm. A circumspect BDI agent will only adopt an intention to optionally achieve φ if it believes that this is an achievable option. Three different notions of realism for circumspect agents were presented. These were shown to have better characteristics than the notion of strong realism. In the scope of this research, and in the effort to investigate all the available options, additional notions of realism were uncovered. However, due to lack of space we only described those that seem to yield the most interesting properties. In contrast to circumspect agents, one can consider bold agents. Such an agent can adopt an intention towards a proposition if it does not believe that the proposition is not an achievable option. The basic condition that seems to characterise such agents is: Intend_i(φ) ⇒ ¬Bel_i(¬φ). Notions of realism for such agents were explored elsewhere 2. In conclusion, we believe that the research presented here comes one step closer towards heterogeneous BDI agents. Perhaps the most interesting aspect of this work is to consider real applications and investigate how real agents that correspond to these formal cognitive models can be built.

References
1. M.E. Bratman, Intentions, Plans, and Practical Reason. Harvard University Press (1987).
2. M. Fasli, Towards Heterogeneous BDI Agents I: Bold Agents. In Proceedings of the 14th International FLAIRS Conference, AAAI Press (2001).
3. A. Rao and M. Georgeff, Modelling Rational Agents within a BDI-Architecture. In Proc. of the 2nd Int. Conf. on Principles of Knowledge Representation and Reasoning, pp. 473-484 (1991).
4. A. Rao and M. Georgeff, Decision Procedures of BDI Logics. Journal of Logic and Computation, 8(3):293-343 (1998).
A PREFERENCE-DRIVEN APPROACH TO DESIGNING AGENT SYSTEMS
STEFAN J. JOHANSSON
Department of Software Engineering and Computer Science, Blekinge Institute of Technology, S-372 25 Ronneby, Sweden
e-mail: sja@bth.se

JOHAN KUMMENEJE
Department of Computer and Systems Sciences, Stockholm University and the Royal Institute of Technology, S-164 42 Kista, Sweden
e-mail: johank@dsv.su.se
We present a preference-driven approach to the construction of agent systems in which owners and designers of both the agents and the environments are recognized to influence the preferences of the agents in order to maximize their expected utilities. We propose some general guidelines for using preferences in the process of agent engineering and identify the need for future research in the area.
1 Introduction
One important issue of agency is control. We must not be enticed to believe that the agents live in social and environmental isolation. They have been designed to fulfill the goals of their creators by interacting with their environment and other agents. Central to our point of view are the following concepts:

Definition 1 An agent owner (AO) is the (human or artificial) agent that has the power to launch the agent, as well as make the decision whether the agent should be shut down or be assigned new preferences. The owner expresses its preferences to the agent, and gets it to work toward the given preferences.

Definition 2 An agent designer (AD) is the (human or artificial) agent that has designed (and possibly implemented) the control mechanism of an agent. With control, we mean the internal evaluation of the environment and the owner preferences.

Definition 3 A designer of an environment (ED) is the (human or artificial) agent that has designed and possibly implemented the rules and conditions under which agents are able to act in the environment.

Definition 4 An environment owner (EO) is the (human or artificial) agent whose run-time preferences are reflected in the dynamics of the rules and the conditions under which agents are able to act in the environment.
We will try to clarify the role of each of these characters in the following sections. In the next section, we will give some (artificial) examples of agent systems and also discuss how the different users and designers relate to their parts of the system. Section 3 discusses a real example of preference dynamics based on the simulated league in RoboCup, in which designers and users of both agents and environments act on the preferences of the others. We finish off with a section on discussion and future work.
2 The Meta-Design of a System
In an agent system, we may identify the following features: First, each of the agents has a set of dynamic preferences expressed by their owners as well as a set of static preferences decided at design level. Secondly, the agents may take into account preferences expressed by the designer and the owner of the environment. Thirdly, each of the agents optimizes its actions according to its preferences, its knowledge and its abilities, i.e. they are boundedly rational (more about bounded rationality is found in e.g. Boman 1). Fourthly, the actions of the agent influence the environment either directly or indirectly, and fifthly, changes occur in the environment as a result of the actions of the agents. These are the possibly observable side-effects of the system that the owner may benefit from, and possibly adjust its preferences according to.

The Agents' Choice of Actions: The assumption of bounded rationality is pragmatic in the sense that the agents may be unaware of the preferences, abilities, etc. of other agents. If an agent were not boundedly rational, it would deliberately be acting non-optimally with respect to its design objectives, i.e. with respect to what the user and designers would consider to be the best action. Instead, some other preferences must have been present, which is in contradiction with the fact that the only thing that guides the behavior of an agent is the preferences of its owner and its designers and the state of the environment.

The Observations of the Owner: It is rarely the case that agents as such are the reason for running a system (exceptions are to be found e.g. in agent-based simulations 2). Instead, what the owners of the agents are generally interested in is the side-effects of the actions of the agents. To illustrate this, imagine an office environment agent. The owner of this agent is interested in the result of the negotiations, i.e. that the local environment gets as close to the owner preferences as possible, not in the negotiation protocols used nor in how many agents it had to negotiate with.
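To make this valuation flow concrete, the following minimal Python sketch shows one way an agent could combine the owner's dynamic preferences with the designer's static preferences when ranking its possible actions; the class and function names are illustrative assumptions, not part of the cited systems:

    # Minimal sketch of preference-driven action selection (illustrative only).
    from typing import Callable, Dict, List

    class PreferenceDrivenAgent:
        def __init__(self,
                     static_prefs: Dict[str, float],      # set by the agent designer (AD)
                     dynamic_prefs: Dict[str, float]):    # set by the agent owner (AO) at run time
            self.static_prefs = static_prefs
            self.dynamic_prefs = dynamic_prefs

        def utility(self, predicted_state: Dict[str, float]) -> float:
            """Weigh predicted goal fulfilment by owner and designer preferences."""
            score = 0.0
            for goal, degree in predicted_state.items():
                weight = self.dynamic_prefs.get(goal, 0.0) + self.static_prefs.get(goal, 0.0)
                score += weight * degree
            return score

        def choose_action(self,
                          actions: List[str],
                          predict: Callable[[str], Dict[str, float]]) -> str:
            """Boundedly rational choice: best action w.r.t. the agent's own
            (possibly incomplete) prediction of the environment's response."""
            return max(actions, key=lambda a: self.utility(predict(a)))

    # Example: an office-environment agent trading off comfort against energy use.
    agent = PreferenceDrivenAgent(static_prefs={"energy_saved": 0.2},
                                  dynamic_prefs={"temperature_comfort": 1.0})
    predict = lambda a: {"open_window": {"temperature_comfort": 0.3, "energy_saved": 0.9},
                         "run_heater":  {"temperature_comfort": 0.9, "energy_saved": 0.1}}[a]
    print(agent.choose_action(["open_window", "run_heater"], predict))  # -> "run_heater"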
Figure 1. The different sources of valuation (owner preferences and environment preferences).
Design Principles for Agent Environments: As an ED, the task is to implement the rules and conditions under which agents that act in the environment will be evaluated. At the same time as the ED will have to design a (hopefully unambiguous) framework of rules, much effort must be put into the design of a system of punishments and rewards. The EO then sets the rewards and the punishments for certain behaviors in a way that will lead the expected behavior to an acceptable behavioral equilibrium. If not, the result will be an environment in which the agents niche themselves in behaviors that are sub-optimal for the environment as a whole. We therefore suggest the following schematic guidelines for environment design and owner maintenance:
(i) Set up the conditions under which the agents are allowed to act in the environment.
(ii) Assign to each (class of) possible allowed state(s) a preference describing the estimated value of the state (from the perspective of the ED-EO), and
(iii) Calculate the assignment of punishments and rewards of behaviors that, when implemented in the environment, will have its equilibrium in the preferred states.
The complexity of the calculation of punishments and rewards is of course dependent on the complexity of the allowed actions. It is not our purpose to expound our ideas about how to calculate punishments and rewards here; instead we leave it for future work.

Design Principles for Agents: Each agent has a set of preferences in which each preference is a measure of the importance that a certain goal is fulfilled. We can distinguish two types of these preferences: static and dynamic. The
static preferences are the ones set at the designer's level when the agents and the environments are implemented. The dynamic preferences are the ones set by the owners of the agents, and to some extent the owners of the environment, at run time. We may expect a further development of the skills and abilities of the agents as the field of agent engineering matures. This means that they will be able to (if possible) exploit the weaknesses of the environments that they act in, as well as the weaknesses of other agents. Today these weaknesses are exploited manually through the expression of explicit owner preferences, but as the level of abstraction increases, we may expect this to be automated in a way that the ADs provide skills that automagically find out the weak spots of the environment and use them for the agent's own purposes. A suggested set of guidelines for ADs is therefore to design/implement:
(i) Abilities to find out the rules and conditions of an environment (e.g. by look-up services, etc.).
(ii) Abilities to optimize the behavior with respect to: a) the actions possible to perform in the given environment, b) the expected rewards and punishments of different behaviors in the environment, and c) the preferences of the AO.
(iii) An interface to the AO in which the AO can express its preferences.

The Relation between the Agent and the Environment: It is possible to recognize two different types of relationships: between an agent and its environment, and between agents (i.e. communicative acts). Also, an agent may observe the effects of its own and other agents' actions, even though it may be hard or even impossible for the agent to draw any causal conclusions. If we take a closer look at what happens in the environment, the actions are performed under the assumption of the agent that the action was the best possible thing to do in order to reach its goals, expressed by its preferences, regardless of whether they are communicative or not. The agent must in all cases to some extent observe the external state of the environment and the other agents, but the distribution of computational attention between, for example, observing and acting is individual from agent to agent. This is typically a parameter that is determined at the designer's level. For instance, an agent that relies on learning in order to perform well may be designed to be more observant than an agent that must be prepared for quick responses to changes in the preferences of its owner. This means that it is possible that one agent in one system collects all possible observations, while another agent only observes the actions performed by itself. A study of the trade-off between deliberation and action can be found in e.g. the work of Schut 3.
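A skeletal rendering of these design guidelines is given below; it is a sketch of the structure only, and all names (the look-up service, the reward model, and so on) are assumptions introduced for illustration:

    # Sketch of the three AD guidelines: (i) rule discovery, (ii) behaviour
    # optimisation, (iii) an owner-preference interface. Illustrative only.
    from typing import Dict, List

    class EngineeredAgent:
        def __init__(self):
            self.owner_prefs: Dict[str, float] = {}   # dynamic preferences (set by the AO)
            self.rules: List[str] = []                # discovered environment rules

        # (i) find out the rules and conditions of the environment
        def discover_rules(self, lookup_service) -> None:
            self.rules = lookup_service.list_rules()

        # (iii) interface through which the owner expresses preferences
        def set_owner_preferences(self, prefs: Dict[str, float]) -> None:
            self.owner_prefs = dict(prefs)

        # (ii) optimise behaviour w.r.t. possible actions, expected rewards, and AO preferences
        def plan(self, possible_actions: List[str], reward_model: Dict[str, float]) -> str:
            def value(action: str) -> float:
                reward = reward_model.get(action, 0.0)           # expected reward/punishment
                owner_bonus = self.owner_prefs.get(action, 0.0)  # how much the owner cares
                return reward + owner_bonus
            return max(possible_actions, key=value)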
3 An Exemplification of Preferences
To exemplify our point, we use the student implementations of RoboCup teams at Linkoping University. The example, though somewhat artificial, clearly illustrates a number of occasions where the preferences of the environment designer, the agent designer, and the agent owner influence the development process. RoboCup can simply be described as robots playing soccer; we focus, however, on the simulated league, which avoids the ambiguity of the real world (more information on RoboCup and the simulated league is available in Kummeneje 4). The designers of the simulated-league server are, in our example, considered to be the environment designer. RoboSoc 5 is a base system that eases the creation of new soccer playing teams, and Heintz is thereby considered in our example to be the agent designer, while the students creating their teams are considered to be the agent owners. The agent owners may or may not be aware of the preferences expressed in the simulation server and the RoboSoc platform; however, if they are aware of the preferences (and most likely any caveats), they might be able to use these preferences. For instance, in 1997 and 1998 the maximum ball speed was not limited, allowing a team to accelerate the ball to incredible speeds by simply passing the ball a number of times. After the discovery of this feature, the server was changed to impose a fixed limit. We thereby recognize that the sets of preferences of the ED, AD, and AO are not fixed, but dynamically changing over time. The preferences may also be viewed as being delicately intertwined.
4 Discussion and Concluding Remarks
The designer of the agent may be the same as the owner; however, it is more likely that the future user of an agent system is someone who is not able to program the low-level algorithms, etc., but who prefers to use the agent at the service level. This will of course raise the issue of trust in agent design. How can we as users of an agent make sure that the agent we have launched to perform a certain task will do its best to serve us without putting the interests of the agent designer first? For instance, should we trust a flight ticket buying agent designed by someone on the payroll of a major airline company? Questions like this are important to ask if we as agent designers and representatives of the agent research community would like to earn respect for what we are doing from the point of view of the users of our agents. We have presented a perspective on agent systems, based on preferences
set by users and designers, and suggested general guidelines for the engineering of agents as well as agent environments. From an evolutionary perspective, we may expect the agent designers to become better at taking other, external preferences into consideration, while the owners get less interested in how exactly the agent works, and more keen on having their preferences satisfied. The environment designers will concentrate on setting up rules specific to the domain the environment is designed for. These rules will not be able to control which actions can be performed by which agents at what time. However, indirectly the punishments and the rewards of the environment will have a great impact on these matters. Even though this study includes a good example of the preference perspective in the domain of RoboCup, it is far too early to draw any extensive conclusions based on this, and we suggest that more effort be put into this promising area of research.

Acknowledgments
Stefan Johansson would like to thank the EC research programme IST-1999-10298 ALFEBIITE and the KK-foundation for funding and inspiration to this work 6. The authors thank Paul Davidsson and Magnus Boman for comments.

References
1. M. Boman. What is rational agency. Technical Report 95-048, Department of Computer and Systems Sciences, 1995. Internal Working Note.
2. H.J.E. Verhagen. Norm Autonomous Agents. PhD thesis, Department of Computer and Systems Sciences, Stockholm University and Royal Institute of Technology, 2000.
3. M. Schut. Intention reconsideration as discrete deliberation scheduling. In Proceedings of the 2001 AAAI Spring Symposium on Game Theoretic and Decision Theoretic Agents, Technical Report SS-01-03. AAAI Press, 2001.
4. Johan Kummeneje. RoboCup as a Means to Research, Education, and Dissemination, March 2001. Licentiate thesis, Department of Computer and Systems Sciences, Stockholm University and the Royal Institute of Technology.
5. Fredrik Heintz. RoboSoc: a System for Developing RoboCup Agents for Educational Use. Master's thesis, Department of Computer and Information Science, Linkoping University, March 2000.
6. The ALFEBIITE home page, http://www.iis.ee.ic.ac.uk/alfebiite.
AGENT CONSUMER REPORTS: OF THE AGENTS, BY THE AGENTS, AND FOR THE AGENTS
XIAOCHENG LUAN, YUN PENG, AND TIMOTHY FININ
University of Maryland, Baltimore County, 22215 Overview Lane, Boyds, MD 20841, USA
E-mail: {XLUAN1, YPENG, FININ}@CS.UMBC.EDU

Service matching is critical in large, dynamic agent systems. While finding exact matches is always desirable as long as an agent knows what it wants, it is not always possible to find exact matches. Moreover, the selected agents (with exact match) may or may not provide quality services. Some agents may be unwilling or unable to advertise their capability information at a sufficient level of detail, some might unknowingly advertise inaccurate information, while others might even purposefully provide misleading information. Our proposed solution to this problem is the agent "consumer reports". The broker agent will not only collect the information advertised by the service provider agents, but also learn about the experiences the consumer agents have had with their service providers. It might also hire some agents to test certain service providers to see how well they can do what they claim they are capable of doing. Then agent consumer reports will be built based on the information collected. The advanced level of agent consumer reports will also dynamically capture the probabilistic distribution of the services and use it to assess the probability of a match. We plan to extend LARKS and use it as our agent capability description language.
1 Introduction
Finding the right agent(s) for the right task (service) is critical to achieving agent cooperation in large, dynamic agent systems. A popular approach to this problem is to use a broker agent (which may also be called a matchmaker, or facilitator) to connect the service provider agents and the service consumer agents, via service matching. Typically a broker agent recommends service providers based on the capabilities/services advertised by the service providers themselves. Matching methods have evolved from early, simple KQML-performative-based matching to syntax- and semantics-based matching, and from yes/no matches to matches with probabilities. However, we may still have problems since some agents may be unwilling or unable to advertise their capability information at a sufficient level of detail; some might unknowingly advertise inaccurate information; while others might even purposefully provide misleading information. We have similar problems in the real world: we don't know whether the colorful, fancy, and even touching commercials are true or not. There is no perfect solution to this real world problem, but consumer reports certainly help a lot (besides the justice system). Consumer reports are created using the information from the manufacturer's specifications, consumers' feedback, and test results on the products. It provides
guidance for consumers to choose the right product. We believe that this consumer reports approach should work for the agent world, too. By following a simple brokering protocol (which will not be discussed here because of space limitations), the broker agent will not only collect the information advertised by the service provider agents, but also learn about the experiences the consumer agents have had with their service providers. It might also hire some agents to test certain service providers to see how well they can do what they claim they are capable of doing. Based on the collected information and the domain knowledge, consumer reports can be built to assist in service matching. Moreover, the broker agent can dynamically capture the probabilistic distribution of the agent services and use this information to assess the probability of a service match. Finally, our approach goes beyond the simple notion of a "reputation server" in that it discovers and refines a complex, symbolic model of a service provider's performance. The rest of this article is organized into two sections. In section 2, we shall describe how the agent consumer reports will be built, and we will discuss some related issues in section 3.
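As a rough illustration of this brokering loop, the sketch below shows a broker that merges advertised claims, consumer feedback, and test results into per-provider ratings; the field names and the simple averaging scheme are assumptions made for the example, not part of the proposed protocol:

    # Illustrative sketch of building "consumer reports" ratings for providers.
    from collections import defaultdict
    from statistics import mean

    class Broker:
        def __init__(self):
            self.advertised = {}                      # provider -> advertised description
            self.feedback = defaultdict(list)         # provider -> consumer scores (0..10)
            self.test_results = defaultdict(list)     # provider -> scores from hired testers

        def advertise(self, provider: str, description: dict) -> None:
            self.advertised[provider] = description

        def report_experience(self, provider: str, score: float) -> None:
            self.feedback[provider].append(score)

        def record_test(self, provider: str, score: float) -> None:
            self.test_results[provider].append(score)

        def consumer_report(self, provider: str) -> dict:
            """Combine the collected evidence into a rating attached to the description."""
            scores = self.feedback[provider] + self.test_results[provider]
            rating = mean(scores) if scores else None   # None: no evidence yet
            return {"description": self.advertised.get(provider, {}), "rating": rating}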
2 Building Consumer Reports
In our model of an agent system, there are three types of agents: service provider agents, service consumer agents, and broker agents. A broker agent is the one responsible for building the agent consumer reports. To simplify the problem, but without loss of generality, we make the following assumptions: (1) all the agents (including the broker agent) in a system share a common domain ontology, and (2) the security and/or privacy issues are orthogonal to what we will discuss in this article.
2.1 Representation
We are extending the LARKS framework for use in describing the agent's capabilities. LARKS, Language for Advertisement and Request for Knowledge Sharing, is an agent capability description language developed at CMU. It describes an agent's service by specifying the context, the data types, the input and output variables, and the input and output constraints. It also has a slot for the definition of the concepts used in the description. The matchmaking scheme in LARKS is relatively flexible and powerful. It has five filters, each of which addresses the matching process from a different perspective. "Context matching" determines if two descriptions are in the same or similar context; "profile comparison", "similarity matching", and "signature matching" are used to check if two descriptions syntactically match; "semantic matching" checks if the
input/output constraints of a pair of descriptions logically match. Based on the needs of a specific application domain, these filters can be combined to achieve different types/levels of matching. Since LARKS doesn't provide mechanisms for describing the "ratings" of an agent service, we plan to extend LARKS so that, besides the 7 standard slots described above, a description will also have zero or more "CR" (Consumer Reports) slots. These slots (if any) are typically domain dependent, and will be used to describe the strength of various aspects of the service provided by some specific agent. For example, the integer sort service description can have some CR slots (in italics) as shown in Figure 1.

Context:            Sort
Types:
Input:              Xs: ListOf Integer;
Output:             Ys: ListOf Integer;
InConstraints:      Le(length(xs), 100);
OutConstraints:     Before(x,y,ys) <- ge(x,y); In(x,ys) <- in(x,xs);
ConcDescriptions:
PriceIndex:         2 (10 is best)
ResponseTimeIndex:  1 (10 is best)
Figure 1. Capability description for integer sort, with CR slots.
Basically we will add another type of filter, the consumer reports filter, to handle the CR-related slots. Since these slots are usually domain dependent, the evaluation and comparison of these slots might need to be done in a domain-dependent way. A default CR filter can be provided, e.g., to compare integer-typed slots. The system will allow customized CR filters to be plugged in to handle the CR slots in a domain-dependent way during the matchmaking or comparison. It is recommended that the consumer reports filter be applied after all the other designated filters have been applied. The CR filter will then be used to pick the best one(s) from all the candidates. Please note that while we plan to extend LARKS and use its service/capability description language and its matching filters, we think the approach proposed here is applicable to other representations or systems as well.
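A minimal sketch of such a pluggable filter is shown below; the slot names and the integer comparison mirror the sort example above, while the filter interface itself is an assumption introduced for illustration:

    # Illustrative default CR filter: ranks candidate descriptions by integer-typed CR slots.
    from typing import Dict, List

    def default_cr_filter(candidates: List[Dict], cr_slots: List[str]) -> List[Dict]:
        """Order candidate service descriptions by the sum of their CR slot ratings.

        Assumes each description stores integer ratings (10 is best) under the
        given CR slot names, e.g. "PriceIndex" and "ResponseTimeIndex".
        """
        def cr_score(description: Dict) -> int:
            return sum(int(description.get(slot, 0)) for slot in cr_slots)
        return sorted(candidates, key=cr_score, reverse=True)

    # Applied after the standard LARKS filters have produced a candidate set:
    candidates = [{"agent": "sorter-A", "PriceIndex": 2, "ResponseTimeIndex": 1},
                  {"agent": "sorter-B", "PriceIndex": 7, "ResponseTimeIndex": 6}]
    best_first = default_cr_filter(candidates, ["PriceIndex", "ResponseTimeIndex"])
    print(best_first[0]["agent"])   # -> "sorter-B"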
2.2 Building Consumer Reports
The consumer reports are built based on the information the broker collects about the service provider agents. The information comes from various channels: The feedback
from service consumer agents, testing results (relevant agents can be asked or "hired" to test the service provider agents, when appropriate), the service descriptions advertised by the service provider agents, the domain knowledge etc. If the broker also performs task brokering (in which the broker receives a query, finds an appropriate agent, forwards the query to that agent, and passes the result back to the requesting agent), the requests and the results are useful sources for learning too. The building of consumer reports is more than just collecting feedback data and assigning ratings. There are two levels of consumer reports - the basic level and the advanced level. The basic level is simply about assigning ratings to each relevant CR slots of the original service descriptions based on the information collected. The advanced level, however, goes beyond the originally advertised service descriptions. It might also rate the sub-classes and super-classes of the advertised service class, and captures the probabilistic distribution of the services. Let's use an example to illustrate the basic idea. Consider selling televisions as a service with three sub-service classes: selling traditional TVs, selling HD-ready TVs, and selling HDTVs. Suppose the broker discovered that 85% of the advertisements/requests are about traditional TVs, 8% are about HD-ready TVs, and the rest (7%) are about HDTVs. Then if an agent requests a recommendation on "selling TV" service, the broker would be able to recommend a traditional TV seller with pretty high confidence, or recommend a HD-ready TV seller or a HDTV seller with low confidence (if there is no better choice). Five years later, the distribution of the 3 sub service classes might change to 30%, 20%, and 50% respectively. The broker agent will then be able to dynamically capture the changes in the probabilistic distribution and change its matching criteria accordingly. On the other hand, while most of the TV sellers (those who advertise that they sell TVs) sell traditional TVs, not that many TV sellers sell HDTVs. So based on the probabilistic distribution, the broker agent would be more confident to recommend a TV seller if the request is about traditional TV, while it would be less confident (to recommend a TV seller) if the request is about HDTV. When computing the probabilistic distributions, we consider both how many sub classes a service class has, and the frequency of the advertisements and recommendation requests on that service. Moreover, the feedback from the consumer agents will also be taken into account. In large, heterogeneous agent systems, while exact service matches are always desirable (as long as you know what you want), it's not always possible to find exact matches. Therefore, it's important for the broker agent to learn the probabilistic distribution of the services so as to identify the partial matches that have higher probability of success.
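The paragraph above can be made concrete with a small sketch that tracks how often each sub-service class is advertised or requested and turns the resulting distribution into a match confidence; the numbers and class names repeat the TV example, and the confidence rule (relative frequency) is an assumption:

    # Illustrative sketch: capture the distribution of sub-service classes and
    # use it to assess the confidence of a partial match.
    from collections import Counter

    class ServiceDistribution:
        def __init__(self):
            self.counts = Counter()

        def observe(self, sub_class: str) -> None:
            """Called for every advertisement or recommendation request seen by the broker."""
            self.counts[sub_class] += 1

        def confidence(self, sub_class: str) -> float:
            total = sum(self.counts.values())
            return self.counts[sub_class] / total if total else 0.0

    tv = ServiceDistribution()
    for cls, n in (("traditional", 85), ("hd_ready", 8), ("hdtv", 7)):
        for _ in range(n):
            tv.observe(cls)

    # A request for the generic "selling TV" service: recommend a traditional-TV
    # seller with high confidence, an HDTV seller only with low confidence.
    print(round(tv.confidence("traditional"), 2))  # 0.85
    print(round(tv.confidence("hdtv"), 2))         # 0.07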
3 Discussions
This paper presents some preliminary concepts and plans for an adaptive service broker which learns and refines a model of a service provider's performance. Although we have touched on a number of issues, significant additional issues remain, as does a concrete implementation. The related issues not addressed here include (but are not limited to) the security issue, the privacy issue, the fairness issue, and the ontology issue. We believe that the security issue and the privacy issue are orthogonal to what we've discussed here. The fairness issue is more closely related. Though we believe that in general the agent consumer reports provide a basis for better service matching, the ratings on specific services may not always be "accurate" - the evaluation of "accuracy" itself is already a big issue. One (partial) solution is for the broker agent to always return an ordered list of service provider agents, instead of only the best one(s). For the ontology issue, the agents may have only a limited subset of shared ontology, or they might use different ontologies altogether. This issue is somewhat orthogonal, but not entirely. Employment of ontology translation or ontology negotiation might help. One of the ideas behind this work is the law of locality. The approach proposed here is meant to capture both the temporal locality (e.g., the distribution may change over time) and the spatial locality (e.g., a subset of the services may get referenced frequently). We will develop a prototype implementation of a system which is partly based on the LARKS framework. We will incorporate new ideas which are evolving from the semantic web [Berners-Lee, et al. 2001] and the DAML [DAML, 2000] language in particular. Some initial work has been done to explore how DAML can be used to represent and reason about web services and agent services [DAML-S 2001, McIlraith and Zeng 2001].

References
1. [Cohen, et al, 1992] Cohen, W., Borgida, A. and Hirsh, H. Computing Least Common Subsumers in Description Logics. Proceedings of the National Conference on Artificial Intelligence - AAAI 92, pp. 754-760, 1992.
2. [Decker, et al, 1996 (1)] Decker, K., Sycara, K. and Williamson, M., Modeling Information Agents: Advertisements, Organizational Roles, and Dynamic Behavior. Working Notes of the AAAI-96 workshop on Agent Modeling, AAAI Report WS96-02, 1996.
3. [Dellarocas 2000] Dellarocas C, , Immunizing online reputation reporting systems against unfair ratings and discriminatory behavior, Proceedings of the 2nd ACM Conference on Electronic Commerce, Minneapolis, MN, October 17-20, 2000 4. [Genesereth & Singh, 1993] Genesereth, M. R. and Singh, N. P., A Knowledge Sharing Approach to Software Interoperation Stanford Logic Group Report Logic93-12. 5. [Gruber, 1993] Gruber, T. R., A Translation Approach to Portable Ontologies. Knowledge Acquisition, 5(2): 199-220, 1993. 6. [Michalski, et al, ????] Michalski, R. S., Carbonell, J. G., Mitchell, T. M., Machine Learning, An Artificial Intelligence Approach, Tioga Publishing Company 7. [Mui 2001] Mui, Lik, Szolovitz, P, and Wang, C , Sanctioning: Applications in Restaurant Recommendations based on Reputation, Proceedings of the Fifth International Conference on Autonomous Agents, Montreal, May 2001. 8. [Sycara, et al, 1998] Sycara, K., Lu, J. and Klusch. M. Interoperability among Heterogeneous Software Agents On the Internet. CMU-RI-TR-98-22. 9. [Berners-Lee, et. Al. 2001] Tim Berners-Lee, James Hendler and Ora Lassila, The Semantic Web, Scientific American, May 2001. 10. [Chen et. Al., 2001] Harry Chen, Anupam Joshi, Tim Finin. "Dynamic Service Discovery for Mobile Computing: Intelligent Agents Meet Jini in the Aether." The Baltzer Science Journal on Cluster Computing. March 2001 (Volume 3, No. 2). 11. [DAML 2000] DAML specification, http://www.daml.org/, October 2000. 12. [DAML-S, 2001] DAML-S: A DAML for Web Services, White paper, SRI, http://www.ai.sri.com/daml/services/daml-s.pdf 13. [Labrou, et. Al, 2001] Yannis Labrou, Tim Finin, Benjamin Grosof and Yun Peng, Agent Communication Languages, in Handbook of Agent Technology, Jeff Bradshaw, ed., MIT/AAAI Press, 2001. 14. [Mcllraith abd Zeng, 2001] Mcllraith, S., Son, T.C. and Zeng, H. "Semantic Web Services" , IEEE Intelligent Systems. Special Issue on the Semantic Web. To appear, 2001. 15. [WSDL, 2001] Web Services Description Language (WSDL) 1.1, January 23, 2001, Microsoft Corporation, http://msdn.microsoft.com/xml/general/wsdl.asp
Logical Formalizations Built on Game-Theoretic Argument about Commitments Lamber Royakkers and Vincent Buskens *
Abstract The formalization of commitment is a topic of continuing interest in Artificial Intelligence (AI)'s understanding of human cooperative activity and organization. Such formalizations are crucial for clarifying rational behavior. AI research on commitments, however, has been focusing on describing systems of agents, neglecting the individual incentives to perform certain actions. We argue in this paper that an understanding of a system of agents needs to incorporate not only a logical system of possible actions, but also an incentive structure related to the actions and the interdependence of agents involved in interactions between more agents. As an example we will discuss the use of commitments in interactions between two agents. By adding game-theoretic reasoning, we will not only be able to describe different commitment systems in various (legal) settings, but we can also determine whether or not such commitment system is expected to be socially efficient, desirable, and able to influence human behavior.
1 Introduction
Many social interactions between two (or more) agents demand for various reasons the use of commitments to reach socially efficient or avoid socially inefficient outcomes. We will start with an example. Assume you want to write an article together with a colleague. You are both convinced that joining forces will produce a better product than writing two articles separately. However, you as well as your colleague cannot be sure that the other will actually invest his fair share in this joint project (cooperate). Still, if both of you work hard, you will both be satisfied. You realize that if the colleague sits back (defects) while you do the job, he is even better off and you would have preferred to write an article alone. Clearly, your colleague also fears that you sit back and profit from his effort.

* Supported by a grant from the Niels Stensen Foundation and by a grant from the Netherlands Organization for Scientific Research (NWO), email: [email protected], [email protected].
                        Agent 2
                    Defect   Cooperate
  Agent 1 Defect     2,2        4,1
          Cooperate  1,4        3,3
Figure 1: Strategic form of the Prisoner's Dilemma Game

The "game" described above (without commitments) is called a Prisoner's Dilemma Game [3]. In strategic form, 1 the game is shown in Figure 1. The values in the cells of the matrix indicate the payoffs for each agent related to a combination of actions of the two agents. The expected action in this game is "defect" by both agents, because independent of the action of the other agent, each agent is better off by defecting. Consequently, both agents receive 2 instead of 3, which they could obtain if they both would cooperate. Thus, the expected outcome (2,2) is socially inefficient. However, by committing to cooperation, e.g., by mutually informing the responsible professor who can incur sanctions on the researcher who does not work on the joint paper, cooperation becomes the best option for both agents. Hence, a mutual commitment leads to a better outcome for both agents in this situation. If we want to represent such a simple interaction in a logical system, only the possible actions are described. Commitment is then introduced as an elementary proposition. This implies that the commitment is a fact that does or does not occur. More sophisticated theories [2, 4] describe a formalization of motivational attitudes such as intentions, goals, and wishes that explain why agents behave the way they do. However, within the logical systems there is nothing that drives the motivational attitudes. It is only stated that if certain attitudes are present, commitments are used, without explicit reasoning about why and when a certain attitude leads to a commitment. For example, in organization theories of Distributed Artificial Intelligence (DAI), negotiation systems, and cooperative software agents, the notion of commitment is used as a mediator of the transformation of the collective activity to the agents, expressing issues such as delegation, adaptation, intention, responsibility, etc., which constitutes the theory of collective activity in a narrower way (cf. [1]). We use the primitive notions of intention, knowledge, and goal to define formally social commitment, inspired by Castelfranchi [1, 2]:

COMM(i, j, r) =def INT(i, r) ∧ K_j(INT(i, r)) ∧ GOAL(j, ACHIEVE(i, r)),   (1)

where K_j(φ) stands for the fact that agent j knows φ.
For all basic game-theoretic terminology and aspects we refer the reader to [6].
The last condition can be seen as a goal adoption: the achievement of the task is a goal of j. In game theory, motivational attitudes are represented by the payoffs agents receive at the end of an interaction, based on their combination of actions. The situation discussed above is only one example of a situation in which a commitment can change the expected outcome of an interaction between two agents. Likewise, the usefulness of commitment systems can be investigated for many social and legal interactions. For now, we will give a very informal description of what we mean by a commitment in this paper. Later we will become more precise and we will show that there are various types of commitments.

Definition 1 A commitment is an action by an agent before an interaction with other agents that signals to the other agents the intention to perform a particular action later on in the interaction.

We restrict ourselves in this paper to commitments that ensure that the agent who commits to a certain action will execute this action (binding commitments).
2 Adding Game Theory
Our main criticism of logical systems is that they do not explain but only describe actions by agents, probably including the use of commitments. Logical systems fail to explain why commitments are used in some situations and not in others. Logical systems cannot distinguish which commitment is or is not credible in a given interaction. The reason is that logical systems generally neglect the incentives related to various combinations of actions and the strategic interdependence between different agents. Besides explaining the use and effectiveness of commitments, game theory can help to distinguish between different types of commitments. As an illustration, we consider games in which two agents have each two possible actions and preferences over the four possible outcomes are strictly ordered for both agents. Because only the ordering of the payoffs is important for the analyses, they can be labeled as 1, 2, 3, and 4. 2 Rapoport, Guyer, and Gordon [5] show that there exist 78 distinct 2 x 2 games with strictly ordered payoffs.3 Each of the four outcomes represents a possible goal state for the agents. The goal states for the two agents do not need to coincide. For considering commitments, we classify these 78 games in eight groups. Figure 2 presents the matrices for one representative of each group. In these games, agent 1 chooses between T(op) and B(ottom), while agent 2 chooses between L(eft) and R(ight). T h e example of the introduction is not included in this set of games, because in this example the four possible outcomes are not strictly ordered. Including games for which the outcomes are not strictly ordered complicates the analysis considerably. 3 T w o games are considered the same if the one can be constructed from the other by changing rows, columns, or person labels.
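To follow the example-by-example discussion below, it may help to see how the game-theoretic part can be mechanised; the following Python sketch finds the pure-strategy Nash equilibria of a 2 x 2 game and evaluates what each agent obtains if it unilaterally commits first (the representation and function names are our own illustrative assumptions):

    # Sketch: pure Nash equilibria and unilateral binding commitments in a 2x2 game.
    # A game is a dict mapping (row, col) action pairs to (payoff1, payoff2).
    from typing import Dict, Tuple

    ROWS, COLS = ("T", "B"), ("L", "R")
    Game = Dict[Tuple[str, str], Tuple[int, int]]

    def pure_nash(game: Game):
        eqs = []
        for r in ROWS:
            for c in COLS:
                p1, p2 = game[(r, c)]
                best_row = all(p1 >= game[(r2, c)][0] for r2 in ROWS)
                best_col = all(p2 >= game[(r, c2)][1] for c2 in COLS)
                if best_row and best_col:
                    eqs.append((r, c))
        return eqs

    def commit_first(game: Game, committed_row: str):
        """Outcome if agent 1 binds itself to committed_row and agent 2 best-responds."""
        c = max(COLS, key=lambda col: game[(committed_row, col)][1])
        return game[(committed_row, c)]

    # The Prisoner's Dilemma of Figure 1 (Defect = T/L, Cooperate = B/R):
    pd = {("T", "L"): (2, 2), ("T", "R"): (4, 1), ("B", "L"): (1, 4), ("B", "R"): (3, 3)}
    print(pure_nash(pd))          # [('T', 'L')]  -> the socially inefficient (2,2)
    print(commit_first(pd, "B"))  # (1, 4): a unilateral commitment to cooperate does not help,
                                  # which is why a bilateral commitment is considered here.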
95
T
L 4,4
R 3,3
T
L 2,4
B
2,2
1,1
B
3,2
R 4,1 1,3
T
L 3,3
R 1,4
T
L 2,4
B
4,1
2,2
B
1,2
(2)
(1)
R 4,1 3,3
(4)
(3)
L
R
L
R
L
R
L
R
2,3
4,1
3,4
2,1
2,4
3,1
3,3
2,4
1,2
3,4
1,2
4,3
1,2
4,3
4,2
1,1
(5)
(6)
(7)
(8)
Figure 2: Representative examples of 2 x 2 games with strictly ordered outcomes Examples (1) and (2) illustrate two situations in which both agents do not want or need to commit to any of the two actions. Example (1) represents a group of 58 games in which at least one of the two agents has a dominant strategy. 4 The other agent optimizes her payoff given the dominant strategy of the first agent, and both agents cannot do better using a commitment for some other strategy. 5 Example (2) represents 4 games in which none of the agents has a dominant strategy and there exists only one (mixed) equilibrium in which the agents randomly choose between the two options. Their expected payoffs lie between 2 and 3. If one agent would commit, she would not obtain more than 2. 6 For examples (1) and (2) it is impossible to formalize a commitment that affects the behavior of the agents. I.e., any commitment the agents want to make leads to the same behavior as they would execute if there was no commitment. Example (3) is the Prisoner's Dilemma game. This is a very special game. In this game, the game-theoretic solution predicts that both agents obtain 2, while they both would prefer to obtain 3. However, this would imply that both agents have to deviate from their dominant strategy. Consequently, the only commitment arrangement that can work in this game is one in which both agents commit to not playing the dominant strategy. No agent wants to commit unilaterally to Top or Left, respectively, because then the other agent certainly plays the dominant strategy leaving the first agent with the worst outcome possible. This can formally be expressed as follows: COMM(l, 2, Top) A COMM(2,1, Left),
(2)
implying that agent 1 commits to playing Top and agent 2 to playing Left, which leads to the goal state (3,3). This bilateral commitment can be seen as a special case of a collective commitment. Example (4) is also a unique game. In this game, agent 1 wants to commit to playing Bottom, which would result in a payoff 3 for both agents. However, agent 2 prefers to play the game without commitment, which leads to a payoff 4 4 An agent has a dominant strategy if there is one action the agent can perform t h a t gives her a higher payoff for each of the actions the other agent can perform. Readers interested in the precise classification of all the games can contact the authors for an overview. 6 A (Nash) equilibrium is (loosely) an outcome in which none of the agents wants to change her action given t h e action of t h e other agent.
for her. This shows that definition (1) is too restrictive to incorporate some kinds of commitments. It requires that the commitment of one agent contributes to a goal of the other agent. This presupposes that both agents have the same goal state. However, example (4) illustrates a situation in which (3,3) is the goal state of agent 1 while (2,4) is the goal state of agent 2. Moreover, without commitment the outcome will be (2,4). Consequently, agent 1 wants to commit to play Bottom. Because this is not the goal state of agent 2, such a commitment does not follow definition (1). However, an alternative definition:

COMM'(i, j, r) =def INT(i, r) ∧ K_j(INT(i, r))
(3)
formalizes a unilateral commitment that does not need to lead to the goal state of agent j. This definition excludes that there has to be an agreement between the agents about whether or not the commitment can be made. 7 Example (5) represents a group of 8 games, in which both agents agree that one agent should commit. Without commitment they both obtain less compared to the situation in which one agent commits. In example (5), agent 1 has to commit to play Bottom. Example (6) represents 3 games, which could also be called "coordination" games. In these games, there are more equilibria, and both agents want to coordinate on one of the equilibria, but without a commitment they do not have a clue about what the other agent will choose. In these games, the agent who commits first is best off, and the other agent is better off than if there would not be a commitment, although she would have preferred to be the one who committed herself. Note that in these games, a two-sided commitment does not work if, for example, agent 1 commits to Bottom and agent 2 commits to Left. The definition (1) is a suitable formalization for a commitment that leads to a socially efficient outcome in example (5) and example (6). However, for example (6), there is a complication because both agents might commit, but they should not commit simultaneously. Therefore, a suitable commitment system should prescribe which agent is allowed to commit. Both agents want to commit because the committed agent receives 4, while the other agent receives 3. The system can be formalized by the convention:

(COMM(1, 2, Bottom) ∨ COMM(2, 1, Left)) ∧ ¬(COMM(1, 2, Bottom) ∧ COMM(2, 1, Left)).
(4)
Example (7) looks very much the same as example (6). The only difference is that agent 1 prefers to play the game without a commitment, rather than that agent 2 commits to playing Left, while this is the best solution for agent 2. On the other hand, both agents prefer to play the game while agent 1 commits to playing Bottom over playing the game without a commitment. There are two 7 For example, a car driver will stop for somebody who s t a r t e d crossing the road, although the car driver would have preferred to continue driving while the other person waited at the sidewalk. In this example, starting t o cross t h e road is the commitment signaling the intention of the pedestrian to go first.
97 games with this s t r u c t u r e . T h i s analysis suggests t h a t C O M M ( l , 2, B o t t o m ) is t h e preferred formalization of a c o m m i t m e n t in this situation. Finally, example (8) is a unique example in which different c o m m i t m e n t systems lead t o three different solutions. If t h e agents can commit unilaterally, agent 1 commits to playing B o t t o m , while agent 2 commits to playing Right. T h e one who commits first o b t a i n s 4, while t h e other who has to follow obtains 2. However, if t h e y can agree on c o m m i t t i n g t o play Top and Right, they b o t h o b t a i n 3, which is still b e t t e r t h a n playing without a c o m m i t m e n t , because t h e expected outcome for b o t h agents is t h e n somewhere between 2 and 3. T h e socially efficient outcome (3,3) can only be reach with a bilateral c o m m i t m e n t , expressed by formula (2). W h a t we learn from this classification of simple 2 x 2 games is t h a t the definition of a social c o m m i t m e n t provided in logical systems leaves too m a n y essential dimensions of a c o m m i t m e n t unspecified. If t h e c o m m i t m e n t has t o be agreed u p o n by t h e n o n - c o m m i t t e d agent, the c o m m i t t e d agent will commit in other situations t h a n if t h e c o m m i t t e d agent can unilaterally commit which is neglected in existing logical formalizations. Therefore, we introduced a n o t h e r t y p e of c o m m i t m e n t using t h e o p e r a t o r C O M M ' , which does not include t h a t t h e intended action of t h e c o m m i t t e d agent contributes to t h e goal of t h e other agent. It might be crucial whether one or b o t h agents have an option to commit t o a move and in which order t h e agents obtain t h e o p p o r t u n i t y t o commit. In game-theoretic t e r m s , these options can be formalized by adding moves t o t h e game t h a t i m p l e m e n t t h e possibilities for t h e agents to commit and, eventually, to accept t h e c o m m i t m e n t of t h e other agent. T h e s e moves m i g h t be specified simultaneously or sequentially. Using game-theoretic reasoning, solutions of these extended games can be calculated, which provides predictions a b o u t whether or not c o m m i t m e n t s will be used and w h a t t h e consequences of these c o m m i t m e n t s are d e p e n d i n g on t h e chosen c o m m i t m e n t system. As a result, insides are o b t a i n e d a b o u t whether a c o m m i t m e n t s y s t e m is socially efficient or favors one of t h e two agents.
References [1] Castelfranchi, C , Commitments: From individual intentions to groups and organizations, in: V. Lesser (ed.), Proceedings First International Conference on Multi-Agent Systems, AAAI-Press and MIT Press, San Francisco, 41-48, 1995. [2] Dunin-Keplicz, B., and R. Verbrugge, Collective commitments, in: M. Tokora (ed.), Proceedings Second International Conference on Multi-Agent Systems, AAAI-Press, San Francisco, 56-63, 1996. [3] Luce, R.D. and H. Raiffa, Games and Decisions, Wiley, New York, 1957. [4] Meyer, J.-J.Ch., W. van der Hoek and B. van Linder, A Logical approach to the dynamics of commitments, Artificial Intelligence 113, 1-40, 1999. [5] Rapoport, A., M.J. Guyer, and D.G. Gordon, The 2x2 Game, University of Michigan Press, Ann Arbor, MA, 1976. [6] Rasmusen, E., Games and Information: An Introduction to Game Theory (2nd), Blackwell, Oxford, 1994.
ASYNCHRONOUS CONSISTENCY MAINTENANCE

MARIUS-CALIN SILAGHI, DJAMILA SAM-HAROUD, AND BOI FALTINGS
EPFL, CH-1015, Switzerland
{Marius.Silaghi, Djamila.Haroud, Boi.Faltings}@epfl.ch

Maintaining local consistency during backtrack search is one of the most powerful techniques for solving centralized constraint satisfaction problems (CSPs). Yet, no work has been reported on such a combination in asynchronous settings. The difficulty in this case is that, in the usual algorithms, the instantiation and consistency enforcement steps must alternate sequentially. When brought to a distributed setting, a similar approach forces the search algorithm to be synchronous in order to benefit from consistency maintenance. Asynchronism 1,2 is highly desirable since it increases parallelism and makes the solving process robust against timing variations. This paper shows how an asynchronous algorithm for maintaining consistency during distributed search can be designed. The proposed algorithm is complete and has polynomial-space complexity. Experimental evaluations show that it brings substantial gains in computational power compared with existing asynchronous algorithms.
1 Introduction
A constraint satisfaction problem (CSP) is defined as a set of variables taking their values in particular domains and subject to constraints that specify consistent value combinations. Distributed constraint satisfaction problems (DisCSPs) arise when the constraints or variables come from a set of independent but communicating agents. The most successful centralized algorithms for solving CSPs combine search with local consistency. The local consistency algorithms prune from the domains of variables the values that are locally inconsistent with the constraints, hence reducing the search effort. When a DisCSP is solved by search using a distributed network of agents, it is desirable that this search exploits asynchronism as much as possible. Asynchronism gives the agents more freedom in the way they can contribute to search. It also increases both parallelism and robustness. In particular, robustness is improved by the fact that the search can still detect unsatisfiability even in the presence of crashed agents. The existing work on asynchronous algorithms for distributed CSPs has focused on one of the following types of asynchronism: a) deciding instantiations of variables by distinct agents. The agents can propose different instantiations asynchronously. b) enforcing consistency. The distributed process of achieving "local" consistency on the global problem is asynchronous (e.g. Distributed Arc Consistency 3 ). We show how these techniques can be combined without losing asynchronism.
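As a point of reference for the definitions above, a compact Python rendering of the problem data is shown here; the class layout is an illustrative assumption rather than part of the algorithms discussed:

    # Illustrative data structures for a (distributed) constraint satisfaction problem.
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Set, Tuple

    @dataclass
    class CSP:
        domains: Dict[str, Set[int]]                                    # variable -> finite domain
        constraints: List[Tuple[Tuple[str, ...], Callable[..., bool]]]  # (scope, predicate)

    @dataclass
    class DisCSP:
        agents: Dict[str, CSP] = field(default_factory=dict)   # each agent owns a local CSP

        def centralized(self) -> CSP:
            """C(P): the combination of the agents' local problems."""
            domains: Dict[str, Set[int]] = {}
            constraints = []
            for csp in self.agents.values():
                for var, dom in csp.domains.items():
                    domains[var] = domains.get(var, set(dom)) & set(dom)
                constraints.extend(csp.constraints)
            return CSP(domains, constraints)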
Figure 1. Distributed search trees: simultaneous views of the distributed search as seen by A2, A3, and A4, respectively. Each arc corresponds to a proposal from A_{j-1} to A_j (levels 0, 1, 2 mark the depth of the proposals).
2 Preliminaries
Asynchronous search. In this paper we target problems with finite domains. We consider that each agent Ai wants to satisfy a local CSP, CSP(Ai). The agents may keep their constraints private but publish their interest in variables. The technique we propose builds on Asynchronous Aggregation Search (AAS), a general complete protocol for solving distributed CSPs with polynomial space requirements 2. AAS is an extension of Asynchronous Backtracking (ABT) and allows for asynchronism of type a. AAS uses a strict order on agents. We assume that Aj has position j, j ≥ 1. If j > k, we say that Aj has a lower priority than Ak; Aj is then a successor of Ak, and Ak a predecessor of Aj.
Asynchronous distributed consistency. The centralized local-consistency algorithms prune from the domains of variables the values that are locally inconsistent with the constraints. Their distributed counterparts (e.g. 3) work by exchanging messages on value eliminations. The restricted domains resulting from such pruning are called labels. In this paper we only consider the local consistency algorithms that work on labels for individual variables (e.g. arc- and bound-consistency). Let P be a distributed CSP with agents Ai, i ∈ {1..n}. We denote by C(P) the CSP defined by ∪_{i∈{1..n}} CSP(Ai). Let A be a centralized local consistency algorithm as just mentioned. We denote by DC(A) a distributed consistency algorithm that computes, by exchanging value eliminations, the same labels for P as A computes for C(P). When DC(A) is run on P, we say that P becomes DC(A) consistent.
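A label store of this kind can be sketched in a few lines of Java (illustrative only; the names below are not the AAS or DMAC API):

import java.util.*;

// A value-elimination message: values that some agent has ruled out for a variable.
final class ValueElimination {
    final String variable;
    final Set<Integer> removedValues;
    ValueElimination(String variable, Set<Integer> removedValues) {
        this.variable = variable;
        this.removedValues = removedValues;
    }
}

// The labels (restricted domains) an agent currently maintains for the variables it watches.
final class LabelStore {
    private final Map<String, Set<Integer>> labels = new HashMap<>();

    void setInitialDomain(String variable, Set<Integer> domain) {
        labels.put(variable, new HashSet<>(domain));
    }

    // Applies a received elimination; returns true if the label actually shrank.
    boolean apply(ValueElimination elimination) {
        Set<Integer> label = labels.get(elimination.variable);
        return label != null && label.removeAll(elimination.removedValues);
    }

    // An empty label signals that the current combination of proposals is inconsistent.
    boolean isEmpty(String variable) {
        Set<Integer> label = labels.get(variable);
        return label == null || label.isEmpty();
    }
}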
3 Asynchronous consistency maintenance
In distributed search, each agent has its own perception of the distributed search tree, determined by the proposals received from its predecessors. Figure 1 shows the simultaneous views of three agents. Only A2 knows the fourth proposal of A1. A3 has not yet received the third proposal of A2, which is consistent with the third proposal of A1; however, A4 knows that proposal of A2. If A4 has not received anything valid from A3, A4 will assume that A3 agrees with A2. The term level in Figure 1
refers to the depth in the (distributed) search tree viewed by an agent. We show that Ai can then benefit from the value eliminations resulting by local consistency from the proposals of subsets of its predecessors, as soon as they become available.
4 The DMAC protocol
This section presents DMAC (Distributed Maintaining Asynchronous Consistency), a complete protocol for maintaining asynchronous consistency, built on AAS.
Definition 1 (Aggregate) An aggregate is a triplet (xj, sj, hj) where xj is a variable, sj a set of values for xj, sj ≠ ∅, and hj a history of the pair (xj, sj). The history guarantees a correct message ordering. Let a1 = (xj, sj, hj) and a2 = (xj, s'j, h'j) be two aggregates for the variable xj. a1 is newer than a2 if hj is more recent than h'j. The ordering of histories is described in full detail in 4. The newest aggregates received by an agent Ai define its view, view(Ai). An aggregate-set is a set of aggregates. Let V be an aggregate-set and vars(Ai) the variables of CSP(Ai). Ti(V) will denote the set of tuples directly disabled from CSP(Ai) by V.
Definition 2 (Nogood entailed by the view) V'→¬Ti(V) is a nogood entailed for Ai by its view V, denoted NVi(V), iff V' ⊆ V and Ti(V') = Ti(V).
Definition 3 (Explicit nogood) An explicit nogood has the form ¬V, or V→fail, where V is an aggregate-set.
The information in the received nogoods that is necessary for completeness can be stored compactly in a polynomial-space structure called conflict list nogood.
Definition 4 (Conflict list nogood) A conflict list nogood, denoted by CL, for Ai has the form V→¬T, where V
announce explicit nogoods. Any received valid explicit nogood is merged into the maintained CL using an inference technique.
4.1 DMAC
In addition to the messages of AAS, the agents in DMAC may exchange information about nogoods inferred by DCs. This is done using propagate messages.
Definition 5 (Consistency nogood) A consistency nogood for a level k and a variable x has the form V→(x∈lx) or V→¬(x∈s\lx). V is an aggregate-set and may contain for x an aggregate (x, s, h), lx ⊆ s. Any aggregate in V must have been proposed by predecessors of Ak+1. lx is a label, lx ≠ ∅.
Each consistency nogood for a variable x and a level k is tagged with the value of a counter Cx at the sender and is sent via propagate messages to all interested agents Ai, i > k. The agents Ai use the most recent proposals of the agents Aj, j
Property 1 ∀i, in finite time t_i either a solution or failure is detected, or all the agents Aj, 0 < j
5 Conclusion
Consistency maintenance is one of the most powerful techniques for solving centralized CSPs. Bringing similar techniques to an asynchronous setting poses the problem of how search can be asynchronous when instantiation and consistency enforcement steps are combined. We present a solution to this problem. A new distributed search protocol which allows for asynchronously maintaining distributed consistency with polynomial space complexity is then proposed.
References
1. M. Yokoo, E. H. Durfee, T. Ishida, and K. Kuwabara. The distributed CSP: Formalization and algorithms. IEEE Trans. on KDE, 10(5):673-685, 1998.
2. M.-C. Silaghi, D. Sam-Haroud, and B. Faltings. Asynchronous search with aggregations. In Proc. of AAAI 2000, pages 917-922, 2000.
3. Y. Zhang and A. K. Mackworth. Parallel and distributed algorithms for finite constraint satisfaction problems. In Proc. of Third IEEE Symposium on Parallel and Distributed Processing, pages 394-397, 1991.
4. M.-C. Silaghi, D. Sam-Haroud, and B. Faltings. ABT with asynchronous reordering. In IAT, 2001.
5. M.-C. Silaghi, D. Sam-Haroud, and B. Faltings. Asynchronous consistency maintenance with reordering. Technical Report #01/360, EPFL, March 2001.
CHAPTER 2 COMPUTATIONAL ARCHITECTURE AND INFRASTRUCTURE
REASONING ABOUT MUTUAL-BELIEF AMONG MULTIPLE COOPERATIVE AGENTS
WENPIN JIAO
Department of Computer Science, University of Victoria, Victoria, BC V8W 3P6, Canada
wpjiao@csr.csc.uvic.ca
Believing mutually is an important premise to ensure that cooperation among multiple agents goes smoothly. However, mutual belief among agents is usually taken for granted. In this paper, we adapt a method based on the position-exchange principle to reason about mutual belief among agents. To reason about mutual belief formally, we first use a process algebra approach, the pi-calculus, to formalize cooperation plans and agents, and then bind the position-exchange principle into the inference rules. By reasoning about mutual belief among agents, we can judge whether cooperation among agents can proceed rationally or not.
1 Introduction
Cooperation among agents is one of the keys to drawing multiple intelligent systems together [6]. Cooperation among multiple agents should meet at least three criteria: 1) agents should respond to each other mutually, 2) all agents should make joint commitments, and 3) each agent should be committed to supporting interactions [1]. That is, every agent participating in cooperation must believe that the other agents are honest and will take actions following a specific cooperation plan, and vice versa. In short, all agents involved in cooperation must believe each other mutually. Generally, after an agent takes an action, it expects to observe a specific result or response from others so that it can conclude whether it can believe others or is believed by others. If every agent participating in cooperation believes that it itself is believed by others and that others are believable as well, we will say that those agents believe each other mutually and the cooperation will proceed smoothly. However, in a distributed system, an agent knows almost nothing about others, so it can only reason about the others' knowledge based on its own knowledge. To achieve that, an agent has to assume that others will think and act in a similar way to itself. In this paper, we adopt a technique using the position-exchange principle to reason about mutual belief between agents. The position-exchange principle means that one puts oneself in others' position and judges others' feelings by one's own. In other words, when one wants to reason about another, he will take the view of the other and think as if he were the other. For example, to reason about another's knowledge, one may say "If I did it, I believe that if he were me he would do it under similar circumstances, too." In a logic system, the position-exchange principle can be described by the following formula.
B_A(α → β) → B_A(B_B(α{B/A} → β{B/A}))
where B_X φ indicates that X believes φ holds, and α{B/A} is a new formula different from α, in which all variables related to A are substituted with variables related to B. It means that if A believes that β will hold under condition α, A will believe as well that B believes the similar conclusion β{B/A} will hold under the similar condition α{B/A}. When we use the position-exchange principle, we need not only to substitute the variables related to agents but also to transform the actions associated with the agents, since one agent does not know how the other acts. However, in a general logic framework, we cannot reason about actions. So we use a process algebra, the pi-calculus, to reason about mutual belief among agents. In the pi-calculus, actions of processes occur in pairs and are mainly used for communicating. Thus, when we use the position-exchange principle, we can reason about the other's belief by substituting both variables and mutually complementary input/output actions. In the following sections, we first give the formal framework in section 2. Then in section 3, we formally describe what an agent, a cooperation plan, and cooperation look like. In section 4, we define inference rules based on the position-exchange principle for reasoning about mutual belief among agents, and then use them to reason about the rationality of specific cooperation among agents. The last section offers some conclusions.
2 The Formal Framework
In this paper, we adopt a process algebra approach, the pi-calculus [5], to formalize agents, plans, and cooperation. In the pi-calculus, there are only two kinds of entities: processes and channels, where processes are the active components of a system and they communicate with each other through ports (or names) connected via channels. The processes in the pi-calculus have the following forms.
P ::= Σ_{i∈I} π_i.P_i | P|Q | !P | (νx)P | [x = y]P
π ::= x(y) | x̄y | τ
where I is a finite set. Σ_{i∈I} π_i.P_i represents executing one of these I processes, and when I = ∅ we write Σ_{i∈I} π_i.P_i as 0, which is inert. x(y) and x̄y represent that the name y will be input or output along channel x, respectively, whereas τ represents a silent action. P|Q represents the parallel composition of the two processes P and Q. !P represents any number of copies of P. (νx)P introduces a new channel x with scope P, where ν is the restriction operator. [x = y]P means that process P will proceed only if x and y are the same channel, where [x = y] is the matching operator. In the pi-calculus, the computation and the evolution of a process are defined by reduction rules. The most important reduction relation is the one for communication:
x̄y.P | x(z).Q → P | Q{y/z}
It means that the process will reduce into the right-hand form after the communication, and meanwhile all free occurrences of z in Q are substituted with y.
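As a concrete illustration (this instance is ours, not an example from the original text), one communication step on channel x instantiates the receiver's continuation:

\[
\bar{x}y.P \;\mid\; x(z).\bar{z}w.Q \;\longrightarrow\; P \;\mid\; \bar{y}w.(Q\{y/z\})
\]

The output \bar{x}y and the input x(z) synchronize on x, and every free occurrence of z in the receiver, including the output prefix \bar{z}w, is replaced by y.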
3 Agents and Their Cooperation
Though an agent is an active entity with pro-activities, it should take actions complying with the global cooperation plan. In this section, we first define cooperation plans formally as pi-calculus processes. We then define agents formally and show how to bind agents together to perform the cooperation plan.
3.1 Cooperation Plan
A cooperation plan is always composed of a series of tasks, among which there are specific relationships that coordinate their execution. In general, a cooperation plan can be viewed as a tree, in which nodes are tasks to be allocated to agents, and relationships among tasks can be mapped to relationships among nodes. A cooperation plan can be defined recursively as follows.
1. The cooperation plan has a hierarchical structure, which is represented as a tree.
2. Any task corresponds to a node within the plan tree. The global task, P, corresponding to the global plan, is the root of the plan tree, and Plan =def P.
3. If a task P consists of a set of sub-tasks P1, P2, ..., Pn, the node corresponding to the task will have as many child nodes as there are sub-tasks, and P =def P1 | P2 | ... | Pn.
4. Among sibling nodes, there are two categories of relations. If there is a unary relation over Pi, or a binary relation over Pi and Pj (1 ≤ i ≠ j ≤ n), then Pi, Pj, and P may need to be redefined.
4.1. Unary relation: Repetition. It means that the corresponding task needs to be performed many times. And Pi =redef !Pi.
4.2. Binary relations. There are four kinds of binary relations between sibling nodes: serialization, synchronization, sequence, and parallel 1.
4.2.1. Serialization. It means that the performing order of the two tasks is not important, but the two tasks cannot be carried out concurrently. And
Pi =redef s_ij . Pi . v̄_ij,   Pj =redef s_ij . Pj . v̄_ij
1 While defining the plan process, we require that serialization relations be considered first, and then synchronization and sequence; otherwise, deadlocks may be brought into the plan process. For example, consider three sub-processes P, Q, R, among which P and Q must be performed serially and R must be carried out before both P and Q. If we do not follow the above convention, we may obtain a plan process in which, if Q communicates with S_pq before P has a chance to do so, a deadlock will occur.
where S_ij =def s̄_ij . v_ij . S_ij is like a PV semaphore controller in operating systems.
4.2.2. Synchronization. Two tasks with a synchronization relation must be performed at the same time. And 2
Pi =redef s̄_ij . Pi,   Pj =redef s_ij . Pj,   and   P =redef (ν s_ij)( ... | Pi | ... | Pj | ... )
4.2.3. Sequence. The performing of the two tasks should be controlled under a restricted order, i.e., one must precede the other. And
Pi =redef Pi . s̄_ij,   Pj =redef s_ij . Pj,   and   P =redef (ν s_ij)( ... | Pi | ... | Pj | ... )
4.2.4. Parallel. The two tasks can be carried out concurrently. In that case, the processes need not be redefined.
5. There are no other kinds of nodes or relations within the plan tree except for those defined above.
For example, in an electronic commerce community, a price negotiation procedure can be planned as the repetition of price bargaining between two parties (figure 1).
Figure 1. The plan tree of a price negotiation procedure. The leaf tasks include asking a price, waiting for an asked price, striking a price, waiting for a stroked price, and asking "Accept the price?" with an Agree/Disagree answer; a dashed arrowhead arc represents the unary repetition relation, and the remaining arc styles mark sequence, serialization, and synchronization relations.
In the plan, the bargaining process, which is divided into the two sub-processes of price asking and price striking, will repeat any number of times until both sides make a deal. The price-asking process is divided further into two sub-processes: one asking a price and then the other waiting for a stroked price. The price-striking process is also divided into two sub-processes: one waiting for a price and then the other striking a price back. Once someone (for instance, the bargaining initiator) thinks the stroked price is acceptable, it can stop bargaining and make a deal.
2 Synchronization relations are symmetric, so we need only consider the cases where i < j. Thus deadlocks can be avoided among synchronized nodes.
The plan shown in figure 1 can be expressed in the pi-calculus as follows.
PriceNegotiationPlan = P0 = (ν s0)( !P1 . s̄0 | s0 . P2 )
P11 = (ν s1)( P111 . s̄1 | s1 . P112 )
P12 = (ν s2)( P121 . s̄2 | s2 . P122 )
When representing a cooperation plan as pi-calculus processes, we add new communicating ports to control the execution of sub-processes so that we can represent relationships within a composite process. Generally, when there are relationships such as serialization, synchronization, and sequence in a system, deadlocks may occur. Fortunately, by using the procedure described above, we obtain a deadlock-free plan process if there is no deadlock in the plan tree.
Proposition 1. If there is no deadlock in the plan tree, the corresponding composition process of the plan will be deadlock free.
The proof is quite simple. As discussed above, we can first eliminate the possibility of a deadlock arising from serialization and synchronization relations. On the other hand, no two synchronized processes can simultaneously have sequence relations with another process, and vice versa. That is to say, sequence relations and synchronization relations cannot bring a cyclic waiting chain into the processes if no cyclic waiting chain occurs in the plan tree. Thus, we can say that the translation described above is deadlock free.
3.2 Agent
In a cooperative environment, an agent must undertake tasks to cooperate with others by complying with a certain cooperation plan. We can define an agent as an entity that includes actions, the tasks it undertakes, and behavior specifications consistent with a specific cooperation plan. To represent the behavior specifications of an agent, we define an expectation function from actions to actions to indicate what kind of response the agent expects to perceive after it takes an action. An agent is a 4-ary tuple
A = ⟨A, T, E, B⟩
where A is an action set, T is a collection of tasks, E is A's expectations, defined as a function E: A → A, and B is A's beliefs. The components of agents can be defined formally on the pi-calculus, in which the action set A is a set of pi-calculus actions, the task set T is a collection of pi-calculus processes, and for any process P ∈ T with P = γ.P', γ ∈ A. Suppose that α, β ∈ A; then E(α) = β means that if the agent A takes action α, it will expect action β to happen. In general, we can say that only when an agent is waiting for something does it expect that thing to appear, so we only define an agent's expectations on its input actions. Then if E(α) = β, α can be either an input or an output, but β must be an input action.
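The 4-tuple can be mirrored by a small data structure; the following Java sketch is ours (the names are illustrative only, not a formalization from the paper):

import java.util.*;

// An agent as the 4-tuple <A, T, E, B>: actions, tasks, expectations, beliefs.
final class AgentTuple {
    final Set<String> actions = new HashSet<>();              // A: pi-calculus action names
    final List<String> tasks = new ArrayList<>();              // T: names of the assigned plan tasks
    final Map<String, String> expectations = new HashMap<>();  // E: action -> expected input action
    final Set<String> beliefs = new HashSet<>();               // B: e.g. "trusts(B)"

    // E(alpha) = beta: after taking alpha, the agent expects to observe the input action beta.
    void expect(String alpha, String beta) { expectations.put(alpha, beta); }

    // True if the observed action is exactly the response expected after alpha.
    boolean isExpectedResponse(String alpha, String observed) {
        return observed.equals(expectations.get(alpha));
    }
}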
For any process P ∈ T, suppose that P has the following form:
P = — . α . β . —
where α is an input/output action and β is an input action. Then E(α) = β. In addition, suppose that the agent is assigned two tasks, P1 and P2, within a cooperation plan, that there is a sequence relation between them, and that
P1 = ( — . α . — ) . s̄12,   P2 = s12 . ( — . β . — )
where α is an input/output action and β is an input action. Then E(α) = β. Since each agent has its own actions, tasks, expectations, and beliefs, A, T, E, and B can be viewed as functions over the domain of agents. In the rest of the paper, we use A(A), T(A), E(A), and B(A) to denote the action set, the task set, the expectations, and the beliefs of A, respectively. In this paper, we only consider beliefs such as whether an agent trusts others, whether the agent is trusted by others, and so on. For convenience, we write x ∈ B(A) as A ▷ x. Suppose there is a set of agents Ag and A, B ∈ Ag; then A ▷ B means A trusts B, whereas A ▷ (B ▷ A) means A believes that B trusts A as well.
3.3 Bind Agents into the Cooperation Plan
The cooperation plan is only a cooperation blueprint, a specification of tasks, which does not provide concrete actions or functions to perform those tasks. After cooperation is planned, the tasks should be assigned to cooperative agents. For example, if we allocate the tasks shown in figure 1 to a seller agent S and a buyer agent B, for instance P0, P1, P11, P111, P112, P2, and P21 to S, and P12, P121, P122, and P22 to B, then agents S and B can be defined as follows.
S = ⟨A, T, E, B⟩ with
A = { ā p, ω(x), ō1 p, o2(y) }
T = { P0, P1, P11, P111, P112, P2, P21 }
P111 = CalculatePriceS(p) . ā p    P112 = ω(x)    P21 = ō1 p . o2(y)
B = ⟨A, T, E, B⟩ with
A = { ω̄ p, a(x), o1(y), ō2 t }
T = { P12, P121, P122, P22 }
P121 = a(x) . CalculatePriceB(p)    P122 = ω̄ p    P22 = o1(y) . ō2 t
Figure 2. Formal definitions of agent S and B
where ā and a represent the actions "asking a price" and "waiting for an asked price" respectively, ω̄ and ω represent "striking a price" and "waiting for a stroked price", and ō1 asks "Accept the price or not?" and then o2 waits for the answer. The functions CalculatePriceS(p) and CalculatePriceB(p) are used to calculate a new asking price and a new striking price, respectively.
For agent S's expectations, they mean that the seller hopes it will receive a response after each round of bargaining and that the buyer will acknowledge any of its questions. For agent B's expectations, the buyer may expect that the bargaining must be initiated by someone else, and after it strikes a price it may hope that the seller asks a new price or makes a deal with it. To assemble cooperative agents into the cooperation plan, we should connect the abstract plan specification with the concrete implementations of the agents' functions. In the pi-calculus, we can use the following method to achieve that. First, we view the tasks occurring in the plan process as pointers and then make those pointers point to the functions provided by agents. For example, suppose that Pi is a task in the plan process and has been assigned to agent A, who will undertake that task by taking action Ta; then we can define the following processes.
Pi = z̄i,   A = zi . Ta
Then we compose the processes defined above into a composition process, that is,
Pi | A = z̄i | zi . Ta
Thus we bind the agent and the plan together. On the other hand, an agent may undertake several tasks, for instance T1, T2, ..., Tk ∈ T(A); then T(A) can be re-defined as a composition of processes:
T(A) = z1 . T1 | z2 . T2 | ... | zk . Tk
Thus, a cooperation system with a cooperation plan, Plan, and a collection of cooperative agents, A1, A2, ..., An, can be defined as follows.
Sys = Plan | T(A1) | T(A2) | ... | T(An)
4 Reason about Mutual-Belief
In this section, we define some inference rules for reasoning about mutual belief among agents. While defining those rules, we build the position-exchange principle into the definitions. We then describe under what conditions agents will believe each other mutually.
4.1 Rules on Beliefs
To define rules on beliefs, we should first know which actions are observable to an agent. To represent that an agent observes an action γ, we use the form A —γ→ A′ with the following meaning:
P ∈ T(A), α1, α2, ..., αn ∈ A(A), P —α1.α2.....αn→ P′, P′ —γ→ P″, γ ∈ A(A)
⟹ A —γ→ A′
Intuitively, if an action is observable in a process, it is also observable to the agent. In general, an agent knows nothing about others. To build beliefs about others, it can only rely on the messages it has sent and received. However, not all messages
it receives are things it is waiting for or expecting. So, in our definitions of rules on beliefs, we include the expectations of agents as premises, and then agents will only believe things that they are expecting. Based on the position-exchange principle, an agent can derive beliefs about itself from messages it receives, and derive beliefs about others from messages it sends.
1. Belief about the honesty of the other. If the agent receives a message that it is expecting, it will believe that the sender agent is trustable.
A —β→ A′, ∃α. (α, β) ∈ E(A), β̄ ∈ A(B)
⟹ A ▷ B    (BR1)
where α can be an input/output action, whereas β must be an input action.
2. Belief about the other's belief. Correspondingly, under the position-exchange principle, A will believe that agent B also trusts it if A responds with a message to B as B requests.
A —β̄→ A′, ∃α. (α, β) ∈ E(B), β ∈ A(B)
⟹ A ▷ (B ▷ A)    (BR2)
While using the position-exchange principle in the above rule, we do not substitute all occurrences of A. Instead, we just replace the action β with its complement β̄, since A may not know how the receiver, B, is evolving.
4.2 Mutual Belief among Agents
Informally, we say two agents have built mutual belief if both of them trust each other and each of them believes that its counterpart also trusts it. Mutual belief can then be defined formally by several groups of beliefs.
1. Both agents believe in their counterpart: A ▷ B and B ▷ A.
2. Each of the two agents believes its counterpart trusts it as well: A ▷ (B ▷ A) and B ▷ (A ▷ B).
For a cooperation plan whose tasks are allocated to cooperative agents, if those agents cannot build mutual belief during cooperation, we say that the cooperation will not proceed smoothly and that it is irrational. In other words, building mutual belief among agents is the least requirement for cooperation.
Definition: At-Least-Rationality of cooperation. If agents can build mutual belief during cooperation, we say that the cooperation is at least rational.
4.3 Reason about Mutual Belief among Agents - an Example
Consider the example shown in figure 1 again; the complete plan and parts of agents S and B are redefined as follows.
Plan = (ν s0)( !( ((ν s1)( z̄1 . s̄1 | s1 . z̄2 )) | ((ν s2)( z̄3 . s̄2 | s2 . z̄4 )) ) . s̄0 | s0 . ((ν s3)( z̄5 . s̄3 | s3 . z̄6 )) )
T(S) = z1 . P111 | z2 . P112 | z5 . P21,   and   T(B) = z3 . P121 | z4 . P122 | z6 . P22
Then the procedure for reasoning about mutual belief between S and B can proceed while the computation between S and B is going on.
1. S calculates an asking price and sends it to B, and then waits for B's response. On the other side, B is waiting for S to ask a new price. If B receives the message from S, i.e., B observes action a(x), then by rule BR1
B —a→ B′, (τ, a) ∈ E(B), ā ∈ A(S), then B ▷ S.
2. Once B receives an asking price, it will calculate a new price for striking and then send it back to S. In that case, by rule BR2
B —ω̄→ B′, (ā, ω) ∈ E(S), ω ∈ A(S), then B ▷ (S ▷ B).
On the other side, for S, by rule BR1
S —ω→ S′, (ā, ω) ∈ E(S), ω̄ ∈ A(B), then S ▷ B.
3. By now, B has come to believe that S is trustable and that it itself is also trustable for S. However, S is not certain whether it is trusted by B or not, though it has trusted B. If the cooperation stopped now, the cooperation would be incomplete since the two agents have not built mutual belief. Nevertheless, according to the cooperation plan, agent S has two choices for its succeeding actions.
3.1. Continue by suggesting another asking price to B. Then by rule BR2
S —ā→ S′, (ω̄, a) ∈ E(B), a ∈ A(B), then S ▷ (B ▷ S).
3.2. Or stop bargaining and make a deal with B. Similarly to 3.1,
S —ō1→ S′, (ω̄, o1) ∈ E(B), o1 ∈ A(B), then S ▷ (B ▷ S).
Now, although the computation between S and B has not finished, mutual belief has been built between them. If we reason further, we can only strengthen the mutual belief. Thus we can say that the cooperation between S and B is rational.
5 Conclusions
In [1], three criteria were given for cooperation among multiple agents. Briefly, to cooperate, all agents must believe each other mutually. However, cooperation schemes in the current literature take mutual belief for granted [2][3][4][6][8], and they always assume that cooperating agents believe each other mutually, which leaves many chances for malicious agents to harm the cooperation. Only when we know that every agent participating in the cooperation believes the others mutually can we say that the cooperation will go through smoothly. In this paper, to reason about mutual belief among agents, we adopt a technique using the position-exchange principle. By using the inference rules based on this principle, we can reason about an agent's beliefs about itself and about others. In [7], a
different inference rule was used to reason about the knowledge of others. That inference rule can be expressed as follows.
B_A B_B(α → β) → (B_A B_B α → B_A B_B β)
Intuitively, this rule says that if A believes that B believes some implication holds, then once A believes that B believes the premise of the implication is satisfied, A will also believe that B believes the conclusion of the implication. That inference rule has several main differences from ours. First, it requires that A must already have beliefs about B. Second, the rule can only be applied in circumstances where all agents have completely common knowledge. However, in a distributed environment, agents are incapable of owning knowledge or beliefs about others in advance, and it is impossible for agents to possess all the knowledge dispersed within the environment, which makes the above rule unsuitable for real distributed systems. Before defining the position-exchange principle in inference rules, we first take a process algebra approach, the pi-calculus, to formalize cooperation plans, and then define an agent as an entity with actions, tasks, expectations, and beliefs. While defining the inference rules for reasoning about mutual belief, we take an agent's expectations into consideration and bind the expectations with its beliefs so that the agent will only believe what it is expecting. Thus, once mutual belief is built among agents, we will be able to say that the cooperation will go on rationally.
References
1. M. E. Bratman. Shared cooperative activity. Philosophical Review, 101:327-341, 1992.
2. Barbara Grosz and Sarit Kraus. Collaborative plans for complex group actions. Artificial Intelligence, 86(2):269-357, 1996.
3. V. R. Lesser. A retrospective view of FA/C distributed problem solving. IEEE Transactions on Systems, Man, and Cybernetics, 21(6), December 1991.
4. H. J. Levesque, P. R. Cohen, and J. H. T. Nunes. On acting together. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), pp. 94-99, Boston, MA, 1990.
5. R. Milner, J. Parrow, and D. Walker. A Calculus of Mobile Processes, Part I, II. Journal of Information and Computation, Vol. 100, 1992, pp. 1-77.
6. Sarit Kraus. Negotiation and cooperation in multi-agent environments. Artificial Intelligence Journal, 94(1-2):79-98, 1997.
7. Shi Zhongzhi, Tian Qijia, and Li Yunfeng. RAO logic for multiagent framework. Chinese Journal of Computer Science and Technology, 14(4), 1999.
8. Michael Wooldridge and Nicholas R. Jennings. Towards a theory of cooperative problem solving. In Proceedings of Modelling Autonomous Agents in a Multi-Agent World (MAAMAW-94), Odense, Denmark, 15-26, 1994.
PORTABLE RESOURCE CONTROL FOR MOBILE MULTI-AGENT SYSTEMS IN JAVA
WALTER BINDER
CoCo Software Engineering, Margaretenstr. 22/9, A-1040 Vienna, Austria
E-mail: w.binder@coco.co.at
JARLE G. HULAAS, ALEX VILLAZON, AND RORY G. VIDAL
University of Geneva, rue General Dufour 24, CH-1211 Geneva, Switzerland
E-mail: {Jarle.Hulaas, Alex.Villazon}@cui.unige.ch, vidalr5@cuimail.unige.ch
Prevention of denial-of-service attacks is indispensable for distributed multi-agent systems to execute securely. To implement the required defense mechanisms, it is necessary to have support for resource control, i.e., accounting and limiting the consumption of resources like CPU, memory, and threads. Java is the predominant implementation language for mobile agent systems, even though resource control is a missing feature on standard Java platforms. Moreover, prevailing approaches to resource control in Java require substantial support from native code libraries, which is a serious disadvantage with respect to portability, since it prevents the deployment of applications on large-scale heterogeneous networks. This article describes the new resource-aware version of the J-SEAL2 mobile agent kernel. The resource control model is based on a set of requirements, where portability is very significant, as well as a natural integration with the existing programming model.
1 Introduction
Java was designed as a general-purpose programming language, with special emphasis on portability in order to enhance the support of distributed applications. Therefore, it is natural that access to low-level, highly machine-dependent mechanisms was not incorporated from the beginning. New classes of applications are however being conceived, which rely on the facilities offered by Java, and which at the same time push and uncover the limits of the language. These novel applications, based on the possibilities introduced by code mobility, open up traditional environments, move arbitrarily from machine to machine, execute concurrently, and compete for resources on devices where everything from modest to plentiful configurations can be found. We are therefore witnessing increased requirements regarding fairness and security, and it becomes indispensable to acquire a better understanding and grasp of low-level issues such as resource management. Operating system kernels provide mechanisms to enforce resource limits
for processes. The scheduler assigns processes to CPUs reflecting process priorities. Furthermore, only the kernel has access to all memory resources. Processes have to allocate memory regions from the kernel, which verifies that the memory limits of the processes are not exceeded. Likewise, a mobile agent kernel must prevent denial-of-service attacks, such as agents allocating all available memory. For this purpose, accounting of resources (e.g., memory, CPU, network, threads, etc.) is crucial. The great value of resource control is that it is not restricted to serving as a basis for implementing security mechanisms. Application service providers may need to guarantee a certain quality of service, or to create the support for usage-based billing. The basic mechanisms described here will be necessary to schedule the quality of service or to support the higher-level accounting system, which will bill the clients for consumed computing resources. This article is organized as follows. The next section presents the design goals and the resulting resource control model. Section 3 compares our approach with related work, whereas section 4 concludes the article.
2 Objectives and Resulting Model
The ultimate objective of this work is to enable the creation of execution platforms where anonymous agents may securely coexist without harming their environment. The desire to deploy secure systems translates into the following requirements:
• Accounting of low-level resources, like CPU and memory, as well as of higher-level resources, such as threads.
• Prevention of denial-of-service attacks which are based on CPU, memory, or communication misuse.
• No dependence on particular hardware or operating system features, in order to enable a portable implementation. Portability and transparency are crucial in heterogeneous environments.
• Minimal overhead for trusted agents, which have no resource limits.
• Support for resource sharing between closely collaborating agents, in order to minimize resource fragmentation.
Since some aspects of resource control are to be manageable by the application developer, it is important that the general model integrates well with the existing programming model of the J-SEAL2 mobile agent system 3.
Figure 1. Illustration of the general resource control model (fully trusted domains need no accounting).
The J-SEAL2 kernel manages a tree hierarchy of nested protection domains. This model of hierarchically organized domains stems from the JavaSeal mobile agent kernel 4. Protection domains encapsulate agents as well as service components. The J-SEAL2 kernel ensures that protection domains are completely isolated from each other. Furthermore, a parent domain may terminate its children at any time, forcing the children to release all allocated resources immediately. A general model for hierarchical resource control fits very well with the hierarchical domain model of J-SEAL2. At system startup the root domain owns by default all resources. Moreover, the root domain, along with the other domains loaded at platform startup, is considered completely safe, and, consequently, no resource accounting is enforced on it. When a nested protection domain is created, the creator donates some part of its own resources to the new domain. Figure 1 illustrates the way resources are either shared or distributed inside a hierarchy. In the formal model of J-SEAL2, the Seal Calculus 6, the parent domain supervises all its subdomains, and interdomain communication management has been the main concern so far. Likewise, in the resource control model proposed here, the parent domain is responsible for resource allocation among its subdomains. Within each untrusted protection domain, the J-SEAL2 kernel accounts for the following resources (for details, see 2):
• CPU_RELATIVE defines the relative share of CPU, and is expressed as a fraction of the parent domain's own relative share. In our current implementation, this resource is controlled by periodic sampling of the amount of executed bytecode instructions.
• MEM_ACTIVE is the highest amount of volatile memory that a protection domain is allowed to use at any given moment.
• THREADS_ACTIVE specifies the maximal number of active threads per protection domain at any moment.
• THREADS_TOTAL limits the number of threads that may be created throughout the lifetime of a protection domain.
• DOMAINS_ACTIVE specifies the maximal number of active subdomains a protection domain is allowed to have at any given moment.
• DOMAINS_TOTAL bounds the number of subdomains that a protection domain may generate throughout its lifetime.
Note that the kernel of J-SEAL2 is not responsible for network control, because network access is provided by different services. These network services, or some mediation layers in the hierarchy, are responsible for network accounting according to application-specific security policies. Let us stress that the network is not a special case, since J-SEAL2 may limit communication with any services, like networking, file I/O, etc.
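The following Java sketch (ours; the class and method names are hypothetical and do not reproduce the actual J-SEAL2 API) illustrates how a parent domain might donate a bounded share of these resources when creating an untrusted child domain:

// Hypothetical resource-limit record mirroring the categories listed above.
final class ResourceLimits {
    double cpuRelative;   // CPU_RELATIVE: fraction of the parent's own relative share
    long   memActive;     // MEM_ACTIVE: peak volatile memory, in bytes
    int    threadsActive; // THREADS_ACTIVE
    int    threadsTotal;  // THREADS_TOTAL
    int    domainsActive; // DOMAINS_ACTIVE
    int    domainsTotal;  // DOMAINS_TOTAL
}

// Hypothetical protection domain: a parent may only donate what it itself owns.
final class DomainSketch {
    private final ResourceLimits limits;
    DomainSketch(ResourceLimits limits) { this.limits = limits; }

    DomainSketch spawnUntrustedChild(ResourceLimits donated) {
        if (donated.cpuRelative > limits.cpuRelative
                || donated.memActive > limits.memActive
                || donated.threadsActive > limits.threadsActive
                || donated.threadsTotal > limits.threadsTotal
                || donated.domainsActive > limits.domainsActive
                || donated.domainsTotal > limits.domainsTotal) {
            throw new IllegalArgumentException("child limits exceed the parent's share");
        }
        return new DomainSketch(donated);
    }
}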
3 Related Work
Our current implementation, which is based on Java bytecode transformations (for details see 2), has been inspired by JRes 5, a resource control library for Java that takes CPU, memory, and network resource consumption into account. The resource management model of JRes works at the level of individual Java threads; there is no notion of an application as a group of threads, and the implementation of resource control policies is therefore cumbersome. JRes is a pure resource accounting system and does not enforce any separation of domains. For its implementation, JRes relies on native code libraries for network and CPU accounting. Therefore, JRes does not meet our requirement of full portability. KaffeOS 1 is a Java runtime system that allows applications to be isolated from each other, as if they were run on their own Java Virtual Machine. Thanks to KaffeOS it is possible to achieve resource control with a higher precision
than what is possible with bytecode rewriting techniques, where e.g. memory accounting is limited to controlling the respective amounts consumed in the common heap, and where CPU control does not account for time spent by the common garbage collector working for the respective applications. The KaffeOS approach should by design result in better performance, but it is inherently non-portable.
4 Conclusion
Whereas other approaches to resource control in Java demonstrate a long-term, deep re-design of the Java runtime system, our proposal might be grossly characterized as a language-based patch. J-SEAL2 isolates agents from each other, and particularly prevents denial-of-service attacks originating from inside the execution platform. Moreover, the complete compatibility and portability of our approach makes it immediately usable for the benefit of distributed multi-agent systems, especially when mobile code is involved.
References
1. G. Back, W. Hsieh, and J. Lepreau. Processes in KaffeOS: Isolation, resource management, and sharing in Java. In Proceedings of the Fourth Symposium on Operating Systems Design and Implementation (OSDI 2000), San Diego, CA, USA, October 2000.
2. W. Binder, J. Hulaas, and A. Villazon. Resource control in J-SEAL2. Technical Report Cahier du CUI No. 124, University of Geneva, October 2000. ftp://cui.unige.ch/pub/tios/papers/TR-124-2000.pdf.
3. W. Binder. Design and implementation of the J-SEAL2 mobile agent kernel. In The 2001 Symposium on Applications and the Internet (SAINT-2001), San Diego, CA, USA, January 2001.
4. C. Bryce and J. Vitek. The JavaSeal mobile agent kernel. In First International Symposium on Agent Systems and Applications (ASA'99)/Third International Symposium on Mobile Agents (MA'99), Palm Springs, CA, USA, October 1999.
5. G. Czajkowski and T. von Eicken. JRes: A resource accounting interface for Java. In Proceedings of the 13th Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA-98), volume 33, 10 of ACM SIGPLAN Notices, New York, October 1998.
6. J. Vitek and G. Castagna. Seal: A framework for secure mobile computations. In Internet Programming Languages, 1999.
AN AGENT-BASED MOBILE E-COMMERCE SERVICE PLATFORM FOR FORESTRY AND AGRICULTURE 1
MATTHIAS KLUSCH AND ANDREAS GERBER
German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, 66123 Saarbrucken, Germany
E-mail: (klusch, [email protected]
The range of applications developed in the domain of agriculture and forestry covers restricted types of market places as well as information systems. However, the innovative integration of the Internet, agent technology, and mobile telecommunication for integrated commerce, supporting business processes in these domains, is still in its early stages. We present the first approach to a holonic agent-based information and trading network (CASA ITN) for dynamic production and sales, in which integrated services for logistics and e-commerce are provided. This paper introduces the agent-based architecture and describes the added-value services of the CASA ITN for mobile timber sales.
1 Introduction
Electronic commerce (e-commerce) is a general name for business transactions that are entered into through electronic rather than paper-based means. E-commerce has the capacity to change the way the entire world does business, because it enables people to buy and sell goods and services from anywhere in the world. Especially in the agricultural and forestry domains there is a great demand to announce offers and information about goods to a large audience and to negotiate quickly before perishable goods lose quality. In the project CASA 2 at DFKI we are developing agent-mediated services for the CASA ITN to support the main operative business processes users perform in each of the following application scenarios: (1) customer-oriented, dynamic timber production, (2) mobile trading of timber using different types of auctions and fixed or negotiable prices, and (3) electronic trading of cereals. The approach taken for providing information and trading services in the CASA ITN focuses on the effective integration of the production, logistics and trading processes of these scenarios. It is motivated by the paradigm of integrated commerce (i-commerce), which can be seen as an operational extension of traditional e-commerce. The basic ideas of i-commerce are (a) to get customers more involved
1 This research is sponsored by the Ministry of Economics of the Saarland, Germany, under the grant 032000.
2 Abbrev.: Cooperative Agents and Integrated Services for Logistic and Electronic Trading in Forestry and Agriculture
in the activities related to his/her orders and tasks, and (b) to get the related processes in the supply chain more integrated in practice. The agent-based CASA services for i-commerce can easily be accessed from anywhere by using a PC or mobile WAP 1.1-enabled devices such as smart phones or PDAs. Efficient coordination of services is performed by appropriate types of collaborating software agents. The WAP application services are currently implemented using the T-D1 WAP gateway of Deutsche Telekom.
2 CASA Agents and Services
2.1 Holonic Agent System of the CASA ITN
We differentiate between the following groups of participants in the CASA ITN: producers offering goods; buyers purchasing goods; retailers acting on their own or on behalf of companies; and logistics companies responsible for transportation tasks, storage and resource management. Each member of these groups is represented by a special so-called holonic agent (cf. Figure 1). The concept of holonic agents [1,5] is used to effectively accomplish complex, mostly hierarchically decomposed tasks and resource allocations in the selected application scenarios. A holonic agent (or holon) co-ordinates and controls the activities and information flow of its subagents. In a holonic multi-agent system, autonomous agents may join others to form, reconfigure, or leave a holon. A human user in the ITN is represented by a special holonic agent called a personal assistant. It pro-actively acts on behalf of its user even if (s)he is off-line; the personal assistant is the coordinating head of a set of other specialized agents for individual negotiation, participation in auctions, finding relevant partners and information, and elaboration of optimal trading strategies over time. Each corporation is represented by a special holonic agent system according to its task-oriented subdivision into departments for information management, logistics, and production planning. In this context we presume that (1) information management services provide information either on certain products or on the current market situation and potential competitors, (2) logistics services support the co-ordination of machines for production and transportation, human resources, and storage capacities, and (3) production planning services support short-, middle-, and long-term product planning cycles. A corporation holon is constituted by other holonic agents, each of them representing a special department. Since in the CASA ITN the roles of buyer/retailer and seller/producer may be used interchangeably, both are modeled by similar holonic agent structures. In addition, logistics companies are usually contracted by other corporations for the purpose of time- and cost-saving delivery of goods on demand.
Figure 1. Overview of the holonic CASA agent system
Finally, we developed agent-based services for a distributed virtual market place to enable different kinds of trading between the participants, such as multiple online auctions and sales at fixed or negotiable prices in simultaneous bilateral negotiations.
2.2 Agent-Based Services of the CASA ITN
The CASA agent society co-ordinates and provides the following classes of services to its users.
• Auction mechanisms [4] including Dutch, English, and Vickrey auctions.
• Integrated services for dynamic pricing via information on transportation costs and constraints during the bidding processes of the user.
• Logistics services [2] providing dynamic, approximately optimal (re-)scheduling and (re-)planning of transportation.
• Information management [7]. Agents gather relevant information on behalf of their users in different trading and production settings.
• Mobile services to let the users access most services of the CASA ITN also on WAP-enabled mobile devices.
2.3 Application Scenarios
In brief, the application scenarios of the CASA ITN for its users are as follows.
• Customer-oriented dynamic timber production: Foresters and timber harvesters cooperate, with pro-active support from the service-providing agents of the CASA ITN, to satisfy an individual customer's order to deliver a certain quantity and quality of timber at a given time. The processing of an order can be influenced by many side effects such as changes in weather, uncompleted order parts, stoppage of harvesting machines, or shortage of human resources. Therefore the approximately optimal, dynamic (re-)planning and coordination of services for harvesting, processing, and transportation has to be performed just-in-time and is additionally supported by mobile WAP-enabled devices.
• Mobile timber sales: CASA ITN members may set up and participate in one or multiple different timber auctions via the Internet or WAP-enabled mobile devices.
• E-trading of cereals: Similar to the mobile timber sales scenario, registered users may trade grains via auctions or multi-lateral negotiations.
The first two application scenarios have been implemented using the FIPA-OS 2.0 agent system platform and Java; for reasons of space limitations we briefly describe the mobile timber sales scenario in the following sections.
2.4 Mobile Timber Sales: Services, Interactions, and Agents
In this special scenario each forester may sell timber via different kinds of auctions, or via fixed or negotiable sales offers, to other registered users of the CASA ITN. The main benefits of the agent-based service support are the concurrent monitoring and computation of optimal transport costs per individual bid or buying offer, and the full mobile service support of the user via WAP-enabled mobile devices.
2.4.1 Services and Interactions
In general, the mobile timber sales services of the CASA ITN enable registered users to initiate or participate in one or multiple timber auctions. The members can also sell or buy timber at fixed or negotiable prices. In the first case, the CASA ITN offers auction types such as Dutch, English, Vickrey, and First-Price-Sealed-Bid. The auction server has been built upon a general holonic coordination server [3]. Any user may invoke integrated services for decision support during participation in auctions. For example, a personal CASA agent may concurrently determine the optimal transportation costs and delivery dates of some auction goods for each individual bid of its user. As a result, the agent may notify its user in real time if the estimated optimal transport costs exceed the limit allowed by the given buying preferences or if some deadlines are at risk of being exceeded. In addition, each of the information and trading services is available on mobile WAP 1.1-enabled
devices and PCs connected to the Internet. Synchronization is co-ordinated by appropriate CASA agents (cf. fig. 1) [6]. These are holonic agents for users acting as buyers or sellers/auctioneers, and for shipping companies. Buyers without logistics capabilities have to contract carriers appropriately. Participation in any trading process can be delegated to a personal user agent, which is then in charge of negotiating or bidding at an auction and of notifying its user, e.g., via SMS or email.
3 Related Work
There are just a few market places known which resemble the CASA system. Agriflow [8], for example, is putting Europe's arable industry on the fast track to e-business with a series of dynamic products, including Cigrex, an online co-operative independent grain exchange, and Agrivox, an information service. The Virtual Agricultural Market (VAM) [9] system has been built for B2B transactions in agricultural markets. It offers mechanisms for trading and activities for the distribution of products; VAM provides a set of generic functionality in a stakeholder-independent and interoperable way. However, these systems differ significantly from CASA in their architecture and in the provision of the added value implied by the dynamic integration of logistics and information in mobile timber sales and production.
References
1. Bürckert, H.-J., Fischer, K., and Vierke, G., Transportation Scheduling with Holonic MAS — The TeleTruck Approach. Proc. 3rd Intl. Conference on Practical Applications of Intelligent Agents and Multiagents PAAM'98, (1998).
2. Bürckert, H.-J., Fischer, K., and Vierke, G., Holonic Transport Scheduling With TELETRUCK. Applied Artificial Intelligence, 14, (2000), pp. 697-725.
3. Gerber, A. and Ruß, C., A Holonic Multi-agent Co-ordination Server. In Proc. 14th Intl. FLAIRS Conference, 2001, pp. 200-204, ISBN 0-1-57735-133-9.
4. Gerber, A., Klusch, M., Ruß, C., and Zinnikus, I., Holonic Agents for the Coordination of Supply Webs. Proc. Intl. Conf. on Autonomous Agents, (2001).
5. Gerber, C., Siekmann, J., and Vierke, G., Flexible Autonomy in Holonic Agent Systems. Proc. AAAI Spring Symposium on Agents with Adjustable Autonomy, (1999).
6. Gerber, C., Siekmann, J., and Vierke, G., Holonic Multi-Agent Systems. DFKI Research Report RR-99-03, (1999), ISSN 0946-008X.
7. Klusch, M., Information Agent Technology for the Internet: A Survey. Data and Knowledge Engineering, 36, 1&2 (2001), pp. 337-372.
8. Agriflow: www.agriflow.com
9. C.I. Costopoulou, M.A. Lambrou, An architecture of Virtual Agricultural Market systems: Information services and use, Vol. 20 (1), (2000), ISSN 0167-5265, pp. 39-48.
An Itinerary Scripting Language for Mobile Agents in Enterprise Applications a
Seng Wai Loke
School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne VIC 3001, Australia
swloke@cs.rmit.edu.au
Arkady Zaslavsky, Brian Yap, Joseph Fonseka
School of Computer Science and Software Engineering, Monash University, Caulfield VIC 3145, Australia
Arkady.Zaslavsky@monash.edu.au, brianll@hotmail.com, ruki@mbox.com.au
We view an agent's itinerary describing which tasks are to be performed when and at which location (e.g. which host) as a script glueing the tasks of the agent together in a (possibly) complex way. We present the ITAG (ITinerary AGent) scripting language, which is based on the notion of the itinerary. We also discuss the enterprise-wide infrastructure needed for executing ITAG scripts, and illustrate our approach with examples of scripts for voting and distributed authoring.
1 Introduction
This paper introduces a scripting language approach to developing mobile agent applications. In the scripting approach,2 a scripting language is used to glue components together to assemble an application rather than programming an application from scratch. Our scripting language is based on the concept of the agent itinerary. An agent's itinerary describes which actions (or tasks) are to be performed when and at which location (e.g. which host), i.e. an agent's itinerary glues the actions of the agent in a (possibly) complex way while each action at a location might involve complex algorithms and data structures. A scripting language should closely match the nature of the problem in order to minimize the linguistic distance between the specification of the problem and the implementation of the solution, thereby resulting in cost reductions and greater programmer productivity.3 Our itinerary scripting language provides a higher level of abstraction, and economy of expression for mobility behaviour: the programmer expresses behaviour such as "move agent A to place p and perform action a" in a simple direct succinct manner without the clutter of the syntax of a full programming language. a
The work reported in this paper has been funded in part by the Co-operative Research Centre Program through the Department of Industry, Science & Tourism of the Commonwealth Government of Australia.
In the following section, we first present our itinerary scripting language, and in §3, present an example of a distributed authoring application scripted in our language. We conclude in §4.
2 ITAG: The Itinerary Scripting Language
We previously created an itinerary algebra. 1 ITAG is an executable implementation of this algebra in the form of a scripting language. We first outline the algebra below. We assume an object-oriented model of agents (e.g., with Java in mind), where an agent is an instance of a class given roughly by:
mobile agent = state + action + mobility
We assume that agents have the capability of cloning, that is, creating copies of themselves with the same state and code. Also, agents can communicate to synchronize their movements, and the agent's code is runnable in each place it visits. Let A, O and P be finite sets of agent, action and place symbols, respectively. Itineraries (denoted by I) are formed as follows, representing the null activity, atomic activity, parallel, sequential, nondeterministic, and conditional nondeterministic behaviour:
I ::= 0 | A_p^a | ( I ||_⊕ I ) | ( I · I ) | ( I | I ) | ( I :_Π I )
where A ∈ A, a ∈ O, p ∈ P, ⊕ is an operator which, after a parallel operation causing cloning, recombines an agent with its clone to form one agent, and Π is an operator which returns a boolean value to model conditional behaviour. We specify how ⊕ and Π are used, but we assume that their definitions are application-specific. We assume that all agents in an itinerary have a starting place (which we call the agent's home) denoted by h ∈ P. Given an itinerary I, we shall use agents(I) to refer to the agents mentioned in I.
Agent Movement (A_p^a). A_p^a means "move agent A to place p and perform action a". This expression is the smallest-granularity mobility abstraction. It involves one agent, one move and one action at the destination.
Parallel Composition ("||"). Two expressions composed by "||" are executed in parallel. For instance, (A_p^a || B_q^b) means that agents A and B are executed concurrently. Parallelism may imply cloning of agents. For instance, to execute the expression (A_p^a || A_q^b), where p ≠ q, cloning is needed since agent A has to perform actions at both p and q in parallel. When cloning has occurred, decloning is needed, i.e. clones are combined using an associated application-specific operator (denoted by ⊕ as mentioned earlier).
Sequential Composition ("·"). Two expressions composed by the operator "·"
are executed sequentially. For example, (A^a_p • A^b_q) means move agent A to place p to perform action a and then to place q to perform action b.

Independent Nondeterminism ("|"). An itinerary of the form (I | J) is used to express nondeterministic choice: "I don't care which but perform one of I or J". If agents(I) ∩ agents(J) ≠ ∅, no clones are assumed, i.e. I and J are treated independently. It is an implementation decision whether to perform both actions concurrently, terminating when either one succeeds (which might involve cloning, but clones are destroyed once a result is obtained), or trying one at a time (in which case order may matter).

Conditional Nondeterminism (":"). Independent nondeterminism does not specify any dependencies between its alternatives. We introduce conditional nondeterminism, which is similar to short-circuit evaluation of boolean expressions in programming languages such as C. An itinerary of the form I :_Π J means first perform I, and then evaluate Π on the state of the agents. If Π evaluates to true, then the itinerary is completed. If Π evaluates to false, the itinerary J is performed (i.e., in effect, we perform I • J). The semantics of conditional nondeterminism depends on some given Π.

We give an example using agents to vote. An agent V, starting from home, carries a list of candidates from host to host visiting each voting party. Once each party has voted, the agent goes home to tabulate results (assuming that home provides the resources and details about how to tabulate), and then announces the results to all voters in parallel (cloning itself as it does so). Assuming four voters (at places p, q, r, and s), vote is an action accepting a vote (e.g., by displaying a graphical user interface), tabulate is the action of tabulating results, and announce is the action of displaying results, the mobility behaviour is as follows:

V^vote_p • V^vote_q • V^vote_r • V^vote_s • V^tabulate_h • (V^announce_p || V^announce_q || V^announce_r || V^announce_s)
Implementation. To allow the programmer to type itinerary expressions into the computer, we provide an ASCII syntax and a Controlled English version. The translations are given in Table 1. When the operators are used without op, we assume a pre-specified system default one, i.e. using op is an optional clause. A^a_p • A^b_q • A^c_r can be described as follows: "(move A to p do a) then (move A to q do b) then (move A to r do c)." Apart from the above basic elements of the language, we define the following five phrases that map down to more complex expressions:
1. A^a_h is translated as return A do a.
2. A^a_p • A^a_q • A^a_r • A^a_s is translated as tour A to p,q,r,s in series do a.
3. A^a_p || A^a_q || A^a_r || A^a_s is translated as tour A to p,q,r,s in parallel do a.
4. A^a_p | A^a_q | A^a_r | A^a_s is translated as tour A to one of p,q,r,s do a.
Symbol      ASCII       Controlled English
A^a_p       [A,p,a]     move A to p do a
•                       then
:_Π         :{op}       otherwise using op
|                       or
||_⊕        #{op}       in parallel with using op

Table 1: Translations.
5. A^a_p : A^a_q : A^a_r : A^a_s is translated as tour A if needed to p,q,r,s do a. Similarly, we also have A^a_p :_Π A^a_q :_Π A^a_r :_Π A^a_s translated as tour A if needed to p,q,r,s do a using Π.

Using the phrases, the voting itinerary can be described succinctly as follows: (tour V to p,q,r,s in series do vote) then (return V do tabulate) then (tour V to p,q,r,s in parallel do announce).

Our current implementation is in the Java programming language and is built on top of the Grasshopper mobile agent toolkit (see http://www.grasshopper.de). In our current implementation, the user first types itinerary scripts into an applet (running in a Web browser). Then, the itinerary script is parsed into a binary tree representation and executed by an interpreter. Execution is as follows: the interpreter translates the actions specified in the script into commands which are then forwarded to Grasshopper agents which are initially at a place (the home). These agents, on receiving the commands, are then launched into the network of places to do their work.
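For illustration, the sketch below mirrors the parse-then-interpret scheme just described: an itinerary is held as an expression tree and a tiny interpreter walks it, issuing one command per atomic movement. The class names (Itinerary, Move, Seq, Par) and the use of plain Java threads are our own simplifying assumptions; the actual implementation forwards commands to Grasshopper agents rather than printing them, and cloning/decloning via ⊕ is elided here.

// Minimal sketch of an itinerary expression tree and interpreter (hypothetical names).
interface Itinerary { void execute(); }

class Move implements Itinerary {                 // A^a_p: move agent A to place p, do action a
    final String agent, place, action;
    Move(String agent, String place, String action) {
        this.agent = agent; this.place = place; this.action = action;
    }
    public void execute() {
        // In the real system this would become a command sent to a Grasshopper agent.
        System.out.println("move " + agent + " to " + place + " do " + action);
    }
}

class Seq implements Itinerary {                  // I . J : sequential composition
    final Itinerary left, right;
    Seq(Itinerary left, Itinerary right) { this.left = left; this.right = right; }
    public void execute() { left.execute(); right.execute(); }
}

class Par implements Itinerary {                  // I || J : parallel composition (decloning omitted)
    final Itinerary left, right;
    Par(Itinerary left, Itinerary right) { this.left = left; this.right = right; }
    public void execute() {
        Thread t = new Thread(right::execute);    // naive parallelism stands in for cloning
        t.start();
        left.execute();
        try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}

public class VotingItinerary {
    public static void main(String[] args) {
        // (tour V to p,q,r,s in series do vote) then (return V do tabulate)
        Itinerary tour = new Seq(new Move("V", "p", "vote"),
                         new Seq(new Move("V", "q", "vote"),
                         new Seq(new Move("V", "r", "vote"),
                         new Seq(new Move("V", "s", "vote"), new Move("V", "h", "tabulate")))));
        // then (tour V to p,q,r,s in parallel do announce)
        Itinerary announce = new Par(new Move("V", "p", "announce"),
                             new Par(new Move("V", "q", "announce"),
                             new Par(new Move("V", "r", "announce"), new Move("V", "s", "announce"))));
        new Seq(tour, announce).execute();
    }
}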
3 An Example: Distributed Authoring
We aim mainly for lightweight applications (e.g., ad hoc workflows), lightweight in the sense that they can be quickly scripted as long as the required action code can be downloaded from a code server. Here, we consider an example adapted from Tripathi et al.4 concerning coordinating the activities of a distributed authoring system involving the author, an editor and two reviewers. In this collaboration among the four parties, the agent transfers the required information (e.g., the document draft, reviews, etc.) and the itinerary represents the order in which actions are to be accomplished. For example, in a typical scenario, the author first publishes the document to the editor, the editor then sends the document to the reviewers, after which the reviewers forward reviews to the editor, and finally, the editor adds further comments and sends all the information to the author. Assuming agent A is launched by the author, places abbreviated as editor, author (the place from which the agent is launched), reviewer1, and reviewer2, and actions submit, review, finalize and notify, the following script can be written to enact this collaboration:

(move A to editor do submit) then ((move A to reviewer1 do review) in parallel with (move A to reviewer2 do review)) then (move A to editor do finalize) then (move A to author do notify)

Note that data (including the draft document, the reviews, and the editor's comments) are carried with the agent.
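Expressed in the algebra of §2 (and under the abbreviations above), this Controlled English script corresponds to the itinerary A^submit_editor • (A^review_reviewer1 ||_⊕ A^review_reviewer2) • A^finalize_editor • A^notify_author; how the application-specific ⊕ operator recombines the two review-carrying clones (for example, by merging the reviews they hold) is left to the application and is only an assumption here.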
4 Conclusions and Future Work
We contend that a scripting approach is well-suited for developing mobile agent applications and have presented ITAG, based on the notion of the agent itinerary. Autonomy and flexibility are important aspects of intelligent agents. ITAG accommodates agents with a degree of autonomy and flexibility in performing tasks via the nondeterminism and conditional nondeterminism operators.

References
1. S.W. Loke, H. Schmidt, and A. Zaslavsky. Programming the Mobility Behaviour of Agents by Composing Itineraries. In P.S. Thiagarajan and R. Yap, editors, Proceedings of the 5th Asian Computing Science Conference (ASIAN'99), volume 1742 of Lecture Notes in Computer Science, pages 214-226, Phuket, Thailand, December 1999. Springer-Verlag.
2. J.K. Ousterhout. Scripting: Higher Level Programming for the 21st Century. IEEE Computer, March 1998.
INTELLIGENT AGENTS FOR MOBILE COMMERCE SERVICES
MIHHAIL MATSKIN
Department of Computer and Information Science, Norwegian University of Science and Technology, N-7491 Trondheim, Norway
E-mail: misha@idi.ntnu.no

We consider the application of intelligent agents in mobile commerce services. The basic idea of the approach is to provide customers of mobile devices and service providers with personal intelligent agents representing their interests in the Internet, and to use a multi-agent system approach for coordination, communication and negotiation between the agents. We demonstrate how such agents and services can be implemented in the Agora environment that we developed earlier. Some properties of the developed prototype mobile commerce services are briefly discussed.
1 Introduction
The development of mobile communication technology in recent years opens new perspectives for providing services to the users of mobile devices such as cellular phones or PDAs. An essential feature of mobile services is that the user of a mobile device can be available for services almost anytime and anywhere. This allows high reactivity of user responses and decisions. At the same time, the development of technologies such as WAP [2,5] allows the users of mobile devices to get access to the Internet, which was before a privilege of PC users only. In particular, this means that the users of mobile devices get access to web-based technologies and computing network resources outside of telecom networks. However, while opening access to Internet resources, mobile communication technology puts quite serious restrictions on such communication. The basic restrictions are related to low bandwidth, high cost of communication, slow CPU, small memory, restricted power supply, small screen and complicated input for mobile devices. In order to relax such restrictions we think that the precision and focus of information delivered to the mobile devices should be very high [4]. In particular, this means that: 1) the amount of delivered information should be as minimal as possible but sufficient enough to be interesting to the user, 2) user input should be minimized as much as possible, 3) connection time of mobile devices to the network during processing of the user request should be shortened. In order to achieve such precision and focus, most of the work for information analysis and processing should be done off-line, and the analysis and processing should be personalized as much as possible: they should take into account user preferences and interests as well as the context of communications (geographical position, time, etc.). We think that usage of intelligent agents [1] and agent technology is a constructive approach to intelligent and personalized off-line processing. In
particular, this assumes providing the participants in the commercial activity (mobile device customers and service providers) with software assistant agents. Some details of this approach are presented in [4]. Here we demonstrate how the approach can be applied to the support of particular mobile commerce services. As a tool for implementing the approach we use the Agora environment for support of multi-agent cooperative work [3]. For communication with mobile devices we use WAP technology [2,5] and SMS messages. The rest of the paper is organized as follows. First we give a brief introduction to the Agora environment and present solutions for mobile services using the Agora-based approach. Then we consider some details of implemented prototype services. Finally, we present conclusions and future work.
2 The Agora system and mobile commerce services
In order to support agent creation and multi-agent cooperative work we use the Agora system, which we developed earlier [3]. The basic idea behind this system is to consider cooperative work as a set of cooperative acts, which include coordination, negotiation and communication, and to provide means for supporting such cooperative acts. In order to get such support we propose the concept of a cooperative node (we call it an Agora). The Agora node allows registration of agents and provides means for support of cooperative activity such as matchmaking, coordination and negotiation between the registered agents. If we apply the Agora concept to mobile commerce services then we first need to identify the participants of the cooperative work and the possible cooperative acts between them. In our case the participants are customers and service providers, and we assume the following basic cooperative acts between participants: 1) buying/selling products/services by customers and providers, 2) product/service information exchange between different customers, 3) customer coalition formation for co-shopping, 4) provider coalition formation for common policy development, 5) coordination between different agents of the same customer, 6) subscription service management. Our next step is to map participants into agents and cooperative acts into corresponding Agoras. For example, this can be done as shown in Figure 1 (in this figure rectangles denote agents, diamonds denote Agoras and arrows show registration of agents at Agoras). Each agent in the Agora system has a planner, a knowledge base, a communication block and a goal analyzer. By default, the knowledge base and planner use a Prolog-like notation for knowledge representation. However, all agent components can be overridden when necessary. An important feature of such an implementation is the encapsulation of private data in agents and the ability to get service without disclosing personal preferences to providers. The planner, knowledge base and ability to handle events by the goal analyzer provide a basis for the implementation of pro-activity.
Figure 1. Customers, providers and Agoras
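As a rough illustration of the cooperative-node idea described above, the sketch below shows an Agora at which agents register under roles such as customer or provider, so that cooperative acts like matchmaking and negotiation can operate over the registered agents. The class and method names are our own and do not reflect the actual Agora API; registration in the real system carries richer agent descriptions than bare identifiers.

// Hypothetical sketch of an Agora cooperative node: agents register and can be looked up by role.
import java.util.*;

public class AgoraNode {
    private final Map<String, Set<String>> registrations = new HashMap<>(); // role -> agent ids

    // An agent registers at the Agora under a role, e.g. "customer" or "provider".
    public synchronized void register(String agentId, String role) {
        registrations.computeIfAbsent(role, r -> new HashSet<>()).add(agentId);
    }

    public synchronized void deregister(String agentId, String role) {
        Set<String> agents = registrations.get(role);
        if (agents != null) agents.remove(agentId);
    }

    // Cooperative acts such as matchmaking or negotiation operate over the registered agents.
    public synchronized Set<String> registered(String role) {
        return new HashSet<>(registrations.getOrDefault(role, Collections.emptySet()));
    }

    public static void main(String[] args) {
        AgoraNode subscription = new AgoraNode();
        subscription.register("customer-1", "customer");
        subscription.register("provider-A", "provider");
        System.out.println("Providers: " + subscription.registered("provider"));
    }
}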
Ability to communicate is based on communication adapters and message wrappers in the Agora system. Both KQML and FIPA are supported. This is done by implementing an intermediate representation level (wrappers) which allows translation of constructions from both languages. Usage of wrappers also allows defining our own syntax and parameter types for communicative acts. In particular, we use that for plan and action file exchange between agents. Different ontologies can be described, and their combination with performatives uniquely defines the communicative act.

Figure 2. Subscription service Agora (the figure shows the Agora's Communication Adapter and information about registered agents, together with its Manager, Negotiator, Matchmaker, Registrator, Events handler, History browser, Coordinator, and Customer and Provider notificator components)
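The intermediate representation (wrapper) level mentioned above can be illustrated with a small sketch in which both KQML and FIPA ACL messages are mapped to and from one internal message form. The class names and the performative correspondences shown (e.g., KQML tell to FIPA inform) are simplified illustrations of the idea, not the Agora system's actual wrapper code or a complete mapping.

// Rough sketch of an intermediate message representation shared by KQML and FIPA ACL adapters.
import java.util.*;

class WrappedMessage {
    final String act;                              // internal communicative act
    final String sender, receiver, content, ontology;
    WrappedMessage(String act, String sender, String receiver, String content, String ontology) {
        this.act = act; this.sender = sender; this.receiver = receiver;
        this.content = content; this.ontology = ontology;
    }
}

public class Wrappers {
    private static final Map<String, String> FROM_KQML = new HashMap<>();
    private static final Map<String, String> TO_FIPA = new HashMap<>();
    static {
        FROM_KQML.put("tell", "inform");           // simplified correspondences, for illustration only
        FROM_KQML.put("ask-one", "query");
        TO_FIPA.put("inform", "inform");
        TO_FIPA.put("query", "query-ref");
    }

    static WrappedMessage fromKqml(String performative, String sender, String receiver,
                                   String content, String ontology) {
        return new WrappedMessage(FROM_KQML.getOrDefault(performative, performative),
                                  sender, receiver, content, ontology);
    }

    static String toFipa(WrappedMessage m) {
        return "(" + TO_FIPA.getOrDefault(m.act, m.act)
             + " :sender " + m.sender + " :receiver " + m.receiver
             + " :content \"" + m.content + "\" :ontology " + m.ontology + ")";
    }

    public static void main(String[] args) {
        WrappedMessage m = fromKqml("tell", "providerAgent", "customerAgent",
                                    "offer(stock_quote, ericsson)", "m-commerce");
        System.out.println(toFipa(m));             // the same wrapped message rendered as FIPA ACL
    }
}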
In the case of subscription services, customers specify the information they are interested in and the service provider sends the information to the customers at some time interval or upon a specified event. The basic steps of an agent-based subscription service are registration, announcement of the offers, matchmaking, and event generation and handling (both for providers and customers). These steps are supported by a manager of the Subscription service Agora (see Figure 2).
Customers present their interests to the corresponding Agora by providing rules, keywords or a ranked list of interests. The Agora manager tries to match customer interests with providers' proposals and, when the matching is successful, notifies the customers. Both provider and user interests can be presented/updated anytime and asynchronously. The complexity of the matchmaker can be different for different applications. In the optimistic case (when the customer discloses detailed preferences) the matchmaker does the whole work of matching customer requests and provider offers and notifies the customer when matching was successful. It is possible to implement a more intelligent behavior of the manager with pro-active recommendation of offers which are relevant to the customer's interests but are not presented explicitly. In the pessimistic case (when the customer doesn't disclose his particular interests but rather subscribes for wide-scope information) the matchmaker does a pre-filtering of the information but the particular analysis is performed by the customer agent. After successful matchmaking the customer agent may directly contact the corresponding provider agent and perform an additional information request or negotiation using the Negotiator component of the Agora. Managers for other types of Agoras (such as Customers, Providers, Buying/Selling or Coalitions Agoras) may have functionality different from the functionality of the Subscription service Agora. The Agora system allows attaching different manager agents to different Agoras.
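A minimal sketch of the optimistic matchmaking case is given below: customers disclose keyword interests, and when a provider announces an offer the manager obtains the customers to notify. The names and the simple keyword-overlap test are illustrative assumptions only; the Agora matchmaker can work with rules or ranked interests and may be considerably more sophisticated.

// Sketch of keyword-based matchmaking between customer interests and provider offers.
import java.util.*;

public class Matchmaker {
    private final Map<String, Set<String>> interests = new HashMap<>(); // customer id -> keywords

    public void subscribe(String customer, Set<String> keywords) {
        interests.put(customer, new HashSet<>(keywords));
    }

    // Called when a provider announces an offer; returns the customers to notify.
    public List<String> announce(String provider, String offerText) {
        Set<String> offerWords = new HashSet<>(Arrays.asList(offerText.toLowerCase().split("\\W+")));
        List<String> toNotify = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : interests.entrySet()) {
            for (String kw : e.getValue()) {
                if (offerWords.contains(kw.toLowerCase())) { toNotify.add(e.getKey()); break; }
            }
        }
        return toNotify;   // the manager would then send a WAP push or SMS notification to each
    }

    public static void main(String[] args) {
        Matchmaker m = new Matchmaker();
        m.subscribe("customer-1", new HashSet<>(Arrays.asList("ericsson", "dividend")));
        System.out.println(m.announce("provider-A", "Ericsson announces dividend change"));
    }
}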
3 Some applications
Several prototype mobile commerce services have been developed based on the above-described approach. They include: 1) a valued customer membership service and product search; 2) financial services (notification of stock quote changes); 3) a real-estate agent (search and notification for real-estate property); 4) advertising over the Internet with agents. For the valued customer membership service, a user of a mobile device can register for a customer service which provides membership benefits. After registration a personal assistant agent is created. Basically, the agent operates on a user's host, providing privacy of personal data; however, it may also operate on a service provider host when the user trusts the environment. When the agent finds that some special offer matches the customer's interests, the agent may send a corresponding message to the user's mobile device (if it requires quick reaction) or may place the offer on a user WML page. In addition to analyzing offers from the customer service, the agent can perform a search for relevant products from other specified sources. In the case of financial services, notification of changes in quotes of specified stocks is implemented. The Agora system is used for deploying agents and matching required and provided services. Both the specified stocks and the conditions of their change are kept privately in the agent.
The advertising service uses Agoras for the formation of customer and service provider coalitions. The coalitions are used for co-shopping and for co-advertising. The real-estate agent searches for real-estate property which satisfies the user's preferences, notifies the user via cellular phone when it is found and, if it is of interest to the user, starts a bidding process for the property according to the user's instructions.
4 Conclusions
We have presented an approach to the usage of intelligent agents in mobile commerce services. The approach is based on providing users of mobile devices with personal software assistant agents and on the usage of the Agora system for support of cooperative work between agents. The general conclusions are as follows: 1) usage of agents as personal assistants for users of mobile devices is a practical and feasible approach; 2) even with simple intelligence and functionality, agents provide a great benefit by employing autonomy, communication ability and pro-activity; 3) the concept of an Agora as a cooperative node is a practical and convenient means for multi-agent system design. Our future plans are directed towards increasing the intelligent capabilities of the agents and Agoras in the system. In particular we would like to use different negotiation protocols, rules for coalition formation and planning activity of the agents in mobile services support.

This work is partially supported by the Norwegian Research Foundation in the framework of the Distributed Information Technology Systems (DITS) program and the ElComAg project. I also would like to thank Thomas Heiberg and Jøran Pedersen (product search and valued membership services), Terje Wahl (financial services), Lars Killingdalen (advertising with agents) and Bjørn Skogseth (real-estate search and analysis) for their work on implementing the prototypes.

References
1. Bradshaw, J. M. (Ed.). Software Agents. Menlo Park, CA: AAAI Press/The MIT Press, 1997.
2. Mann, S. Programming Applications with the Wireless Application Protocol: The Complete Developer's Guide. John Wiley & Sons, 2000.
3. Matskin, M., O. J. Kirkeluten, S. B. Krossnes and Øystein Sæle. Agora: An Infrastructure for Cooperative Work Support in Multi-Agent Systems. In T. Wagner and O. Rana (eds.), Infrastructure for Scalable Multi-Agent Systems. Springer-Verlag, LNCS Volume 1887, 2000.
4. Matskin, M. and A. Tveit. Mobile Commerce Agents in WAP-Based Services. Journal of Database Management, Vol. 12, No. 3, 2001, pp. 27-35.
5. WAP: http://www.wapforum.org
A NEW CONCEPT OF AGENT ARCHITECTURE IN AGENTSPACE
T. NOWAK AND S. AMBROSZKIEWICZ
Institute of Computer Science, Polish Academy of Sciences, al. Ordona 21, PL-01-237 Warsaw, and Institute of Informatics, University of Podlasie, al. Sienkiewicza 51, PL-08-110 Siedlce, Poland
E-mail: sambrosz, [email protected]
Agentspace is an emerging environment resulting from process automation in the Internet and Web. It is supposed that autonomous software (mobile) agents provide the automation. The agents realize the goals delegated to them by their human masters. Interoperability is crucial to assure meaningful interaction, communication and cooperation between heterogeneous agents and services. In order to realize the goals, the agents must create, manage and reconfigure complex workflows.
1 Introduction
Cyberspace, the emerging world created by the global information infrastructure and facilitated by the Internet and the Web, offers new application scenarios as well as new challenges. One of them is creating new infrastructures to support high-level business-to-business and business-to-consumer activities on the Web, see for example Sun ONE, Microsoft .NET, and UDDI. The second one is the Semantic Web 4, a conceptual structuring of the Web in an explicit machine-readable way. These two challenges are strongly related to each other, that is, semantic interoperability is necessary for the integration of heterogeneous, distributed Web services. It is supposed that the integration will be performed automatically by autonomous software (mobile) agents. An agent is a running program that can migrate from host to host across a heterogeneous network under its own control and interact with other agents and services. Since the software agents are supposed to "live" in the cyberspace, they must be intelligent, that is, they must efficiently realize the goals delegated to them by their human masters. Hence, along with the development of cyberspace the new world (called agentspace), inhabited by the software agents, is being created. It seems that the process automation in the Internet and Web makes the development of agentspace inevitable. Human users are situated at the border of the agentspace and can influence it only through their agents, by delegating to them complex and time consuming tasks to perform. Since the Internet and Web are open, distributed and heterogeneous environments, agents and services can be created by different users according
to different architectures. Interoperability is crucial to assure meaningful interaction, communication and cooperation between heterogeneous agents and services. We can distinguish two kinds of interoperability: interaction interoperability and semantic interoperability. Interaction interoperability provides a common communication infrastructure for message exchange whereas semantic interoperability provides the message understanding. The semantic interoperability concerning the meaning of resources on the Web is a subject of current research, see DAML 5 + OIL 8 as the most prominent example. In order to use services established by different users working in heterogeneous domains, agents must be capable of acquiring knowledge about how to use those services and for what purposes. There must be a common language for expressing tasks by the users, delegating these tasks to agents, as well as for describing services, and for communication between agents and services. There are several efforts to create such a language, see DAML-Enabled Web Services 7, ATLAS 3, CCL 10, WSDL 9, and FIPA ACL. As to the communication infrastructure, there is no need to force one transportation platform (i.e. one message format and one message delivery way) as the standard. It seems that rather the message language and its meaning is crucial here, not the message wrapping. It is relatively easy to provide a transformation service between two platforms for translating the message format of one platform to the message format of the other. A mobile agent platform (MAP, for short) also gives a communication infrastructure as well as a "migration service" for the agents. One may ask if agent mobility is essential for creating agentspace, see for example the JADE 6 framework where mobility is not provided. In our approach, agent mobility may be seen as a means for learning between heterogeneous environments. Our project aims at creating an absolute minimum necessary for joining heterogeneous applications as services on the one hand and for using them by heterogeneous agents (on behalf of their users) on the other hand. As this minimum we propose the language Entish (a shorthand for e-language), and its intended semantics. We introduce a new form of agent migration. Usually, a MAP provides a weak form of migration that consists in moving an agent's data and code to a new place and executing this code at the new place, whereas the agent process at the old place is closed. The data and the code are strictly related to each other in that agent architecture. We propose a new architecture where the data are independent of the code. As a result we get a much weaker migration form where an agent's data can be moved without the code. The data are expressed in Entish and contain all parameters needed to continue the agent process at the
new place. This agent data is called the agent "soul" and is separated from the agent body responsible for reasoning and action execution. The idea of the new migration form is that a running agent process stores all its essential data and control parameters in its soul. The process may be closed at any time and then fully reconstructed at any new place. At the new place, the agent soul is given a new body (possibly a different code) and then the completed agent can continue its process. So the data (soul) are independent of the code (body). The new migration form is independent of MAP and it can be applied to communication platforms that do not support (weak) agent mobility, like JADE or a platform based on HTTP + SOAP transport. The structure of the soul constitutes the core of the language Entish. The main achievement of our project is a generic architecture of agentspace and its implementations. The idea of agentspace consists in constructing middleware that provides transparency between heterogeneous agents and heterogeneous services. We define agentspace as an implementation of the language Entish and its semantics on a communication platform. So far we have implemented Entish on Pegaz - our own MAP - and we are completing an Entish implementation on another communication platform, called Hermes, that is based on HTTP + SOAP transport. It seems that the Hermes platform may serve as middleware for Web Service integration. We are also implementing the transport protocol of Hermes in Pegaz and vice versa, so that we will achieve complete interoperability between these two agentspaces. It means that agents (actually their souls) can migrate from one agentspace to the other as well as communicate with services located in the other agentspace.
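The soul/body separation can be sketched as follows: the soul is pure serializable data (here, formulas and control parameters represented simply as strings), the body is code supplied at the destination, and migration amounts to shipping the soul's bytes and animating them with a new body. All names, and the use of Java serialization, are our own illustrative assumptions; Entish formulas are of course not arbitrary strings, and the real platforms (Pegaz, Hermes) transport souls over their own protocols.

// Sketch of the soul/body separation: the soul is data only, the body is code chosen per place.
import java.io.*;
import java.util.*;

class Soul implements Serializable {
    final String agentId;
    final List<String> goals = new ArrayList<>();         // desired situations, expressed declaratively
    final Map<String, String> control = new HashMap<>();  // control parameters needed to resume
    Soul(String agentId) { this.agentId = agentId; }
}

interface Body {                       // reasoning and action execution, provided at each place
    void animate(Soul soul);
}

public class MigrationDemo {
    // "Migration" in the weak form described above: serialize the soul, move the bytes,
    // and reconstruct the process at the new place by giving the soul a (possibly different) body.
    static byte[] suspend(Soul s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bos);
        out.writeObject(s);
        out.flush();
        return bos.toByteArray();
    }

    static void resume(byte[] data, Body newBody) throws IOException, ClassNotFoundException {
        Soul s = (Soul) new ObjectInputStream(new ByteArrayInputStream(data)).readObject();
        newBody.animate(s);
    }

    public static void main(String[] args) throws Exception {
        Soul s = new Soul("agent-1");
        s.goals.add("delivered(report, userX)");   // placeholder standing in for an Entish formula
        byte[] wire = suspend(s);                  // the soul leaves the old place without its code
        resume(wire, soul -> System.out.println(soul.agentId + " resumed with goals " + soul.goals));
    }
}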
2 Agentspace architecture
The idea of agentspace consists in the construction of an open distributed infrastructure that would allow heterogeneous applications to be joined as services on the one hand and to be used by heterogeneous agents on the other hand. A user, delegating a task to an agent, need not know the locations of services and resources necessary for task realization. The user expresses the task in our high-level common language called Entish. The agent migrates across the agentspace and communicates with services and other agents, looking for information, services and resources needed to realize the delegated task. Since agentspace is an implementation of the language Entish and its intended semantics on a communication platform, a layered architecture seems to be natural and generic. The architecture consists of three layers: the interaction layer, the agent/service layer, and the language layer. The interaction layer specifies the infrastructure that provides basic functionality for agents and services, such as
moving an agent from one place to another and communication between agents and services. This layer is implemented by a communication platform. In our case it is done by Pegaz and Hermes. However, it may be any communication platform, like JADE 6, or a new one built, for example, on top of CORBA or RMI-IIOP. The second layer, i.e., the agent/service layer, specifies some aspects of agent and service architecture that allow them to evaluate formulas (called situations) expressed in the language Entish as well as to determine new situations resulting from performing elementary actions. The agents are equipped with mental attitudes: knowledge, goals, intentions and commitments represented as Entish formulas. These attitudes serve as data and control parameters of agent behavior. Agents and services execute actions (migration and message exchange) in the interaction layer, whereas the message contents are expressed in Entish. The agent/service layer implements the intended semantics of Entish. The language layer consists of Entish - a simple version of the language of first-order logic - along with a specification of how to "implement" it for open and distributed use. The implementation follows the idea of the so-called "webizing" of a language, see T. Berners-Lee 4. The language describes the "world" (i.e. agentspace) to be created on the basis of the infrastructure provided by the previous layers. However, this description is purely declarative. Actions are not used in Entish; the formulas describe only the results of performing actions. So no causal relations can be expressed here. The language is sufficient to express desired situations (tasks) by the users as well as by agents and services, however it cannot explicitly express any idea about how to achieve them. This may be done by implementing distributed information services (called InfoServices) where an agent may get to know how to realize the delegated task, or get a hint. Usually, as the reply to its query (expressed also in Entish), the agent gets a sequence of intermediate situations to follow. BrokerServices play the role of virtual brokers to facilitate complex task realization. A BrokerService forms, manages and reconfigures a workflow that realizes a special type of complex task. The workflow can be quite sophisticated and consist of a large number of ordinary services. So it may be seen as a virtual organization in agentspace. The language is implemented in the second layer by DictionaryServices containing the syntax and new concept definitions. There are three additional types of services, namely SecretaryService, MailService, and BodyService. Let us note that all those services are not system services. They can be implemented and developed independently by different users. It is important that only the "operation type" of any of these services is specified in Entish. Roughly, an operation type is a description of the function performed by a particular service.
A service implementation must only satisfy the specification of the operation type. The paper presents our work in progress. The limit of space does not allow us to present details. The first version of the Entish syntax and semantics is completed. A prototype of agentspace based on Pegaz is already implemented. The implementation of Hermes, i.e., agentspace based on HTTP + SOAP transport, will be completed shortly. Now, we are developing (by implementing services) and testing our small agentspace in the frame of the Pegaz Ring that consists of several research groups.

Acknowledgments. The work was done partially within the framework of ESPRIT project No. 20288 CRIT-2, and KBN project No. 7 T11C 040 20.

References
1. S. Ambroszkiewicz, W. Penczek, and T. Nowak. Towards Formal Specification and Verification in Cyberspace. Presented at the Goddard Workshop on Formal Approaches to Agent-Based Systems, 5-7 April 2000, NASA Goddard Space Flight Center, Greenbelt, Maryland, USA. To appear in Springer LNCS.
2. S. Ambroszkiewicz, O. Matyja, and W. Penczek. "Team Formation by Self-interested Mobile Agents." In Proc. 4th Australian DAI Workshop, Brisbane, Australia, July 13, 1998. Published in Springer LNAI 1544.
3. ATLAS - Agent Transaction Language for Advertising Services, http://www.cs.cmu.edu/softagents/atlas/
4. T. Berners-Lee, www.w3.org/DesignIssues/Webize.html and www.w3.org/DesignIssues/Logic.html
5. DAML, www.daml.org/
6. JADE - Java Agent DEvelopment Framework, http://sharon.cselt.it/projects/jade/
7. McIlraith, S., Son, T. and Zeng, H. "Mobilizing the Web with DAML-Enabled Web Services", www.ksl.stanford.edu/projects/DAML/
8. OIL, Ontology Interchange Language, www.ontoknowledge.org/oil/
9. Web Services Description Language (WSDL), www.w3.org/TR/2001/NOTE-wsdl-20010315
10. S. Willmott, M. Calisti, B. Faltings, S. Macho-Gonzalez, O. Belakhdar, M. Torrens. "CCL: Expressions of Choice in Agent Communication", The Fourth International Conference on MultiAgent Systems (ICMAS-2000).
21st CENTURY SYSTEMS, INC.'S AGENT ENABLED DECISION GUIDE ENVIRONMENT (AEDGE™)

PLAMEN V. PETROV
21st Century Systems, Inc., Omaha, Nebraska, USA
E-mail: [email protected]

ALEXANDER D. STOYEN
University of Nebraska and 21st Century Systems, Inc., Omaha, Nebraska, USA
E-mail: [email protected]

JEFFREY D. HICKS
University of Nebraska and 21st Century Systems, Inc., Omaha, Nebraska, USA
E-mail: [email protected]

GREGORY J. MYERS
21st Century Systems, Inc., Omaha, Nebraska, USA
E-mail: [email protected]
21st Century Systems, Inc.'s Agent Enabled Decision Guide Environment (AEDGE™) is a standardized Commercial Off the Shelf (COTS), DII COE compliant agent architecture that enables complex DSS to be developed as an expansion of the AEDGE core functionality. The AEDGE core consists of Master Server, Entity Framework, Agent Infrastructure and Database Connectivity components. User-service-specific DSS tools, such as agents, servers or clients, are quickly and efficiently constructed above the core functionality through the use of common interfaces and data structures. The extender components (Simulation Server, Live Links, Visualization Client, Agent Client, and Data Bridges) serve as a template for extending the application. To facilitate agent interactions, the AEDGE provides a number of local and remote mechanisms for service registration and invocation. In addition, agents can interact, synchronize, and cooperate via Agent Managers, which in turn provide the aggregate agent functionality to the user. The componentized structure of the AEDGE enables multiple levels of product availability that satisfy the needs of the user through different levels of product involvement.
1 Introduction
In the past decade we have observed a significant increase in the demand for computer-based decision support systems (DSS), due primarily to the overwhelming
availability of data from multiple sources with various degrees of quality, coming from networked sensors, databases, archives, web-based applications, and other sources. Simultaneously, a new branch of distributed computing, based on intelligent, semi-autonomous processes, referred to as Agents, has been the center of attention because of its flexibility, extensibility, and network-friendliness. 21st Century Systems, Inc. (21CSI), a small company, has pioneered the integration of agent-based computing into DSS applications. We have developed stand-alone and mobile agents and agent architectures to perform individual and team decision support for multiple defense-oriented environments such as AWACS [1], Aerospace Operations Centers, Navy Ship Command Centers [2], etc. The need for a standardized common infrastructure has led us to design an environment where both agents and simulated entities (or representations of real-world assets) are represented as first-class objects capable of interacting with each other. The Agent Enabled Decision Guide Environment (AEDGE™) (see Figure 1) is 21CSI's undertaking to build a common reference framework and a test-bed environment for integrated simulation and agent-based decision support. AEDGE defines Agents, Entities, Avatars and their interactions with each other and with external sources of information. This standardized architecture allows additional components, such as service-specific DSS tools, to be efficiently built upon the core functionality. Common interfaces and data structures can be exported to interested parties who wish to extend the architecture with new components, agents, servers, or clients. When the core AEDGE components developed by 21CSI are bundled with customer-specific components in an integrated environment, a clean separation of those components, through APIs, is provided.

Figure 1. 21CSI's AEDGE Product Structure (the figure shows the AEDGE core together with databases, data bridges and third-party components)
2 Agent Enabled Decision Guide Environment (AEDGE™)
21CSI's DSS product [3] is based on an extensible architecture and a number of standard components that enable simulation and decision support capabilities. AEDGE is designed in an open, DII-COE and CORBA compliant manner. The architecture is unified and allows users to use and extend existing components, as well as to build new, compatible and customized add-ons. The kernel of the architecture consists of four core and five extender components. These define the internal structures, dataflows, and interfaces of the architecture.
• Master Server. Tracks components and matches service providers with service requesters. The Master Server is a network component of AEDGE that facilitates connections and interactions among the rest of the AEDGE components. It provides component registration and tracking services, interface matching services and component search, identification and connection services. The Master Server is also responsible for synchronizing simulation time (and real time) among multiple simulation servers and live links.
• Entity Representation Framework. The Entity Representation Framework is an integral part of AEDGE, which provides the basic entities and events for a time-event simulation or live-feed connections. The object-oriented hierarchy of entities represents a wide range of structures, vehicles, platforms, weapons, and sensors. The Framework includes interfaces which allow users to add new entities with new behaviors or with combinations of existing behaviors.
• Agent Infrastructure. The Agent Infrastructure provides the basic inter-agent communication and synchronization mechanisms, as well as the interfaces for agents to use other data sources, such as simulation servers, live data links, databases, etc. A base hierarchy of agents is also provided, and it can be extended and customized for a particular user's needs.
• Database Connectivity. AEDGE provides the capability of storing and retrieving data to/from various databases. The Database Connectivity components provide generic and specific bridges to a number of proprietary and public databases. New Database Connectivity modules can be added by extending the provided bridges and implementing the connectivity interfaces.
In addition to these kernel components, extender components define the basic functionality of information clients and servers and define interfaces for adding new functionality. These components are, in essence, templates for extending the platform with new functionality, while maintaining tight integration and efficient implementation. The following standard AEDGE extender packages are provided:
• Simulation Servers. Simulation Servers model a particular aspect of the physical reality in terms of the AEDGE components. In other words, a simulation server maintains a set of entities, their properties, and those of the environment, and models the interactions among those. For example, the vehicle movement model, based on kinematics, affects the position, speed, direction of motion and fuel burn rates of the entities; the weapon models affect the
outcome of engagements; the communication models determine how orders and subordinate feedback are distributed. A simulation server may potentially interact with all four core components of AEDGE. It registers with the Master Server and posts its exported services (e.g. providing entity position information). The server manipulates a set of entities (object instances) from the Entity Framework that represent the current view of the world according to that simulator. The simulation server may interact bidirectionally with agents from the Agent Infrastructure, both providing information about the state of the world and receiving recommendations and action requests from Agents. Finally, a server may require information from various databases that is provided through the Database Connectivity component.
• Live Links. Live Links are similar to Simulation Servers in that they provide information about the world to the AEDGE components. However, this information is based on sensor information and reflects the state of the physical world in real-time. Thus, the information flow is unidirectional, since we do not yet support actuators placed in the physical world. The live links may provide entity or track information, weather information, or any other state or capability changes. The links can interface with all core AEDGE components, much like the simulation servers can, with the limitation of unidirectional communication.
• Visualization Clients. Visualization Clients are responsible for interactions with the human users. They present data from the AEDGE in a clear and intuitive manner, allowing for simple, yet powerful presentations of complex interdependencies in the simulated/sensor world. Visualization clients interact with all components through bidirectional information flows. They receive information on the simulated entities, their environment and interactions, as well as on agent evaluations and recommendations. The users' interactions with the Visualization client provide feedback to the AEDGE core components.
• Agent Clients. Agent Clients host one or more Intelligent Agents, which monitor the simulated world, react to changes in it and interact among each other and with human users according to their specific agent behaviors. The agent client receives information from the AEDGE core on the state of the world and sends back agent requests and feedback.
• Database Bridges. These are a natural extension of the AEDGE core Database Connectivity. Bridges to characteristics and performance data, weapons performance and effectiveness data and terrain databases are provided. Interfaces for new database bridges are also provided.
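Purely as an illustration of the Master Server's matching role described above, the sketch below registers components with the services they export and resolves a requester's lookup to a provider. The interfaces and names are hypothetical and are not the AEDGE API, which is CORBA/DII-COE based and considerably richer (component tracking, time synchronization, and so on).

// Hypothetical sketch of service registration and interface matching at a master server.
import java.util.*;

interface Component { String name(); }

public class MasterServer {
    private final Map<String, List<Component>> providers = new HashMap<>(); // service -> providers

    public void register(Component c, String... exportedServices) {
        for (String s : exportedServices)
            providers.computeIfAbsent(s, k -> new ArrayList<>()).add(c);
    }

    // Interface-matching service: find some component exporting the requested service.
    public Optional<Component> lookup(String service) {
        return providers.getOrDefault(service, Collections.emptyList()).stream().findFirst();
    }

    public static void main(String[] args) {
        MasterServer master = new MasterServer();
        master.register(() -> "KinematicsSimServer", "entity-position");   // a simulation server
        master.register(() -> "RadarLiveLink", "track-data");              // a live link
        master.lookup("entity-position")
              .ifPresent(c -> System.out.println("Positions provided by " + c.name()));
    }
}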
3 Componentization
The AEDGE Architecture enables the commercialization through four componentized availability levels that cover the needs of customers with different levels of involvement. The Demo availability level provides for execution and
evaluation rights of a binary (compiled) distribution of the product. This type of availability level is aimed at early users or prospective customers. The Enterprise availability level is designed to fulfill the needs of advanced customers who have a specific use for the platform. Often the Enterprise availability level is accompanied by customized extender components. The Research availability level delivers the best configuration for scientists who may use and/or extend the platform for their particular experimental needs. More interfaces to the system are provided to enable the researcher to tap into the rich data sources of the platform. The Development availability level enables advanced users to build components on top of the base platform. It provides all interfaces and some source code examples for key components. Under this level, customers are able to extend the core functionality with their own agents or graphical displays. While none of the availability levels enables re-distribution of the platform, the developer availability level permits the distribution of the binaries only, with proper disclosure.
4 Conclusion
21st Century Systems, Inc. has developed the Agent Enabled Decision Guide Environment (AEDGE™), an open DII COE and CORBA compliant agent-based environment that enables the development of componentized decision support systems. AEDGE's core functionality can be easily extended with new capabilities by using extender components and bridges to third-party products. A number of commercial and military customers already benefit from this decision support environment in a variety of applications (AWACS Command and Control, Griffin Special Forces Route planner, IDAS Aerospace Operations Center, Navy's Advanced Battle Station, etc.). Customers use AEDGE at multiple levels of component availability to satisfy their specific needs for an intelligent agent DSS architecture.
5 Bibliography
1. Petrov, P. V., Stoyen, A. D. An Intelligent-Agent Based Decision Support System for a Complex Command and Control Application. Proceedings of the Sixth IEEE International Conference on Engineering of Complex Computer Systems, ICECCS 2000, Tokyo, Japan, September 2000.
2. Hicks, J. D., Stoyen, A. D., Zhu, Q. Intelligent Agent-Based Software Architecture for Combat Performance under Overwhelming Information Inflow and Uncertainty. Proceedings of the Seventh IEEE International Conference on Engineering of Complex Computer Systems, ICECCS 2001, Skövde, Sweden, June 2001.
3. 21st Century Systems, Inc. Extensible Multi-Component DSS Architecture - a Multi-agent Decision Support Environment. Technical report. 21CSI, Omaha, NE, January 2001.
PROACTIVENESS AND EFFECTIVE OBSERVER MECHANISMS IN INTELLIGENT AGENTS

JON PLUMLEY, KUO-MING CHAO, RACHID ANANE AND NICK GODWIN
School of Mathematical and Information Sciences, Coventry University, Coventry CV1 5FB, UK
E-mail: {j.plumley, k.chao, r.anane, a.n.godwin}@coventry.ac.uk

Proactiveness is a necessary property for an autonomous intelligent agent. We believe that to exhibit this property, agents require an effective and efficient observing mechanism. In this paper, we propose a novel method that enables agents to observe dynamic change in other agents. The method incorporates two components: an ORB-based observing mechanism and a mobile element. This new approach goes beyond the observing mechanism in Java. The scope for interoperability is improved, and the dynamic generation and transfer of observable conditions between agents is fully supported by the flexibility of the mechanism. Under this new scheme, the ORB-based observing mechanism enables agents to monitor any changes in the component objects of other agents. The mobile agent is used to transfer the filtered knowledge between agents in order to set the monitoring conditions in a dynamic manner.
1. Introduction
Intelligent Agent technology has attracted a number of researchers and industrialists in the field of distributed systems [8,9]. We argue that agent technology can be useful in the integration of diverse systems in a distributed environment. The realisation of an agent's proactiveness through the use of a two-level ORB-based observer mechanism could reduce the tendency to redundant computation. This paper proposes a Dynamic Mobile Agent (DMA) with an Observer/Observed Mechanism (O/OM) operating at two levels - the global level and the object level. The observation of change allows the proposed agent to change the rule set of its mobile elements. It is this dynamic behaviour, which is described in detail in the next section, that makes the agent more proactive and more autonomous, and therefore better suited to dynamic distributed systems.
2. An overview of the Proposed Dynamic Mobile Agent functionality
Two essential elements of the DMA are the Observation strategy and the reasoning process.
2.1 The Observation Strategy

The DMA maintains a table of active objects with its Observer mechanism at the global level by observing any object creation or deletion. The logical integrity of any decision-taking process would be flawed if the client held objects of which the DMA was not aware. Likewise, a lack of knowledge of deleted objects would lead to run-time errors if the DMA were to attempt to reference such a deleted object. A set of meta rules is held in the Belief Desire Intention (BDI) [7] module, and with the built-in mobile element rule generator, rules can be generated for each mobile element. Specific mobile elements can then be dispatched to observe particular objects. Fig. 1 illustrates the separation of the static and mobile elements of a DMA. The observer mechanism and rule set of the mobile element allow it to monitor any changes in the object states, and the significance of such a change. With the knowledge of the observed change a decision is made (by human intervention) as to whether any changes in the rule sets are needed. If so, then a mobile element(s) with a revised rule set(s) can be dispatched to continue observation.
Figure 1. A conceptual view of the elements of a Dynamic Mobile Agent
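The two-level observing idea just described can be sketched as follows: observed objects push state changes to their listeners (rather than being polled), and a mobile element holding a rule set forwards only the significant changes to its parent DMA. The Java interfaces below are illustrative stand-ins for the ORB-based mechanism, and the rule set is reduced to a single predicate for brevity.

// Sketch of push-based observation with a rule-carrying mobile element (illustrative names only).
import java.util.*;
import java.util.function.Predicate;

interface ChangeListener { void changed(String objectId, int newState); }

class ObservedObject {
    private final String id;
    private final List<ChangeListener> listeners = new ArrayList<>();
    ObservedObject(String id) { this.id = id; }
    void addListener(ChangeListener l) { listeners.add(l); }
    void setState(int state) {                       // server-side push of the change to observers
        for (ChangeListener l : listeners) l.changed(id, state);
    }
}

class MobileElement implements ChangeListener {
    private final Predicate<Integer> significant;    // rule set, regenerated by the DMA when needed
    private final ChangeListener dma;                // the parent Dynamic Mobile Agent
    MobileElement(Predicate<Integer> rule, ChangeListener dma) { this.significant = rule; this.dma = dma; }
    public void changed(String objectId, int newState) {
        if (significant.test(newState)) dma.changed(objectId, newState);  // filtered knowledge only
    }
}

public class ObserverDemo {
    public static void main(String[] args) {
        ChangeListener dma = (id, s) -> System.out.println("DMA notified: " + id + " -> " + s);
        ObservedObject stockLevel = new ObservedObject("stockLevel");
        stockLevel.addListener(new MobileElement(s -> s < 10, dma));  // rule: notify when level drops below 10
        stockLevel.setState(50);   // filtered out by the mobile element
        stockLevel.setState(7);    // forwarded to the DMA
    }
}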
2.2 The reasoning process

Once dispatched with its own rule set, the mobile element observes its designated object. When a change is observed in the object, the mobile element is able to use its rule set to determine the significance of the change. If appropriate, it will pass a message to the DMA notifying it of the observed change. The DMA will then be able to use its global rule set to determine the significance to the whole system. This
may result in the need to change the 'observation rules' of one or more of the mobile elements. When a change of rule set is appropriate, the specific mobile element is retracted, a new rule set is generated, and then a new mobile element is dispatched to continue observation. This dynamic behaviour ensures that each of the agents involved in a multiple agent system responds to the dynamics of the system as a whole and that they are able to cooperate together efficiently.

3. Architecture of a Dynamic Mobile Agent

The proposed mechanism is supported by a three-level architecture. The three levels and their appropriate mechanisms are described below.
3.1 The Three-level Architecture
The three levels of the system architecture are the communication mechanism, the mental model, and the interaction with the observer mechanism. The communication mechanism involves message passing built upon Object Request Broker (ORB) principles. It transports the agent's message, using the syntax of an Agent Communication Language (ACL) [4], to the recipient, which subsequently parses the message. The mental model interprets the content of the message, reasons with it and asks the underlying application to perform the task. The underlying application returns the result to the mental model. The mental model generates the appropriate reply and forwards it to the requesting agent. Remote method invocation is used to invoke the functions in the application at the lowest level. The interface between the application and the mental model uses the ORB in order to support applications that are implemented in different programming languages.
3.2 The Mental Model
The Belief Desire Intention (BDI) module parses the incoming message from the ACL module and reasons with its content. The BDI then invokes appropriate methods. The BDI is a reasoning mechanism that interprets the information, motivational, and deliberative states of the agents. "These mental attributes determine the system's behaviour and are critical for achieving adequate or optimal performance when deliberation is subject to resource bounds" [7].
4. Conclusions and future work
4.1 Discussion

Wooldridge and Jennings [9] identify proactiveness as a key property of an intelligent agent. A proactive agent is able to exhibit goal-directed behaviour by taking the initiative through its ability to observe the internal and external environment. An effective and efficient observation mechanism is required for agents to be proactive. In this respect the A-Design system [3] is a proactive system requiring a constant flow of information, and a failure to note that the object being observed has been deleted could cause system errors. Mobile agents have been widely used in the area of information retrieval over the Internet [2,6]. We exploit this feature to work with our global observation mechanism in order to ensure that the system maintains a consistent state. The JAM agent [5] supports agent mobility with BDI representation. It provides a flexible representation for mobile agents. We use this feature and apply it in agent observations. Ahmad and Mori [1] proposed using mobile agents to push and pull data to cope with ever-changing situations in information services and to reduce access time for the users. Our proposed method provides a more flexible approach that allows the intelligent agent to generate new monitoring rules as required and introduces the ORB observing mechanism to cater for changes to the objects in the environment.

4.2 Conclusion

The main contribution of this work is the proposal of a method that supports an intelligent agent's proactiveness with an observing mechanism that operates at two levels: global and local (object level). The global observation allows the agent to be aware of any changes such as creation and deletion of objects, thus enhancing the robustness of the system. The local observer, associated with the BDI and mobile element generator, enables the observer agent to generate and dispatch an autonomous mobile element to observe the state of a particular object. Changes to the monitoring rules in the mobile element can be made when the need arises without recompiling the code. The architecture of the system enables the intelligent agents to be autonomous and to reflect the dynamic environment. The volume of communication between agents can be reduced, because the mechanism in the mobile element only sends filtered information to the agent rather than the raw data. The ORB observer mechanism also contributes to the reduction of communication traffic, because it is the server side (the observable agent) pushing the data out to the client side (the observer agent). Thus, the observer agent does not need to constantly monitor the status of the objects in the observable agent. This
then, is an effective method of maintaining system consistency in a dynamic environment where the objects and monitoring rules may change frequently. The agent framework has been partially implemented. A simple example was used to test the ORB observing mechanism and the mobile element in order to evaluate its feasibility. A further implementation of these components is needed in order to carry out a demonstrable case study.

References
1. Ahmad H. F., Mori K., Push and pull of information in autonomous information service system, Proceedings 2000 International Workshop on Autonomous Decentralized System, IEEE Comput. (2000), pp. 12-18.
2. Cabri G., Leonardi L., Zambonelli F., Agents for information retrieval: issues of mobility and coordination, Journal of Systems Architecture, 46(15), (2000), pp. 1419-33.
3. Campbell, M. I., Cagan, J., Kotovsky, K., A-Design: An Agent-Based Approach to Conceptual Design in a Dynamic Environment, Journal of Research in Engineering Design, 11(3), (1999), pp. 172-192.
4. FIPA, Agent Communication Language Specifications 97, http://www.fipa.org, (1997).
5. Huber M. J., JAM: a BDI-theoretic mobile agent architecture, Proceedings of the Third International Conference on Autonomous Agents, ACM, (1999), pp. 236-43.
6. Lieberman H., Selker T., Out of context: computer systems that adapt to, and learn from, context, IBM Systems Journal, 39(3-4), (2000), pp. 617-32.
7. Rao, S. A. and Georgeff, M. P., BDI Agents: From Theory to Practice, Conference Proceedings of the 1st International Conference on Multi-Agent Systems, (1995), pp. 312-319.
8. Shen, W. M., Douglas H. N., Agent-based Systems for Intelligent Manufacturing: A State-of-the-Art Survey, International Journal of Knowledge and Information Systems, 1(2), (1999), pp. 129-156.
9. Wooldridge, M. and Jennings, N. R., Agent Theories, Architectures, and Languages: a Survey, Intelligent Agents, ed. by Wooldridge, M., Jennings, N. R., (1995), pp. 1-22.
CHAPTER 3 LEARNING AND ADAPTATION
PARRONDO STRATEGIES FOR ARTIFICIAL TRADERS

MAGNUS BOMAN
Swedish Institute of Computer Science, Box 1263, SE-164 29 Kista, Sweden
E-mail: [email protected]

STEFAN J. JOHANSSON
Department of Software Engineering and Computer Science, Blekinge Institute of Technology, Box 520, SE-372 25 Ronneby, Sweden
E-mail: [email protected]

DAVID LYBACK
Financial Market Systems, OM AB, SE-105 78 Stockholm, Sweden
E-mail: [email protected]

On markets with receding prices, artificial noise traders may consider alternatives to buy-and-hold. By simulating variations of the Parrondo strategy, using real data from the Swedish stock market, we produce first indications of a buy-low-sell-random Parrondo variation outperforming buy-and-hold. Subject to our assumptions, buy-low-sell-random also outperforms the traditional value and trend investor strategies. We measure the success of the Parrondo variations not only through their performance compared to other kinds of strategies, but also relative to varying levels of perfect information, received through messages within a multi-agent system of artificial traders.
Keywords: Artificial trader, Parrondo strategy, on-off intermittency, multi-agent system, artificial stock market

1 Introduction
Stock markets to an ever-increasing extent allow for trading controlled by artificial agents, or more generally, program trading. For instance, the Swedish Securities Dealers Association finds that it has no objections to program trading, and already in 1992 declared that only the means of exploiting unlawful quote manipulation, resulting from program trading, should be controlled 19. Nasdaq, in a communication to their members, write 17:

Recent events show that the way some stocks are traded is changing dramatically, and the change in trading methods may affect price volatility and cause increased trading volume. This price volatility and increased volume present new hazards to investors, regardless of whether trading occurs on-line or otherwise.
In general, stock markets do not apply restrictive policies to program trading. A primary objective of the market place operator is to promote a high liquidity in the traded instruments. This can be done through reducing the transaction costs: one typical implicit cost is lack of orders, leading to wide spreads, or non-existing quotes. The operators thus have reasons to encourage inbound orders. As long as these are authenticated, and the network can keep up disseminating the market information in a proper fashion, so that the situation stays in line with the overall aim of upholding a fair and orderly market, the operator should have nothing against a large number of valid orders per second being placed by artificial agents. Hence, we feel motivated to relate certain theoretical results from physics to artificial traders of the future. We do not assume markets populated solely by artificial traders, however. If we did, we could move on to claim that the Efficient Market Hypothesis and rational choice theory yield efficient equilibria 14, since the vast empirical evidence against such assumptions is directed almost exclusively towards human traders 13. We instead believe that artificial traders have gradually and almost unnoticeably slipped onto the same markets as human traders, and we will treat them as speculating noise traders (traders with non-rational expectations and potentially zero intelligence) 6. Artificial stock markets possibly exhibit volatility (i.e., standard deviation) of a different kind than ordinary excess-volatility markets 2, as argued, e.g., in the ban of crawlers from the Internet auction site eBay 20. In practice, Internet marketplaces supply information on their acceptance of artificial traders and other softbots in a file named robots.txt, and on Internet markets that do allow for softbots, their behavior is usually monitored in some way, in order to mitigate the effects of speculation through unconventional methods such as denial-of-service attacks. Program trading has also in general reached a level where flocking behavior worries policy makers 7. On an artificial stock market, in contrast to an ordinary market 16, active portfolio management should also incorporate the price dynamics, because of the intense trading. This factor has also led to transaction fee policies being radical on some artificial trader markets. Since significant transaction fees can render the Parrondo strategies described in sections 2 and 3 below useless, the existence of markets with low or no transaction fees is important to our purposes. We will consider portfolios on markets with receding prices. We will represent artificial traders as agents in a multi-agent system (MAS), in which agents affect each other's behavior through trusted message passing, as explained in section 3. In the MAS setting, variations of Parrondo strategies are then subject to experiments on a simulation testbed, the results of which are reported in section 4. In the last section, we present directions for future research.
2 The Parrondo Strategy in Games
The flashing ratchet (or Brownian motor) 1 is a molecular motor system consisting of Brownian particles moving in asymmetric potentials, subject to a source of non-equilibrium 18. In its game-theoretical formulation 9, the flashing ratchet can be described in terms of two games (A and B) in which biased coins are tossed.

• Game A is a single-coin game in which the coin comes up heads (= win) 50 − ε per cent of the time (for some small ε > 0) and results in tails the rest of the time (Parrondo himself 18 used ε = 0.005, and the constraints are described, e.g., at seneca.fis.ucm.es/parr/GAMES/discussion.html).

• Game B involves two coins. The first coin comes up heads 10 − ε per cent of the time, and the second coin 75 − ε per cent of the time. Which coin to flip is decided by looking at the capital of the player: if it is divisible by 3, the first coin is flipped, while the second coin is used in the rest of the cases.

Clearly, game A is a losing game, but the same holds for game B. This is because the player is only allowed to flip the second coin if her capital is not a multiple of 3. The latter situation comes up more often than every third time: the player will start with the unfavorable coin, which will very likely place her at a loss of -1. She will then typically revert to 0, and then back again to -1, and so on. Whenever the coins land tails a few times in succession, however, she will end up with capital -3, and then the pattern will repeat, leading to -6, etc. Hence, game B is a losing game, just like game A. The Parrondo strategy for playing games A and B repeatedly is to choose randomly which game to play next. Somewhat counter-intuitively, this discrete representation of a ratchet yields a winning game.
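To make the paradox concrete, the following minimal sketch (our own illustration, not code from the paper) simulates games A and B and their random mixture with ε = 0.005; played alone, A and B typically drift downward, while the random mixture typically drifts upward.

```python
import random

EPS = 0.005  # the small bias epsilon; Parrondo used 0.005

def play_a():
    """Game A: a single coin that wins with probability 0.5 - eps."""
    return 1 if random.random() < 0.5 - EPS else -1

def play_b(capital):
    """Game B: the bad coin (0.10 - eps) when the capital is divisible by 3,
    otherwise the good coin (0.75 - eps)."""
    p = 0.10 - EPS if capital % 3 == 0 else 0.75 - EPS
    return 1 if random.random() < p else -1

def run(strategy, rounds=100_000):
    capital = 0
    for _ in range(rounds):
        if strategy == "A":
            capital += play_a()
        elif strategy == "B":
            capital += play_b(capital)
        else:  # Parrondo strategy: pick A or B at random on every round
            capital += play_a() if random.random() < 0.5 else play_b(capital)
    return capital

for s in ("A", "B", "A+B random"):
    print(s, run(s))
```

The number of rounds is an arbitrary choice made for the sketch; the qualitative outcome does not depend on it.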
3 The Parrondo Strategy in Artificial Trading
Artificial trading and herd behavior have often been studied through bottom-up simulations, as in Sugarscape 8 or the Santa Fe artificial stock market 2. We have concentrated on speculating investors that use variations of the Parrondo strategy. Table 1 briefly describes these strategies, as well as some control strategies. Value investors (exemplified by BLSH in Table 1) seek profits, while trend investors (exemplified by BHSL in Table 1) try to identify upward and downward movers and adjust their portfolios accordingly 10. In
our simulations, the value investor proportion is larger, but this significant fact notwithstanding, our objective is not to study how it affects the market dynamics. Instead, we augment the Parrondo variations with market information, in the form of agent messages. The agents may thus influence each other by passing hints on what to buy, or what to sell. A message is treated by the receiver as trusted information, and the receiving agent will act upon the content of the message, interpreting it as normative advice. A message can be interpreted as perfect (or even insider) information, randomized for the sake of our experiment. We considered a portfolio of ten stocks with receding prices, assumed to be unaffected by agent trading. The data used is real daily data from the Swedish stock market, from the one-year period starting March 1, 2000. The stocks are listed in Table 2, and their development is shown in Figure 1. Values have been normalized to 100 for the start of the period. The strategies initially held $10000 worth of each stock. One trade was done per day, in which the strategy decided what to sell and what to reinvest in. Three different levels of hint probabilities were used: 1%, 5%, and 10% chance of receiving a hint. A 1% level means that the strategy will on average receive a hint for one of the ten stocks every tenth day of trading. When choosing randomly what to buy and what to sell, 10 integers were randomized and taken modulo 10 in order to get (at most 10) stocks that were then traded.

Table 1. The artificial trading strategies.
Buy-and-hold (BaH): The buy-and-hold strategy here acts as a control strategy that trades no stocks.
Random: This strategy trades stocks randomly.
Insider: The insider gets quality ex ante information about some stocks, on which it may react before the market.
Buy low, sell high (BLSH): This Markovian value investor strategy monitors whether a stock increased or decreased in value during the latest time interval. If the value increased, it sells the stock; if the value dropped, it buys the stock.
Buy low, sell random (BLSR): Like BLSH, except BLSR randomly chooses which stock to sell.
Buy random, sell high (BRSH): Like BLSH, except BRSH randomly chooses which stock to buy.
Buy high, sell low (BHSL): This Markovian trend investor strategy is the opposite of BLSH.
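As an illustration of the Markovian rules in Table 1, the decision logic could be sketched roughly as follows (a hedged sketch; the function names and the price-history representation are our own, not the authors'):

```python
import random

def blsh(prev_prices, prices):
    """Buy low, sell high: sell stocks that rose in the latest interval, buy those that dropped."""
    sell = [s for s in prices if prices[s] > prev_prices[s]]
    buy = [s for s in prices if prices[s] < prev_prices[s]]
    return buy, sell

def blsr(prev_prices, prices):
    """Buy low, sell random: buy as in BLSH, sell one randomly chosen stock."""
    buy, _ = blsh(prev_prices, prices)
    return buy, [random.choice(list(prices))]

def brsh(prev_prices, prices):
    """Buy random, sell high: sell as in BLSH, buy one randomly chosen stock."""
    _, sell = blsh(prev_prices, prices)
    return [random.choice(list(prices))], sell

def bhsl(prev_prices, prices):
    """Buy high, sell low: the trend-investor mirror image of BLSH."""
    buy, sell = blsh(prev_prices, prices)
    return sell, buy
```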
Table 2. The ten stocks used in the experiment, and their normalized values on March 1, 2001.
ABB (Industrial): 83.33
Allgon (Telecom): 24.55
Boliden (Mining): 37.19
Enea Data (IT): 20.09
Hennes&Mauritz (Clothes): 60.40
Ericsson (Telecom): 36.36
OM (Financial): 48.67
Scania (Industrial): 77.80
Securitas (Security): 80.35
Skandia (Insurance): 53.22
For each of the stocks sold, a percentage p ∈ [0.2, 0.8] of the possession was sold. The values of all sales were then reinvested according to their relative part in a similar selection process. If the strategy did not get at least one stock to buy and one to sell, it held its possessions until the next day. Each strategy was evaluated against the same set of stocks and the same set of hints (if used). In order to even out differences due to the randomness of the trading, the simulations were repeated 1000 times. Alignment and docking experiments are encouraged, and specifics are available upon request.
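The daily trading procedure just described could be sketched as follows (a simplified illustration under our own assumptions: holdings are share counts, reinvestment is split in equal parts, and the hint mechanism is reduced to its bare form):

```python
import random

def trade_day(holdings, prices, hint=None):
    """One trading day for a random trader: draw 10 integers, take them modulo the
    number of stocks to select what to sell and what to buy, sell a fraction
    p in [0.2, 0.8] of each selected holding, and reinvest the proceeds."""
    names = list(holdings)
    sells = {names[random.randrange(10**6) % len(names)] for _ in range(10)}
    buys = {names[random.randrange(10**6) % len(names)] for _ in range(10)}
    if hint is not None:            # a trusted hint is followed as normative advice
        buys.add(hint)
    if not sells or not buys:       # nothing selected: hold until the next day
        return holdings
    cash = 0.0
    for s in sells:
        p = random.uniform(0.2, 0.8)            # fraction of the possession sold
        cash += holdings[s] * p * prices[s]
        holdings[s] *= 1 - p
    for s in buys:                              # reinvest, here simply in equal parts
        holdings[s] += cash / len(buys) / prices[s]
    return holdings
```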
4 Experiment Results
As can be seen in Figure 2, most of the strategies over the 252 trading days followed the major trends of the market and none of them managed to maintain the initial portfolio value. There was considerable movement, as shown in the blowup of the last days of trading in Figure 3, but also significant differences between outcomes (Table 3). Buy-low-sell-random was the only strategy that outperformed Random. Strategies also differed with respect to volatility. For instance, BLSH was inferior to all strategies for most of the year. However, around day 100 through day 120, it outperformed all other strategies. As expected, BHSL amplified the receding trend. In spite of its poor performance, there are still many reasons for policy makers and speculators to use buy-and-hold even on supposedly receding markets. One reason is to declare and uphold a clear company investment
policy, another is that frequent re-investments could be undesirable (e.g., due to transaction fees). Nevertheless, we feel that BLSR produced good enough results to merit further study. For now, we will be content with comparing it to various levels of hint probabilities, however. From those results, shown in Figure 4, we see that BLSR is comparable to the insider strategy with a hint probability of approximately 4%.

Figure 1. The development of the values of the stocks used in the experiment (normalized values over time for ABB, Allgon, Boliden, Enea, H&M, Ericsson, OM, Scania, Securitas and Skandia).

Table 3. Strategy results without hint probabilities (strategies are explained in Table 1).
BLSR: 6110.88
Random: 5524.60
BaH: 5383.40
BLSH: 5338.15
BHSL: 5202.71
BRSH: 5140.29
Figure 2. The development of the values of the experiment portfolios (BaH, Random, BLSH, BHSL, BRSH and BLSR).
5 Conclusions and Directions for Future Research
We have shown that the use of certain Parrondo-based strategies may improve the performance of artificial traders. Our model is simplistic in the following respects. The messages sent must be allowed to have richer content, and may be indicators or signals, rather than simple instructions. Instead of interpreting received messages as normative advice, trust could somehow be represented. For instance, a probability distribution may be associated with messages, and trust assignments can then be represented as second-order probabilities. Market norms should be modeled and adhered to by the traders 3. Message content can then depend on market dynamics. Artificial traders have two ways of communicating such dynamics. Firstly, they may observe and recognize other traders and try to model them with the intent of communication and possibly co-operation 5. Secondly, they may monitor prices, as in the Trading Agent Competition 4 (see tac.eecs.umich.edu/) or artificial stock market approaches 11. Naturally, each trader itself also observes the market dynamics. We have placed no reasoning facilities in the trader at this stage, and so the trader cannot act on sense data. Another simplifica-
tion is that our models should incorporate transient phenomena, including not only crashes and bubbles, but also transient diversity, i.e. we must find the homogeneity and heterogeneity drivers in our MAS 15. A related point in need of further investigation is learning in artificial traders 12. For purposes of prediction, our results are almost useless, since we cannot in general design in advance a portfolio of stocks the prices of which are all receding. In rare circumstances, such as during the period of almost universally receding prices of IT stocks in the autumn of 2000, ex ante portfolios could relatively easily be assembled, and then Parrondo variations would indeed be an interesting alternative to buy-and-hold. For our experiment, the real data was chosen ex post from a large sample space with the criterion that each stock should have a saw-tooth receding price curve. While the above shortcomings together render our results useless for practical purposes, they should be seen as directions for future research. We intend to pursue the important question of strategy programming for artificial traders, as we feel that such programming will be of increasing importance in the future. By replacing our unrealistic assumptions one by one, we hope to achieve our ultimate goal of reasonably efficient strategies on real-time markets with non-linear dynamics.

Figure 3. Magnification of the last days of trading (portfolio values of BaH, Random, BLSH, BHSL, BRSH and BLSR over trading days 244-250).
Figure 4. The development of the values with three different levels of hint probabilities (BaH, Insider 1%, Insider 5%, Insider 10%).
Acknowledgements

Magnus Boman was in part financed by a NUTEK (VINNOVA) grant within the PROMODIS (Programming modular and mobile distributed systems) programme. Stefan J. Johansson was financed by the KK-foundation. David Lyback was supported by a research assignment in the OM corporation. The authors wish to thank Fredrik Liljeros, as well as their respective colleagues, for comments on drafts.

References

1. A. Ajdari and J. Prost, Mouvement Induit par un Potentiel Periodique de Basse Symetrie: Dielectrophorese Pulsee, C. R. Acad. Sci. Paris 315, 1635 (1992).
2. W. B. Arthur, J. Holland, B. LeBaron, R. Palmer, and P. Tayler, Asset Pricing under Endogenous Expectations in an Artificial Stock Market, in The Economy as an Evolving Complex System II, eds. W. B. Arthur, S. Durlauf, and D. Lane, pp. 15-44, Addison-Wesley, Reading, MA, 1997.
3. M. Boman, Norms in Artificial Decision Making, Artificial Intelligence and Law 7, 17 (1999).
4. M. Boman, Trading Agent Competition, AgentLink News 6, 15 (2001).
5. M. Boman, L. Brouwers, K. Hansson, C-G. Jansson, J. Kummeneje, and H. Verhagen, Artificial Agent Action in Markets, Electronic Commerce Research 1, 159 (2001).
6. J. B. De Long, A. Shleifer, L. H. Summers, and R. J. Waldmann, The Survival of Noise Traders in Financial Markets, J. of Business 64, 1 (1991).
7. V. M. Eguiluz and M. G. Zimmermann, Transmission of Information and Herd Behaviour: An Application to Financial Markets, Phys. Rev. Lett. 85, 5659 (2000).
8. J. Epstein and R. Axtell, Growing Artificial Societies (Brookings, Washington D.C., 1996).
9. G. P. Harmer and D. Abbott, Losing Strategies can Win by Parrondo's Paradox, Nature 402(6764), 864 (1999).
10. P. Jefferies, M. Hart, P. M. Hui, and N. F. Johnson, From Market Games to Real-World Markets, cond-mat/0008387 (2000).
11. B. LeBaron, Agent Based Computational Finance: Suggested Readings and Early Research, Economic Dynamics and Control 24, 679 (2000).
12. M. Lettau, Explaining the Facts with Adaptive Agents: The Case of Mutual Fund Flows, Economic Dynamics and Control 21, 1117 (1997).
13. T. Lux, Herd Behaviour, Bubbles and Crashes, The Economic Journal 105, 881 (1995).
14. T. Lux and M. Ausloos, Market Fluctuations I: Scaling, Multi-Scaling and Their Possible Origins, in Theories of Disasters: Scaling Laws Governing Weather, Body and Stock Market Dynamics, eds. A. Bunde and H-J. Schellnhuber, Springer-Verlag, Berlin, in press.
15. D. Lyback, Transient Diversity in Multi-Agent Systems, DSV Report 99-X-097, Royal Institute of Technology, Stockholm, 1999.
16. S. Maslov and Y-C. Zhang, Optimal Investment Strategy for Risky Assets, Theoretical and Applied Finance 1(3), 377 (1998).
17. NASD Regulation Issues Guidance Regarding Stock Volatility, NASD Notice to Members 99-11, 1999.
18. J. M. R. Parrondo, J. M. Blanco, F. J. Cao, and R. Brito, Efficiency of Brownian Motors, Europhys. Lett. 43(3), 248 (1998).
19. Swedish Securities Dealers Association, Recommendations on Program Trading and Related Topics, May 19, 1992 (in Swedish).
20. T. Wolverton, Judge Bars eBay Crawler, CNET News.com, May 25, 2000.
BDI MULTIAGENT LEARNING BASED ON FIRST-ORDER INDUCTION OF LOGICAL DECISION TREES

ALEJANDRO GUERRA HERNANDEZ, AMAL EL-FALLAH SEGHROUCHNI AND HENRY SOLDANO
Universite Paris 13, Laboratoire d'Informatique de Paris Nord, U.P.R.E.S.-A. CNRS 7030, Institut Galilee, Avenue Jean-Baptiste Clement, Villetaneuse, 93430, France.
Email: {agh,elfallah,soldano}@lipn.univ-paris13.fr

This paper is about learning in the context of Multiagent Systems (MAS) composed of intentional agents, i.e. agents that behave based on their beliefs, desires, and intentions (BDI). We assume that MAS learning differs in subtle ways from the general problem of learning, as defined traditionally in Machine Learning (ML). We explain how BDI agents can deal with these differences and introduce the application of first-order induction of logical decision trees to learning in the BDI framework. We exemplify our approach by learning the conditions under which plans can be executed by an agent.

Key words: MAS learning, BDI systems, Logical Decision Trees.
1 Introduction
We are interested in learning in the context of Multiagent Systems (MAS) composed of intentional agents, i.e. BDI agents. In this paper, we deal with the issue of adding learning competences to a BDI architecture, which leads us to consider learning methods applied to systems whose behavior is explained in terms of beliefs, desires, intentions (BDI propositional attitudes), and partial hierarchical plans, as proposed in practical rationality theories 1, and that can be characterized as autonomous, reactive, pro-active and social 15. Usually, MAS learning 10,14 is characterized as the intersection of Machine Learning (ML) and Distributed Artificial Intelligence (DAI). The motivations for this are reciprocal: i) the MAS community is interested in learning, because it seems to be central to different properties defining agents; and ii) an extended view of ML dealing with agency and MAS can improve the understanding of general principles underlying learning in natural and artificial systems. A learning agent 9 can be conceptually divided into four components: i) a learning element responsible for making improvements by executing a learning process; ii) a performance element responsible for taking actions, e.g. the agent without learning competences; iii) a critic responsible for providing feedback; and iv) a problem generator responsible for suggesting actions that will lead to informative experiences. Then, the design of the learning element, and consequently the choice of a
particular learning method, is affected by five major issues: i) which elements of the performance element are to be improved? ii) what representation is used for these components? iii) what feedback is available? iv) what prior information is available? v) is it a centralized or decentralized learning case? In this paper we present the way BDI agency can be used to conceive learning agents able to operate in MAS, using induction of logical decision trees. In order to do that, the paper is organized as follows: Section 2 briefly recalls BDI architectures, introducing an example used in the rest of the paper. Section 3 presents our approach to MAS learning; it considers the design of a BDI learning agent, the learning method used (first-order induction of logical decision trees), and examples. Section 5 focuses on discussion, related and future work.
2 BDI Agency
BDI theories of agency are well known. Different aspects of intentionality and practical reasoning have been studied formally using extensions of modal and temporal logics 5,11,15. The goal of this section is just to recall the way BDI architectures work, to complement the discussion on learning. The examples in this paper come from a very simple scenario proposed originally by Charniak and McDermott 2 (see Figure 1). This scenario is composed of a robot with two hands, situated in an environment where there are: i) a board; ii) a sander; iii) a paint sprayer; iv) a vise. Different goals can be proposed to the robot, for example, sand the board or even get itself painted, which introduces the case of incompatible goals, since once painted, the robot stops being operational for a while. The robot has different options to achieve its goals: it can use both of its hands to sand the board, for example, or else use the vise and one hand. Eventually, another robot will be introduced in the environment to deal with examples about different interactions. In general, a BDI architecture contains four key data structures: beliefs, desires or goals, intentions, and a plan library. Beliefs represent information about the world. Each belief is represented symbolically as a ground literal of first-order logic. Two activities of the agent update its beliefs: i) the perception of the environment; and ii) the execution of intentions. The scenario shown in Fig. 1 can be represented by the following beliefs of robot r1: somewhere(sander), somewhere(board), somewhere(sprayer), free-hand(left), free-hand(right), operational(r1). Desires, or goals, correspond to the tasks allocated to the agent and are usually considered logically consistent.
Figure 1. The scenario for examples and a typical plan. The environment contains the board, the sander, the sprayer, the vise and the two robots r1 and r2; the typical plan is p0 (Trigger: !sanded(X); Context: free-hand(Y) and somewhere(X); Body: pickup(X), put-in-vise(X), !sand-in-vise(X)).

Two kinds of desires are considered: i) to achieve a desire expressed by a belief formula, i.e. !sanded(board); and
ii) to test a situation expressed as a disjunction and/or conjunction of belief formulae. Plans have several components. The invocation condition specifies, as a trigger event, the circumstances under which the plan should be considered. Four types of trigger events are possible: the acquisition of a new belief, the removal of a belief, the reception of a message, and the acquisition of a new (sub)goal. The context specifies, as a situation formula, the circumstances under which the execution of the plan may start. The body of a plan is represented as a tree where nodes are labeled with states and arcs with actions or subgoals, specifying a course of action. The maintenance conditions describe the circumstances that must continue to hold during the execution of the plan. Finally, a set of internal actions is specified for the cases of success and failure of the plan. Figure 1 shows a simplified plan p0 to sand an object X. The last branch in the plan is a subgoal, because the robot will need to take the sander to do its work, which involves another plan. An intention is implemented as a stack of plan instances. In response to an event, the agent must find a plan instance to deal with it. Two cases are possible: i) If the event is an external one, an empty stack is created and the associated plan is pushed onto it; i.e. if the event is !sanded(board), the plan p0 is considered, possibly among others, and the substitution (board/X, left/Y) makes it executable. So, this substitution and p0 are used to form a new intention stack identified as ip0. ii) If the event is an internal one, it means it was produced by some already existing intention. The plan instance generated for the internal event is pushed onto the intention stack that generated the event.
For example, when executing ip0, the last branch in the plan body is a subgoal, so the event (!sand-in-vise(X), ip0) will be posted and processed as usual, but the intention formed will be pushed on top of ip0. A BDI interpreter 3 manipulates these structures, selecting appropriate plans based on beliefs and desires, structuring them as intentions and executing them.
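To make these data structures concrete, here is a rough sketch of how a plan such as p0 and an intention stack might be represented (an illustrative encoding of our own, not the actual PRS or dMARS representation):

```python
from dataclasses import dataclass, field

@dataclass
class Plan:
    name: str
    trigger: str        # invocation condition, e.g. "!sanded(X)"
    context: list       # situation formula, e.g. ["free-hand(Y)", "somewhere(X)"]
    body: list          # sequence of actions and subgoals

p0 = Plan(name="p0",
          trigger="!sanded(X)",
          context=["free-hand(Y)", "somewhere(X)"],
          body=["pickup(X)", "put-in-vise(X)", "!sand-in-vise(X)"])  # last step is a subgoal

@dataclass
class Intention:
    stack: list = field(default_factory=list)   # stack of (plan, substitution) instances

# External event !sanded(board): create a new intention stack ip0
ip0 = Intention(stack=[(p0, {"X": "board", "Y": "left"})])

# An internal (subgoal) event posted from ip0 pushes its plan instance on the same stack
plan_for_subgoal = Plan("p1", "!sand-in-vise(X)", [], [])   # hypothetical plan for the subgoal
ip0.stack.append((plan_for_subgoal, {"X": "board"}))
```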
3 BDI Learning Agents
We consider that learning in the MAS context differs in subtle ways from learning in other ML situations. There are two sources for these differences: i) the flexible autonomous behavior defining agency introduces some considerations which are not present in traditional software 4,8, i.e. autonomy and pro-activeness; ii) MAS environments are usually complex and dynamic. This suggests that the same mechanisms controlling the behavior of the agent should be used to control learning processes, i.e. learning processes should be considered as actions of the agent. In particular: i) agents have to be able to identify situations where learning is necessary (pro-activity); ii) agents have to evaluate and prioritize their learning processes (action selection); iii) eventually, agents should be able to cope with simultaneous learning processes, attending to different learning goals found by the agent; and iv) the result of the learning processes should be incorporated in the agent architecture. We have observed that applications and challenges of MAS for ML are indicative of a hierarchy of MAS levels of different complexity, which could make it useful to adopt a bottom-up approach in MAS learning research towards fully distributed MAS learning. The levels are as follows: i) In the first level, agents learn from the observation of their environment without direct interaction with other agents (centralized learning); ii) In the second level, an elementary form of direct interaction is introduced: implicit exchange of messages among agents, requests included. Since it is a form of delegation, this level introduces social learning in MAS; iii) In the third level, agents are enabled to learn from the observation of the behavior of other agents; and iv) All previous levels are forms of centralized learning. In the fourth level, decentralized learning is considered, i.e. agents with different beliefs participating in the same learning process. Defining BDI learning agents involves: i) taking into account the above considerations; ii) considering the questions suggested while defining learning agents (section 1) under these considerations; and iii) choosing a learning method.
3.1 Defining BDI learning agents
What components of the performance element can be improved? Plans are central in our approach: i) the context of each plan determines when the plan is executable, affecting the order in which plans are considered, so we want agents to learn the contexts of their plans that led to successful executions; ii) plans will be used as background knowledge; and iii) the Success and Failure components of a plan help to build examples using internal actions. BDI learning agents will not learn their beliefs, but use them to build examples to learn from. Events can be used in two ways: i) trigger events label the concept to be learned (event satisfied or not); and ii) the set of plans obtained after a given event can be used as background knowledge.

What representation is used for these components? The whole BDI interpreter is built on first-order logic representations. Belief formulae are defined as an atom or its negation. Beliefs are ground belief formulae. Situation formulae are a conjunction and/or disjunction of belief formulae. Two goals are considered: achieving a belief formula and testing a situation formula. Actions are seen as procedure calls, possibly including arguments. Plans, as seen, are complex structures. What is relevant here is that the invocation of a plan is represented as a trigger event, the context of a plan is represented as a situation formula, and the body of a plan is a tree whose arcs are labelled with either goals or actions. Intentions are built as stacks of plan instances.

What feedback is available? The agent keeps traces of the execution of its intentions. Success in achieving an intention executes a set of internal actions to update the agent structure. These actions can include saving information about the context in which the intention was satisfied. Failures are processed in a similar way, but the event associated originally with the plan is reposted in the queue with the following information: i) which plan produced it, and ii) which plans have failed to satisfy it. This can be complemented with information about the beliefs of the agent when success or failure occurs, to build learning examples.

What prior information is available? Basically, we consider as prior information the bootstrap components of the BDI architecture, i.e. the plan library and the initial beliefs.
3.2 First-Order Induction of Logical Decision Trees
Given the representations used in BDI architectures, we considered first-order learning methods. Since the context of plans is represented as a disjunction of conjunctions of belief formulae, we decided to use decision trees as the target representation.
Decision tree learning is a widely used and very successful method for inductive inference. As introduced in the ID3 algorithm by Quinlan 13, this method approximates discrete-valued target functions. Learned functions are represented as trees and instances as a fixed set of attribute-value pairs. These trees represent, in general, a disjunction of conjunctions of constraints on the attribute values of the instances. Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself is a disjunction of these conjunctions. Decision trees are inferred by growing them from the root downward, greedily selecting the next best attribute for each new decision branch added to the tree, in a divide-and-conquer strategy, differing from their rule-based competitors, i.e. CN2 and AQ, which use covering strategies. Since the clausal representation used in inductive logic programming (ILP) exhibits discrepancies with the structure underlying decision trees, Luc De Raedt 7 introduced the concept of logical decision trees, which are binary decision trees (trees where tests have two possible outputs) constrained by: i) every test is a first-order conjunction of literals; and ii) a variable that is introduced in some node cannot occur in its right subtree. This representation corresponds to a clausal setting known as the learning from interpretations paradigm 6. The learning from interpretations paradigm can be defined in the following way. Given: i) a set of classes C; ii) a set of classified examples E; iii) a background theory B; find a hypothesis H such that ∀e ∈ E: H ∧ e ∧ B ⊨ c and H ∧ e ∧ B ⊭ c′, where c is the class of the example e and c′ ∈ C \ {c}. The background theory B is used in the following way. Rather than starting from complete interpretations of the target theory, examples are a kind of partial interpretations (sets of facts) that are completed by taking the minimal Herbrand model M(B ∪ I) of the background theory B and the partial interpretation I. This paradigm enables the agent to conceive examples as sets of beliefs considered when executing an intention. Tilde 7 is a learning from interpretations algorithm operating on logical decision trees. It uses the same heuristics as C4.5, a successor of ID3 (gain ratio, post-pruning heuristics), but the computation of the tests is based on a classical refinement operator under θ-subsumption.
3.3 Exemplifying the approach
In the scenario proposed in Fig. 1 we can consider the following predicates to specify the actions configuring the behavior of the agent: pickup(X), putdown(X), put-in-vise(X), sand-in-vise(X), sand-in-hand(X), paint(X), selfpaint(X). To describe the environment where the agent is situated, the following
predicates are used: free-hand(X) to indicate that the robot has the hand X free; somewhere(X) to indicate that the object X is somewhere there; in-vise(X) to indicate that the object X is in the vise; in-hand(X) to indicate that the object X is in a hand of the robot; operational(X) to indicate that the robot X is operational; sanded(X) and painted(X). Then we can consider the simple plan body of p0 to sand an object X, executing sequentially: pickup(X), put-in-vise(X), and sand-in-vise(X). This plan body is executed if (context of the plan): free-hand(Y) and somewhere(board). The specification of the plan can be incorporated in the background knowledge, as well as other general knowledge of the agent:

board-sanded :- plan(p0, board).
plan(p0, board) :- free-hand(Y), somewhere(board), sanded(board).
sanded(X) :- pickup(X), put-in-vise(X), sand-in-vise(X).

The agent can build examples as models of the cases where the execution of p0 led to the board being sanded and also of the cases where it did not. For this, the trigger event !sanded(board) produces two classes to consider, board-sanded and board-not-sanded. The rest of each model consists of the beliefs the agent had when the intention containing p0 was executed.

begin(model(1)). board-sanded. free-hand(left). operational(r1). somewhere(board). plan(p0). end(model(1)).
begin(model(2)). board-sanded. free-hand(right). operational(r1). somewhere(board). plan(p0). end(model(2)).
begin(model(3)). board-not-sanded. free-hand(left). somewhere(board). plan(p0). end(model(3)).
begin(model(4)). board-not-sanded. free-hand(left). somewhere(board). in-vise(sander). plan(p0). end(model(4)).
begin(model(5)). board-sanded. free-hand(left). operational(r1). somewhere(board). plan(p0). end(model(5)).
begin(model(6)). board-sanded. free-hand(left). operational(r1). somewhere(board). plan(p0). end(model(6)).
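Such models could be assembled mechanically from execution traces: at the internal success or failure actions of a plan, the agent snapshots the beliefs it held while the intention was executed and labels the snapshot with the class derived from the trigger event. A minimal sketch of this bookkeeping (the function name and its arguments are our own):

```python
def make_model(index, succeeded, beliefs, plan="plan(p0)"):
    """Build one learning example (a partial interpretation) from an intention trace."""
    label = "board-sanded" if succeeded else "board-not-sanded"
    facts = [label] + sorted(beliefs) + [plan]
    lines = [f"begin(model({index}))."] + [f"{fact}." for fact in facts] + [f"end(model({index}))."]
    return "\n".join(lines)

# Example: the beliefs held when the intention containing p0 succeeded
print(make_model(1, True, {"free-hand(left)", "operational(r1)", "somewhere(board)"}))
```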
The following pruned tree for this learning setting is obtained by Tilde:
operational(A) ?
+--yes: board-sanded [4 / 4] [m1,m2,m5,m6]
+--no:  board-not-sanded [2 / 2] [m3,m4]

Fractions of the form [i / j] indicate the number of examples in the class (i) and how many of them were well classified (j). The examples in the class are listed immediately after (m1...m6). Induction time for this example was 0.03 seconds. The equivalent logic program for this logical decision tree is:

n1 :- operational(A).
class(board-not-sanded) :- not n1.
class(board-sanded) :- operational(A).

The definite clause n1 :- operational(A) is introduced by the refinement operator of Tilde, because it is useful for defining the branch for the class board-not-sanded, which is defined in terms of not n1. The decision tree obtained suggests that the agent must add operational(A) to the preconditions of the plan p0. Observe that examples expressed as models can include beliefs about other agents, i.e. operational(r2) where r2 is a different robot, or also beliefs that other agents have sent to robot r1, without affecting the learning process. This is very important for scaling up the approach to social learning, particularly to the fourth MAS level proposed.
4 Discussion
We have explained and exemplified how BDI agents can learn using first-order induction of logical decision trees. Different triggers have been considered in the literature 4 to start learning processes associated with specific areas, i.e. expectation violations and a perceived need of improvement. All of them are possible in a BDI agent thanks to the way it uses its plans. We have not considered expectation violations here, but expectations can be represented in the states of plan bodies to verify these conditions. Unsuccessful executions of intentions suggest the need for improvement. The setting used in learning from interpretations is very important here, since using the BDI architecture we can: i) identify a task that is not well accomplished; ii) obtain examples of the execution of intentions (positive and negative); and iii) obtain background knowledge, defining in this way the area where learning is necessary.
The example introduced suggests that it is possible for the agent to learn with few examples. We think that this is due to the way BDI architectures build windows of rationality 1, enabling the agent to focus on the beliefs and plans relevant to particular events. More complicated experiments are necessary to know whether Tilde continues to infer useful information with few models, especially in the case of the agent considering interactions with other agents. We have decided to do our own implementation of a BDI interpreter. The reasons for this decision include that we knew that different implementations of BDI architectures already existed, e.g. PRS and its re-implementation dMARS 3, but we only had access to formal specifications of them, not the source code or low-level information that would help us modify or extend them according to our needs. We are using Allegro CL 4.3 running on a Linux platform. This Lisp interpreter enables us to execute several functions, i.e. agents, sharing the same Lisp environment in a multiprocess way. For the learning algorithm we are using Tilde version 5.5.1. Some works in the same direction as ours include the following: Olivia and co-authors 12 present a Case-Based BDI framework applied to intelligent search on the Web, but the interpreter operates in a case-based cycle. Grecu and Brown 4 have a similar position about the way learning must be incorporated in agent systems, but their agents are not intentional and they use propositional learning. Jacobs et al. 8 present the use of ILP systems for the validation of MAS. Experimental results are promising. Even though the scenario proposed is very simple, extended with a second robot it seems to be sufficient to experiment with different interaction situations among agents. Immediate work to do is completing some details about the interaction of the interpreter and the learning processes, in order to use more realistic scenarios. The experiments done up to now have helped us to better understand the interaction of the agents with their learning processes.
5 Acknowledgements
Discussion with David Kinny and Pablo Noriega has been very helpful. The first author is supported by Mexican scholarships from Conacyt, contract 70354, and Promep, contract UVER-53.

References

1. M Bratman, Intention, Plans, and Practical Reasoning (Harvard University Press, Cambridge MA, USA, 1987).
2. E Charniak and D McDermott, Introduction to Artificial Intelligence (Addison-Wesley, USA, 1985).
3. M D'Inverno, D Kinny, M Luck, and M Wooldridge, in Intelligent Agents IV, Volume 1365 of Lecture Notes in Artificial Intelligence, pages 155-176 (Springer-Verlag, Berlin-Heidelberg, Germany, 1997).
4. D L Grecu and D C Brown, in Proceedings of the Third IFIP Working Group 5.2 Workshop on Knowledge Intensive CAD, eds. T Tomiyama and M Mantyla, Guiding Agent Learning in Design, pages 237-250, Tokyo, Japan, 1998.
5. A Rao and M P Georgeff, Decision Procedures for BDI Logics, Journal of Logic and Computation 8(3):293-344, 1998.
6. L De Raedt and S Dzeroski, First-order jk-clausal theories are PAC-learnable, Artificial Intelligence (70):375-392, 1994.
7. L De Raedt and H Blockeel, Top-Down Induction of Logical Decision Trees, Technical Report, Department of Computer Science, Katholieke Universiteit Leuven, Belgium, 1997.
8. N Jacobs et al, in Inductive Logic Programming, eds. N Lavrac and S Dzeroski, Using ILP-Systems for Verification and Validation of Multi-Agent Systems, pages 145-154 (Springer-Verlag, Berlin-Heidelberg, Germany, 1997).
9. S J Russell and P Norvig, Artificial Intelligence, a modern approach (Prentice-Hall, New Jersey, USA, 1995).
10. S Sen and G Weiss, Multiagent Systems, a modern approach to Distributed Artificial Intelligence (MIT Press, Cambridge, MA, USA, 1999).
11. M Singh et al, in Multiagent Systems, a modern approach to Distributed Artificial Intelligence, ed. G Weiss, chapter Formal Methods in DAI: Logic-based Representation and Reasoning (MIT Press, Cambridge MA, USA, 1999).
12. C Olivia et al, in AAAI Symposium on Intelligent Agents, Case-Based BDI Agents: an Effective Approach for Intelligent Search on the WWW, Stanford University, USA, 1999.
13. J R Quinlan, Induction of Decision Trees, Machine Learning 1:81-106, 1986.
14. G Weiss and S Sen, Adaptation and Learning in Multiagent Systems, Number 1042 in Lecture Notes in Artificial Intelligence (Springer-Verlag, Berlin-Heidelberg, Germany, 1996).
15. M Wooldridge, Reasoning about Rational Agents (MIT Press, Cambridge MA, USA, 2000).
EVOLUTIONARY BEHAVIORS OF COMPETITIVE AGENTS IN DILEMMA SITUATION

Tin Tin Naing, Lifeng He*, Atsuko Mutoh, Tsuyoshi Nakamura and Hidenori Itoh
Intelligence and Computer Science Department, Nagoya Institute of Technology, Nagoya, Japan
E-mail: [email protected]
*Faculty of Information Science and Technology, Aichi Prefectural University, Aichi, Japan

Evolutionary behaviors of agents have received much interest from researchers because of their important role both in multi-agent interactions and in the understanding of human interactions. The "game-theoretic approach" is a major means of studying agents' behaviors, in which competitive problems are formalized as games. The Iterated Prisoner's Dilemma (IPD) game has been well studied for such purposes in various research areas. However, not all situations in a real-world environment can be formalized as IPD. Among others, the deadlock-avoidance problem is such a one. In this paper, we propose a new game model, called Compromise Dilemma (CD), for studying evolutionary behaviors of agents, which is suitable for the deadlock-avoidance problem. For each agent, there are two basic actions and one intermediate action. The combination with the opponent's action makes an agent an opportunist or a victim. Evolutionary behaviors of agents in a co-evolutionary population are studied. We test our model with different parameters for the evolutionary algorithm, analyze the results, and show that agents can evolve in a manner to achieve their optimal cooperative strategy to share the maximum average score with each other.
1 Introduction
Recently, the search for an optimal interactive strategy for agents in multi-agent systems has received a lot of attention among researchers, because multi-agent systems play an important role in developing and analyzing models and theories of interactivity in human societies. Although interaction between human beings is an integrated part of our everyday life, its mechanisms are still poorly understood. With the help of evolutionary learning, one of the Distributed Artificial Intelligence technologies, we are able to explore their sociological and psychological foundations. There is a big trend of using the "game-theoretic approach" for studying autonomous multi-agent models. By formalizing the situations around agents as an appropriate game, we can use it to find a good strategy for the agents. The Iterated Prisoner's Dilemma (IPD) is one of the most popular game models and has been studied in numerous works. However, it is a pity that not all situations in the real world can be formalized as IPD. In the IPD framework, a rational agent can get a higher profit by defecting than by cooperating. Such a model is useless for deadlock-
avoidance problems where competitors will risk their lives if they only consider their own profits. One example is a front-to-front car race 1. In this paper, we propose a dynamic game model, called Compromise Dilemma (CD), for studying deadlock-avoidance problems. In our model, two agents must utilize the same resource, which can only be used by one agent at a time, to accomplish their work. The action of taking the resource increases the work done if it succeeds, but raises a collision that decreases the work done of both agents if it fails. Normally, IPD allows agents only two choices of action: full cooperation and full defection. However, recent papers have considered more choices than the two extremes 10,11. In our work, each agent will consider two alternative actions together with an intermediate one during competition with his opponent. In a real community, a human sometimes also considers an intermediate action, for example waiting for a chance, without making his decision at once. He may watch first what his opponent does and take the opportunity at the next time step, or, sometimes, he may find himself in the opposite condition through exploitation by his opponent. Allowing the existence of intermediate actions enables us to make a more realistic approach to studying human interactions. The remainder of this paper is organized as follows. In the next section, we briefly introduce a dilemma problem and formalize it as the Compromise Dilemma with an intermediate action. In section 3, we describe how to implement the evolutionary learning algorithm. We test the model with different parameters for the GA operations and analyze the results in section 4. Finally, we discuss why the evolutionary approach can lead rational agents to provide profit to their community in section 5.
2 Game-theoretic Approach
2.1 Deadlock-avoidance Problem
To formalize a conflict-resolution problem, we consider a grid-lane environment in which mobile agents are navigating to their predefined goals according to their planned space-time paths, as shown in Figure 1. Here we assume that agents are unable to communicate with each other, so they must decide their actions by themselves. In Figure 1, agents x and y are moving towards points B and A respectively. In this case, if both agents go forward in their current directions, there will be a collision of the two agents. To avoid the collision, one of them must give up the way. If one is sure that his opponent will give up the way, it is profitable for him to advance. On the other hand, there may be a waste of the space-time resource if both of them give up the way without having any information about what the opponent intended to do.
172 N
V2
G
W
T
G
U=3
LS = 2
1 =2 i
W
o
=4 1
T
Figure 1: Deadlock-avoidance Problem
A=5
H E=5
V=1 C =0
Fi
S u r e 2 : P a y ° f f Matrix for CD with Intermediate Action
We formalize this competitive problem as the Compromise Dilemma (CD). It is a two-player game, similar to the so-called Chicken Game 1.
2.2 Compromise Dilemma Model
In the situation introduced in section 2.1, usual dilemma games allow each agent to choose an action, either "take the way" or "give the way", on each play. However, in a real community, human beings might consider an intermediate action such as "waiting for a chance" without making any decision at once. Therefore, in order to approach a more realistic model of evolution, we add an intermediate action "wait" to our dilemma game. The intermediate action means "do nothing in the current step and watch what the opponent does first". If his opponent gives up the way in the current step, the agent becomes the opportunist, because at the next step he can take the way without any disturbance. On the other hand, he will be a victim if his opponent takes the way in the current step, because he is tricked and must change his direction at the next step. According to the combinations of their actions, each agent gets a score according to the payoff matrix shown in Figure 2. In the payoff matrix, the row player is P1 and the column player is P2, respectively. We use the symbol "G" for the agent's action "give", "T" for "take" and "W" for "wait". Each entry of the matrix expresses the payoff that agent P1 gets for the corresponding combination of his and his opponent's actions. If both agents take the action "give", each one obtains a payoff U = 3 for "loss by unnecessary compromise". If both choose the action "take", both obtain C = 0 as "punishment for damage by collision". If one agent chooses the action "take" while his opponent chooses "give", he gets A = 5 for "advantage". In the opposite situation, he gets I = 2 for "intended compromise". In contrast with the Prisoner's Dilemma, when both agents play advance
(take), the payoff of either agent is lower than when the agent plays compromise (give) and his opponent plays advance (take). Obviously, if both of them compromise, they will avoid the crash and neither of them will be a winner or risk his life. If one of them swerves away for certain, he will be a "chicken" as in the Chicken Game 1, but will survive, with the result that the opponent gets all the honor. If they crash into each other, the cost for either of them will be higher than the cost of being a chicken. In addition to those combinations, we give a payoff O = 4 for "opportunist", LS = 2 for "lose but save", L = 2 for "lazy", V = 1 for "victim" and E = 5 for "exploitation". Notice here that a collision occurs only when both agents advance simultaneously. We assume that if an agent chooses "take" while his opponent chooses "wait", he just tricks his opponent in the current step, which causes no damage to his opponent. Therefore we give a payoff E = 5, the same as A, when he has exploited his opponent. Suppose we define u_p(a_i, a_j) as the score that agent p receives when agent p executes action a_i and his opponent executes action a_j. In this paper, the above payoff matrix satisfies the following conditions: u_p(T, T) …
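The payoff matrix of Figure 2 can be written directly as a lookup table; the sketch below (our own encoding) defines u_p for an agent playing a_i against an opponent playing a_j:

```python
# Payoff u_p(a_i, a_j) to the agent playing a_i against an opponent playing a_j,
# taken from Figure 2 (G = give, W = wait, T = take).
PAYOFF = {
    ("G", "G"): 3,  # U: loss by unnecessary compromise
    ("G", "W"): 2,  # LS: lose but save
    ("G", "T"): 2,  # I: intended compromise
    ("W", "G"): 4,  # O: opportunist
    ("W", "W"): 2,  # L: lazy
    ("W", "T"): 1,  # V: victim
    ("T", "G"): 5,  # A: advantage
    ("T", "W"): 5,  # E: exploitation
    ("T", "T"): 0,  # C: punishment for damage by collision
}

def u(a_i, a_j):
    return PAYOFF[(a_i, a_j)]
```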
3 Implementation of Evolutionary Learning
To study the evolutionary behaviors of agents in CD, a population of 100 individuals is used. Each individual in the population is a strategy represented as a Moore machine, a Finite State Automaton (FSA), as shown in Figure 3. An input symbol to the FSA is one of the actions {G, W, T}, which represents the latest action of the opponent and is expressed as the label of the corresponding transition arrow in the FSA. The output alphabet also consists of actions; the output represents the action of the agent for the current play and is written as the letter attached to each state. In this work, we assume that each initial FSA has only one state, and the maximum number of states in each FSA is limited to 8. The initial state is pointed to by a thick arrow. In Figure 3, as an example, if the initial state is s1 and the latest action of the opponent is "G", then a transition occurs from s1 to s2 along the arrow labeled "G", and the FSA returns the label "W" attached to state s2 as the action for the current play.
Figure 3. Expressing strategies by a Moore machine.

Following Axelrod 8, a genetic algorithm (GA) maintains a population of trial strategies. In each generation, each individual plays the iterated CD game against each of the other individuals in the same population. The fitness of a strategy (an individual) is the average payoff of all those games, defined as follows:

fitness(p_i) = (1 / (N − 1)) Σ_{j=1, j≠i}^{N} score(p_i, o_j)    (1)
where N is the population size and score(p_i, o_j) is the average payoff of individual p_i over a game of randomly many iterations against opponent individual o_j, defined by:

score(p_i, o_j) = (1 / Round) Σ_{n=1}^{Round} u_{p_i}(a_i^n, a_j^n)    (2)
where a_i^n (a_j^n) is the action taken by agent p_i (opponent o_j) in the nth iteration, and Round is the number of iterations, decided randomly. Initially, a population of 100 individuals, each of which has only one state, is generated randomly. Starting from the initial population, co-evolution goes on, requiring no prior knowledge of how to play the game well. As the population evolves, the individual strategies improve as the game goes on. After each generation, individuals are sorted according to their fitness. The 50 best individuals are transferred to the next generation, and the remaining individuals are discarded. Then, parent individuals are selected from the 50 elites by the roulette-wheel method, and the genetic operators of mutation, insertion and deletion are applied to the selected parents to generate 50 offspring for the next generation. Parent individuals are mutated with probability α. Internal states of a parent FSA are inserted with probability β and deleted with probability γ. These probabilities will be used as the parameters of our tests. The three genetic operators are implemented as follows:
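Equations (1) and (2) translate into a straightforward round-robin evaluation. The sketch below is our own illustration; it assumes a strategy object with reset() and play(opponent_last_action) methods (play returning G, W or T, with None on the first round) and reuses the payoff table of Figure 2:

```python
import random

# Payoff u_p(a_i, a_j) to the agent, from Figure 2 (G = give, W = wait, T = take)
PAYOFF = {("G", "G"): 3, ("G", "W"): 2, ("G", "T"): 2,
          ("W", "G"): 4, ("W", "W"): 2, ("W", "T"): 1,
          ("T", "G"): 5, ("T", "W"): 5, ("T", "T"): 0}

def score(p, o):
    """Equation (2): average payoff of strategy p over one iterated game
    of randomly chosen length against strategy o."""
    p.reset()
    o.reset()
    rounds = random.randint(20, 100)   # Round: the number of iterations, chosen at random
    a_last = b_last = None
    total = 0
    for _ in range(rounds):
        a = p.play(b_last)             # each side reacts to the opponent's latest action
        b = o.play(a_last)
        total += PAYOFF[(a, b)]
        a_last, b_last = a, b
    return total / rounds

def fitness(i, population):
    """Equation (1): average score of individual i against all other individuals."""
    others = [o for o in population if o is not i]
    return sum(score(i, o) for o in others) / len(others)
```

The range 20-100 for the game length is an arbitrary choice made for the sketch; the paper only states that the number of iterations is decided randomly.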
3.1 Genetic Operations
Mutation: An offspring is generated by random change of one of three transition conditions of a state, selected at random, of parent FSA.
Insertion: An offspring is generated by inserting a newly generated state at random position in the parent FSA. Three transition conditions of the new state to other existing states are set randomly.
Deletion: An offspring is generated by deleting a randomly selected internal state of the parent FSA. The transition arrows previously directed to that state are changed to other states randomly.
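The three operators described above could be sketched on a simple FSA encoding in which each state stores an output action and one transition per possible opponent action (the encoding and all names are our own, not the authors'):

```python
import random

ACTIONS = ("G", "W", "T")

def random_state(n_states):
    """A state = (output action, {opponent action -> next state index})."""
    return (random.choice(ACTIONS), {a: random.randrange(n_states) for a in ACTIONS})

def mutate(fsa):
    """Randomly change one of the three transition conditions of a random state."""
    i = random.randrange(len(fsa))
    out, trans = fsa[i]
    trans = dict(trans)
    trans[random.choice(ACTIONS)] = random.randrange(len(fsa))
    child = list(fsa)
    child[i] = (out, trans)
    return child

def insert(fsa, max_states=8):
    """Insert a newly generated state (if the limit of 8 states is not yet reached)."""
    if len(fsa) >= max_states:
        return list(fsa)
    return list(fsa) + [random_state(len(fsa) + 1)]

def delete(fsa):
    """Delete a random state and redirect arrows that pointed to it."""
    if len(fsa) <= 1:
        return list(fsa)
    i = random.randrange(len(fsa))
    child = [s for j, s in enumerate(fsa) if j != i]
    for k, (out, trans) in enumerate(child):
        child[k] = (out, {a: (random.randrange(len(child)) if t == i else t - (t > i))
                          for a, t in trans.items()})
    return child
```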
4 Experimental Results
Figure 4. Plots of the population-averaged fitness in each generation: (a) α = 0.5, β = 0.25, γ = 0.25; (b) α = 0.1, β = 0.05, γ = 0.05; (c) α = 0.04, β = 0.02, γ = 0.02; (d) means of 10 runs of (a), (b) and (c).
In this paper, we present two experiments on our model. In each one, a population of 100 trial strategies evolves for a certain number of generations. In the first experiment, in each generation, each individual plays an iterated game against the other members of the same population (round-robin). The number of iterations in each game is decided randomly. We ran the test with different probability parameters for the genetic operations. 10 runs for each of
three parameter sets, (α = 0.5, β = 0.25, γ = 0.25), (α = 0.1, β = 0.05, γ = 0.05) and (α = 0.04, β = 0.02, γ = 0.02), were made, and three of them are plotted in Figures 4(a), 4(b) and 4(c) respectively. The mean values of the 10 runs for each of the three parameter sets are plotted in Figure 4(d). In the test with parameter set (α = 0.5, β = 0.25, γ = 0.25), 3 out of 10 runs did not reach the optimal score, for example run 9 in Figure 4(a). In the other two tests, all runs reached a nearly optimal score (the optimal score is 3.5) after 400 generations. In all three tests, we can see a tremendous change of the population-averaged fitness in the earlier generations. As there is only one state in each FSA in the initial generation, agents behave blindly. An individual has no other incentive than the output value of its initial state. If the output of the initial state is "G", the agent gives up the way whenever he meets the dilemma, regardless of what his opponent's action would be. On the other hand, if the output of the initial state is "T", he advances without considering whether that would risk his life, and so on. As evolution proceeds, population members become better judges of each other. Genetic mutations create more states in the FSAs, so an individual can alter its transition paths to get a higher score. From the mean plots in Figure 4(d), we find that the less the individuals are genetically mutated, the longer the evolution takes to reach saturation, but the higher the saturated (nearly optimal) score they can get.
Figure 5. Results for the second experiment: (a) 3 out of 10 runs; (b) mean of the 10 runs.
On the other hand, in a real-world problem, for example the multi-agent burden-carriage problem, agents cannot predict when and where they will meet which opponent. Therefore, in the second experiment, we let individuals
play iterated games against only part of the members of the population. Each agent plays games of random numbers of iterations with randomly selected opponents. We fixed the probability parameter set as (α = 0.04, β = 0.02, γ = 0.02). In this setting, individuals cannot reach their optimal score within 500 generations. 10 runs were made; three of them are plotted in Figure 5(a) and the mean of the 10 runs is plotted in Figure 5(b). We found that agents can evolve to reach their optimum after 1000 generations in almost all runs. In all tests, agents behave blindly in the earliest generations. As evolution proceeds, they improve in playing the game by taking more and more complex actions, and cooperative interactions emerge. Here, cooperative interaction in CD means that agents take their actions alternately to avoid damage or loss. In later generations, agents keep their cooperative interactions while generating the optimal score and keeping their community peaceful.
5 Conclusions
In this paper, we proposed a game model for Compromise Dilemma problems and observed the evolutionary behaviors of simulated agents in such a model. According to the experimental results, the evolutionary approach enables the agents to evolve their own strategies for dealing with an uncertain environment in an intelligent manner. The reason is that autonomous agents are able to share the maximum average score by avoiding two kinds of extreme cases: the occurrence of damage by collision and the undesired loss of space-time (resource). In other words, agents can achieve an optimal strategy that enables them to utilize the resources of the environment as much as possible. The Compromise Dilemma is naturally a competitive problem in which all individuals try to maximize their own benefit. However, since their opponents also concurrently evolve in the same way to upgrade their fitness, cooperative interactions between agents are established. Accordingly, the population fitness escalates as the evolution goes on. Following the experimental results, we speculate that rational agents yield a communal profit under suitable circumstances. Our future work is to use this model to simulate real-world problems.
Acknowledgments This work is partially supported by the Hori Information Science Promotion Foundation, Japan.
A STRATEGY FOR CREATING INITIAL DATA ON ACTIVE LEARNING OF MULTI-LAYER PERCEPTRON
KAZUNORI IWATA AND NAOHIRO ISHII
Dept. of Intelligence and Computer Science, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya, 466-8555, Japan
E-mail: {kiwata,ishii}@egg.ics.nitech.ac.jp
Keywords: active learning, multi-layer perceptron, network inversion, pseudo-random number, low-discrepancy sequence
Many active learning methods for the training of a partially trained Multi-Layer Perceptron (MLP) have been proposed. We note that the performance of any active learning method depends on the initial training data. The initial training data plays an important role in active learning performance, because any active learning algorithm generates additional training data, useful for improving the classification accuracy, based on the initial training data. Most conventional methods generate initial data at random using pseudo-random numbers. However, in practical cases, we cannot prepare enough data because of limits on time and cost. Therefore, the bias of the initial training data becomes critical, especially when the dimension of the input space is large. In this paper, we propose a strategy that uses low-discrepancy sequences to create more uniform initial data than pseudo-random numbers. For the classification problem of the MLP, we analyze the experimental performance of the network inversion algorithm when a pseudo-random number sequence and a low-discrepancy sequence are used as initial training data. The experimental results show that low-discrepancy sequences provide a good strategy for creating initial training data. Finally, we also discuss some advantages and disadvantages of low-discrepancy sequences as initial training data.
1 Introduction
Learning by the use of queries through a training data generation mechanism is well known as active learning.1,2 Active learning includes interaction with an oracle, which always responds with a correct answer when queried with an example. In other words, the oracle gives a correct classification for a given data point. Examples of oracles include a human expert, costly experimentation, computer simulators and so on. The classifier adds the point, properly classified by the oracle, as training data. Such learning with additional training data can significantly increase the resulting classification accuracy at a small computational cost,1,2 and has recently attracted considerable attention. In this paper, we consider only the case of active learning on the Multi-Layer Perceptron (MLP).3 Many active learning methods for the training of a partially trained MLP have been proposed.1,2,4,5,6,7 We note that the performance of any active learning method depends on
the initial training data. The initial training data plays an important role in active learning performance, because any active learning algorithm generates additional training data, useful for improving the classification accuracy, based on the initial training data. In practical cases, it is desirable to prepare varied initial data, that is, data distributed uniformly over the given space. There are several reasons why uniformly distributed data are required. One is that each class should contain at least a few initial data points, because if no training data initially exist within a class region, most active learning algorithms cannot refine its classification boundary. However, in many cases, we cannot recognize each class region in advance. A good strategy is to prepare data that are as uniform as possible over the given space while avoiding repetition of the same data point. Another reason is that the active learning algorithm should be able to detect the whole boundary; a bias in the initial data may cause a classification bias over the given space. Most conventional methods generate initial data at random using pseudo-random numbers. By the law of large numbers and the central limit theorem, pseudo-random numbers distribute uniformly over a given space as the number of data points approaches infinity. However, in practical cases, we cannot prepare enough data because of limits on time and cost. Therefore, the bias of the initial training data becomes critical, especially when the dimension of the input space is large. In this paper, we propose a strategy that uses low-discrepancy sequences to create more uniform initial data than pseudo-random numbers. For the classification problem of the MLP, we analyze the experimental performance of the network inversion algorithm when a pseudo-random number sequence and a low-discrepancy sequence are used as initial training data. The network inversion algorithm is one of the effective active learning methods for creating additional training data, in terms of independence from the input distribution, computational cost, and implementation complexity. The organization of this paper is as follows. In section 2, we briefly explain the back-propagation and network inversion algorithms. Low-discrepancy sequences are discussed in section 3. In section 4, for the two-class classification problem, we compare the experimental performance obtained with a pseudo-random number sequence and with a low-discrepancy sequence as initial training data, and discuss some advantages and disadvantages of low-discrepancy sequences. Finally, we summarize and give some conclusions in section 5.

2 Dynamics of the Multi-Layer Perceptron
It is helpful to review the dynamics of MLP before moving to the main task. We start with the forward and learning (backward) phases of MLP, and then
proceed to the Network Inversion (NI) algorithm.

2.1 Forward and Learning Dynamics
Let the number of layers be L and let the lth layer have N_l neurons. The 1st layer, the Lth layer and the other (2nd to (L-1)th) layers are called the input layer, the output layer and the middle layers, respectively. The output at each layer is expressed by the following equations:

u_i(l) = \sum_{j=1}^{N_{l-1}} w_{ij}(l)\, a_j(l-1) + \theta_i(l)    (1)

a_i(l) = f(u_i(l))    (2)

where u_i(l) and a_i(l) denote the net value and the activation value of the ith neuron at the lth layer, respectively, \theta_i(l) is the bias of the ith neuron at the lth layer, w_{ij}(l) denotes the weight connecting the jth neuron at the (l-1)th layer and the ith neuron at the lth layer, and f(\cdot) is an activation function (e.g. the sigmoid function). The back-propagation method is the most popular method for learning in MLPs. Using an iterative gradient descent algorithm, the mean squared error E between the teaching vector t = (t_1, \ldots, t_{N_L}) and the actual output vector a(L) = (a_1(L), \ldots, a_{N_L}(L)) is minimized according to the rules:

w_{ij}(l) \leftarrow w_{ij}(l) - \eta \frac{\partial E}{\partial w_{ij}(l)}    (3)

\theta_i(l) \leftarrow \theta_i(l) - \eta \frac{\partial E}{\partial \theta_i(l)}    (4)

where \eta is the learning rate, and the mean squared error E and the error signal \delta_i(l) are calculated recursively:

E = \frac{1}{2} \sum_{i=1}^{N_L} (t_i - a_i(L))^2    (5)

\delta_i(l) = \frac{\partial E}{\partial a_i(l)} = \begin{cases} -(t_i - a_i(L)) & (l = L) \\ \sum_{j=1}^{N_{l+1}} \delta_j(l+1) \frac{\partial a_j(l+1)}{\partial a_i(l)} & (\text{otherwise}) \end{cases}    (6)
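To make equations (1)-(6) concrete, here is a minimal NumPy sketch of the forward pass and of one back-propagation update; the sigmoid activation, the variable names and the interface are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, weights, biases):
    """Forward pass, eqs. (1)-(2): returns the activation a(l) of every layer."""
    activations = [x]
    for W, b in zip(weights, biases):
        u = W @ activations[-1] + b        # net value u_i(l)
        activations.append(sigmoid(u))     # activation a_i(l) = f(u_i(l))
    return activations

def backprop_step(x, t, weights, biases, eta=0.01):
    """One gradient-descent update of the weights and biases, eqs. (3)-(6)."""
    a = forward(x, weights, biases)
    delta = -(t - a[-1])                               # error signal at the output layer, eq. (6)
    for l in reversed(range(len(weights))):
        grad_u = delta * a[l + 1] * (1.0 - a[l + 1])   # dE/du for sigmoid units
        prev_delta = weights[l].T @ grad_u             # dE/da at layer l (recursive case of eq. (6))
        weights[l] -= eta * np.outer(grad_u, a[l])     # eq. (3)
        biases[l]  -= eta * grad_u                     # eq. (4)
        delta = prev_delta
    return weights, biases
```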
2.2 Network Inversion Algorithm
The NI algorithm7,8 is designed to move each existing data point to a specific boundary point. The idea is similar to the back-propagation algorithm. In the NI algorithm, using a gradient descent algorithm, the error signal \varepsilon is propagated from the output layer to the input layer in order to update the input vector so that the desired output vector \tau = (\tau_1, \ldots, \tau_{N_L}) is produced by the network:

E = \frac{1}{2} \sum_{i=1}^{N_L} (\tau_i - a_i(L))^2    (7)

\varepsilon_i(l) = \frac{\partial E}{\partial a_i(l)} = \begin{cases} -(\tau_i - a_i(L)) & (l = L) \\ \sum_{j=1}^{N_{l+1}} \varepsilon_j(l+1) \frac{\partial a_j(l+1)}{\partial a_i(l)} & (\text{otherwise}) \end{cases}    (8)
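A matching sketch of the input update implied by equations (7)-(8), reusing the forward helper from the previous sketch; the learning rate, the iteration count and the clipping of the input to the unit hypercube are illustrative assumptions.

```python
import numpy as np

def invert_input(x, tau, weights, biases, eta=0.01, steps=1000):
    """Gradient descent on the *input* so that the network output approaches tau, eqs. (7)-(8)."""
    x = x.copy()
    for _ in range(steps):
        a = forward(x, weights, biases)
        eps = -(tau - a[-1])                           # eps_i(L), eq. (8)
        for l in reversed(range(len(weights))):
            grad_u = eps * a[l + 1] * (1.0 - a[l + 1])
            eps = weights[l].T @ grad_u                # propagate the error signal towards the input
        x -= eta * eps                                 # move the data point towards the boundary
        x = np.clip(x, 0.0, 1.0)                       # keep the point inside the unit hypercube (assumption)
    return x
```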
The NI algorithm works concurrently with the back-propagation algorithm.

2.3 Additional Data by Network Inversion
In order to present the classification problem concisely and without loss of generality, we consider an MLP whose outputs are designed to produce a two-class classification value for an input vector. That is, each output neuron a_i(L) (1 \le i \le N_L) is trained to output either 1 for one class or 0 for the other. The input vector which produces the desired output vector \tau, typically \tau = 0.5, can be considered to lie on the classification boundary of the MLP. In other words, from an ambiguity point of view, the input vector corresponding to the output vector \tau lies in the region of maximum classification ambiguity (see Figure 1). We employ the NI algorithm to invert initial training data toward the region of maximum classification ambiguity, and use the inverted data as additional training data so that the MLP effectively improves the boundary. Such additional training data can significantly increase the resulting classification accuracy. Note that any additional training data is created based on the initial training data.

Figure 1. A concept of the region of maximum classification ambiguity

3 Low-Discrepancy Sequences

In general, any active learning algorithm can generate effective additional training data to improve the classification accuracy. The NI algorithm, which is one such active learning algorithm, also creates additional data based on the initial training data, as discussed above. This means that the initial training data plays an important role in its performance. Suppose that we can generate any point inside a given input space, and are allowed to obtain the proper classification (teaching signal) through interaction with the oracle, but that the class regions are not known in advance. A good strategy is then to generate initial data as uniformly as possible inside the input space, without repetition of the same data point, so that at least a few initial data points exist inside each class and the whole boundary can be detected. Most traditional approaches use pseudo-random numbers to generate data uniformly. The low-discrepancy sequence9 (LDS) is well known in the field of the Quasi-Monte Carlo method. One of its notable features is, as the name says, low discrepancy; discrepancy is a measure of the uniformity of the distribution of a finite point set. In short, LDSs create more uniformly distributed data than pseudo-random numbers for a given space. In addition, LDSs never generate the same data point twice. For many cases of multidimensional integration, the Quasi-Monte Carlo method using an LDS is more effective than the conventional Monte Carlo method using pseudo-random numbers. We employ an LDS as a strategy for creating initial training data for multidimensional classification. In this section, we briefly review the basics of LDSs first, and then explain the Faure sequence, which is a kind of LDS.
3.1 Discrepancy
To carry the discussion of the properties of LDSs further, let us define the term discrepancy in detail. Let x(n) = (x_1(n), \ldots, x_K(n)) be the nth training data point of K dimensions and let E(x) be the subset [0, x_1) \times \cdots \times [0, x_K) of the K-dimensional hypercube [0,1]^K. The L_2-discrepancy T_K(N) of the training data set P = \{x(n) \mid n = 1, \ldots, N\}, measured in the L_2 norm on the Lebesgue-integrable function space, is defined as follows:

T_K(N) \stackrel{\mathrm{def}}{=} \left\{ \int_{[0,1]^K} \left( \frac{\#(E(x) \mid N)}{N} - \prod_{k=1}^{K} x_k \right)^2 dx \right\}^{1/2}    (9)

where \#(E(x) \mid N) denotes the number of data points inside E(x). In the same way, the L_max-discrepancy D_K(N), measured in the maximum norm, is defined by the following equation:

D_K(N) \stackrel{\mathrm{def}}{=} \sup_{x \in [0,1]^K} \left| \frac{\#(E(x) \mid N)}{N} - \prod_{k=1}^{K} x_k \right|    (10)

Equations 9 and 10 quantify the uniformity of the distribution of a set of N data points in the respective norms. Only the L_2-discrepancy is known to be computable in closed form, by the following equation:

T_K(N)^2 = \frac{1}{N^2} \sum_{n=1}^{N} \sum_{m=1}^{N} \prod_{k=1}^{K} \bigl(1 - \max\{x_k(n), x_k(m)\}\bigr) - \frac{2^{1-K}}{N} \sum_{n=1}^{N} \prod_{k=1}^{K} \bigl(1 - x_k(n)^2\bigr) + 3^{-K}    (11)

For N \ge 1, the relation between the L_2-discrepancy and the L_max-discrepancy satisfies

T_K(N) \le D_K(N)    (12)

With a large number of training data points distributed as uniformly as possible, we can consider that, asymptotically,

D_K(N) \to 0 \quad (N \to \infty)    (13)

Equations 12 and 13 lead to

T_K(N) \to 0 \quad (N \to \infty)    (14)

An LDS satisfies the following discrepancy bound for N > 1:

D_K(N) \le c_K \frac{(\log N)^K}{N}    (15)

where c_K is a constant that depends on the dimension K. Multidimensional LDSs include the Halton, Sobol', Faure and other sequences.9,10 We concentrate on the Faure sequence in the next section and leave the details of the other sequences to references 9 and 10.
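Assuming the reconstruction of equation (11) above is the usual closed form for the L_2-discrepancy, a small NumPy routine to evaluate it for a finite point set might look as follows (the function and variable names are ours).

```python
import numpy as np

def l2_discrepancy(points):
    """L2-discrepancy of an (N, K) point set in [0,1]^K, computed with equation (11)."""
    N, K = points.shape
    # pairwise term: (1/N^2) sum_{n,m} prod_k (1 - max(x_k(n), x_k(m)))
    pairwise = np.prod(1.0 - np.maximum(points[:, None, :], points[None, :, :]), axis=2).sum() / N ** 2
    # single-point term: (2^{1-K}/N) sum_n prod_k (1 - x_k(n)^2)
    single = (2.0 ** (1 - K) / N) * np.prod(1.0 - points ** 2, axis=1).sum()
    return np.sqrt(pairwise - single + 3.0 ** (-K))
```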
3.2 Faure Sequence
The Faure sequence is generated based on a prime number p no smaller than K, used as the radix for the K-dimensional problem. The first step in the calculation of the nth data point is to compute the first element x_1(n) as follows:

x_1(n) = \sum_{m=0}^{\infty} a_{1,m}(n)\, p^{-m-1}    (16)

where a_{1,m}(n) are the digits of the following base-p expansion:

n = \sum_{m=0}^{\infty} a_{1,m}(n)\, p^{m}    (17)

Then, in the next step, the other elements x_k(n) (2 \le k \le K) are computed as

x_k(n) = \sum_{m=0}^{\infty} a_{k,m}(n)\, p^{-m-1}    (18)

where each a_{k,m}(n) is the number which satisfies the following equation:

\begin{pmatrix} a_{k,0}(n) \\ a_{k,1}(n) \\ a_{k,2}(n) \\ \vdots \end{pmatrix} = \begin{pmatrix} {}_0C_0 & {}_1C_0 & {}_2C_0 & \cdots \\ 0 & {}_1C_1 & {}_2C_1 & \cdots \\ 0 & 0 & {}_2C_2 & \cdots \\ \vdots & & & \ddots \end{pmatrix}^{k-1} \begin{pmatrix} a_{1,0}(n) \\ a_{1,1}(n) \\ a_{1,2}(n) \\ \vdots \end{pmatrix} \pmod{p}    (19)

where {}_jC_i denotes the number of combinations of j items taken i at a time. We use the Faure sequence as a typical LDS in the experiments in the next section.
4 Experimental Results and Discussion
To keep the setting simple without loss of generality, we take up the two-class hyper-sphere classification problem with a K-dimensional input vector, where the classification target is 1 inside the hyper-sphere and 0 otherwise, within the hypercube [0,1]^K, as shown in Figure 2. That is, when queried with the nth data point, the correct classification is

t(n) = \begin{cases} 1 & \text{if } \sum_{k=1}^{K} (x_k(n) - 0.5)^2 \le r^2 \\ 0 & \text{otherwise} \end{cases}    (20)

where r denotes the radius of the hyper-sphere. All training data are generated inside the hypercube [0,1]^K. First, the MLP was trained with each set of initial training data; the training was concurrent with inverting the initial training data toward the boundary point.
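For reference, equation (20) corresponds to the following simple oracle; the non-strict inequality is our reading of the garbled condition.

```python
import numpy as np

def oracle(x, r):
    """Correct class label for an input x in [0,1]^K, as in equation (20)."""
    return 1 if np.sum((x - 0.5) ** 2) <= r ** 2 else 0
```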
Figure 2. Two-class hyper-sphere classification problem (the sphere is centred at (0.5, ..., 0.5))
Table 1. Each parameter of MLP training

Dimension of input vector (K)          2     3     4     6     8    10
Number of neurons in middle layer      3     4     5     5     5     5
Radius of hyper-sphere (r)           0.3   0.4   0.5   0.6   0.7   0.8
Number of initial training data       75   400   600   900  1200  1500
Then, after convergence, the resulting inverted data were given correct classifications by the oracle. Second, the MLP was re-trained with the combination of the original data and the correctly classified inverted data. Finally, the MLP classified 10^4 validation data points distributed uniformly inside the input space [0,1]^K. We evaluated the classification accuracy of the MLP by the misclassification ratio. We set the learning rate \eta to 0.01 and initialized each weight randomly within [-0.05, 0.05]. Table 1 shows the structure of the three-layer perceptron, the radius of the hyper-sphere and the number of initial training data points. Figures 3 and 4 give graphical representations of the inverted data in two dimensions based on a pseudo-random number sequence and on the Faure sequence, respectively. The circle in each figure denotes the true boundary. These figures show how well the NI algorithm detects the whole boundary. For the classification accuracy, it is important to create additional training data along the whole boundary. As the figures indicate, the inverted data based on pseudo-random numbers failed to detect the lower part of the boundary. By contrast, the inverted data based on the Faure sequence detected the whole boundary well, so that the classification accuracy can be improved.
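The experimental procedure just described can be summarised in the following condensed sketch, which reuses faure_point, oracle, backprop_step, invert_input and forward from the earlier sketches; the network shape, epoch counts and parameter values shown are illustrative only (the actual per-dimension settings are those of Table 1).

```python
import numpy as np

K, r, n_init = 2, 0.3, 75
init_x = np.array([faure_point(n, dim=K, p=2) for n in range(1, n_init + 1)])
init_t = np.array([oracle(x, r) for x in init_x])

weights = [np.random.uniform(-0.05, 0.05, (3, K)), np.random.uniform(-0.05, 0.05, (1, 3))]
biases  = [np.zeros(3), np.zeros(1)]

for _ in range(1000):                                   # initial training
    for x, t in zip(init_x, init_t):
        backprop_step(x, np.array([t], dtype=float), weights, biases)

inverted = np.array([invert_input(x, np.array([0.5]), weights, biases) for x in init_x])
extra_t  = np.array([oracle(x, r) for x in inverted])   # oracle labels the inverted points

all_x, all_t = np.vstack([init_x, inverted]), np.concatenate([init_t, extra_t])
for _ in range(1000):                                   # re-training with the additional data
    for x, t in zip(all_x, all_t):
        backprop_step(x, np.array([t], dtype=float), weights, biases)

val = np.random.rand(10_000, K)                         # uniform validation set
pred = np.array([forward(x, weights, biases)[-1][0] > 0.5 for x in val])
error = np.mean(pred != np.array([oracle(x, r) for x in val]))
```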
Figure 3. Inverted data based on a pseudo-random number sequence
Figure 4. Inverted data based on the Faure sequence
Table 2. Misclassification ratio (%) in each dimension

Dimension of input vector (K)      2      3      4      6      8     10
Pseudo-random number            3.00   4.39   6.23  16.47  27.68  25.34
Faure sequence                  1.74   3.65   5.82  15.47  25.58  23.30
Table 2 shows the experimental results, averaged over 5 simulations. We found that the MLP trained with the Faure sequence classifies better than the one trained with pseudo-random numbers, especially in the high-dimensional cases. It follows from these results that LDSs provide a good strategy for generating initial training data for the classification problem. The superiority of LDSs can be explained by the sampling principle in statistics: data should be sampled uniformly over the sample space so as to avoid sampling bias. As a drawback, LDSs tend to introduce systematic artifacts compared with pseudo-random numbers; however, this drawback disappears as the number of data points is increased.

5 Conclusion
In this paper, we discussed the use of LDSs for generating initial training data in active learning of MLPs. The use of an LDS is designed to create initial training data uniformly, so that each class initially has at least a few data points and the whole
boundary is detected by the active learning algorithm, without repetition of the same training data point. In our experiments, we compared the performance of the NI algorithm when a pseudo-random number sequence and an LDS were used as the initial training data of the MLP. We showed experimentally that LDSs have an advantage over pseudo-random numbers as an effective generation method for initial training data, in that the classification accuracy can be improved. The good performance of LDSs is especially pronounced in the higher-dimensional cases. In future work, we would like to show a theoretical advantage of LDSs as a generation method for initial data in active learning.

References
1. Les Atlas, David Cohn, and Richard Ladner. Training connectionist networks with queries and selective sampling. Advances in Neural Information Processing Systems, 2:566-573, 1990.
2. Jenq-Neng Hwang, Jai J. Choi, Seho Oh, and Robert J. Marks II. Query-based learning applied to partially trained multilayer perceptrons. IEEE Transactions on Neural Networks, 2(1):131-136, 1991.
3. D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 323:533-536, 1986.
4. D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4:590-604, 1992.
5. Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Information, prediction, and query by committee. Advances in Neural Information Processing Systems, 5, 1993.
6. D. Cohn. Neural network exploration using optimal experiment design. Neural Networks, 9, 1996.
7. Hiroyuki Takizawa, Taira Nakajima, Hiroaki Kobayashi, and Tadao Nakamura. An active learning algorithm based on existing training data. IEICE Transactions on Information and Systems, E83-D(1):90-99, January 2000.
8. A. Linden and J. Kindermann. Inversion of multilayer nets. In Proceedings of the International Joint Conference on Neural Networks, pages 425-430, Washington DC, June 1989.
9. H. Niederreiter. Random Number Generation and Quasi-Monte Carlo Methods, volume 63 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, 1992.
10. Syu Tezuka. Uniform Random Numbers: Theory and Practice. Kluwer Academic Publishers, 1995.
EQUILIBRIUM SELECTION IN A SEQUENTIAL MULTI-ISSUE BARGAINING MODEL WITH EVOLUTIONARY AGENTS
NORBERTO EIJI NAWA1,2, KATSUNORI SHIMOHARA1,2, OSAMU KATAI2
1 ATR International - ISD, Soraku-gun, Kyoto 619-0288, Japan
2 Grad. School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
A multi-issue alternating-offers bargaining model is presented where the issues are sequentially negotiated by the agents, as opposed to the classical setting where they are disputed simultaneously in bundles. The strategies that determine the agents' negotiation behaviors are generated by evolutionary algorithms. Preliminary results show a qualitative conformity with game theoretic predictions. Moreover, they suggest that, in specific situations, the sequential setting can lead to better outcomes concerning social welfare.
1 Introduction
Recent years have witnessed an intense cross-fertilization between economics and computer science, more specifically with the area of artificial intelligence (AI).1 Negotiation is the coordination mechanism that involves the interaction of two or more parties with heterogeneous, possibly conflicting preferences, searching for a compromise that is satisfactory and mutually beneficial, so as to be accepted by all participants. It has long been a subject of study in economics, but recently it has also attracted the interest of AI researchers, due to its direct implications for the implementation of multi-agent systems. This paper reports on preliminary results of experiments performed with a sequential multi-issue bargaining model. The players have their bargaining strategies developed by means of a class of evolutionary algorithms named evolution strategies (ES).2 Differently from the classical setting, where the issues are disputed simultaneously in a single bundle, in the present model each issue is negotiated individually, in sequence. Our interest in the sequential setting of bargaining processes lies in the fact that often the negotiated issues have time-varying, inter-dependent complementarities. That is, from the point of view of the players, the requirements with regard to a certain issue may change depending on the results of negotiations with regard to other issues. If the negotiation occurs over bundles of issues, the players have to consider the inter-relationships in advance in order to calculate the utilities of the possible outcomes and settle on an agreement that provides a satisfactory trade-off. On the other hand, by negotiating the issues sequentially, it is expected that these inter-issue relations are dealt with more naturally.
2 Bargaining Models
A bargaining situation consists of two or more players trying to engage in a mutually beneficial agreement over a set of issues. The players, or agents, have a common interest in cooperating; the question that remains open is which one of the possibly several compromise settings will be chosen by the players.3 That decision should be deliberated by the participating agents, in light of their different, perhaps incompletely revealed, conflicting preferences. The seminal work by Rubinstein4 set the dominant tone in the systematic analysis of bargaining games. Rubinstein started by illustrating the typical situation using the following scenario: two players, A1 and A2, have to reach an agreement on the partition of a "pie". For this purpose, they alternate offers describing possible divisions of the pie, such as "A1 receives x and A2 receives 1 - x".
For a setting where agents A1 and A2 are penalized with discount factors \delta_1 and \delta_2, respectively, and assuming that A1 is granted the first offer, the composition of the P.E.P. (perfect equilibrium partition) contract is that player A1 receives a share of the pie which returns her a utility of U_1 = (1 - \delta_2)/(1 - \delta_1 \delta_2), whereas player A2 gets a share that returns him a utility of U_2 = \delta_2 (1 - \delta_1)/(1 - \delta_1 \delta_2). It is possible to perform a similar analysis for the finite-horizon case. Say the maximum number of steps in the game, n, is common knowledge to the players. In the case where n = 1 (also known as the ultimatum game), agent A1 makes the only offer; A2 can accept it or refuse it; in either case the negotiation process ends. If the offer is refused, both agents receive nothing. For a rational agent "anything is better than nothing"; therefore A1, knowing about the rationality of its opponent, will tend to keep the whole pie to herself, offering only a minimum share to A2; aware that there are no further stages to be played in the game, rational A2 inevitably accepts the tiny offer. Applying backward induction to the situation above, it is possible to calculate the P.E.P. for n > 1. For values of \delta close to 1, finite-horizon alternating-offers bargaining games give a great advantage to the player making the last offer, since the game becomes similar to an ultimatum game.
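These equilibrium utilities are easy to evaluate; a small sketch (names ours) is given below.

```python
def rubinstein_pep(delta1, delta2):
    """Perfect-equilibrium utilities in the infinite-horizon alternating-offers game,
    with agent A1 making the first offer."""
    u1 = (1 - delta2) / (1 - delta1 * delta2)
    u2 = delta2 * (1 - delta1) / (1 - delta1 * delta2)
    return u1, u2

# e.g. two equally patient players: rubinstein_pep(0.9, 0.9) -> (~0.526, ~0.474)
```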
3 Evolutionary Computation and Economic Agents
The dissertation by Oliver5 was the first work to show that evolutionary algorithms can be used in the design of strategies for multi-issue negotiations. Oliver's motivation originated from the observation that negotiation problems are rather inefficiently resolved by humans, who often settle on suboptimal agreements. In his framework, a strategy consists of a vector of numbers that represent offer and threshold values. Offer values indicate the portion that the proposer is willing to share with an opponent; threshold values correspond to the minimum value a received offer must have in order to be accepted. The work by van Bragt, Gerding and La Poutre6 has a different spirit from the previous one; an interesting and detailed game-theoretic analysis of the evolved trading strategies is performed. Using Rubinstein's alternating-offers bargaining game with a finite horizon as a base model, they perform several numerical experiments with traders that evolve strategies in the same format devised by Oliver.5 The results show that despite the bounded rationality of the traders (since they are only aware of their own payoffs and discount factor), the evolved behaviors are aligned with what is predicted by game theory.
4 Sequential Multi-issue Bargaining Model
In the classic multi-issue alternating-offers model, the issues are negotiated simultaneously, in a bundle. If there are N issues in the setting, trader A_k makes an offer to its opponent by passing a vector O_k = (o_1, o_2, o_3, \ldots, o_N) with one offer relative to each of the issues. Usually in multi-issue situations, a trader describes its preferences over the several issues through a vector of weights, W_k = (w_1, w_2, w_3, \ldots, w_N), indicating the relative importance of the issues. If an offer O_k is accepted, the offering trader receives a utility of (I - O_k) \cdot W_k, where I is the unit vector, assuming that 0 \le o_i \le 1, i \in \{1, 2, \ldots, N\}. Accordingly, the agent receiving the offer gains a utility of O_k \cdot W_q, where W_q denotes its own weight vector. However, the issues are often inter-related and complementary; the utility of an issue is a function of the values obtained from other issues. One could devise situations where the weights attributed to the issues change according to the value obtained in other issues, or vary as a function of some other external parameter, such as time. If there are single issues or subgroups of issues within the whole set that are substitutable, it may be the case that the utility obtained with one issue or subgroup of issues affects the weight assigned to other issues. Building on an example presented by Boutilier et al.,7 if a producer is negotiating with a transportation company the most suitable way to carry its goods to the consumers, the agenda of issues may contain options such as trucks, ships, and airplanes. However, if the producer succeeds in obtaining a reasonable deal on the trucks, the utility of the ships and airplanes is diminished. Negotiating all the issues at once in such a scenario demands that the agent consider all the inter-dependencies between the issues before computing the utility of a contract or making an offer to its opponent. The calculation of all the possible combinations and trade-offs can be computationally expensive, especially if the number of related issues is large. By negotiating the issues sequentially, this cost can be naturally avoided.

4.1 Model Description and Experiments

Experiments were performed^a with a model inspired by van Bragt et al.'s framework.6 Two bargaining agents, A1 and A2, each equipped with its own evolutionary algorithm, optimize the numerical parameters of the negotiation strategies. The strategies consist of vectors of floating point numbers in
the interval [0,1], encoding offers and thresholds, like the strategies employed by Oliver.5 Being a finite-horizon model, the total number of offers that can be exchanged between the traders has a maximum value of n. If n is even, as A1 always makes the first offer, the last offer is granted to A2. If n is odd, A1 has both the first and the last offer. Traders should reach an agreement before the maximum number is exceeded, otherwise they receive a null payoff. As the issues are negotiated in sequence, each strategy corresponds to a set of N sub-strategies, each relative to one issue. Each agent uses a conventional (\mu + \lambda) evolution strategy (ES).2 In one complete iteration, all the strategies are evaluated and ordered according to their fitness values. In a (\mu + \lambda)-ES, the best \mu strategies (parents) remain in the set from one iteration to the next; in addition, \lambda new strategies (offspring) are produced at each iteration. Offspring are generated by applying operators such as mutation and recombination to the set of parents. In the experiments, only the mutation operator was employed when generating offspring. In an ES, mutation consists of adding or subtracting samples from a Gaussian distribution with standard deviation s to the parameters of a given parent strategy. The parameter s is self-regulated and determines the strength of the mutation. Each strategy keeps the s value of the Gaussian distribution from which it was generated; at each iteration, the average of the parents' standard deviations is used to produce the Gaussian distribution that generates the next set of offspring. Threshold and offer values were only accepted in the range [-1, 1] (negative values were used in their absolute form); any strategy that contained a value out of that range received a penalty, if the respective parameter was demanded by the negotiation process. The parameters \mu and \lambda were both set to 25. Each simulation instance was run for at least 750 generations. At every generation, each of the strategies owned by A1 had to confront a randomly chosen subset of size 25 of A2's strategies, and vice versa. The fitness value of a strategy was calculated as the mean value of all the payoffs and penalties obtained in the confrontations in one generation. A_k's payoff, U_k, was calculated as follows. Assume a deal on the first issue I is reached at t = T_I, yielding A_k a share of \alpha, and a deal on the second issue II is reached at t = T_{II}, yielding a share of \beta; then U_k is:

U_k = \frac{\delta^{T_I} \cdot \alpha \cdot w_I^{A_k} + \delta^{T_{II}} \cdot \beta \cdot w_{II}^{A_k}}{w_I^{A_k} + w_{II}^{A_k}}    (1)

^a The system was implemented using the Swarm Simulation System, developed by the Swarm Development Group. Detailed information about the software can be found at www.swarm.org. The source code used in the experiments described in this paper is available upon request.
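A direct transcription of equation (1) might look as follows; the argument names are ours.

```python
def payoff(delta, t1, t2, alpha, beta, w1, w2):
    """Utility of agent A_k in the sequential two-issue game, equation (1):
    a share alpha of issue I agreed at stage t1, a share beta of issue II at stage t2,
    both discounted by delta and weighted by the agent's weights w1 (issue I) and w2 (issue II)."""
    return (delta ** t1 * alpha * w1 + delta ** t2 * beta * w2) / (w1 + w2)
```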
Note that the discount factor is more severe on II's share, as it is negotiated at least one stage after issue I. Moreover, in this model of sequential
bargaining, if the traders cannot reach an agreement on the division of issue I, the confrontation is halted and the bargaining on issue II is canceled. In the first session of experiments, the influence of different values of \delta was investigated. The tested values of \delta were in the interval from 0 to 1, in increments of 0.1. The original amount of the pie at t = 0 is 1. The same value of \delta is applied to both traders and both issues. The vectors of weights for agents A1 and A2 were, respectively, (w_I^{A1}, w_{II}^{A1}) = (0.3, 0.7) and (w_I^{A2}, w_{II}^{A2}) = (0.7, 0.3). Figures 1 and 2 show the P.E.P.s predicted by a game-theoretic analysis for bargaining games of size n \to \infty (full line) and n = 10 (dashed line), for agents A1 and A2, respectively. These partitions were calculated by regarding the negotiation process of each of the issues as a single game. After calculating the values of each agent's shares in equilibrium for each of the games, the utilities were calculated by (1) discounting \delta on the share obtained from issue II, as it is negotiated one stage after issue I, and (2) weighting the equilibrium shares with the respective set of individual weights. The dotted lines are the payoffs obtained by the evolutionary traders (mean value over the whole set of strategies in the last 100 generations from a total of 750); 20 runs were performed for each of the \delta values. The vertical bars at each of the tested points show the standard deviation of the results.
Figure 1. Relation between the discount factor (horizontal axis) and agent A1's utility, in the multi-issue sequential model of sizes n = 10 (dashed) and n -> infinity (full). The dotted line shows the utility actually obtained by the evolutionary agent in the experiments, when (w_I^{A1}, w_{II}^{A1}) = (0.3, 0.7).
As noted by van Bragt et al.,6 despite the bounded rationality of the bargainers, who have no explicit representation of the size of the game or any knowledge about the opponent's discount factor values, the traders achieve
Figure 2. Same as in Figure 1, for agent A2's utility, when (w_I^{A2}, w_{II}^{A2}) = (0.7, 0.3).
outcomes which are qualitatively close to what is predicted by game-theoretical models. In these results, a phenomenon that was previously observed6 is also detected: A1 does a little better than the game-theoretic predictions, whereas A2 performs considerably worse. It has been suggested6 that the poor performance of A2 is due to the fact that, especially for small values of \delta, it is too costly for A2 not to accept A1's first offer; the latter, taking advantage of this fact, then offers very small shares of the pie. Indeed, it is also observed in the results of this first session that the smaller \delta is, the higher the frequency of deals that are closed right away in the first stage (Table 1). Also, it is interesting to notice that there is a great leap in the average length of the negotiation process between \delta = 0.9 and \delta = 1.0. Intuitively, one would expect the negotiation process lengths to grow smoothly, following the decrease in time pressure. However, the parameter \delta is strongly perceived by the evolutionary processes, which leads the agents to play the game as if there were only one offer to be exchanged, resulting in a great advantage for the agent that makes the first offer. Table 2 shows the average value of the first offer for issue I by A1 over all the strategies at the 750th generation, and the corresponding average threshold of A2, against which the first offer is checked. It is interesting to notice that despite the spread of A2's threshold values, the offers by A1 are equally spread and just high enough to be accepted right away by A2 (on average). In the second session of experiments, the vector of weights (w_I^{A1}, w_{II}^{A1}) was set to different values. 20 runs were performed for each case, with \delta = 0.9. The data shown in Table 3 are an average of the payoffs obtained in all confrontations in the last 100 generations (of a total of 750).
Table 1. Average number of stages until an agreement is reached as a function of the discount factor (average for both issues over 20 runs, with n = 10).

delta    # stages (sigma)
0.1      1.06 (0.00)
0.2      1.04 (0.00)
0.3      1.06 (0.00)
0.4      1.05 (0.01)
0.5      1.11 (0.01)
0.6      1.13 (0.01)
0.7      1.22 (0.02)
0.8      1.24 (0.02)
0.9      1.39 (0.04)
1.0      8.19 (0.12)
Table 2. A1's average first offer and A2's average first threshold for issue I, across all the strategies at the 750th generation, for each one of the runs (delta = 0.9, n = 10).

Run   A1 off.   A2 thr.     Run   A1 off.   A2 thr.
 1     0.28      0.01        11    0.63      0.55
 2     0.79      0.68        12    0.78      0.73
 3     0.55      0.45        13    0.73      0.62
 4     0.35      0.31        14    0.65      0.54
 5     0.55      0.37        15    0.14      0.05
 6     0.17      0.02        16    0.65      0.57
 7     0.85      0.80        17    0.38      0.30
 8     0.37      0.27        18    0.91      0.83
 9     0.79      0.68        19    0.48      0.38
10     0.82      0.72        20    0.35      0.28
Table 3. Results using different weights for A1, for 20 runs each. (w_I^{A2}, w_{II}^{A2}) = (0.7, 0.3), delta = 0.9, n = 10 (ut. = utility; * marks the values used in the previous session).

(w_I^{A1}, w_{II}^{A1})   A1 ut.   A1 ut. std.   A2 ut.   A2 ut. std.
(0, 1)                     0.42     0.01          0.64     0.01
(0.1, 0.9)                 0.47     0.00          0.53     0.01
(0.3, 0.7)*                0.43     0.01          0.50     0.00
(0.5, 0.5)                 0.48     0.00          0.42     0.00
(0.7, 0.3)                 0.48     0.00          0.43     0.00
(0.9, 0.1)                 0.54     0.00          0.39     0.00
(1, 0)                     0.53     0.01          0.46     0.00
Noticeably, there is a tendency for A1 to receive higher values of utility as w_I^{A1} increases. This can be justified as a combination of two factors. First, the influence of the discount factor applied to II's share decreases, as II's relative importance gradually diminishes in the total utility received by
A1. Second, as the importance of issue I increases, A1 exploits A2's inability to use the advantage of offering last, in the presence of a relatively slight discount factor. In the last session of experiments, a simple case of a negotiation over inter-substitutable issues was simulated. The values of w_{II}^{A_k} were made dependent on the size of the share obtained from issue I, i.e., if an agent obtained more than 0.9 of issue I, then w_{II} was set to 0.1, otherwise to 0.9. The weight assigned to issue I by both agents was fixed at 0.3. As n = 10 and \delta = 1.0, A2 has the last-offer advantage in both issues. We were particularly interested to see whether it would be possible for the players to engage in an agreement that is socially fair, i.e., once A2 obtained a large portion of issue I, and therefore became "less interested" in disputing issue II, would that allow A1 to obtain a more satisfactory share of issue II? In fact, the weight ratio between the most and least valued issues is kept constant (3:1) for both agents; however, whether issue I or II is the most valued depends on how the negotiation of I develops. As the relative importance of the issues may shift between them, they are considered to be substitutable. The question addressed is whether learning agents equipped with evolutionary algorithms are capable of achieving a mutually satisfactory solution in such a setting. Figure 3 shows the histograms of the utilities obtained by A1 in the setting with variable valuations (right) and in a test case (left), where both issues are valued equally with fixed weights by the players throughout the bargaining game (w_{II} = 0.5). This latter case can be interpreted as a situation where the agents regard the issues as being perfectly inter-substitutable; as the agents value them with the same importance, both issues are disputed with the same strength. From the results, it is possible to observe that, as expected, when the weights are fixed, A2 makes use of its last-offer advantage very frequently, yielding A1 very low payoffs (leftmost bar at U1 = 0). When the valuation is variable, although the frequency of low payoffs is still relatively high (i.e., A1 loses I and II entirely to A2), with almost the same frequency A1 is able to obtain everything of issue II. The bar at 0.7-0.8 in the right-hand figure covers the cases where A1 obtains practically nothing of I and almost everything of II, representing a situation of high social welfare, which suggests that there are situations where sequential negotiation can be beneficial.

Figure 3. Histograms for the average utility obtained by A1 over 50 runs, in the last 100 generations (total of 1000), with fixed (left) and variable weights (right).

5 Conclusions
This paper presented a model of a sequential multi-issue alternating-offers bargaining model, where the agents have their strategies devised by an evolutionary algorithm. Differently from the usual bargaining model, where several issues are negotiated simultaneously, in this setting the issues are disputed one
by one, in sequence. Numerical experiments were performed; the results are qualitatively aligned with game-theoretic predictions, as previously shown in a simultaneous multi-issue model,6 despite the fact that the evolving agents have no restrictions concerning rational behaviors. A simple case with inter-substitutable issues was also presented, illustrating a possible scenario where a sequential negotiation may actually be beneficial for both parties in reaching a satisfactory agreement.

Acknowledgments
Thanks to four anonymous reviewers for their helpful comments. NEN receives partial financial support from CNPq under grant #200050/99-0.

References
1. C. Boutilier, Y. Shoham, and M. P. Wellman, editors. Artificial Intelligence, vol. 94 (1-2), July 1997.
2. T. Back, G. Rudolph, and H.-P. Schwefel. Evolutionary programming and evolution strategies: Similarities and differences. Proc. of the 2nd Annual Evolutionary Programming Conference, 11-22, February 1992.
3. A. Muthoo. A non-technical introduction to bargaining theory. World Economics, 145-166, 2000.
4. A. Rubinstein. Perfect equilibrium in a bargaining model. Econometrica, 50(1):97-109, January 1982.
5. J. R. Oliver. On Artificial Agents for Negotiation in Electronic Commerce. PhD thesis, U. of Pennsylvania, 1996.
6. D. D. B. van Bragt, E. H. Gerding, and J. A. La Poutre. Equilibrium selection in alternating-offers bargaining models: The evolutionary computing approach. In 6th Int. Conf. of the Society for Computational Economics on Computing in Economics and Finance (CEF'2000), July 2000.
7. C. Boutilier, M. Goldszmit, and B. Sabata. Sequential auctions for the allocation of resources with complementarities. In Proc. of the Int. Joint Conf. on Artificial Intelligence (IJCAI-99), 527-534, 1999.
AFFECT AND AGENT CONTROL: EXPERIMENTS WITH SIMPLE AFFECTIVE STATES
MATTHIAS SCHEUTZ
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
E-mail: [email protected]
AARON SLOMAN
School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK
E-mail: [email protected]
We analyse control functions of affective states in relatively simple agents in a variety of environments and test the analysis in various simulation experiments in competitive multi-agent environments. The results show that simple affective states (like "hunger") can be effective in agent control and are likely to evolve in certain competitive environments. This illustrates the methodology of exploring neighbourhoods in "design space" in order to understand tradeoffs in the development of different kinds of agent architectures, whether natural or artificial.
1 Introduction
Affective states (such as emotions, motivations, desires, pleasures, pains, attitudes, preferences, moods, values, etc.) and their relations to agent architectures have been receiving increasing attention in AI and Cognitive Science.1,2,3 Detailed analyses of these subspecies of affect should include descriptions of their functional roles in contributing to useful capabilities within agent architectures,4 complemented by empirical research on affect in biological organisms and concrete experiments with synthetic agent architectures, to confirm that the proposed architectures have the claimed properties. Our approach contrasts with most evolutionary AI research, which attempts to discover what can evolve from given initial states. Instead, we explore "neighbourhoods" and "mini-trajectories" in design space, by starting with examples of agent architectures, explicitly providing possible extensions together with evolutionary operators that can select them, and running simulations to investigate which of the extensions have evolutionary advantages in various environments. This can show how slight changes in environments alter tradeoffs between design options. To illustrate this methodology, we next analyse functional roles of affective states and then describe our simulation experiments, which show how certain simple affective control mechanisms can be useful in a range of environments and are therefore likely to evolve in those environments.
2 What Affective States are and aren't
If we attempt to define "affective" simply in terms of familiar examples, such as "desiring', "having emotions", "enjoying", etc. we risk implicitly restricting the notion to organisms with architectures sufficiently like ours. That could rule out varieties of fear, hunger, or aggression found in insects, for example. We need an architecture-neutral characterisation, which is hard to define if it is to be applicable across a wide range of architectures (such as insect-like reactive architectures or deliberative architectures with mechanisms able to represent and reason about nonexistent and possible future states). Our best hope is to define "affective" in terms of a functional role which can be specified independent of the specific features of an architecture. The intuitive notion of "affect" already has two aspects that are relevant to a variety of architectures, namely direction and evaluation. On the one hand there is direction of internal or external behaviour, for instance, wanting something or trying to avoid something. On the other hand there is positive or negative evaluation of what is happening, for instance, enjoying something or finding it unpleasant. However, even evaluation is linked to direction insofar as enjoying involves being disposed to preserve or repeat and finding painful involves being disposed to terminate or avoid. Either way affective states are examples of control states5. Yet, not all states in control systems are affective states, even if they have some effect on internal or external behaviour. For instance, perceiving, knowing, reasoning, and self-monitoring can influence behaviour but are not regarded as affective. Suppose an agent can use structures as representations of states of affairs (never mind how). Anything that represents must be capable of failing to represent. There are various kinds of mismatch, and in some cases the mismatch can be detected, for instance perceiving that some desired state has or has not been achieved, or that a goal is being approached but very slowly. If detection of a mismatch has a disposition to cause some behaviour to reduce the mismatch there are (to a first approximation) two main cases: (1) the behaviour changes the representation to fit the reality, or (2) the behaviour changes reality to fit the representation. In (1) the system has a "belief-like" state, and in (2) a "desire-like" state. In other words, belief-like states tend to be changed to make them fit reality, whereas attempts are made to change reality to make it fit desire-like states. It is this distinction between belief-like and desire-like control states that can give us a handle on how to construe affective states, namely as "desire-like" control states whose role is initiating, evaluating and regulating, internal or external behaviour, as opposed to merely acquiring, interpreting, manipulating, or storing information (that might or might not be used in connection with affective states to initiate or control behaviour). A state representing the current position of an effector, or the location of food
in the environment, or the agent's current energy level is, therefore, not an affective state. However, states derived from these which are used to initiate, select, prioritise, or modulate behaviour, either directly or indirectly via other such states would be affective states. An example might be using a measurement of the discrepancy between current energy level and a "target" level (a "hunger" representation), to modulate the tendency of the system to react to perceived food by going for it. This might use either a "hunger threshold" to switch on food-seeking or a continuous gain control. In complex cases, the "reference states" used to determine whether corrective action is required may be parametrised by dynamically changing measures or descriptions of the sensed state to be maintained or prevented, and the type of corrective action required, internally or externally. For instance, an organism that somehow can record how frequently food sources are encountered might use a lower hunger threshold to trigger searching for food. If sensitive to current terrain it might trigger different kinds of searches in different terrains. Thus while the records of food frequency and terrain features are acquired they function as components of perceptual or belief-like states, whereas when they are used to modulate decision making they function as components of affective states. Affective states can vary in cognitive sophistication. Simple affective mechanisms can be implemented within a purely reactive architecture, like the "hunger" example. More sophisticated affective states which include construction, evaluation and comparison of alternatives, or which require high-level perceptual categorisations, would require the representational resources of a deliberative architecture. However, recorded measurements or labels directly produced by sensors in reactive architectures can have desire-like functions, and for that reason can be regarded as affective states that use a primitive "limiting case" class of representations6. The remainder of this paper describes simulation experiments where agents with slightly different architectures compete for resources in order to survive in a carefully controlled simulated environment. Proportions surviving in different conditions help to show the usefulness of different architectural features in different contexts. It turns out that simple affective states can be surprisingly effective. 3
The Simulation Environment
The simulated environment consists of a rectangular surface of fixed size (usually around 800 by 800 units) populated with various kinds of agents and other objects such as "lethal" entities of various sizes, some static and some moving at different speeds in different directions, and "food items" (i.e., energy sources which pop up at random locations and disappear after a pre-determined period of time unless consumed by agents). Agents use up energy at a fixed rate, when stationary, and require additional energy proportional to their speed, when moving. Hence, they are in per-
manent need of food, which they can consume sitting on top of a food source in a time proportional to the energy stored in the food source depending on the maximum amount of energy an agent can take in at any given time. Agents die and are removed from the simulation if they run out of energy, or if they come into contact with lethal entities or other agents. All agents are equipped with a "sonar" sensor to detect lethal entities, a "smell" sensor to detect food, a "touch" sensor to detect impending collisions and an internal sensor to measure their energy-level. For both sonar and smell sensors, gradient vectors are computed and mapped onto the effector space (see below), yielding the direction in which the agent will move. The touch sensor is connected to a global alarm system, which triggers a reflex to move away from anything touched, unless it is food. These movements are initiated automatically and cannot be controlled by the agent. They are somewhat erratic and will slightly reorient the agent (thus helping it to get out of "local minima"). On the effector side, agents have motors for locomotion (forward and backward), motors for turning (left and right in degrees) and a mechanism for consuming food. After a certain number of simulation cycles, agents reach maturity and can procreate asexually, in which case depending on their current energy level they will have a variable number of offspring which pop up in the vicinity of the agent one at a time (the energy for creating a new agent is subtracted from the parent, occasionally causing the parent to starve). While different agents may have different short term goals at any given time (e.g., getting around lethal entities or consuming food), common to all of them are the two implicit goals of survival (i.e., to get enough food and avoid running into/getting run over by lethal entities or other agents) and procreation (i.e., to live long enough to have offspring). For evolutionary studies, a simple mutation mechanism modifies with a certain probability some of the agent's architectural parameters (e.g., the parameters responsible for integrating smell and sonar information). Some offspring will then start out with the modified parameters instead of being exact copies of the parent. This mutation rate as well as various other parameters need to be fixed before each run of the simulation (a more detailed description of the simulation and its various control parameters is provided elsewhere)7. In is worth pointing out that our setup differs in at least two ways from other simulated environments that have been used to study affective states. 8 ' 9 ' 10 ' 11 ' 12 First, by allowing agents to procreate (i.e., have exact copies of themselves as offspring) we can study trajectories of agent populations and can thus identify properties of architectures that are related to and possibly influence the interaction of agent populations. And second, by adding mutation, we can examine the potential of architectures to be modified and extended over generations of agents. In particular, by controlling which components of an architecture can change while allowing for randomness in
the way they can change, we are able to study the evolutionary tradeoffs of such extensions and modifications. From these explorations of "design space" and "niche space"13 we can not only derive advantages and disadvantages of architectural components, but also the likelihood that such components would have evolved in natural systems under natural selection.

4 The Agents and their Architectures
In the following we consider two kinds of agents: reactive agents (R-agents) and simple affective agents (A-agents) (other studies have compared different kinds7). R-agents process sensor information and produce behavioural responses using a schema-based approach, which obviates the need for a special action selection mechanism: both smell and sonar sensors provide the agent with directional and intensity information about the objects surrounding the agent within sensor reach, where intensity = 1/distance^2 (distance being the distance of the object from the current position of the agent). The sum of these vectors (call them S and F for sonar and food, respectively) is then computed as a measure of the distribution of the respective objects in the environment and passed on to the motor schema, which maps perceptual space into motor space, yielding the direction in which to go: \delta S + \gamma F (where \delta and \gamma are the respective gain values).^a A-agents are extensions of R-agents. They have an additional component, which can influence the way the sensory vector fields are combined by altering the gain value \gamma based on the level of energy. In accordance with our earlier analysis of affective states as modulators of behaviours and/or processes, this component implements an affective state, which we call "hunger". The difference in architecture gives rise to different behaviour: R-agents are always "interested" in food and go for whichever food source they can get to, while A-agents are only interested in food when their energy levels are low. Otherwise they tend to avoid food and thus competition for it, reducing the likelihood of getting killed by colliding with other competing agents or lethal entities.
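A rough sketch of the two motor schemas described above follows; the continuous "hunger" modulation shown here is only one of the possible schemes (a threshold could be used instead, as discussed in Section 2), and the parameter names are ours.

```python
def move_direction(S, F, delta_gain, gamma_gain):
    """Direction of an R-agent: weighted sum of the sonar and food gradient vectors."""
    return delta_gain * S + gamma_gain * F

def a_agent_direction(S, F, delta_gain, energy, target_energy, base_gamma):
    """A-agent sketch: the food gain gamma is modulated by 'hunger', i.e. by the gap
    between a target energy level and the current level (one possible scheme)."""
    hunger = max(0.0, target_energy - energy) / target_energy
    return move_direction(S, F, delta_gain, base_gamma * hunger)
```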
5 The Behavioural Potential of a Simple Affective State
We start our series of experiments by checking whether each agent kind can survive in various kinds of environments on its own. Five agents of the same kind are placed in various environments (from environments with no lethal entities to very "dangerous" environments with both static and moving lethal entities) at random locations, to "average out" possible advantages due to their initial location over a large number of trials.
Note that this formula leaves out the details for the touch sensor for ease of presentation.
Table 1. Surviving agents in an n-environment when started with 5 agents of only one kind.

           Env      0      5     10     20     30     40     50
R-agents   μ    14.60  13.20  11.90  11.60   7.50   2.90   0.20
           σ     2.80   4.78   3.81   3.47   4.43   3.57   0.63
           Con   1.73   2.96   2.36   2.15   2.75   2.21   0.39
A-agents   μ    19.20  17.20  17.20  15.40  13.00  10.40   8.00
           σ     2.74   3.05   3.77   3.95   3.56   3.57   3.56
           Con   1.70   1.89   2.33   2.45   2.21   2.21   2.21
Table 2. Surviving agents in an n-environment when started with 5 agents each of both kinds.

           Env      0      5     10     20     30     40     50
R-agents   μ     0.00   0.00   1.60   0.10   0.00   0.00   0.00
           σ     0.00   0.00   5.06   0.32   0.00   0.00   0.00
           Con   0.00   0.00   3.14   0.20   0.00   0.00   0.00
A-agents   μ    17.20  16.30  14.50  14.50  15.10  12.80  10.00
           σ     3.61   2.91   6.54   4.22   3.35   2.49   3.16
           Con   2.24   1.80   4.05   2.62   2.08   1.54   1.96
of trials. The "food rate" is fixed at 0.25 and the procreation age at 250 update cycles. Table 1 shows for each agent kind the average number ([/,) of surviving agents as well as standard deviation (
clearly the ability of affective agents to coexist in large groups. With lower food rates the advantage of A-agents over R-agents slowly decreases, as waiting for hunger to grow before moving towards food is not a good strategy. Eventually, at food rates of 0.125 and below, survival in crowded environments becomes impossible for any agent kind: there are simply too many lethal entities obstructing the paths to food. The superior performance of A-agents might not seem very surprising, since the additional information about the current energy level, ignored by R-agents but utilized by A-agents, allows for a more complex mapping between sensory input and behavioural output. However, using more information does not automatically lead to better performance, as can be seen from the fact that A-agents may lose out against R-agents if the "rules of the game" are slightly modified: in a simulation without procreation, where either the numbers of surviving agents of each kind are counted after a predetermined number of cycles or the average lifespan of an agent is used as a measure of fitness, R-agents almost always perform (slightly) better than A-agents (in all of the above environments). Only in combination with procreation does the tendency of A-agents to distribute themselves better over the whole environment (in Seth's terminology: their lower degree of "clumpiness"12), by virtue of being at times less attracted to food, become beneficial, as their offspring will benefit from not having to compete immediately with many other agents in their vicinity. In this light, the answer to the question whether A-agents can be produced by some evolutionary process is not obvious at all.
6 Simple Affective States Can Be Evolved
To study the degree to which simple affective states like "hunger" can be evolved in a competitive environment, we allowed for mutation of the link between the component connected to the energy sensor (which is supposed to assume the role of the affective "hunger" state) and the component encoding the food gain value γ in the mapping from perceptual to motor space. This link, expressed as a multiplicative factor and called "foodweight", is initialised at random in the interval (-0.2, 0.2). Whenever an agent has offspring, the probability for "genetic modification" of the foodweight is 1/3 and the probability for a weight increase/decrease (by the given factor τ = 0.05) is 1/6, respectively. Everything else remains the same. Of all seven environments, A-agents did not survive in the 40- and 50-environments, which are very tough in that wrong moves are punished right away: there is simply no room for genetic trial and error.b In the other five environments, A-agents evolved using the state in the expected way, although to varying degrees: the less crowded an environment, the better the use

b The only agents that survived on 2 out of 10 runs were the R-agents in 40-environments.
Table 3. Average weight values, standard deviation and confidence level at α = 0.05 for the "foodweight" of the surviving affective agents in one run in an n-environment.

Env 0, 5 (Foodweight μ, σ, Con):
(0.26, 0.09, 0.05), (0.27, 0.05, 0.03), (0.23, 0.08, 0.05), (0.07, 0.05, 0.03), (0.17, 0.06, 0.04), (0.33, 0.12, 0.07), (0.19, 0.06, 0.03), (0.10, 0.09, 0.06), (0.18, 0.04, 0.03)

Env 10, 20, 30, 40, 50 (Foodweight μ, σ, Con):
(0.19, 0.07, 0.05), (0.30, 0.04, 0.03), (0.29, 0.09, 0.05), (0.24, 0.12, 0.00), (0.17, 0.11, 0.07), (0.13, 0.07, 0.04), (0.00, 0.00, 0.00), (0.00, 0.00, 0.00)
Table 4. Surviving affective agents in an n-environment when started with 5 R-agents, which can have randomly initialised A-agents as part of their offspring with a probability of 0.25. No R-agent survived a single run.

           A-agents               Foodweight
Env     μ       σ      Con      μ      σ     Con
  0   17.90   3.60    2.23    0.18   0.10   0.06
  5   14.90   1.91    1.19    0.19   0.11   0.07
 10   15.10   3.03    1.81    0.19   0.11   0.07
 20   11.53   8.39    5.20    0.18   0.09   0.05
 30    5.45   3.80    3.38    0.17   0.09   0.06
 40    5.57   4.10    3.45    0.21   0.10   0.06
 50    2.31   1.00    1.43    0.21   0.09   0.06
of the state can be evolved, the reason being that agents with initial random weights are very likely to be inefficient in navigating through the environment, if able at all. In such cases it is helpful if food is not obstructed by too many lethal entities. Table 3 shows for each environment the mean, standard deviation and confidence interval (again for α = 0.05) of the weights for all those runs on which affective agents survived. The above experiment also works for different mutation rates as well as different values of τ. Note that while in 5 out of 10 runs the "affective use" of the state was evolved in 0-environments, only in 2 out of 10 runs was the use evolved in 30-environments. The positive value of the foodweight indicates that the hunger state deserves its name. Yet, the magnitudes of the weight seem small given the procreation age of 250 and the increment/decrement factor τ = 0.05. On closer inspection, however, it turns out that evolution was quite fast: assuming that there are only about 40
generations of agents in each run, and given that the probability of a positive increase of the weight by τ is 1/6, the maximum increase we should expect, starting from a slightly positive hunger weight, is about 1/3 (40 × 1/6 × 0.05 ≈ 0.33). We have not dealt with issues of genetic coding and how genetic codes relate to the "added machinery" in the cognitive architecture of affective agents. Rather, we assume that adding a realizer of such a state (e.g., a neuron) is an evolutionarily feasible operation (e.g., one that could result from some sort of duplication operation on segments of genetic information14) and that mutation on genetically coded weight information can lead to an increase or decrease of weight values. We have, however, considered an evolutionarily more plausible variant of the experiment. Starting with R-agents, let some of their offspring have additional architectural capacities with a certain probability (in our case, the capacities of A-agents). The probability with which R-agents have such randomly initialised A-agents as offspring is 0.25 (the results are also valid for much lower rates such as 0.05). It turns out that environments with only R-agents in the beginning will eventually also be populated by A-agents (most of the time exclusively, see Table 4). It is worth mentioning that the results of this section also hold for extended simulations, where agents need a second resource (e.g., water) for survival. Multiple affective control states (e.g., "hunger" and "thirst") are even more beneficial when agents have multiple needs, which can be seen from the fact that R-agents can hardly survive on their own in such a setting (to "always go for the nearest resource" is simply not a good strategy, e.g., see 11). They even lose against A-agents if fitness is determined without procreation (see the end of the last section).
7 Discussion
The above experiments help us understand some of the conditions in which affective states like hunger have survival value, and indicate that in certain competitive environments, if there is an option to develop new architectural resources that implement such affective states, then these resources will likely evolve. Especially the last result is not obvious, for a reason that makes the question why higher species with more complex and sophisticated control architectures evolved in the first place so fascinating: every species along an evolutionary trajectory has to have a viable control architecture, which allows its individuals to survive and procreate, otherwise it will die out. This is a very severe constraint imposed on trajectories in design and niche space, which we are only slowly beginning to understand. Our investigations are, of course, just a start. Many more experiments using different kinds of affective states are needed to explore the space of possible uses of affective states and the space of possible affective states itself. We have begun to explore a slightly different neighbourhood in design space by allowing some agents
to have deliberative capabilities, and comparing them with A-agents. In a surprising variety of environments the deliberative agents do less well, though a great deal more investigation is needed. Further work on the capacities of affective states as control mechanisms and the likelihood of their evolution in certain environments should thus also help to explain why evolutionary developments that increase intelligence by adding a deliberative layer were favoured by so few species!

Acknowledgments

The work was conducted while the first author was on leave at the School of Computer Science at the University of Birmingham and funded by the Leverhulme Trust.

References
1. K. Oatley and J.M. Jenkins. Understanding Emotions (Blackwell, Oxford, 1996).
2. R. Picard. Affective Computing (MIT Press, Cambridge, Mass., 1997).
3. G. Hatano, N. Okada, and H. Tanabe, eds. Affective Minds (Elsevier, Amsterdam, 2000).
4. A. Sloman. In Cognitive Processing, 1 (2001).
5. A. Sloman. In Philosophy and the Cognitive Sciences, eds. C. Hookway and D. Peterson (Cambridge University Press, 1993).
6. A. Sloman. In Forms of representation: an interdisciplinary theme for cognitive science, ed. D.M. Peterson (Intellect Books, Exeter, U.K., 1996).
7. M. Scheutz and B. Logan. In Proceedings of AISB'01 (AISB Press, 2001).
8. P. Maes. In From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, eds. J.A. Meyer and S.W. Wilson (MIT Press, Cambridge, MA, 1991).
9. T. Tyrrell. Computational Mechanisms for Action Selection (Ph.D. Thesis, University of Edinburgh, 1993).
10. D. Canamero. In Proceedings of the First International Symposium on Autonomous Agents (AA'97, Marina del Rey, CA, The ACM Press, 1997).
11. E. Spier. From reactive behaviour to adaptive behaviour (Ph.D. Thesis, University of Oxford, 1997).
12. A. Seth. On the Relations between Behaviour, Mechanism, and Environment: Explorations in Artificial Evolution (Ph.D. Thesis, University of Sussex, 2000).
13. A. Sloman. In Parallel Problem Solving from Nature - PPSN VI, LNCS 1917, eds. M. Schoenauer et al. (Berlin, Springer-Verlag, 2000).
14. J. Maynard-Smith and E. Szathmary. The Origins of Life: From the Birth of Life to the Origin of Language (Oxford University Press, Oxford, 1999).
META-LEARNING PROCESSES IN MULTI-AGENT SYSTEMS

RON SUN

CECS, University of Missouri, Columbia, MO 65211, USA
E-mail: [email protected]
Straightforward reinforcement learning for multi-agent co-learning settings often results in poor outcomes. Meta-learning processes beyond straightforward reinforcement learning may be necessary to achieve good (or optimal) outcomes. Algorithmic processes of meta-learning, or "manipulation", will be described, which is a cognitively realistic and effective means for learning cooperation. We will discuss various "manipulation" routines that address the issue of improving multi-agent co-learning. We hope to develop better adaptive means of multi-agent cooperation, without requiring a priori knowledge, and advance multi-agent co-learning beyond existing theories and techniques.
1
Introduction
It is common that a group of agents deal with a situation jointly, with each agent having its own goal and performing a sequence of actions to maximize its own payoffs. However, the different sequences of actions (by different agents) interact in determining the final outcome for each agent involved. In such a situation, each agent has to learn to adapt to other agents that are present and "negotiate" an equilibrium state that is beneficial to itself (e.g. Kahan and Rapoport 1984). Our focus will be on nonverbal "communications" in which (sequences of) actions by agents may serve the purpose of communicating intentions and establishing cooperation, in an incremental and gradual way. The ultimate goal is to avoid mutually harmful outcomes and to distribute payoffs to individual agents in a rational way. We need some framework that determines a proper strategy for each agent in order to deal with the presence of other agents. The framework should allow adaptive determination of strategies on the fly during interaction. The framework should allow learning from scratch without a priori domain knowledge. Game theory (e.g., von Neumann and Morgenstern 1944) has focused on static equilibria of strategies in a variety of game settings (Osborne and Rubinstein 1994), and furthermore unrealistically assumes unbounded rationality on the part of agents (Simon 1957). The recent surge of studies of game learning (e.g., Fudenberg and Levine 1998, Camerer and Ho 1999) brings adaptive processes of reaching equilibria into focus. To study the dynamics of reaching equilibria, learning algorithms need to be developed. However,
learning algorithms studied in much existing work on game theoretic learning have been overly simple and, thus, to a large extent, unrealistic. Beyond static equilibria and simple learning, complex algorithmic processes involved in learning by cognitive agents (Sun et al 2001) need to be studied. By complex algorithmic processes, I mean procedures that include detailed, varied, and subtle steps of manipulation of information, strategies, or equilibria (see Sun and Qi 2000, Sonsino 1997 for preliminary versions of such processes). I hope that, by incorporating such complex algorithmic processes, we can extend game theoretic studies to more realistic settings of multi-agent interaction. What I emphasize in this work is not just end results (i.e., equilibria; Osborne and Rubinstein 1994), nor just simple processes involving simple operators (Fudenberg and Levine 1998), but complex algorithmic operations and processes. This is because what an agent may do is not limited to the completely rational choices assumed by many game theorists, but also includes some apparently irrational behaviors which may nevertheless lead to desirable outcomes in the future. We are interested in learning and action selection that are more opponent-oriented and more determined on the fly than many existing processes. This kind of algorithmic process helps to improve cooperation.
2 Background

2.1 Game Theory
Game theory studies decision making involving multiple agents (Osborne and Rubinstein 1994). A strategic game is one in which all agents choose their actions simultaneously and once and for all. In contrast, in an extensive game, agents perform actions sequentially. Formally, an extensive game is a 4-tuple (N, H, P, U), where N is a set of agents, H is a set of histories (see Osborne and Rubinstein 1994 for further specifications), P is the player function such that P(h) specifies the player after history h ∈ H, and U is the payoff function that maps each terminal history to a real value. For simplicity, in the following discussions, we assume the length of games is finite, that is, each game always terminates in a finite number of steps. Each agent has perfect information. Given these assumptions, we will look into extending current game theory, incorporating more complex algorithmic processes that capture more realistic cognitive and social processes during game learning.
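As a concrete reading of this 4-tuple, the following small Python sketch encodes a finite extensive game with histories represented as tuples of actions. The encoding and the example game (a two-step left/right game whose payoffs follow the first game discussed in Section 3) are illustrative assumptions, not notation from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

History = Tuple[str, ...]

@dataclass
class ExtensiveGame:
    agents: set                                   # N: the set of agents
    histories: set                                # H: all histories, including terminal ones
    player: Callable[[History], int]              # P: who moves after a non-terminal history
    payoff: Dict[History, Tuple[float, float]]    # U: payoffs of terminal histories

# Agent 1 moves first, then agent 2; leaves carry (payoff for agent 1, payoff for agent 2).
histories = {(), ("l",), ("r",), ("l", "l"), ("l", "r"), ("r", "l"), ("r", "r")}
game = ExtensiveGame(
    agents={1, 2},
    histories=histories,
    player=lambda h: 1 if len(h) == 0 else 2,     # alternating moves
    payoff={("l", "l"): (2, 5), ("l", "r"): (1, 1),
            ("r", "l"): (1, 1), ("r", "r"): (3, 3)},
)
```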
2.2
Reinforcement Learning
Reinforcement learning is a general learning framework suitable for learning extensive games. In a single-agent learning setting, there is a discrete-time system, the state transitions of which depend on actions performed by an agent. A Markovian process determines state transition after an action is performed. Costs (or rewards) can occur for certain states and/or actions. Normally the costs/rewards accumulate additively, with or without a discount factor γ ∈ (0, 1]. One algorithm for learning optimal policies is the Q-learning algorithm of Watkins (1989):

Q(s_t, a_t) := (1 - α) Q(s_t, a_t) + α ( r(s_{t+1}) + γ max_{a_{t+1} ∈ A} Q(s_{t+1}, a_{t+1}) )
where α is the learning rate, which goes toward zero gradually. Action a_t is determined by an exploration action policy, e.g., using (1) alternating exploration (random actions) and exploitation (greedy actions) periods, (2) a small fraction of random actions (with probability ε, a random action is chosen; with probability 1 - ε, a greedy action is chosen), or (3) stochastic action selection with the Boltzmann distribution. Such an algorithm allows completely autonomous learning from scratch, without a priori domain knowledge. Extending Q-learning to co-learning in extensive games, we may simply use the above single-agent Q-learning equation, or we may use multi-agent Q-learning equations (Littman 2001). We assume that each state (used in Q-learning), or information set (as termed by game theorists), comprises all the actions up to the current point and, optionally, information about the initial situation at the time when the game begins. State transitions are deterministic. We assume that there is sufficient exploration during reinforcement learning (which is a standard requirement for ensuring convergence of RL), so that each agent knows the payoff outcomes of all the paths on the game tree. But, eventually, each agent converges to a deterministic action policy, i.e., a pure strategy in game theoretic terms.
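As an illustration of the update above, here is a minimal tabular Q-learning sketch in Python with ε-greedy action selection (the second of the exploration policies just listed). The environment interface, state encoding, and parameter values are assumptions made for the sake of the example.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""
    Q = defaultdict(float)                     # Q[(state, action)] -> value
    for _ in range(episodes):
        s = env.reset()                        # assumed env API: reset() -> state
        done = False
        while not done:
            actions = env.actions(s)           # assumed: available actions in state s
            if random.random() < epsilon:
                a = random.choice(actions)     # explore
            else:
                a = max(actions, key=lambda x: Q[(s, x)])    # exploit
            s_next, r, done = env.step(a)      # assumed: step(a) -> (state, reward, done)
            best_next = 0.0 if done else max(Q[(s_next, x)] for x in env.actions(s_next))
            # Q(s,a) := (1 - alpha) Q(s,a) + alpha (r + gamma * max_a' Q(s',a'))
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
            s = s_next
    return Q
```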
3 Types of Meta-Learning
In practice, performance of Q-learning in extensive games tends to be very poor (Shoham and Tennenholtz 1994, Sandholm and Crites 1995, Haynes and Sen 1996, Sun and Qi 2000), despite the fact that Q-learning is cognitively justifiable (Sun et al 2001). The problem may lie in the fact that other cognitive mechanisms may also be needed, on top of such trial-and-error learning, in order to achieve good performance (Sun et al 2001). In this paper, I shall explore additional adaptive mechanisms (i.e., meta-learning routines), within an RL framework, to facilitate the attainment of optimal or near-optimal results.

Figure 1. Three cases of the left/right game. The numbers in circles indicate agents. l and r are the two possible actions. The pairs of numbers in parentheses indicate payoffs (the first number is the payoff for agent 1 and the second for agent 2).
3.1 Manipulation by Preemptive Actions
An agent can manipulate other agents by adopting suboptimal actions (likely temporarily) in order to force or induce opponents to take actions that are more desirable to the agent but result in a lower payoff (and are thus less desirable) for themselves. For example, in the game of Figure 1 (1), with reinforcement learning, agent 1 will end up rationally choosing action r, which will lead agent 2 to choose its best action r, and hence a payoff of (3,3) for them. However, agent 2 prefers the outcome of (2,5). Therefore, it may deliberately choose l after agent 1 chose r, leading to a payoff of (1,1), which forces agent 1 to change its action. With further reinforcement learning, agent 1 may adapt to this change and choose l instead (because this action can lead to the best possible outcome, if agent 2 rationally chooses l afterwards, given its manipulative action of l after agent 1's r). This change gives agent 2 its preferred outcome of (2,5). We assume that there are only two agents in an extensive game, which take turns in acting. Assume that after sufficient learning using Q-learning, a subgame perfect equilibrium (Osborne and Rubinstein 1994) is reached. Sufficient exploration is done during reinforcement learning and, thus, each agent
has fairly accurate knowledge of the payoffs of different paths on the entire game tree. After reinforcement learning settles into a particular payoff outcome, assume there is an alternative outcome (not necessarily a Nash equilibrium) that is more desirable to an agent but less desirable to its opponent. Suppose this alternative outcome can be reached if the opponent takes a different action at a certain point (but follows the current optimal policy, as determined by the reached subgame perfect equilibrium, thereafter). This point (the targeted switch point) can be determined by a search of the game tree:

Algorithm 1.
1. Search from the root of the tree along the current (equilibrium) path using depth-first search. At each point of action by the opponent, do the following:
   1.1. Adopt an alternative action.
   1.2. Follow the current, optimal policy (the subgame perfect equilibrium strategy) of each agent thereafter.
   1.3. If one of these alternative actions leads to a more desirable outcome for the agent, add the whole path to the candidate path set.
2. Choose from the candidate path set the most desirable path. Start the manipulation process at the point of the alternative action by the opponent in the chosen path.

Here is what the agent can do to change the action by the opponent at that point (the manipulation):

Algorithm 2. Search the subtree that starts at the action (by the opponent) that the agent aims to change (using depth-first search):
1. If there is an alternative action by the agent at any point along the current path in the subtree that creates a path (1) that leads to a payoff for the opponent that is lower than the payoff of the most desired path, and (2) on which all other actions (by either agent) conform to the optimal policies (determined by the equilibrium), then commit to that action (i.e., perform that action whenever that point of the game tree is reached).
2. If there are multiple such actions by the agent, choose the one highest in the tree (that is, the closest to the current action by the opponent to be changed).

The algorithm is based on the following result:

Theorem 1 For the subgame described by the part of the game tree below the point of the committed (manipulating) action (of the manipulating agent),
the original subgame perfect equilibrium strategies for both agents remain the subgame perfect equilibrium strategies. Thus, the acquired policies below the changed action need not be changed and re-learned. Similarly,

Theorem 2 For the subgame described by the part of the game tree below the point of an alternative action by the opponent, the original subgame perfect equilibrium strategies for both agents remain the subgame perfect equilibrium strategies.

An obvious shortcoming of Algorithm 1 is the cost of the exhaustive search used. An alternative is to search and find only one desirable path for the agent, with a straightforward modification of Algorithm 1, and then to force the opponent to go down that path using Algorithm 2. We may similarly eliminate exhaustive search in Algorithm 2. In either case, the hope is that the opponent will opt for an alternative action at the targeted switch point that leads to a better outcome for the agent (as a result of further trials and further learning by the opponent during those trials). However, there may be multiple action choices for the opponent at this point or another (before the committed action of the agent). The opponent may opt for an action that is not the desired action. To force the opponent to take the desired action, the agent needs to close off all loopholes (all "distractor" paths). That is, the above algorithm can be repeatedly applied, if the desired outcome is not reached due to the opponent taking an unintended action at a point above the committed action by the agent. This process can continue until all the other alternatives are "eliminated" except the desired path (or until an outcome that is equivalent to, or better than, the desired outcome is reached).a As a result of further trials during which further reinforcement learning occurs, the opponent may adapt to the manipulation and take the target action intended for it by the agent. Thus the game settles into a new state that is a subgame perfect equilibrium state given the manipulation (i.e., with the original action by the agent at the point of manipulation being "prohibited" or removed). However, the opponent may counter-react to the manipulation. First of all, counter-reaction may take the form of obstinacy: the opponent can refuse to change any action despite the worsened outcome as a result of the manipulation and despite the existence of alternative actions that can lead to better outcomes (although they may not be as good as the original outcome).

a Alternatively, we may at once lower the payoffs of all the alternative actions for the opponent, if they are higher than that of the desired outcome for the opponent (see Sun et al 2001).
Second, counter-reaction may also take the form of counter-manipulation using the same algorithm described above. The opponent can, e.g., eliminate the outcome that is the most desirable for the original agent (and thus was the goal of the original manipulation). These issues are dealt with elsewhere and not repeated here due to the page limit.
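The following Python sketch illustrates the spirit of Algorithms 1 and 2 on an explicit two-player game tree. The representation is an assumption made for the example: trees are nested dicts mapping actions to subtrees, leaves are payoff pairs, agents alternate moves with agent 1 (index 0) at the root, and the subgame perfect policy is given as a dict keyed by the action path of each decision point. It is a simplified reading of the algorithms under those assumptions, not the paper's implementation.

```python
# Figure 1 (1) encoded as a nested dict; leaves are (payoff for agent 1, payoff for agent 2).
FIG1_GAME1 = {"l": {"l": (2, 5), "r": (1, 1)},
              "r": {"l": (1, 1), "r": (3, 3)}}

def outcome(tree, policy, overrides=None, path=()):
    """Follow the policy (a dict keyed by action path), with optional overrides, to a leaf."""
    overrides = overrides or {}
    while isinstance(tree, dict):
        a = overrides.get(path, policy[path])
        tree, path = tree[a], path + (a,)
    return tree

def find_switch_point(tree, policy, me):
    """Algorithm 1 (sketch): the opponent decision point on the equilibrium path where a single
    deviation by the opponent (with the equilibrium policy followed thereafter) helps 'me' most."""
    base = outcome(tree, policy)[me]
    node, path, best = tree, (), None
    while isinstance(node, dict):
        if len(path) % 2 != me:                        # a decision point of the opponent
            for a in node:
                if a == policy[path]:
                    continue
                val = outcome(tree, policy, {path: a})[me]
                if val > base and (best is None or val > best[2]):
                    best = (path, a, val)
        node, path = node[policy[path]], path + (policy[path],)
    return best                                        # (opponent's path, desired action, my payoff)

def choose_commitment(tree, policy, me, switch, desired_opp_payoff):
    """Algorithm 2 (sketch): in the subtree under the opponent's old action at the switch point,
    commit to the highest own deviation that makes the old path worse for the opponent than the
    desired outcome."""
    sw_path = switch[0]
    node, path = tree, ()
    for a in sw_path + (policy[sw_path],):             # descend to just below the old action
        node, path = node[a], path + (a,)
    while isinstance(node, dict):
        if len(path) % 2 == me:                        # one of my own decision points
            for a in node:
                if a == policy[path]:
                    continue
                if outcome(tree, policy, {path: a})[1 - me] < desired_opp_payoff:
                    return path, a                     # first (highest) such point found top-down
        node, path = node[policy[path]], path + (policy[path],)
    return None
```

With FIG1_GAME1 and the subgame perfect policy {(): "r", ("l",): "l", ("r",): "r"}, calling find_switch_point with me=1 returns the root as the switch point with desired action l, and choose_commitment with a desired opponent payoff of 2 returns the commitment to play l after r, matching the example discussed above.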
3.2 Manipulation by Nudging Actions
This is the case of an agent adopting some suboptimal actions in order to direct its opponent to take actions that are equally, or more, desirable to each agent involved. As a result of the manipulation, everyone receives a payoff that is equal to, or higher than, the payoff each would receive otherwise (without the manipulation). This type of manipulation is obviously easier to accomplish, and does not call for counter-reaction from opponents. If there is a point (the targeted switch point) along the equilibrium path in the game tree (as determined by the reached subgame perfect equilibrium) where an alternative action by the opponent may lead to better payoffs for the agent and no worse payoffs for its opponent, then the agent can take a non-optimal action at a point below the afore-identified targeted switch point to force a worse payoff on the opponent if it follows the old path. The algorithms for reaching the desired outcome, including selecting the switch point and forcing the switch, have been discussed earlier and remain the same. For example, in the game of Figure 1 (2), with reinforcement learning, agent 1 may choose action r, which leads agent 2 to choose action r, and hence a payoff of (3,3) for them. However, agent 2 prefers the outcome of (3,5). Therefore, it decides to take l after agent 1 took r, leading to a payoff of (1,1) for them, in order to nudge agent 1 to change its action as well. With further reinforcement learning, agent 1 adapts to this change and chooses l instead, which leads to the outcome of (3,5), a better outcome for agent 2. This is a special case of the previously discussed scenario where there is no need for the opponent to consider counter-reaction.
3.3 Manipulation by (Mutual) Compromise
An agent can adopt some suboptimal actions, in order to induce its opponent to take actions that are suboptimal too, which together, however, can lead to outcomes more desirable to both agents involved. This case can be viewed as reaching a mutually beneficial compromise in order to maximize the payoffs of all those involved. For example, in the game of Figure 1 (3), with reinforcement learning, agent 1 will end up choosing action r, which leads agent 2 to choose action r,
and hence a payoff of (3,3) for them. However, agent 2 prefers the outcomes of (2,5) or (4,4). It cannot easily induce agent 1 to an outcome of (2,5), because it gives agent 1 a worse payoff. But it can induce agent 1 to an outcome of (4,4). Therefore, it consistently takes r if agent 1 takes l, which gives agent 1 an incentive to take l instead of r (because it leads to a better payoff for agent 1). With further reinforcement learning, agent 1 settles on action l, which leads to the outcome of (4,4), a compromise between the two agents.b As a result of the manipulation, everyone receives a payoff that is higher than the payoff each would receive otherwise (without the manipulation). However, as with the previous cases of manipulations, the resulting outcome is not a Nash equilibrium, and it is stable only under the reached compromise (i.e., given the committed action choice).

b Note that, in this game, it is also possible for agent 2 to take preemptive actions to force an outcome of (2,5), as in section 3.1.

Algorithm 3.
1. Search from the root of the tree along the current (the subgame perfect equilibrium) path. At each point of action by the opponent, and at each point of action by the agent itself following that, try a pair of alternative actions. That is, repeat the following (using depth-first search):
   1.1. Adopt an alternative action at a point of action by the opponent.
   1.2. Follow thereafter the current policy (the subgame perfect equilibrium strategy) of each agent, except the following change.
   1.3. At a point of action by the agent itself, try an alternative action.
   1.4. If this pair of alternative actions leads to more desirable outcomes for both agents, store the pair as a candidate pair.

Now, the agent commits to its part of this compromise (a chosen pair of alternative actions):

Algorithm 4. If there is at least one candidate pair (that is, if at least one of these pairs of alternative actions led to more desirable outcomes for both agents), start the manipulation process:
1. Find the best such pair (based on a criterion such as the maximum increase of payoffs for the manipulating agent, or the highest total increase of payoffs for both agents).
2. Commit to the action (of the agent) from the chosen pair of actions.

Without explicit communication, the agent has to wait for the opponent to discover this commitment through exploration during further reinforcement learning. Most likely, the opponent will discover the advantage of taking the
corresponding action determined by this compromise, and thereafter both agents will be able to reap the benefit. We show below that the desired action change of the opponent as determined by the compromise will lead to the highest possible payoffs for both agents, given the manipulation, and thus there is sufficient incentive for the opponent to take that action determined by the compromise:

Theorem 3 (1) The targeted action change of the opponent will lead to the highest possible payoffs for both agents, given the manipulation. (2) The optimal policies of both agents, either below the agent's manipulating action, above the targeted alternative action by the opponent, or in between the two points, will not be changed due to the manipulation.

Note that, compared with earlier types of manipulations, here the agent chooses to induce rather than to force its opponent to take the action it wants it to take. This mutual compromise process can be extended to more than two steps.
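In the same spirit as the earlier sketch, a rough Python reading of the compromise search of Algorithm 3 is given below. It reuses the same assumed representation (nested-dict game trees, a policy keyed by action paths, alternating moves with agent 1 at the root) and is repeated here so that the snippet stands alone; it is an illustration under those assumptions, not the paper's algorithm verbatim.

```python
def outcome(tree, policy, overrides=None, path=()):
    """Follow the policy (a dict keyed by action path), with optional overrides, to a leaf."""
    overrides = overrides or {}
    while isinstance(tree, dict):
        a = overrides.get(path, policy[path])
        tree, path = tree[a], path + (a,)
    return tree

def find_compromise(tree, policy, me):
    """Algorithm 3 (sketch): search for a pair (opponent deviation, later own deviation) whose
    joint outcome is strictly better for both agents than the equilibrium outcome.  The policy
    is assumed to be defined at every decision point."""
    base = outcome(tree, policy)
    best = None
    node, path = tree, ()
    while isinstance(node, dict):                      # walk along the equilibrium path
        if len(path) % 2 != me:                        # 1.1 a decision point of the opponent
            for alt in node:
                if alt == policy[path]:
                    continue
                sub, sp = node[alt], path + (alt,)
                while isinstance(sub, dict):           # 1.2 follow the policy below the deviation
                    if len(sp) % 2 == me:              # 1.3 one of my own decision points
                        for b in sub:
                            if b == policy[sp]:
                                continue
                            out = outcome(tree, policy, {path: alt, sp: b})
                            # 1.4 keep pairs that improve the outcome for both agents
                            if out[0] > base[0] and out[1] > base[1]:
                                if best is None or out[me] > best[2][me]:
                                    best = ((path, alt), (sp, b), out)
                    sub, sp = sub[policy[sp]], sp + (policy[sp],)
        node, path = node[policy[path]], path + (policy[path],)
    return best    # Algorithm 4: the agent then commits to action b at path sp of the best pair
```

Applied to the game of Figure 1 (3), encoded as {"l": {"l": (2,5), "r": (4,4)}, "r": {"l": (1,1), "r": (3,3)}} with the equilibrium policy {(): "r", ("l",): "l", ("r",): "r"} and me=1, this returns the pair in which agent 1 deviates to l at the root and agent 2 commits to r after l, yielding the (4,4) compromise described above.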
4 Concluding Remarks
This paper considers algorithmic meta-learning processes of joint sequential decision making that are both cognitively realistic and practically effective. In essence, we incorporate more cognitively realistic learning processes by combining simple decision making studied in game theory with complex algorithmic processes (Sun et al 2001). Armed with extended senses of rationality, we are aiming at an algorithmic account of multi-agent learning of cooperation (that starts from scratch without a priori domain knowledge). Of course, it is important that we extend our basic assumptions to deal with more general cases, which are being worked on right now.

References
1. C. Camerer and T. Ho, (1999). Experience-weighted attraction learning in normal-form games. Econometrica, 67, 827-874.
2. C. Claus and C. Boutilier, (1998). The dynamics of reinforcement learning in cooperative multiagent systems. Proceedings of AAAI'98. AAAI Press, San Mateo, CA.
3. D. Fudenberg and D. Levine, (1998). The Theory of Learning in Games. MIT Press, Cambridge, MA.
4. T. Haynes and S. Sen, (1996). Co-adaptation in a team. International Journal of Computational Intelligence and Organizations.
5. J. Kahan and A. Rapoport, (1984). Theories of Coalition Formation. Lawrence Erlbaum Associates, London.
6. M. Littman, (2001). Value-function reinforcement learning in Markov games. Special issue on multi-agent learning (edited by R. Sun), Cognitive Systems Research, Vol. 2, No. 1, 2001.
7. J. Nash, (1950). Equilibrium points in N-person games. Proceedings of the National Academy of Sciences, vol. 36, 48-49.
8. M. Osborne and A. Rubinstein, (1994). A Course in Game Theory. MIT Press, Cambridge, MA.
9. T. Sandholm and R. Crites, (1995). Multiagent reinforcement learning in the iterated prisoner's dilemma. Biosystems, 37, 147-166.
10. S. Sen and M. Sekaran, (1998). Individual learning of coordination knowledge. Journal of Experimental and Theoretical Artificial Intelligence, 10, 333-356.
11. Y. Shoham and M. Tennenholtz, (1994). Co-learning and the evolution of social activity. Technical Report STAN-CS-TR-94-1511, Stanford University.
12. H. Simon, (1957). Administrative Behavior (2nd ed.). New York: Macmillan.
13. D. Sonsino, (1997). Learning to learn, pattern recognition, and Nash equilibrium. Games and Economic Behavior, 18, 2, 286-331.
14. V. Soo, (2000). Agent negotiation under uncertainty and risk. Design and Applications of Intelligent Agents, pp. 31-45. Springer-Verlag, Heidelberg, Germany.
15. R. Sun, E. Merrill, and T. Peterson, (2001). From implicit skills to explicit knowledge: a bottom-up model of skill learning. Cognitive Science.
16. R. Sun and D. Qi, (2000). Rationality assumptions and optimality of co-learning. Design and Applications of Intelligent Agents. Lecture Notes in Artificial Intelligence, Volume 1881, pp. 61-75. Springer-Verlag, Heidelberg, Germany.
17. J. von Neumann and O. Morgenstern, (1944). Theory of Games and Economic Behavior. John Wiley and Sons, New York.
18. C. Watkins, (1989). Learning from Delayed Rewards. Ph.D Thesis, Cambridge University, Cambridge, UK.
SCALABILITY AND THE EVOLUTION OF NORMATIVE BEHAVIOR
JORG WELLNER†, SIGMAR PAPENDICK‡, AND WERNER DILGER†

† Chemnitz University of Technology, Computer Science, D-09107 Chemnitz, Germany, {jwe, wdi}@informatik.tu-chemnitz.de

‡ University of Konstanz, Department of Sociology, D-78464 Konstanz, Germany, [email protected]
We present an evolutionary approach for developing an agent system consisting of a large and varying number of agents. We start off by briefly describing a sociological theory of coordination problems in societies with many members. Elaborated cognitive explanations based on handling individual information are rejected, and the concept of symbolically generalized communication media is suggested instead. In a first attempt we have modeled an agent system based on this concept. Simulation results show that agents may coordinate their actions even though they have no individual representations of each other. Simulation starts with a small group of agents and evolves a system of several hundred agents which base their actions mainly on exchanged messages.
1
Introduction
Many coordination approaches for agent systems rely on mechanisms which include detailed knowledge of an agent architecture. For one agent this is essentially knowledge about another agent, needed in order to cooperate with it. Agents personalize this knowledge in the sense that they know the goals, skills, or beliefs1 of different opponents. To simplify the usual situation, one can state that the more potential partners for an interaction an agent has, the more agent-specific knowledge it has to cope with. This is one reason why current logic-based agent approaches scale so badly: keeping track of information about other agents is an expensive matter. In the next section we consider in more detail a concept developed by sociologists to answer questions concerning the coordination of individuals in a society with a huge number of members, namely symbolically generalized communication media. A first approach to modeling the proposed mechanisms for one medium is presented in Section 3, focusing on the ability of the agents to acquire a shared symbol system. Simulation results are discussed in Section 4. Section 5 concludes the paper, indicating that one can reasonably base multi-agent systems on the proposed sociological concepts in order to achieve good scaling.
2
The concept of symbolically generalized communication media
Humans faced a scaling problem during the development from small groups to modern societies. In small groups it is possible for each individual to keep in mind relevant facts about other members of the group. Different strategies were developed to keep one's knowledge about each other up-to-date, e.g. gossip2. As groups became larger, personalized coordination mechanisms became less efficient, due to the necessary increase of cognitive capabilities, which are, however, constrained. In these situations, generalized media simplify communication and the representation of situations. The concept of generalized media was introduced into sociological theory by Talcott Parsons3, who used the term "generalized media of interchange". In the context of constructivistic sociological systems theory, the concept has been adopted as "symbolically generalized communication media" (SGCM) by Niklas Luhmann4. They offer a mechanism to allow coordinated behavior among individuals that have few or no representations about each other's individual goals, beliefs, intentions or restrictions, which used to be regarded as indispensable for behavioral selections in most of the dominating microsociological models. SGCM simplify the predictability of behavior because they offer a universalistic mechanism for generating strong motivations as a prerequisite for further cooperation. They symbolize the expectability of getting rewarded by others in situations of requested cooperation. A typical example of such a symbolic representation is money: its possession symbolizes the expectability of having the option to instrumentalize cooperational behavior of others, for instance in the case of spending money and getting goods or services, regardless of the time or situation in which this option is needed. If money is transferred, this option is transferred also and has to be represented and evaluated only as an option which can be used by the recipient. It can be coded, communicated and represented by a simple binary distinction of having or not having money. Therefore, it works as a generalizable and reliable communication mechanism for initiating coordinated behavior without ponderous and cognitively complex procedures of making others adopt one's own goals in order to cooperate. Thus, the use of symbolically generalized media is an efficient way to reduce social complexity by symbolizing expectability. Another important example of an SGCM is the symbolization of power, which is the main objective of our model. Like money, power is used as a mechanism to symbolize the instrumentability of coordinated behavior of others. A typical example of symbolizing power are policemen dressed in uniforms.
By wearing a police uniform, the option to apply expectable superior sanctions of criminal prosecution in case of normative deviance can be communicated very efficiently. It is also based on expectations, but, in contrast to money, these are predictions of getting sanctioned if requested cooperation is refused. Like money, power as a symbolizing mechanism can only be established and preserved if its function can be proved on demand. If the inability to apply sanctions supported by others becomes observable, the auto-catalytic mechanism of reinforcing a symbol by referencing networks of cooperation breaks down, in dynamics comparable to inflationary processes of currency in economic markets. Some concepts related to SGCM are already dealt with on a large scale in multi-agent research, especially norms5,6,7 and market based coordination mechanisms8,9,10. All the mechanisms based on these concepts, first of all norms, have the goal of enabling interactions between agents which do not know much about each other, but do know something in general. Norms are condensed expectation structures. A population-wide norm makes actions of agents predictable. It is obvious that coordination mechanisms based on SGCM may play an important role in scaling huge agent systems. The main benefit of such media is that they reduce the amount of knowledge that agents need to interact. Interactions controlled by an SGCM are structured in a straightforward way: they do not ensure that an interaction always succeeds, but they ensure that agents know in advance to what aspect negotiation should be limited. Agents need not know each other. Agents can be black boxes to each other, and indeed they cannot look inside each other's heads. Coordination may still succeed, and agents know in what stage an interaction currently is, and when it should be stopped. Every agent is only concerned with its own beliefs or goals; there is no need to take into account elaborated reasoning mechanisms about the beliefs or goals of other agents, since these become immediately apparent to each other during an interaction to some extent. Whatever an agent wants or believes will be conveyed by a medium to another agent. A medium does not reveal an agent's goal or its beliefs, but it offers a way to achieve a goal or to verify or to strengthen its own beliefs. In the remainder of this paper we concentrate on the SGCM power as proposed by Luhmann. We do not model predefined rules of power that allow an agent to interact with another one. We rather focus on the evolution of a shared symbol system and the meaningful use of a sanction mechanism that represents power, both with respect to efficiently coordinated actions.
3
A first approach to the evolution of the power medium
A simulation consists of a large number of trials of a cooperation game which we called the "Planter-and-Harvester game" for simplicity. We introduce two different types of agents, with respect to their ability to change the environment. There are also two types of actions that change the state of the environment in an effective way, namely "planting" and "harvesting", which complement each other. Plant agents, called Planters, can perform only plant actions effectively; harvest agents, called Harvesters, can perform only harvest actions effectively. At the beginning of a game the environment U is always in state Us = 0. A plant action PlantI, performed by a Planter, transforms the environment into state Ut = 1; a harvest action HarvestI, performed by a Harvester, transforms it into the final state Ue = 2. In more complicated games the final state may be Ue > 2, assuming action sequences PlantI, HarvestI, PlantII, and so on. Action PlantI in state U = 1 has no effect with regard to the state of the environment, and similarly action HarvestI in state U = 0. Furthermore, a Planter might successfully apply in state U = 2 only action PlantII, not actions PlantI or PlantIII. At the beginning of a game two agents are randomly selected from the population, one of them being the start agent. This agent begins by sending a message M0. The other agent receives this message, performs an action a1 and sends another message M1 to the start agent. Then, the first agent performs an action a2 and sends a message M2 to the second agent, and so on. A round is defined by a successive sequence of performing one action and generating a message for each of the two agents. Both types of agents have the same repertoire of actions regardless of the efficiency: apart from plant and harvest actions they have a Null action without any effect, a Sanction action, an action Exit, and an action Replace. The latter action affects the opponent agent in the way that it gets replaced by another agent, randomly selected from the population. This may increase the general possibility of a successful coordination. A game may end with three different outcomes: an agent performed the Exit action, the environment reached the final state Ue, or the number of rounds in the game exceeded the defined threshold rounds. There is a predefined set of symbols S = {0, 1, 2, ..., Smax} for message generation. A message consists of exactly one of these symbols. A symbol itself has no meaning to an agent; there is no predefined semantics at all. A game ends successfully if the environment was transformed into the final state Ue. In this case, the last two agents participating in the game get a certain amount E* of "energy". In other cases there is no energy payoff. Every action that an agent performs consumes a specified amount of energy
of the agent. There are low cost actions (Null, Exit, and Replace) and high cost actions (PlantX, HarvestX). For a low cost action the agent consumes energy El > 0, for a high cost action El + Eh, Eh > 0. The cost of the action Sanction is El + Eb, Eb > 0. This action affects the other agent in the way that the sanctioned agent loses pain energy Ep > 0. At the beginning of an agent's life time its energy is set to E = Es > 0, its start energy. If E ever falls below 0, the agent dies, that is, the agent is removed from the population. An agent does not know its own type, nor does it perceive the type of another agent. They are black boxes to each other. An agent perceives the message of another agent, the state of the environment, and the fact of being sanctioned. In any case not all relevant aspects of the environment are known in the same way to all the participants, for instance the direct result of an action. Agents must test different actions at different times, and the only hint as to whether an action or message was appropriate is given by a reward signal. This signal is always generated by the agent itself, based on the energy difference between two consecutive actions. A sigmoid function generates the reward signal r based on the energy difference; a positive energy difference results in a positive reward, a negative difference results in a negative reward. Thus, individual agents employ reinforcement learning. This definition of a reward signal is a weak one, since it does not assume any intelligent observer (outside the agent) who generates a reward signal based on its knowledge about correct actions. Besides an energy value, agents have an age A, which at the beginning of an agent's life time is set to 0. Any time an agent gets selected to play the game, its age is incremented by 1. If the age reaches an individual maximum, Amax, the agent is removed immediately from the population. At the start of the simulation, the population P consists of a certain number of agents Ps. The number of agents during the simulation may shrink or grow, depending on the fitness of the agents. An agent may enter the population if there are at least two agents whose age is above the value Asex and whose energy value is above a value Esex. The two "parents" are selected by a "roulette wheel"11 from all possible parent agents based on their energy value. Once a successful breeding has occurred, the two parent agents are prevented from reproduction for a certain period of time tpause. Whenever the number of agents in the population Pt falls below Ps, agents are randomly added to the population until Pt = Ps. We focused explicitly on one particular aspect of media, namely the relevance of expectations in choosing an appropriate answer to a received message. Thus, we combine an internal state with the expectation of a received message. This results in a frame-like structure which will be executed on two levels. In
a first step a set Ft of frame structures is chosen based on the state of the environment. This step is performed without any learning by the agent and is totally determined by the environment. In a second step the agent chooses one frame structure from the previously chosen set Ft. The selected frame is executed, resulting in an action a_{t+1} and a new message M_{t+1}. A frame F is defined with respect to a received message Mr = Mt in the following way:

    if      Mr = Me1   then a := act1 and M := mes1
    elseif  Mr = Me2   then a := act2 and M := mes2
    else    execute a trouble frame in FT,
where a_{t+1} = a and M_{t+1} = M. A "trouble frame" FT will be executed in the case that the received message was neither Me1 nor Me2. This frame has a special structure, because it does not check the occurrence of a certain message; rather it checks whether the agent has been sanctioned or not in order to determine the new action and message:

    if    sanctioned = true   then a := actT1 and M := mesT1
    else                           a := actT2 and M := mesT2.

For every state of the environment the agent has two frames. The selection of a frame at time t is guided by a Q-value QF, that is, reinforcement learning12 takes place in order to choose an appropriate frame in a given (environmental) situation. The entire collection of frames for an agent with a given final state Ue of the environment is: Fk = {F(k,0), F(k,1)}, for k = 0, ..., Ue. An additional frame set is employed by an agent when the agent starts the communication by generating the start message M0. For the trouble state UT the agent can also choose between two (trouble) frames FT = {FT1, FT2}. Evolution is based on frames; agents do not change frames during their life time, they are just able to change the Q-value of a frame with respect to the other frame inside the same frame set. At the beginning of the simulation, all frames of all agents are initialized randomly. In particular, variables Me1, Me2, mes1, mes2, mesT1, and mesT2 get randomly chosen values from S = {0, 1, 2, ..., Smax}, and variables act1, act2, actT1, and actT2 get randomly chosen values from A = {Null, Sanction, Exit, Replace, PlantI, HarvestI, PlantII, ...}. Inheritance happens on the frame level, that is, cross-over takes place between frames, not inside a frame (but inside a frame set). Individual parts of a frame are subject to mutation. Therefore, e.g. part Me1 or act2 may get a new random value during the mutation process. Q-values are not passed on to offspring, and are set to a small random value at the beginning of an agent's life time.
Figure 1. Simulation of 1000000 games (results averaged over 1000 games). Result of the simulation: a) maximum possible success (counting the occurrence of a "correct" pairing of the agents); b) the actually achieved success; c) correctly performed Exit; d) Exit in the wrong situation; e) stopped, because maximum rounds exceeded. For example: after around 500000 games, the average result of 1000 games was 60% successful games, out of a maximum of 75% possible successful games, 25% were correctly and 10% were incorrectly exited by an agent, and 5% were stopped by the system (values approximated). Ue = 4, Smax = 3, rounds = 10, E* = 10.0, El = 0.5, Eh = 2.5, Eb = Ep = 0.1, Es = 50.0, Amax ∈ {550, ..., 800}, Asex = 20, tpause = 20, a = 5.0, b = 1.0.
4
Simulation results
Figure 1 shows the general outcome of a simulation, and Figure 2 shows the statistics of the number of sanctions in 1000 games, the number of living agents, and the average energy of the agents. The maximum number of agents was set to 1024. The simulation started with 3 agents, and as long as the number of agents was below 15 a higher energy payoff E* was given for success than indicated in the caption of Figure 1 (to support an onset of evolution). The number of agents grew rapidly until the limit was reached. Later, evolution still took place, optimizing the frame structures. This may result, for example, in changing cooperation sequences, or in a "competition" of different sequences as indicated in Figure 3. A sequence was defined by M0, M1 a1 M2 a2 ..., that is, M0 is the start message of the first agent, M1 the answer message and a1 the action of the other agent, and so on. The coding of actions is: 0 - Null, 1 - Sanction, 2 - Exit, 3 - Replace, 4 - PlantI, 5 - HarvestI, 6 - PlantII, 7 - HarvestII. Because we analyzed only sequences which did not contain a Replace action, and which were successful, all these sequences end with action 7 (HarvestII). Figure 3 shows the eight most frequent sequences of the entire simulation. Sequence 1 occurred 160877 times, out of 346727 successful sequences without a Replace action. In detail, the sequences are shown
Figure 2. From top to bottom: Number of sanctions ("Bites", not averaged), number of living agents, and average energy of the agents. The number of agents was restricted to 1024. When this number was reached, agents did increase their amount of energy on the average.
Figure 3. The eight main sequences of the frame based evolution. Left: Absolute occurrence of sequences (average of 1000 games), right: relative occurrence of the sequences (in relation to 346727 successful sequences). The eight sequences occurred 329895 times.
in Figure 4. The communicative behavior of agents became more and more regular. Because there were two frames for each environmental situation, it is obvious that a frame set is assumed to contain exactly one appropriate frame for Planters and one for Harvesters. An individual only has to explore which one is better suited. A detailed analysis of the communicative behavior reveals indeed that communication controls the behavior of agents. As the results indicate, the agents were able to set up a population-wide semantics for the
Figure 4. The eight most frequent sequences in detail.

number (see Fig. 3)   number of occur.   seq. M0 M1 a1 M2 a2 ...
        1                 160877          104051627
        2                  66551          20404051627
        3                  37402          004051627
        4                  26721          01504051627
        5                  19039          21504051627
        6                   7118          00404051627
        7                   6453          20504051627
        8                   5734          21704051627
exchanged symbols. The meaning of a symbol depends, of course, on the environmental state; however, symbols became functional for the agent's choice of the next message or action. Sanctions became less important as the behavior became more normative. Although not shown here, the simulations are easily adapted to cases where several thousand agents may evolve, still acting in a coordinated manner.
5 Conclusion
We have shown that a growing population of agents may act in a coordinated manner even in the case when the cognitive capabilities of the agents are limited and, moreover, when agents do not know anything about each other (apart from received messages). From an observer's point of view the agents reveal normative behavior, although we did not predefine any norms. We started by questioning what kind of mechanisms human society evolved in order to cope with a growing number of individuals. We found an interesting answer in the work of sociologists, especially the SGCMs proposed by Luhmann. We have modeled one SGCM (power) in a first approach. However, our simulation is still too simple to establish all aspects of a symbolic medium. Nevertheless, Luhmann's suggestions regarding SGCM, especially the aspect of structuring a situation by expectations, turned out to be useful. We modeled some aspects of his theory, mainly aspects of a closed communication system, but found an interesting approach to answering well-known problems in multi-agent research, namely problems of scalability and the definition of norms. In subsequent work, we will deal with a more elaborate model of a symbolic medium. Further, the impact of more than one medium has to be analyzed, especially the potential for a more heterogeneous agent society and for more complex problems to be solved by the agents.
Acknowledgement

We are grateful to three anonymous reviewers for their comments. This work is supported by the Deutsche Forschungsgemeinschaft under grant number DI 452/10-1 and is part of a research project headed by Werner Dilger and Bernhard Giesen.

References
1. A. S. Rao and M. P. Georgeff. Modeling Rational Agents within a BDI-Architecture. In Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning, pages 473-484, Cambridge, Mass., 1991.
2. R. Dunbar. Grooming, Gossip, and the Evolution of Language. Harvard University Press, Cambridge, Mass., 1996.
3. T. Parsons. The Structure of Social Action. Free Press, New York, 1968.
4. N. Luhmann. Social Systems. Stanford University Press, Stanford, Ca., 1995.
5. M. Paolucci and R. Conte. Reproduction of Normative Agents: A Simulation Study. Adaptive Behavior, 7(3/4):307-322, 1999.
6. K. Binmore. Game Theory and the Social Contract, Volume 1: Playing Fair. Cambridge, Mass.: MIT Press, 1994.
7. Y. Shoham and M. Tennenholtz. Social Laws for Artificial Agent Societies: Off-line Design. Artificial Intelligence, 73, 1995.
8. M. P. Wellman. A Market-Oriented Programming Environment and its Application to Distributed Multicommodity Flow Problems. Journal of Artificial Intelligence Research, 1:1-23, 1993.
9. S. Park, E. H. Durfee, and W. P. Birmingham. Emergent Properties of a Market-based Digital Library with Strategic Agents. In Y. Demazeau, editor, Third Int. Conf. on Multi-Agent Systems (ICMAS98), pages 230-237, Los Alamitos, Cal., 1998. IEEE Computer Society.
10. G. Ballot and E. Taymaz. Technological change, learning and macro-economic coordination: An evolutionary model. Journal of Artificial Societies and Social Simulation
THINKING-LEARNING BY ARGUMENT
ALADDIN AYESH
De Montfort University, The Gateway, Leicester LE1 9BH
Email: [email protected]
Humans argue all the time. We may argue with ourselves, with a partner or even with someone we have just met. The argument can take a decision-making form, a discussion form, a thinking form or, in some cases, it may simply be for argument's sake. In this paper we describe a system that uses three object-oriented components, referred to as cells, to utilize the argument concept and enable a thinking-learning process to take place.
1
Introduction
Our ability to argue allows us to express our concerns and possibilities and to make collective decisions. We may argue with ourselves, with a partner or even with a complete stranger. The argument may take the form of decision-making, discussion, thinking or argument for argument's sake. Arguing with oneself for learning, thinking and decision-making purposes is the concern of this paper. This paper describes a system that uses three object-oriented components to utilize the argument concept in a thinking-learning process. These components are developed using agent theory and techniques. However, these components form one entity and are not individual agents. Therefore, and for clarity's sake, these components will be referred to as cells throughout the paper. The paper discusses the argument concept and outlines the system proposed to utilize this concept.
2
Preliminaries
There are two relevant subjects to be discussed before proceeding further: arguing as a human mental process, and argumentative agents. Arguing is a powerful tool we use individually and socially [1]. We use this tool to reach agreements or understanding with our social partners. We use it individually, as part of our thinking process, to form an understanding about ourselves and about matters of individual concern. And finally we use it as a way of learning new facts from perceived knowledge. The relation between arguing and the three processes of understanding, thinking and learning can be seen in the early work of Plato and of the philosophers who followed his technique. This relationship is also evident in our social life. Consider the statement 'the more we discuss (argue about) issue X, the more I learn about your personality'. This could concern your attitude towards, or beliefs about, the subject of discussion, and so on. Finally, arguing is greatly affected by our perception and by our initial and developed set of beliefs [2].
Arguing as a communication protocol in multi-agent systems has been studied intensively. An example is the work done by Mora et al. on distributed extended logic programs [3]. Another example is the work done by Jennings et al. on negotiation [4]. Nonetheless, there are differences. In multi-agent systems there is usually a problem to be solved by negotiation, and each agent participates in the argument autonomously. In contrast, the agent-like components in our system are limited to three components that collectively form one entity. These components are chosen to bring together the nature of argumentation and agent technology. Each component has a pre-determined function.
3
Learning by argument system - basics
The proposed system comprises three cells, which are represented as object-agents. These cells are named the Observer cell (O cell), the Questioner cell (Q cell) and the Memory cell (M cell). Each of the three cells is explained below.
3.1 Observer cell (O cell)
The O cell represents the perception system. It observes the environment and feeds back to the Questioner cell (Q cell), which is described next. From the observations provided, the Q cell forms some knowledge about the observed objects. The cycle continues, perceiving as many observations as needed to form an opinion about the object, or a set of facts describing that object. To demonstrate the working mechanism of the O cell, let us take as an example our eyes and the argument we have with our perception system. Assume that I want to buy a car. I go to a car dealership showroom and look at cars. I see a nice car, so in my brain I say 'it is a nice car'; a reply comes back with 'but it is a blue car and I want a red car'. I see another car, which is red, but it is not as nice. Now I have one of two choices: either I decide in favor of a nice car or a red car, or I ask whether they make the nice car in red. The O cell deals with both qualitative and quantitative information. Therefore a representational language is being devised, using Hybrid logic [5] and adaptive neural nets [6, 7], to represent both qualitative and quantitative information.
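To make the O cell's role concrete, the following minimal Python sketch models an observer that answers (object, feature) questions from a simple environment. The names ObserverCell and Observation, the dictionary-based environment and the question format are illustrative assumptions of ours, not the paper's notation; the hybrid-logic and neural-net representational language mentioned above is not modelled here.

from dataclasses import dataclass
from typing import Any, Dict, Tuple

@dataclass
class Observation:
    """One observed feature of an object; fresh observations carry 'low' relevance."""
    obj: str        # the observed object, e.g. "car_1"
    feature: str    # the observed feature, e.g. "color" or "price"
    value: Any      # qualitative ("blue") or quantitative (12000) value
    relevance: str = "low"

class ObserverCell:
    """Perceives the environment and feeds observations back to the Q cell."""

    def __init__(self, environment: Dict[str, Dict[str, Any]]):
        self.environment = environment  # object -> {feature: value}

    def answer(self, question: Tuple[str, str]) -> Observation:
        # A question from the Q cell names an object and a feature of interest.
        obj, feature = question
        value = self.environment.get(obj, {}).get(feature)
        return Observation(obj, feature, value)

# The car-showroom example: the O cell reports a feature of a car on request.
showroom = {"car_1": {"color": "blue", "nice": True},
            "car_2": {"color": "red", "nice": False}}
o_cell = ObserverCell(showroom)
print(o_cell.answer(("car_1", "color")))  # Observation(obj='car_1', feature='color', value='blue', relevance='low')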
3.2 Questioner cell (Q cell)
The Q cell is the voice that replies to our observations and asks for further information. It is the part of the brain that says 'yes, it is a nice car, but it is not red'. The main task of the Q cell is to interrogate the information provided by the O cell and feed back; this provides the stimulus that triggers the O cell to provide further observations. The Q cell can be viewed as a knowledge management component, which reviews the M cell to determine ignorance points. Once this is done, the questions are formulated and passed to the O cell. The Q cell uses the same representational framework as the O cell.
However, ignorance points, which are the crucial concern of this component, are determined using a three-valued logic in which the predicate U stands for 'do not know' [8]. Work on building the representational language is still ongoing.
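As a rough illustration of how ignorance points might be computed with a three-valued logic, the following Python sketch marks unknown facts with a U value and turns them into questions for the O cell. The Truth enumeration, the (object, feature) keys and the QuestionerCell class are illustrative inventions under our own assumptions, not the paper's representational language.

from enum import Enum
from typing import Dict, List, Tuple

class Truth(Enum):
    T = "true"
    F = "false"
    U = "do not know"   # the U value marks an ignorance point

class QuestionerCell:
    """Reviews remembered facts and turns ignorance points into questions for the O cell."""

    def __init__(self, facts: Dict[Tuple[str, str], Truth]):
        self.facts = facts  # (object, feature) -> three-valued truth

    def ignorance_points(self) -> List[Tuple[str, str]]:
        # An ignorance point is any fact whose truth value is still U.
        return [key for key, truth in self.facts.items() if truth is Truth.U]

    def formulate_questions(self) -> List[Tuple[str, str]]:
        # Each ignorance point becomes a question to pass to the O cell.
        return list(self.ignorance_points())

# The agent knows the car is nice, but does not yet know whether it is red.
q_cell = QuestionerCell({("car_1", "nice"): Truth.T, ("car_1", "red"): Truth.U})
print(q_cell.formulate_questions())   # [('car_1', 'red')]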
3.3 Memory cell (M cell)
There are two types of facts perceived by the system: asserted facts and observations. The following definitions state these two types.
Definition 1. An observation is a feature α of an observed object T in relation to a subject K with relevance 'low', annotated: O(α, T) → Relevance(α, K, low) ∨ Relevance(α, T, low).
Definition 2. An asserted fact is a feature α of an observed object T in relation to a subject K with relevance 'strong' or 'definitive', annotated: O(α, T) → Relevance(α, K, strong) ∨ Relevance(α, T, definitive).
The Memory cell imitates the memory concept as defined in psychology: working memory and persistent memory, which may also be identified as short-term and long-term memory respectively [9, 10]. This encourages the investigation of two types of neural nets: self-organizing neural nets (NN) [7] and adaptive-architecture NN [6]. Self-organizing NN are well known in machine learning [7]. However, the size and type of information that the M cell deals with vary greatly, depending on the argument process between the O cell and the Q cell.
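The following minimal Python sketch illustrates Definitions 1 and 2 and the two memory stores, assuming a simple relevance tag on each stored item. The Fact and MemoryCell names and the promotion rule are our own illustrative assumptions; the self-organizing and adaptive-architecture neural nets mentioned above are not modelled.

from dataclasses import dataclass
from typing import List

@dataclass
class Fact:
    feature: str     # alpha, the perceived feature
    obj: str         # T, the observed object
    subject: str     # K, the subject the feature relates to
    relevance: str   # "low" for observations, "strong"/"definitive" for asserted facts

class MemoryCell:
    """Working (short-term) store for observations, persistent (long-term) store for asserted facts."""

    def __init__(self):
        self.working: List[Fact] = []      # observations, relevance "low"
        self.persistent: List[Fact] = []   # asserted facts, relevance "strong"/"definitive"

    def store(self, fact: Fact) -> None:
        (self.working if fact.relevance == "low" else self.persistent).append(fact)

    def promote(self, fact: Fact, relevance: str = "strong") -> None:
        # A confirmed observation can be promoted into an asserted fact.
        if fact in self.working:
            self.working.remove(fact)
            fact.relevance = relevance
            self.persistent.append(fact)

m_cell = MemoryCell()
obs = Fact("color=blue", "car_1", "buyer", "low")
m_cell.store(obs)     # goes into working memory
m_cell.promote(obs)   # confirmed: moves to persistent memory as an asserted fact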
3.4 System architecture
Figure 1 shows the communication between the proposed system's main segments.
[Figure 1: Overview of OMQ System. The diagram connects the O, Q and M cells with links labelled 'Arguments', 'Feeding back' and 'Reading'.]
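As a rough illustration of the communication pictured in Figure 1, the following self-contained Python sketch routes messages between the three cells over a shared channel. The Message fields, the cell registry and the EchoCell stub are illustrative assumptions of ours and are not the paper's communication mechanism or packet format, which is defined next.

from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class Message:
    # Illustrative message format; the paper's own packet structure is given in Definition 3.
    kind: str       # e.g. "argument", "feedback" or "read"
    sender: str     # "O", "M" or "Q"
    receiver: str
    content: Any

class OMQChannel:
    """Shared channel over which the O, M and Q cells exchange messages."""

    def __init__(self, o_cell, m_cell, q_cell):
        self.cells: Dict[str, Any] = {"O": o_cell, "M": m_cell, "Q": q_cell}
        self.queue: List[Message] = []

    def send(self, message: Message) -> None:
        self.queue.append(message)

    def deliver(self) -> None:
        # Route every queued message to the cell named as its receiver.
        while self.queue:
            message = self.queue.pop(0)
            self.cells[message.receiver].receive(message, self)

class EchoCell:
    """Stand-in cell that just prints what it receives."""
    def receive(self, message: Message, channel: "OMQChannel") -> None:
        print(f"{message.sender} -> {message.receiver}: {message.kind} {message.content}")

channel = OMQChannel(EchoCell(), EchoCell(), EchoCell())
channel.send(Message("argument", "O", "Q", "it is a nice car"))
channel.send(Message("feedback", "Q", "O", "but it is not red"))
channel.deliver()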
Definition 3. An OMQ system is a tuple of components <O, M, Q>, where O is an Observer cell, M is a Memory cell and Q is a Questioner cell, under a communication mechanism E in which packets are quadruples defined as follows: Observation packet (O)