Lecture Notes in Artificial Intelligence
Subseries of Lecture Notes in Computer Science

LNAI Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Yuzuru Tanaka, Hokkaido University, Sapporo, Japan
Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor
Joerg Siekmann, DFKI and Saarland University, Saarbrücken, Germany
7027
Yongchuan Tang Van-Nam Huynh Jonathan Lawry (Eds.)
Integrated Uncertainty in Knowledge Modelling and Decision Making International Symposium, IUKM 2011 Hangzhou, China, October 28-30, 2011 Proceedings
Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors
Yongchuan Tang, Zhejiang University, College of Computer Science, Hangzhou, 310027, P.R. China, E-mail: [email protected]
Van-Nam Huynh, Japan Advanced Institute of Science and Technology, School of Knowledge Science, 1-1 Asahidai, Nomi City, Ishikawa, 923-1292, Japan, E-mail: [email protected]
Jonathan Lawry, University of Bristol, Department of Engineering Mathematics, Bristol, BS8 1TR, UK, E-mail: [email protected]
ISSN 0302-9743 e-ISSN 1611-3349 ISBN 978-3-642-24917-4 e-ISBN 978-3-642-24918-1 DOI 10.1007/978-3-642-24918-1 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2011939089 CR Subject Classification (1998): I.2.6, I.2, H.2.8, H.3-5, F.1, F.2.2, J.1 LNCS Sublibrary: SL 7 – Artificial Intelligence
Preface

This volume contains papers presented at the 2011 International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making (IUKM 2011), which was held at Zhejiang University, Hangzhou, China, during October 28–30, 2011. The principal aim of IUKM 2011 was to provide a forum in which researchers could exchange ideas and results on both theoretical and applied research relating to all aspects of uncertainty management and their applications. The organizers received 55 papers. Each paper was peer reviewed by two members of the Program Committee. Finally, 21 papers were chosen for presentation at IUKM 2011 and publication in the proceedings. The keynote and invited talks presented at the symposium are also included in this volume. As a follow-up to the symposium, a special issue of the International Journal of Approximate Reasoning is anticipated to include a small number of extended papers selected from the symposium as well as other relevant contributions received in response to subsequent open calls. These journal submissions will go through a fresh round of reviews in accordance with the journal's guidelines. The IUKM 2011 symposium was partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61075046, the Zhejiang Natural Science Foundation under Grant No. Y1090003, the JSPS Grant-in-Aid for Scientific Research [KAKENHI(B) No. 22300074], and SCOPE 102305001 of the Ministry of Internal Affairs and Communications (MIC), Japan. We are very thankful to the College of Computer Science and Technology of Zhejiang University for providing crucial support throughout the organization of IUKM 2011. We would like to express our appreciation to the members of the Program Committee for their support and cooperation in this publication. Last but not least, we wish to thank all the authors and participants for their contributions and fruitful discussions that made this symposium a success.

October 2011
Yongchuan Tang Van-Nam Huynh Jonathan Lawry
Organization
IUKM 2011 was co-organized by the College of Computer Science and Technology, Zhejiang University, and the School of Knowledge Science, Japan Advanced Institute of Science and Technology.
General Co-chairs
Yueting Zhuang (Zhejiang University, China)
Yoshiteru Nakamori (Japan Advanced Institute of Science and Technology, Japan)
Program Co-chairs
Yongchuan Tang (Zhejiang University, China)
Van-Nam Huynh (Japan Advanced Institute of Science and Technology, Japan)
Jonathan Lawry (University of Bristol, UK)
Program Committee
Byeong Seok Ahn (Chung-Ang University, Korea)
Bernard De Baets (Ghent University, Belgium)
Yaxin Bi (University of Ulster at Jordanstown, UK)
Bernadette Bouchon-Meunier (University Pierre and Marie Curie, France)
Tru H. Cao (Ho Chi Minh City University of Technology, Vietnam)
Fabio Cuzzolin (Oxford Brookes University, UK)
Van Hung Dang (Hanoi National University, Vietnam; UNU-IIST, Macau)
Thierry Denoeux (University of Technology of Compiègne, France)
Gary Gang Feng (Hong Kong City University, Hong Kong)
Lluis Godo (Artificial Intelligence Research Institute, CSIC, Spain)
Yongyong He (Tsinghua University, China)
Enrique Herrera-Viedma (University of Granada, Spain)
Kaoru Hirota (Tokyo Institute of Technology, Japan)
Tu Bao Ho (Japan Advanced Institute of Science and Technology, Japan)
Kaizhu Huang (Chinese Academy of Sciences, China)
Wei Huang (Huazhong University of Science and Technology, China)
Eyke Hullermeier (Philipps-Universität Marburg, Germany)
Mitsuru Ikeda (Japan Advanced Institute of Science and Technology, Japan)
Masahiro Inuiguchi (Osaka University, Japan)
Gabriele Kern-Isberner (University of Dortmund, Germany)
Etienne E. Kerre (University of Ghent, Belgium)
Hiroaki Kikuchi (Tokai University, Japan)
Laszlo Koczy (Budapest University of Technology and Economics, Hungary)
Vladik Kreinovich (University of Texas at El Paso, USA)
Ming Li (Nanjing University, China)
Churn-Jung Liau (Academia Sinica, Taiwan)
Jun Liu (University of Ulster at Jordanstown, UK)
Weiru Liu (Queens University Belfast, UK)
Trevor Martin (University of Bristol, UK)
Radko Mesiar (Slovak University of Technology Bratislava, Slovakia)
Sadaaki Miyamoto (University of Tsukuba, Japan)
Tetsuya Murai (University of Hokkaido, Japan)
Hung T. Nguyen (New Mexico State University, USA)
Witold Pedrycz (University of Alberta, Canada)
Zengchang Qin (Beihang University, China)
Jordi Recasens (UPC Barcelona, Spain)
Jonathan Rossiter (University of Bristol, UK)
Andrzej Skowron (Warsaw University, Poland)
Noboru Takagi (Toyama Prefectural University, Japan)
Vicenc Torra (Artificial Intelligence Research Institute, CSIC, Spain)
Milan Vlach (Charles University, Czech Republic)
Junzo Watada (Waseda University, Japan)
Ronald Yager (Machine Intelligence Institute, USA)
Sponsoring Institutions
Zhejiang University
Japan Advanced Institute of Science and Technology
The University of Bristol
Springer
Elsevier
Table of Contents

Introduction to the ER Rule for Evidence Combination (Jian-Bo Yang and Dong-Ling Xu)
Theories and Approaches to Treat Incomparability (Yang Xu and Jun Liu)
Knowledge Science – Modeling the Knowledge Creation Process (Yoshiteru Nakamori)
Two Classes of Algorithms for Data Clustering (Sadaaki Miyamoto)
Fusing Conceptual Graphs and Fuzzy Logic: Towards the Structure and Expressiveness of Natural Language (Tru H. Cao)
An Information Processing Model for Emotional Agents Based on the OCC Model and the Mood Congruent Effect (Chao Ma, Guanghong Gong, and Yaofei Ma)
On Distributive Equations of Implications and Contrapositive Symmetry Equations of Implications Based on a Continuous t-Norm (Feng Qin and Meihua Lu)
A Novel Cultural Algorithm Based on Differential Evolution for Hybrid Flow Shop Scheduling Problems with Fuzzy Processing Time (Qun Niu, Tingting Zeng, and Zhuo Zhou)
An Over-Relaxed (A, η, m)-Proximal Point Algorithm for System of Nonlinear Fuzzy-Set Valued Operator Equation Frameworks and Fixed Point Problems (Heng-you Lan, Xiao Wang, Tingjian Xiong, and Yumin Xiang)
Reliability-Based Route Optimization of a Transportation Network with Random Arc Capacities and Time Threshold (Tao Zhang, Bo Guo, and Yuejin Tan)
Clustering Based Bagging Algorithm on Imbalanced Data Sets (Xiao-Yan Sun, Hua-Xiang Zhang, and Zhi-Chao Wang)
Agglomerative Hierarchical Clustering Using Asymmetric Similarity Based on a Bag Model and Application to Information on the Web (Satoshi Takumi and Sadaaki Miyamoto)
Talking with Uncertainty

Toyoaki Nishida
Graduate School of Informatics, Kyoto University, Kyoto, Japan
[email protected]
Abstract. We are living in a world with uncertainty. Our artificial partners should be able not only to reason about uncertainty but also to communicate it in an empathic fashion. The combination of an immersive WOZ environment and the learning-by-mimicking framework allows for the development of a robotic agent's communication competence in a data-driven fashion. Conversational decision making permits an artificial partner to interactively help people make decisions under uncertainty. A cognitive architecture needs to be considered for provoking empathy in social decision making.

Keywords: Communicating Uncertainty, Conversational Informatics, Conversational Artifacts, Immersive WOZ Environment, Conversational Decision Making, Empathic Agent.
make decisions under uncertainty. As we will discuss later, a key appears to be facilitating dialectic conversation to incrementally formulate a joint intention. Third, we need to think about a cognitive architecture that may provoke empathy in managing and communicating uncertainty, as social decision making needs to be supported by empathy among the participants.
2 Building Conversational Artifacts
The most popular communicative activity is face-to-face conversation. In order for an artificial partner to participate in conversation to communicate with people, it should be able not only to express and recognize a rich vocabulary for representing uncertainty but also to follow and control the conversation flow. Nonverbal communication is deemed useful for the speaker when s/he wants to directly refer to the real world, illustrate some semantic features [3], or exhibit her/his attitude towards a subject as a social signal [4]. Meanwhile, nonverbal communication allows the participant to control or even negotiate the discourse flow of conversation, e.g., by gazing at the partner to monitor the current state or by averting eye gaze in order not to be interrupted after the turn is obtained, as reported by [5]. Our group has addressed how much we can measure social signals from nonverbal cues. Huang et al. [6] introduced two heuristics to drive the utterance policy in multiple-user interaction: the interaction atmosphere of the participants (AT) and the participant who tends to lead the conversation at a specific time point (CLP). AT and CLP are estimated from the activities of the participants' face movement and acoustic information. Ohmoto et al. [7] introduced the notion of I-degree, the degree of involvement in interaction. It is assumed that the I-degree naturally manifests as physiological indices, such as skin conductance response and respiration, reflecting a person's mental state. A method is proposed for estimating the I-degree of participants from visual cues, by exploiting the measured correlation between the visual and physiological cues. The I-degree is used to detect the atmosphere of conversation in multi-user interaction. How can we turn these insights into the design of artificial partners? A baseline approach is "behavior from observation", i.e., to observe exactly how people behave in face-to-face conversation and create a quantitative model for reproducing the behavior. Recently, numerous works have followed this approach, e.g., [8, 9]. This approach leverages recent advances in sensing technology, which allow for capturing the nonverbal communication behaviors that people exhibit consciously or unconsciously during conversation, so as to build a quantitative model for conveying propositions encompassing uncertainty in varying situations or for controlling the discourse of conversation. Unfortunately, this approach faces a difficulty when differences in embodiment manifest between the human and the robot, for different embodiment induces different communication patterns, and the communication patterns in human-human interaction cannot be directly applied to human-robot interaction. For example, we have found in our experiments that people tend to use clear, emphasized, complete,
Fig. 1. The immersive WOZ environment ICIE
and redundant expressions in human-robot interaction, as opposed to vague, subtle, incomplete, and parsimonious ones in human-human interaction [10]. This is probably because robotic agents are deemed not as competent as humans in communication and the common ground is not well established. Under these circumstances we need to employ the WOZ (Wizard of Oz) method to observe interaction between human participants and a robot controlled by a hidden human operator (the WOZ operator). In order for the WOZ approach to be successful, we need to overcome the difficulties of manipulating a robot with many degrees of freedom. Our immersive WOZ environment ICIE [11] allows the human operator to control a robot as if s/he stayed inside it, as shown in Fig. 1. The audio-visual environment surrounding the WOZ-operated robot is captured, e.g., by an omnidirectional camera attached to the robot's head, and is sent to the WOZ operator's cockpit to be projected on the surrounding immersive screen and played through the speakers. The current version of ICIE employs eight 64-inch display panels arranged in a circle about 2.5 meters in diameter. Eight surround speakers are used to reproduce the acoustic environment. Together, the immersive environment allows the WOZ operator in the center of the cockpit to grasp in detail the situation around the robot and to determine exactly what to do if s/he were the robot. The WOZ operator's behavior, in turn, is captured in real time by a collection of range sensors. Noise filters and a human body model are used for robust recognition of pose, head direction, and gesture. The captured motion is mapped onto the robot for motion generation. The sound on each side of the WOZ operator is gathered by microphones and communicated via the network so that other participants in the conversation place can hear the voice of the WOZ operator (with a modulation, when necessary). The behavioral model of the robot is generated from the collected data in four stages in the framework of learning by mimicking (Fig. 2) [12, 13]. First, the basic actions and commands are discovered in the discovery stage. A number of novel algorithms have been developed for this purpose. RSST (Robust Singular Spectrum Transform) is an algorithm that calculates the likelihood of a change of dynamics in a continuous time series without prior knowledge. DGCMD (Distance-Graph Constrained Motif Discovery) uses the result of RSST to discover motifs (recurring temporal patterns) from the given time
Fig. 2. The framework of learning by mimicking [12, 13]
series. Second, a probabilistic model is generated to specify the likelihood of the occurrence of observed actions as a result of observed commands in the association stage. Granger causality is used to discover the natural delay. Third, the behavioral model is converted into an actual controller in the controller generation stage to allow the robotic agent to act in similar situations. Finally, the gestures and actions learned from multiple interactions are combined into a single model in the accumulation stage.
3 Conversational Decision Making
At the higher level, conversation can be seen as a process for realizing a joint activity. In order to participate in a conversation to help people make decisions, an artificial partner should be able to facilitate dialectic conversation so that a joint intention can be formulated incrementally during the conversational interactions. We have sought to unveil how a skillful facilitator promotes group discussion by supporting the group's social and cognitive processes [14]. In general, facilitators are considered to allow participants to focus on substantive issues in the decision making process, by interposing appropriately based on the most important arguments of each participant. We have investigated the nonverbal and paralinguistic behavior of participants in face-to-face discussion led by a skillful facilitator. As a result of linear discriminant analysis, we found that four types of facilitating behavior can be classified with 80.0% accuracy by using six independent variables. The discriminant functions indicated that the facilitator paid attention to the fairness of the discussion and that the participants conveyed their requests to the facilitator by using nonverbal behavior.
In the succeeding study [15], we investigated how much one could help the user formulate a preference structure on an unfamiliar subject by repeating interviews consisting of presenting possible choices and asking for preferences. During each interview, not only verbal responses but also body movements and physiological indices (SCR and LF/HF) were measured to estimate the ordered set of features the user emphasized. We assumed that the user's preference structure may change as s/he changes the emphasized features. We obtained several interesting findings, e.g., the preferential structure of the user does change during a session, our method can better track the user's changing emphasis than a previous method, and our method resulted in user satisfaction more often than the previous method. A further experiment [16] suggests that critical changes in the features emphasized by the user can be detected by the combination of verbal reactions, body movements, and physiological indices.
4 Towards Empathic Agents for Living with Uncertainty
An artificial partner needs to be empathic in order to be successful. In sociocultural computing, building an empathic agent for moderating communication across different cultures is deemed a key problem [17]. Pentland [4] suggests that we need to listen to honest signals, such as influence, mimicry, activity, or consistency, which come from our brain structure and physiology. Gallese, Eagle, and Migone [18] suggest that intentional attunement or embodied simulation, enabled by the dynamics of our embodiment and neural system, including mirror neurons, might be a key. I suspect that sharing abstract conceptualizations with grounded symbols might be critical for empathy to be sustained. A theory of mind [19] needs to be implemented so that an artificial partner can think about the mental states of other agents. A challenging AI problem is to build an intelligent agent that can build and maintain a shared image with the other participants in the conversation place.
References 1. AI evolution: From Tool to Partner, http://news.harvard.edu/gazette/story/2002/01/ ai-evolution-from-tool-to-partner/ 2. Halpern, J.Y.: Reasoning about Uncertainty. The MIT Press (2005) 3. Clark, H.H.: Using Language. Cambridge University Press, Cambridge (1996) 4. Pentland, A.: Honest Signals – How they shape our world. MIT Press, Cambridge (2008) 5. Kendon, A.: Some functions of gaze direction in social interaction. Acta Psychologica 26, 22–63 (1967) 6. Huang, H.H., Furukawa, T., Ohashi, H., Cerekovic, A., Pandzic, I., Nakano, Y., Nishida, T.: How Multiple Current Users React to a Quiz Agent Attentive to the Dynamics of Their Participation. In: Proc. 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), Toronto, Canada, pp. 1281–1288 (2010) 7. Ohmoto, Y., Miyake, T., Nishida, T.: A Method to Understand an Atmosphere based on Visual Information and Physiological Indices in Multi-user Interaction. In: The 8th International Workshop on Social Intelligence Design (SID 2009), Kyoto, Japan (2009)
8. Nishida, T. (ed.): Conversational Informatics: an Engineering Approach. John Wiley & Sons Ltd., London (2007) 9. Rehm, M., Nakano, Y., André, E., Nishida, T., Bee, N., Endrass, B., Huang, H.H., Lipi, A.A., Wissner, M.: From Observation to Simulation: Generating Culture-specific Behavior for Interactive Systems. AI & Society 24(3), 267–280 (2009) 10. Ohmoto, Y., Ohashi, H., Nishida, T.: How do Constraints on Robot's Embodiment Influence Human Robot Interaction?, 1E1-4. In: Proc. the 25th Annual Conference of the Japanese Society for Artificial Intelligence (2011) (in Japanese) 11. Ohmoto, Y., Ohashi, H., Lala, D., Mori, S., Sakamoto, K., Kinoshita, K., Nishida, T.: ICIE: Immersive Environment for Social Interaction based on Socio-Spacial Information. To be presented at the TAAI Conference on Technologies and Applications of Artificial Intelligence (TAAI 2011), Taoyuan, Taiwan (2011) 12. Mohammad, Y.F.O., Nishida, T., Okada, S.: Unsupervised Simultaneous Learning of Gestures, Actions and their Associations for Human-Robot Interaction. In: Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), pp. 2537–2544 (2009) 13. Mohammad, Y.F.O., Nishida, T.: Learning Interaction Protocols using Augmented Bayesian Networks Applied to Guided Navigation. In: Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010), pp. 4119–4126 (2010) 14. Ohmoto, Y., Toda, Y., Ueda, K., Okada, S., Nishida, T.: The Analysis of the Facilitation Actions based on the Divergence-Convergence in Discussions and Nonverbal behavior, 2G1-OS3-5. In: Proc. the 24th Annual Conference of the Japanese Society for Artificial Intelligence (2010) (in Japanese) 15. Ohmoto, Y., Miyake, T., Nishida, T.: Estimating Preference Structure through Human-Agent Interaction, 3E2-2. In: Proc. the 25th Annual Conference of the Japanese Society for Artificial Intelligence (2011) (in Japanese) 16. Ohmoto, Y., Kataoka, M., Miyake, T., Nishida, T.: A Method to Dynamically Estimate Emphasizing Points and Degree by using Verbal and Nonverbal Information and Physiological Indices. To be presented at the 2011 IEEE International Conference on Granular Computing (GrC 2011), Kaohsiung, Taiwan (2011) 17. Aylett, R., Paiva, A.: Computational Modelling of Culture and Affect. Emotion Review (in press, 2011) 18. Gallese, V., Eagle, M.N., Migone, P.: Intentional Attunement: Mirror Neurons and the Neural Underpinnings of Interpersonal Relations. Journal of the American Psychoanalytic Association 55, 131–176 (2007) 19. Baron-Cohen, S., Leslie, A.M., Frith, U.: Does the Autistic Child have a Theory of Mind? Cognition 21, 37–46 (1985)
Introduction to the ER Rule for Evidence Combination

Jian-Bo Yang and Dong-Ling Xu
Manchester Business School, The University of Manchester, Manchester M15 6PB, UK
{jian-bo.yang,ling.xu}@mbs.ac.uk
Abstract. The Evidential Reasoning (ER) approach has been developed to support multiple criteria decision making (MCDM) under uncertainty. It is built upon Dempster's rule for evidence combination and uses belief functions for dealing with probabilistic uncertainty and ignorance. In this introductory paper, following a brief introduction to Dempster's rule and the ER approach, we report the discovery of a new generic ER rule for evidence combination [16]. We first introduce the concepts and equations of a new extended belief function and then examine the detailed combination equations of the new ER rule. A numerical example is provided to illustrate the new ER rule.

Keywords: Evidential reasoning, Belief function, Evidence combination, Dempster's rule, Multiple criteria decision making.
1 Basic Concepts of Evidence Theory
The evidence theory was first investigated in the 1960s [2] and formalised in the 1970s [7]. It has since been further developed and has found widespread applications in many areas such as artificial intelligence, expert systems, pattern recognition, information fusion, database and knowledge discovery, multiple criteria decision making (MCDM), audit risk assessment, etc. [1, 3, 5, 9–15]. In this section, the basic concepts of the belief function and Dempster's combination rule of the evidence theory are briefly introduced as a basis for the introduction of the Evidential Reasoning (ER) approach in the next section. Suppose $H = \{H_1, \ldots, H_N\}$ is a set of mutually exclusive and collectively exhaustive propositions, referred to as the frame of discernment. A basic probability assignment (bpa) is a belief function $m : \Theta \to [0, 1]$, satisfying

$m(\emptyset) = 0 \quad \text{and} \quad \sum_{C \in \Theta} m(C) = 1$    (1)
with $\emptyset$ being the empty set, $C$ any subset of $H$, and $\Theta$ the power set of $H$, consisting of all the $2^N$ subsets of $H$, or

$\Theta = \{\emptyset, \{H_1\}, \ldots, \{H_N\}, \{H_1, H_2\}, \ldots, \{H_1, H_N\}, \ldots, \{H_1, \ldots, H_{N-1}\}, H\}$    (2)
A basic probability mass $m(C)$ measures the degree of belief exactly assigned to a proposition $C$ and represents how strongly the proposition is supported by evidence. The probabilities assigned to all the subsets of $H$ sum to unity and no belief is left to the empty set. The probability assigned to $H$, or $m(H)$, is referred to as the degree of global ignorance. A probability assigned to any subset of $H$, except for the individual propositions $H_n$ ($n = 1, \ldots, N$) and $H$, is referred to as a degree of local ignorance. If there is no local or global ignorance, a belief function reduces to a conventional probability function. Associated with each bpa are a belief measure, $Bel(C)$, and a plausibility measure, $Pl(C)$, defined by the following equations:

$Bel(C) = \sum_{B \subseteq C} m(B) \quad \text{and} \quad Pl(C) = \sum_{B \cap C \neq \emptyset} m(B)$    (3)

$Bel(C)$ represents the exact support to $C$ and its subsets, and $Pl(C)$ represents all possible support to $C$ and its subsets. The interval $[Bel(C), Pl(C)]$ can be seen as the lower and upper bounds of support to $C$. The two functions are connected by the following equation:

$Pl(C) = 1 - Bel(\bar{C})$    (4)

where $\bar{C}$ denotes the complement of $C$.
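As a concrete illustration of Equation (3), here is a minimal Python sketch (our own, not from the paper; the function names and the example bpa are illustrative), with propositions encoded as frozensets:

```python
def bel(m, C):
    """Belief measure: total mass exactly committed to C and its subsets (Eq. (3))."""
    return sum(v for B, v in m.items() if B <= C)

def pl(m, C):
    """Plausibility measure: total mass of focal elements intersecting C (Eq. (3))."""
    return sum(v for B, v in m.items() if B & C)

# A bpa on the frame H = {H1, H2, H3} with some global ignorance:
H = frozenset({"H1", "H2", "H3"})
m = {frozenset({"H1"}): 0.5, frozenset({"H1", "H2"}): 0.3, H: 0.2}
C = frozenset({"H1", "H2"})
print(bel(m, C), pl(m, C))  # 0.8 1.0 -- the interval [Bel, Pl] bounds support to C
```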
The difference between the belief and plausibility measures of $C$ describes the degree of ignorance in the assessment of $C$ [7]. The core of the evidence theory is Dempster's rule for evidence combination, by which evidence from different sources is combined. The rule assumes that information sources are independent and uses the so-called orthogonal sum to combine multiple belief functions:

$m = m_1 \oplus m_2 \oplus \ldots \oplus m_L$    (5)

where $\oplus$ is the orthogonal sum operator. With two pieces of evidence $m_1$ and $m_2$, Dempster's rule for evidence combination is given as follows:

$[m_1 \oplus m_2](\theta) = \begin{cases} 0, & \theta = \emptyset \\ \frac{\sum_{B \cap C = \theta} m_1(B)\, m_2(C)}{1 - \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)}, & \theta \neq \emptyset \end{cases}$    (6)
Note that Dempster's rule provides a non-compensatory process for the aggregation of two pieces of evidence and can lead to irrational conclusions in the aggregation of multiple pieces of evidence in conflict [4, 6, 8], in particular in cases where multiple pieces of evidence are mutually compensatory in nature. By a compensatory process of evidence combination, it is meant that no piece of evidence is dominating; each plays a relative role, which is related to its relative importance. On the other hand, the ER approach [9, 11–15] introduced in the next section provides a compensatory evidence aggregation process, which is different from Dempster's rule in that it treats basic probability assignments as weighted belief degrees, embraces the concept of the degree of indecisiveness caused by evidence weights, and adopts a normalisation process for combined probability masses without leaving any belief to the empty set.
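The conflict problem just described can be seen directly in code. The following minimal Python sketch (ours, not from the paper; names are illustrative) implements Equation (6) with propositions encoded as frozensets and runs it on strongly conflicting evidence of the kind discussed in [4, 6]:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule (Equation (6)) for two mass functions given as
    dicts mapping frozensets to masses; conflict is normalised away."""
    combined = {}
    conflict = 0.0  # total product mass falling on the empty set
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
m1 = {A: 0.99, B: 0.01}
m2 = {B: 0.01, C: 0.99}
print(dempster_combine(m1, m2))  # {frozenset({'B'}): 1.0} up to rounding
```

All belief collapses onto the weakly supported proposition B, which is exactly the kind of irrational conclusion that the compensatory ER approach below is designed to avoid.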
2 The Main Steps of the ER Approach for MCDM
In the ER approach, an MCDM problem is modelled using a belief decision matrix. Suppose $M$ alternatives ($A_l$, $l = 1, \ldots, M$) are assessed on $L$ criteria $e_i$ ($i = 1, \ldots, L$), each on the basis of $N$ common evaluation grades (propositions) $H_n$ ($n = 1, \ldots, N$), which are required to be mutually exclusive and collectively exhaustive. If an alternative $A_l$ is assessed to a grade $H_n$ on a criterion $e_i$ with a belief degree of $\beta_{n,i}(A_l)$, this assessment can be denoted by a belief function with global ignorance $S_i(A_l) = S(e_i(A_l)) = \{(H_n, \beta_{n,i}(A_l)), n = 1, \ldots, N, (H, \beta_{H,i}(A_l))\}$, with $\beta_{H,i}(A_l)$ used to measure the degree of global ignorance, $\sum_{n=1}^{N} \beta_{n,i}(A_l) + \beta_{H,i}(A_l) = 1$, $\beta_{n,i}(A_l) \geq 0$ ($n = 1, \ldots, N$) and $\beta_{H,i}(A_l) \geq 0$. The individual assessments of all alternatives, each on every criterion, can be represented by a belief decision matrix, defined as follows:

$D = (S_i(A_l))_{L \times M}$    (7)
Suppose $\omega_i$ is the relative weight of the $i$th criterion, normalised so that

$0 \leq \omega_i \leq 1 \quad \text{and} \quad \sum_{i} \omega_i = 1$    (8)
The ER approach has both the commutative and associative properties and as such can be used to combine belief functions in any order. The ER aggregation process can be implemented recursively [11–13], summarised as the following main steps.

Step 1: Assignment of basic probability masses. Suppose the basic probability masses for an assessment $S_1(A_l)$ are given by:

$m_{n,1} = \omega_1 \beta_{n,1}(A_l)$ for $n = 1, \ldots, N$, $\quad m_{H,1} = \omega_1 \beta_{H,1}(A_l)$, and
$m_{\Theta,1} = 1 - \omega_1 \left( \sum_{n=1}^{N} \beta_{n,1}(A_l) + \beta_{H,1}(A_l) \right) = 1 - \omega_1$    (9)
In the evidence theory, $m_{n,1}$ may be interpreted as discounted belief. In MCDM, it should be interpreted as weighted belief or the individual support for the assessment of $A_l$ to $H_n$, as it means that in assessing an alternative $A_l$ the first criterion only plays a limited role that is proportional to its weight. $m_{H,1}$ represents the weighted global ignorance of the assessment. $m_{\Theta,1}$ is referred to as the degree of indecisiveness left by $S_1(A_l)$, representing the amount of belief that is not yet assigned to any individual or any subset of grades by $S_1(A_l)$ alone but needs to be jointly assigned in accordance with all other assessments in question. Similarly, the basic probability masses for another assessment $S_2(A_l)$ are given by

$m_{n,2} = \omega_2 \beta_{n,2}(A_l)$ for $n = 1, \ldots, N$, $\quad m_{H,2} = \omega_2 \beta_{H,2}(A_l)$, and
$m_{\Theta,2} = 1 - \omega_2 \left( \sum_{n=1}^{N} \beta_{n,2}(A_l) + \beta_{H,2}(A_l) \right) = 1 - \omega_2$    (10)
Step 2: Combination of basic probability masses. The basic probability masses for $S_1(A_l)$ and $S_2(A_l)$ can be combined using the following ER algorithm:

$\{H_n\}: \ m_{n,12} = k \left( m_{n,1} m_{n,2} + m_{n,1}(m_{H,2} + m_{\Theta,2}) + (m_{H,1} + m_{\Theta,1}) m_{n,2} \right)$ for $n = 1, \ldots, N$    (11)

$\{H\}: \ m_{H,12} = k \left( m_{H,1} m_{H,2} + m_{H,1} m_{\Theta,2} + m_{H,2} m_{\Theta,1} \right)$    (12)

$\{\Theta\}: \ m_{\Theta,12} = k \left( m_{\Theta,1} m_{\Theta,2} \right)$    (13)

$k = \left( 1 - \sum_{n=1}^{N} \sum_{t=1,\, t \neq n}^{N} m_{n,1} m_{t,2} \right)^{-1}$    (14)
In the above ER algorithm, $m_{n,12}$ and $m_{H,12}$ measure the relative magnitudes of the total beliefs in the individual grade $H_n$ and the frame of discernment $H$, respectively, generated by combining the two belief functions $S_1(A_l)$ and $S_2(A_l)$. $m_{\Theta,12}$ is the degree of indecisiveness left by both $S_1(A_l)$ and $S_2(A_l)$, representing the amount of belief that needs to be re-assigned back to all subsets of grades proportionally after the combination process is completed, so that no belief is assigned to the empty set. $k$ measures the degree of conflict between $S_1(A_l)$ and $S_2(A_l)$.

Step 3: Generation of total belief degrees. If there are more than two assessments, Step 2 can be repeated to combine an uncombined assessment with the previously combined assessment given by $m_{n,12}$ ($n = 1, \ldots, N$), $m_{H,12}$ and $m_{\Theta,12}$. After all assessments are combined recursively, the finally combined probability masses need to be normalised to generate the total belief degrees $\beta_{n,12}$ and $\beta_{H,12}$ (for $L = 2$) by proportionally re-assigning $m_{\Theta,12}$ back to all subsets of grades as follows:

$\{H_n\}: \ \beta_{n,12} = \frac{m_{n,12}}{1 - m_{\Theta,12}}, \quad n = 1, \ldots, N$    (15)

$\{H\}: \ \beta_{H,12} = \frac{m_{H,12}}{1 - m_{\Theta,12}}$    (16)

The combined assessment for $A_l$ is then given by the following belief function:

$S(A_l) = \{(H_1, \beta_{1,12}), (H_2, \beta_{2,12}), \ldots, (H_N, \beta_{N,12}), (H, \beta_{H,12})\}$    (17)
The above belief function provides a panoramic view of the combined assessment of the alternative $A_l$, with the degrees of strength and weakness explicitly measured by the belief degrees.
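To make Steps 1-3 concrete, here is a minimal Python sketch of the recursive ER aggregation of Equations (9)-(16) (our own illustration, not code from the paper; function names and the usage values are made up). Each assessment is a list of N grade beliefs followed by the global-ignorance degree:

```python
def er_aggregate(assessments, weights):
    """Recursive ER aggregation (Equations (9)-(16)).

    assessments: list of [beta_1, ..., beta_N, beta_H], each summing to 1.
    weights:     criterion weights, 0 <= w_i <= 1, summing to 1.
    Returns the combined degrees [beta_1, ..., beta_N, beta_H].
    """
    N = len(assessments[0]) - 1

    def to_masses(beta, w):
        # Equations (9)/(10): weighted beliefs plus indecisiveness 1 - w
        return [w * b for b in beta], 1.0 - w

    m, m_theta = to_masses(assessments[0], weights[0])
    for beta, w in zip(assessments[1:], weights[1:]):
        m2, m2_theta = to_masses(beta, w)
        # Equation (14): normalisation factor k from the conflict masses
        conflict = sum(m[n] * m2[t]
                       for n in range(N) for t in range(N) if t != n)
        k = 1.0 / (1.0 - conflict)
        mH, m2H = m[N], m2[N]
        new = [k * (m[n] * m2[n] + m[n] * (m2H + m2_theta)
                    + (mH + m_theta) * m2[n]) for n in range(N)]   # Eq. (11)
        new.append(k * (mH * m2H + mH * m2_theta + m2H * m_theta))  # Eq. (12)
        m, m_theta = new, k * m_theta * m2_theta                    # Eq. (13)
    # Equations (15)-(16): re-assign the remaining indecisiveness
    return [v / (1.0 - m_theta) for v in m]

# Two criteria with weights 0.6 and 0.4, three grades plus global ignorance:
S1 = [0.5, 0.3, 0.1, 0.1]
S2 = [0.2, 0.6, 0.1, 0.1]
print(er_aggregate([S1, S2], [0.6, 0.4]))
```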
3 Introduction to the ER Rule for Evidence Combination
In Section 2, a belief function with global ignorance was represented by $S_i(A_l) = S(e_i(A_l)) = \{(H_n, \beta_{n,i}(A_l)), n = 1, \ldots, N, (H, \beta_{H,i}(A_l))\}$, with $(H_n, \beta_{n,i}(A_l))$ referred to as a focal element of $S(e_i(A_l))$ if $\beta_{n,i}(A_l) > 0$. $m_{n,1} = \omega_1 \beta_{n,1}(A_l)$, given in Equation (9), represents the individual support of the evidence $S(e_1(A_l))$ for the hypothesis that $A_l$ is assessed to $H_n$. Similarly, $m_{n,2} = \omega_2 \beta_{n,2}(A_l)$, given in Equation (10), represents the individual support of the evidence $S(e_2(A_l))$ for the same hypothesis. As such, $m_{n,1} m_{n,2}$ represents the joint support of both $S(e_1(A_l))$ and $S(e_2(A_l))$ for the same hypothesis. Generally, suppose a piece of evidence $S(e_i)$ with the weight $\omega_i$ is represented by the following conventional belief function with $\sum_{\theta \in \Theta} \beta_{\theta,i} = 1$:

$S(e_i) = \{(\theta, \beta_{\theta,i}), \forall \theta \in \Theta\}$    (18)
We can now extend the above conventional belief function to include a special element $(\Theta, (1 - \omega_i))$ for constructing a new extended belief function for $S(e_i)$ as follows [16]:

$m_i = \{(\theta, m_{\theta,i}), \forall \theta \in \Theta, (\Theta, m_{\Theta,i})\}$    (19)

with

$m_{\theta,i} = \omega_i \beta_{\theta,i}, \ \forall \theta \in \Theta, \quad \text{and} \quad m_{\Theta,i} = 1 - \omega_i$    (20)

Note that the following relationship between a conventional belief function and its extended belief function is always true [16]:

$\beta_{\theta,i} = \frac{m_{\theta,i}}{1 - m_{\Theta,i}}, \ \forall \theta \in \Theta$    (21)
We are now in a position to introduce the new ER rule [16]. Let two pieces of independent evidence $S(e_1)$ and $S(e_2)$ with the relative weights $\omega_1$ and $\omega_2$ be represented by the conventional belief functions defined by Equation (18), with $\omega_1 + \omega_2 = 1$, $m_{\theta,1} = \omega_1 \beta_{\theta,1}$ and $m_{\theta,2} = \omega_2 \beta_{\theta,2}$ for all $\theta \subseteq H$. Then $S(e_1)$ and $S(e_2)$ can be combined by the following ER rule, which can be used recursively for aggregating multiple pieces of evidence [16]:

$m_{\theta,12} = \frac{\hat{m}_{\theta,12}}{\sum_{\theta \subseteq H} \hat{m}_{\theta,12} + \hat{m}_{\Theta,12}}, \ \forall \theta \subseteq H, \quad \text{and} \quad m_{\Theta,12} = \frac{\hat{m}_{\Theta,12}}{\sum_{\theta \subseteq H} \hat{m}_{\theta,12} + \hat{m}_{\Theta,12}}$    (22)

$\beta_{\theta,12} = \frac{\hat{m}_{\theta,12}}{\sum_{\theta \subseteq H} \hat{m}_{\theta,12}}, \ \forall \theta \subseteq H$    (23)

$\hat{m}_{\theta,12} = \left[ (1 - \omega_2) m_{\theta,1} + (1 - \omega_1) m_{\theta,2} \right] + \sum_{B, C \subseteq H;\ B \cap C = \theta} m_{B,1} m_{C,2}$    (24)

$\hat{m}_{\Theta,12} = m_{\Theta,1} m_{\Theta,2}$    (25)

Here the sums over $\theta \subseteq H$ exclude the empty set, so that no belief is assigned to it.
The combined extended and conventional belief functions can then be represented as follows:

$m_1 \oplus m_2 = \{(\theta, m_{\theta,12}), \forall \theta \in \Theta, (\Theta, m_{\Theta,12})\}$    (26)

$S(e_1) \otimes S(e_2) = \{(\theta, \beta_{\theta,12}), \forall \theta \in \Theta\}$    (27)
where $\oplus$ is the orthogonal sum operator composed of Equations (22), (24) and (25) for generating combined extended belief functions, which can be applied recursively, and $\otimes$ is the ER operator consisting of Equations (23) and (24) for generating combined conventional belief functions, which can be used after the extended belief functions are combined. The new ER rule results from the innovation of implementing Dempster's rule on the new extended belief functions. It can be shown that the current ER approach, as summarised in Section 2, is a special case of the new ER rule. The new ER rule provides a generic process for generating total beliefs from the combination of multiple pieces of independent evidence under the normal condition that each piece of evidence plays a role equal to its relative weight. The ER rule can be applied in areas where the above normal condition is satisfied, for example in multiple criteria decision making. It is important to note that the combined belief generated by using the ER rule to aggregate two pieces of evidence is composed of two parts: the bounded average of the individual support, which is the first bracketed term in Equation (24), and the orthogonal sum of the joint support, which is the last term in Equation (24). This is in contrast to the partial belief generated by using Dempster's rule on conventional belief functions, which includes only the orthogonal sum to account for the joint support, with the individual support either abandoned or assigned to the empty set, either of which is irrational.
4 Illustration of the ER Rule
We now examine a simple example to illustrate how the ER rule can be implemented and to explain whether the results it generates are rational. Suppose three pieces of evidence of equal importance are given as the following three belief functions, each with only its focal elements listed:

$S(e_1) = \{(A, 0.99), (B, 0.01)\}$    (28)
$S(e_2) = \{(B, 0.01), (C, 0.99)\}$    (29)
$S(e_3) = \{(B, 0.01), (\{A, C\}, 0.99)\}$    (30)
with ω1 = ω2 = ω3 = 0.3333. Suppose they each play a role equal to their relative weights. Note that H = {A, B, C} and Θ = {∅, A, B, C, {A, B}, {A, C}, {B, C}, {A, B, C}}
in this example. The extended belief functions corresponding to Equations (28)-(30) are given using Equations (19) and (20) as follows:

$m_1 = \{(A, 0.33), (B, 0.0033), (\Theta, 0.6667)\}$    (31)
$m_2 = \{(B, 0.0033), (C, 0.33), (\Theta, 0.6667)\}$    (32)
$m_3 = \{(B, 0.0033), (\{A, C\}, 0.33), (\Theta, 0.6667)\}$    (33)
The calculations of the ER rule for the above example are shown in Table 1. The ER rule (Equations (22), (24) and (25)) is applied recursively to Equations (31)-(33). The results in the last row, generated using Equation (23) after the second iteration, show the final combined conventional belief function, complementary to the extended belief function shown in the second-to-last row, which is generated by aggregating all three extended belief functions shown in rows 5-7. In the final results, $\beta_{A,123} = \beta_{C,123} = 0.3718$ are the highest total beliefs, which makes sense as the first piece of evidence supports the proposition $A$ and the second supports the proposition $C$ with the same magnitude, while the third supports the proposition $\{A, C\}$ with no discrimination between the two individual propositions $A$ and $C$. $\beta_{\{A,C\},123} = 0.2487$ is rightly generated as the second highest total belief, as the third piece of evidence supports $\{A, C\}$, so the significant local ignorance in $\{A, C\}$ should remain in the final results. The proposition $B$ is assessed to be unlikely by all three pieces of evidence, so it makes sense that the total belief in this proposition is rather small. The total belief in each of the other propositions ($\emptyset$, $\{A, B\}$, $\{B, C\}$ and $\{A, B, C\}$) is zero, as it should be.

Table 1. Illustration of the ER Rule
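The recursive calculation just described can be reproduced with a short Python sketch (again ours, not from the paper; encoding the power-set element Θ by None is an implementation convenience). It implements Equations (19), (20) and (22)-(25) and prints the total beliefs quoted above:

```python
from itertools import product

def extend(belief, weight):
    """Extended belief function, Equations (19)-(20); the special power-set
    element Theta is keyed by None."""
    m = {theta: weight * b for theta, b in belief.items()}
    m[None] = 1.0 - weight
    return m

def er_combine(m1, m2):
    """ER rule for two extended belief functions, Equations (22), (24), (25)."""
    m_hat = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        if b is None and c is None:
            theta = None                         # Theta-Theta product, Eq. (25)
        elif b is None or c is None:
            theta = c if b is None else b        # bracketed terms of Eq. (24)
        else:
            theta = b & c                        # orthogonal-sum term of Eq. (24)
            if not theta:
                continue                         # no belief is kept on the empty set
        m_hat[theta] = m_hat.get(theta, 0.0) + mb * mc
    total = sum(m_hat.values())                  # normalisation of Eq. (22)
    return {theta: v / total for theta, v in m_hat.items()}

def beliefs(m):
    """Back to a conventional belief function, Equation (23)."""
    denom = sum(v for theta, v in m.items() if theta is not None)
    return {theta: v / denom for theta, v in m.items() if theta is not None}

w = 1.0 / 3
m1 = extend({frozenset("A"): 0.99, frozenset("B"): 0.01}, w)
m2 = extend({frozenset("B"): 0.01, frozenset("C"): 0.99}, w)
m3 = extend({frozenset("B"): 0.01, frozenset("AC"): 0.99}, w)

for theta, b in sorted(beliefs(er_combine(er_combine(m1, m2), m3)).items(),
                       key=lambda kv: -kv[1]):
    print(sorted(theta), round(b, 4))
# ['A'] 0.3718, ['C'] 0.3718, ['A', 'C'] 0.2487, ['B'] 0.0076
```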
5 Conclusion

In this paper, following the discussion of Dempster's rule and the ER approach, we reported the discovery of the new ER rule, which provides a general process for combining multiple pieces of independent evidence in the form of belief functions under the normal condition that every piece of evidence plays a limited role equivalent to its relative weight. The ER rule generates the total beliefs from the combination of every two pieces of evidence as the addition of the bounded average of the individual support from each of the two pieces of evidence and the orthogonal sum of the joint support from the two pieces of evidence, which reveals that the orthogonal sum of the joint support from two pieces of evidence is only part of their total combined belief. A numerical example was examined in some detail to illustrate this general yet rational and rigorous ER rule for evidence combination. The new ER rule can be applied for the combination of independent evidence in any case where the normal condition is satisfied.

Acknowledgments. This work was supported by the UK Engineering and Physical Science Research Council under Grant No. EP/F024606/1 and the Natural Science Foundation of China under Grant No. 60736026.
References 1. Beynon, M.: DS/AHP method: A mathematical analysis, including an understanding of uncertainty. European Journal of Operational Research 140(1), 148–164 (2002) 2. Dempster, A.P.: Upper and lower probabilities induced by a multi-valued mapping. Annals of Mathematical Statistics 38, 325–339 (1967) 3. Denoeux, T., Zouhal, L.M.: Handling possibilistic labels in pattern classification using evidential reasoning. Fuzzy Sets and Systems 122(3), 409–424 (2001) 4. Haenni, R.: Shedding new light on Zadeh's criticism of Dempster's rule of combination. In: The 7th International Conference on Information Fusion (FUSION) (2005) 5. McClean, S., Scotney, B.: Using evidence theory for knowledge discovery and extraction in distributed databases. International Journal of Intelligent Systems 12, 763–776 (1997) 6. Murphy, C.K.: Combining belief functions when evidence conflicts. Decision Support Systems 29, 1–9 (2000) 7. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976) 8. Wang, Y.M., Yang, J.B., Xu, D.L.: The evidential reasoning approach for multiple attribute decision analysis using interval belief degrees. European Journal of Operational Research 175(1), 35–66 (2006) 9. Xu, D.L., Yang, J.B., Wang, Y.M.: The ER approach for multi-attribute decision analysis under interval uncertainties. European Journal of Operational Research 174(3), 1914–1943 (2006) 10. Yager, R.R.: Decision making using minimization of regret. International Journal of Approximate Reasoning 36, 109–128 (2004)
11. Yang, J.B., Singh, M.G.: An evidential reasoning approach for multiple attribute decision making with uncertainty. IEEE Transactions on Systems, Man, and Cybernetics 24(1), 1–18 (1994) 12. Yang, J.B.: Rule and utility based evidential reasoning approach for multiattribute decision analysis under uncertainties. European Journal of Operational Research 131, 31–61 (2001) 13. Yang, J.B., Xu, D.L.: On the evidential reasoning algorithm for multiattribute decision analysis under uncertainty. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans 32(3), 289–304 (2002) 14. Yang, J.B., Wang, Y.M., Xu, D.L.: The evidential reasoning approach for MADA under both probabilistic and fuzzy uncertainties. European Journal of Operational Research 171(1), 309–343 (2006) 15. Yang, J.B., Xu, D.L., Xie, X.L., Maddulapalli, A.K.: Multicriteria evidential reasoning decision modelling and analysis – prioritising voices of customer. Journal of the Operational Research Society 62, 1638–1654 (2011) 16. Yang, J.B., Xu, D.L.: The evidential reasoning rule for evidence combination. Artificial Intelligence (submitted, 2011)
Theories and Approaches to Treat Incomparability

Yang Xu¹ and Jun Liu²
¹ College of Mathematics, Southwest Jiaotong University, Chengdu, Sichuan 610031, P.R. China, [email protected]
² School of Computing and Mathematics, University of Ulster, UK, [email protected]
Extended Abstract

As we know, when human beings try to understand and deal with practical problems, especially in their evaluation or decision making processes, comparison is a commonly used way to tell something about something else: "there can be no differentiation without comparison". Some claim that chains, i.e., totally ordered sets, can be applied in most cases, but this assumption is often an oversimplification of reality. Actually, relations in the real world are rarely linear. Incomparability is a kind of uncertainty often associated with humans' intelligent activities in the real world, and it exists not only in the processed object itself, but also in the course of the object being processed. It is a kind of overall uncertainty of objects due to the complexity of the objects themselves, associated with many factors and the inconsistency among those factors. It is caused by the degree of complexity of an object: the more complex an object is, the more attributes it involves, and the larger the incomparability. This fact implies an overall uncertainty of objects, which can be due to missing information, ambiguity or conflicting evaluations. Incomparability is an important type of uncertainty, especially inevitable in decision making and evaluation situations, but it is not easily handled through conventional methods because of its complexity. This talk is organized as follows: Section 1, as an introduction, gives several examples of incomparability. Section 2 proposes algebraic approaches to characterize incomparability; some typical algebras, such as Residuated Lattices, Pseudo-Boolean Algebras, BL-Algebras, and Lattice Implication Algebras (LIA) (especially, Linguistic-Valued Lattice Implication Algebras), are overviewed. Section 3 proposes logical theory and methods to deal with incomparability, including the lattice-valued logic system based on LIA, uncertainty reasoning based on lattice-valued logic with truth-values in LIAs, as well as resolution-based automated reasoning in lattice-valued logic systems with truth-values in LIAs. Section 4 proposes mathematical analysis theory and methods to treat incomparability, based on the l*-module, i.e., a lattice-ordered module with two lattice-ordered structures. Concluding remarks are presented in Section 6. In summary, some academic standpoints are pointed out as follows:
1. There exist great quantities of incomparability in both the objective world and the subjective world.
2. Algebraic methods, logical methods and mathematical analytical methods can be applied to treat incomparability.
3. Incomparability also commonly exists among the linguistic values of human language; therefore, algebraic methods, logical methods and mathematical analytical methods can be applied to treat linguistic value structures with incomparability.
4. Generally, evaluation and decision making always involve incomparability, linguistic values and uncertainty reasoning. Therefore, applying lattice-valued logic systems based on linguistic truth-valued algebras is a scientific approach to such problems.
5. The problem of how to establish appropriate methods to handle incomparability and complex linguistic value structures in real-world applications is still open, and it remains an important and worthwhile research topic. We believe that it is feasible and reasonable to use ordering structures (especially lattices), logical reasoning and mathematical analysis as possible solutions, and especially to use lattice-valued algebra and lattice-valued logic to establish a strict linguistic truth-valued logic and various kinds of corresponding linguistic information processing systems, based on what has been done so far on lattice-valued algebra and lattice-valued logic by different researchers, and relying on continued work in this research direction.

Acknowledgement. This work is partially supported by the National Natural Science Foundation of China (Grant No. 60875034) and the projects TIN-20090828, P08-TIC-3548 and FEDER funds.
Knowledge Science – Modeling the Knowledge Creation Process

Yoshiteru Nakamori
School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
[email protected]
Extended Abstract Knowledge science is a problem-oriented interdisciplinary field that takes as its subject the modeling of the knowledge creation process and its application, and carries out research in such disciplines as knowledge management, management of technology, support for the discovery, synthesis and creation of knowledge, and innovation theory with the aim of constructing a better knowledge-based society. This presentation considers what knowledge science should be, introducing a forthcoming book entitled “Knowledge Science - Modeling the Knowledge Creation Process” (Nakamori [1]) as well as the School of Knowledge Science at Japan Advanced Institute of Science and Technology, which is the first school established in the world to make knowledge a target of science. The first dean of the School was Professor Ikujiro Nonaka who is famous worldwide for his organizational knowledge creation model called the SECI spiral (Nonaka and Takeuchi [3]), which is in fact the key factor in establishing the School. The presentation also briefly introduces a methodology for knowledge synthesis called the theory of knowledge construction systems; its fundamental part was already published in Systems Research and Behavioral Science (Nakamori et al. [2]). Keywords: Knowledge technology, Knowledge management, Knowledge discovery, Knowledge synthesis, Knowledge justification, Knowledge construction.
References 1. Nakamori, Y. (ed.): Knowledge Science – Modeling the Knowledge Creation Process. CRC Press (2011) 2. Nakamori, Y., Wierzbicki, A.P., Zhu, Z.: A Theory of Knowledge Construction Systems. Syst. Res. Behav. Sci. 28, 15–39 (2011) 3. Nonaka, I., Takeuchi, H.: The Knowledge Creating Company: How Japanese Companies Create the Dynamics of Innovation. Oxford University Press, New York (1995)
Two Classes of Algorithms for Data Clustering

Sadaaki Miyamoto
Department of Risk Engineering, Faculty of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan
[email protected]
Abstract. The two classes of agglomerative hierarchical clustering algorithms and K-means algorithms are overviewed. Moreover, recent topics of kernel functions and semi-supervised clustering in the two classes are discussed. This paper reviews traditional methods as well as new techniques.

Keywords: agglomerative hierarchical clustering, K-means clustering, kernel functions, constrained clustering.
1 Introduction
Cluster analysis, also called data clustering or simply clustering, has now become a standard tool in modern data mining. Clustering techniques are divided into the two classes of hierarchical and non-hierarchical methods. The major technique in the first class is agglomerative hierarchical clustering [2, 18], which is old but has been found useful in a variety of fields of application. The latter category has various methods. Some of them are the family of K-means algorithms [13, 14, 32], fuzzy c-means and their variations [6, 7, 12, 15, 16, 25, 40], mixtures of densities [33, 43], algorithms related to SOM [27], and other heuristic techniques. To overview all of them needs a book or at least a book chapter. Instead, we discuss the class that has been studied most frequently, i.e., K-means algorithms and their variations. This paper thus considers two classes of clustering techniques: agglomerative hierarchical clustering and methods related to K-means. In the latter class we mention the mixture of distributions, which is also regarded as a standard technique of clustering using the EM algorithm [11, 33, 43]. The topics discussed here are an overview of basic methods as well as more recent studies of kernel functions [44, 48] and semi-supervised clustering [3–5, 9, 50, 52]. This paper thus reviews well-known techniques as well as newly developed methods. The rest of this paper is as follows. Section 2 introduces agglomerative hierarchical clustering and standard linkage methods. Section 3 is devoted to the discussion of the class of K-means techniques. In particular, fuzzy c-means and their variations are focused upon. Section 4 discusses the use of kernel functions in cluster analysis. After studying the recent topic of semi-supervised clustering in Section 5, the final section concludes the paper.
2 Agglomerative Hierarchical Clustering
We first review the general procedure of agglomerative hierarchical clustering. Let the set of objects for clustering be $X = \{x_1, \ldots, x_N\}$. The object $x_k$ is generally a point in the $p$-dimensional Euclidean space $R^p$, unless otherwise stated. Generally a cluster denoted by $G_i$ is a subset of $X$. The family of clusters is denoted by $\mathcal{G} = \{G_1, G_2, \ldots, G_K\}$, where the clusters form a crisp partition of $X$:

$\bigcup_{i=1}^{K} G_i = X, \qquad G_i \cap G_j = \emptyset \ (i \neq j).$
Moreover, the number of objects in $G$ is denoted by $|G|$. Agglomerative hierarchical clustering uses a similarity or dissimilarity measure. We use dissimilarity here: the dissimilarity between two objects $x, y \in X$ is assumed to be given and denoted by $D(x, y)$. A dissimilarity between two clusters, $D(G, G')$ ($G, G' \in \mathcal{G}$), is also used, which is called an inter-cluster dissimilarity. In the classical setting a dissimilarity measure is assumed to be symmetric: $D(G, G') = D(G', G)$. Let us first describe a general procedure of agglomerative hierarchical clustering [34, 36].

AHC (Agglomerative Hierarchical Clustering) Algorithm:
AHC1: Assume that initial clusters are given by $\mathcal{G} = \{\hat{G}_1, \hat{G}_2, \ldots, \hat{G}_{N_0}\}$, where $\hat{G}_1, \hat{G}_2, \ldots, \hat{G}_{N_0}$ are given initial clusters. Generally $\hat{G}_j = \{x_j\} \subset X$, hence $N_0 = N$.
Set $K = N_0$ ($K$ is the number of clusters and $N_0$ is the initial number of clusters).
$G_i = \hat{G}_i$ ($i = 1, \ldots, K$).
Calculate $D(G, G')$ for all pairs $G, G' \in \mathcal{G}$.
AHC2: Search for the pair of minimum dissimilarity (i.e., maximum similarity):

$(G_p, G_q) = \arg\min_{G_i, G_j \in \mathcal{G}} D(G_i, G_j)$    (1)

Merge: $G_r = G_p \cup G_q$. Add $G_r$ to $\mathcal{G}$ and delete $G_p, G_q$ from $\mathcal{G}$. $K = K - 1$. If $K = 1$, then stop and output the dendrogram.
AHC3: Update the dissimilarity $D(G_r, G')$ for all $G' \in \mathcal{G}$. Go to AHC2.
End AHC.
Well-known linkage methods are as follows; all assume symmetric dissimilarity measures [2, 18, 34].

- single link:
$D(G, G') = \min_{x \in G, y \in G'} D(x, y)$    (2)

- complete link:
$D(G, G') = \max_{x \in G, y \in G'} D(x, y)$    (3)

- average link:
$D(G, G') = \frac{1}{|G||G'|} \sum_{x \in G, y \in G'} D(x, y)$    (4)

- centroid method: Let the centroid of a cluster $G$ be
$M(G) = \frac{1}{|G|} \sum_{x_k \in G} x_k.$
Using $M(G)$ and the Euclidean norm $\|\cdot\|$, we define
$D(G, G') = \|M(G) - M(G')\|^2$    (5)

- Ward method: Let
$E(G) = \sum_{x \in G} \|x - M(G)\|^2.$
We define
$D(G, G') = E(G \cup G') - E(G) - E(G')$    (6)
Each of the five linkage methods has an updating formula in AHC3 whereby the computation is greatly reduced, but we omit the formulas to save space [2, 34, 36]. These linkage methods are discussed again later in relation to kernel functions and semi-supervised clustering.
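For illustration, the following naive Python sketch (ours, not from the paper; it recomputes Equations (2)-(4) from scratch rather than using the omitted updating formulas, so it is O(N³) or worse) runs the AHC procedure and records the merge history that a dendrogram would display:

```python
import itertools

def ahc(points, dissim, linkage="single"):
    """Naive AHC (the AHC1-AHC3 procedure) with the single, complete and
    average linkages of Equations (2)-(4).  Returns the merge history."""
    clusters = [[i] for i in range(len(points))]  # AHC1: singleton clusters

    def d(G, Gp):  # inter-cluster dissimilarity, recomputed from scratch
        vals = [dissim(points[x], points[y]) for x in G for y in Gp]
        if linkage == "single":
            return min(vals)                    # Equation (2)
        if linkage == "complete":
            return max(vals)                    # Equation (3)
        return sum(vals) / (len(G) * len(Gp))   # Equation (4)

    history = []
    while len(clusters) > 1:
        # AHC2: find and merge the pair of minimum dissimilarity
        p, q = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda pq: d(clusters[pq[0]], clusters[pq[1]]))
        history.append((clusters[p], clusters[q], d(clusters[p], clusters[q])))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]                         # AHC3 would now update D
    return history

# Squared distances on the real line; two obvious groups:
pts = [0.0, 0.1, 0.15, 3.0, 3.2]
for Gp, Gq, level in ahc(pts, lambda a, b: (a - b) ** 2):
    print(Gp, "+", Gq, "merged at", round(level, 4))
```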
3 K-means Clustering
In K-means and their variations, the number of clusters, denoted by $c$ here, is assumed to be given beforehand. We also use the membership matrix $U = (u_{ki})$, in which $u_{ki}$ is the membership of $x_k$ to cluster $i$ (or $G_i$); $u_{ki}$ may be either crisp or fuzzy. Moreover, cluster centers denoted by $v_i$ ($i = 1, \ldots, c$) are used. We first introduce an informal procedure called c-prototypes:

A c-prototype procedure.
Step 0: Randomly choose initial $c$ prototypes for the clusters.
Step 1: Take a set of objects and allocate each of them to the cluster of the nearest prototype.
Step 2: Update prototypes.
Step 3: If the clusters are convergent, stop. Else go to Step 1.

This procedure is not a precise algorithm, since the way the prototypes are updated is not described. However, several algorithms are derived as variations of the above procedure. The simplest algorithm is that of K-means [2, 32], which is also called crisp c-means. The basic algorithm is as follows:

(I) Assume that the initial $c$ clusters are randomly generated. Let the centroid of cluster $i$ be $v_i$ ($i = 1, \ldots, c$).
(II) For $k = 1, \ldots, N$, allocate $x_k$ to the cluster of the nearest center:

$i = \arg\min_{1 \leq j \leq c} \|x_k - v_j\|^2$    (7)
(III) Update the centroid $v_i$ for cluster $i$ ($i = 1, \ldots, c$). If the clusters are convergent, stop. Else go to step (II).

Comparing this algorithm with the c-prototype procedure, we find that the prototypes are centroids (centers of gravity) and the updating uses the recalculation of the centroids. The above algorithm reallocates all objects, but the on-line algorithm [13, 36] takes one object at a time and updates the cluster centers as the centroids. SOM techniques [27] are now popular and are sometimes used as methods of clustering. Let us consider the simplest algorithm of VQ [27]. VQ clustering can be derived from the c-prototype procedure: one object $x(t)$, $t = 1, 2, \ldots$, is taken, and suppose $v_i(t)$ is the nearest prototype. Then $v_i$ is updated using the learning ratio $\alpha(t)$:

$v_i(t+1) = v_i(t) + \alpha(t)(x(t) - v_i(t)),$    (8)

while the other prototypes remain unchanged. Each object is allocated to the cluster of the nearest prototype. Moreover, fuzzy c-means and the mixture of distributions are considered to be variations of the c-prototype procedure. Fuzzy c-means are discussed later in detail, but we note that the nearest allocation is replaced by fuzzy memberships, and centroids are replaced by weighted centroids using fuzzy memberships. The mixture of distributions is also related to c-prototypes, where the nearest allocation is replaced by the probability $P(G_i \mid x_k)$ by which observation $x_k$ should be allocated to cluster $G_i$, the probability being calculated by the EM algorithm [33]. Hence the update of prototypes implies parameter updating in the distributions.
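A compact Python sketch of both variants may help (ours, not from the paper; the fixed iteration count in place of a convergence test is our simplification). Batch K-means implements steps (I)-(III), and vq_update performs one online step of Equation (8):

```python
import random

def kmeans(xs, c, iters=100, seed=0):
    """Batch K-means (crisp c-means), steps (I)-(III); xs is a list of tuples."""
    rng = random.Random(seed)
    v = [list(x) for x in rng.sample(xs, c)]          # (I) initial centers
    labels = [0] * len(xs)
    for _ in range(iters):                            # fixed-iteration shortcut
        # (II) allocate each object to the nearest center, Equation (7)
        labels = [min(range(c), key=lambda j: sum((a - b) ** 2
                                                  for a, b in zip(x, v[j])))
                  for x in xs]
        # (III) recompute each centroid from its current members
        for j in range(c):
            members = [x for x, l in zip(xs, labels) if l == j]
            if members:
                v[j] = [sum(col) / len(members) for col in zip(*members)]
    return v, labels

def vq_update(v, x, alpha):
    """One online VQ step, Equation (8): pull the nearest prototype toward x."""
    i = min(range(len(v)), key=lambda j: sum((a - b) ** 2
                                             for a, b in zip(x, v[j])))
    v[i] = [vi + alpha * (xi - vi) for vi, xi in zip(v[i], x)]

centers, labels = kmeans([(0.0, 0.0), (0.1, 0.2), (4.0, 4.1), (3.9, 4.0)], 2)
print(centers, labels)
```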
3.1 Fuzzy c-means and Variations
Fuzzy c-means clustering has been studied by many researchers, yet some useful facts remain little known. We first introduce two basic objective functions:
J_B(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ki})^m D(x_k, v_i),  (m > 1),    (9)

J_E(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \{ u_{ki} D(x_k, v_i) + \lambda^{-1} u_{ki} \log u_{ki} \},  (\lambda > 0),    (10)

where D(x_k, v_i) = \|x_k - v_i\|^2. Note that if m = 1 in J_B, the function expresses that for crisp c-means (alias K-means). Function J_B was introduced by Dunn [15, 16] and Bezdek [6, 7]. On the other hand, J_E, using the entropy term, was later discussed by a few authors [30, 31, 35]. Considering and comparing these two functions is important for observing theoretical properties of fuzzy c-means. To save space, discussions of theoretical properties are omitted; readers may refer to [40].

The basic algorithm of fuzzy c-means is the following alternate optimization, where J = J_B or J = J_E. Optimization with respect to U uses the constraints \sum_{i=1}^{c} u_{ki} = 1 and u_{kj} \ge 0 for all j, k.

FCM Algorithm of Alternate Optimization.
FCM1: Set an initial value \bar{V} randomly.
FCM2: Minimize J(U, \bar{V}) with respect to U. Let the optimal solution be \bar{U}.
FCM3: Minimize J(\bar{U}, V) with respect to V. Let the optimal solution be \bar{V}.
FCM4: If (\bar{U}, \bar{V}) is convergent, stop. Otherwise go to FCM2.
End FCM.

The solutions for J_B are as follows:

u_{ki} = \frac{1 / D(x_k, v_i)^{\frac{1}{m-1}}}{\sum_{j=1}^{c} 1 / D(x_k, v_j)^{\frac{1}{m-1}}}, \qquad
v_i = \frac{\sum_{k=1}^{N} (u_{ki})^m x_k}{\sum_{k=1}^{N} (u_{ki})^m},

whereas the solutions for J_E are given by the following:

u_{ki} = \frac{\exp(-\lambda D(x_k, v_i))}{\sum_{j=1}^{c} \exp(-\lambda D(x_k, v_j))}, \qquad
v_i = \frac{\sum_{k=1}^{N} u_{ki} x_k}{\sum_{k=1}^{N} u_{ki}}.

There are many variations of fuzzy c-means. For example, fuzzy c-regression models [21] produce c regressions from input-output data \{(x_k, y_k)\}_{1 \le k \le N} using D_{ki} = (y_k - \beta_i^T x_k - \gamma_i)^2, i = 1, ..., c, instead of D(x_k, v_i). Note that the fuzzy c-regression models can also be discussed in relation to the c-prototype procedure. Another function that uses a quadratic term u_{ki}^2 instead of the entropy term u_{ki} \log u_{ki} in J_E has been proposed in [39]. This method is theoretically interesting but requires more computation than that of J_E.
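A minimal sketch of the alternate optimization for J_B, implementing the update formulas for u_ki and v_i above, may be useful here; X, c, m, and the stopping tolerance are illustrative assumptions.

```python
# Sketch: FCM alternate optimization for J_B (Eq. (9)) in NumPy.
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    v = X[rng.choice(len(X), size=c, replace=False)]   # FCM1: initial V
    for _ in range(max_iter):
        d = ((X[:, None, :] - v[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # FCM2: optimal U for fixed V
        # u_ki = 1 / sum_j (d_ki / d_kj)^(1/(m-1))
        ratio = d[:, :, None] / d[:, None, :]
        u = 1.0 / (ratio ** (1.0 / (m - 1.0))).sum(axis=2)
        # FCM3: optimal V for fixed U (weighted centroids)
        w = u ** m
        new_v = (w.T @ X) / w.sum(axis=0)[:, None]
        if np.linalg.norm(new_v - v) < tol:            # FCM4: convergence
            break
        v = new_v
    return u, v
```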
A remarkable study has been done by Ichihashi and his colleagues [24, 40]: they introduced two more variables α, S into J_E in addition to U and V. The variable α = (α_1, ..., α_c) ≥ 0 controls cluster volumes, and S = (S_1, ..., S_c) is for clusterwise covariances. They propose the following objective function:

J_{KL}(U, V, \alpha, S) = \sum_{i=1}^{c} \sum_{k=1}^{N} \left\{ u_{ki} (x_k - v_i)^T S_i^{-1} (x_k - v_i) + \lambda^{-1} u_{ki} \log \frac{u_{ki}}{\alpha_i} + u_{ki} \log |S_i| \right\},    (11)

where \sum_{i=1}^{c} \alpha_i = 1. Although we omit the optimal solutions (see, e.g., [40]), an interesting property is that the solutions are equivalent to those of the mixture of Gaussian distributions [36, 40] when the parameter λ = 1/2 is chosen. Thus, although the fuzzy and probabilistic models are different, they lead to the same solutions.

Another class of variations is possibilistic clustering [10, 28], which uses

J_P(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \{ (u_{ki})^m D(x_k, v_i) + \zeta^{-1} (1 - u_{ki})^m \},  (\zeta > 0),    (12)

and the FCM algorithm but without the constraint \sum_{i=1}^{c} u_{ki} = 1. We omit the optimal solutions, but they are easily derived.

3.2 Deciding the Number of Clusters
Frequently the number of clusters is unknown, but it must be specified in K-means and the related algorithms in this section. Deciding this number is therefore important, but it is at the same time known to be a difficult problem. There are two approaches. One is to use a method of model selection from the statistical literature, e.g., [1]. In spite of many efforts, no study reports that a particular method of model selection is generally useful. The second approach is common in the fuzzy literature: a cluster validity measure is used. A problem is that, although there have been many validity measures [6, 12, 22], there is no report stating that a particular measure is the most useful. Hashimoto et al. [20] compared several measures in many numerical experiments and report that, although no measure is best for all experiments, the number of clusters can be correctly decided by comparing the results of different measures if the clusters are well-separated.
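The following is a minimal sketch of the validity-measure approach: run fuzzy c-means for several candidate c and compare a measure across the results. The particular measure used here (fuzzy within-cluster dispersion over minimum center separation, in the spirit of the Xie-Beni index) is an illustrative assumption, not one of the measures surveyed in [6, 12, 22]; `fcm` refers to the earlier sketch.

```python
# Sketch: comparing a validity measure over candidate numbers of clusters.
import numpy as np

def xie_beni(X, u, v, m=2.0):
    d = ((X[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
    compactness = ((u ** m) * d).sum()
    sep = min(((v[i] - v[j]) ** 2).sum()
              for i in range(len(v)) for j in range(len(v)) if i != j)
    return compactness / (len(X) * sep)     # smaller is better

# for c in range(2, 8):
#     u, v = fcm(X, c)                      # fcm from the earlier sketch
#     print(c, xie_beni(X, u, v))
```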
4 Kernel Functions in Clustering
Linear cluster boundaries are obtained from crisp and fuzzy c-means clustering. In contrast, nonlinear boundaries can be derived using kernel functions, as discussed in support vector machines [44, 47, 48]. A high-dimensional feature space H is assumed, while the original space R^p is called the data space. H is an inner product space; its inner product is denoted by ⟨·, ·⟩. The norm of g ∈ H is given by \|g\|_H^2 = ⟨g, g⟩.
A mapping Φ : R^p → H is used whereby x_k is mapped to Φ(x_k). An explicit representation of Φ(x) is unknown in general, but the inner product ⟨Φ(x), Φ(y)⟩ is assumed to be represented by a kernel function:

K(x, y) = ⟨Φ(x), Φ(y)⟩.    (13)
A well-known kernel function is the Gaussian kernel: K(x, y) = \exp(-C \|x - y\|^2), (C > 0).

4.1 Kernel Functions in Fuzzy c-means
We can use kernel functions in crisp and fuzzy c-means [37, 38]: the objective functions J_B and J_E are used, but the dissimilarity is changed as follows:

D(x_k, v_i) = \|\Phi(x_k) - v_i\|_H^2,    (14)

where v_i ∈ H. When we derive a kernel-based fuzzy c-means algorithm, we cannot use the solutions for v_i. Let us consider J_B. We have

v_i = \frac{\sum_{k=1}^{N} (u_{ki})^m \Phi(x_k)}{\sum_{k=1}^{N} (u_{ki})^m},

but the function Φ(x_k) is generally unknown. Hence we cannot use the FCM algorithm directly. Instead, we update the dissimilarity measure D(x_k, v_i):

D(x_k, v_i) = K_{kk} - \frac{2}{\sum_{j=1}^{N} (u_{ji})^m} \sum_{j=1}^{N} (u_{ji})^m K_{jk} + \frac{1}{\left( \sum_{j=1}^{N} (u_{ji})^m \right)^2} \sum_{j=1}^{N} \sum_{\ell=1}^{N} (u_{ji} u_{\ell i})^m K_{j\ell},    (15)

where K_{jk} = K(x_j, x_k). Note that m = 1 in (15) when J_E is considered. We thus repeat (15) and

u_{ki} = \frac{1 / D(x_k, v_i)^{\frac{1}{m-1}}}{\sum_{j=1}^{c} 1 / D(x_k, v_j)^{\frac{1}{m-1}}}

until convergence. The handling of J_E and of crisp c-means [19] is similar and omitted.
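A minimal sketch of one such kernel-based iteration for J_B follows, implementing Eq. (15) with a Gaussian kernel; the data X, the parameters c, m, C, and the random initialization of U are illustrative assumptions.

```python
# Sketch: kernel fuzzy c-means for J_B, never forming v_i in H explicitly.
import numpy as np

def kernel_fcm(X, c, m=2.0, C=1.0, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-C * sq)                        # K_jk = K(x_j, x_k)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)          # sum_i u_ki = 1
    for _ in range(max_iter):
        w = u ** m                             # (u_ji)^m, shape (N, c)
        s = w.sum(axis=0)                      # sum_j (u_ji)^m, shape (c,)
        # Eq. (15): second and third terms of D(x_k, v_i)
        term2 = (K @ w) / s                    # sum_j (u_ji)^m K_jk / s_i
        term3 = np.einsum('ji,jl,li->i', w, K, w) / s ** 2
        d = np.diag(K)[:, None] - 2 * term2 + term3[None, :] + 1e-12
        ratio = d[:, :, None] / d[:, None, :]  # membership update as before
        new_u = 1.0 / (ratio ** (1.0 / (m - 1.0))).sum(axis=2)
        if np.allclose(new_u, u):
            break
        u = new_u
    return u
```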
4.2 Kernel Functions in Agglomerative Hierarchical Clustering
Let us give a short remark on how kernel functions can be used in agglomerative hierarchical clustering. This means that we consider {Φ(x_1), ..., Φ(x_N)} instead of {x_1, ..., x_N}.
Let us assume that the mapping from the original squared distance δ_ij = \|x_i - x_j\|^2 to the squared distance in H, Δ_ij = \|\Phi(x_i) - \Phi(x_j)\|_H^2, is monotone: if δ_ij ≤ δ_kl, then Δ_ij ≤ Δ_kl, for all 1 ≤ i, j ≤ N and 1 ≤ k, l ≤ N. This monotone property holds for typical kernel functions such as the Gaussian kernel. Moreover, completely monotone functions [45] define a wide class of kernel functions (see, e.g., [23]). The following proposition holds:

Proposition 1. The updating formulas of the single linkage, the complete linkage, the average linkage, the centroid method, and the Ward method can be used for clustering {Φ(x_1), ..., Φ(x_N)} with Δ_ij without any change.

This property has been studied in [17]. We omit the proof, as we do not show the updating formulas here. It is not strange that all the updating formulas of the linkage methods remain the same even when the high-dimensional feature space is used, since the derivations of the formulas use only the monotonicity and the calculation of inner products. Note that when agglomerative hierarchical clustering is applied with a kernel function, the initial value Δ_ij has to be calculated using the kernel:

Δ_ij = K(x_i, x_i) + K(x_j, x_j) - 2K(x_i, x_j).
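A minimal sketch of this procedure follows: Δ_ij is computed from a Gaussian kernel and passed to the same linkage routines as before. X and C are illustrative assumptions; note that SciPy's 'centroid' and 'ward' methods expect non-squared Euclidean distances, so for those one would pass sqrt(Δ) instead.

```python
# Sketch: agglomerative hierarchical clustering in the feature space H.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

X = np.random.default_rng(0).normal(size=(20, 2))
C = 1.0
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-C * sq)                      # Gaussian kernel matrix

# Delta_ij = K(x_i, x_i) + K(x_j, x_j) - 2 K(x_i, x_j)
diag = np.diag(K)
Delta = diag[:, None] + diag[None, :] - 2 * K
np.fill_diagonal(Delta, 0.0)

# Proposition 1: the usual updating formulas apply unchanged to Delta
Z = linkage(squareform(Delta, checks=False), method='average')
```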
5 Semi-supervised Clustering
Many researchers are now working on semi-supervised learning [3–5, 9, 26, 29, 46, 49–52]. These studies are divided into two major classes. In the first class there are two sets of objects: {(x_k, y_k)}_{1≤k≤N} with class labels y_k, and {x_l}_{N+1≤l≤N+L} without labels; thus, partly supervised cases are handled. For such a case the mixture of distributions with an EM algorithm can be used [52], or transductive support vector machines can be applied [49, 52]. A variation of fuzzy c-means for such a case has also been considered [8].

The other class is called constrained clustering [5], where two sets of constraints, called pairwise constraints, are assumed: a set ML = {(x_i, x_j)} ⊂ X × X consists of must-link pairs, so that x_i and x_j should be in the same cluster, while another set CL = {(x_k, x_l)} ⊂ X × X consists of cannot-link pairs, so that x_k and x_l should be in different clusters. ML and CL are assumed to be symmetric in the sense that if (x_i, x_j) ∈ ML then (x_j, x_i) ∈ ML, and if (x_k, x_l) ∈ CL then (x_l, x_k) ∈ CL. We assume for simplicity that no inconsistency (e.g., (x_i, x_j) ∈ ML, (x_j, x_k) ∈ ML, and (x_i, x_k) ∈ CL) arises.

Methods to handle constrained clustering are as follows (a sketch of the first method's constraint check is given after this list):

1. The method of COP K-means [50] uses the same algorithm as crisp c-means, except that it checks whether the pairwise constraints are satisfied when each object is allocated to the nearest cluster center. If a constraint is violated, the algorithm terminates with failure.
2. Crisp c-means with penalty terms has been used [29].
3. The mixture of distributions with penalty terms has been proposed [3, 4, 46].
4. Fuzzy c-means with constraints has also been proposed [51].
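The following is a minimal sketch of the allocation step with the constraint check used in COP K-means; function and variable names are assumptions for illustration, not the notation of [50].

```python
# Sketch: constrained allocation in COP K-means. `labels` maps object
# index -> cluster or None; ML and CL are sets of index pairs.
def violates(k, cluster, labels, ML, CL):
    for (i, j) in ML:                       # must-link: same cluster
        other = j if i == k else i if j == k else None
        if other is not None and labels[other] not in (None, cluster):
            return True
    for (i, j) in CL:                       # cannot-link: different clusters
        other = j if i == k else i if j == k else None
        if other is not None and labels[other] == cluster:
            return True
    return False

def allocate(k, dists_k, labels, ML, CL):
    # try clusters in order of increasing distance to x_k
    for cluster in sorted(range(len(dists_k)), key=lambda i: dists_k[i]):
        if not violates(k, cluster, labels, ML, CL):
            return cluster
    raise RuntimeError('no feasible cluster: COP K-means fails')  # as in [50]
```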
We mention agglomerative hierarchical clustering with pairwise constraints [41, 42]. The results are summarized as follows. (i) Introducing the pairwise constraints into the single linkage, the complete linkage, and the average linkage is straightforward: we change the dissimilarity to zero for pairs in ML and to +∞ for pairs in CL. (ii) Introducing the pairwise constraints into the centroid method and the Ward method needs further consideration. One way is to modify the dissimilarity using kernel functions [41]; numerical experiments showed some effects of this method. (iii) Introducing the pairwise constraints into the centroid method and the Ward method using penalties has also been studied [42], and numerical experiments showed that this method is also effective.

The method in (iii) implies that (1) should be replaced by

(G_p, G_q) = \arg\min_{G_i, G_j \in \mathcal{G}} \Big[ D(G_i, G_j) + \sum_{x \in G_i,\, x' \in G_j,\, (x, x') \in CL} W_{CL} - \sum_{x \in G_i,\, x' \in G_j,\, (x, x') \in ML} W_{ML} \Big],    (16)

where W_CL and W_ML are penalties for cannot-link and must-link pairs, respectively.
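A minimal sketch of the penalized merge criterion (16) follows: among the current clusters, pick the pair minimizing D plus W_CL for each cannot-link pair split across the pair, minus W_ML for each must-link pair it would join. The names `clusters`, `D`, `ML`, `CL` and the weights are illustrative assumptions; ML and CL are taken to hold unordered pairs (frozensets).

```python
# Sketch: choosing the next merge under the penalized criterion (16).
import itertools

def best_merge(clusters, D, ML, CL, w_ml=1.0, w_cl=1.0):
    def penalized(p, q):
        cross = {frozenset((x, y)) for x in clusters[p] for y in clusters[q]}
        return (D[p][q]
                + w_cl * len(cross & CL)    # discourage joining CL pairs
                - w_ml * len(cross & ML))   # encourage joining ML pairs
    return min(itertools.combinations(range(len(clusters)), 2),
               key=lambda pq: penalized(*pq))
```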
6 Conclusions
We have overviewed basic algorithms of agglomerative hierarchical clustering and the class of K-means algorithms. The latter class includes the mixture of distributions and fuzzy c-means with its variations. More recent studies on kernel functions and semi-supervised clustering have also been discussed.

Although many studies have been done on fuzzy c-means clustering, there are still many possibilities for future research. For such a purpose, the consideration of relations with probabilistic models seems promising. There are many other interesting studies of clustering, but we omit them. Moreover, numerical examples are also omitted due to space limitations; they are given in the cited literature.

Acknowledgment. This work has partly been supported by the Grant-in-Aid for Scientific Research, Japan Society for the Promotion of Science, No. 23500269.
References

1. Akaike, H.: A Bayesian Analysis of the Minimum AIC Procedure. Annals of the Institute of Statistical Mathematics 30(1), 9–14 (1978)
2. Anderberg, M.R.: Cluster Analysis for Applications. Academic Press, New York (1973)
3. Basu, S., Bilenko, M., Mooney, R.J.: A Probabilistic Framework for Semi-Supervised Clustering. In: Proc. of the Tenth ACM SIGKDD (KDD 2004), pp. 59–68 (2004)
4. Basu, S., Banerjee, A., Mooney, R.J.: Active Semi-Supervision for Pairwise Constrained Clustering. In: Proc. of the SIAM International Conference on Data Mining (SDM 2004), pp. 333–344 (2004)
5. Basu, S., Davidson, I., Wagstaff, K.L. (eds.): Constrained Clustering: Advances in Algorithms, Theory, and Applications. Chapman & Hall/CRC, Boca Raton (2009)
6. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press (1981)
7. Bezdek, J.C., Keller, J., Krishnapuram, R., Pal, N.R.: Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Kluwer, Boston (1999)
8. Bouchachia, A., Pedrycz, W.: A Semi-supervised Clustering Algorithm for Data Exploration. In: De Baets, B., Kaynak, O., Bilgiç, T. (eds.) IFSA 2003. LNCS (LNAI), vol. 2715, pp. 328–337. Springer, Heidelberg (2003)
9. Chapelle, O., Schölkopf, B., Zien, A. (eds.): Semi-Supervised Learning. MIT Press, Cambridge (2006)
10. Davé, R.N., Krishnapuram, R.: Robust Clustering Methods: A Unified View. IEEE Trans. on Fuzzy Systems 5(2), 270–293 (1997)
11. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc. B39, 1–38 (1977)
12. Dumitrescu, D., Lazzerini, B., Jain, L.C.: Fuzzy Sets and Their Application to Clustering and Training. CRC Press, Boca Raton (2000)
13. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. John Wiley & Sons (1973)
14. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2001)
15. Dunn, J.C.: A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-separated Clusters. J. of Cybernetics 3, 32–57 (1974)
16. Dunn, J.C.: Well-separated Clusters and Optimal Fuzzy Partitions. J. of Cybernetics 4, 95–104 (1974)
17. Endo, Y., Haruyama, H., Okubo, T.: On Some Hierarchical Clustering Algorithms Using Kernel Functions. In: Proc. of FUZZ-IEEE 2004, CD-ROM Proc., Budapest, Hungary, July 25-29, pp. 1–6 (2004)
18. Everitt, B.S.: Cluster Analysis, 3rd edn. Arnold, London (1993)
19. Girolami, M.: Mercer Kernel Based Clustering in Feature Space. IEEE Trans. on Neural Networks 13(3), 780–784 (2002)
20. Hashimoto, W., Nakamura, T., Miyamoto, S.: Comparison and Evaluation of Different Cluster Validity Measures Including Their Kernelization. Journal of Advanced Computational Intelligence and Intelligent Informatics 13(3), 204–209 (2009)
21. Hathaway, R.J., Bezdek, J.C.: Switching Regression Models and Fuzzy Clustering. IEEE Trans. on Fuzzy Systems 1, 195–204 (1993)
22. Höppner, F., Klawonn, F., Kruse, R., Runkler, T.: Fuzzy Cluster Analysis. Wiley, Chichester (1999)
23. Hwang, J., Miyamoto, S.: Kernel Functions Derived from Fuzzy Clustering and Their Application to Kernel Fuzzy c-Means. Journal of Advanced Computational Intelligence and Intelligent Informatics 15(1), 90–94 (2011)
24. Ichihashi, H., Honda, K., Tani, N.: Gaussian Mixture PDF Approximation and Fuzzy c-Means Clustering with Entropy Regularization. In: Proc. of Fourth Asian Fuzzy Systems Symposium, vol. 1, pp. 217–221 (2000)
25. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990)
26. Klein, D., Kamvar, S.D., Manning, C.: From Instance-level Constraints to Space-level Constraints: Making the Most of Prior Knowledge in Data Clustering. In: Proc. of the Intern. Conf. on Machine Learning, Sydney, Australia, pp. 307–314 (2002)
27. Kohonen, T.: Self-Organizing Maps, 2nd edn. Springer, Berlin (1997)
28. Krishnapuram, R., Keller, J.M.: A Possibilistic Approach to Clustering. IEEE Trans. on Fuzzy Systems 1, 98–110 (1993)
29. Kulis, B., Basu, S., Dhillon, I., Mooney, R.: Semi-supervised Graph Clustering: A Kernel Approach. Mach. Learn. 74, 1–22 (2009)
30. Li, R.P., Mukaidono, M.: A Maximum Entropy Approach to Fuzzy Clustering. In: Proc. of the 4th IEEE Intern. Conf. on Fuzzy Systems (FUZZ-IEEE/IFES 1995), Yokohama, Japan, March 20-24, pp. 2227–2232 (1995)
31. Li, R.P., Mukaidono, M.: Gaussian Clustering Method Based on Maximum-fuzzy-entropy Interpretation. Fuzzy Sets and Systems 102, 253–258 (1999)
32. MacQueen, J.B.: Some Methods of Classification and Analysis of Multivariate Observations. In: Proc. of 5th Berkeley Symposium on Math. Stat. and Prob., pp. 281–297 (1967)
33. McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, New York (2000)
34. Miyamoto, S.: Fuzzy Sets in Information Retrieval and Cluster Analysis. Kluwer, Dordrecht (1990)
35. Miyamoto, S., Mukaidono, M.: Fuzzy c-means as a Regularization and Maximum Entropy Approach. In: Proc. of the 7th International Fuzzy Systems Association World Congress (IFSA 1997), Prague, Czech, June 25-30, vol. II, pp. 86–92 (1997)
36. Miyamoto, S.: Introduction to Cluster Analysis. Morikita-Shuppan, Tokyo (1999) (in Japanese)
37. Miyamoto, S., Nakayama, Y.: Algorithms of Hard c-means Clustering Using Kernel Functions in Support Vector Machines. Journal of Advanced Computational Intelligence and Intelligent Informatics 7(1), 19–24 (2003)
38. Miyamoto, S., Suizu, D.: Fuzzy c-means Clustering Using Kernel Functions in Support Vector Machines. Journal of Advanced Computational Intelligence and Intelligent Informatics 7(1), 25–30 (2003)
39. Miyamoto, S., Suizu, D., Takata, O.: Methods of Fuzzy c-means and Possibilistic Clustering Using a Quadratic Term. Scientiae Mathematicae Japonicae 60(2), 217–233 (2004)
40. Miyamoto, S., Ichihashi, H., Honda, K.: Algorithms for Fuzzy Clustering. Springer, Heidelberg (2008)
41. Miyamoto, S., Terami, A.: Semi-Supervised Agglomerative Hierarchical Clustering Algorithms with Pairwise Constraints. In: Proc. of WCCI 2010 IEEE World Congress on Computational Intelligence, CCIB, Barcelona, Spain, July 18-23, pp. 2796–2801 (2010)
42. Miyamoto, S., Terami, A.: Constrained Agglomerative Hierarchical Clustering Algorithms with Penalties. In: Proc. of 2011 IEEE International Conference on Fuzzy Systems, Taipei, Taiwan, June 27-30, pp. 422–427 (2011)
43. Redner, R.A., Walker, H.F.: Mixture Densities, Maximum Likelihood and the EM Algorithm. SIAM Review 26(2), 195–239 (1984)
44. Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press (2002)
45. Schönberg, I.J.: Metric Spaces and Completely Monotone Functions. Annals of Mathematics 39(4), 811–841 (1938)
46. Shental, N., Bar-Hillel, A., Hertz, T., Weinshall, D.: Computing Gaussian Mixture Models with EM Using Equivalence Constraints. In: Advances in Neural Information Processing Systems, vol. 16 (2004)
47. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
48. Vapnik, V.N.: The Nature of Statistical Learning Theory, 2nd edn. Springer, New York (2000)
49. Vapnik, V.N.: Transductive Inference and Semi-supervised Learning. In: Chapelle, O., et al. (eds.) Semi-Supervised Learning, pp. 453–472. MIT Press, Cambridge (2006)
50. Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained K-means Clustering with Background Knowledge. In: Proc. of the 9th ICML, pp. 577–584 (2001)
51. Wang, N., Li, X., Luo, X.: Semi-supervised Kernel-based Fuzzy c-Means with Pairwise Constraints. In: Proc. of WCCI 2008, pp. 1099–1103 (2008)
52. Zhu, X., Goldberg, A.B.: Introduction to Semi-Supervised Learning. Morgan and Claypool (2009)
Fusing Conceptual Graphs and Fuzzy Logic: Towards the Structure and Expressiveness of Natural Language

Tru H. Cao

Ho Chi Minh City University of Technology and John von Neumann Institute, VNU-HCM
268 Ly Thuong Kiet Street, District 10, Ho Chi Minh City, Vietnam
[email protected]
Extended Abstract

Natural language is a principal and important means of human communication. It is used to express information as inputs to be processed by human brains; then, very often, the outputs are also expressed in natural language. The capacity of humans to communicate using language allows us to give, receive, and understand information expressed within a rich and flexible representational framework. Moreover, we can reason based on natural language expressions, and make decisions based on the information they convey, though this information usually involves imprecise terms and uncertain facts.

How humans process information represented in natural language is still a challenge to science in general, and to Artificial Intelligence in particular. However, it is clear that, for a computer with the conventional processing paradigm to handle natural language, a formalism is required. For reasoning, it is desirable that such a formalism be a logical one. A logic for handling natural language should have not only a structure of formulas close to that of natural language sentences, but also a capability to deal with the semantics of the vague linguistic terms pervasive in natural language expressions.

Conceptual graphs (Sowa [2,3]) and fuzzy logic (Zadeh [7,8]) are two logical formalisms that emphasize the target of natural language, each of which focuses on one of the two desired features mentioned above. While a smooth mapping between logic and natural language has been regarded as the main motivation of conceptual graphs (Sowa [4,5,6]), a methodology for computing with words has been regarded as the main contribution of fuzzy logic (Zadeh [9,10,11]). However, although conceptual graphs and fuzzy logic have the common target of natural language, until recently they were studied and developed quite separately. Their combination would be a great advantage towards a knowledge representation language that can approach the structure and expressiveness of natural language. At this juncture, conceptual graphs provide a syntactic structure for a smooth mapping to and from natural language, while fuzzy logic provides a semantic processor for approximate reasoning with words having vague meanings. This talk presents the combined result of an interdisciplinary research
programme focused on the integration of conceptual graphs and fuzzy logic, towards a knowledge representation language that is close to natural language in both structure and expressiveness (Cao [1]). First, the talk summarizes the development of fuzzy conceptual graphs and their logic programming foundations, as a graph-based order-sorted fuzzy set logic programming language for automated reasoning with fuzzy object attributes and types. Second, it presents the extension of fuzzy conceptual graphs with general quantifiers and direct reasoning operations on these extended conceptual graphs, which can be mapped to and from generally quantified natural language statements. Third, it introduces recent applications of fuzzy conceptual graphs for understanding natural language queries and for semantic search.
References

1. Cao, T.H.: Conceptual Graphs and Fuzzy Logic: A Fusion for Representing and Reasoning with Linguistic Information. Springer, Berlin (2010)
2. Sowa, J.F.: Conceptual graphs for a data base interface. IBM Journal of Research and Development 20(4), 336–357 (1976)
3. Sowa, J.F.: Conceptual Structures - Information Processing in Mind and Machine. Addison-Wesley Publishing Company, Massachusetts (1984)
4. Sowa, J.F.: Towards the expressive power of natural language. In: Sowa, J.F. (ed.) Principles of Semantic Networks - Explorations in the Representation of Knowledge, pp. 157–189. Morgan Kaufmann Publishers, San Mateo (1991)
5. Sowa, J.F.: Matching logical structure to linguistic structure. In: Houser, N., Roberts, D.D., Van Evra, J. (eds.) Studies in the Logic of Charles Sanders Peirce, pp. 418–444. Indiana University Press, Bloomington (1997)
6. Sowa, J.F.: Conceptual graphs. In: van Harmelen, F., Lifschitz, V., Porter, B. (eds.) Handbook of Knowledge Representation, pp. 213–237. Elsevier (2008)
7. Zadeh, L.A.: Fuzzy sets. Journal of Information and Control 8, 338–353 (1965)
8. Zadeh, L.A.: Fuzzy logic and approximate reasoning (In memory of Grigore Moisil). Synthese 30, 407–428 (1975)
9. Zadeh, L.A.: PRUF - a meaning representation language for natural languages. International Journal of Man-Machine Studies 10, 395–460 (1978)
10. Zadeh, L.A.: Fuzzy logic = computing with words. IEEE Transactions on Fuzzy Systems 4, 103–111 (1996)
11. Zadeh, L.A.: Precisiated natural language. AI Magazine 25, 74–91 (2004)
A MMORPG Decision-Making Model Based on Persuasive Reciprocity

Helio C. Silva Neto, Leonardo F.B.S. Carvalho, Fábio Paraguaçu, and Roberta V.V. Lopes

Federal University of Alagoas (UFAL - Universidade Federal de Alagoas), Institute of Computing (IC - Instituto de Computação), Campus A. C. Simões, 57072-970 Maceió, Alagoas, Brazil
{lfilipebsc,helio.hx,fabioparagua2000}@gmail.com,
Abstract. From a videogame perspective, decision-making is a crucial activity that happens at all times and at different levels of perception. Moreover, it has a direct influence on a game's performance, a fact that also concerns RPGs, as they can act as tools to enhance the development of the proximal development zones of the individuals involved. As the RPG has an inherent cooperative character that stimulates socialization, interaction and the improvement of communication skills, we thought it would be interesting to take advantage of the RPG to build a model using a Petri Net, based on Game Theory and on an application of the Theory of Persuasion to an MMORPG environment, that involves the user in some kind of plot while favoring a greater ease in decision-making.

Keywords: Psychology of Persuasion, Decision Making Systems, MMORPG, RPG, Petri Net and Game Theory.
1 Introduction
It is common sense that, at all times, people have to make decisions concerning different situations and problems. At those times, they are likely to draw on past experiences, values, beliefs, knowledge or even their technical skills. In addition, while some people are more conservative in their actions, others have an innovative character and are more than willing to accept potential risks. All these assumptions are valid during a decision-making process and will determine either the success or failure of those depending on it [8].

In a game context, the making of decisions is a crucial process that has a direct influence on any player's performance, as it is a process that happens at all times and at different levels of importance and awareness of the player. This statement makes clear the importance of the decision-making process in games: it is impossible to think of games without considering the constant occurrence of decision-making. In fact, decision-making can be seen even at an empirical level by simple observation of a game match [8].

To ease this process, the work presented here employs two particular methods. First, this paper uses Game Theory, which consists of models for the analysis
of conflicts and operations that rely on strategic behaviors, in which a player's actions are partly dependent on the actions of other players [5][6]. Second, and perhaps taking a more prominent approach, this paper applies the Principles of Persuasion to support the decision-making process, due to their emphasis on using communication skills to change attitudes, beliefs and even the behavior of other people, thus preventing the use of force or coercion. In other words, a person who persuades others makes them accept a particular idea [3].

To encourage players to build their own strategies and decisions for pursuing their goals (i.e. to build their own knowledge), this paper presents a model that couples the two concepts described in the above paragraph for implementation in a Role Playing Game (RPG) environment. Therefore, this paper aims to build a decision-making model founded on the Reciprocity concept from the Principles of Persuasion for use in MMORPG environments, taking advantage of an entertainment environment that favors the interaction of several players in order to apply concepts taken from the Principles of Persuasion and from Game Theory to aid these players in building their own knowledge.

This paper is divided into different sections. Section 2 discusses the MMORPG genre and details some of its concepts, including those native to the broader RPG genre and its tabletop games. Section 3 focuses on the Principles of Persuasion, particularly the principle of Reciprocity, while Sec. 4 explains the concepts and properties of decision-making systems in addition to the real conditions for making a decision. Section 5 presents the authors' model built using a Petri Net and an analysis of its application of the Principles of Persuasion, in addition to emphasizing the importance of the use of Game Theory in this model. Last, Sec. 6 draws the paper's conclusions.
2 Massively Multiplayer Online Role-Playing Game (MMORPG)
The acronym RPG, which stands for Role Playing Game, was first coined by Dave Arneson in the USA in 1973. RPGs consist mainly of situations described by speech representation that rely on the imagination of the players engaged in a game session. The described situations are often some kind of adventure that players take part in and that, commonly, was started in a previous, interrupted session [10][11][2].

As occurs in many activities, the RPG has its own language to refer to specific activities or occupations. The storyteller, for example, is known as the master, while the ones listening to and participating in the story told by the master are the players. The story itself is called an adventure. The list below describes the basic concepts of the RPG as given by Debbie [7]:

– Player: the individual in charge of one or more characters of the plot (each known as a PC, player character). A player has freedom of action in a game scenario, provided it meets the game's system of rules;
– Game Master: controls all factors concerning the game settings and plot that do not involve the actions of PCs. It is the game master's responsibility to control the plot characters (Non-Player Characters, NPCs) interacting with PCs. The game master is also responsible for controlling the game settings, being able to adjust the plot of the game according to what is needed. Additionally, the game master is the one in charge of the plot's progress and its secret objectives. A game master is not above the game's system of rules and must obey it, but has the power to change things within a reasonable logic if it would benefit the game plot;
– System of Rules: any action taken by a PC must be addressed to the game master, who in turn consults the game's system of rules to inform the player of the results of this action under the circumstances in which it was taken. Therefore, there are specific rules that apply to each different situation, as well as specific indications that must be accounted for when dealing with unexpected situations. It is also common to use a random element (provided by dice) to simulate the real-world uncertainty about the result of an action;
– Scenario: the world where PCs are geographically and temporally located (or, in some cases, an astral plane or different dimension in which PCs might be). Scenarios are often defined by the players and the game master and chosen for their ability to contain interesting characters and game plot. A scenario might employ an everyday reality or a completely strange one;
– Characters: the PCs. The characters are the imaginary projection of players within the game scenario. They can be built by their respective players or be provided by the game. In any case, they must respect the game's system of rules and scenario. Depending on the plot, some characters might be more or less attractive to players. Each PC is assigned abilities that define how it interacts with the environment and reflect the player's interest in building a particular kind of character;
– Plot: the game story and the reason why PCs have come together in some kind of adventure or action. Usually, PCs will act in accordance with the plot, meeting people and finding objects and places that are important to the progress of the plot. For this, players need not be sure of what their goal is or who within the story is taking the important decisions;
– NPCs (Non-Player Characters): a common RPG term that indicates a character that is not controlled by any player and thus belongs to the game master. NPCs are usually supporting characters for the adventure.

The launch of computer RPGs that allow multiplayer game mode, in which multiple users play via LAN, modem or the Internet, made an additional set of characteristics a common occurrence in RPGs of this kind, including:

– Multiplayer interaction;
– Exploration of wide worlds granted with large locations;
– Existence of several sub-plots that allow players to create their own story and adventures;
– A greater similarity with tabletop RPGs due to the possibility of creating and evolving player characters;
– A large majority of these games allows users to customize their main characters, for example, by creating adventures, items, weapons or worlds for them.

An additional main characteristic of MMORPGs is the constant intervention of a team of Game Masters, whether they are NPCs or real-world human developers who work on plots and create challenges for PCs. Moreover, the plots are nonlinear and thus do not require a beginning, a middle or even an end. What exists instead is an open story that is presented as a virtual world to be explored. Lastly, as the genre name suggests, the idea behind a MMORPG is that of a massively multiplayer role playing game that allows thousands of online players to coexist and interact in the game's "virtual" world.
3 The Theory of Persuasion
According to Robert Cialdini [3], persuasion is related to the use of communication to change the attitudes, beliefs or behaviors of other people. However, this change must occur voluntarily and not through the use of force or any means of coercion. Therefore, the person using persuasion leads others to the acceptance of a particular idea by convincing them of it. Persuasive speech seeks to represent "the whole truth" by using linguistic resources selected as expressions of a "truth" that introduce an overlapping of one's previous assumptions. In this sense, the ultimate goal of persuasive speech is to employ rhetorical devices to "convince or change already established attitudes or behaviors" [4].

3.1 Reciprocity
The principle of reciprocity is one of the Principles of Persuasion and is related to the sense of a passed-down obligation. It is a ubiquitous concept of human culture, as sociologists such as Alvin Gouldner [9] state; he affirms that there is no human society that renounces this rule. Moreover, the archaeologist Richard Leakey [13] goes as far as to attribute the concept of reciprocity to the very essence of what makes us human: "We are human because our ancestors learned to share their food and their skills in a community network."

Competitive human societies follow the principle of reciprocity and as such expect all their members to respect and believe in this principle. Each person in these societies was taught to live by the rule connected to this principle, and for this reason every one of their members knows about the social sanctions and scorn applied to those who violate it. Additionally, people who do not attend
to the statements of the principle of reciprocity are given labels such as ungrateful or deadbeat. These labels represent the discontent of society in being faced with members who receive a "favor" and make no effort to repay it.

One of the reasons for the effectiveness of the reciprocity principle is the fact that its rule is imbued with a force that, in the existence of a previous "debt", often produces a "yes" response to a request that otherwise would surely be denied. Therefore, albeit plain in appearance, the strategies connected to the principle of reciprocity are extremely efficient and almost undetectable. As a consequence, the best defense against these strategies is to think before accepting any favor or concession from someone whose true intentions are not known.

It is important to note that, although the mechanisms connected to the principle of reciprocity might recall those of a standard practice of exploitation by means of manipulation and influence, this principle is, in fact, a fundamental pillar of human societies and one of the reasons for the development of the very earliest human communities.
4 Decision-Making Systems
As the name suggests, decision-making systems are focused on helping people make decisions about different sorts of matters. In order to understand how these systems operate, it is necessary to know which elements might lead people to make their decisions and the context in which they fit. These elements can be better perceived by looking at the decision-making process from the standpoint of business management, a perspective that, according to Batista [1], makes clear two key elements of decision-making: the information channels, responsible for defining the source for acquiring data, and the communication networks, which define where data should be sent.

In addition, a company must have for its system what is known as a base of knowledge, which contains the sets of data, rules and relationships that its decision-making process must account for to achieve a suitable result or informational value [17]. Thus, the knowledge base has the task of facilitating the reorganization of the data and information useful for achieving the objectives of an organization — a particularly important feature considering that, nowadays, the success of any business is connected to the speed with which it is able to assimilate information, as well as the speed with which it is able to make decisions.

As a final note, although it may be intuitive that the success described above is the result of a process that has its origins in the fundamental components of Information Technology, it must be clear that the context that the term "knowledge" takes here corresponds to that given by Laudon and Laudon [12]. Therefore, knowledge is seen here as the set of conceptual tools and categories that humans use to create, collect, store and share information; an observation that must be taken into account in the following sections of this paper.
4.1 Information Systems
Information systems are designed to generate information for decision-making processes by collecting, processing and transforming data into information. Stair [17] states that information systems can be understood as a series of interconnected elements or components that collect (input), manipulate and store (process), and disseminate (output) data and information, in addition to providing a feedback mechanism for this process. To ensure the efficiency of these systems, Pereira and Fonseca [14] state that the following set of requirements must be fulfilled:

– Address the real needs of users;
– Keep the focus on end users (customers) instead of the professionals who created it (developers);
– Provide proper customer care;
– Present compatible costs;
– Constantly adapt to new information technologies;
– Comply with the company's business strategies.
4.2 Conditions for Decision-Making
The making of decisions can occur in different kinds of situations and might even involve conditions of uncertainty or risk. Additionally, previously planned decisions often present lower risk than unplanned ones. As pointed out by Stephen and Coulter [15], the ideal situation for decision-making is certainty, i.e., a situation in which the results of each possible alternative are known, thus enabling accurate decision-making in every circumstance. Moreover, when a decision includes a risk element, all alternatives are designed under known probabilities and have a specific result; thus, the decision-maker knows all the alternatives and is aware that the risk is unavoidable.

As the above paragraph implies, the decision-making process is connected to a company's potential for acquiring information from its Information System. In turn, the Information System of a company must provide information that is as useful as possible for the company's needs, in order to better assist its end users in managing the business.
5 A Decision-Making Model Based on the Principle of Reciprocity from the Theory of Persuasion
Now that the concepts of the principle of Reciprocity and of decision-making systems have been explained, this section presents how these concepts were arranged into a model that fits the context of MMORPGs.
5.1 Circumstances for Reciprocity in MMORPGs
There are many circumstances in a MMORPG game environment to which the principle of reciprocity might be applicable, a diversity that is justified by the
way this genre of games combines real-world elements with those of a surreal one in a single environment with which players are able to interact. The most common of these circumstances are:

– To provide aid for someone hunting monsters in order to either increase level, locate an item or even complete a Quest¹. In such cases, the one receiving help will be in debt to the ones providing it, and an exchange of knowledge is promoted;
– More often than not, a MMORPG has a forum where players can interact and share their game knowledge. Therefore, by receiving tips or advice via the forum, one is also in debt to the ones who provided the information or helped in some way;
– To provide a discount when selling some item, particularly in cases where the said discount does not normally exist, is likely to attract buyers (especially if the seller announces the discount). Unlike the previous cases, these situations might require a little "push" from sellers, or might be bound to the acquisition of a certain X amount or value of an item, thus resulting in an N discount. The same practice applies to promotions and contests;
– Players that regularly buy items from a same player might get discounts by creating a bond of loyalty.

Reciprocity Principle and the Flow Module Modeled in Petri Net. The principle of reciprocity acts as an exchange of favors; thus, it consists in sharing that which was received. Some of the ways to activate it in a MMORPG environment were mentioned in the previous section and were modeled according to the depiction of Fig. 1, which clearly expresses the operational flow between those concepts in a MMORPG environment.

According to the flow depicted in Fig. 1, the principle of reciprocity can be activated when a player asks another for help with simple tasks such as locating an item, completing a Quest or hunting a monster. MMORPGs tend to provide tools that enable players to realize such interactions, such as to create a Party² or to Trade³ an item. As a consequence, the depicted flow can be started at the desire of any player who seeks reciprocity for any of its tasks, such as Help in Finding Item, Help in Hunting, or Help in Quest. In this sense, should help mean assistance in finding an item that another player already has, the Trade task could be activated to perform the transference of the said item. If not, a Party could be put together to search for the desired item, or to attend to any of the flow's purposes. Additionally, a Party allows its members to share money among them, as well as the experience points the game provides as reward for their activities. The Discount on Shopping task is a circumstance in which players apply the reciprocity principle to attain a loyalty bond from other players or a discount on

¹ In RPGs the term quest denotes a mission or a purpose that must be accomplished.
² A term employed by RPGs for creating a hunting group or task force.
³ A MMORPG tool that allows two players to exchange or sell items between them.
Fig. 1. Flow of Reciprocity
a large amount of the objects being negotiated. Once the items and values are set, players can use the Trade tool to transfer the intended goods.

The Give Tips on Forum task is triggered when players access the virtual platform, i.e. the forum, and ask questions or perform searches about desired subjects. After achieving their intent, the ones performing the research are able to approve the responses given by other members, in order to facilitate future searches on the same subject and to motivate other players to improve their reputations.

When a player uses all the tools contained in this principle's flow, the player arrives at the Goal Achieved task. As the task name suggests, it indicates that the player has reached the completion of the intended goal or mission. At this point, the player chooses whether to take a passive stand and repay the community who helped him/her, thereby creating a cyclical movement for the principle of reciprocity, or to not repay the other players, thereby breaking the cycle of the principle of reciprocity. Nevertheless, by choosing this last option the player is aware of the penalties that he/she might suffer for breaking it.

In addition to the flow of Fig. 1, Fig. 2 presents the model for the principle of reciprocity using a Petri Net, which allows a proper visualization of the model, including its locations (states), transitions and set of guide arcs. The formal definition of this Petri Net has m1 as its initial marking; however, even though it adopts an initial marking for the net, the model accepts the creation of a cyclical movement for its network, which might be extended to all the other principles of persuasion. In turn, this movement can only be interrupted if the player taking the active role in the Reciprocity Principle of the Theory of Persuasion decides not to return
that which was shared with him/her. Aside from that, the designed network also contains several final locations (final states) that are used to identify the number of players who completed the full cycle of the reciprocity principle and the number who did not. The locations and variables of this network are:

– Variable m1 - Start of Reciprocity: this is the initial state of the model and corresponds to a player using the Reciprocity Principle to achieve a desired goal. Afterwards, the player activates the Requesting Support transition to change the current state of the net. The requesting player may only proceed with the desired request if a supporting player becomes available;
– Variable m2 - Supporter Available: at this state, a supporting player waits for requests coming from the player who activated the Reciprocity cycle. The state also verifies whether or not a supporting player for the requested issue is available. If there is no player available to act as a passive intermediary of the reciprocity principle, the state triggers a transition to Supporter Unavailable. However, if there is a supporting player available, the Requesting Support transition is triggered;
– Variable m3 - Waiting for Support Request: this state occurs while the active player of the principle of reciprocity is choosing what he/she wishes to ask the passive players of this persuasion principle. The Petri Net has redundant paths emerging from this state, as the net itself was developed to model parallel, concurrent, asynchronous and non-deterministic systems; the reason for this is that the net was modeled after human activities. Thus, the choice of route to take depends only on the active player, who might choose the Buy Item, the Request Help or the Ask in Forum transition;
– Variable m5 - Waiting for Discount: an active player reaches this state after triggering the Buy Item transition. At this location, the player takes advantage of a discount due to being an active customer, buying a large amount of items, and so forth. In turn, the passive player will ensure the completion of the sale through an exchange of favors by triggering the Discount on Item transition;
– Variable m4 - Waiting for the Order: a passive player can grant a discount only in those cases in which an item is properly requested. If there is no such request, then the No Request for Purchasing Item transition is triggered. Otherwise, the item is addressed to the Discount on Item transition, where the item's value will be negotiated;
– Variable m8 - True to Purchase: this state is attained after the value of a product is set and a reliability bond is created between the involved players. At this state, either the Request Trade or the Purchase Rejected transition might be triggered;
– Variable m11 - Negative Reputation for Requester by Purchase: one of the final states of the modeled Petri Net, activated when a player assuming the active role of the reciprocity principle decides not to proceed with the purchase of an item. This decision creates a bond of unreliability between the involved players. In this case, as the player is breaking the reciprocity cycle, a penalty might be applied to him/her that will remove points from his/her reputation. Even though this situation is undesirable for the player, as well as for the intents of this paper, it may occur under these particular circumstances;
– Variable m9 - Waiting for Available Item: as implied by its name, this state concerns the availability of an item owned by a passive player who may negotiate it. In case the passive player does not currently have the item being negotiated, the Unavailable Item transition is triggered, and this condition persists until the passive player takes possession of the item in order to sell it. When this happens, the Request Trade transition is triggered;
– Variable m10 - Trade: the state where the actual selling of an item occurs. Once an item becomes available for selling and its value is set, the Approve Trade transition is triggered in order to check the goods being transferred, items and values alike;
– Variable m12 - Purchase Accomplished: the state where the player completes a purchase; it triggers the Score Reputation transition;
– Variable m13 - Score Reputation by Sale: the model's final state for the analysis of the number of sales a player accomplished by using the principle of reciprocity, thus granting the player a better reputation on future sales;
– Variable m14 - Score Reputation by Purchase: this state works similarly to the previous one, but focuses on the buyer instead of the seller;
– Variable m25 - Goal of Reciprocity Achieved: the principle of reciprocity does not always act as a cycle; thus it is possible to attain a final objective. In fact, for this to happen a player needs only to provide a requesting player the assistance he/she needs, thus taking the passive role of a seller or adviser, for example, and thereby assuming a position within the principle of reciprocity that does not place him/her in debt to someone else. This situation fits into the context of the model while still employing the principle of reciprocity. With that in mind, it is up to the player in the active role of the principle whether or not to trigger the Activation of Reciprocity transition in order to repay the received aid;
– Variable m26 - Waiting for the Choice of a Requester: this state stands for a "waiting room" where players are placed while they decide what action to take: to trigger the Chooses to Not Repay or the Chooses to Repay transition;
– Variable m27 - Negative Reputation for Requester: a final state of the network that occurs when the active player decides not to return that which was shared with him/her and thus acquires a negative score on his/her reputation;
– Variable m28 - Reputation for the Requester: there are two parallel routes that a player may take at the Chooses to Repay state, one of which is the Reputation for the Requester state. Here, a positive reputation is assigned to the active player, a fact that will benefit him/her on future activations of the principle of reciprocity. The other option is the Waiting Completion of Reciprocity Cycle state;
– Variable m29 - Waiting Completion of Reciprocity Cycle: this state can be reached when a player triggers the Chooses to Repay transition. It poses as a "waiting room" where a player awaits the activation of the Completion of Reciprocity Cycle transition;
Fig. 2. Reciprocity in Petri Net
– Variable m30 - Waiting Start of Reciprocity: this state indicates the moment at which the Reciprocity principle starts to act. This instant effectively marks the start of the network and triggers the m1 and m2 states;
– Variable m6 - Waiting for Help: the state that occurs when the player in the active role triggers the Request Help transition. At this state, a player might trigger the Request Party transition if there is another player who wants to offer help;
– Variable m15 - Waiting for Available Party: the objective of this state is to assert whether or not party assistance is available. If there is no party available, the Party Unavailable transition is triggered. Otherwise, the Request Party transition is triggered;
– Variable m16 - Party: a state that occurs when players intending to assist each other create a single group of individuals. Such groups are known in RPGs as a party and in this model are created using the homonymous Party tool. Depending on the conduct of the players in a same party, the Assistance in Progress or the Party Rejected transition might be triggered. The latter indicates the disbanding of the group, in general due to a lack of commitment to provide assistance, whether it comes from the group or from the player who requested the assistance;
– Variable m17 - Negative Reputation for the Requested: in case the group or the passive player asks for the disbanding of the Party, the reputation of one or several of those involved may receive a negative score as penalty;
– Variable m18 - Assistance Provided: this state is the end of the helping process and is reached when the group achieves its goal. This state triggers the Score Reputation transition;
– Variable m19 - Score Reputation by Assistance: grants the player a score, which is set in respect to a previously provided assistance;
– Variable m20 - Score Reputation for the Assisted: similar to the m14 state, though this final state sets a score for the player who requested the assistance (the one who received the aid). The transition that triggers this state might also trigger the m25 state, which in turn keeps the cycle of the network;
– Variable m7 - Waiting for a Reply: a state located on a different path of the network, which an active player might access via the Ask in Forum transition. At this state, the active player waits for the answer of a passive player. After this, the active player may choose to activate the Assert Answer transition. In case there is no given answer, the No Given Answers transition is triggered;
– Variable m21 - Waiting for a Question: this state is reached when passive players are waiting for a question that they can answer. In case no question shows up, the No Given Questions transition is triggered. However, after a question is made and answered, it is up to the active player to decide whether or not to trigger the Assert Answer transition;
– Variable m22 - Tip Provided: this state stands as the consequence of asserting an answer; thus, it follows the Assert Answer transition. This state triggers the Score Reputation transition;
– Variable m23 - Score Reputation by Tip: a state that grants a player a score based on active participation in the forums, particularly by providing requesting players with the information they require;
– Variable m24 - Score Reputation by Question: similar to the state above; however, it grants a value to the reputation of the player who asks the question instead of the one who answers it. This state is also triggered by the Score Reputation transition, which in turn may also lead to the m25 state in order to keep the network's movement.

However, a Petri Net requires one more variable in addition to the above ones. This variable is given by the net's weight function and is responsible for launching the network: a minimal weight value is required to effectively start the Petri Net. For this paper, the weight function corresponds to the strategy of each player, and it must also account for the fact that each player is likely to have several strategies. In order to develop those strategies, this paper also adopts some of the equations provided by Game Theory. The scope of this theory that fits the discussion of this paper will be explained now, in order to provide a better understanding of its equations.
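Before turning to Game Theory, the following is a minimal sketch of the place/transition mechanics underlying the net of Fig. 2: a transition fires when all its input places hold tokens, moving the marking forward. Only a small fragment of the net (m30 leading to m1 and m2, then the Requesting Support transition) is modeled as a 1-safe net; the names and structure shown are illustrative assumptions read off the description above.

```python
# Sketch: token firing in a fragment of the reciprocity Petri Net.
transitions = {
    'start_of_reciprocity': ({'m30'}, {'m1', 'm2'}),
    'requesting_support':   ({'m1', 'm2'}, {'m3'}),
}

def fire(marking, name):
    pre, post = transitions[name]
    if not pre <= marking:                  # enabled only if inputs marked
        raise ValueError(f'{name} is not enabled')
    return (marking - pre) | post

m = {'m30'}                                 # initial marking
m = fire(m, 'start_of_reciprocity')         # -> {'m1', 'm2'}
m = fire(m, 'requesting_support')           # -> {'m3'}
```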
5.2 Game Theory
Game theory is a mathematical theory designed to model the phenomena that can be observed when two or more "decision agents" interact. This theory also provides a language for the description of conscious decision-making processes involving the goals of more than one individual [5][6]. When applying the principles of game theory to the model proposed in this paper, what we expect to achieve is the study of the choice of the optimal decision for the activation of the Reciprocity Principle under conditions of conflict. The set of engaged players is considered the basic game element in these circumstances. Moreover, as each player has its own set of strategies, every time a player chooses a desired strategy a new profile is created in the space that comprises all possible situations. Each of these situations also corresponds to a profile, since every player has different interests that are focused on the outcome of different circumstances. Mathematically, it is said that each player has its own utility function that assigns a real number (the gain of the player) to every game situation [16]. With the above said, for the model proposed in this paper a game has the following basic elements: a finite set of players given as $G = \{g_1, g_2, \ldots, g_n\}$; for each player $g_i \in G$, a finite set of options $S_i = \{s_{i1}, s_{i2}, \ldots, s_{im_i}\}$ ($m_i \geq 2$), each referred to as a pure strategy of $g_i$; and vectors $s = \{s_{1j_1}, s_{2j_2}, \ldots, s_{nj_n}\}$, in which $s_{ij_i}$ is a pure strategy of player $g_i \in G$, called profiles of pure strategies. The set of all profiles of pure strategies is known as the game's space of pure strategies, which is given by the
Cartesian product [16] shown in (1). And finally, for each player $g_i \in G$ there exists a utility function (shown in (2)) that links its gain $u_i(s)$ to the appropriate profile of pure strategies $s \in S$ [16].

$$S = \prod_{i=1}^{n} S_i = S_1 \times S_2 \times \ldots \times S_n \qquad (1)$$

$$u_i : S \to \mathbb{R}, \quad s \mapsto u_i(s) \qquad (2)$$
An understanding of the above principles and their mathematical functions is important for this paper because, according to the proposed model, they are what gives each player the possibility of choosing the best strategy for applying the Principle of Reciprocity and, thus, of triggering it at the appropriate game moment.
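As a concrete instance of Eqs. (1)-(2), the following sketch enumerates the space of pure-strategy profiles for two players and evaluates a utility function over it. The strategy names and payoff values are hypothetical, invented here purely for illustration.

```python
from itertools import product

# Hypothetical 2-player example: utilities over pure-strategy profiles.
# u[i][profile] plays the role of the utility function u_i of Eq. (2).
strategies = {"g1": ["help", "ignore"], "g2": ["ask_forum", "ask_party"]}
u = {
    "g1": {("help", "ask_forum"): 2, ("help", "ask_party"): 3,
           ("ignore", "ask_forum"): 1, ("ignore", "ask_party"): 0},
    "g2": {("help", "ask_forum"): 1, ("help", "ask_party"): 3,
           ("ignore", "ask_forum"): 2, ("ignore", "ask_party"): 0},
}

# Space of pure-strategy profiles S = S1 x S2 (Eq. (1)).
profiles = list(product(strategies["g1"], strategies["g2"]))

# Profile maximizing g1's utility, if g1 could choose the whole profile:
best_for_g1 = max(profiles, key=lambda s: u["g1"][s])
print(best_for_g1, u["g1"][best_for_g1])   # ('help', 'ask_party') 3
```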
6 Conclusion
There are currently few people aware of the importance of applying any of the principles of the Theory of Persuasion to decision-making processes. In fact, research is so scarce that, during the development of this work, no decision-making models using the artifices provided by the Theory of Persuasion were found; games are thus still very incipient in the use of this theory. The research for this paper also made clear that implementing the Theory of Persuasion as a decision-making tool for MMORPG environments changes the way players deal with information (knowledge), as each of them tries to create its own best strategy and also tries to figure out how its own strategies relate to the interests (strategies) of other players. However, the research presented in this paper is not concluded, as it can be assumed that there is still much work left to do. One direct example of this is the fact that, as long as there is no real commercial interest in developing such game environments for educational purposes or any other field of interest, the initiative shown here will be no more than a mere academic project that may never come to fruition. To change this reality it is necessary that the ones least aware of the benefits brought by the Theory of Persuasion start to comprehend it and thus work on the creation of environments that understand the needs of players and encourage them to develop new strategies for gaining knowledge, including obtaining it by learning from other players, thus effectively transforming the game environment into an educational partner. In order to fit the perspective described above, this paper focuses on the real benefits that the Theory of Persuasion (particularly its Principle of Reciprocity) may bring to the development of a decision-making environment, demonstrating its advantages by using an existing game environment that is also used to model its architecture in accordance with the real needs of players (a problem
that is present in conventional playable environments). This work was done in order to demonstrate the authors' belief that decision-making systems applied to MMORPG environments that combine the principles and theories presented throughout the paper are able to comply with the conditions settled for the proposed model, hence supporting games that apply this model by providing players with tools to better build their own knowledge. Therefore, the Principle of Reciprocity from the Theory of Persuasion that was discussed here will contribute to the creation of decision-making systems, as it will transport decisions taken under conditions of uncertainty to the more assuring scenario of conditions of certainty, through the use of the persuasive factors contained in this principle.
References
1. Batista, E.d.O.: Information Systems: The Conscious Use of Technology for Management (in pt-BR). Saraiva (2004)
2. Cale, C.: The real truth about dungeons & dragons (2002), http://www.cale.com/paper.htm (last accessed 10/06/2011)
3. Cialdini, R.B.: Influence: The Psychology of Persuasion. Harperbusiness Essentials. Collins (2007)
4. Citelli, A.: Language and Persuasion (in pt-BR), 2nd edn.
5. Conway, J.: All games bright and beautiful. The American Mathematical Monthly, 417–434 (1977)
6. Conway, J.: The gamut of game theories. Mathematics Magazine, 5–12 (1978)
7. Debbio, M.: Arkanun (in pt-BR), 2nd edn. Daemon Editora Ltda (1998)
8. Garcia, E., Garcia, O.P.: The importance of management information systems for business administration (in pt-BR). Social Science in Perspective Magazine, 21–32 (2003)
9. Gouldner, A.W.: The norm of reciprocity: A preliminary statement. American Sociological Review, 161–178 (1960)
10. Hughes, J.: Therapy is fantasy: Roleplaying, healing and the construction of symbolic order (1988)
11. Jackson, S.: GURPS RPG Basic Module (in pt-BR). Devir (1994)
12. Laudon, K.C., Laudon, J.P.: Essentials of Management Information Systems. Prentice Hall (1999)
13. Leakey, R.: People of the Lake. Anchor Press/Doubleday (1978)
14. Pereira, M.J.L.B., Fonseca, J.G.M.: Faces of Decision: The Paradigm Shifts and the Power of Decision (in pt-BR). Makron Books (1997)
15. Robbins, S.P., Coulter, M.: Management. Prentice Hall (1996)
16. Sartini, B.A., et al.: An Introduction to Game Theory (in pt-BR)
17. Stair, R.: Principles of Information Systems: A Managerial Approach. Course Technology (1995)
A Computing with Words Based Approach to Multicriteria Energy Planning

Hong-Bin Yan¹, Tieju Ma¹, Yoshiteru Nakamori², and Van-Nam Huynh²

¹ School of Business, East China University of Science and Technology, Meilong Road 130, Shanghai 200237, P.R. China
[email protected], [email protected]
² School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi City, Ishikawa, 923-1292, Japan
{huynh,nakamori}@jaist.ac.jp
Abstract. Exploitation of new and innovative energy alternatives is a key means towards a sustainable energy system. This paper proposes a linguistic energy planning model with computation solely on words as well as considering the policy-maker’s preference information. To do so, a probabilistic approach is first proposed to derive the underlying semantic overlapping of linguistic labels from their associated fuzzy membership functions. Second, a satisfactory-oriented choice function is proposed to incorporate the policy-maker’s preference information. Third, our model is extended to multicriteria case with linguistic importance weights. One example, borrowed from the literature, is used to show the effectiveness and advantages of our model.
1 Introduction
Efforts toward a sustainable energy system are progressively becoming an issue of universal concern for decision-makers. The objective of an energy policy toward a sustainable energy system includes efficient production, distribution and use of energy resources, and the provision of equitable and affordable access to energy, while ensuring security of energy supply and environmental sustainability. Consequently, exploitation of sustainable or new energy alternatives (also called energy planning) is a key means of satisfying these objectives and has gained great interest during the last decade. The energy planning endeavor involves finding a set of sources and conversion devices in order to meet the energy requirements of all the tasks in an optimal manner [4]. Making an energy planning decision involves a process of balancing diverse ecological, social, technical, and economic aspects over space and time. The complexity of energy planning and energy policy in particular makes multicriteria decision analysis (MCDA) a valuable tool in the energy planning process [6], providing the flexibility and capacity to assess the energy supply alternatives' implications for the economy, environment and social framework. In particular, the concept of MCDM has been widely used for the design of energy and environmental policies as well as for sustainable energy planning [7,12]. However, the
assessment of innovative energy alternatives through a number of criteria is a complex and time-consuming task, since the analysis has to face a series of uncertainties such as fossil fuel prices, environmental regulations, market structure, technology, and demand and supply uncertainty [11]. In addition, sustainability is an inherently vague and complex concept, and the implications of sustainable development as a policy objective are difficult to define or measure. In view of these difficulties, fuzzy set theory [13] offers a formal mathematical framework for energy planning in order to resolve the vagueness, ambiguity and subjectivity of human judgment. A realistic approach to such situations is the fuzzy linguistic approach [14], which deals with linguistic information represented by linguistic labels. Such a problem will be referred to as linguistic energy planning, the process of which usually creates the need for computing with words. In this context, many fuzzy linguistic models have been proposed or applied for energy planning problems, such as a fuzzy linguistic AHP based model [2] and fuzzy TOPSIS based models [1,10]. These fuzzy models can efficiently deal with both fuzzy uncertainty and complexity in energy planning problems. Unfortunately, they simultaneously have an unavoidable limitation of loss of information [3], which consequently implies a lack of precision in the final result. In this sense, direct computation on linguistic variables for energy planning can be considered a direct and adequate framework [15]. To the best of our knowledge, little attention has been paid to multicriteria approaches with direct computation on linguistic variables for the evaluation of energy alternatives. Furthermore, another motivation comes from the fact that the experts are not necessarily the decision-makers, but only provide advice [8]. In energy planning, the experts can be people familiar with a particular energy supply and energy problem domain, whereas the real decision-maker in energy planning is the policy-maker, whose preference information plays an important role in the choice of energy supply alternatives and is missing in most research. The main focus of this paper is to propose and develop a linguistic energy planning framework with computation solely based on linguistic labels as well as considering the policy-maker's preference information. To do so, Section 2 presents some basic knowledge of energy planning and formulates the research problems. Section 3 formulates our single criterion linguistic energy planning model, which is able to capture the underlying semantic overlapping of experts' judgments as well as the policy-maker's target preference information. Section 4 extends the single criterion case to the multicriteria case with linguistic weight information. Section 5 borrows one example from the literature to illustrate the effectiveness and advantages of our model. Finally, this paper is concluded in Section 6.
2 The Linguistic Assessment-Based Framework

2.1 Selecting Energy Alternatives and Evaluation Criteria
The most popular energy supply resources or technologies are the alternatives based on solar energy (photovoltaic and thermal), wind energy, hydraulic energy,
biomass, animal manure, combined heat and power, and ocean energy. Despite environmental drawbacks, nuclear and conventional energy alternatives like coal, oil and natural gas may still be included in the list of energy alternatives to be promoted. Let $A = \{A_1, A_2, \ldots, A_M\}$ be a set of energy alternatives. The decision-making process to determine the best energy policy is multidimensional, made up of a number of aspects such as economic, technical, environmental, political, and social. It has to be noted that the criteria and the performances are dependent on the specific problem formulation and particularly on the country's specific energy characteristics, its development needs and perspectives, and the energy actors and their interests. Formally, the evaluation criteria are expressed as $C = \{C_1, C_2, \ldots, C_N\}$.
2.2 Selecting Linguistic Label Sets and Their Semantics
Essentially, in any linguistic approach to solving a problem, the label sets of the involved linguistic variables and their associated semantics must be defined first, to supply users with an instrument by which they can naturally express their judgments. The concept of a linguistic variable was first introduced by [14] to model how words or labels can represent vague concepts in natural language.

Definition 1. A linguistic variable is a quadruple $\langle L, T(L), \Omega, M \rangle$, where $L$ is the variable name, $T(L)$ is the set of labels or words (i.e. the linguistic values of $L$), $\Omega$ is a universe of discourse, and $M$ is the semantic rule which associates with each linguistic label its meaning. The semantic rule $M$ is defined as a function that associates a normalized fuzzy subset of $\Omega$ with each word in $T(L)$.

We only consider linguistic variables with a finite ordered label set $L = \{L_0, L_1, \ldots, L_G\}$ on a continuous universe of discourse $\Omega$. In order to choose appropriate linguistic descriptors and semantics for the label set of a linguistic variable, an important aspect to analyze is the granularity of uncertainty, i.e., the level of discrimination among different counts of uncertainty. Moreover, the label set must have the following characteristics:

– The set presents a total order: $L_{g_1} \geq L_{g_2}$ if $g_1 \geq g_2$.
– There is a negation operator: $\mathrm{Neg}(L_{g_1}) = L_{g_2}$ such that $g_2 = G - g_1$.
– $\max(L_{g_1}, L_{g_2}) = L_{g_1}$ if $g_1 \geq g_2$; $\min(L_{g_1}, L_{g_2}) = L_{g_1}$ if $g_1 \leq g_2$.

We denote the linguistic label set for the performance of energy alternatives and the one for importance weights as $L^I$ and $L^{II}$, respectively.

Example 1. Vahdani et al. [10] have defined the following linguistic label set with associated fuzzy set semantics to rate different energy alternatives: $L^I = \{L^I_0 = \text{Very poor (VP)}, L^I_1 = \text{Poor (P)}, L^I_2 = \text{Medium poor (MP)}, L^I_3 = \text{Fair (F)}, L^I_4 = \text{Medium good (MG)}, L^I_5 = \text{Good (G)}, L^I_6 = \text{Very good (VG)}\} = \{(0, 1, 2), (1, 2, 3), (2, 3, 4, 5), (4, 5, 6), (5, 6, 7, 8), (7, 8, 9), (8, 9, 10)\}$.
Example 2. The label set and associated fuzzy set semantics with an additive linguistic preference relation to rate the importance weights of the different criteria can be defined as follows [10]: $L^{II} = \{L^{II}_0 = \text{Very low (VL)}, L^{II}_1 = \text{Low (L)}, L^{II}_2 = \text{Medium low (ML)}, L^{II}_3 = \text{Medium (M)}, L^{II}_4 = \text{Medium high (MH)}, L^{II}_5 = \text{High (H)}, L^{II}_6 = \text{Very high (VH)}\}$.
The values $(a, b, c)$ and $(a, b, c, d)$ are used to represent a triangular and a trapezoidal fuzzy number, respectively.
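The following sketch encodes the semantics of Example 1 as executable membership functions. The helper names tri and trap are our own, and the code is only a minimal rendering of the triangular/trapezoidal conventions above.

```python
def tri(a, b, c):
    """Triangular membership function (a, b, c)."""
    def mu(x):
        if a <= x <= b:
            return (x - a) / (b - a) if b > a else 1.0
        if b < x <= c:
            return (c - x) / (c - b) if c > b else 1.0
        return 0.0
    return mu

def trap(a, b, c, d):
    """Trapezoidal membership function (a, b, c, d)."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if a <= x < b:
            return (x - a) / (b - a)
        if c < x <= d:
            return (d - x) / (d - c)
        return 0.0
    return mu

# Semantics of L^I from Example 1.
L_I = {
    "VP": tri(0, 1, 2), "P": tri(1, 2, 3), "MP": trap(2, 3, 4, 5),
    "F": tri(4, 5, 6), "MG": trap(5, 6, 7, 8), "G": tri(7, 8, 9),
    "VG": tri(8, 9, 10),
}
# Membership degrees of the point x = 4.5 in every label:
print({name: round(mu(4.5), 2) for name, mu in L_I.items()})
# {'VP': 0, 'P': 0, 'MP': 0.5, 'F': 0.5, 'MG': 0, 'G': 0, 'VG': 0}
```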
2.3 Gathering Data and Developing Computational Models for Energy Planning
A finite set of experts, denoted as $E = \{E_1, E_2, \ldots, E_K\}$, is then required to assess the energy alternatives in terms of the selected criteria, making use of the linguistic label set $L^I$. In addition, the experts are also asked to provide their opinions on the importance of the different criteria, making use of the linguistic label set $L^{II}$. The group of experts is assigned a weight vector $\omega = (\omega_1, \omega_2, \ldots, \omega_K)$ with $\sum_{k=1}^{K} \omega_k = 1$. Without additional information, all the experts are assumed to have the same weight, i.e., $\omega_k = 1/K$ $(k = 1, 2, \ldots, K)$. Formally, the assessment data obtained in this way can be described as follows:

– The value $x^k_{mn}$ is used to denote the rating of energy alternative $A_m$ on criterion $C_n$ provided by expert $E_k$, where $m = 1, 2, \ldots, M$; $n = 1, 2, \ldots, N$; $k = 1, 2, \ldots, K$; and $x^k_{mn} \in L^I$.
– If the group of experts provides importance weights for each criterion directly, $L^{II}$ is a linguistic label set with an additive linguistic preference relation. Then $y^k_n$ is used to denote the assessment of criterion $C_n$ provided by expert $E_k$, where $n = 1, 2, \ldots, N$; $k = 1, 2, \ldots, K$; and $y^k_n \in L^{II}$.
3 Single Criterion Linguistic Energy Planning Involving Underlying Semantic Overlapping
If an expert $E$ assesses an alternative $A \in A$ using $L \in L$, it implies that the expert makes the assertion "$A$ on $C$ is $L$". From the philosophical viewpoint of the epistemic stance [5], humans possess some kind of mechanism for deciding whether or not to make certain assertions. Furthermore, although the underlying concepts are often vague, the decisions about assertions are, at a certain level, bivalent. That is to say, for an energy alternative $A$ on a criterion $C$ and a description $L$, the expert is willing to assert that "$A$ on $C$ is $L$" or not. However, the dividing line between those linguistic labels that are and those that are not appropriate to use may be uncertain. Therefore, if an expert assesses an energy alternative using $L$, other linguistic labels $L' \in L$ ($L' \neq L$) may also be appropriate for describing $A$ on criterion $C$. Such a phenomenon is referred to as the semantic
overlapping of linguistic data. Motivated by the epistemic stance, we assume that any neighboring linguistic labels have partial semantic overlapping in the energy evaluation framework. Also, similarly to [9], the linguistic label $L$ will be called a prototype label.
3.1 Deriving Underlying Semantic Overlapping from Fuzzy Membership Functions
Assume a linguistic label set $L = \{L_0, L_1, \ldots, L_G\}$ with associated membership functions $\{M(L_0), M(L_1), \ldots, M(L_G)\}$. We first assume that an expert's judgment is a numerical value $x$; we then obtain the linguistic description of $x$ relative to the linguistic variable $L$, which is a fuzzy subset of $T(L)$ such that $f_L(x) = \{L_0/\mu_{M(L_0)}(x), L_1/\mu_{M(L_1)}(x), \ldots, L_G/\mu_{M(L_G)}(x)\}$. For each possible $x \in \Omega$, a mass assignment function $m_x$ on $2^L$ can be derived from the membership degrees $\mu_{M(L_0)}(x), \mu_{M(L_1)}(x), \ldots, \mu_{M(L_G)}(x)$ as follows.

Definition 2. Given the fuzzy subset $f_L(x)$ relative to the linguistic variable $L$ such that the range of its membership function $\mu_{f_L(x)}$ is $\{\pi_1, \pi_2, \ldots, \pi_J\}$, where $\pi_j > \pi_{j+1} > 0$, the mass assignment of $f_L(x)$, denoted $m_x$, is a probability distribution on $2^L$ satisfying

$$m_x(\emptyset) = 1 - \pi_1, \quad m_x(F_j) = \pi_j - \pi_{j+1} \ \text{for } j = 1, \ldots, J-1, \quad m_x(F_J) = \pi_J, \qquad (1)$$

where $F_j = \{L \in L \mid \mu_{M(L)}(x) \geq \pi_j\}$ for $j = 1, \ldots, J$, and $\{F_j\}_{j=1}^J$ are referred to as the focal elements (sets) of the mass assignment function $m_x$.

In Definition 2, the mass $m_x(F)$ represents one's belief that $F$ is the extension of the value $x$. The notion of mass assignment $m_x$ suggests a definition of a probability distribution $p$ as follows.

Definition 3. The probability distribution of $x \in \Omega$ on $L$ is given by

$$p(L|x) = \sum_{F_j : L \in F_j} \frac{m_x(F_j)}{(1 - m_x(\emptyset)) \cdot |F_j|}, \quad L \in L, \qquad (2)$$
where $\{F_j\}$ is the corresponding set of focal elements. The mass $m_x(\emptyset)$ can be interpreted as the degree of inconsistency conveyed by $x$, or the belief committed exactly to hypotheses not included in $L$. The value $p(L|x)$ reflects the probability that $L \in L$ belongs to the extension of the expert's assessment $x \in \Omega$. This notion can be extended to the case where the given value is a continuous subset of $\Omega$, in which case the appropriate linguistic description is defined as follows.

Definition 4. Let $S \subseteq \Omega$; then the probability distribution of $S$ on $L$ is

$$p(L|S) = \frac{1}{\lambda(S)} \int_S p(L|x)\,dx, \quad L \in L, \qquad (3)$$
where $\lambda$ is the Lebesgue measure, which in the case that $S$ is an interval corresponds to its length. The value $p(L|S)$ reflects the probability that $L \in L$ belongs to the extension of the expert's assessment $S \subseteq \Omega$.
Extending from an interval value to the case where a value is a fuzzy subset of $\Omega$, the appropriate linguistic description is as follows.

Definition 5. Let $f \subseteq_f \Omega$ (the symbol $\subseteq_f$ denotes a fuzzy subset); then the probability distribution of $f$ on a linguistic label set $L$ is

$$p(L|f) = \int_0^1 \frac{1}{\lambda(f_\alpha)} \int_{f_\alpha} p(L|x)\,dx\,d\alpha, \quad L \in L, \qquad (4)$$

where $f_\alpha$ is the alpha-cut of $f$. The intuition underlying this definition is as follows: for each alpha-cut of $f$ we average the probability of $L$ being selected to label values in that cut, and this is then averaged across the alpha-cuts to give the overall probability of $L$.

We are now able to derive the underlying semantic overlapping of an expert's linguistic judgment. If an expert provides a linguistic label $L \in L$ as his judgment, it means that the expert chooses the fuzzy subset $M(L)$ as his judgment. Here, $L$ will be called the prototype label. Then, by using Eq. (4), the linguistic description of a prototype linguistic label is as follows.

Definition 6. With a prototype label $L \in L$, the probability distribution of $L$ on the linguistic label set $L$ is

$$p(L_g|L) = \int_0^1 \frac{1}{\lambda(M(L)_\alpha)} \int_{M(L)_\alpha} p(L_g|x)\,dx\,d\alpha, \quad g = 0, 1, \ldots, G. \qquad (5)$$

The value $p(L_g|L)$ reflects the probability that $L_g$ $(g = 0, 1, \ldots, G)$ belongs to the extension of the expert's assessment $L$. Consequently, there are $G+1$ possible prototype linguistic labels with respect to $L$, and we can obtain a probability distribution matrix representing the underlying semantic overlapping of the expert's linguistic judgments.
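The sketch below transcribes Definitions 2-3 for the single-point case: it turns a vector of membership degrees into the mass assignment of Eq. (1) and then into the probability distribution of Eq. (2). Handling of the integrals in Eqs. (3)-(5) (e.g. by numerical quadrature over x) is omitted to keep the example short; the function name is our own.

```python
def label_probabilities(memberships):
    """p(L|x) from membership degrees via Definitions 2-3."""
    # Distinct positive membership levels pi_1 > pi_2 > ... > pi_J.
    levels = sorted({m for m in memberships.values() if m > 0}, reverse=True)
    if not levels:
        return {}
    masses = []                              # (focal set F_j, mass m_x(F_j))
    for j, pi in enumerate(levels):
        F = {L for L, m in memberships.items() if m >= pi}
        next_pi = levels[j + 1] if j + 1 < len(levels) else 0.0
        masses.append((F, pi - next_pi))     # Eq. (1)
    m_empty = 1.0 - levels[0]                # m_x(emptyset) = 1 - pi_1
    p = {L: 0.0 for L in memberships}
    for F, mass in masses:
        for L in F:
            p[L] += mass / ((1.0 - m_empty) * len(F))   # Eq. (2)
    return p

# Membership degrees of x = 4.5 in the labels of Example 1 (computed earlier).
print(label_probabilities({"MP": 0.5, "F": 0.5}))   # {'MP': 0.5, 'F': 0.5}
```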
3.2 Group Opinions Aggregation
In linguistic energy planning, a group of experts $E = \{E_1, E_2, \ldots, E_K\}$ is chosen to assess a set of energy supplies $A = \{A_1, A_2, \ldots, A_M\}$ on a criterion $C$ using a linguistic variable $L^I = \{L^I_0, L^I_1, \ldots, L^I_G\}$. The judgment of expert $E_k$ for alternative $A_m$ on criterion $C$ is denoted $x^k_m \in L^I$. With the associated fuzzy membership functions of the linguistic labels $L^I_g \in L^I$ $(g = 0, 1, \ldots, G)$, according to Definition 6 we obtain a probability distribution $p^k_m$ of $x^k_m$ on $L^I$ such that $p^k_m = [p_m(L^I_0|x^k_m), p_m(L^I_1|x^k_m), \ldots, p_m(L^I_G|x^k_m)]$, $m = 1, \ldots, M$, $k = 1, \ldots, K$. With the weighting vector $\omega = (\omega_1, \omega_2, \ldots, \omega_K)$ of the experts, we can obtain the collective probability distribution on $L^I$ for energy alternative $A_m$ on criterion $C$ such that

$$p_m(L^I_g) = \sum_{k=1}^{K} \omega_k \cdot p_m(L^I_g|x^k_m), \qquad (6)$$
where $g = 0, 1, \ldots, G$ and $x^k_m \in L^I$. Therefore, we obtain a probability distribution $p_m = [p_m(L^I_0), p_m(L^I_1), \ldots, p_m(L^I_G)]$, which will be referred to as the profile of energy alternative $A_m$ on criterion $C$.
3.3 Satisfactory-Oriented Choice Function
Most linguistic multiexpert decision-making processes are basically aimed at reaching a "consensus". Consensus is traditionally understood as a strict and unanimous agreement of all the experts regarding all possible alternatives. The model presented below assumes that experts do not have to agree in order to reach a consensus. There are several reasons to allow experts not to converge to a uniform opinion. It is well accepted that experts are not necessarily the decision-makers, but provide advice [8]; the experts in energy planning are not necessarily the real policy-maker. Due to this observation, the linguistic judgment provided by an expert does not represent the policy-maker's preference. In fuzzy set computation based models, an optimization procedure is usually needed to select the best energy choices; yet it has been argued that human behavior should be modeled as satisficing instead of optimizing. In the sequel, we shall propose a satisfactory-oriented choice function.

The inferred probability distribution on the linguistic label set $L^I$ for each energy alternative $A_m$ can be viewed within the general framework of decision making under uncertainty (DUU), described as follows. Energy supplies $A_m$ $(m = 1, 2, \ldots, M)$ represent the alternatives available to the policy-maker, one of which must be selected. There are $G+1$ possible values corresponding to the so-called state space $S = \{S_0, S_1, \ldots, S_G\}$, which is characterized by a probability distribution $p_S$ on $S$; here $p_m(L^I_g)$ $(g = 0, \ldots, G)$ on $L^I$ acts as the probability distribution on the state space $S$. Assuming the policy-maker has a target $T$ in mind, and that the target is independent of the set of $M$ alternatives and of the linguistic judgments provided by the experts, we define the following function:

$$\mathrm{Pr}_m = \Pr(A_m \succeq T) = \sum_{g=0}^{G} \Pr(L^I_g \succeq T) \cdot p_m(L^I_g), \qquad (7)$$
where $\Pr(L^I_g \succeq T)$ is the probability of linguistic label $L^I_g$ meeting target $T$. The quantity $\Pr(A_m \succeq T)$ can be interpreted as the probability that "the performance of $A_m$ is at least as good as $T$". We refer to this as the satisfactory-oriented principle. Assuming there exists a probability distribution $p_T$ of target $T$ on the linguistic label set $L^I$ such that $p_T = [p_T(L^I_0), p_T(L^I_1), \ldots, p_T(L^I_G)]$, we define the following value function:

$$\mathrm{Pr}_m = \sum_{g=0}^{G} \left( \sum_{l=0}^{G} u(L_g, L_l) \cdot p_T(L_l) \right) \cdot p_m(L^I_g), \qquad (8)$$
where $u(L_g, L_l)$ is the utility level (1 or 0). Therefore, we can induce the following value functions:

$$\mathrm{Pr}_m = \sum_{g=0}^{G} \left( \sum_{l=0}^{g} p_T(L^I_l) \right) \cdot p_m(L^I_g), \quad \text{for a benefit criterion;} \qquad (9)$$

$$\mathrm{Pr}_m = \sum_{g=0}^{G} \left( \sum_{l=g}^{G} p_T(L^I_l) \right) \cdot p_m(L^I_g), \quad \text{for a cost criterion.} \qquad (10)$$
Now let us consider two special cases. Without additional information (if the policy-maker does not assign any target), we assume the policy-maker has a target $T$ uniformly distributed on $L^I$ such that $p_T(L^I_g) = \frac{1}{G+1}$, $g = 0, \ldots, G$. Then we can obtain the probability of meeting the uniformly distributed linguistic target as follows:

$$\mathrm{Pr}_m = \begin{cases} \sum_{g=0}^{G} \frac{g+1}{G+1} \cdot p_m(L^I_g), & \text{benefit criterion;} \\ \sum_{g=0}^{G} \frac{G+1-g}{G+1} \cdot p_m(L^I_g), & \text{cost criterion.} \end{cases} \qquad (11)$$

Consider next the case where the policy-maker assigns a specific linguistic label $L^I_l \in L^I$ as his target. As discussed in Section 3.1, in linguistic energy planning a linguistic judgment has an underlying semantic overlapping. In this context, we can also derive a probability distribution $p_T$ on $L^I$ such that

$$p_T = \left[ p_T(L^I_0|L^I_l), p_T(L^I_1|L^I_l), \ldots, p_T(L^I_G|L^I_l) \right]. \qquad (12)$$

Consequently, we can obtain the probability of meeting a specific linguistic label target $L^I_l \in L^I$ for benefit and cost criteria, respectively. With the satisfactory-oriented choice function, if there is only one criterion, we can obtain the best energy alternative(s) as $A^* = \arg\max_m \{\mathrm{Pr}_m\}$.
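As a small illustration of Eq. (11), the sketch below scores two alternative profiles against a uniformly distributed target and picks $A^* = \arg\max_m \{\mathrm{Pr}_m\}$. The two profiles are hypothetical, invented here for illustration.

```python
def prob_meeting_uniform_target(profile, benefit=True):
    """Probability of meeting a uniformly distributed target on a
    label set of G+1 labels (Eq. (11))."""
    G1 = len(profile)                       # G + 1 labels
    if benefit:
        return sum((g + 1) / G1 * p for g, p in enumerate(profile))
    return sum((G1 - g) / G1 * p for g, p in enumerate(profile))

# Two hypothetical alternative profiles over a 7-label set.
p_A = [0.0, 0.0, 0.1, 0.2, 0.3, 0.3, 0.1]
p_B = [0.0, 0.1, 0.3, 0.3, 0.2, 0.1, 0.0]
scores = {name: prob_meeting_uniform_target(p)
          for name, p in [("A", p_A), ("B", p_B)]}
best = max(scores, key=scores.get)          # A* = arg max_m Pr_m
print(scores, best)                         # A scores ~0.73, B ~0.56
```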
4 Multicriteria Linguistic Energy Planning
Extending from a single criterion to multicriteria, a group of experts is chosen to assess a set of energy supplies on a set of criteria $C = \{C_1, C_2, \ldots, C_N\}$ using a linguistic label set $L^I = \{L^I_0, L^I_1, \ldots, L^I_G\}$. The linguistic assessment of alternative $A_m$ on criterion $C_n$ provided by expert $E_k$ is denoted $x^k_{mn} \in L^I$. Similarly to the single criterion case, we can derive a probability distribution $p^k_{mn}$ of $x^k_{mn}$ on $L^I$ such that $p^k_{mn} = [p_{mn}(L^I_0|x^k_{mn}), p_{mn}(L^I_1|x^k_{mn}), \ldots, p_{mn}(L^I_G|x^k_{mn})]$. With the weighting vector $\omega = (\omega_1, \omega_2, \ldots, \omega_K)$, we can obtain a collective probability distribution for each energy supply $A_m$ on each criterion $C_n$ such that

$$p_{mn} = \left[ p_{mn}(L^I_0), p_{mn}(L^I_1), \ldots, p_{mn}(L^I_G) \right], \quad m = 1, \ldots, M, \; n = 1, \ldots, N. \qquad (13)$$

Based on the satisfactory-oriented principle in Section 3.3, a set of linguistic targets $T = \{T_1, T_2, \ldots, T_N\}$ for the criteria set $C$ can be defined by default or by the policy-maker. With mutually independent and additive preferences over targets, we can obtain a probability vector of energy supply $A_m$ meeting target $T_n$ on evaluation criterion $C_n$ as follows:

$$\mathrm{Pr}_m = \left[ \mathrm{Pr}_{m1}, \mathrm{Pr}_{m2}, \ldots, \mathrm{Pr}_{mN} \right]. \qquad (14)$$
The group of experts provides their linguistic judgments for weighting information by using a set of linguistic labels $L^{II} = \{L^{II}_0, L^{II}_1, \ldots, L^{II}_G\}$ with an additive linguistic preference relation. The linguistic rating for the importance weight of criterion $C_n$ provided by expert $E_k$ is denoted $y^k_n \in L^{II}$. Based on the interpretation of the underlying semantic overlapping of linguistic labels in Section 3.1, we can obtain a probability distribution $p^k_{w_n}$ on $L^{II}$ for the linguistic rating $y^k_n$ such that $p^k_{w_n} = [p_{w_n}(L^{II}_0|y^k_n), p_{w_n}(L^{II}_1|y^k_n), \ldots, p_{w_n}(L^{II}_G|y^k_n)]$, where $k = 1, 2, \ldots, K$ and $n = 1, 2, \ldots, N$. With the weighting vector $\omega = (\omega_1, \omega_2, \ldots, \omega_K)$ of the group of experts, we can obtain a collective probability distribution on $L^{II}$ for criterion $C_n$ such that

$$p_{w_n}(L^{II}_g) = \sum_{k=1}^{K} \omega_k \cdot p_{w_n}(L^{II}_g|y^k_n), \quad g = 0, 1, \ldots, G. \qquad (15)$$
Such a distribution is referred to as the weight profile of criterion $C_n$, denoted $p_{w_n}$. Consequently, we obtain $N$ probability distributions $p_{w_n}$ $(n = 1, 2, \ldots, N)$ on $L^{II}$ for the criteria set $C$. In order to derive the weight of each criterion, we proceed as follows. First, we define the probability that the profile of criterion $C_n$ is equivalent to that of criterion $C_l$ using the following function:

$$\Pr(C_n = C_l) = \sum_{g=0}^{G} p_{w_n}(L^{II}_g) \times p_{w_l}(L^{II}_g), \qquad (16)$$
where $n, l = 1, 2, \ldots, N$. Second, the probability that the profile of criterion $C_n$ is greater than that of $C_l$ is defined as

$$P_{nl} = \Pr(C_n > C_l) = \Pr(C_n \geq C_l) - 0.5 \Pr(C_n = C_l), \qquad (17)$$

where $n, l = 1, 2, \ldots, N$, and $\Pr(C_n \geq C_l)$ can be derived from the satisfactory-oriented choice function in Section 3.3. According to Eqs. (16)-(17), we have $P_{nl} + P_{ln} = 1$. Such a function satisfies the properties of additive fuzzy preference relations:

– When $P_{nl} = 0.5$, no difference exists between $C_n$ and $C_l$.
– When $P_{nl} = 1$, $C_n$ is absolutely better than $C_l$.
– When $P_{nl} > 0.5$, $C_n$ is better than $C_l$.

The preference matrix in AHP is generally assumed to be additively reciprocal. Consequently, we can build a fuzzy preference matrix $P$ by means of Eqs. (16)-(17) such that

$$P = \begin{pmatrix} 0.5 & P_{12} & \ldots & P_{1N} \\ P_{21} & 0.5 & \ldots & P_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ P_{N1} & P_{N2} & \ldots & 0.5 \end{pmatrix} \qquad (18)$$

Using the fuzzy preference relation matrix $P$, we can derive the importance weighting vector $W = (w_1, w_2, \ldots, w_N)$ for the criteria by the arithmetic averaging method such that

$$w_n = \frac{1}{N} \sum_{l=1}^{N} \frac{P_{nl}}{\sum_{m=1}^{N} P_{ml}}, \quad n = 1, 2, \ldots, N. \qquad (19)$$
Using the derived weight vector $W = (w_1, w_2, \ldots, w_N)$ and the individual probabilities in Eq. (14), we are now able to obtain the global value of energy supply $A_m$ as follows: $V(A_m) = \sum_{n=1}^{N} w_n \times \mathrm{Pr}_{mn}$, $m = 1, 2, \ldots, M$.
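The following sketch assembles the preference matrix of Eq. (18) from weight profiles and derives the weights via Eq. (19). As an assumption of the sketch, $\Pr(C_n \geq C_l)$ is computed directly from the two label distributions (so that $P_{nl} + P_{ln} = 1$ holds), rather than through a separately specified target; the three input profiles are invented.

```python
def weights_from_profiles(profiles):
    """Criteria weights from weight profiles via Eqs. (16)-(19)."""
    N = len(profiles)

    def p_geq(pn, pl):                    # Pr(Cn >= Cl), assumed form
        return sum(pn[g] * pl[l] for g in range(len(pn))
                   for l in range(len(pl)) if g >= l)

    def p_eq(pn, pl):                     # Eq. (16)
        return sum(a * b for a, b in zip(pn, pl))

    # Matrix (18): 0.5 on the diagonal, Eq. (17) elsewhere.
    P = [[0.5 if n == l else
          p_geq(profiles[n], profiles[l]) - 0.5 * p_eq(profiles[n], profiles[l])
          for l in range(N)] for n in range(N)]
    col = [sum(P[m][l] for m in range(N)) for l in range(N)]
    # Eq. (19): arithmetic averaging of normalized columns.
    return [sum(P[n][l] / col[l] for l in range(N)) / N for n in range(N)]

w = weights_from_profiles([[0.0, 0.2, 0.8], [0.5, 0.5, 0.0], [0.1, 0.6, 0.3]])
print([round(x, 3) for x in w], round(sum(w), 6))   # weights sum to 1
```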
5 An Illustrative Example
In this section, we borrow an example from the literature to illustrate the advantages and effectiveness of our model.
5.1 Problem Descriptions: Alternative-Fuel Buses Selection
Vahdani et al. [10] study the problem of alternative-fuel bus selection. In their research, twelve energy technologies (fuel modes) are considered: $A = \{A_1$: Conventional diesel engine, $A_2$: Compressed natural gas, $A_3$: Liquid propane gas, $A_4$: Fuel cell, $A_5$: Methanol, $A_6$: Electric vehicle opportunity charging, $A_7$: Direct electric charging, $A_8$: Electric bus with exchangeable batteries, $A_9$: Hybrid electric bus with gasoline engine, $A_{10}$: Hybrid electric bus with diesel engine, $A_{11}$: Hybrid electric bus with compressed natural gas engine, $A_{12}$: Hybrid electric bus with liquid propane gas engine$\}$. They also investigate four aspects of evaluation criteria (social, economic, technological, and transportation) and establish 11 evaluation criteria: $C = \{C_1$: Energy supply, $C_2$: Energy efficiency, $C_3$: Air pollution, $C_4$: Noise pollution, $C_5$: Industrial relationship, $C_6$: Implementation cost, $C_7$: Maintenance cost, $C_8$: Vehicle capability, $C_9$: Road facility, $C_{10}$: Speed of traffic flow, $C_{11}$: Sense of comfort$\}$. A group of three experts $E = \{E_1, E_2, E_3\}$ with weighting vector $\omega = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$ is chosen to assess each alternative-fuel mode and to rate the importance weights of the different criteria. The linguistic assessments of the performance values of the 12 energy alternatives on the 11 criteria by the three experts, using the linguistic label set $L^I$ in Example 1, can be found in [10] [pp. 1405-1406]. In addition, the linguistic assessments of the criteria weights via the linguistic label set $L^{II}$ in Example 2 are:

      C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11
E1    H   VH  L   L   H   L   L   VH  H   VH   VH
E2    H   VH  L   L   H   L   L   VH  H   VH   H
E3    H   H   L   L   VH  L   L   VH  VH  VH   H
5.2 Solution Based on Our Model
Now let us apply our model to solve this problem, proceeding as follows. According to Definition 6 in Section 3.1, we can derive the underlying semantic overlapping of the linguistic label set $L^I$ in Example 1. Similarly, we can also obtain the underlying semantic overlapping of the linguistic label set $L^{II}$ in Example 2. With the weight vector $\omega = (\omega_1, \omega_2, \ldots, \omega_K)$ of the group of experts, we can derive the performance profile (collective probability distribution) of alternative $A_m$ on criterion $C_n$, denoted $p_{mn}$. Similarly, we can derive the weight profile $p_{w_n}$ of criterion $C_n$.
With the weight profiles of all the criteria, we can obtain a probability matrix $P$ over the criteria $C_1, \ldots, C_{11}$ via pairwise comparison using Eqs. (16)-(17) (the numerical entries of the matrix are omitted here).
We can then obtain a weight vector $W$ for the criteria set via Eq. (19) as $W = (0.099, 0.131, 0.033, 0.033, 0.115, 0.033, 0.033, 0.146, 0.115, 0.146, 0.115)$. With a set of predefined linguistic targets $T = \{T_1, T_2, \ldots, T_N\}$ for the criteria set $C$, we can calculate the probability $\mathrm{Pr}_{mn}$ of energy alternative $A_m$ on criterion $C_n$ meeting target $T_n$. Here, as an illustration, we consider two cases:

– The policy-maker does not assign any target; therefore all the criteria have the same target $T^1$, uniformly distributed on $L^I$.
– According to the weight vector of the $N$ criteria, criteria $C_8$ and $C_{10}$ are the two most important criteria, i.e., $w_8 = w_{10} = 0.146$. Therefore, we assume that the policy-maker assigns a specific linguistic label as his target $T^2$ for criteria $C_8$ and $C_{10}$; the other criteria have the same uniform target $T^1$. We consider four such targets: $T^2_1$ = Fair (F), $T^2_2$ = Medium good (MG), $T^2_3$ = Good (G), $T^2_4$ = Very good (VG). According to the semantic overlapping of linguistic labels discussed in Section 3.1, we can derive a probability distribution for each specific label-type target.

Consequently, we can obtain the probabilities of meeting the different predefined targets for each fuel mode on each evaluation criterion. Finally, a global value for each fuel mode can be obtained. It is found that:

– If the policy-maker does not assign any target, or assigns Good (G) as his target for criteria $C_8$ and $C_{10}$, then $A_3$: Liquid propane gas is the best fuel mode.
– If the policy-maker assigns Fair (F) or Medium good (MG) as his target for $C_8$ and $C_{10}$, then $A_{11}$: Hybrid electric bus with compressed natural gas engine is the best fuel mode.
– If the policy-maker assigns Very good (VG) as his target for $C_8$ and $C_{10}$, then $A_1$: Conventional diesel engine is the best fuel mode.
6 Concluding Remarks
This paper proposed a linguistic energy planning model. Essentially, a probabilistic approach was first proposed to derive the underlying semantic overlapping
of linguistic labels from their associated fuzzy membership functions. Second, a satisfactory-oriented choice function was proposed to incorporate the policy-maker's preference information. Third, our model was extended to the multicriteria case with linguistic importance information. An alternative-fuel bus selection problem was borrowed from the literature to show the effectiveness and advantages of our model. The main advantages of our model are its ability to compute solely with words while capturing the underlying semantic overlapping, and its incorporation of the real decision-maker's preference.

Acknowledgements. This study was supported by the National Natural Science Foundation of China (71101050, 70901026) and the Program for New Century Excellent Talents in University (NCET-09-0345).
References
1. Cavallaro, F.: Fuzzy TOPSIS approach for assessing thermal-energy storage in concentrated solar power (CSP) systems. Appl. Energ. 87(2), 496–503 (2010)
2. Heo, E., Kim, J., Boo, K.J.: Analysis of the assessment factors for renewable energy dissemination program evaluation using fuzzy AHP. Renew. Sustain. Energ. Rev. 14(8), 2214–2220 (2010)
3. Herrera, F., Martínez, L.: A 2-tuple fuzzy linguistic representation model for computing with words. IEEE Trans. Fuzzy Syst. 8(6), 746–752 (2000)
4. Hiremath, R., Shikha, S., Ravindranath, N.: Decentralized energy planning; modeling and application–a review. Renew. Sustain. Energ. Rev. 11(5), 729–752 (2007)
5. Lawry, J.: Appropriateness measures: An uncertainty model for vague concepts. Synthese 161(2), 255–269 (2008)
6. Løken, E.: Use of multicriteria decision analysis methods for energy planning problems. Renew. Sustain. Energ. Rev. 11(7), 1584–1595 (2007)
7. Poh, K.L., Ang, B.W.: Transportation fuels and policy for Singapore: an AHP planning approach. Comput. Ind. Eng. 37(3), 507–525 (1999)
8. Shanteau, J.: What does it mean when experts disagree? In: Salas, E., Klein, G.A. (eds.) Linking Expertise and Naturalistic Decision Making, pp. 229–244. Psychology Press, USA (2001)
9. Tang, Y., Lawry, J.: Linguistic modelling and information coarsening based on prototype theory and label semantics. Int. J. Approx. Reason. 50(8), 1177–1198 (2009)
10. Vahdani, B., Zandieh, M., Tavakkoli-Moghaddam, R.: Two novel FMCDM methods for alternative-fuel buses selection. Appl. Math. Model. 35(3), 1396–1412 (2011)
11. Venetsanos, K., Angelopoulou, P., Tsoutsos, T.: Renewable energy sources project appraisal under uncertainty: the case of wind energy exploitation within a changing energy market environment. Energ. Pol. 30(4), 293–307 (2002)
12. Wang, B., Kocaoglu, D.F., Daim, T.U., Yang, J.: A decision model for energy resource selection in China. Energ. Pol. 38(11), 7130–7141 (2010)
13. Zadeh, L.A.: Fuzzy sets. Inform. Contr. 8(3), 338–353 (1965)
14. Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning–Part I. Inform. Sci. 8(3), 199–249 (1975)
15. Zadeh, L.A.: From computing with numbers to computing with words–From manipulation of measurements to manipulation of perceptions. IEEE Trans. Circ. Syst. Fund. Theor. Appl. 46(1), 105–119 (1999)
Bipolar Semantic Cells: An Interval Model for Linguistic Labels

Yongchuan Tang¹ (corresponding author) and Jonathan Lawry²

¹ College of Computer Science, Zhejiang University, Hangzhou, 310027, P.R. China
² Department of Engineering Mathematics, University of Bristol, Bristol, BS8 1TR, UK
[email protected], [email protected]
Abstract. An interval model for linguistic labels is proposed by introducing bipolar semantic cells for concept representation. According to this model, the degree to which each element is a positive case of a given linguistic expression is an interval value. Fundamental to our approach is that there is an uncertain border area associated with linguistic labels. This is modeled by assuming that there are two uncertain boundaries for each linguistic label, resulting in a bipolar semantic cell for concept representation. The calculus of lower and upper neighborhood functions of linguistic expressions is developed and investigated. This then provides a framework for modelling vague concepts in uncertain reasoning.

Keywords: Prototype theory, Label semantics, Vagueness, Bipolarity, Interval fuzzy sets.
1 Introduction

In human communication, linguistic labels and expressions provide a flexible and effective mechanism for communicating information. The use of linguistic labels and expressions permits us to express granular knowledge efficiently at a suitable level of detail, while being robust to small changes in attribute measurements and noisy data. This paper presents an interval-valued formalization of linguistic labels. The underlying philosophy of our approach is that the boundary of the extension of a linguistic label may not be a single borderline but rather a border area. This philosophical viewpoint is similar to the epistemic stance on vagueness [4], according to which there is an uncertain but crisp division between those labels which are, and those which are not, appropriate to describe a given element. The proposed bipolar model also assumes that there is a set of prototypes which is certainly described by the linguistic label; the latter assumption is the basic viewpoint of the prototype theory proposed by Rosch [8,9]. Based on these assumptions, the bipolar model proposed in this paper uses a transparent cognitive structure, referred to as a bipolar semantic cell, to represent a linguistic label. Intuitively speaking, a linguistic label $L$ on a domain $\Omega$ can be expressed as "about $P$", "similar to $P$" or "close to $P$", where $P \subseteq \Omega$ is a set of prototypes of $L$. We use the term bimembrane of $L$ to refer to two uncertain boundaries, corresponding to
a pair of distance thresholds $(\epsilon_1, \epsilon_2)$ to the prototypes $P$, together with an associated probability density function $\delta$ on $[0, +\infty)^2$. From this bipolar semantic cell we can define two functions, the lower and upper neighborhood functions, to quantify the appropriateness of this linguistic label for describing elements of $\Omega$. According to the bipolar semantic cell model, the appropriateness of a linguistic label $L$ with prototypes $P$ is an interval value. Hence, it is an extension of work by Lawry and Tang [6,7], where the appropriateness is a single value generated by one uncertain boundary. In this paper we will also show that the proposed bipolar semantic cells are different from the intuitionistic fuzzy sets proposed by Atanassov [1]. However, as shown in the recent work by Lawry [5], it is also possible to develop the same calculus as intuitionistic fuzzy set theory from prototype theory and random set theory. In the sequel we explore the use of bipolar semantic cells to model linguistic labels, introduce lower and upper neighborhood functions of compound expressions, and discuss the semantic bipolarity of linguistic expressions.
2 Bipolar Semantic Cells

Let $\Omega$ denote the underlying universe of discourse and $LA = \{L_1, \ldots, L_n\}$ be a finite set of linguistic labels for describing elements of $\Omega$. For each label $L_i$ there is a set of prototypical elements $P_i \subseteq \Omega$, such that $L_i$ is certainly appropriate to describe any element of $P_i$. In other words, the linguistic label $L_i$ can be considered as a word with meaning "about $P_i$", "similar to $P_i$" or "close to $P_i$", where $P_i \subseteq \Omega$. In the following we first introduce a cognitive structure for the semantic representation of the vague concept $L_i$. This structure is referred to as a bipolar semantic cell and has three components: a prototype set $P_i \subseteq \Omega$ of $L_i$, a distance function $d_i$ defined on $\Omega$, and a density function $\delta_i$ defined on $[0, +\infty)^2$.

Definition 1 (Bipolar Semantic Cell). A bipolar semantic cell is a structural and semantic unit of vague concepts. It is the smallest unit of a vague concept, and is the building block of concept representation. More formally, a bipolar semantic cell $L_i$ is a triple $\langle P_i, d_i, \delta_i \rangle$, where $P_i \subseteq \Omega$ is a set of prototypes of $L_i$; $d_i$ is a distance metric on $\Omega$ such that $d_i(x, y) \geq d_i(x, x) = 0$ and $d_i(x, y) = d_i(y, x)$ for all $x, y \in \Omega$; and $\delta_i$ is a probability density function on $[0, +\infty)^2$ such that $\delta_i([0, +\infty)^2) = 1$ and $\delta_i(\epsilon_1, \epsilon_2) \geq 0$ for any $\epsilon_1, \epsilon_2 \geq 0$, which represents the joint probability distribution of the neighborhood sizes of $L_i$.

Intuitively speaking, a bipolar semantic cell $L_i$ is composed of a semantic nucleus and a semantic bimembrane. The semantic nucleus represents the set of prototypes of $L_i$. The semantic bimembrane represents two uncertain boundaries of $L_i$, where the uncertainty is modelled by a 2-dimensional probability density function on two distance thresholds to $P_i$. Here we assume that $d_i(x, P_i) = \inf_{y \in P_i} d_i(x, y)$ for all $x \in \Omega$. According to this definition, we can see that the bipolar semantic cell $\langle P_i, d_i, \delta_i \rangle$ provides an effective model of a concept $L_i$ with prototypes $P_i$. In this structure, $P_i$ is a prototype set without uncertainty, while the uncertainty is associated with the unknown
neighborhood boundaries of $L_i$. Intuitively speaking, the semantic nucleus corresponds to the prototype set $P_i$ and the semantic bimembrane corresponds to the resilient constraint "about". An interesting component is the semantic bimembrane, which consists of two crisp but uncertain boundaries of the semantic cell. This means that the distance thresholds $\epsilon_1 \ (\geq 0)$ and $\epsilon_2 \ (\geq 0)$ from the two corresponding boundaries to the prototype set $P_i$ are two random variables with a joint density function $\delta_i$ on $[0, +\infty)^2$. See, for example, the illustration of a bipolar semantic cell on a 2-dimensional domain in Figure 1.
Fig. 1. The bipolar semantic cell $L_i = \langle P_i, d, \delta \rangle$ on a 2-dimensional domain of discourse
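A minimal data structure for Definition 1 might look as follows; the one-dimensional domain, Euclidean distance and exponential density are assumptions made here only to keep the sketch self-contained.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence

# A direct transcription of Definition 1 for a one-dimensional domain.
@dataclass
class BipolarSemanticCell:
    prototypes: Sequence[float]                 # P_i, the semantic nucleus
    delta: Callable[[float, float], float]      # joint density on [0, inf)^2

    def distance(self, x: float) -> float:
        # d_i(x, P_i) = inf_{y in P_i} d_i(x, y), with Euclidean d_i
        return min(abs(x - y) for y in self.prototypes)

# Label "about 0" with independent exponential thresholds (an assumption
# chosen only so that the density has a simple closed form).
cell = BipolarSemanticCell([0.0], lambda e1, e2: math.exp(-e1 - e2))
print(cell.distance(0.7))                       # 0.7
```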
In this paper we assume that all linguistic labels $L_i \in LA$ share the same density function $\delta$ on $[0, +\infty)^2$. For each linguistic label $L_i$, the lower and upper extensions of $L_i$ are defined to be those elements of $\Omega$ with distance from $P_i$ less than or equal to a lower and an upper threshold value, respectively. Formally speaking, the lower extension of $L_i$ is taken to be $\{x \in \Omega : d_i(x, P_i) \leq \underline{\epsilon}\}$ and the upper extension of $L_i$ is taken to be $\{x \in \Omega : d_i(x, P_i) \leq \overline{\epsilon}\}$, where $\underline{\epsilon} \leq \overline{\epsilon}$. In the following, one way of defining the lower and upper extensions of $L_i$ is given.

Definition 2 ($(\epsilon_1, \epsilon_2)$-Lower and Upper Neighborhoods). For any $\epsilon_1, \epsilon_2 \geq 0$, the $(\epsilon_1, \epsilon_2)$-lower and upper neighborhoods $\underline{N}_{L_i}^{(\epsilon_1,\epsilon_2)}$ and $\overline{N}_{L_i}^{(\epsilon_1,\epsilon_2)}$ are respectively defined as follows:

$$\underline{N}_{L_i}^{(\epsilon_1,\epsilon_2)} = \{x : d_i(x, P_i) \leq \min\{\epsilon_1, \epsilon_2\}\} \qquad (1)$$

$$\overline{N}_{L_i}^{(\epsilon_1,\epsilon_2)} = \{x : d_i(x, P_i) \leq \max\{\epsilon_1, \epsilon_2\}\} \qquad (2)$$
In this definition we take the lower distance threshold to be $\underline{\epsilon} = \min\{\epsilon_1, \epsilon_2\}$ and the upper distance threshold to be $\overline{\epsilon} = \max\{\epsilon_1, \epsilon_2\}$. We will see that this kind of definition results in a non-truth-functional calculus. In the recent work by Lawry [5] another way of defining the lower and upper distance thresholds was proposed, which results in a truth-functional calculus and a new interpretation of intuitionistic fuzzy sets [1]. From the lower and upper extensions of linguistic label $L_i$ we can obtain the lower and upper neighborhood functions according to the following definition.
Definition 3 (Lower and Upper Neighborhood Functions). For any $x \in \Omega$ and $L_i \in LA$, the lower and upper neighborhood functions of $L_i$, $\underline{\mu}_{L_i}(x)$ and $\overline{\mu}_{L_i}(x)$, are defined respectively as follows:

$$\underline{\mu}_{L_i}(x) = \delta\left(\{(\epsilon_1, \epsilon_2) : x \in \underline{N}_{L_i}^{(\epsilon_1,\epsilon_2)}\}\right) \qquad (3)$$

$$\overline{\mu}_{L_i}(x) = \delta\left(\{(\epsilon_1, \epsilon_2) : x \in \overline{N}_{L_i}^{(\epsilon_1,\epsilon_2)}\}\right) \qquad (4)$$
Here $\underline{\mu}_{L_i}(x)$ quantifies the belief that the linguistic label is definitely appropriate to describe $x$, and $\overline{\mu}_{L_i}(x)$ quantifies the belief that the linguistic label is possibly appropriate to describe $x$. It is easy to show that $\underline{\mu}_{L_i}(x) \leq \overline{\mu}_{L_i}(x)$, so we can also assert that the degree to which element $x$ belongs to the extension of $L_i$ is at least $\underline{\mu}_{L_i}(x)$ and at most $\overline{\mu}_{L_i}(x)$. In other words, the membership degree of $x$ in $L_i$ is an interval value.

Proposition 4. For any $L_i \in LA$ the following hold:

$$\underline{\mu}_{L_i}(x) = \int_{d_i(x,P_i)}^{+\infty} \int_{\epsilon_1}^{+\infty} \delta(\epsilon_1, \epsilon_2)\,d\epsilon_2\,d\epsilon_1 + \int_{d_i(x,P_i)}^{+\infty} \int_{\epsilon_2}^{+\infty} \delta(\epsilon_1, \epsilon_2)\,d\epsilon_1\,d\epsilon_2$$

$$\overline{\mu}_{L_i}(x) = \underline{\mu}_{L_i}(x) + \int_{d_i(x,P_i)}^{+\infty} \int_{0}^{d_i(x,P_i)} \delta(\epsilon_1, \epsilon_2)\,d\epsilon_2\,d\epsilon_1 + \int_{d_i(x,P_i)}^{+\infty} \int_{0}^{d_i(x,P_i)} \delta(\epsilon_1, \epsilon_2)\,d\epsilon_1\,d\epsilon_2$$

If $\delta(\epsilon_1, \epsilon_2) = \delta_1(\epsilon_1)\delta_2(\epsilon_2)$ and we let $\Delta_k(\epsilon) = \delta_k([\epsilon, +\infty))$ for $k = 1, 2$, then

$$\underline{\mu}_{L_i}(x) = \int_{d_i(x,P_i)}^{+\infty} \delta_1(\epsilon_1)\Delta_2(\epsilon_1)\,d\epsilon_1 + \int_{d_i(x,P_i)}^{+\infty} \delta_2(\epsilon_2)\Delta_1(\epsilon_2)\,d\epsilon_2$$

$$\overline{\mu}_{L_i}(x) = \underline{\mu}_{L_i}(x) + \left(1 - \Delta_1(d_i(x, P_i))\right)\Delta_2(d_i(x, P_i)) + \left(1 - \Delta_2(d_i(x, P_i))\right)\Delta_1(d_i(x, P_i))$$
Example 1. Assume that $\delta(\epsilon_1, \epsilon_2) = \delta_1(\epsilon_1)\delta_2(\epsilon_2)$ and that $\delta_i(\cdot) = \delta(\cdot \mid c_i, \sigma_i)$ for $i = 1, 2$ are normalized normal density functions,

$$\delta(\epsilon \mid c_i, \sigma_i) = \frac{G(\epsilon \mid c_i, \sigma_i)}{1 - F(0 \mid c_i, \sigma_i)}, \qquad (5)$$

where $G(\epsilon \mid c_i, \sigma_i) = \frac{1}{\sqrt{2\pi}\sigma_i} \exp\left(\frac{-(\epsilon - c_i)^2}{2\sigma_i^2}\right)$ is the normal density function with mean $c_i$ and standard deviation $\sigma_i$, and $F(0 \mid c_i, \sigma_i) = \int_{-\infty}^{0} G(\epsilon \mid c_i, \sigma_i)\,d\epsilon$. Figure 2(a) shows the lower and upper neighborhood functions for $L_1 = \langle\{0\}, d, \delta\rangle$ and $L_2 = \langle\{1\}, d, \delta\rangle$, where $d$ is the Euclidean distance function and $\delta(\epsilon_1, \epsilon_2) = \delta(\epsilon_1 \mid c_1, \sigma_1)\delta(\epsilon_2 \mid c_2, \sigma_2)$ with $c_1 = 0.4$, $\sigma_1 = 0.2$, and $c_2 = 0.3$, $\sigma_2 = 0.1$. Figure 2(b) shows the lower and upper neighborhood functions for $L_1$ and $L_2$ again with $c_1 = 0.4$, $\sigma_1 = 0.3$, and $c_2 = 0.5$, $\sigma_2 = 0.2$.
Fig. 2. The lower and upper neighborhood functions of linguistic labels $L_1 = \langle\{0\}, d, \delta\rangle$ and $L_2 = \langle\{1\}, d, \delta\rangle$, where $\delta(\epsilon_1, \epsilon_2) = \delta(\epsilon_1 \mid c_1, \sigma_1)\delta(\epsilon_2 \mid c_2, \sigma_2)$. (a) The lower and upper neighborhood functions for $L_1$ and $L_2$ with $c_1 = 0.4$, $\sigma_1 = 0.2$, and $c_2 = 0.3$, $\sigma_2 = 0.1$. (b) The lower and upper neighborhood functions for $L_1$ and $L_2$ with $c_1 = 0.4$, $\sigma_1 = 0.3$, and $c_2 = 0.5$, $\sigma_2 = 0.2$.
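For the independent case of Proposition 4, the interval membership has the closed forms $\underline{\mu}(x) = \Delta_1(d)\Delta_2(d)$ (the smaller threshold covers $x$) and $\overline{\mu}(x) = \Delta_1(d) + \Delta_2(d) - \Delta_1(d)\Delta_2(d)$ (the larger threshold covers $x$), where $d = d(x, P)$. The sketch below evaluates these with exponential threshold densities, an assumption made here so that $\Delta_k$ is available in closed form (the example above uses truncated normals); it reproduces the qualitative shape of Figure 2, with both functions equal to 1 at the prototype and decaying with distance.

```python
import math

def neighborhood_functions(d, rate1=2.5, rate2=3.3):
    """Lower/upper neighborhood values at distance d from the prototypes,
    for independent Exp(rate1), Exp(rate2) thresholds."""
    D1 = math.exp(-rate1 * d)        # Delta_1(d) = P(eps_1 >= d)
    D2 = math.exp(-rate2 * d)        # Delta_2(d) = P(eps_2 >= d)
    lower = D1 * D2                  # P(d <= min{eps_1, eps_2})
    upper = D1 + D2 - D1 * D2        # P(d <= max{eps_1, eps_2})
    return lower, upper

prototype = 0.0
for x in [0.0, 0.25, 0.5, 1.0]:
    lo, up = neighborhood_functions(abs(x - prototype))
    print(f"x={x:4.2f}  mu_lower={lo:.3f}  mu_upper={up:.3f}")
```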
3 Lower and Upper Neighborhood Functions of Linguistic Expressions

We consider linguistic expressions corresponding to compound expressions generated by recursive application of the connectives $\wedge$, $\vee$ and $\neg$ to the labels in $LA$. Let $LE$ denote the set of linguistic expressions. Then $LE$ has the following definition.

Definition 5 (Linguistic Expressions). The set of linguistic expressions, $LE$, is defined recursively as follows: 1. $LA \subseteq LE$; 2. If $\theta \in LE$ and $\varphi \in LE$, then $\theta \wedge \varphi \in LE$, $\theta \vee \varphi \in LE$, and $\neg\theta \in LE$.

In this definition $\theta \wedge \varphi$ means "$\theta$ and $\varphi$", $\theta \vee \varphi$ means "$\theta$ or $\varphi$", and $\neg\theta$ means "not $\theta$". For example, $\neg L_i$ may mean "dissimilar to $P_i$". $LE$ is actually a $T$-free algebra, where $T = \{\wedge, \vee, \neg\}$. The lower and upper neighborhoods of linguistic expressions are then defined recursively as follows.

Definition 6 (Lower and Upper Neighborhoods). For any $\theta \in LE$ and any $\epsilon_1, \epsilon_2 \geq 0$, the $(\epsilon_1, \epsilon_2)$-lower and upper neighborhoods $\underline{N}_\theta^{(\epsilon_1,\epsilon_2)}$ and $\overline{N}_\theta^{(\epsilon_1,\epsilon_2)}$ are defined respectively in the following manner:

$$\underline{N}_\theta^{(\epsilon_1,\epsilon_2)} = \underline{N}_{L_i}^{(\epsilon_1,\epsilon_2)} \quad \text{if } \theta = L_i \qquad (6)$$
$$\overline{N}_\theta^{(\epsilon_1,\epsilon_2)} = \overline{N}_{L_i}^{(\epsilon_1,\epsilon_2)} \quad \text{if } \theta = L_i \qquad (7)$$
$$\underline{N}_\theta^{(\epsilon_1,\epsilon_2)} = \underline{N}_\phi^{(\epsilon_1,\epsilon_2)} \cap \underline{N}_\varphi^{(\epsilon_1,\epsilon_2)} \quad \text{if } \theta = \phi \wedge \varphi \qquad (8)$$
$$\overline{N}_\theta^{(\epsilon_1,\epsilon_2)} = \overline{N}_\phi^{(\epsilon_1,\epsilon_2)} \cap \overline{N}_\varphi^{(\epsilon_1,\epsilon_2)} \quad \text{if } \theta = \phi \wedge \varphi \qquad (9)$$
$$\underline{N}_\theta^{(\epsilon_1,\epsilon_2)} = \underline{N}_\phi^{(\epsilon_1,\epsilon_2)} \cup \underline{N}_\varphi^{(\epsilon_1,\epsilon_2)} \quad \text{if } \theta = \phi \vee \varphi \qquad (10)$$
$$\overline{N}_\theta^{(\epsilon_1,\epsilon_2)} = \overline{N}_\phi^{(\epsilon_1,\epsilon_2)} \cup \overline{N}_\varphi^{(\epsilon_1,\epsilon_2)} \quad \text{if } \theta = \phi \vee \varphi \qquad (11)$$
$$\underline{N}_\theta^{(\epsilon_1,\epsilon_2)} = \left(\overline{N}_\phi^{(\epsilon_1,\epsilon_2)}\right)^c \quad \text{if } \theta = \neg\phi \qquad (12)$$
$$\overline{N}_\theta^{(\epsilon_1,\epsilon_2)} = \left(\underline{N}_\phi^{(\epsilon_1,\epsilon_2)}\right)^c \quad \text{if } \theta = \neg\phi \qquad (13)$$

Since the distance threshold pair $(\epsilon_1, \epsilon_2)$ has a density function $\delta$ on $[0, +\infty)^2$, the lower and upper neighborhood functions of an expression $\theta \in LE$ for an element $x \in \Omega$ are then given by the probability of a value of $(\epsilon_1, \epsilon_2)$ such that $x \in \underline{N}_\theta^{(\epsilon_1,\epsilon_2)}$ and the probability of a pair $(\epsilon_1, \epsilon_2)$ such that $x \in \overline{N}_\theta^{(\epsilon_1,\epsilon_2)}$.
Nθ 1
(1 ,2 )
⊆ Nθ
(16)
Proof. Let LE 1 = LA and LE m = LE m−1 ∪ {φ ∧ ϕ, φ ∨ ϕ, ¬φ : φ, ϕ ∈ LE m−1 }, m then LE = ∪∞ m=1 LE . We now carry out induction on m. If θ = Li then it holds (1 ,2 ) since N Li = {x : di (x, Pi ) ≤ min{1 , 2 }} ⊆ {x : di (x, Pi ) ≤ max{1 , 2 }} = (1 ,2 )
N Li . Assume that for any θ ∈ LE m it holds, then for any θ ∈ LE m+1 either θ ∈ m LE , in which case the result holds trivially, or θ ∈ {φ ∧ ϕ, φ ∨ ϕ, ¬φ : φ, ϕ ∈ LE m } for which one of the following holds: ( ,2 )
1. For θ = φ ∧ ϕ where φ, ϕ ∈ LE m N θ 1 (1 ,2 )
Nφ
2. 3.
(1 ,2 )
∩ Nϕ
(1 ,2 )
= N φ∧ϕ
(1 ,2 )
= Nθ
( , )
( ,2 )
1 2 = N φ∧ϕ = Nφ 1
1 ,2 ) ∩ N ( ⊆ ϕ
.
( , ) (1 ,2 ) ( , ) 1 ,2 ) For θ = φ ∨ ϕ where φ, ϕ ∈ LE N θ 1 2 = N φ∨ϕ = N φ 1 2 ∪ N ( ϕ (1 ,2 ) (1 ,2 ) (1 ,2 ) (1 ,2 ) Nφ ∪ Nϕ = N φ∨ϕ = N θ . ( , ) c c 1 2 (1 ,2 ) ( , ) ( , ) m For θ = ¬φ where φ ∈ LE N θ = N ¬φ1 2 = N φ ⊆ Nφ 1 2 (1 ,2 ) (1 ,2 ) N ¬φ = Nθ . m
⊆ =
66
Y. Tang and J. Lawry
Theorem 9. For any θ ∈ LE the following holds: μθ (x) ≤ μθ (x)
(17) ( ,2 )
Actually for any linguistic expression θ the lower neighborhood N θ 1 neighborhood
(1 ,2 ) Nφ
and upper
are the random sets taking values as subsets of Ω, and μθ and (1 ,2 )
( , )
μθ are the single point coverage functions of N θ 1 2 and N θ respectively. Hence, μθ (x) and μθ (x) can be considered as membership values of x in the lower and upper extensions of θ respectively. In other words, the membership value of x in the extension of θ is an interval value [μθ (x), μθ (x)]. Theorem 10. For any θ ∈ LE the following formulas hold: μ¬θ (x) = 1 − μθ (x)
(18)
μ¬θ (x) = 1 − μθ (x)
(19)
Proof. For any θ ∈ LE the following hold: ( , ) c 1 2 ( , ) μ¬θ (x) = δ (1 , 2 ) : x ∈ N ¬θ1 2 = δ (1 , 2 ) : x ∈ N θ (1 ,2 ) = 1 − δ (1 , 2 ) : x ∈ N θ = 1 − μθ (x) c (1 ,2 ) ( , ) (1 , 2 ) : x ∈ N ¬θ = δ (1 , 2 ) : x ∈ N θ 1 2 ( , ) = 1 − δ (1 , 2 ) : x ∈ N θ 1 2 = 1 − μθ (x)
μ¬θ (x) = δ
These two formulas describe the relationship between the lower and upper neighborhood functions. Furthermore, they are related to each other in a bipolar manner as described in [2] such that a linguistic expression θ is definitely appropriate to describe elements x if and only if ¬θ is not possibly appropriate to describe x. Lemma 11. For any θ ∈ LE and 1 , 2 ≥ 0 the following hold: ( , )
1 2 N θ∧¬θ =∅
(20)
(1 ,2 )
N θ∨¬θ = Ω ( , )
( ,2 )
1 2 Proof. For any θ ∈ LE, N θ∧¬θ = Nθ 1
(21) ( ,2 )
∩ N ¬θ1
(1 ,2 ) ( , ) since N θ 1 2 ⊆ N θ according to lemma 8. And ( , ) ( , ) ( , ) ( , ) (N ¬θ1 2 )c ∪ (N θ 1 2 )c = (N ¬θ1 2 ∩ N θ 1 2 )c =
( ,2 )
= Nθ 1
(1 ,2 ) c
∩ (N θ
(1 ,2 ) (1 ,2 ) N θ∨¬θ = N θ (1 ,2 ) c (N θ∧¬θ ) = Ω.
) =∅
(1 ,2 ) ∪ N ¬θ
=
Theorem 12. For any θ ∈ LE the following hold: μθ∧¬θ (x) = 0
(22)
μθ∨¬θ (x) = 1
(23)
Bipolar Semantic Cells: An Interval Model for Linguistic Labels
67
This theorem shows that the linguistic expression θ ∧ ¬θ is not definitely appropriate to describe any element x, and the linguistic expression θ ∨ ¬θ is possibly appropriate to describe any element x. Lemma 13. For any φ, ϕ ∈ LE, and any 1 , 2 ≥ 0, the following hold: ( , )
( ,2 )
1 2 – N ¬(¬φ) = Nφ 1
– –
(1 ,2 ) N ¬(φ∧ϕ) (1 ,2 ) N ¬(φ∨ϕ)
= =
(1 ,2 )
(1 ,2 )
and N ¬(¬φ) = N φ
(1 ,2 ) N ¬φ∨¬ϕ (1 ,2 ) N ¬φ∧¬ϕ
and and
(1 ,2 ) N ¬(φ∧ϕ) (1 ,2 ) N ¬(φ∨ϕ)
= =
.
(1 ,2 ) N ¬φ∨¬ϕ . (1 ,2 ) N ¬φ∧¬ϕ .
Proof. For any φ, ϕ ∈ LE, and any 1 , 2 ≥ 0, we have ( , ) c c c (1 ,2 ) 1 2 (1 ,2 ) ( , ) ( , ) – N ¬(¬φ) = N ¬φ = Nφ 1 2 = N φ 1 2 . Similarly N ¬(¬φ) = c ( , ) c c (1 ,2 ) 1 2 ( , ) N ¬φ1 2 = Nφ = Nφ . ( , ) c ( , ) c ( , ) c ( , ) c ( 1 2 1 2 1 ,2 ) 1 2 1 2 (1 ,2 ) – N ¬(φ∧ϕ) = N φ∧ϕ = Nφ ∩ Nϕ = Nφ ∪ Nϕ = ( ,2 )
( , )
( ,2 )
( , )
1 2 1 ,2 ) ∪ N ( = N ¬φ∨¬ϕ . ¬ϕ c c c (1 ,2 ) (1 ,2 ) ( , ) ( , ) 1 ,2 ) Similarly N ¬(φ∧ϕ) = N φ∧ϕ = N φ 1 2 ∩ N ( = Nφ 1 2 ∪ ϕ c (1 ,2 ) (1 ,2 ) (1 ,2 ) (1 ,2 ) Nϕ = N ¬φ ∪ N ¬ϕ = N ¬φ∨¬ϕ . ( , ) c ( , ) ( , ) c ( , ) c (1 ,2 ) c 1 2 1 2 1 2 1 2 (1 ,2 ) – N ¬(φ∨ϕ) = N φ∨ϕ = Nφ ∪ Nϕ = Nφ ∩ Nϕ =
N ¬φ1
1 2 1 ,2 ) ∩ N ( = N ¬φ∧¬ϕ . ¬ϕ c c c (1 ,2 ) (1 ,2 ) ( , ) ( , ) 1 ,2 ) Similarly N ¬(φ∨ϕ) = N φ∨ϕ = N φ 1 2 ∪ N ( = Nφ 1 2 ∩ ϕ c (1 ,2 ) (1 ,2 ) (1 ,2 ) 1 ,2 ) N ( = N ¬φ ∩ N ¬ϕ = N ¬φ∧¬ϕ . ϕ
N ¬φ1
Theorem 14. For any φ, ϕ ∈ LE the following hold: μ¬(¬φ) (x) = μφ (x)
(24)
μ¬(¬φ) (x) = μφ (x)
(25)
μ¬(φ∧ϕ)) (x) = μ¬φ∨¬ϕ (x)
(26)
μ¬(φ∧ϕ)) (x) = μ¬φ∨¬ϕ (x)
(27)
μ¬(φ∨ϕ)) (x) = μ¬φ∧¬ϕ (x)
(28)
μ¬(φ∨ϕ)) (x) = μ¬φ∧¬ϕ (x)
(29)
Theorem 15. For any θ ∈ LE the following hold: μφ∨ϕ (x) = μφ (x) + μϕ (x) − μφ∧ϕ (x)
(30)
μφ∨ϕ (x) = μφ (x) + μϕ (x) − μφ∧ϕ (x)
(31)
68
Y. Tang and J. Lawry
Proof. For any θ ∈ LE we have ( , )
( ,2 )
1 2 μφ∨ϕ (x) = δ({(1 , 2 ) : x ∈ N φ∨ϕ }) = δ({(1 , 2 ) : x ∈ N φ 1
( ,2 )
= δ({(1 , 2 ) : x ∈ N φ 1
( ,2 )
δ({(1 , 2 ) : x ∈ N φ 1
1 ,2 ) ∪ N ( }) ϕ
1 ,2 ) }) + δ({(1 , 2 ) : x ∈ N ( }) − ϕ 1 ,2 ) ∩ N ( }) ϕ
= μφ (x) + μϕ (x) − μφ∧ϕ (x) (1 ,2 )
(1 ,2 )
(1 ,2 ) Nφ })
+ δ({(1 , 2 ) : x ∈
(1 ,2 ) Nϕ })
(1 ,2 ) Nφ
(1 ,2 ) Nϕ })
μφ∨ϕ (x) = δ({(1 , 2 ) : x ∈ N φ∨ϕ }) = δ({(1 , 2 ) : x ∈ N φ = δ({(1 , 2 ) : x ∈ δ({(1 , 2 ) : x ∈
∩
(1 ,2 )
∪ Nϕ
})
−
= μφ (x) + μϕ (x) − μφ∧ϕ (x) Notice that in general we cannot expect that μφ∨ϕ (x) = max{μφ (x), μϕ (x)} and μφ∨ϕ (x) = max{μφ (x), μϕ (x)}. The calculus of lower and upper neighborhood functions is essentially non-truth-functional. Corollary 16. For any θ ∈ LE the following hold: μθ∨¬θ (x) = μθ (x) + μ¬θ (x)
(32)
μθ∧¬θ (x) = μθ (x) − μθ (x)
(33)
Formula (32) means that the degree of linguistic expression θ ∨ ¬θ being definitely appropriate to describe element x is exactly the summation of degrees of linguistic expressions θ and ¬θ being definitely appropriate to describe element x. Formula (33) means that the difference between the upper and lower neighborhood functions is exactly the degree of linguistic expression θ ∧ ¬θ being possibly appropriate to describe the underlying element. We could not expect μθ∧¬θ (x) to be zero in general, since the border area of θ may be non-null. In the following we consider other concise representations of sets {(1 , 2 ) : x ∈ (1 ,2 ) (1 ,2 ) Nθ } and {(1 , 2 ) : x ∈ N θ } determined by linguistic expression θ and element x. Definition 17. For any θ ∈ LE I(x, θ) ⊆ [0, +∞)2 and I(x, θ) ⊆ [0, +∞)2 are defined recursively as follows: I(x, θ) = {(1 , 2 ) : di (x, Pi ) ≤ min{1 , 2 }} if θ = Li
(34)
I(x, θ) = {(1 , 2 ) : di (x, Pi ) ≤ max{1 , 2 }} if θ = Li
(35)
I(x, θ) = I(x, φ) ∩ I(x, ϕ) if θ = φ ∧ ϕ
(36)
Bipolar Semantic Cells: An Interval Model for Linguistic Labels
I(x, θ) = I(x, φ) ∩ I(x, ϕ) if θ = φ ∧ ϕ
(37)
I(x, θ) = I(x, φ) ∪ I(x, ϕ) if θ = φ ∨ ϕ
(38)
I(x, θ) = I(x, φ) ∪ I(x, ϕ) if θ = φ ∨ ϕ
(39)
I(x, θ) = (I(x, φ))c θ = ¬φ
(40)
I(x, θ) = (I(x, φ))c θ = ¬φ
(41)
( ,2 )
It is very easy to show that I(θ, x) = {(1 , 2 ) : x ∈ N θ 1 x∈
(1 ,2 ) Nθ },
69
} and I(θ, x) = {(1 , 2 ) :
which then immediately result in the following theorem.
Theorem 18. For any θ ∈ LE the following hold: μθ (x) = δ(I(x, θ))
(42)
μθ (x) = δ(I(x, θ))
(43)
Let LE ∧,∨ be the set of linguistic expressions generated by recursively applying connectives ∧ and ∨ to linguistic labels in LA. Then the lower and upper neighborhood functions of any θ ∈ LE ∧,∨ have relatively simple forms. Definition 19. For any θ ∈ LE ∧,∨ the real number lb(θ) is defined recursively as follows: lb(θ) = di (x, Pi ) if θ = Li
(44)
lb(θ) = max(lb(φ), lb(ϕ)) if θ = φ ∧ ϕ
(45)
lb(θ) = min(lb(φ), lb(ϕ)) if θ = φ ∨ ϕ
(46)
Theorem 20. For any θ ∈ LE ∧,∨ the following hold: I(x, θ) = {(1 , 2 ) : lb(θ) ≤ min{1 , 2 }}
(47)
I(x, θ) = {(1 , 2 ) : lb(θ) ≤ max{1 , 2 }}
(48)
∧,∨ ∧,∨ ∧,∨ Proof. Let LE0∧,∨ = LA, LEm+1 = LEm ∪ {φ ∧ ϕ, φ ∨ ϕ : φ, ϕ ∈ LEm } +∞ ∧,∨ ∧,∨ for m > 0, then LE = ∪m=0 LEm . We now carry out the induction on m. If ∧,∨ θ = Li then the results hold immediately. Assume that θ ∈ LEm the results hold. ∧,∨ ∧,∨ Then for any θ ∈ LEm+1 either θ ∈ LEm , for which case the results hold trivially, ∧,∨ or θ ∈ {φ ∧ ϕ, φ ∨ ϕ : φ, ϕ ∈ LEm } for which one of the following holds:
Although the lower and upper neighborhood functions is non-truth-functional in general, they are truth-functional for the linguistic expressions in LE ∧,∨ . Hence bipolar semantic cells are partially consistent with intuitionistic fuzzy sets [1].
Bipolar Semantic Cells: An Interval Model for Linguistic Labels
71
4 Conclusions An interval model of linguistic labels is proposed based on bipolar semantic cells. The important aspect of this model is that the linguistic label and its negation share an uncertain border area. The bipolar semantic cell formalizes this idea by assuming that there are two uncertain boundaries for the linguistic label. The calculus of lower and upper neighborhood functions developed from this interval model is non-truth-functional in essence. Future possible work may include the extension of theory of label semantics [3] using this bipolar model, and bipolar rule-based reasoning and decision-making. In addition, it could be interesting to incorporate the bipolar into applications of semantic cells to machine learning and control [10]. Acknowledgment. Yongchuan Tang is funded by the National Natural Science Foundation of China (NSFC) under Grant No. 61075046 and Zhejiang Natural Science Foundation under Grant No. Y1090003.
References 1. Atanassov, K.: Intuitionistic fuzzy sets. Fuzzy Sets and Systems 20, 87–96 (1986) 2. Dubois, D., Prade, H.: An introduction to bipolar representations of information and preference. International Journal of Intelligent Systems 23, 866–877 (2008) 3. Lawry, J.: A framework for linguistic modelling. Artificial Intelligence 155, 1–39 (2004) 4. Lawry, J.: Appropriateness measures: an uncertainty model for vague concepts. Synthese 161(2), 255–269 (2008) 5. Lawry, J.: A Random Set and Prototype Theory Interpretation of Intuitionistic Fuzzy Sets. In: H¨ullermeier, E., Kruse, R., Hoffmann, F. (eds.) IPMU 2010, Part I, CCIS, vol. 80, pp. 618–628. Springer, Heidelberg (2010) 6. Lawry, J., Tang, Y.: Uncertainty modelling for vague concepts: A prototype theory approach. Artificial Intelligence 173, 1539–1558 (2009) 7. Lawry, J., Tang, Y.: Granular knowledge representation and inference using labels and expressions. IEEE Trans. Fuzzy Syst. 18(3), 500–514 (2010) 8. Rosch, E.: Natural categories. Cognitive Psychology 4(3), 328–350 (1973) 9. Rosch, E.: Cognitive representations of semantic categories. Journal of Experimental Psychology: General 104(3), 192–233 (1975) 10. Tang, Y., Lawry, J.: A prototype-based rule inference system incorporating linear functions. Fuzzy Sets and Systems 161, 2831–2853 (2010)
A Fuzzy Rule-Based Classification System Using Interval Type-2 Fuzzy Sets Min Tang1 , Xia Chen1 , Weidong Hu1 , and Wenxian Yu2 1
ATR Key Lab, National University of Defense Technology, Changsha 410073, China [email protected] 2 School of Electronic, Information and Electrical Engineering, Shanghai Jiaotong University, Shanghai 200030, China
Abstract. The design of type-2 fuzzy rule-based classification systems from labeled data is considered in this study. With the aid of interval type-2 fuzzy sets, which can effectively capture uncertainties in the data, a compact and interpretable interval type-2 fuzzy rule base with fewer rules is constructed. Corresponding type-2 fuzzy reasoning method for classification is also presented. The validity of this classification system is shown through experimental results on several data sets. Keywords: fuzzy rule-based classification system, interval type-2 fuzzy set, type-2 fuzzy rule base.
1
Introduction
Fuzzy rule-based systems[1] are a popular tool to solve pattern classification problems, which use linguistic variables and fuzzy logic to build interpretable models. The fuzzy rule-based classification systems have been widely employed to real applications such as intrusion detection, medical diagnosis, and shewhart control charts. The generation of fuzzy classification rule base is a critical problem in rule-based systems design. There are many approaches for extracting fuzzy classification rules from data, such as heuristic approach[2, 3] clustering methods [4, 5], nero-fuzzy approaches[6, 7], and genetic algorithms based schemes[8–10]. One of the main advantages of the fuzzy rule-based systems is the high interpretability of the rules. However, the inflexibility of the concept of linguistic variable imposes restrictions to the fuzzy rule structure[11], which degrades the system classification accuracy when dealing with some complex systems. For example, in the case that the classes overlap, the exact knowledge about the membership degree of some elements to the fuzzy sets that characterize the attributes defining the class is not possible. It has been shown that type-2 fuzzy sets (T2 FSs)[12] can manage the uncertainty of patterns more effectively than conventional fuzzy sets (T1 FSs) due the extra degree of freedom. This fact suggests that desirable results may be obtained by representing fuzzy attribute characterizations (of objects) with T2 FSs in rule-based classification context. Besides, with T2 FSs, the apparent paradox of modeling imprecise concepts using precise membership functions faced by T1 FSs is naturally addressed. Y. Tang, V.-N. Huynh, and J. Lawry (Eds.): IUKM 2011, LNAI 7027, pp. 72–80, 2011. c Springer-Verlag Berlin Heidelberg 2011
Type-2 Classification
73
The design of type-2 fuzzy rule-based classification system has been limitedly discussed[13, 14]. In these works, the type 2 fuzzy sets used are constrained to a special class of Gaussian-form, which can not describe the uncertainty accurately. Besides, there is also a limitation in the number of classes for the problem to be resolved. In this paper, we design a fuzzy classification system from multiclass sample data using general interval type-2 fuzzy sets. A compact type-2 fuzzy rule base with fewer rules is generated with the aid of the representing effectiveness of type-2 fuzzy sets. In particular, interval type-2 fuzzy sets are deployed due to the relative small computational complexity for operations on them. A type2 fuzzy reasoning method is proposed accordingly. Simulation results show the validity of the proposed system generation method.
2
Type-2 Fuzzy Sets and Fuzzy Classification Rules
As a generalization of T1 FSs, the T2 FSs can effectively describe uncertainties in complex uncertain situations. A secondary membership function modeling the uncertainty of the membership of exact T1 FSs is used in T2 FSs. A T2 FS A in the universal of discourse X is characterized by a type-2 fuzzy membership function µA (x) x ∈ X = A
x∈X
µA (x)/x =
x∈X
u∈Jx
fx (u)/u /x Jx ⊆ [0, 1]
(1)
Where µA (x) is the type-2 fuzzy membership grade of element x,which is represented by the secondary membership function fx (u)[12]. Jx is the primary membership representing the domain of the secondary membership function. By extending the extra degree of freedom of primary membership, the T2 FSs are useful in modeling uncertainty. But the operations with regards to the general T2 FSs, like intersection, union, and type-reduction, require undesirably large amount of computations. However, the computational complexity can be reduced drastically by dealing with interval type-2 fuzzy sets (IT2 FSs). IT2 FSs are special T2 FSs whose secondary membership functions are interval sets = A 1/u /x (2) x∈X
u∈Jx
Where Jx is a subinterval of [0,1]. In this case, the Jx x ∈ X completely deter Two T1 FSs, µ (x) = maxu∈Jx (u) and µ (x) = minu∈Jx (u) mines the IT2 FS A. A A are called the upper memx ∈ X,which together are equivalent to the IT2 FS A, bership function and lower membership function respectively. Type-2 fuzzy classification rules can be constructed accordingly by extending their type-1 counterparts, with type-1 fuzzy sets replaced by type-2 fuzzy sets. We use type-2 fuzzy rules of the following form for a M classes problem, j , Rj : If x is A
j then (r1j , · · · , rM )
(3)
74
M. Tang et al.
Where Rj is the label of the j th rule, x = (x1 , · · · , xn ) is an n-dimensional j is an antecedent fuzzy set in the n-dimensional pattern space pattern vector, A j and rq is the certainty degree for the rule j to predict the class Cq q = 1, · · · , M for a pattern belonging to the fuzzy region represented by the antecedent of the rule. The type-1 counterpart of Rj is used in[15].
3
Design of Type-2 Fuzzy Rule-Based Classification System
The fuzzy rule base consisting of fuzzy classification rules establishes a relationship between the space of the features and the space of the classes. We generate type-2 fuzzy rules by using interval type-2 fuzzy C means algorithm for the derivation of IT2 FSs from labeled patterns in this study. A fuzzy reasoning method is proposed accordingly. 3.1
Generation of Interval Type-2 Fuzzy Rules from Data
Given a set of labeled patterns, we use the interval type-2 fuzzy C means (IT2 FCM) algorithm[16] to generate the IT2 FSs for the fuzzy rules to be derived. By allowing a variation of the fuzzifier in FCM, the IT2 FCM can manage the uncertainty in data more effectively. For the given set of labeled patterns, the IT2 FCM algorithm can be implemented in a supervised manner to derive a type-2 fuzzy partition of the pattern space. Since the number of clusters for each class and corresponding initial prototypes can be estimated from the labeled patterns. Each resulting cluster of the IT2 FCM algorithm describes a type-2 fuzzy subspace contained in the region of pattern space formed by the patterns for one class. A fuzzy classification rule is generated from each resulting cluster accordingly. For example, we can partition the pattern space into type-2 fuzzy subspaces with the number of subspaces the same as the number of classes if the patterns from different classes overlap but are separable to certain extent. Then each cluster represents a class. Fig.1 gives a bidimensional example of pattern distribution for three classes, in which each class can be effectively described by a IT2 FS. When the patterns from one class are better described by more than one cluster, we can include this requirement in the IT2 FCM by setting the corresponding initial cluster prototypes for patterns. Let the Mnumber of clusters for class Ck be lk . Then the total cluster number is C = k=1 lk . The IT2 FCM is implemented after the C cluster prototypes are initialized. Suppose that the resulting C cluster prototypes of the IT2 FCM be vj j = 1, . . . C. Then C type-2 fuzzy sets in the n-dimensional pattern space can be formulated. The membership grade bounds of any pattern x for cluster j can be calculated as follows. 1 1 1 C C C 2/(m1 −1) if 2/(m1 −1) > 2/(m2 −1) k=1 (dj /dk ) k=1 (dj /dk ) k=1 (dj /dk ) µj (x) = (4) 1 otherwise C 2/(m2 −1) k=1 (dj /dk ) 1 1 1 C C C 2/(m1 −1) if 2/(m1 −1) ≤ 2/(m2 −1) k=1 (dj /dk ) k=1 (dj /dk ) µj (x) = k=1 (dj /d1k ) (5) otherwise C 2/(m −1) 2 (d /d ) k=1
j
k
Type-2 Classification
75
10
8
6
x
2
4
2
0
−2
−4 −5
0
5 x
10
1
Fig. 1. A bidimensional sample dataset
Where dj is the distance between x and the cluster prototype vj j = 1, . . . C, m1 and m2 are two fuzzifiers. j j = 1, . . . C, then µ (x) = Let the j th IT2 FS obtained in this manner be A Aj [µj (x), µj (x)]. We construct a type-2 fuzzy rule base consisting of C rules. j , If x is A Where rqj =
c(xk )=Cq Uj (xk ) N k=1 Uj (xk )
j then (r1j , · · · , rM ) j = 1, . . . , C.
(6)
, with c(xk ) denotes the class label of training pat-
µ (xk )+µj (xk )
tern xk and Uj (xk ) = j . N is the number of training patterns. The 2 antecedent type-2 fuzzy sets model uncertainty in data more precisely and effectively. A compact rule base consisting of fewer rules is consequently constructed, which gives a robust description to the patterns by taking into account the certainty degrees for all classes in each rule. The three IT2 FSs generated by IT2 FCM from patterns in Fig.1 are shown in Fig.2-4 by setting the number of prototypes for each class be one and m1 = 2, m2 = 5. The upper and lower membership functions are plotted to describe the IT2 FSs in these figures. 3.2
A Fuzzy Reasoning Method
Given an input pattern, conclusions can be derived using the fuzzy reasoning method based on the set of fuzzy if-then rules. We give a reasoning method to match the generated type-2 fuzzy rules to realize class discrimination.
76
M. Tang et al.
1 0.8 0.6 0.4 0.2 0 10 10
5 5 0 x
0 −5
2
−5
x
1
Fig. 2. The interval type-2 fuzzy set corresponding with class “o”
1 0.8 0.6 0.4 0.2 0 10 10
5 5 0 x
0 −5
2
−5
x
1
Fig. 3. The interval type-2 fuzzy set corresponding with class “+”
Considering a new pattern x, the reasoning steps are the following: 1) Matching degree calculation. The matching degree is the strength of activation of the if-part for all rules in the rule base with the pattern x. Since the antecedent fuzzy sets are IT2 FSs, an interval matching degree can be obtained for each rule, i.e., µA j (x) = [µj (x), µj (x)].
Type-2 Classification
77
1 0.8 0.6 0.4 0.2 0 10 10
5 5 0 x
0 −5
2
−5
x
1
Fig. 4. The interval type-2 fuzzy set corresponding with class “×”
2) Association degree computation. The association degre bqj of pattern x with class Cq under the j th rule is obtained by combining µA j (x) and rqj with an aggregation operator h, bqj = h(µA j (x), rqj ). The “product” operator is used for h in this study. Therefore the association degree bqj is also an interval, which can be expressed as bqj = [bqjl , bqjr ] = [rqj · µj (x), rqj · µj (x)]. 3) Pattern classification soundness degree for all classes. Computing the soundness degree Yq of pattern x with class Cq , Yq = f (bqj , j = 1, . . . , C, bqjr > 0), f being a aggregation operator verifying min ≤ f ≤ max. The Quasiarithmetic mean s operator[15] is adopted in this study, i.e., f (a1 , . . . , as ) = H −1 [ 1s i=1 H(ai )], where H(a) = a20 . Yq still assumes a interval form due to the monotonous and continuous properties of H. Let Yq expressed as Yq = [Yql , Yqr ]. 4) Classification. Interval soundness degrees need to be treated after previous step 3). A simple discrimination criteria is used: assign x to the class Cq such Y +Y kr that ql 2 qr = maxk=1,...,M Ykl +Y . 2
4
Numerical Examples
The utility of the interval type-2 fuzzy classification system is demonstrated on two data sets. One is the dataset shown in Fig.1, and the other is the iris dataset from the UCI repository of machine learning databases. The dataset in Fig.1 contains 180 bidimensional patterns from three classes, with 60 patterns in each class, denoted by three point types. The iris dataset is composed of 150 4-dimensional patterns uniformly distributed among three classes.
78
M. Tang et al.
For each problem, we compare the presented interval type-2 fuzzy classification system with the type-1 and triangular type-2 fuzzy systems addressed in[17]. Ten runs of the methods are performed on each dataset. In each run, half of the patterns (90 patterns in the bidimensional problem, 75 patterns in the iris problem) are randomly chosen as training patterns, and the rest are used for testing the systems. In our presented method, the number of clusters for each class is chosen as one in both these experiments. Three type-2 rules are generated in each experiment. The parameters m1 and m2 in the supervised IT2 FCM algorithm are selected as 2 and 5 respectively. The mean of the training patterns in each class is used as the initial prototype for each cluster. The best, the worst as well as the average classification results for the ten independent runs of the three methods are shown in Table 1. For an intuitive perception, the classification result of one run for the sample dataset is shown in Fig. 5, where the patterns belonging to three classes are denoted by three types of points,the black points and red points represent training patterns and testing patterns respectively and the points covered by blue squares are misclassified patterns. It can be observed that the proposed interval type-2 classification system outperforms the type-1 and triangular type-2 systems, improvements in sometimes are significant. The classification accuracy could be enhanced through a further refinement to the system design process, such as the certainty degree derivation and class discrimination criteria determination. However, the aim of this study is mainly to stress the usability of type-2 fuzzy sets in constructing fuzzy classification rules from labeled patterns. Moreover, the effectiveness of IT2 FSs in representing uncertainty in data is also validated to be incorporable in the fuzzy reasoning process. 10
8
6
x
2
4
2
0
−2
−4 −5
0
5 x
10
1
Fig. 5. The classification result of the proposed interval type-2 system in one run
Type-2 Classification
79
Table 1. Classification accuracy of the type-1, triangular and interval type-2 methods Problem Result Type-1 (%) Triangular type-2 (%) Interval type-2 (%) Sample dataset Best 95.56 93.33 95.56 in Fig.1 Worst 86.67 86.67 87.78 Average 90.67 89.89 92.00 Iris Best 94.67 97.33 97.33 Worst 70.67 72.00 72.00 Average 84.00 85.07 88.80
5
Conclusions
A type-2 fuzzy rule-based classification system has been developed in this paper. The IT2 FSs are generated from labeled patterns using IT2 FCM algorithm, which give effective descriptions to the uncertainty in data in a compact form. A interpretable type-2 fuzzy rule base is constructed using the resulting IT2 FSs accordingly, which has fewer rules and models the relationship between feature space and class space effectively. A fuzzy reasoning method is also presented to match the generated type-2 fuzzy rules. Simulation results show that classification accuracy is improved by the interval type-2 fuzzy system as compared with its type-1 and triangular type-2 counterparts. The generated interval type-2 fuzzy system takes advantage of the effectiveness of IT2 FSs in representing the uncertainty in data and the classical fuzzy classification reasoning form, which is considered to be a very powerful tool for solving complex problems. Continuing work can be done to improve the presented interval type-2 method in applications, such as the determination of the number of prototypes for each class, identification of the fuzzifiers and refinement of fuzzy reasoning method.
References 1. Ishibuchi, H., Nakashima, T., Nii, M.: Classification and Modeling with Linguistic Information Granules: Advanced Approaches to Linguistic Data Mining. Springer, Heidelberg (2004) 2. Ishibuchi, H., Nozaki, K., Tanaka, H.: Distributed Representation of Fuzzy Rules and Its Application to Pattern Classification. Fuzzy Sets and Systems 52, 21–32 (1992) 3. Mansoori, E.G., Zolghadri, M.J., Katebi, S.D.: A Weighting Function for Improving Fuzzy Classification Systems Performance. Fuzzy Sets and Systems 158, 583–591 (2007) 4. Abe, S., Thawonmas, R.: A Fuzzy Classifier with Ellipsoidal Regions. IEEE Transactions on Fuzzy Systems 5, 358–368 (1997) 5. Roubos, J.A., Setnes, M., Abonyi, J.: Learning Fuzzy Classification Rules from Labeled Data. Information Sciences 150, 77–93 (2003) 6. Chakraborty, D., Pal, N.R.: A Neuro-Fuzzy Scheme for Simultaneous Feature Selection and Fuzzy Rule-Based Classification. IEEE Transactions on Neural Networks 15, 110–123 (2004)
80
M. Tang et al.
7. Nauck, D., Kruse, R.: A Neuro-Fuzzy Method to Learn Fuzzy Classification Rules from Data. Fuzzy Sets and Systems 89, 277–288 (1997) 8. Ishibuchi, H., Yamamoto, T., Nakashima, T.: Hybridization of Fuzzy GBML Approaches for Pattern Classification Problems. IEEE Transactions on Systems, Man, and Cybernetics, Part B 35, 359–365 (2005) 9. Mansoori, E.G., Zolghadri, M.J., Katebi, S.D.: SGERD: A Steady-State Genetic Algorithm for Extracting Fuzzy Classification Rules from Data. IEEE Transactions on Fuzzy Systems 16, 1061–1071 (2008) 10. Ho, S.Y., Chen, H.M., Ho, S.J., Chen, T.K.: Design of Accurate Classifiers with a Compact Fuzzy-Rule Base using An Evolutionary Scatter Partition of Feature Space. IEEE Transactions on Systems, Man, and Cybernetics, Part B 34, 1031– 1044 (2004) 11. Bastian, A.: How to Handle the Flexibility of Linguistic Variables with Applications. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 2, 463–484 (1994) 12. Mendel, J.: Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions. Prentice Hall (2001) 13. Liang, Q., Mendel, J.: MPEG VBR Video Traffic Modeling and Classification Using Fuzzy Technique. IEEE Transactions on Fuzzy Systems 9, 183–193 (2001) 14. Wu, H., Mendel, J.: Classification of Battlefield Ground Vehicles Using Acoustic Features and Fuzzy Logic Rule-Based Classifiers. IEEE Transactions on Fuzzy Systems 15, 56–72 (2007) 15. Cordn, O., del Jesus, M.J., Herrera, F.: A Proposal on Reasoning Methods in Fuzzy Rule-Based Classification Systems. International Journal of Approximate Reasoning 20, 21–45 (1999) 16. Hwang, C., Rhee, F.: Uncertain Fuzzy Clustering: Interval Type-2 Fuzzy Approach to C-means. IEEE Transactions on Fuzzy Systems 15, 107–120 (2007) 17. Starczewski, J.T.: Efficient Triangular Type-2 Fuzzy Logic Systems. International Journal of Approximate Reasoning 50, 799–811 (2009)
Rough Approximations in General Approximation Spaces Keyun Qin1 , Zheng Pei2 , and Yang Xu1 1
2
College of Mathematics, Southwest Jiaotong University, Chengdu, Sichuan 610031, China School of Mathematics & Computer Engineering, Xihua University, Chengdu, Sichuan, 610039, China {keyunqin,pqyz}@263.net, [email protected]
Abstract. This paper is devoted to the discussion of rough approximations in general approximation space. The notions of transitive and Euclidean uncertainty mapping were introduced. The properties of some rough approximations were derived based on transitive and Euclidean uncertainty mappings. Additionally, it is pointed out that some existing approximation mappings are not suitable candidate for rough approximations. Keywords: Rough set, general approximation space, transitive uncertainty mapping, Euclidean uncertainty mapping, rough approximations.
1
Introduction
The theory of rough sets was firstly proposed by Pawlak [13, 14]. It is an extension of set theory for the study of intelligent systems characterized by insufficient and incomplete information. Using the concepts of upper and lower approximations in rough set theory, knowledge hidden in information systems may be unravelled and expressed in the form of decision rules. In Pawlak’s rough set model, an equivalence relation is a key and primitive notion. The equivalence classes are the building blocks for the construction of the lower and upper approximations. This equivalence relation, however, seems to be a very stringent condition that may limit the application domain of the rough set model. To solve this problem, generalizations of rough sets were considered by some scholars. One generalization approach is to consider a similarity or tolerance relation [3, 8, 12, 17, 19, 20, 25–27] rather than an equivalence relation. Another generalization approach is to extend the partition of the universe to a cover [1, 2, 10, 15, 28–32]. The equivalence relation is replaced by a fuzzy relation to deal with data sets with both vagueness and fuzziness and the rough sets are generalized to fuzzy rough sets [4, 5, 9, 11, 16, 18, 21–24]. Gomolinska [6] provided a new approach for the study of rough approximations where the starting point is a generalized approximation space. The rough approximation operator was regarded as set-valued mapping, called approximation mapping. Two pairs of basic approximation mappings were defined typically Y. Tang, V.-N. Huynh, and J. Lawry (Eds.): IUKM 2011, LNAI 7027, pp. 81–89, 2011. c Springer-Verlag Berlin Heidelberg 2011
82
K. Qin, Z. Pei, and Y. Xu
and generalized approximation mappings were constructed by the compositions of these basic approximation mappings. Some axioms for approximation mappings were proposed. Based on these axioms, the best low-approximation mapping was studied. This paper is devoted to the discussion of rough approximations in general approximation space. The notions of transitive and Euclidean uncertainty mapping were introduced. The properties of some rough approximations which are compositions of basic approximation mappings were derived based on transitive and Euclidean uncertainty mappings. Additionally, it is pointed out that, in accordance with Gomolinska’s axioms, the rough approximations f8 and f9 presented in [6] are not suitable candidate for rough approximations.
2
Preliminaries
This section presents a review of some fundamental notions of Pawlak’s rough sets. We refer to [13, 14] for details. Let U be a finite set, the universe of discourse, and R an equivalence relation on U , called an indiscernibility relation. The pair (U, R) is called a Pawlak approximation space. R will generate a partition U/R = {[x]R ; x ∈ U } on U , where [x]R is the equivalence class with respect to R containing x. For each X ⊆ U , the upper approximation R(X) and lower approximation R(X) of X are defined as [13, 14]: R(X) = {x; [x]R ∩ X = ∅},
(1)
R(X) = {x; [x]R ⊆ X}.
(2)
Alternatively, in terms of equivalence classes of R, the pair of lower and upper approximation can be defined by R(X) = ∪{[x]R ; [x]R ∩ X = ∅},
(3)
R(X) = ∪{[x]R ; [x]R ⊆ X}.
(4)
Let ∅ be the empty set and ∼ X the complement of X in U , the following conclusions have been established for Pawlak’s rough sets: (1) (2) (3) (4) (5) (6) (7) (8) (9)
It has been shown that (3), (4) and (8) are the characteristic properties of the lower and upper approximations [25, 27].
Rough Approximations in General Approximation Spaces
3
83
A General Notion of Rough Approximation Mapping
A general approximation space [6] is a triple A = (U, I, k), where U is a nonempty set called the universe, I : U → P (U ) is an uncertainty mapping, and k : P (U ) × P (U ) → [0, 1] is a rough inclusion function. In general approximation space A = (U, I, k), w ∈ I(u) is understood as w is in some sense similar to u and it is reasonable to assume that u ∈ I(u) for every u ∈ U . Then {I(u); u ∈ U } forms a covering of the universe U . The role of the uncertainty mapping may be played by a binary relation on U . In what follows, we suppose that u ∈ I(u) for every u, v ∈ U . We consider mappings f : P (U ) → P (U ). We can define a partial ordering relation, ≤, on the set of all such mappings as follows: f ≤ g if and only if ∀x ⊆ U (f (x) ⊆ g(x)), for every f, g : P (U ) → P (U ). By id we denote the identity mapping on P (U ). g ◦ f : P (U ) → P (U ) defined by g ◦ f (x) = g(f (x)) for every x ⊆ U , is the composition of f and g. We call g dual to f , written g = f d , if g(x) =∼ f (∼ x). The mapping f is monotone if and only if for every x, y ⊆ U , x ⊆ y implies f (x) ⊆ f (y). 3.1
Axioms for Rough Approximation Mappings
Theoretically speaking, every rough approximation operator is a mapping from P (U ) to P (U ), we call it approximation mapping. Gomolinska [6] proposed some fundamental properties that any reasonable rough approximation mapping f : P (U ) → P (U ) should possibly possess. They are the following axioms: (a1) Every low-mapping f is decreasing, i.e., f ≤ id. (a2) Every upp-mapping f is increasing, i.e., id ≤ f . (a3) If f is a low-mapping, then (∗)∀x ⊆ U ∀u ∈ f (x)(I(u) ⊆ x). (a4) If f is a upp-mapping, then (∗∗)∀x ⊆ U ∀u ∈ f (x)(I(u) ∩ x = ∅). (a5) For each x ⊆ U , f (x) is definable in A, i.e., there exists y ⊆ U such that f (x) = ∪{I(u); u ∈ y}. (a6) For each x ⊆ U definable in A, f (x) = x. The motivation behind these axioms was analyzed in [6]. Also, it is noticed that finding appropriate candidates for low- and upp-mappings satisfying these axioms is not an easy matter in general case. 3.2
The Structure of Rough Approximation Mappings
Let A = (U, I, k) be a general approximation space. The approximation mappings f0 , f1 : P (U ) → P (U ) were defined as [6]: for every x ⊆ U , f0 (x) = {I(u); u ∈ x}, (5) f1 (x) = {u; I(u) ∩ x = ∅}.
(6)
Observe that f0d and f1d satisfy: f0d (x) = {u; ∀w(u ∈ I(w) ⇒ w ∈ x)},
(7)
84
K. Qin, Z. Pei, and Y. Xu
f1d (x) = {u; I(u) ⊆ x}.
(8) f0d
f1d
If {I(u); u ∈ U } is a partition of U , then f0 = f1 , = and they are the classical rough approximation operators. Based on f0 , f1 and their dual mappings, several approximation mappings were defined in [6] by means of operations of composition and duality as follows: for every x ⊆ U , . f2 = f0 ◦ f1d : i.e., f2 (x) = {I(u); I(u) ⊆ x}, . f3 = f0 ◦ f1 : i.e., f3 (x) = {I(u); I(u) ∩ x = ∅}, . f4 = f0d ◦ f1 = f2d : i.e., f4 (x) = {u; ∀w(u ∈ I(w) ⇒ I(w) ∩ x = ∅)}, . f5 = f0d ◦ f1d = f3d : i.e., f5 (x) = {u; ∀w(u ∈ I(w) ⇒ I(w) ⊆ x)}, . f6 = f1d ◦ f1d : i.e., f6 (x) = {u; ∀w(w ∈ I(u) ⇒ I(w) ⊆x)}, . f7 = f0 ◦ f6 = f0 ◦ f1d ◦ f1d = f2 ◦ f1d : i.e., f7 (x) = {I(u); ∀w(w ∈ I(u) ⇒ I(w) ⊆ x)}, . f8 = f1d ◦ f1 : i.e., f8 (x) = {u; ∀w(w ∈ I(u) ⇒ I(w) ∩x = ∅)}, . f9 = f0 ◦ f8 = f0 ◦ f1d ◦ f1 = f2 ◦ f1 : i.e., f9 (x) = {I(u); ∀w(w ∈ I(u) ⇒ I(w) ∩ x = ∅)}. Theorem 1. [6] Consider any f : P (U ) → P (U ). (1) f (x) is definable for any x ⊆ U iff there is a mapping g : P (U ) → P (U ) such that f = f0 ◦ g. (2) The condition (∗) is satisfied iff f ≤ f1d . (3) The condition (∗∗) is satisfied iff f ≤ f1 . Theorem 2. [6] For any sets x, y ⊆ U , we have that: (1) fi (∅) = ∅ and fi (U ) = U for i = 0, 1, · · · , 9. fid (∅) = ∅ and fid (U ) = U for i = 0, 1. (2) fi and fjd are monotone for i = 0, 1, · · · , 9 and j = 0, 1. (3) fi (x ∪ y) = fi (x) ∪ fi (y) for i = 0, 1, 3. (4) fi (x ∩ y) = fi (x) ∩ fi (y) and fjd (x ∩ y) = fjd (x) ∩ fjd (y) for i = 5, 6 and j = 0, 1. Theorem 3. [6] Let A = (U, I, k) be a general approximation space. (1) f5 ≤ f1d ≤ f2 ≤ id ≤ f4 ≤ f1 ≤ f3 . (2) f5 ≤ f0d ≤ id ≤ f0 ≤ f3 . (3) f6 ≤ f7 ≤ f1d . (4) f8 ≤ f9 ≤ f1 . (5) fi ◦ fi = fi for i = 2, 4. Example 1. Let U = {x, y}. We suppose that I : U → P (U ) is defined by: I(x) = {x}, I(y) = {x, y}. (1) By the definition, we have: f8 ({x}) = f1d (f1 ({x})) = f1d ({x, y}) = {x, y}, f8 ({y}) = f1d (f1 ({y})) = f1d ({y}) = ∅. f9 ({x}) = f0 (f8 ({x})) = f0 ({x, y}) = I(x) ∪ I(y) = {x, y}, f9 ({y}) = f0 (f8 ({y})) = f0 (∅) = ∅.
Rough Approximations in General Approximation Spaces
85
It follows that neither f8 ≤ id nor id ≤ f8 holds in general. Similarly, we have f9 id and id f9 . By axiom (a1) and (a2), neither f8 nor f9 is a suitable candidate for rough approximations. (2)Let f = f0 ◦ f0d . Then f ({x}) = f0 (f0d ({x})) = f0 (∅) = ∅, f ({y}) = f0 (f0d ({y})) = f0 ({y}) = I(y) = {x, y}. It follows that f id and id f . Consequently, f is not a suitable candidate for rough approximations. Based on f0 , f1d , we define f10 = f1d ◦ f0 , i.e., f10 (x) = {u ∈ U ; ∀v(v ∈ I(u) → ∃w ∈ x(v ∈ I(w)))} for every x ⊆ U . Theorem 4. For any sets x, y ⊆ U , we have that: (1) f10 (∅) = ∅, f10 (U ) = U . (2) id ≤ f10 . (3) x ⊆ y implies f10 (x) ⊆ f10 (y). (4) f10 (x ∩ y) ⊆ f10 (x) ∩ f10 (y), f10 (x ∪ y) ⊇ f10 (x) ∪ f10 (y). The proof of this theorem is straightforward. In view of the previous results, we summarize the rough approximations in the following table. By upp (resp., low) we denote upper (resp. lower) approximations, while ⊥ denotes that the corresponding composition is not a suitable candidate for rough approximations. Table 1. Rough approximations based on uncertainty mapping f f0 f1 f0d f1d d f0 f0 ◦ f0 (upp) f0 ◦ f1 (f3 , upp) ⊥ f0 ◦ f1 (f2 , low) d f1 f1 ◦ f0 (upp) f1 ◦ f1 (f6d , upp) f1 ◦ f0d (f10 , low) ⊥ f0d ⊥ f0d ◦ f1 (f4 , upp) f0d ◦ f0d (low) f0d ◦ f1d (f5 , low) f1d f1d ◦ f0 (f10 , upp) ⊥ f1d ◦ f0d (low) f1d ◦ f1d (f6 , low)
4
The Transitive and Euclidean Uncertainty Mapping
In view of the previous results and in accordance with the axioms, any lowor upp-mapping should have the form f0 ◦ g, where g : P (U ) → P (U ) satisfies f0 ◦ g ◦ f0 = f0 and, moreover, f0 ◦ g ≤ f1d in the lower case, while id ≤ f0 ◦ g ≤ f1 in the upper case [6]. Clearly, ≤ −maximal among the low-mappings
86
K. Qin, Z. Pei, and Y. Xu
and ≤ −minimal among the upp-mappings would be the best approximation operators. The greatest element among the low-mappings just described is the mapping h : P (U ) → P (U ) where for any x ⊆ U , h(x) = ∪{(f0 ◦ g)(x); g : P (U ) → P (U ) ∧ f0 ◦ g ◦ f0 = f0 ∧ f0 ◦ g ≤ f1d }.
(9)
It is noticed that an analogous construction, using ∩, does not provide us with the least element of the family of upp-mappings [6]. 4.1
The Transitive Uncertainty Mapping
Let A = (U, I, k) be a general approximation space. I is said to be a transitive uncertainty mapping [7], if u ∈ I(v) implies I(u) ⊆ I(v) for every u, v ∈ U . Theorem 5. [7] Consider any f : P (U ) → P (U ). f satisfies (a5) and (a6) if and only if there is a mapping g : P (U ) → P (U ) such that f = f0 ◦ g and f 0 ◦ g ◦ f0 = f0 . Theorem 6. [7] Let A = (U, I, k) be a general approximation space and I a transitive uncertainty mapping. (1) f0 ◦ f1d = f1d , (2) f1d ◦ f0 = f0 . (3) fi ◦ fi = fi for i = 0, 1. (4)f2 = f6 = f7 = f1d . (5) f4 = f1 . Theorem 7. [7] Let A = (U, I, k) be a general approximation space. There exists a lower approximation f which satisfy (a1), (a3), (a5) and (a6) if and only if I is a transitive uncertainty mapping. In this case, f1d is ≤ −maximal among the lower approximations which satisfy (a1), (a3), (a5) and (a6). This theorem shows that f1d is the best lower approximation if I is transitive. In this case, the rough approximations can be summarized in the following table: Table 2. Rough approximations based on transitive uncertainty mapping f f0 f1 f0d f1d d f0 f0 (upp) f0 ◦ f1 (f3 , upp) ⊥ f1 (f2 , low) d f1 f1 ◦ f0 (upp) f1 (f6d , upp) f0d (f10 , low) ⊥ f0d ⊥ f1 (f4 , upp) f0d (low) f0d ◦ f1d (f5 , low) f1d f0 (f10 , upp) ⊥ f1d ◦ f0d (low) f1d (f6 , low)
4.2
The Euclidean Uncertainty Mapping
In this subsection, we concentrate on properties specific for Euclidean uncertainty mapping.
Rough Approximations in General Approximation Spaces
87
Definition 1. Let A = (U, I, k) be a general approximation space. (1) I is said to be a symmetric uncertainty mapping, if u ∈ I(v) implies v ∈ I(u) for every u, v ∈ U . (2) I is said to be an Euclidean uncertainty mapping, if u ∈ I(v) implies I(v) ⊆ I(u) for every u, v ∈ U . By this definition, if I is Euclidean, then I is symmetric. Theorem 8. Let A = (U, I, k) be a general approximation space. I is symmetric if and only if f0 = f1 . Proof. Assume that I is symmetric. Consider any x ⊆ U and u ∈ U . Then u ∈ f0 (x) ⇔ ∃w ∈ x(u ∈ I(w)) ⇔ ∃w ∈ x(w ∈ I(u)) ⇔ I(u) ∩ x = ∅ ⇔ u ∈ f1 (x). It follows that f0 = f1 . Conversely, assume that f0 = f1 . Let u, v ∈ U and u ∈ I(v). Then u ∈ I(v) = f0 ({v}) = f1 ({v}) = {w ∈ U ; I(w) ∩ {v} = ∅}. It follows that I(u) ∩ {v} = ∅ and hence v ∈ I(u) as required. Theorem 9. Let A = (U, I, k) be a general approximation space. I is Euclidean if and only if f1d ◦ f1 = f1 . Proof. Assume that I is Euclidean. Consider any x ⊆ U and u ∈ U . If u ∈ f1 (x), then I(u) ∩ x = ∅. For every v ∈ I(u), we have I(v) ∩ x ⊇ I(u) ∩ x = ∅. It follows that v ∈ f1 (x) and hence I(u) ⊆ f1 (x). Consequently, u ∈ {w ∈ U ; I(w) ⊆ f1 (x)} = f1d (f1 (x)) = (f1d ◦ f1 )(x), and hence f1 (x) ⊆ (f1d ◦ f1 )(x). By f1d ≤ id we know that f1 (x) ⊇ (f1d ◦ f1 )(x), and f1 (x) = (f1d ◦ f1 )(x) as required. Conversely, assume that f1d ◦ f1 = f1 . Let u, v ∈ U and u ∈ I(v). For every w ∈ I(v), by f1 ({w}) = {t ∈ U ; I(t) ∩ {w} = ∅} = {t ∈ U ; w ∈ I(t)}, we know that v ∈ f1 ({w}) = f1d (f1 ({w})) and hence I(v) ⊆ f1 ({w}). By u ∈ I(v), u ∈ f1 ({w}) followed and consequently w ∈ I(u). So we have I(v) ⊆ I(u) as required. Corollary 1. Let I be an Euclidean uncertainty mapping. Then (1) f1 ◦ f1d = f1d . (2) f0d ◦ f0 = f0 . (3) f0 ◦ f0d = f0d . Corollary 2. Let I be an Euclidean uncertainty mapping. Then (1) f2 = f0d . (2) f4 = f8 = f9 = f10 = f0 . (3) f3 = f0 ◦ f0 . (4) f5 = f6 = f7 = f0d ◦ f0d . We notice that an Euclidean uncertainty mapping need not necessarily to be transitive. So, in this case, the lower approximation which satisfy (a1), (a3), (a5) and (a6) does not exist in general according to Theorem 8.
88
K. Qin, Z. Pei, and Y. Xu
Acknowledgements. This work has been supported by the National Natural Science Foundation of China (Grant No. 60875034) and the Fundamental Research Funds for the Central Universities of China (Grant No. SWJTU09ZT37).
References 1. Bonikowski, Z., Bryniarski, E., Wybraniec, U.: Extensions and intentions in the rough set theory. Information Sciences 107, 149–167 (1998) 2. Bryniarski, E.: A calculus of a rough set of the first order. Bulletin of Polish Academy of Sciences 16, 71–77 (1989) 3. Cattaneo, G., Ciucci, D.: Algebraic structures for rough sets. In: Peters, J.F., Skowron, A., Dubois, D., Grzymala-Busse, J.W., Inuiguchi, M., Polkowski, L. (eds.) Transactions on Rough Sets II. LNCS, vol. 3135, pp. 208–252. Springer, Heidelberg (2004) 4. Dubois, D., Prade, H.: Rough fuzzy set and fuzzy rough sets. International Journal of General Systems 17, 191–209 (1990) 5. Dubois, D., Prade, H.: Putting fuzzy sets and rough sets together. In: Slowinski (ed.) Intelligent Decision Support, pp. 203–232. Kluwer Academic (1992) 6. Gomolinska, A.: A comparative study of some generalized rough approximations. Fundamenta Informaticae 51, 103–119 (2002) 7. Jiang, B., Qin, K., Pei, Z.: On Transitive Uncertainty Mappings. In: Yao, J., Lin´ ezak, D. (eds.) RSKT 2007. gras, P., Wu, W.-Z., Szczuka, M.S., Cercone, N.J., Sl¸ LNCS (LNAI), vol. 4481, pp. 42–49. Springer, Heidelberg (2007) 8. Lin, T.Y.: Neighborhood systems-application to qualitative fuzzy and rough sets. In: Wang, P.P. (ed.) Advances in Machine Intelligence and Soft-Computing, Department of Electrical Engineering, Duke University, Durham, NC, USA, pp. 132– 155 (1997) 9. Liu, G.-L., Sai, Y.: Invertible approximation operators of generalized rough sets and fuzzy rough sets. Information Sciences 180, 2221–2229 (2010) 10. Liu, G.-L., Sai, Y.: A comparison of two types of rough sets induced by coverings. International Journal of Approximate Reasoning 50, 521–528 (2009) 11. Morsi, N.N., Yakout, M.M.: Axiomatics for fuzzy rough sets. Fuzzy Sets and Systems 100, 327–342 (1998) 12. Nieminen, J.: Rough set tolerance equality. Fundamenta Informaticae 11(3), 289– 296 (1998) 13. Pawlak, Z.: Rough sets. International Journal of Computer and Information Science 11, 341–356 (1982) 14. Pawlak, Z.: Rough sets: Theoretical Aspects of Reasoning About Data. Kluwer Academic Publishers, Boston (1991) 15. Qin, K., Gao, Y., Pei, Z.: On Covering Rough Sets. In: Yao, J., Lingras, P., Wu, ´ ezak, D. (eds.) RSKT 2007. LNCS (LNAI), W.-Z., Szczuka, M.S., Cercone, N.J., Sl¸ vol. 4481, pp. 34–41. Springer, Heidelberg (2007) 16. Qin, K.-Y., Pei, Z.: On the topological properties of fuzzy rough sets. Fuzzy Sets and Systems 151(3), 601–613 (2005) 17. Qin, K.-Y., Yang, J.-L., Pei, Z.: Generalized rough sets based on reflexive and transitive relations. Information Sciences 178, 4138–4141 (2008) 18. Radzikowska, A.M., Kerre, E.E.: A comparative study of fuzzy rough sets. Fuzzy Sets and Systems 126, 137–155 (2002)
Rough Approximations in General Approximation Spaces
89
19. Slowinski, R., Vanderpooten, D.: A generalized definition of rough approximations based on similarity. IEEE Transactions on Knowledge and Data Engineering 12, 331–336 (2000) 20. Skowron, A., Stepaniuk, J.: Tolerance approximation spaces. Fundamenta Informaticae 27, 245–253 (1996) 21. Thiele, H.: On axiomatic characterizations of fuzzy approximation operators: I. The fuzzy rough set based case. In: Ziarko, W.P., Yao, Y. (eds.) RSCTC 2000. LNCS (LNAI), vol. 2005, pp. 239–247. Springer, Heidelberg (2001) 22. Thiele, H.: On axiomatic characterization of fuzzy approximation operators II, the rough fuzzy set based case. In: Proceedings of the 31st IEEE International Symposium on Multiple-Valued Logic, pp. 330–335 (2001) 23. Wu, W.-Z., Mi, J.-S., Zhang, W.-X.: Generalized fuzzy rough sets. Information Sciences 151, 263–282 (2003) 24. Wu, W.-Z., Zhang, W.-X.: Constructive and axiomatic approaches of fuzzy approximation operators. Information Sciences 159, 233–254 (2004) 25. Yao, Y.Y.: Relational interpretations of neighborhood operators and rough set approximation operators. Information Sciences 111, 239–259 (1998) 26. Yao, Y.Y., Wong, S.K.M.: Generalization of rough sets using relationships between attribute values. In: Proceedings of the Second Annual Joint Conference Information Sciences, pp. 30–33 (1995) 27. Yao, Y.Y.: Constructive and algebraic methods of theory of rough sets. Information Sciences 109, 21–47 (1998) 28. Zakowski, W.: Approximations in the space (U, ). Demonstratio Mathematica 16(40), 761–769 (1983) 29. Zhu, W., Wang, F.-Y.: Reduction and axiomization of covering generalized rough sets. Information Sciences 152(1), 217–230 (2003) 30. Zhu, W.: Topological approaches to covering rough sets. Information Sciences 177, 1499–1508 (2007) 31. Zhu, W.: Relationship among basic concepts in covering-based rough sets. Information Sciences 179, 2478–2486 (2009) 32. Zhu, W.: Relationship between generalized rough sets based on binary relation and covering. Information Sciences 179, 210–225 (2009)
Multi-agents and Non-classical Logic Systems Chenfang Zhao and Zheng Pei School of Mathematics and Computer Engineering, Xihua University, Chengdu 610039, China [email protected]
Abstract. To model voting machine by internet, valuation of classical propositional calculus is extended, and multi-agents valuation of propositional calculus is proposed. Then formal concept analysis is used to express uncertainty of statements, i.e., degrees of truth value, the conclusion points out that non-classical logic systems is necessary to process uncertain information.
1
Introduction
Classical logic is used as a main tool in inference and decision making field in real world application. In addition, many non-classical logic systems such as Lukasiewicz logic, Goguen logic, G¨odel logic, fuzzy logic, etc [1]-[5] and random set theory, fuzzy set theory, rough set theory, etc [6]-[11] are presented and used to describe and deal with uncertainty. In this paper, a kind of uncertainty which is generated by multi-agents assignation, will be discussed. The application background of this uncertainty is voting machine by internet. As far as our knowledge concerned, voting machine by internet is widely used. In many cases, information of voting machine by internet is certain. However, when all information given by all voting agents (or voting users) is considered, uncertainty will generate, e.g., abstractly, for a statement p, when the question about “truth value (true or false) of p?” is asked, voting agent A maybe assign “true” to p; voting agent B maybe assign “false” to p, · · · . Now, considering all information assigned by voting agents, then p is “true” or “false”? From valuation of classical propositional calculus, the problem can be expressed as 1. For an voting agent, e.g., A, valuation of A is true or false, i.e. TA (p) = true or TA (p) = f alse ; 2. For all voting agents, valuation of voting agents is T(A,B,··· ) (p) = (true, f alse, · · · ). Uncertainty is inherent in extending one-dimension valuation true or f alse to multi-valuation (true, f alse, · · · ). To solve the meaning of fuzzy truth values or membership functions of fuzzy sets, a voting mechanism for fuzzy logic is proposed, and used to the fuzzy predicate calculus in literature [12]-[14]. Formal concept analysis(FCA) is a discipline that studies the hierarchical structures induced by a binary relation between Y. Tang, V.-N. Huynh, and J. Lawry (Eds.): IUKM 2011, LNAI 7027, pp. 90–97, 2011. c Springer-Verlag Berlin Heidelberg 2011
Multi-agents and Non-classical Logic Systems
91
a pair of sets in literature [15]-[25]. In this paper, based on a voting mechanism for fuzzy logic, valuation of classical propositional calculus is extended, and multi-agents valuation of propositional calculus is proposed. Then FCA is used to express uncertainty of statements, i.e., degrees of truth value, the conclusion points out that non-classical logic systems is necessary to process uncertain information.
2
Preliminaries
In literature [12], a voting mechanism for fuzzy logic is proposed to extend notion of binary valuation for fuzzy concepts and fuzzy predicate. Formally, let L be a countable language of the propositional calculus consisting of a countable set of propositional variables P V L together with the connectives ∧, ∨ and ¬. Let SL denote the sentences of L. Definition 1. [12] A fuzzy valuation of L is a function F : SL×[0, 1] −→ {t, f } such that ∀θ ∈ SL, ∀0 ≤ y < y ≤ 1, F (θ, y) = f =⇒ F (θ, y ) = f and satisfies the following: ∀θ, φ ∈ SL, y ∈ [0, 1], 1)F (θ ∧ φ, y) = t ⇐⇒ F (θ, y) = t and F (φ, y) = t; 2)F (θ ∨ φ, y) = t ⇐⇒ F (θ, y) = t or F (φ, y) = t; 3) F (¬θ, y) = t ⇐⇒ F (θ, 1 − y) = f . In the Definition, y ∈ [0, 1] is viewed as the scepticism level of the voting agent. Based on the voting model, many interesting and important subjects of fuzzy logic have been discussed. Information systems and formal concept analysis(FCA) are widely used in data mining and knowledge discovering, etc [13]-[25]. Definition 2. A formal context is an ordered triple T = (G, M, I), where G, M are nonempty sets and I : G × M −→ {0, 1} is a binary relation. The elements of G are called objects and the elements of M attributes. I(g, m) = 1 means that object g has attribute m. Binary relation I of a formal context can be naturally represented by an twovalued table. The following two set-valued functions are used to define the formal concept of T = (G, M, I) ↑ : 2G → 2M , X ↑ = {m ∈ M ; ∀g ∈ X, I(g, m) = 1}, ↓ : 2M → 2G , Y ↓ = {g ∈ G; ∀m ∈ Y, I(g, m) = 1}.
(1) (2)
Definition 3. A formal concept of a context T = (G, M, I) is a pair (A, B) ∈ 2G × 2M such that A↑ = B and B ↓ = A. The set A is called its extent, the set B its intent.
92
3
C. Zhao and Z. Pei
Multi-agents Valuation of Simple Statements
The propositional calculus considers ways in which simple statements may be combined to form more complex statements by using connectives ∧, ∨, → and ¬. Let L = {p1 , p2 , · · · , pn } be the set of simple statements, AG = {ag1 , ag2 , · · · , agm } the set of Multi-agents. Definition 4. A multi-agents valuation of propositional calculus is F : AG × L −→ {true, f alse}, (agi , pj ) −→ true(orf alse).
(3)
In which, F (agi , pj ) = true(orf alse) means that agent agi assigns “true (t)” (or “false (f )”) to simple statement pj ∈ L. It is noticed that a multi-agents valuation of propositional calculus can be expressed as in Table 1. Table 1. A multi-agents valuation table ag1 ag2 .. . agi .. . agm
p1 t (or f ) t (or f ) .. . t (or f ) .. . t (or f )
p2 t (or f) t (or f) .. . t (or f) .. . t (or f)
··· ··· ··· .. . ··· .. . ···
pj t (or f ) t (or f ) .. . t (or f ) .. . t (or f )
··· ··· ··· .. . ··· .. . ···
pn t (or f ) t (or f ) .. . t (or f ) .. . t (or f )
Comparing a multi-agents valuation with classical valuation of propositional calculus, for L = {p1 , p2 , · · · , pn }, 1. There are N = 2mn valuations instead of N = 2n ; 2. Table 1 can be understood as classical formal context, in which, L = {p1 , p2 , · · · , pn } is understood as set of attributes, AG = {ag1 , ag2 , · · · , agm } as set of objects. From this point of view, for every simple statement pj ∈ L, truth value of pj , denoted by Tpj , is an vector, i.e., Tpj = (t (or f ), t (or f ), · · · , t (or f )) ↑ ↑ ··· ↑ ag1 ag2 ··· agm
(4)
instead of Tpj = t (or f ) in classical propositional calculus 3. In multi-agents valuation, degree of truth value is necessary for every simple statement pj . Based on (4), it can be found that it is meaningless that considering valuation of pj is t or f under multi-agents environment. In our opinion, if classical logic system is understood as one-agent valuation, due to valuation “t or f ” in one-agent
Multi-agents and Non-classical Logic Systems
93
valuation is converted by vector (t (or f ), · · · , t (or f )) in multi-agents valuation, uncertainty (incomparability between two vectors) is created. It maybe a good idea to use non-classical logic systems to solve the problem. In this paper, the uncertainty is expressed by degree of truth value. Definition 5. For a fixed multi-agents valuation F and every pj ∈ L, degree of truth value of pj is DTpj =
|p↓j | , |AG|
(5)
in which, p↓j = {agi |F (agi , pj ) = t}. As a special case, if DTpj = 0, i.e., p↓j = {agi |F (agi , pj ) = t} = ∅ (every agent assigns f to pj ), then pj is called absolutely false under multi-agents environment. If DTpj = 1, i.e., p↓j = {agi |F (agi , pj ) = t} = AG (every agent assigns t to pj ), then pj is called absolutely true under multi-agents environment. In other cases, DTpj expresses some uncertainty of valuation of pj . Example 1. Let L = {p1 , p2 , · · · , p7 } be the set of simple statements, and AG = {ag1 , · · · , ag5 } the set of multi-agents. A multi-agents valuation is Table 2. Table 2. A multi-agents valuation table ag1 ag2 ag3 ag4 ag5
p1 t t t t t
p2 f t f f t
p3 t f f t t
p4 t f t t t
p5 f t f f f
p6 f t t f f
p7 f f f f f
According to Table.2, the following degrees of truth value can be computed |p↓ |
1 DTp1 = |AG| = 55 = 1, DTp2 = 25 , DTp3 = 35 , DTp4 = 45 , DTp5 = 15 , DTp6 = 25 and DTp7 = 0. In the valuation, truth value of p1 and p7 are absolutely true and absolutely false, respectively.
4
Degree of Truth Value of Complex Statements
From the viewpoint of logic, let L = {p1 , p2 , · · · , pn } be the set of simple statements, then all statements can be recursively generated by using connectives ∧, ∨, → and ¬as following: 1) Simple statement is a statement; 2) If pj is statement, then ¬pj is a statement; 3)If pj and pk are statement, then pj ∗ pk is a statement, in which, ∗ is ∧, ∨ or →; 4)All statements are generated finitely by using above three steps. In classical logic system, truth values of complex
94
C. Zhao and Z. Pei
statements can be obtained by truth values of simple statements which is used to generate complex statements, i.e., t, if pj = f t, if pj = t and pk = t T¬pj = , Tpj ∧pk = (6) f, if pj = t f, otherwise f, if pj = f and pk = f Tpj ∨pk = , Tpj →pk = T(¬pj )∨pk . (7) t, otherwise Similarly, degrees of truth value of complex statements under multi-agents environment are consider in this Section. Definition 6. Let L = {p1 , p2 , · · · , pn } be the set of simple statements, F be defined as in (3). For pj , pk ∈ L, ag1 ag2 agm F (¬pj ) = (T¬p , T¬p , · · · , T¬p ), j j j
F (pj ∧ pk ) = = F (pj ∨ pk ) = = F (pj → pk ) = =
(8)
1 m 1 m F (pj ) ∧ F (pk ) = (Tpag , · · · , Tpag ) ∧ (Tpag , · · · , Tpag ) j j k k ag1 ag2 agm (Tpj ∧pk , Tpj ∧pk , · · · , Tpj ∧pk ), 1 m 1 m F (pj ) ∨ F (pk ) = (Tpag , · · · , Tpag ) ∨ (Tpag , · · · , Tpag ) j j k k ag1 ag2 agm (Tpj ∨pk , Tpj ∨pk , · · · , Tpj ∨pk ), 1 m 1 m F (pj ) → F (pk ) = (Tpag , · · · , Tpag ) → (Tpag , · · · , Tpag ) j j k k ag1 ag2 agm (Tpj →pk , Tpj →pk , · · · , Tpj →pk ),
agi i in which, Tpag is truth value of pj assigned by i-th agent agi , T¬p , j j agi and Tpj →pk are obtained by (6) and (7), respectively.
i Tpag , j ∧pk
(9) (10) (11)
i Tpag j ∨pk
Example 2. As a continuation of Example 1. According to Table 1 and (8)ag1 ag5 (11), we have F (¬p1 ) = (T¬p , · · · , T¬p ) = (f, f, f, f, f ), F (¬p7 ) = (t, t, t, t, t), 1 1 1 5 F (¬p2 ) = (t, f, t, t, f ), F (¬p6 ) = (t, f, f, t, t), F (p2 ∧p6 ) = (Tpag , · · · , Tpag )= 1 ∧p6 1 ∧p6 (f, t, f, f, f ), F (p1 ∧ p7 ) = (f, f, f, f, f ), F (p3 ∧ p4 ) = (t, f, f, t, t), F (p3 ∧ p6 ) = 1 5 (f, f, f, f, f ), F (p2 ∨ p6 ) = (Tpag , · · · , Tpag ) = (f, t, t, f, t), F (p1 ∨ p7 ) 1 ∨p6 1 ∨p6 = (t, t, t, t, t), F (p3 ∨ p4 ) = (t, f, t, t, t), F (p3 ∨ p6 ) = (t, t, t, t, t), F (p2 → p6 ) = 1 5 (Tpag , · · · , Tpag ) = (t, t, t, t, f ), F (p1 → p7 ) = (f, f, f, f, f ), F (p3 → p4 ) = 1 →p6 1 →p6 (t, t, t, t, t), F (p3 → p6 ) = (f, t, t, f, f ). Property 1. Let F be a multi-agents valuation. ∀pj ∈ L, DT¬pj =
|p↓j | |(¬pj )↓ | =1− = 1 − DTpj , |AG| |AG|
(12)
where, DT¬pj is degree of truth value of ¬pj and (¬pj )↓ = {agi ∈ AG|F (agi , ¬pj ) = t}. agi i Proof. According to (6), F (agi , ¬pj ) = t ⇐⇒ T¬p = t ⇐⇒ Tpag = f ⇐⇒ j j ↓ F (agi , pj ) = f , this means that (¬pj ) = {agi|F (agi , ¬pj ) = t} ⇐⇒ {agi |F (agi , pj ) = f } ⇐⇒ AG − (pj )↓ = AG − {agi |F (agi , pj ) = t}, hence,
Property 2. Let F be a multi-agents valuation. ∀pj , pk ∈ L, DTpj ∧pk = DTpj ∨pk
|(pj ∧ pk )↓ | |{pj , pk }↓ | = , |AG| |AG|
|p↓j ∪ p↓k | |(pj ∨ pk )↓ | = = , |AG| |AG|
(13) (14)
where, DTpj ∧pk and DTpj ∨pk are degree of truth value of pj ∧ pk and pj ∨ pk , respectively. (pj ∧ pk )↓ = {agi ∈ AG|F (agi , pj ∧ pk ) = t} and (pj ∨ pk )↓ = {agi ∈ AG|F (agi , pj ∨ pk ) = t}. i i Proof. According to (6) and (7), F (agi , pj ∧pk ) = t ⇐⇒ Tpag = t ⇐⇒ Tpag =t j ∧pk j i and Tpag = t ⇐⇒ F (ag , p ) = t and F (ag , p ) = t, F (ag , p ∨ p ) = t ⇐⇒ i j i k i j k k i i i Tpag = t ⇐⇒ Tpag = t or Tpag = t ⇐⇒ F (agi , pj ) = t or F (agi , pk ) = t, j ∨pk j k these mean that (pj ∧ pk )↓ = {agi ∈ AG|F (agi , pj ) = t and F (agi , pk ) = t} = {pj , pk }↓ , (pj ∨ pk )↓ = {agi ∈ AG|F (agi , pj ) = t or F (agi , pk ) = t} = {agi ∈ AG|F (agi , pj ) = t}∪ {agi ∈ AG|F (agi , pk ) = t} = p↓j ∪ p↓k .
Property 3. Let F be a multi-agents valuation. ∀pj , pk ∈ L, DTpj →pk =
where, DTpj →pk is degree of truth value of pj → pk . (pj → pk )↓ = {agi ∈ AG|F (agi , pj → pk ) = t}. agi i Proof. According to (7), F (agi , pj → pk ) = t ⇐⇒ Tpag = t ⇐⇒ T¬p = t or j →pk j agi Tpk = t ⇐⇒ F (agi , ¬pj ) = t or F (agi , pk ) = t, these mean that (pj → pk )↓ = {agi ∈ AG|F (agi , ¬pj ) = t or F (agi , pk ) = t} = (¬pj )↓ ∪ p↓k = (AG − p↓j ) ∪ p↓k .
Example 3. As a continuation of Example 1, in Table 1 we have p_1^↓ = AG, p_2^↓ = {ag_2, ag_5}, p_3^↓ = {ag_1, ag_4, ag_5}, p_4^↓ = {ag_1, ag_3, ag_4, ag_5}, p_5^↓ = {ag_2}, p_6^↓ = {ag_2, ag_3}, p_7^↓ = ∅. Then DT_{p_1∧p_7} = 0, DT_{p_2∧p_6} = 1/5, DT_{p_3∧p_4} = 3/5, DT_{p_3∧p_6} = 0, DT_{p_1∨p_7} = 1, DT_{p_2∨p_6} = 3/5, DT_{p_3∨p_4} = 4/5, DT_{p_3∨p_6} = 1, DT_{p_1→p_7} = 0, DT_{p_2→p_6} = 4/5, DT_{p_3→p_4} = 1, DT_{p_4→p_3} = 4/5, DT_{p_3→p_6} = 2/5. In the example, it can be noticed that DT_{p_j∧p_k} = min{DT_{p_j}, DT_{p_k}} and DT_{p_j∨p_k} = max{DT_{p_j}, DT_{p_k}} do not always hold; e.g., DT_{p_2∧p_6} ≠ min{DT_{p_2}, DT_{p_6}} and DT_{p_2∨p_6} ≠ max{DT_{p_2}, DT_{p_6}}. On the other hand, it is difficult to connect → in the multi-agent valuation with → in many existing valued logic systems. However, in a special case, the multi-agent valuation coincides with the Łukasiewicz logic system.
Corollary 1. If a multi-agent valuation F satisfies that ∀p_j, p_k ∈ L, either p_j^↓ ⊆ p_k^↓ or p_j^↓ ⊇ p_k^↓, then DT_{p_j∧p_k} = min{DT_{p_j}, DT_{p_k}}, DT_{p_j∨p_k} = max{DT_{p_j}, DT_{p_k}} and DT_{p_j→p_k} = min{1, 1 − DT_{p_j} + DT_{p_k}}.
Proof. According to Property 2, if ∀p_j, p_k ∈ L, either p_j^↓ ⊆ p_k^↓ or p_j^↓ ⊇ p_k^↓, it is easy to prove DT_{p_j∧p_k} = min{DT_{p_j}, DT_{p_k}} and DT_{p_j∨p_k} = max{DT_{p_j}, DT_{p_k}}. For DT_{p_j→p_k}, according to Property 3 and set operations, (AG − p_j^↓) ∪ p_k^↓ = AG − (p_j^↓ ∩ (AG − p_k^↓)). If p_j^↓ ⊆ p_k^↓, then p_j^↓ ∩ (AG − p_k^↓) = ∅; hence, DT_{p_j→p_k} = |AG| / |AG| = 1 = min{1, 1 − DT_{p_j} + DT_{p_k}}. If p_j^↓ ⊇ p_k^↓, then p_j^↓ ∩ (AG − p_k^↓) = p_j^↓ − p_k^↓; hence, DT_{p_j→p_k} = 1 − (|p_j^↓| − |p_k^↓|) / |AG| = 1 − DT_{p_j} + DT_{p_k} = min{1, 1 − DT_{p_j} + DT_{p_k}}.
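Since every connective reduces to a set operation on the extensions p_j^↓, the properties above can also be checked mechanically. The following Python sketch is ours (not part of the paper): it encodes the extensions from Example 3 and verifies Properties 1-3 and Corollary 1 numerically.

    # Minimal sketch (ours): verify Properties 1-3 and Corollary 1 on Example 3.
    AG = {1, 2, 3, 4, 5}                                    # agents ag1..ag5
    ext = {1: {1, 2, 3, 4, 5}, 2: {2, 5}, 3: {1, 4, 5},
           4: {1, 3, 4, 5}, 5: {2}, 6: {2, 3}, 7: set()}    # p_j^down, Example 3

    def DT(s):                                # degree of truth from an extension
        return len(s) / len(AG)

    def neg(j): return AG - ext[j]            # (not p_j)^down
    def conj(j, k): return ext[j] & ext[k]    # (p_j and p_k)^down
    def disj(j, k): return ext[j] | ext[k]    # (p_j or p_k)^down
    def imp(j, k): return neg(j) | ext[k]     # (p_j -> p_k)^down

    assert DT(neg(2)) == 1 - DT(ext[2])                        # Property 1
    assert DT(conj(3, 4)) == 3/5 and DT(disj(2, 6)) == 3/5     # Example 3 values
    assert DT(imp(3, 6)) == 2/5 and DT(imp(4, 3)) == 4/5
    # Corollary 1: nested extensions (p5^down subset of p2^down) give Lukasiewicz.
    assert DT(imp(5, 2)) == min(1, 1 - DT(ext[5]) + DT(ext[2]))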
In this paper, the multi-agent valuation of propositional calculus is discussed, and FCA is used to express the uncertainty of statements, i.e., degrees of truth value. In a special case, the multi-agent valuation of complex statements can be processed by the Łukasiewicz logic system.
Acknowledgments. The authors would like to thank the research fund of the Sichuan Key Laboratory of Intelligent Network Information Processing (SGXZD1002-10), the Key Laboratory of Radio Signals Intelligent Processing (Xihua University) (XZD0818-09), and the fund of the Key Disciplinary of Computer Software and Theory, Sichuan, Grant No. SZD0802-09-1.
References 1. Novák, V., Perfilieva, I., Močkoř, J.: Mathematical Principles of Fuzzy Logic. Kluwer Academic Publishers (1999) 2. Novák, V.: Antonyms and linguistic quantifiers in fuzzy logic. Fuzzy Sets and Systems 124, 335–351 (2001) 3. Dvořák, A., Novák, V.: Formal theories and linguistic descriptions. Fuzzy Sets and Systems 143, 169–188 (2004) 4. Bošnjak, I., Madarász, R., Vojvodić, G.: Algebras of fuzzy sets. Fuzzy Sets and Systems 160, 2979–2988 (2009) 5. Couso, I., Dubois, D.: On the variability of the concept of variance for fuzzy random variables. IEEE Transactions on Fuzzy Systems 17(5), 1070–1080 (2009) 6. Zadeh, L.A.: Fuzzy logic = computing with words. IEEE Trans. Fuzzy Systems 4, 103–111 (1996) 7. Dubois, D., Prade, H.: Gradualness, uncertainty and bipolarity: Making sense of fuzzy sets. Fuzzy Sets and Systems, doi:10.1016/j.fss.2010.11.007 8. Freund, M.: On the notion of concept I. Artificial Intelligence 172, 570–590 (2008)
9. Fortin, J., Dubois, D., Fargier, H.: Gradual numbers and their application to fuzzy interval analysis. IEEE Transactions on Fuzzy Systems 16(2), 388–402 (2008) 10. Cooman, G.: A behavioural model for vague probability assessments. Fuzzy Sets and Systems 154, 305–358 (2005) 11. Zadeh, L.A.: Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems 90, 103–111 (1997) 12. Lawry, J.: A voting mechanism for fuzzy logic. International Journal of Approximate Reasoning 19, 315–333 (1998) 13. Lawry, J.: A methodology for computing with words. International Journal of Approximate Reasoning 28, 51–89 (2001) 14. Lawry, J.: A framework for linguistic modelling. Artificial Intelligence 155, 1–39 (2004) 15. Wille, R.: Concept lattices and conceptual knowledge systems. Comput. Math. Appl. 23(6-9), 493–515 (1992) 16. Lawry, J., Tang, Y.: Granular knowledge representation and inference using labels and label expressions. IEEE Transactions on Fuzzy Systems 18(3), 500–514 (2010) 17. Pei, Z., Ruan, D., Liu, J., Xu, Y.: Linguistic Values Based Intelligent Information Processing: Theory, Methods, and Application. In: Atlantis Computational Intelligence Systems, vol. 1. Atlantis Press & World Scientific (2009) 18. Pei, Z., Xu, Y., Ruan, D., Qin, K.: Extracting complex linguistic data summaries from personnel database via simple linguistic aggregations. Information Sciences 179, 2325–2332 (2009) 19. Pei, Z., Resconi, G., Van Der Wal, A.J., Qin, K., Xu, Y.: Interpreting and extracting fuzzy decision rules from fuzzy information systems and their inference. Information Sciences 176, 1869–1897 (2006) 20. Jin, J., Qin, K., Pei, Z.: Reduction-Based Approaches Towards Constructing Galois (Concept) Lattices. In: Wang, G.-Y., Peters, J.F., Skowron, A., Yao, Y. (eds.) RSKT 2006. LNCS (LNAI), vol. 4062, pp. 107–113. Springer, Heidelberg (2006) 21. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing iceberg concept lattices with TITANIC. Data & Knowledge Engineering 42, 189–222 (2002) 22. Berry, A., Sigayret, A.: Representing a concept lattice by a graph. Discrete Applied Mathematics 144, 27–42 (2004) 23. Berry, A., SanJuan, E., Sigayret, A.: Generalized domination in closure systems. Discrete Applied Mathematics 154, 1064–1084 (2006) 24. Diday, E., Emilion, R.: Maximal and stochastic Galois lattices. Discrete Applied Mathematics 127, 271–284 (2003) 25. Kim, M., Compton, P.: Evolutionary document management and retrieval for specialized domains on the web. Int. J. Human-Computer Studies 60, 201–241 (2004)
An Information Processing Model for Emotional Agents Based on the OCC Model and the Mood Congruent Effect

Chao Ma, Guanghong Gong, and Yaofei Ma

Beijing University of Aeronautics and Astronautics, Advanced Simulation Technology Lab, Dept. of ASEE, Xueyuan Road 37, 100191, Beijing, China
[email protected]
Abstract. Emotional Agents can be regarded as traditional ones with emotional factors. There are differences between emotional Agents and traditional Agents in information perception and processing. This paper mainly deals with the design of the cognitive module (information processing module) for emotional Agents. The design contains mathematical approaches to human information processing and also takes into account achievements in modern psychology. The cognitive module is easy to apply in engineering, which makes the design suitable for most circumstances. Keywords: Emotional Agent, OCC Model, Mood Congruent Effect, Cognition, Information Processing.
1 Introduction
The Agent is an important concept in Artificial Intelligence (AI): an Agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators [1]. The intelligence of Agents directly reflects the state of the art of AI. During the early days, researchers paid much attention to building rational machines that can make proper decisions according to the environment and the problems they face. However, they failed to consider irrational factors such as emotion and affect. One of the founders of AI, Marvin Minsky, has stated: "The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without any emotions [2]." This statement points out the relation between emotions and intelligence: Agents cannot achieve true intelligence without emotions. Constructing emotional Agents means equipping the Agents with emotional factors and taking them as influences during information processing; Agents thus appear to be more human-like. AI researchers have delved into emotional Agent construction in different application fields [3] [4]. Emotional machines have been realized in the game industry [5], intelligent traffic [6], etc.
2 The Structure of Emotional Agents
Emotional Agents are traditional ones with emotional factors. The structure of emotional Agents is depicted in Fig.1. The cognitive module, which contains the information processing model that is the main topic of this paper, is a substantial part of emotional Agents.
Fig. 1. The structure of emotional Agents
Sensor: in charge of perceiving information (the occurrence of events, the state changes of other Agents, the parameter changes of objects) from the outside world.
Cognitive module: processes the information taking into account the emotional states of the Agents. The processed information will then be used as a factor to produce emotional states and to make decisions. The principle of this module is the mood congruent effect [7], and its mathematical model is the information processing model, which we will cover later.
Emotional module: produces the emotional states of the Agents, taking the information processed by the cognitive module as a factor. It is based on the OCC model [8].
Performance element module: decides what the Agents do next, taking the information from the cognitive module and the emotional states produced by the emotional module as input.
Knowledge refresh module: refreshes the knowledge base of the performance element module to bring new knowledge into the system.
The cognitive module is the main focus of this paper. This module imitates human information perception in the real world. Human beings do not make use of all the information perceived by the perceptual organs; in other words, they "choose" the information according to certain laws, and emotion plays an important role in this process. To make the Agents more human-like, we simulate the effects of emotions on information processing.
When the Agent is working, the emotional states produced by the emotional module using the OCC model are transferred to the cognitive module, and this module processes the information guided by the information processing model and the mood congruent effect.
3 OCC Model
There are two main distinguished viewpoints in the modeling of emotions: cognitive theories and dimensional theories of emotion [9]. The OCC model belongs to the former. The OCC model was brought forward by A. Ortony, G. Clore and A. Collins in 1988 [8]. It hypothesizes that emotions are aroused by the occurrence of events, the appraisal of agents and the liking of objects. Hence, we can trigger artificial emotions using events, states of Agents and attributes of objects in the emotional module. The OCC model defines 22 basic emotions [10], listed in Table 1:

Table 1. OCC's pairs of emotions

Groups of Emotion        Positive Emotion   Negative Emotion
Fortunes of Others       Happy for          Pity
                         Gloating           Resentment
Prospect Based           Hope               Fear
                         Satisfaction       Fears Confirmed
                         Relief             Disappointment
Well-being               Joy                Distress
Attribution              Pride              Shame
                         Admiration         Reproach
Well-being/Attribution   Gratification      Remorse
                         Gratitude          Anger
Attraction               Love               Hate
The model divides the 22 emotions into 11 pairs; in each pair, there is a positive emotion and a negative one. There are other ways to classify emotions, which are beyond the scope of this paper. In the cognitive module discussed later, we will mark the emotions as positive or negative according to this table.
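If an implementation needs this valence marking, a plain lookup table suffices. The sketch below is our own illustration (the names and data structure are ours, not the paper's):

    # Valence tags for the 22 OCC emotions of Table 1 (data structure is ours).
    OCC_PAIRS = [
        ("happy for", "pity"), ("gloating", "resentment"),      # fortunes of others
        ("hope", "fear"), ("satisfaction", "fears confirmed"),  # prospect based
        ("relief", "disappointment"),
        ("joy", "distress"),                                    # well-being
        ("pride", "shame"), ("admiration", "reproach"),         # attribution
        ("gratification", "remorse"), ("gratitude", "anger"),   # well-being/attribution
        ("love", "hate"),                                       # attraction
    ]
    VALENCE = {}
    for pos, neg in OCC_PAIRS:
        VALENCE[pos], VALENCE[neg] = +1, -1   # +1 positive, -1 negative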
4 Mood Congruent Effect
The Mood Congruent Effect is the basic rule the Agents have to obey when processing information and also the principle for the design of the cognitive module.
We use the term mood congruent effect to refer to the phenomenon that people tend to choose the information that is congruent with their emotional states, which indicates a certain priming effect of emotions (i.e., people with positive emotions tend to believe the positive information from the outside world, and vice versa). Sometimes it is also called the emotion congruent effect [7]. The Mood Congruent Effect can be recognized as mood congruent memory and mood congruent processing. Since the cognitive module we built does not have a memory part, we only implement mood congruent processing, which means that a subject with a certain emotion will be apt to choose the information congruent with it. This effect depicts the relations between human emotions and human information processing. In order to imitate these relations, the emotional Agents should embody this effect: if the Agent is in a positive emotional state, it will probably choose more information that will better its emotion rather than information that will bring it to a negative state. We can divide the information into positive and negative groups. Positive information refers to information that will probably bring the Agent to a positive emotional state, and negative information to information that does the opposite. There is a third group, best described as neutral information, which refers to information that will hardly ever arouse any emotions.
5 Information Processing Model
The information processing model is the key part of the cognitive module. This model is based on the OCC model.

5.1 Descriptions of Emotion
We have to classify and quantify the different emotions to make them easier to use in the information processing model. There are various methods to quantify emotions, and we adopt the n-dimensional method mentioned by Schlosberg and Lang [11], instantiating n as 1: the magnitude refers to intensity and the sign to direction. We consider the emotions in the same row of Table 1 as a pair, for example gratitude and anger, and introduce a variable S ∈ (−1, 1) to describe them. In the case S > 0, we interpret the emotion as gratitude with intensity |S|; otherwise, we interpret the emotion as anger, also with intensity |S|. If S → 1, the emotion of gratitude tends to be greatest in intensity, while if S → −1, the emotion of anger tends to be greatest. When S = 0, the emotional state is calm. In the information processing model, we only care about whether the emotional state is positive or negative, without drilling into concrete emotions such as gratitude and anger. Based on this simplification, it can be understood that when S > 0 the emotional state is positive with an intensity of |S|, and when S < 0 the emotional state is negative, also with an intensity of |S|.
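A minimal sketch of this one-dimensional encoding follows; the function and its name are ours, assuming only the sign/intensity convention above:

    def describe(S):
        # One-dimensional emotion state: the sign of S selects the member of the
        # pair (e.g. gratitude vs. anger), |S| is the intensity; S = 0 is calm.
        assert -1 < S < 1
        if S == 0:
            return ("calm", 0.0)
        return ("positive" if S > 0 else "negative", abs(S))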
5.2 Common Information Processing
The common information mentioned here is the information that will not have strong effects on the existence or the main goal of the Agents, which can also be expressed as information that could be ignored in the decision-making process. We can draw some conclusions according to the Mood Congruent Effect: 1. humans can only deal with part of the information from the outside world; and 2. the choice of information is influenced by emotional factors. Considering these characteristics, we can describe the information processing stochastically: if the Agent is in a positive emotion, it will choose positive information with higher possibility, and vice versa. We define this possibility as the possibility of believing (PB).
A General Mathematical Model. In the general condition, when S ∈ (−1, 1), we set P(S) as the PB of positive information and P̄(S) as the PB of negative information. We make two assumptions based on the facts that the ability of human information processing is limited and that emotional factors influence the PB of the information:
1. Positive emotions and negative emotions have the same mechanism of influencing the PB, but the effect is opposite: the PB of positive information influenced by emotion S equals the PB of negative information influenced by emotion −S. In mathematical form, we have P̄(S) = P(−S).
2. The aggregate possibility of the PB is a constant, which we set as D. D is also the PB for neutral information.
According to the assumptions, we can infer

xP(S) + (1 − x)P̄(S) = xP(S) + (1 − x)P(−S) = D,   (1)

where S ∈ (−1, 1) and x stands for the proportion of positive information in the non-neutral information. Setting S = 0 in (1), we get P(0) = D. It is easy to observe that when D is constant and x varies, P(S) and P(−S) take different values, too.

5.3 Key Information Processing
Key information refers to the information that should not be ignored in the decision-making process. In the real world, this information has much greater importance and priority and should not be neglected. The method we use to deal with key information is different from the one for common information. We adopt a stochastic method to imitate the processing of common information for the reason that it can be ignored without bringing
about calamitous results. However, if we adopted the same method for key information, the Agent would be insensitive to substantial information from the environment, which would lead to a decline of the adaptation ability of the Agents. Thus, we do not consider a stochastic method for key information: we have to make sure the Agent gets the key information immediately. This approach reflects the priority in human information processing. D. A. Norman et al. have also introduced different methods for information with various priorities [12], which is an imitation of the actions we humans take when processing information.
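The contrast between the two methods can be summarized in a small dispatch routine. The sketch below is ours: key information bypasses the stochastic filter, while common information is kept with the PB of its valence (pb_pos, pb_neg and pb_neutral stand for P(S), P(−S) and D of Section 5.2; all names are our assumptions):

    import random

    # Sketch (ours) of the two-path dispatch: key information is always delivered,
    # common information is kept stochastically according to its PB.
    def process(messages, pb_pos, pb_neg, pb_neutral):
        kept = []
        for payload, valence, is_key in messages:   # valence in {"pos","neg","neu"}
            if is_key:
                kept.append(payload)                # key information: never dropped
            elif random.random() < {"pos": pb_pos, "neg": pb_neg,
                                    "neu": pb_neutral}[valence]:
                kept.append(payload)                # common information: stochastic
        return kept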
6 Cognitive Module

The cognitive module is an implementation of the information processing model in the Agent structure.

6.1 The Structure of Cognitive Module
There are three kinds of information from the outside world (occurrence of events, states of Agents, attributes of objects). The cognitive module accepts these three stimuli as input. We only take the occurrence of events as an example to explain the work of the cognitive module; the processing of the other two kinds of information is similar. The structure of the cognitive module is displayed in Fig.2.
Input: The input comes from the sensors. The sensors perceive the information outside and then transfer it to the cognitive module for further processing.
Output: The output becomes the input for the emotional module and the performance element module.
The inner structure: The cognitive module can be divided into two sub-modules, namely the appraise sub-module and the main-processor sub-module.
Appraise: Provides elementary processing of information and evaluates what emotional state will be aroused by the current event.
Main processor: Takes the emotional states and the information from the appraise sub-module as inputs and uses the information processing model to decide which information to take and which to drop.

6.2 Main Processor Sub-module
This sub-module is the implementation of the information processing model. First of all, we declare some definitions:
Event attribute: the category that the event belongs to, i.e., positive, negative or neutral.
Fig. 2. The structure of the cognitive module
Event attribute value (θ): 1 for a positive event, −1 for a negative one and 0 for a neutral one.
Event aggregate: the aggregate of the events perceived by the sensors in one time step.
Event aggregate attribute value (Θ):

Θ = Σ kθ,   (2)

where k stands for the ratio between a certain kind of believed events and the events in the event aggregate. Assume that the ratio between positive events and the event aggregate is lp, the ratio between negative events and the event aggregate is ln, and the ratio between neutral events and the event aggregate is lneu; then we get

Θ = lp·P(S)·1 + ln·P(−S)·(−1) + lneu·D·0 = lp·P(S) − ln·P(−S).   (3)

Θ > 0 reflects the fact that the Agent takes the environment as positive, while Θ < 0 means it takes the environment as negative. Since the attribute value of neutral events is defined as 0, they have no effect on the event aggregate attribute value. Hence, dividing equation (3) by (lp + ln) and setting x = lp/(lp + ln), we get

Θ' = Θ/(lp + ln) = xP(S) − (1 − x)P(−S).   (4)

When the event aggregate is given, the difference between Θ and Θ' is just a constant ratio; thus Θ' is also capable of describing the attribute of the event aggregate. What is more, equation (4) is much easier to use together with equation (1). According to the research by Eric J. Johnson et al. [13], people will over-evaluate risks by 74% when in a negative emotion (three groups of experiments
were carried out, and over-evaluation was observed in each group; the percentages are 133%, 56% and 50%, with an average of 74%). This risk over-evaluation is also caused by the mood congruent effect, so it is sufficient to embody the strength of the effect. Other psychological experiments also brought similar results [14] [15]. In the information processing model, we model the mood congruent effect as aggrandizing the event aggregate attribute (compared with the situation in a calm state), and the percentage of aggrandizement will be around 74%. The event aggregate attribute in a calm state is xP(0) − (1 − x)P(−0) = (2x − 1)P(0) = (2x − 1)D, so we get the following equation. When x ∈ [0.5, 1], S ∈ (0, 1) (positive information & positive emotion) or x ∈ [0, 0.5], S ∈ (−1, 0) (negative information & negative emotion), there holds

xP(S) − (1 − x)P(−S) = 1.74(2x − 1)D.   (5)

Combining (5) with (1) gives the simultaneous equations

xP(S) + (1 − x)P(−S) = D,
xP(S) − (1 − x)P(−S) = 1.74(2x − 1)D.   (6)

The result is

P(S) = (1.74x − 0.37)D / x,   P(−S) = (1.37 − 1.74x)D / (1 − x).   (7)
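As a numerical cross-check (our own sketch; the variable names are ours), the closed form (7) indeed satisfies the constraints (1) and (5), and directly yields the event aggregate attribute Θ' of (4):

    # Check Eqs.(1), (5) and (7) for one choice of x and D.
    def pb(x, D):
        return (1.74 * x - 0.37) * D / x, (1.37 - 1.74 * x) * D / (1 - x)  # Eq.(7)

    x, D = 0.7, 0.4
    P_pos, P_neg = pb(x, D)                                   # P(S), P(-S)
    assert abs(x * P_pos + (1 - x) * P_neg - D) < 1e-9        # Eq.(1)
    assert abs(x * P_pos - (1 - x) * P_neg
               - 1.74 * (2 * x - 1) * D) < 1e-9               # Eq.(5)
    theta_prime = x * P_pos - (1 - x) * P_neg                 # Eq.(4)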
Considering that a possibility value is constrained to [0, 1], we illustrate P(S) and P(−S) in Fig.3. When D and S vary, we get the curves of P(S) in Fig.4. Different values of D mean different degrees of consideration of the information perceived by the Agents: the greater D is, the more information will be taken into account by the Agents, so that they can make more accurate judgments on the situation they face; the smaller D is, the less sensitive the Agents. In practice, the users of this model can set D themselves. To test this emotional model, we built a testing framework called the emotional model testing framework. In this framework, we can view documents and details of the models, run tests on them, record the data and make analyses as well. Moreover, we applied this model in some scenarios such as a flight route planning demo. In this demo, Agents with different emotions (Hope and Fear) have to choose their own route between two places separated by a high mountain; see Fig.5. From Fig.5, we can conclude that Flight Agents with positive emotions tend to choose short routes with high risks (higher mountains etc.) like route A, while Flight Agents with negative emotions tend to choose longer routes with lower risks like route B. This means that Agents with positive emotions will pay more attention to positive information such as shorter routes, while Agents with negative emotions will pay more attention to negative information such as danger when making decisions.
Fig. 3. Distributions of P (S) and P (−S)
Fig. 4. Curves of P (S) and P (−S)
In this paper, we weaken the effects of neutral information: we only put positive and negative information into the event aggregate attribute. This approach is sufficient for engineering, and the calculation is comparatively easy. If we took neutral information into account, we would have to consider the ratio between neutral events and the event aggregate to decide the attribute of the aggregate. All in all, the method we offer in this paper is easy to realize and able to embody the effect of emotion in the Agents. What is more, the psychological foundation is firm. This method can be used in most Agent systems to increase the similarity between human information processing and Agent information processing. However, human information processing is a complicated, nonlinear process, so more accurate treatment will require much more effort from AI researchers.
References 1. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, New Jersey (2003) 2. Minsky, M.: The Society of Mind. Simon & Schuster, New York (1988) 3. Turkia, M.: A Computational Model of Affects. CoRR, vol. abs/0811.0123 (2008) 4. Barteneva, D., Lau, N., Reis, L.P.: A Computational Study on Emotions and Temperament in Multi-Agent Systems. CoRR, vol. abs/0809.4784 (2008) 5. Qiang, J., Lan, P., Looney, C.: A Probabilistic Framework for Modeling and Real-time Monitoring Human Fatigue. IEEE Trans. Systems, Man and Cybernetics, Part A: Systems and Humans 36, 862–875 (2006) 6. Slater, S., Moreton, R., Buckley, K., Bridges, A.: A Review of Agent Emotion Architectures. Eludamos Journal for Computer Game Culture 2, 203–214 (2008) 7. Zhuang, J.: The Psychology of Decision-Making. Shanghai Educational Press, Shanghai (2006)
8. Ortony, A., Clore, G.L., Collins, A.: The Cognitive Structure of Emotions. Cambridge University Press, New York (1988) 9. MacDorman, K.F., Ishiguro, H.: Generating Natural Motion in an Android by Mapping Human Motion. In: Proceedings of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 3301–3308. IEEE Press, New York (2005) 10. Trabelsi, A., Frasson, C.: The Emotional Machine: A Machine Learning Approach to Online Prediction of User's Emotion and Intensity. In: 2010 10th IEEE International Conference on Advanced Learning Technologies, pp. 613–617. IEEE Press, New York (2010) 11. Wang, Z.: Artificial Emotion. China Machine Press, Beijing (2007) 12. Norman, D.A., Ortony, A., Russell, D.M.: Affect and Machine Design: Lessons for the Development of Autonomous Machines. IBM Systems Journal 42(1), 38–44 (2003) 13. Johnson, E.J., Tversky, A.: Affect, Generalization and the Perception of Risk. Journal of Personality and Social Psychology 45(1), 20–31 (1983) 14. Eysenck, M.W.: Anxiety: The Cognitive Perspective. Erlbaum, Hove (1992) 15. Lerner, J.S., Keltner, D.: Beyond Valence: Toward a Model of Emotion-specific Influences on Judgement and Choice. Cognition and Emotion 14, 473–494 (2000)
On Distributive Equations of Implications and Contrapositive Symmetry Equations of Implications Based on a Continuous t-Norm

Feng Qin (1,2) and Meihua Lu (2)

1 College of Mathematics and Information Science, Nanchang Hangkong University, 330063, Nanchang, P.R. China
2 College of Mathematics and Information Science, Jiangxi Normal University, 330022, Nanchang, P.R. China
[email protected]
Abstract. In this paper, we summarize the sufficient and necessary conditions of solutions for the distributive equation of implication I(x, T1(y, z)) = T2(I(x, y), I(x, z)) and characterize all solutions of the functional equations consisting of I(x, T1(y, z)) = T2(I(x, y), I(x, z)) and I(x, y) = I(N(y), N(x)), when T1 is a continuous but not Archimedean triangular norm, T2 is a continuous and Archimedean triangular norm, I is an unknown function, and N is a strong negation. We also underline that our method can apply to the three other functional equations closely related to the above-mentioned functional equations. Keywords: Fuzzy connectives, Fuzzy implications, Continuous Archimedean t-norms, Continuous t-norms, Distributive equations of implications, Contrapositive symmetry equations of implications.
1 Introduction
The ability to build complex commercial and scientific fuzzy logic applications has been hampered by what is popularly known as the combinatorial rule explosion problem, which is associated with the conventional fuzzy rule configuration and its accompanying rule matrix. Since all the rules of an inference engine are exercised during every inference cycle, the number of rules directly affects the computational duration of the overall application. To reduce the complexity of fuzzy "IF-THEN" rules, Combs and Andrews [8-10] made use of the following classical tautology: (p ∧ q) → r = (p → r) ∨ (q → r). They refer to the left-hand side of this equivalence as an intersection rule configuration (IRC) and to its right-hand side as a union rule configuration (URC). Subsequently, there were many discussions [9-11,18], most of which pointed out
This work is supported by National Natural Science Foundation of China (Nos. 60904041, 61165014) and Jiangxi Natural Science Foundation (No.2009GQS0055).
the need for a theoretical investigation required for employing such equations, as concluded by Dick and Kandel [11], "Future work on this issue will require an examination of the properties of various combinations of fuzzy unions, intersections and implications", or by Mendel and Liang [18], "We think that what this all means is that we have to look past the mathematics of IRC⇔URC and inquire whether what we are doing when we replace IRC by URC makes sense." Then Trillas and Alsina [26], in standard fuzzy theory, turned the above requirement into the functional equation I(T(x, y), z) = S(I(x, z), I(y, z)) and obtained all solutions of T when I is a special case of R-implications, S-implications and QL-implications, respectively. Along the same lines, Balasubramaniam and Rao [6] investigated the three other functional equations interrelated with this equation. In order to study it in a more general case, Ruiz-Aguilera [22,23] and Qin [21], in their own papers, generalized the above equation to uninorms. On the other hand, from the fuzzy logic angle, Turksen [27] posed and discussed the equation

I(x, T(y, z)) = T(I(x, y), I(x, z)),   x, y, z ∈ [0, 1],   (1)
and then got the necessary conditions for a fuzzy implication I to satisfy Eq.(1) when T = T_P. Later, Baczyński [1] generalized some of Turksen's results to strict t-norms T and obtained the sufficient and necessary conditions for the functional equations consisting of Eq.(1) and the following equation:

I(x, I(y, z)) = I(T(x, y), z),   x, y, z ∈ [0, 1].   (2)
Moreover, he [2] also studied the functional equations composed of Eq.(1) and the following equation:

I(x, y) = I(N(y), N(x)),   x, y ∈ [0, 1].   (3)
After this, Yang and Qin [28] got the full characterization of the functional equations composed of Eq.(1) and Eq.(3) when T is a strict t-norm. Recently, many researchers [2,4,5,20], including Baczyński and Qin, have again investigated the distributivity of fuzzy implications over nilpotent or strict triangular t-norms or t-conorms. Specially, in [19] we explored, in the most general case, the sufficient and necessary conditions of solutions for the distributive equation of implication

I(x, T1(y, z)) = T2(I(x, y), I(x, z)),   x, y, z ∈ [0, 1].   (4)
And then, in [18], we characterized all solutions of the functional equations consisting of Eq.(3) and Eq.(4). Along the above line, in this paper, we summarize the sufficient and necessary conditions of solutions for Eq.(4) and the functional equations consisting of Eq.(3) and Eq.(4), when T1 is a continuous but not Archimedean triangular norm, T2 is a continuous and Archimedean triangular norm, I is an unknown function, and N is a strong negation. We also underline that our method can apply to the three other functional equations closely related to the above-mentioned functional equations.
The paper is organized as follows. In Section 2, we present some results concerning the basic fuzzy logic connectives employed in the sequel. In Section 3, we recall all solutions of Eq.(4) when T1 is a continuous but not Archimedean triangular norm, T2 is a continuous Archimedean triangular norm and I is an unknown function. In Section 4, we investigate the functional equations consisting of Eq.(3) and Eq.(4) when T1 is a continuous but not Archimedean triangular norm and T2 is a strict triangular norm. In Section 5, we do the same investigation except that T2 is a nilpotent triangular norm. Finally, a simple conclusion is given in Section 6. Unless otherwise stated, we always assume that T1 is only continuous but not Archimedean, which means that T1 must have at least one non-trivial idempotent element, because the case where T1 is continuous and Archimedean has been studied by Baczyński [1,2,4,5], Qin [5,20,28], and others.
2 Preliminaries
In this section, we recall basic notations and facts used in the sequel.
Definition 2.1 [12-14]. A binary function T: [0, 1]^2 → [0, 1] is called a triangular norm (t-norm for short) if it fulfills, for every x, y, z ∈ [0, 1], the following conditions:
(1) T(x, y) = T(y, x) (commutativity);
(2) T(T(x, y), z) = T(x, T(y, z)) (associativity);
(3) T(x, y) ≤ T(x, z) whenever y ≤ z (monotonicity);
(4) T(x, 1) = x (boundary condition).
Definition 2.2 [12,14]. A t-norm T is said to be
(1) Archimedean, if for every x, y ∈ (0, 1) there exists some n ∈ N such that x_T^n < y, where x_T^n = T(x, x, · · · , x) (n times);
(2) strict, if T is continuous and strictly monotone, i.e., T(x, y) < T(x, z) whenever x ∈ (0, 1] and y < z;
(3) nilpotent, if T is continuous and for each x ∈ (0, 1) there exists some n ∈ N such that x_T^n = 0.
Remark 2.1. If T is strict or nilpotent, then it must be Archimedean. The converse is also true when it is continuous (see Theorem 2.18 in [14]).
Theorem 2.1 [14,16]. For a function T: [0, 1]^2 → [0, 1], the following statements are equivalent:
(1) T is a continuous Archimedean t-norm.
(2) T has a continuous additive generator, i.e., there exists a continuous, strictly decreasing function t: [0, 1] → [0, ∞] with t(1) = 0, which is uniquely determined up to a positive multiplicative constant, such that T(x, y) = t^(−1)(t(x) + t(y)) for all x, y ∈ [0, 1], where t^(−1) is the pseudo-inverse of t, given by

t^(−1)(x) = { t^{−1}(x), x ∈ [0, t(0)],
              0,         x ∈ (t(0), ∞].
Remark 2.2. (1) Without the pseudo-inverse, the representation of a t-norm in Theorem 2.1 can be rewritten as

T(x, y) = t^{−1}(min(t(x) + t(y), t(0))),   x, y ∈ [0, 1].   (5)
(2) A t-norm T is strict if and only if each continuous additive generator t of T satisfies t(0) = ∞.
(3) A t-norm T is nilpotent if and only if each continuous additive generator t of T satisfies t(0) < ∞.
Theorem 2.2 [7,12]. T is a continuous t-norm if and only if (1) T = T_M, or (2) T is continuous Archimedean, or (3) there exists a family {[a_m, b_m], T_m}_{m∈A} such that T is the ordinal sum of this family, denoted by T = (< a_m, b_m, T_m >)_{m∈A}. In other words,

T(x, y) = { a_m + (b_m − a_m)·T_m((x − a_m)/(b_m − a_m), (y − a_m)/(b_m − a_m)), (x, y) ∈ [a_m, b_m]^2,
            min(x, y), otherwise,   (6)

where {[a_m, b_m]}_{m∈A} is a countable family of non-overlapping, closed, proper subintervals of [0, 1] with each T_m being a continuous Archimedean t-norm, and A is a finite or countably infinite index set. For every m ∈ A, [a_m, b_m] is called a generating subinterval of T, and T_m the corresponding generating t-norm of T on [a_m, b_m].
In the literature one can find several diverse definitions of fuzzy implications (see [7], [14], [16], [25]). In this article, we will use the following one, which is equivalent to the definition introduced by Fodor and Roubens (see [12]).
Definition 2.3 [3,12]. A function I: [0, 1]^2 → [0, 1] is called a fuzzy implication if I fulfills the following conditions:
I1: I is decreasing with respect to the first variable;
I2: I is increasing with respect to the second one;
I3: I(0, 0) = I(0, 1) = I(1, 1) = 1, I(1, 0) = 0.   (7)
In virtue of the above definition, it is obvious that each fuzzy implication satisfies I(0, x) = I(x, 1) = 1 for all x ∈ [0, 1]. But we can say nothing about the values of I(x, 0) and I(1, x) for x ∈ (0, 1).
Definition 2.4 [14,17,19,24]. A continuous function N: [0, 1] → [0, 1] is called a strong negation if it is strictly decreasing, involutive and satisfies N(0) = 1 and N(1) = 0. Specially, when N(x) = 1 − x, we call it the standard negation, denoted by N_0.
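To make the generator machinery concrete, here is a small numerical sketch of Theorem 2.1/Eq.(5); the packaging is ours, while the generators t(x) = −ln x (strict, yielding the product t-norm) and t(x) = 1 − x (nilpotent, yielding the Łukasiewicz t-norm) are classical choices:

    import math

    # T(x, y) = t^{-1}(min(t(x) + t(y), t(0))), cf. Eq.(5).
    def tnorm_from_generator(t, t_inv, t0):
        return lambda x, y: t_inv(min(t(x) + t(y), t0))

    # Strict: t(x) = -ln x, t(0) = inf  ->  the product t-norm.
    T_P = tnorm_from_generator(lambda x: -math.log(x) if x > 0 else math.inf,
                               lambda u: math.exp(-u), math.inf)
    # Nilpotent: t(x) = 1 - x, t(0) = 1  ->  the Lukasiewicz t-norm.
    T_LK = tnorm_from_generator(lambda x: 1 - x, lambda u: 1 - u, 1.0)

    assert abs(T_P(0.5, 0.4) - 0.5 * 0.4) < 1e-12
    assert abs(T_LK(0.5, 0.4) - max(0.0, 0.5 + 0.4 - 1)) < 1e-12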
3 Solutions to Eq.(4) When T1 Is a Continuous t-Norm and T2 Is a Continuous Archimedean t-Norm
In this section, we recall the characterizations of the function I satisfying Eq.(4) when T1 is a continuous t-norm and T2 is a continuous Archimedean t-norm.
For any given continuous t-norm T1, binary function I and fixed x ∈ [0, 1], we define
U(T1,I,x) = {y ∈ [0, 1] | I(x, y) = 0, y is an idempotent element of T1}, μ(T1,I,x) = sup U(T1,I,x),
and
V(T1,I,x) = {y ∈ [0, 1] | I(x, y) = 1, y is an idempotent element of T1}, ν(T1,I,x) = inf V(T1,I,x).
For a more precise presentation, we must underline the relation between (T1 and I) and (μ(T1,I,x) and ν(T1,I,x)). Note that U(T1,I,x) and V(T1,I,x) are actually determined by T1, I and x, and may differ when either T1 or I differs. We stipulate here that sup ∅ = 0 and inf ∅ = 1, and obtain from Lemma 3.3 in [18] that μ(T1,I,x) ≤ ν(T1,I,x) for any T1, I and x ∈ [0, 1]. Now, by the order between μ(T1,I,x) and ν(T1,I,x), we need to consider two cases: μ(T1,I,x) = ν(T1,I,x) and μ(T1,I,x) < ν(T1,I,x).
Theorem 3.1. Let T1 be a continuous t-norm, T2 a continuous Archimedean t-norm, I: [0, 1]^2 → [0, 1] a binary function, and assume that μ(T1,I,x) = ν(T1,I,x) for some fixed x ∈ [0, 1]. Then the following statements are equivalent:
(1) The triple of functions (T1, T2, I(x, ·)) satisfies Eq.(4) for any y, z ∈ [0, 1];
(2) The vertical section I(x, ·) has the following forms:
(i) If μ(T1,I,x) ∈ U(T1,I,x), then

I(x, y) = { 0, y ≤ μ(T1,I,x),
            1, y > μ(T1,I,x),    y ∈ [0, 1].   (8)

(ii) If ν(T1,I,x) ∈ V(T1,I,x), then

I(x, y) = { 0, y < ν(T1,I,x),
            1, y ≥ ν(T1,I,x),    y ∈ [0, 1].   (9)

Next, let us consider the case μ(T1,I,x) < ν(T1,I,x).
Remark 3.1. We know from Remark 3.11 in [18] that

I(x, y) = { 0, y < μ(T1,I,x),
            1, y > ν(T1,I,x),

for any x ∈ [0, 1], when μ(T1,I,x) < ν(T1,I,x). But we can say nothing about the value of I(x, y) for y ∈ [μ(T1,I,x), ν(T1,I,x)]. We will solve this problem next, considering different assumptions on the t-norm T2. At first, we recall characterizations of fuzzy implications I satisfying Eq.(4) when T1 is a continuous t-norm and T2 is a strict t-norm.
Theorem 3.2. Let T1 be a continuous t-norm, T2 a strict t-norm, and I: [0, 1]^2 → [0, 1] a continuous binary function except for the vertical section I(0, y) = 1 for y ∈ [0, a], which satisfies Eq. I3. Then the following statements are equivalent:
(1) The triple of functions (T1, T2, I) satisfies Eq.(4) for all x, y, z ∈ [0, 1].
(2) T1 admits the representation (6), there exist two constants a < b ∈ [0, 1] such that μ(T1,I,x) = a, ν(T1,I,x) = b for all x ∈ [0, 1], and there exist continuous, strictly decreasing functions ta, t2: [0, 1] → [0, ∞] with ta(1) = t2(1) = 0, ta(0) = t2(0) = ∞, which are uniquely determined up to positive multiplicative constants, such that the corresponding generating t-norm Ta of T1 on the generating subinterval [a, b] and T2 admit the representation (5) with ta and t2 respectively, and there exists a continuous function c: (0, 1] → (0, ∞), c(0) = 0, uniquely determined up to a positive multiplicative constant depending on the constants for ta and t2, such that I has the form

I(x, y) = { 1, x = 0, y ∈ [0, a],
            0, x ≠ 0, y ∈ [0, a],
            t2^{−1}(c(x)·ta((y − a)/(b − a))), x ∈ [0, 1], y ∈ [a, b],
            1, x ∈ [0, 1], y ∈ [b, 1],    x, y ∈ [0, 1].   (10)

Next, we recall characterizations of fuzzy implications I satisfying Eq.(4) when T1 is a continuous t-norm and T2 is a nilpotent t-norm.
Theorem 3.3. Let T1 be a continuous t-norm and T2 a nilpotent t-norm; let there exist two constants a < b ∈ [0, 1] such that μ(T1,I,x) = a, ν(T1,I,x) = b for all x ∈ [0, 1], and let the corresponding generating t-norm Ta of T1 on the generating subinterval [a, b] be strict. Let I: [0, 1]^2 → [0, 1] be a continuous binary function except for the vertical section I(0, y) = 1 for y ∈ [0, a], which satisfies Eq. I3. Then the following statements are equivalent:
(1) The triple of functions (T1, T2, I) satisfies Eq.(4) for all x, y, z ∈ [0, 1].
(2) T1 admits the representation (6), and there exist continuous, strictly decreasing functions ta, t2: [0, 1] → [0, ∞] with ta(1) = t2(1) = 0, ta(0) = ∞, t2(0) < ∞, which are uniquely determined up to positive multiplicative constants, such that the corresponding generating t-norm Ta of T1 on the generating subinterval [a, b] and T2 admit the representation (5) with ta and t2 respectively, and there exists a continuous function c: (0, 1] → (0, ∞), c(0) = 0, uniquely determined up to a positive multiplicative constant depending on the constants for ta and t2, such that I has the form

I(x, y) = { 1, x = 0, y ∈ [0, a],
            0, x ≠ 0, y ∈ [0, a],
            t2^{−1}(min(c(x)·ta((y − a)/(b − a)), t2(0))), x ∈ [0, 1], y ∈ [a, b],
            1, x ∈ [0, 1], y ∈ [b, 1].   (11)

Theorem 3.4. Let T1 be a continuous t-norm and T2 a nilpotent t-norm; let there exist two constants a < b ∈ [0, 1] such that μ(T1,I,x) = a, ν(T1,I,x) = b for all x ∈ [0, 1], and let the corresponding generating t-norm Ta of T1 on the generating subinterval [a, b] be nilpotent. Let I: [0, 1]^2 → [0, 1] be a continuous binary function except for the vertical section I(0, y) = 1 for y ∈ [0, b], which satisfies Eq. I3. Then the following statements are equivalent:
(1) The triple of functions (T1, T2, I) satisfies Eq.(4) for all x, y, z ∈ [0, 1].
(2) T1 admits the representation (6), and there exist continuous, strictly decreasing functions ta, t2: [0, 1] → [0, ∞] with ta(1) = t2(1) = 0, ta(0) < ∞, t2(0) < ∞, which are uniquely determined up to positive multiplicative constants, such that the corresponding generating t-norm Ta of T1 on the generating subinterval [a, b] and T2 admit the representation (5) with ta and t2 respectively, and there exists a continuous function c: (0, 1] → [t2(0)/ta(0), ∞), c(0) = 0, uniquely determined up to a positive multiplicative constant depending on the constants for ta and t2, such that I has the form

I(x, y) = { 1, x = 0, y ∈ [0, b],
            0, x ≠ 0, y ∈ [0, a],
            t2^{−1}(min(c(x)·ta((y − a)/(b − a)), t2(0))), x ∈ (0, 1], y ∈ [a, b],
            1, x ∈ [0, 1], y ∈ [b, 1].   (12)
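For intuition about the solution families (10)-(12), one concrete member of the family (10) can be written down by fixing illustrative data; the following choices are ours and are not singled out by the theorems: ta(y) = t2(y) = −ln y, c(x) = x, a = 0.2, b = 0.8, which gives I(x, y) = ((y − a)/(b − a))^x on [a, b]:

    # One illustrative member of the family (10); all parameter choices are ours.
    a, b = 0.2, 0.8

    def I(x, y):
        if x == 0 or y >= b:
            return 1.0                       # I(0, .) = 1, and I(., y) = 1 on [b, 1]
        if y <= a:
            return 0.0                       # x != 0, y in [0, a]
        return ((y - a) / (b - a)) ** x      # t2^{-1}(c(x) * ta((y-a)/(b-a)))

    # The vertical section I(x, .) climbs from 0 on [0, a] to 1 on [b, 1]:
    print(I(0.5, 0.1), I(0.5, 0.5), I(0.5, 0.9))   # 0.0  ~0.707  1.0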
4 Solutions to the Functional Equations Consisting of Eq.(3) and Eq.(4) When T1 Is a Continuous t-Norm and T2 Is a Strict t-Norm
In this section, we characterize all solutions of the functional equations consisting of Eq.(3) and Eq.(4) when T1 is a continuous t-norm and T2 is a strict t-norm.
Remark 4.1. From Remark 4.1 in [18], we can draw the conclusion that if I is not continuous at the point (0, 0), then I is also not continuous on the partly vertical section I(0, y) for y ∈ [0, a]. On the other hand, note that we mainly investigate the functional equations consisting of Eq.(3) and Eq.(4), and Eq.(3) is the contrapositive symmetry equation of implications. Hence, we get that I is also not continuous on the partly horizontal section I(x, 1) for x ∈ [N(a), 1], where N is a strong negation.
Next, let us find all solutions of the functional equations consisting of Eq.(3) and Eq.(4). To this end, we need to consider the following several cases.
Theorem 4.1. Let T1: [0, 1]^2 → [0, 1] be a continuous t-norm, T2: [0, 1]^2 → [0, 1] a strict t-norm, N: [0, 1] → [0, 1] a strong negation, and I: [0, 1]^2 → [0, 1] a continuous binary function except at the points (0, 0) and (1, 1), which satisfies Eq.(7). Then the quaternion of functions (T1, T2, I, N) does not satisfy the functional equations consisting of Eq.(3) and Eq.(4) for all x, y, z ∈ [0, 1].
Theorem 4.2. Let T1: [0, 1]^2 → [0, 1] be a continuous t-norm and T2: [0, 1]^2 → [0, 1] a strict t-norm. Let there exist two constants a < b ∈ [0, 1], a ∈ (0, 1), such that μ(T1,I,x) = a, ν(T1,I,x) = b for all x ∈ [0, 1], and two continuous, strictly decreasing functions ta, t2: [0, 1] → [0, ∞] with ta(1) = t2(1) = 0, ta(0) = t2(0) = ∞, which are uniquely determined up to positive multiplicative constants, such that the corresponding generating t-norm Ta of T1 on the generating subinterval [a, b] and T2 admit the representation (5) with ta and t2 respectively. Let N: [0, 1] → [0, 1] be a strong negation and I: [0, 1]^2 → [0, 1] a continuous binary function except
the partly vertical section I(0, y) = 1 for y ∈ [0, a] and the partly horizontal section I(x, 1) for x ∈ [N(a), 1], which satisfies Eq.(7). Then the following statements are equivalent:
(1) The quaternion of functions (T1, T2, I, N) satisfies the functional equations consisting of Eq.(3) and Eq.(4) for all x, y, z ∈ [0, 1].
(2) b = 1 and there exists a constant r ∈ (0, ∞) such that I has the following form:

I(x, y) = { 1, (x, y) ∈ {0} × [0, 1] ∪ [0, 1] × {1},
            t2^{−1}(r · ta((N(x) − a)/(1 − a)) · ta((y − a)/(1 − a))), (x, y) ∈ (0, N(a)) × (a, 1),
            0, otherwise.   (13)

Theorem 4.3. Let T1: [0, 1]^2 → [0, 1] be a continuous t-norm, T2: [0, 1]^2 → [0, 1] a strict t-norm, N: [0, 1] → [0, 1] a strong negation, and I: [0, 1]^2 → [0, 1] a continuous binary function except for the partly vertical section I(0, y) = 1 for y ∈ [0, 1] and the partly horizontal section I(x, 1) for x ∈ [0, 1], which satisfies Eq.(7). Then the quaternion of functions (T1, T2, I, N) satisfies the functional equations consisting of Eq.(3) and Eq.(4) for all x, y, z ∈ [0, 1] if and only if I has the following form:

I(x, y) = { 1, (x, y) ∈ {0} × [0, 1] ∪ [0, 1] × {1},
            0, otherwise.   (14)

Remark 4.2. By contrast with the results obtained by Baczyński in [2], our results are more complex. In [28], we showed that the functional equations consisting of Eq.(3) and Eq.(4) have many solutions when both T1 and T2 are strict t-norms, N is a strong negation, and I is a continuous binary function except at the points (0, 0) and (1, 1), while in this paper Theorem 4.1 shows that the above-mentioned functional equations have no solution when T1 is a continuous t-norm, T2 is a strict t-norm, N is a strong negation and I is a continuous binary function except at the points (0, 0) and (1, 1). But there is no contradiction, because in this paper T1 has at least one non-trivial idempotent element; hence neither set of results includes the other.
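The contrapositive symmetry of the family (13) is easy to confirm numerically. The sketch below is ours, with illustrative choices N(x) = 1 − x, ta(y) = t2(y) = −ln y, r = 1 and a = 0.2 (none of these choices are imposed by the theorem):

    import math

    a, r = 0.2, 1.0
    N = lambda x: 1 - x                       # the standard negation
    ta = lambda y: -math.log(y)
    t2_inv = lambda u: math.exp(-u)

    def I(x, y):                              # the family (13) with our choices
        if x == 0 or y == 1:
            return 1.0
        if 0 < x < N(a) and a < y < 1:
            return t2_inv(r * ta((N(x) - a) / (1 - a)) * ta((y - a) / (1 - a)))
        return 0.0

    for x, y in [(0.3, 0.5), (0.6, 0.3), (0.9, 0.95)]:
        assert abs(I(x, y) - I(N(y), N(x))) < 1e-12   # Eq.(3)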
5 Solutions to the Functional Equations Consisting of Eq.(3) and Eq.(4) When T1 Is a Continuous t-Norm and T2 Is a Nilpotent t-Norm
Similar to the analysis in the last section, it is enough to consider the case where I(x, y) is not continuous on the vertical section I(0, y) for y ∈ [0, a] and on the horizontal section I(x, 1) for x ∈ [N(a), 1]. Again, we only investigate the solutions of the functional equations consisting of Eq.(3) and Eq.(4). To do this, we first consider the case in which the corresponding generating t-norm Ta of T1 on the generating subinterval [a, b] is strict.
Theorem 5.1. Let T1: [0, 1]^2 → [0, 1] be a continuous t-norm and T2: [0, 1]^2 → [0, 1] a nilpotent t-norm. Let there exist one constant b ∈ [0, 1] such that μ(T1,I,x) = 0, ν(T1,I,x) = b for all x ∈ [0, 1], and two continuous, strictly decreasing functions t0, t2: [0, 1] → [0, ∞] with t0(1) = t2(1) = 0, t0(0) = ∞, t2(0) < ∞, which are uniquely determined up to positive multiplicative constants, such that the corresponding generating t-norm T0 of T1 on the generating subinterval [0, b] and T2 admit the representation (5) with t0 and t2 respectively. Let N: [0, 1] → [0, 1] be a strong negation and I: [0, 1]^2 → [0, 1] a continuous binary function except at the points (0, 0) and (1, 1), which satisfies Eq.(7). Then the quaternion of functions (T1, T2, I, N) does not satisfy Eq.(3) and Eq.(4) for all x, y, z ∈ [0, 1].
Theorem 5.2. Let T1: [0, 1]^2 → [0, 1] be a continuous t-norm and T2: [0, 1]^2 → [0, 1] a nilpotent t-norm. Let there exist two constants a ∈ (0, 1), b ∈ (0, 1] such that μ(T1,I,x) = a, ν(T1,I,x) = b for all x ∈ [0, 1], and two continuous, strictly decreasing functions ta, t2: [0, 1] → [0, ∞] with ta(1) = t2(1) = 0, ta(0) = ∞, t2(0) < ∞, which are uniquely determined up to positive multiplicative constants, such that the corresponding generating t-norm Ta of T1 on the generating subinterval [a, b] and T2 admit the representation (5) with ta and t2, respectively. Let N: [0, 1] → [0, 1] be a strong negation and I: [0, 1]^2 → [0, 1] a continuous binary function except for the partly vertical section I(0, y) = 1 for y ∈ [0, a] and the partly horizontal section I(x, 1) for x ∈ [N(a), 1], which satisfies Eq.(7). Then the following statements are equivalent:
(1) The quaternion of functions (T1, T2, I, N) satisfies the functional equations consisting of Eq.(3) and Eq.(4) for all x, y, z ∈ [0, 1].
(2) b = 1 and there exists a constant r ∈ (0, ∞) such that I has the following form:

I(x, y) = { 1, (x, y) ∈ {0} × [0, 1] ∪ [0, 1] × {1},
            t2^{−1}(r · ta((N(x) − a)/(1 − a)) · ta((y − a)/(1 − a))), (x, y) ∈ (0, N(a)) × (a, 1),
            0, otherwise.   (15)

Theorem 5.3. Let T1: [0, 1]^2 → [0, 1] be a continuous t-norm, T2: [0, 1]^2 → [0, 1] a nilpotent t-norm, N: [0, 1] → [0, 1] a strong negation, and I: [0, 1]^2 → [0, 1] a continuous binary function except for the partly vertical section I(0, y) = 1 for y ∈ [0, 1] and the partly horizontal section I(x, 1) for x ∈ [0, 1], which satisfies Eq.(7). Then the quaternion of functions (T1, T2, I, N) satisfies the functional equations consisting of Eq.(3) and Eq.(4) for all x, y, z ∈ [0, 1] if and only if I has the following form:

I(x, y) = { 1, (x, y) ∈ {0} × [0, 1] ∪ [0, 1] × {1},
            0, otherwise.   (16)

Next, let us consider the case where the generating t-norm Ta of T1 on the generating subinterval [a, b] is nilpotent.
Theorem 5.4. Let T1: [0, 1]^2 → [0, 1] be a continuous t-norm and T2: [0, 1]^2 → [0, 1] a nilpotent t-norm. Let there exist one constant b ∈ [0, 1] such that μ(T1,I,x) = 0, ν(T1,I,x) = b for all x ∈ [0, 1], and two continuous, strictly decreasing functions t0, t2: [0, 1] → [0, ∞] with t0(1) = t2(1) = 0, t0(0) < ∞, t2(0) < ∞, which are uniquely determined up to positive multiplicative constants, such that the corresponding generating t-norm T0 of T1 on the generating subinterval [0, b] and T2 admit the representation (5) with t0 and t2 respectively. Let N: [0, 1] → [0, 1] be a strong negation and I: [0, 1]^2 → [0, 1] a continuous binary function except at the points (0, 0) and (1, 1), which satisfies Eq.(7). Then the quaternion of functions (T1, T2, I, N) does not satisfy Eq.(3) and Eq.(4) for all x, y, z ∈ [0, 1].
Theorem 5.5. Let T1: [0, 1]^2 → [0, 1] be a continuous t-norm and T2: [0, 1]^2 → [0, 1] a nilpotent t-norm. Let there exist two constants a ∈ (0, 1), b ∈ (0, 1] such that μ(T1,I,x) = a, ν(T1,I,x) = b for all x ∈ [0, 1], and two continuous, strictly decreasing functions ta, t2: [0, 1] → [0, ∞] with ta(1) = t2(1) = 0, ta(0) < ∞, t2(0) < ∞, which are uniquely determined up to positive multiplicative constants, such that the corresponding generating t-norm Ta of T1 on the generating subinterval [a, b] and T2 admit the representation (5) with ta and t2, respectively. Let N: [0, 1] → [0, 1] be a strong negation and I: [0, 1]^2 → [0, 1] a continuous binary function except for the partly vertical section I(0, y) = 1 for y ∈ [0, a] and the partly horizontal section I(x, 1) for x ∈ [N(a), 1], which satisfies Eq.(7). Then the quaternion of functions (T1, T2, I, N) does not satisfy the functional equations consisting of Eq.(3) and Eq.(4) for all x, y, z ∈ [0, 1].
Remark 5.1. In fact, applying the method used by Baczyński in [2], one can easily show that the functional equations consisting of Eq.(3) and Eq.(4) have many solutions when T1 is a strict t-norm, T2 is a nilpotent t-norm, N is a strong negation, and I is a continuous binary function except at the points (0, 0) and (1, 1). In this paper, however, Theorem 5.4 shows that the above-mentioned functional equations have no solution when T1 is a continuous t-norm, T2 is a nilpotent t-norm, N is a strong negation and I is a continuous binary function except at the points (0, 0) and (1, 1). But there is no contradiction, because in this paper T1 has at least one non-trivial idempotent element; hence neither set of results includes the other.
6 Conclusion
In this work, we have summarized the sufficient and necessary conditions of solutions for Eq.(4) and the functional equations consisting of Eq.(3) and Eq.(4), when T1 is a continuous but not Archimedean triangular norm, T2 is a continuous and Archimedean triangular norm, I is an unknown function, and N is a strong negation. We also underline that our method can apply to the three other closely related functional equations, and we have shown that our results and the results obtained by Baczyński in [3] and Qin in [20] do not include each other. In future work we will try to concentrate on the other cases of these functional equations that are not considered in this paper, for example, when both T1 and T2 are continuous but not Archimedean t-norms.
References 1. Baczyński, M.: On a class of distributive fuzzy implications. Internat. J. Uncertainty, Fuzziness and Knowledge-Based Systems 9, 229–238 (2001) 2. Baczyński, M.: Contrapositive symmetry of distributive fuzzy implications. Internat. J. Uncertainty, Fuzziness and Knowledge-Based Systems 10(suppl.), 135–147 (2002) 3. Baczyński, M.: On the distributivity of fuzzy implications over continuous and Archimedean triangular conorms. Fuzzy Sets and Systems 161, 2256–2275 (2010) 4. Baczyński, M., Balasubramaniam, J.: On the distributivity of fuzzy implications over nilpotent or strict triangular conorms. IEEE Trans. Fuzzy Syst. 17(3), 590–603 (2009) 5. Baczyński, M., Drewniak, J.: Conjugacy classes of fuzzy implications. In: Reusch, B. (ed.) Computational Intelligence: Theory and Applications, Fuzzy Days 1999. LNCS, vol. 1625, pp. 287–298. Springer, Heidelberg (1999) 6. Balasubramaniam, J., Rao, C.J.M.: On the distributivity of implication operators over T-norms and S-norms. IEEE Trans. Fuzzy Syst. 12(1), 194–198 (2004) 7. Bustince, H., Burillo, P., Soria, F.: Automorphisms, negations and implication operators. Fuzzy Sets and Systems 134, 209–229 (2003) 8. Combs, W.E., Andrews, J.E.: Combinatorial rule explosion eliminated by a fuzzy rule configuration. IEEE Trans. Fuzzy Syst. 6, 1–11 (1998) 9. Combs, W.E.: Author's reply. IEEE Trans. Fuzzy Syst. 7, 371 (1999) 10. Combs, W.E.: Author's reply. IEEE Trans. Fuzzy Syst. 7, 478–479 (1999) 11. Dick, S., Kandel, A.: Comments on "Combinatorial rule explosion eliminated by a fuzzy rule configuration". IEEE Trans. Fuzzy Syst. 7, 477–477 (1999) 12. Fodor, J.C., Roubens, M.: Fuzzy Preference Modeling and Multi-criteria Decision Support. Kluwer, Dordrecht (1994) 13. Gottwald, S.: A Treatise on Many-Valued Logics. Research Studies Press, Baldock (2001) 14. Klement, E.P., Mesiar, R., Pap, E.: Triangular Norms. Kluwer, Dordrecht (2000) 15. Kuczma, M.: An Introduction to the Theory of Functional Equations and Inequalities: Cauchy's Equations and Jensen's Inequality. PWN-Polish Scientific Publishers and University of Silesia, Warszawa-Kraków-Katowice (1985) 16. Ling, C.H.: Representation of associative functions. Publ. Math. Debrecen 12, 189–212 (1965) 17. Mas, M., Monserrat, M., Torrens, J.: Modus ponens and modus tollens in discrete implications. International Journal of Approximate Reasoning 49, 422–435 (2008) 18. Mendel, J.M., Liang, Q.: Comments on "Combinatorial rule explosion eliminated by a fuzzy rule configuration". IEEE Trans. Fuzzy Syst. 7, 369–371 (1999) 19. Qin, F., Baczyński, M.: Distributive equations of implications based on continuous triangular norms. IEEE Trans. Fuzzy Syst. (accepted) 20. Qin, F., Yang, L.: Distributive equations of implications based on nilpotent triangular norms. International Journal of Approximate Reasoning 51, 984–992 (2010) 21. Qin, F., Zhao, B.: The distributive equations for idempotent uninorms and nullnorms. Fuzzy Sets and Systems 155, 446–458 (2005) 22. Ruiz-Aguilera, D., Torrens, J.: Distributivity of strong implications over conjunctive and disjunctive uninorms. Kybernetika 42, 319–336 (2005) 23. Ruiz-Aguilera, D., Torrens, J.: Distributivity of residual implications over conjunctive and disjunctive uninorms. Fuzzy Sets and Systems 158, 23–37 (2007)
24. Trillas, E.: Sobre funciones de negación en la teoría de conjuntos difusos. Stochastica III, 47–60 (1979) (in Spanish) 25. Trillas, E., Mas, M., Monserrat, M., Torrens, J.: On the representation of fuzzy rules. International Journal of Approximate Reasoning 48, 583–597 (2008) 26. Trillas, E., Alsina, C.: On the law [p ∧ q → r] ≡ [(p → r) ∨ (q → r)] in fuzzy logic. IEEE Trans. Fuzzy Syst. 10, 84–88 (2002) 27. Turksen, I.B., Kreinovich, V., Yager, R.R.: A new class of fuzzy implications: Axioms of fuzzy implication revisited. Fuzzy Sets and Systems 100, 267–272 (1998) 28. Yang, L., Qin, F.: Distributive equations based on fuzzy implications. In: IEEE International Conference on Fuzzy Systems, Korea, pp. 560–563 (2009)
A Novel Cultural Algorithm Based on Differential Evolution for Hybrid Flow Shop Scheduling Problems with Fuzzy Processing Time Qun Niu, Tingting Zeng, and Zhuo Zhou Shanghai Key Laboratory of Power Station, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, China [email protected]
Abstract. Considering the imprecise or fuzzy nature of the data in realworld problems, this paper proposes a novel cultural algorithm based on differential evolution (CADE) to solve the hybrid flow shop scheduling problems with fuzzy processing time(FHFSSP). The mutation and crossover operations of differential evolution (DE) are introduced into cultural algorithm (CA) to enhance the performance of traditional CA. Experimental results demonstrate that the proposed CADE method is more effective than CA, particle swarm optimization (PSO) and quantum evolution algorithm (QA) when solving FHFSSP. Keywords: Cultural algorithm, Differential evolution, Hybrid flow shop scheduling, Makespan, Fuzzy processing time.
1
Introduction
The hybrid flow shop scheduling problem (HFSSP) [1] has been well-known as one of the hardest combinatorial optimization problems. In the most of studies concerned with the HFSSP, processing times were treated as crisp value. However, in many practical applications, information is often ambiguous or imprecise. It may be more appropriate to consider fuzzy processing time for HFSSP to reflect the real-world situations. In the past few decades, a great deal of research work has been performed on fuzzy scheduling problems. The earliest paper in fuzzy scheduling appeared in 1979 [2]. Scheduling problem with fuzzy due-dates was firstly studied by Ishii et al. [3]. In [4], Ishibuchi et al. investigated flow shop scheduling with fuzzy processing times. Kuroda [5] analyzed fuzzy job shop scheduling problem. The open shop scheduling problem with fuzzy allowable time and resource constraint was discussed by Konno [6]. Lei [7] proposed an efficient Pareto archive particle swarm optimization for multi-objective fuzzy job shop scheduling. Recently, Pengjen [8] presented ant colony optimization to minimize the fuzzy makespan and total weighted fuzzy completion time in flow shop scheduling problems. Y. Tang, V.-N. Huynh, and J. Lawry (Eds.): IUKM 2011, LNAI 7027, pp. 121–132, 2011. c Springer-Verlag Berlin Heidelberg 2011
122
Q. Niu, T. Zeng, and Z. Zhou
Zhengyi [9] introduced a hybrid particle swarm optimization for solving flow shop scheduling problem with fuzzy due date. In recent years, as a relatively new member in the family of evolutionary algorithms, cultural algorithms (CA) was first proposed by Reynolds [10] as a vehicle for modeling social evolution and learning in agent-based societies. Since it’s simple to implement when little tuning on its parameters, CA has been successfully used to solve many diverse optimization problems. For instance, evolutionary neural fuzzy network, timetabling problems and constrained optimization. However, for scheduling problems, there are only few applications. Daniel [11] and Tieke [12] employed CA to solve the job shop scheduling problem. This paper presents a novel CA based on differential evolution (CADE) to tackle the fuzzy hybrid flow shop scheduling problems (FHFSSP). Due to lack of a direction track toward the best solution and development burden for different applications in CA, the mutation operator and nonuniform crossover with DE are introduced into CA so as to overcome the premature convergence and increase the performance of traditional CA to make it competitive with other approaches.
2 2.1
Hybrid Flow Shop Scheduling Problems with Fuzzy Processing Time Problem Description
The hybrid flow shop (HFS) system is defined by the set M = {1, ..., i, ..., m} of m processing stages. There is a set Mi = {1, ..., l, ..., mi} of mi identical parallel machines at each stage i. The set J = {1, ..., j, ..., n} of n independent jobs has to be processed in the system defined by the sets M and Mi . It can be seen that each job j is composed of m operations and operation Oij illustrates the ith operation of job Jj . The processing time of operation Oij is delegated as a triangular fuzzy number (TFN) pij = (a1ij , a2ij , a3ij ). The ith operation of a job is processed at the ith stage and can begin only after the completion of former operation from the same sequence. Notations are given as follows: i: The number of the stages. j: Index of job, j = 1, 2, ..., n. sij : The fuzzy starting time of job j in stage i. pij : The fuzzy processing time of job j in stage i. cij : The fuzzy completion time of job j in stage i. Then, the HFSSP can be formulated as followed with the above notation. M in.
n j=1
Subject to cij = sij + pij , i = 1, ..., n; j = 1, ..., m. cij ≤ si(j+1) , i = 1, ..., n; j = 1, ..., m.
j C
A Novel CADE for FHFSSP m i=1 n l=1 n i=1
123
xij = 1, j = 1, 2, ..., n. xil = 1, i = 1, 2, ..., n. xil sij ≤
n i=1
xi(l+1) sij , i, l = 1, ..., n; j = 1, ..., m.
In this paper, maximum fuzzy completion time (makespan) is considered as a criterion which is described: max = C
max
j=1,2,...,n
j C
(1)
j and C max denotes where the fuzzy completion time of job Jj is shown as TFN C the maximum fuzzy completion time. 2.2
Operations on Fuzzy Processing Time
The fuzzy sets theory was proposed by A. Lotfi Zadeh in 1965[13]. In fuzzy context, some operations of fuzzy number are required to be redefined to build a and fuzzy maximum schedule. These operations include the fuzzy addition + max of two fuzzy numbers as well as the ranking methods of fuzzy numbers. Fuzzy addition is used to calculate the fuzzy completion time of operation, fuzzy maximum is used to determine the fuzzy beginning time of operation and the ranking method is for the maximum fuzzy completion time [14]. For two triangular fuzzy numbers: s = (a1 , b1 , c1 ) and t = (a2 , b2 , c2 ), where a1 (a2 ) and c1 (c2 ) are lower and upper bounds, while b1 (b2 ) is the modal value of the triangle. We adopt the following fuzzy addition and fuzzy maximum in order to conserve the triangular form of the obtained result: s + t = (a1 + a2 , b1 + b2 , c1 + c2 )
(2)
max( s+ t) = (max(a1 + a2 ), max(b1 + b2 ), max(c1 + c2 )) (3) The following criteria are adopted to rank s = (a1 , b1 , c1 ) and t = (a2 , b2 , c2 ): Criterion 1: If ct1 ( s) = a1 +2b41 +c1 > (<)ct1 ( t) = a2 +2b42 +c2 , then s > (<) t; Criterion 2: If ct1 ( s) = ct1 ( t), then ct2 ( s) = s2 is compared with ct2 ( t) = t2 to rank them; Criterion 3: If they have the identical ct1 and ct2 , the difference of spreads s) = c1 − a1 is chosen as a third criterion. ct3 ( For s and t, the membership function μs t (z) of s t is defined as: μs t (z) = sup min(μs (x), μt (y)) z=x
(4)
y
In this paper, the max withthe criterion: ifs > t, of s and t is approximated then s t = s; else s t= t. The criterion s t ≈ (a1 a2 , b1 b2 , c1 c2 ) is first used by Sakawa and Mori [15] and named Sakawa criterion for simplicity. Compared with the Sakawa criterion, the new criterion has the following features:
124
Q. Niu, T. Zeng, and Z. Zhou
(1) For s and t, their approximate max is either s or t; (2) Only three pairs of special points (si , ti ) are compared in the Sakawa criterion and three criteria to rank them are used in the new criterion. The approximate max of the new criterion approaches the real max better than that of the Sakawa criterion. Assume that a number of job sequences are constructed. The question is how to evaluate their fuzzy makespans.
3
The Proposed Cultural Algorithm Based on Differential Evolution
In this section, basic concepts concerning the cultural algorithm and a novel cultural algorithm differential evolution (CADE) method are described. Next, using CADE to solve fuzzy hybrid flow shop scheduling problem (FHFSSP) are further represented. 3.1
Cultural Algorithm Differential Evolution (CADE)
CA [10] involves acquiring the belief space from the evolving population space and then exploiting that information to guide the search. CA is dual inheritance systems which utilize a belief space and a population space. The belief space is where the knowledge, acquired by individuals through generations, is preserved. The population space consists of a set of possible solutions to the problem, and can be modeled using any population-based approach. CA provides self-adaptive capabilities which can generate the helpful information for the FHFSSP. However, state-of-the-art CAs exhibit drawbacks related to the lack of a direction track toward the best solution and the development burden (time and cost) for different applications. For the FHFSSP, the diverse population has a more desirable impact on the search ability in CA, therefore, embedding the mutaion and crossover operators with DE into CA can improve the performance of CA. The difference between the proposed CADE and the previous version of cultural differential evolution (CDE) [16] is that CDE uses DE in the population space while CADE introduces the operators of DE into influence function. To enhance the ability of searching for a near-global optimal solution, a novel CADE method is proposed, which combines the cooperative DE and CA to reproduce its search logic capacity and to increase diversity of the population. Fig. 1 shows the flowchart of the proposed CADE method. 3.2
Implementation of CADE for FHFSSP
The CADE method process is described step-by-step as follows. Step 1: Create initial populations The initial population which is represented by a real number encoding can be generated randomly using the following equation: xij,G = rand ∗ mi + 1
(5)
A Novel CADE for FHFSSP Start
125
End
Create initial vectors
Yes No
Create initial belief space
Termination? Yes
Update every position (YDOXDWHWKHSHUIRUPDQFHIXQFWLRQ 8SGDWHWULDOYHFWRUDQGWDUJHWYHFWRU $GMXVWHDFKEHOLHIVSDFH *HQHUDWHHDFKQHZYHFWRU
p=p+1 No
Is this the last Vectorp? 1d p d P
Fig. 1. Flowchart of the proposed CADE method
where rand is a random number between [0, 1) and mi means the number of the identical parallel machines at each stage i. Step 2: Create initial belief space The belief space is the information storage vault in which the vectors can preserve their experiences for other vectors to learn from them indirectly. Create P belief space, Bp (p = 1, 2, ..., P ). Each initial Bp is defined as an empty set. Step 3: Update every target vector Step 3.1: Evaluate the performance objective of each V ectori In this paper, FHFSSP with maximum fuzzy completion time (makespan) is considered as the objective and the criterion. Step 3.2: Update the trial vector Up,i and target vector Vp,i In this step, the first thing should update the trial vector. Compare the fitness value of each initial vector with that of trial vector. If the fitness value of the trial vector is better than that of initial vector, then the trial vector is replaced with the value of the initial vector, such as Eq.6. Then, updating the target vector. Compare the fitness value of all vectors with that of the target vector. If the fitness value of the target vector exceeds those of trial vectors, then the next generation of target vector is replaced with the trial vector, as shown in Eq.7. Up,i(G) , if f (Up,i(G) ) < f (xij,G ) Up,i(G+1) = (6) xij,G , otherwise Vp,i(G+1) =
Up,i(G+1) , if f (Up,i(G+1) ) < f (Vp,i(G) ) Vp,i(G) , otherwise
(7)
where G is the current generation. Step 4: Acceptance function The acceptance function yields the number of vectors that are applied to each belief space as Eq.8. The number of accepted vectors decreases when the number of generations increases.
126
Q. Niu, T. Zeng, and Z. Zhou
Naccepted = ap% × I +
ap% ×I t
(8)
where ap% is a parameter that is set by the user, and must specify the top performing 20% [17], I and t denote the number of vectors and the tth generation. Step 5: Adjust each belief space Bp using an acceptance function This step sorts these vectors in each V ectorp in the order of increasing fitness. Then the best value of each V ectorp is put into the belief space Bp using an acceptance function. The region of belief space BIp is described as BIp = [lp , up ] = {x|lp ≤ x ≤ up , x ∈ R}, where lp and up are represent the lower bound and the upper bound on Bp , respectively. Then, compare the solution of each vector in Bp with lp . If the solution of the vector is smaller than lp , then lp is replaced with the current solution, such as Eq.9. Moreover, compare the solution of each vector in the Bp with up . If the solution of the vector is greater than up, then up is replaced with the current solution, such as Eq.10. lp = up =
xp,i , if xp,i ≤ lp lp , otherwise
(9)
xp,i , if xp,i ≥ up up , otherwise
(10)
Step 6: Generate new V ectorp using lp , up , Up,i and Vp,i Using an influence Eq.11 to adjust every solution of each V ectorp is the first step which can change the direction of each vector in solution space so that it is not easily being trapped at a local optimum. Then, according to Eq.12, 13 and 14 update the trial vector and target vector to generate the each new V ectorp . xp,i(G) =
where F denotes scaling factor, CR represents crossover rate and r1 , r2 , r3 th vectors are three parameter vectors which are chosen in a random fashion from the current population.
A Novel CADE for FHFSSP
4
127
Experimental Results
To illustrate the effectiveness of our approach, 30 sets of instances are randomly generated using the method in [18] which can fuzzify some of the crisp benchmarks to evaluate the performance of the proposed CADE. For each crisp duration x, a three-point triangular fuzzy number is built. The first point is drawn randomly from the interval [δ1 x, x], where δ1 < 1. The center point is set equal to x, and the third point is drawn randomly from the interval [x, δ2 x], where δ2 > 1, as show in Fig. 2. In this paper, we set δ1 = 0.85 and δ2 = 1.3. Taking the first instance for example, the notation of j10s5fp1 means a 10-job, 5-stage and 1-fuzzy problem. The letters j, s and fp denote job, stage and fuzzy problem, respectively. The combinations of the 3 factors gave a total of 30 sets of problems. Then three other algorithms including cultural algorithm (CA), quantum evolution algorithm (QA) and particle swarm algorithm (PSO) are tested to compare with CADE in the experimental results. All methods were implemented using Matlab software and run on a PC with a Pentium (R) Dual 1.6 GHz processor with 2GB of RAM.
P A (t )
1.0
[G1 x, x]
x
[ x, G 2 x ]
time
Fig. 2. Fuzzification of a crisp data(x)
4.1
Parameter Setting
The population size is 20 and maximum number of iteration G is 100, so that the total fitness evaluations are the same for all the compared methods. Thirty independent runs are performed on each instance. In this paper, the parameter configurations for CA, PSO and the proposed CADE are over a large amount of trials. For CA, the probability to apply the situational knowledge p = 0.2 [17]. For PSO, inertia weight w = 0.8 and the acceleration coefficient c1 = c2 = 2 [19]. For the proposed CADE, the operators embedded from DE are crossover rate CR = 0.8, the scaling factor F = 0.6 [20] and the parameter from CA is the probability to apply the situational knowledge p = 0.2 [17].
In the following Tables, “Best solution” denotes the best makespan found in 30 runs, “Average value” indicates the average value of the best solutions found in all runs and “Time” expresses CPU average time. Tables 1, 2 and 3 summarize the comparison results obtained by CADE, CA, QA, and PSO. Each Table consists of 10 instances. From Table 1-3, it can be concluded that CADE performs better than CA, QA and PSO. For all instances, the average values obtained by CADE are always smaller than the corresponding results for CA, QA and PSO. Moreover, CADE can find the best solution of all instances, CA, QA and PSO do not approximate the best solutions of most instances and the computational times for CADE are always slightly longer than those for CA, QA and PSO. In particular, from Table 1, for ten 10 × 5 simple problems, four algorithms find the same “Best solution” for five 10 × 5 problems and CADE outperforms the other three algorithms for the other five 10 × 5 instances. From Table 2 and 3, for ten 10 × 10 instances and ten 15 × 10 instances, the “Best solution” generated by CADE are better than those of CA, QA and PSO.
5
Conclusion
This paper proposes a novel cultural algorithm based on differential evolution (CADE) method to solve FHFSSP with makespan objective. Since processing times were modeled as triangular fuzzy numbers, the makespan is a triangular fuzzy number as well. Combining CA and DE reasonable, CADE method has the ability to obtain better solution for the FHFSSP. The performance of CADE is evaluated in comparison with CA, QA and PSO for 30 instances. Computational results demonstrate the effectiveness of the proposed CADE. With respect to the application, CADE can be applied to some other scheduling problems such as parallel machine scheduling in the future work. Acknowledgments. This work is supported by the National Natural Science Foundation of China (grant no.60804052),Chen Guang project supported by Shanghai Municipal Education Commission and Shanghai Education Development Foundation, Shanghai University “11th Five-Year Plan” 211 Construction Project.
References 1. Linn, R., Zhang, W.: Hybrid flow shop scheduling: A survey. Computers & Industrial Engineering 37, 57–61 (1999) 2. Prade, H.: Using fuzzy set theory in a scheduling problem: a case study. Fuzzy Sets and Systems 2, 153–165 (1979) 3. Ishii, H., Tada, M., Masuda, T.: Two scheduling problems with fuzzy due-dates. Fuzzy Sets and Systems 46, 339–347 (1992) 4. Ishibuchi, H., Yamamoto, N., Murata, T., Tanaka, H.: Genetic algorithms and neighborhood search algorithms for fuzzy flow shop scheduling problems. Fuzzy Sets and Systems 67, 81–100 (1994)
132
Q. Niu, T. Zeng, and Z. Zhou
5. Kuroda, M., Wang, Z.: Fuzzy job shop scheduling. International Journal of Production Economics 44, 45–51 (1996) 6. Konno, T., Ishii, H.: An open shop scheduling problem with fuzzy allowable time and fuzzy resource constraint. Fuzzy Sets and Systems 109, 141–147 (2000) 7. Lei, D.M.: Pareto archive particle swarm optimization for multi-objective fuzzy job shop scheduling problems. Int. J. Adv. Manuf. Technol. 37, 157–165 (2007) 8. Peng-Jen, L., Hsien-Chung, W.: Using ant colony optimization to minimize the fuzzy makespan and total weighted fuzzy completion time in flow shop scheduling problems. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 17, 559–584 (2009) 9. Zhengyi, J., Shanqing, L., Jianmin, Z.: A new hybrid particle swarm optimization for solving flow shop scheduling problem with fuzzy due date. Advanced Materials Research, 189-193, 2746–2753 (2011) 10. Reynolds, R.G.: An introduction to cultural algorithms. In: Sebald, A.V., Fogel, L.J. (eds.) Proceedings of the Third Annual Conference on Evolutionary Programming, pp. 131–139. World Scientific, River Edge, New Jersey (1994) 11. Daniel, C.R., Ricardo, L.B., Carlos, A.C.: Cultural algorithms, an alternative heuristic to solve the job shop scheduling problem. Engineering Optimization 39, 69–85 (2007) 12. Tieke, L., Weiling, W., Wenxue, Z.: Solving flexible job shop scheduling problem based on cultural genetic algorithm. Computer Integrated Manufacturing Systems 16, 861–866 (2010) 13. Deming, L.: Solving fuzzy job shop scheduling problems using random key genetic algorithm. Int. J. Adv. Manuf. Technol. 49, 253–262 (2010) 14. Zadeh, L.A.: Fuzzy Sets. Information and Control 8, 338–353 (1965) 15. Sakawa, M., Wang, Z.: An efficient genetic algorithm for job shop scheduling problems with fuzzy processing time and fuzzy due date. Comput. Ind. Eng. 36, 325–341 (1999) 16. Storn, R., Price, K.V.: Differential evolution: A simple and efficient adaptive scheme for global optimization over continuous spaces. J. Global Optimization 11, 341–359 (1997) 17. Li, B.B., Wang, L.: A hybrid quantum-inspired genetic algorithm for multiobjective flow shop scheduling. IEEE Trans. on Systems, Man, and CyberneticsPart B: Cybernetics 37, 576–591 (2007) 18. Omar, A.G.: A bi-citeria optimization: minimizing the integral value and spread of the fuzzy makespan of job shop scheduling problems. Applied Soft Computing 2, 197–210 (2003) 19. Shi, Y., Eberhart, R.C.: Parameter selection in particle swarm optimization. In: Porto, V.W., Waagen, D. (eds.) EP 1998. LNCS, vol. 1447, pp. 591–600. Springer, Heidelberg (1998a) 20. Liu, J., Lampinen, J.: On setting the control parameter of the differential evolution algorithm. In: Proceedings of the 8th International Mendel Conference on Soft Computing, pp. 11–18 (2002a)
An Over-Relaxed (A, η, m)-Proximal Point Algorithm for System of Nonlinear Fuzzy-Set Valued Operator Equation Frameworks and Fixed Point Problems Heng-you Lan1, , Xiao Wang2 , Tingjian Xiong1 , and Yumin Xiang1 1
2
School of Science, Sichuan University of Science & Engineering, Zigong, Sichuan 643000, P.R. China School of Computer and Science, Sichuan University of Science & Engineering, Zigong, Sichuan 643000, P.R. China [email protected]
Abstract. In order to find the common solutions for nonlinear fuzzy-set valued operator equations and fixed point problems of Lipschitz continuous operators in Hilbert spaces, the purpose of this paper is to construct a new class of over-relaxed (A, η, m)-proximal point algorithm framework with errors by using some results on the resolvent operator corresponding to (A, η, m)-maximal monotonicity. Further, the variational graph convergence analysis for this algorithm framework is investigated. Finally, some examples of applying the main result is also given. The results presented in this paper improve and generalize some well known results in recent literatures. Keywords: (A, η, m)-maximal monotonicity, nonlinear fuzzy-set valued operator equation and fixed point problem, Over-relaxed (A, η, m)proximal Point Algorithm with errors, variational graphical convergence.
1
Introduction
It is well known that variational inequalities and variational inclusions have been widely used as a mathematical programming tool in modeling many optimization and decision making problems. However, facing uncertainty is a constant challenge for optimization and decision making, see, for example, [1-6] and the references therein. In 1989, Chang and Zhu [1] introduced the concepts of the variational inequalities for fuzzy-set valued operators. Several kinds of variational inequalities, variational inclusions and complementarity problems for fuzzy-set valued operators were considered and studied by many authors, see, for example, [2-5] and the references therein. On the other hand, in order to solve the variational inclusions and related optimization problems, the generalized resolvent operator techniques, which are
Corresponding author.
Y. Tang, V.-N. Huynh, and J. Lawry (Eds.): IUKM 2011, LNAI 7027, pp. 133–142, 2011. c Springer-Verlag Berlin Heidelberg 2011
134
H.Y. Lan et al.
extended and modified the projection method and its variant forms including the Wiener-Hopf equations, have been in use for a while and are being applied to several other fields, for stance, equilibria problems in economics, global optimization and control theory, operations research, management and decision sciences, and mathematical programming. See the following example, and [4-17] and the references therein. Example 1. ([18]) Let H be a real Hilbert space, and M : dom(M ) ⊂ H → H be an operator such that M is monotone and R(I + M ) = H, Then based on the Yosida approximation Mρ = 1ρ (I − (I + ρM )−1 ), for each given u0 ∈ dom(M ), there exists exactly one continuous function u : [0, 1) → H such that the first u (t) + M u(t) = 0, order evolution equation holds for all t ∈ (0, ∞), where the u(0) = u0 , derivative u (t) exists in the sense of weak convergence, that is, u(t+h)−u(t) h u (t) as h → 0. In [12], Lan first introduced a new concept of (A, η)-monotone (so called (A, η, m)maximal monotone [14]) operators, which generalizes the (H, η)-mono-tonicity, A-monotonicity and other existing monotone operators as special cases, and studied some properties of (A, η)-monotone operators and defined resolvent operators associated with (A, η)-monotone operators. Further, some (systems of) variational inequalities, nonlinear (random or parametric) operator inclusions, nonlinear (set-valued) inclusions, complementarity problems and equilibrium problems have been studied by some authors in recent years because of their close relations to Nash equilibrium problems. See, for example, [6,13,14,19] and the references therein. ˆ An operator T from a real Hilbert spaces H to the collection F(H) = {E : H → [0, 1] a function} of fuzzy sets over H is called a fuzzy-set valued operator, which means that for each x ∈ H a fuzzy set T (x), denoted by Tx in the sequel, is a function from H to [0, 1]. For each y ∈ H, Tx (y) denotes the membership-grade ˆ of y in Tx . A fuzzy-set valued operator T : H → F(H) is said to be closed if for each x ∈ H, the function y → Tx (y) is upper semicontinuous, that is, for any given net {yk } ⊂ H, satisfying ys → y0 ∈ H, we have lim sup Tx (ys ) ≤ Tx (y0 ). s
ˆ Let E ∈ F(H), α ∈ [0, 1]. Then the set (E)t = {x ∈ H : E(x) ≥ t} is called a ˆ t-cut set of E. Let T : H → F(H) be a fuzzy-set valued operator satisfying the following condition (I): There exists an operator a : H → [0, 1], such that for all x ∈ H, we have (Tx )a(x) ∈ CB(H). Remark 1. ([4]) Let X be a normed vector space. If T is a closed fuzzy-set valued operator satisfying the condition (I), then for all x ∈ X, the set (Tx )a(x) belongs to the collection CB(X) of all nonempty closed and bounded subsets of X. Let H be a real Hilbert space, A : H → H, η : H × H → H be any nonlinear operators, M : H × H → 2H be an (A, η, m)-maximal monotone operator,
An Over-Relaxed (A, η, m)-Proximal Point Algorithm
135
B : H × H → H, f, g : H → H be single-valued operators, λ, > 0 are two constants, a : H → [0, 1] be an operator and T : H → Fˆ (H) be a fuzzy-set valued operator satisfying the condition (I). In this paper, we shall consider the following nonlinear fuzzy-set valued operator equation: Find x, u ∈ H such that Tx (u) ≥ a(x) (i.e., u ∈ (Tx )a(x) ) and η,M g(x) − Jρλ,A [(1 − λ)A(g(x)) + λ(A(f (x)) − ρB(x, u))] = 0,
(1)
η,M = (A + ρλM )−1 is the where , ρ, λ are three positive constants, and Jρλ,A resolvent operator associated with the set-valued operator M . Based on the definition of the resolvent operators associated with (A, η, m)maximal monotone operators, Eqn. (1) can be written as
0 ∈ A(g(x)) − A(f (x)) + ρ[B(x, u) + M (g(x))].
(2)
We remark that for appropriate and suitable choices of H, A, η, B, M, f, g, T and λ, , one can know that the problem (1) or (2) includes a number (systems) of quasi-variational inclusions, generalized (random or parametric) quasivariational inclusions, quasi-variational inequalities, implicit quasi-variational inequalities as special cases. See, for example, [6,9,10,16] and the references therein. Motivated and inspired by these recent algorithmic developments in [4,19], especially, the approach based on the variational graph convergence for approximating the solutions of the nonlinear variational inclusions [6,11,17], in this paper, by using some results on the resolvent operator corresponding to (A, η, m)-maximal monotonicity, we shall study the variational graph convergence analysis for a new class of over-relaxed (A, η, m)-proximal point algorithm framework with errors in the context of finding the common solutions for equation (1) and fixed point problems of Lipschitz continuous operators in Hilbert spaces
2
Preliminaries
Let H be a real Hilbert space endowed with a norm · and an inner product ·, · , CB(H) denote the family of all nonempty closed bounded subsets of H and 2H denote the family of all the nonempty subsets of H. In the sequel, we give some concept and lemmas needed later. Definition 1. Let A : H → H be a single-valued operator. Then an operator B : H × H → H is said to be (i) (π, υ)-relaxed cocoercive with respect to A in the first argument, if there exist positive constants π and υ such that for x, y, w ∈ H, B(x, w) − B(y, w), A(x) − A(y) ≥ −πx − y2 + υB(x, w) − B(y, w)2 . (ii) Lipschitz continuous with constant σ in the first argument if there exists a constant σ > 0 such that B(x, z) − B(y, z) ≤ σx − y, ∀x, y, z ∈ H.
136
H.Y. Lan et al.
In a similar way, we can define (relaxed) cocoercivity and Lipschitz continuity of the operator B(·, ·) in the second argument. Remark 2. The notion of the cocoercivity is applied in several directions, especially to solving variational inequality problems using the auxiliary principle and projection methods [15], while the notion of the relaxed cocoercivity is more general than the strong monotonicity as well as cocoercivity. Several classes of relaxed cocoercive variational inequalities and variational inclusions have been studied in [7, 11-17]. Definition 2. A single-valued operator η : H × H → H is said to be τ -Lipschitz continuous if there exists a constant τ > 0 such that η(x, y) ≤ τ x − y for all x, y ∈ H. Definition 3. Let η : H×H → H and A : H → H be two single-valued operators. Then set-valued operator M : H → 2H is said to be (i) m-relaxed η-monotone if there exists a constant m > 0 such that u − v, η(x, y) ≥ −mx − y2 ,
∀x, y ∈ H, u ∈ M (x), v ∈ M (y);
(ii) (A, η, m)-maximal monotone if M is m-relaxed η-monotone and R(A + ρM ) = H for every ρ > 0. Similarly, we can define strictly η-monotonicity and strongly η-monotonicity of nonlinear operators. Remark 3. (1) If m = 0 or A = I or η(x, y) = x − y for all x, y ∈ H, (A, η, m)maximal monotonicity (so-called (A, η)-monotonicity [12], (A, η)-maximal relaxed monotonicity [7]) reduces to the (H, η)-monotonicity, H-monotonicity, Amonotonicity, maximal η-monotonicity, classical maximal monotonicity (see, for example, [7,9, 12-16, 20]). Further, we note that the idea of this extension is so close to the idea of extending convexity to invexity introduced by Hanson in [21], and the problem studied in this paper can be used in invex optimization and also for solving the variational-like inequalities as a direction for further applied research, see, related works in [22,23] and the references therein. (2) Moreover, operator M is said to be generalized maximal monotone (in short GMM-monotone) if: (i) M is monotone; (ii) A + ρM is maximal monotone or pseudomonotone for ρ > 0. Example 2. ([7]) Suppose that A : H → H is r-strongly η-monotone, and f : H → R is locally Lipschitz such that ∂f , the subdifferential, is m-relaxed ηmonotone with r − m > 0. Clearly, we have u − v, η(x, y) ≥ (r − m)x − y2 , where u ∈ A(x) + ∂f (x) and v ∈ A(y) + ∂f (y) for all x, y ∈ H. Thus, A + ∂f is η-pseudomonotone, which is indeed, η-maximal monotone. This is equivalent to stating that A + ∂f is (A, η, m)-maximal monotone. Definition 4. Let A : H → H be a strictly η-monotone operator and M : H → 2H be an (A, η, m)-maximal monotone operator. Then the corresponding general ρ,A solvent operator Jη,M : H → H is defined by ρ,A (x) = (A + ρM )−1 (x), Jη,M
∀x ∈ H.
An Over-Relaxed (A, η, m)-Proximal Point Algorithm
137
Remark 4. The (A, η, m)-resolvent operators include the corresponding resolvent operators associated with (H, η)-monotone operators, maximal η-monotone operators, H-monotone operators, A-monotone operators, η-subdifferential operators, the classical maximal monotone operators [7,9, 12-14, 20]. Lemma 1. ([12]) Let η : H × H → H be τ -Lipschitz continuous, A : H → H be r-strongly η-monotone and M : H → 2H be (A, η, m)-maximal monotone. Then ρ,A τ the resolvent operator Jη,M : H → H is r−ρm -Lipschitz continuous. Definition 5. Let T : H → Fˆ (H) be a closed fuzzy-set valued operator satisfying ˆ the condition (I). Then, T is said to be ξ-H-Lipschitz continuous if ˆ x )a(x) , (Ty )a(y) ) ≤ ξx − y, H((T
∀x, y ∈ H,
ˆ is the Hausdorff metric on CB(H). where ξ > 0 is a constant and H Definition 6. Let M n , M : H → 2H be (A, η, m)-maximal monotone operators for n = 0, 1, 2, . . . . Let A : H → H be r-strongly η-monotone and β-Lipschitz A−G continuous. The sequence M n is graph-convergent to M , denoted M n −→ M , if for every (x, y) ∈ graph(M ) there exists a sequence (xn , yn ) ∈ graph(M n ) such that xn → x, yn → y as n → ∞. Based on Definition 6 and Theorem 2.1 in [17], we have the following lemma. Lemma 2. Let M n , M : H → 2H be (A, η, m)-maximal monotone operators for A−G n = 0, 1, 2, · · · . Then the sequence M n −→ M if and only if ρ,A ρ,A Jη,M n (x) → Jη,M (x), ∀x ∈ H, ρ,A ρ,A where Jη,M = (A + ρM n )−1 , Jη,M = (A + ρM )−1 , ρ > 0 is a constant, and A : H → H is r-strongly η-monotone and β-Lipschitz continuous.
3
Proximal Point Algorithm and Graph-Convergence
In this section, by using some results on the resolvent operator corresponding to (A, η, m)-maximal monotonicity, we shall develop a new perturbed iterative algorithm framework based on the variational graph convergence for approximating the common solutions for nonlinear fuzzy-set valued operator equations and fixed point problems for Lipschitz continuous operators in Hilbert spaces. Firstly, we note that x ∈ H is called a fixed point of a fuzzy-set valued operator (as generalization of fixed point of a set-valued (multivalued) operator) if T : H → Fˆ (H) is a fuzzy-set valued operator satisfying the condition (I) and x ∈ (Tx )a(x) ∈ CB(H).
138
H.Y. Lan et al.
Further, if (x∗ , u∗ ) is a solution of Eqn. (1) and g(x∗ ) ∈ F (Q), where F (Q) is the set of fixed points of Q, that is, F (Q) = {x ∈ H : Q(x) = x}, then we note that the Eqn. (1) can be rewritten as λρ,A ∗ g(x∗ ) = Q(g(x∗ )) = Q(Jη,M (z )), z ∗ = (1 − λ)A(g(x∗ )) + λ(A(f (x∗ )) − ρB(x∗ , u∗ )). This formulation allows us to construct the following perturbed iterative algorithm framework with errors for finding a common element of two different sets, that is, the set of the solutions for nonlinear fuzzy-set valued operator equation (1) and the set of fixed point for a Lipschitz continuous operator. Algorithm 1. Step 1. For an arbitrary initial point x0 ∈ H, take z0 ∈ H such that λρ,A g(x0 ) = Jη,M (z0 ).
Step 2. For all b = c, d, e, choose sequences {bn } ⊂ H is error sequence to take into account a possible inexact computation of the operator points, which satisfy the following conditions: lim bn = 0,
n→∞
∞
bn − bn−1 < ∞.
n=1
Step 3. Let the sequence {(xn , zn )} ⊂ H×H, a : H → [0, 1] and Tx (un ) ≥ a(xn ) satisfy λρ,A xn+1 = (1 − α)xn + α[xn − g(xn ) + Q(Jη,M n (zn ))] + αdn + en , (3) zn = (1 − λ)A(g(xn )) + λ(A(f (xn )) − ρB(xn , un )) + cn . λρ,A n −1 where α, λ, ρ, > 0 are constants and Jη,M for all n ∈ N. n = (A + λρM ) Step 4. Choose Tx (un+1 ) ≥ a(xn+1 ) such that (see [24])
un − un+1 ≤ (1 +
1 ˆ xn )a(x ) , (Txn+1 )a(x ) ). )H((T n n+1 n+1
(4)
Step 5. If xn , un , zn , cn , dn and en satisfy (3) and (4) to sufficient accuracy, stop; otherwise, set n := n + 1 and return to Step 2. Now we prove the existence of a solution of the problem (1) and the convergence of Algorithm 1. Theorem 1. Let H be a real Hilbert space, η, A, M, B and T , g, f be the same as in the Eqn. (1). Also suppose that the following conditions hold: (H1 ) η is τ -Lipschitz continuous, Q : H → H is κ-Lipschitz continuous, and A is r-strongly η-monotone and σ-Lipschitz continuous; (H2 ) g is δ-strongly monotone and υ-Lipschitz continuous, f is ς-Lipschitz ˆ continuous, T is a closed ξ-H-Lipschitz continuous fuzzy-set valued operator satisfying the condition (I) with a function a : H → [0, 1]; (H3 ) B is (π, ι)-relaxed cocoercive with respect to f in the first argument and Lipschitz continuous with constants β and in the first and second variable,
An Over-Relaxed (A, η, m)-Proximal Point Algorithm
139
respectively, where f : H → H is defined by f (y) = A ◦ f (y) = A(f (y)) for all y ∈ H; (H4 ) for n = 0, 1, 2, · · · , M n : H → 2H is (A, η, m)-maximal monotone operA−G ators with M n −→ M ; (H5 ) there exist positive constants λ, and ρ such that √ ⎧ k = 1 − 2δ + υ 2 < 1, h = ξ + m(1−k) < β, ⎪ λκτ ⎪ ⎪ συ r l ⎨ l = r(1−k)−(1−λ)κτ < σς, ρ < min{ , }, λκτ m h (5) 2 2 2 ς 2 − l 2 )(β 2 − h2 ), ιβ > π + lh + (σ ⎪ √ 2 ⎪ ⎪ 2 ⎩ (ιβ −π−lh)2 −(σ2 2 ς 2 −l2 )(β 2 −h2 ) ρ − ιββ 2−π−lh . −h2 < β 2 −h2 Then the iterative sequence (xn , un ) defined by Algorithm 1 converges strongly to a solution (x∗ , u∗ ) of the problem (1). Proof. By the assumptions of the theorem, (3) and (4), we have xn − xn−1 − [g(xn ) − g(xn−1 )] ≤ 1 − 2δ + υ 2 xn − xn−1 , B(xn , un ) − B(xn , un−1 ) ≤ un − un−1 ˆ xn )a(x ) , (Txn−1 )a(x ) ) ≤ ξ(1 + n−1 )xn − xn−1 , ≤ (1 + n−1 )H((T n
ακτ r−ρm cn − cn−1 + ακ(εn + ρλ,A ρλ,A Jη,M p (zp ) − Jη,M (zp ) for
εn−1 ) + αdn − dn−1 + en − en−1 and εp = p = n − 1, n. + ρλξ + λ σ 2 2 ς 2 − 2ριβ 2 + 2ρπ + ρ2 β 2 , θ = 1 − α + √Let ϑ = (1 − λ)συ ακτ ϑ α 1 − 2δ + υ 2 + r−ρm . Then, we know that θn ↓ θ as n → ∞. The condition (5) implies that 0 < θ < 1 and so there exist n0 > 0, θ0 ∈ (θ, 1) such that θn ≤ θ0 for all n ≥ n0 . Hence, it follows from (6) that xn+1 − xn ≤ θ0 xn − xn−1 + ωn ≤ θ0n−n0 xn0 +1 − xn0 +
n−n 0
θ0i−1 ωn−(i−1) ,
i=1
which implies that for any m ≥ n > n0 , xm − xn ≤
m−1
xj+1 − xj
j=n
≤
m−1 j=n
θ0j−n0 xn0 +1 − xn0 +
m−1 j−n 0
θ0i−1 ωj−(i−1) .
(7)
j=n i=1
From the hypothesis of Algorithm 1, Lemma 2 and (7), it follows that {xn } is a Cauchy sequence, that is, there exists x∗ ∈ H such that xn → x∗ as n → ∞. Next, we prove that un → u∗ ∈ (Tx∗ )a(x∗ ) . In fact, condition (H2 ) implies that {un } is also Cauchy sequence in H. Let un → u∗ . In the sequel, we will show that u∗ ∈ (Tx∗ )a(x∗ ) . Noting un ∈ (Txn )a(xn ) , from the results in [24], we have d(u∗ , (Tx∗ )a(x∗ ) ) = inf{un − y : y ∈ (Tx∗ )a(x∗ ) } ≤ u∗ − un + d(un , (Tx∗ )a(x∗ ) )
Hence d(u∗ , (Tx∗ )a(x∗ ) ) = 0 and therefore u∗ ∈ (Tx∗ )a(x∗ ) . By continuity and the hypothesis of Algorithm 1, we know that (x∗ , u∗ ) satisfies the Eqn. (1). This completes the proof. 2 Remark 5. Condition (H5 ) in Theorem 1 holds for some suitable value of constants, for example, λ = ς = 0.8, ρ = 4.5, = 1, δ = 0.3, υ = 0.6, σ = τ = 0.5, β = 0.08, ξ = 0.01, κ = 0.4, = 0.05, π = m = 0.02, ι = 10, r = 0.2. Remark 6. If M is a (H, η)-monotone operator, H-monotone operator, Amonotone operator, maximal η-monotone operator and classical maximal monotone operator and dn = 0 or en = 0 or cn = 0 for all n ≥ 0 in Algorithm 1, then we can obtain the corresponding results of Theorem 3.1. Our results improve and generalize the corresponding results of [6,9,16,19] and many other recent works.
An Over-Relaxed (A, η, m)-Proximal Point Algorithm
141
Example 3. Assume that H is a real Hilbert space, A : H → H is r-strongly η-monotone, and ϕ : H → R is a locally Lipschitz functional such that ∂ϕ, the subdifferential, is m-relaxed η-monotone with r − m > 0. This is equivalent to stating that A + ∂ϕ is (A, η, m)-maximal monotone. Thus, if all the conditions for Theorem 1 are satisfied, one can apply Theorem 1 to the approximationsolvability of the operator inclusion problem of finding x ∈ H and u ∈ (Tx )a(x) such that A(f (x)) ∈ (1 + ρ)A(g(x)) + ρB(x, u) + ρ∂ϕ(g(x)).
4
Conclusions
In this paper, we first introduce a class of nonlinear fuzzy-set valued operator equations. Then, by using some results on the resolvent operator corresponding to (A, η, m)-maximal monotonicity, we construct a new class of over-relaxed (A, η, m)-proximal point algorithm framework with errors and investigate the variational graph convergence analysis for this algorithm framework in the context of finding the common solutions for the nonlinear equations and fixed point problems of Lipschitz continuous operators in Hilbert spaces. Furthermore, we also give some examples of applying the main result. The results presented in this paper improve and generalize some well known results in recent literatures. Acknowledgments. This work was supported by the Sichuan Youth Science and Technology Foundation (08ZQ026-008), the Open Foundation of Artificial Intelligence of Key Laboratory of Sichuan Province (2009RZ001) and the Scientific Research Fund of Sichuan Provincial Education Department (10ZA136). The authors are grateful to the editors and referees for valuable comments and suggestions.
References 1. Chang, S.S., Zhu, Y.G.: On variational inequalities for fuzzy mappings. Fuzzy Sets and Systems 32, 359–367 (1989) 2. Farhadinia, B.: Necessary optimality conditions for fuzzy variational problems. Inform. Sci. 181(7), 1348–1357 (2011) 3. Lan, H.Y.: An approach for solving fuzzy implicit variational inequalities with linear membership functions. Comput. Math. Appl. 55(3), 563–572 (2008) 4. Lee, B.S., Khan, M.F.: Salahuddin: Fuzzy nonlinear set-valued variational inclusions. Comput. Math. Appl. 60, 1768–1775 (2010) 5. Liu, Z., Debnath, L., Kang, S.M., Ume, J.S.: Generalized mixed quasivariational inclusions and generalized mixed resolvent equations for fuzzy mappings. Appl. Math. Comput. 149(3), 879–891 (2004) 6. Agarwal, R.P., Verma, R.U.: General implicit variational inclusion problems based on A-maximal (m)-relaxed monotonicity (AMRM) frameworks. Appl. Math. Comput. 215, 367–379 (2009) 7. Agarwal, R.P., Verma, R.U.: General system of (A, η)-maximal relaxed monotone variational inclusion problems based on generalized hybrid algorithms. Commun. Nonlinear Sci. Num. Sim. 15(2), 238–251 (2010)
142
H.Y. Lan et al.
8. Cai, L.C., Lan, H.Y., Zou, Y.Z.: Perturbed algorithms for solving nonlinear relaxed cocoercive operator equations with general A-monotone operators in Banach spaces. Commun. Nonlinear Sci. Numer. Simulat. 16(10), 3923–3932 (2011) 9. Fang, Y.P., Huang, N.J.: H-Monotone operator and resolvent operator technique for variatonal inclusions. Appl. Math. Comput. 145, 795–803 (2003) 10. He, X.F., Lou, J., He, Z.: Iterative methods for solving variational inclusions in Banach spaces. J. Comput. Appl. Math. 203(1), 80–86 (2007) 11. Lan, H.Y., Cai, L.C.: Variational convergence of a new proximal algorithm for nonlinear general A-monotone operator equation systems in Banach spaces. Nonlinear Anal. TMA. 71(12), 6194–6201 (2009) 12. Lan, H.Y.: A class of nonlinear (A, η)-monotone operator inclusion problems with relaxed cocoercive mappings. Adv. Nonlinear Var. Inequal. 9(2), 1–11 (2006) 13. Lan, H.Y.: Approximation solvability of nonlinear random (A, η)-resolvent operator equations with random relaxed cocoercive operators. Comput. Math. Appl. 57(4), 624–632 (2009) 14. Lan, H.Y.: Sensitivity analysis for generalized nonlinear parametric (A, η, m)maximal monotone operator inclusion systems with relaxed cocoercive type operators. Nonlinear Anal. TMA. 74(2), 386–395 (2011) 15. Verma, R.U.: A-monotononicity and applications to nonlinear variational inclusion problems. J. Appl. Math. Stochastic Anal. 17(2), 193–195 (2004) 16. Verma, R.U.: A-monotone nonlinear relaxed cocoercive variational inclusions. Central European J. Math. 5(2), 386–396 (2007) 17. Verma, R.U.: A generalization to variational convergence for operators. Adv. Nonlinear Var. Inequal. 11(2), 97–101 (2008) 18. Komura, Y.: Nonlinear semigroups in Hilbert space. J. Math. Society Japan 19, 493–507 (1967) 19. Petrot, N.: A resolvent operator technique for approximate solving of generalized system mixed variational inequality and fixed point problems. Appl. Math. Letters 23(4), 440–445 (2010) 20. Zeidler, E.: Nonlinear functional analysis and its applications, vol. I. Springer, New York (1986) 21. Hanson, M.A.: On sufficiency of Kuhn-Tucker Conditions. J. Math. Anal. Appl. 80(2), 545–550 (1981) 22. Soleimani-damaneh, M.: Generalized invexity in separable Hilbert spaces. Topology 48(2-4), 66–79 (2009) 23. Soleimani-damaneh, M.: Infinite (semi-infinite) problems to characterize the optimality of nonlinear optimization problems. European J. Oper. Res. 188(1), 49–56 (2008) 24. Nadler, S.B.: Muliti-valued contraction mappings. Pacific J. Math. 30, 475–488 (1969)
Reliability-Based Route Optimization of a Transportation Network with Random Arc Capacities and Time Threshold Tao Zhang, Bo Guo, and Yuejin Tan College of Information Systems and Management, National University of Defense Technology, 410073 Changsha, China
Abstract. The classical route optimization problem of a network focuses on the shortest or fastest route mainly under the assumption that all roads will not fail. In fact, the capacities of roads in a transportation network are not determinate but random because of the traffic accidents, maintenance or other activities. So a most reliable route from source to sink under the time threshold may be more important than the shortest or fastest route sometimes. This paper describes a stochastic Petri net-based simulation approach for reliability-based route optimization of a transportation network. The capacities of arcs may be in a stochastic state following any discrete or continuous distribution. The transmission time of each arc is also not a fixed number but stochastic according to its current capacity and demand. To solve this problem, a capacitated stochastic colored Petri net is used for modeling the system behavior. By the simulation, the optimal route with highest reliability can be obtained. Finally, an example of a transportation network with random arc capacities is given. Keywords: Route optimization, Reliability, Multi-state, transportation network, Petri net, time threshold.
1
Introduction
The classical route optimization problems are the variants of the shortest path problem, which focus on how to obtain the shortest route, least cost route, quickest route or route with some combination of multiple criteria when goods or commodities are transmitted from one node (source) to another node (sink) through the network[1–4]. In a transportation network, it is an important issue to reduce the total transportation time through the network. Hence, the quickest route optimization problem, a time version of the shortest path problem, is proposed. This problem is for finding a quickest route with minimum transmission time to send a given amount of data from the source to the sink, where each arc has two attributes; the capacity and the duration time[5]. In most of the studies, the capacity and the duration time of each arc are both assumed to be deterministic. However, in many real transportation networks, the capacity of each road (arc) is stochastic due to failure, partial failure, traffic accident, maintenance, etc. The state of each road may not be fully working either fully Y. Tang, V.-N. Huynh, and J. Lawry (Eds.): IUKM 2011, LNAI 7027, pp. 143–156, 2011. c Springer-Verlag Berlin Heidelberg 2011
144
T. Zhang, B. Guo, and Y. Tan
failed, but be in an intermediate state. Such a network is named a multi-state network, also called as a stochastic-flow network. In this network, the shortest or quickest route may not be the most reliable route, but sometimes it would be preferred to obtain the most reliable route under a given time threshold. To obtain the most reliable route, we need to calculate the reliability of a given route firstly. The classical network reliability problem is the two-terminal reliability (2T R) problem which assumes both network and components have only binary states, fully working or fully failed states. However, the traditional binary reliability theory and models fail to characterize the behavior of a multi-state network[6]. Hence, some multi-state reliability analysis approaches and models have been proposed to cope with this problem. The multi-state two-terminal reliability at demand level d (M 2T Rd) is the original one which is defined as the probability that d units of demand can be supplied from source to sink through multi-state arcs[7, 8]. Then it is extended to some more complex problems, such as multi-commodity reliability[9–13], M 2T Rd under time threshold (M 2T R(d,T ))[14, 15] and under budget constraint (M 2T R(d,c))[16, 17]. Many previous models for computing M 2T Rd mostly based on minimal cut (MC) or minimal path (MP). In such approaches, the key problem is to obtain all possible d-MCs or d-MPs under d units of demand. Then the reliability can be computed directly if d-MCs or d-MPs are given. These extended problems, i.e. M 2T R(d,T ) and M 2T R(d,c), can also be solved by the MC or MP based approaches where the difference is the algorithm of obtaining MCs or MPs. Considering a stochastic-flow network, Lin Y.K.[18] extended the fastest route problem to evaluating the system reliability that d units of data could be sent under the time constraint T . In his study, the lead time of each arc is related to its current capacity and demand d, and the data are transmitted through two minimal paths simultaneously in order to reduce the transmission time. A simple algorithm is proposed to generate all (d, T )-MPs and the system reliability can then be computed in terms of (d, T )-MPs. Moreover, the optimal pair of minimal paths with highest system reliability could be obtained. The MC/MP-based approach is often a tedious process. It is also hard to be applied when there exist some arcs whose states or lead times follow different continuous probability distributions. In fact, each road of a transportation network may be in a stochastic state following a probability distribution and the total transportation time of a given route is not determinate but related with the type of vehicle, the current states of roads and the transportation demand. Hence, more practical algorithms have been proposed to solve this problem. Simulation is a very effective method which trade off accuracy for execution time. Ramirez-Marquez J.E. et al.[6] presented a Monte-Carlo simulation approach for approximating the M 2T Rd. Janssens G.K. et al.[19] considered the uncertainty in travel times caused by congestion on roads and applied a methodology in which a heuristic was used to find a solution for routing and the time Petri net was used to evaluate the sensitivity of the route solution. This paper describes a capacitated stochastic colored Petri net (CSCPN) based simulation approach for reliability-based route optimization of a transportation
Reliability-Based Route Optimization of a Transportation Network
145
network with arcs whose capacities are stochastic following any discrete or continuous distribution. The rest of the paper is organized as follows. Section 2 describes the problem. Section 3 presents the CSCPN model and how to describe the dynamic behavior of a multi-state transportation network. Section 4 gives how to obtain the most reliable route by the CSCPN-based simulation. Section 5 presents a computational experiment. The final section makes a conclusion.
2
Problem Description
There is a transportation network G = (N, A) with a required transportation demand d from source node s to sink node t in the time threshold T by the specified vehicles. The required minimal capacity of each road is c so that the vehicles can pass by. N = {n1 , n2 , ..., nm } represents the set of nodes, m is the number of nodes and ni represents the ith node. A = {a1 , a2 , ..., an } represents the set of arcs, n is the number of arcs and ai represents the ith arc. The current state (capacity) of arc ai is represented by ci . It is stochastic with a given probability distribution. Let li and ui be the smallest and largest capacities of arc ai respectively, so li ≤ ci ≤ ui . Let δ(ai , ci , d) be the transmission time to transmit d units of demand through arc ai under its capacity ci . If ci < c, δ(ai , ci , d) = ∞. It is also stochastic and described by a given probability distribution. The aim is to obtain the most reliable route when the demand d, the time threshold T and the required minimal capacity c are given. The assumptions in this study are as below. (1) Each node is perfectly reliable. (2) The capacities of different arcs are statistically independent.
3
Capacitated Stochastic Colored Petri Net
Due to the uncertainties in a multi-state network, the Petri-net method is suitable for describing the dynamic behavior of the system. First created in 1962 and reported in the thesis of Petri[20], Petri-net are an adaptable and versatile, yet simple, graphical modeling tool used for dynamic system representation. In this study, CSCPN, a kind of advanced stochastic colored Petri net (SCPN), is advanced for analyzing the dynamic behavior of multi-state transportation network. 3.1
Petri Net
A basic Petri net (PN) is a kind of directed graph which has four elements: places, transitions, arcs and tokens. The arcs connect the places to the transitions and the transitions to the places. Each place is marked with m tokens where m is a non-negative integer, called marking. Such a token can be used to stand for state, data, items or resources. After the enabling condition of a transition is satisfied, some tokens may move out or in the corresponding places. The number of tokens moving out or in is related to the weight of the corresponding arc[21].
146
T. Zhang, B. Guo, and Y. Tan
Formally, a PN is a five-tuple P N = P, T, P re, P ost, M0 ,
(1)
where[22]: P = {P1 , P2 , ..., PL } is a finite, nonempty set of places, L > 0. T = {T1 , T2 , ..., TJ } is a finite, nonempty set of transitions, J > 0. P re : P × T → N + is the input incidence function such that P re(Pi , Tj ) is the weight of the arc directed from place Pi to transition Tj . N + is the set of non-negative integers. P ost : P × T → N + is the output incidence function such that P ost(Pi , Tj ) is the weight of the arc directed from transition Tj to place Pi . M0 : P → N + is the initial marking function that associates zero or more tokens to each place. 3.2
SCPN
To describe the duration of behavior, timed Petri net (TPN) was defined by associating time with the firing of transitions in PN. A special case of TPN is stochastic Petri net (SPN) where the firing times are considered random variables. In many systems, they present similar processes, and differ from each other only by their inputs and outputs[23]. To reduce the quantity of places, transitions and arcs in a PN, a more compact representation of a Petri net , called as colored Petri net (CPN), was developed[24]. In CPN, each token moving between places and transitions is assigned with a color. The concept of color is analogous to the concept of type, common among the programming languages. It is easy to know that SCPN have combined the characteristics of both SPN and CPN. SCPN is defined as a directed graph represented by a eight-tuple SCP N = P, T, Co, H, P re, P ost, F T, M0 ,
(2)
where: P = {P1 , P2 , ..., PL } is a finite, nonempty set of places, L > 0. T = {T1 , T2 , ..., TJ } is a finite, nonempty set of transitions, J > 0. Co : P ∪ T → C is a color function that associates with each element in P ∪ T a non empty set of colors. C is the set of all possible token colors. In this study, a token color is represented by not an integer but a structure with multi-values. Co(Pi ) is the set of possible token colors in place Pi ∈ P . Co(Ti ) is the set of possible token colors in transition Ti ∈ T . H : P × T → CMS is defined for an inhibitor arc that connects a place to a transition. An inhibitor arc between Pi ∈ P and Tj ∈ T (i.e. H(Pi , Tj ) = {c}, c ∈ C) implies that as soon as there are a token whose color is c in Pi , the arc inhibits the firing of Tj . CMS is a multiset over the non-empty set C. P re is the pre-incidence matrix. P re(Pi , Tj ) : Co(Tj ) → Co(Pi )MS is the input incidence mapping function from the set of occurrence colors of Tj ∈ T to the set of multisets over Co(Pi ), i.e. P re(P1 , T1 ) : c1 → {c1 , c3 } implies that if
Reliability-Based Route Optimization of a Transportation Network
147
there are token colors c1 , c3 in the place P1 , the firing condition of transition T1 on the place P1 is satisfied. P ost is the post-incidence matrix. P ost(Pi , Tj ) : Co(Tj ) → Co(Pi )MS is the output incidence mapping function from the combination of Tj and Pi to the multiset over the set of colors of Co(Pi ). F T : T × C → R+ is the firing time function such that F T (Tj , ci ) is the duration of the firing of transition Tj when the enabled token color is ci ∈ Co(Tj ). R+ is the set of non-negative real numbers. F T is according to a kind of given distribution and the token colors. M0 : P → CMS is the initial token color set function that associates zero or more token colors to each place. M0 (Pi ) = {c1 , c2 , ..., cn }, ci ∈ C, i = 1, 2, ..., n, implies that the initial token color set of place Pi is {c1 , c2 , ..., cn }. For example, there are two components whose failure and maintenance time follow different distribution types. The requirements of maintenance resources are also different. One repairman of type A should be available for the maintenance of the first component and one repairman of each type A and B should be available for second one. Fig.1(a) shows a SPN model for this system. The places P1 , P3 respectively stand for the working condition of the first and second component. The places P2 , P4 respectively stand for the failed condition of the first and second component. The places P5 , P6 respectively stand for the state of repairman of type A and B. The transitions T1 , T2 respectively stand for the failure and maintenance process of the first component. The transitions T3 , T4 respectively stand for the failure and maintenance process of the second component. Although their time distributions and requirements are different, their failure and maintenance process are similar. Hence, the SPN shown in Fig.1(a) can be modified to the SCPN shown in Fig.1(b). The token color marked with and respectively stand for the first and second component. The token color marked with and respectively stand for the repairman of type A and B. So,
Fig. 1. SPN and SCPN of the failure and maintenance process of two different components
148
T. Zhang, B. Guo, and Y. Tan
in the SCPN shown in Fig.1(b), P = {P1 , P2 , P3 }, T = {T1 , T2 }, Co(P1 ) = {1, 2}, Co(P2 ) = {1, 2}, Co(P3 ) = {3, 4}, Co(T1 ) = {5, 6}, Co(T2 ) = {7, 8} , P re(P1 , T1 ) = {{5} → {1}, {6} → {2}}, P re(P2 , T2 ) = {{7} → {1}, {8} → {2}}, P re(P3 , T2 ) = {{7} → {3}, {8} → {3, 4}}, P re(P2 , T1 ) = {{5} → {1}, {6} → {2}}, P re(P1 , T2 ) = {{7} → {1}, {8} → {2}}, P ost(P2 , T1 ) = {{5} → {1}, {6} → {2}}, P ost(P1 , T2 ) = {{7} → {1}, {8} → {2}}, P ost(P3 , T2 ) = {{7} → {3}, {8} → {3, 4}}, M0 (P1 ) = {1, 2}, M0 (P3 ) = {3, 4}. F T (T1 , 5), F T (T1 , 6), F T (T2 , 7), F T (T2 , 8) may follow any kind of probability distribution according to the failure and maintenance process. 3.3
CSCPN for Transportation Network
In this study, to describe the dynamic behavior of a multi-state transportation network, SCPN is modified to be more universal basing on the original definition. Original transition is extended to capacitated transition which has a stochastic capacity. CSCPN is defined by a nine-tuple CSCP N = P, T, Co, H, P re, P ost, F T, CP, M0 ,
(3)
where CP is added into the definition of SCPN and F T is extended as below. CP : T → R+ is the capacity function such that CP (Tj , ci ) is the capacity of transition Tj when the enabled token color is ci ∈ Co(Tj ). CP is related to a kind of given distribution. F T : T × C → R+ is the firing time function such that F T (Tj , cpj , ci ) is the duration of the firing of transition Tj when its current capacity is cpj and the enabled token color is ci ∈ Co(Tj ). For a multi-state transportation network, the nodes and arcs of network are respectively represented by the places and transitions in CSCPN model. The one way and two way arcs in a transportation network will be represented by one transition and two transitions respectively as shown in Fig. 2 and Fig. 3. The firing durations of the transition T1 and T2 are respectively the transmission time from node 1 to 2 and from node 2 to 1. In fact, their transmission time would be following different kinds of probability distribution.
Fig. 2. SCPN model for a one way arc
Fig. 3. SCPN model for a two way arc
Reliability-Based Route Optimization of a Transportation Network
149
The transitions whose input/output places include Pi are called as the output/input transitions of place Pi . Because of the characters of a transportation network, each transition in the CSCPN of this kind network has only one input place and one output place. Transitions with more than one input or output places shown in Fig. 4 are not permitted in the CSCPN of a transportation network.
Fig. 4. Impossible transitions in the SCPN of a transportation network
For example, there is a transportation network with eight crossings and fifteen roads. When the source and sink node are node 1 and 8 respectively, this network can be represented by a graph as shown in Fig. 5. Basing on the description rule of CSCPN, it is easy to be transformed from a network graph into CSCPN as shown in Fig. 6.
Fig. 5. An example of transportation network
In order to analyze the route and provide enough information for evaluation of two-terminal reliability, A complex token color in CSCPN of a multi-state network is defined as a structure as below. Struct T okenColor { string RouteInf o; // the information of the route that this token has passed int SourceN o; // the number of the source node place int SinkN o; // the number of the sink node place double Demand; // the demand of transportation, it is set to be d double Capacity; // the minimal requirement of the road capacity double T imestamp; //the time stamp }
150
T. Zhang, B. Guo, and Y. Tan
Fig. 6. The CSCPN of the transportation network shown in Fig.5
The color of a token can be also described by a six-tuple co=(RouteInf o, SourceN o, SinkN o, Demand, Capacity, T imestamp). The aim is to record the route and time information and to guide this token to the sink node place. If a token is labelled by co, co.[propertyname] is used to get or set the value of corresponding property, i.e. co.SinkN o represents the sink node place number of token co. If a token co = (”[P 1] − [T 2] − [P 3] − [T 11] − [P 6] − [T 20] − [P 8]”, 1, 8, 50, 10, 60.8) appears in place P8 , it stands for that 50 units of demand of need to be sent from source node (P1 ) to sink node (P8 ), the required minimal capacity is 10 and it arrives at the sink node P8 at the time of 60.8 through the nodes of P3 and P6 . In the SCPN of this study, the enabling condition of the transition Tj is that the input place of Tj has more than one token whose color co satisfies three conditions as below. (1) co.RouteInf o does not contain the name of the output place of Tj . It guarantees that a token will not be circulating in the network. (2) co.Capacity is less than the current capacity of Tj which will be given by the sampling function. (3) co.SinkN o is not the number of the input place. As shown in Fig.7, the transition Tj is enabled by the token co in its input place Pi and a new token color co will be created as below.
co .RouteInf o = co.RouteInf o + ” − [Tj ] − [Pk ]”
(4)
co .T imestamp = co.T imestamp + F T (Tj , cpj , co.Demand) ,
(5)
Reliability-Based Route Optimization of a Transportation Network
151
where Pk is the output place of Tj and F T (Tj , cpj , co.Demand) is the firing time function which stands for the transportation time under the current capacity cpj and demand d given, F T = δ(aj , cpj , d). Then it will move into the transition Tj . Analogously, the token color co will be created and move into the transition Tj as shown in Fig. 7(b). The token co will not be removed until all output transitions of place Pi have been checked out. Anyhow, the token colors in the sink node place will not be moved. In the example of Fig. 7, co .T imestamp < co .T imestamp. So when it is the time of co .T imestamp, the token color co in the transition Tj will move into the output place Pk as shown in Fig. 7(c). Analogously, when it is the time of , the token color co will move into the output place Pn as shown in Fig. 7(d). Because the token color co is moved from the place Pi and co .RouteInf o contains the name of the place Pi , the enabling condition of the transition Tm will not be satisfied by this token. It is shown that the token will be broadcast out until it arrives at the sink node. It helps to find all possible routes from the source node to the sink node under the time threshold.
Fig. 7. An example of tokens moving in CSCPN of a transportation network
4
Simulation for Reliability-Based Route Optimization
The reliability in this problem is defined as the probability that a demand of d units can be transported from source to sink node in given time threshold T and the required minimal capacity c considering a multi-state transportation network (M 2T R(d,T,c)). Basing on the simulation of SCPN, all possible routes
152
T. Zhang, B. Guo, and Y. Tan
which may satisfy the requirement of the transportation can be found. The route with highest reliability will be taken as the best route. Let N be the number of simulations. In each simulation, the simulation will be over when the current simulation time is more than T . Let Ci = {ci1 , ci2 , ..., ciMi } , Mi respectively be the set and number of token colors in the sink node place after the ith simulation, i = 1, 2, ..., N . They are obtained when each simulation is fini ished. Let Ci .RouteInf o be {C1i .RouteInf o, C2i .RouteInf o, ..., CM .RouteInf o}. i So if the number of simulations is big enough, the set of all possible routes is given by N AllRoutes = {r1 , r2 , ..., rM } = Ci .RouteInf o , (6) i=1
where M is the total number. For each route rj in AllRoutes, the route reliability can be calculated by R(rj ) =
N 1 ( Count(Ci , rj )) , N
(7)
i=1
where
0, if rj is not in Ci .RouteInf o, Count(Ci , rj ) = 1, if rj is in Ci .RouteInf o.
For a transportation network, the reliability can be calculated by M 2T R(d,T,c) = max(R(r1 ), R(r2 ), ..., R(rj ), ..., R(rM )), rj ∈ AllRoutes
(8)
The route with highest reliability which will be taken can be given by Route = rj ,
(9)
where R(rj ) = M 2T R(d,T,c), rj ∈ AllRoutes.
5
Experimental Results
The experiment are basing on the example shown in Fig. 5. The source and sink respectively are the 1st and 8th nodes. The capacity and the transmission time distribution parameters of each arc are all shown in Table 1. Table 2 presents the experiment result when different combinations of the time threshold T and the required capacity c are given and the transportation demand d is 50. Table 3 presents the experiment result when the transportation demand d is 100. Each experiment result includes the best route and its reliability M 2T R(d,T,c). Hence, after the time threshold and required capacity are given, the reliabilitybased best route can be obtained in these tables which support the planners to do a fast decision making.
Reliability-Based Route Optimization of a Transportation Network Table 1. Network data of the example Arc(T ransition) Capacity distribution (ci ) Transmission time δ(ai , ci , d), ci > 0
1 (T1 )
Capacity Probability 0 0.063 5 0.162 10 0.236 20 0.549
2 (T2 )
Mean distribution Min=0, Max=50
3 (T3 )
4 (T4 , T5 )
5 (T6 , T7 ) 6 (T8 ) 7 (T9 , T10 ) 8 (T11 )
Triangular distribution Min=0,Mean=20, Max=25 Capacity Probability 0 0.053 5 0.257 8 0.434 12 0.256 Triangular distribution Min=0,Mean=15, Max=25 Mean distribution Min=0, Max=40 Triangular distribution Min=0,Mean=16, Max=40 Triangular distribution Min=0,Mean=40, Max=50
9 (T12 , T13 )
Mean distribution Min=0, Max=50
10 (T14 )
Mean distribution Min=0, Max=35
11 (T15 , T16 )
12 (T17 , T18 )
Triangular distribution Min=0,Mean=15, Max=40 Capacity Probability 0 0.012 4 0.203 8 0.331 12 0.454
13 (T19 )
Mean distribution Min=0, Max=30
14 (T20 )
Mean distribution Min=0, Max=50
15 (T21 )
Triangular distribution Min=0,Mean=15, Max=20
Lognormal distribution, StandardDeviation=6, if d <80,Mean=220/ci , else Mean=264/ci . Weibull distribution, Shape=2.5, if d <80,Scale=1000/ci , else Scale=1200/ci . Weibull distribution, Shape=2, if d <80,Scale=700/ci , else Scale=840/ci . Lognormal distribution, StandardDeviation=10, if d <80,Mean=240/ci , else Mean=288/ci . Weibull distribution, Shape=2, if d <80,Scale=480/ci , else Scale=576/ci . Weibull distribution, Shape=2, if d <80,Scale=1500/ci , else Scale=1800/ci . Weibull distribution, Shape=3, if d <80,Scale=600/ci , else Scale=720/ci . Weibull distribution, Shape=3, if d <80,Scale=2100/ci , else Scale=2520/ci . Weibull distribution, Shape=2, if d <80,Scale=2000/ci , else Scale=2400/ci . Weibull distribution, Shape=3, if d <80,Scale=1100/ci , else Scale=1320/ci . Weibull distribution, Shape=3, if d <80,Scale=1200/ci , else Scale=1440/ci . Lognormal distribution, StandardDeviation=20, if d <80,Mean=640/ci , else Mean=768/ci . Weibull distribution, Shape=3, if d <80,Scale=1050/ci , else Scale=1260/ci . Weibull distribution, Shape=2, if d <80,Scale=1200/ci , else Scale=1440/ci . Lognormal distribution, StandardDeviation=15, if d <80,Mean=750/ci , else Mean=900/ci .
153
154
T. Zhang, B. Guo, and Y. Tan
Table 2. Computational result of different combinations of the time threshold T and the required capacity c (d=50) c T = 410 T = 420 T = 430 T = 440 T = 450 T = 460 T = 470 T = 480 T = 490 T = 500 1 0.858 0.870 0.873 0.877 0.881 0.888 0.892 0.896 0.899 2 0.852 0.860 0.865 0.871 0.876 0.879 0.880 0.885 0.892 3 0.837 0.841 0.845 0.850 0.853 0.852 0.863 0.863 0.867 4 0.806 0.812 0.816 0.816 0.820 0.823 0.824 0.827 0.827 5 0.783 0.784 0.787 0.788 0.793 0.792 0.792 0.793 0.794 6 0.750 0.752 0.754 0.754 0.755 0.755 0.758 0.758 0.758 7 0.716 0.716 0.717 0.717 0.720 0.720 0.721 0.721 0.722 8 0.681 0.681 0.682 0.682 0.682 0.682 0.683 0.682 0.682 9 0.641 0.644 0.644 0.644 0.645 0.645 0.645 0.646 0.646 10 0.607 0.608 0.608 0.610 0.610 0.610 0.611 0.611 0.612 The route with highest reliability : 1 → 4 → 3 → 6 → 8; : 1 → 3 → 6 → 8.
This paper describes a stochastic Petri net-based simulation approach for reliability-based route optimization of a transportation network. The capacities of arcs may be in a stochastic state following any discrete or continuous distribution. The transmission time of each arc is also not a fixed number but stochastic according to its current capacity and demand. In this study, CSCPN is proposed for modeling the behavior of a transportation network. Capacitated transition and self-modified token color with route information are defined to describe the multi-state network. By the simulation, the optimal route whose reliability is highest can be given. Though the reliability is approximated, it is a practical and effective approach which trade off accuracy for execution time.
Reliability-Based Route Optimization of a Transportation Network
155
Acknowledgments. This study was done whilst Tao Zhang was visiting in the University of Nottingham. His research and visiting were supported by the National Science Foundation of China, under Grant No. 70971132. The authors gratefully acknowledge Prof. John Andrews who is the Royal Academy of Engineering and Network Rail Professor of Infrastructure Asset Management. He is also the Director of The Lloyd’s Register Educational Trust Centre for Risk and Reliability Engineering at the University of Nottingham. Thank him for many helpful and constructive comments.
References 1. Ahuja, R.K.: Minimum cost-reliability ratio problem. Computers and Operations Research 16, 83–89 (1998) 2. Bodin, L.D., Golden, B.L., Assad, A.A., Ball, M.O.: Routing and scheduling of vehicles and crews: The state of the art. Computers and Operations Research 10, 63–211 (1998) 3. Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. Journal of ACM 34, 596–615 (1987) 4. Golden, B.L., Magnanti, T.L.: Deterministic network optimization: A bibliography. Networks 7, 149–183 (1977) 5. Park, C.K., Lee, S., Park, S.: A label-setting algorithm for finding a quickest path. Computers and Operations Research 31, 2405–2418 (2004) 6. Ramirez-Marquez, J.E., Coit, D.: A Monte-Carlo simulation approach for approximating multi-state two-terminal reliability. Reliability Engineering and System Safety 87, 253–264 (2005) 7. Lin, J.S., Jane, C.C., Yuan, J.: On reliability evaluation of a capacitated-flow network in terms of minimal pathsets. Networks 3, 131–138 (1995) 8. Yeh, W.C.: A simple approach to search for all d-MCs of a limited-flow network. Reliability Engineering and System Safety 71, 15–19 (2001) 9. Lin, Y.K.: Study on the multicommodity reliability of a capacitated-flow network. Comput. Math. Appl. 42, 255–264 (2011) 10. Lin, Y.K.: Two-commodity reliability evaluation for a stochastic-flow network with node failure. Computers and Operations Research 29, 1927–1939 (2002) 11. Lin, Y.K.: Two-commodity reliability evaluation of a stochastic-flow network with varying capacity weight in terms of minimal paths. Computers and Operations Research 36, 1050–1063 (2009) 12. Lin, Y.K.: Performance evaluation for the logistics system in case that capacity weight varies from arcs and types of commodity. Applied Mathematics and Computation 190, 1540–1550 (2007) 13. Yeh, W.C.: A simple minimal path method for estimating the weighted multicommodity multistate unreliable networks reliability. Reliability Engineering and System Safety 93, 125–136 (2008) 14. Lin, Y.K.: Routing policy of stochastic-flow networks under time threshold and budget constraint. Expert Systems with Applications 36, 6076–6081 (2009) 15. Lin, Y.K.: System reliability of a stochastic-flow network through two minimal paths under time threshold. Int. J. Production Economics. 124, 382–387 (2010) 16. Lin, Y.K.: An algorithm to evaluate the system reliability for multicommodity case under cost constraint. Computers and Mathematics with Applications 48, 805–812 (2004)
156
T. Zhang, B. Guo, and Y. Tan
17. Zhou, Y., Meng, Q.: Improving efficiency of solving d-MC problem in stochasticflow network. Reliability Engineering and System Safety 92, 30–39 (2007) 18. Lin, Y.K.: Time version of the shortest path problem in a stochastic-flow network. Journal of Computational and Applied Mathematics 228, 150–157 (2009) 19. Janssens, G.K., Caris, A., Ramaekers, K.: Time Petri nets as an evaluation tool for handling travel time uncertainty in vehicle routing solutions. Expert Systems with Applications 36, 5987–5991 (2009) 20. Petri, C.A.: Kommunikation mit automaten. PhD thesis (1962) (in German) 21. Molloy, M.H.: Performance analysis using stochastic Petri nets. IEEE Trans. Comp. C-31, 913–917 (1982) 22. Tolba, C., Lefebvre, D., Thomas, P., Moudni, A.: Continuous and timed Petri nets for the macroscopic and microscopic traffic flow modelling. Simulation Modelling Practice and Theory 13, 407–436 (2005) 23. de Athayde Prata, B., Nobre Jr., E.F., Barroso, G.C.: A stochastic colored petri net model to allocate equipments for earth moving operations. ITcon 13, 476–490 (2008) 24. Jensen, K.: Colored Petri nets: Basic concepts, analysis methods and practical use, vol. 1. Springer, New York (1992)
Modeling Multilocation Transshipment with Application of Stochastic Programming Approach Jingxian Chen and Jianxin Lu School of Business, Nantong University, Nantong 226019, P.R. China [email protected]
Abstract. Lateral transshipment could be used as an effective and fast replenishment policy in inventory systems. However, in decentralized supply chain systems, imperfect transshipment planning can make total costs increasingly. In this paper, we consider a single-item, multi-location, two-echelon supply chain system with lateral transshipment existence. A continuous review order-up-to policy is assumed for the inventory control of the item. Two objectives are investigated: total costs of the supplier, total costs of the retailers’ coalition. A stochastic multi-objective decision model is established to describe systems behavior. Based on stochastic quasi gradient algorithm (SQGA) and genetic algorithm (GA), a solution procedure is developed to obtain the with-probability-1 (w.p.1) optimal order-up-to levels. An extensive numerical experiment shows the model and algorithm is very efficient. We also test cost parameters sensitivity to the model. Keywords: Inventory, lateral transshipment, stochastic quasi gradient algorithm, genetic algorithm.
1
Introduction
From 1960s, inventory pooling, a fast and low-cost stock replenishment policy, has obtained a widespread application in practice. Many famous group enterprises employed pooling policy to reduce costs and improve customer service meanwhile. For instance, Xerox has consolidated all of its country-based warehouses in Europe into a single European Logistics Center in the Netherlands (Here et al, 2006) [1]. Lateral transshipment can be defined as ’one depot transships residual stocks to other depots which faced with residual demand’, and in this way, members of the same echelon pool their inventories, which can allow them to lower inventories level and costs whilst still achieving the required
This work was partially by the National Social Science Foundation of China under grant No. 10CGL025 and by the Jiangsu University Social Science Foundation under grant No. 2010SJB630055 and by the Nantong University Social Science Foundation under grant No. 09W021 and by the Nantong University Nature Science Foundation under grant No. 10ZJ010. Corresponding author.
Y. Tang, V.-N. Huynh, and J. Lawry (Eds.): IUKM 2011, LNAI 7027, pp. 157–167, 2011. c Springer-Verlag Berlin Heidelberg 2011
158
J. Chen and J. Lu
service level (Paterson et al, 2010) [2]. But some recent studies point that transshipment can cause serious problems to depots’ inventory control, especially in the circumstance that lots of depots involved in the transshipment process. Olsson (2009) [3] holds that inventory models allowing transshipments among all locations tend to be very difficult to analyze analytically, and suggests that it is important not to establish ’unnecessary’ transshipment links to reduce the complexity of the system. Based on system dynamics model approach, Chen & Lu (2010) [4] propose that unnecessary transshipment link cannot effectively improve the performance of supply chain systems, even reduce system’s customer service level and increase inventories variation when involved depots face with different random distribution demand. Therefore, when the system faces with uncertain demand, some decisions must be made before transshipment happen. Would supply chain systems utilize pooling for replenishment? How to make an effective and reasonable plan for transshipping network? This paper will take these problems as the basic starting point and will employ stochastic programming approach to explain why the systems can transship and how transship efficiently. Generally speaking, lateral transshipment involves many benefit individuals under the decentralized system. Single objective models that minimize system total costs or maximize system total profits cannot exactly reflect individuals decision behavior in practice. Lots of important extant literatures have studied the optimal control policy of inventory systems which participated in stock pooling (Lee, 1987 [5]; Tagaras, 1989 [6]; Herer & Rashit, 1999 [7]; etc.), optimal or suboptimal solutions that attached to system performance were obtained by mathematical programming approach. On the other side, some other scholars began to remove game theory to inventory system where existed lateral transshipment and to study the pooling game decision model (Rudi et al, 2001 [8]; Dong & Rudi, 2004 [9]; Hu et al, 2007 [10]; Hu et al, 2008 [11]; Hanny et al, 2010 [12]; etc.), equilibrium solutions were obtained based on the extension of newsvendor model. However, single objective programming approach or game theory approach perhaps cannot efficiently handle the multilocation transshipment problem, especially more than three retailers involved in stock pooling. Single objective model can deal with the system optimal decision variables but cannot deal with the individual’s transshipment behavior in supply chain system, such as order quantities and holding quantities. Moreover, game theory, the best approach for describing behavior of rational economic entity, cannot efficiently model the transshipment in multilocation system since the game process and game structure is too complicated to establish analytical model of the system. In this paper, we consider the multilocation transshipment problem as our object, and employ stochastic programming theory to establish our multi-objective transshipment model. We expect our model can effectively solve the optimal inventory level in supply chain, where lateral transshipment may happen. The remains of this paper are organized as follows. The multi-objective stochastic programming model of lateral transshipment will be introduced in
Modeling Multilocation Transshipment with Application
159
Section 2, and Section 3 describes the solution algorithm in detail. Numerical examples are shown in Section 4. We end with summary and directions for future research in Section 5.
2
The Model
In this paper, we consider a two-echelon supply chain system, including a single supplier and multi retailers. They all employ continuous review order-up-to policy to control inventory level. The sequence of events is as follows: 1) According to order-up-to level, each retailer order merchandises from the supplier; 2) Retailers get the order quantities and demands are realized; 3) Lateral transshipment, from the retailer faced with residual stocks to the retailer faced with residual demands, is happened; 4) Together with order quantities and transshipment quantities, demands are satisfied. In order to characterize the operation of the supply chain system by mathematical programming approach, we use the following notation. n =number of retailers; di =stochastic demand at retailer i; s0 =order-up-to level of the supplier; si =order-up-to level of the retailer i; qi =order quantities from retailer i to supplier; xij =transshipment quantities from retailer i to retailer j , and i = j; γij =transshipment allocation value, and γij = 0 means xij = 0, γij = 1 means xij = 0; Ii h0 hi p0
=net inventories of retailer i after transshipment; =holding cost incurred at supplier per unit held per period; =holding cost incurred at retailer i per unit held per period; =penalty cost incurred at supplier per unit backlogged per period;
pi =penalty cost incurred at retailer i per unit backlogged per period; ci =transfer price per unit from retailer i to retailer j; ci =replenishment cost per unit at retailer i ; cij =direct transshipment cost per unit from retailer i to retailer j; cij =effective transshipment cost per unit from retailer i to retailer j, ci ) − (cj + cj ). and cij = cij + (ci + In this paper, we will consider the supply chain system as a decentralized system. The supplier will minimize its own total costs, and the retailers’ coalition will minimize their own total costs. Thus, we construct the following mathematical
160
J. Chen and J. Lu
model. Because the demand at retailer i is a stochastic variable, we employ stochastic expected value programming model to illustrate the system. min
F0 (si , γij , xij ) = E
n
(ci + ci )qi +
i=1
+
n n
n
γij cij xij ,
i=1
hi Ii+ +
n i=1
pi Ii− (1)
i=1 j=1 j=i
min
n n + + qi + p0 qi − s0 . F1 (s0 , si , γij , xij ) = h0 E s0 −
qi = si − di −
s.t.
i=1 n
n
j=1 j=i
j=1 j=i
γij xij +
Ii+ = (si − di −
Ii− = (di − si − n j=1 j=i n j=1 j=i n j=1 j=i
n j=1 j=i n
(2)
i=1
γji xji ,
(3)
γij xij )+ ,
(4)
γji xji )+ ,
(5)
j=1 j=i
n xij ≤ min (si − di )+ , (dj − sj )+ ,
xji ≤ min (di − si )+ ,
j=1 j=i n
(sj − dj )+ ,
(6)
(7)
j=1 j=i
xij ≥ 0,
n
xji ≥ 0,
(8)
j=1 j=i
γij = 0 or 1, s0 , si ≥ 0.
(9)
In this model,(·)+ means max(·, 0)+ ; Ii+ means the holding inventories after transshipment; Ii− means the backlog after transshipment; and E means the expected value for the stochastic demand vector d = (d1 , d2 , ..., dn ). Objective (1) is the total costs of the retailers’ coalition, and objective (2) is the total costs of the supplier. This bi-objective programming model refers that the goal of this optimization is to derive the optimal decision variables for the retailers’ coalition and the supplier, while most extant literatures consider only a single optimization objective (Herer et al, 2006; Olsson, 2009). Equation (3), (4) and (5) transform qi , Ii+ and Ii− to the expression of s0 , si , γij and xij , which is the model decision variables. The leftover inequations and equations of the model are the constraints of the decision variables.
Modeling Multilocation Transshipment with Application
161
In addition, in order to assure transshipment is occurred when one retailer owns residual stocks and the other one owns residual demands, we should carry on some constraints to the model cost parameters. ci < cj + cj + cji , ci + hi < hj + cji ,
(10) (11)
pi < pj + cji .
(12)
Where inequation (10) means that the intermediary retailer cannot exist in the system, and inequation (11) means that transshipment cannot occur when all retailers are faced with the residual stock, and inequation (12) means that transshipment cannot occur when all retailers have backlogs. Considering the stochastic variable in the model, we will introduce an efficient stochastic programming algorithm into the model solution procedure in the next section.
3
The Solution Algorithm
Because the stochastic demand variable is included in the model, we design the algorithm based on stochastic programming approach. The stochastic quasigradient (SQG) methods are stochastic algorithmic procedures for solving general constrained optimization problems with nondifferentiable, nonconvex functions. Considered the model that we established, two objective functions are nonconvex and nondifferentiable functions, as well as the constraints. Thus, SQG method can be employed. However, using SQG algorithm will require estimate stochastic gradient of each sample path in every iterative. If the optimization problem is complicated and the iterations are very large, the SQG algorithm will occupy lots of computation resources and cannot efficiently obtain the optimal solution. In view of the fact that the huge superiority of genetic algorithm (GA) in iterative computation, we combine the SQG and GA into the solution procedure, as figure 1 shows. We develop the solution algorithm in view of hierarchical solution process. Two solution layers are shown in the figure 1. In order to optimize the total costs of retailers’ coalition, the inner solution procedure aims to achieve the optimal solution of retailers’ order-up-to level, while the external solution procedure aims to achieve the optimal solution of supplier’s order-up-to level so as to minimize the total costs of supplier. With regard to this algorithm, the basic methodology which utilizes the stochastic gradient to generate the next iterative value will lead to the optimal solution when the algorithm convergence, at least obtain the with-probability-1 optimal solution. Equation (13) and (14) are the iterative computation formulas of sθ0 and sθi . θ 0 s sθ+1 = Π − ρ ∇ (θ) , (13) θ θ 0 θ 0 0 = Πθ sθi − ρθ ∇1θi (θ) . (14) sθ+1 i
162
J. Chen and J. Lu
Fig. 1. The solution algorithm procedure
Where sθ+1 and sθ+1 denote the order-up-to level of sample path θ + 1, Πθ 0 i denotes the projection operator, ρθ denotes the iterative step, ∇0θ0 (θ) and ∇1θi (θ) denote the estimated stochastic gradient by function F0 and F1 in sample path θ. Based on the SQG theory that is established by Ermoliev (1983) [13], we conclude ρθ should be satisfied with condition (15), and Πθ could be determined by formula (16), where y denotes an arbitrary variable. ρθ ≥ 0,
∞
∞
ρ2θ < ∞,
(15)
Πθ (y) = arg min y − s2 : s ∈ S .
(16)
θ=1
ρθ = ∞,
θ=1
In addition, we can estimate the stochastic quasi gradient ∇0θ0 and ∇1θi by formula (17) and (18), where Δθ is an arbitrary positive number, ek is a unit vector
Modeling Multilocation Transshipment with Application
163
which the value of kth dimension is 1. dθi k is the kth observed value of retailer i’s demand on sample path θ. ∇0θ0 (θ) = ∇1θi (θ) =
m F0 (sθ + Δθ ek , dθk ) − F0 (sθ , dθ0 )
ek ,
(17)
F1 (sθi + Δθ ek , dθi k ) − F1 (sθi , dθi 0 ) k e . Δθ
(18)
0
k=1 m k=1
i
Δθ
0
i
Besides, the fitness function must be taken advantaged of function F1 . Because the purpose of inner solution procedure is to optimize the total costs of retailers’ coalition, the fitness evaluation should be accordance with function F1 . And other parameters in the solution procedure which hasn’t be mentioned can be configured by the extant literatures as follows.
4
Numerical Example
In this section, we use a numerical example to illustrate the significance of the effect of the stochastic programming approach on the optimal transshipment policy. Based on the Herer’s experiment configuration, we set the total number of steps for the path search θ = 1000, the number of steps for the genetic algorithm search v = 100, and the step size ρθ = 100/θ for the validation examples. As a stopping criterion, we compared the order-up-to levels over 1000 iterations and required that these values do not differ by more than one. We now illustrate our algorithm with one supplier and four retailers. Firstly, we investigate the identical cost parameters case. We assume that the supplier’s holding cost (h0 ) is $1 per unit, the supplier’s penalty cost (p0 ) is $4 per unit, each retailer’s holding cost (hi ) is $1 per unit, each retailer’s penalty cost (pi ) is $4 per unit, the transfer price that from retailer to supplier (ci ) is $5 per unit, the replenishment cost that from supplier to retailer ( ci ) is $0.2 per unit, direct transshipment cost from retailer i to retailer j (cij ) is $0.5 per unit. Thus, the effective transshipment cost per unit from retailer i to retailer j ( cij ) is $0.5 per unit. Setting the initial order-up-to levels of the supplier and retailers are s0 = si = 100 units, the convergence of the solution algorithm is shown in Figure 2. From the iteration curve of the solution algorithm, we conclude that the convergence criterion was satisfied long before 1000 iterations. Then, we order the simulation program MATLAB to report the final computation results, as Table 1 shows. Note that, from Fig.2, convergence of the algorithm for computing order-upto level is very rapid. The computation experiment was conducted on a personal computer with a 2.6-GHz Pentium IV microprocessor. And we should point out that the computational time is 264 seconds. It means that our algorithm is rather efficient in planning problem. Also, From Table 1, the standard deviation of order-up-to quantity is relatively low. And the half-width of a 95% confidence interval is also reported to show the low variability of the SQG estimator. Thus,
164
J. Chen and J. Lu
Fig. 2. Convergence of the solution algorithm for a four-retailer configuration
based on the computation results we can conclude that the two-layer stochastic programming approach can effectively plan the multilocational transshipment problem in the supply chain system. Additionally, in order to confirm the benefit of our model in the aspect of system cost reduction, we compute the identical transshipment case and the nonidentical cost parameters case. Then, we compare the two transshipment cases with the non-transshipment case, as Table 2 shows. Analyzing the computational results that are shown in Table 2, we find that (1) Comparing with the non-transshipment case, identical transshipment policy increase the order-up-to level slightly, but decrease the total costs dramatically; (2) Comparing with the non-transshipment case, nonidentical transshipment policy decrease the orderup-to level and the total cost, but not very significantly; (3) When the holding cost per unit increases, the very retailer will decrease the order-up-to level, and the others will change their order-up-to quantity slightly, supplier change slightly as well, total costs of supplier and retailers will ascend compared to identical case. However, increase the penalty cost per unit, total costs will ascend a lot, and the order-up-to level will decrease a little. Comparing to identical case, transfer price increasing will lead to the order-up-to level increase significantly, especial the supplier’s quantity. Total costs of the supplier and retailers will also increase, but not more than non-transshipment case. Changing the replenishment cost per unit will not lead to a dramatic change in the system, but the very retailer will decrease the order-up-to level compared to the identical case. In short, transshipment policy will be beneficial for supply chain system decreasing total costs.
/ N (100, 20) N (200, 50) N (150, 30) N (170, 50)
s0 s1 s2 s3 s4 213.2 123.8 245.6 178.4 210.3
214.0 117.8 242.3 174.0 210.9
1.87 3.32 2.57 4.41 2.81
212.3 118.2 241.9 176.2 211.5
0.538 1.031 0.972 1.491 0.546
Demand order-up-to level Average Median Standard deviation Optimal value Half width of a 95% distribution level value value deviation value confidence interval
hi
pi
ci
ci
s∗0
NT 1 4 5 0.5 276.3 IT 212.3 NIT-1 h1 = 1.5 4 5 0.5 205.3 hi = 1 NIT-2 1 p1 = 4.5 5 0.5 192.4 pi = 4 NIT-3 1 4 c1 = 6 0.5 239.5 ci = 5 NIT-4 1 4 5 c1 = 0.8 198.6 ci = 0.5
Case 239.8 241.9 230.9 229.9 239.2 230.5
118.1 128.7 123.2
s∗2
115.9 118.2 104.4
s∗1
17.9
180.6
176.7
173.9 176.2 174.0
s∗3
221.3
189.4
190.7
209.8 211.5 203.0
s∗4
62.4
96.9
75.9
102.6 66.3 57.8
73.2
129.4
113.9
156.3 91.9 82.6
TCsupplier TCretailers
Table 2. Computational results of non-transshipment (NT) case, identical transshipment (IT) case and nonidentical transshipment case (NIT)
Table 1. Computational results of identical cost case with one supplier and four retailers
Modeling Multilocation Transshipment with Application 165
166
5
J. Chen and J. Lu
Conclusions and Future Directions
In this paper, we investigate the multilocation transshipment problem in supply chain system with stochastic programming approach. First, a stochastic expect value model was established for description of the multilocation transshipment problem. Second, combining stochastic quasi gradient algorithm with genetic algorithm, we develop a hierarchical solution algorithm and the concrete procedure of the algorithm is discussed. Finally, we employed a simulation-based method using the solution algorithm for optimization. Our simulation-based optimization approach, therefore, provides a platform to analyze transshipment problems of supply chain. Three numerical cases are studied by the simulation-based approach. Some interesting discoveries are discussed based on the computational results. Our finds will be beneficial for supply chain management practice, whereas there are still many other important issues should be studied. We consider the total costs of retailers’ coalition as the second objective in the model, while each retailer’s costs are not considered. With the number of retailers who involved in transshipment become larger and larger, the objective of the model will heavily increase. Thus, developing an efficient algorithm for solving this kind model should be paid attention to. Another hot topic is the transshipment coordination mechanism. Considering each player will maximize her own profits or minimize her own costs, the supply chain should design a reasonable coordination mechanism to coordinate every player.
References 1. Herer, Y.T., Tzur, M., Ycesam, E.: The multilocation transshipment problem. IIE Trans. 38(3), 185–200 (2006) 2. Paterson, C., Kiesmller, G., Teunter, G., Glazebrook, K.: Inventory models with lateral transshipments: A review. Euro. J. Oper. Res. 210(2), 125–136 (2011) 3. Olsson, F.: An inventory model with unidirectional lateral transshipments. Euro. J. Oper. Res. 200(3), 725–732 (2011) 4. Chen, J., Lu, J.: Influence of Lateral Transshipment Policy on Supply Chain Performance: A Stochastic Demand Case. iBusiness 2(1), 77–86 (2010) 5. Lee, H.L.: A multi-echelon inventory model for repairable items with emergency lateral transshipments. Management Sci. 33(10), 1302–1316 (1987) 6. Tagaras, G.: Effects of pooling on the optimization and service levels of two-location inventory systems. IIE Trans. 21(3), 250–257 (1989) 7. Herer, Y.T., Rashit, A.: Lateral stock transshipments in a two-location inventory system with fixed and joint replenishment costs. Naval Res. Logist. 96(5), 525–547 (1999) 8. Rudi, N., Kapur, S., Pyke, D.F.: A two-location inventory model with transshipment and local decision making. Management Sci. 47(12), 1668–1680 (2001) 9. Dong, L., Rudi, N.: Who benefits from transshipment? Exogenous vs. Endogenous wholesale price. Management Sci. 50(5), 645–667 (2004)
Modeling Multilocation Transshipment with Application
167
10. Hu, X., Duenyas, I., Kapuscinski, R.: Existence of coordinating transshipment prices in a two-location inventory model. Management Sci. 53(8), 1289–1302 (2007) 11. Hu, X., Duenyas, I., Kapuscinski, R.: Optimal joint inventory and transshipment control under uncertain capacity. Oper. Res. 53(8), 1289–1302 (2007) 12. Hanny, E., Tzur, M., Levran, A.: The transshipment fund mechanism: Coordinating the decentralized multilocation transshipment problem. Naval Res. Logist. 57(4), 342–353 (2010) 13. Ermoliev, Y.: Stochastic quasigradient methods and their application to system optimization. Stochastics: An International Journal of Probability and Stochastic Processes 9(1-2), 1045–1129 (1983)
Identifying a Non-normal Evolving Stochastic Process Based upon the Genetic Methods Kangrong Tan1 , Meifen Chu2 , and Shozo Tokinaga3 1 2 3
Department of Economics, Kurume University, 1635 Mii-machi, Kurume City, Fukuoka, Japan 839-8502 Department of Economics, Kyushu University, 6-19-1 Hakozaki, Higashi-ku, Fukuoka City, Japan 812-8581 Graduate School of Economics, Kyushu University, 6-19-1 Hakozaki, Higashi-ku, Fukuoka, Japan 812-8581
Abstract. In the real world, many evolving stochastic processes appear heavy tails, excess kurtosis, and other non-normal evidences, though, they eventually converge to normals due to the central limit theorem, and the augment effect. So far many studies focusing on the normal cases, such as Brownian Motion, or Geometric Brownian Motion etc, have shown their restrictions in dealing with non-normal phenomena, although they have achieved a great deal of success. Moreover, in many studies, the statistical properties, such as the distributional parameters of an evolving process, have been studied at a special time spot, not having grasped the whole picture during the whole evolving time period. In this paper, we propose to approximate an evolving stochastic process based upon a process characterized by a time-varying mixture distribution family to grasp the whole evolving picture of its evolution behavior. Good statistical properties of such a time-varying process are well illustrated and discussed. The parameters in such a time-varying mixture distribution family are optimized by the Genetic Methods, namely, the Genetic Algorithm (GA) and Genetic Programming (GP). Numerical experiments are carried out and the results prove that our proposed approach works well in dealing with a non-normal evolving stochastic process.
1 Introduction Many research results have shown that an evolving stochastic process usually displays its evolution process from a non-normal state to a normal state, since the effects of the Central Limit Theorem and time augment. So far, many studies based upon the conventional normal cases, such as Brownian Motion, or Geometric Brownian Motion etc, have achieved a great deal of success in identifying an evolving stochastic process, although these conventional normal assumptions have their restrictions when applied to the cases where the normality does not hold. Furthermore, recent studies have yet focused on the statistical properties at a special time spot, such as the distributional parameters, not giving the details for a whole time-varying picture of such an evolving process. However, in the real world, many cases indicate that an evolving stochastic process usually appears some strong statistical evidences for non-normality, such as heavy tails and excess kurtosis etc, on its evolving way to the normal state. Y. Tang, V.-N. Huynh, and J. Lawry (Eds.): IUKM 2011, LNAI 7027, pp. 168–178, 2011. c Springer-Verlag Berlin Heidelberg 2011
Identifying a Non-normal Stochastic Process
169
In order to grasp the whole evolving picture of an evolving process, especially under the circumstances of non-normality, we propose a new approach to identify the whole evolving process with a time-varying distribution family, which is a mixture distribution family constructed from a set of weighted basic distributions. In the previous works, a mixture distribution is known as it has a couple of good statistical properties, which make it more flexible adjust its shape to catch the complicated statistical characteristics of observed data [1][2]. It has been applied to many research fields ranging from financial engineering to industrial practice [3][4][5]. However, the parameters in such a mixture distribution family need to be optimized. There are several ways to optimize the weights and other distributional parameters in a mixture distribution, such as, the Markov Chain Monte Carlo Method (MCMC), or Expectation Maximum (EM) Algorithm. But, in the case that the numbers of distributional parameters and the basic distributions for selection are larger, it turns out to be a nonlinear optimization problem under the circumstances of a complicated model setting. It is difficult to check whether all the parameters are converged or not, if the MCMC method is applied, since there does not exist a sufficient standard to see whether all the parameters are converged or not [6][7]. Sometimes a few parameters are converged, but others do not, besides, the MCMC method needs a prior distribution for each parameter; and it is also not suitable for the process of distribution selection from the basic distributions. Neither does the EM algorithm in dealing with high dimensional problems. On the other hand, the crucial problem is that the form of a mixture distribution is usually unknown. Some fixed weights mixture distributions which can be estimated by the MCMC method or the EM algorithm cannot catch the dynamics of observed data, obtained from an evolving stochastic process as pre-assumed above. It is necessary to incorporate the unknown time factor functions into the mixture distribution as well. It is difficult for the existent statistical methods, such as the MCMC method and the EM algorithm to estimate the right form of unknown functions in a mixture distribution. So that, we propose to use the Genetic Methods (Genetic Algorithm, Genetic Programming) to optimize the parameters, simultaneously to select the optimal distribution forms from the basic distributions, and the suitable time factor functions. The Genetic Methods also have been applied to a great deal of practice problems, such as, optimization of the pipelines of a gas station, the search of the optimal members of a mixture distribution, and identifying the structure of digital circuit [8][9]. In our numerical experiments, we give an evolving stochastic process generated from a mixture distribution family of the Student’s t and the normal with different time factor functions, since the Student’s t is a typical distribution with heavy tails. A mixture distribution family constructed from these two distributions can approximate a class of stochastic processes which evolve from a non-normal state to a normal state. And Parallel GAs are designed and adopted in our numerical experiments. The numerical results show that our proposed approach works well. The rest of this paper is organized as follows. 
Section 2 gives and discusses some statistical properties of a time-varying mixture distribution family, constructed by a convolution of several basic distributions. Section 3 describes the details of how to use the Genetic Methods to identify an evolving stochastic process. Section 4 displays the results of our numerical experiments. And Section 5 provides some concluding remarks.
170
K. Tan, M. Chu, and S. Tokinaga
2 Some Statistical Properties of a Mixture Distribution Family Suppose that we have two random variables x1 , x2 , and one follows a normal distribution and the other follows a heavy-tailed distribution, respectively. By denoting the time factor functions α1 (t), and α2 (t), we consider the distribution p(z, t) of α1 (t)x1 + α2 (t)x2 [10]. For simplicity, hereafter we restrict our discussions on the case that the heavy-tailed is the Student’s t distribution. We assume that each distribution has been standardized to have zero mean and unit variance. We first consider some statistical properties of such a convolution density function. And then we show some simulated results based upon the statistical properties of this convolution density, starting from a heavy-tailed state, and converging into a normal state. One may extend the following discussions easily to other heavy-tailed distributions. 2.1 Statistical Properties We have the following theorems. Theorem 1. α1 (t)x1 follows a normal distribution with mean zero and standard deviation α1 (t), denoted as f (.). Theorem 2. α2 (t)x1 follows a Student’s t distribution with mean zero and standard deviation α2 (t), denoted as g(.). We then have the following convolution result for α1 (t)x1 + α2 (t)x2 based on the convolution theorem. Theorem 3. z has the probability density function, ∞ p(z, t) = f (z − x, t)g(x, t)dx,
For the time factor functions, simply let α1 (t) + α2 (t) = 1,
(3)
so that this mixture distribution family can simulate a stochastic process evolving from a heavy-tailed state to a normal state by setting time-varying function α2 (t) (α1 (t)) as a monotonic reducing (increasing) function, evolving from one (zero) to zero (one). Theorem 4. The p.d.f. p(z, t) is symmetric about 0 with respect to z.
Identifying a Non-normal Stochastic Process
171
Proof. We have p(−z, t) = C
∞
−∞ ∞
e
=C Denoting x=-y, we have p(−z, t) = C =C
−∞
−∞ +∞ ∞ −∞
e
e
e
−(−z−x)2 2α2 (t) 1 −(z+x)2 2α2 (t) 1
−(z−y)2 2α2 (t) 1
−(z−y)2 2α2 (t) 1
[1 +
[1 +
[1 +
[1 +
ν+1 x2 ]− 2 dx, 2 (ν − 2)α2 (t)
ν+1 x2 ]− 2 dx, (ν − 2)α22 (t)
ν+1 x2 ]− 2 d(−y), 2 (ν − 2)α2 (t)
ν+1 x2 ]− 2 dy. (ν − 2)α22 (t)
(4)
Thus, equation (4) is the same as equation (2). We then have p(z, t) = p(−z, t). This indicates that p(z, t) is symmetric about 0 with respect to z. Thus, from the above theorem, we can write p(z, t) as ∞ −(z−x)2 ν+1 x2 2 p(z, t) = 2C e 2α1 (t) [1 + ]− 2 dx. 2 (ν − 2)α2 (t) 0 We also note that the mode of p(z, t) can be calculated as follows. ∞ −(0−x)2 ν+1 x2 2 e 2α1 (t) [1 + ]− 2 dx. p(0, t) = 2C 2 (ν − 2)α (t) 0 2
(5)
(6)
(7)
2.2 Some Simulation Results We have carried out some simulations for this mixture distribution family setting. Figure 1 displays the convolution density function with different time factors α1 (t) = 0.2, α2 (t) = 0.8 (dotted line), and α1 (t) = 0.8, α2 (t) = 0.2 (solid line), where ν = 2.5. As shown in the Table 1, the modes evolve with different time factors α1 (t), α2 (t). And Figure 2 displays the evolving modes shown in the Table 1, correspondent to each α1 (t), where α1 (t) = t, and α2 (t) = 1 − t. As seen from Table 1 and Figure 2, it can be confirmed that the modes are approaching to 0.3989, the mode of the standard normal density, since the Student’s t component is reducing quicker, as α1 (t) is getting larger.
3 Genetic Methods In this section, we simply summarize the Genetic Methods, the Genetic Algorithm (GA) and Genetic Programming (GP), and give the details of the schemes in our applications. Basically, we apply the GA to identify the parameters in a mixture distribution family at each time spot, then apply GP to estimate the time factor functions for the whole time period.
K. Tan, M. Chu, and S. Tokinaga
0.6
172
0.3 0.0
0.1
0.2
p(z)
0.4
0.5
a=0.2 a=0.8
−10
−5
0
5
10
x
Fig. 1. Convolution densities with time factor α1 (t) = 0.2, α2 (t) = 0.8 (dotted line), and α1 (t) = 0.8, α2 (t) = 0.2 (solid line)
Table 1. Evolution of the modes with different time factors α1 (t) = t, α2 (t) = 1 − t α1 (t) 0.5000 0.5499 0.5998 0.6497 0.6996 0.7495 0.7994 0.8493 0.8992 0.9491 0.9990
Fig. 2. Modes evolving with different time factor functions α1 (t), α2 (t) shown in Table 1
3.1 GA-Based Parameter Optimization Let us simply explain why we propose to use the GA-based (Genetic Algorithm) method to identify the parameters in a mixture distribution family here. As mentioned above, since the forms of the mixture distribution and time factor function incorporated are unknown, any kind of basic distribution may be the component distribution for the convolution density, the proposed mixture distribution is of a complicated form. And it is difficult to apply the χ2 -fitting to estimate the distributional parameters. Moreover, since a couple of parameters need to be estimated, as abovediscussed, it is also not suitable to apply other statistical methods, such as, the MCMC method. For the Maximum Likelihood method (ML), it sometimes leads to such a complicated Likelihood Function that it may get stuck in some local maxima [11]. Furthermore, the main disadvantage of these methods is that it is rather difficult for the MCMC method or the ML method to automatically select the optimal form of the time factor function in a mixture distribution under the circumstances of the functional form is unknown. Thus, we propose to use the Genetic Algorithm to optimize these distributional parameters simultaneously, since the GA has the ability to reach a global optimal solution without getting stuck in local solutions. It has been widely applied in many research fields ranging from scientific to social studies. Our Proposed GA Scheme Our GA scheme is designed as follows. The basic distributions are to be selected as the components for the convolution density. Assuming there are n types of basic distributions, it then turns out to be Cn2 combinations.
174
K. Tan, M. Chu, and S. Tokinaga
For example, the first group has a normal and a χ2 components, and the second group has a normal and a Student’s t components, ..., we design a GA1 for first group, and GA2 for the second group, ..., and call all these parallel GAs as P-GAs. We denote the member i of P-GAs as GAi . Suppose we have M data sets for different M (t=1, 2, ..., M) time spans; then we can apply the GA to the observed data at each time span to obtain each optimal mixture distribution at each time span, then each GAi has optimal convolution densities f1i , f2i , M i i ..., fM , to compute the sum of errors j=1 (fji ) for M data sets for f1i , f2i , ..., fM of group i, to find the optimal group with minimum sum of errors for M data sets as the results of P-GAs. Certainly, P-GAs are to be executed in parallel based on the M data sets. For each GAi the following steps are to be carried out. Step 1: Initial population Generate random numbers as individuals of the first generation in a certain population. Each individual represents a set of parameters in p(z, t) with a set of specified distribution components, time factor functions α1 (t), α2 (t) are treated as unknown scaling weights. Step 2: Evaluation of fitness Evaluate the fitness of each individual based on a predetermined fitness function, and sort all individuals of the generation according to their fitness values. Step 3: Selection of individuals Select two individuals with higher fitness values from the generation at a certain probability level. The selection strategy can be very varied and a roulette strategy is adopted in our applications. Step 4: Genetic operations Carry out genetic operations, namely, the crossover and the mutation operations on two selected individuals to produce their offsprings and place them in the pool of the next generation. A crossover operation randomly decides the crossover positions on the two selected individuals, and then exchanges parts of the two individuals with each other. Generally, there are two methods for this exchange, the one-point crossover, and the multipoints crossover. The later has been applied in our case. A mutation operation randomly decides the mutation positions with a certain probability, and then changes these position values for a selected individual. Again, there are two ways to do this: one is a onepoint mutation, and the other a multipoints mutation. The later has been adopted in our application. Step 5: Replacement of individuals Re-evaluate the fitness of each individual in the new generation, to see if the results meet the terminal conditions. If it does, then the GA terminates, otherwise it goes back to Step 3. The fitness function for evaluating the jth individual is defined as (j) F itness(j) = 1 − j (j) where (j) is the sum of the mean square error corresponding to the individual j.
(8)
Identifying a Non-normal Stochastic Process
175
3.2 Genetic Programming To identify the time factor functions α1 (t), α2 (t), we propose to apply Genetic Programming (GP) to the estimated results obtained from the GA step. GP-based optimization approaches have been successfully applied to various nonlinear optimization problems, including structural problems [8][9]. Roughly speaking, the individuals of GP have a tree structure, and can be used to approximate nonlinear functions. The first generation (individuals) of GP is randomly generated, each individual representing a candidate function. An individual represents a candidate function for the time factor function. An individual with a smaller error is considered that it has a larger fitness. The accuracy of function estimation is improved by the genetic operations (Crossover and Mutation operations) of the GP. Eventually, accurate estimate of time factor function can be obtained, when the predetermined number of generations is reached, or the computational accuracy of the goal is achieved. Here the fitness is defined as the reciprocal of the approximation errors. Basic Functions Used for Approximation The following basic functions are adopted in our GP. 1) polynomial function 2) exponential function 3) log function 4) sine function 5) sinh function
4 Numerical Experiments In this section, two numerical experiments with different types of time factor functions have been carried out. The first application is of the time factor function of an exponent function form. And the second application is of the time factor function of a logistic function form. We first give the details of our data sets before we show the results. Namely, we generate two artificial data sets by the following convolution density function discussed above. ∞ −(z−x)2 ν+1 x2 2 p(z, t) = C e 2α1 (t) [1 + ]− 2 dx. (9) 2 (ν − 2)α2 (t) −∞ with three types of time factor functions as follows. (1) α1 (t) = 1 − e−t , α2 (t) = e−t 1 (2) α1 (t) = 1+e1 −t , α2 (t) = 1 − 1+e1 −t = 1+e t where the time factor functions α1 (t), α2 (t) are correspondent to the normal component and Student’s t component respectively in equation (9). And the degrees of freedom in both applications are set as ν = 2.5. Here M = 1, 2, ..., 20. Namely, we take 21 time spans, corresponding to t=0.0, 0.3, ... , 5.7. At each time span we generate 2,000 samples in each application. Table 2 and 3 show the values of α2 (t) = e−t in Application 1, and α2 (t) = 1 − 1 1 = 1+e t in Application 2, respectively. Figure 3 gives an example of the evolution 1+e−t behavior of this mixture distribution family in Application 1.
176
K. Tan, M. Chu, and S. Tokinaga Table 2. Application 1:Values of α2 (t) = e−t e−t 1.0000000000000 0.7408182206817 0.5488116360940 0.4065696597406 ... 5.4 0.0045165809426 5.7 0.0033459654575 t 0.0 0.3 0.6 0.9
Table 3. Application 2:Values of α2 (t) = t 0.0 0.3 0.6 0.9
Results of Application 1 We then apply above-discussed GA and GP to the data set of Application 1. The basic distributions are normal, Student’s t, χ2 , F-distribution, then the groups of P-GAs is 6. Namely, we have 6 groups of GAi to be executed in parallel. The GA parameters are set as follows. Size of individual: 200; Probability of Mutation: 0.10; Probability of Crossover: 0.50; And elitist strategy has been taken in each GAi . The parameters of GP are set as follows. Size of individual: 200; Probability of Mutation: 0.20; Probability of Crossover: 0.30; And elitist strategy has been taken. The results are summarized in Table 4. As seen from the table, the distribution combination with the smallest error turns out to be the combination of the normal and the Student’s t, among the six P-GAs groups. Although the estimated parameters are somewhat biased from the true values, the estimated parameters and time factor functions almost reveal the evolution process from a heavy-tailed distribution to a normal distribution based on the observed data.
177
y
0.0
0.1
0.2
0.3
0.4
Identifying a Non-normal Stochastic Process
−4
−2
0
2
4
x
Fig. 3. An example of evolving mixture distribution family with time factor functions α1 (t), α2 (t) of Application 1 Table 4. Results of Application 1 Estimated normal distribution N(.0182, 0.97292 ) Estimeted Student’s t t(.0001, 1.02152 , 2.55) Estimated α1 (t) 1 − e−0.9829t
Besides, we have also tried the time factor function which has the form of ca−kt , where c, k > 0, and a > 1 hold. It has been confirmed that our proposed method works well. Results of Application 2 By the same approach, we apply our proposed method to the data set of Application 2. The setting of the Genetic Methods are the same as in Application 1. The results of the numerical experiment are summarized in Table 5. It is also confirmed that this approach works as well when α1 (t) = c+ec−kt , where c, k > 0 holds. Table 5. Results of Application 2 Estimated normal distribution N(.0129, 1.00692 ) Estimeted Student’s t t(.0011, 1.02782 , 2.64) 1 Estimated α1 (t) 1+1.0007e−0.9921t
Discussion We also carry out the numerical experiments of these two applications under many different initial conditions of the proposed Genetic Methods, such as the probabilities of crossover and mutation operations. We get almost the same results presented above,
178
K. Tan, M. Chu, and S. Tokinaga
though, the numbers of convergence loops (generations) are different. However, for the GP step, sometimes we get close functional expressions, such as an approximate of a polynomial function for the time factor function, which is a close expression of Taylor’s series etc. It sometimes needs to be checked by human expert. One way to overcome these problems is to select the lowest tree structure among the optimal candidates based upon the rule that simplicity is the best.
5 Concluding Remarks In this paper, we have designed a mixture distribution family to approximate an evolving process to grasp the whole evolving statistical properties, starting with non-normal evidences and ending at normal situations. And we have discussed and showed the good statistical properties of such a mixture distribution family. We also have proposed a Genetic Algorithm based approach to optimize the parameters in the mixture distribution family, namely, a Parallel GAs (P-GAs) method to obtain the optimal combination of the distribution components for the convolution density function; meanwhile, Genetic Programming has been applied to identify the time factor functions in the mixture distribution family automatically. The results of our numerical experiments have shown that our proposed approach works well. Acknowledgement. The authors would like to thank two anonymous reviewers for their valuable suggestions, which significantly improved the quality of this paper.
References 1. Titterington, D., Smith, A., Makov, U.: Statistical Analysis of Finite Mixture Distributions. John Wiley and Sons (1985) 2. Tan, K., Tokinaga, S.: Identifying returns distribution by using mixture distribution optimized by genetic algorithm. In: Proceedings of NOLTA 2006, pp. 119–122 (2006) 3. McLachlan, G.J., Basford, K.E.: Mixture Models: Inference and Applications to Clustering. Marcel Dekker (1988) 4. Carol, A.: Normal mixture diffusion with uncertain volatility: Modelling short and long-term smile effects. Journal of Banking & Finance 28(12) (2004) 5. Tan, K.: An Approximation of returns distribution based upon GA optimized mixture distribution and its applications. In: Proceedings of the Fourth International Conference on Computational Intelligence, Robotics and Autonomous Systems, pp. 307–312 (2007) 6. Carlin, B.P., Louis, T.A.: Bayes and empirical Bayes methods for data analysis. Chapman and Hall, New York (1996) 7. Clifford, P.: Discussion on the meeting on the Gibbs sampler and other Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Series B 55, 53–54 (1993) 8. Goldberg, D.E.: Genetic Algorithm: in Search, Optimization, and Machine Learning. Addison-Wesley Press (1989) 9. Koza, J.R.: Genetic Programming. MIT Press (1992) 10. Tan, K., Gani, J.: Theoretical Advances and Applications in Operatios Research. Kyushu University Press (2011) 11. Dorsey, B., Mayer, W.J.: Genetic algorithms for estimation problems with multiple optima, nondifferentiability and other irregular features. Journal of Business and Economic Statistics 13, 53–66 (1995)
Clustering Based Bagging Algorithm on Imbalanced Data Sets Xiao-Yan Sun1,2 , Hua-Xiang Zhang1,2 , and Zhi-Chao Wang1,2 1
Department of Information Science and Engineering, Shandong Normal University Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology 250014, Jinan, Shandong China {xiaomeixi 1987,huaxzhang,lsws33}@163.com
2
Abstract. The approach of under-sampling the majority class is an effective method in dealing with classifying imbalanced data sets, but it has the deficiency of ignoring useful information. In order to eliminate this deficiency, we propose a Clustering Based Bagging Algorithm (CBBA). In CBBA, the majority class is clustered into several groups and instances are randomly sampled from each group. Those sampled instances are combined together with the minority class instances, and are used to train a base classifier. Final predictions are produced by combining those classifiers. The experimental results show that our approach outperforms the under-sampling method. Keywords: Under-sampling, Bagging, Clustering.
1
Introduction
In many real applications, we often face the problem of imbalanced data sets where the instances of one class are fewer than that of other classes, which means that the class distribution is highly skewed. These kinds of problems are called class-imbalanced learning issues, and exist in many practical domains, such as fraud detection, introduction prevention, risk management and medical research [1]. This work focuses on binary classification problems, and refers the minority and majority class as positive and negative class respectively. Traditional algorithms tend to show a strong bias toward the majority, since they aim to maximize the overall accuracy. For example, in medical detection, the number of people who have cancer is about 1% of all the people. If an algorithm predicts all the instances as the majority class, it still gets a high accuracy of 99%, but it cannot recognize the minority class instances. However, in many cases, the accuracy of the minority class is often much important. Therefore, many studies have been discussed to tackle this demanding problem, and the approaches proposed are mainly divided into two aspects: data level and algorithm level. On data level, solutions are proposed to artificially balance the training sets by modifying the distribution of the data sets and the commonly used methods are known as under-sampling and over-sampling respectively. Y. Tang, V.-N. Huynh, and J. Lawry (Eds.): IUKM 2011, LNAI 7027, pp. 179–186, 2011. c Springer-Verlag Berlin Heidelberg 2011
180
X.-Y. Sun, H.-X. Zhang, and Z.-C. Wang
The under-sampling alters the size of data sets by taking representative samples of the majority class. One simple method is the random under-sampling that removes the majority class instances randomly and has the potential to ignore useful information. Several advanced methods are proposed to select more representative instances. One-sided selection tries to remove the borderline and noisy instances and use the remaining samples as the training sets [2]. A roughly balanced Bagging algorithm determines the majority instances according to the minority binomial distribution [3], and a cluster-based under-sampling algorithm selects instances based on clustering [4]. The over-sampling approach increases the number of minority class instances to balance the training set. Random over-sampling can be easily implemented by duplicating the minority class instances, and at the same time it may cause over-fitting. More complex approaches have been proposed and one of the famous over-sampling approaches is SMOTE [5]. SMOTE produces synthetic minority class instances between existing minority instances and can avoid overfitting. However, it blindly generates minority class instances without considering majority class instances, and may cause overgeneralization [4]. The borderlineSMOTE improves SMOTE, in which the minority examples near the borderline are over-sampled [6]. A cluster-based over-sampling algorithm attempts to deal with between-class imbalance and within-class imbalance simultaneously [7]. It has been shown that both sampling methods are helpful in imbalanced problems [1]. In this paper, we focus on under-sampling, because it has been shown to outperform over-sampling [8]. On algorithm level, approaches are designed to modify the learning algorithm. Cost-based algorithms intend to minimize the total cost of the misclassification by assigning the minority class instances more misclassification costs. Many algorithms combined with ensemble methods have been proposed, such as AdaCost [9], Costing [10], AdaC2 [11]. Though these algorithms have been proved to be effective [12], the effectiveness is limited [13]. Boosting has been widely used in classifying fields, but it does not perform well on imbalanced problems as it does not take into account the class imbalance of the training sets with respect to their class. Recently, many advanced boosting algorithms combined with over-sampling have been proposed. SMOTEBoost is based on a combination of the SMOTE algorithm and a boosting procedure. It generates synthetic examples at each boosting round, and then changes the distribution of the training data [14]. Databoost-IM tries to find out hard examples of both classes and generate synthetic examples for each class, and ensure the total weights of each class in the new training sets are rebalanced [15]. EasyEnsemble combines the entire minority class with an equal proportion of the majority class as the training sets in each boosting round and aggregates several sub-classifiers to make final predictions [16]. Bagging is less attractive than boosting-based algorithms when classifying imbalanced data sets[3]. Some variations of bagging methods have been proposed. A common strategy is to construct a balanced training set by sampling the majority class instances. Meanwhile, the number of sampled majority class
Clustering Based Bagging Algorithm on Imbalanced Data Sets
181
instances is equal to the size of the minority class. Then, a balanced training set is used to train a base classifier by combining all the minority class instances. Most advanced algorithms tend to alter the sampling strategy for the majority class. For example, Tao et al. proposed to use bootstrap sampling [17], which may ignore some useful instances in the majority class, Li divided the majority class instances into several parts and the number of each part is equal to the minority class [18]. This method uses all the majority class instances, which may not be useful. We propose a clustering based bagging algorithm to classify imbalanced data sets. Clustering technique is used to partition the majority class into several groups, and some majority class instances are sampled from each group, and combined with the minority class instances to train a base classifier. Those classifiers are then combined to classify new instances. The rest of this paper is organized as follows. Section 2 presents the proposed method. Section 3 reviews the performance metrics Section 4 shows the experiments, and section 5 concludes this paper.
2
Clustering Based Bagging Algorithm (CBBA)
This section describes our proposed method that alters the class distribution by removing some majority class instances. As previously discussed, a balanced training set is effective. The main point is how to sample the majority class instances. So, our algorithm CBBA aims to take representative samples from the majority class by using K-means. Firstly, we use K-means to cluster the majority class into several groups, and then randomly sample instances from each group, and combine with all the minority class instances to train a base classifier. Table 1 describes the algorithm in detail.
Table 1. CBBA algorithm 1: Input: is the training data set, set and represent the minority class and majority class respectively, and the size of each class is |P | and |N |. L is the number of iterations to train a bagging algorithm K is the number of clusters 2: Build the Random-BBVC model, for i=1 to L (1)K-Means algorithm is used to cluster the N into K clusters: C1 , C2 , ..., Ck ,set|Cj | as the size of each Cj ,1 ≤ j ≤ K (2)Let Cj contains the instances that randomly sampled from each Cj , set |Cj | as |P |∗|Cj | the size of each Cj ,and |Cj | = |N |
(3)Set N be the set combined all the Cj , (4)Using N and P as the training set to train a base classifier f i (x) . 3: Output: H(x) = sign[Σf i (x)]
182
X.-Y. Sun, H.-X. Zhang, and Z.-C. Wang
In CBBA, the majority class instances are clustered in the first step. The main idea of CBBA is that some similar instances will be assigned to the same cluster. If we randomly sample a cluster, we can remove some instances with similar characteristics, and still retain the useful information for classification. This is the main innovation of this algorithm. Meanwhile, the instability of Kmean makes different results of each cluster. If we randomly sample instances, the difference of training data sets in base classifiers will be increased. The |P |∗|C | number of sampled instances is |N | j ,and that is to say, if one cluster has more
instances, we will sample more. All Cj s are combined as the majority instances to train a base classifier, and we can see that the number of sampled majority class instances almost equals the size of the minority class. Final decisions are decided by combining all the base classifiers.
3
Performance Metrics
In this section, we will describe several performance metrics used in the experiment. Table 2 shows the confusion matrix for a two-class problem, where TP and TN represent the number of positive and negative that are classified correctly, and FN and FP represent the number of negative and positive that are misclassified. Table 2. Confusion matrix for a two-class problem Predicted positive Predicted negative Positive TP(true positive) FN(false negative) Negative FP(false positive) TN(true negative)
Accuracy. Accuracy represents the population of the correctly predicted examples, which is not an appropriate evaluation criterion in imbalanced data sets, and we will not put on much attention it. Acc = (T P + T N )/(T P + F N + F P + T N )
(1)
True positive rate. True positive rate (T P rate) is the percentage of positive examples correctly classified in the positive class. T P rate = T P/(T P + F N )
(2)
F-value. The F-value combines the Precision and Recall, and gets a higher value when both of Precision and Recall are high. F − value = ((1 + β 2 ) × P recision × Reccall)/(β 2 × P recision + Recall) (3) Where P recision = T P/(T P + F P ),Recall = T P rate = T P/(T P + F N ), and β is usually set to 1.
Clustering Based Bagging Algorithm on Imbalanced Data Sets
183
G-mean. The G-mean is defined as the square root of the accuracy on both classes. A higher G-mean value indicates that a learning algorithm performs better on both classes. √ G − mean = Acc+ × Acc− (3) Where Acc− = T N/(T N + F P ), Acc+ = T P/(T P + F N ).
4
Experiments
4.1
Data Sets Description
We test CBBA on 12 data sets from UCI 1 . All the data sets are chosen or transformed into binary data sets. For example, Segment dataset has 2310 examples and 7 classes, and we take the fifth class “window” as the positive class, and combine others as the negative class. A summary of those data sets are given in Table 3. Table 3. Information of data sets Dataset
4.2
size Attribute min/ maj Minority%
Balance-scale 625
4
49/576
7.84
breast-cancer 286
9
85/201
29.72
colic
368
22
136/232
36.96
credit-g
1000
20
300/700
30.00
diabetes
768
8
268/500
34.90
glass
214
9
76/138
35.51
haberman
306
3
81/225
26.47
hepatitis
155
19
32/123
20.65
segment
2310
19
330/1980
14.29
sick
3772
29
231/3541
6.12
vehicle
846
18
212/634
25.06
vote
435
16
168/267
38.62
Experimental Settings
We implement CBBA in Weka 3.5.8. For every data set, we use a 10-fold cross validation to evaluate the results of our experiments. We compare the results of the data sets on TP rate, F-value and G-mean. In our experiments, we compared two methods [3] Baggingabbreviated as Bag:Bagging use all the data 1
C. Blake, E. Keogh and C. J. Merz, UCI repository of machine learning datasets, http://www.ics.uci.edu/mlearn/MLRepository.html
184
X.-Y. Sun, H.-X. Zhang, and Z.-C. Wang
instances(P + N ), and REPTree is used to train the base classifier. The number of iterations L is set to 10. Under-sampling and Bagging (abbreviated as Un-Bag): We randomly sample a subset N from N (|N | = |P |), and then use REPTree to train the base classifier with |N | + |P |. The number of iterations is set to 10. CBBA: The number of clusters is set to 2,3,5. Those algorithms are named as CBBA2, CBBA3, and CBBA5 respectively. REPTree is used to train the base classifier. The number of iterations is set to 10. 4.3
Experimental Results and Analyses
Table 4 describes the comparison of all the data sets on True positive rate. Surprisingly, our approach outperforms Bag and Un-Bag for all the data sets. This may be attributed to exploring the majority class as much as possible. The results suggest our approach is effective in improving the classification accuracy for minority class. We test 2, 3 and 5 as the number of clusters respectively, we can see CBBA2 performs better on four data sets, CBBA3 performs better on five data sets, CBBA5 performs better on five data sets, including on the segment data set, CBBA2 has the same accuracy with CBBA3 of 98.8%, on colic data set, CBBA3 and CBBA5 have the same accuracy of 81.6%, but none of them performs excellently than others. Table 4. TP rates on all datasets (the optimal values are in boldface) Dataset Balance-scale
Bag Un-Bag CBBA2 CBBA3 CBBA5 0.653
0.694
breast-cancer 0.165 0.635
0.671
0.635
0.576
0.809
0.816
0.816
colic
0
0.721 0.787
0.633
0.714
credit-g
0.447
0.73
0.763
0.747
0.747
diabetes
0.578 0.769
0.799
0.795
0.776
glass
0.618 0.763
0.737
0.789
0.763
haberman
0.173 0.593
0.593
0.617
0.63
hepatitis
0.313 0.781
0.813
0.875
0.906
segment
0.888 0.985
0.988 0.988
sick
0.818 0.961
0.974
vehicle
0.396 0.807 0.952
vote
0.97
0.976
0.974
0.983
0.811
0.797
0.816
0.982
0.988
0.988
Table 5 demonstrates results on F-value, from which, we see that CBBA behaves better on F-value in ten data sets. On the segment and sick data set, Bagging has higher value than under-sampling bagging and CBBA. Compare
Clustering Based Bagging Algorithm on Imbalanced Data Sets
185
with Table 4, we know CBBA improves the TP rate, but it decreases the Fvalue of the minority class, which may reflect that CBBA misclassifies more majority instances. Table 5. F-value and G-mean on all datasets(the optimal values are in boldface) Data set
Table 5 also shows the comparison on G-mean, which reflects the overall classification performance of a classifier. Similarly, we see that CBBA still works well, which has higher G-mean value than each of other methods on ten data sets. On the diabetes data set, CBBA and Un-Bag have the same G-mean value, but CBBA has higher TP rate and F-value than Un-Bag. On the sick data set, Un-Bag performs best, but has lower F-value than Bagging and TP rate than CBBA. And this may be attributed to the characteristics of this data set or the instability of CBBA. The experimental results show that CBBA improves the classification accuracy of minority class without sacrificing the majority class too much.
5
Conclusions
This paper proposes an approach to cluster the majority class into several groups and sample some instances from each group to make the training data sets balanced. Experimental results show our approach improves TP rate, F-value, and G-mean when classifying imbalanced data sets. There are several open issues left to be discussed. The number of clusters needs to be addressed, and more experiments should be conducted to compare with other under-sampling approaches. Acknowledgments. This research is partially supported by the National Natural Science Foundation of China (No. 61170145), the Science and Technology Projects of Shandong Province, China (No.s 2008B0026, ZR2010FM021, and 2010G 0020115) and supported by Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology.
186
X.-Y. Sun, H.-X. Zhang, and Z.-C. Wang
References 1. Weiss, G.M.: Mining with rarity: A unifying framework. Chicago, IL, USA. SIGKDD Explorations 6(1), 7–19 (2004) 2. Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: One-Sided selection. In: 14th International Conference on Machine Learning, Tennessee, pp. 179–186 (1997) 3. Hido, S., Kashima, H.: Roughly Balanced Bagging for Imbalanced Data. In: 2008 SIAM International Conference on Data Mining, pp. 143–152 (2008) 4. Yen, S.-J., Lee, Y.-S.: Cluster-based sampling approaches to imbalanced data distributions. In: Tjoa, A.M., Trujillo, J. (eds.) DaWaK 2006. LNCS, vol. 4081, pp. 427–436. Springer, Heidelberg (2006) 5. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16, 341–378 (2002) 6. Han, H., Wang, W.-Y., Mao, B.-H.: Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 878–887. Springer, Heidelberg (2005) 7. Jo, T., Japkowicz, N.: Class imbalances versus small disjuncts. SIGKDD Explorations 6(1), 40–49 (2004) 8. Drummond, C., Holter, C.: C4.5, class imbalance and cost sensitivity: Why undersampling beats over-sampling. In: ICML Workshop on Learning from Imbalaneed Data Sets, Washington D.C (2003) 9. Fan, W., Stolfo, S.J., Zhang, J., Chan, P.K.: AdaCost: Misclassification Costsensitive boosting. In: 16th International Conference on Machine Learning, Bled, Slovenia, pp. 97–105 (1999) 10. Zadrozny, B., Langford, J., Abe, N.: Cost-sensitive learning by cost-proportionate example weighting. In: 3rd IEEE International Conference on Data Mining, pp. 435–442 (2003) 11. Sun, Y., Kamel, M.S., Wong, A.K.C., Wang, Y.: Cost-sensitive boosting for classification of imbalanced data. Journal of Pattern Recognition 40(12), 3358–3375 (2007) 12. Lin, Y., Lee, Y., Wahba, G.: Support Vector Machines for Classification in Nonstandard Situations. Machine Learning 46(1-3), 191–202 (2002) 13. Wu, G., Chang, E.Y.: KBA: Kernel Boundary Alignment Considering Imbalanced Data Distribution. IEEE Transactions on Knowledge and Data Engineering 17(6), 786–795 (2005) 14. Chawla, N.V., Lazarevic, A., Hall, L.O., Bowyer, K.W.: SMOTEBoost: Improving Prediction of the Minority Class in Boosting. In: Lavraˇc, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 107–119. Springer, Heidelberg (2003) 15. Guo, H.-Y., Viktor, H.L.: Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach. SIGKDD Explorations 6(1), 30–39 (2004) 16. Liu, X.Y., Wu, J.X., Zhou, Z.H.: Exploratory Under-Sampling for Class-Imbalance Learning. In: 6th IEEE International Conference on Data Mining, Hong Kong, China, pp. 539–550 (2006) 17. Tao, D., Tang, X., Li, X., Wu, X.: Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(7), 1088–1099 (2006) 18. Li, C.: Classifying Imbalanced Data Using A Bagging Ensemble Variation (BEV). In: 45th Annual Southeast Regional Conference, pp. 203–208 (2007)
Agglomerative Hierarchical Clustering Using Asymmetric Similarity Based on a Bag Model and Application to Information on the Web Satoshi Takumi1 and Sadaaki Miyamoto2
2
1 Graduate School of Systems and Information Engineering University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan [email protected] Department of Risk Engineering, Faculty of Systems and Information Engineering University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan [email protected]
Abstract. An algorithm of agglomerative hierarchical clustering using an asymmetric similarity measure based on a bag model is proposed. This bag model is studied for document clustering and analysis of information on the web. The definition of an inter-cluster similarity is proposed and a dendrogram output reflecting asymmetry of the similarity measure is shown. It is also proved that the dendrogram has no reversals. An example of word clusters on Twitter shows how the method works.
1
Introduction
Cluster analysis or clustering is becoming a standard tool in modern data mining and data analysis. Clustering techniques are divided into two classes of hierarchical and non-hierarchical methods. The major technique in the first class is the well-known agglomerative hierarchical clustering [1,2] which is old but has been popular in a variety of applications. Agglomerative hierarchical clustering uses a symmetric similarity or dissimilarity measure between a pair of objects. In real applications, however, relation between objects is frequently asymmetric. In such cases we have a motivation to analyze asymmetric measures and obtain clusters having asymmetric features. Several studies have been done on clustering based on asymmetric similarity measures [3,6,10,7,8]. They discuss generalizations of algorithms for symmetric measures such as the single link, complete link, and average link. In contrast, Takumi and Miyamoto [9] show two different models to define inter-cluster similarities in accordance with different classes of applications. Moreover they prove that the resulting dendrograms have no reversals. We propose a new linkage method for an asymmetric similarity measure that is appropriate to document/term clustering in this paper. Document/term clustering has been considered on the basis of a bag model which means that each document has a bag of words. We will show that this bag-theoretical model naturally leads to an asymmetric measure of similarity between two objects, Y. Tang, V.-N. Huynh, and J. Lawry (Eds.): IUKM 2011, LNAI 7027, pp. 187–196, 2011. c Springer-Verlag Berlin Heidelberg 2011
188
S. Takumi and S. Miyamoto
i.e., documents or words. Moreover we prove that an agglomerative hierarchical clustering algorithm using this measure does not produce any reversal on a dendrogram output. As an application we handle a simulated Twitter, showing how the method works.
2
Preliminary Considerations
We review agglomerative hierarchical clustering and a bag model for document/keyword clustering before introducing an asymmetric similarity measure herein. 2.1
Agglomerative Hierarchical Clustering
Let the set of objects for clustering be X = {x1 , . . . , xN }. Generally a cluster denoted by Gi is a subset of X. The family of clusters is denoted by G = {G1 , G2 , . . . , GK }, where the clusters form a crisp partition of X: K
Gi = X,
Gi ∩ Gj = ∅
(i = j).
i=1
Moreover the number of objects in G is denoted by |G|. Agglomerative hierarchical clustering uses a similarity or dissimilarity measure. We use similarity here: similarity between two objects x, y ∈ X is assumed to be given and denoted by s(x, y). Similarity between two clusters is also used, which is denoted by s(G, G ) (G, G ∈ G) which also is called an inter-cluster similarity. In the classical setting a similarity measure is assumed to be symmetric: s(G, G ) = s(G , G).
(1)
A general procedure of agglomerative hierarchical clustering [4,5] is as follows. AHC: Agglomerative Hierarchical Clustering Algorithm. AHC1: Assume that initial clusters are given by ˆ 2, . . . , G ˆ N0 }, where G ˆ 1, G ˆ2, . . . , G ˆ N are given initial clusters. ˆ 1, G G = {G ˆ Generally Gj = {xj } ⊂ X, hence N0 = N . Set K = N0 . (K is the number of clusters and N0 is the initial number of clusters) ˆ i (i = 1, . . . , K). Gi = G Calculate s(G, G ) for all pairs G, G ∈ G.
Agglomerative Hierarchical Clustering Using Asymmetric Similarity
189
AHC2: Search the pair of maximum similarity: (Gp , Gq ) = arg max s(Gi , Gj ),
(2)
mK = s(Gp , Gq ) = max s(Gi , Gj ).
(3)
Gi ,Gj ∈G
and let Gi ,Gj ∈G
Merge: Gr = Gp ∪ Gq . Add Gr to G and delete Gp , Gq from G. K = K − 1. If K = 1 then stop and output the dendrogram. AHC3: Update similarity s(Gr , G ) and s(G , Gr ) for all G ∈ G. Go to AHC2. End AHC. Note 1. The calculation of s(G , Gr ) in AHC3 is unnecessary when the measure is symmetric: s(Gr , G ) = s(G , Gr ). Well-known linkage methods such as the single link, complete link, and average link all assume symmetric similarity measures [1,2,4]. In particular, the single link uses the following inter-cluster similarity definition: s(G, G ) =
max
x∈G,y∈G
s(x, y).
(4)
The average link defines the next inter-cluster similarity: s(G, G ) =
1 |G||G |
s(x, y).
(5)
x∈G,y∈G
For the single link, complete link, and average link, it is known that we have the monotonicity of mK : mN ≥ mK−1 ≥ · · · ≥ m2 ≥ m1 .
(6)
If the monotonicity does not hold, we have a reversal in a dendrogram: it means ˆ = G ∪ G at level m = s(G, G ) and after that G and G are merged into G ˆ and G are merged at the level m ˆ G ), and m that G ˆ = s(G, ˆ > m occurs. Reversals in a dendrogram is observed for the centroid method. Consider the next example [4,5]: Example 1. If three points A, B, C on a plane are near equilateral triangle but two points A, B are nearer, these two are merged into a cluster, and then the distance between the mid point (centroid) of AB and C will be smaller than the distance between A and B. We thus have a reversal. Apparently, if the monotonicity always holds for a linkage method, no reversals in the dendrogram will occur. A simple example of a reversal is shown in Fig. 1.
190
S. Takumi and S. Miyamoto
a
b
c
Fig. 1. A simple example of reversal
2.2
A Bag Model
Assume that D = {d1 , . . . , dL } is a set of documents and T = {t1 , . . . , tM } is a set of terms alias keywords. We also assume an L × M matrix M = (mij ) is given: mij is the number of occurrences of term tj in document di . It is obvious that mij ≥ 0. A bag generally is characterized by the count function: if we consider a bag B of T , count function CB (tj ) means the number of occurrence of tj in B. The collection of all bags of T is denoted by N T . We list basic relations and operations in the following. – – – – –
Inclusion: B ⊆ B ⇐⇒ CB (t) ≤ CB (t), ∀t ∈ T . Equality: B = B ⇐⇒ CB (t) = CB (t), ∀t ∈ T . Union: CB∪B (t) = max{CB (t), CB (t)}. Intersection: CB∩B (t) = min{CB (t), CB (t)}. Addition: A bag also has the operation of addition that is specific to bags: CB⊕B (t) = CB (t) + CB (t).
We now introduce a bag model to analyze document/term relation: it uses a mapping F : D → N T : (7) CF (di ) (tj ) = mij . The ‘inverse mapping’ F −1 : T → N D is also considered: CF −1 (tj ) (di ) = mij .
(8)
For a subset G ⊆ T , we define
CF −1 (G) (di ) =
mij .
(9)
tj ∈G
In other words, F −1 (G) =
t∈G
F −1 (t),
(10)
Agglomerative Hierarchical Clustering Using Asymmetric Similarity
191
using the addition, and moreover F (K) =
F (d).
(11)
d∈K
The reason why we use such a bag model is that we define measures of similarity. For this purpose we define the cardinality of a bag B of T : |B| =
CB (tj ).
tj ∈T
A well-known symmetric measure of similarity is as follows [4]. s(t, t ) =
|F −1 (t) ∩ F −1 (t )| ; |F −1 (t) ∪ F −1 (t )|
(12)
s(d, d ) is defined likewise by substituting F into F −1 . An interpretation of the above measure is the area of the intersection of the two bags divided by the area of the union of the two bags. In contrast, an asymmetric measure is sometimes discussed: s (t, t ) =
|F −1 (t) ∩ F −1 (t )| . |F −1 (t)|
(13)
The interpretation is the degree of inclusion: if F −1 (t) is completely included in F −1 (t ), the last measure takes its maximum value of unity. If we consider an asymmetric similarity on D, we use s (d, d ) =
3
|F (d) ∩ F (d )| . |F (d)|
(14)
Clustering Using an Asymmetric Similarity Measure
We assume hereafter that similarity measures are asymmetric in general: s(G, G ) = s(G , G). First, we use AHC algorithm in the previous section, which means that two clusters (Gp , Gq ) with s(Gp , Gq ) = max s(Gi , Gj ) Gi ,Gj ∈G
(15)
will be merged regardless of asymmetric property. The above equation can be rewritten as s(Gp , Gq ) = max max{s(Gi , Gj ), s(Gj , Gi )}. (16) i<j
192
S. Takumi and S. Miyamoto
Let us introduce a concrete linkage method for the bag model herein. For this purpose we should define an asymmetric inter-cluster similarity s(G, G ) for G, G ⊆ T , i.e., we consider term clustering: s(G, G ) =
|F −1 (G) ∩ F −1 (G )| . |F −1 (G)||G |
(17)
Note that |G | in the denominator is the number of elements of the ordinary set G . We have the following proposition. Proposition 1. The similarity measure (17) satisfies 0 ≤ s(G, G ) ≤ 1. If |G |=1, s(G, G ) takes its maximum value of unity if and only if F −1 (G) ⊆ F −1 (G ). Moreover when G = {t} and G = {t }, then (17) is reduced to (13). The proof is almost trivial and omitted. We proceed to see that the next property holds. Proposition 2. AHC algorithm with (17) does not produce any reversal in the dendrogram. The rest of this section is devoted to the proof of Proposition 2. Let us define S(K) = {s(G, G ) : ∀(G, G ) ∈ G × G, G = G },
(18)
where K is the index in AHC and G changes as K varies, e.g., |G| = K. Hence S(K) is the set of all values of similarity for K. We also assume max S(K) is the maximum value of S(K): it exactly is mK given by (3). We then have the next lemma. Lemma 1. If max S(K) is monotonically non-increasing with respect to K: max S(N ) ≥ max S(N − 1) ≥ · · · ≥ max S(2) ≥ max S(1),
(19)
then there is no reversal in the dendrogram. Proof. The proof is almost trivial, since max S(K) = mK . Thus (19) is exactly the same as (6), thus we have the conclusion. Second lemma states a few properties of components in (17). Lemma 2. The following equations and inequality hold for G ∩ G = ∅. |G ∪ G | = |G| + |G |, F |F
−1
−1
(G ∪ G ) = F
(G ∪ G ) ∩ F
−1
−1
(G) ⊕ F
(G )| ≤ |F
−1
+ |F
−1
(20) −1
(G) ∩ F
(G ),
−1
(G ) ∩ F
(21)
(G )|
−1
(G )|.
(22)
Agglomerative Hierarchical Clustering Using Asymmetric Similarity
193
Proof. The first two equations (20) and (21) are easy to prove, since G ∩ G = ∅. The last inequality (22) is proved from the next relation: min(a + b, c) ≤ min(a, c) + min(b, c),
which is easily checked. Using the last lemma, we have the following proposition. Proposition 3. For three clusters G, G , G ∈ G, the next inequalities hold. s(G ∪ G , G ) ≤ max{s(G, G ), s(G , G )},
s(G , G ∪ G ) ≤ max{s(G , G), s(G , G )}.
(23) (24)
Proof. |F −1 (G ∪ G ) ∩ F −1 (G )| |F −1 (G ∪ G )||G | −1 |(F (G) ⊕ F −1 (G )) ∩ F −1 (G )| = |F −1 (G) ⊕ F −1 (G )||G | −1 |F (G) ∩ F −1 (G )| + |F −1 (G ) ∩ F −1 (G )| ≤ (|F −1 (G)| + |F −1 (G )|)|G | = max{s(G, G ), s(G , G )},
s(G ∪ G ,G ) =
where way:
a+b c+d
≤ max{ ab , dc } is used. The second inequality is proved in the same |F −1 (G ) ∩ F −1 (G ∪ G )| |F −1 (G )||G ∪ G | |F −1 (G ) ∩ (F −1 (G) ⊕ F −1 (G ))| = |F −1 (G )||G ∪ G | ≤ max{s(G , G), s(G , G )}.
s(G , G ∪ G ) =
The proof of Proposition 2 is now easy, since (23) and (24) imply that max S(K) is monotonically non-increasing with respect to K. Hence Lemma 1 is applied and we see that we have no reversals on the dendrogram. 3.1
Asymmetric Dendrogram
Foregoing studies propose asymmetric dendrograms [6,10,9]. Note again that (15) is equivalent to (16). When Gp and Gq are merged at the level s(Gp , Gq ), we have s(Gp , Gq ) ≥ s(Gq , Gp ). Yadohisa [10] proposed to show the value s(Gq , Gp ) in addition to the merged level s(Gp , Gq ) in the dendrogram using another lines. We do not use this idea of showing the both levels, since such a dendrogram is too cumbersome to observe, especially when the number of objects is relatively large.
194
S. Takumi and S. Miyamoto
y
x
1
0
Fig. 2. Asymmetric dendrogram: a variation of Yadohisa’s [10] that is used in [9]
Instead, we use a simpler dendrogram that shows asymmetry by the ratio and position of the merged clusters [6]. Suppose s(Gp , Gq ) ≥ s(Gq , Gp ). We use the next two rules: 1. Cluster Gq is placed at the upper side of the branch while Gp is at the lower side of the branch that shows merging of Gp and Gq . 2. Asymmetric branch like the ones in Figs. 3 and 4 is used: the line toward right of the branch does not stem from the midst of the branch, but from the point nearer the upper side of the branch. Let the ratio of the length between the stemming point and the upper side of the branch indicating Gq be Lq and that between the point and the lower side indicating Gp be Lp . We set Lq /Lp = s(Gq , Gp )/s(Gp , Gq ). We have Lq /Lp ≤ 1 from rule 1). Thus the dendrogram is similar to that for symmetric similarity except that the branch has the asymmetry.
4
Illustrative Examples
First example dealt with Japanese Wikipedia: 17 entries shown in Fig. 3 have been extracted. The entries are English translations of the titles of the original documents in Japanese. In this example document clustering was done instead of term clustering. Terms occurring more than twice were used for T . We observe that the title words of similar concepts are assigned to clusters of high similarity, while those of different concepts are assigned to different clusters, e.g., entries concerning ‘sports’ are related to three universities of ‘Tokyo University’, ‘Kyoto Universsity’ and ‘Tsukuba University’ in Japan. Second example used simulation of a Twitter (or chat) about smart phones in Japan. The original data are in Japanese which are based on a real conversation among students that was summarized into a series of short sentences like a Twitter. Keywords were selected and the proposed method was applied. The resulting dendrogram is shown in Fig. 4. We observe two major clusters: one is from the top ‘IS03’ to ‘display’; the other is from ‘problem’ to the bottom ‘Qrcode’. The first cluster includes companies
Agglomerative Hierarchical Clustering Using Asymmetric Similarity
195
Tokyo Univ Tsukuba Univ Kyoto Univ basketball handball soccer SONY Nintendo SNS Facebook Twitter earthquake tsunami nuclear energy text mining clustering principal component analysis 1
0
Fig. 3. Asymmetric dendrogram generated from 17 entries of Japanese Wikipedia
IS03 image quality au Skype fashionable iphone charge design GALAXY OEL garapagos phone OS display problem time function radio wave docomo smart phone battery external memory infrared light 1SEG Qrcode 1
0
Fig. 4. Asymmetric dendrogram generated from a simulated Twitter on smart phones in Japan
196
S. Takumi and S. Miyamoto
with different smart phones as subclusters, and the second is mainly on different functions and capabilities that are common to different types of smart phones.
5
Conclusions
We have discussed a particular model of asymmetric similarity based on a bag model for application to document/term clustering and information on the web. We had a method handling an asymmetric similarity without any reversal on dendrograms. This property was also studied in [9], and will be a basis for further research of agglomerative hierarchical clustering with asymmetric similarity measures. As an application, we showed a small example of simulated Twitter. Although the example uses simulation, real Twitters and other information on the web can be analyzed by the same method. Acknowledgment. This work has partly been supported by the Grant-inAid for Scientific Research, Japan Society for the Promotion of Science, No. 23500269.
References 1. Anderberg, M.R.: Cluster Analysis for Applications. Academic Press, New York (1960) 2. Everitt, B.S.: Cluster Analysis, 3rd edn. Arnold, London (1993) 3. Hubert, L.: Min and max hierarchical clustering using asymmetric similarity measures. Psychometrika 38(1), 63–72 (1973) 4. Miyamoto, S.: Fuzzy Sets in Information Retrieval and Cluster Analysis. Kluwer, Dordrecht (1990) 5. Miyamoto, S.: Introduction to Cluster Analysis, Morikita-Shuppan, Tokyo (1999) (in Japanese) 6. Okada, A., Iwamoto, T.: A Comparison before and after the Joint First Stage Achievement Test by Asymmetric Cluster Analysis. Behaviormetrika 23(2), 169– 185 (1996) 7. Saito, T., Yadohisa, H.: Data Analysis of Asymmetric Structures. Marcel Dekker, New York (2005) 8. Takeuchi, A., Saito, T., Yadohisa, H.: Asymmetric agglomerative hierarchical clustering algorithms and their evaluations. Journal of Classification 24, 123–143 (2007) 9. Takumi, S., Miyamoto, S.: Agglomerative Clustering Using Asymmetric Similarities. In: Torra, V., Narakawa, Y., Yin, J., Long, J. (eds.) MDAI 2011. LNCS (LNAI), vol. 6820, pp. 114–125. Springer, Heidelberg (2011) 10. Yadohisa, H.: Formulation of Asymmetric Agglomerative Clustering and Graphical Representation of Its Result. J. of Japanese Society of Computational Statistics 15(2), 309–316 (2002) (in Japanese)
Applying Agglomerative Fuzzy K-Means to Reduce the Cost of Telephone Marketing Ming-Jia Hsu, Ping-Yu Hsu, and Bayarmaa Dashnyam Department of Business Administration, National Central University, No. 300, Jhongda Rd., Jhongli City, Taoyuan County 32001, Taiwan [email protected], [email protected], [email protected]
Abstract. This research utilizes marketing research database the Taiwan telecom itself has together with Agglomerative Fuzzy K-Means to proceed fuzzy clustering analysis. The database content includes online behaviors and basic properties of clients, such as online motive, online frequency, salary, and gender. First, we use descriptive statistics to determine the difference in online behavior among different client clusters; these differences among clusters comprise indexes. Next, we compare the obtained indexes with experts’ judgments to verify the precision of each index. These indexes can be used to estimate client’s mobile online hours and the adaptive tariff plan. In addition, while approaching different cases, sales personnel can specifically query on significant questions within the index. Moreover, using these pre-identification indexes, prolonged question analysis, especially on illogical answers, can be avoided. This can result in time saving and increase the number of cases handled, causing an overall improvement in industry performance. Keywords: Data mining, Clustering analysis, Internet behavior, Agglomerative Fuzzy K-Means.
renewal contracts depends simply on the tariff plan. Most firms in the telecom industry use marketing investigation databases, compiled by statistical analysis on the features of both original and new customers, to launch a tariff plan that is better suited and closer to customer needs. According to statistics, the average time spent by each telecom sales personnel on a new or renewal case is about 11.7 min. If the time can be reduced, telecom carriers can serve more customers by recommending an appropriate tariff plan, and consequently increase the mobile internet revenue. Therefore, this research utilizes market research data on online behavior of users as well as their basic information, performs clustering analysis to establish the diversity of usage among customers, and finally investigates specific questions to achieve improved efficiency. The structure of this paper is as follows: Section 1 introduces the background of the research with its motive and purpose. Section 2 reviews the related works. Section 3 describes the system design of the research. Here we identify the system design procedures in four steps: first, acquire properties that might affect the willingness of a user. Second, convert non-numerical properties into numerical values. Third, perform manual artificial clustering by fuzzy clustering algorithm to achieve clustering. Fourth, output the clustering result. Section 4 shows the system verification and result analysis. Section 5 provides the concluding remarks.
2 2.1
Review of Literature Internet Behavior
According to Kotler’s statement, the variables of market segment are mainly divided into two categories [1]: consumer characteristics and consumer response. Both motives and habits of mobile internet are cross variables of these two segments. Some researchers have turned “Recency (R),” “Frequency (F),” and “Money Value (M),” the three relationship indicators of customer evaluation from the traditional RFM analysis model into new variables for customer internet behavior study. “R” was transferred into recent surfer times, “F” into the average surfer times within 3 months, and “M” into the average surfer hours within 3 months. RFM analysis can be used to combine these three customer-value-based variables—as a customer segmentation tool—with a clustering algorithm to group customers, and then to determine the target customer [2]. Moreover, the previous internet behaviors could also be counted in the databases while estimating the willingness of potential customers. When performing customer segmentation, customers with different values (such as students and office workers) should be grouped in different clusters; however, different internet behaviors could lead to different values. 2.2
Fuzzy Clustering
The theory of Fuzzy Clustering was proposed by Zadeh [3]. Precisely speaking, , , … and each of , , … is a cluster. could be could be regarded as the Membership regarded as a Member Function, and Degree of the fuzzy clustering ; the Membership Degree value is between [0,1],
Applying Agglomerative Fuzzy K-Means to Reduce the Cost of Telephone Marketing
199
and ∑ 1. However, Hsu, TH, KM Chu, and HC Chan propose two ways of fuzzy clustering. One is clustering by Fuzzy relation, and the other is clustering by an Object function [4]. Fuzzy K-Means algorithm was proposed by Bezdek in 1981 [5], also called Fuzzy C-Means, is an extension of the K-Means algorithm which proposed by Dunn in 1974. Its main purpose is to solve the optimal clustering problem. It is suitable for spherical clustering detection [6]. Gustafson–Kessel (GK) Algorithm [6-7] is another fuzzy clustering algorithm, which is also described in [8] and [11]. GK is the deformation of Fuzzy K-Means; it uses “Adaptive Distance Measure” to calculate the distance between data sample and cluster centers; moreover, it adds the “Fuzzy Co-variances” concept for detecting, especially for data with clusters having different shapes. Thus, it is suitable for detecting ellipsoidal clusters. 2.3
Agglomeration Fuzzy K-Means and Fuzzy K-Means
K-Means [9] was proposed by MacQueen and utilizes Euclidean Distance as the measurement standard of similarity. It involves randomly selecting the initial center, and then repeatedly reducing the sum of squared difference, [12] which means dividing n numbers of samples into k clusters until optimized. In other words, it involves that individual clusters existing the most similar data, and yet the data should greatly differ from that of other clusters. Dunn and Bezdek are the pioneers of the application of Fuzzy theory in clustering analysis. They call it “Fuzzy K-Means.” Their method involves the determination of center of clusters before starting. The result shows better performance in some uncertain circumstances, which also corresponds with reality. Thus, it is broadly used for subjective consciousness removal. It should be noted that Fuzzy K-Means and KMeans both take random choice as the determination of initial points and are easily impacted by “Noise” interruption with unstable clustering results that still have space for improvement. The Agglomerative Fuzzy K-Means Clustering adopted here was proposed by Li, Ng, Cheung, and Huang in 2008. Agglomerative means clustering from the bottom to the top [12]. The calculation processes are as follows: Step 1 - divide the data into n clusters. Step 2 - calculate the similarity of the clusters by the distance between their centers. Step 3 - gather closer clusters into a new cluster. Step 4 - re-calculate the distance between the new cluster and the rest of the clusters; furthermore, merge the similar clusters. Step 5 - merge until all clusters are merged into one cluster. Frigui and Krishnapuram once proposed a fuzzy clustering method based on an agglomerate formula that utilizes the minimum of an objective function to create a continuous split and to reduce the number of clusters [10]. This method resolved two problems of Fuzzy K-Means [1]: (1) In practice, it is unknown to what value the initial number of clusters should be set. (2) K-Means-type algorithms are vulnerable to the impact of the initial setting point.
3
Progress of Clustering
In this paper, we use one of the market investigation results from a telecommunication corporation in Taiwan to facilitate our research. During the investigation process, the
200
M.-J. Hsu, P.-Y. Hsu, and B. Dashnyam
data of the telecom carrier itself is undisclosed. This market investigation is used to discern the acceptance degree of different tariff plans for mobile wireless internet as well as the internet behavior of customers. The table below lists three kinds of tariff plans: Table 1. Content of tariff plans Tariff Plan
1
2
3
3.1
Content 1. Monthly rent NT$450, 500 MB free online surfing volume; surpasses volume will be charge depending on its volume. For all-you-can-eat tariff, the upper limit set is NT$1000, with monthly rent NT$450 included. 2. No binding contract needed; however, with 24-months contract, one could purchase a network card for NT$900. 1. Monthly rent NT$250 for 3 days usage; however, NT$69 extra charge each day counting from the fourth day. No limit on the monthly rent. 2. 24-months contract needed. If one needs a network card, s/he could purchase one for NT$900. 1. Daily rent NT$39, 20 MB free online surfing volume; surpasses volume charge in depends. NT$70 maximum per day. Calculated by the days of customer use. 2. No binding contract needed, but neither is there a discount for network card purchase.
Attributes Captured from Market Research Data
There are various questions and options available in the market research. First, the sales personnel of the telecommunication company will select options usually used for customer analysis. Table 2 lists the codes for 18 customer attributes. Table 3 presents the codes for the consideration acceptance degree of the customer toward the proposed future tariff plans. The telecommunication company will follow the result of this investigation to perform customer clustering; for example, {WU1, WU2, WU3} = {1, 2, 3} will be the customer group in the same cluster. We assume that the responses of the consideration of acceptance degree to each tariff plan are independent, with no influence on each other. We used STATISTICA to process MNOVA analysis of the customer attributes chosen by sales personnel. WU1, WU2, and WU3 were verified individually and it was checked whether the above 18 attributes really have significant influence on WU1, WU2, and WU3. The result shows the following: {AT2, AWD, ADTW, ATHW, ATH, U1} has significant influence on WU1; {AWDW, AWD, ADT, U2, education, sex} has significant influence on WU2; {ATW2, ADT, U3, income} has significant influence on WU3. Let {AT2, AWD, ADTW, ATHW, ATH, U1}∪{AWDW, AWD, ADT, U2, education, sex}∪{ATW2, ADT, U3, income} to gain 14 attributes as {ATW2, AT2, AWDW, AWD, ADTW, ADT, ATHW, ATH, U1, U2, U3, education, income, sex}. As these 14 attributes have influence on WU1, WU2, and WU3 individually, we proceeded with the union. In other words, first we eliminated the attribute that has no influence on WU1, WU2, and WU3; then, we used these 14 attributes for further clustering.
Applying Agglomerative Fuzzy K-Means to Reduce the Cost of Telephone Marketing
201
Table 2. Customer attributes Code ATW1 ATW2 AT1 AT2 AWDW AWD ADTW ADT ATHW ATH U1 U2 U3 Expense
Age Education Income Sex
Content Grand total mobile internet online hours from Monday to Friday Grand total mobile internet online hours during the weekend (Saturday–Sunday) Grand total fixed lines online hours from Monday to Friday Grand total fixed lines online hours during the weekend (Saturday–Sunday) Average time of mobile internet usage a week during one recent month. Average time of fixed line internet usage a week during one recent month. Number of times per day mobile internet was accessed. Number of times per day fixed line internet was accessed. Number of hours per mobile internet usage. Number of hours per fixed line internet usage. The degree of understanding of mobile internet monthly rent tariff NT$450. The degree of understanding of mobile internet monthly rent tariff NT$250. The degree of understanding of mobile internet daily rent tariff plan. Monthly expense on broadband internet (ADSL, Cable, or optical fiber), (including online fee and electricity fee). Age Education level Monthly income Sex
Table 3. Acceptance degree of customers Tariff Plan WU1 WU2 WU3
3.2
Content & Inquiry Whether user would consider mobile internet monthly rent Tariff NT$450. Whether user would consider mobile internet monthly rent Tariff NT$250. Whether user would consider mobile internet daily rent tariff.
Option 1:Absolutely refuse; 2:Decline to accept; 3:Neutral; 4:Incline to accept; 5:Definitely accept
Determination of Center Point of the Cluster
As mentioned above in the literature discussion, the data must be converted into numerical values for Agglomerative Fuzzy K-Means calculation. Therefore, if the data we capture is non-numerical, it must be converted properly. In addition, to
202
M.-J. Hsu, P.-Y. Hsu, and B. Dashnyam
prevent errors in clustering results caused by the inconsistency of each attribute units, database normalization is needed. Formulas are as follows: ,
1, 2, 3 … ,
1, 2, 3 …
1
: represents customer attributes. , , …, : represents sample customer. , , …, is an n × 1 matrix. is the data after normalization. Starting from the determination of cluster center points and the numbers of initial clusters with the result of WU1, WU2, and WU3 (the consideration acceptance degree) approaching single clustering, {WU1, WU2, WU3} may be set as {1,1,1}, {1,1,2}….{5,5,5}. The existing sets of statistical results are shown in Fig. 1. These sets are also the basic cluster numbers that the telecommunication companies use for customer characteristic analysis.
Fig. 1. Existing sets of statistical results
This research uses these foundation cluster numbers as the initial number of clusters and determines the cluster centers by the following formula: ∑ ∑
1
1
represents the center of the cluster. represents the probability of the sample that belongs to a “certain cluster”; it is also called attribution degree. represents the 1st attribute of the sample . represents the numbers of clusters. represents the numbers of attributes. represents the rank of cluster. here is evaluated by the attribution probability of each The determination of sample customer against the description sets as shown in Fig. 1. For example, as 2 3 1, 2, 3, 4, 5 , then the th cluster, and its set is 1, 2, 3 , 1
、 、
Applying Agglomerative Fuzzy K-Means to Reduce the Cost of Telephone Marketing
203
attribution probability of all the other is 0; the attribution probability of is 1. Thus, we can use to progress the Agglomerative Fuzzy K-Means calculation. 3.3
3.3 Agglomerative Fuzzy K-Means
Here, we use the Agglomerative Fuzzy K-Means algorithm. The data must consist of numerical values; therefore, non-numerical data have to be transformed into numerical data first. Following [1], let X = {x_1, x_2, …, x_n} be the set of n samples, each sample being x_i = (x_{i1}, …, x_{im}), and let C represent the cluster centers. The Agglomerative Fuzzy K-Means algorithm divides X into k clusters by minimizing the objective function

J(U, C) = Σ_{l=1}^{k} Σ_{i=1}^{n} u_{il} d(x_i, c_l) + λ Σ_{l=1}^{k} Σ_{i=1}^{n} u_{il} log u_{il},  (2)

subject to Σ_{l=1}^{k} u_{il} = 1 and u_{il} ∈ [0, 1] for 1 ≤ i ≤ n. Here U represents an n × k matrix whose entry u_{il} indicates the attribution probability of sample x_i to the lth cluster, i.e., the relevance between sample and cluster; C represents a k × m matrix, where m is the number of attributes; and d(x_i, c_l) is the similarity (that is, the distance between samples, or between a sample and a cluster center), given by

d(x_i, c_l) = Σ_{j=1}^{m} (x_{ij} − c_{lj})².  (3)

The first term of Equation (2) is the basic K-Means objective, which measures the dispersion degree of the clusters; the second term measures the entropy of the clustering process. As the entropy increases, this term becomes more negative; at the maximum value of the entropy, the entire objective function is minimized. The expressions for u_{il} and c_{lj} that we used during the calculation were derived from [1]:

u_{il} = exp(−d(x_i, c_l)/λ) / Σ_{t=1}^{k} exp(−d(x_i, c_t)/λ),  (4)

c_{lj} = Σ_{i=1}^{n} u_{il} x_{ij} / Σ_{i=1}^{n} u_{il}.  (5)

The normalized distance used in the merging step is calculated as (the distance from sample x_i to the lth cluster center)/(the sum of the distances from x_i to all cluster centers). To control the evolution of the number of clusters, the algorithm uses the parameter λ to reduce the number of clusters. Its features are as follows:

1. With decreasing λ, the final number of clusters approaches the initial number of clusters. The dispersion term of Equation (2) then plays the crucial role, dominating the entropy term λ Σ_l Σ_i u_{il} log u_{il}; in this circumstance, the clustering process minimizes the clustering dispersion.
2. With increasing λ, the final number of clusters gradually decreases. Progressively increasing λ makes the entropy term more negative and forces the initial cluster centers to move; if several cluster centers move to the same place, the center points merge. Here the entropy term λ Σ_l Σ_i u_{il} log u_{il} plays the crucial role, dominating the dispersion term Σ_l Σ_i u_{il} d(x_i, c_l).
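As a concrete illustration of the update equations (4) and (5) and of the role of λ, the following sketch (our reconstruction based on [1]; a full implementation would also merge coincident centers and adapt λ as described in the next subsection) performs one iteration:

```python
import numpy as np

def afkm_iteration(x, centers, lam):
    """One Agglomerative Fuzzy K-Means step: update the attributions by
    Eq. (4), then the centers by Eq. (5).
    x: (n, m) data, centers: (k, m) current centers, lam: lambda > 0."""
    # squared Euclidean distances d[i, l] = ||x_i - c_l||^2, Eq. (3)
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Eq. (4): u[i, l] proportional to exp(-d[i, l] / lam); the row-wise
    # shift by the minimum distance only improves numerical stability
    w = np.exp(-(d - d.min(axis=1, keepdims=True)) / lam)
    u = w / w.sum(axis=1, keepdims=True)
    # Eq. (5): centers as attribution-weighted means of the samples
    new_centers = (u.T @ x) / u.sum(axis=0)[:, None]
    return u, new_centers
```

With a large λ the attributions flatten and the centers drift toward one another, which is what triggers merging; with a small λ the update approaches hard K-Means, minimizing dispersion.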
3.4 Merging of Clusters
The flow chart of the merging procedure is shown in Fig. 2. Starting from an initial number of clusters k, initial centers, and an initial λ, the Agglomerative Fuzzy K-Means algorithm is run until the minimum of J(U, C) is reached; whenever the distance between two cluster centers approaches 0, the clusters are combined into one, and k′ denotes the resulting number of combined clusters. If clusters were merged, λ is decreased (e.g., to λ/10) and the algorithm is rerun; otherwise λ is increased and the algorithm is run again, and the loop continues until k′ = 1, at which point the clustering result is output.

Fig. 2. Flow chart of the cluster merging procedure
First, we need to determine the initial number of cluster centers k. To prevent the optimal clustering result from being missed, k is set as high as possible. Then, the attributes of each cluster center c_l = (c_{l1}, …, c_{lm}), 1 ≤ l ≤ k, are decided. Next, the initial value of λ is set, and the implementation of the Agglomerative Fuzzy
K-Means algorithm begins. Determine whether there is any cluster to be merged; if there is, decrease λ until there is no cluster left to merge. Then, execute the next loop. The purpose of the loop is to prevent the optimal clustering result from being missed. Before executing the next loop, increase λ again and execute the Agglomerative Fuzzy K-Means algorithm once more; after the minimum of J(U, C) is reached, determine whether there is any cluster to be merged. After that, gradually increase the value of λ and repeat the algorithm until all the cluster centers are merged into one. The merge depends on the distance between centers: when it approaches zero, the two center points are merged into one cluster, all distances between the samples and the new center are re-measured, and a new U is created.

3.5 Clustering Result
As the number of clusters approaches one, we observe how the number of clusters changes with λ. The optimal clustering occurs where the parameter value keeps increasing while the number of clusters remains constant. Thus, from Fig. 3 we find that the resulting number of clusters is two.

Fig. 3. Change in the number of clusters as λ increases
Then, output U and evaluate it to determine which sample belongs to which cluster.

1. If u_{il} = 1 and the attribution degrees of sample x_i to all other clusters are 0, sample x_i belongs to the lth cluster.
2. If a sample's attribution degrees toward a particular cluster are all the same and those toward the other clusters are 0, then this sample might belong to this particular cluster. The clustering result can then be used as a database for customer characteristic analysis.
3. Utilizing the final attribution probabilities of each sample toward these two clusters, together with the weighted average of the original data, we can obtain the real attribute values of the two cluster centers:

ĉ_{lj} = Σ_{i=1}^{n} u_{il} x̂_{ij} / Σ_{i=1}^{n} u_{il},  1 ≤ l ≤ 2, 1 ≤ j ≤ m,  (6)

where x̂_{ij} denotes the original (un-normalized) data.
4 System Verification

4.1 Analysis of Clustering Result
After clustering the market research data, the customers are divided into two clusters. We then compute, from the weighted averages of the original data and the attributions, the attribute values of these two clusters. The result is as follows:

Table 4. Original attributes of the cluster centers: the two clusters (97 and 116) compared over the attributes ATW2, AT2, AWDW, AWD, ADTW, ADT, ATHW, ATH, U1, U2, U3, SEX, EDUCATION, and INCOME
From the result of Table 4, we find that the two clusters differ on five attributes: ATW2, AT2, AWDW, ADT, and ATH. Sales personnel can therefore ask five questions corresponding to these attributes while analyzing the responses.

4.2 Descriptive Statistics Analysis
The bottom of the market research questionnaire presents three tariff options. Users rate their acceptance degree on a 1–5 scale, 1 being the lowest and 5 the highest. Thus, we can proceed with the following analyses using each cluster's acceptance degrees.

Analysis 1: Utilize Equation (6) with the original data from WU1, WU2, and WU3 to calculate, for each of the two clusters, the acceptance degree of the three tariffs. The acceptance degrees of both the student and employee groups to WU1 and WU3 are 3, and to WU2 they are 2; thus, we exclude WU2 from our discussion.

Analysis 2: Analyze the student and employee groups, adding the occupation category as an analysis condition. From the descriptive statistics, we observe that students take the largest proportion of cluster 97, whereas white-collar workers take the largest proportion of cluster 116. Therefore, we name cluster 97 the students group and cluster 116 the employees group.

Analysis 3: First, determine the differential attributes that result from the clustering. Using the weighted averages of the original two cluster centers on these differential attributes (ATW2, AT2, AWDW, ADT, and ATH), we compute each cluster's average mobile internet online hours per week.
As the calculation result shows, the weekly average of mobile internet online hours of the students group is around 36.375 h and that of the employees group is around 43 h. Therefore, we divided customers into students and employees. Then, we use these two
timescales to determine the surfing volume of a customer: if it is higher than the corresponding average value, we identify the customer as high level; if it is lower, as low level. Thus, sales personnel need not ask additional questions to clarify the needs of the customer, but can recommend an appropriate tariff directly; as a result, time is saved.

4.3 Experiment Verification

In general, as shown by previous statistical data from the telecommunication industry, the average time spent was around 11.7 min per case, and the ratio of time spent on analysis versus promotion was 4:6; analysis thus took 4.68 min per case. The average number of cases handled by each salesperson per day was 128. New and renewal mobile internet contracts together accounted for 23% of cases, i.e., around 29 cases, and the transaction rate was 78%, i.e., 22 closed cases a day. Using our proposed analysis method, the time spent on each case can be reduced to an average of 8.9 min, of which analysis takes 3.56 min. In other words, by sequencing the inquiry to start with occupation and continue with the content of the indexes, analysis time is reduced by 1.12 min, a 24% improvement. We therefore infer that the average number of new or renewal contracts handled by each salesperson could increase to 38 cases; compared with the previous record, this is an increase of 9 cases. Hence, we consider that the result of this research, using our analysis method, contributes to a reduction in case-handling time; in addition, the number of new or renewal contract cases handled can grow significantly. The number of transaction cases would grow from 22 to 30, providing increased scope for sales promotion and expansion in the internet business market.
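These figures can be checked with a few lines of arithmetic; the sketch below (ours) additionally assumes that the total working time per day stays fixed when projecting the 38-case capacity:

```python
old_time, new_time = 11.7, 8.9        # minutes per case, before / after
analysis_share = 0.4                  # 4:6 ratio of analysis to promotion

old_analysis = old_time * analysis_share      # 4.68 min of analysis per case
new_analysis = new_time * analysis_share      # 3.56 min of analysis per case
saving = old_analysis - new_analysis          # 1.12 min saved on analysis
print(round(saving, 2), round(saving / old_analysis, 2))  # 1.12, 0.24 (24%)

daily_minutes = 128 * old_time                # assumed fixed daily workload
cases_per_day = daily_minutes / new_time      # ~168 cases per day
contracts = cases_per_day * 0.23              # new/renewal contracts per day
print(int(contracts), int(contracts * 0.78))  # 38 contracts, 30 closed deals
```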
5 Conclusion
The main purpose of this research is to enhance the mobile internet market business. We expect an increase in the total volume of cases handled through the saving of consultation time; if the volume of cases handled increases, the number of mobile internet users will also increase. In this research, we adopt Agglomerative Fuzzy K-Means clustering to advance fuzzy clustering. Utilizing a market research database from the telecom industry, we acquired valuable information regarding customer internet behavior for clustering analysis. The clustering results divide customers into two groups, cluster 97 and cluster 116. Between these two clusters we found differences in five attributes: ATW2, AT2, AWDW, ADT, and ATH. Through the descriptive statistics, we observed that students comprise cluster 97, while office workers comprise cluster 116. Moreover, from the market research questionnaire, we observed that neither cluster is interested in tariff plan 2 (WU2); therefore, WU2 can be excluded from our discussion. Next, we transformed the above-mentioned five attributes into an index of probable online hours per week. The index values of the student and employee groups are 36.375 and 43, respectively. Based on the average of these two index values, we can distinguish whether the
customer is a strong- or weak-demand user. In addition, this creates a more efficient sequence for the analysis process: Step 1, inquire about the customer's occupation; Step 2, inquire about the content of the indexes. Thus, sales personnel can ask specific questions and avoid randomly chosen ones. Furthermore, we passed these indicators to telecom sales personnel for on-the-spot examination. The time spent on analysis was reduced by 1.12 min, a 24% improvement compared to the previous timing. Compared with the previous database, we infer that the total number of new and renewal contracts could reach 38 cases, and the number of successful cases could rise from 22 to 30. Therefore, the method indirectly enhances the potential revenue of the telecom carrier in the mobile market.
References

1. Li, M.J., Ng, M.K., Cheung, Y.-M., Huang, J.Z.: Agglomerative Fuzzy K-Means Clustering Algorithm with Selection of Number of Clusters. IEEE Transactions on Knowledge and Data Engineering 20(11), 1519–1534 (2008)
2. Chuang, H.-M., Chang, Y.-K.: The Study on Application of Data Mining on Target Marketing - Based on the Telecommunication Users (2003)
3. Kotler, P., Armstrong, G.: Principles of Marketing, 7th edn. Prentice-Hall, Englewood Cliffs (1996)
4. Tjøstheim, I., Boge, K.: Mobile Commerce - Who Are the Potential Customers? In: COTIM 2001 Proceedings: From E-Commerce to M-Commerce. Norsk Regnesentral/Norwegian Computing Center (2001)
5. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York (1981)
6. Yang, C., Bruzzone, L., Sun, F., Lu, L., Guan, R., Liang, Y.: A Fuzzy-Statistics-Based Affinity Propagation Technique for Clustering in Multispectral Images. IEEE Transactions on Geoscience and Remote Sensing 48(6) (2010)
7. Ding, Z.J., Yu, J., Zhang, Y.Q.: A new improved K-means algorithm with penalized term. In: Proc. IEEE ICCC, p. 313 (2007)
8. Zadeh, L.A.: Fuzzy sets. Information and Control 8(3), 338–353 (1965)
9. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proc. 5th Berkeley Symp. Math. Statist. Probability, vol. 1 (1967)
10. Frigui, H., Krishnapuram, R.: Clustering by Competitive Agglomeration. Pattern Recognition 30(7), 1109–1119 (1997)
Kansei Information Transfer Technology Yoshiteru Nakamori School of Knowledge Science Japan Advanced Institute of Science and Technology 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan [email protected] Abstract. This paper introduces a governmental project to develop a recommendation system which recommends products to customers by recognizing their 'kansei' desires. The project aims at supporting sales expansion and new product development in traditional crafts in Ishikawa prefecture, Japan. For this, the project is developing a technique for selecting and providing information according to an individual person's 'kansei' desire, in order to develop a 'kansei' search engine together with an information aggregation system and a product database system. In the future, this technique will be used in developing a technique to measure and evaluate people's feelings, or 'kansei', and a technique to select and provide 'kansei' information according to an individual person's preference, ability, or characteristics.
1 Introduction
‘Kansei’ is a Japanese word corresponding roughly to ‘sensibility’: an individual's subjective impression of a certain artifact, environment, or situation, using all the senses of sight, hearing, feeling, smell, taste, and balance. ‘Kansei’ engineering [1] translates consumers' subjective impressions and images of a product (artifact or service) into design elements. It is also referred to as ‘sensory engineering’ or ‘emotional usability’. ‘Kansei’ engineering has been developed and successfully applied to a variety of industries [2]. Especially in Japan, it has been widely applied to product design processes in industries such as automotive, home electronics, office machines, cosmetics, food and drink, packaging, building products, and others [3]. This paper briefly introduces a national project in Japan which will develop a recommendation system to recommend products to customers by recognizing their ‘kansei’ desires, aiming at supporting sales expansion and new product development in traditional crafts in Ishikawa prefecture, Japan. The research contents include: (1) validating an information aggregation method, which integrates physical information and ‘kansei’ information, based on the data from a large-scale ‘kansei’ evaluation experiment; (2) developing a technique of target-oriented decision analysis, which takes into account context information in addition to physical and ‘kansei’ information; and (3) based on the above techniques, developing
(Footnote) This study is supported by SCOPE 102305001 of the Ministry of Internal Affairs and Communications (MIC), Japan.
a recommendation system which includes a ‘kansei’ search engine, an information aggregation system, and a database system of traditional crafts in Ishikawa prefecture, Japan. To accomplish tasks (1) and (2) above, we carried out an evaluation experiment using 30 items of traditional Japanese pottery as evaluation objects and 30 pairs of words for ‘kansei’ evaluation, with 60 persons as evaluators. The semantic differential method [4] was used here to collect the so-called ‘kansei’ data. We used these data for two purposes: one is, as mentioned above, to validate a new aggregation framework which treats physical, ‘kansei’, and contextual information uniformly; the other is to select the evaluation words to be used in the recommendation system. The new aggregation framework is an extension of our previous work dealing with inconsistent preference order relations as well as the vagueness of ‘kansei’ data [5][6], which applied the target-oriented decision model [7] to ‘kansei’ evaluation problems. The roots of recommendation systems can be traced back to extensive work in cognitive science [8], approximation theory [9], information retrieval [10], and forecasting theories [11]. Recommendation systems have become an important research area since the appearance of papers on collaborative filtering in the mid-1990s [12][13][14]. However, these approaches cannot be used for our purpose mentioned in (3) above, because the products considered here are relatively expensive potteries, which people generally buy only occasionally. Instead of using recorded purchasing data, we have to develop a technique to transfer ‘kansei’ information via the interface of purchasing sites, which is the main theme of this paper.
2 System Overview
The objective of the national project "Research and Development on Kansei Information Transfer Technology Aiming at Vitalization of Traditional Industries in Ishikawa Prefecture" is to develop a recommendation system for supporting sales expansion and new product development in traditional crafts, as mentioned above. Figure 1 illustrates that, in reality, a shop owner aggregates different types of information related to the desire of a customer and shows the customer several products immediately. This project aims to develop such an ability artificially, and therefore relates to such disciplines as knowledge engineering, decision science, ‘kansei’ engineering, and, in total, knowledge science. The project is divided into three stages: at the first stage, we validate an information aggregation method [15], which integrates physical information and ‘kansei’ information, based on the data from a large-scale ‘kansei’ evaluation experiment; at the second stage, we develop a technique of target-oriented decision analysis [16][6], which takes into account context information in addition to physical and ‘kansei’ information; and at the third stage, based on the above techniques, we develop a recommendation system which includes a ‘kansei’ search engine, an information aggregation system, and a database system of traditional crafts in Ishikawa prefecture, Japan. The system searches products with words, based on a unified framework to treat information on physical, contextual and ‘kansei’ attributes. Examples of
(Figure: a customer asks for "a pottery vase, which is quite modern; pretty" (kansei attributes), "not so big; a little expensive" (physical attributes), and "suitable for the young" (contextual attribute); the shop owner aggregates this information and answers "How about this?")

Fig. 1. Information aggregation and product recommendation
physical attributes are product size, price, etc.; contextual attributes describe the purpose of the product, such as for a gift or for everyday use; and ‘kansei’ attributes are expressions of products by adjectives such as cute, traditional, etc. The novelties of this system are the new modeling technique using a unified framework for the identification of fuzzy models of product attribute information, and the prioritized information aggregation method. The system consists of three types of managers, described below.

2.1 Data Manager
The first one is the data manager, which registers the necessary data recorded in Excel format into a relational database, including product attributes, profiles, photos, and evaluations of product attributes:

– Physical attributes (small – large, low-cost – expensive, etc.)
– Context attributes (for men – for women, for you – for gifts, etc.)
– ‘Kansei’ attributes (soft – hard, cold – warm, etc.)

In this research, an attribute is given by a pair of words, for instance, [cold, warm]. In collecting data, product attributes are evaluated on a scale of seven grades, for instance, [very cold, cold, slightly cold, neither, slightly warm, warm, very warm].
We memorize the evaluated data using the numbers [−3, −2, −1, 0, +1, +2, +3] corresponding to the above seven grades. Figure 2 shows the products used for the ‘kansei’ evaluation, and Table 1 shows the pairs of words used for evaluation. In this paper we omit the explanation of how the pairs of words used in the system were selected.
Fig. 2. Objects for kansei evaluation
2.2 Model Manager
The second one is the model manager, which develops fuzzy models using the evaluations of product attributes. Here, it is important to select the attributes to be used in the system, because the evaluators' opinions are divided on some attributes of the objects. For example, consumers may find it difficult to assess whether a product is traditional or contemporary. Such attributes will not be used in the system, or else estimates by product creators or sales workers will be used. For each product, a fuzzy model, i.e., a triangle-type membership function, is created for each attribute, and the model manager memorizes the left, center, and right points of the triangle. For instance, the model of the attribute j = [cold, warm] of an object i is expressed as μij(x): [−1.033, 0.467, 1.967], where the three values correspond respectively to the left, center, and right points of the triangle.
Table 1. Pairs of kansei words

[Soft, Hard], [Cool, Warm], [Busy, Quiet], [Candid, Dense], [Luxury, Simple], [Calm, Lilting], [Cute, Tasteful], [Plain, Flashy], [Light, Heavy], [Momentum, Serene], [Gentle, Strong], [Dynamic, Static], [Rural, Urban], [Delicate, Smart], [Fresh, Typical], [Sociable, Solemnly], [Traditional, Contemporary], [Feminine, Masculine], [Dignified, Congenial], [Naive, Smart], [For senior, For young people], [For females, For males], [For Western-style rooms, For Japanese-style rooms], [For myself, For gift], [For visitors use, For routine use], [Souvenir, Wedding gift], [Expensive looking, So affordable], [My taste, Not in my favor], [I want, I do not want], [Likely to be sold, Not likely to be sold]
The model manager also prepares the membership functions of the seven targets [very cold, cold, slightly cold, neither, slightly warm, warm, very warm] as follows:

μ−3(x) = −x − 2 for −3 ≤ x ≤ −2, and 0 otherwise;

μ+3(x) = x − 2 for 2 ≤ x ≤ 3, and 0 otherwise;

and, for k = −2, −1, 0, +1, +2,

μk(x) = x + 1 − k for k − 1 ≤ x ≤ k, μk(x) = −x + 1 + k for k ≤ x ≤ k + 1, and 0 otherwise.
These target membership functions will be used in reasoning to be explained in the next section. Figure 3 shows a model of the attribute j = [cold, warm] of the product i, and seven targets given by membership functions. The figure also indicates the way to calculate the fitness value of the target (a little warm: +1) with the model; the details will be given below.
(Figure: the model μij(x) of the attribute j = [cold, warm] of product i, plotted on the scale from −3 to +3 together with the seven target membership functions; the height at which μij(x) intersects the target μ+1(x) is the fitness value of the target (a little warm: +1) with the model, indicating that product i is more or less warm.)

Fig. 3. A model and seven targets
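For piecewise-linear functions, the max-min fitness illustrated in Fig. 3 is simply the height of the intersection of the model and target triangles; a small sketch (our illustration, reusing the [cold, warm] model parameters quoted above) evaluates it on a grid:

```python
import numpy as np

def triangle(left, center, right):
    """Triangular membership function with the given left/center/right points."""
    def mu(x):
        x = np.asarray(x, dtype=float)
        up = (x - left) / (center - left)       # rising slope
        down = (right - x) / (right - center)   # falling slope
        return np.clip(np.minimum(up, down), 0.0, 1.0)
    return mu

def target(k):
    """Membership function of the target level k on the -3..+3 scale."""
    return triangle(k - 1, k, k + 1)

def fitness(model_mu, target_mu, grid=np.linspace(-4, 4, 8001)):
    """f = max_x min{mu_target(x), mu_model(x)}, evaluated on a fine grid."""
    return float(np.max(np.minimum(model_mu(grid), target_mu(grid))))

model = triangle(-1.033, 0.467, 1.967)   # attribute [cold, warm] of product i
print(fitness(model, target(+1)))        # fitness to "a little warm" (+1)
```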
2.3 Reasoning Manager
The third one is the reasoning manager, which receives customer needs and presents recommended products. Here, we use the idea of target-oriented decision support; that is, we use the idea of target-orientation instead of optimization. A customer expresses his/her desires (targets), for instance: "Please show me Kutani vases which are not very loud, not too big, and appropriate for a sixtieth birthday celebration". The system has the function to
(Footnote) China-making in Kutani dates back around 340 years to the middle of the 17th century. As part of a policy to encourage local industry, a kiln to make colored porcelain was established at Kutani village, now in Ishikawa Prefecture, when high-quality china clay was discovered there. The type of pottery was named Kutani-yaki after the village where it began. It had a distinctive style that involved the plentiful use of Kutani gosai, or the five hues of Kutani, including deep blue, purple, yellow, green, and red, which were used to cover the entire surface with colored decoration. Themes included birds and flowers, landscape motifs, and geometrical patterns.
show some vases which meet the demands of the physical information (not too big), context information (for a sixtieth birthday celebration), and ‘kansei’ information (not very loud). The system treats the above information as follows:

– For the attribute [small, big], the system reads that the demand is "small", i.e., an attribute value of around −2 is requested.
– For the attribute [for youth, for older], the system reads that the demand is "oriented to the very old", i.e., the attribute value +3 is requested.
– For the attribute [quiet, loud], the system reads that the demand is "somewhat sober", i.e., the attribute value −1 is requested.

Theoretically, there is no limitation on the number of attributes for which the customer can express targets. For simplicity, let us denote the above attributes by j1, j2, j3. The evaluation models of these attributes for a product i are given by triangle-type membership functions μij1(x), μij2(x), μij3(x). The fitness values of product i to the desires "small", "oriented to the very old", and "somewhat sober" are defined by

fij1 = max_x min{μ−2(x), μij1(x)},
fij2 = max_x min{μ+3(x), μij2(x)},
fij3 = max_x min{μ−1(x), μij3(x)}.

The reasoning manager defines the fitness of product i to the three targets as

ei = AO[fij1, fij2, fij3],

where AO[·] is an information aggregation operator. We can use an arbitrary operator such as the OWA operator (Ordered Weighted Averaging aggregation operator) [17], but this paper proposes a new information aggregation method.
3 Mathematical Description
In the case where we have plenty of data from an evaluation experiment, we define a membership function as follows, which indicates a vague fitness of product i with attribute j:

μij(x) = (1/(c σij)) (x − (mij − c σij)) for x ≤ mij, and μij(x) = −(1/(c σij)) (x − (mij + c σij)) for x ≥ mij.

Here, mij is the average, σij is the standard deviation, and c > 0 is a tuning parameter. If the value of the above function is smaller than a given small positive number, then we replace its value with this positive number.
For the modeling of requests, the system generally uses the attribute values [−n, −n + 1, …, −1, 0, +1, …, n − 1, n], where n is a positive integer. The membership functions of the requests are defined as follows:

– For k = −n: μk(x) = −x + 1 + k for k ≤ x ≤ k + 1, and 0 otherwise.
– For −n < k < n: μk(x) = x + 1 − k for k − 1 ≤ x ≤ k, μk(x) = −x + 1 + k for k ≤ x ≤ k + 1, and 0 otherwise.
– For k = n: μk(x) = x + 1 − k for k − 1 ≤ x ≤ k, and 0 otherwise.
In the product recommendation, the evaluation model for attribute j of product i is given by the membership function μij(x), j ∈ J, where J indicates the set of attributes for which targets are given. The fitness of product i to the target is defined by

fij = max_x min{μk(x), μij(x)}, j ∈ J,

where μk(x) is the membership function of the target level k. We then introduce the priority level: if the desire on attribute j has priority level l, we transform the above fitness by

Fij = Gl(fij), l = 1, 2, …, L.

The transformation function Gl is defined by

Gl(x) = ((2L − l)/l) x for 0 ≤ x ≤ l/(2L), and Gl(x) = (l/(2L − l)) (x − 1) + 1 for l/(2L) ≤ x ≤ 1.

The fitness of product i to the target is then defined by

Ei = AO[Fij, j ∈ J] = min{Fij, j ∈ J}.

The recommendation is made in descending order of Ei (max-min strategy). Thus, the priority is introduced to look for products that meet the needs of all requested attributes.
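A minimal sketch (our rendering of the formulas above, with L = 7 priority levels as in Fig. 4; the numeric inputs are invented for illustration) of the prioritized max-min aggregation:

```python
import numpy as np

L = 7  # number of priority levels, as in Fig. 4

def G(l, x):
    """Piecewise-linear transformation G_l with G_l(0) = 0 and G_l(1) = 1."""
    x = np.asarray(x, dtype=float)
    knee = l / (2 * L)
    low = (2 * L - l) / l * x                 # branch for 0 <= x <= l/(2L)
    high = l / (2 * L - l) * (x - 1) + 1      # branch for l/(2L) <= x <= 1
    return np.where(x <= knee, low, high)

def overall_fitness(fitness, priority):
    """E_i = min_j G_{l_j}(f_{ij}) over the requested attributes (max-min)."""
    return min(G(l, f) for f, l in zip(fitness, priority))

# a product with three requested attributes: fitness values and priorities
print(overall_fitness([0.6, 0.9, 0.4], [1, 4, 7]))
```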
The transformation also brings the fitness values of low-priority targets into play. We define the overall fitness of the product as the smallest attribute value after transformation. Figure 4 shows examples of the transformation functions and gives an example in which the attribute j1 becomes important because its transformed fitness value is the smallest one.
(Figure: plots of the transformation functions Gl for the priority levels l = 1: vastly most important, 2: much more important, 3: more important, 4: average, 5: less important, 6: much less important, 7: vastly less important.)

Fig. 4. Transformation functions and an example of transformation
4 Conclusion
This paper briefly explained a national project in the Strategic Information and Communications R&D Promotion Program of the Ministry of Internal Affairs and Communications, Japan. There are many interesting research topics in the selection of words (attributes), the selection of data (public feelings or experts' opinions), modeling (statistical, fuzzy-set [18], and rough-set [19] approaches), aggregation methods (OWA [17], RFP [20], prioritization [5], etc.), and system development with networking of providers.
References 1. Nagamachi, M.: Kansei engineering: A new ergonomic consumer-oriented technology for product development. International Journal of Industrial Ergonomics 15, 3–11 (1995) 2. Nagamachi, M.: Kansei as powerful consumer-oriented technology for product development. International Journal of Industrial Ergonomics 33, 289–294 (2002)
3. Childs, T., de Pennington, J., Rait, J., Robins, T., Jones, K., Workman, C., Warren, S., Colwill, J.: Affective design (Kansei engineering) in Japan. Faraday Packaging Partnership, Univ. Leeds, Leeds (2001) 4. Osgood, C.E., Suci, G.J., Tannenbaum, P.H.: The Measurement of Meaning. University of Illinois Press, Urbana (1957) 5. Yan, H.B., Huynh, V.N., Murai, T., Nakamori, Y.: Kansei evaluation based on prioritized multi-attribute fuzzy target-oriented decision analysis. Information Sciences 178(21), 4080–4093 (2008) 6. Huynh, V.N., Yan, H.B., Nakamori, Y.: A target-based decision-making approach to consumer-oriented evaluation model for Japanese traditional crafts. IEEE Transactions on Engineering Management 57(4), 575–588 (2010) 7. Bordley, R., LiCalzi, M.: Decision analysis using targets instead of utility functions. Decisions in Economics and Finance 23(1), 53–74 (2000) 8. Rich, E.: User modeling via stereotypes. Cognitive Science 3(4), 329–354 (1979) 9. Powell, M.J.D.: Approximation Theory and Methods. Cambridge University Press, Cambridge (1981) 10. Salton, G.: Automatic Text Processing. Addison-Wesley, Boston (1989) 11. Armstrong, J.S. (ed.): Principles of Forecasting - A Handbook for Researchers and Practitioners. Springer, New York (2001) 12. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: GroupLens: An open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work (CSCW 1994), Chapel Hill, NC, USA, October 22-26, pp. 175–186 (1994) 13. Shardanand, U., Maes, P.: Social information filtering: Algorithms for automating ‘word of mouth’. In: Proceedings of the ACM CHI 1995 Human Factors in Computing Systems Conference, Denver, CO, USA, May 7-11, pp. 210–217 (1995) 14. Hill, W., Stead, L., Rosenstein, M., Furnas, G.: Recommending and evaluating choices in a virtual community of use. In: Proceedings of the ACM CHI 1995 Human Factors in Computing Systems Conference, Denver, CO, May 7-11, pp. 194–201 (1995) 15. Huynh, V.N., Nakamori, Y., Ryoke, M., Ho, T.B.: Decision making under uncertainty with fuzzy targets. Fuzzy Optimization and Decision Making 6(3), 255–278 (2007) 16. Huynh, V.N., Nakamori, Y., Lawry, J.: A probability-based approach to comparison of fuzzy numbers and applications to target-oriented decision making. IEEE Transactions on Fuzzy Systems 16(2), 371–387 (2008) 17. Yager, R.R.: On ordered weighted averaging aggregation operators in multi-criteria decision making. IEEE Transactions on Systems, Man and Cybernetics 18(1), 183–190 (1988) 18. Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning. Information Sciences 8(3), 199–249 (1975) 19. Pawlak, Z.: Rough sets. International Journal of Computer and Information Sciences 11(5), 341–356 (1982) 20. Wierzbicki, A.P.: The use of reference objectives in multiobjective optimization. In: Fandel, G., Gal, T. (eds.) Multiple Criteria Decision Making Theory and Application. LNEMS, vol. 177, pp. 468–486. Springer, Heidelberg (1980)
Combining Binary Classifiers with Imprecise Probabilities

Sébastien Destercke¹ and Benjamin Quost²

¹ INRA/CIRAD, UMR1208, 2 place P. Viala, F-34060 Montpellier cedex 1, France, [email protected]
² HEUDIASYC, UMR UTC-CNRS 6599, Université de Technologie de Compiègne, Centre de Recherches de Royallieu, BP 20529, F-60205 Compiègne, France, [email protected]
Abstract. This paper proposes a simple framework to combine binary classifiers whose outputs are imprecise probabilities (or are transformed into imprecise probabilities, e.g., by using confidence intervals). The combination comes down to solving linear programs describing constraints over events (here, subsets of classes). The number of constraints grows linearly with the number of classifiers, making the proposed framework tractable for problems involving a relatively large number of classes. After detailing the method, we provide some first experimental results illustrating its interest.
1 Introduction
A straightforward approach to multi-class classification tasks consists in training a single classifier to separate each class. An alternative strategy, known as binary classifier combination [5,4], has been shown to be efficient in a number of situations. It consists in separating the initial problem into simpler subproblems, solving each subproblem with a dedicated classifier, and combining the results thus obtained. Classical decomposition strategies consist in opposing each class to each other (one-versus-one scheme) [5] or each class to all others (one-versus-all scheme). Decomposition using error-correcting output codes (ECOC) [4] generalizes both these approaches by opposing two subsets of classes to each other. When the trained classifiers provide probabilistic outputs, the problem boils down to pooling estimates of the conditional posterior probabilities of the classes. Remark that these conditional probabilities are seldom consistent, due to the fact that they are only approximations of the (admittedly) true but unknown conditional probabilities. A classical solution is to compute a probability distribution whose conditionings are as close as possible to the outputs of the classifiers, by solving an optimization problem. Such techniques have been studied for the one-versus-one decomposition strategy (see, e.g., [6,14]) as well as in the general ECOC framework [8]. In this paper, we address the problem in the framework of imprecise probability theory. Imprecise probability theory [13] deals with the cases where the
available information is not sufficient (or is too conflicting) to identify a single probability distribution. It is therefore well suited to the combination of inconsistent (precise or imprecise) conditional probabilities. Our proposal is to weaken the given conditional assessments to make them consistent, and to consider the resulting set of probabilities as our final predictive model. Due to their robustness, imprecise probabilistic models appear particularly interesting in cases where some classes are difficult to separate or poorly represented in the training set, or when the data are very noisy. To our knowledge, there is no previous work on the combination of binary classifiers in this framework, although the combination of one-versus-one imprecise classifiers has already been studied in the framework of belief functions [10]. Section 2 recalls background knowledge about imprecise probabilities. Section 3 describes how binary classifiers returning imprecise conditional probabilities may be combined; as classifier outputs may still be inconsistent in this framework, we propose a discounting strategy ensuring that a consistent result is reached. Finally, we report some experiments on simulated and real data sets, for the one-versus-one decomposition strategy, considering the cases of precise and imprecise classifiers (Section 4).
2 Imprecise Probabilities: A Short Introduction
Let X = {x1, …, xM} be a finite space of M elements describing the possible values of (ill-known) variables (here, X is the set of possible classes). We assume that the uncertainty about the actual value of a variable X is described by a convex set of probabilities P, often called a credal set. A usual way to describe P is by providing a set of linear constraints restricting the possible probabilities in P (Walley's lower previsions [13] correspond to bounds of such constraints). Credal sets are instrumental models of uncertainty when the available information does not allow one to identify a unique probability of interest (here, the information coming from binary classifiers). From a credal set P, one can compute lower and upper probabilities P, P̄ such that, for any event A ⊆ X,

P(A) = inf_{P∈P} P(A) and P̄(A) = sup_{P∈P} P(A).

They are dual, in the sense that P(A) = 1 − P̄(Ac), with Ac the complement of A. More generally, given a real-valued and bounded function f on X, one can compute lower and upper expectation bounds E, Ē such that

E(f) = inf_{p∈P} E_p(f) and Ē(f) = sup_{p∈P} E_p(f),

with E_p the expected value of f w.r.t. p. They are also dual, as E(f) = −Ē(−f). The lower and upper probabilities of an event A are the lower and upper expectations of its indicator function. Alternatively, one can start from constraints on expectations and consider the credal set satisfying these constraints.
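Since P is described by finitely many linear constraints, these bounds are solutions of linear programs; the sketch below (our illustration, with a toy constraint set) computes them with scipy:

```python
import numpy as np
from scipy.optimize import linprog

def expectation_bounds(f, A_ub, b_ub):
    """Lower/upper expectation of f over the credal set
    {p : A_ub @ p <= b_ub, sum(p) = 1, p >= 0}."""
    n = len(f)
    A_eq, b_eq = np.ones((1, n)), [1.0]
    lo = linprog(f, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                 bounds=[(0, 1)] * n)            # minimizes E_p(f)
    hi = linprog(-np.asarray(f), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                 bounds=[(0, 1)] * n)            # maximizes E_p(f)
    return lo.fun, -hi.fun

# toy credal set on 3 classes: p1 <= 0.5 and p2 >= 0.2 (i.e., -p2 <= -0.2)
A = np.array([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0]])
b = np.array([0.5, -0.2])
print(expectation_bounds([1.0, 0.0, 0.0], A, b))  # lower/upper P({x1})
```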
3 Classifier Combination Using Imprecise Probabilities
The basic task of classification is to predict the class or output value x of an object described by a set of features y ∈ Y. Provided that these features are known, the knowledge of the actual class x of y may be represented by a posterior probability p(x|y). Classifying y then amounts to estimating p(x|y) as accurately as possible using a finite set of training samples. A binary classifier on a set of classes X aims at predicting whether an instance class belongs to a subset A ⊆ X or to a disjoint subset B ⊆ X (i.e., A ∩ B = ∅). Its prediction is then an estimation of the conditional probability P(A|A ∪ B, y) that the instance belongs to A (with P(B|A ∪ B, y) = 1 − P(A|A ∪ B, y) by duality; see Footnote 1). Combining binary classifiers then consists in finding p from a set of such conditional assessments. To model a set of binary classifiers, we may use the language of code correction matrices. A code correction matrix is a matrix C with general element cij ∈ {+1, 0, −1}, i ∈ 1, …, M with M the number of classes, and j ∈ 1, …, N with N the number of binary classifiers. For a given j, the sets Aj = {xi | cij = 1, i = 1, …, M} and Bj = {xi | cij = −1, i = 1, …, M} are respectively the sets of positive and negative classes in the training set of classifier j. We now recall the combination problem in a precise setting and then extend it to an imprecise setting.

3.1 The Precise Case
In the precise case, classifier j returns an evaluation P(Aj|Aj ∪ Bj) = αj. Using the facts that P(Aj|Aj ∪ Bj) = P(Aj)/P(Aj ∪ Bj) and P(Bj|Aj ∪ Bj) = P(Bj)/P(Aj ∪ Bj) = 1 − P(Aj|Aj ∪ Bj), we obtain from these two equations the following equality (see Footnote 2):

P(Aj) = (αj/(1 − αj)) P(Bj).  (1)

This gives N equalities that describe partial knowledge about the true but unknown probabilities of the classes. As the number of equalities is usually much higher than the cardinality M of X, the problem is often over-constrained. Moreover, the outputs of the classifiers are estimates of the true conditional probabilities; hence, they are generally not consistent with each other. For these reasons, the set of equations deduced from the outputs of the classifiers is often without solution, as the next example shows.

Example 1. Consider a problem with 3 classes, X = {x1, x2, x3}. Assuming we are working in a one-against-one framework (i.e., |Aj| = |Bj| = 1), consider the following outputs of classical probabilistic classifiers: P({x1}|{x1, x2}) = 0.2, P({x1}|{x1, x3}) = 1/3, P({x2}|{x2, x3}) = 0.8.
(Footnote 1) From now on, we drop the y in the conditional statements, as the combination always concerns a unique instance whose input features remain the same. (Footnote 2) We assume here that p({x}) is strictly positive for any x ∈ X. In a practical setting, this is not a restrictive assumption, as p({x}) can be as small as desired.
These statements, once transformed to express unconditional constraints, respectively give the equalities (using the notation pi = p(xi) = P({xi}))

p1 = (1/4) p2, p1 = (1/2) p3, p2 = 4 p3,

which lead (together with the consistency constraints Σ_{xi∈X} pi = 1, pi ≥ 0) to a system without solution. Indeed, the first two equalities give p2 = 4p1 and p3 = 2p1, whence p2 = 4p3 = 8p1, which contradicts p2 = 4p1 unless p1 = 0. To solve this issue, the methods proposed in [6,14] solve optimization problems to compute the probability distribution whose conditional probabilities are as close as possible to the outputs of the classifiers.

3.2 The Imprecise Case
Let us now consider imprecise binary classifiers. The output of classifier j (or the transformation into imprecise probabilities of its precise output) is now a pair of values bounding the conditional probability obtained in the precise case. We denote by αj, βj the bounds of P(Aj|Aj ∪ Bj), that is,

αj ≤ P(Aj|Aj ∪ Bj) ≤ βj,  (2)

and, by complementation, we have

1 − βj ≤ P(Bj|Aj ∪ Bj) ≤ 1 − αj.  (3)

To get a joint credal set from these constraints, we turn them into linear constraints over unconditional probabilities. First, assuming again that P(Aj ∪ Bj) > 0, we transform Equations (2) and (3) into

αj ≤ P(Aj)/P(Aj ∪ Bj) ≤ βj and 1 − βj ≤ P(Bj)/P(Aj ∪ Bj) ≤ 1 − αj.

Dividing these two inequalities, we reach

αj/(1 − αj) ≤ P(Aj)/P(Bj) ≤ βj/(1 − βj),

which can in turn be transformed into two linear constraints:

(αj/(1 − αj)) P(Bj) ≤ P(Aj) and P(Aj) ≤ (βj/(1 − βj)) P(Bj).

These equations can be restated as

(αj/(1 − αj)) Σ_{xi∈Bj} pi ≤ Σ_{xi∈Aj} pi and Σ_{xi∈Aj} pi ≤ (βj/(1 − βj)) Σ_{xi∈Bj} pi,  (4)

with pi := P({xi}).
These constraints define a set of admissible probability distributions. For each class xi, the minimal/maximal values P({xi}), P̄({xi}) of P({xi}) may be computed by solving the constrained optimization problems P({xi}) = min pi and P̄({xi}) = max pi under the constraints (4), Σ_{xi∈X} pi = 1, and pi > 0 for all i = 1, …, M. Note that if N classifiers are trained, then there are 2N such constraints. This means that the number of constraints grows linearly with the number of classifiers, while the number of variables remains constant (= M). As the number of classifiers remains limited (usually between M and M²), P({xi}) and P̄({xi}) may be computed using modern and efficient optimization techniques.

Example 2. Consider the same situation as in Example 1, with classifiers now providing the (slightly) relaxed system

(1/9) p2 ≤ p1 ≤ (1/2) p2, (1/5) p3 ≤ p1 ≤ (2/3) p3, 2 p3 ≤ p2 ≤ 4 p3,
corresponding to the classifier outputs P({x1}|{x1, x2}) ∈ [0.1, 1/3], P({x1}|{x1, x3}) ∈ [1/6, 0.4], P({x2}|{x2, x3}) ∈ [2/3, 0.8]. Note that the constraints of Example 1 are included in these. The above system is no longer without solution; e.g., p1 = 0.1, p2 = 0.6 and p3 = 0.3 is an admissible solution. Getting the minimal/maximal probabilities for each class then comes down to solving 6 optimization problems (i.e., minimizing and maximizing each probability pi), which give p1 ∈ [0.067, 0.182],
p2 ∈ [0.545, 0.735],
p3 ∈ [0.176, 0.31].
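The six optimization problems of Example 2 are plain linear programs; the sketch below (our illustration using scipy) closely reproduces the bounds above:

```python
import numpy as np
from scipy.optimize import linprog

# Constraints (4) of Example 2, written as A @ p <= 0 with p = (p1, p2, p3):
#   (1/9) p2 <= p1 <= (1/2) p2,  (1/5) p3 <= p1 <= (2/3) p3,  2 p3 <= p2 <= 4 p3
A = np.array([
    [-1.0, 1/9, 0.0], [1.0, -1/2, 0.0],   # bounds on p1 versus p2
    [-1.0, 0.0, 1/5], [1.0, 0.0, -2/3],   # bounds on p1 versus p3
    [0.0, -1.0, 2.0], [0.0, 1.0, -4.0],   # bounds on p2 versus p3
])
b = np.zeros(6)
eps = 1e-9                                # p_i > 0, as small as desired

for i in range(3):
    c = np.zeros(3); c[i] = 1.0
    lo = linprog(c, A_ub=A, b_ub=b, A_eq=[[1, 1, 1]], b_eq=[1],
                 bounds=[(eps, 1)] * 3)   # minimize p_i
    hi = linprog(-c, A_ub=A, b_ub=b, A_eq=[[1, 1, 1]], b_eq=[1],
                 bounds=[(eps, 1)] * 3)   # maximize p_i
    print(f"p{i+1} in [{lo.fun:.3f}, {-hi.fun:.3f}]")
```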
Note that even imprecise outputs can be inconsistent; such a situation corresponds to the case P = ∅. In the next section, we suggest a relaxation strategy ensuring that a given set of classifier outputs ends up in a feasible system (possibly providing a vacuous, i.e., non-informative, solution).

3.3 Handling Inconsistent Outputs: A Discounting Strategy
We propose a strategy to handle inconsistent classifier outputs, i.e., outputs for which the induced linear problem is not feasible. Let ε ∈ [0, 1] be a discounting factor. The ε-discounted problem corresponds to the constraints

(1 − ε) αj ≤ P(Aj|Aj ∪ Bj) ≤ ε + (1 − ε) βj,  j = 1, …, N.  (5)

Discounted constraints on P(Bj|Aj ∪ Bj) are obtained by complementation. These constraints generalize those defined by Equation (4), which are retrieved when ε = 0. Such a discounting strategy is common in imprecise probabilistic
approaches, since it corresponds to the ε-contamination model [13, Sec. 3.2.5] and to the basic discounting operation in evidence theory [9]. We denote by P^ε the credal set obtained by discounting the initial problem with a value ε. When the classifier outputs are not consistent, we propose to increase this factor up to the point where the linear problem becomes feasible (i.e., the associated credal set is non-empty). Note that there is always a value of ε for which the problem becomes feasible, as ε = 1 corresponds to the trivial constraints P(Aj|Aj ∪ Bj) ∈ [0, 1], meaning that the set P^1 is the set of all probability measures on X. This is sufficient to ensure that the linear problem given by Eq. (5) is feasible for some value ε. For a given instance y, let ε* be the lowest discounting factor such that the set of constraints induces a feasible problem: ε* = min{ε ∈ [0, 1] | P^ε ≠ ∅}. This value ε* gives information on the global level of conflict between the various classifiers. Indeed, ε* = 0 means that all classifiers are consistent, so no discounting is needed. On the other hand, ε* close to 1 means that at least one classifier gives conditional information that strongly conflicts with the others. It turns out that computing ε* exactly is a difficult optimization problem. Indeed, one may imagine introducing ε* as a variable in the optimization problem described in Section 3.2, to compute it along with the probabilities pi, i = 1, …, M; however, this results in a linear optimization problem with quadratic constraints, the Lagrangian of which is indefinite. Eventually, note that the credal sets obtained for different values of ε are nested (i.e., P^ε ⊆ P^ε′ for any ε ≤ ε′). This makes the current approach close to other similar models proposed in the imprecise probabilistic literature [1]. In the present work, the value ε should not be interpreted as having any statistical meaning in terms of confidence value; linking ε to some statistical confidence value is the matter of further work.
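In practice, ε* can be approximated by a simple search exploiting the nestedness of the credal sets; the sketch below (ours) assumes a hypothetical feasibility test feasible(eps) that attempts to solve the linear problem of Section 3.2 under the discounted constraints (5):

```python
def epsilon_star(feasible, tol=1e-3):
    """Bisection approximation of the smallest eps in [0, 1] for which the
    eps-discounted constraint set is non-empty. Relies on monotonicity:
    since the credal sets are nested, once the problem is feasible it
    stays feasible for any larger eps."""
    if feasible(0.0):
        return 0.0                 # classifiers already consistent
    lo, hi = 0.0, 1.0              # eps = 1 is always feasible (vacuous model)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi
```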
3.4 Decision Making and Classifier Accuracy
We now assume that the combination of the classifiers has been processed and that a credal set modelling our knowledge about the class of object y has been obtained; hence, a decision regarding the actual class of object y should be taken. Imprecise probability theory proposes many ways to make a decision [12]. In this article, we consider two ways of extending classical decision making based on maximal expected value. A first strategy consists in classifying an object y into a single class; alternatively, a set of plausible classes may be retained, the ultimate decision being left to the user. We retain one rule for each of these approaches: the maximin rule and the interval dominance rule, respectively. Let xi ∈ X be a class, and let P({xi}) and P̄({xi}) be the minimal and maximal values for P({xi}), computed as described in Section 3.2. The maximin decision rule amounts to classifying y into the class x such that

x := arg max_{xi∈X} P({xi}).

Using this rule requires solving M linear programs with 2N + M + 1 constraints and making M comparisons.
The interval dominance rule amounts to selecting the set X̂ of classes such that

X̂ := {xi ∈ X | there is no xj s.t. P̄({xi}) ≤ P({xj})}.

Using this rule requires solving 2M linear programs with 2N + M + 1 constraints and making at most M(M − 1) comparisons. Also note that x ∈ X̂, i.e., the optimal solution given by the maximin rule is necessarily one of those retained by the interval dominance rule.

Example 3. Assume that, for some data point y, combining the classifier outputs gave the following minimal/maximal probabilities: p1 ∈ [0.034, 0.183], p2 ∈ [0.245, 0.735], p3 ∈ [0.143, 0.370]. Using the maximin rule, we would classify y into class x2. The interval dominance rule would give the set of plausible classes X̂ = {x2, x3}.

Note that, as the interval dominance rule selects a set of plausible classes, evaluating the accuracy of the classifier is no longer trivial. In this article, we considered two strategies. The first one consists in considering the classification as fully accurate whenever the actual class of an evaluated data point belongs to the set of classes X̂. It amounts to considering that the final decision is left to the user, who always makes the right choice. The error rate thus computed is an optimistic estimate of the accuracy of the classifier; we denote this estimate by Sacc. Another solution is to use a discounted accuracy measure, denoted Dacc from now on. Assume we have T labeled observations whose classes xi, i = 1, …, T are known, and for which T sets of plausible classes X̂1, …, X̂T have been selected. The discounted accuracy of the classifier is then

Dacc = (1/T) Σ_{i=1}^{T} Δi / f(|X̂i|),  (6)

with Δi = 1 if xi ∈ X̂i and zero otherwise, and f an increasing function such that f(1) = 1. Note that f(x) = x is the usual choice for the discounted accuracy. It has recently been shown [16] that this choice amounts to considering imprecise classification as equivalent to making a random choice inside the set of optimal classes. This comes down to considering that the decision strategy is risk neutral, i.e., providing an imprecise classification in case of ambiguity is not considered as an advantage. Sacc corresponds to f(x) = 1 in Eq. (6).
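Both decision rules operate directly on the probability intervals; the following sketch (ours) reproduces Example 3:

```python
def maximin(lower):
    """Index of the class maximizing the lower probability."""
    return max(range(len(lower)), key=lambda i: lower[i])

def interval_dominance(lower, upper):
    """Keep x_i unless some other x_j has a lower bound above x_i's upper bound."""
    n = len(lower)
    return [i for i in range(n)
            if not any(upper[i] <= lower[j] for j in range(n) if j != i)]

lower = [0.034, 0.245, 0.143]
upper = [0.183, 0.735, 0.370]
print(maximin(lower))                    # 1      -> class x2
print(interval_dominance(lower, upper))  # [1, 2] -> plausible classes {x2, x3}
```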
4 Experiments
In this section, we report experiments carried out on synthetic and real data.

4.1 Dataset Description
The datasets used in the experiments are presented in Table 1. One of the datasets was generated for the purpose of illustrating graphically the properties of our method. The others were downloaded from the UCI machine learning repository (http://archive.ics.uci.edu/ml).
Table 1. UCI data sets used in experiments

dataset: # training samples / # test samples
glass: 139 / 75
pageblocks: 3284 / 2189
satimage: 2921 / 2573
segment: 1400 / 910
synthetic: 2250 / 750
vowel: 528 / 462
waveform: 1491 / 3509
yeast: 890 / 594
4.2 Classification Strategies
In these experiments, we used two classification algorithms as base classifiers: logistic regression [7] and the evidential K-nearest neighbours (K-NN) rule [2]. The former method consists in estimating the posterior probabilities of the classes, based on a log-likelihood maximization over all training data. The latter is a K-NN algorithm proposed in the framework of belief functions; it uses the distances between a data point and its neighbours to classify this data point, the distance metric being learnt by minimizing the conflict over the training data. The number of neighbours is set to K = 10 throughout the experiments. This method directly provides imprecise outputs in the form of belief masses. We focused on the one-versus-one or pairwise decomposition strategy: each pair of classes forms a classification problem solved by a binary (pairwise) classifier. The classifiers are then pooled using the approach presented in Section 3.2. As mentioned in Section 3.3, computing the discounting factor ε* together with the minimal or maximal probabilities in the optimization problem is difficult. Therefore, we adopted the following strategy in these first experiments: for each test data point, we start with a low ε = 0.001 and increase it gradually and linearly by steps of 0.05. The corresponding beliefs (respectively, plausibilities) of the singletons provide us with lower (resp., upper) bounds on the conditional probabilities. Decisions were taken using the maximin rule and the interval dominance rule presented in Section 3.4. In the latter case, we report both the set accuracy Sacc and the discounted accuracy Dacc. We compared our method to three other classification strategies. The multiclass approach consists in learning a single classifier to directly solve the global classification problem; this method will be referred to as "Single". We also used two methods for combining the conditional probabilities provided by the pairwise classifiers [6,14]. Both of these methods compute the probability distribution whose conditionings are as close as possible to the classifier outputs. The former uses a Kullback-Leibler divergence to penalize the difference between the conditionings and the classifier outputs; we will refer to this method as "PComb1". The second one uses an L2 penalization criterion, so that the combination may be processed by solving a linear system; this method will be denoted "PComb2". Note
that when the evidential K-NN is used, the belief masses are transformed into probabilities using the pignistic transform [11] prior to combination with both these methods.

Results obtained using logistic regression. We first present some results obtained on the synthetic two-dimensional data. The discounting level ε and the number of plausible classes |X̂| (interval dominance rule) are displayed in Figure 1. As may be observed, the conflict between the outputs of the classifiers (and hence the level of discounting required) is higher in regions where classes overlap. Consequently, the number of plausible classes retained by the interval dominance rule increases in these regions.
Fig. 1. Discounting levels (left) and number of optimal classes (right), synthetic dataset
Table 2 presents the performances of the various methods compared on the datasets. The significance of the differences between the results was evaluated using a McNemar test [3] at level 95%. The best results are underlined; results that are not significantly different are printed in bold. Table 3 presents the average number of plausible classes retained by the interval dominance rule, both when using logistic regression and when using the evidential K-nearest neighbours.

Table 2. Test error rates obtained using logistic regression classifiers (single-classifier column)
synthetic 4.80, glass 44.00, pageblocks 4.02, satimage 14.26, segment 18.02, vowel 51.30, waveform 13.94, yeast 46.30

Table 3. Average number of plausible classes retained by interval dominance (all data / well classified data / misclassified data)

dataset: logistic regression; evidential K-NN
synthetic: 1.04 / 1.04 / 1.00; 1.00 / 1.00 / 1.00
glass: 2.16 / 2.50 / 3.05; 1.19 / 1.07 / 1.06
pageblocks: 1.23 / 1.21 / 1.20; 1.36 / 1.38 / 1.31
satimage: 2.05 / 2.12 / 2.08; 1.02 / 1.02 / 1.02
segment: 2.57 / 2.59 / 1.00; 1.55 / 1.59 / 1.03
vowel: 8.47 / 8.33 / 6.91; 1.14 / 1.15 / 1.06
waveform: 1.11 / 1.11 / 1.12; 1.36 / 1.38 / 1.49
yeast: 4.08 / 4.18 / 4.89; 1.23 / 1.23 / 1.38
First, it may be remarked that all methods providing precise decisions perform very close to each other. The set of plausible classes retained by interval dominance often encloses the actual class of the pattern, as the optimistic classification error reported in Table 2 suggests. However, the discounted accuracy criterion may give very poor results compared to the others. This suggests that using an imprecise decision criterion may be very fruitful in classification tasks, but that imprecision may be high in some regions (possibly requiring the use of more advanced classifiers in such regions). It should also be noted that interval dominance is one of the most "cautious" imprecise probabilistic decision rules, in the sense that it results in large sets of possibly optimal classes; other, more precise decision rules could have been used. Devising a method to pick the "best" decision rule is a matter for future work. Eventually, note that imprecise classifications seem evenly distributed among well classified and misclassified data.

Imprecise case. The results obtained using the evidential K-nearest neighbours algorithm [2] are presented in Tables 4 and 3. Remark first that the best results are often obtained using a single classifier; however, the results obtained using the other strategies are never significantly different from the best one. Also note that the average number of plausible classes retained by the interval dominance strategy is overall low. This suggests that the imprecision resulting from the combination of the classifier outputs is lower than in the previous case. A possible explanation is that the classifier outputs are here more consistent. Indeed, a K-NN classifier is more likely to fit complex non-linear decision boundaries than logistic regression. It also tends to return imprecise outputs that are usually not as close to 1 or 0 as those of logistic regression. We can therefore expect less conflict in the final assessments, whence more precise decisions. These results clearly show that the choice of the algorithm (and of its parameters) is important, and this matter should be treated in future works.
Table 4. Test error rates obtained using evidential K-nearest neighbours

dataset      single
synthetic     4.67
glass        48.00
pageblocks    5.16
satimage     10.53
segment       8.57
vowel        39.39
waveform     16.44
yeast        37.37

5 Conclusions
In this paper, we have introduced a method for combining binary classifiers based on an imprecise probabilistic approach. It handles classifiers with both precise and imprecise probabilistic outputs (including possibilistic, evidential [2] and credal classifiers [15]). The advantage of our method is that the combination step provides an imprecise output that reflects the conflict between the classifiers; thus, decision strategies that take this imprecision into account may be employed. Experiments show that our approach is comparable to other state-of-the-art combination methods, as well as to single-classifier approaches. In particular, when classifiers provide conflicting outputs (which is often the case when data are hard to classify), the result of the combination exhibits a high degree of imprecision. In that case, selecting a set of plausible classes and letting a user make the final decision may dramatically reduce the misclassification error. Future work may be conducted in several directions. First of all, we may consider designing optimization methods fitted to the imprecise framework. Besides, we may also study how the discounting rate may be integrated into the optimization process. Finally, our approach should be tested on other well-known imprecise classification algorithms (such as credal networks or classification trees).
Acknowledgements. The authors wish to express their thanks to Thierry Denœux for fruitful discussions about this work, and for giving interesting perspectives.
References
1. Baudrit, C., Couso, I., Dubois, D.: Joint propagation of probability and possibility in risk analysis: towards a formal framework. Int. J. of Approximate Reasoning 45, 82–105 (2007)
2. Denœux, T.: A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Trans. on Systems, Man and Cybernetics 25(5), 804–813 (1995)
3. Dietterich, T.G.: Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10(7), 1895–1923 (1998)
4. Dietterich, T.G., Bakiri, G.: Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research 2, 263–286 (1995)
5. Friedman, J.H.: Another approach to polychotomous classification. Technical report, Department of Statistics and Stanford Linear Accelerator Center, Stanford University (1996)
6. Hastie, T., Tibshirani, R.: Classification by pairwise coupling. In: Jordan, M.I., Kearns, M.J., Solla, S.A. (eds.) Advances in Neural Information Processing Systems, vol. 10 (1998)
7. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, Heidelberg (2001)
8. Huang, T.-K., Weng, R.C., Lin, C.-J.: Generalized Bradley-Terry models and multi-class probability estimates. Journal of Machine Learning Research 7, 85–115 (2006)
9. Mercier, D., Quost, B., Denœux, T.: Refined modeling of sensor reliability in the belief function framework using contextual discounting. Information Fusion 9(2), 246–258 (2008)
10. Quost, B., Denœux, T., Masson, M.-H.: Pairwise classifier combination using belief functions. Pattern Recognition Letters 28(5), 644–653 (2007)
11. Smets, P., Kennes, R.: The transferable belief model. Artificial Intelligence 66, 191–234 (1994)
12. Troffaes, M.: Decision making under uncertainty using imprecise probabilities. Int. J. of Approximate Reasoning 45, 17–29 (2007)
13. Walley, P.: Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, New York (1991)
14. Wu, T.-F., Lin, C.-J., Weng, R.C.: Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research 5, 975–1005 (2004)
15. Zaffalon, M.: The naive credal classifier. J. of Statistical Planning and Inference 105, 105–122 (2002)
16. Zaffalon, M., Corani, G., Mauá, D.: Utility-based accuracy measures to empirically evaluate credal classifiers. In: ISIPTA 2011 (2011)
Applying Hierarchical Information with Learning Approach for Activity Recognition

Hoai-Viet To1, Hoai-Bac Le2, and Mitsuru Ikeda1

1 School of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
{viet.to,ikeda}@jaist.ac.jp
2 Faculty of Information Technology, University of Science, Vietnamese National University, Ho Chi Minh City, Vietnam
[email protected]
Abstract. This paper discusses the problem of applying ontology to activity recognition and proposes a hierarchical classification approach that uses category information among activities with a machine learning method. In the activity recognition problem, machine learning approaches have the ability to adapt to various real environments, but actual settings often do not provide enough quality data to construct a good model for recognizing multiple activities. Our approach exploits the hierarchical structure of activities to overcome the problem of uncertain and incomplete data for multi-class classification in real home setting datasets. While it only slightly improves the overall recognition accuracy, from 59% to 63%, the hierarchical approach can recognize infrequent activities such as “Going out to work” and “Taking medication” with accuracies of 80% and 56%, respectively. Those activities had recognition accuracies lower than random guessing with the previous learning method. The preliminary results support the idea of developing a methodology that utilizes semantic information represented in ontologies for the activity recognition problem.
1 Introduction
Activity recognition is the key challenge in realizing the vision of ubiquitous computing services or the Smart home environment. In such an environment, appropriate services are provided based on the user's activities and behaviors. The applications of this environment range from e-health care services that can monitor and support an independently and distantly living disabled or aged person, to automated control of HVAC (Heating, Ventilating, and Air Conditioning) and home entertainment systems that facilitate the daily living of people. To recognize activities, sensors are deployed in the environment to monitor the user and collect information. Since many kinds of devices can be used, a variety of approaches have been proposed to deal with the different kinds
of information collected. However, lab-based research still has a long way to go to deliver a practical application of the Smart home environment in the real world. In recent years there has been a shift in sensor usage from a single rich sensor to dense simple sensors. The single sensor approach utilizes a rich-feature sensor such as a camera, or wearable sensors such as accelerometers and location beacons, to monitor the user's activities. While the camera-based approach suffers from the current inability of computer vision to extract informative content from the huge amount of low-level image features, wearable devices lack the ability to precisely distinguish similar activities such as bathing and washing. The dense sensing approach originates from the idea that we can infer what people are doing by observing what they are interacting with. By attaching simple sensors to every possible object that the user can interact with during an activity, we can hopefully collect sufficient data to identify this activity. However, the limitations of current sensor technology and of the design methodology still generate a lot of uncertain and incomplete sensor data. Ontology, as a means for semantic representation [3], has been introduced to the activity recognition problem to fill the gap between noisy low-level data and meaningful activities in the Smart home environment. The usage of ontology varies across different works and constitutes a promising approach to deal with the uncertainty problem in activity recognition. In this paper, we propose a novel approach to apply hierarchical information with a machine learning method to the activity recognition problem. Section 2 introduces related approaches that show different aspects of ontology usage for the problem. Section 3 presents our proposed hierarchical classification approach, which uses hierarchical information to improve the performance of the machine learning method. Experimental results and our discussion are presented in Sections 4 and 5, respectively. Section 6 concludes the paper with the future work of our proposed approach.
2 Applying Ontology for Activity Recognition
In this section, we summarize typical approaches that apply ontology to deal with the problem of noisy and incomplete models in activity recognition. Those approaches can be divided into two groups: the first group exploits semantic meaning and relationships among objects to improve learning performance and robustness, and the second utilizes ontology to construct activity models that allow activity recognition through logical inference. In the first group, the semantic relationships among objects are exploited to refine low-level features for statistical or machine learning approaches. In [10], a hierarchical structure of objects related to daily life activities is constructed by extracting lexical relationships from WordNet. This ontology is used to propagate the likelihood from the root node to leaf nodes by a method called hierarchical shrinkage. This approach helps to smooth the likelihoods among sibling leaf nodes that represent objects sharing the same function, so it reduces the number of samples required to calculate reliable probabilities and makes the model robust to unobserved objects. Experiments show that hierarchical shrinkage can improve overall
recognition accuracy by 15.11% compared with an initial model constructed by mining probabilities from the web. They also show the robustness of the approach: when all the objects are replaced by others that have the same function, recognition accuracy only drops by 33% with the shrinkage approach, compared with 91.66% without shrinkage. The role of hierarchical relationships among objects is also studied in [12]. In this paper, an ontology of objects is constructed and terms at a higher level are used to represent features instead of specific objects. Those features are used in a learning approach such as Naïve Bayes to recognize activity spaces, the spaces that are associated with activities. This approach is used to deal with the situation where new objects appear that were not available in the training phase. The second group develops and utilizes semantic representations in ontologies to recognize activities through logical inference, with or without uncertainty. In [2], a comprehensive conceptual activity ontology is introduced to model activities with a wide range of related objects such as sensors, resources, location, time, activities, etc. Hierarchical structures of activities and objects provide the ability to recognize activities at different levels of detail as well as to deal with various representations of the related objects. A case study is introduced to demonstrate those abilities. However, this approach lacks the ability to deal with uncertain information, which is necessary in real-world situations. Typical techniques to deal with uncertainty, including the Bayesian approach and Dempster-Shafer theory, have been introduced in different papers. In [7], activity models are converted into a dynamic Bayesian network and a web mining technique is used to estimate the involvement probabilities of the objects. An experiment that uses approximate inference to recognize 14 activities of daily living achieves an overall accuracy of 88% precision and 73% recall. While the Bayesian approach is based on objective probabilities obtained by statistical operations, Dempster-Shafer theory allows one to represent and integrate subjective belief or reliability and to combine evidence from different sources [8]. The Dempster-Shafer approach is introduced in [4] to represent uncertainty in activity ontologies, and a preliminary experiment performed in a case study shows that the Dempster-Shafer approach can recognize the “Toileting” activity in a Smart home environment with an accuracy of 88.2% [6]. The above approaches give promising experimental results but are limited in terms of real-world evaluation. Most of the approaches have been evaluated on example case studies [2, 4] or simulated scenarios in the laboratory [7, 12, 10]. In spite of the effort to model noisy and incomplete data, those settings lack the characteristics of a real-world setting. In [6], a preliminary experiment applied the Dempster-Shafer approach with a lattice structure to recognize the “Toileting” activity in the MIT Home Setting datasets [11]. However, the recognition of a single activity cannot give a comprehensive justification of its efficiency. In this paper, we concentrate on recognition accuracy on real-world datasets such as the MIT Home Setting dataset [11]. The actual environment contains many sources of uncertainty, such as device inaccuracy, interference from other devices, and differences in activity context such as users' habits. Therefore, we argue that a machine learning approach, with the support of other prior knowledge,
is appropriate for adapting to the variety of noise in actual environments for the activity recognition problem. We introduce a novel approach that utilizes semantic relationships from activity ontologies with a machine learning approach in the next section.
Our proposed approach combines prior knowledge and observed data by using the hierarchical information of activities to support the learning method. Due to the variety of user contexts and actual settings, machine learning and statistical approaches have shown their ability to characterize user activities in real environments as well as to deal with noisy and incomplete sensor data. However, it is also difficult to collect sufficient good-quality data for learning methods, because there are many activities that need to be considered and no optimal configuration for deploying sensors has been proposed [11]. In [1], a categorization of daily life activities by rate of energy expenditure is introduced, and this categorization forms a hierarchical structure among activities. In the context of health-care applications in a home setting, activities within a category tend to share the same location and interacting objects. Our approach uses this relationship information by applying a hierarchical classification scheme to the activity recognition task. Figure 1 demonstrates the idea of the hierarchical classification approach. In this approach, instead of using a single classifier for all activities, multiple classifiers are constructed following the hierarchical structure among those activities. The idea intuitively simulates the top-down recognition ability of human beings: people try to recognize objects at the more general level first, and then at a more specific level. We realize this idea by exploiting the semantic relationship between categories and activities extracted from an activity ontology: a category is the more general description of an activity. Furthermore, activities which belong to one category tend to share some similar characteristics because they perform similar functions. This implies that classifying categories is easier than classifying individual activities. In a simple case with a 2-layer structure, there is one upper classifier for categories and one lower classifier within every category to classify activities. To predict the activity label for a new sample, two steps of classification are performed: the category classifier predicts its category, and the appropriate activity classifier is then selected to produce the final prediction. Figure 2 shows the algorithms for the training and predicting steps of the proposed hierarchical classification approach. In the training step, the training dataset is used to build multiple classifiers: one to classify a sample into a category and one for each category to classify a sample into an activity. At first, an upper classifier which classifies each sample into one category is trained. Then the specific classifiers are trained: the whole dataset is divided into separate subsets based on the samples' category labels, and within each subset, whose samples share the same category label, a classifier is trained to classify data samples into specific activities.
Fig. 1. Hierarchical classification approach. Example: to recognize a new sample X, the Category Classifier is first used to predict its category label; then the corresponding Activity Classifier 1 is selected and used to predict its activity label
The functionalities of the main procedures of the training and predicting algorithms are as follows: the Categorize_Data procedure partitions the whole training dataset into separate subsets according to their category labels, and those subsets are used to train a set of activity classifiers. The Make_Classifier procedure is used to train a classifier, whether it is a category or an activity classifier. The usage of the same procedure in this algorithm implies that we use the same learning algorithm for all classifiers; however, different algorithms can be used in different layers when adapting this algorithm to another case. In the predicting step, a sample is classified twice, using two classifiers, to obtain first its category label and then its activity label. The Classify procedure generates the predicted label for the sample, and the Select_Classifier procedure selects the appropriate activity classifier from the set of activity classifiers based on the category label of the sample.
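To illustrate, a minimal Python sketch of the two-step scheme is given below; it assumes binary feature vectors stored in NumPy arrays and uses scikit-learn's BernoulliNB as the Naïve Bayes learner (the function names are our own, not from the paper).

import numpy as np
from sklearn.naive_bayes import BernoulliNB

def hierarchical_training(X, cat_labels, act_labels):
    # Upper classifier: sample -> category.
    cat_classifier = BernoulliNB().fit(X, cat_labels)
    # One lower classifier per category: sample -> activity.
    act_classifiers = {}
    for cat in np.unique(cat_labels):
        subset = (cat_labels == cat)
        act_classifiers[cat] = BernoulliNB().fit(X[subset], act_labels[subset])
    return cat_classifier, act_classifiers

def hierarchical_predict(x, cat_classifier, act_classifiers):
    x = x.reshape(1, -1)
    cat = cat_classifier.predict(x)[0]          # first step: category
    return act_classifiers[cat].predict(x)[0]   # second step: activity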
3.2 Problem of Infrequent Categories
We introduce an additional step of merging infrequent categories into the hierarchical classification approach. From the learning perspective, classification accuracies are expected to be higher at the upper layer (category) than at the lower layer (activity) of the hierarchical structure, due to the decrease in the number of classes and the increase in the number of samples per class. However, when there are categories with a single activity, the benefit of hierarchical classification at the upper layer is not ensured. Furthermore, the recognition accuracies of those categories tend to decrease because the class imbalance problem becomes larger at the category layer. Merging infrequent categories can help by reducing the number of classes and increasing the number of samples within the new class. Figure 3 shows an example of the infrequent category problem, retrieved from the MIT Home Setting datasets [11]. In the dataset of subject one, the three categories “Travel employment”, “Preparing a snack” and “Preparing a beverage” are considered infrequent, since each contains only one activity and every activity also consists of few observed samples. A detailed explanation of this justification is given in the next section. In our experiments, those categories were merged into one common category, named “Infrequent Category”, and this common category was used for classification instead; a sketch of the merging step is given after the figures below.

procedure Hierarchical_Training
  input:  set of training samples train_ds
  output: a cat_classifier to classify samples into categories and a set
          act_classifier_set to classify samples into activities
  cat_classifier ← Make_Classifier(train_ds, 'cat_label')
  cat_ds_set ← Categorize_Data(train_ds)
  for all cat_ds in cat_ds_set do
    act_classifier ← Make_Classifier(cat_ds, 'act_label')
    Add(act_classifier_set, act_classifier)
  end for

procedure Hierarchical_Predict
  input:  new sample to classify
  output: predicted act_label
  cat_label ← Classify(sample, cat_classifier)
  act_classifier ← Select_Classifier(act_classifier_set, cat_label)
  act_label ← Classify(sample, act_classifier)
  return act_label

Fig. 2. Hierarchical Classification Training and Predicting Algorithms for Activity Recognition

Fig. 3. Merging Infrequent Categories
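A sketch of the merging step under the frequency threshold used later in the experiments (a category is frequent if it has more than 16 samples; both the threshold default and the helper name are illustrative) might be:

from collections import Counter

def merge_infrequent_categories(cat_labels, min_samples=17):
    """Relabel every category with fewer than `min_samples` samples into
    one common "Infrequent Category" before training the upper classifier."""
    counts = Counter(cat_labels)
    return [cat if counts[cat] >= min_samples else 'Infrequent Category'
            for cat in cat_labels]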
4 Experiments

4.1 Datasets and Methods
Experiments were performed on the MIT Home Setting datasets [11] to compare the recognition accuracies of the hierarchical classification approach against flat classification. These datasets include activity and sensor information of two
participants in a real home setting. Sensors were attached to objects in each participant's house to collect information on the subject's interactions with objects, while the subjects used PDAs to report their activity every 15 minutes over two weeks. The collected data was then processed to eliminate empty samples, i.e. samples with no sensor data, and, as in [11], only activities with more than 6 samples were used in our experiments. The numbers of samples per activity of each subject are shown in Table 1. In [11], the Naïve Bayes algorithm was applied to classify activities from low-level sensor features. Two kinds of binary features were used: exist and before features. The exist feature determines whether the corresponding sensor was activated while the activity was happening, and the before feature represents the temporal relation among sensors within this duration. According to the experimental evaluation in [11], the exist feature with multi-class classifiers achieved the best discriminant ability on these datasets. Therefore, the exist features were used in our learning algorithms to compare our hierarchical classification approach with the flat classification approach. The numbers of features for the two subjects are 76 and 70, respectively. In our experiments, the Naïve Bayes learning algorithm was applied with both the flat and hierarchical classification approaches. The Naïve Bayes algorithm obtains accuracy comparable to more complex methods such as C4.5 [5] and is fast and simple to train, which makes it suitable for building the multiple classifiers of the hierarchical approach. Flat classification was performed to reproduce the accuracies of the Naïve Bayes method on the datasets as reported in [11] (since the datasets we collected were slightly different from what was described in that paper) and to compare with our proposed approach of using hierarchical information for classification. The leave-one-out cross-validation method was used to evaluate the learning algorithms. The datasets contain sensor data of the two subjects over two weeks; for each dataset, one day's data was used as testing data and the data of all remaining days was used as training data. Using the exist feature, the flat classification method and its results were basically similar to those of the Activity Detected at Least Once evaluation method [11], which was the best recognition method according to our investigation of the confusion matrices in [9]. According to [11] and our experiments, the following activities could not be recognized better than random guessing: “Preparing a snack”, “Preparing dinner”, “Going out to work”, “Washing dishes”, and “Cleaning” for subject one; and “Preparing dinner”, “Preparing a snack” and “Taking medication” for subject two. The comparison between our hierarchical classification approach and the flat classification approach is presented in the next section.
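As an illustration of the evaluation pipeline (the data layout and the names below are our assumptions, not the authors' code), the binary exist feature and the leave-one-day-out splits could be sketched as:

import numpy as np

def exist_features(activations, n_sensors, start, end):
    """Binary 'exist' vector: component s is 1 iff sensor s was activated
    at least once while the activity was happening (between start and end).
    `activations` is a list of (sensor_id, timestamp) pairs."""
    x = np.zeros(n_sensors, dtype=int)
    for sensor, t in activations:
        if start <= t <= end:
            x[sensor] = 1
    return x

def leave_one_day_out(samples):
    """Yield (train, test) splits holding out one day at a time.
    `samples` is a list of (day, feature_vector, label) triples."""
    for held_out in sorted({day for day, _, _ in samples}):
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield train, test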
4.2 Results
Tables 2 and 3 show the comparison between the two methods with leave-one-out cross validation on each subject, respectively. In each table, the precision (PR), recall (RE) and F-measure (FM) of the two approaches are presented for all activities. The F-measures stand for the average recognition accuracies of the methods. Activities that could not be recognized by either approach were excluded from the tables.
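As a reminder (our own note; the formula is standard), the F-measure is the harmonic mean of precision and recall, which can be checked against the table rows:

def f_measure(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.63, 0.81), 2))  # 0.71 - the flat 'Toileting' row of Table 2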
Table 1. Number of Samples Per Activity of Each Subject

Category               Activity               Subject 1  Subject 2
Clean house            Doing laundry              19          -
Clean house            Cleaning                    9          -
Leisure                Listening to music          -         17
Leisure                Watching TV                 -         15
Meal Cleanup           Washing dishes              8         20
Personal hygiene       Toileting                  84         37
Personal hygiene       Bathing                    18          -
Personal hygiene       Dressing                   24          -
Personal hygiene       Grooming                   37          -
Personal medical       Taking medication           -         14
Preparing a beverage   Preparing a beverage       15          -
Preparing a meal       Preparing dinner            8         14
Preparing a meal       Preparing lunch            17         20
Preparing a meal       Preparing breakfast        14         18
Preparing a snack      Preparing a snack          15         16
Travel employment      Going out to work          12          -
Table 2. Comparison between Classification Approaches on Subject One's Dataset

                       Flat classification   Hierarchical classification
Activity                PR    RE    FM         PR    RE    FM
Going out to work      0.33  0.17  0.22       0.67  1.00  0.80
Toileting              0.63  0.81  0.71       0.74  0.80  0.77
Bathing                0.90  0.50  0.64       0.89  0.44  0.59
Grooming               0.60  0.76  0.67       0.60  0.76  0.67
Dressing               0.81  0.71  0.76       0.88  0.58  0.70
Preparing breakfast    0.60  0.43  0.50       0.41  0.50  0.45
Preparing lunch        0.59  0.59  0.59       0.59  0.59  0.59
Preparing a snack      0.33  0.40  0.36       0.42  0.53  0.47
Preparing a beverage   0.63  0.67  0.65       0.77  0.67  0.71
Doing laundry          0.89  0.85  0.87       0.76  0.84  0.80
Table 3. Comparison between Classification Approaches on Subject Two's Dataset

                       Flat classification   Hierarchical classification
Activity                PR    RE    FM         PR    RE    FM
Toileting              0.45  0.86  0.59       0.60  0.81  0.69
Taking medication      0.29  0.14  0.19       0.44  0.79  0.56
Preparing breakfast    0.61  0.78  0.68       0.63  0.83  0.71
Preparing lunch        0.52  0.55  0.54       0.54  0.65  0.59
Preparing dinner       0.83  0.36  0.50       0.80  0.29  0.42
Preparing a snack      0.25  0.06  0.10       0.10  0.06  0.08
Washing dishes         0.63  0.50  0.56       0.54  0.35  0.42
Watching TV            0.86  0.40  0.55       1.00  0.47  0.64
Listening to music     0.56  0.53  0.55       0.46  0.35  0.40

From those results, a common trend in the effect of the hierarchical classification approach on both datasets can be recognized: hierarchical classification affects frequent and infrequent categories differently. In our experiments, a category is frequent if it contains more than 16 samples, that is, more than one sample per day on average (this threshold was chosen based on our assumption that an activity is frequent if it is repeated within a day). The frequent categories were “Personal hygiene”, “Preparing a meal” and “Clean house” for subject one; and “Personal hygiene”, “Preparing a meal”, “Meal Cleanup” and “Leisure” for subject two. The other categories were considered infrequent and merged into one common category for classification.

The effect of the hierarchical classification approach varies across the activities of the frequent categories, producing different changes in their recognition accuracies. For subject one, the accuracy of “Toileting” increases from 71% to 77%, while those of “Bathing”, “Dressing”, “Preparing breakfast” and “Doing laundry” decrease from 64% to 59%, 76% to 70%, 50% to 45% and 87% to 80%, respectively. The remaining activities show no change in recognition accuracy. For subject two, the activities whose accuracy increased were “Toileting” (from 59% to 69%), “Preparing breakfast” (from 68% to 71%), “Preparing lunch” (from 54% to 59%) and “Watching TV” (from 55% to 64%). Three other activities, “Preparing dinner”, “Washing dishes” and “Listening to music”, show a decrease in accuracy. The changes in recognition accuracy tend to depend on the number of samples per activity; this problem is discussed in detail in the next section.

While the hierarchical classification approach does not show a uniform effect on the accuracies of the frequent categories, it significantly improves the recognition accuracies of activities in infrequent categories. The accuracies of “Going out to work” in subject one's dataset and “Taking medication” in subject two's increase from 22% to 80% and from 19% to 56%, respectively. Those activities had recognition accuracies lower than random guessing in the previous experiment [11]. Other activities in these categories that also show improved accuracy are “Preparing a snack” (from 36% to 47%) and “Preparing a beverage” (from 65% to 71%) in subject one's dataset. However, in the case of “Preparing a snack” of subject two, the precision decreased from 25% to 10%, bringing about a decrease of the recognition accuracy from 10% to 8%. Besides, the hierarchical classification approach also cannot recognize the activities that the Naïve Bayes algorithm with the flat approach cannot; those were excluded from the tables. The next section discusses the following main issues of the proposed approach in the light of the experimental results: the effect of the hierarchical classification approach, and the mutual effect of the hierarchical approach and the observed data on recognition accuracy.
5 Discussion
The first point of discussion is the effect of the hierarchical classification approach on the recognition accuracies on the datasets. Despite the varied changes in the accuracies of different activities, the overall recognition performance was slightly improved on both datasets. The average accuracy of subject one, which is the weighted average of the accuracies of all activities, increases from 59% to 63%, and that of subject two from 50% to 53%. One of the main problems of hierarchical classification is error propagation: a recognition error at the upper layer is transmitted to the lower layer of the structure, reducing the overall recognition accuracy. In the case of the MIT Home Setting datasets, the improvement can be explained by the better classification performance of the hierarchical approach in both the category and activity layers. In both layers, the number of classes is smaller than in the flat classification approach, and there are even many more samples per class in the category layer. The error propagation effect increases if more hierarchical layers are used; in our experiments, just two layers were used, so this negative effect was also restricted. Another observable effect is that the hierarchical approach amplifies the influence of the data on classification accuracies. As a result, the change in accuracy of an activity or category depends on its size. For subject one, “Personal hygiene” was dominant in terms of the number of samples; its recognition accuracy increased while the others' decreased. This effect can also be observed within a category, where activities with more samples, for example “Toileting”, obtain better accuracies with the hierarchical approach. For subject two, the three frequent categories “Personal hygiene”, “Preparing a meal” and “Leisure” also had increases in accuracy; the numbers of samples of those categories are comparable. Within the “Preparing a meal” and “Leisure” categories, the accuracy changes of the activities depend on the proportions of their samples. The amplifying effect has a beneficial result on infrequent categories through the merging step. The recognition accuracies of most of these categories, except “Washing dishes”, increase, especially those of “Going out to work” in subject one's dataset and “Taking medication” in subject two's. In [11], the authors noted the case of “Going out to work”, because this activity triggers specific sensors but the recognition accuracy was not as high as expected. The previous explanation was that those sensors are also activated during other activities, so their discriminant power decreases. Our additional explanations are as follows: (1) the number of samples
of this activity is too small, and (2) the number of sensors triggered by this activity is also lower than normal (two door sensors). The combination of those reasons makes the likelihood of this activity dominated by those of other major activities in the predicting step, causing its low recall. Through the merging step, the size of the activity's category increases and the proportion of this activity within its category also increases. The recognition accuracy of “Going out to work” improves significantly, with 100% recall. For the remaining activities, the recognition accuracies are not improved and remain lower than random guessing. This suggests that the sensor features do not actually have discriminant power for those activities. The beneficial effect of the hierarchical approach on the MIT Home Setting datasets is encouraging. However, more general experiments should be performed to obtain a reasonable justification of this approach. As a machine learning approach, our proposed approach can adapt to different user contexts. In the experiments, our learning approach can recognize various activities of both subjects in the datasets. This is an advantage of our learning approach over logical approaches, which need formal knowledge to be designed individually for every activity, as in [2, 4, 6]. This advantage allows us to perform more experiments on other real datasets in the future. The problem of machine learning approaches is that they need annotated data to train the model for every user. Therefore, designing a suitable plan to deploy sensors and collect data is necessary to realize the method in real-world applications.
6 Conclusion
In this paper, we have presented a novel approach that utilizes semantic knowledge to improve the performance of a machine learning approach by using the hierarchical information in an activity ontology. In the preliminary experiments, the hierarchical classification approach increased the overall recognition accuracies for both subjects in the datasets and was also able to recognize infrequent activities. This result shows the promise of the research direction of combining prior knowledge and machine learning methods, as in this work and the other studies mentioned in Section 2. Our future work relating to this research is to develop an activity ontology for the activity recognition problem. By developing such an ontology, we hope to obtain a better understanding of the use of semantic representations in both the operation and design steps of Smart home systems in real-world applications.
References
1. Ainsworth, B.E., Haskell, W.L., Whitt, M.C., Irwin, M.L., Swartz, A.M., Strath, S.J., O'Brien, W.L., Bassett, D.R., Schmitz, K.H., Emplaincourt, P.O., Jacobs, D.R., Leon, A.S.: Compendium of physical activities: an update of activity codes and MET intensities. Medicine and Science in Sports & Exercise 32, 498–504 (2000)
2. Chen, L., Nugent, C.: Ontology-based activity recognition in intelligent pervasive environments. International Journal of Web Information Systems 5, 410–430 (2009)
3. Gruber, T.R.: A translation approach to portable ontology specifications. Knowledge Acquisition 5, 199–220 (1993)
4. Hong, X., Nugent, C., Mulvenna, M., McClean, S., Scotney, B., Devlin, S.: Evidential fusion of sensor data for activity recognition in smart homes. Pervasive and Mobile Computing 5(3), 236–252 (2009); Pervasive Health and Wellness Management
5. Langley, P., Iba, W., Thompson, K.: An analysis of Bayesian classifiers. In: Proceedings of the Tenth National Conference on Artificial Intelligence, pp. 223–228. MIT Press (1992)
6. Liao, J., Bi, Y., Nugent, C.: Using the Dempster-Shafer theory of evidence with a revised lattice structure for activity recognition. IEEE Transactions on Information Technology in Biomedicine 15, 74–82 (2011)
7. Philipose, M., Fishkin, K.P., Perkowitz, M., Patterson, D.J., Fox, D., Kautz, H., Hähnel, D.: Inferring activities from interactions with objects. IEEE Pervasive Computing 3, 50–57 (2004)
8. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)
9. Tapia, E.M.: Activity Recognition in the Home Using Simple and Ubiquitous Sensors. Master's thesis, Massachusetts Institute of Technology (2003)
10. Tapia, E.M., Choudhury, T., Philipose, M.: Building reliable activity models using hierarchical shrinkage and mined ontology. In: Fishkin, K.P., Schiele, B., Nixon, P., Quigley, A. (eds.) PERVASIVE 2006. LNCS, vol. 3968, pp. 17–32. Springer, Heidelberg (2006)
11. Tapia, E.M., Intille, S.S., Larson, K.: Activity recognition in the home using simple and ubiquitous sensors. In: Ferscha, A., Mattern, F. (eds.) PERVASIVE 2004. LNCS, vol. 3001, pp. 158–175. Springer, Heidelberg (2004)
12. Yamada, N., Sakamoto, K., Kunito, G., Isoda, Y., Yamazaki, K., Tanaka, S.: Applying ontology and probabilistic model to human activity recognition from surrounding things. IPSJ Digital Courier 3, 506–517 (2007)
Querying in Spaces of Music Information

Wladyslaw Homenda1,2 and Mariusz Rybnik2

1 Faculty of Mathematics and Information Science, Warsaw University of Technology, Plac Politechniki 1, 00-660 Warsaw, Poland
2 Faculty of Mathematics and Computer Science, University of Bialystok, ul. Sosnowa 64, 15-887 Bialystok, Poland
Abstract. This study is focused on querying over structured spaces of information. Querying is understood in terms of mining structures of information and of knowledge understanding. We consider information as the subject of descriptions expressed in some language; information is hidden behind such descriptions. Operations on structured spaces of information are performed on the language constructions describing such structures. However, automatic operations cannot always be performed directly on language constructions; in such cases it is necessary to expand the processing to the space of information. The study concerns paginated (i.e. printed and handwritten) music notation. It is shown that querying in the space of music information requires syntactic structuring as well as its expansion to semantic analysis. It is worth underlining that data understanding requires an analysis of uncertainty: the analyzed data are usually incomplete, uncertain and partially incorrect. Such imperfectness of information is hidden below the level of syntax and semantics; due to space limitations this problem is not studied here.
Keywords: structured information, music representation, syntactic structuring, semantic analysis, querying, data mining, knowledge discovery, knowledge understanding.
1 Introduction
The paper is structured as follows. Preliminary information on the subject of the study is included in subsection 1.1. Basic notions of mathematical linguistics are given in subsection 1.2, and the concepts of syntax, semantics and understanding are introduced in subsection 1.3. Section 2 is devoted to the syntactic structuring of music information and is focused on paginated music notation; due to the limited space of the paper, an extended discussion of other types of music information is omitted. Semantic analysis and automatic understanding of music information are explored in Section 3. Operations on spaces of music information are studied in Section 4; the automatic accomplishment of structural operations relies on syntactic structuring, semantic analysis and automatic understanding. Finally, conclusions are given in Section 5.
This work is supported by The National Center for Research and Development, Grant no N R02 0019 06/2009.
Fig. 1. Polonaise from Sonata for Flute and Piano by Beethoven, the beginning excerpt
1.1 The Subject
In this paper we analyze structured spaces of music information oriented to paginated music notation. The study is focused on operations done on spaces of music information, which are related to querying in such spaces. The methods presented in the paper are illustrated with fragments of scores of classical music. Namely, we inspect the Polonaise from the Sonata for Flute and Piano by L. v. Beethoven, c.f. Figure 1, and the Suite No. 3 in D major by J. S. Bach, transcribed for Piano by T. A. Johnson, c.f. Figure 2. Our discussion touches on elements of both scores. Nevertheless, the methods illustrated with these pieces can be extended to the whole pieces as well as to other scores. Similarly, the study of spaces of music information oriented to paginated music notation may easily be adapted to other methods of describing music information, e.g. MusicXML [2], Braille music description [8], MIDI [9], etc. The study continues the new issue of “automatic image understanding” raised by R. Tadeusiewicz a few years ago, c.f. [10,11].
1.2 Grammars and Languages
The discussion is based on the common definitions of grammars and context-free grammars. Let us recall that a system G = (V, T, P, S) is a grammar, where: (a) V is a finite set of variables (also called nonterminals), (b) T is a finite set of terminal symbols (simply called terminals), (c) the nonterminal S is the initial symbol of the grammar, and (d) P is a finite set of productions. A pair (α, β) of strings of nonterminals and terminals is a production, assuming that the first element α of the pair is a nonempty string. A production is usually denoted α → β. Grammars in which the left-hand side α of every production is a single nonterminal symbol are context-free grammars. A derivation in a grammar is a finite sequence of strings of nonterminals and terminals such that: (a) the first string in this sequence is the initial symbol of the grammar, and (b) for any two consecutive strings in the sequence, the latter is obtained from the former by applying a production, i.e. by replacing a substring of the former equal to the left-hand side of the production with the right-hand side of it. We say that the last element of the sequence is derivable in the grammar. For a context-free grammar a derivation can be outlined in the form of a derivation tree, i.e. (a) the root of the tree is labelled with the initial symbol of the grammar, and (b) for any internal vertex labelled by the left side of a production, its children are labelled by the symbols of the right side of the production. Finally, the set of all terminal strings derivable in a given grammar is the language generated by the grammar. In this paper, terminal strings generated by a grammar are called words, sentences or texts, while parts of such units are named phrases. Units of paginated music notation are called scores. Parts of scores have their domain-dependent names, e.g. stave, measure, key signature, etc. These names are not applicable in mathematical linguistics, i.e. they are not applicable to products of a corresponding grammar. Anyway, we will be using both naming conventions, applying names such as sentences and phrases to products and parts of products of a grammar, and using domain names to describe scores and their parts.

Fig. 2. J. S. Bach, Suite (Ouverture) No. 3 in D major, BWV 1068, transcription for Piano by T. A. Johnson, the beginning excerpt of the second movement Air
1.3 Syntax, Semantics, Understanding
The notion of querying raises two fundamental aspects: what (is queried) and how (it is queried). Firstly, conscious querying (n.b. conscious querying is a case of conscious communication) is associated with understanding what is queried, i.e. understanding the queried information in terms of its structure and possible meaning. Secondly, queries are expressed and communication is carried out in some language; consequently, the constructions of the language express the aspect of how (it is queried). Understanding, as recognized in this paper, is the ability to identify concepts in the real world, i.e. objects and sets of objects in the world described by constructions of a given language. The term syntax is used in the meaning of structuring the constructions of the language. Semantics is a mapping or relation which casts constructions of the language onto objects and onto local and global structures of objects of the real world. Therefore, the ability to recognize the semantics, i.e. to identify such a mapping or relation, is a denotation of understanding.
2 Syntax
Paginated music notation is a language of communication. There is no convincing proof that paginated music notation is or is not a context-free language. Music notation includes constructions of the form ww (e.g. repetitions), which are context-sensitive. Consequently, it seems that music notation is formally a context-sensitive language, c.f. [7], and a precise context-free description of paginated music notation is therefore not possible. However, context-sensitive methods, which would be needed for a precise description, are not explored enough for practical applications. On the other hand, even if paginated music notation were a context-free language, its complexity would not allow a practical context-free description. For these reasons, the effort required for a precise formal description of paginated music notation would not be reasonable. Instead, we attempt to construct a context-free grammar covering paginated music notation. The term covering paginated music notation is used in the sense of generating not only all valid music notation constructions, but also invalid ones. This is why sharp syntactic analysis of music notation is not performed. The usage of a simplified context-free grammar for the purpose of syntactic structuring of paginated music notation is valid in practice. The grammar will be applied in the analysis of constructions which are assumed to be well-grounded pieces of paginated music notation. Of course, such a grammar can be applied neither in checking the correctness of constructions of paginated music notation, nor in the generation of such constructions.
2.1 Grammar, Derivations
Here we give a raw description of paginated music notation in the form of productions of a context-free grammar. Music notation is a collection of staves (staff lines) placed on pages. Every stave is surrounded by corresponding symbols placed on the stave and around it. However, the collection of staves has its own structure, with staves grouped into higher-level units called systems. A raw description of the structure of music notation can be approximated by the set of context-free productions given below. The components of the grammar G = (V, T, P, S) are as follows. The set of nonterminals includes all identifiers in triangle brackets printed in italic. The nonterminal <score> is the initial symbol of G. The set of terminals includes all non-bracketed identifiers.

<score> → <score part> <score> | <score part>
<score part> → <page> <score part> | <page>
<page> → <system> <page> | <system>
<system> → <stave> <system> | <stave>
<system> → <part name> <stave> <system> | <part name> <stave>
<part name> → Flute | Piano | etc.
<stave> → beginning-barline …
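As a small illustration of how such productions can be manipulated, the part of the grammar listed above can be encoded as a mapping and expanded by a bounded random derivation; the representation is our own sketch, and the <stave> alternatives truncated above are replaced by a placeholder.

import random

# The productions listed above, encoded as nonterminal -> list of right-hand sides.
GRAMMAR = {
    '<score>':      [['<score part>', '<score>'], ['<score part>']],
    '<score part>': [['<page>', '<score part>'], ['<page>']],
    '<page>':       [['<system>', '<page>'], ['<system>']],
    '<system>':     [['<stave>', '<system>'], ['<stave>'],
                     ['<part name>', '<stave>', '<system>'],
                     ['<part name>', '<stave>']],
    '<part name>':  [['Flute'], ['Piano']],
    '<stave>':      [['beginning-barline']],  # placeholder: alternatives truncated in the source
}

def derive(symbol='<score>', depth=0, max_depth=6):
    """Expand `symbol` by repeatedly applying productions; near max_depth,
    prefer the shortest (non-recursive) alternative so the derivation stops."""
    if symbol not in GRAMMAR:
        return [symbol]                         # terminal symbol
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        terminal_only = [p for p in productions
                         if all(s not in GRAMMAR for s in p)]
        productions = terminal_only or [min(productions, key=len)]
    result = []
    for s in random.choice(productions):
        result.extend(derive(s, depth + 1, max_depth))
    return result

print(' '.join(derive()))  # one random terminal string of the (partial) grammar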