Frontiers in Artificial Intelligence and Applications
Volume 132

Published in the subseries
Knowledge-Based Intelligent Engineering Systems
Editors: L.C. Jain and R.J. Howlett

Recently published in KBIES:
Vol. 115. G.E. Phillips-Wren and L.C. Jain (Eds.), Intelligent Decision Support Systems in Agent-Mediated Environments
Vol. 104. A. Abraham, M. Köppen and K. Franke (Eds.), Design and Application of Hybrid Intelligent Systems
Vol. 102. C. Turchetti, Stochastic Models of Neural Networks
Vol. 87. A. Abraham et al. (Eds.), Soft Computing Systems – Design, Management and Applications
Vol. 86. R.S.T. Lee and J.H.K. Liu, Invariant Object Recognition based on Elastic Graph Matching – Theory and Applications
Vol. 83. V. Loia (Ed.), Soft Computing Agents – A New Perspective for Dynamic Information Systems
Vol. 82. E. Damiani et al. (Eds.), Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies – KES 2002
Vol. 79. H. Motoda (Ed.), Active Mining – New Directions of Data Mining
Vol. 72. A. Namatame et al. (Eds.), Agent-Based Approaches in Economic and Social Complex Systems
Vol. 69. N. Baba et al. (Eds.), Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies – KES’2001

Recently published in FAIA:
Vol. 131. B. López et al. (Eds.), Artificial Intelligence Research and Development
Vol. 130. K. Zieliński and T. Szmuc (Eds.), Software Engineering: Evolution and Emerging Technologies
Vol. 129. H. Fujita and M. Mejri (Eds.), New Trends in Software Methodologies, Tools and Techniques – Proceedings of the fourth SoMeT_W05
Vol. 128. J. Zhou et al. (Eds.), Applied Public Key Infrastructure – 4th International Workshop: IWAP 2005
Vol. 127. P. Ritrovato et al. (Eds.), Towards the Learning Grid – Advances in Human Learning Services
Vol. 126. J. Cruz, Constraint Reasoning for Differential Models
ISSN 0922-6389
Advances in Logic Based Intelligent Systems Selected Papers of LAPTEC 2005
Edited by
Kazumi Nakamatsu University of Hyogo, Japan
and
Jair Minoro Abe University of Sao Paulo, Paulista University, Brazil
Amsterdam • Berlin • Oxford • Tokyo • Washington, DC
Preface

It is a great honor for us to publish the selected papers of the 5th Congress of Logic Applied to Technology – LAPTEC’2005, held in Himeji, Japan, from April 2nd to 4th, 2005. LAPTEC’2005 was hosted by the School of Human Science and Environment, University of Hyogo, Japan. It was the first time for LAPTEC to be held in a country other than Brazil since its birth in 2000, and this has made the congress more international, with delegates from Japan, Brazil, Taiwan, China, Australia and Brunei. In LAPTEC’2005 we promoted discussion and interaction between researchers and practitioners focused on both theoretical and practical disciplines concerning logics applied to technology, with diverse backgrounds including all kinds of intelligent systems having classical or non-classical logics as their underlying common matter.

First of all, we would like to express our greatest gratitude to Dr. Isao Shirakawa, who accepted our offer to be the general chair of LAPTEC’2005, and to Dr. Yutaka Suzuki (Univ. Hyogo, Vice President) and Dr. Akihiro Amano (Univ. Hyogo, Vice President), who kindly presented invited lectures at the congress. We also would like to express our gratitude to Dr. Tetsuya Murai and Dr. Masahiro Inuiguchi, who organized the workshop “Rough Sets and Granularity” in LAPTEC’2005, to Prof. Germano Lambert-Torres (Univ. Itajuba, Brazil) and his staff for the construction and maintenance of the LAPTEC’2005 web site, and to all other committee members.

The chairs and committees of LAPTEC’2005 were as follows:

General Chairs
Isao Shirakawa (Japan), Kazumi Nakamatsu (Japan), Jair Minoro Abe (Brazil)

Honorary Committee
Hiroakira Ono (Japan), Kiyoshi Iseki (Japan), Lotfi A. Zadeh (U.S.A.), Newton C.A. da Costa (Brazil), Patrick Suppes (U.S.A.), Yutaka Suzuki (Japan)

Program Committee
Ajith Abraham (U.S.A.), Atsuyuki Suzuki (Japan), Don Pigozzi (U.S.A.), Edgar G. K. Lopez-Escobar (U.S.A.), Eduardo Massad (Brazil), Germano Lambert-Torres (Brazil), Hiroakira Ono (Japan), John A. Meech (Canada), Lakhmi C. Jain (Australia), Lotfi A. Zadeh (U.S.A.), Kenzo Kurihara (Japan), Kiyoshi Iseki (Japan), Manfred Droste (Germany), Marcelo Finger (Brazil), Maria C. Monard (Brazil), Michiro Kondo (Japan), MuDer Jeng (Taiwan), Nelson Favilla Ebecken (Brazil), Newton C.A. da Costa (Brazil), Patrick Suppes (U.S.A.), Paulo Veloso (Brazil), Sachio Hirokawa (Japan), Seiki Akama (Japan), Setsuo Arikawa (Japan), Sheila Veloso (Brazil), Sheng-Luen Chung (Taiwan), Shusaku Tsumoto (Japan), Tadashi Shibata (Japan), Takahira Yamaguchi (Japan), Tetsuya Murai (Japan), Yukihiro Itoh (Japan), Yutaka Hata (Japan)

Organizing Committee
Alexandre Scalzitti (Germany), Claudio Rodrigo Torres (Brazil), Hiroshi Ninomiya (Japan), Kazuo Ichikawa (Japan), Marcos Roberto Bombacini (Brazil), Patrick T. Dougherty (Japan), Yutaka Yamamoto (Japan)

We would also like to thank the following scholars who helped us in refereeing papers: Maria Ines Castineira (Brazil), Claudia Regina Milare (Brazil), Ricardo Luis de Freitas (Brazil), Adenilso da Silva Simao (Brazil).

Last, we would like to express our great thanks to the University of Hyogo for hosting LAPTEC’2005, and to acknowledge that this publication was partly supported by the Japanese Scientific Research Fund, Grant (C)(2), Project No. 16560468.

This book is dedicated to Emeritus Professor Atsuyuki Suzuki in commemoration of his honorable retirement from Shizuoka University in March 2005. Prof. Suzuki is an expert in the application of paraconsistent logic, and has contributed to LAPTEC since its beginning, both with many papers and as a member of the program committee.
Contents

Dedication v

Preface vii
Kazumi Nakamatsu and Jair Minoro Abe

Constructive Logic and Situation Theory 1
Seiki Akama and Yasunori Nagata

Hybrid Particle Swarm Optimizer with Mutation 9
Ahmed Ali Abdala Esmin and Germano Lambert-Torres

An Improved Recursive Decomposition Ordering for Term Rewriting Systems Revisited 18
Munehiro Iwami

Data Transformation in Modern Petrol Engine Tune-up 26
Chi-man Vong, Pak-kin Wong and Yi-ping Li

Testing Significance in Bayesian Classifiers 34
Marcelo de S. Lauretto and Julio M. Stern

Obtaining Membership Functions from a Neuron Fuzzy System Extended by Kohonen Network 42
Angelo Pagliosa, Claudio Cesar de Sá and F.D. Sasse

EVALPSN-Based Process Control in Brewery Plants 50
Sheng-Luen Chung and Yen-Hung Lai

Decision Making Based on Paraconsistent Annotated Logic 55
Fábio Romeu de Carvalho, Israel Brunstein and Jair M. Abe

Intelligent Safety Verification for Pipeline Based on EVALPSN 63
Kazumi Nakamatsu, Kenji Kawasumi and Atsuyuki Suzuki

A Discrete Event Control Based on EVALPSN Stable Model 71
Kazumi Nakamatsu, Hayato Komaba and Atsuyuki Suzuki

An EVALP Based Traffic Simulation System 79
Kazumi Nakamatsu, Ryuji Ishikawa and Atsuyuki Suzuki

Modelling and Prediction of Electronically Controlled Automotive Engine Power and Torque Using Support Vector Machines 87
P.K. Wong, C.M. Vong, Y.P. Li and L.M. Tam

Multi-View Semi-Supervised Learning: An Approach to Obtain Different Views from Text Datasets 97
Edson Takashi Matsubara, Maria Carolina Monard and Gustavo E.A.P.A. Batista

A Planning-Based Knowledge Acquisition Methodology 105
Eder Mateus Nunes Gonçalves and Guilherme Bittencourt

Digital Images: Weighted Automata Theoretical Aspects 113
Alexandre Scalzitti, Kazumi Nakamatsu and J.M. Abe

Modeling the Behavior of Paraconsistent Robots 120
José Pacheco de Almeida Prado, Jair Minoro Abe and Alexandre Scalzitti

A System of Recognition of Characters Based on Paraconsistent Artificial Neural Networks 127
Luís Fernando Pompeo Ferrara, Keiji Yamanaka and João Inácio da Silva Filho

Feature Subset Selection for Supervised Learning Using Fractal Dimension 135
Huei Diana Lee, Maria Carolina Monard and Feng Chung Wu

Functional Language of Digital Computers I 143
Kenneth K. Nwabueze

Learning Algorithm of Neural Network Using Orthogonal Decomposition Method 147
Shigenobu Yamawaki and Lakhmi Jain

Para-Analyzer and Its Applications 153
Jair Minoro Abe, João I. da Silva Filho, Fábio Romeu de Carvalho and Israel Brunstein

Methods for Constructing Symbolic Ensembles from Symbolic Classifiers 161
Flavia Cristina Bernardini and Maria Carolina Monard

Efficient Identification of Duplicate Bibliographical References 169
Vinícius Veloso de Melo and Alneu de Andrade Lopes

Autoepistemic Theory and Paraconsistent Logic Program 177
Kazumi Nakamatsu and Atsuyuki Suzuki

Annotated Semantics for Nonmonotonic Reasonings in Artificial Intelligence I 185
Kazumi Nakamatsu and Atsuyuki Suzuki

Annotated Semantics for Nonmonotonic Reasonings in Artificial Intelligence II 192
Kazumi Nakamatsu and Atsuyuki Suzuki

Annotated Semantics for Nonmonotonic Reasonings in Artificial Intelligence III 199
Kazumi Nakamatsu and Atsuyuki Suzuki

Annotated Semantics for Nonmonotonic Reasonings in Artificial Intelligence IV 207
Kazumi Nakamatsu and Atsuyuki Suzuki

A Note on Non-Alethic Temporal Logics 216
Jair Minoro Abe and Kazumi Nakamatsu

Railway Signal and Paraconsistency 220
Kazumi Nakamatsu and Jair M. Abe

– Workshop “Rough Sets and Granularity” –
T. Murai and M. Inuiguchi

On Topological Properties of Generalized Rough Sets 229
Michiro Kondo

Rough-Set-Based Approaches to Data Containing Incomplete Information: Possibility-Based Cases 234
Michinori Nakata and Hiroshi Sakai

Rough Set Semantics for Three-Valued Logics 242
Seiki Akama and Tetsuya Murai

Paraconsistency and Paracompleteness in Chellas’s Conditional Logics 248
Tetsuya Murai, Yasuo Kudo, Seiki Akama and Jair M. Abe

Rough Sets Based Minimal Certain Rule Generation in Non-Deterministic Information Systems: An Overview 256
Hiroshi Sakai and Michinori Nakata

Equivalence of Fuzzy-Rough Modus Ponens and Fuzzy-Rough Modus Tollens 264
Masahiro Inuiguchi, Salvatore Greco and Roman Słowiński

Non-Commutative Fuzzy Logics and Substructural Logics 272
Mayuka F. Kawaguchi, Osamu Watari and Masaaki Miyakoshi

Visibility and Focus: An Extended Framework for Granular Reasoning
Yasuo Kudo and Tetsuya Murai
Constructive Logic and Situation Theory

Seiki Akama and Yasunori Nagata
Teikyo Heisei University, Japan; University of the Ryukyus, Japan

Abstract. Infon logic was introduced by Devlin as a logic for situation theory, which constitutes a logical foundation for Barwise and Perry’s situation semantics. In this paper, we propose a constructive infon logic based on Nelson’s constructive logic with strong negation. A Kripke semantics is given with a completeness proof.

Keywords. situation theory, constructive logic with strong negation, infon logic
1. Introduction

Barwise and Perry [8] proposed situation semantics for natural language. The foundation for situation semantics is called situation theory, and it is interesting to study its logic. Devlin [9] first developed a logical system for situation theory called infon logic; also see Barwise and Etchemendy [7]. There are two crucial notions in infon logic. One is the concept of infon and the other is that of situation. Roughly speaking, an infon is considered a discrete item of information, and a situation some part of the activity of the world. These two notions are intimately connected. Infon logic has a non-classical flavor due to its treatment of negation and quantifiers. This means that classical logic is not suited to serve as a basis for infon logic. This point is in fact recognized by workers in situation theory. For instance, Barwise and Etchemendy used Heyting algebras, and Devlin adopted a version of partial logic. However, there are other interesting possibilities. The aim of this paper is to develop a constructive infon logic based on constructive logic with strong negation of Nelson [10].
2. Infon Logic

Devlin [9] identifies the concept of information with the following: objects do or do not stand in a relation. Thus, information can be described by means of objects and a relation holding among these objects. Let R be an n-place relation and a1, ..., an be appropriate objects for R. Then, ⟨⟨R, a1, ..., an, 1⟩⟩ is used to mean the informational item that a1, ..., an stand in the relation R, and ⟨⟨R, a1, ..., an, 0⟩⟩ is used to mean the informational item that a1, ..., an do not stand in the relation R. An infon is an object of the

1 Correspondence to: Seiki Akama, Computational Logic Laboratory, Department of Information Systems, Teikyo Heisei University, 2289 Uruido, Ichihara, Chiba 290-0193, Japan. Tel.: +81 436 74 6134; Fax: +81 436 74 6400; E-mail: [email protected]
S. Akama and Y. Nagata / Constructive Logic and Situation Theory
form ⟨⟨R, a1, ..., an, i⟩⟩, where R is an n-place relation, a1, ..., an are appropriate objects for R, and i is the polarity, equal to 1 or 0. The polarity i is thus a value to denote the above two representations. If an infon corresponds to the way things actually are in our world, it is called a fact. From a traditional logical point of view, an infon corresponds to an atomic sentence or its negation. Namely, it seems to be a basic representation of information. A situation is part of our world. Thus, a situation could be understood as something akin to the partial possible worlds of modal logicians. Let s be a situation and σ be an infon. We write s ⊨ σ to denote that σ is “made true by” s. If Σ is a set of infons and s is a situation, we write s ⊨ Σ to mean that s ⊨ σ for every infon σ in Σ.

Devlin’s infon logic aims at developing a logical calculus for complex infons. For doing so, logical connectives ∧ (conjunction), ∨ (disjunction) and bounded quantifiers ∀ (for all), ∃ (for some) are introduced. Let σ and τ be infons. Then, conjunction and disjunction are interpreted in the following way:

s ⊨ σ ∧ τ iff s ⊨ σ and s ⊨ τ,
s ⊨ σ ∨ τ iff s ⊨ σ or s ⊨ τ.

Let σ be an infon, x be a parameter, X be some set, and a be an object given by an anchor. We simplify a situation theorist’s notion of anchor by a suitable substitution. Then, the existential and universal quantifiers can be interpreted as follows:

s ⊨ (∃x ∈ X)σ iff s ⊨ σ[a/x] for some a ∈ X,
s ⊨ (∀x ∈ X)σ iff s ⊨ σ[a/x] for every a ∈ X.
Devlin did not introduce negation of an infon, because the polarity of an infon can simulate negation. One of the important properties of an infon is the property of persistence. This means that if s ⊨ ⟨⟨R, a1, ..., an, i⟩⟩ for any situation s and appropriate objects a1, ..., an, then s′ ⊨ ⟨⟨R, a1, ..., an, i⟩⟩ for any situation s′ which extends s.
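Stated in symbols, with s ⊑ s′ meaning that situation s′ extends s, the persistence property reads as follows; this is a direct transcription of the preceding sentence, using Devlin's standard infon notation ⟨⟨R, a1, ..., an, i⟩⟩:

```latex
s \models \langle\langle R, a_1, \ldots, a_n, i \rangle\rangle
\ \text{and}\ s \sqsubseteq s'
\ \Longrightarrow\
s' \models \langle\langle R, a_1, \ldots, a_n, i \rangle\rangle .
```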
3. Constructive Infon Logic

For an infon logic, we need a logic with appropriate negation and implication. One such candidate is constructive logic with strong negation, denoted by N, originally proposed by Nelson [10], which is an extension of positive intuitionistic predicate logic with a new connective ∼ for strong negation. N extends positive intuitionistic predicate logic with the following axioms for strong negation (∼):

(N1) ∼A → (A → B)
(N2) ∼(A → B) ↔ A ∧ ∼B
(N3) ∼(A ∧ B) ↔ ∼A ∨ ∼B
(N4) ∼(A ∨ B) ↔ ∼A ∧ ∼B
(N5) ∼∼A ↔ A
(N6) ∼∀xA(x) ↔ ∃x∼A(x)
(N7) ∼∃xA(x) ↔ ∀x∼A(x)

The rules of inference are as follows:

(MP) from A and A → B, infer B
(UG) from A → B(a), infer A → ∀xB(x)
(EG) from B(a) → A, infer ∃xB(x) → A

where a does not occur in A in (EG) and (UG). If (N1) is deleted from N, we obtain the paraconsistent constructive logic of Almukdad and Nelson [5]; also see Akama [4]. In N, we can define intuitionistic negation (¬) as follows: ¬A =def A → ∼A. Clearly, strong negation is stronger than intuitionistic negation; namely, ∼A → ¬A holds, but the converse does not hold; see Akama [1]. A Kripke semantics for strong negation was developed by Thomason [12]; also see Akama [1,2,3]. A completeness proof for N may
be found in Akama [1]. Thomason [12] proved that N with the constant domain axiom (CD): ∀x(A ∨ B(x)) → A ∨ ∀xB(x), where x is not free in A, has a Kripke semantics with constant domains. It is well known that (CD) is not acceptable for constructivists.

Next, we introduce constructive infon logic, which is inspired by the work of Wang and Mott [13], who proposed a first-order logic with strong negation and bounded quantifiers which is a variant of Thomason’s [12]. Here, some remarks are in order. First, strong negation provides natural negation. If negation is introduced into Devlin’s infon logic, it obeys the double negation law and the de Morgan laws, following Barwise and Perry [8]. It is then possible to express an infon as an atomic formula or the strong negation of an atomic formula. Second, the logic has real implication satisfying modus ponens and the deduction theorem, which is equivalent to intuitionistic implication. This is of interest for the representation of deduction. Finally, a situation can be interpreted as a set of infons, and compound infons can be formed constructively. In addition, a situation is a piece of information with persistency.

The language of constructive infon logic is that of N with a set of bounders and the membership symbol ∈. An atomic formula is an expression of the form R(t1, ..., tn) or t ∈ δ, where R is an n-place predicate symbol, t1, ..., tn are terms, t is a constant and δ is a bounder. Here, a term is defined as usual. Then, the formation rule of quantified formulas reads: if A is a formula with a variable x and δ is a bounder, then (∀x ∈ δ)A and (∃x ∈ δ)A are formulas.

A Hilbert-style axiomatization of constructive infon logic is based on an axiomatization of positive intuitionistic propositional logic with the following axioms:

(C1) ∼A → (A → B)
(C2) ∼(A → B) ↔ A ∧ ∼B
(C3) ∼(A ∧ B) ↔ ∼A ∨ ∼B
(C4) ∼(A ∨ B) ↔ ∼A ∧ ∼B
(C5) ∼∼A ↔ A
(C6) ∼(∀x ∈ δ)A ↔ (∃x ∈ δ)∼A
(C7) ∼(∃x ∈ δ)A ↔ (∀x ∈ δ)∼A
(C8) (t ∈ δ ∧ (∀x ∈ δ)A) → A[t/x]
(C9) (t ∈ δ ∧ A[t/x]) → (∃x ∈ δ)A

Here, t is an arbitrary term. The rules of inference are as follows:

(MP) from A and A → B, infer B
(UG) from A → (a ∈ δ → B(a)), infer A → (∀x ∈ δ)B(x)
(EG) from (a ∈ δ ∧ B(a)) → A, infer (∃x ∈ δ)B(x) → A

Here, a does not occur in A. (C1)–(C5) are equal to (N1)–(N5). However, the axioms for quantification need modifications due to the presence of bounded quantifiers.

Next, we turn to a Kripke semantics for constructive infon logic. Let there be given a set of variables, a set of constants, a set of n-place predicate symbols, and a set of bounders. A constructive infon (CI) model is of the form ⟨S, s0, ≤, D, I, V⟩, where S is a set of situations with the actual situation s0 such that s0 ∈ S; ≤ is a reflexive and transitive relation on S; D is a domain function assigning sets of individuals to the elements of S, satisfying that if s ≤ s′ then D(s) ⊆ D(s′); and I is an interpretation function satisfying the following conditions: (1) for a constant c, I(c) is a partial function from S into the union of the domains, satisfying that if s ≤ s′ and I(c)(s) is defined, then I(c)(s′) is also defined and I(c)(s) = I(c)(s′); (2) for an n-place predicate R, I(R)(s) is a partial interpretation over D(s)^n, and if s ≤ s′, then I(R)(s′) is an extension of I(R)(s); (3) for a bounder δ, I(δ) is a partial function from S into subsets of the domains, satisfying that if s ≤ s′ then I(δ)(s) ⊆ I(δ)(s′). Finally, V is a three-valued valuation function assigning t (true), f (false) or u (undefined) to each atomic formula at each s ∈ S, satisfying:

V(R(t1, ..., tn), s) = t iff I(t1)(s), ..., I(tn)(s) are defined and ⟨I(t1)(s), ..., I(tn)(s)⟩ is true under I(R)(s),
V(R(t1, ..., tn), s) = f iff I(t1)(s), ..., I(tn)(s) are defined and ⟨I(t1)(s), ..., I(tn)(s)⟩ is false under I(R)(s),
V(R(t1, ..., tn), s) = u otherwise;
V(t ∈ δ, s) = t iff I(t)(s) is defined and I(t)(s) ∈ I(δ)(s),
V(t ∈ δ, s) = f iff I(t)(s) is defined and I(t)(s) ∉ I(δ)(s),
V(t ∈ δ, s) = u otherwise.
V can be extended to arbitrary formulas as follows:

V(A ∧ B, s) = t iff V(A, s) = t and V(B, s) = t,
V(A ∧ B, s) = f iff V(A, s) = f or V(B, s) = f,
V(A ∨ B, s) = t iff V(A, s) = t or V(B, s) = t,
V(A ∨ B, s) = f iff V(A, s) = f and V(B, s) = f,
V(A → B, s) = t iff for every s′ such that s ≤ s′, V(A, s′) = t implies V(B, s′) = t,
V(A → B, s) = f iff V(A, s) = t and V(B, s) = f,
V(∼A, s) = t iff V(A, s) = f,
V(∼A, s) = f iff V(A, s) = t,
V((∀x ∈ δ)A, s) = t iff for every s′ such that s ≤ s′ and every a ∈ I(δ)(s′), V(A[a/x], s′) = t,
V((∀x ∈ δ)A, s) = f iff V(A[a/x], s) = f for some a ∈ I(δ)(s),
V((∃x ∈ δ)A, s) = t iff V(A[a/x], s) = t for some a ∈ I(δ)(s),
V((∃x ∈ δ)A, s) = f iff for every s′ such that s ≤ s′ and every a ∈ I(δ)(s′), V(A[a/x], s′) = f.
Here, we assume that every object has a name. Note also that it may be that neither V(A, s) = t nor V(A, s) = f. A is true at s iff V(A, s) = t. A is valid, written ⊨ A, iff it is true in all constructive infon models.

Theorem 3.1 For any formula A and any situations s and s′ with s ≤ s′, if V(A, s) = t then V(A, s′) = t.

Theorem 3.2 For any formula A and any s, V(∼A, s) = t iff V(A, s) = f, and V(∼A, s) = f iff V(A, s) = t.
4. Tableau Formulation

In this section, we describe a tableau calculus as a variant of the one introduced in Akama [3,4]. The basic idea of a tableau calculus is to employ indirect proof (cf. Smullyan [11]). We here use the notion of a signed formula. If A is a formula, then TA and FA are signed formulas; TA reads “A is provable” and FA reads “A is not provable”, respectively. If Γ is a set of signed formulas and X is a signed formula, then we simply write Γ, X for Γ ∪ {X}. A tableau calculus consists of axioms and reduction rules. A tableau is constructed by repeated applications of reduction rules until they cannot be applied. Let A be an atomic formula and B and C be formulas.

Tableau Calculus

Axioms
(AX1), (AX2), (AX3)

Reduction Rules
Here, the constant is arbitrary, and the constant satisfies the restriction that it must not occur in any formula of Γ or in the formula A. A proof of a sentence A is a closed tableau for TA. A tableau is a tree constructed by the above reduction rules. A tableau is closed if each branch is closed. A branch is closed if it contains the axioms. We write ⊢ A to mean that A is provable. Let Γ be a set of signed formulas, M be a constructive infon model, and s a situation. Say that (M, s) refutes Γ iff V(A, s) = t if TA ∈ Γ, and V(A, s) ≠ t if FA ∈ Γ. A set is refutable if something refutes it. If it is not refutable, it is valid.

Theorem 4.1 (Soundness) If A is provable, then A is valid.
(Proof): If Γ is of the form of the axioms, it is easy to show its validity. For the reduction rules, it suffices to check that they preserve validity. For example, consider the rule for conjunction. By the assumption, there is a constructive infon model in which (M, s) refutes the conclusion. This implies the corresponding conditions for the component formulas. Therefore, the premises are refutable. For the rule for the universal quantifier, by the assumption, we have a constructive infon model in which (M, s) refutes the conclusion. Here, the instantiating term is arbitrary. By Theorem 3.1, the premise is also refutable. The dual rule is similarly checked. For the rule for the existential quantifier, from the assumption, there is a constructive infon model in which (M, s) refutes the conclusion. Here, the instantiating constant is subject to the variable restriction. Then, the premise is also refutable. The dual rule is similarly checked. The verification of the other rules presents no difficulty.

A finite set of signed formulas is consistent if no tableau for it is closed. An infinite set of signed formulas is consistent if every finite subset is consistent. If a set of signed formulas is not consistent, it is inconsistent.

Definition 4.2 Let P be a set of parameters and Γ a set of signed formulas. We say that Γ is maximal consistent with respect to P if (1) every signed formula in Γ uses only parameters of P, (2) Γ is consistent, (3) for every formula A with parameters in P, either TA ∈ Γ or FA ∈ Γ.
Here, we denote by the new language extending the set of constants of the original language with a set of constants . Definition 4.3 We say that a consistent set of signed formulas is -saturated if (1) is maximal consistent with respect to , (2) if , then for some , (3) if , then for some . Lemma 4.4 A consistent set of signed formulas can be extended to a maximal consistent set of signed formulas . (Proof): Since the language has a countably infinite set of sentences, we can enumerate sentences Now, we define for a consistent set of signed formulas a sequence of consistent sets of signed formulas , ... in the following way:
Then, we set . It is shown that is a maximal consistent set. Lemma 4.5 A consistent set of signed formulas in can be extended to a -saturated consistent set of signed formulas in . (Proof): Let . Extend to a set maximal consistent with respect to . Since is a countable set of constants not in . we can enumerate sentences of the form in as By definition , it suffices to check the case of . can be then defined for any as follows: Take the first formula of the form . If but for all , then set . By lemma 5.4, we extend to , which is maximal consistent with respect to . Then, we define . Here, we can easily check that is -saturated. Since each is consistent, is also consistent. Let be any sentences of with . From the maximality of , one of the conditions , or and are provable, holds. Thus, is shown to be maximal in . Finally, we check . We suppose , i.e. the -th enumeration. From the above construction, for some must hold. This implies that is -saturated. , ... be a countable sequence of disjoint countable sets of Definition 4.6 Let constants not occurring in . We denote by . Then, we define a canonical constructive infon model as follows: 1. is -saturated in 2. If is -saturated and 3. We define in the following way:
for some , then iff
4. 5. 6.
7.
.
, , and
,
Lemma 4.7 For any (1) if (2) if (3) if (4) if (5) if (6) if (7) if (8) if (9) if (10) if (11) if (12) if (13) if (14) if
in a canonical constructive infon model, we have: , then and , , then or , , then or , , then and , , then or , , then for some such that , then or , then and , then and , then or , then and , then or , then , , then .
,
and
,
, , , , , ,
Theorem 4.9 For any in a canonical constructive infon model and any formula, iff , iff . (Proof): By induction on . The case is an atomic formula is immediate. The interesting cases are as follows: (1) iff iff or iff or iff iff and iff and iff (2) : iff iff iff iff iff iff iff iff (3) : iff ) iff and iff iff iff iff (4) : iff iff iff iff iff iff Theorem 4.10 (Completeness Theorem)
iff
.
5. Discussion

In this paper, we developed a constructive infon logic. Finally, we discuss some theoretical issues in relation to situation theory, addressing the logical connectives, i.e. negation, implication and the universal quantifier.

We start with the problem of negation. In Devlin’s [9] infon logic there is no negation, and the polarity in an infon can play the role of negation. Because infon logic is formalized in a partial setting, the negation is not classical negation. Barwise and Etchemendy’s [7] Heyting infon algebra assumes that infon logic is intuitionistic logic. Unfortunately, intuitionistic negation is too weak to be used for infon logic. Strong negation is a desirable negation. If we allow contradictions in a situation, the resulting logic should be paraconsistent (cf. [5]).

Second, the conditional is of special interest from a logical viewpoint. In our logic, the implication is intuitionistic implication. As is well known, the interpretation of the intuitionistic implication A → B is that there is a construction which transforms a proof of A into a proof of B. This could be paraphrased in a situation-theoretic setting as “there is an information flow from an infon σ to an infon τ”. There are, however, other possibilities for information flow. For example, Wansing [14] studied substructural constructive logics by means of Kripke models. A more elaborate treatment of implication may be found in the tradition of relevance logic; see Anderson, Belnap and Dunn [6].

Third, we consider the issue of quantification again. Although the existential quantifier presents no difficulty, the universal quantifier gives rise to several interpretations. We think that there are at least two intriguing interpretations, namely the static and the dynamic interpretation. The static interpretation, which is usually assumed by situation theorists, evaluates a universally quantified infon at the given situation only. The dynamic interpretation, adopted by intuitionists, also quantifies over all extensions of the given situation. Here, we neglect bounders. The static interpretation is simpler than the dynamic one, but the price is giving up persistency. We adopt the dynamic interpretation in view of constructive logics with strong negation.
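The contrast between the two readings of the universal quantifier can be written out explicitly. This is a reconstruction from the prose above (bounders neglected, as in the text), where ≤ is the extension ordering on situations and D(s′) the domain of s′:

```latex
\begin{aligned}
\text{static:}\quad  & s \models \forall x\,\sigma \iff s  \models \sigma[a/x]
                       \ \text{for every } a \in D(s),\\
\text{dynamic:}\quad & s \models \forall x\,\sigma \iff s' \models \sigma[a/x]
                       \ \text{for every } s' \geq s \ \text{and every } a \in D(s').
\end{aligned}
```

Persistence fails under the static reading because a larger situation s′ may contain new objects a for which σ[a/x] is not made true.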
References

[1] Akama, S., Constructive predicate logic with strong negation and model theory, Notre Dame Journal of Formal Logic, 29 (1988), 18–27.
[2] Akama, S., Subformula semantics for strong negation systems, Journal of Philosophical Logic, 19 (1990), 217–226.
[3] Akama, S., Tableaux for logic programming with strong negation, D. Galmiche (ed.), TABLEAUX’97: Automated Reasoning with Analytic Tableaux and Related Methods, 31–42, Springer, Berlin, 1997.
[4] Akama, S., Nelson’s paraconsistent logics, Logic and Logical Philosophy, 7 (1998), 101–115.
[5] Almukdad, A. and Nelson, D., Constructible falsity and inexact predicates, Journal of Symbolic Logic, 49 (1984), 231–233.
[6] Anderson, A.R., Belnap, N.D. and Dunn, J.M., Entailment, vol. 2, Princeton University Press, Princeton, 1992.
[7] Barwise, J. and Etchemendy, J., Information, infons, and inference, R. Cooper, K. Mukai and J. Perry (eds.), Situation Theory and its Applications, vol. 1, 33–78, CSLI Lecture Notes 22, Stanford, 1990.
[8] Barwise, J. and Perry, J., Situations and Attitudes, MIT Press, Cambridge, Mass., 1983.
[9] Devlin, K., Logic and Information, Cambridge University Press, Cambridge, 1991.
[10] Nelson, D., Constructible falsity, Journal of Symbolic Logic, 14 (1949), 16–26.
[11] Smullyan, R., First-Order Logic, Springer, Berlin, 1968.
[12] Thomason, R.H., A semantical study of constructible falsity, Zeitschrift für mathematische Logik und Grundlagen der Mathematik, 15 (1969), 247–257.
[13] Wang, X. and Mott, P., A variant of Thomason’s first-order logic CF based on situations, Notre Dame Journal of Formal Logic, 39 (1998), 74–93.
[14] Wansing, H., The Logic of Information Structures, Springer, Berlin, 1993.
Hybrid Particle Swarm Optimizer with Mutation

Ahmed Ali Abdala Esmin1 and Germano Lambert-Torres2
Federal University of Itajubá
Abstract. This paper presents a hybrid Particle Swarm Optimizer combining the idea of the particle swarm with concepts from Evolutionary Algorithms. The hybrid Particle Swarm Optimizer with Mutation (HPSOM) combines the traditional velocity and position update rules with the idea of numerical mutation. This model is tested and compared with the standard PSO on unimodal and multimodal functions. This is done to illustrate that PSOs with a mutation operation have the potential to achieve faster convergence and to find a better solution. The objective of this paper is to describe the HPSOM model and to test its potential and competitiveness on function optimization.

Keywords. Particle Swarm Optimizer, Genetic Algorithms, Hybrid Model
Introduction

The original Particle Swarm Optimisation (PSO) algorithm was introduced in [1]; the latest inertia weight and constriction factor versions serve as alternatives to the standard Genetic Algorithm (GA). The PSO was inspired by insect swarms and has since proven to be a competitor to the standard GA when it comes to function optimisation. Since then, several researchers have analysed the performance of the PSO with different settings, e.g., neighbourhood settings ([2,3]). Work presented in [4] describes the complex task of parameter selection in the PSO model. Comparisons between PSOs and the standard GA were made analytically in [5] and also with regard to performance in [6]. Angeline points out that the PSO performs well in the early iterations, but has problems reaching a near-optimal solution in several real-valued function optimisation problems. Both Eberhart and Angeline conclude that hybrid models of the standard GA and the PSO could lead to further advances.

The behaviour of the PSO in the gbest model presents some important aspects related to the velocity update. If a particle’s current position coincides with the

1 Corresponding Author: Av. BPS 1303 – Itajuba – 37500-000 – MG – Brazil – Email: [email protected]
A.A.A. Esmin and G. Lambert-Torres / Hybrid Particle Swarm Optimizer with Mutation
global best position, the particle will only move away from this point if its inertia weight (w) and previous velocity are different from zero. If the previous velocities are very close to zero, then all the particles will stop moving once they catch up with the global best particle, which may lead to a premature convergence of the algorithm. In fact, this does not even guarantee that the algorithm has converged on a local minimum; it merely means that all the particles have converged to the best position discovered so far by the swarm. This phenomenon is known as stagnation. The solution presented in [7] is based on adding a new parameter and additional equations. Another solution is presented in [8] by introducing breeding and subpopulations.

To solve the problem above, this paper proposes a new model called the Hybrid Particle Swarm Optimizer with Mutation (HPSOM), which incorporates the mutation process often used in GA into PSO. This process allows the search to escape from local optima and to search in different zones of the search space. The objective of this paper is to describe how to make the hybrids benefit from genetic methods and to test their potential on function optimisation.

The rest of the paper is organized as follows. The next section presents the PSO definition. Section 3 presents an overview of Genetic Algorithms. Section 4 describes the structure of the HPSOM model. Section 5 describes the experimental setting. Section 6 discusses the experimental results, and finally Section 7 presents the conclusions and future work.
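The mutation idea just described can be sketched in code. The following is a minimal illustration, not the paper's exact algorithm: it combines the standard inertia-weight velocity and position updates with a GA-style mutation applied to each coordinate with a small probability, and all parameter values (w, c1, c2, mutation probability pm, mutation range) are assumptions chosen for the example.

```python
import random

def hpsom_step(xs, vs, pbest, gbest, f,
               w=0.7, c1=1.5, c2=1.5, pm=0.1, mrange=1.0):
    """One HPSOM iteration: standard PSO updates followed by mutation.

    xs, vs: positions and velocities of the swarm (lists of lists),
    pbest:  personal best positions, gbest: global best position,
    f:      objective function to minimise.
    """
    n = len(xs[0])
    for i in range(len(xs)):
        for j in range(n):
            r1, r2 = random.random(), random.random()
            # Inertia-weight velocity update, then position update.
            vs[i][j] = (w * vs[i][j]
                        + c1 * r1 * (pbest[i][j] - xs[i][j])
                        + c2 * r2 * (gbest[j] - xs[i][j]))
            xs[i][j] += vs[i][j]
            # GA-style mutation: with probability pm, perturb the
            # coordinate so the swarm can escape stagnation at gbest.
            if random.random() < pm:
                xs[i][j] += random.uniform(-mrange, mrange)
        # Update the personal best only on strict improvement.
        if f(xs[i]) < f(pbest[i]):
            pbest[i] = list(xs[i])
    # Global best is the best of all personal bests (updated in place).
    gbest[:] = min(pbest, key=f)
    return xs, vs, pbest, gbest
```

Because personal bests are only ever replaced by better positions, the global best value is non-increasing over iterations even when mutation moves a particle to a worse point.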
1. PSO Definition

The Particle Swarm Optimiser (PSO) is a population-based optimisation method first proposed by Kennedy and Eberhart [1]. The PSO technique finds the optimal solution using a population of particles, where each particle represents a candidate solution to the problem. PSO was basically developed through simulation of bird flocking in two-dimensional space. The particles change their positions by flying around the search space until a relatively unchanged position is encountered or the stop criteria are satisfied. Some of the attractive features of the PSO include its ease of implementation and the fact that no gradient information is required. It can be used to solve a wide array of optimisation problems; example applications include neural network training and function minimisation.

The PSO definition is presented as follows. Each individual particle i has the following properties: a current position in search space, x_i, a current velocity, v_i, and a personal best position in search space, y_i. The personal best position y_i corresponds to the position in search space where particle i had the smallest error as determined by the objective function f, assuming a minimisation task. The global best position, denoted by ŷ, represents the position yielding the lowest error amongst all the y_i. Eqs. (1) and (2) define how the personal and global best values are updated at time t, respectively. It is assumed below that the swarm consists of s particles, thus i = 1..s.

  y_i(t+1) = y_i(t)        if f(y_i(t)) ≤ f(x_i(t+1))
  y_i(t+1) = x_i(t+1)      if f(y_i(t)) > f(x_i(t+1))          (1)
  ŷ(t) ∈ {y_0(t), y_1(t), …, y_s(t)} such that
  f(ŷ(t)) = min{ f(y_0(t)), f(y_1(t)), …, f(y_s(t)) }          (2)
During each iteration every particle in the swarm is updated using Eqs. (3) and (4). Two pseudo-random sequences, r_1 ~ U(0,1) and r_2 ~ U(0,1), are used to effect the stochastic nature of the algorithm. For all dimensions j = 1..n, let x_{i,j}, y_{i,j} and v_{i,j} be the current position, current personal best position and velocity of the j-th dimension of the i-th particle. The velocity update step is
  v_{i,j}(t+1) = w·v_{i,j}(t) + c_1·r_{1,j}(t)·[y_{i,j}(t) − x_{i,j}(t)] + c_2·r_{2,j}(t)·[ŷ_j(t) − x_{i,j}(t)]          (3)
The new velocity is then added to the current position of the particle to obtain the next position of the particle:
  x_i(t+1) = x_i(t) + v_i(t+1)          (4)
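Eqs. (3) and (4) translate directly into a few lines of NumPy. The function below is an illustrative sketch, not the authors' code: the name `pso_step`, the array shapes and the optional `vmax` clamping argument are our own choices.

```python
import numpy as np

def pso_step(x, v, y, y_hat, w, c1=2.0, c2=2.0, vmax=None, rng=None):
    """One PSO update per Eqs. (3)-(4), for the whole swarm at once.

    x, v, y : (s, n) arrays -- positions, velocities, personal bests.
    y_hat   : (n,) array    -- global best position.
    Returns the new positions and velocities.
    """
    rng = rng if rng is not None else np.random.default_rng()
    r1 = rng.random(x.shape)   # r1 ~ U(0,1), drawn per particle and dimension
    r2 = rng.random(x.shape)   # r2 ~ U(0,1)
    v_new = w * v + c1 * r1 * (y - x) + c2 * r2 * (y_hat - x)   # Eq. (3)
    if vmax is not None:
        v_new = np.clip(v_new, -vmax, vmax)                     # velocity clamping
    return x + v_new, v_new                                     # Eq. (4)
```

With x = 0 and both the personal and global bests at 1, every velocity component falls in [0, c_1 + c_2], which makes the role of the two acceleration terms easy to inspect.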
The value of each dimension of every velocity vector v_i is clamped to the range [−v_max, v_max] to reduce the likelihood of the particle leaving the search space. The value of v_max is usually chosen to be

  v_max = k·x_max,  where 0.1 ≤ k ≤ 1.0,

where x_max denotes the domain of the search space. Note that this does not restrict the values of x_i to the range [−v_max, v_max]; it merely limits the maximum distance that a particle can move during one iteration.

The acceleration coefficients, c_1 and c_2, control how far a particle moves in a single iteration. Typically these are both set to a value of 2.0, although it has been shown that setting c_1 ≠ c_2 can lead to improved performance [4]. The inertia weight, w, in Eq. (3) is used to control the convergence behaviour of the PSO. Small values of w result in more rapid convergence, usually on a suboptimal position, while a too large value may prevent convergence. Typical implementations of the PSO adapt the value of w during the training run, e.g., linearly decreasing it from 1 to near 0 over the run; convergence can also be obtained with fixed values, as shown in [4].

The PSO system combines two models: a social-only model and a cognition-only model [2]. These models are represented by the components of the velocity update Eq. (3). The second term in the velocity update equation, c_1·r_{1,j}(t)·[y_{i,j}(t) − x_{i,j}(t)], is associated with cognition since it only takes into account the particle's own experiences. The third term, c_2·r_{2,j}(t)·[ŷ_j(t) − x_{i,j}(t)], represents the social interaction between the particles. It suggests that individuals ignore their own experience and
adjust their behaviour according to the successful beliefs of individuals in the neighbourhood. The algorithm consists of repeated application of the update equations presented above.
2. The Genetic Algorithm
2.1. Overview

Genetic algorithms are general-purpose search techniques based on principles inspired by the genetic and evolution mechanisms observed in natural systems and populations of living beings. Their basic principle is the maintenance of a population of solutions to a problem (genotypes), encoded as individuals that evolve over time [9–11]. Generally, a GA comprises three phases of search: phase 1, creating an initial population; phase 2, evaluating a fitness function; phase 3, producing a new population. A genetic search starts with a randomly generated initial population, within which each individual is evaluated by means of a fitness function. Individuals in this and subsequent generations are duplicated or eliminated according to their fitness values, and applying the GA operators creates further generations. This eventually leads to a generation of high-performing individuals [12].
2.2. The Genetic Algorithm Operators

There are usually three operators in a typical genetic algorithm [12]. The first is the reproduction operator (elitism), which makes one or more copies of any individual that possesses a high fitness value; otherwise, the individual is eliminated from the solution pool. The second operator is the recombination (also known as 'crossover') operator. This operator selects two individuals within the generation and a crossover site and carries out a swapping operation on the string bits to the right-hand side of the crossover site of both individuals. Crossover operations synthesize bits of knowledge gained from both parents exhibiting better-than-average performance; thus, the probability of a better-performing offspring is greatly enhanced. The third operator is the 'mutation' operator. This operator acts as a background operator and is used to explore some of the unvisited points in the search space by randomly flipping a 'bit' in a population of strings. Since frequent application of this operator would lead to a completely random search, a very low probability is usually assigned to its activation.
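The crossover and mutation operators just described can be sketched on bit strings as follows. This is a minimal illustration; the function names and the per-bit mutation scheme are our own, not taken from [12].

```python
import random

def crossover(a, b, rng=random):
    """One-point crossover: pick a site and swap the right-hand tails."""
    site = rng.randrange(1, len(a))            # crossover site, 1 .. len-1
    return a[:site] + b[site:], b[:site] + a[site:]

def mutate(bits, p=0.01, rng=random):
    """Background mutation: flip each bit with a small probability p."""
    return [b ^ 1 if rng.random() < p else b for b in bits]
```

Note that crossover only rearranges genetic material: the two offspring together carry exactly the bits of the two parents, while mutation is the only operator that introduces new bit values.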
3. The HPSOM Model

As mentioned previously, the behaviour of the PSO in the gbest model presents some important aspects related to the velocity update. If a particle's current position coincides with the global best position, the particle will only move away from this
point if its inertia weight (w) and previous velocity are different from zero. If the previous velocities are very close to zero, then all the particles will stop moving once they catch up with the global best particle, which may lead to premature convergence of the algorithm. In fact, this does not even guarantee that the algorithm has converged on a local minimum; it merely means that all the particles have converged to the best position discovered so far by the swarm. This phenomenon is known as stagnation [7]. The solution presented in [7] is based on adding a new parameter and additional equations. Another solution is presented in [8], which introduces breeding and subpopulations.

To solve the problem above, this paper proposes a new model called Hybrid Particle Swarm Optimizer with Mutation (HPSOM), incorporating the mutation process often used in GA into PSO. This process allows the search to escape from local optima and to explore different zones of the search space. It starts with the random choice of a particle in the swarm, which is then moved to a different position inside the search area. In this paper the mutation process is employed using the following equation:

  mut(p[k]) = p[k]·(−1) + Z          (5)

where p[k] is the randomly chosen particle from the swarm and Z is randomly obtained within the range [0, 0.1·(x_max − x_min)], i.e., up to 0.1 times the length of the search space. Figure 1 lists the pseudo-code for the basic HPSOM algorithm.

  begin
    create and initialise the swarm
    while (stop condition is false)
    begin
      evaluation
      update velocity and position
      mutation
    end
  end

Figure 1. Pseudo-code for the HPSOM algorithm.
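Putting the mutation step and the loop of Figure 1 together, a complete HPSOM run can be sketched as below. This is our own minimal NumPy sketch, not the authors' code: the 30% mutation rate and the 0.7 to 0.4 inertia schedule are taken from the experimental-setting section, and the mutation line follows our reading of Eq. (5).

```python
import numpy as np

def hpsom(f, n, x_min, x_max, iters=1000, s=20, pm=0.3, seed=0):
    """Minimise f over [x_min, x_max]^n with PSO plus GA-style mutation."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(x_min, x_max, (s, n))      # initial swarm
    v = np.zeros((s, n))
    vmax = 0.5 * (x_max - x_min)               # half the search-space length
    y = x.copy()                               # personal bests
    fy = np.array([f(p) for p in x])
    g = y[fy.argmin()].copy()                  # global best
    for t in range(iters):
        w = 0.7 - (0.7 - 0.4) * t / iters      # linearly decreasing inertia
        r1, r2 = rng.random((2, s, n))
        v = np.clip(w * v + 2.0 * r1 * (y - x) + 2.0 * r2 * (g - x), -vmax, vmax)
        x = x + v                              # position update, Eq. (4)
        for k in np.flatnonzero(rng.random(s) < pm):     # mutation step
            z = rng.uniform(0, 0.1 * (x_max - x_min), n)
            x[k] = -x[k] + z                   # jump to a different zone, Eq. (5)
        fx = np.array([f(p) for p in x])
        better = fx < fy                       # update personal and global bests
        y[better], fy[better] = x[better], fx[better]
        g = y[fy.argmin()].copy()
    return g, float(fy.min())
```

Because personal bests are only replaced by strictly better positions, the returned best fitness is monotonically non-increasing over iterations even though mutation keeps perturbing the swarm.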
4. Experimental Setting

For comparison, both the original PSO algorithm and the HPSOM algorithm were tested on four benchmark problems, all minimisation problems. The first two functions are unimodal while the last two are multimodal with many local minima. These four functions have been commonly used in other studies on particle swarm optimizers (e.g. [2,4,7]).

Spherical: the generalized Sphere function is a very simple, unimodal function with its global minimum located at x = 0, where f(x) = 0. This function has no interaction between its variables.
  f_1(x) = Σ_{i=1}^{n} x_i²
where x is an n-dimensional real-valued vector and x_i is the i-th element of that vector.

Rosenbrock: the second function is the generalized Rosenbrock function, a unimodal function with significant interaction between some of the variables.

  f_2(x) = Σ_{i=1}^{n−1} (100·(x_{i+1} − x_i²)² + (x_i − 1)²)
Griewank: a multimodal function with significant interaction between its variables, caused by the product term. The global minimizer, x = 0, yields a function value of f(x) = 0.

  f_3(x) = (1/4000)·Σ_{i=1}^{n} x_i² − Π_{i=1}^{n} cos(x_i/√i) + 1
Rastrigin: the fourth and final test function is the generalized Rastrigin function, a multimodal version of the Spherical function characterized by deep local minima arranged as sinusoidal bumps. The global minimum is f(x) = 0 at x = 0. The variables of this function are independent.

  f_4(x) = Σ_{i=1}^{n} (x_i² − 10·cos(2πx_i) + 10)
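The four test functions translate directly into NumPy. The sketch below is a straightforward rendering of the formulas above (vector input x, global minimum 0 in every case).

```python
import numpy as np

def sphere(x):                                   # f1
    return float(np.sum(x**2))

def rosenbrock(x):                               # f2
    return float(np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (x[:-1] - 1.0)**2))

def griewank(x):                                 # f3
    i = np.arange(1, len(x) + 1)                 # 1-based indices for the product term
    return float(np.sum(x**2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i))) + 1.0)

def rastrigin(x):                                # f4
    return float(np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0))
```

Each function attains its global minimum 0 at x = 0, except Rosenbrock, whose minimum lies at x = (1, …, 1).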
The search space and initialisation ranges for the experiments are listed in Table 1.

Table 1. Search space and initialisation ranges for the test functions.

  Fun.   Search space            Initialisation range
  f1     −100 ≤ x_i ≤ 100         50 ≤ x_i ≤ 100
  f2     −100 ≤ x_i ≤ 100         15 ≤ x_i ≤ 30
  f3     −600 ≤ x_i ≤ 600        300 ≤ x_i ≤ 600
  f4      −10 ≤ x_i ≤ 10         2.56 ≤ x_i ≤ 5.12
All experiments consisted of 100 runs. The PSO and HPSOM parameters were set to c_1 = c_2 = 2.0, and a linearly decreasing inertia weight starting at 0.7 and ending at 0.4 was used. The maximum velocity (v_max) of each particle was set to half the length of the search space in one dimension. The population size in the experiments was fixed at 20 particles in order to keep the computational requirements low [4]. Note that the HPSOM has one additional parameter, the mutation rate, which was set to 30%. The experiments on the four functions were carried out with different dimensions (10, 20 and 30) and different iteration counts (1000, 1500 and 2000), respectively.
5. Experimental Results

Table 2 lists the test function, the dimensionality of the function, the number of iterations the algorithm was run, and the average best fitness of the best particle found over the 100 runs for each of the four test functions; the standard error for each value is also listed. Table 2 gives the corresponding average best fitness of both the standard PSO and the HPSOM with the same settings as described in the previous section. Figures 2 to 5 present the graphs corresponding to the reported experiments. These figures show the average best fitness at each iteration for both the standard PSO model and the HPSOM model. The graphs illustrate a representative set of experiments for functions with a dimensionality of 30.

Table 2. The results of average best fitness over 100 runs (average best fitness ± standard error).
The graphs illustrate experiments with both unimodal and multimodal test functions, each of 30 dimensions. In the experiments with the unimodal functions, the Sphere function and the Rosenbrock function, the HPSOM achieved better results and much faster convergence than the standard PSO. When the dimensionality of the test functions was higher, the HPSOM still accomplished better results than the standard PSO model; this is achieved by increasing the iteration number. In the experiments with the multimodal functions, the Griewank function and the Rastrigin function, the HPSOM model also converged faster than the standard PSO, and found the minimum value (zero). The performance results listed in Table 2 show that the HPSOM model is better than the standard PSO model; this is achieved by exploring the search space more thoroughly using the numerical mutation operation.
Figure 2. PSO versus HPSOM model for Spherical function- f1
Figure 3. PSO versus HPSOM model for Rosenbrock function- f2
Figure 4. PSO versus HPSOM model for Griewank function- f3
Figure 5. PSO versus HPSOM model for Rastrigin function- f4
6. Conclusions
This paper introduced a new PSO-based model called the Hybrid Particle Swarm Optimiser with Mutation (HPSOM). The HPSOM algorithm is basically the standard PSO combined with an arithmetic mutation, a notion introduced from the genetic algorithm field. On the unimodal test functions (Sphere and Rosenbrock) and on the multimodal test functions (Griewank and Rastrigin), the HPSOM performed better than the standard PSO in a comparison of the best optima found. The optima found by the hybrid were better than those of the standard PSO model, and the convergence speed was faster. Future work will further investigate and analyse the behaviour of the HPSOM model.
References

[1] J. Kennedy and R.C. Eberhart, "Particle swarm optimization", Proceedings of the 1995 IEEE International Conference on Neural Networks, vol. 4, 1942–1948. IEEE Press.
[2] J. Kennedy, "Small Worlds and Mega-Minds: Effects of Neighborhood Topology on Particle Swarm Performance", Proceedings of the 1999 Congress on Evolutionary Computation, vol. 3, 1931–1938. IEEE Press.
[3] P.N. Suganthan, "Particle Swarm Optimizer with Neighbourhood Operator", Proceedings of the 1999 Congress on Evolutionary Computation, vol. 3, 1958–1962. IEEE Press.
[4] Y. Shi and R.C. Eberhart, "Parameter Selection in Particle Swarm Optimization", Evolutionary Programming VII (1998), Lecture Notes in Computer Science 1447, 591–600. Springer.
[5] R.C. Eberhart and Y. Shi, "Comparison between Genetic Algorithms and Particle Swarm Optimization", Evolutionary Programming VII (1998), Lecture Notes in Computer Science 1447, 611–616. Springer.
[6] P.J. Angeline, "Evolutionary Optimization Versus Particle Swarm Optimization: Philosophy and Performance Differences", Evolutionary Programming VII (1998), Lecture Notes in Computer Science 1447. Springer.
[7] F. van den Bergh and A.P. Engelbrecht, "A New Locally Convergent Particle Swarm Optimizer", Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Hammamet, Tunisia, October 2002.
[8] M. Løvbjerg, T.K. Rasmussen and T. Krink, "Hybrid Particle Swarm Optimiser with Breeding and Subpopulations", Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), San Francisco, USA, July 2001.
[9] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[10] L. Davis (Ed.), Handbook of Genetic Algorithms, Van Nostrand, New York, 1991.
[11] J.J. Grefenstette, Optimization of control parameters for genetic algorithms, IEEE Trans. Syst. Man Cybern. 16 (1) (1986), 122–128.
[12] B. Awadh, N. Sepehri and O. Hawaleshka, A computer-aided process planning model based on genetic algorithms, Comput. Oper. Res. 22 (8) (1995), 841–856.
An Improved Recursive Decomposition Ordering for Term Rewriting Systems Revisited Munehiro Iwami Faculty of Science and Engineering, Shimane University Abstract. Simplification orderings, like the recursive path ordering and the improved recursive decomposition ordering, are widely used for proving the termination property of term rewriting systems. The improved recursive decomposition ordering is known as the most powerful simplification ordering. In this paper, we investigate the improved recursive decomposition ordering for proving termination of term rewriting systems. We completely show that the improved recursive decomposition ordering is closed under substitutions. Keywords. Term rewriting system, Termination, Improved recursive decomposition ordering, Simplification ordering
1. Introduction

Term rewriting systems (TRSs, for short) are regarded as a computation model that reduces terms by applying directed equations, called rewrite rules. TRSs are widely used as a model of functional and logic programming languages and as a basis of automated theorem proving, symbolic computation, algebraic specification and verification [1,15,23]. The termination property is a fundamental notion of TRSs as computation models [4]. Since the termination property of TRSs is undecidable in general [5], several sufficient conditions for proving this property have been successfully developed for particular cases. These techniques can be classified into two approaches: semantic methods and syntactic methods. Simplification orderings are representatives of the syntactic methods [18,21]. Many simplification orderings have been defined on TRSs, for instance the recursive path ordering (with status) (RPO(S), for short) [2,10], the recursive decomposition ordering (with status) (RDO(S), for short) [8,12,13] and the improved recursive decomposition ordering (with status) (IRD(S), for short) [17,19]. IRDS is among the most powerful simplification orderings [19,20]. First, Jouannaud, Lescanne and Reinig defined the recursive decomposition ordering with multiset status [8]. They stated that its closure under substitutions is straightforward using the definition of decomposition; however, they did not give a formal proof of it.

Munehiro Iwami, 1060 Nishikawatsu, Matsue, Shimane, 690-8504, Japan, E-mail: [email protected].
M. Iwami / An Improved Recursive Decomposition Ordering for Term Rewriting Systems Revisited
The recursive decomposition ordering with arbitrary status (RDOS) was first described by Lescanne [12]. Complete proofs concerning the lexicographic status are given by Lescanne [13]. An implementation of the recursive decomposition ordering with multiset status was made in REVE, the first rewriting environment with tools for proving termination, as it was a convenient tool for proposing extensions of the precedence [11]. Rusinowitch [17] gave the definition of the improved recursive decomposition ordering (IRD) and investigated the relationship between several simplification orderings: the path of subterms ordering (PSO) [16], the recursive path ordering (RPO) and the recursive decomposition ordering (RDO). But he did not discuss whether IRD is closed under substitutions. Steinbach [19] gave the definition of the improved recursive decomposition ordering with status (IRDS), based on the IRD defined by Rusinowitch [17], and compared the power as well as the time behaviour of all suggested orderings [18,20,22]. He showed that IRDS is a simplification ordering and that IRDS is closed under substitutions [18,19]; however, the proof was not complete: it used as a key idea, without proof, the proposition that the ordering is preserved by applying any substitution, and this proposition is not trivial. So we give a formal proof of it, by induction on the size of terms, in this paper.

We previously proposed IRDS for higher-order rewrite systems, called the higher-order improved recursive decomposition ordering (HIRDS, for short) [6,7]. Our method was inspired by Jouannaud and Rubio's idea for RPOS [9] and particular properties of IRDS. We showed that our ordering is more powerful than their ordering, and furthermore that HIRDS is closed under substitutions. However, that proof was very complicated and general, so in this paper we show directly that IRDS is closed under substitutions. Furthermore, we review that IRDS is a simplification ordering.

In Section 2 we give the basic notations. Section 3 presents the definition of the improved recursive decomposition ordering with status (IRDS); we show in full that IRDS is closed under substitutions, and we review that IRDS is a simplification ordering.
2. Preliminaries

We mainly follow the basic notations of [14,19]. An abstract reduction system (ARS for short) is a pair (A, →) consisting of a set A and a binary relation → on A. We say that an ARS (A, →) is terminating if there is no infinite sequence a₁ → a₂ → a₃ → ⋯ of elements in A. A binary relation > on a set A is called a (strict) partial ordering over A if it is irreflexive and transitive on A. A partial ordering > on a set A is well-founded if > has no infinite descending sequences, i.e., there is no sequence of the form a₁ > a₂ > a₃ > ⋯ of elements in A.

A signature Σ is a set of function symbols. Associated with every f ∈ Σ is a natural number denoting its arity. Function symbols of arity 0 are called constants. Let T(Σ, V) be the set of all terms built from Σ and a countably infinite set V of variables, disjoint from Σ. The set of variables occurring in a term t is denoted by Var(t). The root symbol of a term t is defined as follows: root(t) = t if t is a variable, and root(t) = f if t = f(t₁, …, tₙ).
A substitution σ is a map from V to T(Σ, V) with the property that the set {x ∈ V | σ(x) ≠ x} is finite. If σ is a substitution and t a term, then tσ denotes the result of applying σ to t; we call tσ an instance of t. We introduce a fresh constant symbol □, named hole. A context C[ ] is a term in T(Σ ∪ {□}, V) containing precisely one hole. If C[ ] is a context and t a term, then C[t] denotes the result of replacing the hole in C[ ] by t. A binary relation > on terms is closed under substitutions if s > t implies sσ > tσ for any substitution σ, and a binary relation > on terms is closed under contexts if s > t implies C[s] > C[t] for any context C[ ]. |t| denotes the size of the term t, i.e., the total number of function symbols and variables occurring in t.

Terms are identified with finite labeled trees. A position in a term can be viewed as a finite sequence of natural numbers, pointing out a path from the root of this tree. Pos(t) denotes the set of all positions of the term t, and TPos(t) denotes the set of all terminal positions (positions of all leaves) of t. The letter ε denotes the root position. We write p ≤ q if p is a prefix of q. The subterm of t at position p is denoted by t|_p. If t|_p = s and p ≠ ε then s is called a proper subterm of t, denoted by t ▷ s.

A rewrite rule on T(Σ, V) is a pair of terms, written l → r, such that l ∉ V and Var(r) ⊆ Var(l). A term rewriting system (TRS, for short) is a pair (Σ, R) where Σ is a set of function symbols and R is a set of rewrite rules on T(Σ, V). (Σ, R) is often abbreviated as R, and in that case Σ is defined to be the set of function symbols that appear in R. We often present a TRS as a set of rewrite rules, without making its signature explicit, assuming that the signature consists of the function symbols occurring in the rewrite rules. The smallest rewrite relation on T(Σ, V) that contains R is denoted by →_R. So s →_R t if there exist a rewrite rule l → r in R, a substitution σ, and a context C[ ] such that s = C[lσ] and t = C[rσ]. The subterm lσ of s is called a redex, and we say that s rewrites to t by contracting the redex lσ. We call s →_R t a rewrite or reduction step.

Given a binary relation >, the multiset extension >_mul is defined as the transitive closure of the relation that replaces one element a of a multiset by any finite multiset N of elements with a > b for every b ∈ N. If > is a well-founded ordering on a set A, then >_mul is a well-founded ordering on the finite multisets of elements in A [3]. We say that a binary relation > on terms has the subterm property if C[t] > t for any non-empty context C[ ] and term t.
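Since the multiset extension recurs throughout the next section, a small executable check may help fix intuition. The sketch below is our own helper, not from the paper; it uses the equivalent Dershowitz-Manna characterisation [3]: after cancelling the common part of the two multisets, every remaining element of N must be dominated by some remaining element of M.

```python
from collections import Counter

def multiset_greater(M, N, gt):
    """True iff multiset M is greater than multiset N in the
    extension of the strict order gt (Dershowitz-Manna criterion)."""
    M, N = Counter(M), Counter(N)
    common = M & N                      # multiset intersection
    M, N = M - common, N - common       # cancel shared elements
    if not M:
        return False                    # nothing left to dominate with
    return all(any(gt(m, n) for m in M) for n in N)
```

For example, {5, 3, 1} dominates {4, 3, 3} because after cancelling the shared 3, the element 5 dominates both remaining elements 4 and 3.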
3. Improved Recursive Decomposition Ordering Revisited

Throughout this section we are dealing with finite signatures only.

Definition 3.1 ([2,4,14]) A simplification ordering on T(Σ, V) is a partial ordering that is closed under substitutions and contexts and has the subterm property.

Since we are dealing with finite signatures only, we obtain the following result.

Theorem 3.2 ([2,4,14]) Simplification orderings are well-founded.

We obtain the following theorem from the result of Dershowitz [2].

Theorem 3.3 Let (Σ, R) be a TRS and let ≻ be a simplification ordering on T(Σ, V). If l ≻ r for every rule l → r in R, then →_R is terminating.
Proof. Assume that s →_R t, where s and t are terms. There exist a rewrite rule l → r in R, a substitution σ and a context C[ ] such that s = C[lσ] and t = C[rσ]. By the assumption and Definition 3.1, s ≻ t holds. Since ≻ is well-founded on T(Σ, V) by Theorem 3.2, →_R is terminating.

The improved recursive decomposition ordering depends on a partial ordering > on the signature Σ, the so-called precedence. A status function τ is assumed, mapping every f ∈ Σ to either mult or lex_π for some permutation π on n elements, where n is the arity of f. For a partial ordering > the partial ordering >^τ(f) is defined on sequences of length n: >^mult describes comparison according to the multiset extension, and >^lex_π describes lexicographic comparison according to the permutation π. The result of applying the function τ to a term t = f(t₁, …, tₙ) depends on the status of f: if τ(f) = mult, then τ(t) is the multiset {t₁, …, tₙ}, and otherwise τ(t) is the tuple (t_{π(1)}, …, t_{π(n)}).

For a term t and a terminal position p of t, a path-decomposition is the set of subterms of t along the path from the root to p. A decomposition of t is a multiset of path-decompositions, one for each terminal position of t. We give the improved recursive decomposition ordering with status (IRDS) defined by Steinbach [19,20] as follows.
and a status the is defined as
is the multiset extension of
. is defined by the following
,
and
.
, or ,
and either , or and , or
,
,
and
. Next, we give the example of comparison using IRDS. Example 3.5 We consider the term , and . as follows: We have and Then and Then and
and for any
where . We give the precedence
. See figure 1. where . where , .
,
By the following cases , and holds by definition of IRDS.
,
holds. Then
holds.
and and
since
. since
holds. . holds.
and and
since .
Figure 1. Path-decompositions for Example 3.5 (graphic not reproduced).
We now review that IRDS is a simplification ordering, i.e., IRDS is a partial ordering on T(Σ, V) that is closed under substitutions, is closed under contexts and has the subterm property. These properties are essential for applying IRDS to termination proofs of TRSs.

Lemma 3.6 The IRDS is a partial ordering on T(Σ, V).

Proof. Let s, t and u be terms. We can show that s ≻ t and t ≻ u imply s ≻ u by induction on the sizes of the terms. For any term t in T(Σ, V), we can prove irreflexivity by induction on |t|.
Lemma 3.7 The IRDS on T(Σ, V) has the subterm property.

Proof. Let s and t be terms such that s ▷ t. It is shown by induction on |s| that s ≻ t.

The following lemma is the key to proving the main result of this paper, that IRDS is closed under substitutions.

Lemma 3.8 Let s and t be terms. Then for any substitution σ, the following two claims hold. If
then s
and
, for any t .
and such that
If
then
, for any
such that
. Proof. See appendix A. Lemma 3.9 Let .
and be terms. Then
implies
The following lemma is the main result of this paper: we show in full that IRDS is closed under substitutions.

Lemma 3.10 The IRDS is closed under substitutions, i.e., s ≻ t implies sσ ≻ tσ for any substitution σ.
Proof. Assume that , i.e., where and are terms. We show that , i.e., , holds for any subsuch that stitution . Strictly speaking, we must prove: implies such that . Let , then such that and . Since and by lemma 3.9, there exists such that . To prove that such that , we have to distinguish two cases: , i.e. (Otherwise Since for some , and lemma 3.8, , with . , i.e. . 2.
). ,
1.
. Since . Hence, . Since
. Hence,
and lemma 3.8, . . Hence,
and lemma 3.8, with
, .
Lemma 3.11 The IRDS is closed under contexts.

Proof. Let s and t be terms. We have to show that s ≻ t implies C[s] ≻ C[t] for any context C[ ]. It can be proved by induction on the context C[ ].

Lemma 3.12 The IRDS is a simplification ordering on T(Σ, V).

Proof. By Lemmas 3.6, 3.7, 3.10 and 3.11, the IRDS is a partial ordering on T(Σ, V) that is closed under substitutions and contexts and has the subterm property.

Example 3.13 ([18]) Given the signature Σ and TRS R (the rules are not reproduced here), we give a precedence and status under which l ≻ r for every rule l → r in R. By Example 3.5, Theorem 3.3 and Lemma 3.12, R is terminating.
4. Conclusion

We have investigated the improved recursive decomposition ordering for proving termination of term rewriting systems. As the main result of this paper, we have given a complete proof that the improved recursive decomposition ordering is closed under substitutions. We have also reviewed that the improved recursive decomposition ordering is a simplification ordering.
References

[1] F. Baader and T. Nipkow, Term Rewriting and All That, Cambridge University Press, 1998.
[2] N. Dershowitz, Orderings for term-rewriting systems, Theoretical Computer Science 17 (3) (1982), 279–301.
[3] N. Dershowitz and Z. Manna, Proving termination with multiset orderings, Communications of the ACM 22 (8) (1979), 465–476.
[4] N. Dershowitz, Termination of rewriting, J. Symbolic Computation 3 (1987), 69–116.
[5] G. Huet and D. Lankford, On the uniform halting problem for term rewriting systems, Report 283, INRIA, 1978.
[6] M. Iwami, M. Sakai and Y. Toyama, An improved recursive decomposition ordering for higher-order rewrite systems, IEICE Transactions on Information and Systems E81-D (9) (1998), 988–996.
[7] M. Iwami, Termination of higher-order rewrite systems, Ph.D. Thesis, JAIST, 1999.
[8] J.-P. Jouannaud, P. Lescanne and F. Reinig, Recursive decomposition ordering, in: Proc. of Working Conf. on Formal Description of Programming Concepts vol. II (IFIP) (Garmisch-Partenkirchen, Germany, 1982), North-Holland, 1983, 331–348.
[9] J.-P. Jouannaud and A. Rubio, A recursive path ordering for higher-order terms in η-long β-normal form, in: Proc. 7th International Conf. on Rewriting Techniques and Applications, Lecture Notes in Computer Science, vol. 1103, Springer-Verlag, 1996, 108–122.
[10] S. Kamin and J.-J. Lévy, Attempts for generalizing the recursive path orderings, unpublished manuscript, University of Illinois, 1980.
[11] P. Lescanne, Computer experiments with the REVE term rewriting system generator, in: Proc. of ACM Principles of Programming Languages, ACM Press, 1983, 99–108.
[12] P. Lescanne, Uniform termination of term rewriting systems: recursive decomposition ordering with status, in: Proc. 9th International Colloquium on Trees in Algebra and Programming, Cambridge University Press, 1984, 181–194.
[13] P. Lescanne, On the recursive decomposition ordering with lexicographical status and other related orderings, J. Automated Reasoning 6 (1) (1990), 39–49.
[14] A. Middeldorp and H. Zantema, Simple termination of rewrite systems, Theoretical Computer Science 175 (1) (1997), 127–158.
[15] E. Ohlebusch, Advanced Topics in Term Rewriting, Springer-Verlag, 2002.
[16] D. Plaisted, A recursively defined ordering for proving termination of term rewriting systems, Report UIUCDCS-R-78-943, University of Illinois, 1978.
[17] M. Rusinowitch, Path of subterms ordering and recursive decomposition ordering revisited, J. Symbolic Computation 3 (1987), 117–131.
[18] J. Steinbach, Termination of rewriting: extension, comparison and automatic generation of simplification orderings, Ph.D. Thesis, University of Kaiserslautern, 1994.
[19] J. Steinbach, Term orderings with status, SEKI Report SR-88-12, University of Kaiserslautern, 1988.
M. Iwami / An Improved Recursive Decomposition Ordering for Term Rewriting Systems Revisited
A. Proof of Lemma 3.8

[Definition A.1, the statement of Lemma A.2 (Lemma 3.8), and its proof by induction (two cases over the definition of the multiset extension, case (2) shown in a manner similar to case (1)) could not be recovered: the mathematical symbols were lost in extraction.]
Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macao ♣
Department of Electromechanical Engineering, Faculty of Science and Technology, University of Macau, Macao
Abstract: Data transformation is a kind of data preprocessing [1, 3, 5] and an important procedure in mathematical modelling. A mathematical model estimated from a training data set gives better results if the data set has been properly preprocessed before being passed to the modelling procedure. In this paper, different preprocessing methods for automotive engine data are examined. The data sets produced by the different preprocessing methods are passed to neural networks for model estimation. The generalization of each estimated model is then verified on a test set, which reveals the effect of the corresponding preprocessing method. The results of the preprocessing methods on automotive engine data are reported in the paper. Key words: Automotive engine setup, PCA, CCA, Kernel PCA, Kernel CCA
Introduction Mathematical modelling [1, 2, 3] is very common in many applications because of its capability of estimating an unknown and complex mathematical model covering the application data. However, there is a natural law – GIGO (Garbage In, Garbage Out): no matter how good the modelling tool is, if garbage data is passed in, then garbage results are returned. Hence data preprocessing is a must for high accuracy of modelling results. Traditional statistical methods concentrate on data redistribution and data sampling in order to provide consistency within the data. However, most statistical methods are not capable of handling high data dimensionality. To overcome this problem, dimensionality reduction is usually applied. However, removing some input features may cause information loss
1 Corresponding author
C.-m. Vong et al. / Data Transformation in Modern Petrol Engine Tune-up
because the input features themselves are highly (and perhaps nonlinearly) correlated. Several preprocessing methods from machine learning, support vector machines (SVM) and statistics are compared to verify their ability to handle the issues of high dimensionality and nonlinear correlation. For the comparison, the application of petrol engine tune-up is selected as a test case, since it involves a moderate number of dimensions (≥ 70) and the engine features are nonlinearly correlated.
1. Data Preprocessing Formally, data preprocessing is a procedure to clean and transform the data before it is passed to the modelling procedure. Data cleaning involves removing the noise and outliers in the data set, while data transformation tries to reduce the number of irrelevant inputs, i.e., to reduce the dimensionality of the input space. As data cleaning is a straightforward application of the standard process of “zero mean and unit variance”, the concentration here is put on data transformation. The following subsections introduce the common data transformation methods [5, 8].
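The “zero mean and unit variance” cleaning step can be sketched as follows (a minimal NumPy illustration; the function name and the simulated data are ours, not from the paper):

```python
import numpy as np

def standardize(X):
    """Rescale each column (feature) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard against constant features
    return (X - mu) / sigma

# usage on simulated data
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
Xs = standardize(X)
```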
1.1 Principal Component Analysis A well-known and frequently used technique for dimensionality reduction of the input space is linear Principal Component Analysis (PCA). Consider an engine setup dataset X = (x1, x2, …, xN), whose vectors xk ∈ R^n, for k = 1 to N, are mapped into lower dimensional vectors zk ∈ R^m with m < n. We proceed by estimating the covariance matrix:

Σ̂xx = (1/(N − 1)) Σ_{k=1}^{N} (xk − x̄)(xk − x̄)^T    (1)

where x̄ = (1/N) Σ_{k=1}^{N} xk is the mean vector of all training data points (engine setups) and xk is the vector of adjustable engine parameters in the kth sample data point (i.e. the kth engine setup). Each xk contains n adjustable engine parameters (ECU parameters + camshaft setup parameters), such as ignition spark advance, fuel injection time, valve overlap angle, etc. After that, the eigenvalue decomposition is computed:

Σ̂xx ui = λi ui    (2)
where ui is the ith eigenvector of Σ̂xx and λi is the ith eigenvalue of Σ̂xx. By selecting the m large nonzero eigenvalues λ1, λ2, …, λm and the corresponding eigenvectors u1, u2, …, um, m transformed variables (or score variables zk1, zk2, …, zkm) are obtained to produce the reduced m-vector zk corresponding to xk:

zki = (xk − x̄)^T ui    (3)

Hence, the dimension reduction is done through the following computation, where zk = (zk1, zk2, …, zkm) is the reduced m-vector, with m < n:

zk = (xk − x̄)^T [u1 u2 … um]
   = [x′k1 x′k2 … x′kn] [ u11 u21 … um1 ; u12 u22 … um2 ; … ; u1n u2n … umn ]
   = [zk1 zk2 … zkm]
The remaining (n – m) eigenvalues, whose values are zero, are neglected, because they are no longer important. In this case, the transformed variables zki are no longer real physical variables. The same procedure is applied to every xk ∈ X ⊂ Rn so that its corresponding zk can be obtained to construct the dataset Z = (z1, z2, …, zN) ⊂ Rm, m < n. This reduced training dataset is used for data modelling instead of the original training dataset X = (x1, x2, …, xN).
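Equations (1)–(3) can be sketched in a few lines of NumPy (a minimal illustration under our own variable names, on simulated correlated data):

```python
import numpy as np

def pca_reduce(X, m):
    """Project rows of X (N x n) onto the m leading eigenvectors
    of the sample covariance matrix, following Eqs. (1)-(3)."""
    xbar = X.mean(axis=0)
    Xc = X - xbar
    cov = Xc.T @ Xc / (len(X) - 1)        # Eq. (1): sample covariance
    lam, U = np.linalg.eigh(cov)          # Eq. (2): eigendecomposition
    order = np.argsort(lam)[::-1]         # sort eigenvalues descending
    U = U[:, order[:m]]
    return Xc @ U                         # Eq. (3): z_k = (x_k - xbar)^T U

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # correlated features
Z = pca_reduce(X, m=4)                    # 200 x 4 reduced dataset
```

The score variables end up mutually uncorrelated, which is the defining property of the principal components.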
1.2 Kernel Principal Component Analysis Linear PCA performs well in dimensionality reduction when the input variables are linearly correlated. In the nonlinear case, however, PCA cannot give good performance. Hence PCA is extended to a nonlinear version under the support vector machines (SVM) formulation [6]. This nonlinear version is called Kernel PCA (KPCA). The basic idea of KPCA remains the same as PCA, except that the transformation to the reduced variables zi is done in the kernel space. KPCA involves solving the following eigenvalue problem in α:

Ω α = λ α    (4)

where Ωkl = K(xk, xl) for k, l = 1, …, N.
The kernel function K is chosen as the RBF (Radial Basis Function) kernel, i.e., K(x, y) = exp(−||x − y||²/2σ²), with a user-predefined standard deviation σ. The vector of variables α = [α1; …; αN] is an eigenvector of Ω and λ ∈ R is the corresponding eigenvalue. In order to obtain the maximal variance, the user selects the eigenvector corresponding to the largest eigenvalue. The ith transformed variable (score variable) for a vector x becomes

zi(x) = Σ_{l=1}^{N} αi,l K(xl, x)    (5)

where αi = [αi1; …; αiN] is the eigenvector corresponding to the ith largest eigenvalue, i = 1, 2, …, p, and p is the largest number such that the eigenvalue λp of the eigenvector αp is nonzero. One more point to note is that the eigenvectors αi should satisfy the normalization condition of unit length:

αi^T αi = 1/λi , i = 1, 2, …, p    (6)
where λ1 ≥ λ2 ≥ … ≥λp > 0, i.e., λi are nonzero. After obtaining all corresponding reduced vectors zk, the reduced dataset Z is constructed for modelling.
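A sketch of Eqs. (4)–(6) in NumPy (our own minimal illustration; note that, following the formulation above, no kernel centering is applied):

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all row pairs."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def kpca_scores(X, p, sigma=1.0):
    """Solve Omega a = lambda a (Eq. 4), keep the p leading eigenvectors,
    rescale them so that a^T a = 1/lambda (Eq. 6), and return the score
    variables z_i(x_k) (Eq. 5) for the training points."""
    K = rbf_kernel(X, X, sigma)              # Omega_kl = K(x_k, x_l)
    lam, A = np.linalg.eigh(K)
    order = np.argsort(lam)[::-1][:p]
    lam, A = lam[order], A[:, order]
    A = A / np.sqrt(lam)                     # eigh gives a^T a = 1; rescale
    return K @ A                             # z_i(x_k) = sum_l a_il K(x_l, x_k)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
Z = kpca_scores(X, p=3, sigma=2.0)           # 100 x 3 reduced dataset
```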
1.3 Canonical Correlation Analysis In canonical correlation analysis (CCA) [4, 8], one is interested in finding the maximal correlation between the projected variables zx = w^T x and zy = v^T y, where x ∈ R^n, y ∈ R^m denote given random vectors with zero mean. CCA also involves an eigenvalue problem, for which the eigenvectors w, v are solved:

C_xx^{−1} C_xy C_yy^{−1} C_yx w = ρ² w
C_yy^{−1} C_yx C_xx^{−1} C_xy v = ρ² v    (7)

where C_xx = E[xx^T], C_yy = E[yy^T], C_xy = E[xy^T] and the eigenvalues ρ² are the squared canonical correlations. Only one of the eigenvalue equations needs to be solved, since the solutions are related by

C_xy v = ρ λx C_xx w , C_yx w = ρ λy C_yy v , where λx = λy^{−1} = ( (v^T C_yy v) / (w^T C_xx w) )^{1/2}    (8)
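Eq. (7) can be sketched directly with NumPy (our own minimal illustration on simulated zero-mean data; a numerically robust implementation would use an SVD-based formulation instead of explicit inverses):

```python
import numpy as np

def cca_first_pair(X, Y):
    """Solve Cxx^-1 Cxy Cyy^-1 Cyx w = rho^2 w (Eq. 7) for the leading
    canonical direction; X and Y hold zero-mean samples in their rows."""
    n = len(X)
    Cxx = X.T @ X / n
    Cyy = Y.T @ Y / n
    Cxy = X.T @ Y / n
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    rho2, W = np.linalg.eig(M)
    i = np.argmax(rho2.real)
    return W[:, i].real, np.sqrt(rho2[i].real)

rng = np.random.default_rng(3)
s = rng.normal(size=(500, 1))                    # shared latent signal
X = np.hstack([s + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y = np.hstack([s + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 1))])
X -= X.mean(0)
Y -= Y.mean(0)
w, rho = cca_first_pair(X, Y)                    # rho close to 1
```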
1.4 Kernel Canonical Correlation Analysis In kernel canonical correlation analysis (KCCA) [5, 6], the formulation is similar to CCA except that the kernel trick is applied. The kernel chosen is again the RBF. A generalized eigenvalue system is solved in α, β, which act as the projection vectors; in it, ν1, ν2 are Lagrange multipliers, I is an identity matrix, and 1v is a vector of ones in R^N. [The system itself, Eqs. (9)–(10), could not be recovered.]
2. Modern Petrol Engine Tune-up Modern automotive petrol engines are controlled by the electronic control unit (ECU). The engine performance, such as power output, torque, brake specific fuel consumption and emission level, is significantly affected by the setup of control parameters in the ECU. Many parameters are stored in the ECU in a look-up table format. Normally, the car engine performance is obtained through dynamometer tests. Traditionally, the setup of the ECU is done by the vehicle manufacturer. In recent years, however, programmable ECUs and ECU read-only memory (ROM) editors have been widely adopted for many passenger cars. These devices allow non-OEM engineers to tune up their engines according to different add-on components and drivers’ requirements. Current practice of engine tune-up relies on the experience of the automotive engineer [7]. The engineer must handle a huge number of combinations of engine control parameters. The relationship between the input and output parameters of a modern car engine is a complex multi-variable nonlinear function, which is very difficult to find, because a modern petrol engine is an integration of thermo-fluid, electromechanical and computer control systems. Consequently, engine tune-up is usually done by a trial-and-error method. Vehicle manufacturers normally spend many months tuning an ECU optimally for a new car model. Moreover, the performance function is engine dependent as well. Knowing the performance function/model lets the automotive engineer predict whether a new car engine setup is a gain or a loss, and the function can also help the engineer to set up the ECU optimally.
In order to acquire the performance model for an engine, modelling techniques such as neural networks or support vector machines could be employed. No matter which method is used for modelling, the data must be preprocessed. In this paper, neural networks are used for model testing because they are easy to use and are implemented in many commercial packages, such as the MATLAB Neural Networks Toolbox.
3. Experiment Setup In order to compare the previous methods, a set of 200 different sample data points is acquired through the dynamometer. In practice, there are many input control parameters, which are also ECU and engine dependent. Moreover, the engine horsepower and torque curves are normally obtained at the full-load condition. The following common adjustable engine parameters and environmental parameters are selected as the input (i.e., the engine setup) at engine full-load condition.
x = ⟨Ir, O, tr, f, Jr, d, a, p⟩ and y = ⟨Tr⟩, where
− r: Engine speed (RPM), r ∈ Γ = {1000, 1500, 2000, 2500, …, 8000}
− Ir: Ignition spark advance at the corresponding engine speed r (degrees before top dead centre)
− O: Overall ignition trim (± degrees before top dead centre)
− tr: Fuel injection time at the corresponding engine speed r (millisecond)
− f: Overall fuel trim (± %)
− Jr: Timing for stopping the fuel injection at the corresponding engine speed r (degrees before top dead centre)
− d: Ignition dwell time at 15 V (millisecond)
− a: Air temperature (°C)
− p: Fuel pressure (bar)
− Tr: Engine torque at the corresponding engine speed r (Nm)
After acquiring the sample data, it is ready to be passed to each of the mentioned preprocessing methods, to verify which method is best for automotive engine data. These methods are implemented in a commercial computing package, MATLAB, running under Windows XP.
4. Results Results are separated into two parts: dimensionality reduction and retained accuracy. Table 1 shows the effects of dimensionality reduction for the different methods, with 5% information loss, i.e., all the dimensions together contributing only 5% of the information in the training data set are discarded. The number of original dimensions is calculated as (size of Γ) × (number of attributes with subscript r) +
(number of attributes without subscript r) = 16 × 4 + 6 = 70. Hence, before any data preprocessing, the number of dimensions is 70. After applying the different preprocessing methods, the reduced numbers of dimensions shown in Table 1 are obtained. The other result concerns the accuracy retained, i.e., the generalization on unseen inputs of the models built using the reduced-dimensional data sets. To compare the retained accuracy, a mathematical model is built on the original data set and four additional models on the reduced data sets; in total, five mathematical models are built. In our case, neural networks [1, 3, 8] are used as the modelling tool because they are very mature and available in the MATLAB Neural Networks Toolbox. The setting of the neural networks is as follows:
− Input neurons: as indicated in Table 1 for the corresponding preprocessing method
− Hidden neurons: 50; this is just a guess, and it is usually already sufficient to train the networks
− Output neurons: |Γ| = 16, i.e., the torques at the 16 different rpm
− Activation function for hidden neurons: tan-sigmoid transfer function
− Activation function for output neurons: pure linear transfer function
A sample network architecture for KPCA is shown in Figure 1. After building the five models upon the different numbers of engine features, the generalizations of the five models are tested on a common test set of 20 cases, acquired from the dynamometer as well. Table 2 shows the resulting average accuracy, where the MSE (Mean Squared Error) function is employed. From the results, it is shown that KPCA performs best among all the preprocessing (or no preprocessing) methods, because the engine features are nonlinearly correlated.
Conclusions Data transformation is a useful preprocessing procedure for data modelling when the dimensionality of the training data set is high. With lower dimensions, the computational issues are relaxed, and the models estimated on the reduced training set may even perform better, in both training accuracy and generalization. In this paper, different preprocessing methods were tested and their results compared. In the application of petrol engine tune-up, it is verified that KPCA is the best among the methods we tested. The reason is that the engine features are nonlinearly correlated.
Reference
[1] A. Smola, C. Burges, H. Drucker, S. Golowich, L. Van Hemmen, K. Muller, B. Scholkopf, V. Vapnik. Regression Estimation with Support Vector Learning Machines, 1996. Available at http://www.first.gmd.de/~smola
[2] C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[3] D. Pyle. Data Preparation for Data Mining. Morgan Kaufmann, 1999.
[4] D. Borowiak. Model Discrimination for Nonlinear Regression Models. Marcel Dekker, 1989.
[5] J. Suykens, T. Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle. Least Squares Support Vector Machines. World Scientific, 2002.
[6] M. Seeger. Gaussian processes for machine learning. International Journal of Neural Systems, 2004, 14(2), pp. 1-38.
[7] M. Traver, R. Atkinson and C. Atkinson. Neural Network-based Diesel Engine Emissions Prediction Using In-Cylinder Combustion Pressure. SAE Paper 1999-01-1532, 1999.
[8] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall, 1999.

Table 1. Comparison of dimensionality reduction for different methods

Method   Original Dimension   Reduced Dimension
PCA      70                   65
KPCA     70                   61
CCA      70                   66
KCCA     70                   60

Table 2. Comparison of model generalizations built upon the reduced training sets

Method             Accuracy on test set
No preprocessing   92.2%
PCA                86.3%
KPCA               93.1%
CCA                90.1%
KCCA               91.2%
Testing Significance in Bayesian Classifiers Marcelo de S. Lauretto, Julio M. Stern BIOINFO and Computer Science Dept., São Paulo University Abstract. The Fully Bayesian Significance Test (FBST) is a coherent Bayesian significance test for sharp hypotheses. This paper explores the FBST as a model selection tool for general mixture models, and gives some computational experiments for Multinomial-Dirichlet-Normal-Wishart models. Keywords. Mixture models, classification, significance tests
1. FBST and Model Selection The Fully Bayesian Significance Test (FBST) is presented by Pereira and Stern, [1], as a coherent Bayesian significance test. The FBST is intuitive and has a geometric characterization. In this article the parameter space, Θ, is a subset of R^n, and the hypothesis is defined as a further restricted subset defined by vector valued inequality and equality constraints: H : θ ∈ ΘH, where ΘH = {θ ∈ Θ | g(θ) ≤ 0 ∧ h(θ) = 0}. For simplicity, we often use H for ΘH. We are interested in precise hypotheses, with dim(ΘH) < dim(Θ). f(θ) is the posterior probability density function. The computation of the evidence measure used in the FBST is performed in two steps: The optimization step consists of finding f*, the maximum (supremum) of the posterior under the null hypothesis. The integration step consists of integrating the posterior density over the Tangential Set, T, where the posterior is higher than anywhere in the hypothesis, i.e.,

Ev(H) = Pr(θ ∈ T | x) = ∫_T f(θ) dθ , where
T = {θ ∈ Θ : f(θ) > f*} and f* = sup_H f(θ)

Ev(H) is the evidence against H, and 1 − Ev(H) is the evidence supporting (or in favour of) H. Intuitively, if Ev(H) is “large”, T is “heavy”, and the hypothesis set is in a region of “low” posterior density, meaning “strong” evidence against H. Several FBST applications and examples, efficient computational implementations, interpretations, and comparisons with other techniques for testing sharp hypotheses can be found in the authors’ papers in the reference list.
2. Dirichlet-Normal-Wishart Mixtures In a d-dimensional multivariate finite mixture model with m components (or classes), and sample size n, any given sample x^j is of class k with probability wk; the weights,
M. de S. Lauretto and J.M. Stern / Testing Significance in Bayesian Classifiers
wk, give the probability that a new observation is of class k. A sample j of class k = c(j) is distributed with density f(x^j | ψk).
This paragraph defines some general matrix notation. Let r:s:t indicate either the vector [r, r + s, r + 2s, …, t] or the corresponding index range from r to t with step s; r:t is a shorthand for r:1:t. A matrix array has a superscript index, like S^1 … S^m. So S^k_{h,i} is the h-row, i-column element of matrix S^k. We may write a rectangular matrix, X, with the row (or shorter range) index subscript, and the column (or longer range) index superscript. So x_i, x^j, and x^j_i are row i, column j, and element (i, j) of matrix X. 0 and 1 are matrices of zeros and ones whose dimensions are given by the context. V > 0 is a positive definite matrix. In this paper, let h, i be indices in the range 1:d, k in 1:m, and j in 1:n.
The classifications z^j_k are boolean variables indicating whether or not x^j is of class k, i.e. z^j_k = 1 iff c(j) = k. Z is not observed, being therefore named a latent variable or missing data. Conditioning on the missing data, we get:

f(x^j | θ) = Σ_{k=1}^{m} f(x^j | θ, z^j_k) f(z^j_k | θ) = Σ_{k=1}^{m} wk f(x^j | ψk)
f(X | θ) = Π_{j=1}^{n} f(x^j | θ) = Π_{j=1}^{n} Σ_{k=1}^{m} wk f(x^j | ψk)

Given the mixture parameters, θ, and the observed data, X, the conditional classification probabilities, P = f(Z | X, θ), are:

p^j_k = f(z^j_k | x^j, θ) = f(z^j_k, x^j | θ) / f(x^j | θ) = wk f(x^j | ψk) / Σ_{k=1}^{m} wk f(x^j | ψk)

We use yk for the number of samples of class k, i.e. yk = Σ_j z^j_k, or y = Z1. The likelihood for the “completed” data, X, Z, is:

f(X, Z | θ) = Π_{j=1}^{n} f(x^j | ψ_{c(j)}) f(z^j | θ) = Π_{k=1}^{m} wk^{yk} Π_{j | c(j)=k} f(x^j | ψk)

We will see in the following sections that considering the missing data Z, and the conditional classification probabilities P, is the key for successfully solving the numerical integration and optimization steps of the FBST. In this article we will focus on Gaussian finite mixture models, where f(x^j | ψk) = N(x^j | bk, Rk), a normal density with mean bk and variance matrix V^k, or precision Rk = (V^k)^{−1}. Next we specialize the theory of general mixture models to the Dirichlet-Normal-Wishart case.
Consider the random matrix X^j_i, i in 1:d, j in 1:n, n > d, where each column contains a sample element from a d-multivariate normal distribution with parameters b (mean) and V (covariance), or R = V^{−1} (precision). Let u and S denote the statistics:

u = (1/n) Σ_{j=1}^{n} x^j = (1/n) X1
S = Σ_{j=1}^{n} (x^j − b) ⊗ (x^j − b) = (X − b)(X − b)′

The random vector u has normal distribution with mean b and precision nR. The random matrix S has Wishart distribution with n degrees of freedom and precision matrix R. The Normal, Wishart and Normal-Wishart pdfs have the expressions:
N(u | n, b, R) = (n/2π)^{d/2} |R|^{1/2} exp( −(n/2)(u − b)′ R (u − b) )
W(S | e, R) = c^{−1} |S|^{(e−d−1)/2} exp( −(1/2) tr(S R) )

with normalization constant c = |R|^{−e/2} 2^{ed/2} π^{d(d−1)/4} Π_{i=1}^{d} Γ((e − i + 1)/2). Now consider the matrix X as above, with unknown mean b and unknown precision matrix R, and the statistic

S = Σ_{j=1}^{n} (x^j − u) ⊗ (x^j − u) = (X − u)(X − u)′
The conjugate family of priors for multivariate normal distributions is the Normal-Wishart, see [2]. For the precision matrix R, take as prior the Wishart distribution with ė > d − 1 degrees of freedom and precision matrix Ṡ and, given R, take as prior for b a multivariate normal with mean u̇ and precision ṅR, i.e. let us take the Normal-Wishart prior NW(b, R | ṅ, ė, u̇, Ṡ). Then, the posterior distribution for R is a Wishart distribution with ë degrees of freedom and precision S̈, and the posterior for b, given R, is d-Normal with mean ü and precision n̈R, i.e., we have the Normal-Wishart posterior:

NW(b, R | n̈, ë, ü, S̈) = W(R | ë, S̈) N(b | n̈, ü, R)
n̈ = ṅ + n , ë = ė + n , ü = (nu + ṅu̇)/n̈
S̈ = S + Ṡ + (nṅ/n̈)(u − u̇) ⊗ (u − u̇)

All covariance and precision matrices are supposed to be positive definite, and proper priors have ė ≥ d and ṅ ≥ 1. Non-informative Normal-Wishart improper priors are given by ṅ = 0, u̇ = 0, ė = 0, Ṡ = 0, i.e. we take a Wishart with 0 degrees of freedom as prior for R, and a constant prior for b, see [2]. Then, the posterior for R is a Wishart with n degrees of freedom and precision S, and the posterior for b, given R, is d-Normal with mean u and precision nR.
The conjugate prior for a multinomial distribution is a Dirichlet distribution:

M(y | n, w) = ( n! / (y1! … ym!) ) w1^{y1} … wm^{ym}
D(w | y) = ( Γ(y1 + … + ym) / (Γ(y1) … Γ(ym)) ) Π_{k=1}^{m} wk^{yk − 1}

with w > 0 and w′1 = 1. Prior information given by ẏ, and observation y, result in the posterior parameter ÿ = ẏ + y. A non-informative prior is given by ẏ = 1. Finally, we can write the posterior and completed posterior for the model as:

f(θ | X, θ̇) ∝ f(X | θ) f(θ | θ̇)
f(X | θ) = Π_{j=1}^{n} Σ_{k=1}^{m} wk N(x^j | bk, Rk)
f(θ | θ̇) = D(w | ẏ) Π_{k=1}^{m} NW(bk, Rk | ṅk, ėk, u̇k, Ṡk)
p^j_k = wk N(x^j | bk, Rk) / Σ_{k=1}^{m} wk N(x^j | bk, Rk)
f(θ | X, Z, θ̇) ∝ f(X, Z | θ) f(θ | θ̇) = D(w | ÿ) Π_{k=1}^{m} NW(bk, Rk | n̈k, ëk, ük, S̈k)
y = Z1 , ÿ = ẏ + y , n̈ = ṅ + y , ë = ė + y
uk = (1/yk) Σ_{j=1}^{n} z^j_k x^j , S^k = Σ_{j=1}^{n} z^j_k (x^j − uk) ⊗ (x^j − uk)
ük = (1/n̈k)(ṅk u̇k + yk uk)
S̈^k = S^k + Ṡ^k + (ṅk yk / n̈k)(uk − u̇k) ⊗ (uk − u̇k)
3. Gibbs Sampling, Integration and Optimization In order to integrate a function over the posterior measure, we use an ergodic Markov Chain. The form of the chain below is known as Gibbs sampling, and its use for numerical integration is known as Markov Chain Monte Carlo, or MCMC. Given θ, we can compute P. Given P, f(z^j | p^j) is a simple multinomial distribution. Given the latent variables, Z, we have simple conditional posterior density expressions for the mixture parameters:

f(w | Z, ẏ) = D(w | ÿ) , f(Rk | X, Z, ėk, Ṡk) = W(R | ëk, S̈k)
f(bk | X, Z, Rk, ṅk, u̇k) = N(b | n̈k, ük, Rk)

Gibbs sampling is nothing but the MCMC generated by cyclically updating the variables Z, θ, and P, drawing θ and Z from the above distributions, see [3,4]. A multinomial variate can be drawn using a uniform generator. A Dirichlet variate w can be drawn using a gamma generator, G, with shape and scale parameters, see [5]: a) gk = G(yk, 1); b) wk = gk / Σ_{k=1}^{m} gk. Johnson [6] describes a simple procedure to generate the Cholesky factor of a Wishart variate W = U′U with n degrees of freedom, from the Cholesky factorization of the covariance V = R^{−1} = C′C and a chi-square generator: c) for i < j, B_{i,j} = N(0, 1); d) B_{i,i} = (χ²(n − i + 1))^{1/2}; and e) U = BC. All subsequent matrix computations proceed directly from the Cholesky factors, [7].
Given a mixture model, we obtain an equivalent model by renumbering the components 1:m by a permutation σ([1:m]). This symmetry must be broken in order to have an identifiable model, see [8]. Let us assume there is an order criterion that can be used when numbering the components. If the components are not in the correct order, Label Switching is the operation of finding the permutation σ([1:m]) and renumbering the components, so that the order criterion is satisfied.
If we want to look consistently at the classifications produced during a MCMC run, we must enforce a label switching to break all non-identifiability symmetries. For example, in the Dirichlet-Normal-Wishart mixture model, we could choose to order the components (switch labels) according to the rank given by: 1) a given linear combination of the vector means, c ∗ bk; 2) the variance determinant |V^k|. The choice of a good label switching criterion should consider not only the model structure and the data, but also the semantics and interpretation of the model. The semantics and interpretation of the model may also dictate that some states, like certain configurations of the latent variables Z, are either meaningless or invalid, and shall not be considered as possible solutions. The MCMC can be adapted to deal with forbidden states by implementing rejection rules that prevent the chain from entering the forbidden regions of the complete and/or incomplete state space, see [9,10].
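The sampling steps a)–e) described above can be sketched as follows (a minimal NumPy illustration; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)

def draw_dirichlet(y):
    """Steps a)-b): g_k ~ Gamma(y_k, 1), then w_k = g_k / sum g_k."""
    g = rng.gamma(shape=y, scale=1.0)
    return g / g.sum()

def wishart_cholesky_factor(n, C):
    """Steps c)-e): fill B with N(0,1) entries above the diagonal and
    square roots of chi-square variates on the diagonal, then U = B C,
    so that W = U' U is a Wishart variate with n degrees of freedom;
    C is a Cholesky factor of the covariance V = R^{-1} = C' C."""
    d = C.shape[0]
    B = np.zeros((d, d))
    for i in range(d):
        B[i, i] = np.sqrt(rng.chisquare(n - i))   # d.o.f. n - i + 1 for 1-based i
        B[i, i + 1:] = rng.normal(size=d - i - 1)
    return B @ C

w = draw_dirichlet(np.array([3.0, 2.0, 5.0]))     # mixture weights
U = wishart_cholesky_factor(10, np.eye(2))
W = U.T @ U                                       # a 2 x 2 Wishart draw
```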
The EM algorithm optimizes the log-posterior function fl(X | θ) + fl(θ | θ̇), see [11,12,13]. The EM is derived from the conditional log-likelihood and the Jensen inequality: if w, y > 0 and w′1 = 1, then log w′y ≥ w′ log y. Let θ and θ̃ be our current and next estimate of the MAP (Maximum a Posteriori), and p^j_k = f(z^j_k | x^j, θ) the conditional classification probabilities. At each iteration, the log-posterior improvement is:

δ(θ̃, θ | X, θ̇) = fl(θ̃ | X, θ̇) − fl(θ | X, θ̇) = δ(θ̃, θ | X) + δ(θ̃, θ | θ̇)
δ(θ̃, θ | θ̇) = fl(θ̃ | θ̇) − fl(θ | θ̇)
δ(θ̃, θ | X) = fl(X | θ̃) − fl(X | θ) = Σ_j log Σ_k p^j_k ( w̃k f(x^j | ψ̃k) / (p^j_k f(x^j | θ)) )
≥ Σ_j Σ_k p^j_k log ( w̃k f(x^j | ψ̃k) / (p^j_k f(x^j | θ)) ) = Σ_j Δ(θ̃, θ | x^j) = Δ(θ̃, θ | X)

Hence, Δ(θ̃, θ | X, θ̇) = Δ(θ̃, θ | X) + δ(θ̃, θ | θ̇) is a lower bound to δ(θ̃, θ | X, θ̇). Also Δ(θ, θ | X, θ̇) = δ(θ, θ | X, θ̇) = 0. So, under mild differentiability conditions, both surfaces are tangent, assuring convergence of EM to the nearest local maximum. But maximizing Δ(θ̃, θ | X, θ̇) over θ̃ is the same as maximizing

Q(θ̃, θ) = Σ_{k,j} p^j_k log ( w̃k f(x^j | ψ̃k) ) + fl(θ̃ | θ̇)

and each iteration of the EM algorithm breaks down into two steps:
E-step: Compute P = E(Z | X, θ).
M-step: Optimize Q(θ̃, θ), given P.
For the Gaussian mixture model, with a Dirichlet-Normal-Wishart prior,

Q(θ̃, θ) = Σ_{k=1}^{m} Σ_{j=1}^{n} p^j_k log ( w̃k N(x^j | b̃k, R̃k) ) + fl(θ̃ | θ̇)
fl(θ̃ | θ̇) = log D(w̃ | ẏ) + Σ_{k=1}^{m} log NW(b̃k, R̃k | ṅk, ėk, u̇k, Ṡk)
Lagrange optimality conditions give simple analytical solutions for the M-step:

y = P1 , w̃k = (yk + ẏk − 1) / ( n − m + Σ_{k=1}^{m} ẏk )
uk = (1/yk) Σ_{j=1}^{n} p^j_k x^j , S^k = Σ_{j=1}^{n} p^j_k (x^j − b̃k) ⊗ (x^j − b̃k)
b̃k = (ṅk u̇k + yk uk) / (ṅk + yk)
Ṽ^k = ( S^k + ṅk (b̃k − u̇k) ⊗ (b̃k − u̇k) + Ṡ^k ) / ( yk + ėk − d )
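The E/M iteration can be sketched for a one-dimensional two-component Gaussian mixture (our own minimal maximum-likelihood illustration: the Dirichlet-Normal-Wishart prior terms above are dropped, so the M-step reduces to weighted sample statistics):

```python
import numpy as np

def em_gaussian_mixture(x, n_iter=50):
    """Maximum-likelihood EM for a 1-D, 2-component Gaussian mixture."""
    w = np.array([0.5, 0.5])
    b = np.array([x.min(), x.max()])     # component means
    v = np.array([x.var(), x.var()])     # component variances
    for _ in range(n_iter):
        # E-step: conditional classification probabilities p_k^j
        dens = w * np.exp(-(x[:, None] - b) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
        p = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted counts, means and variances
        y = p.sum(axis=0)
        w = y / len(x)
        b = (p * x[:, None]).sum(axis=0) / y
        v = (p * (x[:, None] - b) ** 2).sum(axis=0) / y
    return w, b, v

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 700)])
w, b, v = em_gaussian_mixture(x)
```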
In more general (non-Gaussian) mixture models, if an analytical solution for the M-step is not available, a robust local optimization algorithm can be used, for example [14]. The EM is a local optimizer, but the MCMC provides plenty of starting points, so we have the basic elements for a global optimizer. To avoid using many starting points that go to the same local maximum, we can filter the (ranked by the posterior) top portion of the
MCMC output using a clustering algorithm, and select a starting point from each cluster. For better efficiency, or for more complex problems, the Stochastic EM algorithm can be used to provide starting points near each important local maximum, see [15,16,17].
4. Experimental Tests and Final Remarks Our test case is the Iris virginica data set, with sepal and petal length of 50 specimens (1 discarded outlier), where the botanical problem consists of determining whether or not there are two distinct subspecies in the population, [18,19]. Here, the data X are assumed to follow a mixture of bivariate normal distributions with unknown parameters, including the number of components. Figure 1 presents the dataset and posterior density level curves for the parameters, θ* and θ̂, optimized for the 1 and 2 component models.

Figure 1. Iris virginica data and models with one (left) and two (right) components

In the FBST formulation of the problem, the 2 component model is the base model, and the hypothesis to be tested is the constraint of having only 1 component. The FBST selects the 2 component model, rejecting H, if the evidence against the hypothesis is above a given threshold, Ev(H) > τ, and selects the 1 component model, accepting H, otherwise. The threshold τ is chosen by empirical power analysis, see [21,22,23]. Let θ* and θ̂ represent the constrained and unconstrained (1 and 2 components) maximum a posteriori (MAP) parameters optimized to the Iris dataset. Generate two collections of t simulated datasets of size n, the first collection at θ* and the second at θ̂. α(τ) and β(τ) are the empirical type 1 and type 2 statistical errors, i.e., the rejection rate in the first collection and the acceptance rate in the second collection. A small, t = 500, calibration run sets the threshold τ so as to minimize the total error, (α(τ) + β(τ))/2. Other methods, like sensitivity analysis, see [24,25,26], and loss functions, see [27], could also be used. When implementing the FBST one has to be careful with trapping states on the MCMC.
These typically are states where one component has a small number of sample points that become (nearly) collinear, resulting in a singular posterior. This problem is particularly serious with the Iris dataset because of the low precision, only 2 significant digits, of the measurements. A standard way to avoid this inconvenience is to use flat or minimally informative priors, instead of non-informative priors, see [20]. We used as flat prior parameters: y˙ = 1, n˙ = 1, u˙ = u, e˙ = 3, S˙ = (1/n)S. Robert [20] uses, with similar effects, e˙ = 6, S˙ = (1.5/n)S.
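The threshold-calibration step described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: `calibrate_threshold` and its grid search are hypothetical names, and in practice the evidence values would come from running the FBST on each simulated dataset.

```python
import numpy as np

def calibrate_threshold(ev_under_h, ev_under_alt, grid=None):
    """Pick the threshold tau minimizing the total error (alpha + beta)/2.

    ev_under_h   : evidence values Ev(H) from datasets simulated at theta*
                   (the constrained MAP, where H holds)
    ev_under_alt : evidence values from datasets simulated at theta
                   (the unconstrained MAP, where H fails)
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 201)
    best_tau, best_err = None, np.inf
    for tau in grid:
        alpha = np.mean(ev_under_h > tau)    # type 1: reject H when it holds
        beta = np.mean(ev_under_alt <= tau)  # type 2: accept H when it fails
        err = (alpha + beta) / 2.0
        if err < best_err:
            best_tau, best_err = tau, err
    return best_tau, best_err
```

With t = 500 simulated datasets per collection, as in the paper's calibration run, the two arrays each have 500 entries and the whole search is negligible next to the MCMC cost.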
M. de S. Lauretto and J.M. Stern / Testing Significance in Bayesian Classifiers
Figure 2. FBST(O), AIC(X), AIC3(+) and BIC(*): Type 1, 2 and total error rates for different sample sizes.
Biernacki and Govaert [28] studied similar mixture problems and compared several selection criteria, pointing to the best overall performers: AIC (Akaike Information Criterion), AIC3 (Bozdogan's modified AIC), and BIC (Schwarz's Bayesian Information Criterion). These are regularization criteria, weighting the model fit against the number of parameters, see [29]. If λ is the model log-likelihood, κ its number of parameters, and n the sample size, then AIC = −2λ + 2κ, AIC3 = −2λ + 3κ and BIC = −2λ + κ log(n). Figure 2 shows α, β, and the total error (α + β)/2. The FBST outperforms all the regularization criteria. For small samples, BIC is very biased, always selecting the 1 component model. AIC is the second best criterion, catching up with the FBST for sample sizes larger than n = 150. Finally, let us point out a related topic for research: the problem of discriminating between models consists of determining which of m alternative models, fk(x, ψk), more adequately fits or describes a given dataset. In general the parameters ψk have distinct dimensions, and the models fk have distinct functional forms. In this case it is usual to call them “separate” models (or hypotheses). Atkinson [30], although in a very different theoretical framework, was the first to analyse this problem using a mixture formulation, f(x | θ) = Σ_{k=1}^{m} wk fk(x, ψk).
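The three criteria are trivial to compute once the log-likelihood λ and parameter count κ are known; only the formulas below come from the text, and the numbers in the usage note are purely illustrative.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion: -2*lambda + 2*kappa."""
    return -2.0 * loglik + 2.0 * k

def aic3(loglik, k):
    """Bozdogan's modified AIC: -2*lambda + 3*kappa."""
    return -2.0 * loglik + 3.0 * k

def bic(loglik, k, n):
    """Schwarz's Bayesian Information Criterion: -2*lambda + kappa*log(n)."""
    return -2.0 * loglik + k * math.log(n)
```

In all three, the model with the smaller value is selected. A 2-component bivariate normal mixture has κ = 11 free parameters (two mean vectors, two symmetric 2×2 covariances, one mixture weight); with illustrative log-likelihoods λ = −120 for the 1-component model (κ = 5) and λ = −110 for the 2-component one, and n = 49 as in the Iris data, AIC already prefers the 2-component model while BIC still picks the 1-component one, in line with BIC's small-sample bias noted above.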
The theory for mixture models presented here can be adapted to analyse the problem of discriminating between separate hypotheses. This is the subject of the authors’ forthcoming articles with Carlos Alberto de Bragança Pereira and Basílio de Bragança Pereira. The authors are grateful for support of Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Fundação de Apoio à Pesquisa do Estado de São Paulo (FAPESP).
References
[1] C.A.B.Pereira, J.M.Stern (1999). Evidence and Credibility: Full Bayesian Significance Test for Precise Hypotheses. Entropy Journal, 1, 69–80.
[2] M.H.DeGroot (1970). Optimal Statistical Decisions. NY: McGraw-Hill.
[3] W.R.Gilks, S.Richardson, D.J.Spiegelhalter (1996). Markov Chain Monte Carlo in Practice. NY: CRC Press.
[4] O.Häggström (2002). Finite Markov Chains and Algorithmic Applications. Cambridge Univ.
[5] J.E.Gentle (1998). Random Number Generator and Monte Carlo Methods. NY: Springer.
[6] M.E.Johnson (1987). Multivariate Statistical Simulation. NY: Wiley.
[7] M.C.Jones (1985). Generating Inverse Wishart Matrices. Comm. Statist. Simula. Computa., 14, 511–514.
[8] M.Stephens (1997). Bayesian Methods for Mixtures of Normal Distributions. Oxford Univ.
[9] C.H.Bennett (1976). Efficient Estimation of Free Energy Differences from Monte Carlo Data. Journal of Computational Physics, 22, 245–268.
[10] X.L.Meng, W.H.Wong (1996). Simulating Ratios of Normalizing Constants via a Simple Identity: A Theoretical Exploration. Statistica Sinica, 6, 831–860.
[11] A.P.Dempster, N.M.Laird, D.B.Rubin (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. J. Royal Stat. Soc. B, 39, 1–38.
[12] D.Ormoneit, V.Tresp (1995). Improved Gaussian Mixtures Density Estimates Using Bayesian Penalty Terms and Network Averaging. Advances in Neural Information Processing Systems, 8, 542–548. MIT.
[13] S.Russel (1988). Machine Learning: The EM Algorithm. Unpublished note.
[14] J.M.Martinez (2000). BOX-QUACAN and the Implementation of Augmented Lagrangian Algorithms for Minimization with Inequality Constraints. Comp. Appl. Math., 19, 31–56.
[15] G.Celeux, D.Chauveau, J.Diebolt (1996). On Stochastic Versions of the EM Algorithm. An Experimental Study in the Mixture Case. Journal of Statistical Computation and Simulation, 55, 287–314.
[16] G.C.Pflug (1996). Optimization of Stochastic Models: The Interface Between Simulation and Optimization. Boston: Kluwer.
[17] J.C.Spall (2003). Introduction to Stochastic Search and Optimization. Hoboken: Wiley.
[18] E.Anderson (1935). The Irises of the Gaspé Peninsula. Bulletin of the American Iris Society, 59, 2–5.
[19] G.McLachlan, D.Peel (2000). Finite Mixture Models. NY: Wiley.
[20] C.P.Robert (1996). Mixture of Distributions: Inference and Estimation. In [3].
[21] M.Lauretto, C.A.B.Pereira, J.M.Stern, S.Zacks (2003). Comparing Parameters of Two Bivariate Normal Distributions Using the Invariant FBST. Brazilian Journal of Probability and Statistics, 17, 147–168.
[22] M.R.Madruga, C.A.B.Pereira, J.M.Stern (2003). Bayesian Evidence Test for Precise Hypotheses. Journal of Statistical Planning and Inference, 117, 185–198.
[23] J.M.Stern, S.Zacks (2002). Testing the Independence of Poisson Variates under the Holgate Bivariate Distribution. The Power of a New Evidence Test. Statistical and Probability Letters, 60, 313–320.
[24] J.M.Stern (2003). Significance Tests, Belief Calculi, and Burden of Proof in Legal and Scientific Discourse. Laptec'03, Frontiers in Artificial Intelligence and its Applications, 101, 139–147.
[25] J.M.Stern (2004a). Paraconsistent Sensitivity Analysis for Bayesian Significance Tests. SBIA'04, Lecture Notes in Artificial Intelligence, 3171, 134–143.
[26] J.M.Stern (2004b). Uninformative Reference Sensitivity in Possibilistic Sharp Hypotheses Tests. MaxEnt 2004, American Institute of Physics Proceedings, 735, 581–588.
[27] M.Madruga, L.G.Esteves, S.Wechsler (2001). On the Bayesianity of Pereira-Stern Tests. Test, 10, 291–299.
[28] C.Biernacki, G.Govaert (1998). Choosing Models in Model-based Clustering and Discriminant Analysis. Technical Report INRIA-3509-1998.
[29] C.A.B.Pereira, J.M.Stern (2001). Model Selection: Full Bayesian Approach. Environmetrics, 12, 559–568.
[30] A.C.Atkinson (1970). A Method for Discriminating Between Models. J. Royal Stat. Soc. B, 32, 323–354.
Obtaining Membership Functions from a Neuron Fuzzy System Extended by Kohonen Network
Angelo Pagliosa a,1, Claudio Cesar de Sá b and F. D. Sasse c
a Departamento de Engenharia Elétrica, UDESC, 89223-100 Joinville, SC, Brazil
b Departamento de Ciência da Computação, UDESC, 89223-100 Joinville, SC, Brazil
c Departamento de Matemática, UDESC, 89223-100 Joinville, SC, Brazil
Abstract. This article presents a hybrid computational model, called Neo-Fuzzy-Neuron Modified by Kohonen Network (NFN-MK), that combines fuzzy system techniques and artificial neural networks. Its main task is the automatic generation of membership functions, in particular triangular ones, aiming at the dynamic modeling of a system. The model is tested by simulating real systems, here represented by a nonlinear mathematical function. Comparison with the results obtained by traditional neural networks, and with related studies of neurofuzzy systems in the system identification area, shows that the NFN-MK has similar performance, despite its greater simplicity. Keywords. Artificial neural networks, Neurofuzzy systems, Kohonen Networks
1. Introduction

A traditional approach to Artificial Intelligence (AI) is known as connectionism, and it is represented by the field of Artificial Neural Networks (ANN). A second approach to AI is the symbolic one, with its various branches, Fuzzy Logic (FL) among them. ANN models offer the possibility of learning from input/output data, and their functionality is inspired by biological neurons. Normally, ANNs require a relatively long training time and cannot explain how their results were obtained by training. Therefore, some projects involving ANNs can become complex, also lacking strong foundations. On the other hand, the FL framework deals with the approximate reasoning typical of human minds. It allows the use of linguistic terminology and common sense knowledge. The main limitation of such systems is the absence of learning mechanisms capable of generating fuzzy rules and membership functions, which therefore depend on specialist knowledge. An interesting alternative consists of hybrid systems (HS), which combine the advantages of both ANNs and FL. They can also employ process model knowledge in order to shorten the project. In particular, HS are characterized by 1 Correspondence to: F. D. Sasse, Dept. of Mathematics, CCT/UDESC, 89223-100 Joinville, SC, Brazil. Tel.:
A. Pagliosa et al. / Obtaining Membership Functions from a Neuron Fuzzy System
new architectures, learning methods, predefined parameters and knowledge representation, combining the fuzzy systems' capacity to deal with imprecise data and the ANNs' ability to learn from examples. This work proposes a hybrid system, called Neo-Fuzzy-Neuron Modified by the Kohonen Network (NFN-MK), applied to the context of function identification. The Neo-Fuzzy-Neuron (NFN) was originally proposed by Uchino and Yamakawa [6] as a hybrid model applied to real systems. The NFN-MK is an extension of the NFN that uses a Kohonen network to generate the initial positions of the triangular curves that model the fuzzy neuron.
2. Description of the Model

The NFN-MK model is applied here in the context of function approximation, with the objective of adjusting the membership functions of a neurofuzzy system using a Kohonen self-organizing map. The neurofuzzy model used is the Neo-Fuzzy-Neuron (NFN) developed by Yamakawa [9,6]. This model was chosen for its short training time, compared to those typical of multilayer networks [8]. The input functions of the system are supposed to be unknown, except for samples of data. The problem consists of determining a system in which the process output (yd) and the model output (y) become close together, according to a given criterion [7]. In general we suppose that the process block shown in figure 1 is nonlinear, which usually makes mathematical modeling difficult. The NFN-MK model can be advantageously used in cases like this, without the need for linearization techniques, which are unsuitable for some inputs.
Figure 1.: Block diagram for the training phase of NFN-MK
The proposed model consists of two main blocks, shown inside the dashed box in figure 1. One of them is the original Yamakawa NFN [9,6]. It works on triangular curves in its fuzzy neuron model, used for simplicity. In our model the NFN is extended by a classical application of Kohonen's network [3]. As shown in figure 1 there are two switches, used for the training process. When switch S2 is open and S1 closed, the Kohonen network runs, looking for the central vertices of the triangular membership functions. Initially these values are equally divided into seven fuzzy curves, which belong to the fuzzy neuron network. After this phase, the locations of the vertices are updated in the NFN block, where the base points are found by a new training. The training of the NFN block occurs when S1 is open and S2 is closed. The training proceeds like a backpropagation algorithm, finding the weights and base points of the triangular curves for each neuron. The new points of these triangular curves represent “if-then” rules, as in the original NFN idea. Here seven membership functions of triangular type are uniformly distributed over an interval [xmin, xmax] of the input domain. The number of membership functions is based on the experiments of Shaw et al. [5], which show that a change from five to seven triangular sets increases the precision by about 15%, with no significant improvement for a greater number of sets. We note that equidistant membership functions may not be convenient in situations where patterns are concentrated in some regions and dispersed in others [5]. One alternative is to deal with nonuniformly distributed functions. The adjustment of the membership functions can be made using a clustering algorithm, such as Kohonen networks [2,3]. This is the main reason why we use a very basic Kohonen network, which finds new positions for the vertices of the triangular curves. In this method the network weights correspond to the values associated with the vertices of the triangular curves, and the number of neurons in the processing layer corresponds to the number of fuzzy subsets for each NFN network input. The winner neuron is the one whose weight vector has the shortest Euclidean distance to the input vector [2,3].
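The vertex-adjustment step just described can be sketched as a one-dimensional Kohonen map: one unit per triangular vertex, winner chosen by shortest Euclidean distance. This is an illustrative reconstruction under assumptions — the learning-rate and neighborhood schedules are not specified in the paper.

```python
import numpy as np

def kohonen_vertices(samples, n_units=7, epochs=10, lr0=0.5, radius0=2.0):
    """Move n_units vertex positions toward the data with a 1-D Kohonen map.

    Each unit's weight is the centre vertex of one triangular membership
    function; units start equally spaced over the data range.
    """
    lo, hi = samples.min(), samples.max()
    w = np.linspace(lo, hi, n_units)           # initial equidistant vertices
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)      # decaying learning rate
        radius = max(radius0 * (1.0 - epoch / epochs), 0.5)
        for x in np.random.default_rng(epoch).permutation(samples):
            winner = np.argmin(np.abs(w - x))  # shortest Euclidean distance
            for j in range(n_units):
                # neighborhood function pulls units near the winner
                h = np.exp(-((j - winner) ** 2) / (2.0 * radius ** 2))
                w[j] += lr * h * (x - w[j])
    return np.sort(w)
```

Where patterns are concentrated, vertices drift together, giving the nonuniform distribution the text argues for.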
3. Experiment

A mathematical function that can be used as a benchmark is the Mexican hat, defined by
f(x1, x2) = (sin x1 / x1) · (sin x2 / x2) .   (1)
This function, shown in figure 2, represents the nonlinear system to be identified (cf. figure 1). Here x1 ∈ [−10.0, 10.0] and x2 ∈ [−10.0, 10.0] are mapped to f(x1, x2) in the interval [−0.1, 1.0].
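A sketch of the benchmark function and of a training grid follows. The 15 × 15 spacing is an assumption on our part; the text only states that 225 equally distributed patterns are used. Note that NumPy's `sinc` is the normalized sinc, so the arguments are divided by π to recover sin(x)/x.

```python
import numpy as np

def mexican_hat(x1, x2):
    # np.sinc(x) = sin(pi*x)/(pi*x), so divide the arguments by pi
    return np.sinc(x1 / np.pi) * np.sinc(x2 / np.pi)

# 15 x 15 = 225 equally spaced training patterns (x1, x2, f(x1, x2))
axis = np.linspace(-10.0, 10.0, 15)
g1, g2 = np.meshgrid(axis, axis)
patterns = np.column_stack([g1.ravel(), g2.ravel(),
                            mexican_hat(g1, g2).ravel()])
```

Using `np.sinc` also handles the removable singularity at the origin, where f(0, 0) = 1.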
Figure 2.: Mexican hat function used as nonlinear system

One reason for choosing this particular two-variable mathematical function for testing our system is that the resulting 3D points can be easily visualized in a graphic. The points (x1, x2, f(x1, x2)) are used for training the Kohonen network and also for adjusting the weights in the NFN (cf. figure 1). Initially, the seven membership functions, shown in figure 3, are equally distributed over the domain. The vertices and their left and right limits are given in the first three columns of table 1.
Figure 3.: Initial membership functions

The curves in figure 3 represent the initial state of the neurons in the NFN, which are seven for each input xi (x1 and x2). The semantic values of these curves are summarized in table 2, where the notation is the usual one from fuzzy logic (FL) systems. The next step consists in finding new vertices using a very basic Kohonen network. These new vertex values are presented in table 1 (vertex column). This phase is computed with switch S2 open and S1 closed (cf. figure 1).
Once the new vertices are found, the triangular curves are redrawn according to the clustering in each pair of curves. These curves are built to keep the convexity, in the sense that the summation is 1 for each pair of overlapping curves. For example, if µZE(x) = 0.37, its right-side neighbour curve has the complement µPS(x) = 0.63. By making these adjustments we follow the clustering idea exhibited by the Kohonen network. The new left and right limits for these triangular curves are thereby found. Since function (1) is symmetric in the two variables, the vertex positions of the triangular curves are equally distributed. In this case, the NFN model presents fourteen fuzzy neurons, seven for each variable (x1 and x2).
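The complementarity of neighbouring curves can be checked numerically with the initial triangular curves of Table 1. The `triangular` helper below is a standard triangular membership function, not code from the paper.

```python
def triangular(x, left, vertex, right):
    """Triangular membership function with the given base points; 0 outside."""
    if left < x <= vertex:
        return (x - left) / (vertex - left)
    if vertex < x < right:
        return (right - x) / (right - vertex)
    return 1.0 if x == vertex else 0.0

# Complementary pair: PS's left base sits on ZE's vertex and vice versa,
# so between the two vertices the memberships sum to 1.
ze = dict(left=-3.33, vertex=0.0, right=3.33)
ps = dict(left=0.0, vertex=3.33, right=6.67)
x = 1.2
assert abs(triangular(x, **ze) + triangular(x, **ps) - 1.0) < 1e-9
```

The same identity holds for any x between two adjacent vertices, which is exactly the "summation is 1" property used above.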
Figure 4.: New membership functions adjusted by NFN-MK
By closing switch S2 and opening S1 (cf. figure 1), the adjustment of the weights in the NFN model follows. The NFN model is trained with 225 equally distributed input/output patterns (x1, x2, f(x1, x2)) during 10 epochs. The new results are presented in table 1, and the new values of the weights in the NFN (cf. figure 5) are shown in the last two columns of table 1. The names of the fuzzy curves are defined in table 2.
Fuzzy     Initial Values            New Values
Curves    Left    Vertex  Right     Left    Vertex  Right     w1_final   w2_final
NL        -10.0   -10.0   -6.67     -10.0   -10.0   -3.5       0.0715    -0.0643
NM        -10.0   -6.67   -3.33     -10.0   -3.5    -0.2      -0.1414     0.0103
NS        -6.67   -3.33    0.0      -3.5    -0.2     3.2       0.5154     0.4973
ZE        -3.33    0.0     3.33     -0.2     3.2     6.5      -0.0824     0.0321
PS         0.0     3.33    6.67      3.2     6.5    10.0       0.0143     0.1714
PM         3.33    6.67   10.0       6.5    10.0    10.0       0.0739    -0.0463
PL         6.67   10.0    10.0      10.0    10.0    10.0       0.0        0.0358
Table 1. Parameters for the Mexican hat function
NL: Negative Large
NM: Negative Medium
NS: Negative Small
ZE: Zero
PS: Positive Small
PM: Positive Medium
PL: Positive Large
Table 2. Semantic meanings of fuzzy curves
The symbols w1_final and w2_final denote the averaged weights of the NFN model, considering each fuzzy neuron already modified (cf. figure 5). Such values correspond to the following vectors: w1_final = [w11, w12, w13, ..., w17] and w2_final = [w21, w22, w23, ..., w27].
Figure 5.: NFN model for two variables (x1, x2)
The resulting sum gives
f(x1, x2) = µm(x1) wm + µ(m+1)(x1) w(m+1) + µn(x2) wn + µ(n+1)(x2) w(n+1) .   (2)

Expression (2) follows the original idea of Uchino and Yamakawa [9,6], where the membership functions are complementary. Thus, the indices m and m + 1 are associated with a defuzzification over two complementary curves. The same idea applies to the indices n and n + 1. A numerical evaluation of equation (2) is given in table 1. The graph in figure 4 can be easily computed for any pair (x1, x2).
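Equation (2) can be evaluated by summing over all seven curves per input, since only the two complementary active curves are nonzero. The helper names below are ours, the curve parameters are the initial values of Table 1, and the unit weights in the usage note are chosen only to make the complementarity visible.

```python
def triangular(x, left, vertex, right):
    """Triangular membership function with the given base points."""
    if left < x <= vertex:
        return (x - left) / (vertex - left)
    if vertex < x < right:
        return (right - x) / (right - vertex)
    return 1.0 if x == vertex else 0.0

# Initial seven curves (left, vertex, right) from Table 1.
CURVES = [(-10.0, -10.0, -6.67), (-10.0, -6.67, -3.33), (-6.67, -3.33, 0.0),
          (-3.33, 0.0, 3.33), (0.0, 3.33, 6.67), (3.33, 6.67, 10.0),
          (6.67, 10.0, 10.0)]

def nfn_output(x1, x2, curves, w1, w2):
    """Evaluate equation (2): for each input only the two complementary
    active curves are nonzero, so summing over all seven curves
    reproduces the four terms of the equation."""
    total = 0.0
    for x, weights in ((x1, w1), (x2, w2)):
        for (left, vertex, right), wk in zip(curves, weights):
            total += triangular(x, left, vertex, right) * wk
    return total
```

With all weights set to 1, the memberships for each input sum to 1 anywhere in the domain interior, so the output is 2 — a quick sanity check of the complementary defuzzification.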
4. Evaluation of the Performance

The NFN-MK is trained with 225 samples during 10 epochs, so that the internal parameters become determined (neural weights and the triangular curves of the neurons). After this, some inputs are given. The results of these tests are shown in figure 6, where the points denoted by + represent the simulation results and the continuous line represents the actual function.
Figure 6.: Mexican hat function obtained by NFN-MK
The values used for training (225 samples and 10 epochs), as well as the model selected (the Mexican hat function), were chosen in order to compare our results with those found elsewhere. The parameters were taken from these references, except for the ANN model, which uses a classical backpropagation algorithm with 2 neurons in the input layer, 7 in the hidden layer, and 1 in the output layer. The parameter considered is the number of mathematical operations necessary to evaluate a cycle (an epoch), for each neuron model in its respective architecture. As a simplification, the multiplication, sum and subtraction operations are assigned the same cost. These results are shown in table 3. They show that the NFN-MK performance is equivalent to that of other models used to identify the same curve. Although our system presented a mean quadratic error (MQE) slightly greater than those obtained by the NFHQ (Neuro-Fuzzy Hierarchic Quadtree) and FSOM (Fuzzy Self-Organized Map) models, it is much simpler in terms of mathematical operations. On the other hand, the NFN-MK presented a much smaller MQE than the classical ANN.
Models    Number of Operations (+,−,×)    Operations by function in the model    Mean Quadratic Error (MQE)
NFN-MK      8                               2                                      0.0426
NFHQ      168                              21                                      0.0150
FSOM      200                             101                                      0.0314
NN         42                               8                                      0.1037
Table 3. Model comparisons
5. Conclusion

The objective of this work was to extend the NFN model to allow the adjustment of triangular membership functions by the Kohonen method. The result is the proposed Neo-Fuzzy-Neuron Modified by a Kohonen Network (NFN-MK). The model was successfully tested in function approximation, and its performance was similar to those obtained by more complex, classical neural networks. The results obtained do not allow inferences about the generality of the NFN-MK model, but they seem to indicate that we have a viable model for system identification. What makes this model particularly interesting is the relatively small number of operations and function calculations involved, implying short processing times when compared to other ANNs [1,4].
References
[1] Flávio Joaquim de Souza. Modelos Neuro-Fuzzy Hierárquicos. PhD thesis, Pontifícia Universidade Católica - Rio de Janeiro, 1997. (Doctoral thesis, in Portuguese, url: http://www.ica.ele.puc-rio.br/publicacoes/download/tes_0017.pdf).
[2] T. Kohonen. Self-organizing feature maps and abstractions. In I. Plander, editor, Artificial Intelligence and Information-Control Systems of Robots, pages 39–45. North-Holland, Amsterdam, 1984.
[3] Teuvo Kohonen. Self-organizing map. Neurocomputing, 21(1):1–6, 1998.
[4] V. R. V. Rissoli, H. A. Camargo and J. A. Fabri. Geração automática de regras a partir da arquitetura neuro-fuzzy para classificação de dados (NEFCLASS). Simpósio Brasileiro de Automação Inteligente, 1999. (in Portuguese).
[5] I. S. Shaw and M. G. Simões. Control and Fuzzy Modeling. Edgard Blucher Ltda, São Paulo - SP - Brazil, 1999. (in Portuguese).
[6] E. Uchino and T. Yamakawa. Neo-fuzzy-neuron based new approach to system modeling, with application to actual system. In Proc. Sixth Int. Conf. on Tools with Artificial Intelligence, New Orleans, USA, pages 564–570, 1994.
[7] Peter Vas. Electrical Machines and Drives. London: Oxford University Press, 1999.
[8] W. Caminhas, H. Tavares, F. Gomide and W. Pedrycz. Fuzzy set based neural networks: Structure, learning and application. In Journal of Advanced Computational Intelligence, volume 3, pages 151–157. UNICAMP - State University of Campinas, 1999.
[9] Takeshi Yamakawa. A neo fuzzy neuron and its application to system identification and prediction of chaotic behavior. In Computational Intelligence: Imitating Life, pages 383–395, 1994.
EVALPSN-Based Process Control in Brewery Plants
Sheng-Luen Chung 1 and Yen-Hung Lai
National Taiwan University of Science and Technology, Taipei 106, Taiwan
Abstract. Process control in a brewery plant deals with the open/close decisions of valves for pipelines in the brewery to meet the service requests of filtration and CIP (clean-in-place) processes. In order to maximize concurrency among different process requests, it is desired that non-conflicting processes be enabled as much as possible. Exploring its similarity to railway interlocking policy, this paper adopts the EVALPSN-based concurrency control approach proposed by Nakamatsu et al. First, the system configuration, in terms of the sub-processes and valves of all the processes involved, is tabulated. EVALPSN statements that reflect the pipeline configuration and the imposed safety constraints of mutually exclusive usage of sub-processes are then systematically constructed. To derive a decision granting or denying a service request, these EVALPSN statements are executed in a PLC-based implementation connected both to the human operator's input requests and to sensor status updates. Successfully implemented for a local brewery plant in Taiwan, the EVALPSN-based decision approach is shown to be advantageous in general pipeline control applications. Keywords. Brewery control, pipeline control, interlock policy
1. Introduction

In a standard brewery plant there can be three pipeline processes, as shown in Figure 1: beer transfer, filtration and cleaning pipelines. Different kinds of liquid are used in different processes, and mixture of different kinds of liquid is strictly forbidden. Pipeline control in a brewery plant, in short, deals with the open/close decisions for all the valves along the pipeline structure [1], [2]. In addition to avoiding mixture of different liquids, it is desired at the same time to maximize pipe utilization by allowing non-conflicting processes to run simultaneously; for instance, filtration and beer transfer may occur concurrently. Pipeline control in essence belongs to the category of concurrency control problems: interleaved service requests of conflicting processes require exclusive usage of the associated sub-processes for correct operation. The distinction between conflicting and non-conflicting processes needs to be defined to reflect the nature of the processes involved. Conventional approaches to brewery process control rely on trial-and-error simulation to yield a control map of concurrent processes in the brewery pipelines. In contrast, this paper adopts an EVALPSN-based approach, proposed by K. Nakamatsu et al. [3,4,5]. The system configuration for all processes, in terms of sub-processes and valves, is first tabulated. EVALPSN statements that reflect the pipeline 1 Corresponding Author: Electrical Engineering Department, National Taiwan University of Science and Technology, Taipei 106, Taiwan; E-mail: [email protected]. This research was supported in part by the grants NSC93-2218-E-011-011 and NSC93-2213-E-011079.
S.-L. Chung and Y.-H. Lai / EVALPSN-Based Process Control in Brewery Plants
configuration and the imposed safety constraints of mutually exclusive usage of sub-processes are systematically constructed. In particular, the similarity of pipeline control to railway interlocking policy is exploited and modified.
Figure 1. Brewery filtration and CIP process
2. Approach

Process control of pipelines in a brewery plant shares much similarity with the railroad interlocking problem [3]. Safety verification for railway interlocking verifies the safety of securing or releasing railway routes: it checks whether route interlocking requests or sub-process release requests by operators contradict the safety properties. The EVALPSN approach to safety verification is the following: the safety properties, route interlocking and sub-process release requests are first expressed deontically in the framework of EVALPSN [4,5], and then the interlocking safety verification is executed as an ordinary logic programming inquiry. Pipeline utilization is divided into three phases: service request, permission, and execution. To maximize pipe utilization, it is desired that non-conflicting service requests be processed simultaneously while satisfying the safety requirements. We impose the safety properties SD for sub-processes, DV for valves, and PR for processes, to avoid unexpected mixture of different kinds of liquid in the pipeline network:
SD: It is forbidden that two or more sub-processes over a given pipe be simultaneously locked by different kinds of liquid.
DV: Whenever two or more sub-processes connecting to a valve are locked, the valve must be controlled appropriately with those sub-processes.
PR: Whenever a process is set, all its component sub-processes are locked.
The EVALPSN-based safety verification is carried out by verifying whether process requests with pipeline controllers contradict the safety properties expressed as EVALPSN statements. The following three steps are executed concretely:
(1) The safety properties for the pipeline network, which must be ensured when the network is interlocked, and some control methods for the network are translated into EVALPSN clauses, which are stored in advance as an EVALPSN Psc;
(2) The if-part of the process request (the current environment state) and the then-part of the process request (the part to be verified) are translated into EVALP clauses Pi and Pt, respectively;
(3) The EVALP clauses Pt are queried against the EVALPSN {Psc ∪ Pi}; if yes is returned, the request is assured and the defeasible deontic control is performed; otherwise it is not assured and nothing is done.
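The spirit of the SD safety property can be illustrated with a plain predicate check. This is not an EVALPSN or deontic-logic implementation — just the mutual-exclusion idea in ordinary Python, with toy sub-process names and pipe assignments.

```python
def request_permitted(request, locked, shares_pipe):
    """Grant a process request only if none of its sub-processes shares
    a pipe with a sub-process already locked for a different kind of
    liquid (safety property SD)."""
    for sd, liquid in request.items():
        for other_sd, other_liquid in locked.items():
            if shares_pipe(sd, other_sd) and liquid != other_liquid:
                return False  # would mix two liquids over one pipe
    return True

# Toy configuration: sub-processes on the same numbered pipe conflict.
pipe_of = {"SD1": 1, "SD2": 1, "SD7": 2}
shares = lambda a, b: pipe_of[a] == pipe_of[b]

locked = {"SD1": "beer"}
assert request_permitted({"SD7": "cip"}, locked, shares)      # different pipe
assert not request_permitted({"SD2": "cip"}, locked, shares)  # same pipe, other liquid
```

In the real system this yes/no answer is produced by the logic-programming inquiry of step (3), not by an explicit double loop, but the granted/denied outcomes coincide for the SD property.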
3. Implementation

With the EVALPSN statements formulated, we can construct an EVALPSN-based request control, as shown in Figure 2. At the center, the EVALPSN engine, prompted by the service polling mechanism at the far right, lists all the enabled process requests on the availability panel at the far left, based on the sensor readings that reflect the current status. Once the operator selects a process request, the valve actuators are enabled and the designated process occurs, which is then reflected in the EVALPSN engine, thus changing the process availability for subsequent operation selection.
Figure 2. EVALPSN-based request control
In particular, the EVALPSN statements are coded in a SIEMENS PLC in both ladder [6] and SCL [7], follow the IEC1499 standard, and consist of many function blocks (FB, FC) [8]. The final implementation contains control sub-modules in a PLC (Programmable Logic Controller) for the control logic and a Man Machine Interface (MMI) through which operators make process requests. Figure 3 shows the brewery system implemented in the MMI system with InTouch 8.0 [9].
Figure 3. Brewery system implemented in MMI

The lower part of Figure 3 is a Man Machine Interface from which operators make process requests. Each filtration and CIP process has a designated switch button, which also serves as an indicator with the following four displays, showing the different modes of granting or denying process requests from the operators:
1. Slow flash – The process is permissible and ready to be set.
2. Off – The process is forbidden to be set.
3. On – The process is set. This happens when the operator pushes a slow-flash button, which also initiates the designated process.
4. Quick flash – When a set process has finished, the quick flash prompts the operator to reset it by pushing the button to bring it back to “Off” mode. When a process is forbidden and is nonetheless mistakenly initiated, the quick flash reminds the operator to cancel the wrongly initiated process.
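The four display modes imply a small state machine for each button. The transition table below is our reading of the rules above, with hypothetical event names; the actual PLC code is in ladder and SCL, not Python.

```python
from enum import Enum

class Button(Enum):
    SLOW_FLASH = "permissible, ready to be set"
    OFF = "forbidden"
    ON = "set / running"
    QUICK_FLASH = "finished or wrongly initiated; press to reset"

def next_mode(mode, event):
    """Mode transitions implied by the four display rules above."""
    table = {
        (Button.SLOW_FLASH, "press"): Button.ON,      # operator starts process
        (Button.ON, "finished"): Button.QUICK_FLASH,  # prompt operator to reset
        (Button.QUICK_FLASH, "press"): Button.OFF,    # acknowledged / reset
        (Button.OFF, "permitted"): Button.SLOW_FLASH, # EVALPSN grants request
    }
    return table.get((mode, event), mode)  # unlisted events leave mode unchanged
```

The OFF → SLOW_FLASH transition is driven by the EVALPSN engine's availability computation rather than by the operator, which is why it is keyed to a "permitted" event here.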
4. Discussion

This paper addresses the concurrency control of filtration and CIP processes in a brewery plant in the framework of EVALPSN. Similar to the interlocking control mechanism adopted in railroad safety control, a concurrency table that details which process requests can be processed simultaneously is derived and implemented in an
industrial brewery plant. The main focus of this paper is on valve control. Future work includes the safety control of pumps, and the sequence control involved in the CIP process, which requires consideration of the timing of each sub-process operation.
References
[1] D. Boothroyd, “Control network has a lot of bottle,” IEEE Computing & Control Engineering Journal, Vol. 6, no. 2, pp. 79–81, April 1995.
[2] D. Troupis, S. Manesis, N. T. Koussoulas, T. Chronopoulos, “Computer integrated monitoring, fault identification and control for a bottling line,” Proc. of IEEE Industry Applications Conference: Thirtieth IAS Annual Meeting, IAS '95, vol. 2, pp. 1549–1556, Oct. 1995.
[3] K. Nakamatsu, J. M. Abe, and A. Suzuki, “Applications of EVALP Based Reasoning,” Logic, Artificial Intelligence Robotics, Frontiers in Artificial Intelligence and Applications Vol. 71, IOS Press, pp. 174–185, 2001.
[4] K. Nakamatsu, J. M. Abe, and A. Suzuki, “Annotated Semantics for Defeasible Deontic Reasoning,” Proc. Second International Conference on Rough Sets and Current Trends in Computing, LNAI 2005, Springer-Verlag, pp. 432–440, 2001.
[5] K. Nakamatsu, J. Nagashima, J. M. Abe, and A. Suzuki, “An Automated Safety Verification System for Railway Interlocking Based on Extended Vector Annotated Logic Programming,” Proc. 6th World Multi-conference on Systems, Cybernetics and Informatics, Vol. XIV, pp. 367–372, 2001.
[6] Siemens ladder language user manual, Siemens Co., 2003.
[7] Siemens SCL language user manual, Siemens Co., 2003.
[8] J. Thieme and H. M. Hanisch, “Model-based generation of modular PLC code using IEC61131 function blocks,” Proc. 2002 IEEE International Symposium on Industrial Electronics (ISIE 2002), vol. 1, pp. 199–204, July 2002.
[9] Wonderware 21CFR Part 11 InTouch 8.0 and Industrial SQL Server 8.0 Deployment Guide, Invensys Systems, Inc., Sep. 2002. http://www.wonderware.com/Products/appserver/Deploy.pdf
Decision Making based on Paraconsistent Annotated Logic Fábio Romeu de CARVALHO b,1, Israel BRUNSTEIN a,2 and Jair M. ABE a,b,3 a University of São Paulo, São Paulo, Brazil b Paulista University, UNIP, São Paulo, Brazil Abstract. This work shows a process of decision making based on a new kind of logic: Paraconsistent Annotated Logic (PAL). Choosing the factors that influence the success or failure of an enterprise and applying the PAL techniques, the Para-analyser Algorithm and the Baricenter Analysis Method, we obtain a sole result. Then we can decide whether the enterprise is viable or not viable, or whether the data are non-conclusive, with an established level of requirement. Key Words. Decision making, paraconsistent logic, para-analyzer algorithm.
Introduction
Recently, several kinds of non-classical logics have been proposed in order to handle uncertain and contradictory data without becoming trivial. One class of such logics, the paraconsistent annotated logics, can manipulate uncertain, inconsistent and paracomplete data. These logics have been applied successfully in several areas, e.g. in Robotics and Artificial Intelligence [1].
1. Paraconsistent Annotated Evidential Logic Eτ
The atomic formulae of the paraconsistent annotated logic Eτ are of the type p(μ1; μ2), where (μ1; μ2) ∈ [0, 1]², [0, 1] is the real unit interval with the usual order relation, and p denotes a propositional variable. The order relation defined on [0, 1]² is: (μ1; μ2) ≤ (λ1; λ2) iff μ1 ≤ λ1 and μ2 ≤ λ2. Such an ordered system constitutes a lattice, which will be symbolized by τ. p(μ1; μ2) can be intuitively read: "it is believed that p's belief degree (or favorable evidence) is μ1 and its disbelief degree (or contrary evidence) is μ2". The pair (μ1; μ2) is called an annotation constant. So we have some interesting readings: (1; 0) intuitively means total belief and no disbelief (p is a true proposition); (0; 1) means no belief and total disbelief (p is a false proposition); (1; 1) means total belief and total disbelief (p is an inconsistent proposition); (0; 0) means total absence of belief and disbelief (p is a paracomplete proposition); and (0.5; 0.5) can be read as an indefinite state [2]. There is a natural operator defined on [0, 1]²: ¬(μ1; μ2) = (μ2; μ1), which works as the "meaning" of the negation of Eτ [2]. We also have the operators (μ1; μ2) OR (λ1; λ2) = (max{μ1, λ1}; max{μ2, λ2}) and (μ1; μ2) AND (λ1; λ2) = (min{μ1, λ1}; min{μ2, λ2}). We introduce the following concepts (all considerations assume 0 ≤ μ1, μ2 ≤ 1): perfectly defined segment (CD): μ1 + μ2 − 1 = 0; perfectly undefined segment (AB): μ1 − μ2 = 0; contradiction (or uncertainty) degree: Gcontr(μ1; μ2) = μ1 + μ2 − 1; certainty degree: Hcert(μ1; μ2) = μ1 − μ2. The logical output states (extreme and non-extreme) consist of 12 states, according to Figure 1.
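The annotation operators and degrees just defined can be sketched directly in code. A minimal Python sketch (the function names are ours, not the paper's); an annotation is a pair (mu1, mu2) of belief and disbelief degrees in [0, 1]²:

```python
# Minimal sketch of the Eτ annotation operations (names are ours).

def neg(a):
    """Negation: swaps favorable and contrary evidence."""
    mu1, mu2 = a
    return (mu2, mu1)

def or_(a, b):
    """OR operator: component-wise maximum."""
    return (max(a[0], b[0]), max(a[1], b[1]))

def and_(a, b):
    """AND operator: component-wise minimum."""
    return (min(a[0], b[0]), min(a[1], b[1]))

def g_contr(a):
    """Contradiction (uncertainty) degree: mu1 + mu2 - 1."""
    return a[0] + a[1] - 1

def h_cert(a):
    """Certainty degree: mu1 - mu2."""
    return a[0] - a[1]

print(neg((1.0, 0.0)))      # (0.0, 1.0): truth is negated into falsity
print(h_cert((1.0, 0.0)))   # 1.0: total certainty
print(g_contr((1.0, 1.0)))  # 1.0: a totally inconsistent annotation
```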
Table 1. Output states and symbolization

  Description                                   Representation
  Indetermination or paracompleteness           ⊥
  Inconsistency                                 ⊤
  Truth                                         V
  Falsity                                       F
  Quasi-inconsistency tending to falsity        Q⊤→F
  Quasi-falsity tending to inconsistency        QF→⊤
  Quasi-falsity tending to indetermination      QF→⊥
  Quasi-indetermination tending to falsity      Q⊥→F
  Quasi-indetermination tending to truth        Q⊥→V
  Quasi-truth tending to indetermination        QV→⊥
  Quasi-truth tending to inconsistency          QV→⊤
  Quasi-inconsistency tending to truth          Q⊤→V
1.1. Rule of Decision
In Figure 1, the regions CPQ (region of truth) and DTU (region of falsity) may be called decision regions. The first corresponds to a favorable decision (viability) and the second to an unfavorable decision (non-viability). So we can write the decision rule [3].
Hcert ≥ 0.70 → favorable decision (viable enterprise);
Hcert ≤ −0.70 → unfavorable decision (not viable enterprise);
−0.70 < Hcert < 0.70 → non-conclusive.
In fact, if the point X(μ1; μ2) belongs to one of these regions, the decision will be made. We make a favorable decision (viability) if X belongs to region CPQ, or an unfavorable decision (non-viability) if it belongs to region DTU. In the example above we have taken |Hcert| = 0.70 as the borderline of truth and falsity. This means that the analysis will only be conclusive when |Hcert| ≥ 0.70. Therefore, 0.70 is the minimum value of |Hcert| for which the point falls in the region of truth or falsity, that is, for making a favorable or unfavorable decision. That is why it is called the Level of Requirement (Lreq) of the decision [3]. Under these conditions, decisions are taken with a minimum of 70% certainty. Hence the Level of Requirement (Lreq) is defined as the minimum value adopted for the modulus of the degree of certainty for which a decision (favorable or unfavorable) is made. Of course, the level of requirement depends on the safety one wants to have in the decision, which in turn depends on the responsibility it implies, the investment at stake, whether human lives are at risk, etc. It is easy to observe that the larger the level of requirement, the smaller the decision regions will be. In a more generic way, the rule of decision can be written:
Hcert ≥ Lreq → favorable decision (viable enterprise);
Hcert ≤ −Lreq → unfavorable decision (not viable enterprise);
−Lreq < Hcert < Lreq → non-conclusive.
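The rule of decision is straightforward to mechanize. A small Python sketch (the function name is ours; 0.70 is the paper's example level of requirement):

```python
# Sketch of the rule of decision with a configurable level of requirement.

def decide(h_cert, level_of_requirement=0.70):
    """Classify a certainty degree against the level of requirement Lreq."""
    if h_cert >= level_of_requirement:
        return "favorable (viable enterprise)"
    if h_cert <= -level_of_requirement:
        return "unfavorable (not viable enterprise)"
    return "non-conclusive"

print(decide(0.84))   # favorable (viable enterprise)
print(decide(-0.81))  # unfavorable (not viable enterprise)
print(decide(0.05))   # non-conclusive
```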
2. Application: Viability Analysis for the Implementation of a Manufacturing System with Advanced Technologies
The problem most businessmen and entrepreneurs face when the machinery in their offices or factories becomes outdated or needs to be changed is the following: shall we keep the productive manufacturing process and only replace the old machines with new ones of the same kind, or shall we innovate and replace the manufacturing system with a new one based on advanced technologies (new machinery, new techniques, new processes, etc.)? [8] If the alternative is to introduce new technologies, there is still a doubt: which technological innovation is the most appropriate? There are many options for manufacturing systems using advanced technologies, each with its advantages and disadvantages in relation to the previous traditional system. These advantages and disadvantages are connected to strategic factors and to economic and operational factors, some of a qualitative nature and some of a quantitative nature. These factors, in turn, are related to the amount of capital to be invested and to the operational and financial results of such investments. [9] So our problem is to find out, by analyzing the influence of such factors, whether or not there are advantages in replacing the old manufacturing system with traditional technology by a new manufacturing system with advanced technologies, and which of those systems is best suited for the case.
Presently, there are several technologically advanced manufacturing systems that can be introduced into the daily routine of a factory, such as: CAD/CAM – Computer-Aided Design and Manufacturing; GT/CM – Group Technology and Cellular Manufacturing; RE – Robotics Equipment; FMS – Flexible Manufacturing Systems; AA – Automatic Assembling; CIM – Computer Integrated Manufacturing. [10] On the other hand, there are many features (factors or indicators) whose performance can influence the results of implementing these innovations, and which may or may not bring advantages in relation to the traditional process being used. A comparative analysis of those features (indicators) in the new and old systems will define the viability or not of replacing the old system with the new one. Let us see a list of those factors (indicators), divided by class. Factors related to the company's strategic objectives: technology reputation, market share, competitive position and product innovation. Quantitative and qualitative factors, economic or operational, all related to the amount to be invested: product heterogeneity, number of manufactured items, payback period, net present value (NPV), future operating costs, residual values, useful life, real-time measurements, delivery dates, product reliability, time of response, economy in direct labor, creation financing, factory floor space, additional indirect labor, product waste, guarantee rights, replacement period, preparations, reprocessing costs, etc.
2.1. Performance Coefficient
First, we define a number to translate the performance of a new manufacturing system with advanced technologies compared with the old one, for a specific factor of influence. Let I0 and I be the values of the chosen indicator in the old system and in the new one, respectively. For this indicator, we define the performance coefficient of the new system compared with the old one as: PC = 1 ± (ΔI / I0), where ΔI = I − I0.
The sign ± is interpreted as follows: if the system performance improves when I increases, one must use the sign +; if it improves when I decreases, one must use the sign −. That is, we use the sign + when the performance (P) is an increasing function of I, and the sign − when P is a decreasing function of I.
2.2. Establishing the sections for the factors of influence
For each indicator we establish five sections (R1 to R5), so that R1 represents a much better situation of the new system with advanced technologies compared with the old one; R2, a better situation; R3, an indifferent situation; R4, a worse situation; and R5, a much worse situation. We say that the situation of the new system with advanced technologies compared with the old one is much better when the performance coefficient (PC) is greater than 1.30. Hence section R1 is characterized by PC > 1.30. Similarly, we characterize all the sections as follows: R1: PC > 1.30 (the new system is much better than the old one); R2: 1.10 < PC ≤ 1.30 (the new system is better than the old one); R3: 0.90 ≤ PC ≤ 1.10 (the new system is equivalent to the old one); R4: 0.70 ≤ PC < 0.90 (the new system is worse than the old one); R5: PC < 0.70 (the new system is much worse than the old one).
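The performance coefficient and the section classification can be sketched as follows (a minimal illustration; the helper names and the sample cost figures are ours):

```python
# Sketch of PC = 1 ± (ΔI / I0) and the section boundaries R1..R5.

def performance_coefficient(i_new, i_old, increasing=True):
    """'+' when performance grows with I, '-' when it grows as I decreases."""
    delta = i_new - i_old
    return 1 + delta / i_old if increasing else 1 - delta / i_old

def section(pc):
    """Classify a performance coefficient into sections R1..R5."""
    if pc > 1.30:
        return "R1"
    if pc > 1.10:
        return "R2"   # 1.10 < PC <= 1.30
    if pc >= 0.90:
        return "R3"   # 0.90 <= PC <= 1.10
    if pc >= 0.70:
        return "R4"   # 0.70 <= PC < 0.90
    return "R5"       # PC < 0.70

# Hypothetical example: future operating costs drop from 100 to 60, and
# performance improves as this indicator decreases.
pc = performance_coefficient(60, 100, increasing=False)
print(pc, section(pc))  # 1.4 R1
```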
Some factors are not measurable, for example the strategic and qualitative attributes. These factors (or indicators) cannot be translated by a value I, and so it is not possible to define the performance coefficient for them. In that case, the fitting of the factor into a section will be done by an expert, using qualitative data together with his feeling and experience.
2.3. Factors of influence
Below is a list of factors of influence that may (or may not) be used in the viability analysis of a new manufacturing system with advanced technologies; whether a factor is used depends on the system being analyzed. For some systems a factor may be important; for others, not. Hence the importance of a factor is relative. The more important factors will be used in the analysis; the less important ones, or those with no importance, will not. Factors related to the company's strategic objectives are not measurable. They are almost intangible, so they can only be placed in a section by experienced experts [10]. F01 – technological reputation; F02 – company's market share; F03 – company's competitive position in the market; F04 – product innovation made by the company. Factors related to the company's economic and operational results: among such factors, some are measurable and some are not. The first can be placed in a section through quantitative criteria, while the latter can only be classified by experienced and specialized professionals [10].
F05 – total investment; F06 – total expenses; F07 – net present value (NPV); F08 – payback period; F09 – residual values; F10 – creation financing; F11 – product heterogeneity; F12 – product reliability; F13 – system's useful life; F14 – system flexibility; F15 – future operating costs; F16 – direct labor costs; F17 – reprocessing costs; F18 – additional indirect labor costs; F19 – material costs; F20 – capital investment costs; F21 – real-time measurements; F22 – replacement period; F23 – delivery period; F24 – response period; F25 – preparation period; F26 – machinery use period; F27 – waiting period; F28 – factory floor space; F29 – number of manufactured items; F30 – guarantee claims (or rights); F31 – rejects; F32 – wastes.
2.4. Database construction
The database is built with the degrees of belief (or favorable evidence) and the degrees of disbelief (or contrary evidence) that experts attribute to all factors (or indicators) in each of the five established sections. All experts, in agreement, attribute weights to each of the factors (or indicators) for the new manufacturing system with advanced technologies under analysis. The determination of these weights may have restrictions, such as requiring the weights to be whole numbers in the interval [1; 10]. It is convenient that the experts engaged to build the database have different and complementary backgrounds, so that the different aspects of the problem are taken into consideration. For instance, let us consider for this task a group of four experts: Expert 1 – a production engineer (technical); Expert 2 – a marketing executive; Expert 3 – a finance executive; and Expert 4 – an administrator. Table 2 summarizes the degrees of belief and disbelief attributed to the factors by the experts in each of the five sections. This is our database. Here we present only a part of it.
Table 2: Database (degrees of belief and disbelief attributed by the experts to the factors in each section). Excerpt for factor F01; each cell gives (μ1k; μ2k), the belief and disbelief degrees of Expert k:

  F    S    Expert 1     Expert 2     Expert 3     Expert 4
  F01  R1   1.00; 0.00   0.90; 0.10   1.00; 0.10   0.90; 0.00
  F01  R2   0.70; 0.20   0.80; 0.30   0.80; 0.20   0.70; 0.30
  F01  R3   0.50; 0.50   0.60; 0.50   0.60; 0.40   0.50; 0.40
2.5. Application of the Baricenter Analysis Method (BAM)
We now show an application to the viability study of an FMS implementation. We will consider the following influence factors, without discussing the reason for their choice. Suppose that all experts, in agreement, have attributed to each factor, according to its importance in decision making, the weights (P) in parentheses on the left of each one, on a scale of 1 to 10 (see column 2 of Table 3): (5) F01 – technological reputation; (4) F02 – company's market share; (6) F03 – company's competitive position in the market; (4) F04 – product innovation made by the company; (10) F07 – net present value (NPV); (5) F08 – payback period; (3) F11 – product heterogeneity; (3) F12 – product reliability; (1) F18 – additional indirect labor costs; (1) F19 – material costs; (2) F20 – capital investment costs; (1) F24 – response period; (2) F25 – preparation period; (3) F29 – number of manufactured items. Surveys conducted by specialists, as well as research on companies that have adopted and are currently using FMS, show in which section each of those factors is placed (see column 3 of Table 3). In order to apply the maximization and minimization techniques of Paraconsistent Annotated Logic, the experts are placed in two groups: Group A, formed by Experts 1 and 2 (production engineer and marketing executive, respectively), and Group B, Experts 3 and 4 (finance executive and administrator, respectively). Thus the application of the operators OR (maximization) and AND (minimization) is the following: [(Expert 1) OR (Expert 2)] AND [(Expert 3) OR (Expert 4)]. For decision making, let us assume that the level of requirement is equal to 0.60. Consequently, the rule of decision is the following (Table 3: Analysis of results through application of the rule of decision):
Hcert ≥ 0.60 → favorable decision (viable enterprise);
Hcert ≤ −0.60 → unfavorable decision (not viable enterprise);
−0.60 < Hcert < 0.60 → non-conclusive.
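The grouping scheme [(E1 OR E2) AND (E3 OR E4)] can be sketched as follows (a minimal illustration; the function names are ours, and the sample degrees are the four experts' values for factor F01 in section R1):

```python
# Sketch of the maximization/minimization analysis applied to one factor.
# Each expert opinion is an annotation (belief, disbelief).

def or_(a, b):
    """OR (maximization): component-wise maximum."""
    return (max(a[0], b[0]), max(a[1], b[1]))

def and_(a, b):
    """AND (minimization): component-wise minimum."""
    return (min(a[0], b[0]), min(a[1], b[1]))

def analyse(e1, e2, e3, e4, level_of_requirement=0.60):
    """[(E1 OR E2) AND (E3 OR E4)], then the rule of decision."""
    group_a = or_(e1, e2)
    group_b = or_(e3, e4)
    mu1, mu2 = and_(group_a, group_b)
    h_cert = mu1 - mu2
    if h_cert >= level_of_requirement:
        return h_cert, "Viable"
    if h_cert <= -level_of_requirement:
        return h_cert, "Not viable"
    return h_cert, "Non-conclusive"

# Factor F01 (section R1) with the four experts' degrees:
print(analyse((1.00, 0.00), (0.90, 0.10), (1.00, 0.10), (0.90, 0.00)))
# (0.9, 'Viable')
```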
Table 3: Analysis of results through application of the rule of decision (Lreq = 0.600). Columns 4-11 give the experts' degrees of belief and disbelief (μ1k; μ2k); columns 12-15 the group results A = E1 OR E2 and B = E3 OR E4; columns 16-17 the resultant A AND B (μ1R; μ2R); column 18 the degree of certainty Hcert; column 20 the conclusion.

  F    P   S    Expert 1     Expert 2     Expert 3     Expert 4     A=E1 OR E2   B=E3 OR E4   A AND B      Hcert  Conclusion
  F01  5   R1   1.00; 0.00   0.90; 0.10   1.00; 0.10   0.90; 0.00   1.00; 0.10   1.00; 0.10   1.00; 0.10    0.90  Viable
  F02  4   R2   0.75; 0.25   0.85; 0.25   0.85; 0.30   0.73; 0.35   0.85; 0.25   0.85; 0.35   0.85; 0.25    0.60  Viable
  F03  6   R1   0.92; 0.08   0.98; 0.18   0.88; 0.12   0.82; 0.07   0.98; 0.18   0.88; 0.12   0.88; 0.12    0.76  Viable
  F04  4   R2   0.70; 0.26   0.86; 0.30   0.80; 0.21   0.66; 0.31   0.86; 0.30   0.80; 0.31   0.80; 0.30    0.50  Non-conclusive
  F07  10  R1   0.95; 0.15   1.00; 0.10   0.85; 0.00   1.00; 0.05   1.00; 0.15   1.00; 0.05   1.00; 0.05    0.95  Viable
  F08  5   R1   0.98; 0.18   0.88; 0.12   0.82; 0.07   0.92; 0.08   0.98; 0.18   0.92; 0.08   0.92; 0.08    0.84  Viable
  F11  3   R2   0.86; 0.30   0.80; 0.21   0.66; 0.31   0.70; 0.26   0.86; 0.30   0.70; 0.31   0.70; 0.30    0.40  Non-conclusive
  F12  3   R1   0.94; 0.14   0.84; 0.08   0.78; 0.03   0.88; 0.04   0.94; 0.14   0.88; 0.04   0.88; 0.04    0.84  Viable
  F18  1   R3   0.57; 0.48   0.62; 0.43   0.52; 0.45   0.52; 0.47   0.62; 0.48   0.52; 0.47   0.52; 0.47    0.05  Non-conclusive
  F19  1   R5   0.01; 0.94   0.13; 0.88   0.14; 1.00   0.17; 0.91   0.13; 0.94   0.17; 1.00   0.13; 0.94   -0.81  Not viable
  F20  2   R2   0.47; 0.43   0.52; 0.44   0.57; 0.39   0.47; 0.41   0.52; 0.44   0.57; 0.41   0.52; 0.41    0.11  Non-conclusive
  F24  1   R4   0.14; 0.86   0.19; 0.93   0.18; 0.02   0.21; 0.95   0.19; 0.93   0.21; 0.95   0.19; 0.93   -0.74  Not viable
  F25  2   R2   0.88; 0.04   0.94; 0.14   0.84; 0.08   0.78; 0.03   0.94; 0.14   0.84; 0.08   0.84; 0.08    0.76  Viable
  F29  3   R1   0.97; 0.90   0.03; 0.12   0.92; 0.87   0.01; 0.02   0.97; 0.90   0.92; 0.87   0.92; 0.87    0.05  Non-conclusive

  Baricenter G (weighted averages of the resultant degrees, ΣP = 50):  μ1G = 0.85;  μ2G = 0.23;  Hcert = 0.62  Viable
With the assistance of an Excel program, we look up in the database (Table 2) the experts' opinions, i.e., the degrees of belief and disbelief, thus obtaining columns 4 to 11 of Table 3. The program then applies to each factor the maximization and minimization techniques of PAL, obtaining the resultant degrees of belief and disbelief (columns 16 and 17), which enables us to calculate the degree of certainty of each factor (column 18). With such a degree of certainty, within the established level of requirement (0.60), the program applies the decision rule and concludes whether the factor contributes to the viability or non-viability of the enterprise, or whether it is a non-conclusive factor (column 20).
2.6. Analysis of Results
We observe seven factors favorable to the enterprise, two unfavorable, and five non-conclusive, all in accordance with the established level of requirement (0.60). However, the disparate influences of those factors on the viability decision of the enterprise can be summarized by the center of gravity, or baricenter (G), of the points representing the factors of influence [3]. Hence, in order to arrive at a final and global analysis taking into account the combined influence of all factors, the program calculates the degrees of belief and disbelief of the baricenter (G). These are obtained as the weighted averages of the resultant degrees of belief and disbelief of all factors. With the baricenter's degrees of belief and disbelief (last line of columns 16 and 17), its degree of certainty is calculated, a sole result (column 18), which leads to the final decision (column 20): the FMS implementation is "VIABLE" for the established level of requirement (0.60). The baricenter's degrees of belief (μ1G = 0.85) and disbelief (μ2G = 0.23) permit calculating its degree of certainty: Hcert G = μ1G − μ2G = 0.85 − 0.23 = 0.62.
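The baricenter computation can be sketched directly from the factor weights and resultant degrees reported in Table 3 (a minimal illustration; variable names are ours):

```python
# Baricenter (center of gravity) as weighted averages of the resultant
# belief/disbelief degrees. Tuples are (weight P, mu1R, mu2R) per factor.

factors = [
    (5, 1.00, 0.10), (4, 0.85, 0.25), (6, 0.88, 0.12), (4, 0.80, 0.30),
    (10, 1.00, 0.05), (5, 0.92, 0.08), (3, 0.70, 0.30), (3, 0.88, 0.04),
    (1, 0.52, 0.47), (1, 0.13, 0.94), (2, 0.52, 0.41), (1, 0.19, 0.93),
    (2, 0.84, 0.08), (3, 0.92, 0.87),
]

total_weight = sum(w for w, _, _ in factors)                          # 50
mu1_g = round(sum(w * m1 for w, m1, _ in factors) / total_weight, 2)  # 0.85
mu2_g = round(sum(w * m2 for w, _, m2 in factors) / total_weight, 2)  # 0.23
h_cert_g = round(mu1_g - mu2_g, 2)                                    # 0.62

print(mu1_g, mu2_g, h_cert_g)  # 0.85 0.23 0.62 -> favorable, since 0.62 >= 0.60
```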
Since 0.62 ≥ 0.60, by applying the rule of decision we come to a favorable decision. The analysis made by the rule of decision, individually for each factor or globally by the baricenter, can also be performed with the para-analyzer algorithm in Figure 2,
where we notice that seven factors (points) are in the region of truth (suggesting the FMS implementation); two are in the region of falsity (suggesting the non-implementation of FMS); and the other five are in the "limbo" region, thus non-conclusive. One factor (point) is in the region of inconsistency, showing that for this factor the experts' opinions present a high degree of inconsistency (they are highly contradictory).
Figure 2: Resulting analysis made by the para-analyzer algorithm (degree of disbelief plotted against degree of belief, showing the factors, the baricenter, the lattice outline, and the central and diagonal divisions).
References
[1] J.M. Abe, Some Aspects of Paraconsistent Systems and Applications, Logique et Analyse, 157, 83-96, 1997.
[2] J.M. Abe, Fundamentos da Lógica Anotada (Foundations of Annotated Logic) (in Portuguese), Ph.D. Thesis, University of São Paulo, São Paulo, 1992.
[3] F.R. de Carvalho, Lógica Paraconsistente Aplicada em Tomadas de Decisão: uma abordagem para a administração de universidades (Applied Paraconsistent Logic in Decision Making: an approach for university management) (in Portuguese), Editora Aleph, São Paulo, Brazil, 2002.
[4] J.I. da Silva Filho & J.M. Abe, Paraconsistent analyser module, International Journal of Computing Anticipatory Systems, vol. 9, ISSN 1373-5411, ISBN 2-9600262-1-7, 346-352, 2001.
[5] N.C.A. da Costa, C. Vago & V.S. Subrahmanian, The Paraconsistent Logics Pτ, Zeitschrift für mathematische Logik und Grundlagen der Mathematik, Bd. 37, pp. 139-148, 1991.
[6] N.C.A. da Costa, J.M. Abe & V.S. Subrahmanian, Remarks on annotated logic, Zeitschrift für mathematische Logik und Grundlagen der Mathematik, 37, pp. 561-570, 1991.
[7] J.I. da Silva Filho & J.M. Abe, Manipulating Conflicts and Uncertainties in Robotics, Multiple-Valued Logic and Soft Computing, V.9, ISSN 1542-3980, 147-169, 2003.
[8] S. Woiler & W.F. Mathias, Projetos: Planejamento, Elaboração e Análise (Projects: Planning, Elaboration and Analysis), Editora Atlas, São Paulo, Brazil, 1996.
[9] M. Gaither & G. Frazier, Administração da Produção e Operações (Production and Operations Management), Ed. Pioneira, São Paulo, Brazil, 2001.
[10] P. Chalos, Managing Cost in Today's Manufacturing Environment, Dept. of Accounting, University of Illinois, Chicago, USA, 1991.
Intelligent Safety Verification for Pipeline Based on EVALPSN Kazumi Nakamatsu a, Kenji Kawasumi b and Atsuyuki Suzuki b a University of Hyogo, HIMEJI 670-0092 JAPAN [email protected] b Shizuoka University, HAMAMATSU 432-8011 JAPAN {cs0029,suzuki}@cs.inf.shizuoka.ac.jp Abstract. We have developed an annotated logic program called an Extended Vector Annotated Logic Program with Strong Negation (abbr. EVALPSN), which can deal with defeasible deontic reasoning and contradiction. We have already applied EVALPSN to safety verification and control, such as railway interlocking safety verification. In this paper, we show pipeline valve safety verification to avoid liquid-mixture accidents, with a simple example of brewery pipeline control. Keywords. pipeline valve control, safety verification, defeasible deontic reasoning, EVALPSN
1. Introduction
We have developed an annotated logic program called an EVALPSN (Extended Vector Annotated Logic Program with Strong Negation) in order to deal with defeasible deontic reasoning and contradictions [3], and we have shown that EVALPSN can be applied to automated safety verification [6,5] and to some kinds of control, such as robot action control and traffic signal control [4,7]. Safety verification for pipeline valve control is a crucial issue for avoiding unexpected accidents such as dangerous liquid mixtures. In fact, different kinds of liquid, such as acid and caustic soda, are used in various processes in chemical plants, and the mixture of different kinds of liquid has to be strictly avoided by controlling valves safely. In this paper, we introduce a formal method for safety verification of pipeline valve control based on EVALPSN, with a simple example of brewery pipeline control.
2. EVALPSN
Generally, a truth value called an annotation is explicitly attached to each literal in annotated logic programs. For example, let p be a literal and µ an annotation; then p : µ is called an annotated literal. The set of annotations constitutes a complete lattice. An annotation in VALPSN [1], which can deal with defeasible reasoning, is
Figure 1. Lattice Tv (n = 2) and Lattice Td
a 2-dimensional vector called a vector annotation, such that each component is a non-negative integer, and the complete lattice Tv of vector annotations is defined as: Tv = { (x, y) | 0 ≤ x ≤ n, 0 ≤ y ≤ n, x, y and n are integers }. The ordering of the lattice Tv is denoted by a symbol ⪯ and defined: let v1 = (x1, y1) ∈ Tv and v2 = (x2, y2) ∈ Tv; then v1 ⪯ v2 iff x1 ≤ x2 and y1 ≤ y2. For each vector annotated literal p : (i, j), the first component i of the vector annotation denotes the amount of positive information supporting the literal p, and the second one, j, denotes that of negative information. For example, a vector annotated literal p : (2, 1) can be intuitively interpreted as: the literal p is known to be true with strength 2 and false with strength 1. In order to deal with defeasible deontic reasoning, we extended VALPSN to EVALPSN. An annotation in EVALPSN, called an extended vector annotation, has the form [(i, j), µ], where the first component (i, j) is a 2-dimensional vector annotation as in VALPSN and the second one, µ ∈ Td = {⊥, α, β, γ, ∗1, ∗2, ∗3, ⊤}, is an index that represents a deontic notion or inconsistency. The complete lattice Te of extended vector annotations is defined as the product Tv × Td. The ordering of the lattice Td is denoted by a symbol ⪯d and described by the Hasse diagram in Figure 1. The intuitive meaning of each member of the lattice Td is: ⊥ (unknown), α (fact), β (obligation), γ (non-obligation), ∗1 (both fact and obligation), ∗2 (both obligation and non-obligation), ∗3 (both fact and non-obligation) and ⊤ (inconsistency). Therefore, EVALPSN can deal with inconsistency not only between items of ordinary knowledge but also between permission and forbiddance, obligation and forbiddance, and fact and forbiddance.
The Hasse diagram (cube) shows that the lattice Td is a tri-lattice in which the direction from γ to β represents deontic truth, the direction from ⊥ to ∗2 represents the amount of deontic knowledge, and the direction from ⊥ to α represents factuality. Therefore, for example, the annotation β can be intuitively interpreted as deontically truer than the annotation γ, and the annotations ⊥ and ∗2 are deontically neutral, i.e., neither obligation nor non-obligation. The ordering over the lattice Te is denoted by a symbol ⪯ and defined as: let [(i1, j1), µ1] and [(i2, j2), µ2] be extended vector annotations; then [(i1, j1), µ1] ⪯ [(i2, j2), µ2] iff (i1, j1) ⪯v (i2, j2) and µ1 ⪯d µ2. There are two kinds of epistemic negation, ¬1 and ¬2, in EVALPSN, defined as mappings over Tv and Td, respectively.
Definition 1 (Epistemic Negations ¬1 and ¬2)
¬1([(i, j), µ]) = [(j, i), µ], ∀µ ∈ Td;
¬2([(i, j), ⊥]) = [(i, j), ⊥], ¬2([(i, j), α]) = [(i, j), α], ¬2([(i, j), β]) = [(i, j), γ], ¬2([(i, j), γ]) = [(i, j), β], ¬2([(i, j), ∗1]) = [(i, j), ∗3], ¬2([(i, j), ∗2]) = [(i, j), ∗2], ¬2([(i, j), ∗3]) = [(i, j), ∗1], ¬2([(i, j), ⊤]) = [(i, j), ⊤].
These epistemic negations, ¬1 and ¬2, can be eliminated by the above syntactic operations. On the other hand, the ontological negation (strong negation, ∼) in
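Extended vector annotations and the two epistemic negations can be given a toy encoding. A hedged Python sketch (the string names for Td members, and the encoding of the cube Td as subsets with ∗1 = α∨β, ∗2 = β∨γ, ∗3 = α∨γ, are our own reconstruction, not the paper's):

```python
# Td encoded as a cube of subsets: bot = {}, top = {a, b, g};
# the subset order reproduces the Hasse diagram ordering.
TD = {"bot": set(), "alpha": {"a"}, "beta": {"b"}, "gamma": {"g"},
      "*1": {"a", "b"}, "*2": {"b", "g"}, "*3": {"a", "g"},
      "top": {"a", "b", "g"}}

# Definition 1: the action of ¬2 on the deontic index.
NEG2 = {"bot": "bot", "alpha": "alpha", "beta": "gamma", "gamma": "beta",
        "*1": "*3", "*2": "*2", "*3": "*1", "top": "top"}

def neg1(ann):
    """¬1: swap the vector components, keep the deontic index."""
    (i, j), mu = ann
    return ((j, i), mu)

def neg2(ann):
    """¬2: keep the vector, exchange obligation and non-obligation."""
    vec, mu = ann
    return (vec, NEG2[mu])

def leq(a, b):
    """Ordering over Te = Tv x Td: component-wise on vectors, subset on Td."""
    (i1, j1), m1 = a
    (i2, j2), m2 = b
    return i1 <= i2 and j1 <= j2 and TD[m1] <= TD[m2]

print(neg1(((2, 0), "beta")))                   # ((0, 2), 'beta')
print(neg2(((0, 1), "beta")))                   # ((0, 1), 'gamma')
print(leq(((1, 0), "alpha"), ((2, 0), "*1")))   # True
```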
Figure 2. Pipeline Example
EVALPSN can be defined by the epistemic negations, ¬1 or ¬2, and interpreted as classical negation.
Definition 2 (Strong Negation) ∼F =def F → ((F → F) ∧ ¬(F → F)), where F is a formula and ¬ is ¬1 or ¬2.
Definition 3 (Well Extended Vector Annotated Literal) Let p be a literal. p : [(i, 0), µ] and p : [(0, j), µ] are called well extended vector annotated literals (weva-literals for short), where i, j ∈ {1, 2} and µ ∈ {α, β, γ}.
Definition 4 (EVALPSN) If L0, ..., Ln are weva-literals, L1 ∧ ... ∧ Li ∧ ∼Li+1 ∧ ... ∧ ∼Ln → L0 is called an Extended Vector Annotated Logic Program clause with Strong Negation (EVALPSN clause for short). An Extended Vector Annotated Logic Program with Strong Negation is a finite set of EVALPSN clauses. Deontic notions and facts are represented by extended vector annotations in EVALPSN as follows: "fact of strength m" is represented by an extended vector annotation [(m, 0), α]; "obligation of strength m" by [(m, 0), β]; "forbiddance of strength m" by [(0, m), β]; "permission of strength m" by [(0, m), γ]; where m is a positive integer. Therefore, for example, a weva-literal p : [(2, 0), α] can be intuitively interpreted as "it is known that the literal p is a fact of strength 2", and a weva-literal q : [(0, 1), β] as "the literal q is forbidden with strength 1".
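The deontic readings just listed can be sketched as small constructors (a minimal illustration; the helper names are ours, not the paper's):

```python
# Constructors for the deontic annotations of EVALPSN.

def fact(m):        return ((m, 0), "alpha")  # "fact of strength m"
def obligation(m):  return ((m, 0), "beta")   # "obligation of strength m"
def forbiddance(m): return ((0, m), "beta")   # "forbiddance of strength m"
def permission(m):  return ((0, m), "gamma")  # "permission of strength m"

# p:[(2,0),alpha] -- "it is known that p is a fact of strength 2"
print(fact(2))         # ((2, 0), 'alpha')
# q:[(0,1),beta] -- "q is forbidden with strength 1"
print(forbiddance(1))  # ((0, 1), 'beta')
```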
3. EVALPSN Safety Verification for Brewery Pipelines
This section introduces EVALPSN-based safety verification for valve control in a brewery pipeline network.
[ Brewery Pipeline Network ]
We take the pipeline network described in Figure 2 as an example for pipeline control based on EVALPSN safety verification. In Figure 2, arrows indicate the directions of liquid flow, home-plate figures indicate tanks, and cross figures indicate valves. In the pipeline network we have the physical entities: four tanks TA = {T0, T1, T2, T3}; five pipes PI = {Pi0, Pi1, Pi2, Pi3, Pi4} (a pipe is a pipeline including neither valves nor tanks); two valves VA = {V0, V1}; and the logical entities: four processes
Figure 3. Normal and Cross Directions
Figure 4. Controlled Mixture and Separate
PR = {Pr0, Pr1, Pr2, Pr3}. Processes are defined as sets of sub-processes and valves; there are five sub-processes SPR = {SPr0, SPr1, SPr2, SPr3, SPr4}. Each entity has logical or physical states as follows. Sub-processes have two states, locked (l) and free (f): "the sub-process is locked" means that the sub-process is supposed to be interlocked (logically reserved) by beer or some kind of cleaning liquid, and "free" means unlocked. Processes have two states, set (s) and unset (xs): "the process is set" means that all the sub-processes in the process are locked, and "unset" means not set. Here we assume that valves in the network can control two liquid flows, in the normal and cross directions, as shown in Figure 3. Valves have two controlled states: controlled mixture (cm), which means that the valve is controlled to mix the liquid flows in the normal and cross directions, and controlled separate (cs), which means that the valve is controlled to separate the liquid flows in the normal and cross directions, as shown in Figure 4. We suppose that there are five kinds of cleaning liquid: cold water (cw), warm water (ww), hot water (hw), nitric acid (na) and caustic soda (cs). Then we consider the following four processes in the pipeline network:
Pr0, a brewery process: the tank T0 to the valve V0 (cs) to the tank T1;
Pr1 and Pr2, cleaning processes by nitric acid and cold water: the tank T2 to the valve V1 (cm) to the valve V1 (cs) to the tank T3;
Pr3, a brewery process with mixing: the tank T0 to the valve V0 (cs) to the tank T1, and the tank T2 to the valve V1 (cm) to the valve V1 (cs) to the tank T3.
In order to verify the safety of the above processes, the pipeline controller issues a process request, consisting of an if-part and a then-part, before the process starts. The if-part describes the current environmental state of the pipelines that are provided to the process, and the then-part describes the permission for processing the process.
We also suppose the process schedule chart for the processes Pr0,1,2,3 in Figure 5 as an example.

[ Pipeline Safety Property ] We introduce the safety properties for the pipeline valve control, SPr (for sub-processes), Val (for valves), and Pr (for processes), for avoiding unexpected mixture of different kinds of liquid in the pipeline network.

SPr : It is a forbidden case that the sub-process over a given pipe is simultaneously locked by different kinds of liquid.
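The property SPr can be checked mechanically. A minimal sketch (our encoding, not the paper's annotated literals) treats the current lock of a pipe as a value that is `None` when free, derives forbiddance when a different liquid already holds the lock, and treats permission as the absence of derivable forbiddance:

```python
def lock_forbidden(current, liquid):
    """SPr as a forbiddance rule: locking is forbidden when the sub-process is
    already locked by a different kind of liquid (None means free)."""
    return current is not None and current != liquid

def lock_permitted(current, liquid):
    """Permission is simply the absence of derivable forbiddance."""
    return not lock_forbidden(current, liquid)
```

For instance, a pipe locked by beer `"b"` forbids locking by nitric acid `"na"` but permits re-locking by beer.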
K. Nakamatsu et al. / Intelligent Safety Verification for Pipeline Based on EVALPSN
[Figure 5 (a process schedule chart for the brewery processes Pr0, Pr3 and the cleaning processes Pr1, Pr2) appeared here.]
Figure 5. Process Schedule Chart
Val : It is a forbidden case that valves are controlled to mix different kinds of liquid.

Pr : Whenever a process is set, all its component sub-processes are locked and all its component valves are controlled consistently.

[ Pipeline Safety Verification in EVALPSN ] First of all, in order to translate the safety properties for the pipeline network into EVALPSN, we have to define some predicates used in the EVALPSN safety verification.

Pr(i, l) represents that the process i for the liquid l is set (s) or unset (xs), where i ∈ {p0, p1, p2, p3} is a process id corresponding to the processes Pr0,1,2,3 and l ∈ {b, cw, ww, hw, na, cs} is a kind of liquid, and we have an EVALPSN clause Pr(i, l) : [µ1, µ2], where µ1 ∈ Tv1 = {⊥1, s, xs, ⊤1} and µ2 ∈ Td = {⊥, α, β, γ, ∗1, ∗2, ∗3, ⊤}. The complete lattice Tv1 is a variant of the complete lattice Tv. Therefore the annotations ⊥1, s, xs and ⊤1 stand for the vector annotations (0, 0), (1, 0), (0, 1) and (1, 1), respectively. The epistemic negation ¬1 over Tv1 is defined as the following mapping: ¬1([⊥1, µ2]) = [⊥1, µ2], ¬1([s, µ2]) = [xs, µ2], ¬1([xs, µ2]) = [s, µ2] and ¬1([⊤1, µ2]) = [⊤1, µ2].

SPr(i, j, l) represents that the sub-process from the valve i (or the tank i) to the valve j (or the tank j) occupied by the liquid l is locked (l) or free (f). Moreover, if a sub-process is free, the kind of liquid in the pipe does not matter, and the liquid is represented by the symbol "0" (zero). Therefore we have l ∈ {b, cw, ww, hw, na, cs, 0}, and i, j ∈ {v0, v1, t0, t1, t2, t3} are valve and tank ids corresponding to the valves V0,1 and the tanks T0,1,2,3. Then we have an EVALPSN clause SPr(i, j, l) : [µ1, µ2], where the epistemic negation ¬1 over Tv2 = {⊥2, l, f, ⊤2} is defined in the same way as the mapping over Tv1.

Val(i, ln, lc) represents that the valve i occupied by the two kinds of liquid ln, lc ∈ {b, cw, ww, hw, na, cs, 0} is controlled separate (cs) or mixture (cm), where i ∈ {v0, v1} is a valve id.
We suppose that valves have two directed liquid flows, in the normal and the cross directions; refer to Figure 3. Therefore the second argument ln represents the liquid flowing in the normal direction and the third argument lc represents the liquid flowing in the cross direction. Generally, if a valve is released from its controlled state, the liquid flow in the valve is represented by the symbol 0, which means "free". We have an EVALPSN clause Val(i, ln, lc) : [µ1, µ2], where the epistemic negation ¬1 over Tv3 = {⊥3, cm, cs, ⊤3} is defined in the same way as the mapping over Tv1.

Eql(l1, l2) represents that the liquids l1 and l2 are of the same (sa) kind or different (di) kinds, where l1, l2 ∈ {b, cw, ww, hw, na, cs, 0}. We have an EVALPSN clause Eql(l1, l2) : [µ1, µ2], where the epistemic negation ¬1 is defined in the same way as the mapping over Tv1.

Now we consider the process release conditions and need one more predicate to indicate the end of processes. We suppose that if the terminal tank Ti of a process Prj is filled with a kind of liquid, the process Prj finishes and the finishing
signal Fin(pj) is issued.

• Tan(ti, l) represents that the tank Ti has been filled fully (fu) with the liquid l or is empty (em). Then we have an EVALPSN clause Tan(ti, l) : [µ1, µ2], where i ∈ {0, 1, 2, 3}, l ∈ {b, cw, ww, hw, na, cs, 0} and the epistemic negation ¬1 over Tv5 = {⊥5, fu, em, ⊤5} is defined in the same way as the mapping over Tv1.

• Str(pi) represents that the start signal for the process Pri has been issued (is) or not (ni).

• Fin(pj) represents that the finishing signal for the process Prj has been issued (is) or not (ni). Then we have EVALPSN clauses Str(pi) : [µ1, µ2] and Fin(pj) : [µ1, µ2], where i, j ∈ {0, 1, 2, 3} and the epistemic negation ¬1 over Tv6 = {⊥6, is, ni, ⊤6} is defined in the same way as the mapping over Tv1. For example, Fin(p3) : [ni, α] can be interpreted as "it is a fact that the finishing signal for the process Pr3 has not been issued yet".

Here we formalize all the safety properties SPr, Val and Pr in EVALPSN.

SPr This condition can be intuitively interpreted as derivation rules of forbiddance. If a sub-process from a valve (or a tank) i to a valve (or a tank) j is locked by one kind of liquid, it is forbidden for the sub-process to be locked by different kinds of liquid simultaneously. Thus, generally, we have the following EVALPSN clauses:

SPr(i, j, l1) : [l, α] ∧ ∼ Eql(l1, l2) : [sa, α] → SPr(i, j, l2) : [f, β],
(1)
where l1, l2 ∈ {b, cw, ww, hw, na, cs}. Moreover, in order to derive permission for locking sub-processes, we need the following EVALPSN clauses:

∼ SPr(i, j, l) : [f, β] → SPr(i, j, l) : [f, γ],
(2)
where l ∈ {b, cw, ww, hw, na, cs}.

Val This condition can also be intuitively interpreted as derivation rules of forbiddance. We have to consider two cases: one derives forbiddance from changing the control state of the valve, and the other derives forbiddance from mixing different kinds of liquid without changing the control state of the valve.

Case 1 If a valve is controlled separate, it is forbidden for the valve to be controlled mixture; conversely, if a valve is controlled mixture, it is forbidden for the valve to be controlled separate. Thus, generally, we have the following EVALPSN clauses:

Val(i, ln, lc) : [cs, α] ∧ ∼ Eql(ln, 0) : [sa, α] ∧ ∼ Eql(lc, 0) : [sa, α] → Val(i, ln, lc) : [cs, β],
where ln, lc ∈ {b, cw, ww, hw, na, cs, 0}.

Case 2 Next, we consider the other forbiddance derivation case, in which different kinds of liquid are mixed even though the valve control state is not changed. We have the following EVALPSN clauses:
where ln1, lc1 ∈ {b, cw, ww, hw, na, cs, 0} and ln2, lc2 ∈ {b, cw, ww, hw, na, cs}. Note that the EVALPSN clause ∼ Eql(ln, 0) : [sa, α] represents that there is no information that the normal direction with the liquid ln in the valve is free (not controlled). As in the case of sub-processes, in order to derive permission for controlling valves, we need the following EVALPSN clauses:

∼ Val(i, ln, lc) : [cm, β] → Val(i, ln, lc) : [cm, γ],
(9)
∼ V al(i, ln , lc ) : [cs, β] → V al(i, ln , lc ) : [cs, γ],
(10)
where ln, lc ∈ {b, cw, ww, hw, na, cs, 0}.

Pr This condition can be intuitively interpreted as derivation rules of permission and directly translated into EVALPSN clauses as a rule "if all the components of the process can be locked or controlled consistently, then the process can be set". For example, since the brewery process Pr0 consists of the sub-process from the tank T0 to the valve V0, the valve V0 controlled separate by beer in the normal direction, and the sub-process from the valve V0 to the tank T1, we have the following EVALP clause to obtain the permission for setting the process Pr0:

SPr(t0, v0, b) : [f, γ] ∧ SPr(v0, t1, b) : [f, γ] ∧ Val(v0, b, l) : [cm, γ] ∧ Tan(t0, b) : [fu, α] ∧ Tan(t1, 0) : [em, α] → Pr(p0, b) : [xs, γ],
(11)
where l ∈ {b, cw, ww, hw, na, cs, 0}. Although we also have some EVALP clauses for setting the other processes, we omit them due to space restrictions.

[ Example ] We suppose that all the sub-processes and valves in the pipeline network are unlocked (free) and that no process has started at this initial stage. In order to verify the safety of all the processes Pr0,1,2,3, the following fact EVALP clauses (the environment information) are input to the EVALPSN pipeline control:

SPr(t0, v0, 0) : [f, α], Val(v0, 0, 0) : [cs, α], SPr(v0, t1, 0) : [f, α], Val(v1, 0, 0) : [cs, α], SPr(v0, t2, 0) : [f, α], SPr(v1, v0, 0) : [f, α], SPr(t3, v1, 0) : [f, α], Tan(t0, b) : [fu, α], Tan(t1, 0) : [em, α], Tan(t2, 0) : [em, α], Tan(t3, na) : [fu, α].

Then all the sub-processes and valves in the network are permitted to be locked or controlled. However, the tank conditions do not permit the processes Pr2 and Pr3 to be set. We show that the beer process Pr0 can be verified to be set as follows: we can derive neither the forbiddance from locking the sub-processes SPr0 and SPr1, nor the forbiddance from controlling the valve V0 separate with
beer in the normal direction, by the EVALPSN clauses (1), (4), (5), (6) and the above fact EVALP clauses; therefore we have the permission for locking the sub-processes SPr0 and SPr1, and for controlling the valve V0 separate with beer in the normal direction and any liquid in the cross direction,

SPr(t0, v0, b) : [f, γ], Val(v0, b, l) : [cm, γ], SPr(v0, t1, b) : [f, γ], where l ∈ {b, cw, ww, hw, na, cs, 0},

by the EVALPSN clauses (2) and (9); moreover, we have the tank conditions Tan(t0, b) : [fu, α] and Tan(t1, 0) : [em, α]; thus we have the permission for setting the beer process Pr0, Pr(p0, b) : [xs, γ], by the EVALPSN clause (11).

4. Conclusion

In this paper, we have introduced EVALPSN based safety verification for pipeline control. What we have shown is a pipeline safety verification method for avoiding unexpected mixture of different kinds of liquid. Furthermore, if we take the temporal relations between processes into account, the safety of the process order also needs to be verified. We will propose a new safety verification method based on EVALPSN for process order in the near future.
A Discrete Event Control Based on EVALPSN Stable Model
Kazumi Nakamatsu a, Hayato Komaba b and Atsuyuki Suzuki b
a
University of Hyogo, HIMEJI 670-0092 JAPAN [email protected]
b Shizuoka University, HAMAMATSU 432-8011 JAPAN {cs0038,suzuki}@cs.inf.shizuoka.ac.jp

Abstract. In this paper, we introduce a typical discrete event control example, the Cat and Mouse problem, and show that it can be controlled by EVALPSN stable model computation. First we show that the Cat and Mouse example can be easily formalized as an EVALPSN whose stable models provide its control. Generally, stable model computation takes a long time and is not appropriate for real-time control. Therefore, in order to realize real-time control for the Cat and Mouse example, we consider a restricted subset of the stable models.

Keywords. discrete event, EVALPSN (Extended Vector Annotated Logic Program with Strong Negation), stable model, real-time control
1. Introduction

We have already proposed EVALPSN [4,5] defeasible deontic control for a basic discrete event control example, Cat and Mouse [7], in [6]. However, it is not easy to construct that EVALPSN control, because we have to construct an EVALPSN defeasible deontic model of the Cat and Mouse example to do so. Moreover, the EVALPSN control does not have flexibility: if the Cat and Mouse example had a different doorway allocation, we might have to construct a different EVALPSN defeasible deontic model. In this paper, we propose a flexible EVALPSN control that can be obtained by translating the Cat and Mouse control properties into EVALPSN directly. Although this EVALPSN control has much more flexibility than the EVALPSN defeasible deontic control that we proposed before, it requires stable model [1] computation, which takes a long time. In order to realize real-time control, we provide a strategy to implement the EVALPSN control. This paper is organized as follows: first, we review EVALPSN briefly and introduce how to translate the Cat and Mouse control conditions into EVALPSN; next, we describe how the Cat and Mouse control is performed by the EVALPSN stable model computation with an example; last, the future work is described.
K. Nakamatsu et al. / A Discrete Event Control Based on EVALPSN Stable Model

[Figure 1 (the Hasse diagrams of the lattice Tv for n = 2 and of the lattice Td with members ⊥, α, β, γ, ∗1, ∗2, ∗3, ⊤) appeared here.]
Figure 1. Lattice Tv (n = 2) and Lattice Td
2. EVALPSN

Generally, a truth value called an annotation is explicitly attached to each literal in annotated logic programs. For example, let p be a literal and µ an annotation; then p : µ is called an annotated literal. The set of annotations constitutes a complete lattice. An annotation in VALPSN [3], which can deal with defeasible reasoning, is a 2-dimensional vector called a vector annotation such that each component is a non-negative integer, and the lattice Tv of vector annotations is defined as:

Tv = { (x, y) | 0 ≤ x ≤ n, 0 ≤ y ≤ n, x, y and n are integers }.

The ordering of the lattice Tv is denoted by a symbol ⪯v and defined as follows: let v1 = (x1, y1) ∈ Tv and v2 = (x2, y2) ∈ Tv; then v1 ⪯v v2 iff x1 ≤ x2 and y1 ≤ y2. For each vector annotated literal p : (i, j), the first component i of the vector annotation denotes the amount of positive information to support the literal p and the second one j denotes that of negative information. For example, a vector annotated literal p : (2, 1) can be intuitively interpreted as: the literal p is known to be true with strength 2 and false with strength 1. In order to deal with defeasible deontic reasoning, we extended VALPSN to EVALPSN. An annotation in EVALPSN, called an extended vector annotation, has the form [(i, j), µ] such that the first component (i, j) is a 2-dimensional vector annotation as in VALPSN and the second one, µ ∈ Td = {⊥, α, β, γ, ∗1, ∗2, ∗3, ⊤}, is an index that represents a deontic notion or inconsistency. The complete lattice Te of extended vector annotations is defined as the product Tv × Td. The ordering of the lattice Td is denoted by a symbol ⪯d and described by the Hasse diagram in Figure 1. The intuitive meaning of each member of the lattice Td is: ⊥ (unknown), α (fact), β (obligation), γ (non-obligation), ∗1 (both fact and obligation), ∗2 (both obligation and non-obligation), ∗3 (both fact and non-obligation) and ⊤ (inconsistent).
Therefore, EVALPSN can deal not only with inconsistency between pieces of ordinary knowledge but also between permission and forbiddance, obligation and forbiddance, and fact and forbiddance. The Hasse diagram (cube) shows that the lattice Td is a trilattice in which the direction γ → β represents deontic truth, the direction ⊥ → ∗2 represents the amount of deontic knowledge, and the direction ⊥ → α represents factuality. Therefore, for example, the annotation β can be intuitively interpreted as deontically truer than the annotation γ, and the annotations ⊥ and ∗2 are
deontically neutral, i.e., neither obligation nor non-obligation. The ordering over the lattice Te is denoted by a symbol ⪯ and defined as follows: let [(i1, j1), µ1] and [(i2, j2), µ2] be extended vector annotations; then [(i1, j1), µ1] ⪯ [(i2, j2), µ2] iff (i1, j1) ⪯v (i2, j2) and µ1 ⪯d µ2. There are two kinds of epistemic negations, ¬1 and ¬2, in EVALPSN, which are defined as mappings over Tv and Td, respectively.

Definition 1 (Epistemic Negations, ¬1 and ¬2)
¬1([(i, j), µ]) = [(j, i), µ], ∀µ ∈ Td,
¬2([(i, j), ⊥]) = [(i, j), ⊥], ¬2([(i, j), α]) = [(i, j), α], ¬2([(i, j), β]) = [(i, j), γ], ¬2([(i, j), γ]) = [(i, j), β], ¬2([(i, j), ∗1]) = [(i, j), ∗3], ¬2([(i, j), ∗2]) = [(i, j), ∗2], ¬2([(i, j), ∗3]) = [(i, j), ∗1], ¬2([(i, j), ⊤]) = [(i, j), ⊤].
These epistemic negations, ¬1 and ¬2, can be eliminated by the above syntactic operations. On the other hand, the ontological negation (strong negation, ∼) in EVALPSN can be defined by the epistemic negations ¬1 or ¬2 and is interpreted as classical negation.

Definition 2 (Strong Negation) ∼ F =def F → ((F → F ) ∧ ¬(F → F )), where F is a formula and ¬ is ¬1 or ¬2.

Definition 3 (Well Extended Vector Annotated Literal) Let p be a literal. p : [(i, 0), µ] and p : [(0, j), µ] are called well extended vector annotated literals (weva-literals for short), where i, j ∈ {1, 2} and µ ∈ { α, β, γ }.

Definition 4 (EVALPSN) If L0, · · · , Ln are weva-literals,
L1 ∧ · · · ∧ Li ∧ ∼ Li+1 ∧ · · · ∧ ∼ Ln → L0
is called an Extended Vector Annotated Logic Program clause with Strong Negation (EVALPSN clause for short). If it does not include strong negation, it is called an EVALP clause for short. An Extended Vector Annotated Logic Program with Strong Negation is a finite set of EVALPSN clauses.

Deontic notions and facts are represented by extended vector annotations in EVALPSN as follows, where m is a positive integer: "fact of strength m" by an annotation [(m, 0), α]; "obligation of strength m" by an annotation [(m, 0), β]; "forbiddance of strength m" by an annotation [(0, m), β]; "permission of strength m" by an annotation [(0, m), γ]. For example, a weva-literal p : [(2, 0), α] can be intuitively interpreted as "it is known that the literal p is a fact of strength 2", and a weva-literal q : [(0, 1), β] can be intuitively interpreted as "the literal q is forbidden with strength 1".
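The orderings and negations just defined can be prototyped in a few lines. In this sketch (our encoding, not from the paper) each member of Td is represented as the subset of {fact, obligation, non-obligation} it combines, so that ⪯d becomes set inclusion; the ∗ cases of ¬2 follow our reading of the cube in Figure 1:

```python
# Td members as subsets of {f(act), o(bligation), n(on-obligation)}; ⪯d is inclusion.
TD = {
    "bot": frozenset(), "alpha": frozenset("f"), "beta": frozenset("o"),
    "gamma": frozenset("n"), "*1": frozenset("fo"), "*2": frozenset("on"),
    "*3": frozenset("fn"), "top": frozenset("fon"),
}

def leq_e(a1, a2):
    """Ordering over Te = Tv x Td: componentwise on the vector, inclusion on Td."""
    ((i1, j1), m1), ((i2, j2), m2) = a1, a2
    return i1 <= i2 and j1 <= j2 and TD[m1] <= TD[m2]

def neg1(a):
    """Epistemic negation ¬1: swap the positive and negative vector components."""
    (i, j), m = a
    return ((j, i), m)

def neg2(a):
    """Epistemic negation ¬2: exchange obligation and non-obligation in the Td index."""
    swap = {"f": "f", "o": "n", "n": "o"}
    v, m = a
    target = frozenset(swap[x] for x in TD[m])
    return (v, next(k for k, s in TD.items() if s == target))
```

For example, `neg2` turns the forbiddance `((0, 1), "beta")` into the permission `((0, 1), "gamma")`, mirroring the deontic readings listed above.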
3. EVALPSN Control for Cat and Mouse

Cat and Mouse Example A cat and a mouse are placed in the maze shown in Figure 2. Each doorway in the maze is either for the exclusive use of the cat,
[Figure 2 (a five-room maze, rooms 0–4, with the cat doorways c1–c7 and the mouse doorways m1–m6; the cat starts in room 2 and the mouse in room 4) appeared here.]
Figure 2. Cat and Mouse Maze
[Figure 3 (rooms i, j, k connected by the doorways d1–d4, together with the Hasse diagram of the doorway-state lattice over ⊥, sop, op, cl, scl) appeared here.]
Figure 3. Rooms
or for the exclusive use of the mouse. It is assumed that each doorway, with the exception of c7, can be opened or closed as required in order to control the movement of the cat and the mouse. The objective is to find a control scheme that permits the cat and the mouse the greatest possible freedom of movement, but which also guarantees that A) the cat and the mouse never occupy the same room simultaneously, and B) it is always possible for the cat and the mouse to return to the initial state, i.e., the state in which the cat is in room 2 and the mouse is in room 4. In order to formalize the control for the Cat and Mouse in EVALPSN, we interpret the constraint properties A) and B) as 6 deontic control rules. Before constructing the EVALPSN control, we introduce some predicates and some sets of annotations to formalize the EVALPSN control for the Cat and Mouse. We suppose that:
- there are doorways between any two rooms, and even if there is actually no doorway between two rooms, a strongly closed doorway is supposed to exist there;
- if there is an uncontrollable doorway that is always open, it can be treated as strongly open;
- if the cat or the mouse moves to a different room, we call the movement a step.
We have to consider deadlock states in the Cat and Mouse example and avoid them when controlling. Suppose that the cat is in room 0 and the mouse is in room 3. As the doorway c7 for the cat cannot be controlled, all the doorways c1, c4 and m6 must be closed. Then there is no available doorway for either the cat or the mouse to return to their initial rooms. We call such a state a deadlock state. Now, we introduce the following predicates:

occu(i, t) represents that the room i is occupied by an animal at the t-th step; the cat and the mouse are represented by the conflicting annotations cat and mou for the predicate occu(i, t), respectively. Therefore we have a weva-literal occu(i, t) : [ani, µ], providing ani ∈ {cat, mou} and µ ∈ {α, β, γ}; the epistemic negation ¬1 for this set of annotations is defined as ¬1 cat = mou, ¬1 mou = cat. For example, a weva-literal occu(i, t) : [cat, β] represents both the obligation for the cat to occupy the room i and the forbiddance for the mouse (¬1 cat) from occupying the room i.

door(i, j, ani, t) represents that the doorway from the room i to the room j for the animal ani is controlled to be in a state at the t-th step; the states "strongly open", "open", "closed" and "strongly closed" of doorways are also represented by conflicting annotations sop, op, cl and scl for the predicate door(i, j, ani, t), respectively. Therefore we have a weva-literal door(i, j, ani, t) : [st, µ], providing st ∈ {sop, op, cl, scl} and µ ∈ {α, β, γ}.

circum(i, j, t) represents that the circumstance in which the cat is in the room i and the mouse is in the room j at the t-th step is a deadlock state or not; the states "deadlock" and "normal" are also represented by conflicting annotations for the predicate circum(i, j, t). Therefore we have a weva-literal circum(i, j, t) : [st, µ], providing st ∈ {dl, nl}.

Using these predicates, we can provide the following EVALPSN clauses as the translation of the constraint properties A) and B).
[Control for Doorways]
• If the ani is in the room i, the eani is in the room j, and there is a controllable doorway for the ani from the room i to the room j at the t-th step, then the doorway must be controlled closed; that is to say, it is forbidden to control the doorway open. This rule is translated into:

occu(i, t) : [ani, α] ∧ occu(j, t) : [eani, α] ∧ ∼ door(i, j, ani, t) : [sop, α] ∧ ∼ door(i, j, ani, t) : [scl, α] → door(i, j, ani, t) : [cl, β], where
i ≠ j, ani, eani ∈ {cat, mou}, t = u, u + 1.    (1)
• If the doorway for the ani from the room i to the room j is strongly closed (open) at the t-th step, then the doorway must be controlled closed (open); that is to say, it is forbidden to control the doorway open (closed). These rules are translated into:

door(i, j, ani, t) : [scl, α] → door(i, j, ani, t) : [cl, β],
(2)
door(i, j, ani, t) : [sop, α] → door(i, j, ani, t) : [op, β],
(3)
where i ≠ j, ani ∈ {cat, mou}, t = u, u + 1.
K. Nakamatsu et al. / A Discrete Event Control Based on EVALPSN Stable Model
76
• If there is a controllable doorway for the ani from the room i to the room j at the t-th step, and there is no forbiddance from the doorway being open, then the doorway must be controlled open; that is to say, it is forbidden to control the doorway closed. This rule is translated into:

∼ door(i, j, ani, t) : [sop, α] ∧ ∼ door(i, j, ani, t) : [scl, α] ∧ ∼ door(i, j, ani, t) : [cl, β] → door(i, j, ani, t) : [op, β], where
i ≠ j, ani ∈ {cat, mou}, t = u, u + 1.    (4)
• If the ani is in the room i, the eani is in the room k, there is a controllable doorway for the eani from the room k to the room j at the t-th step, and the circumstance in which the ani is in the room i and the eani is in the room j at the next (t + 1)-th step is a deadlock, then the doorway for the eani from the room k to the room j must be controlled closed; that is to say, it is forbidden to control the doorway open at the t-th step. This rule is translated into:

occu(i, t) : [ani, α] ∧ occu(k, t) : [eani, α] ∧ ∼ door(k, j, eani, t) : [sop, α] ∧ ∼ door(k, j, eani, t) : [scl, α] ∧ circum(i, j, t + 1) : [dl, α] → door(k, j, eani, t) : [cl, β], where
i ≠ j, j ≠ k, k ≠ i, ani, eani ∈ {cat, mou}, and t = u.    (5)
• If the ani is in the room i, the eani is in the room k, there is a controllable doorway for the eani from the room k to the room j, and the doorway for the eani from the room j to the room i (or the doorway for the ani from the room i to the room j) is strongly open at the t-th step, then the doorway for the eani must be controlled closed; that is to say, it is forbidden to control the doorway open. This rule is translated into:

occu(i, t) : [ani, α] ∧ occu(k, t) : [eani, α] ∧ ∼ door(k, j, eani, t) : [sop, α] ∧ ∼ door(k, j, eani, t) : [scl, α] ∧ door(j, i, eani, t) : [sop, α] → door(k, j, eani, t) : [cl, β],    (6)
( door(i, j, ani, t) : [sop, α] → door(k, j, eani, t) : [cl, β] )

where i ≠ j, j ≠ k, k ≠ i, and ani, eani ∈ {cat, mou}, t = u, u + 1.
• If the ani is in the room i, the eani is in the room j, and all the doorways from the rooms i and j must be closed at the (t + 1)-th step, then such a circumstance is defined as a deadlock. This definition is translated into:

occu(i, t + 1) : [ani, α] ∧ occu(j, t + 1) : [eani, α] ∧ ∧_{l=0}^{4} door(i, l, ani, t + 1) : [cl, β] ∧ ∧_{m=0}^{4} door(j, m, eani, t + 1) : [cl, β] → circum(i, j, t + 1) : [dl, α],    (7)

where l ≠ i, i ≠ j, m ≠ j, and ani, eani ∈ {cat, mou}, t = u.
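For concreteness, rules (1) and (5) can be simulated procedurally for this maze. The sketch below is ours, not the paper's stable model computation: the doorway lists are read off the initial-stage control example given later in the text, and the only deadlock pair recorded is the one discussed above (cat in room 0, mouse in room 3).

```python
# Controllable doorways as (from_room, to_room); c7 (room 1 -> room 3, cat) is
# uncontrollable / strongly open and therefore never appears in the result.
CAT_DOORS = {(0, 1), (0, 3), (1, 2), (2, 0), (3, 4), (4, 0)}
MOUSE_DOORS = {(0, 2), (0, 4), (1, 0), (2, 1), (3, 0), (4, 3)}
DEADLOCK = {(0, 3)}  # (cat room, mouse room) pairs known to be deadlocks

def doors_to_close(cat, mouse):
    """Return the controllable doorways forced closed by rules (1) and (5)."""
    closed = set()
    for ani, doors, here, there in (("cat", CAT_DOORS, cat, mouse),
                                    ("mou", MOUSE_DOORS, mouse, cat)):
        for (i, j) in doors:
            if i != here:
                continue  # only doorways leaving the animal's current room matter
            if j == there:
                closed.add((ani, (i, j)))   # rule (1): would share a room
            next_state = (j, mouse) if ani == "cat" else (cat, j)
            if next_state in DEADLOCK:
                closed.add((ani, (i, j)))   # rule (5): move leads into a deadlock
    return closed
```

At the initial positions (cat in room 2, mouse in room 4) nothing is closed, while with the cat in room 0 the mouse doorway from room 4 to room 3 is forced closed, matching the two stages of the control example in the text.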
Note: the EVALPSN clauses ∼ door(i, j, ani, t) : [sop, α] ∧ ∼ door(i, j, ani, t) : [scl, α] represent that there is a controllable doorway for the ani between the rooms i and j in the above EVALPSN.

The above EVALPSN has stable models; however, the computation of those stable models takes a long time. The stable model semantics for annotated logic programs with strong negation is introduced in [2]. In fact, if we take complete stable models into account, the computation may continue forever, because the EVALPSN stable models include the infinite chain of the step numbers t = 0, 1, 2, · · ·, although we need the door control only at the u-th step (u = 0, 1, 2, · · ·). Therefore, we restrict the stable models to the present step t = u and the next step t = u + 1. We show an example of the EVALPSN control.

EVALPSN Control Example

[Initial Stage] Suppose that the cat is in room 2 and the mouse is in room 4 initially. Then each controllable doorway's open-close state is controlled by the stable model of the EVALPSN P0 = {instances of the EVALPSN clauses (1)–(7) with u = 0} as follows: the EVALP clauses representing the cat's and the mouse's rooms,

occu(2, 0) : [cat, α] ∧ occu(4, 0) : [mou, α],

are added to the EVALPSN P0; moreover, we need to consider EVALP clauses representing that doorways are strongly closed, such as door(1, 4, mou, 0) : [scl, α], which are added to the EVALPSN P0; we suppose that there are also virtual doorways for both animals from their rooms to themselves, which are strongly open, and we treat the uncontrollable doorway c7 for the cat as a strongly open doorway; then we also need to consider EVALP clauses representing that doorways are strongly open, such as door(1, 3, cat, 0) : [sop, α], which are added to the EVALPSN P0; lastly, we compute the stable models for the EVALPSN P0 and obtain a stable model that includes the weva-literals,

door(0, 1, cat, 0) : [op, β],
door(0, 3, cat, 0) : [op, β],
door(1, 2, cat, 0) : [op, β],
door(2, 0, cat, 0) : [op, β],
door(3, 4, cat, 0) : [op, β],
door(4, 0, cat, 0) : [op, β],
door(0, 2, mou, 0) : [op, β],
door(0, 4, mou, 0) : [op, β],
door(1, 0, mou, 0) : [op, β],
door(2, 1, mou, 0) : [op, β],
door(3, 0, mou, 0) : [op, β],
door(4, 3, mou, 0) : [op, β],
which represent the doorway control at the initial stage: all doorways must be open.

[2nd Stage] Suppose that only the cat moves to the room 0. Then each controllable doorway's open-close state is controlled by the stable model of the EVALPSN P1 = {instances of the EVALPSN clauses (1)–(7) with u = 1} as follows: the EVALP clauses representing the cat's and the mouse's rooms,

occu(0, 1) : [cat, α] ∧ occu(4, 1) : [mou, α],
are added to the EVALPSN P1; the uncontrollable doorways' states are the same as at the initial stage; we compute the stable models for the EVALPSN P1 and obtain two stable models that include the weva-literals,
door(0, 3, cat, 1) : [op, β],
door(1, 2, cat, 1) : [op, β],
door(2, 0, cat, 1) : [op, β],
door(3, 4, cat, 1) : [op, β],
door(4, 0, cat, 1) : [op, β],
door(0, 2, mou, 1) : [op, β],
door(0, 4, mou, 1) : [op, β],
door(1, 0, mou, 1) : [op, β],
door(2, 1, mou, 1) : [op, β],
door(3, 0, mou, 1) : [op, β],
door(4, 3, mou, 1) : [cl, β],
which represent the doorway control at the second stage: all doorways must be open except for the doorway m5 for the mouse.

4. Conclusion

In this paper, we have introduced a discrete event control based on EVALPSN stable model computation, taking the Cat and Mouse as an example. The stable models for the EVALPSN representing the Cat and Mouse control model are essentially the same as ordinary automaton models for the Cat and Mouse.

Acknowledgement We acknowledge that this research was financially supported by the Grant in The Japanese Scientific Research Fund Foundation (C)(2) Project No. 16560468.

References
[1] Gelfond, M. and Lifschitz, V., "The Stable Model Semantics for Logic Programming", Proc. 5th International Conference and Symposium on Logic Programming, IEEE, (1989), 1070–1080.
[2] Nakamatsu, K. and Suzuki, A., "Annotated Semantics for Default Reasoning", Proc. 3rd Pacific Rim Int'l Conf. Artificial Intelligence, Academic Press, (1994), 180–186.
[3] Nakamatsu, K., Abe, J.M., and Suzuki, A., "Defeasible Reasoning Between Conflicting Agents Based on VALPSN", Proc. AAAI Workshop Agents' Conflicts, AAAI Press, (1999), 20–27.
[4] Nakamatsu, K., Abe, J.M., and Suzuki, A., "A Defeasible Deontic Reasoning System Based on Annotated Logic Programming", Proc. the Fourth International Conference on Computing Anticipatory Systems, AIP Conference Proceedings 573, AIP, (2001), 609–620.
[5] Nakamatsu, K., Abe, J.M., and Suzuki, A., "Annotated Semantics for Defeasible Deontic Reasoning", Proc. the Second International Conference on Rough Sets and Current Trends in Computing, LNAI 2005, Springer-Verlag, (2001), 432–440.
[6] Nakamatsu, K., Komaba, H., and Suzuki, A., "Defeasible Deontic Control for Discrete Events Based on EVALPSN", Proc. the Fourth International Conference on Rough Sets and Current Trends in Computing, LNAI 3066, Springer, (2004), 310–315.
[7] Ramadge, P.J.G. and Wonham, W.M., "The Control of Discrete Event Systems", Proc. IEEE, Vol. 77, No. 1, (1989), 81–98.
An EVALP Based Traffic Simulation System
Kazumi Nakamatsu a, Ryuji Ishikawa b and Atsuyuki Suzuki b
a University of Hyogo, HIMEJI 670-0092 JAPAN [email protected]
b Shizuoka University, HAMAMATSU 432-8011 JAPAN {cs0005,suzuki}@cs.inf.shizuoka.ac.jp

Abstract. Driving actions of human beings, such as applying the brake in order to control the car speed, can be regarded as decided by defeasible deontic reasoning based on environmental information such as the distance between two cars. We formalize such a car driving model in a paraconsistent logic program EVALP (Extended Vector Annotated Logic Program), which can deal with defeasible deontic reasoning. In this paper, we introduce an EVALP defeasible deontic reasoning based car driving model and a traffic simulation system based on the model, which can be implemented with the cell automaton method for traffic simulation.

Keywords. traffic simulation, EVALPSN (Extended Vector Annotated Logic Program with Strong Negation), defeasible deontic reasoning, drivers' model
1. Introduction

We have already proposed EVALPSN (Extended Vector Annotated Logic Program with Strong Negation) [2,3], which can deal with defeasible deontic reasoning, and applied it to various kinds of action control such as traffic signal control [5]. Driving actions of human beings, such as braking in order to control the car speed, can be regarded as decided by defeasible deontic reasoning based on environmental information such as the distance between two cars and their speeds. Generally, in action control based on EVALPSN defeasible deontic reasoning [6,4], forbiddance or permission for actions is defeasibly derived from environmental information such as sensory information, and if permission for an action is derived, we have an obligation to perform the action at the next step. For example, if there is enough distance between two cars, the distance yields permission to speed up the following car; on the other hand, if there is a red traffic light in front of the following car, the red light yields forbiddance to speed up. Then either the permission or the forbiddance is derived by defeasible reasoning, and speeding up or slowing down is decided as the next action. We formalize such a driving model based on defeasible deontic reasoning in EVALPSN and call it the drivers' model in this paper. Moreover,
K. Nakamatsu et al. / An EVALP Based Traffic Simulation System
we introduce a traffic simulation system based on the EVALPSN drivers' model computation. This paper is organized as follows: first, we review EVALPSN briefly and introduce the drivers' model in EVALPSN; next, we describe some sample rules to control car speed and show how the rules are translated into EVALPSN clauses; finally, we present the traffic simulation system based on the EVALPSN drivers' model.
2. EVALPSN

Generally, a truth value called an annotation is explicitly attached to each literal in annotated logic programs. For example, let p be a literal and µ an annotation; then p : µ is called an annotated literal. The set of annotations constitutes a complete lattice. An annotation in VALPSN [1], which can deal with defeasible reasoning, is a 2-dimensional vector called a vector annotation such that each component is a non-negative integer, and the complete lattice Tv of vector annotations is defined as:

Tv = { (x, y) | 0 ≤ x ≤ n, 0 ≤ y ≤ n, where x, y and n are integers }.

The ordering of the lattice Tv is denoted by the symbol ⪯v and defined as follows: let v1 = (x1, y1) ∈ Tv and v2 = (x2, y2) ∈ Tv; then v1 ⪯v v2 iff x1 ≤ x2 and y1 ≤ y2. For each vector annotated literal p : (i, j), the first component i of the vector annotation denotes the amount of positive information supporting the literal p, and the second component j denotes that of negative information. For example, a vector annotated literal p : (2, 1) can be intuitively interpreted as: the literal p is known to be true with strength 2 and false with strength 1. In order to deal with defeasible deontic reasoning, we extended VALPSN to EVALPSN. An annotation in EVALPSN, called an extended vector annotation, has the form [(i, j), µ] such that the first component (i, j) is a 2-dimensional vector, as a vector annotation in VALPSN, and the second component, µ ∈ Td = {⊥, α, β, γ, ∗1, ∗2, ∗3, ⊤}, is an index that represents a deontic notion or inconsistency. The complete lattice Te of extended vector annotations is defined as the product Tv × Td. The ordering of the lattice Td is denoted by the symbol ⪯d and described by the Hasse diagrams in Figure 1. The intuitive meaning of each member of the lattice Td is: ⊥ (unknown), α (fact), β (obligation), γ (non-obligation), ∗1 (both fact and obligation), ∗2 (both obligation and non-obligation), ∗3 (both fact and non-obligation) and ⊤ (inconsistent).
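The componentwise ordering ⪯v on the lattice Tv can be sketched directly; the following Python fragment is illustrative only, and the function names are not from the paper.

```python
# Sketch of the complete lattice Tv of vector annotations (n = 2) and its
# ordering: (x1, y1) <=v (x2, y2) iff x1 <= x2 and y1 <= y2.

def leq_v(v1, v2):
    """Ordering on Tv: componentwise comparison of two vector annotations."""
    (x1, y1), (x2, y2) = v1, v2
    return x1 <= x2 and y1 <= y2

# Tv for n = 2: all integer pairs (x, y) with 0 <= x, y <= 2.
Tv = [(x, y) for x in range(3) for y in range(3)]

# (2, 2) is the top element and (0, 0) the bottom element of the lattice.
assert all(leq_v((0, 0), v) and leq_v(v, (2, 2)) for v in Tv)
```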
Therefore, EVALPSN can deal with inconsistency not only between items of usual knowledge but also between permission and forbiddance, obligation and forbiddance, and fact and forbiddance. The Hasse diagram (cube) shows that the lattice Td is a tri-lattice in which the direction from γ to β represents deontic truth, the direction from ⊥ to ∗2 represents the amount of deontic knowledge, and the direction from ⊥ to α represents factuality. Therefore, for example, the annotation β can be intuitively interpreted as deontically truer than the annotation γ, and the annotations ⊥ and ∗2 are
Figure 1. Lattice Tv (n = 2) and Lattice Td
deontically neutral, i.e., neither obligation nor non-obligation. The ordering over the lattice Te is denoted by the symbol ⪯ and defined as follows: let [(i1, j1), µ1] and [(i2, j2), µ2] be extended vector annotations; then [(i1, j1), µ1] ⪯ [(i2, j2), µ2] iff (i1, j1) ⪯v (i2, j2) and µ1 ⪯d µ2. There are two kinds of epistemic negations, ¬1 and ¬2, in EVALPSN, which are defined as mappings over Tv and Td, respectively.

Definition 1 (Epistemic Negations, ¬1 and ¬2)
¬1([(i, j), µ]) = [(j, i), µ], ∀µ ∈ Td,
¬2([(i, j), ⊥]) = [(i, j), ⊥], ¬2([(i, j), β]) = [(i, j), γ], ¬2([(i, j), γ]) = [(i, j), β],
and analogously for the remaining members of Td.
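The two epistemic negations of Definition 1 can be sketched in Python. This is an illustrative sketch: the names are my own, and the treatment of the starred indices under ¬2 is an assumption not spelled out in the excerpt.

```python
# Sketch of the two epistemic negations: not1 flips the vector components;
# not2 exchanges the deontic indices beta and gamma. The *1 <-> *3 exchange
# is an assumption; the remaining indices are treated as fixed points.

def not1(annotation):
    (i, j), mu = annotation
    return ((j, i), mu)

_NOT2 = {'bot': 'bot', 'alpha': 'alpha', 'beta': 'gamma', 'gamma': 'beta',
         '*1': '*3', '*2': '*2', '*3': '*1', 'top': 'top'}

def not2(annotation):
    (i, j), mu = annotation
    return ((i, j), _NOT2[mu])

# Applying either negation twice returns the original annotation.
assert not1(not1(((2, 1), 'beta'))) == ((2, 1), 'beta')
assert not2(not2(((0, 1), 'beta'))) == ((0, 1), 'beta')
```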
These epistemic negations, ¬1 and ¬2, can be eliminated by the above syntactic operation. On the other hand, the ontological negation (strong negation, ∼) in EVALPSN can be defined by the epistemic negations ¬1 or ¬2, and is interpreted as classical negation.

Definition 2 (Strong Negation) ∼F =def F → ((F → F) ∧ ¬(F → F)), where F is a formula and ¬ is ¬1 or ¬2.

Definition 3 (Well Extended Vector Annotated Literal) Let p be a literal. p : [(i, 0), µ] and p : [(0, j), µ] are called well extended vector annotated literals (weva-literals for short), where i, j ∈ {1, 2} and µ ∈ {α, β, γ}.

Definition 4 (EVALPSN) If L0, · · · , Ln are weva-literals, L1 ∧ · · · ∧ Li ∧ ∼Li+1 ∧ · · · ∧ ∼Ln → L0 is called an Extended Vector Annotated Logic Program clause with Strong Negation (EVALPSN clause for short). If it does not include the strong negation, it is called an EVALP clause. An Extended Vector Annotated Logic Program with Strong Negation is a finite set of EVALPSN clauses. Deontic notions and facts are represented by extended vector annotations in EVALPSN as follows, where m is a positive integer:
“fact of strength m” is represented by the annotation [(m, 0), α];
“obligation of strength m” is represented by the annotation [(m, 0), β];
“forbiddance of strength m” is represented by the annotation [(0, m), β];
“permission of strength m” is represented by the annotation [(0, m), γ].
For example, a weva-literal p : [(2, 0), α] can be intuitively interpreted as “it is known that the literal p is a fact of strength 2”, and a weva-literal q : [(0, 1), β] can be intuitively interpreted as “the literal q is forbidden with strength 1”.
3. EVALPSN Based Drivers’ Model

What kind of information drives the decision to brake or accelerate when driving? A red traffic light makes the driver brake, and a long distance between two cars lets the driver accelerate. Suppose that both pieces of information, “the traffic light is red” and “the distance between the cars is long”, are input to the driver simultaneously. Which action, braking or acceleration, is then taken? Probably additional information is taken into account before the decision is made. Drivers decide whether to slow down or speed up the car based on various environmental information like this. We introduce a drivers’ model based on EVALPSN defeasible deontic reasoning. The EVALPSN drivers’ model can compute the next car action, “slow down”, “speed up”, or “keep the present speed”, by defeasible deontic reasoning. We make the following assumptions.

[Assumptions for the EVALPSN Drivers’ Model]
• three actions for car driving, “speed up”, “slow down” and “continue”, are computed as the control result for each car in the simulation system based on the EVALPSN drivers’ model;
• forbiddance or permission for the action “speed up” is derived from environmental information such as the distance between the preceding car and the object car;
• one obligation among the three actions listed in the first item is derived by EVALPSN defeasible deontic reasoning, and that obligation becomes the next action of the car;
• drivers are supposed to obey traffic rules such as the speed limit of the road and traffic lights;
• basically, a cell automaton based simulation method is assumed as the traffic simulation method.
We use the following predicates to represent the drivers’ model:
mv(t) represents an action of the car at time t; if it has the annotation [(0, 1), β], it represents weak forbiddance from “speed up”; if it has the annotation [(2, 0), γ], it represents strong permission for “slow down”; etc.;
vo(t) represents the speed of the car at time t; the set of vector annotations for this predicate is {(i, j) | i, j ∈ {0, 1, 2, 3, 4, 5}}; roughly speaking, the vector annotation (2, 0) may be taken to represent about 10 km/h, the vector annotation (5, 0) over 40 km/h, and the vector annotation (0, 0) that the car does not move, etc.;
vn(t) represents the speed of the preceding car at time t; its vector annotations are the same as for the predicate vo(t);
vo(s, t) represents the speed of the oncoming car at time t; its vector annotations are the same as for the predicate vo(t);
dp(t) represents the distance between the preceding car and the object car at time t; the set of vector annotations for this predicate is {(i, j) | i, j ∈ {0, 1, 2, . . .}}; roughly speaking, the vector annotation (2, 0) represents that the distance is 2 cells, the vector annotation (5, 0) that it is 5 cells, etc.;
dc(t) represents the distance between the curve and the car at time t; its vector annotations are the same as for the predicate dp(t);
df(t) represents the distance between the oncoming car and the object car at time t; its vector annotations are the same as for the predicate dp(t);
go(t) represents the direction in which the car is headed; a vector annotation (i, j) for this predicate belongs to {(0, 0), (1, 0), · · · , (2, 2)}, where the annotation (2, 0) represents the direction right, the annotation (0, 2) the direction left, and the annotation (1, 1) the direction straight.

[Computational Rules in the Drivers’ Model]
Several rules could be considered for constructing the drivers’ model; here we introduce only the following three.
Traffic Light Rule If the traffic light in front of the car indicates:
- red light: it can be taken that there is an obstacle on the stop line before the traffic light, that is to say, we have the forbiddance to enter the intersection;
- yellow light: the same as the red light rule, except that if the distance between the car and the stop line is less than 2 cells, it is treated as the green light;
- green light: no restriction for cars going straight, except that for cars turning at the intersection it can be taken that there is an obstacle in the intersection, that is to say, the car has to slow down in the intersection.
Straight Road Rule If a car is running on a straight road, the car speed is controlled by:
- the distance between the preceding car and the object car;
- the speeds of the preceding car and the object car;
- the speed limit of the road and the traffic light color.
Figure 2. Cell States in Cases 1 and 2
Generally, forbiddance or permission for the three actions is derived according to the above information. For example, suppose that the object car is moving at speed 1; then we have the following EVALP clauses to control the action of the object car.

[Case 1] If the distance between the preceding car and the object car is longer than 2 cells, we have permission to accelerate the car at time t. This rule is translated into:

vo(t) : [(1, 0), α] ∧ dp(t) : [(2, 0), α] → mv(t) : [(0, 1), γ].   (1)
[Case 2] If the preceding car is not moving and is located in the next cell, and the object car is moving at speed 1, we have strong forbiddance from “speed up” at time t, which means a strong obligation to stop. This rule is translated into:

vo(t) : [(1, 0), α] ∧ vn(t) : [(0, 0), α] ∧ dp(t) : [(0, 0), α] → mv(t) : [(0, 2), β].   (2)
[Case 3] If the preceding car is faster than the object car, whose speed is 1, we have permission to accelerate the car at time t. This rule is translated into:

vo(t) : [(1, 0), α] ∧ vn(t) : [(2, 0), α] → mv(t) : [(0, 1), γ].   (3)
Then, if both the permission mv(t) : [(0, 1), γ] and the forbiddance mv(t) : [(0, 2), β] are derived, since the forbiddance is stronger than the permission, we obtain the control to slow the car down at the next step.
Curve and Turn Rule If the car is headed towards a curve or intends to turn at the intersection, the obligation to slow the car down (i.e., the forbiddance to speed up) always has to be derived.
[Case 4] If the car is moving at speed 3 and the distance between the car and the curve is 2 cells at time t, the forbiddance to speed up is derived. This rule is translated into:

vo(t) : [(3, 0), α] ∧ dc(t) : [(2, 0), α] ∧ go(t) : [(2, 0), α] → mv(t) : [(0, 1), β].   (4)
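The resolution between competing permissions and forbiddances for "speed up" can be sketched in Python; the encoding and function names below are illustrative, not the paper's implementation.

```python
# Sketch of the action decision: rules derive permissions ((0, m), gamma)
# and forbiddances ((0, m), beta) for the literal mv(t); when both are
# derived, the annotation of greater strength m wins.

def decide(derived):
    """derived: list of ((i, j), mu) annotations derived for mv(t)."""
    forb = max((j for (i, j), mu in derived if mu == 'beta'), default=0)
    perm = max((j for (i, j), mu in derived if mu == 'gamma'), default=0)
    if forb > perm:
        return 'slow_down'
    if perm > forb:
        return 'speed_up'
    return 'continue'

# Case 1 + Case 2: permission of strength 1 loses to forbiddance of strength 2.
assert decide([((0, 1), 'gamma'), ((0, 2), 'beta')]) == 'slow_down'
```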
Figure 3. Cell States in Case 4
4. Simulation System

In this section, we introduce the traffic simulation system based on the EVALPSN drivers’ model. Figure 4 shows the traffic simulation around a crossing road with a traffic light. In the figure, each square box with a number represents a car, and the number attached to the car indicates its speed at that time. When we simulate the behavior of car traffic, we compute the EVALPSN drivers’ model for each car in the simulation system. Moreover, the simulation system simulates traffic light control based on the EVALPSN traffic light control system [5], in which the length of each traffic light phase (red, yellow, green, etc.) is controlled by the sensed traffic volume.
Figure 4. Traffic Simulation at Intersection
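The cell automaton style of simulation assumed above can be illustrated with a minimal single-lane update step: each car speeds up when the gap to the car ahead permits it, and otherwise is forbidden from advancing into the gap. This is only an illustrative sketch in the spirit of the described system, not the paper's implementation.

```python
# Minimal circular single-lane cell-automaton step: parallel update using
# the current positions; speeds are capped by v_max and by the free gap.

def step(positions, speeds, v_max=5, road_len=100):
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    new_pos, new_spd = list(positions), list(speeds)
    for k, i in enumerate(order):
        ahead = order[(k + 1) % len(order)]
        gap = (positions[ahead] - positions[i] - 1) % road_len
        v = min(speeds[i] + 1, v_max)   # permission to speed up...
        v = min(v, gap)                 # ...unless forbidden by the gap
        new_spd[i] = v
        new_pos[i] = (positions[i] + v) % road_len
    return new_pos, new_spd

pos, spd = step([0, 3], [1, 1])
assert spd[0] == 2 and pos[0] == 2  # a gap of 2 cells permits speeding up
```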
5. Conclusion

In this paper, we have introduced a drivers’ model based on EVALP defeasible deontic reasoning and a traffic simulation system based on it.

Acknowledgement

This research was financially supported by the Grant in the Japanese Scientific Research Fund Foundation (C)(2) Project No. 16560468.
References
[1] Nakamatsu, K., Abe, J.M. and Suzuki, A., Defeasible Reasoning Between Conflicting Agents Based on VALPSN, Proc. AAAI Workshop Agents’ Conflicts, AAAI Press, (1999), 20–27.
[2] Nakamatsu, K., Abe, J.M. and Suzuki, A., A Defeasible Deontic Reasoning System Based on Annotated Logic Programming, Proc. the Fourth International Conference on Computing Anticipatory Systems, AIP Conference Proceedings 573, AIP, (2001), 609–620.
[3] Nakamatsu, K., Abe, J.M. and Suzuki, A., Annotated Semantics for Defeasible Deontic Reasoning, Proc. the Second International Conference on Rough Sets and Current Trends in Computing, LNAI 2005, Springer-Verlag, (2001), 432–440.
[4] Nakamatsu, K., Abe, J.M. and Suzuki, A., Defeasible Deontic Robot Control Based on Extended Vector Annotated Logic Programming, Proc. the Fifth International Conference on Computing Anticipatory Systems, AIP Conference Proceedings 627, AIP, (2002), 490–500.
[5] Nakamatsu, K., Seno, T., Abe, J.M. and Suzuki, A., Intelligent Real-time Traffic Signal Control Based on a Paraconsistent Logic Program EVALP, Proc. the 9th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, LNCS 2639, Springer-Verlag, (2003), 719–723.
[6] Nakamatsu, K., Mita, Y., Shibata, T. and Abe, J.M., Defeasible Deontic Action Control Based on Paraconsistent Logic Program and its Hardware Implementation, Proc. 3rd International Conference on Computational Intelligence for Modelling Control and Automation (CD-ROM), (2003).
Modelling and Prediction of Electronically Controlled Automotive Engine Power and Torque Using Support Vector Machines P.K. WONG a,1, C.M. VONG b, Y.P. LI b, L.M. TAM a Department of Electromechanical Engineering, FST, University of Macau, Macao b Department of Computer and Information Science, FST, University of Macau, Macao a
Abstract. Modern automotive engines are controlled by the electronic control unit (ECU). The power and torque of an electronically controlled automotive engine are significantly affected by the tune-up of the ECU. Current practice of ECU tune-up relies on the experience of the automotive engineer, and engine tune-up is usually done by trial and error because a mathematical power and torque model of the electronically controlled engine has not been determined yet. With an emerging technique, Support Vector Machines (SVM), an approximate power and torque model of an electronically controlled vehicle engine can be determined by training on sample data acquired from a dynamometer. This model can then be used for engine performance prediction. The construction and accuracy of the model are also discussed in this paper. The study shows that the predicted results are in good agreement with the actual test results. Keywords. Electronically controlled automotive engine, Support vector machines, Modelling
Introduction

Modern automotive engines are controlled by the electronic control unit (ECU). The power and torque of an electronically controlled automotive engine are significantly affected by the setup of control parameters in the ECU. Normally, the car engine power and torque are obtained through dynamometer tests. Current practice of engine tune-up relies on the experience of the automotive engineer, who must handle a huge number of combinations of the engine control parameters. The relationship between the input and output parameters of an electronically controlled car engine is a complex multi-variable function [1], which is very difficult to determine. Consequently, engine tune-up is usually done by trial and error. Moreover, the
Corresponding Author: P.K.Wong, Department of Electromechanical Engineering, Faculty of Science &
P.K. Wong et al. / Modelling and Prediction of Electronically Controlled Automotive Engine Power
power and torque model is engine dependent. Knowing the power and torque model lets the automotive engineer predict whether a new engine setup yields a gain or a loss. Traditional mathematical methods of nonlinear regression [2-3] may be applied to construct an engine performance model for prediction. However, an electronically controlled vehicle engine setup involves too many parameters and data; constructing the model in such a high-dimensional and nonlinear data space is a very difficult task for traditional regression methods. With an emerging technique, Support Vector Machines (SVM) [4-6], this problem of high-dimensional regression can be overcome. The regressed engine power and torque model can then be used for engine performance prediction without dynamometer tests.
1. Support Vector Machines

SVM is an emerging technique pioneered by Vapnik [4-6]. It is an interdisciplinary field of machine learning, optimization, statistical learning and generalization theory. Basically, it can be used for pattern classification and multi-variable regression. In either application, SVM formulates the task as a Quadratic Programming (QP) problem over the weights, with a regularization factor included. Since the QP objective is a convex function, the solution of the QP problem is global (or even unique) rather than merely local.
1.1. SVM formulation for multi-variable regression

Consider regression on the data set D = {(x1, y1), …, (xN, yN)} with N data points, where xi ∈ Rn and yi ∈ R. The SVM formulation for multi-variable regression is expressed as the following equation [6-7].
$$
\min_{\alpha,\,\alpha^*} W(\alpha,\alpha^*) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\,K(\mathbf{x}_i,\mathbf{x}_j) + \varepsilon\sum_{i=1}^{N}(\alpha_i+\alpha_i^*) - \sum_{i=1}^{N} y_i(\alpha_i-\alpha_i^*)
$$

such that

$$
\sum_{i=1}^{N}(\alpha_i-\alpha_i^*) = 0, \qquad \alpha_i,\alpha_j,\alpha_i^*,\alpha_j^* \in [0, c] \text{ for } 1 \le i, j \le N, \qquad (1)
$$

where
α, α*: Lagrangian multipliers (each multiplier is expressed as an N-dimensional vector), with αi, αj ∈ α and αi*, αj* ∈ α*;
K: kernel function;
ε: user pre-defined regularization constant;
c: user pre-defined positive real constant for capacity control.
In this specific application, some parameters in Eq. (1) are specified as:
xi: engine input control parameters in the ith sample data point, i = 1, 2, …, N (i.e. the ith engine setup);
yi: engine output torque in the ith sample data point;
N: total number of engine setups.
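The dual problem of Eq. (1) can be solved numerically with a general-purpose optimizer. The following toy sketch uses scipy in place of the commercial optimization package used by the authors; the RBF kernel, toy data and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of solving Eq. (1) on a toy data set. The objective W, the equality
# constraint and the box bounds [0, c] follow the formulation above.

def rbf(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / sigma ** 2)

def solve_dual(X, y, eps=0.01, c=1.0, sigma=1.0):
    N = len(y)
    K = np.array([[rbf(X[i], X[j], sigma) for j in range(N)] for i in range(N)])
    def W(z):                         # z stacks alpha and alpha*
        a, a_star = z[:N], z[N:]
        d = a - a_star
        return 0.5 * d @ K @ d + eps * np.sum(a + a_star) - y @ d
    res = minimize(W, np.zeros(2 * N), bounds=[(0.0, c)] * (2 * N),
                   constraints={'type': 'eq',
                                'fun': lambda z: np.sum(z[:N] - z[N:])})
    return res.x[:N], res.x[N:]

X = np.array([[0.0], [0.5], [1.0]])
y = np.array([0.0, 0.5, 1.0])
alpha, alpha_star = solve_dual(X, y)
assert abs(float(np.sum(alpha - alpha_star))) < 1e-3  # equality constraint holds
```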
αi and αi* are known as support values corresponding to the ith data point, where the ith data point means the ith engine setup and output torque. Besides, the Radial Basis Function (RBF) with user pre-defined sample variance σ² is chosen as the kernel function because it often gives good results for nonlinear regression [8]. After solving Eq. (1) with a commercial optimization package, such as MATLAB and its optimization toolbox, the two N-vectors α and α* are obtained as the solutions, resulting in the following target multi-variable function:

$$
M(\mathbf{x}) = \sum_{i=1}^{N}(\alpha_i-\alpha_i^*)\,K(\mathbf{x},\mathbf{x}_i) + b = \sum_{i=1}^{N}(\alpha_i-\alpha_i^*)\,e^{-\frac{\|\mathbf{x}-\mathbf{x}_i\|^2}{\sigma^2}} + b \qquad (2)
$$

where
b: bias constant;
x: new engine input setup;
σ²: user-specified sample variance.
In order to obtain b, m training data points dk = <xk, yk> ∈ D, k = 1, 2, …, m, are selected such that their corresponding αk and αk* ∈ (0, c). By substituting xk into Eq. (2) and setting M(xk) = yk, a bias bk can be obtained. Since there are m biases, the optimal bias value b* is usually obtained by taking the average of the bk.
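The target function of Eq. (2) and the averaged-bias estimate just described can be sketched as follows; the function names are illustrative, not from the paper.

```python
import numpy as np

# predict implements Eq. (2) with the RBF kernel; average_bias averages
# the biases b_k over training points whose support values lie strictly
# inside (0, c), as described in the text.

def predict(x, X, alpha, alpha_star, b, sigma=1.0):
    k = np.exp(-np.sum((X - x) ** 2, axis=1) / sigma ** 2)
    return np.sum((alpha - alpha_star) * k) + b

def average_bias(X, y, alpha, alpha_star, c=1.0, sigma=1.0, tol=1e-6):
    biases = []
    for k, (xk, yk) in enumerate(zip(X, y)):
        if tol < alpha[k] < c - tol or tol < alpha_star[k] < c - tol:
            biases.append(yk - predict(xk, X, alpha, alpha_star, 0.0, sigma))
    return np.mean(biases) if biases else 0.0
```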
2. Application of SVM to Engine Modelling

In this application, M(x) in Eq. (2) is the torque model of an electronically controlled automotive engine. The issues in applying SVM to this domain are discussed in the following sub-sections.
2.1. Engine Data Representation

The training data set is expressed as D = {(xi, yi)}, i = 1 to N. In practice, there are many input control parameters and they are ECU and engine dependent. Moreover, the engine horsepower and torque curves are normally obtained at full-load condition. For the purpose of demonstrating the SVM methodology, the following common adjustable engine parameters and environmental parameter are selected as the input: x = <Ir, O, tr, f, Jr, d, a, p, v> and y = <Tr>,
where
r: engine speed (RPM), r = {1000, 1500, 2000, …, 8000};
Ir: ignition spark advance at the corresponding engine speed r (degrees before top dead center);
O: overall ignition trim (± degrees before top dead center);
tr: fuel injection time at the corresponding engine speed r (milliseconds);
f: overall fuel trim (± %);
Jr: timing for stopping the fuel injection at the corresponding engine speed r (degrees before top dead center);
d: ignition dwell time (milliseconds);
a: air temperature (°C);
p: fuel pressure (bar);
v: VTEC changeover point (RPM);
Tr: engine torque at the corresponding engine speed r (Nm).
The engine speed range for this project has been selected from 1000 RPM to 8000 RPM. Although the engine speed r is a continuous variable, in a practical ECU setup the engineer normally fills in the setup parameters for each category of engine speed in a map format. The map usually divides the speed range discretely at intervals of 500 RPM, i.e. r = {1000, 1500, 2000, 2500, …}. Therefore, it is unnecessary to build a function across all speeds, and r is manually categorized with a specified interval of 500 instead of ranging over every integer from 1000 to 8000. As some data is engine speed dependent, another notation Dr is used to specify the data subset with respect to a specific r. For example, D1000 contains the parameters <I1000, O, t1000, f, J1000, d, a, p, v, T1000>, while D8000 contains <I8000, O, t8000, f, J8000, d, a, p, v, T8000>. Consequently, D is separated into fifteen subsets, namely D1000, D1500, D2000, …, D8000. An example of the training data (engine setup) for D1000 is shown in Table 1. Every subset Dr is normalized within the range [0, 1] and then passed to the SVM regression module, Eq. (1), one by one, in order to construct fifteen torque models Mr(x) with respect to engine speed r, i.e. Mr(x) = Mr = {M1000, M1500, M2000, …, M8000}.
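The per-speed partitioning and [0, 1] normalization just described can be sketched as follows; the data layout (one triple per engine setup) and all names are illustrative assumptions.

```python
# Split the data set into subsets D_r keyed by engine speed r, then
# min-max scale each column of a subset into [0, 1].

def partition_by_speed(D):
    """D: list of (r, x, y) triples; returns {r: [(x, y), ...]}."""
    subsets = {}
    for r, x, y in D:
        subsets.setdefault(r, []).append((x, y))
    return subsets

def minmax(column):
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]

D = [(1000, [8.0, 7.1], 20.5), (1000, [11.0, 6.5], 11.0), (1500, [9.0, 7.0], 25.0)]
subsets = partition_by_speed(D)
assert sorted(subsets) == [1000, 1500] and len(subsets[1000]) == 2
assert minmax([8.0, 11.0]) == [0.0, 1.0]
```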
The normalization process can prevent any parameter from dominating the output value.

Table 1. Example of training data di in data set D1000

      I1000   O   t1000   f   J1000   d     a    p    v     T1000
d1    8       0   7.1     0   385     3     22   2.8  5500  20.5
d2    11      2   6.5     0   360     3     24   2.8  4000  11
...
dN    12      0   8.5     3   365     2.6   30   2.8  6500  12.6
In this way, the SVM module is run fifteen times. At each run, a distinct subset Dr is used as the training set to estimate its corresponding torque model. A torque against engine speed curve of the engine can therefore be obtained by fitting a curve through the data points generated by M1000, M1500, M2000, …, M8000. Of course, the generated data points must be de-normalized in order to obtain the actual output torque values. A case of the actual engine torque curve acquired by dynamometer and the predicted engine torque curve using SVM is shown in Figure 1.
Figure 1. A case of predicted and actual engine torque and power curves
3. Data Sampling and Implementation Issues

In a practical engine setup, the automotive engineer determines an initial setup which can basically start the engine, and then the engine is fine-tuned by adjusting the parameters around the initial setup values. Therefore, the input parameters are sampled around the initial setup parameters supplied by the engine manufacturer. In our experiment, a sample data set D of 250 different engine setups along with torque output was acquired from a Honda B16A DOHC VTEC engine (Figure 2) controlled by a programmable ECU, a MoTeC M4 (Figure 3), running on a chassis dynamometer (Figure 4) at wide open throttle.
Figure 2. Honda B16A DOHC VTEC engine for testing
Figure 3. Adjustment of engine input parameters using MoTeC M4 programmable ECU
Figure 4. Car engine power & torque data acquisition on a chassis dynamometer
The output data is only the engine torque against the engine speeds because the horsepower of an engine can be easily calculated using Eq. (3) [9].
HP = (r × T) / 7123.78   (3)
where HP: engine horsepower (Hp); r: engine speed (RPM); T: engine torque (Nm). After collection of the sample data set D, every data subset Dr ⊂ D is randomly divided into two sets: TRAINr for training and TESTr for testing, such that Dr = TRAINr ∪ TESTr, where TRAINr contains 80% of Dr and TESTr holds the remaining 20%. Then every TRAINr is sent to the SVM module for training, which has been implemented using MATLAB 6.5 with its optimization toolbox running on the MS Windows XP platform. The detailed implementation is discussed in the following subsection.
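Eq. (3) is a direct unit conversion and can be transcribed as:

```python
# Eq. (3): horsepower from engine speed (RPM) and torque (Nm).

def horsepower(r, T):
    return r * T / 7123.78

# e.g. 100 Nm at 7123.78 RPM corresponds to 100 Hp.
assert abs(horsepower(7123.78, 100.0) - 100.0) < 1e-9
```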
3.1. SVM Training

Before training the SVM system, the hyper-parameters are set to c = σ = 1, which are common choices. The remaining hyper-parameter to be found is therefore ε. In our case, the value of ε is taken from 0.01 to 0.2 in increments of 0.01, i.e. 20 values in total: 0.01, 0.02, 0.03, …, 0.2. After applying 10-fold cross validation to a training set TRAINr for each of the 20 values, the ε value producing the minimum validation error for TRAINr is chosen as the best hyper-parameter εr*. 10-fold cross validation is a well-known technique for the determination of hyper-parameters; for more details, please refer to [10]. By repeating this procedure fifteen times, the εr* values for all TRAINr are determined. Finally, the fifteen torque models Mr are produced by the SVM module based on the corresponding training data set TRAINr and the determined hyper-parameter εr*. The bias b* for each model Mr can also be easily calculated by taking the average of the bk obtained from Eq. (2).
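The grid search over ε with 10-fold cross validation described above can be sketched with scikit-learn as a modern stand-in for the authors' MATLAB setup; the synthetic data is an assumption made only to keep the example runnable. Note that scikit-learn's RBF kernel uses exp(-gamma·||x - x'||²), so gamma = 1/σ² = 1 matches Eq. (2) with σ = 1.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# c = sigma = 1 is fixed; epsilon is chosen by 10-fold cross validation
# over the grid 0.01, 0.02, ..., 0.2, as in the procedure above.

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 3))        # stand-in engine setups
y = X @ np.array([1.0, -0.5, 2.0])          # stand-in torque values

search = GridSearchCV(SVR(kernel='rbf', C=1.0, gamma=1.0),
                      {'epsilon': np.arange(0.01, 0.205, 0.01)},
                      cv=10, scoring='neg_mean_squared_error')
search.fit(X, y)
best_eps = search.best_params_['epsilon']
assert 0.01 - 1e-9 <= best_eps <= 0.2 + 1e-9
```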
4. Results

After obtaining all torque models Mr, their accuracy is evaluated one by one against their own test sets TESTr. To verify the accuracy of each model Mr, an error function has been established. For a certain model Mr, the corresponding validation error is:
$$
E_r = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{y_i - M_r(\mathbf{x}_i)}{y_i}\right)^2} \qquad (4)
$$
where xi ∈ Rn is the engine input parameter vector of the ith data point in a test or validation set, di = <xi, yi> represents the ith data point, yi is the true torque value in the data point di, and N is the number of data points in the test or validation set. The error Er is the root-mean-square of the difference between the true torque value yi of a test point di and its corresponding estimated torque value Mr(xi). Each difference is divided by the true torque yi so that the result is normalized, keeping the error Er within the range [0, 1]. Hence the accuracy rate for each torque model Mr is calculated using the following formula:
Accuracyr = (1 − Er) × 100%.   (5)
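Eqs. (4) and (5) can be transcribed directly; the function names below are my own.

```python
import math

# validation_error implements Eq. (4): normalized root-mean-square error
# over a test set; accuracy implements Eq. (5).

def validation_error(y_true, y_pred):
    N = len(y_true)
    return math.sqrt(sum(((y - m) / y) ** 2 for y, m in zip(y_true, y_pred)) / N)

def accuracy(y_true, y_pred):
    return (1.0 - validation_error(y_true, y_pred)) * 100.0

# A perfect model has zero error and 100% accuracy.
assert accuracy([20.5, 11.0], [20.5, 11.0]) == 100.0
```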
According to the accuracy obtained in Table 2, the predicted results are in good agreement with the actual test results under their hyper-parameters εr*. One case of the actual engine torque and power data acquired by dynamometer and the predicted engine torque and power data using SVM is shown in Figure 1. Nevertheless, it is believed that the model accuracy could be further improved by increasing the number of training data.

Table 2. Accuracy of various models Mr and the corresponding hyper-parameter (assuming c = σ = 1)

Engine torque model Mr   εr*    br*   Average accuracy with test set TESTr
M1000                    0.08   2.3   91.2%
M1500                    0.11   2.5   90.4%
M2000                    0.12   1.9   89.6%
M2500                    0.04   3.3   91.2%
M3000                    0.09   1.4   92.2%
M3500                    0.17   4.1   86.2%
M4000                    0.06   1.2   90.3%
M4500                    0.18   2.2   94.4%
M5000                    0.16   0.7   87.1%
M5500                    0.18   0.7   85.2%
M6000                    0.09   0.9   88.7%
M6500                    0.20   3.3   93.5%
M7000                    0.13   3.0   91.4%
M7500                    0.12   1.3   90.6%
M8000                    0.11   1.1   86.8%
5. Conclusions

The SVM method was applied to produce a set of power and torque models for an electronically controlled automotive engine at different engine speeds. The models were separately regressed based on fifteen sets of sample data acquired from an electronically controlled automotive engine. Experiments have been done to assess the accuracy of the power and torque models, and the results are highly
satisfactory. The prediction models developed are very useful for vehicle fine tune-up because the effect of a trial ECU setup can be predicted as a gain or loss before running the vehicle engine on a dynamometer or road test. Hence the prediction models can greatly reduce the number of expensive dynamometer tests, saving not only the time taken for optimal tune-up but also a large amount of expenditure on fuel, spare parts, automotive fluids, etc. The models can also let the automotive engineer predict whether a new engine setup yields a gain or loss during road tests, where a dynamometer is unavailable. This methodology can be applied to different kinds of vehicle engines.
Multi-view Semi-supervised Learning: An Approach to Obtain Different Views from Text Datasets

Edson Takashi Matsubara, Maria Carolina Monard and Gustavo E. A. P. A. Batista
University of São Paulo – USP
Institute of Mathematics and Computer Science – ICMC
Laboratory of Computational Intelligence – LABIC
P. O. Box 668, 13560-970, São Carlos, SP, Brazil
{edsontm, mcmonard, gbatista}@icmc.usp.br

Abstract. The supervised machine learning approach usually requires a large number of labelled examples to learn accurately. However, labelling can be a costly and time-consuming process, especially when performed manually. In contrast, unlabelled examples are usually inexpensive and easy to obtain. This is the case for text classification tasks involving on-line data sources, such as web pages, email and scientific papers. Semi-supervised learning, a relatively new area in machine learning, represents a blend of supervised and unsupervised learning, and has the potential to reduce the need for expensive labelled data whenever only a small set of labelled examples is available. Multi-view semi-supervised learning requires a partitioned description of each example into at least two distinct views. In this work, we propose a simple approach to pre-processing textual documents in order to easily construct the two different views required by any multi-view learning algorithm. Experimental results related to text classification are described, suggesting that our proposal to construct the views performs well in practice.
1. Introduction

Due to the rapidly increasing amount of textual data available and the range of interesting and important problems arising in text analysis, there has been a growing interest in applying machine learning methods to text. By combining unsupervised and supervised learning, the need for labelled training data can often be greatly reduced, allowing for the development of more powerful models and methods. Methods that have been proposed under this paradigm are known as semi-supervised learning, and can be considered the middle road between supervised and unsupervised learning. Semi-supervised algorithms learn a concept definition by combining a small set of labelled and a large set of unlabelled examples. The multi-view semi-supervised CO-TRAINING method [1] dealt with in this work applies to datasets that have a natural separation of their features into two disjoint sets. In other words, each example is described by two disjoint views, and each view is sufficient for inducing a classifier. A supervised learning system is trained separately on each view, producing two different classifiers. These classifiers are used to label the unlabelled examples, assigning a confidence level to each classification. Unlabelled examples classified with high confidence are used to enlarge the pool of labelled examples; this process is repeated, incrementing the labelled set until a stopping criterion is reached.
E.T. Matsubara et al. / Multi-View Semi-Supervised Learning
In this work we propose and evaluate a simple approach to obtain the two disjoint views needed by CO-TRAINING from any textual database. In order to evaluate the proposed approach, we perform an experimental evaluation with a set of documents extracted from scientific articles published in the Lecture Notes in Artificial Intelligence series. The experimental results were obtained using PRETEXT [2], a computational environment for text pre-processing that implements our approach to construct the views from text data, and an implementation of the CO-TRAINING algorithm using Naive Bayes as the underlying learner. The rest of this paper is organized as follows: Section 2 reports some related work on semi-supervised learning. Section 3 describes the CO-TRAINING algorithm and some extended features present in our implementation of this algorithm. Section 4 presents our proposed approach to construct the different views. Section 5 reports the results obtained in the experimental evaluation, and Section 6 concludes this paper.

2. Related Work

Semi-supervised learning algorithms can be divided into single-view and multiple-view ones [3]. In a single-view scenario, the algorithms have access to the entire set of domain features. In a multi-view setting, the domain features are partitioned into subsets (views), each of which is sufficient for learning the target concept. Single-view algorithms can be split up into transductive algorithms [4], Expectation Maximization (EM) variations [5], background-knowledge-based algorithms [6] and seeded clustering algorithms [7]. Multi-view algorithms are based on the assumption that the views are both compatible and uncorrelated. A dataset is compatible if all examples are labelled identically by the target concepts in each view. Two views are uncorrelated if, given the label of any example, its descriptions in each view are independent.
The CO-TRAINING algorithm introduced the theoretical foundations of multi-view learning, and other multi-view learning algorithms have since been proposed, such as: CO-EM [8], which combines EM and CO-TRAINING; CO-TESTING [3], which combines active and semi-supervised learning; and CO-EMT [3], an extension of CO-TESTING with CO-EM. The use of Support Vector Machines (SVM) instead of Naive Bayes (NB) as the underlying learner is proposed in [9, 10]. An improved version of CO-EM using SVM is proposed in [11], showing experimental results that outperform other algorithms. Applications of CO-TRAINING include email classification [9], named entity recognition [12], wrapper induction [13], and classification of web pages [1]. However, multi-view learning algorithms are highly dependent on the application. For example, the views in [1] consist of words in the hyperlinks pointing to the pages and words in the web pages themselves, while the first and second views in [9] consist of the body and the head of emails, respectively. Thus, the views of a dataset can be obtained in different ways. In this work we propose a simple and general way to obtain two views from textual documents.

3. The CO-TRAINING Algorithm

Given a set of N examples E = {E1, ..., EN} defined by a set of M features X = {X1, X2, ..., XM}, CO-TRAINING needs two disjoint views of the set of examples E, namely views D1 and D2. We shall refer to these two views as XD1 and XD2, such that X = XD1 ∪ XD2 and XD1 ∩ XD2 = ∅, and where each view is sufficient to induce a classifier. For simplicity, let us consider XD1 = {X1, X2, . . . , Xj} and XD2 = {Xj+1, Xj+2, . . . , XM} — Figure 1(a). For unlabelled data we consider the y value as "?". Furthermore, there are few examples in set E for which the value of the label y is known. The set E can be divided into two subsets of Labelled (L) and Unlabelled (U) examples. The subset L ⊂ E composed of the labelled examples is further divided into two disjoint views LD1 and LD2, where L = LD1 ∪ LD2 and LD1 ∩ LD2 = ∅. Similarly, the subset of unlabelled examples U ⊂ E is split up into two disjoint views UD1 and UD2, where U = UD1 ∪ UD2 and UD1 ∩ UD2 = ∅. These four subsets LD1, LD2, UD1 and UD2, illustrated in Figure 1(b), constitute the input to CO-TRAINING, described by Algorithm 1.

Figure 1. The two views used as input for CO-TRAINING: (a) disjoint views XD1 and XD2 of E; (b) subsets LD1, UD1, LD2 and UD2 used by CO-TRAINING.

Initially, a small pool U' ⊂ U of unlabelled examples is created. U' examples consist of two views, U'D1 and U'D2, which are withdrawn from UD1 and UD2, respectively. It is important to note that U' = U'D1 ∪ U'D2 and U'D1 ∩ U'D2 = ∅. After the creation of U', the main loop of Algorithm 1 starts. The training examples LD1 and LD2 are used to induce two classifiers hD1 and hD2, respectively. Using these two classifiers, examples from U'D1 and U'D2 are labelled and inserted in R'D1 and R'D2, respectively. After that, the labelled examples in R'D1 and R'D2 are given to the function bestExamples, which is responsible for selecting the "best" examples to be inserted in LD1 and LD2. bestExamples only considers examples from R'D1 and R'D2 that have the same class label. After the examples are inserted in LD1 and LD2, the process is repeated until a stopping criterion is reached. Currently, two stopping criteria are implemented: either the user-defined maximum number of iterations is reached or the U' sets become empty.

We have implemented several extended features in our implementation of CO-TRAINING. For instance, the bestExamples function has some parameters that enable the user to set how the examples from R'D1 and R'D2 are selected. Two of these parameters are: (i) the minimum probability to label an example; and (ii) the maximum number of examples of each class that may be inserted in L. These two parameters are very important: the first one defines a minimum confidence level to label an example; the second one influences the class distribution of the examples in LD1 and LD2. Next, we describe the proposed procedure to obtain the two disjoint views XD1 and XD2 from texts.
Algorithm 1: CO-TRAINING
Input: LD1, LD2, UD1, UD2, k
Output: LD1, LD2
Build U'D1 and U'D2 as described;
UD1 = UD1 − U'D1;
UD2 = UD2 − U'D2;
for i = 0 to k do
    Induce hD1 from LD1;
    Induce hD2 from LD2;
    R'D1 = hD1(U'D1); (set of classified examples from U'D1)
    R'D2 = hD2(U'D2); (set of classified examples from U'D2)
    (RD1, RD2) = bestExamples(R'D1, R'D2);
    LD1 = LD1 ∪ RD1;
    LD2 = LD2 ∪ RD2;
    if UD1 = ∅ then
        return (LD1, LD2)
    else
        Randomly select examples from UD1 and UD2 to replenish U'D1 and U'D2, respectively;
    end
end
return (LD1, LD2);
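Algorithm 1 can be sketched in runnable form. The toy implementation below uses a minimal multinomial Naive Bayes over bag-of-words dictionaries as the underlying learner; the simple `cap` parameter is a crude stand-in for the per-class insertion limit, and the data, pool size and thresholds are illustrative assumptions, not the paper's actual settings.

```python
import math
from collections import defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes over bag-of-words dicts."""
    def fit(self, examples):  # examples: list of (bow, label)
        self.counts = defaultdict(lambda: defaultdict(int))
        self.priors = defaultdict(int)
        self.vocab = set()
        for bow, y in examples:
            self.priors[y] += 1
            for w, c in bow.items():
                self.counts[y][w] += c
                self.vocab.add(w)
        return self

    def predict_proba(self, bow):
        total = sum(self.priors.values())
        log_scores = {}
        for y in self.priors:
            s = math.log(self.priors[y] / total)
            denom = sum(self.counts[y].values()) + len(self.vocab)
            for w, c in bow.items():
                s += c * math.log((self.counts[y][w] + 1) / denom)  # Laplace
            log_scores[y] = s
        m = max(log_scores.values())
        exp = {y: math.exp(v - m) for y, v in log_scores.items()}
        z = sum(exp.values())
        return {y: v / z for y, v in exp.items()}

def co_training(L1, L2, U1, U2, k=10, pool=4, min_prob=0.6, cap=2):
    # L1/L2: labelled views as [(bow, label)]; U1/U2: parallel unlabelled bows
    U1, U2 = list(U1), list(U2)
    P1, P2 = U1[:pool], U2[:pool]          # the small pool U'
    U1, U2 = U1[pool:], U2[pool:]
    for _ in range(k):
        h1 = NaiveBayes().fit(L1)
        h2 = NaiveBayes().fit(L2)
        picked = []
        for i, (x1, x2) in enumerate(zip(P1, P2)):
            p1, p2 = h1.predict_proba(x1), h2.predict_proba(x2)
            y1 = max(p1, key=p1.get)
            y2 = max(p2, key=p2.get)
            # bestExamples: both views must agree and be confident enough
            if y1 == y2 and min(p1[y1], p2[y2]) >= min_prob:
                picked.append((i, y1))
        picked = picked[:cap]              # crude stand-in for the per-class limit
        for i, y in picked:
            L1.append((P1[i], y))
            L2.append((P2[i], y))
        for i, _ in sorted(picked, reverse=True):
            del P1[i], P2[i]
        while len(P1) < pool and U1:       # replenish the pool from U
            P1.append(U1.pop(0))
            P2.append(U2.pop(0))
        if not P1:
            break
    return L1, L2

# toy parallel views (1-gram-like and 2-gram-like); purely illustrative
L1 = [({"logic": 2}, "ilp"), ({"case": 2}, "cbr")]
L2 = [({"logic_prog": 2}, "ilp"), ({"case_based": 2}, "cbr")]
U1 = [{"logic": 1}, {"logic": 3}, {"case": 1}, {"case": 2}]
U2 = [{"logic_prog": 1}, {"logic_prog": 2}, {"case_based": 1}, {"case_based": 1}]
L1, L2 = co_training(L1, L2, U1, U2)
```

On this toy data, all four unlabelled examples are labelled over two iterations, after which the pool is exhausted and the loop terminates, mirroring the "U' sets become empty" stopping criterion.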
4. Constructing two disjoint views

The attribute-value representation of documents used in Text Mining provides a natural framework to create the two disjoint views needed by CO-TRAINING. However, the attribute-value representation is characterized by very high-dimensional data, since every word in the document may be treated as an attribute. In this work, we use a text pre-processing computational tool we have implemented, called PRETEXT [2], to efficiently decompose text into words (stems) using the bag-of-words approach, as well as to reduce the dimensionality of its representation, making text accessible to most learning algorithms that require each example to be described by a vector of fixed dimensionality. The documents may be written in Portuguese, Spanish or English. Our tool is based on Porter's stemming algorithm for the English language, which was adapted for Portuguese and Spanish. In addition, the tool includes facilities to reduce the dimensionality of datasets using the well-known Zipf's law and Luhn cut-offs. In the identification of terms as bag-of-words, a term can be represented by simple words (1-gram), which are represented by the stem of simple words in our tool, or by composed words (2- and 3-grams) that occur in the document. Each term is used as an attribute of the dataset represented in the attribute-value format. It can be observed that the two views needed by CO-TRAINING can easily be constructed using this approach. In this work, we have used the 1-gram representation for one view and the 2-gram representation for the other view. Furthermore, PRETEXT implements several known measures to represent the value of terms in the documents. In this work we have used the term frequency measure, which counts the number of occurrences of a term in a document.

5. Experimental Evaluation

We carried out an experimental evaluation using the LNAI dataset [14], a collection of titles, abstracts and references of 277 (70%) articles from Inductive Logic Programming (ILP) and 119 (30%) articles from Case Based Reasoning (CBR) published in Lecture Notes in Artificial Intelligence (LNAI). Using PRETEXT we constructed the 1-gram and 2-gram views. For both views, only stems that appeared more than once in all documents were
considered. After this pre-processing phase, there were 2,914 stems (attributes) left for the 1-gram view and 3,245 for the 2-gram view. Table 1 summarizes the datasets employed in this study. It shows the number of documents (#Doc) in the LNAI dataset, the number of attributes (#Attributes) in each view, and the class distribution. It is important to note that the LNAI dataset is completely labelled. This allows us to analyze the behavior of CO-TRAINING by comparing the labels assigned by CO-TRAINING in each iteration with the true labels. In other words, we use the CO-TRAINING algorithm in a simulated mode, in which the true labels are hidden from the algorithm. In order to obtain a lower bound on the error that CO-TRAINING can reach on this dataset, we measured the error rate of a Naive Bayes (NB) classifier using all examples and 10-fold cross-validation. This result (mean error and respective standard deviation) is shown in the last column (NB Error) of Table 1, as well as the prediction power of each individual view.
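The 1-gram and 2-gram views described in Section 4 can be sketched as follows; stemming and the Zipf/Luhn dimensionality cuts are omitted, and the input is assumed to be already tokenized (the sample tokens are illustrative stems, not real dataset content).

```python
def views(tokens):
    """Build the two disjoint views from a tokenized (and, in the paper,
    stemmed) document: 1-gram and 2-gram term frequencies."""
    v1, v2 = {}, {}
    for t in tokens:
        v1[t] = v1.get(t, 0) + 1
    for a, b in zip(tokens, tokens[1:]):
        bigram = a + "_" + b
        v2[bigram] = v2.get(bigram, 0) + 1
    return v1, v2

v1, v2 = views(["induct", "logic", "program", "logic", "program"])
```

Because every 2-gram term contains a separator absent from 1-gram terms, the two attribute sets are disjoint by construction, which is exactly the property CO-TRAINING needs.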
In order to measure the behavior of CO-TRAINING using 10-fold cross-validation, we adapted the sampling method as shown in Figure 2. First, 10 folds were created for each view. Afterwards, pairs of folds from each view were considered, i.e., the first fold of view 1 with the first fold of view 2, the second fold of view 1 with the second fold of view 2, and so on.
Figure 2. 10-fold construction for CO-TRAINING evaluation
As the main idea of semi-supervised learning is to use a large unlabelled sample to improve the performance of supervised learning algorithms when only a small set of labelled
examples is available, the first experiment aims to verify the behavior of CO-TRAINING using different numbers of initial labelled examples. For this experiment, the number of examples of each class that may be inserted into L in each iteration was set to 2 for the ILP class and 2 for the CBR class. In addition, for this and the subsequent experiments the minimum probability to label an example was set to 0.6. Table 2 shows the results obtained, where |Lini| and |Uini| refer respectively to the initial number of labelled and unlabelled examples; |Lend| shows the mean number of examples labelled by CO-TRAINING after execution; and #Errors and %Errors show respectively the mean number and proportion of incorrectly labelled examples, where %Errors = #Errors/(|Lend| − |Lini|). Standard deviations are shown in brackets. In all cases the stopping criterion reached was UD1 = ∅ — Algorithm 1 — for k near 70.

  %    |Lini|   |Uini|   |Lend|         #Errors      %Errors
  2%   6        350      275.7 (2.9)    11.0 (2.5)   4.1% (0.9)
  5%   17       339      276.8 (4.6)    9.5 (3.3)    3.7% (1.2)
  7%   24       332      276.0 (3.4)    7.1 (2.9)    2.8% (1.1)
  10%  34       322      279.8 (1.7)    7.4 (1.8)    3.0% (0.8)

Table 2. Mean number of CO-TRAINING incorrectly labelled examples varying |Lini|
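The %Errors column follows from the formula given in the text, %Errors = #Errors/(|Lend| − |Lini|); a quick arithmetic check of the table's mean values:

```python
# Each row: (|L_ini|, mean |L_end|, mean #Errors) from Table 2
rows = [(6, 275.7, 11.0), (17, 276.8, 9.5), (24, 276.0, 7.1), (34, 279.8, 7.4)]

# %Errors = #Errors / (|L_end| - |L_ini|), expressed as a percentage
pct_errors = [100 * e / (lend - lini) for lini, lend, e in rows]
```

Rounded to one decimal place, this reproduces the 4.1%, 3.7%, 2.8% and 3.0% figures in the table.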
The performance of CO-TRAINING using the constructed views is very good for all |Lini| values, since few examples were labelled erroneously. Moreover, using NB as the underlying classifier, it is possible to construct a combined classifier h which computes the probability P(yv, Ei) of class yv given the instance Ei = (xD1i, xD2i) by multiplying the class probabilities of hD1 and hD2, i.e., P(yv, Ei) = P(yv|xD1i)P(yv|xD2i). Table 3 shows the mean error and standard deviation of the classifiers hD1, hD2 and h on the first and last iterations of CO-TRAINING, and Figure 3 shows the mean error in each iteration.

  % |Lini|  Iteration   hD1 (1-gram)   hD2 (2-gram)   h
  2%        first       13.4 (7.7)     20.0 (8.0)     11.1 (6.4)
            last        5.3 (4.4)      4.0 (3.0)      5.8 (4.1)
  5%        first       8.3 (4.9)      10.3 (4.3)     7.6 (4.6)
            last        4.3 (3.4)      4.3 (2.9)      3.0 (2.3)
  7%        first       6.6 (4.6)      8.3 (4.0)      3.0 (3.3)
            last        4.0 (3.8)      3.5 (2.5)      3.0 (2.6)
  10%       first       5.0 (3.9)      7.6 (2.9)      4.5 (3.7)
            last        3.3 (4.5)      4.0 (3.3)      3.0 (3.3)

Table 3. Mean error of NB and combined classifiers on the first and last iterations
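The combined classifier h above multiplies the per-view class probabilities and renormalizes; a minimal sketch with made-up per-view posteriors:

```python
def combine(p1, p2):
    # h multiplies the class probabilities of h_D1 and h_D2 and renormalizes
    joint = {y: p1[y] * p2[y] for y in p1}
    z = sum(joint.values())
    return {y: v / z for y, v in joint.items()}

# hypothetical per-view posteriors for a single instance
p = combine({"ILP": 0.7, "CBR": 0.3}, {"ILP": 0.6, "CBR": 0.4})
```

When the two views agree, multiplication sharpens the combined posterior, which is consistent with h often outperforming hD1 and hD2 individually in Table 3.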
The maximum number of examples of each class inserted into L is an important parameter for CO-TRAINING [1]. We executed CO-TRAINING with three different settings: (i) the bestExamples function selects examples from U' in the same proportion as the class distribution; (ii) bestExamples selects the same number of examples from each class; and (iii) bestExamples selects examples in the inverse proportion of the class distribution. It is important to note that the class distribution is known because the LNAI dataset is completely labelled. However, when only a small set of labelled examples is available, the class distribution might not be accurately estimated from the data. In these cases, the class distribution might be estimated using domain knowledge, whenever this knowledge is available.

Figure 3. Mean error of combined classifiers for different values of |Lini|

The following experiment evaluates the impact of selecting examples at different distributions. Table 4 shows the results for |Lini| = 17, since similar results were obtained for the other three cases. Best results were always obtained selecting examples at a proportion similar to the class distribution — results in bold in Tables 4 and 5. Observe that, except for the inverse proportion case, the errors of hD1 and hD2 on the last iteration are acceptable compared with the ones obtained by NB on the whole dataset — Table 1. This might indicate that selecting the same proportion of examples can be an acceptable choice when no further information related to the class distribution is available.

  |Lini|  |Uini|  (maj,min)   |Lend|        #Error       %Error
  17      339     (4,2)       312.4 (2.0)   3.0 (0.9)    1.0% (0.3)
                  (2,2)       276.8 (4.6)   9.5 (3.3)    3.7% (1.2)
                  (2,4)       225.6 (5.2)   12.4 (4.7)   5.9% (2.2)

Table 4. CO-TRAINING performance for different proportions of examples selected in each iteration
  (maj,min)  Iteration   hD1 (1-gram)   hD2 (2-gram)   h
  (4,2)      first       7.8 (2.2)      11.1 (3.2)     7.3 (2.8)
             last        2.3 (3.0)      2.5 (2.1)      1.8 (2.1)
  (2,2)      first       8.3 (4.9)      10.4 (4.3)     7.6 (4.6)
             last        4.3 (3.4)      4.3 (2.9)      3.0 (2.3)
  (2,4)      first       7.8 (3.1)      11.4 (7.9)     7.6 (3.0)
             last        4.5 (4.5)      5.3 (6.5)      5.8 (4.8)

Table 5. NB classifiers mean error and standard deviation on CO-TRAINING first and last iterations
6. Conclusions

In this work we propose a simple pre-processing method to construct the views required by multi-view semi-supervised learning algorithms. The proposed approach can be applied to
any set of textual documents. Experiments with CO-TRAINING on a set of documents extracted from scientific articles showed the applicability of this proposal, as well as encouraging initial results. We also show the importance of "tuning" the CO-TRAINING execution in order to obtain better results. Further research should provide a broader experimental evaluation on other textual datasets.

Acknowledgements. This work was partially supported by the Brazilian Research Councils CAPES and FAPESP.

References
[1] Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In Proc. 11th Annu. Conf. on Comput. Learning Theory, pages 92–100. ACM Press, New York, NY, 1998.
[2] Edson Takashi Matsubara, Claudia Aparecida Martins, and Maria Carolina Monard. PreTexT: A pre-processing text tool using the bag-of-words approach. Technical Report 209, ICMC-USP, 2003. (In Portuguese.) ftp://ftp.icmc.sc.usp.br/pub/BIBLIOTECA/rel_tec/RT_209.zip
[3] Ion Muslea. Active Learning With Multiple Views. PhD dissertation, University of Southern California, 2002.
[4] V. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.
[5] Kamal Nigam and Rayid Ghani. Analyzing the effectiveness and applicability of co-training. In Conference on Information and Knowledge Management, pages 86–93, 2000.
[6] Kiri Wagstaff, Claire Cardie, Seth Rogers, and Stefan Schroedl. Constrained k-means clustering with background knowledge. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 577–584, 2001.
[7] Marcelo Kaminski Sanches. Semi-supervised learning: an approach to label examples from a small pool of labeled examples. Master dissertation, ICMC-USP, 2003. (In Portuguese.) http://www.teses.usp.br/teses/disponiveis/55/55134/tde-12102003-140536
[8] Kamal Nigam, Andrew K. McCallum, Sebastian Thrun, and Tom M. Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2/3):103–134, 2000.
[9] Svetlana Kiritchenko and Stan Matwin. Email classification with co-training. Technical report, University of Ottawa, 2002.
[10] Michael Kockelkorn, Andreas Lüneburg, and Tobias Scheffer. Using transduction and multi-view learning to answer emails. In Proceedings of the European Conference on Principle and Practice of Knowledge Discovery in Databases, pages 266–277. Springer-Verlag, 2003.
[11] Ulf Brefeld and Tobias Scheffer. Co-EM support vector learning. In Proceedings of the International Conference on Machine Learning. Morgan Kaufmann, 2004.
[12] M. Collins and Y. Singer. Unsupervised models for named entity classification. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 100–110, 1999.
[13] Ion Muslea, Steve Minton, and Craig Knoblock. Active + semi-supervised learning = robust multi-view learning. In International Conference on Machine Learning, pages 435–432. Morgan Kaufmann, 2002.
[14] Vinícios Melo, Marcos Secato, and Alneu Andrade Lopes. Automatic extraction and identification of bibliographical information from scientific articles (in Portuguese). In IV Workshop on Advances and Trends in AI, pages 1–10, Chile, 2003.
A Planning-Based Knowledge Acquisition Methodology

Eder Mateus Nunes Gonçalves and Guilherme Bittencourt
UFSC - Federal University of Santa Catarina
DAS - Automation and Systems Department
Abstract. In the development of complex distributed systems using a cognitive multi-agent approach, where each agent encapsulates an expert system, the knowledge acquisition process is known to be the most difficult task. This paper presents a methodology, based on planning and on high-level Petri nets, for the knowledge acquisition process of such systems. This methodology was applied in the implementation of a robot soccer team for the RoboCup simulator.

Keywords. Artificial Intelligence, Knowledge Acquisition, Planning, Petri Nets
1. Introduction

Distributed Artificial Intelligence (DAI) is one of the fastest growing subdomains of Artificial Intelligence (AI) [3]. DAI is concerned with the application of AI techniques, supported by Distributed Systems methods and tools, to solve complex distributed problems. These kinds of problems share the following characteristics: (i) they are physically and/or conceptually distributed, in the sense that their global state is composed of the aggregation of partially independent local states; (ii) the tasks involved in solving these problems refer to different levels of abstraction, varying from global coordination protocols to local perception/action procedures that use sensors to perceive the world state and effectors to act in the world. A possible framework to solve this class of problems is the cognitive multi-agent systems approach. In this approach, each agent in the multi-agent society encapsulates a knowledge-based system, usually an expert system [5], that is responsible for the reasoning capabilities of the agent, but in this case it should also consider its role in the social context. The problem complexity can be split into a hierarchical set of plans [4] and implemented in the form of one or more expert systems. How to elicit these plans and how to represent them as expert system knowledge bases is the so-called knowledge acquisition problem [9]. The knowledge acquisition process is of fundamental importance in any knowledge-based system and is known to be the bottleneck of any traditional expert system development project. On the one hand, the expert system designer, the knowledge engineer,

1 Correspondence to both authors: Caixa Postal 476, CEP 88040-400, Florianópolis - SC - Brazil. Tel.: +55 48 331 7576; Fax: +55 48 331 9934; E-mails: {eder | gb}@das.ufsc.br
should represent the acquired knowledge using a given knowledge representation language, in general a quite complex task. On the other hand, the person that is knowledgeable about the domain, the expert, usually has a different background and perspective, which may lead to serious communication problems that can even compromise the whole acquisition process. In this context, the classical expert system development methodologies [9] are only partially applicable to this class of problems, where several expert system knowledge bases must be constructed, each one associated with a different agent that composes the multi-agent system. This paper proposes a knowledge acquisition methodology for complex distributed systems that uses the Petri net formalism as a representation tool. The proposed methodology considers, at each abstraction level¹, an interaction between the knowledge engineer and the expert, where the objective is to determine the goals at this level. This is formalized as a planning problem, and further expressed as a set of ordinary Petri nets, which here work as a communication language between the knowledge engineer and the expert. The resulting ordinary Petri nets are integrated, according to their abstraction levels, into a hierarchical high-level Petri net that uses knowledge bases as tokens. The hierarchical high-level Petri net can be directly simulated by a player program or automatically translated into the language supported by a suitable expert system shell. The paper is organized as follows. In Section 2, the use of the planning approach to acquire knowledge in a complex distributed problem context is presented. Section 3 shows how to use Petri nets to represent the obtained plans and defines a hierarchical high-level Petri net model that integrates the different abstraction levels and allows the introduction of knowledge bases as tokens.
Section 4 describes how the resulting hierarchical high-level Petri net can be translated into expert system rules. Finally, Section 5 presents the conclusions and comments upon future work.
2. Knowledge Acquisition as a Planning Problem

We claim that in cognitive multi-agent systems aimed at solving complex distributed problems, the rules of the expert systems that control the agents can be elicited more easily if they are seen as actions in plans in a social context. Planning [4] is concerned with the automatic synthesis of action strategies (plans) based on a formal description of perceptions, actions and goals. A plan can be seen as an action sequence that leads a system from an initial state to a goal state. Planning is the process that generates such sequences. In the classical deterministic approach to planning, the plan generation problem can be represented by:

• A discrete finite state space S.
• An initial state s0 ∈ S.
• A non-empty set of goals SG ⊆ S.
• A set of actions A = {a1, . . . , an} and a mapping α : S → 2^A that defines for each state s ∈ S which actions α(s) ⊆ A are possible.
1 The methodology is designed to be applied in a top-down way, beginning at the highest abstraction level, but usually many backtracks between abstraction levels will be necessary.
• A deterministic state transition function f : S × A → S such that, when a ∈ α(s), f(s, a) is the state in S that results from the execution of action a in state s.
• A cost function c : S × A → R that evaluates how difficult it is to execute a in s.

A plan is a sequence of actions {a1, . . . , an}, ai ∈ A, such that, when executed in the state s0, it leads to a state f(· · · f(f(s0, a1), a2), . . . , an) ∈ SG. In the proposed methodology, planning is only used as a design tool and, to fulfill this role, classical deterministic planning seems to be enough, despite the criticism about its applicability. Real-world complexity is introduced in a later step of the methodology, when the ordinary Petri nets generated by the planning processes are joined into a hierarchical high-level Petri net (see Section 3). Once the planning problem is defined by its elements (i.e., S, SG, f, A, α, c), it can be solved using a suitable state of S as the initial state s0. This generates a set of cost-ordered plans to achieve each goal. Each plan π can be represented by a pair π = ⟨w, σ⟩, where w is its weight and σ is a sequence σ = {(a1, s0), . . . , (an, sn−1)}, where f(si−1, ai) = si. The set of actions in one abstraction level defines the goals of the abstraction level immediately below, until a level is reached where the actions correspond to available primitive operations of the domain.

Example 1. In robot soccer, to win a match the robots in the team need strategies that correspond to all their possible goals. The goals, in this case, represent the possible roles to be performed by the robots in each possible situation of the match. Defining these strategies can be seen as a planning problem, and each plan as a suitable set of rules to achieve the associated goals. At the highest abstraction level, the team has two basic strategies: one with ball control and another without ball control.
If the team does not have ball control, then the goal is to take ball control. When the team has ball control, it turns to attack the opponent, choosing one of the possibilities: attack through the center, the right side, or the left side, depending on the ball position in the field. The possible strategies can be formalized in the following planning problem:

• S = {not-ball-control, ball-control, attacking, goal};
• s0 = not-ball-control;
• SG = {goal};
• A = {get-ball-control, right-attack, center-attack, left-attack, kick-to-goal, loose-ball-control}.
Solving the problem, we find the following plans, ordered by increasing cost:

⟨−1, {(get-ball-control, not-ball-control), (right-attack, ball-control), (kick-to-goal, attacking)}⟩
⟨−1, {(get-ball-control, not-ball-control), (center-attack, ball-control), (kick-to-goal, attacking)}⟩
⟨−1, {(get-ball-control, not-ball-control), (left-attack, ball-control), (kick-to-goal, attacking)}⟩
⟨0, {(get-ball-control, not-ball-control), (right-attack, ball-control), (loose-ball-control, attacking)}⟩
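As a sketch, the goal-reaching plans of this example can be reproduced by a depth-bounded enumeration over the transition function. The transition table is read off the strategies above; treating kick-to-goal as cost −1 and all other actions as cost 0 is an assumption made here to reproduce the listed weights, not the paper's cost function.

```python
# Transition function of the robot-soccer example: (state, action) -> state
f = {
    ("not-ball-control", "get-ball-control"): "ball-control",
    ("ball-control", "right-attack"): "attacking",
    ("ball-control", "center-attack"): "attacking",
    ("ball-control", "left-attack"): "attacking",
    ("attacking", "kick-to-goal"): "goal",
    ("attacking", "loose-ball-control"): "not-ball-control",
}
# Hypothetical action costs chosen to reproduce the plan weights above
cost = {"kick-to-goal": -1}

def plans(s0, goals, depth=3):
    """Depth-bounded enumeration of <weight, sigma> pairs reaching a goal,
    ordered by increasing cost; sigma is a list of (action, state) pairs."""
    found = []
    def walk(s, sigma, w):
        if s in goals:
            found.append((w, sigma))
            return
        if len(sigma) >= depth:
            return
        for (state, action), nxt in f.items():
            if state == s:
                walk(nxt, sigma + [(action, state)], w + cost.get(action, 0))
    walk(s0, [], 0)
    return sorted(found, key=lambda p: p[0])

result = plans("not-ball-control", {"goal"})
```

Within depth 3 this finds exactly the three weight −1 plans that reach the goal state; sequences ending with loose-ball-control do not reach the goal and are pruned by the depth bound.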
E.M.N. Gonçalves and G. Bittencourt / A Planning-Based Knowledge Acquisition Methodology
Figure 1. Set of Plans Represented with a Petri Net. (Places: not-ball-control, ball-control, attacking, goal; transitions: get-ball-control, loose-ball-control, right-attack, center-attack, left-attack, kick-to-goal.)
3. Plan Representation using Petri Nets

A Petri net is a mathematical and graphical tool that allows the formal specification of event-driven systems in general. Its graphical character also allows it to be used as a communication language between the different actors involved in the design of such systems [7].

3.1. Plans as a Petri Net

As a result of the first step of the methodology, for each abstraction level and for each goal at this level, we have a set of plans, each one with an associated weight. We define the following Petri net:
• P = S: the set of places corresponds to the set of states in one abstraction level.
• T = A: the set of transitions corresponds to the set of actions in one abstraction level.
• Pre(p, t) = w, if p = s and t = a ∈ α(s) such that f(s, a) = s′ with Post(s′, a) = 1, where w is the weight of the minimum-weight plan π = ⟨w, σ⟩ in which (s, a) ∈ σ.
• Post(p, t) = 1, if p = s′ and t = a such that f(s, a) = s′ with Pre(s, a) = w.
The resulting Petri net includes all the possible plans associated with a goal, and its simulation can be used by the expert and the knowledge engineer to validate and refine the representation. Note that, at each state in the Petri net, the weights labelling the output edges indicate the cost of the different plans in which each transition occurs.
Example 2 The set of plans specified in Example 1 can be integrated into a Petri net like the one in Figure 1.
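The joining step can be sketched as follows, assuming plans are given as ⟨w, σ⟩ pairs and using the cheapest-plan rule for Pre from the definition above; the dictionary encoding and the example data are our own assumptions:

```python
def plans_to_petri_net(weighted_plans, f):
    """Join weighted plans into one ordinary Petri net. weighted_plans is a
    list of (w, sigma) pairs with sigma = [(a1, s0), ..., (an, s_{n-1})];
    f maps (state, action) to the successor state."""
    places, transitions = set(), set()
    pre, post = {}, {}
    for w, sigma in weighted_plans:
        for a, s in sigma:
            s_next = f[(s, a)]
            places.update({s, s_next})
            transitions.add(a)
            # Pre(s, a): weight of the cheapest plan in which (s, a) occurs.
            pre[(s, a)] = min(w, pre.get((s, a), w))
            # Post(s', a) = 1 for the resulting state s' = f(s, a).
            post[(s_next, a)] = 1
    return places, transitions, pre, post

# Two hypothetical plans of weight 3 for the 'goal' objective.
F = {('not-ball-control', 'get-ball-control'): 'ball-control',
     ('ball-control', 'right-attack'): 'attacking',
     ('ball-control', 'left-attack'): 'attacking',
     ('attacking', 'kick-to-goal'): 'goal'}
PLANS = [(3, [('get-ball-control', 'not-ball-control'),
              ('right-attack', 'ball-control'),
              ('kick-to-goal', 'attacking')]),
         (3, [('get-ball-control', 'not-ball-control'),
              ('left-attack', 'ball-control'),
              ('kick-to-goal', 'attacking')])]
places, transitions, pre, post = plans_to_petri_net(PLANS, F)
```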
3.2. High Level Petri Net

In order to represent an expert system using a Petri net, it is necessary to extend the expressive power of the tokens to allow the representation of the knowledge base manipulations that occur when a rule is fired. For that purpose, high-level Petri net models (e.g. [10]) are well adapted. From an epistemological perspective, the central component of an expert system is the knowledge base [8]. Informally, a knowledge base is formed by a set of fact descriptions. A generic knowledge base, independently of the adopted knowledge representation method, can be formalized through the definition of two access functions called Tell and Ask, which allow, respectively, to include a new fact in the knowledge base and to query a given knowledge base. More formally, let KB be the set of all possible knowledge bases and φ an expression of the formal language used by the adopted knowledge representation method. Without loss of generality, we suppose that φ is a term. Let V be a set of variable symbols, C a set of names of primitive entities in the domain and F a set of function names. The set of all terms T is defined as follows:
• V ⊆ T;
• C ⊆ T;
• if t1, . . . , tn ∈ T and f ∈ F, then f(t1, . . . , tn) ∈ T.
Let also S be the set of all possible mappings V → T, i.e., the set of all substitutions of variables, and T∗ be the set of all ground terms, i.e., terms where no variable occurs. In this way, it is possible to define:

Tell : KB × T∗ → KB
Ask : KB × T → S

During the knowledge acquisition process, when the lowest abstraction level is reached, the actions associated with the Petri net transitions become actual operations in the domain. These operations usually have preconditions and effects that are registered in a knowledge base. To introduce these conditions and effects into the formalism, we extend the Petri net definition as follows. The token is defined as an element of the set KB, i.e., the token now represents a knowledge base.
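A toy realization of the two access functions, assuming terms are nested tuples and variables are strings prefixed with '?'; this is a sketch of the idea, not the paper's implementation:

```python
def tell(kb, fact):
    """Tell : KB x ground term -> KB (returns an extended knowledge base)."""
    return kb | {fact}

def ask(kb, query):
    """Ask : KB x term -> substitution. Terms are nested tuples; strings
    beginning with '?' are variables. Returns the substitution for the
    first fact matching the query, or None if no fact matches."""
    def match(pattern, fact, theta):
        if isinstance(pattern, str) and pattern.startswith('?'):
            if pattern in theta:
                return theta if theta[pattern] == fact else None
            return {**theta, pattern: fact}
        if isinstance(pattern, tuple) and isinstance(fact, tuple) \
                and len(pattern) == len(fact):
            for p, q in zip(pattern, fact):
                theta = match(p, q, theta)
                if theta is None:
                    return None
            return theta
        return theta if pattern == fact else None
    for fact in kb:
        theta = match(query, fact, {})
        if theta is not None:
            return theta
    return None

# Hypothetical facts from the soccer domain.
kb = tell(tell(set(), ('ball', 'pos', 'left')), ('robot', 'role', 'attacker'))
```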
We introduce the following functions:

Cond : T × KB → S
Act : T × KB × S → KB

A Cond function is associated with each transition. It receives a knowledge base k ∈ KB and returns a substitution θ ∈ S. At the lowest abstraction level, its general form is:

θ1 ← Ask(k, φ1)
. . .
θn ← Ask(k, φn)
θ ← Combine(θ1, . . . , θn)
where φi ∈ T are domain-dependent terms, possibly containing variables, that are used to query the knowledge base k, and Combine is a function that combines substitutions. In higher abstraction levels, the Cond function may contain expressions such as:

θi ← Run(φ, R)

where R is a lower-level Petri net and φ ∈ T is a domain-dependent term, possibly containing variables, that is used as a query to the lower abstraction level knowledge base, after the execution of the Petri net R. An Act function is also associated with each transition. It receives a knowledge base k ∈ KB and a substitution θ ∈ S, and returns an updated knowledge base. Its general form is:

Tell(· · · Tell(k, ψ1 θ) · · · , ψn θ)

where ψi ∈ T are domain-dependent terms, possibly containing variables, that represent a generic action and ψi θ ∈ T∗ are the associated ground terms that are used to update the knowledge base k. The semantics of this extension is the following: before a transition is fired, the Cond function is applied to the knowledge base token and, if the result is a non-empty substitution θ, then the function Act is executed, with the substitution θ applied to all the terms that occur in it.
Example 3 The Petri net generated in Example 2 can be used to derive the high-level Petri net in Figure 2. In the figure, only the right-attack goal is considered.
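The firing semantics can be sketched as follows, with hypothetical Cond/Act functions standing in for the Ask/Tell sequences of the general forms above; the knowledge base is modeled as a plain dictionary for brevity:

```python
def fire(kb, cond, act):
    """Firing rule of the extended net: apply Cond to the token (a
    knowledge base); if it yields a non-empty substitution, apply Act
    under that substitution, otherwise leave the token unchanged."""
    theta = cond(kb)
    if theta:                       # non-empty substitution: enabled
        return act(kb, theta), True
    return kb, False

# Hypothetical Cond/Act pair for a right-attack transition.
def cond(kb):
    # Ask-style query: is the ball on the right side of the field?
    return {'?side': kb['ball-side']} if kb.get('ball-side') == 'right' else {}

def act(kb, theta):
    # Tell-style update of the token under the substitution theta.
    return {**kb, 'attack': theta['?side'] + '-attack'}
```

For example, `fire({'ball-side': 'right'}, cond, act)` fires and records the chosen attack, while a left-side ball leaves the token unchanged.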
4. Petri Net Representation using Expert Systems

Once the high-level Petri net is defined, it is straightforward to translate it into a set of rules for an expert system shell. Because both the Petri net definition language and the rule language of the chosen expert system shell are formal languages, a compiler-compiler (e.g., Lex/Yacc [6]) can be used to implement the translation.
Example 4 Each transition of the high-level Petri net is represented by a rule in the knowledge base. In our example, we used the Expert-Coop++ shell [2]. For instance, the rule generated by transition t4 in the Petri net of Figure 2 is presented, in the Expert-Coop++ syntax, in Figure 3.
5. Conclusion

The goal of this work is to propose a methodology that can help a knowledge engineer and an expert in a specific domain in the knowledge acquisition process of a cognitive multi-agent system aimed at solving complex distributed problems. This methodology is based on planning techniques and on a high-level Petri net formalism in which the tokens contain a knowledge base.
The methodology is designed to be used in domains in which the possible actions can be seen as plans to achieve goals. This methodology was used to implement a simulated RoboCup soccer team. In this domain, the strategies are based on the set of plays that is used according to the game state. Each play represents a goal to be accomplished by the agent and by the team.

(rule_004
  (if
    (logic (global_goal current rws_attack_play))
    (logic (global_goal status active))
    (logic (local_goal current ?lg1))
    (logic (local_goal status ?lg2)))
  (filter (= ?lg1 kick) (= ?lg2 sucess))
  (then (logic (global_goal status sucess))))

Figure 3. Rules Generated by Petri Net

The main advantages observed in the use of Petri nets are: (i) it is possible to describe a partial order between events; (ii) the system states, like the events, can be represented explicitly; (iii) Petri nets provide a family of tools for specification, modeling, analysis, evaluation, and implementation; (iv) a precise and formal description of the agent synchronization in the environment is possible; (v) it is possible to describe the control flow in expert systems.
This work focused on the methodology to build a complex distributed system; the aspects relative to analysis and validation are described elsewhere. Future work will consider, mainly, the extension of this methodology to the social context in a multi-agent environment.
References
[1] Emiel Corten, Klaus Dorer, Fredrik Heintz, Kostas Kostiadis, Johan Kummeneje, Helmut Myritz, Itsuki Noda, Jukka Riekki, Patrick Riley, Peter Stone, and Tralvex Yeap. Soccerserver Manual, July 1999.
[2] A. C. P. L. Costa, G. Bittencourt, E. M. Gonçalves, and L. Rottava da Silva. Expert-Coop++: Ambiente para desenvolvimento de sistemas multiagente. In XXIII Congresso da SBC, IV Encontro Nacional de Inteligência Artificial (ENIA'2003), SBC, Unicamp, Campinas, August 2–8, 2003.
[3] E.H. Durfee, V.R. Lesser, and D.D. Corkill. Trends in cooperative distributed problem solving. IEEE Transactions on Knowledge and Data Engineering, 1(1):63–83, March 1989.
[4] Héctor Geffner. Perspectives on artificial intelligence planning. In Eighteenth National Conference on Artificial Intelligence (AAAI-2002), pages 1013–1023. AAAI/MIT Press, 2002.
[5] Peter Jackson. Expert Systems. Addison Wesley, third edition, 1998.
[6] M. E. Lesk. Lex – a lexical analyzer generator. Technical Report No. 39, Bell Laboratories, Murray Hill, New Jersey, October 1975.
[7] Tadao Murata. Petri nets: Properties, analysis and applications. Proceedings of the IEEE, 77(4):481–497, April 1989.
[8] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.
[9] Guus Schreiber, Hans Akkermans, Anjo Anjewierden, Robert de Hoog, Nigel Shadbolt, Walter Van de Velde, and Bob Wielinga. Knowledge Engineering and Management: The CommonKADS Methodology. MIT Press, Cambridge, 1999.
[10] C. Sibertin-Blanc. High-level Petri nets with data structures. In European Workshop on Application and Theory of Petri Nets, pages 141–170, Helsinki, Finland, June 1985.
Digital Images: Weighted Automata Theoretical Aspects

Alexandre SCALZITTI a,1, Kazumi NAKAMATSU b,2 and J.M. ABE c,3
a Institute for Algebra, Dresden University of Technology, Dresden, D-01062, Germany
b University of Hyogo, Shinzaike, Himeji, 670-0092, Japan
c Institute for Advanced Studies, University of São Paulo, Av. Luciano Gualberto, Trav. J, 374, Térreo, Cidade Universitária, CEP 05508-900, São Paulo-SP, Brazil, and Information Technology Dept., ICET, Paulista University, UNIP, Rua Dr. Bacelar, 1212, CEP 04026-002, São Paulo-SP, Brazil

Abstract. This paper is a survey which presents fundamental ideas about the application of weighted automata in digital image processing. We present basic definitions such as semirings, weighted automata and digital images. Then we explain how we can represent an image using a weighted automaton.
Keywords. weighted automata, formal power series, digital images, pixel addressing.
Introduction

In the classical theory of automata, Kleene's fundamental theorem [10] on the coincidence of regular and rational languages in free monoids has been extended in several directions. Schützenberger [16] introduced weighted automata, that is, automata whose transitions are labeled with elements of a fixed semiring. With these weighted transitions, a weighted automaton computes costs of input words. These costs are computed using the operations of addition and multiplication of a fixed semiring. In other words, we can say that a weighted automaton computes a cost function, that is, a function of the type f : Σ∗ → K, where Σ is a fixed alphabet and K is a fixed semiring. Cost functions are also called formal power series. Schützenberger generalized Kleene's concepts of recognizability and rationality of languages to formal power series and proved that recognizable and rational formal power series form the same class of cost functions. The reader interested in the background on formal power series should refer to [1,15,11,12].
Weighted automata have recently received much interest due to their applications in image coding, manipulation and compression (Culik II and Kari [5,4,6], Hafner [7], Katritzke [9], Jiang, Litow and de Vel [8]) and in speech-to-text processing (Mohri [13], Mohri, Pereira and Riley [14], Buchsbaum, Giancarlo and Westbrook [2]). In this paper we focus on image coding. In [5] the authors present an algorithm which receives as
A. Scalzitti et al. / Digital Images: Weighted Automata Theoretical Aspects
input a multiresolution image and outputs a weighted automaton which represents (encodes) the image in case that the image satisfies a condition which can be formulated in linear algebraic terms. If the image does not satisfy this condition, the algorithm does not terminate. In [3] Culik and Kari made improvements in the algorithm presented in [5] by generating a weighted automaton which encodes an approximation of the input image. Katritzke in [9] proposes several refinements for the algorithm presented in [5,3].
1. Background

1.1. Semirings and formal power series

A semiring K is a structure K = (K, ⊕, ⊙, 0K, 1K) such that (K, ⊕, 0K) is a commutative monoid, that is, ⊕ is a commutative associative binary operation on K and 0K is the neutral element with respect to ⊕; (K, ⊙, 1K) is a monoid, that is, ⊙ is an associative binary operation on K; ⊙ is both left and right distributive over ⊕, that is, for all x, y, z ∈ K it holds that x ⊙ (y ⊕ z) = x ⊙ y ⊕ x ⊙ z and that (y ⊕ z) ⊙ x = y ⊙ x ⊕ z ⊙ x; and 0K is absorbing with respect to ⊙, that is, for all x ∈ K it holds that 0K ⊙ x = x ⊙ 0K = 0K. We call ⊕ and ⊙ respectively the addition and the multiplication of the semiring K. We call 0K the neutral element with respect to the addition and 1K the neutral element with respect to the multiplication. For further definitions in this paper, let us consider the above defined semiring K fixed. Examples of semirings are:
1. the boolean semiring B = ({0, 1}, ∨, ∧, 0, 1), with ∨ acting as addition and ∧ acting as multiplication;
2. the natural numbers (N, +, ·, 0, 1) with the usual addition and multiplication;
3. the real numbers (R, +, ·, 0, 1) with the usual addition and multiplication;
4. the real max-plus semiring Rmax = (R≥0 ∪ {−∞}, max, +, −∞, 0), with max acting as addition and the usual addition of real numbers + acting as multiplication. Moreover, R≥0 = [0, ∞), with the convention −∞ + x = −∞ = x + (−∞) for all x ∈ Rmax;
5. the stochastic semiring ([0, 1], max, ·, 0, 1), with [0, 1] ⊆ R, max acting as addition and the usual multiplication · acting as multiplication;
6. distributive lattices: in this case, ∨ is interpreted as the semiring addition and ∧ is interpreted as the semiring multiplication.
Let n be a positive integer and consider the set K n×n of all matrices of dimension n × n with entries in K. We define the addition ⊕M and the multiplication ⊙M of K n×n as follows. Let A, B ∈ K n×n. For i, j ∈ {1, . . . , n} we define
1. (A ⊕M B)ij := Aij ⊕ Bij and
2. (A ⊙M B)ij := Ai1 ⊙ B1j ⊕ · · · ⊕ Ain ⊙ Bnj.
The neutral element with respect to the addition ⊕M is the matrix 0M all of whose entries are 0K, and the neutral element with respect to the multiplication ⊙M is the matrix 1M whose main diagonal entries are 1K and whose remaining entries are 0K. We observe that M = (K n×n, ⊕M, ⊙M, 0M, 1M) is a semiring. We also observe that we can compute, for example, A ⊙M B for matrices A of dimension p × q and B of dimension r × s, with 1 ≤ p, q, r, s ≤ n; we just have to ensure the usual compatibility criterion that q equals r.
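These definitions can be sketched directly; the Semiring container and the example instances below are our own encoding of the structure just defined:

```python
from dataclasses import dataclass
from typing import Callable, Any

@dataclass(frozen=True)
class Semiring:
    add: Callable    # the semiring addition
    mul: Callable    # the semiring multiplication
    zero: Any        # neutral for add, absorbing for mul
    one: Any         # neutral for mul

REALS = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0, 1)
MAXPLUS = Semiring(max, lambda x, y: x + y, float('-inf'), 0)

def mat_mul(K, A, B):
    """Matrix product over K: (A * B)[i][j] = sum_k A[i][k] * B[k][j],
    with sum and product taken in the semiring."""
    out = []
    for i in range(len(A)):
        row = []
        for j in range(len(B[0])):
            acc = K.zero
            for k in range(len(B)):
                acc = K.add(acc, K.mul(A[i][k], B[k][j]))
            row.append(acc)
        out.append(row)
    return out
```

The same routine then computes ordinary real matrix products or max-plus products, depending only on the semiring passed in.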
An alphabet is a non-empty finite set. Its elements are called symbols or letters. In further definitions in this paper, we consider an alphabet Σ fixed. A (finite) word w over Σ is a finite sequence a1 a2 . . . an of symbols of Σ. We say that n is the size of w. If n is 0 then w is the empty word, and we denote it by ε. We denote the set of words over Σ by Σ∗. We call a function f : Σ∗ → K a formal power series with values in K.

1.2. Weighted automata

A K-weighted automaton A is a tuple (Q, T, in, out) such that Q is a non-empty finite set of states; T ⊆ Q × Σ × K × Q is a finite set of K-weighted transitions; and in, out : Q → K are cost functions for entering, respectively leaving, each state. Let A be a K-weighted automaton. We say that A is deterministic if for every two transitions (p, a, x1, q1) and (p, a, x2, q2) of T we have x1 = x2 and q1 = q2. A is called complete if for every p ∈ Q and a ∈ Σ there are x ∈ K and p′ ∈ Q such that (p, a, x, p′) ∈ T.
A finite path P in A is a finite word over T of the form (pi, ai+1, xi+1, pi+1), i ∈ {0, . . . , n − 1}, for some positive integer n. The length of the path P is the length of P considered as a word. We call p0 and pn the domain and codomain of P and we denote them by dom(P) and cod(P), respectively. The label of P is the finite word w := a1 a2 . . . an. We also say that P is a w-labeled path from p0 to pn. Let P := (pi, ai+1, xi+1, pi+1), i ∈ {0, . . . , n − 1}, be an arbitrary finite path in A. The running cost of P in A, denoted by rcostA(P), is defined by

rcostA(P) := x1 ⊙ x2 ⊙ · · · ⊙ xn

and rcostA(P) := 1K if n equals 0. The cost of P, denoted by costA(P), is defined by:

costA(P) := in(p0) ⊙ rcostA(P) ⊙ out(pn).

The behavior of A, denoted by ‖A‖, is the function ‖A‖ : Σ∗ → K defined by

(‖A‖, w) := ⊕ {costA(P) | P is a w-labeled path in A}

where w ∈ Σ∗ and the sum ⊕ ranges over the set on the right-hand side. We observe that if the above defined set is empty, then (‖A‖, w) := 0K.
In the sequel, we present an alternative representation of weighted automata which is very suitable for computations. Let A = (Q, T, in, out) be a K-weighted automaton with n states. We say that A has a matrix representation if there are a row vector I ∈ K 1×n, a column vector F ∈ K n×1, and for each a ∈ Σ a matrix Wa ∈ K n×n such that for every w := a1 a2 . . . ak ∈ Σ∗,

(‖A‖, w) = I ⊙M Wa1 ⊙M Wa2 ⊙M · · · ⊙M Wak ⊙M F.

We state the following lemma.
Lemma 1.1. Let A be a K-weighted automaton. Then A has a matrix representation.
Instead of proving this lemma, we present an example.
Figure 1. An R-weighted automaton with states q0 and q1, transitions labeled a/3, b/2, a/5 and b/1, and all entering and leaving costs equal to 1.
Example 1.2. We consider the weighted automaton of Figure 1. Its weights are in the semiring (R, +, ·, 0, 1), that is, the semiring of real numbers with the usual addition and multiplication. The alphabet Σ in this case is {a, b}. We write

I := [1 1],  Wa := [3 0; 0 5],  Wb := [0 2; 1 0],  F := [1; 1]

where the matrices are given row by row, separated by semicolons.
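Using the matrix representation, the behavior over the real semiring reduces to one vector-matrix product per letter; a sketch with the matrices of Example 1.2:

```python
def behavior(I, Ws, F, word):
    """Cost of a word via the matrix representation:
    (||A||, w) = I · W_a1 · ... · W_ak · F, over the real semiring."""
    row = I[:]                       # 1 x n row vector
    for a in word:
        W = Ws[a]
        row = [sum(row[i] * W[i][j] for i in range(len(row)))
               for j in range(len(W[0]))]
    return sum(row[j] * F[j] for j in range(len(F)))

# The matrices of Example 1.2 (I and F written as plain lists).
I = [1, 1]
Ws = {'a': [[3, 0], [0, 5]], 'b': [[0, 2], [1, 0]]}
F = [1, 1]
```

For instance, `behavior(I, Ws, F, 'ab')` evaluates I · Wa · Wb · F.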
2. Digital images

Let k and m be two positive integers. A digitized graytone image of finite resolution k × m consists of k × m pixels, each of which takes a real value which represents the graytone intensity. In practice, this real value is digitized to a value between 0 and 2^k − 1, typically with k = 8. In this paper we deal only with images of resolution n × n, that is, square images. For the sake of simplicity, in what follows, we omit the word "square". We consider only resolutions of type 2^n × 2^n.
The unit square is the set U = [0, 1]² of the real plane. If we consider that an image has the dimensions of the unit square, a pixel at resolution 2^n × 2^n corresponds to a subsquare of U whose side has length 2^−n. Given a natural number n, let us consider all pixels of U at resolution 2^n × 2^n. We now describe how to assign an address to each pixel. We can do this addressing in a way that allows us to introduce weighted-automata tools. The idea is the following: we assign to each pixel a word, and this word, which represents the pixel, has a "cost", namely the real value which corresponds to the graytone intensity.
Let us now make this idea precise. How can we assign words to pixels? Depending on the resolution, we divide U into quadrants, quadrants of quadrants, and so on, in the following way. Suppose we are given n = 0, which means that we have to consider U at resolution 1 × 1, so that there is only one pixel, which is U itself; we assign ε as the address of U. Now let us assume that we are given n = 1. This means that we are dealing with U at resolution 2 × 2, and therefore we have 4 pixels, which we number with 0, 1, 2 and 3 according to Figure 2 on the left. Now let us assume that n = 2. We have U at resolution 4 × 4 and therefore 16 pixels. We give addresses to these 16 pixels as shown in Figure 2 on the right. This is done by subdividing each of the subsquares of U on the left again into 4 quadrants and labeling each of them inductively with 0, 1, 2 and 3.
Roughly speaking, the pixel addressing method presented above provides us with a set of finite words which can be interpreted as follows. Let w be a pixel address over the alphabet
Figure 2. Pixel addressing. Left, resolution 2 × 2 (top row: 1, 3; bottom row: 0, 2); right, resolution 4 × 4 (top row: 11, 13, 31, 33; then 10, 12, 30, 32; then 01, 03, 21, 23; bottom row: 00, 02, 20, 22).
Σ = {0, 1, 2, 3}. If w = v · a with a ∈ Σ, then w addresses quadrant a of the subsquare of U addressed by v.

2.1. Average-preserving multiresolution images

It is frequently useful to consider multiresolution images, that is, images which are simultaneously specified for all possible resolutions and which satisfy some compatibility condition. More specifically, a multiresolution image can be given by a function f : Σ∗ → R, which we call a multiresolution function; the compatibility condition in our case is that f must be average-preserving, that is,

f(w) = (1/4) · [f(w0) + f(w1) + f(w2) + f(w3)]

for every w ∈ Σ∗. Let us consider the set P of all average-preserving multiresolution functions. Let f1, f2, f ∈ P and c ∈ R. We define two operations:
• addition: (f1 + f2)(w) := f1(w) + f2(w), for every w ∈ Σ∗;
• scalar multiplication: (cf)(w) := c · f(w), for every w ∈ Σ∗.
Proposition 2.1. The set P together with the above defined addition and scalar multiplication is a vector space.
A weighted automaton is said to be average-preserving if its behavior is an average-preserving function. Let A be an n-state R-weighted automaton. The matrices Wa, for every a ∈ Σ, and the final distribution F define a multiresolution function ψi, for every state i ∈ {1, . . . , n}, by

ψi(a1 a2 . . . ak) = (Wa1 Wa2 . . . Wak F)i.

Equivalently, we have

ψi(aw) = Σ_{j=1}^{n} (Wa)ij ψj(w).
Proposition 2.2. If A is average-preserving then ψi is also average-preserving, for every i ∈ {1, . . . , n}.
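A sketch of the addressing scheme and the average-preserving condition, assuming the quadrant numbering of Figure 2 (0 bottom-left, 1 top-left, 2 bottom-right, 3 top-right) and an illustrative linear-gradient image of our own choosing:

```python
def subsquare(w):
    """Corner (x0, y0) and side length of the subsquare of the unit square
    addressed by w over {'0','1','2','3'}: 0 = bottom-left, 1 = top-left,
    2 = bottom-right, 3 = top-right, as read from Figure 2."""
    x0, y0, side = 0.0, 0.0, 1.0
    for a in w:
        side /= 2
        if a in '23':      # right half
            x0 += side
        if a in '13':      # top half
            y0 += side
    return x0, y0, side

def f(w):
    """Average of the graytone g(x, y) = (x + y) / 2 over the subsquare
    addressed by w -- an illustrative linear-gradient multiresolution image."""
    x0, y0, s = subsquare(w)
    return (x0 + s / 2 + y0 + s / 2) / 2

# f is average-preserving: f(w) is the mean of its four refinements.
for w in ['', '0', '31', '202']:
    assert abs(f(w) - sum(f(w + a) for a in '0123') / 4) < 1e-12
```

By linearity of g, the average over a square equals the mean of the averages over its four quadrants, which is exactly the compatibility condition above.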
Proposition 2.3. Let fA be the multiresolution image computed by A. Then

fA = Σ_{j=1}^{n} Ij ψj.
2.2. Zooming and decoding

The pixel addressing system which we used allows us to regenerate not only the whole image but also a zoomed subimage. Let f be a multiresolution image over Σ and let A be an R-weighted automaton which computes f. Moreover, let u := a1 a2 . . . ak ∈ Σ∗. We define the multiresolution image fu by fu(w) = f(uw), for every w ∈ Σ∗, that is, the image obtained from the image f by zooming to the subsquare with address u. We can obtain an R-weighted automaton Au which computes fu from A by replacing I by Iu, where

Iu := I · Wa1 · . . . · Wak.

Let A be an R-weighted automaton and let fA be its behavior. We decode the image fA at resolution 2^k × 2^k by computing fA(w) for every w ∈ Σ∗ of length k.

2.3. Constructing a weighted automaton from an image

Now we present an algorithm which receives an average-preserving function f as input and outputs an average-preserving R-weighted automaton A which computes f, provided such a weighted automaton exists. The algorithm assumes that the executor can check whether a function is a linear combination of a given finite collection of average-preserving functions. Moreover, the algorithm assumes that the executor can effectively compute the coefficients of this linear combination.

Algorithm Generate Weighted Automaton (GWA)
Input: an average-preserving function f : Σ∗ → R.
Output: if f can be computed by an R-weighted automaton, then GWA returns an average-preserving R-weighted automaton A which computes f; otherwise, GWA does not terminate.
Step 1. i ← 0; j ← 0;
Step 2. create state 0 and assign ψ0 ← f = fε;
Step 3. assume ψi = fw. For k = 0, 1, 2, 3 do: if there are c0, . . . , cj such that fwk = c0 ψ0 + . . . + cj ψj, then set Wk(i, x) ← cx for x = 0, . . . , j; otherwise j ← j + 1, ψj ← fwk and Wk(i, j) ← 1;
Step 4. if i = j, go to Step 5; otherwise i ← i + 1 and go to Step 3;
Step 5. assign the initial distribution I0 ← 1 and Ix ← 0 for every x > 0, and the final distribution Fx ← f(w), where ψx = fw.

Proposition 2.4.
GWA stops if and only if the set {fw | w ∈ Σ∗ } generates a linear space of finite dimension. Moreover, the number of states produced by GWA (if it stops) is exactly the dimension of the linear space.
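Decoding and zooming (Section 2.2) can be sketched as below. The 2-state automaton is a hand-built example, not taken from the paper: it computes the multiresolution image of a diagonal gray ramp, with ψ0 the ramp itself and ψ1 the constant function 1; the weights were derived by hand from the self-similarity of the ramp.

```python
from itertools import product

def zoom(I, Ws, u):
    """Initial vector I_u = I · W_a1 · ... · W_ak of the zoomed automaton."""
    row = I[:]
    for a in u:
        W = Ws[a]
        row = [sum(row[i] * W[i][j] for i in range(len(row)))
               for j in range(len(W[0]))]
    return row

def value(I, Ws, F, word):
    """I · W_a1 · ... · W_ak · F for a word over {'0','1','2','3'}."""
    row = zoom(I, Ws, word)
    return sum(row[j] * F[j] for j in range(len(F)))

def decode(I, Ws, F, k):
    """Decode the image at resolution 2^k x 2^k: one intensity per address."""
    return {''.join(w): value(I, Ws, F, w) for w in product('0123', repeat=k)}

# Hand-built 2-state automaton for the diagonal ramp image (an assumption):
# psi0 = the ramp image, psi1 = the constant function 1.
I = [1.0, 0.0]
Ws = {'0': [[0.5, 0.0], [0.0, 1.0]],
      '1': [[0.5, 0.25], [0.0, 1.0]],
      '2': [[0.5, 0.25], [0.0, 1.0]],
      '3': [[0.5, 0.5], [0.0, 1.0]]}
F = [0.5, 1.0]
```

Decoding at k = 1 yields one graytone per quadrant, and replacing I by `zoom(I, Ws, u)` decodes the subimage at address u, exactly as in the construction of Au above.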
3. Conclusions

In recent years much has been achieved, both in theoretical research on weighted automata and in their applications. Concerning applications in particular, much has been done to apply weighted automata to speech recognition, hardware design and data compression.
References
[1] J. Berstel and C. Reutenauer. Rational Series and Their Languages, volume 12 of EATCS Monographs on Theoretical Computer Science. Springer-Verlag, Berlin, 1988.
[2] A. L. Buchsbaum, R. Giancarlo, and J. R. Westbrook. On the determinization of weighted finite automata. SIAM J. Comput., 30(5):1502–1531 (electronic), 2000.
[3] K. Culik and J. Kari. Image-data compression using edge-optimizing algorithm for WFA inference. Journal of Information Processing and Management, 30:829–838, 1994.
[4] K. Culik and J. Kari. Finite state transformations of images. In Automata, Languages and Programming (Szeged, 1995), volume 944 of Lecture Notes in Comput. Sci., pages 51–62. Springer, Berlin, 1995.
[5] K. Culik, II and J. Kari. Image compression using weighted finite automata. In Mathematical Foundations of Computer Science 1993 (Gdańsk, 1993), volume 711 of Lecture Notes in Comput. Sci., pages 392–402. Springer, Berlin, 1993.
[6] K. Culik, II and J. Kari. Digital images and formal languages. In Handbook of Formal Languages, Vol. 3, pages 599–616. Springer, Berlin, 1997.
[7] U. Hafner. Low Bit-Rate Image and Video Coding with Weighted Finite Automata. PhD thesis, Universität Würzburg, Germany, 1999.
[8] Z. Jiang, B. Litow, and O. de Vel. Similarity enrichment in image compression through weighted finite automata. In Computing and Combinatorics (Sydney, 2000), volume 1858 of Lecture Notes in Comput. Sci., pages 447–456. Springer, Berlin, 2000.
[9] F. Katritzke. Refinements of Data Compression Using Weighted Finite Automata. PhD thesis, Universität Siegen, Germany, 2002.
[10] S. C. Kleene. Representation of events in nerve nets and finite automata. In Automata Studies, Annals of Mathematics Studies, no. 34, pages 3–41. Princeton University Press, Princeton, N.J., 1956.
[11] W. Kuich. Semirings and formal power series: their relevance to formal languages and automata. In Handbook of Formal Languages, Vol. 1, pages 609–677. Springer, Berlin, 1997.
[12] W. Kuich and A. Salomaa. Semirings, Automata, Languages, volume 5 of EATCS Monographs on Theoretical Computer Science. Springer-Verlag, Berlin, 1986.
[13] M. Mohri. Finite-state transducers in language and speech processing. Comput. Linguist., 23(2):269–311, 1997.
[14] M. Mohri, F. Pereira, and M. Riley. The design principles of a weighted finite-state transducer library. Theoret. Comput. Sci., 231(1):17–32, 2000. Implementing automata (London, ON, 1997).
[15] A. Salomaa and M. Soittola. Automata-Theoretic Aspects of Formal Power Series. Texts and Monographs in Computer Science. Springer-Verlag, New York, 1978.
[16] M. P. Schützenberger. On the definition of a family of automata. Information and Control, 4:245–270, 1961.
Modeling the Behavior of Paraconsistent Robots

José Pacheco de ALMEIDA PRADO b,1, Jair Minoro ABE a,b,2 and Alexandre SCALZITTI c,3
a Institute for Advanced Studies, University of São Paulo, São Paulo, Brazil
b Paulista University, UNIP, São Paulo, Brazil
c Institute for Algebra, Dresden University of Technology, Dresden, Germany

Abstract. The number and the complexity of the application domains where Paraconsistent Annotated Logic has been used have grown considerably in the last decade. This increase in the complexity of the application domains is an extra challenge for the designers of such systems, since there are no suitable computerized models for the representation and abstraction of paraconsistent systems. This work proposes a new model for paraconsistent systems, called Paraconsistent Finite Automata.
Keywords. Paraconsistent Annotated Logics, Finite Automata, Paraconsistent Automata, Paraconsistent Robots.
Introduction

Finite automata were introduced in the 1940s in order to model the human brain, and research on them has become essential for the study of computational boundaries. Roughly speaking, a finite automaton is an abstract machine which can assume one of finitely many distinct states. A large number of applications of finite automata can be found in systems in the areas of robotics, compilers, digital circuits, computer architecture, graphical interfaces, etc. In such systems, we can assume that the input is made in an ordered and sequential way. A finite automaton is composed of four main elements:
States – finitely many possible internal configurations. There are two special subsets of the set of states: the sets of initial and final states.
State Transitions – under certain conditions, an automaton can change its current internal state to another state. This is called a transition.
Rules – conditions which must be satisfied so that a transition can be performed.
Events or Input Symbols – they are generated externally and can activate rules which invoke transitions.
In an automaton, control goes from one state to another while an external input sequence is read. In many applications the possibility of considering more than one state at the same time can be very helpful. This property of being able to consider more than one state at
J.P. de Almeida Prado et al. / Modeling the Behavior of Paraconsistent Robots
the same time can be applied, for example, to the problem of guessing properties of a certain amount of data, to mobile robotics, cryptography, and signal processing, among others. In this paper we introduce Paraconsistent Finite Automata, which realize the idea above; that is, we can model phenomena by finite automata which can assume more than one state at the same time.
Robots are essentially imprecise with respect to environment conditions. This impreciseness is a consequence of the limitations of sensors and of the impossibility of foreseeing most of the environments where the robot will perform. Mobile robot designers may therefore use Paraconsistent Automata to model and specify the behavior of their robots.

1. Paraconsistent, Paracomplete and Non-Alethic Logics

Let T be a deductive theory whose underlying logic is L, and let us suppose that the languages of T and of L have a negation symbol. The theory T is said to be inconsistent if we can prove both A and ¬A for some formula A; otherwise, we say that T is consistent. The theory T is said to be trivial if all formulas of L (or all closed formulas of L) are theorems of T, that is, informally speaking, if everything which can be expressed in the language of T can be proved in T. Otherwise, T is said to be non-trivial. In most usual logical systems, the presence of a contradiction trivializes the theory. This makes the theory uninteresting, since every proposition can be proved in it: it becomes impossible to distinguish true from false propositions. Paraconsistent logics allow a theory to be inconsistent but non-trivial. A logic L is said to be paraconsistent if it can be the underlying logic of inconsistent but non-trivial theories.

2. Annotated Paraconsistent Logics

A way to apply the theoretical concepts of paraconsistent logics was established with the development of the Annotated Paraconsistent Logics in [1].
In the Annotated Paraconsistent Logic considered in this paper every predicates have an associated belief degree P, for example, p:P which can be read as follows: “it is known with minimal belief P that p is true” where P is an element of a finite lattice. Let us consider a finite lattice W = < |W|, d > such that |W| = {A, v, f, qt, qf, vqd, fqd, T} represented by the Hasse diagram in the Figure 1. The elements – annotational constants – of the lattice represent: x x x x x x x x
⊥: unknown; v: true; f: false; qv: almost true; qf: almost false; vqd: almost unknown true; fqd: almost unknown false; and ⊤: inconsistent.
J.P. de Almeida Prado et al. / Modeling the Behavior of Paraconsistent Robots
The underlying order ≤ is represented by the Hasse diagram in Figure 1. The expression "It is true that Peter is German and it is almost true that Marie is French" can be represented in Annotated Paraconsistent Logic as follows:

German(Peter):v
French(Marie):qv

Suppose the existence of two databases with the following contradictory assertions: "Peter was born in Berlin", represented by Born(Peter, Berlin):v, and "Peter was not born in Berlin", represented by Born(Peter, Berlin):f. A system based on Annotated Paraconsistent Logics could represent such information as Born(Peter, Berlin):⊤. We emphasize that designers can choose the lattice most suitable to their application.
Figure 1. Lattice of eight elements.
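The way contradictory annotations for the same fact collapse to ⊤ can be pictured as a least-upper-bound operation on the lattice. The sketch below assumes an illustrative ordering of the eight annotational constants (the Hasse diagram of Figure 1 is not reproduced here, so the cover relation in the code is a plausible reconstruction, not the authors' exact lattice):

```python
# Illustrative Hasse diagram: each constant maps to the elements
# immediately above it (assumed ordering, not the paper's exact Figure 1).
COVERS = {
    "unknown": ["vqd", "fqd"],   # bottom element (the paper's "⊥")
    "vqd": ["qv"], "fqd": ["qf"],
    "qv": ["v"], "qf": ["f"],
    "v": ["inconsistent"], "f": ["inconsistent"],
    "inconsistent": [],          # top element (the paper's "⊤")
}

def upper_set(a):
    """All elements greater than or equal to a in the lattice order."""
    seen, stack = set(), [a]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(COVERS[x])
    return seen

def join(a, b):
    """Least upper bound: how two annotations for the same fact combine."""
    common = upper_set(a) & upper_set(b)
    # the least element of the common upper set is the one with the
    # largest upper set of its own
    return max(common, key=lambda x: len(upper_set(x)))

# Contradictory database entries about the same fact collapse to the top:
print(join("v", "f"))  # inconsistent
```

Under this assumed order, merging "true" with "almost true" simply keeps "true", while merging "true" with "false" yields "inconsistent", mirroring the Born(Peter, Berlin) example.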
3. Paraconsistent Automata

A paraconsistent finite automaton can be defined by a 6-tuple M = (Q, Σ, δ, τ, q0:μi, F) where:

- Q is a finite, non-empty set of states;
- Σ is a finite input alphabet, which represents the set of events the automaton can meet;
- δ is the transition function, which maps (Q × τ) × (Σ × τ) into Q × τ; that is, δ(qi, μi, c, μe) is a new state qf with minimal belief μf, given that the event c occurs with minimal belief μe and the automaton is in state qi with minimal belief μi;
- τ is a closed lattice;
- q0:μi, where q0 ∈ Q and μi ∈ τ, is the annotated initial state;
- F ⊆ Q is the set of final states.

The symbols of the input sequence are labeled in the form c:μe, where c ∈ Σ and μe ∈ τ. This can be understood as "the event c occurs with minimal belief μe", with the condition that more than one event can occur simultaneously.
The rules by which the automaton M chooses its next internal state are codified in its transition function. The transition δ(qi, μi, c, μe) = (qf, μf), where M is in state qi ∈ Q with minimal belief μi ∈ τ and the current event (the last one that occurred) is c:μe, with c ∈ Σ and μe ∈ τ, can be read as follows: "if the event c happens with minimal belief μe and the automaton is in state qi with minimal belief μi, we can say with belief μf that the automaton will assume state qf". In other words, the transition function compares the belief in the current event against the belief required to perform the transition. Let us consider the following Paraconsistent Automaton, given by the tuple M = ({A, B, C, D}, {0, 1}, δ, τ, A:v, {D:v}) with transitions:

δ(A:v, 0:qv) = B:qv
δ(A:v, 1:qv) = C:v
δ(B:qv, 1:vqd) = B:qv
δ(B:qv, 1:qv) = D:v
δ(C:v, 0:v) = D:v

It is true that this automaton starts in state A (belief degree v). If event 0 occurs with belief vqd, the automaton will remain in this state, because the belief in the event does not reach the belief required to perform the transition. If event 0 occurs with belief v, the automaton will go to state B with belief qv. While event 1 occurs with belief vqd, the automaton will remain in this state. However, if event 1 occurs with minimal belief qv, the automaton will go to states B and D simultaneously.
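The operation of this example automaton can be sketched in code. The total order vqd < qv < v used below for comparing belief degrees is an assumption for illustration; a transition fires when the event's belief meets or exceeds the belief it requires, and since several transitions may fire at once, the automaton tracks a set of annotated states:

```python
RANK = {"vqd": 1, "qv": 2, "v": 3}  # assumed ordering of belief degrees

# Transition table of the example:
# (state, event, required belief) -> (new state, resulting belief)
TRANS = {
    ("A", "0", "qv"): ("B", "qv"),
    ("A", "1", "qv"): ("C", "v"),
    ("B", "1", "vqd"): ("B", "qv"),
    ("B", "1", "qv"): ("D", "v"),
    ("C", "0", "v"): ("D", "v"),
}

def step(states, event, belief):
    """Advance a set of annotated states on one annotated event."""
    new_states = set()
    for q, mu in states:
        fired = {TRANS[k] for k in TRANS
                 if k[0] == q and k[1] == event and RANK[belief] >= RANK[k[2]]}
        # if no transition fires, the automaton remains in its state
        new_states |= fired if fired else {(q, mu)}
    return new_states

s = {("A", "v")}
s = step(s, "0", "vqd")   # belief too low: stays in A
s = step(s, "0", "v")     # goes to B with belief qv
s = step(s, "1", "qv")    # both transitions fire: states B and D
print(sorted(s))          # [('B', 'qv'), ('D', 'v')]
```

The final call reproduces the paraconsistent behavior described in the text: the automaton simultaneously occupies B (with belief qv) and D (with belief v).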
Figure 2. Example of Paraconsistent Automaton
4. An Application

We now show an application of Paraconsistent Automata in the field of mobile robots. The objective is to describe the behavior of a robot which has five sonars and must move from one place to another inside a room. A picture of the robot can be seen in Figure 3.
Figure 3: Picture of Robot Tropeço.
The robot has a reactive behavior similar to that of the robot Emmy presented in [4]. When Emmy detects an obstacle, it tries to deviate from it. The problem is that the sonars can provide different readings, which can lead the robot to an inconsistent situation. Due to their physical features, sonars are extremely noise-sensitive and can therefore provide the robot with incorrect, imprecise and inconsistent readings. In spite of this, due to their low cost and simplicity of use, they are among the most used sensors in robotics. The Paraconsistent Automaton of Figure 4 describes the behavior of the robot. In order to present a simple example, three events with different belief degrees were considered:

- Activate: the robot is activated;
- Arrived: the robot has arrived at the destination;
- O: the robot has found an obstacle.

The automaton has six states:

- I:t – initial state;
- Fm:t – moving forward at maximum speed;
- Fl:qt – moving forward at minimum speed;
- Dd:qt – deviating to the right;
- P:qt – robot stops and performs new readings;
- VF:qt – robot stops at the destination.

Through this automaton it is possible to observe that the robot has a defensive attitude: it slows down at the smallest sign of an obstacle – state Fl:qt – begins to deviate if its belief in the existence of obstacles increases – state Dd:qt – and stops to perform new readings if it detects inconsistent data. If we modify the belief degrees of the transitions, it is possible to radically change the behavior of the robot. It can, for example, adopt a more audacious attitude, slowing down
only when it is almost sure that an obstacle is ahead. Paraconsistent Automata are an important tool for the construction of paraconsistent robots because they allow the designer to model, project and analyze the robot's behavior using arbitrary belief degrees for events and states. By doing this, possible behavioral failures can be detected and corrected before any implementation is done.
Figure 4. Example of Paraconsistent Automaton
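The effect of retuning the belief degrees of the transitions can be sketched with numeric thresholds standing in for the annotational constants. The state names follow Figure 4; the numeric degrees and threshold values below are illustrative assumptions, not taken from the paper:

```python
def next_state(state, obstacle_belief, slow_at, deviate_at):
    """Obstacle-handling policy: thresholds decide when to slow or deviate."""
    if state in ("Fm", "Fl", "Dd"):
        if obstacle_belief >= deviate_at:
            return "Dd"   # deviating to the right
        if obstacle_belief >= slow_at:
            return "Fl"   # moving forward at minimum speed
        return "Fm"       # moving forward at maximum speed
    return state

# A defensive robot slows at the smallest sign of an obstacle...
print(next_state("Fm", 0.3, slow_at=0.2, deviate_at=0.5))  # Fl
# ...while an audacious one keeps full speed at the same reading.
print(next_state("Fm", 0.3, slow_at=0.6, deviate_at=0.8))  # Fm
```

The same sensor reading produces different behaviors purely as a function of the required belief degrees, which is exactly the design freedom the text attributes to Paraconsistent Automata.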
5. Conclusions

In the last decades many robot navigation paradigms have come up. The first papers often assumed the existence of a correct and complete model of the environment where the robot would perform, and its actions would be determined by planning systems. The great difficulty, or even impossibility, of generating correct and complete world models under the time constraints imposed by the application has led designers to adopt reactive approaches where such models are not used. An important system which follows this approach is described in [2], where the behavior of the robot is determined by a finite automaton that maps sensor inputs into actions, with no world modeling. This approach, however, bounds the activities which robots can perform, because their knowledge about the world is restricted to the range of their sensors. A robot constructed following an approach based on Annotated Paraconsistent Logics lies between these two extremes. In this approach there is a world model in which planning systems can operate, but such a model does not need to be totally correct and complete, because the mechanisms of annotation and manipulation of inconsistencies allow planning systems to work with an approximated model of the
real world – where inconsistencies may occur – constructed under the time constraints imposed by the application. The automaton model proposed in this paper is an important tool for modeling the behavior of paraconsistent robots. We hope to say more in forthcoming papers.

References

[1] J.M. Abe, "Fundamentos da Lógica Anotada" (Foundations of Annotated Logics), Ph.D. Thesis (in Portuguese), Universidade de São Paulo, São Paulo, 1992.
[2] R.A. Brooks, "Intelligence Without Representation", Artificial Intelligence Journal (47), pp. 139-159, 1991.
[3] N.C.A. da Costa, J.M. Abe, J.I. da Silva Filho, A.C. Murolo & C.F.S. Leite, Lógica Paraconsistente Aplicada (in Portuguese), ISBN 85-224-2218-4, Editora Atlas, 214 pp., 1999.
[4] J.I. da Silva Filho & J.M. Abe, "Emmy: a paraconsistent autonomous mobile robot", in Logic, Artificial Intelligence, and Robotics, Proc. 2nd Congress of Logic Applied to Technology – LAPTEC'2001, Eds. J.M. Abe & J.I. da Silva Filho, Frontiers in Artificial Intelligence and Applications, Vol. 71, IOS Press, Amsterdam, ISBN 1 58603 206 2, ISSN 0922-6389, pp. 53-61, 2001.
[5] J.P.A. Prado, "A Paraconsistent Robot Navigation System", in Proc. 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI 2003), Orlando, v. 3, pp. 217-222, 2003.
[6] J.M. Abe & J.I. da Silva Filho, "Manipulating Conflicts and Uncertainties in Robotics", Multiple-Valued Logic and Soft Computing, V. 9, ISSN 1542-3980, pp. 147-169, 2003.
A System of Recognition of Characters based on Paraconsistent Artificial Neural Networks

Luís Fernando Pompeo FERRARA 1, Keiji YAMANAKA 2, João Inácio DA SILVA FILHO 1,3

1 [email protected], UNISANTA – Universidade Santa Cecília, Rua Osvaldo Cruz, 266, CEP-110045 – Santos – SP – Brasil
2 [email protected], Centro de Ciências Exatas e Tecnologia da Universidade Federal de Uberlândia, Av. João Naves de Ávila, 2160, CEP-38400-902 – Uberlândia – MG – Brasil
3 [email protected], IEA – Instituto de Estudos Avançados da Universidade de São Paulo, Av. Prof. Luciano Gualberto, 374, Trav. J, Térreo, Cidade Universitária, CEP 05508-900 – São Paulo – SP – Brasil
Abstract – In this paper we present a system capable of recognizing characters based on the theoretical concepts of Paraconsistent Annotated Logic. Paraconsistent Annotated Logic (PAL), as shown in [1], is a class of non-classical logic which allows the manipulation of contradictory signals. In [5], Paraconsistent Artificial Neural Cells built with algorithms based on PAL were presented. These cells showed the capacity to learn certain signals applied, in the form of functions, to their inputs. In this work, based on these cells, connections and groupings among the algorithms were made to create a Recognizer of Characters Paraconsistent System (RCPS) capable of learning and recognizing different types of alphabet letters or fonts. After learning the characters, the RCPS can recognize a letter with high efficiency and compare it to the group of characters learned previously. The test results demonstrate that the RCPS can be used in specialist systems for word and image recognition.

Key words: paraconsistent logic, paraconsistent annotated logic, neural nets, paraconsistent artificial neural networks, neuro computation.
Introduction

Applied character recognition systems only appeared in the 1950s, and the first of them were used in systems of optical character recognition (OCR – Optical Character
Recognition). Initially, OCR systems recognized only a few font types. With the progress achieved through research in the subject, multifont OCR systems appeared, with a larger recognition capacity and a wider range of fonts. Recently, omnifont OCR systems have appeared, capable of recognizing any font type [8]. Nowadays, research is concentrated on the development of intelligent character recognition (ICR) systems, where recognition is performed on handwritten text instead of printed text [10] [7]. Handwriting is the most complex form to recognize due to its many variations: from one person to another, the same character can be written in different forms.
1. Recognition of Characters

Character recognition systems have been researched with the objective of reproducing, to a certain extent, the human capacity of reading texts. The basic operation of these systems can be described as follows: first, the computer receives an image, processes it and compares it with a pattern in a recognition process. The system thus tries to imitate human behavior and abilities, through devices and algorithms capable of accomplishing the same human functions, such as locating objects, classifying patterns and detecting the relations among them. For character recognition, pattern recognition techniques are usually used [10]. Some researchers prefer to combine different techniques because they believe better results can be obtained that way. The two approaches most used in pattern recognition are the statistical approach (or decision theory) and the syntactic (or structural) approach. Recently, however, artificial neural networks have been used as a third approach. The greatest difficulty in pattern recognition systems is determining the group of characteristics susceptible to extraction, i.e., the characteristics that we should look for in a pattern to allow its description or classification, if possible, in a unique mode. Considerable effort is necessary to obtain the desired characteristics from the pattern without efficiency loss caused by mistakes and/or noise interference. Pattern distortion is a great problem for recognition, and it is not always possible to find characteristics that are not seriously affected by degenerative elements. The recognition process can be neutralized if the recognition system is not prepared to work with such distortions.
2. The Paraconsistent Annotated Logic with Annotations of Two Values – PAL2v

Contradictions or inconsistencies are common when we describe parts of the real world. The analysis, learning and character recognition systems used in Artificial Intelligence are, in general, based on classical logic. In classical logic the description of the world is limited to two states: False or True. These binary systems cannot appropriately treat the contradictory situations generated by noise in the image or
different outlines in the writing of the characters. Paraconsistent Logic was created to find means of treating contradictory situations. Studies of Paraconsistent Logic presented results that make it possible to consider inconsistencies [4] [1]; therefore, it is more appropriate for treating problems caused by the contradictory situations that appear when we work with the real world. The Paraconsistent Annotated Logic (PAL) is a class of evidential logic that treats signals represented by annotations, which allows a description of the real world and resolves contradictions through algorithms. In PAL each proposition is accompanied by annotations. Each annotation belongs to a finite lattice and attributes a value to the corresponding proposition. A Paraconsistent Annotated Logic can be represented as a finite lattice of "four states", according to Figure 1 (a). The Paraconsistent Annotated Logic with annotation of two values – PAL2v – is an extension of PAL, and it can be represented through a lattice of four vertices [2], where we can establish some terminology and conventions, as follows. Let τ = < |τ|, ≤ > be a fixed finite lattice, where:

1. |τ| = [0, 1] × [0, 1]
2. ≤ = {((μ1, λ1), (μ2, λ2)) ∈ ([0, 1] × [0, 1])² | μ1 ≤ μ2 and λ1 ≤ λ2} (where ≤ on the right-hand side indicates the usual order of the real numbers).

In the paraconsistent analysis the main objective is to know with what value of certainty degree Dc we can affirm that a proposition is false or true. Therefore, only the value of the certainty degree Dc is considered as the result of the analysis, while the value of the contradiction degree Dct is an indicator of the inconsistency measure. If the result of the analysis is a low certainty degree value or a high inconsistency, the result will be undefined. These values can be placed on two axes representing the finite lattice, according to Figure 1 (b). The control values, adjusted externally, are limits that serve as reference for the analysis.
[Figure 1 shows the finite lattice of PAL2v: (a) the four extreme states – ⊤ (inconsistent) at the top, ⊥ (indeterminate) at the bottom, F (false) on the left and t (true) on the right; (b) the same lattice with the axes Dc = μ1 − μ2 (certainty degree, ranging from −1 at F to +1 at t) and Dct = μ1 + μ2 − 1 (contradiction degree, ranging from −1 at ⊥ to +1 at ⊤).]

Figure 1. Finite lattice of PAL2v: four states with values.
3. The Paraconsistent Artificial Neural Cells

A lattice description using the values obtained by these equations results in the algorithm called "Para-analyzer" [4]. This algorithm can be written in reduced form, expressing a basic Paraconsistent Artificial Neural Cell (PANCb), as follows:

/* Definitions of the adjustable values */
Vhcc = C1   /* high value of certainty control */
Vlcc = C2   /* low value of certainty control */
Vhctc = C3  /* high value of contradiction control */
Vlctc = C4  /* low value of contradiction control */
/* Input variables */
μ1, μ2
/* Output variables */
Digital output = S1
Analog output = S2a
Analog output = S2b
/* Mathematical expressions */
Dct = μ1 + μ2 − 1
Dc = μ1 − μ2
/* Determination of the extreme logical states */
If Dc ≥ C1 then S1 = t
If Dc ≤ C2 then S1 = F
If Dct ≥ C3 then S1 = ⊤
If Dct ≤ C4 then S1 = ⊥
Otherwise S1 = I (indefinite)
S2a = Dct
S2b = Dc
/* END */

The element capable of treating a signal composed of two degrees, of belief and disbelief (μ1a, μ2a), and supplying an output in the form Dct = contradiction degree, Dc = certainty degree and X = indefinite annotation constant, is called a basic Paraconsistent Artificial Neural Cell (PANCb). Figure 2 (a) shows the representation of a PANCb. The studies of the PANCb originated a family of Paraconsistent Artificial Neural Cells that constitute the basic elements of Paraconsistent Artificial Neural Networks (PANNs). In this work, only three types [4] of cells were necessary for the elaboration of the Recognizer of Characters Paraconsistent System (RCPS):

1. The Paraconsistent Artificial Neural Cell of Learning (PANl), which can learn and memorize a pattern applied to its input.
2. The Paraconsistent Artificial Neural Cell of Simple Logical Connection of Maximization (PANCLs), which determines its output signal by the largest value applied to its inputs.
3. The Paraconsistent Artificial Neural Cell of Decision (PANCd), which determines the final result of the paraconsistent analysis.

The Paraconsistent Artificial Neural Cell of Learning (PANl) is a basic Paraconsistent Artificial Neural Cell with its output μ1r interlinked to the input μ2c (complemented disbelief degree), according to Figure 2 (b).
Figure 2. (a) Basic Paraconsistent Artificial Neural Cell (PANCb). (b) Paraconsistent Artificial Neural Cell of Learning (ready to receive patterns).
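The Para-analyzer algorithm of the basic cell translates directly into code. A minimal sketch follows, with illustrative control values C1–C4 (the paper leaves them as externally adjusted parameters):

```python
def para_analyzer(mu1, mu2, c1=0.5, c2=-0.5, c3=0.5, c4=-0.5):
    """Basic PANCb: maps belief/disbelief degrees to a logical state.

    mu1, mu2 in [0, 1]; returns (S1, S2a, S2b) = (state, Dct, Dc).
    """
    dct = mu1 + mu2 - 1   # contradiction degree
    dc = mu1 - mu2        # certainty degree
    if dc >= c1:
        s1 = "true"
    elif dc <= c2:
        s1 = "false"
    elif dct >= c3:
        s1 = "inconsistent"
    elif dct <= c4:
        s1 = "indeterminate"
    else:
        s1 = "indefinite"
    return s1, dct, dc

print(para_analyzer(1.0, 0.0)[0])  # true
print(para_analyzer(1.0, 1.0)[0])  # inconsistent
print(para_analyzer(0.0, 0.0)[0])  # indeterminate
```

Note how full belief together with full disbelief (1.0, 1.0) yields "inconsistent" rather than an error: the cell signals the contradiction instead of trivializing on it.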
As shown below in the learning algorithm, successive applications of values to the belief degree input (μ1) result in a gradual increase of the resulting belief degree at the output (μ1r). This cell can work in two ways: by learning the truth pattern, where values μ1 = 1 are applied successively until the output belief degree reaches μ1r = 1, or by learning the falsehood pattern, in which case values μ1 = 0 are applied until the resulting belief degree reaches μ1r = 1.

Learning algorithm for the Paraconsistent Artificial Neural Cell of Learning (PANl):

1. Beginning: μ1r = 1/2  /* virgin cell */
2. Define: FL = value such that FL ≥ 1  /* enter the value of the Learning Factor */
3. Define: FLO = value such that FLO ≥ 1  /* enter the value of the Loss Factor */
4. Define: P  /* input pattern, 0 ≤ P ≤ 1 */
5. Do: Dci = P − μ2c  /* calculate the initial belief degree */
6. If Dci ≤ 0, do: μ1 = 1 − P  /* the belief degree is the complement of the pattern */
7. If Dci > 0, do: μ1 = P  /* the belief degree is the pattern */
8. Do: μ2 = μ1r  /* connect the output of the cell to the disbelief degree input */
9. Do: μ2c = 1 − μ2  /* apply the complement to the value of the disbelief degree input */
10. Do: Dc = μ1 − μ2c  /* calculate the belief degree */
11. If Dc ≥ 0, do: C1 = FL
12. If Dc < 0, do: C1 = FLO
13. Do: μ1r = (Dc × C1 + 1) ÷ 2  /* resulting belief degree at the output */
14. While μ1r ≠ 0, return to step 8
15. If μ1r = 0, do: μ1r = 1 and μ1 = 1 − P  /* apply the NOT function, complementing the belief degree */
16. Return to step 8
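The convergence of μ1r towards 1 can be sketched for the truth-pattern case (P = 1, so μ1 = 1) with learning factor FL = 1. The loop below follows steps 8–13 of the algorithm; a fixed iteration count replaces the original stopping test, since with FL = 1 the output only approaches 1 asymptotically (a sketch under these assumptions, not the full algorithm):

```python
def train_truth_pattern(fl=1.0, iterations=20):
    """PANl learning sketch: repeatedly feed the pattern P = 1 (mu1 = 1)."""
    mu1, mu1r = 1.0, 0.5             # virgin cell starts at mu1r = 1/2
    for _ in range(iterations):
        mu2c = 1.0 - mu1r            # steps 8-9: feed back the output, complemented
        dc = mu1 - mu2c              # step 10: belief degree
        mu1r = (dc * fl + 1) / 2     # step 13: resulting belief degree
    return mu1r

print(train_truth_pattern())  # close to 1.0 after 20 presentations
```

With FL = 1 the update reduces to μ1r ← (μ1r + 1)/2, so each presentation halves the remaining distance to 1, which matches the "gradual increase" described in the text; a learning factor above 1 would make the cell reach 1 in finitely many steps.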
4. The Recognizer of Characters Paraconsistent System (RCPS)

In this work the Recognizer of Characters Paraconsistent System is composed of basic modules called Paraconsistent Artificial Neural Units of Comparison of Patterns (PANUCP). The PANUCP modules store character patterns that will be compared with those applied to the inputs. Each PANUCP unit is composed of two Paraconsistent Artificial Neural Cells: a Paraconsistent Artificial Neural Cell of Learning (PANl) and a Paraconsistent Artificial Neural Cell of Simple Logical Connection of Maximization (PANCLs). The outputs of the modules are connected to a Paraconsistent Artificial Neural Cell of Decision (PANCd). The block diagram in Figure 3 shows how a Recognizer of Characters Paraconsistent System is composed.
Figure 3. Diagram of the Recognizer of Characters Paraconsistent System.
The PANUCP modules can be joined to expand the recognition capacity. The number of interlinked modules depends on the project application.

5. Methodology

We opted to build a recognizer that allows comparison with five font types. In this work, four Paraconsistent Artificial Neural Unit of Comparison of Patterns (PANUCP) modules were used.

Experimental work – Test 1

In this first experiment the net was trained to recognize just one font type, each character being formed from an [11 × 9] matrix, that is, 11 rows and 9 columns of points. The net was trained with characters based on the font Arial 11. Later, the character, with added noise generated by the program itself, was applied to the input of the net. This noise alters about 20% of the points that constitute the character. After the recognition, the identified pattern and its resulting belief degree were exhibited at the output of the net.

Experimental work – Test 2

In this second test, the net was trained to learn characters based on the fonts Arial 11, Tahoma 11, Comic Sans MS 11, Century Gothic 11 and Georgia 11. Later, the character of one of the fonts, with added noise, was applied to the input of the net. This noise altered about 20% of the points that constitute the character. After the recognition, the output of the net displayed the identified pattern (shown in the font Arial 11), its resulting belief degree, and the font of the recognized character. For a better evaluation of the experiment, the belief degrees of the three characters whose resulting degrees were closest to that of the recognized character were also exhibited.
Figure 4. Screen of the program, representing the recognition of the character "A".
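The comparison performed by the PANUCP modules can be caricatured as follows: each stored pattern yields a belief degree (here simply the fraction of matching points, an illustrative stand-in for the paraconsistent cell outputs), and the decision stage keeps the maximum. A hypothetical sketch with tiny 2 × 2 "characters" instead of the paper's 11 × 9 matrices:

```python
import random

def recognize(noisy, patterns):
    """Return (best letter, belief) by point-agreement against stored patterns."""
    beliefs = {letter: sum(a == b for a, b in zip(noisy, stored)) / len(stored)
               for letter, stored in patterns.items()}
    best = max(beliefs, key=beliefs.get)
    return best, beliefs[best]

def add_noise(bitmap, rate=0.2, seed=0):
    """Flip about `rate` of the points, as in the paper's tests."""
    rng = random.Random(seed)
    return [1 - p if rng.random() < rate else p for p in bitmap]

PATTERNS = {"A": [1, 0, 0, 1], "B": [1, 1, 1, 1]}
noisy = add_noise(PATTERNS["A"])
print(recognize(noisy, PATTERNS))
```

Even with a fifth of the points flipped, the stored pattern closest to the noisy input usually still wins, which is the robustness property the practical results below quantify with belief degrees between roughly 0.77 and 0.94.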
6. Practical Results

In the first test, the belief degree of the characters recognized by the net varied from 0.7749 (obtained in the recognition of the letter "C") up to 0.9325 (obtained in the recognition of the letter "X"). The greatest doubt found by the net was in the recognition of the letter "Q", where the difference in belief degree in relation to the letter "O" was 0.0095. In the second test, the belief degree of the characters recognized by the net varied from 0.7838 (obtained in the recognition of the letter "P") up to 0.9436 (obtained in the recognition of the letter "D"), and the greatest doubt was in the recognition of the letter "O", where the difference in belief degree in relation to the letter "Q" was 0.0229.
7. Conclusion

Analyzing the mean value and the standard deviation of the belief degrees, we verify that both tests produced similar results: the first test gave a mean value of 0.8528 with a standard deviation of 0.0423, and the second test gave a mean value of 0.8578 with a standard deviation of 0.0440. The results demonstrate that the algorithms of the Paraconsistent Artificial Neural Cells, when interlinked as proposed in this work, can constitute a robust character recognition system. This Recognizer of Characters Paraconsistent System (RCPS) can be used in several fields of the Artificial Intelligence area, such as specialist systems for word and image recognition.
References

[1] ABE, J.M., "Fundamentos da Lógica Anotada" (Foundations of Annotated Logics), Ph.D. Thesis (in Portuguese), FFLCH, University of São Paulo, São Paulo, 1992.
[2] BISHOP, C.M., Neural Networks for Pattern Recognition, 1st ed., Oxford University Press, 1995.
[3] DA COSTA, N.C.A., ABE, J.M. & SUBRAHMANIAN, V.S., "Remarks on Annotated Logic", Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, Vol. 37, pp. 561-570, 1991.
[4] DA SILVA FILHO, J.I. & ABE, J.M., Fundamentos das Redes Neurais Artificiais – destacando aplicações em Neurocomputação, 1st ed., Editora Villipress, São Paulo, Brazil, 2001.
[5] FAUSETT, L., Fundamentals of Neural Networks: Architectures, Algorithms and Applications, Prentice-Hall, Englewood Cliffs, 1994.
[6] GOVINDAN, V.K. & SHIVAPRASAD, A.P., "Character Recognition – A Review", Pattern Recognition, v. 23, n. 7, pp. 671-683, 1990.
[7] MCCULLOCH, W. & PITTS, W., "A Logical Calculus of the Ideas Immanent in Nervous Activity", Bulletin of Mathematical Biophysics, 1943.
[8] MORI, S., SUEN, C.Y. & YAMAMOTO, K., "Historical Review of OCR Research and Development", Proc. IEEE, v. 80, n. 7, pp. 1029-1057, July 1992.
[9] SIEBERT, W., "Stimulus Transformation in the Peripheral Auditory System", in Recognizing Patterns, Ed. Murray Eden, MIT Press, Cambridge, 1968.
[10] SUZUKI, Y., "Self-Organizing QRS-Wave Recognition in ECG Using Neural Networks", IEEE Trans. on Neural Networks, 6, 1995.
Feature Subset Selection for Supervised Learning using Fractal Dimension

Huei Diana Lee a,b,1, Maria Carolina Monard a and Feng Chung Wu b,c
a University of São Paulo – USP, Institute of Mathematics and Computer Science – ICMC, Laboratory of Computational Intelligence – LABIC, São Carlos, SP, Brazil
b West Paraná State University – UNIOESTE, Engineering and Exact Sciences Center, Bioinformatics Laboratory – LABI, Foz do Iguaçu, PR, Brazil
c Institute of Technology in Automation and Informatics – ITAI, Foz do Iguaçu, PR, Brazil

Abstract. Feature Subset Selection is an important issue in machine learning, since non-representative features may reduce the accuracy and comprehensibility of hypotheses induced by supervised learning algorithms. Feature Subset Selection is applied as a data pre-processing step, which aims to find a subset of features that describes the data well, to be used as input to the inducer. Several approaches to this problem have been proposed, among them the filter approach. This work proposes a filter which uses Fractal Dimension as the importance criterion to select a subset of features from the original data. Empirical results on real-world data sets are presented. Performance comparison of the proposed criterion with two other criteria frequently considered within the filter approach shows that Fractal Dimension is an appropriate criterion to select features for supervised learning.

Keywords. Feature Subset Selection, Fractal Dimension, Machine Learning
1. Introduction

In supervised Machine Learning – ML – the induction algorithm learns from a training data set in which every example is described by a feature vector and its class label. The task of the induction algorithm is to induce a classifier (hypothesis) that labels new cases with good accuracy [1]. However, some of these features may be irrelevant or redundant. Avoiding irrelevant or redundant features is important because they may have a negative effect on the accuracy of the induced classifier. Furthermore, by using fewer features it may be possible to reduce the cost of acquiring data and improve the comprehensibility of the classification model. Thus, one central problem in ML is Feature Subset Selection – FSS – which aims to find a subset of features that describes the data set as well as the original features do [2]. There are several FSS approaches [3]. In this work, we propose the use of the filter approach, considering as relevance criterion the Fractal Dimension – FD – of the data set. It should be observed that although there are well-known applications of fractal theory in high-dimensional indexing structures and cluster detection, fractal theory is still not

1 Correspondence to: H. D. Lee, P. O. Box 961, 85870-650 – Foz do Iguaçu, PR, Brazil. Tel.: +55 45 5768114; Fax: +55 45 5752733; E-mail: [email protected]
much used in FSS problems applied to supervised learning algorithms, as proposed in this work. Several experiments on medical data sets using FD for FSS are presented. The performance of FD is compared with two other criteria frequently considered within the filter approach, showing that FD is another appropriate criterion for FSS. The remainder of this paper is organized as follows: Section 2 describes the feature subset selection problem. Sections 3 and 4 give a brief description of fractals and fractal dimension. Section 5 contains the empirical study, presenting the data sets used, the experimental setup and the applied tools. Results and discussion are presented in Section 6, and Section 7 concludes this work and points out some future work.

2. Feature Subset Selection

Feature Subset Selection is frequently applied as a data pre-processing step for Machine Learning. Its objective is to choose a subset of the original features that describe a data set, by removing irrelevant and/or redundant features. FSS has also shown its value in dealing with data of large dimensionality, such as that used in Data Mining [2,3]. In other words, FSS aims at extracting as much information as possible from a given data set while keeping the smallest number of features that describe the data set as well as, or better than, the original set of features does. This is achieved by removing irrelevant and/or redundant features according to some importance criterion. Some advantages associated with FSS in supervised learning are: reducing the potential hypothesis space; improving data quality, thus increasing the efficiency of the learning algorithm; improving predictive accuracy; and enhancing the comprehensibility of the induced classifier [4,5,6]. There are three main FSS approaches: embedded, filter and wrapper [7]. In the first one, FSS is performed internally by the algorithm itself, i.e., it is embedded within the induction algorithm.
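Of the three approaches just listed, the filter is the simplest to sketch: score each feature by some importance criterion computed from the data alone, keep the best k, and pass the reduced set to the inducer. The correlation-based criterion below is only an illustration (the criteria actually compared in this paper are FD and two relevance-based measures):

```python
import math

def abs_correlation(xs, ys):
    """|Pearson correlation| between one feature column and the class labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return abs(cov) / math.sqrt(vx * vy) if vx and vy else 0.0

def filter_select(X, y, criterion, k):
    """Rank features by the criterion and keep the k best columns."""
    scores = sorted(((criterion([row[j] for row in X], y), j)
                     for j in range(len(X[0]))), reverse=True)
    keep = sorted(j for _, j in scores[:k])
    return [[row[j] for j in keep] for row in X], keep

X = [[0, 5], [1, 5], [0, 6], [1, 6]]   # feature 0 mirrors the class label
y = [0, 1, 0, 1]
reduced, kept = filter_select(X, y, abs_correlation, k=1)
print(kept)  # [0]
```

Because the scoring never consults the inducer, this runs in a single pass over the data, which is why filters are cheap compared with wrappers.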
The filter approach introduces a separate process, which occurs before the application of the induction algorithm itself. The idea is to filter features before induction takes place, based on general characteristics of the data set, in order to select some features and discard others. Thus, filter methods are independent of the induction algorithm, which simply takes as input the output of the filter, i.e., the reduced data set. The wrapper approach also happens externally to the induction algorithm. However, it uses this algorithm as a black box to evaluate candidate feature subsets, using, for example, the classifier accuracy to evaluate the feature subset in question. This process is repeated on each feature subset until a stopping criterion is satisfied. Thus, contrary to filters, wrappers are computationally expensive. Most supervised learning FSS methods consider as importance criterion the relevance of a feature for determining the class attribute. Nevertheless, feature relevance alone is insufficient for efficient FSS. Although some research work [6,8] has pointed out the existence and effect of feature redundancy in FSS, more work is needed on the explicit treatment of redundancy [4]. In fact, using the Fractal Dimension of the data set as importance criterion in order to filter features, as proposed in this work, treats the problem of redundancy; in this work it is compared with two other importance criteria based on feature relevance.

3. Fractals

Fractals are defined by the property of self-similarity, i.e., they present the same characteristics under different variations of scale and size. Thus, parts of the fractal, which may
H.D. Lee et al. / Feature Subset Selection for Supervised Learning Using Fractal Dimension
be a structure, an object or a data set, are similar, exactly or statistically, to the fractal as a whole. In general, fractals have unusual characteristics. The well-known Sierpinski Triangle – Figure 1 – for example, has infinite perimeter and null area. Therefore, it can be considered neither a one-dimensional Euclidean object, because it has infinite perimeter, nor a two-dimensional Euclidean object, as it presents a null area [9]. Consequently, a fractional dimension may be considered, which is denoted Fractal Dimension [10].
Figure 1. Fractal Example – Sierpinski Triangle.
It should be observed that many real data sets behave like fractals. Hence, the idea of applying concepts from fractal theory to analyze these data sets arises naturally [11].

4. Fractal Dimension of a Data Set

The Fractal Dimension can be associated with the idea of redundant features in a data set description and the possibility of this data set being well described in a smaller dimension, i.e., using a subset of features. The main idea is to employ the FD of the data set, which is relatively unaffected by redundant features, as the criterion to determine how many and which are the most important features in the data set. To this end, the concepts of embedding dimension and intrinsic dimension should be defined. The first is the number of features of the data set (its address space). However, the data set may represent an object that has a smaller dimension than the one in which it is immersed. Thus, the intrinsic dimension is the spatial dimension of the object represented by the data set. Conceptually, if all the variables (features) of a data set are independent of one another, then its intrinsic dimension is equal to the embedding dimension. However, when there is a correlation between two or more variables, the intrinsic dimension of the data set is reduced accordingly. Usually, neither the correlations between features nor even their existence is known. By means of the intrinsic dimension of the data set, it is possible to decide how many features are necessary to describe it. Different types of correlation may reduce the intrinsic dimension in different proportions, even by fractional amounts. Hence, the FD may be used as the intrinsic dimension of a data set [12]. There are several measures of FD.
Exactly self-similar fractals, i.e., the ones characterized by well-defined construction rules, may have their FD calculated by D = log(R)/log(e), where R represents the number of parts and e the scale at which the parts are generated at each iteration. For example, for the Sierpinski Triangle – Figure 1 – D = log(3)/log(2) = 1.58496, since three parts at a 1:2 scale are generated at each iteration, as shown in Figure 2. Statistically self-similar fractals, such as real-world data sets, may have their Fractal Dimension defined in many ways. One of them is the Correlation Fractal Dimension D2, which can be calculated using the Box Count Plot method [11]. This method consists in
Figure 2. Example of an Iteration to Construct the Sierpinski Triangle.
embedding the data set, seen as a set of points in an $N$-dimensional space, in an $N$-dimensional grid whose cells have sides of size $r$. Then, for each cell $i$, the number of points that fall into it, $C_{r,i}$, is counted, and the value $\sum_i C_{r,i}^2$ is computed. The Correlation Dimension $D_2$ is defined by Eq. (1):

$$D_2 = \frac{\partial \log\left(\sum_i C_{r,i}^2\right)}{\partial \log(r)}, \quad r \in [r_{min}, r_{max}] \qquad (1)$$
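Eq. (1) can be estimated numerically by a box-count sketch: impose grids of decreasing cell size r, sum the squared cell occupancies, and fit the slope of the resulting log-log plot. The snippet below is only a minimal illustration of the idea; the grid sizes, the scale range and the pure-Python implementation are our own assumptions, not the settings of any particular tool.

```python
import math
from collections import Counter

def sum_sq_occupancies(points, r):
    """S(r) = sum_i C_{r,i}^2 for a grid with cells of side r."""
    cells = Counter(tuple(math.floor(c / r) for c in p) for p in points)
    return sum(n * n for n in cells.values())

def correlation_dimension(points, radii):
    """Least-squares slope of log S(r) versus log r, as in Eq. (1)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(sum_sq_occupancies(points, r)) for r in radii]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

# Points on a line embedded in 2-D: the intrinsic dimension is 1, so the
# estimated D2 should be close to 1 even though the embedding dimension is 2.
line = [(i / 1000.0, i / 1000.0) for i in range(1000)]
d2 = correlation_dimension(line, [0.01, 0.02, 0.04, 0.08])
print(round(d2, 2))
```

For this line-shaped point set the slope comes out close to 1, matching the intuition that one of the two coordinates is fully redundant.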
In theory, exactly self-similar fractals are infinite. In practice, real-world data sets, which present a finite number of points, are considered statistically self-similar fractals over a given interval of scales r ∈ (r_min, r_max) if they follow a well-defined construction rule in that interval. Therefore, the intrinsic dimension of a specific data set may be estimated by the slope of the linear part of the graph obtained from plotting the measured $\sum_i C_{r,i}^2$ for different values of r [12]. In this work, the Correlation Dimension D2 will be denoted simply as Fractal Dimension D.

5. Empirical Study

In this section, the experiments performed in order to evaluate the proposed method using four medical data sets are presented.

5.1. Description of Data Sets

The following four data sets from the UCI Repository [13] were used in the experiments: Bupa – the problem is to predict whether or not a male patient has a hepatic dysfunction, considering several blood exams and the amount of alcohol consumed; Pima – the problem is to predict whether a female patient, at least 21 years old and of Pima Indian heritage, shows signs of diabetes according to the World Health Organization, given clinical and laboratory data; Breast Cancer – the problem is to predict whether a mammary tissue sample obtained from a patient is malignant or benign; Hungarian – the problem is to predict whether or not a patient has a heart disease. Table 1 shows the characteristics of these four data sets. For each data set it describes: the number of examples; the number and percentage of duplicated examples (i.e. those that appear more than once) or conflicting examples (i.e. those having the same feature values except for the class value); the total number of features, together with the number of continuous and discrete features; the class values and their distribution; the majority class error (i.e. the error when a new case is predicted as belonging to the class that occurs most), and the existence or not of missing values.
Table 1. Summary of Data Sets.

Data Set        # Examples   # Dup. or Confl. (%)   # Features (cont., nom.)   Class (Class %)           Majority Error      Missing Values
Bupa            345          4 (1.16%)              6 (6, 0)                   1 (42.03%), 2 (57.97%)    42.03% on value 2   No
Pima            769          1 (0.13%)              8 (8, 0)                   0 (65.02%), 1 (34.98%)    34.98% on value 0   No
Breast Cancer   699          8 (1.15%)              9 (9, 0)                   2 (65.52%), 4 (34.48%)    34.48% on value 2   Yes
Hungarian       294          1 (0.34%)              13 (13, 0)                 1 (36.05%), 0 (63.95%)    36.05% on value 0   Yes
[Figure 3 shows the flow of the experiments: data sets 1–8 (all features) pass through (1) the Filter FSS (Fractal Dimension), producing (2) the selected features; (3) models (decision rules) are then constructed and (4) the results evaluated.]

Figure 3. Experimental Setup.
5.2. Experimental Setup

The experiments were performed in the four steps shown in Figure 3 and described next. Step 1. Data Pre-processing: consists of two tasks, data cleaning and data preparation. During data cleaning, missing (unknown) values were removed in the following way: whenever unknown values were concentrated in a few examples, these examples were removed from the data set; whenever unknown values were concentrated in one feature, the corresponding feature (column) was removed from the data set. The main reason to remove missing values from the data sets used in the experiments is that some algorithms, such as the C4.5 algorithm used in this work, treat missing values in a special way [14], while other algorithms do not treat missing values at all. To avoid introducing interference associated with the use of one or another method to treat this problem, it was decided to remove missing values from the data sets. Regarding the second task, data preparation, two new data sets were generated from each original data set: one with the class feature and another without this feature. It is interesting to notice that the FD calculation considers every feature indistinctly, including the class feature. This procedure was adopted with the objective of verifying the real influence of the class feature in each data set. At the end of this step, all data sets were converted to the syntax required by each of the algorithms used as filters; Step 2. Feature Subset Selection: the proposed filter approach, considering FD as the importance criterion to determine how many and which features are important to describe the data set, was performed in this step; Step 3. Model Construction: models (classifiers) were induced using the data sets considering all features remaining after Step 1, as well as considering only the features selected in Step 2;
Step 4. Results Evaluation: for real-world data, prior knowledge about important features is often not available, so predictive accuracy is commonly used as an indirect measure to evaluate the quality of the selected features. Thus, in this step, for each of the induced models, its averaged error rate was estimated using 10-fold cross-validation. Regarding Step 1, missing values were found only in the data sets Breast Cancer (resulting in 683 examples and the same number of features) and Hungarian (resulting in 261 examples and 10 features). In Step 2, the Fractal Dimension D values and the ranking of the features according to this criterion were measured using the Measure Distance Exponent – MDE – tool [15]. Observe that determining which features are important means determining which ones, when excluded from the data set, cause a change in the recalculated FD value. Based on that, the method used by the MDE tool consists in measuring the FD value D of the original data set and also the partial FD values pD, ignoring one feature at a time. The process continues by selecting the feature that yields the minimum difference between D and pD. If the difference is within a small threshold, which determines how accurately the resulting data set needs to preserve the characteristics of the original data set, this feature may be considered to contribute little to the characterization of the original data set. The process continues, considering the remaining feature set, setting D = pD and applying the described procedure, until there are no more features to be removed. At the end of the process, the features are inversely ordered according to their importance for measuring the Fractal Dimension of the data set [12]. For Model Construction, Step 3, rules were induced using See5 (http://www.rulequest.com), a commercial version of the C4.5 [16] supervised induction algorithm.
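The MDE-style elimination loop just described can be sketched in a few lines. This is a schematic reading of the procedure, not the MDE tool itself: the fractal-dimension estimator is passed in as a parameter, and the toy stand-in used in the demonstration simply counts distinct columns, so that a duplicated (fully redundant) feature leaves the "dimension" unchanged.

```python
def mde_style_selection(data, fractal_dimension, tol):
    """Backward elimination driven by changes in the (partial) fractal dimension:
    repeatedly drop the feature whose removal changes pD the least, while the
    change stays below the tolerance."""
    features = list(range(len(data[0])))
    removed = []                        # least important features first
    d = fractal_dimension(data, features)
    while len(features) > 1:
        # partial dimension pD when each remaining feature is ignored in turn
        candidates = [(abs(d - fractal_dimension(
                          data, [f for f in features if f != g])), g)
                      for g in features]
        diff, worst = min(candidates)
        if diff > tol:                  # every remaining feature now matters
            break
        features.remove(worst)
        removed.append(worst)
        d = fractal_dimension(data, features)
    return features, removed

def toy_dimension(data, features):
    """Toy stand-in for a D2 estimator: the number of distinct columns,
    so a duplicated (fully redundant) column does not raise the value."""
    return float(len({tuple(row[f] for row in data) for f in features}))

# Feature 2 duplicates feature 0; one copy of the redundant pair is dropped.
data = [[1, 4, 1], [2, 5, 2], [3, 6, 3]]
kept, dropped = mde_style_selection(data, toy_dimension, tol=0.5)
print(kept, dropped)
```

Any real D2 estimator (e.g. the box-count sketch of Section 4) could be plugged in for `toy_dimension` without changing the elimination loop.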
6. Results and Discussion

Table 2 shows the results obtained using FD, as well as the ones obtained in a previous work [17] using as importance criteria to filter features the Column Importance measure – CI – available in MineSet™ (Silicon Graphics Inc.), and C4.5. Column one of Table 2 identifies the data set and its majority class error. Columns two, three, four and five show the mean error and standard error rates of the models induced by See5 using 10-fold cross-validation, considering all features and only the ones selected by FD, CI and C4.5, respectively, as well as the percentage of selected features. Furthermore, in order to evaluate the statistical significance of the difference between the models induced by See5 using features selected through FD and the ones selected by the other filters, t-test results are also shown, where t should verify |t| ≥ 2.1 for significance at the 0.05 level. Not considering the number of features selected by each filter, it can be observed that the models induced by See5 using the features selected by FD are competitive with the models using the other two filters, except for the Hungarian data set. Regarding the number of features selected over all data sets, FD selected 39.94%, while CI and C4.5 selected 64.65% and 88.03%, respectively. In fact, only CI on the Bupa data set selected fewer features than FD, although the error of the induced model increased. The set of features selected by FD seems to be appropriate for the Breast Cancer data set. Besides the induced model being competitive with the other two, although not at the 95% confidence level, a much smaller number of features was selected. This shows that Fractal

[1] The partial FD pD is calculated taking into account all features except the i-th feature under consideration.
Table 2. Averaged Error, Standard Error and Percentage of Selected Features (SF).

Data Set (MC Error)    All Features (% SF)     FD (% SF)             CI (% SF); t (FD x CI)       C4.5 (% SF); t (FD x C4.5)
Bupa (42.03)           31.90±2.00 (100.00%)    33.10±2.80 (66.67%)   41.42±2.85 (16.67%); -0.93   32.70±2.79 (100.00%); 0.05
Pima (34.98)           25.40±1.10 (100.00%)    27.80±2.00 (50.00%)   26.53±0.73 (75.00%); 0.27    25.88±0.99 (87.50%); 0.38
Breast Cancer (34.48)  5.00±1.20 (100.00%)     5.30±1.10 (20.00%)    5.86±0.84 (90.00%); -0.18    6.01±0.76 (80.00%); -0.24
Hungarian (36.05)      20.00±2.10 (100.00%)    31.80±2.80 (23.08%)   19.74±2.50 (76.92%); 1.44    20.09±2.59 (84.62%); 1.37
Dimension has potential to filter features to be used by learning algorithms. Moreover, as stated earlier, FD does not distinguish the class feature from the other features. In our experiments with and without the class feature, FD extracted the same features. This result is to be expected in supervised learning, since the class feature must be dependent on the other features in order for the concept embedded in the data to be mapped by a function c such that y = c(E), where E is any example from the data set and y is the associated class of this example. If different feature subsets were selected from a data set with and without the class feature, this would mean that the class is independent of the other features, violating the class attribute concept.

7. Conclusions

This work proposes the use of the Fractal Dimension as a criterion to filter features to be used by supervised learning algorithms. A series of experiments on medical data sets was performed using FD as a filter, and the results were compared to filters that use the Column Importance and C4.5 criteria. In most cases, the results show that models induced using the features selected by FD have accuracy similar to the ones constructed with the features selected by the other two importance criteria considered. Furthermore, on average, FD selected fewer features than the other two filters. Note that each selected feature subset is optimal with respect to the criterion used as filter. For example, the subset selected using FD contains the most important features according to this criterion, and the same happens for CI and C4.5. It should be observed that there is no consensus on a best feature importance measure, since it depends on the question "important with respect to what?". In general, the answer is related to the application for which the features are selected.
Thus, in order to select a subset of relevant features for a given data set, several methods and criteria should be tested with the objective of verifying which ones are the best. Future work includes the analysis, with the help of a domain specialist, of the features selected using different criteria. Among others, this analysis would allow one to verify whether
features selected by one method are more interesting than the ones selected by another method, as well as whether the models induced using these features are, from the point of view of the domain specialist.

Acknowledgements

We would like to thank Elaine P. de Sousa and Humberto Razente for their valuable help.

References

[1] Mitchell, T. M. (1997). Machine Learning. WCB McGraw-Hill.
[2] Kohavi, R. and John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1–2):273–324.
[3] Liu, H. and Motoda, H. (1998). Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, Massachusetts.
[4] Yu, L. and Liu, H. (2004). Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 5:1205–1224.
[5] Blum, A. L. and Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1–2):245–271.
[6] Koller, D. and Sahami, M. (1996). Toward optimal feature selection. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 284–292, Bari, Italy.
[7] John, G., Kohavi, R., and Pfleger, K. (1994). Irrelevant features and the subset selection problem. In Proceedings of the Eleventh International Conference on Machine Learning, pages 167–173, Morgan Kaufmann, San Francisco, CA.
[8] Hall, M. A. (2000). Correlation-based feature selection for discrete and numeric class machine learning. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 359–366, Stanford, CA.
[9] Sousa, E. P. M. (2003). Classification and Detection of Clusters using Indexing Structures. Doctorate Qualification Exam, ICMC-USP. (in Portuguese)
[10] Mandelbrot, B. B. (1985). The Fractal Geometry of Nature: Updated and Augmented. W. H. Freeman and Company, New York.
[11] Faloutsos, C. and Kamel, I. (1994). Beyond uniformity and independence: Analysis of R-trees using the concept of fractal dimension.
In Proceedings of the 13th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'94), pages 4–13, Minneapolis, MN.
[12] Traina, C., Traina, A. J. M., Wu, L., and Faloutsos, C. (2000). Fast feature selection using fractal dimension. In Proceedings of the Fifteenth Brazilian Database Symposium, pages 158–171, João Pessoa, PB.
[13] Blake, C., Keogh, E., and Merz, C. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.
[14] Batista, G. E. A. P. A. and Monard, M. C. (2003). An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence, 17(5):519–533.
[15] Traina, C., Traina, A. J. M., and Faloutsos, C. (2003). MDE – Measure Distance Exponent manual.
[16] Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco.
[17] Lee, H. D. (2000). Selection and Construction of Relevant Features for Machine Learning. Master's Thesis, ICMC-USP. (in Portuguese) http://www.teses.usp.br/teses/disponiveis/55/55134/tde-15032002-113112.
Functional Language of Digital Computers I

Kenneth K. Nwabueze
University of Brunei, BRUNEI

Abstract. In digital electronics there may be situations in which the designer needs the system to be able to "undo" certain processes. If that is a requirement, what the designer does, in simple mathematical terms, is to restrict the domain so that the process becomes a one-to-one function. The purpose of this short note is to discuss some simple logical operations in computer science in the context of mathematical functions.

Keywords. Functions, numbers, computers, bits, idempotent, operations.
1. Introduction

Like any other machine, computer hardware is normally switched on in order for electrons to flow through and activate it. Be that as it may, this is all that the computer hardware requires in order to function. A computer is like an ignorant being that needs to be told specifically what to do, and computer programs are the tools that tell the computer precisely what to do after it has been switched on. Computers understand only one language, namely machine code. Machine code is a sequence of bits (binary digits), namely 0 and 1. Electronic computers use binary digits for the internal representation of integers, and either base 8 (octal) or base 16 (hexadecimal) for display purposes. We start by discussing the operations involved in the conversion of numbers from one base to another in terms of the concept of mathematical functions in the classical sense. We then extend the discussion by summarizing the processes involved in logical operations, compilers and error message generators, viewed as functions. Although these concepts are usually presented without the benefit of functions, a non-expert's understanding of the material can be enhanced if the concepts are viewed in this broader context.
2. Number Base Conversion

The conversion of numbers from one number base to another is a function. To see this, let f_{a,b}(x) be the number base conversion process which takes a value expressed in base a and converts it into base b; for example, f_{2,10}(11001) = 25 and f_{16,10}(AFB2) = 44978 (cf. [1]). It is easy to see that f_{a,b}(x) is a function in the classical sense. Note that the function f_{a,b}(x) from one base into itself does not change the number, that is,
K.K. Nwabueze / Functional Language of Digital Computers I
f_{a,a}(x) = x. Moreover, f_{a,b}(x) is one-to-one, and so has an inverse. Recall that when we convert from a first base into a second base and then convert that result from the second base back into the first base, we get the original number back again, that is, f_{b,a}(f_{a,b}(x)) = x, and this is a well-known property of a one-to-one function. The idea of one-to-one functions is commonly utilized in designing digital electronics. In digital electronics there may be situations in which the designer needs the system to be able to "undo" certain processes. When that is the case, what the designer does, in simple mathematical terms, is to restrict the domain so that the process becomes a one-to-one function. Also recall that a practical method for conversion between two bases different from base 10 is to convert from the first base to base 10 and then from base 10 into the second base. This means that f_{a,b}(x) = f_{10,b}(f_{a,10}(x)), which is simply a composition of functions.

2.1. Functions from Base 2 to 8 or 16

We now give particular examples of functions arising from conversions between some of the bases used by the computer for internal representation and external display, namely bases 2, 8 and 16. Note that these bases are related, since each is a power of 2, that is, 2^1 = 2, 2^3 = 8 and 2^4 = 16. We now display the conversion of a base 2 number into a base 8 number as a function f_{a,b}; for example, when a = 2 and b = 8 one has f_{2,8}(1111101). Recall that this is done by separating the base 2 number into groups of three binary digits (going from right to left) as follows: 1 111 101. Each group of digits is then converted into the appropriate octal digit, that is, f_{2,8}(1111101) = f_{2,8}(1 111 101) = 175. The reverse process of going from base 8 into base 2 is equally easy. Recall that the value of f_{8,2}(2716) is computed by taking each of the octal digits in the base 8 number and converting it into three binary digits: f_{8,2}(2716) = 010 111 001 110 = 010111001110.
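The identity, inverse and composition properties above can be checked mechanically. The helper below is an illustrative sketch built on Python's integer parser; the function names are ours, not part of any standard API.

```python
# A base-conversion map f_{a,b} and checks of its function properties.
DIGITS = "0123456789ABCDEF"

def to_base(n, b):
    """Format a non-negative integer n in base b (2 <= b <= 16)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, b)
        out.append(DIGITS[r])
    return "".join(reversed(out))

def f(a, b, x):
    """f_{a,b}: read the string x as a base-a numeral, rewrite it in base b."""
    return to_base(int(x, a), b)

print(f(2, 10, "11001"))             # 25, as in the text
print(f(16, 10, "AFB2"))             # 44978, as in the text
print(f(8, 8, "2716"))               # f_{a,a}(x) = x
print(f(8, 2, f(2, 8, "1111101")))   # f_{b,a}(f_{a,b}(x)) = x
print(f(2, 8, "1111101") == f(10, 8, f(2, 10, "1111101")))  # composition, True
```

The last line verifies f_{2,8} = f_{10,8} ∘ f_{2,10}, the composition through base 10 described above.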
A similar process is used between base 2 and base 16, except that each hexadecimal digit represents 4 binary digits. For example, f_{2,16}(10101001) = f_{2,16}(1010 1001) = A9. To convert from base 16 to base 2, each hexadecimal digit is replaced with 4 binary digits. For example, f_{16,2}(A3C7) = 1010 0011 1100 0111 = 1010001111000111. It is now easy to see that the steps used in the above conversions are properties of a function in the classical sense. We provide more examples.

3. Logical Operations

We now discuss some logical operations in terms of functions.

3.1. The NOT(x) Function

The unary NOT operation is a function in the classical sense, and has the following two function calls: NOT(True) = False, NOT(False) = True. Note that the function NOT is one-to-one, and so there exists an inverse function, say NOT^{-1}, where NOT^{-1}(False) = True and NOT^{-1}(True) = False. Because NOT(False) = True and NOT(True) = False, we conclude that the function NOT is its own inverse. A quick way of seeing this is to note that NOT(NOT(True)) = True and NOT(NOT(False)) = False, i.e., NOT(NOT(x)) = x, showing again that NOT is its own inverse. A function that is its own inverse is called an involution; therefore, the function NOT is an example of an involution.
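The fact that NOT is its own inverse can be checked by composing it with itself over its whole two-point domain, a minimal sketch:

```python
# NOT undoes itself: NOT(NOT(x)) = x for every x in the domain {True, False}.
def NOT(x: bool) -> bool:
    return not x

assert all(NOT(NOT(x)) == x for x in (True, False))
print("NOT is its own inverse")
```

Since the domain has only two points, this exhaustive check is a complete proof of the property, not merely a spot test.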
3.2. The OR(x, y) Function

The binary OR(x, y) operation is a function in the classical sense, and we have the following four function calls:

OR(True, True) = True
OR(True, False) = True
OR(False, True) = True
OR(False, False) = False

Note that these four function calls imply that, although OR(x, y) is a function, it is not a one-to-one function, and so its inverse does not exist. To see this, observe that there are three instances where OR(x, y) = True, and we cannot predict the specific values of x and y which produced the result True. The only thing that can be inferred is that at least one of the two values must be True. Be that as it may, we can make the OR(x, y) function one-to-one by an appropriate restriction of the domain; that is, we need to specify a subset of the domain on which the function is one-to-one. One typical restriction of the domain of OR(x, y) would be that x = False. Another example of a restriction would be that y = False. A third, trivial example of a restriction would be that x = y = False. The restrictions in the three examples above each result in a one-to-one function, and so an inverse exists in each of these cases.

3.3. The XOR(x, y) Function

The binary XOR operation is a function in the classical sense, with the following four function calls:

XOR(True, True) = False
XOR(True, False) = True
XOR(False, True) = True
XOR(False, False) = False

This implies that XOR(x, y) does not have an inverse, since it is not a one-to-one function. To see that it is not one-to-one, observe that there are two instances where XOR(x, y) = True, and the specific values of x and y which produced the result True cannot be predicted. However, one can predict that the x and y values had to be different. Note also that there are two instances where XOR(x, y) = False, and the specific values of x and y which produced the result False cannot be predicted. However, we can conclude that x and y have to be equal.
Although XOR(x, y) is not one-to-one, one can restrict the domain by specifying a subset on which the function is one-to-one. For an inverse to exist, one can, for example, restrict the domain of XOR(x, y) to x = True. Other examples of restrictions would be x = False, y = True, or y = False. Similarly to the OR(x, y) and XOR(x, y) functions, one can derive equivalent conclusions for the NOR(x, y), NAND(x, y) and AND(x, y) operations. We present their function calls next.

3.4. The NOR(x, y) Function

NOR(x, y) is a binary function with the four function calls:
NOR(True, True) = False
NOR(True, False) = False
NOR(False, True) = False
NOR(False, False) = True

3.5. The AND(x, y) Function

AND(x, y) is the binary function with the following four function calls:

AND(True, True) = True
AND(True, False) = False
AND(False, True) = False
AND(False, False) = False

3.6. The NAND(x, y) Function

NAND(x, y) is a binary function with the four function calls:

NAND(True, True) = False
NAND(True, False) = True
NAND(False, True) = True
NAND(False, False) = True

4. Compilers and Error Message Generators

Recall that a compiler evaluates the entire computer program and then translates all the programming statements into a machine language program, which is then executed at once. A computer language compiler can be regarded as a function, because each valid command in the source file is converted into a predictable series of machine language commands (cf. [2]). If one considers only assembly language programming, then one has a one-to-one function, because each mnemonic corresponds to one machine language command (cf. [2]). Although some machine codes carry more than one byte, we still have a one-to-one function, because each corresponds to a single unique command. On the other hand, the part of the compiler which provides error codes or error messages can be regarded as a many-to-one function, because many different errors are identified with the same error code or error message. A common example is the SYNTAX ERROR message, which can arise from a misspelled command, use of an invalid command, a punctuation error, use of an improper command, and so on. This implies that if the computer generates the SYNTAX ERROR message, it is impossible to predict what actually caused the error unless one examines the specifics of the offending line and corrects it.
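The one-to-one claims of Section 3 can be checked by enumeration: a Boolean function is invertible on a domain exactly when no output value has two preimages there. The sketch below (with our own helper names, for illustration) confirms that none of the five binary gates is one-to-one on the full domain, while restricting OR to x = False makes it one-to-one; the same preimage-counting test would show the error-message mapping above to be many-to-one.

```python
from itertools import product

# Gate truth tables as small lambdas over Python booleans.
GATES = {
    "OR":   lambda x, y: x or y,
    "XOR":  lambda x, y: x != y,
    "AND":  lambda x, y: x and y,
    "NOR":  lambda x, y: not (x or y),
    "NAND": lambda x, y: not (x and y),
}

def one_to_one(gate, domain):
    """True iff every output value has exactly one preimage in the domain."""
    outputs = [gate(x, y) for x, y in domain]
    return len(outputs) == len(set(outputs))

full = list(product((True, False), repeat=2))
print({name: one_to_one(g, full) for name, g in GATES.items()})

# Restricting OR's domain to x = False makes it one-to-one, as in the text.
restricted = [(False, y) for y in (True, False)]
print(one_to_one(GATES["OR"], restricted))
```

Since each gate maps four inputs onto only two outputs, the pigeonhole principle already rules out injectivity on the full domain; the enumeration simply makes this concrete.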
References

[1] Nwabueze, Kenneth, Basic Number Theory: A First Course, Educational Technology Centre, Universiti Brunei Darussalam, 2003.
[2] Strong, Vernon, Functions: Computer science connections, IMSA Math Journal, (2), 1993.
Learning Algorithm of Neural Network using Orthogonal Decomposition Method Shigenobu Yamawaki * and Lakhmi Jain ** * Department of Electric and Electronic Engineering, School of Science and Engineering Kinki University, Osaka, 577-8502, JAPAN e-mail: [email protected] ** Knowledge-Based Intelligent Engineering Systems Centre (KES) University of South Australia, Adelaide Mawson Lakes, South Australia, 5095 e-mail: [email protected]
Abstract
In this paper, we present a new learning algorithm for neural networks based on the orthogonal decomposition method (ORT). The main scheme of this algorithm is to use the ORT to obtain specially structured subspaces defined by the input-output data. This structure is then exploited in the calculation of the parameter estimates of the neural network. Thus, a method to obtain comparatively accurate estimates without iterative calculations is introduced. We show that this algorithm can be applied to successfully identify nonlinear systems in the presence of comparatively strong noise. Results from several simulation studies are included to demonstrate the effectiveness of this method.
1 Introduction
Recently, the neural network paradigm, as a powerful tool for learning complex input-output mappings, has stimulated many studies on using neural network models for the identification of dynamical systems with unknown nonlinearities. For system identification using neural networks, there are two main issues: one is the choice of the model structure 1), and the other is the choice of the learning algorithm 2)~4). The recurrent neural network structure is applied to system identification here. As far as the learning algorithm is concerned, a dynamic backpropagation algorithm has been developed under a quadratic cost criterion. Many of the proposed algorithms can guarantee asymptotic convergence of the estimation error to zero. This paper proposes a learning algorithm for neural networks applying the orthogonal decomposition method (ORT) 5). The main scheme of this algorithm is the calculation of an orthogonal decomposition of the input-output data. Since the input-output data of the hidden layer are not acquired directly, these values are estimated by applying error backpropagation. We then combine the data matrix pair of the hidden-layer output and the input data. The orthogonal decomposition of the resulting matrix is used to estimate the parameters of the neural network. Accordingly, the identification of nonlinear systems is obtained using the proposed
S. Yamawaki and L. Jain / Learning Algorithm of Neural Network Using ORT
learning algorithm. By applying the proposed method to system identification, we demonstrate the validity of this method.
2 Learning Algorithm of Neural Network using Orthogonal Decomposition Method

In this paper, we propose a learning algorithm for neural networks (NN) described as follows:

$$\begin{aligned}
x_N(t) &= A_N\, o_N(t-1) + B_N\, u(t-1) + \theta_N + w(t),\\
o_N(t) &= f(x_N(t)),\\
f(x_N(t)) &= [\, f_1(x_{N1}(t)) \;\; f_2(x_{N2}(t)) \;\; \cdots \;\; f_n(x_{Nn}(t)) \,]^T, \qquad (1)\\
f_i(x) &= \lambda\left(\frac{2}{1+\exp(-x/q_s)} - 1\right),\\
y_N(t) &= C_N\, o_N(t) + v(t)
\end{aligned}$$

where $x_N(t)$, $o_N(t)$ and $u(t)$ are the $n$-dimensional state, the (same-dimensional) hidden-layer output and the $q$-dimensional input of the NN at step $t$, and $\theta_N$ is the threshold vector of the NN. The weight parameters $A_N$, $B_N$ and $C_N$ are appropriately sized coefficient matrices for each layer of the NN. The sigmoid function $f_i(x)$ has amplitude $\lambda$ and slope $q_s$. The variable $y_N(t)$ is the $p$-dimensional output of the NN; $w(t)$ and $v(t)$ are the system noise and the observation noise, respectively. Moreover, $w(t)$ and $v(t)$ are zero-mean white noise vectors whose covariance matrices are given by

$$E\left\{\begin{bmatrix} w(t) \\ v(t) \end{bmatrix}\begin{bmatrix} w^T(s) & v^T(s) \end{bmatrix}\right\} = \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix}\delta(t-s).$$

We define some frequently used notation. The data matrix $U_{1,N-1}$, which consists of the inputs $u(t)$, is defined in (i) below. The data matrices $Y_{1,N-1}$, $X_{1,N}$ and $O_{1,N-1}$ are defined from $y_N(t)$, $\bar{x}_N(t)$ and $o_N(t)$ in the same way as $U_{1,N-1}$ is constructed from $u(t)$, where $\bar{x}_N(t)$ denotes $\bar{x}_N(t) = x_N(t) - \theta_N$. We can then denote the Hankel matrix $U_{1,k,N-1}$ constructed from $u(t)$ as in (v); similarly, $Y_{1,k,N-1}$, $X_{1,k,N}$ and $O_{1,k,N-1}$ are Hankel matrices constructed from $y_N(t)$, $\bar{x}_N(t)$ and $o_N(t)$:

(i) $U_{1,N-1} = [\, u(1) \;\; u(2) \;\; \cdots \;\; u(N-1) \,]$,
(ii) $Y_{1,N-1} = [\, y_N(1) \;\; y_N(2) \;\; \cdots \;\; y_N(N-1) \,]$,
(iii) $X_{1,N} = [\, \bar{x}_N(2) \;\; \bar{x}_N(3) \;\; \cdots \;\; \bar{x}_N(N) \,]$,
(iv) $O_{1,N-1} = [\, o_N(1) \;\; o_N(2) \;\; \cdots \;\; o_N(N-1) \,]$,
(v) U_{1,k,N-1} = [U_{1,N-1}; U_{2,N}; ... ; U_{k,N+k-1}],
(vi) Y_{1,k,N-1} = [Y_{1,N-1}; Y_{2,N}; ... ; Y_{k,N+k-1}],
(vii) X̄_{1,k,N} = [X̄_{1,N}; X̄_{2,N+1}; ... ; X̄_{k,N+k}],
(viii) O_{1,k,N-1} = [O_{1,N-1}; O_{2,N}; ... ; O_{k,N+k-1}]

Now we can describe the NN (1) using the above notation as follows:

[x̄_N^d(t) + x_N^s(t); y_N^d(t) + y_N^s(t)] = [[A_N, B_N], [C_N, 0]] [o_N^d(t-1); u(t-1)] + [x_N^s(t); y_N^s(t)]    (2)

Namely, x̄_N(t) is divided into the deterministic component x̄_N^d(t) and the stochastic component x_N^s(t). Each variable is decomposed in a similar way:

x̄_N(t) = x̄_N^d(t) + x_N^s(t),   o_N(t) = o_N^d(t) + o_N^s(t),   y_N(t) = y_N^d(t) + y_N^s(t),
x_N^s(t) = A_N o_N^s(t-1) + w(t),   y_N^s(t) = C_N o_N^s(t) + v(t)    (3)

Furthermore, we can denote the input and output sequences of the NN more compactly as follows:

[X̄^d_{1,k,N} + X^s_{1,k,N}; Y^d_{1,k,N} + Y^s_{1,k,N}] = (I_k ⊗ [[A_N, B_N], [C_N, 0]]) [O^d_{1,k,N-1}; U_{1,k,N-1}] + [X^s_{1,k,N}; Y^s_{1,k,N}]    (4)

Then, consider the pair

[U_{1,k,N-1}; X̄^d_{1,k,N}] = (I_k ⊗ [[I, 0], [B_N, A_N]]) [U_{1,k,N-1}; O^d_{1,k,N-1}]    (5)

and its ORT factorization, denoted as

[U_{1,k,N-1}; X̄^d_{1,k,N}] = [[L11, 0], [L21, L22]] [Q1^T; Q2^T]    (6)

We can express I_k ⊗ B_N as

I_k ⊗ B_N = L21 L11^{-1}    (7)

Hence, we obtain

(I_k ⊗ A_N) O^d_{1,k,N-1} = L22 Q2^T    (8)

Once O^d_{1,k,N-1} and X̄^d_{1,k,N} have been obtained, (7) and (8) can be computed. Therefore, we formulate the learning algorithm of the NN.
< Algorithm > {Calculate x̄_N(t), o_N(t) and θ_N}
1) Calculate x̄_N(t), o_N(t) and θ_N using the backpropagation method.
2) Correct the parameter C_N.
< Algorithm > {Estimate the parameters}
3) Construct the Hankel matrices U_{1,k,N-1} and X̄^d_{1,k,N}, defined in (v) and (vii).
4) Perform the ORT factorization as given in (6).
5) Compute the SVD of the matrix L22:

L22 = [R1 R2] [[Σ1, 0], [0, Σ2]] [V1^T; V2^T] ≅ R1 Σ1 V1^T    (9)

I_k ⊗ A_N = R1 Σ1^{1/2}    (10)

6) Obtain the parameters A_N and B_N:

A_N = (R1 Σ1^{1/2})(1:n, 1:n),   B_N = (L21 L11^{-1})(1:n, 1:q)    (11)

where D(1:n, 1:n) denotes the first n rows and first n columns of D.
7) Repeat steps 1) to 6) until the stop condition is satisfied.
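Steps 3)-6) admit a compact numerical sketch. The following is our own illustrative implementation (NumPy assumed; function and variable names are ours, not the paper's): the ORT (LQ) factorization (6) is obtained from the QR factorization of the transposed data matrix, after which B_N follows from (7) and A_N from the SVD steps (9)-(11).

```python
import numpy as np

def estimate_AB(U, Xd, n, q):
    """Sketch of steps 3)-6): given the Hankel matrices U (= U_{1,k,N-1})
    and Xd (= Xbar^d_{1,k,N}), estimate A_N and B_N via the ORT (LQ)
    factorization (6) and the SVD (9)-(11)."""
    M = np.vstack([U, Xd])
    # LQ factorization: M = L Q^T, from the QR factorization of M^T.
    Q, Lt = np.linalg.qr(M.T)
    L = Lt.T                      # lower triangular factor
    r = U.shape[0]
    L11, L21, L22 = L[:r, :r], L[r:, :r], L[r:, r:]
    # (7): I_k (x) B_N = L21 L11^{-1}; B_N is its leading n x q block.
    BN = (L21 @ np.linalg.inv(L11))[:n, :q]
    # (9)-(11): SVD of L22; A_N from the leading n x n block of R1 Sigma1^{1/2}.
    R, s, Vt = np.linalg.svd(L22)
    AN = (R[:, :n] * np.sqrt(s[:n]))[:n, :n]
    return AN, BN
```

On purely input-driven data (Xd built from the input alone), the sketch recovers B_N exactly, which gives a quick sanity check of the factorization step.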
3 Examples
The proposed method was applied to the identification of a bilinear system, where w^T(k) = [w1(k) w2(k)] and v^T(k) = [v1(k) v2(k)] are zero-mean Gaussian white noise vectors with unit variance, and u(t) is a binary random sequence taking the values ±1. The covariance matrix Λ_y of the undisturbed output is about 1.2. In the estimation, the number of data points was taken to be 2000. The estimation result of the NN (1) for n = 6 and k = 20 is shown in Fig. 1. The applied algorithms are the proposed algorithm and the error backpropagation method using a least-squares method (LSBP). From Figure 1, it is clear that the proposed algorithm improves the estimation accuracy. Next, the estimation error and the covariance of the residuals are shown in Table 1. Although the proposed method does not obtain a strong linear relation between the output and the predicted value, Table 1 shows that the covariance is estimated almost correctly.
[Fig. 1 The estimation result for the proposed algorithm. (b) System output y2(t) and neural network output yN2(t).]
Table 1 Estimation error for model structure

Method   mean error e           ||e||    cov(e)               |cov(e)|
ORT       0.073    0.242        0.252    [ 0.310   0.026 ]     0.096
                                         [ 0.026   0.311 ]
LSBP     -0.061   -0.020        0.064    [ 0.454   0.070 ]     0.224
                                         [ 0.070   0.504 ]
BP       -0.775    0.199        0.800    [ 0.692  -0.050 ]     0.449
                                         [-0.050   0.651 ]
4 Conclusion

In this paper, we have proposed an algorithm based on the orthogonal decomposition method to estimate the parameters of a neural network. The proposed algorithm is able to estimate the parameters of the neural network without repeated iterative calculation. The validity of the proposed algorithm was demonstrated by applying it to the identification of a nonlinear system in the presence of unknown driving noise and observation noise. The simulation shows that an accurate estimate is obtained.
References
1. S. Chen, S.A. Billings and P.M. Grant: Non-linear system identification using neural networks; Int. J. Control, Vol. 51, No. 6, pp. 1191-1214, (1990)
2. S. Yamawaki, M. Fujino and S. Imao: An Approximate Maximum Likelihood Estimation of a Class of Nonlinear Systems using Neural Networks and Noise Models; T. ISCIE, Vol. 12, No. 4, pp. 203-211, (1999) (in Japanese)
3. S. Yamawaki: A Study of Learning Algorithm for Expanded Neural Networks; Proc. KES 2002, pp. 358-363, (2002)
4. S. Yamawaki and L. Jain: Expanded Neural Networks in System Identification; Proc. KES 2003, pp. 1116-1121, (2003)
5. M. Verhaegen and P. Dewilde: Subspace model identification Part 1. The output-error state-space model identification class of algorithms; Int. J. Control, pp. 596-604, (1992)
Para-analyzer and Its Applications

Jair Minoro ABE a,b, João I. da Silva Filho b, Fábio Romeu de CARVALHO b and Israel BRUNSTEIN a
a University of São Paulo, São Paulo, Brazil
b Paulista University, UNIP, São Paulo, Brazil
Abstract. In this expository work we show how the Para-analyzer can be useful in a variety of applications involving decision-making, mainly when facing uncertain, inconsistent or paracomplete information. The Para-analyzer can be implemented electronically, giving rise to the Para-control, which is very useful in applications in the areas of Robotics and Automation.
Keywords. Paraconsistent logic, annotated logics, para-analyzer, decision-making theory
Introduction

It is well known that the concept of uncertainty plays a central role in the description of many parts of the real world, and even more sharply when we have to manipulate such sets of information, requiring sophisticated tools. For a long time, Classical logic has been fundamental for applications, but such standard logical formalisms are inadequate with regard to their ability to model informal arguments. Fuzzy sets provided an important contribution that applies directly to informal arguments. Many other logical formalisms have been proposed: temporal logics, modal and dynamic logics, many-valued logics, intuitionistic logics, defeasible, deontic, default and nonmonotonic reasoning, besides a long list of alternative systems. In some previous works [3] we have introduced a logical analyzer based on the Paraconsistent Annotated Logic Eτ, dubbed the Para-analyzer. It constitutes an alternative tool to deal with uncertain, inconsistent or paracomplete data, giving an elegant treatment in a non-trivial manner. This paper summarizes such ideas.

1. Paraconsistent Logics and Related Systems

Roughly speaking, paraconsistent logics allow formulas of the form A & ¬A (A and the negation of A) to be applied in a non-trivial manner in deductions. Nowadays there are many systems of this type, v.g. the Cn logics of Da Costa (see, e.g. [15]). Their "dual", in a precise sense, are the logics known as paracomplete logics. A logic is called paracomplete if, according to it, a proposition and its negation can both be false (some many-valued logics are paracomplete in this sense). Finally, logics that are both paraconsistent and paracomplete are called non-alethic logics.
J.M. Abe et al. / Para-Analyzer and Its Applications
2. Paraconsistent Annotated Logic Eτ

Annotated logics are a new class of non-alethic 2-sorted logics. In this paper we'll consider a particular annotated logic, namely the paraconsistent annotated logic Eτ. The atomic formulas of Eτ are of the type p(μ, λ), where (μ, λ) ∈ [0, 1]², and [0, 1] is the real unit interval (p denotes a propositional variable). There is an order relation defined on [0, 1]²: (μ1, λ1) ≤ (μ2, λ2) iff μ1 ≤ μ2 and λ1 ≤ λ2. Such an ordered system constitutes a lattice that will be symbolized by τ. There is a natural operator defined on the lattice, ~ : |τ| → |τ|, defined as ~[(μ, λ)] = (λ, μ). Such an operator works as the "meaning" of the logical negation of Eτ. Also, we have the operator of maximization, (μ1, λ1) OR (μ2, λ2) = (max{μ1, μ2}; max{λ1, λ2}), and the operator of minimization, (μ1, λ1) AND (μ2, λ2) = (min{μ1, μ2}; min{λ1, λ2}). The pair (μ, λ) is called an annotation constant. p(μ, λ) can be intuitively read: "It is believed that p's belief degree (or favorable evidence) is μ and disbelief degree (or contrary evidence) is λ." So, (1.0, 0.0) indicates intuitively that p is a true proposition, (0.0, 1.0) that p is a false proposition, (1.0, 1.0) that p is a contradictory proposition, (0.0, 0.0) that p is a paracomplete proposition, and (0.5, 0.5) can be read as saying that p is an indefinite proposition. A detailed account of annotated logics is to be found in [1]. The values of the belief degree and of the disbelief degree are assessed, for example, by specialists who use heuristic knowledge, probability [16] or statistics [17]. Now let us consider the negation of an atomic formula. Take for instance p(0.5, 0.5), that is to say, "It is believed that p's belief degree (or favorable evidence) is 0.5 and disbelief degree (or contrary evidence) is 0.5." What is its negation, ¬p(0.5, 0.5)? It is clear that it is the same formula, i.e. p(0.5, 0.5).
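The annotation operators above are simple componentwise operations, which the following sketch makes concrete (our own function names, annotations as plain tuples; an illustration, not the authors' implementation):

```python
def annot_neg(a):
    """Negation on annotations: ~(mu, lam) = (lam, mu)."""
    mu, lam = a
    return (lam, mu)

def annot_or(a, b):
    """Maximization operator OR: componentwise maximum."""
    return (max(a[0], b[0]), max(a[1], b[1]))

def annot_and(a, b):
    """Minimization operator AND: componentwise minimum."""
    return (min(a[0], b[0]), min(a[1], b[1]))
```

For example, combining two experts' annotations (0.8, 0.2) and (0.6, 0.4) with OR yields (0.8, 0.4), while AND yields (0.6, 0.2).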
So, it becomes evident that Eτ is a paraconsistent logic. More generally, we can state that ¬p(μ, λ) is equivalent to p(λ, μ). This property becomes fundamental when the complexity of the hardware implementation of paraconsistent logical circuits is the main concern. In order to manipulate the concepts of uncertainty, inconsistency and paracompleteness, we introduce the following concepts (all considerations assume 0 ≤ μ, λ ≤ 1):

Contradiction degree: δct(μ, λ) = μ + λ - 1;
Certainty degree: δce(μ, λ) = μ - λ.

The logical (or output) states, extreme and non-extreme, consist of 12 states according to Figure 1, Table 1 and Table 2. These states can be easily characterized by the values of the certainty degree and the contradiction degree. We have chosen the resolution 12 (the number of regions considered in Figure 1), but such a division depends entirely on the precision of the analysis required at the output. In order to make the recognition of each region easier, each one received a denomination in agreement with its proximity to the extreme state points of the lattice.
Figure 1. Lattice of output states

Table 1. Extreme states

Extreme state   Region   Symbol
Inconsistent    CPN      T
False           BNM      F
True            PQD      V
Paracomplete    AMQ      A

Table 2. Non-extreme states

Non-extreme state                      Region   Symbol
Quasi-true tending to Inconsistent     PUO      QV→T
Quasi-true tending to Paracomplete     QUO      QV→A
Quasi-false tending to Inconsistent    SON      QF→T
Quasi-false tending to Paracomplete    MOR      QF→A
Quasi-inconsistent tending to True     TOP      QT→V
Quasi-inconsistent tending to False    TON      QT→F
Quasi-paracomplete tending to True     ROQ      QA→V
Quasi-paracomplete tending to False    ROM      QA→F
We can consider the following control values (in this work we have chosen ½): Maxvcc = maximum value of certainty control Maxvctc = maximum value of contradiction control Minvcc = minimum value of certainty control Minvctc = minimum value of contradiction control
Figure 2. Representation of the certainty degrees and of the contradiction degrees with the control values: Maxvcc = Maxvctc = ½ and Minvcc = Minvctc = -½.
With these considerations, we have built the “Para-analyzer” [3].
3. Para-analyzer Algorithm

*/ Definitions of the values */
Maxvcc = C1   */ maximum value of certainty control */
Maxvctc = C2  */ maximum value of contradiction control */
Minvcc = C3   */ minimum value of certainty control */
Minvctc = C4  */ minimum value of contradiction control */
*/ Input variables */
μ, λ
*/ Output variables */
Digital output = S1
Analog output = S2a
Analog output = S2b
*/ Mathematical expressions */
being: 0 ≤ μ ≤ 1 and 0 ≤ λ ≤ 1
δct(μ, λ) = μ + λ - 1; δce(μ, λ) = μ - λ
*/ Determination of the extreme states */
if δce(μ, λ) ≥ C1 then S1 = V
if δct(μ, λ) ≥ C2 then S1 = T
if δce(μ, λ) ≤ C3 then S1 = F
if δct(μ, λ) ≤ C4 then S1 = A
*/ Determination of the non-extreme states */
for 0 ≤ δce < C1 and 0 ≤ δct < C2
  if δce ≥ δct then S1 = QV→T
  else S1 = QT→V
for 0 ≤ δce < C1 and C4 < δct ≤ 0
  if δce ≥ |δct| then S1 = QV→A
  else S1 = QA→V
for C3 < δce ≤ 0 and C4 < δct ≤ 0
  if |δce| ≥ |δct| then S1 = QF→A
  else S1 = QA→F
for C3 < δce ≤ 0 and 0 ≤ δct < C2
  if |δce| ≥ δct then S1 = QF→T
  else S1 = QT→F
δct = S2a
δce = S2b
*/ END */
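The algorithm above can be sketched compactly in code. The following is our own illustrative version (function name and ASCII state labels such as 'QV->T' are ours), using the ½ control values chosen in the text as defaults:

```python
def para_analyzer(mu, lam, C1=0.5, C2=0.5, C3=-0.5, C4=-0.5):
    """Sketch of the Para-analyzer: returns the logical state S1 and the
    analog outputs (certainty degree, contradiction degree).  C1/C2 are
    the maximum certainty/contradiction controls, C3/C4 the minimum ones."""
    dct = mu + lam - 1.0          # contradiction degree
    dce = mu - lam                # certainty degree
    # extreme states
    if dce >= C1:
        return 'V', dce, dct      # true
    if dct >= C2:
        return 'T', dce, dct      # inconsistent
    if dce <= C3:
        return 'F', dce, dct      # false
    if dct <= C4:
        return 'A', dce, dct      # paracomplete
    # non-extreme states, by quadrant of the (dce, dct) plane
    if dce >= 0 and dct >= 0:
        s = 'QV->T' if dce >= dct else 'QT->V'
    elif dce >= 0:
        s = 'QV->A' if dce >= abs(dct) else 'QA->V'
    elif dct <= 0:
        s = 'QF->A' if abs(dce) >= abs(dct) else 'QA->F'
    else:
        s = 'QF->T' if abs(dce) >= dct else 'QT->F'
    return s, dce, dct
```

For instance, the annotation (1.0, 0.0) maps to the true state V, (1.0, 1.0) to the inconsistent state T, and (0.7, 0.5) falls in the quasi-true region.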
4. Applications

Let us suppose that a proposition is being analyzed by some experts. The information, whatever its source (facts, subjective opinions, incomplete information, statistical data, etc.), can show agreement, disagreement or even indefinition. The proposed Para-analyzer can perform a paraconsistent reasoning that analyzes the favorable and contrary evidence. A first consideration is in Expert Systems. Let us take, for instance, a product launch being analyzed by marketing people and sales people. We consider a proposition p to be analyzed, and several experts are invited to give their opinions. For instance, let M1, M2, ..., Mn be n experts in the marketing area and S1, S2, ..., Sm be m experts in the sales area. It is intuitive that we can use the maximization operator among the experts of the marketing area, and likewise among the experts of the sales area:

(M1 OR M2 OR ... OR Mn) AND (S1 OR S2 OR ... OR Sm)

Between the marketing group and the sales group we use the minimization operator, since they are different in nature. This can be performed for each factor to be analyzed for the proposition considered, and by applying the Para-analyzer we obtain a resulting annotation belonging to one of the output states. By using the contradiction degree and the certainty degree, for each region we can determine the appropriate decision. This methodology also allows putting weights on the experts or on the factors being considered. Applications in this direction include [10], which analyzes where to open an enterprise, [8], which analyzes product launching in the market, and [7], which presents decision-making for logistics. The latter paper analyzes how the Para-analyzer can be useful to increase robot availability through maintenance. Let us take a simple application based on [7]: suppose that the maintenance of a number of robots is being carried out by three experts.
The chief engineer receives several pieces of information, each of them a proposition with an attached favorable evidence and contrary evidence: for instance, last maintenance, type of robot, past records, etc. The proposed Para-analyzer can perform a paraconsistent reasoning that analyzes each piece of favorable and contrary evidence. A suggested form for
this implementation is the use of a maximization analysis with the connective OR and a minimization analysis with the connective AND among the three specialists' information. Figure 4 below displays the network in more detail, highlighting the action of the Para-analyzer on the information brought by the three specialists.

[Figure 4. Analysis of the proposition concerning "To make maintenance of the robot": each expert E1, E2, E3 supplies a belief degree and a disbelief degree; these are combined by OR and AND operators and fed to the Para-analyzer, which outputs δce, δct and the conclusion.]
The Para-analyzer can be built in electronic circuits through a sophisticated experimental hardware implementation. This yields a logical controller called the Para-control [6]. The Para-control can be applied to manipulate conflicts and paracompleteness, making it possible to develop a robot that thinks more flexibly and to implement decision-making in the presence of uncertainties. Such an electronic circuit treats logical signals in the context of the paraconsistent annotated logic Eτ. The circuit compares logical values and determines the domain of the state of the lattice τ corresponding to the output value. Favorable and contrary evidence degrees are determined by analog operational amplifiers. The Para-control comprises both analog and digital systems, and it can be externally adjusted by applying positive and negative voltages.
The Para-control was tested in real-life experiments with a mobile robot named Emmy (in homage to Emmy Noether), whose favorable/contrary evidence degrees coincide with the values of ultrasonic sensors; distances to obstacles are represented by continuous voltage values [10], [5]. A second prototype, an improvement of Emmy, was recently developed [9]. It was called Emmy II, and its Para-analyzer allows the implementation of velocity control, making the robot's movements smoother. It also allows backward movements, a novelty in relation to Emmy.
[Figure 5. Para-analyzer scheme: the inputs μ and λ feed the paraconsistent logic controller (Para-control / Para-analyzer), which computes δct(μ, λ) = μ + λ - 1 and δce(μ, λ) = μ - λ, determines the logical state, and takes a control action based on the lattice τ.]
Also, in [12], [13] we have discussed how to combine Fuzzy logic and the Paraconsistent Annotated Logic Eτ, building a hybrid logic controller called Para-fuzzy. It is capable of treating fuzziness, inconsistencies and paracompleteness in a non-trivial way. This type of approach makes systems more complete, with greater robustness, leading to more reliable conclusions and creating a new way to represent uncertain, inconsistent and/or paracomplete knowledge. As an application, for instance, it could work as the main component of an autonomous mobile robot's control system that navigates in an unknown environment, with movements oriented by means of two sensors (favorable and contrary evidence) which can generate all combinations of degrees ranging between 0 and 1 according to the output lattice seen previously. The fuzziness or contradictions generated by signals received from several different sources of information could be treated by the Para-fuzzy controller, which could present more conclusive results.
5. Conclusions

This application of the Para-analyzer opens new possibilities for manipulating the concepts of vagueness, inconsistency and paracompleteness, with applications in Artificial Intelligence, Automation and Robotics. We hope to say more in forthcoming papers.
References
[1] J.M. Abe, Fundamentos da Lógica Anotada (Foundations of Annotated Logic) (in Portuguese), Ph.D. Thesis, University of São Paulo, São Paulo, 1992.
[2] J.M. Abe, Some Aspects of Paraconsistent Systems and Applications, Logique et Analyse, 157, 83-96, 1997.
[3] J.I. da Silva Filho & J.M. Abe, Paraconsistent analyzer module, International Journal of Computing Anticipatory Systems, vol. 9, ISSN 1373-5411, ISBN 2-9600262-1-7, 346-352, 2001.
[4] N.C.A. da Costa, J.M. Abe & V.S. Subrahmanian, Remarks on annotated logic, Zeitschrift f. math. Logik und Grundlagen d. Math. 37, pp. 561-570, 1991.
[5] J.I. da Silva Filho & J.M. Abe, Manipulating Conflicts and Uncertainties in Robotics, Multiple-Valued Logic and Soft Computing, V.9, ISSN 1542-3980, 147-169, 2003.
[6] J.M. Abe & J.I. Da Silva Filho, Simulating Inconsistencies in a Paraconsistent Logic Controller, International Journal of Computing Anticipatory Systems, vol. 12, ISSN 1373-5411, ISBN 2-9600262-1-7, 315-323, 2002.
[7] J.M. Abe & J.I. Da Silva Filho, A Para-analyzer Method to Increase Robot Availability Through Maintenance, Proceedings of the International Conference on Industrial Logistics 2001, Okinawa, Japan, 327-337, 2001.
[8] F.R. Carvalho, I. Brunstein & J.M. Abe, Paraconsistent annotated logic in analysis of viability: an approach to product launching, in Computing Anticipatory Systems: CASYS 2003 - Sixth International Conference on Computing Anticipatory Systems, Ed. D.M. Dubois, American Institute of Physics, AIP Conference Proceedings, Vol. 718, ISBN 0-7354-0198-5, ISSN 0094-243X, pp. 282-291, 2004.
[9] C.R. Torres, J.M. Abe & G.L. Torres, Sistema Inteligente Paraconsistente para Controle de Robôs Móveis Autônomos (in Portuguese), Anais do I Workshop Universidade-Empresa em Automação, Energia e Materiais, 5-6 Nov. 2004, Taubaté (SP), Brazil, 2004.
[10] J.I. da Silva Filho & J.M. Abe, Emmy: a paraconsistent autonomous mobile robot, in Logic, Artificial Intelligence, and Robotics, Proc. 2nd Congress of Logic Applied to Technology - LAPTEC'2001, Eds. J.M. Abe & J.I. Da Silva Filho, Frontiers in Artificial Intelligence and Its Applications, Vol. 71, IOS Press, Amsterdam, Ohmsha, Tokyo, ISBN 1-58603-206-2 (IOS Press), 4-274-90476-8 (Ohmsha), ISSN 0922-6389, 53-61, 287 pp., 2001.
[11] J.I. da Silva Filho & J.M. Abe, Fundamentos das Redes Neurais Paraconsistentes - Destacando Aplicações em Neurocomputação (in Portuguese), Editôra Arte & Ciência, ISBN 85-7473-045-9, 247 pp., 2001.
[12] J.I. da Silva Filho & J.M. Abe, Para-Fuzzy Logic Controller - Part I: A New Method of Hybrid Control Indicated for Treatment of Inconsistencies Designed with the Junction of the Paraconsistent Logic and Fuzzy Logic, Proceedings of the International ICSC Congress on Computational Intelligence Methods and Applications - CIMA'99, Rochester Institute of Technology, Rochester, N.Y., USA, ISBN 3-906454-18-5, Eds. H. Bothe, E. Oja, E. Massad & C. Haefke, ICSC Academic Press, Canada/Switzerland, 113-120, 1999.
[13] J.I. da Silva Filho & J.M. Abe, Para-Fuzzy Logic Controller - Part II: A Hybrid Logical Controller Indicated for Treatment of Fuzziness and Inconsistencies, Proceedings of the International ICSC Congress on Computational Intelligence Methods and Applications - CIMA'99, Rochester Institute of Technology, Rochester, N.Y., USA, ISBN 3-906454-18-5, Eds. H. Bothe, E. Oja, E. Massad & C. Haefke, ICSC Academic Press, Canada/Switzerland, 106-112, 1999.
[14] N.C. Da Costa, J.M. Abe, J.I. Da Silva Filho, A.C. Murolo & C.F.S. Leite, Lógica Paraconsistente Aplicada (in Portuguese), ISBN 85-224-2218-4, Editôra Atlas, 214 pp., 1999.
[15] N.C.A. Da Costa, On the theory of inconsistent formal systems, Notre Dame J. of Formal Logic, 15, 497-510, 1974.
[16] A.P. Dempster, Generalization of Bayesian inference, Journal of the Royal Statistical Society, Series B-30, 205-247, 1968.
[17] R.O. Duda, P.E. Hart, K. Konolige & R. Reboh, A computer-based Consultant for Mineral Exploration, TR, SRI International, 1979.
Methods for Constructing Symbolic Ensembles from Symbolic Classifiers

Flavia Cristina Bernardini and Maria Carolina Monard
University of São Paulo - USP
Institute of Mathematics and Computer Science - ICMC
Laboratory of Computational Intelligence - LABIC
P. O. Box 668, 13560-970, São Carlos, SP, Brazil
e-mail: {fbernard,mcmonard}@icmc.usp.br

Abstract. Practical Data Mining applications use learning algorithms to induce knowledge. Thus, these algorithms should be able to operate on massive datasets. Techniques such as dataset sampling can be used to scale up learning algorithms to large datasets. A general approach associated with sampling is the construction of ensembles of classifiers, which can be more accurate than the individual classifiers. However, ensembles often lack the facility to explain their decisions. In this work we explore a method for constructing ensembles of symbolic classifiers such that the ensembles are able to explain their decisions to the user. This idea has been implemented in the ELE system described in this work.
Keywords. Symbolic Machine Learning, Ensembles of Classifiers, Combining Classifiers
1. Introduction

An active research area in Machine Learning (ML) is related to developing methods capable of dealing with large datasets [1], as required by the Data Mining (DM) process. There are several approaches to dealing with ML systems on large datasets. One of these is the supervised learning ensemble approach. In general, an ensemble of classifiers consists of a set of classifiers whose individual classification decisions are combined in some way (typically by weighted or unweighted voting) to classify new examples. Furthermore, under certain conditions, an ensemble can be more accurate than its component classifiers [2]. Although the predictive power of ensembles and classifiers can be considered a strong goal, human understanding and evaluation of the induced knowledge, which is often neglected, also plays an important role in both DM and ML. Human understanding can be achieved using symbolic learning systems, i.e., the ones that induce what we shall call symbolic classifiers. In this work, we consider a classifier symbolic if it can be transformed into a set of propositional knowledge rules. However, combining symbolic classifiers by a majority (or other) voting mechanism does not necessarily result in a symbolic ensemble that is able to explain its decisions. Our interest is related not only to the correct classification of new instances by the ensemble, but also to the ensemble's explanation facility. In other words, we are interested in the ensemble's ability to explain to the user the reasons for classifying new instances into one of the possible classes.
162
F.C. Bernardini and M.C. Monard / Methods for Constructing Symbolic Ensembles
In this work we explore a method for constructing ensembles of symbolic classifiers using several voting mechanisms, such that the ensembles are able to explain their decisions to the user. This idea has been implemented in the ELE (Ensemble Learning Environment) system described in this work. The rest of this paper is organized as follows: Section 2 introduces the notation and definitions used in the text; Section 3 describes our proposal to construct symbolic ensembles; Section 4 describes the implemented system; Section 5 shows some experimental results and Section 6 concludes the paper.
2. Definitions and Notation

A dataset T is a set of N classified instances {(x1, y1), ..., (xN, yN)} for some unknown function y = f(x). The xi values are typically vectors of the form <xi,1, xi,2, ..., xi,M> whose components are discrete or real values, called features or attributes. Thus, xij denotes the j-th feature or attribute of xi. In what follows, the i subscript will be dropped when implied by the context. For classification, the y values are drawn from a discrete set of NCl classes, i.e. y ∈ {C1, ..., CNCl}. Given a set S ⊂ T of training examples, a learning algorithm outputs a classifier h, which is a hypothesis about the true function f. Given new x values, h predicts the corresponding y values. A symbolic classifier is a hypothesis whose description language can be transformed into a set of rules. A complex is a disjunction of conjunctions of feature tests of the form xi op Value, where xi is a feature name, op is an operator in the set {=, ≠, <, ≤, >, ≥} and Value is a valid xi feature value. A rule R assumes the form if B then H, or symbolically B → H, where H stands for the head, or rule conclusion, and B for the body, or rule condition. H and B are both complexes with no features in common. A classification rule assumes the form if B then class = Ci, where Ci ∈ {C1, ..., CNCl}. The coverage of a rule is defined as follows: considering a rule R = B → H, the instances that satisfy the B part compose the covered set of R, called the B set in this work; in other words, those instances are covered by R. Instances that satisfy both B and H are correctly covered by R, and these instances belong to the set B ∩ H. Instances satisfying B but not H are incorrectly covered by the rule, and belong to the set B ∩ H̄. On the other hand, instances that do not satisfy the B part are not covered by the rule, and belong to the set B̄.
Given a rule and a dataset, one way to measure its performance on that dataset is by computing its contingency matrix [3] (Table 1). Denoting the cardinality of a set A as a, i.e. a = |A|, then b and h in Table 1 denote the number of instances in the sets B and H respectively, i.e. b = |B| and h = |H|. Similarly, b̄ = |B̄|; h̄ = |H̄|; bh = |B ∩ H|; bh̄ = |B ∩ H̄|; b̄h = |B̄ ∩ H|; and b̄h̄ = |B̄ ∩ H̄|. The contingency matrix of a rule R enables the calculation of several rule quality measures, such as support (Sup(R) = bh/N), rule accuracy (Acc(R) = bh/b) and others, within a common framework [3].

Table 1. Contingency matrix for B → H

        B      B̄
H       bh     b̄h     h
H̄      bh̄    b̄h̄    h̄
        b      b̄      N
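The contingency counts and the derived measures can be computed directly from a dataset, as in the following sketch (the function name, the predicate-based rule representation, and the dataset layout are ours, for illustration):

```python
def rule_contingency(body, head, data):
    """Contingency counts of a rule B -> H over a dataset (Table 1).
    `body` and `head` are predicates over an instance (x, y);
    `data` is a list of (x, y) pairs."""
    N = len(data)
    bh = sum(1 for ex in data if body(ex) and head(ex))       # |B and H|
    bh_not = sum(1 for ex in data if body(ex) and not head(ex))  # |B and not-H|
    b = bh + bh_not                                           # |B|
    sup = bh / N                   # Sup(R) = bh / N
    acc = bh / b if b else 0.0     # Acc(R) = bh / b
    return {'bh': bh, 'bh_': bh_not, 'b': b, 'sup': sup, 'acc': acc}
```

For example, for the rule "if x1 = 1 then class = C1" on a four-instance dataset where three instances satisfy the body and two of those are in class C1, the support is 2/4 and the accuracy 2/3.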
An ensemble consists of a set of individual classifiers whose predictions are combined in some way, by majority or another voting mechanism, in order to predict the label of new instances. Although under certain conditions ensembles can reduce the classification error by reducing bias and variance, ensembles can be very large [2]. Another problem is related to ensembles' interpretability by humans, since an ensemble of symbolic classifiers is not necessarily symbolic. In what follows, we describe a method we are proposing for constructing ensembles using few symbolic classifiers and several voting mechanisms, which enables the ensemble to explain its decisions to the user.
3. Combining Multiple Classifiers

The task of constructing ensembles can be broken down into two sub-tasks [2]. The first one consists in generating a set of base-level classifiers. The second one consists in deciding how to combine the decisions of the base-level classifiers to classify new instances. In this work, the first task is done in the usual way. Let L be the number of base-level classifiers to be induced given a dataset S. First of all, L samples S1, ..., SL, drawn with or without replacement, are extracted from S. Each sample is used as input to a symbolic ML algorithm, inducing L hypotheses (classifiers) h1, ..., hL. The algorithm does not need to be the same for all L samples. Afterwards, given a new instance (example) x to be classified, the individual decisions of the set of L hypotheses are combined to output its label. Figure 1 illustrates the method, where Combine(h1(x), ..., hL(x)) constitutes the symbolic ensemble h∗(x).

[Figure 1. A method for constructing ensembles of classifiers: samples S1, ..., SL drawn from S are given to algorithms Alg1, ..., AlgL, which induce hypotheses h1, ..., hL; for an example x to be classified, the predictions h1(x), ..., hL(x) are combined into the symbolic ensemble output h∗(x).]
As we use symbolic classifiers, we can consider two different ways to classify x, i.e., two ways of finding h1(x), ..., hL(x):
1. Classifier Classification: each induced classifier is responsible for classifying x;
2. Best Rule Classification: the best classifier rule that covers the example x, according to a rule measure specified by the user, is responsible for classifying x.
Thus, Combine(h1(x), ..., hL(x)) can use the following four methods to combine multiple classifiers in order to construct the final ensemble h∗:
1. Unweighted Voting – UV: the class label of x is the one that receives the most votes from the L classifiers;
2. Weighted by Mean Voting – WMV: the class label given to x by each classifier is weighted using the classifier's mean error rate m_err(h_l), and the class label of x is the one having maximum total weight over the L classifiers:

WMV(x) = \arg\max_{C_i \in \{C_1,\dots,C_{N_{Cl}}\}} \sum_{l=1}^{L} g(h_l(x), C_i), where

g(h_l(x), C_i) = \begin{cases} \lg\frac{1 - m\_err(h_l)}{m\_err(h_l)} & \text{if } h_l(x) = C_i \\ 0 & \text{otherwise.} \end{cases}
3. Weighted by Mean and Standard Error Voting – WMSV: similar to the previous one, but also considering the standard error se_err(h_l) of the classifier's mean error rate to estimate the corresponding weight:

WMSV(x) = \arg\max_{C_i \in \{C_1,\dots,C_{N_{Cl}}\}} \sum_{l=1}^{L} g(h_l(x), C_i), where

g(h_l(x), C_i) = \begin{cases} \lg\frac{1 - m\_err(h_l)}{m\_err(h_l)} + \lg\frac{1 - se\_err(h_l)}{se\_err(h_l)} & \text{if } h_l(x) = C_i \\ 0 & \text{otherwise.} \end{cases}

4. Best Rule Voting – BRV: according to a rule measure specified by the user, the best rule among all the rules in the ensemble components is responsible for the final classification of x.

To validate our proposal, we have implemented a computational system called Ensemble Learning Environment (ELE), which is integrated into a computational environment called DISCOVER. Both are described next.
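The three voting schemes above can be written down directly from the formulas (a sketch; lg is assumed here to be the base-10 logarithm, and ties are broken by class order):

```python
from math import log10 as lg

def uv(votes, classes):
    """Unweighted Voting: the class receiving the most votes wins."""
    return max(classes, key=lambda c: sum(v == c for v in votes))

def wmv(votes, classes, m_err):
    """Weighted by Mean Voting: vote l carries weight
    lg((1 - m_err(h_l)) / m_err(h_l))."""
    w = [lg((1 - e) / e) for e in m_err]
    return max(classes, key=lambda c: sum(wl for v, wl in zip(votes, w) if v == c))

def wmsv(votes, classes, m_err, se_err):
    """Weighted by Mean and Standard Error Voting: adds the analogous
    standard-error term to each classifier's weight."""
    w = [lg((1 - m) / m) + lg((1 - s) / s) for m, s in zip(m_err, se_err)]
    return max(classes, key=lambda c: sum(wl for v, wl in zip(votes, w) if v == c))
```

Note how a single very accurate classifier can outvote several mediocre ones under WMV, while UV would side with the majority.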
4. Implementation Description

There are several learning algorithms and methods that can be used in Data Mining tasks. However, such algorithms and methods cannot be applied without careful data understanding and preparation. In addition, evaluating, understanding, and interpreting the extracted knowledge are also hard activities. Although several commercial tools are currently available, they are too expensive for university budgets in some developing countries. Furthermore, commercial tools are generally developed as closed products from the end user's point of view, making it difficult for researchers to integrate other DM methods.
With this in mind, a group of researchers from the Computational Intelligence Laboratory1 is developing an integrated environment called DISCOVER [4], implemented in Perl, which is being used to support our research on DM and ML. The aim of the system is to integrate the learning algorithms most frequently used by the community with data and knowledge processing tools developed as results of our work. Furthermore, DISCOVER can also be used as a workbench for new tools and ideas, which can be integrated as new components of the system [4]. One of the advantages of DISCOVER as a research system is the unifying view with which the system handles objects using standard formats. Thus, the development of new tools for both data and knowledge processing, such as the ELE system proposed in this work, amounts to handling sets of common objects, which can be represented in different formats. As stated before, the DISCOVER project adopted the standard format concept for its objects, where an object can be, for example, a dataset, a classifier, a rule or a measure. Besides that, the use of standard formats provides a unifying view of these objects, facilitating their understanding. For data representation, we adopted the attribute-value standard format [5] with some extensions. To manipulate data in this standard format, DISCOVER provides an object-oriented library, the DISCOVER OBJECT LIBRARY (DOL) [5]. There is also a standard format for classification rules, called PBM [6], with a major addition: for each rule we have the contingency matrix (Table 1), which is the basis for most rule evaluation measures [3,6]. To manipulate rules in the PBM standard format, there is another object-oriented library which, among other functionalities, can translate symbolic hypotheses induced by several ML algorithms frequently used by the community into the PBM format.
Afterwards, using a set of rules in PBM format and a dataset, it calculates the contingency matrix of each rule. Other available functionalities include the calculation of rule quality measures. Using the functionalities of these libraries, we developed ELE (Ensemble Learning Environment) as a module integrated into DISCOVER. ELE consists of two sub-modules: the first, called ENSEMBLE TREE, creates the ensemble's component hypotheses and estimates the error rates of these components. The second, called ENSEMBLE METHODS, offers functionalities related to the combination methods and to the ensemble's classification error rates. The first task of ENSEMBLE TREE is constructing the ensemble's base-level classifiers given a dataset S. First, it extracts L samples from S, with or without replacement, obtaining samples S1, ..., SL (Figure 1). After this, it induces L base-level classifiers and estimates their mean error rates using k-fold cross validation. The sub-module ENSEMBLE METHODS implements the classification and combination methods previously described. Furthermore, together with ENSEMBLE TREE, it is also responsible for estimating the constructed ensemble's mean error rate using k-fold cross validation. It should be observed that L, the sampling method, the base-level classifiers and k are determined by the user, as well as the classification and combination methods to be executed by ELE. Details about the ELE implementation can be found in [7]. The next section shows results obtained by ELE using a real dataset.

1 http://labic.icmc.usp.br
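The error estimation performed by ENSEMBLE TREE can be sketched as a plain k-fold loop (an illustration, not the ELE code; `induce` stands for any base-level learner):

```python
from math import sqrt
from statistics import mean, stdev

def kfold_error(S, induce, k=10):
    """Estimate a classifier's mean error rate and the standard error of
    that mean using k-fold cross validation. S is a list of (x, y) pairs;
    induce(train) returns a hypothesis h with h(x) -> predicted label."""
    folds = [S[i::k] for i in range(k)]
    errs = []
    for i in range(k):
        # train on all folds except fold i, test on fold i
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        h = induce(train)
        errs.append(mean(h(x) != y for x, y in folds[i]))
    return mean(errs), stdev(errs) / sqrt(k)
```

The pair (mean error, standard error) returned here is exactly what the WMV and WMSV weights of Section 3 consume.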
5. Experiments and Results

To illustrate the ELE system, several experiments were conducted using a real-world dataset, Nursery, from UCI [8]. Table 2 shows the number of instances (# Inst.), the number of continuous and discrete features (# Features), the class distribution (Class %) and the majority error of this dataset, which has neither unknown values nor duplicate or conflicting examples.

Table 2. Nursery dataset characteristics summary. # Inst.: 12960
Base-level classifiers were induced using the CN2 and C4.5 symbolic learning algorithms [6]. Using the classifier classification option (Section 3), the three proposed methods to construct ensembles, UV, WMV and WMSV, were executed. Nine experiments were conducted, varying the number of samples, which were created without replacement, and varying the learning algorithm used on each sample to induce the base-level classifiers, as described in Table 3. For example, in the first experiment (Exp 1), 3 samples were created and used by CN2 to induce the base-level classifiers, while in Exp 4, 5 samples were created and CN2 was used on 3 samples while C4.5 was used on the other 2 samples.

Table 3. Experiment description: Exp 1 to Exp 9
Table 4 summarizes the results obtained. Initially, for each sample Si used in experiment Exp j, it shows the mean error rate and standard error of the base-level classifier induced using that sample. These errors were estimated using 10-fold cross validation. Next, the table shows the results obtained by each combination method, i.e., the mean error rate and standard error of the final ensemble, also estimated using 10-fold cross validation. It can be observed that in all experiments the mean error rate of the ensemble is smaller than that of any of its base-level classifiers. Moreover, these results (ensemble versus base-level classifiers) are all significant at the 95% confidence level. Taking into account the limited number of base-level classifiers used in the experiments (minimum 3 and maximum 7), this can be considered a very good result. However, none of the experiments shows a difference among the three combination methods. As expected, considering the sampling method used (without replacement) and the limited size of the dataset, the mean error rate of the base-level classifiers tends to increase with the number of classifiers. This is due to the fact that the Nursery dataset contains fewer than 13,000 examples, and using more samples (without replacement) implies fewer examples in each sample. This sort of problem will not be present when huge datasets are used. Furthermore, as can be observed in Table 2, there is a class distribution problem related to two Nursery class values (class recommend and class very_recom). As samples were created without replacement, and due to Nursery's limited size, it is possible that examples from these minority classes do not appear in all samples. Although ELE was originally designed for massive datasets, observe that it could also be used on limited-size datasets with class distribution or other problems, by using the whole dataset as a single sample and different learning algorithms to induce base-level classifiers from that sample.

Table 4. Summary of results using the Nursery dataset (samples S1 to S7 and combination methods UV, WMV, WMSV).
6. Conclusions and Future Work

In this work we describe the ELE system, which addresses the problem of predictive DM, whenever the result of learning can be expressed in the form of symbolic rules, by constructing ensembles of base-level classifiers that are able to explain their decisions to the user. Ensembles can be constructed using several methods to combine the decisions of individual base-level classifiers; currently, four combination methods have been implemented in ELE.
To illustrate the system, we conducted a series of experiments using a limited-size dataset. Although for this dataset the experimental results did not show differences among the combination methods tested, in all experiments the mean error rate of the constructed ensemble was smaller than that of any of its base-level classifiers, and these results were all significant at the 95% confidence level. As we used few base-level classifiers, this result is encouraging, since, due to our focus on ensemble explanation facilities, it is desirable not to use too many base-level classifiers. Furthermore, although our initial idea was related to massive datasets, it is also possible to use ELE on limited-size datasets, as discussed in the previous section. Regarding ensemble explanation, the current ELE implementation simply shows the user all the different individual classifiers' rules that correctly cover the example, i.e., fired rules from classifiers that participate in the final (combined) ensemble classification. We are currently improving the ELE explanation facility, aiming to reduce the set of explanatory rules shown to the user. Although the results of this work are encouraging, they are not conclusive. Thus, ongoing work on this research includes further experiments using massive as well as limited-size datasets from different domains. Furthermore, we also plan to investigate under which conditions, if any, varying the voting method could reduce the ensemble's classification error.

Acknowledgments

This research was supported by FAPESP, Brazil, under process no. 02/06914-0.

References

[1] P. Cabena, P. Hadjinian, R. Stadler, J. Verhees, and A. Zanasi. Discovering Data Mining: from Concept to Implementation. Prentice Hall, 1998.
[2] T. G. Dietterich. Ensemble methods in machine learning. In First International Workshop on Multiple Classifier Systems, LNCS 1857, pages 1–15, New York, 2000.
[3] N. Lavrac, P. Flach, and B. Zupan.
Rule evaluation measures: a unifying view. In Proc. 9th International Workshop on Inductive Logic Programming, LNAI 1634, pages 174–185, 1999.
[4] R. C. Prati, M. R. Geromini, and M. C. Monard. An integrated environment for data mining. In IV Congress of Logic Applied to Technology (LAPTEC 2003), volume 2, pages 55–62, Brazil, 2003.
[5] G. E. A. P. A. Batista and M. C. Monard. DLE – DISCOVER Learning Environment: implementation description (in Portuguese). Technical Report 187, ICMC/USP, 2003. ftp://ftp.icmc.usp.br/pub/BIBLIOTECA/rel_tec/RT_187.PDF.
[6] R. C. Prati, J. A. Baranauskas, and M. C. Monard. A unifying language proposal to represent hypotheses induced by symbolic machine learning algorithms (in Portuguese). Technical Report 137, ICMC/USP, 2001. ftp://ftp.icmc.sc.usp.br/pub/BIBLIOTECA/rel_tec/RT_137.ps.zip.
[7] F. C. Bernardini and M. C. Monard. ELE – Ensemble Learning Environment to construct symbolic classifiers: implementation description (in Portuguese). Technical Report 243, ICMC/USP, 2004. ftp://ftp.icmc.usp.br/pub/BIBLIOTECA/rel_tec/RT_243.pdf.
[8] C. Blake, E. Keogh, and C. J. Merz. UCI repository of machine learning databases, 1998.
Efficient Identification of Duplicate Bibliographical References

VINÍCIUS VELOSO DE MELO and ALNEU DE ANDRADE LOPES1
Instituto de Ciências Matemáticas e de Computação da Universidade de São Paulo
Abstract. In this work we present an approach to extract and structure bibliographical references from BibTeX files, allowing the identification of duplicates, which can appear in slightly different forms in different files. To deal with this problem, existing systems use classifiers, clustering or other algorithms, together with an edit distance metric, to distinguish between duplicate and non-duplicate records. The main challenge is to identify the duplicate records, in databases where the number of references can reach millions, in acceptable computational time. The proposed technique constructs a key (string) with information from each reference and stores the keys in a metric data structure called the Slim-Tree. The Slim-Tree structure minimizes the number of comparisons between references (close to O(n log n)) by considering only the keys most similar to a given one.

Keywords: metric trees, duplicate record detection, bibliographical references.
Introduction

In the last few years many techniques in natural language processing have been proposed, in particular for the task of information retrieval. Tools that help the search, selection, and extraction of specific and relevant information from scientific areas have become popular [2]. To achieve such tasks one has to cope with duplicated or similar information in the data. In scientific articles, for instance, an important source of information is the set of bibliographical references present in the text. However, even in a homogeneous corpus, where the articles come from the same source (publication), multiple representations of the same reference can appear in different articles. Therefore, to use information from these references one has to merge these duplicated references into a single, more complete record. The duplications may be due to different formats, typographical or OCR errors, abbreviations, and missing data, among other reasons. In this paper we deal with the problem of duplicate bibliographical references in a large database. Systems that handle this kind of problem, such as CiteSeer2, process a large number of scientific publications to extract, parse and identify (commonly using a distance function, a metric, to determine the similarity between two strings) their
1 Laboratory of Computational Intelligence at ICMC-USP
2 www.citeseer.com
V.V. de Melo and A. de Andrade Lopes / Efficient Identification
citation lists. However, as discussed in [3], CiteSeer uses an approximate record-level matching algorithm. This type of matching algorithm is used when one of the metadata sources represents entities with a single string. A field-level matching on title, year, and author is in general more accurate. To complete this task efficiently, we use a metric data structure (the distance function is a metric) to store and query the records. Because computing the distance function is time-consuming, we store only small string keys in the metric data structure. Each key identifies one record (bibliographical reference) stored in an array. By querying for similar keys we obtain a small set of full records, over which we actually compute the matching using the entire author and title fields. The remainder of this article is organized as follows: in Section 1 we present the problem of duplicate bibliographical references, the metric used to calculate the similarity between them, and some related work. In Section 2 we describe the approach as well as the data structure used for indexing and querying each reference. In Section 3 we show the results of the proposed technique and, finally, in Section 4 we conclude the article and discuss future work.
1. The Problem of Duplicate References

References to the same article can present significant format differences across articles (Figure 1). Thus, if one needs to use such information, each reference mentioned in an article must first be individually identified. The identification and indexing process, for instance knowing when the same reference appears in different articles, provides information such as the impact factor of a certain article or publication. CiteSeer, for instance, uses that information to make the search for scientific articles easier, but it is not perfect in the identification of similar references, as can be seen in Figure 2.

Figure 1: Different formats for the same reference.
[28] Stephen Muggleton and Wray Buntine. Machine invention of first-order predicates by inverting resolution. In Proc. of ICML 88, pages 339–352. Morgan Kaufmann, 1988.

Muggleton, S., and Buntine, W., 1988. "Machine Invention of First-Order Predicates by Inverting Resolution", In Morgan Kaufmann, editor, Proceedings of the 5th International Conference on Machine Learning, pp. 339–352.

Muggleton, S., & Buntine, W. (1988). Machine invention of first-order predicates by inverting resolution. In Proceedings of the Fifth International Conference on Machine Learning, pp. 339–352, Ann Arbor, MI.
The identification of similar references is achieved by comparing strings. Two references are considered duplicates when the similarity degree between them is above a given threshold. In the case of bibliographical references, the strings are commonly formed by the concatenation of the author and title fields [5; 8; 9]. The comparison is computed by an edit distance function d(source, destination). Examples of such algorithms are Levenshtein (L-edit) [7], Smith-Waterman [12], and Needleman-Wunsch [11], used in implementations of tools such as the Unix agrep. In the edit distance, the difference between two strings is simply the number of insertions, deletions, or substitutions required to transform one string into the other. For instance, the distance d("abcd", "abcde") = 1, because it requires the insertion of
V.V. de Melo and A. de Andrade Lopes / Efficient Identification
171
the character "e" at the end of the source string to transform it into the destination string. Similarly, d("abcd", "aabcd") = 1, because it requires the insertion of the character "a" at the beginning of the source string. For that reason, the edit distance is the distance function usually adopted in string-matching problems.

Figure 2: Use of the bibliographical references in CiteSeer.
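For illustration, the classic dynamic-programming edit distance reproduces both worked examples (a standard textbook implementation, not the one used by the cited tools):

```python
def edit_distance(a, b):
    """Levenshtein edit distance: minimum number of insertions, deletions,
    or substitutions needed to turn string a into string b (O(|a|*|b|))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (or match)
        prev = cur
    return prev[-1]
```

With this function, edit_distance("abcd", "abcde") and edit_distance("abcd", "aabcd") both return 1, matching the examples above.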
The problem of duplicate records is also important for cleaning up databases containing inaccurate or inconsistent data, such as addresses, customer records, and order records. Next, some work on bibliographical references is described. Hylton treats each record as a source record and queries the database to select the group of candidate records, belonging to the same author, that contain terms similar to the source record. He uses the merge/purge algorithm [4] and a similarity function based on character 3-grams of the author + title fields to form clusters of records. Monge & Elkan use an algorithm that carries out several sorting and search steps to maintain a trail of the clusters of duplicate records as they are found. They adopted the Smith-Waterman measure [12] to compare author + title, and the number of comparisons was reduced by 75% compared to Hylton's work. The approach used in this work, explained in the next section, compares a special key built from the author and title attributes to select the candidate references. However, to select the references we use a metric data structure, and the similarity measure employed is an implementation derived from GNU diff 2.7 [10], obtained from the String::Similarity package for Perl, developed by Marc Lehmann.
2. Our Approach

The approach described in this paper works with files in BibTeX format, and the algorithm is divided into two stages:
1. pre-processing: process all the BibTeX files in order to organize them for the next step, where the similar references are identified;
2. identification: identify all the similar references by comparing the strings formed by the concatenation of the author(s) and title fields, assigning the same ID to them.
The next subsections detail the extraction and identification processes.
2.1. Pre-processing

A Perl script does the pre-processing, extracting the author(s), title and year fields of each reference and creating a record containing: ID (code of the reference; initially each reference has a different ID), Author(s), Title, Year (for references that do not contain a year, the year 9999 is adopted) and Key. The key is a string of variable size, composed of the first character of each of the terms in author(s) and title, after removing some stopwords (the, from, goes, to, etc.), digits and delimiter symbols. The script sorts the characters by field and removes repeated ones to create the key. After that, the script stores the record in a table. The script also sorts the table in ascending order by year, key, author(s), and title. Some articles have references without a publication year. Such references are positioned at the end of the table so that they can be identified when some similar reference that has the year has already been identified. With this sorting, identical references written in different ways tend to be close to each other (Table 1).

Table 1: Similar references close to each other.

Year | Key        | Author(s)                | Title
1975 | BCEMRCFIRS | E. Rosch and C.B. Mervis | Family resemblance studies in the internal structure of categories.
1975 | BCEMRCFRS  | Rosch, E., Mervis, C. B  | Family resemblances: studies in the structure of categories.
1975 | DESBPS     | E. D. Sacerdoti          | A structure for plans and behavior.
1975 | EMRCFPRS   | Rosch, E., Mervis        | CB.: Family resemblances: studies in the structure of categories.
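The key construction can be sketched as follows; the exact stopword list used by the Perl script is not given in the paper, so the one below is an illustrative assumption (it reproduces the first two keys of Table 1):

```python
import re

STOPWORDS = {"the", "from", "goes", "to", "and", "in", "of", "a", "an"}  # assumed subset

def make_key(authors, title):
    """For each field, take the first letter of every non-stopword term
    (digits and delimiters are split away), sort the letters and drop
    repeats; the author part is concatenated with the title part."""
    def part(text):
        terms = re.split(r"[^A-Za-z]+", text)
        initials = {t[0].upper() for t in terms if t and t.lower() not in STOPWORDS}
        return "".join(sorted(initials))
    return part(authors) + part(title)
```

Because the letters are sorted and de-duplicated per field, small reorderings of author names or title words yield identical or near-identical keys, which is what makes the sorted table cluster duplicates.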
At the end of this stage, the script saves the table for the identification process.

2.2. Identification

The main cost of the identification stage is the evaluation of the similarity (or distance) between two strings using an edit distance algorithm. The distance is given by the number of insertions, deletions, or substitutions required to transform one string into another. Such algorithms are known to be O(n²), impairing computational performance. In our approach, the distance function measures the distance as a percentage. When the difference between two strings is below 25% they are considered to be the same. Based on tests, that was the value that presented the best results for the dataset used in our experiments. The concatenation of the author(s) and title fields, as used in [5; 8; 9], allows differences in one field to be compensated by the other. For instance, the difference between "Aamodt, A." and "Agnar Aamodt" is 45%. If we did not concatenate the fields, this example would fail.

2.2.1. Data Structures for Querying

The main point of our approach is the way the algorithm selects the most similar references, which reduces the number of comparisons. The selection is a search for strings whose distance to the source string is below a certain threshold. As
we use a metric distance function, it is natural to adopt a metric data structure, or Metric Access Method (MAM). A MAM builds an index structure by evaluating the distances between the objects inserted in the tree, supporting proximity or similarity queries in a natural way. The MAM chosen to evaluate our approach was the Slim-Tree. The Slim-Tree [13] is a dynamic tree that organizes groups of data based on a metric; see Figure 3. The Slim-Tree organizes the objects into a hierarchical structure using a representative as the center of a minimum bounding region containing the objects in a sub-tree. The Slim-Tree uses a bottom-up construction technique that keeps the tree balanced and enables new insertions after its construction. It has internal nodes (index nodes), which point to sub-trees, and leaf nodes, which store the objects. This way, the objects are stored in sub-trees according to their distance to the central object of that sub-tree, justifying the use of a MAM in the string-matching problem.

Figure 3: Structure of a Slim-Tree storing words with the L-edit distance function, where R is the index node coverage radius [14].
2.2.2. The References Identification Process

The table generated in the pre-processing stage is loaded and stored in two data structures:
1. a table that stores all the records (pre-processed references);
2. an auxiliary structure, used to search for similar records, in which one inserts only the non-duplicate keys (with a link to the corresponding record in the table), or keys that are identical but have different authors.
These criteria are adopted to reduce the number of keys used in the similarity search, consequently reducing the number of comparisons. The comparison between two strings is carried out only if they have the same year of publication. For that, an array of Slim-Trees is used as the auxiliary structure, where each tree stores the keys of a certain year. Hylton and Monge & Elkan compare references of different years, but only when both have the same authors. This last criterion further reduces the number of comparisons. However, we do not adopt it because there can be mistakes or spelling differences in the names, which would prevent two identical references from being compared. It is important to note that the distance calculation between two references considers the concatenated strings (author + title) of the references. The update of the ID of each reference, represented in Figure 4, can be obtained using two types of search:
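The year-partitioned auxiliary structure can be sketched with a dictionary of per-year indexes; here a linear scan stands in for the Slim-Tree similarity search, and the class and method names are illustrative:

```python
from collections import defaultdict

class YearIndex:
    """Stand-in for the array of Slim-Trees: keys are partitioned by
    publication year so that only same-year keys are ever compared.
    A real Slim-Tree would replace the linear scan in range_query."""
    def __init__(self, distance):
        self.distance = distance
        self.trees = defaultdict(list)   # year -> [(key, record_id), ...]

    def insert(self, year, key, record_id):
        self.trees[year].append((key, record_id))

    def range_query(self, year, key, radius):
        """All stored keys of the given year within radius of key."""
        return [(k, rid) for k, rid in self.trees[year]
                if self.distance(key, k) <= radius]
```

The partitioning means a query never touches keys from other years, which is exactly the comparison-pruning criterion described above.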
Figure 4: References identification process.

1. k-nearest neighbor (KNN) query: the algorithm searches for the k nearest neighbors in the Slim-Tree, where the input is the key of the source reference to be identified and its year of publication. The output is a list containing the candidate keys, i.e., the k keys most similar to the input key with the same year of publication. Each key is associated with a reference in the table of records (1). After that, the distance between the source reference and each returned reference (author + title) is computed (2). The algorithm updates the ID of the source reference with the ID of the most similar reference, chosen from the set of returned references within the distance threshold. If this set is empty or there is no similar reference, it creates a new ID (3).
2. range query: it returns a list containing all the keys at distance at most d from the input key with the same year of publication (1). Then, the algorithm computes the distance between the source reference and all the references corresponding to the returned keys (2). If one of the references found is sufficiently similar, the algorithm updates the ID of all references of the list within the distance threshold d. Otherwise, the source reference gets a new ID.
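The KNN variant of the identification loop can be sketched as follows. This is an illustration, not the authors' code: Python's difflib.SequenceMatcher stands in for the diff-derived String::Similarity measure, a sorted linear scan stands in for the Slim-Tree KNN search, and the record field names are assumptions:

```python
from difflib import SequenceMatcher

def identify(records, similarity, threshold=0.75, k=4):
    """For each record: fetch the k already-indexed candidates of the same
    year with the most similar keys (1), compare author + title against
    each candidate (2), and reuse the best matching ID above the
    threshold, or create a new one (3)."""
    indexed, next_id = [], 0
    for r in records:
        cands = sorted((c for c in indexed if c["year"] == r["year"]),
                       key=lambda c: -similarity(r["key"], c["key"]))[:k]
        scored = [(similarity(r["authors"] + r["title"],
                              c["authors"] + c["title"]), c) for c in cands]
        best = max(scored, key=lambda t: t[0], default=(0.0, None))
        if best[0] >= threshold:
            r["ref_id"] = best[1]["ref_id"]   # reuse the most similar ID
        else:
            next_id += 1                       # no match: new ID
            r["ref_id"] = next_id
        indexed.append(r)
    return records

sim = lambda a, b: SequenceMatcher(None, a, b).ratio()
```

Replacing the sorted scan with a true Slim-Tree query is what brings the candidate selection down from linear to roughly logarithmic per record.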
The next section presents the results of using the MAM, comparing the two search methods.
3. Experimental Results

We applied our approach to a dataset similar to those used in [5; 9], taken from A Collection of Computer Science Bibliographies [1]. Due to the different sources, the records may be duplicated and probably have typographical errors and different types of abbreviations and information. After parsing the 340 BibTeX files, the system creates 320,377 records. Each record has many fields, such as author, title, publisher, year, edition, volume, number, address, topic, and pages, but our approach considers only author and title for the comparison, and year for partitioning the reference set. Before running the experiments, we have to define a threshold that determines when two strings represent the same record. Some authors set the value between 60% and 80%, depending on the distance function. In our case, the value was set to 75% similarity. Two other values must be set: k, the number of most similar references, and the radius for the range query. After some tests, we set k to 4 and the radius to 60% similarity. For a fair comparison, we configure the KNN approach to ignore keys below 60% similarity. Note that the higher the k and radius values, the more keys will be returned by the query, resulting in a larger number of comparisons and, possibly, better accuracy in the identification process.
Figure 5 presents the results of the experiments. According to the graphics, one can see that the KNN curve is lower than O(n log n) on this dataset. One important point is that references without a year have to be compared against the references of the other years until a similar one is found. This increases the number of comparisons. In our dataset, there are no references without a year.

Figure 5: Results of the experiments.
Table 2 presents the total number of references versus the number of different references found in the different experiments. The number of references identified by the KNN approach is lower than the number identified by the range query. The real number of different records in the dataset is unknown, so we cannot say which result is best. However, based on visual inspection, one can say that the great majority of the duplicate records were correctly identified.

Table 2: Different references found.

References | KNN     | Range
320,000    | 267,112 | 266,390
Using a dataset with approximately 254,000 references, Hylton reports making about 7.5 million comparisons. Monge & Elkan report 1.6 million. With 320,000 references, our approach carried out 533,430 comparisons using KNN with k = 4. The three datasets are not the same, but in our approach the number of comparisons needed to identify all references of a dataset, in the worst case for the KNN classification, is Comp = (k * n) + (k * m * y), where n is the number of references that have a year, m is the number of references that do not, and y is the number of different years. Note that here we only count reference comparisons. As the value of k is commonly low (we use k = 4), the number of comparisons is nearly linear in n. However, the insertion of references without a year can increase the number of comparisons.
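As a quick check of the worst-case bound under the settings reported here (k = 4, all 320,000 references dated):

```python
def worst_case_comparisons(k, n, m, y):
    """Comp = (k * n) + (k * m * y): k comparisons per dated reference, plus
    k per undated reference against the keys of each of the y distinct years."""
    return k * n + k * m * y

# k = 4 and no undated references gives an upper bound of 1,280,000;
# the 533,430 comparisons actually carried out fall well below it.
bound = worst_case_comparisons(4, 320_000, 0, 0)
```

The second term shows why undated references are costly: each one may be probed against every year partition.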
4. Conclusions and Future Work

In a previous work [8], we discussed a technique focused on the accuracy of the duplicated reference identification process. We showed that the use of a key structured from the author and title fields, together with the metric distance as used in this work, leads to 97% accuracy in the indexing process. Here, our aim
was to improve the efficiency of that previous approach, enhancing the identification process so as to handle large datasets. The duplicate detection approach described in this paper shows that a metric data structure can be a good choice for this task and related problems. The KNN query clearly outperforms the range query. But if the dataset has a large number of duplicate records, the range query approach, which updates all the similar ones at a time, can be a better choice. The Slim-Tree structure minimizes the number of comparisons between references (close to O(n log n)) by considering only the keys most similar to a given one. However, one problem that remains open is how to minimize the comparisons when the references do not have the year field.
References
[1] Achilles, A.C. (1996). A collection of computer science bibliographies. http://liinwww.ira.uka.de/bibliography/index.html.
[2] Bollacker, K., Lawrence, S., and Giles, C.L. (1998). CiteSeer: An autonomous web agent for automatic retrieval and identification of interesting publications. In Sycara, K.P. and Wooldridge, M., editors, Proceedings of the Second International Conference on Autonomous Agents, pages 116–123, New York. ACM Press.
[3] Borkar, V., Deshmukh, K., and Sarawagi, S. (2001). Automatic segmentation of text into structured records. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 175–186, Santa Barbara, California.
[4] Hernandez, M.A. (1996). A Generalization of Band Joins and The Merge/Purge Problem. PhD thesis, Columbia University.
[5] Hylton, J.A. (1996). Identifying and merging related bibliographic records. Master's thesis MIT/LCS/TR-678, MIT.
[6] Lawrence, S., Giles, C.L., and Bollacker, K. (1999). Digital libraries and autonomous citation indexing. IEEE Computer, 32(6):67–71.
[7] Levenshtein, V.I. (1966). Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10(8):707–710.
[8] Melo, V., Secato, M., and Lopes, A.A. (2003). Extração e identificação automáticas de informações bibliográficas de artigos científicos. IV Workshop on Advances and Trends in AI for Problem Solving, pages 1–7.
[9] Monge, A.E. and Elkan, C. (1997). An efficient domain-independent algorithm for detecting approximately duplicate database records. In Research Issues on Data Mining and Knowledge Discovery.
[10] Myers, E. (1986). An O(ND) difference algorithm and its variations. Algorithmica, 1:251–256.
[11] Needleman, S.B. and Wunsch, C.D. (1970). A general method applicable to the search for similarities in the amino acid sequences of two proteins. Journal of Molecular Biology, 48:443–453.
[12] Smith, T.F. and Waterman, M.S. (1981). Identification of common molecular subsequences. Journal of Molecular Biology, 147:195–197.
[13] Traina, A.J.M. (2001). Suporte à visualização de consultas por similaridade em imagens médicas através de estrutura de indexação métrica. Tese de livre docência, ICMC-USP.
[14] Traina, C., Traina, A.J.M., Seeger, B., and Faloutsos, C. (2000). Slim-trees: High performance metric trees minimizing overlap between nodes. In VII International Conference on Extending Database Technology (EDBT), pages 51–65, Konstanz, Germany.
Autoepistemic Theory and Paraconsistent Logic Program
Kazumi Nakamatsu a and Atsuyuki Suzuki b
a University of Hyogo, HIMEJI 670-0092, JAPAN, [email protected]
b Shizuoka University, HAMAMATSU 432-8011, JAPAN, [email protected]
Abstract. We clarify the relation between autoepistemic theories and a paraconsistent logic program called Vector Annotated Logic Program with Strong Negation (VALPSN for short) proposed by K. Nakamatsu et al. We review the stable class semantics for VALPSN and propose a translation from Moore's autoepistemic theories into VALPSN. Based on the translation, we prove that there is a one-to-one correspondence between stable classes of VALPSN and iterative expansion classes of autoepistemic theories.
Keywords. paraconsistent logic program, non-monotonic reasoning, autoepistemic theory, stable class, expansion class
1. Introduction

Various kinds of non-monotonic reasoning, e.g. default, autoepistemic, and defeasible reasoning, are utilized in the field of artificial intelligence, and a single system may use more than one of them, such as default and temporal reasoning. Recently, the treatment of contradiction has also become important. However, it is difficult to deal with these kinds of non-monotonic reasoning uniformly, since they have different semantics. Thus, we represent the semantics for such non-monotonic reasoning in tractable paraconsistent logic programs that can deal with contradiction easily. We have already proposed a paraconsistent logic program called ALPSN (Annotated Logic Program with Strong Negation) [8] in order to deal with default theory. We have also proposed an extended version of ALPSN called VALPSN (Vector Annotated Logic Program with Strong Negation) [9,10] in order to deal with other kinds of non-monotonic reasoning such as defeasible or plausible reasoning. The purpose of this paper is to represent the semantics for Moore's autoepistemic theory [7] in VALPSN and to clarify the relation between autoepistemic theory and VALPSN. We can then treat autoepistemic theory on a common platform based on paraconsistent logic programs, as we do other kinds of non-monotonic reasoning. In this paper, we first introduce VALPSN, which has already been proposed in [9,10], and the stable class semantics [2]. Additionally, we propose a translation from Moore's autoepistemic theory into VALPSN and prove that there is a one-to-one correspondence between the iterative expansion classes of autoepistemic theory and the stable classes of VALPSN.

K. Nakamatsu and A. Suzuki / Autoepistemic Theory and Paraconsistent Logic Program

Figure 1. Autoepistemic theories and VALPSN: on the syntactic side, the translation tr maps an autoepistemic theory AE to the VALPSN tr(AE); on the semantic side, the expansion classes of AE and the stable classes of tr(AE) are in one-to-one correspondence.

2. VALPSN

In this section, we recapitulate VALPSN and propose the stable class semantics for VALPSN. Generally, a truth value called an annotation is explicitly attached to each literal in annotated logic programs. For example, let p be a literal and µ an annotation; then p : µ is called an annotated literal. The set of annotations constitutes a complete lattice. An annotation in VALPSN is a 2-dimensional vector called a vector annotation such that each component is a non-negative integer, and the complete lattice Tv of vector annotations is defined as:

Tv = { (x, y) | 0 ≤ x, y ≤ m, x and y are non-negative integers }.

The ordering of the lattice Tv is denoted by the symbol ⪯ and defined as follows: let v1 = (x1, y1) and v2 = (x2, y2); then v1 ⪯ v2 iff x1 ≤ x2 and y1 ≤ y2. For a vector annotated literal p : (i, j), the first component i of the vector annotation denotes the amount of positive information to support the literal p, and the second one, j, denotes that of negative information. We assume m = 1 throughout this paper. For example, a vector annotated literal p : (1, 0) can be informally interpreted as saying that p is true with strength 1 and false with strength 0, which means that the literal p is known to be true. There are two kinds of negation, epistemic negation, ¬, and strong negation, ∼. The epistemic negation is defined as a mapping over Tv.

Definition 1 (Epistemic Negation of VALPSN, ¬) ¬(p : (i, j)) = p : ¬(i, j) = p : (j, i).

An epistemic negation preceding a vector annotated literal can be eliminated by the above syntactic operation. The strong negation (∼) in VALPSN can be defined by the epistemic negation as follows and is interpreted as classical negation.

Definition 2 (Strong Negation) Let F be an arbitrary formula.
∼ F =def F → ((F → F ) ∧ ¬(F → F )).
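The vector-annotation lattice with m = 1 is small enough to enumerate. A minimal sketch (the function names are our own, not the paper's) of Tv, its ordering, its least upper bound, and the epistemic negation of Definition 1:

```python
from itertools import product

M = 1  # the paper fixes the lattice bound m at 1

# Tv = {(x, y) | 0 <= x, y <= M}: x measures positive, y negative evidence
Tv = list(product(range(M + 1), repeat=2))

def leq(v1, v2):
    """Lattice ordering: (x1, y1) precedes (x2, y2) iff x1 <= x2 and y1 <= y2."""
    return v1[0] <= v2[0] and v1[1] <= v2[1]

def lub(v1, v2):
    """Least upper bound, taken componentwise."""
    return (max(v1[0], v2[0]), max(v1[1], v2[1]))

def epistemic_neg(v):
    """Definition 1: epistemic negation swaps positive and negative evidence."""
    return (v[1], v[0])
```

With M = 1 this yields exactly the four annotations (0, 0), (1, 0), (0, 1), (1, 1) discussed below.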
The reasons motivating our choice of VALPSN for representing the semantics of autoepistemic theory are as follows: the modal operator K used in autoepistemic theory is given an epistemic interpretation. Intuitively, Kα is to be interpreted as "α is known to be true". On the other hand, a vector annotated literal A : µ can represent explicitly a truth value of A by its annotation µ
based on the complete lattice structure of truth values Tv = {(0, 0), (1, 0), (0, 1), (1, 1)}. Then we have the following intuitive interpretations: A : (1, 0), A is known to be true; A : (0, 1), A is known to be false; A : (1, 1), A is known to be both true and false (inconsistent); A : (0, 0), A is known to be neither true nor false (unknown). Therefore, atomic formulas of both autoepistemic theory and VALPSN can be interpreted epistemically. Moreover, non-monotonicity in autoepistemic theory can be represented by the strong negation in VALPSN, as in the case of default theory.

Definition 3 (Well Vector Annotated Literal) Let p be a literal. p : (i, 0) and p : (0, j) are called well vector annotated literals, where i and j are 1 in this paper.

Definition 4 (VALPSN) If L0, · · ·, Ln are well vector annotated literals,
L1 ∧ · · · ∧ Li ∧ ∼ Li+1 ∧ · · · ∧ ∼ Ln → L0
is called a vector annotated logic program clause with strong negation (VALPSN clause). A Vector Annotated Logic Program with Strong Negation (VALPSN) is a finite set of VALPSN clauses.

All interpretations of a VALPSN P under consideration have the Herbrand base BP as their domain. A Herbrand interpretation I of P is considered to be a mapping I : BP → Tv. Usually, I is denoted by the set {p : µ | I |= (p : µ1) ∧ · · · ∧ (p : µn)}, where µ is the least upper bound of {µ1, . . . , µn}. The ordering ⪯ over Tv is extended to interpretations, and the notion of satisfaction is defined.

Definition 5 Let I1 and I2 be any interpretations of a VALPSN P, and A be a literal.
I1 ⪯ I2 =def (∀A ∈ BP)(I1(A) ⪯ I2(A)).

Definition 6 (Satisfaction) An interpretation I is said to satisfy
the formula F iff it satisfies every closed instance of F;
the variable-free atom A : µ iff µ ⪯ I(A);
the variable-free annotated literal ¬A : µ iff ¬µ ⪯ I(A);
the formula ∼ F iff I does not satisfy F.
The satisfaction of the other formulas, F1 ∧ F2, F1 ∨ F2, F1 → F2, ∀xF, and ∃xF, is defined as in classical logic. Satisfaction is denoted by the symbol |=. Associated with every VALPSN P is a function TP mapping Herbrand interpretations to Herbrand interpretations.

Definition 7 Let A be an atom, P a VALP, and I an interpretation of P.
TP(I)(A) =def ⊔{ µ | B1 ∧ · · · ∧ Bm → A : µ is a VALP clause in P and I |= B1 ∧ · · · ∧ Bm }.

The least upper bound always exists, since Tv is a complete lattice under ⪯. Here we define a special interpretation ∆ to be the interpretation that assigns the truth value (0, 0) to all members of the Herbrand base BP.

Definition 8 The upward iteration is defined as:
TP ↑ 0 = ∆,
TP ↑ λ = ⊔α<λ TP(TP ↑ α)
for all ordinals α and λ. We introduce well-known results about VALPs without strong negation and the operator TP.

Proposition 1 If P is a VALP, then
(1) TP is a monotonic function;
(2) P has a least model that is identical to the least fixed point of TP;
(3) TP ↑ ω is identical to the least fixed point of TP.
The proof of the above proposition can be found in [3].
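For a finite ground VALP the upward iteration of Definition 8 terminates, so the least model of Proposition 1 can be computed directly. A sketch under our own encoding (clauses as (body, (head, annotation)) pairs; interpretations as dicts defaulting to (0, 0)):

```python
def lub(v1, v2):
    # componentwise least upper bound on the vector-annotation lattice Tv
    return (max(v1[0], v2[0]), max(v1[1], v2[1]))

def satisfies(interp, atom, ann):
    # I |= p : mu  iff  mu precedes I(p) in the lattice ordering
    x, y = interp.get(atom, (0, 0))
    return ann[0] <= x and ann[1] <= y

def tp(program, interp):
    """One application of TP (Definition 7): join the head annotation of
    every clause whose body is satisfied by the current interpretation."""
    out = {}
    for body, (head, ann) in program:
        if all(satisfies(interp, a, m) for a, m in body):
            out[head] = lub(out.get(head, (0, 0)), ann)
    return out

def least_model(program):
    """Upward iteration from Delta (every atom at (0, 0)); by Proposition 1
    this reaches the least fixed point of TP for a finite ground VALP."""
    interp = {}  # Delta
    while True:
        nxt = tp(program, interp)
        if nxt == interp:
            return interp
        interp = nxt
```

For example, the program { → q : (1, 0), q : (1, 0) → p : (1, 0) } reaches its least model in two iterations.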
3. Stable Class Semantics for VALPSN

In this section, we extend the stable class semantics proposed in [1] for ordinary logic programs with negation to VALPSN and show that no VALPSN has an empty stable class. First of all, we review the stable model introduced in [5] for a VALPSN P.

Definition 9 (Gelfond-Lifschitz Transformation) Let I be any interpretation. P^I, the Gelfond-Lifschitz transformation of the VALPSN P with respect to I, is obtained from P by deleting:
(1) each VALPSN clause that has a literal ∼(C : µ) in its body with I |= (C : µ), and
(2) all strongly negated literals in the bodies of the remaining VALPSN clauses.
Since P^I has neither the epistemic negation nor the strong negation, it has a unique least model, which is given by TP^I ↑ ω [3,5].

Definition 10 (Stable Model) If I is a Herbrand interpretation of a VALPSN P, then I is called a stable model of P iff I = TP^I ↑ ω.
Definition 11 (Stable Classes) Let R be a set of indices, S = {Ii | i ∈ R} a set of interpretations, P^Ii the Gelfond-Lifschitz transformation of P with respect to Ii, and M(P^Ii) the unique least Herbrand model of P^Ii. S is a stable class of P iff S = {M(P^Ii) | i ∈ R}.
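The Gelfond-Lifschitz transformation of Definition 9 is purely syntactic and easy to sketch. Under our own encoding (clauses as (positive_body, negated_body, head) triples with vector annotations), a hedged illustration:

```python
def satisfies(interp, atom, ann):
    # I |= C : mu  iff  mu precedes I(C) componentwise
    x, y = interp.get(atom, (0, 0))
    return ann[0] <= x and ann[1] <= y

def gl_transform(program, interp):
    """Definition 9: delete every clause that has some ~(C : mu) in its body
    with I |= C : mu, and strip the strongly negated literals from the
    clauses that remain; the result is a VALP without strong negation."""
    reduct = []
    for pos, neg, head in program:
        if any(satisfies(interp, a, m) for a, m in neg):
            continue  # clause deleted
        reduct.append((pos, [], head))  # strong negations removed
    return reduct
```

A stable-model check then amounts to comparing I with the least model of gl_transform(program, I), computed by the TP iteration shown earlier.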
Suppose that FP is an operator mapping interpretations to interpretations such that FP(I) = M(P^I). The concept of stable models is then defined in terms of the fixed points of FP (i.e., I is a stable model of P iff FP(I) = M(P^I) = I). However, this operator may not always have a fixed point, since FP is anti-monotonic. The basic idea behind the stable class is that even though FP may have no fixed point, there might be fixed points of F²P. Let us define the meaning of a VALPSN P based on its stable classes.

Definition 12 Let {I1, . . . , In} be a stable class, where each Ii is an interpretation, and let p : µi ∈ Ii (i = 1, . . . , n). The literal p has the truth value µ in the stable class iff µ = ⊔{µi | i = 1, . . . , n}. Let {S1, . . . , Sm} be stable classes and let the literal p have the truth value µj in each stable class Sj (j = 1, . . . , m). The literal p has the truth value µ in the stable classes iff µ = ⊓{µj | j = 1, . . . , m}, where ⊓ is the greatest lower bound operator.

Now we show that every VALPSN has a non-empty stable class. The following theorem can be obtained by extending Theorem 1 in [1] to VALPSN.

Theorem 1 Every VALPSN has a non-empty stable class.

Proof Suppose FP(I) = M(P^I). Due to the property of the Gelfond-Lifschitz transformation, if I1 ⪯ I2 then P^I2 ⊆ P^I1, and since both P^I1 and P^I2 are VALPs containing no strong negation, M(P^I2) ⪯ M(P^I1). Therefore FP is anti-monotonic and F²P is monotonic. Let ˆl be the least fixed point of F²P and ĝ the greatest fixed point of F²P. From [6], ˆl = ⊓{x | F²P(x) ⪯ x} and ĝ = ⊔{x | x ⪯ F²P(x)}. Let L = {x | F²P(x) ⪯ x} and G = {x | x ⪯ F²P(x)}. Since ˆl is a fixed point of F²P, F²P(FP(ˆl)) = FP(F²P(ˆl)) = FP(ˆl); hence FP(ˆl) ∈ G. Since ĝ is an upper bound of G, FP(ˆl) ⪯ ĝ (eq1). By the anti-monotonicity of FP applied to (eq1), FP(ĝ) ⪯ F²P(ˆl) = ˆl (eq2). Similarly, F²P(FP(ĝ)) = FP(F²P(ĝ)) = FP(ĝ); hence FP(ĝ) ∈ L. Since ˆl is a lower bound of L, ˆl ⪯ FP(ĝ) (eq3). By the anti-monotonicity of FP applied to (eq3), ĝ = F²P(ĝ) ⪯ FP(ˆl) (eq4).

From (eq2) and (eq3), we have ˆl = FP(ĝ), and from (eq1) and (eq4), we have ĝ = FP(ˆl). Hence {ˆl, ĝ} is a non-empty stable class. Q.E.D.
4. Vector Annotated Semantics for Autoepistemic Theory

In this section, we define the translation tr from autoepistemic theories into VALPSN; it also maps the iterative expansion classes of an autoepistemic theory to the stable classes of the corresponding VALPSN. We thus obtain the relation in Figure 1 between autoepistemic theory and VALPSN. We review autoepistemic theory briefly. Autoepistemic logic is a non-monotonic logic proposed by Moore [7], and its meaning is characterized by expansions, which are sets of modal formulas. The modal operator of autoepistemic theory is given an epistemic interpretation. The autoepistemic language LK is obtained by extending a first-order language L with a modal operator K: if φ is a formula in L, Kφ is a formula in LK. Intuitively, Kφ is to be read as "φ is known to be true". Moreover, it is known that every autoepistemic theory can be represented equivalently in a normal form in which all sentences have the form

Kα1 ∧ · · · ∧ Kαm ∧ ¬Kβ1 ∧ · · · ∧ ¬Kβn → W,

where α1, . . . , αm, β1, . . . , βn and W do not contain the modal operator K. However, we restrict these formulas to literals in order to translate the normal forms into VALPSN clauses. We introduce the iterative expansion classes for autoepistemic theory.

Definition 13 (Iterative Expansion) Suppose that S and E are sets of formulas of LK. Let

B(S) = Cn(S ∪ KS),
B_E^0(W) = Cn(W ∪ ¬KĒ),
B_E^{n+1}(W) = B(B_E^n(W)),
B_E(W) = ∪_{0≤n≤ω} B_E^n(W),

where Ē denotes the complement of E.
Then E is an iterative expansion of an autoepistemic theory AE iff E = B_E(AE), where Cn is a Tarskian consequence operator. Given an autoepistemic theory AE, we associate with it an operator G_A mapping sets of formulas in LK to sets of formulas in LK as follows: G_A(E) = B_E(AE).

Definition 14 (Iterative Expansion Classes) A finite non-empty set E of sets of formulas in LK is said to be an iterative expansion class for an autoepistemic theory AE iff E = {G_A(Ei) | Ei ∈ E}.

4.1. Autoepistemic Theory into VALPSN

Here, the translation tr from autoepistemic theories to VALPSN is defined.

Definition 15 (Translation tr) For any restricted standard form F of an autoepistemic theory AE such that
F = Kα1 ∧ · · · ∧ Kαm ∧ ¬Kβ1 ∧ · · · ∧ ¬Kβn → w,
tr(F) = α1 : (1, 0) ∧ · · · ∧ αm : (1, 0) ∧ ∼(β1 : (0, 1)) ∧ · · · ∧ ∼(βn : (0, 1)) → w : (1, 0),
tr(AE) = {tr(F) | F ∈ AE}.

We show the correspondence between the stable classes of VALPSN and the iterative expansion classes of autoepistemic theory based on the translation tr.

Definition 16 Let I be an interpretation of a VALPSN and p any ground literal. Then Cna(I) = {p | p : (1, 0) ∈ I}.

Theorem 2 Let AE be an autoepistemic theory. S = {Si | i ∈ R} is a stable class of the VALPSN tr(AE) iff E = {Cna(Si) | Si ∈ S} is an iterative expansion class of the autoepistemic theory AE.

We prove a lemma before proving Theorem 2.

Lemma 1 Let A be any ground literal in the set W of consequents of the autoepistemic standard formulas. Then, for n = 0, 1, 2, . . . , A and ¬KA ∈ B_E^n(W) iff TP^I ↑ (n + 1) |= A : (1, 0) and ∼(A : (0, 1)), where P = tr(AE), I is an interpretation of P, and E = Cna(I).

Proof By induction on the integer n such that A ∈ B_E^n(W). We define the interpretation TP^I ↑ 1 to be the interpretation that assigns the value (0, 0) to all members of BP \ {A} and the value (1, 0) to the literal A, and let TP^I ↑ i = TP^I(TP^I ↑ (i − 1)) for any integer i ≥ 2.
Basis: n = 0. In this case, by the definition of B_E^0,
A and ¬KA ∈ B_E^0(W) iff A ∈ Cn(W ∪ ¬KĒ).
Assume A ∈ Cn(W ∪ ¬KĒ); we derive TP^I ↑ 1 |= A : (1, 0). By the definition of TP^I ↑ 1, it satisfies the VALP P^I and the vector annotated literal A : (1, 0). Hence TP^I ↑ 1 |= A : (1, 0) and ∼(A : (0, 1)). Conversely, suppose TP^I ↑ 1 |= A : (1, 0) and ∼(A : (0, 1)). By the definition of TP^I, we can assume that there is a literal A in W. Thus we have A ∈ Cn(W ∪ ¬KĒ).
Induction Hypothesis: there is an integer n ≥ 0 such that, for any ground literal A, A and ¬KA ∈ B_E^n(W) iff TP^I ↑ (n + 1) |= A : (1, 0) and ∼(A : (0, 1)).
Induction Step: we prove that A and ¬KA ∈ B_E^{n+1}(W) iff TP^I ↑ (n + 2) |= A : (1, 0) and ∼(A : (0, 1)). In this case, A ∈ {w | Kα1 ∧ · · · ∧ Kαm ∧ ¬Kβ1 ∧ · · · ∧ ¬Kβn → w ∈ AE}. There is a VALPSN clause
α1 : (1, 0) ∧ · · · ∧ αm : (1, 0) ∧ ∼(β1 : (0, 1)) ∧ · · · ∧ ∼(βn : (0, 1)) → w : (1, 0)
in the VALPSN P and, since Kα1 ∧ · · · ∧ Kαm and ¬Kβ1 ∧ · · · ∧ ¬Kβn ∈ B_E^n(W), by the induction hypothesis,
TP^I ↑ (n + 1) |= α1 : (1, 0) ∧ · · · ∧ αm : (1, 0) ∧ ∼(β1 : (0, 1)) ∧ · · · ∧ ∼(βn : (0, 1)).   (1)
By the definition of TP^I and (1), we have
TP^I ↑ (n + 2) |= A : (1, 0) and ∼(A : (0, 1)).   (2)
Conversely, suppose that TP^I ↑ (n + 2) |= A : (1, 0) and ∼(A : (0, 1)). By the definition of TP^I, there is a VALPSN clause
α1 : (1, 0) ∧ · · · ∧ αm : (1, 0) ∧ ∼(β1 : (0, 1)) ∧ · · · ∧ ∼(βn : (0, 1)) → w : (1, 0)
in P such that TP^I ↑ (n + 1) satisfies its body. Then there is an autoepistemic formula Kα1 ∧ · · · ∧ Kαm ∧ ¬Kβ1 ∧ · · · ∧ ¬Kβn → A in AE and, by the induction hypothesis, Kα1 ∧ · · · ∧ Kαm ∧ ¬Kβ1 ∧ · · · ∧ ¬Kβn ∈ B_E^n(W). Hence A ∈ B_E^{n+1}(W) and ¬KA ∈ B_E^{n+1}(W). Therefore we have the conclusion: for any integer n ≥ 0, A and ¬KA ∈ B_E^n(W) iff TP^I ↑ (n + 1) |= A : (1, 0) and ∼(A : (0, 1)). Q.E.D.
By Lemma 1 we can easily prove Theorem 2.
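The translation tr of Definition 15 and the extraction Cna of Definition 16 are simple enough to sketch directly. Under our own encoding (a normal-form sentence as a triple of alpha-literals, beta-literals, and the consequent w; all names hypothetical):

```python
def tr_formula(alphas, betas, w):
    """Definition 15: translate K a1 & ... & K am & ~K b1 & ... & ~K bn -> w
    into a VALPSN clause -- K-prefixed body literals get annotation (1, 0),
    strongly negated ones get (0, 1), and the head w gets (1, 0)."""
    pos = [(a, (1, 0)) for a in alphas]
    neg = [(b, (0, 1)) for b in betas]
    return pos, neg, (w, (1, 0))

def tr(theory):
    """tr(AE) = {tr(F) | F in AE} for a theory given as (alphas, betas, w) triples."""
    return [tr_formula(a, b, w) for a, b, w in theory]

def cna(interp):
    """Definition 16: the ground literals known true in an interpretation."""
    return {p for p, ann in interp.items() if ann == (1, 0)}
```

Composing cna with the stable-class machinery of Section 3 yields, per Theorem 2, the iterative expansion classes of the source theory.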
5. Conclusion

In this paper, we have introduced VALPSN and its stable class semantics. Moreover, we have introduced the iterative expansion class semantics for autoepistemic theory. Based on the translation that we proposed from autoepistemic theory into VALPSN, we have shown that there is a one-to-one correspondence between the iterative expansion classes of autoepistemic theory and the stable classes of VALPSN.
References
[1] Baral, C.R. and Subrahmanian, V.S., Stable and Extension Class Theory for Logic Programs and Default Logics, J. Automated Reasoning, 8 (1992), 366-385.
[2] Baral, C.R. and Subrahmanian, V.S., Dualities Between Alternative Semantics for Logic Programming and Nonmonotonic Reasoning, J. Automated Reasoning, 10 (1993), 399-420.
[3] Blair, H.A. and Subrahmanian, V.S., Paraconsistent Logic Programming, Theoretical Computer Science, 68 (1989), 135-154.
[4] da Costa, N.C.A., Subrahmanian, V.S., and Vago, C., The Paraconsistent Logics PT, Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, 37 (1989), 139-148.
[5] Gelfond, M. and Lifschitz, V., The Stable Model Semantics for Logic Programming, Proc. 5th Int'l Conf. and Symp. on Logic Programming, (1989), 1070-1080.
[6] Lloyd, J.W., Foundations of Logic Programming (2nd edition), Springer-Verlag, 1987.
[7] Moore, R., Semantical Considerations on Non-monotonic Logic, Artificial Intelligence, 25 (1985), 75-94.
[8] Nakamatsu, K. and Suzuki, A., Annotated Semantics for Default Reasoning, Proc. 3rd Pacific Rim Int'l Conf. on Artificial Intelligence, Academic Press, (1994), 180-186.
[9] Nakamatsu, K., Abe, J.M., and Suzuki, A., Defeasible Reasoning Between Conflicting Agents Based on VALPSN, Proc. AAAI Workshop on Agents' Conflicts, AAAI Press, (1999), 20-27.
[10] Nakamatsu, K., On the Relation Between Vector Annotated Logic Programs and Defeasible Theories, Logic and Logical Philosophy, 8 (2001), 181-205.
Annotated Semantics for Nonmonotonic Reasonings in Artificial Intelligence I
Kazumi Nakamatsu a and Atsuyuki Suzuki b
a University of Hyogo, HIMEJI 670-0092, JAPAN, [email protected]
b Shizuoka University, HAMAMATSU 432-8011, JAPAN, [email protected]
Abstract. In this paper, we introduce an annotated logic program with strong negation (ALPSN) and its stable model semantics and stable class semantics. We then provide, in the subsequent papers, four declarative annotated semantics based on ALPSN for nonmonotonic reasonings: default theory, nonmonotonic ATMS, the Negation as Failure rule, and the Closed World Assumption.
Keywords. annotated logic program, default theory, nonmonotonic reasoning, nonmonotonic ATMS, Negation as Failure, Closed World Assumption
1. Introduction

Various nonmonotonic reasonings (default, autoepistemic, circumscription, etc.) are utilized in AI, and the treatment of inconsistency is becoming more and more important in some specialized fields. For example, default reasoning is used for belief revision in knowledge representation systems, and the treatment of contradictions between agents in a multi-agent system is crucial. More intelligent systems would require more than two kinds of nonmonotonic reasoning, as well as the capability to deal with inconsistency such as the NOGOOD in a nonmonotonic ATMS. However, it is difficult to deal with these nonmonotonic reasonings and inconsistency uniformly, since they have different semantics. Thus, we try to represent the semantics for such nonmonotonic reasonings and inconsistency uniformly by utilizing annotated logic. In this paper, we first review annotated logic programming and introduce an annotated logic program with strong negation and its stable model and stable class semantics. We then address four nonmonotonic reasonings: Reiter's default theory [7], Dressler's nonmonotonic ATMS [8], Clark's Negation as Failure (NF) [10], and Reiter's Closed World Assumption (CWA) [11]. Our method of representing the semantics for these nonmonotonic reasonings is based on translations from the nonmonotonic logic formulas into annotated formulas. The semantics for the default theory and the nonmonotonic ATMS are represented by an annotated
K. Nakamatsu and A. Suzuki / Annotated Semantics for Nonmonotonic Reasonings in AI I
Figure 1. Lattice-4 of truth values: ⊤ (inconsistent) at the top, t (true) and f (false) incomparable in the middle, ⊥ (unknown) at the bottom.
logic program with strong negation (ALPSN). The semantics for NF and CWA are represented by annotated predicate formulas. Generally, annotated logic has a complete-lattice-structured set of truth values (annotations). The annotated logic which we use here is a four-valued logic whose lattice structure is called Lattice-4.
2. ALPSN

In this section, we recapitulate the syntax and the semantics of an annotated logic program (ALP). We propose an annotated logic program with strong negation (ALPSN) and its stable class semantics.

2.1. Syntax of ALPSN

We assume the reader is familiar with the usual syntactic notions of atomic formulas, terms, literals and the other concepts of ordinary logic and logic programming in Lloyd [3]. Generally, a set T of truth values of an annotated logic has an arbitrary, but fixed, complete lattice structure. Throughout this paper we assume that T has the lattice structure called Lattice-4, given in Figure 1. The ordering of this lattice is denoted in the usual fashion by ≤. An ALPSN has two kinds of negation, the epistemic negation ¬ and the ontological negation (strong negation) ∼. The epistemic negation is a unary function from annotations to annotations such that
¬(⊥) = ⊥,  ¬(t) = f,  ¬(f) = t,  ¬(⊤) = ⊤.
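The Lattice-4 annotations and their epistemic negation can be sketched compactly (our own encoding: annotations as frozensets ordered by inclusion, so ⊥ is the empty set and ⊤ contains both t and f):

```python
# Lattice-4: {} = bottom (unknown), {'t'} = t, {'f'} = f, {'t','f'} = top
BOT, T, F, TOP = frozenset(), frozenset('t'), frozenset('f'), frozenset('tf')

def neg(ann):
    """Epistemic negation on Lattice-4: swaps t and f, fixes bottom and top."""
    swap = {'t': 'f', 'f': 't'}
    return frozenset(swap[x] for x in ann)
```

With this encoding the lattice order is simply set inclusion and the least upper bound is set union, which keeps later operators (TP, the Gelfond-Lifschitz reduct) one-liners.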
Therefore, we have (¬A : t) = (A : f) and (¬A : f) = (A : t). The ontological negation ∼ is a strong negation which is similar to classical negation.

[Definition 2.1] If A is a literal, then A : µ is called an annotated literal, where µ ∈ T; µ is called an annotation of A. If µ is one of {t, f}, then A : µ is called a well-annotated literal.

Note: ¬p : µ is interpreted as p : (¬µ). For instance, ¬p : t = p : (¬t) = p : f. Therefore, we may assume that every annotated logic program appearing in the rest of this paper has no epistemic negation ¬.

[Definition 2.2] (Strong Negation (∼)) Let A be any formula.
∼A = A → ((A → A) ∧ ¬(A → A)).
This strong negation has all the properties of a classical negation.

[Definition 2.3]
If L0 , · · · , Ln are well-annotated literals over the lattice T ,
L1 ∧ · · · ∧ Ln → L0   (1)
is called a generalized Horn clause (gh-clause). If L0, · · ·, Ln are any annotated literals over the lattice T, then the formula (1) is called an annotated clause (a-clause) over the lattice T, and
L1 ∧ · · · ∧ Li ∧ ∼ Li+1 ∧ · · · ∧ ∼ Ln → L0   (2)
is called an annotated clause with strong negation (asn-clause) over the lattice T.

[Definition 2.4] Generalized Horn programs (GHPs), ALPs and ALPSNs are finite sets of gh-clauses, a-clauses and asn-clauses, respectively. An ALP may be regarded as a special case of an ALPSN with no strong negation.

2.2. Semantics of ALPSN

We now address the semantics of an ALPSN and assume that all interpretations under consideration have as their domain a Herbrand base BP (the set of all variable-free atoms). Since T is a complete lattice, a Herbrand interpretation I of an ALPSN P over the lattice T may be considered to be a mapping I : BP → T. Usually I is denoted by the set {(p : ∪µi) | I |= (p : µ1) ∧ · · · ∧ (p : µn)}, where ∪µi is the least upper bound of {µ1, . . . , µn}. The ordering ≤ on the lattice T is extended to interpretations in a natural way.

[Definition 2.5] Let I1 and I2 be any interpretations, and A be an atomic formula.
I1 ≤ I2 = (∀A ∈ BP)(I1(A) ≤ I2(A)).
Satisfaction is defined as follows.

[Definition 2.6] An interpretation I is said to satisfy
(1) a formula F iff it satisfies every closed instance of F,
(2) a variable-free annotated atom A : µ iff I(A) ≥ µ,
(3) a variable-free annotated literal ¬A : µ iff I(A) ≥ ¬(µ),
(4) a formula ∼F iff I does not satisfy F.
The satisfaction of the other formulas, F1 ∧ F2, F1 ∨ F2, F1 → F2, ∀xF and ∃xF, is identical to that in classical logic. Satisfaction is denoted by the symbol |=. Associated with every ALPSN P over T, a function TP from Herbrand interpretations to Herbrand interpretations is now defined.

[Definition 2.7]
TP(I)(A) = ∪{µ | B1 ∧ · · · ∧ Bm ∧ ∼C1 ∧ · · · ∧ ∼Cn → A : µ is a ground instance of an asn-clause in P and I |= B1 ∧ · · · ∧ Bm ∧ ∼C1 ∧ · · · ∧ ∼Cn},
where we use the notation ∪ to denote a least upper bound. The special interpretation ∆ is defined as the interpretation which assigns the truth value ⊥ to all members of BP. The upward iteration of TP is defined as
TP ↑ 0 = ∆,
TP ↑ λ = ∪α<λ TP(TP ↑ α)
for all ordinals α, λ. The well-known results regarding an ALP and the operator TP are as follows.

[Proposition 2.8] If P is an ALP over the lattice T, then
(1) TP is a monotonic function,
(2) P has a least model that is identical to the least fixed point of TP,
(3) TP ↑ ω is identical to the least fixed point of TP.

We next extend the stable class semantics proposed in [1] for a typical logic program with negation to ALPSN. First, we describe the stable model introduced in [2] for an ALPSN P. Let I be any interpretation. P^I, the Gelfond-Lifschitz transformation of P with respect to I, is the ALP obtained from the ALPSN P by deleting
(1) each clause which has a literal ∼(C : µ) in its body with I |= (C : µ), and
(2) all strongly negated literals in the bodies of the remaining clauses.
Since P^I is an ALP with no negation (neither ∼ nor ¬), it has a unique least model, which is given by TP^I ↑ ω [2,4,5].

[Definition 2.9] (Stable Model) If I is a Herbrand interpretation of an ALPSN P, then I is called a stable model of P iff I = TP^I ↑ ω.

[Example 2.10] Let P be an ALPSN:
{(q : t) ∧ ∼(s : t) → (p : t), ∼(r : f) → (q : t)}
and if I = {(p : t), (q : t), (r : ⊥), (s : ⊥)}, then P^I = {(q : t) → (p : t), (q : t)} and I is a stable model of P.

[Definition 2.11] (Stable Class for ALPSN) [1] Let A be a set of indices, S = {Ii | i ∈ A} a set of interpretations, P^Ii the Gelfond-Lifschitz transformation of P with respect to Ii, and M(P^Ii) the unique least Herbrand model of P^Ii. Then S is a stable class of P iff S = {M(P^Ii) | i ∈ A}.

Suppose that there is an operator FP which maps an interpretation to an interpretation such that FP(I) = M(P^I); the concept of stable models is then defined in terms of the fixed points of FP, i.e., I is a stable model of P iff FP(I) = M(P^I) = I.

However, this operator may not always have fixed points, since FP is anti-monotonic (i.e., if I1 ≤ I2 then FP(I2) ≤ FP(I1)). The basic idea behind the stable class is that even though FP may have no fixed point, there might be a fixed point of FP² (i.e., an I with FP(FP(I)) = I). Actually, there exist two such fixed points, the least fixed point and the greatest fixed point of FP². The following theorem can be obtained by extending Theorem 1 in [1] to ALPSN.

[Theorem 2.12]
Every ALPSN has non-empty stable classes.
We have provided the proof of [Theorem 2.12] in [9].

2.3. Computation of Stable Classes

In this subsection, we provide an algorithm for computing the stable classes of an ALPSN. We first give some definitions.

[Definition 2.13]
For a given ALPSN P ,
Head(P) = {(A : λ) | B → (A : µi) ∈ P, λ = ∪µi},
Th(P) = {(A : λ) | P ⊢ (A : µi), λ = ∪µi},
CN is the set of clauses containing the strong negation.
Note: we always have ⊢ (A : ⊥) for any atom A [6]; therefore, for any atom A in P, there is an annotated atom (A : µ) in Th(P) such that µ ∈ {⊥, f, t, ⊤}.
Figure 2. Flowchart for computing stable classes: after initializing j ← 0 and SC ← ∅, each outer iteration sets j ← j + 1 and S ← ∅, ending when j > n; with α ← j, the model Mα is added to S and, while there exists a k with Mα = Ik, α ← k and the process repeats until j = k closes the cycle, at which point SC ← SC ∪ {S}.
[Step 1] Determine each interpretation Ij to be used for the Gelfond-Lifschitz transformation. Compute Ij = Th((P \ CN) ∪ hj) (j = 1, . . . , n), where hj is any set of annotated atoms containing the same predicate symbols as Head(CN) such that hj ≤ Head(CN). We need not consider every interpretation for the ALPSN P; we need only take into account the interpretations Ij (j = 1, . . . , n).
[Step 2] Compute the Gelfond-Lifschitz transformation P^Ij of the ALPSN P with respect to each interpretation Ij.
[Step 3] Construct the least Herbrand model Mj = Th(P^Ij) for each P^Ij obtained in [Step 2].
[Step 4] Compute the stable classes SC by the flowchart in Figure 2.
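For small ground programs, the steps above can be sketched as a brute-force search (our own encoding; [Step 1]'s Th-based pruning of candidate interpretations is replaced by plain enumeration, so this is only feasible for a handful of atoms):

```python
from itertools import product

# Lattice-4 as frozensets: {} = bottom, {'t'} = t, {'f'} = f, {'t','f'} = top;
# the lattice order is set inclusion and the least upper bound is union.
BOT, T, F, TOP = frozenset(), frozenset('t'), frozenset('f'), frozenset('tf')
VALUES = [BOT, T, F, TOP]

def sat(interp, atom, ann):
    # I satisfies A : mu  iff  I(A) >= mu
    return ann <= interp.get(atom, BOT)

def reduct(program, interp):
    """[Step 2]: Gelfond-Lifschitz transformation -- delete each clause with a
    satisfied strongly negated body literal, strip the remaining ~-literals."""
    return [(pos, head) for pos, neg, head in program
            if not any(sat(interp, a, m) for a, m in neg)]

def least_model(alp):
    """[Step 3]: least Herbrand model of a negation-free ALP by upward
    iteration of its TP operator from the bottom interpretation."""
    interp = {}
    while True:
        new = {}
        for pos, (h, ann) in alp:
            if all(sat(interp, a, m) for a, m in pos):
                new[h] = new.get(h, BOT) | ann
        if new == interp:
            return interp
        interp = new

def fp(program, interp):
    return least_model(reduct(program, interp))

def stable_classes(program, atoms):
    """[Step 4], brute force: follow FP from every candidate interpretation
    until it cycles; each cycle S satisfies S = {FP(I) | I in S}."""
    classes = set()
    for vals in product(VALUES, repeat=len(atoms)):
        interp = {a: v for a, v in zip(atoms, vals) if v != BOT}
        seen = []
        while interp not in seen:
            seen.append(interp)
            interp = fp(program, interp)
        cycle = seen[seen.index(interp):]
        classes.add(frozenset(tuple(sorted(i.items())) for i in cycle))
    return classes
```

Run on the program of Example 2.10, this recovers the stable model {p : t, q : t} as a singleton stable class.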
3. Concluding Remarks In this paper, we define the syntax for an annotated logic program ALPSN and its stable model semantics and stable class semantics in order to provide four annotated semantics for default theory, nonmonotonic ATMS, the Negation as
Failure and the Closed World Assumption based on ALPSN and annotated completion formulas in the subsequent papers. Furthermore, we propose an algorithm for computing the stable classes of ALPSN. We believe that ALPSN can serve as a logical basis for various intelligent systems.
References
[1] Baral, C.R. and Subrahmanian, V.S., Stable and Extension Class Theory for Logic Programs and Default Logics, J. Automated Reasoning 8 (1992) 366-385.
[2] Gelfond, M. and Lifschitz, V., The Stable Model Semantics for Logic Programming, Proc. 5th Int'l Conf. and Symp. on Logic Programming, MIT Press (1989) 1070-1080.
[3] Lloyd, J.W., Foundations of Logic Programming (2nd Edition), Springer-Verlag (1987).
[4] Blair, H.A. and Subrahmanian, V.S., Paraconsistent Foundations for Logic Programming, J. Non-Classical Logic 5 (1988) 45-73.
[5] Blair, H.A. and Subrahmanian, V.S., Paraconsistent Logic Programming, Theoretical Computer Science 68 (1989) 135-154.
[6] da Costa, N.C.A., Subrahmanian, V.S. and Vago, C., The Paraconsistent Logic PT, Zeitschrift für Mathematische Logik und Grundlagen der Mathematik 37 (1989) 139-148.
[7] Reiter, R., A Logic for Default Reasoning, Artificial Intelligence 13 (1980) 81-132.
[8] Dressler, O., An Extended Basic ATMS, Proc. 2nd Int'l Workshop on Nonmonotonic Reasoning, LNCS 346, Springer-Verlag (1988) 143-154.
[9] Nakamatsu, K. and Suzuki, A., Annotated Semantics for Default Reasoning, Proc. 3rd PRICAI, International Academic Press (1994) 180-186.
[10] Clark, K.L., Negation as Failure, in Logic and Databases (H. Gallaire and J. Minker, Eds.), Plenum Press (1978) 293-322.
[11] Reiter, R., On Closed World Databases, in Logic and Databases (H. Gallaire and J. Minker, Eds.), Plenum Press (1978) 55-76.
Annotated Semantics for Nonmonotonic Reasonings in Artificial Intelligence II
Kazumi Nakamatsu^a and Atsuyuki Suzuki^b
^a University of Hyogo, HIMEJI 670-0092 JAPAN, [email protected]
^b Shizuoka University, HAMAMATSU 432-8011 JAPAN, [email protected]
Abstract. In this paper, we introduce an annotated semantics for Reiter's default theory. We propose a syntactical translation from Reiter's default theory into ALPSN (Annotated Logic Program with Strong Negation) and show that the translation preserves provability and satisfiability; that is, if a formula in the default theory is provable, then the translated a-clause is also provable in the corresponding ALPSN. We clarify this relation by showing that there is a one-to-one correspondence between the extension classes of the original default theory and the stable classes of the translated ALPSN. Keywords. ALPSN (Annotated Logic Program with Strong Negation), default theory, extension class, stable class.
3. Annotated Semantics for a Default Theory
In this paper, we consider a translation tr from default theory [5] into ALPSN [7]. The translation tr also carries the extension class semantics for default theory over to the stable class semantics [2,1] of the corresponding ALPSN. We show that the relationship depicted in Figure 1 holds between default theory and ALPSN. The motivation for choosing ALPSN to study the semantics of default reasoning is as follows. A default rule A : B/C can be interpreted as: "If A holds and B is consistent with the knowledge base, then C is true". Moreover, we can interpret "B is consistent" as "B is true or ¬B does not exist in the knowledge base". On the other hand, an annotated
[Syntax] Default Theory (D, W) ==tr==> ALPSN tr(D, W)
[Semantics] Extension Classes <==1-to-1==> Stable Classes
Figure 1. Relation between Default Theory and ALPSN
[Lattice-4: ⊤ (inconsistent) at the top, f (false) and t (true) incomparable in the middle, ⊥ (unknown) at the bottom.]
Figure 2. Lattice-4 of Truth Values
literal (A : µ) explicitly expresses the truth value of the literal A by its annotation µ over the lattice T in Figure 2:
(A : f) A is known to be false,
(A : t) A is known to be true,
(A : ⊥) A is known to be neither true nor false (i.e., A is unknown),
(A : ⊤) A is known to be inconsistent.
The strong negation can be intuitively interpreted as follows:
|= ∼(A : t) is equivalent to |= (A : ⊥) or |= (A : f),
|= ∼(A : f) is equivalent to |= (A : ⊥) or |= (A : t).
Thus, if a default rule A : B/C is interpreted as "if A is true and (B is true or ¬B does not exist), then C is true", it can be expressed by the asn clause (A : t) ∧ ∼(B : f) → (C : t). Therefore, we believe that annotated formulas are an adequate representation of default theory.
3.1. Default Theory and Extension Classes
In this subsection, we briefly review Reiter's default theory and propose its extension class semantics. Default logic is a nonmonotonic logic introduced by Reiter [5] in order to formalize default reasoning. Generally, a default theory T = (D, W) consists of a set W of facts, which are closed first order formulas, and a set D of defaults, which are inference rules of the form u : v/w, where u, v and w are first order formulas. In this paper, however, we restrict W to be a set of generalized Horn clauses [3,4] (which are allowed to contain negative literals in their heads or bodies), and D to be a set of defaults of the form p1 ∧ · · · ∧ pm : j1, . . ., jk / C, where p1, . . ., pm, j1, . . ., jk, C are literals. Here p1 ∧ · · · ∧ pm is the prerequisite, each ji (1 ≤ i ≤ k) a justification, and C the consequent of the default. The informal interpretation of the default is that C may be added to the current knowledge base whenever p1, . . ., pm belong to that base and j1, . . ., jk are consistent with it, that is to say, none of the negated justifications ¬j1, · · · , ¬jk belongs to that base. [Definition 3.1] [5]
Let T = (D, W ) be a default theory and E be a set of ground
literals called a context. There is an operator R_{E,D} which maps sets of ground literals to sets of ground literals in the following way:
R_{E,D}(S) = Cn(S ∪ {C | p1 ∧ · · · ∧ pm : j1, . . ., jk /C ∈ D, pi ∈ S (1 ≤ i ≤ m), {¬j1, . . ., ¬jk} ∩ E = ∅}), where
Cn(S) = {w | S ⊢ w}.
[Definition 3.2] [5] Let T = (D, W) be a default theory and E a context. The set R_∞^{E,D}(W) of literals which have a strong proof from W using D with respect to the context E is defined recursively as follows:
R_0^{E,D}(W) = Cn(W),
R_{n+1}^{E,D}(W) = R_{E,D}(R_n^{E,D}(W)),
R_∞^{E,D}(W) = ∪_{n=0}^∞ R_n^{E,D}(W).
The most important semantics for default theory is extension, which is defined below. [Definition 3.3] (extension) [5] E is an extension of a default theory T = (D, W )
iff
E = R_∞^{E,D}(W).
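Restricted to ground literals and Horn-style rules, the operator iteration of [Definition 3.2] can be sketched in Python; the string encoding of literals ('A' vs. '~A'), the tuple encodings and all names are our own assumptions, and Cn is approximated by forward chaining:

```python
def complement(lit):
    """'A' <-> '~A' for ground literals encoded as strings."""
    return lit[1:] if lit.startswith('~') else '~' + lit

def r_infinity(E, D, W_facts, W_rules):
    """Iterate R_{E,D} from Cn(W) to a fixpoint. W_rules are
    (body, head) pairs; D contains (prerequisites, justifications,
    consequent) triples; E is the context."""
    S = set(W_facts)
    changed = True
    while changed:
        changed = False
        for body, head in W_rules:                 # Cn by forward chaining
            if set(body) <= S and head not in S:
                S.add(head)
                changed = True
        for pre, justs, c in D:                    # the default step
            if (set(pre) <= S and c not in S
                    and all(complement(j) not in E for j in justs)):
                S.add(c)
                changed = True
    return S
```

A set E is then an extension exactly when r_infinity(E, D, W_facts, W_rules) == E, mirroring [Definition 3.3].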
However, a default theory may have many extensions or none. The extension class, a structure by which we can give semantics to default theory in such cases, was introduced in [1].
[Definition 3.4] (extension class) A family E = {Ei | i ∈ A} of sets of ground literals, where A is a set of indices, is an extension class of a default theory T = (D, W) iff
E = {R_∞^{Ei,D}(W) | Ei ∈ E}.
A formula F is assigned true (resp. false) by the extension class E = {Ei | i ∈ A} of a default theory T = (D, W) iff F is true (resp. false) in each Ei.
3.2. Translation From Default Theory into ALPSN
First of all, we propose the translation tr from default theory into ALPSN. Let T = (D, W) be a default theory.
1. For any d ∈ D such that d = p1 ∧ · · · ∧ pm : j1, . . ., jk /C,
tr(d) = (p1 : t) ∧ · · · ∧ (pm : t) ∧ ∼(j1 : f) ∧ · · · ∧ ∼(jk : f) → (C : t) and tr(D) = {tr(d1), . . ., tr(dn)}, providing that di ∈ D (1 ≤ i ≤ n);
2. for any w ∈ W such that w = A1 ∧ · · · ∧ Al → A0, tr(w) = (A1 : t) ∧ · · · ∧ (Al : t) → (A0 : t) and tr(W) = {tr(w1), . . ., tr(wk)}, providing that wi ∈ W (1 ≤ i ≤ k).
[Definition 3.5] tr(D, W) = tr(D) ∪ tr(W).
[Example 3.6]
Consider the default theory T = (D, W ) with
D = {: p/p, : q/q, : r/r}
and
W = {q → ¬p,
q → ¬r}.
Then we have the following translation:
tr(D) = { ∼(p : f) → (p : t), ∼(q : f) → (q : t), ∼(r : f) → (r : t) },
tr(W) = { (q : t) → (p : f), (q : t) → (r : f) }.
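The two clause forms of the translation tr can be sketched in Python; the encoding of a literal as an (atom, positive) pair and the 'pos'/'naf' body tags are our own assumptions:

```python
def ann(lit):
    """Annotation under which the literal holds: (A, True) -> (A:t),
    (A, False) (i.e. the literal ¬A) -> (A:f)."""
    a, pos = lit
    return (a, 't' if pos else 'f')

def ann_compl(lit):
    """Annotation making the literal false (used under strong negation)."""
    a, pos = lit
    return (a, 'f' if pos else 't')

def tr_default(prereqs, justs, conseq):
    """p1 & ... & pm : j1,...,jk / C  ==>
    (p1:t) & ... & (pm:t) & ~(j1:f) & ... & ~(jk:f) -> (C:t),
    with annotations flipped for negative literals."""
    body = ([('pos', ann(p)) for p in prereqs]
            + [('naf', ann_compl(j)) for j in justs])
    return (body, ann(conseq))

def tr_clause(body_lits, head_lit):
    """A1 & ... & Al -> A0  ==>  (A1:t) & ... & (Al:t) -> (A0:t)."""
    return ([('pos', ann(b)) for b in body_lits], ann(head_lit))
```

For example, tr_clause([('q', True)], ('p', False)) yields the a-clause (q : t) → (p : f), matching the translation of q → ¬p above.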
Here, we clarify the relation between the stable classes of ALPSN and the extension classes of default theory based on the translation tr.
[Definition 3.7] Let I be an interpretation of an ALPSN and L any ground literal. Then:
Cna(I) = {L | (L : t) ∈ I}.
[Theorem 3.8]
Let T = (D, W ) be a default theory.
S = {Si | i ∈ A} is a stable class of the ALPSN tr(D, W) iff E = {Cna(Si) | Si ∈ S} is an extension class of the default theory T, where Cna(Si) = {L | (L : t) ∈ Si} and L is a ground literal.
[Proof] This theorem is proved by induction on the translation tr. We omit the proof here.
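The mapping of [Theorem 3.8] can be sketched in Python (interpretations encoded as sets of (literal, annotation) pairs; both function names are our own):

```python
def cna(interp):
    """Cna(I) = {L | (L:t) in I}: the ground literals annotated t in an
    ALPSN interpretation, encoded as a set of (literal, annotation) pairs."""
    return {lit for lit, mu in interp if mu == 't'}

def extension_class_of(stable_class):
    """The extension class corresponding to a stable class S, obtained
    by applying Cna model by model."""
    return {frozenset(cna(s)) for s in stable_class}
```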
3.3. Examples
In this subsection, we describe two famous examples of default reasoning, the Penguin Triangle and the Nixon Diamond. We show how the extension classes and the stable classes of the corresponding ALPSN, i.e., the translation of the original default theory, are computed.
[Example 3.9] (Penguin Triangle) Consider the following default theory T = (D, W): let the set W of formulas be:
{P(tw)} "Tweety is a penguin";
{P(tw) → ¬F(tw)} "If Tweety is a penguin, it does not fly";
{P(tw) → B(tw)} "If Tweety is a penguin, it is a bird".
If we take the default rule {B(tw) : F(tw)/F(tw)} "If Tweety is a bird, it flies" as the set D of default rules, then
tr(D, W) = { P(tw) : t, P(tw) : t → F(tw) : f, P(tw) : t → B(tw) : t, B(tw) : t ∧ ∼F(tw) : f → F(tw) : t }.
We describe the computation of the stable classes for the Penguin Triangle ALPSN P according to [Step 1]–[Step 4] in Section 2.3 [6].
[Step 1]
Let an ALPSN P = tr(D, W ). Then : CN = {B(tw) : t∧ ∼ F (tw) : f → F (tw) : t}, Head(CN ) = {F (tw) : t}, h1 = {F (tw) : ⊥},
h2 = {F (tw) : t},
I1 = Th((P \ CN) ∪ h1) = {P(tw) : t,
B(tw) : t,
F (tw) : f},
I2 = Th((P \ CN) ∪ h2) = {P(tw) : t,
B(tw) : t,
F(tw) : ⊤}.
[Step 2] P^I1 = P^I2 = { P(tw) : t,
P (tw) : t → F (tw) : f,
P (tw) : t → B(tw) : t },
[Step 3] M1 = Th(P^I1) = M2 = Th(P^I2) = { P(tw) : t,
B(tw) : t,
F (tw) : f }.
[Step 4] There is only one stable class: {{P(tw) : t, B(tw) : t, F(tw) : f}}.
Thus, there exists only one extension class
{{P (tw), B(tw), ¬F (tw)}}.
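The transformation of [Step 2] and the least-model construction of [Step 3] can be sketched in Python over Lattice-4; the dictionary encoding of interpretations and all names are our own assumptions, with Th approximated by forward chaining over the ground program:

```python
def leq(a, b):
    """Lattice-4 order: bot <= f, t <= top; f and t incomparable."""
    return a == b or a == 'bot' or b == 'top'

def lub(a, b):
    """Least upper bound in Lattice-4 (e.g. lub of f and t is top)."""
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return 'top'

def gl_transform(program, interp):
    """P^I: delete every clause containing ~(B:mu) with I |= (B:mu),
    and drop the surviving ~ literals."""
    out = []
    for body, head in program:
        naf = [l for kind, l in body if kind == 'naf']
        if any(leq(mu, interp.get(b, 'bot')) for b, mu in naf):
            continue
        out.append(([l for kind, l in body if kind == 'pos'], head))
    return out

def least_model(clauses):
    """Least Herbrand model of a ~-free annotated program by forward
    chaining, accumulating head annotations with lub."""
    interp = {}
    changed = True
    while changed:
        changed = False
        for body, (h, nu) in clauses:
            if all(leq(mu, interp.get(b, 'bot')) for b, mu in body):
                new = lub(interp.get(h, 'bot'), nu)
                if new != interp.get(h, 'bot'):
                    interp[h] = new
                    changed = True
    return interp
```

Applied to the Penguin Triangle program, both I1 and I2 above are mapped to the same least model {P(tw):t, B(tw):t, F(tw):f}, in accordance with [Step 3].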
[Example 3.10] (Nixon Diamond) Let us consider another default theory T = (D, W ), which consists of two facts : {Q(n)}
“Nixon is a Quaker”,
{R(n)}
“Nixon is a Republican” ;
and two default rules : {Q(n) : P (n)/P (n)}
“If Nixon is a Quaker, he is a pacifist”,
{R(n) : ¬P (n)/¬P (n)} “If Nixon is a Republican, he is not a pacifist”. Then, tr(D, W ) = { Q(n) : t,
R(n) : t,
Q(n) : t∧ ∼ P (n) : f → P (n) : t, R(n) : t∧ ∼ P (n) : t → P (n) : f }. [Step 1]
Let the ALPSN P be tr(D, W). Then:
CN = {Q(n) : t ∧ ∼P(n) : f → P(n) : t, R(n) : t ∧ ∼P(n) : t → P(n) : f},
Head(CN) = {P(n) : ⊤},
h1 = {P(n) : ⊥}, h2 = {P(n) : f}, h3 = {P(n) : t}, h4 = {P(n) : ⊤},
I1 = Th((P \ CN) ∪ h1) = { Q(n) : t, R(n) : t, P(n) : ⊥ },
I2 = Th((P \ CN) ∪ h2) = { Q(n) : t, R(n) : t, P(n) : f },
I3 = Th((P \ CN) ∪ h3) = { Q(n) : t, R(n) : t, P(n) : t },
I4 = Th((P \ CN) ∪ h4) = { Q(n) : t, R(n) : t, P(n) : ⊤ }.
[Step 2]
P^I1 = {Q(n) : t, R(n) : t, Q(n) : t → P(n) : t, R(n) : t → P(n) : f},
P^I2 = {Q(n) : t, R(n) : t, R(n) : t → P(n) : f},
P^I3 = {Q(n) : t, R(n) : t, Q(n) : t → P(n) : t},
P^I4 = {Q(n) : t, R(n) : t}.
[Step 3]
M1 = Th(P^I1) = { Q(n) : t, R(n) : t, P(n) : ⊤ },
M2 = Th(P^I2) = { Q(n) : t, R(n) : t, P(n) : f },
M3 = Th(P^I3) = { Q(n) : t, R(n) : t, P(n) : t },
M4 = Th(P^I4) = { Q(n) : t, R(n) : t, P(n) : ⊥ }.
[Step 4]
There are three stable classes:
{{Q(n) : t, R(n) : t, P(n) : ⊤}, {Q(n) : t, R(n) : t, P(n) : ⊥}},
{{Q(n) : t, R(n) : t, P(n) : f}} and
{{Q(n) : t, R(n) : t, P (n) : t}}.
This is because F_P^2(I1) = I1, F_P(I2) = I2, F_P(I3) = I3, and F_P^2(I4) = I4. Therefore, we have three extension classes for the default theory T = (D, W):
{{Q(n), R(n), P(n), ¬P(n)}, {Q(n), R(n)}},
{{Q(n), R(n), ¬P(n)}} and
{{Q(n), R(n), P (n)}}.
3.4. Concluding Remarks
In this paper, we have proposed the translation tr from default theory into ALPSN, and shown that there is a one-to-one correspondence between the extension classes of default theory and the stable classes of ALPSN based on the translation tr. We have also shown that the translation preserves provability, i.e., if a formula is provable in a default theory, then the translated a-clause is also provable in the corresponding ALPSN. We have taken two famous examples of default reasoning, and described their translation into ALPSN and the computing processes for the stable classes of the corresponding two ALPSNs. We believe that ALPSN can be a logical framework for dealing with some nonmonotonic reasonings.
References
[1] Baral, C.R. and Subrahmanian, V.S., Stable and Extension Class Theory for Logic Programs and Default Logics, J. Automated Reasoning 8 (1992) 366-385.
[2] Gelfond, M. and Lifschitz, V., The Stable Model Semantics for Logic Programming, Proc. 5th Int'l Conf. and Symp. on Logic Programming, MIT Press (1989) 1070-1080.
[3] Blair, H.A. and Subrahmanian, V.S., Paraconsistent Foundations for Logic Programming, J. Non-Classical Logic 5 (1988) 45-73.
[4] Blair, H.A. and Subrahmanian, V.S., Paraconsistent Logic Programming, Theoretical Computer Science 68 (1989) 135-154.
[5] Reiter, R., A Logic for Default Reasoning, Artificial Intelligence 13 (1980) 81-132.
[6] Nakamatsu, K. and Suzuki, A., Annotated Semantics for Nonmonotonic Reasonings in Artificial Intelligence I, this volume, Frontiers in Artificial Intelligence and Applications, IOS Press (2005).
[7] Nakamatsu, K. and Suzuki, A., Annotated Semantics for Default Reasoning, Proc. 3rd PRICAI, International Academic Press (1994) 180-186.
Annotated Semantics for Nonmonotonic Reasonings in Artificial Intelligence III
Kazumi Nakamatsu^a and Atsuyuki Suzuki^b
^a University of Hyogo, HIMEJI 670-0092 JAPAN, [email protected]
^b Shizuoka University, HAMAMATSU 432-8011 JAPAN, [email protected]
Abstract. In this paper, we introduce an annotated semantics for Dressler's nonmonotonic ATMS. We propose a syntactical translation from Dressler's ATMS into ALPSN (Annotated Logic Program with Strong Negation) and show that the translation preserves provability and satisfiability; that is, if a formula in the ATMS is provable, then the translated a-clause is also provable in the corresponding ALPSN. We clarify this relation by showing that there is a one-to-one correspondence between the extensions of the original ATMS and the stable models of the translated ALPSN. Keywords. ALPSN (Annotated Logic Program with Strong Negation), ATMS (Assumption Based Truth Maintenance System), stable model, ATMS extension
4. Annotated Semantics for Nonmonotonic ATMS
In this paper, we provide a declarative annotated semantics for the nonmonotonic ATMS (Assumption Based Truth Maintenance System) with out-assumptions introduced by Dressler [4]. We also propose another translation, tr1, from the nonmonotonic ATMS into ALPSN (Annotated Logic Program with Strong Negation) [6]. We show that there is a natural one-to-one correspondence between the extensions of the nonmonotonic ATMS [4] and the stable models of the corresponding ALPSN [2,6], as depicted in Figure 1. The nonmonotonic ATMS includes two meta-rules, the Consistent Belief Rule and the Nogood Inference Rule, and an axiom, the Negation Axiom. We show that the reasoning by those two meta-rules and the axiom can be implicitly implemented in the stable model computation of ALPSN. In order to describe the computation of the ATMS extensions and the stable models of ALPSN, we take an example based on "The Three Laws of Robotics" by I. Asimov [1]. We consider the complete lattice structure shown in Figure 1 as the set T of truth values in the ALPSN used in this paper. The annotations i and o represent the states IN and OUT of a node in the nonmonotonic ATMS.
[Syntax] nonmonotonic ATMS(A, O, J) ==tr1==> ALPSN tr1(ATMS(A, O, J))
[Semantics] Extension <==1-to-1==> Stable Model
[Lattice T of truth values: ⊤ (inconsistent) at the top, o (OUT) and i (IN) incomparable in the middle, ⊥ (unknown) at the bottom.]
Figure 1. Nonmonotonic ATMS and ALPSN
4.1. Nonmonotonic ATMS with Out-Assumptions
In this subsection, we review the formal definition of Dressler's nonmonotonic ATMS [4]. A datum reasoned about is called a node. A set of nodes is designated as the assumptions, which are presumed to be true unless there is evidence to the contrary. A set of assumptions is called an environment. An environment is inconsistent if it derives the distinguished node named False; an inconsistent environment is called a nogood. The set of nodes derivable from a consistent environment is called a context. Every derivation is recorded as a (nonmonotonic) justification a1, · · · , ak, Out(b1), · · · , Out(bm) → c, where each Out(bi) (1 ≤ i ≤ m) is called an out-assumption. As with a normal assumption, an out-assumption Out(x) cannot occur as the consequence of a justification, but it can be a member of an environment. The aim of introducing the out-assumption Out(x) is to do all computations with respect to the set of contexts in which the node x cannot be derived, i.e., an out-assumption Out(x) can be added to the environment when the node x is not derivable. In order to achieve this aim, the following two conditions must be ensured:
• the node x and the out-assumption Out(x) do not hold in the same context,
• for each context, either the node x or the out-assumption Out(x) holds.
These conditions are achieved by adding the following meta-rules to the nonmonotonic ATMS.
[Consistent Belief Rule] From the node x and the out-assumption Out(x), the special node False is inferred. This rule can be encoded into the nonmonotonic ATMS by a justification of the form x, Out(x) → False.
[Nogood Inference Rule] If an environment {A1, · · · , An} is consistent and {A1, · · · , An, Out(x)} is inconsistent, then {A1, · · · , An} ⊢ x holds.
[Example 4.1] A nonmonotonic justification Out(p) → p leads to the nogood {Out(p)} because of p, Out(p) → False.
Then, the Nogood Inference Rule deduces that p is universally true, i.e., its label is {{}}.
[Definition 4.2] (Nonmonotonic ATMS) [4] A nonmonotonic ATMS is a triple ATMS(A, O, J) such that:
A is a set of assumptions,
O is a set of out-assumptions (O ⊆ A), and J is a set of justifications.
Generally, a conventional ATMS cannot deal with the negation of nodes. However, this nonmonotonic ATMS can, by assuming the following Negation Axiom.
[Negation Axiom] For any two nodes x and ¬x:
- at any time, at least one of x and ¬x is derivable, which is expressed by the justification Out(x), Out(¬x) → False;
- at any time, not both of them can be derived, which is expressed by the justification x, ¬x → False.
Here, we review the extension of the nonmonotonic ATMS [4].
[Definition 4.3] (extension) An extension of an environment E with respect to a set O of out-assumptions is the set of nodes which can be derived from a minimal set M of assumptions such that the following conditions hold:
E ⊆ M,
∀Out(b) ∈ O : Out(b) ∈ M or M ⊢ b,
where M is called the characterizing environment of the extension. An extension is inconsistent if it contains False; otherwise it is consistent.
4.2. From Nonmonotonic ATMS into ALPSN
In this subsection, we propose the translation tr1 from the nonmonotonic ATMS with out-assumptions into ALPSN. To begin with, we give the motivation for the translation. For each node x, we consider that the concept "to be derivable (IN)" can be expressed by the annotation i in the lattice in Figure 1. Then, for each annotated literal (x : µ) (µ ∈ T), the following interpretations hold:
(x : i) x is known to be derivable (IN) in the context,
(x : o) x is known to be not derivable (OUT) in the context,
(x : ⊥) x is known to be neither derivable nor not derivable in the context,
(x : ⊤) x is known to be both derivable and not derivable in the context.
Strongly negated annotated literals are also informally interpreted as follows:
∼(x : i) x is not known to be derivable in the context,
∼(x : o) x is not known to be not derivable in the context.
Therefore, if we interpret "a node x holds in the context" as "x is known to be derivable in the context", the out-assumption Out(x) could be interpreted as "the node x is known to be not derivable in the context". However, the out-assumption Out(x) in the antecedent of a justification is regarded as a default rule: "a node x does not hold if there is no support for x to hold in the context". Reiter's default rule a : Mb / c can be encoded as the justification a, Out(¬b) → c [4]. Additionally, we have shown in Section 3 [6] that default reasoning can be expressed by an ALPSN. Thus, the out-assumption Out(x) can be interpreted
as "x is not known to be derivable in the context". For example, a justification a, Out(¬b) → c may be translated into the a-clause (a : i) ∧ ∼(b : o) → (c : i). We introduce the translation rule tr1 from nonmonotonic ATMS into ALPSN.
[Definition 4.4] (ALPSN-translation tr1) Let ATMS(A, O, J) be a nonmonotonic ATMS with out-assumptions.
• If n is an ordinary node (n ∈ A), its ALPSN-translation is tr1(n) = (n : i).
• If Out(x) is an out-assumption, its ALPSN-translation is tr1(Out(x)) = (x : o).
• If j = a1, · · · , ak, Out(b1), · · · , Out(bm) → c is a nonmonotonic justification, its ALPSN-translation is tr1(j) = (a1 : i) ∧ · · · ∧ (ak : i) ∧ ∼(b1 : i) ∧ · · · ∧ ∼(bm : i) → (c : i).
In order to show that there is a one-to-one correspondence between the extensions of the nonmonotonic ATMS and the stable models of ALPSN, we redefine the extension by means of Reiter's operator R_{E,J} [3] and define the ALPSN-translation tr1(ATMS(A, O, J)) of ATMS(A, O, J).
[Definition 4.5] For a context E and a set J of justifications, the operator R_{E,J} maps a set S of nodes to a set of nodes as follows:
R_{E,J}(S) = Cn(S ∪ {c | a1, . . ., ak, Out(b1), · · · , Out(bm) → c ∈ J, aj ∈ S (1 ≤ j ≤ k), Out(bl) ∈ S or bl ∉ E (1 ≤ l ≤ m)}), where
Cn(S) = {w | S ⊢ w}.
[Definition 4.6] Let A be a set of assumptions, and assume the same conditions as in [Definition 4.5]. The set R_∞^{E,J}(A) of nodes which are derivable from A using J with respect to the context E is defined recursively as follows:
R_0^{E,J}(A) = Cn(A),
R_{n+1}^{E,J}(A) = R_{E,J}(R_n^{E,J}(A)),
R_∞^{E,J}(A) = ∪_{n=0}^∞ R_n^{E,J}(A).
[Definition 4.7] Let O be a set of out-assumptions, and assume the same conditions as in [Definition 4.5]. E is the extension of an environment A with respect to O iff
(1) E = R_∞^{E,J}(A), and
(2) ∀Out(b) ∈ O : Out(b) ∈ E ∨ b ∈ E.
The second condition (2) can be translated into two asn-clauses: for each out-assumption Out(b) ∈ O \ A,
∼(b : o) → (b : i), ∼(b : i) → (b : o) (i.e., (b : o) ∨ (b : i)).
Note that an out-assumption Out(x) in A has already been translated into (x : o).
[Definition 4.8] (tr1(ATMS(A, O, J))) Let ATMS(A, O, J) be a nonmonotonic ATMS with out-assumptions. Then
tr1(ATMS(A, O, J)) = tr1(A) ∪ tr1(J) ∪ M,
where M = {∼ (b : i) → (b : o), ∼ (b : o) → (b : i)|Out(b) ∈ O \ A}. We show some derivations from the Consistent Belief Rule, the Nogood Inference Rule and the Negation Axiom with a simple example. [Example 4.9]
Let J = {Out(a) → b, Out(b) → c} and A = {}. Then,
tr1 (AT M S(A, O, J)) = { ∼ (a : i) → (b : i),
∼ (b : i) → (c : i),
∼ (a : i) → (a : o),
∼ (b : i) → (b : o),
∼ (a : o) → (a : i),
∼ (b : o) → (b : i) }.
Extensions of this ATMS(A, O, J) are {Out(a), b} and {Out(b), a, c}. Stable models of tr1(ATMS(A, O, J)) are {(a : o), (b : i), (c : ⊥)} and {(a : i), (b : o), (c : i)}.
Furthermore, let A = {Out(a), Out(b)}. Then,
tr1(ATMS(A, O, J)) = { ∼(a : i) → (b : i), ∼(b : i) → (c : i), (a : o), (b : o) }.
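The translation of Definition 4.8, including the set M, can be sketched in Python; the encoding of out-assumptions as ('Out', x) tuples and of justifications as (antecedents, out-assumptions, consequent) triples is our own convention:

```python
def tr1_atms(A, O, J):
    """tr1(ATMS(A, O, J)) = tr1(A) + tr1(J) + M, where M contributes the
    pair ~(b:i) -> (b:o) and ~(b:o) -> (b:i) for each Out(b) in O \\ A."""
    prog = []
    for n in A:                                   # tr1(A)
        if isinstance(n, tuple) and n[0] == 'Out':
            prog.append(([], (n[1], 'o')))        # tr1(Out(x)) = (x:o)
        else:
            prog.append(([], (n, 'i')))           # tr1(n) = (n:i)
    for ants, outs, c in J:                       # tr1(J)
        body = ([('pos', (a, 'i')) for a in ants]
                + [('naf', (b, 'i')) for b in outs])
        prog.append((body, (c, 'i')))
    outs_in_A = {n for n in A if isinstance(n, tuple) and n[0] == 'Out'}
    for _, b in set(O) - outs_in_A:               # the set M
        prog.append(([('naf', (b, 'i'))], (b, 'o')))
        prog.append(([('naf', (b, 'o'))], (b, 'i')))
    return prog
```

With J = {Out(a) → b, Out(b) → c} and A = {}, this reproduces the six a-clauses of Example 4.9; with A = {Out(a), Out(b)}, the set M is empty and only the four remaining clauses are produced.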
This ATMS(A, O, J) has a unique extension, which is inconsistent. Since A, J ⊢ b, Out(b), the node False is derived by the Consistent Belief Rule. The extension of this ATMS(A, O, J) is {Out(a), b, Out(b), False}. The corresponding stable model is {(a : o), (b : ⊤)}. Since the annotated literal (b : ⊤) corresponds to the False node in the nonmonotonic ATMS, the environment {Out(a), Out(b)} can be identified as a nogood in this stable model. Therefore, there may be a one-to-one correspondence between the extensions of the ATMS(A, O, J) and the stable models of its ALPSN-translation tr1(ATMS(A, O, J)). Additionally, if we let A = {Out(b)}, then the extension is {a, Out(b), c} and the corresponding stable model is {(a : i), (b : o), (c : i)}. Thus, the Nogood Inference Rule, "if {Out(b)} is consistent and {Out(b), Out(a)} is inconsistent, then {Out(b)} ⊢ a", can be implemented in the ALPSN stable model computation. If M is removed from tr1(ATMS(A, O, J)) with A = {}, then such a one-to-one correspondence no longer exists: there is only one stable model, {(b : i)}, of the ALPSN-translation tr1(A) ∪ tr1(J) = {∼(a : i) → (b : i), ∼(b : i) → (c : i)}. This stable model corresponds to the extension of Reiter's default rules { : ¬a/b, : ¬b/c} [4,6,7]. Out-assumptions in justifications are therefore regarded as Reiter's default rules when the set M is removed. The set M of asn-clauses expresses that, for each out-assumption Out(x), the node x is known to be either IN or OUT. If a False node is deduced by the Consistent Belief Rule, the node x can be reassigned to either IN or OUT by the Nogood Inference Rule. These inferences can be implemented through the set M of asn-clauses in an ALPSN stable model computation. In this example, although there are out-assumptions in the extension, the out-assumption Out(x) can be transformed into the negation (¬x) of the node x by the Negation Axiom. Eventually, we obtain the following results.
[Theorem 4.10]
• There exists a one-to-one correspondence between the nonmonotonic ATMS extensions and the ALPSN stable models,
• the two meta-rules, the Consistent Belief Rule and the Nogood Inference Rule, are implemented automatically in computing the stable models of tr1(ATMS(A, O, J)),
• the ALPSN translation of the Negation Axiom is not necessary in the stable model computation.
[Proof] The proof of this theorem can be found in [5].
4.3. Examples
In this subsection, we show how The Three Laws of Robotics by I. Asimov can be implemented in the nonmonotonic ATMS and as its ALPSN-translation. This example describes how the nonmonotonic ATMS and its ALPSN-translation program work when "Paul (a human being) orders Hal (a robot) to injure Sam (a human being)" under the following laws.
[The Three Laws of Robotics] (Handbook of Robotics by I. Asimov [1])
1. A robot may not injure a human being, or, through inaction, allow a human being to come to harm.
2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.
The exceptions in Rules 2 and 3 can be regarded as default rules.
[Predicates and their meanings in the nonmonotonic ATMS]
M(p) Paul is a human being.
M(s) Sam is a human being.
R(h) Hal is a robot.
C(p) Paul is a commander.
Ho(h, s) "Hal must injure Sam" is an order.
H(h, s) Hal injures Sam.
P1 robots must obey Rule 1.
P2 robots must obey Rule 2.
P3 robots must obey Rule 3.
O robots must obey what a human being orders.
We define ATMS(A, O, J) as follows:
A = { M(p), M(s), R(h), C(p), Ho(h, s), P1 },
J = { [a] Ho(x, y), Out(¬O) → H(x, y),
[b] P1, Out(¬P2) → P2,
[c] P1, P2, Out(¬P3) → P3,
[d] R(x), M(y), H(x, y), P1 → ¬P1,
[e] C(x), M(x), P2, ¬O → ¬P2,
[f] R(x), H(y, x), P3 → ¬P3 },
where x and y are variables.
Each justification has the following meaning: [a] execution of the order, [b] Rule 1, [c] Rule 2, [d] derivation of a node when not obeying Rule 1, [e] derivation of a node when not obeying Rule 2, [f] derivation of a node when not obeying Rule 3.
We now describe the computation of the nonmonotonic ATMS extensions. By [a], [b], [c], [d] and the Negation Axiom, we have
A ∪ {Out(¬O), Out(¬P2), Out(¬P3)} ⊢ O, ¬P1, P2, P3, H(h, s), False.
Thus, one of the extensions is A ∪ {O, ¬P1, P2, P3, H(h, s), False}.
By [a], [d] and the Negation Axiom, A ∪ {Out(¬O)} ⊢ False. Since A is consistent, by the Nogood Inference Rule, A ⊢ ¬O. Therefore, by [e] and the Negation Axiom, A ∪ {Out(¬P2)} ⊢ False. Similarly, we have A ⊢ ¬P2. Then, A ∪ {Out(¬P3)} ⊢ ¬O, ¬P2, P3. Hence, the other extension is A ∪ {¬O, ¬P2, P3}.
The ATMS(A, O, J) is then translated into its ALPSN-translation P = tr1(ATMS(A, O, J)) = tr1(A) ∪ tr1(J) ∪ M, where
tr1(A) = { M(p) : i, M(s) : i, R(h) : i, C(p) : i, Ho(h, s) : i, P1 : i },
tr1(J) = { Ho(x, y) : i ∧ ∼(O : o) → H(x, y) : i,
P1 : i ∧ ∼(P2 : o) → P2 : i,
P1 : i ∧ P2 : i ∧ ∼(P3 : o) → P3 : i,
R(x) : i ∧ M(y) : i ∧ H(x, y) : i ∧ P1 : i → P1 : o,
C(x) : i ∧ M(x) : i ∧ P2 : i ∧ O : o → P2 : o,
R(x) : i ∧ H(y, x) : i ∧ P3 : i → P3 : o },
M = { ∼(O : o) → O : i,
∼ (O : i) → O : o,
∼ (P2 : o) → P2 : i,
∼ (P2 : i) → P2 : o,
∼ (P3 : o) → P3 : i,
∼ (P3 : i) → P3 : o}.
This ALPSN P has two stable models:
I1 = (tr1(A) \ {P1 : i}) ∪ {O : i, P1 : ⊤, P2 : i, P3 : i, H(h, s) : i},
I2 = tr1(A) ∪ {O : o, P2 : o, P3 : i}.
The stable model I1 shows that executing the order "Hal must injure Sam" conflicts with Rule 1, and the stable model I2 shows that the exception to Rule 2 works successfully.
5. Concluding Remarks
In this paper, we introduced an annotated semantics for Dressler's nonmonotonic ATMS. We proposed a syntactical translation from the ATMS into ALPSN (Annotated Logic Program with Strong Negation) and showed that the translation preserves provability and satisfiability; that is, if a formula in the ATMS is provable, then the translated a-clause is also provable in the corresponding ALPSN. We clarified this relation by showing that there is a one-to-one correspondence between the extensions of the original ATMS and the stable models of the translated ALPSN. We believe that ALPSN can serve as a logical basis for intelligent systems.
References
[1] Asimov, I., I, Robot, Gnome Press (1950).
[2] Gelfond, M. and Lifschitz, V., The Stable Model Semantics for Logic Programming, Proc. 5th Int'l Conf. and Symp. on Logic Programming, MIT Press (1989) 1070-1080.
[3] Reiter, R., A Logic for Default Reasoning, Artificial Intelligence 13 (1980) 81-132.
[4] Dressler, O., An Extended Basic ATMS, Proc. 2nd Int'l Workshop on Nonmonotonic Reasoning, LNCS 346, Springer-Verlag (1988) 143-154.
[5] Nakamatsu, K. and Suzuki, A., On the Relation Between Nonmonotonic ATMS and ALPSN, Proc. Japanese Society for AI, SIG-FAI-9502-1 (1995) 1-8.
[6] Nakamatsu, K. and Suzuki, A., Annotated Semantics for Default Reasoning, Proc. 3rd PRICAI, International Academic Press (1994) 180-186.
[7] Reinfrank, M., Dressler, O. and Brewka, G., On the Relation between Truth Maintenance and Autoepistemic Logic, Proc. 11th IJCAI (1989) 1206-1212.
Annotated Semantics for Nonmonotonic Reasonings in Artificial Intelligence IV
Kazumi Nakamatsu (a) and Atsuyuki Suzuki (b)
(a) University of Hyogo, Himeji 670-0092, Japan, [email protected]
(b) Shizuoka University, Hamamatsu 432-8011, Japan, [email protected]
Abstract. In this paper, we define an annotated logic AL for NF and CWA based on a set of truth values (annotations) structured as a 4-valued complete lattice (Lattice-4), in which each annotation corresponds to a state of logic program derivation. We also propose two kinds of annotated completion formulas, which can serve as declarative annotated semantics for NF and CWA, in the same spirit as Clark's completion. Furthermore, we take a multi-agent system containing a contradiction between its agents' worlds as a simple example and show how the annotated completion deals with such inconsistency.
Keywords. AL (Annotated Logic), Negation as Failure, Closed World Assumption, annotated completion
5. Annotated Semantics for NF and CWA

The derivation rules for negative information in logic programming systems and knowledge bases, Clark's Negation as Failure (NF) [3] and Reiter's Closed World Assumption (CWA) [6], are also nonmonotonic reasonings. Many researchers have given logical interpretations to these rules. Fitting [8] and Kunen [9] explained logic programs with negation in terms of logical models. Balbiani [5] and Gabbay [7] showed that SLDNF-provability has a modal meaning and can be expressed in a modal logic. However, their semantics for these rules cannot deal with inconsistency, such as a contradiction between agents in a multi-agent system. Therefore, in this section we give a different declarative semantics for NF and CWA based on an annotated logic AL, and with respect to this semantics we show the soundness and completeness of NF and CWA.

In this paper, we first review NF and CWA and give procedural interpretations for them from the viewpoint of logic programming. We define an annotated logic AL for NF and CWA based on a set of truth values (annotations) structured as a 4-valued complete lattice (Lattice-4), in which each annotation corresponds to a state (succeeds, finitely fails, loops, or both succeeds and finitely fails, i.e., is inconsistent) of logic program derivation. Then, we propose two kinds of annotated completion formulas, which can serve as the declarative annotated
K. Nakamatsu and A. Suzuki / Annotated Semantics for Nonmonotonic Reasonings in AI IV
Figure 1. The Lattice T of Truth Values (Lattice-4): ⊤ (inconsistent) at the top, f (failure) and s (success) incomparable in the middle, and l (loop) at the bottom.
semantics for NF and CWA, in the same spirit as Clark's completion. Next, we take a multi-agent system containing a contradiction between its agents' worlds as a simple example and show how the annotated completion deals with such inconsistency. Last, we state the soundness and completeness of NF and CWA with respect to the annotated completion, without detailed proofs.

5.1. NF, CWA and an Annotated Logic AL

The NF rule is an inference rule in logic programming which assigns the value false to a ground atom if the logic program cannot prove the ground atom. If A is a ground atom, the NF rule can be interpreted informally as follows: the goal ¬A succeeds if the ground atom A finitely fails, and the goal ¬A finitely fails if the ground atom A succeeds. On the other hand, the CWA rule is an inference rule which postulates that a ground atom is false whenever it is not a logical consequence of a given logic program that includes the ground atom [7]. Although CWA is defined formally as "for a program P and a ground atom A, CWA(P) |= ¬A iff P ⊭ A", it can be interpreted informally as follows: the goal ¬A succeeds if A does not succeed, and the goal ¬A finitely fails if the ground atom A does not finitely fail. We give the semantics for these two rules by means of the annotated completion of a program P, which differs from Clark's completion in that the underlying logic is an annotated logic rather than a classical one. Generally a goal (ground atom) A either succeeds, finitely fails or loops in logic programming. We consider the complete lattice T (Lattice-4) of truth values (annotations) {l, f, s, ⊤} shown in Figure 1. The motivation for choosing the annotated logic AL to study the semantics for NF and CWA is the following. An annotated atom (A : µ) can express the derivation state of A by its annotation µ as follows: (A : l) the atom A loops, (A : f) the atom A finitely fails, (A : s) the atom A succeeds, (A : ⊤) the atom A both succeeds and finitely fails.
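The lattice structure above can be sketched in a few lines of code. This is an illustration of ours, not part of the paper, with the name `top` standing for ⊤:

```python
# Lattice-4 of annotations {l, f, s, top}: l at the bottom, top at the top,
# f and s incomparable in between. The order relation is listed explicitly.
ORDER = {("l", "l"), ("f", "f"), ("s", "s"), ("top", "top"),
         ("l", "f"), ("l", "s"), ("l", "top"),
         ("f", "top"), ("s", "top")}

def leq(a, b):
    """a <= b in Lattice-4."""
    return (a, b) in ORDER

def lub(a, b):
    """Least upper bound of two annotations."""
    cands = [c for c in ("l", "f", "s", "top") if leq(a, c) and leq(b, c)]
    # the least candidate is the one below all other candidates
    return next(c for c in cands if all(leq(c, d) for d in cands))
```

In particular, `lub("s", "f")` is `top`: an atom reported both to succeed and to finitely fail is inconsistent.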
Usually annotations do not appear in characterizations of ordinary logic programs with NF or CWA. However, they are necessary for expressing inconsistency in cases where a multi-agent system contains a contradiction between its agents. We have the following equivalences based on the property of the epistemic negation:

(¬A : s) ≡ (A : ¬(s)) ≡ (A : f)   (1)
(¬A : f) ≡ (A : ¬(f)) ≡ (A : s)   (2)

Then we can interpret the expressions (1) and (2) informally as:

(1) "¬A succeeds" is equivalent to "A finitely fails".
(2) "¬A finitely fails" is equivalent to "A succeeds".

On the other hand, the following interpretation holds for the strong negation:

|= ∼(A : s) is equivalent to |= (A : l) or |= (A : f)   (3)
|= ∼(A : f) is equivalent to |= (A : l) or |= (A : s)   (4)

The expressions (3) and (4) are interpreted informally as:

(3) "∼A succeeds" is equivalent to "A loops or A finitely fails, i.e., A does not succeed".
(4) "∼A finitely fails" is equivalent to "A loops or A succeeds, i.e., A does not finitely fail".
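The two negations can be sketched over Lattice-4 as follows. This is our own illustration (not from the paper): an interpretation assigns each atom a lattice value, A : µ is satisfied when µ lies below that value, epistemic negation swaps s and f, and strong negation is plain failure to satisfy:

```python
# Sketch: annotations interpreted over Lattice-4 ("top" stands for ⊤).
LEQ = {("l", "l"), ("f", "f"), ("s", "s"), ("top", "top"),
       ("l", "f"), ("l", "s"), ("l", "top"), ("f", "top"), ("s", "top")}

def satisfies(value, mu):
    """|= (A : mu) under an interpretation giving A the value `value`."""
    return (mu, value) in LEQ

# Epistemic negation flips success and failure, fixing l and top,
# mirroring equivalences (1) and (2):
EPISTEMIC_NEG = {"s": "f", "f": "s", "l": "l", "top": "top"}

def strongly_negates(value, mu):
    """|= ~(A : mu): strong negation is failure to satisfy, as in (3)/(4)."""
    return not satisfies(value, mu)

# (1): "not-A succeeds" iff "A finitely fails"
assert all(satisfies(v, EPISTEMIC_NEG["s"]) == satisfies(v, "f")
           for v in ("l", "f", "s", "top"))
# (3): ~(A:s) holds exactly when A's value is l or f (A does not succeed)
assert all(strongly_negates(v, "s") == (v in ("l", "f"))
           for v in ("l", "f", "s", "top"))
```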
The expressions (1) and (2) capture the meaning of NF, and the expressions (3) and (4) capture that of CWA. Therefore, we construct the annotated semantics for NF and CWA based on these ideas in the following subsection.

5.2. Annotated Completion

The most widely accepted declarative semantics for NF uses the "completed database" introduced by Clark [3]. These formulas are usually called the completion, or Clark's completion, of a program P and are denoted by Comp(P). The aim of the completion is to logically characterize the SLD finite-failure set of the program P. We propose annotated completion formulas and prove the soundness and completeness theorems of NF with respect to the annotated completion. The idea of Clark's completion is that a predicate is totally defined by the clauses whose heads have the same predicate symbol. This property is syntactically expressed in Comp(P) as an equivalence between a predicate symbol and the disjunction of clause bodies. In the general case of Comp(P), each clause L1, · · · , Lm → R(t1, · · · , tn) in P in which the predicate symbol R appears in the head is taken and rewritten in the general form:

∃y1 · · · ∃yk (x1 = t1 ∧ · · · ∧ xn = tn ∧ L1 ∧ · · · ∧ Lm) → R(x1, · · · , xn),
where x1, · · · , xn are new variables, i.e., not already occurring in any of these clauses, and y1, · · · , yk are the variables of the original clause. If the general forms of all these clauses are E1 → R(x1, · · · , xn), · · · , EM → R(x1, · · · , xn), then the completed definition of R is

E1 ∨ · · · ∨ EM ↔ R(x1, · · · , xn).

The completion Comp(P) is defined to be the set of the completed definitions of all predicate symbols in P together with the equality and freeness axioms called CET (Clark's Equational Theory) [3]. We define the annotated completion formula ACompn(P) and prove that NF is sound and complete with respect to ACompn(P). Let P be a program and ∀x1 · · · ∀xn (E1 ∨ · · · ∨ EM ↔ R(x1, · · · , xn)) be the completed definition of R in P. This formula is logically equivalent to the conjunction of the formulas

∀x1 · · · ∀xn (E1 ∨ · · · ∨ EM → R(x1, · · · , xn))
and
∀x1 · · · ∀xn (¬E1 ∧ · · · ∧ ¬EM → ¬R(x1, · · · , xn)).

Each Ei (1 ≤ i ≤ M) is of the form ∃y1 · · · ∃yk (x1 = t1 ∧ · · · ∧ xn = tn ∧ L1 ∧ · · · ∧ Lm), and each negation ¬Ei (1 ≤ i ≤ M) is of the form ∀y1 · · · ∀yk (x1 = t1 ∧ · · · ∧ xn = tn → ¬L1 ∨ · · · ∨ ¬Lm), where the original clause is L1, · · · , Lm → R(t1, · · · , tn).

Note: We assume the axiomatic system of AL to include CET. The interpretation of equality is given in the usual manner.

Let us replace each literal Li (1 ≤ i ≤ m) and R(t1, · · · , tn) by the corresponding annotated literal (Li : s) (1 ≤ i ≤ m) and (R(t1, · · · , tn) : s), respectively. Then we obtain the following definitions, which constitute the annotated completed definition of the predicate R.

[Definition 5.1] (ACompn(P))
Let P be a program and
L1, · · · , Lm → R(t1, · · · , tn) be a program clause in P. The positive annotated completed definition of R is
∀x1 · · · ∀xn (E1+ ∨ · · · ∨ EM+ → (R(x1, · · · , xn) : s)),   (5)
and the negative annotated completed definition of R is

∀x1 · · · ∀xn (E1− ∧ · · · ∧ EM− → (R(x1, · · · , xn) : f)),   (6)
where each Ei+ (1 ≤ i ≤ M) is of the form ∃y1 · · · ∃yk (x1 = t1 ∧ · · · ∧ xn = tn ∧ (L1 : s) ∧ · · · ∧ (Lm : s)), and each Ei− (1 ≤ i ≤ M) is of the form ∀y1 · · · ∀yk (x1 = t1 ∧ · · · ∧ xn = tn → (L1 : f) ∨ · · · ∨ (Lm : f)). Let P1+ be the set of the positive completed definitions (5) of all heads in the program P, and P1− the set of the negative completed definitions (6) of all heads in the program P. Then ACompn(P) = P1+ ∪ P1−.

In the annotated completion ACompn(P), the annotated atom (A : s), whose annotation is s, should be interpreted as "A succeeds in P", that is to say, "there is an SLD-refutation of A in P". The annotated atom (A : f), whose annotation is f, should be interpreted as "A finitely fails in P", in other words, "there is an SLD finitely-failed tree of A in P". Generally, the SLD-derivation of A either succeeds, finitely fails or loops. In the annotated completion of P, the annotations s and f can be considered to express "succeeds" and "finitely fails", respectively. If every attempt to refute A in P loops, then A neither succeeds nor finitely fails in P. We can represent this fact by the annotated formula ∼((A : s) ∨ (A : f)) ↔ (A : l). This formula is always valid in the annotated logic AL. Thus the annotated atom (A : l) is adequate to express that the SLD-derivation of A loops. Balbiani [5] and Gabbay [7] express the provability of a ground atom A in a modal language in various ways. For example, "A succeeds in P" is expressed by □A, "A finitely fails in P" by □¬A and "A loops in P" by ¬(□A ∨ □¬A). The annotated formulas arguably make the derivation states of ground atoms in P easier to understand than the modal formulas do. There is another way to introduce negation in logic programming: the Closed World Assumption (CWA).
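To make Definition 5.1 concrete, here is a small sketch of ours — not from the paper, restricted to ground programs of definite clauses, and with formula syntax (`v`, `&`, `->`) and the helper name `acomp_n` as our own assumptions — that builds the positive and negative annotated completed definitions as strings:

```python
def acomp_n(program):
    """Build the P1+ / P1- formulas of Definition 5.1 for a ground program
    of definite clauses, each clause given as (body_atoms, head_atom)."""
    heads = {h for (_, h) in program}
    pos, neg = [], []
    for r in sorted(heads):
        bodies = [b for (b, h) in program if h == r]
        # E+ : disjunction over clause bodies, literals annotated with s
        e_plus = " v ".join(
            "(" + " & ".join(f"({a}:s)" for a in b) + ")" if b else "true"
            for b in bodies)
        # E- : conjunction of negated bodies, literals annotated with f
        e_minus = " & ".join(
            "(" + " v ".join(f"({a}:f)" for a in b) + ")" if b else "false"
            for b in bodies)
        pos.append(f"{e_plus} -> ({r}:s)")
        neg.append(f"{e_minus} -> ({r}:f)")
    return pos, neg

# Example 5.3's program PA = {Q, Q -> R}:
pos, neg = acomp_n([([], "Q"), (["Q"], "R")])
# pos: ['true -> (Q:s)', '((Q:s)) -> (R:s)']
# neg: ['false -> (Q:f)', '((Q:f)) -> (R:f)']
```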
In the case of CWA, when CWA postulates a ground atom A to be false, the negation of A should be a logical consequence of an annotated completion different from that of NF. We define this annotated completion ACompc(P) for CWA.

[Definition 5.2] (ACompc(P))
Let P be a program and
L1 , · · · , Lm → R(t1 , · · · , tn )
be a clause in P. Then we obtain the formulas which constitute the annotated completed definition of R, similarly to the case of NF. The positive annotated completed definition of R is

∀x1 · · · ∀xn (E1+ ∨ · · · ∨ EM+ → ∼(R(x1, · · · , xn) : f)),   (7)

and the negative annotated completed definition of R is

∀x1 · · · ∀xn (E1− ∧ · · · ∧ EM− → ∼(R(x1, · · · , xn) : s)),   (8)
where each Ei+ (1 ≤ i ≤ M) is of the form ∃y1 · · · ∃yk (x1 = t1 ∧ · · · ∧ xn = tn ∧ ∼(L1 : f) ∧ · · · ∧ ∼(Lm : f)), and each Ei− (1 ≤ i ≤ M) is of the form ∀y1 · · · ∀yk (x1 = t1 ∧ · · · ∧ xn = tn → (∼(L1 : s) ∨ · · · ∨ ∼(Lm : s))). Let P2+ be the set of the positive completed definitions (7) of all heads in the program P, and P2− the set of the negative completed definitions (8) of all heads in the program P. Then ACompc(P) = P2+ ∪ P2−.

In the annotated completion ACompc(P), an annotated formula ∼(A : s) should be interpreted as "A never succeeds in P", that is to say, "the SLD-resolution never proves A in P", and an annotated formula ∼(A : f) should be interpreted as "A never finitely fails in P", that is to say, "the SLD-resolution never proves the finite failure of A in P". We show how the annotated completions ACompn(P) and ACompc(P) describe the meaning of logic programs with NF and CWA in the presence of inconsistency, with a simple example.

[Example 5.3] Let PA = {Q, Q → R} and PB = {Q, ¬Q → R} be programs which express the worlds of agents A and B, respectively. If we take NF as the inference rule for negative information, we have

ACompn(PA) = { (Q : s), (Q : s) → (R : s) } ∪ { (Q : f) → (R : f) },
ACompn(PB) = { (Q : s), (Q : f) → (R : s) } ∪ { (Q : s) → (R : f) },

ACompn(PA) |= (R : s)
and
ACompn(PB ) |= (R : f).
Therefore, ACompn(PA) ∪ ACompn(PB) |= (R : ⊤). These formulas express that agent A believes "R succeeds" in its world, agent B believes "R finitely fails" in its world, and there is a contradiction in terms of R between their worlds. On the other hand, if we take CWA as the inference rule for negative information, we have
ACompc(PA) = { ∼(Q : f), ∼(Q : f) → ∼(R : f) } ∪ { ∼(Q : s) → ∼(R : s) },
ACompc(PB) = { ∼(Q : f), ∼(Q : s) → ∼(R : f) } ∪ { ∼(Q : f) → ∼(R : s) },

ACompc(PA) |= ∼(R : f) and ACompc(PB) |= ∼(R : s).

Therefore, ACompc(PA) ∪ ACompc(PB) |= ∼((R : f) ∨ (R : s)), i.e., ACompc(PA) ∪ ACompc(PB) |= (R : l). These formulas express that agent A never believes "R finitely fails" in its closed world and agent B never believes "R succeeds" in its closed world, but both of them believe "R loops" in their worlds.

5.3. Soundness and Completeness of NF and CWA

We show the soundness and completeness of NF and CWA with respect to the annotated completions [3,4]. Following Lloyd [2], we recapitulate some concepts of logic programming before giving the soundness and completeness theorems. UL denotes the Herbrand universe and BL the Herbrand base. As usual, we identify a Herbrand interpretation IH with a subset of BL. ground(P) denotes the set of all ground instantiations of clauses in a program P. A mapping TP from the set of Herbrand interpretations to itself is defined as follows: for every Herbrand interpretation IH,

TP(IH) = { A ∈ BL | L1 ∧ · · · ∧ Lm → A ∈ ground(P) and Li ∈ IH (1 ≤ i ≤ m) }.

Then the finite failure set FF of P and the upward iteration of TP are defined recursively.

[Definition 5.4] [5,4,2] Let FFd be the set of ground atoms in BL which finitely fail in P at depth d.
• FF0 = BL \ TP(BL);
• A ∈ FFd+1 if, for every clause L1 ∧ · · · ∧ Lm → A of ground(P), there is an integer i (1 ≤ i ≤ m) such that Li ∈ FFd;
• FF = ∪d∈ω FFd, where ω is the set of all natural numbers.

The upward iteration of TP is defined by: TP ↑ 0 = ∅, TP ↑ α = TP(TP ↑ (α − 1)), TP ↑ λ = ∪{ TP ↑ η | η < λ }, where α is a successor ordinal and λ is a limit ordinal. The set of Herbrand interpretations, ordered by the usual set inclusion, is a complete lattice. The least fixpoint of TP is TP ↑ ω, and it coincides with the least Herbrand model of P.
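The TP operator and the FFd sets can be sketched for finite ground programs as follows. This is our own illustration; the clause encoding and function names are assumptions, not from the paper:

```python
# A clause is a pair (body_atoms, head_atom); `base` plays the role of B_L.

def t_p(program, interp):
    """T_P(I): heads of clauses whose whole body is contained in I."""
    return {h for (body, h) in program if all(a in interp for a in body)}

def t_p_omega(program):
    """T_P up to omega: the least Herbrand model of a finite ground program."""
    interp = set()
    while True:
        nxt = t_p(program, interp)
        if nxt == interp:
            return interp
        interp = nxt

def ff(program, base, depth):
    """FF_d for d = depth: atoms that finitely fail by that depth."""
    failed = base - t_p(program, base)          # FF_0 = B_L \ T_P(B_L)
    for _ in range(depth):
        # A fails next if every clause for A has some body atom already failed
        failed = failed | {a for a in base
                           if all(any(b in failed for b in body)
                                  for (body, h) in program if h == a)}
    return failed

# p is a fact, q depends on p, r loops on itself, u depends on the
# undefined atom v: so p, q succeed; u, v finitely fail; r loops.
prog = [([], "p"), (["p"], "q"), (["r"], "r"), (["v"], "u")]
base = {"p", "q", "r", "u", "v"}
# t_p_omega(prog) == {"p", "q"}; ff(prog, base, 2) == {"u", "v"};
# r is in neither set, i.e., r loops.
```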
Now we define two further mappings, TN and TC. Both map interpretations of the annotated logic AL to interpretations. Let JH be an interpretation of the annotated logic AL [1].

TN(JH)(A) = ∪{ µ | E → (A : µ) is a ground instance of an annotated completion formula, (5) or (6), of A and JH satisfies the formula E },
TC(JH)(A) = ∪{ µ | E → ∼(A : µ) is a ground instance of an annotated completion formula, (7) or (8), of A and JH does not satisfy the formula E },

where ∪ denotes the least upper bound. If we define the special interpretation ∆ to be the interpretation which assigns the value l to every member of BL, the upward iterations of the mappings TN and TC are defined analogously to that of TP: TN ↑ 0 = ∆, TN ↑ α = TN(TN ↑ (α − 1)), TN ↑ λ = ∪{ TN ↑ η | η < λ }, and similarly for TC.
Then we have the following theorem.

[Theorem 5.5]
• A is a logical consequence of P iff P1+ |= (A : s) in AL.
Since "A is a logical consequence of P" is equivalent to A ∈ TP ↑ ω, this can be proved by induction on the least integer d such that A ∈ TP ↑ d and (A : s) ∈ TN ↑ d.
• A belongs to the finite failure set of P iff P1− |= (A : f) in AL.
This can be proved by induction on the least integer d such that A ∈ FFd and (A : f) ∈ TN ↑ d.
• A is not a logical consequence of P iff P2− |= ∼(A : s) in AL.
The equivalent proposition "A is a logical consequence of P iff there is an AL model of P2− ∧ (A : s)" can be proved instead, by induction on the least integer d such that A ∈ TP ↑ d and (A : s) ∈ TC ↑ d.
• A does not belong to the finite failure set of P iff P2+ |= ∼(A : f) in AL.
The equivalent proposition "A belongs to the finite failure set of P iff there is an AL model of P2+ ∧ (A : f)" can be proved instead, by induction on the least integer d such that A ∈ FFd and (A : f) ∈ TC ↑ d.
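As an illustration of the TN iteration behind these results, the following sketch of ours (the rule encoding is our own assumption, with `top` for ⊤ and a rule given as (body, head, annotation), body `None` for a fact) runs TN to its fixpoint on the merged theory ACompn(PA) ∪ ACompn(PB) of Example 5.3:

```python
LEQ = {("l", "l"), ("f", "f"), ("s", "s"), ("top", "top"),
       ("l", "f"), ("l", "s"), ("l", "top"), ("f", "top"), ("s", "top")}

def lub(a, b):
    """Least upper bound in Lattice-4."""
    cands = [c for c in ("l", "f", "s", "top")
             if (a, c) in LEQ and (b, c) in LEQ]
    return next(c for c in cands if all((c, d) in LEQ for d in cands))

def tn_fixpoint(rules, atoms):
    """Iterate T_N from Delta (every atom annotated l) to a fixpoint."""
    j = {a: "l" for a in atoms}
    while True:
        nxt = {}
        for a in atoms:
            mu = "l"
            for body, head, ann in rules:
                # a rule fires if it is a fact or its body annotation is
                # below the current value of the body atom
                fired = body is None or (body[1], j[body[0]]) in LEQ
                if head == a and fired:
                    mu = lub(mu, ann)
            nxt[a] = mu
        if nxt == j:
            return j
        j = nxt

# ACompn(PA) ∪ ACompn(PB) from Example 5.3:
rules = [(None, "Q", "s"),         # (Q : s)
         (("Q", "s"), "R", "s"),   # PA: (Q : s) -> (R : s)
         (("Q", "f"), "R", "f"),   # PA: (Q : f) -> (R : f)
         (("Q", "f"), "R", "s"),   # PB: (Q : f) -> (R : s)
         (("Q", "s"), "R", "f")]   # PB: (Q : s) -> (R : f)

model = tn_fixpoint(rules, ["Q", "R"])
# model["R"] == "top": R is both s and f, the contradiction of Example 5.3
```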
6. Concluding Remarks

In this paper, we have provided two annotated semantics, for Negation as Failure and for the Closed World Assumption, based on annotated completion formulas. In this series of papers, we have provided four annotated semantics in total: for default logic, the nonmonotonic ATMS, Negation as Failure and the Closed World Assumption. Moreover, we have proposed different annotated logic programs called VALPSN and EVALPSN, and clarified their relationship to defeasible logic and defeasible deontic logic. These results show that annotated logic can serve as a logical base for various intelligent reasoning systems.
References
[1] Gelfond, M. and Lifschitz, V., The Stable Model Semantics for Logic Programming, Proc. 5th Int'l Conf. and Symp. on Logic Programming, MIT Press (1989) 1070-1080.
[2] Lloyd, J.W., Foundations of Logic Programming (2nd Edition), Springer-Verlag (1987).
[3] Clark, K.L., Negation as Failure, Logic and Databases (H. Gallaire and J. Minker, Eds.), Plenum Press (1978) 293-322.
[4] Jaffar, J., Lassez, J.L. and Lloyd, J.W., Completeness of the Negation as Failure Rule, Proc. 8th ACM Symp. on Principles of Database Systems, ACM (1989) 379-393.
[5] Balbiani, P., Modal Logic and Negation as Failure, J. Logic and Computation 1, Oxford Univ. Press (1991) 331-356.
[6] Reiter, R., On Closed World Databases, Logic and Databases (H. Gallaire and J. Minker, Eds.), Plenum Press (1978) 55-76.
[7] Gabbay, D.M., Modal Provability Foundations for Negation by Failure, Proc. Int'l Workshop on Extensions of Logic Programming, LNAI 475 (1989) 179-222.
[8] Fitting, M., A Kripke-Kleene Semantics for Logic Programs, J. Logic Programming 5 (1985) 295-312.
[9] Kunen, K., Negation in Logic Programming, J. Logic Programming 7 (1987) 91-116.
A Note on Non-Alethic Temporal Logics
Jair Minoro Abe (1,2) and Kazumi Nakamatsu (3)
(1) Institute for Advanced Studies, University of São Paulo
(2) Information Technology Dept., ICET, Paulista University, [email protected]
(3) School of H.S.E., University of Hyogo, [email protected]
Abstract. In this paper we discuss time in paraconsistency and paracompleteness. We briefly introduce the temporal logical system '1, which may constitute, for instance, a framework for non-alethic temporal reasoning.
1. Introduction

In this work we discuss the possibility of developing a linear temporal logic which allows describing sequences of contradictory and/or paracomplete states and reasoning about them in a non-trivial manner. Contradictory states appear in many contexts, for instance as conflicting goals, and so on. We may also have states in a paracomplete sense. Linear temporal logic is a version of temporal logic for formalizing sequences of situations. If the sequence of situations is viewed as the successive states of a program, linear temporal logic is suited to program specification and verification (see [4]). Unfortunately, existing approaches to linear temporal logic rely on classical logic. This means that we cannot adequately deal with inconsistent and/or paracomplete states in temporal reasoning. It is thus worthwhile to work out the foundations of a non-alethic linear temporal logic.

2. The non-alethic temporal logic '1

The desired system is based on ideas of [3] and [4]. We now introduce the non-alethic temporal calculus '1. Let L be the language of '1. The primitive symbols of L are the following: a denumerable set of propositional variables; → (implication), ∧ (conjunction), ∨ (disjunction), ¬ (negation); the temporal operators P ("in the following instant"), R ("always"), ¡ ("at least once" or "inevitably"), U ("until"); and parentheses.

Definition 1. Let A be any formula. Then A° is shorthand for ¬(A ∧ ¬A), A* is an abbreviation for A ∨ ¬A, and *A is an abbreviation for ¬A ∧ A°. We write A ↔ B for (A → B) ∧ (B → A). The symbol * is called strong negation.

The postulates (axiom schemes and inference rules) of '1 are those of classical positive logic plus the following:
(1) A* ∧ B° → ((A → B) → ((A → ¬B) → ¬A))
(2) A° ∧ B° → ((A ∧ B)° ∧ (A ∨ B)° ∧ (A → B)°)
J.M. Abe and K. Nakamatsu / A Note on Non-Alethic Temporal Logics
(3) A* ∧ B* → ((A ∧ B)* ∧ (A ∨ B)* ∧ (A → B)*)
(4) A° → ((¬¬A → A) ∧ (¬A → (A → B)))
(5) A* → (A → ¬¬A)
(6) A° ∨ A*
(7) P(A → B) → (PA → PB)
(8) *PA ↔ P*A
(9) ¡A ↔ *R*A
(10) R(A → B) → (RA → RB)
(11) RA → (A ∧ PA ∧ PRA)
(12) R(A → PA) → (A → RA)
(13) A U B → (B ∨ (A ∧ P(A U B)))
(14) [C ∧ R(C → (B ∨ (A ∧ PA)))] → (A U B)
(15) A / RA (from A infer RA)

Theorem 1. In '1, all valid schemes and rules of classical positive propositional logic hold. In particular, the deduction theorem is valid in '1. '1 contains intuitionistic positive logic.
In '1, A° expresses intuitively that the formula A 'behaves' classically, so the motivation of the postulates (1)-(6) is clear. Furthermore, in this calculus the connectives →, ∧, ∨, and * have all the properties of classical implication, conjunction, disjunction, and negation, respectively. Therefore the classical propositional calculus is contained in '1, though it constitutes a strict sub-calculus of the former.

Theorem 2. In '1, the following schemes are not valid: (A ∧ ¬A) → B; (A ∧ ¬A) → ¬B; ¬A → (A → B); (A ↔ ¬A) → B; A → ¬¬A; (A ↔ ¬A) → ¬B; ¬¬A → A; (¬A ∧ (A ∨ B)) → B; (A → B) → (¬B → ¬A); ¬(A ∧ ¬A); A ∨ ¬A; ¬(A ∧ B) ↔ (¬A ∨ ¬B); ¬(A ∨ B) ↔ (¬A ∧ ¬B); (A ∧ ¬A) ↔ (B ∧ ¬B).

Theorem 3. '1 is non-trivial.

3. Semantic Analysis of '1
We now present a Kripke semantics for '1.

Definition 1. A Kripke model for '1 is a set-theoretical structure K = [W, R, M] where:
1. W is a nonempty set of elements called states;
2. R is a functional binary relation on W;
3. M is an interpretation function with the usual properties (system N1).

Let s be a state. Then R0(s) indicates s, R1(s) indicates R(s), and Rk(s) indicates R(Rk−1(s)) (k ∈ N, k > 0, where N is the set of natural numbers).
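Since R is functional, Rk(s) is just k-fold application of the successor map; a minimal sketch (our own illustration, not from the paper):

```python
def r_k(succ, s, k):
    """R^k(s): apply the successor function k times (R^0(s) = s)."""
    for _ in range(k):
        s = succ(s)
    return s

# e.g. states as integers with successor s + 1 (a linear time line):
assert r_k(lambda s: s + 1, 0, 3) == 3
```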
Definition 2. If A is a formula of '1 and w ∈ W, we define the relation K,w ⊩ A (K,w forces A) by recursion on A, as usual. If it is not the case that K,w ⊩ A, we write K,w ⊮ A.

1. K,w ⊮ A ⇒ K,w ⊩ ¬A;
2. K,w ⊩ A ⇒ K,w ⊩ ¬¬A;
3. K,w ⊩ B° and K,w ⊩ A → B and K,w ⊩ A → ¬B ⇒ K,w ⊮ A;
4. K,w ⊩ A → B ⇔ K,w ⊮ A or K,w ⊩ B;
5. K,w ⊩ A ∧ B ⇔ K,w ⊩ A and K,w ⊩ B;
6. K,w ⊩ A ∨ B ⇔ K,w ⊩ A or K,w ⊩ B;
7. K,w ⊩ A° and K,w ⊩ B° ⇒ K,w ⊩ (A → B)° and K,w ⊩ (A ∧ B)° and K,w ⊩ (A ∨ B)°;
8. K,w ⊩ A* and K,w ⊩ B* ⇒ K,w ⊩ (A → B)* and K,w ⊩ (A ∧ B)* and K,w ⊩ (A ∨ B)*;
9. K,w ⊩ *A ⇒ K,w ⊮ A;
10. if K,w ⊩ A or (exclusively) K,w ⊩ ¬A, then K,w ⊩ ¬A or K,w ⊩ ¬¬A;
11. if (K,w ⊩ A or K,w ⊩ ¬A) and (K,w ⊩ B or K,w ⊩ ¬B), then K,w ⊩ (A → B) or K,w ⊩ ¬(A → B), K,w ⊩ (A ∧ B) or K,w ⊩ ¬(A ∧ B), and K,w ⊩ (A ∨ B) or K,w ⊩ ¬(A ∨ B);
12. if A is a formula:
12.1 K,s ⊩ PA iff K,R1(s) ⊩ A;
12.2 K,s ⊩ RA iff K,Ri(s) ⊩ A for all i ≥ 0;
12.3 K,s ⊩ ¡A iff K,Ri(s) ⊩ A for some i ≥ 0;
12.4 K,s ⊩ A U B iff K,Rj(s) ⊩ A for all j ≥ 0, or K,Ri(s) ⊩ B for some i ≥ 0 and K,Rj(s) ⊩ A for all j such that 0 ≤ j < i.

Definition 3. Let K = [W, R, M] be a Kripke structure for '1. The Kripke structure K forces a formula A (in symbols, K ⊩ A) if K,w ⊩ A for each w ∈ W. A formula A is called '1-valid if K ⊩ A for every '1-structure K; we symbolize this by ⊩ A. If Γ is a set of formulas, the concept Γ ⊩ A is introduced as usual.

Theorem 1. Let K be a '1-structure. Then for all formulas A, B:
forces a formula A (in symbols, K ՝ A), if K,w ՝ A for each w W. A formula A is called '1-valid if for any '1-structure K, K ՝ A. A formula A is called valid if it is '1valid for all '1. We symbolize this fact by ՝ A. If * is a set of formulas, we can introduce the concept of * ՝ A, as usual. Theorem 1. Let K be a '1-structure. For all formulas A, B, then
1.
If A is an instance of a propositional tautology then, K ՝ A
2.
If K ՝ A and K ՝ A o B, then K ՝ B
3.
K ՝ R(A o B) o (RA o RB)
4.
K ՝ ¡A o R¡A
5.
K ՝ RA o ¡ A
J.M. Abe and K. Nakamatsu / A Note on Non-Alethic Temporal Logics
6.
219
If K ՝ A then K ՝ RA Theorem 2. Let K be a '1-structure and w W. Then:
1. K,w ⊩ A ⇒ K,w ⊮ *A;
2. K,w ⊮ A ⇒ K,w ⊩ *A;
3. K,w ⊮ A° ⇒ K,w ⊩ A and K,w ⊩ ¬A;
4. K,w ⊮ A* ⇒ K,w ⊮ A and K,w ⊮ ¬A;
5. K,w ⊮ A° ⇒ K,w ⊩ (¬A)°;
6. K,w ⊩ A° ⇒ K,w ⊮ A or K,w ⊮ ¬A.

Lemma 1. Γ ⊢ A ⇒ Γ ⊩ A.
Proof. By induction on the length of the deductions of A from Γ.
Lemma 2. Every non-trivial set of formulas is contained in a non-trivial maximal set. There are non-trivial inconsistent sets.
Lemma 3. Every non-trivial maximal set Γ has a model.

4. Concluding remarks
In the proposed temporal system '1, in some states A and ¬A can both be true, and in some other states B and ¬B can both be false. We can thus represent inconsistent states, paracomplete states, or both in this system. This feature allows a natural description of so-called reactive systems. It should also be noted that annotated temporal systems are useful for program verification and parallel computation. We intend to discuss this further in forthcoming papers.

References
[1] J.M. Abe, K. Nakamatsu and S. Akama, Non-Alethic Reasoning in Distributed Systems, to appear, 2005.
[2] B. Banieqbal and H. Barringer, A Study of an Extended Temporal Language and a Temporal Fixed Point Calculus, Technical Report UMCS-86-10-2, Department of Computer Science, University of Manchester, 1986.
[3] N.C.A. da Costa, Logiques Classiques et non Classiques, Masson, 1996.
[4] E.A. Emerson, Temporal and Modal Logic, in J. van Leeuwen (Ed.), Handbook of Theoretical Computer Science, 995-1072, Elsevier, Amsterdam, 1990.
[5] D. Gabbay, A. Pnueli, S. Shelah and J. Stavi, The Temporal Analysis of Fairness, in Seventh ACM Symposium on Principles of Programming Languages, 163-173, Las Vegas, 1980.
Railway Signal and Paraconsistency
Kazumi Nakamatsu (a) and Jair M. Abe (b)
(a) University of Hyogo, Himeji, Japan, [email protected]
(b) Univ. São Paulo / Paulista University, São Paulo, Brazil, [email protected]

Abstract. In this paper, we propose a complete lattice structure that expresses a paraconsistent view of railway signal knowledge. In this complete lattice, the meanings of railway signal information with respect to railway interlocking are captured well. We have also introduced a simulation system in which a paraconsistent logic program, EVALPSN (Extended Vector Annotated Logic Program with Strong Negation), based on the railway signal lattice is used to express interlocking information.
Keywords. railway interlocking, railway signal display, EVALPSN (Extended Vector Annotated Logic Program with Strong Negation), paraconsistency
1. Introduction

Recently, the treatment of inconsistency has become a crucial issue in many engineering fields. For example, in a multi-agent system there may be contradictions both within each agent and between agents. We have already proposed a paraconsistent logic program, EVALPSN, that can deal with defeasible deontic reasoning [3,4], and provided a railway interlocking simulation system based on EVALPSN [5]. In the railway simulation system, we introduce a paraconsistent view of railway signal information and propose a complete lattice structure that expresses railway signal meanings well. This paper is organized as follows: first, we review EVALPSN briefly and introduce the paraconsistent view of railway signals based on EVALPSN; next, we describe how the railway signal display can be viewed paraconsistently in the EVALP-based railway simulation system.
2. EVALPSN

Generally, a truth value called an annotation is explicitly attached to each literal in annotated logic programs. For example, let p be a literal and µ an annotation; then p : µ is called an annotated literal. The set of annotations constitutes a complete lattice. An annotation in VALPSN [2], which can deal with defeasible reasoning, is a 2-dimensional vector called a vector annotation, each component of which is a non-negative integer, and the lattice Tv of vector annotations is defined as:
K. Nakamatsu and J.M. Abe / Railway Signal and Paraconsistency
Figure 1. Lattice Tv (n = 2) and Lattice Td (the cube over ⊥, α, β, γ, ∗1, ∗2, ∗3, ⊤)
Tv = { (x, y) | 0 ≤ x ≤ n, 0 ≤ y ≤ n, where x, y and n are integers }.

The ordering of the lattice Tv is denoted by the symbol ≤v and defined as follows: let v1 = (x1, y1) ∈ Tv and v2 = (x2, y2) ∈ Tv; then v1 ≤v v2 iff x1 ≤ x2 and y1 ≤ y2. For each vector annotated literal p : (i, j), the first component i of the vector annotation denotes the amount of positive information supporting the literal p and the second component j denotes that of negative information. For example, a vector annotated literal p : (2, 1) can be intuitively interpreted as: the literal p is known to be true with strength 2 and false with strength 1. In order to deal with defeasible deontic reasoning we extended VALPSN to EVALPSN. An annotation in EVALPSN, called an extended vector annotation, has the form [(i, j), µ], where the first component (i, j) is a 2-dimensional vector as in VALPSN and the second component, µ ∈ Td = {⊥, α, β, γ, ∗1, ∗2, ∗3, ⊤}, is an index that represents a deontic notion or inconsistency. The complete lattice Te of extended vector annotations is defined as the product Tv × Td. The ordering of the lattice Td is denoted by the symbol ≤d and described by the Hasse diagram (cube) in Figure 1. The intuitive meaning of each member of the lattice Td is: ⊥ (unknown), α (fact), β (obligation), γ (non-obligation), ∗1 (both fact and obligation), ∗2 (both obligation and non-obligation), ∗3 (both fact and non-obligation) and ⊤ (inconsistent). Therefore, EVALPSN can deal not only with inconsistency between items of ordinary knowledge but also between permission and forbiddance, obligation and forbiddance, and fact and forbiddance. The cube shows that the lattice Td is a tri-lattice in which the direction γ → β represents deontic truth, the direction ⊥ → ∗2 represents the amount of deontic knowledge, and the direction ⊥ → α represents factuality.

Therefore, for example, the annotation β can be intuitively interpreted as deontically truer than the annotation γ, and the annotations ⊥ and ∗2 are deontically neutral, i.e., neither obligation nor non-obligation. The ordering over the lattice Te is denoted by the symbol ≤ and defined as follows: let [(i1, j1), µ1] and [(i2, j2), µ2] be extended vector annotations; then [(i1, j1), µ1] ≤ [(i2, j2), µ2] iff (i1, j1) ≤v (i2, j2) and µ1 ≤d µ2. There are two kinds of epistemic negations, ¬1 and ¬2, in EVALPSN, which are defined as mappings over Tv and Td, respectively.

Definition 1 (Epistemic Negations ¬1 and ¬2)
These epistemic negations, ¬1 and ¬2, can be eliminated by syntactic operations. On the other hand, the ontological negation (strong negation ∼) in EVALPSN can be defined by means of the epistemic negations ¬1 or ¬2, and is interpreted as classical negation.

Definition 2 (Strong Negation) [1]
∼F =def F → ((F → F) ∧ ¬(F → F)), where F is any formula, ¬ is ¬1 or ¬2, and → is classical implication.

Definition 3 (Well Extended Vector Annotated Literal)
Let p be a literal. p : [(i, 0), µ] and p : [(0, j), µ] are called well extended vector annotated literals (weva-literals for short), where i, j ∈ {1, 2} and µ ∈ {α, β, γ}.

Definition 4 (EVALPSN)
If L0, · · · , Ln are weva-literals,
L1 ∧ · · · ∧ Li ∧ ∼Li+1 ∧ · · · ∧ ∼Ln → L0
is called an Extended Vector Annotated Logic Program clause with Strong Negation (EVALPSN clause for short). If it does not include the strong negation, it is called an EVALP clause. An Extended Vector Annotated Logic Program with Strong Negation is a finite set of EVALPSN clauses.

Deontic notions and facts are represented by extended vector annotations in EVALPSN as follows, where m is a positive integer: "fact of strength m" by the annotation [(m, 0), α]; "obligation of strength m" by [(m, 0), β]; "forbiddance of strength m" by [(0, m), β]; "permission of strength m" by [(0, m), γ]. For example, the weva-literal p : [(2, 0), α] can be intuitively interpreted as "it is known that the literal p is a fact of strength 2", and the weva-literal q : [(0, 1), β] as "the literal q is forbidden with strength 1".
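The lattices Tv and Td and the deontic readings above can be sketched as follows. This is our own encoding, not from the paper; in particular the Hasse relations for Td are read off the textual description of the cube (each ∗i sits above the two indices it combines) and are an assumption where the original figure is garbled:

```python
N = 2  # lattice bound for Tv

def leq_v(v1, v2):
    """Vector annotations are ordered componentwise."""
    return v1[0] <= v2[0] and v1[1] <= v2[1]

# Td: bot below everything, top above everything, and each *i above the
# two deontic indices it combines (e.g. *1 = "both fact and obligation").
# ASSUMPTION: relations reconstructed from the paper's prose, not the figure.
TD_COVERS = {("alpha", "star1"), ("beta", "star1"),
             ("beta", "star2"), ("gamma", "star2"),
             ("alpha", "star3"), ("gamma", "star3")}

def leq_d(m1, m2):
    if m1 == m2 or m1 == "bot" or m2 == "top":
        return True
    return (m1, m2) in TD_COVERS

def leq_e(a1, a2):
    """Ordering on Te = Tv x Td: both components must be below."""
    (v1, m1), (v2, m2) = a1, a2
    return leq_v(v1, v2) and leq_d(m1, m2)

# Deontic readings of weva-literal annotations (strength m):
def fact(m):        return ((m, 0), "alpha")
def obligation(m):  return ((m, 0), "beta")
def forbiddance(m): return ((0, m), "beta")
def permission(m):  return ((0, m), "gamma")

assert leq_e(obligation(1), obligation(2))      # weaker obligation below stronger
assert not leq_e(obligation(1), permission(1))  # incomparable vector parts
```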
3. Paraconsistent View for Railway Signal
Generally, there are five display colors for railway signal lights: green (g), green-yellow (gy), yellow (y), yellow-yellow (yy), and red (r). For example, green color indicates that the two track sections ahead of the train have been locked for it and that the train is allowed to run at full speed; yellow color
indicates that only one track section ahead of the train has been locked for it and that the train is allowed to go ahead at less than 45 km/h. Therefore, the meanings of railway signal colors are different from those of traffic signal colors. We therefore propose the complete lattice structure Tv (n = 4) for extended vector annotations in Figure 2, which captures the meaning of railway signal knowledge.
[Figure 2 depicts the complete lattice Tv (n = 4): the vertical axis measures the amount of knowledge, the horizontal axis ranges from slow to fast, with ⊥ at the bottom and the signal colors r, yy, y, gy and g among the lattice elements.]
Figure 2. Railway Signal Lattice
4. Railway Simulator with Signal Operation
In this section, we introduce a prototype simulator for railway systems based on EVALPSN programming, using the simple example in Figure 3, which includes railway signal control. First, we describe the signal control, taking the five signals S0, S1, S2, S3, S4 in Figure 3. We suppose that the railway network is in a station yard and that there are platforms along the tracks T2 and T3. Thus, we also suppose that: the signal S0 is a station yard signal, which has two states, yy (yellow-yellow, which means, for example, "slow down to less than 25 km/h") and r1 (red, which means "stop"); the other four signals S1, S2, S3, S4 are start signals, which have two states, g (green, which means "go") and r2 (red, which means "stop"). These states are represented as the first components of extended vector annotations, as are the states of the other entities. The signal control is then formalized in EVALPSN clauses; since an EVALPSN is a stratified logic program, the strong negation can be treated as negation as failure. [S0]
If it is a fact that the sub-route R02 or R04 is set and the track section T0 is occupied, the signal S0 is yy; otherwise, r1.
[S1] If it is a fact that the sub-route R1 is set and the track section T2 is occupied, the signal S1 is g; otherwise, r2.
[S2] If it is a fact that the sub-route R29 is set and the track section T2 is occupied, the signal S2 is g; otherwise, r2.
[S3] If it is a fact that the sub-route R3 is set and the track section T3 is occupied, the signal S3 is g; otherwise, r2.
Figure 3. Simulation for Safety Verification
[S4]
If it is a fact that the sub-route R49 is set and the track section T3 is occupied, the signal S4 is g; otherwise, r2.
This signal control is formalized in EVALPSN as follows:
R(02) : [s, α] → S(0) : [yy, β],
R(04) : [s, α] → S(0) : [yy, β],
∼ S(0) : [yy, β] → S(0) : [r1, β],
R(1) : [s, α] → S(1) : [g, β],      ∼ S(1) : [g, β] → S(1) : [r2, β],
R(29) : [s, α] → S(2) : [g, β],     ∼ S(2) : [g, β] → S(2) : [r2, β],
R(3) : [s, α] → S(3) : [g, β],      ∼ S(3) : [g, β] → S(3) : [r2, β],
R(49) : [s, α] → S(4) : [g, β],     ∼ S(4) : [g, β] → S(4) : [r2, β].
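The clauses above can be evaluated as a stratified program, with the strong negation read as negation as failure. The following Python sketch (hypothetical names; the route-setting facts stand in for the annotated facts R(·) : [s, α]) computes the signal states in two strata:

```python
# Sketch of the signal-control clauses with negation as failure:
# a signal's non-red state is derived from the route facts (stratum 1);
# the red state is the default when the non-red state is not derivable (stratum 2).

def signal_states(set_routes):
    """set_routes: the set of sub-route names that have been set."""
    states = {}
    # Stratum 1: positive clauses, e.g. R(02):[s,alpha] -> S(0):[yy,beta]
    states["S0"] = "yy" if {"R02", "R04"} & set_routes else None
    for sig, route in [("S1", "R1"), ("S2", "R29"), ("S3", "R3"), ("S4", "R49")]:
        states[sig] = "g" if route in set_routes else None
    # Stratum 2: defaults, e.g. ~S(0):[yy,beta] -> S(0):[r1,beta]
    states["S0"] = states["S0"] or "r1"
    for sig in ["S1", "S2", "S3", "S4"]:
        states[sig] = states[sig] or "r2"
    return states

# Situation of Figure 3: only route R02 has been set.
assert signal_states({"R02"}) == {"S0": "yy", "S1": "r2",
                                  "S2": "r2", "S3": "r2", "S4": "r2"}
```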
Figure 3 shows the following: we have a train B345 (the black box) in the track section T0, which is going through the route R02 that includes the sub-routes T1ca and T2ba; the sub-routes T1ca and T2ba (the black arrows) have already been locked by the train B345, and the route R02 has been set; we also have another train A123 (the white box) in the track section T3, which is supposed to go through the route R3 that includes the sub-routes T1bc and T0ab. In this situation, if the PRR Q3 (for the train A123) is issued to lock the sub-routes T1bc and T0ab (the white arrows), the safety of the PRR Q3 is verified by EVALP programming. Intuitively, as the track section T0 has been occupied by the train B345 and the sub-route T1ca has been locked, there are conflicts between these facts and the safety of the
PRR Q3. Therefore, the route R3 is not permitted to be set safely. The states of the railway interlocking and the results of the safety verification are described in the window of the simulation frame in Figure 3. The second line in the window shows the states of the five signals S0, S1, S2, S3, S4; only S0 is yellow-yellow (yy) because the route R02 has been set, and the other signals are red (r1, r2) because no other route can be set safely or no PRR has been issued. Note that we would have to take the time sequence into account in an actual implementation of the simulation system, although we have not addressed it in this paper, for simplicity and to emphasize EVALPSN-based safety verification.
5. Conclusion
In this paper, we have proposed a complete lattice structure that expresses railway signal knowledge from a paraconsistent point of view. In this complete lattice, the meanings of the railway signal information with regard to railway interlocking are well represented. We have also introduced a simulation system in which EVALPSN based on the railway signal lattice is used to express interlocking information.
References
[1] da Costa, N.C.A., Subrahmanian, V.S., and Vago, C., The Paraconsistent Logics PT, Zeitschrift für Mathematische Logik und Grundlagen der Mathematik 37 (1991) 139–148.
[2] Nakamatsu, K., Abe, J.M., and Suzuki, A., Defeasible Reasoning Between Conflicting Agents Based on VALPSN, Proc. AAAI Workshop Agents' Conflicts, AAAI Press (1999) 20–27.
[3] Nakamatsu, K., Abe, J.M., and Suzuki, A., A Defeasible Deontic Reasoning System Based on Annotated Logic Programming, Proc. the Fourth International Conference on Computing Anticipatory Systems, AIP Conference Proceedings 573, AIP (2001) 609–620.
[4] Nakamatsu, K., Abe, J.M., and Suzuki, A., Annotated Semantics for Defeasible Deontic Reasoning, Proc. the Second International Conference on Rough Sets and Current Trends in Computing, LNAI 2005, Springer-Verlag (2001) 432–440.
[5] Nakamatsu, K., Abe, J.M., and Suzuki, A., A Railway Interlocking Safety Verification System Based on Abductive Paraconsistent Logic Programming, Soft Computing Systems, Frontiers in AI and Applications 87, IOS Press (2002) 775–784.
– Workshop “Rough Sets and Granularity” – T. Murai and M. Inuiguchi
On topological properties of generalized rough sets 1
Michiro Kondo a,2
a Tokyo Denki University, Japan
Abstract. In this note we consider topological properties of generalized rough sets. For any approximation space (X, R) of a generalized rough set we show that:
1. If R is reflexive, then the set OR = {A ⊆ X | R−(A) = A} defined by R− is a topology on X;
2. If R is reflexive and symmetric, then the topology OR has the following property (Clop): A is open ⇐⇒ A is closed;
3. Conversely, for any topology O with (Clop), there is a reflexive and symmetric relation R on X such that O = OR.
Keywords. rough sets, binary relation, tolerance, topology
1. Preliminaries
Let (X, R) be an approximation space of a generalized rough set, that is, X is a non-empty set and R a relation on X. An operator R− : P(X) → P(X) is defined by
R−(A) = {x ∈ X | ∀y (xRy → y ∈ A)}  (∀A ⊆ X).
For the operator R− we have (cf. [4])
Proposition 1.
(1) A ⊆ B =⇒ R−(A) ⊆ R−(B)
(2) R−(⋂λ Aλ) = ⋂λ R−(Aλ)
(3) ⋃λ R−(Aλ) ⊆ R−(⋃λ Aλ)
Now we consider a family OR of subsets, which is constructed by the operator R−:
OR = {A ⊆ X | R−(A) = A}.
It holds that
Lemma 1. If R is reflexive, then OR is a topology on X satisfying the property (IP):
1 This work was supported by a Grant-in-Aid for Scientific Research (No. 15500016), Japan Society for the Promotion of Science.
2 Correspondence to: Michiro Kondo, School of Information Environment, Tokyo Denki University, Inzai 270-1382, Japan. Tel.: +81 476 46 8457; E-mail: [email protected]
230
M. Kondo / On Topological Properties of Generalized Rough Sets
(IP) Aλ ∈ OR (λ ∈ Λ) implies ⋂λ Aλ ∈ OR.
Proof. First we show that OR is a topology on X if R is reflexive. We have to show that
(1) ∅, X ∈ OR;
(2) if A, B ∈ OR, then A ∩ B ∈ OR;
(3) if Oλ ∈ OR (λ ∈ Λ), then ⋃λ Oλ ∈ OR.
We only show case (3). Since R is reflexive, it is obvious that R−(⋃λ Oλ) ⊆ ⋃λ Oλ. Suppose that x ∈ ⋃λ Oλ and Oλ ∈ OR (λ ∈ Λ). There exists a µ ∈ Λ such that x ∈ Oµ = R−(Oµ). For every y such that xRy, we have y ∈ Oµ ⊆ ⋃λ Oλ, and hence x ∈ R−(⋃λ Oλ). Hence ⋃λ Oλ ⊆ R−(⋃λ Oλ). This means that
⋃λ Oλ = R−(⋃λ Oλ).
Thus the set OR = {A ⊆ X | R−(A) = A} is a topology on X. Moreover, it is clear from Proposition 1 (2) that the topology OR satisfies the property (IP).
Now we have the converse question: for any topological space (X, O) with (IP), is there a relation R such that O = OR = {A ⊆ X | R−(A) = A}? In general, when a topology O is given, we can define an interior operator I by
IA = ⋃{O ∈ O | O ⊆ A}  (A ⊆ X).
That is, IA is the greatest open subset contained in A. The following holds for every topology with (IP).
Lemma 2. Let O be a topology with (IP). For every family of subsets Aλ (λ ∈ Λ), I(⋂λ Aλ) = ⋂λ I(Aλ).
We can now answer the converse problem of Lemma 1: for any topological space (X, O) with (IP), is there a relation R such that O = OR = {A ⊆ X | R−(A) = A}?
Theorem 1. For every topological space (X, O) with (IP), there is a relation R on X such that O = OR = {A ⊆ X | R−(A) = A}.
Proof. Let (X, O) be a topological space with (IP). We define a relation RO as follows: for all x, y ∈ X,
(x, y) ∈ RO ⇐⇒ ∀B ⊆ X (x ∈ IB → y ∈ B).
It is easy to prove that for all subsets B, C ⊆ X,
M. Kondo / On Topological Properties of Generalized Rough Sets
231
(a) IB ∈ O; (b) IB ⊆ B; (c) B ⊆ C implies IB ⊆ IC; (d) B ∈ O ⇐⇒ B = IB.
It is obvious that RO is reflexive. For every subset A ⊆ X, we claim that IA = (RO)−(A). Since IA ⊆ (RO)−(A) by the definition of RO, it is sufficient to show the converse inclusion. Let x ∉ IA. If we take Γ = ⋂{B | x ∈ IB} ∩ Aᶜ, then Γ ≠ ∅. Indeed, if Γ = ∅, that is, ⋂{B | x ∈ IB} ∩ Aᶜ = ∅, then we have ⋂{B | x ∈ IB} ⊆ A and hence I(⋂{B | x ∈ IB}) ⊆ IA. By Lemma 2, it follows that ⋂{IB | x ∈ IB} ⊆ IA and hence x ∈ IA. But this is a contradiction. So we can conclude that Γ ≠ ∅. There exists an element y ∈ Γ. For that element, we have (x, y) ∈ RO and y ∉ A. This implies that x ∉ (RO)−(A). Thus we have (RO)−(A) ⊆ IA and hence (RO)−(A) = IA. This yields that O = OR.
2. Reflexive and Symmetric Relation
Before considering the case of R being reflexive and symmetric, we study the familiar case of R being reflexive and transitive. The following is a well-known result.
Theorem 2. If R is reflexive and transitive, then the operator R− is an interior operator, that is, for all A, B ⊆ X:
(1) R−(A) ⊆ A; (2) if A ⊆ B, then R−(A) ⊆ R−(B); (3) R−(R−(A)) = R−(A).
Of course, since R is reflexive, the topology OR constructed by R− satisfies (IP). Conversely, an argument similar to that of Theorem 1 gives the next result.
Theorem 3. For any topology OI constructed by an interior operator I with (IP), there exists a reflexive and transitive relation R such that OI = OR.
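On a small finite universe these statements can be checked mechanically. The following Python sketch (hypothetical helper names) computes R− and the family OR for a reflexive relation and verifies the topology axioms; in the finite case the property (IP) coincides with closure under finite intersections:

```python
from itertools import combinations

def r_lower(R, A, X):
    """R_-(A) = {x in X | for all y with x R y, y in A}."""
    return frozenset(x for x in X if all(y in A for (u, y) in R if u == x))

def powerset(X):
    xs = list(X)
    return [frozenset(c) for n in range(len(xs) + 1) for c in combinations(xs, n)]

X = frozenset({1, 2, 3})
R = {(x, x) for x in X} | {(1, 2), (2, 3)}   # reflexive, neither symmetric nor transitive
O_R = {A for A in powerset(X) if r_lower(R, A, X) == A}

assert frozenset() in O_R and X in O_R                   # axiom (1)
assert all(A & B in O_R for A in O_R for B in O_R)       # axiom (2), and (IP) in the finite case
assert all(A | B in O_R for A in O_R for B in O_R)       # axiom (3), finite unions
```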
Remark. If R is reflexive and transitive, then since R− is an interior operator, we can construct a topology OR from R. But this topology has the extra property (IP): for any family {Oλ} of open sets, the intersection ⋂λ Oλ also becomes an open set. This property does not hold in every topology. For example, let X be the set of all real numbers with the usual topology, and take the subsets
232
M. Kondo / On Topological Properties of Generalized Rough Sets
An = (−1/n, 1 + 1/n)  (n ∈ ℕ).
It is obvious that every An is an open set. But their intersection
⋂n An = ⋂n (−1/n, 1 + 1/n) = [0, 1]
is not an open set. As the example shows, the topological space constructed by a reflexive and transitive relation R is very different from the usual one defined by the interior operator.
In the following we consider the case of R being reflexive and symmetric. We can characterize the topological space defined by such a relation.
Proposition 2. Let R be a reflexive and symmetric relation on X. For every A ⊆ X, we have R−(A) = A ⇐⇒ Aᶜ = R−(Aᶜ).
Proof. Suppose that R−(A) = A. It is sufficient to show that Aᶜ ⊆ R−(Aᶜ). If x ∉ R−(Aᶜ), then there exists y such that xRy but y ∉ Aᶜ, and hence y ∈ A = R−(A). Since R is symmetric, we have yRx and hence x ∈ A, that is, x ∉ Aᶜ. This means that Aᶜ ⊆ R−(Aᶜ). The converse can be proved similarly.
From the above we can show that
Theorem 4. If R is reflexive and symmetric, then the topology constructed by R− has the property (Clop):
(Clop) A is an open set (A ∈ OR) ⇐⇒ A is a closed set (Aᶜ ∈ OR).
Conversely,
Theorem 5. For any topology O with (Clop), there exists a reflexive and symmetric relation R such that O = OR.
Proof. Let O be a topology with (Clop). We define I and RO as in the proof of Theorem 1:
IB = ⋃{O ∈ O | O ⊆ B},
(x, y) ∈ RO ⇐⇒ ∀B ⊆ X (x ∈ IB → y ∈ B).
It is obvious that RO is reflexive and symmetric. By the definition of RO, we have IA ⊆ (RO)−(A). Conversely, suppose that x ∉ IA. If we take Γ = ⋂{B | x ∈ IB} ∩ Aᶜ, then we can conclude Γ ≠ ∅. Otherwise, ⋂{B | x ∈ IB} ∩ Aᶜ = ∅. Since IB ⊆ B, this implies that ⋂{IB | x ∈ IB} ∩ Aᶜ = ∅, that is, ⋂{IB | x ∈ IB} ⊆ A. Hence we
M. Kondo / On Topological Properties of Generalized Rough Sets
233
have I(⋂{IB | x ∈ IB}) ⊆ IA. It follows from (Clop) that each IB is closed, and hence that I(⋂{IB | x ∈ IB}) = ⋂{I(IB) | x ∈ IB} = ⋂{IB | x ∈ IB}. This means that x ∈ ⋂{IB | x ∈ IB} ⊆ IA. But this contradicts the assumption x ∉ IA. Thus we can conclude that Γ ≠ ∅. Since y ∈ Γ for some y, we have (x, y) ∈ RO and y ∈ Aᶜ by the definition of Γ. This yields that x ∉ (RO)−(A). Thus we have (RO)−(A) ⊆ IA and hence (RO)−(A) = IA. This indicates that O = OR.
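Theorems 4 and 5 can likewise be illustrated on a finite example. The Python sketch below (hypothetical names) builds OR from a reflexive and symmetric, non-transitive relation, checks (Clop), and recovers a relation RO from the topology as in the proof of Theorem 5:

```python
from itertools import combinations

def r_lower(R, A, X):
    return frozenset(x for x in X if all(y in A for (u, y) in R if u == x))

def powerset(X):
    xs = list(X)
    return [frozenset(c) for n in range(len(xs) + 1) for c in combinations(xs, n)]

X = frozenset({1, 2, 3, 4})
# Reflexive and symmetric, but not transitive: 1~2 and 2~3, yet not 1~3.
R = {(x, x) for x in X} | {(1, 2), (2, 1), (2, 3), (3, 2)}
O_R = {A for A in powerset(X) if r_lower(R, A, X) == A}

# (Clop): A is open iff its complement is open (Theorem 4)
assert all(X - A in O_R for A in O_R)

# Recover R_O as in the proof of Theorem 5: x R_O y iff y lies in every B with x in IB.
def interior(A):
    return frozenset().union(*(O for O in O_R if O <= A))

R_O = {(x, y) for x in X for y in X
       if all(y in B for B in powerset(X) if x in interior(B))}
assert {A for A in powerset(X) if r_lower(R_O, A, X) == A} == O_R   # O = O_{R_O}
```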
3. Conclusion
We have considered the relationship between the approximation spaces of generalized rough sets and topologies. Since the operator R− defined by a relation R preserves infima in the power set lattice, that is, R−(⋂λ Aλ) = ⋂λ R−(Aλ), the set OR defined by R− becomes a topology under the condition of R being reflexive alone.
Acknowledgements The author would like to thank Professor Andreas Hamfelt in Uppsala University Sweden for his help and support for long time. He and Uppsala University offered me comfortable time to study the theory of rough sets.
References
[1] T.B. Iwinski, Algebraic approach to rough sets, Bull. Pol. Ac. Math., 35 (1987), 673–683.
[2] J. Järvinen, Knowledge representation and rough sets, TUCS Dissertations 14 (Turku Centre for Computer Science, Turku, Finland), (1999).
[3] M. Kondo, On the structure of generalized rough sets, submitted.
[4] Z. Pawlak, Rough sets, Int. J. Inform. Comp. Sci., 11 (1982), 341–356.
[5] L. Polkowski, Rough Sets, Physica-Verlag, Springer, 2002.
[6] Y.Y. Yao, Constructive and algebraic methods of the theory of rough sets, Information Sciences 109 (1998), 21–47.
Rough-set-based approaches to data containing incomplete information: possibility-based cases
Michinori Nakata a,1 and Hiroshi Sakai b
a Josai International University
b Kyushu Institute of Technology
Abstract. Rough-set-based methods for data containing incomplete information are examined under a possibility-based interpretation, with respect to whether a correctness criterion is satisfied or not. The correctness criterion is to give the same results as methods by possible tables. The methods proposed so far do not give the same results as methods by possible tables. Therefore, we show a new formula not using implication operators in methods by valued tolerance relations. The formula yields results that agree with the ones obtained from using possible tables. Thus, by using the formula, the methods by valued tolerance relations satisfy the correctness criterion.
Keywords. Rough sets, Incomplete information, Missing values, Possibility-based interpretation
1. Introduction
Rough sets, proposed by Pawlak [8], give suitable methods for knowledge discovery from data. Usually, approaches based on rough sets are applied to complete data not containing uncertainty and imprecision. However, uncertainty and imprecision exist ubiquitously in the real world [7]. Research on handling uncertainty and imprecision is active in the field of databases [7], but less so in the field of knowledge discovery. Recently, a method directly and strictly handling incomplete information by rough sets has been proposed [11]. On the other hand, a method based on possible tables has been proposed [9]. This method applies the conventional methods based on rough sets to the possible data obtained from dividing an incomplete table into possible tables, and then aggregates the obtained results. The former methods have to give the same results as the latter. This requirement is called a strong representation system or a strong correctness criterion in the field of incomplete databases [4,12]. We apply the strong correctness criterion as a correctness criterion to methods based on rough sets. Thus, we check whether the method proposed so far to directly handle incomplete information by rough sets satisfies the correctness criterion or not. When incomplete information is probabilistically interpreted, Nakata clarifies that the correctness criterion is not satisfied for the method proposed so
1 Correspondence to: Michinori Nakata, Josai International University, 1, Gumyo, Togane, Chiba 283-8555, Japan. Tel.: +81-475-53-2121; Fax: +81-475-55-8811; E-mail: [email protected].
M. Nakata and H. Sakai / Rough-Set-Based Approaches to Data Containing Incomplete Information 235
far [6]. In this paper, we focus on cases in which incomplete information is possibilistically interpreted.
2. Approaches based on rough sets
In a table t consisting of a set A (= {A1, . . . , An}) of attributes, the indiscernibility relation IND(X) for a subset X of attributes is
IND(X) = {(o, o′) | ∀Ai ∈ X o[Ai] = o′[Ai]},
where o[Ai] and o′[Ai] are attribute values of objects o and o′ for an attribute Ai, respectively. Suppose that the family of all equivalence classes obtained from the indiscernibility relation IND(X) is denoted by E(X) (= {E(X)1, . . . , E(X)m}), where E(X)i is an equivalence class. When every value of the attributes comprising X is exact, E(X)i ∩ E(X)j = ∅ for i ≠ j. Thus, the objects are uniquely partitioned. The indiscernible set S(X)o ∈ E(X) for a value o[X] of an object o over X is
S(X)o = {o′ | ∀Ai ∈ X o[Ai] = o′[Ai]}.
The lower approximation IND(Y, X) and the upper approximation IND(Y, X) of IND(Y) by IND(X) are expressed by means of indiscernible sets as follows:
IND(Y, X) = {o′ | S(X)o′ ⊆ S(Y)o′},
IND(Y, X) = {o′ | S(X)o′ ∩ S(Y)o′ ≠ ∅}.
Since the objects are restricted to those contained in one table, the above formulas reduce to
IND(Y, X) = {o | S(X)o ⊆ S(Y)o},
IND(Y, X) = {o | S(X)o ∩ S(Y)o ≠ ∅}.
When an object o contains incomplete information for some attributes, it does not always take the same actual value as another object o′, even if both objects have the same expression. We can obtain the degree to which the object o takes the same actual value as the object o′. This degree is an indiscernibility degree of the object o with the object o′. The indiscernible set S(X)o is replaced as follows:
S(X)o = {(o′, EQ(o[X], o′[X])) | ∀Ai ∈ X EQ(o[Ai], o′[Ai]) ≠ 0 ∧ o′ ≠ o} ∪ {(o, 1)},
where EQ(o[X], o′[X]) is an indiscernibility degree of o[X] with o′[X] over X, and
EQ(o[X], o′[X]) = min_{Ai ∈ X} EQ(o[Ai], o′[Ai]).
The lower approximation IND(Y, X) and the upper approximation IND(Y, X) of IND(Y) by IND(X) are then expressed by means of indiscernible sets as follows:
IND(Y, X) = {(o, κ(S(X)o ⊆ S(Y)o)) | κ(S(X)o ⊆ S(Y)o) > 0},
IND(Y, X) = {(o, κ(S(X)o ∩ S(Y)o ≠ ∅)) | κ(S(X)o ∩ S(Y)o ≠ ∅) > 0},
where κ(F) denotes the degree to which F is satisfied. In order to estimate to what extent the approximation is correct, a measure called the quality of approximation is used. This measure expresses to what degree a dependency of attributes Y on attributes X holds [8]; in other words, to what degree a table t satisfies a dependency X ⇒ Y. The degree is
κ(X ⇒ Y)t = |IND(Y, X)|/|t|,
where |t| is the cardinality of the table t, which is equal to the total number of objects in the table t. This degree can also be calculated by summing the degrees to which each object o in the table t satisfies X ⇒ Y. The degree κ(X ⇒ Y)o to which an object o satisfies X ⇒ Y is
κ(X ⇒ Y)o = κ(S(X)o ⊆ S(Y)o).
Using this degree, κ(X ⇒ Y)t = Σo∈t κ(X ⇒ Y)o / |t|. In the next section, we calculate the degree κ(X ⇒ Y)o of a dependency X ⇒ Y for each object o under rough-set-based methods. This means obtaining the degree with which each object o belongs to the lower approximation IND(Y, X).
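For a complete table, the definitions above reduce to a few set comparisons; a minimal Python sketch (hypothetical names):

```python
def indiscernible_set(table, o, attrs):
    """S(X)_o: the objects agreeing with o on every attribute in attrs."""
    return {o2 for o2 in table if all(table[o2][a] == table[o][a] for a in attrs)}

def dependency_degree(table, X, Y):
    """kappa(X => Y)_t = |{o : S(X)_o subset of S(Y)_o}| / |t|."""
    lower = [o for o in table
             if indiscernible_set(table, o, X) <= indiscernible_set(table, o, Y)]
    return len(lower) / len(table)

# A small complete table: A determines B except for objects 3 and 4.
t = {1: {"A": "x", "B": "a"},
     2: {"A": "x", "B": "a"},
     3: {"A": "y", "B": "b"},
     4: {"A": "y", "B": "a"}}
assert dependency_degree(t, ["A"], ["B"]) == 0.5
```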
3. Methods handling incomplete information
Some pioneering work on handling incomplete information by using rough sets was done by Słowiński and Stefanowski [10] and Grzymala-Busse [3]. When we handle a table containing incomplete information, the obtained equivalence classes overlap each other; namely, E(X)i ∩ E(X)j ≠ ∅ for some i ≠ j. Recently, several investigations have been made on this topic. Kryszkiewicz applies rough sets to data containing incomplete information by interpreting a missing value expressing "unknown" as indiscernible from an arbitrary value [5]. In this method an object in which some attribute values are missing is indiscernible from every object on those attributes. Słowiński and Tsoukiàs apply rough sets to a table containing incomplete information by making an indiscernibility relation from the viewpoint that an object with an exact attribute value is similar to another object in which that attribute value is missing, but the converse does not hold [11]. The above two methods handle incomplete information by deriving an indiscernibility relation from giving a missing value an interpretation for indiscernibility, and then applying the conventional method of rough sets to the indiscernibility relation. The effect of the number of domain elements on the indiscernibility of missing values is not taken into account. Thus, these methods only approximately deal with missing values under some interpretations. Stefanowski and Tsoukiàs make an indiscernibility relation by introducing the degree to which two objects cannot be discerned, under the premise that an attribute can equally take an arbitrary value in the corresponding domain when the attribute value is missing [11]. This method strictly deals with missing values. In the method, they use implication operators in calculating the inclusion degree of two indiscernible sets. Active research is done on incomplete information in the field of databases [7].
Some extensions have to be made to operators in order to deal directly with incomplete information. In order to check whether the extended operators create correct results in
query processing or not, a criterion called a strong representation system or a strong correctness criterion is used [4,12]. We adopt this criterion as a correctness criterion for rough-set-based methods. Directly dealing with tables containing incomplete information can be regarded as equivalent to extending the conventional method applied to tables not containing incomplete information. The correctness criterion is checked as follows:
• Derive the set of possible tables from a table containing incomplete information.
• Aggregate the results obtained from applying the conventional method to each possible table.
• Compare the aggregated results with the ones obtained from directly applying the extended method to the table.
When the two results coincide, the correctness criterion is satisfied. In the next section, we examine the correctness of the methods proposed so far according to this criterion through calculating a degree of dependency.
4. Comparative studies on methods handling incomplete information
4.1. Methods by possible tables
We suppose that a table t containing incomplete information is given as follows:

t:  O  A  B
    1  x  a
    2  x  a
    3  @  b
    4  @  a
Here, attribute O denotes the object identity and @ denotes a missing value that means "unknown". Suppose that the domains dom(A) and dom(B) of attributes A and B are {x, y} and {a, b}, respectively. When a missing value of an attribute is possibilistically interpreted, it is expressed as a uniform possibility distribution with possibility degree 1 for every element of the domain of the attribute. For example, the attribute values o3[A] and o4[A] denoted by @ are expressed as the possibility distribution {(x, 1), (y, 1)}p. The possible tables obtained from table t are those in which every missing value @ is replaced by an element of the possibility distribution that expresses the missing value. The following four possible tables with possibility degree 1 are derived:

Poss(t)1: O A B    Poss(t)2: O A B    Poss(t)3: O A B    Poss(t)4: O A B
          1 x a              1 x a              1 x a              1 x a
          2 x a              2 x a              2 x a              2 x a
          3 x b              3 x b              3 y b              3 y b
          4 x a              4 y a              4 x a              4 y a
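The enumeration of possible tables and the per-object contributions can be replayed mechanically; the following Python sketch (hypothetical helper names) generates the possible tables and computes which objects contribute to A ⇒ B in each:

```python
from itertools import product

DOM = {"A": ["x", "y"], "B": ["a", "b"]}
t = {1: {"A": "x", "B": "a"}, 2: {"A": "x", "B": "a"},
     3: {"A": "@", "B": "b"}, 4: {"A": "@", "B": "a"}}

def possible_tables(table):
    """Replace every missing value @ by each element of its domain."""
    slots = [(o, a) for o in table for a in table[o] if table[o][a] == "@"]
    for vals in product(*(DOM[a] for (o, a) in slots)):
        pt = {o: dict(table[o]) for o in table}
        for (o, a), v in zip(slots, vals):
            pt[o][a] = v
        yield pt

def contributors(pt, X, Y):
    """Objects o with S(X)_o a subset of S(Y)_o in a complete table pt."""
    S = lambda o, attrs: {o2 for o2 in pt if all(pt[o2][a] == pt[o][a] for a in attrs)}
    return {o for o in pt if S(o, X) <= S(o, Y)}

profiles = [contributors(pt, ["A"], ["B"]) for pt in possible_tables(t)]
# Per-object possibility degree: the max over the possible tables.
kappa = {o: max(1 if o in c else 0 for c in profiles) for o in t}
assert kappa == {1: 1, 2: 1, 3: 1, 4: 1}
```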
By using these four possible tables, the original table t is expressed in the following possibility distribution:
238 M. Nakata and H. Sakai / Rough-Set-Based Approaches to Data Containing Incomplete Information
t = {(Poss(t)1, 1), (Poss(t)2, 1), (Poss(t)3, 1), (Poss(t)4, 1)}p.
We examine the contributions of each object oi to a dependency A ⇒ B. No object contributes to A ⇒ B in Poss(t)1. Only the fourth object contributes to A ⇒ B in Poss(t)2. All the objects contribute to A ⇒ B in Poss(t)3. The first and second objects contribute to A ⇒ B in Poss(t)4. Thus, the contributions of the objects to A ⇒ B are expressed in the same possibility distribution {(0, 1), (1, 1)}p, and the possibility degrees to which each object oi satisfies A ⇒ B are:
κ(A ⇒ B)o1 = κ(A ⇒ B)o2 = κ(A ⇒ B)o3 = κ(A ⇒ B)o4 = 1.
Consequently, κ(A ⇒ B)t = (1 + 1 + 1 + 1)/4 = 1.
In the following subsections we examine whether the same value κ(A ⇒ B)oi is obtained for each object oi by the methods proposed so far.
4.2. Methods by valued tolerance relations
Stefanowski and Tsoukiàs [11] take a probabilistic interpretation of missing values: when an attribute value is missing, the actual value is one of the elements of the domain of the attribute, and which element is the actual value does not depend on a specified element; in other words, each element has the same degree of being the actual value. Under this interpretation, the indiscernibility relation is called a valued tolerance relation. We check their method under a possibility-based interpretation of missing values. When missing values are expressed in possibility distributions, the indiscernibility relations IND(A) and IND(B) for attributes A and B in table t are, respectively,

IND(A):  1 1 1 1      IND(B):  1 1 0 1
         1 1 1 1               1 1 0 1
         1 1 1 1               0 0 1 0
         1 1 1 1               1 1 0 1

The indiscernible sets of the objects for attribute A are:
S(A)o1 = {(o1, 1), (o2, 1), (o3, 1), (o4, 1)},
S(A)o2 = {(o1, 1), (o2, 1), (o3, 1), (o4, 1)},
S(A)o3 = {(o1, 1), (o2, 1), (o3, 1), (o4, 1)},
S(A)o4 = {(o1, 1), (o2, 1), (o3, 1), (o4, 1)}.
The indiscernible sets of the objects for attribute B are:
S(B)o1 = {(o1, 1), (o2, 1), (o4, 1)},
S(B)o2 = {(o1, 1), (o2, 1), (o4, 1)},
S(B)o3 = {(o3, 1)},
S(B)o4 = {(o1, 1), (o2, 1), (o4, 1)}.
Suppose that an object o belongs to sets S and S′ with degrees Po,S and Po,S′, respectively. The degree κ(S ⊆ S′) to which the set S is included in another set S′ is
M. Nakata and H. Sakai / Rough-Set-Based Approaches to Data Containing Incomplete Information 239
κ(S ⊆ S′) = min_{o ∈ S} κ(o ∈ S → o ∈ S′).
In this formula, the inclusion degree of two sets is calculated by means of an implication operator →. We calculate the cases using the Gödel, Kleene–Dienes, and Łukasiewicz implication operators, which are representatives of the R, S, and R-S implication operators [1]. Now, S and S′ are S(A)oi and S(B)oi, respectively. Using the Gödel implication operator, the contributions of the objects are as follows:
κ(A ⇒ B)o1 = min(1, 1, 0, 1) = 0,
κ(A ⇒ B)o2 = min(1, 1, 0, 1) = 0,
κ(A ⇒ B)o3 = min(0, 0, 1, 0) = 0,
κ(A ⇒ B)o4 = min(1, 1, 0, 1) = 0.
Thus, for the degree of the dependency A ⇒ B,
κ(A ⇒ B)t = (0 + 0 + 0 + 0)/4 = 0.
The obtained values κ(A ⇒ B)oi and κ(A ⇒ B)t are not equal to the ones obtained from possible tables. The same holds for the results obtained using the Kleene–Dienes and Łukasiewicz implication operators.
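The computation above can be reproduced in a short Python sketch (hypothetical names; the membership degrees are the ones listed for S(A)oi and S(B)oi):

```python
# Membership degrees of the objects in S(A)_oi and S(B)_oi.
S_A = {i: {1: 1, 2: 1, 3: 1, 4: 1} for i in range(1, 5)}
S_B = {1: {1: 1, 2: 1, 3: 0, 4: 1},
       2: {1: 1, 2: 1, 3: 0, 4: 1},
       3: {1: 0, 2: 0, 3: 1, 4: 0},
       4: {1: 1, 2: 1, 3: 0, 4: 1}}

def godel(a, b):
    """Goedel implication: a -> b = 1 if a <= b, else b."""
    return 1 if a <= b else b

def inclusion(S, S2):
    """kappa(S subset of S') = min over o of (o in S -> o in S')."""
    return min(godel(S[o], S2[o]) for o in S)

kappas = {i: inclusion(S_A[i], S_B[i]) for i in range(1, 5)}
assert kappas == {1: 0, 2: 0, 3: 0, 4: 0}   # disagrees with the possible-table result 1
```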
5. Methods satisfying the correctness criterion
Why does the method by Stefanowski and Tsoukiàs fail to satisfy the correctness criterion? Stefanowski and Tsoukiàs calculate the inclusion degree of two sets, to which each element belongs with a degree, as follows:
• Calculate to what degree every element belonging to one set also belongs to the other set, by using an implication operator.
• Aggregate the obtained degrees together.
In this process, the total inclusion degree is obtained by aggregating the inclusion degrees for every element. This is valid under the condition that the inclusion degree for an element is determined independently of the other elements. Is this valid in the present situation? In the previous section, the degree κ(A ⇒ B)oi of the dependency A ⇒ B for every object oi does not coincide with the degree obtained from possible tables. This is due to not taking into account the fact that when an attribute value of an object is missing, the object simultaneously has both the possibility that it is equal to another object and the possibility that it is not equal to that object for the attribute. For example, in the indiscernible set S(A)o1 of the object o1 for the attribute A there exist four cases: o1 = o2 ≠ o3 and o1 = o2 ≠ o4; o1 = o2 = o3 ≠ o4; o1 = o2 = o4 ≠ o3; and o1 = o2 = o3 = o4. These cases have the same possibility degree 1. Therefore, all the objects in an indiscernible set have to be dealt with simultaneously, not separately. This shows that the condition described above is not valid in the present situation. Considering this viewpoint, we propose a new formula for calculating κ(X ⇒ Y)oi. Let ps(X)oi,l be an element of the power set PS(X)oi of S(X)oi\oi.
κ(X ⇒ Y)oi = κ(S(X)oi ⊆ S(Y)oi)
= max_l min( κ( ∧_{o′ ∈ ps(X)oi,l} (oi[X] = o′[X]) ∧ ∧_{o′ ∉ ps(X)oi,l} (oi[X] ≠ o′[X]) ),
             κ( ∧_{o′ ∈ ps(X)oi,l} (oi[Y] = o′[Y]) ) ),
where κ(f) is the degree to which a formula f is valid and κ(f) = 1 when there is no f. In this formula, all the elements of an indiscernible set are handled simultaneously. The first term denotes the possibility degree with which the objects in ps(X)oi,l are indiscernible from oi and the others are discernible for a set X of attributes. The second term denotes the possibility degree with which the objects that are indiscernible for X are also indiscernible for a set Y of attributes. Therefore, the possibility degree of inclusion for the two sets is correctly calculated.
We recalculate the degree of the dependency A ⇒ B in table t. For the object o1, S(A)o1\o1 = {(o2, 1), (o3, 1), (o4, 1)}. The power set PS(X)o1 of S(A)o1\o1 is
PS(X)o1 = {∅, {(o2, 1)}, {(o3, 1)}, {(o4, 1)}, {(o2, 1), (o3, 1)}, {(o2, 1), (o4, 1)}, {(o3, 1), (o4, 1)}, {(o2, 1), (o3, 1), (o4, 1)}}.
We omit all the elements containing o3, because κ(o1[B] = o3[B]) = 0.
For the element ∅,
κ(o1[A] ≠ o2[A] ∧ o1[A] ≠ o3[A] ∧ o1[A] ≠ o4[A]) = 0.
For the element {(o2, 1)},
κ(o1[A] = o2[A] ∧ o1[A] ≠ o3[A] ∧ o1[A] ≠ o4[A]) = 1, κ(o1[B] = o2[B]) = 1.
For the element {(o4, 1)},
κ(o1[A] ≠ o2[A] ∧ o1[A] ≠ o3[A] ∧ o1[A] = o4[A]) = 0.
For the element {(o2, 1), (o4, 1)},
κ(o1[A] = o2[A] ∧ o1[A] ≠ o3[A] ∧ o1[A] = o4[A]) = 1, κ(o1[B] = o2[B] ∧ o1[B] = o4[B]) = 1.
Thus,
κ(X ⇒ Y)o1 = max(0, min(1, 1), 0, 0, 0, min(1, 1), 0, 0) = 1.
Similarly, κ(X ⇒ Y)o2 = κ(X ⇒ Y)o3 = κ(X ⇒ Y)o4 = 1. The obtained results coincide with the ones from possible tables.
Proposition. When the new formula is used, methods by valued tolerance relations satisfy the correctness criterion.
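As a sketch of the new formula (Python, hypothetical names), each subset ps(X)oi,l of S(X)oi\oi is scored by the possibility that exactly its members agree with oi on X and that they also agree on Y. Note that each conjunction is evaluated atom-wise with min, which suffices for this example:

```python
from itertools import combinations

DOM = {"A": ["x", "y"], "B": ["a", "b"]}
t = {1: {"A": "x", "B": "a"}, 2: {"A": "x", "B": "a"},
     3: {"A": "@", "B": "b"}, 4: {"A": "@", "B": "a"}}

def poss_eq(u, v, dom):
    """Possibility degrees that two (possibly missing) values are equal / unequal."""
    us = dom if u == "@" else [u]
    vs = dom if v == "@" else [v]
    eq = 1 if set(us) & set(vs) else 0
    ne = 1 if len(set(us) | set(vs)) > 1 else 0
    return eq, ne

def kappa_obj(oi, X, Y):
    others = [o for o in t if o != oi]
    best = 0
    for r in range(len(others) + 1):
        for ps in combinations(others, r):   # a subset ps(X)_{oi,l}
            # possibility that oi agrees with ps on X and disagrees with the rest
            eq_x = min((poss_eq(t[oi][a], t[o][a], DOM[a])[0] for o in ps for a in X), default=1)
            ne_x = min((poss_eq(t[oi][a], t[o][a], DOM[a])[1]
                        for o in others if o not in ps for a in X), default=1)
            # possibility that the objects agreeing on X also agree on Y
            eq_y = min((poss_eq(t[oi][a], t[o][a], DOM[a])[0] for o in ps for a in Y), default=1)
            best = max(best, min(eq_x, ne_x, eq_y))
    return best

assert all(kappa_obj(o, ["A"], ["B"]) == 1 for o in t)   # agrees with the possible tables
```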
6. Conclusions We have examined rough-set-based methods for calculating the degree of a dependency, a measure of the quality of approximation, in tables containing missing values under the possibility-based interpretation. The method by Stefanowski and Tsoukiàs does not simultaneously handle all the objects in a discernible set. The example shows that their method does not satisfy the correctness criterion, which corresponds to the strong correctness criterion in the field of databases. Therefore, we have proposed a new formula in which all the objects in a discernible set are simultaneously dealt with. With this formula, methods based on valued tolerance relations satisfy the correctness criterion.
Acknowledgements This research has been partially supported by the Grant-in-Aid for Scientific Research (C), Japanese Ministry of Education, Science, Sports, and Culture, No. 16500176.
References
[1] Dubois, D. and Prade, H. [1991] Fuzzy Sets in Approximate Reasoning, Part 1: Inference with Possibility Distributions, Fuzzy Sets and Systems, 40, 143-202.
[2] Gediga, G. and Düntsch, I. [2001] Rough Approximation Quality Revisited, Artificial Intelligence, 132, 219-234.
[3] Grzymala-Busse, J. W. [1991] On the Unknown Attribute Values in Learning from Examples, in Ras, Z. W. and Zemankova, M. (eds.), Methodology for Intelligent Systems, ISMIS '91, Lecture Notes in Artificial Intelligence 542, Springer-Verlag, 368-377.
[4] Imielinski, T. and Lipski, W. [1984] Incomplete Information in Relational Databases, Journal of the ACM, 31:4, 761-791.
[5] Kryszkiewicz, M. [1999] Rules in Incomplete Information Systems, Information Sciences, 113, 271-292.
[6] Nakata, M. [2004] Some Issues on Rough-set-based Approaches to Data Containing Incomplete Information, Proceedings of SCIS & ISIS 2004, Joint 2nd International Conference on Soft Computing and Intelligent Systems and 5th International Symposium on Advanced Intelligent Systems, THP-8-4 (6 pages).
[7] Parsons, S. [1996] Current Approaches to Handling Imperfect Information in Data and Knowledge Bases, IEEE Transactions on Knowledge and Data Engineering, 8:3, 353-372.
[8] Pawlak, Z. [1991] Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers.
[9] Sakai, H. [1998] Some Issues on Nondeterministic Knowledge Bases with Incomplete Information, in Polkowski, L. and Skowron, A. (eds.), Proceedings of RSCTC'98, Lecture Notes in Artificial Intelligence 1424, Springer-Verlag, 424-431.
[10] Słowiński, R. and Stefanowski, J. [1989] Rough Classification in Incomplete Information Systems, Mathematical and Computer Modelling, 12:10/11, 1347-1357.
[11] Stefanowski, J. and Tsoukiàs, A. [2001] Incomplete Information Tables and Rough Classification, Computational Intelligence, 17:3, 545-566.
[12] Zimányi, E. and Pirotte, A. [1997] Imperfect Information in Relational Databases, in Motro, A. and Smets, P. (eds.), Uncertainty Management in Information Systems: From Needs to Solutions, Kluwer Academic Publishers, 35-87.
Rough Set Semantics for Three-Valued Logics
Seiki Akama and Tetsuya Murai
Department of Information Systems, Teikyo Heisei University, Japan. Graduate School of Engineering, Hokkaido University, Japan.
Abstract. Rough sets were introduced by Pawlak to represent coarse information, and they are related to a semantic basis for non-classical logics. In this paper, we propose a rough set semantics and show that it can be used as a natural semantics for some three-valued logics. As a case study, we deal with the three-valued logics of Lukasiewicz and Kleene.
Keywords. rough sets, rough set semantics, three-valued logic
1. Introduction Rough sets were introduced by Pawlak to represent coarse information; see Pawlak [5]. Since then, the concept of rough set has been applied to several fields in computer science, in particular data analysis and information systems. From a theoretical point of view, there is a connection between rough sets and logic. It is well known that the collection of all subsets of a set constitutes a Boolean algebra and that its logic is exactly classical propositional logic. J. Pomykala and J.A. Pomykala [7] showed that the collection of rough sets of an approximation space forms a regular double Stone algebra. Such results suggest that rough sets could serve as semantical tools for non-classical logics. In fact, rough-set-based semantics is closely related to Kripke semantics for modal logic. In this paper, we propose rough set semantics as a general semantical framework for non-classical logics. Our starting point is a representation of rough sets using a Boolean algebra, which gives a valuation based on rough sets. As a case study, we deal with some three-valued logics. The rest of this paper is structured as follows. In section 2, we review rough sets. In section 3, we introduce a rough set semantics and discuss Kleene's strong and weak connectives within the proposed semantics. In section 4, by giving a semantics for implications, we reach a rough set semantics for some three-valued logics of Lukasiewicz and Kleene. The final section concludes the paper. 1 Correspondence to: Seiki Akama, Computational Logic Laboratory, Department of Information Systems, Teikyo Heisei University, 2289 Uruido, Ichihara, Chiba 290-0193, Japan. Tel.: +81 436 74 6134; Fax: +81 436 74 6400; E-mail: [email protected]
S. Akama and T. Murai / Rough Set Semantics for Three-Valued Logics
2. Rough Sets
The concept of rough set was proposed by Pawlak [5]; also see Pawlak [6]. A rough set can be seen as an approximation of a set, denoted by a pair of sets called the lower and the upper approximation of the set, to deal with reasoning from imprecise data. Rough set theory was developed in order to serve as a theoretical foundation for many applications, as shown in Pawlak [6]. We here sketch the background of rough sets. Let U be a non-empty finite set, called the universe of objects in question. Any subset X ⊆ U is called a concept in U, and any family of concepts in U is called knowledge about U. If R is an equivalence relation on U, then U/R denotes the family of all equivalence classes of R (the classification about U), called categories or concepts of R. We write [x]_R for the category in R containing an element x ∈ U. If P is a non-empty family of equivalence relations on U, then ∩P is also an equivalence relation, called the indiscernibility relation over P and designated IND(P). An approximation space is a pair (U, R). Then, for each subset X ⊆ U and equivalence relation R, we associate two subsets, i.e.,
R_*(X) = {x ∈ U | [x]_R ⊆ X},  R^*(X) = {x ∈ U | [x]_R ∩ X ≠ ∅}.
Here, R_*(X) is called the lower approximation of X, and R^*(X) is called the upper approximation of X, respectively. A rough set is designated as the pair (R_*(X), R^*(X)). Intuitively, R_*(X) is the set of all elements of U which can be certainly classified as elements of X in the knowledge R, and R^*(X) is the set of elements which can be possibly classified as elements of X in the knowledge R. Then, we can define three types of sets, i.e.,
POS_R(X) = R_*(X) (R-positive region of X),
NEG_R(X) = U − R^*(X) (R-negative region of X),
BN_R(X) = R^*(X) − R_*(X) (R-boundary region of X).
These sets enable us to classify our knowledge. For several mathematical properties of rough sets, the reader is referred to Pawlak [6].
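The lower and upper approximations can be computed directly from the equivalence classes. The following sketch is illustrative only; the function names and the toy universe are assumptions made here, not taken from the paper.

```python
# Lower/upper approximation of X in an approximation space (U, R),
# where R is induced by an attribute function: x R y iff attr(x) == attr(y).
def equivalence_classes(universe, attr):
    classes = {}
    for x in universe:
        classes.setdefault(attr(x), set()).add(x)
    return list(classes.values())

def approximations(universe, attr, X):
    lower, upper = set(), set()
    for cls in equivalence_classes(universe, attr):
        if cls <= X:   # class certainly inside X
            lower |= cls
        if cls & X:    # class possibly inside X
            upper |= cls
    return lower, upper

U = {1, 2, 3, 4, 5, 6}
X = {1, 2, 4}
lower, upper = approximations(U, lambda x: x % 3, X)
print(lower)          # positive region: {1, 4}
print(upper)          # {1, 2, 4, 5}
print(U - upper)      # negative region: {3, 6}
print(upper - lower)  # boundary region: {2, 5}
```

Here the classes under x mod 3 are {3, 6}, {1, 4}, {2, 5}; only {1, 4} is wholly inside X, and {2, 5} meets X partially, giving the boundary.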
3. Three-Valued Logics Three-valued logics are many-valued logics with three truth-values. In particular, the three-valued logics of Lukasiewicz [2] and Kleene [1] are famous, and they have been applied to the formalization of several topics in philosophy, linguistics, computer science, and others. Their semantics can simply be given by truth-value tables. This is effective but lacks intuitive appeal. We believe that rough set theory is promising for modelling three-valued logics. From the definition of rough set, we can naturally interpret truth as the positive region and falsity as the negative region. In addition, non-falsity corresponds to the boundary region together with the positive region. Then, for the case of three-valued logic, the third truth-value is expressed by the boundary region. This implies that several three-valued logics can be distinguished by specifying the interpretation of non-falsity. Then, we set some restrictions on the computation of non-falsity. By means of Boolean algebras, the idea can be formally demonstrated. The language of three-valued logic has the unary connective ¬ (negation) and three binary connectives ∧ (conjunction), ∨ (disjunction), → (implication). Let P be a
244
S. Akama and T. Murai / Rough Set Semantics for Three-Valued Logics
non-empty set of propositional variables. We may use connectives with a subscript to denote a specific logic. Then, formulas are constructed as usual. We denote the set of all formulas by Fml. In this section, we focus on the subsystem with negation, conjunction and disjunction, and we discuss implications in the next section. Let U be a set and B(U) be a Boolean algebra based on U. A rough set model is seen as a triple M = (U, B(U), V), where V is a rough valuation function that assigns to every propositional variable p a pair V(p) = (V1(p), V2(p)) of members of B(U) with V1(p) ⊆ V2(p). A member of U is intuitively seen as a world. In other words, in rough set terms, V1(p) corresponds to a lower approximation and V2(p) to an upper approximation, respectively. More formally, V1(p) is the set of worlds in which p is true and V2(p) is the set of worlds in which p is not false, respectively. Next, we can define a valuation v at a world w ∈ U as follows:
v(p, w) = t iff w ∈ V1(p),
v(p, w) = f iff w ∉ V2(p),
v(p, w) = u iff w ∈ V2(p) − V1(p).
v is extended to complex formulas by using rough set functions for the logical connectives. We first discuss Kleene's strong three-valued connectives, which are the same as the ones in Lukasiewicz's three-valued logic. Let t, u, f be the truth-values: true, undefined, and false, respectively. The truth-value u has corresponding interpretations in different three-valued logics. For example, Kleene regarded it as "undefined" in view of recursive function theory, and Lukasiewicz viewed it as "indeterminate" from his philosophical motivations. Here are the truth-value tables for negation, conjunction and disjunction.
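The strong-Kleene tables (shared by Lukasiewicz for these three connectives) can be generated from the usual order f < u < t. The encoding of the truth-values as 0, 1/2, 1 is a common convention, not notation from the paper.

```python
# Strong Kleene connectives over truth-values encoded as f=0, u=0.5, t=1.
T, U, F = 1.0, 0.5, 0.0

def neg(a):     return 1 - a
def conj(a, b): return min(a, b)
def disj(a, b): return max(a, b)

# A few table entries:
print(neg(U))      # 0.5  (undefined stays undefined)
print(conj(T, U))  # 0.5
print(conj(F, U))  # 0.0  (false dominates conjunction)
print(disj(T, U))  # 1.0  (true dominates disjunction)
```

With this encoding, min and max reproduce every row of the strong truth-value tables.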
Now, we provide an interpretation for these logical connectives. This can be done by stipulating the rough set functions for complex formulas in a rough set model M. The rough set function for each connective is defined as follows.
Here, −, ∩, and ∪ denote complement, intersection, and union in the Boolean algebra, respectively. Writing V(p) = (V1(p), V2(p)) for the pair of the set of worlds in which p is true and the set of worlds in which p is not false, the connectives are interpreted componentwise; in particular, V(¬p) is defined as (U − V2(p), U − V1(p)).
Theorem 3.1 Let p be a formula with negation, conjunction and disjunction, and M be a rough set model. Then, v is a valuation function for the three-valued logic with these connectives.
(Proof): It suffices to check that v obeys the truth-value tables described above. First, we consider negation. V1(p) is the set in which p is true and V2(p) is the set in which p is true or undefined. In U − V2(p), p is false, so ¬p is true there; in U − V1(p), p is not true, so ¬p is not false there. From these, we have V(¬p) = (U − V2(p), U − V1(p)). Next, we consider conjunction. p ∧ q is true iff both p and q are true; thus the truth set of p ∧ q is V1(p) ∩ V1(q). p ∧ q is false iff either p is false or q is false (or both are false); thus p ∧ q is not false exactly on V2(p) ∩ V2(q). From these, we have V(p ∧ q) = (V1(p) ∩ V1(q), V2(p) ∩ V2(q)). Finally, we consider disjunction. Since p ∨ q can be defined as ¬(¬p ∧ ¬q), the case of disjunction follows.
In Kleene's weak three-valued logic, a formula receives the truth-value u if it contains a subformula whose truth-value is u. Here are the truth-value tables for the weak connectives.
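The componentwise interpretation in the proof can be checked mechanically. Below, a proposition is a pair (P1, P2) with P1 the worlds where it is true and P2 the worlds where it is not false; this representation and the names are illustrative assumptions.

```python
# Rough-set-style interpretation of the strong connectives:
# a proposition is a pair (P1, P2), with P1 ⊆ P2 ⊆ W.
def neg(p, W):
    p1, p2 = p
    return (W - p2, W - p1)  # true where p is false; not false where p is not true

def conj(p, q):
    return (p[0] & q[0], p[1] & q[1])

def disj(p, q):
    return (p[0] | q[0], p[1] | q[1])

W = {1, 2, 3}
p = ({1}, {1, 2})  # p: true at 1, undefined at 2, false at 3
q = ({2}, {2, 3})  # q: true at 2, undefined at 3, false at 1

print(neg(p, W))   # ({3}, {2, 3})
print(conj(p, q))  # (set(), {2})
# De Morgan check: p ∨ q equals ¬(¬p ∧ ¬q)
assert disj(p, q) == neg(conj(neg(p, W), neg(q, W)), W)
```

The final assertion is exactly the step used in the proof to reduce disjunction to negation and conjunction.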
We can also provide a rough set model for the weak connectives. We denote Kleene's weak connectives by the same symbols with a subscript. The rough set function for each weak connective is defined so that a compound formula is undefined whenever one of its components is undefined.
Then, we have the following result, whose proof is carried out as above:
Theorem 3.2 Let p be a formula with negation, conjunction and disjunction, and M be a rough set model. Then, v is a valuation function for Kleene's weak three-valued logic.
From the viewpoint of rough set semantics, the strong connectives are more intuitive than the weak ones.
4. On Implications As is well known, Lukasiewicz's three-valued logic and Kleene's strong three-valued logic are distinguished by the interpretations of implication. Namely, when both p and q receive u, Lukasiewicz's implication receives t whereas Kleene's implication receives u. Below are the truth-value tables for the two implications.
A rough set model for Lukasiewicz's logic is defined as before; only the rough set function for implication is new, and the functions for the other connectives are the same as in the strong Kleene case.
p → q is true iff p is false or q is true. This interpretation is paraphrased as follows: p → q is not false (i.e., true or undefined) iff (p is false or q is true) or (p is not true and q is not false).
Theorem 3.3 Let p be a formula of Lukasiewicz's three-valued logic and M be a rough set model. Then, v is a valuation function for Lukasiewicz's three-valued logic.
(Proof): It suffices to check that the rough set function for implication satisfies the truth-value table. First, consider the case where p → q is false, i.e., p is true and q is false. Second, consider the case where p → q is true, i.e., p is false or q is true, or both p and q are undefined. Checking these two cases against the definition above leads us to the conclusion.
A rough set model for Kleene's strong logic is expressed as before, with the same components except the rough set function for implication.
Theorem 3.4 Let p be a formula of Kleene's strong three-valued logic and M be a rough set model. Then, v is a valuation function for Kleene's strong three-valued logic.
(Proof): The difference between Lukasiewicz's and Kleene's logics lies in the treatment of the implication in which both p and q are undefined. In this case, Kleene's implication is undefined; as a consequence, the clause for truth should be adjusted accordingly. The proofs of the remaining cases are similar to those in Theorem 3.3.
What we can learn from these two theorems is that Lukasiewicz's and Kleene's three-valued logics are rough, and we can distinguish them in view of the boundary region of truth-values. This interpretation seems new for these major three-valued logics.
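The two implications can also be compared numerically under the usual 0, 1/2, 1 encoding (an encoding convention, not the paper's notation): they agree everywhere except when both arguments are undefined.

```python
T, U, F = 1.0, 0.5, 0.0
vals = [T, U, F]

def imp_kleene(a, b):       # strong Kleene implication: max(1-a, b), i.e. ¬a ∨ b
    return max(1 - a, b)

def imp_lukasiewicz(a, b):  # Lukasiewicz implication: min(1, 1 - a + b)
    return min(1.0, 1 - a + b)

diffs = [(a, b) for a in vals for b in vals
         if imp_kleene(a, b) != imp_lukasiewicz(a, b)]
print(diffs)                                    # [(0.5, 0.5)] -- only at (u, u)
print(imp_kleene(U, U), imp_lukasiewicz(U, U))  # 0.5 1.0
```

This makes the single-cell difference in the truth-value tables explicit: u → u is t for Lukasiewicz and u for Kleene.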
5. Concluding Remarks We proposed a rough set semantics for three-valued logics. This work is part of a larger project on the development of rough set semantics for non-classical logics. For the case of three-valued logics, non-falsity subsumes truth and undefinedness. This situation can be effectively captured by a proper definition of the rough set functions. In addition, the proposed semantics presents a natural reading of undefinedness. Technically, the construction can be carried out for any three-valued logic which has a truth-value table. The idea could also be extended to other three-valued logics and to many-valued logics like four-valued logics. For other non-classical logics, we should elaborate our semantics. In Kripke-type semantics for non-classical logics, the notions of world and accessibility relation play a crucial role. It is thus necessary to incorporate these concepts into rough set models. We will explore this topic in future work.
References [1] Kleene, S.C., Introduction to Metamathematics, North-Holland, Amsterdam, 1952. [2] Lukasiewicz, J., On 3-valued logic, 1920, S.McCall (ed.), Polish Logic, Oxford University Press, Oxford, 1967. [3] Orlowska, E., Modal logics in the theory of information systems, Zeitschrift für mathematische Logik und Grundlagen der Mathematik, 30 (1988), 213–222. [4] Orlowska, E., Logic for reasoning about knowledge, Zeitschrift für mathematische Logik und Grundlagen der Mathematik, 35 (1989), 559–572. [5] Pawlak, Z., Rough sets, International Journal of Computer and Information Sciences, 11 (1982), 341–356. [6] Pawlak, Z., Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer, Dordrecht, 1990. [7] Pomykala, J. and Pomykala, J.A., The stone algebra of rough sets, Bulletin of Polish Academy of Science, Mathematics 36 (1988), 495–508.
Paraconsistency and Paracompleteness in Chellas's Conditional Logics
Tetsuya Murai, Yasuo Kudo, Seiki Akama, and Jair M. Abe
Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Japan; Department of Computer Science and Systems Engineering, Muroran Institute of Technology, Japan; Department of Information Systems, Teikyo Heisei University, Japan; University of São Paulo, Brazil
Abstract. Paraconsistency and its dual, paracompleteness, are now counted as key concepts in intelligent decision systems because so much inconsistent and incomplete information can be found around us. In this paper, a framework of conditional models for conditional logic and their measure-based extensions are introduced in order to represent association rules in a logical way. Then paracomplete and paraconsistent aspects of conditionals are examined in the framework.
Keywords. Paraconsistency, Paracompleteness, Conditional logics, Standard models, Minimal models, Measure-based semantics.
1. Introduction Recently, many researchers have put emphasis on both paraconsistency and its dual, paracompleteness, in intelligent decision systems because nowadays there is so much inconsistent and incomplete information around us. We must deal with inconsistency in a clever way. In classical logic, inconsistency means triviality in the sense that all sentences become theorems. Paraconsistency means inconsistency but non-triviality. Thus we need new kinds of logic like paraconsistent and annotated logics [1,2,5]. Paracompleteness is the dual concept of paraconsistency, where there is a sentence such that neither the sentence nor its negation can be proved. In this paper, we introduce a framework of Chellas's conditional models for conditional logic and then extend it to measure-based cases. Then paracomplete and paraconsistent aspects of conditionals are examined in the framework. 1 Correspondence to: Tetsuya Murai, Research Group of Mathematical Information Science, Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo 060-0814, Japan. Tel. & Fax: +81 11 706 6757; E-mail: [email protected].
T. Murai et al. / Paraconsistency and Paracompleteness in Chellas’s Conditional Logics
2. Conditional Models for Conditional Logics
2.1. Language
Given a set P of atomic sentences, a language LCL(P) for conditional logic is formed from P as the set of sentences closed under the usual propositional operators ⊤, ⊥, ¬, ∧, ∨, →, and ↔ as well as 2→ and 3→ (two conditionals)1 in the following usual way.
1. If x ∈ P then x ∈ LCL(P).
2. ⊤, ⊥ ∈ LCL(P).
3. If p ∈ LCL(P) then ¬p ∈ LCL(P).
4. If p, q ∈ LCL(P) then p ∧ q, p ∨ q, p → q, p ↔ q, p2→q, p3→q ∈ LCL(P).
2.2. Standard conditional models
Chellas [4] describes two kinds of models, the standard and the minimal ones, in possible-worlds semantics for conditional logic. Their relationship is similar to that between Kripke and Scott-Montague models for the usual modal logics.
Definition 1 (Chellas [4], p.268) A standard conditional model MCL for conditional logic is a structure
⟨W, f, v⟩,
where W is a non-empty set of possible worlds, v is a truth-assignment function for the atomic sentences at every world, v : P × W → {0, 1}, and f is a function f : W × 2^W → 2^W. v is extended to compound sentences and, in particular, the truth conditions for 2→ and 3→ in standard conditional models are given by
1. MCL, w |= p2→q ⇔df f(w, ‖p‖MCL) ⊆ ‖q‖MCL,
2. MCL, w |= p3→q ⇔df f(w, ‖p‖MCL) ∩ ‖q‖MCL ≠ ∅,
where ‖p‖MCL = {w ∈ W | MCL, w |= p}. Thus we have the following relationship: p2→q ↔ ¬(p3→¬q). The function f can be regarded as a kind of selection function. That is, p2→q is true at a world w when q is true at every world selected by f with respect to p and w. Similarly, p3→q is true at a world w when q is true at at least one of the worlds selected by f with respect to p and w.
1 In [4], Chellas used only 2→. The latter connective 3→ follows Lewis [6].
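A tiny model makes the selection-function truth conditions concrete. The world set, the particular selection function, and the extensions below are invented for illustration; only the truth conditions themselves come from the definition above.

```python
# Standard conditional model: truth conditions via a selection function f.
W = {1, 2, 3, 4}

def f(w, ext_p):
    # toy selection: the p-worlds "closest" to w, measured by |x - w|
    # (an arbitrary illustrative choice of selection function)
    if not ext_p:
        return set()
    d = min(abs(x - w) for x in ext_p)
    return {x for x in ext_p if abs(x - w) == d}

def box_arrow(w, ext_p, ext_q):      # p 2-> q at w: f(w, ||p||) ⊆ ||q||
    return f(w, ext_p) <= ext_q

def diamond_arrow(w, ext_p, ext_q):  # p 3-> q at w: f(w, ||p||) ∩ ||q|| ≠ ∅
    return bool(f(w, ext_p) & ext_q)

ext_p, ext_q = {2, 4}, {2}
print(box_arrow(1, ext_p, ext_q))  # True: the closest p-world to 1 is 2, a q-world
print(box_arrow(3, ext_p, ext_q))  # False: the closest p-worlds to 3 are {2, 4}
# duality: p 3-> q iff not (p 2-> ¬q)
for w in W:
    assert diamond_arrow(w, ext_p, ext_q) == (not box_arrow(w, ext_p, W - ext_q))
```

The loop at the end checks the relationship p2→q ↔ ¬(p3→¬q) at every world of this toy model.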
2.3. Minimal conditional models
A minimal conditional model is a Scott-Montague-like extension of a standard conditional model [4].
Definition 2 (Chellas [4], p.270) A minimal conditional model MCL for conditional logic is a structure
⟨W, g, v⟩,
where W and v are the same as in the standard conditional models. The difference is the second component,
g : W × 2^W → 2^(2^W).
The truth conditions for 2→ and 3→ in a minimal conditional model are given by
1. MCL, w |= p2→q ⇔df ‖q‖MCL ∈ g(w, ‖p‖MCL),
2. MCL, w |= p3→q ⇔df (‖q‖MCL)^C ∉ g(w, ‖p‖MCL).
Thus we also have the relationship p2→q ↔ ¬(p3→¬q). Note that, if the function g satisfies the condition
X ∈ g(w, ‖p‖MCL) ⇔ ∩g(w, ‖p‖MCL) ⊆ X
for every world w and every sentence p, then, by defining
fg(w, ‖p‖MCL) =df ∩g(w, ‖p‖MCL),
we have the standard conditional model ⟨W, fg, v⟩ that is equivalent to the original minimal model.
3. Measure-Based Extensions of Models for Conditional Logics
Next we introduce measure-based extensions of the previous minimal conditional models. Such extensions are models for graded conditional logics. Given a finite set P of items as atomic sentences, a language LgCL(P) for graded conditional logic is formed from P as the set of sentences closed under the usual propositional operators ⊤, ⊥, ¬, ∧, ∨, →, and ↔ as well as 2→k and 3→k (graded conditionals) for 0 < k ≤ 1, in the following way.
1. If x ∈ P then x ∈ LgCL(P).
2. ⊤, ⊥ ∈ LgCL(P).
3. If p ∈ LgCL(P) then ¬p ∈ LgCL(P).
4. If p, q ∈ LgCL(P) then p ∧ q, p ∨ q, p → q, p ↔ q ∈ LgCL(P).
5. If p, q ∈ LgCL(P) and 0 < k ≤ 1, then p2→k q, p3→k q ∈ LgCL(P).
A graded conditional model is defined as a family of minimal conditional models (cf. Chellas [4]):
Definition 3 Given a fuzzy measure
m : 2^W × 2^W → [0, 1],
a measure-based conditional model M^m_gCL for graded conditional logic is a structure
⟨W, {gk}0<k≤1, v⟩,
where, for each k,
gk(w, X) = {Y ⊆ W | m(Y, X) ≥ k}.
The model M^m_gCL is called finite because so is W. Further, in this paper, we call the model M^m_gCL uniform since the functions {gk} in the model do not depend on any world in M^m_gCL. The truth conditions for 2→k and 3→k in a measure-based conditional model are given by
M^m_gCL, w |= p2→k q iff ‖q‖ ∈ gk(w, ‖p‖),
M^m_gCL, w |= p3→k q iff (‖q‖)^C ∉ gk(w, ‖p‖),
where ‖p‖ denotes the extension of p in M^m_gCL. The basic idea of these definitions is the same as in the fuzzy-measure-based semantics for graded modal logic defined in [7,8,9]. When we take m as a conditional probability Pr, the truth condition of the graded conditional becomes
M^Pr_gCL, w |= p2→k q iff Pr(w, ‖q‖ | ‖p‖) ≥ k.
We have several soundness results based on probability-measure-based semantics (cf. [7,8,9]), shown in Table 1.
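Under the conditional-probability reading, p2→k q is essentially an association rule p ⇒ q with confidence at least k. A small frequency-count sketch follows; the data and function names are invented for illustration.

```python
# Graded conditional via conditional probability over a finite set of worlds,
# each world being the set of items true there.
worlds = [
    {"a", "b"},
    {"a", "b", "c"},
    {"a"},
    {"b", "c"},
]

def confidence(p_items, q_items):
    # Pr(||q|| | ||p||) with the uniform measure over worlds satisfying p
    p_worlds = [w for w in worlds if p_items <= w]
    if not p_worlds:
        return 1.0  # convention chosen here for an empty antecedent extension
    return sum(1 for w in p_worlds if q_items <= w) / len(p_worlds)

def graded_conditional(p_items, q_items, k):
    return confidence(p_items, q_items) >= k

print(confidence({"a"}, {"b"}))                # 2/3: "a" holds in 3 worlds, "b" in 2 of them
print(graded_conditional({"a"}, {"b"}, 0.5))   # True
print(graded_conditional({"a"}, {"b"}, 0.75))  # False
```

Raising k shrinks the set of rules that hold, which is exactly the behaviour the soundness table classifies by confidence range.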
4. Paraconsistency and Paracompleteness in Conditionals As Chellas pointed out in his book [4] (p.269), conditionals p2→q and p3→q can be regarded as relative modal sentences [p]q and ⟨p⟩q, respectively. Thus we first study paraconsistency and paracompleteness in the usual modal logic setting for convenience.
Table 1. Soundness results of graded conditionals by probability measures.
4.1. Modal logic case. Let us assume a standard language L for modal logic with two modal operators 2 and 3, as usual. In [10], we examined some relationships between modal logics and paraconsistency and paracompleteness. In terms of modal logic, paracompleteness and paraconsistency have a close relation to the following axiom schemata:
D. 2p → ¬2¬p,
DC. ¬2¬p → 2p,
because they have the equivalent expressions ¬(2p ∧ 2¬p) and 2p ∨ 2¬p, respectively. That is, given a system of modal logic Σ, define the following set of sentences:
T =df {p ∈ L | ⊢Σ 2p},
where ⊢Σ 2p means that 2p is a theorem of Σ. Then the above two schemata mean that, for any sentence p,
not (p ∈ T and ¬p ∈ T),
p ∈ T or ¬p ∈ T,
respectively; obviously the former describes the consistency of T and the latter the completeness of T. Thus
• T is inconsistent when Σ does not contain D.
• T is incomplete when Σ does not contain DC.
A system Σ is regular when it contains the following rule and axiom schema:
p ↔ q ⇒ 2p ↔ 2q,
(2p ∧ 2q) ↔ 2(p ∧ q).
Note that every normal system is regular. In [10] we pointed out the following. If Σ is regular, then we have
(2p ∧ 2¬p) ↔ 2¬⊤  (1)
where ⊥ ↔ ¬⊤ and ⊥ is the falsity constant, which means inconsistency itself. Thus we have triviality: T = L. But if Σ is not regular, then we no longer have (1); thus, in general, T ≠ L, which means T is paraconsistent. That is, local inconsistency does not generate triviality as global inconsistency does.
R =df {p2→q ∈ LCD | ⊢CL p2→q},
where LCD is a language for conditional logic and ⊢CL p2→q means that p2→q is a theorem of CL. Then the above two schemata mean that, for any sentences p and q,
not (p2→q ∈ R and p2→¬q ∈ R),
p2→q ∈ R or p2→¬q ∈ R,
respectively; obviously the former describes the consistency of R and the latter the completeness of R. Thus, for the set R of conditionals (rules):
• R is inconsistent when CL does not contain CD.
• R is incomplete when CL does not contain CDC.
4.3. Graded conditional logic case. Finally we deal with graded conditionals based on probability. Define the following set of conditionals with confidence k:
Rk =df {p2→k q ∈ LgCD | ⊢gCL p2→k q}.
A graded conditional p2→k q is also regarded as a relative necessity sentence [p]k q, and the properties of the relative modal operator [·]k are examined in Murai et al. [7,8] in the following correspondence:
Confidence k: 0 < k ≤ 1/2 — 1/2 < k < 1 — k = 1
Systems: EMDC NP — EMDNP — KD
The former two systems are not regular, so Rk may be paraconsistent. The last one is normal, hence regular. For 0 < k ≤ 1/2, Rk is complete, but for some p and q both rules p2→k q and p2→k ¬q may be generated. This should be avoided. For 1/2 < k < 1, Rk is consistent but may be paracomplete.
5. Concluding Remarks In this paper, we examined both paraconsistency and paracompleteness as they appear in conditional logics. For lower values of k (less than or equal to 1/2), both p2→k q and p2→k ¬q may be generated.
References [1] Akama, S. and Abe, J.M. (1998): Many-Valued and Annotated Modal Logics. Proc. of 28th ISMVL, 114-119.
[2] Akama, S. and Abe, J.M. (2000): Fuzzy Annotated Logics. Proc. of IPMU 2000, 504-509. [3] Akama, S. and Abe, J.M. (2002): Paraconsistent Logics Viewed as a Foundation of Data Warehouses. Advances in Logic, Artificial Intelligence and Robotics, IOS Press, 96-103. [4] Chellas, B.F. (1980): Modal Logic: An Introduction. Cambridge Univ. Press, Cambridge. [5] da Costa, N.C.A., Abe, J.M., Subrahmanian, V.S. (1991): Remarks on Annotated Logic. Zeitschr. f. Math. Logik und Grundlagen d. Math., 37, 561-570. [6] Lewis,D. (1973): Counterfactuals. Blackwell, Oxford. [7] Murai, T., Miyakoshi, M., Shimbo, M. (1993): Measure-Based Semantics for Modal Logic. R.Lowen and M.Roubens (eds.), Fuzzy Logic: State of the Art, Kluwer, Dordrecht, 395–405. [8] Murai, T., Miyakoshi, M., Shimbo, M. (1994): Soundness and Completeness Theorems Between the Dempster-Shafer Theory and Logic of Belief. Proc. 3rd FUZZIEEE (WCCI), 855–858. [9] Murai, T., Miyakoshi, M., Shimbo, M. (1995) A Logical Foundation of Graded Modal Operators Defined by Fuzzy Measures. Proc. 4th FUZZ-IEEE/2nd IFES, 151–156. [10] Murai, T., Sato, Y., Kudo, Y. (2003) Paraconsistency and Neighborhood Models in Modal Logic. Proc. 7th World Multiconference on Systemics, Cybernetics and Informatics, Vol.XII, 220–223.
Rough Sets Based Minimal Certain Rule Generation in Non-deterministic Information Systems: An Overview
Hiroshi Sakai and Michinori Nakata
Faculty of Engineering, Kyushu Institute of Technology; Faculty of Management and Information Science, Josai International University
Abstract. Rough sets based minimal certain rule generation in Non-deterministic Information Systems (NISs) is presented. This is an advancement of rule generation in Deterministic Information Systems (DISs). NISs were proposed by Pawlak, Orłowska and Lipski in order to handle information incompleteness in DISs. Following the previous research on rule generation in DISs, rules in NISs are defined by logical implications satisfying some constraints. Rough sets based algorithms for generating rules are also proposed. In particular, minimal certain rules are focused on in this paper, and discernibility functions are newly introduced into NISs. A minimal certain rule is generated by means of a solution of a discernibility function. This procedure is implemented on a workstation in prolog.
Keywords. Rough sets, non-deterministic information, rule generation, discernibility function, tool
1. Introduction Rough set theory is seen as a mathematical foundation of soft computing. This theory usually handles tables with deterministic information. Many applications of this theory to rule generation, machine learning and knowledge discovery have been presented [1,2,3,4,5,6,7,8,9]. In this paper, we follow rule generation in DISs [4,5,6,7,8] and propose minimal certain rule generation in NISs. NISs, which were proposed by Pawlak, Orłowska and Lipski, have been recognized to be the most important framework for handling incomplete information [10,11,12,13,14]. Therefore, rule generation in NISs will also be an important framework for rule generation from incomplete information. Certain rules and minimal certain rules in a NIS are defined by means of all derived DISs from the NIS; namely, certain rules are defined in the manner of the certainty modality described below. In every NIS, the number of all derived DISs is finite, so there is no doubt in defining certain rules. However, the number of derived DISs increases in exponential order, and there may be a huge number of derived DISs. We apply rough-set-based algorithms to generating certain rules and minimal certain rules, which depend on all derived DISs.
1 Correspondence to: Hiroshi Sakai, Department of Mathematics and Computer Aided Science, Faculty of Engineering, Kyushu Institute of Technology, Tobata, Kitakyushu 804, Japan. Tel(Fax).: +81-93-884-3258; E-mail: [email protected].
2. An Outline of Rule Generation in DISs Let us see an outline of rule generation in DISs according to Table 1, which is a part of a table in [2]. Table 1 shows a relation between condition attributes and a decision attribute.
Table 1. Exemplary deterministic information system
We identify a tuple with a set of implications; for example, from each object in Table 1 we extract implications whose condition part and decision part are conjunctions of attribute-value pairs. We usually call a pair of an attribute and an attribute value a descriptor. An implication contradicts another one if the same condition part concludes different decisions, and otherwise it is consistent with the implications from any other tuple. Most rough-set-based rules are defined by means of this concept of consistency [1,2]. Three measures are also applied to defining rules in DISs [3,7]. Generally, a rule is defined as an implication satisfying some constraints. Equivalence relations in DISs are usually employed to generate rules [1,2,3,6,7]. Two objects belong to the same equivalence class if their attribute values are the same. The concept of consistency is examined by the inclusion of equivalence classes [1,2]: if the equivalence class of an object for the condition attributes is included in its equivalence class for the decision attribute, the object is consistent; otherwise it is inconsistent. For generating simpler rules, the reduction of the condition part has been studied [1,2,4,8,15]. The most effective way to obtain simpler rules is a discernibility function [4]: in order to generate a simpler rule for an object, the object has to be discriminated from every object with a different decision, and it may be sufficient to specify a single attribute for this purpose. A discernibility function is generally a disjunctive normal form of attributes, and a minimal set of attributes satisfying this function becomes the condition part of a minimal rule [1,4,15].
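The consistency check via inclusion of equivalence classes can be sketched as follows; the toy table and names are illustrative assumptions, not the paper's Table 1.

```python
# Consistency of an object: its equivalence class on the condition
# attributes must be included in its class on the decision attribute.
table = {
    "x1": {"a": 1, "b": 0, "dec": "yes"},
    "x2": {"a": 1, "b": 0, "dec": "no"},   # same condition as x1, different decision
    "x3": {"a": 2, "b": 1, "dec": "yes"},
}

def eq_class(obj, attrs):
    return {o for o, row in table.items()
            if all(row[a] == table[obj][a] for a in attrs)}

def consistent(obj, cond, dec="dec"):
    return eq_class(obj, cond) <= eq_class(obj, [dec])

print(consistent("x1", ["a", "b"]))  # False: x1 and x2 collide
print(consistent("x3", ["a", "b"]))  # True
print(consistent("x3", ["a"]))       # True: attribute a alone discerns x3
```

The last call illustrates reduction of the condition part: dropping b keeps x3 consistent, so a alone suffices for its rule.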
258
H. Sakai and M. Nakata / Rough Sets Based Minimal Certain Rule Generation in NISs
3. Rough Sets Based Issues on NISs

NISs were proposed by Pawlak, Orłowska and Lipski in order to handle information incompleteness in DISs [10,11,12,13,14]. In NISs, some attribute values are given as a set of values, and we interpret such a set as meaning that 'the actual value exists in this set, but it is unknown due to the information incompleteness.'

Table 2. A non-deterministic information system
In NISs, the concept of a derived DIS has been employed. Namely, it is possible to generate a DIS by replacing every set of values with one value in the set; we name such DISs derived DISs from a NIS. In Table 2, there are 16 derived DISs, and Table 1 is one of them, i.e., Table 1 is a derived DIS from the NIS in Table 2. Two modalities, defined by means of all derived DISs, are introduced into NISs.

(Certainty) If a formula holds in every derived DIS from a NIS, the formula also holds in the unknown real DIS.

(Possibility) If a formula holds in some derived DISs from a NIS, there exists a possibility that the formula holds in the unknown real DIS.

We have coped with several issues related to these two modalities, for example the definability of a set in NISs [16], the consistency of an object in NISs, data dependency in NISs [17], rules in NISs [18,19,20], reduction of attributes in NISs [20], etc. An important problem is how to compute the two modalities, which depend upon all derived DISs from a NIS. A simple method, in which every definition is sequentially computed in all derived DISs from a NIS, is not suitable, because the number of derived DISs from a NIS increases in exponential order. This problem is uniformly solved by applying either minimum and maximum equivalence class information or possible equivalence relations [16,17,18].
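As a concrete illustration of derived DISs and the two modalities, the following Python sketch enumerates every derived DIS of a toy NIS (the objects, attribute names and value sets are invented for illustration and are not the contents of Table 2):

```python
from itertools import product

# Toy NIS: a non-deterministic value is given as a set of candidates.
nis = {
    1: {"temp": {"high"},           "flu": {"yes"}},
    2: {"temp": {"high", "normal"}, "flu": {"yes"}},
    3: {"temp": {"normal"},         "flu": {"yes", "no"}},
}

def derived_diss(nis):
    """Enumerate every derived DIS by fixing each value set to one element."""
    objs = sorted(nis)
    attrs = sorted(nis[objs[0]])
    cells = [(o, a, sorted(nis[o][a])) for o in objs for a in attrs]
    for choice in product(*(vals for _, _, vals in cells)):
        dis = {o: {} for o in objs}
        for (o, a, _), v in zip(cells, choice):
            dis[o][a] = v
        yield dis

all_dis = list(derived_diss(nis))
print(len(all_dis))                                  # 2 * 2 = 4 derived DISs

# Certainty: the formula holds in EVERY derived DIS.
print(all(d[1]["temp"] == "high" for d in all_dis))  # True
# Possibility: the formula holds in SOME derived DIS.
print(any(d[3]["flu"] == "no" for d in all_dis))     # True
```

The list grows multiplicatively with every non-singleton value set, which is exactly the exponential blow-up the text warns about.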
4. An Outline of Rule Generation in NISs

4.1. Certain and Possible Rules

Let us show an outline of rule generation in NISs according to Table 2. We name implications in derived DISs possible implications. A possible implication is definite if all of its descriptors are definite; such an implication appears in all derived DISs. An indefinite possible implication appears only in some derived DISs; in Table 2, two such implications appear in 8 derived DISs each. Furthermore, a possible implication may be consistent in some derived DISs and inconsistent in the other derived DISs; in such a case, we say the implication is marginal. If a possible implication is consistent in all derived DISs in which it appears, we say it is globally consistent, and otherwise we say it is globally inconsistent. According to this consideration, we define 6 classes of possible implications [18,19] in Table 3.
Table 3. Six classes of possible implications in NISs
In DISs, there exist only two classes, the consistent class and the inconsistent class. These two classes are extended to 6 classes in NISs. In Table 3, possible implications in the definite and globally consistent class are not influenced by the information incompleteness; therefore we name possible implications in this class certain rules. We also name possible implications in the other globally consistent or marginal classes possible rules.

4.2. Algorithms for Examining Certain and Possible Rules

Let us show an outline of algorithms for examining certain and possible rules according to Table 2. There are two important properties for this examination. In Table 2, there are 16 derived DISs, and we name the equivalence classes in these derived DISs possible equivalence classes. Table 1 is a derived DIS from Table 2; therefore all equivalence classes in Table 1 are possible equivalence classes. For another derived DIS, other different possible equivalence classes may exist. For every object, it is possible to define a minimum possible equivalence class and a maximum possible equivalence class such that every possible equivalence class including the object lies between them. We name these classes inf and sup, respectively. The unknown real equivalence class lies between inf and sup. This interval reflects the information incompleteness in NISs, and inf = sup holds in every DIS. We have the following two properties.

Property 1. [16,18] Let one possible equivalence class include an object on one attribute, and let another possible equivalence class include the same object on another attribute. Then the intersection of the two classes is a possible equivalence class including the object on both attributes.

Property 2. [18,19] Let us consider a possible implication whose condition part is a conjunction of descriptors. The following holds: (1) the implication belongs to the globally consistent class if and only if the sup information for the condition part is a subset of the inf information for the decision part; (2) the implication belongs to the marginal class if and only if the inf information for the condition part is a subset of the sup information for the decision part, but (1) does not hold; (3) the implication belongs to the globally inconsistent class if and only if the inf information for the condition part is not a subset of the sup information for the decision part.

According to Property 1, it is possible to produce possible equivalence classes on any set of attributes from the possible equivalence classes on each attribute.
Therefore, it is also possible to produce inf and sup information on any set of attributes, and we apply the produced inf and sup information to examining Property 2. In Table 2, the subset relations of Property 2 hold for some possible implications, which assures us that these implications belong to the corresponding classes.
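The inf and sup information can be illustrated by brute force over all derived DISs (toy data again, not the contents of Table 2; a practical implementation would build these attribute-wise via Property 1 instead of enumerating):

```python
from itertools import product

# Toy NIS (illustrative values).
nis = {
    1: {"a": {"high"},           "b": {"yes"}},
    2: {"a": {"high", "normal"}, "b": {"yes"}},
    3: {"a": {"normal"},         "b": {"yes", "no"}},
}

def derived_diss(nis):
    """All derived DISs: each value set is replaced by one of its elements."""
    objs = sorted(nis)
    attrs = sorted(nis[objs[0]])
    cells = [(o, a, sorted(nis[o][a])) for o in objs for a in attrs]
    for choice in product(*(vals for _, _, vals in cells)):
        dis = {o: {} for o in objs}
        for (o, a, _), v in zip(cells, choice):
            dis[o][a] = v
        yield dis

def eq_class(dis, obj, attrs):
    """Equivalence class of obj on attrs inside one derived DIS."""
    key = tuple(dis[obj][a] for a in attrs)
    return frozenset(o for o in dis if tuple(dis[o][a] for a in attrs) == key)

def inf_sup(nis, obj, attrs):
    """inf: objects sharing obj's class in EVERY derived DIS (minimum class);
    sup: objects sharing obj's class in SOME derived DIS (maximum class)."""
    classes = [eq_class(d, obj, attrs) for d in derived_diss(nis)]
    return frozenset.intersection(*classes), frozenset.union(*classes)

inf_a, sup_a = inf_sup(nis, 1, ["a"])
print(sorted(inf_a), sorted(sup_a))  # [1] [1, 2]
```

Object 2's non-deterministic value on attribute "a" is what separates inf from sup here: object 2 shares object 1's class only in the derived DISs where its value is fixed to "high".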
4.3. A Problem: Generation of Minimal Certain Rules in NISs

In Table 2, let us consider two possible implications from the same object, both of which belong to the definite and globally consistent class, one of which is simpler than the other. If it is impossible to reduce the condition part of an implication, we say the implication is minimal; clearly, an implication whose condition part can be reduced is not minimal. For generating simpler rules, it is necessary to deal with minimal rules in NISs, too. The problem in this paper is defined in the following.

Problem. For any object, fix the decision attributes and a tuple of decision attribute values for this object. Then, find all minimal certain rules whose decision part is this tuple of values. If there exist lots of minimal certain rules, find some appropriate minimal certain rules.

This problem has already been investigated in DISs. An important difficulty in finding minimal rules in DISs lies in the fact that the minimal rule may not be unique: several minimal rules may be generated from one object. In [4,15], a discernibility function in DISs has been proposed, and some algorithms including the reduction of attributes were investigated. Finding a minimal reduct in a DIS is proved to be NP-hard [4], too.
5. Minimal Certain Rule Generation Based on a Discernibility Function

5.1. Discernibility Functions on NISs

According to Property 1 and Property 2, it is possible to decide whether certain rules are obtained from an object or not. However, these properties are not applicable to generating minimal certain rules. For generating minimal certain rules, we propose discernibility functions on NISs. In Table 1, a discernibility function discriminates one object from the others. In DISs, a discernibility function is a formula over attributes; however, in NISs attributes alone are not sufficient. In order to discriminate one object from another, it is necessary to specify a descriptor, namely a pair of an attribute and an attribute value. Furthermore, the discernibility functions on the respective classes are different from each other. Let us consider a possible implication with a fixed decision part from an object; then a discernibility function of this object on the definite and globally consistent class is defined, and discernibility functions on the other classes are defined similarly. The precise definition is in [20]. For example, a possible implication from an object in Table 2 belongs to the definite and globally consistent class, so it is possible to generate minimal certain rules from this object, and it is meaningful to deal with its discernibility function. For the decision attribute, we employ the inf information, and the object must be discriminated from every object outside this inf information. For one attribute, the corresponding descriptor cannot discriminate two such objects; however, for another attribute,
therefore the corresponding descriptor can discriminate the two objects. Collecting, for each object to be discriminated, the disjunction of the descriptors discriminating it, we obtain the discernibility function. Since the minimal solution of this function is unique in our example, we obtain a unique minimal certain rule. According to the above discussion, we give Property 3 in the following.

Property 3. Let us suppose that a possible implication from an object belongs to the definite and globally consistent class. For a minimal solution of the discernibility function of this object, the implication whose condition part consists of the descriptors in the solution is a minimal certain rule from the object.

5.2. Manipulations on Discernibility Functions

This subsection proposes manipulations for obtaining minimal solutions of a discernibility function. It is easy to obtain a solution by means of tree search; however, we have to be careful with this tree search. Let us consider the following example.

Example 1. Suppose a discernibility function given as a conjunction of disjunctions of descriptors. If we select a descriptor occurring in the function, every disjunction containing this descriptor is satisfied and can be removed, and the function is revised accordingly. Such a procedure is called an absorption in [15]; this absorption law is the most fundamental law for reduction. By repeatedly selecting descriptors, we finally obtain a set of descriptors satisfying the discernibility function. However, this set may not be minimal, because some proper subsets of it may also satisfy the function.

Now, we propose some methods for obtaining minimal solutions of a discernibility function.

Enumeration Method (E-method): We enumerate every subset of all descriptors in the discernibility function, then we sequentially examine the satisfiability of each subset, from smaller subsets to larger ones. The first satisfying subsets found are minimal solutions, and every proper superset of an already obtained solution is removed.

Interactive Selection and Enumeration Method (ISE-method): We sequentially select a descriptor in the discernibility function, and we reduce the function to a new, smaller function. By repeating this procedure, it is possible to obtain a set of descriptors satisfying the function. For each obtained set of descriptors, we apply the E-method.

ISE-method with a Threshold Value (ISETV-method): We fix a threshold number of descriptors. We sequentially select a descriptor, and the E-method is invoked as soon as the number of distinct descriptors in the reduced discernibility function does not exceed the threshold.
The ISETV-method controls minimal rule generation in NISs by means of adjusting the threshold value. For a large threshold, we obtain most minimal certain rules from objects, but it may take much execution time. On the other hand, for a small threshold, we may obtain a small number of minimal certain rules based on the specified descriptors.
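The E-method can be sketched as a search over subsets of descriptors in increasing size; the clauses below are invented for illustration and are not a discernibility function taken from the paper's tables:

```python
from itertools import combinations

# A discernibility function as a conjunction of disjunctions of descriptors.
# Each descriptor is an (attribute, value) pair; the clauses are illustrative.
df = [
    {("temp", "high"), ("headache", "yes")},
    {("temp", "high"), ("cough", "no")},
    {("headache", "yes"), ("cough", "no")},
]

def satisfies(descriptors, df):
    """A set of descriptors satisfies DF iff it meets every disjunction."""
    return all(clause & descriptors for clause in df)

def minimal_solutions(df):
    """E-method sketch: enumerate subsets by increasing size and keep the
    satisfying sets that contain no previously found solution."""
    universe = sorted(set().union(*df))
    solutions = []
    for k in range(1, len(universe) + 1):
        for subset in combinations(universe, k):
            s = set(subset)
            if satisfies(s, df) and not any(m <= s for m in solutions):
                solutions.append(s)
    return solutions

for sol in minimal_solutions(df):
    print(sorted(sol))   # three minimal solutions, each with two descriptors
```

Enumerating by increasing size guarantees minimality (a satisfying set found at size k cannot contain a smaller satisfying set that was not already recorded), at the exponential cost the ISETV threshold is designed to contain.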
6. A Real Execution

In a real execution, minimal certain rules in Table 2 are generated. Program init finds the objects from which certain rules can be generated, and program minimal generates minimal certain rules, which consist of only core or common descriptors.
?-init. /* Init finds objects, which certain rules can be generated from. */ DECLIST: Certain Rules come from [1] /* Certain rules can be generated from object 1. */ EXEC_TIME=0.000(sec) yes ?-minimal. /* Minimal rule generation in Table 2 */ «Minimal Certain Rules from object 1» Descriptor [2,very_high] is a core for object 3 [2,very_high]=>[3,yes][8/8(=4/4,2/2),DGC:Only Core] This rule covers objects [1] EXEC_TIME=0.003(sec) yes
For other minimal certain rules, we employ a discernibility function and interactively specify descriptors in the discernibility function. Program solall for object 5 simulates this process for another data set (10 objects, 8 attributes; the number of derived DISs is 7346640384). Minimal certain rules from object 5, which depend upon the user's selection of descriptors, are generated.

?-solall(5). /* Interactive rule generation from object 5 in another table */
Input a Number of Descriptors to Start Exhaustive Search:3.
Exhaustive Search for less than 8 Cases !!
[Loop:1]
DF without Core Descriptors:[[1,[1,4],[4,5]],[2,[1,4], [8,[2,1],[7,4]],[9,[1,4],[2,1],[4,5],[7,4]]]
Descriptors in the Current DF:[[1,4],[2,1],[4,5],[7,4]]
Select a Descriptor:[1,4].
Revised DF without Core:[[6,[4,5],[7,4]],[8,[2,1],[7,4]]]
Common Descriptors in the Current DF:[[7,4]]
Exhaustive Search begins for [[1,4],[7,4]]
[1,4]&[7,4]=>[8,1][3888/3888(=72/72,54/54),DGC]
This rule covers objects [5]
[Loop:2]
Revised DF without Core:[[6,[4,5]],[8,[2,1]]]
Descriptors in the Current DF:[[2,1],[4,5]]
Exhaustive Search begins for [[1,4],[2,1],[4,5]]
[2,1]&[4,5]=>[8,1][972/972(=18/18,54/54),DGC]
This rule covers objects [5]
yes
7. Execution Time for Other NISs and Concluding Remarks

The NIS in Table 2 is very small, so we define other NISs and examine the execution time. Table 4 shows the details of three NISs; these NISs are artificial data.

A framework of minimal certain rule generation in NISs has been presented. Discernibility functions are newly introduced into NISs, and a minimal certain rule from an object is characterized by a minimal solution of a discernibility function. Some manipulations on discernibility functions are also proposed, and these manipulations are implemented in Prolog on a workstation with a 450 MHz UltraSPARC CPU.

This work is partly supported by the Grant-in-Aid for Scientific Research (C) (No. 16500176) from the Japan Society for the Promotion of Science.
Table 4. Definitions of three NISs: the number of objects, the number of attributes, and the number of attribute values for each attribute.

Table 5. Execution time of the programs. Here, the number of objects from which minimal certain rules are generated depends upon the rules. The threshold value is fixed to 10, so the program generated all minimal certain rules from an object.
References
[1] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, 1991.
[2] Z. Pawlak, Some Issues on Rough Sets, Transactions on Rough Sets, Springer-Verlag, 1 (2004), 1-58.
[3] J. Komorowski, Z. Pawlak, L. Polkowski and A. Skowron, Rough Sets: a tutorial, Rough Fuzzy Hybridization, Springer (1999), 3-98.
[4] A. Skowron and C. Rauszer, The Discernibility Matrices and Functions in Information Systems, Intelligent Decision Support - Handbook of Advances and Applications of the Rough Set Theory, Kluwer Academic Publishers (1992), 331-362.
[5] J. Grzymala-Busse and P. Werbrouck, On the Best Search Method in the LEM1 and LEM2 Algorithms, Incomplete Information: Rough Set Analysis, Physica-Verlag (1998), 75-91.
[6] J. Grzymala-Busse, Data with Missing Attribute Values: Generalization of Indiscernibility Relation and Rule Induction, Transactions on Rough Sets, Springer-Verlag, 1 (2004), 78-95.
[7] S. Tsumoto, Knowledge Discovery in Clinical Databases and Evaluation of Discovered Knowledge in Outpatient Clinic, Information Sciences, 124 (2000), 125-137.
[8] M. Kryszkiewicz, Rules in Incomplete Information Systems, Information Sciences, 113 (1999), 271-292.
[9] Rough Set Software, Bulletin of Int'l. Rough Set Society, 2 (1998), 15-46.
[10] E. Orłowska and Z. Pawlak, Representation of Nondeterministic Information, Theoretical Computer Science, 29 (1984), 27-39.
[11] E. Orłowska (Ed.), Incomplete Information: Rough Set Analysis, Physica-Verlag, 1998.
[12] W. Lipski, On Semantic Issues Connected with Incomplete Information Data Base, ACM Trans. DBS, 4 (1979), 269-296.
[13] W. Lipski, On Databases with Incomplete Information, Journal of the ACM, 28 (1981), 41-70.
[14] M. Nakata and S. Miyamoto, Databases with Non-deterministic Information, Bulletin of Int'l. Rough Set Society, 7 (2003), 15-21.
[15] M. Kryszkiewicz and H. Rybinski, Computation of Reducts of Composed Information Systems, Fundamenta Informaticae, 27 (1996), 183-195.
[16] H. Sakai, Effective Procedures for Handling Possible Equivalence Relations in Non-deterministic Information Systems, Fundamenta Informaticae, 48 (2001), 343-362.
[17] H. Sakai, Effective Procedures for Data Dependencies in Information Systems, Rough Set Theory and Granular Computing, Studies in Fuzziness and Soft Computing, 125 (2003), 167-176.
[18] H. Sakai and A. Okuma, Basic Algorithms and Tools for Rough Non-deterministic Information Analysis, Transactions on Rough Sets, Springer-Verlag, 1 (2004), 209-231.
[19] H. Sakai, A Framework of Rough Sets based Rule Generation in Non-deterministic Information Systems, Lecture Notes in AI, Springer-Verlag, 2871 (2003), 143-151.
[20] H. Sakai, An Advancement of a Rough Sets based Rule Generator - Minimal Certain Rules and Discernibility Functions -, Proc. Int'l. Conf. on Soft Computing and Intelligent Information Systems, THP8-1 (2004), 1-6.
Equivalence of Fuzzy-rough Modus Ponens and Fuzzy-rough Modus Tollens

Masahiro Inuiguchi a,1, Salvatore Greco b and Roman Słowiński c,d

a Graduate School of Engineering Science, Osaka University, Japan
b Faculty of Economics, University of Catania, Italy
c Institute of Computing Science, Poznań University of Technology, Poland
d Institute for Systems Research, Polish Academy of Sciences, Poland
Abstract. We have proposed a fuzzy rough set approach to induce gradual decision rules from decision tables without using any fuzzy logical connective. In this paper, we discuss the equivalence between fuzzy-rough modus ponens and fuzzy-rough modus tollens obtained from the induced gradual decision rules. We show the necessary and sufficient conditions for fuzzy-rough modus ponens and fuzzy-rough modus tollens to be equivalent.
1. Introduction
Rough sets [11,12] and fuzzy sets [17] treat different kinds of uncertainties. The former deals with uncertainty of information resulting from the ambiguity caused by a granular description of objects, while the latter treats the uncertainty of information resulting from concepts and linguistic categories with vague boundaries. Because of this difference, many attempts at combining rough sets and fuzzy sets have been made [1,3,5,6,9,10,13,14,15,16]. In many of them, some fuzzy logical connectives (t-norm, t-conorm, fuzzy implication) are employed. Recently, the authors [4] proposed fuzzy rough sets without using any fuzzy logical connective. The fuzzy rough sets are associated with gradual decision rules. Using the gradual decision rules, we formulated fuzzy-rough modus ponens and fuzzy-rough modus tollens and demonstrated their similarities [7]. In this paper, we discuss the equivalence between fuzzy-rough modus ponens and fuzzy-rough modus tollens. Both fuzzy-rough modus ponens and fuzzy-rough modus tollens are characterized by modifier functions [8]. Thus, by the equivalence between fuzzy-rough modus ponens and fuzzy-rough modus tollens we mean the equality of the associated modifier functions. We show some necessary and sufficient conditions for the identity of the modifier functions.

In the next section, we introduce fuzzy rough sets as well as fuzzy-rough modus ponens and fuzzy-rough modus tollens. We show the properties of modifier functions characterizing fuzzy-rough modus ponens and fuzzy-rough modus tollens in Section 3. Using these properties, the main theorem is also given in Section 3. In Section 4, conclusions of this paper and future research directions are discussed.
1 Corresponding Author: Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka 560-8531, Japan; E-mail: [email protected]
M. Inuiguchi et al. / Equivalence of Fuzzy-Rough Modus Ponens and Fuzzy-Rough Modus Tollens 265
2. Fuzzy-rough Modus Ponens and Fuzzy-rough Modus Tollens
Suppose that we want to approximate knowledge contained in fuzzy set Y using knowledge about fuzzy set X over a finite set U of all objects in a given decision table. Let us also adopt the hypothesis that X is positively related to Y. Then, we can define the lower approximation App+(X,Y) and upper approximation App̄+(X,Y) of Y by the following membership functions [4,7]:

μ[App+(X,Y),x] = min { μY(z) | z∈U, μX(z) ≥ μX(x) },
μ[App̄+(X,Y),x] = max { μY(z) | z∈U, μX(z) ≤ μX(x) },

where μX and μY are membership functions of X and Y. Similarly, if we adopt the hypothesis that X is negatively related to Y, then we can define the lower approximation App−(X,Y) and upper approximation App̄−(X,Y) of Y by the following membership functions [4,7]:

μ[App−(X,Y),x] = min { μY(z) | z∈U, μX(z) ≤ μX(x) },
μ[App̄−(X,Y),x] = max { μY(z) | z∈U, μX(z) ≥ μX(x) }.
The lower and upper approximations defined above can serve to induce lower and upper gradual decision rules over the set of all possible objects Û ⊇ U in the following way. Let us remark that inferring lower and upper gradual decision rules is equivalent to finding modifier functions fY|X+, fY|X−, gY|X+ and gY|X− associated with the lower and upper approximations. Whenever the defining sets below are nonempty, these can be defined as follows: for each α∈[0,1],

fY|X+(α) = min { μY(z) | μX(z) ≥ α, z∈U },  fY|X−(α) = min { μY(z) | μX(z) ≤ α, z∈U },
gY|X+(α) = max { μY(z) | μX(z) ≤ α, z∈U },  gY|X−(α) = max { μY(z) | μX(z) ≥ α, z∈U }.
Note that μ[App+(X,Y),x] = fY|X+(μX(x)) and μ[App̄−(X,Y),x] = gY|X−(μX(x)) hold for x∈U such that μX(x) > 0, and μ[App−(X,Y),x] = fY|X−(μX(x)) and μ[App̄+(X,Y),x] = gY|X+(μX(x)) hold for x∈U such that μX(x) < 1. We assume that the data in a given decision table are only a sample. Therefore, it seems reasonable to define fY|X+(0) = fY|X−(1) = 0 and gY|X+(1) = gY|X−(0) = 1, because of the possible existence of objects y, z∈Û such that μX(y) = 0, μX(z) = 1, μY(y) < min{μY(x) | μX(x) = 0, x∈U} and μY(z) > max{μY(x) | μX(x) = 1, x∈U}.

Using fY|X+, fY|X−, gY|X+ and gY|X−, we may have the following decision rules:

- LP-rules: "if μX(x) ≥ α then μY(x) ≥ fY|X+(α)";
- LN-rules: "if μX(x) ≤ α then μY(x) ≥ fY|X−(α)";
- UP-rules: "if μX(x) ≤ α then μY(x) ≤ gY|X+(α)";
- UN-rules: "if μX(x) ≥ α then μY(x) ≤ gY|X−(α)",

where the LP-rule can be regarded as a gradual decision rule [2]; it can be interpreted as: "the more object x is X, the more it is Y". In this case, the relationship between the credibility of the premise and that of the conclusion is positive and certain. The LN-rule can be interpreted in turn as: "the less object x is X, the more it is Y", so the relationship is negative and certain. On the other hand, the UP-rule can be interpreted as: "the more object x is X, the more it could be Y", so the relationship is positive and possible. Finally, the UN-rule can be interpreted as: "the less object x is X, the more it could be Y", so the relationship is negative and possible.
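Since the modifier functions are just min/max statistics of the finite sample, they are easy to compute. The sketch below uses made-up membership degrees; `f_pos` and `g_pos` play the roles of fY|X+ and gY|X+ on the sample (the fallback values for empty sets are an arbitrary choice for the sketch, not the paper's boundary convention), and the loop checks the stated identity μ[App+(X,Y),x] = fY|X+(μX(x)):

```python
# Made-up membership degrees over a four-object universe.
mu_X = {"u1": 0.2, "u2": 0.5, "u3": 0.7, "u4": 1.0}
mu_Y = {"u1": 0.1, "u2": 0.4, "u3": 0.6, "u4": 0.9}
U = list(mu_X)

def f_pos(alpha):
    """fY|X+(alpha) = min{ mu_Y(z) | mu_X(z) >= alpha }: the LP-rule bound."""
    vals = [mu_Y[z] for z in U if mu_X[z] >= alpha]
    return min(vals) if vals else 0.0  # arbitrary fallback when no object qualifies

def g_pos(alpha):
    """gY|X+(alpha) = max{ mu_Y(z) | mu_X(z) <= alpha }: the UP-rule bound."""
    vals = [mu_Y[z] for z in U if mu_X[z] <= alpha]
    return max(vals) if vals else 1.0  # arbitrary fallback when no object qualifies

def lower_app_pos(x):
    """mu[App+(X,Y), x] = min{ mu_Y(z) | mu_X(z) >= mu_X(x) }."""
    return min(mu_Y[z] for z in U if mu_X[z] >= mu_X[x])

# The identity mu[App+(X,Y), x] = fY|X+(mu_X(x)) stated in the text:
for x in U:
    assert lower_app_pos(x) == f_pos(mu_X[x])

print(f_pos(0.5), g_pos(0.5))  # 0.4 0.4
```

Note that `f_pos` is nondecreasing in alpha, which is what makes the LP-rule read as "the more x is X, the more it is Y".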
Based on the decision rules, we can formulate the following fuzzy-rough modus ponens (MP):

(LP-MP) if μX(x) ≥ α then μY(x) ≥ fY|X+(α), and μX(x) ≥ α', then μY(x) ≥ fY|X+(α');
(LN-MP) if μX(x) ≤ α then μY(x) ≥ fY|X−(α), and μX(x) ≤ α', then μY(x) ≥ fY|X−(α');
(UP-MP) if μX(x) ≤ α then μY(x) ≤ gY|X+(α), and μX(x) ≤ α', then μY(x) ≤ gY|X+(α');
(UN-MP) if μX(x) ≥ α then μY(x) ≤ gY|X−(α), and μX(x) ≥ α', then μY(x) ≤ gY|X−(α').

Moreover, the following fuzzy-rough modus tollens (MT) were obtained in [7]:

(LP-MT) if μX(x) ≥ α then μY(x) ≥ fY|X+(α), and μY(x) < β, then μX(x) < (fY|X+)ε(β);
(LN-MT) if μX(x) ≤ α then μY(x) ≥ fY|X−(α), and μY(x) < β, then μX(x) > (fY|X−)ε(β);
(UP-MT) if μX(x) ≤ α then μY(x) ≤ gY|X+(α), and μY(x) > β, then μX(x) > (gY|X+)ε(β);
(UN-MT) if μX(x) ≥ α then μY(x) ≤ gY|X−(α), and μY(x) > β, then μX(x) < (gY|X−)ε(β),

where ε is an infinitely small positive number and we define

(fY|X+)ε(β) = inf{α∈[0,1] | fY|X+(α) ≥ β} if {α∈[0,1] | fY|X+(α) ≥ β} ≠ ∅, and 1+ε otherwise;
(fY|X−)ε(β) = sup{α∈[0,1] | fY|X−(α) ≥ β} if {α∈[0,1] | fY|X−(α) ≥ β} ≠ ∅, and 0−ε otherwise;
(gY|X+)ε(β) = sup{α∈[0,1] | gY|X+(α) ≤ β} if {α∈[0,1] | gY|X+(α) ≤ β} ≠ ∅, and 0−ε otherwise;
(gY|X−)ε(β) = inf{α∈[0,1] | gY|X−(α) ≤ β} if {α∈[0,1] | gY|X−(α) ≤ β} ≠ ∅, and 1+ε otherwise.
Considering the fact that we induce gradual decision rules from a given decision table, we may find some similarity between the above fuzzy-rough MT and the fuzzy-rough MP based on fX|Y+, fX|Y−, gX|Y+ and gX|Y−. In fact, the fuzzy-rough MP based on fX|Y+, fX|Y−, gX|Y+ and gX|Y− are similar to the inference patterns (LP-MT), (LN-MT), (UP-MT) and (UN-MT) in which X and Y are exchanged.

In the next section, we discuss the necessary and sufficient conditions for the fuzzy-rough MP based on fX|Y+, fX|Y−, gX|Y+ and gX|Y− to be equivalent to the fuzzy-rough MT based on gY|X+, fY|X−, fY|X+ and gY|X−, respectively. More precisely, we show the necessary and sufficient conditions for fX|Y+, fX|Y−, gX|Y+ and gX|Y− to be equal to (gY|X+)ε, (fY|X−)ε, (fY|X+)ε and (gY|X−)ε, respectively.
3. Theorems
First of all, we point out some mutual relations among fX|Y+, fX|Y−, gX|Y+ and gX|Y−. Let Xᶜ and Yᶜ be fuzzy sets having membership functions μXᶜ(x) = 1−μX(x), x∈U, and μYᶜ(x) = 1−μY(x), x∈U, i.e., complements of X and Y, respectively. We have the following relations by definition: for any α∈[0,1],

fX|Y−(α) = fX|Yᶜ+(1−α), gX|Y+(α) = 1 − fXᶜ|Yᶜ+(1−α) and gX|Y−(α) = 1 − fXᶜ|Y+(α).

These relations hold when X and Y are exchanged. Therefore, we have, for any α∈[0,1],

(fY|X−)ε(α) = (gYᶜ|X+)−ε(1−α), (fY|X+)ε(α) = 1 − (gYᶜ|Xᶜ+)−ε(1−α) and (gY|X−)ε(α) = 1 − (gY|Xᶜ+)−ε(α).
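The complement relations among the modifier functions can be checked numerically. In this sketch the membership values are made up, and `f_plus`, `f_minus` and `g_plus` are sample versions of the corresponding modifier functions, with the conditioning membership function passed first:

```python
# Made-up membership degrees over a four-object universe.
mu_X = {"u1": 0.2, "u2": 0.5, "u3": 0.7, "u4": 1.0}
mu_Y = {"u1": 0.1, "u2": 0.4, "u3": 0.6, "u4": 0.9}
U = list(mu_X)

def f_plus(mu_cond, mu_out, alpha):
    """f_{B|A}+(alpha) = min{ mu_B(z) | mu_A(z) >= alpha } on the sample."""
    vals = [mu_out[z] for z in U if mu_cond[z] >= alpha]
    return min(vals) if vals else None

def f_minus(mu_cond, mu_out, alpha):
    """f_{B|A}-(alpha) = min{ mu_B(z) | mu_A(z) <= alpha }."""
    vals = [mu_out[z] for z in U if mu_cond[z] <= alpha]
    return min(vals) if vals else None

def g_plus(mu_cond, mu_out, alpha):
    """g_{B|A}+(alpha) = max{ mu_B(z) | mu_A(z) <= alpha }."""
    vals = [mu_out[z] for z in U if mu_cond[z] <= alpha]
    return max(vals) if vals else None

mu_Xc = {z: 1 - mu_X[z] for z in U}  # complement of X
mu_Yc = {z: 1 - mu_Y[z] for z in U}  # complement of Y

# fX|Y-(a) = fX|Yc+(1-a) and gX|Y+(a) = 1 - fXc|Yc+(1-a), on a grid:
for a in [i / 10 for i in range(11)]:
    assert f_minus(mu_Y, mu_X, a) == f_plus(mu_Yc, mu_X, 1 - a)
    g = g_plus(mu_Y, mu_X, a)
    f = f_plus(mu_Yc, mu_Xc, 1 - a)
    assert (g is None) == (f is None)
    if g is not None:
        assert abs(g - (1 - f)) < 1e-9
print("complement relations verified")
```

Both identities reduce to rewriting the defining sets: for instance, μYᶜ(z) ≥ 1−α is the same condition as μY(z) ≤ α, which is why `f_minus` over Y coincides with `f_plus` over Yᶜ at 1−α.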
Because of these relations, the necessary and sufficient condition for fX|Y+ to equal (gY|X+)ε provides the necessary and sufficient conditions for fX|Y−, gX|Y+ and gX|Y− to equal (fY|X−)ε, (fY|X+)ε and (gY|X−)ε, respectively. In what follows, we show the necessary and sufficient condition for fX|Y+(α) = (gY|X+)ε(α), for all α∈(α1, α2] with some 0 ≤ α1 ≤ α2 ≤ 1.

Let us define a subset V of U by V = {x∈U | there is no y∈U such that μX(y) > μX(x) and μY(y) = μY(x)}. The following three lemmas are useful to obtain the condition.

Lemma 1. If there exist x, y∈V such that μX(x) ≥ μX(y) and μY(x) < μY(y), then either gX|Y+(μY(y)) − (fY|X+)ε(μY(y)) > 0 with (fY|X+)ε(μY(y)) ≠ 1+ε, or (fY|X+)ε(μY(y)) = 1+ε.

Lemma 2. Assume that there exists x∈U such that there exists y∈V satisfying μX(y) > μX(x) and μY(y) = μY(x), but there is no z∈V satisfying μX(z) ≥ μX(x) and μY(z) < μY(x). Then gX|Y+(μY(x)) − (fY|X+)ε(μY(x)) > 0 with (fY|X+)ε(μY(x)) ≠ 1+ε.

Lemma 3. Assume that there exists x∈U such that there exists z∈V satisfying μX(z) > μX(x) and μY(z) < μY(x), and that there exists u∈U such that μY(u) = min{μY(w) | μX(w) > μX(x), w∈U} and μX(u) > μX(x). Then, we have gX|Y+(μY(u)) − (fY|X+)ε(μY(u)) > 0 with (fY|X+)ε(μY(u)) ≠ 1+ε.
Under the assumption that there exists x∈U such that μY(x)∈(α1, α2], we define

M(α1) = max{μX(x) | μY(x) ≤ α1, x∈U} if there exists x∈U with μY(x) ≤ α1, and M(α1) = −ε otherwise,

ᾱ2 = max{μY(x) | μY(x)∈(α1, α2], x∈U}.

Consider the following conditions:
(a) for any x, y∈V such that μY(x), μY(y)∈(α1, α2], μY(x) > μY(y) implies μX(x) > μX(y);
(b) for all x∈U−V such that μY(x) > α1 and μX(x)∈[mV(α1, α2), M(α1, α2)], there exists y∈V such that μY(y)∈(α1, μY(x)] and μX(y) = μX(x);
(c) M(α1) < mV(α1, α2);
(d) M(α1) ≥ max{μX(x) | μX(x) < mV(α1, α2), x∈U} if there exists x∈U with μX(x) < mV(α1, α2);
(e) if ᾱ2 = 1 then M(α1, α2) = 1.

In what follows, under the assumptions
(A1) there exists x∈U such that μY(x)∈(α1, α2],
(A2) min{μY(x) | μX(x) = max{μX(y) | y∈U}, x∈U} ≥ α2,
we prove that the conjunction of (a)~(e) is a necessary and sufficient condition to have

(fY|X+)ε(α) = gX|Y+(α), for all α∈(α1, ᾱ2].   (1)
Figure 1. The illustration of the conditions (a)~(e)

Conditions (a)~(e) are illustrated in Figure 1. The coordinates of each mark correspond to the description of an object from set V. From condition (a), the objects in the ellipse should climb monotonically upward. From (c), there is no object in the region (0, α1]×(M(α1), 1]. From (d), there is no object in the region [0,1]×(M(α1), mV(α1, α2)). Considering (b) with (c) and (d), the objects from U−V can exist only in the shaded area. Condition (e) is not shown in Figure 1 since ᾱ2 ≠ 1 there. If ᾱ2 = 1, which implies α2 = 1, then condition (e) requires M(α1, α2) = 1.

The necessity is given in the following theorem.

Theorem 1. Under assumptions (A1) and (A2), conditions (a)~(e) are necessary for equation (1).
To prove the sufficiency, the following lemma is useful.

Lemma 4. Under assumptions (A1) and (A2), if conditions (a)~(e) hold, then we have
(i) for all x∈V such that μY(x)∈(α1, α2], μ[App̄+(Y,X),x] = μX(x) and μ[App+(X,Y),x] = μY(x),
(ii) for all x∈U such that μY(x) ≤ α1, μ[App̄+(Y,X),x] < mV(α1, α2),
(iii) for all x∈U such that μY(x) > α2, μ[App̄+(Y,X),x] ≥ M(α1, α2),
(iv) for all x∈U such that μX(x) < mV(α1, α2), μ[App+(X,Y),x] ≤ α1, and
(v) for all x∈U such that μX(x) ≥ M(α1, α2), μ[App+(X,Y),x] ≥ α2.

Theorem 2. Under assumptions (A1) and (A2), if conditions (a)~(e) hold, then we have equation (1).
By Theorems 1 and 2, we know that the conjunction of (a)~(e) is the necessary and sufficient condition of equation (1) under assumptions (A1) and (A2). The following proposition is useful to know a necessary and sufficient condition of

|(fY|X+)ε(α) − gX|Y+(α)| ≤ ε, for all α∈(0, 1].   (2)
Proposition 1. The following assertions are valid:

1) For any α > max{μY(z) | z∈U}, we have gX|Y+(α) = 1 and (fY|X+)ε(α) = 1+ε.
2) For any α∈(min{μY(z) | μX(z) = max{μX(y) | y∈U}, z∈U}, max{μY(z) | z∈U}], we have gX|Y+(α) = max{μX(y) | y∈U} and (fY|X+)ε(α) = 1+ε.
3) Let α = min{μY(z) | μX(z) = max{μX(y) | y∈U}, z∈U}. We have gX|Y+(α) = (fY|X+)ε(α) = max{μX(y) | y∈U}.

Consider the case where α1 = 0 and α2 = min{μY(z) | μX(z) = max{μX(y) | y∈U}, z∈U}. Obviously this case satisfies assumptions (A1) and (A2). When (a)~(e) are satisfied for this α1 and α2, by Theorems 1 and 2 we obtain (1) with ᾱ2 = α2 = min{μY(z) | μX(z) = max{μX(y) | y∈U}, z∈U}. However, for any α between min{μY(z) | μX(z) = max{μX(y) | y∈U}, z∈U} and max{μY(z) | z∈U}, we have (fY|X+)ε(α) − gX|Y+(α) > ε from Proposition 1(2). Moreover, from Proposition 1(1), we know |(fY|X+)ε(α) − gX|Y+(α)| ≤ ε for any α > max{μY(z) | z∈U}. Hence, if min{μY(z) | μX(z) = max{μX(y) | y∈U}, z∈U} = max{μY(z) | z∈U} holds, we always have |(fY|X+)ε(α) − gX|Y+(α)| ≤ ε for any α∈(0, 1] except α = min{μY(z) | μX(z) = max{μX(y) | y∈U}, z∈U} = max{μY(z) | z∈U}.

From Proposition 1(2), the necessary and sufficient condition for (2) is
(I) (a)~(e) are satisfied for α1 = 0 and α2 = 1, or equivalently, (a)~(e) are satisfied for α1 = 0 and α2 = max{μY(z) | z∈U}, and
(II) max{μX(y) | y∈U} = 1 or min{μY(z) | μX(z) = max{μX(y) | y∈U}, z∈U} = max{μY(z) | z∈U}.
Note that (A1) holds whenever there exists y∈U such that μY(y) > 0, and (A2) holds from (I) and (II).

Finally, we discuss the values of gX|Y+(0) and (fY|X+)ε(0). By definition, we know gX|Y+(0) = max{μX(x) | μY(x) = min{μY(y) | y∈U}, x∈U} and (fY|X+)ε(0) = 0. Therefore, gX|Y+(0) > 0 whenever there exists x∈U such that μY(x) = min{μY(y) | y∈U} and μX(x) > 0. Then the necessary and sufficient condition of

|(fY|X+)ε(α) − gX|Y+(α)| ≤ ε, for all α∈[0, 1],   (3)

is (I), (II) and (III): for any x∈U such that μY(x) = min{μY(y) | y∈U}, μX(x) = 0. Note that (c) holds from (III) and α1 = 0, (A1) holds whenever there exists y∈U such that μY(y) > 0, and (A2) holds from (I) and (II). Condition (d) can be rewritten as min{μX(x) | μY(x) > 0, x∈U} < mV(0, 1).

From gX|Y+(α) = 1 − fXᶜ|Yᶜ+(1−α) and (fY|X+)ε(α) = 1 − (gYᶜ|Xᶜ+)−ε(1−α), we obtain the necessary and sufficient condition for
|(gY|X+)ε(α) − fX|Y+(α)| ≤ ε, for all α∈[1−α2, 1−α1).

It is obtained as (a)~(e) under assumptions (A1) and (A2), by replacing μX, μY, α1 and α2 with 1−μX, 1−μY, 1−α1 and 1−α2, respectively.
4. Conclusions
We discussed the necessary and sufficient conditions for fuzzy-rough modus ponens and fuzzy-rough modus tollens to be essentially equivalent. The condition with respect to a pair of fuzzy-rough modus ponens and fuzzy-rough modus tollens is obtained by investigating the equality of the modifier functions characterizing them. The obtained conditions look complex, but they are rather simple when illustrated by a figure. The result can be useful to analyze the monotonicity between two concepts expressed by fuzzy sets.
References
[1] G. Cattaneo: Fuzzy extension of rough sets theory, in: L. Polkowski and A. Skowron (eds.): Rough Sets and Current Trends in Computing, LNAI 1424, Springer, Berlin (1998) 275-282.
[2] D. Dubois and H. Prade: Gradual inference rules in approximate reasoning, Information Sciences 61 (1992) 103-122.
[3] D. Dubois and H. Prade: Putting rough sets and fuzzy sets together, in: R. Słowiński (ed.): Intelligent Decision Support: Handbook of Applications and Advances of the Sets Theory, Kluwer, Dordrecht (1992) 203-232.
[4] S. Greco, M. Inuiguchi and R. Słowiński: Rough sets and gradual decision rules, in: G. Wang, Q. Liu, Y. Yao and A. Skowron (eds.): Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, LNAI 2639, Springer-Verlag, Berlin (2003) 156-164.
[5] S. Greco, B. Matarazzo and R. Słowiński: The use of rough sets and fuzzy sets in MCDM, in: T. Gal, T. Stewart and T. Hanne (eds.): Advances in Multiple Criteria Decision Making, Kluwer Academic Publishers, Boston (1999) 14.1-14.59.
[6] S. Greco, B. Matarazzo and R. Słowiński: Rough set processing of vague information using fuzzy similarity relations, in: C.S. Calude and G. Paun (eds.): Finite Versus Infinite – Contributions to an Eternal Dilemma, Springer-Verlag, London (2000) 149-173.
[7] M. Inuiguchi, S. Greco and R. Słowiński: Fuzzy-rough modus ponens and modus tollens as a basis for approximate reasoning, in: S. Tsumoto, R. Słowiński, J. Komorowski and J.W. Grzymala-Busse (eds.): Rough Sets and Current Trends in Computing, LNAI 3066, Springer-Verlag, Berlin (2004) 84-94.
[8] M. Inuiguchi, S. Greco, R. Słowiński and T. Tanino: Possibility and necessity measure specification using modifiers for decision making under fuzziness, Fuzzy Sets and Systems 137 (2003) 151-175.
[9] M. Inuiguchi and T. Tanino: New fuzzy rough sets based on certainty qualification, in: S.K. Pal, L. Polkowski and A. Skowron (eds.): Rough-Neural Computing: Techniques for Computing with Words, Springer-Verlag, Berlin (2003) 278-296.
[10] A. Nakamura and J.M. Gao: A logic for fuzzy data analysis, Fuzzy Sets and Systems 39 (1991) 127-132.
[11] Z. Pawlak: Rough sets, International Journal of Information & Computer Sciences 11 (1982) 341-356.
[12] Z. Pawlak: Rough Sets, Kluwer, Dordrecht (1991).
[13] L. Polkowski: Rough Sets: Mathematical Foundations, Physica-Verlag, Heidelberg (2002).
[14] R. Słowiński: Rough set processing of fuzzy information, in: T.Y. Lin and A. Wildberger (eds.): Soft Computing: Rough Sets, Fuzzy Logic, Neural Networks, Uncertainty Management, Knowledge Discovery, Simulation Councils, Inc., San Diego, CA (1995) 142-145.
[15] R. Słowiński and J. Stefanowski: Rough set reasoning about uncertain data, Fundamenta Informaticae 27 (1996) 229-243.
[16] Y.Y. Yao: Combination of rough and fuzzy sets based on α-level sets, in: T.Y. Lin and N. Cercone (eds.): Rough Sets and Data Mining: Analysis for Imprecise Data, Kluwer, Boston (1997) 301-321.
[17] L.A. Zadeh: Fuzzy sets, Information and Control 8 (1965) 338-353.
Non-Commutative Fuzzy Logics and Substructural Logics

Mayuka F. KAWAGUCHI a,1, Osamu WATARI b and Masaaki MIYAKOSHI a
a Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University
b Hokkaido Automotive Engineering College
Abstract. This report treats the relation between substructural logics and fuzzy logics, focusing especially on the non-commutativity of conjunctive operators, i.e. on substructural logics without the exchange rule. As the results, the authors show that fuzzy logics based on left-continuous pseudo-t-norms form models of FLw. Also, we introduce the definition of pseudo-uninorms, give some methods to construct them, and show that such functions realize fuzzy logical systems as models of FLc and of FL, the weakest substructural logic.
Keywords. Non-commutative fuzzy logics, substructural logics, pseudo-t-norms, pseudo-uninorms
Introduction

Substructural logics are logics lacking some or all of the structural rules when they are formalized in sequent systems. It is known that they cover many of the well-known non-classical logics. According to Ono [1], the purpose of the study of substructural logics is to introduce a uniform framework in which various kinds of non-classical logics that originated from different motivations can be discussed together, and to find common features among them, taking the structural rules as a clue. On the other hand, the theoretical aspects of fuzzy logics have been developed mainly from the viewpoint of logical connectives represented by the family of t-norms and their residuals [2]. Recently, the relations of fuzzy logics to substructural logics have attracted the attention of researchers on both sides, fuzzy logics [3], [4], [5] and substructural logics [1], [6]; i.e. t-norm based fuzzy logics are included in the framework of FLew (substructural logic without contraction).
1 Corresponding Author: Mayuka F. KAWAGUCHI, Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo 060-0814, JAPAN; E-mail: [email protected].
M.F. Kawaguchi et al. / Non-Commutative Fuzzy Logics and Substructural Logics
273
This research treats fuzzy logics from the viewpoint of substructural logics without exchange, because the lack of the exchange rule in sequent systems corresponds to non-commutativity in the algebraic systems, because non-commutative logics have become relevant especially in the field of computer science, and because fuzzy logics can easily provide various concrete examples of such logical connectives.
1. Substructural Logics and Full Lambek Algebra

The formal systems by sequent calculi corresponding to the classical logic and the intuitionistic logic are called LK and LJ, respectively. A sequent calculus consists of initial sequents, structural inference rules, and inference rules on logical connectives. Here, the structural inference rules are weakening, contraction, exchange and cut. Substructural logics are the logical systems in which some of the structural rules except cut are removed from LK or LJ. The formal system called full Lambek calculus FL is obtained by removing the structural rules weakening, contraction and exchange from LJ, and it forms the basis of all substructural logics. By adding all or some of the rules
  (w left)  from Γ, Σ → Δ infer Γ, A, Σ → Δ
  (w right) from Γ → Δ, Σ infer Γ → Δ, A, Σ
  (c left)  from Γ, A, A, Σ → Δ infer Γ, A, Σ → Δ
  (e left)  from Γ, A, B, Σ → Δ infer Γ, B, A, Σ → Δ
to FL, one obtains the various intuitionistic substructural logics FLe, FLec, FLew, FLecw (= LJ), FLc, FLw and FLcw. Here, "w", "c" and "e" denote that the rules weakening, contraction and exchange, respectively, are added to FL. On the other hand, by removing weakening and/or contraction from LK, one obtains the various classical substructural logics CFLe, CFLec, CFLew and CFLecw (= LK).

The full Lambek algebra (shortly, FL-algebra), which is an algebraic interpretation of the full Lambek calculus FL, is defined as follows [7], [8].

Definition 1. The algebra ⟨V, ∧, ∨, ∘, →, ↝, 1, 0, ⊤, ⊥⟩ satisfying the following properties is called a full Lambek algebra:
(FL1) ⟨V, ∧, ∨, ⊤, ⊥⟩ is a lattice with the largest element ⊤ and the least element ⊥,
(FL2) ⟨V, ∘, 1⟩ is a monoid whose unit element is 1,
(FL3) ∀x, y, z, w ∈ V: z ∘ (x ∨ y) ∘ w = (z ∘ x ∘ w) ∨ (z ∘ y ∘ w),
(FL4) ∀x, y, z ∈ V: x ∘ y ≤ z ⟺ x ≤ y → z,
(FL5) ∀x, y, z ∈ V: x ∘ y ≤ z ⟺ y ≤ x ↝ z,
(FL6) 0 ∈ V.
It should be noted that an arbitrary element 0 is needed in order to give the interpretation of the two negations as ¬x ≡ x → 0 and ¬'x ≡ x ↝ 0.

The structural inference rules in formal systems correspond to the following properties in algebraic systems, respectively:
(FLe) exchange: ∘ is commutative,
(FLw) weakening: 0 = ⊥, x ∘ y ≤ x, y ∘ x ≤ x,
(FLc) contraction: x ≤ x ∘ x.

With respect to the classical substructural logics, the following property should be added to the algebraic systems:
(CFL) (x → 0) ↝ 0 = x = (x ↝ 0) → 0.
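The residuation conditions (FL4) and (FL5) can be made concrete numerically. A minimal sketch in Python, assuming the Mesiar-style pseudo-t-norm of Eq. (2) below (with illustrative parameters a = 0.3, b = 0.6) as the monoid operation ∘, approximates the two residua → and ↝ on a finite grid; since ∘ is non-commutative, the two residua genuinely differ:

```python
GRID = [i / 10 for i in range(11)]  # a finite grid standing in for [0, 1]

def circ(x, y):
    """Monoid operation: the pseudo-t-norm of Eq. (2) with a = 0.3, b = 0.6."""
    return 0.0 if (x <= 0.3 and y <= 0.6) else min(x, y)

def res_right(y, z):
    """y -> z = max{x in GRID : x o y <= z}, the residuum of (FL4)."""
    return max(x for x in GRID if circ(x, y) <= z)

def res_left(x, z):
    """x ~> z = max{y in GRID : x o y <= z}, the residuum of (FL5)."""
    return max(y for y in GRID if circ(x, y) <= z)

# Non-commutativity makes the two residua differ:
# res_right(0.2, 0.0) = 0.3, while res_left(0.2, 0.0) = 0.6.
```

On the grid, the adjunctions x ∘ y ≤ z ⟺ x ≤ (y → z) and x ∘ y ≤ z ⟺ y ≤ (x ↝ z) hold by construction, because ∘ is monotone in each argument.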
2. Algebraic Structures for Substructural Logics without Exchange

In this section, let us investigate the algebraic systems corresponding to the substructural logics without exchange. We shall refer to the algebraic interpretation of the logical system FL□ as the FL□-algebra.

FLw-algebra: ⟨V, ∧, ∨, ∘, →, ↝, 1, 0⟩
One obtains the FLw-algebra by adding the condition (FLw) to the FL-algebra, as follows:
(FLw1) ⟨V, ∧, ∨, 1, 0⟩ is a lattice with the largest element 1 and the least element 0,
(FL2) ⟨V, ∘, 1⟩ is a monoid whose unit element is 1,
(FL3) ∀x, y, z, w ∈ V: z ∘ (x ∨ y) ∘ w = (z ∘ x ∘ w) ∨ (z ∘ y ∘ w),
(FL4) ∀x, y, z ∈ V: x ∘ y ≤ z ⟺ x ≤ y → z,
(FL5) ∀x, y, z ∈ V: x ∘ y ≤ z ⟺ y ≤ x ↝ z.
It should be noted that the largest element ⊤ and the least element ⊥ coincide with 1 and 0, respectively. Also, 0 ∘ a = a ∘ 0 = 0 holds for any a ∈ V.

FLc-algebra: ⟨V, ∧, ∨, ∘, →, ↝, 1, 0, ⊤, ⊥⟩
One obtains the FLc-algebra by adding the condition (FLc) to the FL-algebra.

FLcw-algebra: ⟨V, ∧, ∨, ∘, →, ↝, 1, 0⟩
One obtains the FLcw-algebra by adding the condition (FLc) to the FLw-algebra.
3. Non-Commutative Fuzzy Logics and Substructural Logics without Exchange

3.1. Pseudo-t-Norms and Pseudo-Uninorms

Flondor et al. [9] have introduced pseudo-t-norms as conjunctive operators in order to construct a non-commutative fuzzy logic.

Definition 2. A two-place function T̂ : [0,1]² → [0,1] satisfying the following properties is called a pseudo-t-norm:
(pT1) T̂(a, 1) = T̂(1, a) = a,
(pT2) a ≤ b ⟹ T̂(a, c) ≤ T̂(b, c), T̂(c, a) ≤ T̂(c, b),
(pT3) T̂(a, T̂(b, c)) = T̂(T̂(a, b), c).

Now, we need to consider the left-hand continuity of conjunctive operators in logical systems, because the residuation axioms (FL4) and (FL5) are required in the FL-algebra. Flondor et al. [9] have given the following family of functions as an example of left-continuous pseudo-t-norms:

  T̂(x, y) = a_i,        if a_i ≤ x ≤ b_i and a_i ≤ y ≤ c_i,
            min(x, y),  otherwise,   (1)

where 0 ≤ a_1 < b_1 < c_1 < a_2 < b_2 < c_2 < ⋯ < a_n < b_n < c_n ≤ 1. When b_i = c_i (i = 1, …, n), this function becomes commutative, i.e. a t-norm. The simplest example of this family is given as follows:

  T̂_Mesiar(x, y) = 0,          if x ≤ a and y ≤ b,
                   min(x, y),  otherwise,   (2)

where 0 < a < b < 1.

Now, the authors introduce the definition of a non-commutative extension of uninorms [10], following the way Flondor et al. introduced the pseudo-t-norm.

Definition 3. A two-place function Û : [0,1]² → [0,1] satisfying the following properties is called a pseudo-uninorm:
(pU1) Û(a, e) = Û(e, a) = a, where e ∈ [0,1] is the neutral element,
(pU2) a ≤ b ⟹ Û(a, c) ≤ Û(b, c), Û(c, a) ≤ Û(c, b),
(pU3) Û(a, Û(b, c)) = Û(Û(a, b), c).
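Both definitions above can be checked mechanically on a finite grid. A small Python sketch for the pseudo-t-norm of Eq. (2), with illustrative parameters a = 0.3, b = 0.6:

```python
GRID = [i / 10 for i in range(11)]

def t_mesiar(x, y):
    """Eq. (2): 0 if x <= a and y <= b, min(x, y) otherwise (a = 0.3, b = 0.6)."""
    return 0.0 if (x <= 0.3 and y <= 0.6) else min(x, y)

def is_pseudo_t_norm(T, grid):
    """Brute-force check of (pT1)-(pT3) over all grid points."""
    unit = all(T(a, 1.0) == a and T(1.0, a) == a for a in grid)        # (pT1)
    mono = all(T(a, c) <= T(b, c) and T(c, a) <= T(c, b)
               for a in grid for b in grid for c in grid if a <= b)     # (pT2)
    asso = all(T(a, T(b, c)) == T(T(a, b), c)
               for a in grid for b in grid for c in grid)               # (pT3)
    return unit and mono and asso
```

t_mesiar passes all three checks, yet t_mesiar(0.2, 0.5) = 0 while t_mesiar(0.5, 0.2) = 0.2, exhibiting exactly the non-commutativity that the exchange-free setting is designed to capture.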
In the same way as in the case of uninorms [11], we have to distinguish the conjunctive pseudo-uninorms, satisfying Û(0, 1) = Û(1, 0) = 0, from the others, in order to give the pseudo-uninorms the role of conjunctive operators. We can consider the following functions as examples of conjunctive pseudo-uninorms:

  Û₁(x, y) = a,          if a ≤ x < b and a ≤ y < c,
             min(x, y),  if x < e and y < e,
             max(x, y),  otherwise.   (3)

Here, b < c = e or c < b = e.

  Û₂(x, y) = a,          if a ≤ x ≤ b and a ≤ y ≤ c,
             min(x, y),  if y ≤ 2e − x,
             max(x, y),  otherwise.   (4)

Here, e > 0.5, and b < c = 2e − 1 or c < b = 2e − 1. Clearly, Û₁ is right continuous, and Û₂ is left continuous. As another example of the left-continuous conjunctive pseudo-uninorms, an extension of the idempotent uninorms by De Baets [12] should be considered:

  U_g^c(x, y) = min(x, y),  if y ≤ g(x),
               max(x, y),  otherwise.   (5)

Here, g : [0,1] → [0,1] is a decreasing and continuous function satisfying g(1) = 0, g(e) = e, and g(x) < e (for x > e).
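Since every uninorm is in particular a pseudo-uninorm, the simplest concrete instance of Eq. (5) one can test is the commutative idempotent uninorm obtained by the illustrative choice g(x) = 1 − x with e = 1/2 (which satisfies g(1) = 0 and g(e) = e). A Python sketch checking (pU1)–(pU3) and the conjunctive condition on a grid:

```python
E = 0.5
GRID = [i / 10 for i in range(11)]

def u_g(x, y):
    """Eq. (5) with g(x) = 1 - x: min on or below the anti-diagonal, max above.
    The test y <= g(x) is written as x + y <= 1.0 to avoid float rounding
    artifacts at the boundary."""
    return min(x, y) if x + y <= 1.0 else max(x, y)

neutral     = all(u_g(a, E) == a and u_g(E, a) == a for a in GRID)        # (pU1)
monotone    = all(u_g(a, c) <= u_g(b, c) and u_g(c, a) <= u_g(c, b)
                  for a in GRID for b in GRID for c in GRID if a <= b)     # (pU2)
associative = all(u_g(a, u_g(b, c)) == u_g(u_g(a, b), c)
                  for a in GRID for b in GRID for c in GRID)               # (pU3)
conjunctive = u_g(0.0, 1.0) == 0.0 and u_g(1.0, 0.0) == 0.0
```

Taking min on the boundary y = g(x) matches the left-continuous convention used for Û₂ in Eq. (4).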
3.2. Fuzzy Logics as FLw and FLcw

Hájek [13], [14] has formulated the algebraic structures for non-commutative fuzzy logic as follows.

Definition 4. Consider an algebraic system ⟨L, ∧, ∨, ∘, →, ↝, 1, 0⟩ and the following properties on it:
(i) ⟨L, ∧, ∨, 1, 0⟩ is a lattice with the largest element 1 and the least element 0,
(ii) ⟨L, ∘, 1⟩ is a monoid whose unit element is 1,
(iii) ∀x, y, z ∈ L: x ∘ y ≤ z ⟺ x ≤ y → z ⟺ y ≤ x ↝ z,
(iv) (x → y) ∨ (y → x) = 1 = (x ↝ y) ∨ (y ↝ x),
(v) x ∧ y = (x → y) ∘ x = x ∘ (x ↝ y),
(v)' (x → y) ∘ x ≤ x ∧ y, x ∘ (x ↝ y) ≤ x ∧ y.
When L satisfies (i), (ii) and (iii), L is called a residuated lattice-ordered monoid (shortly, a residuated l-monoid). Also, an L satisfying (i)–(v) is called a pseudo-BL algebra (shortly, psBL-algebra), and an L satisfying (i)–(iv) and (v)' is called a pseudo-MTL algebra (shortly, psMTL-algebra).

It is clear that a residuated l-monoid is equivalent to the FLw-algebra. Therefore, psBL-algebras and psMTL-algebras are included in the framework of FLw-algebras. The algebraic structures psBL-algebra and psMTL-algebra correspond to the logical systems pseudo-BL (pseudo basic logic) [9], [14], [15] and pseudo-MTL (pseudo monoidal t-norm based logic) [9], [14], respectively, which have been introduced as non-commutative extensions of BL [16], [17] and MTL [18], respectively.

Let us focus our attention on the framework of fuzzy logic, i.e. the case L = [0,1]. The property (iv) in Definition 4 is called prelinearity. Also, the property (v) (resp. (v)') means that the monoid operation is continuous (resp. left continuous) when L is a continuum. According to Definition 2 in the previous subsection, it is clear that a left-continuous pseudo-t-norm based fuzzy logic ⟨[0,1], max, min, T̂, →, ↝, 1, 0⟩ forms a psMTL-algebra.

With respect to FLcw, it is already known that FLcw coincides with FLecw, i.e. LJ (see [1] for details). Thus T̂ reduces to the minimum operator in the framework of fuzzy logics, and the fuzzy intuitionistic logic ⟨[0,1], max, min, →, 1, 0⟩ forms a model of FLcw = LJ.

3.3. Fuzzy Logics as FL and FLc

Let us consider, in this subsection, a method to construct the algebraic systems corresponding to fuzzy logics as a subclass of FL. By analogy with the case of FLw, it should be adequate to add the properties (iv) and (v)' (or (v), for a stronger restriction) to the FL-algebra (Definition 1). One can formulate the fuzzy logic model of FL using the left-continuous conjunctive pseudo-uninorms which the authors introduced in §3.1, as ⟨[0,1], max, min, Û, →, ↝, 1, 0⟩.

With respect to FLc, in a similar way to the case of FL, it is expected that the FL-algebra equipped with the properties (iv) and (v)' (or (v)) gives the algebraic structure for fuzzy logics. As a concrete example, the left-continuous, conjunctive and idempotent pseudo-uninorm expressed as Eq. (5) can play the role of the conjunctive operator, as ⟨[0,1], max, min, U_g^c, →, ↝, 1, 0⟩.
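The psMTL properties (iv) (prelinearity) and (v)' can themselves be verified numerically for a left-continuous pseudo-t-norm, using grid approximations of the two residua; a sketch, again assuming the operator of Eq. (2) with illustrative parameters a = 0.3, b = 0.6:

```python
GRID = [i / 10 for i in range(11)]

def T(x, y):
    """Left-continuous pseudo-t-norm of Eq. (2) (a = 0.3, b = 0.6)."""
    return 0.0 if (x <= 0.3 and y <= 0.6) else min(x, y)

def arrow(y, z):
    """y -> z on the grid: max{x : T(x, y) <= z}."""
    return max(x for x in GRID if T(x, y) <= z)

def arrowc(x, z):
    """x ~> z on the grid: max{y : T(x, y) <= z}."""
    return max(y for y in GRID if T(x, y) <= z)

# (iv) prelinearity: (x -> y) v (y -> x) = 1, and likewise for ~>.
prelinear = all(max(arrow(x, y), arrow(y, x)) == 1.0 and
                max(arrowc(x, y), arrowc(y, x)) == 1.0
                for x in GRID for y in GRID)

# (v)': (x -> y) o x <= x ^ y and x o (x ~> y) <= x ^ y.
weak_div = all(T(arrow(x, y), x) <= min(x, y) and
               T(x, arrowc(x, y)) <= min(x, y)
               for x in GRID for y in GRID)
```

Prelinearity holds because the grid is totally ordered: whenever x ≤ y we get T(1, x) = x ≤ y, so x → y = 1.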
4. Concluding Remarks

Throughout this research, our interest has been in the logical structures without the exchange rule (i.e. without the commutativity property). The authors have formulated the algebraic structures FLw-algebra, FLc-algebra and FLcw-algebra for the substructural logics FLw, FLc and FLcw, respectively. Then, we have shown that fuzzy logics based on left-continuous pseudo-t-norms form models of FLw. As the main results of this work, we have introduced the definition of pseudo-uninorms, given some methods to construct them, and shown that such functions realize fuzzy logical systems as models of FLc and FL.

It should be noted that the notion of uninorms was originally introduced [10] as a family of aggregation operators in the field of fuzzy modeling, not as logical connectives. Nevertheless, from the viewpoint of substructural logics without the weakening rule, uninorm (and pseudo-uninorm) based fuzzy logics can be justified. The authors' forthcoming research will be closely related to substructural logics without the weakening rule.
References
[1] H. Ono: Substructural logics and residuated lattices – an introduction, in: V.F. Hendricks and J. Malinowski (eds.): Trends in Logic – 50 Years of Studia Logica, Trends in Logic 21, Kluwer Academic Publishers (2003).
[2] E.P. Klement, R. Mesiar and E. Pap: Triangular Norms, Trends in Logic 8, Kluwer Academic Publishers (2000).
[3] F. Bou, A. García-Cerdaña and V. Verdú: On some substructural aspects of t-norm based logics, Proc. of IPMU 2004, Perugia, Italy (2004).
[4] S. Jenei: Structure of Girard monoids on [0,1], in: S.E. Rodabaugh and E.P. Klement (eds.): Topological and Algebraic Structures in Fuzzy Sets, Trends in Logic 20, Kluwer Academic Publishers (2003).
[5] S. Jenei: On the structure of rotation-invariant semigroups, Archive for Mathematical Logic 42 (2003) 489–514.
[6] H. Ono: Residuation theory and substructural logics, Proc. of the 34th MLG Meeting, Echigo-Yuzawa, Japan (2001) 15–18.
[7] H. Ono: Semantics for substructural logics, in: K. Došen and P. Schroeder-Heister (eds.): Substructural Logics, Oxford University Press (1993) 259–291.
[8] H. Ono: Proof-theoretic methods in non-classical logic – an introduction, in: Theories of Types and Proofs, MSJ Memoirs 2, Mathematical Society of Japan (1998) 207–254.
[9] P. Flondor, G. Georgescu and A. Iorgulescu: Pseudo-t-norms and pseudo-BL algebras, Soft Computing 5 (2001) 355–371.
[10] R.R. Yager and A. Rybalov: Uninorm aggregation operators, Fuzzy Sets and Systems 80 (1996) 111–120.
[11] J.C. Fodor, R.R. Yager and A. Rybalov: Structure of uninorms, Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems 5 (1997) 411–427.
[12] B. De Baets: Idempotent uninorms, European J. of Operational Research 118 (1999) 631–642.
[13] P. Hájek: Fuzzy logics with non-commutative conjunctions, J. of Logic and Computation 13 (2003) 469–479.
[14] P. Hájek: Observations on non-commutative fuzzy logic, Soft Computing 8 (2003) 38–43.
[15] A. Di Nola, G. Georgescu and A. Iorgulescu: Pseudo-BL algebras, Multiple-Valued Logic 8 (2001), Part I: 673–714; Part II: 717–750.
[16] R. Cignoli, F. Esteva, L. Godo and A. Torrens: Basic fuzzy logic is the logic of continuous t-norms and their residua, Soft Computing 4 (2000) 106–112.
[17] P. Hájek: Metamathematics of Fuzzy Logic, Trends in Logic 4, Kluwer Academic Publishers (1998).
[18] F. Esteva and L. Godo: Monoidal t-norm based logic: towards a logic for left-continuous t-norms, Fuzzy Sets and Systems 124 (2001) 271–288.
Visibility and Focus: An Extended Framework for Granular Reasoning

Yasuo KUDO a,1 and Tatsuya MURAI b
a Dept. of Computer Science and Systems Eng., Muroran Institute of Technology, Japan
b Graduate School of Information Science and Technology, Hokkaido University, Japan

Abstract. In this paper, we introduce another concept of granular reasoning, called visibility. Visibility separates all sentences into "visible" sentences, that is, sentences we consider, and "invisible" sentences, which are out of consideration. Combining visibility and focus, we connect granular reasoning with four-valued truth valuations. We also discuss the relationships among granularity based on visibility and focus, nonmonotonic reasoning by Ziarko's variable precision rough set models, and knowledge base revision.
Keywords. Granular reasoning, belief revision, non-monotonic reasoning, visibility and focus
1. Introduction

Granular computing based on rough set theory (Pawlak [13,14]) has been widely studied as a new paradigm of computing (for example, see [6,16]). Murai et al. have proposed granular reasoning as a mechanism for reasoning using granular computing [7], and have developed zooming reasoning systems in a series of papers [5–9]. In this series, Murai et al. have used the concept of focal point, or focus, which represents the sentences we use in some step of reasoning. The focus provides a three-valued truth valuation that assigns the truth value "true" or "false" to atomic sentences that appear in the focus, and assigns the truth value "unknown" to all other atomic sentences.

In this paper, we introduce another concept of granularity, called visibility. Visibility is an analogy with the term about vision that means the range of vision. Applying the concept of visibility to the context of granular reasoning, we intend to connect granular reasoning with four-valued truth valuations over the following four values: true, false, unknown and undefined. Moreover, combining the visibility and the focus we redefine, we separate all atomic sentences into the following three groups: "invisible" sentences, that is, atomic sentences with the truth value "undefined"; "obscurely visible" sentences with the truth value "unknown"; and "clearly visible" sentences with the truth value "true" or "false".

1 Correspondence to: Yasuo Kudo, Department of Computer Science and Systems Engineering, Muroran Institute of Technology, Mizumoto 27–1, Muroran 050–8585, Japan. Tel.: +81 143 46 5469; Fax: +81 143 46 5499; E-mail: [email protected].
Y. Kudo and T. Murai / Visibility and Focus: An Extended Framework for Granular Reasoning
281
Moreover, we also discuss connections among granularity based on the visibility and focus, nonmonotonic reasoning by Ziarko's variable precision rough set models [17], and knowledge base revision [3,4].
2. Preliminaries

2.1. Kripke-Style Models

Let P be a set of (at most countably infinite) atomic sentences. We construct a language LML(P) for modal logic from P using the logical operators ⊤ (the truth constant), ⊥ (the falsity constant), ¬ (negation), ∧ (conjunction), ∨ (disjunction), → (material implication), and the two modal operators □ (necessity) and ◇ (possibility) by the following rules: (1) p ∈ P ⇒ p ∈ LML(P), (2) p ∈ LML(P) ⇒ ¬p ∈ LML(P), (3) p, q ∈ LML(P) ⇒ (p ∧ q), (p ∨ q), (p → q) ∈ LML(P), (4) p ∈ LML(P) ⇒ □p, ◇p ∈ LML(P). A sentence is called non-modal if it does not contain any modal operators.

A Kripke model is a triple M = ⟨W, R, v⟩, where W is a non-empty set of possible worlds, R is a binary relation on W, and v is a valuation that assigns either the truth value t (true) or f (false) to every atomic sentence p ∈ P at every world w ∈ W. We define M, w |= p ⟺ v(p, w) = t. The relation |= is extended to every sentence p ∈ LML(P) in the usual way. For any sentence p ∈ LML(P), we define the truth set of p in M as ‖p‖ = {w ∈ W | M, w |= p}.

2.2. Zooming Reasoning Systems

Zooming reasoning systems provide reasoning processes using reconstruction of models by generating equivalence classes of possible worlds [9]. Such construction operations are called zooming in & out [8,9]. Zooming reasoning systems are formalized as follows. Let M = ⟨W, R, v⟩ be a Kripke model, and let L(P) be the propositional language generated from P in the usual way, similar to the construction of LML(P). Suppose we consider a set Γ of non-modal sentences that illustrates the set of sentences we need to use in the current reasoning step. The set Γ is called a focal point, or a focus. We define the set PΓ of atomic sentences that appear in the current reasoning step by PΓ = P ∩ Sub(Γ), where Sub(Γ) is the union of the sets of subsentences of each sentence in Γ. Using PΓ, an equivalence relation RΓ over W, called an agreement relation, is defined by
  x RΓ y ⟺def v(p, x) = v(p, y) for all p ∈ PΓ.   (1)
The agreement relation RΓ induces the quotient set W̃Γ =def W/RΓ. Each element [x]RΓ ∈ W̃Γ is a granule of possible worlds under Γ, and is called a granularized possible world. Hereafter, we denote a granularized world [x]RΓ by x̃. We also construct a truth valuation ṽΓ for granularized possible worlds. The valuation ṽΓ becomes the following three-valued one:

  ṽΓ : P × W̃Γ → 2^{t,f} \ {∅}.   (2)
The three-valued valuation ṽΓ is defined by

  ṽΓ(p, x̃) = {t},     if v(p, w) = t for all w ∈ x̃,
             {f},     if v(p, w) = f for all w ∈ x̃,
             {t, f},  otherwise.   (3)

Hereafter, we abbreviate the two singletons {t} and {f} as simply t and f, respectively. Now we have a granularized model

  M̃Γ =def ⟨W̃Γ, ⋯, ṽΓ⟩   (4)

of M with respect to Γ. The three-valued semantic consequence relation |=3 is partially defined by M̃Γ, x̃ |=3 p ⟺def ṽΓ(p, x̃) = t, and extended in the usual way.

When we move to the next step in some reasoning process, we need to reconstruct the granularized possible worlds and the granularized model. Let Γ be the current focus, and Δ the focus in the next step.

(1) When PΓ ⊃ PΔ, we need further granularization, represented by a mapping

  OΓΔ : W̃Γ → W̃Δ,   (5)
  OΓΔ(x̃) =def {w ∈ W | v(p, w) = v(p, x) for all p ∈ PΔ and x ∈ x̃},   (6)

and called a zooming out from Γ to Δ.

(2) When PΓ ⊂ PΔ, we need the inverse operation of granularization, represented by a mapping

  IΓΔ : W̃Γ → 2^{W̃Δ},   (7)
  IΓΔ(x̃) =def {ỹ ∈ W̃Δ | v(p, x) = v(p, y) for all p ∈ PΓ, x ∈ x̃ and y ∈ ỹ},   (8)

and called a zooming in from Γ to Δ.

(3) If PΓ and PΔ are not nested in each other, the movement from Γ to Δ is represented by a combination of zooming in & out: a zooming in from Γ to Γ ∪ Δ first, and next, a zooming out from Γ ∪ Δ to Δ.
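The zooming-out construction above can be sketched concretely. A minimal Python example, assuming worlds are simply truth assignments (dicts) and ignoring the accessibility relation R, which Eqs. (1)–(3) do not use:

```python
from itertools import product

P = ['p', 'q', 'r']
# A possible world is a truth assignment for the atoms in P.
worlds = [dict(zip(P, bits)) for bits in product([True, False], repeat=len(P))]

def zoom_out(worlds, focus_atoms):
    """Group worlds by agreement on focus_atoms (Eq. (1)) and return the
    granules together with the three-valued valuation of Eq. (3)."""
    granules = {}
    for w in worlds:
        key = tuple(w[a] for a in focus_atoms)
        granules.setdefault(key, []).append(w)
    valuation = {}
    for key, members in granules.items():
        for a in P:
            vals = {w[a] for w in members}
            valuation[key, a] = 't' if vals == {True} else 'f' if vals == {False} else '{t,f}'
    return granules, valuation

granules, val = zoom_out(worlds, ['p', 'q'])   # P_Gamma = {p, q}
```

With focus atoms {p, q} the eight worlds collapse into four granules, and the unfocused atom r receives the value {t, f} in every granule, just as in Table 1 below.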
3. Visibility: Another Concept of Granularization

In this section, we introduce another concept of granularization, called visibility, and propose a four-valued valuation based on the visibility and focus.

3.1. Motivation

"Visibility" is a term about vision that means the range of vision. Visibility divides objects we can see into two types primitively: objects inside the range of vision, that is, currently visible objects, and outside objects, that is, currently invisible objects. Moreover, combining the visibility and the focus, visible objects are further divided into two types. If an object is in the range of vision but out of focus, it looks obscure, and we can see the object clearly only if it is in the focal point.
Table 1. An example of truth valuation

  W̃{p,q}                      p    q    r
  w̃1 =def ‖p‖ ∩ ‖q‖           t    t    {t,f}
  w̃2 =def ‖p‖ ∩ ‖q‖^C         t    f    {t,f}
  w̃3 =def ‖p‖^C ∩ ‖q‖         f    t    {t,f}
  w̃4 =def ‖p‖^C ∩ ‖q‖^C       f    f    {t,f}
On the other hand, by an analogy between objects we can see and atomic sentences we use in granular reasoning, there is no "invisible" atomic sentence. All atomic sentences are either "clearly visible", in the sense that they have the truth value t or f, or "obscurely visible", meaning that they have the truth value {t, f}. However, from the semantic viewpoint, we may not be able to distinguish atomic sentences which have occurrences in Γ from atomic sentences which have no occurrence in Γ. For example, suppose P has three atomic sentences, P = {p, q, r}, and we have a Kripke-style model M = ⟨W, ⋯, v⟩ such that W = 2^P and v(p, w) = t ⟺ p ∈ w. Moreover, suppose Γ = {q → p, p}. Then PΓ = {p, q} and we have the truth valuation in the constructed granularized model M̃Γ illustrated in Table 1. Here, the focus Γ has two models, w̃1 and w̃2, and we have both Γ ⊭3 q and Γ ⊭3 ¬q. Similarly, we also have both Γ ⊭3 r and Γ ⊭3 ¬r. Thus, from the viewpoint of the semantic consequence relation |=3, we cannot distinguish q, the atomic sentence with an occurrence in Γ, from r, the atomic sentence with no occurrence in Γ.

To represent the difference between such q and r, we introduce the concept of visibility into the context of granular reasoning. Moreover, we redefine the focus of granularization. The intuitive meanings of visibility and focus are the following. Visibility is a set of atomic sentences considered in the current step of reasoning, or, semantically, a set of atomic sentences each of which has a truth value t, f or {t, f}. Focus is a set of atomic sentences p such that either p or ¬p is derived from Γ, or, semantically, a set of atomic sentences each of which has a truth value t or f.

3.2. A Formulation of Visibility

Let Γ be a set of non-modal sentences considered in the current step of reasoning. Using Γ, we define the visibility relative to Γ. Moreover, we redefine the concept of the focus, and propose the focus relative to Γ. The definitions of the visibility Vs(Γ) and the focus Fc(Γ) relative to Γ are as follows:

  Vs(Γ) =def P ∩ Sub(Γ) = PΓ,   (9)
  Fc(Γ) =def {p ∈ P | either Γ ⊢ p or Γ ⊢ ¬p}.   (10)

Note that we have Fc(Γ) ⊆ Vs(Γ) for any Γ. To characterize the semantic meaning of visibility and focus, we also construct a granularized model M̃Fc(Γ) based on the focus Fc(Γ) relative to Γ. First, if Fc(Γ) ≠ ∅, we define the agreement relation RFc(Γ) by Eq. (1), and construct the set of granularized possible worlds W̃Fc(Γ). On the other hand, if Fc(Γ) = ∅, we define W̃Fc(Γ) =def {W}.

Next, we construct a valuation ṽFc(Γ) in the granularized model M̃Fc(Γ) as the following four-valued valuation:

  ṽFc(Γ) : P × W̃Fc(Γ) → 2^{t,f}.   (11)

Actually, the four-valued valuation ṽFc(Γ) is defined by

  ṽFc(Γ)(p, w̃) =def {t},     if p ∈ Vs(Γ) and v(p, x) = t for all x ∈ w̃,
                   {f},     if p ∈ Vs(Γ) and v(p, x) = f for all x ∈ w̃,
                   {t, f},  if p ∈ Vs(Γ) but v(p, x) = t for some x ∈ w̃
                            and v(p, y) = f for some y ∈ w̃,
                   ∅,       if p ∉ Vs(Γ).   (12)

Similar to the case of the three-valued valuation, we abbreviate the two singletons {t} and {f} as simply t and f, respectively. Moreover, the semantic consequence relation |=4 is partially defined as follows:

  M̃Fc(Γ), w̃ |=4 p ⟺def ṽFc(Γ)(p, w̃) = t,   (13)
  M̃Fc(Γ), w̃ ⊭4 p ⟺def either ṽFc(Γ)(p, w̃) = f or ṽFc(Γ)(p, w̃) = {t, f}.   (14)

Thus, the relation |=4 is not defined for any atomic sentence q such that q ∉ Vs(Γ). The main differences between the three-valued valuation ṽΓ defined by Eq. (3) and the four-valued valuation ṽFc(Γ) defined by Eq. (12) are the following:

1. the existence of the fourth truth value ∅, and
2. the separation of atomic sentences by the visibility Vs(Γ) and the focus Fc(Γ).

For 1., we interpret the truth value ∅ as "undefined" or "out of consideration"; thus we do not consider any "invisible" atomic sentences. For example, in the same setting as the example in Section 3.1, but using the four-valued valuation ṽFc(Γ), we can distinguish the atomic sentence q, which has an occurrence in Γ, from the atomic sentence r, which has no occurrence in Γ. This is because, for all w̃ ∈ W̃{p,q}, we have either M̃Fc(Γ), w̃ |=4 q or M̃Fc(Γ), w̃ ⊭4 q; however, we have neither M̃Fc(Γ), w̃ |=4 r nor M̃Fc(Γ), w̃ ⊭4 r.

For 2., the visibility Vs(Γ) divides "considered" atomic sentences from the other atomic sentences, which are "out of consideration". Moreover, it is clear that, for each p ∈ Fc(Γ) and each w̃ ∈ W̃Fc(Γ), we have either ṽFc(Γ)(p, w̃) = t or ṽFc(Γ)(p, w̃) = f.

We define the truth values of complex sentences by the truth tables illustrated in Table 2. Our intention in these definitions is to extend the truth value assignment of Eq. (12) to all propositional sentences p ∈ L(P) as far as possible.
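Eqs. (9), (10) and (12) can be walked through on the running example Γ = {q → p, p}. A minimal Python sketch, in which the syntactic ⊢ of Eq. (10) is replaced by semantic consequence over the finite model class (an assumption made purely for illustration):

```python
from itertools import product

P = ['p', 'q', 'r']
worlds = [dict(zip(P, bits)) for bits in product([True, False], repeat=len(P))]

# Gamma = {q -> p, p}, each sentence encoded as a predicate on a world.
gamma = [lambda w: (not w['q']) or w['p'], lambda w: w['p']]
vs = {'p', 'q'}                     # Vs(Gamma) = P ∩ Sub(Gamma), read off Gamma

models = [w for w in worlds if all(s(w) for s in gamma)]

def entails(lit):
    """Stand-in for Gamma |- lit: semantic consequence over the finite models."""
    return all(lit(w) for w in models)

fc = {a for a in P
      if entails(lambda w, a=a: w[a]) or entails(lambda w, a=a: not w[a])}

def v4(atom, granule):
    """The four-valued valuation of Eq. (12); a granule is a list of worlds."""
    if atom not in vs:
        return frozenset()          # the fourth value: undefined / invisible
    vals = {w[atom] for w in granule}
    return frozenset({'t'} if vals == {True} else
                     {'f'} if vals == {False} else {'t', 'f'})

granule = [w for w in worlds if w['p']]   # worlds agreeing on the focused atom p
```

Here fc comes out as {p}: p is settled by Γ, q is visible but unsettled (value {t, f}), and r is invisible (value ∅), which is exactly the q-versus-r distinction the text motivates.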
Here, the connective → is defined by p → q =def ¬p ∨ q. The definition of the truth values for the connectives corresponds to considering the total order t > {t, f} > f > ∅ over the set 2^{t,f} of four truth values, and identifying the connectives ∧ and ∨ with the meet and join operations based on this total order, respectively. This definition of the connectives is quite different from Belnap's four-valued logic [1,2], and is a simple extension of Kleene's strong three-valued logic [5]. Note that not all two-valued tautologies are
Table 2. Truth tables of the connectives ¬, ∧, ∨ and →

  Negation:
    p:   ∅      f      t      {t,f}
    ¬p:  ∅      t      f      {t,f}

  Conjunction p ∧ q (rows p, columns q):
              q = ∅    q = f    q = t    q = {t,f}
    p = ∅       ∅        ∅        ∅        ∅
    p = f       ∅        f        f        f
    p = t       ∅        f        t        {t,f}
    p = {t,f}   ∅        f        {t,f}    {t,f}

  Disjunction p ∨ q (rows p, columns q):
              q = ∅    q = f    q = t    q = {t,f}
    p = ∅       ∅        f        t        {t,f}
    p = f       f        f        t        {t,f}
    p = t       t        t        t        t
    p = {t,f}   {t,f}    {t,f}    t        {t,f}

  Implication p → q (rows p, columns q):
              q = ∅    q = f    q = t    q = {t,f}
    p = ∅       ∅        f        t        {t,f}
    p = f       t        t        t        t
    p = t       f        f        t        {t,f}
    p = {t,f}   {t,f}    {t,f}    t        {t,f}
valid in our four-valued valuations. For example, we have ṽFc(Γ)(p ∨ ¬p, w̃) = {t, f} if ṽFc(Γ)(p, w̃) = {t, f}. More refinement and syntactic characterizations of visibility and focus are future issues.
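Because the connectives are determined by the total order t > {t,f} > f > ∅, Table 2 collapses to a few lines of code; a sketch that reproduces every entry:

```python
# Truth values as strings, ranked by the total order t > {t,f} > f > ∅.
RANK = {'∅': 0, 'f': 1, '{t,f}': 2, 't': 3}
NEG = {'∅': '∅', 'f': 't', 't': 'f', '{t,f}': '{t,f}'}

def conj(p, q):
    return min(p, q, key=RANK.get)       # ∧ is the meet w.r.t. the total order

def disj(p, q):
    return max(p, q, key=RANK.get)       # ∨ is the join w.r.t. the total order

def impl(p, q):
    return disj(NEG[p], q)               # p -> q is defined as ¬p ∨ q
```

In particular, disj('{t,f}', NEG['{t,f}']) yields '{t,f}', confirming the remark above that p ∨ ¬p is not a tautology under these valuations.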
4. Knowledge Base Revision, Nonmonotonic Reasoning and Granularity

4.1. Default Reasoning and Granularity

Similar to Murai et al. [12], we can represent a kind of default rules [15] using the given granularized model M̃Vs(Γ) = ⟨W̃Vs(Γ), ⋯, ṽVs(Γ)⟩ and Ziarko's variable precision rough set models [17]. One of the main features is that, for any atomic sentence p ∈ Vs(Γ) \ Fc(Γ), we can regard the literals of p, that is, p and ¬p, as normal default rules. A normal default rule is a default rule without preconditions, for example, "(if ¬p is not proved, then) generally p" [15]. In general, for each p ∈ Vs(Γ) \ Fc(Γ) and each w̃ ∈ W̃Fc(Γ), we have ṽFc(Γ)(p, w̃) = {t, f}, that is, w̃ ∩ ‖p‖ ≠ ∅ and w̃ ∩ ‖p‖^C ≠ ∅ by the two-valued valuation v. However, if we have

  |w̃ ∩ ‖p‖| / |w̃| ≥ 1 − β

for some β (0 ≤ β < 1/2), then, using Ziarko's notion of inclusion [17], we denote w̃ ⊆β ‖p‖. Using this notation, we can extend |=4 as follows:

  M̃Fc(Γ), w̃ |=4^β p ⟺def p ∈ Vs(Γ) and w̃ ⊆β ‖p‖, where 0 ≤ β < 1/2,   (15)
  M̃Fc(Γ), w̃ ⊭4^β p ⟺def p ∈ Vs(Γ) but w̃ ⊄β ‖p‖.   (16)

We interpret M̃Fc(Γ), w̃ |=4^β p as "p is generally true at w̃". Note that, if β = 0, the relation |=4^β is identical to |=4.
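The β-inclusion test behind Eqs. (15) and (16) is a simple majority ratio; a minimal sketch, with illustrative integer labels standing in for worlds:

```python
def beta_included(granule, extent, beta):
    """Ziarko-style majority inclusion of Eq. (15):
    granule ⊆_β extent  iff  |granule ∩ extent| / |granule| >= 1 - beta."""
    assert 0 <= beta < 0.5
    hits = sum(1 for x in granule if x in extent)
    return hits / len(granule) >= 1 - beta

granule = set(range(10))   # a granule of 10 worlds (hypothetical labels)
extent = set(range(9))     # ||p|| contains 9 of them
```

With β = 0.2 the granule β-includes ‖p‖ (ratio 0.9 ≥ 0.8), so p would count as "generally true" there; with β = 0.05 it does not (0.9 < 0.95).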
Let ND be the set of literals defined by ND =def (Vs(Γ) \ Fc(Γ)) ∪ {¬p | p ∈ Vs(Γ) \ Fc(Γ)}. Using some β (0 ≤ β < 1/2) and Eqs. (15) and (16), we can consider a default theory ⟨Γ, ND⟩ with a knowledge base Γ and a set of normal default rules ND, and its extension, similar to the literature [15]. Further properties of the default rules defined above are future issues.

4.2. Knowledge Base Revision and Granularity

Regarding a set Γ of non-modal sentences as a knowledge base, operations of knowledge base revision may induce changes of the visibility and the focus relative to Γ. In this section, we treat the case that we revise Γ by some information p; that is, to accept p, we change Γ as little as possible in some sense. The result of revision Γ ∗ p triggers reconstruction processes of the current granularized model M̃Vs(Γ) = ⟨W̃Vs(Γ), ⋯, ṽVs(Γ)⟩.

For a given set Γ of non-modal sentences and a non-modal sentence p, the revising process of the current visibility Vs(Γ) and focus Fc(Γ) relative to Γ is as follows:

Step 1. Using Γ and p, calculate the result of revision Γ ∗ p by some (syntactic or semantic) mechanism.
Step 2. Calculate the visibility Vs(Γ ∗ p) by Eq. (9) and the focus Fc(Γ ∗ p) by Eq. (10), respectively.
Step 3. Compare Vs(Γ) and Vs(Γ ∗ p):
  1. If Vs(Γ) = Vs(Γ ∗ p), go to Step 4.
  2. If Vs(Γ) ⊃ Vs(Γ ∗ p), construct a new set of granularized possible worlds by a zooming out from Vs(Γ) to Vs(Γ ∗ p).
  3. If Vs(Γ) ⊂ Vs(Γ ∗ p), construct a new set of granularized possible worlds by a zooming in from Vs(Γ) to Vs(Γ ∗ p).
  4. Otherwise, construct a new set of granularized possible worlds by a zooming in & out, that is, a zooming in from Vs(Γ) to Vs(Γ) ∪ Vs(Γ ∗ p) first, and next, a zooming out from Vs(Γ) ∪ Vs(Γ ∗ p) to Vs(Γ ∗ p).
Step 4. Construct the four-valued truth valuation ṽFc(Γ∗p) based on the revised focus Fc(Γ ∗ p).

The revision of the current "knowledge base" Γ by the new information p may cause changes of the current visibility Vs(Γ) and the focus Fc(Γ).
The change of Vs(Γ) triggers zooming in & out operations, and yields the resulting granularized model M̃^{Fc(Γ∗p)} = ⟨W̃^{Fc(Γ∗p)}, ..., ṽ^{Fc(Γ∗p)}⟩. Further refinement of this process and its connections with other belief change operations are future issues.
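As an illustration, the set ND of Section 4.1 and the visibility comparison of Step 3 above can be sketched in Python. This is a minimal sketch under the assumption that atomic sentences are represented as strings and visibilities and foci as plain sets; the function names are hypothetical, and the formal semantics of the zooming operations is not modeled, only the choice among the four cases:

```python
# Sketch of two constructions from this section, assuming atomic
# sentences are strings and visibilities/foci are Python sets.
# Negation is marked with a leading "¬".

def normal_default_literals(vs, fc):
    """ND = (Vs \\ Fc) ∪ {¬p | p ∈ Vs \\ Fc}: the literals over the
    obscurely visible atoms, each to be read as a normal default rule."""
    obscure = vs - fc
    return obscure | {"¬" + p for p in obscure}

def zoom_plan(vs_old, vs_new):
    """Step 3: decide which zooming operations rebuild the granularized
    possible worlds when the visibility changes from vs_old to vs_new."""
    if vs_old == vs_new:
        return []                                  # case 1: nothing to rebuild
    if vs_old > vs_new:                            # case 2: strict superset
        return [("zoom out", vs_old, vs_new)]
    if vs_old < vs_new:                            # case 3: strict subset
        return [("zoom in", vs_old, vs_new)]
    union = vs_old | vs_new                        # case 4: incomparable sets
    return [("zoom in", vs_old, union), ("zoom out", union, vs_new)]
```

For incomparable visibilities, e.g. `zoom_plan({'p', 'q'}, {'q', 'r'})`, the plan zooms in from {p, q} to {p, q, r} and then out to {q, r}, matching case 4 of Step 3.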
5. Conclusion

In this paper, we introduced visibility as another concept of granular reasoning, and proposed a four-valued valuation that combines visibility and focus. Visibility and focus separate all atomic sentences into three groups, "invisible", "obscurely visible" and "clearly visible", which correspond to the four-valued truth values ∅, {t, f}, and t or f, respectively. Moreover, we discussed connections among granularity based on visibility and focus, nonmonotonic reasoning, and knowledge base revision: any literals
with the truth value {t, f} can be regarded as normal default rules, and belief base revision triggers zooming in & out. Further theoretical refinement of the proposed concepts is needed; this is a future issue.
References

[1] Belnap, N.: How a Computer Should Think. Contemporary Aspects of Philosophy, Ryle, G. (ed.), pp. 30–56, Oriel Press (1977).
[2] Belnap, N.: A Useful Four-Valued Logic. Modern Uses of Multiple-Valued Logic, Dunn, J.M. and Epstein, G. (eds.), pp. 8–37 (1977).
[3] Gärdenfors, P.: Knowledge in Flux, MIT Press (1988).
[4] Gärdenfors, P. and Rott, H.: Belief Revision. Handbook of Logic in Artificial Intelligence and Logic Programming, Gabbay, D.M., Hogger, C.J. and Robinson, J.A. (eds.), pp. 35–132, Clarendon Press (1995).
[5] Kleene, S.: Introduction to Metamathematics, Van Nostrand (1952).
[6] Lin, T.Y.: Granular Computing on Binary Relations, I & II. Rough Sets in Knowledge Discovery 1: Methodology and Applications, Polkowski, L. et al. (eds.), Physica-Verlag, pp. 107–121, pp. 122–140 (1998).
[7] Murai, T., Nakata, M. and Sato, Y.: A Note on Filtration and Granular Reasoning. New Frontiers in Artificial Intelligence, Terano, T. et al. (eds.), LNAI 2253, Springer, pp. 385–389 (2001).
[8] Murai, T., Resconi, G., Nakata, M. and Sato, Y.: Operations of Zooming In and Out on Possible Worlds for Semantic Fields. Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies, Damiani, E. et al. (eds.), pp. 1083–1087 (2002).
[9] Murai, T., Resconi, G., Nakata, M. and Sato, Y.: Granular Reasoning Using Zooming In & Out: Part 2. Aristotle's Categorical Syllogism. Electronic Notes in Theoretical Computer Science, Vol. 82, Issue 4 (2003).
[10] Murai, T., Resconi, G., Nakata, M. and Sato, Y.: Granular Reasoning Using Zooming In & Out: Part 1. Propositional Reasoning. Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, Wang, G., Liu, Q., Yao, Y. and Skowron, A. (eds.), LNAI 2639, Springer, pp. 421–424 (2003).
[11] Murai, T., Sanada, M., Kudo, Y. and Kudo, M.: A Note on Ziarko's Variable Precision Rough Set Model and Nonmonotonic Reasoning. Rough Sets and Current Trends in Computing, Tsumoto, S. et al. (eds.), LNAI 3066, Springer (2004).
[12] Murai, T., Sanada, M., Kudo, Y. and Sato, Y.: Monotonic and Nonmonotonic Reasoning in Zoom Reasoning Systems. Knowledge-Based Intelligent Information and Engineering Systems, Negoita, M.G., Howlett, R.J. and Jain, L.C. (eds.), LNAI 3213, Springer, pp. 1085–1091 (2004).
[13] Pawlak, Z.: Rough Sets. International Journal of Computer and Information Sciences, Vol. 11, pp. 341–356 (1982).
[14] Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer, Dordrecht (1991).
[15] Reiter, R.: A Logic for Default Reasoning. Artificial Intelligence, Vol. 13, pp. 81–132 (1980).
[16] Skowron, A.: Toward Intelligent Systems: Calculi of Information Granules. New Frontiers in Artificial Intelligence, Terano, T. et al. (eds.), LNAI 2253, Springer, pp. 251–260 (2001).
[17] Ziarko, W.: Variable Precision Rough Set Model. Journal of Computer and System Sciences, Vol. 46, pp. 39–59 (1993).