    ∀x COMPANY(x) ⇒ [¬SS-TAXED(x) ⇒ x = Bull]
From a proof-theoretic point of view, circumscription offers the exciting prospect of a general formulation, as opposed to CWA and subimplication, for which not all theories are eligible for treatment by existing proof procedures. In fact, circumscription benefits from a very elegant proof-theoretic approach through a mere axiom schema, the so-called circumscription schema.

Definition
The circumscription schema for circumscription of the relation R in the theory T (in the form of the sentence† T{R, R₁, ..., Rₙ}) with varying relations R₁, ..., Rₙ is of the form

    [T{R', R'₁, ..., R'ₙ} ∧ (∀x R'(x) ⇒ R(x))] ⇒ (∀x R(x) ⇒ R'(x))

where R', R'₁, ..., R'ₙ are formulae and T{R', R'₁, ..., R'ₙ} is the theory T{R, R₁, ..., Rₙ} in which all occurrences of R, R₁, ..., Rₙ are replaced by R', R'₁, ..., R'ₙ respectively.

Intuition is often lost when it comes to selecting formulae for replacing the formula parameters R', R'₁, ..., R'ₙ in the circumscription schema. It is a difficult task (Reiter, 1982; Besnard, 1984; Lifschitz, 1985a) to find an instance† (of the circumscription schema) that ultimately leads to what could
† A (finite) theory can always be written in the form of a formula in which all variables occurring are quantified.
† An instance of a schema is a formula that results from substituting formulae for the formula parameters in the schema.
be called conclusive formulae. This is why it seems prudent to provide the reader with at least a rough idea of what the circumscription schema means. First, T{R', R'₁, ..., R'ₙ} testifies, if deducible from T, that R' is admissible for R in the sense that the way R is (incompletely) specified in T does not preclude R from being R' (only extensionally, of course). Secondly, (∀x R'(x) ⇒ R(x)) ensures that R is required (by the formulae of T) to be true whenever R' is true, so that if R' is admissible for R then restricting R to R' (by means of the formula (∀x R(x) ⇒ R'(x))) is the least that can be done in order to have R minimized. The circumscription schema is to be added to first-order predicate calculus† (a brief account of it is given in the Introduction, Appendix A, p. 8) for use by first-order inference rules as any first-order axiom schema, thus disturbing first-order predicate calculus as little as possible.

Definition

A formula F is derivable by circumscription of the relation R in the theory T with the relations R₁, ..., Rₙ varying, denoted T ⊢C/R(P) F, iff F is derivable by first-order predicate calculus from T supplemented with the corresponding circumscription schema, in symbols C[T{R/R₁, ..., Rₙ}] ⊢ F.†

† First-order predicate calculus is the syntactical part of first-order logic.
† The symbol ⊢ is the usual notation of derivability by first-order logic.

Let us see how all this works in practice. Returning to our illustration, let us circumscribe COMP-MANUF, with SS-TAXED being allowed to vary, in the theory T₄ consisting of the two following formulae:
    COMP-MANUF(Bull)
    ∀x COMPANY(x) ∧ ¬COMP-MANUF(x) ⇒ SS-TAXED(x)
Let us consider the instance yielded by the circumscription schema for the substitutions

    COMP-MANUF'(x) ≡ x = Bull
    SS-TAXED'(x) ≡ ¬x = Bull
Then T₄{COMP-MANUF', SS-TAXED'} consists of
    Bull = Bull
    ∀x COMPANY(x) ∧ ¬x = Bull ⇒ ¬x = Bull
Both formulae are valid, and, accordingly, they can be derived from any theory; the former is an immediate consequence of the axiom of reflexivity of
equality ∀x x = x; the latter comes from the axiom schema (A ∧ B) ⇒ B. As regards ∀x COMP-MANUF'(x) ⇒ COMP-MANUF(x), it is the formula

    ∀x x = Bull ⇒ COMP-MANUF(x)

which can be derived from COMP-MANUF(Bull) by virtue of Leibniz' substitutivity schema, namely ∀x∀y x = y ∧ A(x) ⇒ A(y) for every formula
A. Since the left member of the considered instance of the circumscription schema can be derived from T₄, we can use modus ponens (from formulae A and A ⇒ B infer formula B) to get ∀x COMP-MANUF(x) ⇒ COMP-MANUF'(x), that is,

    ∀x COMP-MANUF(x) ⇒ x = Bull
At this stage, we have used the circumscription schema directly to obtain the formula ∀x COMP-MANUF(x) ⇒ x = Bull, which can thus be said to be derivable from T₄ by circumscription (of COMP-MANUF in T₄ with SS-TAXED being allowed to vary). Symbolically, T₄ ⊢C/R(P) ∀x COMP-MANUF(x) ⇒ x = Bull, where C/R(P) denotes the circumscription of COMP-MANUF in T₄ with SS-TAXED being allowed to vary. From this formula and by sticking to pure first-order predicate calculus, it is possible to arrive at T₄ ⊢C/R(P) ∀x COMPANY(x) ⇒ [¬SS-TAXED(x) ⇒ x = Bull]. Details are as follows. The second formula of T₄ can be written in the form

    ∀x COMPANY(x) ⇒ [¬SS-TAXED(x) ⇒ COMP-MANUF(x)]

from which, using ∀x COMP-MANUF(x) ⇔ x = Bull (which can be derived from the formula obtained above and the formula ∀x x = Bull ⇒ COMP-MANUF(x) that we have already seen to be derivable from the formula COMP-MANUF(Bull) of T₄), we conclude

    ∀x COMPANY(x) ⇒ [¬SS-TAXED(x) ⇒ x = Bull]
Deciding which formulae are to be substituted for the formula parameters in the circumscription schema is fundamental with respect to obtaining conclusive formulae (particularly ∀x COMP-MANUF(x) ⇒ x = Bull), as many instances of the circumscription schema, for example the one arising from substituting COMPANY(x) ∧ x = Bull for COMP-MANUF'(x), lead nowhere.

We now consider non-monotonicity of circumscription through the theory T₅, which is the theory T₄ supplemented with the formula
    COMP-MANUF(Peugeot)
It is not possible to have

    ∀x COMP-MANUF(x) ⇒ x = Bull

derivable from T₅ by circumscription, i.e. circumscription is non-monotonic. However, it is possible to derive

    ∀x COMP-MANUF(x) ⇒ x = Bull ∨ x = Peugeot

from T₅ by circumscription, using the substitutions x = Bull ∨ x = Peugeot for COMP-MANUF'(x) and ¬x = Bull ∧ ¬x = Peugeot for SS-TAXED'(x).
6 CONCLUDING REMARKS
The account presented of the preferential-models approach to non-monotonic logics seems to leave out two major contributions to non-monotonic reasoning: default logic (Reiter, 1980) and autoepistemic logic (Chapter 4 and Moore, 1985). For the former, specifying an appropriate preemption preordering is rather easy for a restricted fragment like the one corresponding to CWA. Unfortunately, things get much more involved if non-atomic formulae are taken into account. For autoepistemic logic, working with a modal language adds another difficulty, mainly because a logic in which theories consist only of first-order formulae is easier to capture by means of a preemption preordering (rendering the effect of the non-monotonic, perhaps content-specific, inference rules). In any case, both default logic and autoepistemic logic are more general than CWA, subimplication and circumscription in that they require maximization of one or several relations in many theories.

That it would be impossible to characterize certain existing non-monotonic logics through the preferential-models approach would just mean that their semantic bases rely on non-constructive fixed points, which cannot be captured by any preemption preordering. It is only a matter of expression. But the principle of the preferential-models approach is not disputed: first formalize our intuitions about non-monotonic reasoning within a model-theoretic setting and then devise a formal system for it (thereby being concerned with constructing a subclass of the preferential models defined by means of non-constructive fixed points).
BIBLIOGRAPHY

AI (1980). Special Issue on Non-monotonic Logics. Artificial Intelligence 13, nos. 1-2. (Apart from the closed-world assumption, introduces the very first non-monotonic logics.)
Besnard, P. (1984). Vers une caractérisation de la circonscription. Rapport Inria. (Some results on circumscription, about its proof theory.)
Bossu, G. and Siegel, P. (1982). Nonmonotonic Reasoning and Databases. Advances in Database Theory (ed. H. Gallaire, J. Minker and J.-M. Nicolas), pp. 239-284. Plenum Press, New York. (A first (technical) glance at subimplication.)
Bossu, G. and Siegel, P. (1985). Saturation, Nonmonotonic Reasoning, and the Closed World Assumption. Artificial Intelligence 25, 13-63. (Where subimplication is defined.)
Chang, C. C. and Lee, R. C. T. (1973). Symbolic Logic and Mechanical Theorem Proving. Academic Press, New York. (The most readable textbook on Herbrand models and their role in the design of proof-theoretic systems, especially resolution.)
Colmerauer, A. (1979). Sur les bases théoriques de Prolog. Rapport de Recherche, Université d'Aix-Marseille 2. (A paper roughly along the lines of the previous item, but devoted to a particular system, namely Prolog (the logic programming language).)
Etherington, D. W., Mercer, R. E. and Reiter, R. (1985). On the adequacy of predicate circumscription for closed world reasoning. Computational Intelligence 1, 11-15. (A must as far as the formal study of circumscription is concerned.)
Fariñas del Cerro, L. and Herzig, A. (1988). An automated modal logic of elementary changes. Chapter 2 of this book. (An introduction to a different view of logic programming.)
Lifschitz, V. (1985a). Computing circumscription. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-85), pp. 121-127. Kaufmann, Los Angeles. (As the title says.)
Lifschitz, V. (1985b). Closed world databases and circumscription. Artificial Intelligence 27, 229-235. (Results on when circumscription and the closed-world assumption meet.)
McCarthy, J. (1980). Circumscription - a form of non-monotonic reasoning. Artificial Intelligence 13, 27-39. (A beautiful exposition of the motivation for circumscription and its definition.)
McCarthy, J. (1986). Applications of circumscription to formalizing commonsense knowledge. Artificial Intelligence 28, 89-116. (More insight into circumscription.)
Marchal, B. (1988). Modal logic - a brief tutorial. Appendix B, Introduction to this book. (The basic definitions and concepts of modal logic.)
Moore, R. C. (1985). Semantical considerations on non-monotonic logic. Artificial Intelligence 25, 75-94. (Together with circumscription and default logic, the best proposal within the field of non-monotonic logics. Highly technical.)
Moore, R. C. (1988). Autoepistemic logic. Chapter 4 of this book. (Good background reading for the previous paper.)
Nicolas, J.-M. (1979). Contribution à l'étude théorique des bases de données: apports de la logique mathématique. Thèse d'état, Onera-Cert, Toulouse. (A study of the application of logic to database theory.)
NMRW (1984). Non Monotonic Reasoning Workshop, New Paltz. (A collection of papers, the standard of which is rather mixed but including valuable contributions to the study of non-monotonic logics.)
Reiter, R. (1978). On closed world databases. Logic and Databases (ed. H. Gallaire and J. Minker), pp. 55-76. Plenum Press, New York. (A pioneering and very insightful paper on non-monotonicity in logic systems.)
Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence 13, 81-132. (An essential work in the field of non-monotonic logics.)
Reiter, R. (1982). Circumscription implies predicate completion (sometimes). Proc.
American Association for Artificial Intelligence Conf. (AAAI-82), pp. 418-420.
Kaufmann, Pittsburgh. (One more result on circumscription.)

DISCUSSION
C. Froidevaux: 1 Informal introduction to circumscription and CWA. Commonsense reasoning supposes very wide knowledge, so we prefer to use general statements rather than to store all the elementary facts they could generate. But in everyday life there are exceptions to almost all general assertions, and they are too many to all be mentioned explicitly: our knowledge is necessarily incomplete. Thus we need to use general statements admitting exceptions. This process introduces non-monotonicity in commonsense reasoning. The prototypical example is the classical inference about birds: from the facts that generally birds can fly and that ostriches are birds but cannot fly, we can infer that Tweety, provided that it is a bird, can fly. But if we subsequently discover that Tweety is an ostrich, the ability of Tweety to fly can no longer be derived. To express such general statements, the circumscription approach uses classical logic and introduces abnormality predicates. (We should point out that there are as many abnormality predicates as types of exceptions.) In the case of birds, we say that a bird that is not abnormal (with respect to the ability to fly) can fly. We obtain the following first-order formulae:
    (∀x) (bird(x) ∧ ¬abnormal(x) ⇒ flies(x))
    (∀x) (ostrich(x) ⇒ bird(x))
    (∀x) (ostrich(x) ⇒ ¬flies(x))
    bird(Tweety)
From these formulae, it can be inferred that an ostrich is abnormal, but to deduce that Tweety flies, we must know that only ostriches are abnormal. We require the well-known default principle (cf. Chapter 7): if something relevant is an abnormal object then this is explicitly stated; otherwise it can be reasonably considered as normal. More generally, this principle extends to the following one: the objects that can be shown to have a certain property P (here abnormal) are all (and only) the objects that satisfy this property: we circumscribe the extension of the predicate P. In order to do this, we have to minimize its extension. From a semantic point of view, we have introduced a notion of preferability between models: models where the extension of P is minimal are preferable.

Another form of non-monotonicity occurs with the assumption that only positive information is represented and that negative information can be derived by completion. This process considerably simplifies the storage of the data: we need only give the relevant positive facts, while the amount of negative information is typically very high. In the context of relational databases, this is known as the closed-world assumption (CWA): we assume that a relation instance is true only if it is given explicitly or else implied by one of the universal rules defining the relation. (In general, only Horn theories are considered.)
For example, let us consider a universe of blocks and a table as follows: A, B and C are blocks, D is a table, A is green, C is red, A is ON D, C is ON D, B is ON C, and if object X is ON object Y and object Y is ON object Z then X is ON Z. Under the CWA, we restrict our attention to the world where A, B and C are the only blocks, D is the only table, A is the only green thing and C is the only red thing. Moreover, if object X is ON object Y then (X, Y) ∈ {(C, D), (B, C), (B, D), (A, D)}. Hence in this world the colours of B and of D remain unknown and A is neither on B nor on C (a small sketch of this computation appears below). Thus the CWA leads us to prefer models where every predicate has a minimal extension. While circumscription focuses on some predicates for the minimization process, the CWA deals with all predicates.

This brief presentation highlights the fact that for these two formalisms the notion of first-order model is insufficient to capture non-monotonicity, and suggests that we resort to a concept of preferability between models: not all models are desirable. The preference criterion obviously depends on the formalism considered. Non-monotonicity results from the following observation: if M is a preferable model for a set of axioms T and if T ⊂ T', then M is not even necessarily a model of T'.
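The following toy sketch (ours; the variable names are hypothetical) spells out the closed-world reading of the blocks example: the positive ON facts are saturated under the transitivity rule, and every ground ON atom not so derived is taken to be false:

    blocks, tables = {"A", "B", "C"}, {"D"}
    green, red = {"A"}, {"C"}                  # colour facts, kept for completeness
    on = {("A", "D"), ("C", "D"), ("B", "C")}

    changed = True
    while changed:                             # saturate ON(X,Y) ∧ ON(Y,Z) ⇒ ON(X,Z)
        changed = False
        for (x, y) in list(on):
            for (y2, z) in list(on):
                if y == y2 and (x, z) not in on:
                    on.add((x, z))
                    changed = True

    universe = blocks | tables
    negatives = {(x, y) for x in universe for y in universe} - on   # CWA: false ON facts
    print(sorted(on))                          # [('A','D'), ('B','C'), ('B','D'), ('C','D')]
    print(("A", "B") in negatives, ("A", "C") in negatives)         # True True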
2 A preemption preordering on the models. Besnard and Siegel provide a general framework to define the semantics of both formalisms. To every theory T they attach a partition P of the set of its relation symbols, so that P = (R=, R+, R-, R•). R= denotes the set of relations that must be identical in all models, R+ the set of relations to be maximized, R- the set of relations to be minimized and R• the set of relations allowed to vary from model to model. This partitioning is used to compare models: ≤P denotes the preemption ordering with respect to P on the models. Recall the example of Section 2 of Besnard and Siegel's chapter. Let T be the following theory:

    T = {MAN(Emos), MAN(Socrates),
         (∀x) (MAN(x) ∧ ORDINARY(x) ⇒ ¬AGUTIED(x)),
         (∀x) (... ⇒ MAN(x))}

The choice of the partition P for T refers to the intuition underlying commonsense reasoning. The second assertion means that generally a man does not suffer from aguty. As ORDINARY is the most frequent case, we shall maximize the extension of the predicate ORDINARY, while AGUTIED, being an exceptional state, will be minimized. Since there is no logical relation between the properties ORDINARY and AGUTIED on the one hand, and FAST-WALKER on the other hand, the predicate FAST-WALKER is allowed to vary: we have no preference about the fact that Emos (or Socrates) is a FAST-WALKER or not. We shall also compare only models where the extensions of the predicate MAN are the same. Hence we get the partition P = (R= = {MAN}, R+ = {ORDINARY}, R- = {AGUTIED}, R• = {FAST-WALKER}). Let M and N be the following interpretations:
    M[MAN] = {Emos, Socrates},        N[MAN] = {Emos, Socrates}
    M[ORDINARY] = { },                N[ORDINARY] = {Emos}
    M[AGUTIED] = {Socrates},          N[AGUTIED] = { }
    M[FAST-WALKER] = {Socrates},      N[FAST-WALKER] = {Emos}
M and N are both models of T, but N is preferable to M with respect to the preemption ordering attached to P: N ≤P M. However, N is not yet the expected model for T. Let N' be as follows:
    N'[MAN] = {Emos, Socrates},       N'[ORDINARY] = {Emos, Socrates}
    N'[AGUTIED] = { },                N'[FAST-WALKER] = {Socrates, Emos}
N' is a model of T that is preferable to N, and no model is preferable to N'. We say that N' is a minimal model for ≤P. N' is a desirable model in the sense that we prefer overall a model where Emos and Socrates are ordinary men, who do not suffer from aguty. Minimal models for ≤P play a fundamental role in the specification of the semantics of the CWA and circumscription formalisms. It remains to give the form of the partition P precisely for each formalism. For the CWA, R- consists of all relations and R=, R+, R• are all empty sets. For circumscription the partition is more elaborate: if Q is the predicate to be circumscribed and R₁, ..., Rₙ the parameter relations then R= consists of all relations of the theory minus Q, R₁, ..., Rₙ; R- = {Q}; R+ = { }; and R• = {R₁, ..., Rₙ}. Unfortunately, the critical choice of the parameter relations is principally motivated by technical reasons, so that intuition is often missing. Roughly speaking, exception predicates must be minimized, while property predicates (e.g. FLIES, AGUTIED), about which we lack knowledge, are allowed to vary.

3 Limitations of this presentation. The possibility of describing different non-monotonic formalisms in a general framework is useful: comparisons between them can easily be carried out. For example, this framework emphasizes the relationship between the CWA and circumscription: it suggests that the CWA is equivalent to the circumscription of all predicates. Besnard and Siegel propose also to include in the framework the formalism of subimplication. But the partition provided (which is the same as for the CWA) highlights more the similarity between the CWA and subimplication than the difference. The preemption ordering as defined by Besnard and Siegel does not permit distinction between them. In fact, the essential difference lies in the special kind of models considered: on the one hand Herbrand models for the CWA; on the other hand, discriminant models for subimplication.

The task is more complex for other non-monotonic formalisms. Default logic cannot be described in this context, except for the fragment corresponding to the CWA. Bidoit and Hull (1986) have proved that minimal models (in the sense of subimplication) define a good semantics for theories whose defaults are all CWA defaults (i.e. normal defaults without prerequisites). But this does not carry over to normal defaults in general. Nor can this framework approach take into account Moore's modal non-monotonic logic, autoepistemic logic (see Chapter 4).

Partitioning of relations is unsatisfactory even for circumscription. Several versions of this formalism have been proposed. Prioritized circumscription has been conceived by Lifschitz to remedy some drawbacks of general circumscription. In some cases, we have to deal with several kinds of abnormality, so that more than one abnormality predicate must be introduced. If the minimizations of these predicates conflict with each other then we need a partial priority ordering between them. Hence prioritized circumscription requires minimization of predicates with respect to some priority ordering.

In conclusion, the preferential-model approach to non-monotonic logics should be
extended so that the preemption ordering is no longer based on a partitioning of the relations. Very recently, Shoham (1987) has made a proposal for doing this. The basic idea behind the construction of his framework is again the use of a preference relation on models, associated with a standard logic. But this preference relation can be any strict partial order on interpretations. In this framework classical notions such as satisfiability and entailment are redefined. Depending on the choice of the preference relation, different non-monotonic logics can be described within this framework. For subimplication and circumscription, the partial orders are analogous to those of Besnard and Siegel. But as this partial order can be very general, default logic and autoepistemic logic can also be dealt with. (In fact, Shoham handles another non-monotonic modal logic, namely Halpern and Moses' logic of minimal knowledge.) For example, Halpern and Moses' logic is treated by giving a preference criterion on Kripke interpretations. This framework makes comparisons between non-monotonic logics easier. Let us give such a result: circumscription can be reduced to the logic of minimal knowledge.
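As a concrete footnote to this comment, here is a minimal sketch (ours) of the preemption comparison induced by a partition P = (R=, R+, R-, R•), applied to the interpretations M, N and N' of the aguty example above; models are simply dictionaries from relation names to extensions:

    P_EQ, P_MAX, P_MIN = {"MAN"}, {"ORDINARY"}, {"AGUTIED"}   # FAST-WALKER varies

    def preferred(n, m):
        """n ≤P m: same extensions on R=, no smaller on R+, no larger on R-;
        the varying relations are unconstrained."""
        return (all(n[r] == m[r] for r in P_EQ) and
                all(n[r] >= m[r] for r in P_MAX) and
                all(n[r] <= m[r] for r in P_MIN))

    M = {"MAN": {"Emos", "Socrates"}, "ORDINARY": set(),
         "AGUTIED": {"Socrates"}, "FAST-WALKER": {"Socrates"}}
    N = {"MAN": {"Emos", "Socrates"}, "ORDINARY": {"Emos"},
         "AGUTIED": set(), "FAST-WALKER": {"Emos"}}
    N1 = {"MAN": {"Emos", "Socrates"}, "ORDINARY": {"Emos", "Socrates"},
          "AGUTIED": set(), "FAST-WALKER": {"Socrates", "Emos"}}   # N' of the text

    assert preferred(N, M) and preferred(N1, N)    # N ≤P M and N' ≤P N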
Flash Sheridan: A theory is a syntactic object: a collection of constant terms, function symbols, predicate symbols, sentences, and what have you. A model of a theory is some mathematical objects for that theory to be about: a constant symbol should mean some object, a relation symbol should mean some relation in extension (i.e. a set of ordered pairs), a predicate symbol should mean the extension of a predicate (i.e. a set), and a sentence should mean a truth value. The model is a model of the theory if the obvious things work out right; for instance, a true sentence means the truth value True. (It will make things easier to consider predicates as one-place relations. Actually, we worry about predicates more, so I shall consider relations as multiplace predicates.)

A model (of a given theory) is nicer than or as nice as another if the predicates that one wants smaller (R-) are no larger, the predicates that one wants larger (R+) are no smaller, the predicates that one insists stay the same (R=) do stay the same, and the predicates that one doesn't care about (R•) can do whatever they want. As in the mundane world, what is nicer will depend on context; we abbreviate "nicer than or as nice as" by ≤P, where P tells us the context (P = (R=, R+, R-, R•)). The objects and functions must stay the same (Besnard and Siegel call this minoring). A model is minimal if there is nothing strictly nicer than it. The following are examples:

Circumscription: In circumscription's opinion, what chiefly matters is that one predicate, the abnormality predicate AB, is minimized (so R- = {AB}). Some predicates (R•) are allowed to vary, some (R=) are not. So P = (R=, ∅, {AB}, R•).

Closed-world assumption: In the CWA, one wants to minimize everything. So every predicate is in R-. We, however, derive only ground literals from the CWA (a ground literal is either a relation symbol followed by the appropriate number of constant symbols, or the negation thereof); after we have derived all the ground literals we want, we then revert to normal deduction.

Subimplication: This uses the same definition of "nice", but we only consider discriminant models. A discriminant model is what set-theorists call a term model, which is merely a cheap trick. We consider the theory itself as the model. For instance, if we want a term model of set theory then the null set is the symbol "∅" and the number two is the term "{∅, {∅}}". In a decent set theory this works out very badly, since {∅, ∅} is the same set as {∅}, but these are obviously different terms. (Things are not quite as bad as that if one is careful, but this is leading us astray.)
This has advantages for (their example) French corporations. In a discriminant model Dassault ≠ Peugeot, since these are different names. It is nice to be able to prove that Dassault ≠ Peugeot, but I imagine this will be too crude. (For instance, a discriminant model will prove all sorts of things like Persia ≠ Iran, the morning star ≠ the evening star. This would greatly benefit users of pseudonymous credit cards. It will also prove Mother(John) ≠ Mary, so if "Mother(John) = Mary" is in your theory, it is now inconsistent. This is so because "Mother(John)" and "Mary" are different terms.)

The point to all this is that minimal models (i.e. models than which there is nothing nicer), in the varying senses of minimal (because the sense of "nicer" varies), are interesting. Something should follow from circumscription and some axioms if it is true in all circumscription-minimal models of the axioms, and from the CWA similarly. Besnard and Siegel do not so much prove this as define it to be true.
Reply to Froidevaux: Within the preferential-models framework, CWA and subimplication are characterized through the same preemption ordering, but there is nothing wrong with this because subimplication aims at safely generalizing CWA to universal theories. It is too hasty a claim to say that only the fragment of default logic corresponding to CWA can be captured by the preferential-models method. Indeed, there is no reason to contend that the free default fragment of default logic is out of the scope of this method (the free default fragment, which has some properties (Besnard, 1987), is obtained by constraining defaults to have no prerequisites (see Chapter 7)). On the other hand, surely not all of default logic comes under the realm of the framework presented. This is due to some peculiarities of default logic, such as the one corresponding to the inability of default logic to take into account default information for reasoning by cases (what is under consideration here is the reasoning scheme, relativized to the consistency requirements underlying the notion of inference developed in default logic, such that if some available default information makes it possible to conclude C from A and to conclude C from B then C can be concluded from A ∨ B). Even more unfortunate and unexpected is the fact that recent unpublished works seem to indicate that autoepistemic logic cannot be characterized by means of preferential models. To take into account prioritized circumscription, the definition of a preemption preordering has to be modified, but no change is needed as far as partitioning of relations is concerned. In conclusion, it is fair to say that Shoham (1987) independently proposed a framework for preference over models that goes far beyond the one presented here, but the price to pay for the increase in generality is, among other things, a complex redefinition of satisfiability and entailment.

Reply to Sheridan: For subimplication to be applied, it is an absolute requirement that the theory at hand doesn't have equalities (even conditional ones) in it: so if "Mother(John) = Mary" is in the theory then no inconsistency arises, because subimplication just doesn't apply to the theory. Such a requirement may seem very restrictive at first sight, but it is a usual assumption for databases, and subimplication has been devised to handle the querying of databases with disjunctive information. As indicated by Sheridan, a formula should be derivable by circumscription if it is true in all circumscription-minimal models of the theory. Unfortunately, this is not always the case (see e.g. Besnard, 1987).
Additional references

Besnard, P. (1987). An Introduction to Default Logic. Springer-Verlag, Berlin.
Bidoit, N. and Hull, R. (1986). Positivism vs. minimalism in deductive databases. Proc. ACM SIGACT-SIGMOD Symp. on Principles of Database Systems, pp. 123-132.
Shoham, Y. (1987). A semantical approach to nonmonotonic logics. Proc. 2nd Int. Symp. on Logic in Computer Science (LICS 1987), pp. 275-279. Computer Society Press, Ithaca, New York.
6

An Intuitionistic Basis for Non-Monotonic Reasoning

MICHAEL R. B. CLARKE
Department of Computer Science, Queen Mary College, University of London, UK

D. M. GABBAY
Department of Computing, Imperial College of Science and Technology, University of London, UK
Abstract

Many systems of non-monotonic deduction have been proposed, but little attention has been given to saying what non-monotonic deduction is. We suggest the notion of restricted monotonicity as one plausible relaxation of the strict monotonicity condition of classical deduction. Gabbay's intuitionistic semantics and non-monotonic deduction rules are given and compared with the modally based fixed-point systems of Moore and of McDermott and Doyle. Systems with branching futures are shown not to satisfy restricted monotonicity in general. Implementation of a practical non-monotonic system is briefly discussed.
1 INTRODUCTION
Before proposing yet another logical system for non-monotonic reasoning, we start by suggesting some formal properties in terms of which non-monotonic systems in general might be classified.

Suppose that one is offered an inference system in some area of application where knowledge is represented by propositional formulae A₁, A₂, ..., Aₙ, B, etc. One tries to evaluate it by asking questions of the form: does A follow from A₁, A₂, ..., Aₙ? Or, in the usual meta-language of propositional logic, does A₁, A₂, ..., Aₙ ⊢ A hold? The answer, yes or no, indicates whether the pair ⟨{A₁, A₂, ..., Aₙ}, A⟩ is or is not an element of the system's consequence relation. Proceeding in this way, one builds up a sample of the consequence relation and can then ask whether this is a satisfactory logical inference system. Is the machine behaving logically? One way to check is to look at the meaning of the formulae A₁, A₂, ..., Aₙ, A and see whether the answers make
sense in the context of application. Suppose, however, that the necessary interface is as yet unavailable. Is there any test that we can apply to the answers to check whether or not the system is operating logically? How do we know that it is not just converting the Aᵢ to numbers and checking whether or not the number corresponding to A divides the product of the numbers corresponding to A₁, A₂, ..., Aₙ? If it is then this arithmetic method of answering will have "logical" properties such as B ⊢ B and, if A₁, A₂, ..., Aₙ ⊢ A then A₁, A₂, ..., Aₙ, B ⊢ A. We should not want to say that an arithmetically based system like this was logical. What conditions should a system satisfy to be characterized as logical? This question has been answered by Tarski and Scott. If the three conditions

    A₁, A₂, ..., Aₙ, B ⊢ B                              (reflexivity)

    A₁, A₂, ..., Aₙ ⊢ X     A₁, A₂, ..., Aₙ, X ⊢ B
    ----------------------------------------------      (transitivity)
    A₁, A₂, ..., Aₙ ⊢ B

    A₁, A₂, ..., Aₙ ⊢ B
    ----------------------------------------------      (monotonicity)
    A₁, A₂, ..., Aₙ, X ⊢ B
are satisfied then ⊢ is the consequence relation of a monotonic deductive system. So we do have a method of checking whether or not our hypothetical system is a monotonic deduction machine. The arithmetic method of answering will fail the transitivity condition. In general, logics will be characterized by the further conditions that they satisfy in addition to those above. It can be shown for example that intuitionistic provability is the smallest consequence relation closed under substitution that satisfies the deduction theorem.
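The arithmetic example can be made concrete; a minimal sketch (ours, with an arbitrary encoding) shows reflexivity and monotonicity holding while transitivity fails:

    from math import prod

    code = {"A1": 2, "X": 2, "B": 4}   # a hypothetical numeric encoding

    def entails(premises, conclusion):
        """A1, ..., An |- A iff A's number divides the product of the premises'."""
        return prod(code[p] for p in premises) % code[conclusion] == 0

    print(entails(["A1"], "X"))        # True:  2 divides 2         (so A1 |- X)
    print(entails(["A1", "X"], "B"))   # True:  4 divides 2 * 2     (so A1, X |- B)
    print(entails(["A1"], "B"))        # False: 4 does not divide 2 (transitivity fails)

The last line is exactly the transitivity failure: A₁ ⊢ X and A₁, X ⊢ B hold, yet A₁ ⊢ B does not.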
It is natural to consider similar criteria for non-monotonic deduction. Not surprisingly, they involve changing the monotonicity condition. But first consider another simple example. Suppose that a database 𝒫 contains assertions

    P1:  A ∧ B → C
    P2:  ¬D → B
    P3:  A
and that the deductive component is classical truth-functional logic. The way that deductions are actually carried out of course depends on the rules chosen: semantic tableau, natural deduction, etc. Here we take the following (monotonic) rules:
    R1:  X ∧ Y → Z
         -----------
         X → (Y → Z)

    R2:  X    X → Y
         ----------
         Y

    R3:  X → Y    Y → Z
         --------------
         X → Z
R1 is an axiom schema in many systems, R2 is modus ponens, R3 is the transitivity of implication. We write A₁, A₂, ..., Aₙ ⇒ B to mean that B is deduced (monotonically) via a single application of the rule R. Using these rules on the database 𝒫, we can make the successive deductions

    𝒫 ⇒ A → (B → C)
    𝒫, A → (B → C) ⇒ B → C
    𝒫, A → (B → C), B → C ⇒ ¬D → C
Thus 𝒫 ⊢ ¬D → C, and we cannot extract any more information from 𝒫 by classical deduction: we cannot, for example, deduce C from 𝒫. Attempting to get more out of the data, we might consider adding some more reasonable-looking inference rules. For example we might add a default rule stating that for any literal X not appearing in 𝒫 as the consequent of an implication, assert by default ¬X:
    D1:  X a literal and X not the consequent of an implication
         ------------------------------------------------------
         ¬X

This is a different kind of rule from R1, ..., R3. It says in effect that if there is no way that X can be deduced from 𝒫 then assume ¬X. For clarity we use a different symbol for this kind of default deduction and write A₁, A₂, ..., Aₙ ⇝ B. With this new possibility, we can get more out of the database. In fact we can deduce C without using R3:

    𝒫 ⇝ ¬D
    𝒫, ¬D ⇒ B
    𝒫, ¬D, B ⇒ A → (B → C)
    𝒫, ¬D, B, A → (B → C) ⇒ B → C
    𝒫, ¬D, B, A → (B → C), B → C ⇒ C
This is just an example, of course; other rules could have been used. Regardless of what the rules actually are, we should like to be able to draw general conclusions about such systems.
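The following truth-table sketch (ours, and only one way to mechanize D1: defaults are applied greedily and only when they remain consistent) reproduces the derivation above, showing that C is not a classical consequence of 𝒫 but becomes one once ¬D has been assumed:

    from itertools import product

    ATOMS = ["A", "B", "C", "D"]

    DB = [lambda v: not (v["A"] and v["B"]) or v["C"],   # P1: A ∧ B → C
          lambda v: v["D"] or v["B"],                    # P2: ¬D → B
          lambda v: v["A"]]                              # P3: A

    def models(theory):
        """All valuations of ATOMS satisfying every formula of the theory."""
        for bits in product([False, True], repeat=len(ATOMS)):
            v = dict(zip(ATOMS, bits))
            if all(f(v) for f in theory):
                yield v

    def entails(theory, goal):
        return all(goal(v) for v in models(theory))

    consequents = {"C", "B"}          # consequents of the implications P1 and P2
    theory = list(DB)
    for x in ATOMS:
        if x not in consequents:
            candidate = theory + [lambda v, x=x: not v[x]]   # default: assert ¬X
            if any(True for _ in models(candidate)):         # keep it if consistent
                theory = candidate

    print(entails(DB, lambda v: v["C"]))       # False: C is not classically derivable
    print(entails(theory, lambda v: v["C"]))   # True: with ¬D assumed, C follows

Here ¬A is rejected (it contradicts P3) and ¬D is kept, exactly as in the derivation above.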
Monotonic deductions are made up of chains of single-step rules such as R1, ..., R3 above. Properties of monotonic deduction are proved by mathematical induction on the length of the chain. Non-monotonic deductions can similarly be thought of as constructed from a mixture of single-step monotonic and non-monotonic rules, and properties of non-monotonic deduction can then be proved similarly from properties of the single-step rules. Clearly monotonicity is one property that we shall not be able to prove, and this has led to non-monotonic deduction being defined simply by the failure of monotonicity. Instead, the single-step rules can be required to satisfy the following properties.

(1) Consistency of new deductions with existing data. If A₁, A₂, ..., Aₙ is consistent and A₁, A₂, ..., Aₙ ⇝ B holds then A₁, A₂, ..., Aₙ, B is consistent.
(2) If B and C can separately be derived from A₁, A₂, ..., Aₙ by single-step rules then A₁, A₂, ..., Aₙ, B, C is consistent. In view of (1), this can also be written

    A₁, A₂, ..., Aₙ ⇝ B and A₁, A₂, ..., Aₙ ⇝ C iff A₁, A₂, ..., Aₙ ⇝ B ∧ C

This is arguable. It is valid for certain kinds of default system, but not those with multiple extensions and not for probabilistic reasoning, where for example we might say

    A₁, A₂, ..., Aₙ ⇝ A  if  Prob(A | A₁, A₂, ..., Aₙ) > 0.5
(3) Restricted monotonicity:

    A₁, A₂, ..., Aₙ ⇝ X     A₁, A₂, ..., Aₙ ⇝ B
    --------------------------------------------
    A₁, A₂, ..., Aₙ, X ⇝ B

We know that ⇝ is non-monotonic, but if we add a proposition that ⇝ tells us is consistent, it does not affect the deducibility of other deducible sentences.
We now define A₁, A₂, ..., Aₙ |∼ X iff for some B₁, B₂, ..., Bₘ there is a chain

    A₁, A₂, ..., Aₙ ⇝ B₁
    A₁, A₂, ..., Aₙ, B₁ ⇝ B₂
    ...
    A₁, A₂, ..., Aₙ, B₁, ..., Bₘ₋₁ ⇝ Bₘ
    A₁, A₂, ..., Aₙ, B₁, ..., Bₘ ⇝ X

It can then be shown (Gabbay, 1984) that the following properties hold for |∼:

    A₁, A₂, ..., Aₙ, B |∼ B                             (reflexivity)

    A₁, A₂, ..., Aₙ |∼ B     A₁, A₂, ..., Aₙ, B |∼ C
    ------------------------------------------------    (transitivity)
    A₁, A₂, ..., Aₙ |∼ C

    A₁, A₂, ..., Aₙ |∼ B     A₁, A₂, ..., Aₙ |∼ C
    ------------------------------------------------    (restricted monotonicity)
    A₁, A₂, ..., Aₙ, B |∼ C
2 INTUITIONISTIC VERSUS CLASSICAL LOGIC
So far we have given some examples of non-monotonic deduction and said something about properties that a non-monotonic consequence relation might or might not be expected to have. We now give the semantic basis for a particular system based on intuitionistic logic, and compare it with other systems and the criteria given earlier.

The classical understanding of truth is simplistic: statements are either true or false. The intuitionistic notion is more realistic: a statement may change its truth value over time, but not capriciously. At this moment it may fail to be true because we do not have sufficient grounds for asserting it. At some future time it may become true as more information becomes available. Although intuitionistic logic is weaker than classical logic in the sense that its theorems are a subset of classical theorems, it is stronger in the sense that if by a classical argument we establish, for example, P ∨ Q then this deduction is weaker than if we had derived it intuitionistically. Although intuitionistic logic uses the same names and symbols for the connectives as classical logic does, they are not definable in terms of each other in intuitionistic logic, and the semantics of ¬ and → in particular are rather different. Many classical deductions seem counter-intuitive when expressed in ordinary language, particularly those involving translations of material implication, for example (∀x)P(x) → Q ⊢ (∃x)(P(x) → Q), which has a fairly intricate classical proof by contradiction but is not provable intuitionistically. Let P(x) be "x plays well" and Q "we shall win", and ask oneself whether the deduction seems reasonable.
Many other examples of this type can be given, and whether the differences are significant depends presumably on the application. We shall go on to show, however, that many of the counter-intuitive results reported by McDermott and Doyle disappear when their notion of consistency is formulated intuitionistically.

Although, reasoning intuitionistically, we do not insist that every statement is either true or false, we do assume that every statement is either true or fails to be true. The situation may develop as more information becomes available, and statements that up to now have failed to be true may become true. We do assume, however, that statements that have been established as true can never again fail to be true. No further evidence can overturn a demonstration of truth. Negation is defined similarly: ¬P is true now if and only if P fails to be true in all possible future situations; we see now, at this moment, that we could never have grounds for asserting P.
3 SEMANTIC PRESENTATION OF AN INTUITIONISTIC SYSTEM

We start by giving a semantics for intuitionistic propositional logic in such a way as to show how it models the accumulation of information with time, together with a notion of consistency analogous to that of McDermott and Doyle. Consider a set T of moments of time and a reflexive and transitive time-ordering relation ≥, where t₂ ≥ t₁ means that t₂ is either later than t₁ or t₁ itself. As time goes on, we learn more and more about the world in the sense that the truth of more and more propositions becomes established. Let the propositional language L be the usual one, with the addition of the extra operator M, whose intended interpretation is that MP means it is consistent to assume P. Once known to be true, an atomic proposition remains true; up to that point it fails to be true. Label these two possibilities 1 and 0. They can be specified via a function h: T × L → {0, 1} having the following properties:

    (0) if h(t, A) = 1 for atomic A then h(s, A) = 1 for all s ≥ t;
    (1) h(t, A ∧ B) = 1 iff h(t, A) = 1 and h(t, B) = 1;
    (2) h(t, A ∨ B) = 1 iff h(t, A) = 1 or h(t, B) = 1;
    (3) h(t, ¬A) = 1 iff, for all s ≥ t, h(s, A) = 0;
    (4) h(t, A → B) = 1 iff, for all s ≥ t, if h(s, A) = 1 then h(s, B) = 1;
    (5) h(t, MA) = 1 iff there is an s ≥ t such that h(s, A) = 1.
Note that, although atoms and formulae not containing M are forever true once they become true, MA and formulae involving M can revert to falsehood. In the diagram, MA is true at t and t₁ but not at t₂. Note also that MA is not the same as ¬¬A (although in linear time it is). We say more later about the obvious connection with modal possibility.

[Figure: a branching future starting at t, with one branch through t₁, where A is established, and another through t₂, where A never holds.]
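The semantic clauses lend themselves to a direct implementation on finite frames. Here is a small sketch (ours) in which formulas are nested tuples and GE[s] collects the points later than or equal to s; it reproduces the branching-time behaviour of MA just described:

    # Points and the "later than or equal" relation of a small branching frame:
    T = ["t", "t1", "t2"]                      # t1 and t2 are two possible futures of t
    GE = {"t": {"t", "t1", "t2"}, "t1": {"t1"}, "t2": {"t2"}}
    VAL = {"A": {"t1"}}                        # atom A becomes established at t1 only

    def h(t, f):
        """Evaluate a formula at point t following clauses (0)-(5); formulas are
        atoms or tuples such as ("and", p, q), ("not", p), ("imp", p, q), ("M", p)."""
        if isinstance(f, str):                 # atoms; VAL must be persistent, as in (0)
            return t in VAL.get(f, set())
        op = f[0]
        if op == "and":
            return h(t, f[1]) and h(t, f[2])
        if op == "or":
            return h(t, f[1]) or h(t, f[2])
        if op == "not":                        # (3): the argument fails at every later point
            return all(not h(s, f[1]) for s in GE[t])
        if op == "imp":                        # (4): preserved at every later point
            return all(not h(s, f[1]) or h(s, f[2]) for s in GE[t])
        if op == "M":                          # (5): the argument holds at some later point
            return any(h(s, f[1]) for s in GE[t])
        raise ValueError(op)

    # MA is true at t and t1 but not at t2, and MA differs from ¬¬A at t:
    print([h(p, ("M", "A")) for p in T])       # [True, True, False]
    print(h("t", ("not", ("not", "A"))))       # False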
Intuitionistic entailment is defined as A ⊨ B if and only if, for any function h satisfying (0), ..., (5) above, whenever h(t, A) = 1, h(t, B) = 1. Intuitionistic provability ⊢ can then be defined in terms of axiomatic, natural-deduction or tableau systems that can be proved sound and complete with respect to the above semantics. We say something about automating this later. Our notion of non-monotonic provability |∼ will be based on ⊢, or equivalently ⊨. But first we check whether or not the consequences of this definition satisfy our intentions about the behaviour of the operator M.

MA and ¬A are semantically exclusive and exhaustive possibilities, so we have ⊨ MA ∨ ¬A. Also, ¬MA is semantically equivalent to ¬A, and A ⊨ MA is valid. Furthermore, MA → B is equivalent to ¬A ∨ B: either ¬A is true now or, if not, then MA is true now and hence B. Note that C ∨ ¬C is not a theorem, but is equivalent to MC → C. Also, MC → ¬C is equivalent to ¬C.

We have seen that ⊨ MB ∨ ¬B, but that is not quite the same as A ⊨ MB iff A ⊭ ¬B. Let the time points T be the subset of L generated by ¬, ∧ and → only, and let the ordering relation A ≥ B mean ⊨ A → B. For atomic p let h(A, p) = 1 iff A ⊨ p. For any formula B it can then be shown inductively that h(A, B) = 1 iff A ⊨ B. Now A ⊭ ¬B iff A ∧ B is consistent. But A ∧ B ≥ A since A ∧ B ⊨ A, so if A ⊭ ¬B then there is a state, namely A ∧ B, in the future of A in which B is established, and conversely if there is such a state in the future of A then A ⊭ ¬B. So A ⊨ MB iff A ⊭ ¬B.

Now consider again some of the features of McDermott and Doyle's (1980) system, also discussed in Gabbay (1982) and in Chapter 4 by Moore. Among several problematic cases for their logic, McDermott and Doyle cite the following:
    (1) {(MC → D), ¬D} is inconsistent;
    (2) MC does not follow from M(C ∧ D);
    (3) {MC, ¬C} is not inconsistent;
    (4) {MC → ¬C} is inconsistent, but {MC → ¬C, ¬C} proves ¬C.

They find {(MC → D), ¬D} inconsistent because ¬C cannot be shown to follow, forcing the assumption of MC. Moore also finds that this theory cannot be the foundation of a consistent set of beliefs. Intuitionistically MC → D is equivalent to ¬C ∨ D, so {(MC → D), ¬D} is equivalent to ¬C, justifying the intuition of McDermott and Doyle, who remedied the inconsistency of their classically based system by arbitrarily adding ¬C. Also, MC does follow intuitionistically from M(C ∧ D), and {MC, ¬C} is semantically inconsistent, while MC → ¬C is equivalent to ¬C. {MC → C} proves C ∨ ¬C, which says that either C is known now or ¬C is known, but does not say which.

The intuitionistic semantics can be extended to quantified formulae, and the system can also be given an axiomatic presentation. The details are in Gabbay (1982).
4 THE NON-MONOTONIC COMPONENT
So far we have shown that the intuitionistic semantic basis overcomes some of the problems with McDermott and Doyle's system, but we have not yet said what constitutes non-monotonic deduction in our system. We could now go on to extend the logic with fixed-point definitions as McDermott and Doyle do, and these are indeed of some interest, but, following the discussion in the first part of the chapter, we first define non-monotonic deduction iteratively in terms of chained single-step default rules.

The single-step non-monotonic rules are of the form A₁, A₂, ..., Aₙ ⇝ B if, for some P₁, ..., Pₖ such that A₁, A₂, ..., Aₙ, MP₁, ..., MPₖ are consistent, we have A₁, A₂, ..., Aₙ, MP₁, ..., MPₖ ⊨ B. We can also say as a special case that A₁, A₂, ..., Aₙ ⇝ B if A₁, A₂, ..., Aₙ ⊨ B with no default. As defined here, the intuitionistic deduction of B is semantically based. The corresponding proof might involve several basic steps such as modus ponens, whereas earlier we implied that ⇝ stood for only a single basic step. What is of interest, however, is whether ⇝ satisfies the consistency and monotonicity properties introduced earlier. We then define non-monotonic |∼ as above: A₁, A₂, ..., Aₙ |∼ X if for some B₁, ..., Bₘ there is a chain of such single-step deductions leading from A₁, A₂, ..., Aₙ through B₁, ..., Bₘ to X, as in Section 1.
The following are examples:

(1) {MP → P} ⇝ P because {(MP → P), MP} ⊨ P

(2) {(MP → P), (P ∧ MQ → R)} |∼ R because

    {(MP → P), (P ∧ MQ → R), MP} ⊨ P  and
    {(MP → P), (P ∧ MQ → R), P, MQ} ⊨ R
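Claims of the form Γ ⊨ B range over all frames and so cannot be verified by a single evaluation, but a brute-force search for countermodels over small frames is easy to sketch (ours; finding no model up to the bound is only evidence, not a proof). For example (1), no small model satisfies (MP → P), MP and ¬P together:

    from itertools import product

    MAXN = 3                                   # bound on the number of points

    def frames(n):
        """All reflexive-transitive 'later than or equal' relations on n points."""
        pts = list(range(n))
        pairs = [(a, b) for a in pts for b in pts if a != b]
        for bits in product([False, True], repeat=len(pairs)):
            ge = {a: {a} for a in pts}
            for (a, b), bit in zip(pairs, bits):
                if bit:
                    ge[a].add(b)
            if all(c in ge[a] for a in pts for b in ge[a] for c in ge[b]):
                yield ge

    def valuations(ge):
        """All persistent valuations for the single atom P."""
        pts = list(ge)
        for bits in product([False, True], repeat=len(pts)):
            val = {p for p, bit in zip(pts, bits) if bit}
            if all(s in val for p in val for s in ge[p]):   # persistence, clause (0)
                yield val

    def h(ge, val, t, f):                      # only the connectives needed here
        if f == "P":
            return t in val
        op = f[0]
        if op == "imp":
            return all(not h(ge, val, s, f[1]) or h(ge, val, s, f[2]) for s in ge[t])
        if op == "not":
            return all(not h(ge, val, s, f[1]) for s in ge[t])
        if op == "M":
            return any(h(ge, val, s, f[1]) for s in ge[t])

    theory = [("imp", ("M", "P"), "P"), ("M", "P"), ("not", "P")]
    found = any(all(h(ge, val, t, f) for f in theory)
                for n in range(1, MAXN + 1)
                for ge in frames(n)
                for val in valuations(ge)
                for t in ge)
    print(found)                               # False, supporting {(MP → P), MP} ⊨ P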
5 RESTRICTED MONOTONICITY
We stated earlier that if the consistency and restricted-monotonicity properties hold for single-step rules then they can be shown to be hereditary along chains of such rules. Thus it suffices to consider the single-step rules, and for simplicity we take these to be of the following form: A ⇝ B if A, MP ⊨ B for some P such that A, MP are consistent. As a preliminary, the following properties can be easily established:

    (1) A, MP are consistent iff A ⊭ ¬P;
    (2) if A, MP ⊨ B then A ⊨ B ∨ ¬P;
    (3) if A, MP are consistent and A, MP ⊨ B then A, B are consistent.
Now strict monotonicity

    A ⇝ X
    ---------
    A, B ⇝ X

does not hold for ⇝. A counter-example is (MP → P) ⇝ P but {(MP → P), ¬P} ⊨ ¬P. Suppose, however, that we take a B that is "more consistent" with A. This is the idea behind restricted monotonicity. Does

    A ⇝ B     A ⇝ X
    ----------------
    A, B ⇝ X

always hold? Again the answer in general is no. A counter-example is (MP → P) ⇝ P and (MP → P) ⇝ ¬P, because {(MP → P), M(¬P)} ⊨ ¬P, but {(MP → P), P, M(¬P)} is inconsistent. (It can be shown, however, that if P and Q are not the same then (MP → Q) ⇝ Q but not (MP → Q) ⇝ ¬Q.) Now suppose that {A, MP} ⊨ B and {A, MQ} ⊨ X, where A, B and A, MQ are separately consistent but A, B, MQ are inconsistent. Then A, MQ ⊨ ¬B. So we have A ⊨ B ∨ ¬P and A ⊨ ¬B ∨ ¬Q. This is possible in branching time, so restricted monotonicity need not hold, but in linear time the schema (α → β ∨ γ) → (α → β) ∨ (α → γ) is provable, so ⊨ (A → B) ∨ (A → ¬P) and ⊨ (A → ¬B) ∨ (A → ¬Q). But A ⊭ ¬P, so there is a state, namely A ∧ P, in the future of A in which P is established.
Therefore h(t, A → ¬P) = 0 for all t, and hence h(t, A → B) = 1 for all t. By a similar argument on Q, we also have h(t, A → ¬B) = 1 for all t. So in linear time the assumption that {A, B, MQ} is inconsistent leads to a contradiction, and

    A ⇝ B     A ⇝ X
    ----------------
    A, B ⇝ X

holds.
6 CONNECTION WITH MODAL LOGIC AND FIXED-POINT THEORIES

There is a well-known representation (originally due to Gödel) of the notion of intuitionistic truth within the modal logic S4. If [A] denotes the modal translation of the intuitionistic formula A then

    [A] = □A for atomic A
    [A ∧ B] = [A] ∧ [B]
    [A ∨ B] = [A] ∨ [B]
    [A → B] = □([A] → [B])
    [¬A] = □¬[A]

where connectives inside square brackets are intuitionistic and those outside are classical. We can now add to these

    [MA] = ◇[A]
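A sketch (ours) of the translation as a recursive function over the same tuple representation of formulas; only the final clause for M goes beyond Gödel's translation:

    def translate(f):
        """Godel translation into S4, extended with the clause for M; formulas
        are atoms or tuples, and "box"/"dia" stand for the S4 modalities."""
        if isinstance(f, str):
            return ("box", f)                              # [A] = □A for atomic A
        op = f[0]
        if op in ("and", "or"):
            return (op, translate(f[1]), translate(f[2]))
        if op == "imp":
            return ("box", ("imp", translate(f[1]), translate(f[2])))
        if op == "not":
            return ("box", ("not", translate(f[1])))
        if op == "M":
            return ("dia", translate(f[1]))                # [MA] = ◇[A]
        raise ValueError(op)

    print(translate(("imp", ("M", "C"), "D")))
    # ('box', ('imp', ('dia', ('box', 'C')), ('box', 'D'))), i.e. □(◇□C → □D)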
The modal translation of the intuitionistic MP → P (or equivalently P ∨ ¬P) is now □P ∨ □¬□P, which is an instance of the S5 axiom. Thus defaults of the form MP → P, normal defaults in Reiter's terminology, take us some way towards S5. If we had obtained the non-monotonic component by the fixed-point method then its modal translation would have been: T is the fixed point of the (intuitionistic) premises A₁, A₂, ..., Aₙ if T is the set of S4 consequences of

    {[A₁], [A₂], ..., [Aₙ]} ∪ {◇[P] | [¬P] ∉ T}

Note the difference that intuitionistic negation makes. McDermott and Doyle's definition was equivalent to

    {A₁, A₂, ..., Aₙ} ∪ {¬□¬P | ¬P ∉ T}

The intuitionistic notion of truth seems to give much the same effect as the other half of Moore's definition, which forces inclusion of □P if P ∈ T. Consider again the example {MC → D, ¬D}. The modal translation is {□(¬□¬□C → □D), □¬□D}, which is logically equivalent to {□(P → □Q), □P}, putting P for ¬□D and Q for ¬□C.
From the K axiom and modus ponens, we have □□Q, i.e. □Q, i.e. [¬C] as before. Unlike Moore and McDermott and Doyle, we do not now get a contradiction through having to block ¬□Q by including Q. Following the intuitionistic version of the fixed-point definition, □Q ∈ T, so ¬□Q need not be. There is no need, or even possibility in intuitionistic logic, of introducing Q.

7 AUTOMATING INTUITIONISTIC NON-MONOTONIC DEDUCTION

There are a number of ways to provide the monotonic component. We have experimented with a tableau implementation for intuitionistic logic based on the signed tableau rules in Fitting (1983), with two more rules for the new operator M:
    {tA₁, ..., tAₖ, fB₁, ..., fBₘ, fMX}
    -----------------------------------
    {tA₁, ..., tAₖ, fB₁, ..., fBₘ, t¬X}

where t and f say that the formula that follows is true or fails to be true in the world specified. This is straightforward to implement, but tableau systems do not always provide efficient or comprehensible proofs, so a backward-chaining method for intuitionistic logic based on natural deduction is also being developed. Gabbay (1987) gives a similar system for classical logic, and also shows how it can be extended to intuitionistic and modal logic.

What is not so clear is how the non-monotonic rules should be implemented. A natural approach is to try and extend the goal-directed backward-chaining methods already used for deductive databases. For many straightforward deductions this seems to work quite well. Consider once more the example {(MP → P), (P ∧ MQ → R)} |∼ R and suppose that we are thinking in terms of a Prolog-like system. The program is

    MP → P
    P ∧ MQ → R
and the query is

    ?|∼ R

Note that the query requests non-monotonic deduction. To show R, we have to show P ∧ MQ. To show P, we have to show MP, for which there is no explicit fact or rule. But MP is the same as failing to show ¬P, so we invoke a
new computation rule for formulae of the form MP. We try to show ¬P (monotonically) and fail. So MP can be assumed, and now we have to show MQ, for which there is no fact or rule, so once more we must try to show ¬Q, which fails, so MQ succeeds, and so on. Given a suitable complete theorem-prover that answers valid or not to any proposed propositional deduction, implementation of the new computation rule is relatively simple (a toy sketch is given below). Of course estimation of failure in the quantified case would need some heuristics.
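A toy sketch (ours; the rule and goal syntax are invented for the example, there is no loop checking, and a real system would call a complete intuitionistic prover where this sketch merely looks goals up) of the backward-chaining computation rule just described:

    RULES = {"P": [["M(P)"]],         # MP → P
             "R": [["P", "M(Q)"]]}    # P ∧ MQ → R
    FACTS = set()

    def show(goal, depth=0):
        """Backward chaining; a goal M(X) succeeds iff the attempt to show ¬X fails."""
        print("  " * depth + "show " + goal)
        if goal.startswith("M(") and goal.endswith(")"):
            return not show("¬" + goal[2:-1], depth + 1)   # consistency as failure
        if goal in FACTS:
            return True
        return any(all(show(g, depth + 1) for g in body)
                   for body in RULES.get(goal, []))

    print(show("R"))                  # True: ¬P and ¬Q fail, so MP and MQ succeed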
However, the additional rule, although necessary, is by no means sufficient. Suppose that we change the example slightly to

    MP → P
    ¬P ∧ MQ → R

We now have to show ¬P at the first step. In fact ¬P is an intuitionistic consequence of {(MP → P), M(¬P)}, but this time there is no rule in the database to directly trigger M(¬P). This step can be generated, however, by a computation rule that says that if one is otherwise stuck with a goal G then add MG if it is consistent to do so (this idea was suggested by Steve Reeves). We are currently experimenting with a practical system based on rules such as these.

BIBLIOGRAPHY

Fitting, M. C. (1983). Proof Methods for Modal and Intuitionistic Logic. Reidel, Dordrecht. (The title is self-explanatory; a careful and complete survey, especially good on tableau methods.)
Gabbay, D. M. (1982). Intuitionistic basis for non-monotonic logic. Proc. 6th Conf. on Automated Deduction (ed. D. W. Loveland). Lecture Notes in Computer Science, Vol. 138, pp. 260-273. Springer-Verlag, Berlin. (The paper that originally showed how McDermott and Doyle's notion of consistency could be given an intuitionistic semantics.)
Gabbay, D. M. (1984). Theoretical foundations for non-monotonic reasoning in expert systems. Research Report DoC 84/11, Dept. Computing, Imperial College of Science and Technology, University of London; also in Logics and Models of Concurrent Systems (ed. K. Apt), pp. 439-459. Springer-Verlag, Berlin. (The report in which the notion of restricted monotonicity is introduced, and in which proofs are given of results stated here.)
Gabbay, D. M. (1987). Programming in Pure Logic (forthcoming book). (Prolog-style backward-chaining computation rules are given for classical, intuitionistic and modal logic. Unlike Prolog, the resulting systems are complete.)
McDermott, D. and Doyle, J. (1980). Non-monotonic logic I. Artificial Intelligence 13, 41-72. (The paper that originally introduced the consistency operator M.)
Moore, R. C. (1988). Autoepistemic logic. Chapter 4 of this book.
Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence 13, 81-132. (Introduces the notion of default reasoning.)
DISCUSSION
J. A. Campbell: "Non-monotonic reasoning" now means many things to many different people. For example, there is the loose or colloquial meaning: reasoning in which the conclusions of deductions change with time as new information is acquired. This meaning has been particularized in several different ways, e.g. circumscription (McCarthy, 1984). A typical middle-of-the-road particularization is default reasoning, of the kind represented by the work of Reiter (1980); other varieties are not hard to find (see e.g. Łukaszewicz, 1986; AAAI, 1984). Autoepistemic logic (Chapter 4) belongs in the same general non-monotonic family. The title of this chapter by Clarke and Gabbay suggests that intuitionistic logic is of some help or interest to people who may want to use any of the above systems. This is not untrue (or at least it fails to be untrue, i.e. it can be believed while we wait for further information), because it offers some technical fixes at a fairly basic level. The question of the extent to which the fixes can propagate to higher levels of any given non-monotonic logical system and therefore change the way in which its interpretation works is still open. It is certainly worth further research in the directions indicated below and in this chapter.

The essence of the intuitionistic view of logics, applicable also to the non-monotonic case as presented in this chapter, is that certain knowledge of the world is a non-decreasing quantity: at least for atomic propositions, the transition from "no evidence either way" to true or false is irreversible once it has happened. This is indeed a means of evolution of knowledge that makes non-monotonic reasoning necessary, but it is not clear that it matches many of the situations in which such reasoning will be needed in potential applications, for example real-world economic planning exercises against the background of pseudorandom behaviour of some (most?) governments. To take a simple case of an intuitionistically unruly effect, consider a component that is subject to faults followed by periods of recovery. When not "recovering", it is in either a "true" or a "false" state, with different implications for the system of which it is a part, in the two different states. Following recovery, there is a further period during which its state is not identifiable by the observer, although the state exists and affects the system. During that period, the observer must make inferences based on his beliefs about the component without having certainty about its state. This situation is quite realistic (for example the problem could be one of planning short-wave radio links, where the component is the ionosphere or part of it), but seems to be too complex for the basic intuitionistic semantic function h(t, A). This is a comment about the scope of the theory, not about its technical content.

There is a second comment about scope which should be made here: the discussion by Clarke and Gabbay refers to propositional expressions, but leaves open the question of what happens to the formal structure when higher-order expressions are involved; in particular, first-order expressions for typical versions of default reasoning (for which there seems no reason to believe that the behaviour will be nasty after further work on technical issues) and second-order for circumscription (a tougher problem!).
It is tempting to mention, in passing, that the example of the counter-intuitive meaning of ('v'x) P(x) -+ Q 1- (3x) (P(x) -+ Q) when P(x) is "x plays well" and Q is "we shall win" is an effective blow struck against classical first-order logic (or an effective demonstration of its defencelessness), but there is no evidence that it will not turn out to be an equally effective blow or demonstration at the expense of the intuitionistic approach, when that approach has been extended to the point where it can handle all first-order expressions satisfactorily. There are semantic issues here, to do with the
176
M. R. B. Clarke and D. M. Gahha1·
meaning of -+ and what everyday meanings it is reasonable to attach to predicate formulae, but these are rather different from anything else in this chapter (i.e. they may be the same for classical and intuitionistic logics) and probably from anything else in this entire book. Within the restricted (propositional-calculus) scope of the intuitionistic treatment in the paper, the main pay-off is that the most obvious defect of the seminal paper on non-monotonic reasoning by McDermott and Doyle (1980) is removed. This is that -,c does not follow naturally from MC-+ D and -,D. In addition, MC-+ c intuitionistically proves nothing significant ( C v -,C), which it does prove, is a paraphrase rather than a piece of new information), while McDermott and Doyle's view that it proves C looks too strong a view to model reality cautiously. (A cautious observer would call that view "jumping to conclusions".) These are both positive achievements. In Clarke and Gabbay's specification of the deduction ""'• however, a price is paid: further restriction of the scope of the formalism. This comes from the property (2). which requires that B and C should be consistent if both A., A 2 , •• • , An""' B and A., A 2 , .• • , An""' C. As the authors note, the condition rules out systems with multiple extensions. In a system with multiple extensions, each extension must make some kind of semantic sense, and thus correspond to the construction of the world-view of one conceivable rational agent: democratic diversity, in fact. A single-extension system is rather like the picture of the stage of a puppet theatre as seen by the puppet-master. There will certainly be occasions when that treatment is adequate or even necessary. but there will probably be at least as many other occasions when it is too limiting for the problem that one is trying to solve by non-monotonic methods. Property (2) can probably not be relaxed without destroying the essential structure and meaning of the "restricted monotonicity" property (3). In one respect, mentioned below, the destruction is not necessarily a bad thing. But a hard-headed potential user's reason (additional to any reason that a specialist in intuitionistic logic may have) for preserving property (3) is that it simplifies possible proof procedures and automation of this intuitionistic propositional calculus to the point where it has a fighting chance of being computationally efficient. Efficiency is not a problem for suitably simple propositional cases, for example the establishment of {(M P -+ P), (P " MQ -+ R)} 1- R, but in general the proof scheme needs much more attention if it is not to degenerate into a trial-and-error exercise. It seems that the maximum mileage has already been obtained from classical and technical non-monotonic rules of deduction; what would be helpful would be a third type of rule or set of rules to keep a chain of deductions moving in the right direction. towards a desired goal, and avoiding tautologies along the way. Obviously, soundness and completeness of any such scheme should receive early attention, for example as a way of protecting it against too-enthusiastic use of clues for good expressions to select in a next proof step that are given from outside the intuitionistic system on some external grounds of "plausibility". 
It would be amusing to consider whether it could ever be safe to receive clues of this kind, say from a belieffunctional (Chapter 9) computation running in parallel with the logical one, and what kinds of mechanisms would be needed to sanitise the advice, to the point where completeness etc. could be guaranteed. Customers using distributed and multi-agent planning methods in artificial intelligence are prone to ask for things at least as dangerous as this! Less adventurously, schemes for selecting the next move to make in a proof can be envisaged which rely on syntax or the previous history of a computation. An example
6
/ntuitionistic Bases for Non-Monotonic Reasoning
177
of the latter in a successful scheme of deduction in classical logic is ancestor-filtered resolution. A possibility that mixes the two is the idea that, in choosing expressions e andfin order to conclude e ".f ~ g, no more than one of e,.fand g should begin with the operator M. An example of the "syntax" type ("short expressions are likely to be nicer to concentrate on than long expressions, or at least they waste fewer computing resources if this concentration leads nowhere") is the unit-preference tactic used in resolution. Another syntactic example, even if the intention of including it in the paper is entirely different, is Clarke and Gabbay's rule Dl; there is no obvious reason why it should not serve two purposes at once. In general, there may be several ideas worth transferring to the problem of finding good intuitionistic proof procedures from the area of classical resolution where they have had past successes. Nilsson ( 1980) gives a useful catalogue of such resolution-based ideas. This chapter started by suggesting that it would propose some properties in terms of which non-monotonic systems might be classified. In outline. what it has proposed are two points on a possible scale of classification. A monotonic deduction system is specified by three properties for its consequence relation. One brand of (intuitionistic) non-monotonic system is specified by two properties in common with it and by a third property (restricted monotonicity) that is conceptually not far distant from 'classical" monotonicity. Yet there are many more brands of non-monotonic system, as the references mentioned at the beginning of these comments indicate. Going beyond restricted monotonicity, for example by trying to specify different families of multiple extensions, may provide a useful taxonomy of non-monotonic reasoning systems that is supported by an intuitionistic base. Taxonomies arc somewhat lacking at present, but are likely to be needed more as the amount and variety of published material on non-monotonic logics (in the "loose meaning" mentioned at the beginning of this discussion) increase.
Reply: The first point that Campbell makes is that the particular model we put forward of intuitionistic truth as accumulation of information in time does not cope with situations where truth values change in an arbitrary manner over time. The example he gives of an intermittently faulty component would seem to be best handled by state probabilities parametrized by time or possibly some kind of event calculus or similar temporal approach. The semantics that we propose is not intended to define a temporal logic, the time points are identified with the propositions known to be true at those points. Time is used as a metaphor in the same way that possible worlds are in modal logic. Campbell's second point is that we only discuss the propositional case. We could have made it clearer that intuitionistic logic can be given a first-order semantics in the usual way for Kripke models by specifying a partially-ordered set of domains of individuals for each time point or possible world. The original Gabbay (1982) paper also gives an axiomatic presentation that includes all the necessary axioms and rules for the first-order case. The formula (\lx)P(x) -+ Q I- (3x)(P(x) -+ Q) cannot be proved without reductio ad absurdum; it is not intuitionistically valid. The point we were trying to make here, by ~ay of a brief example en passant, was that intuitionistic implication and deduction is In many cases a better translation of the if ... then of everyday human reasoning than classical material implication and the deductions that result from having reductio ad absurdum as an available rule. McDermott and Doyle's system only proves C from MC-+ C when the nonmonotonic rules are used, just as the intuitionistic system does. It could be viewed as a
178
M. R. B. Clarke and D. M. Gahhay
justifiable cnuctsm of the intuitionistic system that MC-+ C is equivalent to C v --, C, i.e. neutral with respect to C or --,C. This is not the usual intention behind defaults. The question is related to the point about multiple extensions. We agree with John Campbell that multiple extensions are an inherent part of conjectural reasoning. What is needed is some way of saying that some extensions are preferred to others on the basis of current information. It is worth noting that, according to the definition we give of non-monotonic deduction, the theory M P -+ Q cannot non-monotonically prove 1Q if P and Q are different. We have enlarged slightly at the end of this chapter on how a practical nonmonotonic system might be implemented. What is needed now are practical applications and comparisons with other methods such as truth-maintenance systems.
Additional references AAAI (1984). Proc. AAAI Workshop on Non-Monotonic Reasoning, New Paltz, NY. American Association for Artificial Intelligence, Menlo Park, California. ¥-ukaszewicz, W. (1986). CC AI. J. Integrated Study Artificial Intel/. Cogn. Sci. Appl. Episternal. 3, 7-31. McCarthy, J. (1984). Applications of circumscription to formalizing common-sense knowledge. In AIAA (1984), pp. 295-324. Nilsson, N.J. (1980). Principles of Artificial Intelligence. Tioga Publishing Co., Palo Alto, California.
7
Inheritance in Semantic Networks and Default Logic CHRISTINE FROIDEVAUX Laboratoire de Recherche en lnformatique, Universite de Paris-Sud, Orsay, France
DANIEL KAYSER LIPN, Departement de Mathematiques et lnformatique, Universite de Paris-Nord, Vil/etaneuse, France
Abstract Hierarchies with exceptions are very common. Inheriting properties in such hierarchies is an important problem in automated reasoning. We present two solutions to this problem-Fahlman's semantic networks NETL and Reiter's Logic for Default Reasoning-and we examine how these two solutions are related.
1
INTRODUCTION
This paper examines the relationship ex1stmg between a knowledgerepresentation technique, namely semantic networks allowing exceptions, and a form of inference, default reasoning. Semantic nets are usually equipped with ad hoc inference procedures, while default reasoning is more firmly grounded, as far as theoretical background is concerned; unfortunately, it turns out that in the most general cases default logic has rather bad properties concerning decidability or proof theory. We thus focus on some more specific cases, namely type hierarchies, which are of major practical interest, where default reasoning and semantic nets can be related in various ways. 1.1
Semantic networks
The idea of using so-called "semantic" nets in order to represent some kinds of knowledge stems from a widely spread metaphor, which consists in using spatial expressions to convey semantic relationships (e.g. a semantic neighbourhood between concepts). Some early works, especially in the field of NON.STANDARD LOGICS FOR AUTOMATED REASONING ISBN 0·12-649520-3
Copyright CO /988 Academic Press Limited All rights of reproduction in any form reserved
C. Froidevaux and D. KaysC'r
180
information retrieval (e.g. Doyle, 1962), considered the possibility of implementing some sort of semantic graph in a computer. At that time, the li!lks in the net had no other interpretation than existence of a (possibly weighted) meaning relationship between the concepts. Since the publication of an influential paper by Quillian (1968), the idea of semantic nets has pervaded more areas of Artificial Intelligence, but, as no precise semantics was ascribed to nodes or to links in the original paper, many workers using Quillian's approach merely considered themselves free to give any meaning they wished to the relations expressed in a network. A noteworthy exception was the work of Shapiro (1971 ), which was arguably the first serious attempt to represent first-order logic formulae in a network formalism (we discuss the reasons for doing this in Section 1.3). In an important paper, Woods ( 1975) criticized the proliferation of graphic formalisms that had no well-defined semantic interpretation. A correspondence between some types of networks and certain classical first-order theories was subsequently provided by Schubert (1976) and Hayes (1977). A good presentation of "second-generation" networks is to be found in Findler ( 1979). Meanwhile, Fahlman (1979) presented a system, called NETL, that was primarily oriented toward parallel computers; although NETL's semantics is not always accurately defined, this system clearly embodies an important mechanism for default reasoning (see Section 1.2). A large part of the present chapter will be devoted to NETL or NETL-Iike inferencing facilities. More recent literature concerning semantic networks includes descriptions of KL-ONE (Brachman and Schmolze, 1985), Krypton (Brachman et a/., 1985) and KL-TWO (Vilain, 1984). 1.2
Default reasoning
Except in a few technical domains, knowledge is best expressed by means of general statements, which should not be interpreted as universal laws; for example Examples Turning the switch puts on the light (the bulb might be broken, ... ). Birds can fly (except ostriches, penguins, wounded birds, ... ). The seminar is held every monday (except during holidays, ... ).
Such facts can be represented well by formulae such as (1)
P(x)
1\
1exception 1 (x)
1\ ... 1\
iexception.(x) => Q(x)
But formulae of this kind do not express at all the fact that exceptions are
7
Inheritance in Semantic Networks and Default Logic
181
something exceptional: P receives exactly as much importance as any "exception;'' (in other words, if P(a) has been proved, and 1exceptioni(a) for all j I= i has also been proved, the implication remains blocked, as would be the case if not even P(a) was proved: the general case has no more weight than any bogus case). This would not be a problem if inference took place in a universe where an exhaustive description of each object is available (even in this case, more economical descriptions for rule (I) could help); unfortunately, most real-world applications are such that one cannot even dream of exhaustive descriptions. As Reiter ( 1978) says: "Default reasoning may well be the rule, rather than the exception, in reasoning about the world, since normally we must act in the presence of incomplete knowledge." Example A robot planning to have some light in the room should not have to prove that the bulb is OK, the wires are normal, the fuse is fine, .... Only if something wrong appears in the realization of the plan would it be time to check. As a matter of fact, human communication (and man-machine communication as well) is possible only under the pragmatic law (in the linguistic sense of the word "pragmatic") that if something relevant is in an exceptional state, it must be explicitly stated. Otherwise, it is reasonable to assume that everything is normal. Default reasoning deals exactly with this assumption. We present in Section 3 one formalization (Reiter, 1980) of default reasoning. Other formalizations include those of AI (1980), McDermott ( 1982), Moore ( 1985) (see also Chapter 4), Lukaszewicz (1985) and McCarthy ( 1986) (see also Chapter 5). 1.3
Inheritance
Representing knowledge would be futile if there were no procedure to draw inferences from the knowledge represented. We shall focus on the kind of default reasoning that can be achieved with semantic networks, but first it seems natural to answer the question: why use semantic networks in order to make inferences, instead of, say, usual logical formulae? Well, this is an ever-lasting point of debate. The arguments generally given in defence of the network representation are as follows. (i) Not all inferences are equally needed; while logic can, in principle, ?educe any deducible fact, some lines of reasoning have more practical Importance, and having a means of deducing quickly along these lines is definitely an advantage.
C. Froidevaux and D. Kaysn
182
The most useful deductive pattern is inheritance; it goes as follows: if it can be established that a (respectively the As) is a (respectively arc) member(s) of class B, and if it is known that every element of B has some property, then a (respectively the As) has (respectively have) that property. In first-order logic, this might be expressed for example as
(2)
('v'x) (A(x) =:. B(x))
(3)
('v'x) (B(x) =:. (3y)(Q(y)" P(x, y)))
Now, if assertions (2) and (3) are scattered among many irrelevant (for the present purpose) formulae, it might take a long time for an automatic theorem-prover to show that
(4)
('v'x) (A(x) =:. (3y)(Q(y)" P(x, y)))
holds. But this is precisely the kind of deduction that the indexing capabilities of a semantic net allow to be performed very quickly (see Section 2). (ii) The creation of new nodes and arcs seems easier, at least for a nonexpert, than the redaction of formulae. More important, a visual display can show at once all the relations concerning a given entity: this facility makes modification a much safer task than altering a formula without being aware of the side-effects of such a modification. It is worth noting that recent advances in semantic data models tend to favour network-like formalisms (see e.g. Hull, 1985), precisely because of the ease of modification and interrogation. (iii) As will be shown later, adding or removing exceptions is relatively fast and safe, while formulae like (I) need to be rewritten every time a new exception is taken into account. Alternatively, it could be argued that we now need two inference mechanisms in order to be complete: when inheritance has failed to yield the desired result, a standard theorem-prover must be triggered, and the time spent with the network is wasted; this is true but (a)
efforts are made to ensure that inheritance runs much faster than regular deduction-the waste of time is thus negligible; (b) some "hybrid" strategies (Brachman et al., 1985) may take advantage of the work already performed during inheritance to speed ur the deduction; (c) completeness is not always wanted (some users might prefer a message "It will take me much time to find the answer. Do you reallY
7
Inheritance in Semantic Networks and Default Logic
183
need it?"; only in the case where the user says "yes" is a complete deduction required).
2 AN EXAMPLE
Consider the following statements: when the road is clear, go ahead; when you have a red light in front of you, stop; when a policeman tells you to go ahead, go ahead. At first sight, they seem contradictory: nothing prevents you from being on a clear road, with a red light in front of you; the above statements conclude then on "go ahead" and "stop". What should be done is to consider the word "except" as implied at the end of the first two lines, i.e. if the road is clear go ahead, except when you have red light, in which case you must stop, except .... In the third assertion it is implicit that the road is clear. Now the situation is well defined in any case. Let us have a look at its representation in NETL:
GO:
RC: RL:
PO:
situation where you GO ahead; situation where Road is Clear; situation where road is clear, but Red Light is on; situation where road is clear, with red light, but POliceman tells you to go.
The arrow ~means "set inclusion" (e.g. RL ~ RC will be read as "every situation where a road is clear with Red Light on is a situation where Road is Clear"). The arrow means "typical set inclusion" (e.g. Rc ----. GO will be read as "every situation where Road is Clear is a situation Where you GO ahead, except if you are told otherwise"). The arrow -Hit+ rneans "typical set exclusion" (in the sense that the typical element of the first set does not belong to the second set), and RL -!+#+ GO should read "every situation where road is clear, but Red Light is on, is not a situation where
184
C. Froidevaux and D.
Kay.~,,,.
you GO, except if you are told otherwise". The arrow --- --+ means "exception" (i.e. cancels the information represented by the arc pointed at by the arrow). These kinds of arcs, with possible variations, are sufficient to yield the intuitive results, when the inference mechanism is enforced by the following marker-passing schema (similar to Fahlman, 1979). Put a marker, say M I, on the node corresponding exactly to the actual situation, and pass markers according to the rules M I ---+ becomes M I ---+ M I becomes M I
---+
MI
(but M I M3 unchanged)
M I '*++ becomes M I
~
M2
(but M I -++ M3 +f+ remains unchanged)
MI
----+
M3
MI
---+
----+
becomes M I
---+
remains
In other words, M3 behaves as an inhibitor: as soon as an arc is marked with M3, it becomes unable to pass any marker. M I is a "yes" and M2 is a "no". Markers are supposed to be passed in parallel synchronously (if the process is asynchronous, "lost races" between markers might happen: this is easily detected and fixed; see Fahlman, 1979). Moreover, some networks can yield to a situation where both M I and M2 appear on the same node. This inconsistency occurs only in what Fahlman calls "illegal networks"; Fahlman et al. ( 1981) provides a means to detect them. Example Is an RL situation an RC situation? Put M I on RL: rule M I ----. M I applies; M I appears on RC; the answer is "yes". Is it a GO situation? Rules M I -- --+ M3 and M I -ttl++ M2 apply alsP simultaneously; M I appears on RC, M2 on GO. At the next step, rule M I - - M3 ---+ applies; nothing else is applicable. Only M2 has appeared on GO; the answer is "no". Example What is a PO? Put M I on PO: rule M I ~ M I applies twice and propagates M I to RL and to GO; rule M I ----+ M3 applies. At the next step, rule M I --+ M I propagates M I from RL to RC; rules M I ~ M3 -++-' and Ml ----+ M3 apply. At the next step, rule Ml M3-- applies: nothing else is applicable. The answer is: a PO is an RL, an RC, and a GO For this question, working the marker-passing schema leads to the following network:
185
7 Inheritance in Semantic Networks and Default Logic
Remark 1
Depth can be increased, such as in the following example:
Regular years have 365 days, except when year number is divisible by 4, where they have 366 days, except when year number is divisible by 100, where they have 365 days, except when year number is divisible by 400, where they have 366 days. The network is as follows:
\
\
I
I I ~
The reader may convince him/herself that the propagation scheme works, Yielding the expected results, and that any deeper model could in principle be represented, although it seems unlikely to find in the real world four nested e~ceptions. Moreover, this example is less realistic than the previous one, 810ce all the knowledge is available at once, and there is hence no need to reason with defaults: an ordinary algorithm is sufficient.
186
C. Froidevaux and D. Kayser
Remark 2 NETL allows "role" links, to reflect situations like the one described by (3), yielding p
B
Q
• ..-...*--+*
(Read: for all x that is a B there exists a y that is a Q and that plays role P for x)
Nothing precludes having, besides a "strict" role link with the interpretation which would mean, as in similar given by (3), a "default" role link cases, "unless cancelled, there exists ... ", with possible exception links pointing at it. For instance, the following net would mean "every human has a father, who is human, except Adam":
Human
Adam
..--......
* ""'WWWW' > *
l/
Father
••
We shall not elaborate on the role exceptions in the rest of this paper, but they don't seem to conceal more difficulty than that we shall treat.
3
PRESENTATION OF THE DEFAULT LOGIC (REITER)
3.1 3.1.1
Definition of the theory Intuitive approach
Our presentation of the formalism will be essentially syntactic, because the proposals in order to have a semantical definition of default theories did not provide a more intuitive insight into these theories. This formalism presents many interesting features and is in our opinion an appropriate tool to handle default reasoning. In order to make the syntactical definitions more intuitive. we begin with a few informal considerations. Recall that by default reasoning, we mean the drawing of plausible inferences from less-than-conclusive evidence in the absence of information \l1 the contrary. The first idea is to introduce in the first-order formalism a basiL default operator, denoted 'tf where 'tf w means "w cannot be deduced from the given knowledge base". The first-order theory will be augmented with inference schemata like this for example: (5)
bird(x) 'tf (penguin(x) v ostrich(x) v ... ) jlies(x)
7 Jnherilance in Semantic Networks and Default Logic
187
such a formula means: "if x is a bird and if it cannot be proved that x is a penguin or an ostrich, then deduce that x flies." If we add the formula (6)
bird(tweety)
then we can infer that tweety flies. Default reasoning is non-monotonic in the sense that the addition of new statements may invalidate previously derived facts: the set of theorems does not grow monotonically with the set of axioms. Namely, in the example, if we add to the formulae (5) and (6) the axiom
(7)
penguin( tweet y)
then the theory is still consistent but default (5) is not applicable and the formula flies (tweety) is no longer a theorem. An important difference between classical logic and default reasoning is that a single set of axioms can have more than one set of conclusions. For example, consider a universe with two objects A and B. Assume an object is not a block unless it is required to be. Assume also that either A or B is a block. We get the two statements: (8)
ff Block(x) -,B/ock(x)
(9)
Block(A) v Block(B)
Default (8) means "if it cannot be proved that x is a block then deduce that x is not a block". Now neither Block(A) nor Block(B) can be proved using the classical inference rules, so that -,Block( A) and -,Block( B) should be provable by means of(8). But the statement -,Block( A) 1\ -,Block( B), which then becomes provable, is inconsistent with (9). In order to avoid this inconsistency, we will manage to have two possible extensions, one in which Block(A) and -,Block( B) are provable, and another in which Block( B) and -,Block( A) are provable. This feature of default reasoning explains the difficulty of providing a semantics for defaults that allows a set of axioms to give rise to several extensions. Another problem is related to the definition of the non-monotonic theorems. In the inference schema (10)
ffP Q
it is necessary to know that P is not in the set of all provable statements in Order to declare that Q is in that set. This amounts to saying that the set of Provable statements should be known before any proof begins. To avoid this
188
C. Froidevaux and D. Kay.l'<'r
apparent circularity a fixed-point construction (cf. Definition 2 in Section 3.1.2) must be used. The definition of the theorems is thus not constructive: it is not in general decidable whether a formula is a theorem, i.e. there is no algorithm that will tell us whether or not a given default is applicable Moreover, to tell whether a sequence of formulae is or is not a proof of a nonmonotonic theorem does not depend, in contrast with classical logic, on an individual check of each step. 3.1.2
Formal definition
We now provide a short formal definition of default logic (for a more thorough presentation the reader is referred to Reiter, 1980; Besnard, 1987). Definition 1 A default theory 11 = (D, W) consists of a set of closed firstorder formulae Wand a set of defaults D. A default is any expression of the form u(x):v 1(x), v2 (x), ... , v.(x) w(x)
where u(x), v 1 (x), v2 (x), ... , v.(x) and w(x) are well-formed formulae whose free variables are among those of x = x 1 , ... , xm; u(x) is called the prerequisite' of the default, v1 (x), v2 (x), ... , v.(x) are its justifications and w(x) is its consequent. The meaning of the default it as follows: if u(x) is known and if v1 (x), v2 (x), ... , v.(x) are consistent with what is known then w(x) is inferred. Note The operator If used in the previous section had only an intuitive meaning. The symbol ":" is here formally defined. The reader should be aware that ":B" attempts to model the intuitive expression lfiB.
In what follows, theorems are given only for closed defaults, i.e. for defaults without free variables. The results extend to an open default theory, b) considering the closed default theories obtained from the open default theory by instantiating all the free variables of the defaults with the terms of the corresponding Herbrand universe. We should point out that the defaults are not formulae of the first-order language, but are in some way specific oriented inference rules. "Default~ therefore function like meta-rules; they are instructions about how to create an extension of the incomplete theory" (Reiter, 1980): the set of defaults D yields a means of predicting conclusions in spite of some gaps in the firstorder knowledge W, by extending W. Any such extension will provide the theorems of the default theory and will be interpreted as an acceptable set ol beliefs that one may hold about the incompletely specified world W. Note
7
189
Inheritance in Semantic Networks and Default Logic
that not every default theory has an extension and some have more than one. A set of formulae is considered as being an extension if it is the fixed point of an operator r, which we now define. The definition of r must take into account the following properties: an extension must contain W, be deductively closed under first-order provability and be closed under the application of the defaults. More formally, we get the following.
Definition 2 Given a default theory A = (D, W), let S be a set of closed formulae and r(S) the smallest set satisfying the following three properties: ~
(i)
W
(ii)
r(S)
(iii)
If
r(S); =
Th(r(S))
u: V1, ... , w
Vn
(Th stands for "first-order theoremhood");
ED, u E r(S) and -, v1 ,
A set E is an extension for A iff r(E) operator r.
=
.•. , -, Vn
¢ S then wE r(S).
E, i.e. iff E is a fixed point of the
"r(S) is the minimal set of beliefs that we can have in view of S, where S indicates which justifications for beliefs are to be admitted" (Besnard, 1987). We give a characterization of extensions that makes this notion more intuitive.
Theorem 1 for A iff E =
Let A = (D, W) be a closed default theory. E is an extension E;, where
U;=o, .... oo
£0 = W Ei+ I = Th(Ed u
{l w
u: V1, ... ,
w
Vn
ED,whereueE;and-,v 1 ,
... ,-,vn¢E
}
fori;;;: 0
Because of the occurrence of E in the definition of E;+ 1 , the sequence of sets
E1 cannot be considered as constructive to obtain an extension. We give some examples to illustrate the definitions.
Example 1
01)
Consider the following assertions:
in general birds fly
02) canaries are birds 03)
tweety is a canary
190
C. Froidevaux and D. Kayser
the theory L\ 1 =(D., W.), where bird(x))} and
These assertions correspond to
W1
=
{canary(tweety), (Vx (canary(x)
=:.
_ {bird(x) :.fties(s)} fiies(x)
D.-
Then L\ 1 has a unique extension
E1
=
Th( {canary(tweety), (Vx) (canary(x) =:.
bird(x)), bird(tweety),fiies(tweety)J) Assume that we add the new assertion (14)
tweety does not fly
LetL\'1 be(D 1 , W'1 ),where W 1 = Wu{---,fiies(tweety)};thenL\'1 hasauniquc extension
£'1
=
Th({canary(tweety), (Vx) (canary(x) =:. bird(x)), bird(tweety), ifiies(tweety) J)
We can no longer deduce fiies(tweety). This example shows that default theories are, in general, non-monotonic.
Example 2 modified.
This example is taken from Dubois et al. ( 1985) and slightly
( 15)
generally, if Mary attends a meeting, Peter does not attend the meeting
( 16)
generally, if Peter attends a meeting, Mary does not attend the meeting
Let AM M (resp. AMP) be short for "Attends Meeting Mary" (resp. Attends Meeting Peter). These assertions translate into the following defaults: _ {AMM: -,AMP AMP: -,AMM} D2,------
-,AMP
-,AMM
Moreover, suppose that we know (17)
either Peter or Mary (or both) attend the meeting
Then we must add the first-order formula W 2 = {AMM v AMP}. Then L\ 2 = (D 2 , W 2 ) has a unique extension £ 2 = Th({AMM vAMP}). We can conclude only the global presence of Mary or of Peter. A couple of things are worth noticing here: first, in contrast with the block example, there is nothing in the extension that was not deducible from the first-order knowledge, and this is because none of the default prerequisites j-, provable. A consequence is that the statement of exclusion 1(AMM
1\
AMP)
7 Inheritance in Semantic Networks and
D~fault
191
Logic
which follows intuitively from the premises (if one is present then the other is absent, so that they are not both present), is not provable. It seems then advisable to modify the translation of the statements in order to get defaults without prerequisites (as in the block example), for instance:
, _ {:AMM , :AMP} D2---,AMP --,AMM "Generally, if x attends a meeting then y does not attend a meeting" is translated here as "if it is consistent to believe that x attends, then conclude that y does not attend". The statement of exclusion is now provable; as a matter of fact, !!2 = (D]., W 2 ) has two extensions: E]. = Th({AMM, -,AMP}) and E2 = Th({AMP, --,AMM}), and 1(AMM" AMP) is a member of both E]. and E2. Unfortunately, this translation is till inadequate. Suppose the formulation to be slightly altered, the premises being (15) generally, if Mary attends a meeting then Peter does not attend the meeting (18)
every time Bill attends then Peter also attends
(19)
Bill attends
We now get the following default theory: !l3 = (D3, W3),
and
W3
=
where D 3 = { :AMM} -,AMP
{AMB, AMB =AMP}
The reader expects the conclusion to be "Peter attends", but this is not provable with the given translation; as AM M can be consistently believed, the conclusion --,AMP is also attainable, and thus there is no extension at all. We are thus naturally led to a translation using seminormal defaults (see later); the initial formulation of the problem is thus rendered as where D'2
=
{:AMM" -,AMP, :AMP" --,AMM} -,AMP
--,AMM
W 2 = {AMM vAMP}
This translation has two advantages: the statement of exclusion belongs to both extensions E]. and E2 of ll2, and the adjunction of AMB =AMP and 4MB to the first default of A2 leads to the correct conclusion: AMP. From this example, it is clear that the translation of a set of general
192
C. Froidevaux and D. Kayser
statements into a set of defaults is no trivial matter. To our knowledge, as yet, no procedure has been designed that would do the job. Example 3 In the case where the satisfied prerequisites lead to opposite conclusions, we get two extensions as for the block example. Suppose that we have the following assertions (cf. Reiter, 1980): (20)
typically Republicans are not pacifists
(21)
typically Quakers are pacifists
(22)
Richard is both a Quaker and a Republican
The corresponding default theory is 11 4
=
(D 4 , W4 ), where
W4
=
{Quaker(Richard)
1\
Republican(Richard))
and
D
=
4
{Republican(x):---, Pacifis:(x) Quaker(x): Pacifist(x)} ---, Pacifist(x) ' Pacifist(x)
114 has two extensions: =
Th( {Quaker( Richard)
1\
Republican( Richard), Pacifist( Richard)})
E~ =
Th( {Quaker( Richard)
1\
Republican( Richard), ---,Pacifist( Richard)})
£4
The existence of two extensions is compatible with the fact that we cannot conclude about the warlike nature of Richard (the assertions are ambiguous). This situation corresponds to an inconsistent net in NETL. While default logic accommodates two mutually inconsistent views, NETL forces the user to choose among them, because of the "illegality" of the net. Example 4
Let 11 5 be (D 5 , W5 ), where W 5 =
0
and D 5 =
.-,A}.Then {T
11 5 has no extension. This means that intrinsically incoherent defaults can never be applied. Moreover, Reiter proved a result according to which defaults cannot bring any inconsistencies. Theorem 2 A closed default theory (D, W) has an inconsistent extension iff W is inconsistent. An important drawback of this formalism is that, given a default theory, we cannot know a priori whether it has an extension or not. It then seems natural to restrict ourselves to theories that are known to have an extension. Normal default theories enjoy this property.
7 Inheritance in Semantic Networks and Default Logic
193
rx./•
Definition 3 A closed normal default is a default of the form: where rx. and f3 are closed formulae. A closed normal default theory is a closed theory (D, W), where every default of D is normal.
We get the important following result: Theorem 3 extension.
(Reiter)
Every closed normal default theory has an
Normal default theories are also semi-monotonic; that is, we have the following: Theorem 4
(Semi-monotonicity)
(Reiter)
Suppose that D and
D' are sets of closed normal defaults with D £ D'. Let E be an extension for the closed normal default theory 11 = (D, W) and let 11' = (D', W). Then 11' has an extension E' such that E £ E'.
This property is very useful in the sense that it makes possible a proof theory that is local with respect to the defaults entering into the proof. Reiter (1980) defines the notion of default proof, which we shall not present here; we mention only the result, according to which a consistent closed normal default theory 11 has an extension E such that f3 E E iff f3 has a default proof with respect to 11. Owing to the necessity for satisfiability tests in establishing the default proof, the extension membership problem, even for closed normal default theories, is proved as being not semi-decidable. Unfortunately normal defaults can interact with each other so that they lead to the derivation of anomalous default assumptions. In order to avoid them, we need other default rules. For example, consider the following assertions (Reiter and Criscuolo, 1981): (23)
typically, high-school dropouts are adults
(24)
typically, adults are employed
We do not want to conclude, for a given high school dropout, that he/she is employed. Let John be a high school dropout. Now, if we need open normal defaults, we get the following default theory: 11 = (D, W), where
D = {high-school-dropout(x): adult(x), adult(x): employed(x)} adult(x) employed(x)
194
C. Froidevaux and D. Kayser
and
W = {high-school-dropout(John)}
Then A has an extension that contains employed(John). We can block the transitivity by replacing the second default by adult( x) : employed ( x) 1\ --,high-school-dropout( x) employed(x)
. rx.: p 1\ y Thus we get semi-normal defaults, 1.e. defaults of the form . SemiY
normal default theories are generally not desirable, because they do not enjoy some of the good properties of normal theories. They can fail to have some extension, they lack semi-monotonicity and their proof theory appears to be considerably more complex than that for normal theories. In some cases, semi-normal defauots can be avoided. Consider the following statements. (25)
typically university students are adults
(26)
typically adults are employed
(27)
typically university students are not employed
The use of normal defaults leads to some ambiguity. But in this case, the statements are compatible with the following statement that we can add: (28)
typically adults are not university students
Thus we can use normal defaults only as follows: D = {student(x): adult(x), adult(x): --, student(x), adult(x) --, student(x) adult(x)
1\
istudent(x): employed(x), student(x): 1employed(x)} employed(x) 1employed(x)
Let Peter be a student: W = {student( Peter)}. Then A = (D, W) has a unique extension that contains: 1employed(Peter). Let John be an adult: W' = (adult(John)}. Then A= (D, W') has a unique extension containing : 1student(John), employed(John).
In general it is not possible to avoid the use of semi-normal defaults, as will be shown in the next section.
3.2
Interpretation of NETL proposed by Etherington and Reiter
The discussion up to the end of Section 3 is restricted to the mechanisms of NETL that deal with is-a hierarchies with exceptions.
195
7 Inheritance in Semantic Networks and Default Logic
As in Reiter and Criscuolo (1981 ), Etherington and Reiter distinguish between prototypical facts such as "typically mammals give birth to live young" and hard facts about the world such as "all dogs are mammals". The former translate into default rules, while the latter translate into universal first-order formulae. The translation rules are as follows: A~B:
(a)
(\fx) (A(x) => B(x))
(strict is-a link)
("As are always Bs") A~B:
(b)
(\fx) (A(x) => --, B(x))
(strict is-not-a link)
("As are never Bs")
(c)
A--+ B:
(d)
A -Hit+ B:
(e)
c-- -+:
A(x): B(x)
(default is-a link)
B(x)
A(x): --, B(x)
(default is-not-a link)
--, B(x) (exception link)
This final link must have at its head a default link. It cannot be translated independently of this default link. There are two cases:
r-----
B
(i)
A(x):B(x)
1\
ICt(X)
1\ ..• 1\
--,Cn(X)
B(x)
c, ..... c. A(x): 1B(x)
1\
1C 1(x)
1\ •.. 1\
--,Cn(x)
--, B(x)
A
Recall the first example of Section 2, where the network was as on the left below: GO
~\
RC
\
The corresponding default theory is \
l)
Ri_ I
\.I
PO
A= (D, W) D
= {RC(x): GO(x)
" --, RL(x), _R_L_:_(x-=-):_--,_G_O--::-('-=-x'-=-)-,---"-'_P_O---'(-'-x)} GO(x) --, GO(x)
W = { (\fx) ( PO(x) => RL(x)), (\fx) (RL(x) => RC(x)). (\fx) ( PO(x) => GO(x))}
196
C. Froidevaux and D. Kayser
Let c be a PO situation. Let W 1 be Wu {PO(c)}. A 1 = (D, WI) has a unique extension that contains RL(c), RC(c) and GO(c); that is, c is an RL situation (road-is-clear-but-Red-Light-is-on), an RC situation (Road-isClear), and a GO situation (you-GO-ahead). Let b be an RL situation not known to be a PO situation. Let W 2 be Wu {RL(b)}. A2 = (D, W2 ) has a unique extension E that contains RC(b) and---, GO( b); b is an RC situation but is not a GO situation. Note that E also contains ---,PO( b). Etherington proved the following result: although it may have non-normal defaults, the default theory corresponding to an acyclic inheritance network with exceptions has at least one extension. An algorithm that computes the extensions is provided. Such default theories are said to be ordered semi-normal default theories. If a semi-normal default theory is not ordered then it does not necessarily have an extension, as the following example shows. Let A = (D, W), where W=
0,
D = {
: C & ---, B : B & ---, A : A & ---, C ' B ' A
C}
Then A has no extension. An important drawback of this formalization is that the translation of general statements explicitly mentions the exceptions. This gives rise to different problems. Either we assume that the exceptions are all previously known (an improbable assumption) or we accept continuous modification of the defaults as new exceptions are discovered. An increasing number of exceptions increases the complexity of the defaults. Moreover, the links of the network cannot be translated independently of each other. In Section 3.4 we propose another formalization using default logic that avoids the abovementioned drawbacks and is closer to the NETL structure. The basic idea bears some similarity to McDermott's (1982) proposal for handling exceptions. First we present Touretzky's proposition for handling is-a hierarchies with exceptions, a proposition that does not use semi-normal defaults. 3.3
Implicit ordering of defaults
Touretzky ( 1984) presents a formal analysis of inheritance under "i1iferential ordering". This concept allows him to define an implicit ordering relation « among defaults and to represent inheritance in a natural way, using default logic. To represent the is-a and is-not-a links between two classes P and Q. he uses the following normal defaults: P(x):Q(x) Q(x)
(' 1. k) 1s-a m ,
P(x):1Q(x) ---, Q(x)
. . (1s-not-a hnk)
Let di and di be two defaults of a normal default theory such that Pi(x) is the
7
Inheritance in Semantic Networks and Default Logic
197
prerequisite of di and Pi(x) is the prerequisite of di. He defines di « di to mean that either there exists a default with prerequisite Pi(x) and consequent Pj(x), or there exists another normal default dk such that di « dk and dk « di. This partial ordering induces an ordering (also denoted «) on the default proofs, by comparing the ordering of the last default rules used in each proof. (A default proof sequence is the equivalent of an inheritance path in semantic networks.) For example the three assertions (25)-(27) are translated into the following normal default theory:
D=
{d
1
=
student(x): adult(x), d 2 adult(x)
=
adult(x): ernployed(x), ernployed(x) d3
=
student(x): -, ernployed(x)} 1ernployed(x)
where d 1 « d 2 and d 3 « d 2 • (The second point comes from the fact that there exists a default, namely d 1 , whose prerequisite is the prerequisite of d3 and whose consequent is the prerequisite of d2 ). Let Peter be a student: W = {student (Peter)}. Let 11 = (D, W). We get two conflicting default proof sequences:
S1
=
student(Peter)--(dd- -+adult(Peter)--(d 2 )---+ employed( Peter)
S2 = student(Peter)--(d 3 )- -+iernployed(Peter) From d 3 « d 2 it follows that only S2 is a valid default proof: we obtain the desired result. With this ordering, we can decide which extension must be chosen when there are many contradictory possibilities while the network is intrinsically unambiguous. In the case of an ambiguous network, the implicit order forces the user to recognize that neither extension is to be preferred to the other. While there is no longer any need explicitly to mention exceptions to general laws, so that semi-normal defaults are avoided, this formalization does not allow general laws and exceptions to be distinguished. 3.4 Anotherformalization using default logic for is-a hierarchies With exceptions 3.4.1
Definition of the semantics used
The formal semantics proposed here for inheritance networks is more closely related to the definition of links, nodes and wires in NETL. As do Etherington and Reiter, we distinguish between hard facts and prototypical facts.
198
C. Froidevaux and D. Kayser
In this system a "default is-a link" between two nodes A and B is in fact identified with a node ("handle node") R;-the name of the assertion-linked by wires to nodes A and B. R; is called the "justification" of the assertion. The default is-a link between A and B will therefore be represented as follows: A - R; --+ B. It translates into the semi-normal default ~. = u,
A(x): R;(x)
B(x)
1\
B(x)
(is-a default rule)
(where A(x) and B(x) are property predicates and R;(x) is an assertion predicate). The default is-not-a link between A and B will be represented as follows: A
++ Ri -t++ B. (ji =
It translates into the semi-normal default A(x): R ·(x) 1
1\ ---, B(x)
---,B(x)
(is-not-a default rule)
Like the default is-a link, the cancel link is represented with a node for the name of the assertion. Its graphical representation is A -
R;- > B or A
1'
-++ Ri ++- > B 1'
I
I
Rk
Rk
.
I I
I
I
c
c
which translates into the default (jk =
C(x):Rk(x)
1\
---,R.(x)
---,R.(x)
for n = i or j
(exception default rule)
We make the following assumption: the justification of two different default rules cannot have the same assertion predicates. For every new assertion there will be a new assertion predicate created. Strict is-a links and strict is-not-a links are translated as in Section 3.2. Recall the example of Section 3.2. With our notation, we get the semantic network shown on opposite page. Let W1 be Wu { PO(a)}; A 1 = (D, W!) has a unique extension T 1 that contains PO(a), RL(a), RC(a) and GO(a). Let W2 be Wu {RL(h)j: A2 = (D, W 2 ) has a unique extension T 2 that contains RL(b), RC(b) and ---, GO(b ). Which are precisely the results we wanted. Note that in this case, we do not need the strict is-a link between PO and GO. We could as well have
7
199
Inheritance in Semantic Networks and Default Logic
The corresponding default theory is
Gr~
A= (D, W)
R,~
I\
I
RC
A
.,, l
RL
1\
1R 1(x)
R,
,' l
R4
RL(x): -,R 1(x)
I
R
I
D = {RC(x): GO(x) 1\ R 1(x), RL(x):---, GO(x) " R 2 (x), GO(x) ---, GO(x)
W
R 3 (x), PO(x): 1R 2 (x) 1\ R 4 (x)} 1R 2 (x)
= {('vx) (PO(x) =:. RL(x)), (\fx) (RL(x) =:. RC(x)),
(\fx) (PO(x) =:. GO(x))}
\l
PO
used an exception link between PO and R 3 . It translates into the default PO(x): ---, R 3 (x)
1\
R 5 (x)
---, R 3 (x) that we would use instead of the formula (\fx) (PO(x) =:. GO(x)). The semi-normal default theory is ordered in the same way as that of Etherington and Reiter. Therefore it has an extension. The algorithm provided by these authors applies here. Note that we can use the same network for another purpose: we might wish not to interpret exceptions as assertions of negative facts, but only as blocking the transitivity of the is-a inference path. In this case, we can suppress from the justification of every semi-normal default the predicate that equals to the consequent. We thus get a taxonomic default theory, whose defaults are neither normal nor semi-normal but which still has a unique extension and enjoys some other good properties (cf. Froidevaux, 1986). The formalization using semi-normal defaults and assertion predicates has the same useful properties as the NETL system: as new exceptions are added, old rules remain valid; it is merely necessary to introduce new exception default rules; there is thus no need to know all the exceptions beforehand, thanks to the use of the assertion predicates R;(x). One could object to the introduction of these assertion predicates R;(x) because of the large number of assertions. The objection should then also address another non-monotonic treatment of is-a hierarchies with exceptions: the "circumscription" proposed by McCarthy (1986). He supposes that every object is abnormal in some way and hence "wants to allow some aspects of the object to be abnormal and still assume the normality of the rest". To this end he introduces a predicate ab and many functions aspect;; let us notice that there are as many such
200
C. Froidevaux and D. Kay.l'er
functions aspect; as there are assertions, and hence as many as the assertion predicates R;. Recently, Lukaszewicz (1986) proposed a system for default reasoning that uses many abnormality predicates that are very similar to our assertion predicates. Another interesting feature of our formalism, besides a closer correspondence with elements of the NETL network (namely handle modes), is that it provides a means for handling ambiguities.
3.4.2 Solving ambiguity Consider the following example, taken from Reiter: (29)
If you don't know where a person lives then you can assume that he/she lives where his/her spouse does
(30)
You can also assume that he/she lives where his/her employer 1s located
( 31)
Mary works-in Vancouver
(32)
Spouse(M ary) lives-in Toronto
(33)
(x lives-in u
(34)
!(Vancouver= Toronto)
1\
x lives-in v)
=>
u=v
The statements (29) and (30) translate into defaults as follows: _
(j
1 -
(j 2
lives-in (spouse(x), y): R 1 (x, y) lives-in(x, y)
1\
lives-in(x, y)
works-in (x, y): R (x, y) 1\ lives-in(x, y) lives-in(x, y)
2 = -----'----=-::----=---,--------'--
The assertions (31 )-(34) translate naturally into first-order formulae. With Reiter's classical translation into default logic, two alternatives are obtained: more precisely, we get two extensions. With our translation, we can impose a priority among these extensions. For this, we add either the default () 3 or the default () 4 :
With the presence of the default () 3 , we get Mary lives-in Toronto; with () 4 , we get Mary lives-in Vancouver. Obviously we cannot add simultaneously both defaults () 3 and () 4 .
7
Inheritance in Semantic Networks and Default Logic
201
3.4.3 Conclusion Our formalization improves on the others, because (i)
it preserves the modularity of links in NETL; and
(ii)
it reflects the possibility of inhibiting links ("marking handle-node" in Fahlman, 1979).
It is, however, worth noting that none of the formalizations exactly translates the results of the marker-passing algorithm. For instance, in the theory A2 of Section 3.2, the default theory concludes on -,PO( b), while no M 2 marker could possibly reach node PO.
4
DOMAINS OF APPLICATION
As has already been explained in Section 1, semantic nets with exceptions are useful every time a domain is described with statements of typicality and contains incompletely specified objects. This does not mean that they apply equally well under all circumstances. Consider the following cases. 4.1 "Typically" is understood as "true unless otherwise stated or deduced" Example (After McDermott and Doyle, 1980) "In France, it is daylight at noon". The only possible exceptions are eclipses, and these are sufficiently rare and predictable to be announced; so the information is true unless it is explicitly known to be false. This is exactly the case where all the mechanisms presented here work at their best. 4.2 "Typically" means "most plausible unless otherwise stated or deduced" Example "Seminars are held every Monday at 2.30 p.m." This should be Understood as "true except during holidays and except when a cancellation has been sent to the prospective attendees". Nevertheless, an attendee who knows: (i)
that next Monday is not holiday;
(ii) that he/she is on the mailing list of the seminar and did not receive a cancellation,
202
C. Froidevaux and D. Kayser
is entitled to assume that the seminar will take place next Monday, but he/she might still consider the possibility of the contrary (for example, the cancellation did not reach him/her, the speaker has cancelled too late to send mail, ... ). This situation corresponds to a decision with incomplete information and. unfortunately, this case is much more frequent than case 4.1. The mechanism presented here still works, but, depending on whether or not default rules have been used to reach it, the conclusion should be taken as "true" or as "plausible". If an estimation of the probability/plausibility is possible then the semantic nets should be augmented with number-passing capabilities (Fahlman, 1982) in order to compute a degree of confidence in the result. When "typically" does not even have the meaning of "plausible", but reflects only an asymmetry in favour of one of the possible outcomes, the conclusion-when reached by means of defaults-has only the meaning of "the best conclusion available", which does not guarantee its plausibility. 4.3
Abduction
In troubleshooting or diagnosis systems, many rules read "if a is observed. then b is a plausible hypothesis". Of course, b might be ruled out by previous observations; the situation then bears some resemblance to the situations previously considered. Example "If lamp does not light when you turn on the switch then check the bulb." "If lamp does not light when you turn on the switch and the bulb seems OK, then check the fuse." "If lamp does not light when you turn on the switch, if the bulb and the fuse seem OK, then check whether there is some light somewhere else in the vicinity".
The word "seem" here is important, since successful troubleshooting involves taking into account the fact that first-order verifications do not preclude further investigations. In other words, if "observed" is taken as a "modal operator" then one never has a strict rule: "observed" P::::;. P, but a default rule: P to be true
!I' observed P then by default consider
Poole defines a logical system for problems of diagnosis that involves default assumptions that are analogous to some of Reiter's defaults. In this case, defaults are treated as possible hypotheses in a scientific theory that explains the results. We give only a few indications on the notion of
7
Inheritance in Semantic Networks and Default Logic
203
explainable: an answer is explainable if it follows logically from some consistent set of default instances together with the facts. Poole (1985) provides a semantic characterization of the notion of the most specific theory to explain the results, in the case where reasoning with defaults leads to more than one extension and hence produces different answers. As far as semantic networks are concerned, Poole's formalism provides a semantics for the syntactical notion of inferential ordering (cf. Section 3.3). However, this formalism is not restricted to default links (prototypical facts) as in Touretzky's, but also handles hard facts. 4.4
Typical elements in a class
Much recent work in cognitive psychology (e.g. Rosch, 1975; Dubois, 1986) is based on the fact that human knowledge makes an intensive use of classes in which some elements play a special role; these elements are sometimes said to be prototypical. In what respect is it useful to know which individuals or subclasses are prototypical of a class? The answer is clear when class inclusion is considered as a default statement: an object is prototypical in a class if all (or at least most) default rules specific to that class apply to the object, i.e. if it cancels no links starting from the node corresponding to the class. In a system having no intermediary degrees of truth, this fact adds nothing new. Any individual or subclass of the class already inherits all the default properties of the class; so the only information gained through knowledge of the fact "a is a typical B" is a higher degree of confidence in the default assumptions on a property inherited by a from B. However, the knowledge should also be exploited in a different way, which could be named "bottomup" inheritance, along the following lines: "if you want to know what a B is, and you know that a is a typical member of class B, then look at a". (This is used when answering a child's question: "What is a fir tree?": If you happen to see one good example of a fir then you may answer "look, this is a fir fir tree", i.e. fir trees inherit many properties of the tree.) The inference scheme would then be: if x is a B, if it is consistent that x has property P if a is a typical element of B and has property P then, it is plausible that x has property P. Unfortunately, this inference scheme is much too crude: it amounts to assuming that a typical element has only typical properties. The same Problem arises in analogical reasoning: analogy using irrelevant features Yields invalid conclusions. The problem of finding the "typical properties" or the "relevant features" remains open.
5 CONCLUSION
We have focused on certain patterns of default reasoning essentially based on the notion of inheritance. These patterns occur frequently, especially in commonsense reasoning as it is represented in semantic networks (assertions of typicality). We have shown that networks can provide a graphical notation that distinguishes inheritance and exceptions, and that these networks can be handled at reasonable computational cost. Moreover, default logic yields a formal tool for representing default reasoning. Default logic works well when normal defaults are used. Unfortunately, normal defaults can interact in such a way that the use of non-normal defaults becomes necessary. We have emphasized the close links between semantic networks and default logic by giving translations of net arcs into default-logic formulae. The formalizations using default logic show the correctness of the NETL-like semantic networks in the case of hierarchies with exceptions. In general, default theories are computationally intractable because of the need to check for consistency in the proof of a non-monotonic theorem. Reiter, as early as 1980, pointed out the need for heuristics (cf. Section 3) for handling default theories. Semantic networks, by providing indexing schemes on formulae, could yield an efficient heuristic for the consistency checks required in default reasoning (Reiter and Criscuolo, 1983). Potential derivation chains in the logical formalism correspond to directed paths in the network representation. Recall Example 1 and repeat what Reiter and Criscuolo suggest: "if node "¬flies" cannot be found within a sufficiently large radius r of the node "tweety" (i.e. if no directed path of length r or less from "tweety" to "¬flies" exists in the index structure) then it is likely that flies(tweety) is consistent with the given first-order database". According to them, "an heuristic of this kind is precisely the sort of resource limited computation required for common sense reasoning", as Winograd (1980) advocates. This proposition is interesting insofar as it evokes an efficient way to combine the heuristic power of semantic networks and the formal power of default logic.
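The quoted heuristic is easy to make concrete. The sketch below is not from the original text: it treats the network index as a plain adjacency-list digraph and runs a breadth-first search bounded by the radius r; all node names are hypothetical.

from collections import deque

def within_radius(graph, start, target, r):
    """True iff a directed path of length r or less links start to target."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        node, dist = frontier.popleft()
        if node == target:
            return True
        if dist < r:
            for nxt in graph.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, dist + 1))
    return False

# flies(tweety) is heuristically taken to be consistent if "~flies" is not
# reachable from "tweety" within radius r in the network index
graph = {"tweety": ["bird"], "bird": ["animal"], "ostrich": ["~flies"]}
print(not within_radius(graph, "tweety", "~flies", r=3))  # True: likely consistent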
BIBLIOGRAPHY
AI (1980). Special issue on non-monotonic logic. Artificial Intelligence 13, no. 1-2. (Historically, the first important publication in non-monotonic logic. Various aspects of the field are covered: the ideas behind the notion of non-monotonicity are generally well discussed, even in the most technical papers. Some of the contributors have since given more elaborate versions of their systems.)
Besnard, P. (1987). An Introduction to Default Logic. Springer-Verlag, Berlin.
Brachman, R. J. and Schmolze, J. G. (1985). An overview of the KL-ONE knowledge representation system. Cogn. Sci. 9, 171-216.
Brachman, R. J., Gilbert, V. P. and Levesque, H. J. (1985). An essential hybrid reasoning system: knowledge and symbol level accounts of KRYPTON. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-85), Los Angeles, pp. 532-539. Morgan Kaufmann, Los Altos, California.
Doyle, L. B. (1962). Indexing and abstracting by association. American Documentation (October), pp. 378-390.
Dubois, D. (1986). Compréhension de phrases: représentation sémantique et processus. Thèse d'Etat, Univ. Paris 8.
Dubois, D., Farreny, H. and Prade, H. (1985). Sur divers problèmes inhérents à l'automatisation des raisonnements de sens commun. Congrès AFCET-RFIA, Grenoble, Vol. 1, pp. 321-328.
Etherington, D. W. and Reiter, R. (1983). On inheritance hierarchies with exceptions. Proc. American Association for Artificial Intelligence Conf. (AAAI-83), Washington, DC, pp. 104-108.
Fahlman, S. E. (1979). NETL: A System for Representing and Using Real-World Knowledge. MIT Press, Cambridge, Mass. (NETL was intended for actual implementation on massively parallel architectures. The ambitions were probably too high for the hardware existing in the late 1970s. This work announces both non-monotonic logic and connectionism. Very easy to read, and full of interesting remarks concerning knowledge-representation issues.)
Fahlman, S. E., Touretzky, D. S. and Van Roggen, W. (1981). Cancellation in a parallel semantic network. Proc. 7th Int. Joint Conf. on Artificial Intelligence (IJCAI-81), Vancouver, pp. 257-263.
Fahlman, S. E. (1982). Three flavors of parallelism. Proc. Canadian Soc. for Computational Studies of Intelligence-82, Saskatoon, Sask., pp. 230-235.
Findler, N. V. (1979). Associative Networks: Representation and Use of Knowledge by Computers. Academic Press, New York. (Although rather old, compared with the timescale of the field, this collection of 14 contributions is one of the best introductions to semantic networks. As for AI (1980), most of the authors have more recently described newer versions of their systems, but the basic ideas underlying their research are generally better presented in this volume.)
Froidevaux, C. (1985). Exceptions dans les hiérarchies SORTE-DE. Congrès AFCET-RFIA, Grenoble, Vol. 2, pp. 1127-1138.
Froidevaux, C. (1986). Taxonomic default theory. Proc. European Conf. on Artificial Intelligence (ECAI-86), Brighton, pp. 123-129.
Hayes, P. J. (1977). In defence of logic. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-77), Cambridge, Mass., pp. 559-565.
Hull, R. (1985). A survey of research on semantic database models. Technical Report, Comp. Sci. Dept, Univ. Southern California, May.
Israel, D. J. (1980). What's wrong with non-monotonic logic? Proc. American Association for Artificial Intelligence Conf. (AAAI-80), Stanford, pp. 99-101.
Kayser, D. (1984). Examen de diverses méthodes utilisées en représentation des connaissances. Congrès AFCET-RFIA, Paris, Vol. 2, pp. 115-144.
Łukaszewicz, W. (1985). Two results on default logic. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-85), Los Angeles, pp. 459-461. Morgan Kaufmann, Los Altos, California.
Łukaszewicz, W. (1986). Minimization of abnormality: a simple system for default reasoning. Proc. European Conf. on Artificial Intelligence (ECAI-86), Brighton, pp. 382-389.
McCarthy, J. (1980). Circumscription: a form of non-monotonic reasoning. Artificial Intelligence 13, 295-323.
McCarthy, J. (1986). Applications of circumscription to formalizing common-sense knowledge. Artificial Intelligence 28, 89-116.
McDermott, D. and Doyle, J. (1980). Non-monotonic logic I. Artificial Intelligence 13, 41-72.
McDermott, D. (1982). Non-monotonic logic II: Non-monotonic modal theories. JACM 29, 33-57.
Moore, R. (1985). Semantical considerations on non-monotonic logic. Artificial Intelligence 25, 75-94.
Poole, D. (1984). A logical system for default reasoning. Workshop on Non-Monotonic Reasoning, AAAI, October, pp. 373-384.
Poole, D. (1985). On the comparison of theories: preferring the most specific explanation. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-85), Los Angeles, pp. 144-147. Morgan Kaufmann, Los Altos, California.
Quillian, M. R. (1968). Semantic memory. Semantic Information Processing (ed. M. Minsky), pp. 227-270. MIT Press, Cambridge, Mass.
Reiter, R. (1978). On reasoning by default. Proc. 2nd Symp. on Theoretical Issues in Natural Language Processing, Urbana, Illinois, pp. 210-218.
Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence 13, 81-132. (This paper, as well as McCarthy's, is certainly the most influential paper of the collection. The most important theorems concerning normal default theories are proved (existence of extensions, mutual inconsistency of extensions, semi-monotonicity, elements of proof theory). The reasons for the choices are provided, which makes the paper, despite its technicality, very readable.)
Reiter, R. and Criscuolo, G. (1981). On interacting defaults. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-81), Vancouver, pp. 270-276.
Reiter, R. and Criscuolo, G. (1983). Some representational issues in default reasoning. Comp. Maths Applics 9, 15-27.
Rosch, E. (1975). Cognitive representations of semantic categories. J. Exp. Psychol. 104, 192-233.
Shapiro, S. C. (1971). The MIND system: a data structure for semantic information processing. Report R-837-PR, The Rand Corp.
Schubert, L. K. (1976). Extending the expressive power of semantic networks. Artificial Intelligence 7, 163-198.
Touretzky, D. S. (1984). Implicit ordering of defaults in inheritance systems. Proc. American Association for Artificial Intelligence Conf. (AAAI-84), pp. 322-325.
Vilain, M. (1984). KL-TWO: a hybrid knowledge representation system. BBN Technical Report 5694.
Winograd, T. (1980). Extended inference modes in reasoning. Artificial Intelligence 13, 5-26.
Woods, W. A. (1975). What's in a link: foundations for semantic networks. Representation and Understanding (ed. D. G. Bobrow and A. Collins), pp. 35-82. Academic Press, New York.
DISCUSSION

Didier Dubois and Henri Prade: Default logic offers a formal mechanism for dealing with rules having unspecified exceptions. However, some limitations or problems
seem to exist from a knowledge-representation point of view. In the following, five questions are briefly mentioned.
(1) The approach in its present state does not seem able to take into account various modalities, such as "typically" and "very typically", that might enable some ambiguities to be resolved, as in the following example:
typically, Republicans are not pacifists
very typically, Quakers are pacifists
Richard is both a Quaker and a Republican
What default conclusion can be obtained about Richard in the absence of any other information?
(2) Default logic enables ordinary conclusions to be derived as in standard logic, as well as default conclusions. But it seems that default logic itself does not provide any mechanism for distinguishing between an ordinary conclusion and a default one, which would be useful in case new information becomes available.
(3) A troublesome question is the choice of the right translation of a default assertion in default logic, since, according to the choice that is made, the derivation capacities may be different, as is mentioned by Froidevaux and Kayser. For instance, the rule "typically, if Mary attends a meeting then Peter does not" can be written in default logic as
AMM : ¬AMP / ¬AMP

or as

: AMM ⇒ ¬AMP / AMM ⇒ ¬AMP

or as

: AMM ∧ ¬AMP / ¬AMP
where AMM (resp. AMP) is short for "Attends Meeting Mary" (resp. "Attends Meeting Peter"). The question of the existence of a best translation seems to be worth investigating.
(4) In Reiter and Criscuolo (1981) it is claimed that default logic is more oriented towards the treatment of typicality than towards usuality (where a frequentist interpretation is assumed). It seems acceptable to say that "generally, birds fly" is more a question of a property typical of birds than of frequency (especially because there is considerable ambiguity about the referential on which this frequency would be defined). However, there are many examples where the choice between the two interpretations is more difficult; we can say "typically, students are unemployed" as well as "most students are unemployed".
It also seems that when we say that flying is a typical capability of birds we are saying not only that "generally, birds fly" but also that "generally, flying animals are birds" (although there are some exceptions, e.g. bats). It suggests that typicality is perhaps a
matter of default equivalence. Another question related to typicality is to know which is the primary notion: "typical property" or "typical element in a class". Note that a typical element may have a non-typical property too!
(5) Default logic, as discussed by Froidevaux and Kayser, is basically concerned with the application of default rules (e.g. typically high-school dropouts are adults), which are general even if they have exceptions, to particular situations or cases (e.g. John is a high-school dropout). Farreny and Prade (1986) have proposed a numerical treatment of this kind of problem using possibility measures. The problem of producing a new default rule from already known default rules is not usually considered in the default-logic literature. This latter problem has been addressed and discussed by Zadeh (1985), who models the fuzzy proportions present in default rules such as "most x that are As are Bs" in the setting of possibility theory. See the end of Chapter 10 for a brief account of this approach. Numerical approaches offer the advantage of quantifying the probability or the possibility of encountering exceptions.
Philippe Smets: (1) In the case of a closed normal default theory with a finite number of defaults, could Theorem 1, given Theorem 4, provide a constructive way of getting the extension, considering each default one after the other?
(2) Belief functions and default logic: many examples in this chapter could be handled with belief functions. To see the power of the belief-function approach within the context of default reasoning, I treat the example "Where does Mary live" (see Section 3.4.2, (29)-(34)). Let

X = "city where Mary lives"
H = "city where Mary's husband lives"
W = "city where Mary works"

Let Bel1(X = H) = α, Bel1(X ≠ H) = 1 − α, and Bel2(X = W) = β, Bel2(X ≠ W) = 1 − β. I have a degree of belief α that Mary lives with her husband and 1 − α that she does not. I have a degree of belief β that Mary lives where she works and 1 − β that she does not. The domain of X relevant here is 𝒯 × 𝒱, where 𝒯 = {T, ¬T} and 𝒱 = {V, ¬V}; T means Toronto, V means Vancouver, and a pair like (T, ¬V) in 𝒯 × 𝒱 means that Mary lives in Toronto and not in Vancouver. Evidence E3 is "H = T". Combined with evidence E1, on which Bel1 was derived, it induces a belief Bel13 with masses m13(cyl(T)) = α and m13(cyl(¬T)) = 1 − α, where cyl(T) is the cylindrical extension of T on 𝒯 × 𝒱, i.e. cyl(T) = (T, V) ∨ (T, ¬V). Evidence E4 is "W = V". Combined with evidence E2, on which Bel2 was derived, it induces a belief Bel24 with masses m24(cyl(V)) = β and m24(cyl(¬V)) = 1 − β. The combination of Bel13 with Bel24 leads to the belief function Bel1234 with masses m1234(T, ¬V) = α(1 − β), m1234(¬T, V) = (1 − α)β, m1234(T, V) = αβ, m1234(¬T, ¬V) = (1 − α)(1 − β). We further have evidence E5, "the place where Mary lives is unique". It induces a belief function Bel5 with mass m5(¬(T, V)) = 1. The combination of Bel5 with Bel1234 leads to the belief function Bel12345 with masses
m12345(T, ¬V) = α(1 − β)/(1 − αβ)
m12345(¬T, V) = (1 − α)β/(1 − αβ)
m12345(¬T, ¬V) = (1 − α)(1 − β)/(1 − αβ)
This last belief function quantifies our degree of belief that Mary lives in Toronto, in Vancouver, or somewhere else, a case not considered in the authors' analysis. Note that the same final solution would be derived if one had proposed Bel1(X = H) = α and Bel1(X ≠ H) = 0, Bel2(X = W) = β and Bel2(X ≠ W) = 0, i.e. Bel1 (Bel2) gives mass α (β) to "X = H" ("X = W") and leaves the remaining masses undetermined. Furthermore, owing to the symmetry and associativity of Dempster's rule of combination, the order in which the five pieces of evidence are combined is irrelevant. The only question that remains is to evaluate α and β, i.e. the strength of our belief that a wife lives with her husband and that she lives where she works, respectively. This evaluation problem is similar to the one encountered and theoretically solved with subjective probabilities. Our solution has the advantage of
(i) not neglecting the case "Mary lives neither in Toronto nor in Vancouver";
(ii) handling the case where there is some ordering between the default rules (29) and (30); and
(iii) providing a measure of the incoherence between the two pieces of evidence E1 and E2 (the term αβ).
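This combination is mechanical enough to check by machine. The sketch below is not part of the discussion itself: it is one possible encoding of Dempster's rule over subsets of the four-element frame, with illustrative values of α and β, and it reproduces the masses above.

from itertools import product

alpha, beta = 0.8, 0.6  # illustrative strengths of Bel13 and Bel24

FRAME = frozenset(product(["T", "~T"], ["V", "~V"]))

def cyl(t=None, v=None):
    """Cylindrical extension, e.g. cyl(t="T") = {(T, V), (T, ~V)}."""
    return frozenset((a, b) for a, b in FRAME
                     if (t is None or a == t) and (v is None or b == v))

def dempster(m1, m2):
    """Dempster's rule: intersect focal elements, renormalize the conflict."""
    combined, conflict = {}, 0.0
    for s1, w1 in m1.items():
        for s2, w2 in m2.items():
            inter = s1 & s2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

m13 = {cyl(t="T"): alpha, cyl(t="~T"): 1 - alpha}   # husband lives in Toronto
m24 = {cyl(v="V"): beta, cyl(v="~V"): 1 - beta}     # Mary works in Vancouver
m5 = {FRAME - cyl(t="T", v="V"): 1.0}               # living place is unique

m = dempster(dempster(m13, m24), m5)
for s, w in m.items():
    print(sorted(s), round(w, 4))
# m(T,~V) = a(1-b)/(1-ab), m(~T,V) = (1-a)b/(1-ab), m(~T,~V) = (1-a)(1-b)/(1-ab)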
Didier Dubois and Robert Valette: Anybody familiar with the literature on discrete asynchronous systems should be struck by the analogy between a NETL representation and a Petri net (Peterson, 1982). In both types of representation we have networks and markers (or tokens) moving from node to node. Moreover, the example given in Section 2 about cross-road traffic modelling is a typical example given in introductory courses on Petri nets. The following is a Petri net representing the behaviour of vehicles in the example:
[Figure: a Petri net for the cross-road traffic example, built from places, transitions and tokens; dotted arrows represent the environment.]
A Petri net is generally the model of a sequential process in an evolving environment. In the above network, the environment is represented by the dotted arrows, which express the facts that the light may switch from red to green, the policeman may say to go, and the road may get cleared, at some points of time.
There is a patent difference in the modelling purpose between the above network and the one in Section 2 (p. 185). The latter describes a reasoning process and is not meant to simulate the passing of a traffic junction. However, in the NETL formalism, as well as in Reiter's default logic, the knowledge representation is oriented. The reasoning problem is always based on facts about what the policeman says, how the road is and how the traffic light is, and aims at deciding whether to go or not to go. This is clearly very different from representations in classical logic formalisms, which are not directed. This directed feature of NETL representations makes it possible to envisage a Petri-net model of the reasoning processes embedded in a NETL network. However, the resulting Petri net will have special features, not shared by all kinds of Petri nets, namely:

(i) coloured tokens: these account for the three kinds of markers in NETL (see Jensen (1981) for an introduction to coloured Petri nets);

(ii) propagation by "contamination", i.e. a token crossing a transition leaves behind a copy of itself.

[Figure: firing of a transition in the usual Petri-net convention versus the contamination mode; the latter is obtained by a modification of the graph that keeps the same firing convention.]

Using these conventions, it is possible to systematically translate NETL arrows into the Petri-net formalism. We shall use three kinds of tokens: the "yes" (Y) and the "no" (N), which stand for M1 and M2, and a "nil" (0) token standing for the state of a place for which nothing is proved so far. Any place in the Petri net must contain one and only one of Y, N, 0. Links between places and transitions are also coloured, with the following meaning: a link labelled {x1, ..., xn} from a place to a transition means that the transition is enabled only by a token of a colour x ∈ {x1, ..., xn} in the upstream place, and a link labelled x from a transition to a place means that upon firing the transition a token of colour x appears in the downstream place.
In the following, we give the translation of NETL arrows:
[Figure: translations of the NETL arrows into the Petri-net formalism: set inclusion (A is-a B), typical set inclusion with an inhibitor arc for the exception link, and set exclusion, using the token colours Y, N and 0.]
Note that the term "set exclusion" is rather ambiguous, because in NETL it is an oriented notion, while it is usually a symmetrical one. What would be the advantages of using a Petri-net formalism rather than the NETL representation? In fact, mathematical and computerized tools are available to analyse Petri nets from the point of view of their consistency and the discovery of structural properties such as invariants (i.e. parts of the network that always contain the same number of tokens). But in the case of coloured Petri nets, these tools are still in their infancy. Moreover, most of the theoretical results and analytical algorithms exclude the contamination mode for propagating tokens. In conclusion, although it may seem interesting to know that NETL networks do belong to the Petri-net family, the consequences of this result do not look as promising as one might have thought beforehand.
Reply: Regarding Comment (1) by Smets, the answer is yes: it is proved in Łukaszewicz (1985)† that all and only the extensions of a closed normal theory Δ = (D, W) can be constructed by considering the n defaults of D in a well-specified order, and this solution is close to the one that is sought by Smets. Let E0 := Th(W) and D0 := D; for 0 ≤ i ≤ n:

if there exists at least one default d in Di, reading u : v / v, such that u ∈ Ei and ¬v ∉ Ei
then pick one such default d; let Di+1 := Di − {d}, Ei+1 := Th(Ei ∪ {v});
else (i.e. Di is empty, or for all defaults d in Di either u ∉ Ei or ¬v ∈ Ei) E := Ei is an extension of Δ; exit.

† We are indebted to P. Besnard for bringing this proof to our attention.
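A minimal propositional sketch of this procedure follows; it is not from the original text. Facts and default consequents are coded as literals, so Th(·) collapses to set membership and the consistency test is just the absence of a complementary literal; a real implementation would need a theorem prover.

def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def extension(facts, defaults):
    """Build one extension of a closed normal default theory (D, W).

    Each default is a pair (u, v), read u : v / v.  Repeatedly apply an
    applicable default, i.e. one whose prerequisite u is in Ei while ~v is
    not, until none applies; the order of D decides which extension results.
    """
    e, d = set(facts), list(defaults)
    while True:
        applicable = [(u, v) for (u, v) in d if u in e and neg(v) not in e]
        if not applicable:
            return e          # E = Ei is an extension; exit
        u, v = applicable[0]  # pick one such default
        d.remove((u, v))      # D_{i+1} = D_i - {d}
        e.add(v)              # E_{i+1} = Th(E_i U {v})

# invented example: an ostrich is a bird, birds fly by default,
# ostriches do not fly by default
facts = {"bird", "ostrich"}
defaults = [("ostrich", "~flies"), ("bird", "flies")]
print(extension(facts, defaults))   # {'bird', 'ostrich', '~flies'}
# reversing the list of defaults yields the other extension, containing "flies"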
Comments (1), (4) and (5) by Dubois and Prade and Comment (2) by Smets are variants of one and the same statement: default logic is non-numerical. They seem to consider this fact as a weakness; it is not. There are, in practice, situations where one knows what is "normal" without finding it natural to give a numerical estimate of what "normal" amounts to. In the Toronto/Vancouver example, for instance, as Smets says, "the only question that remains [in his approach] is to evaluate α and β"; the default approach spares us the burden of computing such an evaluation. However, as Dubois and Prade's Comment (4) points out, it is not always obvious whether a given situation is better modelled with or without numbers. Default logic postulates that "abnormal" situations are made explicit, and this might yield trouble when two abnormalities conflict with each other, as in the Republican/Quaker example. If one considers that one of them is "more" abnormal than the other (the situation considered in Dubois and Prade's Comment (1)) then one can declare it (see Section 3.4.2). Reiter, quoted in Section 3.1.2, presents defaults as "meta-rules": Dubois and Prade (Comment (5)) ask whether defaults might produce new defaults, i.e. whether the "meta-meta" level makes sense; we are not aware of any work in this direction, but it might be an interesting one. Comment (2) by Dubois and Prade emphasizes the fact that, as there is no degree of truth in this approach, every conclusion, be it "hard" or "default", gets the same status. If this poses problems, we mentioned (Section 4.2) that the system can be augmented with number-passing capabilities, in order to differentiate the "true" from the "plausible". Finally, we agree with Dubois and Prade's Comment (3): knowing how to correctly translate natural-language statements into default-logic formulae would certainly be a good thing, but we cannot tell how much effort this might require, or even whether it is at all feasible. Another interesting direction is to use default logic as a tool for linguistic investigations, and there is some evidence that it could be a very promising one, at least for several linguistic issues (e.g. coreferentiality, ambiguity, typicality).
Additional references
Farreny, H. and Prade, H. (1986). Default and inexact reasoning with possibility degrees. IEEE Trans. Syst. Man Cyber. 16, 270-276.
Jensen, K. (1981). Coloured Petri nets and the invariant method. Theor. Comp. Sci. 14, 317-336.
Peterson, J. L. (1982). Petri Net Theory and the Modeling of Systems. Prentice-Hall, Englewood Cliffs, N.J.
Zadeh, L. A. (1985). Syllogistic reasoning in fuzzy logic and its application to usuality and reasoning with dispositions. IEEE Trans. Syst. Man Cyber. 15, 754-763.
8

Probabilistic Logic

GERHARD PAASS
Gesellschaft für Mathematik und Datenverarbeitung, Sankt Augustin, Federal Republic of Germany
Abstract
In this chapter the degree of belief in propositions is expressed by numerical probabilities. Probability theory is employed to establish a consistent probability measure in agreement with available evidence. The lines of reasoning as well as the inherent assumptions of different evaluation methods are discussed. Finally the case of uncertain and partially contradictory probabilities is considered.
1 INTRODUCTION
Expert systems commonly employ some means of drawing inferences from domain and problem knowledge where both the knowledge and its implications are less than certain. There are many principles of how to evaluate such "weak" knowledge. The approach discussed in this chapter is based on logic as well as probability theory. Assume that there are two propositions A and B, both of which may be either true or false. It is the goal of mathematical logic to determine whether combined expressions, such as A ∧ B, are true or false. Now suppose that because of incomplete knowledge the expert does not know whether the propositions A and B are true or false but can specify the "probability" for the truth of A and the truth of B. Then it is the aim of probabilistic logic to evaluate the "probability" that expressions such as A ∧ B are true. In colloquial use "probability" is just another expression for "belief", "likelihood" or "chance". In probability theory and mathematical statistics, however, there is a precise definition of probability. By means of a probability distribution, the chances of several facts being jointly true or false may be characterized. The main assumption of probabilistic logic is that there exists a consistent probability distribution over the possible states in the domain of interest that represents the current knowledge of the decision-maker. It describes his information about the relative chances of facts, rules and consequences being jointly true or false. If all probabilities are equal to zero
or one, we again get the case of classical logic. Therefore probabilistic logic is an extension of classical logic. The concept of probabilistic logic may be applied to expert systems. A typical expert system consists of rules and facts that can be stated as logical propositions. They form the inference net on which the analysis of the decision-maker is based. Usually the validity of rules and facts is not known exactly. Then one or more domain experts have to specify a probability for their validity, which may be based on theory, experience or judgement. Because of limited information and the differing experience of the different experts, these assessments are error-prone, and the resulting probabilities may be inexact or uncertain themselves. If several experts supply uncertain probabilities then contradictory judgements may even occur. To resolve such conflicts the decision-maker, the user of the expert system, has to estimate the reliability of the domain experts' judgement. The aim of the analysis is the joint evaluation of the rules and facts in the inference net and the associated probability distribution, to arrive at the desired probabilities of some consequences. This process of reasoning is sketched in Fig. 1.
[Fig. 1. Reasoning with probabilistic logic in expert systems: experts 1, ..., n assign probabilities to the rules and facts of the knowledge base; the decision-maker assesses the reliability of the experts; evaluation by probabilistic logic then yields the probabilities of the consequences.]

In the first section some basic concepts of probabilistic logic are discussed. Starting with an example, an interpretative framework for subjective probabilities is given and it is shown how a probability measure on propositions can be constructed. The second section gives a formal framework for the evaluation of inference nets.
After a concise vector notation has been specified, the different types of information that may be available to the decision-maker (structural information, "data" from experts) are compiled, and principles for the evaluation of the inference net are formulated. The following two sections describe methods for the evaluation of inference nets in more detail. If the available probabilities of rules and facts are exact then the decision-maker can perform a "worst-case" analysis, which yields upper and lower bounds on the probabilities in question without making additional assumptions. If the decision-maker can state reasonable assumptions about the statistical "correlation" between different propositions (and hence about the structure of the probability distribution) then he can rule out implausible alternatives and arrive at much more informative results in a "restricted" analysis. The following section applies these concepts to the case in which the probabilities themselves are uncertain. The uncertainty is represented by "error models", which can be evaluated by statistical concepts such as the likelihood approach, the maximum-entropy principle or Bayesian statistics. In the last section of the chapter the main features of probabilistic logic are summarized and compared with similar techniques.
2 BASIC CONCEPTS OF PROBABILISTIC LOGIC
The aim of probabilistic logic is the definition and evaluation of a probability distribution over logical propositions. Comprehensive discussions of the topic are given by Cheeseman (1985), Spiegelhalter (1986a, b) and Nilsson (1986). The main intention of this chapter is to describe the basic concepts and assumptions inherent in probabilistic logic. They will be illustrated by the following small-scale example.
2.1 Example: rule-based diagnostic system
Suppose that a doctor has to decide whether or not a patient has a disease D. The relation between the two symptoms A and B and the disease D is specified in the form of the following rules F1, ..., F5, which hold with a certain probability:

F1 := "If A then D follows" holds with probability π1
F2 := "If ¬A then D follows" holds with probability π2
F3 := "If B then D follows" holds with probability π3
F4 := "If D then B follows" holds with probability π4
F5 := "If A then B follows" holds with probability π5
These probabilities πi reflect the subjective degree of belief of the doctor in the truth of the rules for a certain universe, for instance the people of a town. They are based on medical theory, experience or intuition. Assume in addition that it is known that symptom A has a certain probability with respect to this universe:

F6 := "A" holds with probability π6

The rules and facts F1, ..., F6 contain all available information about the probabilistic relation of A, B and D in the universe. The probability π1 for F1 := "If A then D follows" might be interpreted as the probability that the logical implication A ⇒ D holds. This implication is true if (A ∧ D) ∨ ¬A is valid. This proposition, however, has no intuitive meaning to the physician. For him it is more natural to consider only the situation where the antecedent A holds and assess the chance that D will be true. Therefore the probability associated with a rule is always defined as the conditional probability of the consequence given the antecedent. For F1 we have for example π1 = p(D | A) := p(A ∧ D)/p(A). The probabilistic relation between A and D is completely defined by the "joint" probabilities p(A ∧ D), p(A ∧ ¬D), p(¬A ∧ D) and p(¬A ∧ ¬D). These probabilities can always be expressed by the conditional probabilities with respect to A as well as D, e.g. p(A ∧ D) = p(D | A)p(A) = p(A | D)p(D). Therefore the specification of p(D | A) does not necessarily mean that D is caused by A, as the identical relation between A and D could be characterized using p(A | D). Assume that the probability of the diagnosis D is to be determined for a specific patient who exhibits the symptoms ¬A and B:

F7 := "Diagnose D" holds for the patient with probability π7

This probability is the conditional probability of the disease given the symptoms: π7 = p(D | ¬A ∧ B). The resulting inference net is shown in Fig. 2. Historically, probabilistic reasoning in expert systems has been centred around the Bayesian formula. To avoid computational difficulties, this formula was in practice employed together with a number of severe structural restrictions, especially the assumption of conditional independence, which are usually highly unrealistic and invalidate the results of the analysis. These restrictions are relaxed for the approaches presented in this chapter:
"symptoms" are not required to be conditionally independent given the disease"; "diagnoses" do not have to be exclusive; the inference net may contain cycles, i.e. multiple chains of reasoning for the same "facts";
(iv) no prior probabilities are required for diagnoses and intermediary facts.

[Fig. 2. Example from medical diagnostics: an inference net with nodes "symptom A", "symptom B" and "disease D", linked by the conditional probabilities p(B | A), p(D | A) and p(D | ¬A).]
Therefore the present approach seems to be much better suited for application in real expert systems. For a more comprehensive discussion see Section 6.1.
2.2 Interpretation of subjective probabilities
Suppose that B means "the patient has hypertension". Now assume that Fred Miller comes to the expert, a physician, who states after some investigation: p(B) = 0.3. What is the meaning of this assertion: "Fred Miller has hypertension with probability 0.3"? The usual concept of probability involves a long sequence of repetitions of a given situation. For example, saying that a fair coin has the probability 1/2 of coming up heads means that in a long series of independent flips of the coin heads will occur about half of the time. This frequency concept, however, is not adequate when dealing with the probability of a proposition like B. It is not possible to make identical "copies" of Fred Miller with identical life histories and count the relative frequency of Fred Millers with hypertension. The theory of subjective probability (Berger, 1980, pp. 61ff) has been created to enable one to talk about probabilities when the frequency viewpoint does not apply. The main idea is to let the probability of a proposition reflect the personal belief in the "chance" that the proposition is true. In this context a proposition can be characterized as a clear statement that is capable of being either true or false. For example, the expert may have a personal feeling as to the chance of B being about 0.3, even though no frequency probability can be assigned to the event. Such probability assessment is very common in everyday life, for instance when estimating the chance of rain for the next day. The calculation of a frequency probability is theoretically straightforward.
One simply determines the relative frequency of the event of interest. A subjective probability, however, can be determined only by introspection. There are several concepts available that illustrate the actual meaning of subjective probabilities. The simplest way of determining subjective probability values is to compare the probabilities of propositions. The expert, for example, can compare B with ¬B. If ¬B is felt to be twice as likely to occur as B then he would define p(B) = 1/3 and p(¬B) = 2/3. Betting situations are especially useful to consider because they tend to make the mind evaluate more carefully. To determine p(B) by this mechanism, imagine a situation in which the expert will receive 1 − d dollars if B occurs and d dollars if ¬B occurs, where 0 ≤ d ≤ 1. The idea is then to choose d until the expert is indifferent between the two possibilities. Assuming rational behaviour (and linear utility of money), this is the case if the expected gain is equal for both alternatives:

(1 − d) p(B) = d p(¬B)   (1)
where p(¬B) = 1 − p(B). Solving (1 − d)p(B) = d(1 − p(B)) gives p(B) = d. The practical difficulties with the elicitation of subjective probabilities are discussed by Kahneman et al. (1982). Assume that an expert has specified his personal degree of belief in the truth of the propositions under consideration. It is not at all clear that this degree of belief should constitute a probability measure p. It can, however, be shown that a probability measure will result if the specification of beliefs obeys a set of axioms reflecting rational behaviour. Such axioms, for example, contain postulates for the behaviour of a rational gambler, who is able to specify preferences between the "bets" described above. The attractiveness of these axioms, and hence of the laws of probability derived from them, arises from the fact that someone who violates them, using a different scalar measure of uncertainty, is liable to demonstrable loss and hence irrational. For discussions see Cheeseman (1985), Genest and Zidek (1986), Horvitz et al. (1986), Good (1982) and Fishburn (1986).
2.3 Probability measures on propositions
Assume that a single rational expert provided consistent probabilities π1, ..., πnF for the facts and rules F1, ..., FnF of the inference net. For an evaluation of the desired probabilities of diseases it is necessary to integrate all these probabilities into a joint probability measure, which is assumed to exist. In this section the construction of such a probability measure is discussed. Consider rule F1 in the example, for which the conditional probability π1 = p(D | A) := p(D ∧ A)/p(A) is known to the expert. To express the conditional probabilities, we have to determine the probabilities of A and A ∧ D.
All such propositions that occur in a probability (or conditional probability) specified for the inference net (including probabilities to be determined as a result of the analysis) form the set 𝒰 := {U1, ..., Unu} of relevant propositions. In the example consisting of F1, ..., F7 the set 𝒰 comprises the following nu = 10 propositions:

U1 := A,  U2 := A ∧ D,  U3 := ¬A,  U4 := ¬A ∧ D,
U5 := B,  U6 := B ∧ D,  U7 := D,  U8 := A ∧ B,
U9 := ¬A ∧ B,  U10 := D ∧ ¬A ∧ B

To define a probability distribution for the propositions in 𝒰, we have to construct the smallest set 𝒲 = {W1, ..., Wnw} of "elementary" propositions that fulfil the following conditions:
(i) each Ui is the disjunction of some of the Wj: Ui = ⋁_{j∈J(i)} Wj;
(ii) the Wj are exclusive: Wj1 ∧ Wj2 is false for j1 ≠ j2;
(iii) the Wj are exhaustive: W1 ∨ ... ∨ Wnw is true.
As the Wj are exclusive and exhaustive, they represent the spectrum of all possible situations that are of interest for the decision problem. Therefore each Wj is called a possible world with respect to the problem of interest. The set ℱ that can be formed from disjunctions, conjunctions and negations of the elements of 𝒲 is called the Boolean algebra of propositions in 𝒲. ℱ contains the Ui as well as T, the proposition that is always true, and F, the proposition that is always false. Using the above properties, the set of possible worlds can be constructed by the generation of conjunctions u1 ∧ ... ∧ unu with ui = Ui or ui = ¬Ui. If 𝒰 contains only propositions from propositional calculus then it suffices to consider the conjunctions of the atomic propositions (A, B and D in our example) and their negations. A probability measure p(·) is a numerically valued set function p: ℱ → [0, 1] satisfying the following axioms for all A, B ∈ ℱ:
p(A) If A " B
=F
~
0,
then
0
0
(2)
p(T) = I
p(A v B) = p(A)
+ p(B)
(3)
The sum of the probabilities p(Wj) is equal to 1 because the Wj are exclusive and exhaustive. The probability p(Ui) is just the sum of the probabilities of the possible worlds that constitute Ui:
p(Ui) = Σ_{j∈J(i)} p(Wj),   where Ui = ⋁_{j∈J(i)} Wj   (4)
Conditions under which sample spaces can be generated from infinite sets are
explored in extension theorems (Fishburn, 1986). Sample spaces can also be constructed if we have sentences, i.e. closed well-formed formulae in some logical language L, for instance first-order logic (Nilsson, 1986; Grosof, 1986). Here it is required that the consistency of the finite set 𝒰 = {U1, ..., Unu} of sentences of interest can be established (which is not always possible in first-order logic). The probability measure is defined over the set 𝒲 of equivalence classes of interpretations for the sentences in 𝒰, i.e. of consistent conjunctions
u1 ∧ u2 ∧ ... ∧ unu   (5)

where each uj may take the value Uj or ¬Uj, and consistency is defined with respect to L. Grosof (1986) points out that 𝒲 is isomorphic to the notion of the power set of a "frame of discernment" in Shafer-Dempster theory. With respect to the evaluation of probabilities in an inference net, there is no difference between propositions from propositional calculus and sentences from first-order logic after the set of possible worlds 𝒲 has been constructed.
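To make the construction concrete, here is a small sketch, not from the chapter and with world probabilities invented for illustration, that enumerates the eight possible worlds generated by the atomic propositions A, B and D and evaluates p(Ui) according to (4):

from itertools import product

WORLDS = list(product([False, True], repeat=3))          # truth values of A, B, D
p_w = [0.10, 0.05, 0.15, 0.10, 0.20, 0.10, 0.15, 0.15]   # invented; sums to one

def p(prop):
    """p(U_i) = sum of p(W_j) over the worlds W_j that constitute U_i, eq. (4)."""
    return sum(w for (world, w) in zip(WORLDS, p_w) if prop(*world))

U1 = lambda a, b, d: a            # U1 := A
U2 = lambda a, b, d: a and d      # U2 := A ∧ D
pi1 = p(U2) / p(U1)               # pi_1 = p(D | A) = p(A ∧ D)/p(A)
print(p(U1), p(U2), pi1)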
3 FRAMEWORK FOR THE EVALUATION OF INFERENCE NETS

3.1 Vector notation
For simplicity, we assume that the set 𝒰 of relevant propositions contains the certain proposition T, as p(T) = 1 is always known. Because a non-conditional probability p(A) is equal to the conditional probability p(A | T), all specified probabilities πi, i = 1, ..., nF, can be considered as conditional probabilities. For each πi = p(Ai1 | Ai2) = p(Ai1 ∧ Ai2)/p(Ai2) the set 𝒰 contains the propositions Ui+ := Ai1 ∧ Ai2 as well as Ui− := Ai2. The linear equations p(Ui+) = Σ_{j∈J+(i)} p(Wj) and p(Ui−) = Σ_{j∈J−(i)} p(Wj) suggest the representation of p(Ui+) and p(Ui−) in terms of the p(Wj). We define a (0-1)-matrix R+ := (r+ij), where the ith row ri+ contains the representation of p(Ui+) in terms of the p(Wj), i.e. r+ij = 1 iff j ∈ J+(i). Then we have p(Ui+) = ri+'pw, where pw := (p(W1), ..., p(Wnw))' is the vector of probabilities of the possible worlds. In a similar way, we can define R− := (r−ij), where r−ij = 1 iff j ∈ J−(i). This yields p(Ui−) = ri−'pw, and therefore
πi = ri+'pw / ri−'pw   (6)
If we denote the elementwise division of vectors by ÷, we get a relation for the vector π := (π1, ..., πnF)' of probabilities supplied by the experts:

π = (R+pw) ÷ (R−pw)   (7)
This equation is the basis of most of the procedures discussed later. It describes
which probabilities pw of possible worlds are compatible with the numbers πi supplied by the experts. As an additional restriction on pw, we have the requirement that the probabilities must sum to one and are not negative:

1'pw = 1,   pw ≥ 0,   with 1 := (1, ..., 1)'   (8)
The probability measures for which the relations (7) and (8) hold form the set 𝒫0 of feasible probability measures. If π contains conditional probabilities, the relation (7) between pw and π is nonlinear. If, however, the value of π is known exactly, (7) can be transformed into a linear restriction on pw (cf. Grosof, 1986). From (6), we have 0 = (πi ri−' − ri+')pw =: ci'pw. If the ith row of a matrix Cπ is defined by ci' then we arrive at

Cπ pw = 0   (9)
In the special case that there are no "genuine" conditional probabilities, R−pw consists of a vector of ones, and the nonlinear relation (7) reduces to a linear one: π = R+pw. This case was discussed by Nilsson (1986, p. 74). For use in later sections, let us introduce random variables associated with propositions. As shown above, there are atomic propositions X1, ..., Xnx whose conjunctions are the possible worlds Wj. (In the example we had X1 := A, X2 := B and X3 := D.) For each Xi we define a binary random variable xi that can take the values Xi and ¬Xi. Then marginal and conditional probability measures can be specified in a simple way: p(xi, xj), for instance, denotes the marginal probability measure given by p(Xi ∧ Xj), p(Xi ∧ ¬Xj), p(¬Xi ∧ Xj) and p(¬Xi ∧ ¬Xj), while p(xi | xj) symbolizes the corresponding conditional probability measures. The joint distribution is denoted by p(x). Usually it is the aim of the decision-maker to determine the probability of some proposition U* = ⋁_{j∈J*} Wj, the desired "diagnosis". Let g* be a (0-1)-vector with g*j = 1 iff j ∈ J*. Then we have
p(U*) = g*'pw   (10)
In principle, p(U*) can be estimated by first determining some "optimal" pw from the given π using (7) and (8), and then calculating p(U*) from (10). Before the probabilities for an inference net are specified, it is necessary to fix the relevant universe that is to be described by the joint distribution p(x). With respect to evaluation, the decision-maker is usually interested in the probability of the diagnosis U* only for that part of the distribution p(x) where the specific available evidence, the symptoms of the "patient", holds. If these symptoms B1, ..., Bl are known with certainty then the desired probability of U* for the patient is given by p(U* | B1 ∧ ... ∧ Bl). If only the probability of the symptoms is known then an auxiliary proposition E may
be introduced that is true iff the specific evidence holds. The information about the probabilities of the symptoms may then be introduced into the inference net by rules according to the conditional probabilities p(Bi | E). The desired probability of the diagnosis will then be p(U* | E) (for a similar approach cf. Spiegelhalter, 1986b). This avoids the usual approach (Nilsson, 1986), where all 2^l terms of the factorization
p(U* | E) = Σ_{b1=B1,¬B1; ...; bl=Bl,¬Bl} p(U* | b1, ..., bl) p(b1, ..., bl | E)   (11)
have to be determined. Even for a moderate number l of symptoms, this approach is no longer practical because of the sheer number of combinations. Of course, as our approach involves less information than (11), it leads to a unique solution only if appropriate additional structural restrictions are imposed.
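As an illustration of this vector notation, the following sketch (again with invented world probabilities; the encoding is ours, not prescribed by the chapter) builds the rows of R+ and R− for the rules of the medical example and evaluates the elementwise relation (7):

import numpy as np
from itertools import product

WORLDS = list(product([False, True], repeat=3))   # (a, b, d)
p_w = np.array([0.10, 0.05, 0.15, 0.10, 0.20, 0.10, 0.15, 0.15])

def row(prop):
    """0-1 row r_i with r_ij = 1 iff world j belongs to the proposition."""
    return np.array([1.0 if prop(a, b, d) else 0.0 for a, b, d in WORLDS])

# one row pair per rule pi_i = p(A_i1 | A_i2): U_i+ = A_i1 ∧ A_i2, U_i- = A_i2
rules = [
    (lambda a, b, d: d and a,      lambda a, b, d: a),      # F1: p(D | A)
    (lambda a, b, d: d and not a,  lambda a, b, d: not a),  # F2: p(D | ~A)
    (lambda a, b, d: d and b,      lambda a, b, d: b),      # F3: p(D | B)
    (lambda a, b, d: b and d,      lambda a, b, d: d),      # F4: p(B | D)
    (lambda a, b, d: b and a,      lambda a, b, d: a),      # F5: p(B | A)
    (lambda a, b, d: a,            lambda a, b, d: True),   # F6: p(A) = p(A | T)
]
R_plus = np.vstack([row(u) for u, _ in rules])
R_minus = np.vstack([row(v) for _, v in rules])
pi = (R_plus @ p_w) / (R_minus @ p_w)   # elementwise form of relation (7)
print(pi)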
3.2 Available information and evaluation strategies
Suppose that one or more domain experts have supplied a vector of probabilities π = (π1, ..., πnF) for the rules and facts F1, ..., FnF. Usually the dimension nF of π will be smaller than the dimension nw of pw. Then, if (7) and (8) are consistent and there is a solution pw for them, it will not be unique, and the set 𝒫0 of feasible probability measures contains many elements. As it is impossible to distinguish between these solutions using the information contained in π, the vector pw of probabilities is not identifiable in this case. Often, however, the decision-maker has additional information that can be used for the evaluation of the inference net. One type of information concerns the structure of the probability measure.
(i) From theoretical or heuristic arguments, the decision-maker often knows that the true distribution is a member of a smaller, restricted class of distributions. This is the case, for instance, if the experts are independent and use different sources of information. The probability measures in such a class can be described by a smaller vector θ of parameters, from which the whole vector pw of probabilities can be recovered by the "parametrization", a function pw(θ) of θ.
(ii) Alternatively, the decision-maker can consider the vector pw of unknown probabilities itself as a random quantity with distribution Pr(pw). Pr(pw) describes the prior knowledge of the decision-maker about pw before the information in π supplied by the experts is known to him, and allows a very flexible specification of structural knowledge about the probability distribution.
In addition, the decision-maker needs some information on the precision of the probabilities πi supplied by the domain experts. Two cases can be distinguished. First, the πi may be known exactly from definitional or theoretical reasoning. Secondly, the πi may be uncertain and erroneous to some extent. These errors may be a consequence of the limited information of the experts about the subject, and do not imply that the experts violate the axioms of rational decision. To describe the extent of such errors, the decision-maker has to assess the reliability of the experts from his own subjective view. This assessment can be formalized by an error model that gives the stochastic relation between the true probability and the numbers π̂i supplied by the experts. Suppose that the decision-maker wants to estimate the probability p(U*) of some proposition U* from a given π̂. Depending on his information on the structure of the distribution p(x) and on the errors, there are several strategies for the evaluation of the inference net. Let us first consider the case where the probabilities πi supplied by the experts are exact. If the decision-maker has no additional information about the internal structure of p(x) then it might be sufficient for his purposes to determine the lowest value p(U*)low and the highest value p(U*)high that are compatible with the information in π. Then the probability of U* has to lie in the interval [p(U*)low, p(U*)high]. This approach is called worst-case analysis. It is closely related to the minimax principle of statistical decision theory, where the decision-maker tries to limit the maximal loss that could result from a decision. Assume that the decision-maker has some knowledge about the structure of p(x) and knows that p(x) is contained in a restricted class 𝒫1 of distributions. In a restricted analysis he can exploit this knowledge to determine the set of solutions 𝒫2 := 𝒫0 ∩ 𝒫1 from that class for which (9) and (8) hold. If the restrictions are defined in an appropriate way and their number is large enough then multiple solutions may be eliminated, and 𝒫2 contains a unique element, the "solution". If the supplied probabilities are uncertain because of the limited information of the experts then we assume that the decision-maker can formulate his subjective knowledge about this uncertainty as an error model. This specifies the random relation between the "true" probabilities π that would be specified by completely informed experts and the uncertain "data" π̂. Again, structural restrictions may be taken into account by a parametrization pw(θ). This is the usual set-up of statistics, where inferences about parameters θ are drawn from "observed" data. Consequently, the evaluation of the inference net can be done according to the different approaches of statistical analysis:
(i) classical statistical analysis relies on the frequency interpretation of probability and employs point estimation, hypothesis testing and confidence methods to arrive at statistical conclusions;
(ii) Bayesian statistical analysis starts with a prior distribution Pr(θ) on the parameters; the information contained in Pr(θ) as well as in the "data" π̂ is summarized by a posterior distribution Pr(θ | π̂).
The characteristics of the different types of analysis are summarized in Table 1. In the next sections these types of analysis are discussed in more detail.
Table 1 Different types of analysis.

Information available to the decision-maker:

on the structure of the               on the precision of
probability distribution              supplied probabilities              Type of analysis
Missing                               π is exact                          Worst-case analysis
Restricted class of distributions     π is exact                          Restricted analysis
Restricted class of distributions     π is uncertain; error model for π   Classical statistical analysis
Prior distribution on probabilities   π is uncertain; error model for π   Bayesian analysis

4 EXACT KNOWLEDGE ABOUT PROBABILITIES

4.1 Worst-case analysis
Assume that π is known exactly and that for some U* the decision-maker wants to estimate the smallest and the largest values of p(U*) compatible with π. From (10), we have p(U*) = g*'pw, and because of (8) and (9) the decision-maker can determine the highest value p(U*)high consistent with π by solving the following linear-programming problem:

determine p(U*)high := max over pw of g*'pw
subject to the restrictions Cπ pw = 0, pw ≥ 0, 1'pw = 1
For the solution of this problem, efficient algorithms from operations research are available. In the same way, p(U*)low can be calculated by maximizing −g*'pw. The resulting bounds on p(U*) are correct without additional assumptions about the structure of the probability distribution. If
enough restrictions are available then the resulting interval [p(U*)low, p(U*)high] for p(U*) will be sufficiently narrow. In general, the specification of an exact probability, e.g. πi = 0.73184675..., is impossible in reality for an expert, as infinitely fine probability comparisons are then needed. It is realistic, however, that an expert knows with certainty that the true πi is contained in the interval [πi,low, πi,high] = [0.70, 0.75]. This yields exact upper and lower bounds πlow ≤ π ≤ πhigh. Because of (6), we get πi,low ≤ ri+'pw / ri−'pw ≤ πi,high. This leads to the inequality Clow pw ≤ 0, where the matrix Clow is defined according to (9). In a similar way, we get Chigh pw ≥ 0. Then the determination of p(U*)high with p(U*) = g*'pw amounts to the following linear-programming problem:

determine p(U*)high := max over pw of g*'pw
subject to the restrictions Clow pw ≤ 0, Chigh pw ≥ 0, pw ≥ 0, 1'pw = 1
The attractive feature of this approach is that it involves information combination and processing only via probabilistic means, while explicitly recognizing the limited precision of probability elicitation. For a discussion see Fishburn (1986, pp. 340, 346, 351). The linear-programming approach gives a solution that exploits all available information in an optimal way. The analysis is in terms of the bounds on the joint vector pw in ℝ^nw instead of bounds on single p(Wj). As the pw that meet the restrictions form an nw-dimensional simplex, whose boundaries in general are not parallel to the coordinates, it contains more information and leads to smaller intervals for p(U*) than a lower-dimensional approach. This concept is a generalization of the convex Bayesian approach (Thompson, 1985; Kyburg, 1987), where probability intervals are propagated using the Bayesian formula. In the area of statistics, Smith (1961) has discussed a theory of interval-valued upper and lower probabilities. Other features of the distribution can also be characterized by restrictions. If, for instance, A and B are known to be independent then this yields p(A ∧ B) = p(A)p(B) (cf. Grosof, 1986). Such restrictions, however, are nonlinear, and instead of efficient linear-programming algorithms, general programs for constrained optimization have to be employed. Another type of nonlinearity is induced by the evaluation of conditional probabilities, e.g. p(D | A ∧ B) := p(D ∧ A ∧ B)/p(A ∧ B). For both the numerator and the denominator, probability bounds may be determined separately, yielding a coarse bound on the ratio. This bound, however, may be refined arbitrarily by partitioning the denominator range into a series of small intervals and determining intervals for the numerator given the denominator intervals. This technique is demonstrated in the following example.
Example. Assume that the following probability bounds are known for our example:

0.0 ≤ p(D | A) ≤ 0.4,   0.7 ≤ p(D | ¬A) ≤ 0.9
0.6 ≤ p(B | D) ≤ 0.8,   0.3 ≤ p(B | ¬D) ≤ 0.5
0.0 ≤ p(B | A) ≤ 0.2,   0.3 ≤ p(A) ≤ 0.7

We have to determine an interval [v0, v1] for p(D | ¬A ∧ B) = p(D ∧ ¬A ∧ B)/p(¬A ∧ B). First we can determine ranges for the denominator and the numerator by the linear-programming method:

0.13 ≤ p(D ∧ ¬A ∧ B) ≤ 0.60,   0.23 ≤ p(¬A ∧ B) ≤ 0.69
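These two ranges can be reproduced mechanically. The sketch below is one possible encoding (using scipy.optimize.linprog as the LP solver; the encoding itself is ours, not the chapter's): each interval bound lo ≤ p(X | Y) ≤ hi becomes the two linear inequalities p(X ∧ Y) − hi·p(Y) ≤ 0 and lo·p(Y) − p(X ∧ Y) ≤ 0 over the eight world probabilities.

import itertools
import numpy as np
from scipy.optimize import linprog

WORLDS = list(itertools.product([False, True], repeat=3))   # (a, b, d)

def indicator(event):
    """0-1 row vector over the eight worlds for a predicate on (a, b, d)."""
    return np.array([1.0 if event(a, b, d) else 0.0 for a, b, d in WORLDS])

rows, rhs = [], []
def bound_cond(x_and_y, y, lo, hi):
    num, den = indicator(x_and_y), indicator(y)
    rows.append(num - hi * den)
    rhs.append(0.0)
    rows.append(lo * den - num)
    rhs.append(0.0)

bound_cond(lambda a, b, d: d and a,     lambda a, b, d: a,     0.0, 0.4)
bound_cond(lambda a, b, d: d and not a, lambda a, b, d: not a, 0.7, 0.9)
bound_cond(lambda a, b, d: b and d,     lambda a, b, d: d,     0.6, 0.8)
bound_cond(lambda a, b, d: b and not d, lambda a, b, d: not d, 0.3, 0.5)
bound_cond(lambda a, b, d: b and a,     lambda a, b, d: a,     0.0, 0.2)
bound_cond(lambda a, b, d: a,           lambda a, b, d: True,  0.3, 0.7)

A_ub, b_ub = np.array(rows), np.array(rhs)
A_eq, b_eq = np.ones((1, 8)), np.array([1.0])   # probabilities sum to one

def prob_range(event):
    """Smallest and largest p(event) over all feasible p_w."""
    c = indicator(event)
    lo = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq).fun
    hi = -linprog(-c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq).fun
    return lo, hi

print(prob_range(lambda a, b, d: d and not a and b))   # text reports [0.13, 0.60]
print(prob_range(lambda a, b, d: not a and b))         # text reports [0.23, 0.69]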
As a lower bound for v0 we get 0.13/0.69 = 0.19, and as an upper bound 0.13/0.23 = 0.56. For v1 we get the bounds 0.60/0.69 = 0.87 and 0.60/0.13, which exceeds 1, so the bound is 1. To get tighter bounds, the interval [0.23, 0.69] of the denominator can be partitioned into k equally spaced subintervals [ti, ti+1], i = 1, ..., k. Subsequently, k different LP problems can be solved, with the additional restriction ti ≤ p(¬A ∧ B) ≤ ti+1 in the ith problem. From each solution, bounds for v0 and v1 can be determined. The maximum of all upper bounds and the minimum of all lower bounds give a globally valid range for v0 and v1 respectively. For k = 10 we get 0.477 ≤ v0 ≤ 0.544 and 0.998 ≤ v1 ≤ 1.0, while k = 100 yields 0.531 ≤ v0 ≤ 0.538 and 0.999 ≤ v1 ≤ 1.0. Hence we arrive at p(D | ¬A ∧ B) ∈ [0.531, 1.000]. In this way we can get bounds for the solution of a nonlinear problem by linear methods. Although linear-programming problems may be solved for quite large systems, the computational effort increases roughly as a cubic function of the number of constraints. Therefore methods to reduce the computational burden are desirable. An obvious alternative is to specify the linear-programming problem in terms of marginal probabilities instead of the complete vector pw. This means that for each rule like p(Xi | Xj) = πk the corresponding restriction p(Xi ∧ Xj) = πk p(Xj) is formulated in terms of the marginal distribution p(xi, xj) of the two variables xi and xj. Additional restrictions are needed to ensure the compatibility of the marginal probabilities for the different subsets of variables. If, for example, the marginal distributions p(xi, xj) and p(xi, xk) are used then p(xi) has to be identical for both marginals. As an additional simplification, the calculations for each of the marginal distributions could be performed separately, using the results to narrow the range for the probability of each proposition in a stepwise manner. The INFERNO approach (Quinlan, 1983) works in this way by providing new bounds for propositions belonging to a single rule. Assume, for instance, that
Although linear-programming problems may be solved for quite large systems, the computational effort increases roughly as a cubic function of the number of constraints. Therefore methods to reduce the computational burden are desirable. An obvious alternative is to specify the linear-programming problem in terms of marginal probabilities instead of the complete vector p_W. This means that for each rule like p(X_i | X_j) = π_k the corresponding restriction p(X_i ∧ X_j) = π_k p(X_j) is formulated in terms of the marginal distribution p(x_i, x_j) of the two variables X_i and X_j. Additional restrictions are needed to ensure the compatibility of the marginal probabilities for the different subsets of variables. If, for example, the marginal distributions p(x_i, x_j) and p(x_i, x_k) are used then p(x_i) has to be identical for both marginals. As an additional simplification, the calculations for each of the marginal distributions could be performed separately, using the results to narrow the range for the probability of each proposition in a stepwise manner. The INFERNO approach (Quinlan, 1983) works in this way by providing new bounds for propositions belonging to a single rule.
Assume, for instance, that p(D | A) ≥ α_1 and p(A) ≥ α_2. Then the relation p(D) ≥ p(D ∧ A) = p(D | A)p(A) yields p(D) ≥ α_2 α_1 =: α_3. By this and similar arguments, the bounds on probabilities may successively be tightened. The resulting intervals are still valid, but in general they are not as narrow as the optimal ones.
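Such a propagation step is easy to mechanize locally. The following fragment is a minimal sketch (the function is hypothetical, not Quinlan's implementation):

```python
# INFERNO-style propagation step: lower bounds on p(D | A) and p(A) give a
# lower bound on p(D), since p(D) >= p(D & A) = p(D | A) * p(A).
def tighten_lower_d(lb_d_given_a, lb_a, current_lb_d=0.0):
    alpha3 = lb_d_given_a * lb_a          # the bound alpha_3 derived above
    return max(current_lb_d, alpha3)      # keep whichever bound is tighter

print(tighten_lower_d(0.7, 0.3))          # p(D) >= 0.21
```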
4.2 Restricted analysis
Worst-case analysis often takes into account situations that are highly unlikely. This sometimes leads to very conservative bounds for the desired probabilities. One way to rule out such improbable situations is the imposition of structural restrictions on p(x). To define such a restricted class of distributions, two different strategies may be used.
(i) A new parametrization is chosen in such a way that some of the new parameters can be restricted to a fixed value (e.g. zero) by theoretical or heuristic arguments, defining a restricted class of distributions. To arrive at a solution, only the remaining smaller vector θ of parameters has to be determined, from which the whole vector p_W of probabilities can be recovered by the "parametrization", a function p_W(θ). If the number of restrictions is large enough then under certain conditions a unique solution θ̂ may exist.
(ii) One probability distribution from the set of feasible solutions is selected by some reasonable criterion. It is reasonable, for example, to select the probability measure with the smallest "information" content, as it imposes minimal additional assumptions. The resulting set of solutions again forms a restricted class of distributions.
Let us first discuss a convenient parametrization and appropriate solution methods. The distribution p(x) concerns binary variables x_1, ..., x_{n_x}, and without loss of generality can be assumed to be multinomial with probabilities p_W. A frequently used parametrization of p(x) is closely related to a standard measure of the association between two binary random variables X_i and X_j, the logarithm of the cross-product ratio:

α(x_i, x_j) = log { p(¬X_i ∧ ¬X_j) p(X_i ∧ X_j) / [ p(¬X_i ∧ X_j) p(X_i ∧ ¬X_j) ] }    (12)

This has values between −∞ and ∞. Values > 0 indicate a positive association (if x_i = X_i then x_j tends to be X_j, and vice versa), while values < 0 indicate a negative association (if x_i = X_i then x_j tends to be ¬X_j, and vice versa). α(x_i, x_j) takes the value 0 if the variables are independent, i.e. p(x_i, x_j) = p(x_i)p(x_j).
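For concreteness, formula (12) can be transcribed directly; the following sketch takes the four cell probabilities of the joint distribution (encoding X_i = true as 1) and checks that independence gives α = 0 (the values are invented for illustration):

```python
# Direct transcription of formula (12); cells are indexed by (x_i, x_j).
import math

def assoc2(p):
    """Log cross-product ratio alpha(x_i, x_j) from four cell probabilities."""
    return math.log((p[(0, 0)] * p[(1, 1)]) / (p[(0, 1)] * p[(1, 0)]))

# Independent variables (p(X_i) = 0.4, p(X_j) = 0.6) give alpha = 0:
p_ind = {(0, 0): 0.24, (0, 1): 0.36, (1, 0): 0.16, (1, 1): 0.24}
print(assoc2(p_ind))    # 0.0 up to rounding
```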
In the case of three variables, we could consider the conditional distributions p(x;, xi I xk = -, Xd and p(x;, xi I xk = Xk) and determine the two-dimensional association between X; and xi for these conditional distributions. a(x;, xi, xk), defined as the difference of these "conditional" associations, is independent of the variable used for conditioning. If a(x;, xi, xk) =f. 0 then the two-dimensional associations of the conditional distributions p(x;, xi I xk = -, Xk) and p(x;, xi I xk = Xk) are different. Consequently, there is a characteristic of the distribution that cannot be""explained" in terms of the two-dimensional margins. Hence a(x;, xi, xd is a sort of "higher-order" interaction between X;, xi and xk. This way to construct measures for higher order associations can be extended to any number of variables (Fienberg, 1980, pp. 27ff). They have the attractive feature that their value is not changed if only a lower-order marginal is modified. This means that a higher-order interaction is not affected if only information about lower-dimensional marginals is taken into account. Now consider the case where no information jointly concerning X;, xi and xk is available. It is then plausible to assume a(x;, xi, xk) = 0 as there is no reason to suppose p(x;, xi I xk = Xd =f. p(x;, xi I xk = -, Xk). The distribution p(x) can be reparametrized in terms of the interactions or functions thereof, for instance as log-linear models (Bishop et al., 197 5 ). Higher-order interactions (and the corresponding parameters) are set equal to zero if no data for the determination of these interactions are available and there are no theoretical reasons indicating the presence of these interactions. In this way, the class of probability distributions may be restricted such that to each vector 1t of consistent values for the marginal probabilities there exists an unique solution Pw· Using this approach, the decision-maker may take into account all higher-order interactions between variables that he thinks are relevant and for which he has some information .
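The three-way measure can be sketched in the same style, as the difference of the conditional two-way associations given x_k; since the cross-product ratio is invariant under rescaling, the conditional distributions need not be normalized. A fully independent distribution (invented here for illustration) gives the value 0:

```python
# Three-way interaction as the difference of the two "conditional" two-way
# associations; the joint distribution is a dict over {0, 1}^3.
import math

def cond_assoc2(p, k_val):
    q = {(i, j): p[(i, j, k_val)] for i in (0, 1) for j in (0, 1)}
    return math.log((q[(0, 0)] * q[(1, 1)]) / (q[(0, 1)] * q[(1, 0)]))

def assoc3(p):
    return cond_assoc2(p, 1) - cond_assoc2(p, 0)

# A fully independent distribution has no three-way interaction:
p = {(i, j, k): (0.3 if i else 0.7) * 0.5 * (0.6 if k else 0.4)
     for i in (0, 1) for j in (0, 1) for k in (0, 1)}
print(assoc3(p))    # 0.0
```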
Fig. 3  Inference net with cycle.    Fig. 4  Inference net without cycles.
The procedure for obtaining a solution p_W depends on the structure of the inference net. Let us first consider the small example shown in Fig. 3, where only the marginal distribution p(x_1, x_2) and the conditional distributions p(x_3 | x_1) and p(x_3 | x_2) are known to the decision-maker, who wants to estimate the joint distribution p(x_1, x_2, x_3). Starting with the marginal distribution p(x_1, x_2), there are two possible "lines of reasoning" to estimate p(x_1, x_2, x_3):
(i) ignore p(x_3 | x_2) and use p(x_3 | x_1):   p(x_1, x_2, x_3) = p(x_3 | x_1)p(x_1, x_2);
(ii) ignore p(x_3 | x_1) and use p(x_3 | x_2):   p(x_1, x_2, x_3) = p(x_3 | x_2)p(x_1, x_2).
Obviously a compromise between the two lines of reasoning has to be found. If such cycles are present in an inference net then it can be shown that a solution for the joint distribution can only be obtained by iterative procedures. This corresponds to the result of Dubois and Prade (see Chapter 10) that a logic of uncertainty is generally not truth-functional. Haber and Brown (1986) present algorithms for the general case, while Pearl (1986) discusses simplified procedures for distributions with special interaction patterns. If there are no cycles, as in Fig. 4, then p_W can be determined by a noniterative formula, which, however, may be complicated (Bishop et al., 1975, pp. 74ff). In the case of one "disease" x_1 and "symptoms" x_2, ..., x_k that are conditionally independent with respect to x_1 (i.e. p(x_i, x_j | x_1) = p(x_i | x_1)p(x_j | x_1) for i > j > 1), this formula reduces to the famous Bayesian formula employed in many expert systems.

The second way to restrict the class of distributions is the selection of a distribution from the set of feasible distributions by some criterion. Konolidge (1982) proposed the selection of the distribution whose entropy H(p_W) := −p_W′ log p_W is maximized subject to the conditions (8) and (9). As maximum entropy corresponds to minimum (statistical) information in the probability distribution, this approach is reasonable. The maximum-entropy approach and similar criteria are discussed by Hunter (1986), Shore (1986), Gokhale and Kullback (1978), Diaconis and Zabell (1982) and Dalkey (1986). The notion of "minimum information" is, however, a bit misleading, as it suggests that we get something (a unique solution) for nothing. In most cases it leads to the same estimates as the assumption of a log-linear model in which the higher-order associations for which no data are available have been set to zero. Consequently,
(i) the selection of a distribution with "minimum information" implies similar or identical restrictions as if higher-order interactions were set to zero;
(ii) the restrictions used in the log-linear model are in this respect the least demanding;
(iii) if the decision-maker does not know whether particular interactions are zero and no information about them is available from the experts, then the utilization of the "minimum-information" approach or the restriction of those interactions to zero may be misleading.
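To make the selection criterion concrete, the following sketch picks the maximum-entropy distribution with scipy.optimize.minimize; the single restriction p(A) = 0.3 is invented for illustration and merely stands in for the actual conditions (8) and (9):

```python
# Selecting the feasible distribution with maximum entropy.
import numpy as np
from scipy.optimize import minimize, LinearConstraint

worlds = [(0, 0), (0, 1), (1, 0), (1, 1)]        # worlds over (A, B)

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(np.sum(p * np.log(p)))

A = np.vstack([np.ones(4),                                 # sum to 1
               [1.0 if a else 0.0 for a, b in worlds]])    # p(A)
constraint = LinearConstraint(A, [1.0, 0.3], [1.0, 0.3])

res = minimize(neg_entropy, x0=np.full(4, 0.25),
               bounds=[(0, 1)] * 4, constraints=[constraint])
print(res.x)    # mass spread as evenly as the restriction allows:
                # approximately [0.35, 0.35, 0.15, 0.15]
```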
5 UNCERTAIN KNOWLEDGE ABOUT PROBABILITIES
Assume that a number of experts have supplied probabilities π̂_i expressing their subjective probability concerning the facts or rules of interest to the decision-maker. As the experts have different subjective probability distributions, according to their different limited states of information and personal experience, their judgements are uncertain and may be erroneous and conflicting to some extent. This does not mean that the experts violate the laws of rational decision-making; the differences may simply arise from their different expertise and knowledge about the topic in question. The consequences of inconsistent probabilities are demonstrated in the following example:

p(B | A) = 0.1,    p(A) = 0.9,    p(B) = 0.9
Assume that these specifications hold with certainty. Then obviously p(¬B | A) = 0.9 holds. We have p(¬B) ≥ p(¬B ∧ A) = p(¬B | A)p(A) = 0.81, which contradicts p(B) = 0.9 (i.e. p(¬B) = 0.1). Hence specifications from different experts may lead to contradictions even if facts and rules are only assumed to hold with some probability. If an estimate of p(U*) depends too much on highly uncertain probabilities then it will be unreliable itself. Therefore, during the evaluation of the inference net, the extent of possible errors in the π̂_j should be taken into account. For the decision-maker the probabilities π̂_i are some sort of data. We assume that from his subjective knowledge about the experts and their state of information he can specify the relative precision of the π̂_i. In the next section it is proposed to formalize this assessment by an "error model".
5.1 Error models
The value π̂_i provided by an expert can be considered as a fixed but unknown uncertain quantity whose distribution is completely specified by the conditional distribution p(π̂_i | π_i) defining the error model. The "true" probability π_i can be considered as the subjective probability estimate that would be supplied by a rational expert with complete information about all aspects of the problem. With respect to statistical analysis, each π̂_i can be considered as a statistic of a hypothetical sample (Paass, 1986) containing one or more elements and generated according to p(π̂_i | π_i). Hamburger (1986) shows that this representation of uncertainty satisfies a list of general desiderata. We assume that the decision-maker is able to assess the precision of the vector π̂ of values supplied by the experts and knows the corresponding p(π̂ | π) or some of its parameters. In addition, we suppose that he can restrict the class of distributions according to theoretical reasons, which yields a parametrization p_W(θ). Because he could choose θ := p_W, this comprises the unrestricted case. Obviously this task of the decision-maker is very difficult. He has to assess the "performance" of the experts and their honesty. Moreover, he has to evaluate whether the experts share some information which will jointly influence their judgement. Genest and Zidek (1986, pp. 120f) and French (1985) discuss these issues in connection with the "group-consensus problem", where a consensus has to be found in a group of experts. They also consider the situation where the decision-maker is an expert himself. There are different types of error models and associated error distributions p(π̂_i | π_i(θ)). The assumption of a model by the decision-maker merely implies that the figures supplied by the expert have the same distribution as if they resulted from a chance process according to that model. It is not necessary that the corresponding experiment or chance process actually took place.

Piecewise-uniform error density, e.g.
p(π̂_i | π_i(θ)) = 5γ            if |π̂_i − π̃_i(θ)| ≤ 0.1
                  (1 − γ)/0.8    otherwise

where π̃_i(θ) := min(max(π_i(θ), 0.1), 0.9) and 0 ≤ γ ≤ 1. The probability that π̂_i is inside the interval I(π_i(θ)) := [π̃_i(θ) − 0.1, π̃_i(θ) + 0.1] is γ. Inside as well as outside the interval, all values are equally probable. For γ = 1 we know for certain that π̂_i is contained in I(π_i(θ)), yielding the interval restrictions of worst-case analysis as a special case (cf. Loui, 1986). Note, however, that the error model assumes a uniform distribution over this interval, whereas in worst-case analysis it is only known that π̂_i follows an arbitrary distribution confined to the interval.
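The density is straightforward to transcribe; in the following sketch the parameter vector θ is summarized by the single value π_i(θ):

```python
# The piecewise-uniform error density, transcribed from the definition above.
def piecewise_uniform(pi_hat, pi_i, gamma):
    pi_tilde = min(max(pi_i, 0.1), 0.9)   # clip so the interval fits in [0, 1]
    if abs(pi_hat - pi_tilde) <= 0.1:
        return 5.0 * gamma                # mass gamma over a width of 0.2
    return (1.0 - gamma) / 0.8            # mass 1 - gamma over a width of 0.8

# gamma = 1 recovers the hard interval restriction of worst-case analysis:
print(piecewise_uniform(0.85, 0.8, 1.0))  # 5.0 (inside the interval)
print(piecewise_uniform(0.50, 0.8, 1.0))  # 0.0 (outside the interval)
```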
Additive error with constant variance, e.g.

π̂_j = π_j(θ) + ε_j,    E(ε_j) = 0,    var(ε_j) = σ_j²

Here π̂_j is simply assumed to be a measurement value, without respect to its probability properties. This model may only serve as a crude approximation, as probability values outside the interval [0, 1] are not excluded (cf. Rauch, 1984).
Binomial errors, e.g.

n_i π̂_i ∼ B(n_i, π_i(θ))

where B(n_i, π_i(θ)) indicates a binomial distribution. This error model presupposes an experiment involving n_i independent random drawings according to the "true" probability π_i of the rule or fact F_i. For each observation in the resulting sample S_i it can only be determined whether F_i is true or false. Let c_i be the number of times that F_i holds and c_i/n_i the relative frequency. The probability of c_i and n_i for a fixed π_i(θ) is then proportional to π_i(θ)^{c_i} (1 − π_i(θ))^{n_i − c_i}. The error model was proposed by Ginsberg (1985) and Paass (1986). It can be justified if the experts get their information by experience and base their probability estimates on relative frequencies that they have observed in practice. Because of the limited number of observed cases, there will be a deviation of c_i/n_i from the true value π_i(θ), the sampling error, which is assumed to be the only cause of errors. It decreases with growing sample size n_i.

Many variants of these error distributions (e.g. transformed normal errors, discussed by Genest and Zidek, 1986, and Lindley, 1985) may be used to represent the knowledge of the decision-maker about the precision of experts' judgements. The parameters of the error distributions can on the one hand be specified directly if they have an intuitive meaning, like the standard deviation σ_j. On the other hand, they can be expressed in terms of quantiles of the error distribution. Assume, for instance, that the decision-maker wants to determine the sample size n_i for the binomial error model and he knows that with probability 0.95 the value π̂_i will lie in the interval [0.7, 0.9] if π_i(θ) = 0.8. Then, using the definition of the binomial distribution (or appropriate approximations), the corresponding sample size n_i can be determined as 66.5.

Assume that an expert has specified π̂_i = 1 but the decision-maker thinks that this statement is uncertain to some extent. By means of an error model, it is possible to specify this uncertainty without necessarily stating evidence against π̂_i = 1. This means that an estimate of π_i equal to 1 will be the "best" estimate if the other probability assignments do not imply some evidence in favour of π_i < 1.

Usually the errors for different π̂_i are assumed to be statistically independent. In other words, it is supposed that the actual deviation π̂_i − π_i for some π_i is not influenced by the actual deviation π̂_j − π_j for any other π_j, j ≠ i. This is reasonable if the experts use different sources of information and do not collaborate. Clemen and Winkler (1985) analyse the impact of dependence between experts on the precision of final estimates for probabilities. They show that dependent sources of information considerably reduce the precision of estimates in comparison with the independent case.
An alternative is to model the joint distribution p(π̂_i, π̂_j | π_i(θ), π_j(θ)) with the corresponding covariances or interactions. The statistical techniques discussed below can be applied directly to this case (cf. Paass, 1986). There are procedures for combining the opinions of experts that use no explicit statistical model but start from desirable properties of combination methods. One example is the linear opinion pool, where the estimated final distribution p̂_W is a convex combination of the distributions p_{W,i} specified by the experts: p̂_W = Σ_i β_i p_{W,i}. However, the same result would arise if additive error with constant variance were assumed. For a discussion of these and other approaches see Genest and Zidek (1986).
5.2 Evaluation by statistical methods
Let us first discuss some statistical evaluation methods (for a short survey see Dawid, 1983). A widespread method is the likelihood approach. Its direct appeal lies in the idea that it is a good way to compare parameter values θ_1 and θ_2 by means of the probability that they assign to the observed "data". For given "data" π̂_i specified by independent experts and error distributions p(π̂_i | π_i(θ)), we can define the likelihood function L(θ) by

L(θ) := p(π̂ | π(θ)) := ∏_{i=1}^{n_π} p(π̂_i | π_i(θ))
This function summarizes all information present in the data. θ_1 is more compatible with the data than θ_2 if L(θ_1) > L(θ_2) (in the absence of other information). On the other hand, the same inferences should result for θ_1 and θ_2 if L(θ_1) = L(θ_2). Let Θ_max be the set of parameters where L(θ) is maximal, and let us assume that Θ_max contains only one element. This will happen if the number n_θ of parameters is not too large in relation to the number (and structure) of the data items π̂_i; it can be checked by examining the derivatives of L(θ). The unique maximal parameter value θ̂ is called the maximum-likelihood estimate:

L(θ̂) = max_θ L(θ)    (13)
It utilizes all available information in an efficient way and yields the true parameter θ_0 if the sample sizes go to infinity. The likelihood function may be maximized by different optimization methods using first- or second-order derivatives (McIntosh, 1982) or by simple iterative algorithms (e.g. the EM algorithm; Dempster et al., 1977) that work without derivatives. If the unknown distribution contains very many parameters then simplified algorithms have to be used to reduce the computational effort (cf. Paass, 1986).
There are several ways to evaluate the likelihood function. First, of course, θ̂ can be utilized to calculate the associated value of p(U*) = g*′p_W(θ̂) for a proposition U*. The precision of the estimate p̂(U*) can be determined by a confidence interval. It measures the information contained in the data and enables the decision-maker to distinguish between well-established but equal probabilities and ignorance caused by missing information. It can be estimated using the appropriate likelihood-ratio statistic, whose accuracy depends on the accuracy of the approximation of L(θ) by a normal distribution, which increases with growing sample sizes n_i. To explore the stochastic relation between some variables of interest, the marginal probabilities for these variables may be estimated. In this way one can determine, for instance, the information content of yet-unknown symptoms for a diagnosis (Paass, 1986). If two experts consider identical rules F_i and F_j and give the same uncertain probability statement π̂_i = π̂_j with the same variance σ², the combination of these two statements is equivalent to a single statement with the same value but variance ½σ². Therefore pieces of evidence are accumulated, and the true π_i is more likely to be located near π̂_i. An inherent feature of the maximum-likelihood approach is that "contradictions" between given probabilities π̂_i are resolved and a unique estimate p̂_W := p_W(θ̂) is determined. Estimated values π̃_i may differ from the specified values π̂_i. Contradictions are resolved in such a way that the extent of modification is largest for the less reliable π̂_i. Hence by checking the difference between π̂_i and π̃_i, the "most contradictory" probabilities can be detected.
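For a single unknown probability, the likelihood of several independent expert reports under the binomial error model can be evaluated and maximized directly. The following sketch uses a grid search rather than the EM algorithm, and the figures are invented for illustration:

```python
# Likelihood of independent expert reports under the binomial error model.
import numpy as np

def log_likelihood(pi, pi_hats, ns):
    pi_hats, ns = np.asarray(pi_hats), np.asarray(ns)
    cs = pi_hats * ns                     # hypothetical success counts
    return float(np.sum(cs * np.log(pi) + (ns - cs) * np.log(1.0 - pi)))

pi_hats, ns = [0.70, 0.80], [50, 50]      # two equally reliable experts
grid = np.linspace(0.01, 0.99, 981)
best = grid[np.argmax([log_likelihood(g, pi_hats, ns) for g in grid])]
print(best)    # about 0.75: the conflicting reports are reconciled, and the
               # combined evidence behaves like one report of higher precision
```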
Table 2

Rule or fact F_i    Value π̂_i supplied by expert    Interval [π_{i,low}, π_{i,high}]    Sample size n_i
p(D | A)            0.20                             [0.00, 0.40]                        11
p(D | ¬A)           0.80                             [0.70, 0.90]                        43
p(B | D)            0.70                             [0.60, 0.80]                        57
p(B | ¬D)           0.40                             [0.30, 0.50]                        65
p(B | A)            0.10                             [0.00, 0.20]                        24
p(A)                0.50                             [0.30, 0.70]                        17
Example
Suppose that for our example the probabilities π̂_i listed in Table 2 have been assigned by independent experts. Assuming binomial errors, the decision-maker assesses the reliability of the experts by specifying a "probable interval" [π_{i,low}, π_{i,high}] that will contain the true probability value π_i with the prescribed probability P_i = 0.90. From this interval, the size n_i of the corresponding hypothetical sample can be determined (a normal approximation was used); that is, the hypothetical sample size n_i is based on the assumption that the interval [π_{i,low}, π_{i,high}] corresponds to the 90% confidence interval for π_i.
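The conversion from a probable interval to a hypothetical sample size can be sketched with the symmetric normal approximation d = z sqrt(π(1 − π)/n); with the 90% quantile z = 1.645 this reproduces the n_i column of Table 2 up to rounding (the exact approximation used in the chapter is not stated):

```python
# Hypothetical sample size from a "probable interval", normal approximation.
def sample_size(pi_hat, lo, hi, z=1.645):
    d = (hi - lo) / 2.0                  # half-width of the probable interval
    return pi_hat * (1.0 - pi_hat) * (z / d) ** 2

rows = [(0.20, 0.0, 0.4), (0.80, 0.7, 0.9), (0.70, 0.6, 0.8),
        (0.40, 0.3, 0.5), (0.10, 0.0, 0.2), (0.50, 0.3, 0.7)]
print([round(sample_size(*r)) for r in rows])   # [11, 43, 57, 65, 24, 17]
```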
The EM algorithm yields the maximum-likelihood estimate p(D | ¬A ∧ B) = 0.75 with an estimated standard deviation of 0.07. Compared with the worst-case solution of the example given above, this interval is much tighter. If the information about p(B | A) is not taken into account and A and B are assumed to be conditionally independent given D, the estimate p(D | ¬A ∧ B) = 0.87 results. This effect is quite general, as the assumption of conditional independence has a tendency to overstate the information content of the available "data". Now suppose that an additional expert states his knowledge π̂_8 := p(B | ¬A) = 0.50 and the decision-maker assumes this expert to be rather reliable, with π_{8,low} = 0.45 and π_{8,high} = 0.55. The corresponding hypothetical sample size is 270.7, yielding an estimate p(D | ¬A ∧ B) = 0.88. However, the π̂_i are now contradictory to some extent, and the estimated values deviate from the specified values: for example, p(B | A) = 0.35 instead of 0.10 and p(B | ¬A) = 0.52 instead of 0.50. According to a χ² test, the deviation of p(B | A) is more significant than the deviation of p(B | ¬A). In this sense the specified p(B | A) is "more contradictory" than p(B | ¬A).

An alternative statistical evaluation principle is Bayesian analysis. Here the vector θ of unknown parameters is itself considered as an n_θ-dimensional random variable with an unknown distribution Pr(θ). Probability vectors are treated as points in ℝ^{n_θ}, and Pr(θ) is a probability measure on that space, which can usually be described by a probability density. It induces a distribution Pr(π) on π because p_W = p_W(θ) and π = (R⁺p_W) ÷ (R⁻p_W). The aim of Bayesian analysis is the combination of a given prior distribution Pr(θ) with some data. In our context, Pr(θ) may result from two different lines of reasoning. First, it can encode structural information about the distribution, as discussed above (e.g. the absence of higher-order associations, or the higher plausibility of specific distributions). On the other hand, it can reflect complete ignorance and give the same density to all θ-values. However, the definition of such "non-informative" priors is a controversial topic (Berger, 1980, pp. 68ff). The crucial and arguable feature of Bayesian reasoning is that the prior distribution Pr(θ) is always assumed to exist. For a given prior distribution and a known error model p(π̂ | θ), the posterior density for the unknown parameter θ is
Pr(θ | π̂) = p(π̂ | θ) Pr(θ) / ∫ p(π̂ | θ) Pr(θ) dθ    (14)
It specifies the density value or "relative" probability of θ-vectors after the information contained in π̂ has been taken into account. Its maximum, the maximum-posterior estimate, if it exists, gives the "most probable" parameter value. Moreover, posterior regions may be determined where the posterior density is highest and which contain the true parameter with a prescribed probability. This can even be done in the case where there is no unique maximum of the posterior density. If the decision-maker has a constant "non-informative" prior density Pr(θ) = constant, indicating missing information about the parameter, then the maximum-posterior estimate is identical with the maximum-likelihood estimate (13).

The determination of the maximum-posterior estimate can be done with the same methods and comparable effort as the calculation of the maximum-likelihood estimate. The determination of posterior regions involves the evaluation of multivariate integrals, which is computationally very demanding in the general case. Simplifications arise if Pr(θ) is assumed to be normally distributed, and least-squares methods like the extended Kalman filter (Lederman, 1984, pp. 902ff) may be used. Gokhale and Kullback (1978, pp. 199ff) discuss the application of the maximum-entropy approach.

A new class of algorithms for the solution of the nonlinear optimization problems employs statistical simulation techniques. With relatively little effort they yield approximate solutions whose quality increases as the optimization progresses. Since the variance of θ̂ is usually large in the case of uncertain probabilities, suboptimal solutions are often sufficient. An example is the simulated-annealing algorithm (Aarts and van Laarhoven, 1985), where a cost function C(θ) is to be minimized. The algorithm consists of successive modifications of a single component θ_i of θ to a new value θ̃_i. If C(θ̃) < C(θ) then the modification is accepted; otherwise the modification is accepted with probability exp([C(θ) − C(θ̃)]/t). If the control parameter t is slowly decreased towards zero, it can be shown that the resulting values concentrate on the set {θ | C(θ) = min C(θ)} of minimal-cost parameters.
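A minimal sketch of the algorithm follows; the quadratic cost and the linear cooling schedule are placeholders for the Bayesian or likelihood costs C_B and C_L and for a real schedule, neither of which the chapter fixes:

```python
# Minimal simulated-annealing sketch.
import math
import random

def anneal(cost, theta, steps=20000, t0=1.0, scale=0.05):
    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-6     # slowly decrease t towards 0
        cand = theta[:]                           # modify a single component
        i = random.randrange(len(cand))
        cand[i] += random.uniform(-scale, scale)
        delta = cost(theta) - cost(cand)
        if delta > 0 or random.random() < math.exp(delta / t):
            theta = cand                          # accept the modification
    return theta

cost = lambda th: sum((x - 0.3) ** 2 for x in th)  # toy cost, minimum at 0.3
print(anneal(cost, [0.9, 0.1]))                    # both components near 0.3
```

With t held fixed at 1 instead of cooled, the accepted values form the posterior sample mentioned below.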
In a Bayesian context we may define C_B(θ) := −log(p(π̂ | θ) Pr(θ)), and the procedure yields the maximum-posterior estimate. In the same way, the negative log-likelihood function C_L(θ) := −log p(π̂ | θ) or other cost functions may be utilized. If, for a suitable parametrization θ and the Bayesian cost function C_B(θ), the value of t is set to 1, it can be shown (Paass, 1987) that the resulting simulated sequence of parameter values can be considered as a sample from the posterior distribution Pr(θ | π̂). By observing the evolution of θ after convergence to the steady state, the marginal posterior distribution and corresponding posterior regions of any parameter can be obtained. In the same way, confidence regions can be established if the likelihood cost function C_L(θ) is employed as the criterion function. In contrast to the sample-based procedure of Bundy (1985), the values of higher-order interactions not affected by the criterion function can be controlled and, for example, can be set to their "least informative" zero values.
6 SUMMARY AND COMPARISON

6.1 Assessment of probabilistic logic
In this section we want to summarize and discuss the main features of probabilistic logic in expert systems. It has been demonstrated above that the probability of a proposition A can be interpreted as the "subjective degree of belief" of the decision-maker in A. No frequency concept is needed, as the existence of a probability measure p(x) can be derived from a few axioms of rational behaviour, and there exists a clear framework for the interpretation of probabilities. An expert system seems to be particularly suitable for the application of probability, as it is a closed, simplified system with fixed states that can react to the real world only according to a limited number of "pieces of evidence". In contrast with the early utilization of Bayes' rule, where a large number of prior probabilities were required, the approaches discussed in this chapter need only the information about the structure and probability values of the inference net that is logically necessary to arrive at a result at all. Of course, the price paid for such a relaxation of the requirements is that in general the resulting probabilities may no longer be unique point values but may only be known to be located within some interval.

The inference net can have an arbitrary form, with cycles. Structural assumptions may be stated by restricting the class of distributions considered (if, for instance, higher-order interactions are zero) or by assuming informative prior distributions. In worst-case analysis no structural assumptions are necessary. The inference net can be revised and enlarged by simply exchanging rules or adding variables. If new propositions occur in the course of reasoning then the probability measure can be extended consistently. Hence probabilistic logic is not confined to situations where all propositions are known in advance. It is not necessary to stick to a single number for the description of the degree of belief in a proposition. The precision of probabilities can be characterized by a whole function, the likelihood function or the related posterior density. Unlike classical logic, in general no single possible world will emerge as the "true" world; instead, all possible worlds have to be taken into account, with differing chances of being the "true" world. Because of this complexity, no simple "explanation" of a result is usually feasible.
It is, however, possible to identify the main reasons that led to a specific result by considering the probabilities of the antecedents of rules. An inherent feature of the approaches for handling uncertain probabilities is their ability to resolve contradictions and to take into account the relative precision of "inputs" for the determination of the resulting probabilities. During the process of probabilistic reasoning, the probability p(A), and hence the truth value assigned to a proposition A, can change if new evidence arrives and uncertain probabilities are assumed. Consequently, probabilistic logic with uncertain pieces of evidence is a sort of non-monotonic logic.

The intention of this chapter was to discuss the different evaluation principles and inherent assumptions that may be chosen in probabilistic logic. The algorithms presented often involve a large computational effort. They give a sort of reference solution, which may be used to derive simpler, computationally feasible procedures. Methods that may be employed for larger inference nets are:
(i) the linear-programming approach, where the restrictions are specified in terms of marginal probabilities;
(ii) the INFERNO approach of Quinlan (1983);
(iii) the Bayesian network technique proposed by Pearl (1986), where a specific interaction pattern is assumed;
(iv) statistical simulation techniques (Paass, 1987), which give approximate solutions for the most general case.
As many research groups are currently working on new algorithms, significant progress can be expected in the near future.

6.2 Relation to similar approaches
The characteristic of the Shafer-Dempster approach (Shafer, 1976; Chapter 9 of the present book) is that it does not lead to a probability distribution over the exclusive and exhaustive possible worlds W_i ∈ 𝒲, but rather starts with a "probability mass function" m(A) ≥ 0 on the Boolean algebra ℱ over 𝒲. This function is supplied by the experts, and the masses have to sum to 1.

Example
Consider the possible worlds 𝒲 = {W_1, W_2, W_3}. Then the corresponding Boolean algebra is given by ℱ = {∅, W_1, W_2, W_3, W_1 ∨ W_2, W_1 ∨ W_3, W_2 ∨ W_3, W_1 ∨ W_2 ∨ W_3}. Assume the probability mass function m(W_1) = 0.3, m(W_3) = 0.1, m(W_2 ∨ W_3) = 0.4 and m(W_1 ∨ W_2 ∨ W_3) = 0.2.
If a mass is given to W_i ∨ W_j, this means that this "amount" of belief can be attributed jointly to W_i and W_j, but the decision-maker does not have enough information to allocate proportions of m(W_i ∨ W_j) to the single propositions W_i and W_j. Consequently, the mass given to the disjunction of all W_i cannot be allocated at all. This situation can also be modelled by means of probabilistic logic. Assume for our example that there is a random variable w with possible "values" W_1, W_2 and W_3, and that there exists a probability measure p(w) for w. The values of p(w) are not known. There exists, however, another variable v that has the "values" v_1, v_3, v_23 and v_123, corresponding to the elements of ℱ with positive mass. w and v are assumed to have a joint distribution p(v | w)p(w) = p(w, v) = p(w | v)p(v). About p(w | v), the decision-maker has some structural information:
p(W_1 | v_1) = 1,        p(W_2 | v_1) = 0,        p(W_3 | v_1) = 0
p(W_1 | v_3) = 0,        p(W_2 | v_3) = 0,        p(W_3 | v_3) = 1
p(W_1 | v_23) = 0,       p(W_2 | v_23) = α,       p(W_3 | v_23) = 1 − α
p(W_1 | v_123) = β_1,    p(W_2 | v_123) = β_2,    p(W_3 | v_123) = 1 − β_1 − β_2
The free parameters α, β_1 and β_2, however, are unknown to him. They represent the information necessary for an allocation of probability masses to the probabilities of the W_i. The basic probability assignment defines the marginal distribution p(v). It is then the task of the decision-maker to derive conclusions about the probabilities of the elements of the Boolean algebra. Obviously there is no unique solution, but upper and lower bounds on these probabilities can be established by use of worst-case analysis. The resulting bounds are the narrowest possible without introducing additional information into p(v, w). Grosof (1986) and Kyburg (1987) point out that every Shafer-Dempster belief function may be expressed by inequality constraints on an underlying probability measure. The converse, however, is not true, and hence the Shafer-Dempster scheme has less expressive power than the probability approach with upper and lower bounds. In the case of two experts supplying two different probability mass functions, an error model could be formulated according to the precision of their assignments. By convex Bayesian analysis (Thompson, 1985), the mass functions can be combined (cf. Grosof, 1986). It would be interesting to compare the results of Dempster's rule of combination with these probabilistic techniques. Lemmer (1986) has already shown that the combination rule yields results contradictory to a probability interpretation.
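The worst-case bounds obtained by ranging over the free parameters coincide with the belief and plausibility computed directly from the masses: the lower bound of p(E) sums the masses of focal elements contained in E, the upper bound those intersecting E. A small sketch, with frozensets of world indices standing in for disjunctions of the W_i:

```python
# Lower and upper probability bounds from the masses of the example.
masses = {frozenset({1}): 0.3, frozenset({3}): 0.1,
          frozenset({2, 3}): 0.4, frozenset({1, 2, 3}): 0.2}

def bounds(event):
    lower = sum(m for s, m in masses.items() if s <= event)   # belief
    upper = sum(m for s, m in masses.items() if s & event)    # plausibility
    return lower, upper

print(bounds(frozenset({1})))       # (0.3, 0.5): 0.3 <= p(W_1) <= 0.5
print(bounds(frozenset({2, 3})))    # (0.5, 0.7): 0.5 <= p(W_2 v W_3) <= 0.7
```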
Many of the structural features of other approaches to uncertain reasoning are similar to probabilistic logic. Gaines (1978) shows that probabilistic logic as well as fuzzy logic (see Chapter 10) can be considered as special cases of a "standard uncertainty logic", which he defines by a set of axioms. To arrive at probabilistic logic, the axiom of the excluded middle, p(A ∨ ¬A) = 1, has to be added, while another axiom is added to arrive at a variant of fuzzy logic. Goodman and Nguyen (1985) develop generalized set-membership functions with probabilistic and fuzzy logic as special cases. Horvitz et al. (1986, pp. 212f) discuss the relation of two distinct forms of fuzzy logic to probabilistic logic. The first type (Zadeh, 1983) allows beliefs to be assigned to propositions that are fuzzy, i.e. remain ill-defined. Proponents of probability theory have pointed out that imprecision in the specification of a proposition could always be converted to uncertainty about a related precise event with similar or identical semantic content. Cheeseman (1986) proposed that probability distributions over variables of interest can capture the characteristics of fuzziness within the framework of probability. The fuzzy proposition "Mary is young", for instance, can be represented by a distribution specifying the probability that Mary has age z. He claims that "fuzzy logic is unnecessary for representing and reasoning about uncertainty." The second type of fuzzy logic (Gaines, 1978) interprets the degree μ_T(A) of membership of a proposition A in the set of true propositions as the degree of belief in A. In this approach the degree of belief in a conjunction is defined by μ_T(A ∧ B) = min(μ_T(A), μ_T(B)). Obviously, this is not consistent with the factorization p(A ∧ B) = p(A | B)p(B) of probability theory. It is, however, comparable to features of the worst-case analysis discussed above. The relation between probabilistic logic and default logic is discussed in Chapter 10.

There are many other approaches to uncertain reasoning, some of which are rather ad hoc. Horvitz et al. (1986) showed, for example, that the certainty-factors approach is inconsistent. Heckerman (1986) modified the certainty-factor method in such a way that it satisfies the requirements for a Bayesian probability interpretation. The above arguments show that probabilistic logic is able to exhibit features similar to those of some "new" concepts of non-monotonic reasoning. Hence the concepts seem to be complementary rather than contradictory. For each approach it is most important to clarify and check all the inherent assumptions (independence, absence of higher-order associations, etc.) before applying it to a concrete problem. There has been great progress in this field during the last few years, and it is to be hoped that parallel research on different concepts will stimulate progress in uncertain reasoning as a whole.
ACKNOWLEDGMENTS
I should like to thank Didier Dubois, Gabor Gyarfas, Hermann Quinke, Philippe Smets and Frank Veltman for their valuable comments. In addition, I am grateful to the Gesellschaft für Mathematik und Datenverarbeitung, who provided the opportunity to work on this subject.

BIBLIOGRAPHY

The following references provide an introduction to the basic principles and problems of probabilistic logic and compare it with other approaches.

Cheeseman, P. (1985). In defense of probability. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-85), Los Angeles, pp. 1002-1009. (Starting with the notion of probability as a measure of belief in the truth of a proposition, the positive features of probabilistic reasoning in comparison with other approaches are compiled. It is argued that probability theory, when used correctly, is sufficient for the task of uncertain reasoning.)
Fishburn, P. C. (1986). The axioms of subjective probability (with discussion). Statist. Sci. 1, 335-358. (Gives an up-to-date survey of axiom systems, including comparative probability relations, decision-theoretic approaches, interval probabilities, etc. The discussion gives an impression of the controversies in this field.)
French, S. (1985). Group consensus probability distributions: a critical survey. Bayesian Statistics 2 (ed. J. M. Bernardo et al.), pp. 183-202. North-Holland, Amsterdam. (Two main versions of the group-consensus problem are considered. In the expert problem a group of experts submits probability judgements to a decision-maker outside the group, who has to aggregate the experts' opinions. In the group-decision problem the group itself is responsible for aggregating their probability judgements into a consistent probability distribution.)
Genest, C. and Zidek, J. V. (1986). Combining probability distributions: a critique and an annotated bibliography. Statist. Sci. 1, 114-148. (This paper discusses the problem of aggregating a number of probability distributions specified by different experts. In contrast with probabilistic reasoning, no marginal or conditional distributions are specified by the experts. The different approaches are compared from the point of view of decision theory. The extensive bibliography and the discussion give a comprehensive picture of this field.)
Kanal, L. N. and Lemmer, J. F. (eds) (1986). Uncertainty in Artificial Intelligence. North-Holland, Amsterdam. (This collection contains nearly 40 papers and gives a representative impression of current developments. The topics of probabilistic reasoning, belief functions, maximum entropy and interval probabilities are covered and compared in depth.)
Nilsson, N. J. (1986). Probabilistic logic. Artificial Intelligence 28, 71-87. (Defines the truth value of sentences of first-order logic by their probability in probabilistic reasoning systems. The derivation applies to any logical system for which the consistency of a finite set of sentences can be established.)
Paass, G. (1986). Consistent evaluation of uncertain reasoning systems. Proc. 6th Int. Workshop on Expert Systems and their Applications, Avignon, pp. 73-94. (Inference nets are considered where the probabilities of facts and rules are not known exactly but are subject to error. These probabilities are modelled as random samples, where the number of elements determines their reliability. The inference net is evaluated according to the maximum-likelihood principle, allowing conflicting evidence to be processed.)
Quinlan, J. R. (1983). INFERNO: a cautious approach to uncertain reasoning. Comp. J. 26, 255-269. (Specifies a method for the evaluation of inference nets where intervals are specified for marginal and conditional probabilities, yielding intervals that have to contain the true probabilities. The approach is computationally cheap, but may yield intervals that are larger than optimal.)
Spiegelhalter, D. J. (1986a). A statistical view of uncertainty in expert systems. Artificial Intelligence and Statistics (ed. W. Gale), pp. 17-55. Addison-Wesley, Reading, Mass. (Different approaches to uncertain reasoning (e.g. probabilistic reasoning, fuzzy reasoning, belief functions and the theory of endorsements) are compared from a statistical point of view. It is argued that a subjectivist Bayesian view of uncertainty can provide many features demanded by expert systems. Some examples as well as numerically feasible methods for the evaluation of inference nets are discussed.)
Other references
Aarts, E. H. L. and van Laarhoven, P. J. M. (1985). Statistical cooling: a general approach to combinatorial optimization problems. Philips J. Res. 40, 193-226.
Berger, J. O. (1980). Statistical Decision Theory. Springer-Verlag, New York.
Bernardo, J. M., DeGroot, M. H., Lindley, D. V. and Smith, A. F. M. (eds) (1985). Bayesian Statistics 2. North-Holland, Amsterdam.
Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, Mass.
Bundy, A. (1985). Incidence calculus: a mechanism for probabilistic reasoning. J. Autom. Reasoning 1, 263-283.
Cheeseman, P. (1986). Probabilistic versus fuzzy reasoning. In Kanal and Lemmer (1986), pp. 85-102.
Clemen, R. T. and Winkler, R. L. (1985). Limits for the precision and value of information from dependent sources. Operations Res. 33, 427-442.
Dalkey, N. C. (1986). Inductive inference and the representation of uncertainty. In Kanal and Lemmer (1986), pp. 393-397.
Dawid, A. P. (1983). Statistical inference. Encyclopedia of Statistics, Vol. 4 (ed. S. Kotz and N. L. Johnson), pp. 80-105. Wiley, New York.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B39, 1-38.
Diaconis, P. and Zabell, S. L. (1982). Updating subjective probability. J. Am. Statist. Assn 77, 822-830.
Fienberg, S. E. (1980). The Analysis of Cross-Classified Categorical Data. MIT Press, Cambridge, Mass.
Gaines, B. R. (1978). Fuzzy and probability uncertainty logics. Info. Control 38, 154-169.
Gale, W. (ed.) (1986). Artificial Intelligence and Statistics. Addison-Wesley, Reading, Mass.
Ginsberg, M. L. (1985). Does probability have a place in nonmonotonic reasoning? Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-85) (ed. A. Joshi), Los Angeles, pp. 107-110.
Gokhale, D. V. and Kullback, S. (1978). The Information in Contingency Tables. Marcel Dekker, New York.
Good, I. J. (1982). Axioms of probability. Encyclopedia of Statistical Sciences, Vol. 1 (ed. S. Kotz and N. L. Johnson), pp. 169-176. Wiley, New York.
Goodman, I. R. and Nguyen, H. T. (1985). Uncertainty Models for Knowledge-Based Systems. North-Holland, Amsterdam.
Grosof, B. N. (1986). An inequality paradigm for probabilistic reasoning. In Kanal and Lemmer (1986), pp. 259-275.
Haber, M. and Brown, M. B. (1986). Maximum likelihood methods for log-linear models when expected frequencies are subject to linear constraints. J. Am. Statist. Assn 81, 477-482.
Hamburger, H. (1986). Representing, combining and using uncertain estimates. In Kanal and Lemmer (1986), pp. 399-414.
Heckerman, D. (1986). Probabilistic interpretations for MYCIN's certainty factors. In Kanal and Lemmer (1986), pp. 167-196.
Horvitz, E. J., Heckerman, D. E. and Langlotz, C. P. (1986). A framework for comparing alternative formalisms for plausible reasoning. Proc. American Association for Artificial Intelligence Conf. (AAAI-86), pp. 210-214.
Hunter, D. (1986). Uncertain reasoning using maximum entropy inference. In Kanal and Lemmer (1986), pp. 203-209.
Kahneman, D., Slovic, P. and Tversky, A. (1982). Judgement under Uncertainty: Heuristics and Biases. Cambridge University Press.
Konolidge, K. (1982). An information-theoretic approach to subjective Bayesian inference in rule-based systems. Draft, SRI International, Menlo Park.
Kyburg, H. E. (1987). Bayesian and non-Bayesian evidential updating. Artificial Intelligence 31, 271-293.
Lederman, E. (ed.) (1984). Handbook of Applicable Mathematics, Vol. VI, Part B: Statistics. Wiley, Chichester.
Lemmer, J. F. (1986). Confidence factors, empiricism and the Dempster-Shafer theory of evidence. In Kanal and Lemmer (1986), pp. 117-125.
Lindley, D. V. (1985). Reconciliation of discrete probability distributions. In Bernardo et al. (1985), pp. 375-390.
Loui, R. P. (1986). Interval-based decisions for reasoning systems. In Kanal and Lemmer (1986), pp. 459-472.
McIntosh, A. A. (1982). Fitting Linear Models: An Application of Conjugate Gradient Algorithms. Springer-Verlag, New York.
Paass, G. (1988). Uncertain reasoning by stochastic simulation. Working Paper GMD/F3, St Augustin, FRG.
Pearl, J. (1986). Fusion, propagation, and structuring in belief networks. Artificial Intelligence 29, 241-288.
Rauch, H. E. (1984). Probability concepts for an expert system used for data fusion. AI Magazine (Fall), pp. 55-60.
Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press.
Shore, J. E. (1986). Relative entropy, probabilistic inference, and AI. In Kanal and Lemmer (1986), pp. 211-215.
Smith, C. A. B. (1961). Consistency in statistical inference and decision. J. R. Statist. Soc. B23, 1-25.
Spiegelhalter, D. J. (1986b). Probabilistic reasoning in predictive expert systems. In Kanal and Lemmer (1986), pp. 47-67.
Thompson, T. R. (1985). Parallel formulation of evidential reasoning theories. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-85), Los Angeles, pp. 321-327.
Zadeh, L. A. (1983). The role of fuzzy logic in the management of uncertainty in expert systems. Fuzzy Sets and Systems 11, 199-227.
DISCUSSION

Frank Veltman: Since most decisions must be made under circumstances of uncertainty, the designer of an expert system is immediately confronted with the problem of representing the modes of inference typical of such circumstances. Gerhard Paass's chapter offers an encyclopaedic survey of the statistical techniques that become available once one has taken for granted that the kind of "uncertainty" at stake here is best captured by probability theory. I shall not dispute the idea that probability theory offers the best characterization of uncertainty; indeed, there are strong arguments in favour of this position (see e.g. Lindley, 1982). My comments, one remark and one question, only pertain to the particular way in which Paass develops this idea.

(i) The remark that I want to make is this: the domain of application of the mathematical framework offered in Sections 2.3 and 3.1 is rather limited, much more limited than the author seems to think.

Amplification. The theory presented in Sections 2.3 and 3.1 requires for its application that probabilities be assigned to things that bear a truth value: sentences (or propositions, as the author prefers to call them). This is established by choosing as the sample space a set of so-called possible worlds, each determined by some maximal consistent set of sentences. The practice, however, does not fit this theory. For one thing, there is no way to make sense of the example presented in Section 2.1 if the symbols "A", "B" and "D" are really to be interpreted as sentences. This is what Paass says:

Suppose that a doctor has to decide whether or not a patient has a disease D. The relation between the two symptoms A and B and the disease D is specified in the form of the following rules F_1, ..., F_5, which hold with a certain probability:

F_1 := "If A then D follows" holds with probability π_1
F_2 := "If ¬A then D follows" holds with probability π_2

[....] These probabilities π_i reflect the subjective degree of belief of the doctor in the truth of the rules for a certain universe, for instance the people of a town. [....] the probability associated with a rule is always defined as the conditional probability of the consequence given the antecedent. For F_1 we have for example π_1 = p(D | A) := p(A ∧ D)/p(A).
One might try to interpret "A" as "a person chosen at random shows symptom A", and "D" as "a person chosen at random suffers from disease D", but in this manner the sentence (A ∧ D) is not going to mean "a person chosen at random has both the symptom A and the disease D", let alone that p(D | A) can serve as a formalization of the conditional probability that a person chosen at random suffers from disease D, given that this person shows symptom A. There are two alternative, and appropriate, ways to interpret the quoted paragraphs, but in neither of these can A and D be understood as sentences. On the first reading, A and D are to be interpreted as predicates, and the sample space concerned is not a set of possible worlds but a set of possible "patients", each given by some maximal consistent set of predicates. Fortunately, the statistical techniques discussed in Section 4 work just as well, mutatis mutandis, for this set-up as they do for the original. However, this only holds if we restrict ourselves to one-place predicates. For more complex predicates things go wrong:

Example 1  Some people think that the kissing disease is transmitted by kissing. Clearly, these people, and those who disagree with them, will want to talk about the conditional probability that a person x will be infected with the kissing disease by a person y who has the kissing disease, and who has kissed x. As far as I can see, there is no way to handle this probability within the framework offered by Paass, not even if we interpret it in a different manner.

The second way to read the quoted paragraphs is to think of A and D as formulae Ax and Dx in which the free variable x is suppressed. Ax can be considered as an abbreviation of "x shows symptom A", and Dx as an abbreviation of "x suffers from disease D". Where Paass writes (A ∧ D), we read (Ax ∧ Dx), and instead of p(D | A) we read p(Dx | Ax). Perhaps, at first sight, there is not much difference between this reading and the one discussed above. However, the advantage of not suppressing the free variables becomes clear if we want to formalize examples that involve both formulae with free variables and sentences, i.e. formulae without free variables.

Example 2  Compare the following:
(a) the probability that every male smoker dies from cancer before the age of 60;
(b) the conditional probability that a randomly chosen person dies from cancer before the age of 60, given that this person is a male smoker.
Note that these two probabilities are not necessarily the same. In fact, we can be pretty sure that an expert will assign probability 0 to (a) and a positive probability to (b). Still, there are logical relations between these two probabilities: if one of them is 1 then so is the other. Within Paass's framework, these logical relations cannot be made explicit. What one needs, for this example as well as for the previous one, is a fully fledged probabilistic semantics for arbitrary first-order formulae with or without free variables: a semantics that assigns to a formula φ(x_1, ..., x_n) the probability of finding for a randomly chosen n-tuple of objects that the property expressed by φ applies to them. Given such a framework, we could safely write "Dx" for "x dies before the age of 60" and "Sx" for "x is a male smoker", and so arrive at p(∀x(Sx → Dx)) for the probability under (a), and p(Dx | Sx) for the probability under (b). The probability of Example 1 could be formalized as p(Ixy | Py ∧ Kyx), where "Ixy" is an abbreviation of "x will be infected by y", "Px" is an abbreviation of "x has the kissing disease", and "Kxy" is an abbreviation of "x has kissed y". A probability semantics with the desired properties was devised in the early sixties by the Polish logician Jerzy Łoś. Unfortunately, lack of space prevents me from describing his theory here. Let me just mention the relevant literature: the locus classicus is Łoś (1963); the theory is further developed in Fenstad (1967) and in Gaifman and Snir (1982); Cooke (1986) discusses the theory with a view to application in expert systems.

(ii) The question that I want to ask is this: how does the discussion of Section 5 relate to other work that has been done on the subject of expert resolution? More precisely, is it meant as an alternative to the approach that tries to develop criteria for evaluating expert probability assessment?

Amplification. At several places in his chapter, Paass emphasizes that the probabilities at stake are supposed to reflect the expert's subjective degree of belief in the propositions concerned. Now, clearly, different experts may have different opinions about the same propositions, and therefore assign different numbers to them. In one way or another the decision-maker will have to resolve this conflict of opinions. I must confess that I do not fully understand the strategies that Paass recommends to this purpose. Actually, I find myself already at a loss with the way he introduces the problem. He says that the probabilities supplied by the experts may be erroneous to some extent, and he introduces so-called error models that give the stochastic relation between the true probability and the numbers supplied by the experts. I find it odd to find the word "error" in a context where subjective probabilities are involved (can people really be mistaken about their own degree of belief?), and I should be greatly helped if the author could give an example of the kind of errors he has in mind. Neither do I see what can possibly be meant by "the true probability", and it does not help much when the author says on p. 230 that
The "true" probability 1t; can be considered as the subjective probability estimate that would be supplied by a rational expert with complete information about all aspects of the problem. As far as I can see, the only feasible subjective probability estimates that any rational being-expert or not-will supply in these ideal circumstances are 0 for the propositions known to be false, and I for the propositions known to be true. But this is not what the author has in mind, I am afraid.t Anyway, the question arises as to why the author takes recourse to these error models to solve a problem that first and foremost seems to be a selection problem: which of the experts is the most reliable-or, perhaps better: which one has so far been found to be the most reliable? This question has received a lot of attention recently. and several criteria have been proposed for evaluating expert probability assessment. Important contributions can be found in Lichtenstein et a/. (1982), De Groot and Fienberg (1983) and Cooke (1985). I am grateful to Roger Cooke and Michie! van Lambalgen for their help.
Didier Dubois and Henri Prade: Are probability measures inevitable for modelling subjective uncertainty? Several authors cited by Paass, such as Cheeseman (1983) and Horvitz et al. (1986), have claimed that axioms of rational behaviour force an uncertainty measure to be a probability measure.
8
247
Probabilistic Lof?ic
uncertainty measure to be a probability measure. To do so, they put forward Cox's (1946) axiom system for the modelling of "reasonable expectation". However, Cox's axioms are not always exactly reported. Namely, he starts from the following requirements. Letting .f(a I b) be a measure of the "reasonable credibility" of the proposition b when the proposition a is known to be true, Cox proposes two basic axioms: Cl:
there is some operation * such that .f(c " b I a)
C2:
=
.f(c I b " a) *f(b Ia)
(I)
there is a function S such that .f(--,bla)
=
S(f(bla))
(2)
where --, b denotes not b. The following additional technical requirements are needed: C3:
* and S both have continuous second-order derivatives.
Then .f is proved to be isomorphic to a probability measure. Cheeseman (1985) proposes Cox's results as a formal proof that no other set functions than probability measures are reasonable for the modelling of subjective uncertainty. This claim can be disputed for two reasons. Although (I) seems very sensible as a definition of conditional credibility function, the purely technical assumption (C3) is very strong and cannot be justified on commonsense arguments. For instance* =minimum is a solution of (I) that does not violate the algebra of propositions, but it certainly violates C3. It is recovered as a valid solution as soon as C3 is relaxed to a more intuitive continuity assumption. A second objection concerns Axiom C2, which explicitly states that only one number is enough to describe both the uncertainties of b and --,b. Clearly, this statement rules out the ability to distinguish between the notions of possibility and certainty. This distinction is the very purpose of belief functions, possibility measures, and any kind of upper and lower probability system. Hence Cox's setting, although an interesting attempt at recovering probability measures from a purely non-frequentist point of view, does not provide the ultimate answer to the problem of justifying subjective probabilities. Dubois and Prade (1982) should be consulted for another axiomatic setting encompassing both probability and possibility measures as admissible models of subjective uncertainty. Namely the degree g(a) attached to proposition a should satisfy the following decomposability axiom: Dl:
if a " b =
0
then there is an operation .l such that g(a v b)= g(a) .l g(b)
(3)
These are called decomposable measures. They are usually different from belief functions in the sense of Shafer, although possibility and probability measures do belong to both settings. But many belief functions are not decomposable. Another comment concerns the interpretation of degrees of probability as degrees of intermediate truth. In our contribution to this book (Chapter to) we strongly argue against such a confusion. A degree of truth is not a degree of uncertainty about truth. In particular, the logical propositions considered in Paass's chapter are only either true or false. This distinction, in a probabilistic setting, is at least as old as Carnap's 0945) paper. However, Carnap's view of probability of a proposition a as the ratio between the number of possible worlds where a is true over the number of possible Worlds is not really convincing, since it assumes that the possible worlds are equally
248
G. Paass
probable-an assumption that is very difficult to check, and which turns out to be false in many cases. Actually, a probability measure on an algebra of propositions is justified if possible worlds are identified with a set of outcomes in a random process, and statistical data about this process are available.
Philippe Smets: F:
The author, like probabilists, translates the sentence
"if W then N" holds with probability p
as P(N I W) = p. This interpretation is questionable. One can translate F in at least two ways: 1:
P(NI W)
2:
P(--, W
=p
v N) = p
Suppose that an urn contains 100 balls. Let W =the ball is white, N =the ball is numbered. Suppose that there are 60 W" N balls, 15 W" --,N, 12 --, W" N, 13--, W" --,N. A ball is going to be taken randomly (each ball having a probability of O.Ql of being selected). If one translates F into "if I have extracted a white ball then the probability that the ball is numbered is p" then one has P(N I W) = p = 0.80. But one can also consider all extractions where the proposition W -+ N is true, i.e. whenever the ball is either W " N or --, W. One obtains the second translation with P(--, W v N) = p = 0.85. The two translations correspond respectively to 1':
N-+P(W)=p
2':
P( W-+ N) = p
The decision as to which translation is relevant can only be derived from information external to F. In medicine, with G = "if symptom S then diseaseD holds with probability p", one usually derives G from the fact that among those with symptomS a proportion p have disease D, in which case interpretation I holds. But it could also be that one receives the rule S -+ D from a professor whose proportion of correct assertions is p. Then interpretation 2 holds. A study of which proposition receives the I - p can be enlightening. In the case G, r is the proportion of time the professor tells the truth and I - p is the proportion of time the professor does not tell the truth. To be a good probabilist, one must then construct the probability distribution on all the sentences that the professor could assert given that he does not tell the truth, often an unreal requirement, and distribute the probability I - p among these sentences. With belief functions, the distinction between I and 2 disappears. One has bei(NI W) = c(bel(--, W v N)- bel(--, W)) (see Chapter 9, Section 3.6), with c = I or c = I - bel(--, W), depending on the openor closed-world assumption (Chapter 9, Section 4). . Let W and N be the spaces on which W and N are defined (i.e. W is the set of colours). When I have only the information "my belief is p that W-+ N is true", I build a belief function on W x N such that bei(N 1 W) = p and such that bel(cyl( W)) "" bel(cyl(--, W)) = 0 (Chapter 9, Section 6), with cyi(W) = (W, N) v (W, 1N). Then c = I and bei(N I W) =bel(--, W v N)
8
Probabilistic Logic
249
Therefore in the G case, the belief-function approach consists in allocating the mass p to S--+ D and the mass I - p to the tautology, avoiding in fact the obligation to enumerate the set of sentences that might be uttered by the professor when he lies, and to allocate to each of them a probability. To assimilate the probability of a conditional A --+ C to a conditional probability P(C 1 A) is hazardous, especially when considering iterated conditionals like A --+ (B--+ C) as shown by Lewis (1976) in his first triviality result. Let us define the--+ operator such that P(A --+ C)= P(C I A) for every A and C. Then one gets P(A --+ C I C) = P(C I A " C) = I P(A--+ CI--,C) = P(CI A
1\ I
C)= 0
For any D one has P(D) = P(DIC)P(C)
+ P(DI--,C)P(--,C)
If D is A --+ C then one obtains P(C I A)= I· P(C)
+ 0· P(--,C) =
P(C)
so A and C are probabilistically independent! Conditionals are highly delicate concepts (see Harper et a/., 1981 ), and their direct translation as conditional probabilities can be misleading.
Reply: First, I should like to discuss the objections of Frank Veltman to the definition of a probability measure using propositions. Recall that the set "'Y = {Wt. .. . , w. } of elementary propositions was defined as an exhaustive collection of consrstent and mutually exclusive statements about the world. Exactly one of these statements, Wj, is true. To remain in our example of medical diagnosis, the world is described by the characteristics of the next patient in the waiting room of the doctor. The possible worlds are the possible, logically consistent combinations W; of symptoms and diseases of this patient. The subjective probability measure of the doctor assigns a number 0 ~ p( U) ~ I to each subset U of "'Y according to the subjective degree to which he believes this subset to contain the fixed, but yet unobserved, realization Wj E "'f'", the true symptoms and diseases of the patient. Hence relevant propositions are stated with respect to a specific situation (a specific patient). A sample or a population is not necessary for the specification and evaluation of subjective beliefs by subjective probability measures. As long as the set of elementary propositions is finite, this conceptually simple setup may be utilized. Nilsson (1986, pp. 77ft"), for instance, shows how problems of firstorder logic may be solved by this approach. However, first-order logic and probability are only loosely connected in Nilsson's theory, as in a first step the "internal" consistency of possible worlds "'Y; has to be established within first-order logic, and in a second step probability theory is applied to derive the desired probability of consequences. Therefore I agree that integrated probability semantics for arbitrary first-order formulae-as proposed by Veltman-are more appropriate in this case. Let me now discuss the remarks of Veltman concerning the case of uncertain knowledge about probabilities. Here an independent external decision-maker is postulated who is able to judge the reliability of different experts. This decision-maker IS assumed to specify his subjective belief about the reliability of the ith expert for a series of hypothetical situations. In each such hypothetical situation he is asked to assume a specific "true" value, e.g. n; = 0.3, for the probability p(A) in question.
250
G. Paass
Subsequently, he is asked to specify his subjective probability that the number rr 1 that the expert wiii assign to p(A) will be lower then a specific value Ct. for example c, == 0.1. By specifying his subjective probability for different values c1, the decisionmaker can formulate his conditional subjective probability measure p(if 1 ln 1 = 0.3) for the situation that n 1 = 0.3. This process is repeated for other hypothetical situations with different values of n1• In this way, a subjective conditional probability measure p(if1 ln 1) can be gained, givinga complete picture of how the decision-maker judges the performance of the ith expert in different circumstances. Note that p(ifd n 1) contains no indication of the "true" probability in question. In the literature on group decision making, such an external decision-maker has been called "supra-Bayesian" (Genest and Zidek, pp. 120ff). It has several advantages over other approaches. (i)
If such a decision-maker exists then the pooling process is not a problem, as he can treat the experts' judgements as data and update his prior via Bayes' theorem (Genest and Zidek, 1986, p. 120).
(ii)
The likelihood solution corresponds to a Bayesian solution with noninformative priors.
(iii)
Many known approaches to combining subjective probability estimates can be understood as special cases. Forming a weighted average of the probabilities of the experts (linear opinion pool), for example, corresponds to the assumption of a normal error model with the inverse weights as variances.
I agree with Dubois and Prade that there are different ways to model subjective uncertainty. The selection of such a theory depends on the characteristics of the problem at hand. It has to be demonstrated, however, that a new formalism is internally consistent and offers more than well established approaches (e.g. probability theory). Otherwise it may have a confusing effect and ignore the rich theory developed for established paradigms. The interpretation of a sentence F: "if W then N" holds with probability p as P(N I W) =pis not essential to the approach of the paper. Depending on the situation of interest, the interpretation P(-, W v N) = p may be more appropriate, as demonstrated in the comment of Smets. Both interpretations state some characteristics of the joint probability measure, and may be used without problems with the techniques discussed in this chapter.
Additional references Carnap, R. (1945). The two concepts of probability. Phil. Phenomenol. Res. 5, 513-532. Cooke, R. M. (1985). Expert resolution: Proc. 2nd Conf on Analysis, Design and Evaluation of Man-Machine Systems. Pergamon Press, New York. Cooke, R. M. (1986). Probabilistic reasoning in expert systems reconstructed in probability semantics. Philosophy of Science Association 1986, Vol. I. Cox, R. (1946). Probability, frequency and reasonable expectation. Am. J. Phys. 14. 1-13. De Groot, M. and Fienberg, S. E. (1983). The comparison and evaluation of forecasters. Statistician, 32, 12-22. Dubois, D. and Prade, H. (1982). A class of fuzzy measures based on triangular norrns.
8
Probabilistic Logic
251
A general framework for the combination of uncertain information. Int. J. Gen. Syst. 8, 43-61. Fens tad, J. E. ( 1967). Representations of probabilities defined on first order languages. Sets, Models, and Recursion Theory (ed. J. N. Crossley), pp. 156-172. NorthHolland, Amsterdam. Gaifman, H. and Snir, M. (1982). Probabilitities over rich languages, testing and randomness. J. Symbolic Logic 47, 495-548. Harper, W. L., Stalnaker, R. and Pearce, G. (1981). Ifs: Conditionals, Belief, Decision, Chance, and Time. Reidel, Dordrecht. Lewis, D. ( 1976). Probabilities of conditionals and conditional probabilities. Phil. Rev. 85, 297-315. Also in Harper eta/. (1981), pp. 129-147. Lichtenstein, S., Fisch hoff, B. and Phillips, D. (1982). Calibration of probabilities: the state of the art to 1980. Judgement under Uncertainty: Heuristics and Biases (ed. D. Kahneman, P. Slovic and A. Tversky), pp. 306-335. Cambridge University Press. Lindley, D. V. (1982). Scoring rules and the inevitability of probability. Int. Statist. Rev. 50, 1-26. Los, J. (1963). Semantic representation of the probability of formulas in formalized theories. Studia Logica 14, 183-194.
9
Belief Functions PHILIPPE SMETS /RID/A, Universite Libre de Bruxel/es, Belgium
Abstract This chapter is a short self-contained presentation of the use of belief functions, a mathematical tool for the quantification of subjective, personal credibility.
1
INTRODUCTION
In order to delimit the problems covered by belief functions, we briefly describe various types of ignorance closely related to belief. There are at least three forms: possibilistic, probabilistic or credibilistic, each endowed with its own mathematical model. 1.1
Possibility
The information that "John's height is over 170 em" implies that, in describing John, any height h over 170 is possible and any height equal to or below 170 is impossible. This can be represented by a possibility function on the height domain whose value is 0 for h ~ 170 and I for h > 170 (where 0 = impossible and I = possible). Ignorance is due to the lack of precision, of specificity of the information "over 170". This type of ignorance can be generalized with statements like "John is tall". It implies that a height less than 160 em is impossible (value= 0) and a height above 180 em is possible (value= 1). In between, one may consider that the possibility takes some intermediate value between its extrema 0 and 1, the greater the height, the greater the possibility. Ignorance is due to the imprecision that results from the use of the fuzzy, vague, ill-defined term "tall". This type of possibilistic ignorance is covered by Dubois and Prade in Chapter I 0, and will not be discussed here. 1.2
Probability
Another form of ignorance results from randomness encountered in chance
254
P. Snwrs
set-ups. For example, when throwing a dice, the probability that the outcome . . I IS one IS "6· This model can be generalized by considering that the probability of each event is not known as a real value between 0 and I, but as belonging to an interval. This results in the "upper and lower probability theory" (Good 1950; Smith, 1961; Dempster, 1967, 1968). This theory should not be confused with the one covered by belief functions. It requires the existence of an underlying probability whose value is known only to be within a crisp interval. It has been further generalized when probability is known as a fuzzy number (close to 0.6), as a linguistic variable (small) or to lie within a fuzzy interval (approximately between 0.4 and 0.5) (Zadeh, 1975). Another generalization is obtained by the introduction of some metaprobability that describes our knowledge about the value of the unknown but existing probability (Lindley et al., 1979). This meta-probability expresses in fact our degree of belief about an unknown probability, where degrees of belief are quantified by a probability function. It is a particular form of Bayesian probability. Furthermore, it is also a special case of the theory of belief functions when belief is quantified by a probability function, a particular form of belief function described below. 1.3
Credibility
Belief functions aim to model and to quantify the subjective, personal credibility (called belief hereinafter) induced in us by evidence. Some evidence is strong enough to induce knowledge: if it is II a.m. then I know it is daytime. Other not so definite evidence may induce only a belief: given the information available on 15 July, 1986, I believe that I will be in Cordes on 22 September, 1986. This belief can be more or less strong, thus admitting degrees of belief. Bayesian probabilists have claimed that this degree of belief can be quantified by probability functions whose major axiom, the additivity axiom. states that the probability of the union of two disjoint events is the sum of the probabilities of each event (Fine, 1973). The Bayesian approach is usually justified by axioms describing decision processes or betting behaviour Within such a context, our belief can indeed be described by a probability function, but it does not follow that our belief should always be so modelled Belief can exist outside any decision or betting context. It is a cognitive process that exists per se. The Bayesian argument implies only that when we face a decision problem we must be able to construct a probability function based on our belief. This chapter presents a model to quantify someone's degree of belief based on belief functions (Shafer, 1976). Within the AI community, it is often called
9
Belief Functions
255
the Dempster-Shafer model, an unfortunate denomination which allows too widespread a confusion between upper and lower probabilities and belief functions, the first dealing with an imprecisely known underlying probability, the second with the intensity of our credibility. Some modifications of Shafer's initial model are introduced, essentially the distinction between the open- and the closed-world assumptions and its impact on the normalization. This chapter is a short self-contained exposition of the whole theory, but Shafer's ( 1976) highly readable seminal book should be read before really pursuing the topic. The model developed here should not be confused with those found in recent AI research papers on belief networks (Pearl, 1986a, b). In these papers beliefs are quantified by classical Bayesian probabilities, and the problem under consideration is their implementation for AI applications. Section 2 of this chapter discusses the nature of the frame of discernment on which a degree of belief will be established, and presents the distinction between open- and closed-world assumptions. Section 3 introduces the general model and presents an example. Section 4 presents the relevant mathematical definitions and properties. Our presentation considers algebra of propositions and not sets as is often done. Both approaches could have been used. Our choice reflects a personal taste and the idea that the concept of the truth of a proposition precedes that of belonging to a set. Section 5 presents Dempster's rules of conditioning and combination. Section 6 presents Bayes theorem, generalized within a belief-function framework (Smets, 1978). Section 7 discusses discounting evidence, i.e. what to do when the available evidence is not reliable. Section 8 presents canonical experiments that can explain the meaning of the numerical value of the belief given to a proposition A. Section 9 concludes and presents some hints about the use of this model for automated reasoning.
2 THE FRAME OF DISCERNMENT 2.1
Open- and closed-world assumptions
In most probability theories, as well as in Shafer's theory, one starts by postulating some frame of discernment A (also called the Universe of Discourse or the Domain of Reference) on which evidence induces some belief. In reality, the cognitive process is hardly as simple. When faced with a cognitive problem, one starts by constructing the set KP of those propositions Known as Possible. But there is also (I) the set UP of Unknown Propositions for which we have no idea whether they are possible or impossible, and (2) the set K I of those propositions Known as Impossible. In
256
P. Smets
the classical approach, one considers that UP is empty and accepts the highly idealized closed-world assumption, i.e. that the truth is necessarily in KP and that A is KP. The content of the three sets depends not only on the problem studied, but also on the pieces of evidence available. As evidence becomes available, propositions are redistributed between the three sets. (1)
A proposition A is transferred from KP to Kl when the evidence permits the claim that A is impossible. This corresponds to the classical concept of conditioning.
(2)
A proposition A is transferred from UP to KP if the evidence induces us to consider as possible some forgotten propositions.
(3)
A proposition A is transferred from UP to Kl if the evidence induces us to consider that some forgotten propositions are in fact impossible. In practice, this has no direct impact, as the degrees of belief are constructed only on KP.
(4)
Transfer from K I to K P or UP and from K P to UP would be inconsistent with the definition of the three sets, if one accepts, as here, that the allocation of any proposition to one of the three sets is always correct. A true proposition may be correctly allocated to KP and UP. and a false proposition may be correctly allocated to K P, K I or UP.
A true proposition may not be allocated to Kl, and any propositiOn allocated to Kl will stay in Kl, inducing monotonicity for the impossible (false) propositions. The generalization could be considered by accepting that a true proposition might be in Kl and constructing some meta-belief function on the set of all propositions that expresses the degree of belief that each proposition can belong to any of the three sets. The closed-world assumption postulates an empty UP set. The open-world assumption admits the existence of a non-empty UP set, and the fact that the truth might be in UP. 2.2
Notation
This presentation of the frame of discernment is formalized as follows. One writes -,, v, 1\ and => for the negation, the disjunction, the conjunction and the material-implication connectives. The set K P will be based on A, a finite set of elementary propositions. Let D be the boolean algebra of propositions derived from A, i.e. Q contains the
9
Belief Functions
257
conjunctions, disjunctions and negations of any set of propositions of A. Let ln be the tautology relative to n, i.e. ln is the disjunction of all elementary propositions of A. Let On be the contradiction relative to n, i.e. none of the propositions of A implies On. Then the conjunction of any two distinct propositions of A is On. The set UP will be denoted by e. No details about its structure and about Kl are needed. Any support given by some piece of evidence to some proposition A of n is in fact given to A v e. In order to simplify the notation, we shall not repeat the disjunction with e, but it must be unstood that whenever a proposition A of Q is mentioned, it corresponds to A v e. The proposition On is not the contradiction, as it corresponds to On v e. There would be a contradiction if e was empty (the closed-world assumption). The proposition ln corresponds to ln v e and is thus a tautology as all propositions in KJ are false by definition. Negation of any proposition A of n, symbolized by --,A, is taken relatively to A. So --,A is the disjunction of e and any elementary proposition of A not implying A. The E symbol is used with the following meanings: A E A means that A is an elementary proposition of A; A E n means that A is a proposition of n;
for BEn, A E B means that A is an elementary proposition implying B. Thus OnE n is true but OnE A and OnE Bare false as On is not an elementary proposition. (Being an element of the algebra n is different from being an element of an element of Q.) For any A En, IAI is the number of elementary propositions BE A such that BE A. For A, BEn, the symbol A ..... B means "it is true that A implies B", i.e. A and B are such that whenever X E A, then X E B. Note that On -+ A can be asserted for all A in Q. We say that a proposition BEn is based on some elementary proposition A of A if A E B.
3 3.1
QUANTIFICATION OF DEGREE OF BELIEF General model
Suppose that there is a piece of evidence that induces in us some belief concerning the truth of propositions defined on a finite frame of discernment
258
P. Smers
A with !l being the Boolean algebra derived from A. It is postulated that there exists some finite amount of belief that is spread among the various propositions A of n according to the available evidence. For instance suppose that Mrs Jones has been murdered and we, the judges, know that the suspects arc Peter, Paul and Mary. Thus A= {Peter, Paul, Mary}. Given the available evidence, parts of the amount of belief are allocated to each of the three potential murderers, as in Bayesian model. But some evidence might support something other than only one of the three persons. Such is the case of the evidence "the mufderer is a male". This evidence supports A ="Peter or Paul", and we allocate some part of m of our total mass of belief to A without being able to split it between the two components of A. In such a case, probabilists usually invoke the Principle of Insufficient Reason or an argument of symmetry to decide that the mass m must be split into two equal parts, one for Peter and one for Paul. The originality (and the power) of Shafer's model is that it does not evoke these principles and leaves the mass m allocated to the proposition A. The total amount (mass) of belief is arbitrary, but is conveniently scaled to I without any loss of generality. The non-negative mass m(A) allocated to the proposition A En that cannot be allocated to any proposition A' such that A' -+ A, A' # A is called a basic probability number by Shafer ( 1976). (A -+ B is short for "it is true that A implies B'.) The function m: n-+ [0, I] is called a basic probability assignment whenever
L
m(A) =I
A-1 11
where ln is the tautology relative to n. The notation
L
means that the sum
A-B
is taken over all propositions A En that imply BEn, or over all propositions BEn implied by A E !l, depending on which symbol A orB is not fixed by the context. Any A such that m(A) > 0 is called a focal proposition.
3.2
Practical example
As a practical example, suppose that we are the judges and must analyse the available evidence concerning Mrs Jones' case. Three witnesses provide evidence (testimonies). Let the three pieces of evidence be symbolized by E ,. E2 and E 3 . E1:
Witness I is a janitor, who claims he heard the victim yelling and then saw a small man running out of the victim's house.
9
259
Belief Functions
E2 :
Witness 2 is an old lady, who lives across the street from the victim and who saw the crime through her window and claims the murderer was much taller than the victim.
E3 :
Witness 3 is Peter's girlfriend, who testifies that Peter was at her home far away from the victim's house when the crime happened.
How do we evaluate the meaning of these three pieces of evidence, how do we quantify their respective support for the potential murderers, how do we combine these supports, and what do we do if doubt can be cast about the quality of the testimonies. Let k symbolize the killer. E 1 supports that k is a man. Furthermore, k looks small, which fits Paul or Mary, both being small, but not Peter, who is quite tall. But as the janitor was far from the house, his opinion about the tallness of the man he saw running is doubtful, as is his testimony about the sex, as Mary has short hair and could have worn slacks. The impact of E 1 on n can be summarized by three masses, one pointing to {Peter or Paul}, one pointing to {Paul or Mary} and one unallocated, i.e. pointing to 10 . E 2 suggests Peter, but as the witness is short-sighted and claims she had taken off her glasses just before looking through the window, some reservation must be allocated concerning the value of her testimony. The impact of E 2 can be summarized by two masses, one pointing to {Peter}, the other being unallocated. E 3 suggests Paul or Mary. But as the witness is Peter's girlfriend, serious doubts must be put on her testimony. The impact of E 3 can be summarized by two masses, one pointing to {Paul or Mary}, the other being unallocated. Indeed, if the witness is lying, E 3 does not support that Peter is the killer, it only makes her testimony meaningless. Table 1 (columns m" m 2 and m3 ) presents the masses quantifying the impact of the three pieces of evidence on n. The evaluation of the masses is not discussed in this chapter; it will be briefly discussed in Section 8. Table 1
Masses derived from the three pieces of evidence, and their combination Q
m,
mz
mJ
On Peter Paul Mary Peter or Paul Peter or Mary Paul or Mary Peter, Paul or Mary
0.6
0.5 0.2 0.3
m,z
m,zJ
0.12 0.48
0.36 0.24 0.10 0.00 0.10 0.00 0.14 0.06
0.20
0.4
0.5 0.5
0.08 0.12
260
P. Sme1s
The present example is based on external evidence (testimonies). But one could just as well have used internal (objective) evidence like the fingerprints found on the weapon or the knowledge that the killer smokes a certain brand of cigarettes. 3.3
Combination of evidence
Pieces of evidence are combined by the application of Dempster's rule of combination on the basic probability assignments. The product of the masses induced by two distinct pieces of evidence is allocated after combination to the conjunction of the two focal propositions. Let mi(A) be the masses derived from evidence Ei, i = 1, 2, and let mu(A) be the mass obtained after the combination of pieces of evidence E 1 and E 2 . So m 1 (A) * m2 (B) is allocated to the conjunction A 1\ B. All such possible products are computed and all masses allocated to the same proposition are added together: mu(A) =
L
m 1 (A v X)m 2 (A v Y) =
x--,A
L
m 1(X)m 2 (Y)
x"'r=A
y--,A X"' Y =011
m123 is computed by combining m 12 with m3 in the same way. Table presents the results. Dempster's rule of combination is associative: whatever the order in which basic probability assignments are combined, the results are identical. 3.4
Belief and plausibility
The quantity m(A) measures the amount of belief that one commits specifically to A, not the total belief that one commits to A. Each mass m(A) supports also any proposition implied by A. Therefore the total degree of support (belief) that we have about the fact that a proposition A is true is obtained by adding all the masses m(B) allocated to propositions B that imply A without implying -,A (which means that On must be discarded from the sum). The degree of belief given to A is quantified by the belief function bel: n ..... [0, 1], with bel(A) =
L
m(B)
(3.1)
B-A
B,<011
By definition, bei(On) = 0, even though m(On) might be positive (see Section 4.8). The plausibility of a proposition A is the sum of the amounts of belief that are allocated to proposition B and that do not contradict A, i.e. do not imply-, A. The degree of plausibility given to A is quantified by the plausibility
9
Belief Function.'
function pi:
n---+
261
[0, 1], with
pi(A)
L
=
m(B)
B" A ;<011
It is related to bel through
pl(A) = bel(l 0 ) - bel(-, A)= 1- m(00 ) - bel(1A) The meanings of "belief" and "plausibility" are still controversial. One might prefer to call bel( A) the degree of minimal or necessary support (entailment, commitment) for A, and pl(A) the degree of maximal or potential support (entailment, commitment) for A. We shall use hereinafter the words belief and plausibility as they are those most often used in the present context, even though that usage might be subjected to criticisms. Table 2 presents the degrees of belief and plausibility derived from the data of Table 1. Table 2
Belief and plausibility functions derived from the data of Table I Q
On Peter Paul Mary Peter or Paul Peter or Mary Paul or Mary Peter, Paul or Mary
4
bel 1
pl.
bel 2
pl2
bel 3
pl3
bel 123
pll23
0.0 0.0 0.0 0.0 0.5 0.0 0.2 1.0
0.0 0.8 1.0 0.5 1.0 1.0 1.0 1.0
0.0 0.6 0.0 0.0 0.6 0.6 0.0 1.0
0.0 1.0 0.4 0.4 1.0 1.0 0.4 1.0
0.0 0.0 0.0 0.0 0.0 0.0 0.5 1.0
0.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0
0.00 0.24 0.10 0.00 0.44 0.24 0.24 0.64
0.00 0.40 0.40 0.20 0.64 0.54 0.40 0.64
MATHEMATICAL PROPERTIES OF BELIEF FUNCTIONS
4.1
Belief functions
In Section 3 three functions have been defined on 0: the basic probability assignment m, the belief function bel and the plausibility function pl. Belief functions satisfy the following inequalities: = 1 - m(00 ) ~ 1
(1)
bel(1 0 )
(2)
for every n > 0 and every collection A 1 , A 2 ,
be{
y Ai) ~ f bel(AJ - L bel(Ai 1\ i>j
Aj) + .. · + ( -1)-n bel(AI
.. . ,
1\
An En,
A2
1\ ... 1\
An)
(4.1)
262
P. Smets
Shafer starts with the idea that degrees of belief satisfy these inequalities, arguing for instance that the belief in the disjunction of two propositions should at least contain the sum of belief allocated to each reduced by the belief allocated to both, the corresponding equality encountered for probability functions being unjustified. Unfortunately, this requirement is not sufficient to define belief functions, and one must postulate these inequalities for all n. Critics of Shafer's approach argue against having to postulate all these inequalities, an excessive and not very natural requirement. These criticisms justify why this presentation starts with non-negative basic probability numbers, rather than with belief functions. When all focal propositions are elementary propositions, the inequalities (4.1) become equalities, bel(A) = pl(A) for all A en, and the belief function becomes a probability function. Therefore Bayesian probability theory is a particular case of the theory of belief functions. One could consider the use of belief functions that must satisfy the inequalities (4.1) only for n:::; K < oo. In that case, the representation of bel based on the relation (3.1) might imply that some masses m are negative (Chateauneuf and Jaffray, 1986). A theory of belief where negative masses are allowed could be considered, but its meaning is not yet clear for the present author. The model based on the basic probability assignment could be abandoned, and another theory based on monotone capacities of finite order could be advocated. But the inequalities then have to be justified for all n :::; K. Shafer, in his book, indeed starts by postulating these inequalities for K = oo. That approach has been criticized as unnatural and is not proposed here. It seems easier to grasp the concept of positive masses spread among propositions then of the inequalities. When K = oo, both approaches lead to the same solution, but this is not the case when K is finite. 4.2
Communality function
A fourth function, the communality function, is defined on n, but its meaning is not obvious, which probably explains why it is so rarely mentioned. Its importance is nevertheless enormous from a computational point of view and for proving theorems. The communality function q is a function q: n-+ [0, I] such that q(A)
L
=
m(A v B)
B-•A
The four functions m, bel, pi and q define each other uniquely. Among other relations, one has m(A)=
L B-A
8;<00
(-lt-bbei(B)
9
Belief Functions
with a- b = lA
263 1\
1BI; m(A) =
L (-l)bq(A
v B)
n--,A
bel(A)
+ m(On) = L (-ltq(B) n--,A
q(A)=
L
(-l)bbel(-,B)
B-A
with b = IBI. 4.3
Vacuous belief function
The vacuous belief function is defined such that m(ln) = I bel(ln) = I bei(A) = 0
for all A -:f. ln
pi(A) = I
for all A -:f. On
q(A) =I
for all A en
It describes the belief one obtains in cases of total ignorance. Total ignorance has always been troublesome for the Bayesians, leading to strong controversies. It is either just rejected as non-existent-a Procrustean solution not followed here-or solved by the application of the Principle of Insufficient Reason: if one has k elementary propositions and there is no reason why any should be supported more than (more credible than) any others, then split the probability mass equally among them. But this does not represent Total Ignorance. There is no reason why some disjunction of elementary propositions should be more supported than any other. So one must have bei(A) equals some constant c ~ 0 for all A En, and not only for the elementary propositions A E A. Of course, this is not possible with probability functions. With belief functions, this means that with A and B such that A 1\ B =On, one has the inequality bei(A v B) ~ bei(A) + bei(B) and thus c ~ 2c; therefore c = 0 is the only solution, and it indeed satisfies all the inequalities characterizing belief functions. It corresponds to the highly logical basic probability assignment by which m(ln) = I and all other masses are null, ln being the only supported proposition. 4.4
Simple support functions
A belief function is called a simple support function (SSF) if it has at most one focal proposition different from ln. This focal proposition is called the focus
264
P. Sme1s
of the SSF. The pieces of evidence E 2 and E 3 in the example of Section 3.2 are such SSFs. SSFs correspond to a very elementary form of belief functions, the case where the evidence points partially toward a unique proposition. SSFs are a particular case of consonant belief functions. 4.5
Consonant belief functions
Consonant belief functions (Shafer, 1976) are belief functions for which the masses are allocated on focal propostions A 1 , A 2 , .•• , A. such that A 1 --+ A 2 • A 2 --+ A3 , ... , A.- 1 --+A •. In that case, bei(A " B)= min {bei(A), bei(B)}}
(4.2)
pi(A v B)= max {pi(A), pi(B)} Such bel and pi functions are called respectively necessity and possibility functions in Dubois and Prade ( 1985). Consonant belief functions are only particular cases of belief functions, but they are too restrictive as a general model to be used to quantify degrees of belief. The fact that degrees of necessity and possibility should satisfy the relation (4.2) and that some particular pair or family of belief functions also satisfies these relations does not mean that the two concepts, possibility/plausibility and necessity/credibility, can be confused. Different concepts can share the same mathematical model without sharing the same interpretation. 4.6
Dempster's rule of combination
In this presentation, each time one of the functions m, bel, pi or q is introduced with some supplementary symbols, we shall abstain from defining each one in relation to the others. This avoids the need to explicitly define m 1. pl 1 and q 1 as being the basic probability assignment m 1 , the plausibility function pl 1 and the communality function q 1 related to the belief function bel 1• The simple declaration of one of them automatically implies the others. the supplementary symbols being sufficient to know which ones are interrelated. Given two belief functions bel 1 and bel 2 induced by two distinct pieces of evidence, the belief function bel 12 that results from their combination is obtained by Dempster's rule of combination (see Section 3.3). Expressed with communality functions, Dempster's rule of combination becomes
a relation whose simplicity explains the advantage of these functions.
9
Belief Functions
4.7
265
Dempster's rule of conditioning
Suppose that we have a basic probability assignment m on n obtained after considering some initial evidence. Then suppose that we learn from a new piece of evidence that the truth is necessarily in B e !l, and thus that all propositions based on the elementary proposition A E -, B are impossible. How does this evidence modify our basic probability assignment? Let m' be the basic probability assignment obtained after taking the new evidence into account. To construct m'(A), three situations must be considered, depending on the relation between A en and the conditioning proposition BE !l. ( 1)
A -+ B. The evidence that the truth is in B does not modify the part of our total belief mass supporting A.
(2)
A 1\ B = A 1 =f. On and A 1\ -,B = A 2 =f. On. The mass A was allocated by m to A 1 v A 2 with A 1 -+ Band A 2 -+ 1B. We learn that the truth is in B; and therefore the mass that was allocated to A 1 v A 2 is transferred to A 1 , the only part of A that is compatible with the new evidence that asserts "the truth is in B".
(3)
A -+ -,B. The evidence that the truth is in B tells that all elementary propositions in A are impossible. Thus the mass m(A) is transferred to On.
Therefore m'(A) =
L m(A { c--,B
v C)
0
for all A
-+
B
otherwise
This rule is called Dempster's rule of conditioning. It implies m'(On) =bel(-, B)+ m(On) I' A _ {bel(A v 1B)- bel(1B) be ( ) - bel'(A 1\ B)
pl'(A) = pl(A q'(A) =
{
1\
q(A)
0
B)
for all A -+ B otherwise
for all A E !l
for all A -+ B otherwise
Returning to the murder case, suppose that a definitive piece of evidence tells that Peter is not the murderer; then B = {Paul, Mary}. For instance, the portion of belief that was allocated to Paul and/or Mary are unchanged, the portion that was given to "Peter or Paul" now supports Paul alone, and the portion that was given to Peter is transferred to On.
266
P. Smets
Conditioning on a proposition B is in fact a special case of Dempster's rule of combination where one combines bel with a belief function with only one focal proposition B receiving the whole mass. 4.8
Differences from Shafer's model
Shafer's original model includes the requirement m(On) = 0. (Remember that On is short for On v e, but Shafer postulates that e is empty-he always postulates the closed-world assumption.) We feel that this is unnecessary and may lead to unsatisfactory results (see Section 5.2). In the present murder case of Section 3.2, m(On) corresponds to that amount of belief allocated to none of the three suspects. We must always keep in mind that the murderer might be someone else, for example all pieces of evidence pointing to Mary and not to Peter and Paul, point in fact to "Mary or someone else other than Peter or Paul". In particular, m(On) is the amount of belief allocated to the proposition that none of the three suspects is the murderer. Had we received the evidence that the murderer must be one of the three suspects (the closed-word assumption of Section 2) then this new evidence would induce some conditioning that would imply m(On) = 0. The fact that m(On) might be non-null implies that the evidence is by nature essentially negative in that it allows one to discard some propositions. Indeed, all pieces of evidence pointing to Mary are essentially not supporting "Peter or Paul". The method of reasoning simulated by this approach is closer to an elimination procedure than to a constructive procedure. A support for a proposition is in fact a non-support for its negation taken relatively to 11. To understand m(On) > 0, one must accept the open-world assumption and consider that any amount of belief allocated to a proposition A E Q is in fact allocated to A v e, where e is the set UP (Section 2.1 ). Then m(O!l) represents the mass allocated to e. In the open-world context, ---,A ED means the complement of A relative to 11, and the mass allocated to ---,A E D is in fact allocated to ---,A v e. To say that A En and BEn contradict each other means that there are no elementary propositions in 11 that simultaneously imply A and B. If witness I points to Peter and witness 2 points to Paul, they contradict each other. If they are perfectly reliable, it means that the murderer is not in 11 = {Peter, Paul, Mary} (see Section 5.2). Shafer's approach postulates beforehand the closed-world assumption. If one were to define n such that it included e then this would lead to the same basic probability assignment as with the open-world assumption if one took care never to allocate some masses to propositions of Q that did not include e. We feel that it is easier to use the restricted nand to allow positive masses for On, keeping in mind that all masses given to propositions A E Q are always
9
267
Belief Functions
allocated to A v 8, except if the closed-world assumption is explicitly expressed. The major impact of Shafer's postulate m(On) = 0 is that the results obtained from Dempster's rule of combination must be renormalized in order to keep the total mass equal to I. Therefore he divides each mass obtained by Dempster's rule of combination by a constant corresponding to 1 - m(00 ).
5
COMBINATION OF TWO BELIEF FUNCTIONS
5.1
Axiomatic justification of Dempster's rule of combination
Suppose that we have two belief functions bel 1 and bel 2 induced by two distinct pieces of evidence. The question is to define a belief function bel 12 = bel 1 EB beh resulting from the combination of the two belief functions, where EB symbolizes the combination operator. Dempster's rule of combination can be justified by the following axioms. A1:
compositionality: beidA) is a function of A, bel 1 and beh only;
A2:
symmetry:
A3:
associativity:
(bel 1 EB beh) EB bel3 = bel. EB (beh EB beh) A4:
conditioning: if bel 2 is such that m 2 (B) = 1 then
m12(A) =
L m { c--,B 0
1 (A
v C)
for all A
--+
B
otherwise
The axiom of compositionality A 1 claims that the combination is a functional of both belief functions and maybe A, but nothing else. The axiom of symmetry A2 and the axiom of associativity A3 tell as that the result of the combination of pieces of evidence is independent of the order in which they are considered and/or they are associated. The axiom of conditioning A4 has been justified in Section 4.7. It implies that if bel 2 is vacuous then bel 12 = bel 1. Axioms A1-A4 imply that
ql2(A) =f(A, {q 1 (B): B--+ A}, {q2(B): B--+ A})
268
P. Smets
Axiom AS expresses the idea that the result of the combination will not be modified by a permutation among the elementary propositions of A:
internal symmetry: let A be a set of distinct elementary propositions A" A 2, .. . , An E A; let the propositions B" B 2, .. . , Bn be a permutation of the propositions A 1 , A 2, .. . , An; and let qi and q; be the sequences of communalities used in qu(A), with
AS:
qi = {qi(Ad, qi(A2),qi(A, v A2),qi(A3), ... ,qi(A, v A2 v ... vAn)} q;
=
{qi(B,),qi(B2),qi(B, v B2),qi(B3), ... , qi(B, v B2 v ... v Bn)}
then
f(A, q,, q2) = f(A, q',, q2) Axiom A6 considers that the mass m!2(A) given to A En is independent of the masses given by m 1 (and m2) to propositions B-+ 1A: A6:
autofunctionality: \fA E!l, A =1-lr19 m12 (A) does not depend on m 1(X) for all X-+ --,A;
A7:
three-elements: there are at least three elementary propositions in A.
A8:
continuity: for all q2(A), q 12 (A) is continuous as q 1(A)-+ 1.
The three-element axiom seems hardly critical. The continuity axiom 1s needed only to eliminate an uninteresting degenerate solution.
Theorem Uniqueness of Dempster's rule of combination: given axioms A 1A8, for all A E Q
All proofs and details about this axiomatic justification are in Smets ( 1986a).
5.2
Normalization
In order to distinguish between Shafer's definitions and ours, we use capital letters as first letter for the four functions as defined by Shafer. When Shafer introduced his model to quantify degrees of belief, he postulated M (00 ) = 0 and Bel(l 0 ) = 1. So, after combining two belief functions, he had to normalize the results in order to get Bel 12 (1 0 ) = 1. This is obtained by computing mu(A) as done here and then proportionally rescaling it into M !2(A) = mu(A)/{ l - m!2(0 0 )}. This normalization seems natural, but has been seriously criticized by Zadeh (1984) with the next counter-example. Suppose that we have a murder case with three suspects: Peter, Paul and
9
269
Belief Functions
Table 3
Peter Paul Mary
Witness I
Witness 2
M12
m12
0.99 O.ol 0.00
0.00 O.ol 0.99
0.00 1.00 0.00
0.00 0.0001 0.00
Mary, and two witnesses. Table 3 presents the degrees of belief of each witness about who might be the murderer. Witness 1 is sure that it is not Mary, that it is most probably Peter, but that it might also be Paul. Witness 2 holds similar beliefs except for the permutation between Peter and Mary. How can these two quite contradictory pieces of evidence be combined. Shafer's original solution M 12 leads to the conclusion that Paul is certainly the murderer. Zadeh does not accept this solution, as it gives full certainty to a solution (Paul) that is hardly supported at all. In fact, in the totally different situation where both witnesses had been sure that Paul was the murderer, the combined solution would have been the same M 12 . The solution m 12 within the present theory seems much more realistic as it shows a little support for the conclusion Paul, but On is highly supported (mdOn) = 0.9999). Keeping in mind the meaning of On given in Section 2.1, the most obvious conclusion one should have in the present situation is that the real murderer must be a fourth person, i.e. the solution is in the set e = UP and not in the set Q = KP ={Peter, Paul, Mary}. There is of course another way to handle the present inconsistency. The pieces of evidence are combined by a judge who obtains evidence from two witnesses, each expressing his own belief. The judge must also consider his own belief about the reliability of the witnesses. So one could introduce a meta-belief function representing the degree of belief held by the judge about the assertions of each witness. Discounting (Shafer, 1976) is one way to take into account this meta-belief (see Section 7). What represents the normalization in the present theory. Suppose that we are presented with the further piece of evidence: "The murderer is necessarily one of the group Peter, Paul or Mary". How might we accommodate this "closed-world conditioning" (UP is empty), i.e. how do we transform m12 into m'1 2 such that m'12 (0n) = 0. We must somehow reallocate m 12 (0n) to propositions of n in order to keep the sum of all the masses m'12 equal to 1. The general solution is given by m'dA) = mdA) m'12 (0n) = 0
+ c(A, m., m2)m.2(0n)
\fA en, A #-On
270
P. Sme1s
Shafer's solution corresponds to c(A, m" m2 ) = mu(A)/{1 - mu(00 )}. It can be obtained if one requires that relative degrees of belief (or plausibility) should stay constant after considering the closed world conditioning. Definition The closed-world conditioning corresponds to the impact of the absolutely certain proposition "UP = 0".
We have another axiom: A9:
let bel' be the belief function obtained from bel: !l --+ [0, I] after closed-world conditioning; then m'(0 0 ) = 0 and VA, BE !l, A, B i= On bel'(A)/bel'(B)
=
bel(A)/bel(B)
The last equality can be equivalently replaced by pl'(A)/pl'(B)
=
pl(A)/pl(B)
Axiom A9 implies that bel'( A) = c ·bel( A) with c independent of A. As bel'(l 0 ) = c(l- m'(00 )) =I, c = 1/{1- m(00 )}, as in Shafer's solution. In this chapter the combination operator EB has been considered under the open-world assumption and it has been shown that Shafer's normalization can be assimilated to the impact of the closed-world conditioning. It takes into account Zadeh's critics because if the closed-world assumption is true then the only murderer is Paul, as Peter and Mary have been eliminated by witnesses I and 2 respectively. By elimination, only Paul remains, as shown by M12· The real paradox in the counter-example lays not so much in the normalization but in the acceptance of the closed-world assumption. In a real-world situation it is obvious that if one can really believe both witnesses then one should seriously question the closed-world assumption. The solution m 12 has the advantage of showing the practical impact of the closedworld conditioning, which was not visible with Shafer's solution.
6
GENERALIZED BAYES THEOREM
Suppose that we have two finite sets of elementary propositions X and Yand let !lx and !lr be the finite Boolean algebras derived from X and Y. Let x E X and y E Y denote the elementary propositions of X and Y. Within probability theory, Bayes' theorem permits the computation of P(y I A), the a posteriori conditional probability distribution on Y given A E !lx from the set {P(x I y): y E Y} of conditional probability distributions on X given each elementary proposition y E Y and P(y), an a priori
9
271
Belief Function.\·
probability distribution on Y. One has P(y I A)= P(A I y)P(y)
/Jr
P(A I z)P(z)
This formula is based on the idea that there is an underlying joint probability distribution on the product space W = X x Y (with !lw the corresponding Boolean algebra) such that the various conditional probability distributions and the a priori probability distribution can be deduced from it respectively by conditioning and by marginalization. We have generalized this theorem when all probability distributions are replaced by belief functions (Smets, 1978). For each singleton y E Y let belx(. I y) be a belief function on the space X. (The dot corresponds to the variable whose domain is indicated by the subscript in belx(. I y).) Suppose that the a priori belief on Y is vacuous. Given these belief functions, the belief function belw on W = X x Y is constructed by the following steps. (1°)
Build the vacuous extension belw,y:2w-+[O,I] of belx(-ly) such that (a) its conditioning on the set {(x, y): x E X} is equal to belx(. I y), and (b) belw,y is the least iriformative belief function among all the ?elief functions bel~ that satisfies (a), i.e. belw,y(w) ~ bel~(w) Vwe!lw.
(2°)
liB-Combine the belw,y: belw
(3°)
Condition belw on x E !lx, the result being belw,..,: 2w-+ [0, 1].
(4°)
Marginalize belw,.., on the space Y into belr( .1 x), with belr(A I x) = belw,..(w), where w = {(x;, y;): X; Ex, y; E A}
=
belw,y 1 tB belw,y2 tB ... tB belw,yn·
The result is VA E !ly,X E nx belr(A I x) = {
fl
belx(lx I y)-
ye•A
plr(A I x) =
fl
belx(lx I y)}
yeY
{t - fl
(I - plx(x I y))}
yeA
qy(A I x) =
fl
plx(x I y)
yeA
If the closed-world assumption is accepted then the terms in the three relationships must be divided by I - c, where
c
=
f1 yeY
belx(lx I y)
272
P.
Sm<'ls
An interesting case is obtained when beJx(. I y) is vacuous for one y. Let it be so for y'. Then c = 0. The results do not depend on the closed- or open-world assumptions. (See also the diagnosis of a still unknown disease at the end of this section.) This generalized Bayes theorem satisfies the following requirements: (I
0
)
bely(. I x) is a function of {(belx(x I y), plx(x I y): y E Y };
(2°)
ply(y I x) is proportional to plx(x I y) for y E Y, Vx E !lx;
(3°)
if (i) plx(x I y) is a probability distribution function P(x I y) on X Vy E Y; (ii) x 1 and x 2 are two independent observations on X randomly selected according to P(x I y); (iii) belr(. I x;), i = I, 2, is the belief function describing the impact of the observations X; on the set Y, i.e. derived from P(x; I y); and (iv) belr(. I x 1, x 2) is the belief function describing the impact of both observations x 1 and x 2 considered simultaneously, i.e. derived from P(x" x 2 1 y) = P(x 1 I y) · P(x 2 I y); then bely(.l x1, x2) = belr(.l
xd EB bely(.l x2)
If there is a non-vacuous a priori belief function on Y, it must be EB combined with bely(. I x). If all belx(. I y) and the a priori belief function are probability functions then bely(A I x) = ply(A I x) and the generalized Bayes theorem reduces to the classical Bayes theorem. Let A E !lx and BE !ly. The vacuous-extension requirement is such that belx(A I B)= belw(A or 1B) = belw(B ::::J A), where (A or 1B) = (B ::::J A)= {(x;, y;): X; E A or y; E 1B) defined on Wand belw(B ::::J A) is the degree of belief of the material implication (B ::::J A). In the present situation at least, conditional belief and belief of conditionals (material implications) are identical concepts (Lewis, 1976). From belw, one can also compute belx(A I B) for A E !lx and BE !lr (i.e. when B is not an elementary proposition of Y). Suppose there is some a priori belief function bel 0 on Ythat has been combined with belw. Conditioning on B E !ly gives (Smets, 1978) belx(A
I B)
=
t.);om
(m 0 (y') ,.Dy' belx(A I y))}
I
pl 0 (B)
(6.1)
The importance of the generalized Bayes theorem is obvious for any inference when the available information is quantified by belief functions. For instance, for medical diagnosis problems Y is the set of mutually exclusive diagnosis, X is the set of symptoms, belx(x I y) describes our belief about which symptoms x are present in disease y, and belr(Y I x) describes our a posteriori belief about the diseases given the observed symptom x and an a priori belief on Y (Smets, 1978).
9
273
Belief Functions
Suppose that one introduces a diagnosis y' = the set of still unknown diseases. Given y', the belief on X is obviously vacuous: what can we know about the symptoms when the patient suffers from a still unknown disease? Then belr(Y' I x) is the belief that the patient presenting symptoms x belongs to the set y' of still unknown diseases. If the value is high, it means that one might be discovering a new disease. 7
DISCOUNTING EVIDENCE
Going back to the example of Section 3.2, suppose that we receive the information that the janitor was drunk. What can one say about evidence E 1 . Shafer proposes discounting this evidence by a factor c, the higher the value of c, the more it is discounted. Let rn be the basic probability assignment before discounting and m' that after discounting. Then rn'(A) = (1 -c)· rn(A) m'(l 0 ) = m(l 0 )
+c
L
\fA E Q, A -:f. 10 (7.1)
rn(A)
A"ln
This rule corresponds to the idea that each focal proposition sees its basic probability mass proportionally reduced, except 10 , which incorporate all missing masses. This rule can be justified as follows. Let the Y-space have two elements: y' = "witness tells the truth" and y" = "witness lies". The belief function bei(A) considered before discounting corresponds to belx(A I y'), and belx(A I y") is vacuous (a lie doesn't support anything). Let an a priori belief on Y be such that bel 0 (y') = 1 - c (the value of bel 0 (y") turns to be irrelevant). Compute belx(A I Y) from equation (6.1) and EB-combine the result with bel 0 . The result is them' function obtained after discounting, as described in equation (7.1 ). Suppose that, for our example, the judge discounts E 1 by a factor of 0.7, considering that the drunkeness of the janitor highly reduced the fiability of his initial testimony. Table 4 presents the impact of such discounting. Table 4 Masses and belief and plausibility functions derived from data of Table I when evidence E 1 has been discounted by a factor of 0.7 Q
    Ω                       m'_1    bel'_1   pl'_1    m_123    bel_123   pl_123
    0_Ω                     0.00    0.00     0.00     0.318    0.000     0.000
    Peter                   0.00    0.00     0.94     0.282    0.282     0.470
    Paul                    0.00    0.00     1.00     0.030    0.030     0.400
    Mary                    0.00    0.00     0.85     0.000    0.000     0.340
    Peter or Paul           0.15    0.15     1.00     0.030    0.342     0.682
    Peter or Mary           0.00    0.00     1.00     0.000    0.282     0.652
    Paul or Mary            0.06    0.06     1.00     0.182    0.212     0.400
    Peter, Paul or Mary     0.79    1.00     1.00     0.158    0.682     0.682
Note that discounting with a factor 1 results in a vacuous belief function, whereas a factor 0 leaves the belief function unchanged.
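For concreteness, here is a minimal sketch of the discounting rule (7.1). The masses used for E_1 (0.5 on "Peter or Paul", 0.2 on "Paul or Mary" and 0.3 on the tautology) are those implied by the m'_1 column of Table 4, and should be read as an assumption since Table 1 is not reproduced here.

    def discount(m, c, tautology):
        # eq. (7.1): scale every focal mass by (1 - c) and transfer the
        # missing mass onto the tautology 1_Omega
        m2 = {A: (1 - c) * v for A, v in m.items() if A != tautology}
        m2[tautology] = m.get(tautology, 0.0) + c * sum(
            v for A, v in m.items() if A != tautology)
        return m2

    TAUT = frozenset({"Peter", "Paul", "Mary"})
    m1 = {frozenset({"Peter", "Paul"}): 0.5,      # assumed masses of E_1
          frozenset({"Paul", "Mary"}): 0.2,
          TAUT: 0.3}
    for A, v in discount(m1, 0.7, TAUT).items():
        print(sorted(A), round(v, 2))             # 0.15, 0.06 and 0.79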
8 MEANING OF bel: A CANONICAL EXAMPLE
"When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot measure it in numbers, your knowledge is of a meager and unsatisfactory kind." (Lord Kelvin, 1883). In order to understand what bel( A) is, the degree of belief of proposition A, we must have some canonical scale of propositions in which degrees of belief are well defined, and with which we can compare proposition A. Shafer and Tversky (1985) provide such a canonical scale. Let there be an unknown message X E 11 and a set of n translators t;: i = 1, 2, ... , n. Let p; be the probability that translator t; is selected. We can observe the result of the translation of the original message only by the translator that was selected. We ignore which translator has been used, we know only the probability with which each translator can be the one selected. Given the observed message Y, we can construct the set A; of messages from Q, the Boolean algebra derived from 11, which would have been translated into Y by translator t;. The belief bel( A) that the original message X is in the set A is obtained by adding the probabilities p; of the translators t; such that A; implies A. Such a bel is indeed a belief function. From it one can derive the functions m and pl. Furthermore, if the same message is translated twice by two independently selected translators then one can construct bel, based on the observed message Y 1 , bel 2 based on the observed message Y 2 , and bel 1 2 based on observed messages Y 1 and Y 2 considered simultaneously. Shafer and Tversky showed that bel 12 = bel 1 EB beh, the result obtained through the application of Dempster's rule of combination.
9 CONCLUSIONS
This chapter has presented a model to quantify someone's degree of belief that a proposition is true. A finite amount of belief is distributed among the propositions of a frame of discernment Λ. The non-negative mass m(A) quantifies the amount of belief specifically allocated to proposition A that cannot be allocated to any proposition B ≠ A that implies A. The degree of belief in a proposition A is the sum of the masses allocated to propositions B that imply A without implying ¬A. The degree of plausibility of a
proposition A is the sum of the masses allocated to propositions B that are compatible with A. One particular characteristic of this model is that it allows a positive mass to be allocated to the contradiction 0_Ω relative to Ω, the Boolean algebra derived from Λ. The meaning of such an allocation can be understood if one gives due consideration to the difference between the open- and closed-world assumptions. The frame of discernment Λ is an a priori construct on which one distributes one's belief. But one should not ignore the fact that this frame is usually nothing but an intellectual construct and that it may happen that none of the propositions of Ω is true. The impact of the closed-world assumption has been studied and a normalization coefficient derived; the result is Shafer's model. The advantage of the present approach is that it permits the evaluation of the degree of conflict among the propositions of Ω, and therefore a decision upon the appropriateness of the frame of discernment Λ and of the closed-world assumption. Axioms have been presented that justify Dempster's rule of combination, used to combine two belief functions derived from two distinct pieces of evidence. Distinctness is somehow defined in the entailment functionality axiom A1. The major axiom is the conditioning axiom A4, which postulates the impact of conditioning on the mass allocation. Given these axioms, the uniqueness of Dempster's rule of combination has been derived. The case of non-distinct evidence is considered in Smets (1986b). Bayes' theorem has been generalized within the framework of belief functions. Discounting is a particular case of such a generalization. Belief functions provide a model that seems promising for the development of Expert Systems that need to handle uncertainty. Their use for medical diagnosis was described in Smets (1978, 1979, 1981); more recent applications in AI can be found in Barnett (1981), Garvey et al. (1981), Gordon and Shortliffe (1984, 1985), Lowrance (1982) and Strat (1984).
BIBLIOGRAPHY

Bonissone, P. P. and Brown, A. L. (1985). Expanding the horizons of expert systems. Technical Information Series, Report 85CRD219, General Electric.

Bonissone, P. P. and Decker, K. S. (1985). Selecting uncertainty calculi and granularity: an experiment in trading-off precision and complexity. Technical Information Series, Report 85CRD171, General Electric.

Bonissone, P. P. (1986). Plausible reasoning: coping with uncertainty in expert systems. Technical Information Series, Report 86CRD053, General Electric.

Bonissone, P. P. (1986). Summarizing and propagating uncertain information with triangular norms. (Submitted for publication.) (The four papers by Bonissone and collaborators present an up-to-date survey of approximate reasoning techniques and their implementation in Expert Systems. Very clear. They should be read in the order given above.)

Chatalic, P. (1986). Raisonnement déductif en présence de connaissances imprécises et incertaines. Un système basé sur la théorie de Dempster-Shafer. Thèse, Université Paul Sabatier, Toulouse. (A thesis presenting the use of belief functions in an inference engine based on believed implications and facts.)

Dempster, A. P. (1967). Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Statist. 38, 325-339.

Dempster, A. P. (1968). A generalization of Bayesian inference. J. R. Statist. Soc. B30, 205-247. (Dempster's two papers introduced the idea of belief functions as developed by Shafer.)

Fine, T. (1973). Theories of Probability. Academic Press, New York. (A highly critical comparison of the foundations of the various theories that have been proposed to justify the use of probability functions.)

Gordon, J. and Shortliffe, E. H. (1985). A method for managing evidential reasoning in a hierarchical hypothesis space. Artificial Intelligence 26, 323-357. (A clear presentation of belief functions oriented toward the AI community, with discussion of its use and computational tractability in practical problems when a hierarchical structure can be imposed on the frame of discernment.)

Kyburg, H. E. (1987). Bayesian and non-Bayesian evidential updating. Artificial Intelligence 31, 271-294. (A critique of belief functions, showing that many of the advantages of belief functions could also be obtained with probability functions. But belief functions are equated to interval-valued probabilities.)

Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press. (THE book on belief functions. Highly readable, a must!)

Shafer, G. and Tversky, A. (1985). Languages and designs for probability judgment. Cognitive Sci. 9, 309ff. (A presentation of an experimental set-up that permits the derivation of degrees of belief that satisfy the model based on belief functions.)

Smets, P. (1978). Un modèle mathématico-statistique simulant le processus du diagnostic médical. Doctoral dissertation, Université Libre de Bruxelles, Bruxelles. (Available through University Microfilm International, 30-32 Mortimer Street, London W1N 7RA, Thesis 80-70,003.) (Thesis presenting, among others, the generalized Bayesian theorem based on belief functions.)

Smets, P. (1981a). Medical diagnosis: fuzzy sets and degree of belief. Fuzzy Sets and Systems 5, 259-266. (Which types of ignorance must be considered in a model for medical diagnosis, and which models cover them.)

Smets, P. (1981b). The degree of belief in a fuzzy event. Info. Sci. 25, 1-19. (Definition of the crisp degree of belief in a fuzzy event.)

Smets, P. (1986a). Belief functions and their combination. (Submitted for publication.) (An axiomatic justification of Dempster's rule of combination and a plea for unnormalized belief functions.)

Smets, P. (1986b). Combining non-distinct evidence. Proc. North American Fuzzy Information Processing (NAFIPS 1986), New Orleans. (What distinctness means for belief functions and how they should be combined when evidences are not distinct (correlated).)

Smets, P. (1986c). Bayes' theorem generalized for belief functions. Proc. European Conf. on Artificial Intelligence (ECAI-86), Vol. II, pp. 169-170. (A presentation of the generalized Bayes theorem.)

Smith, C. A. B. (1961). Consistency in statistical inference and decision. J. R. Statist. Soc. B23, 1-37. (One of the papers that stimulated the work of Dempster on upper and lower probabilities.)
Zadeh, L. (1984). A mathematical theory of evidence (book review). AI Magazine 5(3), 81-83. (A critique of the normalization factor in Dempster's rule of combination.)

Other references
Barnett, J. A. (1981). Computational methods for a mathematical theory of evidence. Proc. 7th Int. Joint Conf. on Artificial Intelligence (IJCAI-81), Vancouver, pp. 868-875.

Chateauneuf, A. and Jaffray, J. Y. (1986). Some characterizations of lower probabilities and other monotone capacities through the use of Möbius inversion. Proc. Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Paris, pp. 229-230.

Dubois, D. and Prade, H. (1985). Théorie des possibilités. Masson, Paris.

Garvey, T. D., Lowrance, J. D. and Fischler, M. A. (1981). An inference technique for integrating knowledge from disparate sources. Proc. 7th Int. Joint Conf. on Artificial Intelligence (IJCAI-81), Vancouver, pp. 319-325.

Good, I. J. (1950). Probability and the Weighing of Evidence. Griffin, London.

Gordon, J. and Shortliffe, E. H. (1984). The Dempster-Shafer theory of evidence. In Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project (ed. B. G. Buchanan and E. H. Shortliffe), pp. 272-292. Addison-Wesley, Reading, Mass.

Lewis, D. (1976). Probabilities of conditionals and conditional probabilities. Phil. Rev. 85, 297-315.

Lindley, D. V., Tversky, A. and Brown, R. V. (1979). On the reconciliation of probability assessments. J. R. Statist. Soc. A, 146-180.

Lowrance, J. D. (1982). Dependency-graph models of evidential support. COINS Technical Report 82-26.

Pearl, J. (1986a). On evidential reasoning in a hierarchy of hypotheses. Artificial Intelligence 28, 9-15.

Pearl, J. (1986b). Fusion, propagation and structuring in belief networks. Artificial Intelligence 29, 241-288.

Smets, P. (1979). Modèle quantitatif du diagnostic médical. Bull. Acad. R. Méd. Belg. 134, 330-343.

Strat, T. M. (1984). Continuous belief functions for evidential reasoning. Proc. 4th American Association for Artificial Intelligence Conf., Austin, Texas, pp. 308-313.

Zadeh, L. (1975). The concept of a linguistic variable and its application to approximate reasoning. Parts I, II and III. Info. Sci. 8, 199-249; 8, 301-357; 9, 43-80.
DISCUSSION

M. R. B. Clarke: This chapter gives a very clear explanation of the theory, with some good examples and interesting technical results. I shall deal first with what I see as the weak point of the theory, its semantic basis, and secondly with the question of normalization.

Semantic basis of belief functions. There is no doubt that the Bayesian formulations of classical probability theory that have been used for evidence propagation in expert systems such as Prospector do not deal easily with ignorance. To get started, one is
forced to give prior probabilities to events or hypotheses even if one has no prior knowledge at all. These arbitrary assignments are difficult to make consistently and are propagated into the conclusions. A further problem with the classical theory is that commitment of probability to a hypothesis implies commitment of all remaining probability to its negation. In both these respects the Shafer theory seems superior. However, whereas the Bayesian theory has a well-defined semantics in terms of rational betting behaviour (Lindley, 1985), the Shafer theory of evidence has no such semantic basis. It provides a self-consistent method of manipulating belief numbers, but says nothing about how the results are to be used or what they mean. If probabilities are used then, in theory at least, expected gain or loss can be computed and compared for each diagnosis or projected course of action, but it is not clear that belief functions can be used in an analogous way without making equivalent assumptions. Neither is it clear what Dempster's rule is actually doing in some situations. Section 8 of the chapter discusses the translator example of Shafer and Tversky. In the (meta-)language of logic one would say that this rather special problem in probability is being put forward as a model for the theory. But the intended applications of belief functions, such as medical diagnosis, seem to be unrelated to this model. It would be useful to see an example from medical diagnosis, showing how the computed belief numbers are finally interpreted and used. Gordon and Shortliffe (1985) have done this for hierarchically structured hypotheses and have also shown that for this special case the computations are tractable. The chapter strongly maintains that belief and plausibility are not upper and lower subjective probabilities. Yet in some cases, by reassigning masses, they can be directly interpreted as such (Kyburg, 1987). It is Dempster's rule that gives rise to departures from probability theory. The intervals resulting from Dempster-Shafer updating are subintervals of those resulting from application of conditioning to upper and lower probabilities. In some cases Dempster's rule gives intuitively reasonable results, while conditioning on upper and lower probabilities gives vacuously wide bounds. In other cases, however, Dempster's rule forces decisions whose expected utility is negative (Kyburg gives an example).

Table D1
              m_1     m_2     M_12    m_12      Mean
    Peter     0.99    0       0       0         0.495
    Paul      0.01    0.01    1.0     0.0001    0.01
    Mary      0       0.99    0       0         0.495
Normalization. The counter-example given in the chapter is not very convincing (Table D1). The unnormalized value m_12 seems as wrong as the normalized M_12. Suppose that 50 witnesses all give belief 0.01 to Paul. Would we want the combined belief to be 10^-100? One could argue that in this case the best answer is given by classical estimation theory. The widely varying beliefs in Peter and Mary, which average 0.495, could be shown by giving standard errors. The real mistake here is in assigning zero beliefs (or probabilities) to any event without thinking carefully about what happens as they tend to zero. Suppose that we have the situation shown in Table D2, and now consider carefully the way that δ²/ε behaves. The result given holds when δ²/ε is large. If both δ and ε are small in such a
way that δ²/ε tends to unity (δ = 0.01, ε = 0.0001 for example) then M_12 is 1/3 for all three suspects. Paradoxes like the one quoted in the chapter are always likely to arise from Dempster's rule when zero rather than very small belief is assigned to an elementary proposition.

Table D2
              m_1        m_2        M_12
    Peter     1-δ-ε      ε          (1-δ-ε)/(δ²/ε + 2[1-δ-ε])
    Paul      δ          δ          (δ²/ε)/(δ²/ε + 2[1-δ-ε])
    Mary      ε          1-δ-ε      (1-δ-ε)/(δ²/ε + 2[1-δ-ε])
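The sensitivity Clarke describes is easy to reproduce; the following sketch (our own names, not the chapter's) applies the normalized Dempster rule to the masses of Table D2.

    def dempster(m1, m2, normalize=True):
        # combine two basic probability assignments over frozensets
        m = {}
        for A, a in m1.items():
            for B, b in m2.items():
                C = A & B
                m[C] = m.get(C, 0.0) + a * b
        if normalize:
            k = m.pop(frozenset(), 0.0)        # mass of the contradiction
            m = {A: v / (1.0 - k) for A, v in m.items()}
        return m

    P, L, M = frozenset({"Peter"}), frozenset({"Paul"}), frozenset({"Mary"})

    def table_d2(delta, eps):
        m1 = {P: 1 - delta - eps, L: delta, M: eps}
        m2 = {P: eps, L: delta, M: 1 - delta - eps}
        return {min(A): round(v, 3) for A, v in dempster(m1, m2).items()}

    print(table_d2(0.01, 1e-8))   # delta**2/eps large: Paul gets almost 1
    print(table_d2(0.01, 1e-4))   # delta**2/eps = 1: about 1/3 each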
Gerhard Paass: Philippe Smets' chapter on belief functions gives a highly informative introduction to that topic. He shows that belief functions are a flexible tool to represent simultaneously information and ignorance about the facts of interest. I want to comment on some aspects of belief functions from the viewpoint of the probability formalism. In Section 1.3 Smets postulates a difference between upper/lower probabilities on the one hand and belief functions on the other hand. Upper and lower probabilities, which assume an underlying probability measure, are contrasted with belief functions, which deal with the intensity of credibility. This difference is not quite evident to me. In Section 8 Smets discusses a canonical example, where beliefs are calculated from certain probabilities associated with a special hypothetical experiment. Hence one can define beliefs in terms of these probabilities, and in principle the whole theory could be stated in terms of probabilities. This view is backed by Shafer (1986, p. 133), who states that "the advantage gained by the belief-function generalization of the Bayesian language is the ability to use certain kinds of incomplete probability models". In fact Kyburg (1987) proves that the representation of belief states by a mass function m(·) is a special case of the upper/lower probabilities approach. Belief functions are an elegant way to define upper and lower probabilities. In practice, however, the specification of bel(·) by an expert may be difficult, as a lot of restrictions have to be observed. For the specification in terms of m(·) the expert has to subtract the "masses" allocated to "smaller" elements of the Boolean algebra, which also seems somewhat unnatural. In addition, Grosof (1986, p. 269) shows by an example that specific sets of upper/lower probabilities cannot be specified in terms of a belief function. In this sense upper/lower distributions are more expressive than the Shafer/Dempster approach. In many cases, however, a representation of an inference net in terms of m(·) or bel(·) may have advantages. Therefore it may be desirable for a user to be able to switch between the different representations. The attractive point of belief functions is the simple way in which evidence involving incompletely specified probability measures may be combined by Dempster's rule. This rule, however, is only one special way to do this. According to Kyburg (1987), it yields in general narrower probability intervals than the evaluation of upper/lower distributions. As Shafer (1986) points out, it assumes that the basic probability assignments m_1(·) and m_2(·) to be combined are independent, i.e. are based on independent arguments or independent items of evidence. Because of this independence assumption (which is not obvious from the axioms of Section 5.1), the rule corresponds to the formation of the product probability measure from the
probability measures underlying m_1(·) and m_2(·). Dependences can be taken into account by forming joint measures different from the product measure. In the same way as in the case of upper/lower distributions, it must be decided whether the pieces of evidence are independent or a more complicated design has to be used. Let us consider the example given in Section 2.2 and assume that the first two witnesses come to identical conclusions and both state m(Peter) = m(Paul) = m(Mary) = 1/3. As combined evidence, Dempster's rule yields m_12(Peter) = m_12(Paul) = m_12(Mary) = 1/9 and m_12(0_Ω) = 2/3. Following the interpretation of Smets, two-thirds of the amount of belief is allocated to outside suspects, although both witnesses give identical statements in which outside suspects are not mentioned. This seems implausible, as identical statements from independent experts should lead to a reinforced joint statement and not indicate "contradictions" of sizeable degree. In this case, Shafer's original approach seems more sensible, as it yields m_12(Peter) = m_12(Paul) = m_12(Mary) = 1/3. This approach, however, is plagued by Zadeh's paradox. In summary, the utilization of Dempster's rule does not seem to be very attractive. In this chapter Smets does not address the question of the numerical feasibility of the belief-function approach. As belief functions are defined on the complete Boolean algebra of events with 2^k elements, it seems that the approach becomes unfeasible if the size k of the frame of discernment gets larger. It may, however, be possible to evaluate an inference net in terms of "marginal" belief functions, as can be done for the evaluation of upper/lower distributions by linear-programming methods.
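Paass's figures can be checked by reusing the dempster() sketch and the P, L, M singletons from the earlier code, with normalization switched off:

    thirds = {P: 1/3, L: 1/3, M: 1/3}
    m12 = dempster(thirds, thirds, normalize=False)
    print(m12[frozenset()])   # 2/3 on the contradiction; each singleton
    # gets 1/9, whereas normalizing would give 1/3 to each suspect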
Didier Dubois and Henri Prade: Smets' chapter proposes a view of belief functions as generalized probability measures with a subjectivist interpretation. Parallel to this measure-theoretic view, one can develop a set-theoretic view of belief functions, introduced by Nguyen (1978) (see also Goodman and Nguyen, 1985). Considering a belief function Bel on a Boolean algebra Ω of propositions, built from a finite set of (mutually exclusive) elementary propositions, it is possible to interpret Bel as a generalized logical proposition. Indeed, let 𝔅(Ω) be the set of belief functions on Ω, and let Bel_A be the belief function with basic probability assignment m_A such that m_A(A) = 1. It is then easy to define a canonical injection Ω → 𝔅(Ω) such that ∀A ∈ Ω, A ↦ Bel_A, so that Ω can be identified with a subset of 𝔅(Ω). To any belief function Bel we associate an equivalent generalized proposition 𝒜 (or set, if any proposition represents a subset of a frame of evidence), where 𝒜 = {(A, m(A)) | m(A) > 0}. A generalized proposition is thus a weighted set of propositions. As a consequence, one may think of extending the 16 binary connectives of logic from Ω to 𝔅(Ω). Let □ stand for any binary logical connective, and let Bel_1 and Bel_2 be two belief functions expressing distinct bodies of evidence, corresponding to generalized propositions 𝒜_1 and 𝒜_2; Bel_1 and Bel_2 can be combined via □ into Bel, which corresponds to 𝒜_1 □ 𝒜_2, and is defined by the basic probability assignment

m(A) = Σ_{X □ Y = A} m_1(X) m_2(Y)   (1)
This definition is a generalized version of Dempster's rule (Section 3.3). Actually, Dempster's rule (in its non-normalized version) corresponds to a conjunction of generalized propositions (e.g. intersection of random sets). But union, implication, etc. can be extended by (1), as well as negation. Several notions of logical entailment between generalized propositions (or belief functions) can be defined, with various levels of strength. They correspond to several concepts of generalized set-inclusion.
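A sketch of the connective-indexed combination (1), under the assumption that masses are stored on frozensets: intersection as the connective recovers the unnormalized Dempster rule, while union gives a disjunctive rule. The example masses are hypothetical.

    def combine(m1, m2, op):
        # eq. (1): m(A) = sum over X op Y = A of m1(X) * m2(Y)
        m = {}
        for X, a in m1.items():
            for Y, b in m2.items():
                A = op(X, Y)
                m[A] = m.get(A, 0.0) + a * b
        return m

    m1 = {frozenset({"a"}): 0.7, frozenset({"a", "b"}): 0.3}
    m2 = {frozenset({"b"}): 0.5, frozenset({"a", "b"}): 0.5}
    conj = combine(m1, m2, frozenset.intersection)  # unnormalized Dempster
    disj = combine(m1, m2, frozenset.union)         # disjunctive rule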
Note that if Bel is consonant (Section 4.5) then the corresponding generalized logical proposition 𝒜 can be interpreted as a fuzzy subset of elementary propositions. In particular, the generalized notions of logical entailment mentioned above encompass fuzzy-set inclusion. See Dubois and Prade (1986a) for more details on this logical view of belief functions; in particular, the algebraic structure of 𝔅(Ω) (which is no longer a Boolean algebra) is studied. Equation (1) is one way of generalizing Dempster's rule of combination. As proved by Smets, the product operation combining m_1 and m_2 in (1) is unique. We have recently and independently obtained this result with another set of axioms (Dubois and Prade, 1986b). Namely, given two sets {a_i | i = 1, ..., m} and {b_j | j = 1, ..., n} of positive numbers such that Σ_{i=1,...,m} a_i = 1 and Σ_{j=1,...,n} b_j = 1, construct the set {c_ij | i = 1, ..., m; j = 1, ..., n} of positive numbers such that ∀i, j, c_ij = a_i * b_j for some continuous operation *. It is proved that if ∀m, ∀n, Σ c_ij = 1 then * is the product. Regarding the question of unnormalized results from the application of Dempster's rule, the main problem is what to do with the m(0_Ω) obtained in (1) (with □ = ∧). Smets criticizes Shafer's technique for normalization in Section 5.2. We have also noticed the lack of numerical stability of the normalized version of Dempster's rule in the presence of conflicting information (Dubois and Prade, 1985a). Actually, the aggregation operation is not continuous in the vicinity of the total-conflict situation. This is one more piece of evidence against a systematic use of Dempster's rule in its normalized form. Yager (1987) has proposed an alternative normalization procedure, which reallocates the weight m(0_Ω) in (1) to the tautology 1_Ω, thus interpreting the amount of conflict as an extra amount of ignorance. This new rule is no longer discontinuous and can deal with the paradoxical case in Section 5.2. Another worthwhile addendum to Smets' contribution is the emergence of a generalized information theory "à la Shannon" in the setting of the theory of evidence. The cardinality |S| of a finite set S is a measure of the imprecision of the statement "v ∈ S", where v is a single-valued variable. A logical proposition A is a formal representation of "v ∈ S" provided that S is a model of A. An extended notion of cardinality can be defined for belief functions, yielding a measure of imprecision
"a
HI(Bel)
=
L m(S) log2 lSI s
This measure is an extension of one proposed by Higashi and Klir (1983) in the setting of fuzzy sets. It is the working tool for a belief-function elicitation technique called the principle of minimum specificity (Dubois and Prade, 1986c), which works from an incomplete specification. Note that HI(Bel) = 0 when Bel is a probability measure, so that HI does not generalize Shannon's entropy. Shannon's entropy measure has been generalized for belief functions by Yager (1983). It is a measure of the dispersion of focal elements:

HD(Bel) = - Σ_{A ∈ Ω} m(A) log_2 Pl(A)
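Both measures are immediate to compute from a mass function m on non-empty frozensets, as in this sketch:

    from math import log2

    def pl(m, A):
        return sum(v for B, v in m.items() if B & A)

    def HI(m):
        # imprecision: sum of m(S) * log2 |S| over focal elements
        return sum(v * log2(len(S)) for S, v in m.items())

    def HD(m):
        # dispersion (Yager): - sum of m(A) * log2 Pl(A)
        return -sum(v * log2(pl(m, A)) for A, v in m.items())

    m = {frozenset({"a"}): 0.5, frozenset({"b"}): 0.5}   # a probability
    print(HI(m), HD(m))   # 0.0 and 1.0: HI vanishes, HD is Shannon entropy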
HD reduces to Shannon's entropy for probability measures. Note that if Bel is consonant then HD(Bel) = 0 (no dispersion). More details can be found in the cited papers; see also Dubois and Prade (1985b, 1987). Our final comment concerns Shafer's interpretation of the functions Bel and Pl in terms of subjective, personal credibility and plausibility, which Smets shares. As mathematical objects, the functions Bel and Pl do not convey any a priori meaning, i.e. "subjectivist" or "objectivist". Now these functions were first derived by Dempster
(1967) in a statistical framework, by carrying a probability measure through a many-valued mapping. Dempster calls these functions "upper and lower probabilities" because the precise location of the original probability measure is lost owing to the many-valued mapping; however, owing to the axiomatics of Bel and Pl (as deriving from a basic probability assignment), they are only special cases of upper and lower probability measures. Interpreting the focal elements as imprecise observations, m(A) being the frequency of observing A, one can come up with a purely frequentist view of belief functions (Dubois and Prade, 1986d), which are then special kinds of ill-known probability measures. Shafer has reinterpreted Dempster's upper and lower probabilities in terms of personal plausibility and belief. However, he has just modified the terminology. In particular, there is no attempt to justify this subjectivist view in a formal setting, for example in terms of betting behaviour or comparative belief relations, as already exists for probability measures (Savage, 1972). Such attempts exist for possibility measures (see Dubois (1986) for comparative possibility relations, and Giles (1982) for a betting-behaviour interpretation that encompasses notions of upper and lower probabilities more general than belief functions). Hence, so far, the subjectivist view of belief functions is not supported (and the canonical experiments of Section 8 are not enough to do the job), although this way of modelling subjective uncertainty judgment looks very convenient in practice, much more convenient than the rigid framework of probability theory.
Reply: (1) The relation between belief functions and lower probabilities is really delicate (as seen in the three comments). The Bayesian credo is that a credal state can be described by a unique probability function that gives to any proposition a unique number that measures its degree of belief, and that these numbers obey the additivity rule (and the other properties) of probability functions. The credo for our transferable-beliefs model (which we think corresponds to Shafer's model) is that the degree of belief in any proposition can also be described by a unique number, but that these numbers obey the superadditivity property (and the other properties) of belief functions:

bel(A ∨ B) ≥ bel(A) + bel(B) - bel(A ∧ B)
The credo of the proponents of the upper and lower probabilities model is that there exists a family 𝒫 of probability functions that describes our credal state. The probability P(A) of any proposition A is between the upper P*(A) and lower P_*(A) probabilities, with

P_*(A) = min {P(A) : P ∈ 𝒫}
P*(A) = max {P(A) : P ∈ 𝒫}

Conditioning on B is obtained by the conditioning of each P in 𝒫; therefore

P_*(A | B) = min {P(A | B) : P ∈ 𝒫}
P*(A | B) = max {P(A | B) : P ∈ 𝒫}
These results are not those obtained by Dempster's rule of conditioning (see Section 4.7). Finally, Dempster's model for upper and lower probabilities is often interpreted as a special case of the former when the family 𝒫 of potential probability functions quantifying our degree of belief is restricted to the probability functions such that the
lower probabilities satisfy the inequalities of Section 4.1. But Dempster's rule of conditioning is not justified in this interpretation. The origin of Dempster's restricted family 𝒫 can better be explained by considering a probability function on a space X and a one-to-many mapping G from space X to space Y. If one defines (→ is the material implication)

P*(A) = P({x : G(x) ∧ A ≠ ∅})
P_*(A) = P({x : G(x) → A})
one gets Dempster's upper and lower probabilities. Even though Dempster's model and the transferable-beliefs model share syntactical properties, they are not identical. Dempster's model presupposes a probability function on a certain space X and a one-to-many mapping G, which is not the case for the transferable-beliefs model. That both models share the same mathematical properties is not an argument for them being the same concept. Remember that water flow and electricity can be described mathematically by the same differential equations, but water is not electricity. Dempster's model and the transferable-beliefs model stand in the same relation to each other as an urn and a subjective probability. A subjective probability is not an objective probability. That they are numerically identical is not an argument. It results from the so-called frequency principle (Hacking, 1965, p. 135), which states (with chance and belief for objective and subjective "probability")
=
p)
= p
that is, the belief in A is numerically equal to the chance of A. By choosing various A, one constructs a scale of belief, which can be used to assess the degree of belief of any further proposition (the concepts of urn and chance becoming unnecessary). To be acceptable, this principle also requires that belief and chance share the same mathematical properties (which Bayesians accept). In Dempster's model, the probability P(x) for x ∈ X induces on the space Y a mass m(x) = P(x) such that
P_*(A) = Σ_{G(x) → A} m(x)
Thus m(x) corresponds to the mass m postulated in the moving-masses model. The Shafer-Tversky translator example permits the construction of a scale for a degree of belief. It uses an analogue of the "frequency principle": the degree of belief given to message y in Y is numerically equal to the lower probability of y computed from the translator example. The mathematical properties of both models are the same. In order to assess the degree of belief in any other proposition (unrelated to any assumed translator), one uses that scale. Considering these remarks, one may wonder if the "Shafer-Dempster" qualification is not as unfortunate as the use of "probability" to describe both objective and subjective concepts.

(2) In his comments, Clarke complains about the absence of well-defined semantics for Shafer's theory. In fact, this complaint covers two problems solved in the probability domain by the urn model and exchangeable bets, or by axioms like those of Savage. An analogue of the urn model exists in Shafer's theory: the Shafer-Tversky translator provides it. Its operationality may be weak ... but exchangeable bets are not an efficient method to assess someone's belief either. The operationality of a
canonical experimental model is not required. The experimental model is described in order to give a meaning to the value p in statements like "the degree of belief in A is p". Bayesians will claim that p has the properties of a probability function. We claim that p has the properties of a belief function. What about decision? We claim that beliefs obey the transferable-beliefs model. When someone must take a decision, he must then construct a probability function derived from the belief function that describes his credal state. This probability function is then used to make decisions. One obvious (if not yet fully justified) way to build this probability function is

P(A) = Σ_{B ∈ Ω} m(B) |A ∧ B| / |B|
where |A| is the number of elementary propositions in A. It corresponds to a Generalized Insufficient Reason principle: a mass given to the disjunction of n elementary propositions is split equally among these n propositions. Bayesians might then argue: why bother with belief functions if their only observable translation is derived from a probability function; let us use a probability function from the start. But see the following counter-example. Peter, Paul and Mary are the three potential killers, and evidence 1 says there is the same support for the boys as for the girl, i.e.

m_1(Peter ∨ Paul) = m_1(Mary) = 0.5

The derived probability is the highly reasonable P_1:

P_1(Peter) = P_1(Paul) = 0.25,   P_1(Mary) = 0.50

Evidence 2 is "Peter cannot be the killer" (he was in America), i.e.

m_2(Paul ∨ Mary) = 1
P_2(Paul) = P_2(Mary) = 0.5

Combining the two pieces of evidence gives

m_12(Paul) = m_12(Mary) = 0.5
i.e.

P_12(Paul) = P_12(Mary) = 0.5

a highly reasonable solution. Let us use the probabilist approach with P_1 and the conditioning evidence that the killer is "Paul or Mary". We get

P'_12(Paul) = 1/3,   P'_12(Mary) = 2/3

with P'_12(·) = P_1(· | Paul or Mary). This solution is unsatisfactory. Just think about the results one would get if the two pieces of evidence had been considered in the reverse order! Criticism that we have changed the terms of reference is not adequate either ... except if one completely redefines the conditioning process. Therefore we provide here an example where the moving-masses approach is OK whereas the strict probabilist approach leads to paradoxical results.
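The counter-example can be replayed mechanically. The sketch below (with names of our choosing) conditions the belief function first and applies the Generalized Insufficient Reason principle last, which is the route the reply defends:

    def condition(m, E):
        # Dempster conditioning: transfer each mass m(B) to B & E, renormalize
        out = {}
        for B, v in m.items():
            C = B & E
            out[C] = out.get(C, 0.0) + v
        k = out.pop(frozenset(), 0.0)
        return {B: v / (1 - k) for B, v in out.items()}

    def betp(m, A):
        # Generalized Insufficient Reason: split m(B) equally over B
        return sum(v * len(A & B) / len(B) for B, v in m.items())

    Peter, Paul, Mary = (frozenset({n}) for n in ("Peter", "Paul", "Mary"))
    m1 = {Peter | Paul: 0.5, Mary: 0.5}
    print(betp(m1, Peter), betp(m1, Mary))    # 0.25 0.5  (this is P_1)
    m12 = condition(m1, Paul | Mary)
    print(betp(m12, Paul), betp(m12, Mary))   # 0.5 0.5
    # Conditioning P_1 itself on "Paul or Mary" gives 1/3 and 2/3 instead:
    # the two routes do not commute, which is the point of the example.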
(3) The second complaint of Clarke, the absence of an axiomatic characterization
like Savage's, is real. Such a characterization would be useful ... but unfortunately it is not yet available. Is this a criticism of the theory? I do not think so. It would give it an added bonus, but if this absence is taken as a weakness of the theory, one might well wonder how people have managed to use probability theory for so many years without a well-defined characterization. Belief-function theory is in its infancy, and it is to be hoped that such a characterization will be formulated. The fact that belief functions are defined by an infinite number of postulates (the inequalities given in Section 4.1 must be satisfied for all n) will surely lead to serious difficulties.

(4) A question was raised about what "distinct" means. Evidence corresponds to a restriction on some underlying space. Consider spaces X and Y. Given evidence 1 (subset x of X is true) and evidence 2 (subset y of Y is true), we build two belief functions bel_1 and bel_2 on some space Z. Consider the belief on Y induced by evidence 1 (x is true) and the belief on X induced by evidence 2 (y is true). If these two belief functions are vacuous, then the pieces of evidence are distinct. Therefore two pieces of evidence are distinct if each leaves one totally ignorant about the particular value the other will take (Smets, 1986b).

(5) The normalization problem is delicate. The idea proposed by Clarke is another way to solve it, but the limiting result when ε goes to 0 depends on the ratio δ²/ε, an unpleasant situation. I think the value of Zadeh's paradox lies in showing that a blind application of normalization is dangerous. We defend the idea that normalization should not be used except if one accepts explicitly the closed-world assumption (Section 2). In the example of Paass, the witnesses are really very ignorant about who is the killer, and the mass of 2/3 given to the contradiction (open-world assumption) is not necessarily unsavoury. Should we accept the closed-world assumption, the results of the normalization would be what Paass feels reasonable.

(6) Dubois and Prade present some other syntactical rules to combine pieces of evidence. Their relevance to the moving-masses model is not obvious, but merits further exploration. Their justification of Dempster's rule of combination is based on the assumption that c_ij is a function of a_i and b_j, a strong assumption that needs justification.
Additional references
Dubois, D. (1986). Belief structures, possibility theory and decomposable confidence measures on finite sets. Computers and Artificial Intelligence (Bratislava) 5, 403-416.

Dubois, D. and Prade, H. (1985a). Combination and propagation of uncertainty with belief functions: a reexamination. Proc. 9th Int. Joint Conf. on Artificial Intelligence (IJCAI-85), Los Angeles, pp. 111-113.

Dubois, D. and Prade, H. (1985b). A note on measures of specificity for fuzzy sets. Int. J. Gen. Syst. 10, 279-283.

Dubois, D. and Prade, H. (1986a). A set-theoretic view of belief functions. Logical operations and approximations by fuzzy sets. Int. J. Gen. Syst. 12, 193-226.

Dubois, D. and Prade, H. (1986b). On the unicity of Dempster's rule of combination. Int. J. Intelligent Syst. 1, 133-142.
Dubois, D. and Prade, H. (1986c). The principle of minimum specificity as a basis for evidential reasoning. Proc. Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Paris, pp. 40-43.

Dubois, D. and Prade, H. (1986d). Fuzzy sets and statistical data. Eur. J. Operational Res. 25, 345-356.

Dubois, D. and Prade, H. (1987). Properties of measures of information in evidence and possibility theories. Fuzzy Sets and Systems 24, 161-182.

Giles, R. (1982). Foundations for a theory of possibility. In Fuzzy Information and Decision Processes (ed. M. M. Gupta and E. Sanchez), pp. 183-195. North-Holland, Amsterdam.

Goodman, I. R. and Nguyen, H. T. (1985). Uncertainty Models for Knowledge-Based Systems. A Unified Approach to the Measurement of Uncertainty. North-Holland, Amsterdam.

Grosof, B. N. (1986). An inequality paradigm for probabilistic knowledge. In Uncertainty in Artificial Intelligence (ed. L. N. Kanal and J. F. Lemmer), pp. 259-275. North-Holland, Amsterdam.

Higashi, M. and Klir, G. J. (1983). Measures of uncertainty and information based on possibility distributions. Int. J. Gen. Syst. 9, 43-58.

Lindley, D. V. (1985). Making Decisions. Wiley, London.

Nguyen, H. T. (1978). On random sets and belief functions. J. Math. Anal. Applic. 65, 531-542.

Savage, L. J. (1972). The Foundations of Statistics. Dover, New York.

Shafer, G. (1986). Probability judgement in artificial intelligence. In Uncertainty in Artificial Intelligence (ed. L. N. Kanal and J. F. Lemmer), pp. 127-135. North-Holland, Amsterdam.

Yager, R. R. (1983). Entropy and specificity in a mathematical theory of evidence. Int. J. Gen. Syst. 9, 249-260.

Yager, R. R. (1987). On the Dempster-Shafer framework and new combination rules. Information Sciences 41, 93-137.
10 An Introduction to Possibilistic and Fuzzy Logics

DIDIER DUBOIS and HENRI PRADE
Laboratoire Langages et Systèmes Informatiques, Université Paul Sabatier, Toulouse, France
Abstract

This chapter discusses the notions of degree of truth for vague propositions and of degree of uncertainty in the presence of partial information. Possibility and fuzzy logics are then introduced for the treatment of uncertainty and vagueness respectively. Vague quantifiers are also considered. Some applications are mentioned.
INTRODUCTION
The expression "fuzzy logic" is used to refer to a variety of approaches proposing a logical treatment of imperfect knowledge usually referring explicitly to fuzzy-set theory. However, a distinction among these approaches can be made between those that deal primarily with vagueness and those whose primary concern is uncertainty. Issues of vagueness and uncertainty have become important with the emergence of advanced information systems equipped with some reasoning capabilities. Fuzzy and possibilistic logics have developed on their own for some 20 years now, and it has become worthwhile to compare this methodology with others that were independently suggested to address related problems of knowledge representation and processing. This chapter describes the basic features of a logic of vagueness and a logic of uncertainty, which can be related via the basic principles of fuzzy-set and possibility theory. Possibilistic logic is a logic of partial ignorance that contrasts with betterknown probabilistic logic systems. It is not possible to represent ignorance in terms of known probability values. Within probability, ignorance is often wrongly interpreted as randomness, where outcomes are equally probable. However, the state of knowledge where there is an equal lack of certainty about all events (including non-elementary ones) that are liable to occur cannot be expressed by a single probability measure. In contrast, possibility theory captures, in a very simple way, states of knowledge ranging from NON-STANDARD LOGICS FOR AUTOMATED REASONING ISBN O·I2-649520·3
complete information to total ignorance. Fuzzy logic, in contrast, is a logic of vague predicates, and escapes the laws of Boolean algebra. As such, it cannot be paralleled with probabilistic logic, because they address radically different, and not mutually exclusive, issues. For instance, one could think of assigning a grade of probability to a vague proposition. Moreover, fuzzy logic is very deviant because automated reasoning processes are based on meaning computation, and can no longer be based on the usual symbolic theorem-proving methodologies in a straightforward manner. The first section discusses the notion of truth, and what happens to it when the available information is incomplete and/or the propositions to be evaluated contain vague predicates. Emphasis is put on the question of truth-functionality. It is indicated that a logic of uncertainty is generally not truth-functional, while the logic of vagueness based on fuzzy sets, in the presence of complete information, is truth-functional. Section 2 introduces possibility theory and describes elementary patterns of reasoning with possibility-qualified propositions. The expressive power of this approach is tentatively compared with other numerical or non-numerical approaches, i.e. probability, evidence theory, modal and default logics. The third section reviews two approaches to the treatment of vagueness in logic: a "syntactic" approach, where intermediary grades of truth are attached to propositions, and a "semantic" approach, where the meaning of logical propositions is explicitly represented in the reasoning mechanisms. Basic patterns of reasoning are also described under the two approaches.
1 DEGREE OF TRUTH AND TRUTH-FUNCTIONALITY
Truth is generally understood as the conformity between a statement and the actual state of facts to which it supposedly refers. Here, however, a degree of truth is rather a measure of agreement between the representation of the meaning of a statement and the representation of what is actually known about reality. This is a practical view of truth, in an information-system perspective. A statement describes the properties of objects and their interrelationships, in the form of a symbolic expression. The meaning of a statement can be viewed as a constraint restricting the value of variables that are implicit in the statement. This view is supported by Zadeh (1981), who defines procedures for the computation of meaning. What is known of reality is supposedly stored in a database ℬ, in the form of statements. What can be said of the truth of a query statement S depends upon our state of knowledge (the information in ℬ), and derives from a matching procedure between the meaning of S and the contents of ℬ. According to the respective precision of S and the information in ℬ, the truth of S is asserted or refuted, but may also
be only partially known (pervaded with uncertainty), or may be a matter of degree (S is vague).

1.1 A semantic approach to the computation of truth (Dubois and Prade, 1985a)
It is assumed that S and ℬ refer to the same universe of discourse (domain) U. U is generally a multidimensional space. A statement w is translated into a constraint "X is M(w)" by means of a meaning computation procedure such as PRUF (Zadeh, 1978b). X is a vector of variables to which w refers, and M(w) is a relation linking the values of these variables. M(w) is a subset of U called the meaning of w. According to whether M(w) is a singleton, a subset or a fuzzy subset (Zadeh, 1965) of U, w is said to be precise, imprecise or vague. Vagueness explicitly refers to the existence of borderline elements of U to which neither the statement nor its contrary can be completely applied, while imprecision refers to the lack of specification of the meaning of w. The meaning M(ℬ) of the set of statements stored in ℬ is obtained by performing the join of the relations M(w) for all w in ℬ. M(ℬ) can be viewed as the set of possible states of the world.

Example Consider the statement w = "John is tall". It is represented by the statement "height[John] is TALL", where TALL is a vague predicate and height[John] is a variable taking its values on the set U of heights between 0.5 and 2.5 m. M(w) is characterized by the membership function of the fuzzy set of "tall" heights, i.e. to each height is allocated a degree of tallness between 0 (non-membership) and 1 (complete membership).
Note that the notion of precision is not absolute. It depends on the way the universe of discourse is defined. In the above example w is a vague statement because U = [0.5, 2.5]. But if U = {SHORT, MEDIUM, TALL} then w becomes precise. In the framework of classical logic, predicates are always precise or imprecise, but are not vague; they will be referred to as crisp predicates. In the following, a proposal initiated in Bellman and Zadeh (1977) is made to represent the truth of crisp or vague statements in the presence of precise, imprecise or vague information. The approach is illustrated using the example of a database ℬ containing an item of information about John's height, and of a statement S pertaining to his height.

(i) Crisp statement; precise information
Consider the situation shown in Fig. 1. M(ℬ) is then a singleton of U, say {u_0}. For instance, u_0 = 1.7 m; i.e. we know that John is 1.7 m tall. M(S) is an ordinary subset A of U. In the example, S = "John is more than 1.65 m tall" and M(S) = [1.65, 2.5].

Fig. 1 [figure: the crisp set A = M(S) = [1.65, 2.5] and the singleton M(ℬ) = {u_0} on the axis of sizes]

Let t(S | ℬ) be the degree of truth of statement S w.r.t. ℬ. The degree of truth is then

t(S | ℬ) = μ_A(u_0)   (1)
where μ_A is the characteristic function of A, the set of heights compatible with S. In the example, μ_A(u_0) = 1, i.e. S is true.
A= M(S)
o------~~~---=======~1.65 o b Size (b)
A= M(S)
0
1.65
0
b
Size
(c)
r----oI
I
I I I I I I
0
0
I I
A= M(S)
I I
I I
b 1.65
Size
Fig. 2 (a) S is necessarily true. (b) S is possibly true and possibly false. (c) S is impossible.
(ii) Crisp statement; imprecise information
Let us assume that S is a crisp statement (e.g. a standard wff) and that ℬ contains only crisp information, but M(ℬ) is not precise; i.e. in the example John's height is known only to lie in some subset of U. In Fig. 2, M(ℬ) = [a, b]. Several situations may occur.

(a) M(ℬ) ⊆ A; then S is surely true, i.e. t(S | ℬ) = 1. For instance, see Fig. 2(a): S is as in (i), but M(ℬ) = [1.68, 1.72].

(b) A ∩ M(ℬ) ≠ ∅ but A does not contain M(ℬ). S is possibly true or possibly false. This situation is closely related to the case when a wff S is consistent with the contents of ℬ taken as axioms, but cannot be inferred from them. See Fig. 2(b), where for instance John's height is known to be between 1.60 and 1.70, S being as in (i).

(c) A ∩ M(ℬ) = ∅; then S is surely false, i.e. t(S | ℬ) = 0. For instance, in Fig. 2(c) S is as in (i) and M(ℬ) = [1.50, 1.60]. Here M(ℬ) ⊆ Ā, where Ā, the complement of A, is the interpretation of "not S", denoted ¬S.
To account for these various situations, a set function Π can be introduced, defined by

Π(E) = 1 if E ∩ M(ℬ) ≠ ∅, and 0 otherwise   (2)
By convention, Π(S) is short for Π(M(S)). Π is called a possibility measure, in the sense of Zadeh (1978a), because Π(S) evaluates to what extent it is possible that S is true. The degree of truth is here extended and becomes for each S a possibility distribution, i.e. the characteristic function of a subset of {0, 1}, denoted μ_{t(S|ℬ)}, such that

μ_{t(S|ℬ)}(0) = Π(¬S),   μ_{t(S|ℬ)}(1) = Π(S)   (3)
The above three cases can be characterized in the following way:

(a) Π(S) = 1, Π(¬S) = 0, i.e. t(S | ℬ) = {1}, which is identified with t(S | ℬ) = 1;

(b) Π(S) = 1, Π(¬S) = 1, i.e. t(S | ℬ) = {0, 1};

(c) Π(S) = 0, Π(¬S) = 1, i.e. t(S | ℬ) = {0}, which is identified with t(S | ℬ) = 0.

Note that in the example considered here the case when Π(S) = Π(¬S) = 0 cannot occur. It corresponds to logical inconsistency, and appears only if M(ℬ) = ∅.
An equivalent description could have been made in terms of a set function N defined by

N(E) = 1 if M(ℬ) ⊆ E, and 0 otherwise   (4)

It is called a necessity (certainty) measure, and is related to the above possibility measure by the following identity:

N(S) = 1 - Π(¬S)   (5)

which just stresses that "S is necessarily true" means that ¬S is impossible.
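A tiny sketch of (2), (4) and (5) over a finite grid of heights; the grid and the interval bounds are assumptions made for illustration.

    U = set(range(150, 251))              # heights in cm (assumed grid)
    A = {u for u in U if u > 165}         # "John is more than 1.65 m tall"
    MB = set(range(160, 171))             # M(B): "between 1.60 and 1.70"

    def Pi(E):                            # eq. (2)
        return 1 if E & MB else 0

    def N(E):                             # eq. (4)
        return 1 if MB <= E else 0

    print(Pi(A), N(A))                    # 1 0: possibly, but not surely, true
    assert N(A) == 1 - Pi(U - A)          # eq. (5)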
Fig. 3 [figure: the fuzzy set A = M(S) for "tall" and the precise height u_0 = 1.75 on the axis of sizes]
(iii) Vague statement; precise information
M(ℬ) is again a singleton {u_0} of U. In the example u_0 = 1.75 m. M(S) is a fuzzy subset A of U, sketched in Fig. 3, which represents the sizes more or less compatible with the concept "tall" in the current context. The statement to be evaluated is S = "John is tall". Consistently with (1), its degree of truth is defined by
t(S | ℬ) = μ_A(u_0) ∈ [0, 1]   (6)
Intermediary degrees of truth thus appear when vague statements are considered. These degrees are strictly between zero and one as long as u_0 is a borderline case for the vague category A, being a prototype neither of the objects satisfying the predicate A nor of the objects satisfying the opposite predicate Ā.
Fig. 4 [figure: the possibility distribution π = μ_F describing John's height on the axis of sizes]
(iv) Crisp statement; fuzzy information
In contrast with the previous case, it is the available information that is vague; i.e. M(ℬ) is a fuzzy set F, describing what is known of John's height. F is pictured in Fig. 4. The membership function of F is here interpreted as a possibility distribution π = μ_F, because the values in the support of F (defined by {u ∈ U | μ_F(u) > 0}) are mutually exclusive candidates for John's height. In contrast, in (iii) the values in the support of A were simultaneously somewhat compatible with the query statement S; i.e. μ_A was not a possibility distribution. Here μ_F(u) = π(u) is the degree of possibility that John's height is exactly equal to u. Because the information is not precise, but the statement S is crisp as in (i) and (ii), the truth or falsity of S may not be known with certainty, as in case (ii). But, because the information is fuzzy, this certainty is graded. μ_{t(S|ℬ)} is a possibility distribution over {0, 1}, ranging on the interval [0, 1], defined by

μ_{t(S|ℬ)}(1) = sup {π(u) | u ∈ A} = Π(S)   (7)

μ_{t(S|ℬ)}(0) = sup {π(u) | u ∉ A} = Π(¬S)   (8)
with A = M(S), consistently with (2) and (3). Note that in this case, strictly speaking, we do not have intermediary degrees of truth. Π is a set function called a possibility measure (Zadeh, 1978a), and Π(A) is the degree of possibility that S is true when A = M(S). The degree of necessity that S is true is N(A) = 1 - Π(Ā); i.e. it is the degree of impossibility that S is false. Possibility measures are such that ∀A, B, Π(A ∪ B) = max (Π(A), Π(B)), and max (Π(A), Π(Ā)) = 1; i.e. at least one of the two numbers defined by (7) and (8) is equal to 1.
(v) Vague statement; fuzzy information (general case)
In this case both M(S) and M(ℬ) are represented by fuzzy sets. In the example, we want to know whether John is tall knowing that his height is about 1.70 m. See Fig. 5.

Fig. 5 [figure: the fuzzy sets A = M(S) and M(ℬ) on the axis of sizes]
This case combines cases (iii) and (iv), in the sense that there are intermediary degrees of truth that are not precisely known. μ_A takes values on the unit interval [0, 1], but because the value of John's height is ill-located in M(ℬ), t(S | ℬ) is itself a fuzzy set of [0, 1], which can be interpreted as a fuzzy truth-value (Bellman and Zadeh, 1977), whose membership function is defined by the extension principle (see e.g. Dubois and Prade, 1980a)

μ_{t(S|ℬ)}(v) = sup {π(u) | μ_A(u) = v}   (9)

μ_{t(S|ℬ)}(v) is the grade of possibility that the degree of truth of S given ℬ is v. The fuzzy truth value t(S | ℬ) can be approximated by means of two numbers Π(S) and N(S), which extend (7) and (8) to the case of a vague statement (Zadeh, 1978b; Dubois and Prade, 1985a), with A = M(S):
Π(S) = sup_u min (μ_A(u), π(u))   (10)

N(S) = 1 - Π(¬S) = inf_u max (μ_A(u), 1 - π(u))   (11)
Namely, given a truth-value v* with grade of possibility 1, it is easy to prove that N(S) ≤ v* ≤ Π(S) (Dubois et al., 1986). More specifically, when M(ℬ) = {u_0}, Π(S) = N(S) = μ_A(u_0). Moreover, Π(S) and N(S) can be obtained from t(S | ℬ) directly (Dubois and Prade, 1985c). Π(S) and N(S) can be viewed as the degrees of possibility and necessity that S is "true", if we interpret "true" by extending its definition from {0, 1} (i.e. μ_true(1) = 1, μ_true(0) = 0) to [0, 1] by letting μ_true(v) = v, ∀v ∈ [0, 1]. We then have in any case

Π(S) = sup_v min (μ_{t(S|ℬ)}(v), v)   (12)

N(S) = inf_v max (1 - μ_{t(S|ℬ)}(v), v)   (13)
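A discretized sketch of (9)-(11); the membership values chosen for "tall" and "about 1.70 m" are assumed shapes, not values read off the figures.

    U    = [1.60, 1.65, 1.70, 1.75, 1.80]
    mu_A = [0.00, 0.25, 0.50, 0.75, 1.00]     # "tall" (assumed)
    pi   = [0.00, 0.50, 1.00, 0.50, 0.00]     # "about 1.70 m" (assumed)

    def fuzzy_truth(mu_A, pi):
        # eq. (9): possibility that the degree of truth equals v
        t = {}
        for a, p in zip(mu_A, pi):
            t[a] = max(t.get(a, 0.0), p)
        return t

    Pi_S = max(min(a, p) for a, p in zip(mu_A, pi))       # eq. (10)
    N_S  = min(max(a, 1 - p) for a, p in zip(mu_A, pi))   # eq. (11)
    print(Pi_S, N_S)                          # 0.5 0.5, with N(S) <= Pi(S)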
Table 1 summarizes the above discussion. Note that in the case of statistical knowledge (the information in ℬ is represented by a histogram) instead of fuzzy knowledge, and of a crisp query, we recover probabilistic logic, where the probability that S is true is equal to 1 minus the probability that S is false.

Table 1

    Query    Knowledge    Truth                                 Logic
    Crisp    Precise      0 or 1                                Classical
    Crisp    Imprecise    0, 1 or {0, 1}                        Close to modal
    Crisp    Fuzzy        Possibility of 0, possibility of 1    "Possibilistic"
    Vague    Precise      Between 0 and 1                       Many-valued
    Vague    Fuzzy        Fuzzy truth-value                     Fuzzy

    (Arrows indicate the orders of increasing complexity)
1.2 Truth-functionality issues
In this section we examine the status of logics of uncertainty and vagueness with respect to the existence of truth-functional connectives. It is proved that logics of uncertainty, such as the one arising in case (iv) of the previous section, cannot be truth-functional, while this property can be preserved for the logic of vagueness based on fuzzy sets in the presence of complete information.

Let us assume that S is a crisp statement. S can only be true or false, eventually. However, the available information in ℬ may prevent us from concluding clearly about this matter. For instance, when ℬ contains vague statements and M(ℬ) is fuzzy, only degrees of possibility and necessity that S is true can be computed. One may think that, in order to model the lack of knowledge about the truth or falsity of S, an intermediary degree of truth in some many-valued logic could be used. One advantage would be that, using the truth-functionality property, it would be easy to assess the state of uncertainty about the truth of a compound statement in terms of the "degree of truth" of elementary statements. Let us denote by r(p) such a degree of truth of a classical proposition p. For simplicity, we assume that p belongs to a Boolean algebra, supposedly finite; i.e. we restrict ourselves to propositional calculus. r(p) is computed for instance out of some measure of uncertainty pertaining to its meaning M(p) = A, for instance

r(p) = ½[Π(A) + N(A)]
as implicitly suggested in Gaines (1976). In the following we make no assumption about the way τ(p) is actually elicited. The following result proves the uselessness of any truth-functional [0, 1]-valued logic with continuous connectives as a model of a logic of uncertainty on a Boolean algebra of standard propositions.

Proposition.  Let 𝒫 be a finite Boolean algebra of propositions and let τ be a truth-assignment function 𝒫 → [0, 1], supposedly truth-functional via continuous connectives. Then ∀p ∈ 𝒫, τ(p) ∈ {0, 1}. Moreover, τ is an interpretation in the sense of propositional calculus, i.e. τ(p) = 1 ⇔ τ(¬p) = 0.
Proof.  Truth-functionality implies that there exist a function f: [0, 1] → [0, 1] and a two-place operation * on [0, 1] such that

∀p, τ(¬p) = f(τ(p)),  with f(1) = 0 and f(0) = 1
∀p, q, τ(p ∨ q) = τ(p) * τ(q),  with 1 * 1 = 1 = 0 * 1 = 1 * 0 and 0 * 0 = 0
Using results in Dubois and Prade (1982a), f is a continuous order-reversing involution on [0, 1] (because ¬¬p = p), and * is a continuous monotone semigroup operation on [0, 1] called a triangular conorm (Schweizer and Sklar, 1963). Letting p = q leads to τ(p) = τ(p) * τ(p), i.e. * is idempotent. The only idempotent conorm is "max". Hence τ is a possibility measure on E(𝒫), the set of atoms of 𝒫. The truth-functionality of negation implies, moreover, τ(p) = f(τ(¬p)), while in possibility theory max(τ(p), τ(¬p)) = 1. Hence ∀p, max(τ(p), f(τ(p))) = 1, and the result follows. QED

Thus assuming truth-functionality leads back to the case where propositions are all known as being either true or false, which, using the semantic view of truth-values developed in the preceding section, implies that the available information in the database ℬ is precise. As a consequence, logics of uncertainty cannot be truth-functional. This result is a reminder of the well-known fact in mathematics that a non-trivial Boolean algebra that is linearly ordered has only two elements.

When vague propositions are allowed, the result is no longer valid. Indeed, algebras of vague propositions (whose interpretations are fuzzy sets) are no longer Boolean. The properties of a family of fuzzy subsets of a given universe depend upon the choice of set-theoretic operations. Whatever this choice may be, the necessity of relaxing the Boolean structure is proved in Dubois and Prade (1980b). The most popular choice of operations is (Zadeh, 1965)
union            μ_{A ∪ B} = max(μ_A, μ_B)    (14)
intersection     μ_{A ∩ B} = min(μ_A, μ_B)    (15)
complementation  μ_Ā = 1 − μ_A                (16)
Using these connectives, all the properties of a Boolean algebra are preserved except A ∪ Ā = U and A ∩ Ā = ∅. In other words, we get a complete distributive lattice with a pseudocomplementation. The above choice of operations is the unique one that keeps this structure, except for the complementation, which can be more general. In that context it was proved by Ponasse (1978) that fuzzy-set theory provides a proper representation of Łukasiewicz many-valued algebras; it is a counterpart of the famous Stone representation theorem for Boolean algebras. Such a work validates the interpretation of vague predicates in terms of fuzzy sets.
Although in the following fuzzy set-theoretic operations are always defined by (14)-(16), other connectives exist. Axiomatic settings for fuzzy-set-theoretic operations are reviewed in Dubois and Prade (1985b), based on extensive use of results on functional equations. However, the algebraic structures obtained when departing from (14)-(16) are poorer.

Truth-functionality is recovered when evaluating the grade of truth of a fuzzy predicate in the presence of precise information (case (iii) in Section 1.1). Indeed, we get, as a consequence of (14)-(16) and (6), the minimum, the maximum and the complement to 1 as models for conjunction, disjunction and negation respectively. Hence many-valued logics look useful for accommodating a logic of vagueness, when uncertainty is ruled out. An intermediate degree of truth for a vague proposition p is then interpreted by the fact that p does not perfectly match precisely described facts, which clearly differs from the case where truth is ignored because the actual facts are ill-known (i.e. uncertainty), although p is a perfectly clear-cut statement. In that respect, fuzzy-set theory offers an interpretive framework for many-valued logics (Rescher, 1969).

As suggested earlier, when both vagueness of statements and uncertainty about actual facts are present, the grade of truth can be defined as a fuzzy number, interpreted as a possibility distribution over truth-values. This representation nicely combines the existence of intermediate truth-values and the lack of knowledge about which is the actual one. As proved by Dubois and Prade (1985c), truth-functionality is also generally lost in this case; for instance, t(S or S′ | ℬ) cannot be expressed in terms of t(S | ℬ) and t(S′ | ℬ), except in the following special case of decomposability: M(ℬ) is a fuzzy Cartesian product F × G on U × V with F defined on U, G on V, μ_{F×G} = min(μ_F, μ_G), M(S) is a fuzzy set of U and M(S′) is a fuzzy set of V; we then have
t(S or S′ | ℬ) = max(t(S | ℬ′), t(S′ | ℬ″))    (17)

where ℬ′ is the part of ℬ such that M(ℬ′) = F, ℬ″ is the part such that M(ℬ″) = G, and max is here the maximum operation extended to fuzzy numbers (Dubois and Prade, 1980a). Under the same assumptions, we have

t(S and S′ | ℬ) = min(t(S | ℬ′), t(S′ | ℬ″))    (18)

where min is similarly extended.
When measures of possibility are used, note that we always have Π(A ∪ B) = max(Π(A), Π(B)), but generally Π(A ∩ B) < min(Π(A), Π(B)) when A and B are fuzzy or crisp sets. As a consequence, generally N(A ∪ B) > max(N(A), N(B)). Thus the pair (N(A), Π(A)) approximating the fuzzy truth-value is globally not truth-functional either.
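As a small illustration (our own assumed grades, not the chapter's), the following Python fragment applies the connectives (14)-(16) on a three-element universe, showing both the failure of the excluded-middle and contradiction laws and the strict inequality Π(A ∩ B) < min(Π(A), Π(B)).

```python
# Illustrative check (assumed grades, not from the chapter): Zadeh's
# connectives (14)-(16) on a three-element universe.

U = ["u1", "u2", "u3"]
mu_A = {"u1": 0.3, "u2": 0.8, "u3": 1.0}        # a fuzzy subset A of U
mu_B = {"u1": 0.9, "u2": 0.2, "u3": 0.7}        # a fuzzy subset B of U
pi   = {"u1": 1.0, "u2": 0.6, "u3": 0.2}        # a possibility distribution

# Excluded middle and contradiction fail for the max/min/1- connectives:
print({u: max(mu_A[u], 1 - mu_A[u]) for u in U})   # A u comp(A): not all 1
print({u: min(mu_A[u], 1 - mu_A[u]) for u in U})   # A n comp(A): not all 0

def Pi(mu):   # possibility of a fuzzy event, as in (10)
    return max(min(mu[u], pi[u]) for u in U)

mu_AB = {u: min(mu_A[u], mu_B[u]) for u in U}      # intersection (15)
print(Pi(mu_AB), min(Pi(mu_A), Pi(mu_B)))          # 0.3 < 0.6: not truth-functional
```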
2  POSSIBILITY LOGIC AND OTHER LOGICS OF UNCERTAINTY
In this section we consider Boolean algebras of crisp propositions. Let g(p) be the grade of uncertainty about the truth of proposition p (we no longer call it a degree of truth!). g can be chosen among monotonic set-functions, i.e. those that respect logical entailment: if p → q = ⊤ (the tautology) then g(p) ≤ g(q). Practically, the nature of g will be dictated by the nature of the available information in the database ℬ. For instance, when ℬ contains vague predicates (case (iv) of Section 1.1), modelled by possibility distributions, g is naturally a possibility measure. Possibilistic logic is a logic of uncertainty based on the use of possibility measures. The idea of using possibility theory as a basis for uncertain deductive reasoning was first proposed by Prade (1983) and then systematically developed by the authors (Dubois and Prade, 1984c, 1985a, 1986a, 1987a).
2.1  Basic axioms and interpretations of possibility measures
In the case of possibilistic logic, each axiom p_i is assigned a grade of possibility Π(p_i) and a grade of necessity N(p_i) = 1 − Π(¬p_i) that p_i can be taken as true. These numbers can be computed by comparing the meaning of p_i with a description of some actual state of facts, as argued in Section 1. The laws of possibility theory state that

Π(⊤) = 1,    Π(⊥) = 0

and

∀p, q,  Π(p ∨ q) = max(Π(p), Π(q))    (19)

where ⊤ and ⊥ stand respectively for the propositions true and false in any interpretation. Thus, as long as classical logical propositions are used, the relation max(Π(p), 1 − N(p)) = 1 must be satisfied. More specifically,

N(p) = 1 entails that p is true;
Π(p) = 0 entails that p is false;
N(p) = 0, Π(p) = 1 means total ignorance about the truth or falsity of p.
Here the notions of possibility and necessity are given a logical interpretation related to the lack of precision of the available information. Other interpretations of the same mathematical model have been put forward. In the nineteen-fifties, in the framework of decision theory, Shackle, an English economist, advocated a non-probabilistic model of subjective uncertainty (see Shackle, 1961); in this work the notion of epistemic possibility, expressed
in terms of degrees of potential surprise, was introduced. This proposal is very close to the above model. Basically, Shackle claims that human decisions are taken on the basis of available possibilities rather than probabilities. Zadeh (1978a) proposes a physical view of possibility in terms of ease of attainment, or feasibility (e.g. the possibility of squeezing a given number of tennis balls into a box). Frequentist views of possibility measures are suggested by Dubois and Prade (1986b); in that perspective, a possibility measure is a special case of a random set (Goodman and Nguyen, 1985). Possibility measures can also be considered as a limiting case of decomposable set-functions g, i.e. such that g(A ∪ B) = g(A) * g(B) for some operation * when A ∩ B = ∅; such an axiomatic approach to "subjective distorted probabilities" was initiated by Dubois and Prade (1982a). A view of possibility in the spirit of measurement theory (comparative possibility) is proposed by Dubois (1987). Finally, Giles (1982) develops an interpretation of possibility measures in terms of betting behaviour in the spirit of decision theory.
2.2  Uncertain deductive reasoning with possibility degrees
A paradigm of deductive reasoning under uncertainty can be described as follows. Given a set of axioms p_1, p_2, ..., p_n consisting of wffs in, say, first-order logic on finite domains, attach to each p_i a number g(p_i) expressing a grade of confidence regarding the truth of p_i, with the convention that g(⊤) = 1 for a tautology and g(⊥) = 0 for a contradiction. g is a function from the Boolean algebra 𝒫 generated from {p_1, ..., p_n} to [0, 1], as defined in the introduction to this section. The problem of deductive inference under uncertainty comes down to computing g(p) (and g(¬p)) for any proposition p of interest, knowing g(p_1), ..., g(p_n). This computation must take advantage of the properties of the uncertainty measure g, here a possibility or a necessity measure. Basic patterns of inference of classical logic have been extended to possibilistic logic, namely:

modus ponens (Prade, 1983; Dubois and Prade, 1984c)

Π(q) ≥ N(q) ≥ min(N(p), N(p → q))    (20)

modus tollens (ibid.)

N(p) ≤ Π(p) ≤ max(Π(q), 1 − N(p → q))    (21)
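A minimal sketch of how the bounds (20) and (21) propagate; the numeric grades below are illustrative assumptions.

```python
# Sketch of the possibilistic modus ponens (20) and modus tollens (21);
# the numeric grades are illustrative assumptions.

def modus_ponens(N_p, N_rule):
    """Lower bound on N(q) from N(p) and N(p -> q), formula (20)."""
    return min(N_p, N_rule)

def modus_tollens(Pi_q, N_rule):
    """Upper bound on Pi(p) from Pi(q) and N(p -> q), formula (21)."""
    return max(Pi_q, 1 - N_rule)

print(modus_ponens(N_p=0.8, N_rule=0.9))    # q is at least 0.8-certain
print(modus_tollens(Pi_q=0.1, N_rule=0.9))  # p is at most 0.1-possible
```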
More generally, from knowledge of N(p), N(¬p) and the grades of necessity of all the ways of relating p and q by means of implication, the inference process
can be put into matrix form (Dubois and Prade, 1986a):

[ N(q)  ]     [ n_11  n_10 ]   [ N(p)  ]
[ N(¬q) ]  ≥  [ n_01  n_00 ] ∘ [ N(¬p) ]    (22)

where the matrix product ∘ is a sup-min composition and n_ij is a lower bound on N(p^j → q^i), with p^1 = p, p^0 = ¬p (and similarly for q). Proper behaviour of the inference rule requires that at least one diagonal of the matrix contains zeros (see Dubois and Prade, 1986a).

Along the same lines of thought, the resolution principle has been extended in this context (Dubois and Prade, 1987a) for ground clauses, as well as first-order ones. For ground clauses it can be proved that

N(q ∨ r) ≥ min(N(p ∨ q), N(¬p ∨ r))    (23)
which gives back the resolution principle when the right-hand side of the inequality is 1. The refutation method, as a proof methodology, can be extended to uncertain clauses. Here, to an uncertain clause is attached a degree of uncertainty interpreted as a positive lower bound on its grade of necessity. It was proved that the grade of necessity attached to the empty clause is a lower bound on the grade of necessity of the proposition to be proved. This lower bound is obtained by applying the (extended) resolution principle to the set of axioms equipped with their uncertainty levels, together with the negation of the proposition to be proved, taken with necessity 1. Moreover, a set of uncertain ground clauses is said to be inconsistent when the allocation of lower bounds on grades of necessity violates the axioms of possibility theory. (N(p) ≥ α, N(¬p) ≥ β and min(α, β) > 0 is an example of such a violation.) It can be proved that this notion of inconsistency is equivalent to the classical inconsistency of the set of clauses where the lower bounds on the grades of necessity are removed. Based on these results, it is possible to envisage automated-reasoning techniques in the style of theorem-proving. But there is a clear problem of strategy in order to maximize the lower bound attached to the empty clause (Dubois et al., 1987a).
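As an illustration of the weighted resolution rule (23) and of the refutation scheme just described, here is a small Python sketch. The clause encoding and the example knowledge base are our own assumptions, and no search strategy is implemented: the naive loop simply saturates the clause set.

```python
# Toy refutation with necessity-weighted ground clauses, after formula (23).
# The clause encoding and the example knowledge base are our own assumptions.

def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Yield resolvents weighted by min of the parents' bounds, as in (23)."""
    (lits1, n1), (lits2, n2) = c1, c2
    for l in lits1:
        if neg(l) in lits2:
            yield (frozenset(lits1 - {l}) | frozenset(lits2 - {neg(l)}),
                   min(n1, n2))

def refute(clauses):
    """Best necessity lower bound derived for the empty clause."""
    best, agenda, seen = 0.0, list(clauses), set(clauses)
    while agenda:
        c = agenda.pop()
        for d in list(seen):
            for r in resolve(c, d):
                if not r[0]:
                    best = max(best, r[1])
                elif r not in seen:
                    seen.add(r)
                    agenda.append(r)
    return best

kb = [(frozenset({"~p", "q"}), 0.8),   # N(p -> q) >= 0.8, i.e. N(~p v q) >= 0.8
      (frozenset({"p"}), 0.6)]         # N(p) >= 0.6
# Refutation: add the negated goal q with necessity 1, derive the empty clause.
print(refute(kb + [(frozenset({"~q"}), 1.0)]))   # 0.6 = min(0.8, 0.6)
```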
2.3  Relationship with other logics of uncertainty
It is clear that other types of measures of uncertainty can be used instead of possibility measures, for instance probability measures and belief functions; but non-numerical approaches such as modal logic or non-monotonic logic can also account for uncertainty. The links between possibility measures and these other proposals are briefly discussed in this section.
2.3.1  Probabilistic logic

Probabilistic logic is considered for automated-reasoning purposes by several
authors (see Chapter 8). One of the problems is to give some interpretation to grades of probability; this is achieved by interpreting logical formulae as subsets of elementary events referred to as sets of possible worlds, or even by directly relating the probabilities to statistical experiments. In other words, these approaches define some way of capturing the meaning of logical propositions, in the same spirit as is done here, although with different terminologies and assumptions. Interestingly enough, the automated-reasoning techniques proposed by most probability theorists strongly depart from the classical theorem-proving methodology. Namely, the meaning of propositions is explicitly used in the reasoning procedure, which then comes down to a constrained-optimization problem. However, it can be proved (see e.g. Suppes, 1966; Dubois and Prade, 1987a) that basic patterns of inference can be extended to probabilistic logic; for example the resolution principle becomes

Prob(q ∨ r) ≥ max(0, Prob(p ∨ q) + Prob(¬p ∨ r) − 1)    (24)
Note that the lower bound is smaller than with necessity degrees. Quinlan's INFERNO system, mentioned in Chapter 8, is closer to the spirit of symbolic reasoning, but it lacks completeness, since it does not always compute the best bounds on probability values. Possibilistic reasoning techniques turn out to be far simpler, and their completeness can be conjectured. Moreover, the use of the additive law in probabilistic logic may lead to an increase of errors from input data to conclusions, while errors remain constant in the possibilistic setting. Finally, the normalization rule for probability measures (Σ_{i=1,...,n} Prob(u_i) = 1, where u_i is an elementary event) is more difficult to satisfy than the normalization rule for possibility measures (max_{i=1,...,n} Π(u_i) = 1). Namely, inconsistent uncertainty assignments may be more frequent in the probabilistic setting than in the possibilistic one. But the major difference between possibilistic and probabilistic logics is that in the latter there is no absolute convention for modelling ignorance about the truth of a proposition in terms of a single probability value. One can only do it in terms of upper and lower probabilities: P*(p) = 1 ≥ Prob(p) ≥ P_*(p) = 0; but this is exactly the convention that is adopted in the possibilistic setting.
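The difference between the bounds (23) and (24) is easy to see numerically; the grades below are assumptions.

```python
# Comparing the resolution bounds (23) and (24) on the same grades;
# the numbers are illustrative assumptions.

def possibilistic_bound(n1, n2):     # formula (23), necessity degrees
    return min(n1, n2)

def probabilistic_bound(p1, p2):     # formula (24), probabilities
    return max(0.0, p1 + p2 - 1.0)

for a, b in [(0.9, 0.8), (0.7, 0.6), (0.5, 0.5)]:
    print(a, b, possibilistic_bound(a, b), probabilistic_bound(a, b))
# 0.7 and 0.6 give 0.6 possibilistically, but only about 0.3 probabilistically
```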
2.3.2  Belief functions
Both settings can be reconciled within the theory of evidence; see Chapter 9. Axioms p_1, ..., p_n are given grades of credibility (belief, support) Cr(p_i) and grades of plausibility Pl(p_i) ≥ Cr(p_i), which implicitly define a basic probability assignment m: 𝒫 → [0, 1] such that m(⊥) = 0 and

Σ_{p ∈ 𝒫} m(p) = 1    (25)
Namely, denoting by I(p) the set of implicants of p (I(p) = {q | q → p = ⊤}), the functions Cr and Pl are defined by

Cr(p_i) = Σ_{q ∈ I(p_i)} m(q)    (26)

Pl(p_i) = 1 − Σ_{q ∈ I(¬p_i)} m(q)    (27)

Note that (26) and (27) do not generally characterize a unique basic probability assignment m from knowledge of {(Cr(p_i), Pl(p_i)), i = 1, ..., n}. The good point about this approach is that the evidence supporting p can be only loosely related to the evidence supporting ¬p, in contrast with the probabilistic setting, where Prob(p) + Prob(¬p) = 1. Intervals [Cr(p), Pl(p)] can be viewed as constraining ill-known probability values. In particular, if m(p) = 0 for any non-atomic p ∈ 𝒫, then only elementary propositions are weighted by m, and Cr = Pl are probability measures. In contrast, if the set of focal propositions ℱ = {q | m(q) > 0} can be ordered as q_1, ..., q_m such that ∀i = 1, ..., m − 1, q_i ∈ I(q_{i+1}), then the function Pl is a possibility measure, and conversely. In this case Cr is also a necessity measure (see e.g. Dubois and Prade, 1982b).
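A small sketch of (25)-(27), representing propositions extensionally as subsets of a set of atoms; the mass assignment is an illustrative assumption with nested (consonant) focal elements, so that Pl behaves as a possibility measure.

```python
# Sketch of (25)-(27) with propositions represented extensionally as subsets
# of a small set of atoms; the basic probability assignment m is an
# illustrative assumption with nested (consonant) focal elements.

atoms = frozenset({"a", "b", "c"})

m = {frozenset({"a"}): 0.5,                 # masses sum to 1, as in (25)
     frozenset({"a", "b"}): 0.3,
     frozenset({"a", "b", "c"}): 0.2}

def Cr(p):   # (26): total mass of the implicants of p (the subsets of p)
    return sum(w for q, w in m.items() if q <= p)

def Pl(p):   # (27): 1 minus the mass committed to the implicants of not-p
    return 1.0 - sum(w for q, w in m.items() if q <= atoms - p)

p = frozenset({"a", "c"})
print(Cr(p), Pl(p))                         # 0.5 1.0

# With nested focal elements, Pl is max-decomposable over unions,
# i.e. a possibility measure (and Cr a necessity measure):
x, y = frozenset({"a"}), frozenset({"c"})
print(Pl(x | y), max(Pl(x), Pl(y)))         # both equal 1.0
```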
2.3.3  Modal logic
It is interesting to discuss the links between possibility logic and modal logics, which provide a syntactic modelling of the concepts of possibility and necessity, usually referring to possible-worlds semantics (see Appendix B of the Introduction). The main differences between the two approaches seem to be as follows.

(i)  In modal logic, possibility and necessity are all-or-nothing concepts. As a consequence they can be introduced as special symbols in the language: □p reads "p is necessary", while ◊p reads "p is possible". In contrast, in possibility theory, possibility is a graded notion, as is necessity; whence the use of numbers.

(ii)  Modal logics propose numerous axiomatic settings, while the axioms of possibility theory are well-defined and unique. Along this line, it is relevant to define qualitative counterparts of the possibility-theory axioms, in the style of modal logic, restricting ourselves to the case where Π(p) ∈ {0, 1}. One way of doing this is to use the following translation rule:

⊢ □p  translates into  N(p) = 1
⊢ ◊p  translates into  Π(p) = 1
Clearly, the classical identity ¬◊p = □¬p translates into 1 − Π(p) = N(¬p), which is a basic relationship in possibility theory. Moreover, a numerical translation of Lewis' implication □(p → q) is clearly N(p → q) = 1, which implies that p → q is true in possibility logic. The basic axiom of possibility logic can be expressed as

⊢ ◊(p ∨ q) ↔ (◊p ∨ ◊q)    (28)

This is one of the basic axioms of the modal-logic system T according to von Wright (see Hughes and Cresswell, 1968). In addition, possibility theory recovers the square of Aristotelian modalities, as does the S5 system. It would be interesting to relate possibility theory to some existing formal systems of modal logic.
2.3.4  Default reasoning using logics of uncertainty
Probabilistic logic and possibility logic have both been suggested as possible approaches to default reasoning (Rich, 1983; Farreny and Prade, 1986). The idea is to interpret the weight bearing on an "if-then" rule as a measure of the extent to which the rule has no exception. The rule then models an imperfect "is-a" link in a semantic network; see Chapter 7. This interpretation of logics of uncertainty faces several problems.

(i)  It is not clear that in default logic the grade of uncertainty must be attached to a logical implication p → q. The use of conditioning instead of implication may appear more natural for modelling imperfect "is-a" relations. For instance, Prob(q | p) expresses the proportion of q's among those that are p's. See Zadeh (1983) for a treatment of default rules in terms of fuzzy proportions. The problem raised here is the difference between what can be called a "conjecture" (i.e. a universal assertion that is true or false, but cannot yet be proved nor refuted) and what Zadeh (1985) calls a "disposition" (an assertion that generally holds, but sometimes does not).

(ii)  Default rules do not always underlie a statistical interpretation. In particular, typicality (Chapter 7) seems to be of a different nature. In that case a possibilistic treatment of default rules, as done by Farreny and Prade (1986), may be more satisfactory. A default rule is then modelled by knowledge of the quantity Π(q | p), defined from the knowledge of a possibility measure by the relation

Π(p ∧ q) = min(Π(p), Π(q | p))    (29)
However, as indicated
by Dubois and Prade (1986a), this notion of conditioning is very close to logical implication, since Π(q | p) = 1 − N(p → ¬q) or 1.
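In min-based conditioning, Π(q | p) is usually taken as the greatest solution of (29); a two-line sketch with assumed values:

```python
# Min-based conditioning after formula (29): take Pi(q | p) as the greatest
# solution of Pi(p and q) = min(Pi(p), Pi(q | p)). Values are assumptions.

def cond_possibility(Pi_pq, Pi_p):
    """Greatest x such that min(Pi_p, x) = Pi_pq (assuming Pi_pq <= Pi_p)."""
    return 1.0 if Pi_pq == Pi_p else Pi_pq

print(cond_possibility(0.4, 0.4))   # 1.0: q is fully possible given p
print(cond_possibility(0.2, 0.4))   # 0.2
```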
3  FUZZY LOGIC
Uncertain propositions, considered in the preceding section, must not be confused with fuzzy propositions. In the first case we have propositions that are true or false (thus involving non-vague predicates), but, owing to the lack of precision of the available information, we can in general only estimate to what extent it is possible or necessary that a proposition is true. In the second case the available information is precise, but the vagueness of the predicates leads to propositions with intermediary degrees of truth. Obviously we may encounter a fuzzy proposition for which the available reference information is not precise; then we have the general case of an uncertain fuzzy proposition; the study of such propositions is outside the scope of this introduction. This situation is the most complicated one, and it leads to fuzzy truth-values, as indicated in Section 1. See also Prade (1985b) and Yager (1984) for a representation and a treatment of uncertain fuzzy propositions in terms of possibility distributions.
3.1  Formal reasoning with vague predicates
Patterns of reasoning in the style of modus ponens can be developed for fuzzy propositions, i.e. bounds on t(q) can be computed from knowledge of t(p) and t(p → q); see Dubois and Prade (1980a) for instance. However, there are different natural ways of defining t(p → q) from (14)-(16), as discussed by Dubois and Prade (1984a); moreover, we may have t(¬q → ¬p) ≠ t(p → q) for some definitions of the implication operator, when t(p → q) is not defined as t(¬p ∨ q). It can be proved that t(q) ≥ min(t(¬p ∨ q), t(p)) if and only if t(¬p ∨ q) + t(p) > 1 (see Dubois and Prade, 1980a, p. 167), a result that contrasts with (20). Quite early in the development of fuzzy-set theory, an extension of Robinson's resolution principle was proposed by Lee (1972) for ground clauses in the framework of the fuzzy logic defined by (14)-(16), i.e. for dealing with fuzzy propositions; note that the resolution principle avoids the explicit use of the implication connective in the representation of the knowledge. Basically, Lee proved that if all the truth-values of the parent clauses are strictly greater than 0.5 then a resolvent clause derived by the resolution principle always has a truth-value between the maximum and the minimum of those of the parent clauses. See Dubois and Prade (1987a) for a bibliography of subsequent works along this line.
Note that a set of fuzzy propositions is generally not a Boolean algebra. In particular, the law of contradiction is not valid, so that the refutation method, which is the basis of many logic-programming techniques, seems hard to implement here. This fact, and also the fact that the above results are applicable only to ground clauses, may restrict the applicability of formal proving methodologies for fuzzy logic.
3.2  Fuzzy logic based on meaning computation
Another approach to reasoning with fuzzy statements, i.e. statements containing fuzzy predicates, is described by Zadeh (1979). Let S_1, S_2, ..., S_n be n statements expressed, say, in natural language. Let {M(S_i), i = 1, ..., n} be the meanings of S_1, ..., S_n, as defined in Section 1. M(S_i) is obtained by translating S_i into a meaning-representation language such as PRUF (Zadeh, 1978b), and is a fuzzy restriction on a variable vector X_i taking its values on a universe U_i. Let U be the universe of discourse built from U_1, ..., U_n, and let V be some subuniverse from which a variable Y in which we are interested takes its values. Reasoning is here viewed as using the statements S_1, ..., S_n to say something about Y. This is done in three basic steps:
(i)  compute the meanings M(S_i) of the S_i, i = 1, ..., n, and their cylindrical extensions on U, say M₀(S_i);

(ii)  calculate the join of the M(S_i); this yields a fuzzy relation R on U, interpreted as a possibility distribution;

(iii)  project R on the universe V, to obtain the fuzzy relation Proj_V(R), which is the meaning of the conclusion statement S about Y.
This scheme is very general. R is obtained by intersection of the fuzzy relations M₀(S_i). Let X be the vector variable pertaining to U; X can be denoted (Y, Y′), where Y′ pertains to variables taking their values in V′ such that U = V × V′. Proj_V(R) is defined by

∀Y, μ_{Proj_V(R)}(Y) = sup_{Y′} μ_R(Y, Y′)    (30)
consistently with possibility theory. The classical modus-ponens pattern has been generalized to the following fuzzy-logic pattern:

S_1: X is A′
S_2: if X is A then Y is B
S:  Y is B′

where M(S_1) = A′, a fuzzy set on U_1.
M(S_2) is obtained by means of a multiple-valued implication connective denoted → (see e.g. Dubois and Prade (1984b) for a review). We have

μ_{M(S_2)}(X, Y) = μ_A(X) → μ_B(Y)    (31)

The universe U is U_1 × V, and the relation R on U is such that

μ_R(X, Y) = μ_{A′}(X) * (μ_A(X) → μ_B(Y))    (32)

where * is an intersection operation such as (15). Thus we get

μ_{B′}(Y) = sup_X μ_{A′}(X) * (μ_A(X) → μ_B(Y))    (33)
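The following discretized Python sketch of (31)-(33) uses * = min together with the Gödel implication discussed just below; the universes and membership functions are illustrative assumptions.

```python
# Discretized sketch of the generalized modus ponens (31)-(33), with * = min
# and the Godel implication; universes and membership functions are
# illustrative assumptions.

U = [i / 10.0 for i in range(11)]     # universe for X
V = [i / 10.0 for i in range(11)]     # universe for Y

def mu_A(x):  return max(0.0, 1.0 - 2 * abs(x - 0.3))   # "X is around 0.3"
def mu_B(y):  return max(0.0, 1.0 - 2 * abs(y - 0.7))   # "Y is around 0.7"
def mu_A1(x): return max(0.0, 1.0 - 2 * abs(x - 0.4))   # fact A': near A

def godel(a, b):                      # a -> b = 1 if a <= b, else b
    return 1.0 if a <= b else b

def mu_B1(y):                         # formula (33): sup-min composition
    return max(min(mu_A1(x), godel(mu_A(x), mu_B(y))) for x in U)

print([round(mu_B1(y), 2) for y in V])   # B': a widened version of B
```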
It is shown by Dubois and Prade (1984b, 1985a) that the choice of the implication operation → is dictated by the choice of the intersection * as soon as the meaning of S_2 is defined from μ_A and μ_B under the following constraints:

(i)  from "X is A" one should conclude S: "Y is B";

(ii)  M(S_2) should be as unspecific as possible (i.e. as large a fuzzy set as possible) so as not to be arbitrary.

In particular, if * = min then → should be Gödel implication (a → b = 1 if a ≤ b, and b otherwise). An axiomatic approach to the definition of many-valued-logic implication connectives (Rescher, 1969) is given in Trillas and Valverde (1985) for instance. A unified view of several classes of many-valued implication functions is proposed by Dubois and Prade (1984a). The pattern of the resolution principle can be dealt with using the same methodology:
S_1: X is A′ or Z is C
S_2: X is not A or Y is B
S:  (Y, Z) is D

This pattern can be called a generalized resolution principle. Indeed, here A′ ≠ A, and the predicates are fuzzy. It can be checked that, using (14)-(16) for the basic set-theoretic operations,

μ_D(Y, Z) = sup_X min(max(μ_{A′}(X), μ_C(Z)), max(1 − μ_A(X), μ_B(Y)))    (34)

Note that μ_D(Y, Z) ≥ max(μ_C(Z), μ_B(Y)) always holds; equality is obtained when A′ = A is a crisp subset. The classical resolution principle is then recovered. Note that when C is empty, the pattern of the generalized modus ponens (33) is recovered, with * = min and a → b = max(1 − a, b). When the implication used is not Gödel's, A′ = A does not yield B′ = B in (33), nor D = B or C in (34). In other words, the elimination of fuzzy predicates
is not always permitted in fuzzy counterparts of the resolution principle. Recovering the elimination property requires once again a proper choice of the implication operations in the pattern

S_1: if X is not A then Z is C
S_2: if X is A then Y is B
S:  (Y, Z) is B or C

Note that the inference mechanism in fuzzy logic is generally a nonlinear-programming technique. Examples of systems based on these ideas are proposed by Baldwin (1979, 1983), Yager (1984) and Martin-Clouaire and Prade (1986) for example. See also Prade and Negoita (1986) and Sanchez and Zadeh (1987) for application-oriented papers, and Prade (1985a) for a larger bibliography.
3.3  Illustrative example
Let us consider the following example, which illustrates various aspects of possibility and fuzzy logics. We have two rules and two facts:

(a)  if a person is a professional jockey (j) then his/her weight is approximately between 45 and 50 kg (A′);

(b)  if a person is a male (m) and his weight is between 40 and 50 kg (A) then it is likely he is a teenager (t);

(c)  John (J) is a male (m) and a professional jockey (j).
The possible weights of a professional jockey specified by rule (a) are represented by means of a bell-shaped possibility distribution μ_{A′}, like that pictured in Fig. 6, whose support is the interval [45, 50]. The conclusion part of rule (b) is pervaded with uncertainty; this can be modelled using the notation introduced above:

(a): j(x) → A′(x);
(b): m(x) ∧ A(x) → t(x)    (α);
(c): m(J), j(J)

where α = N(∀x, m(x) ∧ A(x) → t(x)). A particular case of the generalized modus ponens enables us to deduce from (a) and (c) that A′(J), i.e. John's weight is fuzzily restricted by A′. Then we compute to what extent we are certain that the condition part of (b) holds, as

N(m(J) ∧ A(J); m(J) ∧ A′(J)) = min(N(m(J); m(J)), N(A(J); A′(J)))
                             = min(1, N(A(J); A′(J)))
                             = N(A(J); A′(J))

which can be easily computed using (11) with π = μ_{A′(J)}. Then by (20) we propagate the uncertainty along rule (b):

N(t(J)) ≥ min(N(A(J); A′(J)), α)
However, here we may find it highly undesirable to conclude that John being a teenager is somewhat certain. The problem of controlling transitivity effects is classical in default logic; see Chapter 7. The way of coping with this problem in our framework is as follows. First, as others do, we substitute for (b) a more precise statement, namely

(b′)  if a person is a male and his weight is between 40 and 50 kg and he is not a jockey then it is likely that he is a teenager.

Then if we know that John is a jockey, rule (b′) can no longer be applied to John. Secondly, if we just know that John is a male and that his weight is fuzzily restricted by A′ then, in order to be able to use (b′), we keep the piece of default knowledge that the (a priori) possibility that a person is a jockey is very low, say ε. So the corresponding a priori certainty that a person is not a jockey is high (1 − ε). The evaluation of the condition part of (b′) now yields min(N(A(J); A′(J)), 1 − ε), and finally our certainty that John is a teenager will be min(N(A(J); A′(J)), 1 − ε, α), i.e. a strong certainty if α and N(A(J); A′(J)) are close to 1.
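A numerical sketch of this second scenario (John known only as a male whose weight is restricted by A′); the shapes of μ_A and μ_{A′} and the grades ε and α below are assumptions, since the chapter does not fix these numbers.

```python
# Numerical sketch of the jockey example, for the case where John is only
# known to be a male whose weight is restricted by A'. The membership
# functions and the grades eps and alpha are illustrative assumptions.

weights = [30 + 0.5 * i for i in range(81)]       # universe: 30..70 kg

def mu_A(u):    # condition of rule (b'): weight between 40 and 50 kg (crisp)
    return 1.0 if 40 <= u <= 50 else 0.0

def mu_A1(u):   # rule (a): "approximately between 45 and 50 kg" (triangular)
    if 45 <= u <= 47.5:
        return (u - 45) / 2.5
    if 47.5 < u <= 50:
        return (50 - u) / 2.5
    return 0.0

# N(A(J); A'(J)) by formula (11), with pi = mu_A'
N_match = min(max(mu_A(u), 1 - mu_A1(u)) for u in weights)

alpha, eps = 0.9, 0.1   # assumed rule certainty and prior possibility of "jockey"
print(N_match, min(N_match, 1 - eps, alpha))      # 1.0 and 0.9 here
```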
3.4  Reasoning with fuzzy quantifiers
Often items of knowledge are expressed in the form of statements involving quantifiers different from the universal or the existential ones. These quantifiers can be viewed as proportions that may be only vaguely specified. They translate linguistic terms such as "most of" and "some". Zadeh (1983, 1985) has considered syllogisms with propositions involving fuzzy quantifiers modelled by fuzzy subsets of the unit interval, for example the so-called intersection/product syllogism of the form

Q_1 As are Bs
Q_2 (A and B)s are Cs
Q_1 ⊗ Q_2 As are (B and C)s

where ⊗ is the extended product of fuzzy numbers (Dubois and Prade, 1980b). In the above syllogism Q_1 restricts the possible values of the proportion |A ∩ B|/|A| (where | | denotes cardinality), or more generally of the conditional probability P(B | A). Q_2 is defined in a similar way. The resulting quantifier Q_1 ⊗ Q_2 is justified by the well-known probabilistic identity P(B ∩ C | A) = P(B | A)·P(C | A ∩ B). Note that patterns of reasoning that are valid when universally quantified may no longer hold, even in a weaker form, with fuzzy or numerical quantifiers. For instance the syllogism, where ∀ means "all",
∀ As are Bs
Q_1 Bs are Cs
Q As are Cs

is valid with Q_1 = Q = ∀. However, as soon as Q_1 ≠ ∀, nothing can be said about the proportion of As that are Cs, which may take any value in the unit interval, i.e. Q = [0, 1]. Moreover, if we add the supplementary piece of knowledge

Q_2 Bs are As
then it can be established that, when μ_{Q_i} is increasing for i = 1, 2 (i.e. Q_1 and Q_2 are variants of "most", as in Fig. 6),

Q = max(0, 1 − (1 − Q_1)/Q_2)    (35)

where max, − and / are respectively the maximum, the subtraction and the quotient extended to fuzzy numbers. This result only expresses the fact that if A ⊆ B, P(A | B) ≥ q_2 and P(C | B) ≥ q_1 then, from the laws of probability theory, we conclude that P(C | A) ≥ max(0, 1 − (1 − q_1)/q_2). In (35) the laws of probability theory are combined with results in fuzzy arithmetics (Dubois and Prade, 1980a). Consequently, Zadeh's theory of fuzzy syllogisms is nothing but probabilistic logic expressed in terms of conditional probabilities (as in the approach described in Chapter 8), with the assumption that the knowledge of probability values is in the form of fuzzy intervals, i.e. fuzzy probabilities (Zadeh, 1984), instead of point-probabilities or interval-valued ones. Based on this modelling, some forms of reasoning with default rules can be defined, when the general rules of the form "Q As are Bs" are instantiated.
Fig. 6  Q = "most".

Note that, in
their linguistic forms, the quantifiers may not explicitly appear in the rules (e.g. "snow is white" is short for "usually, snow is white"). Rules with explicit or implicit quantifiers are called "dispositions" by Zadeh (1985). The degree of truth of a statement of the form S = "Q As are Bs" is computed by Zadeh as follows:

t(S | ℬ) = μ_Q(|A ∩ B| / |A|) = μ_Q( Σ_{u∈U} min(μ_A(u), μ_B(u)) / Σ_{u∈U} μ_A(u) )

provided that all the values {(μ_A(u), μ_B(u)) | u ∈ U} are stored in ℬ. Yager (1983) proposes another treatment of quantified statements of the form "Q As are Bs", which does not relate to conditional probabilities.
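A short sketch of this evaluation for Q = "most"; the universe, the membership grades and the parameters of μ_Q are assumptions.

```python
# Sketch of Zadeh's evaluation of S = "Q As are Bs" via the relative
# sigma-count; the universe, the grades and the quantifier are assumptions.

U = ["u%d" % i for i in range(6)]
mu_A = dict(zip(U, [1.0, 0.8, 0.7, 1.0, 0.2, 0.0]))
mu_B = dict(zip(U, [1.0, 0.9, 0.3, 0.8, 0.1, 0.9]))

def mu_most(r):   # increasing quantifier "most", as in Fig. 6
    return min(1.0, max(0.0, (r - 0.5) / 0.4))    # 0 below 0.5, 1 above 0.9

ratio = (sum(min(mu_A[u], mu_B[u]) for u in U) /  # sigma-count of A and B
         sum(mu_A[u] for u in U))                 # over sigma-count of A
print(round(ratio, 3), round(mu_most(ratio), 3))  # degree of truth of S
```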
4  EXAMPLES OF APPLICATIONS IN THE AUTHORS' RESEARCH GROUP

Implementing the generalized modus ponens can be very simply achieved using parametrized representations of membership functions (Dubois et al., 1987b); the order of complexity of fuzzy production systems is thus not higher than for usual production systems. Of course the processing time for each rule is higher than in the classical case. But this is counterbalanced by the better expressive power of fuzzy production rules. A few fuzzy rules can generally account for the behaviour of a larger set of non-fuzzy rules. Implementing possibilistic logic in the production-system style is quite efficient, owing to the max-min matrix-calculation scheme for uncertainty propagation. In the resolution style, linear resolution strategies can be adapted, and a heuristic search algorithm has been developed (analogous to Nilsson's A*) to maximize the certainty of the resulting empty clause (Dubois et al., 1987a).

The design of the inference engine SPII (Martin-Clouaire and Prade, 1986) has been motivated by the need for a sufficiently general inference system able to (i) deal both with the imprecision and the uncertainty pervading factual and expert knowledge, and (ii) combine symbolic reasoning with numerical computation. SPII-2 is capable of treating pieces of information (facts or rules) that are imprecise (since they are expressed by means of vague predicates) or uncertain (since their truth is not fully guaranteed). SPII-2 works in backward-chaining. Possibility theory is used for representing imprecision in terms of possibility distributions, and uncertainty by means of a pair of possibility and necessity measures. More technically, SPII-2 (i) propagates uncertainty and imprecision in the reasoning process via deductive inferences; (ii) estimates the degree of matching between facts and the condition parts of rules in the presence of vagueness; (iii) combines imprecise or
uncertain pieces of information relative to the same matter; and (iv) performs computation on ill-known numerical quantities using fuzzy arithmetics. SPII-2 has been developed and experimentally tested on a realistic prospect-appraisal problem in petroleum geology involving fuzzy rules (Lebailly et al., 1987). SPII-2 is written in LELISP and is running on a VAX 11-780 computer as well as a Macintosh microcomputer.

DIABETO (Buisson et al., 1987) is a medical expert system, accessible from the French videotex network TELETEL, which is a decision-aid tool for the treatment of diabetes. In DIABETO-III, imprecise/uncertain rules and facts are represented in a unified manner using possibility distributions. In particular, DIABETO-III deals with expert rules involving fuzzy conditions, which are understood as "the more the condition is satisfied, the more certain is the conclusion". Besides, an interpolation method enables the system to build, from a given set of fuzzy rules, a new fuzzy rule that is better adapted to the current situation if necessary. Presently the knowledge base contains about 300 rules (the full knowledge base should contain about 1000 rules). The system is designed for use by sick people themselves. It is implemented in NIL (a dialect of LISP) on a VAX 11-780.

The inference engine TAIGER (Farreny et al., 1986) is not only able to handle uncertain rules but also imprecise and uncertain factual pieces of knowledge concerning the values of logical or numerical variables. The possibilistic representation of uncertainty that is used is somewhat similar to the MYCIN one (Buchanan and Shortliffe, 1984), but the chaining and combination operations of the possibility-theory-based approach differ somewhat from the empirical choices (obtained as distorted probabilistic laws) made in MYCIN. Besides, imprecision is dealt with in the same possibilistic framework in TAIGER. TAIGER manipulates numerical values pervaded with imprecision and uncertainty, while inference engines like that of MYCIN treat uncertain rules and facts only. TAIGER maintains a representation of imprecise or uncertain facts in terms of possibility distributions, while the uncertainty of a rule is modelled by the numbers appearing in a 2 × 2 matrix representation of the rule (Farreny and Prade, 1986). TAIGER works in backward-chaining. TAIGER is currently implemented on an IBM-PC microcomputer in MULISP.
5  CONCLUSION

Possibility theory offers a common setting for modelling uncertainty and imprecision in reasoning systems. However, the reasoning methodology in fuzzy logic drastically differs from the theorem-proving approach. In the latter, statements are translated into logical formulae. Inference is then performed symbolically, regardless of the meaning of the formulae. In fuzzy
logic, in contrast, statements are translated into elastic constraints in a meaning-representation language, and the meaning of the conclusion is directly computed via nonlinear-programming techniques. However, in possibility logic, as soon as no vagueness pervades the knowledge, it seems that part of the theorem-proving methodology can be extended, as stressed in Section 2. Finally, we have pointed out that the notion of truth can be viewed as the result of a semantic pattern-matching process. This view leads to the definition of operational procedures in order to compute degrees of truth and degrees of uncertainty that can feed approximate-reasoning systems.

BIBLIOGRAPHY

Baldwin, J. A. (1979). A new approach to approximate reasoning using a fuzzy logic. Fuzzy Sets and Systems 2, 309-325. (An extensive treatment of the generalized modus ponens based on fuzzy truth-values.)
Dubois, D. and Prade, H. (1980a). Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York. (An account of the fuzzy-set literature in the nineteen-seventies. Covers a broad range of topics.)
Dubois, D. and Prade, H. (1985a) (avec la collaboration de H. Farreny, R. Martin-Clouaire, C. Testemale). Théorie des Possibilités: Applications à la Représentation des Connaissances en Informatique. Collection Méthode + Programmes, Masson, Paris. (English translation to be published by Plenum Press, New York.) (A complement to the previous reference on some aspects of fuzzy-set theory, especially possibility measures, fuzzy arithmetics and fuzzy-set-theoretic operations. Focuses on applications to approximate reasoning, heuristic search, fuzzy programming and relational databases.)
Dubois, D. and Prade, H. (1986a). Possibilistic inference under matrix form. Fuzzy Logic in Knowledge Engineering (ed. H. Prade and C. V. Negoita), pp. 112-126. Verlag TÜV Rheinland, Köln. (An extensive presentation of possibilistic logic. Also deals with the question of the possibility of conditionals and conditional possibility.)
Dubois, D. and Prade, H. (1987a). Necessity measures and the resolution principle. IEEE Trans. Syst. Man Cyber. 17, 474-478. (The theorem-proving approach to possibilistic logic.)
Gaines, B. R. (1976). Foundations of fuzzy reasoning. Int. J. Man-Machine Stud. 8, 623-668. (A basic reference on the links between multiple-valued logics and fuzzy-set theory.)
Lee, R. C. T. (1972). Fuzzy logic and the resolution principle. J. Assoc. for Computing Machinery 19, 109-119. (The main and oldest reference on the theorem-proving approach to the max-min multiple-valued logic underlying fuzzy-set theory.)
Ponasse, D. (1978). Algèbres floues et algèbres de Łukasiewicz. Rev. Roum. Math. Pures Appl. 23, 103-113. (A fuzzy counterpart of Stone's theorem for Boolean algebras.)
Prade, H. (1985a). A computational approach to approximate and plausible reasoning with applications to expert systems. IEEE Trans. Pattern Anal. Machine Intelligence 7, 260-283 (corrections in 7, 747-748). (An overview of approximate-reasoning methodologies related to possibility theory and fuzzy logic. Includes a very large bibliography.)
Prade, H. and Negoita, C. V. (eds) (1986). Fuzzy Logic in Knowledge Engineering.
Verlag TÜV Rheinland, Köln. (A collection of up-to-date contributions by major researchers in the area of possibility theory and fuzzy logic applied to approximate reasoning, databases and expert systems.)
Sanchez, E. and Zadeh, L. A. (eds) (1987). Approximate Reasoning in Intelligent Systems, Decision and Control. Pergamon Press, Oxford. (A similar collection, with other contributions.)
Yager, R. R. (1983). Quantified propositions in a linguistic logic. Int. J. Man-Machine Stud. 19, 195-227. (An alternative approach to fuzzy quantifiers, extending the substitution method in logic.)
Zadeh, L. A. (1965). Fuzzy sets. Info. Control 8, 338-353. (The founding paper on fuzzy-set theory. It is still recommended reading to capture the basic intuitions.)
Zadeh, L. A. (1978a). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems 1, 3-28. (The first paper on possibility measures. Stresses the links between possibility distributions and linguistic information. Adopts a physical point of view on possibility, as opposed to statistical knowledge.)
Zadeh, L. A. (1979). A theory of approximate reasoning. Machine Intelligence 9 (ed. J. E. Hayes, D. Michie and L. I. Mikulich), pp. 149-194. Elsevier, Amsterdam. (Zadeh's approach to reasoning with vague information. Describes in detail the combination/projection methodology sketched in Section 3.2.)
Zadeh, L. A. (1985). Syllogistic reasoning in fuzzy logic and its application to usuality and reasoning with dispositions. IEEE Trans. Syst. Man Cyber. 15, 754-763. (The most up-to-date paper on the theory of dispositions and the treatment of fuzzy quantifiers.)

Other references
Baldwin, J. (1983). A fuzzy relational inference language for expert systems. Proc. 13th IEEE Int. Symp. on Multiple-Valued Logic, Kyoto, pp. 416-423. IEEE, New York.
Bellman, R. E. and Zadeh, L. A. (1977). Local and fuzzy logics. Modern Uses of Multiple-Valued Logics (ed. J. M. Dunn and G. Epstein), pp. 103-165. Reidel, Dordrecht.
Buchanan, B. G. and Shortliffe, E. H. (1984). Rule-based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Addison-Wesley, Reading, Mass.
Buisson, J. C., Farreny, H., Prade, H., Turnin, M. C., Tauber, J. P. and Bayard, F. (1987). TOULMED, an inference engine which deals with imprecise and uncertain aspects of medical knowledge. Proc. Eur. Conf. on Artificial Intelligence in Medicine (AIME-87), Marseilles. Springer-Verlag, Berlin.
Dubois, D. (1987). Possibility theory: towards normative foundations. Risk, Decision and Rationality (ed. B. Munier). Reidel, Dordrecht. (To appear.)
Dubois, D. and Prade, H. (1980b). New results about properties and semantics of fuzzy-set-theoretic operators. Fuzzy Sets: Theory and Applications to Policy Analysis and Information Systems (ed. P. P. Wang and S. K. Chang), pp. 59-75. Plenum Press, New York.
Dubois, D. and Prade, H. (1982a). A class of fuzzy measures based on triangular norms. Int. J. Gen. Syst. 8, 43-61.
Dubois, D. and Prade, H. (1982b). On several representations of an uncertain body of evidence. Fuzzy Information and Decision Processes (ed. M. M. Gupta and E. Sanchez), pp. 167-181. North-Holland, Amsterdam.
Dubois, D. and Prade, H. (1984a). A theorem on implication functions defined from triangular norms. Stochastica 8, 267-279.
Dubois, D. and Prade, H. (1984b). Fuzzy logics and the generalized modus ponens revisited. Cybernetics and Systems 15, 293-331.
Dubois, D. and Prade, H. (1984c). The management of uncertainty in expert systems: the possibilistic approach. Operational Research '84: Proc. 10th Triennial IFORS Conf., Washington, DC (ed. J. P. Brans), pp. 949-964. North-Holland, Amsterdam.
Dubois, D. and Prade, H. (1985b). A review of fuzzy set aggregation connectives. Info. Sci. 36, 85-121.
Dubois, D. and Prade, H. (1985c). Evidence measures based on fuzzy information. Automatica 31, 547-562.
Dubois, D. and Prade, H. (1986b). Fuzzy sets and statistical data. Eur. J. Operational Res. 25, 345-356.
Dubois, D., Prade, H. and Testemale, C. (1986). Weighted fuzzy pattern matching. Proc. Journée Nationale sur les Ensembles Flous, la Théorie des Possibilités et leurs Applications, Toulouse, pp. 115-145. (To appear in Fuzzy Sets and Systems, 1988.)
Dubois, D., Lang, J. and Prade, H. (1987a). Theorem proving under uncertainty: a possibility theory-based approach. Proc. 10th Int. Joint Conf. on Artificial Intelligence (IJCAI-87), Milan.
Dubois, D., Martin-Clouaire, R. and Prade, H. (1987b). Practical computing in fuzzy logic. Fuzzy Computing (ed. M. M. Gupta and T. Yamakawa). North-Holland, Amsterdam. (To appear.)
Farreny, H. and Prade, H. (1986). Default and inexact reasoning with possibility degrees. IEEE Trans. Syst. Man Cyber. 16, 270-276.
Farreny, H., Prade, H. and Wyss, E. (1986). Approximate reasoning in a rule-based expert system using possibility theory: a case study. Information Processing '86 (ed. H. J. Kugler), pp. 407-413. North-Holland, Amsterdam.
Giles, R. (1982). Foundations for a theory of possibility. Fuzzy Information and Decision Processes (ed. M. M. Gupta and E. Sanchez), pp. 183-195. North-Holland, Amsterdam.
Goodman, I. R. and Nguyen, H. T. (1985). Uncertainty Models for Knowledge-Based Systems. North-Holland, Amsterdam.
Hughes, G. E. and Cresswell, M. J. (1968). An Introduction to Modal Logic. Methuen, London.
Lebailly, J., Martin-Clouaire, R. and Prade, H. (1987). Use of fuzzy logic in rule-based systems in petroleum geology. Approximate Reasoning in Intelligent Systems, Decision and Control (ed. E. Sanchez and L. A. Zadeh), pp. 125-144. Pergamon Press, Oxford.
Martin-Clouaire, R. and Prade, H. (1986). SPII-1: a simple inference engine capable of accommodating both imprecision and uncertainty. Computer-Assisted Decision-Making (ed. G. Mitra), pp. 117-131. North-Holland, Amsterdam.
Prade, H. (1983). Data bases with fuzzy information and approximate reasoning in expert systems. Proc. IFAC Int. Symp. on Artificial Intelligence, Leningrad, pp. 113-120.
Prade, H. (1985b). Reasoning with fuzzy default values. Proc. 15th IEEE Int. Symp. on Multiple-Valued Logic, Kingston, Ontario, pp. 191-197. IEEE, New York.
Rescher, N. (1969). Many-Valued Logic. McGraw-Hill, New York.
Rich, E. (1983). Default reasoning as likelihood reasoning. Proc. American Association for Artificial Intelligence Conf. (AAAI-83), Washington, DC, pp. 348-351.
Schweizer, B. and Sklar, A. (1963). Associative functions and abstract semi-groups. Publ. Math. Debrecen 10, 69-81.
Shackle, G. L. S. (1961). Decision, Order and Time in Human Affairs, 2nd edn. Cambridge University Press.
Suppes, P. (1966). Probabilistic inference and the concept of total evidence. Aspects of Inductive Logic (ed. J. Hintikka and P. Suppes), pp. 49-65. North-Holland, Amsterdam.
Trillas, E. and Valverde, L. (1985). On implication and indistinguishability in the setting of fuzzy logic. Management Decision Support Systems Using Fuzzy Sets and Possibility Theory (ed. J. Kacprzyk and R. R. Yager), pp. 198-212. Verlag TÜV Rheinland, Köln.
Yager, R. R. (1984). Approximate reasoning as a basis for rule-based expert systems. IEEE Trans. Syst. Man Cyber. 14, 636-643.
Zadeh, L. A. (1978b). PRUF: a meaning representation language for natural languages. Int. J. Man-Machine Stud. 10, 395-460.
Zadeh, L. A. (1981). Test-score semantics for natural languages and meaning representation via PRUF. Technical Note 247, SRI International, Menlo Park, California. Also in Empirical Semantics (ed. B. B. Rieger), pp. 281-349. Brockmeyer, Bochum, 1982.
Zadeh, L. A. (1983). The role of fuzzy logic in the management of uncertainty in expert systems. Fuzzy Sets and Systems 11, 199-228.
Zadeh, L. A. (1984). Fuzzy probabilities. Info. Proc. Mgmt 19, 148-153.
DISCUSSION

Marie-Odile Cordier:  Dubois and Prade make in their paper a clear distinction between uncertain reasoning and vague (or approximate?) reasoning.

Uncertain reasoning is precise reasoning on a given situation (or world) incompletely described in a database. The answer to a precise query can then be: "surely" true, "surely" false, or "possibly" true or false, i.e. a degree of certainty. If one knows, for example, that John is tall, then the query "is John's height more than 1.80 m?" is answered in terms of whether it is more or less possible that the statement is true.

Vague reasoning is reasoning with vague predicates on a precise database. The predicates are defined approximately and are more or less verified by precise data, i.e. are more or less true. An answer to a query is then a degree of truth. If one knows that John's height is 1.70 m, then a statement such as "John is tall" can be said to be true with a to-be-determined degree of truth.

Possibilistic logic and fuzzy logic are two ways of reasoning on imperfect knowledge; they use fuzzy-set theory as a common tool, and for that reason are quite often confused. They are clearly distinguished in Dubois and Prade's chapter. Possibilistic logic is concerned with uncertain reasoning where the database is a fuzzy description of a given world; it is a logic of uncertainty, as are probabilistic logic and the logic of evidence. Fuzzy logic is concerned with fuzzy reasoning on precise information; it is a logic of vagueness. Both are numerical approaches in the sense that degrees of certainty and degrees of truth are both estimated by numerical values. In possibilistic logic, an assertion is labelled by two values, the possibility and the necessity, which describe an interval of certainty for this assertion. In fuzzy logic, an assertion is labelled with a degree of truth describing its conformity with reality. An important result given by Dubois and Prade states that logics of uncertainty cannot be truth-functional.

Uncertain reasoning: evaluation of uncertainty versus use of dependency links.  Uncertain reasoning means reasoning on incomplete information: the value of
what Dubois and Prade call a variable (such as the height of John) is unknown, but can be restricted to a set of values. These restrictions reduce the set of possible worlds that can be modelled by such an incomplete database. One way of handling this is to evaluate the possible values of the variable, using numerical estimations, which are easy to combine and to manipulate. These numbers can be obtained using fuzzy-set theory, as is done in possibilistic logic, or probability theory, as in probabilistic logic, and can describe the possibility, probability, credibility etc. of the corresponding assertion. Symbolic estimations can also be used, such as the modalities proposed by Kodratoff et al. (1985). Another way is to consider unknown values as possible hypotheses and to use hypothetical reasoning; assertions are then labelled by hypothetical contexts that describe precisely under what conditions the result can be obtained. Instead of producing answers such as "likes(Mary, Paul) with possibility α and necessity β", it would produce "likes(Mary, Paul) if Paul earns more than $10000 or Paul is less than 40 years old", which can be more useful or more instructive.
One of the problems with logics of uncertainty is how to transform uncertain information into information labelled by a certainty degree. In probabilistic logic the certainty degree describes a probability that can be obtained by using the well-known probability theory. It seems that probabilities are well adapted to describing the certainty of a fact such as "the king of diamonds is in West's hand" (game of bridge), which can be computed precisely. In possibilistic logic the possibility and necessity are obtained through the use of fuzzy-set theory. The meaning of a fuzzy predicate is described by a fuzzy set; the justification of these values is a question of agreement on the meaning of a word; a fuzzy set describing "John is tall" in terms of precise heights or "Peter is rich" in terms of amount of salary can be said to be good only if it satisfies the user. The meaning of a word is a matter of opinion and cannot be formally justified. The choice between logics of uncertainty seems to be quite dependent on the domain; Dubois and Prade argue that some results are worse (using the resolution principle) in probabilistic logic than in possibilistic logic, but what if the inputs could be more precisely determined in the first case?

Comparison with other logics of uncertainty.
Possibility and necessity of an implication p ⊃ q (p → q).  Dubois and Prade show how to get the couple (possibility, necessity) for an element of a fuzzy database: the meaning of a fuzzy predicate is given by a fuzzy set, represented by a trapezoidal curve; these curves are used to obtain, for a given fuzzy fact represented by a ground atomic formula, the two certainty measures. It is not so clear when one considers an implication like p → q: where do possibility and necessity come from? What is their (intuitive) meaning? Possibility and necessity of an implication can be seen as extensions of the possibility and necessity of a ground fact; they would describe the fuzzy relation between two propositions p and q, and express the uncertainty of expressions such as "it is possible that", "it can be that", "probably ...". It would be expected that an implication would be labelled by possibility and
necessity measures, as is the case for ground assertions; it seems that a matrix is proposed instead; this matrix is said to express "the grade of necessity of all ways of relating p and q". But no means (such as fuzzy sets for expressing the fuzzy predicates) are given for determining these measures. What is the intuitive meaning of these values? Where does this matrix come from? Are fuzzy sets used to compute it? For example, what is n_11? Is it the necessity of (p → q) or the necessity of q knowing p? Or are these two notions equivalent? In Farreny and Prade (1986) it seems that the matrix corresponds to conditional possibilities, with the relation Π(p and q) = min(Π(q | p), Π(p)). Is this always the case? What are the properties of the matrix induced by the properties of possibilities and necessities? Does the matrix replace the possibility and necessity of an implication? Or can these measures be obtained from it?

Possibility and necessity on first-order implications.  It is not so difficult to imagine the possibility and necessity of a propositional implication like
"if attends-to-the-meeting(Peter) then very probably attends-to-the-meeting(Mary)". It is not so easy when one considers first-order implications. What is now the meaning of possibility Poss and necessity Nee for: Vx P(x) --+ Q(x). (I) It could be the possibility (respective necessity) of the global formula
Poss [Vx P(x)
--+
Q(x)]
This means that "it can be" that all P(x) are Q(x) as in "it can be that all planes are on strike today"; it cannot be used to express the uncertainty on "all birds fly" for example. Poss [Vx p/ane(x)
--+
in-strike(x)]
(2) It is more probably something like the possibility of the conclusion when the conditions are verified, which is not so far from conditional possibilities:

∀x P(x) → Poss [Q(x)]

as in

∀x attends-to-the-meeting(Peter, x) → Poss [attends-to-the-meeting(Mary, x)]

or

∀x smokes(x) → Poss [diedbefore60(x)].
These implications could be rewritten as: smokes(Toto)
--+
Poss [diedbefore60(Toto)]
smokes(Lulu)
--+
Poss [diedbefore60(Lulu)] .
.
The possibility does not depend on the x concerned, and remains the same after instantiation. (3) But what about when the possibility depends on the domain of x (as is the case when one considers statistical measures)? For example, in: Vx bird(x)
--+
Poss [flies(x)],
the possibility reflects the fact that bird is a superclass, a union of classes of birds that fly and of classes of birds that do not fly; Poss is only valid for x being a bird, but changes when the domain of x is restricted to a subset of bird such as duck or penguin; the implication cannot be specialized, via the classical rule of specialization, without an update of the possibility. Let us suppose

∀x penguin(x) → bird(x)
∀x bird(x) → Poss[flies(x)]

One cannot derive from this

∀x penguin(x) → Poss[flies(x)]
This is the same argument used in Duval and Kodratoff (1986): the French usually drink coffee, but it cannot be used to derive that there is some possibility that someone (who is French) drinks coffee. More generally, the problem is that of an evaluation that is true for a groupE of x, but cannot be used for a subset of E. The same problem is met with the use of fuzzy quantifiers; and it seems to be difficult to reason on such information. Implementation issues.
Implementation issues. Let us suppose two certain inferences,

∀x height(x) > 1.80 m → basketball-player(x)
∀x height(x) > 1.80 m → likes(Mary, x)

and suppose that we know from a database that height(John) > 1.80 m with Π = α and N = β. If a contradiction such as ¬basketball-player(John) is added, then it seems that we have to

(i) update the certainty measures of height(John) > 1.80 m;
(ii) update the certainty measures of the derived assertions, such as likes(Mary, John);
(iii) update all the certainty measures concerned with the height of John, such as height(John) > 1.90 m, ...;
(iv) if the first inference were ∀x height(x) > 1.80 m ∧ weight(x) < 80 kg → basketball-player(x), then this update could be done on weight(John) too.

A toy sketch of such dependency-directed updating is given below.
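The following minimal sketch (in Python) makes points (i)-(ii) concrete. The fact names, certainty pairs and the revision policy (withdrawing support, i.e. resetting a pair to total ignorance N = 0, Π = 1) are hypothetical simplifications for illustration, not an algorithm proposed by the authors.

# Each assertion carries a certainty pair (N, Pi) and dependency links to
# its antecedents; on contradiction, support is withdrawn by resetting
# pairs to total ignorance N = 0, Pi = 1.  All names are hypothetical.

facts = {
    "height(John)>1.80": {"N": 0.8, "Pi": 1.0, "support": []},
    "basketball-player(John)": {"N": 0.8, "Pi": 1.0,
                                "support": ["height(John)>1.80"]},
    "likes(Mary,John)": {"N": 0.8, "Pi": 1.0,
                         "support": ["height(John)>1.80"]},
}

def contradict(fact):
    """Withdraw certainty from a contradicted fact, its antecedents, and
    every assertion resting on those antecedents (points (i)-(ii))."""
    suspect = {fact} | set(facts[fact]["support"])
    for name, entry in facts.items():
        if name in suspect or suspect & set(entry["support"]):
            entry["N"], entry["Pi"] = 0.0, 1.0  # back to ignorance

contradict("basketball-player(John)")
print(facts["likes(Mary,John)"]["N"])  # 0.0: derived assertion updated too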
If, for dealing with a contradiction, a complete TMS algorithm has to be implemented, requiring the use of dependency links to antecedents, is it not easier to use these links to reason directly, as in hypothetical reasoning, on the unknown or incompletely known values?

In conclusion, this chapter seems to be an up-to-date treatment of a crucial problem, that of reasoning on imperfect knowledge. A clear presentation is made of possibilistic and fuzzy logics, and a number of exciting problems remain to be explored.

Paul Gochet: Dubois and Prade acknowledge that the notion of truth with which they operate has been tailored for a special purpose. They define truth as the agreement, which can be partial (graded), between the representation of the meaning of a statement and the representation of what is actually known. If that definition is combined with the standard definition of knowledge as justified true belief, then a vicious circle is generated. This objection, however, can be dismissed. The authors are entitled to take the notion of "knowledge base" as primitive and to define knowledge as the content of a knowledge base.

They present the correspondence theory of truth as the standard concept: "Truth is generally understood as the conformity between a statement and the actual state of affairs it supposedly refers to." At first sight, that presentation can be questioned. Tarski's semantic concept of truth and Ramsey's earlier redundancy theory of truth have won wider agreement, at least among logicians and philosophers, than the traditional correspondence theory of truth. For Tarski, truth consists of satisfaction by all sequences of objects of the domain, and satisfaction, in turn, is given a recursive definition (Gochet, 1986). That definition enables him to do without the metaphoric expression "conformity" and to avoid commitment to dubious entities such as facts (Gochet, 1980) or states of affairs. For Tarski, the predicate "true in language L" can be defined either absolutely, i.e. independently of a model, or relatively, i.e. with respect to a model. The second definition is more often used today, as it plays a crucial role in formal semantics, where it serves to define validity (truth in all models). That definition of validity is very general. It applies also to non-classical logics in which the models have been enriched by the introduction of possible worlds, moments of time, accessibility relations, and modified by a non-standard interpretation of logical constants.

A version of the correspondence theory of truth was recently defended by Perry and Barwise within the framework of their situation semantics. It has been shown, however, that situation semantics and Montague semantics, despite significant differences, can both be subsumed under a slightly modified version of the framework that Montague provided in his Universal Grammar (Muskens, 1988). Since Montague's framework embodies and enlarges Tarski's definition of truth, this result shows that Dubois and Prade's correspondence theory of truth can fit in with the "received view", i.e. with Tarski's definition of truth.

Dubois and Prade's theory, however, is incompatible with Ramsey's theory. This is worth examining, since Ramsey has taken a stance on the issue raised by the concept of degree of truth. According to Ramsey, and also according to Ayer (Gochet, 1988), who has much improved on Ramsey's theory, saying that a statement is true is nothing more than reasserting the statement. The sentence "It is true that p" means nothing more than "p". The predicate "true" is redundant. Haack (1980) observes that the very notion of degree of truth ceases to make sense if we take up the redundancy theory: "... given that he holds that 'It is true that p' means that p, it is natural that Ramsey should say that 'It is ½ true that p' means nothing at all, since there seems to be no way of modifying the right-hand side of Ramsey's definition to give a sense to the modified left-hand side." One might question the claim that the adverbial modification of the truth-predicate cannot be transferred meaningfully to the asserted sentence. Instead of saying "It is half true that the flag is white", one could say "The flag is half white".
But this counterexample fails to refute Haack's claim. By cancelling out the expression "It is true that" and displacing the degree adverb, we change the meaning. The former sentence allowed one interpretation only ("The flag is grey"), whereas the latter allows several, and the preferred reading is "Half of the flag is white". Moreover, there are cases where the syntactic shift is really impossible. We can say "It is ½ true that France is hexagonal", but the sentence "France is ½ hexagonal" is sheer nonsense.

The clash between Dubois and Prade's admission of degrees of truth and Ramsey's redundancy theory is definitely not an argument against the former view, since it is open to us to abandon Ramsey's theory and retain the idea that truth comes in degrees. We make rough statements such as the above-mentioned sentence "France is hexagonal", borrowed from J. L. Austin. Two strategies are possible to cope with that linguistic use. We can say that such a statement fits the facts to a certain degree, and decide that statements that fit the facts to a degree ranging between 50% and 100% are true, whereas statements whose "degree of fit" falls below 50% are to be ascribed the truth-value False. Or we can collapse the two dimensions of assessment (fit to the facts and truth-value) into one and introduce the notion of degrees of truth. This is Dubois and Prade's policy. It enables them to exhibit the interconnections between classical logic, modal logic, possibilistic logic, many-valued logic and fuzzy logic. This fully justifies their choice, even if it departs from ordinary parlance.
Flash Sheridan: The problem with fuzzy logics is not that they are bad logics, but that they are not logics at all. It is hard to define what logic is, but it can be clear that something isn't a logic: if it has significant empirical consequences, or if it doesn't have connectives satisfying the most basic properties (see below) of and and or. Dubois and Prade's logic proves something that I claim is an empirical statement about the nature of colour. An alternative fuzzy logic has connectives that claim to be and and or, but are nothing like them. (I shall restrict my attacks to or; it is the easier target.)

The two most basic things about or are that p or p is the same as p, and that p or q is no less true than either p or q. Call the first "idempotence", the second "monotonicity". And must also be idempotent, and monotonic the other way: p and q is no more true than either p or q. Say we have a pencil that is fairly red, and fairly orange. I claim that it is at least conceivable that it is very red or orange. (In fact, there is such a pencil, but that doesn't matter.) With the "most popular choice of operations" (Dubois and Prade's equations (14) and (15)) this is impossible: the pencil is only fairly red or orange (t(P ∨ Q) = max(t(P), t(Q)), where t(P) is the degree of truth of the proposition P). Dubois and Prade do have a theory of colour that makes sense of this; I think it is arbitrary and wrong, but that doesn't matter. What matters is that one can deduce their theory of colour from their logic. I claim that this theory is empirical, so this version of fuzzy logic is not a logic. I am not going to discuss the meaning of "empirical"; if you feel the existence of such a pencil is not an empirical matter, you need not believe my argument. (I think one could even make a case that it isn't.) But if you agree that it is empirical, you may be intrigued by fuzzy logic's usefulness, but you must believe that this usefulness is accidental.

I know of a different version of or. It uses + instead of max. The obvious problem with this is that one may then get truth values greater than 1; it dodges the problem by fiat: if the value one gets is greater than 1, pretend it is 1: t(p ∨ q) = min(t(p) + t(q), 1). This is not idempotent.

I am not here attacking the idea of vagueness; I should be interested to see a good logic of vagueness, although there are strong reasons to believe that there can be no such thing. The best philosopher to address the issue of vagueness has concluded that it is incoherent (Dummett, 1975; see also Fine, 1975, to which latter article, through Frank Veltmann, I am indebted for the colour example). If there is a way to axiomatize vagueness, it seems it would have to be far more radical than you would be willing to accept. (It would probably have to be an extreme version of an extreme philosophical position called "strict finitism".) But, except for computational convenience, I see no reason for it to be truth-functional.
Reply: The three discussants have each focused on different aspects of our paper; their respective comments can be summarized into the following questions.

(i) Can there be a logic of vague propositions (Sheridan)?
(ii) What is the expressive power of possibilistic logic, and what is its relevance for commonsense reasoning (Cordier)?
(iii) What is the meaning of graded truth (Gochet)?
All three questions are very much relevant to a proper understanding of fuzzy-set and possibility theory, and we are grateful to the discussants for raising them. First of all, fuzzy-set theory has no special claim to stand as a general theory of vagueness. To build a membership function one needs three objects: a referential set Ω, a set of membership values V, and a mapping μ_A from Ω to V that discriminates between membership and non-membership in A. The set V is usually taken to be the unit interval, but this is clearly a matter of convenience. More generally, V should be allowed to be a lattice (Goguen, 1967); then one can build purely qualitative models of vague concepts. Ω can be any kind of set, but in practice the use of the fuzzy-set approach is made easier whenever Ω is what we shall call a "simple set", i.e. either a finite set with small cardinality, or a linear numerical scale, or a Cartesian product thereof. Outside these cases, it is difficult to find a procedure that enables the membership function to be elicited in a reasonable way. Fortunately the above cases occur quite often in practice, especially when the predicate A can be expressed by means of some clearly identified attribute a of objects in Ω, ranging on some scale S that is a simple set. For instance, Ω is a set of (possibly numerous) people, a(ω) evaluates the size of ω ∈ Ω, A means "tall", and A is defined on S rather than Ω using the membership function μ_A: S → V; μ_A(a(ω)) is then the degree of tallness of the individual ω. The identification of a membership function on a simple set is a problem in empirical psychometry, which is not especially difficult (see Smithson, 1987; Norwich and Turksen, 1984). Ancestors of membership functions have been suggested by philosophers of vagueness (for example, Black's (1937) consistency profiles) as a reasonable way of capturing the meaning of vague concepts. But note that the expression of membership functions in terms of random sets (Kampe de Feriet, 1982; Goodman and Nguyen, 1985) enables statistical interpretations to live alongside purely psychometric interpretations of fuzzy sets. This dependence of fuzzy logic upon empirical or statistical matters may look disgraceful to fully fledged logicians. In particular, our theory of vagueness offers no "ontological", absolute definition of graded truth. From a philosophical point of view, fuzzy or possibilistic logic may appear to be accidental. But all practical problems are philosophically accidental too, and the purpose of the logic systems in this book is the solving of practical problems rather than the solving of philosophical issues, although Gochet tends to suggest that our classification of logic systems may have some philosophical relevance.
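For illustration, here is a minimal sketch (in Python) of such a membership function on a height scale S in metres. The breakpoints 1.70 and 1.80 are hypothetical values a user might agree on; for a monotone predicate such as "tall" the usual trapezoid degenerates to a ramp.

# The meaning of "tall" defined on a linear height scale S (metres),
# with hypothetical breakpoints agreed on with the user.

def mu_tall(height_m):
    """Degree of membership of a height in the fuzzy set 'tall'."""
    a, b = 1.70, 1.80        # below a: not tall at all; above b: fully tall
    if height_m <= a:
        return 0.0
    if height_m >= b:
        return 1.0
    return (height_m - a) / (b - a)  # linear in between

# mu_A(a(w)): degree of tallness of an individual of height 1.76 m
print(mu_tall(1.76))  # 0.6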
Let us turn to the question of truth-functionality. Sheridan stands strongly against the idea that a logic of vagueness can be truth-functional. First, we have never said that it always is: truth-functionality is preserved only in the presence of complete information, and it fails whenever the available information is not complete, even for standard formulae. Another point is that gradedness of truth is not compatible with Boolean algebra. An algebra of vague propositions necessarily has fewer properties than a Boolean algebra. Here there is a choice about which structural properties we are to give up. Let t(p) be the truth value of the (vague) proposition p. Sheridan thinks that the basic properties of disjunction are monotonicity (t(p ∨ q) ≥ max(t(p), t(q))) and idempotency. Taking truth-functionality for granted leads to t(p ∨ q) = t(p) ⊥ t(q), where ⊥ is continuous, coincides with the logical "or" for binary truth values, is commutative, and is monotonically increasing in the wide sense. If we further assume that x ⊥ 0 = x (i.e. p or "false" = p), then the only possible choice, given Sheridan's requirements of monotonicity and idempotency, is t(p ∨ q) = max(t(p), t(q)). Moreover, idempotency is incompatible with the excluded-middle law (Dubois and Prade, 1984d). If the latter must be preserved then we must drop idempotency, and the only possible solution, up to an isomorphism, becomes t(p ∨ q) = min(1, t(p) + t(q)). These proposals are not arbitrary, but are dictated by the algebraic properties that one wishes to keep. Hence there are several possible algebraic structures for a set of vague propositions, and they are all compatible with the unit interval. This is why truth-functionality can be preserved for vague propositions. But we acknowledge that the truth-functionality assumption is made in order to get a simple theory of vagueness. We make it because it is not self-inconsistent and because it makes computations easy to carry out. We agree with Sheridan that 0.4 red and 0.4 orange may lead to 0.8 red or orange: we could always pick a disjunction operation that satisfies this condition. However, we believe that nobody would state this property in this way. We prefer the approach that first states which algebraic properties of the fuzzy "or" are sensible in a given situation, and then derives the proper class of "or"s accordingly. If this class contradicts the available evidence, then maybe the truth-functionality assumption should be dropped. But we are aware that truth-functionality is here a matter of convenience and is clearly an assumption. See Osherson and Smith (1981, 1982) for a discussion of its limitations from a psychological point of view, and the discussions by Zadeh (1982) and Cohen and Murphy (1985) of the extensionality of the logical combination of vague concepts.
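A quick numerical check of this trade-off, assuming t(¬p) = 1 − t(p); the 0.4/0.4 values echo Sheridan's pencil example.

# The idempotent "or" (max) versus the bounded sum min(1, a + b); both
# coincide with classical disjunction on {0, 1}.

def or_max(a, b):
    return max(a, b)            # idempotent, monotone

def or_bsum(a, b):
    return min(1.0, a + b)      # keeps excluded middle, not idempotent

t_red, t_orange = 0.4, 0.4
print(or_max(t_red, t_orange))   # 0.4: the pencil stays only "fairly" red or orange
print(or_bsum(t_red, t_orange))  # 0.8: "very" red or orange

p = 0.4                          # excluded middle, with t(not p) = 1 - t(p)
print(or_max(p, 1 - p))          # 0.6 < 1: excluded middle fails
print(or_bsum(p, 1 - p))         # 1.0: excluded middle holds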
Let us now turn to possibilistic logic. Cordier raises a number of very interesting issues in her comments. First, why use numbers instead of symbols to express uncertainty? A purely symbolic approach to uncertainty, such as that of Duval and Kodratoff (1986), faces a challenging problem, namely how to combine the symbolic modalities: at the end of a proof one gets a list of modalities to be interpreted as a whole, and to our knowledge there is no guideline as to how this should be done. In contrast, here, uncertainty propagation is done according to the rules of a given theory of uncertainty (whose choice depends upon the nature of the available information). Of course, the degree of uncertainty bearing on a conclusion may not be informative enough, and, as Cordier stresses, one may wish to get the reasons for uncertainty as well. But handling uncertainty is not incompatible with maintaining the hypothetical assumptions under which this uncertainty would be removed. The two tasks are not redundant: uncertainty expresses to what extent information is lacking, while hypothetical reasoning is useful for characterizing what extra information is needed to remove the uncertainty. In that sense, the suggestion of using Truth Maintenance System-like approaches in conjunction with uncertain reasoning is certainly valuable. Note also that in the estimation of the uncertainty of a compound proposition with respect to a given (incomplete) state of information using a fuzzy pattern-matching technique, it is possible not only to compute a possibility and a necessity degree, but also to determine what part of the information needs to be made more precise, and in what manner, in order to come closer to complete certainty.

Another important issue is that of properly interpreting degrees of necessity and possibility, and being able to get them out of the available evidence. As mentioned earlier, possibilistic information usually stems from linguistic information involving vague terms referring to simple sets. Incompleteness and vagueness of the available evidence lead to grades of uncertainty obtained through fuzzy pattern-matching. This is true for elementary facts as well as for rules, since a fuzzy linguistic rule also translates into a possibility distribution. Moreover, the interval between the necessity and the possibility of an assertion p, say [N(p), Π(p)], can be viewed as bounds on an unknown probability, either lower-bounded (if Π(p) = 1) or upper-bounded (if N(p) = 0).
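For concreteness, here is a minimal sketch of how such a pair [N(p), Π(p)] can be read off a possibility distribution over a finite set of interpretations; the distribution values and the set of models of p are hypothetical.

# Reading Pi(p) and N(p) off a possibility distribution pi over a
# finite set of interpretations.

pi = {"w1": 1.0, "w2": 0.7, "w3": 0.2}   # normalized: max value is 1
models_of_p = {"w1", "w2"}               # interpretations where p holds

def possibility(models):
    return max((pi[w] for w in models), default=0.0)

def necessity(models):
    return 1.0 - possibility(set(pi) - models)  # N(p) = 1 - Pi(not p)

print(possibility(models_of_p))  # 1.0  (p cannot be ruled out)
print(necessity(models_of_p))    # 0.8  (not-p is rather impossible)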
Why is possibilistic logic interesting at all compared with probabilistic logic?
(i) Possibilistic logic offers an absolute reference point for expressing ignorance, namely N(p) = N(¬p) = 0. Expressing some belief about p comes down to choosing a point in [0, 1] between certainty (N(p) = 1, N(¬p) = 0) and ignorance (N(p) = 0, N(¬p) = 0). Ignorance cannot be modelled in probability theory, where it is approximated by randomness. Moreover, in the case of randomness one can say nothing about Prob(p) compared with Prob(¬p) unless one knows how many alternatives are offered by p and ¬p. Note that possibility cannot model randomness, i.e. probability theory has its own usefulness, of course. Upper and lower probability systems can model ignorance, and possibility theory can be viewed as the simplest of upper and lower probability systems.
(ii) Possibility theory offers a nice framework for attaching weights of uncertainty to rules "if p then q" in complete accordance with classical logic. An uncertain "if ... then" rule is better expressed by a conditional measure g(q|p) than by the measure of a conditional g(p → q). The quantity N(p → q) is very close to a conditional possibility measure, as explained in Dubois and Prade (1986a). Namely, we have

N(p → q) = N(q|p) ≜ 1 − Π(¬q|p)

as soon as Π(p ∧ q) ≠ Π(p), where Π(q|p) is defined as the greatest solution of Π(p ∧ q) = min(Π(q|p), Π(p)). Hence the necessity of a conditional is close to a definition of conditional necessity, and possibilistic logic, when we restrict the pair (N(p), N(¬p)) to be either (0, 1) or (1, 0), reduces to classical logic. The probability of a conditional is seldom equal to the conditional probability (Prob(q|p) = Prob(p → q) only if both are 1, or if Prob(p) = 1). Hence translating uncertain rules into conditional probabilities does not yield a logic that generalizes classical logic, strictly speaking.

(iii) Possibilistic logic is a quasi-qualitative calculus where numbers are compared and not added or multiplied. Numbers are useful only to model gradedness, and no great precision is required. In contrast, probabilistic logic requires sufficiently precise inputs in order to be able to carry out long inferences that remain informative.
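The min-based conditioning in point (ii) is easy to compute. The following sketch (with hypothetical input values) solves Π(p ∧ q) = min(x, Π(p)) for its greatest solution and derives N(q|p) = 1 − Π(¬q|p).

# Min-based conditioning: Pi(q | p) is the greatest x solving
# Pi(p and q) = min(x, Pi(p)); then N(q | p) = 1 - Pi(not-q | p).

def cond_possibility(pi_p_and_q, pi_p):
    """Greatest x with min(x, pi_p) == pi_p_and_q (needs pi_p_and_q <= pi_p)."""
    return 1.0 if pi_p_and_q == pi_p else pi_p_and_q

def cond_necessity(pi_p_and_not_q, pi_p):
    return 1.0 - cond_possibility(pi_p_and_not_q, pi_p)

# With Pi(p) = 1, Pi(p and q) = 1, Pi(p and not-q) = 0.3:
print(cond_possibility(1.0, 1.0))  # 1.0: q is fully possible given p
print(cond_necessity(0.3, 1.0))    # 0.7: q is fairly certain given p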
Let us now consider the problem of first-order implications. It is clear that N(∀x, P(x) → Q(x)) = α is not the same as ∀x N(P(x) → Q(x)) = α (where P and Q are non-vague predicates, for simplicity). The first expression represents a conjecture that is refuted by finding an x₀ such that P(x₀) → Q(x₀) is false. The other expression is closer to a default rule when α is close to 1, since it means that for all x, "P(x) → Q(x) is true" is almost sure (but there may be exceptions). That is exactly equivalent to saying that when P(x) is true then Q(x) is almost surely true, putting the necessity on Q(x) only. This identity of meaning is reflected by the fact that both approaches, i.e. putting the necessity on the rule or on the rule conclusion, are equivalent. To see this, let M(P) = {x | P(x) is true}. N(Q(x)) = α is expressed by the fuzzy set M(Q_α) defined by (Prade, 1985b)

μ_{M(Q_α)}(x) = 1 if x ∈ M(Q), and 1 − α otherwise, i.e. μ_{M(Q_α)}(x) = max(μ_{M(Q)}(x), 1 − α),

which is the same as the one translating the fuzzy rule P(x) → Q_α(x), where Q_α is the fuzzy predicate whose meaning is defined by μ_{M(Q_α)}.
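A small sketch of this translation, with a hypothetical two-element domain (the crisp memberships and the value of α are illustrative only).

# Stating N(Q(x)) = alpha turns the crisp set M(Q) into the fuzzy set
# M(Q_alpha) with membership max(mu_MQ(x), 1 - alpha).

alpha = 0.9                              # Q is "almost sure" when P holds
mu_MQ = {"sparrow": 1.0, "tweety": 0.0}  # crisp membership in M(Q)

def mu_MQ_alpha(x):
    return max(mu_MQ[x], 1.0 - alpha)

print(mu_MQ_alpha("sparrow"))  # 1.0: a regular instance fits fully
print(mu_MQ_alpha("tweety"))   # 0.1: exceptions keep a residual degree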
[...] clear statistical interpretation (even if the probability values are fuzzily known). This is not really the case for the "most birds fly" example, since one does not know how to count events. If it means "most bird species can fly", performing a statistic makes no sense, because the species are more a matter of definition than objective facts. In that case, we may think that possibilistic logic offers a reasonable framework for automated reasoning with instantiated default information, provided that we can cope with non-monotonicity, i.e. we can devise sophisticated schemes for combining pieces of information with various levels of specificity/generality. See Dubois and Prade (1987c) for a first step in this direction.

Let us end this reply by saying a few more words about our notion of truth. Once again, it is very pragmatic and refers to what we know about the world, and not to the actual world. Our working assumption is that the database contains vague, incomplete information, but no utterly wrong statements. We agree with Haack about the lack of meaning of degrees of truth attached to classical formulae. But the incompleteness of the available information may make us ignorant about truth; we model it by a {0, 1}-valued possibility measure on the set {true, false}, such that Π(true) = Π(false) = 1. Vagueness of the available information leads us to let Π(true) and Π(false) lie in the unit interval. So far, truth is a binary notion. It is only when evaluating the truth of vague statements that grades of truth become meaningful, to model the fact that vagueness pertains to the existence of borderline instances or interpretations. The example "France is hexagonal" mentioned by Gochet is very interesting. It is a typical instance of a word that has a very precise meaning in some contexts (mathematics) and a fuzzy meaning in ordinary talk. "Hexagonal" then refers to an ill-bounded set of shapes that look more or less like the mathematical hexagon. A natural way of evaluating the degree of truth of "France is hexagonal" is to evaluate the relative amount of distortion to which a hexagon must be submitted in order to obtain France's shape. This idea is actually implemented in our team for the purpose of analysing verbal designations of objects in scene analysis (Dubois and Jaulent, 1985).
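As a compact restatement of the distinction between ignorance and vagueness drawn above, a minimal sketch (the numbers are hypothetical).

# Incompleteness modelled by a possibility measure on {true, false}:
# total ignorance gives pi(true) = pi(false) = 1; vagueness lets both
# values slide in the unit interval.

ignorance = {"true": 1.0, "false": 1.0}  # neither value can be ruled out
vague_fit = {"true": 1.0, "false": 0.4}  # borderline interpretations exist

def necessity_of_truth(dist):
    return 1.0 - dist["false"]           # N(true) = 1 - pi(false)

print(necessity_of_truth(ignorance))  # 0.0: no commitment at all
print(necessity_of_truth(vague_fit))  # 0.6: partially certain truth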
Additional references

Black, M. (1937). Vagueness. An exercise in logical analysis. Phil. Sci. 4, 427-455.
Cohen, B. and Murphy, G. (1985). Models of concepts. Cognitive Sci. 8, 27-58.
Dubois, D. and Jaulent, M. C. (1985). Shape understanding via fuzzy models. Proc. 2nd IFAC/IFIP/IFORS/IEA Conf. on Analysis, Design and Evaluation of Man-Machine Systems, Varese, Italy, pp. 302-307.
Dubois, D. and Prade, H. (1984d). Criteria aggregation and ranking of alternatives in the setting of fuzzy set theory. Fuzzy Sets and Decision Analysis (ed. H. J. Zimmermann, L. A. Zadeh and B. R. Gaines), pp. 209-240. TIMS Studies in the Management Sciences Vol. 20.
Dubois, D. and Prade, H. (1987b). On fuzzy syllogisms. Computational Intelligence (to appear).
Dubois, D. and Prade, H. (1987c). Possibility theory and default reasoning. Artificial Intelligence (to appear).
Dummett, M. A. E. (1975). Wang's paradox. Synthese 30, 301-324; also in Truth and Other Enigmas. Duckworth Press, London, 1978.
Duval, B. and Kodratoff, Y. (1986). Automated deduction in an uncertain and inconsistent data basis. Proc. 7th Eur. Conf. on Artificial Intelligence, Brighton, pp. 101-108.
Fine, K. (1975). Vagueness, truth, and logic. Synthese 30, 266-300.
Gochet, P. (1980). Outline of a Nominalist Theory of Propositions, pp. 73-86. Reidel, Dordrecht.
Gochet, P. (1986). Ascent to Truth, pp. 81-83. Philosophia Verlag, Munich.
Gochet, P. (1988). On Sir Alfred Ayer's theory of truth. The Philosophy of A. J. Ayer (ed. L. E. Hahn). The Library of Living Philosophers, Open Court, La Salle, Illinois.
Goguen, J. A. (1967). L-fuzzy sets. J. Math. Anal. Applics 18, 145-174.
Goodman, I. and Nguyen, H. T. (1985). Uncertainty Models for Knowledge Based Systems. North-Holland, Amsterdam.
Haack, S. (1980). Is truth flat or bumpy? Prospects for Pragmatism (ed. D. H. Mellor), pp. 17-18. Cambridge University Press.
Kampe de Feriet, J. (1982). Interpretation of membership functions of fuzzy sets in terms of plausibility and belief. Fuzzy Information and Decision Processes (ed. M. M. Gupta and E. Sanchez), pp. 93-98. North-Holland, Amsterdam.
Kodratoff, Y., Perdrix, H. and Franova, M. (1985). Traitement symbolique du raisonnement incertain. Proc. AFCET Informatique Congrès, Paris, pp. 33-45. AFCET, Paris.
Muskens, R. (1988). Going partial and relational in Montague grammar. Proc. 6th Amsterdam Colloq.
Norwich, A. B. and Turksen, I. B. (1984). A model for the measurement of membership and the consequences of its empirical implementation. Fuzzy Sets and Systems 12, 1-25.
Osherson, D. N. and Smith, E. E. (1981). On the adequacy of prototype theory as a theory of concepts. Cognition 9, 35-58.
Osherson, D. N. and Smith, E. E. (1982). Gradedness and conceptual combination. Cognition 12, 299-318.
Smithson, M. (1987). Fuzzy Set Analysis for Behavioral and Social Sciences. Springer-Verlag, Berlin.
Zadeh, L. A. (1982). A note on prototype theory and fuzzy sets. Cognition 12, 291-297.