PREFACE
From its inception nearly 2500 years ago, logic has taken a dominant interest in the notion of logical consequence. It is widely agreed that there exists a basic conception of deductive consequence, which Aristotle called “necessitation”, in which sentences or sets of sentences serve as inputs, generating output-sentences in a truth-preserving way. What is not widely agreed is that truth-preserving necessitation is the sole relation of consequence, or the only form of it that merits the serious interest of logicians. Aristotle himself introduces a further — and, for his interests, the more important — idea of syllogistic consequence. Syllogistic consequence is a restriction of necessitation, got by imposing further conditions which, among other things, require that syllogisms not have redundant premisses. This is clearly a constraint on inputs, and it is easy to see that it has the effect of making syllogistic consequence strongly non-monotonic. For if an argument is a syllogism, then the result of adding new sentences to its premisses cannot itself be a syllogism. Aristotle also imposed constraints on the outputs of syllogistic consequence. Syllogisms are required to have only single propositions as conclusions. The question of the constraints it may be desirable or necessary to impose on a stripped down, merely truth-preserving deduction relation brings into convergence the two main themes of the present volume of the Handbook of the History of Logic. Twentieth and twenty-first century logic is well-stocked with affirmative answers to this question. Many valued logicians gravitate to the idea that a realistic characterization of consequence requires the admissibility of inputs which, owing to vagueness, temporal or quantum indeterminacy, or reference-failure, cannot be considered classically bivalent. Consider the sentence “The present king of France”. 
On the many valued approach, the non-existence of the present king of France denies the sentence a referent, which in turn denies it a classical truth value. In classical logic, such sentences may not be admitted to the consequence relation. In many valued systems, the admissibility of such sentences is secured by the postulation of one or more additional truth values. Free logics offer an interesting alternative to many valued treatments of reference-failure. Instead of allowing many-valued sentences as inputs, free logicians retain the classical truth values while restricting output. In particular, the quantifier introduction rule Fa ⊢ ∃x(Fx) fails when Fa is guilty of reference-failure. In a variation of free logic, the necessity to multiply truth values is also averted, provided we are prepared to add to the logic’s standard domain of discourse a non-standard domain, in which singular terms such as “the present king of France” pick up a referent, notwithstanding that they lack a referent in the standard domain. Either way,
whether the many valued way or the free logic way, something has to be added if sentences such as these are to be admitted to the consequence relation. Either an additional truth value must be introduced or an additional domain of discourse. Many valued logics are non-bivalent. This generates a problem for classical validity. An argument is valid classically if and only if there is no valuation making its premisses true and its conclusion false. So any argument containing at least one non-bivalent sentence — whether in basic systems such as K3 or in quantum logics or vagueness logics — is trivially valid classically. Accordingly, many valued consequence has to be contrived so as to avert these promiscuous validities. Many valued logics also impose constraints on outputs. In classical logic, the Law of Excluded Middle is a logical truth, hence is a consequence of any sentence whatever. But since in virtually all systems of many valued logic Excluded Middle is not a logical truth, it is not output for every input. Non-monotonic logicians, like Aristotle before them, are constrainers of inputs; and often they lay restrictions on outputs as well. The leading intuition of nonmonotonic logics is that there exist consequence relations in which reasonably drawn conclusions from a given set of premisses can be upset by the addition of new information, and that such consequence relations should be taken seriously by logicians. Default logics are a standard affirmative response to this intuition. When the conclusion drawn from some given premisses is a “default”, then it obtains provisionally and only in the absence of information to the contrary. Since indications to the contrary might subsequently present themselves in the form of new information, such consequences are non-monotonic. There are also many valued approaches in which it seems appropriate or necessary directly to constrain the consequence relation itself. Dialetheic logics are a case in point. 
Dialetheic logics are systems in which selected sentences are allowed to be both true and false at once. Such sentences, while true, aren’t true only, since they are also false, hence not false only. Accordingly being both true and false is a third truth value, which makes dialetheic systems many valued. If consequence were allowed to operate classically in these logics, then any input carrying the third truth value would have every sentence whatever as output. To avert this explosion, consequence has to be a paraconsistent relation, that is, one that does not generate this unfettered output. Accordingly, dialetheic logic is also a paraconsistent logic. The converse does not hold, however. There are paraconsistent logics that muffle the explosiveness of classical consequence without the necessity to posit true contradictions. A dominant move in paraconsistent circles is to constrain explosiveness by restricting the application of the output rule, Disjunctive Syllogism, when inputs contain inconsistencies, whether deemed true or not. A further development — also a many valued one — are fuzzy logics, which are purpose-built to accommodate vague sentences both as inputs to and outputs of the consequence relation. The founding insight of these logics is not that vague sentences require additional truth values, but that the classical truth and falsity will do provided that we allow the values of sentences to be degrees (or slices) of them. So seen, “Harry is bald” might be either true or false, or neither; and if
neither, it might be somewhat true, or true to degree n, where n is fairly high; or more false than true, or false to degree m, where m is higher than any degree to which the sentence is true.

The ten chapters of The Many Valued and Non-Monotonic Turn in Logic are designed to give readers a detailed, expert and up-to-date appreciation of the character and importance of the main expression of the volume’s twin themes.

Once again the Editors are deeply and most gratefully in the debt of the volume’s very able authors. The Editors also warmly thank the following persons: Professor John Beatty, Acting Head of the Philosophy Department, and Professor Nancy Gallini, Dean of the Faculty of Arts, at the University of British Columbia; Professor Michael Stingl, Chair of the Philosophy Department, and Professor Christopher Nicol, Dean of the Faculty of Arts and Science, at the University of Lethbridge; Professor Andrew Jones, Head of the Computer Science Department at King’s College London; Jane Spurr, Publications Administrator in London; Carol Woods, Production Associate in Vancouver; and our valued colleagues at Elsevier, Senior Publisher, Arjen Sevenster, and his successor Donna Weerd-Wilson, and Production Associate, Andy Deelen.

Dov M. Gabbay
King’s College London

John Woods
University of British Columbia and King’s College London and University of Lethbridge
CONTRIBUTORS

Grigoris Antoniou
Institute of Computer Science, FORTH, PO Box 1385, 71110 Heraklion, Crete, Greece.

Bryson Brown
Department of Philosophy, University of Lethbridge, 4401 University Drive, Lethbridge, Alberta T1K 3M4, Canada.

Alexander Bochman
Department of Computer Science, Holon Institute of Technology, Holon, Israel.

Maria Luisa Dalla Chiara
Dipartimento di Filosofia, Università di Firenze, Via Bolognese 52, I-50139 Firenze, Italy.
dallachiara@unifi.it

Didier Dubois
Université Paul Sabatier, IRIT, 118 Route de Narbonne, 31062 Toulouse Cedex 09, France.

Roberto Giuntini
Dipartimento di Scienze Pedagogiche e Filosofiche, Università di Cagliari, Via Is Mirrionis 1, I-09123 Cagliari, Italy.

Lluis Godo
Institut d’Investigació en Intel·ligència Artificial (IIIA), Consejo Superior de Investigaciones Científicas (CSIC), 08193 Bellaterra, Spain.

Dominic Hyde
School of History Philosophy Religion and Classics, University of Queensland, Brisbane, Queensland, 4072, Australia.

Grzegorz Malinowski
Department of Logic, University of Lodz, Poland.

Carl J. Posy
Department of Philosophy, Hebrew University of Jerusalem, Mt. Scopus, Jerusalem 91905, Israel.

Henri Prade
Université Paul Sabatier, IRIT, 118 Route de Narbonne, 31062 Toulouse Cedex 09, France.

Graham Priest
Philosophy Department, The University of Melbourne, Victoria 3010, Australia; and Department of Philosophy, University of St Andrews, St Andrews, KY16 9AL, Scotland.

Miklos Rédei
Department of History and Philosophy of Science, Faculty of Natural Sciences, Lorand Eotvos University, Budapest, Hungary.

Karl Schlechta
Laboratoire d’Informatique de Marseille, UMR 6166, CNRS and Université de Provence, CMI, 39 rue Joliot-Curie, F-13453 Marseille Cedex 13, France.
karl.schlechta:web.de

Kewen Wang
School of Information and Communication Technology, Griffith University, Brisbane, QLD 4111, Australia.
k.wang@griffith.edu.au
MANY-VALUED LOGIC AND ITS PHILOSOPHY Grzegorz Malinowski
INTRODUCTION

The assumption that every proposition may be ascribed exactly one of the two logical values, truth or falsity, called the principle of bivalence, constitutes the basis of classical logic. It determines both the subject matter and the scope of applicability of classical logic. The principle is expressed jointly by the law of the excluded middle, p ∨ ¬p, and the principle of contradiction, ¬(p ∧ ¬p). Given the classical understanding of the logical connectives, these laws may be read, respectively, as stating that of the two propositions p and ¬p at least one is true and at least one is false.

The most natural and straightforward step beyond two-valued logic is the introduction of more logical values, simultaneously rejecting the principle of bivalence. The indirect ways consist in revising the “bunch” of sentence connectives, mostly after having questioned some classical laws concerning them. Then some non-truth-functional connectives are introduced into the language and the propositional logic is primal. Multiple-valued truth-tables constitute the basis of the first method, whereas in the other case they are procured as tools for decision procedures for logical theorems. In either case, the extensional matrix semantics is based on revised “multiple-valued” truth-functionality.

The chapter is devoted to the most important systems of many-valued logic and the vital philosophical and metalogical problems of many-valuedness. Its first aim is to give a historical account of the most important systems of many-valued logic and their development, and in particular to survey their original motivations and characteristic properties. The second aim is to contribute to the discussion of many-valuedness as such. Accordingly, some known, and the best justified, approaches to many-valuedness are recalled and, further to that, the author’s own approach to the problem of many-valuedness, based on the inferential theory of (structural) propositional logics, is presented.
Handbook of the History of Logic. Volume 8 Dov M. Gabbay and John Woods (Editors) © 2007 Elsevier B.V. All rights reserved.
The last two Sections, “Recent developments” and “Applications”, complement the general part of the text and introduce the Reader to more special topics. Monographs, expository papers on several issues concerning many-valuedness and suggested further readings are marked in the Bibliography by an asterisk *.

1 WAYS OF MANY VALUES

The question of when a logic is many-valued is neither easy nor trivial. To start with, we outline the early history of the subject and recall some important justifications, interpretations and explanations of many logical values. We focus on future contingents, paradoxes and probability.
Early history

The roots of many-valued logics can be traced back to Aristotle (4th century BC). In Chapter IX of De Interpretatione Aristotle considers the time-honoured sentence “There will be a sea-battle tomorrow”, which cannot be evaluated from the point of view of truth and falsity. The battle-sentence falls into the wide category of future contingents: sentences which refer to future events that are not necessary, or not actually determined. The Philosopher from Stagira suggests the existence of a “third” logical status of propositions.

The prehistory of many-valued logic falls in the Middle Ages. However, it is fairly difficult to evaluate to what extent the approaches of Duns Scotus (c. 1266–1308), William Ockham (1285–1347) and Peter de Rivo (1420–1499) contributed to the topic, mostly because their studies were limited to the topics following Thomas Aquinas’ discussion of future contingents and divine foreknowledge. Accordingly, they were concerned with modality and with the consequentia. More serious attempts to create non-classical logical constructions, mainly three-valued, appeared only at the turn of the XIXth century: they were due to Hugh MacColl, Charles S. Peirce and Nicolai A. Vasil’ev. In most cases the division of the totality of propositions into three categories was supported by considerations dealing with modal or temporal concepts. Eventually, some criteria of distinction were applied and the propositions were mostly grouped as “affirmative”, “negative” or “indifferent”.

Philosophical motivations for many-valuedness may roughly be classified as ontological and epistemic. The former focus on the nature of objects and facts, while the latter refer to the knowledge status of actual propositions. The “Era of many-valuedness” was finally inaugurated in 1920 by Łukasiewicz and Post, cf. Łukasiewicz [1920], Post [1920]. The thoroughly successful formulations of many-valued logical constructions became possible as a result of an adaptation of the truth-table method applied to classical logic by Frege in 1879, Peirce in 1885 and others. The impetus thus given bore the Łukasiewicz and Post method of logical algebras and matrices. The apparently different proposals of the two scholars had quite different supports.
The distinguishing feature of many-valued logics is that some of their connectives are non-truth-functional with respect to truth and falsity, which means that their properties cannot be fully described by two logical values. On the other hand, all connectives of a many-valued logic proper display a kind of generalized “truth-functionality”, i.e. an extensionality with respect to the actual values of the logic in question. Non-truth-functionality, owing to the traditional extension–intension opposition, has also been identified with intensionality. This distinction played an important role in the extended justification for the construction of several systems of logic, which either did not have any direct explanation in terms of unorthodox logical values or did not accept anything but the truth-falsity universe. Then, using the axiomatic method, some non-classical constructions formalizing intensional, and thus non-truth-functional, connectives appeared in the 1930s. First came the Lewis modal logics and the intuitionistic logic codifying the principles of a significant trend in the philosophy of mathematics initiated by Brouwer in 1907.
Future contingents

The Aristotelian sea-battle sentence has been referred to and reproduced on several occasions throughout the entire XXth century. The first was Łukasiewicz, analysing the sentence “I shall be in Warsaw at noon on 21 December of the next year.” The Polish philosopher argues that at the moment of utterance this sentence is neither true nor false, since otherwise one would get fatalist conclusions about the necessity or impossibility of contingent future events. In the seminal paper “O logice trójwartościowej”1 Łukasiewicz considers three values: 0, 1 and 2. While 0 and 1 are the classical values of falsity and truth, the additional value 2 is a value of future contingent sentences and is interpreted as “possibility” or “indeterminacy”. Soon Łukasiewicz changed the notation, putting 1/2 instead of 2, and suggested that the natural “ordering” of the three values reflected his philosophical intuitions concerning propositional connectives better. Slightly later, in 1938, when Gonseth pointed out the incompatibility of this way of interpreting the third value with other principles of the three-valued logic, Łukasiewicz also abandoned his “possibilistic” explanation of the intermediary logical value. Gonseth’s argument reveals that the original Łukasiewicz interpretation neglects the mutual dependence of “possible” propositions, and it runs as follows: consider two propositions α and ¬α. Whenever α is undetermined, so is ¬α, and then the conjunction α ∧ ¬α is undetermined. This, however, contradicts intuition since, independently of α’s content, α ∧ ¬α is false.

The approach referring to future contingents has been discussed further; we shall come back to the problem as soon as we have at hand all the necessary knowledge concerning the propositional connectives.

1 Łukasiewicz [1920].
Paradoxes

Russell’s famous 1907 re-discovery of the Liar paradox troubled the scientific community. The kernel of the semantic paradox, known already in antiquity, is self-reference, which leads to absurdity. Thus, e.g., when one says “I am now speaking falsely”, it follows that what she says is true if, and only if, she speaks falsely. The re-discovered version is set-theoretical. The Russell set “of all sets that are not their own elements”, Z = {x : x ∉ x}, put Cantor’s set theory into question and posed a serious logical problem. Accepting Z in a set theory founded on classical logic and substituting it for x in the defining formula, we get the inconsistent equivalence

Z ∈ Z ≡ Z ∉ Z. (∗)

So, if we insist on retaining Z, the sole possibility would be to change the underlying logic. Actually, this attitude was a strong argument in favour of many-valued logics: the formula (∗) ceases to be an antinomy even in the simplest, three-valued Łukasiewicz logic. Still, however, the original version of the Liar paradox inspired Bochvar [1938] and led him to the conception of a three-valued logic based on the division of propositions into sensible and senseless, and then “mapping” it into a two-level formal language. A proposition is meaningful if it is either true or false; all other sentences are considered meaningless or paradoxical. The propositional language of Bochvar’s logic has two levels, which correspond to the object language and to the metalanguage. Both levels have their own connectives, which are counterparts of the standard connectives. The whole approach has been mainly directed towards solving the paradoxes emerging with classical logic and the set theory based on it.
Many-valuedness and probability

Łukasiewicz, still before 1918, invented a theory of logical probability. The differentiating feature of logical probability is that it refers to propositions and not to events, Łukasiewicz [1913]. The conception was based on the assumption that there is a function Pr ranging over the set of propositions of a given standard propositional language, with values from the real interval [0,1], satisfying the conditions Pr(p ∨ ¬p) = 1 and Pr(p ∨ q) = Pr(p) + Pr(q) for mutually exclusive p and q (Pr(p ∧ q) = 0), and such that Pr(p) = Pr(q) for any two logically equivalent p and q. If the logical value v(p) is identified with the measure of probability Pr(p), then for Pr(p) = 1/2, and hence Pr(¬p) = 1/2, we would get

1/2 ∨ 1/2 = Pr(p ∨ ¬p) = 1 and 1/2 ∨ 1/2 = Pr(p ∨ p) = Pr(p) = 1/2,

so that one and the same pair of arguments of the disjunction would have to yield two different values.
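The clash can be made concrete with a small sketch. It assumes a hypothetical sample space of four equally likely outcomes (our own choice, not taken from the text) and models propositions as sets of outcomes; the names `OMEGA` and `pr` are illustrative only. The two disjunctions have arguments of the same probability, 1/2 and 1/2, yet different probabilities themselves.

```python
from fractions import Fraction

# Hypothetical sample space (an assumption of this sketch): four equally likely outcomes.
OMEGA = frozenset({1, 2, 3, 4})

def pr(event):
    """Uniform probability of a proposition, modelled as the set of outcomes verifying it."""
    return Fraction(len(event), len(OMEGA))

p = frozenset({1, 2})   # Pr(p) = 1/2
not_p = OMEGA - p       # Pr(not p) = 1/2

# Both disjunctions have arguments of probability 1/2, but their values differ.
pr_p_or_not_p = pr(p | not_p)   # Pr(p or not p)
pr_p_or_p = pr(p | p)           # Pr(p or p)

print(pr(p), pr(not_p), pr_p_or_not_p, pr_p_or_p)
```

No function of the two argument values alone can return both results, which is the sense in which Pr is not truth-functional.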
Consequently, logical probability must not be identified with the logical values of many-valued logic. The reason is that probabilistic intensionality is incompatible with logical extensionality. Logical probability was later considered by several scholars. The continuators of this line put much effort, without success, into creating a many-valued logic within which logical probability could find a satisfactory interpretation, see e.g. Zawirski [1934a; 1934b], Reichenbach [1935]. Many years later, Giles [1975] obtained satisfactory results concerning the relation between probability and logical values.2

It may be interesting to learn that the early, somewhat naive, probabilistic approach bore the first intuition of non-classical logical values. Łukasiewicz [1913] classified as indefinite the propositions with free nominal variables ranging over finite domains. He assigned to them fractions indicating the proportion between the number of actual variable values verifying a proposition and the number of all possible values of that variable. The “logical” values thus introduced are relative: they depend on the set of individuals actually evaluated. So, for example, the value of the proposition ‘x² − 1 = 0’ amounts to 1/2 in the set {−1, 0} and to 2/3 in the set {−1, 0, 1}. Obviously enough, infinite sets of individuals are not admitted, and this implies that Łukasiewicz’s suggestion cannot be taken seriously within the theory of probability.

2 THE THREE-VALUED ŁUKASIEWICZ LOGIC
The actual introduction of a third logical value, next to truth and falsity, was preceded by thorough philosophical studies. Łukasiewicz, a co-originator of the Lvov-Warsaw philosophical school,3 was concerned with the problems of induction and the theory of probability. In particular, while dealing with the latter, Łukasiewicz [1913] extricated himself from the “embarrassing” principle of contradiction. A still more direct influence on Łukasiewicz’s thinking was the discourse in the Lvov-Warsaw school about freedom and creativity. Kotarbiński [1913] suggested the need to revise the two-valued logic, which seemed to interfere with the freedom of human thinking. Łukasiewicz, a fierce follower of indeterminism, finally introduced the third logical value, to be assigned to non-determined propositions; specifically, to propositions describing casual future events, i.e. future contingents. The fully organized system of propositional logic, following the scheme of CPC, had as its aim to solve, among others, several questions concerning modality and the paradoxes of set theory.
2 Giles considered subjective probability and translated its degrees into logical values of the infinite Łukasiewicz logic in a fairly sophisticated way.
3 See Woleński [1989].

The third logical value

The first remarks about the three-valued propositional calculus can be found in the Farewell Lecture given in the Assembly Hall of Warsaw University on the 7th March, 1918. Next came the paper “O logice trójwartościowej” [Łukasiewicz, 1920]. It brings an outline of the negation-implication version of the propositional calculus, whose semantics is based on three values: 0, 1 and, additionally, 1/2.4 At the early stage Łukasiewicz interpreted the third logical value 1/2 as “possibility” or “indeterminacy”. Following the intuitions of these concepts, he extended the classical interpretation of negation and implication in the following tables:5

 α  | ¬α
 0  |  1
1/2 | 1/2
 1  |  0

 →  |  0   1/2   1
 0  |  1    1    1
1/2 | 1/2   1    1
 1  |  0   1/2   1
The other connectives of disjunction, conjunction and equivalence were (later) introduced through the sequence of the following definitions:

α ∨ β =df (α → β) → β
α ∧ β =df ¬(¬α ∨ ¬β)
α ≡ β =df (α → β) ∧ (β → α).

Their tables are as follows:

 ∨  |  0   1/2   1
 0  |  0   1/2   1
1/2 | 1/2  1/2   1
 1  |  1    1    1

 ∧  |  0   1/2   1
 0  |  0    0    0
1/2 |  0   1/2  1/2
 1  |  0   1/2   1

 ≡  |  0   1/2   1
 0  |  1   1/2   0
1/2 | 1/2   1   1/2
 1  |  0   1/2   1
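The passage from the two primitive tables to the three derived ones can be replayed mechanically. The following is a minimal sketch: the dictionaries `NEG` and `IMP` copy the tables of negation and implication, and `disj`, `conj` and `equiv` apply the definitions literally. The function and variable names are our own, and the comparison with max and min is an observation about the resulting tables rather than part of the definitions.

```python
from fractions import Fraction

ZERO, HALF, ONE = Fraction(0), Fraction(1, 2), Fraction(1)
VALUES = (ZERO, HALF, ONE)

# Primitive tables copied from the text (outer key: value of alpha, inner key: value of beta).
NEG = {ZERO: ONE, HALF: HALF, ONE: ZERO}
IMP = {
    ZERO: {ZERO: ONE, HALF: ONE, ONE: ONE},
    HALF: {ZERO: HALF, HALF: ONE, ONE: ONE},
    ONE: {ZERO: ZERO, HALF: HALF, ONE: ONE},
}

def neg(a):
    return NEG[a]

def imp(a, b):
    return IMP[a][b]

def disj(a, b):
    """alpha or beta, defined as (alpha -> beta) -> beta."""
    return imp(imp(a, b), b)

def conj(a, b):
    """alpha and beta, defined as not(not alpha or not beta)."""
    return neg(disj(neg(a), neg(b)))

def equiv(a, b):
    """alpha iff beta, defined as (alpha -> beta) and (beta -> alpha)."""
    return conj(imp(a, b), imp(b, a))

def table(op):
    """Rows indexed by alpha, columns by beta, both in the order 0, 1/2, 1."""
    return [[op(a, b) for b in VALUES] for a in VALUES]

for name, op in (("or", disj), ("and", conj), ("iff", equiv)):
    print(name)
    for a, row in zip(VALUES, table(op)):
        print("  " + str(a) + ": " + " ".join(str(v) for v in row))
```

Under these definitions disjunction and conjunction coincide with the maximum and the minimum of the two argument values.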
A valuation v in the three-valued logic is any function from the set of formulas For to {0, 1/2, 1}, v : For → {0, 1/2, 1}, compatible with the above tables. A tautology is a formula which under any v takes on the designated value 1. The set Ł3 of tautologies of the three-valued logic of Łukasiewicz differs from TAUT. So, for instance, neither the law of the excluded middle, nor the principle of contradiction is in Ł3. To see this, it suffices to assign 1/2 to p: any such valuation also associates 1/2 with EM and CP. The thorough-going refutation of these two laws was intended, in Łukasiewicz’s opinion, to codify the principles of indeterminism.

4 In Łukasiewicz [1920] the notation is different: 2 appears instead of 1/2.
5 The truth-tables of binary connectives ∗ are viewed as follows: the value of α is placed in the first vertical line, the value of β in the first horizontal line and the value of α ∗ β at the intersection of the two lines.
Another property of the new semantics is that some classically inconsistent formulas are no longer contradictory in Ł3. One such formula,

p ≡ ¬p, (∗)

is connected with the Russell paradox, since the equivalence Z ∈ Z ≡ Z ∉ Z is a substitution instance of (∗). Accordingly, the Russell paradox ceases to be an antinomy in Ł3: putting 1/2 for p makes the formula (∗) true and therefore it is non-contradictory.
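These claims about EM, CP and (∗) can be checked by enumerating all valuations. The sketch below is only an illustration: it replaces the tables by arithmetic shortcuts (1 − a for negation, min(1, 1 − a + b) for implication, max and min for disjunction and conjunction), which are an assumption of the sketch chosen because they reproduce the tables of this section; the helper `is_tautology` and the other names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ZERO, HALF, ONE = Fraction(0), Fraction(1, 2), Fraction(1)
VALUES = (ZERO, HALF, ONE)

# Arithmetic shortcuts (an assumption of this sketch) that agree with the three-valued tables.
def neg(a):
    return ONE - a

def imp(a, b):
    return min(ONE, ONE - a + b)

def disj(a, b):
    return max(a, b)

def conj(a, b):
    return min(a, b)

def equiv(a, b):
    return conj(imp(a, b), imp(b, a))

def is_tautology(formula, n_vars):
    """True if the formula takes the designated value 1 under every valuation."""
    return all(formula(*vs) == ONE for vs in product(VALUES, repeat=n_vars))

def em(p):
    """Law of the excluded middle: p or not p."""
    return disj(p, neg(p))

def cp(p):
    """Principle of contradiction: not (p and not p)."""
    return neg(conj(p, neg(p)))

def star(p):
    """The formula (*): p iff not p."""
    return equiv(p, neg(p))

for name, f in (("EM", em), ("CP", cp), ("(*)", star)):
    print(name, [str(f(v)) for v in VALUES], "tautology:", is_tautology(f, 1))
```

EM and CP fail only at the intermediate value, while (∗) holds only there: it is satisfiable without being a tautology.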
Modality, axiomatization and interpretation

Another of Łukasiewicz’s intentions was to formalize the modal functors of possibility M and necessity L. Łukasiewicz postulated preservation of the consistency of the medieval intuitive theorems on modal propositions. Being aware of the impossibility of expressing these functors in truth-functional classical logic, Łukasiewicz took the three-valued logic as a base. In 1921 Tarski produced simple definitions, using negation and implication, of the two connectives meeting Łukasiewicz’s requirements:6

 x  | Mx
 0  | 0
1/2 | 1
 1  | 1

 x  | Lx
 0  | 0
1/2 | 0
 1  | 1

Mα =df ¬α → α
Lα =df ¬M¬α = ¬(α → ¬α)

Using M, L and the other Łukasiewicz connectives we get a third modal connective, “it is contingent that” or “it is modally indifferent”, distinguishing the intermediate logical value:

 α  | Iα
 0  | 0
1/2 | 1
 1  | 0

Iα = Mα ∧ ¬Lα
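That the three definitions yield exactly the displayed tables can be verified in a few lines. The sketch again uses the arithmetic form of negation and implication as an assumption (it agrees with the three-valued tables); the Python names `M`, `L` and `I` simply mirror the symbols of the text.

```python
from fractions import Fraction

ZERO, HALF, ONE = Fraction(0), Fraction(1, 2), Fraction(1)
VALUES = (ZERO, HALF, ONE)

# Arithmetic form of the primitive connectives (an assumption of this sketch).
def neg(a):
    return ONE - a

def imp(a, b):
    return min(ONE, ONE - a + b)

def conj(a, b):
    return min(a, b)

def M(a):
    """Possibility, defined as (not alpha) -> alpha."""
    return imp(neg(a), a)

def L(a):
    """Necessity, defined as not(alpha -> not alpha)."""
    return neg(imp(a, neg(a)))

def I(a):
    """Contingency, defined as M(alpha) and not L(alpha)."""
    return conj(M(a), neg(L(a)))

M_TABLE = [M(x) for x in VALUES]
L_TABLE = [L(x) for x in VALUES]
I_TABLE = [I(x) for x in VALUES]

for x, m, l, i in zip(VALUES, M_TABLE, L_TABLE, I_TABLE):
    print(f"x = {str(x):>3}   Mx = {m}   Lx = {l}   Ix = {i}")
```

The same loop can be used to confirm the duality Lα = ¬M¬α on all three values.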
Applying I allows the formulation within Ł3 of counterparts of the law of the excluded middle and the principle of contradiction:

p ∨ Ip ∨ ¬p
¬(p ∧ ¬Ip ∧ ¬p)

rendering altogether that Łukasiewicz’s logic is three-valued. In spite of the promising combination of trivalence and modality, the full elaboration of modal logic on the basis of the three-valued logic never succeeded (with the sole exception of the algebraic constructions of Moisil, see Section 6), which was the outcome of Łukasiewicz’s further investigations on modal sentences (see Łukasiewicz [1930]). Many years later Łukasiewicz came back to the idea of constructing a pluri-valued modal system and exhibited a four-valued logic of possibility and necessity, Łukasiewicz [1953].

Ł3, through the law of excluded fourth and the extended contradiction principle, expresses its three-valuedness. However, it is limited, since not all connectives described by {0, 1/2, 1}-tables are definable through formulas. One important example is the constant connective T, such that Tx = 1/2 for any x ∈ {0, 1/2, 1}. The axiomatization of Ł3 due to Wajsberg [1931] is the first known axiomatization of a system of many-valued logic. Accepting the rules MP and SUB, the Wajsberg axiom system for the (¬, →)-version of Łukasiewicz’s three-valued propositional calculus is as follows:

W1. p → (q → p)
W2. (p → q) → ((q → r) → (p → r))
W3. (¬p → ¬q) → (q → p)
W4. ((p → ¬p) → p) → p.

The result obviously applies to the whole Ł3, since the other Łukasiewicz connectives are definable using those of negation and implication. Słupecki [1936] enriched the set of primitives by T and, adding to W1–W4

W5. Tp → ¬Tp
W6. ¬Tp → Tp,

got an axiom system for the functionally complete three-valued logic, compare Section 4.

6 See Łukasiewicz [1930].
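Two of the claims above lend themselves to a brute-force check: that W1–W6 take the designated value under every valuation, and that the constant connective T cannot be defined from negation and implication. The sketch below covers only this semantic side (it says nothing about completeness). The arithmetic form of the connectives is an assumption that agrees with the tables, and `definable_unary` is a hypothetical helper which closes the identity function under the two primitive connectives, representing each unary function by its column of values on (0, 1/2, 1).

```python
from fractions import Fraction
from itertools import product

ZERO, HALF, ONE = Fraction(0), Fraction(1, 2), Fraction(1)
VALUES = (ZERO, HALF, ONE)

# Arithmetic form of the primitive connectives (an assumption of this sketch).
def neg(a):
    return ONE - a

def imp(a, b):
    return min(ONE, ONE - a + b)

def T(_a):
    """Slupecki's constant connective: always the intermediate value."""
    return HALF

AXIOMS = {
    "W1": lambda p, q, r: imp(p, imp(q, p)),
    "W2": lambda p, q, r: imp(imp(p, q), imp(imp(q, r), imp(p, r))),
    "W3": lambda p, q, r: imp(imp(neg(p), neg(q)), imp(q, p)),
    "W4": lambda p, q, r: imp(imp(imp(p, neg(p)), p), p),
    "W5": lambda p, q, r: imp(T(p), neg(T(p))),
    "W6": lambda p, q, r: imp(neg(T(p)), T(p)),
}

def is_tautology(formula):
    """True if the formula takes the value 1 under all 27 valuations of p, q, r."""
    return all(formula(p, q, r) == ONE for p, q, r in product(VALUES, repeat=3))

def definable_unary():
    """All unary functions obtainable from a single variable by neg and imp."""
    found = {VALUES}  # the identity function, as its column of outputs
    while True:
        new = {tuple(neg(a) for a in f) for f in found}
        new |= {tuple(imp(a, b) for a, b in zip(f, g)) for f in found for g in found}
        if new <= found:
            return found
        found |= new

results = {name: is_tautology(f) for name, f in AXIOMS.items()}
t_definable = (HALF, HALF, HALF) in definable_unary()

print(results)
print("T definable from negation and implication:", t_definable)
```

The closure never contains the constant 1/2 because negation and implication send classical arguments to classical values, so every definable function is classical at 0 and at 1.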
An intuitive interpretation

In view of the existing difficulties, see Section 1, Słupecki [1964] undertook another attempt to interpret Łukasiewicz’s logic intuitively. Słupecki specifies a definite language to describe the property of determination of events in a three-valued manner consistent with Łukasiewicz’s truth-tables. The language has a set S comprising propositions about events, which contains simple (atomic) propositions and compound ones formed by means of the disjunction (∨), conjunction (∧) and negation (¬) connectives. Słupecki supposes the set of events Z described by propositions of S to be closed under the operations of union (∪), meet (or intersection) (∩) and complementation (−) and, furthermore, the structure Z = (Z, ∪, ∩, −) to be a Boolean algebra. There is a causality relation → on Z (“f1 → f2” reads: the event f1 is a cause of the event f2) subject to the assumptions that

(P1) f → f1 ∪ f2 iff f → f1 or f → f2.
(P2) f → f1 ∩ f2 iff f → f1 and f → f2.
(P3) If f → f1 for some f, then f+ → −f1 for no f+.
(P4) If f1 → f, then f1 ∩ f2 → f.

for any f, f1, f2 ∈ Z. For the purpose of defining the property of determination, he then singles out a set of past and present events, hereafter denoted by the symbols g, g1, g2, . . ., and puts

D(f) =df there is a g ∈ Z such that g → f,
D̄(f) =df not D(f) and not D(−f).

The intended meaning of D(f) is that f is (at the present moment) determined, and of D̄(f) that f is (at the present moment) not determined. The relation ∗ of describability between propositions in S and events in Z (“p ∗ f” reads: p describes (the event) f) has to satisfy the conditions:

p ∨ p1 ∗ f ∪ f1 whenever p ∗ f and p1 ∗ f1
p ∧ p1 ∗ f ∩ f1 whenever p ∗ f and p1 ∗ f1        (∗)
when p ∗ f, then ¬p ∗ −f

for any p, p1 ∈ S. In the end, Słupecki defines the three properties 1(α), 0(α) and 1/2(α), where 1(α) = α is true, 0(α) = α is false, 1/2(α) = α has the “third” logical value, as follows:

(DT) if α ∗ f, then {1(α) iff D(f)}
     if α ∗ f, then {0(α) iff D(−f)}
     if α ∗ f, then {1/2(α) iff D̄(f)}.
Using (P1)–(P4) and (∗), it is easy to check that for x ∈ {0, 1/2, 1}

x(p ∨ q) = x(p) ∨ x(q)
x(p ∧ q) = x(p) ∧ x(q)
x(¬p) = ¬x(p),

where ∨, ∧ and ¬ appearing on the right-hand side are the connectives of the three-valued Łukasiewicz logic. Thus, (DT) to some extent justifies the Łukasiewicz interpretation of logical values with reference to the property of determination.

Słupecki’s interpretation omits the implication connective. Admittedly, Słupecki extends it to the language with the modal connectives M and L, and in the language thus enriched the implication of Łukasiewicz is definable. However, the interpretation of the implication so obtained is fairly unintuitive. On the other hand, a more profound analysis of the whole construction reveals that the assumption concerning Z has to be weakened: Nowak [1988] proved the formal correctness of the interpretation exclusively when Z is a de Morgan lattice and not a Boolean algebra. This result shows that the three-valued logic can thus be interpreted as a set of propositions describing events which form a non-classical algebra. (DT) then implies that the third value of Łukasiewicz, 1/2, is assigned to propositions concerning non-Boolean, undetermined events.

Some time later, in 1922, Łukasiewicz extended his three-valued construction to further sets of values and defined an important family of finite and infinite valued logics, see Section 6.
3 THREE-VALUED LOGICS OF KLEENE AND BOCHVAR
Kleene [1938; 1952] is the author of two systems of propositional and predicate logic motivated by the indeterminacy of some propositions at a certain stage of investigation. Inspired by studies of the foundations of mathematics and the theory of recursion, Kleene aimed at obtaining tools that make the analysis of partially defined predicates possible. To see the need for such logic(s), let us consider a simple example of such a predicate, the mathematical property P defined by the equivalence

$$P(x) \quad\text{if and only if}\quad 1 \le 1/x \le 2,$$

where x is a variable ranging over the set of real numbers. It is apparent that, due to the properties of division, the propositional function P(x) is undetermined for x = 0. More precisely, we then have that

$$
\text{the proposition } P(a) \text{ is }
\begin{cases}
\text{true} & \text{if } 1/2 \le a \le 1,\\
\text{undetermined} & \text{if } a = 0,\\
\text{false} & \text{otherwise.}
\end{cases}
$$

The starting point of Kleene's [1938] construction consists in also considering propositions whose logical value of truth (t) or falsity (f) is undefined, undetermined by means of accessible algorithms, or not essential for the actual consideration. The third logical value, undefinedness (u), is reserved for this category of propositions. Kleene's counterparts of the standard connectives are defined by the following tables (the row label is the value of the first argument, the column label the value of the second):

$$
\begin{array}{c|c}
\alpha & \neg\alpha\\ \hline
f & t\\ u & u\\ t & f
\end{array}
\qquad
\begin{array}{c|ccc}
\to & f & u & t\\ \hline
f & t & t & t\\ u & u & u & t\\ t & f & u & t
\end{array}
\qquad
\begin{array}{c|ccc}
\lor & f & u & t\\ \hline
f & f & u & t\\ u & u & u & t\\ t & t & t & t
\end{array}
$$

$$
\begin{array}{c|ccc}
\land & f & u & t\\ \hline
f & f & f & f\\ u & f & u & u\\ t & f & u & t
\end{array}
\qquad
\begin{array}{c|ccc}
\equiv & f & u & t\\ \hline
f & t & u & f\\ u & u & u & u\\ t & f & u & t
\end{array}
$$
One may easily notice that, as in Łukasiewicz logic, the behaviour of the connectives on the classical logical values t and f remains unchanged. However, now the mutual definability of α → β and ¬α ∨ β is also preserved. Kleene takes t as the only distinguished value and, in consequence, obtains that no formula is a tautology: this follows from the fact that any valuation which assigns u to every propositional variable also assigns u to every formula. It is striking that so "conservative" an extension of the two-valued logic rejects all classical tautologies, even such as p ∨ ¬p and p → p.

An accurate and compatible interpretation of Kleene's connectives was given by Körner [1966]. Körner defined the notion of an inexact class of a given non-empty domain A generated by a partial definition D(P) of a property P of elements of A as a three-valued "characteristic function" XP : A → {−1, 0, +1}:

$$
X_P(a) =
\begin{cases}
-1 & \text{when } P(a) \text{ according to } D(P) \text{ is false},\\
0 & \text{when } P(a) \text{ is } D(P)\text{-undecidable},\\
+1 & \text{when } P(a) \text{ according to } D(P) \text{ is true}.
\end{cases}
$$

Any family of inexact classes of a given domain A is a de Morgan lattice, the algebraic operations ∪, ∩ and −,

$$(X \cup Y)(a) = \max\{X(a), Y(a)\}, \qquad (X \cap Y)(a) = \min\{X(a), Y(a)\}, \qquad (-X)(a) = -X(a),$$

being counterparts of the Kleene connectives. Körner's ideas have recently been revitalized in the rough sets theory of Pawlak [1991] and the approximation logic based on it, see e.g. Rasiowa [1991].

In 1952, in his monograph Introduction to Metamathematics, Kleene refers to the connectives of his 1938 logic as strong and introduces another set of weak connectives: retaining the negation and equivalence, he defines the three others by the tables
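$$
\begin{array}{c|ccc}
\to & f & u & t\\ \hline
f & t & u & t\\ u & u & u & u\\ t & f & u & t
\end{array}
\qquad
\begin{array}{c|ccc}
\lor & f & u & t\\ \hline
f & f & u & t\\ u & u & u & u\\ t & t & u & t
\end{array}
\qquad
\begin{array}{c|ccc}
\land & f & u & t\\ \hline
f & f & u & f\\ u & u & u & u\\ t & f & u & t
\end{array}
$$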
The new truth tables are meant to describe the use of logical connectives with respect to those arithmetical propositional functions whose decidability depends on effective recursive procedures. They are constructed according to the rule that any single occurrence of u results in the whole context taking the value u. The original arithmetical motivation is that indeterminacy occurring at any stage of a computation makes the entire procedure undetermined.
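The two families of Kleene connectives can be sketched in a few lines of Python. The numeric encoding f = 0, u = 1, t = 2 and all function names are assumptions of this sketch, not notation from the text. The strong connectives follow the max/min reading, and the weak ones apply the rule that any occurrence of u yields u.

```python
from itertools import product

F, U, T = 0, 1, 2            # assumed encoding with f < u < t
VALUES = (F, U, T)

def neg(a):
    return 2 - a

def strong_or(a, b):
    return max(a, b)

def strong_and(a, b):
    return min(a, b)

def strong_imp(a, b):
    # alpha -> beta is read as (not alpha) or beta
    return strong_or(neg(a), b)

def weak(op):
    """Weak version of a binary connective: any u argument gives u."""
    def weak_op(a, b):
        return U if U in (a, b) else op(a, b)
    return weak_op

weak_or = weak(strong_or)
weak_and = weak(strong_and)
weak_imp = weak(strong_imp)

def is_tautology(formula, n_vars):
    """True if the formula takes the distinguished value t under every valuation."""
    return all(formula(*vals) == T for vals in product(VALUES, repeat=n_vars))

def excluded_middle(p):
    return strong_or(p, neg(p))

def self_implication(p):
    return strong_imp(p, p)
```

Under this encoding, `is_tautology(excluded_middle, 1)` and `is_tautology(self_implication, 1)` both come out false, because the valuation p = u gives u in each case.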
Bochvar

As we already mentioned, the three-valued logic of Bochvar [1938] was directed towards solving the paradoxes emerging in classical logic and the set theory based on it. The propositional language of Bochvar's logic has two levels, internal and external, which correspond to the object language and to the metalanguage. Both levels have their own connectives: the counterparts of negation, implication, disjunction, conjunction and equivalence. The two planes of Bochvar's construction correspond to Kleene's weak logic (internal) and to classical logic (external), respectively. The internal connectives are conservative generalizations of the classical ones and will be denoted here by ¬, →, ∨, ∧ and ≡. The external connectives express relations between the logical values of propositions and incorporate the expressions "... is true" and "... is false". They are marked here as starred connectives and understood in the following way:

external negation: ¬∗α 'α is false'
external implication: α →∗ β 'if α is true then β is true'
external disjunction: α ∨∗ β 'α is true or β is true'
external conjunction: α ∧∗ β 'α is true and β is true'
external equivalence: α ≡∗ β 'α is true iff β is true'.
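The truth tables of the internal connectives have been compiled according to a rule which echoes Kleene's principle: "every compound proposition including at least one meaningless component is meaningless, in other cases its value is determined classically". One may then easily conclude that the internal Bochvar connectives coincide with the weak connectives of Kleene. Therefore, we adopt the previous notation for logical values, with u now being the value "meaningless", and refer the reader to the preceding subsection. The truth-table description of the second collection of Bochvar connectives is the following (the row label is the value of the first argument, the column label the value of the second):

$$
\begin{array}{c|c}
\alpha & \neg^{\ast}\alpha\\ \hline
f & t\\ u & t\\ t & f
\end{array}
\qquad
\begin{array}{c|ccc}
\to^{\ast} & f & u & t\\ \hline
f & t & t & t\\ u & t & t & t\\ t & f & f & t
\end{array}
\qquad
\begin{array}{c|ccc}
\lor^{\ast} & f & u & t\\ \hline
f & f & f & t\\ u & f & f & t\\ t & t & t & t
\end{array}
$$

$$
\begin{array}{c|ccc}
\land^{\ast} & f & u & t\\ \hline
f & f & f & f\\ u & f & f & f\\ t & f & f & t
\end{array}
\qquad
\begin{array}{c|ccc}
\equiv^{\ast} & f & u & t\\ \hline
f & t & t & f\\ u & t & t & f\\ t & f & f & t
\end{array}
$$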
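An important property of Bochvar's construction, which makes it more natural, is the compatibility of the two levels. The passage from the internal to the external level is assured by the external assertion "α is true", A∗α. Below we show the truth table of this connective and the intuitively justified definitions of the external connectives:

$$
\begin{array}{c|c}
\alpha & A^{\ast}\alpha\\ \hline
f & f\\ u & f\\ t & t
\end{array}
$$

$$
\neg^{\ast}\alpha =_{df} \neg A^{\ast}\alpha, \qquad
\alpha \to^{\ast} \beta =_{df} A^{\ast}\alpha \to A^{\ast}\beta, \qquad
\alpha \lor^{\ast} \beta =_{df} A^{\ast}\alpha \lor A^{\ast}\beta,
$$

$$
\alpha \land^{\ast} \beta =_{df} A^{\ast}\alpha \land A^{\ast}\beta, \qquad
\alpha \equiv^{\ast} \beta =_{df} A^{\ast}\alpha \equiv A^{\ast}\beta.
$$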
Bochvar takes t as the designated value and thus gets the weak Kleene logic on the internal level. So, Bochvar’s internal logic does not have tautologies. Finally, the external logic is the classical logic: the truth tables of all external connectives ‘identify’ the values u and f, whereas the behaviour of these connectives with regard to f and t is standard.
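The way the external connectives identify u with f can be checked mechanically. The sketch below assumes that the external assertion sends t to t and both u and f to f, and that each external connective applies the corresponding classical connective to the asserted arguments. Both are assumptions of this sketch, chosen to match the readings 'α is true' and 'α is false' given above, and the function names are hypothetical.

```python
F, U, T = "f", "u", "t"
VALUES = (F, U, T)

def assertion(a):
    """External assertion 'a is true': t stays t, u and f both become f."""
    return T if a == T else F

# Classical connectives; they are only ever applied to outputs of assertion().
def c_not(a):
    return T if a == F else F

def c_imp(a, b):
    return F if (a == T and b == F) else T

def c_or(a, b):
    return T if T in (a, b) else F

def c_and(a, b):
    return T if (a == T and b == T) else F

def c_iff(a, b):
    return T if a == b else F

def ext_not(a):
    return c_not(assertion(a))

def external(classical_op):
    """Lift a classical binary connective to the external level."""
    return lambda a, b: classical_op(assertion(a), assertion(b))

ext_imp, ext_or = external(c_imp), external(c_or)
ext_and, ext_iff = external(c_and), external(c_iff)

def rows(op):
    """Truth table as three strings, one per value f, u, t of the first argument."""
    return ["".join(op(a, b) for b in VALUES) for a in VALUES]
```

Calling `rows(ext_imp)` lists the table of external implication row by row, which makes the coincidence of the f and u rows and columns easy to see.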
A logic of nonsense

Bochvar's idea has been taken up by several authors, who aimed at constructing other systems appropriate for dealing with vagueness or nonsense, the latter sometimes called nonsense-logic. Thus, in his monograph "The logic of nonsense", Halldén [1949] rediscovers Bochvar's logic for these purposes. Halldén adopts three logical values: falsity (f), truth (t) and "meaningless" (u). As the policy accepted for compound propositions, and thus for the connectives of negation and conjunction, is just like Bochvar's, the truth tables of these connectives are exactly the same as in Bochvar's internal logic.7 The system, however, differs from the latter. First, it has a new one-argument connective + serving to express the meaningfulness of propositions. Thus if α is meaningless, then +α is false. Otherwise, +α is true. Second, Halldén takes two logical values, u and t, as distinguished. Therefore, a formula is valid if it never takes f. In consequence, the set of valid formulas not containing + coincides with the set of tautologies of CPC. The construction, however, differs from classical logic in its inference properties. The logic of nonsense heavily restricts the rules of inference, among them the rule of Modus Ponens: in general, q does not follow from the two premises p → q and p. To see this, it suffices to consider a valuation for which p is meaningless and q is false. Under such a valuation q is not designated, while the premises, being meaningless, are both designated.

Halldén provides a readable axiomatization of his logic. To this end, he introduces the connectives of implication (→) and equivalence (≡), accepting the standard classical definitions, and two standard inference rules, MP and SUB:

H1. (¬p → p) → p
H2. p → (¬p → q)
H3. (p → q) → ((q → r) → (p → r))
H4. +p ≡ +¬p
H5. +(p ∧ q) ≡ +p ∧ +q
H6. p → +p

In this framework, it is also easy to define a connective dual to +, putting −α =df ¬+α. Thus, as +α corresponds to "α is meaningful", −α stands for "α is meaningless".
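The failure of Modus Ponens can be confirmed by brute force. This is a minimal sketch, assuming the weak (Bochvar internal) tables for ¬ and ∧, the classical definition of α → β as ¬(α ∧ ¬β), and {u, t} as the set of designated values. The helper names are hypothetical.

```python
from itertools import product

F, U, T = "f", "u", "t"
VALUES = (F, U, T)
DESIGNATED = {U, T}

def neg(a):
    return {F: T, U: U, T: F}[a]

def conj(a, b):
    # any meaningless component makes the whole conjunction meaningless
    if U in (a, b):
        return U
    return T if (a == T and b == T) else F

def imp(a, b):
    # assumed classical definition: a -> b := not (a and not b)
    return neg(conj(a, neg(b)))

def plus(a):
    """Meaningfulness connective: false for u, true otherwise."""
    return F if a == U else T

def mp_counterexamples():
    """Valuations (p, q) with p -> q and p designated but q not designated."""
    return [(p, q) for p, q in product(VALUES, repeat=2)
            if imp(p, q) in DESIGNATED and p in DESIGNATED
            and q not in DESIGNATED]

def is_valid(formula, n_vars):
    """A formula is valid if it never takes the value f."""
    return all(formula(*v) != F for v in product(VALUES, repeat=n_vars))
```

The only counterexample found is the valuation described in the text, with p meaningless and q false, while an axiom such as H6 remains valid in this sense.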
Further elaboration of Halldén's approach was made by Åqvist [1962] and Segerberg [1965]. Starting from the problems arising with normative sentences, Åqvist created a calculus which may be considered a minor variant of Łukasiewicz's three-valued logic, or a fragment of Kleene's strong logic. The three primitives are: negation (¬), disjunction (∨) and a special connective #. Their tables use the three values f, u, t (in our notation). The intended meaning of f and t is standard and the only designated value is t. The tables of negation and disjunction are, modulo notation, the same as the truth tables of the Łukasiewicz three-valued connectives. The connective # is defined by #(f) = #(u) = f and #(t) = t, and coincides with the Łukasiewicz "necessity" operator L. In view of the philosophical application of his formal approach, Åqvist defines three "characteristic" functors of the system:

Fα =df #¬α, Lα =df #α ∨ Fα, Mα =df ¬Lα,

whose readings are: "α is false" (Fα), "α is meaningful" (Lα) and "α is meaningless" (Mα).

Two attempts to generalize Bochvar's approach to the n-valued case (n > 3) are worth mentioning. The first is due to Rescher [1975], who, rather hastily, transplanted the idea to the finite and the infinite case. The second, by Finn and Grigolia [1980], stemmed from an algebraic description of Bochvar's three-valued logic. Finn and Grigolia employed the algebraic counterparts of Rosser and Turquette's j-operators as "graded" assertions.

7 Coincidence with Bochvar is striking. However, Halldén's work is independent and original; compare e.g. Williamson [1994].

4 LOGIC ALGEBRAS, MATRICES AND STRUCTURALITY
The methodology of propositional calculi and the algebraic approach to classical and non-classical logics are highly efficient tools for the logical investigation of several problems concerning many-valuedness; cf. [Wójcicki, 1988; Rasiowa, 1974] and the more recent books by Czelakowski [2001] and Dunn and Hardegree [2001]. Our short presentation is limited to the concepts used further in the Chapter.
Logic algebras

A propositional language is viewed as an algebra of formulae L = (For, F1, . . . , Fm), freely generated by the set of propositional variables Var = {p, q, r, . . .}, with the finitary operations F1, . . . , Fm on For playing the role of connectives. Accordingly, any interpretation structure A for L is an algebra A = (A, f1, . . . , fm) similar to it.8 Furthermore, the assumed free generation of the language implies that any mapping s : Var → A extends uniquely to a homomorphism hs : L → A, hs ∈ Hom(L, A).

8 See [Suszko, 1957].
The most frequently used is the standard language of classical logic Lk = (For, ¬, →, ∨, ∧, ≡), with the connectives of negation, implication, disjunction, conjunction and equivalence. In turn, the two-element algebra of classical logic has the form A2 = ({0, 1}, ¬, →, ∨, ∧, ≡); here the same symbols ¬, →, ∨, ∧, ≡ as for the connectives denote the corresponding operations on the set {0, 1} of the "two values" 0 and 1, representing falsity and truth, respectively.

A logic algebra is functionally complete when all finitary operations on its universe are definable by means of its original operations. That functional completeness is a property of classical logic was proved by Post [1921]. In the terminology just adopted we may equivalently say that the algebra A2 is functionally complete.

Where n ≥ 2 is a given natural number, let us put En = {1, 2, . . . , n} and denote by Un any algebra of the form Un = (En, f1, . . . , fm), with f1, . . . , fm being finitary operations on En. Un will be called functionally complete if every finitary mapping f : En^k → En (k ≥ 0, k finite)9 can be represented as a composition of the operations f1, . . . , fm. Post [1921] reduced the complexity of the problem to a small number of connectives. If we require that, for some finite m, any k-argument operation on En with k ≤ m is definable, then Un is functionally complete for m variables. That property warrants the definability of all connectives with at most m arguments.

[Post (1921)]. If Un is functionally complete for m variables, where m ≥ 2, then it is also functionally complete for m + 1 variables and hence also functionally complete.

Note that the last theorem reduces the functional completeness of A2 to the definability of all 4 unary and 16 binary connectives. In turn, it is easy to show that the connectives of the standard language define all twenty.
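The count of twenty connectives can be reproduced by a small closure computation. This is only a sketch: it represents each k-argument operation on {0, 1} by its tuple of outputs, starts from the projections, and closes the set under pointwise negation and disjunction alone (two of the standard connectives). The function name is hypothetical.

```python
from itertools import product

def generated_functions(arity):
    """Truth functions of the given arity definable from negation and disjunction."""
    inputs = list(product((0, 1), repeat=arity))
    # projections: the functions returning the i-th argument
    found = {tuple(args[i] for args in inputs) for i in range(arity)}
    while True:
        new = {tuple(1 - v for v in f) for f in found}
        new |= {tuple(max(a, b) for a, b in zip(f, g))
                for f in found for g in found}
        if new <= found:          # nothing new: the set is closed
            return found
        found |= new
```

With this representation, `len(generated_functions(1))` and `len(generated_functions(2))` give the numbers of definable unary and binary operations.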
Post himself provided several other small collections that do the same, and there is also a known "minimalist" reduction of the classical connectives to a single one, the so-called Sheffer stroke. Obtaining the functional completeness of n-element algebras was another motivation for building many-valued logic. Post was the first to give such an algebra, with two generating functions: the one-argument cyclic rotation (negation) and the two-argument maximum function (disjunction). In the present notation they look as follows:

$$
\neg_n x =
\begin{cases}
1 & \text{if } x = n,\\
i + 1 & \text{if } x = i \ne n,
\end{cases}
\qquad
x \lor y = \max(x, y).
$$

9 The 0-ary operations are constants, i.e. elements of En.
Every Post algebra Pn = (En, ¬n, ∨) is functionally complete, Post (1921). Obviously, P2 = (E2, ¬2, ∨) is the (¬, ∨)-reduct of the algebra A2. The functional completeness of n-valued logic algebras is a matter of consequence, since the propositional logics founded on such algebras are logics of all possible extensional n-valued connectives (truth-functional when n = 2) and, for every n, they are, in a sense, unique. Since functional completeness is a rather rare property, several criteria have been formulated which help to determine whether it is present.

[Słupecki (1939a)]. An n-valued algebra Un (n ≥ 0, n finite) is functionally complete if and only if in Un there are definable:

(i) all one-argument operations on En,
(ii) at least one two-argument operation f(x, y) whose range consists of all values i for 1 ≤ i ≤ n.

[Picard (1935)]. Un is functionally complete whenever the functions H, R, S in Section 5 are definable in it.

Using Słupecki's criterion we may easily establish the functional incompleteness of all the three-valued logics considered so far, except the Post logic. For the Łukasiewicz three-valued logic Ł3 it suffices to remark that the one-argument constant function T, with Tx = 1/2 for any x ∈ {0, 1/2, 1}, is not definable in terms of the basic connectives. To check this, consider any compound function of one argument and assume that x ∈ {0, 1}; due to the tables of the primitive connectives, the output value cannot equal 1/2. On the other hand, the same criterion implies that adding T to the stock of functions of Ł3 leads to a functionally complete logic algebra, Słupecki [1936].

In the sequel we shall also deal with some known infinite logical constructions. Anticipating possible questions, we note that infinite logic algebras are in principle functionally incomplete. This is due to the fact that the set of possible functions of any algebra of this kind is uncountably infinite, while using a finite number of original operations one may define at most a countable family of functions.
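The argument about the constant function T can be illustrated by computing all one-argument functions definable in Ł3. The sketch below takes ¬x = 1 − x and x → y = min(1, 1 − x + y) as the primitive operations (the remaining connectives are definable from them), represents a unary function by its outputs on (0, 1/2, 1), and optionally adds T. The names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
VALUES = (Fraction(0), HALF, Fraction(1))

def neg(x):
    return 1 - x

def imp(x, y):
    return min(Fraction(1), 1 - x + y)

def unary_closure(with_T=False):
    """All unary functions definable from x by neg and imp (and T if requested).

    A function is stored as the tuple of its values at 0, 1/2 and 1.
    """
    found = {VALUES}                       # the identity function
    while True:
        new = {tuple(neg(v) for v in f) for f in found}
        new |= {tuple(imp(a, b) for a, b in zip(f, g))
                for f in found for g in found}
        if with_T:
            new.add((HALF, HALF, HALF))    # the constant function T
        if new <= found:
            return found
        found |= new
```

Without T, every function in the closure maps 0 and 1 into {0, 1}, so the constant 1/2 never appears; with T, all 27 unary operations on the three values are obtained, which is condition (i) of Słupecki's criterion.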
Matrices

The classical semantics of truth-valuations distinguishes 1, i.e. the logical value of truth, which corresponds to a specified kind of propositions. In a more general framework, interpretation structures equipped with a distinguished subset of elements corresponding to propositions of a specified kind are necessary. These are logical matrices. A pair
M = (A, D), with A being an algebra similar to a language L and D ⊆ A a subset of A, will thus be referred to as a matrix for L. Elements of D will be called designated (or distinguished) elements of M. The set of formulae which take designated values only,

$$E(M) = \{\alpha \in For : h\alpha \in D \text{ for any } h \in Hom(L, A)\},$$

is called the content of M. The relation |=M ⊆ 2^For × For is said to be a matrix consequence of M provided that for any X ⊆ For, α ∈ For,

$$X \models_M \alpha \quad\text{if and only if}\quad \text{for every } h \in Hom(L, A)\ (h\alpha \in D \text{ whenever } hX \subseteq D).$$

The content of a matrix is a counterpart of the set of tautologies, and E(M) = {α : ∅ |=M α}. The entailment |=M is a natural generalization of the classical consequence. In the terminology just adopted, the classical matrix has the form M2 = ({0, 1}, ¬, →, ∨, ∧, ≡, {1}), and the classical consequence relation is characterized as follows:

$$X \models_2 \alpha \quad\text{if and only if}\quad \text{for every } h \in Hom(L, A_2)\ (h\alpha = 1 \text{ if } hX \subseteq \{1\}).$$

Notice that the set of tautologies is the content of M2 and that it consists of the formulas which are "consequences" of the empty set, TAUT = E(M2) = {α : ∅ |=2 α}. The so-called deduction theorem for classical logic, expressed in terms of |=2, now says that for any set of formulas X and α, β ∈ For,

$$X, \alpha \models_2 \beta \quad\text{if and only if}\quad X \models_2 \alpha \to \beta. \tag{ded$_2$}$$

To see how the framework of matrices works for the three-valued logic of Łukasiewicz, let us consider the matrix of Ł3: M3 = ({0, 1/2, 1}, ¬, →, ∨, ∧, ≡, {1}), with the connectives set by the tables in Section 2. The following deduction theorem

$$X, \alpha \models_3 \beta \quad\text{if and only if}\quad X \models_3 \alpha \to (\alpha \to \beta) \tag{ded$_3$}$$

expresses the mutual relation between the consequence and the implication. The left-to-right direction is essential. To see why the antecedent appears twice, it suffices to consider a valuation h sending all formulas from X into {1} and such that hα = 1/2, hβ = 0. Accordingly, the classical counterpart of (ded3),

$$X, \alpha \models_3 \beta \quad\text{if and only if}\quad X \models_3 \alpha \to \beta, \tag{ded$_2$}$$

fails for exactly the same reasons.
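A concrete instance of this failure can be checked by enumerating valuations. The sketch below assumes the Łukasiewicz operations ¬x = 1 − x and x → y = min(1, 1 − x + y) on {0, 1/2, 1} with 1 designated. The pair α = p, β = ¬(p → ¬p) is a hypothetical example chosen for this illustration, not one taken from the text.

```python
from fractions import Fraction
from itertools import product

VALUES = (Fraction(0), Fraction(1, 2), Fraction(1))

def neg(x):
    return 1 - x

def imp(x, y):
    return min(Fraction(1), 1 - x + y)

def entails(premises, conclusion, n_vars):
    """X |=3 a: every valuation sending all premises to 1 sends a to 1."""
    for vals in product(VALUES, repeat=n_vars):
        if all(f(*vals) == 1 for f in premises) and conclusion(*vals) != 1:
            return False
    return True

def alpha(p):
    return p

def beta(p):
    # hypothetical example formula: not (p -> not p)
    return neg(imp(p, neg(p)))

def single(p):
    return imp(alpha(p), beta(p))               # alpha -> beta

def doubled(p):
    return imp(alpha(p), imp(alpha(p), beta(p)))  # alpha -> (alpha -> beta)
```

Here α entails β, the doubled implication is a consequence of the empty set, and the single implication is not: at p = 1/2 we get β = 0 and α → β = 1/2, exactly the kind of valuation mentioned above.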
Structural consequence and logics

With every |=M there may be uniquely associated an operation CnM : 2^For → 2^For such that α ∈ CnM(X) if and only if X |=M α. Where K is a class (a set) of matrices for a given language L, the relation |=K is to be identified with the set-theoretical meet of {|=M : M ∈ K}. Consequently, CnK = ⋂{CnM : M ∈ K}, i.e. for any X ⊆ For,

$$Cn_K(X) = \bigcap \{Cn_M(X) : M \in K\}.$$
CnM and CnK are special examples of the consequence operations considered in the general theory of deductive systems originated by Tarski (1936). A mapping C : 2^For → 2^For will be referred to as a consequence operation of the language L if and only if for any X, Y ⊆ For

(T0) X ⊆ C(X)
(T1) C(C(X)) = C(X)
(T2) C(X) ⊆ C(Y) whenever X ⊆ Y.

If, moreover, for any substitution e ∈ End(L)

(S) eC(X) ⊆ C(eX),

we shall say that C is structural. It is easy to prove that each matrix consequence operation CnM is structural. Conversely, each structural consequence C of L and any set of formulas X together determine a matrix LX = (L, C(X)), called a Lindenbaum matrix. The class of all Lindenbaum matrices of a given consequence C of L, LC = {(L, C(X)) : X ⊆ For}, will be referred to as the Lindenbaum bundle. Since the substitutions (i.e. endomorphisms) of the language L take the role of valuations, one may easily show that any structural consequence operation C is uniquely determined by its Lindenbaum bundle, C = CnLC, and ultimately that

[Wójcicki (1970)] For every structural consequence operation C there is a class K of matrices such that C = CnK.

An arbitrary consequence C may be conceived as a rule composed of all pairs (X, α) where α ∈ C(X). Rules of the form (∅, α) are called axiomatic and their consequents axioms. A rule R is structural if (X, α) ∈ R implies (eX, eα) ∈ R for any substitution e ∈ End(L). Structural rules may be generated by some "generic"
pairs (X, α), one in particular. Modus Ponens (the Detachment Rule), MP, is a structural rule determined by the single pair {p → q, p}/q and is represented by the following schema: ϕ → ψ, ϕ / ψ.

Let X be a set of formulas and R a set of rules of inference. Then X is R-closed iff for every α ∈ For and Y ⊆ X, (Y, α) ∈ R implies α ∈ X. The operation CnR defined for every X ⊆ For by

$$Cn_R(X) = \bigcap \{Y \subseteq For : X \subseteq Y,\ Y \text{ being } R\text{-closed}\}$$

proves to be a consequence operation (structural if R is structural). Every R such that C = CnR is referred to as a base of C. In the case when R splits into the set of axiomatic rules RA and the set of non-axiomatic rules RI, i.e. R = RA ∪ RI, we may, putting A = {α : (∅, α) ∈ RA}, represent CnR as Cn(A, RI), where

$$Cn_{(A, R_I)}(X) = \bigcap \{Y : A \cup X \subseteq Y \text{ and } Y \text{ is } R_I\text{-closed}\}.$$
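The least closed set in this definition can be computed by iteration when the only non-axiomatic rule is Modus Ponens. The sketch below is an illustration under simplifying assumptions: formulas are strings (variables) or tuples ("->", a, b), the substitution rule is left out, and the names `imp`, `modus_ponens_step` and `cn` are hypothetical. Since every conclusion of MP is a subformula of a formula already present, the iteration terminates on finite inputs.

```python
def imp(a, b):
    """Build the implication a -> b as a tuple."""
    return ("->", a, b)

def modus_ponens_step(formulas):
    """All conclusions obtainable from `formulas` by one application of MP."""
    return {f[2] for f in formulas
            if isinstance(f, tuple) and f[0] == "->" and f[1] in formulas}

def cn(axioms, premises):
    """Least MP-closed set containing the axioms and the premises."""
    closed = set(axioms) | set(premises)
    while True:
        new = modus_ponens_step(closed) - closed
        if not new:               # fixed point reached: the set is MP-closed
            return closed
        closed |= new
```

On small examples one can confirm the conditions (T0)–(T2): the result contains its argument, applying `cn` twice changes nothing, and enlarging the premises never shrinks the result.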
A standard way of formalizing propositional logics (L, C) with implication consists in defining C as Cn(A, MP) or Cn(A, {SUB, MP}), where SUB is the substitution rule, SUB = {ϕ/eϕ : e ∈ End(L)}. In standard cases the implication is supposed to satisfy the Deduction Theorem:

$$\beta \in C(X, \alpha) \quad\text{if and only if}\quad \alpha \to \beta \in C(X). \tag{Ded}$$
The theory of logical matrices is a theory of models of propositional calculi and, from a certain point of view, may be treated as a fragment of the theory of algebraic systems, see Czelakowski [2001].

5 POST LOGICS
As an outcome of the research on the classical logic Post [1920; 1921] construed a family of finite-valued propositional systems. The inspirations comprised Principia Mathematica of Whitehead and Russell (1910), the method of truth tables and Post’s own results concerning functional completeness of the classical logic.
Post n-element algebras

Following Principia Mathematica, Post takes negation (¬) and disjunction (∨) as primitives. For any natural n ≥ 2, a linearly ordered set Pn = {t1, t2, . . . , tn}, with ti < tj iff i < j, is the set of values. Finally, the operations corresponding to the connectives are the unary rotation (or cyclic negation) ¬ and the binary disjunction ∨, defined by

$$
\neg t_i =
\begin{cases}
t_{i+1} & \text{if } i \ne n,\\
t_1 & \text{if } i = n,
\end{cases}
\qquad
t_i \lor t_j = t_{\max\{i, j\}}.
$$

For a given n ≥ 2, these equations define n-element tables of negation and disjunction. Thus, e.g., for n = 5 the tables are the following:

$$
\begin{array}{c|c}
x & \neg x\\ \hline
t_1 & t_2\\ t_2 & t_3\\ t_3 & t_4\\ t_4 & t_5\\ t_5 & t_1
\end{array}
\qquad
\begin{array}{c|ccccc}
\lor & t_1 & t_2 & t_3 & t_4 & t_5\\ \hline
t_1 & t_1 & t_2 & t_3 & t_4 & t_5\\
t_2 & t_2 & t_2 & t_3 & t_4 & t_5\\
t_3 & t_3 & t_3 & t_3 & t_4 & t_5\\
t_4 & t_4 & t_4 & t_4 & t_4 & t_5\\
t_5 & t_5 & t_5 & t_5 & t_5 & t_5
\end{array}
$$
It is easy to see that for n = 2 Post logic coincides with the negation-disjunction version of classical logic: when P2 = {t1, t2} is identified with {0, 1}, the Post negation and disjunction are isomorphic variants of the classical connectives.10 The relation to CPC breaks down for n > 2. In all these cases the table of negation is not compatible with its classical counterpart. To see this, remark that, due to the disjunction, t1 always corresponds to 0 and tn to 1. And even though ¬tn = t1, ¬t1 equals t2 and not tn. Accordingly, the n-valued Post algebra Pn = ({t1, t2, . . . , tn}, ¬, ∨) either coincides with the negation-disjunction algebra of CPC (n = 2), or the latter algebra is not a subalgebra of it (n > 2).

Post considers the "biggest" value tn as distinguished. Accordingly, among the laws of Post's n-valued logics (n > 2) there appear counterparts of some significant tautologies of classical logic, such as the n-valued law of the excluded middle:

$$p \lor \neg p \lor \neg\neg p \lor \dots \lor \underbrace{\neg\neg\dots\neg}_{(n-1)\ \text{times}}\, p.$$

In contrast, applying the classical definitional patterns for the other standard connectives, such as conjunction, implication and equivalence, leads to very strange results. Thus, e.g., the definition of conjunction using the de Morgan law, α ∧ β = ¬(¬α ∨ ¬β), results in a non-associative connective ∧. The source of these unexpected properties is, manifestly, the rotational character of Post negation.

10 It is worth recalling that this set of connectives permits the definition of all other classical connectives and thus warrants the functional completeness of the underlying algebra and logic.
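Both observations, the validity of the n-valued excluded middle and the non-associativity of the de Morgan conjunction, can be verified on small cases. This is a minimal sketch in which the value t_i is represented by its index i; the function names are hypothetical.

```python
from itertools import product

def post_neg(i, n):
    """Cyclic negation on indices 1..n: t_i goes to t_{i+1}, t_n to t_1."""
    return i + 1 if i != n else 1

def post_or(i, j):
    return max(i, j)

def excluded_middle_n(i, n):
    """Value of p or not-p or not-not-p ... (up to n-1 negations) at p = t_i."""
    value = current = i
    for _ in range(n - 1):
        current = post_neg(current, n)
        value = post_or(value, current)
    return value

def de_morgan_and(i, j, n):
    """Conjunction defined as not(not a or not b) with Post negation."""
    return post_neg(post_or(post_neg(i, n), post_neg(j, n)), n)

def associativity_failures(n):
    """Triples (a, b, c) on which the de Morgan conjunction is not associative."""
    vals = range(1, n + 1)
    return [(a, b, c) for a, b, c in product(vals, repeat=3)
            if de_morgan_and(de_morgan_and(a, b, n), c, n)
            != de_morgan_and(a, de_morgan_and(b, c, n), n)]
```

For n = 2 the list of failures is empty, as expected for the classical case, while for n = 3 it already contains triples such as (1, 1, 2).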
The most important property of Post algebras is their functional completeness: by means of the two primitive functions, every finite-argument function on Pn can be defined. In particular, the constant functions, and hence the "logical values" t1, t2, . . . , tn, are then also definable. Obtaining functional completeness was one of the prime aims of the author.11
Semantic interpretation

The construction, apparently algebraic, was eventually provided with an interesting semantic interpretation. Post suggests seeing the elements of Pn as objects corresponding to (n − 1)-tuples P = (p1, p2, . . . , pn−1) of ordinary two-valued propositions p1, p2, . . . , pn−1, subject to the condition that the true propositions are listed before the false ones. Next,

(¬) ¬P is formed by replacing the first false element by its denial, but if there is no false element in P, then all are to be denied, in which case ¬P is a sequence of false propositions.

(∨) When P = (p1, p2, . . . , pn−1) and Q = (q1, q2, . . . , qn−1), then P ∨ Q = (p1 ∨ q1, p2 ∨ q2, . . . , pn−1 ∨ qn−1).

The mapping i : E^{n−1} → Pn of the set of tuples E^{n−1} onto Pn,

i(P) = ti iff P contains exactly (i − 1) true propositions,

establishes an isomorphism between (E^{n−1}, ∨, ¬) and the Post algebra Pn. The exemplary universe E^4 corresponding to the case of the five-valued Post logic considered before consists of the following 4-tuples:

(0, 0, 0, 0) t1
(1, 0, 0, 0) t2
(1, 1, 0, 0) t3
(1, 1, 1, 0) t4
(1, 1, 1, 1) t5.
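The correspondence between tuples and Post values can be checked directly. The following sketch encodes a true proposition as 1 and a false one as 0 and represents t_i by its index i; the function names are hypothetical.

```python
def tuples(n):
    """All (n-1)-tuples of 0/1 with the true entries listed before the false ones."""
    return [tuple([1] * k + [0] * (n - 1 - k)) for k in range(n)]

def index(P):
    """The mapping i: a tuple with exactly i-1 true entries corresponds to t_i."""
    return sum(P) + 1

def neg_tuple(P):
    if 0 not in P:
        return tuple(0 for _ in P)        # no false element: deny all of them
    k = P.index(0)
    return P[:k] + (1,) + P[k + 1:]       # deny the first false element

def or_tuple(P, Q):
    """Componentwise disjunction of two tuples."""
    return tuple(max(p, q) for p, q in zip(P, Q))
```

Applying `index` before or after the tuple operations gives the same result as the cyclic negation and the maximum on indices, which is what the isomorphism asserts.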
This interpretation of the logical values and their algebra shows, among other things, that the values in different Post logics should be understood differently. Post [1921] also defined a family of purely implicative n-valued logics. The family is fairly extensive and it covers implications designed by other authors, e.g. Łukasiewicz and Gödel. The novelty of this proposal was that Post designated many logical values at a time. That possibility, which seems quite natural, was ignored by other originators of many-valued logics.

11 Compare [Post, 1921] in Section 4.
The problem of axiomatization

The original (¬, ∨) systems of Post's logic have not been axiomatized so far. However, the question of their axiomatizability has for years been a settled matter, since Słupecki [1939b] constructed the largest possible class of functionally complete finite logics and gave a general method of their axiomatization. From this it evidently follows that Post logics are also axiomatizable, although the problem of providing axioms for their original version still remains open.

The Słupecki matrix Snk (n being a given natural number, 1 ≤ k ≤ n) is of the form

$$S_{nk} = (\{1, 2, \dots, n\}, \to, R, S, \{1, 2, \dots, k\}),$$

where → is a binary operation (implication) and R, S are unary operations, defined in the following way:

$$
x \to y =
\begin{cases}
y & \text{if } 1 \le x \le k,\\
1 & \text{if } k < x \le n,
\end{cases}
\qquad
R(x) =
\begin{cases}
x + 1 & \text{if } 1 \le x \le n - 1,\\
1 & \text{if } x = n,
\end{cases}
\qquad
S(x) =
\begin{cases}
2 & \text{if } x = 1,\\
1 & \text{if } x = 2,\\
x & \text{if } 3 \le x \le n.
\end{cases}
$$

Functional completeness12 of each of these matrices results from Picard's criterion: R and S are two of Picard's functions; in order to define the third, it suffices to put

$$
Hx = (x \to R(x \to x)) \to Sx \ \text{ for } k = 1, \quad\text{then}\quad
Hx =
\begin{cases}
1 & \text{if } x = 2,\\
x & \text{if } x \ne 2,
\end{cases}
$$

$$
Hx = R(x \to x) \to x \ \text{ for } k > 1, \quad\text{then}\quad
Hx =
\begin{cases}
1 & \text{if } x = k,\\
x & \text{if } x \ne k.
\end{cases}
$$
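The two definitions of H can be evaluated directly from the tables of →, R and S. The sketch below is only a check on a sample matrix: it assumes n = 5, and for the second formula it assumes 1 < k < n; the factory function `slupecki_ops` is a hypothetical name.

```python
def slupecki_ops(n, k):
    """Operations of the Slupecki matrix on {1, ..., n} with designated {1, ..., k}."""
    def imp(x, y):
        return y if x <= k else 1

    def R(x):
        return x + 1 if x <= n - 1 else 1

    def S(x):
        return {1: 2, 2: 1}.get(x, x)     # swaps 1 and 2, fixes the rest

    def H(x):
        if k == 1:
            return imp(imp(x, R(imp(x, x))), S(x))
        return imp(R(imp(x, x)), x)

    return imp, R, S, H

def h_table(n, k):
    """Values H(1), ..., H(n) computed from the defining formula."""
    H = slupecki_ops(n, k)[3]
    return [H(x) for x in range(1, n + 1)]
```

For example, `h_table(5, 1)` sends 2 to 1 and fixes every other value, and `h_table(5, 3)` sends 3 to 1 and fixes every other value, in agreement with the two case descriptions.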
Słupecki produced an effective proof of the axiomatizability of E(Snk) (for any pair (n, k) as above), giving a long list of axioms formulated in terms of implication and special one-argument connectives defined through superpositions of R, S and H. The chief line of approach here is to capitalize on the standard character of implication,13 which can be classically axiomatized using MP (the Detachment Rule). Słupecki extends MP to the whole language, taking Łukasiewicz's formula

((p → q) → r) → ((r → p) → (s → p))

as the only axiom for implication, and provides an inductive, combinatorial completeness proof.

12 A finite matrix is functionally complete when its algebra has that property.
13 Compare Section 7.
Algebraic interpretations

The history of algebras corresponding to Post logics was quite different, and it even established the current notion of n-valued Post logic. First, Rosenbloom [1942] defined a Post algebra of order n (n ≥ 2) using the Post rotation (¬), disjunction (∨) and some other auxiliary functions. Then the concept underwent several modifications, see Dwinger [1977]. The most important was the lattice-theoretical characterization by Epstein [1960]. Epstein presented a Post algebra of order n as a distributive "chain based" lattice with involution and Boolean-valued endomorphisms. The concept thus presented is very close to Moisil's n-valued Łukasiewicz algebra; the main difference is that Post algebras are additionally equipped with the set of constants.

The history took a turn when Rousseau (1969) observed that any Post algebra of order n is a pseudo-Boolean, or Heyting, algebra (see [Rasiowa, 1974]). Consequently, Rousseau proposed a definition of Post algebra using a new binary operation of relative pseudo-complement →, a counterpart of an implication. Thus, a Post algebra of order n (n ≥ 2) is a structure

(P, 1, →, ∪, ∩, ¬, D1, D2, ..., Dn−1, e0, e1, ..., en−1)

where 1, e0, e1, ..., en−1 are zero-argument operations (constants), ¬, D1, D2, ..., Dn−1 are one-argument operations and →, ∪, ∩ are two-argument operations, such that

(p0) (P, 1, →, ∪, ∩, ¬) is a pseudo-Boolean algebra, see Rasiowa [1974],

and for x, y ∈ P the following equations hold:

(p1) Dk(x ∪ y) = Dk(x) ∪ Dk(y)
(p2) Dk(x ∩ y) = Dk(x) ∩ Dk(y)
(p3) Dk(x → y) = (D1(x) → D1(y)) ∩ (D2(x) → D2(y)) ∩ ... ∩ (Dk(x) → Dk(y))
(p4) Dk(¬x) = ¬D1(x)
(p5) Dk(Dj(x)) = Dj(x)
(p6) Dk(ej) = 1 if k ≤ j and Dk(ej) = ¬1 if k > j
(p7) x = (D1(x) ∩ e1) ∪ (D2(x) ∩ e2) ∪ ... ∪ (Dn−1(x) ∩ en−1)
(p8) D1(x) ∪ ¬D1(x) = 1.
Among them, (p7) is of special importance since it assures a uniform monotonic representation of the elements of a Post algebra of order n in terms of constants and Boolean elements. This, in other terms, means that the algebra is based on the chain e0, e1, ..., en−1 and that every Post algebra is determined by its Boolean part. The property apparently reflects Post's original interpretation of his n logical values. Intense investigations of Łukasiewicz and Post algebras were motivated by their actual and expected applications, see Section 16. It is worth mentioning that the redefinition of Post algebras in terms of pseudo-Boolean chain based lattices led to a new definition of n-valued Post logics and to their generalization to infinite cases.
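As a sanity check, the equations can be tested on the simplest model, the n-element chain e0 < e1 < ... < en−1 itself. The sketch below rests on several assumptions made only for this illustration: e_j is represented by its index j, the constant 1 is identified with en−1, the relative pseudo-complement on a chain is x → y = 1 if x ≤ y and y otherwise, ¬x = x → e0, and D_k is defined as prescribed by (p6).

```python
def post_chain_axioms_hold(n):
    """Check (p1)-(p5), (p7) and (p8) on the n-element chain e_0 < ... < e_{n-1}."""
    elems = range(n)              # e_j is represented by its index j
    top, bottom = n - 1, 0        # the constant 1 is e_{n-1}; bottom is e_0

    def imp(x, y):                # relative pseudo-complement on a chain
        return top if x <= y else y

    def neg(x):                   # pseudo-complement
        return imp(x, bottom)

    def D(k, x):                  # D_k as prescribed by (p6)
        return top if k <= x else bottom

    ks = range(1, n)
    checks = []
    for k in ks:
        for x in elems:
            checks.append(D(k, neg(x)) == neg(D(1, x)))                   # (p4)
            checks.extend(D(k, D(j, x)) == D(j, x) for j in ks)           # (p5)
            for y in elems:
                checks.append(D(k, max(x, y)) == max(D(k, x), D(k, y)))   # (p1)
                checks.append(D(k, min(x, y)) == min(D(k, x), D(k, y)))   # (p2)
                rhs = min(imp(D(i, x), D(i, y)) for i in range(1, k + 1))
                checks.append(D(k, imp(x, y)) == rhs)                     # (p3)
    for x in elems:
        checks.append(x == max(min(D(k, x), k) for k in ks))              # (p7)
        checks.append(max(D(1, x), neg(D(1, x))) == top)                  # (p8)
    return all(checks)
```

In this model (p7) reads off an element e_j as the join of e_1, ..., e_j, which is the monotonic representation mentioned above.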
6 ŁUKASIEWICZ LOGICS

In 1922 Łukasiewicz generalized his three-valued logic and defined the family of many-valued logics, both finite and infinite-valued.14 The Łukasiewicz n-valued matrix has the form Mn = (Ln, ¬, →, ∨, ∧, ≡, {1}), where

$$
L_n =
\begin{cases}
\{0, 1/(n-1), 2/(n-1), \dots, 1\} & \text{if } n \in N,\ n \ge 2,\\
\{s/w : 0 \le s \le w,\ s, w \in N \text{ and } w \ne 0\} & \text{if } n = \aleph_0,\\
[0, 1] & \text{if } n = \aleph_1,
\end{cases}
$$

and the functions are defined on Ln as follows:

(i) ¬x = 1 − x, x → y = min(1, 1 − x + y)
(ii) x ∨ y = (x → y) → y = max(x, y)
x ∧ y = ¬(¬x ∨ ¬y) = min(x, y)
x ≡ y = (x → y) ∧ (y → x) = 1 − |x − y|.
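The identities in (ii) can be confirmed for small finite n with exact rational arithmetic. This is a minimal sketch; the helper `L(n)` building the value set and the other function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def L(n):
    """The value set {0, 1/(n-1), ..., 1} of the n-valued matrix."""
    return [Fraction(s, n - 1) for s in range(n)]

def neg(x):
    return 1 - x

def imp(x, y):
    return min(Fraction(1), 1 - x + y)

def disj(x, y):
    return imp(imp(x, y), y)

def conj(x, y):
    return neg(disj(neg(x), neg(y)))

def equiv(x, y):
    return conj(imp(x, y), imp(y, x))

def identities_hold(n):
    """Check that the defined connectives agree with max, min and 1 - |x - y|."""
    return all(disj(x, y) == max(x, y)
               and conj(x, y) == min(x, y)
               and equiv(x, y) == 1 - abs(x - y)
               for x, y in product(L(n), repeat=2))
```

For n = 3 the function `L` returns the values 0, 1/2 and 1, so the same code reproduces the tables of the three-valued logic.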
The introduction of the new many-valued logics was not supported by any separate argumentation. Łukasiewicz merely stressed that the generalization was correct, since for n = 3 one gets exactly the matrix of his 1920 three-valued logic. Subsequent history showed, however, that Łukasiewicz logics have nice properties which place them among the most important logical constructions.

First, the Łukasiewicz matrix M2 coincides with the matrix of classical logic. And, since the set {0, 1} is closed with respect to all Łukasiewicz connectives, A2 is a subalgebra of any algebra (Ln, ¬, →, ∨, ∧, ≡) and M2 is a submatrix of Mn. Therefore all tautologies of the Łukasiewicz propositional calculi are included in TAUT: E(Mn) ⊆ E(M2) = TAUT.

Next, the relations between the contents of finite matrices are established by the following Lindenbaum condition:15

For finite n, m ∈ N, E(Mn) ⊆ E(Mm) iff m − 1 is a divisor of n − 1.

The proof of the last property may be based on the "submatrix" properties of the family of finite Łukasiewicz matrices (see above). Using the same argument one may also prove the counterpart of Lindenbaum's condition for the matrix consequence relations |=n of Mn:

For finite n, m ∈ N, |=n ⊆ |=m iff m − 1 is a divisor of n − 1.

14 See [Łukasiewicz, 1970, p. 140].
15 See Łukasiewicz and Tarski [1930].
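The definitions of the connectives and the "submatrix" argument can be tried out on small cases. The following Python sketch is only an illustration: the function names (`values`, `imp`, `is_submatrix`, and so on) are chosen here and do not come from the text. It builds the finite carriers Ln with exact fractions, checks the stated identities for ∨, ∧ and ≡, and tests when Lm is contained in Ln.

```python
from fractions import Fraction

def values(n):
    """Carrier L_n = {0, 1/(n-1), ..., 1} of the n-valued matrix (n >= 2)."""
    return [Fraction(j, n - 1) for j in range(n)]

def neg(x):
    return 1 - x

def imp(x, y):
    return min(Fraction(1), 1 - x + y)

def disj(x, y):
    # Defined from implication, as in clause (ii).
    return imp(imp(x, y), y)

def conj(x, y):
    return neg(disj(neg(x), neg(y)))

def equiv(x, y):
    return conj(imp(x, y), imp(y, x))

def derived_identities_hold(n):
    """Check x∨y = max, x∧y = min and x≡y = 1 - |x - y| on all of L_n."""
    L = values(n)
    return all(
        disj(x, y) == max(x, y)
        and conj(x, y) == min(x, y)
        and equiv(x, y) == 1 - abs(x - y)
        for x in L for y in L
    )

def is_submatrix(m, n):
    """True when L_m is a subset of L_n; the operations then restrict to L_m."""
    return set(values(m)) <= set(values(n))
```

For instance, `is_submatrix(3, 5)` holds while `is_submatrix(3, 4)` does not, in line with the divisibility clause of Lindenbaum's condition. The sketch checks only the carrier inclusion, not the inclusion of the sets of tautologies itself.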
Many-valued Logic and its Philosophy
It may be proved that the infinite Łukasiewicz matrices have a common content, equal to the intersection of the contents of all finite matrices: E(Mℵ0) = E(Mℵ1) = ⋂{E(Mn) : n ≥ 2, n ∈ N}.
Łukasiewicz n-valued logics Łn are not functionally complete. Everything that was established for n = 3 applies to each finite n. First, no constant except 0 and 1 is definable in (Ln, ¬, →, ∨, ∧, ≡). Second, adding the constants to the stock of connectives makes this algebra functionally complete. And, since Mn is generated by a single element, either 1/(n−1) or (n−2)/(n−1), adding just one of them does the job as well. McNaughton [1951] proved an ingenious definability criterion, for both the finite and the infinite case, which shows the mathematical beauty of Łukasiewicz's logical constructions.
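The remarks on generators can be illustrated by computing closures under ¬ and →. This is a minimal sketch; the helper name `generated` is an assumption of this example. It confirms that {0, 1} is closed under the connectives, which is the core of the argument that no other constant is definable, and that a single value 1/(n−1) or (n−2)/(n−1) generates all of Ln.

```python
from fractions import Fraction

def neg(x):
    return 1 - x

def imp(x, y):
    return min(Fraction(1), 1 - x + y)

def generated(generators):
    """Smallest set containing the generators and closed under ¬ and →."""
    closed = set(generators)
    while True:
        new = {neg(x) for x in closed}
        new |= {imp(x, y) for x in closed for y in closed}
        if new <= closed:
            return closed
        closed |= new
```

The loop terminates because every value produced is a multiple of a fixed fraction inside [0, 1]. Only ¬ and → are used, since ∨, ∧ and ≡ are definable from them.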
Axiomatizability

A proof that finite matrices are axiomatizable was given in Łukasiewicz and Tarski [1930]. However, the problem of formulating a concrete axiom system for the finite Łukasiewicz logics with n > 3 remained open until 1952. Rosser and Turquette [1952] devised a general method of axiomatization of n-valued logics with connectives satisfying the so-called standard conditions. The method can be applied, among others, to Łn, since such connectives are either primitive or definable in the finite Łukasiewicz matrices. Hence, for every n an axiomatization of Łukasiewicz's n-valued propositional calculus can be obtained. The axiomatization, however, becomes very complicated owing to the high generality of the method given by Rosser and Turquette.

Łukasiewicz conjectured that his ℵ0-valued logic was axiomatizable (Łukasiewicz and Tarski [1930]) and that the following axiom set, together with MP and SUB, axiomatized the infinite-valued propositional calculus:

L1. p → (q → p)
L2. (p → q) → ((q → r) → (p → r))
L3. ((p → q) → q) → ((q → p) → p)
L4. (¬p → ¬q) → (q → p)
L5. ((p → q) → (q → p)) → (q → p).
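Soundness of L1–L5 in the finite matrices can be checked by brute force. The sketch below is illustrative only (the names `AXIOMS` and `is_tautology` are introduced for this example): it evaluates each axiom on every assignment of values from Ln and tests whether the result is always the designated value 1.

```python
from fractions import Fraction
from itertools import product

def neg(x):
    return 1 - x

def imp(x, y):
    return min(Fraction(1), 1 - x + y)

# Each axiom is written as a function of the values of p, q, r.
AXIOMS = {
    "L1": lambda p, q, r: imp(p, imp(q, p)),
    "L2": lambda p, q, r: imp(imp(p, q), imp(imp(q, r), imp(p, r))),
    "L3": lambda p, q, r: imp(imp(imp(p, q), q), imp(imp(q, p), p)),
    "L4": lambda p, q, r: imp(imp(neg(p), neg(q)), imp(q, p)),
    "L5": lambda p, q, r: imp(imp(imp(p, q), imp(q, p)), imp(q, p)),
}

def is_tautology(formula, n):
    """True when the formula takes value 1 under every valuation in L_n."""
    L = [Fraction(j, n - 1) for j in range(n)]
    return all(formula(p, q, r) == 1 for p, q, r in product(L, repeat=3))
```

By way of contrast, the classical contraction law (p → (p → q)) → (p → q) fails this test for n = 3, for example at p = 1/2, q = 0. Such a check covers finitely many finite matrices only and says nothing about completeness.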
According to Łukasiewicz,16 this hypothesis was confirmed by Wajsberg in 1931. Next came the reduction of the axiom set: Meredith [1958] and Chang [1958a] independently showed that axiom L5 is dependent on the others. There are two main accessible completeness proofs of L1–L4 (with MP and SUB): one based on syntactic methods and linear inequalities, by Rose and Rosser [1958], and a purely algebraic one, by Chang [1959]. Chang's proof is based on properties of MV algebras, the algebraic counterparts of the infinite-valued Łukasiewicz logic, defined and studied in Chang [1958b].17

16 Łukasiewicz [1970, p. 144]; no publication on the topic by Wajsberg exists.
17 MV algebras are presented in the sequel.
The key role in this approach is played by the additional binary connectives + and ·. The two connectives, which directly correspond to the main algebraic operations of MV algebras, are defined by α + β =df ¬α → β and α · β =df ¬(α → ¬β).

Several axiomatizations of the finite-valued Łukasiewicz logics (n > 3) were obtained by extending the axiom system L1–L4. Grigolia [1977] employs iterated use of the connectives + and ·. Let kα abbreviate the formula α + α + ... + α (k times) and α^k the formula α · α · ... · α (k times). Given a finite n > 3, Grigolia's axiom system for Łn consists of the schemes of L1–L4 and

Ln5. nα → (n − 1)α
Ln6. (n − 1)((¬α)^j + (α · (j − 1)α)),

where 1 < j < n − 1 and j does not divide n − 1.

Tokarz's [1974] extension of L1–L4 is based on the characteristic functions of the set Ln in [0, 1] and on the properties of the consequence relation of Mℵ0. The axiom set for a given n-valued Łukasiewicz logic, including n = 2 (i.e. CPC), results from L1–L4 by adding a single special "disjunctive" axiom

p ∨ ¬p ∨ δ_n^1(p) ∨ ... ∨ δ_n^{n−2}(p),

where, for any k, 1 ≤ k ≤ n − 1, the corresponding algebraic operation δ_n^k(x) is the "characteristic function" of the logical value k/(n−1) in the infinite Łukasiewicz matrix Mℵ0.

Another axiomatization of the finite Łukasiewicz logics, offered by Tuziak (1988), is formulated in the standard propositional language, using sequences of ascending implications defined inductively by: p →^0 q = q, p →^{k+1} q = p → (p →^k q). The axiom set for the n-valued Łukasiewicz logic consists of ten formulas taken from the Hilbert-Bernays axiomatization of CPC and the following two "specific" axioms:

T1. (p →^n q) → (p →^{n−1} q)
T2. (p ≡ (p →^{s−2} ¬p)) →^{n−1} p for any 2 ≤ s ≤ n − 1 such that s is not a divisor of n − 1.
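The ascending implications and the two specific axioms T1 and T2 lend themselves to a direct numerical check in the matrix Mn. The following sketch rests on the reading of T1 and T2 given above; the function names are chosen for this illustration only.

```python
from fractions import Fraction

def neg(x):
    return 1 - x

def imp(x, y):
    return min(Fraction(1), 1 - x + y)

def equiv(x, y):
    return 1 - abs(x - y)

def imp_k(k, p, q):
    """p ->^k q, with p ->^0 q = q and p ->^(k+1) q = p -> (p ->^k q)."""
    result = q
    for _ in range(k):
        result = imp(p, result)
    return result

def values(n):
    return [Fraction(j, n - 1) for j in range(n)]

def t1_holds(n):
    """T1: (p ->^n q) -> (p ->^(n-1) q) takes value 1 throughout L_n."""
    L = values(n)
    return all(imp(imp_k(n, p, q), imp_k(n - 1, p, q)) == 1
               for p in L for q in L)

def t2_holds(n, s):
    """T2 for a given s: (p ≡ (p ->^(s-2) ¬p)) ->^(n-1) p is always 1."""
    return all(imp_k(n - 1, equiv(p, imp_k(s - 2, p, neg(p))), p) == 1
               for p in values(n))
```

On the small cases tried, `t2_holds(n, s)` is true exactly when s does not divide n − 1, which matches the side condition attached to T2. For n = 3 and s = 2 the formula fails at p = 1/2.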
Algebraic interpretations

The first attempts to obtain algebras corresponding to the finite Łukasiewicz logics, due to Moisil, date back to the 1940s.18 Moisil algebras are bounded distributive lattices with involution and Boolean-valued endomorphisms. A structure (A, ∪, ∩, N, s1, s2, ..., sn−1, 0, 1) is an n-valued Łukasiewicz algebra19 provided that

M1. (A, ∪, ∩) is a distributive lattice with 0 and 1.
M2. N is an involution of A, i.e. NNx = x, and N(x ∪ y) = N(x) ∩ N(y), N(x ∩ y) = N(x) ∪ N(y).
M3. The sk's are endomorphisms of (A, ∪, ∩), i.e. for any k ∈ {1, 2, ..., n − 1}: sk(x ∪ y) = sk(x) ∪ sk(y) and sk(x ∩ y) = sk(x) ∩ sk(y), such that
(i) sk(x) ≤ sk+1(x)
(ii) sk(st(x)) = st(x)
(iii) sk(N(x)) = N sn−k(x)
(iv) N sk(x) ∪ sk(x) = 1, N sk(x) ∩ sk(x) = 0
(v) If sk(x) = sk(y) for every k, then x = y.

The simplest finite n-valued Łukasiewicz algebra is the linearly ordered algebra of the n-valued Łukasiewicz matrix (Ln, ∨, ∧, ¬, s1, s2, ..., sn−1, 0, 1), with the one-argument functions s1, s2, ..., sn−1 defined by:

\[
s_k\!\left(\tfrac{j}{n-1}\right) = \begin{cases}
0 & \text{when } 1 \le k \le n - j - 1 \\
1 & \text{when } n - j - 1 < k \le n - 1.
\end{cases}
\]

The definability of the sk functions is warranted by the McNaughton test; "effective" definitions using the Łukasiewicz ¬ and → were given by Suchoń [1974]. Since for n ≥ 5 the Łukasiewicz implication is not definable in n-valued Moisil algebras, Cignoli [1980] enlarged the set of basic operations with additional binary operators satisfying some simple equations. The resulting structures, called proper n-valued Łukasiewicz algebras, proved to be the real counterparts of the Łukasiewicz finite-valued logics, see Cignoli [1982].

18 See Moisil [1972].
19 Moisil used this name.

With a view to obtaining an algebraic completeness proof for the infinite-valued Łukasiewicz logic, Chang [1958b] introduced the concept of MV algebra. An algebra A = (A, +, ·, ⁻, 0, 1), where + and · are binary operations, ⁻ is a unary operation, 0 ≠ 1, and ∪ and ∩ are binary operations defined by

x ∪ y = (x · y⁻) + y,    x ∩ y = (x + y⁻) · y,
is an MV algebra if the following conditions are satisfied:

C1.  x + y = y + x                        C1∗.  x · y = y · x
C2.  x + (y + z) = (x + y) + z            C2∗.  x · (y · z) = (x · y) · z
C3.  x + x⁻ = 1                           C3∗.  x · x⁻ = 0
C4.  x + 1 = 1                            C4∗.  x · 0 = 0
C5.  x + 0 = x                            C5∗.  x · 1 = x
C6.  (x + y)⁻ = x⁻ · y⁻                   C6∗.  (x · y)⁻ = x⁻ + y⁻
C7.  (x⁻)⁻ = x                            C8.   0⁻ = 1
C9.  x ∪ y = y ∪ x                        C9∗.  x ∩ y = y ∩ x
C10. x ∪ (y ∪ z) = (x ∪ y) ∪ z            C10∗. x ∩ (y ∩ z) = (x ∩ y) ∩ z
C11. x + (y ∩ z) = (x + y) ∩ (x + z)      C11∗. x · (y ∪ z) = (x · y) ∪ (x · z).
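Conditions C1–C11∗ can be tested mechanically on the finite Łukasiewicz chains. The sketch below assumes the interpretation given by the definitions α + β = ¬α → β and α · β = ¬(α → ¬β), so that x + y = min(1, x + y), x · y = max(0, x + y − 1) and x⁻ = 1 − x. All function names are illustrative. It also evaluates the Moisil operators sk on the chain, following their definition earlier in this section.

```python
from fractions import Fraction
from itertools import product

ONE, ZERO = Fraction(1), Fraction(0)

def chain(n):
    return [Fraction(j, n - 1) for j in range(n)]

def add(x, y):            # x + y, i.e. ¬x → y
    return min(ONE, x + y)

def mul(x, y):            # x · y, i.e. ¬(x → ¬y)
    return max(ZERO, x + y - 1)

def comp(x):              # x⁻
    return 1 - x

def join(x, y):           # x ∪ y = (x · y⁻) + y
    return add(mul(x, comp(y)), y)

def meet(x, y):           # x ∩ y = (x + y⁻) · y
    return mul(add(x, comp(y)), y)

def satisfies_mv_axioms(n):
    """Brute-force check of C1-C11 and their starred duals on L_n."""
    for x, y, z in product(chain(n), repeat=3):
        checks = [
            add(x, y) == add(y, x), mul(x, y) == mul(y, x),
            add(x, add(y, z)) == add(add(x, y), z),
            mul(x, mul(y, z)) == mul(mul(x, y), z),
            add(x, comp(x)) == 1, mul(x, comp(x)) == 0,
            add(x, ONE) == 1, mul(x, ZERO) == 0,
            add(x, ZERO) == x, mul(x, ONE) == x,
            comp(add(x, y)) == mul(comp(x), comp(y)),
            comp(mul(x, y)) == add(comp(x), comp(y)),
            comp(comp(x)) == x, comp(ZERO) == 1,
            join(x, y) == join(y, x), meet(x, y) == meet(y, x),
            join(x, join(y, z)) == join(join(x, y), z),
            meet(x, meet(y, z)) == meet(meet(x, y), z),
            add(x, meet(y, z)) == meet(add(x, y), add(x, z)),
            mul(x, join(y, z)) == join(mul(x, y), mul(x, z)),
        ]
        if not all(checks):
            return False
    return True

def boolean_elements(n):
    """Elements with x + x = x."""
    return {x for x in chain(n) if add(x, x) == x}

def s(k, x, n):
    """Moisil operator on the chain: s_k(j/(n-1)) = 1 iff n - j - 1 < k."""
    j = x * (n - 1)
    return ONE if n - j - 1 < k else ZERO
```

On these chains ∪ and ∩ come out as max and min, and the only elements with x + x = x are 0 and 1. The check is a finite sanity test under the stated interpretation, not a proof about MV algebras in general.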
The simplest example of an MV algebra is an arbitrary Łukasiewicz matrix, where the operations + and · are defined as above and ∪, ∩, ⁻ are the connectives of disjunction, conjunction and negation, respectively. An adaptation of MV algebras to the finite Łukasiewicz logics was made in 1973 by R. Grigolia. An MV algebra A is an MVn algebra (n ≥ 2) provided that it additionally satisfies

C12.  (n − 1)x + x = (n − 1)x
C12∗. x^{n−1} · x = x^{n−1}

and for n > 3:

C13.  {(jx) · (x⁻ + ((j − 1)x)⁻)}^{n−1} = 0
C13∗. (n − 1){x^j + (x⁻ · (x^{j−1})⁻)} = 1
where 1 < j < n − 1 and j does not divide n − 1, see Grigolia (1977).

All the above algebras are conservative generalizations of Boolean algebras. Every Moisil algebra and every MV algebra has a Boolean subalgebra. {sk(x) : x ∈ A, 1 ≤ k ≤ n − 1} is the set of all Boolean elements of an n-valued Łukasiewicz algebra. The set {x ∈ A : x + x = x} = {x ∈ A : x · x = x} is the set of Boolean elements of an MV algebra A.

Moisil and Chang developed representation theories for their algebras. Chang's idea of associating a totally ordered abelian group with any MV algebra was the crucial point of the algebraic completeness proof of the Łukasiewicz axioms for the infinite-valued logic, see Chang [1959].20 The group-theoretic flavour of MV algebras and their other mathematical properties attracted the attention of many scholars. Several algebraic structures related to Chang's original algebras have been investigated and applied to logic. Among recent works, two new proofs of the completeness of the Łukasiewicz axioms are worth noting: by Cignoli [1993], using the representation of free l-groups, and by Panti [1995], using tools of algebraic geometry. Cignoli et al. [1999] is a good source of results and studies on the infinite-valued Łukasiewicz logic and MV algebras.

20 The original proof for the infinite case is non-constructive. Mundici [1994] gave a constructive proof of it.

7 STANDARD AXIOMATIZATION

The problem of a Hilbert-style axiomatization of many-valued logics remained open for many years after the original constructions. The 1930s brought merely syntactic characterizations of some systems of three-valued logic, including Wajsberg's axiomatization of the three-valued Łukasiewicz logic, see Łukasiewicz [1930]. In their seminal analysis, Rosser and Turquette [1952] set out the conditions that make finitely many-valued logics resemble the CPC and hence simplify the problem of syntactic formalization.
Standard conditions

The first semantical steps of the analysis settle the principle of interpretation of propositional languages in matrices of the form Mn,k = (Un, Dk), where Un = (En, f1, ..., fm), En = {1, 2, ..., n}, Dk = {1, 2, ..., k}, n ≥ 2 is a natural number and 1 ≤ k < n. The authors assume that the natural number ordering conveys a decreasing degree of truth. So, 1 always refers to "truth" and n takes the role of falsity.

Next come the conditions concerning the propositional connectives, which in Mn,k have to represent negation (¬), implication (→), disjunction (∨), conjunction (∧), equivalence (≡) and the special one-argument connectives j1, ..., jn. Assume that the same symbols are used to denote the corresponding functions of Un and that a given Mn,k is the interpretation structure. Then we say that the respective connectives satisfy the standard conditions if for any x, y ∈ En and i ∈ {1, 2, ..., n}:

¬x ∈ Dk      if and only if   x ∉ Dk
x → y ∉ Dk   if and only if   x ∈ Dk and y ∉ Dk
x ∨ y ∈ Dk   if and only if   x ∈ Dk or y ∈ Dk
x ∧ y ∈ Dk   if and only if   x ∈ Dk and y ∈ Dk
x ≡ y ∈ Dk   if and only if   either x, y ∈ Dk or x, y ∉ Dk
ji(x) ∈ Dk   if and only if   x = i.
Any matrix Mn,k having the standard connectives as primitive or definable is called standard. When only some of them are present we will use the term "Q-standard", where Q is a subset of the set of all standard connectives. All Post and all finite Łukasiewicz matrices are standard. The first case is easy: Post matrices are based on functionally complete algebras, see Section 5, and thus any possible connective is definable. A given n-valued Łukasiewicz matrix may be isomorphically transformed onto a matrix of the form Mn,1: the isomorphism is established by the mapping f(x) = n − (n − 1)x of the set {0, 1/(n−1), 2/(n−1), ..., (n−2)/(n−1), 1} onto {1, 2, ..., n}. Notice that the mapping reverses the ordering. Accordingly, now the smallest element 1 is the designated value, whereas n corresponds to 0. A moment's reflection shows that the original Łukasiewicz disjunction and conjunction satisfy the standard conditions. In turn, the other required connectives are definable in Mn. Thus,

\[
x \Rightarrow y =_{df} \underbrace{x \to (x \to \ldots \to (x \to y))}_{(n-1)\ \text{times}}, \qquad
x \approx y =_{df} (x \Rightarrow y) \wedge (y \Rightarrow x)
\]

define the standard implication and equivalence (→ appearing on the right is the original Łukasiewicz connective). The definability of the j's, ji(x) = 1 iff x = i, follows easily from the McNaughton criterion; see also Rosser and Turquette [1952].

Rosser and Turquette solved positively the problem of axiomatizability of the known systems of many-valued logic, including the n-valued Łukasiewicz and Post logics. Actually, any logic determined by a {→, j1, j2, ..., jn}-standard matrix Mn,k is axiomatizable by means of the rules MP and SUB and the following set of axioms:

A1. (α → (β → α))
A2. (α → (β → γ)) → (β → (α → γ))
A3. (α → β) → ((β → γ) → (α → γ))
A4. (ji(α) → (ji(α) → β)) → (ji(α) → β)
A5. (jn(α) → β) → ((jn−1(α) → β) → (... → ((j1(α) → β) → β) ...))
A6. ji(α) → α for i = 1, 2, ..., k
A7. ji(r)(αr) → (ji(r−1)(αr−1) → (... → (ji(1)(α1) → jf(F(α1, ..., αr))) ...)) where f = f(i(1), ..., i(r));
symbols f and F used in A7 represent, respectively, an arbitrary function of the matrix Mn,k and the propositional connective associated with it; see Rosser and Turquette [1952].

The axiom system A1–A7 consists of two parts. The first part, A1–A3, describes the properties of pure classical implication: in particular, owing to them the deduction theorem (ded2) holds, cf. Section 4. The axioms of the second group, A4–A7, bridge the semantic and syntactic properties of the connectives. Checking the soundness of the axioms is easy and relies heavily on procedures known from classical logic. The completeness proof, however, requires much calculation and quite a complicated induction.

Rosser and Turquette's approach to many-valued logics weighed heavily on further research on the topic. In particular, it shows that two-valued logic is, to some degree, sufficient for the development of many-valued logic. In turn, the authors' expansion of the techniques of classical logic suggested further formalizations of both finite and infinite-valued logics. One example is the partial normal form representation of formulas of a given finite-valued standard logic, mainly used in Many-valued logics for developing the quantification theory.
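The claim that the finite Łukasiewicz matrices are standard can be examined on small cases. The sketch below works with the original values in [0, 1] and the single designated value 1, and uses the (n − 1)-fold nested implication as the candidate standard implication. The function names, including `to_rosser_turquette` for the mapping f(x) = n − (n − 1)x, are assumptions of this illustration.

```python
from fractions import Fraction

def imp(x, y):
    return min(Fraction(1), 1 - x + y)

def std_imp(x, y, n):
    """x => y: the nested implication x -> (x -> ... -> (x -> y)), n-1 times."""
    result = y
    for _ in range(n - 1):
        result = imp(x, result)
    return result

def std_equiv(x, y, n):
    """x ≈ y = (x => y) ∧ (y => x), with ∧ computed as min."""
    return min(std_imp(x, y, n), std_imp(y, x, n))

def to_rosser_turquette(x, n):
    """f(x) = n - (n-1)x: sends 1 to the designated value 1 and 0 to n."""
    return int(n - (n - 1) * x)

def satisfies_standard_conditions(n):
    L = [Fraction(j, n - 1) for j in range(n)]
    designated = lambda v: v == 1
    for x in L:
        for y in L:
            dx, dy = designated(x), designated(y)
            if (not designated(std_imp(x, y, n))) != (dx and not dy):
                return False
            if designated(max(x, y)) != (dx or dy):
                return False
            if designated(min(x, y)) != (dx and dy):
                return False
            if designated(std_equiv(x, y, n)) != (dx == dy):
                return False
    return True
```

The original implication → does not itself meet the standard condition for n ≥ 3: for x = 1/2 and y = 0 it yields 1/2, which is undesignated although x is undesignated. This is why the iterated connective is needed. The negation and the j connectives are not covered by this sketch.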
Partial normal forms

The r-th (1 ≤ r ≤ n) partial normal form of a formula is a specification, in terms of two-valued logic, of the conditions under which the given formula takes the truth value r. To build this form we need any standard disjunction and conjunction connectives and the j's. The r-th partial normal form Nr(α) of a formula α = α(p1, ..., ps) is the disjunction of all conjunctions of the form

j1r(p1) ∧ j2r(p2) ∧ ... ∧ jsr(ps),

where 1r, 2r, ..., sr is a sequence of logical values such that v(α) = r for any valuation v meeting the requirement v(pi) = ir. The "specification" may also be considered as a formula of an external language comprising the classical connectives of negation, disjunction and conjunction. Rosser and Turquette used the disjunctive-conjunctive (DC) forms, but extending the notions to the conjunctive-disjunctive (CD) forms is immediate. Consider e.g. the simple three-valued Łukasiewicz implication p → q, which in the notation just adopted has the following truth table (rows give the value of p, columns the value of q):

→ | 1  2  3
--+---------
1 | 1  2  3
2 | 1  1  2
3 | 1  1  1
Let the superscript 1, 2 or 3 on a propositional variable denote that it takes the respective value. Thus e.g. p¹ reads "p takes the value 1", p² reads "p takes the value 2", and p³ reads "p takes the value 3". Then the first partial normal form of p → q is the DC formula

q¹ ∨ (q² ∧ p²) ∨ (q² ∧ p³) ∨ (q³ ∧ p³),

its second partial normal form is

(q² ∧ p¹) ∨ (q³ ∧ p²)

and, finally, the third partial normal form of p → q is

q³ ∧ p¹.

So, the complete set of the above partial normal forms of a formula is equivalent to its truth table. Obviously, such a description of the properties of formulas is external with respect to the system.21 It is interesting to note that in some cases connectives achieving the same effect are definable in standard logics. The next step is to ask for further classical connectives, including negation and implication. That question and the possible consequences of a positive answer to it will be discussed later.

21 Such was the approach by Bochvar, see Section 5.

The fact that the partial normal form method extends to quantifiers opens an important door. First, an adaptation of the method to the first-order language led Rosser and Turquette to a formalization of a class of many-valued predicate calculi. On the other hand, the tool proved to be useful in the investigation of natural deduction formalisations of finite-valued logics. An appropriate generalisation of the notion of normal form makes it possible to construct the rules of sequent and tableaux systems directly from the finite truth tables of their connectives. Moreover, as Zach [1993] explains, the partial normal forms provide a relationship between clause translation rules, sequent calculus introduction rules, and natural deduction rules for a given connective.

Many-valued logics as constructed through matrices are extensional with respect to their values. This property of the bunch of connectives may be identified as a generalized multiple-valued truth-functionality.
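The passage from a truth table to its partial normal forms is mechanical, as a short sketch for the example p → q shows. The representation chosen here is an assumption of this illustration: a DC form is a list of value tuples, each tuple (v1, ..., vs) standing for the conjunction jv1(p1) ∧ ... ∧ jvs(ps), and the list as a whole for their disjunction.

```python
from itertools import product

def imp3(p, q):
    """Three-valued Łukasiewicz implication, with 1 = truth and 3 = falsity."""
    return max(1, q - p + 1)

def partial_normal_form(f, arity, r, values=(1, 2, 3)):
    """All value tuples on which f takes the value r (unsimplified DC form)."""
    return [vs for vs in product(values, repeat=arity) if f(*vs) == r]

def text_first_form(p, q):
    """The simplified first form q¹ ∨ (q² ∧ p²) ∨ (q² ∧ p³) ∨ (q³ ∧ p³)."""
    return (q == 1 or (q == 2 and p == 2)
            or (q == 2 and p == 3) or (q == 3 and p == 3))
```

Here `partial_normal_form(imp3, 2, 3)` returns the single pair (1, 3), that is p¹ ∧ q³, and the form for value 2 consists of the pairs (1, 2) and (2, 3). The computed form for value 1 lists all six pairs explicitly, whereas the version in the text abbreviates the three pairs with q = 1 to the single disjunct q¹.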
8 BACKGROUND TO PROOF THEORY
The study of formal proofs, inaugurated by Gentzen [1934], now forms a steady branch of logical investigations. Gentzen's sequents and natural deduction22 for classical and intuitionistic logic were soon adapted to other non-classical systems, including the finite many-valued propositional and predicate calculi.
Sequents

A sequent is an item of the form Γ ⇒ ∆, where Γ and ∆ are finite sequences, or multisets, of formulas. The sequent means that the conjunction of all formulas of Γ entails the disjunction of all formulas of ∆, in symbols ∧Γ ⊢ ∨∆. In the classical case, the entailment is, by the deduction theorem, equivalent to ⊢ ∧Γ → ∨∆. Accordingly, the sequent receives a truth-functional interpretation: for any valuation, if all the formulas in Γ are true, then at least one formula in ∆ is not false. A version of the calculus using only sequents having a single formula in the place of ∆ was also used by Gentzen to formalize intuitionistic logic. Eventually, Schröter [1955] provided a direct generalisation of the classical sequent approach to the many-valued case.

A natural, truth-functional approach to the sequent formalisation of finite many-valued logics is due to Rousseau [1967].23 Given a finite n ≥ 2, an n-valued sequent Γ is an n-tuple

Γ1 | Γ2 | ... | Γn

of finite sequences of formulas. Then, Γ is interpreted as true under a given interpretation if and only if at least one Γi, i ∈ {1, 2, ..., n}, contains a formula which takes the value i. Thus, the components Γ1, Γ2, ..., Γn of Γ correspond to the logical values of the logic under consideration. It is obvious that for n = 2 one gets the counterpart of the standard notion of a sequent Γ1 | Γ2 with the usual truth-falsity interpretation.

The base of the construction of a finite-valued sequent calculus is the expression of n-valued truth-functionality by the assumption that, for any formula α, the sequent α | α | ... | α is an axiom. Next to that, there are the weakening rules for every place i,

\[
\frac{\Gamma_1 \mid \ldots \mid \Gamma_i \mid \ldots \mid \Gamma_n}{\Gamma_1 \mid \ldots \mid \Gamma_i, \alpha \mid \ldots \mid \Gamma_n}
\]

and the (i, j) cut rules

\[
\frac{\Gamma_1 \mid \ldots \mid \Gamma_i, \alpha \mid \ldots \mid \Gamma_n \qquad \Delta_1 \mid \ldots \mid \Delta_j, \alpha \mid \ldots \mid \Delta_n}{\Gamma_1, \Delta_1 \mid \ldots \mid \Gamma_n, \Delta_n}
\]

for every i ≠ j, i, j ∈ {1, 2, ..., n}. The last and hardest step consists in stating the admissible introduction rules for the connectives and, later,24 for the quantifiers. Any F-introduction rule at the position i for a connective F of (finite) arity r has the following form:

\[
\frac{\Gamma^1_1, \Delta^1_1 \mid \ldots \mid \Gamma^1_n, \Delta^1_n \qquad \ldots \qquad \Gamma^j_1, \Delta^j_1 \mid \ldots \mid \Gamma^j_n, \Delta^j_n}{\Gamma_1 \mid \ldots \mid \Gamma_i, F(\alpha_1, \ldots, \alpha_r) \mid \ldots \mid \Gamma_n}
\]

where Γ1 = Γ^1_1 ∪ ... ∪ Γ^j_1, ..., Γn = Γ^1_n ∪ ... ∪ Γ^j_n, and ∆^1_1, ..., ∆^j_n are subsets of {α1, ..., αr}. To get an exhaustive description of the connective in the sequent setting, one has to establish the rules for all positions, i.e. for all the original values of the logic. That may be done using conjunctive-disjunctive partial normal forms. However, a formula of a given many-valued logic may have several specific partial CD forms. This implies that the result of establishing the rules is not unique.

22 Independently, Jaśkowski [1934] presented a handy natural deduction system for the classical, intuitionistic and some version of free logic.
23 Takahashi [1967; 1970] gave a similar formalisation, see e.g. Bolc and Borowik [2003].
24 See Section 9.

Once we have such a description for the connective, we may write its introduction rules, taking as the premises the set of sequents reflecting the disjuncts, with their components positioned at the corresponding places. To give an example, we stay with the Łukasiewicz three-valued implication considered in Section 2, expressed as in Section 7. One may verify that the following formulas are the CD partial normal forms of p → q:

(p³ ∨ p² ∨ q¹) ∧ (p³ ∨ q² ∨ q¹)        (the first; value 1)
(p² ∨ p¹) ∧ (p² ∨ q²) ∧ (p¹ ∨ q³)      (the second; value 2)
p¹ ∧ q³                                 (the third; value 3).
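Accordingly, we get the following three introduction rules for the Łukasiewicz implication:

\[
\frac{\Gamma_1, q \mid \Gamma_2, p \mid \Gamma_3, p \qquad \Gamma_1, q \mid \Gamma_2, q \mid \Gamma_3, p}{\Gamma_1, p \to q \mid \Gamma_2 \mid \Gamma_3} \quad (I\to; 1)
\]

\[
\frac{\Gamma_1, p \mid \Gamma_2, p \mid \Gamma_3 \qquad \Gamma_1 \mid \Gamma_2, p, q \mid \Gamma_3 \qquad \Gamma_1, p \mid \Gamma_2 \mid \Gamma_3, q}{\Gamma_1 \mid \Gamma_2, p \to q \mid \Gamma_3} \quad (I\to; 2)
\]

\[
\frac{\Gamma_1, p \mid \Gamma_2 \mid \Gamma_3 \qquad \Gamma_1 \mid \Gamma_2 \mid \Gamma_3, q}{\Gamma_1 \mid \Gamma_2 \mid \Gamma_3, p \to q} \quad (I\to; 3).
\]

These rules, together with the rules for the negation connective on the set {1, 2, 3}, assure a sequent formalisation of the three-valued propositional Łukasiewicz logic. To finish with, we remark that the following introduction rules follow directly from the table of negation:

\[
\frac{\Gamma_1, p \mid \Gamma_2 \mid \Gamma_3}{\Gamma_1 \mid \Gamma_2 \mid \Gamma_3, \neg p}
\qquad
\frac{\Gamma_1 \mid \Gamma_2, p \mid \Gamma_3}{\Gamma_1 \mid \Gamma_2, \neg p \mid \Gamma_3}
\qquad
\frac{\Gamma_1 \mid \Gamma_2 \mid \Gamma_3, p}{\Gamma_1, \neg p \mid \Gamma_2 \mid \Gamma_3}.
\]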
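The semantic reading of three-valued sequents makes it possible to test these rules by enumeration. The sketch below takes all side sequents Γ1, Γ2, Γ3 to be empty, which is an assumption of this illustration, and checks for every valuation of p and q that all premises of a rule are true exactly when its conclusion is true. The encoding of formulas as Python functions and the rule labels are likewise chosen for this example.

```python
from itertools import product

def imp3(p, q):
    return max(1, q - p + 1)      # 1 = truth, 3 = falsity

def neg3(p):
    return 4 - p

def holds(sequent, val):
    """A sequent (G1, G2, G3) is true iff some formula in Gi takes the value i."""
    return any(formula(val) == i
               for i, component in enumerate(sequent, start=1)
               for formula in component)

P = lambda v: v["p"]
Q = lambda v: v["q"]
IMP = lambda v: imp3(v["p"], v["q"])
NEG = lambda v: neg3(v["p"])

# Each rule is a pair (list of premises, conclusion), with empty side sequents.
RULES = {
    "I-imp;1": ([([Q], [P], [P]), ([Q], [Q], [P])], ([IMP], [], [])),
    "I-imp;2": ([([P], [P], []), ([], [P, Q], []), ([P], [], [Q])], ([], [IMP], [])),
    "I-imp;3": ([([P], [], []), ([], [], [Q])], ([], [], [IMP])),
    "I-neg;3": ([([P], [], [])], ([], [], [NEG])),
    "I-neg;2": ([([], [P], [])], ([], [NEG], [])),
    "I-neg;1": ([([], [], [P])], ([NEG], [], [])),
}

def rule_is_exact(premises, conclusion):
    """Premises jointly true iff conclusion true, for every valuation."""
    for p, q in product((1, 2, 3), repeat=2):
        val = {"p": p, "q": q}
        if all(holds(s, val) for s in premises) != holds(conclusion, val):
            return False
    return True
```

With empty side sequents each rule turns out to be both sound and invertible, which is just the equivalence between the CD partial normal form and the corresponding row of the truth table.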
Tableaux method

The tableaux method introduced by Beth [1955] is a refutation procedure which proved to be useful not only as a proof procedure but also as a model search device. Beth was motivated by semantic concerns, which found expression in the early terminology, where the procedure was most properly referred to as semantic tableaux, cf. Smullyan [1968]. The core of the method is the question whether a given formula is valid or, equivalently, whether it is falsifiable. To show that a formula α is valid, one begins with some expression intended to assert that α is not valid. By established tradition, the "expression" is composed of a sign displaying the polarity (the assertion of truth or falsity) and a current formula. For classical logic the two most popular sets of signs are {T, F} and {+, −}; the signed formulas are then Tα, Fα and +α, −α, respectively. Next, a set of elimination rules, reducing signed formulas to simpler signed formulas, is given. In general, the procedure branches, and at its final stage no further productive application of the rules is possible. A branch is closed if it contains two oppositely signed formulas, i.e. Tα, Fα (or +α, −α); otherwise the branch is open. The former outcome locally denies the starting assumption of non-validity (the assumption is denied outright when all branches are closed); the latter gives a hint how to build a falsifying model. In the case of classical logic there is a straightforward duality between tableaux and sequents: any sequent system without cut may be reversed into a tableau system, the elimination rules resulting from reversing the introduction rules for sequents, see e.g. D'Agostino [1999].

The signed tableaux systems for finitely many-valued logics were first given by Suchoń [1974] and Surma [1974]. Both authors presented their systems for the finite n-valued Łukasiewicz propositional logics, using the modal operators of Moisil algebras25 to play the role of signs. Further elaboration of the method is due to Surma [1984]. Carnielli [1987; 1991] developed the idea and established a "systematisation of finite many-valued logics through the method of tableaux".26 For some time now, the research has been dominated by issues concerning automation. One recently important issue is the use of sets of signs instead of single signs, see Hähnle [1993], as well as of specified versions of sequents with meta connectives, Hähnle [1999].

Early approaches to many-valued tableaux systems resemble the sequents of Rousseau. However, contrary to the sequent calculi, which are grounded on introduction rules, the tableaux use elimination rules. In particular approaches the signs are either simulated by connectives definable within the logic, or they are the values of the logical matrix, or the elements of an external structure. The sequent format for tableaux is particularly useful, and in eligible cases the tableaux elimination rules may be obtained by reversing the introduction rules. Thus, e.g. for the three-valued Łukasiewicz connectives of negation and implication, one may have the following rules:

\[
\frac{\Gamma_1, p \to q \mid \Gamma_2 \mid \Gamma_3}{\Gamma_1, q \mid \Gamma_2, p \mid \Gamma_3, p \qquad \Gamma_1, q \mid \Gamma_2, q \mid \Gamma_3, p} \quad (\to E; 1)
\]

\[
\frac{\Gamma_1 \mid \Gamma_2, p \to q \mid \Gamma_3}{\Gamma_1, p \mid \Gamma_2, p \mid \Gamma_3 \qquad \Gamma_1 \mid \Gamma_2, p, q \mid \Gamma_3 \qquad \Gamma_1, p \mid \Gamma_2 \mid \Gamma_3, q} \quad (\to E; 2)
\]

\[
\frac{\Gamma_1 \mid \Gamma_2 \mid \Gamma_3, p \to q}{\Gamma_1, p \mid \Gamma_2 \mid \Gamma_3 \qquad \Gamma_1 \mid \Gamma_2 \mid \Gamma_3, q} \quad (\to E; 3).
\]
Beyond that, other elimination rules may be established by making appropriate use of the partial normal forms. It should be said, however, that in general the partial normal forms are not unique, and owing to that one may get different sets of elimination rules for a given connective. See Baaz et al. [1993].
Natural deduction and resolution

Resolution is still among the main tools of Automated Reasoning. Recently, natural deduction has also become a more and more acknowledged device in that context. Both natural deduction and resolution are relatives of sequents and, to a certain degree, are sequent expressible.27 Below, we briefly comment on them.

25 See Section 6.
26 This is the title of Carnielli's first paper. It should be added that the characterisation was also extended to quantifiers.

The method of natural deduction is a special formalization which establishes relations between premises and a set of conclusions. It operates with both kinds of rules for all logical constants: rules of introduction and rules of elimination. Moreover, the set of conclusions may, in particular cases, consist of one formula. The history of the natural deduction approach to many-valued logics is not very long. Probably the first paper on the topic is Becchio and Pabion [1977], who gave a system of natural deduction for the three-valued Łukasiewicz logic. Essentially new systems using sequents are discussed in Baaz et al. [1993] and Zach [1993]. The common feature of all systems of this kind is the property of preserving some logical values. Obviously, the most popular option is to preserve the one logical value corresponding to truth. On the other hand, the choice of several logical values at once, i.e. of a subset of the set of all values of the logic at work, is also worth consideration.

Resolution is a refutation method organized on clauses, i.e. finite disjunctions of literals. A literal is a propositional variable (positive literal) or the negation of a propositional variable (negative literal). The procedure starts with a set of clauses, and the refutation ends with the empty clause. It operates with a single resolution rule, which is a form of the cut rule. The earliest articles on resolution in many-valued logics are due to Orłowska [1967] and Morgan [1976]. The pioneering works and their successors have been based on special normal forms with multiple-valued literals, which used special unary connectives. The most recent outcome of these investigations is an algebraic theory of resolution-based deductive proof systems developed by Stachniak [1996].
The key idea on which the theory is based is that refutational deductive proof systems based on non-clausal resolution become finite algebraic structures, the so-called resolution algebras. In turn, the particular interpretation of the resolution principle shows it as a rule of consistency verification defined relative to an appropriate propositional logic. The verification uses special formulas, the so-called verifiers, "witnessing" the consistency of sets of formulas. In the classical case verifiers coincide with the formulas defining the two standard truth values. The process of selecting verifiers for resolution counterparts of non-classical proof systems usually goes beyond the search for formulas defining truth values. Thus, e.g. the resolution counterparts of the three-valued and five-valued Łukasiewicz logics have, respectively, six and nine verifiers.28 What makes this last approach especially interesting from the point of view of many-valuedness is that the interpretation of the resolution principle is rooted in the logical tradition of Leśniewski, Łukasiewicz and Tarski, as well as Couturat, Post and Schröder.29 Accordingly, it seems very interesting that in the counterparts of functionally complete systems of (finite) many-valued logic, verifiers coincide with formulas defining logical values.

27 Consult D'Agostino [1999].
28 Stachniak [1996, pp. 181 and 187].
29 See Stachniak [1996, p. xii].
9 QUANTIFIERS IN MANY-VALUED LOGIC

Rosser and Turquette [1952] extended the method of partial normal forms to formulas with quantifiers and developed a general theory of quantification for a class of finitely many-valued logics.
Ordinary and generalized quantifiers They started from the intuition permitting to treat ordinary quantifiers as functions on the set of pairs (x, F ), where x is a nominal variable and F a formula, with values in the set of formulae. A generalized quantifier of this type is any formula of the form: Q(x1 , x2 , ..., xm , F1 , F2 , ..., Ft ), where x1 , x2 , ..., xm are nominal variables and F1 , F2 , ..., Ft formulae built from predicates, nominal and propositional variables, and connectives. The intended meaning of Qi functions in n-valued logic is determined with the help of interpretations assigning to formulae values from the set {1, 2, ..., n}. Operating on the links between formulae stated by means of connectives of basic logic enables the construction of non-classical quantifiers. The theory of representations allows any generalized quantifier to be expressed by means of ordinary quantifiers. Standard n-valued predicate calculi with generalized quantifiers are axiomatizable using special “partial normal forms”. Many-valued predicate calculi are, however, usually built along the classical pattern. In that case a first-order language with two standard quantifiers: general ∀ and existential ∃ is considered. In general, the starting point is the substitutional conception of quantifiers according to which ∀ and ∃ are (infinite) generalizations of conjunction and disjunction, respectively. Accordingly, for a finite domain U = {a1 , a2 , ..., an }, the commutative and associative connectives of conjunction (∧) and disjunction (∨): ∀xF (x) ≡U F (a1 ) ∧ F (a2 ) ∧ ... ∧ F (an ) ∃xF (x) ≡U F (a1 ) ∨ F (a2 ) ∨ ... ∨ F (an ), (≡U means the equivalence of the formulae at any interpretation in U, a1 , a2 , ..., an being nominal constants ascribed to the objects of the domain). In finite-valued logical calculi constructed upon linear matrices, quantifiers are defined ‘directly’ through algebraic functions related the above-mentioned connectives. Thus, e.g. 
for finite L ukasiewicz and Post logics, for any interpretation f in a domain U f (∀xF (x)) = min{f (F (a)) : a ∈ U } f (∃xF (x)) = max{f (F (a)) : a ∈ U }. For other calculi semantic description of quantifiers may vary. Thus, for example, the clauses defining quantifiers in the first-order Bochvar logic should be following:
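\[
f(\forall x F(x)) =
\begin{cases}
\mathrm{t} & \text{when } f(F(a)) = \mathrm{t} \text{ for every } a \in U \\
\mathrm{u} & \text{when } f(F(a)) = \mathrm{u} \text{ for some } a \in U \\
\mathrm{f} & \text{otherwise}
\end{cases}
\]
\[
f(\exists x F(x)) =
\begin{cases}
\mathrm{f} & \text{when } f(F(a)) = \mathrm{f} \text{ for every } a \in U \\
\mathrm{u} & \text{when } f(F(a)) = \mathrm{u} \text{ for some } a \in U \\
\mathrm{t} & \text{otherwise.}
\end{cases}
\]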
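The quantifier clauses for a finite, non-empty domain can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, a formula F is represented simply by the list of values f(F(a)) for a ∈ U, and the Bochvar values are written as the strings "t", "u", "f".

```python
from fractions import Fraction

def forall_lukasiewicz(values):
    """Value of ∀xF(x): the minimum of f(F(a)) over the (finite, non-empty) domain."""
    return min(values)

def exists_lukasiewicz(values):
    """Value of ∃xF(x): the maximum of f(F(a)) over the (finite, non-empty) domain."""
    return max(values)

def forall_bochvar(values):
    """Bochvar ∀: t if every instance is t, u if some instance is u, f otherwise."""
    if all(v == "t" for v in values):
        return "t"
    if any(v == "u" for v in values):
        return "u"
    return "f"

def exists_bochvar(values):
    """Bochvar ∃: f if every instance is f, u if some instance is u, t otherwise."""
    if all(v == "f" for v in values):
        return "f"
    if any(v == "u" for v in values):
        return "u"
    return "t"
```

Note how a single undetermined instance "infects" both Bochvar quantifiers, whereas in the Łukasiewicz and Post case the quantifiers only pick out an extreme value.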
Axiomatic systems of many-valued predicate logics are extensions of axiom systems of the ground propositional calculi, in a similar way as for classical logic, cf. Rasiowa and Sikorski [1963] and Rasiowa [1974]. Proofs of completeness for finitely-valued calculi do not, in general, create difficulties. Finally, the axiomatizability of several calculi of this kind may be assured by Rosser and Turquette's method, which extends the standard approach to quantifiers (see above).
Distribution quantifiers
Attempts to adapt the matrix method to first-order finite-valued logics led to the concept of distribution quantifiers. The semantics of any such operator is defined by a mapping from distributions to truth values: their “matrix” counterparts are functions from sets of logical values to logical values. The very idea is covered by the general framework of Mostowski [1957], and for finite-valued logics it was developed by Carnielli [1987; 1991]. An advantage of the distribution approach is that an appropriate description of a quantifier directly yields the elimination rule for the corresponding tableau proof system. Thus, for example, for the three-valued Łukasiewicz logic presented as in Section 7, the standard quantifiers ∀ and ∃ may be defined in terms of distributions in the following way:
\[
d_\forall(X) =
\begin{cases}
3 & \text{if } 3 \in X \\
2 & \text{if } 2 \in X \text{ and } 3 \notin X \\
1 & \text{if } X = \{1\},
\end{cases}
\qquad
d_\exists(X) =
\begin{cases}
3 & \text{if } X = \{3\} \\
2 & \text{if } 2 \in X \text{ and } 1 \notin X \\
1 & \text{if } 1 \in X.
\end{cases}
\]
The counter-images of the two functions, d∀⁻¹ and d∃⁻¹, are perhaps more illuminating:
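\[
\begin{aligned}
d_\forall^{-1}(3) &= \{\{3\}, \{2,3\}, \{1,2,3\}, \{1,3\}\} & d_\exists^{-1}(3) &= \{\{3\}\} \\
d_\forall^{-1}(2) &= \{\{2\}, \{1,2\}\} & d_\exists^{-1}(2) &= \{\{2\}, \{2,3\}\} \\
d_\forall^{-1}(1) &= \{\{1\}\} & d_\exists^{-1}(1) &= \{\{1\}, \{1,2\}, \{1,2,3\}, \{1,3\}\}.
\end{aligned}
\]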
From the last description the tableaux elimination rules stem directly. Some examples will follow:
1∀xF(x)
-------     t is any term;
 1F(t)

      2∀xF(x)
-------------------     c1, c2, c3 are new constants.
2F(c1)  |  1F(c2)
        |  2F(c3)

3∃xF(x)
-------     t is any term;
 3F(t)

      2∃xF(x)
-------------------     c1, c2, c3 are new constants.
2F(c1)  |  2F(c2)
        |  3F(c3)
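The counter-images listed above can be recomputed mechanically from the two distribution functions. The sketch below is illustrative only: the names `d_forall`, `d_exists` and `counter_image` are hypothetical, and the values are numbered 1, 2, 3 as in the text, with distributions ranging over the non-empty subsets of {1, 2, 3}.

```python
from itertools import combinations

VALUES = (1, 2, 3)  # the three logical values, numbered as in the text

def d_forall(X):
    """Distribution function of ∀: 3 if 3 ∈ X, else 2 if 2 ∈ X, else 1 (X = {1})."""
    if 3 in X:
        return 3
    if 2 in X:
        return 2
    return 1

def d_exists(X):
    """Distribution function of ∃: 1 if 1 ∈ X, else 2 if 2 ∈ X, else 3 (X = {3})."""
    if 1 in X:
        return 1
    if 2 in X:
        return 2
    return 3

def distributions():
    """All non-empty sets of logical values."""
    return [frozenset(c) for r in range(1, len(VALUES) + 1)
            for c in combinations(VALUES, r)]

def counter_image(d, value):
    """The set of distributions that d sends to the given value."""
    return {X for X in distributions() if d(X) == value}
```

Each element of a counter-image corresponds to one branch of the tableau rule for the signed quantified formula: for instance, `counter_image(d_forall, 2)` consists of {2} and {1, 2}, which matches the two branches of the rule for 2∀xF(x).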
Quantifiers and infinite-valued logic
Introducing quantifiers to logics with infinitely many values in the semantical plane may be problematic. Thus, e.g., applying the aforementioned procedure to the ℵ0-valued Łukasiewicz logic is impossible: when U is infinite, the set {f(F(a)) : a ∈ U} may lack a least or a greatest element, and then the min and max functions cannot be used in the definition. In turn, in the ℵ1-valued Łukasiewicz logic, the interpretations of quantifiers are introduced by requiring that for any interpretation in a non-empty domain U
f(∀xF(x)) = inf{f(F(a)) : a ∈ U}
f(∃xF(x)) = sup{f(F(a)) : a ∈ U},
see Mostowski [1961]. However, it turned out that the ℵ1-valued predicate calculus thus obtained is not axiomatizable, Scarpelini [1962]. The investigations of other scholars complete, to some extent, Scarpelini's result by characterizing sets of valid formulas³⁰ in terms of recursive countability RC:³¹ Rutledge [1959] showed that the set of valid formulas of the ℵ1-valued monadic predicate calculus is RC. Hay [1963] proved that for any valid formula α of this calculus there exists a finite m > 0 such that mα is derivable from some “sound” axiomatics. Finally, a general result of Mostowski [1961] implies that the set of formulas valid in Łukasiewicz's matrix with the designated set (r, 1], 0 < r ≤ 1, is axiomatizable; the proof of Mostowski is not effective and the author provides no axiomatics. The works adduced allow one, we think, to appreciate the complexity and subtlety of the problem. In this connection it is also worth mentioning that the greatest hopes for proving completeness of the ℵ1-valued predicate calculus were attached to the algebraic method and MV algebras (see Section 6 and Belluce and Chang [1963]).
The experience gathered while attempting to construct such a proof finally bore fruit in the form of an interpretation theory of the first-order language with values in compact Hausdorff spaces, the so-called continuous model theory of Chang and Keisler [1966].
³⁰ I.e. formulas true at any interpretation.
³¹ See e.g. [Kleene, 1952].
10 INTUITIONISM AND GÖDEL'S MATRICES
Intuitionism constitutes a constructivistic trend in the studies of the foundations of mathematics. Its sources are found in some elements of the philosophy of Kant, who saw the basis of mathematics in an a priori intuition of time and space and who emphasized the role of construction in justifying the existence of mathematical objects. The history of the intuitionistic conception is exceedingly rich and has links with such eminent mathematicians as L. Kronecker, H. Poincaré, E. Borel and H. Lebesgue, to mention only a few of them. The systematic and mature development of the intuitionistic ideas, initiated in 1907, is the life-work of L. E. J. Brouwer (see [Heyting, 1966]).
Postulates and axioms
One of the main assumptions of intuitionism is the postulate of effectiveness of existential mathematical theorems: a proposition concerning the existence of mathematical objects can be accepted only when we are able to provide a method of construction of those objects. The interpretation of logical constants and quantifiers proposed by Heyting [1930] made it possible to formulate an axiomatization of intuitionistic logic generally acknowledged as adequate. According to this interpretation the validity of any proposition is identified with its provability, and proofs of compound propositions are composed of the proofs of simpler ones.

Proof of    is a construction
α ∧ β       consisting of a proof of α and a proof of β;
α ∨ β       choosing one of the propositions α, β and laying down a proof of it;
α → β       transferring any proof of α onto a proof of β and verifying that its results are indeed proofs of β;
¬α          equivalent to a proof of α → 0, where 0 is an absurd sentence (falsum);
∃xΦ(x)      choosing an object a and laying down a proof of Φ(a);
∀xΦ(x)      which to every object a of a given domain assigns a proof of Φ(a) and subsequently verifies it.
Heyting [1930] presents the intuitionistic propositional calculus as a system INT based on the axioms:
(H1)  p → (p ∧ p)
(H2)  (p ∧ q) → (q ∧ p)
(H3)  (p → q) → ((p ∧ r) → (q ∧ r))
(H4)  ((p → q) ∧ (q → r)) → (p → r)
(H5)  q → (p → q)
(H6)  (p ∧ (p → q)) → q
(H7)  p → (p ∨ q)
(H8)  (p ∨ q) → (q ∨ p)
(H9)  ((p → r) ∧ (q → r)) → ((p ∨ q) → r)
(H10) ¬p → (p → q)
(H11) ((p → q) ∧ (p → ¬q)) → ¬p
and the rules MP and SUB. The “soundness” of the INT axiomatics can be demonstrated through the above-mentioned interpretation of logical constants in terms of proof. It is readily checked that all the laws of the intuitionistic propositional calculus are classical tautologies. Nevertheless, INT differs from CPC, which can be shown by applying the following three-element Heyting matrix:
H3 = ({0, 1/2, 1}, ¬, →, ∨, ∧, {1}),
in which ∨ and ∧ are defined as in the Łukasiewicz matrix while ¬ and → are characterized by the tables (rows give the antecedent, columns the consequent):

 α  | ¬α          →  |  0    1/2   1
----+-----       ----+----------------
 0  |  1          0  |  1     1    1
1/2 |  0@        1/2 |  0@    1    1
 1  |  0          1  |  0    1/2   1

which differ from the Łukasiewicz tables in the two places marked by @, where according to Łukasiewicz the value is 1/2. INT ⊆ E(H3) since axioms (H1)–(H11) belong to E(H3) and E(H3) is closed under the rules MP and SUB. Moreover, such laws of classical logic as
p ∨ ¬p,   ¬¬p → p,   (¬p → p) → p
are not tautologies of the matrix H3: it suffices to consider any valuation h such that h(p) = 1/2. It is also noteworthy that strengthening the Heyting axiom system with the law of the excluded middle leads to classical logic. The relations between the intuitionistic and classical propositional calculi were given much attention, and the so-called intermediate logics (intermediate between the intuitionistic and classical logics) were intensely studied (see e.g. the papers by Kabziński and Krajewski in: Marciszewski [1987]); the calculus determined by the Heyting matrix can serve as an example of such a logic. The weakening of INT resulting from the omission of (H10) leads to the system of minimal logic (consult Johansson [1936]), closer to intuitionism than the Heyting calculus.
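The two claims about H3, that (H1)–(H11) belong to E(H3) while the three classical laws do not, can be checked by brute force over all valuations. The following is a minimal sketch; the encoding of formulas as Python functions of (p, q, r) and all names are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

H3 = (Fraction(0), Fraction(1, 2), Fraction(1))  # universe of the Heyting matrix

def neg(x):            # ¬x = 1 if x = 0, and 0 otherwise
    return Fraction(1) if x == 0 else Fraction(0)

def imp(x, y):         # x → y = 1 if x ≤ y, and y otherwise
    return Fraction(1) if x <= y else y

conj, disj = min, max  # ∧ and ∨ as in the Łukasiewicz matrix

AXIOMS = {
    "H1": lambda p, q, r: imp(p, conj(p, p)),
    "H2": lambda p, q, r: imp(conj(p, q), conj(q, p)),
    "H3": lambda p, q, r: imp(imp(p, q), imp(conj(p, r), conj(q, r))),
    "H4": lambda p, q, r: imp(conj(imp(p, q), imp(q, r)), imp(p, r)),
    "H5": lambda p, q, r: imp(q, imp(p, q)),
    "H6": lambda p, q, r: imp(conj(p, imp(p, q)), q),
    "H7": lambda p, q, r: imp(p, disj(p, q)),
    "H8": lambda p, q, r: imp(disj(p, q), disj(q, p)),
    "H9": lambda p, q, r: imp(conj(imp(p, r), imp(q, r)), imp(disj(p, q), r)),
    "H10": lambda p, q, r: imp(neg(p), imp(p, q)),
    "H11": lambda p, q, r: imp(conj(imp(p, q), imp(p, neg(q))), neg(p)),
}

CLASSICAL_LAWS = {
    "excluded middle": lambda p, q, r: disj(p, neg(p)),
    "double negation": lambda p, q, r: imp(neg(neg(p)), p),
    "Clavius": lambda p, q, r: imp(imp(neg(p), p), p),
}

def is_tautology(formula):
    """True iff the formula takes the designated value 1 under every valuation."""
    return all(formula(*v) == 1 for v in product(H3, repeat=3))
```

With these definitions every entry of `AXIOMS` passes `is_tautology`, while each entry of `CLASSICAL_LAWS` fails it already at p = 1/2.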
Gödel matrices
Gödel [1932] showed that INT cannot be described by a finite matrix and, in consequence, by a finite set of finite matrices. Gödel's reasoning consists in the construction of a sequence of matrices approximating INT, and next in pointing out, on the basis of the matrices of that sequence, a suitable set of formulas outside INT. The n-valued Gödel matrix (n ≥ 2, n finite) is of the form:
Gn = ({0, 1/(n−1), ..., (n−2)/(n−1), 1}, ¬, →, ∨, ∧, {1}),
where
\[
\neg x =
\begin{cases}
1 & \text{if } x = 0 \\
0 & \text{if } x \neq 0,
\end{cases}
\qquad
x \to y =
\begin{cases}
1 & \text{if } x \le y \\
y & \text{if } x > y,
\end{cases}
\]
x ∨ y = max(x, y), x ∧ y = min(x, y).
Notice that G2 = M2, G3 = H3. The tables of negation and implication in G4 are the following (rows give the antecedent, columns the consequent):

 α  | ¬α          →  |  0    1/3   2/3   1
----+-----       ----+----------------------
 0  |  1          0  |  1     1     1    1
1/3 |  0         1/3 |  0     1     1    1
2/3 |  0         2/3 |  0    1/3    1    1
 1  |  0          1  |  0    1/3   2/3   1
In turn, each Gödel matrix Gn is a submatrix of Gn+1; check that the mapping h from the set of values of Gn+1 onto that of Gn, defined as h(i/n) = i/(n−1) for 0 ≤ i ≤ n − 1 and h(1) = 1, is a homomorphism of the two matrices. Accordingly,
INT ⊊ E(Gn−1) ⊊ ... ⊊ E(G3) ⊊ E(G2) = TAUT.
Let, further, ≡ be the connective of equivalence defined customarily; for any formulas α, β, α ≡ β =df (α → β) ∧ (β → α). Then α ≡ α ∈ INT and the functions of the matrices Gn corresponding to ≡ are described by
\[
x \equiv y =
\begin{cases}
1 & \text{when } x = y \\
\min(x, y) & \text{when } x \neq y.
\end{cases}
\]
Consider the sequence {di} (i ∈ N) of formulas:
(di) (p1 ≡ p2) ∨ (p1 ≡ p3) ∨ ... ∨ (p1 ≡ pi) ∨ (p2 ≡ p3) ∨ (p2 ≡ p4) ∨ ... ∨ (p2 ≡ pi) ∨ ... ∨ (pi−2 ≡ pi−1) ∨ (pi−2 ≡ pi) ∨ (pi−1 ≡ pi).
As is easily seen, di ∉ E(Gn) if i ≤ n (it suffices to assign pairwise different values to p1, ..., pi), and thus d2, d3, ..., dn, ... ∉ INT. The proof of the nonexistence of a finite matrix (weakly) adequate to INT can be viewed as follows: if INT = E(M) for some m-element matrix M = (A, I), then v(α ≡ α) ∈ I for any valuation v. Then, by extensionality, for arbitrary formulas α, β
(o) v(α) = v(β) implies v(α ≡ β) ∈ I.
Now, consider a formula dk where k > m. Since the number of propositional variables of this formula exceeds the cardinality of the matrix M, for every valuation v there are pk1, pk2 ∈ Var(dk) such that v(pk1) = v(pk2) and, in view of (o), v(pk1 ≡ pk2) ∈ I. Applying the fact that (H7) ∈ E(M), v(dk) ∈ I can be proved. Thus dk ∈ E(M) = INT, which contradicts dk ∉ INT.
Keeping the original definitions of Gödel's connectives one may define an infinite-valued logic. Thus, taking the set of all rational numbers from the real interval [0,1], we get the denumerable Gödel matrix Gℵ0. It follows that INT ⊊ E(Gℵ0); Dummett [1959] showed that the system of propositional logic thus obtained is axiomatizable: it may be obtained from INT by adjoining the formula
(H12) (p → q) ∨ (q → p)
to the axiom system (H1)–(H11); the rules of inference remain unchanged.
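The behaviour of the formulas di in the finite Gödel matrices, and the validity of (H12) in them, can be explored with a small brute-force check. This is a sketch under stated assumptions: the function names are hypothetical, di is evaluated directly as the maximum of the values of pj ≡ pk over all pairs j < k, and only small n and i are feasible because all n^i valuations are enumerated.

```python
from fractions import Fraction
from itertools import combinations, product

def godel_values(n):
    """Universe {0, 1/(n-1), ..., 1} of the Gödel matrix G_n (n ≥ 2)."""
    return [Fraction(i, n - 1) for i in range(n)]

def imp(x, y):
    return Fraction(1) if x <= y else y

def equiv(x, y):
    """(x → y) ∧ (y → x): equals 1 when x = y and min(x, y) otherwise."""
    return min(imp(x, y), imp(y, x))

def d_value(assignment):
    """Value of d_i under the assignment (v(p1), ..., v(pi)), for i ≥ 2."""
    return max(equiv(x, y) for x, y in combinations(assignment, 2))

def d_in_content(i, n):
    """True iff d_i belongs to E(G_n), i.e. takes value 1 under every valuation."""
    return all(d_value(a) == 1 for a in product(godel_values(n), repeat=i))

def linearity_in_content(n):
    """True iff (H12) (p → q) ∨ (q → p) belongs to E(G_n)."""
    vals = godel_values(n)
    return all(max(imp(x, y), imp(y, x)) == 1 for x, y in product(vals, repeat=2))
```

By the pigeonhole principle `d_in_content(i, n)` holds exactly when i > n, which is the property used in Gödel's argument; `linearity_in_content(n)` holds for every n because the values are linearly ordered.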
More on adequacy
An infinite class of finite matrices adequate for INT was introduced by Jaśkowski [1936]. Jaśkowski's sequence of matrices begins with the classical matrix, and its succeeding terms are made out of the preceding ones by means of a special operation G; an account and a detailed proof of the completeness theorem can be found in Surma [1971]. Also of interest are the investigations by Beth [1956], concentrated upon the question of what an “intuitionistically true tautology” is and upon obtaining a general method of finding “intuitionistic proofs” of that property. What appears to call for special attention is the topological interpretation provided by Tarski [1938]: here, propositional variables are associated with open sets val(p) of a fixed topological space (X, I) (I is an interior operation), and next this mapping is extended onto all formulas by setting:
val(α ∨ β) = val(α) ∪ val(β)
val(α ∧ β) = val(α) ∩ val(β)
val(α → β) = I{(X − val(α)) ∪ val(β)}
val(¬α) = I{X − val(α)}.
If α ∈ INT, then val(α) = X. Conversely, if val(α) = X for any space X and for any valuation val, then α ∈ INT. Tarski's interpretation inaugurated the characterization of intuitionistic logic in lattice-theoretic terms, i.e. in terms of pseudo-Boolean algebras, also called Heyting (or Brouwer) algebras; see Rasiowa and Sikorski [1963]. Curiously enough, in practice almost all non-classical logical constructions, many-valued ones included, are expressible by means of pseudo-Boolean algebras (compare Section 5).
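Tarski's clauses can be tried out on a very small space. The sketch below assumes, purely for illustration, the two-point Sierpiński space X = {0, 1} with open sets ∅, {1} and X; the names `interior`, `v_or`, `v_and`, `v_imp`, `v_not` are hypothetical.

```python
X = frozenset({0, 1})
# Open sets of the Sierpinski space (an assumption made only for this example).
OPENS = [frozenset(), frozenset({1}), X]

def interior(s):
    """I(s): the union of all open sets contained in s."""
    return frozenset().union(*[o for o in OPENS if o <= s])

def v_or(a, b):
    return a | b

def v_and(a, b):
    return a & b

def v_imp(a, b):
    return interior((X - a) | b)

def v_not(a):
    return interior(X - a)

p = frozenset({1})                      # val(p): an open set that is neither ∅ nor X
excluded_middle = v_or(p, v_not(p))     # val(p ∨ ¬p)
double_negation = v_imp(v_not(v_not(p)), p)  # val(¬¬p → p)
```

Here val(¬p) is the interior of {0}, which is empty, so val(p ∨ ¬p) = {1} ≠ X; in the same way val(¬¬p → p) = {1}. An intuitionistically valid formula such as (H6), on the other hand, receives the value X for every choice of open sets.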
11 ON BIVALENT DESCRIPTIONS
In the 1970s the investigations of logical formalizations produced several descriptions of many-valued constructions in terms of zero-one valuations. The interpretations associated with these descriptions shed new light on the problem of logical many-valuedness. Below, we discuss two different ways of expressing many-valued logics: one establishing the logical two-valuedness of structural consequence relations as a result of the division of sets of logical values into distinguished and undistinguished values (see Section 7), and the other, in which the replacement of more logical values by more valuations goes together with neglecting the role of the said division.
Suszko's thesis and beyond
Suszko [1977] calls attention to the referential character of homomorphisms associating to propositions their (possible) semantic correlates. Subsequently, he opposes them to logical valuations, which are zero-one valued functions defined on For. Given a propositional language L and a matrix M = (A, D) for L, the set of valuations TVM is defined as TVM = {th : h ∈ Hom(L, A)}, where
\[
t_h(\alpha) =
\begin{cases}
1 & \text{if } h(\alpha) \in D \\
0 & \text{if } h(\alpha) \notin D.
\end{cases}
\]
Notice that card(TVM) ≤ card(Hom(L, A)) (in general, h1 ≠ h2 does not imply that th1 ≠ th2!). Notice, moreover, that
X |=M α if and only if for every t ∈ TVM, t(α) = 1 whenever t(X) ⊆ {1}.
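The passage from homomorphisms h to logical valuations th can be made concrete for a small matrix. The sketch below assumes the (→, ¬)-fragment of the three-valued Łukasiewicz matrix with D = {1}; the tuple encoding of formulas and the names `h_value`, `t` and `entails` are hypothetical.

```python
from fractions import Fraction
from itertools import product

A = (Fraction(0), Fraction(1, 2), Fraction(1))  # universe of the matrix
D = {Fraction(1)}                               # designated values

def h_value(formula, assignment):
    """Homomorphic extension h of an assignment of values to variables.

    Formulas are variables (strings) or tuples ("not", α) / ("imp", α, β).
    """
    if isinstance(formula, str):
        return assignment[formula]
    if formula[0] == "not":
        return 1 - h_value(formula[1], assignment)
    if formula[0] == "imp":
        x = h_value(formula[1], assignment)
        y = h_value(formula[2], assignment)
        return min(Fraction(1), 1 - x + y)
    raise ValueError(f"unknown connective: {formula[0]}")

def t(formula, assignment):
    """Suszko's logical valuation t_h: 1 if h(formula) ∈ D, and 0 otherwise."""
    return 1 if h_value(formula, assignment) in D else 0

def entails(premises, conclusion, variables):
    """X |=_M α, stated purely in terms of the zero-one valuations t_h."""
    for values in product(A, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all(t(p, h) == 1 for p in premises) and t(conclusion, h) == 0:
            return False
    return True
```

The function `entails` never inspects the third value directly: it only asks whether a formula is designated, which is the sense in which the consequence relation is logically two-valued.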
The definition of logical valuations can be simply repeated with respect to any structural consequence operation C (or, equivalently, for any relation ⊢C associated with C), since for each such C there is a class K of matrices having the property that
C = ⋂{CnM : M ∈ K},
see [Wojcicki, 1970], in Section 4. Thus, each (structural) propositional logic (L, C) can be determined by a class of logical valuations of the language L or, in other words, it is logically two-valued, Suszko [1977]. The justification of Suszko's thesis, which states the logical two-valuedness of an important family of logics, lacks a description of valuations (i.e. elements of TVC) for an arbitrary relation ⊢C. Moreover, it seems that giving a general method for the recursive description of these valuations, without knowing precisely the structure of the class K of matrices adequate for C, is hardly possible. At the same time, however, even for simple relations of inference the conditions defining valuations
are illegible. An example of a relatively easily definable set of logical valuations is LV3, the class adequate for the (→, ¬)-version of the three-valued Łukasiewicz logic [Suszko, 1975]. LV3 is the set of all functions t : For → {0, 1} such that for any α, β, γ ∈ For the following conditions are satisfied:
(0) either t(γ) = 0 or t(¬γ) = 0
(1) t(α → β) = 1 whenever t(β) = 1
(2) if t(α) = 1 and t(β) = 0, then t(α → β) = 0
(3) if t(α) = t(β) and t(¬α) = t(¬β), then t(α → β) = 1
(4) if t(α) = t(β) = 0 and t(¬α) ≠ t(¬β), then t(α → β) = t(¬α)
(5) if t(¬α) = 0, then t(¬¬α) = t(α)
(6) if t(α) = 1 and t(β) = 0, then t(¬(α → β)) = t(¬β)
(7) if t(α) = t(¬α) = t(β) and t(¬β) = 1, then t(¬(α → β)) = 0.
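One direction of the adequacy of LV3 is easy to test: every valuation induced by a homomorphism into the three-valued Łukasiewicz matrix satisfies conditions (0)–(7). Since the conditions depend only on the matrix values of α and β, it is enough to run through all pairs of values. The sketch below is an illustration with hypothetical names; it reads condition (4) with t(¬α) ≠ t(¬β), the only reading under which (3) and (4) are compatible.

```python
from fractions import Fraction
from itertools import product

L3 = (Fraction(0), Fraction(1, 2), Fraction(1))

def neg(x):
    return 1 - x

def imp(x, y):
    return min(Fraction(1), 1 - x + y)

def t(x):
    """Zero-one value induced by a matrix value: 1 iff the value is designated."""
    return 1 if x == 1 else 0

def implies(p, q):
    return (not p) or q

def conditions_hold(a, b):
    """Check (0)-(7) for formulas α, β whose matrix values are a and b."""
    ta, tb, tna, tnb = t(a), t(b), t(neg(a)), t(neg(b))
    tab, tnab = t(imp(a, b)), t(neg(imp(a, b)))
    return all([
        ta == 0 or tna == 0,                                  # (0), with γ = α
        implies(tb == 1, tab == 1),                           # (1)
        implies(ta == 1 and tb == 0, tab == 0),               # (2)
        implies(ta == tb and tna == tnb, tab == 1),           # (3)
        implies(ta == tb == 0 and tna != tnb, tab == tna),    # (4)
        implies(tna == 0, t(neg(neg(a))) == ta),              # (5)
        implies(ta == 1 and tb == 0, tnab == tnb),            # (6)
        implies(ta == tna == tb and tnb == 1, tnab == 0),     # (7)
    ])
```

The check covers only soundness of the conditions with respect to the matrix; the converse, that every function satisfying (0)–(7) arises from a homomorphism, is the substantial part of the result and is not tested here.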
Usually, the degree of complexity of the description of many-valued logics increases with the number of values. In some cases, however, it can be simplified by the application of extra connectives “identifying” the original matrix values. Such a use of the j-operators of Rosser and Turquette made it possible to get, e.g., a uniform description of logical valuations for finite Łukasiewicz logics, Malinowski [1977]. The logical valuation procedure forms a part of a broader semantical programme related to the conception of so-called non-Fregean logics [Suszko, 1972]. According to Suszko there are situations which play the role of semantic correlates of propositions. Logical valuations, on their side, are nothing more than characteristic functions of the sets of formulas being counterimages of the sets of positive situations, i.e. of those which obtain, under homomorphisms settling the interpretation. Following Suszko, it can be said that the n-valued Łukasiewicz logic (n finite) is a two-valued logic of n situations s1, s2, ..., sn denoted by Łukasiewicz as 0, 1/(n−1), ..., 1, respectively. Obviously then 0 and 1 must not be identified with the logical values of falsity and truth.
In the literature one may find other, more or less justified, claims that any logic has a two-valued semantics. One of them, by Kotas and da Costa, deserves special attention. Kotas and da Costa [1980] proved, independently of Suszko, that this is the case for any logic C given by axioms and rules of inference, C = (Σ, ℜ). Given any such logic, a function ν : For → {0, 1} is called a (two-valued) valorization associated with C when the following conditions are satisfied: (1) if α ∈ Σ, then ν(α) = 1; (2) if all premisses of an application of a rule of inference from ℜ have value 1, then the conclusion also has value 1; (3) there exists at least one formula α ∉ Σ such that ν(α) = 0.
The completeness of the inference ⊢c of C with respect to the inference relation |=c determined by the class of all C-valorizations is standard: the valorizations are characteristic functions of saturated, or relatively maximal, sets of formulas: a set X of formulas
is α-saturated if X ⊬c α and for every β ∉ X, X ∪ {β} ⊢c α. Accordingly, an α-saturated set X defines a valorization ν such that ν(X) ⊆ {1} and ν(α) = 0. The method of valorizations was used by Kotas and da Costa to obtain a semantic two-valued description for the system C1 of paraconsistent propositional logic.³² Batens [1980] developed a device simplifying and automating the process of obtaining descriptions of classes of valorizations corresponding to a set of axioms and rules of inference. In addition, Batens also invented the so-called n-tuple semantics “bridging two-valued and many-valued semantic systems”, see Batens [1982].
³² Paraconsistent logics challenge the principle ex falso quodlibet. A logic is paraconsistent iff its consequence relation is not explosive, i.e. it is not true that {α, ¬α} |= β, see e.g. da Costa [1974].
Scott valuations
Scott [1973; 1974] regards the division of the set of values into designated and undesignated as unnatural. Replacing more values by more valuations, Scott endeavours to bestow a more intuitive character upon many-valued constructions. The valuations are bivalent functions and they generate a partition of the set of propositions of a given language into types corresponding to the original logical values. Scott considers only finite classes of valuations and he assumes that (many-valued) logics are determined by single matrices. The above two papers comprise merely an outline of a general method and its exemplification within n-valued Łukasiewicz logics.
Let For be the set of formulas of a given propositional language L and V = {v0, v1, ..., vn−1} (n ≥ 1) a finite set of valuations: the elements of V are (for the moment) arbitrary functions vi : For → {t, f}, with t denoting truth and f falsity. By a type of propositions of L with respect to V we mean an arbitrary set Zβ of the form:
Zβ = {α ∈ For : vi(α) = vi(β) for any i ∈ {0, 1, ..., n − 1}}.
It is easily seen that using an n-element set of valuations one can induce at most 2^n types: thus, for example, the two-element set {w0, w1} of valuations defines four types Z1, Z2, Z3, Z4:

      w0   w1
Z1    f    f
Z2    f    t
Z3    t    f
Z4    t    t

Constraining the valuations diminishes the number of types. The set of valuations just considered will define at most three types Z1, Z2, Z4 when we require that w0(α) ≤ w1(α) for every α ∈ For, two types Z2, Z3 when w0(α) ≠ w1(α) for every α ∈ For, and Z1, Z4 under the condition that w0 = w1. The types are counterparts of logical values: Scott [1973] refers to them as “indexes”. The above example shows that a given valency < 2^n can be obtained in several ways. Which of these reductions should be taken into account depends on the properties of the propositional connectives, which, on their side, are type-valued operations, i.e. mappings of sequences of types into types. A well-aimed choice of the limiting conditions leads to a relatively simple characterization of the connectives under consideration. Applying the above method, Scott gets a description of the implicative system of n-valued Łukasiewicz logic through the (n − 1)-element set of valuations VŁ∗n = {v0, v1, ..., vn−2} such that for any i, j ∈ {0, 1, ..., n − 2} and α ∈ For∗ (For∗ being used to denote the set of formulas of the language L∗ comprising the negation ¬ and implication → connectives),
(mon) whenever vi(α) = t and i ≤ j, then vj(α) = t
and, moreover, v0(α1) = f and vn−2(α2) = t for some α1, α2 ∈ For∗. The table below shows that the set VŁ∗n determines n types Z0, Z1, ..., Zn−1 of propositions:

        v0   v1   v2   ...   vn−3   vn−2
Z0      t    t    t    ...   t      t
Z1      f    t    t    ...   t      t
Z2      f    f    t    ...   t      t
...     ...  ...  ...  ...   ...    ...
Zn−2    f    f    f    ...   f      t
Zn−1    f    f    f    ...   f      f.
The function f(Zi) = (n − i − 1)/(n − 1) is a 1−1 order-reversing mapping of the set of types onto the universe of the Łukasiewicz matrix Mn: Z0 corresponds to 1 while Zn−1 corresponds to 0 in the matrix, compare Section 6. The negation and the implication connectives are characterized in the following way:
Zi → Zj = Zmax(0, j−i),   ¬Zi = Zn−i−1.
Accordingly, for any k ∈ {0, 1, ..., n − 2},
(¬) vk(¬α) = t if and only if vn−k−2(α) = f,
(→) vk(α → β) = t if and only if, whenever i + k ≤ j ≤ n − 2 and vi(α) = t, also vj(β) = t.
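The correspondence between Scott's valuations and the Łukasiewicz tables can be checked numerically. The sketch below rests on assumptions that should be read as such: vk(α) = t is taken to mean that the matrix value of α is at least 1 − k/(n−1) (this is what the table of types and the mapping f give), clause (¬) is read as "vk(¬α) = t iff vn−k−2(α) = f", and clause (→) as "vk(α → β) = t iff vi(α) = t implies vj(β) = t whenever i + k ≤ j ≤ n − 2". The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def scott_clauses_hold(n):
    """Check (mon), (¬) and (→) against the n-valued Łukasiewicz matrix (n ≥ 2)."""
    top = n - 1
    values = [Fraction(i, top) for i in range(n)]
    ks = range(n - 1)                      # indices of v_0, ..., v_{n-2}

    def v(k, x):
        # v_k(α) = t iff the matrix value x of α is at least 1 − k/(n−1)
        return x >= 1 - Fraction(k, top)

    def neg(x):
        return 1 - x

    def imp(x, y):
        return min(Fraction(1), 1 - x + y)

    for x in values:                       # (mon): truth persists to larger indices
        for i, j in product(ks, repeat=2):
            if i <= j and v(i, x) and not v(j, x):
                return False

    for x, y in product(values, repeat=2):
        for k in ks:
            # (¬): v_k(¬α) = t iff v_{n−k−2}(α) = f
            if v(k, neg(x)) != (not v(n - k - 2, x)):
                return False
            # (→): v_k(α → β) = t iff v_i(α) = t implies v_j(β) = t for i + k ≤ j ≤ n − 2
            clause = all(not v(i, x) or v(j, y)
                         for i in ks for j in range(i + k, n - 1))
            if v(k, imp(x, y)) != clause:
                return False
    return True
```

Under these readings the function returns True for every small n tried, which supports the reconstruction of the two clauses from the type tables; it is a consistency check, not a proof.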
A simple calculation shows that the set of all formulas of L∗ true under an arbitrary valuation vi ∈ VŁ∗n is just the content of M∗n, the (¬, →)-reduct of the Łukasiewicz matrix Mn (n finite!):
E(M∗n) = {α ∈ For∗ : vi(α) = t for i ∈ {0, 1, ..., n − 2}}.
Simultaneously, however, the consequence relation |=∗n ⊆ 2^For∗ × For∗, defined by
X |=∗n α if and only if for any vi ∈ VŁ∗n, vi(α) = t whenever vi(X) ⊆ {t},
does not coincide with |=n (reduced to the language L∗): to verify this it suffices to check, e.g., that {α → β, α} ⊭∗n β, while clearly {α → β, α} |=n β. |=∗n is called a conditional assertion. Whether and how it can be extended onto the whole language L is evident.
Scott suggests that the equalities of the form “vi(α) = t”, for i ∈ {0, ..., n − 2}, should be read as “(the statement) α is true to within the degree i”. Consequently, he assumes that the numbers in the range 0 ≤ i ≤ n − 2 stand for degrees of error in deviation from the truth. Degree 0 is the strongest and corresponds to “perfect” truth or no error: all the tautologies of Łukasiewicz logic are schemas of the statements having 0 as their degree of error. Besides, Łukasiewicz implication may conveniently be explained in these terms: assuming i + j ≤ n − 2, we get that vi(α → β) = t and vj(α) = t yield vi+j(β) = t. Thus, making use of propositions α → β, one may express the amount of shift of error between the degree of the hypothesis and that of the conclusion as the measure of error of the whole implication.
An example adapted from Euclidean geometry justifies the construction. Where a, b, c, ... are points of the plane, let the metalinguistic expression “|a − b|” denote the distance between a and b. Let S be a propositional language having the set For of formulas consisting of all formulas made up from atomic formulas “a = b” (and possibly others) by the use of the connective → (and possibly others such as “∨” and “∧”). Let us define the set of valuations Vε = {v0, v1, ..., vn−2} (n ≥ 2) for S, putting vi(“a = b”) = t if and only if |a − b| ≤ i and assuming that → satisfies (→). Let, finally, |=ε ⊆ 2^For × For be the consequence relation determined in S by Vε, i.e. X |=ε α
if and only if for any vi ∈ Vε, vi(α) = t whenever vi(X) ⊆ {t}.
Depending on the choice of a unit distance, the relationship between a and b is one of “imperfect” equality and as such it is not transitive. The conditional assertion
(P1) {“a = b”, “b = c”} |=ε “a = c”
fails in general, while for any a, b, c it is true that
(P2) “a = b” |=ε “b = c → a = c”.
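The failure of (P1) and the validity of (P2) can be illustrated on points with integer coordinates. The sketch assumes the error-shift reading of implication used above, namely that vi(α → β) = t iff vj(α) = t implies vi+j(β) = t for every j with i + j ≤ n − 2; all function names are hypothetical, and squared distances are compared so that no floating-point rounding is involved.

```python
from itertools import product

def close(i, a, b):
    """v_i("a = b") = t iff |a − b| ≤ i, tested on squared integer distances."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 <= i * i

def v_imp(i, n, b, c, a):
    """v_i("b = c → a = c") under the error-shift clause for →."""
    return all(not close(j, b, c) or close(i + j, a, c)
               for j in range(n - 1 - i))       # all j with i + j ≤ n − 2

def p1_holds(n, a, b, c):
    """{"a = b", "b = c"} |=ε "a = c" for the given points."""
    return all(not (close(i, a, b) and close(i, b, c)) or close(i, a, c)
               for i in range(n - 1))

def p2_holds(n, a, b, c):
    """"a = b" |=ε "b = c → a = c" for the given points."""
    return all(not close(i, a, b) or v_imp(i, n, b, c, a)
               for i in range(n - 1))
```

For the collinear points a = (0, 0), b = (1, 0), c = (2, 0) and n = 4, `p1_holds` is False (at degree 1 both premisses hold but |a − c| = 2), whereas `p2_holds` is True for all points, which is just the triangle inequality |a − c| ≤ |a − b| + |b − c|.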
Hence, the use of the Łukasiewicz implication makes it possible to formulate a weakened version of the law of transitivity: (P2).
12 INTERPRETATION AND JUSTIFICATION
While some scholars of the philosophical foundations of logic criticised many-valued constructions, others tried to find convincing justifications for many-valuedness. The most essential in the two categories were arguments concerning Łukasiewicz logics and the problem of a justified interpretation of non-orthodox logical values in general. Below we review and discuss some arguments for and against multiplying logical values.
Three-valued Łukasiewicz logic
Łukasiewicz's [1920] explanation of the logical value 1/2, resorting to “future contingents” and a “possibility” or indeterminacy of the 0-1 status of propositions, was criticized on several occasions. As we have already mentioned, the first blow was inflicted by Gonseth, who in 1938 noticed (see Gonseth [1941]) that the formal characterization of the connectives in Łukasiewicz logic is incompatible with the suggested ways of interpreting the third logical value. The argumentation of Gonseth is sound and straightforward: whenever α is undetermined, so is ¬α, and then α ∧ ¬α is undetermined. That contradicts our intuition since, independently of α's content, α ∧ ¬α is false. The upshot is that Łukasiewicz's interpretation neglects the mutual dependence of some “possible” propositions.
Haack [1978] analyses Łukasiewicz's way of avoiding the fatalist conclusion derived from the assumption that the contingent statement “I shall be in Warsaw at noon on 21 December of the next year” is either true or false in advance of the event. She remarks that this way of rejecting bivalence is wrong, since it depends on a modal fallacy of arguing from “It is necessary that (if a, then b)” to “If a, then it is necessary that b”.
Urquhart [1986] sees the third logical value as the set {0,1} of two “potential” classical values of a future contingent sentence and defines the implication as taking all possible values of implication. Thus, the implication having 0 as antecedent always takes the value 1, the implication from 1 to {0,1} takes {0,1}, and the implication from {0,1} to {0,1} has the value {0,1} (here 1/2 stands for {0,1}; rows give the antecedent, columns the consequent):

 →  |  0    1/2   1
----+----------------
 0  |  1     1    1
1/2 | 1/2   1/2   1
 1  |  0    1/2   1

The last point is inconsistent with the Łukasiewicz stipulation, since the output of 1/2 → 1/2 had to be 1. Therefore, Urquhart claims, the Łukasiewicz table is wrong. It may be of interest that the connective obtained by Urquhart is the Kleene strong implication.
Reichenbach [1944] argued that the adoption of three-valued logic would provide a solution to some problems raised by quantum mechanics. For the purpose of avoiding “causal anomalies”, Reichenbach presents an extended version of the Łukasiewicz logic, adding further negation and implication connectives. He refers to the third logical value as “indeterminate” and assigns it to anomalous statements of quantum mechanics. The weak point of Reichenbach's proposal is that certain laws, such as e.g. the principle of energy, are also classified as “indeterminate”.
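Urquhart's reading of the third value as the set {0, 1} of potential classical values, discussed above, lends itself to a direct computation: the value of an implication is the set of all classical results obtainable from the potential values of its arguments. The names below are hypothetical and the sketch covers implication only.

```python
from itertools import product

def classical_imp(x, y):
    """Classical implication on the values 0 and 1."""
    return max(1 - x, y)

def set_imp(X, Y):
    """Implication lifted to sets of potential classical values."""
    return frozenset(classical_imp(x, y) for x, y in product(X, Y))

F = frozenset({0})        # definitely false
T = frozenset({1})        # definitely true
U = frozenset({0, 1})     # future contingent: both classical values remain possible

# The full table, row by row (antecedent F, U, T; consequent F, U, T).
TABLE = {(a, b): set_imp(a, b) for a in (F, U, T) for b in (F, U, T)}
```

The resulting table coincides with the Kleene strong implication; in particular `set_imp(U, U)` is U rather than T, which is the point at which this reading departs from the Łukasiewicz table.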
Temporal interpretation
Urquhart [1973] provides a very interesting interpretation of the values of finite-valued propositional logics. He takes the set Sn = {0, 1, ..., n − 2} and considers the relation ⊢ between numbers and formulas: ⊢ ⊆ Sn × For. Urquhart generally assumes that
(Tr) If x ⊢ α and x ≤ y ∈ Sn, then y ⊢ α,
and adapts ⊢ to particular logics, thus specifying n, the language, and providing recursive conditions which establish the meaning of connectives. Accordingly, in each of the cases considered we are dealing with some Kripke-style semantics: Kn = (Sn, ≤, ⊢). A formula α is Kn-true iff it is true at the point 0, i.e. provided that 0 ⊢ α. Kn is a semantics of the system determined by a given matrix M when the set of all Kn-true formulas is equal to the content of M, i.e. when E(M) = {α ∈ For : 0 ⊢ α}. Urquhart provides the semantics for the n-valued logics of Łukasiewicz and Post, and for the three-valued Bochvar system. For Łukasiewicz calculi ⊢ has to satisfy the conditions:
x ⊢ α → β   iff   y ⊢ α yields x + y ⊢ β whenever x + y ∈ Sn
x ⊢ ¬α      iff   (n − 2) − x ⊬ α
x ⊢ α ∨ β   iff   x ⊢ α or x ⊢ β
x ⊢ α ∧ β   iff   x ⊢ α and x ⊢ β
x ⊢ α ≡ β   iff   x ⊢ α → β and x ⊢ β → α.
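The clauses for → and ¬ can be turned into a small recursive evaluator and compared with the Łukasiewicz matrix. The sketch makes several assumptions that go beyond the text: the clause for negation is read as "x ⊢ ¬α iff α is not true at (n − 2) − x", a variable p is interpreted by an upward-closed set of points (as (Tr) requires), the matching matrix value of p is taken to be 1 − m/(n−1) where m is the least point at which p holds (m = n − 1 if there is none), and formulas are encoded as nested tuples. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def forces(x, formula, atoms, n):
    """x ⊢ formula in K_n = (S_n, ≤, ⊢) with S_n = {0, ..., n−2}."""
    if isinstance(formula, str):
        return x in atoms[formula]
    if formula[0] == "imp":
        _, a, b = formula
        return all(not forces(y, a, atoms, n) or forces(x + y, b, atoms, n)
                   for y in range(n - 1) if x + y <= n - 2)
    if formula[0] == "not":
        return not forces((n - 2) - x, formula[1], atoms, n)
    raise ValueError(f"unknown connective: {formula[0]}")

def value(formula, assignment):
    """Value of the formula in the Łukasiewicz matrix M_n."""
    if isinstance(formula, str):
        return assignment[formula]
    if formula[0] == "imp":
        return min(Fraction(1),
                   1 - value(formula[1], assignment) + value(formula[2], assignment))
    if formula[0] == "not":
        return 1 - value(formula[1], assignment)
    raise ValueError(f"unknown connective: {formula[0]}")

def up_set(threshold, n):
    """Upward-closed subset {threshold, ..., n−2} of S_n (empty if threshold = n−1)."""
    return set(range(threshold, n - 1))
```

With atoms built by `up_set(m, n)` and the corresponding assignment 1 − m/(n−1), one finds on sample formulas that `forces(0, φ, atoms, n)` holds exactly when `value(φ, assignment) == 1`, in line with the requirement E(M) = {α ∈ For : 0 ⊢ α}.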
It is not hard to notice that Scott's valuations³³ in VŁ∗n can be “translated” into instances of ⊢ according to the equivalence: i ⊢ α if and only if vi(α) = t. The semantics for the (¬, ∨)-variant of Post propositional logic is established through the conditions:
x ⊢ ¬α      iff   y ⊢ α for no y ∈ Sn, or there is a y ∈ Sn such that y < x and y ⊢ α
x ⊢ α ∨ β   iff   x ⊢ α or x ⊢ β.
³³ See Section 11.
To “reference points” x ∈ Sn, several meanings may be attached. For Łukasiewicz and Post logics Urquhart suggests a temporal interpretation: 0 being the present moment and x ≠ 0 a future moment, “x ⊢ α” reads “α being true at (the moment) x”. It is worth noting that the assumption (Tr) guarantees that any proposition true at x is also true at every moment y future to x. That obviously means that in the framework elaborated, propositions are treated as temporally definite units and, as such, they must not contain any occasional, time-dependent expressions (such as e.g. “now”, “today” etc.). It may be appropriate, perhaps, to add that even the very originators of many-valued logics, while using occasional words in their examples, usually have in mind temporally definite marks of reference. Under the above interpretation the Łukasiewicz implication α → β is true at x if and only if the truth of α at y yields β being true at x + y, i.e. at the future moment y time-units distant from x. In turn, the Łukasiewicz negation ¬α is true at x if and only if α is false at (n − 2) − x, i.e. at the moment x time-units back from n − 2 (the last moment in Sn). Urquhart suggests that such a way of understanding exhibits the sources of the difficulties in getting a plainly intuitive interpretation of many-valued Łukasiewicz logics, and he claims that the “natural” connectives of implication and negation should rather satisfy the conditions:
x ⊢ α → β   iff   for any y ∈ Sn (y ⊢ β whenever x ≤ y and y ⊢ α),
x ⊢ ¬α      iff   y ⊢ α for no y ∈ Sn.
Urquhart’s interpretation of Post logics is, as easily seen, entirely compatible with the original interpretation envisaged by Post himself.
Set theory and many-valued logic
Russell's paradox³⁴ leads to a more general question, namely, whether the Comprehension Axiom can be adopted in set theory:
(AC) The propositions of the form ∃x∀y(y ∈ x ≡ Φ(y)), where Φ(y) is a formula containing y, are true.
The presence of (AC) signifies that every formula defines a certain set or, more concisely, that for any property a set of objects bearing that property can be chosen. The discovery of Russell excludes the acceptance of (AC) in set theory based on classical logic. Hence, the only way to construct a set theory preserving the Comprehension Principle is to change its logic. The suggestion of Bochvar may obviously be conceived as a step in this direction. However, it can hardly be accepted as satisfactory: though it makes it possible to classify some formulas (the formula defining the Russell set included) as senseless, it simultaneously commits one to a very embarrassing distinction between two categories of propositions. Next, as it soon turned out, the three-valued and, more generally, all finite-valued Łukasiewicz logics cannot seriously be taken into consideration either. Moh Shaw-Kwei [1954] provided the following method of construction of “undesirable” sets in Łn: for a given finite n ≥ 2 we put Zn = {x : x ∈ x →n p}.³⁵ The set Zn is antinomial, since the following absorption rule
(absn) α →n β / α →n−1 β
is a rule of Łn (equivalently, (α →n β) → (α →n−1 β) ∈ E(Mn)). The assumption Zn ∈ Zn implies that Zn ∈ Zn →n p. Thus, after (n − 1)-ary application of (absn) we get Zn ∈ Zn → p and, finally, as a result of the detachment (i.e. application of MP), p.
³⁴ See Section 1.
As absorption rules (absn) are not rules of infinite-valued Łukasiewicz logics, much attention was given to the possibility of founding set theory with (AC) on these logics. Skolem [1957] put forward the hypothesis that the proposition (AC) was consistent³⁶ in ℵ1-valued Łukasiewicz logic (or, more accurately, in the predicate calculus with the predicate ∈). Up till now Skolem's hypothesis has been only partly supported. Using advanced proof theory techniques and applying Brouwer's Fixed Point Theorem (for the n-dimensional cube), Skolem showed that the set of formulas of the form (s1)
∀x₁ … ∀xₙ ∃y ∀t (t ∈ y ≡ U(t, y, x₁, …, xₙ)),
with U(t, y, x₁, …, xₙ) being a quantifier-free formula whose free variables are at most t, y and x₁, …, xₙ, is consistent in ℵ₁-valued Łukasiewicz logic.³⁷ Skolem’s result was extended by Chang and Fenstad, who applied his method of proof. Chang [1963] showed that the assumption of the absence of quantifiers in the formulas U(t, y, x₁, …, xₙ) can be removed under the condition that bound variables in U appear in atomic formulas u ∈ w only in the second position. Chang also proved that in Lℵ₁ any formula (c)
∃x∀y(y ∈ x ≡ Φ(y)),
(compare (AC)), where Φ(y) is a formula with one free variable y, is consistent. Fenstad [1964] obtained a similar result in this direction: he showed that the set of Skolem’s formulas is consistent (in ℵ₁-valued logic) under the assumption that the free variable t takes only the place of w in atomic formulas u ∈ w.

³⁵ p stands for any formula inconsistent in Lₙ; α →ₙ β is an “ascending” implication α → (α → (… → (α → β) …)) with n − 1 antecedents α.
³⁶ α is consistent in the Predicate Calculus iff there exists an interpretation f such that f(α) = 1. In Łukasiewicz logic the concept of interpretation is defined according to the pattern of Section 6 applied in Lℵ₁.
³⁷ The set X of closed (i.e. having no free variables) formulas is consistent iff there exists an interpretation f sending all formulas of X into true propositions, i.e. such that f(X) ⊆ {1}.

All this shows that the question of unlimited consistency of the Comprehension Axiom in many-valued logics still remains open. It obviously leaves room for several suppositions. For example, it can seem unnatural that Skolem attacked the problem directly in Lℵ₁ and not in the logic of countably many values. It should immediately be noticed that any endeavour to get a consistency proof of (AC) in Lℵ₀ would have to be connected with working out a new method: Brouwer’s Fixed Point Theorem does not hold for the set of rational numbers of the interval [0,1].

In relation to the problems discussed, a recent result by Hájek et al. [2000] on the Liar paradox and the theory of dequotation should be mentioned. Hájek shows the consistency of the theory of arithmetic with the truth predicate over the infinite-valued Łukasiewicz predicate logic. To this aim, he considers the axiom of dequotation (DA) ϕ ≡ Tr(ϕ̄), where Tr is a unary truth predicate in the language of Peano arithmetic PA extended with Tr, and ϕ̄ is the Gödel number of ϕ. As is known, adding (DA) to the axioms of PA founded on classical logic leads to a contradiction, since one may construct a formula α such that α ≡ ¬Tr(ᾱ) and thus prove α ≡ ¬α. Taking into account the fact that the last formula is not inconsistent in Łukasiewicz logic, Hájek showed that, when formalized in the Łukasiewicz predicate logic, PA + DA is consistent. The proof of the claim uses nonstandard models of the structure N of natural numbers with zero, successor, addition and multiplication. Hájek constructs a formula that over N behaves as a formula saying “I am at least a little false”.
13 MODES OF MANY-VALUEDNESS
There are two main approaches to logic: one related to the notion of valid or tautological formulas, and the other using the consequence relation. Depending on the choice, we may therefore speak about two different kinds of many-valuedness. Taking this into account, we get rid of the problem of a possible formulation of the status of many-valuedness. Next to these two notions, we introduce a third concept, inferential many-valuedness, stemming from a natural generalization of the standard consequence relation.
Two kinds of many-valuedness

A system of logic determined by a matrix M for the standard language Lk is tautologically many-valued whenever E(M) does not coincide with the set of classical tautologies TAUT, i.e. if E(M) ≠ TAUT. The 1918 Łukasiewicz example of such a matrix is

M₃ = ({0, 1/2, 1}, ¬, →, ∨, ∧, ↔, {1}),

where ¬x = 1 − x, x → y = min(1, 1 − x + y) etc. It is, however, fairly simple to construct a matrix for Lk operating on {0, 1/2, 1} whose content is TAUT and which thus defines two-valued logic. Note that to this aim it suffices to enlarge the set of distinguished values by adding 1/2. Thus, the matrix

M₃* = ({0, 1/2, 1}, ¬, →, ∨, ∧, ↔, {1/2, 1})

defines the tautologically two-valued logic.

When a logic is understood as a consequence relation |=M, then it is many-valued, or c-many-valued, if |=M is different from the classical consequence relation |=₂. Since E(M) = {α ∈ For : ∅ |=M α}, every tautologically many-valued logic is also c-many-valued. Notice that the Łukasiewicz logic may still serve as an example: |=₃ ≠ |=₂. This, however, cannot be reversed. There are few examples of logics which are tautologically two-valued but c-many-valued. Consider e.g. the matrix M for Lk for which the connectives are defined by the following truth tables:

x    0  t  1
¬x   1  1  0

→    0  t  1
0    1  1  0
t    1  1  0
1    1  1  1

∧    0  t  1
0    0  0  0
t    0  0  0
1    0  0  1

∨    0  t  1
0    0  0  1
t    0  0  1
1    1  1  1

≡    0  t  1
0    1  1  0
t    1  1  0
1    0  0  1
Note that E(M) = TAUT, but |=M ≠ |=M₂, since the rule of Modus Ponens fails for |=M, i.e. {α → β, α} |=M β does not hold. The above distinction between the two notions of many-valuedness is particularly important when the set of tautologies of a given logic is empty. We face this situation with Kleene and Bochvar logics. The matrix of the weak Kleene (internal Bochvar) logic is K₃ = ({f, u, t}, ¬, →, ∨, ∧, ≡, {t}), with the first set of operations. As already stated, this logic is nontautological, E(K₃) = ∅. However, Kleene logic is non-trivial, since the consequence |=K₃ consists of some special rules of classical logic:

X |=K₃ α if and only if X |=₂ α and Var(α) ⊆ Var(X),³⁸

for any classically consistent X ⊆ For, i.e. such that h(X) ⊆ {1} for some interpretation h ∈ Hom(L, A₂).

³⁸ Var(α), Var(X) are the sets of variables appearing in α and in all formulas in X, respectively.
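The variable-inclusion condition on |=K₃ can be checked by brute force on small examples. The following is only a sketch: it assumes the usual weak Kleene reading of the connectives (any argument u makes the result u, otherwise the connective is classical), since the tables themselves are not reproduced in this section; the helper names `weak`, `entails` and the sample formulas are illustrative.

```python
from itertools import product

F, U, T = "f", "u", "t"

def neg(x):
    # Weak Kleene negation: u stays u, otherwise classical.
    return U if x == U else (T if x == F else F)

def weak(op):
    """Lift a classical binary connective to weak Kleene (u is infectious)."""
    def connective(x, y):
        if U in (x, y):
            return U
        return T if op(x == T, y == T) else F
    return connective

conj = weak(lambda a, b: a and b)
disj = weak(lambda a, b: a or b)
imp = weak(lambda a, b: (not a) or b)

def entails(premises, conclusion, variables, values=(F, U, T), designated=(T,)):
    """Matrix consequence: every valuation designating all premises designates the conclusion."""
    for combo in product(values, repeat=len(variables)):
        h = dict(zip(variables, combo))
        if all(prem(h) in designated for prem in premises) and conclusion(h) not in designated:
            return False
    return True

# Sample formulas, written as functions of a valuation h.
p = lambda h: h["p"]
q = lambda h: h["q"]
p_or_q = lambda h: disj(h["p"], h["q"])
p_and_q = lambda h: conj(h["p"], h["q"])
p_imp_q = lambda h: imp(h["p"], h["q"])
excluded_middle = lambda h: disj(h["p"], neg(h["p"]))

no_tautology = entails([], excluded_middle, ["p"])                    # K3: p v ~p is not a tautology
addition_k3 = entails([p], p_or_q, ["p", "q"])                        # q does not occur in the premise
addition_classical = entails([p], p_or_q, ["p", "q"], values=(F, T))  # the same rule, classically
shared_vars = entails([p_and_q], p_or_q, ["p", "q"])                  # Var(conclusion) inside Var(premises)
mp_k3 = entails([p_imp_q, p], q, ["p", "q"])                          # Modus Ponens in K3
```

Under these assumptions the rule p / p ∨ q is classically valid but fails in K₃ (take h(p) = t, h(q) = u), while p ∧ q / p ∨ q and Modus Ponens survive, in agreement with the stated condition Var(α) ⊆ Var(X).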
Multiple-element models of two-valued logic

The use of logical matrices is undoubtedly the most natural way of achieving many-valuedness, i.e. a consequence relation different from |=₂. We have already discussed two cases of getting a genuine logic of this kind. However, taking a multiple-element matrix as a base for the logical construction does not guarantee its many-valuedness. And, on the other hand, there are different kinds of that property. Consider, for instance, the matrix W₃ = ({0, t, 1}, ¬, →, ∨, ∧, ≡, {t, 1}), with the operations given by the following tables:

x    0  t  1
¬x   1  0  0

→    0  t  1
0    1  0  0
t    t  t  t
1    1  t  1

∧    0  t  1
0    0  0  0
t    0  t  t
1    0  t  1

∨    0  t  1
0    0  t  1
t    t  t  1
1    1  1  1

≡    0  t  1
0    1  0  0
t    0  t  t
1    0  t  1
Notice that to every h ∈ Hom(L, W₃) there corresponds, in a one-to-one way, a valuation h* ∈ Hom(L, M₂) such that hα ∈ {t, 1} iff h*α = 1. Therefore |=W = |=M₂, and W₃ is nothing more than a three-valued model of the two-valued logic.³⁹ The last, somewhat striking, case is when a multiple-element matrix retains all classical tautologies, i.e. its content coincides with TAUT, but its consequence relation differs from the classical one by some rules of inference. The matrix K₃* = ({f, u, t}, ¬, →, ∨, ∧, ≡, {u, t}), which is like the Kleene-Bochvar matrix K₃ but has two designated elements, u and t, has this property. Its consequence operation |=₃* falsifies MP, since the inference {p → q, p} |=₃* q does not hold.
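The failure of MP in K₃* can be located by searching the nine value pairs for p and q. The sketch below again assumes the usual weak Kleene implication (u is infectious, otherwise classical); the function name `mp_counterexamples` is illustrative.

```python
from itertools import product

F, U, T = "f", "u", "t"

def imp(x, y):
    # Weak Kleene implication (assumed): u if either argument is u, else classical.
    if U in (x, y):
        return U
    return F if (x == T and y == F) else T

def mp_counterexamples(designated):
    """Value pairs (p, q) designating both p -> q and p but not q."""
    return [(p, q) for p, q in product((F, U, T), repeat=2)
            if imp(p, q) in designated and p in designated and q not in designated]

k3 = mp_counterexamples({T})            # designated set of K3
k3_star = mp_counterexamples({U, T})    # designated set of K3*
```

With {t} designated there is no counterexample; with {u, t} designated the only one is h(p) = u, h(q) = f, where both premises take the designated value u and the conclusion is f.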
³⁹ Similar n-element models (any n ≥ 2) of classical logic may be provided using matrices having standard connectives described in Section 7.

Inferential many-valuedness

Suszko [1977] stated that any matrix consequence and, therefore, every structural consequence relation⁴⁰ may be described using 0-1 valuations, and thus that every logic is logically two-valued. The idea that shifted logical values over the set of matrix values refers to the division of the matrix universe into two subsets, of designated and undesignated elements, and uses the characteristic functions of the set of designated elements D as logical valuations.

⁴⁰ See Section 11.

The question whether many-valuedness of that kind is possible at all led Malinowski [1990] to the next mode of logical many-valuedness, more precisely three-valuedness, being a property of a natural consequence-like approach. The point of departure is a division of the matrix universe into three subsets: rejected elements, accepted elements and all other elements. On such grounds it was possible to define a relation that is a formal counterpart of reasoning admitting rules of inference which lead from non-rejected assumptions to accepted conclusions, see Malinowski [1990]. The relation was then called, somewhat inaccurately, a q-consequence. In the sequel, we shall use the term inference instead.

An inference matrix for L is a triple M* = (A, D*, D), where D*, D are disjoint subsets of rejected and accepted elements, respectively. |=M* is said to be a matrix inference of M* if for any X ⊆ For, α ∈ For:

X |=M* α iff for every h ∈ Hom(L, A) (hα ∈ D whenever hX ∩ D* = ∅).

According to this, α is inferred from the set of premises X whenever it is the case that if all premises are not rejected, then α is accepted. Thus, the logical inference runs from non-rejected premises to accepted conclusions. There are non-trivial reasons for which considering such inference relations is but a theoretical enterprise, see Malinowski [1990; 1998].

Obviously, with each relation |=M* one may associate the operation Wn_M* : 2^For → 2^For putting Wn_M*(X) = {α : X |=M* α}. Notice that when D* ∪ D = A, Wn_M* coincides with the consequence Cn_M determined by the matrix M = (A, D). In other cases the two operations differ from each other; to see this, consider any inference matrix of the form ({e₁, e₂, e₃}, f₁, f₂, ..., fₙ, {e₁}, {e₃}). The inferential framework just introduced is a natural generalization of the standard one. So, when D* ∪ D = A, all concepts reduce to their standard counterparts.
The inference |=M* coincides with the matrix consequence |=M, since D* and D are complementary. Accordingly, the inference becomes the relation of consequence. It is easy to observe that for any inference matrix M* for which D* ∪ D ≠ A no class TV of functions t : For → {0, 1} exists such that for all X ⊆ For and α ∈ For,

X |=M* α iff for each t ∈ TV (t(X) ⊆ {1} implies tα = 1).

Thus, some “proper” Wn_M* are not logically two-valued in the sense of Suszko. Now, for every h ∈ Hom(L, A) we define a three-valued function k_h : For → {0, 1/2, 1} putting

k_h(α) = 1      if h(α) ∈ D
k_h(α) = 1/2    if h(α) ∈ A − (D* ∪ D)
k_h(α) = 0      if h(α) ∈ D*.
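A matrix inference and its three-valued description through k_h can be compared directly on a small case. The sketch below makes several assumptions that are not in the text: the algebra is the three-element Łukasiewicz one, the proper inference matrix has D* = {0} and D = {1}, and the names `q_infers`, `k` and `q_infers_via_k` are illustrative.

```python
from fractions import Fraction
from itertools import product

ZERO, HALF, ONE = Fraction(0), Fraction(1, 2), Fraction(1)
A = (ZERO, HALF, ONE)

def imp(x, y):
    # Lukasiewicz implication, used here only as a sample operation of the algebra.
    return min(ONE, 1 - x + y)

def valuations(variables):
    for combo in product(A, repeat=len(variables)):
        yield dict(zip(variables, combo))

def q_infers(premises, conclusion, variables, rejected, accepted):
    """X |= alpha: whenever no premise is rejected, the conclusion is accepted."""
    return all(conclusion(h) in accepted
               for h in valuations(variables)
               if all(prem(h) not in rejected for prem in premises))

def k(value, rejected, accepted):
    """Three-valued logical value of a matrix value: 1 accepted, 0 rejected, 1/2 otherwise."""
    if value in accepted:
        return ONE
    if value in rejected:
        return ZERO
    return HALF

def q_infers_via_k(premises, conclusion, variables, rejected, accepted):
    """The same relation, phrased with k_h: premises in {1/2, 1} force conclusion 1."""
    return all(k(conclusion(h), rejected, accepted) == ONE
               for h in valuations(variables)
               if all(k(prem(h), rejected, accepted) in (HALF, ONE) for prem in premises))

p = lambda h: h["p"]
q = lambda h: h["q"]
p_imp_q = lambda h: imp(h["p"], h["q"])

proper = dict(rejected={ZERO}, accepted={ONE})            # D* u D != A
standard = dict(rejected={ZERO, HALF}, accepted={ONE})    # D* u D = A

mp_proper = q_infers([p_imp_q, p], q, ["p", "q"], **proper)
mp_standard = q_infers([p_imp_q, p], q, ["p", "q"], **standard)
agree = all(
    q_infers(X, a, ["p", "q"], **m) == q_infers_via_k(X, a, ["p", "q"], **m)
    for m in (proper, standard)
    for X, a in (([p_imp_q, p], q), ([p], p), ([], p_imp_q), ([q], p_imp_q))
)
```

In this sample the two formulations agree, and Modus Ponens is an inference of the standard matrix but not of the proper one: h(p) = h(q) = 1/2 leaves both premises non-rejected while the conclusion is not accepted.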
Given an inference matrix M* for L, let KV_M* = {k_h : h ∈ Hom(L, A)}. Then

X |=M* α iff for every k_h ∈ KV_M*, if k_h(X) ⊆ {1/2, 1}, then k_h(α) = 1.

This is a kind of three-valued description of |=M*. Notice that KV_M* reduces to TV_M and |=M* to |=M when D* ∪ D = A. In Malinowski [1990] an inference operation of which Wn_M* is a prototype was introduced and studied. An operation W : 2^For → 2^For is an inference operation⁴¹ provided that for every X, Y ⊆ For

(W1) W(X ∪ W(X)) = W(X)
(W2) W(X) ⊆ W(Y) whenever X ⊆ Y.

W is called structural if for any substitution e ∈ End(L)

(S) eW(X) ⊆ W(eX).
Where M* is any inference matrix, Wn_M* is structural. In turn, all Lindenbaum’s tools may be adapted to structural inference operations W to exactly the same effect. Thus, the bundle of Lindenbaum’s inference matrices W_X = (For, For − (X ∪ W(X)), W(X)) may be used to prove, cf. Malinowski [1990], that for every structural inference operation W there is a class K of inferential matrices such that Wn_K(X) = ⋂{Wn_M*(X) : M* ∈ K}.
We conclude that each structural logic (L, W ) is logically two- or three-valued. A generalization of the inferential approach onto more values seems technically possible. It also seems, however, that a natural explanation of such an inferential device might be much more difficult to get, see Malinowski [2002].
⁴¹ Originally it was called a quasi consequence or a q-consequence.

Towards many-valuedness inference

Inferential three-valuedness discussed above is entirely consistent with the common understanding of a logical system as a set of formulas closed under substitutions, usually defined as the content of a logical matrix: for any inferential matrix M* = (A, D*, D) and a corresponding matrix M = (A, D),

Wn_M*(Ø) = Cn_M(Ø) = E(M).
This means that any logical system may equally well be extended to a two-valued logic (L, Cn_M) or to a three-valued logic (L, Wn_M*). Then, obviously, depending on the quality and cardinality of M the two extensions may define different logics. Perhaps the most striking is that even CPC, i.e. the content of the two-element matrix M₂, can also be extended to a three-valued inference. The inference matrix M₂* = (A₂, Ø, {1}) determines the operation Wn_M₂* such that Wn_M₂*(Ø) = E(M₂) = TAUT. It would be in order to add that the “inferential” part of this logic is, in some sense, uninteresting, since the class of non-axiomatic rules comprises only sequents X/α where α is a tautology, α ∈ TAUT.

The question of how the three-valued inference is characterized deductively, i.e. through a set of rules and an appropriate conception of proof, leads to interesting results. First, the proper notion of proof for the inference operation is the weakest possible when we retain the usual notion of a rule.⁴² To give an idea, let us mention that it differs essentially from the standard proof in exactly one point: the repetition rule (rep) {α/α : α ∈ For} is no longer unrestrictedly accepted as the postulate that each premiss (or assumption) in the proof is automatically accepted as a conclusion (one of the subsequent steps in the proof). Notice that in the standard framework the presence of the repetition rule is an immediate consequence of the methodological Tarski postulate (T0) and not of a separate declaration, i.e. it is not given as a rule per se. This is no longer true for the inference relation. The absence of (rep) in the general framework makes room for several modifications of inference and for getting infinitely many inferences which are still weaker than the consequence operation having the same system of theorems (tautologies) as its base. These may be obtained simply by adding only some instances of the rule. Thus, for instance, if W(Ø) is non-empty, then for α ∈ W(Ø), α ∈ W(α). That is due to the fact that then at least {α/α : α ∈ W(Ø)} is a rule of W. Perhaps, then, a slightly better way of providing a workable framework for obtaining logical many-valuedness would be to change the concept of a rule of inference.
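The status of the repetition rule in a matrix inference depends only on which values a formula can take. A minimal sketch of this observation follows; the function name `rep_instance_holds` and the way formulas are represented by their sets of possible values are assumptions made for illustration.

```python
def rep_instance_holds(possible_values, rejected, accepted):
    """alpha / alpha is an inference of (A, D*, D) iff every value alpha
    can take that is not rejected is accepted."""
    return all(v in accepted for v in possible_values if v not in rejected)

A2 = {0, 1}

# M2* = (A2, empty set, {1}): a propositional variable ranges over both values,
# a classical tautology takes the value 1 only.
variable_in_m2_star = rep_instance_holds(A2, rejected=set(), accepted={1})
tautology_in_m2_star = rep_instance_holds({1}, rejected=set(), accepted={1})

# The standard two-valued matrix, read as an inference matrix with D* = {0}.
variable_in_m2 = rep_instance_holds(A2, rejected={0}, accepted={1})
```

For M₂* the instance p/p fails, because the value 0 is neither rejected nor accepted, whereas α/α holds for every tautology α; once D* ∪ D = A the rule holds without restriction.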
⁴² See Malinowski [1990].

14 FUZZY SETS AND FUZZY LOGICS

Everyday reasoning operates on imprecise concepts and is supported by approximate inferences. This severely limits the possibility of applying the apparatus of classical logic to formalize it. Among the special tools extending the formalization power of the standard approach are fuzzy set theory and fuzzy logics. Our aim now is to account for one of the most interesting, but simultaneously most controversial, conceptions inspired by logical many-valuedness.
Fuzzy sets

Zadeh [1965] defines a fuzzy set A of a given domain U as an abstract object characterized by a generalized characteristic function U_A with values in the real interval [0,1]: U_A : U → [0, 1]. The values of U_A are interpreted as degrees of membership of elements of U in the fuzzy set A. The extreme values of this function, 0 and 1, denote, respectively, not belonging to A and full membership in A. Limiting the range of U_A to {0, 1} results in an “ordinary” characteristic function, and in this sense each “classical” set is a special case of a fuzzy set. Fuzzy sets are an instrument for modelling inexact predicates appearing in natural languages. Thus, for example, the property of “being much greater than 1” defined on the set of positive reals R+ can be assigned to a fuzzy set W with a non-decreasing characteristic function R_W : R+ → [0, 1] which meets conditions like: R_W(0) = 0, R_W(1) = 0, R_W(5) = 0.01, R_W(100) = 0.95, R_W(500) = 1 etc. Certainly, in the above example only the values R_W(0), R_W(1) are unquestionable, and the selection of the other values is somewhat arbitrary.

In the family F(U) of fuzzy (sub)sets of a given domain the relation of inclusion reflects the order between the reals:

A ⊆ B if and only if U_A(x) ≤ U_B(x) for any x ∈ U,

and the counterparts of the operations of complement (−), union (∪) and intersection (∩) are set by:

U_−A(x) = 1 − U_A(x)
U_A∪B(x) = max{U_A(x), U_B(x)}
U_A∩B(x) = min{U_A(x), U_B(x)}.

Bellman and Giertz [1973] showed that U_A∪B and U_A∩B are the unique nondecreasing continuous functions warranting both the compatibility of the construction with the standard algebra of sets and the fact that (F(U), ∪, ∩, −) is a de Morgan lattice if and only if U_−A is defined as above. It is in order to notice that, in spite of the naturalness of the proposal, several studies admit as (more) helpful fuzzy set algebras defined otherwise.⁴³

The values of generalized characteristic functions may be identified with logical values of propositions of the form “x ∈ X”, where ∈ is a “generalized” set-theoretical predicate. Subsequently, using logical constants of a base logic one may define the inclusion and the operations of fuzzy set algebra as:

A ⊆ B =df ∀x(x ∈ A → x ∈ B)
−A = {x : ¬(x ∈ A)}
A ∪ B = {x : x ∈ A ∨ x ∈ B}
A ∩ B = {x : x ∈ A ∧ x ∈ B}.

⁴³ See e.g. Zadeh [1976].
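Zadeh's operations are easy to try out on a finite domain. The sketch below represents a fuzzy set as a dictionary from elements to membership degrees; the domain, the two sample sets and the function names are illustrative assumptions, and exact fractions are used so that the equalities can be tested without rounding.

```python
from fractions import Fraction as Fr

def complement(a):
    return {x: 1 - m for x, m in a.items()}

def union(a, b):
    return {x: max(a[x], b[x]) for x in a}

def intersection(a, b):
    return {x: min(a[x], b[x]) for x in a}

def included(a, b):
    """A is included in B iff U_A(x) <= U_B(x) for every x of the domain."""
    return all(a[x] <= b[x] for x in a)

# Two hypothetical fuzzy subsets of the domain {"x1", "x2", "x3"}.
A = {"x1": Fr(0), "x2": Fr(1, 2), "x3": Fr(1)}
B = {"x1": Fr(1, 4), "x2": Fr(3, 4), "x3": Fr(1)}

a_in_b = included(A, B)
de_morgan = complement(union(A, B)) == intersection(complement(A), complement(B))
overlap = intersection(A, complement(A))   # not empty: x2 belongs to it to degree 1/2
```

The de Morgan law holds, as the de Morgan lattice structure requires, while A ∩ −A is not the empty set, which shows that the algebra is not Boolean.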
For Zadeh’s algebra the choice of a base logic is to a great extent predetermined: this logic must be based on an ℵ₁-element matrix wherein negation is expressed by the function 1 − p, disjunction and conjunction, respectively, by max{p, q} and min{p, q}, and the universal quantifier as the greatest lower bound (inf). The function of implication is not uniquely determined. However, evidently it should meet the requirement:

(.) If p → q = 1, then p ≤ q.

Though Łukasiewicz’s implication and, consequently, his ℵ₁-valued logic have been the most intensely applied, similar connectives of other logics have also been taken into account. The commonly shared belief among scholars working on fuzziness, both theoreticians and practitioners, is that only a concrete application of fuzzy set algebra can decide about the final form of the base logic (see Gaines [1976a]).
Reasoning and inexact predicates

In the initial applications of fuzzy set theory much attention was focused on the analysis of reasoning using inexact predicates, standard propositional connectives and quantifiers. Paradoxes proved remarkably susceptible to such experiments, and their “successful” analysis consolidated the motivational layer of the conception of fuzziness. This account yielded the first understanding of the term “fuzzy logic” as a certain class of many-valued logics with uncountably many values, with Łukasiewicz logic in the foreground. Goguen’s [1969] analysis of the classical paradox of a bald man may serve as an example. Intuitively, we would be ready to accept the two following propositions:

(z1) A man with 20,000 hairs on his head is not bald.
(z2) A man who has one hair less than somebody who is not bald is not bald as well.
So, applying the Detachment Rule 20,000 times, we get the conclusion that a man with no hair is not bald either. Naturally, the paradox stems from (z2), and more specifically from the inexactness of the predicate “bald” or, equivalently, “not-bald”. The paradox vanishes when the logical value of the proposition “A man with n hairs is not bald” is identified with the degree of membership of a man with n hairs in the fuzzy set “not-bald”, since then (z2) would have a logical value less than 1, say 1 − ε, where ε > 0. If, for instance, we use Łukasiewicz implication in the base logic, then as a result of 20,000 derivations we obtain a proposition whose logical value amounts to 1 − 20,000ε, thus practically false.

The development of fuzzy set theory has surpassed all expectations. Almost all important concepts of set theory, topology, algebra and probability calculus have been adapted for its use. It is remarkable that it was the applications in Computer Science and control theory that gave an impetus to the most extensive development of the theory: they have confirmed the usefulness of the unconventional methodology worked out by means of the fuzzy set conception (see e.g. Gottwald [1981]).
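The arithmetic behind the analysis of the bald man paradox can be replayed step by step. The sketch below assumes that each detachment assigns to the conclusion the least value compatible with the premises under Łukasiewicz implication, that is max(0, a + c − 1) for a premise of value a and an implication of value c; the function names and the particular values of ε are illustrative.

```python
from fractions import Fraction

def luk_detach(premise_value, implication_value):
    """Least value of the conclusion, given the values of alpha and alpha -> beta."""
    return max(Fraction(0), premise_value + implication_value - 1)

def chain(start, implication_value, steps):
    """Apply detachment `steps` times, starting from a premise of value `start`."""
    value = start
    for _ in range(steps):
        value = luk_detach(value, implication_value)
    return value

STEPS = 20_000
classical = chain(Fraction(1), Fraction(1), STEPS)                      # (z2) fully true
fuzzy = chain(Fraction(1), 1 - Fraction(1, 20_000), STEPS)              # epsilon = 1/20000
half_way = chain(Fraction(1), 1 - Fraction(1, 40_000), STEPS)           # epsilon = 1/40000
```

With (z2) fully true the conclusion keeps the value 1, which is the paradox; with ε = 1/20,000 the value drops to 1 − 20,000ε = 0, and with ε = 1/40,000 it stops at 1/2.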
Fuzzy logic proper

Zadeh’s [1975] fuzzy logic is a method of modelling imprecise reasoning operating on imprecise concepts and rules of approximate reasoning. Its construction conveyed the belief that thinking in terms of fuzzy sets is a typical feature of human perception. Examples of reasoning whose analysis might be possible owing to fuzzy logic are such “inferences” as:

Putin is very healthy
Healthy men live a very long time
----------
Putin will live a very long time

Bill likes women that are tall and wicked
Monica is not very tall but very wicked
----------
Bill will probably like Monica.
Fuzzy logic seeks to formulate several rules of approximate inference. For this purpose it attempts to formalize the colloquial linguistic usage of certain “hedges” applied to imprecise concepts, such as “very”, “more or less”, “too” etc. Zadeh’s logic is a two-level semantical construction allowing the fuzziness of predicates, their hedges and logical values. Its central elements are:

(1) A denumerable set TV of linguistic logical values generated by its element “true” with the help of the hedge “very” and logical connectives.
(2) Hedges of predicates and logical values, “very” being a special one.
(3) The procedure of linguistic approximation compensating for the lack of closure of the (object) language and of the set TV under logical connectives.

Fuzzy logic is based on ordinary ℵ₁-valued logic with the connectives ¬, →, ∨, ∧, ≡ and values in [0,1]. It identifies predicates with fuzzy subsets of a given universe and logical values with fuzzy subsets of the interval [0,1]. The most frequently used and discussed is the “fundamental” system FL obtained on Łukasiewicz Lℵ₁ (see Bellman and Zadeh [1977]). We shall confine our further considerations to some aspects of this construction. It is assumed that the set TV of linguistic logical values of FL is a set of the form:

TV = { true, false, not true, not false, very true, more or less true, rather true, not very true, not very false, . . .}
where “true” is a fixed fuzzy subset of [0,1], “very” a fixed hedge, and all other elements are defined through “true”, “very” and (operations determined by) Łukasiewicz’s connectives. Obviously, the “names” of all logical values are conventional labels and their proper meanings follow from characteristic functions.

Hedges are one-argument operations sending fuzzy sets to fuzzy sets. The most basic (primitive) one is the hedge reflecting the adverb “very”, denoted as g. Zadeh [1972] supposes

(g) U_gA(x) = (U_A(x))²

and suggests that other hedges should be defined as superpositions of g and connectives of the base logic. The most important “derived” operator d is described as follows:

(d) U_dA(x) = (U_A(x))^0.5.

Zadeh urges that “more or less” is a linguistic counterpart of d. In the relevant literature g and d are called standard hedges, and expressibility by their use has become a criterion of hedge definability. The procedure of linguistic approximation stems from the lack of closure of TV under logical connectives. More specifically, if U_true is the characteristic function of the value labelled “true” then, of course, all other elements of that set are thereby specified. Thus, for example, the initial values of TV are defined by the following functions:

U_false(x) = U_true(1 − x)
U_not true(x) = 1 − U_true(x)
U_very true(x) = (U_true(x))²
U_more or less true(x) = (U_true(x))^0.5,
(x ranges over the set of values of the base logic, x ∈ [0, 1]). Consequently, all linguistic logical values depend on the (subjective) choice of the meaning of “true”. Zadeh calls this feature localization and the elements of TV local values. The systems of FL thereby obtained as a result of localization are called local logics. For those logics there is a common way of defining logical connectives; it consists in identifying them with the operations of the algebra of fuzzy subsets of [0,1] and thus, in a sense, with the connectives of the base logic. A linguistic approximation is a heuristic procedure assigning linguistic logical values to propositions. For obvious reasons it is impossible to specify even a general principle of that procedure; what can be said is merely that in a concrete application of a fuzzy logic it consists in searching for the closest value from TV₀ for a statement. As has already been mentioned, fuzzy logic aims at formulating the rules of approximate reasoning (or inference). The basic, as well as standard, rule of that kind is the Compositional Rule of Inference
u₁ is F
u₁ and u₂ are G
----------
u₂ is LA{F ∗ G}

u₁ and u₂ are F
u₂ and u₃ are G
----------
u₁ and u₃ are LA{F ∗ G}
where u₁ and u₂ are objects, F and G predicates (properties or relations) and ∗ an operation of relational composition; in the former case ∗ is defined as follows: U_F∗G(u₂) = sup_x (U_F(x) ∧ U_G(x, u₂)), and LA{F ∗ G} is a linguistic approximation to the (unary) fuzzy relation F ∗ G. The following is an exemplification of the first scheme:

a is a small number
a and b are approximately equal
----------
b is a more or less small number

In more complicated rules, the premisses are compound statements of several degrees of complexity, e.g. if u₁ is F, then u₂ is G, and may be quantified by fuzzy quantifiers, i.e. expressions like most, many, several, few etc. Finally, in the formulation of other rules, e.g. the Rule of Compositional Modus Ponens, special operations on fuzzy sets have to be used. In practical applications of fuzzy inference rules the first step consists in assigning to fuzzy predicates fuzzy subsets of a given universe; in the case of (non-unary) relations, fuzzy subsets of the relevant Cartesian products of their domains. This procedure, often referred to as fuzzy restriction, obviously exceeds logic. Nonetheless, within the scope of a logic it is possible to formulate general principles of restriction resulting from the peculiarity of formal devices. Zadeh’s “fuzzy” conception has found its place among the accepted methods of Artificial Intelligence. It holds its ground owing to reliable applications, e.g. in medical diagnosis, see Turner [1984].
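The standard hedges and the first scheme of the Compositional Rule of Inference can be illustrated on a finite universe. In the sketch below the universe {1, 2, 3, 4}, the membership degrees of “small” and “approximately equal” and the function names are all invented for the illustration; ∧ is read as min and the supremum over a finite universe is a maximum. The linguistic approximation step LA is not modelled.

```python
import math

def very(a):
    """Hedge g: squares every membership degree."""
    return {x: m * m for x, m in a.items()}

def more_or_less(a):
    """Hedge d: takes the square root of every membership degree."""
    return {x: math.sqrt(m) for x, m in a.items()}

def compose(f, g, targets):
    """U_{F*G}(u2) = sup_x min(U_F(x), U_G(x, u2)) on a finite universe."""
    return {u2: max(min(f[x], g[(x, u2)]) for x in f) for u2 in targets}

UNIVERSE = (1, 2, 3, 4)

# Hypothetical membership degrees for "small" and "approximately equal".
small = {1: 1.0, 2: 0.8, 3: 0.4, 4: 0.0}
closeness = {0: 1.0, 1: 0.5, 2: 0.0, 3: 0.0}
approx_equal = {(x, y): closeness[abs(x - y)] for x in UNIVERSE for y in UNIVERSE}

b_is = compose(small, approx_equal, UNIVERSE)   # fuzzy set inferred for b
candidate = more_or_less(small)                 # the label suggested in the text
```

Here the composition yields the degrees 1.0, 0.8, 0.5 and 0.4 for the numbers 1 to 4; which element of TV counts as the closest label for this result is exactly the heuristic step that the text leaves to the concrete application.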
15 RECENT DEVELOPMENTS
Algebraic and metamathematical studies of the infinite-valued Łukasiewicz logic, see Cignoli et al. [1999], are among the most important issues of recent investigations. Somewhat related to these studies are activities concerning “t-norm” based logics and delineating a class of propositional logics called by Hájek [1998] fuzzy logics (in a narrow sense). Further to this, at least two topics somewhat connected to the sciences of information should be mentioned: the lattices of truth and information, invented by Belnap [1977] and Ginsberg [1987], and the automatic deduction problems.
Truth functional fuzzy logics

The influence of fuzzy set theory initiated the study of a class of systems of many-valued logics whose semantics is based on the real interval [0,1]. Several comparisons between the systems serving as a base for particular constructions directed scholars’ attention to (possibly idempotent) strong conjunction connectives whose corresponding truth functions are associative, commutative, nondecreasing and have 1 as their neutral (unit) element. Such functions were called t-norms. Accordingly, a binary function ∗ on [0,1] is a t-norm (triangular norm) if for any x, y, z ∈ [0, 1]

x ∗ (y ∗ z) = (x ∗ y) ∗ z
x ∗ y = y ∗ x
if x ≤ y, then x ∗ z ≤ y ∗ z
x ∗ 1 = x.

Connectives corresponding to t-norms are conjunctions. Further to this, one may also define t-conorms, which serve as truth functions of disjunctions, and possibly relate the two functions using an appropriate function of negation. In Section 6 we already had examples of both: a t-norm (the function min(x, y) of Łukasiewicz conjunction) and a t-conorm (the function max(x, y) of Łukasiewicz disjunction).

Hájek’s [1998] is the main study of fuzzy logics in the narrow sense, more precisely, the study of logics defined by continuous t-norms (a t-norm is continuous if, considered in mathematical terms, it is continuous as a mapping). Among the important continuous t-norms are the following:

Łukasiewicz t-norm: x ∗ y = max(0, x + y − 1),
Gödel t-norm: x ∗ y = min(x, y),
product t-norm: x ∗ y = x · y;

it may be of interest that all these functions have been used in numerous applications of fuzzy set theory as well as fuzzy logics (compare Section 14). The connectives defined through continuous t-norm conjunctions (continuity with respect to the left argument is sufficient) are special. Accordingly, there is an algebraically nice procedure relating them to implications which have good metalogical properties. Any such implication → is defined as the residuum of a given t-norm ∗, i.e.

x → y = max{z : x ∗ z ≤ y}.

Hájek introduces the basic fuzzy propositional logic, BL-logic, as the logic of continuous t-norms on [0,1]. The language of BL comprises the connectives of conjunction &, implication → and the constant ⊥ of falsity. The semantics of BL is established by the function of a t-norm; all other functions corresponding to the connectives are derived. A formula is a BL tautology if and only if under each valuation of propositional variables compatible with the functions of connectives it takes the value 1. Hájek’s [1998] axiom system adequate for BL logic is the following:

(H1) (α → β) → ((β → γ) → (α → γ))
(H2) (α & β) → α
(H3) (α & β) → (β & α)
(H4) (α & (α → β)) → (β & (β → α))
(H5a) (α → (β → γ)) → ((α & β) → γ)
(H5b) ((α & β) → γ) → (α → (β → γ))
(H6) ((α → β) → γ) → (((β → α) → γ) → γ)
(H7) ⊥ → α.

Any system of propositional logic determined by a t-norm in the way indicated may be obtained as a strengthening of BL. For instance, Łukasiewicz, Gödel and product logics result from BL by the addition of one axiom schema, marked by the first letter of its name:

(L) ¬¬α → α
(G) α → (α & α)
(P) ¬¬α → ((α → (α & β)) → (β & ¬¬β)).

BL extends to the basic fuzzy predicate logic in a standard way. Hájek [1998] shows interesting features of t-norm based predicate calculi, see also Hájek and others [2001], Montagna [2000].
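The three t-norms, their residua and the distinguishing axioms can be checked on a finite grid of truth values. The sketch below makes some assumptions: the closed forms of the residua are the standard ones rather than formulas quoted from the text, negation is read as ¬α = α → ⊥, the Gödel schema is taken as idempotence of &, and a check on the grid {0, 1/8, …, 1} is of course only a spot check, not a proof for the whole interval.

```python
from fractions import Fraction
from itertools import product

ONE, ZERO = Fraction(1), Fraction(0)
GRID = [Fraction(i, 8) for i in range(9)]

# t-norm and (assumed closed form of) its residuum for each logic.
LOGICS = {
    "Lukasiewicz": (lambda x, y: max(ZERO, x + y - 1), lambda x, y: min(ONE, 1 - x + y)),
    "Godel": (lambda x, y: min(x, y), lambda x, y: ONE if x <= y else y),
    "product": (lambda x, y: x * y, lambda x, y: ONE if x <= y else y / x),
}

def is_residuum(tnorm, imp):
    """Adjunction: x * z <= y iff z <= (x -> y), i.e. x -> y = max{z : x * z <= y}."""
    return all((tnorm(x, z) <= y) == (z <= imp(x, y)) for x, y, z in product(GRID, repeat=3))

AXIOMS = {
    "H4": lambda a, b, t, i, n: i(t(a, i(a, b)), t(b, i(b, a))),
    "L": lambda a, b, t, i, n: i(n(n(a)), a),
    "G": lambda a, b, t, i, n: i(a, t(a, a)),
    "P": lambda a, b, t, i, n: i(n(n(a)), i(i(a, t(a, b)), t(b, n(n(b))))),
}

def holds(axiom, tnorm, imp):
    """True if the axiom takes the value 1 for all grid values of alpha and beta."""
    neg = lambda a: imp(a, ZERO)  # negation as implication of falsity
    return all(axiom(a, b, tnorm, imp, neg) == ONE for a, b in product(GRID, repeat=2))

residua_ok = {name: is_residuum(t, i) for name, (t, i) in LOGICS.items()}
table = {name: {ax: holds(f, t, i) for ax, f in AXIOMS.items()} for name, (t, i) in LOGICS.items()}
```

On this grid the BL axiom (H4) holds for all three t-norms, while each of (L), (G) and (P) holds for exactly one of them, namely the logic it is named after.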
Tableaux and sets of signs
Tableaux are among the valuable tools for automated deduction (see Bolc, Borowik [2003]), an area of interest to scholars working in logic programming, automated software development and verification. That explains the continuing interest in exploring and developing tableau methods for many-valued logic. Hähnle [1993] substantially improved many-valued theorem proving based on the standard tableaux for finite-valued logics. The solution, which aims to decrease the redundancy of tableau systems, uses sets of truth values, called sets of signs, as prefixes instead of single signs. Following Hähnle, let us start with the Surma-Carnielli method of refutation, illustrated with an example anchored in the three-valued Łukasiewicz logic. The tableau rules mirror the entries of the truth tables of the connectives. Thus, for the disjunction ∨ characterized by the max function, i ∨ j = max(i, j), and the disjunctive formula signed with 1/2, the rule emerges simply from the entries of the table described by the formula
1/2 (ϕ ∨ ψ) iff (1/2 ϕ and 0ψ) or (1/2 ϕ and 1/2 ψ) or (0ϕ and 1/2 ψ)
and it has the following form:
              1/2 (ϕ ∨ ψ)
  1/2 ϕ    |    1/2 ϕ    |    0ϕ
  0ψ       |    1/2 ψ    |    1/2 ψ
As usual, the vertical line signifies branching. It is obvious that, in general, an increase in the number of logical values increases both the number of rules and the size of branching. Given a propositional logic L defined by a finite matrix M = (AM, D), where AM = (AM, f1, . . . , fr), the algebra of signs for L is defined as an algebra AS = (S, f1′, . . . , fr′) similar to AM, with the operations defined as mappings from finite sequences of elements of S into sets of signs:
fi′(S1, . . . , Sm) = {fi(j1, . . . , jm) | jk ∈ Sk, 1 ≤ k ≤ m}.
Any algebra AS defines a semantics of L in terms of the sets of truth values corresponding to the members of S. Thus, with a formula ϕ = F(ϕ1, . . . , ϕm) two related interpretations f and f′ of F, in AM and AS respectively, are associated. The definition of an L-tableau rule⁴⁴ specifies the conditions ensuring all expected properties, that is, soundness, completeness, and some minimizing requirements, expressed in terms of linear subtrees called extensions. A collection of extensions is a conclusion of a tableau rule when it satisfies four conditions which relate possible functions of the matrix with extensions and homomorphisms from the language L into the algebra of signs AS. The properties which the class of homomorphisms associated to a given logic must satisfy imply a kind of minimality of the number of extensions as well as exhaustiveness of the covering of the truth tables of the connectives. A minimal set of homomorphisms associated to a connective immediately leads to a tableau rule. For the disjunction {1/2}(ϕ ∨ ψ) in the three-valued logic already considered, this set of homomorphisms has two elements, h1 and h2:
h1(ϕ) = {1/2}, h1(ψ) = {0, 1/2}
h2(ϕ) = {0, 1/2}, h2(ψ) = {1/2}.
This, in turn, yields the following rule:
          {1/2}(ϕ ∨ ψ)
  {1/2}ϕ       |    {0, 1/2}ϕ
  {0, 1/2}ψ    |    {1/2}ψ
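The saving obtained by passing from single signs to sets of signs can be made concrete for the disjunction example. The sketch below is only an illustration under stated assumptions: truth values are encoded as exact fractions, and the helper names (`lift`, `entries`, `covered`) are hypothetical, not Hähnle's. It lifts max to sets of signs as in the definition of fi′ and compares the three single-sign branches with the two extensions given by h1 and h2.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
VALUES = (Fraction(0), HALF, Fraction(1))


def disj(i, j):
    """Truth function of the three-valued disjunction: max."""
    return max(i, j)


def lift(f):
    """Lift f to sets of signs: f'(S1, S2) = {f(i, j) | i in S1, j in S2}."""
    return lambda s1, s2: frozenset(f(i, j) for i in s1 for j in s2)


def entries(f, sign):
    """All truth-table entries (i, j) of f whose value lies in the sign."""
    return {(i, j) for i, j in product(VALUES, repeat=2) if f(i, j) in sign}


def covered(extensions):
    """Entries (i, j) admitted by at least one extension (S_phi, S_psi)."""
    return {(i, j) for s_phi, s_psi in extensions
            for i in s_phi for j in s_psi}


# Single-sign rule: one branch per truth-table entry with value 1/2.
single_sign_branches = sorted(entries(disj, {HALF}))

# Sets-of-signs rule: the two extensions given by h1 and h2 in the text.
h1 = (frozenset({HALF}), frozenset({Fraction(0), HALF}))
h2 = (frozenset({Fraction(0), HALF}), frozenset({HALF}))
```

Here `single_sign_branches` has three members, while `covered([h1, h2])` reproduces exactly the same truth-table entries with only two branches, and the lifted operation maps each extension back into the sign {1/2}.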
The new paradigm requires some further changes in the conceptual environment. To provide them all one should collect and consider all possible queries or, at least, reduce them to a small set. Hähnle is aware of that, and he gives a definition of a contradiction set of signed formulas. A signed formula for which no rule is defined is self-contradictory. Such is {1/2}Iϕ: no rule for Iϕ exists, since I ranges over the set {0, 1}. Further examples of self-contradictory formulas in Ł3 are {1/2}Lϕ, {1/2}Mϕ and {0, 1}Tϕ. A full tableau system for the propositional part of the three-valued logic Ł3 presented in Hähnle [1993] employs the following set of signs: {{0}, {1/2}, {1}, {0, 1/2}, {1/2, 1}} and the set of rules consisting of schemes such as (1) for every set in the family just specified and for every connective of Ł3.
⁴⁴ See Hähnle [1993, p. 34].
We show how the sets of signs work on the example⁴⁵ which presents the Ł3 tableau proof of validity of the formula ¬p ⊃ (∼p ∧ ¬p) with two Ł3-definable connectives: ∼i = 0 if i = 1, ∼i = 1 otherwise; i ⊃ j = j if i = 1, i ⊃ j = 1 otherwise.
(1) [−] {0, 1/2}(¬p ⊃ (∼p ∧ ¬p))
(2) [1] {1}¬p
(3) [1] {0, 1/2}(∼p ∧ ¬p)
(4) [2] {0}p
(5) [3] {0, 1/2}∼p            |  (7) [3] {0, 1/2}¬p
(6) [5] {1}p                  |  (8) [7] {1/2, 1}p
    closed with (4,6)         |      closed with (4,8)
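The result of the tableau proof can be cross-checked semantically by brute force over the three truth values. This is a sketch under assumptions: ¬ is taken to be the Łukasiewicz negation 1 − i and ∧ to be min, neither of which is restated in the passage above, while ∼ and ⊃ follow the definitions given there. Function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
VALUES = (Fraction(0), HALF, Fraction(1))


def neg(i):
    """Lukasiewicz negation (assumed): 1 - i."""
    return 1 - i


def tilde(i):
    """The connective ~ from the text: 0 if i = 1, and 1 otherwise."""
    return Fraction(0) if i == 1 else Fraction(1)


def hook(i, j):
    """The connective written as a horseshoe: j if i = 1, and 1 otherwise."""
    return j if i == 1 else Fraction(1)


def conj(i, j):
    """Conjunction (assumed): min."""
    return min(i, j)


def formula(p):
    """Value of  not-p hook (~p and not-p)  at the valuation p."""
    return hook(neg(p), conj(tilde(p), neg(p)))


# The root of the tableau signs the formula with {0, 1/2}; validity means
# that no valuation gives the formula a value in that set.
countermodels = [p for p in VALUES if formula(p) in {Fraction(0), HALF}]
```

An empty `countermodels` list corresponds to both branches of the tableau being closed.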
⁴⁵ Example 4.9 in Hähnle [1993]; all tableau rules are on pages 38 and 39.
Lattices of truth and information
Belnap [1977; 1977a] defined a four-valued logic, primarily designed for solving some problems of relevant logics. However, the logic and its generalizations also turned out to be important for automated reasoning, Artificial Intelligence and Computer Science applications. The original motivation is based on the idea that the data used by a computer or another inferential device, including a human being, may be inconsistent. Accordingly, given a state of affairs S, the data base may contain no information concerning it, or it may contain both the information that S obtains and the information that S does not obtain. Belnap's logic has to handle that situation, and its set of epistemic values B4 consists of four elements: f (falsity), t (truth), ⊥ (undetermined) and ⊤ (overdetermined): B4 = {⊥, f, t, ⊤}. Two lattice orderings on B4 are possible: the knowledge ordering ≤k, which has ⊥ as the minimal and ⊤ as the maximal element, and the truth ordering ≤t, which has f as the minimal and t as the maximal element. Both orderings may be represented in the following way:
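[Figure: the diamond of B4, with ⊤ at the top, ⊥ at the bottom, f on the left and t on the right; the vertical axis is labelled "knowledge", the horizontal axis "truth".]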
The last figure shows two lattices: the knowledge lattice, shown directly, and the truth lattice, which results if the diamond is rotated counterclockwise and the axes exchange their meanings. The two lattices are similar, though they are applicable in different situations. The knowledge lattice is useful for relevant logics and for paraconsistency: Belnap employs the matrix having it as universe with ⊤ and t as two distinguished elements. The truth lattice is better suited to computer science applications; in this case it is natural to take t as the only designated element. Considering two partial orderings of B4 simultaneously was the major intuition behind the concept of a bilattice, an ingenious generalization of Belnap's setting, see Ginsberg [1987; 1988]. The main idea was to take an arbitrary set of values with two partial orders forming two lattices related as in B4. A bilattice is a structure (B, ≤t, ≤k, −) such that:
(1) (B, ≤t) and (B, ≤k) are complete lattices
(2) x ≤t y implies that −y ≤t −x for all x, y ∈ B
(3) x ≤k y implies that −x ≤k −y for all x, y ∈ B
(4) − − x = x for all x ∈ B.
In the next figure we have an example of a bilattice having nine elements. Arieli and Avron [1994; 1996] introduced the concept of a logical bilattice. A logical bilattice is a pair (B, F), where B is a bilattice and F is a prime bifilter on B, i.e. a prime filter with respect to ≤k and a prime filter with respect to ≤t. The investigations of bilattices went in at least two directions, algebraic and logical; see e.g. Avron [1996]. Fitting introduced the notion of an interlaced bilattice: a bilattice is interlaced if the lattice operations of inf and sup defined by the two orders are monotone with respect to both ≤k and ≤t. One of the most important results concerning bilattices is the characterisation of the class of logical bilattices in terms of B4.
Avron [1998] showed that B4 plays a similar role among bilattices as the two-element Boolean algebra plays in the class of all Boolean algebras.
[Figure: a nine-element bilattice whose elements are the pairs (x, y) with x, y ∈ {0, 1/2, 1}; (1, 1) is at the top and (0, 0) at the bottom; the vertical axis is labelled "knowledge", the horizontal axis "truth".]
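Conditions (2) to (4) of the bilattice definition can be tested directly on small examples built from pairs. The sketch below rests on an assumed reading that is not stated in the text: a pair (x, y) records evidence for and evidence against, the truth order compares the two components in opposite directions, the knowledge order compares them in the same direction, and negation swaps them. Under this reading the four-element case corresponds to B4 with ⊥ = (0, 0), ⊤ = (1, 1), t = (1, 0) and f = (0, 1). Condition (1) is not checked; it holds for any finite product of chains.

```python
from fractions import Fraction
from itertools import product


def make_bilattice(levels):
    """All pairs (x, y) over a finite chain of levels."""
    return [(x, y) for x in levels for y in levels]


def leq_t(a, b):
    """Truth order (assumed): more evidence for, less evidence against."""
    return a[0] <= b[0] and a[1] >= b[1]


def leq_k(a, b):
    """Knowledge order (assumed): more evidence of both kinds."""
    return a[0] <= b[0] and a[1] <= b[1]


def neg(a):
    """Negation (assumed): swap evidence for and evidence against."""
    return (a[1], a[0])


def satisfies_negation_conditions(elems):
    """Check conditions (2), (3) and (4) of the bilattice definition."""
    pairs = list(product(elems, repeat=2))
    cond2 = all(leq_t(neg(b), neg(a)) for a, b in pairs if leq_t(a, b))
    cond3 = all(leq_k(neg(a), neg(b)) for a, b in pairs if leq_k(a, b))
    cond4 = all(neg(neg(a)) == a for a in elems)
    return cond2 and cond3 and cond4


def extremes(elems, leq):
    """Return the (least, greatest) elements of elems under leq."""
    bottom = [a for a in elems if all(leq(a, b) for b in elems)]
    top = [a for a in elems if all(leq(b, a) for b in elems)]
    return bottom[0], top[0]


FOUR = make_bilattice([0, 1])
NINE = make_bilattice([Fraction(0), Fraction(1, 2), Fraction(1)])
```

For instance, `extremes(FOUR, leq_k)` gives the pair of elements playing the roles of ⊥ and ⊤, and `extremes(FOUR, leq_t)` those playing the roles of f and t.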
16 APPLICATIONS
Projections of expected applications have always been a distinguished motivation for many-valued logical constructions. Some of these conceptions, as e.g. in Łukasiewicz, were meant to be a philosophical revolution. And, though it is not easy to say whether and to what extent these expectations were actually in vain, there are some concrete applications of many-valued logics and algebras to philosophical logic and to such practical areas as switching theory and Computer Science. Below we present some examples.
Independence of axioms
The logical method of testing the independence of axioms using algebras and matrices is credited to Bernays and Łukasiewicz. To demonstrate that an axiom system is independent, one singles out a property, mostly validity, which is common to all axioms except one chosen axiom and is inherited, via the accepted rules of inference, by all theorems of the system. The procedure is repeated once for each axiom of the system. The application of the method can be illustrated by the following example. Consider the (¬, →)-system of the classical propositional calculus originating with Łukasiewicz. Its axioms are:
(A1) (¬p → p) → p
(A2) p → (¬p → q)
(A3) (p → q) → ((q → r) → (p → r)),
and the rules MP and SUB. Let now
M(A1) = ({0, 1}, ¬1, →, {1}),   M(A2) = ({0, 1}, ¬2, →, {1})
be matrices wherein the implication connective → is determined classically (by the well-known truth table), ¬1(0) = ¬1(1) = 0 and ¬2(0) = ¬2(1) = 1. Moreover, let M(A3) = ({0, 1/2, 1}, ¬, →, {1}) be a matrix with the connectives defined by the tables:
 α    | ¬α
 0    | 1
 1/2  | 1/2
 1    | 0

 →    | 0    1/2  1
 0    | 1    1    1
 1/2  | 1    0    1
 1    | 0    0    1
It is readily seen that E(M(A1)), E(M(A2)) and E(M(A3)) are closed under (MP) and (SUB) and that
(1) A1 ∉ E(M(A1)), A2 ∈ E(M(A1)) and A3 ∈ E(M(A1)),
(2) A1 ∈ E(M(A2)), A2 ∉ E(M(A2)) and A3 ∈ E(M(A2)),
(3) A1 ∈ E(M(A3)), A2 ∈ E(M(A3)) and A3 ∉ E(M(A3)).
Therefore the axiom system (A1)–(A3) is independent. The application of the method described is not limited to logical calculi. Proofs of independence in set theory through the use of matrices built on the basis of Boolean algebras were presented by Scott and Solovay [1969].
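The independence argument amounts to a finite computation, sketched below. The encoding is an assumption of this illustration: the three-valued tables are read with rows as antecedents and with ¬0 = 1, ¬1/2 = 1/2, ¬1 = 0, and all identifiers are hypothetical. For each matrix the sketch evaluates the three axioms under every valuation and also checks that modus ponens preserves the designated value 1.

```python
from fractions import Fraction
from itertools import product

Z, H, I = Fraction(0), Fraction(1, 2), Fraction(1)


def classical_imp(x, y):
    """Two-valued implication: false only for 1 -> 0."""
    return Z if (x == I and y == Z) else I


# Three-valued tables of M(A3); keys of IMP3 are (antecedent, consequent).
NEG3 = {Z: I, H: H, I: Z}
IMP3 = {(Z, Z): I, (Z, H): I, (Z, I): I,
        (H, Z): I, (H, H): Z, (H, I): I,
        (I, Z): Z, (I, H): Z, (I, I): I}

# Each matrix: (truth values, negation, implication); 1 is designated.
MATRICES = {
    "M(A1)": ((Z, I), lambda x: Z, classical_imp),
    "M(A2)": ((Z, I), lambda x: I, classical_imp),
    "M(A3)": ((Z, H, I), NEG3.__getitem__, lambda x, y: IMP3[(x, y)]),
}

AXIOMS = {
    "A1": lambda n, i, p, q, r: i(i(n(p), p), p),
    "A2": lambda n, i, p, q, r: i(p, i(n(p), q)),
    "A3": lambda n, i, p, q, r: i(i(p, q), i(i(q, r), i(p, r))),
}


def valid(axiom, matrix):
    """True if the axiom takes the value 1 under every valuation."""
    values, n, i = matrix
    return all(axiom(n, i, p, q, r) == I
               for p, q, r in product(values, repeat=3))


def mp_sound(matrix):
    """MP preserves the value 1: if 1 -> y is 1, then y is 1."""
    values, _, i = matrix
    return all(y == I for y in values if i(I, y) == I)


def independence_table():
    """Map (axiom name, matrix name) to validity of the axiom there."""
    return {(a, m): valid(AXIOMS[a], MATRICES[m])
            for a in AXIOMS for m in MATRICES}
```

In the table returned by `independence_table()`, each axiom Ai should fail exactly in the matrix M(Ai) and hold in the other two, which is the pattern the independence proof requires.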
Formalization of intensional functions
Łoś [1948] showed that, under some reasonable assumptions, the formalization of functions of the kind “John believes that p” or, more accurately, “John asserts that p” naturally leads to a many-valued interpretation of the belief-operators within the scope of the system of classical logic. All propositions “John asserts that p” are clearly substitutions of the schema “x asserts that p”, whose formal counterpart is a function Lxp, assigning a logical value to each couple (name, proposition). Łoś gives the following axioms for his system in an appropriate language L:
(L1) Lxp ≡ ¬Lx(¬p)
(L2i) Lx(Ai), where (Ai), i ∈ {1, 2, 3}, is the ith axiom of the (¬, →)-system of CPC of Łukasiewicz
(L3) Lx(p → q) → (Lxp → Lxq)
(L4) (∀x)Lxp → p
(L5) LxLxp ≡ Lxp
and accepts the rules: MP, the substitution rule and generalization. The intuitions captured by the author are justified: so, e.g., (L3) expresses the fact that everyone uses MP: asserting a conditional statement and its antecedent commits one to asserting the consequent. (L4) says that a sentence acknowledged by everyone is a theorem of the system. The operators Lx, Ly, . . . are certainly not the only intensional functions of the system considered. What is more, the closed formulas of the language define intensional propositional functions, i.e. connectives; a case in point is the function S:
Sα =df ∃x∃y(Lxα ∧ Ly(¬α)),
which can be interpreted as “it is questionable that α”. Intuitively, the definition conveys the thought that saying “it is questionable that α” means saying “two people exist such that one asserts α, and the other asserts not-α”. Any interpretation of the system of L-operators starts with the selection of a definite range of nominal variables and a set of propositions. In the simplest case of two persons A and B who do not agree on all issues, the set of propositions Z is divided into four classes, denoted as 0, 1/3, 2/3, 1. The first class, 0, contains propositions which are acknowledged by neither person; the second class, 1/3, propositions which A acknowledges and B does not; the third class, 2/3, propositions which B acknowledges and A does not. Finally, the fourth class, 1, contains all propositions acknowledged by both A and B, i.e. all logical theorems and perhaps other propositions acknowledged by both men. Identifying the acceptance of a proposition α by a person s with the formula Lsα, we get a truth-table assigning to the classes 0, 1/3, 2/3, 1 the pairs of logical values of truth (t) and falsity (f). In turn, making use of the truth-tables for t and f we get the characterization of implication, negation and the connective S by means of tables whose elements are the symbols of the four considered classes of propositions.⁴⁶

 L    | A   B
 0    | f   f
 1/3  | t   f
 2/3  | f   t
 1    | t   t

 α    | ¬α    Sα
 0    | 1     0
 1/3  | 2/3   1
 2/3  | 1/3   1
 1    | 0     0

 →    | 0     1/3   2/3   1
 0    | 1     1     1     1
 1/3  | 2/3   1     2/3   1
 2/3  | 1/3   1/3   1     1
 1    | 0     1/3   2/3   1

⁴⁶ 0, 1/3, 2/3, 1 are identified with pairs (x, y), where x, y ∈ {f, t}, defined by the table for L. The connectives are defined just as in the product of matrices, i.e. ¬(x, y) = (¬x, ¬y) and (x1, y1) → (x2, y2) = (x1 → x2, y1 → y2).
Łoś suggests that the objects 0, 1/3, 2/3, 1 may be treated as logical values. He further considers the matrix I4 = ({0, 1/3, 2/3, 1}, ¬, →, {1}), showing that E(I4) = TAUT{¬, →} (the set of classical (¬, →)-tautologies), and hence that the many-valuedness of this system has a merely formal character. Extending Łoś's interpretation to cases with more persons is straightforward, and it results in further formally many-valued versions of CPC (described by matrices with more than four elements).
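The four-valued tables arise as a product of two copies of the classical matrix, one for each person, and this can be replayed in a few lines. The sketch below is an illustration with hypothetical names: it derives ¬, → and S from the pair encoding described in the footnote, and compares validity in I4 with classical validity on a handful of sample formulas chosen here for illustration. Agreement on samples is of course weaker than the equality E(I4) = TAUT{¬, →} stated in the text.

```python
from fractions import Fraction
from itertools import product

# Classes of propositions as pairs (A acknowledges?, B acknowledges?).
CLASS = {Fraction(0): (False, False), Fraction(1, 3): (True, False),
         Fraction(2, 3): (False, True), Fraction(1): (True, True)}
NAME = {pair: value for value, pair in CLASS.items()}


def neg(a):
    """Componentwise classical negation."""
    x, y = CLASS[a]
    return NAME[(not x, not y)]


def imp(a, b):
    """Componentwise classical implication."""
    (x1, y1), (x2, y2) = CLASS[a], CLASS[b]
    return NAME[((not x1) or x2, (not y1) or y2)]


def s(a):
    """'It is questionable that': the two persons disagree."""
    x, y = CLASS[a]
    return Fraction(1) if x != y else Fraction(0)


# Sample formulas (illustrative choice): name -> (function, number of variables).
SAMPLES = {
    "A1": (lambda n, i, p: i(i(n(p), p), p), 1),
    "A2": (lambda n, i, p, q: i(p, i(n(p), q)), 2),
    "A3": (lambda n, i, p, q, r: i(i(p, q), i(i(q, r), i(p, r))), 3),
    "double negation": (lambda n, i, p: i(n(n(p)), p), 1),
    "p -> q": (lambda n, i, p, q: i(p, q), 2),
    "p -> not p": (lambda n, i, p: i(p, n(p)), 1),
}


def i4_valid(formula, arity):
    """Validity in I4: value 1 under all four-valued valuations."""
    return all(formula(neg, imp, *vs) == 1
               for vs in product(CLASS, repeat=arity))


def classical_valid(formula, arity):
    """Validity in the two-valued matrix, using Python booleans."""
    c_neg = lambda x: not x
    c_imp = lambda x, y: (not x) or y
    return all(formula(c_neg, c_imp, *vs)
               for vs in product((False, True), repeat=arity))
```

The derived functions `neg`, `imp` and `s` can be compared entry by entry with the tables above, and `i4_valid` agrees with `classical_valid` on every formula in `SAMPLES`.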
Many-valued algebras and switching theory
Soon after the successful applications of classical logic, Boolean algebras and other algebraic structures (e.g. groups) in switching theory, in the 1950s, scholars turned their interest to the possibility of using many-valued logic algebras for similar purposes (see e.g. Epstein, Frieder and Rine [1974]). These interests gave birth to several techniques of analysis and synthesis of electronic circuits and relays based (mainly) on Moisil and Post algebras (see Rine [1977]). Below, we confine ourselves to some remarks justifying the purposefulness of using many-valued algebras in the theory of switching and relay circuits. The most elementary component of a traditional electronic circuit is a mechanical contact opening and closing some fragment of an electrical network. The switching of contacts is effected mechanically or electromechanically (i.e. using relays). Among the contacts of a given network one may find pairs of contacts which, according to the technical assumptions, have to change their positions into complementary ones simultaneously. The simplest example of such a situation is the gear of two oppositely oriented contacts x1 and x2 positioned in parallel branches of a circuit (see Fig. 1): x1 is normally closed while x2 is normally open.
[Fig. 1: the contacts x1 and x2 in parallel branches of a circuit.]
[Fig. 2: the positions of the contacts x1 and x2 in the states (1), (1/2) and (0).]
When considering the ideal model of the circuit one assumes that both contacts react instantaneously to an action and thus switch, as shown in Fig. 2, from the state (1) to (0). In practice, however, it may happen that x1 opens before x2 closes and, consequently, contrary to the technical assumptions, the gear will be open for a moment. That is just the
reason for a modelling in which the third state (1/2) (see Fig. 2) is considered; the following table characterizes the “real” switch-function as a function of states and contacts (1 inside the table denotes a contact's normal state and 0 its denial):

 z    | x1   x2
 1    | 1    1
 1/2  | 0    1
 0    | 0    0

On the other hand, one may also read the table treating x1 and x2 as (one-argument) functions of states, i.e. of z, and their values as states as well, putting x1 = s1(z), x2 = s2(z). Let us notice that then s1 and s2 are Moisil's operations on {0, 1/2, 1}. Subsequently, to describe any network built of the contacts x1, x2 and their complements x̄1, x̄2, one should define binary operations ∪ and ∩ corresponding to the two possible types of connections and a unary operation N such that x̄i = N xi (i = 1, 2) and N N z = z for z ∈ {0, 1/2, 1}. It appears that the most accurate way of introducing these operations leads to the three-valued Moisil algebra on {0, 1/2, 1}: ({0, 1/2, 1}, ∪, ∩, N, s1, s2). A generalization of the outlined construction to the case of any number of contacts similarly results in n-valued algebras. The algebraic treatment of switching devices aims at providing several techniques of analysis, synthesis and minimization of multiplex networks. The most important advantage of the many-valued approach is the possibility of eliminating possible switching disturbances through the algebraic synthesis of the networks, see e.g. Moisil [1966]. The application of many-valued algebras is not limited to binary contacts. Investigations concerning multi-stable contacts and switches have also been undertaken. However, owing to difficulties with the technical realization of devices working in voltage mode and to the progress in the technology of highly integrated binary circuits, these activities are not very common. Still, many-valued constructions attract the attention of engineers. Thus, for instance, multiple values may be useful for describing transistors.
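The switch table and the operations s1, s2 can be written down directly, which also makes the role of the intermediate state visible. The following sketch relies on assumptions that go beyond the text: N is taken to be z ↦ 1 − z and the connection operations ∪ and ∩ to be max and min, which is one common way of presenting the three-valued Moisil algebra, and the function `gear_closed` is a hypothetical reading of the parallel gear described above.

```python
from fractions import Fraction

Z, H, I = Fraction(0), Fraction(1, 2), Fraction(1)
STATES = (I, H, Z)

# Columns of the switch table: 1 = contact in its normal position,
# 0 = the denial of that position.
S1 = {I: I, H: Z, Z: Z}  # x1 = s1(z)
S2 = {I: I, H: I, Z: Z}  # x2 = s2(z)


def n(z):
    """Complement N (assumed to be 1 - z), an involution on the states."""
    return 1 - z


def join(a, b):
    """Connection operation written as a cup (assumed: max)."""
    return max(a, b)


def meet(a, b):
    """Connection operation written as a cap (assumed: min)."""
    return min(a, b)


def gear_closed(z):
    """The parallel gear conducts if at least one contact is closed."""
    x1_closed = S1[z] == I  # x1 is normally closed
    x2_closed = S2[z] == Z  # x2 is normally open, closed once switched
    return x1_closed or x2_closed
```

Under this reading the gear conducts in the two ideal states and is open only in the state 1/2, which is exactly the disturbance the third value is introduced to describe.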
Hayes [1986] and Hähnle and Kernig [1993] give an example of such modelling of MOS transistors. Due to a degradation of signals, a MOS transistor has different signal levels at source and terminals. It turns out that the natural way of modelling leads to a seven-valued logic N = (F, T, F, T, ⊤, ⊤, ⊥), whose values are organized in the form of a lattice
[Figure: the lattice of the seven values of N.]
where the values represent respectively: F and T, full strength signals; F and T, degraded signals; ⊤ and ⊤, values which represent two modes of faulty nodes; and ⊥, “no signal”.⁴⁷ The reader interested in the current state of investigation in this field is advised to consult “Computer”, the monthly of the IEEE Computer Society, as well as the yearly “Proceedings of the Multiple-Valued Symposia” published by this influential American organization.
Many-valuedness in Computer Science
Full scale hardware realizations of ternary computers were successfully completed at least twice: in 1958 the arithmetical SETUN in the USSR and in 1973 the logico-arithmetical TERNAC in the USA. However, already the first emulation of TERNAC proved that both its speed and its price are of the same order of magnitude as the speed and price of binary computers. Since further experiments also showed that programming languages would have to be made more complicated, more attention has been directed towards using many-valued algebras for the synthesis and construction of hardware devices working with 2n voltage levels (n ≥ 2), especially memories (see Epstein, Frieder and Rine [1974], Bauer et al. [1995]). Post algebras found an important application in the systematization of theoretical research concerning programs and higher level programming languages which contain instructions branching programs, such as e.g. CASE, SELECT etc. The application of these instructions considerably simplifies programs, thus making them more readable. In turn, the reconstruction of the structure of a branched program naturally leads to a Post algebra of order n. The typical CASE (or SELECT) situation is that in which one of the (sub)programs P1, . . . , Pn should be performed according to whether condition W1 or . . . or Wn is satisfied. Then the constant functions of the Post algebra, e1, . . . , en, may be interpreted as devices which keep track of which of W1, . . . , Wn are true. The most powerful tool of the contemporary methodology of programming languages, algorithmic logic (see Salwicki [1970]), is formulated in a language containing operators which represent composition, branching and iteration operations on programs. The systems of this logic contain expressions representing programs and formulas describing properties of these programs.
Rasiowa's ω⁺-valued extension of algorithmic logic is not only fully adapted to arbitrarily “wide” branching programs but also constitutes a starting point for other, more advanced logical constructions (see Rasiowa [1977]). Its semantics is based on Post algebras of order ω⁺, defined similarly to Post algebras of order n in Rousseau's version. The simplest algebra of that kind is of the following form:
Pω = ({e0, . . . , eω}, ∪, ∩, →, −, {Dk}k<ω, {ei}0≤i≤ω)
(0, 1, . . . , ω are ordinal numbers), where: e0 < e1 < . . . < eω, ei ∪ ek = e_max(i,k), ei ∩ ek = e_min(i,k), ei → ek = eω when i ≤ k, ei → ek = ek when i > k, −ei = e0 for i ≠ 0, −e0 = eω, Di(ek) = eω if i ≤ k and Di(ek) = e0 otherwise (compare Section 5). Algorithmic logic is among the main predecessors of the recent outcome of computing theory called dynamic logic, see e.g. Harel [1984]. The term dynamic logic is the generic name given to logical systems appropriate for reasoning about changes from one state to another, which ultimately may also represent programs. Though it is grounded in features of programming and program verification, dynamic logic has borrowed heavily from modal logic and has its own autonomous philosophical importance. Many-valued semantics founded on several algebraic structures have been widely used in logic programming. The framework for applications of bilattices in the semantics of logic programming is discussed in Fitting [1991].
⁴⁷ See also Hähnle [1993].
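The operations of the algebra Pω are simple enough to be tried out on indices. The sketch below is an illustration under assumptions introduced here: an element ei is represented by its index i, the index ω by `math.inf`, and only a finite sample of indices is inspected, so the checks are sanity tests rather than proofs. It uses the definitions as listed above, with −ei = e0 for i ≠ 0 and −e0 = eω.

```python
import math
from itertools import product

OMEGA = math.inf  # stands in for the index of the top element


def join(i, k):
    """e_i cup e_k = e_max(i, k)."""
    return max(i, k)


def meet(i, k):
    """e_i cap e_k = e_min(i, k)."""
    return min(i, k)


def imp(i, k):
    """e_i -> e_k is the top element if i <= k, and e_k otherwise."""
    return OMEGA if i <= k else k


def neg(i):
    """-e_0 is the top element; -e_i is e_0 for every other i."""
    return OMEGA if i == 0 else 0


def d(i, k):
    """D_i(e_k) is the top element if i <= k, and e_0 otherwise."""
    return OMEGA if i <= k else 0


SAMPLE = [0, 1, 2, 3, 7, OMEGA]  # a finite sample of indices


def residuated(sample):
    """Check on the sample: meet(a, c) <= b  iff  c <= imp(a, b)."""
    return all((meet(a, c) <= b) == (c <= imp(a, b))
               for a, b, c in product(sample, repeat=3))
```

On the sample, `residuated(SAMPLE)` confirms that → behaves as the residuum of ∩, and `neg(i)` coincides with `imp(i, 0)`, so negation is definable from implication and the least element.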
BIBLIOGRAPHY
[Ackermann, 1967*] R. Ackermann. Introduction to many-valued logics. Routledge and Kegan Paul, London, 1967.
[Arieli and Avron, 1994] O. Arieli and A. Avron. Logical bilattices and inconsistent data. In Proceedings of the 9th IEEE Annual Symposium on Logic in Computer Science. IEEE Press, 468–476, 1994.
[Arieli and Avron, 1996] O. Arieli and A. Avron. Reasoning with logical bilattices. Journal of Logic, Language and Information, 5, no. 1, 25–63, 1996.
[Avron, 1996] A. Avron. The structure of interlaced bilattices. Journal of Mathematical Structures in Computer Science, 6, no. 1, 287–299, 1996.
[Avron, 1998] A. Avron. The value of the four values. Artificial Intelligence, 102, 97–141, 1998.
[Baaz et al., 1993] M. Baaz, C. Fermüller, and R. Zach. Systematic construction of natural deduction for many-valued logics. Proceedings of the 23rd International Symposium on Multiple-valued Logic, IEEE Press, Los Gatos, CA, 208–213, 1993.
[Batens, 1980] D. Batens. A completeness-proof method for extensions of the implicational fragment of the propositional calculus. Notre Dame Journal of Formal Logic, 20, 509–517, 1980.
[Batens, 1982] D. Batens. A bridge between two-valued and many-valued semantic systems: N-tuple semantics. Proceedings of the 12th International Symposium on Multiple-valued Logic, IEEE Press, Los Angeles, 121–133, 1982.
[Becchio and Pabion, 1977] D. Becchio and J.-F. Pabion. Gentzen’s techniques in the three-valued logic of Łukasiewicz (Abstract). Journal of Symbolic Logic, 42(2), 123–124, 1977.
[Bellman and Giertz, 1973] R. E. Bellman and M. Giertz. On the analytic formalism of the theory of fuzzy sets. Information Sciences, 5, 149–156, 1973.
[Bellman and Zadeh, 1977] R. E. Bellman and L. A. Zadeh. Local and fuzzy logics. In Dunn, J. M. and Epstein, G. (eds) Modern uses of multiple-valued logic. D. Reidel, Dordrecht, 105–165, 1977.
[Belluce and Chang, 1963] L. P. Belluce and C. C. Chang. A weak completeness theorem for infinite-valued first-order logic. The Journal of Symbolic Logic, 28, 43–50, 1963.
[Belnap, 1970] N. D. Belnap. Conditional assertion and restricted quantification. Noûs, 4, 1–13, 1970.
[Belnap, 1977] N. D. Belnap. A useful four-valued logic. In Dunn, J. M. and Epstein, G. (eds) Modern uses of multiple-valued logic. D. Reidel, Dordrecht, 8–37, 1977.
[Belnap, 1977a] N. D. Belnap. How a computer should think. In Ryle, G. (ed.) Contemporary aspects of philosophy. Oriel Press, 30–56, 1977.
[Beth, 1956] E. W. Beth. Semantic construction of intuitionistic logic. Mededelingen der koninklijke Nederlandse Akademie van Wetenschappen, new series, 19, no. 11, 357–388, 1956.
[Bochvar, 1938] D. A. Bochvar. Ob odnom tréhznačnom isčislénii i égo priménénii k analizu paradoksov klassičéskogo rasširennogo funkcjonal’nogo isčislénia (On a three-valued calculus and its application to analysis of paradoxes of classical extended functional calculus). Matématičéskij Sbornik, 4, 287–308, 1938.
[Bolc and Borowik, 2003*] L. Bolc and P. Borowik. Many-valued logics. Vol. 2: Automated reasoning and practical applications. Springer Verlag, 2003.
[Carnielli, 1987] W. A. Carnielli. Systematization of finite-valued logics through the method of tableaux. Journal of Symbolic Logic, 52(2), 473–493, 1987.
[Carnielli, 1991] W. A. Carnielli. On sequents and tableaux for many-valued logics. Journal of Symbolic Logic, 8(1), 59–76, 1991.
[Chang, 1958a] C. C. Chang. Proof of an axiom of Łukasiewicz. Transactions of the American Mathematical Society, 87, 55–56, 1958.
[Chang, 1958b] C. C. Chang. Algebraic analysis of many-valued logics. Transactions of the American Mathematical Society, 88, 467–490, 1958.
[Chang, 1959] C. C. Chang. A new proof of the completeness of the Łukasiewicz axioms. Transactions of the American Mathematical Society, 93, 74–80, 1959.
[Chang, 1963] C. C. Chang. The axiom of comprehension in infinite-valued logic. Mathematica Scandinavica, 13, 9–30, 1963.
[Chang, 1966] C. C. Chang and H. J. Keisler. Continuous model theory. Princeton University Press, Princeton, New Jersey, 1966.
[Chang and Keisler, 1973] C. C. Chang and H. J. Keisler. Model theory. North-Holland, Amsterdam, 1973.
[Cignoli, 1980] R. Cignoli. Some algebraic aspects of many-valued logics. In Arruda, A. I., da Costa, N. C. A. and Sette, A. M. (eds) Proceedings of the Third Brazilian Conference on Mathematical Logic, São Paulo, 49–69, 1980.
[Cignoli, 1982] R. Cignoli. Proper n-valued Łukasiewicz algebras as S-algebras of Łukasiewicz n-valued propositional calculi. Studia Logica, 41, 3–16, 1982.
[Cignoli et al., 1999*] R. Cignoli, I. M. L. D’Ottaviano, and D. Mundici. Foundations of many-valued reasoning, Trends in Logic: Studia Logica Library, vol. 7. Kluwer Academic Publishers, Dordrecht, 1999.
[Czelakowski, 2001] J. Czelakowski. Protoalgebraic logics. Trends in Logic: Studia Logica Library, vol. 10. Kluwer Academic Publishers, Dordrecht, 2001.
[da Costa, 1974] N. C. A. da Costa. On the theory of inconsistent formal systems. Notre Dame Journal of Formal Logic, 15, 497–510, 1974.
[D’Agostino, 1999] M. D’Agostino. Tableaux methods for classical propositional logic. In D’Agostino, M., Gabbay, D., Hähnle, R., and Posegga, J. (eds) Handbook of tableau methods. Kluwer Academic Publishers, Dordrecht, 45–123, 1999.
[Dalen, 1986] D. van Dalen. Intuitionistic logic. In Gabbay, D. and Guenthner, F. (eds) Handbook of philosophical logic, vol. III. D. Reidel, Dordrecht, 225–339, 1986.
[Dalen, 1986a] D. van Dalen. Intuitionistic logic. In Gabbay, D. and Guenthner, F. (eds) Handbook of philosophical logic, vol. III. D. Reidel, Dordrecht, 225–339, 1986.
[Dummett, 1959] M. Dummett. A propositional calculus with denumerable matrix. The Journal of Symbolic Logic, 24, 97–106, 1959.
[Dunn and Hardegree, 2001] J. M. Dunn and G. M. Hardegree. Algebraic methods in philosophical logic. Oxford Logic Guides 41, Clarendon Press, Oxford, 2001.
[Dwinger, 1977] Ph. Dwinger. A survey of the theory of Post algebras and their generalizations. In Dunn, J. M. and Epstein, G. (eds) Modern uses of multiple-valued logic. D. Reidel, Dordrecht, 53–75, 1977.
[Epstein, 1960] G. Epstein. The lattice theory of Post algebras. Transactions of the American Mathematical Society, 95, 300–317, 1960.
[Epstein et al., 1974] G. Epstein, G. Frieder, and D. C. Rine. The development of multiple-valued logic as related to Computer Science. Computer, 7, no. 9, 20–32, 1974.
[Fenstad, 1964] J. E. Fenstad. On the consistency of the axiom of comprehension in the Łukasiewicz infinite-valued logic. Mathematica Scandinavica, 14, 64–74, 1964.
[Finn and Grigolia, 1980] V. Finn and R. Grigolia. Bochvar’s algebras and corresponding propositional calculi. Bulletin of the Section of Logic, 9, no. 1, 39–45, 1980.
[Fitting, 1969] M. C. Fitting. Intuitionistic Logic, Model Theory and Forcing. North-Holland, Amsterdam, 1969.
[Fitting, 1991] M. C. Fitting. Bilattices and the semantics of logic programming. Journal of Logic Programming, 11, no. 2, 91–116, 1991.
[Gaines, 1976a] B. R. Gaines. Foundations of fuzzy reasoning. International Journal of Man–Machine Studies, 8, 623–668, 1976.
[Gaines, 1976b] B. R. Gaines. General fuzzy logics. In Proceedings of the 3rd European Meeting on Cybernetics and Systems Research. Vienna, 1976.
[Gentzen, 1934] G. Gentzen. Untersuchungen über das Logische Schliessen. Mathematische Zeitschrift, 39, 176–210, 405–431, 1934.
[Giles, 1974] R. Giles. A non-classical logic for physics. Studia Logica, 33, 397–416, 1974.
[Ginsberg, 1987] M. L. Ginsberg. Multi-valued logics. In Ginsberg, M. L. (ed.) Readings in non-monotonic reasoning. Los Altos, CA, 251–258, 1987.
[Ginsberg, 1988] M. L. Ginsberg. Multi-valued logics: a uniform approach to reasoning in AI. Computational Intelligence, 4, 256–316, 1988.
[Gödel, 1930] K. Gödel. Die Vollständigkeit der Axiome des logischen Funktionenkalküls. Monatshefte für Mathematik und Physik, 37, 349–360, 1930.
[Gödel, 1932] K. Gödel. Zum intuitionistischen Aussagenkalkül. Akademie der Wissenschaften in Wien, Mathematisch-naturwissenschaftliche Klasse. Anzeiger, LXIX, 65–66, 1932.
[Gödel, 1933] K. Gödel. Eine Interpretation des intuitionistischen Aussagenkalküls. Ergebnisse eines mathematischen Kolloquiums, IV, 34–40, 1933.
[Goguen, 1969] J. A. Goguen. The logic of inexact concepts. Synthese, 19, 325–373, 1969.
[Gonseth, 1941] F. Gonseth, ed. Les entretiens de Zurich sur les fondements et la méthode des sciences mathématiques 6–9 décembre 1938. Zurich, 1941.
[Gottwald, 1981] S. Gottwald. Fuzzy-Mengen und ihre Anwendungen. Ein Überblick. Elektronische Informationsverarbeitung und Kybernetik, 17, 207–233, 1981.
[Gottwald, 2001] S. Gottwald. A Treatise on Many-Valued Logics. Studies in Logic and Computation, vol. 9, Research Studies Press, Baldock, Hertfordshire, England, 2001.
[Grigolia, 1977] R. Grigolia. Algebraic analysis of Łukasiewicz–Tarski’s n-valued logical systems. In Wójcicki, R. and Malinowski, G. (eds) Selected papers on Łukasiewicz sentential calculi. Ossolineum, Wrocław, 81–92, 1977.
[Hähnle, 1993*] R. Hähnle. Automated deduction in multiple-valued logics, International Series of Monographs on Computer Science, vol. 10. Oxford University Press, 1993.
[Hähnle, 1999] R. Hähnle. Tableaux for many-valued logics. In D’Agostino, M., Gabbay, D., Hähnle, R., and Posegga, J. (eds) Handbook of tableau methods. Kluwer, Dordrecht, 529–580, 1999.
[Hähnle, 2001] R. Hähnle. Advanced many-valued logics. In Gabbay, D. and Guenthner, F. (eds) Handbook of philosophical logic, 2nd ed., vol. II. D. Reidel, Dordrecht, 297–395, 2001.
[Hähnle and Kernig, 1993] R. Hähnle and W. Kernig. Verification of switch level designs with many-valued logic. In Voronkov, A. (ed.) Proceedings LPAR’93, St. Petersburg, Russia, Lecture Notes in Computer Science, vol. 698. Springer Verlag, 158–169, 1993.
[Hájek, 1998*] P. Hájek. Metamathematics of fuzzy logics, Trends in Logic: Studia Logica Library, vol. 4. Kluwer Academic Publishers, Dordrecht, 1998.
[Hájek et al., 2000] P. Hájek, J. Paris and J. Shepherdson. The liar paradox and fuzzy logic. The Journal of Symbolic Logic, 65, 339–346, 2000.
[Halldén, 1949] S. Halldén. The logic of nonsense. Uppsala Universitets Årsskrift, Uppsala, 1949.
[Harel, 1984] D. Harel. Dynamic logic. In Gabbay, D. and Guenthner, F. (eds) Handbook of philosophical logic, vol. II. D. Reidel, Dordrecht, 497–604, 1984.
[Hay, 1963] L. S. Hay. Axiomatization of the infinite-valued predicate calculus. The Journal of Symbolic Logic, 28, 77–86, 1963.
[Hayes, 1986] J. P. Hayes. Pseudo-Boolean logic circuits. IEEE Transactions on Computers, C-35(7), 602–612, 1986.
[Heyting, 1966] A. Heyting. Intuitionism. An introduction. North-Holland, Amsterdam, 1966.
[Jaśkowski, 1934] S. Jaśkowski. On the rules of suppositions in formal logic. Studia Logica, 1, 5–32, 1934.
[Jaśkowski, 1936] S. Jaśkowski. Recherches sur le système de la logique intuitioniste. Actes du Congrès International de Philosophie Scientifique VI. Philosophie de mathématiques. Actualités scientifiques et industrielles 393, Paris, 58–61, 1936.
[Johansson, 1936] I. Johansson. Der Minimalkalkül, ein reduzierter intuitionistischer Formalismus. Compositio Mathematicae, 4, 119–136, 1936.
Grzegorz Malinowski
[Kleene, 1938] S. C. Kleene. On a notation for ordinal numbers. The Journal of Symbolic Logic, 3, 150–155, 1938.
[Kleene, 1952] S. C. Kleene. Introduction to metamathematics. North-Holland, Amsterdam, 1952.
[Körner, 1966] S. Körner. Experience and theory. Routledge and Kegan Paul, London, 1966.
[Kotarbiński, 1913] T. Kotarbiński. Zagadnienie istnienia przyszłości (The problem of existence of the future). Przegląd Filozoficzny, VI.1, 1913.
[Kotas and da Costa, 1980] J. Kotas and N. C. A. da Costa. Some problems on logical matrices and valorizations. In Arruda, A. I., da Costa, N. C. A. and Sette, A. M. (eds) Proceedings of the Third Brazilian Conference on Mathematical Logic, São Paulo, 158–169, 1980.
[Łoś, 1948] J. Łoś. Logiki wielowartościowe a formalizacja funkcji intensjonalnych (Many-valued logics and the formalization of intensional functions). Kwartalnik Filozoficzny, 17, 59–78, 1948.
[Łukasiewicz, 1906] J. Łukasiewicz. Analiza i konstrukcja pojęcia przyczyny. Przegląd Filozoficzny, 105–179, 1906.
[Łukasiewicz, 1910] J. Łukasiewicz. O zasadzie sprzeczności u Arystotelesa. Studium krytyczne. Kraków, 1910; English tr. On the principle of contradiction in Aristotle. Review of Metaphysics, XXIV, 1971.
[Łukasiewicz, 1913] J. Łukasiewicz. Die logischen Grundlagen der Wahrscheinlichkeitsrechnung. Kraków, 1913; English tr. Logical foundations of probability theory. In Borkowski, L. (ed.) Selected works. North-Holland, Amsterdam, 16–63.
[Łukasiewicz, 1920] J. Łukasiewicz. O logice trójwartościowej. Ruch Filozoficzny, 5, 170–171, 1920. English tr. On three-valued logic. In Borkowski, L. (ed.) Selected works. North-Holland, Amsterdam, 87–88.
[Łukasiewicz, 1929] J. Łukasiewicz. Elementy logiki matematycznej. Skrypt. Warszawa, 1929 (II edn, PWN, Warszawa 1958); English tr. Elements of Mathematical Logic, translated by Wojtasiewicz, O. Pergamon Press, Oxford, 1963.
[Łukasiewicz, 1930] J. Łukasiewicz. Philosophische Bemerkungen zu mehrwertigen Systemen des Aussagenkalküls. Comptes rendus des séances de la Société des Sciences et des Lettres de Varsovie Cl. III, 23, 51–77, 1930; English tr. Philosophical remarks on many-valued systems of propositional logic. In McCall, S. (ed.) Polish Logic 1920–1939. Clarendon Press, Oxford, 1967, 40–65.
[Łukasiewicz, 1953] J. Łukasiewicz. A system of modal logic. Journal of Computing Systems, 1, 111–149, 1953.
[Łukasiewicz, 1961] J. Łukasiewicz. Z zagadnień logiki i filozofii. Pisma wybrane. PWN, Warszawa, 1961; English tr. Selected works (ed. Borkowski, L.). North-Holland, Amsterdam, 1970.
[Łukasiewicz and Tarski, 1930] J. Łukasiewicz and A. Tarski. Untersuchungen über den Aussagenkalkül. Comptes rendus des séances de la Société des Sciences et des Lettres de Varsovie Cl. III, 23, 30–50, 1930.
[MacColl, 1897] H. MacColl. Symbolical reasoning. Mind, 6, 493–510, 1897.
[McNaughton, 1951] R. McNaughton. A theorem about infinite-valued sentential logic. The Journal of Symbolic Logic, 16, 1–13, 1951.
[Malinowski, 1977] G. Malinowski. Classical characterization of n-valued Łukasiewicz calculi. Reports on Mathematical Logic, 9, 41–45, 1977.
[Malinowski, 1990] G. Malinowski. Q-consequence operation. Reports on Mathematical Logic, 24, 49–59, 1990.
[Malinowski, 1993*] G. Malinowski. Many-valued logics. Oxford Logic Guides 25, Clarendon Press, Oxford, 1993.
[Malinowski, 1994] G. Malinowski. Inferential many-valuedness. In Woleński, J. (ed.) Philosophical logic in Poland, Synthese Library, Kluwer Academic Publishers, Dordrecht, 74–84, 1994.
[Malinowski, 2002] G. Malinowski. Referential and inferential many-valuedness. In Carnielli, W. A., Coniglio, M. E. and D’Ottaviano, I. M. L. (eds) Paraconsistency: the logical way to the inconsistent, Lecture Notes in Pure and Applied Mathematics, vol. 228, Marcel Dekker Inc., 341–352, 2002.
[Marciszewski, 1987] W. Marciszewski, ed. Logika formalna. Zarys encyklopedyczny z zastosowaniem do informatyki i lingwistyki (Formal logic: an encyclopaedic outline with informatics and linguistics applied). PWN, Warszawa, 1987.
Many-valued Logic and its Philosophy
[Meredith, 1958] C. A. Meredith. The dependence of an axiom of Łukasiewicz. Transactions of the American Mathematical Society, 87, 54, 1958.
[Moh Shaw-Kwei, 1954] Moh Shaw-Kwei. Logical paradoxes for many-valued systems. The Journal of Symbolic Logic, 19, 37–40, 1954.
[Moisil, 1966] G. Moisil. Zastosowanie algebr Łukasiewicza do teorii układów przekaźnikowo-stykowych (Application of Łukasiewicz algebras to the study of relay-contact networks). Ossolineum, Wrocław (vol. II, 1967 edn), 1966.
[Moisil, 1972] G. Moisil. Essais sur les logiques non-chrysippiennes. Editions de l’Académie de la République Socialiste de Roumanie. Bucharest, 1972.
[Montagna, 2000] F. Montagna. An algebraic approach to propositional fuzzy logic, Journal of Logic, Language and Information, Special issue on many-valued Logics of Uncertainty. Mundici, D. (ed.), 9 (1), 91–124, 2000.
[Morgan, 1976] C. G. Morgan. A resolution principle for a class of many-valued logics. Logique et Analyse, 19 (74-75-76), 311–339, 1976.
[Mostowski, 1961] A. Mostowski. Axiomatizability of some many-valued predicate calculi. Fundamenta Mathematicae, 50, 165–190, 1961.
[Nowak, 1988] M. Nowak. O możliwości interpretowania trójwartościowej logiki Łukasiewicza metodą Słupeckiego (On the possibility of interpreting the three-valued Łukasiewicz logic using Słupecki’s method). Acta Universitatis Lodziensis, Folia Philosophica, 5, 3–13, 1988.
[Orłowska, 1967] E. Orłowska. Mechanical proof procedure for the n-valued propositional calculus. Bulletin de l’Académie Polonaise des Sciences, Série des sciences mathématiques, astronomiques et physiques, 15 (8), 537–541, 1967.
[Panti, 1999] G. Panti. Varieties of MV-algebras, Journal of Applied Non-classical Logics, Special issue on many-valued logics. Carnielli, W. A. (ed.), 141–157, 1999.
[Peirce, 1885] C. S. Peirce. On the algebra of logic: a contribution to the philosophy of notation. American Journal of Mathematics, 7, 180–202, 1885.
[Picard, 1935] S. Picard. Sur les fonctions définies dans les ensembles finis quelconques. Fundamenta Mathematicae, 24, 198–302, 1935.
[Post, 1920] E. L. Post. Introduction to a general theory of elementary propositions. Bulletin of the American Mathematical Society, 26, 437, 1920.
[Post, 1921] E. L. Post. Introduction to a general theory of elementary propositions. American Journal of Mathematics, 43, 163–185, 1921.
[Rasiowa, 1974*] H. Rasiowa. An algebraic approach to non-classical logics. North-Holland, Amsterdam; PWN, Warsaw, 1974.
[Rasiowa, 1977] H. Rasiowa. Many-valued algorithmic logic as a tool to investigate programs. In Dunn, J. M. and Epstein, G. (eds) Modern uses of multiple-valued logic. D. Reidel, Dordrecht, 79–102, 1977.
[Rasiowa, 1991] H. Rasiowa. On approximation logics: A survey. University of Warsaw, Warsaw, 1991.
[Rasiowa and Sikorski, 1963] H. Rasiowa and R. Sikorski. The Mathematics of Metamathematics. PWN, Warsaw, 1963.
[Reichenbach, 1935] H. Reichenbach. Wahrscheinlichkeitslehre. Leiden, 1935; English tr. The theory of probability. University of California Press, Berkeley, 1949.
[Rescher, 1969*] N. Rescher. Many-valued logic. McGraw-Hill, New York, 1969.
[Rine, 1977] D. C. Rine, ed. Computer Science and Multiple-valued Logic. Theory and Applications. Amsterdam, North-Holland, 1977.
[Rose and Rosser, 1958] A. Rose and J. B. Rosser. Fragments of many-valued statement calculi. Transactions of the American Mathematical Society, 87, 1–53, 1958.
[Rosenbloom, 1942] P. C. Rosenbloom. Post algebra. I. Postulates and general theory. American Journal of Mathematics, 64, 167–188, 1942.
[Rosser and Turquette, 1952*] J. B. Rosser and A. R. Turquette. Many-valued logics. North-Holland, Amsterdam, 1952.
[Rousseau, 1967] G. Rousseau. Sequents in many-valued logic. Fundamenta Mathematicae, LX, 1, 23–33, 1967.
[Rousseau, 1969] G. Rousseau. Logical systems with finitely many truth-values. Bulletin de l’Académie Polonaise des Sciences, Série des sciences mathématiques, astronomiques et physiques, 17, 189–194, 1969.
[Rutledge, 1959] J. D. Rutledge. A preliminary investigation of the infinitely many-valued predicate calculus. Ph.D. thesis, Cornell University, 1959.
[Salwicki, 1970] A. Salwicki. Formalized algorithmic languages. Bulletin de l’Académie Polonaise des Sciences, Série des sciences mathématiques, astronomiques et physiques, 18, 227–232, 1970.
[Scarpelini, 1962] B. Scarpelini. Die Nichtaxiomatisierbarkeit des unendlichwertigen Prädikatenkalküls von Łukasiewicz. The Journal of Symbolic Logic, 27, 159–170, 1962.
[Schröter, 1955] K. Schröter. Methoden zur Axiomatisierung beliebiger Aussagen- und Prädikatenkalküle. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, 1, 241–251, 1955.
[Scott, 1973] D. Scott. Background to formalisation. In Leblanc, H. (ed.) Truth, Syntax and Modality. North-Holland, Amsterdam, 244–273, 1973.
[Scott, 1974] D. Scott. Completeness and axiomatizability in many-valued logic. In Henkin, L. et al. (eds) Proceedings of Tarski Symposium. Proceedings of Symposia in Pure Mathematics, vol. 25, 411–436, 1974.
[Scott and Solovay, 1969] D. Scott and R. Solovay. Boolean valued models for set theory, Proceedings of the American Mathematical Society Summer Inst. Axiomatic Set Theory 1967. University of California, Los Angeles. Proceedings of Symposia in Pure Mathematics, 13, 1969.
[Skolem, 1957] T. Skolem. Bemerkungen zum Komprehensionsaxiom. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, 3, 1–17, 1957.
[Słupecki, 1936] J. Słupecki. Der volle dreiwertige Aussagenkalkül. Comptes rendus des séances de la Société des Sciences et des Lettres de Varsovie Cl. III, 29, 9–11, 1936; English tr. The full three-valued propositional calculus. In McCall, S. (ed.) Polish Logic 1920–1939. Clarendon Press, Oxford, 1967, 335–337.
[Słupecki, 1939a] J. Słupecki. Kryterium pełności wielowartościowych systemów logiki zdań (A criterion of completeness of many-valued systems of propositional logic). Comptes rendus des séances de la Société des Sciences et des Lettres de Varsovie Cl. III, 32, 102–109, 1939.
[Słupecki, 1939b] J. Słupecki. Dowód aksjomatyzowalności pełnych systemów wielowartościowych rachunku zdań (Proof of the axiomatizability of full many-valued systems of propositional calculus). Comptes rendus des séances de la Société des Sciences et des Lettres de Varsovie Cl. III, 32, 110–128, 1939.
[Słupecki, 1964] J. Słupecki. Próba intuicyjnej interpretacji logiki trójwartościowej Łukasiewicza (An attempt of intuitionistic interpretation of three-valued Łukasiewicz logic). In Rozprawy Logiczne. PWN, Warszawa, 1964.
[Stachniak, 1996] Z. Stachniak. Resolution proof systems: an algebraic theory. Kluwer, Dordrecht, 1996.
[Suchoń, 1974] W. Suchoń. Définition des foncteurs modaux de Moisil dans le calcul n-valent des propositions de Łukasiewicz avec implication et négation. Reports on Mathematical Logic, 2, 43–47, 1974.
[Surma, 1971] S. J. Surma. Jaśkowski’s matrix criterion for the intuitionistic propositional calculus. Prace z logiki, VI, 21–54, 1971.
[Surma, 1973] S. J. Surma. A historical survey of the significant methods of proving Post’s theorem about the completeness of the classical propositional calculus. In Surma, S. J. (ed.) Studies in the History of Mathematical Logic. Ossolineum, Wrocław, 19–32, 1973.
[Surma, 1974] S. J. Surma. An algorithm for axiomatizing every finite logic. Reports on Mathematical Logic, 3, 57–62, 1974.
[Surma, 1984] S. J. Surma. An algorithm for axiomatizing every finite logic. In Rine, D. C. (ed.) Computer Science and Multiple-Valued Logics, North Holland, Amsterdam, 143–149, 1984.
[Suszko, 1957] R. Suszko. Formalna teoria wartości logicznych (A formal theory of logical values). Studia Logica, VI, 145–320, 1957.
[Suszko, 1972] R. Suszko. Abolition of the Fregean Axiom. In Parikh, R. (ed.) Logic Colloquium, Symposium on Logic held at Boston, 1972–73. Lecture Notes in Mathematics, vol. 453, 169–239, 1972.
[Suszko, 1975] R. Suszko. Remarks on Łukasiewicz’s three-valued logic. Bulletin of the Section of Logic, 4, no. 3, 87–90, 1975.
[Suszko, 1977] R. Suszko. The Fregean Axiom and Polish Mathematical Logic in the 1920’s. Studia Logica, 36, no. 4, 377–380, 1977.
[Takahashi, 1967] R. Takahashi. Many-valued logics of extended Gentzen style I. Science Reports of the Tokyo Kyoiku Daigaku, Section A, 9(231), 95–116, 1967.
[Takahashi, 1970] R. Takahashi. Many-valued logics of extended Gentzen style II. Journal of Symbolic Logic, 35(231), 493–528, 1970.
[Tarski, 1930] A. Tarski. Über einige fundamentale Begriffe der Metamathematik. Comptes Rendus des séances de la Société des Sciences et des Lettres de Varsovie Cl. III, 23, 22–29, 1930; English tr. In Tarski, A. Logic, Semantics, Metamathematics: Papers from 1923 to 1938, translated by Woodger, J. H. Clarendon Press, Oxford, 1956, 30–37.
[Tarski, 1936] A. Tarski. O pojęciu wynikania logicznego (On the concept of logical consequence). Przegląd Filozoficzny, 39, 58–68, 1936; English tr. In Tarski, A. Logic, Semantics, Metamathematics: Papers from 1923 to 1938, translated by Woodger, J. H. Clarendon Press, Oxford, 1956, 409–420.
[Tarski, 1938] A. Tarski. Der Aussagenkalkül und die Topologie. Fundamenta Mathematicae, 31, 103–134, 1938; English tr. In Tarski, A. Logic, Semantics, Metamathematics: Papers from 1923 to 1938, translated by Woodger, J. H. Clarendon Press, Oxford, 1956, 421–454.
[Tokarz, 1974] M. Tokarz. A method of axiomatization of Łukasiewicz logics. Bulletin of the Section of Logic, 3, no. 2, 21–24, 1974.
[Traczyk, 1964] T. Traczyk. An equational definition of a class of Post algebras. Bulletin de l’Académie Polonaise des Sciences Cl. III, 12, 147–149, 1964.
[Turner, 1984*] R. Turner. Logics for Artificial Intelligence. Ellis Horwood, Chichester, 1984.
[Tuziak, 1988] R. Tuziak. An axiomatization of the finitely-valued Łukasiewicz calculus. Studia Logica, 48, 49–56, 1988.
[Urquhart, 1973] A. Urquhart. An interpretation of many-valued logic. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, 19, 111–114, 1973.
[Urquhart, 1986*] A. Urquhart. Many-valued logic. In Gabbay, D. and Guenthner, F. (eds) Handbook of philosophical logic, vol. III. D. Reidel, Dordrecht, 71–116, 1986.
[Wade, 1945] C. I. Wade. Post algebras and rings. Duke Mathematical Journal, 12, 389–395, 1945.
[Wajsberg, 1931] M. Wajsberg. Aksjomatyzacja trójwartościowego rachunku zdań. Comptes Rendus de la Société des Sciences et des Lettres de Varsovie Cl. III, 24, 126–148, 1931; English tr. Axiomatization of the three-valued propositional calculus. In McCall, S. (ed.) Polish Logic 1920–1939. Clarendon Press, Oxford, 1967, 264–284.
[Wajsberg, 1933] M. Wajsberg. Ein erweiterter Klassenkalkül. Monatshefte für Mathematik und Physik, 40, 113–126, 1933.
[Webb, 1935] D. L. Webb. Generation of any n-valued logic by one binary operation. Proceedings of the National Academy of Sciences, 21, 252–254, 1935.
[Whitehead and Russell, 1910] A. N. Whitehead and B. Russell. Principia Mathematica, vol. I. Cambridge University Press, 1910.
[Williamson, 1994] T. Williamson. Vagueness. Routledge, London and New York, 1994.
[Woleński, 1989] J. Woleński. Logic and philosophy in the Lvov–Warsaw School. Synthese Library, 198. D. Reidel, Dordrecht, 1989.
[Wolf, 1977*] R. G. Wolf. A survey of many-valued logic (1966–1974), Appendix II. In Dunn, J. M. and Epstein, G. (eds) Modern uses of multiple-valued logic. D. Reidel, Dordrecht, 167–324, 1977.
[Wójcicki, 1970] R. Wójcicki. Some remarks on the consequence operation in sentential logics. Fundamenta Mathematicae, 68, 269–279, 1970.
[Wójcicki, 1977] R. Wójcicki. Strongly finite sentential calculi. In Wójcicki, R. and Malinowski, G. (eds) Selected papers on Łukasiewicz sentential calculi. Ossolineum, Wrocław, 53–77, 1977.
[Wójcicki, 1988] R. Wójcicki. Theory of logical calculi. Basic theory of consequence operations. Synthese Library, 199. Kluwer Academic Publishers, Dordrecht, 1988.
[Zach, 1993] R. Zach. Proof theory of finite-valued logics, Master’s thesis, Institut für Algebra und Discrete Mathematik, TU Wien. 8, 338–353, 1993.
[Zadeh, 1965] L. A. Zadeh. Fuzzy sets. Information and Control, 8, 338–353, 1965.
[Zadeh, 1972] L. A. Zadeh. A fuzzy-set-theoretic interpretation of linguistic hedges. Journal of Cybernetics, 2, 4–34, 1972.
[Zadeh, 1975] L. A. Zadeh. Fuzzy logic and approximate reasoning. Synthese, 30, 407–428, 1975.
[Zadeh, 1976] L. A. Zadeh. A fuzzy-algorithmic approach to the definition of complex or imprecise concepts. International Journal of Man–Machine Studies, 8, 249–291, 1976.
[Zawirski, 1934a] Z. Zawirski. Znaczenie logiki wielowartościowej i związek jej z rachunkiem prawdopodobieństwa (Significance of many-valued logic for cognition and its connection with the calculus of probability). Przegląd Filozoficzny, 37, 393–398, 1934.
[Zawirski, 1934b] Z. Zawirski. Stosunek logiki wielowartościowej do rachunku prawdopodobieństwa (Relation of many-valued logic to the calculus of probability). Prace Komisji Filozoficznej Polskiego Towarzystwa Przyjaciół Nauk, 4, 155–240, 1934.
[Zinov’ev, 1963*] A. A. Zinov’ev. Philosophical problems of many-valued logic, edited and translated by Küng, G. and Comey, D. D. D. Reidel, Dordrecht, 1963.
PRESERVATIONISM: A SHORT HISTORY
Bryson Brown

Preservationism is a general approach to understanding consequence relations. Preservationist consequence relations dispense with the usual assumption that the semantic and syntactic properties preserved by consequence must be truth (or satisfiability) and consistency. Instead, this family of consequence relations draws on other semantic and syntactic features of premise sets, conclusion sets and even of consequence relations themselves. Preserving those features across extensions of sets of sentences, or a range of cases, provides new accounts of consequence. The central idea was proposed by R. E. Jennings and P. K. Schotch in a series of papers that appeared in the late 1970s and early 1980s. Since then, they, their students and colleagues have developed a wide range of new consequence relations, as well as some new readings of familiar consequence relations.

In general, an interesting preservable property of premise sets will be preserved under some but not all extensions of the premise set. Philosophically, it will be a property we think of as “good” for a set to have, but what this comes to is not very constraining: What we require is not ideal goodness (something which lurks behind the conventional attachment to ‘truth-preservation’), but only a modest, comparative sort of goodness. Here two slogans, coined by Schotch and Jennings (respectively), fit nicely:

Hippocrates: Don’t make things worse.
Making do: Find something you like about your premises, and preserve it.

Properties dual to such premise-set properties make good candidates for preservable properties of conclusion sets. Interesting new consequence relations also result when we insist that a truth-preserving consequence relation be preserved across a range of premise and conclusion sets based on given premise and conclusion sets.

We begin with an account of the familiar classical consequence relation, emphasizing its preservational character.
The main early motivation for preservationism emerges from this account: The need for a consequence relation that deals more constructively with inconsistent premises. The rest of the story will present, in rough chronological order, the main preservationist systems and what is known (and not yet known) about them.
Handbook of the History of Logic. Volume 8 Dov M. Gabbay and John Woods (Editors) © 2007 Elsevier B.V. All rights reserved.
1 SOME OBSERVATIONS ABOUT THE CLASSICAL VIEW OF CONSEQUENCE
We will say that a set Γ is consistent if and only if it is impossible to derive a sentence and its negation from Γ; Γ is satisfiable if and only if some allowed valuation assigns a designated value (usually read as a form of ‘truth’, indicating the correctness of an assertive commitment) to every sentence in Γ. A set is maximally consistent if it is consistent and adding any sentence not already in it would produce an inconsistent set. Similarly, a set is maximally satisfiable if it is satisfiable and no proper superset is satisfiable.

The standard view of consequence relations results from a straightforward account of semantic and syntactic consequence relations (⊨ and ⊢): ⊨ guarantees the preservation of truth, while ⊢ preserves consistency. This can also be expressed (without appeal to the notion of a ‘guarantee’) by saying that ⊨ preserves the satisfiability of all satisfiable extensions of Γ, and that ⊢ preserves the consistency of all consistent extensions of Γ. Formally,

1. Γ ⊨ α iff ∀Γ′ [(Γ′ ⊇ Γ & Γ′ is satisfiable) → Γ′, α is satisfiable].
2. Γ ⊢ α iff ∀Γ′ [(Γ′ ⊇ Γ & Γ′ is consistent) → Γ′, α is consistent].¹

In English, if Γ′ extends Γ satisfiably or consistently, then every consequence of Γ must satisfiably or consistently extend Γ′. This leads to an interesting observation: We can say that closing under these consequence relations begs no questions, construing question begging as treating sentences as part of the commitments that go with having accepted Γ, despite their being incompatible with some acceptable extensions of Γ. Conclusions that are incompatible with acceptable extensions of a premise set clearly go beyond the commitments that come with accepting those premises.²

Given the soundness and completeness of our system of derivation, 1 and 2 are simply alternative definitions of the same consequence relation, represented in the standard way as a set of ordered pairs of sets of sentences and individual sentences.
1 Here we’re using the notational convention that Γ, α = Γ ∪ {α}. We could also present 1 and 2 in slightly different form:
1′. Γ ⊨ α iff ∀Γ′ [(Γ′ is a maximal satisfiable extension of Γ) → α ∈ Γ′].
2′. Γ ⊢ α iff ∀Γ′ [(Γ′ is a maximal consistent extension of Γ) → α ∈ Γ′].
However, both these formulations leave aside the issue of extension, and to our way of thinking, the notion of a consequence relation is very tightly linked to the idea of ‘acceptable’ extensions of sets of sentences. Of course we can re-capture this notion by appeal to extensions which are subsets of such maximal extensions. So the two versions will be distinct only if we have non-standard notions of unsatisfiability and inconsistency that are not always preserved under supersets.
2 Of course, those interested in induction and ampliative reasoning in general may balk at defining acceptability in terms of satisfiability or consistency alone, so they may still dispute Hume’s characterization of induction as question-begging.

The principal point here concerns the implications of this picture of consequence for the consequences of unsatisfiable or inconsistent premise sets. Both unsatisfiability and inconsistency are preserved when we form supersets. So if Γ is unsatisfiable
or inconsistent, Γ has no satisfiable or consistent extensions. This implies that for an unsatisfiable or inconsistent premise set Γ, every sentence trivially preserves the satisfiability of every satisfiable extension of Γ; for such a Γ clauses 1 and 2 are satisfied for every α. Thus every sentence is a consequence of an unsatisfiable or inconsistent set. This trivialization of the consequences of unsatisfiable and inconsistent premise sets is deeply rooted in the standard account of consequence relations. Interest in paraconsistent consequence relations began with the recognition that, though this is all clear and perfectly correct as far as it goes, trivialization of these premise sets is also very unhelpful. In the course of events, we sometimes do end up with inconsistent sets of commitments. Classical logic’s response is to demand that we find another set of premises to work with. Though this seems to be good advice, it does not provide enough guidance, that is, it does not tell us how to do what it demands. Once we have arrived at inconsistent commitments, we need insight into those commitments to help us see how to improve on them. Classical logic simply tells us that our commitments are trivial, leaving behind neither an account of the content of conflicting commitments that could help us reflect on where and how we have gone wrong, nor any way of muddling through in the meanwhile. The classicists’ main response has been to interrogate the premises involved separately, on suspicion. Sometimes this is helpful. But sometimes the bona fides of each premise look perfectly sound, even though their collective consequences make it clear that something has gone wrong. As a simple example, consider a large collection of measurements, collectively inconsistent but each carefully made. No individual result need stand out as suspicious here, and repeating the measurements (if this is possible) is likely to produce yet another inconsistent set of results. 
Another sort of case arises when we develop formal theories: Simple sets of axioms each of which seems to express a clear, if informal, intuition that we want to capture often turn out to be inconsistent. Giving one or more up undermines the application that motivated the construction of the formal system in the first place. If we could find a constructive way of faute de mieux reasoning with inconsistent measurements and axioms, the results might be illuminating. But classical logic won’t allow it.
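The trivialization described above can be checked mechanically on a toy language. The following Python sketch is an illustration only, not part of the original account: the tuple encoding of formulas, the function names, and the restriction of clause 1 to extensions drawn from a finite pool of sentences are all assumptions made for the example. The restriction is harmless here so long as the pool contains the negation of the candidate conclusion.

```python
from itertools import combinations, product

# Hypothetical encoding: formulas are nested tuples.
def atom(name): return ("atom", name)
def neg(f): return ("not", f)
def disj(f, g): return ("or", f, g)

def atoms_of(f):
    """Names of the atoms occurring in formula f."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms_of(g) for g in f[1:]))

def holds(f, v):
    """Classical truth of f under the assignment v (dict: name -> bool)."""
    if f[0] == "atom":
        return v[f[1]]
    if f[0] == "not":
        return not holds(f[1], v)
    if f[0] == "or":
        return holds(f[1], v) or holds(f[2], v)
    raise ValueError("unknown connective")

def satisfiable(sentences):
    """True iff some classical assignment makes every sentence true."""
    names = sorted(set().union(*(atoms_of(s) for s in sentences)))
    return any(
        all(holds(s, dict(zip(names, bits))) for s in sentences)
        for bits in product([False, True], repeat=len(names))
    )

def entails(gamma, alpha, pool):
    """Clause 1, with extensions of gamma limited to subsets of pool."""
    gamma = frozenset(gamma)
    extras = [s for s in pool if s not in gamma]
    for n in range(len(extras) + 1):
        for extra in combinations(extras, n):
            ext = gamma | set(extra)
            if satisfiable(ext) and not satisfiable(ext | {alpha}):
                return False  # alpha spoils a satisfiable extension
    return True

p, q = atom("p"), atom("q")
pool = [p, q, neg(p), neg(q)]

print(entails([p, disj(neg(p), q)], q, pool))  # q follows from p and (not p or q)
print(entails([p], q, pool))                   # blocked by the extension {p, not q}
print(entails([p, neg(p)], q, pool))           # no satisfiable extension exists
print(entails([p, neg(p)], neg(q), pool))      # so every sentence follows
```

In the last two calls the loop never finds a satisfiable extension of the premise set, so clause 1 holds vacuously for any conclusion whatever. That is the trivialization at issue.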
Paraconsistency

A consequence relation that copes constructively with such sets of commitments must go beyond preserving the classical notions of truth or consistency. We say that a consequence relation Rp is minimally paraconsistent³ iff this trivialization fails for Rp, i.e. iff for some classically unsatisfiable or inconsistent set of sentences Γ, there is a sentence α such that ⟨Γ, α⟩ ∉ Rp. There are two main strategies for producing such a consequence relation, based on different ways of changing clauses 1 and 2.

3 The definition of ‘paraconsistent’, meaning either beyond or near consistent, remains contested territory. Some, following da Costa, demand that a paraconsistent logic not have the law of non-contradiction, ¬(p ∧ ¬p), as a theorem. But most now focus on non-trivialization. The divide within this group separates those who insist on non-trivialization of contradictions, i.e. the failure of p ∧ ¬p ⊢ α, for some α, from those who insist more weakly only on the non-trivialization of some inconsistent premise sets.
Strategy A: New accounts of “truth” and “consistency”

This is the road most traveled in paraconsistent logic. Its practitioners include dialetheists, who defend a radical account of truth according to which some contradictions are in fact true. But they also include more conservative figures who take the new semantic values they propose for sentences to express epistemic commitment, or some other status more modest than truth, tout court. On our taxonomy, any paraconsistent semantics that provides non-trivial assignments of designated values to the members of some classically unsatisfiable sets of sentences falls into this group (the difference between the dialetheists and the rest being a matter of interpretation, rather than a substantive difference of formal approach). This approach retains the standard assumption that whenever α is not a semantic consequence of Γ for a consequence relation R, this is because the proposed semantics provides a valuation V such that for all γ ∈ Γ, V(γ) ∈ {v : v is designated} and V(α) ∉ {v : v is designated}.

While truth is the standard reading of a designated value, philosophers have not always interpreted the designated values of a formal semantics as forms of truth. For instance, in “How a Computer Should Think,” Belnap reads the values of Dunn’s four-valued logic epistemically, as “told true”, “told false”, “told both” and “told neither”. However, even when we keep this interpretational latitude in mind, this semantic approach to consequence relations remains formally constraining. It focuses our attention on the assignment of values to sentences and the distinction between designated and undesignated values. The resulting consequence relations insist that Γ ⊨ α holds if and only if α is assigned a designated value in every valuation assigning designated values to all the members of Γ. The implications of this account are explored in a very general guise in Scott [1974].
Scott considers the properties of a consequence relation determined by a set of ‘allowed’ valuations from the sentences of a language L, defined as functions from sentences of L into {0,1}, where Γ ⊨ α ⇔ ∀V ∈ allowed valuations: [∀γ ∈ Γ, V(γ) = 1] ⇒ V(α) = 1. All three of transitivity, monotonicity and reflexivity must hold for any such consequence relation, and any consequence relation obeying these rules can be captured by such a set of allowed valuations. But there are interesting forms of consequence relation that do not fit this pattern. And some consequence relations that do fit the pattern can also be captured (and illuminated) by a different approach.
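As a rough illustration of this valuation-based picture, the Python sketch below fixes a small, arbitrary set of allowed valuations and checks by brute force that the induced relation is reflexive, monotonic and transitive. The three-sentence language, the particular valuations and all function names are assumptions chosen for the example; nothing here is taken from Scott's paper beyond the definition quoted above.

```python
from itertools import combinations

# Hypothetical toy language and an arbitrary choice of allowed valuations.
SENTENCES = ("p", "q", "r")
ALLOWED = (
    {"p": 1, "q": 1, "r": 0},
    {"p": 0, "q": 1, "r": 1},
    {"p": 0, "q": 0, "r": 0},
)

def follows(gamma, alpha, allowed=ALLOWED):
    """alpha gets 1 in every allowed valuation giving 1 to all of gamma."""
    return all(v[alpha] == 1
               for v in allowed
               if all(v[g] == 1 for g in gamma))

def premise_sets():
    """Every subset of the toy language, as a frozenset."""
    return [frozenset(c)
            for n in range(len(SENTENCES) + 1)
            for c in combinations(SENTENCES, n)]

def reflexive():
    return all(follows(g, a) for g in premise_sets() for a in g)

def monotonic():
    return all(follows(d, a)
               for g in premise_sets()
               for d in premise_sets() if g <= d
               for a in SENTENCES if follows(g, a))

def transitive():
    # Cut: if a follows from g, and b follows from g plus a, then b follows from g.
    return all(follows(g, b)
               for g in premise_sets()
               for a in SENTENCES if follows(g, a)
               for b in SENTENCES if follows(g | {a}, b))

print(reflexive(), monotonic(), transitive())
print(follows({"p"}, "q"), follows({"q"}, "p"))
print(follows({"p", "r"}, "q"))  # no allowed valuation gives 1 to both p and r
```

Replacing ALLOWED with any other collection of 0/1 valuations leaves the three checks passing, which is the point of the observation: on this approach the structural properties come for free. The last call shows the same vacuous behaviour as before, since a premise set that no allowed valuation satisfies has every sentence as a consequence.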
Strategy B: Preservationist Paraconsistency

The preservationist approach has been less widely pursued. But it has some advantages over the first. The main idea is contained in our two opening slogans, Schotch’s Hippocratic principle and Jennings’ suggestion that we learn to make do. There are many properties whose preservation will constrain the acceptable extensions of sets of sentences. Whether we think of them as measures of how bad our premise set is, or of how good it is, they allow us to distinguish good extensions from bad, and thereby allow us to go on distinguishing consequences from nonconsequences even when our premise sets are logically imperfect.

More radically, we can also consider dispensing with extensions in favour of a more general picture, in which we replace the set of allowed valuations with a function from sets of sentences to allowed valuations. The intersection of the sets of sentences assigned ‘1’ by the valuations acceptable relative to Γ then determines Γ’s consequences. Insisting that the valuations acceptable relative to Γ assign ‘1’ to an extension of Γ forces reflexivity and transitivity on the consequence relation, while this generalization allows for consequence relations that are not monotonic, transitive or reflexive: Reflexivity obviously fails when at least one of the allowed valuations relative to Γ fails to assign ‘1’ to a member of Γ. Transitivity fails in a somewhat subtler way: If some valuation acceptable relative to Γ assigns ‘0’ to β, while every valuation acceptable relative to Γ assigns ‘1’ to α and every valuation acceptable relative to Γ, α assigns ‘1’ to β, then β will fail to be a consequence of Γ even though β is a consequence of Γ, α and α is a consequence of Γ. This case is odd, indeed, but the formal account just given allows for it, as d’Entremont [1983] made clear.
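The generalization just described, a function from premise sets to acceptable valuations, can be sketched in a few lines of Python. Everything concrete below is hypothetical: the sentence names, the hand-picked table of valuations used to exhibit the transitivity failure, and the particular way of refusing ‘1’ to sentences whose negations are also premises. It is a toy under those assumptions, not a reconstruction of any published system.

```python
from itertools import product

def consequences(gamma, acceptable, sentences):
    """Sentences assigned 1 by every valuation acceptable relative to gamma."""
    valuations = acceptable(frozenset(gamma))
    return {s for s in sentences if all(v[s] == 1 for v in valuations)}

# --- Transitivity failure, following the pattern described in the text ---
ABG = ["g", "a", "b"]  # g plays the role of the premise set, a and b of alpha, beta

def acceptable_table(gamma):
    # Hypothetical, hand-picked assignment of valuations to two premise sets.
    if gamma == frozenset({"g"}):
        return [{"g": 1, "a": 1, "b": 1}, {"g": 1, "a": 1, "b": 0}]
    if gamma == frozenset({"g", "a"}):
        return [{"g": 1, "a": 1, "b": 1}]
    raise KeyError("premise set not covered by this toy table")

# --- Reflexivity failure: refuse to assign 1 to contradicted premises ---
LANG = ["p", "~p", "q"]

def negation(s):
    return s[1:] if s.startswith("~") else "~" + s

def acceptable_written_out(gamma):
    """All 0/1 valuations giving 0 to premises whose negation is also a
    premise, and 1 to every other premise."""
    clash = {s for s in gamma if negation(s) in gamma}
    result = []
    for bits in product([0, 1], repeat=len(LANG)):
        v = dict(zip(LANG, bits))
        if all(v[s] == (0 if s in clash else 1) for s in gamma):
            result.append(v)
    return result

print(consequences({"g"}, acceptable_table, ABG))       # contains a but not b
print(consequences({"g", "a"}, acceptable_table, ABG))  # contains b
print(consequences({"p", "~p", "q"}, acceptable_written_out, LANG))
```

In the first half, "a" follows from {"g"} and "b" follows from {"g", "a"}, yet "b" does not follow from {"g"}, because the valuations acceptable relative to the larger premise set are not simply those acceptable relative to the smaller one that happen to verify "a". In the second half, the premise set {"p", "~p", "q"} has "q" as a consequence but neither "p" nor "~p", so reflexivity fails for exactly the contradicted sentences.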
And there is something valuable in this, even if our proposals don’t exploit this possibility: By making room for these outré possibilities, we can illuminate the difference between systems that allow them and systems that don’t. Further, just as conditional forms of monotonicity apply to some preservationist logics, conditional forms of these properties, ensuring that reflexivity and transitivity hold except in certain ugly circumstances, provide more ways of coping when our premises do get ugly. For example, we might choose to ‘write out’ any contradictory sentences in a premise set Γ by refusing to assign them the value ‘1’ in the valuations allowed relative to Γ, while retaining reflexivity for all other sentences.

From the classical point of view, there is nothing worse than an inconsistent, unsatisfiable set of sentences. Classical logic preserves satisfiability or consistency only; once these are lost nothing remains that a dogmatically classical logician cares to preserve. But this is short-sighted, narrow-minded, or both. Some other features of premise sets are worth preserving. Non-triviality is the most obvious example, but we will encounter several examples of specific preservable properties that can persist and be systematically preserved by a system of inference even in premise sets that are unsatisfiable and inconsistent. Beginning with a set of sentences that merely includes p and ¬p and applying our consequence relation to obtain the set of all sentences makes things worse by any reasonable measure: It takes us from a set that we could reasonably use to
Bryson Brown
represent someone’s (inconsistent) beliefs, or the contents of an interesting but inconsistent theory, to the set of all sentences in the language, a set that is completely useless as a representation of either. Applying a non-paraconsistent consequence relation to inconsistent premise sets clearly violates Schotch’s Hippocratic advice on dealing with bad situations. And this implies that we have good reason to take up Jennings’ positive suggestion: We should seek other virtues that such premise sets can still possess, and ways to reason with them that respect those virtues.

The preservationist alternative widens the options before us. It liberates consequence relations from the tyranny of designated values. It does not demand that we find a way to assign a designated value to the premises of a rejected consequence while assigning a non-designated value to the conclusion. We can assign values to sentences just as classical logicians do,⁴ so long as we also recognize other features worth preserving. If some inconsistent/unsatisfiable sets possess these features and the features are not preserved by every extension of these sets, then the resulting consequence relation will be a paraconsistent one.

2 ORIGIN MYTH

Preservationism developed out of work on deontic logic, begun roughly in 1975 by Raymond E. Jennings and Peter K. Schotch. But, like many innovative ideas, it arose from work aimed at other concerns. At the time, Jennings had recently been to New Zealand on a research trip to extend earlier work on the logic of preference and utilitarianism that he had done while working on his MA at Queen’s University. Jennings remarks that he found New Zealand ‘just like England before the war – the war of 1812, that is’. But he also found it a wonderful place to be as he worked on his ‘utilitarian semantics’. Wellington was friendly and accessible, and Jennings learned a lot of modal logic from both Rob Goldblatt and Max Cresswell.
On his return from New Zealand, Jennings made a trip to Dalhousie University to visit Steven Burns, an old friend from student days at Queen’s. As the local logician, Peter Schotch was recruited to act as host. Jennings stayed with Schotch for about 5 days as the two worked through a list of 38 Henkin-style completeness proofs for various systems of modal logic. The next summer Jennings took a teaching appointment at Dalhousie and worked with Schotch on plans for a book on modal logic. Their aim was to update and refine the material in Hughes and Cresswell’s book in the light of Segerberg’s work on frame theory; they hoped to make the result ‘a little more readable’ than Segerberg. As they worked on their Primer on Modal Logic (only partly finished during that summer’s work), Jennings was also pursuing issues in deontic logic. One day he came into Schotch’s office and exclaimed that normal modal logic doesn’t work. His complaint was that it collapses distinctions that they both considered important. This collapse was particularly clear in the light of Segerberg’s topological semantics

⁴ Though, of course, we need not!
Preservationism: A short history
for modal logic [Segerberg, 1972], which defines a modal frame as an ordered pair, ⟨U, R⟩, with U a set and R a non-empty binary relation on U. Each element of U determines a valuation on the modal language, with the modal operator having the truth condition:

u ⊨ □α ⇔ ∀u′ : Ruu′ : u′ ⊨ α.

This class of modal frames is characterized by the following principles:⁵

K: □α ∧ □β → □(α ∧ β)
RM: ⊢ α → β ⇒ ⊢ □α → □β
RN: ⊢ α ⇒ ⊢ □α

The following two equivalences are inescapable for any such logic:

□□α → □♦α ⇔ □(□α → ♦α)
□p → ♦p ⇔ ¬□⊥

But despite their equivalence in any normal modal logic, these pairs of statements are intuitively distinct, at least when read deontically: That ‘ought implies can’ seems distinct from ‘It’s not the case that it ought to be the case that the false’. Similarly, that ‘If it ought to be that it ought to be that α, then it ought to be permitted that α’ seems distinct from ‘It ought to be that, if it ought to be that α, then it is permitted that α’. Schotch and Jennings began to consider how to change the semantics so as to distinguish these principles. The obvious targets (as constraints on normal modal frames) were:

1. The requirement that the set of points in the frame be non-empty.
2. The central role of the binary ‘accessibility’ relation defined on the points.

Pursuing empty frames seemed both odd and unhelpful, so Schotch and Jennings decided to explore changes to the relation. The only restriction normal modal logic imposes on the relation is that it must be binary, so the obvious alternative was to consider relations of higher ‘arity’.⁶ Beginning with a ternary relation treated as a function from worlds to ‘accessible’ world-pairs, they proposed a simple truth condition for ‘□α’:

u ⊨ᴹ □α ⇔ ∀xy : uRxy → (x ⊨ᴹ α or y ⊨ᴹ α).

⁵ It can also be characterized by the elegant single rule, R□: Γ ⊢ α ⇒ □[Γ] ⊢ □α, where □[Γ] is the result of placing a ‘□’ in front of each element of Γ.
⁶ The possibility of multiple binary relations was also explored.

Right away it was clear that on this semantics, □α, □β ⊭ □(α ∧ β). The first question they turned to was how to get

K: □α ∧ □β → □(α ∧ β)

back again in this semantic context. Requiring ∀x, y : (∃u : uRxy) → x = y is close enough (though not quite exactly right, since this approach to collapsing the ternary semantics into a binary semantics doesn’t correspond strictly to K). The second question was, what corresponds to modal aggregation in this context? The next day, Jennings pointed out that

K2: □α ∧ □β ∧ □γ → □((α ∧ β) ∨ (α ∧ γ) ∨ (β ∧ γ))

holds in ternary frames, on pigeon-hole grounds. Similarly, the rule

Kn−1: □α1 ∧ . . . ∧ □αn → □⋁(αi ∧ αj), 1 ≤ i ≠ j ≤ n⁷
holds in n-ary frames in general. Suspecting that these rules might constitute the general modal aggregation rules for these ‘diagonal’ n-ary frames, Schotch and Jennings began a long effort to provide a completeness proof for them. As the end of summer school approached, they found themselves still stuck on the completeness proof for the logic based on K2 . Both went on working on the completeness problem over the fall and winter, and work on the project continued during summer school in 1976 at Simon Fraser University. As this project continued, Schotch and Jennings also began to reflect on this new consequence relation and its implications for more traditional ideas about consequence. The upshot was a growing appreciation of the limits imposed by the central role generally granted truth in the semantics of logical consequence, and the possibility of casting wider logical nets. Preservationism was born. Since then, it has developed along several main lines, as well as contributing to a number of smaller investigations. Here we will focus on three main lines of preservationist research, commenting on some of the other projects along the way.
Weak Aggregation

As we’ve already seen, preservationism began with weakly aggregative modal logics and the ‘forcing’ consequence relation. In this section we continue the story of these logics and some important results that have emerged from work on them. In the summer of 1976, Schotch recalled work he had done in the algebra of modal logic while at the University of Waterloo. Dennis Higgs, an algebraist there, had introduced Schotch to Tarski and Jónsson’s [1951] “Boolean Algebras with Operators”, which included consideration of n-ary frames: a ternary frame corresponds to a binary operator in Tarski and Jónsson’s approach, and in general, an n-ary frame corresponds to an (n − 1)-ary operator. Higgs had encouraged Schotch to work on modal logic from this point of view. At the following NSRCC conference hosted by Steve Thomason, Jennings and Schotch discussed this work with Kit Fine, Krister Segerberg and Rob Goldblatt,

⁷ Called ‘cockroach intro’ (∧∨-intro) for its combination of conjunction (∧) and disjunction (∨).
as well as (someone??) working on applications to computing and dynamic logics. Goldblatt had considered binary operators with a temporal reading of ♦: first this, and then that; he had done a Henkin completeness proof for that system. David Johnston, then a student of Jennings’, successfully applied this approach of Goldblatt’s to n-ary frames in general in his M.A. thesis [1976]. Still, like Schotch and Jennings, he didn’t manage to produce a completeness proof for the diagonal fragment corresponding to Schotch and Jennings’ truth condition for □.⁸

Schotch and Jennings continued to pursue a completeness proof for the diagonal fragment in the summer of 1976. It was clear that the issue was closely connected to (n − 1)-ary partitions of premise sets: a pigeon-hole argument established the soundness of Kn−1 for an n-ary frame relation, but a proof of completeness eluded them. While they were struggling with the proof, Max Cresswell passed through and encouraged them to keep at it, remarking that the same thing had happened to him before. After some time, they felt that the K2 case (for ternary frames) was proven. Jim Freeman organized a CPA workshop at which they planned to present the proof. Storrs McCall, Hughes Leblanc and Danny Daniels were there, on a very hot (40°) day. Halfway through lunch Ray asked Peter, “Isn’t this a counterexample?” Working through it again, Schotch anxious and excited (even manic, he says) and Jennings sleepy in the heat, they arrived at the notion of replacing the K2 rule with an appeal to a new consequence relation: the consequences that survive the n-ary partitions of a premise set. This consequence relation, written Γ[⊢ α, holds if and only if every n-partition of Γ includes a cell that classically proves α. With this notion of consequence in place of K2, completeness for their diagonal modal logics could be proven straightforwardly.
At the 1978 SEP meetings in Pittsburgh, this novel ‘forcing’ relation and aggregation principles took the stage together, with Bill Harper, Teddy Seidenfeld and Bas van Fraassen among the audience. Chellas pointed out that Scott’s elegant rule for normal modal logic,

Γ ⊢ α / □(Γ) ⊢ □α

could be used to capture the diagonal n-ary modal logics Schotch and Jennings had formulated: Simply put the new forcing consequence relation in place of ‘⊢’ in Scott’s rule, and the diagonal n-ary modal logic would result. Barbara Partee pointed out that this consequence relation would allow non-trivial reasoning from inconsistent data. Earlier at the meeting, R. Wolf had read a paper, titled “Studies in Paraconsistent Logic I,” about work with Newton da Costa; this paper might have been the source of Partee’s suggestion. Partee’s intervention marked the first time that Schotch and Jennings heard the word ‘paraconsistency’.

⁸ This problem, which continued to be a central concern for Schotch, Jennings and their coworkers, can be found (in what may be its first appearance in the literature) in Prior’s work on tense operators, where he defined the following pair of operators:

u ⊨ α · β ⇔ ∃xy : uRxy & (x ⊨ α ∧ y ⊨ β)
u ⊨ α ⊙ β ⇔ ∀xy : uRxy ⇒ (x ⊨ α ∨ y ⊨ β)

The ‘□α’ operator of the diagonal fragment of n-ary modal logic can be defined as α ⊙ α.
Perhaps unsurprisingly, the reception for Wolf’s paper may not have been entirely positive: Schotch reports a ‘fierce scowl’ on Rich Thomason’s face during the presentation, though he adds that this might not have indicated a negative response to the paper. At any rate, following the conference Schotch and Jennings went off to pursue more work on paraconsistency.9 The result was a series of papers by Schotch, Jennings and Johnston on paraconsistency and new ideas for consequence relations that appeared between 1979 and the mid-80’s. In 1979, in the IEEE annual conference proceedings, Schotch and Jennings presented their new measure of (departure from) consistency in a paper titled “Multiple Valued Consistency”. In Schotch and Jennings [1980], the links between modality and consequence were explored, paralleling Scott’s rule for the base modal logic K with the forcing rule for the Kn modal logics. Another topic in Schotch and Jennings’ investigations at the time grew out of the study of n-ary relations (where n > 2) that their semantic interest in these modal logics led them to. The theory of binary relations has dominated mathematical exploration of relations — even ternary and quaternary relations rarely come to the fore there. Special names are used to characterize particular families of binary relations: reflexive, binary, transitive, symmetric, Euclidean, serial, etc. But names for particular families of ternary relations are hard to find — as are explorations of any correspondence between the standard families of binary relations and relations of higher ‘arity’. While binary relations play important roles in our understanding of the modal logic of binary frames, there were many interesting questions to ask about the parallel cases (if any) for ternary and higher arity frames. D.K. Johnston took a special interest in these issues at the time. 
But the central target of preservationist work in this period was the forcing relation, its semantics and axiomatization. In the early 1980s Schotch and Jennings contributed “On Detonating” to a volume on paraconsistency being edited by Graham Priest and Richard Sylvan. This paper presented a multiple-conclusion version of forcing, with weakened aggregation used to treat both inconsistency on the left and the failure of consistent deniability on the right. The volume finally emerged in 1989 as [Priest et al., 1989].¹⁰ The issues pursued in this period include the completeness question, the axiomatization of forcing, frame correspondence issues, and the first-order definability of frame conditions providing semantics for extensions of the base diagonal n-ary modal logics formulated in terms of the forcing relation. It was during this time that Schotch and Jennings’ exploration of

⁹ On the way home, Jennings found a pen on the plane that he went on to use in the course of writing five new papers.
¹⁰ Schotch and Jennings’ sense of urgency to get these early results published was justified: in 1980 David Lewis first presented “Logic for Equivocators” (published in 1984 in Noûs); he subsequently inquired about exactly when Schotch and Jennings had first published their ideas. On finding they had published first, he later referred to their work for ‘technical details’ as an illustration of his own more conservative approach to paraconsistency. In fact, at least as I see it, Lewis’ motives in his [1984] suggest something closer to the ambiguity-based preservationist approach to Priest’s LP, first proposed in [Brown, 1999].
the semantics of forcing first provided a completeness proof for an axiomatization of forcing — however, the proof was so tangled and difficult to follow that Schotch and Jennings never published it. But several notions that emerged from that work played important roles in subsequent developments.
Coverings and level functions

In Jennings and Schotch [1984], the forcing consequence relation arises from a definition of levels of incoherence, reflecting a formally tractable sense in which things can be made worse, as we extend a given premise set. As we’ve seen, the n-forcing relation can be invoked to axiomatize the modal logic of (n + 1)-ary frames, where a sentence α is an n-consequence of a set of formulae, Γ, if and only if α follows from some cell in every n-partition of Γ. The definition of levels of incoherence generalizes on this, providing the base for a new consequence relation that takes full advantage of the possibility of applying a range of stronger or weaker principles of aggregation to our premises. Level functions are defined very generally at first: With Γ a set of sentences of a propositional language L not including ⊥,

0. ∀Γ, ℓ(Γ) ≤ ω
1. ℓ(∅) = 0
2. ∀Γ ∈ 2^At, ℓ(Γ) = 1
3. ∀Γ ⊆ {α | ⊢ α}, ℓ(Γ) = 0
4. If Γ ⊆ Γ′ then ℓ(Γ) ≤ ℓ(Γ′)
5. If ℓ(Γ) = n and ℓ(Γ′) = m then ℓ(Γ ∪ Γ′) ≤ (n + m)

The level function central to the forcing relation is most generally characterized in terms of coverings rather than partitions:

Covering: An indexed family of sets of sentences, 𝒜 = ⟨∅, Ai⟩, 1 ≤ i ≤ ξ, covers a set Γ if and only if, for every γ ∈ Γ, ∃A ∈ 𝒜 : A ⊢ γ.

The level of incoherence of Γ is then defined as:

ℓ(Γ) = Min ξ such that ∃𝒜 : 𝒜 = ⟨∅, A1, . . . , Aξ⟩, 𝒜 covers Γ and ∀A ∈ 𝒜, A ⊬ ⊥, if this limit exists.
ℓ(Γ) = ∞, otherwise.

For example, ℓ({α ∨ ¬α}) = 0 since the null set covers this set (and the same goes for every theorem, as 3 above requires). ℓ({α, ¬α}) = 2, since the family ⟨∅, {α}, {¬α}⟩ is a consistent cover of {α, ¬α}, while no smaller family can consistently cover this set. Finally, ℓ({α ∧ ¬α}) = ∞; this, of course, is the assigned ‘level’
of any set including a contradiction, and is added to our definition to extend the level function to all of 2^L. But for any property P of sets of sentences, which is not preserved (in general) in supersets, we can define corresponding notions of covering and level that measure how far a given set departs from having the property. Further, as was first shown in [d’Entremont, 1983], if the initial property is compact, so is the corresponding level property.
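The three worked values of ℓ given above can be checked mechanically for small finite premise sets. The following is only a brute-force sketch under stated assumptions: formulas are encoded as nested tuples, and the names `atoms`, `holds`, `consistent`, `is_theorem` and `level` are hypothetical helpers introduced here, not notation from the literature. It also assumes that, for finite Γ, a least consistent cover can always be taken to be a partition of the non-theorems of Γ, since any covering cell can be replaced by the set of premises it proves.

```python
from itertools import product
import math

# Assumed encoding: ("atom", "p"), ("not", f), ("and", f, g), ("or", f, g).
def atoms(f):
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def holds(f, v):
    op = f[0]
    if op == "atom":
        return v[f[1]]
    if op == "not":
        return not holds(f[1], v)
    if op == "and":
        return holds(f[1], v) and holds(f[2], v)
    return holds(f[1], v) or holds(f[2], v)  # remaining case: "or"

def valuations(formulas):
    names = sorted(set().union(*(atoms(f) for f in formulas)))
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))

def consistent(cell):
    # A cell is classically consistent if some valuation satisfies all of it.
    return any(all(holds(f, v) for f in cell) for v in valuations(cell))

def is_theorem(f):
    return not consistent([("not", f)])

def level(gamma):
    # Theorems are covered by the empty cell, so they never raise the level.
    work = [f for f in gamma if not is_theorem(f)]
    # A self-contradictory premise cannot sit in any consistent cell.
    if any(not consistent([f]) for f in work):
        return math.inf
    for k in range(len(work) + 1):
        # Try every way of distributing the premises over k cells.
        for assignment in product(range(k), repeat=len(work)):
            cells = [[f for f, c in zip(work, assignment) if c == i]
                     for i in range(k)]
            if all(consistent(cell) for cell in cells):
                return k
    return math.inf

p = ("atom", "p")
print(level([("or", p, ("not", p))]))   # 0
print(level([p, ("not", p)]))           # 2
print(level([("and", p, ("not", p))]))  # inf
```

The search is exponential in the size of the premise set, so it is useful only for checking small examples by hand.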
Level-respecting consequence

With the notion of a level of incoherence in hand, we can now define a new consequence relation which preserves levels in the same way that classical logic preserves consistency (which amounts to preserving levels 0 and 1). Where ξ = ℓ(Γ),

Γ[⊢ α if and only if ∀𝒜 = ⟨∅, Ai⟩, 1 ≤ i ≤ ξ : 𝒜 covers Γ ⇒ ∃A ∈ 𝒜 : A ⊢ α

As Schotch has observed, this consequence relation is interesting not just because of its subtle approach to aggregation, but also because, while it can be axiomatized and so proofs involving it can be conducted mechanically, which rules are correctly applicable to a given premise set depend on the set’s level. Because level is a generalization of consistency, we cannot, in general, finitely establish the level of an infinite premise set. This implies that while the rules for reasoning in accord with this notion of consequence are mechanically applicable and the consequence relation is compact, we cannot mechanically decide which rules are the right ones to apply to any given premise set. The potential implications of such rules for foundational studies have yet to be explored.
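For finite premise sets the definition of Γ[⊢ α can be tested directly. The sketch below is illustrative only: the tuple encoding of formulas and the helper names (`consistent`, `entails`, `forces`) are assumptions made here, and the level is passed in as the parameter `n` rather than computed. Following the definition of level, only consistent covers are considered, and covers are restricted to distributions of Γ itself over n cells (plus the empty cell), which is assumed to be the hardest case because any other covering cell proves at least what its share of Γ proves.

```python
from itertools import product

# Assumed encoding: ("atom", "p"), ("not", f), ("and", f, g), ("or", f, g).
def atoms(f):
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def holds(f, v):
    op = f[0]
    if op == "atom":
        return v[f[1]]
    if op == "not":
        return not holds(f[1], v)
    if op == "and":
        return holds(f[1], v) and holds(f[2], v)
    return holds(f[1], v) or holds(f[2], v)  # remaining case: "or"

def consistent(cell):
    names = sorted(set().union(*(atoms(f) for f in cell)))
    return any(
        all(holds(f, dict(zip(names, bits))) for f in cell)
        for bits in product([False, True], repeat=len(names))
    )

def entails(cell, alpha):
    # Classical consequence: cell plus the negation of alpha is unsatisfiable.
    return not consistent(list(cell) + [("not", alpha)])

def forces(gamma, alpha, n):
    """True if every consistent n-cell cover of gamma has a cell proving alpha."""
    gamma = list(gamma)
    for assignment in product(range(n), repeat=len(gamma)):
        cells = [[f for f, c in zip(gamma, assignment) if c == i]
                 for i in range(n)]
        if not all(consistent(cell) for cell in cells):
            continue  # inconsistent covers are not counted
        # The empty cell is always part of the cover.
        if not any(entails(cell, alpha) for cell in [[]] + cells):
            return False
    return True

p, q = ("atom", "p"), ("atom", "q")
gamma = [p, ("not", p), q]            # level 2
print(forces(gamma, q, 2))            # True
print(forces(gamma, ("and", p, q), 2))  # False
```

With `n = 2` the inconsistent set {p, ¬p, q} yields p, ¬p and q but neither p ∧ q nor an unrelated atom, which illustrates both non-triviality and the weakening of aggregation. With `n = 1` no consistent cover exists, so the check succeeds vacuously for every α, mirroring classical trivialization.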
Traces

The central concept in the semantics of forcing is that of an n-trace. T is an n-trace on a set Γ if and only if T is a set of sets such that some member of T is covered (i.e. is a subset of some cell) in each n-partition of Γ. So, where Tn(Γ) is the set of n-traces on Γ and Πn(Γ) is the set of n-partitions of Γ,

T ∈ Tn(Γ) ⇔ ∀π ∈ Πn(Γ), ∃τ ∈ T, p ∈ π : τ ⊆ p.

A (left) formulated n-trace on a set of sentences is a disjunction of the conjunctions of the sentences in each element of the trace. Then every consequence that ‘survives’ the n-partitions of a set of sentences is implied by some such formulated trace on the set: By definition, such consequences are implied by some cell in every partition. So one trace implying such a consequence, α, is the set of all such cells, i.e. the set including a cell proving α chosen from every n-partition: Every such partition covers at least one member of this trace, and every member of the trace implies the consequence in question. Therefore the set of n-traces on the premise set captures the aggregative force of n-partitioning a premise set, and a proof that
a rule is sufficient to produce the n-traces will (when combined with rules sufficient to obtain the consequences of all singleton premises) be a completeness proof for forcing. Schotch and Jennings [1989] extends these ideas to a multiple-conclusion version of forcing, dualizing their weakening of aggregation on the left with a symmetrical weakening of aggregation on the right. In classical multiple-conclusion logics, any conclusion set at least one of whose members must be true is trivial, that is, it follows validly from any premise set. This trivialization of right-sets that cannot be consistently denied is the dual image of the classical trivialization of inconsistent sets on the left. In multiple-conclusion forcing, coverings of the conclusion set sufficient to divide it into consistently deniable cells are added to the coverings of the premise set already familiar from single conclusion forcing. Like premise sets, conclusion sets are assigned a (right-) level, defined as the least cardinality of a non-trivial covering of the conclusion set. A covering is non-trivial iff none of its members is classically trivial. Conclusion sets under multiple-conclusion forcing can be thought of as closed not under disjunction (which trivializes a conclusion set if at least one of its members must be true), but under 2/n + 1R , an operation forming the conjunction of pairwise disjunctions amongst any n + 1 members. Here the notion of a right-formulated n-trace comes in. Rather than a disjunction of conjunctions, a right-formulated trace is a conjunction of disjunctions. Any premise sentence that forces a conclusion set of level n must classically imply some right-formulated n-trace on the set. We can see this just by considering a special case in the definition of forcing. We define singleton-on-the-left forcing as follows: A singleton premise γ forces a conclusion set ∆ of level n if and only if some member of every n-covering of ∆ is a classical consequence of γ. 
But, just as for singleton conclusion forcing, all we need do to find a right-formulated trace that γ classically implies is to place each such member of the n-partitions into our n-trace, and form the conjunction of its disjunctions. Ex hypothesi, each of the disjunctions is implied by γ. Therefore their conjunction is as well.
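The combinatorial core of the (left) n-trace definition used in this section can be checked by enumeration on small sets. This is a minimal sketch, assuming that an n-partition may contain empty cells (so that "n-partition" means a division into at most n non-empty cells); the function name `is_n_trace` is hypothetical.

```python
from itertools import product

def is_n_trace(trace, gamma, n):
    """True if every division of gamma into n cells has a cell
    that includes some member of trace as a subset."""
    gamma = list(gamma)
    members = [frozenset(t) for t in trace]
    for assignment in product(range(n), repeat=len(gamma)):
        cells = [{x for x, c in zip(gamma, assignment) if c == i}
                 for i in range(n)]
        if not any(t <= cell for t in members for cell in cells):
            return False  # this partition covers no member of the trace
    return True

gamma = ["a", "b", "c"]
pairs = [{"a", "b"}, {"a", "c"}, {"b", "c"}]
print(is_n_trace(pairs, gamma, 2))      # True, by the pigeon-hole argument
print(is_n_trace(pairs[:2], gamma, 2))  # False: {a} | {b, c} covers neither pair
```

The first call is the combinatorial content of the K2 principle: three items in two cells always put some pair together. Dropping one pair breaks the trace, and raising n to 3 does too, since each item can then sit alone.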
Multiple Conclusion Forcing

In multiple conclusion forcing, we say that Γ[⊢ ∆ holds if and only if for every ℓ(Γ) covering of Γ, 𝒜, and every ℓ′(∆) covering of ∆, ℬ, there is some pair of cells, a, b, a ∈ 𝒜 and b ∈ ℬ, such that a ⊢ b; more formally:

Γ[⊢ ∆ iff
∀𝒜 = ⟨∅, a1, . . . , ai⟩, 1 ≤ i ≤ ℓ(Γ), with ∀γ ∈ Γ, ∃i : ai ⊢ γ and ∀i, ai ⊬ ∅,
∀ℬ = ⟨∅, b1, . . . , bj⟩, 1 ≤ j ≤ ℓ′(∆), with ∀δ ∈ ∆, ∃j : δ ⊢ bj and ∀j, ∅ ⊬ bj,
∃ak ∈ 𝒜, bl ∈ ℬ : ak ⊢ bl

A simple system of rules for multiple-conclusion forcing takes the form:
i. Pres ⊢: Γ[⊢ α, α ⊢ β, β[⊢ ∆ / Γ[⊢ ∆
ii. Ref: α ∈ Γ / Γ[⊢ α and β ∈ ∆ / β[⊢ ∆
iii. 2/n + 1(L): Γ[⊢ ∆, α1 . . . Γ[⊢ ∆, αn+1 / Γ[⊢ ∆, ⋁(αi ∧ αj), 1 ≤ i ≠ j ≤ n + 1, where n = ℓ(Γ)
iv. 2/n + 1(R): Γ, α1 [⊢ ∆, . . . , Γ, αn+1 [⊢ ∆ / Γ, ⋀(αi ∨ αj)[⊢ ∆, 1 ≤ i ≠ j ≤ n + 1, where n = ℓ′(∆)
v. Trans: Γ, α[⊢ ∆, Γ[⊢ α, ∆ / Γ[⊢ ∆
It turns out that whenever every n-covering of a premise set and every n′ covering of a conclusion set are such that one cell of the first classically implies one cell of the second, there is a singleton forced by the premise set and forcing the conclusion set. This singleton bridge principle, the main lemma of the completeness proof for multiple-conclusion forcing, was not established until 2001; the proof appears in Brown, “Refining Preservation”, presented at the 2003 meetings of the Society for Exact Philosophy.
Hypergraphs

A student of Jennings, David Wagner, was the first to point out an important link between traces and graph theory. A graph can be represented as a set of pairs, representing the edges of the graph. A hypergraph is defined as a set of sets; each element of the hypergraph is called an edge of the hypergraph. A hypergraph can be (properly) n-coloured if and only if one of n colours can be assigned to each of its atoms (vertices) in a way that leaves none of its edges monochrome. Thus what Schotch and Jennings had called an n-trace is a familiar mathematical object, a non-n-colourable hypergraph. Further, the aggregation rule for the Kn modal logics,

2/n + 1 : □α1, . . . , □αn+1 / □⋁(αi ∧ αj), 1 ≤ i ≠ j ≤ n + 1

corresponds to the smallest non-n-colourable hypergraph, the complete graph on n + 1 vertices.

Completeness

The first published solution to the completeness problem for single-conclusion forcing was discovered by Brown in the winter of 1991. Having worked on the problem on and off since 1988, Brown was on sabbatical at the Center for Philosophy of Science in Pittsburgh. There he learned that Belnap and Massey had noticed an important role for the 2/n + 1 aggregation principles so central to forcing in their exploration of just how completely classical semantics determined the classical consequence relation (see [Belnap and Massey, 1990]). Brown returned to the problem with a new approach: his aim was to show that 2/n + 1 would suffice to
prove a contradiction from any n-inconsistent set of premises. The key step in the final proof was an induction on the cardinality of the premise set; a simple appeal to ∨-elimination (correct for singleton forcing) together with the monotonicity of n-forcing led to the result. A version of this proof was published in [Brown, 1993b], a collection of papers published in honour of Ray Jennings, on the occasion of his 50th birthday. A more formal presentation, showing completeness for Schotch and Jennings’ Kn modal logics, appeared in [Apostoli and Brown, 1995]. A further generalization of this work appears in a technical report by Paul Wong [1997], in which the completeness proof presented in [Apostoli and Brown, 1995] is extended to multi-ary modal operators.

The connection with hypergraph colourings recognized by Wagner implies that if the completeness result holds, then a hypergraph operation corresponding to 2/n + 1, forming the pairwise union of the edges of every pair out of n + 1 input hypergraphs, together with trivial operations of edge-addition and edge-contraction, will produce all the non-n-colourable hypergraphs from the trivially non-n-colourable singleton loop graphs. This result was later generalized, in the appendix of Brown and Schotch [1999]. The chromatic index of a hypergraph H is defined as the least number of colours required for a proper colouring of H. The generalization shows that using any hypergraph of chromatic index n + 1 as a ‘template’ for aggregation together with edge-addition and contraction will allow the construction of all the non-n-colourable hypergraphs. In a general spirit, Brown [2000], Brown and Schotch [1999], and Brown [2002] present an approach to aggregation that shifts aggregation from operator rules like 2/n + 1 to structural rules governing how conclusions are drawn from sets of premises.
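The colouring notions used in this section are easy to test by exhaustive search on small hypergraphs. The sketch below is an illustration under stated assumptions, not an efficient algorithm: a hypergraph is represented as a list of vertex sets, and `n_colourable` and `complete_graph` are hypothetical names chosen here.

```python
from itertools import combinations, product

def n_colourable(edges, n):
    """True if the vertices can be given n colours so that no edge is monochrome."""
    edges = [frozenset(e) for e in edges]
    vertices = sorted(set().union(*edges))
    for colours in product(range(n), repeat=len(vertices)):
        colour = dict(zip(vertices, colours))
        # An edge is properly coloured when it shows at least two colours.
        if all(len({colour[v] for v in e}) > 1 for e in edges):
            return True
    return False

def complete_graph(k):
    # All two-element edges on the vertices 0 .. k-1.
    return [set(pair) for pair in combinations(range(k), 2)]

print(n_colourable(complete_graph(3), 2))  # False: K3 is not 2-colourable
print(n_colourable(complete_graph(3), 3))  # True
print(n_colourable([{0}], 5))              # False: a singleton loop is always monochrome
```

The first and last calls correspond to the two facts the text relies on: the complete graph on n + 1 vertices is not n-colourable, and a singleton loop is trivially non-n-colourable for every n.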
Such logics arise from consideration of the type-raising transition involved in going from truth-preservation as a relation between sentences in a language to ‘truth-preservation’ as a relation between sets of sentences. The structural rule for aggregation applies the graph-theoretic completeness results, beginning with the closure of singleton premise sets (treated as hypergraphs with a single, singleton edge) under our graph-theoretical analogue of 2/n + 1 (along with edge-addition and contraction). The rule allows the derivation of anything provable in a base logic from every member of a resulting hypergraph. On this account, full aggregation is not a matter of having the rule α, β ⊢ α ∧ β, but a matter of the set of graphs we can construct from our singleton premise graphs. Classical (full) aggregation results when no chromatic index limit is imposed on the hypergraphs we can reason from. The advantage of this approach is that the resulting account of aggregation is independent of the connectives available in the object language.

Subsequent explorations by T. (later D.) Nicholson, first an undergraduate and then a doctoral student of Jennings, led to dualized formulations of these logical results [Nicholson, Jennings and Sarenac, 2002] and to deeper graph-theoretical results, including an axiomatization of the notion of ‘family resemblance’ [Nicholson and Jennings, forthcoming] and a more general characterization of the non-n-colourable hypergraphs and their duals, the n-chines. This has led to the latest formulation of these weakly aggregative logics, using dual formulations based on
the transverse hypergraphs. A new completeness proof based on this dualized approach to aggregation is due to Jennings, D. Sarenac, and especially D. (then T.) Nicholson (see [Nicholson et al., 2001]). This work is built on some interesting new concepts, and develops some striking results about them that have elegant and illuminating connections to the original presentation of forcing.

Dual to the notion of an n-trace is an n-chine. We begin by redefining the notion of an n-trace in terms of colourings: Let an n-colouring of the elements of a set S be a function C : S → {1, . . . , n}. Then an n-trace T is a set of sets τi such that any n-colouring of the elements of T’s members leaves at least one τi ∈ T monochrome. Similarly, the n-chines Xn are sets X whose members χj are such that any n-colouring of the elements of X’s members will be such that every element of X includes a member with one of the n colours, i.e. ∃i ∈ {1, . . . , n} such that for all χj ∈ X, C⁻¹(i) ∩ χj ≠ ∅. A left-formulated n-chine on a set of sentences Γ is a conjunction of disjunctions of the elements of an n-chine on Γ, while a right-formulated n-chine on a set of sentences ∆ is a disjunction of conjunctions of the elements of an n-chine on ∆. The original aggregation rule of forcing is

2/n + 1 : Γ[⊢ α1 . . . Γ[⊢ αn+1 / Γ[⊢ ⋁(αi ∧ αj), 1 ≤ i ≠ j ≤ n + 1

It is based, as we’ve seen, on the complete graph on n + 1 vertices, which is the smallest non-n-colourable hypergraph. But we can replace that rule with a dual, chine-based rule:

n/n + 1 : Γ[⊢ α1 . . . Γ[⊢ αn+1 / Γ[⊢ ⋀(αi ∨ . . . ∨ αj), 1 ≤ i ≠ j ≤ n + 1

where the disjunctions are amongst the n-tuples drawn from the αi. One extremely elegant result of the exploration of this new formulation of forcing is a simple graph-theoretic criterion for n-chines: X is an n-chine if and only if the intersection of every n-tuple of elements of X is non-empty. An illuminating relation between chines and traces is that the set of least interceptors of an n-chine is an n-trace, and vice versa:

T ∈ Tn ⇒ {χ | ∀τ ∈ T : χ ∩ τ ≠ ∅ & ∀χ′, χ′ ⊂ χ → ∃τ ∈ T : χ′ ∩ τ = ∅} ∈ Xn
X ∈ Xn ⇒ {τ | ∀χ ∈ X : τ ∩ χ ≠ ∅ & ∀τ′, τ′ ⊂ τ → ∃χ ∈ X : τ′ ∩ χ = ∅} ∈ Tn

Finally, new ideas about aggregation have emerged in an axiomatization of the notion of family resemblance [Nicholson and Jennings, forthcoming]. The theory of family resemblance Nicholson and Jennings propose is grounded in Wittgenstein’s suggestion that the similarities connecting various kinds of things involve a collection of properties that members of the kind possess in overlapping, criss-crossing combinations. But the formal details are far richer than the apparently elementary nature of their proposal would suggest. Nicholson and Jennings define the formal notion of a family in a very general way: Let P be a set of properties. Then
A set F is a family on P if and only if F ⊆ 2^P, where F ≠ ∅ and ∅ ∉ F. Family resemblance is a matter of overlap between various members of the family, so what we need to produce a measure of resemblance here is a measure, or measures, of that overlap. Where S is a set and q an integer, let [S]^q be the set of q-tuple subsets of S. Jennings and Nicholson next define the harmonic number of a family F:

η(F) =df min n : ∃G ∈ [F]^n : ∩G = ∅, if this limit exists; else = ∞.

This by itself is not particularly new except as an application of earlier notions: In terms of our earlier vocabulary the greatest n such that F is an n-chine is F’s harmonic number. But Nicholson and Jennings generalize on this notion, defining the n(-harmonic) saturation number of F, σn(F):

σn(F) =df min m ≥ 1 : ∃k ∈ {1, 2, . . . , n}, ∃G ∈ [F]^k : |∩G| ≤ m.
This notion measures the thickness of the minimal overlap amongst n-tuples drawn from F, set equal to one more than the cardinality of the minimum overlap. By measuring this minimal thickness of family resemblance for different cardinalities of selections from F, the numbers σ_n(F) provide a partial ordering of family resemblance that allows us to compare different families and their relations in a variety of ways.

Recently, Jennings, Schotch and Brown have returned to work on the set-set consequence relation first explored in “On Detonating”. Brown has adapted the completeness proof for the forcing relation to the set-set case. The key to this proof is a main lemma that demonstrates the equivalence of two distinct representation theorems. The lemma shows that aggregation on the left and right of the turnstile is all that distinguishes a multiple premise and conclusion logic from a base consequence relation defined on single sentences. Thus we can say that Γ[⊢ ∆ if and only if ∃α : Γ[⊢ α and α[⊢ ∆. The graph-theoretical result of [Brown and Schotch, 1999] shows that the aggregation imposed by n-coverings of premise and conclusion sets can be captured separately by 2/n + 1 on the left and right (or, dually, by n/n + 1). Thus the completeness result for the single conclusion case (easily dualized to cover the single premise/multiple conclusion case) extends to the multiple conclusion logic.

These ideas about aggregation have been brought to bear on some interesting applications as well; we briefly describe three of them here. First, in a project that began in 1984, David Braybrooke, Brown and Schotch developed hyperdeontic logic, a logical system combining a dynamic logic of action with a logic of rules. The aim of the project was to arrive at a formal system in which the contents of rules could be clearly expressed, in a way that would illuminate various discussions of rules in social history.
The results of this investigation appeared in [Braybrooke, Brown and Schotch, 1995], as well as a few articles — for specifically logical work
see especially [Schotch, 1996; 2006]. In this work weak aggregation is applied as Jennings and Schotch originally envisioned, to prevent the trivialization of rule systems when they make inconsistent demands. Rather than conclude that such a system demands that its subjects bring about ⊥ (and everything that follows from ⊥), this system respects the level of a rule system’s demands. Conflict remains (not all the demands can be met), but this view of the logic of rules allows us to reason sensibly about what is and isn’t required by the rules when such conflicts arise.

Second, in his thesis work [Brown, 1985] and in subsequent work [Brown, 1992a; 1993], Brown explored the use of non-aggregative and weakly-aggregative paraconsistent logics to capture the contents of conflicting and internally-inconsistent theories in science. This work has since been extended to work on the idea of approximate truth and its contextually-restricted nature, allowing even inconsistent theories to qualify as approximately true within certain contexts of application. Brown’s dissertation examined the mid-to-late nineteenth century conflict over the age of the earth. The later work has focused on the inconsistency of the old quantum theory, and especially Niels Bohr’s theory of the hydrogen atom. In his [1913], Bohr explicitly proposed a policy of division: As a response to the inconsistencies of Planck’s theory of black-body radiation, Bohr separated his quantized account of the allowed states of the hydrogen atom from the classical electrodynamic treatment of the light emitted (and, in later work, absorbed) by the atom; Einstein saw in this clever maneuver ‘the ultimate musicality of thought’. However, while the use of forcing allows us to avoid disaster in our logical models of the contents of inconsistent theories, it doesn’t do as much as we would like to account for actual patterns of reasoning involved in applying these theories.
From the very beginning, forcing was recognized to be a very weak consequence relation. In effect, forcing takes the consequences of Γ, where ℓ(Γ) = n, to be the sentences that survive every n-division of the contents of Γ. Each n-division produces, after closure of each element of the division under ⊢, the union of n classical theories.11 So the closure of Γ under forcing is an intersection of unions of n classical theories. From a certain point of view, one might well ask, what makes such an object a single theory? The formal answer, that it’s a set closed under a consequence relation, is satisfying enough for formal purposes. But for applications, we surely want a closer inferential integration in order to give a nice account of how one reasons with such inconsistent theories.

Schotch and Jennings [1989; 1981] explore ways to strengthen forcing. A-forcing considers only those divisions that keep sets of sentences having some property A ‘together’ in the same element of the division. If we choose consistency for this property (that is, if we regard as consequences of Γ the consequences of every consistent subset) we get the relation ≻∗ of [Schotch and Jennings, 1981]. If we close under this relation and then close the results, indefinitely many times, we get the ≻ (read ‘yields’) relation, which is reflexive, transitive and monotonic.

11 In divisions with a trivial cell, of course, we get the set of all sentences, but so long as ℓ(Γ) ≠ ∞, only the non-trivial divisions do any work.
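The description of forcing given here (with ℓ(Γ) = n, the consequences of Γ are the sentences that survive every n-division) can be illustrated with a small brute-force sketch. It rests on assumptions not made explicit in this passage: formulas are encoded as nested tuples, ℓ(Γ) is read as the least number of consistent cells into which a non-empty finite Γ can be divided, empty cells are allowed, and `None` stands in for ∞. The function names are hypothetical.

```python
from itertools import product

# Formulas are nested tuples: ('atom', 'p'), ('not', f), ('and', f, g), ('or', f, g).

def atoms(f):
    if f[0] == 'atom':
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def value(f, v):
    op = f[0]
    if op == 'atom':
        return v[f[1]]
    if op == 'not':
        return not value(f[1], v)
    if op == 'and':
        return value(f[1], v) and value(f[2], v)
    if op == 'or':
        return value(f[1], v) or value(f[2], v)
    raise ValueError(op)

def valuations(formulas):
    names = sorted(set().union(*(atoms(f) for f in formulas)))
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))

def consistent(cell):
    return any(all(value(f, v) for f in cell) for v in valuations(cell))

def entails(cell, alpha):
    # Classical entailment; an inconsistent (trivial) cell entails everything.
    return all(value(alpha, v) for v in valuations(list(cell) + [alpha])
               if all(value(f, v) for f in cell))

def divisions(gamma, n):
    # Every way of distributing the members of gamma over n cells.
    for labels in product(range(n), repeat=len(gamma)):
        yield [[f for f, k in zip(gamma, labels) if k == i] for i in range(n)]

def level(gamma):
    # Assumed reading of the level: least n with an n-division into consistent cells.
    for n in range(1, len(gamma) + 1):
        if any(all(consistent(c) for c in d) for d in divisions(gamma, n)):
            return n
    return None  # stands in for an infinite level

def forces(gamma, alpha):
    n = level(gamma)
    if n is None:
        return True
    return all(any(entails(c, alpha) for c in d) for d in divisions(gamma, n))

p, q = ('atom', 'p'), ('atom', 'q')
gamma = [p, ('not', p), q]
```

On this sketch `gamma` has level 2, each of its members is forced, and neither `('and', p, q)` nor `('and', p, ('not', p))` is, which is the non-adjunctive behaviour the text describes.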
But there is a problem with ≻: It makes things worse, inflating the level of any inconsistent set to ω. For example, let Γ be {p, ¬p}. Then the closure of Γ under ≻∗, C_≻∗(Γ), includes ‘p → A’, for every sentence A. And {p, p → A} is consistent so long as A is consistent. So at the next application of ≻∗, the results will include every consistent sentence in the language. This is not quite trivialization: we won’t be able to obtain any contradictions. What we have is the union of every consistent theory in the language. And that is too close to trivialization for comfort.

In his MA thesis work with Jennings [Thorn, 1999], Paul Thorn also explored ways of strengthening the forcing relation. One proposal, called n-forcing+ [Thorn, 1999, 91], involves adding to each cell of an n-partition every sentence in the premise set consistent with every consistent subset of Γ. This relation strengthens forcing by eliminating some of the divisions of Γ (the unnecessarily weak ones, in the sense that they fail to include, in each cell, any members of Γ that are consistent with every consistent cell in any n-division of Γ). We could even eliminate some more, combining A-forcing with Thorn’s proposal to arrive, at the limit, at ℓ(Γ) consistent theories each of which is the classical closure of some maximal consistent subset of Γ. But, though this approach avoids the near-trivialization of ≻, it still leaves us with a union of ℓ(Γ) classical theories as the final result. The skeptical observer will want to know why we should regard the result as a single ℓ(Γ)-theory rather than ℓ(Γ) separate theories.

With some specific applications in mind and the same concerns about the weakness of forcing, Priest and Brown [2004] propose yet another way to strengthen the forcing relation.
Assuming from the start (as some applications suggest) that a particular division is already in view, the authors went on to propose an approach they call ‘chunk and permeate’: Certain kinds of sentences that can be inferred in some cells of the division are allowed to ‘permeate’ through into other cells of the division, where further inferences can be made. So to specify a chunk and permeate theory, we must specify a division of its claims, and a permeation rule determining what consequences from each element of the division are to permeate into what other cells. As with the ≻ relation, the full results of this process are not to be had in a single step; in general, we must close, permeate, close again, and so on, to arrive at all the results such an approach can produce. But, in contrast to ≻, the rule Brown and Priest propose for combining the results of each step to obtain further consequences is more restricted. As we’ve seen, if consistency alone is enough to allow sentences to be combined into premise sets from which new conclusions are drawn at the next step, then if we begin with an inconsistent set, we can infer every consistent sentence in the language. Brown and Priest’s more restricted permeation relation allows us to hope (at least) and prove (in some cases) that the chunk and permeate relation will preserve the level of our original set. In the initial form of chunk and permeate, a single element of the division is specified as the place where conclusions are drawn. Aside from some odd cases (in which another element of the division is allowed to trivialize but is sufficiently isolated by the permeation relation that it does not export its trivialization through to the conclusion element) level-preservation is essential to non-trivial chunk and permeate structures. One advantage of this approach is that the consequences derived from such a theory are not always consequences of some consistent sub-theory of Γ. Thus, a chunk and permeate theory can have a stronger kind of logical unity than we find when we simply close Γ under forcing. More to the point for those concerned about applications, this procedure for extracting consequences from an inconsistent theory reflects the apparent practice of two important examples of inconsistent theories: the early calculus and old quantum theory. In both cases, calculations produce results that are then treated using mathematical theories that are incompatible with the principles and operations employed in the initial calculation.

In [Brown, 1999] another application of forcing is considered. Forcing in its original form aims to keep aggregation from turning contrary sentences into contradictions. With this in mind, it was natural to consider the lottery paradox and Kyburg’s preferred diagnosis of it in terms of conjunctivitis: The insistence that rational acceptance is closed under conjunction. But, from the point of view of a probabilistic rule of acceptance, contradictions (whose probability is always 0) are not the only ‘bad’ sentences: We may well regard sentences falling below a certain probability threshold as unacceptable as well. This suggests a variation on level: Let $\ell^p_\varepsilon(\Gamma)$ be the least n such that an n-division of Γ has no cell whose closure under conjunction includes a sentence of probability less than ε. Then we can take as the consequence set of Γ, relative to the threshold ε, its closure under $\ell^p_\varepsilon(\Gamma)$-forcing.

Finally and most recently, Jennings and Nicholson (forthcoming) propose a general logic of taxonomic ranks based on their n-saturation numbers, a measure of family resemblance. Two familiar operations that preserve family resemblance are edge addition and edge expansion.
Jennings and Nicholson capture these two operations under the relation of subsumption: Given families F and G, F ⊒ G (read F subsumes G) iff for every edge g in G, there is an edge f in F such that f ⊇ g. Clearly, if σ_n(G) > m, then σ_n(F) > m. This gives us the first rule of the logic:

[↑⊒]: given F, if G ⊒ F, obtain G.

The next rule tells us how to aggregate families to produce new families that preserve σ_n, using an aggregation principle very close to that for aggregating n-chines: We begin with a sequence of families, S = ⟨F_1, . . ., F_i, . . ., F_q⟩. For T a set and n ≥ 1, T n-covers S if $\exists \{F_1, \ldots, F_n\} \in \binom{S}{n}$ such that ∀i ∈ {1, . . ., n}, T is a superset of some e ∈ F_i. T is a minimal n-cover if and only if T is an n-cover and every proper subset of T is not an n-cover. Then n/q(F_1, . . ., F_i, . . ., F_q) is the set of all minimal n-covers of S. With these definitions in hand, the aggregation rule is straightforward:

[n/n + 1]: Given G_1, . . ., G_n, obtain n/n + 1(G_1, . . ., G_n), for n ≥ 1.

Taxonomic rank, relative to an m − n derivation D of a family F, is defined as the position of a family F in the derivation. Such ranks arise when wider families,
by subsuming narrower ones and/or being constructed by our aggregation rule, have at least the degree of family resemblance of the starting families. This account makes taxonomy a study of the preservation of a class of measures of family resemblance. The horizons of preservationist thinking extend beyond these weakly aggregative systems. The generalizations of consistency that forcing preserves are just one set of properties that can be preserved by an interesting consequence relation. But this case alone is enough to establish the interest of preservationism in logic: Broadening the range of properties that a consequence relation could preserve has produced alternative consequence relations that are both formally interesting and philosophically promising.
Two Further Preservationist Ideas

A. Paradox tolerance, conditionals and Nobel measures

In “Paradox Tolerant Logic” [1983], Jennings and Johnston proposed a preservationist approach to producing new forms of conditional connective. Their idea was to add a meta-semantic value to the usual truth value, intuitively read as “fixedness”. A sentence’s truth value was taken as fixed if its truth value is settled by the nature of the language, with no need for further input from the way the world is. And they assumed that we will want, from time to time, to be able to reason productively within a language that produces, in some circumstances, untoward (i.e. contradictory) consequences. By insisting on the preservation of fixedness, though, we can block inference from such unfortunate results to arbitrary conclusions about how the world is. The necessary technicalities for the system are straightforward: The sentences of a theory’s language are treated as unanalyzed atoms. They are assigned ordered pairs of values, selected from ⟨1, 1⟩, ⟨1, 0⟩, ⟨0, 1⟩ and ⟨0, 0⟩. The first member of the pair indicates truth or falsity, while the second indicates fixedness (settled by the choice of language) or lack of fixedness (settled by language and the way the world is). The tables for connectives in the logical system within which we study the theory language are set up intuitively, ensuring that classical tautologies receive the value ⟨1, 1⟩ while contradictions are assigned ⟨0, 1⟩. A conjunction of two fixed truths is itself a fixed truth; a conjunction including a fixed falsehood is a fixed falsehood. Negations reverse truth value while preserving fixity. The resulting tables are straightforward:

α        ¬α
⟨1, 1⟩   ⟨0, 1⟩
⟨1, 0⟩   ⟨0, 0⟩
⟨0, 1⟩   ⟨1, 1⟩
⟨0, 0⟩   ⟨1, 0⟩
∨       | ⟨1, 1⟩   ⟨1, 0⟩   ⟨0, 1⟩   ⟨0, 0⟩
⟨1, 1⟩  | ⟨1, 1⟩   ⟨1, 1⟩   ⟨1, 1⟩   ⟨1, 1⟩
⟨1, 0⟩  | ⟨1, 1⟩   ⟨1, 0⟩   ⟨1, 0⟩   ⟨1, 0⟩
⟨0, 1⟩  | ⟨1, 1⟩   ⟨1, 0⟩   ⟨0, 1⟩   ⟨0, 0⟩
⟨0, 0⟩  | ⟨1, 1⟩   ⟨1, 0⟩   ⟨0, 0⟩   ⟨0, 0⟩
∧       | ⟨1, 1⟩   ⟨1, 0⟩   ⟨0, 1⟩   ⟨0, 0⟩
⟨1, 1⟩  | ⟨1, 1⟩   ⟨1, 0⟩   ⟨0, 1⟩   ⟨0, 0⟩
⟨1, 0⟩  | ⟨1, 0⟩   ⟨1, 0⟩   ⟨0, 1⟩   ⟨0, 0⟩
⟨0, 1⟩  | ⟨0, 1⟩   ⟨0, 1⟩   ⟨0, 1⟩   ⟨0, 1⟩
⟨0, 0⟩  | ⟨0, 0⟩   ⟨0, 0⟩   ⟨0, 1⟩   ⟨0, 0⟩
Finally, an implication connective is introduced which aims to capture the closure under consequence of a set of sentences in the theory’s language. The truth table for this implication requires that a true implication preserve both truth and fixedness; fixedness of the conditional itself is decided by considering the changeability of the conditional’s truth value in the light of the changeability of the antecedent’s and consequent’s truth values and the (meta) fixedness of their fixedness values (300). The result is the following matrix (antecedent values label the rows, consequent values the columns):

→       | ⟨1, 1⟩   ⟨1, 0⟩   ⟨0, 1⟩   ⟨0, 0⟩
⟨1, 1⟩  | ⟨1, 1⟩   ⟨0, 1⟩   ⟨0, 1⟩   ⟨0, 1⟩
⟨1, 0⟩  | ⟨1, 1⟩   ⟨1, 0⟩   ⟨0, 0⟩   ⟨0, 0⟩
⟨0, 1⟩  | ⟨1, 1⟩   ⟨0, 1⟩   ⟨1, 1⟩   ⟨0, 1⟩
⟨0, 0⟩  | ⟨1, 1⟩   ⟨1, 0⟩   ⟨1, 0⟩   ⟨1, 0⟩
The original idea of PTL was to provide a non-explosive conditional that could serve in metalinguistic discussion of the consequence relation of some object language. One result of this metalinguistic focus is that Jennings and Johnston were not concerned with nested conditionals. Still, if we do consider nested cases, the non-explosive nature of the PTL conditional evaporates. Preserving both truth and fixity from left to right ensures that the ‘empirical consequences’ of the theory don’t trivialize when the theory’s language implies contradictory ‘fixed’ truths. But given a fixed contradiction ⊥, the nested conditional ⊥ → (⊥ → α) holds for every α, and we are only two MPP inferences away from trivialization.
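The behaviour of nested PTL conditionals with a fixed contradiction as antecedent can be checked directly against the → matrix given above. The following is a minimal sketch under stated assumptions: values are encoded as Python pairs, rows of the matrix are read as antecedents and columns as consequents, ⊥ is taken to have the value ⟨0, 1⟩, and a sentence "holds" when the first member of its value is 1. The names `imp`, `holds` and `BOTTOM` are introduced here for illustration only.

```python
VALUES = [(1, 1), (1, 0), (0, 1), (0, 0)]

# Rows: antecedent value; columns: consequent value, both in the order of VALUES.
IMP_ROWS = [
    [(1, 1), (0, 1), (0, 1), (0, 1)],
    [(1, 1), (1, 0), (0, 0), (0, 0)],
    [(1, 1), (0, 1), (1, 1), (0, 1)],
    [(1, 1), (1, 0), (1, 0), (1, 0)],
]
IMP = {(a, b): IMP_ROWS[i][j]
       for i, a in enumerate(VALUES) for j, b in enumerate(VALUES)}

def imp(a, b):
    return IMP[(a, b)]

def holds(v):
    # Assumption: a sentence holds when its truth coordinate is 1.
    return v[0] == 1

BOTTOM = (0, 1)  # assumed value of a fixed contradiction

# First degree: does BOTTOM -> alpha hold, for each value of alpha?
first_degree = {a: holds(imp(BOTTOM, a)) for a in VALUES}

# Second degree: does BOTTOM -> (BOTTOM -> alpha) hold, for each value of alpha?
second_degree = {a: holds(imp(BOTTOM, imp(BOTTOM, a))) for a in VALUES}
```

On this reading `first_degree` is false for some values of α (including ⟨0, 0⟩), while every entry of `second_degree` is true, because ⊥ → α always takes a fixed value and the row for ⟨0, 1⟩ sends every fixed value to ⟨1, 1⟩.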
Extending the PTL approach to conditionals, Darko Sarenac’s MA thesis [2000] presents an infinite sequence of conditionals, each involving a longer string of ‘meta-valuational’ properties. PTL assigned 2-place values to sentences in L,

V_PTL : L → {⟨1, 1⟩, ⟨1, 0⟩, ⟨0, 1⟩, ⟨0, 0⟩},

where the first member of these ordered pairs was read as the truth-value of a sentence, while the second was read as fixedness. Sarenac extended this approach to n-place valuations,

V_S : L → {⟨1, 1, . . ., 1⟩, . . ., ⟨1, 0, . . ., 0⟩, ⟨0, 1, . . ., 1⟩, . . ., ⟨0, 0, . . .⟩}.

The first place in each n-tuple indicates the sentence’s truth value, while the rest indicate whether it has or lacks certain ‘meta-valuational’ properties, understood as properties of the preceding properties in the sequence. This apparatus allowed Sarenac to produce a sequence of logics, PTL_n, whose conditionals are progressively less explosive, as the sequence of properties preserved by the conditionals grows. The measure of explosiveness is defined relative to classically inconsistent sets of sentences, for instance {⊥} or {α, ¬α}. In PTL, ⊥ → α fails when α has the value ⟨0, 0⟩, and α → (¬α → β) fails when β has the value ⟨0, 0⟩. But when we nest conditionals and apply modus ponens, the old explosions come back: ⊥ → (⊥ → α) always holds because the fixity value of ⊥ → α is always 1 (see line 3 of the matrix for →_PTL above). Similarly, ¬α → (α → (¬α → β)) always holds, for the same reason. So ‘first degree’ implication of PTL is paraconsistent in the sense that: ∃α ⊬_PTL ⊥ → α. But ‘second degree’ PTL implication is not paraconsistent, since: ∀α ⊢_PTL ⊥ → (⊥ → α). Sarenac defined a properly implicationally paraconsistent implication connective as one such that:

∃α, β : No theorem of L is of the form x → (y → (. . . → β) . . .), x and y ranging over α, ¬α.

As we’ve seen, while ⊬_PTL α → (¬α → β), ⊢_PTL ¬α → (α → (¬α → β)). So PTL is not properly implicationally paraconsistent: PTL has conditional theorems that, given a sequence of applications of MPP, would allow us to infer any sentence in the language from the premise set {α, ¬α}. Sarenac then introduced a measure of explosiveness for a logic’s conditional. The definition of this notion turns on the depth of conditional nesting required for trivializing theorems like these to obtain. Consider the formula (α → (¬α → β)).
This sentence is an implicational fuse for the set {α, ¬α}; any logic L which includes this sentence as a theorem will detonate this set. That is, with this as a premise set, we need only make two modus ponens inferences to infer any sentence in the language [Sarenac, 2000, 19]. While PTL lacks this fuse for {α, ¬α}, we have seen that it still has a fuse of its own for this set. But the fuse PTL provides is a longer fuse. This is the main idea behind Sarenac’s Nobel measure (22–3). We will build up to the definition of the Nobel measure of a set Γ in a logic L, beginning with a straightforward definition of the depth of conditional nesting of a sentence:

i. If α is not an implication, C(α) = 0.
ii. If α is an implication and γ is the consequent of α, then C(α) = 1 + C(γ).
Next, the fuse measure is defined for sentences γ that involve only the implication and negation connectives. Given a premise set Γ, the fuse measure of γ is a function $f^L_\Gamma$ with the value C(γ) if γ is a fuse for Γ in the logic L, and ∞ otherwise. The Nobel measure of a set Γ is then the length of the shortest fuse for Γ, if there is such a fuse, and ∞ otherwise.

The general idea behind the PTL_n sequence of logics is very elegant: Sentences are assumed to have or lack many properties of semantic interest; these properties are gathered together in P = ⟨P_i⟩, 1 ≤ i ≤ n. A valuation on the language assigns not just truth or falsity (a value for P_1) to each sentence, but a value for each of the other properties as well. Each property P_i can be represented as a function in L → 2, and the full value of a sentence written as the sequence of the values of these functions. The conditional is required to preserve all the properties that the antecedent possesses, not just its truth. So, as in PTL, a conditional with a false antecedent is not necessarily true, and a conditional is trivially true only when the antecedent lacks all of the preservable properties. The trick in generating the sequence of increasingly less-volatile conditionals lies in arranging the nonalethic part of the conditional truth function to ensure, as each new property is added, that the minimum fuse lengths for {⊥} and {α, ¬α} are increased. The intersection of these increasingly weak conditional logics provides us with a truly non-explosive conditional, though (unlike the members of the PTL_n sequence) a complete axiomatization remains in question. Sarenac also explores another route to the goal of a non-explosive conditional which makes do with just two properties.
The key difference between the properly paraconsistent conditional of SX and the conditional of PTL is that SX’s conditional retains the second value of its consequent.12 As a result, the conditional does not always receive the value ‘fixed’ when the antecedent is ⊥:

→_SX    | ⟨1, 1⟩   ⟨1, 0⟩   ⟨0, 1⟩   ⟨0, 0⟩
⟨1, 1⟩  | ⟨1, 1⟩   ⟨0, 0⟩   ⟨0, 1⟩   ⟨0, 0⟩
⟨1, 0⟩  | ⟨1, 1⟩   ⟨1, 0⟩   ⟨0, 1⟩   ⟨0, 0⟩
⟨0, 1⟩  | ⟨1, 1⟩   ⟨0, 0⟩   ⟨1, 1⟩   ⟨0, 0⟩
⟨0, 0⟩  | ⟨1, 1⟩   ⟨1, 0⟩   ⟨1, 1⟩   ⟨1, 0⟩

12 This creates difficulties for reading the second value as fixedness, but the PTL_n sequence has already opened the door to a vast multiplicity of semantic properties that sentences may have or lack.
The upshot is that in SX, there is no fuse for ⊥ or for {α, ¬α}; the Nobel measure of these sets is undefined. Nevertheless, SX is not paraconsistent by some other measures. While the SX conditional does not provide the means to detonate every necessary falsehood, SX follows PTL in treating the alethic profile of other connectives in a completely classical way. Still, SX and the PTL_n sequence provide useful models of conditionals that are resistant to trivialization. A weak but interesting form of preservationist paraconsistency is achieved here, as is a wider view of the properties that a conditional might be asked to preserve (and that might be of more general logical interest as well).
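The depth measure C and the search for a shortest fuse can be sketched by brute force over the two conditional matrices. Several assumptions are made here and are not taken from Sarenac's text: a candidate fuse for {α, ¬α} is read as a nested conditional x_1 → (x_2 → (. . . → β)) with each x_i drawn from α, ¬α and β a distinct atom; a theorem is a sentence whose truth coordinate is 1 under every assignment of the four values to α and β; SX is given the same negation table as PTL; and the search is cut off at a finite `max_depth`, so `None` only means that no fuse was found up to that bound.

```python
from itertools import product

VALUES = [(1, 1), (1, 0), (0, 1), (0, 0)]
NEG = {(1, 1): (0, 1), (1, 0): (0, 0), (0, 1): (1, 1), (0, 0): (1, 0)}

def table(rows):
    # Rows: antecedent value; columns: consequent value, in the order of VALUES.
    return {(a, b): rows[i][j]
            for i, a in enumerate(VALUES) for j, b in enumerate(VALUES)}

PTL = table([[(1, 1), (0, 1), (0, 1), (0, 1)],
             [(1, 1), (1, 0), (0, 0), (0, 0)],
             [(1, 1), (0, 1), (1, 1), (0, 1)],
             [(1, 1), (1, 0), (1, 0), (1, 0)]])
SX = table([[(1, 1), (0, 0), (0, 1), (0, 0)],
            [(1, 1), (1, 0), (0, 1), (0, 0)],
            [(1, 1), (0, 0), (1, 1), (0, 0)],
            [(1, 1), (1, 0), (1, 1), (1, 0)]])

# Sentences: the atoms 'a' and 'b', ('not', f), or ('->', f, g).
def depth(f):
    """C(f): 0 if f is not an implication, else 1 + C(consequent of f)."""
    return 1 + depth(f[2]) if isinstance(f, tuple) and f[0] == '->' else 0

def evaluate(f, v, imp):
    if isinstance(f, str):
        return v[f]
    if f[0] == 'not':
        return NEG[evaluate(f[1], v, imp)]
    return imp[(evaluate(f[1], v, imp), evaluate(f[2], v, imp))]

def is_theorem(f, imp):
    return all(evaluate(f, {'a': x, 'b': y}, imp)[0] == 1
               for x, y in product(VALUES, repeat=2))

def shortest_fuse_depth(imp, max_depth=6):
    premises = ['a', ('not', 'a')]
    for k in range(1, max_depth + 1):
        for xs in product(premises, repeat=k):
            f = 'b'
            for x in reversed(xs):
                f = ('->', x, f)
            if is_theorem(f, imp):
                return depth(f)
    return None  # no fuse found up to max_depth
```

Under these assumptions `shortest_fuse_depth(PTL)` is 3, matching the observation that α → (¬α → β) is not a PTL theorem while ¬α → (α → (¬α → β)) is, and `shortest_fuse_depth(SX)` finds nothing within the bound.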
Ambiguity and Wild Cards

Brown [1999] began a line of work connecting logics first exploited by relevance and dialetheic logicians to preservationist ideas. The inspiration for this project emerged from reflections on some remarks by Diderik Batens. Batens had described his own modest motives, which led him to pursue a paraconsistent logic that would make minimal alterations in reasoning while allowing for tolerance of inconsistency. Batens was particularly concerned to isolate inconsistency by restricting its impact to a subset of the atoms; his approach was to make consistency the default assumption, and retract inferences when that assumption led to trouble. While returning home after the First World Congress on Paraconsistency in Ghent in 1997, Brown began to explore a central idea of his own approach, guided by the same modest aim. The idea was that, by treating certain sets of atomic sentences as ambiguous, we can project consistent images of inconsistent premise sets: Γ′ is a consistent image of Γ based on A iff

ii. A is a set of sentence letters.
iii. Γ′ is consistent.
iv. Γ′ results from the substitution, for each occurrence of each member a of A in Γ, of one of a pair of new sentence letters, a_f and a_t.
We write ConIm(Γ′ , Γ, A) for this relation. An obvious, if crude, measure of how far a set Γ is from being consistent, can be given by the number of atoms in the smallest set of atoms whose treatment as ambiguous would be sufficient to produce a consistent image of Γ. A subtler measure (respecting the fact that the ‘cost’ of treating an unanalyzed atom as
ambiguous needn’t be some fixed and equal quantity for all atoms) is to consider the set of least sets each of which is sufficient for projecting a consistent image of a premise set, Γ. We define Amb(Γ) as the set of least sets each of which is the base of some projection(s) of a consistent image of Γ. Formally:

Amb(Γ) = {A | ∃Γ′ : ConIm(Γ′, Γ, A) ∧ ∀A′, A′ ⊂ A, ¬∃Γ′′ : ConIm(Γ′′, Γ, A′)}

We can now give a formal definition of the preservation relation:

∆ is an Amb(Γ)-preserving extension of Γ ⇔ Amb(Γ ∪ ∆) ⊆ Amb(Γ).

To indicate that this preservation relation determines the acceptability predicate for our logic, we define:

Accept(∆, Γ) iff ∆ is an Amb(Γ)-preserving extension of Γ ⇔ Amb(Γ ∪ ∆) ⊆ Amb(Γ).

Combined with the earlier account of consequence relations as preserving the acceptability of all acceptable extensions, this leads us to a new consequence relation:

Γ ⊢_Amb α ⇔ ∀∆ : Accept(∆, Γ) → Accept(∆ ∪ {α}, ∆).

That is to say, α follows from Γ if and only if α is an acceptable extension of every acceptable extension of Γ. Such an extension, we can say, doesn’t make things worse because at least one of the ‘ambiguity’ sets allowing us to produce a consistent image of Γ will also allow us to produce a consistent image of Γ′. Brown [1999] shows that this consequence relation is identical to the consequence relation of Priest’s logic of paradox (LP). This leads to an important (if obvious) observation about paraconsistent logics: The same consequence relation can be given very different philosophical readings. Priest’s LP is a dialetheic logic, originally presented using Kleene’s strong 3-valued matrices and treating the non-classical value (a fixed point for negation) as designated. But we can view the inferences of LP as guided instead by a preservationist understanding of the constraints that come in to prevent trivialization.
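For small finite premise sets, Amb(Γ) can be computed by brute force. The sketch below is illustrative only and makes explicit assumptions: formulas are nested tuples, "least" is read as minimal under set inclusion (as in the formal definition above), and the search for a consistent image uses the observation that replacing each occurrence of a letter a in A by a_f or a_t amounts to letting each occurrence of a take either truth value independently. The function names are hypothetical.

```python
from itertools import combinations, product

# Formulas: ('atom', 'p'), ('not', f), ('and', f, g), ('or', f, g).

def atoms(f):
    if f[0] == 'atom':
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def possible(f, ambiguous, v):
    """Truth values f can take when every occurrence of an ambiguous letter is free."""
    op = f[0]
    if op == 'atom':
        return {0, 1} if f[1] in ambiguous else {v[f[1]]}
    if op == 'not':
        return {1 - x for x in possible(f[1], ambiguous, v)}
    left = possible(f[1], ambiguous, v)
    right = possible(f[2], ambiguous, v)
    if op == 'and':
        return {x & y for x in left for y in right}
    if op == 'or':
        return {x | y for x in left for y in right}
    raise ValueError(op)

def has_consistent_image(gamma, A):
    fixed = sorted(set().union(*(atoms(f) for f in gamma)) - set(A))
    for bits in product([0, 1], repeat=len(fixed)):
        v = dict(zip(fixed, bits))
        if all(1 in possible(f, A, v) for f in gamma):
            return True
    return False

def amb(gamma):
    names = sorted(set().union(*(atoms(f) for f in gamma)))
    bases = [frozenset(c)
             for k in range(len(names) + 1)
             for c in combinations(names, k)
             if has_consistent_image(gamma, frozenset(c))]
    # Keep the bases that have no proper subset which is also a base.
    return {A for A in bases if not any(B < A for B in bases)}

p, q = ('atom', 'p'), ('atom', 'q')
```

For instance, `amb([p, ('not', p), q])` contains only `{p}`, while `amb([p, q, ('not', ('and', p, q))])` contains both `{p}` and `{q}`, since treating either letter as ambiguous is enough to project a consistent image.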
LP is inelegant in much the way that singleton forcing is inelegant: it treats inconsistency on the left differently than it does theorems on the right. In LP, classical contradictions on the left don’t trivialize, but classical tautologies on the right do, that is, any such tautology follows from every premise set; cast in multiple-conclusion form, LP makes inconsistent premise sets nontrivial. Not every conclusion set follows from these. But LP trivializes all conclusion sets whose closure under disjunction includes a tautology. First degree entailment (FDE) is a logic closely related to LP that treats inconsistency on the left and its dual on the right symmetrically. Brown [2001] presents a closely related ambiguity-based account capturing the consequence relation of first degree entailment.
This treatment of FDE requires careful development of the symmetries of the consequence relation. Perhaps the most direct approach to re-imposing the left-right symmetries of classical logic on the ambiguity semantics for LP is to dualize the property to be preserved, and demand that this dual property be preserved from right to left. Having used ambiguity to project consistent images of the premise set, we now also use ambiguity to project consistently deniable images of the conclusion set. Let Amb∗(∆) be the set of minimal sets of sentence letters whose ambiguity is sufficient to project a consistently deniable image of ∆. We require that any sentence from which ∆ follows be an acceptable extension of every acceptable extension of ∆, where acceptability is consistent deniability:

Γ is an Amb∗(∆)-preserving extension of ∆ ⇔ Amb∗(∆ ∪ Γ) ⊆ Amb∗(∆)

We write this Accept∗(Γ, ∆); the idea this time is that a set Γ is acceptable as an extension of a commitment to denying ∆ if and only if Γ includes ∆, and extending ∆ to Γ doesn’t make things worse, i.e. does not require any more ambiguity to produce a consistently-deniable image than merely denying ∆ does. Now we can define a right-to-left consequence relation, with sentences on the left and sets of sentences on the right:

γ ⊢_Amb∗ ∆ ⇔ ∀Γ : Accept∗(Γ, ∆) → Accept∗(Γ ∪ {γ}, Γ).

In English, ∆ follows from γ if and only if γ is an acceptable extension of every acceptable extension of ∆, considered as a set we are committed to denying. We can combine these two asymmetrical consequence relations to a symmetrical one by treating sets on the left as closed under conjunction and sets on the right as closed under disjunction, and demanding that both these consequence relations apply:

Γ ⊢_Sym ∆ ⇔ ∃δ ∈ Cl(∆, ∨) : Γ ⊢_Amb δ & ∃γ ∈ Cl(Γ, ∧) : γ ⊢_Amb∗ ∆.
Alternatively (linking both to a purely sentential consequence relation, so that the symmetrical set-set relation arises from type-raising a symmetrical sentence-sentence relation), we can put it this way instead:

Γ ⊢_Sym ∆ ⇔ ∃δ ∈ Cl(∆, ∨), ∃γ ∈ Cl(Γ, ∧) : γ ⊢_Sym δ,

where γ ⊢_Sym δ iff {γ} ⊢_Amb δ & γ ⊢_Amb∗ {δ}. But the upshot of this maneuver is not the elegant FDE, but a less-refined logic sometimes called K∗. The two logics agree on the consequence relation except when classically trivial sets lie on both the left and the right. In those cases the triviality of the set on the other side ensures that the property we’re preserving is indeed preserved. So K∗, which is the logic just proposed here, trivializes when classically trivial sets appear on both the left and the right. To capture FDE, we need to be subtler about how to produce our symmetrical consequence relation.
The trick is to produce consistent images of premise sets and non-trivial images of conclusion sets simultaneously, requiring that the sets of sentence letters used to project these images be disjoint:13

Γ ⊢_FDE ∆ iff every such consistent image of Γ can be consistently extended by some member of each compatible non-trivial image of ∆ (i.e. each non-trivial image of ∆ based on a disjoint set of sentence letters),

or (now equivalently):

Γ ⊢_FDE ∆ iff every such non-trivial image of the conclusion set can be extended by some element of each non-contradictory image of the premise set while preserving its consistent deniability.

Interestingly, there is another way to express this relation, which opens up a new understanding of preservation. This approach focuses less on what features of our premise and conclusion sets are preserved from left to right and right to left, and more on what we want to preserve regarding the consequence relation itself. We can say that what we are preserving is the classical consequence relation itself, under a range of minimally ambiguous, consistent (or consistently deniable) images of our premises and conclusions:

Γ ⊢_FDE ∆ iff every image of the premise and conclusion sets, I(Γ), I∗(∆), obtained by treating disjoint sets of sentence letters drawn from Amb(Γ) and Amb∗(∆) as ambiguous is such that I(Γ) ⊢ I∗(∆).

This suggests a new preservationist strategy for producing new consequence relations from old. We can say that the new consequence relation holds when and only when the old relation holds in all of a range of cases anchored to (centered on) the original premise and conclusion sets. This strategy can eliminate or reduce trivialization by ensuring that the range of cases considered includes some non-trivial ones, even when the instance forming our ‘anchor’ is trivial. In fact, this idea can also be applied to Schotch and Jennings’ weakly aggregative forcing relation.
13 In effect, ambiguity allows us to capture the results of using ‘both’ and ‘neither’ as (respectively) designated and non-designated fixed points for negation, while insisting that the two sets of ambiguously-treated letters be disjoint ensures that we never treat the same sentence letter in both these ways.

A different presentation of these ambiguity-logics was explored in [Brown, 2005, ms.], a paper presented at the 2005 meetings of the Society for Exact Philosophy. Rather than construct consistent images of premise sets and consistently deniable images of conclusion sets, this presentation treated some atoms as ‘wild cards’ in constructing valuations. Wild card valuations differ from classical valuations only in how they treat a set of ‘wild-card’ atoms. Let $L$ be a propositional language, $At = \{p, q, r, \ldots\}$ the set of atoms of $L$, and $S_1, \ldots, S_n, \ldots$ the sentences of $L$. A wild card valuation begins by selecting an element of $2^{At}$, $W$, as the set of wild card atoms. We assign values to the sentences of $L$ first by settling on an

a. Assignment of 0 or 1 uniformly to each member of $At - W$.

Call this assignment $A_{At-W}$. Next, we

b. Assign 0 or 1 to each instance of an atom in $W$.14

Call the resulting assignment (now specified for each instance of each atom throughout $L$) $WA_{At-W}$. From here, we

c. Assign 0 or 1 to each complex sentence, based on the usual truth functional interpretation of the connectives.

The result is a wildcard valuation, $WV_{At-W}$. Let $\mathcal{V}_{At-W}$ be the set of all such valuations based on a given $A_{At-W}$.15 We don’t apply the members of $\mathcal{V}_{At-W}$ directly to arrive at our consequence relation. Instead, we quantify across them to obtain a valuation based on all the wildcard valuations for each wildcard set $W$. This is straightforward: let $V_{At-W}$ be the valuation determined by all the members of $\mathcal{V}_{At-W}$. Then $V_{At-W} : L \to \{1, 0\}$, where

$V_{At-W}(S) = 1$ if $\exists V \in \mathcal{V}_{At-W} : V(S) = 1$; $V_{At-W}(S) = 0$ otherwise.

The last step is to define our consequence relation. Here we simply use the usual definition:

$\Gamma \vdash_W \alpha \iff \forall V_W\,[(\forall \gamma \in \Gamma,\ V_W(\gamma) = 1) \Rightarrow V_W(\alpha) = 1]$.

The result is LP again; a dual treatment of wildcard valuations on the right allows us to capture $K^{*}$ and FDE as well.
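The wild-card construction can be sketched in code. What follows is a minimal illustration under stated assumptions, not the presentation given in [Brown, 2005, ms.]: formulas are encoded as nested tuples, the function names (atoms, possible_values, entails) are hypothetical, the connectives are limited to negation, conjunction and disjunction, and only the atoms occurring in an inference are enumerated. Because each instance of a wild-card atom is valued independently, the set of values a complex sentence can take across wildcard valuations may be computed compositionally from the sets for its parts; the quantified valuation then assigns 1 exactly when 1 is among those values.

```python
from itertools import combinations, product

# Hypothetical encoding: ("atom", "p"), ("not", A), ("and", A, B), ("or", A, B).

def atoms(f):
    """Return the set of atom names occurring in formula f."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(sub) for sub in f[1:]))

def possible_values(f, base, wild):
    """Values f can take over all wildcard valuations built on `base`.

    `base` assigns 0 or 1 to the non-wild atoms (step a); every instance
    of an atom in `wild` may receive either value freely (step b); complex
    sentences are then evaluated truth-functionally (step c).
    """
    op = f[0]
    if op == "atom":
        return {0, 1} if f[1] in wild else {base[f[1]]}
    if op == "not":
        return {1 - v for v in possible_values(f[1], base, wild)}
    left = possible_values(f[1], base, wild)
    right = possible_values(f[2], base, wild)
    if op == "and":
        return {a & b for a in left for b in right}
    if op == "or":
        return {a | b for a in left for b in right}
    raise ValueError(f"unknown connective: {op!r}")

def entails(premises, conclusion):
    """Check the consequence relation over every wild-card set and base assignment."""
    letters = sorted(set().union(atoms(conclusion), *(atoms(p) for p in premises)))
    for k in range(len(letters) + 1):
        for wild in map(set, combinations(letters, k)):
            rest = [a for a in letters if a not in wild]
            for bits in product((0, 1), repeat=len(rest)):
                base = dict(zip(rest, bits))
                premises_hold = all(1 in possible_values(g, base, wild) for g in premises)
                if premises_hold and 1 not in possible_values(conclusion, base, wild):
                    return False  # a quantified valuation designates Γ but not α
    return True

p, q = ("atom", "p"), ("atom", "q")
print(entails([p, ("not", p)], q))   # Explosion
print(entails([p], ("or", p, q)))    # Addition
```

On this sketch the first call reports that the inference from p and ¬p to q fails (take p as the only wild card and q as false), while the second reports that Addition holds, which is the pattern one expects of LP.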
14 This allows each instance of a wild-card atom in each sentence of $L$ to receive either value freely.
15 So $\mathcal{V}_{At-\emptyset}$ is just the singleton set of one classical valuation on $L$.

Final Reflections on Preservation

The closing point of [Brown, 2003, ms.] is that, like the ambiguity semantics for FDE developed above, forcing too can be described as preserving the classical consequence relation across a range of related premise and conclusion sets, anchored to the given sets. By definition, Γ [⊢ ∆ holds if and only if the classical consequence relation holds between some pair of cells in every ℓ(Γ), ℓ′(∆) division of Γ and ∆’s content. But we can capture this preservation of ⊢ in a way that emphasizes the parallel between our ambiguity semantics for FDE and forcing more strongly. Instead of dividing Γ’s content amongst the members of families of sets indexed to ℓ(Γ), we can achieve the same effect by means of ambiguity (note that for purposes of forcing, we insist that ambiguity does not arise within a single sentence): when ℓ(Γ) = n, we replace the sentence letters of Γ with n sets of new sentence letters, and produce images of Γ that replace the sentence letters in each γ ∈ Γ with sentence letters drawn from one of these sets. Supposing that no numerical subscripts appear in the sentence letters of Γ, we can replace the sentence letters of each sentence in Γ with the same letters combined with subscripts drawn from 1, . . . , n. Then we can say that Γ [⊢ δ if and only if every such image of Γ has some such image of δ (i.e. an image of δ produced by replacing its sentence letters with letters combined with one of our subscripts) as a classical consequence. Similarly, we can say that γ [⊢ ∆ if and only if every such image of ∆ is such that some such image of γ (i.e. an image of γ produced by replacing its sentence letters in the same way) is a premise from which it follows classically. Finally, we can invoke the singleton bridge principle, and say that Γ [⊢ ∆ if and only if for some α, β, Γ [⊢ α, β [⊢ ∆, and α ⊢ β, where ⊢ is just the classical single turnstile.

This observation, together with the earlier comments on the preservationist treatment of FDE, leads to my closing theme. The classroom treatment of classical logic puts the emphasis on the guaranteed preservation of truth. One preservationist response to this addresses it head on: to offer something else worth preserving, and show that it actually helps. This, of course, is most apparent when the preservation of truth becomes trivial. But another is to question the preservation of truth more closely, and to look more broadly at what is preserved in various consequence relations. Recent work by Jennings and some of his students has made a frontal assault on the first task.
For an early example of this line of work, see [Jennings and Sarenac, 2006], where Pilate’s infamous question, ‘what is truth,’ gets some respect. The second got its start, for this author at least, in a conversation with Peter Apostoli about multiple-conclusion logics and Gentzen-style systems for them. Apostoli remarked that one might describe such systems as preserving the consequence relation itself. That suggestion inspired the readings here of both ambiguity-logics and forcing as preserving the classical consequence relation under ‘imaging’ operations applied to premise and conclusion sets. But more general questions arise here: What makes a property or relation a logically interesting preservable? What constraints on consequence relations remain, once a more general view of preservables is arrived at? So long as we think in terms of acceptably assertable extensions of premise sets and acceptably deniable extensions of conclusion sets, reflexivity seems inescapable; so long as all the acceptable extensions of an acceptable extension of a set Γ are themselves acceptable extensions of Γ, transitivity will also remain in place. Monotonicity is the most vulnerable of the traditional constraints, failing as soon as our standard for acceptable extensions is allowed to vary in response to what preservable property Γ turns out to have. But the others, too, can fail; how interesting (and illuminating) those failures may be remains to be seen. As to what makes a property preservable, no general account has yet emerged. Developing generalizations of consistency (and consistent deniability, on the right)
has certainly proved fruitful. Preservation of probability bounds on acceptable sentences has also been proposed, and, as we’ve just seen, preservation of the consequence relation itself under some transformations of premises and conclusions. But much remains to be discovered.

BIBLIOGRAPHY

[Apostoli and Brown, 1995] P. Apostoli and B. Brown. A Solution to the Completeness Problem for Weakly Aggregative Modal Logic. Journal of Symbolic Logic, 60 (3), 832-842, 1995.
[Batens et al., 2000] D. Batens, J. van Bendegem, and G. Priest, eds. Frontiers of Paraconsistency: Proceedings of the First World Conference on Paraconsistency. Baldock, Hertfordshire, England; Philadelphia, PA: Research Studies Press, 2000.
[Belnap, 1977] N. D. Belnap. How a Computer Should Think. 30-56 in G. Ryle, ed., Contemporary Aspects of Philosophy. Stocksfield, Eng.; Boston: Oriel Press, 1977.
[Belnap, 1990] N. D. Belnap and G. Massey. Semantic Holism is Seriously False. Studia Logica, 83-86, 1990.
[Braybrooke, 1996] D. Braybrooke, ed. Social Rules: Origin; Character; Logic; Change. Boulder: Westview Press, 1996.
[Brown et al., 2004] B. Brown and G. Priest. Chunk and Permeate. Journal of Philosophical Logic, 33, 379-388, 2004.
[Brown, 2004a] B. Brown. The Pragmatics of Empirical Adequacy. Australasian Journal of Philosophy, 82, 242-263, 2004.
[Brown, 2004b] B. Brown. Knowledge and Non-Contradiction. In G. Priest and J.C. Beall, eds., The Law of Non-Contradiction, Oxford: Oxford University Press, 2004.
[Brown, 2003] B. Brown. Notes on Hume and Skepticism of the Senses. Croatian Journal of Philosophy, 3 (9), 289-303, 2003.
[Brown, 2000] B. Brown. Paraconsistent Classical Logic. In W. A. Carnielli, M. E. Coniglio and I. M. L. D’Ottaviano, eds., Paraconsistency: The Logical Way to the Inconsistent. Proceedings of the II World Congress on Paraconsistency 2000, Marcel Dekker, New York, 2000.
[Brown, 2002a] B. Brown. Approximate Truth. 81-103 in J. Meheus, ed., Inconsistency in Science, Dordrecht/Boston/London: Kluwer, 2002.
[Brown, 2002b] B. Brown. On Paraconsistency. Part XII, entry 39, pp. 628-650 in Dale Jacquette, ed., A Companion to Philosophical Logic, Malden, Mass. and Oxford: Blackwell, 2002.
[Brown, 2001] B. Brown. LP, FDE and Ambiguity. In H. Arabnia, ed., IC-AI 2001 Volume II: Proceedings of the 2001 meetings of the International Conference on Artificial Intelligence, CSREA publications, 2001.
[Brown, 2000] B. Brown. Simple Natural Deduction for Weakly Aggregative Paraconsistent Logic. In [Batens et al., 2000].
[Brown, 1999a] B. Brown. Adjunction and Aggregation. Noûs, 33 (2), 1999.
[Brown, 1999b] B. Brown. Yes, Virginia, There Really are Paraconsistent Logics. Journal of Philosophical Logic, 28, 489-500, 1999.
[Brown, 1999c] B. Brown. Smoke and Mirrors: A Few Nice Tricks. Critical notice of Smoke and Mirrors by J. Brown. Dialogue, XXXVIII, 123-34, 1999.
[Brown, 1996] B. Brown. Rules and the Rationality of Scientific Cultures. 53-74 in [Braybrooke, 1996].
[Brown, 1993] B. Brown. The Force of 2/n+1. In Martin Hahn, ed., Vicinae Deviae, Burnaby: Simon Fraser University, 151-163, 1993.
[Brown, 1993a] B. Brown. Old Quantum Theory: A Paraconsistent Approach. PSA 1992, Vol. 2, Philosophy of Science Association, 397-411, 1993.
[Brown, 1992b] B. Brown. Rational Inconsistency and Reasoning. Informal Logic, XIV, 5-10, 1992.
[Brown, 1992a] B. Brown. Struggling with Conditionals. Critical notice of David Sanford, If P, then Q: Conditionals and the Foundations of Reasoning (London and New York: Routledge, 1989). Dialogue, 31 (4), 327-32, 1992.
[Brown, 1990] B. Brown. How to be Realistic About Inconsistency in Science. Studies in the History and Philosophy of Science, 21 (2), 281-294, 1990.
[Brown and Lepage, 2006] B. Brown and F. Lepage. Truth and Probability: Essays in honour of Hugues Leblanc, London: College Publications, at King’s College London, 2006.
[Brown and Woods, 2001a] B. Brown and J. Woods, eds. Logical Consequence: Rival Approaches, Proceedings of the 1999 meeting of the Society for Exact Philosophy. General Editor: Dov Gabbay. Stanmore, Middlesex: Hermes Science Press, 2001.
[Brown and Woods, 2001b] B. Brown and J. Woods, eds. New Essays in Exact Philosophy: Logic, Mathematics and Science: Proceedings of the 1999 meeting of the Society for Exact Philosophy. General Editor: Dov Gabbay. Stanmore, Middlesex: Hermes Science Press, 2001.
[Brown and Schotch, 1999] B. Brown and P. K. Schotch. Logic and Aggregation. Journal of Philosophical Logic, 28, 265-287, 1999.
[d’Entremont, 1982] B. H. d’Entremont. Inference and Level, M.A. thesis, Dalhousie University, 1982.
[Jennings, 1967] R. E. Jennings. Preference and Choice as Logical Correlates. Mind. 76: 556-567, 1967.
[Jennings, 1974a] R. E. Jennings. Pseudo-Subjectivism in Ethics. Dialogue. 13: 515-518, 1974.
[Jennings, 1974b] R. E. Jennings. A Utilitarian Semantics for Deontic Logic. Journal of Philosophical Logic. 3: 445-456, 1974.
[Jennings, 1981] R. E. Jennings. A Note on the Axiomatisation of Brouwersche Modal Logic. Journal of Philosophical Logic. 10: 41-43, 1981.
[Jennings, 1982] R. E. Jennings. The Subjunctive in Conditionals and Elsewhere. Pacific Philosophical Quarterly. 63: 146-156, 1982.
[Jennings, 1994] R. E. Jennings. The Genealogy of Disjunction. Oxford: Oxford University Press, 1994.
[Jennings, 1985] R. E. Jennings. Can there be a natural deontic logic? Synthese. N 85; 65: 257-273, 1985.
[Jennings and Johnston, 1983] R. E. Jennings and D. K. Johnston. Paradox-Tolerant Logic. Logique et Analyse. 26: 291-308, 1983.
[Jennings et al., 1980] R. E. Jennings, D. K. Johnston, and P. K. Schotch. Universal First Order Definability in Modal Logic. Zeitschrift fuer Mathematische Logik und Grundlagen der Mathematik. 26: 327-330, 1980.
[Jennings and Sarenac, 2006] R. E. Jennings and D. Sarenac. The Preservation of Truth. 1-16 in [Brown and Lepage, 2006].
[Jennings and Schotch, 1978] R. E. Jennings and P. K. Schotch. De Re and De Dicto Beliefs. Logique et Analyse. 21: 451-458, 1978.
[Jennings and Schotch, 1980] R. E. Jennings and P. K. Schotch. Inference and Necessity. Journal of Philosophical Logic. AG 80; 9: 327-340, 1980.
[Jennings and Schotch, 1981a] R. E. Jennings and P. K. Schotch. Probabilistic Considerations on Modal Semantics. Notre Dame Journal of Formal Logic. Jl 81; 22: 227-238, 1981.
[Jennings and Schotch, 1981b] R. E. Jennings and P. K. Schotch. Epistemic Logic, Skepticism and Non-Normal Modal Logic. Philosophical Studies. 40: 47-68, 1981.
[Jennings and Schotch, 1984] R. E. Jennings and P. K. Schotch. The Preservation of Coherence. Studia Logica. 43: 89-106, 1984.
[Jennings and Schotch, 1981c] R. E. Jennings and P. K. Schotch. Some Remarks on (Weakly) Weak Modal Logics. Notre Dame Journal of Formal Logic. 22: 309-314, 1981.
[Jennings et al., 1981] R. E. Jennings, P. K. Schotch, and D. K. Johnston. The N-order Undefinability of the Geach Formula. Notre Dame Journal of Formal Logic. 22: 375-378, 1981.
[Johnston, 1976] D. Johnston. A Generalized Relational Semantics for Modal Logic, M.A. thesis, Simon Fraser University, 1976.
[Jonson and Tarski, 1951] B. Jónsson and A. Tarski. Boolean Algebras with Operators. American Journal of Mathematics. 73: 891-939, 1951.
[Massey, 1982] G. J. Massey. Bizarre Translation Defended. Philosophical Studies. N 82; 42: 419-423, 1982.
[Massey, 1977] G. J. Massey. Negation, Material Equivalence and Conditioned Non-conjunction: Completeness and Duality. Notre Dame Journal of Formal Logic. JA 77; 18: 140-44, 1977.
[Massey, 1966] G. J. Massey. The theory of truth-tabular connectives, both truth-functional and modal. Journal of Symbolic Logic. D 66; 31: 593-608, 1966.
[McLeod and Schotch, 2000] M. McLeod and P. K. Schotch. Remarks on the Modal Logic of Henry Bradford Smith. Journal of Philosophical Logic, 29 (6), 603-615, 2000.
[Nicholson and Jennings, forthcoming] D. Nicholson and R. E. Jennings. An Axiomatization of Family Resemblance, forthcoming.
[Nicholson et al., 2000] T. Nicholson, D. Sarenac, and R. E. Jennings. In [Brown and Woods, 2001a].
[Priest et al., 1989] G. Priest, R. Routley, and J. Norman. Paraconsistent Logic: Essays on the Inconsistent, Munich: Philosophia Verlag, 1989.
[Sarenac, 2000] D. Sarenac. A Preservationist Approach to Implication, M.A. thesis, Simon Fraser University, 2000.
[Sarenac and Jennings, ] D. Sarenac and R. E. Jennings. The Preservation of Relevance. Eidos. J 03; 17 (1): 23-36.
[Schotch, 1996] P. K. Schotch. Hyperdeontic Logic: An Overview. In [Braybrooke, 1996, 21–37].
[Schotch, 2000] P. K. Schotch. Skepticism and Epistemic Logic. Studia Logica. 65: 187-198, 2000.
[Schotch, 2006] P. K. Schotch. David Braybrooke on the Track of PPE. In [Sherwin and Schotch, 2006], pp. 325–344.
[Schotch and Jennings, 1989] P. K. Schotch and R. E. Jennings. On Detonating. In [Priest et al., 1989, 306–327].
[Schotch and Jennings, 1981] P. K. Schotch and R. E. Jennings. Modal Logic and the Theory of Modal Aggregation. Philosophia. 9: 265-278, 1981.
[Schotch and Jennings, 1980] P. K. Schotch and R. E. Jennings. Inference and Necessity. Journal of Philosophical Logic. 9: 327-340, 1980.
[Scott, 1974] D. Scott. Completeness and Axiomatizability in Many-Valued Logic. Proceedings of Symposia in Pure Mathematics XXV; Proceedings, University of California, Berkeley, 23-30 June, 1971. Providence, RI: American Mathematical Society, 1974.
[Segerberg, 1971] K. Segerberg. An Essay on Classical Modal Logic, Volume I. Uppsala University Press, Uppsala, 1971.
[Sherwin and Schotch, 2006] S. Sherwin and P. K. Schotch, eds. Engaged Philosophy: Essays in Honour of David Braybrooke. Toronto: University of Toronto Press, 2006.
[Thorn, 1998] P. Thorn. The Normative Character of Interpretation and Mental Explanation, M.A. thesis, Simon Fraser University, 1998.
[Wong, 1997] P. Wong. Weak Aggregative Modal Logics with Multi-ary Modal Operators. Technical Report TR-ARP-10-97. Automated Reasoning Project, Research School of Information Sciences and Engineering, Australian National University, 1997.
[Wong, 1998] P. Wong. Paraconsistent Inference and Preservation. Technical Report TR-ARP-11-98. Automated Reasoning Project, Research School of Information Sciences and Engineering, Australian National University, 1998.
[Wong and Besnard, 2001] P. Wong and P. Besnard. Paraconsistent Reasoning as an Analytic Tool. Logic Journal of IGPL, 9 (2), 217-230, 2001.
[Wong and Besnard, 2003] P. Wong and P. Besnard. Modal (Logic) Paraconsistency. 540-551 in T. D. Nielsen and N. L. Zhang, eds., Symbolic and Quantitative Approaches to Reasoning with Uncertainty: 7th European Conference, ECSQARU 2003, Aalborg, Denmark, July 2-5, 2003, Proceedings. Lecture Notes in Computer Science Vol. 2711. Springer Verlag, 2003.
PARACONSISTENCY AND DIALETHEISM
Graham Priest
1 INTRODUCTION
1.1 Delineating the Topic of this Article
This article is about paraconsistent logic, logic in which contradictions do not entail everything. Though the roots of paraconsistency lie deep in the history of logic, its modern developments date to just before the middle of the 20th century. Since then, paraconsistent logic (or better, logics, since there are many of them) have been proposed and constructed for many, and very different, reasons. The most philosophically challenging of these reasons is dialetheism, the view that some contradictions are true. Though this article will also discuss other aspects of paraconsistency, it will concentrate specifically on its dialetheic aspects. Other aspects of the subject can be found in the article ‘Paraconsistency: Preservational Variations’ in this volume of the Handbook. The subject also has close connections with relevant logic. Many related details can therefore be found in the article ‘Relevant and Substructural Logics’, in Volume 4 of the Handbook. In the following two parts of this article, we will look at the history of the subject before about 1950: first the history of paraconsistency, then the history of dialetheism. In the next two parts, we will turn to the modern developments, those since about 1950; first paraconsistency, then dialetheism. In the final three parts of the article we will look at some important issues that bear on paraconsistency, or on which paraconsistency bears: the foundations of mathematics, the notion of negation, and rationality.
1.2 Defining the Key Notions: Paraconsistency
Let us start, however, with definitions of the two central notions of the article. Perhaps the major motivation behind paraconsistency in the modern period has been the thought that there are many situations where we wish to handle inconsistent information in a sensible way — and specifically, where we have to infer from it. (We may also wish to revise the information; but that is another matter. And a knowledge of what does or does not follow sensibly from the information may be necessary for an intelligent revision.)
Handbook of the History of Logic. Volume 8 Dov M. Gabbay and John Woods (Editors) © 2007 Elsevier B.V. All rights reserved.
Let ⊢ be any relation of logical consequence.1 Let ¬ denote negation. (What, exactly, this is, we will come back to later in this essay.) Then the relation is called explosive if it satisfies the principle of Explosion:

α, ¬α ⊢ β

or, as it is sometimes called, ex contradictione quodlibet. Explosion is, on the face of it, a most implausible looking inference. It is one, however, that is valid in “classical logic”, that is, the orthodox logic of our day. Clearly, an explosive notion of logical consequence is not a suitable vehicle for drawing controlled inferences from inconsistent information. A necessary condition for a suitable vehicle is therefore that Explosion fail. This motivates the now standard definition: a consequence relation is paraconsistent if it is not explosive. The term was coined by Miró Quesada at the Third Latin American Symposium on Mathematical Logic in 1976.2

Given a language in which to express premises and conclusions, a set of sentences in this language is called trivial if it contains all sentences. Let Σ be a set of sentences, and suppose that it is inconsistent, that is: for some α, Σ contains both α and ¬α. If ⊢ is explosive, the deductive closure of Σ under ⊢ (that is, the set of consequences of Σ) is trivial. Conversely, if ⊢ is paraconsistent it may be possible for the deductive closure of Σ to be non-trivial.3 Hence, a paraconsistent logic allows for the possibility of inconsistent sets of sentences whose deductive closures are non-trivial.

Paraconsistency, in the sense just defined, is not a sufficient condition for a consequence relation to be a sensible one with which to handle inconsistent information. Consider, for example, so-called minimal logic, that is, essentially, intuitionist logic minus Explosion. This is paraconsistent, but in it α, ¬α ⊢ ¬β, for all α and β.4 Hence, one can infer the negation of anything from an inconsistency. This is not triviality, but it is clearly antithetical to the spirit of paraconsistency, if not the letter. It is possible to try to tighten up the definition of ‘paraconsistent’ in various ways.5 But it seems unlikely that there is any purely formal necessary and sufficient condition for the spirit of paraconsistency: inconsistent information may make a nonsense of a consequence relation in so many, and quite different, ways.6 Better, then, to go for a clean, simple, definition of paraconsistency, and leave worrying about the spirit to individual applications.

1 In this article, I will think of such a relation as one between a set of premises and a single conclusion. However, as should be clear, multiple-conclusion paraconsistent logics are also quite feasible. In listing the premises of an inference, I will often omit set braces. I will use lower case Greek letters for individual premises/conclusions, and upper case Greek letters for sets thereof. Lower case Latin letters, p, q, r, will indicate distinct propositional parameters.
2 The prefix ‘para’ has a number of different significances. Newton da Costa informed me that the sense that Quesada had in mind was ‘quasi’, as in ‘paramedic’ or ‘paramilitary’. ‘Paraconsistent’ is therefore ‘consistent-like’. Until then, I had always assumed that the ‘para’ in ‘paraconsistent’ meant ‘beyond’, as in ‘paranormal’ and ‘paradox’ (beyond belief). Thus, ‘paraconsistent’ would be ‘beyond the consistent’. I still prefer this reading.
3 Though, of course, for certain inconsistent Σ and paraconsistent ⊢, the set of consequences may be trivial.
4 Since α ⊢ β → α and β → α ⊢ ¬α → ¬β.
5 See, for example, Urbas [1990].
1.3 Defining the Key Notions: Dialetheism
No similar problems surround the definition of ‘dialetheism’. The fact that we are faced with, or even forced into operating with, information that is inconsistent, does not, of course, mean that that information is true. The view that it may be is dialetheism. Specifically, a dialetheia is a true contradiction, a pair, α and ¬α, which are both true (or equivalently, supposing a normal notion of conjunction, a truth of the form α ∧ ¬α). A dialetheist is therefore a person who holds that some contradictions are true. The word ‘dialetheism’ and its cognates were coined by Priest and Routley in 1981, when writing the introduction to Priest, Routley, and Norman [1989].7 Before that, the epithet ‘paraconsistency’ had often been used, quite confusingly, for both dialetheism and the failure of explosion.8 A trivialist is a person who believes that all contradictions are true (or equivalently, and more simply, who believes that everything is true). Clearly, a dialetheist need not be a trivialist (any more than a person who thinks that some statements are true must think that all statements are true). As just observed, a person may well think it appropriate to employ a paraconsistent logic in some context, or even think that there is a uniquely correct notion of deductive logical consequence which is paraconsistent, without being a dialetheist. Conversely, though, it is clear that a dialetheist must subscribe to a paraconsistent logic — at least when reasoning about those domains that give rise to dialetheias — unless they are a trivialist. A final word about truth. In talking of true contradictions, no particular notion of truth is presupposed. Interpreters of the term ‘dialetheia’ may interpret the notion of truth concerned in their own preferred fashion. Perhaps surprisingly, debates over the nature of truth make relatively little difference to debates about dialetheism.9
6 For example, as we will see later, the T-schema (plus self-reference) gives triviality in any logic that contains Contraction (α → (α → β) ⊢ α → β). Yet Contraction is valid in many logics that standardly get called paraconsistent.
7 Chapters 1 and 2 of that volume cover the same ground as the next two parts of this essay, and can be consulted for a slightly different account.
8 The term dialetheia was motivated by a remark of Wittgenstein [1978, p. 256], where he compares the liar sentence to a Janus-headed object facing both truth and falsity. A di/aletheia is, thus, a two-way truth. Routley, with an uncharacteristic purism, always preferred ‘dialethic’, ‘dialethism’, etc., to ‘dialetheic’, ‘dialetheism’, etc. Forms with and without the ‘e’ can now both be found in the literature.
9 See Priest [2000a].
2 PARACONSISTENT LOGIC IN HISTORY
2.1 Explosion in Ancient Logic
Having clarified the central notions of this essay, let us now turn to its first main theme. What are the histories of these notions? Paraconsistency first. It is sometimes thought that Explosion is a principle of inference coeval with logic. Calling the received theory of inference ‘classical’ may indeed give this impression. Nothing could be further from the truth, however. The oldest system of formal logic is Aristotle’s syllogistic;10 and syllogistic is, in the only way in which it makes sense to interpret the term, paraconsistent. Consider, for example, the inference:

Some men are animals.
No animals are men.
---------------------
All men are men.

This is not a (valid) syllogism. Yet the premises are contradictories. Hence contradictions do not entail everything. Aristotle is well aware of this, and points it out explicitly:

    In the first figure [of syllogisms] no deduction whether affirmative or negative can be made out of opposed propositions: no affirmative deduction is possible because both propositions must be affirmative, but opposites are the one affirmative, the other negative... In the middle figure a deduction can be made both of opposites and of contraries. Let A stand for good, let B and C stand for science. If then one assumes that every science is good, and no science is good, A belongs to every B and to no C, so that B belongs to no C; no science, then is science. Similarly if after assuming that every science is good one assumed that the science of medicine is not good; for A belongs to every B but to no C, so that a particular science will not be a science... Consequently it is possible that opposites may lead to a conclusion, though not always or in every mood.11

Syllogistic is not a propositional logic. The first logicians to produce a propositional logic were the Stoics. But there is no record of any Stoic logician having endorsed Explosion either. Nor do any of the critics of Stoic logic, like Sextus Empiricus, mention it. (And this surely would have been grist for his mill!) Stoic logicians did not, therefore, endorse Explosion.

10 The investigation of logic in the East, and especially in India, starts at around the same time as it does in Greece. But for some reason, Indian logic never developed into a formal logic in anything like the Western sense. There is, at any rate as far as I am aware, no Indian logician who endorsed Explosion or anything like it. There are good reasons for this, as we will see in due course.
11 Prior Analytics 63b 31-64a 16. The translation is from Barnes [1984]. Note also that there is nothing suspicious about taking some of the terms of the syllogism to be the same. As this quote shows, Aristotle explicitly allows for this.
It might be thought that Stoic logic was, none the less, explosive, on the following grounds. Consider the principle of inference called the Disjunctive Syllogism:

α, ¬α ∨ β ⊢ β

Given this, Explosion is not far away, as can be seen by the following argument, which we will call William’s argument (for reasons that will become clear in a moment):

          ¬α
       --------
  α     ¬α ∨ β
  -------------
        β

(Premises are above lines; corresponding conclusions are below.) Now, Stoic logicians did explicitly endorse the Disjunctive Syllogism. It was one of their five “axioms” (indemonstrables).12 So perhaps their logic was explosive, though they did not notice it? No. It is too much to ask one to believe that such good logicians missed a two-line argument of this kind. The most likely explanation is that Stoic logicians did not endorse William’s argument since they did not endorse the other principle it employs, Addition:

α ⊢ α ∨ β

Though the precise details of the Stoic account of disjunction are somewhat moot, there are reasons to suppose that the Stoics would not account a disjunction of an arbitrary α and β even as grammatical: disjunctions were legitimate when the disjuncts were exclusive, and enumerated an exhaustive partition of some situation or other (as in: It’s either Monday, or it’s Tuesday, or ... or it’s Sunday).13
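The two steps of William’s argument can be checked mechanically against classical two-valued truth tables. The Python below is only a sketch under that assumption (it says nothing about the Stoic account of disjunction), and the helper name valid is hypothetical. It confirms that Addition and the Disjunctive Syllogism are each classically valid, and that Explosion, which results from chaining them, is valid too.

```python
from itertools import product

def valid(premises, conclusion, n_atoms=2):
    """Classical validity: no row makes every premise true and the conclusion false."""
    return all(
        conclusion(*row) or not all(p(*row) for p in premises)
        for row in product((False, True), repeat=n_atoms)
    )

# a and b stand for the sentences α and β.
addition = valid([lambda a, b: not a],
                 lambda a, b: (not a) or b)             # ¬α ⊢ ¬α ∨ β
disjunctive_syllogism = valid([lambda a, b: a, lambda a, b: (not a) or b],
                              lambda a, b: b)           # α, ¬α ∨ β ⊢ β
explosion = valid([lambda a, b: a, lambda a, b: not a],
                  lambda a, b: b)                       # α, ¬α ⊢ β

print(addition, disjunctive_syllogism, explosion)
```

Since both steps pass, anyone who wishes to block the conclusion classically has no truth-table grounds for rejecting either rule; the Stoic response described above turns instead on whether an arbitrary disjunction is admissible at all.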
2.2 Explosion in Medieval Logic
The understanding of disjunction (and conjunction for that matter) in anything like its contemporary truth-functional sense seems not to emerge in logic until about the 12th century.14 It is therefore not surprising that the first occurrence of William’s argument seems to appear at about the same time. Though the evidence is circumstantial, it can be plausibly attributed to the 12th century Paris logician, William of Soissons, who was one of the parvipontinians, logicians who made a name for themselves advocating Explosion.15 William’s argument was well known within about 100 years. It can be found quite clearly in Alexander Neckham at the end of the 12th century,16 and it is clearly stated in the writings of the mid-14th century logician now known only as Pseudo-Scotus.17

12 See, e.g., Bocheński [1963, p. 98].
13 For a discussion of Stoic disjunction, see Jennings [1994, ch. 10].
14 See Sylvan [2000, section 5.3].
15 See Martin [1986].
16 See Read [1988, p. 31].
17 See Kneale and Kneale [1962, p. 281f].
The history of the principle of Explosion in medieval logic after this time is a tangled one, and surely much of it still remains to be discovered. What one can say for sure is that logical consequence, and with it Explosion, was one of the topics that was hotly debated in medieval logic. (One thing that muddies the waters is the fact that logicians tended to run together logical consequence and the conditional, calling both consequentiae.) Most of the major logicians distinguished between different notions of logical consequence.18 The various notions go by different names for different logicians. But it was not uncommon to distinguish between a “material” notion of validity, according to which Explosion held, and a “formal” notion of validity, requiring some sort of connection between premises and conclusion. Unsurprisingly, Explosion did not hold in the latter.19 One factor that drove towards accepting Explosion, at least for material consequences, was a definition of validity that started to become popular around the 13th century, and which may be stated roughly as follows:20 A valid inference is one in which it is impossible for the premises to be true and the conclusion to be false. The account was by no means accepted by all. But given the common assumption that it is impossible for contradictions to be true, ¬3(α ∧ ¬α), and a few plausible principles concerning truth functional conjunction and modality, it follows that, for arbitrary β, ¬3((α ∧ ¬α) ∧ β). Assuming that the ‘and’ italicized in the above definition is truth functional, this is just Explosion.21 Of particular note in the present context are the Cologne logicians of the late 15th century. 
These rejected Explosion as a formally valid principle, and with it the Disjunctive Syllogism (thereby prefiguring modern paraconsistent and relevant logic), specifically on the ground that both fail if we are reasoning about situations in which, maybe per impossibile, both α and ¬α hold.22

As is well known, the study of logic went into decline after this period. The subtle debates of the great medieval logicians were forgotten. Formal logic came to be identified largely with syllogistic. A few propositional inferences, such as modus ponens (α, α → β ⊢ β) and the Disjunctive Syllogism, are sometimes stated in logic texts, but Explosion is not one of them (and neither is Addition). Even the greatest logician between the middle ages and the end of the 19th century, Leibniz,

18 Perhaps with some indication of which notion of consequence was appropriate in which sort of case. See Stump [1989, p. 262f].
19 See, e.g., Sylvan [2000, 5.4].
20 See, e.g., Boh [1982]; Ashworth [1974, pp. 120ff].
21 Many definitions of validity are to be found in medieval logic. The one in question goes back well beyond the 13th century. Indeed, arguably it goes back to Megarian logicians. But in earlier versions, the conjunction was not necessarily interpreted truth functionally. For a full discussion, see Sylvan [2000].
22 See Ashworth [1974, p. 135]. A similar line was run by de Soto in the early 16th century. See Read [1993, pp. 251–5].
Paraconsistency and Dialetheism
does not mention Explosion in his writings.23 It seems fair to say, therefore, that oblivion ensured that paraconsistency became the received position in logic once more.
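The modal argument for Explosion given earlier in this section leaves its ‘few plausible principles’ unstated. One way of filling them in is set out below. It is offered only as a reconstruction, not as anything found in the medieval sources: it assumes that conjunction elimination is valid, that ◇ is monotonic (whatever follows from something possible is itself possible), and that consequence contraposes. Here β may be read as the claim that the conclusion of the inference is false.

```latex
\begin{align*}
&1.\ \neg\Diamond(\alpha \wedge \neg\alpha)
  && \text{assumption: contradictions cannot be true}\\
&2.\ (\alpha \wedge \neg\alpha) \wedge \beta \vdash \alpha \wedge \neg\alpha
  && \text{conjunction elimination (assumed)}\\
&3.\ \Diamond((\alpha \wedge \neg\alpha) \wedge \beta) \vdash \Diamond(\alpha \wedge \neg\alpha)
  && \text{from 2, monotonicity of } \Diamond \text{ (assumed)}\\
&4.\ \neg\Diamond(\alpha \wedge \neg\alpha) \vdash \neg\Diamond((\alpha \wedge \neg\alpha) \wedge \beta)
  && \text{from 3, contraposition (assumed)}\\
&5.\ \neg\Diamond((\alpha \wedge \neg\alpha) \wedge \beta)
  && \text{from 1 and 4}
\end{align*}
```

Each assumed step marks a place where someone who rejects Explosion might resist the argument.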
2.3 Explosion in Modern Logic
Things changed dramatically with the rise of modern logic at the end of the 19th century. For the logical theory invented by Frege, and subsequently taken up by Russell — classical logic — is explosive. (This needs no documentation for contemporary readers.) But Frege and Russell were introducing (or reintroducing) into logic something very counter-intuitive.24 Since neither of them was much of a student of medieval logic (nor could they have been, given the poor scholarship of the period at the time), what needs discussion is where the drive for Explosion came from. The motors are at least two.25 Frege and Russell realised the power of a truth-functional analysis of connectives, and exploited it relentlessly. But they were over-impressed by it, believing, incorrectly, that all interesting logical connectives could be given a truth functional analysis. The point was later to be given central dogmatic status by Russell’s student, Wittgenstein, in the Tractatus. Now, if one gives a truth functional analysis of the conditional (if...then...), the only plausible candidate is the material conditional, ¬α ∨ β (α ⊃ β). Given this, the most natural principle for the conditional, modus ponens, collapses into the Disjunctive Syllogism, to which the logic is therefore committed. Given that the truth functional understanding of disjunction immediately vouchsafes Addition, Explosion is an immediate corollary. The second source of Explosion is, in many ways, more fundamental. It is a fusion of two things. The first is an account of negation. How, exactly, to understand negation is an important issue in the history of logic, though one that often lurks beneath the surface of other disputes (especially concerning the conditional). (More of this later.) In the middle of the 19th century an account of propositional negation was given by George Boole. According to Boole, negation acts like set-theoretic complementation. 
Specifically, for any α, α and ¬α partition the set of all situations: the situations in which ¬α is true are exactly those where α fails to be true. (Note that this is not entailed by a truth functional account of negation. Some paraconsistent logics have a different, but still truth functional, theory of negation.) Boole’s way of looking at negation, and more generally, the

23 At least according to the account of Leibniz’ logic provided by Kneale and Kneale [1962, pp. 336ff].
24 And Russell, at least, was aware of this. There is a folklore story concerning Russell (Nick Griffin tells me that a version of it can be found in Joad [1927]) which goes as follows. Russell was dining at high table at Trinity, when he mentioned to one of his fellow dons that in his logic a contradiction implies everything. According to one version, the don, rightly incredulous, challenged him to deduce the fact that he was the Pope from the claim that 2 = 3. After some thought, Russell replied: ‘Well, if 2 = 3 then, subtracting 1 from both sides, it follows that 1 = 2. Now the Pope and I are two. Hence, the Pope and I are one. That is, I am the Pope’.
25 There are certainly others. For example, Explosion is endorsed by Peano [1967, p. 88], but his reasons for it are not stated.
analysis of propositional operators in set-theoretic terms, was highly influential on the founders of modern logic. Russell, for example, took Boole’s work to be the beginning of ‘the immense and surprising development of deductive logic’ of which his own work formed a part.26

The second element entering into the fusion is an account of validity, to the effect that an inference is valid if there are no situations, or models, as they were to come to be called, in which the premises are true and the conclusion is false. The account is not stated by either Frege or Russell, as far as I am aware. It is implicit, however, at least for propositional logic, in the truth-tabular account of validity, and was developed and articulated, by Tarski and other logicians, into the modern model-theoretic account of validity.

Neither the Boolean theory of negation nor the model-theoretic account of validity, on its own, delivers Explosion.27 But together they do. For a consequence of Boole’s account is that exactly one of α and ¬α holds in every model. It follows that there is no model in which α and ¬α hold and in which β fails. The model-theoretic account does the rest. (The argument is clearly a relative of the medieval argument for Explosion based on the modal definition of validity.)

It is interesting to note that William’s argument for Explosion does not seem to figure in discussions during this period. It was left to C. I. Lewis to rediscover it. (It is stated in Lewis and Langford [1932, p. 250].) There is a certain irony in this, since Lewis was one of the major early critics of Russell on the matter of the conditional. Lewis, whilst rejecting a material account of the conditional, was driven by William’s argument to accepting an account according to which contradictions do imply everything (“strict implication”). It is perhaps also worth noting that both Russell and Lewis perpetuate the medieval confusion of validity and the conditional, by calling both ‘implication’.
Pointing out this confusion was to allow Quine to defend the material conditional as an account of conditionality.28 The problems with the material conditional go much deeper than this, though.29

Lewis was not the only critic of “classical logic” in the first half of the century. The most notable critics were the intuitionists. But though the intuitionists rejected central parts of Frege/Russell logic, they accepted enough of it to deliver Explosion.30 First, they accepted both the Disjunctive Syllogism and Addition. They also accepted a model-theoretic account of validity (albeit with models of a somewhat different kind). They did not accept the Boolean account of negation. But according to their account, though α and ¬α may both fail in some situations,

26 Russell [1997, p. 497]. I also have a memory of him calling Boole the ‘father of modern logic’, but I am unable to locate the source. Boole himself was not a modern logician. Though he may have stretched this to its limits, syllogistic was squarely the basis of his work. He might plausibly, therefore, be thought of as the last of the great traditional logicians.
27 We will see later that the model-theoretic account of validity is quite compatible with paraconsistent logic. As for negation, it may follow from the Boolean account that contradictions are true in no situation; but this says nothing about consequence.
28 Quine [1966, p. 163f].
29 See Priest [2001a, ch. 1].
30 Though they were criticized on this ground, for example by Kolmogorov. Dropping Explosion from Intuitionist logic gives Johansson’s “minimal logic”. See Haack [1974, p. 101f].
they cannot, at least, both hold. This is sufficient to give Explosion.

So this is how things stood half way through the 20th century. Classical logic had become entrenched as the orthodox logical theory. Various other logical theories were known, and endorsed by some “deviant” logicians, especially modal and intuitionist logic; but all these accounts preserved enough features of classical logic to deliver Explosion. Explosion, therefore, had no serious challenge. We will take up the story concerning paraconsistency again in a later section. But now let us back-track, and look at the history of dialetheism.

3 DIALETHEISM IN HISTORY
3.1 Contradiction in Ancient Philosophy
Can contradictions be true? At the beginning of Western philosophy it would seem that opinions were divided on this issue. On the face of it, certain of the Presocratics took the answer to be ‘yes’. Uncontroversially, Heraclitus held that everything was in a state of flux. Any state of affairs described by α changes into one described by ¬α. More controversially, the flux state was one in which both α and ¬α hold.31 Hence, we find Heraclitus asserting contradictions such as:32 We step and do not step into the same rivers; we are and we are not. On the other hand, Parmenides held that what is has certain amazing properties. It is one, changeless, partless, etc. A major part of the argument for this is that one cannot say of what is that it is not, or vice versa:33 For never shall this be forcibly maintained, that things that are not are, but you must hold back your thought from this way of inquiry, nor let habit, born of much experience, force you down this way, by making you use an aimless eye or an ear and a tongue full of meaningless sounds: judge by reason the strife-encompassed refutation spoken by me. This certainly sounds like a proto-statement of the Law of Non-Contradiction. And Zeno, according to tradition Parmenides’ student, made a name for himself arguing that those who wished to deny Parmenides’ metaphysics ended up in contradiction — which he, at least, took to be unacceptable. The dialogues of Plato are somewhat ambivalent on the matter of contradiction. For a start, in the Republic we find Socrates enunciating a version of the Law of Non-Contradiction, and then arguing from it:34 31 And Heraclitus held, it would seem, that the flux state is sui generis. That is, α ∧ ¬α entails neither α nor ¬α. 32 Fragment 49a; translation from Robinson [1987]. 33 Fragment 7; translation from Kirk and Raven [1957, p. 248]. 34 436b. Hamilton and Cairns [1961].
It is obvious that the same thing will never do or suffer opposites in the same respect in relation to the same thing and at the same time.

In the later dialogue, the Parmenides, the same Socrates expresses less confidence:

Even if all things come to partake of both [the form of like and the form of unlike], and by having a share of both are both like and unlike one another, what is there surprising in that? ... when things have a share in both or are shown to have both characteristics, I see nothing strange in that, Zeno, nor yet in a proof that all things are one by having a share in unity and at the same time many by sharing in plurality. But if anyone can prove that what is simple unity itself is many or that plurality itself is one, then shall I begin to be surprised.35

Thus, it may be possible for things in the familiar world to have inconsistent properties, though not the forms.36 What to make of the later part of this puzzling dialogue is notoriously hard. But taking the text at face value, Parmenides does succeed in showing that oneness itself does have inconsistent properties of just the kind to surprise Socrates.

Interpreting texts such as these, especially the Presocratics, is fraught with difficulty, and it may well be thought that those I have cited as countenancing violations of the Law of Non-Contradiction did not really do so, but were getting at something else. It should be noted, then, that a commentator no less than Aristotle interpreted a number of the Presocratics as endorsing contradictions.37 In Book 4 of the Metaphysics, he takes them in his sights, and mounts a sustained defence of the Law of Non-Contradiction, which he enunciates as follows (5b 18-22):38

For the same thing to hold good and not hold good simultaneously of the same thing and in the same respect is impossible (given any further specifications which might be added against dialectical difficulties).
The rest of the text is something of an exegetical nightmare.39 The Law is, Aristotle says, so certain and fundamental that one cannot give a proof of it (5b 22-27). He then goes on straight away to give about seven or eight arguments for it (depending on how one counts). He calls these elenchic demonstrations, rather than

35 129b, c. Hamilton and Cairns [1961].
36 It is tempting to read Socrates as saying that things may be inconsistent in relational ways. That is, an object may be like something in some ways and unlike it in others. This would not be a real contradiction. But this cannot be what Socrates means. For exactly the same can be true of the forms. The form of the good might be like the form of unity in that both are forms, but unlike it in that it is the highest form.
37 Heraclitus and Protagoras are singled out for special mention. Protagoras claimed that if someone believes something, it is true (for them). Hence α may be true (for some person), and ¬α may be true (for someone else). This does not sound quite like a contradiction. But of course, if someone believes α ∧ ¬α, then that is true (for them).
38 Kirwan [1993].
39 For a full analysis of the text, see Priest [1998].
proofs. Exactly what this means is not clear; what is clear is that the opponent’s preparedness to utter something meaningful is essential to the enterprise. But then, just to confuse matters, only the first of the arguments depends on this preparedness. So the latter arguments do not seem to be elenchic either.

Leaving this aside, the arguments themselves are a varied bunch. The first argument (6a 28-7b 18) is the longest. It is tangled and contorted, and it is not at all clear how it is supposed to work. (Some commentators claim to find two distinct arguments in it.) However one analyses it, though, it must be reckoned a failure. The most generous estimate of what it establishes is that for any predicate, P, it is impossible that something should be P and not be P (¬◇(Pa ∧ ¬Pa)), which sounds all well and good at first. But one who really countenances violations of the Law of Non-Contradiction may simply agree with this! For they may still hold that for some P and a, Pa ∧ ¬Pa as well. It will follow, presumably, that ◇(Pa ∧ ¬Pa), and hence that ◇(Pa ∧ ¬Pa) ∧ ¬◇(Pa ∧ ¬Pa). This is a contradiction (we might call it a “secondary contradiction”), but contradiction is clearly not a problem in this context.40

When we turn to the other arguments (7b 18-9a 6), matters are even worse. For the majority of these arguments, if they establish anything at all (and they certainly have steps at which one might cavil), establish not the Law of Non-Contradiction, but what we might call the Law of Non-Triviality: it is not possible that all contradictions be true. Dialetheists may of course agree with this. Aristotle, in fact, seems to slide between the two Laws with gay abandon, possibly because he took his main targets to be not just dialetheists, but trivialists.41 A couple of the arguments do not even attempt to establish the Law of Non-Triviality. What they conclude is that it is impossible for anyone to believe that all contradictions are true.
It is, of course, compatible with this that all contradictions are true, nonetheless.

Aristotle’s defence of the Law of Non-Contradiction must therefore be reckoned a failure. Its historical importance has been completely out of proportion to its intellectual weight, however. Since the entrenchment of Aristotelian philosophy in the medieval European universities, the Law of Non-Contradiction has been high orthodoxy in Western philosophy. It is taken so much for granted that there has, improbably enough, been no sustained defence of the Law since Aristotle’s. (Of which other of Aristotle’s philosophical views can one say this?)

It is worth noting, finally, that the Law of Non-Contradiction (and its mate the Law of Excluded Middle, also defended in Book 4 of Metaphysics) are not logical principles for Aristotle, but metaphysical principles, governing the nature of beings qua beings. By the time one gets to Leibniz, however, the Laws have been absorbed into the logical canon.

40 In fact, in many paraconsistent logics, such as LP, ¬(α ∧ ¬α) is a logical truth, and in their modalised versions, so is ¬◇(α ∧ ¬α). Every contradiction therefore generates secondary contradictions.
41 The slide between ‘some’ and ‘all’ is also not uncommon in others who have tried to defend the law.
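The observation that a dialetheist may accept the Law of Non-Contradiction while also holding a contradiction can be checked mechanically for the propositional logic LP mentioned in note 40. The following Python sketch is an illustration only. It assumes the usual three-valued presentation of LP (true only, both true and false, false only, with the first two designated), encodes the values as 1, 0.5 and 0, and uses helper names (`neg`, `conj`, `is_logical_truth`, `is_valid`) that are hypothetical. It does not model the modalised versions.

```python
from itertools import product

# Assumed encoding of the three LP values:
# 1 = true only, 0.5 = both true and false, 0 = false only.
VALUES = (0, 0.5, 1)
DESIGNATED = (0.5, 1)  # the values that count as "at least true"


def neg(a):
    """Negation swaps true-only and false-only, and leaves 'both' fixed."""
    return 1 - a


def conj(a, b):
    """Conjunction takes the lesser of the two values."""
    return min(a, b)


def is_logical_truth(formula, arity):
    """True if the formula is designated under every assignment."""
    return all(formula(*vs) in DESIGNATED
               for vs in product(VALUES, repeat=arity))


def is_valid(premise, conclusion, arity):
    """True if every assignment designating the premise designates the conclusion."""
    return all(conclusion(*vs) in DESIGNATED
               for vs in product(VALUES, repeat=arity)
               if premise(*vs) in DESIGNATED)


def contradiction(a):
    return conj(a, neg(a))          # alpha and not-alpha


def lnc(a):
    return neg(contradiction(a))    # not (alpha and not-alpha)


if __name__ == "__main__":
    print("LNC is a logical truth:", is_logical_truth(lnc, 1))
    print("A contradiction can be designated:", contradiction(0.5) in DESIGNATED)
    print("Explosion is valid:",
          is_valid(lambda a, b: contradiction(a), lambda a, b: b, 2))
```

Run as a script, this reports that ¬(α ∧ ¬α) is designated under every assignment, that α ∧ ¬α is itself designated when α is both true and false, and that the inference from α ∧ ¬α to an arbitrary β is not valid.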
3.2 A Minority Voice: Neoplatonism and its Successors

There is just one tradition that stands out against the orthodox acceptance of the Law of Non-Contradiction. This is the metaphysical tradition that starts with the Neoplatonists, and goes through the great Christian mystics, Eriugena and Eckhart, and their Renaissance successors, such as Cusanus. What holds this tradition together is the belief that there is an ultimate reality, the One, or in its Christian form, the Godhead. This reality is, in some sense, responsible for the existence of everything else, including humankind. Humankind, being alienated from the reality, finds its ultimate fulfillment in union with it.

This tradition draws on, amongst other things, some of the later Platonic dialogues, and especially the Parmenides. As we noted, in the second half of this dialogue Parmenides shows the One to have contradictory properties. It is perhaps not surprising, then, to find writers in this tradition having a tendency to say contradictory things, especially about the ultimate reality. For example, referring explicitly to Parmenides 160b 2-3, Plotinus says:42

The One is all things and no one of them; the source of all things is not all things and yet it is all things...43

Eckhart says, sometimes, that the Godhead is being; and, at other times, that it is beyond being, and thus not being.44 And Cusanus says that:45

in no way do they [distinctions] exist in the absolute maximum. The absolute maximum... is all things, and whilst being all, is none of them; in other words, it is at once the maximum and minimum of being.

Cusanus also attacked contemporary Aristotelians for their attachment to the Law of Non-Contradiction.46

The contradictory claims about the One are no mere aberration on the part of these writers, but are driven by the view of the One as the ground of all things that are. If it were itself anything, it would not be this: it would be just another one of those things.
Consequently, one cannot say truly anything to the effect that the One is such and such, or even that it is (simpliciter); for this would simply make it one of the many. The One is therefore ineffable. As Plotinus puts it (Ennead V.5.6):

The First must be without form, and, if without form, then it is no Being; Being must have some definition and therefore be limited; but the First cannot be thought of as having definition and limit, for thus

42 Ennead V.2.1. Translation from MacKenna [1991].
43 MacKenna inserts the words ‘in a transcendental sense’ here; but they are not in the text. I think that this is a misplaced application of the principle of charity.
44 See Smart [1967, p. 450].
45 Of Learned Ignorance I.3. Translation from Heron [1954].
46 See Maurer [1967].
it would not be the Source, but the particular item indicated by the definition assigned to it. If all things belong to the produced, which of them can be thought of as the supreme? Not included among them, this can be described only as transcending them: but they are Being and the Beings; it therefore transcends Being. But even though the One is ineffable, Plotinus still describes it as ‘the source of all things’, ‘perfect’ (Ennead V.2.1), a ‘Unity’, ‘precedent to all Being’ (Ennead VI.9.3). Clearly, describing the ineffable is going to force one into contradiction.47
3.3 Contradiction in Eastern Philosophy
We have not finished with the Neoplatonist tradition yet, but before we continue with it, let us look at Eastern Philosophy, starting in India. Since very early times, the Law of Non-Contradiction has been orthodox in the West. This is not at all the case in India. The standard view, going back to before the Buddha (a rough contemporary of Aristotle) was that on any claim of substance there are four possibilities: that the view is true (and true only), that it is false (and false only), that it is neither true nor false, and that it is both true and false. This is called the catuskoti (four corners), or tetralemma.48 Hence, the possibility of a contradiction was explicitly acknowledged. The difference between this view and the orthodox Western view is the same as that between the semantics of classical logic and the four-valued semantics for the relevant logic of First Degree Entailment (as we shall see). In classical logic, sentences have exactly one of the truth values T (true) and F (false). In First Degree Entailment they may have any combination of these values, including both and neither.

Just to add complexity to the picture, some Buddhist philosophers argued that, for some issues, all or none of these four possibilities might hold. Thus, the major 2nd century Mahayana Buddhist philosopher Nāgārjuna is sometimes interpreted in one or other of these ways. Arguments of this kind, just to confuse matters, are also sometimes called catuskoti. Interpreting Nāgārjuna is a very difficult task, but it is possible to interpret him, as some commentators did, as claiming that these matters are simply ineffable.49

The Law of Non-Contradiction has certainly had its defenders in the East, though. It was endorsed, for example, by logicians in the Nyāya tradition. This influenced Buddhist philosophers, such as Dharmakīrti, and, via him, some Buddhist schools, such as the Tibetan Gelug-pa.
Even in Tibet, though, many Buddhist schools, such as the Nyingma-pa, rejected the law, at least for ultimate truths.

Turning to Chinese philosophy, and specifically Taoism, one certainly finds utterances that look as though they violate the Law of Non-Contradiction. For example, in the Chuang Tzu (the second most important part of the Taoist canon), we find:50

That which makes things has no boundaries with things, but for things to have boundaries is what we mean by saying ‘the boundaries between things’. The boundaryless boundary is the boundary without a boundary.

A cause of these contradictions is not unlike that in Neoplatonism. In Taoism, there is an ultimate reality, Tao, which is the source and generator of everything else. As the Tao Te Ching puts it:51

The Tao gives birth to the One.
The One gives birth to the two.
The Two give birth to the three,
The Three give birth to every living thing.

47 Nor can one escape the contradiction by saying that the One is not positively characterisable, but may be characterised only negatively (the via negativa). For the above characterisations are positive.
48 See Raju [1953–4].
49 This is particularly true of the Zen tradition. See Kasulis [1981, ch. 2].
It follows, as in the Western tradition, that there is nothing that can be said about it. As the Tao Te Ching puts it (ch. 1): The Tao that can be talked about is not the true Tao. The name that can be named is not the eternal name. Everything in the universe comes out of Nothing. Nothing — the nameless — is the beginning...
Yet in explaining this situation, we are forced to say things about it, as the above quotations demonstrate.

Chan (Zen) is a fusion of Mahayana Buddhism and Taoism. As might therefore be expected, the dialetheic aspects of the two metaphysics reinforce each other. Above all, then, Zen is a metaphysics where we find the writings of its exponents full of apparent contradictions. Thus, for example, the great Zen master Dōgen says:52

This having been confirmed as the Great Teacher’s saying, we should study immobile sitting and transmit it correctly: herein lies a thorough investigation of immobile sitting handed down in the Buddha-way. Although thoughts on the immobile state of sitting are not limited to a single person, Yüeh-shan’s saying is the very best. Namely: ‘thinking is not thinking’.

or:53

50 22.6. Translation from Mair [1994].
51 Ch. 42. Translation from Kwok, Palmer and Ramsey [1993]. What the one, two and three are is a moot point. But in one interpretation, the one is the T’ai-Chi (great harmony); the two are Yin and Yang.
52 Kim [1985, p. 157].
53 Tanahashi [1985, p. 107].
An ancient buddha said, ‘Mountains are mountains, waters are waters.’ These words do not mean that mountains are mountains; they mean that mountains are mountains. Therefore investigate mountains thoroughly... Now interpreting all this, especially the Chinese and Japanese writings, is a hard and contentious matter. The writings are often epigrammatic and poetical. Certainly, the writings contain assertions of contradictions, but are we meant to take them literally? It might be thought not. One suggestion is that the contradictions are uttered for their perlocutionary effect: to shock the hearer into some reaction. Certainly, this sort of thing plays a role in Zen, but not in Mahayana Buddhism or Taoism. And even in Zen, contradictions occur in even the theoretical writings. More plausibly, it may be suggested that the contradictions in question have to be interpreted in some non-literal way. For example, though ultimate reality is literally indescribable, what is said about it gives some metaphorical description of its nature. This won’t really work either, though. For the very reason that ultimate reality is indescribable is precisely because it is that which brings all beings into being; it can therefore be no being (and so to say anything about it is contradictory). At least this much of what is said about the Tao must be taken literally, or the whole picture falls apart.54
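The contrast drawn earlier in this section between classical semantics and the four-valued semantics of First Degree Entailment can be made concrete. The sketch below is only an illustration under stated assumptions: each value is a pair recording whether a sentence is true and whether it is false, so that the four pairs match the four corners of the catuskoti; validity is taken to be preservation of truth; and all function names are hypothetical.

```python
from itertools import product

# A value is a pair (is_true, is_false). The four corners:
# (True, False) true only, (False, True) false only,
# (True, True) both, (False, False) neither.
CORNERS = [(t, f) for t in (True, False) for f in (True, False)]
CLASSICAL = [(True, False), (False, True)]  # exactly one of T and F


def neg(a):
    """Negation swaps truth and falsity."""
    return (a[1], a[0])


def conj(a, b):
    """True if both conjuncts are true; false if either is false."""
    return (a[0] and b[0], a[1] or b[1])


def disj(a, b):
    """True if either disjunct is true; false if both are false."""
    return (a[0] or b[0], a[1] and b[1])


def valid(premises, conclusion, arity, values):
    """Valid iff no assignment makes every premise true and the conclusion untrue."""
    for vs in product(values, repeat=arity):
        if all(p(*vs)[0] for p in premises) and not conclusion(*vs)[0]:
            return False
    return True


# Each inference is a pair: (list of premises, conclusion).
explosion = ([lambda a, b: conj(a, neg(a))], lambda a, b: b)
disjunctive_syllogism = ([lambda a, b: neg(a), lambda a, b: disj(a, b)],
                         lambda a, b: b)
addition = ([lambda a, b: a], lambda a, b: disj(a, b))

if __name__ == "__main__":
    for name, inference in [("Explosion", explosion),
                            ("Disjunctive Syllogism", disjunctive_syllogism),
                            ("Addition", addition)]:
        print(name,
              "| classical:", valid(*inference, 2, CLASSICAL),
              "| four-valued:", valid(*inference, 2, CORNERS))
```

On these assumptions all three inferences are valid when values are restricted to the two classical ones, whereas over all four corners Addition remains valid but Explosion and the Disjunctive Syllogism do not: the counterexample in each case assigns α the value both and β the value false only.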
3.4 Hegel
Let us now return to Western philosophy, and specifically to Hegel. With the philosophers we have met in the last two sections, because their utterances are often so cryptic, it is always possible to suggest that their words should not be taken at face value. By contrast, Hegel’s dialetheism is ungainsayable. He says, for example:55 ...common experience... says that there is a host of contradictory things, contradictory arrangements, whose contradiction exists not merely in external reflection, but in themselves... External sensuous motion is contradiction’s immediate existence. Something moves, not because at one moment it is here and at another there, but because at one and the same moment it is here and not here, because in this “here”, it at once is and is not. Why does he take this view? For a start, Hegel is an inheritor of the Neoplatonic tradition.56 Hegel’s One is Spirit (Geist). This creates Nature. In Nature there are individual consciousnesses 54 It is true that in Chinese philosophy, unlike in Neoplatonism, the arguments that tie the parts of the picture together are not made explicit; but they are there implicitly: readers are left to think things through for themselves. 55 Miller [1969, p. 440]. 56 The genealogy is well tracked in Kolakowski [1978, ch. 1].
(Spirit made conscious), who, by a process of conceptual development come to form a certain concept, the Absolute Idea, which allows them to understand the whole system. In this way Spirit achieves self-understanding, in which form it is the Absolute.57

There is much more to the story than this, of course, and to understand some of it, we need to backtrack to Kant. In the Transcendental Dialectic of the Critique of Pure Reason, Kant argues that Reason itself has a tendency to produce contradiction. Specifically, in the Antinomies of Pure Reason, Kant gives four pairs of arguments which are, he claims, inherent in thought. Each pair gives a pair of contradictory conclusions (that the world is limited in space and time, that it is not; that matter is infinitely divisible, that it is not; etc.) The only resolution of these contradictions, he argues, lies in the distinction between phenomena and noumena, and the insistence that our categories apply only to phenomena. The antinomies arise precisely because, in these arguments, Reason over-stretches itself, and applies the categories to noumena. There is a lot more to things than this, but that will suffice for here.58

Hegel criticised Kant’s distinction between phenomena and noumena. In particular, he rejected the claim that the two behave any differently with respect to the categories. The conclusions of Kant’s Antinomies therefore have to be accepted: the world is inconsistent.59

to offer the idea that the contradictions introduced into the world of Reason by the categories of the Understanding is inevitable and essential was to make one of the most important steps in the progress of Modern Philosophy. But the more important the issue raised the more trivial was the solution. Its only motive was an excessive tenderness for the things in the world.
The blemish of contradiction, it seems, could not be allowed to mar the essence of the world; but there could be no objection to attaching it to the thinking Reason, to the essence of the mind. Probably, nobody will feel disposed to deny that the phenomenal world presents contradictions to the observing mind; meaning by ‘phenomenal’ the world as it presents itself to the senses and understanding, to the subjective mind. But if a comparison is instituted between the essence of the world, and the essence of the mind, it does seem strange to hear how calmly and confidently the modest dogma has been advanced by one and repeated by others, that thought or Reason, and not the World, is the seat of contradiction.

Moreover, the Kantian contradictions are just the tip of an iceberg. All our categories (or at least, all the important ones) give rise to contradiction in the same way. Thus, the contradictions concerning motion with which we started this section arise from one of Zeno’s paradoxes of motion. And it is reflection on these

57 For fuller discussion, see Priest [1989–90].
58 Details and discussion can be found in Priest [1995, chs. 5, 6].
59 Lesser Logic, Section 48. Translation from Wallace [1975].
contradictions which drives the conceptual development that forces the emergence of the concept of the Absolute Idea. Famously, many aspects of Hegel’s thought were taken up by Marx (and Engels). In particular, Marx materialised the dialectic. In the process, much of the dialetheic story was simply taken over. This adds little of novelty that is important here, though, and so we do not need to go into it.60
3.5 Precursors of Modern Dialetheism
So much for the Neoplatonic tradition. Outside this, dialetheists and fellow travellers are very hard to find in Western philosophy. Around the turn of the 20th century, intimations of the failure of the Law of Non-Contradiction did start to arise in other areas, however. Let us look at these, starting with Meinong.

Meinong’s theory of objects had two major postulates. The first is that every term of language refers to an object, though many of these objects may not exist. The second is that all objects may have properties, whether or not they exist. In particular, with reservations that we will come back to in a moment, all objects which are characterised in certain ways have those properties which their characterisations attribute to them (the Characterisation Principle). Thus, for example, the fabled Golden Mountain is both golden and a mountain; and, notoriously, the round square is both round and square. As the last example shows, some objects would appear to violate the Law of Non-Contradiction by being, for example, both round and square. Meinong was criticised on just these grounds by Russell [1905]. He replied that one should expect the Law to hold only for those things that exist, or at least for those things that are possible. Impossible objects have (what else?) impossible properties. That is how one knows that they cannot exist. As he puts it:61

“B. Russell lays the real emphasis on the fact that by recognising such objects the principle of contradiction would lose its unlimited validity. Naturally I can in no way avoid this consequence... Indeed the principle of contradiction is directed by no one at anything other than the real and the possible.”

Things are not quite as straightforward as may appear, however.62 It is not entirely clear that Meinong does countenance violations of the Law of Non-Contradiction in the most full-blooded sense of the term. The round square is round and square, but is it round and not round? One would naturally think so, since being square entails not being round; but Meinong may well have thought that this entailment held only for existent, or at least possible, objects. Hence he may not have held there to be things with literally contradictory properties.

60 Details can be found in Priest [1989–90].
61 Meinong [1907, p. 16].
62 The following is discussed further in Routley [1980].
But what about the thing such that it is both round and it is not the case that it is round? This would seem to be such that it is round and it is not the case that it is round. Not necessarily. For Meinong did not hold that every object has the properties it is characterised as having. One cannot characterise an object into existence, for example. (Think of the existent round square.)63 The Characterisation Principle holds only for certain properties, those that are assumptible, or characterising. It is clear that Meinong thought that existence and like properties are not characterising, but Meinong never came clean and gave a general characterisation of characterising properties themselves. So we just do not know whether negation could occur in a characterising property. Hence, though there are certainly versions of Meinong’s theory in which some objects have contradictory properties, it is not clear whether these are Meinong’s.

The next significant figure in the story is the Polish logician Łukasiewicz. In 1910, Łukasiewicz published a book-length critique of Aristotle on the Law of Non-Contradiction. This has still to be translated into English, but in the same year he published an abbreviated version of it, which has.64 Łukasiewicz gives a damning critique of Aristotle’s arguments, making it clear that they have no substance. Following Meinong’s lead, he also states that the Law of Non-Contradiction is not valid for impossible objects.65 However, he does claim that the Law is a valid “practical-ethical” principle. For example, without it one would not be able to establish that one was absent from the scene of a crime by demonstrating that one was somewhere else, and so not there.66 Given the logical acumen of the rest of the article, Łukasiewicz’s position here is disappointing. One does not need a universally valid law to do what Łukasiewicz requires.
It is sufficient that the situation in question is such as to enable one to rule out inconsistency in that particular case. (Compare: even a logical intuitionist can appeal to the Law of Excluded Middle in finite situations.) For the same reason, an inductive generalisation from this sort of situation to the universal validity of the Law (or even to a law covering existent objects) is quite groundless.

Another philosopher who was prepared to brook certain violations of the Law of Non-Contradiction, at around the same time, was the Russian Vasiliev.67 Like Łukasiewicz, Vasiliev held the Law to be valid for the actual world, but he held that it might fail in certain “imaginary worlds”. These are worlds where logic is different; there can be such things, just as there can be worlds where geometry is non-Euclidean. (Recall that he was writing before the General Theory of Relativity.) He did not think that all of logic could change from world to world, however. Essentially, positive logic, logic that does not concern negation (what he called ‘metalogic’), is invariant across all worlds. Only negation could behave differently in different worlds.

63 In fact, using an unbridled form of this principle, one can establish triviality. Merely consider the thing such that it is self-identical and α, for arbitrary α.
64 The book is Łukasiewicz [1910]; the English translation of the abbreviated version is Łukasiewicz [1970].
65 [1970, section 19].
66 [1970, section 20].
67 Only one of his papers has been translated into English, Vasiliev [1912–13]. For further discussion of Vasiliev, see Priest [2000b].

Vasiliev also constructed a formal logic which was supposed to be the logic of these imaginary worlds, imaginary logic. This was not a modern logic, but a version of traditional logic. In particular, Vasiliev added to the two traditional syntactic forms ‘S is P (and not also not P)’, and ‘S is not P (and not also P)’, a third form, ‘S is P and not P’. He then constructed a theory of syllogisms based on these three forms. (For example, the following is a valid syllogism: all S is M; all M is P and not P; hence, all S is P and not P.) Though Vasiliev’s logic is paraconsistent, it is not a modern paraconsistent logic: it is paraconsistent for exactly the same reason that standard syllogistic is. Nor, in a sense, is Vasiliev a dialetheist, since he held that no contradictions are true. His work clearly marks a departure from the traditional attitude towards the Law of Non-Contradiction, though.

The final figure to be mentioned in this section is Wittgenstein. Though Wittgenstein’s views evolved throughout his life, they were mostly inhospitable to dialetheism. For most of his life, he held that contradictions, and especially the contradictions involved in the logical paradoxes, were senseless (in the Tractatus), or failed to make statements (in the transitional writings). However, towards the end of his life, and specifically in the Remarks on the Foundations of Mathematics, he came to reject this view:68

“There is one mistake to avoid: one thinks that a contradiction must be senseless: that is to say, if e.g. we use the signs ‘p’, ‘∼’, ‘.’ consistently, then ‘p. ∼ p’ cannot say anything.”

The crucial view here seems to have been that concerning language games. People play a variety of these, and if people play games in which contradictions are accepted, then contradictions are indeed valid in those games (shades of Protagoras here).
The logical paradoxes might just be such. As he says:69

“But you can’t allow a contradiction to stand: Why not? We do sometimes use this form of talk, of course, not often – but one could imagine a technique of language in which it was a regular instrument. It might, for example be said of an object in motion that it existed and did not exist in this place; change might be expressed by means of contradiction.”

Unsurprisingly, Wittgenstein also had a sympathy towards paraconsistency. In 1930, he even predicted the modern development of the subject in the most striking fashion:70

“I am prepared to predict that there will be mathematical investigations of calculi containing contradictions and that people will pride themselves in having emancipated themselves from consistency too.”

But his own efforts in this direction were not very inspired, and never came to much more than the directive ‘infer nothing from a contradiction’.71 Hence, Wittgenstein exerted no influence on future developments. Indeed, of all the people mentioned in this section it was only Łukasiewicz who was to exert an (indirect) influence on the development of paraconsistency, to which we now return.

68 Wittgenstein [1978, pp. 377f].
69 Wittgenstein [1978, p. 370].
70 Wittgenstein [1979, p. 139].

4 MODERN PARACONSISTENCY
4.1 Background

The revolution that produced modern logic around the start of the 20th century depended upon the application of novel mathematical techniques in proof-theory, model theory, and so on. For a while, these techniques were synonymous with classical logic. But logicians came to realise that the techniques are not specific to classical logic, but could be applied to produce quite different sorts of logical systems. By the middle of the century, the basics of many-valued logic, modal logic, and intuitionist logic had been developed. Many other sorts of logic have been developed since then; one of these is paraconsistent logic.

The commencement of the modern development of paraconsistent logics occurred just after the end of the Second World War. At that juncture, it was an idea whose time had come, in the sense that it seems to have occurred to many different people, in very different places, and quite independently of each other. The result was a whole host of quite different paraconsistent logics. In this section we will look at these.72

I will be concerned here only with propositional logics. Though the addition of quantifiers certainly raises novel technical problems sometimes, it is normally conceptually routine. I shall assume familiarity with the basics of classical, modal, many-valued and intuitionist logic. I will use ⊨X as the consequence relation of the logic X. C is classical logic; I is intuitionist logic.
4.2 Jaśkowski and Subsequent Developments

The first influential developments in the area are constituted by the work of the Polish logician Jaśkowski, who had been a student of Łukasiewicz. Jaśkowski published a system of paraconsistent logic in 1948,73 which he called discussive (or discursive) logic. Jaśkowski cites a number of reasons why there might be situations in which one has to deal with inconsistent information, but the main idea that drives his construction is indicated in the name he gives his logic. He envisages a number of people engaged in a discussion or discourse (such as, for example, the witnesses at a trial). Each participant vouchsafes certain information, which is consistent(!); but the information of one participant may contradict that of another.

71 See Goldstein [1989] for discussion. According to Goldstein, the view that a contradiction entails nothing is present even in Wittgenstein’s earlier writings, including the Tractatus.
72 For technical information concerning the systems, see Priest [2002], where details explained in this section are discussed further. Proofs not given or referenced here can be found there.
73 Translated into English as Jaśkowski [1969].

Technically, the idea is implemented as follows.74 An interpretation, I, is a Kripke-interpretation for S5. It helps (but is not necessary) to think of I as coming with a distinguished base-world, g. What is true at any one world is thought of as the information provided by a participant of the discourse, and what holds in the discourse is what is true at any one of its worlds. This motivates the following definitions (where ◇ is the usual possibility operator of modal logic):

α holds in I iff ◇α is true at g
Σ ⊨d α iff for all I, if β holds in I, for all β ∈ Σ, then α holds in I

It is a simple matter to show that ⊨d is paraconsistent. A two-world model where p is true at w1 (= g) and false at w2, but q is true at neither w1 nor w2, will demonstrate that p, ¬p ⊭d q. It should be noted, though, that α ∧ ¬α ⊨d β, since, whatever α is, α ∧ ¬α holds in no I. It follows that the rule of Adjunction, α, β ⊨d α ∧ β, fails. This approach may therefore be classified under the rubric of non-adjunctive paraconsistent logic. As is clear, different discussive logics can be obtained by choosing underlying modal logics different from S5.75

A notable feature of discussive logic is the failure of modus ponens for the material conditional, ⊃: p, p ⊃ q ⊭d q. (The two-world interpretation above demonstrates this.) In fact, it can be shown that for sentences containing only extensional connectives there is no such thing as multi-premise validity, in the sense that if Σ ⊨d α, then for some β ∈ Σ, β ⊨d α. Moreover, single-premise inference is classical. That is, α ⊨d β iff α ⊨C β.
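The claims about ⊨d just made can be checked by brute force. The following is a minimal sketch, not Jaśkowski's own presentation: it assumes that an S5 interpretation can be represented as a non-empty set of classical valuations, all of which see one another, so that "holds in I" amounts to "true at some world". The tuple encoding of formulas and the names `d_entails` and `d_imp` are choices made for this illustration only.

```python
from itertools import combinations, product

# Formulas are nested tuples (an encoding chosen for this sketch):
# ("var", "p"), ("not", A), ("and", A, B), ("or", A, B),
# ("imp", A, B) for the material conditional, ("dia", A) for possibility.

def atoms(f):
    """Collect the propositional parameters occurring in f."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def true_at(f, w, model):
    """Truth of f at world w; every world of the model sees every world (S5)."""
    op = f[0]
    if op == "var":
        return w[f[1]]
    if op == "not":
        return not true_at(f[1], w, model)
    if op == "and":
        return true_at(f[1], w, model) and true_at(f[2], w, model)
    if op == "or":
        return true_at(f[1], w, model) or true_at(f[2], w, model)
    if op == "imp":
        return (not true_at(f[1], w, model)) or true_at(f[2], w, model)
    if op == "dia":
        return any(true_at(f[1], v, model) for v in model)
    raise ValueError(f"unknown connective: {op}")

def holds(f, model):
    """f holds in the discourse iff it is true at some world."""
    return any(true_at(f, w, model) for w in model)

def models(names):
    """All non-empty sets of classical valuations over the given parameters."""
    names = sorted(names)
    worlds = [dict(zip(names, bits))
              for bits in product([True, False], repeat=len(names))]
    for k in range(1, len(worlds) + 1):
        yield from combinations(worlds, k)

def d_entails(premises, conclusion):
    """Discussive consequence: holding is preserved in every model."""
    names = atoms(conclusion).union(*(atoms(p) for p in premises))
    return all(holds(conclusion, m)
               for m in models(names)
               if all(holds(p, m) for p in premises))

def d_imp(a, b):
    """The discursive conditional: possibly-a materially implies b."""
    return ("imp", ("dia", a), b)

p, q = ("var", "p"), ("var", "q")
print(d_entails([p, ("not", p)], q))         # False: Explosion fails
print(d_entails([("and", p, ("not", p))], q))  # True: conjoined contradiction explodes
print(d_entails([p, q], ("and", p, q)))      # False: Adjunction fails
print(d_entails([p, ("imp", p, q)], q))      # False: modus ponens for the material conditional fails
print(d_entails([p, d_imp(p, q)], q))        # True: modus ponens for the discursive conditional
```

Collapsing worlds that agree on all parameters is harmless for the formulas considered here, which is why a set of valuations is enough; for underlying modal logics weaker than S5 an explicit accessibility relation would be needed.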
In virtue of these failures of multi-premise inference, Jaśkowski defined a new sort of conditional, the discursive conditional, ⊃d, defined as follows: α ⊃d β is ◇α ⊃ β. It is easy to check that α, α ⊃d β ⊨d β (that is, that ◇α, ◇(◇α ⊃ β) ⊨ ◇β), provided that the accessibility relation in the underlying modal logic is at least Euclidean (that is, if wRx and wRy then xRy). This holds in S5, but may fail in weaker logics, such as S4.

74 What follows is somewhat anachronistic, since it appeals to possible-world semantics, which were developed only some 10-15 years later, but it is quite faithful to the spirit of Jaśkowski’s paper.
75 A somewhat different approach is given in Rescher and Brandom [1980]. They define validity in terms of truth preservation at all worlds, but they allow for inconsistent and incomplete worlds. What holds in an inconsistent world is what holds in any one of some bunch of ordinary worlds; and what holds in an incomplete world is what holds in all of some bunch of ordinary worlds. It can be shown that this results in the same consequence relation as discussive logic.

The weakness produced by the failure of Adjunction, and multi-premise inferences in general, may be addressed with a quite different approach to constructing a non-adjunctive paraconsistent logic. The idea is to allow a certain amount of conjoining before applying classical consequence. Since arbitrary conjoining cannot be allowed on pain of triviality, the question is how to regulate the conjoining. One solution to this, due first, as far as I know, to Rescher and Manor [1970–71], is as follows. Given any set of sentences, Σ, a maximally consistent (mc) subset of Σ is a set Π ⊆ Σ, such that Π is classically consistent, but if α ∈ Σ − Π, Π ∪ {α} is classically inconsistent. Then define:

Σ ⊨rm α iff there is some mc subset Π of Σ such that Π ⊨C α.

⊨rm is non-adjunctive, since p, ¬p ⊭rm p ∧ ¬p. ({p, ¬p} has two mc subsets, {p} and {¬p}.) It does allow multi-premise inference, however. For example, p, p ⊃ q ⊨rm q. ({p, p ⊃ q} has only one mc subset, namely itself.) ⊨rm has an unusual property for a notion of deductive consequence, however: it is not closed under uniform substitution. For, as is easy to check, p, q ⊨rm p ∧ q, but p, ¬p ⊭rm p ∧ ¬p.76

A different way of proceeding, due to Schotch and Jennings [1980], is as follows. Define a covering of a set, Σ, to be a finite collection of disjoint sets, Σ1, ..., Σn, such that for all 1 ≤ i ≤ n, Σi ⊆ Σ and is classically consistent, and for all α ∈ Σ, at least one of the sets classically entails α. Define the level of incoherence of Σ, l(Σ), to be the smallest n such that Σ has a covering of size n; if it has no such covering, then set l(Σ) (conventionally) as ∞. If Σ is classically consistent, then l(Σ) = 1. A set such as {p, ¬p, q} has level 2, since it has two coverings of size 2: {p, q} and {¬p}; {¬p, q} and {p}. And if Σ contains a member that is itself classically inconsistent, then l(Σ) = ∞. Now define:

Σ ⊨sj α iff l(Σ) = ∞, or l(Σ) = n and for every covering of Σ of size n, there is some member of it that classically entails α.

The intuition to which this answers is this. We may suppose that Σ comes to us muddled up from different sources.
The level of a set tells us the simplest way we can unscramble the data into consistent chunks; and however we unscramble the data in this way, we know that some source vouchsafes the conclusion.

Like ⊨rm, ⊨sj is non-adjunctive, since p, ¬p ⊭sj p ∧ ¬p. ({p, ¬p} has level 2, with one covering: {p}, {¬p}.) But it does allow multi-premise inference. For example, p, p ⊃ q ⊨sj q. ({p, p ⊃ q} is of level 1.) And ⊨sj is not closed under uniform substitution. For p, q ⊨sj p ∧ q, but p, ¬p ⊭sj p ∧ ¬p.

But ⊨rm and ⊨sj are not the same. For a start, since p ∧ ¬p is classically inconsistent, {p ∧ ¬p} has level ∞ and so {p ∧ ¬p} ⊨sj q. But {p ∧ ¬p} has one mc subset, namely the empty set, ∅; and ∅ ⊭C q; hence, p ∧ ¬p ⊭rm q. Moreover, let Σ = {p, ¬p, q, r}. Then Σ has two mc subsets, {p, q, r} and {¬p, q, r}. Hence Σ ⊨rm q ∧ r. But Σ has level 2, and one covering has the members: {p, q}, {¬p, r}. Hence, Σ ⊭sj q ∧ r. Finally, ⊨rm is monotonic: if Σ has an mc subset that classically delivers α, so does Σ ∪ Π. But ⊨sj is not: p, q ⊨sj p ∧ q, whilst p, ¬p, q ⊭sj p ∧ q, since {p, ¬p, q} has level 2, and one covering is: {¬p, q}, {p}.

76 We could define another consequence relation as the closure of ⊨rm under uniform substitution. This would still be paraconsistent.

We can look at the Schotch/Jennings account in a somewhat different, but illuminating, fashion. A standard definition of classical consequence is the familiar:

Σ ⊨C α iff for every evaluation, ν, if every member of Σ is true in ν, so is α.

Equivalently, we can put it as follows. If Σ is consistent:

Σ ⊨C α iff for every Π ⊇ Σ, if Π is consistent, so is Π ∪ {α}

(If Σ is not consistent, then the biconditional holds vacuously.) In other words, a valid inference preserves consistency of supersets. Or, to put it another way, it preserves coherence of level 1. Now, if Π is inconsistent, there is not much consistency to be preserved, but we may still consider it worth preserving higher levels of coherence. This is exactly what ⊨sj does. For, as is noted in Brown and Schotch [1999], if for some n, l(Σ) = n:

Σ ⊨sj α iff for every Π ⊇ Σ, if l(Π) = n then l(Π ∪ {α}) = n

(If l(Σ) = ∞, the biconditional holds vacuously.)77 Thus, the Schotch/Jennings construction gives rise to a family of paraconsistent logics in which validity may be defined in terms of the preservation of something other than truth. Such preservational logics are the subject of another article in this Handbook, and so I will say no more about them here.
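The examples used in this section to separate ⊨rm from ⊨sj can be verified mechanically for small finite premise sets. The sketch below is illustrative only. It assumes that coverings may be enumerated as partitions of Σ into consistent blocks, which loses nothing, since any member left out of a covering can be added to a block that already entails it. The formula encoding and the function names are inventions of this example.

```python
from itertools import combinations, product

# Formulas as nested tuples: ("var", "p"), ("not", A), ("and", A, B), ("imp", A, B).

def atoms(f):
    if f[0] == "var":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def value(f, v):
    """Classical truth value of f under the valuation v (a dict)."""
    op = f[0]
    if op == "var":
        return v[f[1]]
    if op == "not":
        return not value(f[1], v)
    if op == "and":
        return value(f[1], v) and value(f[2], v)
    if op == "imp":
        return (not value(f[1], v)) or value(f[2], v)
    raise ValueError(f"unknown connective: {op}")

def valuations(formulas):
    names = sorted(set().union(*(atoms(f) for f in formulas)))
    for bits in product([True, False], repeat=len(names)):
        yield dict(zip(names, bits))

def consistent(fs):
    return any(all(value(f, v) for f in fs) for v in valuations(fs))

def entails(fs, a):
    """Classical consequence, by truth tables."""
    return all(value(a, v) for v in valuations(list(fs) + [a])
               if all(value(f, v) for f in fs))

def mc_subsets(sigma):
    """Consistent subsets of sigma with no consistent proper superset in sigma."""
    cons = [set(c) for k in range(len(sigma) + 1)
            for c in combinations(sigma, k) if consistent(c)]
    return [c for c in cons if not any(c < d for d in cons)]

def rm_entails(sigma, a):
    return any(entails(pi, a) for pi in mc_subsets(sigma))

def partitions(items, n):
    """Yield every split of the list items into exactly n non-empty blocks."""
    if not items:
        if n == 0:
            yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest, n):           # first joins an existing block
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
    if n > 0:                                  # or first forms its own block
        for part in partitions(rest, n - 1):
            yield part + [[first]]

def coverings(sigma, n):
    return [part for part in partitions(list(sigma), n)
            if all(consistent(block) for block in part)]

def level(sigma):
    """Level of incoherence; the empty set counts as consistent (level 1)."""
    if not sigma:
        return 1
    for n in range(1, len(sigma) + 1):
        if coverings(sigma, n):
            return n
    return float("inf")

def sj_entails(sigma, a):
    if not sigma:
        return entails([], a)
    n = level(sigma)
    if n == float("inf"):
        return True
    return all(any(entails(block, a) for block in part)
               for part in coverings(sigma, n))

p, q, r = ("var", "p"), ("var", "q"), ("var", "r")
np = ("not", p)
print(level([p, np, q]))                                      # 2
print(rm_entails([("and", p, np)], q), sj_entails([("and", p, np)], q))  # False True
print(rm_entails([p, np, q, r], ("and", q, r)), sj_entails([p, np, q, r], ("and", q, r)))  # True False
print(sj_entails([p, q], ("and", p, q)), sj_entails([p, np, q], ("and", p, q)))  # True False
```

The enumeration is exponential in the size of Σ, so this is suitable only for checking textbook-sized examples such as those above.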
4.3 Dualising Intuitionism
The next sort of system of paraconsistent logic was the result of the work of the Brazilian logician da Costa starting with a thesis in 1963.78 Da Costa, and his students and co-workers, produced many systems of paraconsistent logic, including more discussive logics. But the original and best known da Costa systems arose as follows. In intuitionist logic, and because of the intuitionist account of negation, it is possible for neither α nor ¬α to hold. Thus, in a logic with a dual account of negation, it ought to be possible for both α and ¬α to hold. The question, then, is how to dualise.

Da Costa dualised as follows. We start with an axiomatisation of positive intuitionist logic (that is, intuitionist logic without negation). The following79 will do. The only rule of inference is modus ponens.

α ⊃ (β ⊃ α)
(α ⊃ β) ⊃ ((α ⊃ (β ⊃ γ)) ⊃ (α ⊃ γ))
(α ∧ β) ⊃ α
(α ∧ β) ⊃ β
α ⊃ (β ⊃ (α ∧ β))
α ⊃ (α ∨ β)
β ⊃ (α ∨ β)
(α ⊃ γ) ⊃ ((β ⊃ γ) ⊃ ((α ∨ β) ⊃ γ))

One obtains an axiomatisation for full intuitionist logic if one adds:

(α ⊃ β) ⊃ ((α ⊃ ¬β) ⊃ ¬α)
α ⊃ (¬α ⊃ β)

It is clear that one certainly does not want the second of these in a paraconsistent logic; the first, being a version of reductio ad absurdum, is also suspect.80 The two most notable consequences of these principles for negation are:

α ⊃ ¬¬α
¬(α ∧ ¬α)

(though not, of course, the converse of the first). Both of these, in their own ways, can be thought of as saying that if something is true, it is not false, whilst leaving open the possibility that something might be neither. To obtain a paraconsistent logic, it is therefore natural to take as axioms the claims which are, in some sense, the duals of these:

1¬   ¬¬α ⊃ α
2¬   α ∨ ¬α

Both of these, in their ways, can be thought of as saying that if something is not false, it is true, whilst leaving open the possibility that something may be both. Adding these two axioms to those of positive intuitionist logic gives da Costa’s system Cω.

Next, da Costa reasoned, there ought to be a way of expressing the fact that α behaves consistently (that is, is not both true and false). The natural way of doing this is by the sentence ¬(α ∧ ¬α). Write this as αo, and consider the principles:

1o   βo ⊃ ((α ⊃ β) ⊃ ((α ⊃ ¬β) ⊃ ¬α))
2o   (αo ∧ βo) ⊃ ((α ∧ β)o ∧ (α ∨ β)o ∧ (α ⊃ β)o ∧ (¬α)o)

The first says that the version of the reductio principle we have just met works provided that the contradiction deduced behaves consistently. The second (in which the last conjunct of the consequent is, in fact, redundant) expresses the plausible thought that if any sentences behave consistently, so do their compounds. Adding these two axioms to Cω gives the da Costa system C1.

77 From left to right, suppose that l(Σ) = n, Σ ⊨sj α, Π ⊇ Σ, and l(Π) = n. Let Π1, ..., Πn be a covering of Π. Let Σi = Σ ∩ Πi. Then Σ1, ..., Σn is a covering of Σ. Thus, for some i, Σi ⊨C α. Hence, Πi ⊨C α, and Π1, ..., Πn is a covering of Π ∪ {α}. Conversely, suppose that l(Σ) = n, and for every Π ⊇ Σ, if l(Π) = n then l(Π ∪ {α}) = n; but Σ ⊭sj α. Then there is some partition of Σ, Σ1, ..., Σn, such that for no i, Σi ⊨C α. Hence, for each i, Σi ∪ {¬α} is consistent. Thus, if Π = Σ ∪ {¬α ∧ σ; σ ∈ Σ}, l(Π) = n. Hence, l(Π ∪ {α}) = n. But this is impossible, since α cannot be consistently added to any member of a covering of Π of size n.
78 The most accessible place to read the results of da Costa’s early work is his [1974].
79 Taken from Kleene [1952].
80 Though, note, even a paraconsistent logician can accept the principle that if something entails a contradiction, this fact establishes its negation: versions of this inference are valid in many relevant logics.
The addition of this machinery in C1 allows us to define the strong negation of α, ¬∗ α, as: ¬α ∧ αo . ¬∗ α says that α is consistently false. It is possible to show that ¬∗ has all the properties of classical negation.81 But as is well known, the addition of classical negation to intuitionist logic turns the positive part into classical logic. (Using the properties of classical negation, it is possible, reasoning by cases in a standard fashion, to establish Peirce’s Law: ((α ⊃ β) ⊃ α) ⊃ α, which is the difference between positive intuitionist and classical logics.) Hence, the positive logic of C1 is classical logic. It might be thought that one needs more than αo to guarantee that α behaves consistently. After all, in contexts where contradictions may be acceptable, why might we not have αo ∧ α ∧ ¬α? In virtue of this, it might be thought that what is required in condition 1o is not αo , but αo ∧ αoo . Of course, there is no a priori guarantee that this behaves consistently either. So it might be thought that what is required is αo ∧ αoo ∧ αooo ; and so on. Let us write αn as αo ∧ ... ∧ αo...o (where the last conjunct has n ‘o’s). Then replacing ‘o’ by ‘n’ in 1o and 2o gives the da Costa system Cn (1 ≤ n < ω). Just as in C1 , in each Cn , a strong negation ¬∗ α can be defined as ¬α ∧ αn , and the collapse of the positive part into classical logic occurs as before. Semantics for the C systems were discovered by da Costa and Alves [1977]. Take the standard truth-functional semantics for positive classical logic. Thus, if ν is an evaluation, ν(α ∨ β) = 1 iff ν(α) = 1 or ν(β) = 1; ν(α ⊃ β) = 1 iff ν(α) = 0 or ν(β) = 1, etc. Now allow ν to behave non-deterministically on negation. That is, for any α, ν(¬α) may take any value. Validity is defined in the usual way, in terms of truth preservation over all evaluations. It is clear that the resulting system is paraconsistent, since one can take an evaluation that assigns both p and ¬p the value 1, and q the value 0. 
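The basic semantics just described, classical on the positive connectives and unconstrained on negation, is easy to explore by brute force. The sketch below is only an illustration under stated assumptions: since the value of each negated formula is independent of everything else, it is enough to range over the values of the parameters and of the negated subformulas that actually occur in the inference. The encoding and the name `valid` are hypothetical.

```python
from itertools import product

# Formulas as nested tuples: ("var", "p"), ("not", A), ("and", A, B),
# ("or", A, B), ("imp", A, B).

def subformulas(f):
    yield f
    if f[0] != "var":
        for g in f[1:]:
            yield from subformulas(g)

def value(f, free):
    """Parameters and negations read their value from `free`;
    the positive connectives are evaluated classically."""
    op = f[0]
    if op in ("var", "not"):
        return free[f]
    a, b = value(f[1], free), value(f[2], free)
    if op == "and":
        return min(a, b)
    if op == "or":
        return max(a, b)
    if op == "imp":
        return max(1 - a, b)
    raise ValueError(f"unknown connective: {op}")

def valid(premises, conclusion):
    """Truth preservation over all evaluations with unconstrained negation."""
    formulas = list(premises) + [conclusion]
    # Every parameter and every negated subformula may take either value.
    free_points = list(dict.fromkeys(
        g for f in formulas for g in subformulas(f) if g[0] in ("var", "not")))
    for bits in product([0, 1], repeat=len(free_points)):
        free = dict(zip(free_points, bits))
        if all(value(p, free) == 1 for p in premises) and value(conclusion, free) == 0:
            return False
    return True

p, q = ("var", "p"), ("var", "q")
print(valid([p, ("not", p)], q))                      # False: Explosion fails
print(valid([p, ("imp", p, q)], q))                   # True: the positive part is classical
print(valid([], ("or", p, ("not", p))))               # False: no constraint on negation yet
print(valid([("not", p)], ("not", ("and", p, p))))    # False: negation is not truth-functional
```

The last line anticipates a feature of such systems: p and p ∧ p always agree in value, yet nothing forces their negations to agree.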
The system just described is, in fact, none of the da Costa systems. In a certain sense, it is the most basic of a whole family of logics which extend positive classical logic with a non-truth-functional negation. The Cn systems can be obtained by adding further constraints on evaluations concerning negation. Thus, if we add the conditions:

(i) If ν(¬¬α) = 1 then ν(α) = 1
(ii) If ν(α) = 0 then ν(¬α) = 1

we validate 1¬ and 2¬. Adding the conditions:

If ν(βn) = ν(α ⊃ β) = ν(α ⊃ ¬β) = 1 then ν(α) = 0
If ν(αn) = ν(βn) = 1 then ν((α ∧ β)n) = ν((α ∨ β)n) = ν((α ⊃ β)n) = ν((¬α)n) = 1

then gives the system Cn (1 ≤ n < ω).

81 Specifically, ¬∗ satisfies the conditions (α ⊃ β) ⊃ ((α ⊃ ¬∗β) ⊃ ¬∗α) and ¬∗¬∗α ⊃ α, which give all the properties of classical negation. See da Costa and Guillaume [1965].

The semantics for Cω are not quite so simple, since positive intuitionist logic is not truth-functional. However, non-deterministic semantics can be given as follows.82 A semi-evaluation is any evaluation that satisfies the standard conditions for conjunction and disjunction, plus (i), (ii), and:

If ν(α ⊃ β) = 1 then ν(α) = 0 or ν(β) = 1
If ν(α ⊃ β) = 0 then ν(β) = 0

A valuation is now any semi-evaluation, ν, satisfying the further condition: if α is anything of the form α1 ⊃ (α2 ⊃ (...αn)...), where αn is not itself a conditional, then if ν(α) = 0, there is a semi-evaluation, ν′, such that for all 1 ≤ i < n, ν′(αi) = 1 and ν′(αn) = 0. Validity is defined in terms of truth preservation over all evaluations in the usual way.

As we have seen, all the C systems can be thought of as extending a positive logic (either intuitionistic or classical) with a non-truth-functional negation. They are therefore often classed under the rubric of positive plus logics. A singular fact about all the positive plus logics is that the substitution of provable equivalents breaks down. For example, α and α ∧ α are logically equivalent, but because negation is not truth functional, there is nothing in the semantics to guarantee that ¬α and ¬(α ∧ α) take the same value in an evaluation. Hence, these are not logically equivalent.

Da Costa’s systems are the result of one way of producing something which may naturally be thought of as the dual of intuitionist logic. There are also other ways. Another is to dualise the Kripke semantics for intuitionist logic. A Kripke semantics for intuitionist logic is a structure ⟨W, R, ν⟩, where W is a set (of worlds), R is a binary relation on W that is reflexive and transitive, and ν maps each propositional parameter to a truth value at every world, subject to the heredity condition: if xRy and νx(p) = 1, then νy(p) = 1.
The truth conditions for the operators are:

νw(α ∧ β) = 1 iff νw(α) = 1 and νw(β) = 1
νw(α ∨ β) = 1 iff νw(α) = 1 or νw(β) = 1
νw(α ⊃ β) = 1 iff for all w′ such that wRw′, if νw′(α) = 1 then νw′(β) = 1
νw(¬α) = 1 iff for all w′ such that wRw′, νw′(α) = 0

(Alternatively, ¬α may be defined as α ⊃ ⊥ where ⊥ is a logical constant that takes the value 0 at all worlds.) It is not difficult to show that the heredity condition follows for all formulas, not just parameters. An inference is valid if it is truth-preserving at all worlds of all interpretations.

82 Following Loparić [1986].

Dualising: everything is exactly the same, except that we dualise the truth conditions for negation, thus:

νw(¬α) = 1 iff there is some w′ such that w′Rw and νw′(α) = 0

It is easy to check that the general heredity condition still holds with these truth conditions. Since nothing has changed for the positive connectives, the positive part of this logic is intuitionist, but whereas in intuitionist logic we have α ∧ ¬α ⊨I β and α ⊨I ¬¬α, but not β ⊨I α ∨ ¬α or ¬¬α ⊨I α, it is now the other way around. Details are left as an exercise. Here, though, is a counter-model for Explosion. Let W = {w0, w1}; w0Rw1; at w0, p and q are both false; at w1, p is true and q is false. It follows that ¬p is true at w1, and hence w1 gives the counter-model.83 Despite similarities, the logic obtained in this way is distinct from any of the C systems. It is easy to check, for example, that ¬α is logically equivalent to ¬(α ∧ α), and more generally, that provable equivalents are inter-substitutable.

Yet a third way to dualise intuitionist logic is to dualise its algebraic semantics.84 A Heyting algebra is a distributive lattice with a bottom element, ⊥, and an operator, ⊃, satisfying the condition:85

a ∧ b ≤ c iff a ≤ b ⊃ c

which makes ⊥ ⊃ ⊥ the top element. We may define ¬a as a ⊃ ⊥. A standard example of a Heyting algebra is provided by any topological space, T. The members of the algebra are the open subsets of T; ∧ and ∨ are intersection and union; ⊥ is the empty set, and a ⊃ b is (ā ∨ b)ᵒ, where overlining indicates complementation, and ᵒ here is the interior operator of the topology. It is easy to check that ¬a = āᵒ. It is well known that for finite sets of premises, intuitionist logic is sound and complete with respect to the class of all Heyting algebras, indeed with respect to the class of Heyting algebras defined by topological spaces. That is, α1, ..., αn ⊨I β iff for every evaluation into every such algebra ν(α1 ∧ ... ∧ αn) ≤ ν(β). We now dualise.
A dual Heyting algebra is a distributive lattice with a top element, ⊤, and an operator, ⊂, satisfying the condition:

a ≤ b ∨ c iff a ⊂ b ≤ c

which makes ⊤ ⊂ ⊤ the bottom element. We may define ¬a as ⊤ ⊂ a. It is not difficult to check that if T is any topological space, then it produces a dual Heyting algebra whose elements are the closed sets of the space; ∧ and ∨ are intersection and union; ⊤ is the whole space; and a ⊂ b is (a ∧ b̄)ᶜ, where ᶜ is the closure operator of the space. ¬b is clearly b̄ᶜ. Validity is defined as before.

83 A version of these semantics can be found, in effect, in Rauszer [1977]. In this, Rauszer gives a Kripke semantics for a logic she calls ‘Heyting-Brouwer Logic’. This is intuitionist logic plus the duals of intuitionist ¬ and ⊃.
84 As discovered by Goodman [1981].
85 I use the same symbols for logical connectives and the corresponding algebraic operators, context sufficing to disambiguate.

We may call the logic that this construction gives closed set logic. Again, in this logic we have β ⊨ α ∨ ¬α and ¬¬α ⊨ α, but not their duals. Verification is left as an exercise, but a counter-model to Explosion is provided by the real numbers with their usual topology. Consider an evaluation, ν, such that ν(p) = [−1, +1], and ν(q) = ∅. Then ν(p ∧ ¬p) = {−1, +1}, which is not a subset of ∅. (This example illustrates how the points in the set represented by p ∧ ¬p may be thought of as the points on the topological boundary between the sets represented by p and ¬p.) It is to be noted that closed set logic is distinct from all the C systems. For example, it is easy to check that ¬α and ¬(α ∧ α) are logically equivalent and, more generally, that provable equivalents are inter-substitutable. Finally, as one would expect, modus ponens fails for ⊂. (It is, after all, the dual of ⊃.) It is a simple matter to construct a topological space where a ∩ (a ∩ b̄)ᶜ is not a subset of b. (Hint: take a to be the whole space.) Indeed, it may be shown that there is no operator definable in terms of ∧, ∨, ⊂ and ⊥ that satisfies modus ponens. Hence closed set logic is distinct from the logic obtained by dualising Kripke semantics as well.86
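The two-world counter-model to Explosion given above for the dualised Kripke semantics can be checked directly. What follows is a small sketch rather than a general implementation: it evaluates formulas in one fixed finite model, with a flag selecting the intuitionist or the dual truth condition for negation. The helper `evaluator` and the tuple encoding of formulas are assumptions of this example.

```python
# Formulas as nested tuples: ("var", "p"), ("not", A), ("and", A, B),
# ("or", A, B), ("imp", A, B).

def evaluator(worlds, R, val, dual):
    """Return a function ev(f, w) giving the truth of f at w.
    R is a reflexive, transitive set of pairs; dual selects the negation clause."""
    def ev(f, w):
        op = f[0]
        if op == "var":
            return val[w][f[1]]
        if op == "and":
            return ev(f[1], w) and ev(f[2], w)
        if op == "or":
            return ev(f[1], w) or ev(f[2], w)
        if op == "imp":
            # true at w iff it holds at every world accessible from w
            return all((not ev(f[1], v)) or ev(f[2], v)
                       for v in worlds if (w, v) in R)
        if op == "not":
            if dual:
                # dual clause: some world that sees w falsifies the negand
                return any(not ev(f[1], v) for v in worlds if (v, w) in R)
            # intuitionist clause: every world seen from w falsifies the negand
            return all(not ev(f[1], v) for v in worlds if (w, v) in R)
        raise ValueError(f"unknown connective: {op}")
    return ev

WORLDS = ["w0", "w1"]
R = {("w0", "w0"), ("w1", "w1"), ("w0", "w1")}
VAL = {"w0": {"p": False, "q": False},   # p respects heredity along w0 R w1
       "w1": {"p": True, "q": False}}

p, q = ("var", "p"), ("var", "q")
contradiction = ("and", p, ("not", p))
excluded_middle = ("or", p, ("not", p))

dual_ev = evaluator(WORLDS, R, VAL, dual=True)
int_ev = evaluator(WORLDS, R, VAL, dual=False)

print(dual_ev(contradiction, "w1"), dual_ev(q, "w1"))      # True False
print(any(int_ev(contradiction, w) for w in WORLDS))        # False
print(all(dual_ev(excluded_middle, w) for w in WORLDS))     # True
print(int_ev(excluded_middle, "w0"))                        # False
```

In this one model the two negations behave as the text says: the dual clause makes p ∧ ¬p true at w1 while q is false there, and it verifies p ∨ ¬p everywhere, whereas the intuitionist clause does neither.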
4.4 Many-Valued Logics

It is not only intuitionism that allows for truth value gaps. In many-valued logics it is not uncommon to think of one of the values as neither true nor false. Hence another way of constructing a paraconsistent logic is to dualise this idea, with a many-valued logic that employs the value both true and false or something similar.

The idea that paradoxical sentences might take a non-classical truth value goes back to at least Bochvar [1939]. But the idea that this might be used to construct a many-valued logic that was paraconsistent first seems to have been envisaged by the Argentinian logician Asenjo in 1954, though the ideas were not published until [1966]. As well as having the standard truth values, t and f, there is a third value i, which is the semantic value of paradoxical or antinomic sentences. The truth tables for conjunction, disjunction and negation are:

¬ |          ∧ | t i f          ∨ | t i f
t | f        t | t i f          t | t t t
i | i        i | i i f          i | t i i
f | t        f | f f f          f | t i f

and defining α ⊃ β as ¬α ∨ β gives it the table (rows give the value of α, columns the value of β):

⊃ | t i f
t | t i f
i | t i i
f | t t t
86 I suspect that they have the same conditional-free fragment, though I have never checked the details. According to Goodman [1981, p. 124], closed set logic does have a Kripke semantics. The central feature of this is that it is not truth that is hereditary, but falsity. That is, if xRy and νx (α) = 0 then νy (α) = 0.
The designated values are t and i. That is, a valid inference is one such that there is no evaluation where all the premises take the value t or i, and the conclusion does not.87 The logic is a very simple and natural one, and has been rediscovered a number of times since. For example, it and its properties were spelled out in more detail in Priest [1979], where it is termed LP (the Logic of Paradox), a name by which it is now standardly known.

It is not difficult to see that LP is a paraconsistent logic: take the evaluation that sets p to the value i, and q to the value f to see that p, ¬p ⊭LP q. Despite this, it is not difficult to show that the logical truths of LP are exactly the same as those of classical logic. The same evaluation that invalidates Explosion shows that modus ponens for ⊃ is not valid: p, ¬p ∨ q ⊭LP q. The logic may be extended in many ways with a many-valued conditional connective that does satisfy modus ponens. Perhaps the simplest such connective has the following truth table (rows give the value of the antecedent, columns the value of the consequent):

→ | t i f
t | t f f
i | t i f
f | t t t
Adding this conditional gives the logic RM3.88 It is clear that many-valued paraconsistent logics may be produced in many different ways. Any many-valued logic will be paraconsistent if it has a designated value, i, such that if ν(p) = i, ν(¬p) = i. Thus, Łukasiewicz continuum-valued logic (better known as a fuzzy logic) will be paraconsistent provided that the designated values include 0.5; but we will not go into this here.89

The semantics of LP may be reformulated in an illuminating fashion. Let 1 and 0 be the standard truth values true and false. And let us suppose that instead of taking an evaluation to be a function that relates each parameter to one or other of these, we take it to be a relation that relates each parameter to one or other, or maybe both. Let us write such an evaluation as ρ. We may think of αρ1 as ‘α is true (under ρ)’ and αρ0 as ‘α is false (under ρ)’. Given an evaluation of the propositional parameters, this can be extended to an evaluation of all formulas by the standard truth-table conditions:

  ¬αρ1 iff αρ0
  ¬αρ0 iff αρ1
  α ∧ βρ1 iff αρ1 and βρ1
  α ∧ βρ0 iff αρ0 or βρ0
  α ∨ βρ1 iff αρ1 or βρ1
  α ∨ βρ0 iff αρ0 and βρ0

87 Designation is crucial here. The truth tables are the same as those of Kleene’s strong three-valued logic. But there, the value i is thought of as neither true nor false, and hence not designated. This logic is not a paraconsistent logic. The designated values are not actually specified in Asenjo [1966], but designating i does seem to be faithful to his intentions.
88 The logic is so called because it is one of a family of n-valued logics, RMn, whose intersection is the semi-relevant logic RM (R-Mingle). I am not sure who first formulated RM3. The earliest reference to it in print that I know is in Anderson and Belnap [1975].
89 An argument for paraconsistency, based on a semantics with degrees of truth, was mounted by Peña in a doctoral thesis of 1979, and subsequently, e.g., in [1989]. His semantics is more complex than standard Łukasiewicz continuum-valued logic, though.
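The claims made about the three-valued presentation of LP and RM3 can be checked by brute force. The following is a minimal sketch, not part of the original text: the function names (neg, conj, disj, hook, arrow, valid) and the encoding of formulas as Python functions are illustrative assumptions. It orders the values f < i < t, reads conjunction as minimum and disjunction as maximum (which reproduces the tables for ∧ and ∨), and tests validity by searching every evaluation for one that designates all premises but not the conclusion.

```python
from itertools import product

# Values ordered f < i < t; conjunction is min, disjunction is max.
RANK = {"f": 0, "i": 1, "t": 2}
VALUE = {rank: name for name, rank in RANK.items()}

def neg(a):
    return VALUE[2 - RANK[a]]

def conj(a, b):
    return VALUE[min(RANK[a], RANK[b])]

def disj(a, b):
    return VALUE[max(RANK[a], RANK[b])]

def hook(a, b):
    """Material conditional: a ⊃ b is defined as ¬a ∨ b."""
    return disj(neg(a), b)

def arrow(a, b):
    """The RM3 conditional, transcribed from the truth table for →."""
    if a == "f" or b == "t":
        return "t"
    return "i" if (a == "i" and b == "i") else "f"

def valid(premises, conclusion, atoms, designated=("t", "i")):
    """True iff no evaluation designates every premise but not the conclusion."""
    for values in product(RANK, repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(prem(v) in designated for prem in premises) \
                and conclusion(v) not in designated:
            return False
    return True

# Formulas are encoded (an assumption of this sketch) as functions of an evaluation.
p = lambda v: v["p"]
q = lambda v: v["q"]
not_p = lambda v: neg(v["p"])

explosion = valid([p, not_p], q, ["p", "q"])
excluded_middle = valid([], lambda v: disj(v["p"], neg(v["p"])), ["p"])
mp_hook = valid([p, lambda v: hook(v["p"], v["q"])], q, ["p", "q"])
mp_arrow = valid([p, lambda v: arrow(v["p"], v["q"])], q, ["p", "q"])
# Same tables, but only t designated (the Kleene reading of footnote 87).
explosion_k3 = valid([p, not_p], q, ["p", "q"], designated=("t",))

print(explosion, excluded_middle, mp_hook, mp_arrow, explosion_k3)
```

Under these assumptions the script reports that Explosion and modus ponens for ⊃ fail in LP, that p ∨ ¬p remains a logical truth, that modus ponens holds for the RM3 conditional, and that Explosion becomes (vacuously) valid once i is no longer designated.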
It is an easy matter to check that ρ relates every formula to 1 or 0 (or both). Moreover, if we write:

  t for: α is true and not false
  f for: α is false and not true
  i for: α is true and false

then one can check that the conditions produce exactly the truth tables for LP. Further, under this translation, α takes a designated value (t or i) iff it relates to 1. So the definition of validity reduces to the classical one in terms of truth-preservation:

  Σ ⊨LP α iff for every ρ, if βρ1 for all β ∈ Σ, then αρ1

Hence, LP is exactly classical logic with the assumption that each sentence is either true or false, but not both, replaced with the assumption that each sentence is either true or false or both.

Given these semantics, it is natural to drop the constraint that ρ must relate every parameter to at least one truth value, and so allow for the possibility that sentences may be neither true nor false, as well as both true and false. Thus, if we repeat the above exercise, but this time allow ρ to be an arbitrary relation between parameters and {0, 1}, we obtain a semantics for the logic of First Degree Entailment (FDE). These semantics were discovered by Dunn in his doctoral dissertation of 1966, though they were not published until [1976], by which time they also had been rediscovered by others.90 Since the semantic values of FDE extend those of LP it, too, is paraconsistent. But unlike LP it has no logical truths. (The empty value takes all these out.) It is also not difficult to show that FDE has a further important property: if α ⊨FDE β then α and β share a propositional parameter. FDE is, in fact, intimately related with the family of relevant logics that we will come to in the next subsection.

Dunn’s semantics can be reformulated again. Instead of taking evaluations to be relations, we can take them, in a classically equivalent way, to be functions whose values are subsets of {1, 0}.
It is not difficult to check that the truth conditions of the connectives can then be represented by the following diamond lattice:

            {1}
          ↗     ↖
    {1, 0}       ∅
          ↖     ↗
            {0}
90 It is interesting to note that when Dunn was a student at the University of Pittsburgh he took some classes in the mathematics department where he was taught by Asenjo. Apparently, neither realised the connection between their work at this time.
If ν is any evaluation of formulas into this lattice, ν(α ∧ β) = ν(α) ∧ ν(β);91 ν(α ∨ β) = ν(α) ∨ ν(β); and ν(¬α) = ¬ν(α), where ¬ maps top to bottom, vice versa, and maps each of the other values to itself. Suppose that we now define validity in the standard algebraic fashion:

  α1, ..., αn ⊨ β iff for every ν, ν(α1) ∧ ... ∧ ν(αn) ≤ ν(β)

Then the consequence relation is again FDE. The proof of this is relatively straightforward, though not entirely obvious.

These semantics may be generalised as follows. A De Morgan lattice is a structure ⟨L, ∧, ∨, ¬⟩, where ⟨L, ∧, ∨⟩ is a distributive lattice, and ¬ is an involution of period two; that is, for all a, b in L:

  ¬¬a = a
  If a ≤ b then ¬b ≤ ¬a

It is easy to check that the diamond lattice is a De Morgan lattice. One may show that FDE is sound and complete not just with respect to the diamond lattice, but with respect to the class of De Morgan lattices. (Thus, the class of De Morgan lattices relates to the diamond lattice as the class of Boolean algebras relates to the two-valued Boolean algebra in classical logic.) All these results are also due to Dunn.

De Morgan lattices have a very natural philosophical interpretation. The members may be thought of as propositions (that is, as the Fregean senses of sentences). The ordering ≤ may then be thought of as a containment relation. Thus, α ⊨ β iff however the senses of the parameters are determined, the sense of α contains that of β.
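The four-valued semantics for FDE described in this subsection lends itself to a small computational check. The sketch below is illustrative only: the names (T, B, N, F, leq, valid_truth, valid_order) and the encoding of formulas as Python functions are assumptions of the example, not notation from the text. Values are subsets of {1, 0}; the connectives follow the relational truth conditions; leq is the ordering of the diamond lattice; and validity is tested both as preservation of truth (relating to 1) and in the algebraic fashion via ≤.

```python
from functools import reduce
from itertools import product

# The four values: subsets of {1, 0}. T is the top of the diamond, F the bottom.
T, B, N, F = frozenset({1}), frozenset({0, 1}), frozenset(), frozenset({0})
VALUES = (T, B, N, F)

def neg(a):
    # ¬α relates to 1 iff α relates to 0, and to 0 iff α relates to 1.
    return frozenset(1 - x for x in a)

def conj(a, b):
    out = set()
    if 1 in a and 1 in b:
        out.add(1)
    if 0 in a or 0 in b:
        out.add(0)
    return frozenset(out)

def disj(a, b):
    out = set()
    if 1 in a or 1 in b:
        out.add(1)
    if 0 in a and 0 in b:
        out.add(0)
    return frozenset(out)

def leq(a, b):
    # Diamond ordering: going up never loses truth and never gains falsity.
    return (1 not in a or 1 in b) and (0 not in b or 0 in a)

def valuations(atoms):
    for values in product(VALUES, repeat=len(atoms)):
        yield dict(zip(atoms, values))

def valid_truth(premises, conclusion, atoms):
    """Truth preservation: whenever every premise relates to 1, so does the conclusion."""
    return all(1 in conclusion(v)
               for v in valuations(atoms)
               if all(1 in prem(v) for prem in premises))

def valid_order(premises, conclusion, atoms):
    """Algebraic validity (needs at least one premise): meet of premises ≤ conclusion."""
    return all(leq(reduce(conj, [prem(v) for prem in premises]), conclusion(v))
               for v in valuations(atoms))

p = lambda v: v["p"]
q = lambda v: v["q"]
not_p = lambda v: neg(v["p"])
atoms = ["p", "q"]

explosion = (valid_truth([p, not_p], q, atoms), valid_order([p, not_p], q, atoms))
adjunction = (valid_truth([p, q], lambda v: conj(v["p"], v["q"]), atoms),
              valid_order([p, q], lambda v: conj(v["p"], v["q"]), atoms))
excluded_middle = valid_truth([], lambda v: disj(v["p"], neg(v["p"])), ["p"])

# De Morgan lattice conditions on the diamond: involution and order reversal.
involution = all(neg(neg(a)) == a for a in VALUES)
order_reversal = all(leq(neg(b), neg(a))
                     for a in VALUES for b in VALUES if leq(a, b))

print(explosion, adjunction, excluded_middle, involution, order_reversal)
```

On these definitions Explosion fails under both readings of validity, adjunction holds under both, and p ∨ ¬p is not a logical truth, because the evaluation that gives p the empty value gives p ∨ ¬p the empty value too. Agreement of the two definitions is only spot-checked here on particular inferences; the general equivalence is the result the text attributes to Dunn.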
91 Again, I write the logical connectives and the corresponding algebraic operators using the same symbol.

4.5 Relevant Logic

The final approach to paraconsistent logic that we will consider is relevant logic. What drove the development of this was a dissatisfaction with accounts of the conditional that validate “paradoxes” such as the paradoxes of material implication:

  α ⊨ (β ⊃ α)
  ¬α ⊨ (α ⊃ β)

As soon as the material account of the conditional was endorsed by the founders of classical logic, it came in for criticism. As early as a few years after Principia Mathematica, C. I. Lewis started to produce theories of the strict conditional, α ⥽ β (that is, □(α ⊃ β)), which is not subject to these paradoxes. This conditional was, however, subject to other “paradoxes”, such as:

  □β ⊨ α ⥽ β
  □¬α ⊨ α ⥽ β

Lewis eventually came to accept these. It is clear, though, that such inferences are just as counter-intuitive. In particular, intuition rebels because there may be no connection at all between α and β. This motivates the definition of a relevant logic. If L is some propositional logic with a conditional connective, →, then L is said to be relevant iff whenever ⊨L α → β, α and β share a propositional parameter.92 Commonality of the parameter provides the required connection of content.

Though closely connected with paraconsistency, relevant logics are quite distinct. None of the paraconsistent logics that we have met so far is relevant.93 Moreover, a relevant logic may not be paraconsistent. One of the first relevant logics, Π′ of Ackermann [1956], contained the Disjunctive Syllogism as a basic rule. If this is interpreted as a rule of inference (i.e., as applying to arbitrary assumptions, not just to theorems), then Explosion is forthcoming in the usual way.

The history of relevant logic goes back, in fact, to 1928, when the Russian logician Orlov published an axiomatisation of the fragment of the relevant logic R whose language contains just → and ¬. This seems to have gone unnoticed, however.94 Axiomatizations of the fragment of R whose language contains just → were given by Moh [1950] and Church [1951]. The subject took off properly, though, with the collaboration of the two US logicians Anderson and Belnap, starting at the end of the 1950s. In particular, in [1958] they dropped the Disjunctive Syllogism from Ackermann’s Π′ to produce their favourite relevance logic E. Both E and R are paraconsistent. The results of some 20 years of collaboration between Anderson, Belnap, and their students (especially Dunn, Meyer, and Urquhart) are published as Anderson and Belnap [1975], and Anderson, Belnap, and Dunn [1992]. Initially, relevance logic was given a purely axiomatic form.
For reasons that will become clear later, let us start with an axiom system for a relevant logic that Anderson and Belnap did not consider, B.

  A1. α → α
  A2. α → (α ∨ β) (and β → (α ∨ β))
  A3. (α ∧ β) → α (and (α ∧ β) → β)
  A4. α ∧ (β ∨ γ) → ((α ∧ β) ∨ (α ∧ γ))
  A5. ((α → β) ∧ (α → γ)) → (α → (β ∧ γ))
  A6. ((α → γ) ∧ (β → γ)) → ((α ∨ β) → γ)
  A7. ¬¬α → α

  R1. α, α → β ⊢ β
  R2. α, β ⊢ α ∧ β
  R3. α → β ⊢ (γ → α) → (γ → β)
  R4. α → β ⊢ (β → γ) → (α → γ)
  R5. α → ¬β ⊢ β → ¬α

92 According to this definition FDE is not a relevant logic, since it has no conditional connective. However, if we add a conditional connective, subject to the constraint that ⊨ α → β iff α ⊨FDE β, it is. This is how the system first arose.
93 With the exception of FDE as understood in the previous footnote.
94 It was rediscovered by Došen [1992].
The logic R can be obtained by adding the axioms:

  A8. (α → β) → ((β → γ) → (α → γ))
  A9. α → ((α → β) → β)
  A10. (α → (α → β)) → (α → β)
  A11. (α → ¬β) → (β → ¬α)

(and dropping R3-R5, which are now redundant).95 FDE is, it turns out, the core of all the relevant systems, in that if α and β contain no occurrences of → then α ⊨FDE β iff α → β is provable (in no matter which of the above-mentioned systems). Like FDE, B has no logical truths expressible in terms of only ∧, ∨, and ¬. In R, however, α ∨ ¬α is a logical truth, as, in fact, are all classical tautologies.

The axiom systems, by themselves, are not terribly illuminating. An important problem then became to find appropriate semantics. The first semantics, produced by Dunn, was an algebraic one. Define a De Morgan monoid to be a structure ⟨L, ∧, ∨, ¬, →, ◦, e⟩, where ⟨L, ∧, ∨, ¬⟩ is a De Morgan lattice and → is a binary operator (representing the conditional). It is convenient to extract the properties of the conditional from a corresponding residuation operator (a sort of intensional conjunction); this is what ◦ is. e is a distinguished member of L; its presence is necessary since we need to define logical truth, and this cannot be done in terms of the top member of the lattice (as in the algebraic semantics for classical and intuitionist logics), since there may be none. The logical truths are those which are always at least as great as e. In a De Morgan monoid, the additional algebraic machinery must satisfy the conditions:

  e ◦ a = a
  a ◦ b ≤ c iff a ≤ b → c
  If a ≤ b then a ◦ c ≤ b ◦ c and c ◦ a ≤ c ◦ b
  a ◦ (b ∨ c) = (a ◦ b) ∨ (a ◦ c) and (b ∨ c) ◦ a = (b ◦ a) ∨ (c ◦ a)

Note that e ≤ a → b iff e ◦ a ≤ b iff a ≤ b, so conditionals may be thought to express containment of propositional content.

95 E is obtained from R by deleting A9 and adding:

  (α → ¬α) → ¬α
  (α → γ) → (((α → γ) → β) → β)
  (N(α) ∧ N(β)) → N(α ∧ β)

where N(γ) is (γ → γ) → γ. E is a much clumsier system than R. Initially, Anderson and Belnap thought that the → of E was exactly the modalised → of R. That is, they believed that if one adds an appropriate modal operator, □, to R, then □(α → β) behaves in R just as α → β behaves in E. They even stated that should this not turn out to be the case, they would prefer the modalised version of R. It turned out not to be the case.
Graham Priest
Finally, define:

  Σ ⊨ α iff for all evaluations into all De Morgan monoids, if e ≤ ν(β) for all β ∈ Σ, e ≤ ν(α)

This consequence relation is exactly the one for B. Stronger relevant logics can be obtained by putting further constraints on ◦. In particular, the logic R is produced by adding the following constraints:

  ◦8. a ◦ (b ◦ c) = (a ◦ b) ◦ c
  ◦9. a ◦ b = b ◦ a
  ◦10. a ≤ a ◦ a
  ◦11. a ◦ b ≤ c iff a ◦ ¬c ≤ ¬b

◦8-◦11 correspond to A8-A11, respectively, in the sense that the structures obtained by adding any one of them are sound and complete with respect to the axiom system obtained by adding the corresponding axiom to B.96

Perhaps the most robust semantics for relevant logics are world semantics. These were produced by the Australian logician R. Routley (later Sylvan), in conjunction with Meyer, who moved to Australia, in the early 1970s.97 The results of some 20 years of collaboration between Routley, Meyer, and their students, especially Brady, are published in Routley, Plumwood, Meyer and Brady [1984] and Brady [2003].

Historically, the world semantics piggy-backed upon yet another semantics for FDE produced by Sylvan and V. Routley (later Plumwood).98 An interpretation for the language of FDE is a structure ⟨W, ∗, ν⟩, where W is a set of worlds, and ν is a function that assigns every propositional parameter a truth value (0 or 1) at every world. Thus, for all w ∈ W, νw(p) = 0 or νw(p) = 1. The novel element here is ∗. This is a function from worlds to worlds, satisfying the condition: w∗∗ = w. w∗ is often glossed as the “mirror image” world of w; but its philosophical understanding is still a matter of some debate.99 The truth conditions for the connectives are:

  νw(α ∧ β) = 1 iff νw(α) = 1 and νw(β) = 1
  νw(α ∨ β) = 1 iff νw(α) = 1 or νw(β) = 1
  νw(¬α) = 1 iff νw∗(α) = 0

96 Dunn worked out the details for R. It was Meyer who worked out the details for B and the logics between B and R. See Meyer and Routley [1972].
97 Related ideas were published by Urquhart [1972] and by Fine [1974].
98 See Routley and Routley [1972].
99 For what is, I think, the most coherent story, see Restall [1999].
Thus, in the case where w∗ = w, the truth conditions for ¬ collapse into the standard ones of modal logic. Validity is defined in terms of truth-preservation at all worlds of all interpretations. Again, it is not entirely obvious that these semantics deliver FDE, but it is not difficult to establish this. Essentially, it is because a relational evaluation, ρ, and a pair of worlds, w, w∗, are equivalent if they are related by the conditions:

  νw(α) = 1 iff αρ1
  νw∗(α) = 0 iff αρ0

Thus, a counter-model to Explosion is provided by the interpretation with two worlds, w, w∗, such that p is true at w and false at w∗ (so that ¬p is true at w); but q is false at w.

We can build an account of the conditional on top of this machinery as one would in a standard modal logic. Thus, the strict conditional α ⥽ β is true at world w iff at every (accessible) world either α is false or β is true. The behavior of ∗ suffices to ensure that neither α ⥽ (β ∨ ¬β) nor (α ∧ ¬α) ⥽ β is valid. But the logic is not a relevant logic. The trouble is, for example, that q ⥽ q is true at all worlds. Hence p ⥽ (q ⥽ q) is also true at all worlds, and so logically valid. To finish the job of producing the semantics for a relevant logic, we therefore need further machinery.

In Routley/Meyer semantics, a new class of worlds is introduced.100 The worlds we have employed so far may be called normal worlds. The new worlds are non-normal worlds. Non-normal worlds are logically impossible worlds, in the sense that in these worlds the laws of logic may be different from what they are at possible (normal) worlds, just as the laws of physics may be different at physically impossible worlds. In particular, if one thinks of conditionals as expressing the laws of logic (so that, for example, α → α expresses the fact that α follows from α), then non-normal worlds are worlds where logically valid conditionals (like α → α) may fail.
Thus p → (q → q) will not be logically valid, since there are worlds where p is true, but q → q is false. Specifically, an interpretation is a structure ⟨W, N, ∗, R, ν⟩. W, ∗, and ν are as before. N is a subset of W, and is the class of normal worlds, so W − N is the class of non-normal worlds. The truth conditions for ∧, ∨, and ¬ are as before.101 At normal worlds, w:

  νw(α → β) = 1 iff for all w′ ∈ W, either νw′(α) = 0 or νw′(β) = 1

These are the simple S5 truth conditions for the strict conditional, ⥽. To state the truth conditions for α → β at non-normal worlds we require the relation R. This is an arbitrary relation on worlds; but unlike the binary accessibility relation of standard modal logic, it is a ternary relation.

100 The following are not quite the original Routley/Meyer semantics, but are a simplified form due to Priest and Sylvan [1992] and Restall [1993].
101 It is possible to perform exactly the same construction concerning conditionals, but imposed not on ∗ semantics for negation, but on the Dunn four-valued semantics. The result is a family of perfectly good relevant logics, but not the Anderson Belnap family under consideration here.

Thus, for all w ∈ W − N:
  νw(α → β) = 1 iff for all x, y ∈ W such that Rwxy, either νx(α) = 0 or νy(β) = 1

Given these truth conditions, it is clear that a conditional such as q → q may fail at a non-normal world, w, since we may have Rwxy, with q true at x, but false at y. In this way, relevance is obtained. Note that if x = y the truth conditions for → at non-normal worlds collapse into the S5 truth conditions. Hence, we may state the truth conditions for → at all worlds uniformly in terms of the ternary relation, provided that at normal worlds we define R in terms of identity. That is, for normal worlds, w:

  Rwxy iff x = y

Validity is defined in terms of truth preservation at normal worlds. Thus:

  Σ ⊨ α iff for every interpretation and every w ∈ N, if νw(β) = 1 for all β ∈ Σ, νw(α) = 1

These semantics are a semantics for the relevant logic B. Stronger relevant logics may be produced by adding constraints on the ternary relation R. For example, the relevant logic R is produced by adding the following constraints. For all x, y, z, u, v ∈ W:102

  R8. If ∃w(Rxyw and Rwuv) then ∃w(Rxuw and Rywv)
  R9. If Rxyz then Ryxz
  R10. If Rxyz then ∃w(Rxyw and Rwyz)
  R11. If Rxyz then Rxy∗z∗

Each of these constraints corresponds to one of A8-A11, in the sense that the axiom is sound and complete with respect to the class of interpretations in which the corresponding constraint is in force.

An important issue to be faced is what, exactly, the ternary relation means, and why it should be employed in stating the truth conditions of conditionals. Whether there are sensible answers to these questions, and, if so, what they are, is still a matter for debate.
Some, for example, have tried to explicate the notion in terms of the flow of information.103 It is worth noting that the ternary relation can be avoided if one simply assigns conditionals arbitrary truth values at non-normal worlds, which makes perfectly good sense, since at logically impossible worlds, logical principles could, presumably, do anything. This construction gives a relevant logic weaker than B.104

At any rate, the relevant logic B is the analogue of the modal logic K, in the following sense. K is the basic (normal) modal logic. In its semantics, the binary accessibility relation is arbitrary. Stronger logics are obtained by adding constraints on the relation. Similarly, B is the basic relevant logic (of this family).

102 Added in press: The condition R9 is not quite right. See the second edition of Priest [2001, 10.4a.5].
103 For further details, see Priest [2001a, 10.6].
104 See Priest [2001a, ch. 9].
In its semantics, the ternary accessibility relation is arbitrary. Stronger logics are obtained by adding constraints on the relation. It was this fact that became clear with the invention of the world-semantics for relevant logics by the Australian logicians. Moreover, just as the early work on modal logic had concentrated on systems at the strong end of the modal family, so Anderson and Belnap’s work had concentrated on systems at the strong end of the relevant family.105 Further details concerning relevant logic can be found in the chapter on the subject in this Handbook, so we will pursue the issue no further here. We have now looked at the development of paraconsistent logics in the modern period, based on four distinct ideas. This survey is certainly not exhaustive: there are other approaches.106 But we have tracked the major developments, and it is now time to return to dialetheism.
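The simplified world semantics described in this subsection can be made concrete with a small model checker. The sketch below is an illustration under stated assumptions, not the authors' formulation: the Model record, the tuple encoding of formulas, and the two sample interpretations are all hypothetical. It implements the ∗ clause for negation, the S5-style clause for → at normal worlds, and the ternary clause at non-normal worlds, and it looks for normal worlds that falsify an inference. It checks single interpretations only; it does not decide validity over all interpretations.

```python
from typing import NamedTuple

class Model(NamedTuple):
    worlds: frozenset
    normal: frozenset    # N, a subset of worlds
    star: dict           # w -> w*, required to satisfy star[star[w]] == w
    R: frozenset         # triples (w, x, y), consulted only at non-normal w
    true_atoms: dict     # w -> set of parameters taking value 1 at w

def holds(m, w, f):
    """Truth of formula f at world w; formulas are nested tuples."""
    op = f[0]
    if op == "atom":
        return f[1] in m.true_atoms[w]
    if op == "not":      # ¬α is true at w iff α is false at w*
        return not holds(m, m.star[w], f[1])
    if op == "and":
        return holds(m, w, f[1]) and holds(m, w, f[2])
    if op == "or":
        return holds(m, w, f[1]) or holds(m, w, f[2])
    if op == "imp":
        if w in m.normal:  # S5-style clause over all worlds
            return all(not holds(m, x, f[1]) or holds(m, x, f[2])
                       for x in m.worlds)
        # ternary clause at non-normal worlds
        return all(not holds(m, x, f[1]) or holds(m, y, f[2])
                   for (u, x, y) in m.R if u == w)
    raise ValueError(f"unknown connective: {op}")

def counterexamples(m, premises, conclusion):
    """Normal worlds where every premise is true and the conclusion is not."""
    return [w for w in sorted(m.normal)
            if all(holds(m, w, a) for a in premises)
            and not holds(m, w, conclusion)]

p, q = ("atom", "p"), ("atom", "q")

# Two worlds that are each other's star: p true at w, false at w*; q false at w.
explosion_model = Model(
    worlds=frozenset({"w", "w*"}), normal=frozenset({"w"}),
    star={"w": "w*", "w*": "w"}, R=frozenset(),
    true_atoms={"w": {"p"}, "w*": set()})

# Non-normal world b sees the pair (c, d), with q true at c and false at d.
relevance_model = Model(
    worlds=frozenset({"a", "b", "c", "d"}), normal=frozenset({"a"}),
    star={x: x for x in "abcd"}, R=frozenset({("b", "c", "d")}),
    true_atoms={"a": set(), "b": {"p"}, "c": {"q"}, "d": set()})

q_imp_q = ("imp", q, q)
explosion_cx = counterexamples(explosion_model, [p, ("not", p)], q)
relevance_cx = counterexamples(relevance_model, [], ("imp", p, q_imp_q))
print(explosion_cx, relevance_cx)
```

In the first interpretation both p and ¬p are true at the normal world w while q is false, so Explosion fails there. In the second, q → q is true at the normal world a but false at the non-normal world b, where p is true, so p → (q → q) fails at a.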
105 A word on terminology. The Americans called the subject relevance logic, since they took the logic to be spelling out what relevance was. This was rejected by Sylvan, who argued that the logics did not provide an analysis of relevance as such. The logics were relevant, but this fact fell out of something more fundamental, namely, truth preservation over a suitably rich class of (and especially impossible) worlds. Following Sylvan, Australian logicians have called the logics relevant logics.
106 A quite different approach goes back to research starting in the late 1950s. This also has relevance connections. It is a natural idea that classical logical consequence lets in too much, and specifically, that it lets in inferences where the premises and conclusion have no connection with each other. The thought then is to filter out the irrelevant inferences by imposing an extra condition. Specifically, define the inference from α to β to be prevalid if α ⊨C β and F(α, β). Prevalid inferences may not be closed under substitution. So define an inference to be valid if it is obtained from a prevalid inference by uniform substitution. The condition F is a filter that removes the Bad Guys. A suitable choice of F gives a paraconsistent logic. The first filter logic was given by Smiley [1959]. His filter was the condition that α not be a classical contradiction and β not be a classical tautology. It is clear that this makes the inference p ∧ ¬p ⊢ q invalid. It is also easy to check that the following inferences are valid under the filter: p ∧ ¬p ⊢ p ∧ (¬p ∨ q), p ∧ (¬p ∨ q) ⊢ q. (The first is a substitution instance of p ∧ r ⊢ p ∧ (r ∨ q).) This shows two things: first, that the disjunctive syllogism holds, unlike in most other paraconsistent (and particularly relevant) logics; second, that the transitivity of deducibility breaks down. The failure of transitivity is, in fact, typical of filter logics (though not invariably so). Perhaps the most interesting filter logic was developed by Tennant [1984], a student of Smiley. It is given most naturally in multiple-conclusion terms. (Thus, Σ ⊨C Π iff every classical evaluation that makes every member of Σ true makes some member of Π true.) Accordingly, Σ ⊨ Π iff Σ ⊨C Π and there are no proper subsets Σ′ ⊂ Σ, Π′ ⊂ Π, such that Σ′ ⊨C Π′. The filter takes out redundant “noise”. Suitably developed, this approach can be used to construct a family of relevant but non-transitive logics. See Tennant [1992].

5 MODERN DIALETHEISM

5.1 Inconsistent Information

As we noted in 1.2, the major motive for modern paraconsistency is the idea that there are situations in which we need to reason in a non-trivial way from inconsistent information. The early proponents of paraconsistent logics mentioned various such situations, but the first sustained discussion of the issue (that I am aware
of) is Priest and Routley [1989].107 A list of the situations involving inconsistent information that have been mooted includes:

1. Information collected from different sources, at different times, etc., especially in computational information processing.
2. Various theories in science and mathematics.
3. Various theories in philosophy.
4. Various bodies of law and other legal documents.
5. Descriptions of borderline cases concerning vague predicates.
6. Descriptions of certain states of change.
7. Information concerning over-determination and multi-criterial terms.
8. Information generated by paradoxes of self-reference.

Of these, the most straightforward is 1.108 Information collected in this way is clearly liable to be inconsistent. The situation is particularly crucial in modern information processing, where the amount of information is humanly unsurveyable. Whilst, no doubt, one would normally wish to revise inconsistent information when it occurs in this context, we might be in a situation in which we do not know how to revise consistently. Worse, as is well known, there is no algorithm for detecting inconsistency, so we may not even know that the information is inconsistent.

For 2, it is a fact that various theories in the history of science have been inconsistent, and known to be so. Perhaps the most striking example of this is the Bohr theory of the atom, whose inconsistency was well recognised, even by Bohr. To explain the frequency of radiation emitted in quantum transitions, classical electromagnetic theory had to be employed. But the same electromagnetic theory contradicts the existence of stationary states for an electron in orbit; it entails that such electrons, since they are accelerating, will radiate (and so lose) energy.109

107 The essay, which can be consulted for further discussion of the material that follows, is one of the introductory chapters of Priest, Routley, and Norman [1989]. This was the first collection of essays on paraconsistency, and contains essays by most of the founders of the subject. It may be noted that the completed manuscript of the book was sent to the publisher in 1982, which is a more accurate dating of its contents. The book contains a useful bibliography of paraconsistency to that date.
108 A supposed example of this that is sometimes cited is the information provided by witnesses at a trial, who frequently contradict one another, and themselves. This example, though, is not very persuasive. For, plausibly, the pertinent information in this sort of case is not of the form ‘the car was red’, ‘the car was not red’, but of the form ‘witness x says that the car was red’, ‘witness y says that the car was not red’. (The judge and jury may or may not conclude something about the colour of the car.) Information of this kind is consistent.
109 The Bohr theory has long since been displaced by modern quantum theory. But this, too, sails close to the paraconsistent wind in a number of places. To mention just one: the Dirac δ-function has mathematically impossible properties. The integral of the function is non-zero; yet its value at all but one point is zero.
An example of an inconsistent theory in the history of mathematics is the original calculus of Newton and Leibniz. Again, the inconsistency of this was well known at the time. It was pointed out forcibly by Berkeley, for example. In computing derivatives one needed to divide by infinitesimals, at one stage, and so suppose them to be non-zero. In the final stage of the computation, however, one had to ignore infinitesimal summands, hence assuming, in effect, that they are zero.110 We will return to the issue of inconsistent mathematical theories later. Turning to 3, the examples of inconsistent theories in the history of philosophy are legion. Indeed, most philosophers who have constructed theories of any degree of complexity have endorsed principles that turned out to be contradictory. No doubt, many of these philosophers contradicted themselves unwittingly. However, in Section 2 above, we noted various philosophers for whom this was not the case: Heraclitus, Hegel, and Meinong (at least, as many people interpreted him). Again, we will return to inconsistent philosophical theories later. We will also come to the other cases on the list above in a minute. But given even just these cases, it is clear that inferences must be, or were, drawn from inconsistent information. What inference mechanism was employed in each of the historical cases is a matter for detailed historical investigation. There is no a priori reason to suppose that it was one of the formal paraconsistent logics we looked at in the last section — though there is no a priori reason to suppose that it was not, either. What is ungainsayable is that in all these cases, where inference goes ahead in contexts whose inconsistency — or the possibility thereof — is explicitly acknowledged, some inference procedure that is de facto paraconsistent must (have) be(en) employed.
110 For an analysis of this, and many other inconsistent mathematical theories, see Mortensen [1995].

5.2 The Rise of Modern Dialetheism

In none of the cases so far discussed is there much temptation to suppose that the inconsistent information in question is true, that is, that we have an example of dialetheism, unless one endorses one of the philosophical theories mentioned, such as Meinongianism. Even in the cases of inconsistent theories in science and mathematics, we may suppose that the theories were important, not because they were taken to be true, but because they were useful instrumentally, or perhaps they were taken to be good approximations to the (consistent) truth. In fact, none of the paraconsistent logicians mentioned in the previous section who wrote before the 1970s, with the exception of Asenjo, comes close to endorsing dialetheism.111 Indeed, it is clear that some of the formal paraconsistent logics of the last section do not even lend themselves to dialetheism. Non-adjunctive logics, in particular, though they concern the aggregation of information that is, collectively, inconsistent, have no truck with the idea that the information from any one source is inconsistent. To bring this home, note that for each of the non-adjunctive constructions, one can formulate explicitly dialetheic versions. For example, consider discussive logic. Repeat the construction, but based not on a classical modal logic, but on a paraconsistent modal logic that allows for inconsistent worlds (for example, of the kind in the world-semantics of relevant logic). Or in the Rescher/Manor construction, instead of considering maximal consistent sets, consider maximal non-trivial sets, and then apply a paraconsistent consequence relation to these. How to handle pieces of information from multiple sources, which do not fit together happily, is a problem for everyone, dialetheist or otherwise.

111 This is true even of da Costa, who was much concerned with inconsistent set-theories. He tended to regard these simply as interesting and possibly important mathematical theories.

The rise of the modern dialetheist movement can most naturally be seen as starting in the 1970s with the collaboration between Priest and Routley in Australia.112 Priest argued for dialetheism in [1973] in an argument based on paradoxes of self-reference and Gödel’s Theorem. The case was mounted in detail in a paper, later published as [1979], given at a meeting of the Australasian Association for Logic in Canberra in 1976, where Priest and Routley first met. Priest [1987] is a sustained defence of dialetheism. Routley became sympathetic to dialetheism because of his work on the semantics of relevant logics, and the possibility of applying relevant logic to logical paradoxes and to Meinong. He endorsed the position in [1977] and [1979].113

It is worth noting that it was the development of the world-semantics for relevant logic which brought the dialetheic potential of relevant logic to the fore. If there are inconsistent worlds, a person of a naturally curious disposition will ask how one knows that the actual world is not one of them. The American relevant logicians never showed any tendency towards dialetheism.
Even Dunn, who was responsible for the four-valued semantics, preferred to read 1 and 0 as ‘told true’ and ‘told false’, rather than as ‘true’ and ‘false’: inconsistent information could be given, but not the truth. Endorsing the world-semantics for relevant logic does not require dialetheism, however. It is quite possible to suppose that all the inconsistent worlds are non-normal, that is, that for all w ∈ N, w = w∗. The logic will still be relevant, but will validate Explosion, and so not be paraconsistent. Alternatively, one may suppose that some normal worlds are inconsistent, so that the logic is paraconsistent, but that the actual world has special properties; in particular, consistency.
112 Readers must remember, especially at this point, that this essay is not being written by an impartial historian, and make due allowances for this.
113 In this paper Routley describes his position as ‘dialectical’, taking the view to be identical with aspects of dialectical logic in the Hegel/Marx tradition. Whilst there certainly are connections here, the simple identification is, at the very least, somewhat misleading, and Routley dropped the description after the term ‘dialetheism’ was coined.
5.3 Arguments for Dialetheism
Let us now return to the list of examples in 5.1. The rest of the examples on the list have been mooted as dialetheias. Let us start with 4. It is not uncommon
for legal documents to have unforeseen consequences; sometimes, these can be contradictory. Suppose, for example, that the constitution of a certain country contains the clauses: All university graduates are required to perform jury service. No woman shall be a member of a jury. We may suppose that when the constitution was written, university admission was restricted to male clergy, as it had been for hundreds of years. Some time later, however, universities open their doors to women. Women graduates are then both required to perform and forbidden from performing jury service.114 Of course, once the contradiction came to light, the constitution would presumably be changed, or a judge would rule one way or the other (which is tantamount to the same thing). But until and unless this is done, we have a legal contradiction. The law has a number of mechanisms for defusing prima facie contradictions. For example it is a general principle that constitutional law outranks statute law, and that a later law overrides an earlier law. Clearly, such principles may well resolve an explicit contradiction in legislation. However, equally clearly, the situation may be such that none of the principles applies. (The situation just described might be one of these.) And where this is the case, the contradictions are not just prima facie. Turning to 5, the idea is this. Given a vague predicate, there is a grey area between cases in which it clearly applies and cases where it clearly does not. Thus, there is no point at which a tadpole ceases to be a tadpole and becomes a frog. Suppose that Fred is a creature in this grey area. Intuition says that Fred is as much tadpole as not tadpole, and as little tadpole as not tadpole. In other words, the semantic value of ‘Fred is a tadpole’ is symmetrically poised between truth and falsity. It is commonplace to suppose that a sentence such as this is neither true nor false. But as far as the story so far goes, both true and false is just as good. 
Moreover, for any consideration that drives one towards truth value gaps, there would seem to be dual considerations that drive one towards truth value gluts.115
114 In a similar way, the rules of a game, such as chess, may well have untoward consequences, such as a contradiction in certain recondite situations that come to light.
115 See Hyde [1997].
To be honest, any simple three-valued solution to the problem of vagueness is going to be problematic for very simple reasons. Just as the boundary between being true and being false is grey in such cases, so the boundary between being true and being neither true nor false, or being both true and false, is also grey. Little therefore seems to have been gained by moving to three semantic values. Considerations of this kind have led some logicians to endorse a continuum-valued semantics to deal with vagueness. Assuming, as is standard, that such values are numbers in the range [0, 1], and that if the value of α is x, the value of ¬α is
1 − x, then a contradiction α ∧ ¬α may certainly be half-true — and 0.5 may be a designated value in the context. In some ways, issues are similar when we move to 6. Consider a state of affairs described by α, which changes, perhaps instantaneously, to one described by ¬α. It may be that there is something about the point of transition that determines either α or ¬α as true at that transition. Thus, for example, if a car accelerates continuously from rest, there is a last point with zero velocity, but no first point with a non-zero velocity. But, again, it may be that the situation is completely symmetrical. Thus, if a subatomic particle makes an instantaneous transition from one quantum state to another, there are no continuity considerations to determine the situation at the point of transition one way or the other. In such situations, the transition state is symmetrically poised between α and ¬α. Either, then, neither α nor ¬α is true, or both are. Moreover, in this case, there are some considerations, at least, which push towards the latter conclusion. The state, whatever it is, is a state of change. Such a state is naturally described as one in which α ∧ ¬α holds. (Recall Heraclitus.) A state where neither α nor ¬α holds is less naturally thought of as a state of change. For if neither holds, then α has ceased to be true. That change is already over. It is true that if α ∧ ¬α holds then α still holds, so its ceasing is yet to occur. But in this case, at least ¬α has already started: change is under way. Or to put it another way: an instant where neither α nor ¬α holds cannot be a transition state between one where α holds and one where ¬α holds. For it is quite possible that such a state might be followed by ones where ¬α does not hold: ¬α never starts at all! The idea can be applied to one of Zeno’s paradoxes of motion: the arrow. Recall that this goes as follows. Consider an arrow at an instant of its motion. 
During that instant it advances not at all on its journey. Yet somehow in the whole motion, composed of just such instants, it does advance. How can this be possible? Standard measure-theory tells us that an interval of non-zero measure is composed of points of zero measure. Fine. But how can a physical advance be constituted by a bunch of no advances? A bunch of nothings, even an infinite bunch, is nothing. A resolution is provided by the previous considerations concerning change. At an instant of the motion, the arrow is at point p. But it is in a state of change, so it is not there as well. Thus, it is also at other points; presumably those just before and just after p. In the instant, then, it does occupy more than one point; it does make some advance on its journey.
Finally in this section, let us consider 7. It is a commonplace to note that versions of verificationism may give rise to truth-value gaps since, for certain α, neither α nor ¬α may be verified, or even verifiable. It is less often noted that other versions may give rise to truth value gluts. Specifically, it is not uncommon for terms of our language to be multi-criterial, that is, for there to be different criteria which are semantically sufficient for the application of the term. For example, the appropriate reading from a correctly functioning alcohol thermometer is sufficient to determine the temperature of some water to be 4°C. But the appropriate reading of a thermo-electric thermometer is equally sufficient for the
same. Now, normally, if we test for both of these criteria, they will either both hold or both fail. But in circumstances of a novel kind, it might well happen that the criteria fall apart. The alcohol thermometer may tell us that the temperature is 4°; the thermo-electric thermometer may tell us that it is 3°, and so not 4°. It might be argued that if such a situation occurs, what this shows is that the terms in question are ambiguous, so that ‘3°’ is ambiguous between 3°-by-an-alcohol-thermometer, and 3°-by-an-electro-chemical-thermometer. And doubtless, should this situation arise, we probably would replace our old concept of temperature by two new concepts. In just this way, for example, the term ‘mass’, as employed before the Special Theory of Relativity, was replaced by two terms ‘rest mass’ and ‘inertial mass’, afterwards. But it can hardly be claimed that the old term was semantically ambiguous before, in the way that, say, ‘cricket’ is (the insect and the game). It had a single meaning; we just recognised that meaning as applicable in different, and logically independent, ways. Thus, the situation, as described in the old language, really was inconsistent.
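The continuum-valued treatment of vagueness mentioned earlier in this section can be made concrete with a small numerical sketch. The text fixes only the rule for negation (the value of ¬α is 1 − x). The sketch assumes, in addition, that conjunction takes the minimum of its conjuncts (a common choice in fuzzy logics) and that a value counts as designated when it reaches a threshold of 0.5. The function names and the threshold are illustrative assumptions, not part of the original discussion.

```python
def neg(x: float) -> float:
    """Value of ¬α when α has value x (the rule stated in the text)."""
    return 1.0 - x


def conj(x: float, y: float) -> float:
    """Value of a conjunction. ASSUMPTION: minimum of the conjuncts."""
    return min(x, y)


def contradiction(x: float) -> float:
    """Value of α ∧ ¬α when α has value x."""
    return conj(x, neg(x))


def designated(x: float, threshold: float = 0.5) -> bool:
    """ASSUMPTION: a value is designated when it is at least `threshold`."""
    return x >= threshold


# 'Fred is a tadpole' at various degrees of truth.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, contradiction(x), designated(contradiction(x)))
```

Under these assumptions the contradiction peaks at 0.5, exactly when α is symmetrically poised between truth and falsity, and it is designated only there.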
5.4 Truth and the Paradoxes of Self-Reference
This brings us to the last item on the list: the paradoxes of self-reference. As a matter of documented fact, this is the consideration that has been historically most influential for dialetheism. It is also, I think it fair to say, the consideration to which it is hardest to object coherently. Paradoxes of this kind are apparently valid arguments, often very simple arguments, starting from things that seem obviously true, but ending in explicit contradictions. Unless one can fault them, they establish dialetheism. Though many arguments in the family are, historically, quite recent, paradoxes of the family have been known now for close to two and a half thousand years. It is a mark of their resilience that even now there is still no consensus amongst those who think that there is something wrong with them as to what this is. Better, then, to stop trying to find a fault where none exists, and accept the arguments at face value. It is conventional wisdom to divide the paradoxes into semantic and set-theoretic. Though I think that this is a profoundly misleading distinction,116 it will be useful to employ it here. Let us start with the semantic paradoxes. These are paradoxes that concern notions such as truth, satisfaction, reference. Take everyone’s favourite: the liar paradox.117 At its simplest, this is the claim: this claim is false. If it is true then it is false; and if it is false then it is true. Contradiction in either case. To tighten up the argument, let us write T for ‘is true’. Then the liar is a truth-bearer,118 λ, of the form ¬T⟨λ⟩. (The angle brackets here are some name-forming device.)
116 See Priest [1995, Part 3].
117 It should be noted that though the paradox is a paradigm of the family, it has features that other members of the family do not have, and vice versa. One cannot simply assume, therefore, that a solution to it automatically generalises to all members of the family.
118 One can choose whether these are sentences, propositions, beliefs or what not, as one pleases.
Now, an almost irresistible principle concerning truth, stated first
by Aristotle, is that something is true iff what it claims to be the case is in fact the case; as it is usually called now, the T-schema. For every α:
T⟨α⟩ ↔ α
In particular, T⟨λ⟩ ↔ λ. And given what λ is: T⟨λ⟩ ↔ ¬T⟨λ⟩. T⟨λ⟩ ∧ ¬T⟨λ⟩ now follows, given various logical principles, such as the law of excluded middle, or consequentia mirabilis (α → ¬α ⊢ ¬α). The solutions to the liar and other semantic paradoxes that have been suggested, particularly in the last 100 years, are legion. This is not the place to attempt an exhaustive analysis of them. Further details can be found in the article on the paradoxes of self-reference in this Handbook. However, all attempts to provide a consistent analysis of the paradoxes run into fundamental problems. To see this, let us start by considering what are probably the two most influential such attempts in the last 100 years. The first of these is based on the work of Tarski. According to this, a language may not contain its own truth predicate. That is, a predicate satisfying the T-schema for every sentence of a language L, must not occur in L itself, but must occur in a metalanguage. Of course, the move must be repeated, generating a whole hierarchy of languages, H, each of which contains a truth predicate for lower members of the hierarchy, but is semantically open: it does not contain its own truth predicate. In no sentence of the hierarchy may we therefore formulate a self-referential liar sentence. Of the many objections that one may raise against this solution, note here only the following. Given the resources of H, one may formulate the sentence:
λH : λH is true in no member of H.
Now we have a choice: is λH a sentence of some language in H or not? Suppose it is. We may therefore reason about its truth in the next member of the hierarchy up. If it is true, then it is not true in any member of H. Contradiction. Hence it cannot be true in any member of the hierarchy. That is, we have established λH.
Hence, λH is a true sentence of some language in H. And we have already seen that this leads to contradiction. Suppose, on the other hand, that λH is not a member of the hierarchy. Then H is not English, since λH clearly is a sentence of English. The construction does not, therefore, show that the rules governing the truth predicate in English are consistent.119
119 Here, and in what follows, I am assuming that English is the language of our vernacular discourse. Exactly the same considerations apply if it is some other natural language.
The other particularly influential theory is Kripke’s. According to this, certain sentences may fail to take a truth value, and so be neither true nor false. Starting with a language which contains no truth predicate, we may augment the language
with one, and construct a hierarchy. Not, this time, a hierarchy of languages, but a hierarchy of three-valued interpretations for the extended language. At the base level, every sentence containing T is neither true nor false. As we ascend the hierarchy, we acquire information to render sentences containing T determinately true or false. In particular, if we have shown that α is true at a certain level of the hierarchy, this suffices to render T⟨α⟩ true at the next. If we play our cards right, we reach a level, F (a fixed point), where everything stabilises; by then, every sentence has a fixed semantic status; in particular, for every α, α and T⟨α⟩ have the same status. It is this fixed-point interpretation that is supposed to provide an account of the behaviour of the truth predicate. Sentences that are determinately true or determinately false at the fixed point are called grounded. The liar sentence is, unsurprisingly, ungrounded. And being neither true nor false, it slips through the dilemma posed by the liar paradox argument. Again, of the many objections that may be brought against the theory, we note just one. Consider the sentence:
λF : λF is not true at F
What status does λF have at F? If it has the status true, then it is not true at F. Contradiction. If it does not have the status true (in particular, if it is neither true nor false), then what it says to be the case is the case. Hence it is true. Contradiction again. One may object by noting that if λF is neither true nor false at F, then so are T⟨λF⟩ and ¬T⟨λF⟩. Hence the final step of the reasoning does not follow. But if one chooses to break the argument in this fashion, this just shows, again, that the behaviour of T at the fixed point is not that of the English truth predicate. For according to the theory, λF is not true at the fixed point; and the theorist is committed to the truth of this claim.
At this point, the only option120 is to locate the discourse of the theorist outside the language L, in effect taking the theorist’s truth predicate to be in a metalanguage for L. But this just shows that the construction does not establish the truth predicate of English to behave consistently. For the theorist is speaking English, and the construction does not apply to that. If we look at these two solutions, we can see a certain pattern. The machinery of the solution allows us to reformulate the liar paradox. Such reformulations are often called extended paradoxes. This is something of a misnomer, however. These paradoxes are not new paradoxes; they are just the same old paradox in a new theoretical context. What generates the paradox is a heuristic that allows us to construct a sentence that says of itself that it is not in the set of bona fide truths. Different solutions just characterise this set in different ways. At any rate, the only options in the face of these reformulated paradoxes are to accept contradiction or to deny that the machinery of the solution is expressible in the language in question. Since the machinery is part of the discourse of the theoretician, English, this shows that English discourse about truth has not been shown to be consistent.
120 Which Kripke, in fact, exercised.
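The fixed-point construction attributed to Kripke above can be sketched for a toy language. This is a minimal illustration under strong simplifying assumptions: the language has only negation and a truth predicate applied to named sentences, the tuple encoding and all names are hypothetical, and the finite iteration stands in for what is, in general, a transfinite process. It shows only how grounded sentences acquire a value while the liar does not. It does not address the objection based on λF.

```python
# Sentences are nested tuples (a hypothetical encoding, not from the text):
#   ("atom", True/False)  a T-free sentence with a fixed classical value
#   ("not", s)            the negation of sentence s
#   ("T", name)           "the sentence called `name` is true"

def evaluate(sentence, ext, anti):
    """Three-valued evaluation: True, False, or None (neither).

    `ext` and `anti` are the sets of names currently counted as true
    and as false, respectively.
    """
    kind = sentence[0]
    if kind == "atom":
        return sentence[1]
    if kind == "not":
        value = evaluate(sentence[1], ext, anti)
        return None if value is None else (not value)
    if kind == "T":
        name = sentence[1]
        if name in ext:
            return True
        if name in anti:
            return False
        return None
    raise ValueError(f"unknown sentence kind: {kind!r}")


def least_fixed_point(named):
    """Start with T applying to nothing and iterate until nothing changes."""
    ext, anti = set(), set()
    while True:
        new_ext = {n for n, s in named.items() if evaluate(s, ext, anti) is True}
        new_anti = {n for n, s in named.items() if evaluate(s, ext, anti) is False}
        if new_ext == ext and new_anti == anti:
            return ext, anti
        ext, anti = new_ext, new_anti


named = {
    "snow": ("atom", True),                # a T-free truth
    "t_snow": ("T", "snow"),               # 'snow' is true
    "not_t_snow": ("not", ("T", "snow")),  # 'snow' is not true
    "liar": ("not", ("T", "liar")),        # this sentence is not true
}

ext, anti = least_fixed_point(named)
grounded = ext | anti
print("true:", sorted(ext))
print("false:", sorted(anti))
print("ungrounded:", sorted(set(named) - grounded))
```

In this sketch the liar never enters either set, so it is ungrounded in the sense described above, while the sentences about `snow` settle after two rounds.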
The pattern we see here manifests itself, in fact, across all purported solutions to the liar paradox, showing them all to be deeply unsatisfactory for exactly the same reason.121 Neither is this an accident. There are underlying reasons as to why it must happen. We can put the matter in the form of a series of dilemmas. The liar and its kind arise, in the first place, as arguments in English. One who would solve the paradoxes must show that the semantic concepts of English involved are not, despite appearances, inconsistent; and it is necessary to show this for all such concepts, for they are all embroiled in contradiction. Attempts to do this employing the resources of modern logic all show how, for a given language, L, in some class of languages, to construct a theory TL, of the semantic notions of L, according to which they behave consistently. The first dilemma is posed by asking the question of whether TL is expressible in L. If the answer is ‘yes’, the liar heuristic always allows us to reformulate the paradox to generate inconsistency. Nor is this an accident. For since TL is expressible in L, and since, according to TL, things are consistent, we should be able to prove the consistency of TL in TL. And provided that TL is strong enough in other ways (for example, provided that it contains the resources of arithmetic, which it must if L is to be a candidate for English), then we know that TL is liable to be inconsistent by Gödel’s second incompleteness theorem. (Any theory of the appropriate kind that can prove its own consistency is inconsistent.) If the answer to the original question is ‘no’, then we ask a second question: is English (or at least the relevant part of it), E, one of the languages in the family being considered? If the answer to this is ‘yes’, then it follows that TE is not expressible in English, which is self-refuting, since the theorist has explained how to construct TE in English.
If, on the other hand, the answer to this question is ‘no’, then the original problem of showing that the semantic concepts of English are consistent has not been solved. Hence, all attempts to solve the paradox swing uncomfortably between inconsistency and a self-refuting inexpressibility. The problem, at root, is that English is, in a certain sense, over-rich. The semantic rules that govern notions such as truth over-determine the truth values of some sentences, generating contradiction. The only way to avoid this is to dock this richness in some way. But doing this just produces incompleteness, making it the case that it is no longer English that we are talking about.122
121 For detailed arguments, see Priest [1987, ch.1], and Priest [1995, Part 3].
122 Another move is possible at this point: an explicitly revisionary one. This concedes that the rules that govern ‘is true’ in English generate contradictions, but insists that the concept should be replaced by one governed by rules which do not do this. This was, in fact, Tarski’s own view, and was the spirit in which he offered the hierarchy of metalanguages. But why must we revise? If our notion of truth is inconsistent, does this just not show us that an inconsistent notion is perfectly serviceable? And if we must go in for some act of self-conscious conceptual revision, then a revision to a paraconsistent/dialetheic conceptual framework is clearly a possibility. The mere proposal of a consistent framework is not, therefore, enough: it must be shown to be superior. As we will see in the final part of this essay, this seems a rather hard task.
What we have seen is that the liar paradox and its kind are more than just
prima facie dialetheias. Attempts to show them to be only this, run into severe difficulties. At this point, a natural question is as follows: if consistent attempts to solve the paradoxes run into the problem of reformulated paradoxes, what about dialetheic solutions? In particular, if sentences may be both true and false, perhaps the bona fide truths are the ones that are just true. So what about:
λD : λD is not (true only)
If it is true it is also false. If it is false, it is true only. Hence it is true. Hence, it would seem to be true and false. But if it is true, it is not false. Hence it is true, false, and not false. We have certainly run into contradiction. But unlike consistent accounts of the paradox, this is hardly fatal. For the very point of a dialetheic account of the paradoxes is not to show that self-referential discourse about truth is consistent; precisely the opposite. This is a confirmation, not a refutation!
There is an important issue here, however. Though some contradictions are acceptable to a dialetheist, not all are, unless the dialetheist is a trivialist. Now there is an argument which purports to show that the T-schema entails not just some contradictions; it entails everything. In particular, suppose that the conditional involved in the T-schema satisfies both modus ponens and Contraction: α → (α → β) ⊢ α → β. Let α be any sentence, and consider the sentence:
λα : T⟨λα⟩ → α (if this sentence is true then α).
The T-schema gives:
T⟨λα⟩ ↔ (T⟨λα⟩ → α)
whence Contraction from left to right gives:
T⟨λα⟩ → α
whence modus ponens from right to left gives T⟨λα⟩. A final modus ponens delivers α. Arguments of this kind are usually called Curry paradoxes, after one of their inventors.
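The force of the Curry argument can be checked semantically in the classical two-valued case, where the material conditional satisfies both modus ponens and Contraction. The sketch below is not the proof-theoretic derivation given above. It simply enumerates classical valuations and confirms that every one satisfying the Curry instance of the T-schema makes α true. The variable `t` stands for T⟨λα⟩, and the helper names are illustrative assumptions.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Classical material conditional."""
    return (not p) or q


def curry_instance_holds(t: bool, alpha: bool) -> bool:
    """Does the valuation satisfy T⟨λα⟩ ↔ (T⟨λα⟩ → α)?

    `t` is the truth value assigned to T⟨λα⟩.
    """
    return t == implies(t, alpha)


# All classical valuations (t, alpha) that satisfy the Curry instance.
models = [
    (t, alpha)
    for t, alpha in product((True, False), repeat=2)
    if curry_instance_holds(t, alpha)
]

print(models)
print(all(alpha for _, alpha in models))
```

Since α was arbitrary, choosing a false α leaves no classical valuation at all, which is the semantic face of triviality. A conditional that invalidates Contraction is not bound by this truth-table reasoning.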
A dialetheic solution to the paradoxes therefore depends on endorsing a paraconsistent logic whose conditional does not satisfy Contraction.123 Paraconsistent logics whose positive parts are classical or intuitionistic, such as the positive-plus logics of 4.3, contain Contraction, and so are unsuitable. Even the stronger relevant logics in the vicinity of R endorse Contraction. But weaker relevant logics, in the vicinity of B, do not. It can be shown that a theory containing the T-schema and self-reference (even all of arithmetic), and based on a weaker relevant logic, though inconsistent, is non-trivial.
123 Or modus ponens, though this is a less easy position to defend. It has been defended by Goodship [1996].
It can be shown, in fact, that all the sentences
that are grounded in Kripke’s sense (and so contain only extensional connectives) behave consistently.124 We have yet to deal with the set-theoretic paradoxes, but before we turn to these, let us return to the issue of inconsistencies in philosophical theories.
5.5 The Limits of Thought
A few philosophers have endorsed explicitly contradictory theories. Many have endorsed theories that turned out to be accidentally inconsistent (accidental in the sense that the inconsistencies could be trimmed without fundamental change). But there is a third group of philosophers. These are philosophers who, though they could hardly be said to be dialetheists, yet endorsed theories that were essentially inconsistent: inconsistency lay at the very heart of their theories; it could not be removed without entirely gutting them. Such inconsistencies seem to occur, in particular, in the works of those philosophers who argue that there are limits to what can be thought, conceived, described. In the very act of theorising, they think, conceive, or describe things that lie beyond the limit. Thus, many philosophers have argued that God is so different from anything that people can conceive, that God is literally beyond conception or description. This has not prevented them from saying things about God, though; for example, in explaining why God is beyond conception.
A famous example of the same situation is provided by Kant in the first Critique. Kant espoused the distinction between phenomena (things that can be experienced) and noumena (things that cannot). Our categories of thought apply to the former (indeed, they are partly constitutive of them); but they cannot be applied to the latter (one reason for this: the criteria for applying each of the categories involves time, and noumena are not in time). In particular, then, one can say nothing about noumena, for to do so would be to apply categories to them. Yet Kant says much about noumena in the Critique; he explains, for example, why our categories cannot be applied to them.
Another famous example of the same situation is provided by Wittgenstein in the Tractatus. Propositions express the facts that constitute the world. They can do so because of a commonality of structure.
But such structure is not the kind of thing that propositions can be about (for propositions are about objects, and structure is not an object). One can say nothing, therefore, about this structure. Yet the Tractatus is largely composed of propositions that describe this structure, and ground the conclusion that it cannot be described.
124 The result was first proved for a version of set theory by Brady [1989]. Its adaptation to truth is spelled out in Priest [2002, Section 8].
None of the philosophers referred to above was very happy about this contradictory situation; and all tried to suggest ways in which it might be avoided. In theology, it was not uncommon to draw a distinction between positive and negative attributions, and to claim that only negative assertions can be made of God (via negativa), not positive. But not only is the positive/negative distinction hard
to sustain (to say, for example, that God is omnipotent is to say that God can do everything, which is positive; but it is equally to say that there is nothing that limits God’s power, which is negative), the very reasons for supposing that God is ineffable would clearly seem to be positive: ineffability arises because God’s characteristics exceed any human ones by an infinite amount. In the Critique, Kant tried to defuse the contradiction in a not dissimilar way, claiming that the notion of a noumenon had a merely negative, or limiting, function: it just serves to remind us that there are bounds to the applicability of our categories. But this does not actually address the issue, which is how we can possibly say anything at all about noumena; indeed, it makes matters worse by saying more things about them. And again, Kant says lots of things about noumena which go well beyond a simple assertion of this limiting function; for example, he defends free will on the ground that the noumenal self is not subject to causation. The issue was faced squarely in the Tractatus. Wittgenstein simply accepted that he could not really say anything about the structure of language or the world. The Tractatus, in particular, is mostly meaningless. But this is not at all satisfactory. Apart from the fact that we do understand what the propositions of the Tractatus say (and so they cannot be meaningless), if this were indeed so, we would have no ground for supposing that the propositions are meaningless, and so accepting Wittgenstein’s conclusions. (You would not buy a second-hand ladder from such a person.) None of the saving stratagems, then, is very successful. Nor is this surprising. For there is something inherently contradictory in the very project of theorising about limits of thought. In the very process, one is required to conceive or describe things that are on the other side, as Wittgenstein himself points out in the introduction to the Tractatus.
The contradiction concerned is therefore at the very heart of the project. It is no mere accidental accretion to the theory, but is inherent in its very problematic. If there are limits to thought, they are contradictory, by their very nature. Of course, one might reject the contradiction by rejecting the claim that there are things beyond the limit of thought. (This is exactly Berkeley’s strategy in his argument that everything can be conceived.) There is no God; or if there is, God is perfectly effable. Hegel argued that our categories are just as applicable to noumena as they are to phenomena.125 And in the introduction to the English version of the Tractatus, Russell argued that what could not be stated in the language of the Tractatus could be stated in a metalanguage for it. How successful these particular moves are, is another matter.
125 This is ironical, to a certain extent, since Hegel was a philosopher who was prepared to accept contradictions. But in this respect, the move takes Hegel out of the frying pan, and into the fire. For the move undercuts Kant’s solution to the Antinomies of Pure Reason, which contradictions must therefore be endorsed.
There certainly are general philosophical reasons for supposing there to be things beyond the limits of thought. The most definitive reasons for supposing this take us back to the semantical paradoxes of self-reference. There are so many objects that it
is impossible that all of them should have a name (or be referred to). There is, for example, an uncountable infinitude of ordinal numbers, but there is only a countable number of descriptions in English. Hence, there are many more ordinal numbers than can have names. In particular, to turn the screw, since the ordinal numbers are well-ordered, there is a least ordinal number that has no description. But we have just described it. Perhaps, it may be thought, something fishy is going on here with infinity. Historically, infinity has always, after all, been a notion with a question mark hanging over it. But similar paradoxes do not employ the notion of infinity. Given the syntactic resources of English, there is only a finite number of descriptions of some fixed length (say, less than 100 words) and, a fortiori, only a finite number of (natural) numbers that are referred to by them. But the number of numbers exceeds any finite bound. Hence, there are numbers that cannot be referred to by a description with fewer than 100 words. And again, there must be a least. This cannot be referred to; but we have just referred to it. These two paradoxes are well known. The first is König’s paradox; the second is Berry’s. They are semantic paradoxes of self-reference in the same family as the liar. We now see them in another light. They are paradoxes of the limits of thought; and contradiction is just what one should expect in such cases.126
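The counting step behind Berry’s paradox can be made explicit. The sketch below bounds the number of descriptions with fewer than 100 words by counting every sequence of words of that length. This overcounts, since most sequences are ungrammatical or refer to no number. The vocabulary size of 200,000 is an arbitrary assumption for illustration, as are the function and variable names.

```python
def descriptions_shorter_than(max_words: int, vocabulary_size: int) -> int:
    """Upper bound on descriptions with fewer than `max_words` words.

    Counts every sequence of 1 to max_words - 1 words drawn from a
    vocabulary of the given size.
    """
    return sum(vocabulary_size ** k for k in range(1, max_words))


# ASSUMPTION: an English vocabulary of 200,000 words (illustrative only).
bound = descriptions_shorter_than(100, 200_000)

# The numbers 0, 1, ..., bound are bound + 1 distinct natural numbers,
# but at most `bound` descriptions are available to refer to them.
numbers_considered = bound + 1

print(len(str(bound)))
print(numbers_considered > bound)
```

By the pigeonhole principle at least one number in the range 0 to `bound` has no description of fewer than 100 words. The least such number is the one that the phrase in the text then appears to describe.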
6 THE FOUNDATIONS OF MATHEMATICS
6.1 Introduction: a Brief History The development of modern logic has been intimately and inextricably connected with issues in the foundations of mathematics. Questions concerning consistency and inconsistency have been a central part of this. One might therefore expect paraconsistency to have an important bearing on these matters. Such expectations would not be disappointed. In this part we will see why. In the process, we will pick up the issue of the set-theoretic paradoxes left hanging in the previous section. Let us start with a brief synopsis of the relevant history.127 The nineteenth century was a time of great progress in the understanding of foundational matters in mathematics, matters that had been murky for a very long time. By the end of the century, the reduction of rational, irrational, and complex numbers to the natural numbers was well understood. The nature of the natural numbers still remained obscure. It was in this context that Frege and Russell proposed an analysis of the natural numbers (and thence of all numbers) in purely logical terms. A vehicle for this analysis needed to be built; the vehicle was classical logic. It was more than this, though; for what was also needed was a theory of extensions, or sets, which both Frege and Russell took to be part of logic. According to 126 The
issues of this section are discussed at much greater length in Priest [1995]. 127 Further details can be found in the articles on Frege, Russell, Hilbert, and Gödel in this Handbook.
Frege’s theory of extensions, the simplest and most obvious, every property has an extension. This is the unrestricted principle of set abstraction:
∀y(y ∈ {x; α(x)} ↔ α(y))
The schema looks to be analytic, and very much like a part of logic. The reduction was a very successful one... except that this theory of sets was found to be inconsistent. At first, the contradictions involved, discovered by Cantor, Burali-Forti and others, were complex, and it could be hoped that some error of reasoning might be to blame. But when Russell simplified one of Cantor’s arguments to produce his famous paradox, it became clear that contradiction lay at the heart of the theory of sets. Taking x ∉ x for α(x) gives:
∀y(y ∈ {x; x ∉ x} ↔ y ∉ y)
Now writing {x; x ∉ x} as r, and instantiating the quantifier with this, produces r ∈ r ↔ r ∉ r, and given some simple logical principles, such as the law of excluded middle or consequentia mirabilis, contradiction follows. In response to this, mathematicians proposed ways of placing restrictions on the abstraction principle which were strong enough to avoid the contradictions, but not so strong as to cripple standard set-theoretic reasoning, and particularly some version of the reduction of numbers to sets. How successful they were in this endeavour, we will return to in a moment. But the result for Frege and Russell’s logicist programme was pretty devastating. It became clear that, though the reduction of numbers to sets could be performed, the theory of sets employed could hardly be taken as a part of logic. Whilst the unrestricted abstraction schema could plausibly be taken as an analytic principle, the things that replaced it could not be seen in this way. Nor could this theory of sets claim any a priori obviousness or freedom from contradiction. This fact gave rise to another foundational programme, Hilbert’s.
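The last step of Russell’s argument, from r ∈ r ↔ r ∉ r to contradiction, can be checked by brute force over classical valuations. The following is a minimal sketch in Python, not part of the original argument: the helper names are illustrative, p stands for the sentence r ∈ r, and consequentia mirabilis is assumed here in the form (α → ¬α) → ¬α.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Classical material conditional."""
    return (not a) or b

def iff(a: bool, b: bool) -> bool:
    """Classical biconditional."""
    return a == b

def is_tautology(formula, arity: int) -> bool:
    """True if the formula holds under every classical valuation."""
    return all(formula(*vals) for vals in product([True, False], repeat=arity))

def is_satisfiable(formula, arity: int) -> bool:
    """True if the formula holds under at least one classical valuation."""
    return any(formula(*vals) for vals in product([True, False], repeat=arity))

# p stands for the sentence "r ∈ r" (an assumption of this sketch).
russell = lambda p: iff(p, not p)
excluded_middle = lambda p: p or (not p)
consequentia_mirabilis = lambda p: implies(implies(p, not p), not p)

russell_has_classical_model = is_satisfiable(russell, 1)
lem_is_classical_law = is_tautology(excluded_middle, 1)
cm_is_classical_law = is_tautology(consequentia_mirabilis, 1)
```

Under these assumptions the Russell biconditional has no classical valuation at all, while both auxiliary principles hold in every valuation, which is why classical logic cannot tolerate the instance.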
Hilbert thought that there were certain mathematical statements whose meanings were evident, and whose truth (when true) was also evident: finitary statements (roughly, numerical equations or truth-functional compounds thereof). Other sorts of statements, and especially those containing numerical variables, which he termed ideal, had no concrete meaning. We can reason employing such statements, but we can do so only if the reasoning does not contradict the finitary base. And since Hilbert took the underlying logic to be classical logic, and so explosive, what this meant was that the reasoning had to be consistent. Hence, it was necessary to prove the consistency of our formalisation of mathematics. Of course, a proof could have significance only if it was secure. Hence, the proof had to be carried out finitistically, that is, by employing only finitary statements. This was Hilbert’s programme.128 The programme was killed, historically, by Gödel’s famous incompleteness theorems. Gödel showed that in any consistent theory of arithmetic there are sentences 128 See
Hilbert [1925].
such that neither they nor their negations could be proved. Moreover, the consistency of the theory in question was one such statement. Hence, any consistent theory which includes at least finitary reasoning about numbers cannot have its consistency shown in the theory itself. To confound matters further, Gödel demonstrated that, given a theory that was intuitively sound, some of the sentences that could not be proved in it could, none the less, be shown to be true. Let us now turn to the issues of how paraconsistency bears on these matters and vice versa.
6.2 The Paradoxes of Set Theory
For a start, the set-theoretic paradoxes provide further arguments for dialetheism. The unrestricted abstraction schema is an almost irresistible principle concerning sets. Even those who deny it have trouble sticking to their official position. And if it is what it appears to be, an a priori truth concerning sets, then dialetheism is hard to resist. As mentioned above, set theorists tried to avoid this conclusion by putting restrictions on the abstraction schema. And unlike the corresponding situation for the semantic paradoxes, there is now some sort of orthodoxy about this. Essentially, the orthodoxy concerns Zermelo-Fraenkel set theory (ZF) and its intuitive model, the cumulative hierarchy. This model is the set-theoretic structure obtained by starting with the empty set, and applying the power-set operation iteratively. The construction is pursued all the way up the ordinals, collecting at limit ordinals. The instances of the abstraction schema that are true are the ones that hold in the hierarchy. That is, the sets postulated by the schema do not exist unless they are in the hierarchy.129 Notice that it is not contentious that the sets in the hierarchy exist. All may agree with that. The crucial claim is the one to the effect that there are no sets outside the hierarchy. Unfortunately, there seems to be no very convincing reason as to why this should be so. It is not the case, for example, that adding further instances of the abstraction schema must produce inconsistency. For example, one can postulate, quite consistently with ZF, the existence of non-well-founded sets (that is, sets, x0, such that there is an infinitely descending membership sequence x0 ∋ x1 ∋ x2 ∋ ...; there are no such sets in the hierarchy). Moreover, there are reasons as to why an insistence that there are no sets other than those in the hierarchy cannot be sustained. For a start, this is incompatible with mathematical practice.
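The iterative construction of the cumulative hierarchy described above can be computed directly for its first few stages. The sketch below is illustrative only: it builds the finite stages V_0, ..., V_4 with Python frozensets, says nothing about limit ordinals or the transfinite stages, and uses hypothetical function names.

```python
from itertools import combinations

def power_set(s: frozenset) -> frozenset:
    """Return the set of all subsets of s, each subset as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    )

def finite_stages(count: int) -> list:
    """Return [V_0, ..., V_{count-1}], where V_0 is empty and V_{k+1} = P(V_k)."""
    stages = [frozenset()]
    for _ in range(count - 1):
        stages.append(power_set(stages[-1]))
    return stages

stages = finite_stages(5)
sizes = [len(v) for v in stages]

# Each stage is included in the next one (the hierarchy is cumulative).
is_cumulative = all(stages[k] <= stages[k + 1] for k in range(len(stages) - 1))

# No stage computed here is a member of itself.
no_self_membership = all(v not in v for v in stages)
```

The stage sizes grow as 0, 1, 2, 4, 16, and at least in these finite stages nothing is a member of itself, in line with the remark that non-well-founded sets do not occur in the hierarchy.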
It is standard in category theory, in particular, to consider the category of all sets (or even all categories). Whatever else a category is, it is a collection of a certain kind. But the set of all sets in the hierarchy is not itself in the hierarchy. Indeed, if one supposes that there is such a set then, given 129 There are variations on the idea; for example, concerning whether or not to countenance proper classes (sub-collections of the whole hierarchy that cannot be members of anything); but these do not change the fundamental picture. In particular, all the arguments that follow can be reworked to apply to the collection of all classes (that is, sets or proper classes).
the other resources of ZF , contradiction soon ensues. More fundamentally, the insistence flies in the face of the Domain Principle. A version of this was first enunciated by Cantor. In a modern form, it is as follows: if statements quantifying over some totality are to have determinate sense, then there must be a determinate totality of quantification. The rationale for the Principle is simple: sentences that contain bound variables have no determinate sense unless the domain of quantification is determinate. Is it true, for example, that every quadratic equation has two roots? If we are talking about real roots, the answer is ‘no’; if we are talking about complex roots, the answer is ‘yes’. Now, statements of set theory have quantifiers that range over all sets, and, presumably, have a determinate sense. By the Domain Principle, the set of all sets must therefore be a determinate collection. But it is not a collection located in the hierarchy, as we have just noted.130 The orthodox solution to the paradoxes of set theory is therefore in just as much trouble as the plethora of solutions to the semantic paradoxes.
6.3 Paraconsistent Set Theory
In contrast with attempted consistent solutions to the set-theoretic paradoxes, a dialetheic approach simply endorses the unrestricted abstraction schema, and accepts the ensuing contradictions. But since it employs a paraconsistent consequence relation, these contradictions are quarantined. As with semantic paradoxes, not all paraconsistent logics will do what is required. For example, in a logic with modus ponens and Contraction, Curry paradoxes are quickly forthcoming. If α is any sentence, then the abstraction schema gives:
∀y(y ∈ {x; x ∈ x → α} ↔ (y ∈ y → α))
Now write {x; x ∈ x → α} as c, and instantiate the universal quantifier with it to obtain: c ∈ c ↔ (c ∈ c → α); then argue as in the semantic case. It was shown by Brady [1989] that when based on a suitable relevant logic that does not endorse Contraction (but which contains the law of excluded middle), set theory based on the unrestricted abstraction schema, though inconsistent, is non-trivial.131 Let us call this theory naive relevant set theory. The next obvious question in this context concerns how much standard set theory can be derived in naive relevant set theory. In particular, can the reduction of number theory to set theory be obtained? If it can, then the logicist programme looks as though it can be made to fly again; Frege and Russell are vindicated. With a qualification to which we will return in a moment, naive relevant set theory is sufficient for most workaday set theory, concerning the basic set-theoretic operations (unions, pairs, functions, etc.).132 As to whether it provides 130 There are various (unsatisfactory) ways in which one may try to avoid this conclusion. These are discussed in Priest [1995, ch. 11]. 131 Brady [1983] also showed that without the law of excluded middle, the theory is consistent. 132 Details can be found in Routley [1977].
for the essential parts of the theory of the transfinite, or for the reduction of number theory to set theory, no definitive answer can presently be given. What can be said is that the standard versions of many of the proofs concerned fail, since they depend on properties of the conditional not present in the underlying logic. Whether there are other proofs is not known. But the best guess is that for most of these things there probably are not. If this is the case, a big question clearly hangs over the acceptability of the theory. If it cannot accommodate at least the elements of standard transfinite set theory in some way, it would seem to be inadequate. Actually, the situation is more complex than I have so far indicated, due to considerations concerning extensionality. The natural identity condition for sets is coextensionality: two sets are the same if, as a matter of fact, they have the same members. That is:
∀x(α ≡ β) → {x; α} = {x; β}
where ≡ is the material biconditional (α ≡ β is (α ∧ β) ∨ (¬β ∧ ¬α)). But if one formulates the identity conditions of sets in naive relevant set theory in this way, trouble ensues. Let r be {x; x ∉ x}. We can show that r ∈ r ∧ r ∉ r. Hence, for any α, we have α ≡ r ∈ r, and so {x; α} = {x; r ∈ r}.133 Given standard properties of identity, it follows that all sets are identical. One way around this problem is to replace the ≡ in the identity conditions with an appropriate relevant biconditional ↔.134 But there is a cost. Let x̄ be the complement of x, {y; y ∉ x}. Then one can show that for any x and y, ¬∃z z ∈ x ∩ x̄, and ¬∃z z ∈ y ∩ ȳ. Thus, x ∩ x̄ and y ∩ ȳ are both empty; but one cannot show that they are identical, since arbitrary contradictions are not equivalent: it is not the case that (z ∈ x ∧ z ∉ x) ↔ (z ∈ y ∧ z ∉ y). One might think this not too much of a problem. After all, many people find a unique empty set somewhat puzzling. However, the problem is quite pervasive.
There are going to be many universal sets, for example, for exactly the same reason.135 The structure of sets is not, therefore, a Boolean algebra. Unsurprisingly, it is a De Morgan algebra.136 And assuming, as seems natural, that a universe of sets must have an underlying Boolean structure, this shows that using an intensional connective to state identity conditions is going to deliver a theory of some kind of entity other than sets.137 Extensionality lies deep at the heart of set theory. 133 Or
{x; α} = {x; r ∈ r ∧ x = x} if one does not like vacuous quantification. 134 This is how extensionality is stated in Brady’s formulation. 135 And quite generally, every set is going to be duplicated many times; for if τ is any contingent truth, the same things satisfy α(x) and α(x) ∧ τ. But it is not the case that α(x) ↔ (α(x) ∧ τ). 136 Indeed, as Dunn [1988] shows, if we add the assumption that there is a unique empty set and a unique universal set, the underlying logic collapses into classical logic. 137 Possibly properties, which are more naturally thought of as intensional entities. If we read set abstracts as referring to properties and ∈ as property instantiation then this problem does not arise, since there is no reason to expect a Boolean algebra. Note, also, that a naive theory of properties of this kind is not problematic if it is unable to deliver transfinite set theory. A dialetheic theory of properties is, in fact, quite unproblematic.
Can this fact be reconciled with a dialetheic account of sets? There is one way. Formulate the theory entirely in terms of material conditionals and biconditionals. Not only are these employed in the statement of identity conditions of sets, but they are also employed in the abstraction schema. After all, this is how it is done in ZF. Call this theory simply naive set theory. If one formulates set theory in this way, the argument that all sets are identical fails, since it requires a detachment for the material conditional (in effect, the disjunctive syllogism). Indeed, it is now an easy matter to show that there are models of the theory with more than one member. Such a move radically exacerbates the problem concerning the proof-theoretic power of the theory, however. Since the material conditional does not detach, the theory is very weak indeed. Fortunately, then, standard set theory may be interpreted in a different fashion. It can be shown that any model of ZF can be extended to a model of simply naive set theory.138 The original model is, in fact, a consistent substructure of the new model. Hence, there are models of naive set theory in which the cumulative hierarchy is a consistent sub-structure. And we may take the standard model (or models) of naive set theory to be such (a) model(s). In this way, classical set theory, and therefore all of classical mathematics, can be interpreted as a description of a consistent substructure of the universe of sets. This fact does nothing much to help logicism, however. In particular, one cannot argue that the principles of arithmetic are analytic, since, even if the axioms of set theory are analytic, the former have not been deduced from the latter.
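The claim above that the material conditional does not detach, and that this blocks the argument that all sets are identical, can be checked with three-valued truth tables. The sketch below assumes that the underlying paraconsistent logic is LP, with values coded as 0 (false only), 0.5 (both) and 1 (true only), and with 0.5 and 1 designated. The text does not fix the logic at this point, so that choice, like the function names, is an assumption made for illustration.

```python
from itertools import product

# LP truth values (assumed coding): 0.0 = false only, 0.5 = both, 1.0 = true only.
VALUES = (0.0, 0.5, 1.0)

def designated(v: float) -> bool:
    """A value is designated if it is at least true (true only, or both)."""
    return v >= 0.5

def neg(a: float) -> float:
    return 1.0 - a

def conj(a: float, b: float) -> float:
    return min(a, b)

def disj(a: float, b: float) -> float:
    return max(a, b)

def material_if(a: float, b: float) -> float:
    """Material conditional: not-a or b."""
    return disj(neg(a), b)

def material_iff(a: float, b: float) -> float:
    """Material biconditional: (a and b) or (not-a and not-b)."""
    return disj(conj(a, b), conj(neg(a), neg(b)))

def counterexamples(premises, conclusion, arity: int = 2) -> list:
    """Valuations where every premise is designated but the conclusion is not."""
    return [
        vals
        for vals in product(VALUES, repeat=arity)
        if all(designated(p(*vals)) for p in premises)
        and not designated(conclusion(*vals))
    ]

# Step 1: if "r ∈ r" is both true and false, then α ≡ (r ∈ r) holds for every α.
rho = 0.5
biconditional_always_holds = all(designated(material_iff(alpha, rho)) for alpha in VALUES)

# Step 2: detachment for the material conditional, and the disjunctive syllogism, both fail.
modus_ponens_failures = counterexamples(
    [lambda a, b: a, lambda a, b: material_if(a, b)], lambda a, b: b
)
disjunctive_syllogism_failures = counterexamples(
    [lambda a, b: disj(a, b), lambda a, b: neg(a)], lambda a, b: b
)
```

On this coding, the single counterexample to each inference is the valuation in which the first sentence is both true and false and the second is false only. That is exactly the situation created by a contradiction such as r ∈ r ∧ r ∉ r.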
6.4 Gödel’s Theorems
Let us now turn to Gödel’s incompleteness theorems. These concern theories that contain arithmetic, phrased in a standard first order language (with only extensional connectives). Without loss of generality, we can consider just arithmetic itself. A simple statement of Gödel’s first theorem says that any consistent theory of arithmetic is incomplete. This need not be disputed. Careless statements of the theorem often omit the consistency clause. What paraconsistency shows is that the clause is absolutely necessary. As we will see, there are complete but inconsistent theories of arithmetic.139 The existence of these follows from a general model-theoretic construction called the Collapsing Lemma. I will not go into all the formal details of this here, but the essential idea is as follows. Take any classical model, and consider any equivalence relation on its domain, that is also a congruence relation on the interpretations of the function symbols in the language. Now construct an LP interpretation by identifying all the elements in each equivalence class. Any predicate of the language is true of the elements identified if it is true of some one of them; and it is 138 See
Restall [1992]. 139 This was first demonstrated, in effect, by Meyer [1978]. The same paper shows that the non-triviality (though not the consistency) of a certain consistent arithmetic based on relevant logic may also be demonstrated within the theory itself. Further technical details of what follows can be found in Priest [2002, Section 9].
false if it is false of some one of them. The resulting interpretation is the collapsed interpretation; and the Collapsing Lemma states that anything true in the original interpretation is true in the collapsed interpretation. Hence, if the original interpretation is a model of some theory, so is the collapsed interpretation. Of course, it will be a model of other things as well. In particular, it will verify certain contradictions. Thus, for example, suppose that a and b are distinct members of an equivalence class. Then since a ≠ b was true before the collapse, it is true after the collapse. But since a and b have now been identified, a = b is also true. To apply this to the case at hand, take arithmetic to be formulated, as is usually done, in a first-order language containing the function symbols for successor, addition, and multiplication; and consider any model of the set of sentences in this language true in the standard model, maybe the standard model itself. It is easy to construct an appropriate equivalence relation, ∽, and apply the Collapsing Lemma to give an interpretation that is a model of an inconsistent theory containing classical arithmetic. For example, the following will do: for a fixed n, a ∽ b iff (a, b ≥ n) or (a, b < n and a = b). (This leaves all the numbers less than n alone, and identifies all the others.) To bring this to bear on Gödel’s theorem, choose an equivalence relation which makes the collapsed model finite. The one just mentioned will do nicely. Let T be the theory of the collapsed model (that is, the set of sentences true in it). Since what holds in a finite model is decidable (essentially by LP truth tables; quantifiers are equivalent to finite conjunctions and disjunctions), T is decidable. A fortiori, it is axiomatic. Hence, T is an axiomatic theory of arithmetic. It is inconsistent but complete. Let us turn now to the second incompleteness theorem.
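The collapse for a fixed n can be made concrete. The sketch below is an illustration under stated assumptions rather than the official formulation of the Collapsing Lemma: each number k < n stands for itself, the single element n stands for the class of all numbers ≥ n, and the LP value of an identity statement is represented as a set of classical truth values. All function names are hypothetical.

```python
def make_collapsed_model(n: int):
    """Collapse the natural numbers under
    a ~ b  iff  (a >= n and b >= n) or (a < n and b < n and a == b).

    Numbers below n represent themselves; n represents the class {n, n+1, ...}.
    """
    def rep(k: int) -> int:
        # Representative of the equivalence class of the natural number k.
        return min(k, n)

    def succ(x: int) -> int:
        return rep(x + 1)

    def add(x: int, y: int) -> int:
        return rep(x + y)

    def mul(x: int, y: int) -> int:
        return rep(x * y)

    def equals(x: int, y: int) -> set:
        """LP value of 'x = y' between representatives, as a set of classical values."""
        value = set()
        if x == y:
            value.add(True)    # some member of one class is identical to a member of the other
        if x != y or x == n:
            value.add(False)   # some member of one class is distinct from a member of the other
        return value

    return rep, succ, add, mul, equals


rep, succ, add, mul, equals = make_collapsed_model(3)

three_is_four = equals(rep(3), rep(4))                # 3 and 4 have been identified
one_is_zero = equals(rep(1), rep(0))                  # 1 and 0 are left alone
two_plus_two = equals(add(rep(2), rep(2)), rep(4))    # a classical truth, preserved by the collapse
one_plus_one = equals(add(rep(1), rep(1)), rep(2))    # entirely below n
```

With n = 3, the equation 3 = 4 comes out both true and false, and so does the classically true 2 + 2 = 4, while 1 = 0 is false only. The model therefore verifies some contradictions without verifying every equation. With n = 0 everything is identified and 1 = 0 becomes true as well.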
According to this, if a theory of arithmetic is consistent, the consistency of the theory cannot be proved in the theory itself. Inconsistent theories hardly bear on this fact. Classically, consistency and non-triviality are equivalent. Indeed, the canonical statement of consistency in these matters is a statement of non-triviality. In a paraconsistent logic the two are not equivalent, of course. T, for example, is inconsistent; but it is not trivial, provided that the equivalence relation is not the extreme one which identifies all elements of the domain (in the example of ∽ just given, provided that n > 0). The question of whether the non-triviality of an inconsistent but non-trivial theory can be proved in the theory itself is therefore a real one. And it can. Consider T. Since it is decidable, its membership relation is expressible in the language of arithmetic. That is, there is a formula of one free variable, π(x), such that for any sentence, α, if α ∈ T then π(α) is true, and if α ∉ T then ¬π(α) is true. (Here, the α occurring inside π(α) stands for the numeral of the code number of α.) Hence, by the Collapsing Lemma:
π-in: if α ∈ T then π(α) ∈ T
π-out: if α ∉ T then ¬π(α) ∈ T
(Of course, for some αs, π(α) and ¬π(α) may both be in T.) Then provided that the equivalence relation does not identify 1 and 0, 1 = 0 ∉ T, and so ¬π(1 = 0) ∈ T. Hence, T is non-trivial, and the statement expressing the non-triviality of T is provable in T. Gödel’s second incompleteness theorem does fail in this sense. We have not finished with Gödel’s theorem yet, but let us ask how these matters bear on the issue of Hilbert’s programme. Hilbert’s programme required that mathematics be formalised, and that the whole formalised theory be a conservative extension of the finitary part. Interestingly, Hilbert’s motivating considerations did not require the formalisation to be consistent (though since he assumed that the underlying logic was classical, this was taken for granted). Like all instrumentalisms, it does not matter what happens outside the core (in this case, the finitary) area. The point is that the extension be a conservative one over the core area. So the use of an inconsistent theory is quite compatible with Hilbert’s programme, in this sense. Does the construction we have been looking at provide what is required, then? Not exactly. First, as far as has been shown so far, it might be the case that both π(1 = 0) and ¬π(1 = 0) are in T. If this is the case, the significance of a non-triviality proof is somewhat moot. (It could be, though, that with careful juggling we can ensure that this is not the case.) More importantly, T is not a conservative extension of the true numerical equations. For since the model is finite, distinct numbers must have been identified. Hence, there are distinct m and n such that m = n ∈ T.140 There are certainly collapsed models where this is not the case. Suppose, for example, that we collapse a classical non-standard model of arithmetic, identifying some of the non-standard numbers, but leaving the standard numbers alone. Then the equational part of the theory of the collapsed model is consistent.
In this case, though, the collapsed model is not finite, so there is no guarantee that its theory is axiomatisable. Whether or not there are collapses of non-standard models of this kind where the theory of the collapsed model is axiomatisable, or there are other axiomatic inconsistent theories with consistent equational parts, is not known at present.
6.5 Gödel’s Paradox
As we have noted, paraconsistency does not destroy Gödel’s theorems provided that they are stated in the right way; and in particular, that the consistency clauses are spelled out properly. Otherwise, they fail. The theorems have been held to have many philosophical consequences. If consistency is simply taken for granted, paraconsistency entirely undercuts any such mooted consequence. But, it may be argued, we are interested only in true theories, and the inconsistent theories in question can hardly be true. This move is itself moot. Once dialetheism is taken on board, it cannot simply be assumed that any true mathematical theory is consistent, especially in areas where paradoxes play, such as set theory. But 140 There is a radical move that is possible here, though: to accept that the true equations are themselves inconsistent. See Priest [1994].
leave the flights of set theory out of this; what of arithmetic? Could it be seriously supposed that this is inconsistent? This brings us back to the version of Gödel’s theorem with which I ended the first section of this part. According to this, given any axiomatic and intuitively correct theory of arithmetic, there is a sentence that is not provable in the theory, but which we can yet establish as true by intuitively correct reasoning. The sentence is the famous undecidable sentence that “says of itself that it is not provable”; that is, a sentence, γ, of the form ¬π(γ).141 Now consider the canons of mathematical proof, those procedures whereby we establish mathematical claims as true. These are certainly intuitively correct, or we would not use them. They are not normally presented axiomatically; they are learned by mathematics students by osmosis. Yet it is reasonable to suppose that they are axiomatic. We are finite creatures; yet we can recognise, in principle, an infinite number of mathematical proofs when we see them. Hence, they must be generated by some finite set of resources. That is, they are axiomatic. In the same way, we can recognise an infinite number of grammatical sentences. Hence, these, too, must be generatable by some finite rule system, or our ability to recognise them would be inexplicable. Now consider the undecidable sentence, γ, for this system of proof. By the theorem, if the system is consistent, we cannot prove γ in it. But, again by the theorem, we can prove γ in an intuitively correct way. Hence, it must be provable in the system, since this encodes our intuitively correct reasoning. By modus tollens it follows that the system is inconsistent. Since this system encodes precisely our means of establishing mathematical claims as true, we have a new argument for dialetheism. What of the undecidable sentence? It is not difficult to see that it is provable. Let us use ⊢ as a sign for our intuitive notion of provability.
It is certainly intuitively correct that what is provable is true (indeed, this is analytic), i.e., for all α, ⊢ π(α) ⊃ α. In particular, then, ⊢ π(γ) ⊃ ¬π(γ). It follows that ⊢ ¬π(γ), i.e., ⊢ γ. Of course, since we have a proof of γ, we have also demonstrated that ⊢ π(γ), i.e., ⊢ ¬γ. Thus, the “undecidable” sentence is one of the contradictions in question. It is worth noting that if T is the formal system introduced in the last section, both γ and ¬γ are in T. For γ ∈ T or γ ∉ T. But in the latter case, ¬π(γ) ∈ T (by π-out), i.e., γ ∈ T anyway. But then π(γ) ∈ T (by π-in), i.e., ¬γ ∈ T. Hence T captures these aspects of our intuitive proof procedures admirably. At any rate, arithmetic is inconsistent, since we can prove certain contradictions to be true; and γ is one of them. In fact, dressed in the vernacular, γ is a very recognisable paradox, in the same family as the liar: this sentence is not provable. If it is provable, it is true, so not provable. Hence it is not provable. But then we have just proved it. We may call this Gödel’s paradox; it returns us to the discussion of semantic paradoxes in the last part. We see that there is a very intimate connection between these paradoxes, Gödel’s theorems, and dialetheism. 141 The theorem is proved explicitly in this form in Priest [1987, ch. 3], where the following argument is discussed at much greater length.
7 NEGATION
7.1 What is Negation?
We have now looked at the history of both paraconsistency and dialetheism. No account of these issues could be well-rounded, however, without a discussion of a couple of philosophical notions which are intimately related to both. One of these is rationality, which I will deal with in the next part. The other, which I will deal with in this part, is negation. This is a notion that we have been taking for granted since the start of the essay. Such a crucial notion clearly cannot be left in this state. So what is negation?142 A natural thought is that the negation of a sentence is simply one that is obtained by inserting the word ‘not’ at an appropriate point before the main verb (or by some similar syntactic construction in other languages). This, however, is not right. It may well be that the negation of:
1 Bessy is a cow
is:
1n Bessy is not a cow
But as Aristotle pointed out a long time ago143 the negation of:
2 Some cows are black
is not:
2′ Some cows are not black
but rather:
2n No cows are black
Worse, inserting a ‘not’ in a sentence often has nothing to do with negation at all. Consider, for example, the person who says: ‘I’m not British; I’m Scottish’ or ‘Australia was not established as a penal colony; it was established as a British territory using forced labour’. In both cases, the “notted” sentence is true, and the utterer would not suppose otherwise. What the ‘not’ is doing, as the second sentence in each pair makes clear, is rejecting certain (normal?) connotations of each first sentence. Linguists sometimes call this ‘metalinguistic negation’.144 What these examples show is that we have a grasp of the notion of negation, independent of any particular use of the word ‘not’, which we can use to determine 142 The
material in this section is discussed further in Priest [1999a]. 143 De Interpretatione, ch. 7. 144 See, e.g., Horn [1989, ch. 5], for an excellent discussion. In the context of logic, the terminology is clearly not a happy one.
when “notting” negates. We can see that this relationship holds between examples like 1 and 1n, and 2 and 2n, but not between 2 and 2′. This is the relationship between contradictories; let us call it the contradictory relation. We can, and of course modern logicians usually do, use a symbol, ¬, with the understanding that for any α, α and ¬α bear the contradictory relation to each other, but ¬ is a term of art.145 Perhaps its closest analogue in English is a phrase like ‘It is not true that’ (or equivalently, ‘It is not the case that’). But this is not exactly the same. For a start, it brings in explicitly the notion of truth. Moreover, these phrases can also be used as “metalinguistic” negations. Just consider: ‘It’s not true that he’s bad; he’s downright evil’. Negation, then, is the contradictory relation. But what relation is that? Different accounts of negation, and the different formal logics in which these are embedded, are exactly different theories which attempt to provide answers to this question. One may call these different notions of negation simply different negations if one wishes, but one should recall that what they are, really, are different conceptions of how negation functions. In the same way, different theories of matter (Aristotelian, Newtonian, quantum) provided different conceptions of the way that matter functions.
7.2 Theories of Negation
There are, in fact, many different theories as to the nature of negation. Classical logic and intuitionist logics give quite different accounts, as do many other modern logics. Indeed, we have already looked at a number of paraconsistent accounts of negation in Part 4. The existence of different theories of negation is not merely a contemporary phenomenon, however. There are different theories of negation throughout the history of logic. Let me illustrate this fact by looking briefly at three, one from ancient logic, one from (early) medieval logic, and one from (early) modern logic. The first account is Aristotle’s. First, Aristotle has to say which sentences are the negations of which. This, and related information, is encapsulated in what later came to be known as the square of opposition:
All As are Bs.        No As are Bs.
Some As are Bs.       Some As are not Bs.
The top two statements are contraries. The bottom two are sub-contraries. Formulas at the opposite corners of diagonals are contradictories, and each statement at the top entails the one immediately below it. The central claims about the properties of contradictories are to be found in Book 4 of the Metaphysics. As we have seen, Aristotle there defends the claim that negation satisfies the laws of non-contradiction and excluded middle: 145 The device goes back to Stoic logicians who simply prefixed the whole sentence with a ‘not’ (or at any rate its Greek equivalent). Medieval logicians often did the same, in Latin.
LEM □(α ∨ ¬α)
LNC ¬◇(α ∧ ¬α)
Further discussion of the properties of contradictories is found in De Interpretatione. Prima facie, Aristotle appears there to take back some of the Metaphysics account, since he argues that if α is a contingent statement about the future, neither α nor ¬α is true, prefiguring theories that contain truth-value gaps. There is, however, a way of squaring the two texts, and this is to read Aristotle as endorsing supervaluation of some kind.146 Even though α and ¬α may both be neither true nor false now, eventually, one will be true and the other will be false. Hence, if we look at things from an “eventual” point of view, where everything receives a truth value, α ∨ ¬α (and so its necessitation) is true. In this way, Aristotle can have his law of excluded middle and eat it too. Whether the texts can reasonably be interpreted in this way, I leave Aristotle scholars to argue about. Whatever one says about the matter, this is still only a part of Aristotle’s account of negation. It does not specify, for example, what inferential relations negations enter into.147 What are these according to Aristotle? The major part of his answer to this question is to be found in the theory of syllogistic. This tells us, for example, that ‘all As are Bs and no Bs are Cs’ entails ‘no As are Cs’. Scattered through the Organon are other occasional remarks concerning negation and inference. For example, Aristotle claims (Prior Analytics 57b 3) that contradictories cannot both entail the same thing. His argument for this depends on the claim that nothing can entail its own negation. Aristotle never developed these remarks systematically, but they were to be influential on the next theory of negation that we will look at. This was endorsed by medieval logicians including Boethius, Abelard, and Kilwardby.
It can be called the cancellation view of negation, since it holds that ¬α is something that cancels out α.148 As Abelard puts it:149

No one doubts that [a statement entailing its negation] is improper since the truth of any one of two propositions that divide truth [i.e., contradictories] not only does not require the truth of the other but rather entirely expels and extinguishes it.

146 For further discussion of supervaluation, see Priest [2001, 7.10].
147 To bring this point home, note that both □(α ∨ ¬α) and ¬◇(α ∧ ¬α) may well hold in a modal dialetheic logic.
148 For details, see Martin [1987] and Sylvan [2000].
149 De Rijk [1970, p. 290].

As Abelard observes, if negation does work like this then α cannot entail ¬α. For if it did, α would contain as part of its content something that neutralises it, in which event, it would have no content, and so entail nothing (or at least, nothing with any content). This principle, and related principles such as that nothing can entail
a sentence and its contradictory, are now usually called connexivist principles.150 Such principles were commonly endorsed in early medieval logic. Carried to its logical conclusion, the cancellation account would seem to imply something much stronger than any of the connexivist principles so far mentioned; namely, that a contradiction entails nothing (with any content). For since ¬α cancels out α, α ∧ ¬α has no content, and so entails nothing. This, of course, is inconsistent with Aristotle’s claim which we noted in 2.1, that contradictories sometimes entail conclusions and sometimes do not. So this is not Aristotle’s view. But some philosophers certainly took the account to its logical conclusion. Thus, Berkeley, when criticising the infinitesimal calculus in the Analyst, says:151 Nothing is plainer than that no just conclusion can be directly drawn from two inconsistent premises. You may indeed suppose anything possible: But afterwards you may not suppose anything that destroys what you first supposed: or if you do, you must begin de novo... [When] you ... destroy one supposition by another ... you may not retain the consequences, or any part of the consequences, of the first supposition destroyed. Despite the fact that this quotation comes from Berkeley, allegiance to the cancellation view of negation, and to the connexivist principles that it delivers, waned in the later middle ages.152 The third account of negation we will look at is Boole’s, as he explains it in the Mathematical Analysis of Logic.153 Boole’s starting point in his logical investigations was the theory of the syllogism. His aim was to express syllogistic premises as equations, and then to give algebraic rules for operating on these which draw out their consequences. To turn the syllogistic forms into equations, he invokes the extensions of the terms involved. 
Thus, if a is the set of things that are A, etc., appropriate translations are:

All As are Bs:        a(1 − b) = 0
No As are Bs:         ab = 0
Some As are Bs:       ab = ν
Some As are not Bs:   a(1 − b) = ν

Here, 1 is an appropriate universal class, so that 1 − b is the complement of b, 0 is the empty class, juxtaposition is intersection, and ν is an arbitrary non-empty class (necessary since Boole wants equations, not inequations).

150 For modern connexivism, see Priest [1999b].
151 Luce and Jessop [1951, p. 73].
152 The reason seems to be that a truth functional account of conjunction and disjunction gained ground at this time. This makes trouble for connexivist principles. For by truth functionality, α ∧ ¬α ⊢ α; so by contraposition ¬α ⊢ ¬(α ∧ ¬α). But α ∧ ¬α ⊢ ¬α. Hence, by transitivity, α ∧ ¬α ⊢ ¬(α ∧ ¬α). See Martin [1987] and Sylvan [2000].
153 The account given in the Laws of Thought is slightly different, but not in any essential ways.
Boole extends this machinery to a propositional logic. To do this, he thinks of propositions as the sorts of thing that may change their truth value from circumstance to circumstance.154 He can then think of ‘if X then Y’ as ‘all cases in which X is true are cases in which Y is true’: x(1 − y) = 0. Moreover, we may translate the other standard connectives thus:

X and Y:                    xy
X or Y:                     x + y
It is not the case that X:  1 − x

where + is union, which Boole takes to make sense only when x and y are disjoint. Boole thus conceives negation as complementation: the negation of X is that statement which holds exactly where X fails to hold.

It should be observed that none of the historical theories of negation that we have just looked at are the same as each other. As observed, according to Aristotle, contradictions may imply some things; whilst according to the cancellation account, strictly applied, they entail nothing. According to both Aristotle and cancellation, ‘if X then it is not the case that X’ is false, but under the Boolean interpretation this becomes: x(1 − (1 − x)) = 0. But x(1 − (1 − x)) = xx = x, and this is not equal to 0 in general.
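Boole's translations can be tried out on small finite classes. The following is only a rough sketch, assuming that classes are modelled as Python frozensets over a six-element universe; the names UNIVERSE, complement, product, all_are, no_are and if_x_then_not_x are illustrative and are not Boole's own notation.

```python
# Sketch only: Boole's class algebra modelled with finite Python sets.
# UNIVERSE and the sample classes are illustrative assumptions, not from the text.

UNIVERSE = frozenset(range(6))  # plays the role of Boole's 1
EMPTY = frozenset()             # plays the role of Boole's 0


def complement(x):
    """1 - x: the members of the universe that are not in x."""
    return UNIVERSE - x


def product(x, y):
    """xy: juxtaposition, read as intersection."""
    return x & y


def all_are(a, b):
    """'All As are Bs', translated as a(1 - b) = 0."""
    return product(a, complement(b)) == EMPTY


def no_are(a, b):
    """'No As are Bs', translated as ab = 0."""
    return product(a, b) == EMPTY


def if_x_then_not_x(x):
    """'If X then it is not the case that X': x(1 - (1 - x)) = 0."""
    return product(x, complement(complement(x))) == EMPTY


a = frozenset({0, 1})
b = frozenset({0, 1, 2})
c = frozenset({4, 5})

# 'All As are Bs' and 'no Bs are Cs' hold here, and so does 'no As are Cs'.
syllogism_holds = all_are(a, b) and no_are(b, c) and no_are(a, c)

# x(1 - (1 - x)) reduces to x, so the equation holds only for the empty class.
reduces_to_x = product(a, complement(complement(a))) == a
```

On this modelling, if_x_then_not_x(a) fails for the non-empty class a and succeeds only for the empty class, which matches the observation that x(1 − (1 − x)) = x is not equal to 0 in general.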
7.3 Other Negations
But which account of negation is correct? This is a substantial question, and I will return to it in the next part. Before we get to that, there are some other issues concerning negation that are worth noting.155

Let us suppose that some paraconsistent account of negation is correct. Other accounts are then incorrect, but it does not follow that they do not succeed in capturing other meaningful and important notions. For example, in both classical and intuitionist logic there is an absurdity constant, ⊥, such that for all β, ⊥ → β is a logical truth. Negation may then be defined as α → ⊥, where → is the appropriate conditional. Let us write this as −α. The constant ⊥ makes perfectly good sense from a dialetheic point of view. If T is the truth predicate then ⊥ may be defined as ∀xTx; the T-schema then does the rest. Thus, −α makes perfectly good sense for a dialetheist too. But since its properties are inherited from those of →, −α may behave in ways quite different from classical and intuitionist negation. For example, suppose that → is the conditional of some relevant logic.156 Then we have Explosion for −, since α, −α ⊢ ⊥ (by modus ponens), and so α, −α ⊢ β.157 Moreover, in logics like R that contain (α ∧ (α → β)) → β, we will have (α ∧ (α → ⊥)) → ⊥, i.e., −(α ∧ −α), a version of the law of non-contradiction. But in weaker logics, such as B, this will not be the case. And in none of these logics will one have α ∨ (α → ⊥), i.e., a version of the law of excluded middle.

Despite this, −α may well have useful properties. For example, let Λ be the set of all instances of the law of excluded middle, α ∨ ¬α. Then, as is well known, Λ ∪ Σ ⊢I α iff Σ ⊢C α. In other words, full classical logic may be used even by an intuitionist, in contexts in which the law of excluded middle may be assumed enthymematically. In a similar way, suppose that Ξ is the set of all instances of −(α ∧ ¬α). Then it is not difficult to show that for many paraconsistent consequence relations, ⊢, Ξ ∪ Σ ⊢ α iff Σ ⊢C α.158 Hence, full classical logic can be used even by a paraconsistent logician if this schema is enthymematically assumed. The schema is one way of expressing the fact that we are reasoning about a consistent situation.159

Another negation-like notion, †α, may be characterised by the classical truth conditions:

†α is true at a world, w, iff α is not true at w

and, if truth and falsity are independent:

†α is false at w iff α is true at w

It might be thought that these conditions will deliver a notion with the properties of classical logic, but whether this is so depends on the properties of the negation used in the truth conditions (printed in boldface). For example, suppose that we wish to establish Explosion for †. Then we need to establish that, for any world, w, if α and †α are true at w then β is true at w; i.e.:

if α is true at w and α is not true at w, β is true at w

Now, even given that not-(α is true at w and α is not true at w), and this may be true even if not is a paraconsistent negation, to infer what we want we need to invoke the inference not-γ ⊢ γ → δ. And we may well not be entitled to this.

154 In the Laws of Thought, this becomes from time to time.
155 The following material is covered in more detail in Priest [1999a].
156 In this context, ⊥ would usually be written as F, not to be confused with the constant f. See Anderson and Belnap [1975, p. 342f].
157 One may wonder, in virtue of this, what happens to the liar paradox, phrased in terms of −. The answer is that it transforms into a Curry paradox.
158 For in many such logics, adding the disjunctive syllogism is sufficient to recapture classical logic. Now suppose that we have ¬α and α ∨ β. Then it follows that (¬α ∧ α) ∨ β. But given that (α ∧ ¬α) → ⊥, and ⊥ → β, β follows by disjunction elimination.
159 A less heavy-handed way of recapturing classical logic is as follows. Suppose that one is employing the paraconsistent logic LP. (Similar constructions can be performed with some other paraconsistent logics.) Let an evaluation ν1 be more consistent than an evaluation ν2, ν1 ≺ ν2, iff every propositional parameter which is both true and false according to ν1 is both true and false according to ν2, but not vice versa. As usual, ν is a model of α if it makes α true; and ν is a model of Σ if it is a model of every member. ν is a minimally inconsistent model of Σ iff ν is a model of Σ and if µ ≺ ν, µ is not a model of Σ. α is a minimally inconsistent consequence of Σ iff every minimally inconsistent model of Σ is a model of α. The construction employed in this definition of consequence is a standard one in non-monotonic logic, and is a way of enforcing certain default assumptions. Specifically, in this case, it enforces the assumption of consistency. Things are assumed to be no more inconsistent than Σ requires them to be. Unsurprisingly, it is not difficult to show that if Σ is consistent then α is a minimally inconsistent consequence of Σ iff it is a classical consequence. Thus, assuming consistency as a default assumption, a paraconsistent logician can use classical logic when reasoning from consistent information. The original idea here is due to Batens [1989], who has generalised it into a much broader programme of adaptive logics. See Batens [1999; 2000].
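The definition of minimally inconsistent consequence in footnote 159 can be checked by brute force on small propositional examples. What follows is a minimal sketch, assuming the usual three-valued presentation of LP (true only, both, false only, with the first two designated) and an encoding of formulas as nested tuples; the numeric encoding and all function names are illustrative assumptions rather than anything fixed by the text.

```python
from itertools import product

# Sketch only. LP values: T = true only, B = both true and false, F = false only.
# A formula is a nested tuple: ("var", "p"), ("not", f), ("and", f, g), ("or", f, g).
T, B, F = 1.0, 0.5, 0.0


def value(formula, v):
    """Evaluate a formula under the evaluation v (a dict from names to values)."""
    op = formula[0]
    if op == "var":
        return v[formula[1]]
    if op == "not":
        return 1.0 - value(formula[1], v)
    if op == "and":
        return min(value(formula[1], v), value(formula[2], v))
    if op == "or":
        return max(value(formula[1], v), value(formula[2], v))
    raise ValueError(op)


def atoms(formula):
    """The propositional parameters occurring in a formula."""
    if formula[0] == "var":
        return {formula[1]}
    return set().union(*(atoms(sub) for sub in formula[1:]))


def models(premises, names):
    """All evaluations of the given parameters that make every premise true."""
    for vals in product((T, B, F), repeat=len(names)):
        v = dict(zip(names, vals))
        if all(value(p, v) >= B for p in premises):
            yield v


def glutty(v):
    """The parameters that v makes both true and false."""
    return {name for name, x in v.items() if x == B}


def names_of(premises, conclusion):
    return sorted(set().union(atoms(conclusion), *(atoms(p) for p in premises)))


def lp_consequence(premises, conclusion):
    names = names_of(premises, conclusion)
    return all(value(conclusion, v) >= B for v in models(premises, names))


def mi_consequence(premises, conclusion):
    """Every minimally inconsistent model of the premises models the conclusion."""
    ms = list(models(premises, names_of(premises, conclusion)))
    # v is minimal iff no model is strictly more consistent (proper subset of gluts).
    minimal = [v for v in ms if not any(glutty(u) < glutty(v) for u in ms)]
    return all(value(conclusion, v) >= B for v in minimal)


p, q = ("var", "p"), ("var", "q")
disjunctive_syllogism = ([("not", p), ("or", p, q)], q)
explosion = ([p, ("not", p)], q)
```

Under these assumptions, the disjunctive syllogism fails as an LP consequence but holds as a minimally inconsistent consequence, since its premises are consistent; the inference from p and ¬p to q fails on both definitions.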
One issue to which this is relevant is that of a dialetheic solution to the paradoxes of self-reference. For if there is a legitimate notion, say ∗1 , that behaves like classical negation (whether or not it really is negation) then the T -schema cannot be endorsed, as required by a dialetheic account. If it were, and given self-reference, we could simply apply the schema to a sentence, λ, of the form ∗1 T λ, to obtain T λ ∧ ∗1 T λ. Explosion would then give triviality. What we have seen is that there is no way that † can be shown to satisfy Explosion without assuming that the notion of negation appropriate in stating truth conditions itself satisfies certain “paradoxical” conditions. A dialetheist may simply deny this. The properties of a connective depend not just on its truth conditions, but on what follows from these; and this depends, of course, on the underlying logic. But can we not ensure that a connective, ∗1 , has all the properties of classical negation, including Explosion, by simply characterising it as a connective that satisfies the classical proof-theoretic rules of negation? No. As was shown by Prior [1960], there is no guarantee that characterising a connective by an arbitrary set of rules succeeds in giving it meaning. Prior’s example was a supposed connective, ∗2 (tonk), satisfying the rules α ⊢ α ∗2 β, α ∗2 β ⊢ β. Clearly, given ∗2 , one could infer anything from anything. It is clear, then, that ∗2 must lack sense, on pain of triviality. But a connective, ∗1 , possessing all the properties of classical negation equally gives rise to triviality, and so must lack sense. The triviality argument is essentially the liar argument concerning ∗1 just given. It is true that this argument invokes the T -schema, and that that schema is not included in standard logical machinery. 
But if a dialetheic account of truth is correct, the instances of the schema are logical truths concerning the truth predicate, just as much as the instances of the substitutivity of identicals are logical truths concerning the identity predicate. The T -schema ought, then, to be considered part of logic.
7.4 Denial
The other issue connected with negation that needs discussion is denial. Let me start by explaining what I mean by the word here. Speech acts are of many different kinds (have different illocutory forces): questioning, commanding, exhorting, etc. Perhaps the most fundamental kind of act is asserting. When a person asserts that α their aim is to get the hearer to believe that α, or at least, to believe that the speaker believes that α.160

160 With such Gricean refinements as seem fit.

Denial is another kind of speech act. When a person denies that α their aim is to get the hearer to reject (refuse to accept) α, or at least, to believe that the speaker rejects α. There was a long-standing confusion in logic, going all the way back to Aristotle, concerning assertion. The word was used to mean both the act of uttering and the content of what was uttered. A similar confusion beset the notion of denial. These confusions were finally laid to rest by Frege. And, said Frege, once this confusion is remedied, we may dispense with a sui generis notion of acts of denial. To deny
is simply to assert a sentence containing negative particles.161 This conclusion is certainly not required by enforcing the distinction between act and content, however; and, in fact, is false. For a start, one can deny without asserting a sentence with a negative particle: ‘England win the world cup? Get real.’ Perhaps less obviously, one can also assert a sentence containing a negative particle without denying. The existence of “metalinguistic” negation makes this patent, but the point stands even without that. For example, when a dialetheist asserts ‘The liar sentence is true; the liar sentence is not true’, the second utterance is not meant to convey to the hearer the fact that the dialetheist rejects the first sentence: after all, they do accept it. The second sentence conveys the fact that they accept its negation too. The issue does not depend in any essential way on dialetheism. Many people have inconsistent views (about religion, politics, or whatever). Sometimes they come to discover this fact by saying inconsistent things, perhaps under some probing questioning. Thus, for some α they may utter both α and ¬α. The second utterance is not an indication that the speaker rejects α. They do accept α. They just accept ¬α as well, at least until they revise their views. (If they did not accept α, there would be no need to revise their views.) Denial, then, is a linguistic act sui generis. This does not mean that uttering a sentence with a negative particle is never an act of denial; it certainly can be. You say to me ‘Truth is a consistent notion’; I say ‘It certainly is not’. What I am signalling here is exactly my rejection of what you say, and maybe trying to get you to revise your views in the process. Sometimes, then, an utterance containing a negative particle is an assertion; sometimes it is a denial. This is not an unusual situation. The very same words can often (if not always) be used in quite different speech acts. I say ‘the door is open’. 
Depending on the context, this could be an assertion, a command (to close it), or even a question. Of course, this raises the question of how one determines the illocutory force of an utterance. The short answer is that the context provides the relevant information. The long answer is surely very complex. But it suffices here that we can do it, since we often do.162

161 See Frege [1919].
162 Some, e.g., Parsons [1990], have objected to dialetheism on the ground that if it were true, it would be impossible for anyone to rule anything out, since when a person says ¬α, it is perfectly possible for them to accept α anyway. If ruling out means denying, this is not true, as we have just seen. And that’s a denial.

It might be thought that the notion of denial provides a route back into a classical account of negation. If we write ⊣ α to indicate a denial of α, then won’t ⊣ behave in just this way? Not at all. For a start, ⊣ is a force operator: it applies only to whole sentences; it cannot be embedded. Thus, α ↔ ⊣ α, for example, is a nonsense. But could there not be some operator on content, say ∆, such that asserting ∆α is the same as denying α? Perhaps ‘I deny that’ is a suitable candidate here. If this is the case, ∆ behaves in no way like classical negation. It is certainly not a logical truth, for example, that α ∨ ∆α: α may be untrue, and I may simply keep my mouth shut. α ∧ ∆α may also be true: I may deny a truth.
For just this reason, the inference from α and ∆α to an arbitrary β is invalid. Is there not an operator on content, ∆, such that assertions of α and ∆α commit the utterer to everything? Indeed there is. Take for ∆ the negation-like operator, −, of the previous section. As we saw there, this will do the trick. But as we saw there, − does not behave like classical negation either.163

8 RATIONALITY

8.1 Multiple Criteria
Let us now turn to the final issue intimately connected with paraconsistency and dialetheism: rationality. The ideology of consistency is so firmly entrenched in orthodox western philosophy that it has been taken to provide the cornerstone of some of its most central concepts: consistency has been assumed to be a necessary condition for truth, (inferential) validity, and rationality. Paraconsistency and dialetheism clearly challenge this claim in the case of validity and truth (respectively). What of rationality? How can this work if contradictions may be tolerated? In articulating a reply to this question, the first thing to note is that consistency, if it is a constraint on rationality, is a relatively weak one. Even the most outrageous of views can be massaged into a consistent one if one is prepared to make adjustments elsewhere. Thus, consider the claim that the earth is flat. One can render this consistent with all other beliefs if one accepts that light does not travel in straight lines, that the earth moves in a toroid, that the moon landing was a fraud, etc.164 It is irrational for all that. There must therefore be other criteria for the rationality of a corpus of belief. What these are, philosophers of science argue about. All can agree that adequacy to the data (whatever form that takes) is one criterion. Others are more contentious. Simplicity, economy, unity, are all standardly cited, as are many different notions.165 Sorting out the truth in all this is, of course, an important issue for epistemology; but we do not need to go into the details here. As long as there is a multiplicity of criteria, they can come into conflict. One theory can be simple, but not handle all the data well; another can be more complex, with various ad hoc postulations, but give a more accurate account of the data.166 In such cases, which is the rationally acceptable theory? Possibly, in some cases, there may be no determinate answer to this question. 
Rationality may be a vague notion, and there may well be situations in which rational people can disagree. However, it seems reasonable to hold that if one theory is sufficiently better than all of its competitors on sufficiently many of the criteria, then, rationally, one should believe this rather than the others.167 That is the way that things seem to work in the history of science, anyway. In disputes in the history of science, it is rare that all the indicators point mercilessly in the same direction. Yet a new view will often be accepted by the scientific community even though it has some black marks.

163 An assertion of −α would normally be a denial of α, but it need not be: a trivialist would assert −α without rejecting α.
164 See the works of the Flat Earth Society. At the time of writing, these can be accessed at: http://www.flat-earth.org/platygaea/faq.mhtml.
165 For various lists, see Quine and Ullian [1970, ch. 5]; Kuhn [1977]; Lycan [1988, ch. 7].
166 For example, the relationship between late 19th Century thermodynamics and the early quantum theory of energy was like this.
8.2 Rationality and Inconsistency

The theory of rationality just sketched, nugatory though it be, is sufficient to show how rationality works in the presence of inconsistency. In particular, it suffices to show how inconsistent beliefs can be rational. If inconsistency is a negative criterion for rationality, it is only one of many, and in particular cases it may be trumped by performance on other criteria. This is precisely what seems to have happened with the various inconsistent theories in the history of science and mathematics that we noted in 5.1. In each case, the explanatory power of the inconsistent theory well outweighed its inconsistency. Of course, in each of these cases, the inconsistent theory was eventually replaced by a consistent theory.168 But in science, pretty much every theory gets replaced sooner or later. So this is nothing special.

One may even question whether inconsistency is really a negative criterion at all. (That people have usually taken it to be so is not in dispute.) Consistency, or at least a certain amount of it, may well be required by other criteria. For example, if the theory is an empirical one, then adequacy to observational data is certainly an important criterion. Moreover, if α describes some observable situation, we rarely, if ever, see both α and ¬α. Empirical adequacy will therefore standardly require a theory to be consistent about observable states of affairs.169 The question is whether consistency is a criterion in its own right. This raises the hard question of what makes something a legitimate criterion. Different epistemologies will answer this question in different ways. For example, for a pragmatist, the only positive criteria are those which promote usefulness (in some sense). The question is therefore whether a consistent theory is, per se, more useful than an inconsistent one (in that sense). For a realist, on the other hand, the positive criteria are those which tend to select theories that correctly describe the appropriate external reality. The question is therefore whether we have some (perhaps transcendental) reason to believe that reality has a low degree of inconsistency. These are important questions; but they are too complex, and too tangential to the present issues, to be pursued here.170

167 This is vague, too, of course. One way of tightening it up can be found in Priest [2001b].
168 Well, this can be challenged in the case of modern quantum theory, which dallies with inconsistent notions, such as the Dirac δ-function, and is generally agreed to be inconsistent with the Theory of Relativity.
169 For further discussion, see Priest [1999c].
170 It might be suggested that whatever the correct account, inconsistency must be a negative criterion. Why else would we find paradoxes, like the liar, intuitively unacceptable? The answer, of course, is that we mistakenly took consistency to be a desideratum (perhaps under the weight of the ideology of consistency).

The theory of rationality just sketched shows not only how and when it is rational to accept an inconsistent theory, but how and when it is rational to give it up: the theory is simply trumped by another theory, consistent or otherwise. A frequent objection to paraconsistency and dialetheism is that if they were correct, there could never be any reason for people to reject any of their views. For any objection to a view establishes something inconsistent with it; and the person could simply accept the original view and the objection.171 Now, it is not true that objections always work in this way. They may work, for example, by showing that the position is committed to something unacceptable to its holder. And many consistent consequences are more unacceptable than some inconsistent ones. That you are a poached egg, for example, is a much more damaging consequence than that the liar sentence is both true and false. But even waiving this point, in the light of the preceding discussion, the objection is clearly incorrect. To accept the theory plus the objection is to accept an inconsistent theory. And despite paraconsistency, this may not be the rational thing to do. For example, even if inconsistency is not, per se, a negative mark, accepting the objection may be entirely ad hoc, and thus make a mess of simplicity.

8.3 The Choice of Logic

Let us now return to the question raised but deferred in the last part: which account of negation is correct? As I argued there, accounts of negation are theories concerning a certain relation. More generally, a formal logic (including its semantics) is a theory of all the relations it deals with, and, crucially, the relation of logical consequence. Now, the theory of rational belief sketched above was absolutely neutral as to what sort of theory it was whose belief was in question. The account can be applied to theories in physics, metaphysics, and, of course, logic. Thus, one determines the correct logic by seeing which one comes out best on the standard criteria of theory-choice.172

To see how this works, let me sketch an argument to the effect that the most rational logical theory to accept (at present) is a dialetheic one. Given the rudimentary nature of the theory of rationality I have given, and the intricacies of a number of the issues concerned, this can be no more than a sketch; but it will at least illustrate the application of the theory of rationality to logic itself.

171 Versions of the objection can be found in Lewis [1982, p. 434], and Popper [1963, p. 316f].
172 The view of logic as a theory, on a par with all other theories, is defended by Haack [1974, esp. ch. 2]. She dubs it the ‘pragmatist’ view, though the name is not entirely happy, since the view is compatible, e.g., with orthodox realism concerning what theory is, in fact, true. Haack also accepts Quine’s attack on the analytic/synthetic distinction. But the view is quite compatible with the laws of logic being analytic. We can have theories about what is analytic as much as anything else.

First, one cannot isolate logic from other subjects. The applications of logic spread to many other areas in metaphysics, the philosophy of language, and elsewhere. No logic, however pretty it is, can be considered acceptable if it makes a
hash of these. In other words, one has to evaluate a logic as part of a package deal. In particular, one cannot divorce logic and truth: the two are intimately related. Thus, to keep things (overly) simple, suppose we face a choice between classical logic plus a consistent account of truth, and a paraconsistent logic plus an account of truth that endorses the T -schema, and is therefore inconsistent. Which is preferable? First, perhaps the most crucial question concerns the extent to which each theory is adequate to the data, which, in this case, comprises the intuitions we have concerning individual inferences. A consistent account fares badly in this area, at least with respect to the inferences enshrined in the T -schema, which certainly appear to be valid.173 It may be replied that in other areas the advantages are reversed. For a paraconsistent logic is weaker than classical logic; and hence a paraconsistent logic cannot account for a number of inferences, say those used in classical mathematics, for which classical logic can account. But as we saw in 7.3, a paraconsistent logic can account for classical reasoning in consistent domains. The inferences might not be deductively valid; they might, on this account, be enthymematic or non-monotonic; but at least their legitimate use is explained. What of the other criteria? Perhaps the most important of these is simplicity. As far as truth goes, there is no comparison here. There are many consistent accounts of truth (we looked at two in 5.4), and they are all quite complex, involving (usually transfinite) hierarchies, together with a bunch of ad hoc moves required to try to avoid extended paradoxes (the success of which is, in any case, moot, as we saw in 5.4). By contrast, a naive theory of truth, according to which truth is just that notion characterised by the T -schema, is about as simple as it is possible to be. 
Again, however, it may be replied that when it comes to other areas, the boot is on the other foot. Classical logic is about as simple as it is possible to be, whilst paraconsistent logics are much more complex, and contain unmotivated elements such as ternary relations. But this difference starts to disappear under scrutiny. Any adequate logic must be at least a modal logic. After all, we need a logic that can account for our modal inferences. But now compare a standard modal logic to a relevant logic, and consider, specifically, their world semantics.

173 There are many other pertinent inferences, especially concerning the conditional. A relevant paraconsistent logic certainly out-performs classical logic in this area as well.

There are two major differences between the semantics of a standard modal logic and the world semantics of a relevant logic. The first is that the relevant semantics has a class of logically impossible worlds, over and above the possible worlds of the modal logic. But there would seem to be just as good reason to suppose there to be logically impossible worlds as to suppose there to be physically impossible worlds. Indeed, we would seem to need such worlds to complete all the jobs that possible worlds are fruitfully employed in. For example, if propositional content is to be understood in terms of worlds, then we need impossible worlds: someone who holds that the law of excluded middle fails has a different belief from someone
who holds that the law of distribution fails. If worlds are to be used to analyse counter-factual conditionals, we need logically impossible worlds: merely consider the two conditionals: if intuitionist logic were correct, the law of excluded middle would fail (true); if intuitionist logic were correct, the law of distribution would fail (false). And so on. Or, to put it another way, since any adequate logic must take account of propositional content, counter-factuals, and so on, if impossible worlds are not used to handle these, some other technique must be; and this is likely to be at least as complex as employing impossible worlds. True, Routley/Meyer semantics also employ a ternary relation to give the truth conditions of conditionals at impossible worlds, and the interpretation of this relation is problematic. But a perfectly good relevant logic can be obtained without employing a ternary relation, simply by assigning conditionals arbitrary truth values at non-normal worlds, as I noted in 4.5.174 The other major difference between standard world-semantics for modal logics and relevant semantics brings us back to negation. Standard world semantics employ classical negation; relevant semantics employ some other notion. But the simplest relevant account of negation is the four-valued one of 4.4.175 This is exactly the same as the classical account in its truth and falsity conditions: ¬α is true (at a world) iff α is false (at that world), and vice versa. The only difference between the two accounts is that the classical one assumes that truth and falsity are exclusive and exhaustive, whilst the four-valued account imposes no such restrictions. This is hardly a significant difference in complexity. And if anything, it is the classical account which is more complex, since it imposes an extra condition. 
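The comparison between the four-valued and classical accounts of negation can be made concrete at a single world. This is only a rough sketch, assuming a value is represented as a pair recording whether a sentence is true and whether it is false; the names FOUR, CLASSICAL, neg and explosion_counterexamples are illustrative, and worlds and conditionals are not modelled at all.

```python
from itertools import product

# Sketch only: a value is a pair (is_true, is_false). The names FOUR,
# CLASSICAL and explosion_counterexamples are illustrative assumptions.
FOUR = list(product((True, False), repeat=2))
# The classical account adds one condition: truth and falsity are
# exclusive and exhaustive, so exactly one component holds.
CLASSICAL = [v for v in FOUR if v[0] != v[1]]


def neg(v):
    """¬α is true iff α is false, and false iff α is true."""
    is_true, is_false = v
    return (is_false, is_true)


def explosion_counterexamples(values):
    """Pairs (a, b) where a and neg(a) are both true but b is not true."""
    return [(a, b) for a in values for b in values
            if a[0] and neg(a)[0] and not b[0]]
```

The same neg function serves both accounts; the only difference is the filter that defines CLASSICAL. On this sketch that filter is what rules out counterexamples to Explosion, which is one way of seeing the classical account as the four-valued one plus an extra condition.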
There may well, of course, be other criteria relevant to a choice between the two positions we have been discussing.¹⁷⁶ There may equally be other areas in which one would wish to compare the performances of the two positions.¹⁷⁷ But at least according to the preceding considerations, a paraconsistent logic plus dialetheism about truth comes out well ahead of an explosive and consistent view. Indeed, there are quite general considerations as to why this is always likely to be the case. Anything classical logic can do, paraconsistent logic can do too: classical logic is, after all, just a special case. But paraconsistent logic has extra resources that allow it to provide a natural solution to many of the nagging problems of classical logic. It is the rational choice.

174 This gives a relevant logic slightly weaker than B. See Priest [2001, ch. 9].
175 This is not the account that is employed in the usual Routley/Meyer semantics, which is the Routley ∗. But there are perfectly good relevant logics that use the four-valued account of negation; they are just not the usual ones. See Priest [2001, ch. 9].
176 The one criterion on which the inconsistent approach clearly does not come out ahead is conservatism, which some people take to be a virtue. Conservativeness is a highly dubious virtue, however. Rationality should not reflect elements of luck, such as who got in first.
177 Another important issue arises here. Is there a uniquely correct logic for reasoning about all domains (logical monism); or is it the case, as some have recently argued, that different domains of reasoning require different logics (logical pluralism)? For a discussion of these issues, with appropriate references, see Priest [2001c].
8.4 Conclusion
Of course, that is merely how things stand (as I see it) at the moment. The determination of the correct logic is a fallible and revisable business. It may well happen that what it is rational to believe about these matters will change as new theories and new pieces of evidence appear. Indeed, revision is to be expected historically: our logical theories have often been revised in the light of new developments. In contemporary universities, logic is often taught in an ahistorical fashion, which induces a certain short-sightedness and a corresponding dogmatism. A knowledge of the history of logic, as displayed in this volume, and the others in the series, should engender not only a sense of excitement about the development of logic, but a certain humility about our own perspective.¹⁷⁸

178 A version of this essay was given in a series of seminars at the University of St Andrews in Michaelmas term, 2000. I am grateful to those participating for many helpful comments and criticisms, but especially to Andrés Bobenrieth, Roy Cook, Agustín Rayo, Stephen Read, Stewart Shapiro, John Skorupski, and Crispin Wright. The essay was finished in 2001. It has not been updated in the light of subsequent developments.

BIBLIOGRAPHY

[Ackermann, 1956] W. Ackermann. Begründung einer Strengen Implikation, Journal of Symbolic Logic 21, 113-28, 1956.
[Anderson and Belnap, 1958] A. R. Anderson and N. D. Belnap. A Modification of Ackermann’s “Rigorous Implication” (abstract), Journal of Symbolic Logic 23, 457-8, 1958.
[Anderson and Belnap, 1975] A. R. Anderson and N. D. Belnap. Entailment; the Logic of Relevance and Necessity, Vol. I, Princeton: Princeton University Press, 1975.
[Anderson et al., 1992] A. R. Anderson, N. D. Belnap, and J. M. Dunn. Entailment; the Logic of Relevance and Necessity, Vol. II, Princeton: Princeton University Press, 1992.
[Asenjo, 1966] F. G. Asenjo. A Calculus for Antinomies, Notre Dame Journal of Formal Logic 16, 103-5, 1966.
[Ashworth, 1974] E. J. Ashworth. Language and Logic in the Post-Medieval Period, Dordrecht: Reidel Publishing Company, 1974.
[Barnes, 1984] J. Barnes. The Complete Works of Aristotle, Princeton: Princeton University Press, 1984.
[Batens, 1989] D. Batens. Dynamic Dialectical Logic, ch. 6 of [Priest, Routley, and Norman, 1989].
[Batens, 1999] D. Batens. Inconsistency-Adaptive Logics, pp. 445-72 of E. Orlowska (ed.), Logic at Work: Essays Dedicated to the Memory of Helena Rasiowa, Heidelberg: Physica Verlag (Springer), 1999.
[Batens, 2000] D. Batens. A Survey of Inconsistency-Adaptive Logics, pp. 49-73 of D. Batens, C. Mortensen, G. Priest, and J.-P. Van-Bendegem (eds.), Frontiers of Paraconsistent Logic, Baldock: Research Studies Press, 2000.
[Bocheński, 1963] I. M. Bocheński. Ancient Formal Logic, Amsterdam: North Holland Publishing Company, 1963.
[Bochvar, 1939] D. Bochvar. On a Three-Valued Calculus and its Applications to the Analysis of Contradictions, Mathematičéskij Sbornik 4, 287-300, 1939.
[Boh, 1982] I. Boh. Consequences, ch. 15 of N. Kretzman, A. Kenny, and J. Pinborg (eds.), The Cambridge History of Later Medieval Logic, Cambridge: Cambridge University Press, 1982.
[Brady, 1983] R. Brady. The Simple Consistency of a Set Theory Based on CSQ, Notre Dame Journal of Formal Logic 24, 431-9, 1983.
[Brady, 1989] R. Brady. The Non-Triviality of Dialectical Set Theory, ch. 16 of [Priest, Routley, and Norman, 1989].
[Brady, 2003] R. Brady, ed. Relevant Logics and their Rivals, Vol. II, Aldershot: Ashgate, 2003.
[Brown and Schotch, 1999] B. Brown and P. Schotch. Logic and Aggregation, Journal of Philosophical Logic 28, 265-87, 1999.
[Church, 1951] A. Church. The Weak Calculus of Implication, pp. 22-37 of A. Menne, A. Wilhelmy, and H. Angsil (eds.), Kontrolliertes Denken, Untersuchungen zum Logikkalkül und zur Logik der Einzelwissenschaften, Munich: Kommissions-Verlag Karl Alber, 1951.
[Costa, 1974] N. C. A. da Costa. The Theory of Inconsistent Formal Systems, Notre Dame Journal of Formal Logic 15, 497-510, 1974.
[Costa and Alves, 1977] N. C. A. da Costa and E. Alves. A Semantical Analysis of the Calculi Cₙ, Notre Dame Journal of Formal Logic 18, 621-30, 1977.
[Costa and Guillaume, 1965] N. C. A. da Costa and M. Guillaume. Négations Composées et la Loi de Peirce dans les System Cₙ, Portugaliae Mathematica 24, 201-9, 1965.
[Rijk, 1970] L. M. De Rijk, ed. Petrus Abaelardus; Dialectica, Assen: van Gorcum & Co, 1970.
[Došen, 1992] K. Došen. The First Axiomatization of a Relevant Logic, Journal of Philosophical Logic 21, 339-56, 1992.
[Dunn, 1976] J. M. Dunn. Intuitive Semantics for First Degree Entailment and “Coupled Trees”, Philosophical Studies 29, 149-68, 1976.
[Dunn, 1988] J. M. Dunn. The Impossibility of Certain Second-Order Non-Classical Logics with Extensionality, pp. 261-79 of D. F. Austin (ed.), Philosophical Analysis, Dordrecht: Kluwer Academic Publishers, 1988.
[Fine, 1974] K. Fine. Models for Entailment, Journal of Philosophical Logic 3, 347-72, 1974.
[Frege, 1919] G. Frege. Negation, Beiträge zur Philosophie des deutschen Idealismus 1, 143-57, 1919; translated into English and reprinted in P. Geach and M. Black (eds.), Translations from the Philosophical Writings of Gottlob Frege, Oxford: Basil Blackwell, 1960.
[Goddard, 1998] L. Goddard. The Inconsistency of Traditional Logic, Australasian Journal of Philosophy 76, 152-64, 1998.
[Goldstein, 1989] L. Goldstein. Wittgenstein and Paraconsistency, ch. 19 of [Priest, Routley, and Norman, 1989].
[Goodman, 1981] N. D. Goodman. The Logic of Contradiction, Zeitschrift für Mathematische Logik und Grundlagen der Mathematik 27, 119-26, 1981.
[Goodship, 1996] L. Goodship. On Dialethism, Australasian Journal of Philosophy 74, 153-61, 1996.
[Haack, 1974] S. Haack. Deviant Logic, Cambridge: Cambridge University Press, 1974.
[Hamilton and Cairns, 1961] E. Hamilton and H. Cairns. Plato; the Collected Dialogues, Princeton: Princeton University Press, 1961.
[Heron, 1954] G. Heron (trans.). Of Learned Ignorance, London: Routledge & Kegan Paul, 1954.
[Hilbert, 1925] D. Hilbert. On the Infinite, Mathematische Annalen 95, 161-90, 1925; translated into English and reprinted as pp. 134-151 of P. Benacerraf and H. Putnam (eds.), Philosophy of Mathematics; Selected Readings, Oxford: Basil Blackwell.
[Horn, 1989] L. R. Horn. A Natural History of Negation, Chicago: Chicago University Press, 1989.
[Hyde, 1997] D. Hyde. From Heaps and Gaps to Heaps of Gluts, Mind 106, 641-60, 1997.
[Jaśkowski, 1969] S. Jaśkowski. Propositional Calculus for Contradictory Deductive Systems, Studia Logica 24, 143-57, 1969.
[Jennings, 1994] R. E. Jennings. The Genealogy of Disjunction, Oxford: Oxford University Press, 1994.
[Joad, 1927] C. E. M. Joad. Bertrand Russell; the Man and the Things he Stands for, The New Leader, December 9th, 1927.
[Kasulis, 1981] T. Kasulis. Zen Action; Zen Person, Honolulu: University of Hawai’i Press, 1981.
[Kim, 1985] H.-J. Kim (trans.). Zazenshin: Admonitions for Zazen, ch. 12 of Flowers of Emptiness, Lewiston: The Edward Mellen Press, 1985.
[Kirk and Raven, 1957] G. S. Kirk and J. E. Raven. The Presocratic Philosophers, Cambridge: Cambridge University Press, 1957.
[Kirwan, 1993] C. Kirwan. Aristotle; Metaphysics, Books Γ, ∆, E, 2nd edition, Oxford: Oxford University Press, 1993.
[Kleene, 1952] S. C. Kleene. Introduction to Metamathematics, Amsterdam: North Holland Publishing Company, 1952.
[Kneale and Kneale, 1962] W. Kneale and M. Kneale. The Development of Logic, Oxford: Clarendon Press, 1962.
[Kolakowski, 1978] L. Kolakowski. Main Currents of Marxism; Vol. I, the Founders, Oxford: Oxford University Press, 1978.
[Kuhn, 1977] T. S. Kuhn. Objectivity, Value Judgment and Theory Choice, ch. 13 of The Essential Tension, Chicago, IL: University of Chicago Press, 1977.
[Kwok et al., 1993] M. Kwok, M. Palmer, and J. Ramsey (trans.). Tao Te Ching, Dorset: Element Books Ltd, 1993.
[Lewis and Langford, 1959] C. I. Lewis and C. H. Langford. Symbolic Logic, The Century Company; 2nd edition, New York: Dover Publications, 1959.
[Lewis, 1982] D. Lewis. Logic for Equivocators, Noûs 14, 431-41, 1982.
[Loparić, 1986] A. Loparić. A Semantical Study of Some Propositional Calculi, Journal of Non-Classical Logic 3, 73-95, 1986.
[Luce and Jessop, 1951] A. A. Luce and T. E. Jessop, eds. The Collected Works of George Berkeley, Vol. IV, London: Thomas Nelson & Sons, 1951.
[Łukasiewicz, 1910] J. Łukasiewicz. O Zasadzie Sprzeczności u Aristotelsa, Krakow: Studium Krytyczne, 1910.
[Łukasiewicz, 1970] J. Łukasiewicz. On the Principle of Contradiction in Aristotle, Review of Metaphysics 24, 485-509, 1970.
[Lycan, 1988] W. G. Lycan. Judgment and Justification, Cambridge: Cambridge University Press, 1988.
[MacKenna, 1991] S. MacKenna (trans.). Plotinus; the Enneads, London: Penguin Classics, 1991.
[Mair, 1994] V. H. Mair. Wandering on the Way; Early Taoist Parables of Chuang Tsu, New York: Bantam Books, 1994.
[Martin, 1986] C. Martin. William’s Machine, Journal of Philosophy 83, 564-72, 1986.
[Martin, 1987] C. Martin. Embarrassing Arguments and Surprising Conclusions in the Development of Theories of the Conditional in the Twelfth Century, pp. 377-400 of J. Jolivet and A. de Libera (eds.), Gilbert de Poitiers et ses contemporains: aux origines de la ‘Logica modernorum’: actes du 7e Symposium Européen d’histoire de la logique et de la semantique médiévales, Poitiers, 17-22 juin 1985, Napoli: Bibliopolis, 1987.
[Maurer, 1967] A. Maurer. Nicholas of Cusa, pp. 496-8, Vol. 5, of P. Edwards (ed.), Encyclopedia of Philosophy, London: Macmillan, 1967.
[Meinong, 1907] A. Meinong. Über die Stellung der Gegenstandstheorie im System der Wissenschaften, Leipzig: R. Voitlander Verlag, 1907.
[Meyer, 1978] R. K. Meyer. Relevant Arithmetic, Bulletin of the Section of Logic, Polish Academy of Sciences 5, 133-7, 1978.
[Meyer and Routley, 1972] R. K. Meyer and R. Routley. Algebraic Analysis of Entailment, I, Logique et Analyse 15, 407-28, 1972.
[Miller, 1969] A. V. Miller (trans.). The Science of Logic, London: Allen and Unwin, 1969.
[Moh, 1950] S.-K. Moh. The Deduction Theorem and Two New Logical Systems, Methodos 2, 56-75, 1950.
[Mortensen, 1995] C. Mortensen. Inconsistent Mathematics, Dordrecht: Kluwer Academic Publishers, 1995.
[Parsons, 1990] T. Parsons. True Contradictions, Canadian Journal of Philosophy 20, 335-53, 1990.
[Peano, 1967] G. Peano. The Principles of Arithmetic, Presented by a New Method, pp. 83-97 of J. Van Heijenoort (ed.), From Frege to Gödel; A Source Book of Mathematical Logic, 1879-1931, Cambridge, MA: Harvard University Press, 1967.
[Peña, 1989] L. Peña. Verum et Ens Convertuntur, ch. 20 of [Priest, Routley, and Norman, 1989].
[Popper, 1963] K. Popper. Conjectures and Refutations, London: Routledge and Kegan Paul, 1963.
[Priest, 1973] G. Priest. A Bedside Reader’s Guide to the Conventionalist Philosophy of Mathematics, in J. Bell, J. Cole, G. Priest, and A. Slomson (eds.), Proceedings of the Bertrand Russell Memorial Logic Conference; Denmark, 1971, Leeds: Bertrand Russell Memorial Logic Conference, 1973.
[Priest, 1979] G. Priest. Logic of Paradox, Journal of Philosophical Logic 8, 219-41, 1979.
[Priest, 1987] G. Priest. In Contradiction; a Study of the Transconsistent, Dordrecht: Kluwer Academic Publishers, 1987; second revised edition, Oxford: Oxford University Press, 2006.
[Priest, 1989–90] G. Priest. Dialectic and Dialetheic, Science and Society 53, 388-415, 1989–90.
[Priest, 1991] G. Priest. Minimally Inconsistent LP, Studia Logica 50, 321-31, 1991.
[Priest, 1994] G. Priest. Is Arithmetic Consistent?, Mind 103, 337-49, 1994.
[Priest, 1995] G. Priest. Beyond the Limits of Thought, Cambridge: Cambridge University Press, 1995; second (revised) edition, Oxford: Oxford University Press, 2002.
[Priest, 1998] G. Priest. To Be and Not to Be – that is the Answer; on Aristotle and the Law of Non-Contradiction, Philosophiegeschichte und Logische Analyse 1, 91-130, 1998; revised as Chapter 1 of [Priest, 2006].
[Priest, 1999a] G. Priest. What Not? A Defence of Dialetheic Theory of Negation, pp. 101-20 of D. Gabbay and H. Wansing (eds.), What is Negation?, Dordrecht: Kluwer Academic Publishers, 1999; revised as Chapter 4 of [Priest, 2006].
[Priest, 1999b] G. Priest. Negation as Cancellation and Connexive Logic, Topoi 18, 141-8, 1999.
[Priest, 1999] G. Priest. Perceiving Contradictions, Australasian Journal of Philosophy 77, 439-46, 1999; revised as part of Chapter 3 of [Priest, 2006].
[Priest, 2000a] G. Priest. Truth and Contradiction, Philosophical Quarterly 50, 305-19, 2000; revised as Chapter 2 of [Priest, 2006].
[Priest, 2000b] G. Priest. Vasil’ev and Imaginary Logic, History and Philosophy of Logic 21, 135-46, 2000.
[Priest, 2001a] G. Priest. Introduction to Non-Classical Logic, Cambridge: Cambridge University Press, 2001; second edition revised as Introduction to Non-classical Logic, Vol I, Cambridge University Press, 2008.
[Priest, 2001b] G. Priest. Paraconsistent Belief Revision, Theoria 68, 214-28, 2001; revised as Chapter 8 of [Priest, 2006].
[Priest, 2001c] G. Priest. Logic: One or Many?, in J. Woods and B. Brown (eds.), Logical Consequences: Rival Approaches. Proceedings of the 1999 Conference of the Society of Exact Philosophy, Stanmore: Hermes Science Publishers Ltd, 2001; revised as Chapter 12 of [Priest, 2006].
[Priest, 2002] G. Priest. Paraconsistent Logic, pp. 287-393, Vol. 6 of D. Gabbay and F. Guenthner (eds.), Handbook of Philosophical Logic, 2nd edition, Dordrecht: Kluwer Academic Publishers, 2002.
[Priest, 2006] G. Priest. Doubt Truth to be a Liar, Oxford: Oxford University Press, 2006.
[Priest and Routley, 1989] G. Priest and R. Routley. The Philosophical Significance and Inevitability of Paraconsistency, ch. 18 of [Priest, Routley, and Norman, 1989].
[Priest et al., 1989] G. Priest, R. Routley, and J. Norman. Paraconsistent Logics: Essays on the Inconsistent, Munich: Philosophia Verlag, 1989.
[Priest and Sylvan, 1992] G. Priest and R. Sylvan. Simplified Semantics for Basic Relevant Logics, Journal of Philosophical Logic 21, 217-32, 1992.
[Prior, 1960] A. Prior. The Runabout Inference Ticket, Analysis 21, 38-9, 1960.
[Quine, 1966] W. V. O. Quine. Three Grades of Modal Involvement, ch. 13 of The Ways of Paradox and Other Essays, New York: Random House, 1966.
[Quine and Ullian, 1970] W. V. O. Quine and J. Ullian. The Web of Belief, New York, NY: Random House, 1970.
[Raju, 1953–54] P. T. Raju. The Principle of Four-Cornered Negation in Indian Philosophy, Review of Metaphysics 7, 694-713, 1953–54.
[Rauszer, 1977] C. Rauszer. Applications of Kripke Models to Heyting-Brouwer Logic, Studia Logica 36, 61-71, 1977.
[Read, 1988] S. Read. Relevant Logic; a Philosophical Examination of Inference, Oxford: Basil Blackwell, 1988.
[Read, 1993] S. Read. Formal and Material Consequence; Disjunctive Syllogism and Gamma, pp. 233-59 of K. Jacobi (ed.), Argumentationstheorie; Scholastische Forschungen zu den logischen und semantischen Regeln korrekten Folgerns, Leiden: E. J. Brill, 1993.
[Rescher and Brandom, 1980] N. Rescher and R. Brandom. The Logic of Inconsistency, Oxford: Basil Blackwell, 1980.
[Rescher and Manor, 1970–71] N. Rescher and R. Manor. On Inferences from Inconsistent Premises, Theory and Decision 1, 179-217, 1970–71.
[Restall, 1992] G. Restall. A Note on Naive Set Theory in LP, Notre Dame Journal of Formal Logic 33, 422-32, 1992.
[Restall, 1993] G. Restall. Simplified Semantics for Relevant Logics (and some of their Rivals), Journal of Philosophical Logic 22, 481-511, 1993.
[Restall, 1999] G. Restall. Negation in Relevant Logics (How I Stopped Worrying and Learned to Love the Routley Star), pp. 53-76 of D. Gabbay and H. Wansing (eds.), What is Negation?, Dordrecht: Kluwer Academic Publishers, 1999.
[Robinson, 1987] T. M. Robinson. Heraclitus, Toronto: University of Toronto Press, 1987.
[Routley, 1977] R. Routley. Ultralogic as Universal?, Relevance Logic Newsletter 2, 50-90 and 138-75, 1977; reprinted as an appendix to Exploring Meinong’s Jungle and Beyond, Canberra: Research School of Social Sciences, 1980.
[Routley, 1979] R. Routley. Dialectical Logic, Semantics and Metamathematics, Erkenntnis 14, 301-31, 1979.
[Routley, 1980] R. Routley. Three Meinongs, ch. 5 of Exploring Meinong’s Jungle and Beyond, Canberra: Research School of Social Sciences, 1980.
[Routley et al., 1984] R. Routley, V. Plumwood, R. K. Meyer, and R. Brady. Relevant Logics and their Rivals, Vol. I, Atascadero: Ridgeview, 1984.
[Routley and Routley, 1972] R. Routley and V. Routley. Semantics of First Degree Entailment, Noûs 6, 335-59, 1972.
[Russell, 1905] B. Russell. On Denoting, Mind 14, 479-93, 1905.
[Russell, 1997] B. Russell. The Collected Papers of Bertrand Russell, Vol. 11, ed. J. Slater, London: Routledge, 1997.
[Schotch and Jennings, 1980] P. Schotch and R. Jennings. Inference and Necessity, Journal of Philosophical Logic 9, 327-40, 1980.
[Smart, 1967] N. Smart. Eckhart, Meister, pp. 449-551, Vol. 2, of P. Edwards (ed.), Encyclopedia of Philosophy, London: Macmillan, 1967.
[Smiley, 1959] T. J. Smiley. Entailment and Deducibility, Proceedings of the Aristotelian Society 59, 233-254, 1959.
[Stump, 1989] E. Stump. Dialectic and its Place in Medieval Logic, Ithaca: Cornell University Press, 1989.
[Sylvan, 2000] R. Sylvan. A Preliminary Western History of Sociative Logics, ch. 5 of D. Hyde and G. Priest (eds.), Sociative Logics and their Applications; Essays by the Late Richard Sylvan, Aldershot: Ashgate Publishers, 2000.
[Tanahashi, 1985] K. Tanahashi, ed. Mountains and Waters Sūtra, pp. 97-107 of Moon in a Dewdrop; Writings of Zen Master Dōgen, New York: Farrar, Straus and Giroux, 1985.
[Tennant, 1984] N. Tennant. Perfect Validity, Entailment and Paraconsistency, Studia Logica 43, 181-200, 1984.
[Tennant, 1992] N. Tennant. Autologic, Edinburgh: Edinburgh University Press, 1992.
[Urbas, 1990] I. Urbas. Paraconsistency, Studies in Soviet Thought 39, 434-54, 1990.
[Urquhart, 1972] A. Urquhart. Semantics for Relevant Logics, Journal of Symbolic Logic 37, 274-82, 1972.
[Vasil’év, 1912–13] N. A. Vasil’év. Logica i Métalogica, Logos 1-2, 53-81, 1912–13; translated into English by V. Vasyukov as ‘Logic and Metalogic’, Axiomathes 4 (1993), 329-51.
[Wallace, 1975] W. Wallace (trans.). Hegel’s Logic; being Part One of the Encyclopaedia of the Philosophical Sciences, Oxford: Oxford University Press, 1975.
[Wittgenstein, 1978] L. Wittgenstein. Remarks on the Foundations of Mathematics, 3rd (revised) edition, Oxford: Basil Blackwell, 1978.
[Wittgenstein, 1979] L. Wittgenstein. Wittgenstein and the Vienna Circle, ed. B. F. McGuinness, Oxford: Basil Blackwell, 1979.
THE HISTORY OF QUANTUM LOGIC
Maria Luisa Dalla Chiara, Roberto Giuntini and Miklos Rédei
Handbook of the History of Logic. Volume 8 Dov M. Gabbay and John Woods (Editors) © 2007 Elsevier B.V. All rights reserved.

1 THE BIRTH OF QUANTUM LOGIC: BIRKHOFF AND VON NEUMANN

The idea of quantum logic first appears explicitly in the short Section 5 of Chapter III in von Neumann’s 1932 book on the mathematical foundations of quantum mechanics [von Neumann, 1943]. Towards the end of this section von Neumann writes:

As can be seen, the relation between the properties of a physical system on the one hand, and the projections on the other, makes possible a sort of logical calculus with these. [our emphasis] [von Neumann, 1943, p. 253]

But this idea is not worked out in detail in von Neumann’s 1932 book; it is only in the seminal 1936 joint paper of Birkhoff and von Neumann [Birkhoff and von Neumann, 1936] that a systematic attempt is made to propose a “propositional calculus” for quantum logic: with this 1936 paper quantum logic was born. The birth was followed by a long dormancy: it was only in the late Fifties that quantum logic began to attract the interest of mathematical physicists, mathematicians, logicians and philosophers. One reason for this long lack of interest in quantum logic may be that it was very difficult – if not impossible – to understand the Birkhoff-von Neumann concept of quantum logic exclusively on the basis of the 1936 Birkhoff-von Neumann paper. While proposing quantum logic in 1935-1936, von Neumann was simultaneously working on the theory of “rings of operators” (called von Neumann algebras today), and in the year of the publication of the Birkhoff-von Neumann paper on quantum logic von Neumann also published a joint paper with J. Murray, a work that established the classification theory of von Neumann algebras [Murray and von Neumann, 1936]. We shall see in this section that the results of this classification theory are intimately related to the Birkhoff-von Neumann concept of quantum logic.

To understand more fully some apparently counterintuitive features of the Birkhoff-von Neumann idea of quantum logic, however, one has to take into account other, earlier results and ideas of von Neumann as well. In the second [von Neumann, 1927] of the three “foundational papers” [von Neumann, 1927]-[von Neumann, 1927] von Neumann worked out a derivation of the quantum mechanical probability calculus under the frequency-interpretation of probability. That
derivation – reproduced with apparently small but revealing modifications in Chapter IV of his 1932 book [von Neumann, 1943] – was very problematic: it contained conceptual inconsistencies, of which von Neumann was more or less aware. The conceptual difficulty led him to take a critical attitude towards the standard Hilbert space formalism of quantum mechanics¹ and to hope that the mathematical formalism accommodating the quantum logic he proposed would also serve as a more satisfactory framework for quantum mechanics than the standard Hilbert space formalism. The proposed formalism was the theory of von Neumann algebras, especially the so-called type II₁ factor von Neumann algebras. Thus the 1936 Birkhoff-von Neumann concept of quantum logic is related to deep mathematical discoveries in the mid Thirties, to the history of quantum mechanics in the Twenties, and to conceptual difficulties in connection with the frequency-interpretation of quantum probability. So the issue is a convoluted one. The complexity of the problem is also reflected by the fact that, as we shall argue by citing evidence, von Neumann himself was never quite satisfied with how quantum logic had been worked out. In this section we also try to explain why. The essential point we make is that von Neumann wanted to interpret the algebraic structure representing quantum logic as the algebra of random events in the sense of a noncommutative probability theory. In a well-defined sense to be explained here, this cannot be achieved if probabilities are to be viewed as relative frequencies – not even if one abandons the standard Hilbert space formalism in favor of the theory of type II₁ von Neumann algebras. This was likely the main reason why von Neumann abandoned the frequency-interpretation of quantum probability after 1936 in favor of a “logical interpretation” of probability, which von Neumann did not regard as very well developed and understood, however.

1 See [Rédei, 1996] for an analysis of von Neumann’s critique.

1.1 The main idea of quantum logic: logicizing non-Boolean algebras

It is well known that both the syntactic and the semantic aspects of classical propositional logic can be described in terms of Boolean algebras. This is expressed metaphorically by Halmos’ famous characterization of the (classical) logician as the dual of a (Boolean) algebraist [Halmos, 1962, p. 22], a characterization which has recently been “dualized” by Dunn and Hardegree: “By duality we obtain that the algebraist is the dual of the logician.” [Dunn and Hardegree, 2001, p. 6]. The problem of quantum logic can be formulated as the question of whether the duality alluded to above also obtains if Boolean algebras are replaced by other, typically weaker algebraic structures arising from the mathematical formalism of quantum mechanics. The Birkhoff-von Neumann paper can be viewed as one of the first papers in which the suggestion to logicize a non-Boolean lattice appears. There are, however, several types of non-Boolean lattices. Which one is supposed to be logicized? At the time of the birth of quantum logic the canonical examples of non-distributive ortholattices, the Hilbert lattices, were known, and, since this structure emerges
naturally from the Hilbert space formalism of quantum mechanics, Hilbert lattices were the most natural candidates in 1935 for the propositional system of quantum logic. Indeed, Birkhoff and von Neumann did consider Hilbert lattices as a possible propositional system of quantum logic; yet, this lattice was not their choice: the major postulate in the Birkhoff-von Neumann paper is formulated in the section entitled “Relation to abstract projective geometries” and reads:

Hence we conclude that the propositional calculus of quantum mechanics has the same structure as an abstract projective geometry. (Emphasis in the original) [Birkhoff and von Neumann, 1936]

What is this structure of abstract projective geometry and why did Birkhoff and von Neumann postulate it to be the proper algebraic structure representing quantum logic? To explain this we need to recall first the basic properties of Hilbert lattices.
1.2 Hilbert lattices

According to von Neumann’s axiomatization of quantum theory (QT), the mathematical interpretation of any quantum system S is a complex separable Hilbert space H.² Any pure state (corresponding to maximal information of the observer about the system) is mathematically represented by a unit vector ψ of the space H. States that do not necessarily correspond to maximal information are called mixtures. They are mathematically represented by density operators of H.³

The Hilbert space H thus plays for QT the role that in classical particle mechanics is played by the phase space (whose points represent possible pure states of the physical system under investigation). In the classical case, it is quite natural to assume that the events that may occur to the physical system are mathematically represented by subsets of the phase space S. This gives rise to a Boolean field of sets $\langle F(S), \cap, \cup, {}^{c}, \emptyset, S \rangle$, where the set-theoretic operations $\cap$, $\cup$, ${}^{c}$ represent respectively the conjunction, the disjunction and the negation of classical events.

Why are the mere subsets of H not adequate mathematical representatives for quantum events, as in the phase-space case? The reason lies in the superposition principle, which represents one of the basic dividing lines between the quantum and the classical case.⁴

2 See Def. 74-82 of the Mathematical Appendix.
3 See Def. 92.
4 See Def. 80.

As opposed to classical mechanics, in quantum mechanics any unit vector that is a linear combination of pure states gives rise to a new pure state. Suppose
two pure states $\psi_1$, $\psi_2$ are orthogonal and suppose that a pure state $\psi$ is a linear combination of $\psi_1$, $\psi_2$. In other words: $\psi = c_1\psi_1 + c_2\psi_2$ (where $|c_1|^2 + |c_2|^2 = 1$). According to one of the basic axioms of QT (the so-called Born rule), this means that a quantum system in state $\psi$ might verify with probability $|c_1|^2$ those events that are certain for state $\psi_1$ (and are not certain for $\psi$) and might verify with probability $|c_2|^2$ those events that are certain for state $\psi_2$ (and are not certain for $\psi$). Suppose now there is given an orthonormal set of pure states $\{\psi_i\}_{i \in I}$, where each $\psi_i$ assigns probability 1 to a given event. Consider the linear combination

$$\psi = \sum_{i} c_i \psi_i \qquad \Bigl(c_i \neq 0 \text{ and } \sum_{i} |c_i|^2 = 1\Bigr),$$
which turns out to be a pure state. Then ψ will also assign probability 1 to the same event. As a consequence, the mathematical representatives of events should be closed under finite and infinite linear combinations. The closed subspaces of H are just the mathematical objects that can realize such a role.⁵

What will be the meaning of negation, conjunction and disjunction in the realm of quantum events? Let us first consider negation, by referring to Birkhoff and von Neumann’s paper. They observe:

The mathematical representative of the negative of any experimental proposition is the orthogonal complement of the mathematical representative of the proposition itself.

The orthogonal complement A′ of a closed subspace A is defined as the set of all vectors that are orthogonal to all elements of A. In other words, ψ ∈ A′ iff ψ ⊥ A iff, for any ϕ ∈ A, the inner product $\langle \psi | \varphi \rangle$ is 0.⁶ From the point of view of the physical interpretation, the orthogonal complement (also called orthocomplement) is particularly interesting, since it satisfies the following property: a pure state ψ assigns to an event A probability 1 (0, respectively) iff ψ assigns to the orthocomplement of A probability 0 (1, respectively). As a consequence, one is dealing with an operation that inverts the two extreme probability-values, which naturally correspond to the truth-values truth and falsity (as in the classical truth-table of negation).

As for conjunction, Birkhoff and von Neumann point out that this can still be represented by the set-theoretic intersection (as in the classical case). For, the intersection A ∩ B of two closed subspaces is again a closed subspace. Hence, we will obtain the usual truth-table for the connective and: ψ verifies A ∩ B iff ψ verifies both members.

5 See Def. 82.
6 See Def. 75.
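The probabilistic reading of superposition and of the orthocomplement can be checked numerically in the smallest non-trivial case. The following Python sketch is only an illustration under stated assumptions: it works in the two-dimensional space ℂ² rather than a general separable Hilbert space, it treats only events that are one-dimensional subspaces (lines), and the names `inner`, `born_probability`, `psi1`, `psi2`, `c1` and `c2` are hypothetical.

```python
import math


def inner(u, v):
    """Inner product <u|v> on C^n, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def born_probability(event_vector, state):
    """Probability that `state` verifies the event spanned by the unit vector `event_vector`."""
    return abs(inner(event_vector, state)) ** 2


# Two orthogonal pure states of C^2 (an assumed toy example).
psi1 = (1.0, 0.0)
psi2 = (0.0, 1.0)

# Coefficients with |c1|^2 + |c2|^2 = 1; c2 is deliberately complex.
c1 = 0.5
c2 = 1j * math.sqrt(3) / 2

# The superposition psi = c1*psi1 + c2*psi2 is again a unit vector.
psi = tuple(c1 * a + c2 * b for a, b in zip(psi1, psi2))

# Event A = span{psi1}; in C^2 its orthocomplement A' is span{psi2}.
p_A = born_probability(psi1, psi)
p_A_perp = born_probability(psi2, psi)

if __name__ == "__main__":
    print("P(A)  =", p_A)
    print("P(A') =", p_A_perp)
```

In this toy case the probability of A is |c1|² and the probability of A′ is |c2|², so the two sum to 1; a state lying in A (for instance `psi1` itself) gives A probability 1 and A′ probability 0, which is the inversion of the extreme values described above.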
Disjunction, however, cannot be represented here as a set-theoretic union. For, generally, the union A ∪ B of two closed subspaces is not a closed subspace, except in special circumstances. In spite of this, we have at our disposal another good representative for the connective or : the supremum A∨B of two closed subspaces, that is the smallest closed subspace including both A and B. Of course, A ∨ B will include A ∪ B. As a consequence, we obtain the following structure: C(H) , ∧ , ∨ , ′ , 0 , 1 , where ∧ is the set-theoretic intersection; ∨ , ′ are defined as above; while 0 and 1 represent, respectively, the null subspace (the singleton consisting of the null vector, which is the smallest possible subspace) and the total space H. One can prove that C(H) , ∧ , ∨ , ′ , 0 , 1 is a complete ortholattice.7 Structures of this kind are called Hilbert lattices. By the one-to-one correspondence between the set C(H) of all closed subspaces and the set Π(H) of all projections of H, the lattice based on Π(H) turns out to be isomorphic to the lattice based on C(H). Hence, also projections give rise to a Hilbert lattice.8 Let L(H) represent either C(H) or Π(H). Any Hilbert lattice L(H) is an orthomodular lattice, i.e. the following equation holds for all A, B, C ∈ L(H): Orthomodularity If A ≤ B and A′ ≤ C, then A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C).9 Orthomodularity is a weakening of the following distributivity law (which is not valid in a Hilbert lattice): Distributivity A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C) But the orthomodularity property is not the finest weakening of distributivity. The modularity property Modularity If A ≤ B, then A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C) is strictly stronger than orthomodularity. It is not difficult to prove that a Hilbert lattice L(H) is modular if and only if H is finite dimensional as a linear space. An abstract projective geometry is a modular ortholattice. 
⁷ See Def. 47-57.
⁸ See Def. 90.
⁹ An equivalent definition of orthomodularity is the following: if A ≤ B, then B = A ∨ (A ∨ B′)′ (see Def. 60).
It must be emphasized that the lattice of projections of an infinite dimensional Hilbert space is not
M. L. Dalla Chiara, R. Giuntini and M. Rédei
modular, it is only orthomodular. Von Neumann and Birkhoff were fully aware of this,¹⁰ and also of the fact that the Hilbert space needed to describe a quantum mechanical system is typically infinite dimensional. Consequently, by insisting on the modularity of the quantum propositional system they rejected the standard Hilbert lattice as the proper candidate for quantum logic. One should realize the seriousness and the counterintuitive nature of the Birkhoff-von Neumann suggestion: by insisting on the modularity of the quantum propositional system they implicitly also rejected the standard Hilbert space formalism of quantum mechanics! This was a very radical position in 1936, and was probably another reason why the community of mathematical physicists did not jump on the Birkhoff-von Neumann idea of quantum logic. If, however, one rejects the standard Hilbert space formalism of quantum mechanics, one has to make a suggestion as to what to replace it by. By 1936 von Neumann had an answer to this question (we will return to this issue in section 1.4). Before turning to this problem, let us see why Birkhoff and von Neumann regarded modularity as crucial.
1.3 Modularity and probability

To see why von Neumann insisted on the modularity of quantum logic, one has to understand that he wanted quantum logic to be not only the propositional calculus of a quantum mechanical system but also wanted it to serve as the event structure in the sense of probability theory. In other words, what von Neumann aimed at was establishing the quantum analogue of the classical situation, where a Boolean algebra can be interpreted both as the propositional algebra of a classical propositional logic and as the algebraic structure representing the random events of a classical probability theory, with probability being an additive normalized measure on the Boolean algebra. A characteristic property of a classical probability measure is the following

Strong additivity property: µ(A) + µ(B) = µ(A ∨ B) + µ(A ∧ B).¹¹

¹⁰ To be more precise: both knew that the Hilbert lattice is not modular in the infinite dimensional case (the 1936 paper contains explicit examples of infinite dimensional subspaces violating the modularity law); however, the orthomodularity property is not stated explicitly in [Birkhoff and von Neumann, 1936] as a condition generally valid in Hilbert lattices, and it is not clear whether Birkhoff or von Neumann had been aware of the general validity of the orthomodularity property in Hilbert lattices.
¹¹ The inequality µ(A) + µ(B) ≥ µ(A ∨ B) is called subadditivity. Clearly, if a measure µ is strongly additive, then it also is subadditive. It can be shown that the converse also is true in the framework of von Neumann algebras (see [Petz and Zemanek, 1988]).
Von Neumann’s insistence on the frequency-interpretation of probability in the years 1927-1932 makes it understandable why he considered strong additivity a key feature of probability. Assume that the probability p(X) (where X is an event) is to be interpreted as relative frequency in the following sense (advocated by von Mises [von Mises, 1919; Mises, 1928]):
1. There exists a fixed ensemble E consisting of N events such that
2. for each event X one can decide unambiguously and
3. without changing the ensemble whether X is the case or not;
4. $p(X) = \frac{\#(X)}{N}$, where #(X) is the number of events in E for which X is the case.¹²
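For a relative frequency of the kind described in conditions 1 to 4, strong additivity reduces to counting: each member of the ensemble is counted equally often on both sides. A minimal sketch, assuming a hypothetical ensemble of twelve labelled outcomes and two arbitrarily chosen events (none of these names come from the text):

```python
from fractions import Fraction

# Hypothetical fixed ensemble: N = 12 outcomes, labelled 0..11.
ensemble = range(12)

def freq(event):
    """Relative frequency p(X) = #(X)/N of an event given as a set of outcomes."""
    return Fraction(sum(1 for e in ensemble if e in event), len(ensemble))

A = {e for e in ensemble if e % 2 == 0}   # "the outcome is even"
B = {e for e in ensemble if e % 3 == 0}   # "the outcome is a multiple of 3"

lhs = freq(A) + freq(B)            # p(A) + p(B)
rhs = freq(A | B) + freq(A & B)    # p(A or B) + p(A and B)
print(lhs, rhs)
```

Both sides come out as 5/6. The check relies on disjunction and conjunction being set union and intersection, which is exactly the classical (Boolean) situation.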
Under the assumptions (1)–(4) strong additivity holds trivially; so strong additivity is a necessary condition for a probability measure p to be interpretable as relative frequency in the sense of (1)–(4).

In 1936 von Neumann required the existence of an “a priori” probability measure on quantum logic, a probability measure which, besides being strongly additive, is also faithful in the sense that every non-zero event has a finite, non-zero probability value.¹³ Hence, according to von Neumann, quantum logic is supposed to be a lattice L on which there exists a finite “a priori quantum probability”, i.e. a map d having finite, non-negative values and having the following two properties:

(i) d(A) < d(B), if A < B;
(ii) d(A) + d(B) = d(A ∨ B) + d(A ∧ B).

A non-negative map d on a lattice having the two properties (i)-(ii) is called a dimension function. It is easy to prove, and both Birkhoff and von Neumann already knew very well, that if a lattice L admits a dimension function that takes on only finite values, then L is modular. Thus modularity of a lattice L is a necessary condition for a probability measure to exist on a lattice, if the probability is supposed to be interpreted as relative frequency. Since a Hilbert lattice Π(H) is not modular in general, there exists no finite dimension function on Π(H), i.e. there exists no a priori probability on the quantum logic determined by the Hilbert space formalism. Von Neumann viewed this fact as a pathological property of the Hilbert space formalism. It was largely because of this pathology that von Neumann expected the standard Hilbert space formalism to be superseded by a mathematical theory that he hoped would be more suitable for quantum mechanics.¹⁴

However, there does exist exactly one (up to constant multiple) function d on the lattice of projections of a Hilbert space that is faithful and satisfies both (i) and (ii): this is the usual dimension function d, the number d(A) being the linear dimension of the linear subspace A. (Equivalently: d(A) = Tr(A), where Tr is the trace functional.¹⁵) But this d is not finite if the Hilbert space is not finite dimensional. So one realizes that the conditions (i)-(ii) can be satisfied with a finite d, if d is the usual dimension function and L is the projection lattice of a finite dimensional Hilbert space. The assumption of a finite dimension function (a priori probability) is thus consistent with the assumption that the lattice is non-distributive; consequently, requiring the existence of a well-behaved finite a priori probability does not exclude the existence of non-classical probability structures.

But the modular lattices of finite dimensional linear spaces with the discrete dimension function are certainly not sufficient as a framework for quantum theory, since one needs infinite dimensional Hilbert spaces to accommodate quantum mechanics (for instance the Heisenberg commutation relation cannot be represented on a finite dimensional Hilbert space). Birkhoff and von Neumann knew this, and they also pointed out that it would be desirable to find models of quantum logic with a non-discrete dimension function. Thus the fate of the Birkhoff-von Neumann idea of quantum logic as a modular, non-distributive lattice hinges upon whether there exist modular lattices with a finite dimension function that are not isomorphic to the modular lattice of projections of a finite dimensional linear space. However, the question of whether such modular lattices exist remains unanswered in the 1936 Birkhoff-von Neumann paper [Birkhoff and von Neumann, 1936]: one finds only a reference to the paper by Murray and von Neumann [Murray and von Neumann, 1936], where “a continuous dimensional model” of quantum logic is claimed to be worked out.

¹² Strictly speaking one should write $p(X) = \lim_{N\to\infty} \frac{\#(X)}{N}$; however, the limit is not important from the point of view of the present considerations, so we omit it.
¹³ In [Birkhoff and von Neumann, 1936] the a priori probability is called the “a priori thermodynamical weight of states” [Birkhoff and von Neumann, 1936, p. 115]. For an explanation of this terminology see [Rédei, 1998, Chapter 7].
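The claim of this section that a finite-valued dimension function forces modularity can be made explicit. The following derivation is a sketch of one standard argument, not a reproduction of the 1936 proof; the auxiliary names X and Y are introduced here for convenience, and the modular law is taken in the form stated in section 1.2.

```latex
Let $d$ be finite-valued and satisfy (i) and (ii), and let $A \le B$. Put
\[
  X = A \vee (B \wedge C), \qquad
  Y = (A \vee B) \wedge (A \vee C) = B \wedge (A \vee C).
\]
Since $A \le B$, $A \le A \vee C$, $B \wedge C \le B$ and
$B \wedge C \le A \vee C$, we have $X \le Y$ in any lattice.
Applying (ii) to the pairs $(A,\, B \wedge C)$ and $(B,\, A \vee C)$,
and using $A \wedge B \wedge C = A \wedge C$ and
$A \vee B \vee C = B \vee C$:
\begin{align*}
  d(X) &= d(A) + d(B \wedge C) - d(A \wedge C), \\
  d(Y) &= d(B) + d(A \vee C) - d(B \vee C).
\end{align*}
Applying (ii) again to the pairs $(A, C)$ and $(B, C)$:
\begin{align*}
  d(A \vee C) &= d(A) + d(C) - d(A \wedge C), \\
  d(B \vee C) &= d(B) + d(C) - d(B \wedge C),
\end{align*}
so that
\[
  d(Y) = d(A) - d(A \wedge C) + d(B \wedge C) = d(X).
\]
If $X < Y$ held, (i) would give $d(X) < d(Y)$; hence $X = Y$,
which is the modular law.
```

Finiteness of d is used only to justify the subtractions; with infinite values the cancellations above are not available.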
1.4 Modularity in von Neumann lattices

The paper [Murray and von Neumann, 1936] shows that there exist non-distributive, modular lattices different from the Hilbert lattice of a finite dimensional Hilbert space. Proving the existence of such a structure is part of what is known as the “classification theory of von Neumann algebras”, which has since 1936 become a classical chapter in the theory of operator algebras.¹⁶ The relevant – and surprising – result of this classification theory is that there exists a modular lattice of non-finite (linear) dimensional projections on an infinite dimensional Hilbert space, and that on this lattice there exists a (unique up to normalization) dimension function d that takes on every value in the interval [0, 1]. The von Neumann algebra generated by these projections is called the “type II₁ factor von Neumann algebra” N.¹⁷

Furthermore, it can be shown that the unique dimension function on the lattice of projections of a type II₁ factor comes from a (unique up to constant) trace τ defined on the factor itself – just like in the finite dimensional case, where too the dimension function is the restriction of the trace functional Tr to the lattice of projections. The difference between Tr and τ is the following: Tr is determined (up to constant multiple) by the requirement of unitary invariance with respect to all unitary operators. In other words:

Tr(V A V*) = Tr(A), for all unitary operators V on the finite dimensional space H

(where V* is the adjoint of V).¹⁸ The trace τ, instead, is determined (up to constant multiple) by unitary invariance with respect to every unitary operator belonging to the algebra:

τ(V A V*) = τ(A), for every unitary operator V ∈ N.

The trace τ also has the property:

τ(AB) = τ(BA) for all A, B ∈ N.

¹⁴ Other features of the Hilbert space formalism which he viewed as unsatisfactory include the pathological behavior of the set of all unbounded operators on a Hilbert space and the unphysical nature of the common product (composition) of operators.
¹⁵ The trace functional is defined in Def. 91.
¹⁶ See Def. 94-99, Theorems 100-102 and Corollary 103.
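The three trace properties used above can be checked numerically in the finite dimensional case. The sketch below covers only the ordinary trace Tr on 2×2 real matrices, since the trace τ of a type II₁ factor has no finite matrix model; the matrices P, Q and V and the helper names are illustrative assumptions, not objects from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    """Sum of the diagonal entries."""
    return sum(X[i][i] for i in range(len(X)))

def transpose(X):
    return [list(row) for row in zip(*X)]

half = Fraction(1, 2)
P = [[half, half], [half, half]]   # projection onto the line spanned by (1, 1)
Q = [[1, 0], [0, 0]]               # projection onto the first coordinate axis
V = [[0, -1], [1, 0]]              # rotation by 90 degrees; real, so V* = V^T

# The trace of a projection is the dimension of its range.
print(trace(P), trace(Q))

# P and Q do not commute, yet Tr(PQ) = Tr(QP).
PQ, QP = matmul(P, Q), matmul(Q, P)
print(PQ != QP, trace(PQ) == trace(QP))

# Unitary invariance: Tr(V P V*) = Tr(P).
VPVt = matmul(matmul(V, P), transpose(V))
print(trace(VPVt) == trace(P))
```

The second check is the finite dimensional shadow of the identity τ(AB) = τ(BA): the products differ, but the trace does not register the difference.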
Thus it seems that the modular lattice of projections of a type II₁ algebra should have emerged for Birkhoff and von Neumann as the winning candidate for quantum logic; and so one would expect this lattice to be declared in the Birkhoff-von Neumann paper to be the propositional system of quantum logic. But this is not quite the case: as we have seen in section 1.2, Birkhoff and von Neumann postulate that the quantum propositional calculus is isomorphic to an abstract projective geometry; in fact, the published paper does not at all refer to the results of the Murray-von Neumann classification theory of von Neumann algebras to support the modularity postulate. Why?

One can answer this question on the basis of the unpublished letters of von Neumann to Birkhoff [von Neumann, forthcoming]. Von Neumann and Birkhoff had been engaged in an intense technical correspondence during the preparation of the manuscript of their 1936 paper. The correspondence took place in 1935, and the clues in von Neumann’s letters make it possible to reconstruct the major steps in the thought process that led to the main ideas of the 1936 Birkhoff-von Neumann paper. A detailed reconstruction ([Rédei, submitted], Introduction in [von Neumann, forthcoming]) of the development of the Birkhoff-von Neumann paper shows that von Neumann’s mind moved extremely quickly from the level of abstractness of von Neumann algebras to the level of abstractness represented by continuous geometries¹⁹ – and this move was taking place precisely during the preparation of the quantum logic paper. In his letter to Birkhoff dated November 6, 1935 [von Neumann, forthcoming] von Neumann writes that it would be both desirable and possible to work out a general theory of dimension in complemented, modular lattices, which he viewed as the essential structural property of type II₁ von Neumann algebras. In his letter written a week later (November 13, 1935) [von Neumann, forthcoming], von Neumann already gives a detailed description of his results on continuous geometry: on every projective geometry there exists a dimension function d having the properties (i)-(ii). Soon von Neumann proved, however, that all continuous geometries that admit a transition probability are isomorphic to projection lattices of finite von Neumann algebras; hence, as Halperin points out

. . . continuous geometries do not provide new useful mathematical descriptions of quantum mechanical phenomena beyond those already available from rings of operators [= von Neumann algebras]. [Halperin, 1961, p. 191]

The finite dimension function on a projective geometry, in particular the dimension function with the range [0, 1] on the continuous projective geometry defined by a type II₁ factor von Neumann algebra, was for von Neumann crucially important: he interpreted it as a probability measure on the modular lattice of the quantum propositional system. This created an analogy with classical logic and probability theory, where a Boolean algebra is both a propositional system and a random event structure on which probability measures are defined. The Birkhoff-von Neumann paper points out that property (ii) of the dimension function describes the strong additivity property of probability.

¹⁷ For the details of the dimension theory see e.g. [Takesaki, 1979]; for a brief review we refer to [Petz and Rédei, 1995] or [Rédei, 1998].
¹⁸ See Def. 93 and 88.
¹⁹ For the notion of continuous geometry see Def. 52 and 62.

As we have seen in section 1.3, strong additivity is a necessary condition for a measure to be interpreted as probability understood as relative frequency in the sense of von Mises, so the modularity property of the quantum propositional system understood as the von Neumann lattice of projections of a type II₁ von Neumann algebra ensured a necessary condition for quantum probability to be interpreted as relative frequency (see [Rédei, 2001] and [Rédei, 1999] for a detailed discussion of this point). So it seems that Birkhoff and von Neumann succeeded in isolating the algebraic structure suitable for representing both quantum propositional and quantum event structures, with the possibility of interpreting probability on the event structure as relative frequency.

Yet, von Neumann was not entirely happy with the idea of quantum logic as a modular lattice. He voiced his frustration in a letter of July 2, 1945 to the President of the Washington Philosophical Society, F.B. Silsbee, to whom von Neumann promised in 1945 to write a paper on quantum logic. The paper was never written, and von Neumann apologized:
It is with great regret that I am writing these lines to you, but I simply cannot help myself. In spite of very serious attempts to write the article on the “Logics of quantum mechanics” I find it completely impossible to do it at this time. As you may know, I wrote a paper on this subject with Garrett Birkhoff in 1936 (“Annals of Mathematics”, vol. 37, pp. 823-843), and I have thought a good deal on the subject since. My work on continuous geometries, on which I gave the Amer. Math. Soc. Colloquium lectures in 1937, comes to a considerable extent from this source. Also a good deal concerning the relationship between strict and probability logics (upon which I touched briefly in the Henry Joseph Lecture) and the extension of this “Propositional calculus” work to “logics with quantifiers” (which I never so far discussed in public). All these things should be presented as a connected whole (I mean the propositional and the “quantifier” strict logics, the probability logics, plus a short indication of the ideas of “continuous” projective geometry), and I have been mainly interrupted in this (as well as in writing a book on continuous geometries, which I still owe the Amer.Math.Soc.Colloqium Series) by the war. To do it properly would require a good deal of work, since the subjects that have to be correlated are very heterogenous collection – although I think that I can show how they belong together. When I offered to give the Henry Joseph Lecture on this subject, I thought (and I hope that I was not too far wrong in this) that I could give a reasonable general survey of at least part of the subject in a talk, which might have some interest to the audience. I did not realize the importance nor the difficulties of reducing this to writing. I have now learned – after a considerable number of serious but very unsuccessful efforts – that they are exceedingly great. 
I must, of course, accept a good part of the responsibility for my method of writing – I write rather freely and fast if a subject is “mature” in my mind, but develop the worst traits of pedantism and inefficiency if I attempt to give a preliminary account of a subject which I do not have yet in what I can believe in its final form. I have tried to live up to my promise and to force myself to write this article, and spent much more time on it than on many comparable ones which I wrote with no difficulty at all – and it just didn’t work. [von Neumann, 1945]

Why didn’t it work? Since von Neumann does not elaborate further on the issue in the letter – nor did he ever publish any paper after 1936 on the topic of quantum logic – all one can do is try to interpret von Neumann’s published works to understand why he considered his efforts unsatisfactory. The following seems to be a reasonable interpretation. What von Neumann aimed at in his quest for quantum logic in the years 1935-36 was establishing the
quantum analogue of the classical situation, where a Boolean algebra can be interpreted as being both the propositional algebra of a classical propositional logic and the algebraic structure representing the random events of a classical probability theory, with probability being an additive normalized measure on the Boolean algebra satisfying strong additivity, and where the probabilities can also be interpreted as relative frequencies. The problem is that there exist no “properly non-commutative” versions of this situation. The only (irreducible) examples of non-commutative probability spaces whose probabilities can be interpreted via relative frequencies are the modular lattices of the finite (factor) von Neumann algebras with the canonical trace; however, the non-commutativity of these examples is somewhat misleading because the non-commutativity is suppressed by the fact that the trace is exactly the functional that is insensitive to the non-commutativity of the underlying algebra (see the equation τ(AB) = τ(BA)). So it seems that while one can have both a non-classical (quantum) logic and a mathematically impeccable non-commutative measure theory, the conceptual relation of these two structures cannot be the same as in the classical, commutative case – as long as one views the measure as probability in the sense of relative frequency. This must have been the main reason why after 1936 von Neumann abandoned the relative frequency view of probability:

This view, the so-called “frequency theory of probability”, has been very brilliantly upheld and expounded by R. von Mises. This view, however, is not acceptable to us, at least not in the present “logical” context. [von Neumann, 1937] ([von Neumann, 1961b, p. 196])
Instead, from 1936 on and based on the concept of quantum logic as the von Neumann lattice of a type II1 von Neumann algebra, von Neumann favors what can be called a “logical interpretation”. In this interpretation, advocated by von Neumann explicitly in his address to the 1954 International Congress of Mathematicians (Amsterdam, 1954) [von Neumann, 1954], quantum logic determines the (quantum) probability: once a type II1 von Neumann algebra and the quantum logic it determines are given, probability is also determined by the formula τ (V AV ∗ ) = τ (A); i.e. von Neumann sees logic and probability emerging simultaneously. Von Neumann did not think, however, that this rather abstract idea had been worked out by him as fully as it should. Rather, he saw in the unified theory of logic, probability and quantum mechanics a problem area that he thought should be further developed. He finishes his address to the Amsterdam Conference with these words: I think that it is quite important and will probably shade a great deal of new light on logics and probably alter the whole formal structure of logics considerably, if one succeeds in deriving this system from first principles, in other words from a suitable set of axioms. All the existing axiomatizations of this system are unsatisfactory in this sense, that
they bring in quite arbitrarily algebraical laws which are not clearly related to anything that one believes to be true or that one has observed in quantum theory to be true. So, while one has very satisfactorily formalistic foundations of projective geometry of some infinite generalizations of it, including orthogonality, including angles, none of them are derived from intuitively plausible first principles in the manner in which axiomatizations in other areas are. Now I think that at this point lies a very important complex of open problems, about which one does not know well of how to formulate them now, but which are likely to give logics and the whole dependent system of probability a new slam. [von Neumann, 1954, p. 245]

Neither von Neumann nor Birkhoff published any paper on quantum logic after 1936, and the field remained essentially inactive for the next two decades.

2 THE RENAISSANCE OF THE QUANTUM LOGICAL APPROACHES TO QUANTUM THEORY
Birkhoff and von Neumann’s joint article did not arouse any immediate interest, either in the logical or in the physical community. Only twenty years later did one witness a “renaissance period” for the logico-algebraic approach to QT. This was mainly stimulated by the work of Jauch, Piron, Varadarajan, Suppes, Finkelstein, Foulis, Randall, Greechie, Gudder, Beltrametti, Cassinelli, Mittelstaedt and many others. A crucial turning point for this research was the appearance of George Mackey’s book Mathematical Foundations of Quantum Theory (1957). Strangely enough, the new quantum logical community that began to work at the end of the Fifties did not seem aware of Birkhoff and von Neumann’s difficulties concerning the modularity question. According to the commonly accepted view, “Birkhoff and von Neumann’s quantum logic” was generally identified with the algebraic structure of a Hilbert lattice. This view was probably based on an apparently clear statement that is asserted at the very beginning of Birkhoff and von Neumann’s paper:

Our main conclusion, based on admittedly heuristic arguments, is that one can reasonably expect to find a calculus of propositions which is formally indistinguishable from the calculus of linear subspaces with respect to set products, linear sums, and orthogonal complements - and resembles the usual calculus of propositions with respect to and, or, and not.

At the same time, the new proposals were characterized by a more general approach, based on a kind of abstraction from the Hilbert space structures. The starting point of the new trends can be summarized as follows. Generally, any
physical theory T determines a class of event-state systems (ℰ, S), where ℰ contains the events that may occur to a given system, while S contains the states that such a physical system described by the theory may assume. The question arises: what are the abstract conditions that one should postulate for any pair (ℰ, S)? In the case of QT, having in mind the standard Hilbert space model, one is naturally led to the following requirements:
• The set ℰ of events should be a “good” abstraction from Hilbert lattices.
• The set S of states should be a “good” abstraction from the density operators in a Hilbert space, that represent possible states of physical systems.
As a consequence, any state shall determine a probability measure, that assigns to any event in ℰ a value in the interval [0, 1]. Both in the concrete and in the abstract case, states may be either pure (maximal pieces of information that cannot be consistently extended to a richer knowledge) or mixtures (non-maximal pieces of information).

In such a framework two basic problems have been discussed:
I) Is it possible to capture, by means of some abstract conditions that are required for any event-state pair (ℰ, S), the behavior of the concrete Hilbert space pairs?
II) To what extent should the standard Hilbert space model be absolutely binding?

The first problem gave rise to a number of attempts to prove a kind of representation theorem. More precisely, the main question was: what are the necessary and sufficient conditions for a generic event-state pair (ℰ, S) that make ℰ isomorphic to a Hilbert lattice? The representation problem was successfully solved only in 1995 by an important theorem proved by M.P. Solèr.

The second problem stimulated the investigation of more and more general quantum structures. Of course, looking for more general structures seems to imply a kind of discontent towards the standard quantum logical approach, based on Hilbert lattices.
The fundamental criticisms that have been raised are quite independent of Birkhoff and von Neumann’s doubts and concern the following items: 1) The standard structures seem to determine a kind of extensional collapse. In fact, the closed subspaces of a Hilbert space represent at the same time physical properties in an intensional sense and the extensions thereof (sets of states that certainly verify the properties in question). As happens in classical set-theoretic semantics, there is no mathematical representative for physical properties in an intensional sense. Foulis and Randall have called such an extensional collapse “the metaphysical disaster” of the standard quantum logical approach.
2) The lattice structure of the closed subspaces automatically renders the quantum event system closed under logical conjunction. This seems to imply some counterintuitive consequences from the physical point of view. Consider two experimental propositions that concern two incompatible quantities, like “the spin in the x direction is up”, “the spin in the y direction is down”. In such a situation, the intuition of the quantum physicist seems to suggest the following semantic requirement: the conjunction of such propositions has no definite meaning; for, they cannot be experimentally tested at the same time. As a consequence, a lattice structure for the event system seems to be too strong.

An interesting weakening can be obtained by giving up the lattice condition: generally the supremum is assumed to exist only for countable sets of events that are pairwise orthogonal. In other words, the event-structure is supposed to be a σ-orthocomplete orthomodular poset, which is not necessarily a lattice. In the recent quantum logical literature such a structure has often been simply called a quantum logic. At the same time, by standard quantum logic one usually means a Hilbert lattice. Needless to say, such a terminology that identifies a logic with a particular example of an algebraic structure turns out to be somewhat misleading from the strict logical point of view. As we will see in Section 3, different forms of quantum logic, which represent “genuine logics” according to the standard way of thinking of the logical tradition, have been characterized by convenient abstraction from the physical models.
2.1 Abstract event-state systems
After the appearance of Mackey’s book, the notion of event-state system has been analyzed by many authors. We will give here a synthetic idea of such investigations by referring to one of the most natural presentations, which was proposed by Gudder in 1979.²⁰ In the framework of Gudder’s approach, an event-state system is identified with a pair (ℰ, S) consisting of two nonempty sets: the set ℰ of the events (that may occur to a quantum system) and the set S of states (that the quantum system may assume). Events are supposed to be structured as an event algebra (a σ-orthocomplete orthomodular poset), while states are (nonclassical) probability measures that assign a probability-value to each event in ℰ. More precisely, Gudder’s definition can be formulated as follows:

²⁰ See [Gudder, 1979]. See also [Beltrametti and Cassinelli, 1981] and [Pták and Pulmannová, 1991].

DEFINITION 1 (Event-state system). An event-state system is a pair (ℰ, S), where:
1. ℰ, the set of events, has the structure of an event algebra (a σ-orthocomplete orthomodular poset): ⟨ℰ, ≤, ′, 0, 1⟩. In other words:
(1.1) ⟨ℰ, ≤, 0, 1⟩ is a bounded poset (a partially ordered set bounded by the minimum 0 and by the maximum 1);
(1.2) the (unary) operation ′ is an orthocomplement (satisfying the conditions: E = E′′; if E ≤ F then F′ ≤ E′; E ∧ E′ = 0; E ∨ E′ = 1, for any events E and F);
(1.3) if E ≤ F′, then E ∨ F ∈ ℰ;
(1.4) if E ≤ F, then F = E ∨ (E ∨ F′)′ (orthomodularity);
(1.5) for any countable set $\{E_n\}_{n\in I}$ of pairwise orthogonal events (such that $E_i \le E_j'$ whenever $i \ne j$), the supremum $\bigvee\{E_n\}_{n\in I}$ exists in ℰ.
2. S, the set of states, is a set of probability measures s on ℰ. In other words:
(2.1) s(0) = 0, s(1) = 1;
(2.2) for any countable set $\{E_n\}_{n\in I}$ of pairwise orthogonal events: $s(\bigvee\{E_n\}_{n\in I}) = \sum_n s(E_n)$.
3. S is order determining for the event-algebra. In other words, for any pair of events E and F, ∀s ∈ S [s(E) ≤ s(F)] implies E ≤ F.
4. S is σ-convex. In other words, for any countable set of states $\{s_n\}_{n\in I}$ and for any countable set $\{\lambda_n\}_{n\in I}$ of nonnegative real numbers such that $\sum_n \lambda_n = 1$, there is a state s in S such that for any event E: $s(E) = \sum_n \lambda_n s_n(E)$. The state s (indicated by $\sum_n \lambda_n s_n$) is called a convex combination or mixture of the states $s_n$.
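Some of the conditions of Definition 1 can be tried out on a very small concrete model. The sketch below assumes six events (the null subspace, the whole plane and four lines of the real plane) and states induced by vectors in the usual Hilbert space way; the labels `X`, `Y`, the dictionaries and the helper names are hypothetical, and the model is far smaller than the systems the definition is meant to cover.

```python
from fractions import Fraction

# Events: "0" (null subspace), "1" (whole plane) and four lines, each
# given by a spanning vector. ORTHO records the orthocomplement of each event.
LINES = {"X": (1, 0), "X'": (0, 1), "Y": (1, 1), "Y'": (1, -1)}
ORTHO = {"0": "1", "1": "0", "X": "X'", "X'": "X", "Y": "Y'", "Y'": "Y"}

def pure_state(psi):
    """State induced by a nonzero vector psi: squared length of the
    projection of psi onto the event, divided by the squared length of psi."""
    def s(event):
        if event == "0":
            return Fraction(0)
        if event == "1":
            return Fraction(1)
        v = LINES[event]
        dot = psi[0] * v[0] + psi[1] * v[1]
        return Fraction(dot * dot,
                        (psi[0] ** 2 + psi[1] ** 2) * (v[0] ** 2 + v[1] ** 2))
    return s

def mixture(weights, states):
    """Convex combination of states; the weights are assumed to sum to 1."""
    return lambda event: sum(w * u(event) for w, u in zip(weights, states))

s = pure_state((1, 0))
t = pure_state((1, 1))
m = mixture([Fraction(1, 3), Fraction(2, 3)], [s, t])

# (2.1): every state sends 0 to 0 and 1 to 1.
print(all(u("0") == 0 and u("1") == 1 for u in (s, t, m)))

# (2.2) for an orthogonal pair E, E': s(E) + s(E') = s(E v E') = s(1) = 1.
print(all(u(e) + u(ORTHO[e]) == 1 for u in (s, t, m) for e in ORTHO))

# X and Y are not orthogonal; here X v Y = 1 and X ^ Y = 0.
print(s("X") + s("Y"), s("1") + s("0"))
```

The last line prints 3/2 against 1: additivity is only required on orthogonal events, so a state of this kind need not satisfy the strong additivity property of section 1.3.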
It is worthwhile to stress the basic differences between Gudder’s approach and von Neumann’s ideas:
• Gudder’s event algebras are weak structures that are not even lattices. As we have seen, von Neumann instead considered the structure of modular lattices essential for a quantum propositional system.
• Gudder’s states are weak nonclassical probability measures that generally violate the strong additivity condition. It may happen that s(E) + s(F) ≠ s(E ∨ F) + s(E ∧ F).
• While von Neumann looked for an a priori intrinsic probability measure determined by the event structure, a characteristic of the new event-state approach is the plurality of the probability measures that represent possible states of the quantum system under investigation.

Apparently, Gudder’s definition postulates a strong interaction between events and states. In particular, states are “responsible” for the event-order ≤ (by condition (3)). On this basis the notion of observable (which represents one of the fundamental concepts for any physical theory) can be naturally defined. An observable is identified with a map M that associates to any Borel set ∆ a particular event, representing the state of affairs: “the value of the observable M lies in the Borel set ∆”. It is required that the map M preserves some structural properties of the σ-Boolean algebra of all Borel sets. The precise definition is the following:

DEFINITION 2 (Observable). An observable of ℰ is an event-valued measure M on the Borel sets. In other words, M is a σ-homomorphism of B(ℝ) into ℰ, that satisfies the following conditions:
(1) M(∅) = 0, M(ℝ) = 1;
(2) ∀∆, Γ ∈ B(ℝ): if ∆ ∩ Γ = ∅, then M(∆) ≤ (M(Γ))′;
(3) if $\{\Delta_i\}_{i\in I}$ is a countable set of real Borel sets such that $\Delta_j \cap \Delta_k = \emptyset$ whenever $j \ne k$, then $M(\bigcup\{\Delta_i\}_{i\in I}) = \bigvee\{M(\Delta_i)\}_{i\in I}$.

As we have seen, Gudder’s definition requires a minimal structure for the event-set. Unlike Hilbert space projections, Gudder’s abstract events do not generally give rise to a lattice. Hence the conjunction of two incompatible events (that cannot be simultaneously tested) may be undefined. Stronger definitions have been proposed by other authors, for instance by the so-called “Geneva School” (Jauch, Piron, Aerts and others). The axioms assumed in the framework of this approach guarantee, right from the outset, a lattice-structure for the events of a quantum system.
Physical reasons that justify this abstract choice have been discussed in many contributions of the Geneva School.21 In the framework of Gudder’s approach, a lattice structure can be finally recovered by assuming some stronger conditions that also concern the interaction between states and events. Let us briefly sketch how such interaction can work. So far we have seen how states “act” on events, inducing a particular algebraic structure on the set E. There is also an inverse “interaction”: the set of all events induces a preclusivity space on the set of all states. Let us first recall the abstract definition of preclusivity space, which will play a very important role for the possible world semantics of quantum logic.
21 See [Jauch, 1968; Piron, 1976; Aerts, 1984].
DEFINITION 3 (Preclusivity space). A preclusivity space is a system (U, ⊥), where
• U (called the universe) is a nonempty set of objects;
• ⊥ is an irreflexive and symmetric relation defined on U. In other words: (i) ∀x ∈ U: not x ⊥ x; (ii) ∀x, y ∈ U: x ⊥ y implies y ⊥ x.

In the quantum theoretical applications, the universe can be identified with the set S of all states of a Gudder event-state system (E, S). In other words, one can also say that the universe represents a set of micro-objects prepared in different states. A preclusivity relation ⊥ can then be defined by referring to the set E of all events.

DEFINITION 4 (The preclusivity relation between states). Given two states s, t ∈ S, s ⊥ t iff ∃E ∈ E [s(E) = 1 and t(E) = 0].

In other words, two states are preclusive iff they are strongly distinguished by at least one event, which is certain for the first state and impossible for the second state. One can easily check that ⊥ is a preclusivity relation. Every preclusivity space (U, ⊥) has a natural “twin space” (U, R), which is a similarity space.

DEFINITION 5 (Similarity space). A similarity space is a system (U, R), where
• U (called the universe) is a nonempty set of objects;
• R is a reflexive and symmetric relation defined on U. In other words: (i) ∀x ∈ U: xRx; (ii) ∀x, y ∈ U: xRy implies yRx.

The “twin” similarity space of the preclusivity space (U, ⊥) is the space (U, R), where the similarity relation R is the negation of the preclusivity relation ⊥. In other words: ∀x, y ∈ U: xRy iff not x ⊥ y. Apparently, in the case of states the similarity relation R has the following meaning: sRt iff there is no event E such that s(E) = 1 and t(E) = 0. In other words, s and t are similar iff they cannot be strongly distinguished by any event.
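Definitions 3 and 5 can be checked mechanically on a finite universe. The following is a minimal sketch, assuming a hypothetical four-element universe (the labels "up", "down", "left", "right" and all function names are illustrative and do not come from the text): it tests the defining properties of a preclusivity relation and builds the twin similarity relation as its negation.

```python
from itertools import product

def is_irreflexive(universe, rel):
    """No element is related to itself."""
    return all((x, x) not in rel for x in universe)

def is_reflexive(universe, rel):
    """Every element is related to itself."""
    return all((x, x) in rel for x in universe)

def is_symmetric(rel):
    """Whenever (x, y) is in the relation, so is (y, x)."""
    return all((y, x) in rel for (x, y) in rel)

def is_transitive(rel):
    """Whenever (x, y) and (y, w) are in the relation, so is (x, w)."""
    return all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)

def twin_similarity(universe, perp):
    """The twin relation R: x R y iff not x perp y."""
    return {(x, y) for x, y in product(universe, repeat=2) if (x, y) not in perp}

# Hypothetical toy universe: two pairs of mutually preclusive objects.
U = {"up", "down", "left", "right"}
PERP = {("up", "down"), ("down", "up"), ("left", "right"), ("right", "left")}
R = twin_similarity(U, PERP)

print("preclusivity space:", is_irreflexive(U, PERP) and is_symmetric(PERP))
print("similarity space:  ", is_reflexive(U, R) and is_symmetric(R))
print("R transitive:      ", is_transitive(R))
```

In this toy case "up" is similar to "left" and "left" is similar to "down", while "up" and "down" are preclusive, so the twin relation is reflexive and symmetric without being transitive.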
We use the following abbreviations: s ⊥ X for ∀t ∈ X(s ⊥ t); sRX for ∀t ∈ X(sRt); X ⊥ Y for ∀t ∈ X(t ⊥ Y); XRY for ∀t ∈ X(tRY). In quantum contexts, the similarity relation (which represents the negation of the orthogonality relation ⊥) is usually indicated by ⊥̸. While ⊥̸ is reflexive and symmetric, it is not generally transitive. Consider now the power set P(S) of the set of all states S. The preclusivity relation ⊥ permits one to define on P(S) a unary operation ^⊥ (called the preclusive complement), which turns out to be a weak complement. For any set X of states: X^⊥ := {s ∈ S : ∀t ∈ X(s ⊥ t)}. The preclusive complement ^⊥ satisfies the following properties for any sets X, Y of states:
• X ⊆ X^⊥⊥;
• X ⊆ Y implies Y^⊥ ⊆ X^⊥;
• X ∩ X^⊥ = ∅.
At the same time, the strong double negation principle (X^⊥⊥ ⊆ X) and the excluded middle principle (X ∪ X^⊥ = S) generally fail. Consider now the map ^⊥⊥ : P(S) → P(S) such that X ↦ X^⊥⊥, for any X ⊆ S. One can easily check that this map is a closure operator, satisfying the following conditions: ∅^⊥⊥ = ∅; X ⊆ X^⊥⊥; X^⊥⊥ = X^⊥⊥⊥⊥; X ⊆ Y implies X^⊥⊥ ⊆ Y^⊥⊥. Consider then the set C(P(S)) of all closed elements of the power set of S. By definition, we have: X ∈ C(P(S)) iff X = X^⊥⊥. The elements of C(P(S)) are called closed sets of states. As we will see, such sets play a very significant role for the semantics of quantum logic. A characteristic property of the closed sets of a preclusivity space is described by the following lemma.

LEMMA 6. If (U, R) is a similarity space associated with a preclusivity space (U, ⊥), and if X is any subset of U, then X is closed iff X satisfies the following condition: ∀x[x ∈ X iff ∀yRx ∃zRy(z ∈ X)].
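The properties of the preclusive complement listed above can be verified by brute force on a small example. The sketch below assumes a hypothetical four-state preclusivity space (state labels and function names are illustrative only); it computes the complement and the double-complement closure of every subset, lists the closed sets, and exhibits failures of the strong double negation and excluded middle principles.

```python
from itertools import combinations

# Hypothetical toy preclusivity space (not taken from the text).
S = frozenset({"up", "down", "left", "right"})
PERP = {("up", "down"), ("down", "up"), ("left", "right"), ("right", "left")}

def complement(X):
    """Preclusive complement: all states preclusive to every state in X."""
    return frozenset(s for s in S if all((s, t) in PERP for t in X))

def closure(X):
    """The map X -> X^perp-perp."""
    return complement(complement(X))

def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

ALL = subsets(S)
CLOSED = [X for X in ALL if closure(X) == X]

# The three listed properties hold for every subset.
print(all(X <= closure(X) for X in ALL))
print(all(complement(Y) <= complement(X) for X in ALL for Y in ALL if X <= Y))
print(all(not (X & complement(X)) for X in ALL))

# Strong double negation and excluded middle can fail.
X = frozenset({"up", "left"})
print(sorted(closure(X)), "is not contained in", sorted(X))
Z = frozenset({"up"})
print(sorted(Z | complement(Z)), "is not all of S")

print("closed sets:", [sorted(C) for C in CLOSED])
```

For this space the closed sets are the empty set, the four singletons, and S itself.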
The following theorem gives important information about the algebraic structure of C(P(S)).

THEOREM 7. The structure ⟨C(P(S)), ⊆, ^⊥, ∅, S⟩ is a complete bounded ortholattice, where for any family {Xi}i∈I ⊆ C(P(S)):
• the meet ⋀{Xi}i∈I exists and coincides with ⋂{Xi}i∈I;
• the join ⋁{Xi}i∈I exists and coincides with (⋃{Xi}i∈I)^⊥⊥;
• the preclusive complement ^⊥ is an orthocomplement.22
Generally the lattice C(P(S)) fails to be distributive. So far, we have focused on two special structures:
• the σ-orthocomplete orthomodular poset ⟨E, ≤, ′, 0, 1⟩ based on the set of all events;
• the complete ortholattice ⟨C(P(S)), ⊆, ^⊥, ∅, S⟩ based on the set of all closed sets of states.
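The lattice operations of Theorem 7 and the failure of distributivity can be illustrated on a finite case. This is a sketch under the assumption of a hypothetical four-state preclusivity space (labels and helper names are not from the text): the meet is set intersection, the join is the closure of the union, and one instance of the distributive law is tested.

```python
# Hypothetical toy preclusivity space (not taken from the text).
S = frozenset({"up", "down", "left", "right"})
PERP = {("up", "down"), ("down", "up"), ("left", "right"), ("right", "left")}

def complement(X):
    """Preclusive complement of a set of states."""
    return frozenset(s for s in S if all((s, t) in PERP for t in X))

def meet(X, Y):
    """Meet of closed sets: set-theoretic intersection."""
    return X & Y

def join(X, Y):
    """Join of closed sets: closure of the set-theoretic union."""
    return complement(complement(X | Y))

a = frozenset({"up"})
b = frozenset({"left"})
c = frozenset({"right"})

# The complement behaves as an orthocomplement on the closed set a.
print(join(a, complement(a)) == S, meet(a, complement(a)) == frozenset())

# One instance of the distributive law a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c).
lhs = meet(a, join(b, c))
rhs = join(meet(a, b), meet(a, c))
print(sorted(lhs), sorted(rhs), lhs == rhs)
```

Here the join of {left} and {right} is the whole space, so the left-hand side is {up}, while both meets on the right-hand side are empty; the two sides differ.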
A natural question arises: is there any structural relation between the two structures? If we want to obtain some interesting connections between E and C(P(S)), we shall require some further conditions. To this aim, we will first introduce a special map, called Yes, that will associate to any event E the set of all states that assign to E probability-value 1. In other words: Yes : E → P(S), where ∀E ∈ E, Yes(E) := {s ∈ S : s(E) = 1}. From an intuitive point of view, the set Yes(E) represents a kind of extension of the event E. If we regard states as possible worlds (in the sense of Kripkean semantics), then Yes(E) can be thought of as the set of all the worlds that certainly verify the event E. By adopting a standard semantic jargon, we can say that Yes(E) (which is also called the positive domain of E) represents the proposition that is associated to the event E. The map Yes is clearly order preserving. In other words, ∀E, F ∈ E, E ≤ F implies Yes(E) ⊆ Yes(F).
22 See Def. 47-57.
However, Yes is not generally injective. It may happen that two different events have one and the same positive domain (as shown by some counterexamples). More importantly, Yes(E) is not generally a closed set of states, i.e., it may happen that Yes(E) ∉ C(P(S)). Is it possible to make Yes injective and Yes(E) a closed set? To make Yes injective it is sufficient to require that the set S satisfies a special property, usually called richness.

DEFINITION 8. Let (E, S) be an event-state system. The set S is called rich for E iff ∀E, F ∈ E: Yes(E) ⊆ Yes(F) implies E ≤ F.

In other words, the set of states is rich for the set of all events whenever the event order (which is determined by the set of states) turns out to be completely determined by the behavior of the states with respect to the certain probability-value 1. Needless to stress, the richness property is stronger than the order-determining property. For this reason, it has sometimes been called strongly order-determining.23 One immediately sees that, if the set of states is rich, then the map Yes is injective. For, suppose Yes(E) = Yes(F). Then, by richness, E ≤ F and F ≤ E; hence, E = F (by the antisymmetry of ≤). How can we make Yes(E) closed (for any event E)? To this aim, let us first introduce the notion of carrier of a state.

DEFINITION 9 (Carrier of a state). Let (E, S) be an event-state system. An event E is called the carrier of a state s iff the following conditions are satisfied: (i) s(E) = 1; (ii) ∀F ∈ E: s ∈ Yes(F) implies E ≤ F.

Apparently, the carrier of a state s is the smallest element of E to which s assigns value 1. Generally, it is not the case that every state has a carrier. However, one can easily show that if the carrier of a state exists, then it is unique. When existing, the carrier of the state s will be denoted by car(s).
From an intuitive point of view, we could say that car(s) represents the characteristic property of the physical object described by s (a kind of individual concept in Leibniz’ sense). A situation where any state has a carrier corresponds to a richness property of the set of all events: each state is “characterized” by a special event. We call normal any event-state system (E, S) that satisfies the following conditions: (i) the set S of all states is rich for the set E of all events; (ii) for any state s ∈ S, the carrier of s exists.
23 See, for instance, [Beltrametti and Cassinelli, 1981].
One can prove that for any normal event-state system, the positive domain of any event is closed. In other words:

THEOREM 10. Let (E, S) be a normal event-state system. Then, ∀E ∈ E, Yes(E) = (Yes(E))^⊥⊥.

Let us finally ask whether the map Yes preserves the existing meets and joins of the poset E. The answer is given by the following theorem.

THEOREM 11. Let (E, S) be a normal event-state system. Then, Yes is an ortho-embedding of E into C(P(S)) that preserves all existing meets and joins in E.

Thus, for normal event-state systems, the map Yes is an embedding of the σ-orthocomplete orthomodular poset ⟨E, ≤, ′, 0, 1⟩ into the complete ortholattice ⟨C(P(S)), ⊆, ^⊥, ∅, S⟩. One immediately sees that the map Yes is not generally surjective. In order to make the map surjective it is necessary and sufficient to require that the orthomodular poset E is a complete lattice.

THEOREM 12. Let (E, S) be a normal event-state system. Then, the map Yes is surjective iff E is a complete lattice.

From the semantic point of view, a situation where the map Yes is surjective leads to a kind of extensional collapse: the structure of all events is isomorphic to the structure of all possible propositions. We call supernormal the event-state systems that satisfy this strong condition. As we will see, orthodox QT gives rise to event-state systems of this kind.
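The interplay between Yes, richness, carriers and closed sets can be traced on a very small example. The sketch below is an assumption-laden toy: six events (0, 1 and two complementary pairs a, a' and b, b') and four states given as hypothetical probability tables; none of these names or values comes from the text, and no claim is made that the toy captures every clause of Gudder’s definition. It checks that Yes(E) is closed, that the states are rich, that every state has a carrier, and that Yes maps the events one-to-one onto the closed sets.

```python
from fractions import Fraction
from itertools import combinations

HALF = Fraction(1, 2)
EVENTS = ["0", "a", "a'", "b", "b'", "1"]

# Hypothetical states: each assigns a probability value to every event.
STATES = {
    "s_a":  {"0": 0, "a": 1, "a'": 0, "b": HALF, "b'": HALF, "1": 1},
    "s_a'": {"0": 0, "a": 0, "a'": 1, "b": HALF, "b'": HALF, "1": 1},
    "s_b":  {"0": 0, "a": HALF, "a'": HALF, "b": 1, "b'": 0, "1": 1},
    "s_b'": {"0": 0, "a": HALF, "a'": HALF, "b": 0, "b'": 1, "1": 1},
}

def yes(event):
    """Positive domain: the states that assign value 1 to the event."""
    return frozenset(s for s, m in STATES.items() if m[event] == 1)

def leq(e, f):
    """Event order as determined by the states."""
    return all(m[e] <= m[f] for m in STATES.values())

def preclusive(s, t):
    """Definition 4: some event is certain for s and impossible for t."""
    return any(STATES[s][e] == 1 and STATES[t][e] == 0 for e in EVENTS)

def complement(X):
    """Preclusive complement of a set of states."""
    return frozenset(s for s in STATES if all(preclusive(s, t) for t in X))

def carrier(s):
    """Smallest event to which s assigns value 1, or None if there is none."""
    certain = [e for e in EVENTS if STATES[s][e] == 1]
    found = [e for e in certain if all(leq(e, f) for f in certain)]
    return found[0] if found else None

names = sorted(STATES)
all_sets = [frozenset(c) for r in range(len(names) + 1)
            for c in combinations(names, r)]
closed = {X for X in all_sets if complement(complement(X)) == X}

rich = all(leq(e, f) for e in EVENTS for f in EVENTS if yes(e) <= yes(f))
images = {yes(e) for e in EVENTS}

print("rich:", rich)
print("carriers:", {s: carrier(s) for s in names})
print("every Yes(E) closed:", images <= closed)
print("Yes injective:", len(images) == len(EVENTS))
print("Yes onto the closed sets:", images == closed)
```

In this toy system the events already form a complete lattice, which is why the map Yes turns out to be surjective onto the closed sets.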
2.2 Concrete event-state systems and the representation problem

What are the basic relations between abstract event-state systems and the concrete examples that emerge in the framework of Hilbert space structures? Is it possible to capture, by means of some abstract conditions that can be required for any abstract event-state pair (E, S), the characteristic properties of the concrete Hilbert space pairs? This is one of the most important questions that have been discussed over four decades in the framework of the logico-algebraic approach to QT. The first step in this analysis has been focusing upon the characteristic properties of the concrete event-state pairs. As we have learnt, a concrete event-structure can be identified with a Hilbert lattice, based either on the set C(H) of all closed subspaces of a Hilbert space H or (equivalently) on the set Π(H) of all projections of H. Such a structure is a complete orthomodular lattice, which fails to be distributive. Lattices of this kind turn out to satisfy a number of special conditions that do not generally hold for the events of an abstract event-state system. In particular,
it has been shown24 that any Hilbert lattice L(H) is an atomic, irreducible, nondistributive complete orthomodular lattice that satisfies the covering property.25 What can be said about the states of a concrete event-state system? As we already know, according to von Neumann’s axiomatization, the states of a quantum system S are mathematically represented by density operators of the Hilbert space H associated to S.26 The class of all density operators of H is denoted by D(H). A density operator ρ represents a pure state (maximal information about S) iff there is a unit vector ψ such that ρ is the projection P[ψ], where [ψ] is the 1-dimensional closed subspace determined by ψ. Any density operator ρ determines a map mρ : Π(H) → ℝ such that ∀P ∈ Π(H): mρ(P) = Tr(ρP) (where Tr is the trace functional). One can show that mρ is a (non-classical) probability measure on Π(H). In other words:
• mρ(0) = 0, mρ(1) = 1, and
• for any countable set {Pn}n∈I of pairwise orthogonal projections: mρ(⋁{Pn}n∈I) = Σn mρ(Pn).
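The trace rule mρ(P) = Tr(ρP) is easy to evaluate in a two-dimensional space. The following sketch uses plain Python lists as 2×2 complex matrices; the particular vectors and the helper names (`projector`, `measure`, and so on) are illustrative assumptions rather than anything fixed by the text. It checks the normalization conditions and additivity on one pair of orthogonal projections.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(A, B):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))

def projector(psi):
    """Projection onto the 1-dimensional subspace spanned by psi."""
    psi = [complex(c) for c in psi]
    norm2 = sum(abs(c) ** 2 for c in psi)
    return [[a * b.conjugate() / norm2 for b in psi] for a in psi]

def measure(rho, P):
    """m_rho(P) = Tr(rho P); the imaginary part vanishes up to rounding."""
    return trace(matmul(rho, P)).real

ZERO = [[0j, 0j], [0j, 0j]]
ONE = [[1 + 0j, 0j], [0j, 1 + 0j]]

rho = projector([1, 0])      # a pure state
P_plus = projector([1, 1])   # two mutually orthogonal projections
P_minus = projector([1, -1])

print(measure(rho, ZERO), measure(rho, ONE))
print(measure(rho, P_plus), measure(rho, P_minus))
# Additivity on orthogonal projections: their join is here their sum.
print(measure(rho, matadd(P_plus, P_minus)))
```

For this choice of vectors the two orthogonal projections each receive value 1/2, and their join (the identity) receives value 1.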
Consider now the set S(H) := {mρ : ρ ∈ D(H)}. This set contains precisely all those probability measures on Π(H) that are determined by density operators. At first sight, nothing guarantees that all probability measures defined on Π(H) are determined by a density operator. One can prove that, if the dimension of H is at least three, then every probability measure on Π(H) has the form mρ, for some ρ ∈ D(H). This is the content of a celebrated theorem proved by Gleason in 1957.

THEOREM 13 (Gleason’s Theorem). Let H be a separable Hilbert space of dimension at least 3. Then, for every probability measure s on Π(H), there exists a unique density operator ρ ∈ D(H) such that ∀P ∈ Π(H): s(P) = mρ(P).27

Gleason’s theorem had a tremendous impact on subsequent quantum-logical research. Apparently, the theorem assures that the intuitive notion of quantum state is perfectly grasped by the notion of density operator (whenever one is dealing with a Hilbert space whose dimension is at least three). Now we have focused upon the following:
24 See [Beltrametti and Cassinelli, 1981].
25 See Def. 64-66, 68.
26 We recall that a density operator is a linear, bounded, positive, trace-class operator of trace 1.
27 See [Gleason, 1957; Varadarajan, 1985; Dvurečenskij, 1993]. Gleason’s Theorem can be generalized also to Hilbert spaces over the quaternions.
• a special set of events Π(H);
• a special set of states, identified with the set S(H) of all probability measures defined on Π(H) and determined by density operators.
Consider the pair (Π(H), S(H)) and the isomorphic (C(H), S(H)). One can prove that both pairs represent an event-state system in the sense of Gudder. Such concrete event-state systems are usually called Hilbert event-state systems. Unlike abstract event-state systems, Hilbert event-state systems are always normal and supernormal. Let us now turn to the crucial question that has been investigated at length in the quantum logical literature: is it possible to capture lattice-theoretically the structure of Hilbert lattices? For many authors, the basic aim was to prove a kind of representation theorem that could reasonably replace a very strong axiom assumed by Mackey in his book. This axiom (Axiom 7 in the framework of Mackey’s axiomatization) asserted the following principle: the partially ordered set of all events is isomorphic to the partially ordered set of all closed subspaces of a separable infinite dimensional complex Hilbert space. Because of its apparent ad hoc character, such a principle has never been accepted as a reasonable axiom by the quantum logic community. In 1964, Piron gave an important partial answer to the representation problem. The content of Piron’s theorem can be summarized as follows:

THEOREM 14 (The Piron weak representation theorem). Let L be a complete, irreducible, atomic, orthomodular lattice satisfying the covering property. If L has at least four pairwise orthogonal elements, then L is isomorphic to the orthomodular lattice of all closed subspaces of a generalized Hilbert space.28

One also says that the lattice L that is isomorphic to the lattice of all closed subspaces is coordinatized by the generalized Hilbert space.
Apparently, Piron’s theorem refers to a more general category of vector spaces: unlike the case of Hilbert spaces, generalized Hilbert spaces are not necessarily based on the real numbers or the complex numbers or the quaternions. The question arises: do the properties of the coordinatized lattice L of Piron’s Theorem force the generalized Hilbert space to be an actual Hilbert space? Quite unexpectedly, in 1980 Keller29 proved a negative result: there are lattices that satisfy all the conditions of Piron’s Theorem; at the same time, they are coordinatized by generalized Hilbert spaces over non-archimedean division rings. Keller’s counterexamples have sometimes been interpreted as showing the definitive impossibility for the quantum logical approach to capture the Hilbert space mathematics. This impossibility was supposed to demonstrate the failure of the quantum logic
28 See [Piron, 1976; Varadarajan, 1985].
29 See [Keller, 1980].
approach in reaching its main goal: the “bottom-top” reconstruction of Hilbert lattices. Interestingly enough, such a negative conclusion has been contradicted by an important result proved by Solèr30 in 1995: Hilbert lattices can be characterized in a lattice-theoretic way. Solèr’s Theorem is quite technical. We will try to report here only the basic intuitive idea. The fundamental step in Solèr’s proof is finding a necessary and sufficient condition for a generalized Hilbert space to be a Hilbert space.

DEFINITION 15 (The Solèr condition). An infinite dimensional generalized Hilbert space satisfies the Solèr condition iff there exists a set of vectors {ψi}i∈N and a scalar c such that:
• ∀i[⟨ψi | ψi⟩ = c];
• ∀i, j[i ≠ j implies ⟨ψi | ψj⟩ = 0].
In other words, the elements of the set {ψi}i∈N (also called a c-orthogonal set) are pairwise orthogonal, while the inner product of any element with itself is identically equal to c. On this basis, Solèr’s strong representation theorem asserts the following equivalence:

THEOREM 16. An infinite dimensional generalized Hilbert space (over a division ring) is a Hilbert space iff the space satisfies Solèr’s condition.

As a consequence, the Solèr condition turns out to characterize Hilbert spaces in the class of all generalized Hilbert spaces. The important point is that one is dealing with a condition that admits a purely lattice-theoretic characterization, namely the so-called angle bisecting condition. One can prove that every lattice of infinite length31 that satisfies the angle bisecting condition (in addition to the conditions of the Piron Theorem) is isomorphic to a Hilbert lattice. At first sight, it seems difficult to give an intuitive physical interpretation either for the Solèr condition or for the angle bisecting condition (whose formulation is quite long and complicated).
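A familiar instance may help to fix the content of Definition 15. The display below is only an illustrative example chosen here (the space ℓ² and the vectors e_i are not mentioned in the text): in the complex sequence space ℓ², the standard unit vectors form a c-orthogonal set with c = 1.

```latex
\[
  H = \ell^{2}(\mathbb{C}), \qquad
  e_i = (0, \dots, 0, \underbrace{1}_{i\text{-th place}}, 0, \dots), \qquad
  \langle e_i \mid e_j \rangle =
  \begin{cases}
    1 & \text{if } i = j, \\
    0 & \text{if } i \neq j,
  \end{cases}
\]
\[
  \text{so that } \{ e_i \}_{i \in \mathbb{N}} \text{ satisfies both clauses of the Sol\`er condition with } c = 1 .
\]
```

The substantive direction of Theorem 16 is the converse one: the mere existence of such a set in a generalized Hilbert space already forces the space to be a Hilbert space.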
Interestingly enough, in 1995 Holland32 found another condition (called the ample unitary group condition), which seems to be physically more attractive. One can show that the Solèr condition and the Holland condition are equivalent. Both the Solèr condition and the ample unitary group condition have a major flaw: they essentially refer to the generalized Hilbert space machinery. Only the angle bisecting condition is purely lattice-theoretic. However, this condition is rather technical and by no means intuitive. To overcome this difficulty, in 2000
30 See [Solèr, 1995].
31 The length of a lattice L is defined to be the supremum, over all the chains of L, of the numbers of elements in each chain minus 1.
32 See [Holland, 1995].
Aerts and Van Steirteghem33 proposed a new lattice-theoretic condition, called plane transitivity. One can prove that the plane transitivity condition and the angle bisecting condition are equivalent. At the same time, from an intuitive point of view, the content of the plane transitivity condition turns out to be somewhat close to Holland’s ample unitary group condition.

DEFINITION 17 (The plane transitivity condition). Let L be an atomic orthomodular lattice.34 We say that L satisfies the plane transitivity condition iff for any two atoms a, b ∈ L, there are two distinct atoms a1, b1 ∈ L and an isomorphism h: L → L such that the following conditions are satisfied: (i) ∀c ∈ L: 0 ≤ c ≤ a1 ∨ b1 implies h(c) = c; (ii) h(a) = b.

Summing up:

THEOREM 18. For any infinite dimensional generalized Hilbert space the following conditions are equivalent: (i) the space is a Hilbert space; (ii) the space satisfies the Solèr condition; (iii) the space satisfies the ample unitary group condition; (iv) the orthomodular lattice of all closed subspaces of the space satisfies the plane transitivity condition.

As a consequence, one can show that every atomic, irreducible, complete orthomodular lattice L of infinite length that satisfies the covering property and the plane transitivity condition is isomorphic to a Hilbert lattice L(H). Notice that the infinite length of the coordinatized lattice L implies that the coordinatizing generalized Hilbert space H is infinite dimensional. Furthermore, L is separable iff H is separable. Let us now return to the class of all abstract event-state systems. Theorem 18 naturally suggests the following definition.

DEFINITION 19 (Solèr event-state system). An event-state system (E, S) (in the sense of Gudder) is called a Solèr event-state system iff the set of all events E has the structure of an atomic, irreducible, complete orthomodular lattice of infinite length that satisfies the covering property and the plane transitivity condition.
As a consequence, one immediately obtains that all Solèr event-state systems (E, S) (such that E is separable) are supernormal. All these results represent a satisfactory solution to the representation problem of the quantum logical approach to QT: Solèr event-state systems represent a faithful abstract description of the basic structures of orthodox QT. Mackey’s critical Axiom 7 may now be replaced by an axiom that is not simply ad hoc.
33 See [Aerts and van Steirteghem, 2000].
34 Atoms and atomicity are defined in Def. 64 and 65.
Solèr’s Theorem might have closed the circle for the quantum logical approach to QT, leading to a perfect correspondence between the abstract and the concrete axiomatization of (non-relativistic) QT. These results might have determined a quick decay for the quantum logical investigations, whose basic goal seemed to be definitively reached. Strangely enough, what happened was quite the opposite. While a number of scholars were engaged in the solution of the representation problem, others were trying to discover a possible emergence of new logical and algebraic structures in the framework of Hilbert space QT. This work led finally to the birth of a new chapter of the history of quantum logic, the unsharp approaches (which will be presented in Section 4).

3 IS QUANTUM LOGIC A “VERITABLE” LOGIC?
For a long time, the investigations in the framework of the logico-algebraic approach to QT did not give a clear answer to the question “does a formal description of the quantum world force us to assume a non-classical logic?” Birkhoff and von Neumann seemed inclined to a positive answer. At the very beginning of their paper they observed:

One of the aspects of quantum theory which has attracted the most general attention, is the novelty of the logical notions which it presupposes .... The object of the present paper is to discover what logical structures one may hope to find in physical theories which, like quantum mechanics, do not conform to classical logic.

In spite of this general program, Birkhoff and von Neumann never tried to develop a technical version of quantum logic as a formal logic. Later on, a number of scholars who were working in the framework of the logico-algebraic approach to QT seemed to take a quite ambiguous attitude in this respect. A paradigmatic example is represented by a somewhat obscure position defended by Jauch in his celebrated book “Foundations of quantum mechanics” (1968) (which greatly influenced the quantum logical community):

The propositional calculus of a physical system has a certain similarity to the corresponding calculus of ordinary logic. In the case of quantum mechanics, one often refers to this analogy and speaks of quantum logic in contradistinction to ordinary logic. ... The calculus introduced here has an entirely different meaning from the analogous calculus used in formal logic. Our calculus is the formalization of a set of empirical relations which are obtained by making measurements on a physical system. It expresses an objectively given property of the physical world. It is thus the formalization of empirical facts, inductively arrived at and subject to the uncertainty of any such fact. The calculus of formal logic, on the other hand, is obtained by making an analysis of the meaning of propositions.
It is true under all circumstances and even
tautologically so. Thus, ordinary logic is used even in quantum mechanics of systems with a propositional calculus vastly different from that of formal logic. The two need have nothing in common.
3.1 A possible world semantics for quantum logic

A turning point in the development of quantum logic as a logic was the proposal of a possible world semantics, which appeared to be a natural abstraction from the quantum theoretic formalism. In 1972 the Russian logician Dishkant published the article “Semantics of the minimal logic of quantum mechanics”, which soon became a basic point of reference for quantum logical research. Dishkant’s ideas were further developed by Goldblatt in the article “Semantic analysis of orthologic” (which appeared in 1974). From an intuitive point of view, the possible world semantics for quantum logic can be regarded as a natural variant of the kind of semantics that Kripke had proposed for intuitionistic logic and for modal logics. Accordingly, one also speaks of Kripkean semantics for quantum logic. As is well known, Kripkean models for intuitionistic logic are based on sets of possible worlds possibly correlated by an accessibility relation, which is reflexive and transitive. According to a canonical interpretation, the possible worlds of an intuitionistic Kripkean model can be regarded as states of knowledge in progress. When a world j is accessible to another world i, the state of knowledge corresponding to j is more informative than the state of knowledge represented by i. In this framework, knowledge is conservative: when a state of knowledge i knows a given sentence, then all the states of knowledge that are accessible to i know the sentence in question. The Kripkean characterization of quantum logic is based on a quite different idea. Possible worlds are interpreted as states of quantum objects, while the accessibility relation is identified with a similarity relation that may hold between states. From an intuitive point of view, one can easily understand the reason why semantic models with a reflexive and symmetric accessibility relation may be physically significant.
In fact, physical theories are not generally concerned with possible evolutions of states of knowledge with respect to a constant world (as happens in the case of intuitionistic logic), but rather with sets of physical situations that may be similar, where states of knowledge must single out some invariants. We will now briefly sketch the basic concepts of the possible world semantics for a weak form of quantum logic, which Dishkant had called minimal quantum logic, while Goldblatt preferred to speak of orthologic. This logic fails to satisfy an important property of (abstract and concrete) quantum event-structures: orthomodularity. Following Goldblatt’s terminology, we will distinguish orthologic (OL) from orthomodular quantum logic (OQL), which is often simply called quantum logic. The sentential language of both logics consists of sentential letters and of the following primitive connectives: ¬ (not), ∧ (and). The notion of sentence is
defined in the expected way. We will use the following metavariables: p, q, ... for atomic sentences and α, β, γ, ... for sentences. The disjunction ∨ (or) is supposed to be defined via the de Morgan law (α ∨ β := ¬(¬α ∧ ¬β)).
We have already met the notion of similarity space: a pair consisting of a set of objects (representing the universe) and a similarity relation. We have seen how this notion plays an important role both for abstract and for concrete event-state systems. We will now see how similarity spaces have been used for the construction of Kripkean models for quantum logic. In semantic contexts, similarity spaces (I, R) (where I represents a set of possible worlds, while R represents an accessibility relation, which is reflexive and symmetric) are often called orthoframes. Given an orthoframe, we will use i, j, k, . . . as variables ranging over the set of worlds. Sometimes we write i ⊥̸ j for iRj. As we already know, any similarity space has a “twin space” that is a preclusivity space. The preclusivity relation corresponding to the accessibility relation ⊥̸ will be denoted by ⊥. Hence, we will have: i ⊥ j iff not i ⊥̸ j. Whenever i ⊥ j we will say that j is inaccessible or orthogonal to i. We have already learnt that any preclusivity space (I, ⊥) permits one to define a preclusive complement ^⊥ on the power set P(I) of I: ∀X ⊆ I[X^⊥ := {i ∈ I : i ⊥ X}]. The following conditions hold:
• the map ^⊥⊥ : P(I) → P(I) is a closure operator;
• the structure ⟨C(P(I)), ⊆, ^⊥, ∅, I⟩ based on the set of all closed subsets of I is an ortholattice. Hence, in particular, ^⊥ is an orthocomplement;
• X is a closed subset of I iff ∀i[i ∈ X iff ∀j ⊥̸ i ∃k ⊥̸ j(k ∈ X)].
In the framework of semantic applications, the closed subsets of I are usually called (quantum) propositions of the orthoframe (I, R). The following lemma sums up some basic properties of (quantum) propositions:

LEMMA 20. Let (I, R) be an orthoframe. (i) I and ∅ are propositions; (ii) if X is any set of worlds, then X^⊥ is a proposition; (iii) if C is a family of propositions, then ⋂C is a proposition.
On this basis, the notion of Kripkean model for OL can be defined as follows:

DEFINITION 21 (Kripkean model for OL). A Kripkean model for OL is a system K = ⟨I, R, Pr, V⟩, where: (i) (I, R) is an orthoframe and Pr is a set of propositions of the frame that contains ∅, I and is closed under the orthocomplement ^⊥ and set-theoretic intersection ∩;
(ii) V is a function that associates to any sentence α a proposition in Pr, satisfying the following conditions: V(¬β) = V(β)^⊥; V(β ∧ γ) = V(β) ∩ V(γ).

Instead of i ∈ V(α), one usually writes i ⊨ α and one reads: “α is true in the world i”. If T is a set of sentences, i ⊨ T will mean i ⊨ β for any β ∈ T.

THEOREM 22. For any Kripkean model K and any sentence α: i ⊨ α iff ∀j ⊥̸ i ∃k ⊥̸ j (k ⊨ α).

LEMMA 23. In any Kripkean model K:
(i) i ⊨ ¬β iff ∀j ⊥̸ i [j ⊭ β];
(ii) i ⊨ β ∧ γ iff i ⊨ β and i ⊨ γ.
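The truth conditions above can be tested on a small model. The following sketch assumes a hypothetical orthoframe with four worlds and a hypothetical valuation of two atomic sentences p and q (the encoding of sentences as nested tuples and every name below are illustrative assumptions); Pr is implicitly taken to be the set of all propositions of the frame. It checks Lemma 23 (i) and Theorem 22 at every world, and evaluates the disjunction defined via the de Morgan law.

```python
# Hypothetical orthoframe: four worlds, accessibility = "not preclusive".
WORLDS = frozenset({"up", "down", "left", "right"})
PERP = {("up", "down"), ("down", "up"), ("left", "right"), ("right", "left")}

def accessible(i, j):
    """The reflexive and symmetric accessibility relation R."""
    return (i, j) not in PERP

def complement(X):
    """Preclusive complement of a set of worlds."""
    return frozenset(i for i in WORLDS if all((i, j) in PERP for j in X))

# Sentences as nested tuples; disjunction is defined, not primitive.
def atom(name):
    return ("atom", name)

def neg(a):
    return ("not", a)

def conj(a, b):
    return ("and", a, b)

def disj(a, b):
    return neg(conj(neg(a), neg(b)))

# Hypothetical valuation of the atoms; both values are closed sets.
VAL = {"p": frozenset({"left"}), "q": frozenset({"right"})}

def V(sentence):
    """The proposition associated to a sentence (Definition 21 (ii))."""
    tag = sentence[0]
    if tag == "atom":
        return VAL[sentence[1]]
    if tag == "not":
        return complement(V(sentence[1]))
    if tag == "and":
        return V(sentence[1]) & V(sentence[2])
    raise ValueError(f"unknown connective: {tag}")

def holds(i, sentence):
    return i in V(sentence)

def lemma_23_i(i, beta):
    """i |= not-beta iff no world accessible to i verifies beta."""
    rhs = all(not holds(j, beta) for j in WORLDS if accessible(i, j))
    return holds(i, neg(beta)) == rhs

def theorem_22(i, alpha):
    """i |= alpha iff every j accessible to i accesses some k with k |= alpha."""
    rhs = all(any(holds(k, alpha) for k in WORLDS if accessible(j, k))
              for j in WORLDS if accessible(i, j))
    return holds(i, alpha) == rhs

p, q = atom("p"), atom("q")
SENTENCES = [p, q, neg(p), conj(p, q), disj(p, q)]

print(all(lemma_23_i(i, s) for i in WORLDS for s in SENTENCES))
print(all(theorem_22(i, s) for i in WORLDS for s in SENTENCES))
print(holds("up", disj(p, q)), holds("up", p), holds("up", q))
```

In this model the defined disjunction of p and q holds at the world "up" although neither p nor q holds there, since the complements of the two atomic propositions have empty intersection.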
On this basis, the notions of truth, logical truth, consequence, logical consequence are defined in the expected way.

DEFINITION 24 (Truth and logical truth). A sentence α is true in a model K = ⟨I, R, Pr, V⟩ (abbreviated ⊨_K α) iff V(α) = I; α is a logical truth of OL (⊨_OL α) iff ⊨_K α for any model K.

DEFINITION 25 (Consequence in a model and logical consequence). Let T be a set of sentences and let K be a model. A sentence α is a consequence in K of T (T ⊨_K α) iff for any world i of K, i ⊨ T implies i ⊨ α. A sentence α is a logical consequence of T (T ⊨_OL α) iff for any model K, T ⊨_K α.

We have seen that the set of propositions of a Kripkean model for OL gives rise to an ortholattice. On this basis, Kripkean models for OL can be canonically transformed into algebraic models, where the meaning of any sentence is identified with an element of an ortholattice, while the connectives are interpreted as the corresponding lattice-operations. It has been shown that the Kripkean and the algebraic semantics characterize the same logic OL.35 In order to characterize orthomodular quantum logic (or quantum logic) one shall require a stronger condition in the definition of Kripkean model:

DEFINITION 26 (Kripkean model for OQL). A Kripkean model for OQL is a Kripkean model K = ⟨I, R, Pr, V⟩ for OL, where the set of propositions Pr satisfies the orthomodular property: X ⊆ Y implies Y = X ∨ (X ∨ Y′)′.

We will indicate by QL either OL or OQL. Both logics are characterized by a deep asymmetry between conjunction and disjunction. By definition of Kripkean model, we have:
35 See [Dalla Chiara and Giuntini, 2002].
• i ⊨ β ∧ γ iff i ⊨ β and i ⊨ γ;
• i ⊨ β ∨ γ iff ∀j ⊥̸ i ∃k ⊥̸ j (k ⊨ β or k ⊨ γ).
Hence, a disjunction may be true even if neither member is true. A consequence of this asymmetry is the failure of the distributivity principle: α ∧ (β ∨ γ) ⊭_QL (α ∧ β) ∨ (α ∧ γ).
The semantic behavior of the quantum logical disjunction, which may appear prima facie somewhat strange, seems to reflect pretty well a number of concrete quantum situations. In quantum theory one is often dealing with alternatives that are semantically determined and true, while both members are, in principle, indeterminate. For instance, suppose we are referring to a spin one-half particle (say an electron) whose spin in a certain direction may assume only two possible values: either up or down. Now, according to one of the uncertainty principles, the spin in the x direction (spin_x) and the spin in the y direction (spin_y) represent two incompatible quantities that cannot be simultaneously measured. Suppose an electron in state ψ verifies the proposition “spin_x is up”. As a consequence of the uncertainty principle both propositions “spin_y is up” and “spin_y is down” shall be indeterminate. However the disjunction “either spin_y is up or spin_y is down” must be true.
Interestingly enough, this characteristic feature of quantum logic had already been considered by Aristotle. One of Łukasiewicz’s celebrated contributions to the history of logic was the discovery that Aristotle was the first many-valued logician. Following this line of thought, one could reasonably add that Aristotle was, in a sense, even the first quantum logician. Let us refer to Łukasiewicz’s analysis of the 9th chapter of Aristotle’s De Interpretatione. We are dealing with the famous example concerning the sea-battle question.
According to Łukasiewicz’s interpretation, Aristotle seems to assert that both the sentence “Tomorrow there will be a sea-battle” and its negation “Tomorrow there will not be a sea-battle” have today no definite truth-value. At the same time the disjunction “Either tomorrow there will be a sea-battle or tomorrow there will not be a sea-battle” is today (and always) true. In other words, Aristotle seems to be aware of the necessity of distinguishing the logical law of the excluded middle from the semantic bivalence principle. As a consequence, we obtain the possibility of a typical quantum logical situation:
the truth of a disjunction does not generally imply the truth of at least one member.
As expected, the Kripkean models of OQL admit a quite natural realization in the framework of the Hilbert event-state systems. Consider a quantum system S with associated Hilbert space H. Let (Π(H), S(H)) be the event-state system based on H. As we already know, Π(H) (the set of all projections of H) represents the set of all possible events that may occur to system S, while S(H) (the set of all probability measures m_ρ determined by a density operator ρ of H) represents the set of all pure and mixed states of S. Consider now a sentential language L_S for S, whose atomic sentences refer to possible events M(Δ) asserting that the value of an observable M lies in the Borel set Δ. Consider now the set Yes(M(Δ)), consisting of all the states that assign probability-value 1 to the event M(Δ). As we already know, Yes(M(Δ)) is a closed subset of S(H). On this basis, we can construct the following Kripkean model for S: K_S = ⟨I, R, Pr, V⟩, where:
• I is the set S(H) of the states of S;
• R is the similarity relation that is defined on S(H). In other words: iRj iff not ∃E ∈ Π(H) [i(E) = 1 and j(E) = 0];
• Pr = C(P(S)) (= the set of all closed subsets of S(H));
• for any atomic sentence p, V(p) = Yes(M(Δ)), where M(Δ) is the event which the sentence p refers to.
One immediately realizes that K_S is a Kripkean model. For:
• R is a similarity relation (reflexive and symmetric);
• Pr is a set of propositions, because every element X of C(P(S)) is a closed set such that X = X⊥⊥. Furthermore, Pr contains ∅ and I, and is closed under the operations ⊥ and ∩;
• for any p, V(p) ∈ Pr.
Interestingly enough, the accessibility relation turns out to have the following physical meaning: iRj iff j is a state into which i can be transformed after the performance of a physical measurement that concerns an observable of the system (by application of the von Neumann-Lüders axiom, the so-called “collapse of the wave function”).
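The construction above can be tried out on a very small scale. The following is a minimal sketch, not part of the original treatment: it assumes a toy frame with only four worlds (the spin_x and spin_y eigenstates of a spin one-half particle, written as unnormalized vectors), takes accessibility to be non-orthogonality, and computes propositions as the sets X with X = X⊥⊥. All names (WORLDS, perp, closure, join, meet) are hypothetical helpers introduced for illustration. The sketch checks the two claims made earlier: a disjunction can be true in a world where neither member is true, and distributivity fails.

```python
from itertools import combinations

# Toy frame (an assumption of this sketch): four pure spin-1/2 states,
# written as unnormalized vectors in the spin-z basis.
WORLDS = {
    "x_up": (1, 1),
    "x_down": (1, -1),
    "y_up": (1, 1j),
    "y_down": (1, -1j),
}
I = frozenset(WORLDS)


def inner(i, j):
    """Hermitian inner product of the vectors attached to worlds i and j."""
    (a, b), (c, d) = WORLDS[i], WORLDS[j]
    return complex(a).conjugate() * c + complex(b).conjugate() * d


def orthogonal(i, j):
    """i ⊥ j: world j is NOT accessible from world i."""
    return inner(i, j) == 0


def perp(X):
    """X⊥: all worlds orthogonal to every world of X."""
    return frozenset(i for i in I if all(orthogonal(i, j) for j in X))


def closure(X):
    """X⊥⊥: the smallest proposition containing X."""
    return perp(perp(X))


def join(X, Y):
    """Quantum disjunction: closure of the set-theoretic union."""
    return closure(X | Y)


def meet(X, Y):
    """Quantum conjunction: plain intersection."""
    return X & Y


# Propositions are exactly the closed subsets (X = X⊥⊥).
PROPOSITIONS = {
    frozenset(c)
    for r in range(len(I) + 1)
    for c in combinations(sorted(I), r)
    if closure(frozenset(c)) == frozenset(c)
}

X_UP = frozenset({"x_up"})
Y_UP = frozenset({"y_up"})
Y_DOWN = frozenset({"y_down"})

# In the world x_up neither disjunct is true, yet the disjunction is.
neither_true = "x_up" not in Y_UP and "x_up" not in Y_DOWN
disjunction_true = "x_up" in join(Y_UP, Y_DOWN)

# Distributivity: compare α ∧ (β ∨ γ) with (α ∧ β) ∨ (α ∧ γ).
lhs = meet(X_UP, join(Y_UP, Y_DOWN))
rhs = join(meet(X_UP, Y_UP), meet(X_UP, Y_DOWN))

print(len(PROPOSITIONS), neither_true, disjunction_true, lhs == rhs)
```

In this toy frame the propositions are ∅, I and the four singletons, so the proposition-structure is a six-element orthomodular lattice that is not distributive.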
Let us now return to our general definition of Kripkean model for OQL. Apparently, orthomodularity has not been characterized in terms of properties of the accessibility relation. Hence, the following important question arises: is it possible to express the orthomodularity of the proposition-structure in an orthoframe (I, R) as an elementary (first-order) property of the accessibility relation R? In 1984, Goldblatt gave a negative answer to this question, proving that orthomodularity is not elementary.36 Goldblatt’s theorem has revealed a kind of metalogical intractability of OQL. As a consequence of this negative result, questions like decidability and the finite model property (which have been settled positively for OL) have stubbornly resisted many attempts at solution in the case of OQL, and are still open problems.
At the same time, OQL seems to have some logical advantages that are not shared by the weaker OL. For instance, interestingly enough, a conditional connective → turns out to be definable in terms of the primitive connectives of the quantum logical language. The most natural definition (originally proposed by Finch (1970) and Mittelstaedt (1972) and further investigated by Hardegree (1976) and other authors) is the following: α → β := ¬α ∨ (α ∧ β). In the quantum logical literature, this connective is often called the Sasaki hook. Of course, in classical logic (by distributivity), the Sasaki hook is equivalent to the standard Philo’s conditional ¬α ∨ β. Notice that this classical conditional could not represent a “good” conditional for quantum logic, because it does not generally satisfy Modus Ponens. One can easily show that there are worlds i of a Kripkean model K such that: i ⊨ α; i ⊨ ¬α ∨ β; i ⊭ β. The Sasaki hook, instead, turns out to be well-behaved with respect to Modus Ponens, in the case of OQL (but not in the case of OL!).
Although satisfying Modus Ponens, the quantum logical conditional gives rise to some anomalies.
For instance, the following laws which hold for positive conditionals are here violated:
α → (β → α);
(α → (β → γ)) → ((α → β) → (α → γ));
(α → β) → ((β → γ) → (α → γ));
(α ∧ β → γ) → (α → (β → γ));
(α → (β → γ)) → (β → (α → γ)).
In 1975 Hardegree37 suggested that such an anomalous behavior might be explained by conjecturing that the quantum logical conditional represents a kind of counterfactual conditional. This hypothesis seems to be confirmed by some significant physical examples. Let us consider again the Kripkean models that are associated to a quantum system S. Following Hardegree, we restrict our attention to the case of pure states. As a consequence, we consider Kripkean models having the following form: K_S = ⟨I, R, Pr, V⟩, where:
• I is the set of all pure states of S;
• R is the nonorthogonality relation defined on I;
• Pr is the set of all pure propositions of the event-state system (Π(H), S(H)). In other words: Z ∈ Pr iff Z is a closed set of pure states (i.e., such that Z = Z⊥⊥);
• V(p) is the pure proposition consisting of all pure states that assign probability-value 1 to the question expressed by p.
Hardegree has shown that, in such a case, the conditional → turns out to receive a quite natural counterfactual interpretation (in the sense of Stalnaker38). More precisely, one can define, for any sentence α of the language L_S, a partial Stalnaker-function f_α in the following way: f_α : Dom(f_α) → I, where: Dom(f_α) := {i ∈ I : i ⊥̸ V(α)}. In other words, f_α is defined exactly for all the pure states that are not orthogonal to the proposition of α. If i ∈ Dom(f_α), then: f_α(i) := P_V(α) i, where P_V(α) is the projection that is uniquely associated with the pure proposition V(α). The following condition holds: i ⊨ α → β iff either ∀j ⊥̸ i (j ⊭ α) or f_α(i) ⊨ β.
From an intuitive point of view, one can say that f_α(i) represents the “pure state nearest” to i that verifies α, where “nearest” is here defined in terms of the metric of the Hilbert space H. By definition and in virtue of the von Neumann-Lüders axiom (the collapse of the wave-function), f_α(i) turns out to have the following physical meaning: it represents the transformation of state i after the performance of a measurement concerning the physical event expressed by α, provided the result was positive. As a consequence, one obtains: α → β is true in a state i iff either α is impossible for i or the state into which i has been transformed after a positive α-test verifies β.
36 See [Goldblatt, 1984].
37 See [Hardegree, 1975].
38 See [Stalnaker, 1981].
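The counterfactual reading of the Sasaki hook can be checked by brute force on a small example. The sketch below is only an illustration under explicit assumptions: it uses a hypothetical four-world frame of spin one-half states, and it replaces the Hilbert-space projection P_V(α) i by "the world of V(α) with the largest overlap with i", which agrees with the normalized projection in this particular frame but is a simplification in general. The helper names (sasaki, alpha_impossible, stalnaker) are introduced here and do not come from the literature.

```python
from itertools import combinations

# Hypothetical toy frame: four pure spin-1/2 states as unnormalized vectors.
WORLDS = {
    "x_up": (1, 1),
    "x_down": (1, -1),
    "y_up": (1, 1j),
    "y_down": (1, -1j),
}
I = frozenset(WORLDS)


def inner(i, j):
    """Hermitian inner product of the vectors attached to worlds i and j."""
    (a, b), (c, d) = WORLDS[i], WORLDS[j]
    return complex(a).conjugate() * c + complex(b).conjugate() * d


def orthogonal(i, j):
    return inner(i, j) == 0


def perp(X):
    return frozenset(i for i in I if all(orthogonal(i, j) for j in X))


def closure(X):
    return perp(perp(X))


# All propositions of the frame: the subsets X with X = X⊥⊥.
PROPOSITIONS = [
    frozenset(c)
    for r in range(len(I) + 1)
    for c in combinations(sorted(I), r)
    if closure(frozenset(c)) == frozenset(c)
]


def sasaki(A, B):
    """V(α → β) for V(α) = A, V(β) = B, i.e. the value of ¬α ∨ (α ∧ β)."""
    return closure(perp(A) | (A & B))


def alpha_impossible(A, i):
    """∀j ⊥̸ i (j ⊭ α): no world accessible from i verifies α."""
    return all(j not in A for j in I if not orthogonal(i, j))


def stalnaker(A, i):
    """f_α(i): the world of A nearest to i, or None when i ⊥ A (outside Dom)."""
    candidates = [w for w in A if not orthogonal(i, w)]
    if not candidates:
        return None
    return max(candidates, key=lambda w: abs(inner(i, w)))


checked = 0
all_agree = True
for A in PROPOSITIONS:
    for B in PROPOSITIONS:
        for i in I:
            lhs = i in sasaki(A, B)
            rhs = alpha_impossible(A, i) or stalnaker(A, i) in B
            checked += 1
            all_agree = all_agree and (lhs == rhs)

print(checked, all_agree)
```

The loop compares, for every pair of propositions and every world, the truth of α → β with the right-hand side of Hardegree's condition; agreement in this tiny frame is of course only a sanity check, not a proof of the general result.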
3.2 Axiomatizations of quantum logic
Both OL and OQL are axiomatizable logics. Many axiomatizations have been proposed: in the Hilbert-Bernays style and in the Gentzen-style (natural deduction and sequent-calculi).39 We present here a calculus (in the natural deduction style) which is a slight modification of the version proposed by Goldblatt in 1974. This calculus (which has no axioms) is determined as a set of rules. Let T1, ..., Tn be finite or infinite (possibly empty) sets of sentences. Any rule has the form

T1 ⊢ α1, ..., Tn ⊢ αn
─────────────────────
T ⊢ α

(if α1 has been inferred from T1, ..., αn has been inferred from Tn, then α can be inferred from T). We call any expression of the form T ⊢ α a configuration. The configurations T1 ⊢ α1, ..., Tn ⊢ αn represent the premisses of the rule, while T ⊢ α is the conclusion. As a limit case, we may have a rule in which the set of premisses is empty; in such a case we will speak of an improper rule. Instead of a rule with the empty set of premisses ∅ above the conclusion T ⊢ α, we will simply write T ⊢ α; instead of ∅ ⊢ α, we will write ⊢ α.

Rules of OL

(OL1) (identity)
T ∪ {α} ⊢ α

(OL2) (transitivity)
T ⊢ α, T* ∪ {α} ⊢ β
───────────────────
T ∪ T* ⊢ β

(OL3) (∧-elimination)
T ∪ {α ∧ β} ⊢ α

(OL4) (∧-elimination)
T ∪ {α ∧ β} ⊢ β

(OL5) (∧-introduction)
T ⊢ α, T ⊢ β
────────────
T ⊢ α ∧ β
39 An axiomatization of OQL in the Hilbert-Bernays style has been proposed by Hardegree in 1976 (see [Hardegree, 1976]). Sequent calculi for different forms of quantum logic have been investigated by Nishimura [1980] and by Battilotti and Sambin [1999]. See also [Battilotti and Faggian, 2002].
(OL6) (∧-introduction)
T ∪ {α, β} ⊢ γ
────────────────
T ∪ {α ∧ β} ⊢ γ

(OL7) (absurdity)
{α} ⊢ β, {α} ⊢ ¬β
─────────────────
⊢ ¬α

(OL8) (weak double negation)
T ∪ {α} ⊢ ¬¬α

(OL9) (strong double negation)
T ∪ {¬¬α} ⊢ α

(OL10) (Duns Scotus)
T ∪ {α ∧ ¬α} ⊢ β

(OL11) (contraposition)
{α} ⊢ β
───────────
{¬β} ⊢ ¬α
An axiomatization of OQL can be obtained by adding to the OL-calculus the following rule:

(OQL) (orthomodularity)
α ∧ ¬(α ∧ ¬(α ∧ β)) ⊢ β
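The role of the additional rule can be made concrete in the algebraic semantics, where a configuration α ⊢ β is read as "the value of α is below the value of β" in an ortholattice. The sketch below is an illustration under stated assumptions: the class Ortholattice and its methods are hypothetical helpers, and the two lattices are standard textbook examples chosen here, namely the six-element "hexagon" O6 (an ortholattice that is not orthomodular) and the six-element lattice MO2 (orthomodular but not distributive). It checks by brute force that the (OQL) rule holds in MO2 and fails in O6, in line with the remark that the Sasaki hook satisfies Modus Ponens in OQL but not in OL.

```python
def order_from_covers(elements, covers):
    """Reflexive-transitive closure of a cover relation, as a set of pairs."""
    leq = {(x, x) for x in elements} | set(covers)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(leq):
            for (c, d) in list(leq):
                if b == c and (a, d) not in leq:
                    leq.add((a, d))
                    changed = True
    return leq


class Ortholattice:
    """A finite ortholattice given by its cover relation and complement map."""

    def __init__(self, elements, covers, complement):
        self.elements = list(elements)
        self.leq = order_from_covers(self.elements, covers)
        self.neg = complement

    def le(self, x, y):
        return (x, y) in self.leq

    def meet(self, x, y):
        lower = [z for z in self.elements if self.le(z, x) and self.le(z, y)]
        return next(z for z in lower if all(self.le(w, z) for w in lower))

    def join(self, x, y):
        upper = [z for z in self.elements if self.le(x, z) and self.le(y, z)]
        return next(z for z in upper if all(self.le(z, w) for w in upper))

    def is_orthomodular(self):
        """x ≤ y implies y = x ∨ (x ∨ y')' for all x, y."""
        return all(
            y == self.join(x, self.neg[self.join(x, self.neg[y])])
            for x in self.elements
            for y in self.elements
            if self.le(x, y)
        )

    def oql_rule_holds(self):
        """a ∧ ¬(a ∧ ¬(a ∧ b)) ≤ b for all values a, b."""
        return all(
            self.le(
                self.meet(a, self.neg[self.meet(a, self.neg[self.meet(a, b)])]),
                b,
            )
            for a in self.elements
            for b in self.elements
        )


NEG = {"0": "1", "1": "0", "a": "a'", "a'": "a", "b": "b'", "b'": "b"}

# O6: two chains 0 < a < b < 1 and 0 < b' < a' < 1.
O6 = Ortholattice(
    NEG,
    [("0", "a"), ("a", "b"), ("b", "1"), ("0", "b'"), ("b'", "a'"), ("a'", "1")],
    NEG,
)

# MO2: four pairwise incomparable atoms a, a', b, b' between 0 and 1.
MO2 = Ortholattice(
    NEG,
    [("0", x) for x in ("a", "a'", "b", "b'")]
    + [(x, "1") for x in ("a", "a'", "b", "b'")],
    NEG,
)

print(O6.is_orthomodular(), O6.oql_rule_holds())
print(MO2.is_orthomodular(), MO2.oql_rule_holds())
```

In O6 the rule fails for the valuation α ↦ b, β ↦ a: the left-hand side evaluates to b, which is not below a.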
On this basis, all the standard syntactical notions (derivation, derivability, logical theorem) are defined in the expected way.
DEFINITION 27 (Derivation). A derivation of QL is a finite sequence of configurations T ⊢ α, where any element of the sequence is either the conclusion of an improper rule or the conclusion of a proper rule whose premisses are previous elements of the sequence.
DEFINITION 28 (Derivability). A sentence α is derivable from T (T ⊢_QL α) iff there is a derivation such that the configuration T ⊢ α is the last element of the derivation. Instead of {α} ⊢_QL β we will write α ⊢_QL β.
DEFINITION 29 (Logical theorem). A sentence α is a logical theorem of QL (⊢_QL α) iff ∅ ⊢_QL α.
A soundness and a completeness theorem have been proved for both logics with standard techniques (using the notion of canonical model)40:
THEOREM 30 (Soundness theorem). T ⊢_QL α ⇒ T ⊨_QL α.
THEOREM 31 (Completeness theorem). T ⊨_QL α ⇒ T ⊢_QL α.
40 See [Dalla Chiara and Giuntini, 2002].
To what extent does orthomodular quantum logic represent a completely faithful abstraction from QT? As we have seen, the prototypical models of OQL that are interesting from the physical point of view are based on the class H of all Hilbert lattices. Let us call Hilbert quantum logic (HQL) the logic that is semantically characterized by H (both in the Kripkean and in the algebraic semantics). An important problem that has been discussed at length is the following: do OQL and HQL represent one and the same logic? In 1981 Greechie gave a negative answer to this question: there is an ortholattice-theoretical equation, the so-called orthoarguesian law,41 that holds in H, but fails in a particular orthomodular lattice. As a consequence, OQL does not represent a faithful logical abstraction from its quantum theoretical origin. The axiomatizability of HQL is still an open problem.
3.3 Metalogical anomalies and the hidden variable problem
Both orthologic and orthomodular quantum logic give rise to some significant metalogical anomalies that are deeply connected with the characteristic properties of pure states in QT. Unlike classical pure states, a quantum pure state represents a piece of information about the physical system under investigation that is at the same time a maximal and a logically incomplete knowledge. The information is maximal because it cannot be consistently extended to a richer knowledge in the framework of the theory: even an omniscient mind could not know more. At the same time, one is dealing with a logically incomplete information: owing to Heisenberg’s uncertainty relations, a number of possible properties of the system (which are expressed in the language of the theory) are essentially undecided.
This typically quantum divergence between maximality and logical completeness is faithfully represented by a characteristic metalogical anomaly of QL: the failure of the Lindenbaum property. In QL, any noncontradictory set of sentences T can be extended to a noncontradictory maximal set T′ (which does not admit any noncontradictory proper extension expressed in the same language). However, the set T cannot be generally extended to a noncontradictory and complete T′ (such that, for any sentence α of the language, either α ∈ T′ or ¬α ∈ T′).
Interestingly enough, the failure of the Lindenbaum property has represented a powerful metalogical tool that has been used to prove the impossibility of completing QT via some (non-contextual) hidden variable hypotheses.42 The debate concerning the question whether QT can be considered a physically complete account of microphenomena has a long and deep history. A turning point in this discussion has been the celebrated Einstein-Bohr debate, with the ensuing charge of incompleteness raised by the Einstein-Podolsky-Rosen argument (EPR).
As we already know, in the framework of orthodox QT, physical systems can be prepared in pure states that have, in general, positive dispersion for most physical quantities. In the EPR argument, the attention is focused on the question whether the account of the microphysical phenomena provided by QT is to be regarded as an exhaustive description of the physical reality to which those phenomena are supposed to refer, a question to which Einstein himself answered in the negative.
41 See [Greechie, 1981]. See also [Kalmbach, 1983].
42 See, for instance, [Giuntini, 1991].
There is a mathematical side of the completeness issue: the question becomes whether states with positive dispersion can be represented as a different, dispersion-free, kind of states in a way that is consistent with the mathematical constraints of the quantum theoretical formalism. In his book on the mathematical foundations of quantum mechanics, von Neumann proved a celebrated “No go theorem” asserting the logical incompatibility between the quantum formalism and the existence of dispersion-free states (satisfying some general conditions). Already in the preface, von Neumann anticipates the program and the conclusion concerning the possibility of ‘neutralizing’ the statistical character of QT:
“There will be a detailed discussion of the problem as to whether it is possible to trace the statistical character of quantum mechanics to an ambiguity (i.e., incompleteness) in our description of nature. Indeed, such an interpretation would be a natural concomitant of the general principle that each probability statement arises from the incompleteness of our knowledge. This explanation “by hidden parameters” [...] has been proposed more than once. However, it will appear that this can scarcely succeed in a satisfactory way, or more precisely, such an explanation is incompatible with certain qualitative fundamental postulates of quantum mechanics.”
According to the advocates of hidden variables, QT is a physically incomplete theory. The intuitive idea that represents the common background to almost all hidden variable theories can be described in the following way:
(I) The reason why a physical theory is statistical depends on the fact that the description provided by the states is incomplete.
(II) It is possible to add a set Ξ of parameters (hidden variables) in such a way that
• for every state s and for every ω ∈ Ξ, there exists a dispersion-free (dichotomous) state s_ω which semantically decides every property (event) of the physical system at issue;
• the statistical predictions of the original theory should be recovered by averaging over these dichotomous states;
• the algebraic structures determined by the properties (events) of the system should be preserved in the hidden variable extension.
The hidden variable theories based on the assumptions (I) and (II) are usually called non-contextual, because they require the existence of a single space Ξ of hidden variables determining dispersion-free states. A weaker position is represented by the contextual hidden variable theories, according to which the choice of the hidden variable space depends on the physical quantity to be dealt with. As pointed out by Beltrametti and Cassinelli [1981]:
“Despite the absence of mathematical obstacles against contextual hidden variable theories, it must be stressed that their calling for completed states that are probability measures not on the whole proposition [event] lattice E but only on a subset of E is rather far from intuitive physical ideas of what a state of a physical system should be. Thus, contextual hidden variable theorists, in their search for the restoration of some classical deterministic aspects, have to pay, on other sides, in quite radical departures from properties of classical states.”
Von Neumann’s proof of his “No go theorem” was based on a general assumption that was later considered too strong. The condition asserts the following: Let s_ω be a dispersion-free state and let A, B be two (possibly noncompatible) observables. Then, Exp(A + B, s_ω) = Exp(A, s_ω) + Exp(B, s_ω). In other words, the expectation functional Exp determined by the completed state s_ω is linear. In the late Sixties, Kochen and Specker published a series of articles, developing a purely logical argument for a “No go theorem,” such that von Neumann’s strong assumption can be relaxed.43
Kochen and Specker’s proof is based on a variant of quantum logic that has been called partial classical logic (PaCL). The basic semantic idea is the following: unlike orthologic and orthomodular quantum logic (which are total logics, because the meaning of any sentence is always defined), molecular sentences of PaCL can be semantically undefined. From the semantic point of view, the crucial relation is represented by a compatibility relation that may hold between the meanings of two sentences. As expected, the intended physical interpretation of the compatibility relation is the following: two sentences α and β have compatible meanings iff α and β can be simultaneously tested.
Models of PaCL are special kinds of algebraic models based on partial Boolean algebras (weaker versions of Boolean algebras where the meet and the join are only defined for pairs of compatible elements). All these investigations have revealed that there is a deep logical connection between the two following questions:
• Does a quantum system S admit a non-contextual hidden variable theory?
• Does PaCL satisfy a version of the Lindenbaum property with respect to the algebraic models concerning the events that may occur to the system S?
43 See [Kochen and Specker, 1965a; Kochen and Specker, 1965; Kochen and Specker, 1967].

4 INDETERMINISM AND FUZZINESS: THE UNSHARP APPROACHES TO QT
The essential indeterminism of QT gives rise to a kind of ambiguity of the quantum world. Such ambiguity can be investigated at different levels. The first level concerns the characteristic features of quantum pure states, which represent pieces of information that are at the same time maximal and logically incomplete. Such divergence between maximality and logical completeness is the origin of most logical anomalies of the quantum phenomena. A second level of ambiguity is connected with a possibly fuzzy character of the physical events that are investigated. We can try to illustrate the difference between the two “fuzziness-levels” by referring to a nonscientific example. Let us consider the two following sentences, which apparently have no definite truth-value:
I) Hamlet is 1.70 meters tall;
II) Brutus is an honourable man.
The semantic uncertainty involved in the first example seems to depend on the logical incompleteness of the individual concept associated to the name “Hamlet.” In other words, the property “being 1.70 meters tall” is a sharp property. However, our concept of Hamlet is not able to decide whether such a property is satisfied or not. Unlike real persons, literary characters have a number of indeterminate properties. On the contrary, the semantic uncertainty involved in the second example is mainly caused by the ambiguity of the concept “honourable.” What does “being honourable” mean? One need only recall how the ambiguity of the adjective “honourable” plays an important role in Mark Antony’s famous monologue in Shakespeare’s “Julius Caesar.” Now, orthodox QT generally takes into consideration examples of the first kind (our first level of fuzziness): events are sharp, while all semantic uncertainties are due to the logical incompleteness of the individual concepts that correspond to pure states of quantum objects. This is the reason why orthodox QT is sometimes called sharp QT, in contrast with unsharp QT, which also investigates examples of the second kind (second level of fuzziness).
Strangely enough, the abstract researches on fuzzy logics and on quantum structures have undergone quite independent developments for many decades during the 20th century.
Only after the Eighties did there emerge an interesting convergence between the investigations about fuzzy and quantum structures, in the framework of the so-called unsharp approach to quantum theory. In this connection a significant conjecture has been proposed: perhaps some apparent mysteries of the quantum world should be described as special cases of some more general fuzzy phenomena, whose behavior has not yet been fully understood.
In 1920 J. Łukasiewicz, the “father” of fuzzy logics, published a two-page article whose title was “On three-valued logic.” The paper proposed a semantic characterization for the logic that has been later called Ł3 (Łukasiewicz’s three-valued logic). In spite of the shortness of the paper, all the important points concerning the semantics of Ł3 are already there and can be naturally generalized to the case of a generic number n of truth-values as well as to the case of infinitely many values. The conclusion of the article was quite interesting:
“The present author is of the opinion that three-valued logic has above all theoretical importance as an endeavour to construct a system of non-aristotelian logic. Whether the new system of logic has any practical importance will be seen only when the logical phenomena, especially those in the deductive sciences, are thoroughly examined, and when the consequences of the indeterministic philosophy, which is the metaphysical substratum of the new logic, can be compared with empirical data.” [Łukasiewicz, 1970]
These days, Łukasiewicz’s remark appears to be highly prophetic, at least in two respects. First of all, the practical importance of many-valued logics has gone beyond all reasonable expectations in Łukasiewicz’s time. What we call today fuzzy logics (natural developments of Łukasiewicz’s many-valued logics) gave rise to a number of technological applications. We need only recall that we can buy washing machines and cameras whose suggestive name is just “fuzzy logic.” At the same time, QT has permitted us to compare the consequences of an indeterministic philosophy with empirical data. This has been done both at a logico-mathematical level and at an experimental level. As we have seen, the no go theorems have proved the impossibility of deterministic completions of orthodox QT by means of non-contextual hidden variable theories. At the same time, some experiments that were performed in the Eighties44 have confirmed the statistical predictions of QT, against the predictions of the most significant hidden variable theories.
Łukasiewicz was a contemporary of Heisenberg, Bohr and von Neumann. Strangely enough, however, he very rarely made explicit references to QT. In spite of this, he seemed to be aware of the importance of QT for his indeterministic philosophy. In 1946 he wrote a revised version of his paper “On Determinism,” an address that he delivered as the rector of the Warsaw University for the inauguration of the academic year 1922/1923.
At the very beginning of the article he noticed:
“At the time when I gave my address those facts and theories in the field of atomic physics which subsequently led to the undermining of determinism were still unknown. In order not to deviate too much from, and not to interfere with, the original content of the address, I have not amplified my article with arguments drawn from this branch of knowledge.” [Łukasiewicz, 1946]
In 1983 the German physicist G. Ludwig published the book Foundations of Quantum Mechanics, which has been later regarded as the birth of the unsharp approach to QT. Paradoxically enough, Ludwig has always been an “enemy” of quantum logic. In spite of this, his ideas have greatly contributed to the revival of the quantum logical investigations during the last two decades. Ludwig’s pioneering work has been further developed by many scholars (Kraus, Davies, Mittelstaedt, Busch, Lahti, Bugajski, Beltrametti, Cattaneo, Nisticò, Foulis, Bennett, Gudder, Greechie, Pulmannová, Dvurečenskij, Riečan, Riečanová, Schroeck and many others including the authors of this chapter).
44 See [Aspect et al., 1981; Aspect and Grangier, 1985].
The starting point of the unsharp approach is deeply connected with a general problem that naturally arises in the framework of Hilbert space QT. Let us consider a concrete event-state system (Π(H), S(H)), where Π(H) is the set of projections, while S(H) is the set of density operators of the Hilbert space H (associated to the physical system under investigation). One can ask the following question: do the sets Π(H) and S(H) correspond to an optimal possible choice of adequate mathematical representatives for the intuitive notions of event and of state, respectively? Consider first the notion of state. Once Π(H) is fixed, Gleason’s Theorem guarantees that S(H) corresponds to an optimal notion of state: for, any probability measure defined on Π(H) is determined by a density operator of H (provided the dimension of H is greater than or equal to 3).
Let us discuss then the notion of event and let us ask whether Π(H) represents the largest set of operators assigned a probability-value, according to the Born rule. The answer to this question is negative. One can easily recognize the existence of bounded linear operators E that are not projections and that satisfy the following condition: for any density operator ρ, Tr(ρE) ∈ [0, 1]. From an intuitive point of view, this means that such operators E “behave as possible events,” because any state assigns to them a probability value. An interesting example of this kind is represented by the operator ½I (where I is the identity operator). One immediately realizes that ½I is a linear bounded operator that is not a projection, because: (½I)(½I) = ¼I ≠ ½I (hence ½I fails to be idempotent). At the same time, for any density operator ρ we have: Tr(ρ ½I) = ½. Thus, ½I seems to represent a totally indeterminate event, to which each state assigns probability ½.
Apparently, the event ½I plays the role that, in fuzzy set theory, is played by the semitransparent fuzzy set ½1 such that for any object x of the universe: ½1(x) = ½. This situation suggests that we liberalize the notion of quantum event and extend the set Π(H) to a new set of operators. Following Ludwig, the elements of this new set have been called effects. The precise mathematical definition of effect is the following:
DEFINITION 32 (Effects). An effect of H is a bounded linear operator E that satisfies the following condition, for any density operator ρ: Tr(ρE) ∈ [0, 1].
We denote by E(H) the set of all effects of H. Clearly, E(H) properly includes Π(H), because:
• any projection satisfies the definition of effect;
• there are examples of effects that are not projections (for instance the effect ½I, that is usually called the semitransparent effect).
By definition, effects turn out to represent a kind of maximal mathematical representative for the notion of quantum event, in agreement with the basic statistical rule of QT (the Born rule). Unlike projections, effects represent quite general mathematical objects that describe at the same time events and states. Let E be any effect in E(H). The following conditions hold:
• E represents a sharp event (∈ Π(H)) iff E is idempotent (EE = E);
• E is a density operator (representing a state) iff Tr(E) = 1;
• E represents a pure state iff E is at the same time a projection and a density operator.
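The three conditions above can be checked on small matrices. The following sketch is only an illustration: it restricts attention to real symmetric 2×2 matrices with exact rational entries, and it tests the effect condition through the equivalent requirement that both E and I − E be positive semidefinite (a standard reformulation of Tr(ρE) ∈ [0, 1] for all density operators ρ, assumed here rather than derived). The function names are hypothetical.

```python
from fractions import Fraction as F

IDENTITY = ((F(1), F(0)), (F(0), F(1)))


def matmul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )


def sub(A, B):
    return tuple(tuple(A[r][c] - B[r][c] for c in range(2)) for r in range(2))


def trace(A):
    return A[0][0] + A[1][1]


def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def is_positive(A):
    """Positive semidefiniteness of a real symmetric 2x2 matrix."""
    return A[0][1] == A[1][0] and trace(A) >= 0 and det(A) >= 0


def is_effect(E):
    """0 ≤ E ≤ I, used here in place of: Tr(ρE) ∈ [0, 1] for every ρ."""
    return is_positive(E) and is_positive(sub(IDENTITY, E))


def is_sharp(E):
    """A sharp event is an idempotent effect, i.e. a projection."""
    return is_effect(E) and matmul(E, E) == E


def is_density(E):
    return is_positive(E) and trace(E) == 1


HALF_I = ((F(1, 2), F(0)), (F(0), F(1, 2)))      # the semitransparent effect
P = ((F(1), F(0)), (F(0), F(0)))                 # a rank-one projection
RHO = ((F(3, 4), F(1, 4)), (F(1, 4), F(1, 4)))   # a sample density operator

print(is_effect(HALF_I), is_sharp(HALF_I), trace(matmul(RHO, HALF_I)))
print(is_sharp(P) and is_density(P))  # P is both: it represents a pure state
```

In dimension 2 the operator ½I also has trace 1, so in this small example it is at the same time an unsharp event and a (maximally mixed) state, which illustrates the remark that effects can describe both events and states.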
4.1 Algebraic effect-structures There are different algebraic structures that can be induced on the set of all effects in a Hilbert space. One immediately realizes that the set E(H) can be naturally structured as a regular involution bounded poset 45 : E(H) , ≤ , ′ , 0 , 1 , where (i) ≤ is the natural order determined by the set of all density operators. In other words: E ≤ F iff for any density operator ρ ∈ D(H), Tr(ρE) ≤ Tr(ρF ). (i.e., any state assigns to E a probability-value that is less or equal than the probability-value assigned to F ); 45 See
Def. 47- 55.
The History of Quantum Logic
249
(ii) E′ = 1 − E (where − is the standard operator difference);
(iii) 0, 1 are the null projection (O) and the identity projection (I), respectively.
One can easily check that:
• ≤ is a partial order;
• ′ is an involution;
• 0 and 1 are respectively the minimum and the maximum with respect to ≤;
• the regularity condition holds. In other words: E ≤ E′ and F ≤ F′ implies E ≤ F′.
The effect poset E(H) turns out to be properly fuzzy. The noncontradiction principle is violated: for instance, the semitransparent effect ½I satisfies the following condition:
½I ∧ (½I)′ = ½I ∧ ½I = ½I ≠ 0.
This is one of the reasons why proper effects (those that are not projections) may be regarded as representing unsharp physical events. Accordingly, we will also call the involution operation of an effect-structure a fuzzy complement.
At the same time, the effect-poset fails to be a lattice. As proved by Greechie and Gudder in 1996, some pairs of effects have no meet.⁴⁶
⁴⁶ See [Gudder and Greechie, 1996].
In 1986 Cattaneo and Nisticò⁴⁷ proposed to extend the effect poset E(H) to a richer structure, equipped with a new complement ∼ that has an intuitionistic-like behavior. Such an operation ∼ has been called the Brouwer complement.
⁴⁷ See [Cattaneo and Nisticò, 1986].
DEFINITION 33 (The Brouwer complement).
∀E ∈ E(H): E∼ = P_{Ker(E)}.
In other words, the Brouwer complement of E is the projection operator P_{Ker(E)} whose range is Ker(E), the kernel of E.⁴⁸ By definition, the Brouwer complement of an effect is always a projection. In the particular case when E is a projection, it turns out that E′ = E∼; in other words, the fuzzy and the intuitionistic complement collapse into one and the same operation.
⁴⁸ The kernel of E is the set of all vectors of H that are transformed by E into the null vector.
The structure ⟨E(H), ≤, ′, ∼, 0, 1⟩ turns out to be a particular example of a kind of abstract structure that Cattaneo and Nisticò have termed Brouwer Zadeh poset.⁴⁹ The abstract definition of Brouwer Zadeh posets is the following:
⁴⁹ See [Cattaneo and Nisticò, 1986].
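The two complements introduced so far can be compared on a small case. The following Python sketch is only an illustration under a strong simplifying assumption: it models diagonal (hence mutually commuting) effects as tuples of their diagonal entries in [0, 1], so that the kernel projection can be read off entrywise. The function names are hypothetical and do not come from the literature cited here.

```python
from fractions import Fraction

# Assumption: only diagonal effects are modelled, each stored as the tuple of
# its diagonal entries in [0, 1]. All names here are hypothetical.

def fuzzy_complement(effect):
    """E' = 1 - E, taken entrywise on the diagonal."""
    return tuple(1 - e for e in effect)

def brouwer_complement(effect):
    """Projection onto Ker(E): entry 1 exactly where E has entry 0."""
    return tuple(Fraction(1) if e == 0 else Fraction(0) for e in effect)

def is_projection(effect):
    """A diagonal effect is a projection iff every entry is 0 or 1."""
    return all(e in (0, 1) for e in effect)

half_identity = (Fraction(1, 2), Fraction(1, 2))   # semitransparent effect
sharp = (Fraction(1), Fraction(0))                 # a projection
mixed = (Fraction(1, 2), Fraction(0))              # a proper effect with a kernel

# The semitransparent effect is a fixed point of the fuzzy complement.
print(fuzzy_complement(half_identity) == half_identity)
# Its kernel is trivial, so its Brouwer complement is the null projection.
print(brouwer_complement(half_identity) == (0, 0))
# On a projection the two complements coincide.
print(fuzzy_complement(sharp) == brouwer_complement(sharp))
# The Brouwer complement of any effect is a projection.
print(is_projection(brouwer_complement(mixed)))
```

In this restricted setting each check prints True. Non-commuting effects would need genuine operator arithmetic, which the sketch deliberately avoids.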
M. L. Dalla Chiara, R. Giuntini and M. Rédei
DEFINITION 34 (Brouwer Zadeh poset). A Brouwer Zadeh poset (or BZ-poset) is a structure
⟨B, ≤, ′, ∼, 0, 1⟩,
where
(i) ⟨B, ≤, ′, 0, 1⟩ is a regular poset;
(ii) ∼ is a unary operation that behaves like an intuitionistic complement:
(iia) a ∧ a∼ = 0;
(iib) a ≤ a∼∼;
(iic) a ≤ b implies b∼ ≤ a∼.
(iii) The following relation connects the fuzzy and the intuitionistic complement: a∼′ = a∼∼.
Of course, any BZ-poset ⟨B, ≤, ′, ∼, 0, 1⟩ where the two complements ′ and ∼ coincide turns out to be an orthoposet (i.e. a bounded involution poset, where the involution ′ satisfies the noncontradiction and the excluded middle principles). One can prove that the concrete effect-structure
⟨E(H), ≤, ′, ∼, 0, 1⟩
is a Brouwer Zadeh poset that is not an orthoposet.
An interesting feature of the Brouwer Zadeh structures is the possibility to define two unary operations ν and µ, which turn out to behave as the modal operators necessarily and possibly, respectively.
DEFINITION 35 (The modal operators). Let ⟨B, ≤, ′, ∼, 0, 1⟩ be a Brouwer Zadeh poset.
ν(a) := a′∼; µ(a) := a∼′.
In other words, necessity is identified with the intuitionistic negation of the fuzzy negation, while possibility is identified with the fuzzy negation of the intuitionistic negation.
The modal operators ν and µ turn out to have a typical S5-like behavior. For, the following conditions are satisfied:
• ν(a) ≤ a
Necessarily a implies a.
• If a ≤ b, then ν(a) ≤ ν(b)
If a implies b, then the necessity of a implies the necessity of b.
• a ≤ ν(µ(a))
a implies the necessity of its possibility.
• ν(ν(a)) = ν(a)
Necessity is equivalent to the necessity of the necessity.
• ν(µ(a)) = µ(a)
The necessity of a possibility is equivalent to the possibility.
Of course, in any BZ-poset ⟨B, ≤, ′, ∼, 0, 1⟩ where the two complements ′ and ∼ coincide, we obtain a collapse of the modalities. In other terms, ν(a) = a = µ(a).
Let us now return to the concrete Brouwer Zadeh poset ⟨E(H), ≤, ′, ∼, 0, 1⟩, and consider the necessity ν(E) of a given effect E (which may be either sharp or unsharp). One can easily prove the following lemma.
LEMMA 36.
(i) E is a projection iff E = ν(E) = E′∼ = P_{Ker(E′)}.
(ii) Let P be any projection. P ≤ E implies P ≤ ν(E).
As a consequence, we can say that ν(E) represents a kind of "best sharp lower approximation of E."
Brouwer Zadeh posets do not represent the only interesting way of structuring the set of all concrete effects. Other important structures that have naturally emerged from effect-systems are effect algebras and quantum MV algebras. Such structures (introduced in the late Eighties and in the Nineties) have represented a privileged object of research for the logico-algebraic approach to QT at the turn of the century.
We will first sketch the definition of effect algebras (also called unsharp orthoalgebras).⁵⁰ One is dealing with a particular kind of partial structure, equipped with a basic operation ⊞ that is only defined for special pairs of elements. From an intuitive point of view, such an operation can be regarded as an exclusive disjunction (aut), defined for events that are logically incompatible.
⁵⁰ See [Giuntini and Greuling, 1989; Foulis and Bennett, 1994; Dalla Chiara and Giuntini, 1994; Dvurečenskij and Pulmannová, 2000; ?].
The abstract definition of effect algebra is the following.
DEFINITION 37 (Effect algebra). An effect algebra is a partial structure
A = ⟨A, ⊞, 0, 1⟩,
where ⊞ is a partial binary operation on A, and 0 and 1 are special distinct elements of A. When ⊞ is defined for a pair a, b ∈ A, we will write ∃(a ⊞ b). The following conditions hold:
(i) Weak commutativity
∃(a ⊞ b) implies ∃(b ⊞ a) and a ⊞ b = b ⊞ a;
(ii) Weak associativity
∃(b ⊞ c) and ∃(a ⊞ (b ⊞ c)) implies ∃(a ⊞ b) and ∃((a ⊞ b) ⊞ c) and a ⊞ (b ⊞ c) = (a ⊞ b) ⊞ c;
(iii) Strong excluded middle
For any a, there exists a unique x such that a ⊞ x = 1;
(iv) Weak consistency
∃(a ⊞ 1) implies a = 0.
An orthogonality relation ⊥, a partial order relation ≤ and a generalized complement ′ (which generally behaves as a fuzzy complement) can be defined in any effect algebra.
DEFINITION 38. Let ⟨A, ⊞, 0, 1⟩ be an effect algebra and let a, b ∈ A.
(i) a ⊥ b iff a ⊞ b is defined in A.
(ii) a ≤ b iff ∃c ∈ A such that a ⊥ c and b = a ⊞ c.
(iii) The generalized complement of a is the unique element a′ such that a ⊞ a′ = 1.
One can show that any effect algebra ⟨A, ⊞, 0, 1⟩ gives rise to a bounded involution poset ⟨A, ≤, ′, 0, 1⟩, where ≤ and ′ are defined according to Definition 38.
The category of all effect algebras turns out to be (categorically) equivalent to the category of all difference posets, which were first studied by Kôpka and Chovanec and further investigated by Pulmannová and others.⁵¹ Effect algebras represent weak examples of orthoalgebras, a category of partial structures that Foulis and Randall introduced in 1981.⁵² Roughly, orthoalgebras are effect algebras that satisfy the noncontradiction principle. In such algebras, the involution ′ becomes an orthocomplementation. The precise mathematical definition is the following:
DEFINITION 39 (Orthoalgebras). An orthoalgebra is an effect algebra ⟨A, ⊞, 0, 1⟩ such that the following condition is satisfied:
∃(a ⊞ a) implies a = 0 (Strong consistency).
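Definitions 37-39 can be tested by brute force on a finite structure. The sketch below is an illustration only: it assumes the three-element chain {0, ½, 1} with the partial sum a ⊞ b = a + b (defined when the sum stays in the chain), a toy structure chosen here and not taken from the text. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ZERO, HALF, ONE = Fraction(0), Fraction(1, 2), Fraction(1)
A = [ZERO, HALF, ONE]          # assumed toy carrier set

def osum(a, b):
    """Partial sum: a + b when the result stays in A, else None (undefined)."""
    s = a + b
    return s if s in A else None

def defined(a, b):
    return osum(a, b) is not None

def complement(a):
    """The unique x with a [+] x = 1 (Definition 38(iii))."""
    candidates = [x for x in A if osum(a, x) == ONE]
    assert len(candidates) == 1
    return candidates[0]

def leq(a, b):
    """a <= b iff b = a [+] c for some c orthogonal to a (Definition 38(ii))."""
    return any(osum(a, c) == b for c in A)

def is_effect_algebra():
    weak_comm = all(osum(a, b) == osum(b, a) for a, b in product(A, repeat=2))
    weak_assoc = all(
        defined(a, b)
        and defined(osum(a, b), c)
        and osum(osum(a, b), c) == osum(a, osum(b, c))
        for a, b, c in product(A, repeat=3)
        if defined(b, c) and defined(a, osum(b, c))
    )
    strong_em = all(sum(1 for x in A if osum(a, x) == ONE) == 1 for a in A)
    weak_cons = all(a == ZERO for a in A if defined(a, ONE))
    return weak_comm and weak_assoc and strong_em and weak_cons

def is_orthoalgebra():
    """Effect algebra plus strong consistency: a [+] a defined implies a = 0."""
    return is_effect_algebra() and all(a == ZERO for a in A if defined(a, a))

print(is_effect_algebra())   # the four conditions of Definition 37
print(is_orthoalgebra())     # fails: 1/2 is orthogonal to itself
```

On this chain the first check succeeds and the second fails, since ½ ⊞ ½ = 1 is defined although ½ ≠ 0.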
In other words: 0 is the only element that is orthogonal to itself. One can easily realize that orthoalgebras always determine an orthoposet. Let A = ⟨A, ⊞, 0, 1⟩ be an orthoalgebra. The structure ⟨A, ≤, ′, 0, 1⟩ (where ≤ and ′ are the partial order and the generalized complement of A) is an orthoposet. For, given any a ∈ A, the infimum a ∧ a′ exists and is equal to 0; equivalently, the supremum a ∨ a′ exists and is equal to 1.
⁵¹ See [Kôpka and Chovanec, 1994], [Pulmannová, 1995].
⁵² See [Foulis and Randall, 1981].
THEOREM 40. Any orthoalgebra A = ⟨A, ⊞, 0, 1⟩ satisfies the following condition: if a, b ∈ A and a ⊥ b, then a ⊞ b is a minimal upper bound for a and b in A.
COROLLARY 41. Any orthoalgebra A = ⟨A, ⊞, 0, 1⟩ satisfies the following condition: for any a, b ∈ A such that a ⊥ b, if the supremum a ∨ b exists, then a ∨ b = a ⊞ b.
Orthoalgebras and orthomodular posets turn out to be deeply connected. Any orthomodular poset ⟨A, ≤, ′, 0, 1⟩ determines an orthoalgebra ⟨A, ⊞, 0, 1⟩, where: a ⊞ b is defined iff a ≤ b′. Furthermore, when defined, a ⊞ b = a ∨ b. At the same time, not every orthoalgebra is an orthomodular poset (as shown by Wright in 1990⁵³).
⁵³ See [Wright, 1990].
Genuine examples of effect algebras (which are not generally orthoalgebras) can be naturally obtained in the domain of fuzzy set systems.
EXAMPLE 42 (Effect algebras of fuzzy sets). Let B be the set of all fuzzy subsets of a universe U (in other words, B is the set of all functions assigning to any element of U a value in the real interval [0, 1]). A partial operation ⊞ can be defined on B. For any f, g ∈ B:
∃(f ⊞ g) iff ∀x ∈ U: f(x) + g(x) ≤ 1,
where + is the usual sum of real numbers. Furthermore: if ∃(f ⊞ g), then f ⊞ g := f + g, where:
∀x ∈ U {(f + g)(x) := f(x) + g(x)}.
Let 1 be the classical characteristic function of the total set U, while 0 is the classical characteristic function of the empty set ∅. The structure ⟨B, ⊞, 0, 1⟩ is an effect algebra. It turns out that the effect-algebra generalized complement ′ coincides with the fuzzy complement. In other words:
∀x ∈ U: f′(x) = 1 − f(x).
Furthermore, the effect-algebra partial order relation coincides with the natural partial order of B. In other words:
∀x ∈ U [f(x) ≤ g(x)] iff ∃h ∈ B [f ⊥ h and g = f ⊞ h].
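Example 42 lends itself to a direct computation. The following Python sketch assumes a hypothetical three-element universe and stores each fuzzy set as a dictionary of exact rationals; the helper names are illustrative and not part of the original example.

```python
from fractions import Fraction

U = ["x1", "x2", "x3"]          # hypothetical finite universe

def fuzzy(*values):
    """Build a fuzzy subset of U from one membership degree per element."""
    return dict(zip(U, (Fraction(v) for v in values)))

def partial_sum(f, g):
    """f [+] g = f + g pointwise, defined only if f(x) + g(x) <= 1 for all x."""
    if all(f[x] + g[x] <= 1 for x in U):
        return {x: f[x] + g[x] for x in U}
    return None                  # the partial sum is undefined for this pair

def complement(f):
    """Fuzzy complement f'(x) = 1 - f(x)."""
    return {x: 1 - f[x] for x in U}

def leq_pointwise(f, g):
    return all(f[x] <= g[x] for x in U)

def leq_effect_algebra(f, g):
    """f <= g iff g = f [+] h for some fuzzy set h; h = g - f is the only candidate."""
    h = {x: g[x] - f[x] for x in U}
    if any(h[x] < 0 for x in U):
        return False             # h would not be a fuzzy set
    return partial_sum(f, h) == g

ONE = fuzzy(1, 1, 1)
f = fuzzy("1/4", "1/2", 0)
g = fuzzy("1/2", "1/2", 1)

print(partial_sum(f, g) is not None)               # f and g are orthogonal
print(partial_sum(g, g) is None)                   # g is not orthogonal to itself
print(partial_sum(f, complement(f)) == ONE)        # f [+] f' = 1
print(leq_pointwise(f, g) == leq_effect_algebra(f, g))
```

For these particular values every line prints True; the last line checks, on one pair only, that the two descriptions of the order agree.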
The effect algebra ⟨B, ⊞, 0, 1⟩ is not an orthoalgebra, because the strong consistency condition is violated by some genuine fuzzy sets (such as the semitransparent fuzzy set ½1 that assigns to any object x the value ½).
How can we induce the structure of an effect algebra on the set E(H) of all effects of the Hilbert space H? As in the fuzzy-set case, it is sufficient to define the partial sum ⊞ as follows:
∃(E ⊞ F) iff E + F ∈ E(H),
where + is the usual sum-operator. Furthermore:
E ⊞ F := E + F, if ∃(E ⊞ F).
It turns out that the structure ⟨E(H), ⊞, O, I⟩ is an effect algebra (called standard effect algebra or Hilbert effect algebra), where the generalized complement of any effect E is just I − E. Furthermore, the effect-algebra order relation coincides with the natural order defined on E(H). In other words:
∀ρ ∈ D(H) [Tr(ρE) ≤ Tr(ρF)] iff ∃G ∈ E(H) [E ⊥ G and F = E ⊞ G].
At the same time, this structure fails to be an orthoalgebra. For instance, the semitransparent effect ½I gives rise to a counterexample to the strong consistency condition:
½I ≠ O and ½I ⊞ ½I = ½I + (½I)′ = I.
Let us now turn to the other kind of structure that naturally emerges from concrete effect systems. One is dealing with quantum MV algebras (QMV algebras): they are weak variants of MV algebras (which represent privileged abstractions from classical fuzzy set structures).⁵⁴
⁵⁴ See [Giuntini, 1996].
Before introducing the notion of QMV algebra, it will be useful to sum up some basic properties of MV algebras. As is well known, the set of all fuzzy subsets of a given set X gives rise to a de Morgan lattice, where the noncontradiction and the excluded middle principles are possibly violated. In this framework, the lattice operations (the meet ∧, the join ∨ and the fuzzy complement ′) do not represent the only interesting fuzzy operations that can be defined. An important role is played by a new kind of conjunction and disjunction, which were first investigated in the framework of Łukasiewicz' approach to many-valued logics. These operations are usually called Łukasiewicz operations.
The definition of Łukasiewicz conjunction and disjunction in the framework of fuzzy set structures turns out to be quite natural. Fuzzy sets are nothing but generalized characteristic functions whose range is the real interval [0, 1]. Of course, [0, 1] is not closed under the ordinary real sum + (we may have x, y ∈ [0, 1] and x + y ∉ [0, 1]). However, one can introduce a new operation ⊕, which is called truncated sum:
∀x, y ∈ [0, 1] {x ⊕ y := min(1, x + y)}.
In other words, x ⊕ y is the ordinary sum x + y, whenever this sum belongs to the interval; otherwise x ⊕ y collapses into the maximum element 1. One immediately realizes that [0, 1] is closed under the operation ⊕. Now, we can use the truncated sum in order to define the Łukasiewicz disjunction between fuzzy sets (since no confusion is possible, it will be expedient to use the same symbol ⊕ both for the truncated sum and for the Łukasiewicz disjunction). Let f, g be fuzzy subsets of a set X. The Łukasiewicz disjunction ⊕ is defined as follows:
∀x ∈ X {(f ⊕ g)(x) := f(x) ⊕ g(x) = min(1, f(x) + g(x))}.
On this basis, the Łukasiewicz conjunction ⊙ can be defined, via de Morgan, in terms of ⊕ and ′:
∀x ∈ X {(f ⊙ g)(x) := (f′ ⊕ g′)′(x)}.
As a consequence, one obtains:
(f ⊙ g)(x) = max(0, f(x) + g(x) − 1).
From an intuitive point of view, the Łukasiewicz operations and the lattice operations represent different notions of conjunction and disjunction that can be used in a fuzzy situation. Consider two fuzzy sets f and g; they can be intuitively regarded as two ambiguous properties. The number f(x) represents the "degree of certainty" according to which the object x satisfies the property f. A similar comment holds for g and g(x). What does it mean that the object x satisfies the disjunctive property "f or g" with a given degree of certainty? If we interpret "or" as the lattice join, we assume the following choice: an object satisfies a disjunction according to a degree that corresponds to the maximum between the degrees of the two members. If we, instead, interpret "or" as the Łukasiewicz disjunction, we assume the following choice: the degrees of the members of the disjunction have to be summed in such a way that one never goes beyond the absolute certainty (the value 1).
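The contrast between the two readings of "or" can be made concrete with a few numbers. The sketch below is a minimal illustration using exact rationals; the function names and the sample degrees 7/10 and 6/10 are assumptions chosen for the example.

```python
from fractions import Fraction

ONE = Fraction(1)

def neg(x):
    """Fuzzy complement x' = 1 - x."""
    return 1 - x

def l_or(x, y):
    """Lukasiewicz disjunction (truncated sum): min(1, x + y)."""
    return min(ONE, x + y)

def l_and(x, y):
    """Lukasiewicz conjunction via de Morgan: (x' (+) y')'."""
    return neg(l_or(neg(x), neg(y)))

# Hypothetical degrees of certainty f(x) and g(x) for one object x.
fx, gx = Fraction(7, 10), Fraction(6, 10)

print("lattice join      :", max(fx, gx))    # 7/10
print("Lukasiewicz or    :", l_or(fx, gx))   # 1, the sum is truncated
print("lattice meet      :", min(fx, gx))    # 3/5
print("Lukasiewicz and   :", l_and(fx, gx))  # 3/10 = max(0, 7/10 + 6/10 - 1)

# The closed form for the conjunction, checked on a small grid of rationals.
grid = [Fraction(k, 10) for k in range(11)]
print(all(l_and(x, y) == max(0, x + y - 1) for x in grid for y in grid))
```

The grid check is of course not a proof of the closed form; it only confirms that the de Morgan definition and the formula max(0, x + y − 1) agree on the sampled values.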
Of course, in the limit-case represented by crisp sets (i.e., classical characteristic functions) the Łukasiewicz disjunction and the lattice join will coincide. Suppose x, y ∈ {0, 1}; then x ⊕ y = max(x, y).
From the definitions, one immediately obtains that the Łukasiewicz operations are not generally idempotent. It may happen that a ⊕ a ≠ a and a ⊙ a ≠ a. As noticed by Mundici,⁵⁵ this is a typical semantic situation that seems to be governed by the principle "repetita iuvant!" (repetitions are useful!). Of course, repetitions are really useful in all physical circumstances that are accompanied by a certain noise. As a consequence, ⊕ and ⊙ do not give rise to a lattice structure. At the same time, as with the lattice operations, they turn out to satisfy commutativity and associativity:
f ⊕ g = g ⊕ f; f ⊙ g = g ⊙ f;
f ⊕ (g ⊕ h) = (f ⊕ g) ⊕ h; f ⊙ (g ⊙ h) = (f ⊙ g) ⊙ h.
⁵⁵ See [Mundici, 1992].
Unlike the fuzzy lattice operations, the Łukasiewicz conjunction and disjunction do satisfy both the excluded middle and the noncontradiction principle:
f ⊕ f′ = 1; f ⊙ f′ = 0.
Another important difference concerns the distributivity property. As opposed to the case of ∧ and ∨ (which satisfy distributivity in the fuzzy set environment), it may happen that:
f ⊙ (g ⊕ h) ≠ (f ⊙ g) ⊕ (f ⊙ h);
f ⊕ (g ⊙ h) ≠ (f ⊕ g) ⊙ (f ⊕ h).
What can be said about the relationships between the Łukasiewicz operations and the lattice operations? Interestingly enough, the lattice operations turn out to be definable in terms of the fuzzy complement and of the Łukasiewicz operations. For, we have:
f ∧ g := (f ⊕ g′) ⊙ g;
f ∨ g := (f ⊙ g′) ⊕ g.
An interesting algebraic abstraction from fuzzy set structures can be obtained if we restrict our attention to the fuzzy complement, the lattice operations and the Łukasiewicz operations. This gives rise to the abstract notion of an MV algebra (multi-valued algebra), which Chang introduced in 1958 in order to provide an adequate semantic characterization for Łukasiewicz' many-valued logics.⁵⁶
⁵⁶ See [Chang, 1958; Chang, 1959].
MV algebras represent a weakening of Boolean algebras, where the notion of conjunction (disjunction) is split into two different operations. The first kind of operation behaves like a Łukasiewicz conjunction (disjunction) and is generally nonidempotent; the second kind of operation is a lattice-meet (join). These algebras are also equipped with a generalized complement. In this framework, the lattice operations turn out to be defined in terms of the generalized complement and of the Łukasiewicz operations. Whenever the two conjunctions (resp., disjunctions) collapse into one and the same operation, one obtains a Boolean algebra. Let us now give the formal definition of MV algebra.
DEFINITION 43 (MV algebra⁵⁷). An MV algebra is a structure
M = ⟨M, ⊕, ′, 0, 1⟩,
⁵⁷ See [Mangani, 1973; Cignoli et al., 2000].
where ⊕ is a binary operation, ′ is a unary operation and 0, 1 are special distinct elements satisfying the following conditions:
(MV1) a ⊕ b = b ⊕ a;
(MV2) a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c;
(MV3) a ⊕ a′ = 1;
(MV4) a ⊕ 0 = a;
(MV5) a ⊕ 1 = 1;
(MV6) a′′ = a;
(MV7) (a′ ⊕ b)′ ⊕ b = (b′ ⊕ a)′ ⊕ a.
In any MV algebra M = ⟨M, ⊕, ′, 0, 1⟩, the Łukasiewicz conjunction ⊙, the lattice operations ∧ and ∨, the Łukasiewicz implication →Ł, and the partial order relation ≤ can be defined as follows:
• a ⊙ b := (a′ ⊕ b′)′;
• a ∧ b := (a ⊕ b′) ⊙ b;
• a ∨ b := (a ⊙ b′) ⊕ b;
• a →Ł b := a′ ⊕ b;
• a ≤ b iff a ∧ b = a.
It is not difficult to see that ∀a, b ∈ M: a ≤ b iff a →Ł b = a′ ⊕ b = 1. Hence, the operation →Ł represents a well behaved conditional.⁵⁸
⁵⁸ Generally, a binary operation → of a structure (which is at least a bounded poset) is considered a well behaved conditional, when: a ≤ b iff a → b = 1, for any elements a and b. By assuming a natural logical interpretation, this means that the conditional a → b is "true" iff the "implication-relation" a ≤ b holds.
LEMMA 44. Let M = ⟨M, ⊕, ′, 0, 1⟩ be an MV algebra. Consider the structure ⟨M, ≤, ′, 0, 1⟩, where ≤ is the partial order defined on M. Such a structure is a distributive bounded involution lattice, where ∧ and ∨ represent the infimum and the supremum, respectively. The noncontradiction principle (a ∧ a′ = 0) and the excluded middle (a ∨ a′ = 1) are possibly violated.⁵⁹
⁵⁹ See, for instance, [Cignoli et al., 2000].
A privileged example of MV algebra can be defined by assuming as support the real interval [0, 1].
DEFINITION 45 (The [0, 1]-MV algebra). The [0, 1]-MV algebra is the structure
M[0,1] = ⟨[0, 1], ⊕, ′, 0, 1⟩,
where
• ⊕ is the truncated sum. In other words: ∀x, y ∈ [0, 1] {x ⊕ y = min(1, x + y)};
• ∀x ∈ [0, 1] {x′ = 1 − x};
• 0 = 0;
• 1 = 1.
One can easily realize that M[0,1] is a special example of MV algebra where:
• the partial order ≤ is a total order (coinciding with the natural real order);
• x ∧ y = min(x, y);
• x ∨ y = max(x, y).
Let us now return to the concrete effect-structure ⟨E(H), ⊞, 0, 1⟩. The partial operation ⊞ can be naturally extended to a total operation ⊕ that behaves similarly to a truncated sum. For any E, F ∈ E(H),
E ⊕ F := E + F, if ∃(E ⊞ F);
E ⊕ F := 1, otherwise.
Furthermore, let us define: E′ := I − E.
The structure ⟨E(H), ⊕, ′, 0, 1⟩ turns out to be "very close" to an MV algebra. However, something is missing: E(H) satisfies the first six axioms of the definition of an MV algebra; at the same time one can easily check that the final axiom (usually called "Łukasiewicz axiom") is violated. For instance, consider two nontrivial projections P, Q such that P is not orthogonal to Q′ and Q is not orthogonal to P′. Then, by the definition of ⊕ given immediately above, we have that P ⊕ Q′ = I and Q ⊕ P′ = I. Hence, (P′ ⊕ Q)′ ⊕ Q = Q ≠ P = (P ⊕ Q′)′ ⊕ P.
As a consequence, the Łukasiewicz axiom must be conveniently weakened to obtain an adequate description of concrete effect structures. This can be done by means of the notion of quantum MV algebra (QMV algebra).⁶⁰ As with MV algebras, QMV algebras are total structures having the following form:
M = ⟨M, ⊕, ′, 0, 1⟩,
where:
(i) 0, 1 represent the impossible and the certain object, respectively;
(ii) ′ is the negation-operation;
(iii) ⊕ represents a disjunction (or) which is generally nonidempotent (a ⊕ a ≠ a).
⁶⁰ See [Giuntini, 1996].
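The failure of the Łukasiewicz axiom for effects, noted above, can be reproduced with two concrete projections. The sketch below is an illustration under explicit assumptions: it works with real symmetric 2×2 matrices only, represents them as nested tuples of exact rationals, and picks P and Q as the projections onto the first coordinate axis and onto the diagonal direction. These choices and all helper names are hypothetical.

```python
from fractions import Fraction

def mat(a, b, c, d):
    """2x2 matrix [[a, b], [c, d]] with exact rational entries."""
    return ((Fraction(a), Fraction(b)), (Fraction(c), Fraction(d)))

IDENT = mat(1, 0, 0, 1)
NULL = mat(0, 0, 0, 0)

def add(x, y):
    return tuple(tuple(p + q for p, q in zip(r, s)) for r, s in zip(x, y))

def sub(x, y):
    return tuple(tuple(p - q for p, q in zip(r, s)) for r, s in zip(x, y))

def is_psd(x):
    """A symmetric 2x2 matrix is positive semidefinite iff trace and det are >= 0."""
    (a, b), (c, d) = x
    return b == c and a + d >= 0 and a * d - b * c >= 0

def is_effect(x):
    """Effects are the operators between the null and the identity operator."""
    return is_psd(x) and is_psd(sub(IDENT, x))

def comp(e):
    """Generalized complement E' = I - E."""
    return sub(IDENT, e)

def oplus(e, f):
    """Total extension of the partial sum: E + F if it is an effect, else I."""
    s = add(e, f)
    return s if is_effect(s) else IDENT

P = mat(1, 0, 0, 0)                      # projection onto the first axis
Q = mat("1/2", "1/2", "1/2", "1/2")      # projection onto the diagonal

lhs = oplus(comp(oplus(comp(P), Q)), Q)  # (P' (+) Q)' (+) Q
rhs = oplus(comp(oplus(comp(Q), P)), P)  # (Q' (+) P)' (+) P

print(lhs == Q, rhs == P, lhs == rhs)
```

For this pair the two sides of (MV7) evaluate to Q and to P respectively, so the last comparison is False. The positivity test relies on the 2×2 trace and determinant criterion and would have to be replaced by an eigenvalue computation in higher dimensions.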
A (generally nonidempotent) conjunction (and) is then defined via the de Morgan law:
a ⊙ b := (a′ ⊕ b′)′.
On this basis, a pair consisting of an idempotent conjunction et and of an idempotent disjunction vel is then defined. As we have already discussed, in any MV algebra such idempotent operations behave as a lattice-meet and lattice-join, respectively. However, this is not the case for QMV algebras. As a consequence, in such a more general situation, we will denote the et operation by the symbol ⋓, while the vel will be indicated by ⋒. The definition of et and vel is as in the MV-case:
a ⋓ b := (a ⊕ b′) ⊙ b
a ⋒ b := (a ⊙ b′) ⊕ b.
DEFINITION 46 (QMV algebra). A quantum MV algebra (QMV algebra) is a structure M = ⟨M, ⊕, ′, 0, 1⟩, where ⊕ is a binary operation, ′ is a unary operation, and 0, 1 are special distinct elements of M. For any a, b ∈ M: a ⊙ b := (a′ ⊕ b′)′, a ⋓ b := (a ⊕ b′) ⊙ b, a ⋒ b := (a ⊙ b′) ⊕ b. Assume that the following conditions hold:
(QMV1) a ⊕ b = b ⊕ a;
(QMV2) a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c;
(QMV3) a ⊕ a′ = 1;
(QMV4) a ⊕ 0 = a;
(QMV5) a ⊕ 1 = 1;
(QMV6) a′′ = a;
(QMV7) a ⊕ [(a′ ⋓ b) ⋓ (c ⋓ a′)] = (a ⊕ b) ⋓ (a ⊕ c).
By Axioms (QMV3), (QMV1) and (QMV4), one immediately obtains that 0′ = 1. The operations ⋓ and ⋒ of a QMV algebra M are generally noncommutative. As a consequence, they do not represent lattice-operations. It is not difficult to prove that ⋓ is commutative iff ⋒ is commutative iff (MV7) of Definition 43 holds. From this it easily follows that a QMV algebra M is an MV algebra iff ⋓ or ⋒ is commutative.
At the same time (as in the MV-case), we can define in any QMV algebra ⟨M, ⊕, ′, 0, 1⟩ the following relation: a ≤ b iff a ⋓ b = a.
The structure ⟨M, ≤, ′, 0, 1⟩ turns out to be a bounded involution poset. One can prove that the concrete effect structure ⟨E(H), ⊕, ′, 0, 1⟩ is a QMV algebra (which is not an MV algebra).
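Since every MV algebra is a QMV algebra in which ⋓ and ⋒ are commutative, the [0, 1]-MV algebra of Definition 45 offers a simple sanity check of axiom (QMV7). The sketch below samples a finite grid of rationals, so it is a spot check and not a proof; the grid and the function names are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

def oplus(a, b):
    """Truncated sum on [0, 1]."""
    return min(Fraction(1), a + b)

def neg(a):
    return 1 - a

def odot(a, b):
    """a (.) b := (a' (+) b')'."""
    return neg(oplus(neg(a), neg(b)))

def et(a, b):
    """a et b := (a (+) b') (.) b."""
    return odot(oplus(a, neg(b)), b)

def vel(a, b):
    """a vel b := (a (.) b') (+) b."""
    return oplus(odot(a, neg(b)), b)

def qmv7_holds(a, b, c):
    lhs = oplus(a, et(et(neg(a), b), et(c, neg(a))))
    rhs = et(oplus(a, b), oplus(a, c))
    return lhs == rhs

GRID = [Fraction(k, 4) for k in range(5)]   # assumed sample: 0, 1/4, ..., 1

print(all(qmv7_holds(a, b, c) for a, b, c in product(GRID, repeat=3)))
print(all(et(a, b) == et(b, a) for a, b in product(GRID, repeat=2)))
print(all(et(a, b) == min(a, b) and vel(a, b) == max(a, b)
          for a, b in product(GRID, repeat=2)))
```

On the sampled values et and vel reduce to min and max, which is the commutative (MV) case; in the effect structure E(H) the same definitions yield noncommutative operations.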
4.2 Unsharp quantum logics
Orthologic, orthomodular quantum logic and partial classical logic are all examples of sharp logics. Both the logical and the semantic version of the noncontradiction principle hold:
• any contradiction α ∧ ¬α is always false;⁶¹
• a sentence α and its negation ¬α cannot both be true.
⁶¹ Of course, in the case of PaCL, contradictions are false only if defined.
Some unsharp forms of quantum logic have been proposed (in the late Eighties and in the Nineties) as natural logical abstractions from the effect-state systems.⁶²
⁶² See [Dalla Chiara et al., 2004].
The most obvious unsharp weakening of orthologic is represented by a logic that has been called paraconsistent quantum logic (briefly, PQL).⁶³ In the algebraic semantics, this logic is characterized by the class of all models based on a bounded involution lattice, where the noncontradiction principle (a ∧ a′ = 0) is possibly violated.
⁶³ See [Dalla Chiara and Giuntini, 1989].
In the Kripkean semantics, instead, PQL is characterized by the class of all models
K = ⟨I, R, Pr, V⟩,
where the accessibility relation R is symmetric (but not necessarily reflexive), while Pr behaves as in the OL case (i.e., Pr is a set of propositions that contains I, ∅ and is closed under the operations ∩ and ′). Any pair ⟨I, R⟩, where R is a symmetric relation on I, is called a symmetric frame. All the other semantic definitions are given as in the case of OL, mutatis mutandis. On this basis, one can show that our algebraic and Kripkean semantics characterize the same logic.
Unlike OL and OQL, a world i of a PQL-model may verify a contradiction. Since R is generally not reflexive, it may happen that i ∈ V(α) and i ⊥ V(α). Hence, i ⊨ α ∧ ¬α. In spite of this, a contradiction cannot be verified by all worlds of a model K.
Hilbert-space models for PQL can be constructed in a natural way. In the Kripkean semantics, consider the models based on the following frames
⟨E(H) − {0}, ⊥̸⟩,
where ⊥̸ represents the nonorthogonality relation between effects (E ⊥̸ F iff E ≰ F′). Unlike the corresponding case involving projections, in this situation the accessibility relation is symmetric but generally nonreflexive. For instance, the semitransparent effect ½I (representing the prototypical ambiguous property) is a fixed point of the generalized complement ′. Hence,
½I ⊥ ½I and (½I)′ ⊥ (½I)′.
From the physical point of view, possible worlds are here identified with possible pieces of information about the physical system under investigation. Any information may correspond to:
• a pure state (a maximal information);
• a proper mixture (a non-maximal information);
• a projection (a sharp property);
• a proper effect (an unsharp property).
Thus, unlike the sharp models of orthomodular quantum logic, here possible worlds do not always correspond to states of the quantum system under investigation. As expected, violations of the noncontradiction principle will be determined by unsharp (ambiguous) pieces of knowledge.
An axiomatization of PQL can be obtained by dropping the absurdity rule and the Duns Scotus rule in the OL calculus. As with OL, the logic PQL satisfies the finite model property and is consequently decidable.
From the logical point of view, an interesting feature of PQL is represented by the fact that this logic is a common sublogic in a wide class of important logics. In particular, PQL is a sublogic of Girard's linear logic, of Łukasiewicz' infinitely many-valued logic and of some relevant logics.
As we have seen, PQL is expressed in the same language as orthologic and orthomodular quantum logic, representing a weakening thereof. The Brouwer Zadeh structures (emerging from the concrete effect-state systems) have suggested a stronger example of unsharp quantum logic, called Brouwer Zadeh logic (also called fuzzy intuitionistic logic). As expected, a characteristic property of Brouwer Zadeh logic (BZL) is a splitting of the connective "not" into two forms of negation: a fuzzy-like negation, which gives rise to a paraconsistent behavior, and an intuitionistic-like negation. The fuzzy "not" (¬) represents a weak negation, which inverts the two extreme truth-values (truth and falsity), satisfies the double negation principle but generally violates the noncontradiction principle. The intuitionistic "not" (∼) is a stronger negation, a kind of necessitation of the fuzzy "not". On this basis, a necessity operator can be defined in terms of the intuitionistic and of the fuzzy negation:
Lα := ∼¬α.
A possibility operator is defined in terms of the necessity operator and of the fuzzy negation:
Mα := ¬L¬α.
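The behavior of L and M can be explored on a very small algebraic model. The sketch below assumes, purely for illustration, the chain [0, 1] with the fuzzy negation 1 − a and with an intuitionistic negation that sends 0 to 1 and every other value to 0; this toy structure is an assumption of the example (not a model discussed in the text), and it is sampled on a finite grid. The operators are the algebraic counterparts ν(a) = a′∼ and µ(a) = a∼′ of Definition 35.

```python
from fractions import Fraction

def fuzzy_not(a):
    """Weak negation: a' = 1 - a."""
    return 1 - a

def brouwer_not(a):
    """Assumed intuitionistic negation on [0, 1]: 1 at a = 0, else 0."""
    return Fraction(1) if a == 0 else Fraction(0)

def nec(a):
    """Necessity: intuitionistic negation of the fuzzy negation."""
    return brouwer_not(fuzzy_not(a))

def poss(a):
    """Possibility: fuzzy negation of the necessity of the fuzzy negation."""
    return fuzzy_not(nec(fuzzy_not(a)))

GRID = [Fraction(k, 4) for k in range(5)]

# S5-like conditions, checked on the sample grid.
print(all(nec(a) <= a for a in GRID))
print(all(nec(a) <= nec(b) for a in GRID for b in GRID if a <= b))
print(all(a <= nec(poss(a)) for a in GRID))
print(all(nec(nec(a)) == nec(a) for a in GRID))
print(all(nec(poss(a)) == poss(a) for a in GRID))

# Only the sharp value 1 is necessary; every nonzero value is possible.
print([str(nec(a)) for a in GRID])    # ['0', '0', '0', '0', '1']
print([str(poss(a)) for a in GRID])   # ['0', '1', '1', '1', '1']
```

In this model nec(a) is the largest sharp value below a, in line with the reading of ν as a best sharp lower approximation.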
As happens with OL, OQL and PQL, BZL too can be characterized by an algebraic and by a Kripkean semantics.
We have seen that concrete effect-systems also give rise to examples of partial algebraic structures, where the basic operations are not always defined. How can one give a semantic characterization for a logic that corresponds to the class of all effect algebras? Such a logic has been called unsharp partial quantum logic (UPaQL).
The language of UPaQL consists of a set of atomic sentences and of two primitive connectives: the negation ¬ and the exclusive disjunction ∨⁺ (aut). The set of sentences is defined in the usual way. A conjunction is metalinguistically defined, via the de Morgan law:
α ∧̇ β := ¬(¬α ∨⁺ ¬β).
The intuitive idea underlying the semantics for UPaQL is the following: disjunctions and conjunctions are always considered "legitimate" from a merely linguistic point of view. However, semantically, a disjunction α ∨⁺ β will have the intended meaning only in the "appropriate cases": where the values of α and β are orthogonal in the corresponding effect algebra. Otherwise, α ∨⁺ β will have any meaning whatsoever (generally not connected with the meanings of α and β). As is well known, a similar semantic "trick" is used in some classical treatments of the description operator ι ("the unique individual satisfying a given property"; for instance, "the present king of Italy"). Apparently one is dealing with a different idea with respect to the semantics of partial classical logic (PaCL), where the meaning of a sentence is not necessarily defined. It has been proved that UPaQL is an axiomatizable logic.⁶⁴
⁶⁴ See [Dalla Chiara and Giuntini, 2002].
The theory of QMV algebras has also naturally suggested the semantic characterization of another form of quantum logic (called Łukasiewicz quantum logic (ŁQL)), which generalizes both OQL and Łℵ (Łukasiewicz' infinitely many-valued logic). The language of ŁQL contains the same primitive connectives as UPaQL (∨⁺, ¬). The conjunction (∧̇) is defined via the de Morgan law (as with UPaQL). Furthermore, a new pair of conjunction (⩕) and disjunction (⩖) connectives is defined as follows:
α ⩕ β := (α ∨⁺ ¬β) ∧̇ β
α ⩖ β := ¬(¬α ⩕ ¬β)
ŁQL can be easily axiomatized by means of a calculus that simply mimics the axioms of QMV algebras.⁶⁵
⁶⁵ See [Dalla Chiara et al., 2004].
5 THE DISCUSSION ABOUT THE EMPIRICAL NATURE OF LOGIC
"Is logic an empirical science?" This is a question that has often been discussed in connection with quantum logic. At the very beginning of the contemporary discussion about the nature of logic, the claim that the "right logic" to be used in a given theoretical situation may also depend on experimental data appeared to be a kind of extremist view, in contrast with a leading philosophical tradition according to which a characteristic feature of logic should be its absolute independence from any content.
Interestingly enough, a quite heterodox thesis, in this connection, had been defended (already in 1936) by Łukasiewicz (the "father" of fuzzy logics). The strong contrast between Łukasiewicz' position and the leading ideas of the Vienna Circle is apparent in the following quote [Łukasiewicz, 1936]:
I think that in Carnap the attempt to reduce certain objective problems to a linguistic one results from his erroneous interpretation of the a priori sciences and their role in the study of reality. That erroneous opinion was taken over by Carnap from Wittgenstein, who considers all a priori propositions, that is, those belonging to logic and mathematics, to be tautologies. Carnap calls such propositions analytic. I have always opposed that terminology, since the association it evokes may make it misleading. Moreover, Carnap believes, together with Wittgenstein, that a priori propositions do not convey anything about reality. For them the a priori disciplines are only instruments which facilitate the cognition of reality, but a scientific interpretation of the world could, if necessary, do without those a priori elements. Now, my opinion on the a priori disciplines and their role in the study of reality is entirely different. We know today that not only do different systems of geometry exist, but different systems of logic as well, and they have, moreover, the property that one cannot be translated into another. I am convinced that one and only one of these logical systems is valid in the real world, that is, is real, in the same way as one and only one system of geometry is real. Today, it is true, we do not yet know which system that is, but I do not doubt that empirical research will sometime demonstrate whether the space of the universe is Euclidean or non-Euclidean, and whether relationships between facts correspond to two-valued logic or to one of the many-valued logics. All a priori systems, as soon as they are applied to reality, become natural-science hypotheses which have to be verified by facts in a similar way as is done with physical hypotheses.
The comparison between logic and geometry has also been the central point of Putnam's famous article Is Logic empirical? [Putnam, 1969], which has highly influenced the epistemological debate about quantum logic.
These days, an empirical position in logic is generally no longer regarded as a "daring heresy". At the same time, we are facing not only a variety of logics, but even a variety of quantum logics. The "labyrinth of quantum logics" described by van Fraassen in 1974⁶⁶ has become more and more labyrinthine. Even the distinction between sharp and unsharp logical situations turns out to be, to a certain extent, "unsharp".
⁶⁶ See [van Fraassen, 1974].
As we have seen, the logical behavior of effects in Hilbert space QT can be represented by means of different forms of unsharp quantum logics. One can refer to a partial quantum logic (like UPaQL) (where conjunction and disjunction are only defined for pairs of orthogonal effects), or to a total logic (like Łukasiewicz' quantum logic, a natural logical abstraction from the QMV-structure of E(H)). Another possibility is represented by paraconsistent quantum logic and by Brouwer Zadeh logic, whose Kripkean semantics is based on the following idea: effects are regarded as possible worlds (a kind of unsharp and partial pieces of information about possible physical situations), while the meanings of linguistic sentences are represented by convenient sets of effects.
A totally different situation has recently arisen in the framework of quantum computation. The theory of quantum logical gates has suggested some nonstandard versions of unsharp logic, which have been called quantum computational logics. Unlike all other forms of quantum logic we have investigated here, in quantum computational logics meanings of sentences correspond to quantum information quantities, which are mathematically represented by convenient systems of qubits. These researches belong, however, to a different and new chapter of the history of quantum logic.
6 MATHEMATICAL APPENDIX
We give here a survey of the definitions of some basic mathematical concepts that have played a fundamental role in the history of quantum logic.
6.1 Algebraic structures

DEFINITION 47 (Poset). A partially ordered set (also called a poset) is a structure $\mathcal{B} = \langle B, \leq \rangle$, where $B$ (the support of the structure) is a nonempty set and $\leq$ is a partial order relation on $B$. In other words, $\leq$ satisfies the following conditions for all $a, b, c \in B$:
(i) $a \leq a$ (reflexivity);
(ii) $a \leq b$ and $b \leq a$ implies $a = b$ (antisymmetry);
(iii) $a \leq b$ and $b \leq c$ implies $a \leq c$ (transitivity).

DEFINITION 48 (Chain). Let $\mathcal{B} = \langle B, \leq \rangle$ be a poset. A chain in $\mathcal{B}$ is a subset $C \subseteq B$ such that $\forall a, b \in C$: $a \leq b$ or $b \leq a$.

DEFINITION 49 (Bounded poset). A bounded poset is a structure $\mathcal{B} = \langle B, \leq, 0, 1 \rangle$, where:
(i) $\langle B, \leq \rangle$ is a poset;
(ii) $0$ and $1$ are distinct special elements of $B$: the minimum and the maximum with respect to $\leq$. In other words, for all $b \in B$: $0 \leq b$ and $b \leq 1$.

DEFINITION 50 (Lattice). A lattice is a poset $\mathcal{B} = \langle B, \leq \rangle$ in which any pair of elements $a, b$ has a meet $a \wedge b$ (also called infimum) and a join $a \vee b$ (also called supremum) such that:
(i) $a \wedge b \leq a, b$, and $\forall c \in B$: $c \leq a, b$ implies $c \leq a \wedge b$;
(ii) $a, b \leq a \vee b$, and $\forall c \in B$: $a, b \leq c$ implies $a \vee b \leq c$.
In any lattice the following condition holds: $a \leq b$ iff $a \wedge b = a$ iff $a \vee b = b$.

DEFINITION 51 (Complemented lattice). A complemented lattice is a bounded lattice $\mathcal{B}$ where: $\forall a \in B\ \exists b \in B$ such that $a \wedge b = 0$ and $a \vee b = 1$.
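Definitions 47 to 51 can be checked mechanically on a small finite example. The following is a minimal sketch, not part of the original text: it assumes the divisors of 12 ordered by divisibility as the poset, and the helper names (`is_partial_order`, `meet`, `join`, `complements`) are hypothetical, introduced only for illustration.

```python
from itertools import product

# Assumed example: the divisors of 12, ordered by divisibility.
B = [d for d in range(1, 13) if 12 % d == 0]   # [1, 2, 3, 4, 6, 12]

def leq(a, b):
    """a <= b in the divisibility order: a divides b."""
    return b % a == 0

def is_partial_order(elems, leq):
    """Reflexivity, antisymmetry and transitivity (Definition 47)."""
    reflexive = all(leq(a, a) for a in elems)
    antisymmetric = all(a == b for a, b in product(elems, repeat=2)
                        if leq(a, b) and leq(b, a))
    transitive = all(leq(a, c) for a, b, c in product(elems, repeat=3)
                     if leq(a, b) and leq(b, c))
    return reflexive and antisymmetric and transitive

def meet(a, b):
    """Greatest lower bound of a and b, by brute force (None if absent)."""
    lower = [c for c in B if leq(c, a) and leq(c, b)]
    greatest = [m for m in lower if all(leq(c, m) for c in lower)]
    return greatest[0] if greatest else None

def join(a, b):
    """Least upper bound of a and b, by brute force (None if absent)."""
    upper = [c for c in B if leq(a, c) and leq(b, c)]
    least = [m for m in upper if all(leq(m, c) for c in upper)]
    return least[0] if least else None

def complements(a, bottom=1, top=12):
    """All b with a ∧ b = 0 and a ∨ b = 1 (here 0 is 1 and 1 is 12)."""
    return [b for b in B if meet(a, b) == bottom and join(a, b) == top]

print(is_partial_order(B, leq))      # True
print(meet(4, 6), join(4, 6))        # 2 12
print(complements(3))                # [4]
print(complements(2))                # []
```

In this example every pair has a meet and a join, so the structure is a bounded lattice, but the element 2 has no complement, so it is not a complemented lattice in the sense of Definition 51.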
Let $X$ be any set of elements of a lattice $\mathcal{B}$. If they exist, the infimum $\bigwedge X$ and the supremum $\bigvee X$ are the elements of $B$ that satisfy the following conditions:
(ia) $\forall a \in X$: $\bigwedge X \leq a$;
(ib) $\forall c \in B$: $\forall a \in X\,[c \leq a]$ implies $c \leq \bigwedge X$;
(iia) $\forall a \in X$: $a \leq \bigvee X$;
(iib) $\forall c \in B$: $\forall a \in X\,[a \leq c]$ implies $\bigvee X \leq c$.
One can show that, when they exist, the infimum and the supremum are unique. A lattice is complete iff for any set of elements $X$ the infimum $\bigwedge X$ and the supremum $\bigvee X$ exist. A lattice is σ-complete iff for any countable set of elements $X$ the infimum $\bigwedge X$ and the supremum $\bigvee X$ exist.
DEFINITION 52 (Continuous lattice). A continuous lattice is a complete lattice $\mathcal{B}$ such that for any $a \in B$ and for any chain $C \subseteq B$ the following conditions are satisfied:
(i) $\mathcal{B}$ is meet-continuous. In other words: $a \wedge \bigvee C = \bigvee \{a \wedge c : c \in C\}$;
(ii) $\mathcal{B}$ is join-continuous. In other words: $a \vee \bigwedge C = \bigwedge \{a \vee c : c \in C\}$.
In many situations, a poset (or a lattice) is closed under a unary operation that represents a weak form of logical negation. Such a finer structure is represented by a bounded involution poset.

DEFINITION 53 (Bounded involution poset). A bounded involution poset is a structure $\mathcal{B} = \langle B, \leq, ', 0, 1 \rangle$ where:
(i) $\langle B, \leq, 0, 1 \rangle$ is a bounded poset;
(ii) $'$ is a unary operation (called involution or generalized complement) that satisfies the following conditions:
(a) $a = a''$ (double negation);
(b) $a \leq b$ implies $b' \leq a'$ (contraposition).
The presence of a negation-operation permits us to define an orthogonality relation $\perp$ that may hold between two elements of a bounded involution poset.

DEFINITION 54 (Orthogonality). Let $a$ and $b$ belong to a bounded involution poset. The object $a$ is orthogonal to the object $b$ (indicated by $a \perp b$) iff $a \leq b'$. A set of elements $S$ is called a pairwise orthogonal set iff $\forall a, b \in S$ such that $a \neq b$, $a \perp b$. A maximal set of pairwise orthogonal elements is a set of pairwise orthogonal elements that is not a proper subset of any set of pairwise orthogonal elements.

When $a$ is not orthogonal to $b$ we write: $a \not\perp b$. The orthogonality relation $\perp$ is sometimes also called preclusivity, while its negation $\not\perp$ is also called accessibility. Since, by definition of bounded involution poset, $a \leq b$ implies $b' \leq a'$ (contraposition) and $a = a''$ (double negation), one immediately obtains that $\perp$ is a symmetric relation. Notice that $0 \perp 0$ and that $\not\perp$ is not necessarily reflexive. It may happen that an object $a$ (different from the null object $0$) is orthogonal to itself: $a \perp a$ (because $a \leq a'$). Objects of this kind are called self-inconsistent. Suppose we have two self-inconsistent objects $a$ and $b$, and let us ask whether in such a case $a$ is necessarily orthogonal to $b$. Generally, the answer to this question is negative. There are examples of bounded involution posets such that for some objects $a$ and $b$: $a \perp a$ and $b \perp b$ and $a \not\perp b$.

DEFINITION 55 (Kleene poset). A bounded involution poset is a Kleene poset (or also a regular poset) iff it satisfies the Kleene condition for any pair of elements $a$ and $b$: $a \perp a$ and $b \perp b$ implies $a \perp b$.
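The difference between an arbitrary bounded involution poset and a Kleene poset can be made concrete on two small structures. This is an illustrative sketch only, under assumptions that are not in the original text: the three-element chain $\{0, 1/2, 1\}$ with $a' = 1 - a$, and a four-element poset with two incomparable elements `n` and `b` that are fixed points of the involution. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def is_involution_poset(elems, leq, neg):
    """Double negation and contraposition (Definition 53, condition (ii))."""
    double_negation = all(neg(neg(a)) == a for a in elems)
    contraposition = all(leq(neg(b), neg(a))
                         for a, b in product(elems, repeat=2) if leq(a, b))
    return double_negation and contraposition

def orthogonal(leq, neg, a, b):
    """a ⊥ b iff a <= b' (Definition 54)."""
    return leq(a, neg(b))

def is_symmetric(elems, leq, neg):
    return all(orthogonal(leq, neg, a, b) == orthogonal(leq, neg, b, a)
               for a, b in product(elems, repeat=2))

def is_kleene(elems, leq, neg):
    """Kleene condition: self-inconsistent elements are mutually orthogonal."""
    self_inconsistent = [a for a in elems if orthogonal(leq, neg, a, a)]
    return all(orthogonal(leq, neg, a, b)
               for a, b in product(self_inconsistent, repeat=2))

# Assumed example 1: the chain 0 < 1/2 < 1 with a' = 1 - a.
chain = [Fraction(0), Fraction(1, 2), Fraction(1)]
chain_leq = lambda a, b: a <= b
chain_neg = lambda a: 1 - a

# Assumed example 2: 0 < n, b < 1 with n and b incomparable, n' = n, b' = b.
four = ["0", "n", "b", "1"]
four_leq = lambda a, b: a == b or a == "0" or b == "1"
four_neg = lambda a: {"0": "1", "1": "0", "n": "n", "b": "b"}[a]

print(is_involution_poset(chain, chain_leq, chain_neg))   # True
print(is_kleene(chain, chain_leq, chain_neg))             # True
print(is_involution_poset(four, four_leq, four_neg))      # True
print(is_kleene(four, four_leq, four_neg))                # False
```

In the chain, $1/2$ is a nonzero self-inconsistent element. In the four-element example, `n` and `b` are both self-inconsistent but not orthogonal to each other, so the Kleene condition fails.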
DEFINITION 56 (Bounded involution lattice). A bounded involution lattice is a bounded involution poset that is also a lattice. A Kleene lattice (or regular lattice) is a Kleene poset that is also a lattice.

Generally, bounded involution lattices and Kleene lattices may violate both the noncontradiction principle and the excluded middle. In other words, it may happen that: $a \wedge a' \neq 0$ and $a \vee a' \neq 1$.

DEFINITION 57 (Orthoposet and ortholattice). An orthoposet is a bounded involution poset $\mathcal{B} = \langle B, \leq, ', 0, 1 \rangle$ that satisfies the conditions:
(i) $a \wedge a' = 0$ (noncontradiction principle);
(ii) $a \vee a' = 1$ (excluded middle principle).
An ortholattice is an orthoposet that is also a lattice. The involution operation $'$ of an orthoposet (ortholattice) is also called orthocomplementation (or, for short, orthocomplement). A σ-orthocomplete orthoposet (σ-orthocomplete ortholattice) is an orthoposet (ortholattice) $\mathcal{B}$ such that for any countable set $\{a_i\}_{i \in I}$ of pairwise orthogonal elements the supremum $\bigvee \{a_i\}_{i \in I}$ exists in $\mathcal{B}$.

DEFINITION 58 (Distributive lattice). A lattice $\mathcal{B} = \langle B, \wedge, \vee \rangle$ is distributive iff the meet $\wedge$ is distributed over the join $\vee$ and vice versa. In other words:
(i) $a \wedge (b \vee c) = (a \wedge b) \vee (a \wedge c)$;
(ii) $a \vee (b \wedge c) = (a \vee b) \wedge (a \vee c)$.
Distributive involution lattices are also called de Morgan lattices. In this framework, Boolean algebras can then be defined as particular examples of de Morgan lattices.

DEFINITION 59 (Boolean algebra). A Boolean algebra is a structure $\mathcal{B} = \langle B, \wedge, \vee, ', 0, 1 \rangle$ that is at the same time an ortholattice and a de Morgan lattice. In other words, Boolean algebras are distributive ortholattices.

DEFINITION 60 (Orthomodular poset and orthomodular lattice). An orthomodular poset is an orthoposet $\mathcal{B} = \langle B, \leq, ', 0, 1 \rangle$ that satisfies the following conditions:
(i) $\forall a, b \in B$, $a \perp b$ implies $a \vee b \in B$;
(ii) $\forall a, b \in B$, $a \leq b$ implies $b = a \vee (a \vee b')'$.
An orthomodular lattice is an orthomodular poset that is also a lattice. Clearly, any distributive ortholattice (i.e., any Boolean algebra) is orthomodular.

DEFINITION 61 (Modularity). A lattice $\mathcal{B}$ is called modular iff $\forall a, b \in B$, $a \leq b$ implies $\forall c \in B\,[a \vee (b \wedge c) = (a \vee b) \wedge (a \vee c)]$.

Every modular ortholattice is orthomodular, but not the other way around. Furthermore, any distributive lattice is modular.

DEFINITION 62 (Continuous geometry). A continuous geometry (von Neumann lattice in Birkhoff’s terminology) is a modular, complemented continuous lattice.

If one assumes that $\mathcal{B}$ is a complete, modular ortholattice, then the continuity conditions (i)-(ii) of Definition 52 can be derived because of the following

THEOREM 63 (Kaplansky’s Theorem). Any complete, modular ortholattice is a continuous geometry.

A bounded poset (lattice) $\mathcal{B}$ may contain some special elements, called atoms.

DEFINITION 64 (Atom). An element $b$ of $B$ is called an atom of $\mathcal{B}$ iff $b$ covers $0$. In other words, $b \neq 0$ and $\forall c \in B$: $c \leq b$ implies $c = 0$ or $c = b$.

Evidently, atoms are nonzero elements such that no other element lies between them and the lattice-minimum.

DEFINITION 65 (Atomicity). A bounded poset $\mathcal{B}$ is atomic iff $\forall a \in B \setminus \{0\}$ there exists an atom $b$ such that $b \leq a$.

Of course, any finite bounded poset is atomic. At the same time, there are examples of infinite bounded posets that are atomless (and hence nonatomic), the real interval $[0, 1]$ being the most familiar example. It turns out that any atomic orthomodular lattice $\mathcal{B}$ is atomistic in the sense that any element can be represented as the supremum of a set of atoms, i.e., for any element $a$ there exists a set $\{b_i\}_{i \in I}$ of atoms such that $a = \bigvee \{b_i\}_{i \in I}$.
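The relations between distributivity, modularity and orthomodularity can be tested by brute force on a small ortholattice. The sketch below is an illustration under an assumption not made in the original text: it uses the six-element ortholattice often called MO2 (a bottom, a top, and four pairwise incomparable atoms `a`, `a'`, `b`, `b'`). The function names are hypothetical.

```python
from itertools import product

# Assumed example (MO2): bottom "0", top "1", atoms a, a', b, b'.
ELEMS = ["0", "a", "a'", "b", "b'", "1"]
COMPLEMENT = {"0": "1", "1": "0", "a": "a'", "a'": "a", "b": "b'", "b'": "b"}

def leq(x, y):
    return x == y or x == "0" or y == "1"

def meet(x, y):
    if leq(x, y):
        return x
    if leq(y, x):
        return y
    return "0"          # two distinct atoms meet in the bottom element

def join(x, y):
    if leq(x, y):
        return y
    if leq(y, x):
        return x
    return "1"          # two distinct atoms join in the top element

def neg(x):
    return COMPLEMENT[x]

def is_ortholattice():
    """Noncontradiction and excluded middle (Definition 57)."""
    return all(meet(x, neg(x)) == "0" and join(x, neg(x)) == "1" for x in ELEMS)

def is_orthomodular():
    """a <= b implies b = a ∨ (a ∨ b')' (Definition 60, condition (ii))."""
    return all(y == join(x, neg(join(x, neg(y))))
               for x, y in product(ELEMS, repeat=2) if leq(x, y))

def is_modular():
    """a <= b implies a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c) (Definition 61)."""
    return all(join(x, meet(y, z)) == meet(join(x, y), join(x, z))
               for x, y, z in product(ELEMS, repeat=3) if leq(x, y))

def is_distributive():
    """Both distributive laws of Definition 58."""
    return all(meet(x, join(y, z)) == join(meet(x, y), meet(x, z))
               and join(x, meet(y, z)) == meet(join(x, y), join(x, z))
               for x, y, z in product(ELEMS, repeat=3))

def atoms():
    """Nonzero elements with nothing strictly between them and 0."""
    return [x for x in ELEMS
            if x != "0" and all(c in ("0", x) for c in ELEMS if leq(c, x))]

print(is_ortholattice(), is_orthomodular(), is_modular(), is_distributive())
# True True True False
print(atoms())   # ['a', "a'", 'b', "b'"]
```

The failure of distributivity can be seen directly: $a \wedge (b \vee b') = a$, whereas $(a \wedge b) \vee (a \wedge b') = 0$.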
DEFINITION 66 (Covering property). A lattice $\mathcal{B}$ satisfies the covering property iff $\forall a, b \in B$: if $a$ covers $a \wedge b$, then $a \vee b$ covers $b$.

It turns out that an atomic lattice $\mathcal{B}$ has the covering property iff for every atom $a$ of $\mathcal{B}$ and for every element $b \in B$ such that $a \wedge b = 0$, the element $a \vee b$ covers $b$. One of the most significant quantum relations, compatibility, admits a purely algebraic definition.

DEFINITION 67 (Compatibility). Let $\mathcal{B}$ be an orthomodular lattice and let $a$ and $b$ be elements of $B$. The element $a$ is called compatible with the element $b$ iff $a = (a \wedge b') \vee (a \wedge b)$.
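Definition 67 can be evaluated directly on a finite orthomodular lattice. The following sketch is an illustration only; it assumes the same six-element ortholattice MO2 (atoms `a`, `a'`, `b`, `b'`) as a test case, and the helper names are hypothetical.

```python
from itertools import product

# Assumed example (MO2): bottom "0", top "1", atoms a, a', b, b'.
ELEMS = ["0", "a", "a'", "b", "b'", "1"]
COMPLEMENT = {"0": "1", "1": "0", "a": "a'", "a'": "a", "b": "b'", "b'": "b"}

def leq(x, y):
    return x == y or x == "0" or y == "1"

def meet(x, y):
    return x if leq(x, y) else y if leq(y, x) else "0"

def join(x, y):
    return y if leq(x, y) else x if leq(y, x) else "1"

def neg(x):
    return COMPLEMENT[x]

def compatible(x, y):
    """x is compatible with y iff x = (x ∧ y') ∨ (x ∧ y)."""
    return x == join(meet(x, neg(y)), meet(x, y))

# Elements that are compatible with every element of the lattice.
compatible_with_all = [x for x in ELEMS if all(compatible(x, y) for y in ELEMS)]

print(compatible("a", "a'"))   # True
print(compatible("a", "b"))    # False
print(compatible_with_all)     # ['0', '1']
```

In this example only the bottom and top elements are compatible with everything, while each atom is compatible with its own orthocomplement but not with the atoms of the other pair.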
One can show that the compatibility relation is symmetric. The proof uses the orthomodular property in an essential way. Clearly, if $\mathcal{B}$ is a Boolean algebra, then any element is compatible with any other element by distributivity. One can prove that $a, b$ are compatible in the orthomodular lattice $\mathcal{B}$ iff the subalgebra of $\mathcal{B}$ generated by $\{a, b\}$ is Boolean.

DEFINITION 68 (Irreducibility). Let $\mathcal{B}$ be an orthomodular lattice. $\mathcal{B}$ is said to be irreducible iff $\{a \in B : \forall b \in B\ (a \text{ is compatible with } b)\} = \{0, 1\}$. If $\mathcal{B}$ is not irreducible, it is called reducible.

DEFINITION 69 (Separability). An orthomodular lattice $\mathcal{B}$ is called separable iff every set of pairwise orthogonal elements of $B$ is countable.

DEFINITION 70 (Group). A group is a structure $\mathcal{G} = \langle G, +, -, 0 \rangle$, where $+$ is a binary operation, $-$ is a unary operation, and $0$ is a special element. The following conditions hold:
(i) $\langle G, +, 0 \rangle$ is a monoid. In other words,
(a) the operation $+$ is associative: $a + (b + c) = (a + b) + c$;
(b) $0$ is the neutral element: $a + 0 = a$;
(ii) $\forall a \in G$, $-a$ is the inverse of $a$: $a + (-a) = 0$.
An Abelian monoid (group) is a monoid (group) in which the operation $+$ is commutative: $a + b = b + a$.

DEFINITION 71 (Ring). A ring is a structure $\mathcal{D} = \langle D, +, \cdot, -, 0 \rangle$ that satisfies the following conditions:
(i) $\langle D, +, -, 0 \rangle$ is an Abelian group;
(ii) the operation $\cdot$ is associative: $a \cdot (b \cdot c) = (a \cdot b) \cdot c$;
(iii) the operation $\cdot$ distributes over $+$ on both sides, i.e., $\forall a, b, c \in D$:
(a) $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$;
(b) $(a + b) \cdot c = (a \cdot c) + (b \cdot c)$.
If there is an element $1$ in $D$ that is neutral for $\cdot$ (i.e., if $\langle D, \cdot, 1 \rangle$ is a monoid), then the ring is called a ring with unity. A ring is trivial in case it has only one element, otherwise it is nontrivial. It is easy to see that a ring with unity is nontrivial iff $0 \neq 1$. A commutative ring is a ring in which the operation $\cdot$ is commutative.

DEFINITION 72 (Division ring). A division ring is a nontrivial ring $\mathcal{D}$ with unity such that any nonzero element is invertible; in other words, for any $a \in D$ ($a \neq 0$), there is an element $b \in D$ such that $a \cdot b = b \cdot a = 1$.

DEFINITION 73 (Field). A field is a commutative division ring.

Both the real numbers ($\mathbb{R}$) and the complex numbers ($\mathbb{C}$) give rise to a field. An example of a genuine division ring (where $\cdot$ is not commutative) is given by the quaternions ($\mathbb{Q}$).
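The claim that the quaternions form a division ring that is not a field can be illustrated with a few lines of code. This is a minimal sketch, not part of the original text: the class `Quaternion` and its methods are hypothetical names, the multiplication is the standard Hamilton product, and exact rational arithmetic is assumed so that the checks hold without rounding.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Quaternion:
    """A quaternion w + x*i + y*j + z*k with exact (rational) components."""
    w: Fraction
    x: Fraction
    y: Fraction
    z: Fraction

    def __mul__(self, o):
        # Hamilton product.
        return Quaternion(
            self.w * o.w - self.x * o.x - self.y * o.y - self.z * o.z,
            self.w * o.x + self.x * o.w + self.y * o.z - self.z * o.y,
            self.w * o.y - self.x * o.z + self.y * o.w + self.z * o.x,
            self.w * o.z + self.x * o.y - self.y * o.x + self.z * o.w,
        )

    def inverse(self):
        # Conjugate divided by the squared norm; defined for nonzero quaternions.
        n = Fraction(self.w**2 + self.x**2 + self.y**2 + self.z**2)
        return Quaternion(self.w / n, -self.x / n, -self.y / n, -self.z / n)

ONE = Quaternion(1, 0, 0, 0)
I = Quaternion(0, 1, 0, 0)
J = Quaternion(0, 0, 1, 0)
K = Quaternion(0, 0, 0, 1)

q = Quaternion(1, 2, 3, 4)
print(I * J == K)                 # True
print(J * I == K)                 # False: the product is not commutative
print(q * q.inverse() == ONE)     # True: nonzero elements are invertible
```

Here `J * I` equals the negative of `K`, which is why commutativity fails, while every nonzero element still has a two-sided inverse.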
6.2 Hilbert spaces

DEFINITION 74 (Vector space). A vector space over a division ring $\mathcal{D}$ is a structure $\mathcal{V} = \langle V, +, -, \cdot, \mathbf{0} \rangle$ that satisfies the following conditions:
(i) $\langle V, +, -, \mathbf{0} \rangle$ (the vector structure) is an Abelian group, where $\mathbf{0}$ (the null vector) is the neutral element;
(ii) for any element $a$ of the division ring $\mathcal{D}$ and any vector $\varphi$ of $V$, $a\varphi$ (the scalar product of $a$ and $\varphi$) is a vector in $V$. The following conditions hold for any $a, b \in D$ and for any $\varphi, \psi \in V$:
(a) $a(\varphi + \psi) = (a\varphi) + (a\psi)$;
(b) $(a + b)\varphi = (a\varphi) + (b\varphi)$;
(c) $a(b\varphi) = (a \cdot b)\varphi$;
(d) $1\varphi = \varphi$.

The elements (vectors) of a vector space $\mathcal{V}$ are indicated by $\varphi, \psi, \chi, \ldots$, while $a, b, c, \ldots$ represent elements (scalars) of the division ring $\mathcal{D}$. Any finite sum of vectors $\psi_1, \ldots, \psi_n$ is indicated by $\psi_1 + \ldots + \psi_n$ (or $\sum_{i \in K} \psi_i$, when $K = \{1, \ldots, n\}$). On this basis, one can introduce the notion of pre-Hilbert space. Hilbert spaces are then defined as special cases of pre-Hilbert spaces. We will only consider pre-Hilbert spaces (and Hilbert spaces) whose division ring is either $\mathbb{R}$ or $\mathbb{C}$.

DEFINITION 75 (Pre-Hilbert space). Let $\mathcal{D}$ be the field of the real or the complex numbers. A pre-Hilbert space over $\mathcal{D}$ is a vector space $\mathcal{V}$ over $\mathcal{D}$, equipped with an inner product $\langle . | . \rangle : V \times V \to D$ that satisfies the following conditions for any $\varphi, \psi, \chi \in V$ and any $a \in D$:
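(i) $\langle \varphi | \varphi \rangle \geq 0$;
(ii) $\langle \varphi | \varphi \rangle = 0$ iff $\varphi = \mathbf{0}$;
(iii) $\langle \psi | a\varphi \rangle = a \langle \psi | \varphi \rangle$;
(iv) $\langle \varphi | \psi + \chi \rangle = \langle \varphi | \psi \rangle + \langle \varphi | \chi \rangle$;
(v) $\langle \varphi | \psi \rangle = \langle \psi | \varphi \rangle^*$, where $^*$ is the identity if $D = \mathbb{R}$, and the complex conjugation if $D = \mathbb{C}$.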
The inner product $\langle . | . \rangle$ permits one to generalize some geometrical notions of ordinary 3-dimensional spaces.

DEFINITION 76 (Norm of a vector). The norm $\|\varphi\|$ of a vector $\varphi$ is the number $\langle \varphi | \varphi \rangle^{1/2}$.

A unit (or normalized) vector is a vector $\psi$ such that $\|\psi\| = 1$. Two vectors $\varphi, \psi$ are called orthogonal iff $\langle \varphi | \psi \rangle = 0$.

DEFINITION 77 (Orthonormal set of vectors). A set $\{\psi_i\}_{i \in I}$ of vectors is called orthonormal iff its elements are pairwise orthogonal unit vectors. In other words:
(i) $\forall i, j \in I\ (i \neq j)$: $\langle \psi_i | \psi_j \rangle = 0$;
(ii) $\forall i \in I$: $\|\psi_i\| = 1$.

The norm $\|.\|$ induces a metric $d$ on the pre-Hilbert space $\mathcal{V}$: $d(\psi, \varphi) := \|\psi - \varphi\|$. We say that a sequence $\{\psi_i\}_{i \in \mathbb{N}}$ of vectors in $V$ converges in norm (or simply converges) to a vector $\varphi$ of $V$ iff $\lim_{i \to \infty} d(\psi_i, \varphi) = 0$. In other words, $\forall \varepsilon > 0\ \exists n \in \mathbb{N}\ \forall k > n$: $d(\psi_k, \varphi) < \varepsilon$. A Cauchy sequence is a sequence $\{\psi_i\}_{i \in \mathbb{N}}$ of vectors in $V$ such that $\forall \varepsilon > 0\ \exists n \in \mathbb{N}\ \forall h > n\ \forall k > n$: $d(\psi_h, \psi_k) < \varepsilon$. It is easy to see that whenever a sequence $\{\psi_i\}_{i \in \mathbb{N}}$ of vectors in $V$ converges to a vector $\varphi$ of $V$, then $\{\psi_i\}_{i \in \mathbb{N}}$ is a Cauchy sequence. The crucial question is the converse one: which are the pre-Hilbert spaces in which every Cauchy sequence converges to an element in the space?

DEFINITION 78 (Metrically complete pre-Hilbert space). A pre-Hilbert space $\mathcal{V}$ with inner product $\langle . | . \rangle$ is metrically complete with respect to the metric $d$ induced by $\langle . | . \rangle$ iff every Cauchy sequence of vectors in $V$ converges to a vector of $V$.

DEFINITION 79 (Hilbert space). A Hilbert space is a metrically complete pre-Hilbert space. A real (complex) Hilbert space is a Hilbert space whose division ring is $\mathbb{R}$ ($\mathbb{C}$).

The notion of pre-Hilbert space (Hilbert space) can be generalized to the case where the division ring is represented by $\mathbb{Q}$ (the division ring of all quaternions).
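The conditions on the inner product, together with the induced norm and metric, can be tried out on the finite-dimensional space $\mathbb{C}^2$. The following is a small sketch under stated assumptions: vectors are plain Python lists of complex numbers, the inner product is taken to be conjugate-linear in its first argument (which matches conditions (iii) and (v) of Definition 75), and the function names are hypothetical.

```python
import math

def inner(phi, psi):
    """<phi|psi> on C^n: conjugate-linear in phi, linear in psi."""
    return sum(p.conjugate() * q for p, q in zip(phi, psi))

def norm(phi):
    """The norm <phi|phi>^(1/2) of Definition 76."""
    return math.sqrt(inner(phi, phi).real)

def dist(psi, phi):
    """The metric induced by the norm: d(psi, phi) = ||psi - phi||."""
    return norm([a - b for a, b in zip(psi, phi)])

def is_orthonormal(vectors, tol=1e-12):
    """Pairwise orthogonal unit vectors (Definition 77)."""
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            expected = 1.0 if i == j else 0.0
            if abs(inner(u, v) - expected) > tol:
                return False
    return True

phi = [1 + 1j, 2 + 0j]
psi = [1j, 1 - 1j]
a = 2 - 1j

# Condition (v): <phi|psi> is the complex conjugate of <psi|phi>.
print(inner(phi, psi) == inner(psi, phi).conjugate())                       # True
# Condition (iii): <psi|a phi> = a <psi|phi>.
print(abs(inner(psi, [a * x for x in phi]) - a * inner(psi, phi)) < 1e-12)  # True

s = 1 / math.sqrt(2)
print(is_orthonormal([[s, s], [s, -s]]))                                    # True

# The sequence psi_k = (1/k, 0) converges in norm to the null vector.
print([round(dist([1 / k, 0.0], [0.0, 0.0]), 3) for k in (1, 10, 100)])     # [1.0, 0.1, 0.01]
```

Since $\mathbb{C}^2$ is finite-dimensional, every Cauchy sequence in it converges, so it is a Hilbert space in the sense of Definition 79.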
Consider a Hilbert space $\mathcal{H}$ over a division ring $\mathcal{D}$.

DEFINITION 80 (Hilbert linear combination). Let $\{\psi_i\}_{i \in I}$ be a set of vectors of $\mathcal{H}$ and let $\{a_i\}_{i \in I} \subseteq D$. A vector $\psi$ is called a (Hilbert) linear combination (or superposition) of $\{\psi_i\}_{i \in I}$ (with scalars $\{a_i\}_{i \in I}$) iff $\forall \varepsilon \in \mathbb{R}^+$ there is a finite set $J \subseteq I$ such that for any finite subset $K$ of $I$ including $J$:
$$\Big\| \psi - \sum_{i \in K} a_i \psi_i \Big\| \leq \varepsilon.$$

Clearly, when it exists, the linear combination of $\{\psi_i\}_{i \in I}$ (with scalars $\{a_i\}_{i \in I}$) is unique. We denote it by $\sum_{i \in I} a_i \psi_i$. When no confusion is possible, the index set $I$ will be omitted.
DEFINITION 81 (Orthonormal basis). An orthonormal basis of $\mathcal{H}$ is a maximal orthonormal set $\{\psi_i\}_{i \in I}$ of $\mathcal{H}$. In other words, $\{\psi_i\}_{i \in I}$ is an orthonormal set such that no orthonormal set includes $\{\psi_i\}_{i \in I}$ as a proper subset.
One can prove that every Hilbert space $\mathcal{H}$ has an orthonormal basis and that all orthonormal bases of $\mathcal{H}$ have the same cardinality. The dimension of $\mathcal{H}$ is then defined as the cardinal number of any basis of $\mathcal{H}$. Let $\{\psi_i\}_{i \in I}$ be any orthonormal basis of $\mathcal{H}$. One can prove that every vector $\varphi$ of $\mathcal{H}$ can be expressed in the following form:
$$\varphi = \sum_{i \in I} \langle \psi_i | \varphi \rangle \psi_i.$$
Hence, $\varphi$ is a linear combination of $\{\psi_i\}_{i \in I}$ with scalars $\langle \psi_i | \varphi \rangle$ (the scalars $\langle \psi_i | \varphi \rangle$ are also called Fourier coefficients). A Hilbert space $\mathcal{H}$ is called separable iff $\mathcal{H}$ has a countable orthonormal basis. In the following, we will always refer to separable Hilbert spaces.

DEFINITION 82 (Closed subspace). A closed subspace of $\mathcal{H}$ is a set $X$ of vectors that satisfies the following conditions:
(i) $X$ is a subspace of $\mathcal{H}$. In other words, $X$ is closed under finite linear combinations. Hence, $\psi, \varphi \in X$ implies $a\psi + b\varphi \in X$;
(ii) $X$ is closed under limits of Cauchy sequences. In other words: if each element of a Cauchy sequence of vectors belongs to $X$, then the limit of the sequence also belongs to $X$.

The set of all closed subspaces of $\mathcal{H}$ is indicated by $C(\mathcal{H})$. For any nonzero vector $\psi$, we indicate by $[\psi]$ the unique 1-dimensional closed subspace that contains $\psi$.

DEFINITION 83 (Operator). An operator of $\mathcal{H}$ is a map $A : \mathrm{Dom}(A) \to \mathcal{H}$,
where $\mathrm{Dom}(A)$ (the domain of $A$) is a subset of $\mathcal{H}$.

DEFINITION 84 (Densely defined operator). A densely defined operator of $\mathcal{H}$ is an operator $A$ that satisfies the following condition: $\forall \varepsilon \in \mathbb{R}^+\ \forall \psi \in \mathcal{H}\ \exists \varphi \in \mathrm{Dom}(A)\ [d(\psi, \varphi) < \varepsilon]$, where $d$ represents the metric induced by $\langle . | . \rangle$.

DEFINITION 85 (Linear operator). A linear operator on $\mathcal{H}$ is an operator $A$ that satisfies the following conditions:
(i) $\mathrm{Dom}(A)$ is a closed subspace of $\mathcal{H}$;
(ii) $\forall \psi, \varphi \in \mathrm{Dom}(A)\ \forall a, b \in D$: $A(a\psi + b\varphi) = aA\psi + bA\varphi$.
In other words, a characteristic of linear operators is that they preserve linear combinations.

DEFINITION 86 (Bounded operator). A linear operator $A$ is called bounded iff there exists a positive real number $a$ such that $\forall \psi \in \mathcal{H}$: $\|A\psi\| \leq a\|\psi\|$.

The set $B(\mathcal{H})$ of all bounded operators of $\mathcal{H}$ turns out to be closed under the operator sum, the operator product and the scalar product. In other words, if $A \in B(\mathcal{H})$ and $B \in B(\mathcal{H})$, then $A + B \in B(\mathcal{H})$ and $A.B \in B(\mathcal{H})$; for any scalar $a$, if $B \in B(\mathcal{H})$, then $aB \in B(\mathcal{H})$.

DEFINITION 87 (Positive operator). A bounded operator $A$ is called positive iff $\forall \psi \in \mathcal{H}$: $\langle \psi | A\psi \rangle \geq 0$.

DEFINITION 88 (The adjoint operator). Let $A$ be a densely defined linear operator of $\mathcal{H}$. The adjoint of $A$ is the unique operator $A^*$ such that $\forall \psi \in \mathrm{Dom}(A)\ \forall \varphi \in \mathrm{Dom}(A^*)$: $\langle A\psi | \varphi \rangle = \langle \psi | A^*\varphi \rangle$.

DEFINITION 89 (Self-adjoint operator). A self-adjoint operator is a densely defined linear operator $A$ such that $A = A^*$.

If $A$ is self-adjoint, then $\forall \psi, \varphi \in \mathrm{Dom}(A)$: $\langle A\psi | \varphi \rangle = \langle \psi | A\varphi \rangle$. If $A$ is self-adjoint and everywhere defined (i.e., $\mathrm{Dom}(A) = \mathcal{H}$), then $A$ is bounded.

DEFINITION 90 (Projection operator). A projection operator is an everywhere defined self-adjoint operator $P$ that satisfies the idempotence property: $\forall \psi \in \mathcal{H}$: $P\psi = PP\psi$.

There are two special projections $O$ and $I$, called the zero (or null) projection and the identity projection, which are defined as follows: $\forall \psi \in \mathcal{H}$, $O\psi = \mathbf{0}$ and $I\psi = \psi$. Any projection other than $O$ and $I$ is called a nontrivial projection. Thus, $P$ is a projection operator if $\mathrm{Dom}(P) = \mathcal{H}$ and $P = P^2 = P^*$.

The set of all projection operators will be indicated by $\Pi(\mathcal{H})$.
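The Fourier expansion with respect to an orthonormal basis and the defining properties of a projection operator can both be checked in a two-dimensional example. The sketch below is only an illustration: it assumes the real Hilbert space $\mathbb{R}^2$ with exact rational coordinates, represents operators as matrices (lists of rows), and uses hypothetical helper names. The operator `projector(psi)` is the map $\varphi \mapsto \langle \psi | \varphi \rangle \psi$ for a unit vector $\psi$.

```python
from fractions import Fraction as F

def inner(phi, psi):
    """<phi|psi>; conjugation is the identity on real (rational) entries."""
    return sum(p.conjugate() * q for p, q in zip(phi, psi))

def apply(A, psi):
    """Apply the matrix A (a list of rows) to the vector psi."""
    return [sum(a * x for a, x in zip(row, psi)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def adjoint(A):
    """Conjugate transpose of A."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def projector(psi):
    """Matrix of phi -> <psi|phi> psi, for a unit vector psi."""
    return [[psi[i] * psi[j].conjugate() for j in range(len(psi))]
            for i in range(len(psi))]

# Assumed orthonormal basis of R^2 with rational coordinates.
e1 = [F(3, 5), F(4, 5)]
e2 = [F(-4, 5), F(3, 5)]
phi = [F(1), F(2)]

# Fourier expansion: phi = <e1|phi> e1 + <e2|phi> e2.
coeffs = [inner(e1, phi), inner(e2, phi)]
expansion = [coeffs[0] * x + coeffs[1] * y for x, y in zip(e1, e2)]
print(coeffs)              # [Fraction(11, 5), Fraction(2, 5)]
print(expansion == phi)    # True

# P = P^2 = P^* for the projection onto the line spanned by e1.
P = projector(e1)
print(matmul(P, P) == P, adjoint(P) == P)   # True True
print(apply(P, phi))       # [Fraction(33, 25), Fraction(44, 25)]
```

The last line shows that applying `P` to `phi` returns the first Fourier term $\langle e_1 | \varphi \rangle e_1$.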
One can prove that the set $C(\mathcal{H})$ of all closed subspaces and the set $\Pi(\mathcal{H})$ of all projections of $\mathcal{H}$ are in one-to-one correspondence. Let $X$ be a closed subspace of $\mathcal{H}$. By the projection theorem every vector $\psi \in \mathcal{H}$ can be uniquely expressed as a linear combination $\psi_1 + \psi_2$, where $\psi_1 \in X$ and $\psi_2$ is orthogonal to any vector of $X$. Accordingly, we can define an operator $P_X$ on $\mathcal{H}$ such that $\forall \psi \in \mathcal{H}$: $P_X \psi = \psi_1$ (in other words, $P_X$ transforms any vector $\psi$ into the “$X$-component” of $\psi$). It turns out that $P_X$ is a projection operator of $\mathcal{H}$. Conversely, we can associate to any projection $P$ its range, $X_P = \{\psi : \exists \varphi\,(P\varphi = \psi)\}$, which turns out to be a closed subspace of $\mathcal{H}$. For any closed subspace $X$ and for any projection $P$, the following conditions hold: $X_{(P_X)} = X$; $P_{(X_P)} = P$.

DEFINITION 91 (The trace functional). Let $\{\psi_i\}_{i \in I}$ be any orthonormal basis for $\mathcal{H}$ and let $A$ be a positive operator. The trace of $A$ (indicated by $\mathrm{Tr}(A)$) is defined as follows:
$$\mathrm{Tr}(A) := \sum_{i} \langle \psi_i | A\psi_i \rangle.$$
One can prove that the definition of $\mathrm{Tr}$ is independent of the choice of the basis. For any positive operator $A$, there exists a unique positive operator $B$ such that $B^2 = A$. If $A$ is a (not necessarily positive) bounded operator, then $A^*A$ is positive. Let $|A|$ be the unique positive operator such that $|A|^2 = A^*A$. A bounded operator $A$ is called a trace-class operator iff $\mathrm{Tr}(|A|) < \infty$.

DEFINITION 92 (Density operator). A density operator is a positive, self-adjoint, trace-class operator $\rho$ such that $\mathrm{Tr}(\rho) = 1$.

It is easy to see that, for any unit vector $\psi$, the projection $P_{[\psi]}$ onto the 1-dimensional closed subspace $[\psi]$ is a density operator.

DEFINITION 93 (Unitary operator). A unitary operator is a linear operator $U$ such that:
• $\mathrm{Dom}(U) = \mathcal{H}$;
• $UU^* = U^*U = I$.

One can show that the unitary operators $U$ are precisely the operators that preserve the inner product. In other words, for any $\psi, \varphi \in \mathcal{H}$: $\langle \psi | \varphi \rangle = \langle U\psi | U\varphi \rangle$.
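The basis-independence of the trace, the unit trace of a density operator and the preservation of the inner product by a unitary operator can be verified exactly in dimension two. This is a hedged sketch rather than a general implementation: it assumes the real Hilbert space $\mathbb{R}^2$ with rational entries, operators given as matrices, and hypothetical helper names. The matrices `rho` and `U` are chosen only as examples.

```python
from fractions import Fraction as F

def inner(phi, psi):
    return sum(p.conjugate() * q for p, q in zip(phi, psi))

def apply(A, psi):
    return [sum(a * x for a, x in zip(row, psi)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def adjoint(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def trace(A, basis):
    """Tr(A) as the sum of <psi_i|A psi_i> over an orthonormal basis."""
    return sum(inner(e, apply(A, e)) for e in basis)

IDENTITY = [[F(1), F(0)], [F(0), F(1)]]
standard_basis = [[F(1), F(0)], [F(0), F(1)]]
rotated_basis = [[F(3, 5), F(4, 5)], [F(-4, 5), F(3, 5)]]

# Assumed example of a density operator (positive, self-adjoint, trace one).
rho = [[F(1, 2), F(1, 4)], [F(1, 4), F(1, 2)]]
print(trace(rho, standard_basis), trace(rho, rotated_basis))   # 1 1

# The projection onto [psi], for the unit vector psi = (3/5, 4/5).
psi = [F(3, 5), F(4, 5)]
P_psi = [[psi[i] * psi[j] for j in range(2)] for i in range(2)]
print(trace(P_psi, standard_basis))                            # 1

# Assumed example of a unitary operator: a rotation with rational entries.
U = [[F(3, 5), F(-4, 5)], [F(4, 5), F(3, 5)]]
phi = [F(1), F(2)]
print(matmul(U, adjoint(U)) == IDENTITY, matmul(adjoint(U), U) == IDENTITY)   # True True
print(inner(apply(U, psi), apply(U, phi)) == inner(psi, phi))                 # True
```

Both bases give the same value for the trace of `rho`, in line with the basis-independence stated above.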
DEFINITION 94 (Von Neumann algebra). A von Neumann algebra is a structure $\mathcal{N} = \langle N, +, ., ^*, I \rangle$, where $N$ is a subset of the set $B(\mathcal{H})$ of all bounded operators of a Hilbert space $\mathcal{H}$ and for which the following conditions hold:
(i) $N$ contains the identity operator $I$ and is closed under the scalar product, the sum $+$, the product $.$ and the adjoint $^*$;
(ii) $\mathrm{Com}(\mathrm{Com}(N)) = N$, where $\mathrm{Com}(N) := \{B \in B(\mathcal{H}) : \forall C \in N\ (B.C = C.B)\}$ ($\mathrm{Com}(N)$ is called the commutant of $N$).

Clearly, the set $B(\mathcal{H})$ is a von Neumann algebra. One can easily see that the double commutant is a closure operator on the power set of $B(\mathcal{H})$. Furthermore, the commutant of any subset of $B(\mathcal{H})$ is a von Neumann algebra. The double commutant of a subset $X$ is called the von Neumann algebra generated by $X$. Since the double commutant is a closure operator, it follows that the von Neumann algebra generated by a subset $X$ of $B(\mathcal{H})$ is the smallest von Neumann algebra including $X$. We denote by $\Pi(\mathcal{N})$ the set of all projections of a von Neumann algebra $\mathcal{N}$.

THEOREM 95. Let $\mathcal{N}$ be a von Neumann algebra (on a Hilbert space $\mathcal{H}$).
(i) $\mathcal{N}$ is generated by $\Pi(\mathcal{N})$;
(ii) $\Pi(\mathcal{N})$ is a complete orthomodular sub-lattice of the Hilbert lattice $\Pi(\mathcal{H})$.

DEFINITION 96 (Center). The center of a von Neumann algebra $\mathcal{N}$ is the set $\mathrm{Cen}(\mathcal{N}) = N \cap \mathrm{Com}(N)$.

DEFINITION 97 (Factor). A factor is a von Neumann algebra $\mathcal{N}$ such that $\mathrm{Cen}(\mathcal{N}) = \{cI : c \in \mathbb{C}\}$.

DEFINITION 98 (Projection equivalence). Two projections $P$ and $Q$ of a von Neumann algebra $\mathcal{N}$ are equivalent ($P \sim Q$) iff $\exists W \in N$ such that: $W^*W = P$ and $WW^* = Q$.

DEFINITION 99 (Finite projection). A projection $P$ of a von Neumann algebra $\mathcal{N}$ is finite iff $\forall Q \in \Pi(\mathcal{N})$: $P \sim Q$ and $Q \leq P$ imply $P = Q$.

One can prove that for any complex Hilbert space $\mathcal{H}$, the set $B(\mathcal{H})$ is a factor.

THEOREM 100. Let $\mathcal{N}$ be a factor. Then,
(1) there exists a map $d : \Pi(\mathcal{N}) \to \mathbb{R}^+ \cup \{\infty\}$ (called dimension function) that satisfies the following conditions for any $P, Q \in \Pi(\mathcal{N})$:
(i) $d(P) = 0$ iff $P = O$;
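(ii) if $P \perp Q$, then $d(P \vee Q) = d(P) + d(Q)$ (additivity);
(iii) $P$ is finite iff $d(P) < \infty$;
(iv) $P \sim Q$ iff $d(P) = d(Q)$.
(2) The dimension function $d$ is uniquely determined by conditions (i)-(iv) (up to a constant positive multiple).

DEFINITION 101 (Factor type). Let $\mathcal{N}$ be a factor and let $d$ be the dimension function defined above. $\mathcal{N}$ is called of
(i) type $\mathrm{I}_n$, if the range of $d$ is $\{0, 1, 2, \ldots, n\}$;
(ii) type $\mathrm{I}_\infty$, if the range of $d$ is $\{0, 1, 2, \ldots, \infty\}$;
(iii) type $\mathrm{II}_1$, if the range of $d$ is $[0, 1]$;
(iv) type $\mathrm{II}_\infty$, if the range of $d$ is $\mathbb{R}^+ \cup \{\infty\}$;
(v) type $\mathrm{III}$, if the range of $d$ is $\{0, \infty\}$.

THEOREM 102. Every von Neumann algebra is uniquely decomposable into the direct sum of factors of type $\mathrm{I}_n$, $\mathrm{I}_\infty$, $\mathrm{II}_1$, $\mathrm{II}_\infty$, $\mathrm{III}$.

COROLLARY 103. Every factor is of type either $\mathrm{I}_n$ or $\mathrm{I}_\infty$ or $\mathrm{II}_1$ or $\mathrm{II}_\infty$ or $\mathrm{III}$.

| Range of $d$ | Type of $\mathcal{N}$ | Example | Orthomodular lattice $\Pi(\mathcal{N})$ |
|---|---|---|---|
| $\{0, 1, 2, \ldots, n\}$ | $\mathrm{I}_n$ | $B(\mathcal{H})$, $\dim(\mathcal{H}) = n$ | modular, atomic, nondistributive ($n \geq 2$) |
| $\{0, 1, 2, \ldots, \infty\}$ | $\mathrm{I}_\infty$ | $B(\mathcal{H})$, $\dim(\mathcal{H}) = \aleph_0$ | nonmodular, atomic |
| $[0, 1] \subset \mathbb{R}$ | $\mathrm{II}_1$ | new | modular, nondistributive, no atom |
| $\mathbb{R}^+ \cup \{\infty\}$ | $\mathrm{II}_\infty$ | – | nonmodular, no atom |
| $\{0, \infty\}$ | $\mathrm{III}$ | – | nonmodular, no atom |

Figure 1. Factor types and their dimension functions (from [Kalmbach, 1983]).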
BIBLIOGRAPHY

[Aerts, 1984] D. Aerts, Construction of a structure which enables to describe the joint system of a classical and a quantum system, Reports on Mathematical Physics 20 (1984), 117–129. [Aerts and van Steirteghem, 2000] D. Aerts and B. Van Steirteghem, Quantum axiomatics and a theorem of M.P. Solèr, International Journal of Theoretical Physics 39 (2000), 497–502. [Aspect and Grangier, 1985] A. Aspect and P. Grangier, Tests of Bell’s inequalities with pairs of low energy correlated photons: an experimental realization of Einstein-Podolsky-Rosen-type correlations, Symposium on the Foundations of Modern Physics (P. Lahti and P. Mittelstaedt, eds.), World Scientific, Singapore, 1985, pp. 51–71. [Aspect et al., 1981] A. Aspect, P. Grangier, and G. Roger, Experimental tests of realistic local theories via Bell’s theorem, Physical Review Letters 47 (1981), 460–467. [Battilotti, 1998] G. Battilotti, Embedding classical logic into basic orthologic with a primitive modality, Logic Journal of the IGPL, 6 (1998), 383–402. [Battilotti and Faggian, 2002] G. Battilotti and C. Faggian, Quantum logic and the cube of logics, Handbook of Philosophical Logic (D. M. Gabbay and F. Guenthner, eds.), vol. 6, Kluwer Academic Publishers, Dordrecht, 2002, pp. 213–226. [Battilotti and Sambin, 1999] G. Battilotti and G. Sambin, Basic logic and the cube of its extensions, Logic and Foundations of Mathematics (A. Cantini, E. Casari, and P. Minari, eds.), Kluwer Academic Publishers, Dordrecht, 1999, pp. 165–186. [Bell, 1966] J. S. Bell, On the problem of hidden variables in quantum mechanics, Reviews of Modern Physics 38 (1966), 447–452. [Beltrametti and Bugajski, 1995] E. Beltrametti and S. Bugajski, A classical extension of quantum mechanics, Journal of Physics A: Mathematical and General 28 (1995), 247–261. [Beltrametti and Bugajski, 1997] E. Beltrametti and S. Bugajski, Effect algebras and statistical physical theories, Journal of Mathematical Physics 38 (1997), 3020–3030.
[Beltrametti and Cassinelli, 1981] E. Beltrametti and G. Cassinelli, The logic of quantum mechanics, Encyclopedia of Mathematics and its Applications, vol. 15, Addison-Wesley, Reading, 1981. [Bennett, 1995] M. K. Bennett, Affine and projective geometry, Wiley-Interscience, New York, 1995. [Bennett and Foulis, 1997] M. K. Bennett and D. J. Foulis, Interval and scale effect algebras, Advances in Mathematics 19 (1997), 200–215. [Birkhoff, 1967] G. Birkhoff, Lattice Theory, 3rd (new) ed., Colloquium Publications, vol. 25, American Mathematical Society, Providence, 1967. [Birkhoff and von Neumann, 1936] G. Birkhoff and J. von Neumann, The logic of quantum mechanics, Annals of Mathematics 37 (1936), 823-843, in [von Neumann, 1961b]. [Bruns et al., 1990] G. Bruns, R. J. Greechie, J. Harding, and M. Roddy, Completions of orthomodular lattices, Order 7 (1990), 789–807. [Bub, 1999] J. Bub, Interpreting the quantum world, Cambridge University Press, Cambridge, 1999. [Bugajski, 1993] S. Bugajski, Delinearization of quantum logic, International Journal of Theoretical Physics 32 (1993), 389–398. [Busch, 1985] P. Busch, Elements of unsharp reality in the EPR experiment, Symposium on the Foundations of Modern Physics (P. Lahti and P. Mittelstaedt, eds.), World Scientific, Singapore, 1985, pp. 343–357. [Busch et al., 1995] P. Busch, M. Grabowski, and P. Lahti, Operational quantum mechanics, Lectures Notes in Physics, no. m31, Springer, Berlin, 1995. [Busch et al., 1991] P. Busch, P. Lahti, and P. Mittelstaedt, The quantum theory of measurement, Lectures Notes in Physics, no. m2, Springer, Berlin, 1991. [Cattaneo, 1993] G. Cattaneo, Fuzzy quantum logic II: the logics of unsharp quantum mechanics, International Journal of Theoretical Physics 32 (1993), 1709–1734. [Cattaneo, 1997] G. Cattaneo, A unified framework for the algebra of unsharp quantum mechanics, International Journal of Theoretical Physics 36 (1997), 3085–3117. [Cattaneo et al., 1999] G. Cattaneo, M. L. 
Dalla Chiara, and R. Giuntini, How many notions of ‘sharp’?, International Journal of Theoretical Physics 38 (1999), 3153–3161. [Cattaneo et al., 1989] G. Cattaneo, C. Garola, and G. Nisticò, Preparation-effect versus question-proposition structures, Physics Essays 2 (1989), 197–216.
[Cattaneo and Giuntini, 1995] G. Cattaneo and R. Giuntini, Some results on BZ structures from hilbertian unsharp quantum physics, Foundations of Physics 25 (1995), 1147–1182. [Cattaneo and Gudder, 1999] G. Cattaneo and S. P. Gudder, Algebraic structures arising in axiomatic unsharp quantum physics, Foundations of Physics 29 (1999), 1607–1637. [Cattaneo and Laudisa, 1994] G. Cattaneo and F. Laudisa, Axiomatic unsharp quantum theory (from Mackey to Ludwig), Foundations of Physics 24 (1994), 631–683. [Cattaneo and Nisticò, 1986] G. Cattaneo and G. Nisticò, Brouwer-Zadeh posets and three-valued Łukasiewicz posets, Fuzzy Sets and Systems 33 (1986), 165–190. [Chang, 1958] C. C. Chang, Algebraic analysis of many valued logics, Transactions of the American Mathematical Society 88 (1958), 74–80. [Chang, 1959] C. C. Chang, A new proof of the completeness of Łukasiewicz axioms, Transactions of the American Mathematical Society 93 (1959), 467–490. [Cignoli et al., 2000] R. Cignoli, I. M. L. D’Ottaviano, and D. Mundici, Algebraic foundations of many-valued reasoning, Trends in Logic, vol. 7, Kluwer Academic Publishers, Dordrecht, 2000. [Cutland and Gibbins, 1982] N. J. Cutland and P. F. Gibbins, A regular sequent calculus for quantum logic in which ∧ and ∨ are dual, Logique et Analyse - Nouvelle Serie - 25 (1982), no. 45, 221–248. [Czelakowski, 1975] J. Czelakowski, Logics based on partial Boolean σ-algebras (i), Studia Logica 34 (1975), 371–395. [Dalla Chiara, 1981] M. L. Dalla Chiara, Some metalogical pathologies of quantum logic, Current Issues in Quantum Logic (E. Beltrametti and B. van Fraassen, eds.), Ettore Majorana International Science Series, vol. 8, Plenum, New York, 1981, pp. 147–159. [Dalla Chiara and Giuntini, 1989] M. L. Dalla Chiara and R. Giuntini, Paraconsistent quantum logics, Foundations of Physics 19 (1989), 891–904. [Dalla Chiara and Giuntini, 1994] M. L. Dalla Chiara and R.
Giuntini, Unsharp quantum logics, Foundations of Physics 24 (1994), 1161–1177. [Dalla Chiara and Giuntini, 1995] M. L. Dalla Chiara and R. Giuntini, The logics of orthoalgebras, Studia Logica 55 (1995), 3–22. [Dalla Chiara and Giuntini, 1999] M. L. Dalla Chiara and R. Giuntini, Łukasiewicz theory of truth, from the quantum logical point of view, Alfred Tarski and the Vienna Circle (J. Woleński and E. Köhler, eds.), Kluwer, Dordrecht, 1999, pp. 127–134. [Dalla Chiara and Giuntini, 2002] M. L. Dalla Chiara and R. Giuntini, Quantum logics, Handbook of Philosophical Logic (D.M. Gabbay and F. Guenthner, eds.), vol. 6, Kluwer Academic Publishers, Dordrecht, 2002, pp. 129–228. [Dalla Chiara et al., 2004] M. L. Dalla Chiara, R. Giuntini, R. Greechie, Reasoning in quantum theory. Sharp and unsharp quantum logics, Kluwer Academic Publishers, Dordrecht, 2004. [Davies, 1976] E. B. Davies, Quantum theory of open systems, Academic, New York, 1976. [Dishkant, 1972] H. Dishkant, Semantics of the minimal logic of quantum mechanics, Studia Logica 30 (1972), 17–29. [Dunn and Hardegree, 2001] J. M. Dunn, G. M. Hardegree, Algebraic Methods in Philosophical Logic, Clarendon Press, Oxford, 2001. [Dvurečenskij, 1993] A. Dvurečenskij, Gleason’s theorem and its applications, Mathematics and its Applications, no. 60, Kluwer, Dordrecht, 1993. [Dvurečenskij, 1997] A. Dvurečenskij, Measures and ⊥-decomposable measures of effects of a Hilbert space, Atti del Seminario Matematico e Fisico dell’ Universita di Modena 45 (1997), 259–288. [Dvurečenskij and Pulmannová, 1994] A. Dvurečenskij and S. Pulmannová, D-test spaces and difference poset, Reports on Mathematical Physics 34 (1994), 151–170. [Dvurečenskij and Pulmannová, 2000] A. Dvurečenskij and S. Pulmannová, New trends in quantum structures, Mathematics and Its Applications, vol. 516, Kluwer Academic Publishers, Dordrecht, 2000. [Einstein et al., 1935] A. Einstein, B. Podolsky, and N.
Rosen, Can quantum-mechanical description of reality be considered complete?, Physical Review 47 (1935), 777–780. [Engesser and Gabbay, 2002] K. Engesser and D.Gabbay, Quantum logic, Hilbert space, revision theory, Artificial Intelligence 136 (2002), 61–100. [Faggian, 1998] C. Faggian, Classical proofs via basic logic, Computer Science Logic 11th International Workshop, CSL’97 (M. Nielson and W. Thomas, eds.), Lecture Notes in Computer Science, vol. 1414, Springer Verlag, 1998, pp. 203–219.
The History of Quantum Logic
279
[Faggian and Sambin, 1997] C. Faggian and G. Sambin, From basic logic to quantum logics with cut-elimination,, International Journal of Theoretical Physics 12 (1997), 31–37. [Finch, 1970] P. D. Finch, Quantum logic as an implication algebra, Bulletin of the Australian Mathematical Society 2 (1970), 101–106. [Fitting, 1969] M. Fitting, Intuitionistic Logic, Model Theory and Forcing, North-Holland, Amsterdam, 1969. [Foulis, 1999] D. J. Foulis, A half-century of quantum logic, what have we learned?, Quantum Structures and the Nature of Reality (D. Aerts and J. Pykacz, eds.), vol. 7, Kluwer Academic Publishers, Dordrecht, 1999, pp. 1–36. [Foulis, 2000] D. J. Foulis, MV and Heyting effect algebras, Foundations of Physics 30 (2000), 1687–1706. [Foulis and Bennett, 1994] D. J. Foulis and M. K. Bennett, Effect algebras and unsharp quantum logics, Foundations of Physics 24 (1994), 1325–1346. [Foulis and Greechie, 2000] D. J. Foulis and R. J. Greechie, Specification of finite effect algebras, International Journal Theoretical Physics 39 (2000), 665–676. [Foulis and Munem, 1984] D. J. Foulis and M. Munem, Calculus with analytic geometry, Worth Publishing, London, 1984. [Foulis and Randall, 1981] D. J. Foulis and C. H. Randall, Empirical logic and tensor product, Interpretation and Foundations of Quantum Mechanics, Grundlagen der exakten Naturwissenschaften, vol. 5, Bibliographisches Institut, Mannheim, 1981, pp. 9–20. [Foulis and Randall, 1983] D. J. Foulis and C. H. Randall, Properties and operational propositions in quantum mechanics, Foundations of Physics 13 (1983), 843–857. [Foulis et al., 1996] D. J. Foulis, R. J. Greechie, M. L. Dalla Chiara, and R. Giuntini, Quantum Logic, Encyclopedia of Applied Physics (G. Trigg, ed.), vol. 15, VCH Publishers, 1996, pp. 229–255. [Garola, 1980] C. Garola, Propositions and orthocomplementation in quantum logic, International Journal of Theoretical Physics 19 (1980), 369–378. [Garola, 1985] C. 
Garola, Embedding of posets into lattices in quantum logic, International Journal of Theoretical Physics 24 (1985), 423–433. [Gerelle et al., 1974] E. R. Gerelle, R. J. Greechie, and F. R. Miller, Weights on spaces, Physical Reality and Mathematical Description (C.P. Enz and J. Mehra, eds.), Reidel, Dordrecht, 1974, pp. 169–192. [Gibbins, 1985] P. F. Gibbins, A user-friendly quantum logic, Logique-et-Analyse.-NouvelleSerie 28 (1985), 353–362. [Gibbins, 1987] P. F. Gibbins, Particles and paradoxes - the limits of quantum logic, Cambridge University Press, Cambridge, 1987. [Girard, 1987] J. Y. Girard, Linear logic, Theoretical Computer Science 50 (1987), 1–102. [Giuntini, 2002] R. Giuntini, Weakly linear QMV algebras, Algebra Universalis. [Giuntini, 1990] R. Giuntini, Brouwer-Zadeh logic and the operational approach to quantum mechanics, Foundations of Physics 20 (1990), 701–714. [Giuntini, 1991a] R. Giuntini, Quantum logic and hidden variables, Grundlagen der exakten Naturwissenschaften, no. 8, Bibliographisches Institut, Mannheim, 1991. [Giuntini, 1991] R. Giuntini, A semantical investigation on Brouwer-Zadeh logic, Journal of Philosophical Logic 20 (1991), 411–433. [Giuntini, 1992] R. Giuntini, Brouwer-Zadeh logic, decidability and bimodal systems, Studia Logica 51 (1992), 97–112. [Giuntini, 1993] R. Giuntini, Three-valued Brouwer-Zadeh logic, International Journal of Theoretical Physics 32 (1993), 1875–1887. [Giuntini, 1995a] R. Giuntini, Quasilinear QMV algebras, International Journal of Theoretical Physics 34 (1995), 1397–1407. [Giuntini, 1995b] R. Giuntini, Unsharp orthoalgebras and quantum MV algebras, The Foundations of Quantum Mechanics - Historical Analysis and Open Questions (C. Garola and A. Rossi, eds.), Kluwer, Dordrecht, 1995, pp. 325–337. [Giuntini, 1996] R. Giuntini, Quantum MV algebras, Studia Logica 56 (1996), 393–417. [Giuntini, 2000] R. Giuntini, An independent axiomatization of QMV algebras, The Foundations of Quantum Mechanics (C. 
Garola and A. Rossi, eds.), World Scientific, Singapore, 2000. [Giuntini and Greuling, 1989] R. Giuntini and H. Greuling, Toward an unsharp language for unsharp properties, Foundations of Physics 19 (1989), 931–945.
280
M. L. Dalla Chiara, R. Giuntini and M. R´edei
[Giuntini and Pulmannov´ a, ] R. Giuntini and S. Pulmannov´ a, Ideals and congruences in QMV algebras, Communications in Algebra 28 (2000), 1567–1592. [Gleason, 1957] A. M. Gleason, Measures on the closed subspaces of a Hilbert space, Journal of Mathematics and Mechanics 6 (1957), 885–893. [Goldblatt, 1974] R. Goldblatt, Semantics analysis of orthologic, Journal of Philosophical Logic 3 (1974), 19–35. [Goldblatt, 1984] R. Goldblatt, Orthomodularity is not elementary, The Journal of Symbolic Logic 49 (1984), 401–404. [Greechie, 1968] R. J. Greechie, On the structure of orthomodular lattices satisfying the chain condition, Journal of Combinatorial Theory 4 (1968), 210–218. [Greechie, 1969] R. J. Greechie, An orthomodular poset with a full set of states not embeddable in Hilbert space, Caribbean Journal of Mathematics and Science 1 (1969), 1–10. [Greechie, 1971] R. J. Greechie, Orthomodular lattices admitting no states, Journal of Combinatorial Theory 10 (1971), 119–131. [Greechie, 1974] R. J. Greechie, Some results from the combinatorial approach to quantum logic, Synthese 29 (1974), 113–127. [Greechie, 1975] R. J. Greechie, On three dimensional quantum proposition systems, Quantum Theory and the Structures of Time and Space (L. Castell, M. Drieschner, and C.F. von Weizs¨ acker, eds.), Carl Hanser Verlag, Munchen-Wien, 1975, pp. 71–84. [Greechie, 1978] R. J. Greechie, Another nonstandard quantum logic (and how I found it), Mathematical Foundations of Quantum Theory (A.R. Marlow, ed.), Academic Press,, London, 1978, pp. 71–85. [Greechie, 1981] R. J. Greechie, A non-standard quantum logic with a strong set of states, Current Issues in Quantum Logic (E. Beltrametti and B. van Fraassen, eds.), Ettore Majorana International Science Series, vol. 8, Plenum, New York, 1981, pp. 375–380. [Greechie and Foulis, 1995] R. J. Greechie and D. J. Foulis, The transition to effect algebras, International Journal of Theoretical Physics 34 (1995), 1369–1382. [Gudder, 1979] S. P. 
Gudder, A survey of axiomatic quantum mechanics, The Logico-Algebraic Approach to Quantum Mechanics (C. A. Hooker, ed.), vol. II, Reidel, Dordrecht, 1979, pp. 323–363. [Gudder, 1995] S. P. Gudder, Total extensions of effect algebras, Foundations of Physics Letters 8 (1995), 243–252. [Gudder, 1998] S. P. Gudder, Sharply dominating effect algebras, Tatra Mountains Mathematical Publications 15 (1998), 23–30. [Gudder and Greechie, 1996] S. P. Gudder and R. J. Greechie, Effect algebra counterexamples, Mathematica Slovaca 46 (1996), 317–325. [H´ ajek, 1998] P. H´ ajek, Metamathematics of fuzzy logic, Trends in Logic, vol. 4, Kluwer Academic Publishers, Dordrecht, 1998. [Halmos, 1951] P. R. Halmos, Introduction to Hilbert space and the theory of spectral multiplicity, Chelsea, New York, 1951. [Halmos, 1962] P. R. Halmos, Algebraic Logic, Chelsa Publishing Company, New York, 1962. [Halperin, 1961] I. Halperin, Review of J. von Neumann’s manuscript “Continuous geometry with transition probability” in [von Neumann, 1961b], pp. 191-194. [Hardegree, 1975] G. M. Hardegree, Stalnaker conditionals and quantum logic, Journal of Philsophical Logic 4 (1975), 399–421. [Hardegree, 1976] G. M. Hardegree, The conditional in quantum logic, Logic and Probability in Quantum Mechanics (P. Suppes, ed.), Reidel, Dordrecht, 1976, pp. 55–72. [Hardegree, 1981] G. M. Hardegree, An axiom system for orthomodular quantum logic, Studia Logica 40 (1981), 1–12. [Holland, 1995] S. S. Holland, Orthomodularity in infinite dimensions: a theorem of M. Sol` er, Bulletin of the American Mathematical Society 32 (1995), 205–232. [Hughes, 1985] R. I. G. Hughes, Semantic alternatives in partial Boolean quantum logic, Journal of Philosophical Logic 14 (1985), 411–446. [Hughes, 1987] R. I. G. Hughes, The structure and interpretation of quantum mechanics, Cambridge University Press, Cambridge, 1987. [Jammer, 1974] M. Jammer, The philosophy of quantum mechanics, Wiley-Interscience, New York, 1974. [Jauch, 1968] J. M. 
Jauch, Foundations of quantum mechanics, Addison-Wesley, London, 1968. [Kalmbach, 1983] G. Kalmbach, Orthomodular Lattices, Academic Press, New York, 1983.
The History of Quantum Logic
281
[Keller, 1980] H. A. Keller, Ein nichtklassischer hilbertscher Raum, Mathematische Zeitschrift 172 (1980), 41–49. [Kochen and Specker, 1965a] S. Kochen and E. P. Specker, The calculus of partial propositional functions, Proceedings of the 1964 International Congress for Logic, Methodology and Philosophy of Science (Y. Bar-Hillel, ed.), North-Holland, Amsterdam, 1965, pp. 45–57. [Kochen and Specker, 1965] S. Kochen and E. P. Specker, Logical structures arising in quantum theory, The Theory of Models (J. Addison, L. Henkin, and A. Tarski, eds.), North-Holland, Amsterdam, 1965, pp. 177–189. [Kochen and Specker, 1967] S. Kochen and E. P. Specker, The problem of hidden variables in quantum mechanics, Journal of Mathematics and Mechanics 17 (1967), 59–87. [Kˆ opka and Chovenec, 1994] F. Kˆ opka and F. Chovanec, D-posets, Mathematica Slovaca 44 (1994), 21–34. [Kraus, 1983] K. Kraus, States, effects and operations, Lecture Notes in Physics, vol. 190, Springer, Berlin, 1983. [Ludwig, 1983] G. Ludwig, Foundations of quantum mechanics, vol. 1, Springer, Berlin, 1983. [L ukasiewicz, 1936] J. L ukasiewicz, Logistic and philosophy, Selected Work (L. Borkowski, ed.), North-Holland, Asterdam, 1970, pp. 218–235. [L ukasiewicz, 1946] J. L ukasiewicz, On determinism, Selected Work (L. Borkowski, ed.), NorthHolland, Asterdam, 1970, pp. 110–128. [L ukasiewicz, 1970] J. L ukasiewicz, On three-valued logic, Selected Work (L. Borkowski, ed.), North-Holland, Amsterdam, 1970. [Mackey, 1957] G. Mackey, The Mathematical Foundations of Quantum Mechanics, Benjamin, New York, 1957. [Mangani, 1973] P. Mangani, Su certe algebre connesse con logiche a pi` u valori, Bollettino Unione Matematica Italiana 8 (1973), 68–78. [Minari, 1987] P. Minari, On the algebraic and kripkean logical consequence relation for orthomodular quantum logic, Reports on Mathematical Logic 21 (1987), 47–54. [Mittlestaedt, 1972] P. 
Mittelstaedt, On the interpretation of the lattice of subspaces of Hilbert space as a propositional calculus, Zeitschrift f¨ ur Naturforschung 27a (1972), 1358–1362. [Mittelstaedt, 1978] P. Mittelstaedt, Quantum logic, Reidel, Dordrecht, 1978. [Mittelstaedt, 1985] P. Mittelstaedt (ed.), Recent developments in quantum logic, Grundlagen der exakten Naturwissenschaften, no. 6, Bibliographisches Institut, Mannheim, 1985. [Mittelstaedt, 1986] P. Mittelstaedt, Sprache und Realit¨ at in der modernen Physik, Bibliographisches Institut, Mannheim, 1986. [Morash, 1973] R. P. Morash, Angle bisection and orthoautomorphisms in Hilbert lattices, Canadian Journal of Mathematics 25 (1973), 261–272. [Mundici, 1992] D. Mundici, The logic of Ulam’s game with lies, Knowledge, Belief and Strategic Interaction (C. Bicchieri and M. L. Dalla Chiara, eds.), Cambridge University Press, Cambridge, 1992. [Murray and von Neumann, 1936] F. J. Murray and J. von Neumann, On rings of operators, Annals of Mathematics 37 (1936), 6-119, in [von Neumann, 1961a]. [Navara, 1999] M. Navara, Two descriptions of state spaces of orthomodular structures, International Journal of Theoretical Physics 38 (1999), 3163–3178. [Neubrunn and Rieˇcan, 1997] T. Neubrunn and B. Rieˇcan, Integral, measure and ordering, Kluwer Academic Publishers, Dordrecht, 1997. [Nishimura, 1980] H. Nishimura, Sequential method in quantum logic, Journal of Symbolic Logic 45 (1980), 339–352. [Nishimura, 1994] H. Nishimura, Proof theory for minimal quantum logic I and II, International Journal of Theoretical Physics 33 (1994), 102–113, 1427–1443. [Paoli, 2002] F. Paoli, Substructural logics: A primer, Trends in Logic, vol. 13, Kluwer Academic Publishers, 2002. [Peres, 1995] A. Peres, Quantum theory: Concepts and methods, Kluwer Academic Publishers, Dordrecht, 1995. [Petz and R´edi, 1995] D. Petz and M. R´ edei, John von Neumann and the theory of operator algebras in The Neumann Compendium. 
World Scientific Series of 20th Century Mathematics Vol. I., F. Brody and T. V´ amos (eds.), World Scientific, Singapore, 1995, 163-181. [Petz and Zemanek, 1988] D. Petz and J. Zemanek, Characterizations of the trace, Linear Algebra and its Applications 111 (1988), 43-52. [Piron, 1976] C. Piron, Foundations of quantum physics, W. A. Benjamin, Reading, 1976.
282
M. L. Dalla Chiara, R. Giuntini and M. R´edei
[Pitowsky, 1989] I. Pitowsky, Quantum probability - quantum logic, Lectures Notes in Physics, no. 321, Springer, Berlin, 1989. [Pratt, 1993] V. Pratt, Linear logic for generalized quantum mechanics, Workshop on Physics and Computation (PhysComp’92) (Dallas), IEEE, 1993, pp. 166–180. [Pt´ ak and Pulmannov´ a, 1991] P. Pt´ ak and S. Pulmannov´ a, Orthomodular structures as quantum logics, Fundamental Theories of Physics, no. 44, Kluwer, Dordrecht, 1991. [Pulmannov´ a, 1995] S. Pulmannov´ a, Representation of D-posets, International Journal of Theorethical Physics 34 (1995), 1689–1696. [Putnam, 1969] H. Putnam, Is logic empirical?, Boston Studies in the Philosophy of Science (R. S. Cohen and M. W. Wartofsky, eds.), vol. 5, Reidel, Dordrecht, 1969, pp. 216–241. [Pykacz, 2000] J. Pykacz, L ukasiewicz operations in fuzzy set theories and many-valued representations of quantum logics, Foundations of Physics, 30 (2000), 1503–1524. [R´ edei, submitted] M. R´ edei: The birth of quantum logic, manuscript, submitted. [R´ edei, 2001] M. R´ edei, Von Neumann’s concept of quantum logic and quantum probability, in [R´ edei and St¨ oltzner, 2001]. [R´ edei and St¨ oltzner, 2001] M. R´ edei, M. St¨ oltzner, John von Neumann and the Foundations of Quantum Physics, M. St¨ oltzner, M. R´edei (eds.), Kluwer Academic Publishers, Dordrecht, Boston, London 2001. [R´ edei, 1999] M. R´ edei, “Unsolved problems in mathematics” J. von Neumann’s address to the International Congress of Mathematicians Amsterdam, September 2-9, 1954, The Mathematical Intelligencer 21 (1999), 7-12. [R´ edei, 1998] M. R´ edei, Quantum Logic in Algebraic Approach, Kluwer Academic Publishers, Dordrecht, Holland, 1998. [R´ edei, 1996] M. R´ edei, Why John von Neumann did not like the Hilbert space formalism of quantum mechanics (and what he liked instead), Studies in the History and Philosophy of Modern Physics 27 (1996), 493-510. [Redhead, 1987] M. 
Redhead, Incompleteness, nonlocality and realism - a prolegomenon to the philosophy of quantum mechanics, Clarendon Press, Oxford, 1987. [Reed and Simon, 1972] M. Reed and B. Simon, Methods of modern mathematical physics, vol. I, Academic Press, New York, 1972. [Rieˇ canova, 1999] Z. Rieˇ canova, Subalgebras, intervals, and central elements of generalized effect algebras, International Journal of Theoretical Physics 38 (1999), 3209–3220. [Rosenthal, 1990] K. I. Rosenthal, Quantales and their Applications, Longman, New York, 1990. [Sambin et al., 2000] G. Sambin, G. Battilotti, and C. Faggian, Basic logic: reflection, symmetry, visibility,, The Journal of Symbolic Logic 65 (2000), 979–1013. [Schroeck, 1996] F. E. Schroeck, Quantum Mechanics on Phase Space, Fundamental Theories of Physics, vol. 74, Kluwer Academic Publishers, Dordrecht, 1996. [Sol`er, 1995] M. P. Sol` er, Characterization of Hilbert spaces by orthomodular spaces, Communications in Algebra 23 (1995), 219–243. [Stalnaker, 1981] R. Stalnaker, A theory of conditionals, Ifs. Conditionals, Belief, Decision, Chance, and Time (W. Harper, G. Pearce, and R. Stalnaker, eds.), Reidel, Dordrecht, 1981, pp. 41–55. [Svozil, 1998] K. Svozil, Quantum logic, Springer, Singapore, 1998. [Takesaki, 1979] M. Takesaki, Theory of Operator Algebras, I., Springer Verlag, New York, 1979. [Takeuti, 1981] G. Takeuti, Quantum set theory, Current Issues in Quantum Logic (E. G. Beltrametti and B. C. van Fraassen, eds.), Ettore Majorana International Science Series, vol. 8, Plenum, New York, 1981, pp. 303–322. [Tamura, 1988] S. Tamura, A Gentzen formulation without the cut rule for ortholattices, Kobe Journal of Mathematics 5 (1988), 133–150. [van Fraassen, 1974] B. van Fraassen, The labyrinth of quantum logics, Logical and Epistemological Studies in Contemporary Physics (R. Cohen and M. Wartosky, eds.), Boston Studies in the Philosophy of Science, vol. 13, Reidel, Dordrecht, 1974, pp. 224–254. [van Fraassen, 1991] B. 
van Fraassen, Quantum Mechanics. an empiricist view, Clarendon Press, Oxford, 1991. [Varadarajan, 1985] V. S. Varadarajan, Geometry of quantum theory, 2 ed., Springer, Berlin, 1985. [von Mises, 1919] R. von Mises, Grundlagen der Wahrscheinlichkeitsrechnung, Mathematische Zeitschrift 5 (1919), 52-99.
The History of Quantum Logic
283
[Mises, 1928] R, von Mises, Probability, Statistics and Truth (second English edition of Wahrscheinlichkeit, Statistik und Wahrheit, Springer, 1928), Dover Publications, New York, 1981. [von Neumann, 1927] J. von Neumann, Mathematische Begr¨ undung der Quantenmechanik, G¨ ottinger Nachrichten (1927), 1-57, in [von Neumann, 1962], pp. 151-207. [von Neumann, 1927] J. von Neumann, Wahrscheinlichkeitstheoretischer Aufbau der Quantenmechanik, G¨ ottinger Nachrichten (1927), 245-272, in [von Neumann, 1962], pp. 208-235. [von Neumann, 1927] J. von Neumann, Thermodynamik quantenmechanischer Gesamtheiten, G¨ ottinger Nachrichten (1927), 245-272, in [von Neumann, 1962],pp. 236-254. [von Neumann, 1943] J. von Neumann, Mathematische Grundlagen der Quantenmechanik, Dover Publications, New York, 1943 (first American Edition; first edition: Springer Verlag, Heidelberg, 1932). [von Neumann, 1937] J. von Neumann, Quantum logics (strict- and probability logics), Unfinished manuscript, John von Neumann Archive, Libarary of Congress, Washington, D.C. reviewed by A. H. Taub, in [von Neumann, 1961b] pp. 195-197. [von Neumann, 1045] J. von Neumann, Letter to Dr. Silsbee, July 2, 1945, in [R´ edei and St¨ oltzner, 2001], pp. 225-226. [von Neumann, 2001] J. von Neumann, Unsolved problems in mathematics, in [R´ edei and St¨ oltzner, 2001]. [von Neumann, 1954] J. von Neumann, Unsolved problems in mathematics. Address to the World Congress of Mathematics, Amsterdam, September 2-9, 1954, in [R´ edei and St¨ oltzner, 2001], pp. 231-245. [von Neumann, forthcoming] John von Neumann, Selected Letters, ed. by M. R´edei (forthcoming). [von Neumann, 1962] J. von Neumann, Collected Works Vol. I. Logic, Theory of Sets and Quantum Mechanics , A.H. Taub (ed.), Pergamon Press, 1962. [von Neumann, 1961a] J. von Neumann, Collected Works Vol. III. Rings of Operators , A.H. Taub (ed.), Pergamon Press, 1961. [von Neumann, 1961b] J. von Neumann, Collected Works Vol. IV. 
Continuous Geometry and Other Topics, A.H. Taub (ed.), Pergamon Press, 1961. [von Neumann, 1981] J. von Neumann,Continuous Geometries with Transition Probability Memoirs of the American Mathematical Society 34 No. 252 (1981) 1-210. [Wright, 1990] R. Wright, Generalized urn models, Foundations of Physics 20 (1990), 881–903. [Zadeh, 1965] L. Zadeh, Fuzzy sets and, Information and Control 8 (1965), 338–353. [Zierler, 1961] N. Zierler, Axioms for non-relativistic quantum mechanics, Pacific Journal of Mathematics 11 (1961), 1151–1169.
LOGICS OF VAGUENESS

Dominic Hyde

Handbook of the History of Logic. Volume 8 Dov M. Gabbay and John Woods (Editors) © 2007 Elsevier B.V. All rights reserved.

Lack of sharp boundaries is prevalent in our use of natural language. Consider your favourite animal species. We might easily imagine that, over time, the species becomes rare. Further pressures then push the species into the category of the vulnerable, with the trend continuing until the species is endangered, and finally extinct. At what moment in time did it become rare? When exactly did it become vulnerable? When did it qualify as endangered and when, finally, did it become extinct? Similarly we may ask at what instant the autumn leaves turned brown, or at what instant that person became rich, famous, bald, tall or an adult. These predicates (‘is vulnerable’, ‘is rare’, ‘is brown’, ‘is rich’, etc.) are all examples of predicates whose limits of application are essentially indefinite or indeterminate, and they are typical examples of what are termed vague predicates.

Take the predicate ‘is tall’ for instance. We might line up a crowd of people starting with the shortest and progressing smoothly to the tallest. The crowd is not clearly partitioned into two mutually exclusive and exhaustive sets of those to whom the predicate applies and those to whom it fails to apply. There is, for example, no identifiably shortest tall person, nor can we point to the tallest short person. The transition from one set to the other is not precise and one might ask rhetorically, as the third century philosopher Diogenes Laërtius [1925: vii. 82] is reputed to have done, “Where do you draw the line?”.

The most common instances of vague predicates are those for which the applicability of the predicate just seems to fade off, as in the above examples, and consequently no sharp boundary can be drawn separating the predicate’s positive extension from its negative extension. The behaviour of vague predicates is thus contrasted with such precise predicates as ‘is greater than two’ defined on the natural numbers. We can clearly partition the domain of natural numbers, N, into two sharp sets: P⁻ = {0, 1, 2} and P⁺ = {3, 4, 5, ...}, the set P⁻ comprising those natural numbers determinately failing to satisfy the predicate and the set P⁺ comprising those natural numbers that determinately satisfy it.

Vagueness in this sense should be distinguished from another sense in which language is often said to be vague: vague in the sense of inexact, unspecific or general. Take for example the claim that there are between two hundred and one thousand species of Eucalyptus trees. It might be responded that this claim is “vague” and one could be a lot more “precise”. However, vagueness in this sense is quite different from vagueness as described above. Being between two hundred and one thousand is an inexact description of the number of species in a genus but it is not vague in the sense of there being indeterminate limits to its application: it will be true if the number of species lies between these two numbers and false otherwise. Of course, I can make a much more exact estimation of their number which nonetheless is more vague, e.g. approximately one thousand five hundred. Is my more exact estimation correct if the number is one thousand five hundred and eighty two? There may simply be no clear or determinate answer. Increasing exactness is consistent with a decrease in precision whilst a decrease in exactness is consistent with an increase in precision.

The symptom of vagueness alluded to above, our inability to draw a sharp line between those things in the predicate’s positive extension and those in its negative extension, is tantamount to there being borderline (or penumbral) cases for the predicate in question, cases which jointly constitute the borderline region (or penumbra) for the vague predicate. Intuitively, such cases involve objects to which the predicate meaningfully applies (i.e. objects in the predicate’s domain of significance) yet for which it is essentially indeterminate whether the predicate or its negation truthfully applies. That is to say, there are situations where a language user, having carried out all the empirical and conceptual research possible concerning the case to hand, will nonetheless be unable determinately to apply either the predicate or its negation to an object to which the predicate meaningfully applies. This indeterminacy or indefiniteness, taken as a defining characteristic of vagueness, is not due to a lack of knowledge of facts or of meanings that one could in principle come to know.

The inability to draw boundaries to the application of a vague predicate also gives rise to the most troublesome hallmark of vagueness: its susceptibility to paradox, specifically the sorites paradox.
1 VAGUENESS AND THE SORITES PUZZLE

The sorites paradox describes a class of paradoxical arguments, sometimes called “little-by-little arguments”. These arguments arise as a result of the indeterminacy surrounding limits of application of the predicates involved. For example, the concept of a heap appears to lack sharp boundaries and, as a consequence of the resulting indeterminacy surrounding the limits of applicability of the predicate ‘is a heap’, no one grain of wheat can be identified as making the difference between being a heap and not being a heap. Given then that one grain of wheat does not make a heap, it would seem to follow that two do not, thus three do not, and so on. In the end it would appear that no amount of wheat can make a heap. We are faced with paradox since from apparently true premises by seemingly uncontroversial reasoning we arrive at an apparently false conclusion. The phenomenon at the heart of the paradox is the phenomenon of vagueness.
1.1 The Origins of the Puzzle
The name ‘sorites’ derives from the Greek word soros (meaning ‘heap’) and originally referred, not to a paradox, but rather to a puzzle: “Would you describe a single grain of wheat as a heap? No. Would you describe two grains of wheat as a heap? No. ... You must admit the presence of a heap sooner or later, so where do you draw the line?” The puzzle was known as The Heap. It was one of a series of puzzles attributed to the Megarian logician Eubulides of Miletus. Also included were the Liar: “A man says that he is lying. Is what he says true or false?”; the Hooded Man: “You say that you know your brother. Yet that man who just came in with his head covered is your brother and you did not know him”; and the Bald Man: “Would you describe a man with one hair on his head as bald? Yes. Would you describe a man with two hairs on his head as bald? Yes. ... You must refrain from describing a man with ten thousand hairs on his head as bald, so where do you draw the line?” This last puzzle was originally known as the falakros puzzle and was seen to have the same form as the Heap. All such puzzles became collectively known as sorites puzzles.

It is not known whether Eubulides actually invented the sorites puzzles. Some scholars have attempted to trace their origins back to Zeno of Elea but the evidence seems to point to Eubulides as the first to employ the sorites. Nor is it known just what motives Eubulides may have had for presenting these sorites puzzles. They were, however, employed by later Greek philosophers to attack various positions, most notably by the Sceptics against the Stoics’ claims to knowledge. No evidence has yet surfaced of any later interest in the sorites in the extensive philosophical work of the great Arabic scholars of the tenth, eleventh and twelfth centuries (e.g. Alfarabi, Avicenna or Averroes) nor in Eastern philosophical work.

These puzzles of Greek antiquity are now more usually described as paradoxes.
Though the sorites conundrum can be presented informally as a series of questions whose puzzling nature gives it dialectical force, it can be, and was, presented as a formal argument having logical structure. The following argument form of the sorites was common:

1 grain of wheat does not make a heap.
If 1 grain of wheat does not make a heap then 2 grains of wheat do not.
If 2 grains of wheat do not make a heap then 3 grains do not.
...
If 9,999 grains of wheat do not make a heap then 10,000 do not.
∴ 10,000 grains of wheat do not make a heap.
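The argument form above can be traced mechanically. The following Python sketch is illustrative only: it assumes the conditionals are read materially, as ‘Not(A and not-B)’, and the function names are hypothetical. It checks by truth table that modus ponens is valid for that reading, then chains the 9,999 conditional premises from the single categorical premise.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material reading of 'if A then B' as 'not (A and not B)' (an assumption of this sketch)."""
    return not (a and not b)


def modus_ponens_is_valid() -> bool:
    """Truth-table check: in every row where A and 'if A then B' are true, B is true."""
    return all(b for a, b in product([True, False], repeat=2) if a and implies(a, b))


def run_conditional_sorites(last: int) -> tuple[bool, int]:
    """Chain the premises from 1 grain up to `last` grains.

    Returns whether 'last grains do not make a heap' was derived, and how many
    modus ponens steps were needed to derive it.
    """
    not_a_heap = {1}  # categorical premise: 1 grain of wheat does not make a heap
    steps = 0
    for n in range(1, last):
        # Conditional premise, accepted as true:
        # if n grains do not make a heap then n + 1 grains do not.
        if n in not_a_heap:        # the antecedent has already been established
            not_a_heap.add(n + 1)  # modus ponens detaches the consequent
            steps += 1
    return last in not_a_heap, steps


print(modus_ponens_is_valid())          # True
print(run_conditional_sorites(10_000))  # (True, 9999)
```

Each pass through the loop is a single modus ponens inference, and carrying the result forward to the next pass plays the role of chaining the sub-arguments together.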
The argument certainly seems to be valid, employing only modus ponens and cut (enabling the chaining together of each sub-argument involving a single modus ponens inference). These rules of inference are endorsed by both Stoic logic and modern classical logic, amongst others.
Moreover, its premises appear true. Some Stoic presentations of the argument and the form presented by Diogenes Laërtius recast it in a form which replaced all the conditionals, ‘If A then B’, with ‘Not(A and not-B)’ to stress that the conditional should not be thought of as being a strong one, but rather the weak Philonian conditional (the modern material conditional) according to which ‘If A then B’ was equivalent to ‘Not(A and not-B)’. Such emphasis was deemed necessary since there was a great deal of debate in Stoic logic regarding the correct analysis for the conditional. In thus judging that a connective as weak as the Philonian conditional underpinned this form of the paradox, they were forestalling resolutions of the paradox that denied the truth of the conditionals based on a strong reading of them. This interpretation then presents the argument in its strongest form since the validity of modus ponens seems assured for this conditional whilst the premises are construed so weakly as to be difficult to deny. The difference of one grain would seem to be too small to make any difference to the application of the predicate; it is a difference so negligible as to make no apparent difference to the truth-values of the respective antecedents and consequents. Yet the conclusion seems false. Thus paradox confronted the Stoics just as it does the modern classical logician.

Nor are such paradoxes isolated conundrums. Innumerable sorites paradoxes can be expressed in this way. For example, one can present the puzzle of the Bald Man in this manner. Since a man with one hair on his head is bald, and if a man with one is then a man with two is, a man with two hairs on his head is bald. Again, if a man with two is then a man with three is, so a man with three hairs on his head is bald, and so on. So a man with ten thousand hairs on his head is bald, yet we rightly feel that such men are hirsute, i.e. not bald.
Indeed, it seems that almost any vague predicate admits of such a sorites paradox, and vague predicates are ubiquitous.

As presented, the paradoxes of the Heap and the Bald Man proceed by addition (of grains of wheat and hairs on the head respectively). Alternatively, though, one might proceed in reverse, by subtraction. If one is prepared to admit that ten thousand grains of sand make a heap then one can argue that one grain of sand does, since the removal of any one grain of sand cannot make the difference. Similarly, if one is prepared to admit a man with ten thousand hairs on his head is not bald, then one can argue that even with one hair on his head he is not bald, since the removal of any one hair from the originally hirsute scalp cannot make the relevant difference. It was thus recognised, even in antiquity, that sorites arguments come in pairs, using: ‘non-heap’ and ‘heap’; ‘bald’ and ‘hirsute’; ‘poor’ and ‘rich’; ‘few’ and ‘many’; ‘small’ and ‘large’; and so on. For every argument which proceeds by addition there is another reverse argument which proceeds by subtraction.

Curiously, the paradox seemed to attract little subsequent interest until the late nineteenth century when formal logic once again assumed a central role in philosophy. Since the demise of ideal language doctrines in the latter half of the twentieth century, interest in the vagaries of natural language, and the sorites paradox in particular, has greatly increased. (See Williamson [1994, ch. 1] for more details on the early history of the sorites.)
1.2 Its Paradoxical Forms
A common form of the sorites paradox presented for discussion in the literature is the form discussed above. Let F represent the soritical predicate (e.g. ‘is bald’, or ‘does not make a heap’) and let the expression ‘aₙ’ (where n is a natural number) represent a subject expression in the series with regard to which F is soritical (e.g. ‘a man with n hair(s) on his head’ or ‘n grain(s) of wheat’). Then the sorites proceeds by way of a series of conditionals and can be schematically represented as follows:

Conditional Sorites

F a₁
If F a₁ then F a₂
If F a₂ then F a₃
...
If F aᵢ₋₁ then F aᵢ
∴ F aᵢ (where i can be arbitrarily large)
Whether the argument is taken to proceed by addition or subtraction will depend on how one views the series. Barnes [1982] states conditions under which any argument of this form is soritical. Initially, the series $a_1, \ldots, a_i$ must be ordered; for example, scalps ordered according to number of hairs, heaps ordered according to number of grains of wheat, and so on. Secondly, the predicate F must satisfy the following three constraints: (i) it must appear true of $a_1$, the first item in the series; (ii) it must appear false of $a_i$, the last item in the series; and (iii) each adjacent pair in the series, $a_n$ and $a_{n+1}$, must be sufficiently similar as to appear indiscriminable in respect of F; that is, both $a_n$ and $a_{n+1}$ appear to satisfy F or neither does. Under these conditions F will be soritical relative to the series $a_1, \ldots, a_i$ and any argument of the above form using F and $a_1, \ldots, a_i$ will be soritical.

In recent times the explanation of the fact that sorites arguments come in pairs has shifted from consideration of the sorites series itself and whether it proceeds by addition or subtraction to the predicate involved. It is now common to focus on the presence or absence of negation in the predicate, noting the existence of both a positive form which bloats the predicate’s extension and a negative form which shrinks the predicate’s extension. With the foregoing analysis of the conditions for sorites susceptibility it is easy to verify that F will be soritical relative to $a_1, \ldots, a_i$ if and only if not-F is soritical relative to $a_i, \ldots, a_1$, thus verifying that for every positive sorites there is an analogous negative variant.
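The conditional form lends itself to a small mechanical illustration. The following is only a sketch: the series length, the sharp cut-off and the names `run_conditional_sorites`, `classical_conditionals` and `is_bald` are hypothetical choices made for the example, not anything drawn from the literature. It chains modus ponens along the series, and then shows that if F is treated as a sharp bivalent predicate, exactly one conditional premise comes out false.

```python
def run_conditional_sorites(first_premise, conditionals):
    """Chain modus ponens along the series.

    first_premise -- truth value granted to F(a_1)
    conditionals  -- conditionals[k] is the truth value granted to
                     'if F(a_{k+1}) then F(a_{k+2})'
    Returns, for each of a_1..a_i, whether F(a_n) has been derived
    from premises that were all granted as true.
    """
    derived = [first_premise]
    for granted in conditionals:
        # Modus ponens: from F(a_n) and 'F(a_n) -> F(a_{n+1})' infer F(a_{n+1}).
        derived.append(derived[-1] and granted)
    return derived


def classical_conditionals(predicate, length):
    """Truth values of 'F(a_n) -> F(a_{n+1})' for a sharp, bivalent F."""
    return [(not predicate(n)) or predicate(n + 1) for n in range(1, length)]


LENGTH = 10   # hypothetical series a_1..a_10
CUTOFF = 3    # hypothetical sharp boundary, purely for illustration


def is_bald(n):
    return n <= CUTOFF


# Granting every premise licenses the conclusion F(a_10).
all_granted = run_conditional_sorites(True, [True] * (LENGTH - 1))

# With a sharp bivalent predicate, exactly one conditional is false.
classical = classical_conditionals(is_bald, LENGTH)
false_premises = [n for n, ok in enumerate(classical, start=1) if not ok]
print(all_granted[-1], false_premises)
```

The sketch takes no stand on whether any such cut-off exists; it only displays the classical trade-off between accepting every premise and rejecting the conclusion.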
Dominic Hyde
The key feature of soritical predicates which drives the paradox, constraint (iii), is described in Wright [1975] as “tolerance” and is thought to arise as a result of the vagueness of the predicate involved. Predicates such as ‘is a heap’ or ‘is bald’ appear tolerant of sufficiently small changes in the relevant respects, namely number of grains or number of hairs. The degree of change between adjacent members of the series relative to which F is soritical would seem too small to make any difference to the application of the predicate F. Yet large changes in relevant respects will make a difference, even though large changes are the accumulation of small ones which do not seem to make a difference. This is the very heart of the conundrum which has delighted and perplexed so many for so long.

Any resolution of the paradoxes is further complicated by the fact that they can be presented in a variety of forms and the problem they present can only be considered solved when all forms have been defused. One variant replaces the set of conditional premises with a universally quantified premise. Let ‘n’ be a variable ranging over the natural numbers and let ‘∀n(...n...)’ assert that every number n satisfies the condition . . . n . . . . Further, let us represent the claim of the form ‘$\forall n$(if $F a_n$ then $F a_{n+1}$)’ as ‘$\forall n(F a_n \to F a_{n+1})$’. Then the sorites is now seen as proceeding by the inference pattern known as mathematical induction:

Mathematical Induction Sorites

$$
\begin{array}{l}
F a_1 \\
\forall n(F a_n \to F a_{n+1}) \\
\hline
\therefore\ \forall n\, F a_n
\end{array}
$$
So, for example, it is argued that since a man with 1 hair on his head is bald and since the addition of one hair cannot make the difference between being bald and not bald (for any number n, if a man with n hairs is bald then so is a man with n + 1 hairs), then no matter what number n you choose, a man with n hairs on his head is bald.

Yet another form is a variant of this inductive form. Assume that it is not the case that, for every n, a man with n hairs on his head is bald, i.e. that for some number n, it is not the case that a man with n hairs on his head is bald. Then by the least number principle (equivalent to the principle of mathematical induction) there must be a least such number, say i + 1, such that it is not the case that a man with i + 1 hairs on his head is bald. Since a man with 1 hair on his head is bald it follows that i + 1 must be greater than 1. So, there must be some number n (= i) such that a man with n hairs counts as bald whilst a man with n + 1 does not. Thus it is argued that though $a_1$ is bald, not every number n is such that $a_n$ is bald, so there must be some point at which baldness ceases. Let ‘∃n(. . . n . . .)’ assert that some number n satisfies the condition . . . n . . . . Then we can represent the chain of reasoning just described as follows:
Line-drawing Sorites

$$
\begin{array}{l}
F a_1 \\
\neg\forall n\, F a_n \\
\hline
\therefore\ \exists n \geq 1\,(F a_n \,\&\, \neg F a_{n+1})
\end{array}
$$
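The least number reasoning behind the line-drawing form can be sketched as a simple search. This is an illustration under stated assumptions: the predicate is treated as classical and bivalent, and the function name `line_drawing_witness`, the search bound and the cut-off in `is_bald` are hypothetical.

```python
def line_drawing_witness(predicate, upper):
    """Return n such that predicate(n) holds and predicate(n + 1) fails.

    Mirrors the least number principle: find the least m = i + 1 at which
    the predicate fails; since it holds of 1, m > 1, and n = i = m - 1 is
    the point at which the line is drawn. Returns None if no failure is
    found among 1..upper.
    """
    if not predicate(1):
        raise ValueError("the first premise F(a_1) must hold")
    for m in range(2, upper + 1):
        if not predicate(m):
            # m is the least failure, so the predicate holds of everything below it.
            return m - 1
    return None


def is_bald(n):
    # Hypothetical sharp cut-off, for illustration only.
    return n < 5000


witness = line_drawing_witness(is_bald, 10_000)
print(witness, is_bald(witness), is_bald(witness + 1))
```

Given the two premises, classical reasoning guarantees that the search succeeds somewhere; that guaranteed witness is exactly what strikes many as paradoxical.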
Now obviously, given that sorites arguments have been presented in these three forms, “the sorites paradox” will not be solved by merely claiming, say, mathematical induction to be invalid for soritical predicates. All forms need to be addressed one way or another. (See [Priest, 1991] for yet another interesting form the paradox might take, a form which makes explicit the paradox’s dependence on condition (iii) mentioned above and presents the argument as proceeding by substitutivity of identicals.) One would hope to solve the paradox, if at all, by revealing some general underlying fault common to all forms of the paradox. No such general solution could depend on the diagnosis of a fault peculiar to any one form. On the other hand, were no general solution available then “the sorites paradox” will only be adequately addressed when each of its forms has separately been adequately dealt with. This piecemeal approach holds little attraction though. It is less economical than a unified approach, arguably less elegant, and would fail to come to grips with the underlying unifying phenomenon which is considered to give rise to the paradoxes, namely vagueness. A logic of vagueness, be it classical or otherwise, ought to be able to defuse all those paradoxes that have their source in this phenomenon.

2 STOICISM AND THE EPISTEMIC THEORY
The sorites paradox in antiquity did not remain an isolated curio or pedantic conundrum; it had an edge which the Sceptics hoped to use against the Stoic theory of knowledge in particular, by showing that the Stoics’ conception of knowledge, in being soritical, was incoherent. The Stoics’ response, exemplified by Chrysippus, amounted to the claim that some conditional premise of the conditional sorites argument was false and thus the Sceptics’ argument was considered unsound. ‘Knowledge’, though vague and soritical relative to an appropriately chosen series, is semantically determinate so there is a cut-off point to its application. In the imperceptible slide from cognitive impressions to non-cognitive impressions there comes a point where two seemingly indistinguishable impressions are such that one serves to ground claims to knowledge whilst the other does not — even though they are, as just remarked, apparently indistinguishable. The inclination to validate all the premises of the argument (along with the inference pattern employed) was to be explained via the unknowable nature of the semantic boundary. The Sceptics were, in effect, taken to confuse our inability to know the boundaries of knowledge with the absence of a boundary. Though
everyone agreed that no boundary could be known, according to the Stoic defence this was as deep as the problem went. The conundrum was an epistemological one. Thus the Stoics rejected the threat of wholesale epistemological scepticism (there could be no coherent claims to knowledge) in favour of the limited scepticism arising from our inability to know the precise boundaries to knowledge. “Nothing can be known” was rejected in favour of “the precise boundaries to knowledge itself cannot be known” — wholesale ignorance was replaced by ignorance of precise boundaries. This quite specific response to the soritical, and hence paradoxical, nature of ‘knowledge’ generalizes, of course. One might respond to the paradoxicality of any soritical predicate by denying one of the premises of the conditional sorites argument involving that predicate, and the conditional premises are the natural target. In answer to the apparent incoherence of soritical terms per se due to their apparently unbounded application, it is claimed that there are bounds, precise bounds, but that they are unknowable. In thus requiring that there be a determinate fact of the matter as to whether the predicate applies to any given case in its range of significance such an account is committed to the view that such facts may transcend our ability to know whether or not they obtain. This is a strong expression of semantic realism and, as such, is vulnerable to the usual scepticism. Many think the response runs counter to our intuitions on the matter. We feel that one hair cannot make the difference between being described as bald and being described as hirsute; that two colour-patches indiscriminable in colour cannot be described respectively as red and orange; and in so doing we are echoing the more time-worn view of Galen [1987, 223]. 
If you do not say with respect to any of the numbers, as in the case of 100 grains of wheat for example, that it now constitutes a heap, but afterwards when a grain is added to it, you say that a heap has now been formed, consequently this quantity of corn becomes a heap by the addition of the single grain of wheat, and if the grain is taken away the heap is eliminated. And I know of nothing worse and more absurd than that the being and non-being of a heap is determined by a grain of corn.

Absurd as it seems, the existence of a precise cut-off point follows from the epistemic theory which uses precisely this feature to evade the sorites; classical logic is not threatened since precise cut-off points exist and we may therefore claim the major premise of the sorites as false. (Sorensen [2001] argues at length that this sense of absurdity attaching to the epistemic theory is nonetheless compatible with its truth.) Looking more closely to the nature of the epistemic gap, characteristic of soritical expressions on the epistemic account, the first quick point to be clear on is that Galen, in expressing reservations concerning an epistemic analysis, would
agree that one cannot know the sharp boundaries of vague terms but claim that this is because there are none to be known. The reservation is not that one can know the semantic boundaries to soritical terms; it concerns rather the claim that, though one cannot know them, they exist nonetheless. That is, the concern centres on the commitment to an epistemic blindspot. Secondly, the gap is one that is unbridgeable in principle. It’s not that one simply doesn’t currently know where particular semantic boundaries for soritical expressions lie. Epistemic vagueness is a matter of necessary ignorance. In the case of vague predicates, there is still a determinate answer as to whether or not the predicate applies; it is just that it is impossible to know the answer.

The claim that vagueness amounts to nothing more than an epistemic gap is generally met with incredulity. What could possibly be the cause for such a gap, such a blindspot? In the twentieth century resurgence of interest in the ancient conundrum, an epistemic approach was commonly ruled out by definition, as a cursory study of encyclopedia and dictionary entries will reveal. Vagueness was typically characterised as a semantic phenomenon whereby the apparent semantic indeterminacy surrounding a soritical term’s extension was considered real. In the absence of any apparent barrier to knowledge of a soritical predicate’s precise extension it was generally assumed that there was simply no precise extension to be known.

Over the last decade the philosophical landscape has changed. Williamson [1994] and Sorensen [2001] offer an impressive array of arguments defending an epistemological account of vagueness which, if successful, would make possible an epistemological solution to the sorites. Williamson [1994, ch. 8] offers us “one line of thought [that] may rescue the epistemic theory” in the face of incredulity as to the existence of an epistemic blindspot.
He claims that a margin-of-epistemic-error principle precludes knowledge of the boundary of a predicate’s application over a sorites series. In this way he hopes to undermine our incredulity by providing an explanation as to why we cannot know where the boundary is. That is, if you want to claim that there is a sharp boundary then the margin-of-error principle will help to explain its unknowability. The specific epistemological problem, as Williamson sees it, is briefly this: we cannot know the semantic determinations on the sharp boundary of a soritical term’s application since this violates an error margin principle that is required for knowledge.

So what is this error principle? The general claim is that in order to know that A one must at least be reliably right about it; knowing A entails our being reliably right in supposing A to be the case. Being reliably right in supposing A to be the case in turn entails A’s being the case in sufficiently similar circumstances. (Of course, as Williamson points out, the dimensions of similarity depend on A.) Thus the constraint that knowledge be reliable results in the following (vague but non-trivial) general principle:

Margin for Error Principle
If ‘It is known that A’ is true then ‘A’ is true in all sufficiently similar cases.
In other words, if a proposition is true whilst there are sufficiently similar cases in which it is false, it is not available to be known. The above general principle has as a particular consequence that for F soritical relative to $a_1, \ldots, a_i$: if $a_n$ is F whilst $a_{n+1}$ is not-F then one cannot know that $a_n$ is F. In order to know that a predicate applies to a particular case one must at least be reliably right about it; knowing that $a_n$ is F entails our being reliably right in supposing $a_n$ to be F. Being reliably right in supposing $a_n$ to be F in turn entails things sufficiently close to $a_n$ being F (the dimensions of closeness depending on F). Now each adjacent pair in the series, $a_n$ and $a_{n+1}$, must appear indiscriminable in respect of F (condition (iii) above for the soriticality of ‘F’) and, in so far as they are indiscriminable, they are taken to be sufficiently close. So, the constraint of reliability in effect says that one can know $a_n$ to be F only if adjacent members of the series with regard to which F is supposed soritical, namely $a_{n+1}$ and $a_{n-1}$, are also F. Thus the reliability constraint on knowledge results in the following specific principle governing what one can say about an F-soritical series $a_1, \ldots, a_i$:

If $a_n$ is known to be F then $a_{n+1}$ is F.

Applying this principle at F’s (supposedly) sharp semantic boundary then explains the unknowability of the boundary. If the boundary divides $a_n$ and $a_{n+1}$ (that is, $a_n$ is F whilst $a_{n+1}$ is not-F) then, since $a_n$ is truly F whilst the sufficiently similar case $a_{n+1}$ is not, the requirement on knowledge that there be a margin for error precludes knowledge of the fact that $a_n$ is F. Similarly, we can show that the margin-for-error requirement precludes knowledge of the fact that $a_{n+1}$ is not-F. So, were there a sharp boundary to the application of a soritical term within a series relative to which the term in question is soritical, it would necessarily be unknowable.
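The way the margin-for-error requirement blocks knowledge at a sharp boundary can be illustrated with a small sketch. It rests on assumptions made only for the example: similarity is modelled as being at most one place apart in the series, and the series length, the boundary position and the function name `available_to_be_known` are hypothetical.

```python
def available_to_be_known(n, holds, length, margin=1):
    """Margin-for-error test for position n of a series a_1..a_length.

    The claim that `holds` applies to a_n is available to be known only
    if `holds` applies to every a_k within `margin` places of a_n.
    """
    low, high = max(1, n - margin), min(length, n + margin)
    return all(holds(k) for k in range(low, high + 1))


LENGTH = 10
BOUNDARY = 5   # hypothetical: a_1..a_5 are F, a_6..a_10 are not-F


def is_F(n):
    return n <= BOUNDARY


def is_not_F(n):
    return not is_F(n)


# Positions where neither 'a_n is F' nor 'a_n is not-F' can be known.
unknowable = [
    n for n in range(1, LENGTH + 1)
    if not available_to_be_known(n, is_F, LENGTH)
    and not available_to_be_known(n, is_not_F, LENGTH)
]
print(unknowable)
```

On these assumptions the only positions excluded from knowledge are the two that flank the boundary, which is the pattern the argument above describes.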
The burden of proof to explain why a sharp boundary, if presumed to exist, would be unknowable appears lifted. Though notice that the postulated boundary has not yet been argued for; it is simply that an obvious argument against it (following from the absence of any barrier to knowledge) has now been defused. More positive argument proceeds by way of an appeal to the ability to retain classical logic in the presence of vagueness (or, more exactly, the ability to retain whatever logic one took as appropriate prior to encountering the phenomenon of vagueness). The sorites paradox does not threaten classical logic. Given this epistemic analysis of vagueness, vagueness does not threaten classical logic but is, instead, modelled by an extension of the logic. What Williamson [1994] dubs “the logic of clarity”, C — what we might call “the logic of determinacy” given our earlier characterization of vagueness as indeterminacy — is a logic whose vocabulary extends that of the underlying logic, presumed classical, by the addition of a sentence functor ‘(it is) determinately (the case that)’, D. For any sentence A, DA will count as true if A is true in all sufficiently similar situations, where similarity is represented as a measure on situations or worlds.
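The truth condition just given for D can be tried out on a toy model. The sketch below is a minimal illustration and not Williamson's own formalism: situations are assumed to be points on a line, two situations count as sufficiently similar when they are at most one step apart (a reflexive relation), and all names (`WORLDS`, `similar`, `D`, the extension `A`) are hypothetical.

```python
from itertools import combinations

WORLDS = frozenset(range(5))   # hypothetical situations laid out on a line


def similar(w, v, margin=1):
    """Reflexive similarity: situations at most `margin` apart."""
    return abs(w - v) <= margin


def D(prop):
    """Situations at which `prop` holds in all sufficiently similar situations."""
    return frozenset(w for w in WORLDS
                     if all(v in prop for v in WORLDS if similar(w, v)))


def neg(prop):
    return WORLDS - prop


def implies(p, q):
    return neg(p) | q


def valid(prop):
    """True when `prop` holds at every situation of this one model."""
    return prop == WORLDS


A = frozenset({0, 1, 2})   # hypothetical extension of a vague claim

all_props = [frozenset(c) for r in range(len(WORLDS) + 1)
             for c in combinations(sorted(WORLDS), r)]

# DA -> A holds throughout the model because similarity is reflexive.
t_holds = all(valid(implies(D(p), p)) for p in all_props)
# D distributes across the conditional.
k_holds = all(valid(implies(D(implies(p, q)), implies(D(p), D(q))))
              for p in all_props for q in all_props)
# 'A or not-A' is determinately the case, yet some situations are borderline.
excluded_middle_determinate = valid(D(A | neg(A)))
borderline = WORLDS - (D(A) | D(neg(A)))
print(t_holds, k_holds, excluded_middle_determinate, sorted(borderline))
```

A single finite model can only exhibit these features; it does not establish anything about the logic as a whole.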
A modal logic results, with □ replaced by D. The modal logic KT is singled out as appropriate. (See Williamson [1994, Appendix] for details.) Thus:

$$
\begin{array}{l}
\vdash_C DA \to A \\
\vdash_C D(A \to B) \to (DA \to DB) \\
\text{If } \vdash_C A \text{ then } \vdash_C DA
\end{array}
$$

So despite $\vdash_C A \lor \neg A$, and so $\vdash_C D(A \lor \neg A)$, nonetheless $\nvdash_C DA \lor D\neg A$, as evidenced by borderline predications where it might be neither determinately the case that A nor determinately not the case that A. Despite distributing across ‘→’, D does not distribute across ‘∨’.

3 FREGE, RUSSELL AND THE IDEAL LANGUAGE

Contra the epistemic theorist, vagueness is widely considered an essential semantic feature of specific terms in natural language. This semantic view of vagueness dominated through much of twentieth century philosophy (so much so that the epistemic view is often precluded by definition, as noted earlier). Russell [1923, 85–6] speaks for many when, by way of an initial explanation of vagueness, he asks us to

consider the various ways in which common words are vague, and let us begin with such a word as ‘red.’ It is perfectly obvious, since colours form a continuum, that there are shades of a colour concerning which we shall be in doubt whether to call them red or not, not because we are ignorant of the meaning of the word ‘red,’ but because it is a word the extent of whose application is essentially doubtful. This, of course, is the answer to the old puzzle about the old man who went bald. It is supposed that at first he was not bald, that he lost his hairs one by one, and that in the end he was bald; therefore, it is argued, there must have been one hair the loss of which converted him into a bald man. This, of course, is absurd. Baldness is a vague conception; some men are certainly bald, some are certainly not bald, while between them there are men of whom it is not true to say they must either be bald or not bald.

Our moving beyond the epistemic theory means that we can no longer avail ourselves of the epistemic solution to the sorites.
Classical logic seems threatened. Consider the conditional sorites. Rejecting the epistemic theory and its attendant claim that some premise of this argument is false, we might accept the premises as true and take issue with the reasoning involved, or we might endorse the reasoning as valid while denying that all the premises are true without accepting any as false. The former option would seem to entail some revision of classical
logic since the paradox is classically valid, and the latter option would seem to entail rejection of the classically acceptable Principle of Bivalence. However, there is another option, another way one might seek to preserve classical logic and deny there is any tension between vagueness and classical logic. The vagueness of natural language might be taken to be irrelevant to logic since logic simply fails to apply to such “defective” language. Such a view is expressed by Frege and Russell. (See especially [Frege, 1903; Russell, 1923].) Committed as such theorists were to ideal language doctrines, it is not surprising to find them pursuing such a response. All traditional logic habitually assumes that precise symbols are being employed. It is therefore not applicable to this terrestrial life, but only to an imagined celestial existence. [Russell, 1923, 89] A key attribute of the ideal “celestial” language is said to be its precision; the vagueness of natural language is a defect to be eliminated. Since soritical terms are vague, the elimination of vagueness will entail the elimination of soritical terms. They cannot then, as some theorists propose, be marshalled as a challenge to classical logic. There can be no such thing as a logic of vagueness. A modern variation on this response, promoted most notably in Quine [1981], sees vagueness as an eliminable feature of natural language. The class of vague terms, including soritical predicates, can as a matter of fact be dispensed with and a “suitably regimented language” will be purged of vagueness. There is, perhaps, some cost to ordinary ways of talking, but a cost that is nonetheless worth paying for the simplicity it affords — namely, our thereby being able to defend classical logic with what Quine describes as its “sweet simplicity”. 
However, with the demise of ideal language doctrines and subsequent restoration of respect for ordinary language, vagueness is increasingly considered less superficial than this response suggests. If logic is to have teeth it must be applicable to natural language as it stands. Soritical expressions are unavoidable and the paradox must be squarely faced.

4 THE “TRIUMPH” OF THE DIALECTIC

Russell not only took issue with those who would assert the existence of sharp semantic boundaries to vague terms. He simultaneously took issue with those he saw guilty of “the fallacy of verbalism”.

There is a certain tendency in those who have realized that words are vague to infer that things also are vague. ... This seems to me precisely a case of the fallacy of verbalism — the fallacy that consists in mistaking the properties of words for the properties of things. Vagueness and precision alike are characteristics which can only belong to a representation, of which language is an example. They have to do with the
relation between a representation and that which it represents. Apart from representation, whether cognitive or mechanical, there can be no such thing as vagueness or precision; things are what they are, and there is an end of it. Nothing is more or less what it is, or to a certain extent possessed of the properties which it possesses. Russell [1923, 84–5].

His intended target here may well have been the dialectical materialists.

Dialectics is the “logic of contradiction” applicable ... to those cases where formal [i.e. traditional or classical] logic is inadequate ... “contradictions contained in the concepts are but reflections, or translations into the language of thought, of those contradictions which are contained in the phenomena.” ... Someone points to a young man whose beard is just beginning to grow and demands a reply to the question as to whether he does or does not have a beard. One cannot say that he does, for it is not yet a beard. In a word, the beard is becoming; it is in motion; it is only a certain quantity of individual hairs which will one day become a quality called a beard.

Thus wrote Milosz [1955, 47–8] of what he called The Method: dialectical materialism as interpreted by Lenin and Stalin, deriving from the dialectical materialism of Marx and Engels with its roots in Hegelianism. As with Russell earlier but more candidly, Milosz complained of the imputed ontological ramifications arising from the vagueness of representations.

The hairs growing on the chin of a young man are absolutely indifferent as to what name one will give them. There is no “transition” here from “quantity to quality” ... The problem “beard or no beard” arises from the language we use, from our system of classification. What boundless vanity it is to ascribe to phenomena the contradictions in which we are entangled because of our clumsy concepts. Milosz [1955, 48]

Notice, however, that, though the dialectical materialists’ ontological claims are disputed, our “clumsy”, i.e. vague, concepts are nonetheless admitted as entangling us in contradiction. Russell too explicitly rejected the supposed ontological implications of vagueness in representations and presumably had dialectical materialism or something similar in mind when complaining of those prone to the fallacy of verbalism. But the logical problems raised by vague language are another matter, over and above any such fallacy. Russell took the view, as we have seen, that vagueness is a defect and vague language is beyond the scope of (classical) logic. However, independently of the metaphysical questions raised, a dialectical approach to vagueness has been variously proposed. While Frege developed what we now commonly term “classical logic” to a high degree of sophistication from a perspective which sees vagueness as a defect, Marxist philosophers were pursuing a rival, “dialectical logic”. This “logic of contradiction” was deemed able to accommodate not only the inconsistencies postulated
by Marxist analyses of phenomena such as motion, but also the phenomenon of vagueness, now considered as within the scope of logic. Plekhanov [1937/1908, 112], a target of Milosz’s criticisms, took the failure of “customary” (i.e. classical) logic to be apparent. He continues:

When we see a man who has lost most of the hair from his cranium, we say that he is bald. But how are we to determine at what precise moment the loss of the hair of the head makes a man bald? To every definite question as to whether an object has this characteristic or that, we must respond with a yes or a no. As to that there can be no doubt whatever. But how are we to answer when an object is undergoing a change, when it is in the act of losing a given characteristic or is only in the course of acquiring it? A definite answer should, of course, be the rule in these cases likewise. But the answer will not be a definite one unless it is couched in accordance with the formula ‘Yes is no, and no is yes’; for it will be impossible to answer in accordance with the formula ‘Either yes or no’. Plekhanov [1937/1908, 114]

Meaningful (definite) questions require a yes or no response, but in borderline cases we cannot say exclusively one or the other, either yes or no. We must answer both yes and no; vagueness presents itself “as an irrefutable argument in favour of the ‘logic of contradiction’.” Thus (at least) some Marxist theorists sought to establish the triumph of the dialectic over its Western rival. While it might be thought an interesting approach to pursue, no further detail was provided. There was no analysis offered comparable to the sophistication of its rival, classical logic.

In a more illuminating discussion, McGill and Parry [1948, 428] explicitly advocated vagueness as grounds for a dialectical logic, claiming that “[i]n any concrete continuum there is a stretch where something is both A and ¬A. ...
There is a sense in which the ranges of application of red and non-red [in so far as ‘red’ is vague] overlap, and the law of non-contradiction does not hold”. In agreeing with McGill and Parry that vagueness involves us in contradiction, Newton da Costa and Robert Wolf [1980, 194] suggested that one requirement of a dialectical logic “is that the proposed logic be interpretable as a logic of vagueness”. Da Costa’s view can be traced to an earlier suggestion of the pioneering logician Stanisław Jaśkowski, a student of Łukasiewicz and member of the Lvov-Warsaw School of philosophy. In the same year that McGill and Parry published their dialectical approach to vagueness, Jaśkowski [1969/1948] described a “discussive logic” one of whose main applications was to serve as a logic of vague concepts, concepts which he saw as giving rise to contradictions. While McGill and Parry suggested a logic of vagueness tolerating contradiction, this pioneering paper marks the first formal presentation of a contradiction-tolerating or paraconsistent logic. A logic is defined to be paraconsistent just in case its consequence relation, ⊢, is such that not everything follows from a contradictory pair A and ¬A, i.e. for some A and B, $\{A, \neg A\} \nvdash B$. Such a logic then can admit that some contradictory
pair may be true, while denying that everything is true. It thus admits of nontrivial inconsistency. So, for example, a borderline case of a bearded person may be admitted as both bearded and not bearded without triviality. The admission does not carry a commitment to everything being true (i.e. it does not carry a commitment to what we might term trivialism, the view that everything is true). Similarly, a borderline case of a red object may be admitted as both red and not red without triviality. Jaśkowski’s discussive logic was just such a paraconsistent logic, the first to be formally presented in detail.

Discussive logic, though differing from classical logic by its admission of contradictory pairs of sentences A and ¬A as sometimes jointly true, nonetheless retains all the theorems of classical logic. Where ‘$\vdash_{DL}$’ and ‘$\vdash_{CL}$’ represent the consequence relations of discussive and classical logic respectively then:

$$A, \neg A \nvdash_{DL} B,$$

nonetheless:

$$\vdash_{DL} A \text{ if and only if } \vdash_{CL} A.$$

Thus, in particular:

$$\vdash_{DL} \neg(A \,\&\, \neg A)$$

and in this sense the law of non-contradiction is preserved, contra McGill and Parry, despite the logic in question being paraconsistent. Contradictions are always false. Moreover (and non-trivially in a paraconsistent setting), contradictions are also never true in discussive logic. Correspondingly:

$$A \,\&\, \neg A \vdash_{DL} B.$$

Though contradictory pairs of sentences do not entail everything, contradictions themselves do. The principle ex falso quodlibet remains valid in this sense, and the logic is said to be weakly paraconsistent. (Strongly paraconsistent logics, following Arruda [1989], are both paraconsistent and such as to fail ex falso quodlibet.) Consequently, despite the necessary non-truth of contradictions in discussive logic, each of the contradictory pair that constitutes the contradiction can be true and so

$$\{A, \neg A\} \nvdash_{DL} A \,\&\, \neg A.$$

Thus adjunction fails:

$$A, B \nvdash_{DL} A \,\&\, B.$$
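These consequence facts can be checked by brute force on small models. The sketch below assumes a common gloss of discussive semantics that the text does not itself spell out: a model is a non-empty set of classical valuations (participants in a discussion), a sentence counts as true in the model when it is classically true on at least one of them, and consequence preserves truth so understood. All names (`models`, `holds`, `entails`) are hypothetical, and only models with one or two valuations over the atoms A and B are searched.

```python
from itertools import combinations, product

ATOMS = ("A", "B")
VALUATIONS = [dict(zip(ATOMS, bits))
              for bits in product([True, False], repeat=len(ATOMS))]


def models(max_worlds=2):
    """Yield every non-empty model containing up to `max_worlds` valuations."""
    for size in range(1, max_worlds + 1):
        for worlds in combinations(VALUATIONS, size):
            yield worlds


def holds(formula, model):
    """Assumed discussive truth: classically true on some valuation in the model."""
    return any(formula(world) for world in model)


def entails(premises, conclusion):
    """No searched model makes every premise hold while the conclusion fails."""
    return all(holds(conclusion, m)
               for m in models()
               if all(holds(p, m) for p in premises))


A = lambda w: w["A"]
B = lambda w: w["B"]
not_A = lambda w: not w["A"]
A_and_not_A = lambda w: w["A"] and not w["A"]
A_and_B = lambda w: w["A"] and w["B"]
non_contradiction = lambda w: not (w["A"] and not w["A"])

print(entails([A, not_A], B))            # contradictory pair does not yield B
print(entails([A_and_not_A], B))         # a contradiction itself does, vacuously
print(entails([A, not_A], A_and_not_A))  # adjunction fails for the pair
print(entails([A, B], A_and_B))          # adjunction fails in general
print(entails([], non_contradiction))    # law of non-contradiction retained
```

A failed entailment here is a genuine countermodel under the assumed semantics, whereas a passed one is established only for the models searched (the ex falso case holds in every model, since no valuation ever verifies a contradiction).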
It is this non-adjunctive feature of the logic that has most frequently been cited as grounds for rejecting such a logic. (See, for example, [Lewis, 1983, ch. 15; Priest and Routley, 1989b; Keefe, 2000, ch. 7].) Within the pioneering Brazilian tradition of research into paraconsistent logics, Da Costa’s work, building on Jaśkowski’s, was picked up and subsequently elaborated on in Arruda and Alves [1979], and Da Costa and Doria [1995]. They
persisted with Jaśkowski’s claim that discussive logic be looked on as a logic of vagueness. Some idea of the extent to which Jaśkowski’s work, and more particularly his view on vagueness, has influenced the development of paraconsistent logic in Brazil can be gleaned from Arruda [1989]. This explicit interest in vagueness from a paraconsistent perspective is not restricted to the Brazilian school. A paraconsistent approach to vagueness has been pursued within analytic philosophy by other non-classical logicians and philosophers. (See [Peña, 1989; Priest and Routley, 1989a]. Priest and Routley criticize weakly paraconsistent approaches for the reason, noted above, that they are non-adjunctive. Lewis [1982] considers such an approach on the strength of the questionable analogy between vagueness and ambiguity, but ultimately rejects it.)

The main problem with suggested paraconsistent analyses is that while they have pointed in a paraconsistent direction they have not explained in any detail how vagueness is to be analyzed from a philosophical point of view. Vagueness is noted as an area for the application of paraconsistency but the centre of attention has remained squarely on the paraconsistent logics themselves and their detail. It is little wonder then that an emerging research program centering on vagueness itself has paid them little attention. How, for example, on a paraconsistent approach to the phenomenon of vagueness, is the pressing issue of the sorites paradox to be resolved? Hyde [1997] argues that the failure of modus ponens for a conventionally defined connective ‘→’ in discussive logic presents an obvious diagnosis. The paradox is unsound since invalid. In recognition of the fact that ‘→’ does not support modus ponens Jaśkowski introduced a weaker connective, discussive implication ‘→D ’, which does satisfy modus ponens but now it is far from clear that the sorites paradox interpreted as employing such a conditional has all true premises.
The general scepticism that many feel towards paraconsistency has meant that such an approach has not received wide support. A logic closely related to discussive logic has, however, been widely discussed and is commonly endorsed as a logic of vagueness. That logic is supervaluationism.
5 SUPERVALUATIONISM
A decade after Jaśkowski presented his paraconsistent response to vagueness, another former student of the innovative Lvov-Warsaw School of philosophy, Henryk Mehlberg, described an informal approach to vagueness. Mehlberg [1958] is generally recognized as a precursor to the formal method of supervaluations. Supervaluationism, as it has become known, as applied to the phenomenon of vagueness is now commonly considered a reinterpretation of the ‘presuppositional languages’ formally described by van Fraassen [1966]. The approach is the dual of the paraconsistent approach discussed above. Where discussive logic admits truth-value gluts when confronted with borderline cases, the current proposal admits truth-value gaps. Where discussive logic admits the truth of both A and ¬A (e.g. when predicating baldness and non-baldness of a
borderline case), and so admits A as both true and false, the supervaluationist denies the truth of both A and ¬A, and so regards A as neither true nor false. And just as the admission of truth-value gluts by the paraconsistentist was non-trivial (i.e. some propositions are admitted as both true and false without every proposition being so), so too with supervaluationism. The admission of truth-value gaps is non-trivial. Some propositions are admitted as neither true nor false without every proposition being so. (Thus supervaluationism is an example of what has been termed a paracomplete logic, that is, a non-trivial gap logic.) Dummett [1975], Fine [1975] and Keefe [2000] build on Mehlberg [1958] and adapt van Fraassen’s supervaluation semantics to the sorites paradox, and vagueness more generally, resulting in a non-bivalent logic that, initially at least, retains the classical consequence relation and classical laws whilst admitting truth-value gaps. The challenge posed by the conditional sorites paradox can, on this view, be met by denying the truth of some conditional premise. This accords with the diagnosis offered by the epistemic theorist; however, given the now-postulated failure of bivalence, such a denial no longer commits one to acceptance of the falsity of the premise in question. The epistemic gap in respect of borderline cases, i.e. our inability to know either that A or to know that ¬A, is now taken to reflect a truth-value gap. Vagueness is a semantic phenomenon on this approach, as Russell claimed, but is also within the scope of logic which is then modified to account for the phenomenon. Thus, in contrast to the epistemic conception of vagueness, a semantic conception will treat the apparent semantic indeterminacy of vague predicates as real.
Borderline cases, symptomatic of vagueness, are cases to which the predicate neither determinately applies nor determinately doesn’t apply, where ‘determinately’ is now given a semantic analysis as opposed to an epistemic one. Contra an epistemic account, the positive extension of a predicate is given by those objects to which the predicate determinately applies, the negative extension is given by those objects to which the predicate determinately does not apply, and the remaining (borderline) cases constitute the predicate’s penumbra. Consistent with a view of vagueness as a semantic deficiency (e.g. Fine [1975]) or as semantic indecision (e.g. Lewis [1986]) “truth” can now be defined in terms of that which is true irrespective of how the semantic deficiency or indecision is resolved (“super-truth” as it is sometimes called). That is to say, in supervaluationist terms, a sentence of the language will count as true just in case it is true on all admissible precisifications. (See Fine [1975] for more on the notion of an “admissible precisification”.) The ensuing logic is a consequence of how validity is then defined. Two variants have been articulated. Firstly, just as a sentence of the language is evaluated as (super)true if it is true for all admissible precisifications, one might analogously define an inference to be valid if and only if it is valid in all admissible precisifications. On this definition, A is a valid consequence of a set of sentences Σ if and only if, in all admissible precisifications, A is true whenever all members of
Σ are. All and only the classically valid inferences are valid in all such precisifications, and so all and only the classically valid inferences are valid according to this definition of validity. This is the definition suggested by Dummett [1975]. (As with van Fraassen’s presentation of supervaluationist semantics, supervaluationist approaches to vagueness assume that admissible precisifications correspond to classical models in the sense that when vagueness is eliminated the resulting precisified language is classical. This assumption is not essential. A supervaluationist model structure could equally well be built upon an underlying semantics that was nonclassical, e.g. intuitionist, relevant, etc. In this sense a supervaluationist approach merely aims to provide a non-bivalent semantic superstructure sensitive to vagueness which collapses to one’s preferred underlying semantics where vagueness does not arise. However, since it is traditionally a development of a non-classical semantics from a classical base, and this tradition has circumscribed the ensuing issues, problems, and debate, supervaluationism as it is discussed and debated is now synonymous with this classically oriented theory — classical supervaluationism.) As Dummett points out, it follows that an inference valid on the foregoing account, what Williamson [1994, 148] calls a “locally valid” inference, will lead from (super)true premises to a (super)true conclusion. This latter, strictly weaker claim to do with the preservation of (super)truth is used by Fine [1975] to suggest an alternative definition of validity, “global validity”, which also preserves classical consequence. A is a valid consequence of a set of sentences Σ if and only if A is true in all admissible precisifications whenever all members of Σ are true in all admissible precisifications. That is, A is a valid consequence of a set of sentences Σ if and only if A is (super)true whenever all members of Σ are (super)true. 
The result of its being a weaker relation manifests itself when we come to consider the extended language including a determinacy operator, D, to be discussed below. Global validity is standardly adopted as the relevant notion of validity (see [Williamson, 1994; Keefe, 2000]). Irrespective of which definition is adopted, validity coincides with classical validity on the unextended language, as noted. Where ‘⊢SV’ represents either consequence relation then:
Σ ⊢SV A if and only if Σ ⊢CL A.
(For proofs see [Williamson, 1994, 148–9; Keefe, 2000, 175–6].) In particular, treating laws as zero-premise arguments, the logic then preserves all classical laws:
⊢SV A if and only if ⊢CL A.
Thus, in particular:
⊢SV A ∨ ¬A
and in this sense the law of excluded middle is preserved despite the logic in question admitting truth-value gaps. For example, irrespective of the vagueness of
‘heap’ it is logically true of any number of grains of wheat that it either does or does not make a heap. As a consequence, supervaluation semantics is no longer truth-functional. Where A is neither true nor false due to vagueness, the disjunction A ∨ A will similarly lack a truth-value, whereas A ∨ ¬A will be true. Conjunction and the conditional exhibit analogous non-classical features. What of the sorites paradox then? Since all the forms taken by the sorites are classically valid, they are therefore also supervaluationally valid. The conclusion of the conditional form is resisted by noticing that some conditional premise fails to be true; though, admittedly, none is false. The conditional sorites is valid but unsound. More revealing is the diagnosis with regard to the mathematical induction form. It is also deemed unsound due to the failure of one of the premises, namely the universal premise. The universally quantified conditional is not true. In fact, it is false; it is false despite the fact that there is no single conditional premise of the conditional form of the paradox which can be identified as false. That is to say, it is false that for all n, if Faₙ then Faₙ₊₁ (where F is soritical relative to the subjects of the form aₙ). Given that supervaluation semantics further admits that the falsity of ∀n(Faₙ → Faₙ₊₁) is logically equivalent to the truth of ∃n(Faₙ & ¬Faₙ₊₁), the line-drawing form of the sorites is also solved. The argument is supervaluationally valid since classically valid and its premises are incontestably true. What supervaluation semantics claims to provide is a formal account of how it is that such a conclusion could, contrary to appearances, be true. It is true since true no matter how one resolves the indeterminacy of the vague term involved (i.e. the soritical predicate). In this way then the sorites paradoxes are said to be defused.
With vagueness viewed as a semantic phenomenon, classical semantics is no longer appropriate as a semantics of vague language and supervaluation semantics is proposed in its place. One immediate concern facing this solution, however, is the fact that it ultimately treats the mathematical induction and line-drawing forms of the sorites in just the same way as the logically conservative epistemic theory. We are forced to accept the avowedly counter-intuitive truth of ∃n(Faₙ & ¬Faₙ₊₁) which seems to postulate the existence of a sharp boundary, yet the existence of just such a boundary is what the semantic theory of vagueness is supposed to deny. Supervaluationists respond by denying that the conclusion of the line-drawing sorites expresses the existence of a sharp boundary. Though committed to the claim that:
(a) T‘∃n(Faₙ & ¬Faₙ₊₁)’,
semantic precision is, it is claimed, only properly captured by the claim that:
(b) ∃nT‘(Faₙ & ¬Faₙ₊₁)’,
and this is clearly denied by supervaluation theory. Whilst it is true that there is some cut-off point, there is no particular point of which it is true that it is the cut-off point. Since it is only this latter claim which is taken to commit one to the existence of a sharp boundary, there is no commitment to there being such a boundary of which we are ignorant (contra the epistemic theorist). With this explanation, however, doubts arise as to the adequacy of the logic.
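The contrast between (a) and (b) can be checked mechanically on a toy model. The following Python sketch is illustrative only: the ten-member series, the choice of cut-offs 3 to 6 as the admissible precisifications, and every function name are assumptions introduced here and are not drawn from the literature cited above.

```python
# Toy supervaluationist model of a sorites series (illustrative assumptions only).
N = 10                  # hypothetical objects a_0 .. a_9
CUTOFFS = range(3, 7)   # hypothetical precisifications: F holds of a_n iff n < cutoff

def F(n, cutoff):
    """Classical truth-value of 'F a_n' on one precisification."""
    return n < cutoff

def supertrue(sentence):
    """(Super)true: true on every admissible precisification."""
    return all(sentence(c) for c in CUTOFFS)

def superfalse(sentence):
    """(Super)false: false on every admissible precisification."""
    return not any(sentence(c) for c in CUTOFFS)

def boundary_at(n):
    """'F a_n & not F a_(n+1)', as a function of the precisification."""
    return lambda c: F(n, c) and not F(n + 1, c)

def step(n):
    """The conditional premise 'F a_n -> F a_(n+1)'."""
    return lambda c: (not F(n, c)) or F(n + 1, c)

def some_boundary(c):
    """The line-drawing conclusion: some n marks a cut-off."""
    return any(boundary_at(n)(c) for n in range(N - 1))

def every_step(c):
    """The major premise of the mathematical induction form."""
    return all(step(n)(c) for n in range(N - 1))

claim_a = supertrue(some_boundary)                               # analogue of (a)
claim_b = any(supertrue(boundary_at(n)) for n in range(N - 1))   # analogue of (b)
gappy_steps = [n for n in range(N - 1)
               if not supertrue(step(n)) and not superfalse(step(n))]
induction_premise_superfalse = superfalse(every_step)

print(claim_a, claim_b, gappy_steps, induction_premise_superfalse)
```

On these assumptions the sketch prints `True False [2, 3, 4, 5] True`: the existential claim is supertrue although no instance of it is, four conditional premises are neither supertrue nor superfalse, and the universally quantified premise is superfalse.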
Not only must (b) be properly taken to represent the semantic precision of F but we must also be prepared to admit that some existential statements can be true without having any true instance, thus blocking any inference from (a) to (b). Just as the retention of the law of excluded middle in the presence of truth-value gaps commits the supervaluationist to there being true disjunctions lacking true disjuncts (analogous to the non-standard behaviour of conjunction in discussive logic, taken by many as evidence of that logic’s inadequacy), so too must we countenance analogous non-standard behaviour in the logic’s quantification theory. In effect, the commitment to the preservation of classical validity comes at a cost to other intuitions. The supervaluationist approach has also come under fire for its semantic ascent when defusing the sorites. The problem with accepting the major premise of the mathematical induction form of the paradox as false is simply that it runs counter to our conviction that a grain of wheat can make the difference between a heap and a non-heap. Yet this conviction can be expressed in the object-language, so why should the metalinguistic subtleties involved in distinguishing (a) from (b) above be relevant here? As it happens, such ascent is not essential to the account. The language can be extended to include a determinacy operator, D (‘It is determinately the case that ...’), appropriate for the expression of vagueness in the object language. The vagueness of expressions like ‘heap’ is characterized by their possessing borderline cases and this can now be expressed as the existence of cases to which the term neither determinately applies nor determinately does not apply. A vague sentence A is such that neither DA nor D¬A; i.e. it is neither determinately the case nor determinately not the case. Where IA =df ¬DA & ¬D¬A, the vagueness of a sentence is then expressed as IA. 
For any sentence A, DA will count as true if and only if A is (super)true, i.e. if and only if A is true in all admissible precisifications. A semantics for D can then be given in a manner analogous to that given for necessity and a modal-like logic results, with □ (‘It is necessarily the case that ...’) replaced by D. (See [Williamson, 1994, 149–50].) By means of the extended language (a) and (b) above can be recast within the object language:
(a′) D∃n(Faₙ & ¬Faₙ₊₁);
(b′) ∃nD(Faₙ & ¬Faₙ₊₁).
The first is again affirmed and the latter denied. Any inference from (a′) to (b′) is now analogous to the modal inference from □∃xFx to ∃x□Fx and is said to be fallacious just as the corresponding modal inference is commonly said to be. The strength of the resulting logic of determinacy is a matter of some debate. In respect of the logic’s theorems, it seems uncontroversial that:
⊢SV DA → A
⊢SV D(A → B) → (DA → DB)
If ⊢SV A then ⊢SV DA
Whether or not the logic should include as theorems the analogue of the Brouwersche axiom, and analogues of the characteristic axioms for S4 and S5, i.e.
⊢ ¬A → D¬DA
⊢ DA → DDA
⊢ ¬DA → D¬DA
is more controversial. Williamson [1999] argues for their rejection, endorsing the analogue of the modal system KT as appropriate for the identification of the theorems of the extended language. The phenomenon of higher-order vagueness, on which see more below, is central to the case. The failure of any strong analogy with modal logics becomes apparent when we consider validity more generally. While the validity of the inference from A to □A would render the operator □ trivial, in SV the inference is indeed valid. (In fact, it is globally though not locally valid, thus what follows assumes the point already made above that supervaluationism standardly adopts the global account.)
A ⊢SV DA.
The triviality of D is avoided since A → DA is not valid, i.e.:
⊬SV A → DA.
Therewith we are presented with a counterexample to conditional proof, one of a number of classically valid structural rules (i.e. rules describing entailments between valid arguments) that fail in the extended logic. Since:
¬DA ⊬SV ¬A
the rule of contraposition also fails. Additionally, since A ⊢SV DA, A ⊢SV DA ∨ D¬A and, similarly, ¬A ⊢SV DA ∨ D¬A. Since ⊢SV A ∨ ¬A, were proof by cases valid it would follow that ⊢SV DA ∨ D¬A which is patently false. Thus proof by cases fails. So too does reductio ad absurdum since A & ¬DA ⊢SV DA & ¬DA yet we cannot validly conclude that ⊢SV ¬(A & ¬DA). The logic of the extended language is therefore decidedly non-classical. This then threatens to undermine claims by supervaluationists that such an approach to vagueness “preserves classical logic”. In addition to the non-classical semantics (as evidenced by the unusual behaviour of ∨ and ∃ in particular), the consequence relation of the extended language also deviates. (See [Williamson, 1994, 150ff; Keefe, 2000, 176ff].)
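The failures of conditional proof, contraposition, proof by cases and reductio just described can be verified by brute force on the smallest possible models. The sketch below is a minimal illustration under stated assumptions: a precisification is represented solely by the classical value it gives to a single atom A, every non-empty set of such precisifications is allowed as a space of admissible precisifications, and all names are hypothetical.

```python
from itertools import combinations

# A precisification is modelled by the classical value it gives to one atom A.
# Assumption of this sketch: any non-empty set of precisifications is admissible.
POINTS = (True, False)
SPACES = [set(s) for r in (1, 2) for s in combinations(POINTS, r)]

# Sentences are functions of a precisification p and the space it belongs to.
def A(p, space): return p
def Not(x): return lambda p, space: not x(p, space)
def And(x, y): return lambda p, space: x(p, space) and y(p, space)
def Or(x, y): return lambda p, space: x(p, space) or y(p, space)
def Imp(x, y): return lambda p, space: (not x(p, space)) or y(p, space)
def D(x): return lambda p, space: all(x(q, space) for q in space)

def supertrue(x, space):
    return all(x(p, space) for p in space)

def globally_valid(premises, conclusion):
    """Supertruth of all premises guarantees supertruth of the conclusion."""
    return all(supertrue(conclusion, s) for s in SPACES
               if all(supertrue(b, s) for b in premises))

def locally_valid(premises, conclusion):
    """Truth is preserved at each precisification taken singly."""
    return all(conclusion(p, s) for s in SPACES for p in s
               if all(b(p, s) for b in premises))

DA, DnotA = D(A), D(Not(A))
results = {
    "A |- DA (global)": globally_valid([A], DA),
    "A |- DA (local)": locally_valid([A], DA),
    "|- A -> DA": globally_valid([], Imp(A, DA)),
    "not-DA |- not-A": globally_valid([Not(DA)], Not(A)),
    "A |- DA v D not-A": globally_valid([A], Or(DA, DnotA)),
    "not-A |- DA v D not-A": globally_valid([Not(A)], Or(DA, DnotA)),
    "|- A v not-A": globally_valid([], Or(A, Not(A))),
    "|- DA v D not-A": globally_valid([], Or(DA, DnotA)),
    "A & not-DA |- DA & not-DA": globally_valid([And(A, Not(DA))], And(DA, Not(DA))),
    "|- not(A & not-DA)": globally_valid([], Not(And(A, Not(DA)))),
}
for claim, verdict in results.items():
    print(f"{claim}: {verdict}")
```

The counterexamples all come from the two-membered space, in which A is true on one precisification and false on the other. A check of this kind can refute a claimed validity but establishes validity only relative to the three spaces enumerated.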
Moreover, given the way in which D was added to the language, it effectively functions as a (super)truth predicate. And, just as D was a non-trivial operator, so too with (super)truth. As with DA, the truth of A entails and is entailed by A, yet the T-Schema fails. I.e. the truth of A is not materially equivalent to A. If Tarski’s T-Schema were to hold and truth were taken to be disquotational then bivalence
would ensue (as [Williamson, 1994] makes plain) but it does not. Whether and to what extent this undermines a supervaluationist account of vagueness (and supervaluationism as a viable logic of truth-value gaps more generally) is discussed by Keefe [2000, 202ff].
6 MANY-VALUED AND FUZZY LOGICS
The foregoing logics of vagueness have postulated the True and the False and have sought to show that vague sentences fall under one or other of the two categories, or under both, or under neither. Thus no truth-values other than the True and the False were postulated, with non-classical approaches advocating either truth-value gluts or truth-value gaps. Epistemic approaches to vagueness rest content with an exclusive and exhaustive categorization of sentences into the True and the False, discussive logic rejects the exclusivity of such a categorization, and supervaluationism rejects the exhaustiveness of such a categorization. All, however, can be viewed as embracing the idea that there are only two truth-values. Many-valued logic, on the other hand, explicitly rejects this. In such a logic, vague sentences, sentences where a predicate is applied to a borderline case, are neither true nor false and take some additional value. Beyond this, many-valued responses to the phenomenon of vagueness and the attendant sorites paradoxes vary. Many-valued logics can vary, firstly, in respect of the number of non-classical truth-values deemed appropriate to model vagueness and defuse the sorites paradox. Are the values required merely three in number or are more, perhaps infinitely many, required? Secondly, what semantics ought one provide for the logical connectives? Should truth-functionality be retained or should we, like supervaluationists, advocate a non-truth-functional approach? And if, say, a truth-functional approach is to be adopted then what specific truth-functions are appropriate? And, thirdly, what account of validity should be adopted? An early three-valued proposal for a logic of vagueness can be found in Halldén [1949]. The initial motivation for such a logic is similar to the supervaluationist’s.
Just as a vague predicate divides objects into the positive extension, negative extension and the penumbra, vague sentences can be divided into the True, the False and the Indeterminate. Unlike supervaluation semantics, however, Indeterminacy is considered a third value, thus the truth set expands upon the classical pair {t, f} to include a third value {t, i, f}, and the sentential connectives are all defined truth-functionally. The truth-functions are represented in the truth-tables below.

A    ¬A
t    f
i    i
f    t

A    B    A&B    A∨B    A→B
t    t    t      t      t
i    t    i      i      i
f    t    f      t      t
t    i    i      i      i
i    i    i      i      i
f    i    i      i      i
t    f    f      t      f
i    f    i      i      i
f    f    f      f      t

(These tables correspond, in fact, to what Kleene [1938] earlier described as the characteristic tables for his “weak” connectives.) The tables represent an extension of the classical truth-tables in the sense that compound sentences whose components take classical values, themselves take the same value as dictated by the classical truth-tables. Any compound sentence with indeterminate (i.e. non-classical) components is itself thereby taken to be indeterminate. In particular, it is easy to see that classical theorems, despite remaining always true when their component sentences take classical values, will be indeterminate whenever any component sentence is. Despite this, Halldén retains classical theoremhood by defining theoremhood so that a sentence A is a theorem (⊢H A) if and only if A is always either true or indeterminate. Sentences such as excluded middle claims, though sometimes indeterminate, are nonetheless never false and so excluded middle counts as a theorem, ⊢H A ∨ ¬A. More generally:
⊢H A if and only if ⊢CL A.
Generalizing to a definition of validity, A is a valid consequence of a set of sentences Σ (Σ ⊢H A) if and only if A is true or indeterminate whenever all members of Σ are true or indeterminate. As is familiar in three-valued logics with validity defined as preservation of non-falsity, as it is here, the logic is decidedly non-classical despite the retention of classical theoremhood. For example, given indeterminate A and false B, the inference from A and ¬A to B fails to preserve non-falsehood. Consequently:
A, ¬A ⊬H B.
The logic is paraconsistent. (In fact, it corresponds to the paraconsistent variant of the Kleene “weak” system, differing in its definition of validity in terms of preservation of non-falsity, as opposed to the Kleene definition in terms of preservation of truth.)
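The weak tables and the definition of validity as preservation of non-falsity are simple enough to check exhaustively. The following is a minimal sketch; the string encoding of the three values and all function names are assumptions made for the illustration.

```python
from itertools import product

T, I, F = "t", "i", "f"
VALUES = (T, I, F)

def neg(a):
    return {T: F, I: I, F: T}[a]

def weak(classical):
    """Lift a classical binary truth-function to the weak tables:
    the compound is indeterminate whenever either component is."""
    def op(a, b):
        if I in (a, b):
            return I
        return T if classical(a == T, b == T) else F
    return op

conj = weak(lambda a, b: a and b)
disj = weak(lambda a, b: a or b)
cond = weak(lambda a, b: (not a) or b)

def designated(v):
    """True or indeterminate, i.e. not false."""
    return v != F

def valid_H(premises, conclusion, n_atoms):
    """Non-falsity of every premise guarantees non-falsity of the conclusion.
    Premises and conclusion are functions of the atoms' values."""
    return all(designated(conclusion(*vs))
               for vs in product(VALUES, repeat=n_atoms)
               if all(designated(p(*vs)) for p in premises))

# Rows in the order of the table above (A varies fastest).
conj_column = [conj(a, b) for b in VALUES for a in VALUES]
excluded_middle = valid_H([], lambda a: disj(a, neg(a)), 1)
explosion = valid_H([lambda a, b: a, lambda a, b: neg(a)], lambda a, b: b, 2)

print(conj_column, excluded_middle, explosion)
```

Here `excluded_middle` comes out `True` and `explosion` comes out `False`, the latter because of the valuation on which A is indeterminate and B is false.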
Though not generally remarked upon, Halldén’s logic of vagueness constitutes one of the earliest formally characterized paraconsistent logics, appearing just one year after Jaśkowski’s [1948] publication, though differing from that system by being truth-functional. Unlike discussive logic, adjunction is valid:
A, B ⊢H A & B
and so ex falso quodlibet is no longer valid in the following sense:
A & ¬A ⊬H B.
Unlike discussive logic then the system is strongly paraconsistent. Of relevance to the sorites paradoxes is the fact that modus ponens fails:
A, A → B ⊬H B.
Therewith, a solution is forthcoming to the conditional sorites. It is simply invalid, depending as it does on iterated applications of modus ponens. And, with universally and existentially quantified sentences treated as analogous to long conjunctions and disjunctions respectively, the mathematical induction sorites is similarly solved. For more discussion see Williamson [1994, §4.4]. Contrasted with three-valued truth-functional logics that preserve classical theoremhood but deviate in respect of classical inference, alternatives have been proposed which, while less conservative with respect to classical theoremhood, are nonetheless more conservative with respect to classical inference. Körner [1960], and more recently Tye [1990], proposed a three-valued logic of vagueness. (A recent variation on this theme is Field [2003].) The connectives are defined as follows.

A    ¬A
t    f
i    i
f    t

A    B    A&B    A∨B    A→B
t    t    t      t      t
i    t    i      t      t
f    t    f      t      t
t    i    i      t      i
i    i    i      i      i
f    i    f      i      t
t    f    f      t      f
i    f    f      i      i
f    f    f      f      t

(These tables correspond to the characteristic tables for Kleene’s 1938 “strong” three-valued system K3.) As with Halldén’s logic of vagueness, the tables represent an extension of the classical truth-tables; where vagueness does not arise, the connectives are taken to behave classically. Moreover, universally and existentially quantified sentences are treated as analogous to long conjunctions and disjunctions respectively. Thus ∀xFx is true if and only if, for all d ∈ ∆ (the domain of quantification), Fd is true; is false if and only if, for some d ∈ ∆, Fd is false; and indeterminate otherwise. ∃xFx is true if and only if, for some d ∈ ∆, Fd is true; is false if and only if, for all d ∈ ∆, Fd is false; and indeterminate otherwise.
Theoremhood can be variously defined, with Tye proposing that a sentence A is a theorem (⊢K3 A) if and only if A is always true. There are no theorems in such a system. Any sentence will be evaluated as non-true when all its component sentences are, thus no sentence is always true. In particular, the law of excluded middle is no longer a theorem:
⊬K3 A ∨ ¬A.
Nor is the law of non-contradiction a theorem. Moreover and more surprisingly, in accord with the accepted (classical) logical equivalence between ¬A ∨ B and A → B, it follows that:
⊬K3 A → A.
Generalizing to a definition of validity, Tye proposes that A be a valid consequence of a set of sentences Σ (Σ ⊢K3 A) if and only if A is true whenever all members of Σ are true. (Körner proposes a different, somewhat idiosyncratic, definition. See [Williamson, 1994, 289, fn. 16] for discussion.) With only truth-preservation required for validity, despite the class of theorems now being empty a range of classically valid inferences are now accepted as valid. In particular, modus ponens and mathematical induction are valid. Correspondingly, the conditional sorites and mathematical induction sorites are valid. The air of paradox they engender is said to be dispelled by observing that the premises are not all true, thus the arguments are unsound. The conditional sorites is said to have some non-true conditional premise which is nonetheless non-false. There will be some conditional Faᵢ₋₁ → Faᵢ whose antecedent and consequent are both indeterminate since each of aᵢ₋₁ and aᵢ is a borderline case of F, thus rendering the conditional itself indeterminate. So too with respect to the mathematical induction sorites. The major premise, the universally quantified conditional expressing the tolerance of the predicate with respect to marginal change, will be neither true nor false. The line-drawing sorites will, accordingly, have a conclusion that is neither true nor false despite having true premises.
It is therefore invalid on such an approach. Parsons [2000], building upon Parsons [1987] and Parsons and Woodruff [1995], proposes a closely similar three-valued system for evaluating arguments involving “indeterminacy”. The system proposed is Łukasiewicz’s three-valued system L3. The system differs from the Kleene logic, K3, only in respect of conditionals with indeterminate antecedent and indeterminate consequent. Whereas K3 claims such a conditional to be indeterminate, L3 takes such a conditional to be true. Thus:
⊢L3 A → A.
The difference between the two systems is of no consequence if used to diagnose flaws in sorites reasoning.
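The relation between K3 and L3 can be confirmed by direct computation. The sketch below encodes the three values as the integers 2 (true), 1 (indeterminate) and 0 (false); that encoding, the arithmetic form of the connectives, and the function names are conveniences assumed for the illustration, with validity taken as truth-preservation in the manner attributed to Tye above.

```python
from itertools import product

F, I, T = 0, 1, 2          # assumed numerical encoding of f, i, t
VALUES = (T, I, F)

def neg(a): return 2 - a
def conj(a, b): return min(a, b)
def disj(a, b): return max(a, b)
def cond_k3(a, b): return max(2 - a, b)        # strong Kleene conditional
def cond_l3(a, b): return min(2, 2 - a + b)    # three-valued Lukasiewicz conditional

def valid(premises, conclusion, n_atoms):
    """The conclusion is true whenever every premise is true."""
    return all(conclusion(*vs) == T
               for vs in product(VALUES, repeat=n_atoms)
               if all(p(*vs) == T for p in premises))

# Where do the two conditionals disagree?
differences = [(a, b) for a, b in product(VALUES, repeat=2)
               if cond_k3(a, b) != cond_l3(a, b)]

lem_k3 = valid([], lambda a: disj(a, neg(a)), 1)
self_implication_k3 = valid([], lambda a: cond_k3(a, a), 1)
self_implication_l3 = valid([], lambda a: cond_l3(a, a), 1)
modus_ponens_k3 = valid([lambda a, b: a, lambda a, b: cond_k3(a, b)],
                        lambda a, b: b, 2)

print(differences, lem_k3, self_implication_k3, self_implication_l3, modus_ponens_k3)
```

The only disagreement is at the pair (indeterminate, indeterminate), which is why A → A is a theorem of L3 but not of K3, while modus ponens is truth-preserving in K3.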
However, despite the logic being proposed as a logic of indeterminacy, whether such a logic should be counted a logic of vagueness is another matter. Parsons [2000] explicitly disavows any proposal in respect of “vagueness”. Nonetheless, such a system, even if not explicitly advocated for use as a logic of vagueness, remains a clear rival to K3 . While some are motivated to adopt one of the foregoing three-valued approaches for their truth-functionality, others find the consequences unacceptable. Those who, for example, find supervaluationist arguments for classical laws plausible will baulk at excluded middle claims sometimes being other than determinately true or contradictions sometimes being other than determinately false, as may be the case in such systems. (See [Williamson, 1994, ch. 4; Keefe, 2000, §4.5] for a discussion.) A further concern with such approaches is that the invoked tripartite division of sentences seems to face similar objections to those which led to the abandonment of the bipartite division effected by two-valued classical logic. There would seem to be no more grounds for supposing there to exist a boundary between the true sentences and indeterminate ones or the indeterminate sentences and false sentences than there was for supposing a sharp boundary to exist between the true sentences and the false ones. The phenomenon of vagueness which drives the sorites paradox no more suggests two sharp boundaries than it did one. Vague concepts appear to be concepts without boundaries at all. No finite number of divisions seems adequate. Goguen [1969] and Zadeh [1975] propose replacing classical two-valued logic with an infinite-valued one. Infinite-valued or fuzzy logics thus replace talk of truth with talk of degrees of truth. Just as baldness, for example, comes in degrees so too, it is argued, does the truth of sentences predicating baldness of things. 
The fact that John is more bald than Jo is reflected in the sentence ‘John is bald’ having a higher degree of truth than ‘Jo is bald’. With this logical innovation infinite-valued logics are then offered as a means to solve the sorites paradox. As with all many-valued logics, the connectives can be defined in a number of ways, giving rise to a number of distinct logics. A now common proposal proceeds by way of the continuum-valued, truth-functional semantics of Łukasiewicz and Tarski [1930]. If we represent the set of truth-values by the set of reals [0, 1], where 0 represents (determinate) falsehood and 1 represents (determinate) truth, then we can characterize the truth-value of a compound sentence A, ν(A), as follows:
ν(¬A) = 1 − ν(A)
ν(A & B) = min{ν(A), ν(B)}
ν(A ∨ B) = max{ν(A), ν(B)}
ν(A → B) = 1, if ν(A) ≤ ν(B)
ν(A → B) = 1 − ν(A) + ν(B), if ν(A) > ν(B)
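These clauses translate directly into code. The following sketch uses exact rational arithmetic to avoid rounding; the 101-member series whose degrees fall linearly from 1 to 0 is a hypothetical example chosen for the illustration, and what moral should be drawn from it for the sorites depends on the account of validity adopted.

```python
from fractions import Fraction

def neg(a): return 1 - a
def conj(a, b): return min(a, b)
def disj(a, b): return max(a, b)
def cond(a, b): return Fraction(1) if a <= b else 1 - a + b

# Hypothetical series a_0 .. a_100: the degree of 'F a_n' falls by 1/100 per step.
degree = [Fraction(100 - n, 100) for n in range(101)]

# Degree of each conditional 'F a_n -> F a_(n+1)' along the series.
step_values = {cond(degree[n], degree[n + 1]) for n in range(100)}

half = Fraction(1, 2)
print(step_values)                    # a single value shared by every step
print(disj(half, neg(half)))          # excluded middle at the midpoint
print(conj(half, neg(half)))          # a contradiction at the midpoint
print(cond(half, half))               # A -> A at the midpoint
```

On these figures every conditional takes the value 99/100, while at the midpoint both A ∨ ¬A and A & ¬A take the value 1/2 and A → A takes the value 1.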
Quantification generalizes upon the connectives in the usual way. Where ∆ is the domain of quantification:
ν(∀xFx) = greatest lower bound{ν(Fd) : d ∈ ∆}
ν(∃xFx) = least upper bound{ν(Fd) : d ∈ ∆}.
Despite their advocacy of a continuum-valued semantics for vagueness, neither Goguen nor Zadeh advocated the foregoing semantics. Goguen [1969, 350f], for example, makes a more general proposal allowing for truth-values as n-tuples, i.e. as vector values. Thus where the applicability of a vague predicate might depend on a number of dimensions (e.g. the application of colour predicates might be thought to depend on hue, saturation and brightness) semantic values exhibit complexity sufficient to accommodate this. For example, for some sentence A predicating redness of some object a, the semantics allows for a valuation ν such that ν(A) = ⟨α₁, α₂, α₃⟩. Consequently, the set of truth-values is not totally ordered; A and B may take distinct truth-values α and β such that there is simply no fact of the matter whether α < β or β ≤ α. Peacocke [1981, 135] similarly recommends abandonment of a totally ordered truth-value set. Goguen [1969, 347] also argues for a distinct truth-function for conjunction according to which ν(A & B) = ν(A) × ν(B). It follows that ν(A & A) < ν(A) where 0 < ν(A) < 1. Many find this counterintuitive. The truth-function for disjunction is analogously distinct. In accord with the De Morgan principle according to which A ∨ B is equivalent to ¬(¬A & ¬B), ν(A ∨ B) = ν(A) + ν(B) − (ν(A) × ν(B)). Consequently ν(A ∨ A) > ν(A) where 0 < ν(A) < 1. Zadeh [1975], on the other hand, builds on the fuzzy set theory of Zadeh [1965]. This landmark work, “Fuzzy Sets”, has launched a small industry concerned with mathematical, computational and philosophical applications of fuzzy set theory. At its simplest, a fuzzy set differs from a classical set in so far as set membership is a matter of degree. Rather than an item either being a member (i.e.
being a member to degree 1) or being a non-member (i.e. being a member to degree 0), it is proposed that it might now be a member to some degree in the continuum-valued range [0, 1]. The extension to logic is straightforward, initially at least. Consider the set of truths. True sentences can be thought of as members of the set to degree 1; false sentences are members to degree 0. But the set admits of membership to degrees other than 1 or 0. We might now admit sentences as members of the set to some degree n where 0 < n < 1 and go on to define a sentence as “true to degree n” just in case it is a member of the set of truths to degree n. Thus degrees of truth can be formally extracted from fuzzy set theory. However, unlike the Łukasiewicz semantics outlined earlier, Zadeh [1975] goes on to develop a semantic theory that replaces numerical truth-values (e.g. 1, 0.76, 0.40, etc.) with non-numerical expressions like “very true”, “very not true”, etc. Ultimately, though, such linguistic truth-values depend upon numerical truth-values and so the difference may not be as great as it first appears.
In fact, despite Goguen and Zadeh’s advocacy of their particular variants of truth-functional infinite-valued logic, to the extent that their project has been taken up by subsequent theorists of vagueness it has been primarily pursued without the variations on the Łukasiewicz approach described so far. Continuum-many numerical truth-values and a semantics for the logical constants as described by Łukasiewicz remain the preferred basis for the development of truth-functional infinite-valued approaches to vagueness. (Lakoff [1973] differs only marginally as regards the clause for ‘→’, requiring that ν(A → B) = 1 iff ν(A) ≤ ν(B).) Machina [1976] develops just such an approach. Building on the Łukasiewicz semantics outlined, validity in his system, M, is defined as “truth-preservation” so that A is a valid consequence of a set of sentences Σ (Σ ⊢M A) if and only if, for all ν, ν(A) ≥ min{ν(B) : B ∈ Σ}. Subsequently generalizing on the notion of validity, Machina [1976, 70] goes on to define a broader notion of “degree of truth-preservation” possessed by an argument form. The argument form ‘Σ therefore A’ is truth-preserving to degree n (0 ≤ n ≤ 1) in M if and only if 1 − n is the least upper bound on the magnitude of the drop in truth-value from the least true premise to conclusion under any valuation. (Where the degree of truth of the conclusion is never less than the least true premise then we stipulate that n = 1.) An argument form that is truth-preserving to degree 1 is then obviously one such that no valuation makes the conclusion any less true than the least true premise, and thus the argument form is valid as earlier defined. Machina also proposes replacing the notion of tautology with the notion of a “minimally n-valued formula”: a formula is minimally n-valued if and only if it can never have a truth-value less than n. It is easy to verify that all classical tautologies are minimally 1-valued when restricted to classical values 0 and 1, as expected.
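The notions of degree of truth-preservation and of a minimally n-valued formula can be explored numerically. The sketch below is only an approximation under stated assumptions: it samples the interval [0, 1] at multiples of 1/20 rather than quantifying over all valuations, it takes the relevant bound to be the largest drop observed over the sample (the reading on which degree 1 coincides with validity), and its function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

GRID = [Fraction(k, 20) for k in range(21)]   # assumed finite sample of [0, 1]
CLASSICAL = [Fraction(0), Fraction(1)]

def neg(a): return 1 - a
def conj(a, b): return min(a, b)
def disj(a, b): return max(a, b)
def cond(a, b): return Fraction(1) if a <= b else 1 - a + b

def minimal_value(formula, n_atoms, grid=GRID):
    """Smallest value the formula takes on the sampled valuations."""
    return min(formula(*vs) for vs in product(grid, repeat=n_atoms))

def degree_of_truth_preservation(premises, conclusion, n_atoms, grid=GRID):
    """1 minus the largest sampled drop from least true premise to conclusion.
    Requires at least one premise."""
    largest_drop = Fraction(0)
    for vs in product(grid, repeat=n_atoms):
        drop = min(p(*vs) for p in premises) - conclusion(*vs)
        largest_drop = max(largest_drop, drop)
    return 1 - largest_drop

lem = lambda a: disj(a, neg(a))
lnc = lambda a: neg(conj(a, neg(a)))

modus_ponens = degree_of_truth_preservation(
    [lambda a, b: a, lambda a, b: cond(a, b)], lambda a, b: b, 2)
simplification = degree_of_truth_preservation(
    [lambda a, b: conj(a, b)], lambda a, b: a, 2)

print(minimal_value(lem, 1, CLASSICAL), minimal_value(lem, 1), minimal_value(lnc, 1))
print(modus_ponens, simplification)
```

On this sample excluded middle and non-contradiction never fall below 1/2, modus ponens is truth-preserving to degree 1/2 (the worst case has ν(A) = 1/2 and ν(B) = 0), and the inference from A & B to A is truth-preserving to degree 1.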
However, when vague propositions are considered it is easy to see that the law of excluded middle and law of non-contradiction (for example) are only minimally 0.5-valued. On the system thus described, modus ponens and mathematical induction are invalid since they are not completely truth-preserving. Correspondingly, the conditional sorites and mathematical induction sorites are invalid and, for that reason, unsound. The line drawing sorites, though not discussed by Machina, is similarly not completely truth-preserving and so unsound; its premises may take value 1 while its conclusion is as low as 0.5.
Edgington [1992; 1996] advocates a distinctly different logic which, while invoking a degree theory building on a continuum-valued truth-set, is nonetheless non-truth-functional. A non-truth-functional approach is said to be required following consideration of a range of cases where truth-functionality leads to evaluations of complex sentences that appear counterintuitive. (See [Edgington, 1996, 304–5].) Where borderline predications give rise to sentences A and B whose degrees of truth, or “verity” (as Edgington puts it), are somewhere between 1 and 0, the degree of truth of their conjunction A & B (say) will sometimes be given by the minimum value of the conjuncts, as predicted by Łukasiewicz semantics, but not always. Łukasiewicz semantics simply gets things wrong, so it is claimed.
Logics of Vagueness
Moreover, despite C and B having the same verity, the verities of A & B and A & C might nonetheless differ. Consequently no truth-functional semantics, Łukasiewicz or otherwise, will suffice. Instead, a systematic account of the logical constants is adapted from probability theory. Thus, where ν(B given A), the conditional verity of B given A, is the value assigned to B on the hypothetical decision to count A as definitely true (i.e. ν(A) = 1):

\[
\begin{aligned}
\nu(\neg A) &= 1 - \nu(A) \\
\nu(A \,\&\, B) &=
  \begin{cases}
    \nu(A) \times \nu(B \text{ given } A), & \text{if } \nu(A) \neq 0 \\
    0, & \text{if } \nu(A) = 0
  \end{cases} \\
\nu(A \vee B) &= \nu(A) + \nu(B) - \nu(A \,\&\, B)
\end{aligned}
\]

The semantics for the conditional is side-stepped in Edgington [1996] on the simple grounds that the sorites paradox can be framed without such a connective, replacing the material conditional A → B with ¬(A & ¬B), as recommended by the Stoics, but the account offered in Edgington [1992] is said to be “tempting”, namely: ν(A → B) = ν(B given A). Quantification generalizes upon the connectives, with universal quantification analogous to an extended conjunction of each of its instances and existential quantification analogous to an extended disjunction of each of its instances. Subsequently, sentences of the form ∀xFx may be false without any instance being false if the verity of each instance is slightly less than 1 and sentences of the form ∃xFx may be true if the verity of each instance is slightly more than 0.
Validity in Edgington’s system E is now defined so that A is a valid consequence of a set of sentences Σ = {B1, B2, ..., Bn} (i.e. {B1, B2, ..., Bn} ⊢E A) if and only if, for all ν, ν(¬A) ≤ ν(¬B1) + ν(¬B2) + ... + ν(¬Bn). That is, Σ ⊢E A if and only if for no evaluation does the “unverity” of its conclusion A, i.e. 1 − ν(A) or equivalently ν(¬A), exceed the sum of the unverities of the premises. Validity is a matter of verity-preservation in this sense, or, as Edgington puts it, “valid arguments have the verity-constraining property [just described]”.
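The clauses above can be tried out on small numbers. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, conditional verities are supplied by hand rather than derived from anything, and the sample values (a borderline sentence A of verity 1/2, and premises of verity 99/100) are chosen for the example and are not taken from Edgington.

```python
from fractions import Fraction

def v_not(a):
    """Verity of a negation: 1 - v(A)."""
    return 1 - a

def v_and(a, b_given_a):
    """Verity of A & B from v(A) and the conditional verity v(B given A)."""
    return a * b_given_a if a != 0 else Fraction(0)

def v_or(a, b, a_and_b):
    """Verity of A v B: v(A) + v(B) - v(A & B)."""
    return a + b - a_and_b

def verity_constrained(premises, conclusion):
    """Check, for one assignment of verities only, that the conclusion's
    unverity does not exceed the summed unverities of the premises."""
    return v_not(conclusion) <= sum(v_not(p) for p in premises)

a = Fraction(1, 2)   # a borderline sentence A (hypothetical value)
not_a = v_not(a)     # also 1/2, so A and not-A share a verity

# Counting A as definitely true settles A as true and not-A as false,
# so v(A given A) = 1 and v(not-A given A) = 0 (supplied by hand here).
a_and_a = v_and(a, Fraction(1))
a_and_not_a = v_and(a, Fraction(0))
a_or_not_a = v_or(a, not_a, a_and_not_a)

print(a_and_a)      # 1/2
print(a_and_not_a)  # 0
print(a_or_not_a)   # 1

# Small unverities add up across many premises.
near_certain = Fraction(99, 100)
print(verity_constrained([near_certain] * 100, Fraction(0)))  # True
print(verity_constrained([near_certain] * 99, Fraction(0)))   # False
```

Here A and ¬A have the same verity, yet A & A and A & ¬A receive different verities, which is the kind of case no truth-functional clause for conjunction can accommodate. The last two lines show that a conclusion of verity 0 is compatible with the verity constraint once a hundred premises each fall short of verity 1 by one hundredth.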
Such a logic is taken to validate the theorems and inferences characteristic of classical logic: Σ ⊢E A if and only if Σ ⊢CL A. Of course, the logical structure of verity and E more generally is, as formulated by way of the axioms of probability theory, equivalent to the logical structure of probability, and this is taken to be classical. For this reason the consequence relation of the logic of vagueness is classical. But this is not essential to the approach being advocated. As Edgington makes clear, and as we saw in the case of supervaluationist approaches to vagueness, the point is rather that to the
extent that one takes the consequence relation to be classical in the absence of vagueness, its extension to accommodate inference involving vague expressions does not thereby undermine its claim to be classical. Vagueness necessitates the recognition of degrees of truth, or verities, but does not necessitate a weakening of the consequence relation (in contrast to other approaches considered above that invoke degrees of truth). Thus, one might, for example, have independent reason for rejecting classical logic in favour of intuitionist or relevant logic prior to any consideration of the puzzle posed by vagueness and go on to account for vagueness by invoking verity whose logical structure is nonetheless equivalent to that of the underlying logic, either intuitionist or relevant. This is achieved by appealing to an appropriately non-classical probability theory when adapting principles governing probabilities of negation, conjunction, etc. to a semantics and logic of vagueness. (See [Weatherson, 2004].)
Not only is E’s consequence relation classical, the principle of bivalence is also said to be preserved despite the appeal to degrees of truth. Given a conception of truth satisfying the T-schema, i.e. disquotational truth T:

\[ \vdash_E TA \vee T\neg A \]

Taken as an expression of bivalence, bivalence is thus accepted; ν(A is true ∨ A is false) = ν(A is true ∨ ¬A is true) = 1 always. It is always determinately the case that any sentence A is either true or false. This is to be distinguished from the claim that it is always either determinately the case that A is true or determinately the case that A is false, i.e. ν(A is true) = 1 or ν(¬A is true) = ν(A is false) = 1. Determinacy, i.e. having verity 1, does not distribute across disjunction; the acceptance of the law of excluded middle (for all ν, ν(A ∨ ¬A) = 1) even though, for some ν, ν(A) = 0.5 further illustrates the point.
Every sentence is either (disquotationally) true or false but not every sentence has verity either 1 or 0. As in McGee and McLaughlin’s [1995] brand of supervaluationism, which distinguishes bivalent disquotational truth from gap-tolerating correspondence truth, Edgington distinguishes disquotational truth from verity. Validity requires verity-preservation, thus invoking a non-bivalent evaluation of vague expressions in the context of argument evaluation. In this specific sense then E is a logic that is “many-valued” in its evaluation of vague language. Yet, like (classical) supervaluationism, it is non-truth-functional and validates classical logic. The formal similarities run deeper. Edgington [1996], Lewis [1970] and Kamp [1975] all point to the possibility of defining a measure of verity within a supervaluationist semantics by means of a measure on the space of admissible precisifications making a sentence true. From the perspective of E, however, supervaluationism is just a logic of verity where only 1 and 0 are recognized as truth-values, with all intermediate verities collectively treated as “neither true nor false”. Supervaluationism ignores any more fine-grained evaluation, including any continuum-valued one such as figures in E. Continuum-many values are said to
be required yet the semantics described above (Goguen, Łukasiewicz) predict implausible semantic values for complex sentences (e.g. conjunctions) due to their truth-functionality. E is proposed as the requisite synthesis. Like the Stoic logicians, Edgington [1996] treats the conditional sorites in its weakest form, assuming only a material conditional.

\[
\begin{array}{l}
Fa_1 \\
\neg(Fa_1 \,\&\, \neg Fa_2) \\
\neg(Fa_2 \,\&\, \neg Fa_3) \\
\quad\vdots \\
\neg(Fa_{i-1} \,\&\, \neg Fa_i) \\
\hline
\therefore\ Fa_i \quad \text{(where $i$ can be arbitrarily large)}
\end{array}
\]

for appropriate a₁, ..., aᵢ. Since detachment for the conditional amounts to the classically valid inference ‘{A, ¬(A & ¬B)} therefore B’, the argument is valid in E. The reasoning is not at fault; the fault lies instead in the failure to pay due heed to the admittedly small but nonetheless significant unverity of each of the material conditional premises. The premises are not all of verity 1 and the accumulated unverity results eventually in a conclusion of verity 0. The mathematical induction sorites, again considered in its weakest form with major premise ∀n¬(Faₙ & ¬Faₙ₊₁), is similarly valid but its major premise is now “clearly false”, taking verity 0. And as with supervaluationists’ SV, its classical negation, ∃n(Faₙ & ¬Faₙ₊₁), is “clearly true”, taking verity 1. And so the air of paradox that shrouds SV’s commitment to such a truth in resolving the mathematical induction form of the sorites envelops E. As in SV, the commitment to:
(a) its being clearly true that ∃n(Faₙ & ¬Faₙ₊₁),
is sharply distinguished from:
(b) ∃n for which it is clearly true that (Faₙ & ¬Faₙ₊₁).
Again, clear truth, i.e. determinacy, does not interact with the quantifiers in the expected manner. Whether this anomaly can be satisfactorily defended against for either SV or E remains contested. It amounts, quite simply, to an acceptance of the line-drawing sorites as sound. The language of E can be extended to include a determinacy operator D. For any sentence A, let DA count as true if and only if ν(A) = 1. A semantics for D can then be given in a manner analogous to that given for SV and a modal-like logic results. If the relation between E and SV is as strong as suggested then the logic of determinacy proposed by system E will be in the vicinity of the modal logic KT.

7 CONTEXTUALISM
In late twentieth century discussions of vagueness and the sorites paradox, some theorists dissatisfied with the foregoing logical responses to vagueness began to
develop responses to vagueness that emphasised the role of context. On this view, the key feature underlying puzzlement about vagueness is our failure to properly attend to the role played by context, especially when considering the sorites paradox. The suggestion that context might play a role in the analysis of vagueness can be traced back to remarks in Lewis [1969] where an alternative to a supervaluationist semantic account is developed. Though a strong advocate for a supervaluationist semantics, Lewis also considers an approach where vagueness does not reside in any language but arises by virtue of speakers invoking different precise languages in different contexts. So, for example, the lack of a sharp boundary to the application of ‘heap’ is a matter of there being a range of precise languages each of which draws the (sharp) boundary somewhere in the predicate’s penumbra, but no one of which is invariably selected by any speaker as the language appropriate across a range of contexts; in some contexts the line is drawn in one place, and in other contexts it will be drawn elsewhere in the predicate’s penumbra. Vagueness is thus a matter of contextually sensitive choice as to which precise language to use from the admissible range of such languages (i.e. those consistent with our intentions and beliefs). Unlike supervaluationist responses, the range of admissible choices is not invoked to characterise the vague semantic behaviour of vague expressions. Rather, each choice selects a language whose expressions are precise, and the indeterminacy characteristic of vagueness is a consequence of there being no fact of the matter, invariant across the various contexts of use, as to which precise language from the range of languages available for choice is selected. Burns [1991; 1995] takes up and develops this pragmatic approach to vagueness. (See [Keefe, 2000, ch. 6] for discussion.) A “logic of vagueness” on this approach is something of a misnomer.
Vagueness is not a logical feature of any expressions of a language. Language (or, each language of the cluster of languages we are taken to use) is precise and its logic is a matter of debate into which vagueness does not enter. If, ignoring vagueness, one takes logic to be classical then there is an end to it so far as discussions of vagueness are concerned. As a pragmatic phenomenon its solution lies in pragmatics, not logic. One problem with the foregoing analysis is that it supposes that in any fixed context of use we employ a language that is precise. We are, it is supposed, free to draw boundaries to the extension of a vague term, and actually do so whenever choosing a language in a context of use (though in different contexts we draw boundaries in different places). We may dither about exactly which language to employ (and so dither as to where we will draw the boundary) but any language chosen will be precise. And this seems counterintuitive. Any boundary seems implausible in principle. Some contextualists, like Kamp [1981], explicitly propose an approach which seeks to account for the lack of sharp boundaries in the extension of vague terms by proffering an explanation as to how such boundaries will never be found wherever one looks for them. Confronted with any pair of items in a series with regard
to which the predicate in question is soritical, the predicate is always interpreted in such a way as to not distinguish between them. For example, ‘heap’ is never interpreted in a context so as to apply to one of an indistinguishable pair of piles of wheat and not the other. This overriding demand produces contextual shifts along a sorites series (akin to “Gestalt shifts”) whereby the predicate is re-interpreted so as to comply, i.e. to not distinguish between adjacent items. Vague predicates thus appear “tolerant” since contextual variation in their interpretation masks any relevant boundaries that may exist in the series. Predating Burns [1991], Kamp proposed a non-classical semantics for vagueness. The demand to not distinguish between elements of an indistinguishable pair is sufficient, on Kamp’s view, to make true every instance of the universally quantified premise of the mathematical induction sorites. Yet the universally quantified premise itself is false, and a non-classical analysis of the quantifier is proposed. (While some other approaches propose a semantics for the quantifier that is closely similar, e.g. Edgington [1996], Kamp invokes context in his analysis while Edgington invokes degrees of truth.) How such a proposal deals with the conditional sorites is not clear. The line drawing sorites can be declared sound. The counterintuitive falsity of the major premise of the induction sorites and truth of the conclusion of the line drawing form is said to arise as a result of our confusing the fact that there is no boundary in the region of the sorites series we are attending to with the claim that there is no boundary anywhere in the series. Like Burns, Raffman [1994; 1996] agrees with Kamp that the mathematical induction form has a false major premise. However, Raffman retains a standard semantics for the universal quantifier; the conditional sorites is accordingly valid but has some false premise.
Again, appearances to the contrary fail to properly account for context by failing to notice that truth can be secured for all the conditionals together only by equivocating on context. Graff [2000] also pursues a classical approach which, like foregoing contextualist approaches, appeals to hidden parameters to account for misleading appearances underwriting the sorites paradox. According to Graff’s “interest-relative” account, vague predicates express properties that are interest-relative in the sense that their extensions are determined by what counts as significant for an individual x at a time. For example, ‘is a tall building’ as used in a context by an individual x expresses the property of being significantly taller for x than an average building. Given the variation of facts over time then (as opposed to the variation in the context of use, as described in earlier accounts) the extension of the univocal property expressed by the vague predicate will vary since what is or is not significant for an individual varies over time (as opposed to the variation appealed to by contextualists due to equivocal properties being expressed in varied contexts). Consequently, like contextualist solutions, the conditional sorites appears sound only because we fail to heed variation in background parameters relevant to the evaluation of the various conditionals. Assertions of their joint truth equivocate on temporal indices. Soames [1999] uses context-sensitivity to defend a tripartite picture of vague
predicates, postulating boundaries between the extension, the anti-extension, and the borderline cases. Subsequently coupled with Kleene’s strong, three-valued logic K3, this non-classical contextualism denies the truth of the universally quantified, major premise of the mathematical induction sorites while nonetheless also denying its falsity. (Tappenden [1993] suggests a very similar three-valued approach which also appeals to context to explain the apparent truth of the universally quantified premise.) The conditional sorites also admits of solution. Accepting the standard (three-valued) truth-conditions for the universal quantifier, Soames takes the conditional sorites to have some non-true conditional premise. (For arguments for a non-bivalent approach see Tappenden [1993] and Soames [2002]. For argument for the classical variant, see Williamson [2002].) Again, the counterintuitive nature of the postulated boundaries is said to be dispelled once we properly distinguish the fact of there being no boundary in the local region of the sorites series being attended to from the stronger (false) claim that there is no boundary globally, i.e. anywhere in the sorites series.
Whether or not the challenge posed by vagueness, and the sorites paradox in particular, can be recast in such a way that appeal to variation in contextual or other indices is ruled out (and therewith an analysis of the paradox as a fallacy of equivocation of some sort) is a matter of debate. Stanley [2003] points to new versions of the paradox seemingly resistant to at least some contextualist analyses. Moreover, the contextualist accounts considered, whether coupled with classical logic (or, more generally, one’s preferred logic of non-vague language) or a logic amended to accommodate vagueness, still postulate boundaries, though boundaries that are admittedly not locally discriminable, i.e. not where we’re looking.
But we may feel, contra the contextualist, that the reason there are none locally, none where we look, is exactly because there are none globally, i.e. none per se. Contextualists presumably think that this gets things the wrong way around; our thinking there are none globally supervenes upon our finding none locally. Whether this can satisfactorily dispel the burden incurred by postulating boundaries is disputed. (See Priest [2003], for example.) Finally, can a contextualist analysis adequately address the phenomenon of higher-order vagueness? If the indeterminacy associated with vagueness is to be explained by appealing ultimately to variation in context then higher-order vagueness must, it seems, be explained by appealing to higher-order variation in context. For example, if ‘borderline red’ is vague then it seems that there must be relevant contextual variability in our use of the term, a term whose applicability itself depends on the presence or absence of contextual variability in our use of the term ‘red’. There may then, it seems, be contextual variability in whether or not there is contextual variability in our use of a higher-order vague term. Whether contextualism pursues this iterative idea, and whether it can succeed, presents an interesting challenge. (See Soames [2003] for discussion in respect of one version of contextualism.) The challenge is not restricted to contextualism. Higher-order vagueness presents challenges for the other approaches to vagueness canvassed too.
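To make the three-valued diagnosis attributed to Soames in this section more concrete, here is a small sketch using the strong Kleene tables, with 1, 1/2 and 0 standing for true, neither, and false. The six-item series and its values are invented for illustration, the conditional is read as max(1 − ν(A), ν(B)), and the universal quantifier as a minimum over instances. These are the usual strong Kleene clauses, assumed here rather than quoted from the text, and the function names are hypothetical.

```python
from fractions import Fraction

TRUE, NEITHER, FALSE = Fraction(1), Fraction(1, 2), Fraction(0)

def k3_not(a):
    """Strong Kleene negation."""
    return 1 - a

def k3_or(a, b):
    """Strong Kleene disjunction: the maximum."""
    return max(a, b)

def k3_cond(a, b):
    """Strong Kleene conditional, read as not-A or B."""
    return k3_or(k3_not(a), b)

def k3_forall(values):
    """Universal quantification as the minimum over the instances."""
    return min(values)

# Hypothetical sorites series: extension, borderline cases, anti-extension.
series = [TRUE, TRUE, NEITHER, NEITHER, FALSE, FALSE]

# One conditional premise for each adjacent pair in the series.
conditionals = [k3_cond(series[n], series[n + 1])
                for n in range(len(series) - 1)]
major_premise = k3_forall(conditionals)

print(major_premise)  # 1/2
```

In this toy series no conditional premise is false, but three of the five are non-true, so the universally quantified major premise comes out neither true nor false.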
8 HIGHER-ORDER VAGUENESS
The sorites paradox derives its force from two competing thoughts. Firstly, since a vague predicate F draws no sharp boundaries within its range of significance there can be no sharp boundary between its extension (those things satisfying F) and its anti-extension (those things satisfying not-F). Secondly, since there appears, in fact, to be a transition along a sorites series from satisfiers of F to satisfiers of not-F there must surely come a point nonetheless where F-satisfaction gives out so as to avoid describing satisfiers of not-F as satisfiers of F. The puzzlement arises from the challenge to explain how a transition can occur if not at some sharp boundary. How are we to describe a vague transition? An initial thought, as we have seen, is that determinate or clear F-satisfaction does indeed give out somewhere but does so in such a way as to not immediately result in determinate or clear satisfaction of not-F, proceeding instead by way of borderline cases. The absence of a sharp boundary between the Fs and not-Fs is the absence of any point at which there is a determinate or clear change from F to not-F. That is to say, there is no object in a sorites series, aₙ, such that aₙ is determinately F while aₙ₊₁ is determinately not-F; to suppose there is is to rule out the existence of borderline cases, the shadowy denizens that constitute the penumbra between the Fs and not-Fs. But the penumbra itself also appears to be shadowy. There appears to be no sharp boundary between those objects that are penumbral cases and those that are non-penumbral cases (i.e. objects that are either determinately F or determinately not-F). As Russell [1923, 87] notes:
The fact is that all words are attributable without doubt over a certain area, but become questionable within a penumbra, outside of which they are again certainly not attributable.
Someone might seek to obtain precision in the use of words by saying that no word is to be applied in the penumbra, but unfortunately the penumbra itself is not accurately [precisely] definable, and all the vaguenesses which apply to the primary use of words apply also when we try to fix a limit to their indubitable applicability.
There appears to be no more a sharp, determinate cut-off to the borderline Fs than there was to the Fs. So too for the determinate Fs, the borderline borderline Fs, the borderline determinate Fs, etc. The vagueness of F appears to bring in its wake the vagueness of vaguely F, determinately F, vaguely vaguely F, vaguely determinately F, and so on. There would appear to be higher orders of vagueness; a vague predicate F admits of borderline cases, borderline clear cases, borderline borderline cases, etc., or so it seems. Epistemic approaches to vagueness characterize vagueness as a matter of ignorance as to where boundaries lie; the lack of a determinate boundary for F is a matter of the lack of a known boundary for F. Higher-order vagueness is then a matter of the lack of a known boundary for ‘known F’ or ‘determinately F’.
We cannot know the limits to the applicability of the predicate ‘known F’. A sorites paradox that uses the predicate ‘known F’ is solved in the same manner as a sorites involving F by claiming that there is some boundary to its application but the boundary is unknown and therefore indeterminate or unclear. Just as something’s being F does not entail that it is known to be F, something’s being known to be F does not entail that it is known to be known to be F. The analogue of the KK principle therefore fails for D. The logic of determinacy C is such that ⊬C DA → DDA. Higher-order vagueness precludes a logic as strong as S4. Williamson [1994, Appendix; 1999] further argues for a logic of determinacy as weak as the modal logic KT. For similar reasons, as already noted when discussing SV, supervaluationism’s analysis of vagueness has been similarly argued to generate a logic of determinacy equivalent to the modal logic KT. Higher-order vagueness undermines higher-order D-strengthening principles such as:

\[ \vdash DA \rightarrow DDA \qquad\qquad \vdash \neg DA \rightarrow D\neg DA \]

Just as first-order vagueness invalidates ⊢ A → DA, higher-order vagueness is taken to invalidate its higher-order analogues. (Dummett [1975] suggested that higher-order vagueness would render the logic weaker than S4. Fine [1975] specified KT. Williamson [1999] provides formal argument.) In this way, the extended language of SV admits of the phenomenon. But given the definition of the determinacy operator D, higher-order vagueness (for example, the possibility of there being borderline cases of ‘determinately F’) points to the possibility of its being neither true nor false that it is true that A (for example, it is neither true nor false that it is true that Fa). In such cases “the truth-value status of A (whether it is true, false or lacks a value) remains unsettled” [Keefe, 2000, 203]. With vagueness treated as a semantic phenomenon, higher-order vagueness is reflected in the vagueness of key semantic concepts in the metalanguage.
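The failure of the D-strengthening principles discussed above in a logic no stronger than KT can be checked on a toy model. The sketch below treats D as a box-like operator over three points with a reflexive but non-transitive accessibility relation. The points, the relation and the valuation are all hypothetical, and reading D this way is an assumption suggested by the comparison with modal logic, not a construction given in the text.

```python
# Three points with a reflexive, symmetric, non-transitive relation:
# 0 sees 1, 1 sees 2, but 0 does not see 2.
WORLDS = (0, 1, 2)
ACCESS = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2}}

# Hypothetical valuation for a sentence A.
A = {0: True, 1: True, 2: False}

def D(prop):
    """Determinately: true at w iff prop holds at every point w can see."""
    return {w: all(prop[u] for u in ACCESS[w]) for w in WORLDS}

def neg(prop):
    return {w: not prop[w] for w in WORLDS}

def implies(p, q):
    return {w: (not p[w]) or q[w] for w in WORLDS}

def valid(prop):
    """True iff prop holds at every point of this one model."""
    return all(prop.values())

DA = D(A)
print(valid(implies(DA, A)))               # True:  DA -> A holds (reflexivity)
print(valid(implies(A, DA)))               # False: A -> DA fails at point 1
print(valid(implies(DA, D(DA))))           # False: DA -> DDA fails at point 0
print(valid(implies(neg(DA), D(neg(DA))))) # False: ~DA -> D~DA fails at point 1
```

Reflexivity alone secures DA → A, while the missing link between points 0 and 2 is what lets DA hold at point 0 without DDA holding there. A single model of this kind shows only that the principles are not forced by reflexivity; it does not by itself establish which logic of determinacy is correct.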
Truth is vague; more specifically, ‘admissible precisification’ is vague. And so too ‘borderline case’. Many-valued logics are commonly thought to encounter particular difficulties in the face of higher-order vagueness. Consider three-valued approaches. If higher-order vagueness in the object language is modelled using a vague higher-order (i.e. meta-) language so that, for example, it might be indeterminate whether a sentence A was indeterminate, then A would have to be said to be ‘indeterminately indeterminate’. Since this value is distinct from any other (it is claimed), such vagueness in the metalanguage can be seen to necessitate the introduction of a new truth-value. Similarly, it is claimed that if there are sentences for which it is neither true nor false that they are true, then there must be sentences ‘that are neither true nor false nor indeterminate’ (see especially [Tye, 1990; 1994]). The apparently trivalent theory proves not to be trivalent after all.
More generally, it might be thought that if a many-valued logic of any valency, finite or infinite, admits that a metalinguistic sentence assigning some given intermediate value to A itself receives an intermediate value then the proposed logic is threatened with incoherence. To admit that it might be anything other than true or false whether a sentence A takes a particular value from the logic’s truth set, irrespective of what that truth set is, is to admit that the truth-set does not exhaust the range of values sentences of the logic can take. But then the semantics is incomplete since it defines the logical behaviour of the object language only in respect of that now-admittedly incomplete truth-set. (See [Williamson, 1994, 112; Keefe, 2000, 121].) Whether this is ultimately telling against any many-valued logic (as Keefe suggests) or even against finitely-valued logics (as Williamson suggests), the challenge has been posed. Is the presumption of supervaluationism’s immunity to this supposed problem justified? If so, why does higher-order vagueness present a particular challenge to truth-functional logics? One response is to deny the phenomenon of higher-order vagueness and therewith deny that ‘true’ is vague, but to do so without accepting that there are no higher orders of vagueness. Truth is not vague, nor, however, is it precise. For to suggest it was precise would be to (wrongly) posit a sharp boundary between the truths and non-truths. Truth is therefore vaguely vague (i.e. neither determinately vague nor determinately not vague) and it is vague whether there are any higher-orders of vagueness. Tye [1994] pursues such a response in the context of advocating K3 as the logic of determinacy. (See [Keefe, 2000, 121–2; Hyde, 2003] for criticism.) Burgess [1990] also denies higher-order vagueness to the extent that, at least for some predicates, vagueness may (determinately) terminate at some finite level. 
And Wright [1992] argues that the higher-order phenomenon is incoherent in a way that first-order vagueness is not. (See [Edgington, 1993; Heck, 1993] for criticism.) Another response is to treat a many-valued model of mere first-order vagueness as improving upon a classical model and claim that such a first-order simplification of the phenomenon being modelled, despite proposing a precise model of a vague phenomenon (thus ignoring higher-order vagueness), nonetheless provides adequate understanding of the logical puzzle posed by vagueness to enable us to see our way out. Theorizing in this way presents an idealization, but is none the worse for that. (See [Edgington, 1996, 308–9] for example.) Whether such an instrumental approach can be defended is discussed in Keefe [2000, 123ff].
In the end, higher-order vagueness brings us back to the puzzle as originally posed by Eubulides. Higher-order vagueness reaffirms the simple idea that vague predicates draw no sharp boundaries. Not only is there no such boundary between a vague predicate’s extension and its anti-extension, nor between the true and the false, etc., there is no sharp boundary, no sharp line, anywhere. As Eubulides’ puzzle makes clear, we encounter difficulty when asked where to draw the line in the application of a vague predicate to a series of objects, each seemingly indiscriminably different (in relevant respects) from its neighbour. Whether we are asked to draw a line between true and false applications, or between the early members of the series to which we are happy to apply the predicate and those where any different answer is warranted, i.e. exhibiting the first element of doubt, the first lowering of credence, or a lesser degree of truth, or the absence of determinate determinate truth, any such line seems counterintuitive. Yet to not change one’s answer at some point seems to demand of us that we continue to give our original assent to its application where it appears to clearly not be appropriate. For all their logical sophistication, logical theories addressing the sorites paradox must still make sense of the original puzzle bequeathed to us by Eubulides.

BIBLIOGRAPHY
[Arruda, 1989] A. I. Arruda. Aspects of the Historical Development of Paraconsistent Logic, in G. Priest, R. Routley and J. Norman (eds), Paraconsistent Logic: Essays on the Inconsistent, Philosophia, 1989.
[Arruda and Alves, 1979] A. I. Arruda and E. H. Alves. Some Remarks On the Logic of Vagueness, Bulletin Section of Logic, Polish Academy of Sciences 8: 133–8, 1979.
[Barnes, 1982] J. Barnes. Medicine, Experience and Logic, in J. Barnes, J. Brunschwig, M. Burnyeat and M. Schofield (eds), Science and Speculation, Cambridge University Press, pp. 24–68, 1982.
[Burgess, 1990] J. A. Burgess. The Sorites Paradox and Higher-Order Vagueness, Synthese 85: 417–74, 1990.
[Burns, 1991] L. Burns. Vagueness: An Investigation into Natural Languages and the Sorites Paradox, Kluwer, 1991.
[Burns, 1995] L. Burns. Something to do with Vagueness, Southern Journal of Philosophy 33 (supplement): 23–47, 1995.
[Da Costa and Doria, 1995] N. C. A. Da Costa and F. A. Doria. On Jaśkowski’s Discussive Logics, Studia Logica 54: 45, 1995.
[Da Costa and Wolf, 1980] N. C. A. Da Costa and R. G. Wolf. Studies in Paraconsistent Logic I: The Dialectical Principle of the Unity of Opposites, Philosophia 9: 189–217, 1980.
[Diogenes, 1925] Diogenes Laërtius. Lives of Eminent Philosophers, (translated and edited by R. D. Hicks) Harvard University Press, 1925.
[Dummett, 1975] M. Dummett. Wang’s Paradox, Synthese 30: 301–24, 1975.
[Edgington, 1992] D. Edgington. Validity, Uncertainty and Vagueness, Analysis 52: 193–204, 1992.
[Edgington, 1993] D. Edgington. Wright and Sainsbury on Higher-Order Vagueness, Analysis 53: 193–200, 1993.
[Edgington, 1996] D. Edgington. Vagueness by Degrees, in Keefe and Smith, pp. 294–316, 1996.
[Field, 2003] H. Field. No Fact of the Matter, Australasian Journal of Philosophy 81: 457–80, 2003.
[Fine, 1975] K. Fine. Vagueness, Truth and Logic, Synthese 30: 265–300, 1975.
[Frege, 1903] G. Frege. Grundgesetze der Arithmetik, Vol. II; translated in P. Geach and M. Black (eds), Translations from the Philosophical Writings of Gottlob Frege, 1903; 3rd edn, Blackwell (1980).
[Galen, 1987] Galen. On Medical Experience, 16.1–17.3; translated in A. A. Long and D. N. Sedley, The Hellenistic Philosophers, Cambridge University Press, Vol. 1, p. 223, 1987.
[Goguen, 1969] J. Goguen. The Logic of Inexact Concepts, Synthese 19: 325–78, 1969.
[Graff, 2000] D. Graff. Shifting Sands: An Interest-Relative Theory of Vagueness, Philosophical Topics 28: 45–81, 2000.
[Halldén, 1949] S. Halldén. The Logic of Nonsense, Uppsala: Uppsala Universitets Årsskrift, 1949.
Logics of Vagueness
323
[Heck, 1993] R. Heck. A Note on the Logic of (Higher-Order) Vagueness, Analysis 53: 201–8, 1993. [Hyde, 1997] D. Hyde. From Heaps and Gaps to Heaps of Gluts, Mind 106: 641–60, 1997. [Hyde, 2003] D. Hyde. Higher Orders of Vagueness Reinstated, Mind 112: 301–5, 2003. [Ja´skowski, 1948/1969] S. Ja´skowski. Propositional Calculus for Contradictory Deductive Systems, Studia Logica 24 (1969): 143–57, 1969. Originally published in Polish in Studia Scientarium Torunensis, Sec. A II: 55–77, 1948. [Kamp, 1975] J. A. W. Kamp. Two Theories about Adjectives, in E. Keenan (ed.), Formal Semantics of Natural Languages, Cambridge University Press, pp. 123–55, 1975. [Kamp, 1981] J. A. W. Kamp. The Paradox of the Heap, in U. M¨ onnich (ed.), Aspects of Philosophical Logic, Reidel, pp. 225–77, 1981. [Keefe, 2000] R. Keefe. Theories of Vagueness, Cambridge University Press, 2000. [Keefe and Smith, 1996] R. Keefe and P. Smith, eds. Vagueness: A Reader, Cambridge Mass.: MIT Press, 1996. [Kleene, 1938] S. C. Kleene. On a Notation for Ordinal Numbers, Journal of Symbolic Logic 3: 150–5, 1938. [K¨ orner, 1960] S. K¨ orner. The Philosophy of Mathematics, Hutchinson, London, 1960. [Lakoff, 1973] G. Lakoff. Hedges: A Study in Meaning Criteria and the Logic of Fuzzy Concepts, Journal of Philosophical Logic 2: 458–508, 1973. [Lewis, 1969] D. Lewis. Conventions, Harvard University Press, 1969. [Lewis, 1970] D. Lewis. General Semantics, Synthese 22: 18–67, 1970. [Lewis, 1982] D. Lewis. Logic for Equivocators, Nous 16: 431–41, 1982. [Lewis, 1983] D. Lewis. Philosophical Papers, Oxford: Oxford University Press, 1983. [Lewis, 1986] D. Lewis. On the Plurality of Worlds, Oxford: Basil Blackwell, 1986. [L ukasiewicz and Tarski, 1930] J. L ukasiewicz and A. Tarski. Untersuchungen u ¨ ber den Aussagenkalkul, Comptes rendus des s´ eances de la Soci´ et´ e des Sciences et des Lettres de Varsovie 23: 1–21, 30–50, 1930. Reprinted as Investigations into the Sentential Calculus in A. 
Tarski, Logic, Semantics, Metamathematics, (ed. by J. Corcoran, translated by J.H. Woolger), Indianapolis, 2nd edition (1983). [Machina, 1976] K. Machina. Truth, Belief and Vagueness, Journal of Philosophical Logic 5: 47–78, 1976. [McGee and McLaughlin, 1995] V. McGee and B. McLaughlin. Distinctions Without a Difference, Southern Journal of Philosophy 33 (supplement): 203–51, 1995. [McGill and Parry, 1948] V. J. McGill and W. T. Parry. The Unity of Opposites: A Dialectical Principle, Science and Society 12: 418–44, 1948. [Mehlberg, 1958] H. Mehlberg. The Reach of Science, Toronto University Press, 1958. [Milosz, 1955] C. Milosz. The Captive Mind, New York: Vintage Books, 1955. (Translated from 1953 Polish original by J. Zielonko.) [Parsons, 1987] T. Parsons. Entities Without Identity in J. Tomberlin (ed.), Philosophical Perspectives, 1, Metaphysics, Ridgeview Publishing Co., pp. 1–19, 1987. [Parsons, 2000] T. Parsons. Indeterminate Identity: Metaphysics and Semantics, Oxford University Press, 2000. [Parsons and Woodruff, 1995] T. Parsons and P. Woodruff. Wordly Indeterminacy of Identity, Proceedings of the Aristotelian Society 95: 171–91. Reprinted in Keefe and Smith (1996), pp. 321–37, 1995. [Peacocke, 1981] C. Peacocke. Are Vague Predicates Incoherent?, Synthese 46: 121–41, 1981. [Pe˜ na, 1989] L. Pe˜ na. Verum et ens Convertuntur, in G. Priest, R. Routley and J. Norman (eds), Paraconsistent Logic: Essays on the Inconsistent, Philosophia, 1989. [Plekhanov, 1937/1908] G. Plekhanov. Fundamental Problems of Marxism, London: Lawrence and Wishart, 1937. (Translated from 1908 Russian original by E. and C. Paul.) [Priest, 1991] G. Priest. Sorites and Identity, Logique et Analyse 135–6: 293–6, 1991. [Priest, 2003] G. Priest. A Site for Sorites, in J.C. Beall (ed.), Liars and Heaps: New Essays on Paradox, Oxford: Clarendon Press, pp. 9–23, 2003. [Priest and Routley, 1989a] G. Priest and R. Routley. Applications of a Paraconsistent Logic, in G. Priest, R. Routley and J. 
Norman (eds), Paraconsistent Logic: Essays on the Inconsistent, Philosophia, pp. 367–93, 1989. [Priest and Routley, 1989b] G. Priest and R. Routley. Systems of Paraconsistent Logic in G. Priest, R. Routley and J. Norman (eds), Paraconsistent Logic: Essays on the Inconsistent, Philosophia, pp. 151–86, 1989.
324
Dominic Hyde
[Quine, 1981] W. V. O. Quine. What Price Bivalence?, Journal of Philosophy 78: 90–5, 1981. [Raffman, 1994] D. Raffman. Vagueness Without Paradox, Philosophical Review 103: 41–74, 1994. [Raffman, 1996] D. Raffman. Vagueness and Context-Sensitivity, Philosophical Studies 81: 175– 92, 1996. [Russell, 1923] B. Russell. Vagueness, Australasian Journal of Philosophy and Psychology 1: 84–92, 1923. [Soames, 1999] S. Soames. Understanding Truth, Oxford University Press, 1999. [Soames, 2002] S. Soames. Replies, Philosophy and Phenomenological Research 65: 429–52, 2002. [Soames, 2003] S. Soames. Higher-Order Vagueness for Partially Defined Predicates, in J.C. Beall (ed.), Liars and Heaps: New Essays on Paradox, Oxford: Clarendon Press, pp. 128–50, 2003. [Sorensen, 1988] R. Sorensen. Blindspots, Oxford: Clarendon Press, 1988. [Sorensen, 2001] R. Sorensen. Vagueness and Contradiction, New York: Oxford University Press, 2001. [Stanley, 2003] J. Stanley. Context, Interest-Relativity, and the Sorites, Analysis 63: 269–80, 2003. [Tappenden, 1993] J. Tappenden. The Liar and the Sorites Paradoxes: Towards a Unified Treatment, Journal of Philosophy 90: 551–77, 1993. [Tye, 1990] M. Tye. Vague objects, Mind 99: 535–57, 1990. [Tye, 1994] M. Tye. Sorites paradoxes and the semantics of vagueness, in J. Tomberlin (ed.), Philosophical Perspectives 8: Logic and Language, Ridgeview Publishing Co., pp. 189–206, 1994. Partially reprinted in Keefe and Smith (1996), pp. 281–93. [van Fraassen, 1966] B. C. Van Fraassen. Singular Terms, Truth-Value Gaps, and Free Logic, Journal of Philosophy 63: 481–85, 1966. [Weatherson, 2004] B. Weatherson. From Classical to Intuitionistic Probability, Notre Dame Journal of Formal Logic 44: 111–23, 2004. [Williamson, 1994] T. Williamson. Vagueness, London: Routledge, 1994. [Williamson, 1999] T. Williamson. On the Structure of Higher-Order Vagueness, Mind 108: 127–142, 1999. [Williamson, 2002] T. Williamson. 
Soames on Vagueness, Philosophy and Phenomenological Research 65: 422–28, 2002. [Wright, 1975] C. Wright. On the Coherence of Vague Predicates, Synthese 30: 325–65, 1975. Condensed version reprinted as Language-Mastery and the Sorites Paradox in G. Evans & J. McDowell (eds), Truth and Meaning: Essays in Semantics, Oxford University Press (1976), pp. 223–47. Later reprinted in part in Keefe and Smith (1996), pp. 151–73. [Wright, 1992] C. Wright. Is Higher-Order Vagueness Coherent?, Analysis 52: 129–39, 1992. [Zaden, 1965] L. Zadeh. Fuzzy Sets, Information and Control 8: 338–53, 1965. [Zadeh, 1975] L. Zadeh. Fuzzy Logic and Approximate Reasoning, Synthese 30: 407–28, 1975.
FUZZY-SET BASED LOGICS — AN HISTORY-ORIENTED PRESENTATION OF THEIR MAIN DEVELOPMENTS
Didier Dubois, Francesc Esteva, Lluís Godo and Henri Prade
1 INTRODUCTION: A HISTORICAL PERSPECTIVE
The representation of human-originated information and the formalization of commonsense reasoning have motivated different schools of research in Artificial or Computational Intelligence in the second half of the 20th century. This new trend has also put formal logic, originally developed in connection with the foundations of mathematics, in a completely new perspective, as a tool for processing information on computers. Logic has traditionally put emphasis on symbolic processing at the syntactical level and binary truth-values at the semantical level. The idea of fuzzy sets introduced in the early sixties [Zadeh, 1965] and the development of fuzzy logic later on [Zadeh, 1975a] have brought forward a new formal framework for capturing graded imprecision in information representation and reasoning devices. Indeed, fuzzy set membership grades can be interpreted in various ways which play a role in human reasoning, such as levels of intensity, similarity degrees, levels of uncertainty, and degrees of preference.

Of course, the development of fuzzy sets and fuzzy logic has its roots in concerns already encountered in non-classical logics in the first half of the century, when the need for intermediary truth-values and modalities emerged. We start by briefly surveying some of the main issues raised by this research line before describing the historical development of fuzzy sets, fuzzy logic and related issues.

Jan Łukasiewicz (1878-1956) and his followers developed three-valued logics, and other many-valued systems, from 1920 onwards [Łukasiewicz, 1920]. This research was motivated by philosophical concerns as well as some technical problems in logic, but not so much by issues in knowledge representation, which left the interpretation of intermediate truth-values unclear. This issue can be related to a misunderstanding regarding the law of excluded middle and the law of non-contradiction, and the connections between many-valued logics and modal logics.
The principle of bivalence, “Every proposition is either true or false”,
Handbook of the History of Logic. Volume 8 Dov M. Gabbay and John Woods (Editors) © 2007 Elsevier B.V. All rights reserved.
formulated and strongly defended by Chrysippus and his school in ancient Greece, was for instance questioned by the Epicureans, and even rejected by them in the case of propositions referring to future contingencies.

Let us take an example already considered by Aristotle, namely the proposition: “There will be a sea battle to-morrow (p) and there will not be a sea battle to-morrow (¬p)”. This proposition “p and ¬p” is always false, because of the law of non-contradiction, and the proposition “p or ¬p” is always true, because tertium non datur. But we may fail to know the truth of both propositions “there will be a sea battle to-morrow” and “there will not be a sea battle to-morrow”. In this case, at least intuitively, it seems reasonable to say that it is possible that there will be a sea battle to-morrow but, at the same time, it is possible that there will not be a sea battle to-morrow. There has been a recurrent tendency, up to the twentieth-century many-valued logic tradition, to claim the failure of the bivalence principle on such grounds, and to consider the modality possible as a third truth value. This was apparently (unfortunately) the starting motivation of Łukasiewicz for introducing his three-valued logic. Indeed, the third truth-value was interpreted by Łukasiewicz as standing for possible. However, the proposition “possible p” is not the same as p, and “possible ¬p” is not the negation of “possible p”. Hence the fact that the proposition “possible p” ∧ “possible ¬p” may be true does not call the law of non-contradiction into question, since “possible p” and “possible ¬p” are not mutually exclusive. This situation leads to interpretation problems for a fully truth-functional calculus of possibility, since even if p is “possible” and ¬p is “possible”, p ∧ ¬p is still always false.
On the contrary, vague or fuzzy propositions are ones such that, due to the gradual boundary of their sets of models, the proposition “p and ¬p” is not completely false in some interpretations. This is why Moisil [1972] speaks of fuzzy logics as Non-Chrysippean logics.

A similar confusion seems to have prevailed in the first half of the century between probability and partial truth. Trying to develop a quantitative concept of truth, H. Reichenbach [1949] proposed his probability logic, in which the alternative true-false is replaced by a continuous scale of truth values. In this logic he introduces probability propositions to which probabilities are assigned, interpreted as grades of truth. In a simple illustrative example, he considers the statement “I shall hit the center”. As a measure of the degree of truth of this statement, Reichenbach proposes to measure the distance r of the hit to the center and to take the truth-value as equal to 1/(1 + r). But, of course, this can be done only after the shot. However, quantifying the proposition after the hit is not a matter of belief assessment, since the distance to the center is then known. It is easy to figure out retrospectively that this method is actually evaluating the fuzzy proposition “I hit close to the center”. Of course we cannot evaluate the truth of the above
sentence before the shot, because then it is a matter of belief assessment, for which probability can be suitable.

Very early, when many-valued logics came to light, some scholars in the foundations of probability became aware that probabilities differ from what logicians call truth-values. De Finetti [1936], witnessing the emergence of many-valued logics (especially the works of Łukasiewicz, see [Łukasiewicz, 1970]), pointed out that uncertainty, or partial belief, as captured by probability, is a meta-concept with respect to truth degrees, and goes along with the idea that a proposition, in its usual sense, is a binary notion. On the contrary, the notion of partial truth (i.e. allowing for intermediary degrees of truth between true (1) and false (0)), as put forward by Łukasiewicz [1930], leads to changing the very notion of proposition. Indeed, the definition of a proposition is a matter of convention. This remark clearly points out the fact that many-valued logics deal with many-valuedness in the logical status of propositions (as opposed to Boolean status), not with belief or probability of propositions. On the contrary, uncertainty pertains to the beliefs held by an agent who is not totally sure whether a proposition of interest is true or false, without questioning the fact that ultimately this proposition cannot be but true or false. Probabilistic logic, contrary to many-valued logics, is not a substitute for binary logic. It is only superimposed on it. However, this point is not always clearly made by the forerunners of many-valued logics.

Carnap [1949] also points out the difference in nature between truth-values and probability values (hence degrees thereof), precisely because “true” (resp. false) is not synonymous with “known to be true” (resp. known to be false), that is to say, verified (resp. falsified). He criticizes Reichenbach for his claim that probability values should supersede the two usual truth-values. In the same vein, H.
Weyl [1946] introduced a calculus of vague predicates treated as functions defined on a fixed universe of discourse U, with values in the unit interval. Operations on such predicates f : U → [0, 1] were defined as follows: f ∩ g = min(f, g) (conjunction); f ∪ g = max(f, g) (disjunction); f^c = 1 − f (negation). Clearly, this is one ancestor of the fuzzy set calculus. However, one of the approaches discussed by him for interpreting these connectives again considers truth values as probabilities. As shown above, this interpretation is dubious, first because probability and truth address different issues, and especially because probabilities are not compositional for all logical connectives (in fact, only for negation).

The history of fuzzy logic starts with the foundational 1965 paper by Lotfi Zadeh entitled “Fuzzy Sets” [Zadeh, 1965]. In this paper, motivated by problems in pattern classification and information processing, Zadeh proposes the idea of fuzzy sets as generalized sets having elements with intermediary membership grades. In this view, a fuzzy set is characterized by its membership function, allocating a
membership grade to any element of the referential domain. The unit interval is usually taken as the range of these membership grades, although any suitable partially ordered set could also be used (typically a complete lattice [Goguen, 1967]). Then, extended set-theoretic operations on membership functions are defined by means of many-valued connectives, such as minimum and maximum for the intersection and the union respectively. Later, owing to other researchers, it was recognised that the appropriate connectives for defining generalized intersection and union operations were a class of associative monotonic connectives known as triangular norms (t-norms for short), together with their De Morgan dual triangular co-norms (t-conorms for short) (see Section 2.1). These operations are at the basis of the semantics of a class of mathematical fuzzy logical systems that have been thoroughly studied in the recent past, as will be reported later in Section 3.

While the many-valued logic stream has mainly been developed in a mathematical logic style, the notion of fuzzy set-based approximate reasoning as imagined by Zadeh in the seventies is much more related to information processing: he wrote in 1979 that “the theory of approximate reasoning is concerned with the deduction of possibly imprecise conclusions from a set of imprecise premises” [Zadeh, 1979a]. Fuzzy logic in Zadeh’s sense, as will be seen in the next section, is both a framework allowing the representation of vague (or gradual) predicates and a framework to reason under incomplete information. By his interest in modeling vagueness, Zadeh strongly departs from the logical tradition that regards vague propositions as poor statements to be avoided or to be reformulated more precisely [Russell, 1923].
Moreover, the view of local fuzzy truth-values emphasized by Bellman and Zadeh [1977] really means that in fuzzy logic, what is called truth is evaluated with respect to a description of a state of (vague, incomplete) knowledge, and not necessarily with respect to an objective, completely and precisely known state of the world.

Many-valued logics are a suitable formalism to deal with an aspect of vagueness, called fuzziness by Zadeh, pertaining to gradual properties. It should be emphasized that the fuzziness of a property is not viewed as a defect in the linguistic expression of knowledge (e.g., lack of precision, sloppiness, limitation of the natural languages), but rather as a way of expressing gradedness. In that sense, fuzzy sets do not have exactly the same concern as other approaches to vagueness. For instance, K. Fine [1975] proposes that statements about a vague predicate be taken to be true if and only if they hold for all possible ways of making the predicate clear-cut. This enables classical logic properties to be preserved, like the mutual exclusiveness between a vague predicate A and its negation not-A. In contrast, the fuzzy set view maintains that in some situations there is no clear-cut predicate underlying a fuzzy proposition, due to the smooth transition from one class to another induced by its gradual nature. In particular, A and not-A will have a limited overlap; see [Dubois et al., 2005a] for a detailed discussion. The presence of this overlap leads to a logical view of interpolative reasoning [Klawonn and Novák, 1996; Dubois et al., 1997a].
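The contrast drawn above between Fine's proposal and the fuzzy set view can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration: the predicate `tall`, its breakpoints at 160 and 190 cm, and the choice of sharp thresholds standing in for the "ways of making the predicate clear-cut". The overlap of A and not-A is computed with min and 1 − (·).

```python
def tall(height_cm: float) -> float:
    """Hypothetical membership function for 'tall'; breakpoints are illustrative."""
    if height_cm <= 160.0:
        return 0.0
    if height_cm >= 190.0:
        return 1.0
    return (height_cm - 160.0) / 30.0


def overlap(height_cm: float) -> float:
    """Degree of 'tall and not tall', using min for 'and' and 1 - (.) for 'not'."""
    degree = tall(height_cm)
    return min(degree, 1.0 - degree)


def contradiction_holds_everywhere(height_cm: float, thresholds) -> bool:
    """Supervaluation-style check: does 'tall and not tall' hold under every
    sharp threshold? Each sharpening is a crisp predicate, so it never does."""
    return all(
        (height_cm >= t) and not (height_cm >= t)
        for t in thresholds
    )


heights = [150.0, 160.0, 175.0, 190.0, 200.0]
overlaps = [overlap(h) for h in heights]

# Assumed family of sharpenings: one crisp threshold per centimetre.
sharpenings = [160.0 + k for k in range(31)]
```

Under these assumptions the overlap is zero for clear cases, peaks at 0.5 in the middle of the transition zone, and never exceeds 0.5, which is one way to read "limited overlap". Under every sharpening, by contrast, the contradiction is simply false.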
However, when only imprecise or incomplete information is available, truth-values (classical or intermediate) become ill-known. Then belief states can be modeled by sets of truth-values. Actually, what are called fuzzy truth-values by Zadeh turn out to be ill-known truth-values in this sense. They are fuzzy sets of truth-values and not so much an attempt to grasp the linguistic subtleties of the word true in natural languages. Strictly speaking, fuzzy set theory deals with classes with unsharp boundaries and gradual properties, but it is not concerned with uncertainty or partial belief. The latter is rather due to a lack of precise (or complete) information, which makes truth-values ill-known. This is the reason why Zadeh [1978a] introduced possibility theory, which naturally complements fuzzy set theory for handling uncertainty induced by fuzzy and incomplete pieces of information. Possibility theory turns out to be a non-probabilistic view of uncertainty aiming at modeling states of partial or complete ignorance rather than capturing randomness. Based on possibility theory, a logical formalism has been developed in the last twenty years under the name of possibilistic logic (see Section 4.1). Therefore we can distinguish:

• states with Boolean information from states with gradual information (leading to intermediate uncertainty degrees), and
• statements that can be only true or false from statements that may have an intermediate truth-value because they refer to vague or gradual properties.
This analysis leads us to four noticeable classes of formalisms: (i) classical logic, where both truth and belief (understood as the status of what can be inferred from available information) are Boolean; (ii) many-valued logics, where truth is a matter of degree but consequencehood is Boolean; (iii) possibilistic logic, for graded belief about Boolean statements; and (iv) the general case of non-Boolean statements leading to graded truth and imprecise information leading to graded beliefs, which motivated Zadeh’s proposal.

In the last twenty years, while researchers have been developing formal many-valued logics and uncertainty logics based on fuzzy sets, Zadeh has rather emphasized computational and engineering issues by advocating the importance of soft computing (a range of numerically oriented techniques including fuzzy rule-based control systems, neural nets, and genetic algorithms [Zadeh, 1994b]). He then introduced new paradigms for computational intelligence such as granular computing [Zadeh, 1997], computing with words [Zadeh, 1995] and perception-based reasoning [Zadeh, 1999], trying to enlarge his original motivation for a computational approach to the way humans handle information.

Since fuzzy sets, fuzzy logic, possibility theory, and soft computing have the same father, Zadeh, these notions are too often confused, although they refer to quite different tasks and have been developed in sometimes opposite directions. On the one hand, the term fuzzy logic, understood in the narrow/technical sense, refers to many-valued logics that handle gradual properties (that are a matter of degree,
e.g. “large”, “old”, “expensive”, . . . ). These logics are developed by logicians or artificial intelligence theoreticians. Technically speaking, they are compositional w.r.t. all logical connectives, while uncertainty logics (like possibilistic logic) can never be compositional w.r.t. all logical connectives. On the other hand, “fuzzy logics”, in the broad sense, is a generic expression that most of the time refers to that part of soft computing where fuzzy sets and fuzzy rules are used. Lastly, “soft computing” is a buzz-word sometimes referring to the same research trend as “computational intelligence” (viewed as an alternative problem-solving paradigm to classical artificial intelligence methods that are found to be too symbolically oriented).

The remaining part of the chapter is structured as follows. Section 2 provides a detailed account of the fuzzy set-based approach to approximate reasoning. It starts with a review of fuzzy set connectives and the possibility theory-based representation of information in the form of flexible constraints. Then the approximate reasoning methodology based on the combination and projection of such flexible constraints is described, before providing a detailed discussion of the especially important notion of fuzzy truth value in this setting. The last part of this section is devoted to the representation of different types of fuzzy if-then rules and to the discussion of the generalized modus ponens and some related issues such as basic inference patterns. Section 3 contains a survey of the main many-valued logical systems more recently developed in relation to the formalization of fuzzy logic in the narrow sense. The so-called t-norm based fuzzy logics are first introduced, providing Hilbert-style axiomatizations of the main systems, their algebraic semantics, as well as analytical proof calculi based on hypersequents for some of these logics.
Extensions of these logics with truth-constants and additional connectives are also reported. Then, an overview of other systems of many-valued logic with deduction based on resolution-style inference rules is presented. A more abstract point of view, the consequence operators approach to fuzzy logic, is also surveyed. Finally, a many-valued logic encoding of major approximate reasoning patterns is described. Section 4 is devoted to fuzzy set-based logical formalisms handling uncertainty and similarity, including possibilistic logic, its extension to deal with fuzzy constants, similarity-based inference, modal fuzzy theories of uncertainty, and logics handling fuzzy truth values in their syntax.

2 A GENERAL THEORY OF APPROXIMATE REASONING

Zadeh proposed and developed the theory of approximate reasoning in a long series of papers in the 1970s [1973; 1975a; 1975b; 1975c; 1976; 1978b; 1979a], at the same time as he introduced possibility theory [Zadeh, 1978a] as a new approach to uncertainty modeling. His original approach is based on a fuzzy set-based representation of the contents of factual statements (expressing elastic restrictions on the possible values of some parameters) and of if-then rules relating such fuzzy statements.
The phrase fuzzy logic appears rather early [Zadeh, 1973]: “[...] the pervasiveness of fuzziness in human thought processes suggests that much of the logic behind human reasoning is not the traditional two-valued or even multivalued logic, but a logic with fuzzy truths, fuzzy connectives and fuzzy rules of inference. In our view, it is this fuzzy, and as yet not well-understood, logic¹ that plays a basic role in what may well be one of the most important facets of human thinking [...]”.

Clearly, according to its founder, fuzzy logic strongly departs at first glance from the standard view of logic where inference does not depend on the contents of propositions. Indeed, from p and p′ → q one always infers q whenever p ⊢ p′, for any propositions p, p′ and q, while in Zadeh’s generalized modus ponens, which is a typical pattern of approximate reasoning, from “X is A*” and “if X is A then Y is B”, one deduces “Y is B*”, where B* = f(A*, A, B) depends on the implication chosen, and may differ from B while being non-trivial. Thus, in this approach, the content of an inference result does depend on the semantic contents of the premises.

Strictly speaking, the presentation in retrospect, below, of Zadeh’s theory of approximate reasoning does not contain anything new. Still, we emphasize how altogether its main features contribute to a coherent theory that turns out to encompass several important particular cases of extensions of classical propositional logic, at the semantic level. Moreover, we try to point out the importance of the idea of fuzzy truth as compatibility, and of the converse notion of truth qualification, two key issues in the theory of approximate reasoning which have often been overlooked or misunderstood, as well as the role of the minimal specificity principle in the representation of information in possibility theory.
The section below can be viewed as a revised and summarized version of [Bouchon-Meunier et al., 1999], where more details can also be found about various approaches that are more loosely inspired by Zadeh’s proposal.
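To make the dependence of B* on the implication concrete, here is a minimal sketch on finite domains. It assumes one particular form for f, namely the sup-min composition B*(v) = sup_u min(A*(u), I(A(u), B(v))), which is not spelled out in the passage above; the two implication functions (Gödel and Kleene-Dienes) and all membership values are likewise illustrative choices.

```python
from typing import Callable, Dict

FuzzySet = Dict[str, float]
Implication = Callable[[float, float], float]


def godel(a: float, b: float) -> float:
    """Goedel implication: 1 if a <= b, else b."""
    return 1.0 if a <= b else b


def kleene_dienes(a: float, b: float) -> float:
    """Kleene-Dienes implication: max(1 - a, b)."""
    return max(1.0 - a, b)


def generalized_modus_ponens(
    a_star: FuzzySet, a: FuzzySet, b: FuzzySet, implication: Implication
) -> FuzzySet:
    """Assumed sup-min form: B*(v) = max over u of min(A*(u), I(A(u), B(v)))."""
    return {
        v: max(min(a_star[u], implication(a[u], b[v])) for u in a)
        for v in b
    }


# Illustrative fuzzy sets on two small finite universes.
A = {"u1": 1.0, "u2": 0.5, "u3": 0.0}
B = {"v1": 0.0, "v2": 0.5, "v3": 1.0}
A_SHIFTED = {"u1": 0.5, "u2": 1.0, "u3": 0.5}  # an observation that only partly matches A

exact_godel = generalized_modus_ponens(A, A, B, godel)
exact_kleene_dienes = generalized_modus_ponens(A, A, B, kleene_dienes)
shifted_godel = generalized_modus_ponens(A_SHIFTED, A, B, godel)
```

With these values, the Gödel implication returns B itself when the observation coincides with A, while the Kleene-Dienes implication returns a strictly less specific conclusion from the same premises. With the shifted observation the conclusion differs from B without becoming the trivial set that is everywhere 1.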
2.1 Fuzzy sets
This section provides basic definitions of fuzzy set theory and its main connectives. Emphasis is also put here on the various representations of a fuzzy set, which are instrumental when extending formal notions from sets to fuzzy sets.

Membership Functions

L. A. Zadeh gave in his now famous paper [Zadeh, 1965] the following definition: A fuzzy set is a class with a continuum of membership grades. So, a fuzzy set (class) F in a referential U is characterized by a membership function which associates with each element u ∈ U a real number in the interval [0, 1]. The value of the membership function at element u represents the “grade of membership” of u in F. A fuzzy set F is thus defined as a mapping F : U → [0, 1], and it is a kind of generalization of the traditional characteristic function of a subset A : U → {0, 1}. There is a tendency now to identify the theory of fuzzy sets with a theory of generalized characteristic functions². In particular, F(u) = 1 reflects full membership of u in F, while F(u) = 0 expresses absolute non-membership in F. Usual sets can be viewed as special cases of fuzzy sets where only full membership and absolute non-membership are allowed. They are called crisp sets, or Boolean sets. When 0 < F(u) < 1, one speaks of partial membership. For instance, the term young (for ages of humans) may apply to a 30-year-old individual only at degree 0.5. A fuzzy set can also be denoted as a set of pairs made of an element of U and its membership grade when positive: {(u, F(u)) : F(u) ∈ (0, 1]}. The set of fuzzy subsets of U is denoted F(U).

The membership function attached to a given word (such as young) depends on the contextual intended use of the word; a young retired person is certainly older than a young student, and the idea of what a young student is also depends on the user. However, in the different contexts, the term young will generally be understood as a gradual property. Membership degrees are fixed only by convention, and the unit interval, as a range of membership grades, is arbitrary. The unit interval is natural for modeling membership grades of fuzzy sets of real numbers. The continuity of the membership scale reflects the continuity of the referential. Then a membership degree F(u) can be viewed as a degree of proximity between element u and the prototypes of F, that is, the elements v such that F(v) = 1. The membership grade decreases as elements are located farther from such prototypes. This representation points out that there is no precise threshold between ages that qualify as young and ages that qualify as not young. More precisely, there is a gap between prototypes of young and prototypes of not young.

¹ Italics are ours.
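The definitions of this subsection can be sketched in a few lines. The shape of `young` below is an assumption chosen only so that a 30-year-old gets degree 0.5, as in the example in the text; the crisp set `minor` and the integer age range are also hypothetical.

```python
def young(age: float) -> float:
    """Hypothetical membership function for 'young' (ages of humans).

    Full membership up to 25, none from 35, linear in between, so that
    young(30) == 0.5 as in the example in the text.
    """
    if age <= 25:
        return 1.0
    if age >= 35:
        return 0.0
    return (35.0 - age) / 10.0


def minor(age: float) -> float:
    """A crisp set seen as a fuzzy set: only grades 0 and 1 occur."""
    return 1.0 if age < 18 else 0.0


ages = range(0, 101)  # assumed finite referential U

# Pair representation: elements together with their positive membership grades.
young_pairs = {age: young(age) for age in ages if young(age) > 0.0}

# Prototypes of 'young': the elements with full membership.
prototypes = [age for age in ages if young(age) == 1.0]
```

In this sketch ages 26 to 34 have partial membership, so no single age separates young from not young. The prototypes of young (up to 25) and the ages that are fully not young (35 and above) are separated by that transition zone.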
It is clear that fuzzy sets can offer a natural interface between linguistic representations and numerical representations. Of course, membership grades never appear as such in natural languages. In natural language, gradual predicates are those to which linguistic hedges such as very can be applied. Such linguistic hedges are the trace of gradual membership in natural language. Clearly the numerical membership grade corresponding to very is itself ill-defined. It is a fuzzy set of membership degrees, as suggested by Zadeh [1972]. He suggested building the membership function of very young from the one of young and the one of very, by letting very-young(·) = very(young(·)). So, fuzzy subsets of membership grades (represented by a function from [0, 1] to itself) model linguistic hedges that can modify membership functions of fuzzy predicates.

However, if the referential set U is a finite set of objects, then the use of the unit interval as a set of membership grades is more difficult to justify. A finite totally ordered set L will then do. It results from a partitioning of the elements of U with respect to a fuzzy set F, each class in the partition gathering elements with equal membership, and the set of classes being ordered from full membership to non-membership.

² This is why in the following we shall equivalently denote the membership grade of u in a fuzzy set F as F(u) or the more usual µ_F(u), according to best convenience and clarity.
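The composition very-young(·) = very(young(·)) and the ordinal partition of a finite referential can both be sketched briefly. The text does not fix the function for very; squaring the grade is assumed here purely as an illustration, and `young`, the names and the grades are hypothetical.

```python
from typing import Callable, Dict, List

Membership = Callable[[float], float]


def young(age: float) -> float:
    """Hypothetical membership function for 'young' (illustrative breakpoints)."""
    if age <= 25:
        return 1.0
    if age >= 35:
        return 0.0
    return (35.0 - age) / 10.0


def very(degree: float) -> float:
    """A hedge is a map from [0, 1] to itself; squaring is an assumed choice."""
    return degree ** 2


def modify(hedge: Callable[[float], float], predicate: Membership) -> Membership:
    """Build hedge(predicate(.)), e.g. very-young(.) = very(young(.))."""
    return lambda u: hedge(predicate(u))


very_young = modify(very, young)


def ordered_classes(grades: Dict[str, float]) -> List[List[str]]:
    """Partition a finite referential into classes of equal membership,
    ordered from the highest grade to the lowest."""
    levels = sorted(set(grades.values()), reverse=True)
    return [sorted(u for u, g in grades.items() if g == level) for level in levels]


people = {"ann": 1.0, "bob": 0.5, "cy": 0.5, "dee": 0.0}
classes = ordered_classes(people)
```

Only the order of the classes matters in the second part: replacing the grades 1.0, 0.5, 0.0 by any other decreasing values would give the same partition.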
Parikh [1983] questions the possibility of precisely assessing degrees of truth for a vague predicate. In practice, however, membership degrees have mainly an ordinal meaning. In other words, it is the ordering induced by the membership degrees between the elements that is meaningful, rather than the exact value of the degrees. This is in agreement with the qualitative nature of the most usual operations that are used on these degrees (min, max and the complementation to 1 as an order-reversing operation in [0, 1], as recalled below). Obviously a fuzzy membership function will depend on the context in various ways. First, the universe of discourse (i.e., the domain of the membership function) has to be defined (e.g., young is not the same thing for a man or for a tree). Second, it may depend on the other classes which are used to cover the domain. For instance, with respect to a given domain, young does not mean exactly the same thing if the remaining vocabulary includes only the word old, or is richer and contains both mature and old. Lastly, a fuzzy membership function may vary from one person to another. However, what is really important in practice is to correctly represent the pieces of knowledge provided by an expert and capture the meaning he intends to give to his own words. Whether there can be a universal consensus on the meaning of a linguistic expression like young man is another matter.

Level Cuts

Another possible and very convenient view of a fuzzy set is that of a nested family of classical subsets, via the notion of level-cut. The α-level cut F_α of a fuzzy set F is the set {u ∈ U : F(u) ≥ α}, for 1 ≥ α > 0. The idea is to fix a positive threshold α and to consider as members of the set the elements with membership grades at or above the threshold. Moving the threshold in the unit interval, the family of crisp sets {F_α : 1 ≥ α > 0} is generated. This is the horizontal view of a fuzzy set. For α = 1, the core of F is obtained. It gathers the prototypes of F.
Letting $\alpha$ vanish, the support $s(F)$ of $F$ is obtained. It contains elements with positive membership grades, those which belong to some extent to $F$. Note that the support is different from $F_0 = U$. The “ensembles flous” of Gentilhomme [1968] were fuzzy sets with only a core and a support. The set of level-cuts of $F$ is nested in the sense that:

$$\alpha < \beta \ \text{ implies } \ F_\beta \subseteq F_\alpha \tag{1}$$

Going from the level-cut representation to the membership function and back is easy. The membership function can be recovered from the level-cuts as follows:

$$F(u) = \sup\{\alpha : u \in F_\alpha\} \tag{2}$$

Conversely, given an indexed nested family $\{A_\alpha : 1 \geq \alpha > 0\}$ such that $A_0 = U$ and condition (1) (plus a continuity requirement in the infinite case) holds, there is a unique fuzzy set $F$ whose level-cuts are precisely $F_\alpha = A_\alpha$ for each $\alpha \in [0, 1]$. This representation theorem was obtained by Negoita and Ralescu [1975].
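The passage between the two views can be illustrated on a finite universe. The following is only a minimal sketch: the fuzzy set `young`, its grades and the helper names are hypothetical, and exact fractions are used to avoid rounding. On a finite universe it suffices to consider the levels that actually occur as membership grades.

```python
from fractions import Fraction as Fr

def level_cut(F, alpha):
    """Return the alpha-cut {u : F(u) >= alpha} of a fuzzy set stored as a dict."""
    return {u for u, grade in F.items() if grade >= alpha}

def from_level_cuts(cuts, universe):
    """Rebuild F(u) = sup{alpha : u in F_alpha} from a dict alpha -> crisp set."""
    return {u: max((a for a, cut in cuts.items() if u in cut), default=Fr(0))
            for u in universe}

# Hypothetical fuzzy set "young" over a few ages (grades are illustrative only).
young = {20: Fr(1), 30: Fr(7, 10), 40: Fr(3, 10), 60: Fr(0)}

# Positive levels occurring in the fuzzy set, and the corresponding cuts.
levels = sorted({g for g in young.values() if g > 0})
cuts = {a: level_cut(young, a) for a in levels}

core = level_cut(young, Fr(1))                    # the 1-cut
support = {u for u, g in young.items() if g > 0}  # limit of the cuts as alpha vanishes
nested = all(cuts[b] <= cuts[a] for a in levels for b in levels if a < b)  # condition (1)
rebuilt = from_level_cuts(cuts, young.keys())     # equation (2)
```

Here `rebuilt` coincides with `young`, which is the finite counterpart of the representation result recalled above.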
Didier Dubois, Francesc Esteva, Lluís Godo and Henri Prade
Fuzzy Connectives: Negations, Conjunctions and Disjunctions

The usual set-theoretic operations of complementation, intersection and union were extended by means of suitable operations on $[0, 1]$ (or on some weaker ordered structure) that mimic, to some extent, the properties of the Boolean connectives on $\{0, 1\}$ used to compute the corresponding characteristic functions. Namely, denoting by $(\cdot)^c$, $\cap$, $\cup$ the fuzzy set complementation, intersection and union, respectively, these connectives are usually understood as follows:

$$A^c(u) = n(A(u)) \tag{3}$$

$$(A \cap B)(u, v) = T(A(u), B(v)) \tag{4}$$

$$(A \cup B)(u, v) = S(A(u), B(v)) \tag{5}$$

where $A$ is a fuzzy subset of a universe $U$, $B$ a fuzzy subset of a universe $V$, and where $n$ is a so-called negation function, $T$ a so-called triangular norm and $S$ a triangular conorm, whose characteristic properties are stated below. Note that, strictly speaking, equations (4)-(5) define the intersection and union of fuzzy sets only if $U = V$ and $u = v$; otherwise they define the Cartesian product of $A$ and $B$ and the dual co-product.

All these connective operations are actually extensions of the classical ones, i.e., for the values 0 and 1 they behave classically, and they give rise to different multiple-valued logical systems when they are taken as truth-functions for connectives (see Section 3 of this chapter). It is worth noticing that in his original paper, admittedly inspired in part by Kleene’s many-valued logics [Kleene, 1952], Zadeh proposed to interpret complementation, intersection and union by means of the operations $1 - (\cdot)$, min and max respectively. These operations are the only ones that are compatible with the level-cut view of fuzzy sets. Zadeh also mentioned the possibility of using other operations, namely the algebraic product for intersection-like operations, and its De Morgan dual as well as the algebraic sum (when not greater than 1) for union-like fuzzy set-theoretic operations.
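The compatibility of min and max with level-cuts can be checked on a small example. The sketch below is illustrative only: the fuzzy sets `A` and `B`, the grid of levels and the helper names are assumptions. It tests whether the cut of a pointwise combination equals the combination of the cuts, and shows that the algebraic product does not pass this test for intersection.

```python
from fractions import Fraction as Fr

def cut(F, alpha):
    """alpha-cut of a fuzzy set stored as a dict element -> grade."""
    return {u for u, g in F.items() if g >= alpha}

def combine(A, B, op):
    """Pointwise combination of two fuzzy sets defined on the same universe."""
    return {u: op(A[u], B[u]) for u in A}

# Hypothetical fuzzy sets on a three-element universe.
A = {"u1": Fr(1), "u2": Fr(1, 2), "u3": Fr(1, 5)}
B = {"u1": Fr(1, 2), "u2": Fr(1, 2), "u3": Fr(4, 5)}
levels = [Fr(k, 10) for k in range(1, 11)]

# min-intersection and max-union commute with level-cuts ...
min_ok = all(cut(combine(A, B, min), a) == cut(A, a) & cut(B, a) for a in levels)
max_ok = all(cut(combine(A, B, max), a) == cut(A, a) | cut(B, a) for a in levels)

# ... whereas the product-based intersection does not.
prod_ok = all(cut(combine(A, B, lambda x, y: x * y), a) == cut(A, a) & cut(B, a)
              for a in levels)
```

In this example `prod_ok` is false: at level 1/2 the element `u2` belongs to both cuts, but its product grade 1/4 falls below the threshold.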
Axioms for fuzzy set operations were proposed as early as 1973, starting with [Bellman and Giertz, 1973] and later [Fung and Fu, 1975]. However, the systematic study of fuzzy set connectives was only started in the late seventies by several scholars, like Alsina, Trillas, Valverde [1980; 1983], Hoehle [1979], Klement [1980], Dubois and Prade [1979a; 1980] (also [Dubois, 1980], [Prade, 1980]) and many colleagues, and led to a general framework outlined below.

A negation $n$ is a unary operation in $[0, 1]$ [Trillas, 1979] satisfying the following properties:

$$n(0) = 1; \tag{6}$$

$$n(1) = 0; \tag{7}$$

$$n(a) \geq n(b), \ \text{ if } a \leq b; \tag{8}$$

$$n(n(a)) \geq a. \tag{9}$$
Furthermore, if n(n(a)) = a, i.e., if n is an involution, n is called a strong negation. The most typical strong negation is n(a) = 1 − a, for all a ∈ [0, 1].
Fuzzy Logic
Gödel’s negation, defined as $n(0) = 1$ and $n(a) = 0$ for all $a \in (0, 1]$, is an example of a non-strong negation.

Triangular norms (t-norms for short) and triangular conorms (t-conorms for short) were invented by Schweizer and Sklar [1963; 1983], in the framework of probabilistic metric spaces, for the purpose of expressing the triangular inequality. They also turn out to be the most general binary operations on $[0, 1]$ that meet natural and intuitive requirements for conjunction and disjunction operations. Namely, a t-norm $T$ is a binary operation on $[0, 1]$, i.e., $T : [0, 1] \times [0, 1] \to [0, 1]$, that satisfies the following conditions:
• commutative: $T(a, b) = T(b, a)$;
• associative: $T(a, T(b, c)) = T(T(a, b), c)$;
• non-decreasing in both arguments: $T(a, b) \leq T(a', b')$ if $a \leq a'$ and $b \leq b'$;
• boundary conditions: $T(a, 1) = T(1, a) = a$.

It can be proved that $T(a, 0) = T(0, a) = 0$. The boundary conditions and the latter conditions respectively express the set-theoretic properties $A \cap U = A$ and $A \cap \emptyset = \emptyset$. It is known that the minimum operation is the greatest t-norm, i.e., for any t-norm $T$, $T(a, b) \leq \min(a, b)$ holds for all $a, b \in [0, 1]$. Typical basic examples of t-norms are
• the minimum: $T(a, b) = \min(a, b)$,
• the product: $T(a, b) = a \cdot b$,
• the linear t-norm: $T(a, b) = \max(0, a + b - 1)$.

The linear t-norm is often referred to as Łukasiewicz’s t-norm³. Note the inequalities

$$\max(0, a + b - 1) \leq a \cdot b \leq \min(a, b).$$

The De Morgan-like dual notion of a t-norm (w.r.t. the negation $n(a) = 1 - a$, or a more general strong negation) is that of a t-conorm. A binary operation $S$ on $[0, 1]$ is called a t-conorm if it satisfies the same properties as those of a t-norm except for the boundary conditions: here 0 is an identity and 1 is absorbent. Namely, the following conditions express that $A \cup \emptyset = A$:
• boundary conditions: $S(0, a) = S(a, 0) = a$.

Hence $S(a, 1) = S(1, a) = 1$, expressing that $A \cup U = U$.
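The t-norm axioms and the ordering of the three basic examples can be checked mechanically on a finite grid. This is only a sketch: a grid check is an illustration, not a proof, and the function names and the grid step are assumptions. The helper `dual` builds the De Morgan dual with respect to $n(a) = 1 - a$.

```python
from fractions import Fraction as Fr
from itertools import product

def t_min(a, b):
    return min(a, b)

def t_prod(a, b):
    return a * b

def t_luk(a, b):
    return max(Fr(0), a + b - 1)

def dual(T):
    """De Morgan dual of T with respect to the negation n(a) = 1 - a."""
    return lambda a, b: 1 - T(1 - a, 1 - b)

GRID = [Fr(k, 10) for k in range(11)]
PAIRS = list(product(GRID, repeat=2))

def is_t_norm(T):
    """Check the four t-norm conditions on the grid (illustration only)."""
    commutative = all(T(a, b) == T(b, a) for a, b in PAIRS)
    associative = all(T(a, T(b, c)) == T(T(a, b), c)
                      for a, b, c in product(GRID, repeat=3))
    monotone = all(T(a, b) <= T(a2, b2)
                   for a, b in PAIRS for a2, b2 in PAIRS
                   if a <= a2 and b <= b2)
    boundary = all(T(a, 1) == a == T(1, a) for a in GRID)
    return commutative and associative and monotone and boundary

all_t_norms = all(is_t_norm(T) for T in (t_min, t_prod, t_luk))
absorbing_zero = all(T(a, 0) == 0 for T in (t_min, t_prod, t_luk) for a in GRID)
ordered = all(t_luk(a, b) <= t_prod(a, b) <= t_min(a, b) for a, b in PAIRS)

# The duals are t-conorms: 0 is an identity and 1 is absorbent.
conorm_boundary = all(dual(T)(a, 0) == a and dual(T)(a, 1) == 1
                      for T in (t_min, t_prod, t_luk) for a in GRID)
dual_forms = all(dual(t_min)(a, b) == max(a, b)
                 and dual(t_prod)(a, b) == a + b - a * b
                 and dual(t_luk)(a, b) == min(1, a + b)
                 for a, b in PAIRS)
```

Exact fractions are used so that associativity and the boundary conditions can be tested with equality rather than with a floating-point tolerance.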
Dually, the maximum operation is the smallest t-conorm ($S(a, b) \geq \max(a, b)$).

³ Because it is closely related to the implication connective $\min(1, 1 - a + b)$ originally introduced by Łukasiewicz.

T-norms and t-conorms are dual with respect to strong negations in the following sense: if $T$ is a (continuous) t-norm and $n$ a strong negation then the function
$S$ defined as $S(a, b) = n(T(n(a), n(b)))$ is a (continuous) t-conorm, and conversely, if $S$ is a t-conorm, then the function $T$ defined as $T(a, b) = n(S(n(a), n(b)))$ is a t-norm. Typical basic examples of t-conorms are the duals of the minimum, product and Łukasiewicz t-norms, namely the maximum $S(a, b) = \max(a, b)$, the so-called probabilistic sum $S(a, b) = a + b - a \cdot b$ and the bounded sum $S(a, b) = \min(1, a + b)$. Note now the inequalities

$$\max(a, b) \leq a + b - a \cdot b \leq \min(1, a + b).$$

A t-norm (resp. a t-conorm) is said to be continuous if it is a continuous mapping from $[0, 1]^2$ into $[0, 1]$ in the usual sense. For continuous t-norms commutativity is a consequence of the other properties (see Theorem 2.43 in [Klement et al., 2000]). All the above examples are continuous. An important example of a non-continuous t-norm is the so-called nilpotent minimum [Fodor, 1995], defined as

$$T(a, b) = \begin{cases} \min(a, b), & \text{if } a + b > 1 \\ 0, & \text{otherwise.} \end{cases}$$

See the monographs by Klement, Mesiar and Pap [2000] and by Alsina, Frank and Schweizer [2006] for further details on triangular norms, conorms and negation functions.

Fuzzy Implications

Most well-known fuzzy implication functions $I : [0, 1] \times [0, 1] \to [0, 1]$ are generalizations, to multiple-valued logical systems, of the classical implication function. In classical logic the deduction theorem states the equivalence between the entailments $r \wedge p \models q$ and $r \models p \to q$, and this equivalence holds provided that $p \to q \equiv \neg p \vee q$. In terms of conjunction and implication functions, this can be expressed as

$$c \leq I(a, b) \iff T(a, c) \leq b$$

where $a, b, c \in \{0, 1\}$. In the Boolean setting it is easy to see that $I(a, b) = S(n(a), b)$, where $S$ coincides with disjunction and $n$ with classical negation. However, these two interpretations give rise to distinct families of fuzzy implications when the set $\{0, 1\}$ is extended to the unit interval.
The strong and residuated implication functions (S-implications and R-implications for short) are respectively defined as follows [Trillas and Valverde, 1981].
1. S-implications are of the form $I_S(a, b) = S(n(a), b)$, where $S$ is a t-conorm and $n$ is a strong negation function (hence the name of strong implication, which is also due to the fact that when $S = \max$, or the probabilistic sum, it refers to a strong fuzzy set inclusion requiring that the support of one fuzzy set be included in the core of the other one).
2. R-implications are of the form $I_R(a, b) = \sup\{z \in [0, 1] : T(a, z) \leq b\}$, where $T$ is a t-norm. This mode of pseudo-inversion of the t-norm is a generalization of the traditional residuation operation in lattices; see, e.g., [Galatos et al., 2007] for a recent reference.

Residuated implications make sense if and only if the generating t-norm is left-continuous. Both kinds of implication functions share the following reasonable properties:
• Left-decreasingness: $I(a, b) \geq I(a', b)$ if $a \leq a'$;
• Right-increasingness: $I(a, b) \leq I(a, b')$ if $b \leq b'$;
• Neutrality: $I(1, b) = b$;
• Exchange: $I(a, I(b, c)) = I(b, I(a, c))$.

Notice that another usual property, Identity: $I(a, 1) = 1$, easily follows from the neutrality and monotonicity properties. The main difference between strong and residuated implications lies in the fact that the contraposition property, i.e.,
• Contraposition: $I(a, b) = I(n(b), n(a))$,
the symbol $n$ being some negation function, holds for all strong implications but fails for most residuated implications. In contrast, the following property
• Ordering: $I(a, b) = 1$ iff $a \leq b$,
which establishes the fact that implication defines an ordering, holds for all residuated implications but fails for most strong ones.

The failure of the contraposition property for the residuated implications enables a third kind of implication functions to be defined, the so-called reciprocal R-implications, in the following way:

$$I_C(a, b) = I_R(n(b), n(a))$$

for some residuated implication $I_R$ and negation $n$. The above monotonicity and exchange properties still hold for these reciprocal implications, but now the neutrality principle is no longer valid for them. However, the following properties do hold for them:
• Negation: $I_C(a, 0) = n(a)$
• Ordering: $I_C(a, b) = 1$ iff $a \leq b$
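The contrast between the three kinds of implication can be made concrete on one instance. The sketch below assumes $T = \min$, $S = \max$ and $n(a) = 1 - a$; these choices, the function names and the grid are illustrative assumptions. For $T = \min$ the supremum defining the R-implication is $1$ when $a \leq b$ and $b$ otherwise, which is the closed form used here.

```python
from fractions import Fraction as Fr
from itertools import product

GRID = [Fr(k, 10) for k in range(11)]
PAIRS = list(product(GRID, repeat=2))

def n(a):
    """Standard strong negation."""
    return 1 - a

def s_impl(a, b):
    """S-implication S(n(a), b) with S = max (illustrative choice)."""
    return max(n(a), b)

def r_impl(a, b):
    """R-implication sup{z : min(a, z) <= b}, in closed form for T = min."""
    return Fr(1) if a <= b else b

def c_impl(a, b):
    """Reciprocal R-implication I_C(a, b) = I_R(n(b), n(a))."""
    return r_impl(n(b), n(a))

def neutrality(I):
    return all(I(Fr(1), b) == b for b in GRID)

def exchange(I):
    return all(I(a, I(b, c)) == I(b, I(a, c)) for a, b, c in product(GRID, repeat=3))

def contraposition(I):
    return all(I(a, b) == I(n(b), n(a)) for a, b in PAIRS)

def ordering(I):
    return all((I(a, b) == 1) == (a <= b) for a, b in PAIRS)

def negation_property(I):
    return all(I(a, Fr(0)) == n(a) for a in GRID)

# Properties shared by the S- and R-implication of this instance.
shared = all(neutrality(I) and exchange(I) for I in (s_impl, r_impl))

# Properties that separate them.
s_profile = (contraposition(s_impl), ordering(s_impl))   # (True, False)
r_profile = (contraposition(r_impl), ordering(r_impl))   # (False, True)

# The reciprocal R-implication: negation and ordering hold, neutrality does not.
c_profile = (negation_property(c_impl), ordering(c_impl), neutrality(c_impl))
```

On this instance, contraposition holds for the S-implication only and ordering for the R-implication only, while the reciprocal implication satisfies the negation and ordering properties but not neutrality.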
| Generating t-norm | S-implication, $n(a) = 1 - a$ | R-implication | Reciprocal R-implication, $n(a) = 1 - a$ |
|---|---|---|---|
| $\min(a, b)$ | $\max(1 - a, b)$ (Kleene-Dienes) | $1$ if $a \leq b$; $b$ otherwise (Gödel) | $1$ if $a \leq b$; $1 - a$ otherwise |
| $a \cdot b$ | $1 - a + a \cdot b$ (Reichenbach) | $1$ if $a \leq b$; $b/a$ otherwise (Goguen) | $1$ if $a \leq b$; $\frac{1 - a}{1 - b}$ otherwise |
| $\max(0, a + b - 1)$ | $\min(1, 1 - a + b)$ (Łukasiewicz) | $\min(1, 1 - a + b)$ (Łukasiewicz) | $\min(1, 1 - a + b)$ (Łukasiewicz) |
Table 1. Main multiple-valued implications

Notice that the first one also holds for strong implications while the second, as already noticed, holds for the residuated implications as well. Table 1 shows the corresponding strong, residuated and reciprocal implications definable from the three main t-norms and taking the usual negation $n(a) = 1 - a$. Notice that the well-known Łukasiewicz implication $I(a, b) = \min(1, 1 - a + b)$ is both an S-implication and an R-implication, and thus a reciprocal R-implication too. The residuated implication induced by the nilpotent minimum is also an S-implication, defined by:

$$I_R(a, b) = \begin{cases} 1, & \text{if } a \leq b \\ \max(1 - a, b), & \text{otherwise.} \end{cases}$$

More generally, all R-implications such that $I_R(a, 0)$ defines an involutive negation are also S-implications.

Considering only the core of R-implications gives rise to another multiple-valued implication of interest, usually named the Gaines-Rescher implication, namely

$$I_R(a, b) = \begin{cases} 1, & \text{if } a \leq b \\ 0, & \text{otherwise.} \end{cases}$$

Let us observe that this implication fails to satisfy the neutrality property: we only have $I(1, b) \leq b$, since $I(1, b) = 0$ when $b < 1$. Moreover, by construction, this connective is all-or-nothing although it has many-valued arguments.

For more details the reader is referred to studies of various families of fuzzy implication functions satisfying some sets of required properties, for instance see [Baldwin and Pilsworth, 1980; Domingo et al., 1981; Gaines, 1976; Smets and Magrez, 1987; Trillas and Valverde, 1985; Weber, 1983]. See also [Fodor and Yager, 2000] for a more extensive survey of fuzzy implications.
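The residuated implications discussed above can be cross-checked numerically by approximating the supremum $\sup\{z : T(a, z) \leq b\}$ on a grid. This is a rough sketch under stated assumptions: the grid resolution `N`, the function names, and the strict-inequality form of the nilpotent minimum ($\min(a, b)$ when $a + b > 1$) are choices made for the illustration. The grid value can fall short of the true supremum by less than one grid step, which is what the comparison allows for.

```python
from fractions import Fraction as Fr
from itertools import product

N = 20
STEP = Fr(1, N)
GRID = [Fr(k, N) for k in range(N + 1)]
PAIRS = list(product(GRID, repeat=2))

def residuum(T, a, b):
    """Grid approximation of sup{z in [0, 1] : T(a, z) <= b}."""
    return max(z for z in GRID if T(a, z) <= b)

# Generating t-norms.
def t_min(a, b):
    return min(a, b)

def t_prod(a, b):
    return a * b

def t_luk(a, b):
    return max(Fr(0), a + b - 1)

def t_nilpotent_min(a, b):
    return min(a, b) if a + b > 1 else Fr(0)

# Closed forms to compare against.
def goedel(a, b):
    return Fr(1) if a <= b else b

def goguen(a, b):
    return Fr(1) if a <= b else b / a

def lukasiewicz(a, b):
    return min(Fr(1), 1 - a + b)

def nilpotent_residuum(a, b):
    return Fr(1) if a <= b else max(1 - a, b)

def agrees(T, closed_form):
    """True if the grid residuum is within one grid step below the closed form."""
    return all(0 <= closed_form(a, b) - residuum(T, a, b) < STEP for a, b in PAIRS)

checks = {
    "min -> Goedel": agrees(t_min, goedel),
    "product -> Goguen": agrees(t_prod, goguen),
    "Lukasiewicz": agrees(t_luk, lukasiewicz),
    "nilpotent minimum": agrees(t_nilpotent_min, nilpotent_residuum),
}
```

For the minimum, the Łukasiewicz t-norm and the nilpotent minimum the closed forms take values on the grid, so the agreement is exact; for the product the quotient $b/a$ generally is not a grid point, hence the tolerance.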
Remark: Non-Commutative Conjunctions. Dubois and Prade [1984a] have shown that S-implications and R-implications could be merged into a single family, provided that the class of triangular norms is enlarged to non-commutative conjunction operators. See [Fodor, 1989] for a systematic study of this phenomenon. For instance, the Kleene-Dienes S-implication $a \to b = \max(1 - a, b)$ can be obtained by residuation from the non-commutative conjunction

$$T^\star(a, b) = \begin{cases} 0, & \text{if } a + b \leq 1 \\ b, & \text{otherwise.} \end{cases}$$

Note that the nilpotent minimum t-norm value for the pair $(a, b)$ is the minimum of $T^\star(a, b)$ and $T^\star(b, a)$.
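Both observations of this remark can be verified on a grid. The sketch assumes that residuation is taken in the second argument, i.e. $\sup\{z : T^\star(a, z) \leq b\}$, and that the nilpotent minimum takes the value $\min(a, b)$ when $a + b > 1$; the function names and the grid are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

GRID = [Fr(k, 20) for k in range(21)]
PAIRS = list(product(GRID, repeat=2))

def t_star(a, b):
    """Non-commutative conjunction of the remark."""
    return Fr(0) if a + b <= 1 else b

def residuum(T, a, b):
    """Largest grid point z with T(a, z) <= b (residuation in the second argument)."""
    return max(z for z in GRID if T(a, z) <= b)

def kleene_dienes(a, b):
    return max(1 - a, b)

def nilpotent_min(a, b):
    return min(a, b) if a + b > 1 else Fr(0)

# T* is not commutative: the two orders of the arguments can differ.
non_commutative = t_star(Fr(3, 10), Fr(9, 10)) != t_star(Fr(9, 10), Fr(3, 10))

# Residuating T* yields the Kleene-Dienes implication on the grid.
residuum_is_kd = all(residuum(t_star, a, b) == kleene_dienes(a, b) for a, b in PAIRS)

# Symmetrizing T* by min gives the nilpotent minimum.
symmetrized = all(min(t_star(a, b), t_star(b, a)) == nilpotent_min(a, b)
                  for a, b in PAIRS)
```

Note that $T^\star(a, b)$ may exceed $\min(a, b)$, for instance $T^\star(0.3, 0.9) = 0.9$, so it falls outside the class of t-norms for more than one reason.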
2.2 The possibility-theoretic view of reasoning after Zadeh
The core of Zadeh’s approach to approximate reasoning [Zadeh, 1979a] can retrospectively be viewed as relying on two main ideas: i) the possibility distribution-based representation of pieces of knowledge, and ii) a combination / projection method that makes sense in the framework of possibility theory. This is what is restated in this section.

Possibility distributions and the minimal specificity principle

Zadeh’s knowledge representation framework is based on the idea of expressing restrictions on the possible values of so-called variables. These variables are more general than the notion of propositional variable in logic, and refer to parameters or single-valued attributes used for describing a situation, such as, for instance, the pressure or the temperature of a room, or the size, the age, or the sex of a person. As in the case of random variables and probability distributions, the ill-known value of these variables can be associated with distributions mapping the domain of the concerned parameter or attribute to the real unit interval $[0, 1]$. These distributions are named possibility distributions. Thus, what is known about the value of a variable $x$, whose domain is a set $U$, is represented by a possibility distribution $\pi_x$. A value $\pi_x(u)$ is to be understood as the degree of possibility that $x = u$ (variable $x$ takes value $u$). When $\pi_x(u) = 0$, it means that the value $u$ (in $U$) is completely impossible for $x$, while $\pi_x(u)$ is all the larger as $u$ is considered to be a more possible (or in fact, less impossible) value for $x$; $\pi_x(u) = 1$ expresses that absolutely nothing forbids considering $u$ as a possible value for $x$, but there may exist other values $u'$ such that $\pi_x(u') = 1$. In that sense, $\pi_x$ expresses potential possibility.

Since knowledge is often expressed linguistically in practice, Zadeh uses fuzzy sets as a basis for the possibilistic representation setting that he proposes.
Then a fuzzy set $E$ is used to represent an incomplete piece of information about the value of a single-valued variable $x$; the membership degree attached to a value expresses the level of possibility that this value is indeed the value of the variable. This is
what happens if the available information is couched in words, more precisely in fuzzy statements $S$ of the form “$x$ is $E$”, as in, e.g., “Tom is young”. Here the fuzzy set “young” represents the set of possible values of the variable $x$ = age of Tom. The fuzzy set $E$ is then interpreted as a possibility distribution [Zadeh, 1978a], which expresses the levels of plausibility of the possible values of the ill-known variable $x$. Namely, if the only available knowledge about $x$ is that “$x$ lies in $E$”, where $E$ is a fuzzy subset of $U$, then the possibility distribution of $x$ is defined by the equation:

$$\pi_x(u) = \mu_E(u), \quad \forall u \in U, \tag{10}$$

where $E$ (with membership function $\mu_E$) is considered as the fuzzy set of (more or less) possible values of $x$ and where $\pi_x$ ranges on $[0, 1]$. More generally, the range of a possibility distribution can be any bounded linearly ordered scale (which may be discrete, with a finite number of levels). Fuzzy sets, viewed as possibility distributions, act as flexible constraints on the values of variables referred to in natural language sentences. The above equation represents a statement of the form “$x$ lies in $E$” or, more informally, “$x$ is $E$”. It does not mean that possibility distributions are the same as membership functions, however. The equality $\pi_x = \mu_E$ is an assignment statement since it means: given that the only available knowledge is “$x$ lies in $E$”, the degree of possibility that $x = u$ is evaluated by the degree of membership $\mu_E(u)$.

If two possibility distributions pertaining to the same variable $x$, $\pi_x$ and $\pi'_x$, are such that $\pi_x < \pi'_x$, then $\pi_x$ is said to be more specific than $\pi'_x$ in the sense that no value $u$ is considered as less possible for $x$ according to $\pi'_x$ than to $\pi_x$. This concept of specificity, whose importance was first stressed by Yager [1983a], underlies the idea that any possibility distribution $\pi_x$ is provisional in nature and likely to be improved by further information, when the available one is not complete.
When $\pi_x < \pi'_x$, the information $\pi'_x$ is redundant and can be dropped. When the available information stems from several reliable sources, the possibility distribution that accounts for it is the least specific possibility distribution that satisfies the set of constraints induced by the pieces of information given by the different sources. This is the principle of minimal specificity. In particular, it means that given a statement “$x$ is $E$”, any possibility distribution $\pi$ such that $\pi(u) \leq \mu_E(u)$, $\forall u \in U$, is in accordance with “$x$ is $E$”. However, in order to represent our knowledge about $x$, choosing a particular $\pi$ such that $\exists u$, $\pi(u) < \mu_E(u)$ would be arbitrarily too precise. Hence the equality $\pi_x = \mu_E$ is naturally adopted if “$x$ is $E$” is the only available knowledge, and already embodies the principle of minimal specificity.

Let $x$ and $y$ be two variables taking their values on domains $U$ and $V$ respectively. Any relation $R$, fuzzy or not, between them can be represented by a joint possibility distribution, $\pi_{x,y} = \mu_R$, which expresses a (fuzzy) restriction on the Cartesian product $U \times V$. Common examples of such fuzzy relations $R$ between two variables $x$ and $y$ are representations of “approximately equal” (when $U = V$), “much greater than” (when $U = V$ is linearly ordered), or function-like relations
such as the one expressed by the fuzzy rule “if $x$ is small then $y$ is large” (when $U$ and $V$ are numerical domains). Joint possibility distributions can be easily extended to more than two variables. Generally speaking, we can thus represent fuzzy statements $S$ of the form “$(x_1, \ldots, x_n)$ are in relation $R$” (where $R$ may be itself defined from more elementary fuzzy sets, as seen later in the case of fuzzy rules).

Possibility and necessity measures

The extent to which the information “$x$ is $E$”, represented by the possibility distribution $\pi_x = \mu_E$, is consistent with a statement like “the value of $x$ is in subset $A$” is estimated by means of the possibility measure $\Pi$, defined by Zadeh [1978a]:

$$\Pi(A) = \sup_{u \in A} \pi_x(u), \tag{11}$$
where $A$ is a classical subset of $U$. The value of $\Pi(A)$ corresponds to the element(s) of $A$ having the greatest possibility degree according to $\pi_x$; in the finite case, “sup” can be changed into “max” in the above definition of $\Pi(A)$ in eq. (11). $\Pi(A) = 0$ means that $x \in A$ is impossible knowing that “$x$ is $E$”. $\Pi(A)$ estimates the consistency of the statement “$x \in A$” with what we know about the possible values of $x$. It corresponds to a logical view of possibility. Indeed, if $\pi_x$ models a non-fuzzy piece of incomplete information represented by an ordinary subset $E$, the definition of a possibility measure reduces to

$$\Pi_E(A) = \begin{cases} 1, & \text{if } A \cap E \neq \emptyset \ (x \in A \text{ and } x \in E \text{ are consistent}) \\ 0, & \text{otherwise } (A \text{ and } E \text{ are mutually exclusive}). \end{cases} \tag{12}$$

Any possibility measure $\Pi$ satisfies the following max-decomposability characteristic property

$$\Pi(A \cup B) = \max(\Pi(A), \Pi(B)). \tag{13}$$

Among the features of possibility measures that contrast with probability measures, let us point out the weak relationship between the possibility of an event $A$ and that of its complement $A^c$ (’not A’). Either $A$ or $A^c$ must be possible, that is, $\max(\Pi(A), \Pi(A^c)) = 1$ due to $A \cup A^c = U$ and $\Pi(U) = 1$ (normalization of $\Pi$). The normalization of $\Pi$ requires that $\sup_{u \in U} \pi_x(u) = 1$; if $U$ is finite, it amounts to requiring the existence of some $u_0 \in U$ such that $\pi_x(u_0) = 1$. This normalization expresses consistency of the information captured by $\pi_x$ (it will be even clearer when discussing possibilistic logic). $\Pi(U)$ estimates the consistency of the statement “$x \in U$” (it is a tautology if $U$ is an exhaustive set of possible values) with what we know about the possible values of $x$. Indeed, it expresses that not all the values $u$ are somewhat impossible for $x$ (to a degree $1 - \pi_x(u) > 0$) and that at least one value $u_0$ will be fully possible. In case of total ignorance, $\forall u \in U$, $\pi(u) = 1$. Then, all contingent events are fully possible: $\Pi(A) = 1 = \Pi(A^c)$, $\forall A \neq \emptyset, U$. Note
that this leads to a representation of ignorance ($E = U$ and $\forall A \neq \emptyset$, $\Pi_E(A) = 1$) which presupposes nothing about the number of elements in the reference set $U$ (elementary events), while the latter aspect plays a crucial role in probabilistic modeling. The case when $\Pi(A) = 1$, $\Pi(A^c) > 0$ corresponds to partial ignorance about $A$. Besides, only $\Pi(A \cap B) \leq \min(\Pi(A), \Pi(B))$ holds. It agrees with the fact that in case of total ignorance about $A$, $\Pi(A) = \Pi(A^c) = 1$, while for $B = A^c$, $\Pi(A \cap B) = 0$ since $\Pi(\emptyset) = 0$.

The index $1 - \Pi(A^c)$ evaluates the impossibility of ’not A’, hence the certainty (or necessity) of occurrence of $A$, since when ’not A’ is impossible then $A$ is certain. It is thus natural to use this duality and define the degree of necessity of $A$ [Dubois and Prade, 1980; Zadeh, 1979b] as

$$N(A) = 1 - \Pi(A^c) = \inf_{u \notin A} \, (1 - \pi_x(u)). \tag{14}$$
Clearly, a necessity measure $N$ satisfies $N(A \cap B) = \min(N(A), N(B))$. In case of a discrete linearly ordered scale, the mapping $s \mapsto 1 - s$ would be replaced by the order-reversing map of the scale. The above duality relation is clearly reminiscent of modal logics that handle pairs of modalities related by a relation of the form $\Box p \equiv \neg \Diamond \neg p$. But here possibility and necessity are graded. Note that the definitions of possibility and necessity measures are qualitative in nature, since they only require a bounded linearly ordered scale. Modal accounts of possibility theory involving conditional statements have been proposed in [Lewis, 1973b] (this is called the VN conditional logic), [Fariñas and Herzig, 1991; Boutilier, 1994; Fariñas et al., 1994; Hájek et al., 1994; Hájek, 1994].

Before Zadeh, a graded notion of possibility was introduced as a full-fledged approach to uncertainty and decision between the 1940s and the 1970s by the English economist G. L. S. Shackle [1961], who called degree of potential surprise of an event its degree of impossibility, that is, the degree of necessity of the opposite event. It makes the point that possibility, in possibility theory, is understood as being potential, not actual. Shackle’s notion of possibility is basically epistemic: it is a “character of the chooser’s particular state of knowledge in his present.” Impossibility is then understood as disbelief. Potential surprise is valued on a disbelief scale, namely a positive interval of the form $[0, y^*]$, where $y^*$ denotes the absolute rejection of the event to which it is assigned. The Shackle scale is thus reversed with respect to the possibility scale. In case everything is possible, all mutually exclusive hypotheses have zero surprise (corresponding to the ignorant possibility distribution where $\pi(u) = 1$, $\forall u$). At least one elementary hypothesis must carry zero potential surprise (the normalization condition $\pi(u) = 1$, for some $u$).
The degree of surprise of an event, a set of elementary hypotheses, is the degree of surprise of its least surprising realization (the basic “maxitivity” axiom of possibility theory). The disbelief notion introduced later by Spohn [1990] employs the same type of convention as potential surprise, but using the set of natural integers as a disbelief scale.
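The elementary properties of possibility and necessity measures recalled in this subsection can be checked exhaustively on a small finite universe. The following is only a sketch: the distribution `pi`, its grades and the function names are hypothetical, and in the finite case the supremum and infimum are computed as max and min.

```python
from fractions import Fraction as Fr
from itertools import combinations

def possibility(pi, A):
    """Pi(A) = max of pi over A (0 for the empty set)."""
    return max((pi[u] for u in A), default=Fr(0))

def necessity(pi, A):
    """N(A) = 1 - Pi(complement of A)."""
    return 1 - possibility(pi, set(pi) - set(A))

# Hypothetical normalized possibility distribution on a four-element universe.
pi = {"a": Fr(1), "b": Fr(7, 10), "c": Fr(1, 5), "d": Fr(0)}
U = set(pi)
events = [set(c) for r in range(len(U) + 1) for c in combinations(sorted(U), r)]

maxitive = all(possibility(pi, A | B) == max(possibility(pi, A), possibility(pi, B))
               for A in events for B in events)
minitive = all(necessity(pi, A & B) == min(necessity(pi, A), necessity(pi, B))
               for A in events for B in events)
one_side_possible = all(max(possibility(pi, A), possibility(pi, U - A)) == 1
                        for A in events)
only_inequality = all(possibility(pi, A & B) <= min(possibility(pi, A), possibility(pi, B))
                      for A in events for B in events)

# The inequality can be strict: an event and its complement never intersect.
A = {"a"}
strict_case = possibility(pi, A & (U - A)) < min(possibility(pi, A), possibility(pi, U - A))
```

With this distribution, $N(\{a, b\}) = 1 - \max(0.2, 0) = 0.8$, whereas $N(\{a\}) = 1 - 0.7 = 0.3$.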
Inference in approximate reasoning

Inference in the framework of possibility theory as described by Zadeh [1979a] is a four-step procedure whose steps can be respectively termed i) representation; ii) combination; iii) projection; iv) interpretation. Namely, given a set of $n$ statements $S_1, \ldots, S_n$ expressing fuzzy restrictions that form a knowledge base, inference proceeds in the following way:

i) Representation. Translate $S_1, \ldots, S_n$ into possibility distributions $\pi^1, \ldots, \pi^n$ restricting the values of the involved variables. In particular, facts of the form $S_t$ = “$x$ is $F$” translate into $\pi^t_x = \mu_F$. Statements of rules of the form $S_t$ = “if $x$ is $F$ then $y$ is $G$” translate into possibility distributions $\pi^t_{x,y} = \mu_R$ with $\mu_R = f(\mu_F, \mu_G)$, where $f$ depends on the intended semantics of the rule, as explained below in Section 2.4. Let $\mathbf{x} = (x_1, \ldots, x_k, \ldots, x_m)$ be a vector made of all the variables involved in statements $S_1, \ldots, S_n$. Assume $S_t$ only involves variables $x_1, \ldots, x_k$; then its possibility distribution can be cylindrically extended to $\mathbf{x}$ as

$$\pi^t_{\mathbf{x}}(u_1, \ldots, u_k, u_{k+1}, \ldots, u_m) = \pi^t(u_1, \ldots, u_k), \quad \forall u_{k+1}, \ldots, u_m,$$

which means that the possibility that $x_1 = u_1, \ldots, x_k = u_k$ according to $S_t$ does not depend on the values $u_{k+1}, \ldots, u_m$ taken by the other variables $x_{k+1}, \ldots, x_m$.

ii) Combination. Combine the possibility distributions $\pi^1_{\mathbf{x}}, \ldots, \pi^n_{\mathbf{x}}$ obtained at step (i) in a conjunctive way in order to build a joint possibility distribution $\pi_{\mathbf{x}}$ expressing the contents of the whole knowledge base, namely,

$$\pi_{\mathbf{x}} = \min(\pi^1_{\mathbf{x}}, \ldots, \pi^n_{\mathbf{x}}).$$

Indeed each granule of knowledge “$x$ is $E_i$”, for $i = 1, \ldots, n$, as already said, translates into the inequality constraint

$$\forall u, \ \pi_x(u) \leq \mu_{E_i}(u). \tag{15}$$

Thus given several pieces of knowledge of the form “$x$ is $E_i$”, for $i = 1, \ldots, n$, we have

$$\forall i, \ \pi_x \leq \mu_{E_i}, \ \text{ or equivalently } \ \pi_x \leq \min_{i=1,\ldots,n} \mu_{E_i}. \tag{16}$$
Taking into account all the available pieces of knowledge S1 = “x is E1 ”,. . . , Sn = “x is En ”, the minimal specificity principle is applied. It is a principle of minimal commitment that stipulates that anything that is not explicitly declared impossible should remain possible (in other words, one has not to be more restrictive about the possible situations than what is enforced by the available pieces of knowledge). Thus, the available information should be represented by the possibility distribution:
$$\pi_x(u) = \min_{i=1,\ldots,n} \mu_{E_i}(u). \tag{17}$$
iii) Projection. Then $\pi_{\mathbf{x}}$ is projected on the domain(s) corresponding to the variable(s) of interest, i.e., the variable(s) for which one wants to know the restriction that can be deduced from the available information. Given a joint possibility distribution $\pi_{x,y}$ involving two variables defined on $U \times V$ (the extension to $n$ variables is straightforward), its projection $\pi_y$ on $V$ is obtained as [Zadeh, 1975b]:

$$\pi_y(v) = \sup_{u \in U} \pi_{x,y}(u, v). \tag{18}$$
Clearly, what is computed is the possibility measure for having $y = v$ given $\pi_{x,y}$. Generally, $\pi_{x,y} \leq \min(\pi_x, \pi_y)$ where $\pi_x(u) = \Pi(\{u\} \times V)$. When equality holds, $\pi_{x,y}$ is then said to be min-separable, and the variables $x$ and $y$ are said to be non-interactive [Zadeh, 1975b]. The projection (18) is in accordance with the principle of minimal specificity, since $\pi_y(v)$ is calculated from the highest possibility value of pairs $(x, y)$ where $y = v$. When modeling incomplete information, non-interactivity expresses a lack of knowledge about potential links between $x$ and $y$. Namely, if we start with two pieces of knowledge represented by $\pi_x$ and $\pi_y$, and if we do not know whether $x$ and $y$ are interactive or not, i.e., $\pi_{x,y}$ is not known, we use the upper bound $\min(\pi_x, \pi_y)$ instead, which is less informative (but which agrees with the available knowledge).

The combination and projection steps are also in agreement with Zadeh’s entailment principle, which states that if “$x$ is $E$” then “$x$ is $F$”, as soon as the fuzzy set inclusion $E \subseteq F$ holds, i.e., $\forall u$, $\mu_E(u) \leq \mu_F(u)$, where $x$ denotes a variable or a tuple of variables, and $u$ any instantiation of them. Indeed, if $F$ is entailed by the knowledge base, i.e., $\min_{i=1,\ldots,n} \mu_{E_i} \leq \mu_F$, then $F$ can be added to the knowledge base without changing anything, since $\pi_x = \min(\min_{i=1,\ldots,n} \mu_{E_i}, \mu_F) = \min_{i=1,\ldots,n} \mu_{E_i}$.

iv) Interpretation. This last step, which is not always used, aims at providing conclusions that are linguistically interpretable [Zadeh, 1978b]. Indeed, at step (i) one starts with linguistic-like statements of the form “$x_i$ is $E_i$”, and at step (iii) what is obtained is a possibility distribution $\pi_y$ (or $\pi_{\mathbf{y}}$ in case of a subset of variables), and not something of the form “$y$ is $F$”. $F$ as the best linguistic approximation of the result of step (iii) should obey three conditions: (a) $F$ belongs to some subsets of fuzzy sets (defined on the domain $V$ of $y$) that represent linguistic labels or some combinations of them that are authorized (e.g.
“not very young and not very old”, built from the elementary linguistic labels “young” and “old”); (b) F should agree with the entailment principle, i.e. obey the constraint πy ≤ µF ;
(c) $F$ should be maximally specific, i.e. as small as possible (in the sense of fuzzy set inclusion); in order to have a conclusion that is meaningful for the end-user (condition a), valid (condition b), and as precise as permitted (condition c), see, e.g. [Baldwin, 1979] for a solution to this optimization problem.

Observe that if the pieces of knowledge are not fuzzy but clear-cut, this four-step procedure reduces to classical deduction, since a classical logic knowledge base is generally viewed as equivalent to the logical conjunction of the logical formulas $p_i$ that belong to the base. Moreover, in the case of propositional logic, asserting $p_i$, where $p_i$ is a proposition, amounts to saying that any interpretation (situation) that falsifies $p_i$ is impossible, because it would not be compatible with the state of knowledge. So, at the semantic level, $p_i$ can be represented by the possibility distribution $\pi^i = \mu_{[p_i]}$, where $[p_i]$ is the set of models of $p_i$, and $\mu_{[p_i]}$ its characteristic function. The procedure also encompasses possibilistic logic (see section 4.1) as a particular case [Dubois and Prade, 1991a], where pieces of knowledge are semantically equivalent to prioritized crisp constraints of the form $N(E_i) \geq \alpha_i$ and $N$ is a necessity measure. Such an inequality has a unique minimally specific solution, namely the possibility distribution $\pi^i_x = \max(\mu_{E_i}, 1 - \alpha_i)$. Propositional logic corresponds to the case where $\forall i$, $\alpha_i = 1$ (and $E_i = [p_i]$).

The combination and projection steps applied to a fact $S_1$ = “$x$ is $F'$” and a rule $S_2$ = “if $x$ is $F$ then $y$ is $G$” yield

$$\pi_y(v) = \sup_{u \in U} \min(\mu_{F'}(u), \mu_R(u, v)),$$
where $\mu_R$ represents the rule $S_2$. Then, the fact “$y$ is $G'$” is inferred such that $\mu_{G'}(v) = \pi_y(v)$. This is called the generalized modus ponens, first proposed by Zadeh [1973]. However, $\mu_{G'} = \mu_G$ follows from $\mu_{F'} = \mu_F$ only for a particular choice of $f$ in $\mu_R = f(\mu_F, \mu_G)$, as discussed below in Section 2.5.
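The representation, combination and projection steps leading to the generalized modus ponens can be sketched on discrete domains. Everything below is an illustrative assumption: the universes, the membership grades, the helper names, and above all the two candidate choices of $f$ (a Gödel-type residuated implication and the Kleene-Dienes implication), which are used only to show that the recovery of $G$ from $F' = F$ depends on $f$.

```python
from fractions import Fraction as Fr

# Hypothetical fuzzy sets F on U and G on V.
F = {"u1": Fr(1), "u2": Fr(1, 2), "u3": Fr(0)}
G = {"v1": Fr(3, 10), "v2": Fr(1)}

def goedel(a, b):
    return Fr(1) if a <= b else b

def kleene_dienes(a, b):
    return max(1 - a, b)

def rule(F, G, impl):
    """Representation of 'if x is F then y is G' as mu_R(u, v) = impl(F(u), G(v))."""
    return {(u, v): impl(F[u], G[v]) for u in F for v in G}

def cylindrical_extension(pi_x, V):
    """Extend a distribution on U to U x V; it does not depend on v."""
    return {(u, v): pi_x[u] for u in pi_x for v in V}

def combine(*dists):
    """Conjunctive (min) combination of distributions on the same domain."""
    return {k: min(d[k] for d in dists) for k in dists[0]}

def project_on_y(joint, V):
    """Projection: pi_y(v) = max over u of pi_{x,y}(u, v)."""
    return {v: max(p for (u, w), p in joint.items() if w == v) for v in V}

def generalized_modus_ponens(F_prime, F, G, impl):
    joint = combine(cylindrical_extension(F_prime, list(G)), rule(F, G, impl))
    return project_on_y(joint, list(G))

# With the fact equal to the rule condition (F' = F):
g_goedel = generalized_modus_ponens(F, F, G, goedel)
g_kd = generalized_modus_ponens(F, F, G, kleene_dienes)
```

In this example the Gödel-based rule returns exactly `G`, whereas the Kleene-Dienes-based rule returns a less specific conclusion, with grade 1/2 instead of 3/10 for `v1`.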
2.3 Fuzzy truth-values - Degree of truth vs. degree of uncertainty
Zadeh [1978b; 1979a] also emphasizes that his theory of approximate reasoning can be interpreted in terms of what he calls “fuzzy truth-values” (see also [Bellman and Zadeh, 1977]). This terminology has led to many misunderstandings (e.g., [Haack, 1979]), which bring us back to the often-made confusion (already mentioned in the introduction) between intermediate truth and uncertainty, hence between degree of truth and degree of belief. This is the topic of this section.

Fuzzy truth-values as compatibility profiles

It was emphasized earlier that Zadeh’s approach to approximate reasoning is based on a representation of the contents of the pieces of information. This led Bellman and Zadeh [1977] to claim that the notion of truth is local rather than absolute:
Didier Dubois, Francesc Esteva, Llu´ıs Godo and Henri Prade
a statement can be true only with respect to another statement held for sure. In other words, truth is viewed as the compatibility between a statement and “what is known about reality”, understood as the description of some actual state of facts as stored in a database. Namely, computing the degree of truth of a statement S comes down to estimating its conformity with the description D of what is known about the actual state of facts. This point of view is in accordance with the test-score semantics for natural languages of Zadeh [1981]. It does not lead to scalar degrees of truth, but to fuzzy sets of truth-values in general. Bellman and Zadeh [1977] define the fuzzy truth-value of a fuzzy statement S = “x is A” given that another one, D = “x is B”, is taken for granted. When B = {u_0}, i.e., D = “x is (equal to) u_0”, the degree of truth of S is simply µ_A(u_0), the degree of membership of u_0 in the fuzzy set A. More generally, the information on the degree of truth of S given D will be described by a fuzzy set τ(S; D) (or simply τ for short) of the unit interval [0, 1], understood as the compatibility COM(A; B) of the fuzzy set A with respect to the fuzzy set B, with membership function

\[ \tau(\alpha) = \mu_{COM(A;B)}(\alpha) = \begin{cases} \sup\{B(u) \mid A(u) = \alpha\}, & \text{if } A^{-1}(\alpha) \neq \emptyset \\ 0, & \text{otherwise} \end{cases} \tag{19} \]

for all α ∈ [0, 1]. As can be checked, τ(S; D) is a fuzzy subset of truth-values and τ(α) is the degree of possibility, according to the available information D, that there exists an interpretation that makes S true at degree α. In fact, τ(S; D) is an epistemic state. As a consequence, truth evaluation comes down to a semantic pattern matching procedure. Six noteworthy situations can be encountered [Dubois and Prade, 1988b], [Dubois et al., 1991c]. In each situation, a particular case of τ(S; D) is obtained.

a) Boolean statement evaluated under complete information: S is a classical statement and D is a precise (i.e., complete) description of the actual state of facts.
Namely, A is not fuzzy and B = {u_0}. Either D is compatible with S and S is true (this is when u_0 ∈ A) and τ(S; D) = {1}; or D is not compatible with S and S is false (this is when u_0 ∉ A) and τ(S; D) = {0}. This situation prevails for any Boolean statement S. When B is the set of models of a classical knowledge base K, this situation arises when K is logically complete.

b) Fuzzy statement evaluated under complete information: In that case D is still of the form x = u_0, but the conformity of S with respect to D becomes a matter of degree, because A is a fuzzy set. The actual state of facts B = {u_0} can be borderline for A. For instance, the statement S to evaluate is “John is tall” and it is known that D = “John’s height is 1.75 m”. Then τ(S; D) = {A(u_0)}, a precise value in [0, 1]. What can be called a degree of truth can then be attached to the statement S (in our example τ(S; D) = tall(1.75)); by convention τ(S; D) = {1} implies that S is true, and τ(S; D) = {0} implies that S is false. But S can
Fuzzy Logic
be half-false as well. In any case, the truth-value of S is precisely known. This situation is captured by truth-functional many-valued logics.

c) Fuzzy statement; incomplete non-fuzzy information: In this case, the information D does not contain fuzzy information but is just incomplete, and A is a fuzzy set. Then, it can be checked that τ(S; D) is a crisp set of truth values {A(u) : u ∈ B}. This set is lower bounded by inf_{u∈B} A(u) and upper bounded by sup_{u∈B} A(u), and it represents the potential truth-values of S.

d) Boolean statement evaluated under incomplete non-fuzzy information: In that case, S and D are representable in classical logic, neither A nor B is fuzzy, and the conformity of S with respect to D is still an all-or-nothing matter but may be ill-known because D does not precisely describe the actual state of facts, i.e., there may be two distinct states of facts u and u′ that are both compatible with D such that u is compatible with S but u′ is compatible with “not S”. Hence the truth-value of S, which is either true or false (since A is not fuzzy), may be unknown. Namely, either D classically entails S, so S is certainly true (this is when B ⊆ A), and τ(S; D) = {1}; or D is not compatible with S, so S is certainly false (this is when B ∩ A = ∅) and τ(S; D) = {0}. But there is a third case, namely when D neither classically entails S nor entails its negation (this is when B ∩ A ≠ ∅ and B ∩ A^c ≠ ∅). Then the (binary) truth-value of S is unknown. This corresponds to the fuzzy truth-value τ(S; D) = {0, 1}. This situation is fully described in classical logic. The logical view of possibility is to let Π_B(A) = 1 when B ∩ A ≠ ∅, and Π_B(A) = 0 otherwise. It can be checked that, generally,

\[ \tau(S; D)(0) = \mu_{COM(A;B)}(0) = \Pi_B(A^c), \qquad \tau(S; D)(1) = \mu_{COM(A;B)}(1) = \Pi_B(A). \]

Equivalently, N_B(A) = 1 − Π_B(A^c) = 1 is interpreted as the assertion of the certainty of S.
Hence the fuzzy truth-value provides a complete description of the partial belief of S. So, fuzzy truth-values describe uncertainty as much as truth (see also Yager [1983b]).

e) Boolean statement evaluated under fuzzy information: In that case, S is a classical logic statement (A is an ordinary set) but D contains fuzzy information. The conformity of S with respect to the actual state of facts is still an all-or-nothing matter but remains ill-known, as in the previous case. The presence of fuzzy information in D makes it possible to qualify the uncertainty about the truth-value of S in a more refined way. A grade of possibility Π_B(A), intermediary between 0 and 1, can be attached to S. This grade is interpreted as the level of consistency between S and D. The dual level N_B(A) = 1 − Π_B(A^c) is interpreted as the degree of certainty of S and expresses the extent to which S is a consequence of D. These are standard possibility and necessity measures as recalled above.
Clearly these numbers are not degrees of truth, but only reflect a state of belief about the truth or the falsity of statement S. In such a situation, the fuzzy truth-value τ(S; D) reduces to a fuzzy set τ of {0, 1}, such that τ(0) = Π_B(A^c) and τ(1) = Π_B(A). Moreover, if the fuzzy sets A and B are normalized, we have max(τ(0), τ(1)) = 1, i.e., τ is a normalized fuzzy set of {0, 1}.

f) Fuzzy statement evaluated under fuzzy incomplete information: When both S and D can be expressed as fuzzy sets, the fuzzy truth-value τ(S; D) is a genuine fuzzy subset of [0, 1]. It restricts the more or less possible values of the degree of truth. Indeed, in this case, truth may altogether be a matter of degree and may be ill-known. In other words, to each truth-value α = τ(S; u), representing the degree of conformity of the fuzzy statement S with some precise state of facts u compatible with D, a degree of possibility τ(α) that S has truth-value α is assigned. It reflects the uncertainty that u be the true state of facts. This is the most complex situation.

In the particular case where S = “x is A” and D = “x is A” (i.e., B = A), the compatibility COM(A; A) reduces to

\[ \tau(\alpha) = \mu_{COM(A;A)}(\alpha) = \begin{cases} \alpha, & \text{if } A^{-1}(\alpha) \neq \emptyset \\ 0, & \text{otherwise.} \end{cases} \tag{20} \]

When A^{-1}(α) ≠ ∅ for all α, τ(α) = α, ∀α ∈ [0, 1]. This particular fuzzy truth value corresponds to the idea of “certainly true” (“u-true” in Zadeh’s original terminology). In case A^{-1}(α) = ∅, ∀α except 0 and 1, i.e., A is non-fuzzy, “certainly true” enforces standard Boolean truth (our case (a) above), since then COM(A; A) = {1}, whose membership function is µ_COM(A;A)(1) = 1 and µ_COM(A;A)(0) = 0 on the truth set {0, 1}. The fuzzy truth-value COM(A; B) thus precisely describes the relative position of fuzzy set A (involved in statement S) with respect to fuzzy set B (involved in statement D).
It can be summarized by means of two indices, the possibility and necessity of fuzzy events, respectively expressing the degree of consistency of S with respect to D and the degree of entailment of S from D, namely:

\[ \Pi_B(A) = \sup_{u \in U} \min\bigl(A(u), B(u)\bigr), \qquad N_B(A) = \inf_{u \in U} \max\bigl(A(u), 1 - B(u)\bigr). \]
Indeed, Π_B(A) and N_B(A) can be directly computed from the fuzzy truth-value COM(A; B). Namely, as pointed out in [Baldwin and Pilsworth, 1979; Prade, 1982; Yager, 1983b; Dubois and Prade, 1985a]:

\[ \Pi_B(A) = \sup_{\alpha \in [0,1]} \min\bigl(\alpha, \mu_{COM(A;B)}(\alpha)\bigr) \tag{21} \]

\[ N_B(A) = \inf_{\alpha \in [0,1]} \max\bigl(\alpha, 1 - \mu_{COM(A;B)}(\alpha)\bigr) \tag{22} \]
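On a finite universe, the compatibility profile (19) and the two ways of computing possibility and necessity can be compared directly. The following is a minimal sketch; the height universe and the membership grades of the hypothetical fuzzy sets `tall` (playing the role of A) and `about_175` (playing the role of B) are invented for illustration.

```python
# Sketch: compatibility COM(A; B) on a finite universe, and possibility /
# necessity computed both from A and B and from the fuzzy truth-value tau.
# Membership grades are hypothetical.

def compatibility(A, B):
    """tau(alpha) = max{B[u] : A[u] == alpha}; levels never reached by A
    are simply left out of the dict (their tau is 0 by convention)."""
    tau = {}
    for u, a in A.items():
        tau[a] = max(tau.get(a, 0.0), B[u])
    return tau

def possibility(A, B):
    """Pi_B(A) = max_u min(A(u), B(u))."""
    return max(min(A[u], B[u]) for u in A)

def necessity(A, B):
    """N_B(A) = min_u max(A(u), 1 - B(u))."""
    return min(max(A[u], 1.0 - B[u]) for u in A)

def possibility_from_tau(tau):
    """Right-hand side of (21), restricted to levels where tau may be nonzero."""
    return max(min(alpha, t) for alpha, t in tau.items())

def necessity_from_tau(tau):
    """Right-hand side of (22); levels with tau = 0 contribute 1 and are skipped."""
    return min(max(alpha, 1.0 - t) for alpha, t in tau.items())

# S = "x is tall", D = "x is about 175 cm" (heights in cm).
tall = {160: 0.0, 170: 0.25, 175: 0.5, 180: 0.75, 185: 0.75, 190: 1.0}
about_175 = {160: 0.0, 170: 0.5, 175: 1.0, 180: 0.5, 185: 0.25, 190: 0.0}

tau = compatibility(tall, about_175)
print(tau)
print(possibility(tall, about_175), possibility_from_tau(tau))
print(necessity(tall, about_175), necessity_from_tau(tau))
```

Both routes give the same two indices on this example, as (21) and (22) state; the dict-based representation of τ is only a convenience for the finite case.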
Truth qualification

This view of local truth leads Zadeh [1979a] to reconstruct a statement “x is B” from a fuzzy truth-qualified statement of the form “(x is A) is τ-true”, where τ is a fuzzy subset of [0, 1] (that may mean for instance “almost true”, “not very true”, ...), according to the following equivalence:

\[ (x \text{ is } A) \text{ is } \tau \;\Leftrightarrow\; x \text{ is } B. \]

So, given that “(x is A) is τ-true”, the fuzzy set B such that “(x is A) is τ-true given that x is B” is any solution of the following functional equation:

\[ \forall \alpha \in [0, 1], \quad \tau(\alpha) = \mu_{COM(A;B)}(\alpha), \]

where τ and A are known. The principle of minimal specificity leads us to consider the greatest solution B to this equation, defined, after [Bellman and Zadeh, 1977; Sanchez, 1978], as

\[ B(u) = \tau(A(u)), \quad \forall u. \tag{23} \]

This is also supported by an equivalent definition of COM(A; B) [Godo, 1990], which is

\[ \mu_{COM(A;B)} = \inf\{f \mid f : [0, 1] \to [0, 1],\ f \circ A \geq B\}, \]

where inf and ≥ refer respectively to the point-wise infimum and inequality; that is, COM(A; B) represents the minimal functional modification required for the fuzzy subset A in order to include the fuzzy subset B, in agreement with the entailment principle. The similarity of B(u) = τ(A(u)) with the modeling of linguistic modifiers [Zadeh, 1972], such as “very” (very A(u) = (A(u))^2), has been pointed out. Indeed, linguistic hedges can be viewed as a kind of truth-qualifiers. This is not surprising since in natural language, truth-qualified sentences like “It is almost true that John is tall” stand for “John is almost tall”. Using this representation, fuzzy sets of [0, 1] can be interpreted in terms of fuzzy truth-values [Bellman and Zadeh, 1977; Baldwin, 1979; Yager, 1985b]. In particular:

• “It is true that x is A” must be equivalent to “x is A”, so that the fuzzy set of [0, 1] with membership function τ(α) = α has been named true in the literature (while it really means “certainly true”).

• “It is false that x is A” is often equivalent to the negative statement “x is not-A”, that is, “x is A^c” with A^c(·) = 1 − A(·); hence the fuzzy set of [0, 1] with membership function τ(α) = 1 − α has been named false (while it really means “certainly false”).

• “It is unknown if x is A” must be equivalent to “x is U”, where U is the whole domain of x. Hence, the set [0, 1] itself corresponds to the case of a totally unknown truth-value. This is a clear indication that what Zadeh calls a fuzzy truth value is not a genuine truth-value: unknown is not a truth-value, it expresses a state of (lack of) knowledge.
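The reconstruction B(u) = τ(A(u)) of equation (23) and the three qualifiers listed above can be illustrated with a short sketch. The fuzzy set `tall` and its grades are hypothetical, and `very_true` (squaring, by analogy with the hedge “very”) is an assumed extra qualifier used only for illustration.

```python
# Sketch of truth qualification: "(x is A) is tau-true" is read as "x is B"
# with B(u) = tau(A(u)). Grades are hypothetical illustration values.

def truth_qualify(A, tau):
    """Return B with B(u) = tau(A(u)) for every u in the universe of A."""
    return {u: tau(a) for u, a in A.items()}

def true_(alpha):       # tau(alpha) = alpha: leaves A unchanged
    return alpha

def false_(alpha):      # tau(alpha) = 1 - alpha: yields the complement of A
    return 1.0 - alpha

def unknown(alpha):     # the whole interval [0, 1]: yields the whole domain
    return 1.0

def very_true(alpha):   # assumed qualifier mirroring the hedge "very"
    return alpha ** 2

tall = {165: 0.0, 175: 0.5, 185: 1.0}

print(truth_qualify(tall, true_))      # same as tall
print(truth_qualify(tall, false_))     # "x is not tall"
print(truth_qualify(tall, unknown))    # no information about x
print(truth_qualify(tall, very_true))  # same as "x is very tall"
```

Representing qualifiers as plain functions on [0, 1] makes the analogy with linguistic modifiers explicit: qualifying the truth of “x is A” and modifying A are the same operation here.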
It clearly appears now that what is called a fuzzy truth-value above is not a genuine truth-value. In the Boolean setting, what this discussion comes down to is to distinguish an element of {0, 1}, where 0 means false and 1 means true, from a singleton in 2^{0,1}, where the set {0} means certainly false and {1} means certainly true. So, fuzzy truth-values true and false are misnomers here. The natural language expression “it is true that x is A” really means “it is certainly true that x is A”, and “it is false that x is A” really means “it is certainly false that x is A”. One may thus argue that the fuzzy set with membership function τ(α) = α would be better named certainly true, and is a modality, while the fuzzy set with membership function τ(α) = 1 − α would be better named certainly false; this is in better agreement with the representation of “unknown” by the set [0, 1] itself, not by a specific element of the truth set. In a nutshell, Zadeh’s fuzzy truth-values are epistemic states modeled by (fuzzy) subsets of the truth set. The term fuzzy truth-value could wrongly suggest a particular view of Fuzzy Logic as a fuzzy truth-valued logic, i.e., a logic where truth-values are fuzzy sets (representing linguistic labels). Viewed as such, fuzzy logic would be just another multiple-valued logic whose truth set is a family of fuzzy sets. This view is not sanctioned by the above analysis of fuzzy truth-values. Zadeh’s fuzzy logic is a logic where truth-qualified statements can be expressed using (linguistic) values represented by fuzzy sets of the unit interval. That is, the truth set is just the unit interval, and the fuzzy truth-values described here express uncertainty about precise truth-values.
A fuzzy set of the unit interval could be viewed as a genuine truth-value in the case of a fuzzy statement S represented by a type 2 fuzzy set (a fuzzy set with fuzzy set-valued membership grades [Mizumoto and Tanaka, 1976; Dubois and Prade, 1979b]) and a reference statement D expressing complete information x = u_0. Then A(u_0) is a fuzzy set of the unit interval which could be interpreted as a genuine (fuzzy) truth-value. Type 2 fuzzy logic, and especially the particular case of interval-valued fuzzy logic, have been developed at a practical level in the last ten years in an attempt to cope with engineering needs [Mendel, 2000]. As seen above, COM(A; B) has its support in {0, 1} if A is not fuzzy. It makes no sense, as a consequence, to assert “it is τ-true that x is A” using a fuzzy (linguistic) truth-value τ, namely a fuzzy set τ whose support extends outside {0, 1}. This is because one is not entitled, strictly speaking, to attach intermediary grades of truth to Boolean statements, e.g., formulas in classical logic. However, it is possible to give a meaning to sentences such as “it is almost true that x = 5”. It is clearly intended to mean that “x is almost equal to 5”. This can be done by equipping the set of interpretations of the language with fuzzy proximity relations R such that saying “x is A” means in fact “x is R ◦ A” (see [Prade, 1985], p. 269), where the composition R ◦ A (defined by (R ◦ A)(u) = sup_{v∈A} R(u, v)) is a fuzzy subset which is larger than A, while A may be Boolean. Then R ◦ A corresponds to an upper approximation of A which gathers the elements in A and those which are close to them. This indicates a dispositional use of Boolean statements that need
to be fuzzified before their meaning can be laid bare. This view has been specially advocated by Ruspini [1991]. This latter dispositional use of Boolean statements contrasts with the one related to usuality described by Zadeh [1987], for whom “snow is white” is short for “usually, snow is white”, which is in the spirit of default rules having potential exceptions, as studied in nonmonotonic reasoning (see also section 4.1). This fuzzification of Boolean concepts is related to Weston [1987]’s idea of approximate truth as reflecting a distance between a statement and the ideal truth, since fuzzy proximity relations are closely related to distances. Niskanen [1988] also argues for a distance view of approximate truth, where the degree of truth of a statement S with respect to the available information D is computed as a relative distance between the (fuzzy) subsets representing S and D (by extending to fuzzy sets a relative distance which is supposed to exist on the referential). This distance-based approach corresponds to a “horizontal view”, directly related to the distance existing between elements of the referential corresponding to D and S, and completely contrasts with the “vertical view” of the information system approach presented here, where membership functions of the representations of S and D are compared in terms of degrees of inclusion and non-empty intersection.

Truth qualification and R.C.T. Lee’s fuzzy logic

An interesting particular case of truth qualification is the one of statements of the form “(x is A) is at least γ-true”, where γ ∈ [0, 1]. This means that “(x is A) is τ_γ-true”, with τ_γ(α) = 0 if α < γ and τ_γ(α) = 1 if α ≥ γ. This is a truth-qualified fuzzy proposition “p is at least γ-true” with p = “x is A”. Applying Zadeh’s view, it precisely means that the truth-qualified statement is equivalent to “x is A_γ”, where A_γ is the γ-level cut of the fuzzy set A, a classical subset defined by A_γ = {u | µ_A(u) ≥ γ}.
This enables us to retrieve a noteworthy particular case of the multiple-valued logics of Lee [1972] and Yager [1985a]; see [Dubois et al., 1991c] for a survey. Assume we have the two statements “(x is A or B) is at least γ_1-true” and “(x is not A or C) is at least γ_2-true”. First, note that in Zadeh’s approach, the disjunction “(x is A) or (x is B)” is represented by the disjunction of constraints “(π_x ≤ µ_A) or (π_x ≤ µ_B)”, which entails π_x ≤ max(µ_A, µ_B). This leads us to take π_x = max(µ_A, µ_B) as a representation of the disjunction “(x is A) or (x is B)”, in agreement with the spirit of the minimal specificity principle (since there is no µ such that “(π_x ≤ µ_A) or (π_x ≤ µ_B)” entails π_x ≤ µ, with µ < max(µ_A, µ_B)). Then, taking µ_{A or B} = max(µ_A, µ_B), which is the most commonly used definition of the union of fuzzy sets, “x is A or B” is equivalent to “(x is A) or (x is B)”. Observe, however, that “[(x is A) is at least γ-true] or [(x is B) is at least γ-true]” only entails “(x is A or B) is at least γ-true” (since µ_A(u) ≥ γ or µ_B(u) ≥ γ implies max(µ_A(u), µ_B(u)) ≥ γ). Moreover, “x is not A” is assumed to be represented by the constraint π_x ≤ µ_{not A} = 1 − µ_A. Thus, the two statements “(x is A or B) is at least γ_1-true” and “(x is not A or C) is at least γ_2-true” are respectively
represented by the constraints

\[ \gamma_1 \leq \max(\mu_A, \mu_B), \qquad \gamma_2 \leq \max(1 - \mu_A, \mu_C), \]

and thus

\[ \min(\gamma_1, \gamma_2) \leq \min\bigl(\max(\mu_A, \mu_B), \max(1 - \mu_A, \mu_C)\bigr), \]

which implies, after distributing min over max and bounding each term,

\[ \min(\gamma_1, \gamma_2) \leq \max\bigl(\max(\mu_B, \mu_C), \min(\mu_A, 1 - \mu_A)\bigr), \]

and also min(γ_1, γ_2) ≤ max(max(µ_B, µ_C), 0.5), since min(µ_A, 1 − µ_A) ≤ 0.5. Thus, assuming min(γ_1, γ_2) > 0.5, we get min(γ_1, γ_2) ≤ max(µ_B, µ_C). Hence the following inference pattern (where p, q, and r are fuzzy propositions) is again in agreement with Zadeh’s theory of approximate reasoning:

\[ \frac{v(p \vee q) \geq \gamma_1, \qquad v(\neg p \vee r) \geq \gamma_2}{v(q \vee r) \geq \min(\gamma_1, \gamma_2)} \qquad \text{if } 0.5 < \min(\gamma_1, \gamma_2), \]

with v(p ∨ q) = max(v(p), v(q)) and v(¬p) = 1 − v(p), as in [Lee, 1972].

Truth qualification and possibilistic logic

This corresponds to situation (e) above of a Boolean statement in the face of fuzzy information. But now, the fuzzy information “x is B” should be retrieved from the equations τ(0) = Π_B(A^c) = 1 − N_B(A) and τ(1) = Π_B(A) with max(τ(0), τ(1)) = 1, where A is an ordinary subset and thus p = “x is A” is a classical proposition. Assume τ(1) = 1. It means that “(x is A) is certain to degree 1 − τ(0)” (or, if we prefer, that “it is certain to degree 1 − τ(0) that p is true”), since N_B(A) = 1 − τ(0) (with B unknown), which is then equivalent to the fuzzy statement “x is B” represented by ∀u, π_x(u) = B(u) = max(A(u), τ(0)), by application of the minimal specificity principle. If τ(0) = 1, it means that “(x is not A) is certain to degree 1 − τ(1)”, and one obtains ∀u, π_x(u) = B(u) = max(A^c(u), τ(1)). As can be seen, if τ(1) = 1 = τ(0), then we are in the situation of complete ignorance, i.e., ∀u, B(u) = 1 (neither A nor “not A” is somewhat certain). The latter particular case of certainty qualification of Boolean statements corresponds to the semantical side of possibilistic logic, as explained in section 4.1.
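The certainty qualification of a Boolean statement described above can be checked on a small finite example. This is only a sketch under assumed data: the universe, the crisp set `A`, and the value chosen for τ(0) are hypothetical.

```python
# Sketch: from "(x is A) is certain to degree 1 - tau0" (A crisp, tau(1) = 1)
# build the least specific B(u) = max(A(u), tau0), then recover tau(0), tau(1).
# The universe, A, and tau0 are hypothetical illustration values.

def certainty_qualify(A, universe, tau0):
    """Least specific possibility distribution B with N_B(A) = 1 - tau0."""
    return {u: max(1.0 if u in A else 0.0, tau0) for u in universe}

def possibility(event, B):
    """Pi_B(event) = max of B over the crisp event (0 for the empty event)."""
    return max((B[u] for u in event), default=0.0)

universe = {"a", "b", "c", "d"}
A = {"a", "b"}
tau0 = 0.25

B = certainty_qualify(A, universe, tau0)
tau_1 = possibility(A, B)              # tau(1) = Pi_B(A)
tau_0 = possibility(universe - A, B)   # tau(0) = Pi_B(A^c)
N_A = 1.0 - tau_0                      # N_B(A) = 1 - Pi_B(A^c)

print(B)
print(tau_1, tau_0, N_A)
```

Setting `tau0 = 1.0` in this sketch yields B(u) = 1 everywhere, which is the situation of complete ignorance mentioned in the text.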
The distinction between thresholding degrees of truth and thresholding degrees of certainty was first emphasized in [Dubois et al., 1997b] and further elaborated in [Lehmke, 2001b], where a more general logical framework is proposed that attaches fuzzy truth-values τ to fuzzy propositions.

Certainty qualification of fuzzy propositions

Informally, asserting “It is true that x is A” is viewed as equivalent to “x is A”. Then what is considered as true, stricto sensu, is that π_x = A(·) is certain.
Interpreting true in a very strong way as the certainty that the truth value is maximal, i.e., τ′(α) = 0 if α < 1 and τ′(1) = 1, would come down to postulating that “It is true that x is A” is equivalent to “x is in core(A)”, where core(A) = {u | A(u) = 1}, or in other words, “A(x) = 1”. So, as already said, the fuzzy set of [0, 1] with membership function τ(α) = α modeling “true” here means more than pointing to a single truth value: it is the invariant operator in the set of (linguistic) modifiers of membership functions, such that true(A(u)) = A(u). Similarly, “it is false that x is A” is understood as “x is A^c” (equivalent to “it is true that x is A^c”). It follows the linguistic exchange rule between linguistic modifiers and fuzzy truth-values, with false(α) = 1 − α, and false(A(u)) = 1 − A(u) = A^c(u). It is not the same as asserting that “(x is in core(A)) is false”, nor that “A(x) = 0”, although all these views coincide in the non-fuzzy case. It also differs from the (meta) negation, bearing on the equality, of the assertion π_x = A(·). More generally, it is natural to represent the certainty-qualified statement “it is certain at degree α that x is A”, when A is fuzzy, by π_x = max(1 − α, A(·)) [Dubois and Prade, 1990]. Indeed, first consider the simpler case of “x is A is certain”, where A is fuzzy. Clearly the formula gives back π_x = A(·) for α = 1. Let us observe that “x is A” is equivalent to saying that “(x is A_λ) is (1 − λ)-certain”, for λ ∈ [0, 1), where A_λ is the strict λ-cut of A, i.e., A_λ = {u ∈ U | A(u) > λ}, since N(A_λ) ≥ 1 − λ, where N is the necessity measure defined from π_x = A(·). In the general case of statements of the form “(x is A) is (at least) α-certain”, it is natural to forbid the certainty of any level cut to exceed α. This amounts to stating that ∀λ, “(x is A_λ) is (at least) min(α, 1 − λ)-certain”. This is satisfied by keeping π_x = max(1 − α, A(·)).
Observe, however, that π_x cannot be retrieved as the least specific solution of the equation N(A) ≥ α using the definition of the necessity of a fuzzy event given by

\[ N(A) = \inf_{u \in U} \max\bigl(A(u), 1 - \pi(u)\bigr) = 1 - \sup_{u \in U} \min\bigl(1 - A(u), \pi(u)\bigr) = 1 - \Pi(A^c), \]

since N(A) is then not equal to 1 for π_x = A(·). Nevertheless, π_x = max(1 − α, A(·)) is still the least specific solution of an equation of the form C(A) ≥ α, where C(A) is defined by

\[ C(A) = \inf_{u} \bigl(\pi_x(u) \rightarrow A(u)\bigr), \]

where α → β is the reciprocal of Gödel’s implication, namely α → β = 1 if α ≤ β and α → β = 1 − α otherwise. The equivalence C(A) ≥ α ⇔ π_x ≤ max(1 − α, A(·)) is easy to prove using the equivalence γ ≤ max(1 − α, β) ⇔ γ → β ≥ α. C(A) is a particular case of a degree of inclusion of B (with π_x = µ_B(·)) into A. Then C(A) = 1 yields π_x ≤ A(·), while N(A) = 1 would yield π_x ≤ µ_core(A)(·) (since N(A) = 1 if and only if {u ∈ U | π_x(u) > 0} ⊆ core(A)). As expected, “it is true that x is A”, represented by π_x = A(·), indeed means “it is certain that (x is A) is true”, since then C(A) = 1, and “it is false that x is A”, represented by π_x = 1 − A(·), indeed means “it is certain that (x is A) is false”, since then C(A^c) = 1. By contrast, if “true” refers to the usual truth-value (represented here by τ_1(α) = 0 if α < 1 and τ_1(1) = 1), “it is true that x is A” is represented by π_x = µ_core(A)(·), and N(A) = 1. Moreover, note that both N
and C still enjoy the characteristic properties N(A ∩ B) = min(N(A), N(B)) and C(A ∩ B) = min(C(A), C(B)), when the intersection of two fuzzy sets is defined by combining their membership functions pointwise by the operation min.
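The behaviour of the index C built on the reciprocal of Gödel's implication, as opposed to the necessity N of a fuzzy event, can be checked numerically. The sketch below uses a hypothetical three-element universe and grades that are multiples of 0.25, so that all floating-point comparisons are exact.

```python
# Sketch: certainty qualification of a fuzzy proposition with
# pi_x = max(1 - alpha, A), and the two indices C and N.
# Universe and grades are hypothetical illustration values.
from itertools import product

def rg_implies(a, b):
    """Reciprocal of Goedel's implication: 1 if a <= b, else 1 - a."""
    return 1.0 if a <= b else 1.0 - a

def C(pi, A):
    """C(A) = min_u (pi(u) -> A(u))."""
    return min(rg_implies(pi[u], A[u]) for u in A)

def N(pi, A):
    """N(A) = min_u max(A(u), 1 - pi(u))."""
    return min(max(A[u], 1.0 - pi[u]) for u in A)

def certainty_qualified(A, alpha):
    """pi_x = max(1 - alpha, A(.))."""
    return {u: max(1.0 - alpha, a) for u, a in A.items()}

A = {"u1": 0.0, "u2": 0.5, "u3": 1.0}
pi = certainty_qualified(A, 0.75)

print(C(pi, A))   # equals alpha = 0.75
print(C(A, A))    # pi_x = A gives C(A) = 1
print(N(A, A))    # but N(A) is only 0.5 for pi_x = A

# Grid check of: gamma <= max(1 - alpha, beta)  <=>  (gamma -> beta) >= alpha
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
equivalence_holds = all(
    (g <= max(1.0 - a, b)) == (rg_implies(g, b) >= a)
    for g, a, b in product(grid, repeat=3)
)
print(equivalence_holds)
```

The grid check is of course no proof, only a sanity check of the equivalence used in the text on a few exactly representable values.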
Graded truth versus degrees of uncertainty: the compositionality problem

The frequent confusion pervading the relationship between truth and (un)certainty in the approximate reasoning literature is apparently due to the lack of a dedicated paradigm for interpreting partial truth and degrees of uncertainty in a single framework, although the distinction between the two concepts was made long ago, e.g., [Carnap, 1949; de Finetti, 1936]. Such a paradigm has been provided above. An important consequence of our information-based interpretation of truth is that degrees of uncertainty cannot be compositional for all connectives [Dubois and Prade, 1994; Dubois and Prade, 2001]. Let g stand for a [0, 1]-valued function that intends to estimate degrees of confidence in propositions. Let A be the set of situations where proposition S is true. This corresponds to assuming a fuzzy truth value in Zadeh’s sense, defined on {0, 1}, letting τ(1) = g(A) and τ(0) = g(A^c). Then A^c, A_1 ∩ A_2, A_1 ∪ A_2 respectively denote the sets of situations where the propositions “not-S”, “S_1 and S_2”, “S_1 or S_2” hold, and g(A) is the degree of confidence in proposition S. It can be proved that there cannot exist operations ⊗ and ⊕ on [0, 1], nor negation functions f, such that the following identities simultaneously hold for all propositions whose meaning is described by crisp sets A_1, A_2, A: (i) g(A^c) = f(g(A)); (ii) g(A_1 ∩ A_2) = g(A_1) ⊗ g(A_2); (iii) g(A_1 ∪ A_2) = g(A_1) ⊕ g(A_2). More precisely, (i)-(ii)-(iii) entail that for any A, g(A) ∈ {0, 1}, i.e., either g(A) = 0 or g(A) = 1; this is the case of complete information, where all statements are either certainly true or certainly false and g is isomorphic to a classical truth-assignment function. This result is proved independently in [Weston, 1987], [Dubois and Prade, 1988b].
However, weak forms of compositionality are allowed; for instance, Π(A_1 ∪ A_2) = max(Π(A_1), Π(A_2)) in possibility theory, but in general only Π(A_1 ∩ A_2) ≤ min(Π(A_1), Π(A_2)) holds, and the inequality may be strict; the equality Π(A_1 ∩ A_2) = min(Π(A_1), Π(A_2)) holds for propositions “x_1 is A_1” and “x_2 is A_2” that refer to non-interactive variables x_1 and x_2 (see the previous section 2.2). Similarly, with grades of probability, P(A) = 1 − P(A^c), but P(A_1 ∩ A_2) = P(A_1) · P(A_2) holds only in situations of stochastic independence between A_1 and A_2. The above impossibility result is another way of stating a well-known fact, i.e., that the unit interval cannot be equipped with a Boolean algebra structure.
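The weak compositionality of possibility measures can be seen on a three-element example. In this sketch the possibility distribution `pi` and the crisp events are hypothetical; the point is only that the possibility of a union is determined by the possibilities of its parts, while the possibility of an intersection is not.

```python
# Sketch: a possibility measure is max-decomposable for unions, but the
# possibility of an intersection is not a function of the two possibilities.
# The distribution and the events are hypothetical illustration values.

def Pi(event, pi):
    """Possibility of a crisp event: max of pi over the event (0 if empty)."""
    return max((pi[u] for u in event), default=0.0)

pi = {"a": 1.0, "b": 0.5, "c": 0.25}

# Three pairs of events with the same possibilities (1.0 and 0.5) ...
pairs = [
    ({"a", "c"}, {"b", "c"}),
    ({"a"}, {"b"}),
    ({"a", "b"}, {"b"}),
]

for A1, A2 in pairs:
    union_ok = Pi(A1 | A2, pi) == max(Pi(A1, pi), Pi(A2, pi))
    print(Pi(A1, pi), Pi(A2, pi), Pi(A1 & A2, pi), union_ok)

# ... but three different possibilities for the intersection.
intersections = [Pi(A1 & A2, pi) for A1, A2 in pairs]
print(intersections)
```

All three pairs satisfy the bound Π(A_1 ∩ A_2) ≤ min(Π(A_1), Π(A_2)), with equality only for the last pair.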
This result is based on the assumption that the propositions to evaluate are not fuzzy and thus belong to a Boolean algebra. By contrast, confidence values of fuzzy (or non-fuzzy) propositions may be assumed to be compositional when these propositions are evaluated under complete information, since then, sets of possible truth-values reduce to singletons. The possibility of having g(A) ∉ {0, 1} is because sets of fuzzy propositions are no longer Boolean algebras. For instance, using max, min, 1 − (·) for expressing disjunction, conjunction and negation of fuzzy propositions, sets of such propositions are equipped with a distributive lattice structure weaker than a Boolean algebra, which is compatible with the unit interval. Sometimes, arguments against fuzzy set theory rely on compositionality issues (e.g., [Weston, 1987], [Elkan, 1994]). These arguments are based on the wrong assumption either that the algebra of propositions to be evaluated is Boolean, or that intermediate degrees of truth can model uncertainty. As a consequence, fuzzy truth-values à la Zadeh are not truth-functional, generally, since they account for uncertainty. Namely, COM(A_1 ∩ A_2; B) is not a function of COM(A_1; B) and COM(A_2; B), and COM(A_1 ∪ A_2; B) is not a function of COM(A_1; B) and COM(A_2; B). This lack of compositionality is one more proof that fuzzy truth-values are not intermediate truth-values in the sense of a compositional many-valued logic. Neither is Zadeh’s fuzzy logic a type 2 fuzzy logic in the sense of [Dubois and Prade, 1979b], who use 2^[0,1] as a truth set and define compositional connectives by extending those of multiple-valued logic to fuzzy set-valued arguments.
The presence or absence of compositional rules is a criterion to distinguish between the problem of defining truth tables in logics with gradual propositions, and the problem of reasoning under uncertainty (logics that infer from more or less certainly true classical propositions under incomplete information). However, it does not mean that all logics of graded truth are compositional; for instance, similarity logics using crisp propositions fuzzified by a fuzzy proximity relation (as done in [Ruspini, 1991]) are not compositional [Dubois and Prade, 1998b]. The information system paradigm underlying Zadeh’s view of fuzzy truth values nevertheless questions the comparison made in [Gaines, 1978] between probabilistic logics, which are not compositional, and a particular (max-min) many-valued logic, which is truth-functional. The setting in which this comparison takes place (i.e., abstract distributive lattices equipped with a valuation) does not allow for a proper conceptual discrimination between graded truth and uncertainty. The meaning of valuations attached to propositions is left open, so that grades of probability and degrees of truth in fuzzy logic are misleadingly treated as special cases of such abstract valuations. As a consequence, Gaines’ comparison remains at an abstract level and has limited practical significance. Moreover, the chosen abstract setting is not general enough to encompass all many-valued logics. For instance, Gaines’ “standard uncertainty logic” (SUL) assumes that conjunction and disjunction are idempotent; this assumption rules out most of the compositional many-valued calculi surveyed in Section 3 of this chapter, where operations other than min and max are used to represent conjunctions and disjunctions of fuzzy predicates.
Moreover, when the SUL is compositional, it suffers from the above trivialization.

Alternative views of fuzzy truth

The above approach to truth and uncertainty has been tailored for a special purpose, i.e., that of dealing with knowledge-based reasoning systems. It suggests fuzzy matching techniques between the meaning of a proposition and a state of knowledge as natural procedures for effectively computing degrees of uncertainty, modeled as fuzzy truth-values in the presence of fuzziness. Clearly other empirical settings for defining truth-values exist. Gaines [1978] suggests a systematic way of generating valuations in a SUL by resolving paradoxes (such as the Barber Paradox). This approach, also advocated by Smets and Magrez [1988], does not make a clear distinction between graded truth and uncertainty; moreover, its relevance and practical usefulness for dealing with knowledge-based systems is questionable. Another view of truth is the one proposed in [Giles, 1988a; Giles, 1988b]. Namely, the truth of a vague statement S in a supposedly known state of facts D = {u} reflects the “gain in prestige” an individual would get by asserting S in front of a society of people. This gain is expressed as a pay-off function. When the state of facts is ill-known, Giles assumes that it can be represented by a subjective probability distribution, and the degree of truth of S is viewed as the expected pay-off for asserting S. Giles’ metaphor provides a nice device to elicit degrees of membership in terms of utility values. His view is in accordance with our database metaphor if only probability distributions are admitted to represent uncertainty. However, the distinction between truth-values and degrees of belief (viewed by Giles as “the subjective form of degrees of truth”) is again hard to make. In particular, the expected pay-off of S is the probability P(S) of the fuzzy event S, i.e., a grade of uncertainty; but it is also an expected truth-value.
The use of expectations mixes truth-values and degrees of belief. Note that the two equations (21) and (22) consider possibility and necessity as a kind of qualitative expected values of the compatibility. So, expectation-based evaluations, summarizing distributions over truth values, are not compositional.
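The non-compositionality of expectation-based evaluations can be illustrated with a small sketch. It assumes that the expected pay-off of a fuzzy statement is computed as the sum of p(u)·A(u) over a finite universe, and that conjunction and negation of fuzzy statements are min and 1 − (·); the two probability distributions and the grades of `tall` are hypothetical. Exact fractions are used to avoid rounding issues.

```python
# Sketch: two states of knowledge give the same expected truth-values for
# S and "not S", yet different ones for "S and not S" (min-based conjunction).
# Distributions and grades are hypothetical illustration values.
from fractions import Fraction as Fr

def expected_truth(p, A):
    """Expected pay-off of a fuzzy statement: sum over u of p(u) * A(u)."""
    return sum(p[u] * A[u] for u in p)

tall = {170: Fr(1, 4), 175: Fr(1, 2), 180: Fr(3, 4)}
not_tall = {u: 1 - a for u, a in tall.items()}
tall_and_not_tall = {u: min(tall[u], not_tall[u]) for u in tall}

p_spread = {170: Fr(1, 4), 175: Fr(1, 2), 180: Fr(1, 4)}  # ill-known height
p_point = {170: Fr(0), 175: Fr(1), 180: Fr(0)}            # height known: 175

for p in (p_spread, p_point):
    print(expected_truth(p, tall),
          expected_truth(p, not_tall),
          expected_truth(p, tall_and_not_tall))
```

Under both distributions “tall” and “not tall” have expected value 1/2, but their conjunction evaluates to 3/8 in one case and 1/2 in the other, so no operation on [0, 1] can compute it from the two components alone.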
2.4 Fuzzy if-then rules

Fuzzy if-then rules are conditional statements of the form “if x is A then y is B”, or more generally “if x1 is A1 and . . . and xn is An then y is B”, where A, Ai, B are fuzzy sets. They first appeared in [Zadeh, 1973], which provides an outline of his future theory of approximate reasoning. Following this initial proposal, a huge literature was produced, proposing different encodings of fuzzy rules or mechanisms for processing them, often motivated by engineering concerns such as fuzzy rule-based control, e.g., [Mamdani, 1977; Sugeno and Takagi, 1983; Sugeno, 1985]. It is beyond the scope of the present chapter to review all the approximate reasoning literature in detail (see [Bouchon-Meunier et al., 1999] for a detailed overview). In the following, we first provide the representation of different kinds of fuzzy rules that make sense in the possibility theory-based
setting presented above, and then discuss how to draw inferences in this setting. Understanding the semantics of the different models of fuzzy rules is a key issue for figuring out their range of applicability and their proper processing. For the sake of clarity we start the presentation with non-fuzzy rules and we then extend the discussion to the general case of fuzzy rules.

Two understandings of if-then rules

Consider the rule “if x ∈ A then y ∈ B” where x and y are variables ranging on domains U and V, and A and B are ordinary (i.e., non-fuzzy) subsets of U and V respectively. The partial description of a relationship R between x and y that the rule provides can be equivalently formulated in terms of (Boolean) membership functions as the condition: if A(u) = 1 then B(v) = 1. If we think of this relationship as a binary relation R on U × V, then clearly pairs (u, v) of values of the variables (x, y) such that A(u) = B(v) = 1 must belong to the relation R, while pairs such that A(u) = 1 and B(v) = 0 cannot belong to R. However, this condition says nothing about pairs (u, v) for which A(u) = 0. That is, these pairs may or may not belong to the relation R. Therefore, the only constraints enforced by the rule on relation R are the following ones: min(A(u), B(v)) ≤ R(u, v) ≤ max(1 − A(u), B(v)). In other words, R contains at least all the pairs (u, v) such that A(u) = B(v) = 1 and at most those pairs (u, v) such that either A(u) = 0 or B(v) = 1. Thus, the above inequalities express that any representation of the rule “if x ∈ A then y ∈ B” is lower bounded by the representation of the conjunction “x ∈ A and y ∈ B” and upper bounded by the representation of the material implication “x ∈ A implies y ∈ B”, i.e., “x ∉ A or y ∈ B”. In set notation, it reads A × B ⊆ R ⊆ (A^c × V) ∪ (U × B).
Thus, in terms of the constraints induced on the joint possibility distribution πx,y restricting the possible values of the two-dimensional variable (x, y), the above inequalities lead to the two following types of constraints:
• the inequality πx,y(u, v) ≤ max(1 − A(u), B(v)) expresses that values outside B are impossible values for y when x takes value in A (i.e., πx,y(u, v) = 0 if A(u) = 1 and B(v) = 0), while the possible values for y are unrestricted (πx,y(u, v) ≤ 1) when x does not take value in A. Thus, the meaning of this inequality can be read: if x ∈ A, it is certain that the value of y is in B.
• the inequality πx,y(u, v) ≥ min(A(u), B(v))
means that all values v ∈ B are possible when x takes value in A (that is, πx,y(u, v) = 1 if A(u) = B(v) = 1), while no constraint is provided for the values of y when x does not take value in A. Thus, the semantics of the latter inequality reads: if x ∈ A, all the values in B are possible (admissible, feasible) for y.

We immediately recognize in the right-hand sides of the two above inequalities a (binary) implication and a (binary) conjunction respectively. They define the implication-based and the conjunction-based models of rules, respectively. Although they are of a different nature, both models stem from considering a rule as a (partial) specification of a binary relation R on the product space U × V. Note that R ⊆ (A^c × V) ∪ (U × B) is equivalent to A ◦ R ⊆ B in the Boolean case, A ◦ R being the usual image of A via R (A ◦ R = {v ∈ V | ∃u ∈ U, A(u) = 1, R(u, v) = 1}). Implication-based models of rules correspond to a type of constraints that we have already encountered when introducing the possibility theory setting. Conjunction-based models of rules cannot be processed using the minimal specificity principle. As we shall see, they correspond to another type of information than the one usually considered in classical logical reasoning and involve a notion of possibility different from the one estimated by Π.

The existence and the proper use of implication-based and conjunction-based representations of fuzzy rules have often been misunderstood in various fields of application. As pointed out in a series of papers by Dubois and Prade [1989; 1991a; 1992a; 1992b; 1996a], there are several types of fuzzy rules with different semantics, corresponding to several types of implications or conjunctions. As seen above, the meaning of a rule of the form “if x is A then y is B” is significantly different when modeled using a genuine implication A → B or using a Cartesian product A × B.
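The Boolean bounds A × B ⊆ R ⊆ (A^c × V) ∪ (U × B) can be checked mechanically on a small example. The following is only a minimal sketch: the domains `U` and `V`, the crisp sets `A` and `B`, and all function names are hypothetical choices made for illustration, not notation from the chapter.

```python
from itertools import product

# Hypothetical finite domains and crisp sets, chosen only for illustration.
U = ["u1", "u2", "u3"]
V = ["v1", "v2"]
A = {"u1": 1, "u2": 1, "u3": 0}   # A(u) in {0, 1}
B = {"v1": 1, "v2": 0}            # B(v) in {0, 1}

def lower_bound(u, v):
    """Conjunction-based bound: membership of (u, v) in A x B."""
    return min(A[u], B[v])

def upper_bound(u, v):
    """Implication-based bound: membership in (A^c x V) U (U x B)."""
    return max(1 - A[u], B[v])

def respects_rule(R):
    """True if min(A, B) <= R <= max(1 - A, B) holds at every pair (u, v)."""
    return all(lower_bound(u, v) <= R[(u, v)] <= upper_bound(u, v)
               for u, v in product(U, V))

def image(R):
    """Image A o R = {v | there is u with A(u) = 1 and R(u, v) = 1}."""
    return {v for u, v in product(U, V) if A[u] == 1 and R[(u, v)] == 1}

# Smallest and largest relations compatible with the rule.
R_min = {(u, v): lower_bound(u, v) for u, v in product(U, V)}
R_max = {(u, v): upper_bound(u, v) for u, v in product(U, V)}

# A relation that violates the rule: u1 is in A but is related to v2, outside B.
R_bad = dict(R_max)
R_bad[("u1", "v2")] = 1
```

In this sketch `R_min` and `R_max` both satisfy the rule, while `R_bad` exceeds the upper bound; correspondingly, the image of A through `R_max` stays inside B and the image through `R_bad` does not.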
Implication-based fuzzy rules

Let us consider the rule “if x is A then y is B” where A and B are now fuzzy subsets of U and V respectively. In this case, the intuitive idea underlying such a rule is to say that if the value of x is no longer in the core of A, but still close to it, the possible values of y lie in some fuzzy subset not too different from B. The ways B can be modified in order to accommodate the possible values of y depend on the intended meaning of the fuzzy rule, as expressed by the connective relating A and B. In this subsection a fuzzy rule is viewed as a constraint πx,y(u, v) ≤ I(A(u), B(v)) for some many-valued implication I. However, contrary to the Boolean case, R ⊆ (A^c × V) ∪ (U × B) is no longer equivalent to A ◦ R ⊆ B, due to the difference between two types of multiple-valued implications: S-implications and R-implications. This gives rise to two types of fuzzy rules.

Certainty rules. A first way of relaxing the conclusion B is to attach some level of certainty to it, independently of whether B is fuzzy or not, in such a way that the possibility degrees of the values outside the support of B become strictly
positive. This corresponds to rules of the type “the more x is A, the more certain y is B” and they are known in the literature as certainty rules. A simple translation of this type of constraint is the inequality ∀u, A(u) ≤ C(B), where C(B) stands for the certainty of B under the unknown possibility distribution πx,y (as for certainty-qualified fuzzy statements), i.e.,
$$C(B) = \inf_{v} I(\pi_{x,y}(u, v), B(v)),$$
where the implication I is the reciprocal of an R-implication IR (the previous definition of the certainty of a fuzzy statement introduced in Section 2.3 is here enlarged to any reciprocal of an R-implication). Then, in agreement with the minimal specificity principle, the greatest solution of this certainty-qualification problem provides the solution to the problem of representing certainty rules, namely πx,y(u, v) ≤ IS(A(u), B(v)) = S(n(A(u)), B(v)), where the right-hand side of the inequality corresponds to the strong implication defined from the negation function n and the t-conorm S which is the n-dual of the t-norm T that generates IR. In particular, if n(α) = 1 − α, T(α, β) = min(α, β), S(α, β) = max(α, β), we obtain πx,y(u, v) ≤ max(1 − A(u), B(v)), where Kleene-Dienes implication α → β = max(1 − α, β) can be recognized.

Gradual rules. The second way of relaxing the conclusion amounts to enlarging the core of B, in such a way that if x takes value in the α-cut of A, then the values in the α-cut of B become fully possible for y. This interpretation, which requires B to be fuzzy, corresponds to the so-called gradual rules, i.e., rules of the type “the more x is A, the more y is B”, as in the piece of knowledge “the bigger a truck, the slower its speed”. (Statements involving “the less” are easily obtained by duality, using fuzzy set complementation.) The name ‘gradual rule’ was coined by Prade [1988]; see also [Dubois and Prade, 1992b]. The intended meaning of a gradual rule, understood as “the greater the membership degree of the value of x to the fuzzy set A, the greater the membership degree of the value of y to the fuzzy set B should be”, is captured by the following inequality: min(A(u), πx,y(u, v)) ≤ B(v) or equivalently, πx,y(u, v) ≤ A(u) → B(v), where → denotes Gödel’s implication. The above inequality can be relaxed by introducing a triangular norm T, i.e., T(A(u), πx,y(u, v)) ≤ B(v).
Then → will be replaced by the corresponding R-implication generated by T. Clearly, in this type of rules the degree of truth of the antecedent constrains the degree of truth of the consequent, since A(u) → B(v) = 1 if and only if A(u) ≤ B(v) for R-implications.

Impossibility rules. A third category of implication-based rule is obtained by writing a constraint expressing that “the more x is A, the less possible the complement of B is a range for y”. Such rules are interpreted as saying, “if x = u then the complement of B is at most (1 − A(u))-possible”. This corresponds to the following inequality as interpretation of the fuzzy rule (where the usual definition of the possibility of a fuzzy event is extended using a triangular norm T instead of the minimum operation only):
$$\Pi(B^c) = \sup_{v} T(1 - B(v), \pi_{x,y}(u, v)) \le 1 - A(u);$$
this reads “the more x is A, the more impossible not-B”. It leads to the following equivalent inequality πx,y(u, v) ≤ (1 − B(v)) → (1 − A(u)), where → is the R-implication associated with T. If T = min, then we get the following constraint: πx,y(u, v) ≤ 1 − A(u) if A(u) > B(v). If T = product, the upper bound of πx,y(u, v) is the reciprocal of Goguen implication from A(u) to B(v). In practice these rules are close to certainty rules since they coincide when B is a non-fuzzy set (as expected from the semantics). However, when B is fuzzy, impossibility rules combine the main effects of certainty and gradual rules, namely the appearance of a level of uncertainty and the widening of the core of B: the more x is A, the more certain y is in a smaller subset of values around the core of B. Thus, they could also be named certainty-gradual rules so as to account for this double effect. Note that in the implication-based models, πx,y is always upper bounded; then applying the minimal specificity principle leads to a possibility distribution which is normalized (if B is normalized).

The three types of implication-based fuzzy rules correspond to the three basic types of implication functions recalled above. In the fuzzy logic literature, other models of implication functions have been considered. For instance, let us mention QL-implications [Trillas and Valverde, 1981]. They are based on interpreting p → q as ¬p ∨ (p ∧ q), which is used in quantum logic (in classical logic it obviously reduces to material implication). This view leads to implication functions of the form I(α, β) = S(n(α), T(α, β)) where S is a t-conorm, n a strong negation and T is the n-dual t-norm of S. The so-called Zadeh’s implication [Zadeh, 1973] corresponds to taking S = max, i.e., I(α, β) = max(1 − α, min(α, β)), and is the basis for another type of fuzzy rules.
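To compare the three implication-based readings side by side, the upper bounds they place on πx,y(u, v) can be written as functions of the two degrees a = A(u) and b = B(v). This is a minimal sketch restricted to the min-based case (n(α) = 1 − α, T = min); the function names are illustrative assumptions, not terminology from the chapter.

```python
def kleene_dienes(a, b):
    """Strong (S-)implication max(1 - a, b)."""
    return max(1 - a, b)

def godel(a, b):
    """Goedel (R-)implication: 1 if a <= b, else b."""
    return 1.0 if a <= b else b

def certainty_bound(a, b):
    """Certainty rule: pi(u, v) <= max(1 - A(u), B(v))."""
    return kleene_dienes(a, b)

def gradual_bound(a, b):
    """Gradual rule: pi(u, v) <= A(u) -> B(v), Goedel implication."""
    return godel(a, b)

def impossibility_bound(a, b):
    """Impossibility rule for T = min: pi(u, v) <= (1 - B(v)) -> (1 - A(u))."""
    return godel(1 - b, 1 - a)

# Degrees that are exactly representable in binary keep the arithmetic exact.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]

# When B is crisp (b in {0, 1}), impossibility and certainty rules coincide.
coincide_on_crisp_B = all(
    certainty_bound(a, b) == impossibility_bound(a, b)
    for a in grid for b in (0.0, 1.0)
)
```

For instance, with a = 0.5 and b = 0.25 the certainty bound is 0.5, the gradual bound is 0.25, and the impossibility bound is 0.5 (that is, 1 − A(u), since A(u) > B(v)).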
Conjunction-based fuzzy rules

Conjunction-based fuzzy rules first appeared as an ad hoc proposal in the first fuzzy rule-based controllers [Mamdani, 1977]. Later, they were reinterpreted in the setting of possibility theory, using a new type of possibility evaluation. Namely, interpreting “x is A is (at least) β-possible” as “all elements in A are possible values for x, at least with degree β”, i.e., ∆(A) = inf_{u∈A} πx(u) ≥ β, leads to stating the following constraint on πx: ∀u, πx(u) ≥ min(A(u), β). This approach is actually in the spirit of a proposal also briefly discussed in [Zadeh, 1978b] and more extensively in [Sanchez, 1978]. See [Dubois and Prade, 1992b] for the introduction of the measure of guaranteed possibility ∆, and [Dubois et al., 2000], [Dubois et al., 2003b] for the development of a bipolar view of possibility theory allowing for the representation of positive and negative pieces of information. Constraints enforcing lower bounds on a possibility distribution, as above, are positive pieces of information, since they guarantee a minimum level of possibility for some values or interpretations. This contrasts with constraints enforcing upper bounds on a possibility distribution, which are negative pieces of information, since they state that some values are to some extent impossible (those values whose degree of possibility is strictly less than 1 and may be close to 0). Note that classical logic handles negative information in the above sense. Indeed, knowing a collection of propositional statements of the form “x is Ai” (where the Ai’s are classical subsets of a universe U) is equivalent to saying that values for x outside ∩i Ai are impossible. Note that positive information should obey a maximal specificity principle that states that only what is reported as being actually possible should be considered as such (and to a degree that is not higher than what is stated).
This means that if we only know that ∀u, πx(u) ≥ min(A(u), β), as far as positive information is concerned, then the positive part of the information will be represented by the smallest possibility distribution obeying the constraint, here, ∀u, πx(u) = min(A(u), β). In case of several pieces of positive information stating that “x is Ai” is guaranteed to be possible, we can conclude from πx(u) ≥ Ai(u) that πx(u) ≥ maxi Ai(u) (“x is ∪i Ai” in case of classical subsets), which corresponds to a disjunctive combination of information. Note that the minimal specificity principle for negative information and the maximal specificity principle for positive information are two sides of the same coin. They are in fact minimal commitment principles. Together they state that potential values for x cannot be considered as more impossible (in the Π-sense), nor as more possible (in the ∆-sense) than what follows from the constraints representing the available negative or positive information. In the case where A is a fuzzy set, the representation of statements of the form “x is A is (at least) β-possible” by πx(u) ≥ min(A(u), β) is still equivalent to ∆(A) ≥ β, provided that ∆(A) is extended to fuzzy events by
$$\Delta(A) = \inf_{u} \big(A(u) \to \pi_x(u)\big)$$
where → is Gödel’s implication. This can be easily shown using the equivalence α → β ≥ γ ⇔ β ≥ min(α, γ). This is the basis for defining possibility rules.

Possibility rules. They correspond to rules of the form “the more x is A, the more possible y is B”, understood as: if x = u, any value compatible with “y is B” is all the more guaranteed as being possible for y as A(u) is higher, in agreement with the sense of the set function ∆. Thus, the representation of such possibility-qualified statements obeys the constraint A(u) ≤ ∆(B). Hence the constraint on the conditional possibility distribution πx,y(u, ·) for y is min(A(u), B(v)) ≤ πx,y(u, v) or, more generally, T(A(u), B(v)) ≤ πx,y(u, v) if we allow the use of any t-norm T in place of min. As already mentioned, this type of rules (using T = min or product) pervades the literature on fuzzy control, since it is in accordance with viewing fuzzy rules as partial descriptions of a fuzzy graph R relating x and y, in the sense that “if x is A then y is B” says nothing but that the fuzzy set A × B belongs to the graph of R, i.e., A × B ⊆ R. This interpretation helps us understand why the fuzzy output of fuzzy rule-based controllers is generally subnormalized: the obtained output is nothing but a lower bound on the actual possibility distribution. When A and B become fuzzy, the equivalence between A × B ⊆ R and A ◦ R^c ⊆ B^c no longer holds. This leads to a new conjunction-based kind of fuzzy rules, called antigradual rules, where the guaranteed possible range of values for y is reduced when x moves away from the core of A.

Antigradual rules. They correspond to a rule of the type “the more x is A and the less y is related to x, the less y is B”, and to the corresponding constraint T(A(u), 1 − πx,y(u, v)) ≤ 1 − B(v), where T is a triangular norm. This can be equivalently written T∗(A(u), B(v)) =def 1 − (A(u) →R (1 − B(v))) ≤ πx,y(u, v), where →R is the residuated implication based on T.
T∗ is a non-commutative conjunction that is the right adjoint of a strong implication, i.e., the strong implication a →S b = n(T(a, n(b))) can be obtained from T∗ by residuation, starting with a continuous t-norm T [Dubois and Prade, 1984a]. It can be checked that, for T = min, the constraint reduces to πx,y(u, v) ≥ B(v) when A(u) > 1 − B(v), and is vacuous otherwise. Thus, the values v such that B(v) > 1 − A(u) are guaranteed to be possible for y, and the larger A(u), the larger the subset of values v for y guaranteed as possible (at degree B(v)). In other words, the subset of values for y with some positive guaranteed possibility becomes smaller as x moves away from the core of A.
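The lower bounds used by conjunction-based rules can be illustrated in the same spirit. The sketch below assumes T = min (so that →R is Gödel implication) and finite domains represented as Python lists; `guaranteed_possibility`, `possibility_rule_lower` and `antigradual_lower` are hypothetical names introduced here for illustration.

```python
def godel(a, b):
    """Goedel implication: 1 if a <= b, else b."""
    return 1.0 if a <= b else b

def guaranteed_possibility(A, pi):
    """Delta(A) = inf_u (A(u) -> pi(u)) over a finite domain, Goedel implication."""
    return min(godel(a, p) for a, p in zip(A, pi))

def possibility_rule_lower(a, b):
    """Possibility rule: pi(u, v) >= min(A(u), B(v))."""
    return min(a, b)

def antigradual_lower(a, b):
    """Antigradual rule for T = min: pi(u, v) >= T*(a, b) = 1 - (a -> (1 - b))."""
    return 1 - godel(a, 1 - b)

# Hypothetical fuzzy set A on a four-element domain and a threshold beta.
A = [1.0, 0.75, 0.25, 0.0]
beta = 0.5

# Least possibility distribution satisfying pi(u) >= min(A(u), beta)
# (maximal specificity principle for positive information).
pi_least = [min(a, beta) for a in A]
```

On these data `guaranteed_possibility(A, pi_least)` evaluates to `beta`, as the equivalence stated in the text requires. The antigradual bound returns B(v) when A(u) > 1 − B(v) and 0 otherwise, and it is not symmetric in its two arguments.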
Note that the same non-commutative conjunction where A and B are permuted corresponds to a third kind of rules expressed by the constraint “the more y is B and the less y is related to x, the less x is A”, i.e., T(B(v), 1 − πx,y(u, v)) ≤ 1 − A(u). For T = min, it leads to the constraint πx,y(u, v) ≥ T∗(B(v), A(u)). Viewed as a rule from A to B, this is very similar to a possibility rule, since both types of rules coincide when B is non-fuzzy. When B is fuzzy, the behaviour of the above inequality is somewhat similar to the ones of both possibility and antigradual rules: truncation of B and shrinking of its support. Namely, the more x is A, the more possible a larger subset of values around the core of B. However this is not really a different kind of rule: it is an antigradual rule of the form “if y is B then x is A”.

Remark. The different fuzzy rules surveyed above can be understood in terms of the modification applied to the conclusion part “y is B”, when a precise input x = u0 matches the condition “x is A” at the level A(u0) = α. For min-based models of fuzzy rules, B is modified into B′ such that B′(v) = τ(B(v)), ∀v, where τ is a modifier (or equivalently a fuzzy truth value in Zadeh’s sense) defined, ∀θ ∈ [0, 1], by:
• τ(θ) = 1 if θ ≥ α; τ(θ) = θ if θ < α (gradual rule);
• τ(θ) = max(θ, 1 − α) (certainty rule);
• τ(θ) = min(θ, α) (possibility rule);
• τ(θ) = 0 if θ ≤ 1 − α; τ(θ) = θ if θ > 1 − α (antigradual rule).
It can be seen that some modifiers introduce a level of uncertainty, while others rather provide a variation around the fuzzy set B by increasing high degrees of membership or decreasing low degrees.

Meta-rules

Besides the relational view presented in the two above subsections, we can think of a rule “if x is A then y is B” as specifying some constraints between the marginal possibility distributions πx and πy describing the available knowledge about the variables x and y.
Indeed, the meanings of the individual components of the rule, in terms of their induced constraints, are πx ≤ µA and πy ≤ µB . Therefore, a possible understanding of the rule is just the following condition if πx ≤ µA then πy ≤ µB which, in turn, has the following easy possibilistic interpretation in case A and B are not fuzzy: “if A is certain (Nx (A) = 1) then B is certain (Ny (B) = 1)”,
where Nx and Ny denote the necessity measures generated by the possibility distributions πx and πy respectively. Having in mind the logical equivalence in classical logic of the material implication p → q with the disjunction ¬p ∨ q, one could think of yet another interpretation of the fuzzy rule “if x is A then y is B” as “(x is A^c) or (y is B)”, that is “πx ≤ 1 − µA or πy ≤ µB”, or, put another way,

if not(πx ≤ 1 − µA) then πy ≤ µB.

In possibilistic terms it also reads (since A is non-fuzzy) “if A is possible (Πx(A) = 1) then B is certain (Ny(B) = 1)”. The difference between the two readings can be seen as relying on the two types of negation at work here, namely not(πx ≤ µA) and πx ≤ 1 − µA respectively. With such meta-level models, we no longer need to apply the combination/projection principle on their representations because πy is directly assessed once the condition part of the rule is satisfied. In the fuzzy case, the two above readings can be generalized, turning them respectively into the inequalities
$$\inf_{u} \big(\pi_x(u) \to \mu_A(u)\big) \le \inf_{v} \big(\pi_y(v) \to \mu_B(v)\big),$$
$$\sup_{u} T^*(\pi_x(u), \mu_A(u)) \le \inf_{v} \big(\pi_y(v) \to \mu_B(v)\big),$$
where T∗(α, β) = 1 − (α → (1 − β)) is the non-commutative conjunction adjoint of the t-norm T. Observe that Cx(A) = inf_u (πx(u) → µA(u)) and Cy(B) = inf_v (πy(v) → µB(v)) are certainty-like indices, while Posx(A) = sup_u T∗(πx(u), µA(u)) = 1 − Cx(A^c) is a possibility-like index. Certainty rules described in the previous section mean that “y is B” is certain as much as “x is A” is true (µA(u) = 1), while the first meta-rule reading states here that “y is B” is certain as much as “x is A” is certain. Its fuzzy extension above expresses that “the more certain x is A, the more certain y is B”, while the second one means “the more possible x is A, the more certain y is B”.
Solving the above inequalities yields respectively πy(v) ≤ Cx(A) → µB(v), and πy(v) ≤ Posx(A) → µB(v), where → is an R-implication, which lays bare the behavior of such models. Namely, they modify the output by widening the core of B on the basis of some amount of uncertainty α, thus producing less restrictive outputs (since α → µB(v) ≥ µB(v), ∀α). Notice that as soon as the uncertainty degree is as low as µB(v), πy(v) is unrestricted. For a precise input x = u0, the two meta-level models of fuzzy rules considered here coincide with gradual rules, due to the use of R-implications in the approach. This meta-level view has been less investigated than the other ones (see [Esteva et al., 1997a]). However it underlies the so-called compatibility-modification inference of Cross and Sudkamp [1994].
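The claim that both meta-level readings collapse onto the gradual rule for a precise input can be checked numerically. The sketch below assumes Gödel implication as the R-implication (T = min) and small hypothetical membership vectors `mu_A` and `mu_B`; none of the names come from the chapter.

```python
def godel(a, b):
    """Goedel implication: 1 if a <= b, else b."""
    return 1.0 if a <= b else b

def t_star(a, b):
    """Non-commutative conjunction T*(a, b) = 1 - (a -> (1 - b)) for T = min."""
    return 1 - godel(a, 1 - b)

def certainty_index(pi_x, mu):
    """C_x(A) = inf_u (pi_x(u) -> mu_A(u))."""
    return min(godel(p, m) for p, m in zip(pi_x, mu))

def possibility_index(pi_x, mu):
    """Pos_x(A) = sup_u T*(pi_x(u), mu_A(u))."""
    return max(t_star(p, m) for p, m in zip(pi_x, mu))

def meta_rule_output(alpha, mu_B):
    """Greatest pi_y with pi_y(v) <= alpha -> mu_B(v)."""
    return [godel(alpha, b) for b in mu_B]

# Hypothetical membership degrees on three-element domains.
mu_A = [1.0, 0.5, 0.0]
mu_B = [0.25, 0.75, 1.0]

# Precise input x = u0, here the second element of the domain.
pi_precise = [0.0, 1.0, 0.0]
c_precise = certainty_index(pi_precise, mu_A)
pos_precise = possibility_index(pi_precise, mu_A)

# Imprecise crisp input: x is one of the first two elements.
pi_imprecise = [1.0, 1.0, 0.0]
c_imprecise = certainty_index(pi_imprecise, mu_A)
pos_imprecise = possibility_index(pi_imprecise, mu_A)
```

With the precise input both indices equal µA(u0) = 0.5, so `meta_rule_output` applies exactly the gradual-rule modifier at level α = 0.5. With the imprecise input the two indices separate (0.5 versus 1.0), which is where the two readings start to differ.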
2.5 Inference with fuzzy if-then rules
This section does not aim at providing a survey of the different fuzzy logic mechanisms that have been proposed in the literature in the eighties and in the nineties, nor an overview of the problems raised by their practical use and implementation. See [Bouchon-Meunier et al., 1999] in that respect. We focus our interest on a local pattern of inference of particular importance, usually called generalized modus ponens, which sufficiently illustrates the main issues. As we shall see, the properties of this pattern of inference heavily depend on the connective used for modeling the if-then rule. Moreover, classical modus ponens can be retrieved as a particular case for fuzzy premises only for appropriate choices of the implication in the fuzzy rule and of the operation for combining the two premises in the pattern. We shall discuss the meaning of this state of affairs.

The generalized modus ponens

The generalized modus ponens can be viewed as a particular case of a more general rule, the compositional rule of inference, introduced by Zadeh [1979a]:

From:  S = “(x, y) is F”
       S′ = “(y, z) is G”
Infer: S′′ = “(x, z) is F ◦ G”
where:
1. x, y and z are linguistic variables taking values in U, V and W respectively,
2. F is a fuzzy subset of U × V, and G a fuzzy subset of V × W, and
3. F ◦ G is the fuzzy subset of U × W defined by sup-min composition of F and G, i.e., F ◦ G(u, w) = sup_{v∈V} min(F(u, v), G(v, w)).
This is a direct consequence of the combination-projection method underlying the possibility theory-based treatment of inference. Indeed S and S′ translate into the constraints πx,y(u, v) ≤ F(u, v) and πy,z(v, w) ≤ G(v, w). So, by combining them, after a cylindrical extension, we get πx,y,z(u, v, w) ≤ min(F(u, v), G(v, w)). Finally, projecting this constraint on the joint variable (x, z) we get
$$\pi_{x,z}(u, w) \le \sup_{v \in V} \min(F(u, v), G(v, w)),$$
which yields, after application of the minimal specificity principle, the representation of the statement S ′′ in the above rule. This rule has found various applications. For instance, assume F = “approximately equal to”, G = “much greater than”,
S = “x is approximately equal to y”, S′ = “y is somewhat greater than z”. Using parameterized representations of F and G, one can compute the parameters underlying F ◦ G, and then interpret it [Dubois and Prade, 1988a]. The generalized modus ponens inference pattern proposed by Zadeh [1973] is of the form:

From:  S = “x is A∗”
       S′ = “if x is A then y is B”
Infer: S′′ = “y is B∗”.
It is a particular case of the compositional rule of inference where A and A∗ are fuzzy subsets of U, B is a fuzzy subset of V, and where statement S is represented by πx(u) ≤ A∗(u), and S′ is interpreted as a statement of the form “(x, y) is R”, represented by πx,y(u, v) ≤ R(u, v), where R is the fuzzy relation defined by R(u, v) = I(A(u), B(v)), I being some suitable implication connective. Then B∗ = A∗ ◦ R. Informally, the idea is that the closer A∗ is to A, the closer the conclusion “y is B∗” is to the consequent “y is B” (however the underlying notion of closeness varies according to the modeling of the rule). For instance, when I is Kleene-Dienes implication, i.e., when we interpret the fuzzy rule as a certainty rule (see Section 2.4), we get
$$B^*(v) = \sup_{u} \min\big(A^*(u), \max(1 - A(u), B(v))\big) = \max(1 - N_{A^*}(A), B(v)),$$
where NA∗(A) = inf_u max(A(u), 1 − A∗(u)) is the usual necessity measure of A, computed with π(u) = A∗(u). B∗ means that “y is B” is certain to the degree NA∗(A). This agrees with the understanding of certainty rules as “the more certain x is A, the more certain y is B” in the presence of a fuzzy input “x is A∗”. It is also very similar to what is obtained in the meta-rule view (where there is more freedom left in the evaluation of certainty degrees when A is fuzzy). Note that with Kleene-Dienes implication (i.e., with certainty rules), we have A ◦ I(A, B) = B∗, where B∗ = max(1 − NA(A), B), and when A is fuzzy it is only guaranteed that NA(A) ≥ 1/2, so the output B∗ corresponds to “(y is B) is NA(A)-certain” and not to “y is B” (which is however obtained when A is not fuzzy). This means that the coincidence with classical modus ponens is lost. However, it holds that core(A) ◦ I(A, B) = B, which agrees well with the intended meaning of certainty rules. Indeed “y is B” is obtained only if NA∗(A) = 1, which requires that the support of A∗ contains only typical elements of A (A∗ ⊆ core(A)). For instance, if A = “bird” (here a fuzzy set, the set of more or less typical birds) and B = “able to fly” (B is non-fuzzy), then B follows for sure only if x designates a typical bird. This contrasts with the situation encountered with gradual rules and Gödel implication, for which it holds that A ◦ I(A, B) = B in any case. In fact, it has
been noticed quite early that the use of the min operation in the combination step of the inference process (as stipulated by the possibilistic framework) is not compatible with the requirement that B∗ = B can be derived when A∗ = A, except for Gödel implication. More generally, if we require that classical modus ponens continue to hold for fuzzy premises, more solutions are found if a combination operation T other than min (thus departing from the possibility theory setting) is allowed. Namely, we start with the functional equation expressing this requirement:
$$\sup_{u} T\big(A(u), I(A(u), B(v))\big) = B(v).$$
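The behaviour of the generalized modus ponens under different implications can be reproduced on a toy example. The following sketch implements B∗ = A∗ ◦ R with sup-min composition on finite domains; the function `gmp`, the vectors `A` and `B`, and the choice of inputs are assumptions made for illustration only.

```python
def kleene_dienes(a, b):
    """Kleene-Dienes implication max(1 - a, b)."""
    return max(1 - a, b)

def godel(a, b):
    """Goedel implication: 1 if a <= b, else b."""
    return 1.0 if a <= b else b

def gmp(A_star, A, B, implication, combine=min):
    """B*(v) = sup_u combine(A*(u), I(A(u), B(v))) on finite domains."""
    return [max(combine(a_s, implication(a, b)) for a_s, a in zip(A_star, A))
            for b in B]

def necessity(A_star, A):
    """N_{A*}(A) = inf_u max(A(u), 1 - A*(u))."""
    return min(max(a, 1 - a_s) for a_s, a in zip(A_star, A))

# Hypothetical normalized fuzzy sets on three-element domains.
A = [1.0, 0.5, 0.0]
B = [0.0, 0.5, 1.0]
core_A = [1.0, 0.0, 0.0]   # indicator of the core of A

out_godel = gmp(A, A, B, godel)               # gradual rule, input A* = A
out_kd = gmp(A, A, B, kleene_dienes)          # certainty rule, input A* = A
out_kd_core = gmp(core_A, A, B, kleene_dienes)  # certainty rule, input core(A)
```

On these data the Gödel-based inference returns B itself, the Kleene-Dienes inference returns max(1 − NA(A), B) with NA(A) = 0.5, and restricting the input to core(A) recovers B, in line with the discussion above.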
This problem has been addressed from two slightly different points of view in [Trillas and Valverde, 1985; Valverde and Trillas, 1985] and [Dubois and Prade, 1984b; Dubois and Prade, 1985b]. Solutions to the above equation are provided by choosing T as a continuous t-norm and I as its associated residuated implication. Apart from the perfect coincidence with classical modus ponens, other natural or desirable requirements have been proposed for the generalized modus ponens by different authors who have looked for the appropriate implications (and possibly combination operations) that ensure these required properties (see e.g., [Baldwin and Pilsworth, 1980; Fukami et al., 1980; Mizumoto and Zimmermann, 1982; Whalen and Schott, 1983; Whalen and Schott, 1985; Whalen, 2003]). Some of these requirements, like monotonicity (A∗1 ⊆ A∗2 implies B∗1 ⊆ B∗2, where fuzzy set inclusion is defined pointwise by an inequality between membership degrees), are always satisfied, while some other “natural” ones, like B∗ ⊇ B (nothing more precise than what the rule says can be inferred), may sometimes be debatable (e.g., if we are modeling interpolative reasoning), and are violated by some implications such as Rescher-Gaines implication, which is defined by I(α, β) = 1 if α ≤ β and I(α, β) = 0 if α > β, and which corresponds to the core of Gödel implication.

Systems of parallel fuzzy if-then rules

Let us now briefly consider the case of a system of parallel implication-based fuzzy if-then rules {“if x is Ai then y is Bi”}i=1,n. Each rule i is represented by the inequality πx,y(u, v) ≤ I(Ai(u), Bi(v)). This leads to
$$\pi_{x,y}(u, v) \le \min_{i} I(A_i(u), B_i(v)).$$
By projection and applying the minimal specificity principle, the inference from the set of parallel implication-based rules, and a fact “x is A∗”, produces “y is B∗” defined by
$$B^*(v) = \sup_{u} \min\big(A^*(u), \min_{i} I(A_i(u), B_i(v))\big).$$
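The global inference from a set of parallel implication-based rules can be compared with a rule-by-rule strategy in which each rule is fired separately and the outputs are intersected. The sketch below uses Gödel implication and crisp sets with disjoint conditions; the data and the names `infer_global` and `infer_rule_by_rule` are hypothetical and serve only to illustrate the two strategies.

```python
def godel(a, b):
    """Goedel implication: 1 if a <= b, else b."""
    return 1.0 if a <= b else b

def infer_global(A_star, rules, n_v, implication=godel):
    """B*(v) = sup_u min(A*(u), min_i I(A_i(u), B_i(v)))."""
    return [max(min(a_s, min(implication(A_i[u], B_i[v]) for A_i, B_i in rules))
                for u, a_s in enumerate(A_star))
            for v in range(n_v)]

def infer_rule_by_rule(A_star, rules, n_v, implication=godel):
    """Intersection over i of A* o (A_i -> B_i), each rule fired separately."""
    per_rule = [infer_global(A_star, [rule], n_v, implication) for rule in rules]
    return [min(out[v] for out in per_rule) for v in range(n_v)]

# Hypothetical crisp rules with disjoint conditions on a four-element domain U
# and a three-element domain V.
A1, B1 = [1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0]
A2, B2 = [0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 0.0]
rules = [(A1, B1), (A2, B2)]

# Imprecise input A* = A1 U A2.
A_star = [max(a, b) for a, b in zip(A1, A2)]

global_out = infer_global(A_star, rules, 3)
local_out = infer_rule_by_rule(A_star, rules, 3)
```

Here the global computation yields B1 ∪ B2, whereas the rule-by-rule strategy yields the whole of V, that is, no information. The global output is always included in the rule-by-rule one.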
Denoting the above inference B∗ = A∗ ◦ [∩i (Ai → Bi)], the following inclusion can be easily established: A∗ ◦ [∩i=1,n (Ai → Bi)] ⊆ ∩i=1,n [A∗ ◦ (Ai → Bi)]. This expresses that the combination/projection principle should be performed globally (which can be computationally heavy), if one wants to obtain an exact result rather than a valid but imprecise result. In other words, it might be rather uninformative to perform each inference B∗i = A∗ ◦ (Ai → Bi) separately and then combine the B∗i’s in a conjunctive manner. For instance, if A∗ = Ai ∪ Aj for some i and j such that Ai ∩ Aj = ∅, then A∗ ◦ (Ai → Bi) = V (nothing is inferred) while (Ai ∪ Aj) ◦ [(Ai → Bi) ∩ (Aj → Bj)] = Bi ∪ Bj, for Gödel implication. This property points out a major weakness in the traditional rule-by-rule strategy used in many expert system inference engines (which prescribe triggering rules separately), in the presence of fuzziness, or even incomplete Boolean information. Techniques for reasoning with parallel fuzzy implication-based rules in the presence of imprecise outputs have been little studied in the literature (see [Ughetto et al., 1997] for gradual rules, and a more general theoretical study in [Morsi and Fahmy, 2002]).

Inference with fuzzy conjunctive rules

Let us examine the situation with a conjunction-based model for fuzzy rules (see Section 2.4). For an input “x is A∗” and a fuzzy rule “if x is A then y is B” assumed to be represented by πx,y(u, v) = min(A(u), B(v)), the combination/projection method yields the output
$$B^*(v) = \sup_{u \in U} \min\big(A^*(u), \min(A(u), B(v))\big).$$
This expression, which corresponds to Mamdani’s [1977] model, can be simplified into B∗(v) = min(ΠA∗(A), B(v)), where ΠA∗(A) = sup_{u∈U} min(A∗(u), A(u)) is the possibility of A computed with π = A∗(·). Let us denote this fuzzy inference A∗ ◦ (A × B) = B∗. Note that A ◦ (A × B) = B if A is normalized. However, we should go back to the understanding of such rules as positive pieces of information (see Section 2.4) for explaining why parallel conjunction-based fuzzy rules should be combined disjunctively, as in Mamdani’s model of fuzzy control inference. Indeed, from a bipolar possibility theory point of view, a system of conjunction-based rules (where each rule is modeled by the Cartesian product Ai × Bi, i.e., ∀i, πx,y(u, v) ≥ min(Ai(u), Bi(v))) leads to the inequality
$$\pi_{x,y}(u, v) \ge \max_{i} \min(A_i(u), B_i(v)).$$
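The single-rule simplification B∗(v) = min(ΠA∗(A), B(v)) stated above can be verified against the direct sup-min computation. This is a small sketch on hypothetical three-element domains; the function names are illustrative assumptions.

```python
def possibility(A_star, A):
    """Pi_{A*}(A) = sup_u min(A*(u), A(u))."""
    return max(min(a_s, a) for a_s, a in zip(A_star, A))

def conjunctive_direct(A_star, A, B):
    """B*(v) = sup_u min(A*(u), min(A(u), B(v))), computed literally."""
    return [max(min(a_s, min(a, b)) for a_s, a in zip(A_star, A)) for b in B]

def conjunctive_simplified(A_star, A, B):
    """B*(v) = min(Pi_{A*}(A), B(v)): the consequent truncated at the matching degree."""
    level = possibility(A_star, A)
    return [min(level, b) for b in B]

# Hypothetical fuzzy sets; A* only partially overlaps A.
A = [1.0, 0.5, 0.0]
A_star = [0.0, 0.5, 1.0]
B = [0.25, 1.0, 0.75]

direct = conjunctive_direct(A_star, A, B)
simplified = conjunctive_simplified(A_star, A, B)
```

Both computations give the same truncated output, and the result is subnormalized because the matching degree ΠA∗(A) is 0.5 here. Feeding A itself as input returns B, since A is normalized.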
Then, given a set of fuzzy if-then rules {“if x is Ai then y is Bi ” : i = 1, n} and an input “x is A∗ ”, Mamdani’s method consists in three steps:
(i) The output $B_i^*$ for each rule is computed as follows:

\[ B_i^*(v) = \sup_u \min(A^*(u), \min(A_i(u), B_i(v))) = \min(\Pi_{A^*}(A_i), B_i(v)). \]
(ii) The global output $B^*$ is then the disjunctive combination of the outputs of each rule, which allows for a rule-by-rule computation. Indeed, applying the maximal specificity principle to the representation of the set of rules, and then the combination/projection method, we get

\[ B^*(v) = \sup_u \min(A^*(u), \max_i \min(A_i(u), B_i(v))) = \max_i B_i^*(v). \]
(iii) Finally, there is a defuzzification process in order to come up with a single value $v_0 \in B^*$ for $y$. This defuzzification step is outside the scope of logic, and hence of this paper.

Still, problems remain with the inference with conjunction-based rules in case of a fuzzy input. Indeed, the above approach is questionable because adding a rule may then lead to a more imprecise conclusion (before defuzzification), and $A_j \circ \cup_i (A_i \times B_i) \neq B_j$ except if the $A_i$'s are disjoint, as pointed out in [Di Nola et al., 1989]. To overcome these difficulties, it is useful to consider the fuzzy relation obtained from a set of conjunction-based rules for what it really is, namely positive information, as proposed in [Dubois et al., 2003b]. A conjunctive rule base actually is a memory of fuzzy cases. Then, what appeared to be anomalies under the negative information view becomes natural. It is clear that adding a new conjunctive rule to a fuzzy case memory should expand the possibilities, not reduce them. The fuzzy input still consists in a restriction on the values of the input variable and thus is of a different nature. It is in some sense negative information. So, the question is "how to exploit a set of fuzzy cases, which for each input value describes the fuzzy set of guaranteed possible output values, on the basis of negative imprecise information on the input?" In fact, what has to be computed, via an appropriate projection, are the output values that are guaranteed possible for $y$, for all values of $x$ compatible with the restriction $A^*$ on the input value. The expected conclusion, in terms of guaranteed possible values, is given for a non-fuzzy input $A^*$ by

\[ B_*(v) = \inf_{u \in A^*} \max_i \min(A_i(u), B_i(v)). \]
What is computed is the intersection of the sets of images of precise inputs compatible with $A^*$. Any value $y = v$ in this intersection is guaranteed possible for any input value compatible with $A^*$. The term $B_*$ is the lower image of $A^*$ via the fuzzy relation aggregating the fuzzy cases conjunctively. In the case where none of the sets are fuzzy,

\[ B_* = \{v \in V \mid \forall u \in A^*, \exists i \text{ s.t. } u \in A_i \text{ and } v \in B_i\} = A^* \to \cup_i (A_i \times B_i). \]
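The contrast between Mamdani's upper image and the lower image for a crisp input can be sketched on small finite universes. The following is only an illustrative sketch: the universes `U` and `V`, the two rules and all membership degrees are made-up assumptions, and the function names (`relation`, `mamdani`, `rule_by_rule`, `lower_image`) are hypothetical helpers, not notation from the text.

```python
# Hypothetical discrete universes; all membership values are invented for illustration.
U = [0, 1, 2, 3]
V = ["low", "mid", "high"]

# Two fuzzy rules "if x is A_i then y is B_i" whose conditions overlap at u = 1.
A = [
    {0: 1.0, 1: 0.5, 2: 0.0, 3: 0.0},
    {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.0},
]
B = [
    {"low": 1.0, "mid": 0.5, "high": 0.0},
    {"low": 0.0, "mid": 0.5, "high": 1.0},
]


def relation(u, v):
    """Disjunctive aggregation of the fuzzy cases: max_i min(A_i(u), B_i(v))."""
    return max(min(Ai[u], Bi[v]) for Ai, Bi in zip(A, B))


def mamdani(a_star):
    """Upper image B^*(v) = sup_u min(A^*(u), max_i min(A_i(u), B_i(v)))."""
    return {v: max(min(a_star[u], relation(u, v)) for u in U) for v in V}


def rule_by_rule(a_star):
    """Steps (i)-(ii): B_i^*(v) = min(Pi_{A^*}(A_i), B_i(v)), then max over i."""
    out = {v: 0.0 for v in V}
    for Ai, Bi in zip(A, B):
        poss = max(min(a_star[u], Ai[u]) for u in U)  # Pi_{A^*}(A_i)
        for v in V:
            out[v] = max(out[v], min(poss, Bi[v]))
    return out


def lower_image(crisp_input):
    """Guaranteed-possible outputs B_*(v) = inf over u in a crisp set A^*."""
    return {v: min(relation(u, v) for u in crisp_input) for v in V}


if __name__ == "__main__":
    print(mamdani(A[0]))        # differs from B[0] because A_0 and A_1 overlap
    print(lower_image([0]))     # a precise input inside the core of A_0
    print(lower_image([0, 1]))  # a less precise input guarantees less
```

With these made-up values, the global and the rule-by-rule computations agree, feeding $A_1$ back as input does not return $B_1$ (the overlap with the second rule raises the degree of "high"), and enlarging the crisp input from {0} to {0, 1} can only shrink the set of guaranteed possible outputs.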
In the case where $A^*$ is fuzzy, $B_*$ is defined by

\[ B_*(v) = \inf_u \{A^*(u) \to \max_i \min(A_i(u), B_i(v))\} \]
where $\to$ is Gödel implication. Indeed, starting from the constraints

\[ \pi_{x,y}(u,v) \ge \max_{i=1,n} \min(A_i(u), B_i(v)), \]
\[ \pi_x(u) \le A^*(u), \]
representing respectively the positive information given by the set of conjunction-based fuzzy rules and the negative information corresponding to the input, one can derive by simple computations the following further constraint

\[ \pi_y(v) \ge \inf_u \{A^*(u) \to \max_{i=1,n} \min(A_i(u), B_i(v))\}, \]
provided that $\pi_x$ is normalized (i.e. $\sup_u \pi_x(u) = 1$). For any fixed value $v$ of $y$, $B_*(v)$ is nothing but the guaranteed possibility measure $\Delta_v(A^*)$ of $A^*$ as being in relation with $v$ through the fuzzy relation aggregating the fuzzy cases (while $B^*(v)$ was the possibility measure $\Pi_v(A^*)$ of the same event). It can be checked that for usual fuzzy partitions (such that $A_i(u) = 1 \Rightarrow A_j(u) < 1$ for $j \neq i$), if $A^* = A_i$, then $B_* = B_i$, a result that cannot be obtained using the sup-min composition.

Di Nola et al. [1985] have pointed out that, when the rule is modeled by means of a t-norm $T$, $R(u,v) = T(A(u), B(v))$ is the least solution of the fuzzy relational equation

\[ \inf_u I(A(u), R(u,v)) = B(v), \]
where $I$ is the residuated implication associated with $T$. When $T = \min$, $I$ is Gödel implication. Note that this definition of $R$ as a least solution is well in accordance with the interpretation of the possibility rules, to which a principle of maximal specificity must be applied.

Capturing interpolation in approximate reasoning

Many authors, including Zadeh [1992], have pointed out that approximate reasoning techniques in fuzzy control, such as Mamdani's method, perform an interpolation between the conclusions of the rules of the fuzzy controller, on the basis of the degrees of matching of the (usually precise) input measurements (describing the current state of the system to be controlled) with the condition parts of these rules. However, the interpolative effect is achieved by defuzzification and is not part of the logical inference step. Klawonn and Novák [1996] have contrasted fuzzy interpolation on the basis of an imprecisely known function (described by fuzzy points $A_i \times B_i$) and logical inference in the presence of fuzzy information. Besides, Sudkamp [1993] discusses the construction of fuzzy rules from sets of pairs of precise values $(a_i, b_i)$ and similarity relations on $U$ and $V$.
Sugeno and Takagi's [1983] fuzzy modeling method (see also [Sugeno, 1985] for control) can be viewed as a special case of Mamdani's and a generalization thereof. It starts from $n$ rules with precise numerical conclusion parts, of the form "if $x_1$ is $A_1^{(i)}$ and . . . and $x_p$ is $A_p^{(i)}$ then $y$ is $b^{(i)}(x)$", where $x = (x_1, \ldots, x_p)$. Here the conclusions in the rules depend on the input value, contrary to the fuzzy rules in Mamdani's approach. Let $\alpha_i(u) = \min(A_1^{(i)}(u_1), \ldots, A_p^{(i)}(u_p))$ be the level of matching between the input and the conditions of rule $i$. Sugeno and Takagi define the relation between $x$ and $y$ to be the following function:

\[ y = \frac{\sum_i \alpha_i(x) \cdot b^{(i)}(x)}{\sum_i \alpha_i(x)}, \]
which indeed performs a weighted interpolation. This result can be retrieved using Mamdani's method, noticing that in this case $B^* = \{b^{(i)}(u)/\alpha_i(u) : i = 1, n\}$, where $b/\mu$ indicates that element $b$ has membership value $\mu$, and applying the center of gravity method for selecting a value representing $B^*$.

When the conclusions $b^{(i)}(x) = b_i$ do not depend on $x$, and assuming single-condition rules, this interpolation effect can be obtained within the inference step, by applying Zadeh's approximate reasoning combination and projection approach. For this purpose, consider the rules as pure gradual rules (based on Rescher-Gaines implication rather than on Gödel's), expressing that "the closer $x$ is to $a_i$, the closer $y$ is to $b_i$", where $(a_i, b_i)$, $i = 1, n$, are pairs of scalar values, and where we assume $a_1 < \ldots < a_{i-1} < a_i < a_{i+1} < \ldots < a_n$. The first problem is to represent "close to $a_i$" by means of a fuzzy set $A_i$. It seems natural to assume that $A_i(a_{i-1}) = A_i(a_{i+1}) = 0$, since there are special rules adapted to the cases $x = a_{i-1}$ and $x = a_{i+1}$. Moreover, if $u \neq a_i$, $A_i(u) < 1$ for $u \in (a_{i-1}, a_{i+1})$, since information is only available for $u = a_i$. Hence $A_i$ should be a fuzzy interval with support $(a_{i-1}, a_{i+1})$ and core $\{a_i\}$. Since the closer $x$ is to $a_{i-1}$, the farther it is from $a_i$, $A_{i-1}$ should decrease when $A_i$ increases, and by symmetry, $A_i((a_i + a_{i+1})/2) = A_{i-1}((a_{i-1} + a_i)/2) = 1/2$. The simplest way of achieving this is to let $\forall u \in [a_{i-1}, a_i],\ A_{i-1}(u) + A_i(u) = 1$, an example of which are triangular-shaped fuzzy sets. Clearly the conclusion parts of the rules should involve fuzzy sets $B_i$ whose meaning is "close to $b_i$", with similar conventions. In other words, each rule is understood as "the more $x$ is $A_i$, the more $y$ is $B_i$". Pure gradual rules are modeled by inequality constraints of the form $A_i(u) \le B_i(v)$. Then the subset of $V$ obtained by combining the results of the rules for the input $x = u_0$ is given by

\[ B^*(v) = \min_{i=1,n} A_i(u_0) \to B_i(v), \]
where the implication is the one of Rescher-Gaines, defined by $a \to b = 1$ if $a \le b$ and $a \to b = 0$ if $a > b$. In that case the output associated with the precise input $u_0$, where $a_{i-1} < u_0 < a_i$, is

\[ B^* = (\alpha_{i-1} \to B_{i-1}) \cap (\alpha_i \to B_i) = [B_{i-1}]_{\alpha_{i-1}} \cap [B_i]_{\alpha_i} \]
since $\alpha \to B(\cdot)$ corresponds to the level cut $[B]_\alpha$, $\alpha_{i-1} = A_{i-1}(u_0)$, $\alpha_i = A_i(u_0)$, and $\alpha_{i-1} + \alpha_i = 1$. Due to the latter assumption it can be easily proved (without the assumption of triangular-shaped fuzzy sets) that there exists a unique value $y = b$ such that $B^*(b) = 1$, which exactly corresponds to the result of the linear interpolation, i.e., $b = \alpha_{i-1} \cdot b_{i-1} + \alpha_i \cdot b_i$. The conclusion thus obtained is nothing but the singleton value computed by Sugeno and Takagi's method. It is a theoretical justification for this inference method in the one-dimensional case. Hence reasoning with gradual rules does model interpolation, linear interpolation being retrieved as a particular case. The more complicated case of gradual rules with compound conditions, i.e., rules of the form "the more $x_1$ is $A_1$, . . . , and the more $x_p$ is $A_p$, the more $y$ is $B$", is also studied in detail in [Dubois et al., 1994]. Then, provided that the rules satisfy a coherence condition, the output of a system of pure gradual rules, where conditions and conclusions are fuzzy intervals, is an interval.
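The coincidence between pure gradual rules and Sugeno and Takagi's weighted average can be checked on a small example. The sketch below rests on assumptions not fixed by the text: the pairs in `a` and `b` are invented, the $b_i$ are taken to be increasing, and both the $A_i$ and the $B_i$ are taken to be triangular. The helper names `tri`, `rescher_gaines`, `gradual_output` and `takagi_sugeno` are hypothetical. Exact rationals are used so that the comparison $\alpha_i \le B_i(v)$ is not disturbed by rounding.

```python
from fractions import Fraction as F

# Hypothetical rule base: "the closer x is to a_i, the closer y is to b_i".
a = [F(0), F(2), F(5)]
b = [F(10), F(20), F(50)]


def tri(points, i, t):
    """Triangular fuzzy set with core {points[i]} and support up to its neighbours."""
    p = points[i]
    if t == p:
        return F(1)
    if i > 0 and points[i - 1] < t < p:
        return (t - points[i - 1]) / (p - points[i - 1])
    if i + 1 < len(points) and p < t < points[i + 1]:
        return (points[i + 1] - t) / (points[i + 1] - p)
    return F(0)


def rescher_gaines(x, y):
    """Rescher-Gaines implication: 1 if x <= y, 0 otherwise."""
    return 1 if x <= y else 0


def gradual_output(u0, v):
    """B^*(v) = min_i (A_i(u0) -> B_i(v)) for the precise input u0."""
    return min(rescher_gaines(tri(a, i, u0), tri(b, i, v)) for i in range(len(a)))


def takagi_sugeno(u0):
    """Weighted average sum_i alpha_i * b_i / sum_i alpha_i (constant conclusions).

    Only meaningful for u0 inside [a[0], a[-1]], where the weights do not all vanish.
    """
    alphas = [tri(a, i, u0) for i in range(len(a))]
    return sum(al * bi for al, bi in zip(alphas, b)) / sum(alphas)


if __name__ == "__main__":
    u0 = F(3)
    y = takagi_sugeno(u0)
    print(y, gradual_output(u0, y))
```

For the input $u_0 = 3$, the matching degrees are 2/3 and 1/3, the weighted average is 30, and 30 is accepted by every gradual rule while neighbouring values such as 29 or 31 are rejected by one of them.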
2.6 Concluding remarks on approximate reasoning

The presentation has emphasized the basic ideas underlying Zadeh's original proposal, showing their consistency and their close relation to the representation setting of possibility theory. Various inference machineries can be handled at the semantic level. Still, many issues of interest considered elsewhere in the literature (see [Bouchon-Meunier et al., 1999]), like computational tractability, coherence of a set of fuzzy rules, special applications to temporal or to order-of-magnitude reasoning, the handling of fuzzy quantifiers (viewed as imprecisely known conditional probabilities) in reasoning patterns, fuzzy analogical reasoning, interpolative reasoning with sparse fuzzy rules, etc., have been left apart, let alone more practically oriented research works. This framework can express pieces of information with rich contents. The important, but sometimes misleading, notion of fuzzy truth-value, encompassing both notions of intermediate degrees of truth and (degrees of) uncertainty about truth, has been discussed at length. It is crucial for a proper appraisal of the line of thought followed by the founder of fuzzy logic, and in order to situate the role of fuzzy logic in the narrow sense, mainly developed in the nineties and summarized in the remainder of this paper, for the purpose of knowledge representation. In the meantime, in the last twenty years, Zadeh [1988; 1989; 1997; 1999; 2001; 2005] has continued to elaborate his semantic, nonlinear optimization approach to human fuzzy and uncertain reasoning, to precisiate as well as to enlarge it, and to propose new perspectives, emphasizing the importance of key notions like computing with words and perceptions as opposed to numbers and measurements, and information granulation.
3 MANY VALUED LOGICAL SYSTEMS BASED ON FUZZY SET CONNECTIVES
In the preface of the book [Zadeh, 1994a], Zadeh made a very clear distinction between the two main meanings of the term fuzzy logic. Indeed, he writes:

"The term 'fuzzy logic' has two different meanings: wide and narrow. In a narrow sense it is a logical system which aims a formalization of approximate reasoning. In this sense it is an extension of multivalued logic. However the agenda of fuzzy logic (FL) is quite different from that of traditional many-valued logic. Such key concepts in FL as the concept of linguistic variable, fuzzy if-then rule, fuzzy quantification and defuzzification, truth qualification, the extension principle, the compositional rule of inference and interpolative reasoning, among others, are not addressed in traditional systems. In its wide sense, FL, is fuzzily synonymous with the fuzzy set theory of classes of unsharp boundaries."

Hájek, in the introduction of his monograph [Hájek, 1998a], makes the following comment on Zadeh's quotation:

"Even if I agree with Zadeh's distinction (. . . ) I consider formal calculi of many-valued logic to be the kernel of fuzzy logic in the narrow sense and the task of explaining things Zadeh mentions by means of this calculi to be a very promising task."

On the other hand, Novák et al., also in the introduction of their monograph [Novák et al., 1999], write:

"Fuzzy logic in narrow sense is a special many-valued logic which aims at providing formal background for the graded approach to vagueness."

In accordance with Hájek's and Novák et al.'s points of view, this section is devoted to the formal background of fuzzy logic in the narrow sense, that is, to formal systems of many-valued logics having the real unit interval as set of truth values, and truth functions defined by fuzzy connectives that behave classically on the extremal truth values (0 and 1) and satisfy some natural monotonicity conditions.
Actually, these connectives originate from the definition and algebraic study of set-theoretical operations over the real unit interval, essentially carried out in the eighties, when this field developed greatly. It was in that period that the use of t-norms and t-conorms as operations to model fuzzy set conjunction and disjunction respectively was adopted, and related implication and negation functions were studied, as reported in Section 2.1. Therefore, the syntactical issues of fuzzy logic have followed the semantical ones. The main many-valued systems described in this section are the so-called t-norm based fuzzy logics. They correspond to [0, 1]-valued calculi defined by a
conjunction and an implication interpreted respectively by a (left-continuous) t-norm and its residuum, and have been greatly developed over the past ten years from many points of view (logical, algebraic, proof-theoretical, functional representation, and complexity), as witnessed by a number of important monographs that have appeared in the literature, see [Hájek, 1998a; Gottwald, 2001; Novák et al., 1999]. Actually, two prominent many-valued logics that fall in this class, namely Łukasiewicz and Gödel infinitely-valued logics [Łukasiewicz, 1930; Gödel, 1932], were defined long before fuzzy logic was born. They indeed correspond to the calculi defined by the Łukasiewicz and min t-norms respectively. Łukasiewicz logic Ł has received much attention since the fifties, when completeness results were proved by Rose and Rosser [1958], and by algebraic means by Chang [1958; 1959], who developed the theory of MV-algebras, largely studied in the literature. Moreover, McNaughton's theorem [McNaughton, 1951] provides a functional description of its logical functions. Many results about Łukasiewicz logic and MV-algebras can be found in the book [Cignoli et al., 1999]. On the other hand, a completeness theorem for Gödel logic was already given in the fifties by Dummett [1959]. Note that the algebraic structures related to Gödel logic are linear Heyting algebras (known as Gödel algebras in the context of fuzzy logics), which have been studied in the setting of intermediate or superintuitionistic logics, i.e. logics between intuitionistic and classical logic. The key ideas of these logical systems are described in the first three subsections. Then, in the next two subsections, more complex systems resulting from the addition of new connectives, as well as a number of further issues related to t-norm based fuzzy logics, are briefly surveyed. The sixth subsection shows how to embed the main patterns of approximate reasoning inside a residuated fuzzy logic.
The following subsection is devoted to variants of fuzzy logic systems, including clausal and resolution-based fuzzy logics. The former are mainly systems related to the logical calculi on the real unit interval defined by a De Morgan triple: a t-norm for conjunction, a strong negation and the dual t-conorm for disjunction. The section concludes with a subsection dealing with notions of graded consequence and their relationship to closure operators, in a Tarski style. This is a different approach to formalizing a form of fuzzy logic which, in particular, has been the topic of Gerla's monograph [2001] and partially also of [Bělohlávek, 2002b]. Even if the set of topics addressed in this section is very wide, we acknowledge that we certainly do not cover all the approaches and aspects of formal systems of fuzzy logic that have been proposed in the literature. This is the case, for instance, of a whole research stream on fuzzifying modal logics, started indeed very early by Schotch [1975], and then enriched by a number of significant contributions, like Gabbay's general fibring method for building fuzzy modal logics [Gabbay, 1996; Gabbay, 1997] or the introduction of various types of modalities in the frame of the above mentioned t-norm based fuzzy logics [Hájek, 1998a, Chap. 8], to cite only a very few of them.
3.1 BL and related logics
Probably the most studied and developed many-valued systems related to fuzzy logic are those corresponding to logical calculi with the real interval [0, 1] as set of truth-values and defined by a conjunction $\&$ and an implication $\to$ interpreted respectively by a (left-continuous) t-norm $*$ and its residuum $\Rightarrow$, and where negation is defined as $\neg\varphi = \varphi \to \bar{0}$, with $\bar{0}$ being the truth-constant for falsity. In the framework of these logics, called t-norm based fuzzy logics, each (left-continuous) t-norm $*$ uniquely determines a semantical (propositional) calculus $PC(*)$ over formulas defined in the usual way from a countable set of propositional variables, connectives $\wedge$, $\&$ and $\to$, and truth-constant $\bar{0}$ [Hájek, 1998a]. Further connectives are defined as follows:

$\varphi \vee \psi$ is $((\varphi \to \psi) \to \psi) \wedge ((\psi \to \varphi) \to \varphi)$,
$\neg\varphi$ is $\varphi \to \bar{0}$,
$\varphi \equiv \psi$ is $(\varphi \to \psi) \& (\psi \to \varphi)$.
Evaluations of propositional variables are mappings $e$ assigning each propositional variable $p$ a truth-value $e(p) \in [0, 1]$, which extend univocally to compound formulas as follows:

\[ e(\bar{0}) = 0, \]
\[ e(\varphi \wedge \psi) = \min(e(\varphi), e(\psi)), \]
\[ e(\varphi \& \psi) = e(\varphi) * e(\psi), \]
\[ e(\varphi \to \psi) = e(\varphi) \Rightarrow e(\psi). \]

Note that, from the above definitions, $e(\varphi \vee \psi) = \max(e(\varphi), e(\psi))$, $e(\neg\varphi) = e(\varphi) \Rightarrow 0$ and $e(\varphi \equiv \psi) = e(\varphi \to \psi) * e(\psi \to \varphi)$. A formula $\varphi$ is said to be a 1-tautology of $PC(*)$ if $e(\varphi) = 1$ for each evaluation $e$. The set of all 1-tautologies of $PC(*)$ will be denoted as $TAUT(*)$. Three outstanding examples of (continuous) t-norm based fuzzy logic calculi are:

Gödel logic calculus: defined by the operations

\[ x *_G y = \min(x, y), \qquad x \Rightarrow_G y = \begin{cases} 1, & \text{if } x \le y \\ y, & \text{otherwise.} \end{cases} \]

Łukasiewicz logic calculus: defined by the operations

\[ x *_L y = \max(x + y - 1, 0), \qquad x \Rightarrow_L y = \begin{cases} 1, & \text{if } x \le y \\ 1 - x + y, & \text{otherwise.} \end{cases} \]
Product logic calculus: defined by the operations

\[ x *_\Pi y = x \cdot y \ \text{(product of reals)}, \qquad x \Rightarrow_\Pi y = \begin{cases} 1, & \text{if } x \le y \\ y/x, & \text{otherwise.} \end{cases} \]
These three cases are important since each continuous t-norm is definable as an ordinal sum of copies of the Łukasiewicz, Minimum and Product t-norms (see e.g. [Klement et al., 2000]), and the min and max operations are definable from $*$ and $\Rightarrow$. Indeed, for each continuous t-norm $*$ and its residuated implication $\Rightarrow$, the following identities are true:

\[ \min(x, y) = x * (x \Rightarrow y), \]
\[ \max(x, y) = \min((x \Rightarrow y) \Rightarrow y, (y \Rightarrow x) \Rightarrow x). \]
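The three calculi and the two identities above lend themselves to a direct numerical check. The following sketch uses exact rational arithmetic on a finite grid of truth values, which is an assumption of the example (a grid check illustrates the identities but does not prove them on the whole of [0, 1]). The function names and the dictionary `CALCULI` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

ONE, ZERO = F(1), F(0)


def godel(x, y):
    return min(x, y)


def godel_res(x, y):
    return ONE if x <= y else y


def lukasiewicz(x, y):
    return max(x + y - 1, ZERO)


def lukasiewicz_res(x, y):
    return ONE if x <= y else 1 - x + y


def prod_tnorm(x, y):
    return x * y


def prod_res(x, y):
    # x > y >= 0 in the second branch, so the division is always defined.
    return ONE if x <= y else y / x


CALCULI = {
    "Goedel": (godel, godel_res),
    "Lukasiewicz": (lukasiewicz, lukasiewicz_res),
    "Product": (prod_tnorm, prod_res),
}

GRID = [F(k, 10) for k in range(11)]  # truth values 0, 1/10, ..., 1


def check(tnorm, res):
    """Check residuation and the min/max identities on every grid point."""
    for x, y in product(GRID, repeat=2):
        if tnorm(x, res(x, y)) != min(x, y):
            return False
        if min(res(res(x, y), y), res(res(y, x), x)) != max(x, y):
            return False
        for z in GRID:
            # z <= (x => y)  iff  x * z <= y
            if (z <= res(x, y)) != (tnorm(x, z) <= y):
                return False
    return True


if __name__ == "__main__":
    for name, (t, r) in CALCULI.items():
        print(name, check(t, r))
```

Using `Fraction` rather than floats avoids spurious failures in equalities such as $x \cdot (y/x) = y$.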
Actually, two of these logics correspond to many-valued systems already studied before fuzzy logic was born. These are the well-known infinitely-valued Łukasiewicz [1930] and Gödel [1932] logics$^4$, which are the logical systems corresponding to the so-called Łukasiewicz and minimum t-norms and their residuated implications respectively (see, for example, [Cignoli et al., 1999; Gottwald, 2001] for excellent descriptions of these logics). Much later, already motivated by research on fuzzy logic, Product logic, the many-valued logic corresponding to the Product t-norm and its residuum, was also axiomatized in [Hájek et al., 1996]. All these logics enjoy standard completeness, that is, completeness with respect to interpretations over the algebra on the unit real interval [0, 1] defined by the corresponding t-norm and its residuum. Namely, it holds that:

$\varphi$ is provable in Łukasiewicz logic iff $\varphi \in TAUT(*_L)$,
$\varphi$ is provable in Gödel logic iff $\varphi \in TAUT(*_G)$,
$\varphi$ is provable in Product logic iff $\varphi \in TAUT(*_\Pi)$.
A main step in the formalization of fuzzy logic in the narrow sense is Hájek's monograph [Hájek, 1998a], where the author introduced the Basic Fuzzy logic BL as a common fragment of the above mentioned three outstanding many-valued logics, intended to capture syntactically the common tautologies of all propositional calculi $PC(*)$ for $*$ being a continuous t-norm. The language of BL logic is built (in the usual way) from a countable set of propositional variables, a conjunction $\&$, an implication $\to$ and the constant $\bar{0}$. Since for a continuous t-norm $*$ and its residuum $\Rightarrow$ we have $\min(x, y) = x * (x \Rightarrow y)$, in BL the connective $\wedge$ is taken as definable from $\&$ and $\to$:

$\varphi \wedge \psi$ is $\varphi \& (\varphi \to \psi)$.

$^4$ Gödel logic is also known as Dummett logic, referring to the scholar who proved its completeness.
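Other connectives ($\vee$, $\neg$, $\equiv$) are defined as in $PC(*)$. The following formulas are the axioms$^5$ of BL:

(A1) $(\varphi \to \psi) \to ((\psi \to \chi) \to (\varphi \to \chi))$
(A2) $(\varphi \& \psi) \to \varphi$
(A3) $(\varphi \& \psi) \to (\psi \& \varphi)$
(A4) $(\varphi \& (\varphi \to \psi)) \to (\psi \& (\psi \to \varphi))$
(A5a) $(\varphi \to (\psi \to \chi)) \to ((\varphi \& \psi) \to \chi)$
(A5b) $((\varphi \& \psi) \to \chi) \to (\varphi \to (\psi \to \chi))$
(A6) $((\varphi \to \psi) \to \chi) \to (((\psi \to \varphi) \to \chi) \to \chi)$
(A7) $\bar{0} \to \varphi$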
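As an illustration of what these axioms say semantically, each of them can be evaluated under the Łukasiewicz, Gödel and Product operations and checked to take the value 1. The sketch below is an assumption-laden sanity check rather than a proof: it only samples a finite grid of truth values, and the names `SEMANTICS`, `bl_axioms` and `failing_axioms` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

ONE, ZERO = F(1), F(0)

# (t-norm, residuum) pairs for the three standard calculi.
SEMANTICS = {
    "Lukasiewicz": (lambda x, y: max(x + y - 1, ZERO),
                    lambda x, y: ONE if x <= y else 1 - x + y),
    "Goedel": (lambda x, y: min(x, y),
               lambda x, y: ONE if x <= y else y),
    "Product": (lambda x, y: x * y,
                lambda x, y: ONE if x <= y else y / x),
}


def bl_axioms(t, r, p, q, s):
    """Truth values of (A1)-(A7) when e(phi) = p, e(psi) = q, e(chi) = s."""
    return {
        "A1": r(r(p, q), r(r(q, s), r(p, s))),
        "A2": r(t(p, q), p),
        "A3": r(t(p, q), t(q, p)),
        "A4": r(t(p, r(p, q)), t(q, r(q, p))),
        "A5a": r(r(p, r(q, s)), r(t(p, q), s)),
        "A5b": r(r(t(p, q), s), r(p, r(q, s))),
        "A6": r(r(r(p, q), s), r(r(r(q, p), s), s)),
        "A7": r(ZERO, p),
    }


GRID = [F(k, 8) for k in range(9)]  # truth values 0, 1/8, ..., 1


def failing_axioms(t, r):
    """Names of axioms that take a value other than 1 somewhere on the grid."""
    bad = set()
    for p, q, s in product(GRID, repeat=3):
        bad |= {name for name, val in bl_axioms(t, r, p, q, s).items() if val != 1}
    return bad


if __name__ == "__main__":
    for name, (t, r) in SEMANTICS.items():
        print(name, sorted(failing_axioms(t, r)))
```

An empty set for each calculus is the expected outcome, since all three t-norms are continuous.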
The deduction rule of BL is modus ponens. Axiom (A1) captures the transitivity of the residuum; axioms (A2) and (A3) stand for the weakening and commutativity properties of the conjunction; axiom (A4) forces the commutativity of the defined $\wedge$ connective and is related to the divisibility and the continuity of $\&$; axioms (A5a) and (A5b) stand for the residuation property of the pair $(\&, \to)$; axiom (A6) is a form of proof-by-cases property and is directly related to the pre-linearity axiom $(\varphi \to \psi) \vee (\psi \to \varphi)$, which is an equivalent formulation of (A6); and finally axiom (A7) establishes that $\bar{0}$ is the least truth-value. These axioms and the deduction rule define a notion of proof, denoted $\vdash_{BL}$, in the usual way. As a matter of fact, Łukasiewicz, Gödel and Product logics are axiomatic extensions of BL. Indeed, it is shown in [Hájek, 1998a] that Łukasiewicz logic is the extension of BL by the axiom

(Ł) $\neg\neg\varphi \to \varphi$,

forcing the negation to be involutive, and Gödel logic is the extension of BL by the axiom

(G) $\varphi \to (\varphi \& \varphi)$,

forcing the conjunction to be idempotent. Finally, product logic is just the extension of BL by the following two axioms:

(Π1) $\neg\neg\chi \to (((\varphi \& \chi) \to (\psi \& \chi)) \to (\varphi \to \psi))$,
(Π2) $\varphi \wedge \neg\varphi \to \bar{0}$.
The first axiom indicates that if $c \neq 0$, the cancellation of $c$ on both sides of the inequality $a \cdot c \le b \cdot c$ is possible, hence the strict monotony of the conjunction on $(0, 1]$. The last axiom is due to the fact that negation in product logic behaves such that $n(a) = a \to 0 = 0$ if $a > 0$.

$^5$ These are the original set of axioms proposed by Hájek in [Hájek, 1998a]. Later Cintula showed [Cintula, 2005a] that (A3) is redundant.

From a semantical point of view, if one takes a continuous t-norm $*$ for the truth function of $\&$ and the corresponding residuum $\Rightarrow$ for the truth function of $\to$ (and evaluating $\bar{0}$ by 0), then all the axioms of BL become 1-tautologies (have identically the truth value 1). And since modus ponens preserves 1-tautologies, all formulas provable in BL are 1-tautologies, i.e. if $\vdash_{BL} \varphi$ then $\varphi \in \bigcap\{TAUT(*) : * \text{ is a continuous t-norm}\}$. This shows that BL is sound with respect to the standard semantics, i.e. with respect to evaluations on [0, 1] taking as truth-functions continuous t-norms and their residua.

Actually, standard semantics is a particular case of a more general algebraic semantics. Indeed, the algebraic counterpart of BL logic is the class of so-called BL-algebras. A BL-algebra is an algebra $\mathbf{L} = \langle L, *, \Rightarrow, \wedge, \vee, 0, 1 \rangle$ with four binary operations and two constants such that:

(i) $(L, \wedge, \vee, 0, 1)$ is a lattice with the largest element 1 and the least element 0 (with respect to the lattice ordering $\le$),

(ii) $(L, *, 1)$ is a commutative semigroup with the unit element 1, i.e. $*$ is commutative, associative and $1 * x = x$ for all $x$,

(iii) the following conditions hold:

(1) $z \le (x \Rightarrow y)$ iff $x * z \le y$ for all $x, y, z$ (residuation)
(2) $x \wedge y = x * (x \Rightarrow y)$ (divisibility)
(3) $(x \Rightarrow y) \vee (y \Rightarrow x) = 1$ (pre-linearity)
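The conditions above can be tested mechanically on small finite chains. The sketch below is a hedged illustration: it assumes a linearly ordered carrier (so that meet and join are simply min and max and condition (i) holds automatically), and the checker `is_bl_chain` together with the five-element chain `CHAIN` are hypothetical names introduced for this example.

```python
from fractions import Fraction as F
from itertools import product

ONE, ZERO = F(1), F(0)


def is_bl_chain(L, t, r):
    """Check the BL-algebra conditions on a finite chain L (meet = min, join = max)."""
    carrier = set(L)
    top = max(L)
    for x, y in product(L, repeat=2):
        if t(x, y) not in carrier or r(x, y) not in carrier:  # closure
            return False
        if t(x, y) != t(y, x) or t(top, x) != x:              # commutative, unit 1
            return False
        if min(x, y) != t(x, r(x, y)):                        # (2) divisibility
            return False
        if max(r(x, y), r(y, x)) != top:                      # (3) pre-linearity
            return False
        for z in L:
            if t(t(x, y), z) != t(x, t(y, z)):                # associativity
                return False
            if (z <= r(x, y)) != (t(x, z) <= y):              # (1) residuation
                return False
    return True


def luk(x, y):
    return max(x + y - 1, ZERO)


def luk_res(x, y):
    return ONE if x <= y else 1 - x + y


def godel(x, y):
    return min(x, y)


def godel_res(x, y):
    return ONE if x <= y else y


CHAIN = [F(k, 4) for k in range(5)]  # 0, 1/4, 1/2, 3/4, 1

if __name__ == "__main__":
    print(is_bl_chain(CHAIN, luk, luk_res))      # five-element Lukasiewicz chain
    print(is_bl_chain(CHAIN, godel, godel_res))  # five-element Goedel chain
    print(is_bl_chain(CHAIN, godel, luk_res))    # mismatched pair: residuation fails
```

The last call pairs the minimum t-norm with the Łukasiewicz implication; it fails because the residuation condition ties the implication to the conjunction.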
Thus, in other words, a BL-algebra is a bounded, integral commutative residuated lattice satisfying (2) and (3). The class of all BL-algebras forms a variety. Due to (3), each BL-algebra can be decomposed as a subdirect product of linearly ordered BL-algebras. BL-algebras defined on the real unit interval [0, 1], called standard BL-algebras, are determined by continuous t-norms, i.e. any standard BL-algebra is of the form $[0, 1]_* = \langle [0, 1], *, \Rightarrow, \min, \max, 0, 1 \rangle$ for some continuous t-norm $*$, where $\Rightarrow$ is its residuum. By defining $\neg x = x \Rightarrow 0$, it turns out that the algebraic semantics of Łukasiewicz logic, defined by the class of MV-algebras (or Wajsberg algebras), corresponds to the subvariety of BL-algebras satisfying the additional condition $\neg\neg x = x$, while the algebraic semantics of Gödel logic, defined by the class of G-algebras, corresponds to the subvariety of BL-algebras satisfying the additional condition $x * x = x$. Finally, Product algebras, which define the algebraic semantics for Product logic, are just BL-algebras further satisfying $x \wedge \neg x = 0$ and $\neg\neg z \Rightarrow ((x * z = y * z) \Rightarrow x = y) = 1$. Given a BL-algebra $\mathbf{L}$, one can define $\mathbf{L}$-evaluations of formulas in the same way
as in [0, 1] just by taking as truth-functions the operations of $\mathbf{L}$. An $\mathbf{L}$-evaluation $e$ is called a model of a formula $\varphi$ when $e(\varphi) = 1$ (1 being the top element of the algebra), and it is a model of a set of formulas $\Gamma$ if it is a model of every formula of $\Gamma$. An $\mathbf{L}$-tautology is then a formula getting the value 1 for each $\mathbf{L}$-evaluation, i.e. any $\mathbf{L}$-evaluation is a model of the formula. In particular, when $\mathbf{L} = [0, 1]_*$, the set of $\mathbf{L}$-tautologies is the set $TAUT(*)$ introduced before. Then, the logic BL is sound with respect to $\mathbf{L}$-tautologies: if $\varphi$ is provable in BL then $\varphi$ is an $\mathbf{L}$-tautology for each BL-algebra $\mathbf{L}$. Moreover, Hájek proved the following completeness results for BL, namely the following three conditions are proved in [Hájek, 1998a] to be equivalent:

(i) $\Gamma \vdash_{BL} \varphi$,

(ii) for each BL-algebra $\mathbf{L}$, any $\mathbf{L}$-evaluation which is a model of $\Gamma$ is a model of $\varphi$ as well,

(iii) for each linearly ordered BL-algebra $\mathbf{L}$, any $\mathbf{L}$-evaluation which is a model of $\Gamma$ is a model of $\varphi$ as well.

Hájek's conjecture was that BL captured the 1-tautologies common to all many-valued calculi defined by a continuous t-norm. In fact this was proved [Hájek, 1998b; Cignoli et al., 2000] to be the case soon after, that is, it holds that

\[ \varphi \text{ is provable in BL} \quad \text{iff} \quad \varphi \in \bigcap \{TAUT(*) : * \text{ is a continuous t-norm}\}. \]
This is the so-called standard completeness property for BL. More than that, a stronger completeness property holds: if $\Gamma$ is a finite set of formulas, then $\Gamma \vdash_{BL} \varphi$ if and only if for each standard BL-algebra $\mathbf{L}$, any $\mathbf{L}$-evaluation which is a model of $\Gamma$ is a model of $\varphi$. This result is usually referred to as finite strong standard completeness of BL. On the other hand, in [Esteva et al., 2004] the authors provide a general method to get a finite axiomatization, as an extension of BL, of each propositional calculus $PC(*)$, for $*$ being a continuous t-norm. Therefore, for each of these logics, denoted $L_*$, one has that a formula $\varphi$ is provable in $L_*$ iff $\varphi \in TAUT(*)$. Note that $L_*$ is equivalent to Gödel logic G when $* = \min$, to Łukasiewicz logic Ł when $*$ is the Łukasiewicz t-norm $*_L$, and to Product logic when $*$ is the product of real numbers. Actually, the book [Hájek, 1998a] was the starting point of many fruitful and deep research works on BL logic and its extensions, as well as on its algebraic counterpart, the variety of BL-algebras. See the special issue [Esteva and Godo (eds.), 2005] for a quite exhaustive up-to-date overview of recent results on BL-algebras and BL-logics. The well-known result that a t-norm has a residuum if and only if the t-norm is left-continuous makes it clear that BL is not the most general t-norm-based logic (in the setting of residuated fuzzy logics). In fact, a weaker logic than BL, called Monoidal t-norm-based Logic, MTL for short, was defined in [Esteva and Godo, 2001] and proved in [Jenei and Montagna, 2002] to be the logic of left-continuous
t-norms and their residua. Thus MTL is indeed the most general residuated t-norm-based logic. The basic difference between BL and MTL is the divisibility axiom (or algebraically the equality $x \wedge y = x * (x \Rightarrow y)$), which characterizes the continuity of the t-norm and which is not satisfied in MTL. This means that the min-conjunction $\wedge$ is not definable in MTL and, as opposed to BL, it has to be introduced as a primitive connective into the language together with the BL primitive connectives (strong conjunction $\&$, implication $\to$ and the truth constant $\bar{0}$). Axioms of MTL are obtained from those of BL by replacing axiom (A4) by the three following ones:

(A4a) $\varphi \wedge \psi \to \varphi$
(A4b) $\varphi \wedge \psi \to \psi \wedge \varphi$
(A4c) $\varphi \& (\varphi \to \psi) \to \varphi \wedge \psi$
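The failure of divisibility for a left-continuous but non-continuous t-norm can be made concrete with the nilpotent minimum, $x * y = \min(x, y)$ if $x > 1 - y$ and 0 otherwise. The sketch below is illustrative only: the closed form used for the residuum in `nm_res` is an assumption (the standard one, not stated in the text), so the code first checks it against the residuation condition on a grid before using it. All names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

ONE, ZERO = F(1), F(0)


def nm(x, y):
    """Nilpotent minimum t-norm: min(x, y) if x > 1 - y, and 0 otherwise."""
    return min(x, y) if x > 1 - y else ZERO


def nm_res(x, y):
    """Assumed closed form of its residuum: 1 if x <= y, else max(1 - x, y)."""
    return ONE if x <= y else max(1 - x, y)


GRID = [F(k, 10) for k in range(11)]  # truth values 0, 1/10, ..., 1

# Residuation z <= (x => y) iff x * z <= y, which justifies nm_res on the grid.
residuated = all(
    (z <= nm_res(x, y)) == (nm(x, z) <= y) for x, y, z in product(GRID, repeat=3)
)

# Semantic reading of (A4c): x * (x => y) <= min(x, y) always holds.
a4c_holds = all(nm(x, nm_res(x, y)) <= min(x, y) for x, y in product(GRID, repeat=2))

# Divisibility x * (x => y) = min(x, y) fails at some grid points.
div_counterexamples = [
    (x, y) for x, y in product(GRID, repeat=2) if nm(x, nm_res(x, y)) != min(x, y)
]

if __name__ == "__main__":
    print(residuated, a4c_holds, len(div_counterexamples) > 0)
```

For instance, at $x = 3/5$ and $y = 3/10$ the residuum is $2/5$, and $3/5 * 2/5 = 0$ whereas $\min(x, y) = 3/10$; only the inequality expressed by (A4c) survives.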
Most well-known fuzzy logics (among them Łukasiewicz logic, Gödel logic, Hájek's BL logic and Product logic), as well as the Classical Propositional Calculus$^6$, can be presented as axiomatic extensions of MTL. Tables 2 and 3 collect some axiom schemata$^7$ and the axiomatic extensions of MTL they define$^8$. Notice that in extensions of MTL with the divisibility axiom (Div), i.e. in extensions of BL, the additive conjunction $\wedge$ is in fact definable and therefore it is not considered as a primitive connective in their languages. For the sake of homogeneity we will keep $\mathcal{L} = \{\&, \to, \wedge, \bar{0}\}$ as the common language for all MTL extensions.

The algebraic counterpart of MTL logic is the class of the so-called MTL-algebras. MTL-algebras are in fact pre-linear residuated lattices (understood as commutative, integral, bounded residuated monoids). Of particular interest are the MTL-algebras defined on the real unit interval [0, 1], which are defined in fact by left-continuous t-norms and their residua. Jenei and Montagna proved that MTL is (strongly) complete with respect to the class of MTL-algebras defined on the real unit interval. This means in particular that

\[ \varphi \text{ is provable in MTL} \quad \text{iff} \quad \varphi \in \bigcap \{TAUT(*) : * \text{ is a left-continuous t-norm}\}. \]
One common property of all MTL extensions is that they enjoy a local form of the deduction theorem, namely, for any MTL axiomatic extension $L$ it holds that

\[ \Gamma \cup \{\varphi\} \vdash_L \psi \quad \text{iff} \quad \text{there exists } n \in \mathbb{N} \text{ such that } \Gamma \vdash_L \varphi^n \to \psi, \]

$^6$ Indeed, Classical Propositional Calculus can be presented as the extension of MTL (and of any of its axiomatic extensions) with the excluded-middle axiom (EM).
$^7$ Axioms of pseudo-complementation (PC) and n-contraction ($C_n$) are also known respectively by the names of weak contraction and n-potence, see e.g. [Galatos et al., 2007].
$^8$ Of course, some of these logics were known well before MTL was introduced. We only want to point out that it is possible to present them as the axiomatic extensions of MTL obtained by adding the corresponding axioms to the Hilbert style calculus for MTL given above. Moreover, these tables only collect some of the most prominent axiomatic extensions of MTL, even though many other ones have been studied in the literature (see e.g. [Noguera, 2006], [Wang et al., 2005b] and [Wang et al., 2005a]).

Axiom schema                                                                          Name
$\neg\neg\varphi \to \varphi$                                                         Involution (Inv)
$\neg\varphi \vee ((\varphi \to \varphi \& \psi) \to \psi)$                           Cancellation (C)
$\neg(\varphi \& \psi) \vee ((\psi \to \varphi \& \psi) \to \varphi)$                 Weak Cancellation (WC)
$\varphi \to \varphi \& \varphi$                                                      Contraction (Con)
$\varphi \wedge \psi \to \varphi \& (\varphi \to \psi)$                               Divisibility (Div)
$\varphi \wedge \neg\varphi \to \bar{0}$                                              Pseudo-complementation (PC)
$\varphi \vee \neg\varphi$                                                            Excluded Middle (EM)
$(\varphi \& \psi \to \bar{0}) \vee (\varphi \wedge \psi \to \varphi \& \psi)$        Weak Nilpotent Minimum (WNM)
$\varphi^{n-1} \to \varphi^n$                                                         n-Contraction ($C_n$)
Table 2. Some usual axiom schemata in fuzzy logics.
where ϕ^n stands for ϕ & ⋯ & ϕ (n times). The theorem is local in the sense that n depends on the particular formulas involved, Γ, ϕ and ψ. It turns out that the only axiomatic extension L of MTL for which the classical (global) deduction theorem
Γ ∪ {ϕ} ⊢_L ψ iff Γ ⊢_L ϕ → ψ
holds is Gödel fuzzy logic. This fact clearly indicates that, in general, syntactic inference ϕ ⊢_L ψ in BL, MTL and any of their extensions L does not implement Zadeh’s entailment principle of approximate reasoning in the semantics (except in Gödel logic). For Zadeh, the inference of a fuzzy proposition ψ from ϕ means that ψ is always at least as true as ϕ in all interpretations. At the syntactic level, this generally corresponds to proving ⊢_L ϕ → ψ, not ϕ ⊢_L ψ. At the semantic level, the latter only corresponds to the inclusion of the cores of the corresponding fuzzy sets (that is, the preservation of the highest membership value 1). Regarding this issue, the NM logic can be considered the closest to Gödel logic, since it also enjoys a global form of deduction theorem, but with n = 2 in the above deduction theorem expression, i.e. it holds that
Γ ∪ {ϕ} ⊢_NM ψ iff Γ ⊢_NM ϕ&ϕ → ψ
for all Γ, ϕ, ψ. Actually, NM is a genuine MTL-extension (i.e. it is not a BL-extension) that axiomatizes the calculus defined by the nilpotent minimum t-norm ∗_NM (see Section 2.1), and satisfies the following standard completeness property:
ϕ is provable in NM iff ϕ ∈ TAUT(∗_NM)
where x ∗_NM y = min(x, y) if x > 1 − y, and x ∗_NM y = 0 otherwise. This logic, introduced in [Esteva and Godo, 2001], has very nice logical properties besides the above global deduction theorem, such as having an involutive negation (like Łukasiewicz logic) and being complete for deductions from arbitrary theories (not only for theorems). Indeed, this logic has received much attention from the Chinese school led
Didier Dubois, Francesc Esteva, Lluís Godo and Henri Prade
Logic      Additional axiom schemata    References
SMTL       (PC)                         [Hájek, 2002]
ΠMTL       (C)                          [Hájek, 2002]
WCMTL      (WC)                         [Montagna et al., 2006]
IMTL       (Inv)                        [Esteva and Godo, 2001]
WNM        (WNM)                        [Esteva and Godo, 2001]
NM         (Inv) and (WNM)              [Esteva and Godo, 2001]
C_nMTL     (C_n)                        [Ciabattoni et al., 2002]
C_nIMTL    (Inv) and (C_n)              [Ciabattoni et al., 2002]
BL         (Div)                        [Hájek, 1998a]
SBL        (Div) and (PC)               [Esteva et al., 2000]
Ł          (Div) and (Inv)              [Hájek, 1998a]
Π          (Div) and (C)                [Hájek et al., 1996]
G          (Con)                        [Hájek, 1998a]
Table 3. Some axiomatic extensions of MTL obtained by adding the corresponding additional axiom schemata, and the references where they were introduced (in the context of fuzzy logics).
by G.J. Wang. It turns out that he independently introduced in [Wang, 1999; Wang, 2000] a logic in the language (¬, ∨, →), called L∗, with an algebraic semantics consisting of a variety of algebras called R_0-algebras. Pei later showed [Pei, 2003] that R_0-algebras and NM-algebras are in fact definitionally equivalent, and hence that the logics NM and L∗ are equivalent as well. A similar relation was also found for IMTL and a weaker version of L∗.
In the tradition of substructural logics, both BL and MTL are logics without contraction (see Ono and Komori’s seminal work [1985]). The weakest residuated logic without contraction is Höhle’s Monoidal Logic ML [Höhle, 1995], equivalent to FL_ew (Full Lambek calculus with exchange and weakening)⁹ introduced by Kowalski and Ono [2001] as well as to Adillon and Verdú’s IPC∗\c (Intuitionistic Propositional Calculus without contraction) [Adillon and Verdú, 2000]; it is the logic corresponding to the variety of (bounded, integral and commutative) residuated lattices. From it, MTL can be obtained by adding the prelinearity axiom, and from there a hierarchy of all t-norm-based fuzzy logics can be considered as different schematic extensions [Kowalski and Ono, 2001; Esteva et al., 2003a]. Figure 1 shows a diagram of this hierarchy with the main logics involved.
The issue of completeness of these and other t-norm based fuzzy logics extending MTL has been addressed in the literature. In fact, several kinds of algebraic completeness have been considered, depending on the number of premises. Here we
⁹ Also known as aMALL or aMAILL (affine Multiplicative Additive fragment of (propositional) Intuitionistic Linear Logic) or HBCK [Ono and Komori, 1985].
Figure 1. Hierarchy of some substructural and fuzzy logics.
will only refer to the completeness properties with respect to the usually intended semantics (standard semantics) on the real unit interval [0, 1]. For any axiomatic extension L of MTL and for every set of L-formulas Γ ∪ {ϕ}, we write Γ ⊨_L ϕ when, for every evaluation e of formulas on any standard L-algebra (L-chain on [0, 1]), one has e(ϕ) = 1 whenever e(ψ) = 1 for all ψ ∈ Γ. Then:
• L has the property of strong standard completeness, SSC for short, when for every set of formulae Γ, Γ ⊢_L ϕ iff Γ ⊨_L ϕ.
• L has the property of finite strong standard completeness, FSSC for short, when for every finite set of formulae Γ, Γ ⊢_L ϕ iff Γ ⊨_L ϕ.
• L has the property of (weak) standard completeness, SC for short, when for every formula ϕ, ⊢_L ϕ iff ⊨_L ϕ.
Of course, SSC implies FSSC, and FSSC implies SC. Table 4 gathers the different standard completeness results for some of the main t-norm based logics. Note that for some of these logics one may restrict oneself to checking completeness with respect to a single standard algebra defined by a distinguished t-norm, as in the cases of the G, Ł, Π and NM logics.
In the literature of t-norm based logics, one can find not only a number of axiomatic extensions of MTL but also extensions obtained by expanding the language with new connectives. Some of these expansions (like those with Baaz’s [1996] ∆ connective, with an involutive negation, with other conjunction or implication connectives, or with intermediate truth-constants) will be addressed later in Sections 3.3 and 3.4. All MTL extensions and most of its expansions defined elsewhere share the property of being complete with respect to a corresponding class of linearly
Logic   SC    FSSC   SSC   References
MTL     Yes   Yes    Yes   [Jenei and Montagna, 2002]
IMTL    Yes   Yes    Yes   [Esteva et al., 2002]
SMTL    Yes   Yes    Yes   [Esteva et al., 2002]
ΠMTL    Yes   Yes    No    [Horčík, 2005b; Horčík, 2007]
BL      Yes   Yes    No    [Hájek, 1998a; Cignoli et al., 2000]
SBL     Yes   Yes    No    [Esteva et al., 2000]
Ł       Yes   Yes    No    see [Hájek, 1998a]
Π       Yes   Yes    No    [Hájek, 1998a]
G       Yes   Yes    Yes   see [Hájek, 1998a]
WNM     Yes   Yes    Yes   [Esteva and Godo, 2001]
NM      Yes   Yes    Yes   [Esteva and Godo, 2001]
Table 4. Standard completeness properties for some axiomatic extensions of MTL and their references. For the negative results see [Montagna et al., 2006].
ordered algebras. To encompass all these logics and prove general results common to all of them, Cintula introduced the notion of core fuzzy logics¹⁰ in [Cintula, 2006]. Namely, a finitary logic L in a countable language is a core fuzzy logic if:
(i) L expands MTL;
(ii) L satisfies the congruence condition: for any ϕ, ψ, χ, ϕ ≡ ψ ⊢_L χ(ϕ) ≡ χ(ψ);
(iii) L satisfies the following local deduction theorem: Γ, ϕ ⊢_L ψ iff there is a natural number n such that Γ ⊢_L ϕ^n → ψ, where ϕ^n stands for ϕ & ⋯ & ϕ (n times).
Each core fuzzy logic L has a corresponding notion of L-algebra (defined as usual) and a corresponding class 𝕃 of L-algebras, and enjoys many interesting properties. Among them we can highlight the facts that L is algebraizable in the sense of Blok and Pigozzi [1989] with 𝕃 as its equivalent algebraic semantics, that 𝕃 is indeed a variety, and that every L-algebra is representable as a subdirect product of L-chains, and hence L is (strongly) complete with respect to the class of L-chains.
Predicate fuzzy logics
Predicate logic versions of the propositional t-norm based logics described above have also been defined and studied in the literature. Following [Hájek and Cintula, 2007] we provide below a general definition of the predicate logic L∀ for any core fuzzy logic L. As usual, the propositional language of L is enlarged with a set of predicates Pred, a set of object variables Var and a set of object constants Const, together
¹⁰ Actually, Cintula also defines the class of ∆-core fuzzy logics to capture all expansions having the ∆ connective (see Section 3.4), since they have slightly different properties.
with the two classical quantifiers ∀ and ∃. The notion of formula trivially generalizes, taking into account that now, if ϕ is a formula and x is an object variable, then (∀x)ϕ and (∃x)ϕ are formulas as well.
In first-order fuzzy logics it is usual to restrict the semantics to L-chains only. For each L-chain A, an L-interpretation for a predicate language PL = (Pred, Const) of L∀ is a structure
M = (M, (r_P)_{P∈Pred}, (m_c)_{c∈Const})
where M ≠ ∅, r_P : M^{ar(P)} → A and m_c ∈ M for each P ∈ Pred and c ∈ Const. For each evaluation of variables v : Var → M, the truth-value ‖ϕ‖^A_{M,v} of a formula (where v(x) ∈ M for each variable x) is defined inductively from
‖P(x, ⋯, c, ⋯)‖^A_{M,v} = r_P(v(x), ⋯, m_c, ⋯),
taking into account that the value commutes with connectives, and defining
‖(∀x)ϕ‖^A_{M,v} = inf{‖ϕ‖^A_{M,v′} | v(y) = v′(y) for all variables y except x}
‖(∃x)ϕ‖^A_{M,v} = sup{‖ϕ‖^A_{M,v′} | v(y) = v′(y) for all variables y except x}
if the infimum and supremum exist in A; otherwise the truth-value(s) remain undefined. A structure M is called A-safe if all infs and sups needed for the definition of the truth-value of any formula exist in A. Then, the truth-value of a formula ϕ in a safe A-structure M is just
‖ϕ‖^A_M = inf{‖ϕ‖^A_{M,v} | v : Var → M}.
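The quantifier clauses above can be illustrated on a small finite structure, where every infimum and supremum is a minimum or maximum and the structure is therefore trivially safe. The following Python sketch is only an illustration under stated assumptions: the domain, the fuzzy predicates r_P and r_Q, and the choice of the standard Gödel chain are all hypothetical.

```python
from fractions import Fraction

ONE = Fraction(1)


def godel_imp(x, y):
    """Residuum of the minimum t-norm (standard Goedel chain on [0, 1])."""
    return ONE if x <= y else y


# Hypothetical finite universe M and two unary fuzzy predicates on it.
DOMAIN = ["a", "b", "c"]
r_P = {"a": Fraction(1), "b": Fraction(7, 10), "c": Fraction(2, 5)}
r_Q = {"a": Fraction(9, 10), "b": Fraction(7, 10), "c": Fraction(1, 5)}


def forall(body):
    """Value of (forall x) phi: infimum of the values of phi over all x."""
    return min(body(m) for m in DOMAIN)


def exists(body):
    """Value of (exists x) phi: supremum of the values of phi over all x."""
    return max(body(m) for m in DOMAIN)


all_P = forall(lambda m: r_P[m])                            # (forall x) P(x)
some_Q = exists(lambda m: r_Q[m])                           # (exists x) Q(x)
all_P_imp_Q = forall(lambda m: godel_imp(r_P[m], r_Q[m]))   # (forall x)(P(x) -> Q(x))

# Instances of axiom (forall 1): (forall x) P(x) -> P(t) takes value 1 for every t.
axiom_forall_1 = all(godel_imp(all_P, r_P[m]) == 1 for m in DOMAIN)

print(all_P, some_Q, all_P_imp_Q, axiom_forall_1)
```

On infinite domains the min and max would have to be replaced by genuine infima and suprema, which is exactly where the safeness condition becomes relevant.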
When ‖ϕ‖^A_M = 1 for an A-safe structure M, the pair (M, A) is said to be a model for ϕ, written (M, A) ⊨ ϕ. The axioms for L∀ are the axioms resulting from those of L by substitution of propositional variables with formulas of PL, plus the following axioms on quantifiers (the same used in [Hájek, 1998a] when defining BL∀):
(∀1) (∀x)ϕ(x) → ϕ(t) (t substitutable for x in ϕ(x))
(∃1) ϕ(t) → (∃x)ϕ(x) (t substitutable for x in ϕ(x))
(∀2) (∀x)(ν → ϕ) → (ν → (∀x)ϕ) (x not free in ν)
(∃2) (∀x)(ϕ → ν) → ((∃x)ϕ → ν) (x not free in ν)
(∀3) (∀x)(ϕ ∨ ν) → ((∀x)ϕ ∨ ν) (x not free in ν)
The rules of inference of MTL∀ are modus ponens and generalization: from ϕ infer (∀x)ϕ. A completeness theorem for first-order BL was proven in [Hájek, 1998a], and the completeness theorems of other predicate fuzzy logics defined in the literature have been proven in the corresponding papers where the propositional logics were introduced. The following general formulation of completeness for predicate core and ∆-core fuzzy logics is from the paper [Hájek and Cintula, 2006]: for any (∆-)core fuzzy logic L over a predicate language PL, it holds that
T ⊢_{L∀} ϕ iff (M, A) ⊨ ϕ for each model (M, A) of T,
for any set of sentences T and formula ϕ of the predicate language PL. For some MTL axiomatic extensions L there are positive and negative results on standard completeness of the corresponding predicate logic L∀. For instance, for L being either Gödel, Nilpotent Minimum, MTL, SMTL or IMTL logic, the corresponding predicate logics G∀, NM∀, MTL∀, SMTL∀ and IMTL∀ have been proved to be standard complete for deductions from arbitrary theories (see [Hájek, 1998a; Esteva and Godo, 2001; Montagna and Ono, 2002]). However, the predicate logics Ł∀, Π∀, BL∀, SBL∀ and ΠMTL∀ are not standard complete [Hájek, 1998a; Montagna et al., 2006; Horčík, 2007]. For more details on predicate fuzzy logics, including complexity results and model theory, the interested reader is referred to [Hájek and Cintula, 2006] and to the excellent survey [Hájek and Cintula, 2007].
3.2 Proof theory for t-norm based fuzzy logics
From a proof-theoretic point of view, it is well known that Hilbert-style calculi are not a suitable basis for efficient proof search (by humans or computers). For the latter task one has to develop proof methods that are “analytic”; i.e., the proof search proceeds by step-wise decomposition of the formula to be proved. Sequent calculi, together with natural deduction systems, tableaux or resolution methods, yield suitable formalisms to deal with the above task. In this section we survey some analytic calculi that have been recently proposed for MTL (e.g. see [Gabbay et al., 2004] for a survey) and some of its extensions using hypersequents, a natural generalization of Gentzen’s sequents introduced by Avron [1991].
Cut-free sequent calculi provide suitable analytic proof methods. Sequents are well-known structures of the form ϕ_1, . . . , ϕ_n ⊢ ψ_1, . . . , ψ_m, which can be intuitively understood as “ϕ_1 and . . . and ϕ_n implies ψ_1 or . . . or ψ_m”. Sequent calculi have been defined for many logics; however, they have problems with fuzzy logics, namely in coping with the linear ordering of truth-values in [0, 1]. To overcome this problem when devising a sequent calculus for Gödel logic, Avron [1991] introduced a natural generalization of sequents called hypersequents. A hypersequent is an expression of the form
Γ_1 ⊢ ∆_1 | . . . | Γ_n ⊢ ∆_n
where for all i = 1, . . . , n, Γ_i ⊢ ∆_i is an ordinary sequent, called a component of the hypersequent. The intended interpretation of the symbol “|” is disjunctive, so the above hypersequent can be read as stating that one of the ordinary sequents Γ_i ⊢ ∆_i holds.
As in ordinary sequent calculi, in a hypersequent calculus there are axioms and rules, the latter divided into two groups: logical and structural rules. The logical
rules are essentially the same as those in sequent calculi; the only difference is the presence of dummy contexts G and H, called side hypersequents, which are used as variables for (possibly empty) hypersequents. The structural rules are divided into internal and external rules. The internal rules deal with formulas within components. If they are present, they are the usual weakening and contraction rules. The external rules manipulate whole components within a hypersequent. These are external weakening (EW) and external contraction (EC):
(EW) from H infer H | Γ ⊢ A
(EC) from H | Γ ⊢ A | Γ ⊢ A infer H | Γ ⊢ A
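As a rough illustration of how the external rules act on whole components, the following Python sketch encodes a hypersequent as a tuple of single-conclusion sequents. The encoding is an assumption made only for this example: formulas are plain strings, antecedents are tuples rather than multisets, and the names Sequent, external_weakening, external_contraction and show are hypothetical.

```python
from typing import NamedTuple, Tuple


class Sequent(NamedTuple):
    left: Tuple[str, ...]   # antecedent Gamma (simplified: a tuple, not a multiset)
    right: str              # succedent A


Hypersequent = Tuple[Sequent, ...]


def external_weakening(h: Hypersequent, component: Sequent) -> Hypersequent:
    """(EW): from H infer H | Gamma |- A."""
    return h + (component,)


def external_contraction(h: Hypersequent) -> Hypersequent:
    """(EC): from H | Gamma |- A | Gamma |- A infer H | Gamma |- A.

    In this sketch the duplicated component must occupy the last two positions.
    """
    if len(h) < 2 or h[-1] != h[-2]:
        raise ValueError("the last two components must be identical")
    return h[:-1]


def show(h: Hypersequent) -> str:
    """Render a hypersequent with '|' between components."""
    return " | ".join(f"{', '.join(s.left)} ⊢ {s.right}" for s in h)


h = (Sequent(("p",), "q"),)
c = Sequent(("r",), "s")
print(show(external_weakening(h, c)))          # p ⊢ q | r ⊢ s
print(show(external_contraction(h + (c, c))))  # p ⊢ q | r ⊢ s
```

A faithful implementation would treat both the antecedents and the hypersequent itself as multisets, so that the order of formulas and components does not matter.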
In hypersequent calculi it is possible to define further structural rules which simultaneously act on several components of one or more hypersequents. It is this type of rule which increases the expressive power of hypersequent calculi with respect to ordinary sequent calculi. An example of such a rule is Avron’s communication rule:
(com) from H | Π_1, Γ_1 ⊢ A and G | Π_2, Γ_2 ⊢ B infer H | G | Π_1, Π_2 ⊢ A | Γ_1, Γ_2 ⊢ B
Indeed, by adding (com) to the hypersequent calculus for intuitionistic logic one gets a cut-free calculus for Gödel logic [Avron, 1991]. Following this approach, a proof theory for MTL has been investigated in [Baaz et al., 2004], where an analytic hypersequent calculus has been introduced. This calculus, called HMTL, has been defined by adding the (com) rule to the hypersequent calculus for intuitionistic logic without contraction IPC∗\c (or equivalently Monoidal Logic ML or Full Lambek calculus with exchange and weakening FL_ew). More precisely, the axioms and rules of HMTL are those of Table 5. In fact, in [Baaz et al., 2004] it is shown that HMTL is sound and complete for MTL and that HMTL admits cut-elimination. Cut-free hypersequent calculi have also been obtained by Ciabattoni et al. [Ciabattoni et al., 2002] for IMTL and SMTL. Elegant hypersequent calculi have also been defined by Metcalfe, Olivetti and Gabbay for Łukasiewicz logic [Metcalfe et al., 2005] and Product logic [Metcalfe et al., 2004a], but using different rules for connectives. A generalization of both hypersequents and sequents-of-relations, called relational hypersequents, is introduced in [Ciabattoni et al., 2005]. Within this framework, the authors provide logical rules for Łukasiewicz, Gödel and Product logics that are uniform, i.e. identical for all three logics; purely syntactic calculi with very simple initial relational hypersequents are then obtained by introducing structural rules reflecting the characteristic properties of the particular logic. Such a framework is also used by Bova and Montagna in a very recent paper [Bova and Montagna, 2007] to provide a proof system for BL, a problem which has been open for a long time.
Finally, let us comment that other proof-search oriented calculi include a tableaux calculus for Łukasiewicz logic [Hähnle, 1994], decomposition proof systems for
Axioms:
A ⊢ A        0 ⊢ A
Cut rule:
(cut) from H | Γ ⊢ A and G | A, Γ′ ⊢ C infer H | G | Γ, Γ′ ⊢ C
Internal and external structural rules: (EC), (EW), (com), and
(iw) from H | Γ ⊢ C infer H | Γ, B ⊢ C
Multiplicative fragment:
(&, l) from H | Γ, A, B ⊢ C infer H | Γ, A&B ⊢ C
(&, r) from H | Γ ⊢ A and G | Γ′ ⊢ B infer H | G | Γ, Γ′ ⊢ A&B
(→, l) from G | Γ ⊢ A and H | Γ′, B ⊢ C infer G | H | Γ, Γ′, A → B ⊢ C
(→, r) from H | Γ, A ⊢ B infer H | Γ ⊢ A → B
Additive fragment:
(∧, l_i), i = 1, 2: from H | Γ, A_i ⊢ C infer H | Γ, A_1 ∧ A_2 ⊢ C
(∧, r) from G | Γ ⊢ A and H | Γ ⊢ B infer G | H | Γ ⊢ A ∧ B
(∨, l) from H | Γ, A ⊢ C and G | Γ, B ⊢ C infer H | G | Γ, A ∨ B ⊢ C
(∨, r_i), i = 1, 2: from H | Γ ⊢ A_i infer H | Γ ⊢ A_1 ∨ A_2
Table 5. Axioms and rules of the hypersequent calculus HMTL.
Gödel logic [Avron and Konikowska, 2001], and goal-directed systems for Łukasiewicz and Gödel logics [Metcalfe et al., 2004b; Metcalfe et al., 2003]. Also, a general approach is presented in [Aguzzoli, 2004], where a calculus for any logic based on a continuous t-norm is obtained via reductions to suitable finite-valued logics; it is, however, not very suitable for proof search due to the very high branching factor of the generated proof trees. For an exhaustive survey on proof theory for fuzzy logics, the interested reader is referred to the forthcoming monograph [Metcalfe et al., to appear].
3.3 Dealing with partial truth: Pavelka-style logics with truth-constants
The notion of deduction in t-norm based fuzzy logics is basically crisp, in the sense that it preserves the distinguished value 1. Indeed, a deduction T ⊢_L ψ in a complete logic L actually means that ψ necessarily takes the truth-value 1 in all evaluations that make all the formulas in T 1-true. However, from another point of view, more in line with Zadeh’s approximate reasoning, one can also consider t-norm based fuzzy logics as logics of comparative truth. In fact, the residuum ⇒ of a (left-continuous) t-norm ∗ satisfies the condition x ⇒ y = 1 if, and only if, x ≤ y for all x, y ∈ [0, 1]. This means that a formula ϕ → ψ is a logical consequence of a theory T, i.e. T ⊢_L ϕ → ψ, if the truth degree of ϕ is at most as high
as the truth degree of ψ in any interpretation which is a model of the theory T. Therefore, implications indeed implicitly capture a notion of comparative truth. This is fine, but in some situations one might also be interested in explicitly representing and reasoning with partial degrees of truth. For instance, in any logic L_∗ of a left-continuous t-norm ∗, any truth-evaluation e satisfying e(ϕ → ψ) ≥ α and e(ϕ) ≥ β necessarily satisfies e(ψ) ≥ α ∗ β as well. Therefore, having this kind of graded (semantical) form of modus ponens inside the logic (as many applied fuzzy systems do [Dubois et al., 1991c]) may seem useful when trying to devise mechanisms for allowing deductions from partially true propositions.
One convenient and elegant way to allow for an explicit treatment of degrees of truth is by introducing truth-constants into the language. In fact, if one introduces in the language new constant symbols ᾱ for suitable values α ∈ [0, 1] and stipulates that e(ᾱ) = α for all truth-evaluations, then a formula of the kind ᾱ → ϕ becomes 1-true under any evaluation e whenever α ≤ e(ϕ).
This approach actually goes back to Pavelka [1979], who built a propositional many-valued logical system PL which turned out to be equivalent to the expansion of Łukasiewicz Logic obtained by adding to the language a truth-constant r̄ for each real r ∈ [0, 1], together with a number of additional axioms. The semantics is the same as that of Łukasiewicz logic, just extending the evaluations e of propositional variables in [0, 1] to truth-constants by requiring e(r̄) = r for all r ∈ [0, 1]. Although the resulting logic is not strong standard complete (SSC in the sense defined in Section 3.1) with respect to that intended semantics, Pavelka proved that his logic is complete in a different sense.
Namely, he defined the truth degree of a formula ϕ in a theory T as
‖ϕ‖_T = inf{e(ϕ) | e is a PL-evaluation model of T},
and the provability degree of ϕ in T as
|ϕ|_T = sup{r ∈ [0, 1] | T ⊢_PL r̄ → ϕ},
and proved that these two degrees coincide. This kind of completeness is usually known as Pavelka-style completeness, and strongly relies on the continuity of the Łukasiewicz truth functions. Note that ‖ϕ‖_T = 1 is not equivalent to T ⊢_PL ϕ, but only to T ⊢_PL r̄ → ϕ for all r < 1. Novák extended Pavelka’s approach to Łukasiewicz first order logic [Novák, 1990a; Novák, 1990b].
Later, Hájek [1998a] showed that Pavelka’s logic PL could be significantly simplified while keeping the completeness results. Indeed he showed that it is enough to extend the language only by a countable number of truth-constants, one for each rational in [0, 1], and to add to the logic the two following additional axiom schemata, called book-keeping axioms:
r̄ & s̄ ↔ \overline{r ∗_Ł s}
r̄ → s̄ ↔ \overline{r ⇒_Ł s}
for all r, s ∈ [0, 1] ∩ Q, where ∗_Ł and ⇒_Ł are the Łukasiewicz t-norm and its residuum respectively. He called this new system Rational Pavelka Logic, RPL for short. Moreover, he proved that RPL is strong standard complete for finite theories (FSSC in the usual sense). He also defined the logic RPL∀, the extension of RPL to first order, and showed that RPL∀ enjoys the same Pavelka-style completeness.
Similar rational expansions for other t-norm based fuzzy logics can be analogously defined, but unfortunately Pavelka-style completeness cannot be obtained, since Łukasiewicz Logic is the only fuzzy logic whose truth-functions (conjunction and implication) are continuous functions. However, several expansions with truth-constants of fuzzy logics different from Łukasiewicz have been studied, mainly related to the other two outstanding continuous t-norm based logics, namely Gödel and Product logic. We may cite [Hájek, 1998a], which studies an expansion of G∆ (the expansion of Gödel Logic G with Baaz’s projection connective ∆) with a finite number of rational truth-constants, and [Esteva et al., 2000], where the authors define logical systems obtained by adding (rational) truth-constants to G∼ (Gödel Logic with an involutive negation) and to Π (Product Logic) and Π∼ (Product Logic with an involutive negation). In the case of the rational expansions of Π and Π∼, an infinitary inference rule (from {ϕ → r̄ : r ∈ Q ∩ (0, 1]} infer ϕ → 0̄) is introduced in order to get Pavelka-style completeness. Rational truth-constants have also been considered in some stronger logics (see Section 3.4), such as the logic ŁΠ½ [Esteva et al., 2001b], a logic that combines the connectives from both Łukasiewicz and Product logics plus the truth-constant 1/2, and the logic PŁ [Horčík and Cintula, 2004], a logic which combines Łukasiewicz Logic connectives plus the Product Logic conjunction (but not implication), as well as in some closely related logics.
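Several semantic facts used in this section for Łukasiewicz logic can be checked numerically. The Python sketch below is an added illustration under stated assumptions (exact rational arithmetic, a finite grid of twelfths, and hypothetical helper names): it tests that a formula r̄ → ϕ is 1-true exactly when r ≤ e(ϕ), that rational values are closed under the Łukasiewicz operations (so the right-hand sides of the book-keeping axioms name constants of the language), and that graded modus ponens holds.

```python
from fractions import Fraction
from itertools import product

ONE, ZERO = Fraction(1), Fraction(0)


def luk_and(x, y):
    """Lukasiewicz t-norm."""
    return max(ZERO, x + y - 1)


def luk_imp(x, y):
    """Residuum of the Lukasiewicz t-norm."""
    return min(ONE, 1 - x + y)


GRID = [Fraction(i, 12) for i in range(13)]  # assumed sample of rational truth values
PAIRS = list(product(GRID, repeat=2))

# (1) A truth-constant for r implies phi with value 1 exactly when r <= e(phi).
lower_bound_ok = all((luk_imp(r, v) == 1) == (r <= v) for r, v in PAIRS)

# (2) Rationals are closed under both operations.
closed = all(isinstance(luk_and(r, s), Fraction) and isinstance(luk_imp(r, s), Fraction)
             for r, s in PAIRS)

# (3) Graded modus ponens: with alpha = e(phi -> psi) and beta = e(phi),
#     e(psi) >= alpha * beta.
graded_mp = all(luk_and(luk_imp(x, y), x) <= y for x, y in PAIRS)


def best_grid_lower_bound(value):
    """Largest grid constant r such that r -> phi is 1-true when e(phi) = value."""
    return max(r for r in GRID if luk_imp(r, value) == 1)


print(lower_bound_ok, closed, graded_mp, best_grid_lower_bound(Fraction(1, 5)))
```

The last helper approximates from below, on the grid only, the supremum appearing in the definition of the provability degree; it is not an implementation of provability in RPL.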
Following this line, Cintula gives in [Cintula, 2005c] a definition of what he calls the Pavelka-style extension of a particular fuzzy logic. He considers the Pavelka-style extensions of the most popular fuzzy logics, and for each one of them he defines an axiomatic system with infinitary rules (to overcome discontinuities, as in the case of Π explained above) which is proved to be Pavelka-style complete. Moreover, he also considers the first order versions of these extensions and provides necessary conditions for them to satisfy Pavelka-style completeness.
Recently, a systematic approach based on traditional algebraic semantics has been considered to study completeness results (in the usual sense) for expansions of t-norm based logics with truth-constants. Indeed, as already mentioned, only the case of Łukasiewicz logic was known according to [Hájek, 1998a]. Using this algebraic approach, the expansions of the other two distinguished fuzzy logics, Gödel and Product logics, with countable sets of truth-constants have been reported in [Esteva et al., 2006] and in [Savický et al., 2006] respectively. Following [Esteva et al., 2007; Esteva et al., 2007b], we briefly describe in the rest of this section the main ideas and results of this general algebraic approach.
If L_∗ is a logic of a (left-continuous) t-norm ∗, and 𝒞 = ⟨C, ∗, ⇒, min, max, 0, 1⟩ is a countable subalgebra of the standard L_∗-algebra [0, 1]_∗, then the logic L_∗(𝒞) is defined as follows:
(i) the language of L_∗(𝒞) is the one of L_∗ expanded with a new propositional variable r̄ for each r ∈ C;
(ii) the axioms of L_∗(𝒞) are those of L_∗ plus the book-keeping axioms
r̄ & s̄ ↔ \overline{r ∗ s}
r̄ → s̄ ↔ \overline{r ⇒ s}
for each r, s ∈ C.
The algebraic counterpart of the L_∗(𝒞) logic consists of the class of L_∗(𝒞)-algebras, defined as structures A = ⟨A, &, →, ∧, ∨, {r̄^A : r ∈ C}⟩ such that:
1. ⟨A, &, →, ∧, ∨, 0̄^A, 1̄^A⟩ is an L_∗-algebra, and
2. for every r, s ∈ C the following identities hold:
r̄^A & s̄^A = \overline{r ∗ s}^A
r̄^A → s̄^A = \overline{r ⇒ s}^A.
An L_∗(𝒞)-chain defined over the real unit interval [0, 1] is called standard. Among the standard chains, there is one which reflects the intended semantics, the so-called canonical L_∗(𝒞)-chain [0, 1]_{L∗(𝒞)} = ⟨[0, 1], ∗, ⇒, min, max, {r : r ∈ C}⟩, i.e. the one where the truth-constants are interpreted by themselves. Note that, for a logic L_∗(𝒞), there can exist multiple standard chains, as soon as there exist different ways of interpreting the truth-constants on [0, 1] respecting the book-keeping axioms. For instance, for the case of Gödel logic, when ∗ = min and C = [0, 1] ∩ Q, the algebra A = ⟨[0, 1], &, →, ∧, ∨, {r̄^A : r ∈ C}⟩ where
r̄^A = 1 if r ≥ α, and r̄^A = 0 otherwise,
is a standard L_∗(𝒞)-algebra for any α > 0.
Since the additional symbols added to the language are 0-ary, L_∗(𝒞) is also an algebraizable logic and its equivalent algebraic semantics is the variety of L_∗(𝒞)-algebras. This, together with the fact that L_∗(𝒞)-algebras are representable as a subdirect product of L_∗(𝒞)-chains, leads to the following general completeness result of L_∗(𝒞) with respect to the class of L_∗(𝒞)-chains: for any set Γ ∪ {ϕ} of L_∗(𝒞)-formulas,
Γ ⊢_{L∗(𝒞)} ϕ if, and only if, for each L_∗(𝒞)-chain A, e(ϕ) = 1̄^A for all A-evaluations e that are models of Γ.
The issue of studying when a logic L_∗(𝒞) is also complete with respect to the class of standard L_∗(𝒞)-chains (called standard completeness) or with respect to the canonical L_∗(𝒞)-chain (called canonical completeness) has been addressed in the
literature for some logics L_∗. Hájek already proved in [Hájek, 1998a] the canonical completeness of the expansion of Łukasiewicz logic with rational truth-constants for finite theories. More recently, the expansions of Gödel logic (and of some t-norm based logics related to the nilpotent minimum t-norm) and of Product logic with countable sets of truth-constants have been proved to be canonical complete for theorems in [Esteva et al., 2006] and in [Savický et al., 2006] respectively. A rather exhaustive description of completeness results for the logics L_∗(𝒞) can be found in [Esteva et al., 2007; Esteva et al., 2007b], and of complexity results in [Hájek, 2006b]. One negative result for many of these logics (with the exception of Łukasiewicz logic) is that they are not canonical complete for deductions from non-empty theories. However, such canonical completeness can be recovered in some cases (see e.g. [Esteva et al., 2007]) when one considers the fragment of formulas of the kind r̄ → ϕ, where ϕ is a formula without additional truth-constants. Actually, this kind of formula, under the notation of a pair (r, ϕ), has been extensively considered in other frameworks for reasoning with partial degrees of truth, such as Novák’s formalism of fuzzy logic with evaluated syntax based on Łukasiewicz Logic (see e.g. [Novák et al., 1999]), Gerla’s framework of abstract fuzzy logics [Gerla, 2001] or fuzzy logic programming (see e.g. [Vojtáš, 2001]).
3.4 More complex residuated logics
Other interesting kinds of fuzzy logics are those expansions obtained by joining the logics of different t-norms or by adding specific t-norm related connectives to certain logics. In this section we describe some of them, in particular expansions with Baaz’s ∆ connective, expansions with an involutive negation, and the logics ŁΠ, ŁΠ½ and PŁ combining connectives from Łukasiewicz and Product logics.
Logics with ∆
Here below we describe L_∆, the expansion of an axiomatic extension L of MTL with Baaz’s ∆ connective. The intended semantics for the unary connective ∆, introduced in [Baaz, 1996], is that ∆ϕ captures the crisp part of a fuzzy proposition ϕ (similar to the core of a fuzzy set). This is done by extending the truth-evaluations e on formulas with the additional requirement:
e(∆ϕ) = 1 if e(ϕ) = 1, and e(∆ϕ) = 0 otherwise.
Therefore, for any formula ϕ, ∆ϕ behaves as a classical (two-valued) formula. At the syntactical level, the axioms and rules of L_∆ are those of L plus the following additional set of axioms:
(∆1) ∆ϕ ∨ ¬∆ϕ,
(∆2) ∆(ϕ ∨ ψ) → (∆ϕ ∨ ∆ψ),
(∆3) ∆ϕ → ϕ,
(∆4) ∆ϕ → ∆∆ϕ,
(∆5) ∆(ϕ → ψ) → (∆ϕ → ∆ψ),
and the Necessitation rule for ∆: from ϕ derive ∆ϕ¹¹. The notion of proof in L_∆ is the usual one. Notice that in general the local deduction theorem for MTL and its extensions L fails for the logics L_∆. Indeed, ϕ ⊢_{L∆} ∆ϕ, but it may be the case that ⊬_{L∆} ϕ^n → ∆ϕ for each n. Take, for example, a strict continuous t-norm ∗, hence isomorphic to the product. Then for all 0 < x < 1, x^n > 0, whereas ∆ maps x to 0, so no formula ϕ^n → ∆ϕ is a tautology. However, every logic L_∆ satisfies another form of deduction theorem, known as the ∆-deduction theorem [Hájek, 1998a]:
Γ ∪ {ϕ} ⊢ ψ iff Γ ⊢ ∆ϕ → ψ.
The algebraic semantics of L_∆ is given by L_∆-algebras, i.e. L-algebras expanded with a unary operator δ satisfying the following conditions for all x, y:
(δ1) δ(x) ∨ ¬δ(x) = 1
(δ2) δ(x ∨ y) ≤ (δ(x) ∨ δ(y))
(δ3) δ(x) ≤ x
(δ4) δ(x) ≤ δ(δ(x))
(δ5) δ(x ⇒ y) ≤ (δ(x) ⇒ δ(y))
(δ6) δ(1) = 1
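The conditions (δ1) to (δ6) can be tested numerically on a concrete chain. The following Python sketch is an added illustration under stated assumptions: it uses the standard product algebra on [0, 1], exact rationals sampled on a finite grid, and hypothetical helper names; it also exhibits the counterexample to the local deduction theorem mentioned above.

```python
from fractions import Fraction
from itertools import product

ONE, ZERO = Fraction(1), Fraction(0)


def t(x, y):
    """Product t-norm (a strict continuous t-norm)."""
    return x * y


def res(x, y):
    """Residuum of the product t-norm."""
    return ONE if x <= y else y / x


def neg(x):
    """Negation defined as x => 0."""
    return res(x, ZERO)


def delta(x):
    """Interpretation of the Delta connective on a chain."""
    return ONE if x == 1 else ZERO


def power(x, n):
    """n-fold t-norm power x * ... * x."""
    result = ONE
    for _ in range(n):
        result = t(result, x)
    return result


GRID = [Fraction(i, 8) for i in range(9)]  # assumed finite sample of [0, 1]
PAIRS = list(product(GRID, repeat=2))

conditions = {
    "d1": all(max(delta(x), neg(delta(x))) == 1 for x in GRID),
    "d2": all(delta(max(x, y)) <= max(delta(x), delta(y)) for x, y in PAIRS),
    "d3": all(delta(x) <= x for x in GRID),
    "d4": all(delta(x) <= delta(delta(x)) for x in GRID),
    "d5": all(delta(res(x, y)) <= res(delta(x), delta(y)) for x, y in PAIRS),
    "d6": delta(ONE) == 1,
}

# With e(phi) = 1/2, the formula phi^n -> Delta(phi) takes value 0 for every n tried.
half = Fraction(1, 2)
local_deduction_fails = all(res(power(half, n), delta(half)) == 0 for n in range(1, 20))

print(conditions, local_deduction_fails)
```

The grid check only samples the chain, so it illustrates the conditions rather than proving them.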
Notice that in any linearly ordered L_∆-algebra, δ(x) = 1 if x = 1, and δ(x) = 0 otherwise. The notions of evaluation, model and tautology are obviously adapted from the above case. Then the following is the general completeness result for L_∆ logics [Hájek, 1998a; Esteva and Godo, 2001]: for each set of L_∆-formulas Γ and each L_∆-formula ϕ the following are equivalent:
1. Γ ⊢_{L∆} ϕ,
2. for each L_∆-chain A and each A-model e of Γ, e(ϕ) = 1,
3. for each L_∆-algebra A and each A-model e of Γ, e(ϕ) = 1.
Standard completeness for L_∆ logics has been proved in the literature whenever the logic L has been shown to be standard complete, as is the case for all the logics listed in Table 3.
Logics with an involutive negation
Basic strict fuzzy logic SBL was introduced in [Esteva et al., 2000] as the axiomatic extension of BL by the single axiom (PC) ¬(ϕ ∧ ¬ϕ), and SMTL in an analogous way as an extension of MTL [Hájek, 2002]. Note that Gödel logic G and Product logic Π are extensions of SBL (and
¹¹ Note that this rule holds because syntactic derivation only preserves the maximal truth value, contrary to Zadeh’s entailment principle.
thus of SMTL as well). In any extension of SMTL, the presence of the axiom (PC) forces the negation ¬ to be strict, i.e. for any evaluation e that is a model of (PC) one has
e(¬ϕ) = 1 if e(ϕ) = 0, and e(¬ϕ) = 0 otherwise.
This kind of “two-valued” negation is also known in the literature as Gödel negation. In the logics with Gödel negation, one cannot define a meaningful (strong) disjunction ∨ by duality from the conjunction &, i.e. by defining ϕ∨ψ as ¬(¬ϕ&¬ψ), nor a corresponding S-implication ϕ →_S ψ as ¬ϕ∨ψ. It seems therefore natural to introduce in these logics an involutive negation ∼ as an extra connective. To do so, and noticing that a suitable combination of both kinds of negations behaves like the ∆ connective, i.e.
e(¬∼ϕ) = 1 if e(ϕ) = 1, and e(¬∼ϕ) = 0 otherwise, that is, e(¬∼ϕ) = e(∆ϕ),
the logic SBL∼, where the ∆ connective is in fact a derivable connective (∆ϕ is ¬∼ϕ), was introduced in [Esteva et al., 2000] as an axiomatic extension of the logic SBL∆ by the following two axioms:
(∼1) ∼∼ϕ ≡ ϕ
(∼2) ∆(ϕ → ψ) → (∼ψ → ∼ϕ)
Axiom (∼1) forces the negation ∼ to be involutive and axiom (∼2) forces it to be order reversing. Similar extensions have been defined for Gödel logic (G∼), Product logic (Π∼) and SMTL (SMTL∼). Standard completeness for these logics was proved but, interestingly enough, these two axioms are not enough to show completeness of SBL∼ (Π∼, SMTL∼, resp.) with respect to SBL-algebras (Π-algebras, SMTL-algebras, resp.) on [0, 1] expanded only by the standard negation n(x) = 1 − x: one needs to consider all possible involutive negations in [0, 1], even though all of them are isomorphic. This was noticed in [Esteva et al., 2000], and has been studied in depth by Cintula et al. [2006], where the expansions of SBL with an involutive negation are systematically investigated. The addition of an involutive negation in the more general framework of MTL has also been addressed by Flaminio and Marchioni [2006].
The logic ŁΠ½. ŁΠ½ is a logic “putting Łukasiewicz and Product logics together”, introduced and studied in [Esteva and Godo, 1999; Montagna, 2000; Esteva et al., 2001b] and further developed by Cintula in [2001a; 2001b; 2003; 2005b]. The language of the ŁΠ logic is built in the usual way from a countable set of propositional variables, three binary connectives →L (Łukasiewicz implication), ⊙ (Product conjunction) and →Π (Product implication), and the truth constant 0̄. A truth-evaluation is a mapping e that assigns to every propositional variable a real number from the unit interval [0, 1] and extends to all formulas as follows:
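$$\begin{aligned}
e(\bar{0}) &= 0, \\
e(\varphi \odot \psi) &= e(\varphi) \cdot e(\psi), \\
e(\varphi \to_L \psi) &= \min(1 - e(\varphi) + e(\psi),\, 1), \\
e(\varphi \to_\Pi \psi) &= \begin{cases} 1, & \text{if } e(\varphi) \le e(\psi) \\ e(\psi)/e(\varphi), & \text{otherwise.} \end{cases}
\end{aligned}$$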
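The truth constant 1̄ is defined as ϕ →L ϕ. In this way we have e(1̄) = 1 for any truth-evaluation e. Moreover, many other connectives can be defined from those introduced above:
¬Lϕ is ϕ →L 0̄, ¬Πϕ is ϕ →Π 0̄,
ϕ ∧ ψ is ϕ & (ϕ →L ψ), ϕ ∨ ψ is ¬L(¬Lϕ ∧ ¬Lψ),
ϕ ⊕ ψ is ¬Lϕ →L ψ, ϕ & ψ is ¬L(¬Lϕ ⊕ ¬Lψ),
ϕ ⊖ ψ is ϕ & ¬Lψ, ϕ ≡ ψ is (ϕ →L ψ) & (ψ →L ϕ),
∆ϕ is ¬Π¬Lϕ, ∇ϕ is ¬Π¬Πϕ,
with the following interpretations:
$$\begin{aligned}
e(\neg_L \varphi) &= 1 - e(\varphi), &
e(\neg_\Pi \varphi) &= \begin{cases} 1, & \text{if } e(\varphi) = 0 \\ 0, & \text{otherwise,} \end{cases} \\
e(\varphi \wedge \psi) &= \min(e(\varphi), e(\psi)), &
e(\varphi \vee \psi) &= \max(e(\varphi), e(\psi)), \\
e(\varphi \oplus \psi) &= \min(1, e(\varphi) + e(\psi)), &
e(\varphi \,\&\, \psi) &= \max(0, e(\varphi) + e(\psi) - 1), \\
e(\varphi \ominus \psi) &= \max(0, e(\varphi) - e(\psi)), &
e(\varphi \equiv \psi) &= 1 - |e(\varphi) - e(\psi)|, \\
e(\Delta \varphi) &= \begin{cases} 1, & \text{if } e(\varphi) = 1 \\ 0, & \text{otherwise,} \end{cases} &
e(\nabla \varphi) &= \begin{cases} 1, & \text{if } e(\varphi) > 0 \\ 0, & \text{otherwise.} \end{cases}
\end{aligned}$$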
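The derived interpretations above can be checked mechanically against the definitions of the connectives. The following is a minimal sketch, assuming exact rational truth values; the function names (imp_l, imp_p, odot, and so on) are our own illustrative choices and do not come from the cited literature.

```python
from fractions import Fraction

ZERO = Fraction(0)  # value of the truth constant 0-bar

# Primitive connectives of the logic on [0, 1] (hypothetical names)
def imp_l(x, y):
    """Lukasiewicz implication: min(1 - x + y, 1)."""
    return min(1 - x + y, Fraction(1))

def imp_p(x, y):
    """Product implication: 1 if x <= y, else y / x."""
    return Fraction(1) if x <= y else y / x

def odot(x, y):
    """Product conjunction."""
    return x * y

# Derived connectives, built exactly as in the definitions in the text
def neg_l(x):      return imp_l(x, ZERO)
def neg_p(x):      return imp_p(x, ZERO)
def oplus(x, y):   return imp_l(neg_l(x), y)
def luk_and(x, y): return neg_l(oplus(neg_l(x), neg_l(y)))
def ominus(x, y):  return luk_and(x, neg_l(y))
def meet(x, y):    return luk_and(x, imp_l(x, y))
def join(x, y):    return neg_l(meet(neg_l(x), neg_l(y)))
def equiv(x, y):   return luk_and(imp_l(x, y), imp_l(y, x))
def delta(x):      return neg_p(neg_l(x))
def nabla(x):      return neg_p(neg_p(x))

def closed_forms_hold(grid):
    """Compare each derived connective with its stated closed form on a grid."""
    for x in grid:
        if neg_l(x) != 1 - x:
            return False
        if delta(x) != (1 if x == 1 else 0) or nabla(x) != (1 if x > 0 else 0):
            return False
        for y in grid:
            if (meet(x, y) != min(x, y) or join(x, y) != max(x, y)
                    or oplus(x, y) != min(1, x + y)
                    or luk_and(x, y) != max(0, x + y - 1)
                    or ominus(x, y) != max(0, x - y)
                    or equiv(x, y) != 1 - abs(x - y)):
                return False
    return True
```

Calling closed_forms_hold on a grid such as [Fraction(k, 8) for k in range(9)] compares all the stated interpretations at once; exact rationals are used so that the equalities are not blurred by floating-point rounding.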
The logical system ŁΠ is the logic whose axioms are¹²:
(Ł) Axioms of Łukasiewicz logic (for →L, &, 0̄);
(Π) Axioms of product logic (for →Π, ⊙, 0̄);
(¬) ¬Πϕ →L ¬Lϕ
(∆) ∆(ϕ →L ψ) ≡L ∆(ϕ →Π ψ)
(ŁΠ5) ϕ ⊙ (ψ ⊖ χ) ≡L (ϕ ⊙ ψ) ⊖ (ϕ ⊙ χ)
and whose inference rules are modus ponens (for →L) and necessitation for ∆: from ϕ infer ∆ϕ. The logic ŁΠ½ is then obtained from ŁΠ by adding a truth constant ½ together with the axiom:
(ŁΠ½) ½ ≡ ¬L ½
Obviously, a truth-evaluation e for ŁΠ is easily extended to an evaluation for ŁΠ½ by further requiring e(½) = ½. The notion of proof in ŁΠ½ is as usual, and the logic is strongly complete for finite theories with respect to the given semantics. That is, if T is a finite set of formulas, then T ⊢ŁΠ½ ϕ iff e(ϕ) = 1 for any ŁΠ½-evaluation e that is a model of T.
12 This definition, proposed in [Cintula, 2003], is actually a simplified version of the original definition of ŁΠ given in [Esteva et al., 2001b].
It is interesting to remark that ŁΠ and ŁΠ½ are indeed very powerful logics. Indeed, ŁΠ conservatively extends Łukasiewicz, Product and Gödel logics (note that Gödel implication →G is also definable by putting ϕ →G ψ as ∆(ϕ → ψ) ∨ ψ). Moreover, as shown in [Esteva et al., 2001b], rational truth constants r̄ (for each rational r ∈ [0, 1]) are definable in ŁΠ½ from the truth constant ½ and the connectives. Therefore, in the language of ŁΠ½ there is a truth-constant for each rational in [0, 1], and due to the completeness of ŁΠ½, the following book-keeping axioms for rational truth constants are provable:
$$\begin{array}{llll}
\text{(RŁΠ1)} & \neg_L \bar{r} \equiv \overline{1 - r}, & \text{(RŁΠ2)} & \bar{r} \to_L \bar{s} \equiv \overline{\min(1, 1 - r + s)}, \\
\text{(RŁΠ3)} & \bar{r} \odot \bar{s} \equiv \overline{r \cdot s}, & \text{(RŁΠ4)} & \bar{r} \to_\Pi \bar{s} \equiv \overline{r \Rightarrow_P s},
\end{array}$$
where r ⇒P s = 1 if r ≤ s, and r ⇒P s = s/r otherwise. Moreover, Cintula [2003] shows (see also [Marchioni and Montagna, 2006]) that, for each continuous t-norm ∗ that is an ordinal sum of finitely many copies of the Łukasiewicz, product and minimum t-norms, L∗ (the logic of the t-norm ∗) is interpretable in ŁΠ½. Indeed, he defines a syntactical translation of L∗-formulas into ŁΠ½-formulas, say ϕ ↦ ϕ′, such that L∗ proves ϕ if and only if ŁΠ½ proves ϕ′. Connections between the logics ŁΠ and Π∼ (the extension of product logic Π with an involutive negation, see above) have also been investigated in [Cintula, 2001b]. The predicate ŁΠ and ŁΠ½ logics have been studied in [Cintula, 2001a], showing in particular that they conservatively extend Gödel predicate logic.
To conclude, let us remark that the so-called ŁΠ½-algebras, the algebraic counterpart of the logic ŁΠ½, are strongly connected with ordered fields. Indeed, Montagna has shown [Montagna, 2000; Montagna, 2001], among other things, that ŁΠ½-algebras are substructures of fields extending the field of rational numbers. Moreover, as recently shown in [Marchioni and Montagna, 2006; Marchioni and Montagna, to appear], the theory of real closed fields is faithfully interpretable in ŁΠ½. See also [Montagna and Panti, 2001; Montagna, 2005] for further deep algebraic results regarding ŁΠ-algebras.
The logic PL. Starting from algebraic investigations on MV-algebras with additional operators by Montagna [2001; 2005], the logic PL, for Product-Łukasiewicz, was introduced by Horčík and Cintula [2004]. Basically, PL is an expansion of Łukasiewicz logic by means of the product conjunction, and its language is built up from three binary connectives, & (Łukasiewicz conjunction), → (Łukasiewicz implication) and ⊙ (Product conjunction), and the truth constant 0̄.
The axioms of PL are those of Łukasiewicz logic, plus the following additional axioms:
(PL1) ϕ ⊙ (ψ & (χ → 0̄)) ↔ (ϕ ⊙ ψ) & ((ϕ ⊙ χ) → 0̄),
(PL2) ϕ ⊙ (χ ⊙ ψ) ↔ (ϕ ⊙ ψ) ⊙ χ,
(PL3) ϕ → ϕ ⊙ 1̄,
(PL4) ϕ ⊙ ψ → ϕ,
(PL5) ϕ ⊙ ψ → ψ ⊙ ϕ.
They also consider the logic PL′, the extension of PL by the deduction rule:
(ZD) from ¬(ϕ ⊙ ϕ), derive ¬ϕ.
PL′ is shown to be standard complete with respect to the standard Łukasiewicz algebra expanded with the product (of reals) operation (see also [Montagna, 2001]), hence w.r.t. the intended semantics, while PL is not. In fact, it is the inference rule (ZD) that makes the difference, forcing the interpretation of the product connective ⊙ to have no zero divisors. At the same time, in contrast to all the other algebraic semantics surveyed so far, the class of algebras associated with PL′ does not form a variety but a quasi-variety. In [Horčík and Cintula, 2004], the authors also study expansions of these logics by means of Baaz’s ∆ connective and by rational truth constants, as well as their predicate versions.
A logic closely related to these systems is Takeuti and Titani’s logic [1992]. It is a predicate fuzzy logic based on Gentzen’s system LJ of intuitionistic predicate logic. The connectives used by this logic are just the connectives of the predicate PL logic with a subset of rational truth-constants, but Takeuti and Titani’s logic has two additional deduction rules and 46 axioms, and it is sound and complete w.r.t. the standard PL∆-algebra (cf. [Takeuti and Titani, 1992, Th. 1.4.3]). In [Horčík and Cintula, 2004] it is shown that it exactly corresponds to the expansion of predicate PL∆ logic with truth-constants of the form k/2^n, for natural numbers k and n.
3.5 Further issues on residuated fuzzy logics
The aim of the preceding subsections has been to survey the main advances in the logical formalization of residuated many-valued systems underlying fuzzy logic in the narrow sense. This field has developed greatly in the last 10-15 years, with many scholars from different disciplines such as algebra, logic, computer science and artificial intelligence joining efforts. Hence, our presentation is far from exhaustive. Many aspects and contributions have not been covered for lack of space, although they deserve comment. At the risk of being incomplete again, we briefly go through some of them in the rest of this subsection.
A. Other existing expansions and fragments of MTL and related logics
Hoop fuzzy logics: In [Esteva et al., 2003b] the positive (falsehood-free) fragments of BL and its main extensions (propositional and predicate calculi) are axiomatized and related to the 0-free subreducts of the corresponding algebras, which turn out to be a special class of algebraic structures known as hoops (hence the name hoop fuzzy logics). A similar study is carried out for MTL and its extensions, introducing the related algebraic structures, which are called semihoops. Issues of completeness, conservativeness and complexity are also addressed. The class of the so-called basic hoops, hoops corresponding to BHL, the hoop variant of BL,
has an important role in the algebraic study of linearly ordered chains [Aglianò et al., to appear].
Rational Łukasiewicz logic and DMV-algebras: A peculiar kind of expansion which allows the representation of rational truth-constants is given by Rational Łukasiewicz logic RL, introduced by Gerla [2001b]. RL is obtained by extending Łukasiewicz logic by the unary connectives δn, for each n ∈ N, plus the following axioms:
$$\text{(D1)} \quad \underbrace{\delta_n\varphi \oplus \cdots \oplus \delta_n\varphi}_{n} \leftrightarrow \varphi$$
$$\text{(D2)} \quad \neg\delta_n\varphi \oplus \neg(\underbrace{\delta_n\varphi \oplus \cdots \oplus \delta_n\varphi}_{n-1})$$
where ⊕ is Łukasiewicz strong disjunction. The algebraic semantics for RL is given by DMV-algebras (divisible MV-algebras). A Łukasiewicz logic evaluation e into the real unit interval is extended to the connectives δn by e(δnϕ) = e(ϕ)/n. In this way one can define in RL all rationals in [0, 1]. RL was shown to enjoy both finite strong standard completeness and Pavelka-style completeness (see [Gerla, 2001b] for all details). In particular, Hájek’s Rational Pavelka logic can be faithfully interpreted in RL.
Fuzzy logics with equality: The question of introducing the (fuzzy) equality predicate in different systems of fuzzy logic has been dealt with in several papers, see e.g. [Liau and Lin, 1988; Bělohlávek, 2002c; Hájek, 1998a; Novák et al., 1999; Novák, 2004; Bělohlávek and Vychodil, 2005]. Actually, in most of these works, fuzzy equality is a generalization of classical equality because it is subject to axioms which are formally the same as the equality axioms in classical predicate logic. Semantically, fuzzy equality is related to the characterization of graded similarity among objects, with the meaning that the more similar two objects are, the higher the degree of their equality.
B. About computational complexity
The complexity of t-norm based logics has also been studied in a number of papers, starting with Mundici’s [1994] pioneering work on the NP-completeness of Łukasiewicz logic and flourishing during the nineties, with some problems still left open. It has to be pointed out that the duality between the SAT and TAUT problems in classical logic, where checking the tautologicity of ϕ is equivalent to checking that ¬ϕ is not satisfiable and vice versa, is no longer available in many-valued logics. Unlike in classical logic, for a many-valued semantics there need not be a simple relationship between its TAUT and SAT problems.
This is the reason why, given a class K of algebras of the same type, it is natural to distinguish the following sets of formulas (as suggested in [Baaz et al., 2002] for the SAT problems):
$$\begin{aligned}
TAUT_1^K &= \{\varphi \mid \forall A \in K,\ \forall e_A,\ e_A(\varphi) = 1\} \\
TAUT_{pos}^K &= \{\varphi \mid \forall A \in K,\ \forall e_A,\ e_A(\varphi) > 0\} \\
SAT_1^K &= \{\varphi \mid \exists A \in K,\ \exists e_A,\ e_A(\varphi) = 1\} \\
SAT_{pos}^K &= \{\varphi \mid \exists A \in K,\ \exists e_A,\ e_A(\varphi) > 0\}
\end{aligned}$$
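To make the four sets concrete, the following minimal sketch decides membership by brute force when K consists of finite Łukasiewicz chains. It is only an illustration under simplifying assumptions: formulas are represented as Python functions with the Łukasiewicz operations hard-wired, algebras are given only by their carriers, and the names chain and classify are ours.

```python
from fractions import Fraction
from itertools import product

def chain(n):
    """Carrier {0, 1/n, ..., 1} of the (n + 1)-element Lukasiewicz chain."""
    return [Fraction(k, n) for k in range(n + 1)]

def classify(formula, num_vars, algebras):
    """Brute-force membership of a formula in TAUT_1, TAUT_pos, SAT_1, SAT_pos.

    `formula` maps truth values of its variables to a truth value;
    `algebras` is a finite list of finite carriers standing in for the class K.
    """
    values = [formula(*e)
              for carrier in algebras
              for e in product(carrier, repeat=num_vars)]
    return {
        "TAUT_1": all(v == 1 for v in values),
        "TAUT_pos": all(v > 0 for v in values),
        "SAT_1": any(v == 1 for v in values),
        "SAT_pos": any(v > 0 for v in values),
    }

def neg(x):
    """Lukasiewicz negation."""
    return 1 - x

def excluded_middle(p):
    """p OR (NOT p), with the lattice disjunction max."""
    return max(p, neg(p))

def negated_excluded_middle(p):
    return neg(excluded_middle(p))

K = [chain(2)]  # the three-element chain {0, 1/2, 1}
report = classify(excluded_middle, 1, K)
report_neg = classify(negated_excluded_middle, 1, K)
```

On the three-element chain, the formula p ∨ ¬p is not in TAUT_1 (it takes the value 1/2) although its negation is not in SAT_1 either, so the classical link between tautologicity of ϕ and unsatisfiability of ¬ϕ indeed breaks down for this pair of sets.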
The interested reader is referred to two excellent surveys on complexity results and the methods used: the one by Aguzzoli, Gerla and Haniková [2005], concerning a large family of propositional fuzzy logics (BL and several of its expansions) as well as some logics with the connective ∆; and the one by Hájek [2005b] for the case of prominent predicate fuzzy logics.
C. Weaker systems of fuzzy logic
Non-commutative fuzzy logics: Starting from purely algebraic motivations (see [Di Nola et al., 2002]), several authors have studied generalizations of BL and MTL (and related t-norm based logics) with a non-commutative conjunction &, e.g. [Hájek, 2003a; Hájek, 2003b; Jenei and Montagna, 2003]. These logics have two implications, corresponding to the left and right residuum of the conjunction. Their algebraic counterparts are the so-called pseudo-BL and pseudo-MTL algebras. Interestingly enough, while there are pseudo-MTL algebras over the real unit interval [0, 1], defined by left-continuous pseudo-t-norms (i.e. operations satisfying all properties of t-norms but commutativity), there are no pseudo-BL algebras over [0, 1], since continuous pseudo-t-norms are necessarily commutative. A still weaker fuzzy logic, the so-called flea logic, is investigated in [Hájek, 2005c]; it is a common generalization of three well-known generalizations of the fuzzy (propositional) logic BL, namely the monoidal t-norm logic MTL, the hoop logic BHL and the non-commutative logic pseudo-BL.
Weakly implicative fuzzy logics: Going even further in generalizing systems of fuzzy logic, Cintula [2006] has introduced the framework of weakly implicative fuzzy logics. The main idea behind this class of logics is to capture the notion of comparative truth common to all fuzzy logics. Roughly speaking, they are logics close to Rasiowa’s implicative logics [Rasiowa, 1974] but satisfying a proof-by-cases property.
This property ensures that these logics have a semantics based on linearly ordered sets of truth-values, hence allowing a proper notion of comparative truth. The interested reader is referred to [Běhounek and Cintula, 2006b], where the authors advocate this view of fuzzy logic.
D. Functional representation issues
McNaughton’s famous theorem [McNaughton, 1951], establishing that the class of functions representable by formulas of Łukasiewicz logic is the class of continuous piecewise linear functions with integer coefficients, has been the point of departure of many research efforts trying to generalize it to other important fuzzy logics, i.e. trying to describe the class of real functions which can be defined by the truth tables of formulas of a given fuzzy logic. For instance we may cite [Gerla, 2000; Gerla, 2001a; Wang et al., 2004; Aguzzoli et al., 2005a; Aguzzoli et al., 2006] for the case of Gödel, Nilpotent Minimum and related logics, [Cintula and Gerla, 2004] for the
case of product logic, and [Montagna and Panti, 2001] for the case of Łukasiewicz expansions like the Ł∆, PL∆, ŁΠ and ŁΠ½ logics. It is interesting to notice that the problem of describing the class of functions (on [0, 1]) defined by formulas of Product-Łukasiewicz logic PL (see Section 3.4) amounts to the famous Pierce-Birkhoff conjecture: “Is every real-valued continuous piecewise polynomial function on real affine n-space expressible using finitely many polynomial functions and the operations of (pointwise) supremum and infimum?” This has actually been proved true for the case of functions of three variables, but it remains an open problem for the case of more variables.
3.6 T-norm based fuzzy logic modelling of approximate reasoning
We have already referred in previous sections to the distinction between fuzzy logic in a narrow sense and in a broad sense. In Zadeh’s opinion [1988], fuzzy logic in the narrow sense is an extension of many-valued logic but with a different agenda, in particular including the approximate reasoning machinery described in Section 2 (flexible constraint propagation, generalized modus ponens, etc.) and other aspects not covered there, such as linguistic quantifiers, modifiers, etc. In general, linguistic and semantical aspects are mainly stressed. The aim of this section is to show that fuzzy logic in Zadeh’s narrow sense can be presented as classical deduction in the frame of the t-norm based fuzzy logics described in previous subsections, thus bridging the gap between the contents of Section 2 and Section 3.
In the literature one can find several approaches to casting Zadeh’s main approximate reasoning constructs in a formal logical framework. In particular, Novák and colleagues have done much in this direction, using the model of fuzzy logic with evaluated syntax, fully elaborated in the monograph [Novák et al., 1999] (see the references therein and also [Dvořák and Novák, 2004]), and more recently he has developed a very powerful and sophisticated model of fuzzy type theory [Novák, 2005; Novák and Lehmke, 2006]. In his monograph, Hájek [1998a] also has a part devoted to this task. In what follows, we show a simple way to capture at a syntactical level, namely in a many-sorted version of predicate fuzzy logic calculus, say MTL∀, some of Zadeh’s basic approximate reasoning patterns, building on ideas in [Hájek, 1998a; Godo and Hájek, 1999]. It turns out that the logical structure becomes rather simple and the fact that fuzzy inference is in fact a (crisp) deduction becomes rather apparent. The potential advantages of this presentation are several.
They range from having a formal framework which can be common or very similar for various kinds of fuzzy logics to the availability of well-developed proof theoretical tools of many-valued logic. Consider the simplest and most usual expressions in Zadeh’s fuzzy logic of the form “x is A”,
discussed in Section 2.2, with the intended meaning that the variable x takes its value in A, represented by a fuzzy set µA on a certain domain U. The representation of this statement in the frame of possibility theory is the constraint (∀u)(πx(u) ≤ µA(u)), where πx stands for the possibility distribution for the variable x. Such a constraint is very easy to represent in MTL∀ as the formula (∀x)(X(x) → A(x)), where A and X are many-valued predicates of the same sort in each particular model M. (Caution: do not confuse the logical variable x in this logical expression with the linguistic (extra-logical) variable x in “x is A”.) Their interpretations (as fuzzy relations on their common domain) can be understood as the membership function µA : U → [0, 1] and the possibility distribution πx respectively. Indeed, one can easily observe that ‖(∀x)(X(x) → A(x))‖M = 1 if and only if ‖X(x)‖M,e ≤ ‖A(x)‖M,e for all x and any evaluation e.
From now on, variables ranging over universes will be x, y; “x is A” becomes (∀x)(X(x) → A(x)) or just X ⊆ A; if z is a 2-dimensional variable (x, y), then an expression “z is R” becomes (∀x, y)(Z(x, y) → R(x, y)) or just Z ⊆ R. In what follows, only two (linguistic) variables x and y will be involved, together with z = (x, y). Therefore we assume that X, Y (corresponding to the possibility distributions πx and πy) are projections of a binary fuzzy predicate Z (corresponding to the joint possibility distribution πx,y). The axioms we need to state in order to formalize this assumption are:
Π1: (∀x, y)(Z(x, y) → X(x)) & (∀x, y)(Z(x, y) → Y(y))
Π2: (∀x)(X(x) → (∃y)Z(x, y)) & (∀y)(Y(y) → (∃x)Z(x, y))
Condition Π1 expresses the monotonicity conditions πx,y(u, v) ≤ πx(u) and πx,y(u, v) ≤ πy(v), whereas conditions Π1 and Π2 used together express the marginalization conditions πx(u) = sup_v πx,y(u, v) and πy(v) = sup_u πx,y(u, v). These can be equivalently presented as the single condition Proj, as follows:
Proj: (∀x)(X(x) ≡ (∃y)Z(x, y)) & (∀y)(Y(y) ≡ (∃x)Z(x, y))
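On finite universes the truth degrees of Π1 and Π2 can be computed directly, reading ∀ as minimum and ∃ as maximum. The following minimal sketch assumes the standard Gödel algebra (t-norm min with its residuum) as the underlying MTL-algebra; the data and the names pi1 and pi2 are illustrative only.

```python
def residuum(a, b):
    """Goedel residuum (an assumption; any standard MTL-algebra would do)."""
    return 1.0 if a <= b else b

def pi1(Z, X, Y):
    """Truth degree of Pi1: Z(x, y) -> X(x) and Z(x, y) -> Y(y) for all x, y."""
    rows, cols = range(len(X)), range(len(Y))
    left = min(residuum(Z[i][j], X[i]) for i in rows for j in cols)
    right = min(residuum(Z[i][j], Y[j]) for i in rows for j in cols)
    return min(left, right)  # the strong conjunction & is min in Goedel logic

def pi2(Z, X, Y):
    """Truth degree of Pi2: X(x) -> exists y Z(x, y), and symmetrically for Y."""
    rows, cols = range(len(X)), range(len(Y))
    left = min(residuum(X[i], max(Z[i][j] for j in cols)) for i in rows)
    right = min(residuum(Y[j], max(Z[i][j] for i in rows)) for j in cols)
    return min(left, right)

# A joint possibility distribution on a 2 x 3 universe and its two marginals
Z = [[0.2, 0.7, 0.4],
     [0.9, 0.3, 0.0]]
X = [max(row) for row in Z]        # pi_x(u) = sup_v pi_xy(u, v)
Y = [max(col) for col in zip(*Z)]  # pi_y(v) = sup_u pi_xy(u, v)
```

With X and Y computed as marginals, both pi1(Z, X, Y) and pi2(Z, X, Y) evaluate to 1. Replacing X by a pointwise larger distribution keeps Π1 fully true but lowers the degree of Π2, and replacing it by a smaller one lowers the degree of Π1, which matches the reading of Π1 as monotonicity and of Π1 with Π2 as marginalization.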
Next we shall consider several approximate reasoning patterns described in Section 2, and for each pattern we shall present a corresponding tautology and its derived deduction rule, which will automatically be sound.
1. Entailment Principle: From “x is A” infer “x is A∗”, whenever µA(u) ≤ µA∗(u) for all u.
Provable tautology: (A ⊆ A∗) → (X ⊆ A → X ⊆ A∗)
Sound rule:
$$\frac{A \subseteq A^*,\quad X \subseteq A}{X \subseteq A^*}$$
2. Cylindrical extension: From “x is A” infer “(x, y) is A+”, where µA+(u, v) = µA(u) for each v.
Provable tautology: Π1 → [(X ⊆ A) → ((∀xy)(A+(x, y) ↔ A(x)) → (Z ⊆ A+))]
Sound rule:
$$\frac{\Pi 1,\quad X \subseteq A,\quad (\forall xy)(A^+(x, y) \leftrightarrow A(x))}{Z \subseteq A^+}$$
3. min–Combination: From “x is A1” and “x is A2” infer “x is A1 ∩ A2”, where µA1∩A2(u) = min(µA1(u), µA2(u)).
Tautology: (X ⊆ A1) → ((X ⊆ A2) → (X ⊆ (A1 ∧ A2)))
Rule:
$$\frac{X \subseteq A_1,\quad X \subseteq A_2}{X \subseteq (A_1 \wedge A_2)}$$
where (A1 ∧ A2)(x) is an abbreviation for A1(x) ∧ A2(x).
4. Projection: From “(x, y) is R” infer “y is RY”, where µRY(v) = sup_u µR(u, v) for each v.
Provable tautology: Π2 → ((Z ⊆ R) → (∀y)(Y(y) → (∃x)R(x, y)))
Sound rule:
$$\frac{\Pi 2,\quad Z \subseteq R}{(\forall y)(Y(y) \rightarrow (\exists x)R(x, y))}$$
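The tautologies for patterns 1, 3 and 4 can be spot-checked numerically on finite universes. The sketch below is only a sanity check under stated assumptions, not a proof: it fixes the standard Łukasiewicz algebra as the MTL-algebra, draws random fuzzy sets with values k/10, and uses function names of our own choosing.

```python
import random
from fractions import Fraction

def t_norm(a, b):
    """Lukasiewicz t-norm, interpreting the strong conjunction &."""
    return max(a + b - 1, Fraction(0))

def residuum(a, b):
    """Lukasiewicz implication."""
    return min(1 - a + b, Fraction(1))

def subset_degree(X, A):
    """Truth degree of (forall x)(X(x) -> A(x)) on a finite domain."""
    return min(residuum(x, a) for x, a in zip(X, A))

def random_fuzzy_set(rng, size):
    return [Fraction(rng.randint(0, 10), 10) for _ in range(size)]

def entailment(A, A_star, X):
    # (A <= A*) -> ((X <= A) -> (X <= A*))
    return residuum(subset_degree(A, A_star),
                    residuum(subset_degree(X, A), subset_degree(X, A_star)))

def min_combination(X, A1, A2):
    # (X <= A1) -> ((X <= A2) -> (X <= A1 ^ A2))
    meet = [min(a, b) for a, b in zip(A1, A2)]
    return residuum(subset_degree(X, A1),
                    residuum(subset_degree(X, A2), subset_degree(X, meet)))

def projection(X, Y, Z, R):
    # Pi2 -> ((Z <= R) -> (forall y)(Y(y) -> (exists x) R(x, y)))
    n, m = len(X), len(Y)
    pi2 = t_norm(
        min(residuum(X[i], max(Z[i][j] for j in range(m))) for i in range(n)),
        min(residuum(Y[j], max(Z[i][j] for i in range(n))) for j in range(m)))
    z_in_r = min(residuum(Z[i][j], R[i][j]) for i in range(n) for j in range(m))
    goal = min(residuum(Y[j], max(R[i][j] for i in range(n))) for j in range(m))
    return residuum(pi2, residuum(z_in_r, goal))

def check_patterns(trials=200, seed=0):
    """Return True if all three formulas evaluate to 1 on every random sample."""
    rng = random.Random(seed)
    for _ in range(trials):
        A, A_star, X3 = (random_fuzzy_set(rng, 4) for _ in range(3))
        X, Y = random_fuzzy_set(rng, 3), random_fuzzy_set(rng, 2)
        Z = [random_fuzzy_set(rng, 2) for _ in range(3)]
        R = [random_fuzzy_set(rng, 2) for _ in range(3)]
        if (entailment(A, A_star, X3) != 1 or min_combination(X3, A, A_star) != 1
                or projection(X, Y, Z, R) != 1):
            return False
    return True
```

Note that the premises are not assumed to be fully true in these samples: the nested implications evaluate to 1 even when A ⊆ A∗ or Z ⊆ R hold only to some degree, which is exactly what being a tautology requires.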
Note that the formalization of the max–min composition rule (from “x is A” and “(x, y) is R” infer “y is B”, where µB(v) = sup_u min(µA(u), µR(u, v))),
$$\frac{Cond,\quad Proj,\quad (X \subseteq A),\quad (Z \subseteq R)}{Y \subseteq B},$$
where Cond is the formula (∀y)(B(y) ≡ (∃x)(A(x) ∧ R(x, y))), is indeed a derived rule from the above ones.
More complex patterns like those related to inference with fuzzy if-then rules “if x is A then y is B” can also be formalized. As we have seen in Section 2, there
are several semantics for fuzzy if-then rules in terms of the different types of constraints on the joint possibility distribution πx,y they may induce. Each particular semantics will obviously have a different representation. We will describe just a couple of them.
Within the implicative interpretations of fuzzy rules, gradual rules are interpreted by the constraint πx,y(u, v) ≤ A(u) ⇒ B(v), for some residuated implication ⇒. According to this interpretation, the following is a derivable (sound) rule
$$\frac{Cond,\quad Proj,\quad X \subseteq A^*,\quad Z \subseteq A \rightarrow B}{Y \subseteq B^*},$$
where (A → B)(x, y) stands for A(x) → B(y) and Cond is (∀y)[B∗(y) ≡ (∃x)(A∗(x) ∧ (A(x) → B(y)))]. If one wants to strengthen this rule so as to force the derivation of (∀y)(B∗(y) ≡ B(y)) when adding the condition (∀x)(A∗(x) ≡ A(x)) to the premises, then one has to move to another generalized modus ponens rule that is also derivable,
$$\frac{Cond,\quad \Pi 2',\quad X \subseteq A^*,\quad Z \subseteq A \rightarrow B}{Y \subseteq B^*},$$
where Cond is now (∀y)(B∗(y) ≡ (∃x)[A∗(x) & (A(x) → B(y))]) and where condition Π2′ is (∀y)(Y(y) → (∃x)(X(x) & Z(x, y))), a slightly stronger condition than Π2.
Finally, within the conjunctive model of fuzzy rules, where a rule “if x is A then y is B” is interpreted by the constraint πx,y(u, v) ≥ A(u) ∧ B(v), and an observation “x is A∗” by a positive constraint πx(u) ≥ A∗(u), one can easily derive the Mamdani model (here with just one rule)
$$\frac{Cond,\quad Proj,\quad X \supseteq A^*,\quad Z \supseteq A \wedge B}{Y \supseteq B^*},$$
where Cond is (∀y)[B∗(y) ≡ (∃x)(A∗(x) ∧ A(x)) ∧ B(y)]. Interestingly enough, if the observation is instead modelled as a negative constraint πx(u) ≤ A∗(u), then one can derive the following rule,
$$\frac{Cond,\quad Proj,\quad (\exists x)X(x),\quad X \subseteq A^*,\quad Z \supseteq A \wedge B}{Y \supseteq B^*},$$
where Cond is now (∀y)[B∗(y) ≡ (∀x)(A∗(x) → (A(x)) ∧ B(y))], which is in accordance with the discussion in Section 2.5.
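The three Cond formulas that define B∗ existentially can be evaluated directly on finite universes. The sketch below is an illustration under stated assumptions: it takes the Łukasiewicz t-norm and residuum for & and →, reads ∃ as maximum, and uses small hand-picked fuzzy sets; the function names gmp_min, gmp_strong and mamdani are ours.

```python
from fractions import Fraction as F

def t_norm(a, b):
    """Lukasiewicz t-norm, interpreting &."""
    return max(a + b - 1, F(0))

def residuum(a, b):
    """Lukasiewicz implication, interpreting ->."""
    return min(1 - a + b, F(1))

def gmp_min(A_star, A, B):
    """B*(y) = sup_x min(A*(x), A(x) -> B(y)): Cond of the first rule."""
    return [max(min(a_s, residuum(a, b)) for a_s, a in zip(A_star, A))
            for b in B]

def gmp_strong(A_star, A, B):
    """B*(y) = sup_x (A*(x) & (A(x) -> B(y))): Cond of the strengthened rule."""
    return [max(t_norm(a_s, residuum(a, b)) for a_s, a in zip(A_star, A))
            for b in B]

def mamdani(A_star, A, B):
    """B*(y) = (sup_x min(A*(x), A(x))) ^ B(y): Cond of the Mamdani model."""
    overlap = max(min(a_s, a) for a_s, a in zip(A_star, A))
    return [min(overlap, b) for b in B]

# Illustrative data: A is normal (it takes the value 1 somewhere)
A = [F(1), F(1, 2), F(0)]
B = [F(0), F(1, 5), F(1)]
A_obs = [F(1, 2), F(1), F(0)]

b_min = gmp_min(A, A, B)        # observation equal to the rule antecedent
b_strong = gmp_strong(A, A, B)  # same observation, strong conjunction
b_mamdani = mamdani(A_obs, A, B)
```

With the observation A∗ equal to A, the version using the strong conjunction & returns exactly B in this example, whereas the version using ∧ returns a strictly larger fuzzy set. This illustrates why the strengthened rule is needed to recover B∗ ≡ B from A∗ ≡ A.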
3.7 Clausal and resolution-based fuzzy logics
S-fuzzy logics. Another family of fuzzy logics, very different from the class of logics presented in the previous subsections, can be built by taking as basic connectives a conjunction ⊓, a disjunction ⊔ and a negation ¬, rather than a conjunction and a (residuated) implication. These connectives are to be interpreted in [0, 1] by the triple (min, max, 1 − ·), or more generally by a De Morgan triple
(T, S, N), where T is a t-norm, N a strong negation function and S is the N-dual t-conorm, i.e. S(x, y) = N(T(N(x), N(y))). See [Klement and Navara, 1999] for a comparison of these two fuzzy logic traditions.
Butnariu and Klement [1995] introduced the so-called S-fuzzy logics, associated with the family of Frank t-norms. This is a parametrized family of continuous t-norms {Tλ}λ∈[0,∞], strictly decreasing with respect to the parameter λ, which has three interesting limit cases λ = 0, 1, ∞ corresponding to the three well-known t-norms: T0 = min, T1 = ∗Π (product t-norm) and T∞ = ∗L (Łukasiewicz t-norm). For λ ∈ (0, ∞), λ ≠ 1,
$$T_\lambda(x, y) = \log_\lambda\left(1 + \frac{(\lambda^x - 1)(\lambda^y - 1)}{\lambda - 1}\right)$$
is a t-norm isomorphic to ∗Π. The language of the S-fuzzy logics Lλ is built over a countable set of propositional variables and two connectives ⊓ and ¬. Disjunction ⊔ and implication → are defined connectives: ϕ ⊔ ψ is ¬(¬ϕ ⊓ ¬ψ) and ϕ → ψ is ¬(ϕ ⊓ ¬ψ). The semantics of Lλ is defined by evaluations of propositional variables into [0, 1] that extend to arbitrary propositions by defining e(ϕ ⊓ ψ) = Tλ(e(ϕ), e(ψ)) and e(¬ϕ) = 1 − e(ϕ). Notice that the interpretation of the implication is given by e(ϕ → ψ) = ISλ(e(ϕ), e(ψ)), where ISλ(x, y) = Sλ(1 − x, y) is an S-implication (see Section 2.1), with Sλ being the dual t-conorm of Tλ. This is the main reason why these logics are called S-fuzzy logics. When λ = 0, L0 is the so-called max-min S-logic, while for λ = ∞, L∞ corresponds to Łukasiewicz logic Ł. In the S-fuzzy logics Lλ with λ ≠ ∞ there are no formulas that take the value 1 under all truth-evaluations, but on the other hand, the set of formulas which are always evaluated to a strictly positive value is closed under modus ponens. This leads to defining a formula ϕ to be an Lλ-tautology whenever e(ϕ) > 0 for every Lλ-evaluation e. The authors then prove the following kind of completeness: the set of Lλ-tautologies coincides with the set of classical (two-valued) tautologies.
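The Frank family and the induced S-implication are easy to experiment with numerically. The following minimal sketch uses floating-point arithmetic and treats the three limit cases λ = 0, 1, ∞ by their closed forms; the function names frank and s_implication are our own.

```python
import math

def frank(lam, x, y):
    """Frank t-norm T_lam(x, y), with the limit cases handled explicitly."""
    if lam == 0:
        return min(x, y)               # T_0 = min
    if lam == 1:
        return x * y                   # T_1 = product t-norm
    if lam == math.inf:
        return max(x + y - 1.0, 0.0)   # T_inf = Lukasiewicz t-norm
    num = (lam ** x - 1.0) * (lam ** y - 1.0)
    return math.log(1.0 + num / (lam - 1.0), lam)

def s_implication(lam, x, y):
    """e(phi -> psi) = 1 - T_lam(x, 1 - y), i.e. S_lam(1 - x, y)."""
    return 1.0 - frank(lam, x, 1.0 - y)

# Values at x = y = 1/2 for increasing values of the parameter
samples = [frank(lam, 0.5, 0.5) for lam in (0, 0.5, 1, 2, math.inf)]

# Truth value of the classical tautology p -> p at e(p) = 1/2
p_implies_p = {lam: s_implication(lam, 0.5, 0.5)
               for lam in (0, 0.5, 1, 2, math.inf)}
```

In this sample the values of Tλ(1/2, 1/2) decrease as λ grows, in line with the monotonicity of the family, and the classical tautology p → p takes a value strictly between 0 and 1 for each finite λ tried while reaching 1 only for λ = ∞. This is the phenomenon that motivates taking the strictly positive values as designated.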
The coincidence of Lλ-tautologies with classical tautologies is in accordance with the well-known fact that, in the frame of Product logic Π (and more generally in SMTL), the fragment consisting of the double negated formulas ¬¬ϕ is indeed equivalent to classical logic.
Fuzzy logic programming systems. Many non-residuated logical calculi that were developed early in the literature as extensions of classical logic programming systems are related to some form of S-fuzzy logic, and a distinguishing feature is that the notion of proof is based on a kind of resolution rule, i.e. computing the truth value ‖ψ ⊔ χ‖ from ‖ϕ ⊔ ψ‖ and ‖¬ϕ ⊔ χ‖. The first fuzzy resolution method was defined by [Lee, 1972] and it is related to the max-min S-fuzzy logic mentioned above. At the syntactic level, formulas
are classical first-order formulas (thus we write below ∧ and ∨ instead of ⊓ and ⊔ resp.) but at the semantic level, formulas have a truth value which may be intermediate between 0 and 1. An interpretation M is defined by an assignment of a truth value to each atomic formula, from which truth values of compound formulas are computed in the following way:
‖¬ϕ‖M = 1 − ‖ϕ‖M,
‖ϕ ∧ ψ‖M = min(‖ϕ‖M, ‖ψ‖M),
‖ϕ ∨ ψ‖M = max(‖ϕ‖M, ‖ψ‖M).
The notions of validity, consistency and inconsistency are generalized to fuzzy logic. Let ϕ be a fuzzy formula. ϕ is valid iff ‖ϕ‖M ≥ 0.5 for each interpretation M, i.e. the set of designated truth values is [0.5, 1]. ϕ is inconsistent iff ‖ϕ‖M ≤ 0.5 for each interpretation M. And ϕ entails another formula ψ, denoted ϕ |= ψ, if ‖ψ‖M ≥ 0.5 for each interpretation M such that ‖ϕ‖M ≥ 0.5. [Lee and Chang, 1971] proved that a fuzzy formula is valid (respectively, inconsistent) iff the formula is classically valid (respectively, inconsistent), i.e. considering the involved predicates and propositions as crisp; and that ϕ |= ψ in fuzzy logic iff ϕ |= ψ in classical logic.
The resolvent of two clauses C1 and C2 is defined as in classical first-order logic. [Lee, 1972] proved that, provided that C1 and C2 are ground clauses, if min(‖C1‖, ‖C2‖) = a > 0.5 and max(‖C1‖, ‖C2‖) = b, then a ≤ ‖R(C1, C2)‖ ≤ b for each resolvent R(C1, C2) of C1 and C2 (see the discussion in Section 2.3). This is generalized to resolvents of a set of ground clauses obtained by a number of successive applications of the resolution principle. Hence, Lee’s resolution is sound. This result also holds for intervals of truth values with a lower bound greater than 0.5. Lee’s proof method does not deal with refutation, hence it is not complete (since resolution is not complete for deduction).
Many subsequent works have been based on Lee’s setting. In [Shen et al., 1988; Mukaidono et al., 1989] Lee’s resolution principle was generalized by introducing a fuzzy resolvent.
Let C1 and C2 be two clauses of fuzzy logic and let R(C1, C2) be a classical resolvent of C1 and C2. Let l be the literal on the basis of which R(C1, C2) has been obtained. Then, the fuzzy resolvent of C1 and C2 is R(C1, C2) ∨ (l ∧ ¬l), with the truth value max(‖R(C1, C2)‖, ‖l ∧ ¬l‖). It is proved that a fuzzy resolvent is always a logical consequence of its parent clauses, which generalizes Lee’s result. See also [Chung and Schwartz, 1995] for a related approach.
One of the drawbacks of these and other early approaches is that they are based on the language of classical logic, and thus do not make it possible to deal with intermediate truth values at the syntactic level. Nevertheless, the trend initiated by [Lee, 1972] blossomed in the framework of logic programming, giving birth to a number of fuzzy logic programming systems. An exhaustive survey of fuzzy logic programming before 1991 is in [Dubois et al., 1991c, Sec. 4.3]. Most of these systems are mainly heuristic-based and lack a formal logical background. This is in part due to the difficulty of adapting resolution-based proof methods to fuzzy logics with residuated implication, with the exception of Łukasiewicz logic (whose implication is also an S-implication). Indeed, a resolution rule for Łukasiewicz-based logics has
been proposed in [Thiele and Lehmke, 1994; Lehmke, 1995; Klawonn and Kruse, 1994; Klawonn, 1995]. Lehmke and Thiele defined a resolution system for so-called weighted bold clauses. Clauses are of the form C = l1 ⊔ · · · ⊔ ln, where the li are literals in the classical sense (they consider only propositional logic) and ⊔ is the Łukasiewicz (strong) disjunction (i.e. ‖C1 ⊔ C2‖ = min(‖C1‖ + ‖C2‖, 1)). They introduce the resolution rule as follows:
$$\frac{T \vdash C_1 \text{ and } p \text{ occurs in } C_1, \qquad T \vdash C_2 \text{ and } \neg p \text{ occurs in } C_2}{T \vdash ((C_1 \sqcup C_2) \setminus p) \setminus \neg p},$$
where \ denotes the operation of omitting the corresponding literal. Then, they get the following result: if T ⊢ C then T |= C, and if T has no 1-model then T ⊢ ⊥.
Klawonn and Kruse [1994] turned to predicate fuzzy logic in the setting of finitely-valued Łukasiewicz logics. They introduce special implication clauses of the form (∀x1 . . . xn)(ϕ ⇒ A) and (∀x1 . . . xn)ϕ, where A is an atomic formula and ϕ contains only “and” and “or” types of connectives and no quantifiers. In this framework they define a Prolog system (called LULOG) with a complete proof procedure for deriving the greatest lower bound for the truth-value of implication clauses, based on the following graded resolution rule: from (¬ϕ ⊔ ψ, α) and (¬ψ ⊔ χ, β) derive (¬ϕ ⊔ χ, max(α + β − 1, 0)).
Soundness and completeness results can also be found in the literature for fuzzy Prolog systems where rules (without negation) are interpreted as formulas p1 & . . . & pn → q of a genuine residuated logic. For instance we may cite [Mukaidono and Kikuchi, 1993] for the case of Gödel semantics, and [Vojtáš, 1998] for the general case where & and → are interpreted by a left-continuous t-norm and its residuum. Moreover, Vojtáš [2001] presented a soundness and completeness proof for fuzzy logic programs without negation and with a wide variety of connectives, a result later generalized in the framework of multi-adjoint residuated lattices by Medina et al. [2001].
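The soundness of the graded resolution rule quoted above can be checked exhaustively on a finite Łukasiewicz chain. The following minimal sketch reads a weighted clause (C, α) as the lower bound ‖C‖ ≥ α and tests the rule with the tightest possible weights; the function names are ours, and the sketch does not attempt to reproduce the LULOG proof procedure itself.

```python
from fractions import Fraction
from itertools import product

def strong_or(a, b):
    """Lukasiewicz strong disjunction: min(a + b, 1)."""
    return min(a + b, Fraction(1))

def neg(a):
    """Lukasiewicz negation."""
    return 1 - a

def resolve_degree(alpha, beta):
    """Weight attached to the resolvent by the graded resolution rule."""
    return max(alpha + beta - 1, Fraction(0))

def rule_is_sound_on(grid):
    """Check the rule for all truth values of phi, psi, chi taken from grid.

    For each valuation the premises are given their exact truth values as
    weights, which are the largest admissible lower bounds.
    """
    for phi, psi, chi in product(grid, repeat=3):
        alpha = strong_or(neg(phi), psi)   # value of (NOT phi) OR psi
        beta = strong_or(neg(psi), chi)    # value of (NOT psi) OR chi
        conclusion = strong_or(neg(phi), chi)
        if conclusion < resolve_degree(alpha, beta):
            return False
    return True

six_valued = [Fraction(k, 5) for k in range(6)]
sound = rule_is_sound_on(six_valued)
```

Since smaller weights only weaken the derived bound, checking the tightest weights suffices for the chosen chain. The check amounts to the transitivity of Łukasiewicz implication, because ‖¬ϕ ⊔ ψ‖ coincides with the truth value of ϕ → ψ.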
3.8 Graded consequence and fuzzy consequence operators The systems of t-norm-based logics discussed in the previous sections aim at formalizing the logical background for fuzzy set based approximate reasoning, and their semantics are based on allowing their formulas to take intermediary degrees of truth. But, as already pointed out in Section 3.3, they all have crisp notions of consequence, both of logical entailment and of provability. It is natural to ask whether it is possible to generalize these considerations to the case that one starts from fuzzy sets of formulas, and that one gets from them, as logical consequence, fuzzy sets of formulas. One form of attacking this problem is by extending the logic with truth-constants as described in Section 3.3. However, there is also another approach, more algebraically oriented toward consequence operations for the classical case, originating
from Tarski [1930], see also [Wójcicki, 1988]. This approach treats consequence operations as closure operators. Many works have been devoted to extending the notions of closure operators, closure systems and consequence relations from two-valued logic to many-valued / fuzzy logics. Actually, both approaches have their origin in the work of Pavelka. Although one of the first works on fuzzy closure operators was done by Michálek [1975] in the framework of Fuzzy Topological Spaces, the first and best-known approach to fuzzy closure operators in the logical setting is due to Pavelka [1979], and the basic monograph elaborating this approach is Novák, Perfilieva and Močkoř's [1999]. In this approach, closure operators (in the standard sense of Tarski) are defined as mappings from fuzzy sets of formulas to fuzzy sets of formulas.
In some more detail (following [Gottwald and Hájek, 2005]'s presentation), let L be a propositional language, P(L) be its power set and F(L) the set of L-fuzzy subsets of L, where L = (L, ∗, ⇒, ∧, ∨, ≤, 0, 1) is a complete MTL-algebra. Propositions of L will be denoted by lower case letters p, q, . . . , and fuzzy sets of propositions by upper case letters A, B, etc. For each A ∈ F(L) and each p ∈ L, A(p) ∈ L will stand for the membership degree of p to A. Moreover, the lattice structure of L induces a related lattice structure on F(L), (F(L), ∩, ∪, ⊆, 0̄, 1̄), which is complete and distributive as well, where ∩, ∪ are the pointwise extensions of the lattice operations ∧ and ∨ to F(L), i.e.
\[ (A \cap B)(p) = A(p) \wedge B(p), \quad \text{for all } p \in L \]
\[ (A \cup B)(p) = A(p) \vee B(p), \quad \text{for all } p \in L, \]
and where the lattice (subsethood) ordering and top and bottom elements are defined respectively by
\[ A \subseteq B \quad \text{iff} \quad A(p) \le B(p) \text{ for all } p \in L \]
\[ \bar{0}(p) = 0 \quad \text{and} \quad \bar{1}(p) = 1, \text{ for all } p \in L. \]
For any k ∈ L, we shall also denote by k̄ the constant fuzzy set defined by k̄(p) = k for all p ∈ L. The Pavelka-style approach is an easy matter as long as the semantic consequence is considered. An L-evaluation e is a model of a fuzzy set of formulas A ∈ F(L) if and only if A(p) ≤ e(p) holds for each formula p. This leads to defining the semantic consequence of A as the following fuzzy set of formulas:
\[ C^{sem}(A)(p) = \bigwedge \{ e(p) \mid e \text{ model of } A \}, \quad \text{for each } p \in L \]
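The pointwise lattice operations and the semantic consequence just defined can be sketched as follows. This is a minimal illustration, assuming fuzzy sets and evaluations are dictionaries over the same finite set of formula names and that the relevant evaluations are supplied explicitly as a finite list; in the text, all L-evaluations of the underlying logic are meant. All function names are hypothetical.

```python
from fractions import Fraction

def f_and(A, B):
    """Pointwise meet: (A ∩ B)(p) = A(p) ∧ B(p)."""
    return {p: min(A[p], B[p]) for p in A}

def f_or(A, B):
    """Pointwise join: (A ∪ B)(p) = A(p) ∨ B(p)."""
    return {p: max(A[p], B[p]) for p in A}

def subset(A, B):
    """Subsethood ordering: A ⊆ B iff A(p) ≤ B(p) for all p."""
    return all(A[p] <= B[p] for p in A)

def is_model(e, A):
    """e is a model of A iff A(p) ≤ e(p) for every formula p."""
    return all(A[p] <= e[p] for p in A)

def c_sem(A, evaluations):
    """Infimum of e(p) over the supplied evaluations that are models of A.
    The infimum of the empty family is the top element 1."""
    models = [e for e in evaluations if is_model(e, A)]
    return {p: min((e[p] for e in models), default=Fraction(1)) for p in A}

# Made-up example data (not from the text).
A = {"p": Fraction(1, 2), "q": Fraction(0)}
evaluations = [
    {"p": Fraction(1, 2), "q": Fraction(1, 2)},
    {"p": Fraction(1), "q": Fraction(1, 4)},
    {"p": Fraction(1, 4), "q": Fraction(1)},   # not a model: e(p) < A(p)
]
closure = c_sem(A, evaluations)
```

Here `closure` assigns 1/2 to p and 1/4 to q, and `subset(A, closure)` holds, in line with the fuzzy inclusion property expected of a closure operator.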
For a syntactic characterization of this consequence relation it is necessary to have some logical calculus K which treats formulas of the language together with truth degrees. So the language of this calculus has to extend the language of the basic logical system by having also symbols for the truth degrees (truth-constants) denoted r for each r ∈ L, very similar to what has been described in Section 3.3. Once this is done, one can consider evaluated formulas, i.e. pairs (r, p) consisting of
a truth constant and a formula. Using this notion, one can understand in a natural way each fuzzy set of formulas A as a (crisp) set of evaluated formulas {(A(p), p) | p ∈ L}. Then, assuming the calculus K has a suitable notion of derivation for evaluated formulas ⊢K, each K-derivation of an evaluated formula (r, p) can be understood as a derivation of p to the degree r ∈ L. Since p can have multiple derivations, it is natural to define the provability degree of p as the supremum of all these degrees. This leads to the following definition of fuzzy syntactical consequence of a fuzzy set of formulas A:
\[ C^{syn}(A)(p) = \bigvee \{ r \in L \mid \{(A(q), q) \mid q \in L\} \vdash_K (r, p) \} \]
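The idea of a provability degree as a supremum over derivations can be illustrated with a toy forward-chaining sketch. It rests on strong simplifying assumptions that are not part of the definition above: formulas are atoms, the only derivation step is an evaluated modus ponens that combines degrees with the Łukasiewicz t-norm, and no logical axioms are used. All names are hypothetical.

```python
from fractions import Fraction

ZERO = Fraction(0)

def luk(r, s):
    """Lukasiewicz t-norm, used to combine degrees along a derivation step."""
    return max(r + s - 1, ZERO)

def provability_degrees(facts, rules):
    """facts: {atom: degree}, read as evaluated formulas (degree, atom).
    rules: list of (p, q, s), read as the evaluated formula (s, p -> q).
    Returns, for each reachable atom, the best degree over all derivations."""
    deg = dict(facts)
    changed = True
    while changed:
        changed = False
        for p, q, s in rules:
            candidate = luk(deg.get(p, ZERO), s)
            if candidate > deg.get(q, ZERO):
                deg[q] = candidate      # a better derivation of q was found
                changed = True
    return deg

# Made-up example: q has two derivations, with degrees 1/2 and 1/4.
facts = {"p": Fraction(3, 4), "u": Fraction(1)}
rules = [("p", "q", Fraction(3, 4)), ("u", "q", Fraction(1, 4))]
degrees = provability_degrees(facts, rules)
```

In this example the degree retained for q is 1/2, the larger of the two derivation degrees.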
This is in fact an infinitary notion of provability, which can be suitably handled by Łukasiewicz logic Ł since its truth-functions are continuous. Indeed, by defining the derivation relation ⊢K from the set of axioms of Ł written in the form (1, ϕ), and having as inference rule the following kind of evaluated modus ponens
\[ \frac{(r, p) \qquad (s, p \to q)}{(r * s, q)}, \]
where ∗ is the Łukasiewicz t-norm, it can be shown (see e.g. [Novák et al., 1999]) that one gets the following strong completeness result:
\[ C^{sem}(A)(p) = C^{syn}(A)(p) \]
for any formula p and any fuzzy set of formulas A, which establishes the equivalence between the semantical and syntactical definitions of the consequence operators in the setting of Łukasiewicz logic.
Thus Pavelka's fuzzy consequence operators map each fuzzy set of formulas A (i.e. each set of evaluated formulas) to a fuzzy set of formulas denoted generically C̃(A) (i.e. a set of evaluated formulas) that corresponds to the set of evaluated formulas that are consequences of the initial set represented by A. This mapping fulfills the properties of a fuzzy closure operator as defined by Pavelka [1979]. Namely, a fuzzy closure operator on the language L is a mapping C̃ : F(L) → F(L) fulfilling, for all A, B ∈ F(L), the following properties:
(C̃1) fuzzy inclusion: A ⊆ C̃(A)
(C̃2) fuzzy monotony: if A ⊆ B then C̃(A) ⊆ C̃(B)
(C̃3) fuzzy idempotence: C̃(C̃(A)) ⊆ C̃(A).
This generalization of the notion of consequence operators leads to the study of closure operators and related notions like closure systems and consequence relations in other, more general fuzzy logic settings. In the rest of this section we review some of the main contributions.
Gerla [1994a] proposes a method to extend any classical closure operator C defined on P(L), i.e. on classical sets of formulas, into a fuzzy closure operator
C̃∗ defined in F(L), i.e. on fuzzy sets of formulas. This approach is further developed in [Gerla, 2001, Chap. 3]. In the following, we assume F(L) to be fuzzy sets of formulas valued on a complete linearly-ordered Gödel BL-algebra L, i.e. a BL-chain (L, ∧, ∨, ⊗, ⇒, 0, 1) where ⊗ = ∧. Then, given a closure operator C : P(L) −→ P(L), the canonical extension of C is the fuzzy operator C̃∗ : F(L) −→ F(L) defined by
\[ \tilde{C}^*(A)(p) = \sup\{\alpha \in L \mid p \in C(A_\alpha)\}, \]
where Aα stands for the α-cut of A, i.e. Aα = {p ∈ L | A(p) ≥ α}. According to this definition, the canonical extension C̃∗ is a fuzzy closure operator such that C̃∗(A)(p) = 1 if p ∈ C(∅) and
\[ \tilde{C}^*(A)(p) \ge \sup\{A(q_1) \wedge \dots \wedge A(q_n) \mid p \in C(\{q_1, \dots, q_n\})\}. \]
If C is compact, then the latter inequality becomes an equality. It also follows that if a fuzzy set A is closed by C̃∗ then any α-cut of A is closed by C̃. Canonical extensions of classical closure operators were characterized in [Gerla, 2001] in the following terms: a fuzzy closure operator C̃ is the canonical extension of a closure operator if, and only if, for every meet-preserving function f : L −→ L such that f(1) = 1, if C̃(A) = A then C̃(f ◦ A) = f ◦ A. In other words, this characterization amounts to requiring that if A belongs to the closure system defined by C̃, then so does f ◦ A.
As regards the generalization of the notion of consequence relation, Chakraborty [1988; 1995] introduced the notion of graded consequence relation as a fuzzy relation between crisp sets of formulas and formulas. To do this, he assumes a monoidal operation ⊗ in L such that (L, ⊗, 1, ≤, ⇒) is a complete residuated lattice.
Then a fuzzy relation gc : P(L) × L −→ L is called a graded consequence relation by Chakraborty if, for every A, B ∈ P(L) and p, q ∈ L, gc fulfills:
gc1) fuzzy reflexivity: gc(A, p) = 1 for all p ∈ A
gc2) fuzzy monotony: if B ⊆ A then gc(B, p) ≤ gc(A, p)
gc3) fuzzy cut: [inf q∈B gc(A, q)] ⊗ gc(A ∪ B, p) ≤ gc(A, p).13
13 By residuation, this axiom is equivalent to [inf q∈B gc(A, q)] ≤ gc(A ∪ B, p) ⇒ gc(A, p).
Links between fuzzy closure operators and graded consequence relations were examined by Gerla [1996] and by Castro, Trillas and Cubillo [1994]. In particular Castro et al. point out that several methods of approximate reasoning used in Artificial Intelligence, such as Polya's models of plausible reasoning [Polya, 1954] or Nilsson's probabilistic logic [Nilsson, 1974], are not covered by the formalism of graded consequence relations, and they introduce a new concept of consequence relations, called fuzzy consequence relations, which, unlike Chakraborty's graded consequence relation, apply over fuzzy sets of formulas. Namely, a fuzzy relation fc : F(L) × L −→ L is called a fuzzy consequence relation in [Castro et al., 1994] if the following three properties hold for every A, B ∈ F(L) and p, q ∈ L:
fc1) fuzzy reflexivity: A(p) ≤ fc(A, p)
fc2) fuzzy monotony: if B ⊆ A then fc(B, p) ≤ fc(A, p)
fc3) fuzzy cut: if for all p, B(p) ≤ fc(A, p), then for all q, fc(A ∪ B, q) ≤ fc(A, q)
However, it is worth noticing that fuzzy consequence relations as defined above, when restricted to crisp sets of formulas, become only a particular class of graded consequence relations. Namely, regarding the two versions of the fuzzy cut properties, (gc3) and (fc3), it holds that for A, B ∈ P(L), if B(p) ≤ fc(A, p) for all p ∈ L, it is clear that inf q∈B fc(A, q) = 1.
Let us point out that, in the classical setting, there are well known relationships of interdefinability among closure operators, consequence relations and closure systems. In the fuzzy framework, fuzzy closure operators and fuzzy consequence relations are related in an analogous way, as proved in [Castro et al., 1994]:
• if C̃ is a fuzzy closure operator then fc, defined as fc(A, p) = C̃(A)(p), is a fuzzy consequence relation.
• if fc is a fuzzy consequence relation then C̃, defined as C̃(A) = fc(A, ·), is a fuzzy closure operator.
Therefore, via these relationships, the fuzzy idempotence property (C̃3) for closure operators and the fuzzy cut property (fc3) for consequence relations become equivalent.
In the context of an MTL-algebra L = (L, ∧, ∨, ⊗, ⇒, 0, 1), using the notation of closure operators and the notion of degree of inclusion between L-fuzzy sets of formulas defined as
\[ [A \sqsubseteq_\otimes B] = \inf_{p \in L} \bigl( A(p) \Rightarrow B(p) \bigr), \]
the relation between Chakraborty's graded consequence and Castro et al.'s fuzzy consequence relation becomes self-evident. As already mentioned, the former is defined only over classical sets while the latter is defined over fuzzy sets, but both yield a fuzzy set of formulas as output. Nevertheless, having this difference in mind, the first two conditions of both operators become syntactically the same as (C̃1) and (C̃2) of Pavelka's definition of fuzzy closure operators, while the fuzzy cut properties (the third ones) become very close to one another:
gc3) fuzzy cut: ([B ⊑⊗ C̃(A)] ⊗ C̃(A ∪ B)) ⊆ C̃(A), where [B ⊑⊗ C̃(A)] = inf q∈B C̃(A)(q) (recall that B is a classical set).
fc3) fuzzy cut: if B ⊆ C̃(A) then C̃(A ∪ B) ⊆ C̃(A)
In [Rodríguez et al., 2003] a new class of fuzzy closure operators is introduced, the so-called implicative closure operators, as a generalization of Chakraborty's graded consequence relations over fuzzy sets of formulas. The adjective implicative is due to the fact that they generalize the fuzzy cut property (gc3) by means of the above defined degree of inclusion, which in turn depends on the implication operation ⇒ of the algebra L. More precisely, a mapping C̃ : F(L) −→ F(L) is called an implicative closure operator if, for every A, B ∈ F(L), C̃ fulfills:
(C̃1) fuzzy inclusion: A ⊆ C̃(A)
(C̃2) fuzzy monotony: if B ⊆ A then C̃(B) ⊆ C̃(A)
(C̃3) fuzzy cut:14 [B ⊑⊗ C̃(A)] ≤ [C̃(A ∪ B) ⊑⊗ C̃(A)]
The corresponding implicative consequence relation, denoted by Ic, is defined as Ic(A, p) = C̃(A)(p). The translation of the properties of implicative closure operators to implicative consequence relations reads as follows:
ic1) fuzzy reflexivity: A(p) ≤ Ic(A, p)
ic2) fuzzy monotony: if B ⊆ A then Ic(B, p) ≤ Ic(A, p)
ic3) fuzzy cut: [B ⊑⊗ C̃(A)] ≤ Ic(A ∪ B, p) ⇒ Ic(A, p).
Now, it is easy to check that the restrictions of implicative consequence relations to classical sets of formulas are exactly Chakraborty's graded consequence relations, since if B is a crisp set, [B ⊑⊗ C̃(A)] = inf p∈B Ic(A, p). On the other hand, fuzzy consequence relations are implicative as well, since property (ic3) clearly implies (fc3). Therefore, implicative consequence relations generalize both graded and fuzzy consequence relations.
The relationship of implicative consequence operators to deduction in fuzzy logics with truth constants (as reported in Section 3.3) is also addressed in [Rodríguez et al., 2003]. It turns out that, although implicative closure operators are very general and defined in the framework of BL-algebras, strangely enough, they do not capture graded deduction (Pavelka-style) in any of the extensions of BL, except for Gödel's logic.
Bělohlávek [2001; 2002a] proposes yet another notion of closure operator over fuzzy sets with values in a complete residuated lattice L, with the idea of capturing what he calls the generalized monotonicity condition, which reads as “if A is almost a subset of B then the closure of A is almost a subset of the closure of B”. Using the degree of inclusion defined before15, for every order filter K of L, a new closure operator is defined as follows. An LK-closure operator on F(L) is a mapping C̃ : F(L) → F(L) satisfying for all A, A1, A2 ∈ F(L) the conditions:
(B̃1) A ⊆ C̃(A)
(B̃2) [A1 ⊑⊗ A2] ≤ [C̃(A1) ⊑⊗ C̃(A2)] whenever [A1 ⊑⊗ A2] ∈ K
(B̃3) C̃(A) = C̃(C̃(A))
14 The original and equivalent presentation of this property in [Rodríguez et al., 2003] is [B ⊑⊗ C̃(A)] ⊗ C̃(A ∪ B) ⊆ C̃(A), directly extending (gc3).
15 Actually, in Bělohlávek's paper it is considered as a fuzzy relation denoted as S(A1, A2), instead of [A1 ⊑⊗ A2] used above.
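The degree of inclusion [A ⊑⊗ B] that underlies the fuzzy cut and generalized monotonicity conditions can be computed directly on finite fuzzy sets. The sketch below is illustrative only: it assumes fuzzy sets are dictionaries over the same finite set of formula names, and it shows two standard residua (Gödel and Łukasiewicz) as possible choices for ⇒. Function names and data are hypothetical.

```python
from fractions import Fraction

ONE = Fraction(1)

def godel_imp(a, b):
    """Residuum of the minimum t-norm."""
    return ONE if a <= b else b

def lukasiewicz_imp(a, b):
    """Residuum of the Lukasiewicz t-norm."""
    return min(ONE, 1 - a + b)

def inclusion_degree(A, B, imp):
    """[A ⊑ B] = inf over p of (A(p) ⇒ B(p))."""
    return min(imp(A[p], B[p]) for p in A)

def is_subset(A, B):
    """Crisp subsethood: A(p) ≤ B(p) for all p."""
    return all(A[p] <= B[p] for p in A)

# Made-up example data.
A = {"p": Fraction(3, 4), "q": Fraction(1, 2)}
B = {"p": Fraction(1, 2), "q": Fraction(1)}
```

With these values the degree of inclusion of A in B is 1/2 under the Gödel residuum and 3/4 under the Łukasiewicz one. For either residuum the degree equals 1 exactly when A ⊆ B, which is why conditions stated with ⊑⊗ specialize to the crisp ones.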
It is clear that for L = {0, 1}, L{1}-closure operators are classical closure operators and for L = [0, 1]G, L{1}-closure operators are precisely the fuzzy closure operators studied by Gerla. In fact, although introduced independently, this notion is very close to implicative closure operators. Indeed, it is shown in [Bělohlávek, 2001] that conditions (B̃2) and (B̃3) can be equivalently replaced by the following condition:
(B̃4) [A1 ⊑⊗ C̃(A2)] ≤ [C̃(A1) ⊑⊗ C̃(A2)] whenever [A1 ⊑⊗ C̃(A2)] ∈ K.
Notice the similarity between (B̃4) and (C̃3). Indeed, when K = L, (C̃3) alone is slightly stronger than (B̃4); this shows that in that case implicative closure operators are LK-closure operators. But in [Rodríguez et al., 2003] it is proved that, in the presence of (C̃1) and (C̃2), (C̃3) is actually equivalent to (B̃4). Therefore, when K = L, implicative operators and LK-closure operators are exactly the same, as also witnessed by the very similar characterizations of these two kinds of fuzzy closure operators provided in [Rodríguez et al., 2003] and [Bělohlávek, 2001; Bělohlávek, 2002a] in terms of their associated fuzzy closure systems.
The study of the relationships between fuzzy closure operators and fuzzy similarities and preorders has also received some attention in the literature. In classical logic it is clear that the relation R(ϕ, ψ) iff ϕ ⊢ ψ defines a preorder in the set of formulas and E(ϕ, ψ) = R(ψ, ϕ) ∧ R(ϕ, ψ) defines an equivalence relation. This is not the case in the fuzzy setting, but there exist some relations that have been analyzed in several papers, e.g. [Castro and Trillas, 1991; Gerla, 2001; Rodríguez et al., 2003; Elorza and Burillo, 1999; Bělohlávek, 2002a].
Finally, let us briefly comment that, in the literature, different authors have studied the so-called fuzzy operators defined by fuzzy relations. Given an L-fuzzy relation R : L × L −→ L on a given logical language L, the associated fuzzy operator C̃R over F(L) is defined by:
\[ \tilde{C}_R(A)(q) = \bigvee_{p \in L} \{ A(p) \otimes R(p, q) \} \]
for all A ∈ F(L), that is, C̃R computes the image of fuzzy sets by sup-⊗ composition with R. Properties of these operators have been studied for instance when R is a fuzzy preorder [Castro and Trillas, 1991] or when R is a fuzzy similarity relation [Castro and Klawonn, 1994; Esteva et al., 1998]. A special class of fuzzy operators appearing in the context of approximate reasoning patterns has been studied by Boixader and Jacas [Boixader and Jacas, 1998]. These operators, called extensional inference operators, are required to satisfy an extensionality condition which is very similar to condition (B̃2) above, and they can be associated to particular models of fuzzy if-then rules.
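The sup-⊗ composition that defines C̃R is easy to compute when the language is replaced by a finite set of formula names. The following is a sketch under that assumption; the relation R below is a made-up reflexive fuzzy relation, and the two t-norms are just examples of a possible ⊗.

```python
from fractions import Fraction

def luk(a, b):
    """Lukasiewicz t-norm."""
    return max(a + b - 1, Fraction(0))

def compose(A, R, tnorm=min):
    """C_R(A)(q) = sup over p of tnorm(A(p), R(p, q)).
    A: {formula: degree}; R: {(p, q): degree} defined on all pairs."""
    return {q: max(tnorm(A[p], R[(p, q)]) for p in A) for q in A}

# Made-up example data: R is reflexive, so A ⊆ C_R(A) is expected.
A = {"p": Fraction(1, 2), "q": Fraction(1, 4)}
R = {
    ("p", "p"): Fraction(1), ("p", "q"): Fraction(3, 4),
    ("q", "p"): Fraction(1, 4), ("q", "q"): Fraction(1),
}
image_min = compose(A, R)            # minimum t-norm
image_luk = compose(A, R, luk)       # Lukasiewicz t-norm
```

With the minimum t-norm the degree of q rises from 1/4 to 1/2 through its link with p, while with the Łukasiewicz t-norm the same link contributes only 1/4, so A is left unchanged.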
3.9 Concluding remarks: what formal fuzzy logic is useful for? From the contents of the section it will probably become clear that the concept of fuzzy logic, even understood as a formal system of many-valued logic, admits of multiple formalizations and interpretations. This may be felt as a shortcoming
but it can also be thought of as an indication of the richness and complexity of the body of existing works. It may be particularly interesting for the reader to consult a recent special issue [Novák, 2006] of the journal Fuzzy Sets and Systems devoted to discussing the question of what fuzzy logic is. So far no definitive answer exists.
The other important conceptual question is: what is formal fuzzy logic useful for? The use of fuzzy logic (in the narrow sense) to model linguistic vagueness would seem to be the most obvious application; however, it is not generally accepted yet within the philosophic community. In fact vagueness often refers to semantic ambiguity and this is often confused with the gradual nature of linguistic categories. Fuzzy logic clearly accounts for the latter, but it is true as well that linguistic categories can be both gradual and semantically ambiguous. Also, fuzzy logic is not often used for knowledge representation in Artificial Intelligence (AI) because of the lack of epistemic concepts in it, and because there is a strong Boolean logic tradition in AI. However, introducing many-valuedness in AI epistemic logics can be handled in fuzzy logic, as explained in the next section.
Fuzzy logic may prove on the other hand to be very useful for the synthesis of continuous functions, just as Karnaugh tables were used for the synthesis of Boolean functions. This problem has no relationship to approximate reasoning, but this topic is close to fuzzy rule-based systems used as neuro-fuzzy universal approximators of real functions.
New uses of first order logic related to the Semantic Web, such as description logics, can also benefit from the framework of fuzzy logic, so as to make formal models of domain ontologies more flexible, hence more realistic. This subject matter may well prove to be a future prominent research trend, as witnessed by the recent blossoming of publications in this area, briefly surveyed below.
Description logics [Baader et al., 2003], initially named “terminological logics”, are tractable fragments of first-order logic representation languages that handle the notions of concepts (or classes), of roles (and properties), and of instances or objects, thus directly relying at the semantic level on the notions of set, binary relations, membership, and cardinality. They are especially useful for describing ontologies that consist of hierarchies of concepts in a particular domain. Since Yen's [1991] pioneering work, many proposals have been made for introducing fuzzy features in description logic [Tresp and Molitor, 1998; Straccia, 1998; Straccia, 2001; Straccia, 2006a], and in semantic web languages, since fuzzy sets aim at providing a representation of classes and relations with gradual membership, which may be more suitable for dealing with concepts having a somewhat vague or elastic definition. Some authors have recently advocated other settings for a proper handling of fuzzy concepts, such as the fuzzy logic BL [Hájek, 2005a; Hájek, 2006a], or an approach to fuzzy description logic programs under the answer set semantics [Łukasiewicz, 2006]. Moreover, some authors [Hollunder, 1994; Straccia, 2006b; Straccia, 2006c] have also expressed concern about handling uncertainty and exceptions in description logic. Hollunder [1994] has introduced uncertainty in terminological logics using possibilistic logic (see Section 4.1). Recently, Dubois, Mengin and Prade [2006] have discussed how to handle both possibilistic uncertainty and fuzziness practically in description logic (by approximating fuzzy classes by finite families of nested ordinary classes).
4 FUZZY SET-BASED LOGICAL HANDLING OF UNCERTAINTY AND SIMILARITY
Fuzzy logics as studied in the previous section can be viewed as abstract formal machineries that can make syntactic inferences about gradual notions, as opposed to classical logic, which is devoted to binary notions. As such they do not contain any epistemic ingredient, as opposed to Zadeh's approximate reasoning framework. Indeed, a fuzzy set, viewed as a possibility distribution, can model graded incomplete knowledge, hence qualifies as a tool for handling uncertainty that differs from a probability distribution. However, it should be clear that a fuzzy set can capture incomplete knowledge because it is a set, not because it is fuzzy (i.e. gradual). Hence it is no surprise that some logics of uncertainty can be devised on the basis of fuzzy set theory and the theory of approximate reasoning. This is naturally the case of possibilistic logic and its variants, which bridge the gap with knowledge representation concerns in artificial intelligence, such as non-monotonic reasoning. The gradual nature of fuzzy sets also leads to logics of graded similarity. Moreover, being abstract machines handling gradual notions, fuzzy logics can embed uncertainty calculi because belief is just another (usually) gradual notion. This section surveys the application of fuzzy logic to current trends in reasoning about knowledge and beliefs.
4.1 Possibilistic logic
Zadeh's approach to approximate reasoning can be particularized to offer proper semantics to reasoning with a set of classical propositions equipped with a complete pre-ordering that enables reliable propositions to be distinguished from less reliable ones. Conclusions are all the safer as they are deduced from more reliable pieces of information. The idea of reasoning from sets of (classical) logic formulas stratified in layers corresponding to different levels of confidence is very old. Rescher [1976] proposed a deductive machinery on the basis of the principle that the strength of a conclusion is the strength of the weakest argument used in its proof, pointing out that this idea dates back to Theophrastus (372-287 BC)16. However, Rescher did not provide any semantics for his proposal. The contribution of the possibilistic logic setting is to relate this idea (measuring the validity of an inference chain by its weakest link) to fuzzy set-based necessity measures in the framework of Zadeh [1978a]'s possibility theory, since the following pattern, first pointed out by Prade [1982], then holds:
\[ N(\neg p \vee q) \ge \alpha \text{ and } N(p) \ge \beta \text{ imply } N(q) \ge \min(\alpha, \beta), \]
16 A disciple of Aristotle, who was also a distinguished writer and the creator of the first botanic garden!
where N is a necessity measure; see section 2.2 equation (14). This interpretative setting provides a semantic justification to the claim that the weight attached to a conclusion should be the weakest among the weights attached to the formulas involved in a derivation. Basic formalism Possibilistic logic (Dubois and Prade [1987; 2004]; Dubois, Lang and Prade [2002; 1994b], Lang[1991; 2001]) manipulates propositional or first order logical formulas weighted by lower bounds of necessity measures, or of possibility measures. A first-order possibilistic logic formula is essentially a pair made of a classical first order logic formula and a weight expressing certainty or priority. As already said, in possibilistic logic [Dubois et al., 1994a; Dubois et al., 1994b; Dubois and Prade, 1987], weights of formulas p are interpreted in terms of lower bounds α ∈ (0, 1] of necessity measures, i.e., the possibilistic logic expression (p, α) is understood as N (p) ≥ α, where N is a necessity measure. Constraints of the form Π(p) ≥ α could be also handled in the logic but they correspond to very poor pieces of information [Dubois and Prade, 1990; Lang et al., 1991], while constraint N (p) ≥ α ⇔ Π(¬p) ≤ 1 − α expresses that ¬p is somewhat impossible, which is much more informative. Still, both kinds of constraints can be useful for expressing situations of partial or complete ignorance about p by stating both Π(p) ≥ α and Π(¬p) ≥ α′ and then propagating this ignorance to be able to determine what is somewhat certain and what cannot be such due to acknowledged ignorance (to be distinguished from a simple lack of knowledge when no information appears in the knowledge base). A mixed resolution rule [Dubois and Prade, 1990] N (¬p ∨ q) ≥ α and Π(p ∨ r) ≥ β imply Π(q ∨ r) ≥ β if α > 1 − β (if α ≤ 1 − β, Π(q ∨ r) ≥ 0) is at the basis of the propagation mechanism for lower possibility bound information in a logic of graded possibility and certainty (Lang, Dubois, and Prade [1991]). 
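The two propagation patterns discussed so far, the weakest-link rule for necessity bounds and the mixed resolution rule for a possibility bound, amount to very small computations on the weights. The sketch below only restates them as functions; the function names are hypothetical and the weights are plain numbers in [0, 1].

```python
def necessity_modus_ponens(alpha, beta):
    """From N(not-p OR q) >= alpha and N(p) >= beta, the bound N(q) >= min(alpha, beta)."""
    return min(alpha, beta)

def mixed_resolution(alpha, beta):
    """From N(not-p OR q) >= alpha and Pi(p OR r) >= beta, a lower bound on Pi(q OR r):
    beta if alpha > 1 - beta, otherwise only the trivial bound 0."""
    return beta if alpha > 1 - beta else 0

# Made-up weights, chosen as dyadic numbers so the float arithmetic is exact.
certain_enough = mixed_resolution(0.5, 0.75)   # 0.5 > 1 - 0.75, bound is kept
too_weak = mixed_resolution(0.25, 0.5)         # 0.25 <= 1 - 0.5, bound is lost
```

The second call shows the asymmetry of the mixed rule: when the necessity weight does not exceed 1 − β, nothing informative is propagated.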
In the following, we focus on the fragment of possibilistic logic handling only lower necessity bound information.
Syntax
An axiomatisation of first-order possibilistic logic is provided by Lang [1991]; see also [Dubois et al., 1994a]. In the propositional case, the axioms consist of all propositional axioms with weight 1. The inference rules are:
• {(¬p ∨ q, α), (p, β)} ⊢ (q, min(α, β)) (modus ponens)
• for β ≤ α, (p, α) ⊢ (p, β) (weight weakening),
where ⊢ denotes the syntactic inference of possibilistic logic. The min-decomposability of necessity measures allows us to work with weighted clauses without loss of generality, since N(∧i=1,n pi) ≥ α iff ∀i, N(pi) ≥ α. It means that possibilistic logic expressions of the form (∧i=1,n pi, α) can be interpreted as a set of n formulas (pi, α). In other words, any weighted logical formula put in Conjunctive Normal Form is equivalent to a set of weighted clauses. This feature considerably simplifies the proof theory of possibilistic logic. The basic inference rule in possibilistic logic put in clausal form is the resolution rule:
(¬p ∨ q, α); (p ∨ r, β) ⊢ (q ∨ r, min(α, β)).
Classical resolution is retrieved when all the weights are equal to 1. Other valid inference rules are for instance:
• if p classically entails q, (p, α) ⊢ (q, α) (formula weakening)
• ((∀x)p(x), α) ⊢ (p(s), α) (particularization)
• (p, α); (p, β) ⊢ (p, max(α, β)) (weight fusion).
Observe that since (¬p ∨ p, 1) is an axiom, formula weakening is a particular case of the resolution rule (indeed (p, α); (¬p ∨ p ∨ r, 1) ⊢ (p ∨ r, α)). Formulas of the form (p, 0), which do not contain any information (∀p, N(p) ≥ 0 always holds), are not part of the possibilistic language.
Refutation can be easily extended to possibilistic logic. Let K be a knowledge base made of possibilistic formulas, i.e., K = {(pi, αi)}i=1,n. Proving (p, α) from K amounts to adding (¬p, 1), put in clausal form, to K, and using the above rules repeatedly until getting K ∪ {(¬p, 1)} ⊢ (⊥, α). Clearly, we are interested here in getting the empty clause with the greatest possible weight [Dubois et al., 1987]. It holds that K ⊢ (p, α) if and only if Kα ⊢ p (in the classical sense), where Kα = {p | (p, β) ∈ K, β ≥ α}. Proof methods for possibilistic logic are described by Dubois, Lang and Prade [1994a], Liau and Lin [1993], and Hollunder [1995]. See [Lang, 2001] for algorithms and complexity issues.
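The characterization K ⊢ (p, α) if and only if Kα ⊢ p suggests a direct, if naive, way of computing the best weight with which a formula follows from a base: scan the cuts Kα from the highest weight downwards and test classical entailment. The sketch below does this by truth-table enumeration. It is an illustration only, not one of the proof methods cited above; formulas are represented, by assumption, as Python predicates over a dictionary of atom values, and all names are hypothetical.

```python
from itertools import product

def entails(formulas, goal, atoms):
    """Classical entailment checked over all truth assignments."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(f(world) for f in formulas) and not goal(world):
            return False
    return True

def best_weight(K, goal, atoms):
    """Largest alpha such that the cut K_alpha classically entails goal.
    K: list of (formula, weight) pairs. Returns 1 for tautologies and
    0 when no cut entails the goal."""
    if entails([], goal, atoms):
        return 1
    for alpha in sorted({a for _, a in K}, reverse=True):
        cut = [f for f, a in K if a >= alpha]      # the alpha-cut K_alpha
        if entails(cut, goal, atoms):
            return alpha
    return 0

# Made-up base: (p, 0.8), (not-p OR q, 0.6), (not-q OR r, 0.9).
atoms = ["p", "q", "r"]
K = [
    (lambda w: w["p"], 0.8),
    (lambda w: (not w["p"]) or w["q"], 0.6),
    (lambda w: (not w["q"]) or w["r"], 0.9),
]
weight_of_r = best_weight(K, lambda w: w["r"], atoms)
```

Here r needs all three formulas, so it is obtained with weight 0.6, the weight of the weakest link. If K is inconsistent, every formula is returned with at least the inconsistency level of K, since the corresponding cut entails everything.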
Remarkably enough, the repeated use of the probabilistic counterpart to the possibilistic resolution rule (namely, Prob(¬p ∨ q) ≥ α; Prob(p ∨ r) ≥ β ⊢ Prob(q ∨ r) ≥ max(0, α + β − 1)) is not in general sufficient for obtaining the best lower bound on the probability of a logical consequence, in contrast to the case of possibilistic logic.
An important feature of possibilistic logic is its ability to deal with inconsistency. The level of inconsistency of a possibilistic logic base is defined as
\[ Inc(K) = \max\{\alpha \mid K \vdash (\bot, \alpha)\} \]
where, by convention, max ∅ = 0. More generally, Inc(K) = 0 if and only if K∗ = {pi | (pi, αi) ∈ K} is consistent in the usual sense. Note that this is not true in case αi would represent a lower bound of the probability of pi in a probabilistically weighted logic.
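As a small worked illustration of the inconsistency level (the base below is a made-up example, not taken from the literature cited here), consider three weighted formulas and apply the resolution rule twice:

```latex
\[
K = \{(p,\ 0.8),\ (\neg p \vee q,\ 0.6),\ (\neg q,\ 0.3)\}
\]
\[
(p,\ 0.8);\ (\neg p \vee q,\ 0.6) \vdash (q,\ \min(0.8, 0.6)) = (q,\ 0.6)
\]
\[
(q,\ 0.6);\ (\neg q,\ 0.3) \vdash (\bot,\ \min(0.6, 0.3)) = (\bot,\ 0.3)
\]
\[
Inc(K) = 0.3
\]
```

No derivation of ⊥ can avoid the formula (¬q, 0.3), because {p, ¬p ∨ q} is classically consistent, so 0.3 is indeed the greatest weight attached to the empty clause. Equivalently, the cut Kα is consistent for every α > 0.3 and inconsistent for α ≤ 0.3.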
Semantics
Semantic aspects of possibilistic logic, including soundness and completeness results with respect to the above syntactic inference machinery, are presented in [Lang, 1991; Lang et al., 1991; Dubois et al., 1994b; Dubois et al., 1994a]. From a semantic point of view, a possibilistic knowledge base K = {(pi, αi)}i=1,n is understood as the possibility distribution πK representing the fuzzy set of models of K:
\[ \pi_K(\omega) = \min_{i=1,\dots,n} \max(\mu_{[p_i]}(\omega),\ 1 - \alpha_i) \]
where [pi] denotes the set of models of pi, so that µ[pi](ω) = 1 if ω ∈ [pi] (i.e. ω |= pi), and µ[pi](ω) = 0 otherwise. In the above formula, the degree of possibility of ω is computed as the complement to 1 of the largest weight of a formula falsified by ω. Thus, ω is all the less possible as it falsifies formulas of higher degrees. In particular, if ω is a counter-model of a formula with weight 1, then ω is impossible, i.e. πK(ω) = 0. It can be shown that πK is the largest possibility distribution such that NK(pi) ≥ αi, ∀i = 1, n, i.e., the possibility distribution which allocates the greatest possible possibility degree to each interpretation in agreement with the constraints induced by K (where NK is the necessity measure associated with πK, namely NK(p) = min v∈[¬p] (1 − πK(v))). It may be that NK(pi) > αi for some i, due to logical constraints between formulas in K. The possibilistic closure corrects the ranking of formulas for the sake of logical coherence. Moreover, it can be shown that πK = πK′ if and only if, for any level α, Kα and K′α are logically equivalent in the classical sense. K and K′ are then said to be semantically equivalent.
The semantic entailment is then defined by K |= (p, α) if and only if NK(p) ≥ α, i.e., if and only if ∀ω, πK(ω) ≤ max(µ[p](ω), 1 − α). Besides, it can be shown that Inc(K) = 1 − maxω πK(ω). Soundness and completeness are expressed by
K ⊢ (p, α) ⇔ K |= (p, α).
In this form of possibilistic entailment, final weights attached to all formulas are at least equal to the inconsistency level of the base. The inconsistency-free formulas, which are above this level, entail propositions that have higher weights. Biacino and Gerla [1992] provide an algebraic analysis of possibility and necessity measures generated by this form of inference. The closure of a possibilistic knowledge base is an example of canonical extension of the closure operator of classical logic in the sense of [Gerla, 2001, Chap. 3].
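The semantic side can be sketched just as directly: the distribution πK, the induced necessity measure NK and the inconsistency level Inc(K) = 1 − maxω πK(ω) are all computable by enumerating interpretations. The code below is a brute-force illustration under the assumption that formulas are Python predicates over a dictionary of atom values; the base used is a made-up example and all names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def worlds(atoms):
    """All classical interpretations over the given atoms."""
    return [dict(zip(atoms, v)) for v in product([False, True], repeat=len(atoms))]

def pi_K(K, w):
    """pi_K(w) = min_i max(mu_[p_i](w), 1 - alpha_i): one minus the largest
    weight of a formula falsified by w, and 1 if w falsifies none."""
    return min((1 - a for f, a in K if not f(w)), default=1)

def necessity(K, goal, atoms):
    """N_K(p) = min over counter-models v of p of (1 - pi_K(v))."""
    return min((1 - pi_K(K, w) for w in worlds(atoms) if not goal(w)), default=1)

def inconsistency(K, atoms):
    """Inc(K) = 1 - max over interpretations of pi_K."""
    return 1 - max(pi_K(K, w) for w in worlds(atoms))

# Made-up base: (p, 3/4), (not-p OR q, 1/2), (not-q, 1/4).
atoms = ["p", "q"]
K = [
    (lambda w: w["p"], Fraction(3, 4)),
    (lambda w: (not w["p"]) or w["q"], Fraction(1, 2)),
    (lambda w: not w["q"], Fraction(1, 4)),
]
```

For this base the inconsistency level is 1/4, q is entailed with necessity 1/2 (the minimum of 3/4 and 1/2, as the modus ponens rule predicts), and ¬q is entailed only at the inconsistency level 1/4.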
To summarize, a possibilistic logic base is associated with a fuzzy set of models. This fuzzy set is understood as either the set of more or less plausible states of the world (given the available information), or as the set of more or less satisfactory states, according to whether we are dealing with uncertainty or with preference modeling. Conversely, it can be shown that any fuzzy set F representing a fuzzy piece of knowledge, with a membership function µF defined on a finite set is semantically equivalent to a possibilistic logic base.
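The converse direction can be made concrete on a small finite example. One simple construction, given here as an assumption since the text does not spell out how the base is built, takes for each interpretation ω with membership degree below 1 the formula "the actual world is not ω" with weight 1 − µF(ω). Since that formula is falsified by ω only, the induced distribution coincides with µF. Interpretations are represented as tuples of truth values and formulas as predicates over such tuples; all names are hypothetical.

```python
from fractions import Fraction

def base_from_distribution(mu):
    """mu: {world (tuple of truth values): membership degree}.
    Returns a list of (formula, weight) pairs."""
    K = []
    for omega, degree in mu.items():
        if degree < 1:
            # Formula "the actual world is not omega", falsified by omega only.
            K.append((lambda w, omega=omega: w != omega, 1 - degree))
    return K

def pi_K(K, w):
    """One minus the largest weight of a formula falsified by w (1 if none)."""
    return min((1 - a for f, a in K if not f(w)), default=1)

# Made-up fuzzy set of interpretations over two atoms.
mu = {
    (True, True): Fraction(1),
    (True, False): Fraction(1, 2),
    (False, True): Fraction(1, 4),
    (False, False): Fraction(0),
}
K = base_from_distribution(mu)
recovered = {w: pi_K(K, w) for w in mu}
```

The dictionary `recovered` equals `mu`, so the generated base is semantically equivalent to the original fuzzy set. The construction is far from compact; it is only meant to show why the equivalence is plausible on a finite set.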
Didier Dubois, Francesc Esteva, Lluís Godo and Henri Prade
There is a major difference between possibilistic logic and weighted many-valued logics of Pavelka-style [Pavelka, 1979; Hájek, 1998a], especially fuzzy Prolog languages like Lee's fuzzy clausal logic [Lee, 1972], although they look alike syntactically. Namely, in the latter, a weight t attached to a (many-valued) formula p often acts as a truth-value threshold, and (p, t) in a fuzzy knowledge base expresses the requirement that the truth-value of p should be at least equal to t for (p, t) to be valid. So in such fuzzy logics, while truth is many-valued, the validity of a weighted formula is two-valued. For instance, in Pavelka-like languages, (p, t) can be encoded as t → p, adding a truth-constant t to the language. Using the Rescher-Gaines implication, t → p has validity 1 if p has truth-value at least t, and 0 otherwise; then (p, t) is Boolean. Of course, using another many-valued implication, (p, t) remains many-valued. On the contrary, in possibilistic logic, truth is two-valued (since p is Boolean), but the validity of (p, α) with respect to classical interpretations is many-valued [Dubois and Prade, 2001]. In some sense, weights in Pavelka style may defuzzify many-valued logics, while they fuzzify Boolean formulas in possibilistic logic. Moreover, inferring (p, α) in possibilistic logic can be viewed as inferring p with some certainty, quantified by the weight α, while in standard many-valued logics (i.e. with a standard notion of proof) a formula is either inferred or not [Hájek, 1998a].

Since possibilistic logic bases are semantically equivalent to fuzzy sets of interpretations, it makes sense to use fuzzy set aggregation operations for merging the bases. Pointwise aggregation operations applied to fuzzy sets can also be directly performed at the syntactic level.
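The contrast drawn above between the two readings of a weight can be sketched in a few lines. This is only an illustration with hypothetical function names: the first function evaluates (p, t) as t → p with the Rescher-Gaines implication, and the second evaluates a possibilistic formula (p, α) at a classical interpretation as max(µ[p](ω), 1 − α).

```python
def rescher_gaines(x, y):
    """Rescher-Gaines implication on [0, 1]: 1 if x <= y, else 0."""
    return 1.0 if x <= y else 0.0

def pavelka_style_validity(truth_of_p, t):
    """Validity of (p, t) encoded as t -> p: p is many-valued, validity is two-valued."""
    return rescher_gaines(t, truth_of_p)

def possibilistic_validity(p_holds, alpha):
    """Degree to which a classical interpretation satisfies (p, alpha):
    1 if p holds there, 1 - alpha otherwise, i.e. max(mu_[p], 1 - alpha)."""
    return max(1.0 if p_holds else 0.0, 1.0 - alpha)

# Many-valued truth, Boolean validity (threshold reading of the weight).
print(pavelka_style_validity(0.8, 0.7), pavelka_style_validity(0.6, 0.7))  # 1.0 0.0
# Boolean truth, many-valued validity (certainty reading of the weight).
print(possibilistic_validity(True, 0.75), possibilistic_validity(False, 0.75))  # 1.0 0.25
```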
This idea was first pointed out by Boldrin [1995] (see also [Boldrin and Sossai, 1995]), and generalized [Benferhat et al., 1998] to two possibilistic bases $K_1 = \{(p_i, \alpha_i) \mid i \in I\}$ and $K_2 = \{(q_j, \beta_j) \mid j \in J\}$. It can be, in particular, applied to triangular norm and triangular co-norm operations. Let $\pi_T$ and $\pi_S$ be the result of the combination of $\pi_{K_1}$ and $\pi_{K_2}$ based on a t-norm operation T, and the dual t-conorm operation $S(\alpha, \beta) = 1 - T(1 - \alpha, 1 - \beta)$ respectively. Then, $\pi_T$ and $\pi_S$ are respectively associated with the following possibilistic logic bases:
• $K_T = K_1 \cup K_2 \cup \{(p_i \vee q_j, S(\alpha_i, \beta_j)) \mid (p_i, \alpha_i) \in K_1, (q_j, \beta_j) \in K_2\}$,
• $K_S = \{(p_i \vee q_j, T(\alpha_i, \beta_j)) \mid (p_i, \alpha_i) \in K_1, (q_j, \beta_j) \in K_2\}$.
With T = min, $K_{\min} = K_1 \cup K_2$, in agreement with possibilistic logic semantics. This method also provides a framework where symbolic approaches for fusing classical logic bases [Konieczny and Pino-Pérez, 1998] can be recovered by making explicit the implicit priorities induced from Hamming distances between sets of models [Benferhat et al., 2002a; Konieczny et al., 2002].

Bipolar possibilistic logic

A remarkable variant of possibilistic logic is obtained by no longer interpreting weights as lower bounds of necessity (nor possibility) measures, but as constraints in terms of yet another set function expressing guaranteed possibility. Section 2.2
recalled how a possibility measure Π and a necessity measure N are defined from a possibility distribution π. However, given a (non-contradictory, non-tautological) proposition p, the qualitative information conveyed by π pertaining to p can be assessed not only in terms of possibility and necessity measures, but also in terms of two other functions. Namely, $\Delta(p) = \min_{\omega \in [p]} \pi(\omega)$ and $\nabla(p) = 1 - \Delta(\neg p)$. ∆ is called a guaranteed possibility function [Dubois and Prade, 1992c].¹⁷ Thus a constraint of the form $\Delta(p) \geq \alpha$ expresses the guarantee that all the models of p are possible at least at degree α. This is a form of positive information, which contrasts with constraints of the form $N(p) \geq \alpha$ ($\Leftrightarrow \Pi(\neg p) \leq 1 - \alpha$) that rather express negative information, in the sense that counter-models are then (somewhat) impossible [Dubois et al., 2000].

Starting with a set of constraints of the form $\Delta(p_j) \geq \beta_j$ for $j = 1, \dots, n$, expressing that (all) the models of $p_j$ are guaranteed to be possible at least at level $\beta_j$, and applying a principle of maximal specificity that minimizes possibility degrees, the most informative possibility distribution $\pi_*$ such that the constraints are satisfied is obtained. Note that this principle is the converse of the one used for defining $\pi_K$, and is in the spirit of a closed-world assumption: only what is said to be (somewhat) guaranteed possible is considered as so. Namely

$$\pi_*(\omega) = \max_{j = 1, \dots, n} \min(\mu_{[p_j]}(\omega), \beta_j).$$
By contrast with Π and N, the function ∆ is non-increasing (rather than non-decreasing) w.r.t. logical entailment. Fusion of guaranteed possibility-pieces of information is disjunctive rather than conjunctive (as expressed by $\pi_*$ by contrast with the definition of $\pi_K$). ∆ satisfies the characteristic axiom $\Delta(p \vee q) = \min(\Delta(p), \Delta(q))$, and the basic inference rules, in the propositional case, associated with ∆ are
• [¬p ∧ q, α], [p ∧ r, β] ⊢ [q ∧ r, min(α, β)] (resolution rule)
• if p entails q classically, [q, α] ⊢ [p, α] (formula weakening)
• for β ≤ α, [p, α] ⊢ [p, β] (weight weakening)
• [p, α]; [p, β] ⊢ [p, max(α, β)] (weight fusion)
where [p, α] stands for ∆(p) ≥ α. The first two properties show the reversed behavior of ∆-based formulas w.r.t. usual entailment. Indeed, if all the models of q are guaranteed to be possible, then the same holds for any subset of models, e.g. the models of p, knowing that p entails q. Besides, observe that the formula [p ∧ q, α] is semantically equivalent to [q, min(v(p), α)], where v(p) = 1 if p is true and v(p) = 0 if p is false. This means that p ∧ q is guaranteed to be possible at least to the level α, if q is guaranteed to be possible to this level when p is true.

17. Not to be confused with the Baaz ∆ operator in Section 3.4.
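The guaranteed possibility function and the maximal-specificity distribution introduced above can be checked on a small case. The sketch below assumes a hypothetical two-variable language and two hypothetical pieces of positive information, [a ∧ b, 0.75] and [a, 0.5]; the names `pi_lower` and `delta` are illustrative only.

```python
from itertools import product

ATOMS = ("a", "b")  # hypothetical propositional variables

def interpretations(atoms=ATOMS):
    """All classical interpretations, each as a dict atom -> bool."""
    return [dict(zip(atoms, values))
            for values in product([True, False], repeat=len(atoms))]

def pi_lower(cases, omega):
    """pi_*(omega) = max over j of min(mu_[p_j](omega), beta_j).
    With Boolean p_j this is the largest beta_j among the p_j satisfied by omega."""
    return max((beta for formula, beta in cases if formula(omega)), default=0.0)

def delta(pi, p):
    """Delta(p) = min of pi over the models of p (p assumed non-contradictory)."""
    return min(pi(w) for w in interpretations() if p(w))

# Hypothetical positive information: [a and b, 0.75], [a, 0.5]
cases = [
    (lambda w: w["a"] and w["b"], 0.75),
    (lambda w: w["a"], 0.5),
]
pi = lambda w: pi_lower(cases, w)
a = lambda w: w["a"]
b = lambda w: w["b"]
a_and_b = lambda w: w["a"] and w["b"]
a_or_b = lambda w: w["a"] or w["b"]

print(delta(pi, a), delta(pi, a_and_b))  # 0.5 0.75
print(delta(pi, a_or_b), min(delta(pi, a), delta(pi, b)))  # 0.0 0.0
```

The first line of output illustrates that ∆ is non-increasing w.r.t. entailment (a ∧ b entails a, yet ∆(a ∧ b) ≥ ∆(a)); the second illustrates the characteristic axiom for disjunction on this example.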
This remark can be used in hypothetical reasoning, as in the case of standard possibilistic formulas. So, ∆-based formulas behave in a way that is very different from, and in some sense opposite to, that of standard (N-based) formulas (since the function ∆ is non-increasing). When dealing with uncertainty, this leads to a twofold representation setting distinguishing between
• what is not impossible because not ruled out by our beliefs; this is captured by constraints of the form $N(p_i) \geq \alpha_i$ associated with a possibility distribution $\pi^*$ expressing the semantics of a standard possibilistic knowledge base,
• and what is known as feasible because it has been observed; this is expressed by constraints of the form $\Delta(q_j) \geq \beta_j$ associated with $\pi_*$.
In other words, it offers a framework for reasoning with rules and cases (or examples) in a joint manner. Clearly, some consistency between the two types of information (what is guaranteed possible cannot be ruled out as impossible) should prevail, namely $\forall \omega,\ \pi_*(\omega) \leq \pi^*(\omega)$, and should be maintained through fusion and revision processes [Dubois et al., 2001].

The idea of a separate treatment of positive information and negative information has also been proposed by Atanassov [1986; 1999], who introduces the so-called intuitionistic fuzzy sets¹⁸ as a pair of membership and non-membership functions constrained by a direct counterpart of the above inequality (viewing $1 - \pi^*$ as a non-membership function). However, apart from the troublesome use of the word 'intuitionistic' here, the logic of intuitionistic fuzzy sets (developed at the semantic level) strongly differs from bipolar possibilistic logic. See [Dubois et al., 2005b] for a discussion. A proposal related to Atanassov's approach, and still different from bipolar possibilistic logic (in spite of its name), can be found in [Zhang and Zhang, 2004].
Possibilistic logic can be used as a framework for qualitative reasoning about preference [Liau, 1999; Benferhat et al., 2001; Dubois et al., 1999a]. When modeling preferences, bipolarity enables us to distinguish between positive desires encoded using ∆, and negative desires (states that are rejected) where N-based constraints describe states that are not unacceptable [Benferhat et al., 2002b]. Deontic reasoning can also be captured by possibilistic logic as shown by Liau [1999]. Namely, necessity measures encode obligation and possibility measures model implicit permission. Dubois et al. [2000] have pointed out that ∆ functions may account for explicit permission.
18. This is a misleading terminology, as the underlying algebra does not obey the properties of intuitionistic logic; see [Dubois et al., 2005b].

4.2 Extensions of possibilistic logic

Possibilistic logic is amenable to different extensions. A first idea is to exploit refined or generalized scales, or else to allow weights to have unknown or variable values, while preserving classical logic formulas and weights interpreted in terms of necessity measures. Variable weights enable a form of hypothetical reasoning to be captured, as well as accounting for some kinds of fuzzy rules, as we shall see.

Lattice-valued possibilistic logics

The totally ordered scale used in possibilistic logic can be replaced by a complete distributive lattice. Examples of the interest of such a construct include:
• multiple-source possibilistic logic [Dubois et al., 1992], where weights are replaced by fuzzy sets of sources that more or less certainly support the truth of formulas;
• timed possibilistic logic [Dubois et al., 1991b], where weights are fuzzy sets of time points where formulas are known as being true with some time-dependent certainty levels;
• a logic of supporters [Lafage et al., 2000], where weights are sets of irredundant subsets of assumptions that support formulas.
A formal study of logics where formulas are associated with general "weights" in a complete lattice has been carried out by Lehmke [2001b]. Necessity values attached to formulas can be encoded as a particular case of such "weights". More generally, a partially ordered extension of possibilistic logic whose semantic counterpart consists of partially ordered models has been recently proposed by Benferhat, Lagrue and Papini [2004b]. A recent extension [Dubois and Prade, 2006] of possibilistic logic allows a calculus where formulas, which can be nested, encode the beliefs of different agents and their mutual beliefs. One can for instance express that all the agents in a group have some beliefs, or that there is at least one agent in a group that has a particular belief, where beliefs may be more or less entrenched.
Symbolic weights

Rather than dealing with weights in a partially ordered structure, one may consider weights belonging to a linearly ordered structure, but handled in a symbolic manner, in such a way that the information that some formulas are known to be more certain than others (or equally certain as others) can be represented by constraints on the weights. This may be useful in particular in case of multiple source knowledge. This idea, already present in Benferhat et al. [1998] (where constraints encode a partial order on the set of sources), has been more recently reconsidered by encoding the constraints as propositional formulas and rewriting the propositional possibilistic logic knowledge base in a two-sorted propositional logic [Benferhat et al., 2004a]. The principle is to translate (p, α) into p ∨ A (understood as "p is true or situation is A-abnormal") and α ≤ β into ¬B ∨ A (a statement is all the more certain, as it is more abnormal to have it false, and
strong abnormality implies weaker abnormality). This view has proved fruitful, leading to efficient compilation techniques both when the constraints partially order the weights [Benferhat and Prade, 2005] and when they linearly order them, as in standard possibilistic logic [Benferhat and Prade, 2006].

Variable weights and fuzzy constants

It has been noticed that subparts of classical logic formulas may be 'moved' to the weight part of a possibilistic logic formula. For instance, the possibilistic formula $(\neg p(x) \vee q(x), \alpha)$ is semantically equivalent to $(q(x), \min(\mu_P(x), \alpha))$, where $\mu_P(x) = 1$ if p(x) is true and $\mu_P(x) = 0$ if p(x) is false. It expresses that q(x) is α-certainly true given the proviso that p(x) is true. This is the basis of the use of possibilistic logic in hypothetical reasoning [Dubois et al., 1991a] and case by case reasoning [Dubois and Prade, 1996b], which enables us to compute under what conditions a conclusion could be at least somewhat certain, when information is missing for establishing it unconditionally.

Such variable weights can also be useful for fuzzifying the scope of a universal quantifier. Namely, an expression such as $(\neg p(x) \vee q(x), \alpha)$ can be read "∀x ∈ P, (q(x), α)", where the set P = {x | p(x) is true}. Going one step further, P can be allowed to be fuzzy [Dubois et al., 1994c]. The formula $(q(x), \mu_P(x))$ then expresses a piece of information of the form "the more x is P, the more certain q(x) is true".

A fuzzy restriction on the scope of an existential quantifier can also be introduced in the following way [Dubois et al., 1998]. From the two classical first order logic premises "∀x ∈ A, ¬p(x, y) ∨ q(x, y)" and "∃x ∈ B, p(x, c)", where c is a constant, we can conclude that "∃x ∈ B, q(x, c)" provided that B ⊆ A. Let p(B, c) stand for "∃x ∈ B, p(x, c)". Then B can be called an imprecise constant.
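The semantic equivalence between $(\neg p(x) \vee q(x), \alpha)$ and $(q(x), \min(\mu_P(x), \alpha))$ stated above can be checked exhaustively for a fixed x. The sketch below is only an illustration: it assumes the usual satisfaction degree max(µ, 1 − weight) of a weighted formula at a classical interpretation, and the function names are hypothetical.

```python
def satisfaction(formula_holds, weight):
    """Degree to which an interpretation satisfies a weighted formula:
    1 if the formula holds, 1 - weight otherwise."""
    return max(1.0 if formula_holds else 0.0, 1.0 - weight)

def clause_form(p, q, alpha):
    """Satisfaction degree of (not p or q, alpha)."""
    return satisfaction((not p) or q, alpha)

def moved_form(p, q, alpha):
    """Satisfaction degree of (q, min(mu_P, alpha)), with mu_P = 1 if p holds, else 0."""
    mu_P = 1.0 if p else 0.0
    return satisfaction(q, min(mu_P, alpha))

alpha = 0.75  # hypothetical certainty level
for p in (True, False):
    for q in (True, False):
        # Both forms agree on every truth assignment to p(x) and q(x).
        print(p, q, clause_form(p, q, alpha), moved_form(p, q, alpha))
```

When p(x) is false the variable weight collapses to 0, so the moved form imposes no constraint, exactly as the clause ¬p(x) ∨ q(x) is then trivially satisfied.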
Letting A and B be fuzzy sets, the following pattern can be established:

$$(\neg p(x, y) \vee q(x, y), \min(\mu_A(x), \alpha));\ (p(B, c), \beta) \vdash (q(B, c), \min(N_B(A), \alpha, \beta)),$$

where $N_B(A) = \inf_t \max(\mu_A(t), 1 - \mu_B(t))$ is the necessity measure of the fuzzy event A based on fuzzy information B, and it can be seen as a (partial) degree of unification of A given B. See [Alsinet et al., 1999; Alsinet, 2001; Alsinet et al., 2002] for a further development and logical formalization of these ideas in a logic programming framework. In particular, in that context the above pattern can be turned into a sound rule by replacing B by the cut $B_\beta$ in $N_B(A)$. A complete proof procedure based on a similar resolution rule dealing only with fuzzy constants has been defined [Alsinet and Godo, 2000; Alsinet and Godo, 2001]. This framework has been recently extended to incorporate elements of argumentation theory in order to deal with conflicting information [Chesñevar et al., 2004; Alsinet et al., 2006].

Embedding possibilistic logic in a non-classical logic

Another type of extension consists in embedding possibilistic logic in a wider object language, adding new connectives between possibilistic formulas. In particular, it is
possible to cast possibilistic logic inside a (regular) many-valued logic such as Gödel or Łukasiewicz logic. The idea is to consider many-valued atomic sentences ϕ of the form (p, α), where p is a formula in classical logic. Then, one can define well-formed formulas of the form ϕ ∨ ψ, ϕ ∧ ψ, ϕ → ψ, etc., where the "external" connectives linking ϕ and ψ are those of the chosen many-valued logic. From this point of view, possibilistic logic can be viewed as a fragment of Gödel or Łukasiewicz logic that uses only one external connective: conjunction ∧ interpreted as minimum. This approach involving a Boolean algebra embedded in a non-classical one has been proposed by Boldrin and Sossai [1997; 1999] with a view to augmenting possibilistic logic with fusion modes cast at the object level. Hájek et al. [1995] use this method for both probability and possibility theories, thus understanding the probability or the necessity of a classical formula as the truth degree of another formula. This kind of embedding inside a fuzzy logic works for other uncertainty logics as well, as explained in Section 4.5.

Lastly, possibilistic logic can be cast in the framework of modal logic. Modal accounts of qualitative possibility theory involving conditional statements were already proposed by Lewis [1973a] (this is called the VN conditional logic, see [Dubois and Prade, 1998a; Fariñas and Herzig, 1991]). Other embeddings of possibilistic logic in modal logic are described in [Boutilier, 1994; Hájek, 1994; Hájek et al., 1994].

Possibilistic extensions of non-classical logics

One may consider counterparts to possibilistic logic for non-classical logics, such as many-valued logics. A many-valued logic is cast in the setting of possibility theory by changing the classical logic formula p present in the possibilistic logic formula (p, α) into a many-valued formula, in Gödel or Łukasiewicz logic, for instance.
Now (p, α) is interpreted as C(p) ≥ α, where C(p) is the degree of necessity of a fuzzy event as proposed by Dubois and Prade [1990] (see Section 2.3). Alsinet and Godo [Alsinet, 2001; Alsinet and Godo, 2000] cast possibilistic logic in the framework of Gödel many-valued logic. A possibilistic many-valued formula can also be obtained in first-order logic by making a fuzzy restriction of the scope of an existential quantifier pertaining to a standard first order possibilistic formula, as seen above.

Besnard and Lang [1994] have proposed a possibilistic extension of paraconsistent logic in the same spirit. Quasi-possibilistic logic [Dubois, Konieczny, and Prade, 2003a] encompasses both possibilistic logic and quasi-classical logic (a paraconsistent logic due to Besnard and Hunter [1995]; see also [Hunter, 2002]). These two logics cope with inconsistency in different ways, while both preserve the main features of classical logic. Thus, quasi-possibilistic logic preserves their respective merits, and can handle plain conflicts taking place at the same level of certainty (as in quasi-classical logic), while it takes advantage of the stratification of the knowledge base into certainty layers for introducing gradedness in conflict analysis (as in possibilistic logic).
Lehmke [2001a; 2001b] has tried to cast Pavelka-style fuzzy logics and possibilistic logic inside the same framework, considering weighted many-valued formulas of the form (p, τ), where p is a many-valued formula with truth set T, and τ is a "label" defined as a monotone mapping from the truth-set T to a validity set L. T and L are supposed to be complete lattices, and the set of labels has properties that make it a fuzzy extension of a filter in $L^T$. Labels encompass what Zadeh [1975a] called "fuzzy truth-values" of the form "very true", "more or less true". They are continuous increasing mappings from T = [0, 1] to L = [0, 1] such that τ(1) = 1. A (many-valued) interpretation Val, associating a truth-value θ ∈ T to a formula p, satisfies (p, τ) to degree λ ∈ L whenever τ(θ) = λ. When T = [0, 1], L = {0, 1}, and τ(θ) = 1 for θ ≥ t and 0 otherwise, then (p, τ) can be viewed as a weighted formula in some Pavelka-style logic. When T = {0, 1}, L = [0, 1], and τ(θ) = 1 − α for θ = 0 and 1 for θ = 1, then (p, τ) can be viewed as a weighted formula in possibilistic logic. Lehmke [2001a] has laid the foundations for developing such labelled fuzzy logics, which can express uncertainty about (many-valued) truth in a graded way. It encompasses proposals of Esteva et al. [1994], who suggested that attaching a certainty weight α to a fuzzy proposition p can be modeled by means of a labelled formula (p, τ), where τ(θ) = max(1 − α, θ), in agreement with semantic intuitions formalized in [Dubois and Prade, 1990]. This type of generalization highlights the difference between many-valued and possibilistic logics.

Refining possibilistic inference

A last kind of extension consists in keeping the language and the semantics of possibilistic logics, while altering the inference relation with a view to making it more productive. Such inference relations that tolerate inconsistency can be defined at the syntactic level [Benferhat et al., 1999].
Besides, proof-paths leading to conclusions can be evaluated by more refined strategies than just their weakest links [Dubois and Prade, 2004].
4.3 Possibilistic nonmonotonic inference

A nonmonotonic inference notion can be defined in possibilistic logic as $K \vdash_{pref} p$ if and only if $K \vdash (p, \alpha)$ with $\alpha > Inc(K)$. It can be rewritten as $K^{cons} \vdash (p, \alpha)$, where $K^{cons} = K \setminus \{(p_i, \alpha_i) \mid \alpha_i \leq Inc(K)\}$ is the set of weighted formulas whose weights are above the level of inconsistency (they are thus not involved in the inconsistency). Indeed, $Inc(K^{cons}) = 0$. This inference is nonmonotonic: since the inconsistency level cannot decrease when K is augmented, $K \vdash_{pref} p$ may not imply $K \cup \{(q, 1)\} \vdash_{pref} p$.

The semantic counterpart to the preferential nonmonotonic inference $K \vdash_{pref} p$ (that is, $K \vdash (p, \alpha)$ with $\alpha > Inc(K)$) is defined as $K \models_{pref} p$ if and only if $N_K(p) > Inc(K)$, where $N_K$ derives from the possibility distribution $\pi_K$ that describes the fuzzy set of models of K. The set $\{\omega \mid \pi_K(\omega) \text{ is maximal}\}$ forms the set of best models B(K) of K. It turns out that $K \models_{pref} p$ if and only if $B(K) \subseteq [p]$ if and only if $K \vdash_{pref} p$. It can be shown that $B(K) \subseteq [p]$ is
equivalent to $\Pi_K(p) > \Pi_K(\neg p)$, where $\Pi_K$ is the possibility measure defined from $\pi_K$ [Dubois and Prade, 1991c]. Similarly, $K \cup \{(p, 1)\} \models_{pref} q$ is equivalent to $\Pi_K(p \wedge q) > \Pi_K(p \wedge \neg q)$. The latter corresponds to the idea of inferring a belief q from a contingent proposition p in the context of some background knowledge described by $\pi_K$ (encoded in K), which we denote $p \models_{\pi_K} q$.

Conversely, a constraint of the form $\Pi(p \wedge q) > \Pi(p \wedge \neg q)$ is a proper encoding of a default rule expressing that in context p, having q true is the normal course of things. Then a knowledge base made of a set of default rules is associated with a set of such constraints that induces a family (possibly empty in case of inconsistency) of possibility measures. Two types of nonmonotonic entailments can then be defined (see [Benferhat et al., 1992; Benferhat et al., 1997; Dubois and Prade, 1995] for details):
1. the above preferential entailment $\models_\pi$ based on the unique possibility distribution π obeying the above constraints (it leads to an easy encoding of default rules as possibilistic logic formulas);
2. a more cautious entailment, if we restrict to beliefs inferred from all possibility measures obeying the above constraints.

Clearly $p \models_\pi q$ means that when only p is known to be true, q is an expected, normal conclusion, since q is true in all the most plausible situations where p is true. This type of inference contrasts with the similarity-based inference of Section 4.4, since in the latter the set of models of q is enlarged so as to encompass the models of p, while in possibilistic entailment, the set of models of p is restricted to the best ones. Preferential possibilistic entailment $\models_\pi$ satisfies the following properties that characterize nonmonotonic consequence relations |∼:

Restricted Reflexivity: p |∼ p, if ⊭ p ≡ ⊥
Consistency Preservation: p |≁ ⊥
Left logical equivalence: if |= p ≡ p′, from p |∼ q deduce p′ |∼ q
Right weakening: from q |= q′ and p |∼ q deduce p |∼ q′
Closure under conjunction: from p |∼ q and p |∼ r deduce p |∼ q ∧ r
OR: from p |∼ r and q |∼ r deduce p ∨ q |∼ r
Rational monotony: from p |∼ r and p |≁ ¬q deduce p ∧ q |∼ r
Cut: from p ∧ q |∼ r and p |∼ q deduce p |∼ r.
Except for the first two properties (replaced by a mere reflexivity axiom), these are the properties of the so-called rational inference of Lehmann and Magidor [1992]. Let us explain some of these axioms. Restricted reflexivity just excludes the assumption that everything follows by default from a contradiction. Consistency preservation ensures the consistency of lines of reasoning from consistent arguments. Right weakening and closure under conjunction ensure that the set of plausible consequences of p is a deductively closed set. The OR rule copes with reasoning by cases. Rational monotony controls the amount of monotonicity of the possibilistic inference: from $p \models_\pi r$ we can continue concluding r if q is also
true, provided that it does not hold that, in the context p, ¬q is expected. The cut rule is a weak form of transitivity.

Liau and Lin [1996] have augmented possibilistic logic with weighted conditionals of the form $p \xrightarrow{c} q$ and $p \xrightarrow{c^+} q$ that encode Dempster's rule of conditioning ($\Pi(q \mid p) = \Pi(p \wedge q)/\Pi(p)$), and correspond to constraints $\Pi(p \wedge q) \geq c \cdot \Pi(p)$ and $\Pi(p \wedge q) > c \cdot \Pi(p)$ respectively, with c being a coefficient in the unit interval. Liau [1998] considers more general conditionals where a t-norm is used instead of the product. Note that if p = ⊤ (tautology), then $\top \xrightarrow{c} q$ and $\neg(\top \xrightarrow{(1-c)^+} \neg q)$ stand for $\Pi(q) \geq c$ and $N(q) \geq c$ respectively. This augmented possibilistic logic enables various forms of reasoning to be captured, such as similarity-based and default reasoning, as surveyed in [Liau and Lin, 1996].
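The preferential entailment of this section, in the form $\Pi_K(p \wedge q) > \Pi_K(p \wedge \neg q)$, can be illustrated on a small default base. The sketch below is an assumption-laden toy: the vocabulary (bird, penguin, flies), the weights, and the function names are all hypothetical and chosen only to exhibit nonmonotonic behaviour.

```python
from itertools import product

ATOMS = ("bird", "penguin", "flies")  # hypothetical vocabulary

def interpretations(atoms=ATOMS):
    """All classical interpretations, each as a dict atom -> bool."""
    return [dict(zip(atoms, values))
            for values in product([True, False], repeat=len(atoms))]

def pi_K(base, omega):
    """1 minus the largest weight of a formula of the base falsified by omega."""
    falsified = [alpha for formula, alpha in base if not formula(omega)]
    return 1.0 - max(falsified, default=0.0)

def possibility(base, p):
    """Pi_K(p) = max of pi_K over the models of p (0 if p has no model)."""
    return max((pi_K(base, w) for w in interpretations() if p(w)), default=0.0)

def infers(base, p, q):
    """p |=_pi q iff Pi_K(p and q) > Pi_K(p and not q)."""
    return (possibility(base, lambda w: p(w) and q(w))
            > possibility(base, lambda w: p(w) and not q(w)))

# Hypothetical stratified base: birds fly (0.5), penguins are birds (1),
# penguins do not fly (0.75).
K = [
    (lambda w: (not w["bird"]) or w["flies"], 0.5),
    (lambda w: (not w["penguin"]) or w["bird"], 1.0),
    (lambda w: (not w["penguin"]) or (not w["flies"]), 0.75),
]
bird = lambda w: w["bird"]
flies = lambda w: w["flies"]
not_flies = lambda w: not w["flies"]
bird_and_penguin = lambda w: w["bird"] and w["penguin"]

print(infers(K, bird, flies))                  # True
print(infers(K, bird_and_penguin, flies))      # False
print(infers(K, bird_and_penguin, not_flies))  # True
```

Strengthening the context from "bird" to "bird and penguin" withdraws the conclusion "flies", because the most plausible models of the stronger context are different ones.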
4.4 Deductive Similarity Reasoning

The question raised by interpolative reasoning is how to devise a logic of similarity, where inference rules can account for the proximity between interpretations of the language. This kind of investigation was started by Ruspini [1991] with a view to casting fuzzy patterns of inference such as the generalized modus ponens of Zadeh into a logical setting, and pursued by Esteva et al. [1994]. Indeed, in the scope of similarity modeling, a form of generalized modus ponens can be expressed informally as follows,

    p is close to being true
    p approximately implies q
    ------------------------------
    q is not far from being true

where "close", "approximately", and "not far" refer to a similarity relation S, while p and q are classical propositions. The universe of discourse Ω serves as a framework for modeling the meaning of classical propositions $p_1, p_2, \dots, p_n$ in a formal language L, by means of constraints on a set of interpretations Ω. Interpretations are complete descriptions of the world in terms of this language, and assign a truth-value to each propositional variable. Let [p] denote the set of models of proposition p, i.e., the set of interpretations which make p true. If ω is a model of p, this is denoted ω |= p. The set of interpretations Ω is thus equipped with a similarity relation S, that is, a reflexive, symmetric and t-norm-transitive fuzzy relation. The latter property means that there is a triangular norm T such that $\forall \omega, \omega', \omega'',\ T(S(\omega, \omega'), S(\omega', \omega'')) \leq S(\omega, \omega'')$. For any subset A of Ω, a fuzzy set $A^*$ can be defined by
$$A^*(\omega) = \sup_{\omega' \in A} S(\omega, \omega') \qquad (24)$$
where $S(\omega, \omega')$ is the degree of similarity between ω and ω′. $A^*$ is the fuzzy set of elements close to A. Then proposition p can be fuzzified into another proposition $p^*$ which means "approximately p" and whose fuzzy set of models is $[p^*] = [p]^*$ as defined by (24). Clearly, a logic dealing with propositions of the form $p^*$ is a fuzzy logic in the sense of a many-valued logic, whose truth-value set is the range of $S(\omega, \omega')$, for instance [0, 1]. The satisfaction relation is graded and denoted $\models^\alpha$, namely,

$\omega \models^\alpha p$ iff there exists a model ω′ of p which is α-similar to ω,

in other words, iff $[p^*](\omega) \geq \alpha$, i.e., ω belongs to the α-cut of $[p^*]$, which will be denoted by $[p^*]_\alpha$.

One might be tempted to define a multiple-valued logic of similarity. Unfortunately it cannot be truth-functional. Namely, given S, truth evaluations $v_\omega$ defined as $v_\omega(p) = [p^*](\omega)$, associated with the interpretation ω, are truth-functional neither for the negation nor for the conjunction. Indeed, in general, $[p \wedge q]^*(\omega)$ is not a function of $[p^*](\omega)$ and $[q^*](\omega)$ only. This feature can be observed even if S is a standard equivalence relation. Indeed, for A ⊆ Ω, $A^* = S \circ A$ is the union of equivalence classes of elements belonging to A, i.e., it is the upper approximation of A in the sense of rough set theory [Pawlak, 1991], and it is well known that $[A \cap B]^* \subseteq [A]^* \cap [B]^*$ and no equality is obtained (e.g., when $A \cap B = \emptyset$, but $[A]^* \cap [B]^* \neq \emptyset$). This fact stresses the difference between similarity logic and other truth-functional fuzzy logics. The reason is that here all fuzzy propositions are interpreted in the light of a single similarity relation, so that there are in some sense fewer fuzzy propositions here than in more standard many-valued calculi. Similarity logic is more constrained, since the set of fuzzy subsets $\{[p]^* : p \in L\}$ of Ω induced by classical propositions of the language L is in a one-to-one correspondence with a Boolean algebra (associated with L), and is only a proper subset of the set $[0, 1]^\Omega$ of all fuzzy subsets of Ω. However it holds that $[A \cup B]^* = [A]^* \cup [B]^*$.

The graded satisfaction relation can be extended to a graded semantic entailment relation: a proposition p entails a proposition q at degree α, written $p \models^\alpha q$, if each model of p makes $q^*$ at least α-true, where $q^*$ is obtained by means of a T-transitive fuzzy relation S [Dubois et al., 1997a]. That is, $p \models^\alpha q$ holds iff $[p] \subseteq [q^*]_\alpha$. $p \models^\alpha q$ means "p entails q, approximately" and α is a level of strength. The properties of this entailment relation are:
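Nestedness: if $p \models^\alpha q$ and β ≤ α then $p \models^\beta q$;
T-Transitivity: if $p \models^\alpha r$ and $r \models^\beta q$ then $p \models^{T(\alpha, \beta)} q$;
Reflexivity: $p \models^1 p$;
Right weakening: if $p \models^\alpha q$ and q |= r then $p \models^\alpha r$;
Left strengthening: if p |= r and $r \models^\alpha q$ then $p \models^\alpha q$;
Left OR: $p \vee r \models^\alpha q$ iff $p \models^\alpha q$ and $r \models^\alpha q$;
Right OR: if r has a single model, $r \models^\alpha p \vee q$ iff $r \models^\alpha p$ or $r \models^\alpha q$.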
The fourth and fifth properties are consequences of the transitivity property (since q |= r entails $q \models^1 r$ due to $[q] \subseteq [r] \subseteq [r^*]_1$). They express a form of monotonicity. The transitivity property is weaker than usual, and the graceful degradation of the strength of entailment it expresses, when T = min, is rather natural. It must be noticed that $\models^\alpha$ does not satisfy the Right And property, i.e., from $p \models^\alpha q$ and $p \models^\alpha r$ it does not follow in general that $p \models^\alpha q \wedge r$. Hence the set of approximate consequences of p in the sense of $\models^\alpha$ will not be deductively closed. The left OR is necessary to handle disjunctive information, and the right OR is a consequence of the decomposability w.r.t. the ∨ connective in similarity logic. Characterizations of the similarity-based graded entailment in terms of the above properties, as well as of two other related entailments, are given in [Dubois et al., 1997a].

The idea of approximate entailment can also incorporate "background knowledge" in the form of some proposition K. Namely, [Dubois et al., 1997a] propose another entailment relation defined as $p \models^\alpha_K q$ iff $[K] \subseteq ([p^*] \to [q^*])_\alpha$, where → is the R-implication associated with the triangular norm T, and $[p^*] \to [q^*]$ expresses a form of gradual rule "the closer to the truth of p, the closer to the truth of q". Then, using both $\models^\alpha$ and $\models^\alpha_K$, a deductive notion of interpolation based on gradual rules, as described in Section 2.5, can be captured inside a logical setting. The relations and the differences between similarity-based logics and possibilistic logic are discussed in [Esteva et al., 1994] and in [Dubois and Prade, 1998b].

The presence of a similarity relation on the set of interpretations suggests a modal logic setting for similarity-based reasoning where each level cut $S_\alpha$ of S is an accessibility relation. In particular, $p \models^\alpha q$ can be encoded as $p \models \Diamond_\alpha q$, where $\Diamond_\alpha$ is the possibility modality induced by $S_\alpha$.
Such a multimodal logic setting is systematically developed by Esteva et al. [1997b]. Finally, let us mention that a different approach to similarity-based reasoning, with application to the framework of logic programming, has been formally developed in [Ying, 1994; Gerla and Sessa, 1999; Biacino et al., 2000; Formato et al., 2000]. The idea is to extend the classical unification procedure in classical first order logic by allowing partial degrees of matching between predicate and constants that are declared a priori to be similar to some extent. A comparison between both approaches can be found in [Esteva et al., 2001a].
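The similarity-based notions of this section can be tried out on a finite universe. The following sketch assumes a hypothetical four-element set of interpretations and a hypothetical min-transitive similarity relation S; propositions are represented directly by their sets of models, and all names are illustrative.

```python
WORLDS = (0, 1, 2, 3)  # hypothetical set of interpretations

def S(w1, w2):
    """Hypothetical similarity: worlds in the same block ({0,1} or {2,3})
    are similar to degree 0.75, worlds in different blocks to degree 0.25."""
    if w1 == w2:
        return 1.0
    return 0.75 if w1 // 2 == w2 // 2 else 0.25

def is_min_transitive():
    """Check T(S(x,y), S(y,z)) <= S(x,z) for T = min."""
    return all(min(S(x, y), S(y, z)) <= S(x, z)
               for x in WORLDS for y in WORLDS for z in WORLDS)

def closure(A, w):
    """A*(w) = sup over w' in A of S(w, w'), as in (24); 0 for empty A."""
    return max((S(w, v) for v in A), default=0.0)

def entails_at(P, Q, alpha):
    """p |=^alpha q iff every model of p lies in the alpha-cut of [q*]."""
    return all(closure(Q, w) >= alpha for w in P)

P, Q, R = {0}, {1, 2}, {0, 2}  # hypothetical model sets of p, q, r

print(is_min_transitive())                                  # True
print(entails_at(P, Q, 0.75), entails_at(P, R, 0.75))       # True True
print(entails_at(P, Q & R, 0.75), entails_at(P, Q & R, 0.25))  # False True
```

The last line shows the failure of the Right And property on this example: p entails q and r at level 0.75, but q ∧ r only at level 0.25. Closure still distributes over union, since `closure(Q | R, w)` equals the maximum of `closure(Q, w)` and `closure(R, w)` for every world.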
4.5 Fuzzy logic theories to reason under uncertainty
Although fuzzy logic is not a logic of uncertainty per se, as stressed in Sections 1 and 2, a fuzzy logic apparatus can indeed be used in a non-standard (i.e. non-truth-functional) way to represent and reason with probability or other uncertainty measures. This is the case, for instance, of the approach developed by Gerla [1994b]. Roughly speaking, Gerla devises a probability logic by defining a suitable fuzzy consequence operator C, in the sense of Pavelka (see Section 3.8), on fuzzy sets v of the set B of classical formulas (modulo classical equivalence) in a given language, where the membership degree v(p) of a proposition p is understood as a lower bound on its probability. A (finitely additive) probability w on B is a fuzzy set (or theory) that is complete, i.e. that fulfils w(p) + w(¬p) = 1 for each p ∈ B. Models of a fuzzy set v are probabilities w such that v ≤ w (i.e. v(p) ≤ w(p) for each p). The probabilistic theory C(v) generated by v is the greatest lower bound of the probabilities greater than or equal to v. Gerla then defines a fuzzy deduction operator D, based on some inference rules for dealing with probability envelopes (called the h-m-k-rules and the h-m-collapsing rules), and shows that C and D coincide, which gives the probabilistic completeness of the system.

In a series of works starting with [Hájek et al., 1995], a different logical approach to reasoning about uncertainty has been developed, one that is able to combine notions from different classical uncertainty measures (probability, necessity/possibility and belief functions) with elements of t-norm based fuzzy logics. The basic observation is that “uncertainty” or belief is itself a gradual notion: a proposition may be totally, quite, more or less, or slightly certain (in the sense of probable, possible, believable, plausible, etc.).
For instance, in the case of probability, one just starts with Boolean formulas ϕ and a probability on them; then there is nothing wrong in taking as the truth degree of the fuzzy proposition Pϕ := “ϕ is probable” just the probability degree of the crisp proposition ϕ. Technically speaking, the approach boils down to considering the identity

probability degree of ϕ = truth degree of Pϕ,

where P is a (fuzzy) modality with the intended reading: Pϕ stands for the fuzzy proposition “ϕ is probable”. Notice that such an approach clearly distinguishes between assertions like “(ϕ is probable) and (ψ is probable)” on the one hand and “(ϕ ∧ ψ) is probable” on the other. This is the basic idea presented in [Hájek et al., 1995] and later refined by Hájek in [Hájek, 1998a]. Taking Łukasiewicz logic Ł as base logic, this is done by first enlarging the language of Ł by means of a unary (fuzzy) modality P for probably, defining two kinds of formulas:
- classical Boolean formulas: ϕ, ψ, . . . (which are definable in Ł), and
- modal formulas: for each Boolean formula ϕ, P(ϕ) is an atomic modal formula and, moreover, the class of modal formulas, MF, is taken to be closed under the connectives of Łukasiewicz logic →Ł and ¬Ł,
and then by defining a set of axioms and an inference rule reflecting those of a probability measure, namely:
(FP1) P(¬ϕ ∨ ψ) →Ł (Pϕ →Ł Pψ),
(FP2) P(¬ϕ) ≡Ł ¬Ł Pϕ,
(FP3) P(ϕ ∨ ψ) ≡Ł ((Pϕ →Ł P(ϕ ∧ ψ)) →Ł Pψ),
and the necessitation rule for P: from ϕ infer P(ϕ), for any Boolean formula ϕ.

The resulting fuzzy probability logic, FP(Ł), is sound and (finitely) strongly complete [Hájek, 1998a] with respect to the intended probabilistic semantics given by the class of probabilistic Kripke models. These are structures M = ⟨W, µ, e⟩ where W is a non-empty set, e : W × BF → {0, 1} (where BF denotes the set of Boolean formulas) is such that, for all w ∈ W, e(w, ·) is a Boolean evaluation of non-modal formulas, and µ is a finitely additive probability measure on a Boolean subalgebra Ω ⊆ 2^W such that, for every Boolean formula ϕ, the set [ϕ]_W = {w ∈ W : e(w, ϕ) = 1} is µ-measurable, i.e. [ϕ]_W ∈ Ω and hence µ([ϕ]_W) is defined. Then the truth-evaluation of a formula Pϕ in a model M is given by ‖Pϕ‖_M = µ([ϕ]_W), and it is extended to compound (modal) formulas using the Łukasiewicz logic connectives. The completeness result for FP(Ł) states that a (modal) formula Φ follows (using the axioms and rules of FP(Ł)) from a finite set of (modal) formulas Γ iff ‖Φ‖_M = 1 in any probabilistic Kripke model M that evaluates all formulas in Γ with value 1. The same result holds for FP(RPL), that is, if the expansion RPL of Ł with rational truth-constants is used instead of Ł as base logic. Thus both FP(Ł) and FP(RPL) are adequate for a treatment of simple probability.

Let us comment that the issue of devising fuzzy theories for reasoning with conditional probability has also been developed, for instance in [Godo et al., 2000; Flaminio and Montagna, 2005; Flaminio, 2005; Godo and Marchioni, 2006], taking ŁΠ½ as base logic instead of Łukasiewicz logic in order to express axioms of conditional probability involving product and division.
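The semantics just described can be made concrete on a tiny probabilistic Kripke model. The sketch below is only an illustration under stated assumptions: worlds are identified with valuations of two atoms a and b, the weights in MU are hypothetical, Boolean formulas are encoded as Python predicates, and the helper names (imp, neg, equiv, strong_and, prob) are ours. It evaluates Pϕ as the measure of the ϕ-worlds, combines modal formulas with the standard Łukasiewicz truth functions, and checks that (FP1)-(FP3) receive value 1 in this model.

```python
from fractions import Fraction

ONE = Fraction(1)


# Standard Lukasiewicz truth functions on [0, 1].
def imp(x, y):
    return min(ONE, 1 - x + y)


def neg(x):
    return 1 - x


def equiv(x, y):
    return min(imp(x, y), imp(y, x))


def strong_and(x, y):
    return max(Fraction(0), x + y - 1)


# Hypothetical model: each world is a valuation (a, b) with a weight.
MU = {
    (1, 1): Fraction(1, 10),
    (1, 0): Fraction(4, 10),
    (0, 1): Fraction(3, 10),
    (0, 0): Fraction(2, 10),
}


def prob(phi):
    """Truth degree of 'phi is probable': the measure of the phi-worlds."""
    return sum((m for w, m in MU.items() if phi(w)), Fraction(0))


# Boolean formulas as predicates on worlds.
def a(w):
    return bool(w[0])


def b(w):
    return bool(w[1])


def Not(f):
    return lambda w: not f(w)


def And(f, g):
    return lambda w: f(w) and g(w)


def Or(f, g):
    return lambda w: f(w) or g(w)


def fp1(f, g):
    return imp(prob(Or(Not(f), g)), imp(prob(f), prob(g)))


def fp2(f):
    return equiv(prob(Not(f)), neg(prob(f)))


def fp3(f, g):
    return equiv(prob(Or(f, g)), imp(imp(prob(f), prob(And(f, g))), prob(g)))


print(fp1(a, b), fp2(a), fp3(a, b))           # degrees of the three axioms
print(prob(And(a, b)))                        # "(a and b) is probable"
print(strong_and(prob(a), prob(b)))           # "(a is probable) & (b is probable)"
print(min(prob(a), prob(b)))                  # same, with min-conjunction
```

With these weights, the degree of “(a ∧ b) is probable” differs from both conjunctions of “a is probable” and “b is probable”, which is the distinction the text draws. Exact rational arithmetic (Fraction) is used so that the comparisons with 1 are not affected by rounding.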
The same easy approach can be used to devise a fuzzy modal theory to reason with necessity measures, hence very close to possibilistic logic. In fact, building the modal formulas MF as above, just replacing the modality P by another modality N, the logic FN(Ł) is defined as FP(Ł) by replacing the axioms (FP1), (FP2) and (FP3) by the following ones:
(FN1) N(¬ϕ ∨ ψ) →Ł (Nϕ →Ł Nψ),
(FN2) ¬N⊥,
(FN3) N(ϕ ∧ ψ) ≡Ł (Nϕ ∧ Nψ),
and keeping the necessitation rule for N: from ϕ infer N(ϕ), for any Boolean formula ϕ. This axiomatization gives completeness with respect to the intended semantics, i.e. w.r.t. the class of necessity Kripke models M = ⟨W, µ, e⟩,
where now µ is a necessity measure on a suitable Boolean subalgebra Ω ⊆ 2^W. Note that one can define the dual modality Π, with Πϕ standing for ¬N¬ϕ, and then the truth value of Πϕ in a necessity Kripke model is just the corresponding possibility degree. If we consider the theory FN(RPL), the necessity modal theory over RPL (thus introducing rational truth-constants), then one can faithfully cast possibilistic logic into FN(RPL) by transforming possibilistic logic expressions (p, α) (with α rational) into the modal formulas α →Ł Np. See [Marchioni, 2006] for an extension to deal with generalized conditional necessities and possibilities.

This kind of approach has been generalized to deal with Dempster-Shafer belief functions (see footnote 19). The idea exploited there is that belief functions on propositions can be understood as probabilities of necessities (in the sense of S5 modal formulas). So, roughly speaking, what one needs to do is to define the above FP(Ł) over S5 formulas rather than over propositional calculus formulas. Then the belief modal formula Bϕ, where ϕ is a classical (modality-free) formula, is defined as P□ϕ. The details are fully elaborated in [Godo et al., 2003], including completeness results for the defined fuzzy belief function logic FB(Ł) w.r.t. the intended semantics based on belief functions.
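The necessity semantics and the reading of belief as a probability of necessity can be sketched on a three-world example. Everything concrete below is an assumption made for illustration: the worlds, the possibility distribution PI, the mass assignment MASS, and the helper names. Events (sets of worlds) stand in for the model sets of Boolean formulas, a necessity measure is obtained from a possibility distribution as N(A) = 1 − Π(complement of A), and a possibilistic formula (p, α) is read as the requirement that α →Ł Np has truth degree 1.

```python
from fractions import Fraction

W = frozenset({"w1", "w2", "w3"})

# Hypothetical normalized possibility distribution (some world has degree 1).
PI = {"w1": Fraction(1), "w2": Fraction(1, 2), "w3": Fraction(1, 5)}


def possibility(event):
    """Pi(A): the largest possibility degree of a world in A."""
    return max((PI[w] for w in event), default=Fraction(0))


def necessity(event):
    """N(A) = 1 - Pi(complement of A)."""
    return 1 - possibility(W - event)


def imp(x, y):
    """Lukasiewicz implication."""
    return min(Fraction(1), 1 - x + y)


def holds(event, alpha):
    """(p, alpha) read as: 'alpha -> N p' has truth degree 1, i.e. N(p) >= alpha."""
    return imp(alpha, necessity(event)) == 1


# Hypothetical mass assignment on focal sets, summing to 1.
MASS = {
    frozenset({"w1"}): Fraction(1, 2),
    frozenset({"w1", "w2"}): Fraction(3, 10),
    W: Fraction(1, 5),
}


def belief(event):
    """bel(A): total mass of the focal sets included in A."""
    return sum((m for focal, m in MASS.items() if focal <= event), Fraction(0))


A = frozenset({"w1", "w2"})
B = frozenset({"w1", "w3"})

print(necessity(A), necessity(B), necessity(A & B))  # min-decomposability, cf. (FN3)
print(necessity(frozenset()))                        # cf. (FN2)
print(1 - necessity(W - A) == possibility(A))        # duality between N and Pi
print(holds(A, Fraction(3, 4)), holds(A, Fraction(9, 10)))
print(belief(A), belief(B), belief(A & B), belief(A | B))
```

Because the focal sets chosen here are nested, the belief function happens to coincide with the necessity measure induced by PI on the events tested; with non-nested focal sets, bel would in general no longer be min-decomposable.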
5 CONCLUSION
The idea of developing something like fuzzy logic was already part of Zadeh’s concerns in the early fifties. Indeed, one can read in an early position paper of his, entitled Thinking machines: a new field in Electrical Engineering, the following premonitory statement (see footnote 20): “Through their association with mathematicians, the electrical engineers working on thinking machines have become familiar with such hitherto remote subjects as Boolean algebra, multivalued logic, and so forth. And it seems that the time is not so far distant when taking a course in mathematical logic will be just as essential to a graduate student in electrical engineering as taking a course in complex variables is at the present time.” It seems that Zadeh’s prediction was correct to a large extent.

The historical development of fuzzy logic may look somewhat erratic. The concept of approximate reasoning, developed by Zadeh in the seventies in considerable detail, did not receive great attention at the time, neither from the logical community nor from the engineering community, let alone the artificial intelligence community, despite isolated related works in the eighties. Many logicians disliked it for lack of a syntax. Engineers exploited very successful, sometimes ad hoc, numerical techniques borrowing only a small part of fuzzy set concepts. They did not implement the combination projection principle, which is the backbone of approximate reasoning (see [Dubois et al., 1999b] on this point). Finally, there is a long tradition of mutual distrust between artificial intelligence and fuzzy logic, due to the numerical flavor of the latter. Only later on, in the late nineties, would approximate reasoning be at work in possibilistic counterparts to Bayesian networks.

The nineties witnessed the birth of an important new research trend on the logical side, which is no less than a strong revival of the multiple-valued logic tradition, essentially prompted by later theoretical developments of fuzzy set theory (especially the axiomatization of connectives). However, multiple-valued logic had been seriously criticized at the philosophical level (see the survey paper by Urquhart [1986], for instance) because of the confusion between truth-values on the one hand and degrees of belief, or various forms of incomplete information, on the other hand, a confusion that even goes back to pioneers including Łukasiewicz (e.g., the idea of “possible” as a third truth-value). Attempts to encapsulate ideas of non-termination and error values (suggested by Kleene) in many-valued logics in the formal specification of software systems also seem to fail (see Hähnle [2005]). In some sense, fuzzy set theory had the merit of giving multiple-valued logic a more natural interpretation, in terms of gradual properties. The point is to bridge the gap between logical notions and non-Boolean (even continuous) representation frameworks. This has nothing to do with the representation of belief. It is interesting to see that the current trend towards applying Łukasiewicz infinite-valued logic and other multiple-valued logics is not focused on the handling of uncertainty, but on the approximation of real functions via normal forms (see the works of Mundici [1994], Perfilieva [2004], Aguzzoli and Gerla [2005], etc.).

Another emerging topic is the reconsideration of the mathematical foundations of set theory within the general multiple-valued logic setting recently put together [Behounek and Cintula, 2006b], sometimes in a category-theoretical framework [Höhle, 2007]. However, in this new trend, the fundamental thesis of Zadeh, namely that “fuzzy logic is a logic of approximate reasoning”, is again left on the side of the road. Yet our contention is that a good approach to ensuring a full revival of fuzzy logic is to demonstrate its capability to reason about knowledge and uncertainty. To this end, many-valued logics must be augmented with some kind of modalities, and the natural path is the framework of possibility theory. The case of possibilistic logic is typical of this trend, as witnessed by its connections to modal logic, nonmonotonic logics and non-standard probabilities, along the lines independently initiated by Lewis [1973b] and by Kraus, Lehmann and Magidor [1990]. However, possibilistic logic handles sharp propositions. Recent works pointed out in the last part of this survey make first steps towards a reconciliation between possibility theory (and other theories of belief as well) and many-valued logic. Fuzzy logic in the narrow sense being essentially a rigorous symbolic setting to reason about gradual notions (including belief), we argue that this is the way to follow.

Footnote 19: A belief function [Shafer, 1975] on a set W is a mapping bel : 2^W → [0, 1] satisfying the following conditions: bel(W) = 1, bel(∅) = 0 and, for each n, bel(A1 ∪ . . . ∪ An) ≥ Σ_{∅ ≠ I ⊆ {1,...,n}} (−1)^{|I|+1} bel(∩_{i∈I} Ai).
Footnote 20: Appearing in the Columbia Engineering Quarterly, Vol. 3, January 1950, p. 31.
BIBLIOGRAPHY
[Adillon and Verdú, 2000] R. J. Adillon and V. Verdú. On a Contraction-Less Intuitionistic Propositional Logic with Conjunction and Fusion. Studia Logica 65(1):11-30 (2000).
[Aglianó et al., to appear] P. Aglianó, I. M. A. Ferreirim and F. Montagna. Basic hoops: an algebraic study of continuous t-norms. Studia Logica, to appear.
[Aguzzoli, 2004] S. Aguzzoli. Uniform description of calculi for all t-norm logics. In L. Henkin et al. (eds.), Proceedings of 34th IEEE International Symposium on Multiple-Valued Logic (ISMVL 2004), pages 38-43, 2004.
[Aguzzoli and Gerla, 2005] S. Aguzzoli and B. Gerla. Normal Forms for the One-Variable Fragment of Hájek’s Basic Logic. Proceedings of 35th IEEE International Symposium on Multiple-Valued Logic (ISMVL 2005), pp. 284-289.
[Aguzzoli et al., 2005] S. Aguzzoli, B. Gerla and Z. Haniková. Complexity issues in basic logic. Soft Computing 9: 919–934, 2005.
[Aguzzoli et al., 2005a] S. Aguzzoli, B. Gerla and C. Manara. Poset Representation for Gödel and Nilpotent Minimum Logics. Proceedings of The 8th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, ECSQARU 2005, Barcelona, Spain. Lecture Notes in Artificial Intelligence, 3571 Springer, pp. 662-674, 2005.
[Aguzzoli et al., 2006] S. Aguzzoli, B. Gerla and C. Manara. Structure of the algebras of NMG-formulas. Proceedings of the 11th Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, IPMU 2006, Paris, France, pp. 1620-1627, 2006.
[Alsina et al., 1980] C. Alsina, E. Trillas and L. Valverde. On Some Logical Connectives for Fuzzy Set Theory. Busefal 3, Summer 1980, Université Paul Sabatier, pp. 18-29. Long version in Journal of Mathematical Analysis and Applications.
[Alsina et al., 1983] C. Alsina, E. Trillas and L. Valverde. On Some Logical Connectives for Fuzzy Set Theory. Journal of Math. Analysis and Applications 93 (1), 1983, pp. 15-26.
[Alsina et al., 2006] C. Alsina, M. Frank and B. Schweizer. Associative Functions: Triangular Norms and Copulas, World Scientific, 2006.
[Alsinet, 2001] T. Alsinet. Logic Programming with Fuzzy Unification and Imprecise Constants: Possibilistic Semantics and Automated Deduction, Ph. D. Thesis, Technical University of Catalunya, Barcelona, 2001.
[Alsinet and Godo, 2000] T. Alsinet and L. Godo. A complete calculus for possibilistic logic programming with fuzzy propositional variables. Proc. of the 16th Conference on Uncertainty in Artificial Intelligence (UAI’00), Stanford, Ca., (Morgan Kaufmann, San Francisco, Ca.), 2000, pp. 1-10.
[Alsinet and Godo, 2001] T. Alsinet and L. Godo. A proof procedure for possibilistic logic programming with fuzzy constants. In Proc. of the ECSQARU-2001 Conference, LNAI 2143, Springer, pages 760–771, 2001.
[Alsinet et al., 1999] T. Alsinet, L. Godo and S. Sandri. On the semantics and automated deduction for PLFC, a logic of possibilistic uncertainty and fuzziness. Proc. of the 15th Conference on Uncertainty in Artificial Intelligence, (UAI’99), Stockholm, Sweden, (Morgan Kaufmann, San Francisco, Ca.), 1999, pp. 3-20.
[Alsinet et al., 2002] T. Alsinet, L. Godo, S. Sandri. Two formalisms of extended possibilistic logic programming with context-dependent fuzzy unification: a comparative description. Elec. Notes in Theor. Computer Sci. 66 (5), 2002.
[Alsinet et al., 2006] T. Alsinet, C. Chesñevar, L. Godo, S. Sandri and G. Simari. Modeling Defeasible Argumentation within a Possibilistic Logic Framework with Fuzzy Unification. Proc. of the 11th International Conference IPMU 2006 (Information Processing and Management of Uncertainty), 1228-1235, 2006.
[Atanassov, 1986] K. T. Atanassov. Intuitionistic fuzzy sets. Fuzzy Sets and Systems, 20, 87-96, 1986.
[Atanassov, 1999] K. T. Atanassov. Intuitionistic Fuzzy Sets: Theory And Applications, Physica Verlag, 1999.
[Avron, 1991] A. Avron. Hypersequents, Logical Consequence and Intermediate Logics for Concurrency. Annals of Mathematics and Artificial Intelligence, 4: 225–248, 1991.
[Avron and Konikowska, 2001] A. Avron and B. Konikowska. Decomposition Proof Systems for Gödel-Dummett Logics. Studia Logica, 69(2):197–219, 2001.
[Baader et al., 2003] F. Baader, D. Calvanese, D. McGuinness, D. Nardi and P. Patel-Schneider (eds.). The Description Logic Handbook: Theory, Implementation, and Applications, Cambridge University Press, 2003.
[Baaz, 1996] M. Baaz. Infinite-valued Gödel logic with 0-1-projections and relativisations. In Petr Hájek, editor, Gödel’96: Logical Foundations of Mathematics, Computer Science, and Physics, volume 6 of Lecture Notes in Logic, pages 23–33. Springer-Verlag, Brno, 1996.
[Baaz et al., 2002] M. Baaz, P. Hájek, F. Montagna and H. Veith. Complexity of t-tautologies. Ann Pure Appl Logic 113: 3–11, 2002.
[Baaz et al., 2004] M. Baaz, A. Ciabattoni and F. Montagna. A proof-theoretical investigation of Monoidal T-norm Based Logic. Fundamenta Informaticae 59, 315-322, 2004.
[Baldwin, 1979] J.F. Baldwin. A new approach to approximate reasoning using a fuzzy logic, Fuzzy Sets and Systems, 2, 309-325, 1979.
[Baldwin and Pilsworth, 1979] J.F. Baldwin and B. W. Pilsworth. Fuzzy truth definition of possibility measure for decision classification, Int. J. of Man-Machine Studies, 11, 447-463, 1979.
[Baldwin and Pilsworth, 1980] J.F. Baldwin and B. W. Pilsworth. Axiomatic approach to implication for approximate reasoning with fuzzy logic, Fuzzy Sets and Systems, 3, 193-219, 1980.
[Behounek and Cintula, 2006a] Libor Behounek and Petr Cintula. Fuzzy logics as the logics of chains. Fuzzy Sets and Systems 157(5): 604-610 (2006).
[Behounek and Cintula, 2006b] Libor Behounek and Petr Cintula. From fuzzy logic to fuzzy mathematics: A methodological manifesto. Fuzzy Sets and Systems 157(5): 642-646 (2006).
[Bellman and Giertz, 1973] R. Bellman and M. Giertz. On the analytical formalism of the theory of fuzzy sets. Information Sciences 5, 149-156, 1973.
[Bellman and Zadeh, 1977] R.E. Bellman and L.A. Zadeh. Local and fuzzy logics, Modern Uses of Multiple-Valued Logic (Epstein G., ed.), D. Reidel, Dordrecht, 103-165, 1977.
[Bělohlávek, 2001] R. Bělohlávek. Fuzzy Closure Operators. Journal of Mathematical Analysis and Applications 262, 473–489, 2001.
[Bělohlávek, 2002a] R. Bělohlávek. Fuzzy closure operators II: induced relations, representation, and examples. Soft Computing 7, 53-64, 2002.
[Bělohlávek, 2002b] R. Bělohlávek. Fuzzy Relational Systems: Foundations and Principles. Kluwer Academic/Plenum Press (Vol. 20 of IFSR Int. Series on Systems Science and Engineering), New York, 2002.
[Bělohlávek, 2002c] R. Bělohlávek. Fuzzy equational logic. Archive for Math. Logic 41 (2002), 83–90.
[Bělohlávek and Vychodil, 2005] R. Bělohlávek and V. Vychodil. Fuzzy Equational Logic. Springer (series: Studies in Fuzziness and Soft Computing, vol. 186), Berlin, 2005.
[Benferhat and Prade, 2005] S. Benferhat and H. Prade. Encoding formulas with partially constrained weights in a possibilistic-like many-sorted propositional logic. Proc. 19th International Joint Conference on Artificial Intelligence IJCAI’05, Edinburgh, Scotland, UK, July 30-August 5, 1281-1286, 2005.
[Benferhat and Prade, 2006] S. Benferhat and H. Prade. Compiling Possibilistic Knowledge Bases. Proc. 17th European Conference on Artificial Intelligence, (ECAI 2006), August 29 - September 1, Riva del Garda, Italy, 337-341, 2006.
[Benferhat et al., 1998] S. Benferhat, D. Dubois, J. Lang, H. Prade, A. Saffiotti and P. Smets. A general approach for inconsistency handling and merging information in prioritized knowledge bases. Proc. of 6th Int. Conf. Principles of Knowledge Representation and Reasoning, Trento, Italy. Morgan Kaufmann, San Francisco, CA, pp. 466-477, 1998.
[Benferhat et al., 1992] S. Benferhat, D. Dubois and H. Prade. Representing default rules in possibilistic logic. Proc. 3rd Inter. Conf. Principles of Knowledge Representation and Reasoning (KR’92), Cambridge, MA, (Morgan Kaufmann, San Francisco, Ca.), 1992, pp. 673-684.
[Benferhat et al., 1997] S. Benferhat, D. Dubois and H. Prade. Nonmonotonic reasoning, conditional objects and possibility theory. Artificial Intellig. J., 92 (1997), 259-276.
[Benferhat et al., 1998] S. Benferhat, D. Dubois and H. Prade. From semantic to syntactic approaches to information combination in possibilistic Logic. In: Aggregation and Fusion of Imperfect Information, (B. Bouchon-Meunier, ed.), Physica Verlag, 1998, pp. 141-161.
[Benferhat et al., 1999] S. Benferhat, D. Dubois and H. Prade. Some syntactic approaches to the handling of inconsistent knowledge bases: A comparative study. Part 2: The prioritized case. In: Logic at Work, (E. Orlowska, Ed.), Physica-Verlag, Heidelberg, 1999, pp. 473-511.
[Benferhat et al., 2001] S. Benferhat, D. Dubois, and H. Prade. Towards a possibilistic logic handling of preferences. Applied Intelligence, 14 (2001), 303-317.
[Benferhat et al., 2002a] S. Benferhat, D. Dubois, S. Kaci and H. Prade. Possibilistic merging and distance-based fusion of propositional information. Annals of Mathematics and Artificial Intelligence, 34, 217-252, 2002.
[Benferhat et al., 2002b] S. Benferhat, D. Dubois, S. Kaci and H. Prade. Bipolar representation and fusion of preferences in the possibilistic logic framework. Proc. of the 8th Int. Conf. on Principles of Knowledge Representation and Reasoning, KR’02, Toulouse, France. (Morgan Kaufmann, San Francisco, Ca.), 2002, pp. 421-432.
[Benferhat et al., 2004a] S. Benferhat, D. Dubois and H. Prade. Logique possibiliste avec calcul symbolique sur des poids partiellement contraints. In Actes des Rencontres Francophones sur la Logique Floue et ses Applications (LFA’04), Toulouse, France, Nov. 18-19, Cépaduès, Toulouse, 67-74, 2004.
[Benferhat et al., 2004b] S. Benferhat, S. Lagrue and O. Papini. Reasoning with partially ordered information in a possibilistic framework, Fuzzy Sets and Systems, 144, 25-41, 2004.
[Besnard and Lang, 1994] P. Besnard and J. Lang. Possibility and necessity functions over nonclassical logics. Proc of the 10th Conf. Uncertainty in Artificial Intelligence (R. Lopez de Mantaras, D. Poole, eds.) (Morgan Kaufmann, San Francisco, Ca.), 1994, pp. 69-76.
[Besnard and Hunter, 1995] P. Besnard and A. Hunter. Quasi-classical logic: Non-trivializable classical reasoning from inconsistent information, Proc. of the European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty (ECSQARU), Fribourg, July 3-5, (Ch. Froidevaux, J. Kohlas, eds.), Lecture Notes in Computer Science 946, Springer, 44-51, 1995.
[Biacino and Gerla, 1992] L. Biacino and G. Gerla. Generated necessities and possibilities, Int. J. Intelligent Systems, 7 (1992), 445-454.
[Biacino et al., 2000] L. Biacino, G. Gerla and M. Ying. Approximate Reasoning Based on Similarity. Mathematical Logic Quarterly, Vol. 46, N. 1, pp. 77-86, 2000.
[Blok and Pigozzi, 1989] J. Willem Blok and Don Pigozzi. Algebraizable Logics, volume 396 of Memoirs of the American Mathematical Society. American Mathematical Society, Providence, 1989.
[Boixader and Jacas, 1998] D. Boixader, J. Jacas. Extensionality Based Approximate Reasoning. International Journal of Approximate Reasoning 19, 221-230, 1998.
[Boldrin, 1995] L. Boldrin. A substructural connective for possibilistic logic. In: Symbolic and Quantitative Approaches to Reasoning and Uncertainty (Proc. of Europ. Conf. ECSQARU’95) (C. Froidevaux, J. Kohlas, eds.), Springer Verlag, Fribourg, pp. 60-68, 1995.
[Boldrin and Sossai, 1995] L. Boldrin and C. Sossai. An algebraic semantics for possibilistic logic. Proc of the 11th Conf. Uncertainty in Artificial Intelligence (P. Besnard, S. Hank, eds.) Morgan Kaufmann, San Francisco, CA, 1995, pp. 27-35.
[Boldrin and Sossai, 1997] L. Boldrin and C. Sossai. Local possibilistic logic. J. Applied Non-Classical Logics, 7 (1997), 309-333.
[Boldrin and Sossai, 1999] L. Boldrin and C. Sossai. Truth-functionality and measure-based logics. In: Fuzzy Sets, Logic and Reasoning about Knowledge, (Dubois, D., Prade, H. and Klement, E.P., Eds.), Kluwer, Dordrecht, vol. 15 in Applied Logic Series, 1999, pp. 351-380.
[Bouchon-Meunier et al., 1999] B. Bouchon-Meunier, D. Dubois, L. Godo, H. Prade. Fuzzy sets and possibility theory in approximate and plausible reasoning. Fuzzy Sets in Approximate reasoning and Information Systems (Bezdek, J. Dubois, D. Prade, H., Eds): Kluwer, Boston, Mass., The Handbooks of Fuzzy Sets, 15-190, 1999.
[Boutilier, 1994] C. Boutilier. Modal logics for qualitative possibility theory, Int. J. Approximate Reasoning, 10, 173-201, 1994.
[Bova and Montagna, 2007] S. Bova and F. Montagna. Proof Search in Hájek’s Basic Logic. ACM Transactions on Computational Logic, to appear.
[Butnariu and Klement, 1995] D. Butnariu, E. P. Klement, and S. Zafrany. On triangular norm-based propositional fuzzy logics. Fuzzy Sets and Systems 69: 241-255, 1995.
[Carnap, 1949] R. Carnap. The two concepts of probability, Philosophy and Phenomenological Research, 513-532, 1949.
[Castro and Klawonn, 1994] F. Klawonn and J.L. Castro. Similarity in Fuzzy Reasoning. Mathware and Soft Computing 2, 197-228, 1994.
[Castro and Trillas, 1991] J.L. Castro and E. Trillas. Tarski’s Fuzzy Consequences. Proc. of IFES’91. Yokohama (Japan). Vol. 1, 70-81, 1991.
[Castro et al., 1994] J.L. Castro, E. Trillas and S. Cubillo. On consequence in approximate reasoning. Journal of Applied Non-Classical Logics, vol. 4, n. 1, 91-103, 1994.
[Chakraborty, 1988] M.K. Chakraborty. Use of fuzzy set theory in introducing graded consequence in multiple valued logic. In Fuzzy Logic in Knowledge Systems, Decision and Control, M.M. Gupta and T. Yamakawa (eds). North Holland, Amsterdam, pp. 247-257, 1988.
[Chakraborty, 1995] M.K. Chakraborty. Graded consequence: further studies. Journal of Applied Non-Classical Logics 5 (2), 227-237, 1995.
[Chang, 1958] Chen Chung Chang. Algebraic analysis of many-valued logics. Trans. Amer. Math. Soc., 88:456–490, 1958.
[Chang, 1959] C.C. Chang. A new proof of the completeness of the Łukasiewicz axioms. Transactions of the American Mathematical Society 93 (1959), pp. 74–90.
[Chesñevar et al., 2004] C. Chesñevar, G. Simari, T. Alsinet and L. Godo. A Logic Programming Framework for Possibilistic Argumentation with Vague Knowledge. Procs. of the Uncertainty in Artificial Intelligence Conference (UAI-2004), Banff (Canada), July 7-11, (M. Chickering and J. Halpern Eds), AUAI Press, pages 76–84, 2004.
[Chung and Schwartz, 1995] H.-T. Chung and D.G. Schwartz. A resolution-based system for symbolic approximate reasoning, International Journal of Approximate Reasoning, 13, 3 (1995) 201-246.
[Ciabattoni et al., 2002] A. Ciabattoni, F. Esteva and L. Godo. T-norm based logics with n-contraction. Special Issue on SOFSEM2002 of Neural Network World, 12(5):453–460, 2002.
[Ciabattoni et al., 2005] A. Ciabattoni, C. Fermüller and G. Metcalfe. Uniform Rules and Dialogue Games for Fuzzy Logics. In Proceedings of LPAR 2004, volume 3452 of LNAI, pages 496-510, 2005.
[Cignoli et al., 1999] R. Cignoli, I.M.L. D’Ottaviano and D. Mundici. Algebraic Foundations of many-valued reasoning. Kluwer Academic Press, Dordrecht-Boston-London, 1999.
[Cignoli et al., 2000] R. Cignoli, F. Esteva, L. Godo, and A. Torrens. Basic fuzzy logic is the logic of continuous t-norms and their residua. Soft Computing, 4(2):106–112, 2000.
[Cintula, 2001a] P. Cintula. The ŁΠ and ŁΠ½ propositional and predicate logics. Fuzzy Sets and Systems 124(3): 289-302 (2001).
[Cintula, 2001b] P. Cintula. An alternative approach to the ŁΠ logic. Neural Network World 124: 561-575 (2001).
[Cintula, 2003] P. Cintula. Advances in the ŁΠ and ŁΠ½ logics. Arch. Math. Log. 42(5): 449-468 (2003).
[Cintula, 2005a] P. Cintula. Short note: on the redundancy of axiom (A3) in BL and MTL. Soft Comput. 9(12): 942-942 (2005).
[Cintula, 2005b] P. Cintula. A note to the definition of the ŁΠ-algebras. Soft Comput. 9(8): 575-578 (2005).
[Cintula, 2005c] P. Cintula. From Fuzzy Logic to Fuzzy Mathematics. Ph.D. dissertation, Czech Technical University, Prague (Czech Republic), 2005.
[Cintula, 2006] P. Cintula. Weakly implicative (fuzzy) logics I: Basic properties. Archive for Mathematical Logic, 45(6):673–704, 2006.
[Cintula and Gerla, 2004] Petr Cintula and Brunella Gerla. Semi-normal forms and functional representation of product fuzzy logic. Fuzzy Sets and Systems 143, 89–110, 2004.
[Cintula et al., 2006] P. Cintula, E.P. Klement, R. Mesiar, and M. Navara. Residuated logics based on strict triangular norms with an involutive negation. Mathematical Logic Quarterly 52(3): 269-282 (2006).
[Cross and Sudkamp, 1994] V. Cross and T. Sudkamp. Patterns of fuzzy rule-based inference, Int. J. of Approximate Reasoning, 11, 235-255, 1994.
[de Finetti, 1936] B. de Finetti. La logique de la probabilité, Actes Congrès Int. de Philos. Scient., Paris 1935. Hermann et Cie Editions, Paris, IV1-IV9.
[Di Nola et al., 1985] A. di Nola, W. Pedrycz and S. Sessa. Fuzzy relation equations and algorithms of inference mechanism in expert systems, Approximate Reasoning in Expert Systems (Gupta M. M., Kandel A., Bandler W. and Kiszka J. B., eds.), North-Holland, Amsterdam, 355-367, 1985.
[Di Nola et al., 1989] A. di Nola, W. Pedrycz and S. Sessa. An aspect of discrepancy in the implementation of modus ponens in the presence of fuzzy quantities, Int. J. of Approximate Reasoning, 3, 259-265, 1989.
[Di Nola et al., 2002] A. di Nola, G. Georgescu, and A. Iorgulescu. Pseudo-BL algebras I and II. J. Multiple-Valued Logic 8: 671–750, 2002.
[Domingo et al., 1981] X. Domingo, E. Trillas and L. Valverde. Pushing Łukasiewicz-Tarski implication a little farther. Proc. IEEE Int. Symposium on Multiple-valued Logic (ISMVL’81), 232-234, 1981.
[Dubois, 1980] D. Dubois. Triangular norms for fuzzy sets. Proc. of 2nd International Linz Seminar on Fuzzy Set Theory, E.P. Klement (ed.), 39-68, 1980.
[Dubois and Prade, 1979a] D. Dubois and H. Prade. New Results about Properties and Semantics of Fuzzy Set-theoretic Operators. First Symposium on Policy Analysis and Information Systems. Durham, North Carolina, USA, 167-174, 1979.
[Dubois and Prade, 1979b] D. Dubois and H. Prade. Operations in a fuzzy-valued logic, Information and Control 43(2), 224-240, 1979.
[Dubois and Prade, 1980] D. Dubois and H. Prade. Fuzzy Sets and Systems - Theory and Applications, New York: Academic Press, 1980.
[Dubois and Prade, 1984a] D. Dubois and H. Prade. A theorem on implication functions defined from triangular norms, Stochastica 8(3), 267-279, 1984.
[Dubois and Prade, 1984b] D. Dubois and H. Prade. Fuzzy logics and the generalized modus ponens revisited, Cybernetics and Systems 15, 293-331, 1984.
[Dubois and Prade, 1985a] D. Dubois and H. Prade. Evidence measures based on fuzzy information, Automatica 21(5), 547-562, 1985.
[Dubois and Prade, 1985b] D. Dubois and H. Prade. The generalized modus ponens under sup-min composition - A theoretical study -, Approximate Reasoning in Expert Systems (Gupta M. M., Kandel A., Bandler W. and Kiszka J. B., eds.), North-Holland, Amsterdam, 217-232, 1985.
[Dubois and Prade, 1987] D. Dubois and H. Prade. Necessity measures and the resolution principle. IEEE Trans. Systems, Man and Cybernetics, 17 (1987), 474-478.
[Dubois and Prade, 1988a] D. Dubois and H. Prade. Possibility Theory, Plenum Press, New York, 1988.
[Dubois and Prade, 1988b] D. Dubois and H. Prade. An introduction to possibilistic and fuzzy logics (with discussions and a reply), Non Standard Logics for Automated Reasoning (Smets P., Mamdani A., Dubois D. and Prade H., eds.), Academic Press, 287-315 and 321-326, 1988.
[Dubois and Prade, 1989] D. Dubois and H. Prade. A typology of fuzzy “if... then...” rules, Proc. of the 3rd Inter. Fuzzy Systems Association (IFSA’89), Congress, Seattle, WA, Aug. 6-11, 782-785, 1989.
[Dubois and Prade, 1990] D. Dubois and H. Prade. Resolution principles in possibilistic logic. Int. J. Approx. Reasoning, 4 (1990), pp. 1-21.
[Dubois and Prade, 1991a] D. Dubois and H. Prade. Fuzzy sets in approximate reasoning - Part 1: Inference with possibility distributions, Fuzzy Sets and Systems, 40, 143-202, 1991.
[Dubois and Prade, 1991b] D. Dubois and H. Prade. Epistemic entrenchment and possibilistic logic, Artificial Intelligence, 50 (1991), 223-239.
[Dubois and Prade, 1991c] D. Dubois and H. Prade. Possibilistic logic, preferential models, nonmonotonicity and related issues. In Proc. of the Inter. Joint Conf. on Artificial Intelligence (IJCAI’91), Sydney, Australia, Aug. 24-30, 419-424, 1991.
[Dubois and Prade, 1992a] D. Dubois and H. Prade. Gradual inference rules in approximate reasoning, Information Sciences, 61, 103-122, 1992.
[Dubois and Prade, 1992b] D. Dubois and H. Prade. Fuzzy rules in knowledge-based systems. Modelling gradedness, uncertainty and preference, An Introduction to Fuzzy Logic Applications in Intelligent Systems (Yager R. R. and Zadeh L. A., eds.), Kluwer, Dordrecht, 45-68, 1992.
[Dubois and Prade, 1992c] D. Dubois and H. Prade. Possibility theory as a basis for preference propagation in automated reasoning. Proc. 1st IEEE Inter. Conf. on Fuzzy Systems (FUZZ-IEEE’92), San Diego, Ca., 1992, pp. 821-832.
[Dubois and Prade, 1994] D. Dubois and H. Prade. Can we enforce full compositionality in uncertainty calculi?, Proc. of the 12th National Conf. on Artificial Intelligence (AAAI’94), Seattle, WA, 149-154, 1994.
[Dubois and Prade, 1995] D. Dubois and H. Prade. Conditional objects, possibility theory and default rules. In: Conditionals: From Philosophy to Computer Science (G. Crocco, L. Fariñas del Cerro, A. Herzig, eds.), Oxford University Press, Oxford, UK, 1995, pp. 311-346.
[Dubois and Prade, 1996a] D. Dubois and H. Prade. What are fuzzy rules and how to use them, Fuzzy Sets and Systems, 84, 169-185, 1996.
[Dubois and Prade, 1996b] D. Dubois and H. Prade, Combining hypothetical reasoning and plausible inference in possibilistic logic. J. of Multiple Valued Logic, 1 (1996), 219-239.
[Dubois and Prade, 1998a] D. Dubois and H. Prade, Possibility theory: Qualitative and quantitative aspects, In: Handbook of Defeasible Reasoning and Uncertainty Management Systems, Vol. 1. (D. M. Gabbay and P. Smets, Eds.), Dordrecht: Kluwer Academic, 1998, pp. 169-226.
[Dubois and Prade, 1998b] D. Dubois and H. Prade. Similarity vs. preference in fuzzy set-based logics. In Incomplete information: Rough Set Analysis (Orlowska E. ed.) Physica-Verlag, Heidelberg, 1998.
[Dubois and Prade, 2001] D. Dubois and H. Prade. Possibility theory, probability theory and multiple-valued logics: A clarification. Annals of Mathematics and Artificial Intelligence 32, 35-66, 2001.
[Dubois and Prade, 2004] D. Dubois and H. Prade. Possibilistic logic: a retrospective and prospective view. Fuzzy Sets and Systems, 144, 3-23, 2004.
[Dubois and Prade, 2006] D. Dubois and H. Prade. Extensions multi-agents de la logique possibiliste. In Actes des Rencontres Francophones sur la Logique Floue et ses Applications (LFA’06), Toulouse, France, Oct. 19-20, 137-144, 2006.
[Dubois et al., 1987] D. Dubois, J. Lang and H. Prade. Theorem proving under uncertainty - A possibility theory-based approach. Proc. of the 10th Inter. Joint Conf. on Artificial Intelligence IJCAI’87, Milano, Italy, 1987, pp. 984-986.
[Dubois et al., 1991a] D. Dubois, J. Lang and H. Prade. A possibilistic assumption-based truth maintenance system with uncertain justifications and its application to belief revision. In: Truth Maintenance Systems, (J.P. Martins, M. Reinfrank, eds.), LNAI 515, Springer Verlag, 1991, pp. 87-106.
[Dubois et al., 1991b] D. Dubois, J. Lang and H. Prade. Timed possibilistic logic. Fundamenta Informaticae, XV (1991), Numbers 3-4, 211-237.
[Dubois et al., 1991c] D. Dubois, J. Lang and H. Prade. Fuzzy sets in approximate reasoning Part 2: Logical approaches, Fuzzy Sets and Systems, 40, 203-244, 1991.
[Dubois et al., 1992] D. Dubois, J. Lang and H. Prade. Dealing with multi-source information in possibilistic logic. Proc. of the 10th Eur. Conf. on Artificial Intelligence (ECAI’92), Vienna, Austria. Wiley, New-York, 1992, pp. 38-42.
[Dubois et al., 1993] D. Dubois, H. Prade and R. R. Yager (eds.). Readings in Fuzzy Sets for Intelligent Systems, Morgan Kaufmann, 1993.
[Dubois et al., 1994] D. Dubois, M. Grabisch and H. Prade. Gradual rules and the approximation of control laws, Theoretical Aspects of Fuzzy Control (Nguyen H. T., Sugeno M., Tong R. and Yager R. R., eds.), Wiley, New York, 147-181, 1984.
[Dubois et al., 1994a] D. Dubois, J. Lang and H. Prade. Possibilistic logic. In: Handbook of Logic in Artificial Intelligence and Logic Programming, (D.M. Gabbay et al., eds.), Vol. 3, Oxford Univ. Press, Oxford, UK, 1994, pp. 439-513.
[Dubois et al., 1994b] D. Dubois, J. Lang and H. Prade. Automated reasoning using possibilistic logic: Semantics, belief revision and variable certainty weights, IEEE Trans. Data & Knowledge Engineering 6 (1994), 64-71.
[Dubois et al., 1994c] D. Dubois, J. Lang and H. Prade. Handling uncertainty, context, vague predicates, and partial inconsistency in possibilistic logic. In: Fuzzy Logic and Fuzzy Control (Proc. of the IJCAI’91 Workshop) (D. Driankov, P.W. Eklund, A.L. Ralescu, eds.), LNAI 833, Springer-Verlag, Berlin, 1994, pp. 45-55.
[Dubois et al., 1997a] D. Dubois, F. Esteva, P. Garcia, L. Godo and H. Prade. A logical approach to interpolation based on similarity relations. International Journal of Approximate Reasoning, 17(7), 1997, 1-36.
[Dubois et al., 1997b] D. Dubois, S. Lehmke and H. Prade. A comparative study of logics of graded uncertainty and logics of graded truth. Proc. 18th International Linz Seminar on Fuzzy Set Theory (Enriched Lattice Structures for Many-Valued and Fuzzy Logics), Linz, Austria, 1997.
[Dubois et al., 1998] D. Dubois, H. Prade and S. Sandri. A possibilistic logic with fuzzy constants and fuzzily restricted quantifiers. In: Logic Programming and Soft Computing (T.P. Martin and F. Arcelli-Fontana, Eds.), Research Studies Press, Ltd, Baldock, England, 1998, pp. 69-90.
[Dubois et al., 1999a] D. Dubois, D. Le Berre, H. Prade and R. Sabbadin. Using possibilistic logic for modeling qualitative decision: ATMS-based algorithms. Fundamenta Informaticae 37 (1999), 1-30.
[Dubois et al., 1999b] D. Dubois, H. Prade and L. Ughetto. Fuzzy logic, control engineering and artificial intelligence. In: Fuzzy Algorithms for Control, (H.B. Verbruggen, H.-J. Zimmermann, R. Babuska, eds.), Kluwer Academic, 17-57, 1999.
[Dubois et al., 2000] D. Dubois, P. Hájek and H. Prade. Knowledge-Driven versus data-driven logics. Journal of Logic, Language and Information 9, 65–89, 2000.
[Dubois et al., 2001] D. Dubois, H. Prade and P. Smets, “Not impossible” vs. “guaranteed possible” in fusion and revision. Proc. 6th Europ. Conf. on Symbolic and Quantitative Approaches to reasoning with Uncertainty ECSQARU-01, Toulouse, LNAI 2143, Springer Verlag, Berlin, 2001, pp. 522-531.
[Dubois et al., 2003a] D. Dubois, S. Konieczny and H. Prade. Quasi-possibilistic logic and its measures of information and conflict. Fundamenta Informaticae 57, 101-125, 2003.
[Dubois et al., 2003b] D. Dubois, H. Prade and L. Ughetto. New perspective on reasoning with fuzzy rules. International Journal of Intelligent Systems, 18, 541-567, 2003.
[Dubois et al., 2005a] D. Dubois, F. Esteva, L. Godo and H. Prade. An information-based discussion of vagueness. Handbook of Categorization in Cognitive Science, (Henri Cohen, Claire Lefebvre, Eds.) Chap. 40, Elsevier, 2005, pp. 892-913.
[Dubois et al., 2005b] D. Dubois, S. Gottwald, P. Hájek, J. Kacprzyk and H. Prade. Terminological difficulties in fuzzy set theory - The case of Intuitionistic Fuzzy Sets (with a reply by K. T. Atanassov, 496-499), Fuzzy Sets and Systems 156, 485-491, 2005.
[Dubois et al., 2006] D. Dubois, J. Mengin and H. Prade. Possibilistic uncertainty and fuzzy features in description logic. A preliminary discussion. In: Fuzzy logic and the semantic web (E. Sanchez, ed.), Elsevier, 101-113, 2006.
[Dummett, 1959] Michael Dummett. A propositional calculus with denumerable matrix. Journal of Symbolic Logic, 27:97–106, 1959.
[Dvořák and Novák, 2004] Antonín Dvořák and Vilém Novák. Formal theories and linguistic descriptions. Fuzzy Sets and Systems 143(1), 169–188, 2004.
[Elkan, 1994] C. Elkan. The paradoxical success of fuzzy logic. (with discussions by many scientists and a reply by the author), IEEE Expert, August, 3-46, 1994.
[Elorza and Burillo, 1999] J. Elorza, P. Burillo. On the relation of Fuzzy Preorders and Fuzzy Consequence Operators. Int. J. of Uncertainty, Fuzziness and Knowledge-based Systems, Vol 7, (3), 219-234, 1999.
[Esteva and Godo, 1999] F. Esteva and L. Godo. Putting together Łukasiewicz and product logics. Mathware and Soft Computing, Vol. VI, n.2-3, pp. 219-234, 1999.
[Esteva and Godo, 2001] Francesc Esteva and Lluís Godo. Monoidal t-norm based logic: Towards a logic for left-continuous t-norms. Fuzzy Sets and Systems, 124(3):271–288, 2001.
[Esteva and Godo (eds.), 2005] F. Esteva and L. Godo (eds.), Special issue on BL-algebras. Soft Computing 9(2), 2005.
[Esteva et al., 1994] F. Esteva, P. Garcia and L. Godo. Relating and extending semantical approaches to possibilistic reasoning. International Journal of Approximate Reasoning, 10(4), 311-344, 1994.
[Esteva et al., 1997a] F. Esteva, P. Garcia and L. Godo. On the Semantics of Fuzzy Statements Based on Possibilistic Constraints, Proc. VII Congreso Español sobre Tecnologías y Lógica Fuzzy, ESTYLF’97, Tarragona. Universitat Rovira i Virgili, 21-27, 1997.
[Esteva et al., 1997b] F. Esteva, P. Garcia, L. Godo, and R. Rodríguez. A modal account of similarity-based reasoning. International Journal of Approximate Reasoning, 16(3-4):235–260, 1997.
[Esteva et al., 1998] F. Esteva, P. Garcia, L. Godo, R.O. Rodríguez. Fuzzy Approximation Relations, Modal Structures and Possibilistic Logic. Mathware and Soft Computing, 5, n. 2-3, 151-166, 1998.
[Esteva et al., 2000] F. Esteva, L. Godo, P. Hájek and M. Navara. Residuated fuzzy logics with an involutive negation. Archive for Mathematical Logic, 39(2):103–124, 2000.
[Esteva et al., 2001a] F. Esteva, P. Garcia and L. Godo. On syntactical and semantical approaches to similarity-based approximate reasoning, Proceedings of Joint 9th IFSA World Congress and 20th NAFIPS International Conference, July 25-28, 2001, Vancouver (BC), Canada, pp. 1598-1603.
[Esteva et al., 2001b] F. Esteva, L. Godo and F. Montagna. The ŁΠ and ŁΠ½ logics: Two complete fuzzy systems joining Łukasiewicz and product logics. Archive for Mathematical Logic, 40(1):39–67, 2001.
[Esteva et al., 2002] F. Esteva, J. Gispert, L. Godo and F. Montagna. On the standard and rational completeness of some axiomatic extensions of the monoidal t-norm logic. Studia Logica, 71(2):199-226, 2002.
[Esteva et al., 2003a] F. Esteva, L. Godo and A. García-Cerdaña. On the hierarchy of t-norm based residuated fuzzy logics. In Beyond Two: Theory and Applications of Multiple-Valued Logic, Ed. M. Fitting and E. Orlowska, Springer-Verlag (2003) 251–272.
[Esteva et al., 2003b] F. Esteva, L. Godo, P. Hájek, and F. Montagna. Hoops and fuzzy logic. Journal of Logic and Computation, 13:531–555, 2003.
[Esteva et al., 2004] F. Esteva, L. Godo and F. Montagna. Equational characterization of the subvarieties of BL generated by t-norm algebras. Studia Logica 76, 161-200, 2004.
[Esteva et al., 2006] Francesc Esteva, Lluís Godo, and Carles Noguera. On rational Weak Nilpotent Minimum logics. Journal of Multiple-Valued Logic and Soft Computing, Vol. 12, Number 1-2, pp. 9-32, 2006.
[Esteva et al., 2007] F. Esteva, J. Gispert, L. Godo, and C. Noguera. Adding truth-constants to continuous t-norm based logics: Axiomatization and completeness results. Fuzzy Sets and Systems 158: 597–618, 2007.
[Esteva et al., 2007b] F. Esteva, L. Godo, and C. Noguera. On expansions of t-norm based logics with truth-constants. To appear in the book Fuzzy Logics and Related Structures (S. Gottwald, P. Hájek, U. Höhle and E.P. Klement eds.), Elsevier, 2007.
[Fariñas and Herzig, 1991] L. Fariñas del Cerro and A. Herzig. A modal analysis of possibility theory, Fundamentals of Artificial Intelligence Research (FAIR’91) (Jorrand Ph. and Kelemen J., Eds.), Lecture Notes in Computer Sciences, Vol. 535, Springer Verlag, Berlin, 1991, pp. 11-18.
[Fariñas et al., 1994] L. Fariñas del Cerro, A. Herzig and J. Lang, From ordering-based nonmonotonic reasoning to conditional logics, Artificial Intelligence, 66, 375-393, 1994.
[Fine, 1975] K. Fine (1975). Vagueness, truth and logic, Synthese, 30: 265-300.
[Flaminio, 2005] T. Flaminio. A Zero-Layer Based Fuzzy Probabilistic Logic for Conditional Probability. Lecture Notes in Artificial Intelligence, 3571: 8th European Conference on Symbolic and Quantitative Approaches on Reasoning under Uncertainty ECSQARU’05, Barcelona, Spain, July 2005. 714–725.
[Flaminio and Marchioni, 2006] Tommaso Flaminio and Enrico Marchioni. T-norm based logics with an independent involutive negation. Fuzzy Sets and Systems, Vol. 157, Issue 24, 3125–3144, 2006.
[Flaminio and Montagna, 2005] T. Flaminio and F. Montagna. A Logical and Algebraic Treatment of Conditional Probability. Archive for Mathematical Logic, 44, 245–262 (2005).
[Fodor, 1989] J. Fodor. Some remarks on fuzzy implication operations, BUSEFAL (IRIT, Univ. P. Sabatier, Toulouse, France), 38, 42-46, 1989.
[Fodor, 1995] J. Fodor. Nilpotent minimum and related connectives for fuzzy logic. Proc. of FUZZ–IEEE’95, 1995, pp. 2077–2082.
[Fodor and Yager, 2000] J.C. Fodor and R.R. Yager. Fuzzy Set-theoretic Operators and Quantifiers. Chapter 1.2 in: (D. Dubois and H. Prade, Eds.) Handbook of Fuzzy Sets and Possibility Theory, Vol. 1: Basic Notions, Kluwer, Boston, MA, 2000, pp. 125-193.
[Formato et al., 2000] F. Formato, G. Gerla, and M.I. Sessa. Similarity-based unification. Fundamenta Informaticae, 40:1–22, 2000.
[Fukami et al., 1980] S. Fukami, M. Mizumoto and K. Tanaka (1980). Some considerations on fuzzy conditional inference, Fuzzy Sets and Systems, 4, 243-273.
[Fung and Fu, 1975] L.W. Fung and K.S. Fu. An axiomatic approach to rational decision making in a fuzzy environment. In “Fuzzy sets and their applications to cognitive and decision processes”, (L.A. Zadeh, K.S. Fu, K. Tanaka, M. Shimura eds.), Academic Press, 227-256, 1975.
[Gabbay, 1996] D.M. Gabbay. How to make your logic fuzzy (fibred semantic and weaving of logics, part 3). In D. Dubois, E.P. Klement, and H. Prade, editors, Fuzzy Set, Logics, and Artificial Intelligence, pages 69–89, 1996.
[Gabbay, 1997] D.M. Gabbay. Fibring and labelling: Two methods for making modal logic fuzzy. In M. Mares, et al. eds. Proc. of Seventh International Fuzzy Systems Association World Congress IFSA’97, Vol. 1, Prague, Czech Republic, June 1997.
[Gabbay et al., 2004] D.M. Gabbay, G. Metcalfe and N. Olivetti. Hypersequents and Fuzzy Logic. Revista de la Real Academia de Ciencias 98(1), pages 113-126, 2004.
[Gaines, 1976] B.R. Gaines. Foundations of fuzzy reasoning, Int. J. of Man-Machine Studies, 6, 623-668, 1976.
[Gaines, 1978] B.R. Gaines. Fuzzy and probability uncertainty logics, Information and Control, 38, 154-169, 1978.
[Galatos et al., 2007] N. Galatos, P. Jipsen, T. Kowalski and H. Ono. Residuated Lattices: an algebraic glimpse at substructural logics, Studies in Logics and the Foundations of Mathematics 151, Elsevier 2007.
[Gentilhomme, 1968] Gentilhomme Y. (1968). Les ensembles flous en linguistique, Cahiers de Linguistique Théorique et Appliquée (Bucarest), 5, 47-63.
[Gerla, 2000] B. Gerla. A Note on Functions Associated with Gödel Formulas, Soft Computing, vol 4 (2000), 206-209.
[Gerla, 2001a] B. Gerla. Many-Valued Logics Based on Continuous t-Norms and Their Functional Representation, Ph.D. Università di Milano, 2001.
[Gerla, 2001b] B. Gerla. Rational Łukasiewicz logic and DMV-algebras. Neural Networks World, vol 11 (2001), 579-584.
[Gerla, 1994a] G. Gerla (1994). An Extension Principle for Fuzzy Logic. Mathematical Logic Quarterly, n. 40, 357-380.
[Gerla, 1994b] G. Gerla. Inferences in probability logic. Artificial Intelligence 70, 33–52, 1994.
[Gerla, 1996] G. Gerla. Graded Consequence Relations and Fuzzy Closure Operators. Journal of Applied Non-Classical Logics, vol. 6 num. 4, 369-379, 1996.
[Gerla, 2001] G. Gerla (2001) Fuzzy Logic: Mathematical Tools for Approximate Reasoning. Trends in Logic, vol. 11. Kluwer Academic Publishers.
[Gerla and Sessa, 1999] G. Gerla and M.I. Sessa. Similarity in logic programming. In G. Chen, M. Ying, and K. Cai, editors, Fuzzy Logic and Soft Computing, chapter 2, pages 19–31. Kluwer, 1999.
[Giles, 1988a] R. Giles. The concept of grade of membership, Fuzzy Sets and Systems, 25, 297–323, 1988.
[Giles, 1988b] R. Giles. A utility-valued logic for decision-making, Int. J. of Approximate Reasoning, 2, 113-141, 1988.
[Gödel, 1932] Kurt Gödel. Zum intuitionistischen Aussagenkalkül. Anzeiger Akademie der Wissenschaften Wien, 69:65–66, 1932.
[Godo, 1990] L. Godo. Contribució a l’Estudi de Models d’inferència en els Sistemes Possibilístics. PhD Thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 1990.
[Godo and Hájek, 1999] L. Godo and P. Hájek. Fuzzy Inference as Deduction. Journal of Applied non-Classical Logics 9(1), 37-60, 1999.
[Godo et al., 2000] L. Godo, F. Esteva and P. Hájek, Reasoning about probability using fuzzy logic. Neural Network World, Vol. 10, Number 5, 811-824 (2000).
[Godo et al., 2003] L. Godo, P. Hájek and F. Esteva. A fuzzy modal logic for belief functions. Fundamenta Informaticae 57(2-4), 127-146, 2003.
[Godo and Marchioni, 2006] L. Godo and E. Marchioni. Reasoning about coherent conditional probability in a fuzzy logic setting. Logic Journal of the IGPL, Vol. 14, Number 3, 457-481, 2006.
[Goguen, 1967] J.A. Goguen. L-fuzzy sets, J. Math. Anal. Appl. 8:145-174, 1967.
[Goguen, 1969] J.A. Goguen. The logic of inexact concepts, Synthese, 19, 325-37, 1969.
[Gottwald, 1993] Siegfried Gottwald. Fuzzy Sets and Fuzzy Logic: Foundations of Application – from a Mathematical Point of View. Vieweg, Wiesbaden, 1993.
[Gottwald, 2001] S. Gottwald. A Treatise on Many-valued Logics, Studies in Logic and Computation 9, Research Studies Press Ltd., Baldock, UK, 2001.
[Gottwald and Hájek, 2005] S. Gottwald and P. Hájek. Triangular norm based mathematical fuzzy logic. In Erich Petr Klement and Radko Mesiar, editors, Logical, Algebraic, Analytic and Probabilistic Aspects of Triangular Norms, pages 275–300. Elsevier, Amsterdam, 2005.
[Haack, 1979] Susan Haack. Do We Need “Fuzzy Logic”? International Journal of Man-Machine Studies, 11 (4), 437–445, 1979.
[Hájek, 1994] P. Hájek. A qualitative fuzzy possibilistic logic, Int. J. of Approximate Reasoning, 12 (1994), 1-19.
[Hájek, 1998a] P. Hájek. Metamathematics of Fuzzy Logic, Trends in Logic, vol. 4, Kluwer, Dordrecht, 1998.
[Hájek, 1998b] P. Hájek. Basic fuzzy logic and BL-algebras, Soft Computing 2 (1998), 124–128.
[Hájek, 2002] Petr Hájek. Observations on the monoidal t-norm logic. Fuzzy Sets and Systems, 132(1):107–112, 2002.
[Hájek, 2003a] P. Hájek. Fuzzy logics with non-commutative conjunctions. Journal of Logic and Computation, 13: 469-479, 2003.
[Hájek, 2003b] P. Hájek. Observations on non-commutative fuzzy logics. Soft Computing, 8:28–43, 2003.
[Hájek, 2005a] P. Hájek. Making fuzzy description logic more general. Fuzzy Sets and Systems, 154, 1-15, 2005.
[Hájek, 2005b] P. Hájek. Arithmetical complexity of fuzzy predicate logics – a survey. Soft Computing 9: 935–941, 2005.
[Hájek, 2005c] P. Hájek. Fleas and fuzzy logic. Journal of Multiple-Valued Logic and Soft Computing, Volume 11, Number 1-2, 137-152, 2005.
[Hájek, 2006a] P. Hájek. What does mathematical fuzzy logic offer to description logic? In Capturing Intelligence: Fuzzy Logic and the Semantic Web, Elie Sanchez, ed., Elsevier, 91–100, 2006.
[Hájek, 2006b] P. Hájek. Computational complexity of t-norm based propositional fuzzy logics with rational truth-constants. Fuzzy Sets and Systems 157 (2006) 677–682.
[Hájek and Cintula, 2006] Petr Hájek and Petr Cintula. On theories and models in fuzzy predicate logics. Journal of Symbolic Logic, 71(3):863–880, 2006.
[Hájek and Cintula, 2007] Petr Hájek and Petr Cintula. Triangular norm predicate fuzzy logics. To appear in the book Fuzzy Logics and Related Structures (S. Gottwald, P. Hájek, U. Höhle and E.P. Klement eds.), Elsevier, 2007.
[Hájek et al., 1994] P. Hájek, D. Harmancová, F. Esteva, P. Garcia and L. Godo. On modal logics for qualitative possibility in a fuzzy setting, Proc. of the 11th Conf. on Uncertainty in Artificial Intelligence (López de Mántaras R. and Poole D., eds.), Morgan Kaufmann, San Francisco, CA, 1994, pp. 278-285.
[Hájek et al., 1995] P. Hájek, L. Godo and F. Esteva. Fuzzy logic and probability, Proc. of the 12th Conf. on Uncertainty in Artificial Intelligence (Besnard P. and Hanks S., eds.), Morgan Kaufmann, San Francisco, CA, 1995, pp. 237-244.
[Hájek et al., 1996] Petr Hájek, Lluís Godo, and Francesc Esteva. A complete many-valued logic with product conjunction. Archive for Mathematical Logic, 35(3):191–208, 1996.
[Hähnle, 1994] R. Hähnle. Automated Deduction in Multiple-Valued Logics, volume 10 of International Series of Monographs in Computer Science. Oxford University Press, 1994.
[Hähnle, 2005] R. Hähnle. Many-valued logic, partiality, and abstraction in formal specification languages. Logic Journal of IGPL 2005 13(4):415-433.
[Höhle, 1979] U. Höhle. Minkowski functionals of L-fuzzy sets. First Symposium on Policy Analysis and Information Systems. Durham, North Carolina, USA, 178-186, 1979.
[Höhle, 1995] U. Höhle. Commutative, residuated l-monoids. In Höhle, U. and Klement, E.P., eds., Non-Classical Logics and Their Applications to Fuzzy Subsets, Kluwer Acad. Publ., Dordrecht (1995) 55–106.
[Höhle, 2007] U. Höhle. Fuzzy Sets and Sheaves. Part I: Basic Concepts. Part II: Sheaf-theoretic Foundations of Fuzzy Set Theory with Applications to Algebra and Topology, Fuzzy Sets and Systems, to appear.
[Hollunder, 1994] B. Hollunder. An alternative proof method for possibilistic logic and its application to terminological logics. Proc. of the 10th Conference on Uncertainty in Artificial Intelligence (UAI’94), (R.L. de Mántaras, D. Poole, eds.), San Francisco, CA, USA, 1994, 327-335.
[Hollunder, 1995] B. Hollunder. An alternative proof method for possibilistic logic and its application to terminological logics, Int. J. of Approximate Reasoning, 12 (1995), 85-109.
[Horčík, 2005b] Rostislav Horčík. Standard completeness theorem for ΠMTL. Archive for Mathematical Logic, 44(4): 413–424, 2005.
[Horčík, 2007] Rostislav Horčík. On the Failure of Standard Completeness in ΠMTL for Infinite Theories, Fuzzy Sets and Systems, 158(6): 619-624, March 2007.
[Horčík and Cintula, 2004] Rostislav Horčík and Petr Cintula. Product Łukasiewicz logic. Archive for Mathematical Logic, 43(4): 477–503, 2004.
[Hunter, 2002] A. Hunter. Measuring inconsistency in knowledge via quasi-classical models. Proc. 18th National Conference on Artificial Intelligence (AAAI 2002), Edmonton, Canada, pp. 68-73, 2002.
[Jenei and Montagna, 2002] Sándor Jenei and Franco Montagna. A proof of standard completeness for Esteva and Godo’s logic MTL. Studia Logica, 70(2):183–192, 2002.
[Jenei and Montagna, 2003] S. Jenei and F. Montagna. A proof of standard completeness for non-commutative monoidal t-norm logic. Neural Network World, 13: 481–488, 2003.
[Klawonn, 1995] F. Klawonn. Prolog extensions to many-valued logics. In Höhle, U. and Klement, E.P., editors, Non-Classical Logics and Their Applications to Fuzzy Subsets. A Handbook of the Mathematical Foundations of Fuzzy Set Theory, pages 271–289. Kluwer, 1995.
[Klawonn and Kruse, 1994] F. Klawonn and R. Kruse. A Łukasiewicz logic based Prolog. Mathware and Soft Computing, 1:5–29, 1994.
[Klawonn and Novák, 1996] F. Klawonn and V. Novák. The relation between inference and interpolation in the framework of fuzzy systems, Fuzzy Sets and Systems, 81, 331-354, 1996.
[Kleene, 1952] S.C. Kleene. Introduction to Metamathematics. North Holland, Amsterdam, 1952.
[Klement, 1980] E.P. Klement (1980) Some remarks on t-norms, fuzzy σ-algebras and fuzzy measures. Proc. of 2nd International Linz Seminar on Fuzzy Set Theory, E.P. Klement (ed.), 125-142.
[Klement and Navara, 1999] E.P. Klement and M. Navara. A survey of different triangular norm-based fuzzy logic, Fuzzy Sets and Systems 101, 1999, 241–251.
[Klement et al., 2000] E.P. Klement, R. Mesiar and E. Pap. Triangular Norms. Kluwer Academic Publisher, Dordrecht. 2000.
[Konieczny and Pino-Pérez, 1998] S. Konieczny and R. Pino Pérez. On the logic of merging. Proc. of the 1998 Conf. on Knowledge Representation and Reasoning Principles (KR-98), Trento. Morgan Kaufmann, San Francisco, Ca., 1998, 488-498.
[Konieczny et al., 2002] S. Konieczny, J. Lang and P. Marquis. Distance-based merging: A general framework and some complexity results. Proc. of the 8th International Conference, Principles of Knowledge Representation and Reasoning (KR2002), Toulouse. Morgan Kaufmann, San Francisco, Ca., 2002, pp. 97-108.
[Kowalski and Ono, 2001] T. Kowalski and H. Ono. Residuated lattices: An algebraic glimpse at logics without contraction, JAIST Report, March 2001, 1-67.
[Kraus et al., 1990] S. Kraus, D. J. Lehmann and M. Magidor. Nonmonotonic Reasoning, Preferential Models and Cumulative Logics. Artificial Intelligence 44(1-2): 167-207 (1990).
[Lafage et al., 2000] C. Lafage, J. Lang and R. Sabbadin. A logic of supporters. In: Information, Uncertainty and Fusion, (B. Bouchon-Meunier, R. R. Yager and L. A. Zadeh, Eds.), Kluwer Acad. Publ., Dordrecht, 2000, pp. 381-392.
[Lang, 1991] J. Lang. Logique Possibiliste: Aspects Formels, Déduction Automatique et Applications. Thèse de Doctorat, Université Paul Sabatier, Toulouse, 1991.
[Lang, 2001] J. Lang. Possibilistic logic: complexity and algorithms. In: Algorithms for Uncertainty and Defeasible Reasoning. (J. Kohlas, S. Moral, Eds.), Vol. 5 of the Handbook of Defeasible Reasoning and Uncertainty Management Systems, Kluwer Acad. Publ., Dordrecht, 2001, pp. 179-220.
[Lang et al., 1991] J. Lang, D. Dubois and H. Prade. A logic of graded possibility and certainty coping with partial inconsistency, Proc. of the 7th Conf. on Uncertainty in Artificial Intelligence, UCLA, Los Angeles, July 13-15, 1991, (Morgan Kaufmann, San Francisco, Ca.), 1991, 188-196.
[Lee, 1972] R.C.T. Lee. Fuzzy logic and the resolution principle. Journal of the ACM, 19(1):109–119, 1972.
[Lee and Chang, 1971] R.C.T. Lee and C.L. Chang. Some properties of fuzzy logic. Information and Control, 19(5):417–431, 1971.
[Lehmann and Magidor, 1992] D. Lehmann and M. Magidor. What does a conditional knowledge base entail? Artificial Intelligence 55 (1992), pp. 1-60.
[Lehmke, 1995] S. Lehmke. On resolution-based theorem proving in propositional fuzzy logic with ‘bold’ connectives. Universität Dortmund, Fachbereich Informatik, 1995. Master’s Thesis.
[Lehmke, 2001a] S. Lehmke. Logics which Allow Degrees of Truth and Degrees of Validity. PhD dissertation, Universität Dortmund, Germany, 2001.
[Lehmke, 2001b] S. Lehmke, Degrees of Truth and Degrees of Validity. In Discovering the World with Fuzzy Logic (V. Novák, I. Perfilieva, Eds) Physica Verlag, Heidelberg, 2001, pp. 19223791.
[Lewis, 1973a] D. L. Lewis. Counterfactuals. Oxford: Basil Blackwell, 1973.
[Lewis, 1973b] D.L. Lewis. Counterfactuals and comparative possibility, J. Philosophical Logic 2 (1973), pp 418-46.
[Liau, 1998] C.J. Liau, Possibilistic residuated implications logics with applications. Int. J. Uncertainty, Fuzziness, and Knowledge-based Systems, 6 (1998): 365-385.
[Liau, 1999] C.J. Liau, On the possibility theory-based semantics for logics of preference. International J. Approximate Reasoning, 20 (1999), 173-190.
[Liau and Lin, 1988] C.J. Liau and B.I.P Lin. Fuzzy logic with equality. International Journal Pattern Recognition and Artificial Intelligence, 2(2):351–365, 1988.
[Liau and Lin, 1993] C. J. Liau and B. I-P. Lin. Proof methods for reasoning about possibility and necessity, Int. J. of Approximate Reasoning, 9, 327-364, 1993.
[Liau and Lin, 1996] C.J. Liau and I.P. Lin. Possibilistic reasoning: a mini-survey and uniform semantics. Artificial Intelligence, 88 (1996), 163-193.
[Łukasiewicz, 1920] Jan Łukasiewicz. O logice trojwartosciowej (On three-valued logic). Ruch filozoficzny, 5:170–171, 1920.
[Łukasiewicz, 1930] J. Łukasiewicz. Philosophical remarks on many-valued systems of propositional logic, 1930. Reprinted in Selected Works (Borkowski, ed.), Studies in Logic and the Foundations of Mathematics, North-Holland, Amsterdam, 1970, 153-179.
[Łukasiewicz, 1970] J. Łukasiewicz. Selected Works, Borkowski, ed., Studies in Logic and the Foundations of Mathematics, North-Holland, Amsterdam, 1970.
[Lukasiewicz, 2006] T. Lukasiewicz. Fuzzy description logic programs under the answer set semantics for the semantic web. Proc. of the 2nd International Conference on Rules and Rule Markup Languages for the Semantic Web (RuleML 2006), (T. Eiter, E. Franconi, R. Hodgson, and S. Stephens, eds.) pp. 89-96, Athens, Georgia, IEEE Computer Society, 2006.
[Mamdani, 1977] E.H. Mamdani. Application of fuzzy logic to approximate reasoning using linguistic systems, IEEE Trans. on Comput., 26, 1182-1191, 1977.
[Marchioni, 2006] E. Marchioni. Possibilistic conditioning framed in fuzzy logics. International Journal of Approximate Reasoning, Vol. 43, Issue 2, 133-165, 2006.
[Marchioni and Montagna, 2006] E. Marchioni and F. Montagna. A note on definability in ŁΠ½. In Proc. of the 11th IPMU International Conference, 1588–1595, 2006.
[Marchioni and Montagna, to appear] E. Marchioni and F. Montagna. Complexity and definability issues in ŁΠ½. Journal of Logic and Computation, doi:10.1093/logcom/exl044, to appear.
[McNaughton, 1951] R. McNaughton. A theorem about infinite-valued sentential logic. Journal of Symbolic Logic, 16:1–13, 1951.
[Medina et al., 2001] J. Medina, M. Ojeda-Aciego and P. Vojtáš, Multi-adjoint logic programming with continuous semantics, Proc. of Logic Programming and Non-Monotonic Reasoning, LPNMR’01, Springer-Verlag, Lecture Notes in Artificial Intelligence 2173 (2001), 351-364.
[Mendel, 2000] J. Mendel (2000) Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions, Prentice-Hall, 2000.
[Metcalfe et al., 2003] G. Metcalfe, N. Olivetti, and D. Gabbay. Goal-directed calculi for Gödel-Dummett logics. In M. Baaz and J. A. Makowsky, editors, Proceedings of CSL 2003, volume 2803 of LNCS, pages 413-426. Springer, 2003.
[Metcalfe et al., 2004a] G. Metcalfe, N. Olivetti, and D. Gabbay. Analytic proof calculi for product logics. Archive for Mathematical Logic, 43(7): 859-889, 2004.
[Metcalfe et al., 2004b] G. Metcalfe, N. Olivetti, and D. Gabbay. Goal-directed methods for Łukasiewicz logics. In J. Marcinkowski and A. Tarlecki, editors, Proceedings of CSL 2004, volume 3210 of LNCS, pages 85–99. Springer, 2004.
[Metcalfe et al., 2005] G. Metcalfe, N. Olivetti, and D. Gabbay. Sequent and Hypersequent Calculi for Łukasiewicz and Abelian Logics. ACM Transactions on Computational Logic 6(3): 578-613, 2005.
[Metcalfe et al., to appear] G. Metcalfe, N. Olivetti, and D. Gabbay. Proof Theory for Fuzzy Logics. Book in preparation for Research Studies Press.
[Michálek, 1975] J. Michálek (1975). Fuzzy Topologies. Kybernetika, vol. II, n. 5, 345-354.
[Mizumoto and Tanaka, 1976] M. Mizumoto and K. Tanaka. Some properties of fuzzy sets of type 2, Information and Control, 31, 312-340, 1976.
[Mizumoto and Zimmermann, 1982] M. Mizumoto and H. J. Zimmermann. Comparison of fuzzy reasoning methods, Fuzzy Sets and Systems, 8, 253-283, 1982.
[Moisil, 1972] G. Moisil. La logique des concepts nuancés, Essais sur les Logiques Non Chrysippiennes, Editions Acad. Repub. Soc. Roum, Bucharest, 157-163, 1972.
[Montagna, 2000] Franco Montagna. An algebraic approach to propositional fuzzy logic. Journal of Logic, Language and Information 9, 91-124, 2000.
[Montagna, 2001] Franco Montagna. Functorial representation of MV∆ algebras with additional operations. Journal of Algebra 238, 99-125, 2001.
[Montagna, 2005] F. Montagna. Subreducts of MV-algebras with product and product residuation. Algebra Universalis 53, 109-137, 2005.
[Montagna and Ono, 2002] F. Montagna and H. Ono. Kripke semantics, undecidability and standard completeness for Esteva and Godo’s logic MTL∀. Studia Logica, 71(2): 227-245, 2002.
[Montagna and Panti, 2001] F. Montagna and G. Panti. Adding structure to MV-algebras. Journal of Pure and Applied Algebra 164, 365–387, 2001.
[Montagna et al., 2006] Franco Montagna, Carles Noguera, and Rostislav Horčík. On weakly cancellative fuzzy logics. Journal of Logic and Computation, 16(4): 423–450, 2006.
[Morsi and Fahmy, 2002] N. N. Morsi and A. A. Fahmy. On generalized modus ponens with multiple rules and a residuated implication, Fuzzy Sets and Systems, Volume 129, Issue 2, 16 July 2002, Pages 267-274.
[Mostert and Shields, 1957] P.S. Mostert and A.L. Shields. On the structure of semigroups on a compact manifold with boundary. Ann. Math., 65:117–143, 1957.
[Mukaidono and Kikuchi, 1993] M. Mukaidono and H. Kikuchi, Foundations of fuzzy logic programming, in: P.-Z. Wang, K.-F. Loe (Eds.), Between Mind And Computer - Fuzzy Science and Engineering, World Scientific Publ, pp. 225-244, 1993.
[Mukaidono et al., 1989] M. Mukaidono, Z.L. Shen, and L. Ding. Fundamentals of fuzzy Prolog. International Journal of Approximate Reasoning, 3:179–193, 1989.
[Mundici, 1994] D. Mundici. A constructive proof of McNaughton’s theorem in infinite-valued logic. Journal of Symbolic Logic, 59(2):596–602, 1994.
[Negoita and Ralescu, 1975] C.V. Negoita and D.A. Ralescu. Representation theorems for fuzzy concepts, Kybernetes, 4, 169-174, 1975.
[Nilsson, 1974] N.J. Nilsson. Probabilistic Logic. Artificial Intelligence, 28, 71-87, 1974.
[Niskanen, 1988] V. A. Niskanen. An alternative approach for specifying fuzzy linguistic truth values: truth as a distance, Cybernetics and Systems’88 (Trappl R., ed.), Kluwer Academic Publishers, 627-634, 1988.
[Noguera, 2006] Carles Noguera. Algebraic study of axiomatic extensions of t-norm based fuzzy logics. PhD thesis, University of Barcelona, Barcelona, 2006.
[Novák, 1990a] V. Novák. On the syntactico-semantical completeness of first-order fuzzy logic. Part I: Syntax and Semantics. Kybernetika, 26:47–66, 1990.
[Novák, 1990b] V. Novák. On the syntactico-semantical completeness of first-order fuzzy logic. Part II: Main results. Kybernetika, 26:134–154, 1990.
[Novák, 1999] V. Novák. Weighted inference systems. In J. Bezdek, D. Dubois, and H. Prade, editors, Fuzzy Sets in Approximate Reasoning and Information Systems, Fuzzy Sets Series, pages 191–241. Kluwer, 1999.
[Novák, 2004] V. Novák. On fuzzy equality and approximation in fuzzy logic. Soft Computing 8 (2004) 668–675.
[Novák, 2005] Vilém Novák. On fuzzy type theory. Fuzzy Sets and Systems 149(2), 235–273, 2005.
[Novák, 2006] Vilém Novák, editor. Special section “What is Fuzzy Logic”, Fuzzy Sets and Systems 157(5), 595–718, 2006.
[Novák and Lehmke, 2006] Vilém Novák and Stephan Lehmke. Logical structure of fuzzy IF-THEN rules. Fuzzy Sets and Systems 157(15) 2003–2029, 2006.
[Novák and Perfilieva, 2000] V. Novák and I. Perfilieva. Some consequences of Herbrand and McNaughton theorems in fuzzy logic. In V. Novák and I. Perfilieva, editors, Discovering the World with Fuzzy Logic, Studies in Fuzziness and Soft Computing, pages 271–295. Physica-Verlag, 2000.
[Novák et al., 1999] Vilém Novák, Irina Perfilieva, and Jiří Močkoř. Mathematical Principles of Fuzzy Logic. Kluwer, Dordrecht, 1999.
[Ono and Komori, 1985] H. Ono and Y. Komori. Logics without the contraction rule. Journal of Symbolic Logic 50, 169-201, 1985.
446
Didier Dubois, Francesc Esteva, Llu´ıs Godo and Henri Prade
[Parikh, 1983] R. Parikh. The problem of vague predicates, Language, Logic and Method, (Cohen R.S. and Wartopsky M.W., eds.), D. Reidel, Dordrecht, 241-261, 1983. [Pavelka, 1979] J. Pavelka. On Fuzzy Logic I, II, III. Zeitschrift fur Math. Logik und Grundlagen der Math. 25 (1979) 45-52, 119-134, 447-464. [Pawlak, 1991] Z. Pawlak.Rough Sets: Theoretical Aspects of Reasoning About Data. Dordrecht: Kluwer Academic Publishing, 1991. [Pei, 2003] D. Pei. On equivalent forms of fuzzy logic systems NM and IMTL. Fuzzy Sets and Systems 138 (2003) 187 - 195. [Perfilieva, 2004] Irina Perfilieva. Normal forms in BL-algebra and their contribution to universal approximation of functions. Fuzzy Sets and Systems 143(1): 111-127, 2004. [Polya, 1954] G. Polya (1954). Patterns of Plausible Inference, Princeton University Press. [Prade, 1980] H. Prade. Unions et intersections d’ensembles flous. Busefal 3, 58-62, 1980. [Prade, 1982] H. Prade. Mod` eles Math´ ematiques de l’Impr´ ecis et de l’Incertain en vue ˆ d’Applications au Raisonnement Naturel, Th`ese de Doctorat d’Etat, Universit´e Paul Sabatier, 1982. [Prade, 1985] H. Prade. A computational approach to approximate and plausible reasoning with applications to expert systems, IEEE Trans. on Pattern Analysis and Machine Intelligence, 7(3), 260-283. Corrections in 7(6), 747-748, 1985. [Prade, 1988] H. Prade. Raisonner avec des r`egles d’inf´erence graduelle - Une approche bas´ee sur les ensembles flous, Revue d’Intelligence Artificielle (Herm`es, Paris), 2(2), 29-44, 1988. [Rasiowa, 1974] Helena Rasiowa. An Algebraic Approach to Non-Classical Logics. NorthHolland, Amsterdam, 1974. [Reichenbach, 1949] H. Reichenbach. The theory of probability, University of California Press, 1949. [Rescher, 1976] N. Rescher. Plausible Reasoning. Van Gorcum, Amsterdam, 1976. [Rescher and Manor, 1970] N. Rescher, R. Manor, On inference from inconsistent premises. Theory and Decision, 1(1970), 179-219. [Rodr´ıguez et al., 2003] R. Rodr´ıguez, F. 
Esteva, P. Garca and L. Godo. On Implicative Closure Operators in Approximate Reasoning. International Journal of Approximate Reasoning 33 (2003) 159–184. Preliminary version in Proc. of 1999 Eusflat-Estylf Joint Conference, Palma de Mallorca, Sep. 99, pp. 35-37. [Rose and Rosser, 1958] A. Rose and J.B. Rosser. Fragments of many-valued statement calculi. Transactions of the American Mathematical Society 87, 1–53, 1958. [Ruspini, 1991] E.H. Ruspini. On the semantics of fuzzy logic, Int. J. of Approximate Reasoning, 5, 45-88, 1991. [Russell, 1923] B. Russell. Vagueness, Austr. J. of Philosophy, 1, 84-92, 1923. [Sanchez, 1978] E. Sanchez. On possibility-qualification in natural languages, Information Sciences, 15, 45-76, 1978. [Savick´ y et al., 2006] P. Savick´ y, R. Cignoli, F. Esteva, L. Godo, and C. Noguera. On product logic with truth constants. Journal of Logic and Computation, 16(2):205–225, 2006. [Schotch, 1975] P.K. Schotch. Fuzzy modal logic. In Proc. of the 5th Intl. Symposium on Multiple-valued Logic (ISMVL-75), IEEE press, pp. 176-183, 1975. [Schweizer and Sklar, 1963] B. Schweizer and A. Sklar. Associative functions and abstract semigroups. Publ. Math. Debrecen 10, pp. 69-180, 1963. [Schweizer and Sklar, 1983] B. Schweizer and A. Sklar. Probabilistic metric spaces, NorthHolland, 1983. [Shackle, 1961] G. L.S. Shackle. Decision, Order and Time in Human Affairs, (2nd edition), Cambridge University Press, UK, 1961. [Shafer, 1975] G. Shafer A mathematical theory of evidence. Princeton Univ. Press 1975. [Shen et al., 1988] Z.L. Shen, L. Ding, and M. Mukaidono. Fuzzy resolution principle. In Proceedings of the Eighteenth International Symposium on Multiple-Valued Logic, ISMVL-88, pages 210–214, Palma de Mallorca, Spain, 1988. IEEE Press. [Smets and Magrez, 1987] P. Smets and P. Magrez. Implication in fuzzy logic, Int. J. of Approximate Reasoning, 1, 327-347, 1987. [Smets and Magrez, 1988] P. Smets and P. Magrez. 
The measure of the degree of truth and the grade of membership, Fuzzy Sets and Systems, 25, 297-323, 1988.
Fuzzy Logic
447
[Spohn, 1990] W. Spohn. A general non-probabilistic theory of inductive reasoning, Uncertainty in Artificial Intelligence 4 (Shachter R.D., Levitt T.S., Kanal L.N. and Lemmer J.F., Eds.), North-Holland, Amsterdam, 149-158, 1980. [Straccia, 1998] U. Straccia. A fuzzy description logic. Proc. 15th National Conf. on Artificial Intelligence (AAAI’98) and 10th Conf. on Innovative Applications of Artificial Intelligence (IAAI’98), AAAI Press, 594-599, 1998. [Straccia, 2001] U. Straccia. Reasoning within fuzzy description logics. J. of Artif. Intellig. Research, 14:137-166, 2001. [Straccia, 2006a] U. Straccia. A fuzzy description logic for the semantic web. In Capturing Intelligence: Fuzzy Logic and the Semantic Web, Elie Sanchez, ed., Elsevier, 2006. [Straccia, 2006b] U. Straccia. Uncertainty and description logic programs over lattices. In Capturing Intelligence: Fuzzy Logic and the Semantic Web, Elie Sanchez, ed., Elsevier, 2006. [Straccia, 2006c] U. Straccia. Description logics over lattices. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 14(1): 1-16, 2006. [Sudkamp, 1993] T. Sudkamp. Similarity, interpolation, and fuzzy rule construction, Fuzzy Sets and Systems, 58, 73-86, 1993. [Sugeno, 1985] M. Sugeno. An introductory survey of fuzzy control, Information Sciences, 36, 59-83, 1985. [Sugeno and Takagi, 1983] M. Sugeno and T. Takagi. Multi-dimensional fuzzy reasoning, Fuzzy Sets and Systems, 9, 313-325, 1983. [Takeuti and Titani, 1992] G. Takeuti and S. Titani. Fuzzy Logic and Fuzzy Set Theory. Archive for Mathematical Logic 32, 1-32, 1992. [Tarski, 1930] A. Tarski. Fundamentale Begriffe der Methodologie der deduktiven Wissenschaften. Monat-shefte Mathematik Physik, 37: 361-404, 1930 [Thiele and Lehmke, 1994] H. Thiele and S. Lehmke.On ‘bold’ resolution theory.In Proceedings of the Third IEEE International Conference on Fuzzy Systems, Fuzz-IEEE-94, pages 1945– 1950, Orlando, Florida, 1994. IEEE Press. [Tresp and Molitor, 1998] C.B. 
Tresp and R. Molitor. A description logic for vague knowledge. Proc. of the 13th European Conference on Artificial Intelligence (ECAI’98), J. Wiley and Sons, 1998, 361-365. [Trillas, 1979] E. Trillas. Sobre funciones de negacin en la teora de conjuntos difusos (in spanish), Stochastica, III, 1, 47-60, 1979. English version: On negation functions in fuzzy set theory, Advances of Fuzzy Logic (Barro S., Bugarn A. and Sobrino A., eds.), Publicacin de la Universidade de Santiago de Compostela, 1998, 31-40. [Trillas and Valverde, 1981] E. Trillas and L. Valverde. On some functionally expressable implications for fuzzy set theory, Proc. of the 3rd Inter. Seminar on Fuzzy Set Theory, Linz, Austria, 173-190, 1981. [Trillas and Valverde, 1985] E. Trillas and L. Valverde. On implication and indistinguishability in the setting of fuzzy logic, Management Decision Support Systems using Fuzzy Sets and Possibility Theory (Kacprzyk J. and Yager R. R., eds.), Verlag TV Rheinland, Kln, 198-212, 1985. [Ughetto et al., 1997] L. Ughetto, D. Dubois and H. Prade. Efficient inference procedures with fuzzy inputs, Proc. of the 6th IEEE Inter. Conf. on Fuzzy Systems (FUZZ-IEEE’97), Barcelona, Spain, 567-572, 1997. [Urquhart, 1986] A. Urquhart. Many-Valued Logic. In Dov M. Gabbay and Franz Guenthner, eds. Handbook of Philosophical Logic: Volume III, Alternatives to Classical Logic, pp.71-116. Dordrecht: Reidel, 1986. [Valverde and Trillas, 1985] L. Valverde and E. Trillas. On modus ponens in fuzzy logic. In Proceedings of the Fifteenth International Symposium on Multiple-Valued Logic, ISMVL-85, pages 294–301. IEEE Press, 1985. [Vojt´ aˇs, 1998] P. Vojt´ aˇs. Fuzzy reasoning with tunable t-operators. Journal for Advanced Computer Intelligence, 2:121–127, 1998. [Vojt´ aˇs, 2001] P. Vojt´ aˇs. Fuzzy logic programming. Fuzzy Sets and Systems, 124(3):361–370, 2001. [Wang, 1999] G.J. Wang. On the logic foundation of fuzzy reasoning, Information Sciences 117 (1999) 47–88. [Wang, 2000] G.J. Wang. 
Non-classical Mathematical Logic and Approximate Reasoning, Science Press, Beijing, 2000 (in Chinese).
448
Didier Dubois, Francesc Esteva, Llu´ıs Godo and Henri Prade
[Wang et al., 2004] S.-M.San-Min Wang, Bao-Shu Wang and Xiang-Yun Wang. A characterization of truth-functions in the nilpotent minimum logic. Fuzzy Sets and Systems, Volume 145, 253-266, 2004. [Wang et al., 2005a] San-Min Wang, Bao-Shu Wang, and Dao-Wu Pei. A fuzzy logic for an ordinal sum t-norm. Fuzzy Sets and Systems, 149(2):297–307, 2005. [Wang et al., 2005b] S. Wang, B. Wang, Ren-Fang. NML, a schematic extension of F. Esteva and L. Godo’s logic MTL. Fuzzy Sets Syst 149, 285-295, 2005. [Weber, 1983] S. Weber. A general concept of fuzzy connectives, negations and implications based on t-norms and t-co-norms, Fuzzy Sets and Systems, 11, 115-134, 1983. [Weston, 1987] T. Weston. Approximate truth, J. Philos. Logic, 16, 203-227, 1987. [Weyl, 1946] H. Weyl. Mathematic and logic, Amer. Math. Month., 53, 2-13, 1946. [Whalen, 2003] T. Whalen. Parameterized R-implications Fuzzy Sets and Systems, 134, 2003, 231-281, 2003. [Whalen and Schott, 1983] T. Whalen and B. Schott. Issues in fuzzy production systems, Int. J. of Man-Machine Studies, 19, 57-71, 1983. [Whalen and Schott, 1985] T. Whalen and B. Schott. Alternative logics for approximate reasoning in expert systems: A comparative study, Int. J. of Man-Machine Studies, 22, 327-346. [W´ ojcicki, 1988] R. W´ ojcicki. Theory of Logical Calculi: Basic Theory of Consequence Operations. Kluwer Academic Publishers, Dordrecht, 1988. [Yager, 1983a] R.R. Yager. An introduction to applications of possibility theory, Human Systems Management, 3, 246-269, 1983. [Yager, 1983b] R.R. Yager. Some relationships between possibility, truth and certainty, Fuzzy Sets and Systems, 11, 151-156, 1983. [Yager, 1985a] R.R. Yager. Inference in a multivalued logic system. International Journal ManMachine Studies, 23:27–44, 1985. [Yager, 1985b] R.R. Yager. Strong truth and rules of inference in fuzzy logic and approximate reasoning, Cybernetics and Systems, 16, 23-63, 1985. [Yen, 1991] J. Yen. Generalizing term subsumption languages to fuzzy logic. 
Proc. of the 12th International Joint Conference on Artificial Intelligence (IJCAI’91), Sidney, August 1991, 472-477. [Ying, 1994] M. Ying (1994). A Logic for Approximate Reasoning. Journal of Symbolic Logic,vol. 59, n. 3, 830-837. [Zadeh, 1965] L.A. Zadeh. Fuzzy sets, Information and Control, 8, 338-353, 1965. [Zadeh, 1972] L.A. Zadeh. A fuzzy-set-theoretic interpretation of linguistic hedges, J. of Cybernetics, 2, 4-34, 1972. [Zadeh, 1973] L.A. Zadeh. Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. on Systems, Man and Cybernetics, 3, 28-44, 1973. [Zadeh, 1975a] L.A. Zadeh. Fuzzy Logic and approximate reasoning (In memory of Grigore Moisil), Synthese, 30, 407-428, 1975. [Zadeh, 1975b] L.A. Zadeh. Calculus of fuzzy restrictions, Fuzzy Sets and their Applications to Cognitive and Decision Processes (Zadeh L. A., Fu K. S., Tanaka K. and Shimura M., eds.), Academic Press, New York, 1-39, 1975. [Zadeh, 1975c] L.A. Zadeh. The concept of a linguistic variable and its application to approximate reasoning, Information Sciences, Part 1: 8, 199-249; Part 2: 8, 301-357; Part 3: 9, 43-80, 1975. [Zadeh, 1976] L.A. Zadeh. A fuzzy-algorithmic approach to the definition of complex or imprecise concepts, Int. J. of Man-Machine Studies, 8, 249-291, 1976. [Zadeh, 1978a] L.A. Zadeh. Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems, 1, 3-28, 1978. [Zadeh, 1978b] L.A. Zadeh. PRUF - A meaning representation language for natural languages, Int. J. of Man-Machine Studies, 10, 395-460, 1978. [Zadeh, 1979a] L.A. Zadeh. A theory of approximate reasoning. In J.E: Hayes, D. Michie, and L.I. Mikulich, editors, Machine Intelligence, volume 9, pages 149–194. Elsevier, 1979. [Zadeh, 1979b] L.A. Zadeh, Fuzzy sets and information granularity. In M.M. Gupta, R.K. Ragade and R.R. Yager (eds.), Advances in Fuzzy Set Theory and Applications, North-Holland, Amsterdam, pp. 3-18,1979. [Zadeh, 1981] L.A. Zadeh. 
Test score semantics for natural languages and meaning representation via PRUF, Empirical Semantics, Vol. 1 (Rieger B. B., ed.), Brockmeyer, Bochum, 281-349, 1981.
Fuzzy Logic
449
[Zadeh, 1987] L.A. Zadeh. A computational theory of dispositions, Int. J. of Intelligent Systems, 2, 39-63, 1987. [Zadeh, 1988] L.A. Zadeh.Fuzzy Logic. IEEE Computer 21(4): 83-93 (1988) [Zadeh, 1989] L.A. Zadeh. Knowledge Representation in Fuzzy Logic. IEEE Trans. Knowl. Data Eng. 1(1): 89-100 (1989) [Zadeh, 1992] L.A. Zadeh.The calculus of fuzzy if/then rules, AI Expert, 7(3), 27-27, 1992. [Zadeh, 1994a] L.A. Zadeh. Preface in Fuzzy Logic technology and Applications, (R. J. Marks-II Ed.), IEEE Technical Activities Board (1994). [Zadeh, 1994b] L.A. Zadeh. Soft computing and fuzzy logic, IEEE Software, November issue, 48-56, 1994. [Zadeh, 1995] L.A. Zadeh. Fuzzy logic = Computing with words, IEEE Trans. on Fuzzy Systems, 4, 103-111, 1995. [Zadeh, 1997] L.A. Zadeh. Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems, 90, 1997, 111-127. [Zadeh, 1999] L.A. Zadeh. A New Direction in System Analysis: From Computation with Measurements to Computation with Perceptions (Abstract). RSFDGrC 1999: 10-11. [Zadeh, 2001] L.A. Zadeh. A New Direction in AI: Toward a Computational Theory of Perceptions. AI Magazine 22(1): 73-84 (2001). [Zadeh, 2005] L.A. Zadeh. Toward a generalized theory of uncertainty (GTU)–an outline, Information Sciences, 172, 2005, 1-40. [Zhang and Zhang, 2004] Wen-Ran Zhang, Lulu Zhang. YinYang bipolar logic and bipolar fuzzy logic. Information Sciences, 165, 265-287, 2004.
NONMONOTONIC LOGICS: A PREFERENTIAL APPROACH Karl Schlechta
1 INTRODUCTION
What are nonmonotonic logics, and why do they exist?

A logic is called non-monotonic if its consequence relation is not monotone in the first argument: if |∼ is the consequence relation, then T |∼ φ need not imply T′ |∼ φ for T ⊆ T′. Seen from classical logic, this is a surprising property, which is, however, imposed by the intended application. Non-monotonic logics are used for (among other things) reasoning with information of different quality. For instance, to take the most common example, the sentence “birds fly” of common sense reasoning does not mean that all birds fly, with “all” the classical quantifier, but that the majority of birds fly, that the interesting ones fly, or something of the kind. It is general information, which we are prepared to give up in the face of more specific or more reliable (i.e. better quality) information. Knowing that Tweety is a bird, and that it is a penguin, will make us believe that the property of the more special class, penguins, namely to be unable to fly, overrides the general property. Thus bird(x) |∼ fly(x), but bird(x) ∧ penguin(x) |≁ fly(x), and even bird(x) ∧ penguin(x) |∼ ¬fly(x).

So, we can summarize: non-monotonic logics are an abstraction of principled reasoning with information of different quality (among other things). Thus, they have their justification as a logic of artificial intelligence, which tries to imitate aspects of common sense reasoning.

There are several types of non-monotonic logics; the principal ones are perhaps:
• defeasible inheritance
• defaults
• others, such as
(1) autoepistemic logic
(2) circumscription
(3) logic programming and Prolog
(4) preferential reasoning
Handbook of the History of Logic. Volume 8 Dov M. Gabbay and John Woods (Editors) © 2007 Elsevier B.V. All rights reserved.
(5) theory revision
(6) theory update

The last two, theory revision and theory update, stand out, as their consequence relation takes two arguments on the left: e.g. for theory revision we take K and φ, and look at the consequences of K ∗ φ, the result of revising K by φ. This property is, however, at least in the traditional AGM approach (due to Alchourrón, Gärdenfors and Makinson [1985]), only a superficial distinction, as K will be fixed. As a matter of fact, theory revision and update find their place naturally in the general nonmonotonic context.

Defeasible inheritance, however, is radically different from the other formalisms: the reasoning follows predetermined paths, and it does not have the flexibility of the other systems. It is deceptively simple, but is ridden with some deep problems, like extensions versus direct scepticism, and so on. It will not be treated here; the reader is referred to the book by Touretsky [1986], with some discussion also to be found in the author’s [1997a].

Defaults take an intermediate position: they are based on classical logic, enriched with rules which work with consistency tests. They will also not be treated here; see Reiter [1980] for the original paper.

Concerning the remaining logics, the author of these lines has never done serious work, i.e. research, on autoepistemic logic, circumscription, logic programming and Prolog, so he simply does not feel competent enough for a deeper presentation.

Theory update is very close to counterfactual conditionals, which are relatively well known, and not very far from theory revision, though the quantifier is distributed differently at a decisive point: in the semantics of distance based theory revision, we look at all φ-models which are globally closest to the set of all K-models, whereas in theory update, we look at all φ-models which are closest to some K-model (this makes it monotone in the first argument). We will not treat theory update here, either.
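The different distribution of the quantifier in revision and update can be made concrete in a small sketch. The following Python fragment is only an illustration under assumptions that are not part of the text: models are represented as frozensets of the atoms they make true, the distance is the Hamming distance, the model sets are finite and non-empty, and the names `revise` and `update` are hypothetical.

```python
def hamming(m1, m2):
    """Number of atoms on which two models differ (assumed distance)."""
    return len(m1 ^ m2)

def revise(k_models, phi_models, d=hamming):
    """phi-models that are globally closest to the set of all K-models."""
    best = min(d(k, m) for k in k_models for m in phi_models)
    return {m for m in phi_models if any(d(k, m) == best for k in k_models)}

def update(k_models, phi_models, d=hamming):
    """phi-models that are closest to some individual K-model."""
    result = set()
    for k in k_models:
        best = min(d(k, m) for m in phi_models)
        result |= {m for m in phi_models if d(k, m) == best}
    return result

# Hypothetical example over the atoms p, q, r.
K = [frozenset({"p"}), frozenset({"r"})]                    # models of K
PHI = [frozenset({"p", "q"}), frozenset({"p", "q", "r"})]   # models of phi = p and q

revised = revise(K, PHI)   # only the globally closest phi-model survives
updated = update(K, PHI)   # each K-model contributes its own closest phi-models
```

In this example the K-model {r} has no φ-model at the globally minimal distance, so it contributes nothing to the revision, whereas in the update it still selects its own closest φ-model.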
So we will focus on preferential reasoning and theory revision. This introductory remark will be further restricted, and we will discuss only preferential reasoning. The reason is simple: the basic problems and approaches are similar, so one can stand for the other.

There are two, even three approaches to preferential reasoning. The first is by structural semantics: the consequences are not those formulas which hold in all (classical) models of a (classical) theory T, but those which hold in the preferred models of that theory, where preference is determined by a binary relation between classical models. As the preferred models are a subset of all models, we have a strengthening of classical logic. Note that we have a superstructure over the set of classical models, just as a Kripke structure is on top of a set of classical models. This construction is the natural and basically very simple idea of preferential structures. They were first introduced in a different context, for deontic logic, by Hansson in [1971], and then rediscovered by Shoham [1987] and Siegel [1985] for non-monotonic logic, generalizing circumscription.
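The structural idea can be sketched for the bird and penguin example. The Python fragment below is a minimal sketch, not a definition taken from the chapter: the three atoms, the ranking function `abnormality`, and the helper names are assumptions chosen for illustration, and formulas are represented simply as Python predicates on models.

```python
from itertools import combinations

ATOMS = ("bird", "penguin", "fly")

def all_models(atoms=ATOMS):
    """All classical models, each represented by the frozenset of atoms it makes true."""
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

def abnormality(m):
    """Hypothetical ranking of models; a lower value means a more normal model."""
    if "penguin" not in m and ("bird" not in m or "fly" in m):
        return 0  # no penguin around, and a bird (if any) flies
    if m == frozenset({"bird", "penguin"}):
        return 1  # a penguin that is a bird and does not fly
    return 2      # everything else

def prec(m1, m2):
    """m1 is strictly preferred to m2 (assumed preference relation)."""
    return abnormality(m1) < abnormality(m2)

def minimal(models, prec):
    """The preferred models: those with no strictly preferred model in the same set."""
    return [m for m in models if not any(prec(n, m) for n in models)]

def entails(premise, conclusion, prec=prec):
    """premise |~ conclusion iff conclusion holds in all preferred premise-models."""
    premise_models = [m for m in all_models() if premise(m)]
    return all(conclusion(m) for m in minimal(premise_models, prec))

bird = lambda m: "bird" in m
bird_and_penguin = lambda m: "bird" in m and "penguin" in m
fly = lambda m: "fly" in m
not_fly = lambda m: "fly" not in m

birds_fly = entails(bird, fly)                        # bird |~ fly
penguins_fly = entails(bird_and_penguin, fly)         # fails for penguins
penguins_dont_fly = entails(bird_and_penguin, not_fly)
```

Under this assumed ranking, adding the premise penguin withdraws the conclusion fly, which is exactly the failure of monotony; every classical consequence of the premise is still obtained, since the preferred models are among the classical ones.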
The second approach is by examination of natural rules for non-monotonic logics. Note that the important rule of monotony obviously fails, so reasoning becomes less regulated, and the need for other laws is strongly felt. Such rules are e.g. AND (if T |∼ φ and T |∼ φ′, then T |∼ φ ∧ φ′), OR (if φ |∼ ψ and φ′ |∼ ψ, then also φ ∨ φ′ |∼ ψ), etc. Such laws were examined first by Gabbay [1985] and Makinson [1994]. The connection between the two approaches was quite soon seen, and published in the seminal papers [Kraus et al., 1990] and [Lehmann and Magidor, 1992].

Finally, the third approach is intermediate, and considers the abstract algebraic properties of the choice functions defined by preference. The most important property is X ⊆ Y → µ(Y) ∩ X ⊆ µ(X). Its validity is obvious: if y ∈ X is minimal or preferred in Y, y ∈ µ(Y), i.e. there is no y′ < y in Y, then there is no such y′ in X, so it must be minimal in X, too. (For historical reasons, preference increases downwards.) It is intermediate in the following sense: such algebraic properties of the choice functions carry (almost) directly over to the logical properties of the generated consequence relation, and the hard work in representation is the construction of the relation from the properties of the choice function. Such choice functions were considered in social choice, see the work by Aizerman, Arrow, Chernoff, Malishevski, Sen [Aizerman, 1985; Aizerman and Malishevski, 1981; Arrow, 1959; Chernoff, 1954; Sen, 1970], and rediscovered in the context of (possibly infinite) representation problems by the present author [Schlechta, 1992]. The connection was pointed out by Lehmann [2001].

Where are the problems? Apart from general problems facing researchers in representation questions, i.e. to find the right construction techniques, there are specific issues to treat here. The first one is hidden in the “(almost)” of the preceding paragraph.
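The property X ⊆ Y → µ(Y) ∩ X ⊆ µ(X) can be checked by brute force on small finite examples. The sketch below assumes a finite universe of integers and represents the preference relation as an arbitrary set of pairs; the function names are hypothetical, and the random relations are only test data.

```python
from itertools import combinations
import random

def mu(X, prec):
    """Minimal elements of X: those y with no z in X such that (z, y) is in prec."""
    return {y for y in X if not any((z, y) in prec for z in X)}

def subsets(U):
    """All subsets of a finite set U."""
    items = sorted(U)
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def satisfies_mu_subset(U, prec):
    """Check X ⊆ Y → mu(Y) ∩ X ⊆ mu(X) for all X ⊆ Y ⊆ U."""
    for Y in subsets(U):
        for X in subsets(Y):
            if not (mu(Y, prec) & X) <= mu(X, prec):
                return False
    return True

# The property holds for every binary relation, not only for orders.
random.seed(0)
U = {0, 1, 2, 3}
always_holds = all(
    satisfies_mu_subset(U, {(a, b) for a in U for b in U if random.random() < 0.3})
    for _ in range(20)
)
```

The check uses no property of the relation at all, which matches the argument in the text. The converse inclusion fails in general: an element can be minimal in X because the element below it lies only in the larger set Y.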
In the general infinite case, it may well be that µ(M(T)), the set of minimal models of a theory, does not correspond to any theory, i.e. it is not definable by a theory. In this case, the tight connection between semantics and logics is loosened, and the usual characterizations fail and cannot be recovered. The second one has to do with domain closure properties: if, e.g., the domain of definable model sets is not closed under finite unions (it is in classical logic, but not in all other logics, when we build preferential structures on top of their models), this has far reaching consequences on possible characterizations. This is the subject of still ongoing research.

Organisation of the text

We begin with a short discussion of some concepts underlying nonmonotonic logics, and their development. The main emphasis of this text is, however, on formal results in Section 2, and on proof techniques, advanced problems and solutions in Section 3. In the latter part, we will in particular discuss definability preservation and domain closure properties, which are certainly also important for other fields in non-classical logic, and thus go beyond the framework of this Chapter.
1.1 Basic development and semantical notions
Since the beginning of nonmonotonic logics, there was a development in two directions:
• from fast and dirty common sense reasoning to reasoning about normality,
• from rules to semantics.

Grosso modo, the second development was not followed by researchers who wanted to rapidly create a working system, whereas it was followed by that part of the community which was more foundations oriented, and wanted to understand what those new logics were about. And there seems to be no better way to understand a logic than a formal semantics.

In the beginning, there was hope that somehow bundling information into normality would allow us to simplify reasoning. Of course, this is partly true: we can subsume many cases under “normal cases”, and exceptions under “abnormal cases”. But this leaves two fundamental problems unsolved:
(1) is reasoning with normal cases more efficient?
(2) how do we know whether assuming normality is justified?
Solutions to the second problem via consistency in some non-trivial logic, an often adopted idea, are notoriously inefficient. As a consequence, researchers have turned to the perhaps more accessible question of what “normality” is, or, better, what its properties are. The author has followed both re-directions, and this chapter will reflect it.

When we look at formal semantics of logics which are more or less tightly related to the question of reasoning about normality, we see that two basic concepts stand out: size and distance. These do not necessarily mean size and distance as we use them in everyday life, or in usual mathematics, but they are sufficiently close to the usual concepts to merit their name. Size and distance can be used to define other notions too, like certainty, utility, etc., but this discussion would lead beyond this handbook chapter, and we refer the reader to [Schlechta, 2004] instead.
We discuss first the concept of size

It is natural to interpret “normality” by some sort of “size”: “normality” might just mean “majority” (perhaps with different weight given to different cases), or something like “a big subset”. The standard abstraction of “big” is the notion of a filter (or, dually, an ideal is the abstraction of “small”). We include immediately a modification, the weak versions, to be discussed below. They seem to be minimal in the following sense: a reasonable abstract notion of size without the properties of weak filters seems difficult to imagine. The full set seems the best candidate for a “big” subset, “big” should cooperate with inclusion, and, finally, no set should be big and small at the same time.
DEFINITION 1. Fix a base set X. A (weak) filter on or over X is a set F ⊆ P(X), where P(X) is the power set of X, s.t. (F1)–(F3) ((F1), (F2), (F3′) respectively) hold:
(F1) X ∈ F
(F2) A ⊆ B ⊆ X, A ∈ F imply B ∈ F
(F3) A, B ∈ F imply A ∩ B ∈ F
(F3′) A, B ∈ F imply A ∩ B ≠ ∅.
So a weak filter satisfies (F3′) instead of (F3).
A (weak) ideal on or over X is a set I ⊆ P(X), s.t. (I1)–(I3) ((I1), (I2), (I3′) respectively) hold:
(I1) ∅ ∈ I
(I2) A ⊆ B ⊆ X, B ∈ I imply A ∈ I
(I3) A, B ∈ I imply A ∪ B ∈ I
(I3′) A, B ∈ I imply A ∪ B ≠ X.
So a weak ideal satisfies (I3′) instead of (I3).

Elements of a filter on X are called big subsets of X, their complements are called small, and the rest have “medium size”. The set of the X-complements of the elements of a filter forms an ideal, and vice versa.

Due to the finite intersection property, filters and ideals work well with logics: if φ holds normally, as it holds in a big subset, and so does φ′, then φ ∧ φ′ will normally hold, too, as the intersection of two big subsets is big again. This is a nice property, but not justified in all situations; consider e.g. simple counting of a finite subset. (The question has a name, “lottery paradox”: normally no single participant wins, but someone wins in the end.) This motivates the weak versions, see Section 2.3 below for more details.

Normality defined by (weak or not) filters is a local concept: the filter defined on X and the one defined on X′ might be totally independent. Consider, however, the following two situations. Let Y′ be a big subset of X′, X ⊆ X′, and Y′ ⊆ X. If “size” has any absolute meaning, then Y′ should be a big subset of X, too. On the other hand, let X and X′ be big subsets of Y; then there are good reasons (analogous to those justifying the intersection property of filters) to assume that X ∩ X′ is also a big subset of X′.

These set properties are strongly connected to logical properties. For instance, if the latter property holds, we can deduce the logical property Cautious Monotony (see below for a formal definition): if ψ implies normally φ and φ′, because the sets X and X′ of ψ ∧ φ-models and ψ ∧ φ′-models are big subsets of the set Y of ψ-models, then ψ ∧ φ′ will imply normally φ too, as the set X ∩ X′ of ψ ∧ φ ∧ φ′-models will be a big subset of the set X′ of ψ ∧ φ′-models.
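The difference between filters and weak filters in Definition 1 shows up already in the simple counting reading of “big”. The following Python sketch checks the conditions (F1), (F2), (F3) and (F3′) by brute force on a three-element base set; the function names and the two example families (`majority`, `principal`) are assumptions made for illustration.

```python
from itertools import combinations

def powerset(X):
    """All subsets of a finite set X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def f1(F, X):
    """(F1): the base set itself is big."""
    return frozenset(X) in F

def f2(F, X):
    """(F2): supersets (within X) of big sets are big."""
    return all(B in F for A in F for B in powerset(X) if A <= B)

def f3(F):
    """(F3): the intersection of two big sets is big."""
    return all((A & B) in F for A in F for B in F)

def f3_weak(F):
    """(F3'): the intersection of two big sets is non-empty."""
    return all(len(A & B) > 0 for A in F for B in F)

def is_filter(F, X):
    return f1(F, X) and f2(F, X) and f3(F)

def is_weak_filter(F, X):
    return f1(F, X) and f2(F, X) and f3_weak(F)

X = {1, 2, 3}
# "Big" as strict majority: two majorities always overlap, but their
# intersection need not be a majority again.
majority = {A for A in powerset(X) if 2 * len(A) > len(X)}
# For contrast: all supersets of {1, 2}.
principal = {A for A in powerset(X) if frozenset({1, 2}) <= A}
```

Here `majority` is a weak filter but not a filter, since {1, 2} ∩ {1, 3} = {1} is not a majority, while `principal` satisfies (F3) as well.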
Seen more abstractly, such set properties allow the transfer of big subsets from one base set to another (and of the conclusions drawn on this basis), and we call them “coherence properties”. They are very important, not only for working with a logic which respects them, but also for soundness and completeness questions; often they are at the core of such problems. The reader is invited to read the articles by Ben-David and Ben-Eliyahu [1994] and Friedman and Halpern [1995], which treat essentially the same questions in different languages (and perhaps their comparison by the author in [Schlechta, 1997b] and [Schlechta, 2004]).

We turn to the concept of distance

Suppose we have a (by some criterion) ideal situation, be it realistic or not. “Normality” might then be defined via some distance: normal situations are the cases among those considered which have minimal distance from the ideal ones. “Distance” need not be a metric: it might be symmetric or not, it might respect identity (only x has distance 0 to x) or not, it might respect the triangle inequality or not, and it may even fail to be a total order, i.e. the distance from x to y might be incomparable to the distance from x′ to y′. We define distance or pseudo-distance for our purposes as follows.

DEFINITION 2. d : U × U → Z is called a pseudo-distance on U iff (d1) holds:
(d1) Z is totally ordered by a relation <.
If, in addition, Z has a <-smallest element 0, and (d2) holds, we say that d respects identity:
(d2) d(a, b) = 0 iff a = b.
If, in addition, (d3) holds, then d is called symmetric:
(d3) d(a, b) = d(b, a).
(For any a, b ∈ U.) Let ≤ stand for < or =.

Note that we can force the triangle inequality to hold trivially (if we can choose the values in the real numbers): it suffices to choose the values in the set {0} ∪ [0.5, 1], i.e. in the interval from 0.5 to 1, or as 0. (If d respects identity, any two non-zero distances then sum to at least 1, which bounds every distance.) This remark is due to D. Lehmann. (Usually, we will only be interested in the comparison of distances, not in their absolute values, so we can thus make the triangle inequality hold trivially.)

A preference relation is, in its most general form, just an arbitrary binary relation ≺, expressing different degrees of normality or (for historical reasons, better:) abnormality. We will then not so much consider all elements of a (model) set, but only the “best” or ≺-minimal ones, and reason with these “best” elements. We thus define a logic by T |∼ φ iff φ holds in the ≺-best models of T. (It is reasonable to assume here for the moment that such best models always exist, if there are any T-models at all.) Preferential models are formally defined in Definitions 7 and 8 below for the “minimal” version, and Definition 75 for the “limit version” (see there for an explanation).

To see the conceptual connection between distance and preference, consider the following argument: a is preferred to b iff the distance from an ideal point ∞ to a is smaller than the distance from ∞ to b.

This might be the moment to make our “situations” more precise. In most cases, they will just be classical propositional models, (almost) as in Kripke semantics for modal and similar logics, or as in Stalnaker-Lewis semantics for counterfactual conditionals (which, by the way, work with distances, too). A natural distance for such classical models is (at least in the finite case) the Hamming distance: the distance between m and m′ is the number of propositional variables (or atoms) in which they differ.

Finally, when we consider e.g. situations developing over several steps, e.g. for iterated update, we might be interested in forming sums, e.g. of distances between situations (now, of course, absolute values will matter). Here, well-known algorithms to solve systems of (in)equalities of sums are useful to investigate representation problems. The reader is referred to [Schlechta, 2004] for details.

Before we turn to historical remarks to conclude this introduction, we will introduce some definitions which are basic for the rest of this Chapter of the Handbook.
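The passage from a distance to a preference relation can be sketched as follows. This Python fragment is an illustration under assumptions: models are frozensets of true atoms, the distance is the Hamming distance, the ideal point is simply one chosen model, and the helper names are hypothetical.

```python
def hamming(m1, m2):
    """Hamming distance: number of atoms on which the two models differ."""
    return len(m1 ^ m2)

def preference_from(ideal, d=hamming):
    """a is preferred to b iff a is strictly closer to the ideal point than b."""
    return lambda a, b: d(ideal, a) < d(ideal, b)

def best(models, prec):
    """The prec-minimal elements of a set of models."""
    return {m for m in models if not any(prec(n, m) for n in models)}

# Hypothetical example over the atoms p, q, r.
ideal = frozenset({"p", "q", "r"})
candidates = {
    frozenset(),               # distance 3 from the ideal point
    frozenset({"p"}),          # distance 2
    frozenset({"p", "q"}),     # distance 1
    frozenset({"q", "r"}),     # distance 1
}
closest = best(candidates, preference_from(ideal))
```

The Hamming distance takes values in the natural numbers, respects identity and is symmetric, so it satisfies (d1)–(d3) of Definition 2. The two candidates at distance 1 are both minimal, which shows that the best elements need not be unique.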
1.2 Some definitions
We will assume the Axiom of Choice throughout this chapter.
DEFINITION 3. We use P to denote the power set operator, Π{X_i : i ∈ I} := {g : g : I → ⋃{X_i : i ∈ I}, ∀i ∈ I. g(i) ∈ X_i} is the general cartesian product, card(X) shall denote the cardinality of X, and V the set-theoretic universe we work in, the class of all sets. Given a set of pairs 𝒳, and a set X, we denote by 𝒳⌈X := {⟨x, i⟩ ∈ 𝒳 : x ∈ X}. A ⊆ B will denote that A is a subset of B or equal to B, and A ⊂ B that A is a proper subset of B, likewise for A ⊇ B and A ⊃ B. Given some fixed set U we work in, and X ⊆ U, then C(X) := U − X. ≺∗ will denote the transitive closure of the relation ≺. If a relation <, ≺, or similar is given, a⊥b will express that a and b are <- (or ≺-) incomparable; context will tell. A child (or successor) of an element x in a tree t will be a direct child in t. A child of a child, etc. will be called an indirect child. Trees will be supposed to grow downwards, so the root is the top element. A subsequence ⟨σ_i : i ∈ I ⊆ µ⟩ of a sequence ⟨σ_i : i ∈ µ⟩ is called cofinal iff for all i ∈ µ there is i′ ∈ I with i ≤ i′. Unless said otherwise, we always work in propositional logic.
DEFINITION 4. If L is a propositional language, v(L) will be the set of its variables, M_L the set of its classical models, φ, etc. shall denote formulas, T, etc. theories in L, and M(T) or M_T ⊆ M_L the models of T, likewise for φ.
A theory will just be an arbitrary set of formulas, without any closure conditions. For any classical model m, let Th(m) be the set of formulas valid in m, likewise Th(M) := {φ : m |= φ for all m ∈ M}, if M is a set of classical models. |= is the sign of classical validity. For two theories T and T′, let T ∨ T′ := {φ ∨ ψ : φ ∈ T, ψ ∈ T′}. ⊥ stands for falsity (the double use will be unambiguous). $\overline{T}$ ⊆ L will denote the closure of T under classical logic, and ⊢ the classical consequence relation, thus $\overline{T}$ := {φ : T ⊢ φ}. Given some other logic |∼, $\overline{\overline{T}}$ will denote the set of consequences of T under that logic, i.e. $\overline{\overline{T}}$ := {φ : T |∼ φ}. Con(T) will say that T is classically consistent, likewise Con(φ), etc. Note that the double bar notation does not really conflict with the single bar notation: closing twice under classical logic makes no sense from a pragmatic point of view, as the classical consequence operator is idempotent. D_L ⊆ P(M_L) shall be the set of definable subsets of M_L, i.e. A ∈ D_L iff there is some T ⊆ L s.t. A = M_T. If the context is clear, we omit the subscript L from D_L. For 𝒳 ⊆ P(M_L), a function µ : 𝒳 → P(M_L) will be called definability preserving iff µ(Y) ∈ D_L for all Y ∈ D_L ∩ 𝒳. We recall the following basic facts about definable sets. The reader should be familiar with such properties.
FACT 5.
(1) ∅, M_L ∈ D_L.
(2) D_L contains all singletons, is closed under arbitrary intersections and finite unions.
(3) If v(L) is infinite, and m any model for L, then M′ := M_L − {m} is not definable by any theory T. (Proof: Suppose it were, and let φ hold in M′, but not in m, so in m ¬φ holds, but as φ is finite, there is a model m′ in M′ which coincides on all propositional variables of φ with m, so in m′ ¬φ holds, too, a contradiction.)
(4) There is an easy cardinality argument which shows that, in the infinite case, there are many more undefinable than definable model sets: If κ = card(v(L)), then κ is also the size of the set of L-formulas, so there are 2^κ L-theories, thus at most 2^κ definable model sets. Yet there are 2^κ different models, so 2^(2^κ) model sets. This argument complements (3) above: one is constructive, the other shows the cardinality difference.
1.3 Historical remarks
We conclude this introduction with some very short historical remarks, and by putting our approach into a more general perspective.
Preferential structures or models were first considered by Hansson as a semantics for deontic logic, see [Hansson, 1971], where the relation expresses moral quality of a situation. They were re-discovered independently by Shoham [1987] and Siegel [1985] (in the latter case, in the limit version, see below in Section 3.4) as an abstract semantics for reasoning about normal cases. Distance based semantics were, to the author’s knowledge, first considered in the Stalnaker–Lewis semantics for counterfactuals (see [Lewis, 1973]), and introduced as a semantics for theory revision (see below) by Lehmann, Magidor, and the author, see [Lehmann et al., 2001]. Filter or weak filter based semantics for reasoning with and about normality were introduced by the author in a first order setting, and re-discovered independently by Ben-David and Ben-Eliyahu [1994] and Friedman and Halpern [1995], which treat essentially the same questions in different languages. The various properties of choice functions in a finite setting were considered by economists in the context of social choice, and re-discovered independently by the author in the infinite setting for his representation proofs. The contents of Section 2.4 are, of course, due to Alchourrón, Gärdenfors, and Makinson, see [1985]; part of the work of Kraus, Lehmann, and Magidor, [Kraus et al., 1990] and [Lehmann and Magidor, 1992], is described in Section 2.2. Plausibility logic in Section 3.3 is due to Lehmann, [1992a; 1992b]. The rest of the material, with some small exceptions indicated locally, is due to the author, and presented in detail in his book, [Schlechta, 2004]. It might have become clear implicitly that we concentrate on the model side: more precisely, we will consider as consequences of a formula or theory those formulas which hold in some set of classical models, chosen in an adequate manner.
As this approach works with some choice of model sets, as do Kripke semantics for modal logic, we call such logics generalized modal logics. This covers preferential logics, the logics of counterfactual conditionals, theory revision and theory update, and, of course, usual modal logic. This general approach is quite liberal, but it already has some consequences, and thus excludes some ways of reasoning: the set of consequences will be closed under classical logic (and will contain the original information if the chosen set is a subset of the models of this original information, as will often be the case). On the other hand, e.g. obligations cannot be modelled this way, as classical weakening will hold in our approach, but from the obligation not to steal, we cannot conclude the obligation not to steal or to kill our grandmother.
2 BASIC DEFINITIONS, RESULTS, AND DISCUSSION
2.1 Introduction
The lines above may already have indicated to the reader a strategy usually adopted by the author: we split the proof of completeness results into two subproblems. We first try to characterize algebraic properties of model set functions generated by structures,
and, second, try to find logical properties corresponding to the algebraic ones. For instance, the set of preferred models of a formula is a subset of the set of models of this formula, and, second, if φ implies φ′ classically, then any preferred φ′-model which is a φ-model will also be a preferred φ-model. (Reason: suppose not, so there is m′ ≺ m, m′ ∈ M(φ) ⊆ M(φ′), so m cannot be a preferred φ′-model.) Thus, if µ (for minimal) is the choice function generated by ≺, then µ(X) ⊆ X, and if X ⊆ Y, then µ(Y) ∩ X ⊆ µ(X). The problem is now split into the questions: (1) Do these two properties of µ characterize all choice functions defined by preferential structures? (2) What are the corresponding logical properties? This split approach has several advantages: (1) The main work is usually in the first part, the algebraic characterization, and it can be re-used for different logics (as only the second part changes). (2) We do not need to worry about often trivial properties like classical equivalence etc., as they are built in. (3) It helps to bring to light (and to solve) more subtle problems on both sides. First, the model choice function may have very nice algebraic properties, but it does not cooperate with logic: µ(M(φ)) may not be a logically definable model set. This has very nasty consequences for logical characterization, and the usual attempts will fail in general; one has to adopt other means (see below in Section 3.5). Second, the model domain itself might be a problem, as it is not closed under certain operations, like finite union (it is so for classical logic, but not for some logics defined by sequent calculi). Here, the logical side might still work fine, but the algebraic side collapses, as usual representation techniques will not work any more, resulting in other kinds of disasters and solutions (see below in Section 3.3). Thus, separating both problems helps to see difficulties, and to find solutions, too.
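The two properties of µ just derived hold for an arbitrary binary relation, which can be checked by brute force on a small finite example. The sketch below is only an illustration under assumptions: the set U, the relation prec (a set of pairs (x2, x) read as x2 ≺ x) and the helper names mu and subsets are chosen here and do not come from the chapter. The relation is deliberately cyclic, to show that no order properties are needed.

```python
from itertools import combinations

def mu(X, prec):
    """Minimal elements of X: those x in X with no x2 in X such that x2 ≺ x."""
    return {x for x in X if not any((x2, x) in prec for x2 in X)}

def subsets(U):
    """All subsets of U, as Python sets."""
    items = sorted(U)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

# Hypothetical structure: 0 ≺ 1 ≺ 2 ≺ 0 is a cycle, and 3 ≺ 2.
U = {0, 1, 2, 3}
prec = {(0, 1), (1, 2), (3, 2), (2, 0)}

# mu(X) ⊆ X for every X.
subset_ok = all(mu(X, prec) <= X for X in subsets(U))
# X ⊆ Y implies mu(Y) ∩ X ⊆ mu(X).
pr_ok = all(mu(Y, prec) & X <= mu(X, prec)
            for Y in subsets(U) for X in subsets(Y))
print(subset_ok, pr_ok)
```

On the cycle {0, 1, 2} the function returns the empty set, which illustrates that minimal elements need not exist even for a nonempty set.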
We summarize frequently used logical and algebraic properties in the following table. The left hand column presents the single formula version, the center column the theory version (a theory is, for us, an arbitrary set of formulas), the right hand column the algebraic version, describing the choice function on the model set, e.g. f(X) ⊆ X corresponds to the rule φ ⊢ ψ implies φ |∼ ψ in the formula version, and to $\overline{T} \subseteq \overline{\overline{T}}$ in the theory version. A short discussion of some of the properties follows the table. (PR) is also called infinite conditionalization; we choose the name for its central role for preferential structures. Note that in the presence of (µ ⊆), and if 𝒴 is closed under finite intersections, (µPR) is equivalent to (µPR′) f(X) ∩ Y ⊆ f(X ∩ Y).
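The equivalence of (µPR) and (µPR′) in the presence of (µ ⊆) can be confirmed exhaustively on a small domain. The following sketch makes several assumptions of its own: it takes as domain the full power set of a three-element set (which is closed under finite intersections), enumerates every function f with f(X) ⊆ X, and uses the illustrative names choice_functions, pr and pr_prime.

```python
from itertools import combinations, product

def powerset(X):
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

# Domain: all subsets of a hypothetical three-element universe.
SUBSETS = powerset({0, 1, 2})

def choice_functions():
    """Yield every f on SUBSETS with f(X) ⊆ X, i.e. every f satisfying (µ ⊆)."""
    options = [powerset(X) for X in SUBSETS]
    for values in product(*options):
        yield dict(zip(SUBSETS, values))

def pr(f):
    """(µPR): X ⊆ Y implies f(Y) ∩ X ⊆ f(X)."""
    return all(f[Y] & X <= f[X] for Y in SUBSETS for X in SUBSETS if X <= Y)

def pr_prime(f):
    """(µPR'): f(X) ∩ Y ⊆ f(X ∩ Y)."""
    return all(f[X] & Y <= f[X & Y] for X in SUBSETS for Y in SUBSETS)

total = 0
agree = 0
for f in choice_functions():
    total += 1
    agree += pr(f) == pr_prime(f)
print(total, agree)
```

A finite check of this kind is of course no proof, but it does guard against misreading the direction of the inclusions.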
DEFINITION 6. For each rule we list the formula version, the theory version, and, where there is one, the algebraic version.

(AND)
  Formula version: φ |∼ ψ, φ |∼ ψ′ ⇒ φ |∼ ψ ∧ ψ′
  Theory version: T |∼ ψ, T |∼ ψ′ ⇒ T |∼ ψ ∧ ψ′
(OR)
  Formula version: φ |∼ ψ, φ′ |∼ ψ ⇒ φ ∨ φ′ |∼ ψ
  Theory version: T |∼ ψ, T′ |∼ ψ ⇒ T ∨ T′ |∼ ψ
  Algebraic version: (µ∪w), w for weak: f(A ∪ B) ⊆ f(A) ∪ f(B)
(LLE) or Left Logical Equivalence
  Formula version: ⊢ φ ↔ φ′, φ |∼ ψ ⇒ φ′ |∼ ψ
  Theory version: $\overline{T} = \overline{T'} \Rightarrow \overline{\overline{T}} = \overline{\overline{T'}}$
(RW) or Right Weakening
  Formula version: φ |∼ ψ, ⊢ ψ → ψ′ ⇒ φ |∼ ψ′
  Theory version: T |∼ ψ, ⊢ ψ → ψ′ ⇒ T |∼ ψ′
(CCL) or Classical Closure
  Theory version: $\overline{\overline{T}}$ is classically closed
(SC) or Supraclassicality
  Formula version: φ ⊢ ψ ⇒ φ |∼ ψ
  Theory version: $\overline{T} \subseteq \overline{\overline{T}}$
  Algebraic version: (µ⊆) f(X) ⊆ X
(CP) or Consistency Preservation
  Formula version: φ |∼ ⊥ ⇒ φ ⊢ ⊥
  Theory version: T |∼ ⊥ ⇒ T ⊢ ⊥
  Algebraic version: (µ∅) f(X) = ∅ ⇒ X = ∅
(RM) or Rational Monotony
  Formula version: φ |∼ ψ, φ |≁ ¬ψ′ ⇒ φ ∧ ψ′ |∼ ψ
  Theory version: T |∼ ψ, T |≁ ¬ψ′ ⇒ T ∪ {ψ′} |∼ ψ
  Algebraic version: (µ=) X ⊆ Y, X ∩ f(Y) ≠ ∅ ⇒ f(X) = f(Y) ∩ X
(CM) or Cautious Monotony
  Formula version: φ |∼ ψ, φ |∼ ψ′ ⇒ φ ∧ ψ |∼ ψ′
  Theory version: $T \subseteq \overline{T'} \subseteq \overline{\overline{T}} \Rightarrow \overline{\overline{T}} \subseteq \overline{\overline{T'}}$
  Algebraic version: (µCM) f(X) ⊆ Y ⊆ X ⇒ f(Y) ⊆ f(X)
(CUM) or Cumulativity
  Formula version: φ |∼ ψ ⇒ (φ |∼ ψ′ ⇔ φ ∧ ψ |∼ ψ′)
  Theory version: $T \subseteq \overline{T'} \subseteq \overline{\overline{T}} \Rightarrow \overline{\overline{T}} = \overline{\overline{T'}}$
  Algebraic version: (µCUM) f(X) ⊆ Y ⊆ X ⇒ f(Y) = f(X)
(PR)
  Formula version: $\overline{\overline{\phi \wedge \phi'}} \subseteq \overline{\overline{\overline{\phi}} \cup \{\phi'\}}$
  Theory version: $\overline{\overline{T \cup T'}} \subseteq \overline{\overline{\overline{T}} \cup T'}$
  Algebraic version: (µPR) X ⊆ Y ⇒ f(Y) ∩ X ⊆ f(X)

The system of rules (AND), (OR), (LLE), (RW), (SC), (CP), (CM), (CUM) is also called system P (for preferential); adding (RM) gives the system R (for rationality or rankedness). (AND) is obviously closely related to filters, as we saw already in Section 1. (LLE), (RW), (CCL) will all hold automatically, whenever we work with fixed model sets. (SC) corresponds to the choice of a subset. (CP) is somewhat delicate, as it presupposes that the chosen model set is non-empty. This might fail in the presence of ever better choices, without ideal ones; the problem is addressed by the limit versions, see below in Section 3.4. (PR) is an infinitary version of one half of the deduction theorem: Let T stand for φ, T′ for ψ, and φ ∧ ψ |∼ σ, so φ |∼ ψ → σ, but (ψ → σ) ∧ ψ ⊢ σ. (CUM) (whose most interesting half in our context is (CM)) may best be seen as normal use of lemmas: We have worked hard and found some lemmas. Now we can take a rest, and come back again with our new lemmas. Adding them to the axioms will neither add new theorems, nor prevent old ones from holding. (RM) is perhaps best understood by looking at big and small subsets. If the set of φ ∧ ψ-models is a big subset of the set of φ-models, and the set of φ ∧ ψ′-models is not a small subset of the set of φ-models (i.e. big or of medium size), then the set of φ ∧ ψ ∧ ψ′-models is a big subset of the set of φ ∧ ψ′-models.

2.2 Preferential Structures
We begin the formal discussion by a presentation of preferential structures. They are perhaps the best examined semantics for nonmonotonic logics. Though the basic idea is simple, it has led to a number of interesting investigations, concepts, and techniques.
Basic definitions
The following two definitions make preferential structures precise. We first give the algebraic definition, and then the definition of the consequence relation generated by a preferential structure. In the algebraic definition, the set U is an arbitrary set; in the application to logic, this will be the set of classical models of the underlying propositional language. In both cases, we first present the simpler variant without copies, and then the one with copies. (Note that e.g. [Kraus et al., 1990; Lehmann and Magidor, 1992] use labelling functions instead; the version without copies corresponds to injective labelling functions, the one with copies to the general case. These are just different ways of speaking.) We will discuss the difference between the version without and the version with copies below, where we show that the version with copies is strictly more expressive than the version without copies, and that transitivity of the relation adds new properties in the case without copies. When we summarize our own results below (see Section 2.2), we will mention that, in the general case with copies, transitivity can be added without changing properties. We give here the “minimal version”; the much more complicated “limit version” is presented and discussed in Section 3. Recall the intuition that the relation ≺ expresses “normality” or “importance”: the ≺-smaller, the more normal or important. The smallest elements are those which count. The problem and justification of copies has not been definitely solved up to the present (to the author’s knowledge).
In this context, it is probably best to see it as a technical trick to achieve more expressiveness of the logic — see Examples 11 and 12 below, and [Schlechta, 2004] for an extended discussion.
DEFINITION 7. Fix U ≠ ∅, and consider arbitrary X. Note that this X has not necessarily anything to do with U, or 𝒰 below. Thus, the functions µ_ℳ below are in principle functions from V to V, where V is the set theoretical universe we work in.
(A) Preferential models or structures.
(1) The version without copies: A pair ℳ := ⟨U, ≺⟩ with U an arbitrary set, and ≺ an arbitrary binary relation is called a preferential model or structure.
(2) The version with copies: A pair ℳ := ⟨𝒰, ≺⟩ with 𝒰 an arbitrary set of pairs, and ≺ an arbitrary binary relation is called a preferential model or structure. If ⟨x, i⟩ ∈ 𝒰, then x is intended to be an element of U, and i the index of the copy.
(B) Minimal elements, the functions µ_ℳ.
(1) The version without copies: Let ℳ := ⟨U, ≺⟩, and define µ_ℳ(X) := {x ∈ X : x ∈ U ∧ ¬∃x′ ∈ X ∩ U. x′ ≺ x}. µ_ℳ(X) is called the set of minimal elements of X (in ℳ).
(2) The version with copies: Let ℳ := ⟨𝒰, ≺⟩ be as above. Define µ_ℳ(X) := {x ∈ X : ∃⟨x, i⟩ ∈ 𝒰. ¬∃⟨x′, i′⟩ ∈ 𝒰 (x′ ∈ X ∧ ⟨x′, i′⟩ ≺ ⟨x, i⟩)}.
Note that one minimal copy suffices to make the element, i.e. the first coordinate, minimal. Changing the quantifier from existential to universal would, of course, change the approach fundamentally. Again, by abuse of language, we say that µ_ℳ(X) is the set of minimal elements of X in the structure. If the context is clear, we will also write just µ. We sometimes say that ⟨x, i⟩ “kills” or “minimizes” ⟨y, j⟩ if ⟨x, i⟩ ≺ ⟨y, j⟩. By abuse of language we also say a set X kills or minimizes a set Y if for all ⟨y, j⟩ ∈ 𝒰, y ∈ Y there is ⟨x, i⟩ ∈ 𝒰, x ∈ X s.t. ⟨x, i⟩ ≺ ⟨y, j⟩. ℳ is also called injective or 1-copy iff there is always at most one copy ⟨x, i⟩ for each x. We say that ℳ is transitive, irreflexive, etc., iff ≺ is. Recall that µ(X) might well be empty, even if X is not.
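The two variants of µ in Definition 7 can be written down directly for finite structures. The following is a sketch under assumptions: relations are encoded as Python sets of pairs (smaller, larger), copies as pairs (x, i), and the function names mu_plain and mu_copies as well as the two small structures are invented for this illustration.

```python
def mu_plain(X, U, prec):
    """Version without copies: x in X ∩ U is minimal iff no x2 in X ∩ U has x2 ≺ x."""
    XU = X & U
    return {x for x in XU if not any((x2, x) in prec for x2 in XU)}

def mu_copies(X, copies, prec):
    """Version with copies: x in X is minimal as soon as ONE copy (x, i) has
    no copy (x2, i2) with x2 in X below it."""
    def unbeaten(c):
        return not any(c2[0] in X and (c2, c) in prec for c2 in copies)
    return {x for (x, i) in copies if x in X and unbeaten((x, i))}

# Hypothetical non-transitive structure without copies: a ≺ b ≺ c, but not a ≺ c.
U = {"a", "b", "c"}
prec = {("a", "b"), ("b", "c")}
print(mu_plain({"a", "b", "c"}, U, prec), mu_plain({"a", "c"}, U, prec))

# Hypothetical structure with copies: m has two copies, each killed by a different element.
copies = {("m", 0), ("m", 1), ("m2", 0), ("m3", 0)}
prec_c = {(("m2", 0), ("m", 0)), (("m3", 0), ("m", 1))}
print(mu_copies({"m", "m2"}, copies, prec_c), mu_copies({"m", "m2", "m3"}, copies, prec_c))
```

In the second structure, m stays minimal in {m, m2} because its copy with index 1 is not killed there; both m2 and m3 are needed to remove it, which is exactly the existential reading of the definition.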
DEFINITION 8. We define the consequence relation of a preferential structure for a given propositional language L.
(A)
(1) If m is a classical model of a language L, we say by abuse of language ⟨m, i⟩ |= φ iff m |= φ, and if X is a set of such pairs, that X |= φ iff for all ⟨m, i⟩ ∈ X m |= φ.
(2) If ℳ := ⟨𝒳, ≺⟩ is a preferential structure, and 𝒳 is a set of L-models for a classical propositional language L, or a set of pairs ⟨m, i⟩, where the m are such models, we call ℳ a classical preferential structure or model.
(B) Validity in a preferential structure, or the semantical consequence relation defined by such a structure: Let ℳ be as above. We define: T |=_ℳ φ iff µ_ℳ(M(T)) |= φ, i.e. µ_ℳ(M(T)) ⊆ M(φ). ℳ will be called definability preserving iff for all X ∈ D_L µ_ℳ(X) ∈ D_L. As µ_ℳ is defined on D_L, but need by no means always result in some new definable set, this is (and reveals itself as) a quite strong additional property.
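The consequence relation of Definition 8 can be illustrated on a finite language. The sketch below rests on assumptions that are not in the chapter: a three-variable language (bird, penguin, flies), formulas encoded as Python predicates on models, and a preference relation obtained from a hypothetical abnormality ranking rank, with m ≺ m′ iff rank(m) < rank(m′). It uses the version without copies.

```python
from itertools import product

VARS = ("bird", "penguin", "flies")
# All classical models of the (hypothetical) three-variable language.
MODELS = [dict(zip(VARS, vals)) for vals in product([False, True], repeat=len(VARS))]

def rank(m):
    """Illustrative abnormality ranking; lower means more normal."""
    if not m["penguin"] and (not m["bird"] or m["flies"]):
        return 0  # no penguin, and birds fly
    if m["bird"] and not m["flies"]:
        return 1  # a bird that does not fly
    return 2      # everything else, e.g. flying penguins

def prec(m1, m2):
    return rank(m1) < rank(m2)

def models_of(theory):
    """M(T): the models satisfying every formula of the theory."""
    return [m for m in MODELS if all(phi(m) for phi in theory)]

def mu(ms):
    """Minimal models: no other model in ms is strictly preferred."""
    return [m for m in ms if not any(prec(m2, m) for m2 in ms)]

def entails(theory, phi):
    """T |= phi in the structure iff phi holds in all minimal models of T."""
    return all(phi(m) for m in mu(models_of(theory)))

bird = lambda m: m["bird"]
penguin = lambda m: m["penguin"]
flies = lambda m: m["flies"]
not_flies = lambda m: not m["flies"]

print(entails([bird], flies), entails([bird, penguin], flies),
      entails([bird, penguin], not_flies))
```

Adding the premise penguin retracts the conclusion flies, so the relation is nonmonotonic; at the same time everything classically entailed by the theory still holds in the chosen models, since they form a subset of M(T).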
We define now two additional properties of the relation, smoothness and rankedness. The first condition says that if x ∈ X is not a minimal element of X, then there is x′ ∈ µ(X) with x′ ≺ x. In the finite case without copies, smoothness is a trivial consequence of transitivity and lack of cycles. But note that in the other cases infinite descending chains might still exist, even if the smoothness condition holds; they are just “short-circuited”: we might have such chains, but below every element in the chain is a minimal element. In the author’s opinion, smoothness is difficult to justify as a structural property (or, in a more philosophical spirit, as a property of the world): why should we always have such minimal elements below non-minimal ones? Smoothness has, however, a justification from its consequences. Its attractiveness comes from two sides: First, it generates the very valuable logical property, cumulativity (CUM): If ℳ is smooth, and $\overline{\overline{T}}$ is the set of |=_ℳ-consequences of T, then $T \subseteq \overline{T'} \subseteq \overline{\overline{T}}$ implies $\overline{\overline{T}} = \overline{\overline{T'}}$ (see the discussion after Definition 6). Second, for certain approaches, it facilitates completeness proofs, as we can look directly at “ideal” elements, without having to bother about intermediate stages. See in particular the work by Lehmann and his co-authors, [Kraus et al., 1990; Lehmann and Magidor, 1992]. We will mention in Section 2.2 that cumulativity can also be achieved without smoothness, by topological means. “Smoothness”, or, as it is also called, “stopperedness”, seems, in the author’s opinion, a misnomer. I think it should rather
be called something like “weak transitivity”: consider the case where a ≻ b ≻ c, but c ⊀ a, with c ∈ µ(X). It is then not necessarily the case that a ≻ c, but there is c′ “sufficiently close to c”, i.e. in µ(X), s.t. a ≻ c′. Results and proof techniques underline this idea. First, in the general case with copies, and in the smooth case, transitivity does not add new properties, it is “already present”; second, the construction of smoothness by sequences σ (see below in Section 3.2) is very close in spirit to a transitive construction. The second condition, rankedness, seems easier to justify already as a property of the structure. It says that, essentially, the elements are ordered in layers: If a and b are not comparable, then they are in the same layer. So, if c is above (below) a, it will also be above (below) b, like pancakes or geological strata. Apart from the triangle inequality (and leaving aside cardinality questions), this is then just a distance from some imaginary, ideal point. Again, this property has important consequences on the resulting model choice functions and consequence relations, making proof techniques for the non-ranked and the ranked case very different.
DEFINITION 9. Let 𝒵 ⊆ P(U). (In applications to logic, 𝒵 will be D_L.) A preferential structure ℳ is called 𝒵-smooth iff in every X ∈ 𝒵 every element x ∈ X is either minimal in X or above an element which is minimal in X. More precisely:
(1) The version without copies: If x ∈ X ∈ 𝒵, then either x ∈ µ(X) or there is x′ ∈ µ(X) with x′ ≺ x.
(2) The version with copies: If x ∈ X ∈ 𝒵, and ⟨x, i⟩ ∈ 𝒰, then either there is no ⟨x′, i′⟩ ∈ 𝒰, x′ ∈ X, ⟨x′, i′⟩ ≺ ⟨x, i⟩, or there is ⟨x′, i′⟩ ∈ 𝒰, ⟨x′, i′⟩ ≺ ⟨x, i⟩, x′ ∈ X, s.t. there is no ⟨x′′, i′′⟩ ∈ 𝒰, x′′ ∈ X, with ⟨x′′, i′′⟩ ≺ ⟨x′, i′⟩.
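For finite structures without copies, the smoothness condition can be tested directly. The following sketch is an illustration under assumptions: the relation is a set of pairs (smaller, larger), the family of sets to be checked is passed explicitly, and the names is_smooth, nonempty_subsets and the two example relations are invented here.

```python
from itertools import combinations

def mu(X, prec):
    """Minimal elements of X with respect to prec (pairs (smaller, larger))."""
    return {x for x in X if not any((x2, x) in prec for x2 in X)}

def is_smooth(Z, prec):
    """Version without copies: in every X in Z, each x is minimal in X
    or lies above some element that is minimal in X."""
    for X in Z:
        minimal = mu(X, prec)
        for x in X:
            if x not in minimal and not any((x2, x) in prec for x2 in minimal):
                return False
    return True

def nonempty_subsets(U):
    items = sorted(U)
    return [set(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]

U = {"a", "b", "c"}
Z = nonempty_subsets(U)

# Hypothetical non-transitive chain: c is not minimal in {a, b, c}, and the
# only minimal element a is not below c, so smoothness fails.
chain = {("a", "b"), ("b", "c")}
# Its transitive closure: finite, transitive and acyclic, hence smooth.
closed = chain | {("a", "c")}

print(is_smooth(Z, chain), is_smooth(Z, closed))
```

The second result agrees with the remark that, in the finite case without copies, transitivity together with the absence of cycles already yields smoothness.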
When considering the models of a language L, ℳ will be called smooth iff it is D_L-smooth; D_L is the default. Obviously, the richer the set 𝒵 is, the stronger the condition 𝒵-smoothness will be.
DEFINITION 10. A relation ≺_U on U is called ranked iff there is an order-preserving function from U to a total order O, f : U → O, with u ≺_U u′ iff f(u) ≺_O f(u′); equivalently, if x and x′ are ≺_U-incomparable, then (y ≺_U x iff y ≺_U x′) and (y ≻_U x iff y ≻_U x′) for all y. (See Fact 23.)
It is easily seen that copies are largely unnecessary in ranked structures. The rough argument is as follows: suppose we have two copies of x, x1 and x2, and y ≺ x1, z ≺ x2. If, e.g. x1 ≺ x2, then x2 is superfluous (by transitivity, and ranked structures are transitive). If x1 and x2 are incomparable, then both z and y are smaller than both copies, so again one copy is superfluous. As promised above, we show now with two simple examples the importance of the distinction between structures with and without copies. The first example (or an equivalent construction) seems to be folklore in the field and first published by
Lehmann. It shows that the version with copies is strictly more expressive than the version without. The second one shows that transitivity adds something in the case without copies; we will state later that it does not do so in the case with copies.
EXAMPLE 11. Consider the propositional language L of 2 propositional variables p, q, and the preferential model ℳ defined by m |= p ∧ q, m′ |= p ∧ q, m2 |= ¬p ∧ q, m3 |= ¬p ∧ ¬q, with m2 ≺ m, m3 ≺ m′ (so m and m′ are two copies of the same classical model), and let |=_ℳ be its consequence relation. Obviously, Th(m) ∨ {¬p} |=_ℳ ¬p, but there is no complete theory T′ s.t. Th(m) ∨ T′ |=_ℳ ¬p. (If there were one, T′ would correspond to m, m2, m3, or the missing m4 |= p ∧ ¬q, but we need two models to kill all copies of m.) On the other hand, if there were just one copy of m, then one other model would suffice to kill m. More formally, if we admit at most one copy of each model in a structure ℳ, m |= T, and Th(m) ∨ T |=_ℳ φ for some φ s.t. m |= ¬φ (i.e. m is not minimal in the models of Th(m) ∨ T), then there is a complete T′ with T′ ⊢ T and Th(m) ∨ T′ |=_ℳ φ, i.e. there is m′′ with m′′ |= T′ and m′′ ≺ m.
EXAMPLE 12. Consider the structure a ≺ b ≺ c, but a ⊀ c. This is not equivalent to any transitive structure with one copy each, i.e. there is no transitive structure whose function µ′ is equal to µ: As µ({a, b}) = {a}, and µ({b, c}) = {b}, we must have a ≺ b ≺ c, but then by transitivity a ≺ c, so µ′({a, c}) = {a} in the new structure, contradicting µ({a, c}) = {a, c} in the old structure.
The KLM results and their discussion
We summarize now the main results of the classical papers [Kraus et al., 1990] and [Lehmann and Magidor, 1992].
PROPOSITION 13.
(1) Smooth preferential structures (with copies) are characterized by the formula versions of the logical properties summarized in system P.
(2) Smooth ranked preferential structures are characterized by the formula versions of the logical properties summarized in system R, i.e. P and, in addition, (RM).
(See Definition 6 for the definitions, and [Kraus et al., 1990; Lehmann and Magidor, 1992] for details.)
Limitation of the KLM result
It was an obvious question whether an analogue holds for the full theory version of the above rules. The answer is negative, as was shown by the author in [Schlechta, 1992]. Instead of presenting details (which can be found there or in [Schlechta, 2004]), we discuss the underlying situation. The main result of KLM can be reformulated as follows, using a new condition:
DEFINITION 14. We say that |∼ satisfies Distributivity iff $\overline{\overline{T}} \cap \overline{\overline{T'}} \subseteq \overline{\overline{\overline{T} \cap \overline{T'}}}$ for all theories T, T′ of L.
S. Kraus, D. Lehmann and M. Magidor have shown that the finitary restrictions of all supraclassical, cumulative, and distributive inference operations are representable by preferential structures. We will make it plausible now that this does not generalize to the arbitrary infinitary case, i.e. with theories on the left hand side. Leaving aside questions of definability preservation, the above condition translates into the following model set condition, where µ is the model choice function:
(µD) µ(X ∪ Y) ⊆ µ(X) ∪ µ(Y)
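How (µD) relates to (µPR) depends on the closure properties of the domain, and a small finite experiment makes this visible. The sketch below is an illustration under assumptions of its own: part 1 takes the full power set of a three-element set (closed under set difference) and compares the two conditions over every choice function with µ(X) ⊆ X; part 2 uses a hypothetical two-set domain that is closed under union but not under set difference. All names are invented for this sketch.

```python
from itertools import combinations, product

def powerset(xs):
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def satisfies_pr(mu, domain):
    """(µPR): X ⊆ Y implies µ(Y) ∩ X ⊆ µ(X), for X, Y in the domain."""
    return all(mu[Y] & X <= mu[X] for Y in domain for X in domain if X <= Y)

def satisfies_d(mu, domain):
    """(µD): µ(X ∪ Y) ⊆ µ(X) ∪ µ(Y), whenever X ∪ Y is in the domain."""
    return all(mu[X | Y] <= mu[X] | mu[Y]
               for X in domain for Y in domain if (X | Y) in domain)

# Part 1: full power set, every mu with mu(X) ⊆ X.
FULL = powerset({0, 1, 2})
total = agree = 0
for values in product(*[powerset(X) for X in FULL]):
    mu = dict(zip(FULL, values))
    total += 1
    agree += satisfies_pr(mu, FULL) == satisfies_d(mu, FULL)

# Part 2: domain {{a}, {a, b}}, closed under union but not under set difference.
A, AB = frozenset({"a"}), frozenset({"a", "b"})
small_domain = [A, AB]
small_mu = {A: frozenset(), AB: frozenset({"a"})}

print(total, agree,
      satisfies_d(small_mu, small_domain), satisfies_pr(small_mu, small_domain))
```

On the full power set the two conditions coincide for every such function, whereas on the small domain (µD) holds and (µPR) fails, because the set {b} needed to split {a, b} is missing from the domain.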
We will see below that condition (µPR) X ⊆ Y → µ(Y) ∩ X ⊆ µ(X) essentially characterizes preferential structures, and its validity was seen as soon as the definition of minimal preferential structures was known. In these terms, the problem is whether (µ ⊆) + (µCUM) + (µD) entail (µPR) in the general case. Now, we see immediately: (µPR) + (µ ⊆) entail (µD): µ(X ∪ Y) = (µ(X ∪ Y) ∩ X) ∪ (µ(X ∪ Y) ∩ Y) ⊆ µ(X) ∪ µ(Y). Second, if the domain is closed under set difference, then (µD) + (µ ⊆) entail (µPR): Let U ⊆ V, V = U ∪ (V − U). Then µ(V) ∩ U ⊆ (µ(U) ∪ µ(V − U)) ∩ U = µ(U). The condition of closure under set difference is, of course, satisfied for formula defined model sets, but not in the general case of theory defined model sets. To make the problem more palatable, we formulate it as follows: µ(X) is something like the “core” of X. A counterexample to (µPR), i.e. a case of X ⊆ Y and µ(Y) ∩ X ⊈ µ(X), says that small sets do not always “protect” their core as well as big ones. This is the contrary of preferential structures, which “autodestruct” their “outer part”, and, the bigger the set, the more elements will be destroyed. (µD) says that protection is immune to finite unions: their components protect their cores as well as their union does. Now, it seems not so difficult to find a counterexample, and, in hindsight, it is surprising that it took some time to arrive there. Intuitively, we can work from the inside, where smaller sets destroy more elements, or from the outside, where bigger sets protect their elements better. The original counterexample is of the first type: smaller sets decide more things, and we use this decision to make the logic stronger: the decision of infinitely many p_i will destroy all ¬r-models in the definition of the choice function. We might also work from the outside: say a point is in the core (or protected) iff there is a (nontrivial) sequence of elements converging to it in some suitable topology, e.g.
the natural one in propositional logic. Then small sets are less protective, and the property will be robust under finite operations. If we take a little care, we can make the operations cumulative. For details, see [Schlechta, 2004].
A nonsmooth model of cumulativity
We have described smoothness as a weak kind of transitivity. The following example shows that “weak smoothness” (or “weak weak transitivity”) suffices to give cumulativity. See [Schlechta, 1999] or [Schlechta, 2004] for details. The idea behind the construction is extremely simple, and related to closure properties of the domain. Smoothness says that in each set of the domain, every element is either minimal, or there must be an element below it which is minimal in this set. Smoothness on the model side entails cumulativity on the logics side, but not vice versa, as the construction to be presented shows. Cumulativity can be violated in preferential structures, i.e. φ |∼ ψ, φ |∼ τ, but not φ ∧ ψ |∼ τ, may hold, as there might be a φ ∧ ψ ∧ ¬τ-model m, which is not minimal in the set of φ-models, but minimal in the set of φ ∧ ψ-models. There will then be a φ ∧ ¬ψ-model m′ smaller than m, but no φ ∧ ψ-model smaller than m, in particular no minimal φ-model smaller than m, as smoothness postulates. To do without smoothness is simple. We just have to assure that in all formula definable sets (and it is here where the domain properties matter) which contain the set of φ-minimal models µ(φ), there must be an element smaller than m. But this is guaranteed if we have a sequence of models smaller than m converging in the standard topology to µ(φ). Then any set of models for any ψ which contains µ(φ) will contain some element of this sequence, so m will not be minimal. We give now the details, without many comments, as this part of Section 2.2 is addressed more to the advanced reader. We did not put it into Section 3, as by its substance its context is best found here.
DEFINITION 15. A sequence f of models converges to a set of models M, f → M, iff ∀φ(M |= φ → ∃i∀j ≥ i. f_j |= φ). If M = {m}, we will also write f → m.
FACT 16. Let f be a sequence composed of n subsequences f^1, …, f^n, e.g. f_{n·j+0} = f^1_j, etc., and f^i → M_i. Let φ be a formula unboundedly often true in f. Then there is 1 ≤ i ≤ n and m ∈ M_i s.t. m |= φ.
EXAMPLE 17. (A nonsmooth transitive injective, i.e. without copies, structure validating system P.) Take the language defined by the propositional variables r, s, t, p_i : i < ω. Take four models m_i, i = 0, …, 3, where for all i, j m_i |= p_j (to be definite), and let m0 |= r, ¬s, t, m1 |= r, ¬s, ¬t, m2 |= r, s, t, m3 |= r, s, ¬t. It is important to make m2 and m3 identical except for t; the other values for the p_j are unimportant. Let m2 < m1. (The other m_i are incomparable.) Define two sequences of models f^1 → m1, f^3 → m3 s.t. for all i, j f^i_j |= r, ¬t. This is possible, as m1 |= r, ¬t, m3 |= r, ¬t.
Figure 1. (Diagram not reproduced. It shows the models m0 |= r, ¬s, t, m1 |= r, ¬s, ¬t, m2 |= r, s, t, m3 |= r, s, ¬t, and the sequence f.)
All models in these sequences can be chosen different, and different from the m_i. This is no problem, as we have for all consistent φ uncountably many models where φ holds. Let f be the mixture of the f^i, e.g. f_{2n+0} := f^1_n, etc. Put m0 above f, with f in descending order. Arrange the rest of the 2^ω models above m0 ordered as the ordinals, i.e. every subset has a minimum. Thus, there is one long chain C (i.e. C is totally ordered) of models, at its lower end a descending countable chain f, directly above f the model m0, above m0 all other models except m1 to m3, arranged in a well-order. The models m1 to m3 form a separate group. See Figure 1. Note that m0 is a minimal model of t. The usual rules of P hold, as this is a preferential structure, except perhaps for (CM), which holds in smooth structures, and our construction is not smooth. Note that (CM) says φ |∼ ψ → $\overline{\overline{\phi}} = \overline{\overline{\phi \wedge \psi}}$, so it suffices to show for all φ: φ |∼ ψ → µ(φ) = µ(φ ∧ ψ). This is the point of the construction. The infinite descending chains converge to some minimal model, so if α holds in this minimal model, then α holds infinitely often in the chain, too. Thus there are no new minimal models
of α, which might weaken the consequences. We examine the possible cases of µ(φ) (∅, {m1}, {m2}, {m3}, {m1, m3}, {m2, m3}, and µ(φ) ∩ C ≠ ∅). We check the different cases for (CM), to show that it holds despite the lack of smoothness:
Case 1: µ(φ) = {m2, m3}: Then φ |∼ ψ iff {m2, m3} |= ψ. So if φ |∼ ψ, then φ ∧ ψ holds in m3, so by f^3, φ ∧ ψ is (downward) unboundedly often true in f, so µ(φ ∧ ψ) = {m2, m3}.
Cases 2 and 3: µ(φ) = {m1} and µ(φ) = {m3}: as above, by f^1 and f^3.
Case 4: µ(φ) = {m2}: As m2 |= φ, and m3 ⊭ φ, φ is of the form φ′ ∧ t, so none of the f_i is a model of φ, so φ has a minimal model in the chain C, so this is impossible.
Case 5: µ(φ) = {m1, m3}: Then φ |∼ ψ iff {m1, m3} |= ψ. So as in Case 1, if φ |∼ ψ, φ ∧ ψ is unboundedly often true in f^1 (and in f^3), and µ(φ ∧ ψ) = {m1, m3}.
Case 6: µ(φ) = ∅: This is impossible by Fact 16: If φ is unboundedly often true in C, then it must be true in one of m1, m3.
Case 7: µ(φ) ∩ C ≠ ∅: Then below each m |= φ, there is m′ ∈ µ(φ). Thus, the usual argument which shows Cumulativity in smooth structures applies.
The representation results of the author

We turn now to the results of the author. As indicated in the introduction, Section 2.1 above, we will split representation results (and proofs) into two parts. On the one hand we have the preferential structure, on the other hand the logical properties. We will consider an intermediate level, the algebraic properties of the model choice functions, and establish correspondence between structure and choice function on one side, and between choice function and resulting logic on the other side. For the reader’s convenience, we first recall the central algebraic properties, and then summarize the results, first the algebraic representation results, and then their logical counterparts. We then show that the caveat about definability preservation is necessary in the general case. We will see in Example 21 (given already in [Schlechta, 1992]) that condition (PR) may fail if the structure is not definability preserving. Once we have done the algebraic representation, the proofs and properties for the logical side are essentially straightforward, with the above caveat: One has to pay attention to the fact that we can go back and forth between model sets on the one hand, and theories and their consequences on the other, due to the fact that we have classical soundness and completeness, and that the model
set operators are assumed to be definability preserving. The reader will see in Section 3 that the lack of definability preservation complicates things, as we might “overlook” exceptions when we do not carefully separate model sets from the sets of all models of a theory. The complication is very serious, as we can also show that, in this case, no “normal” characterization is possible at all. The problem of definability preservation occurs in other situations, too, e.g. in distance based revision, see [Lehmann et al., 2001], and Example 47 below. A solution to the problem of definability preservation in another context (revision of defeasible databases) was examined in [Audibert et al., 1999]. A characterization of general, not necessarily definability preserving, preferential structures (by fundamentally nonlogical, and much uglier conditions than those presented here) is given in Section 3.5 of [Schlechta, 2004], and Section 3.5 below. Recall that the properties (LLE) and (CCL) are essentially void in an algebraic setting, and hold for all logics defined via model sets. (SC) is trivial too, being due to the subset condition, so only (PR) and (CUM) are really interesting and expressive.

Not necessarily ranked structures

CONDITION 18. For a function µ : Y → Y, we consider the conditions, already presented above:
(µ ⊆) µ(X) ⊆ X,
(µPR) X ⊆ Y → µ(Y) ∩ X ⊆ µ(X),
(µCUM) µ(X) ⊆ Y ⊆ X → µ(X) = µ(Y),
(µ∅) µ(Y) ≠ ∅ if Y ≠ ∅,
(for all X, Y ∈ Y).
We give now the basic algebraic result and its translation to logic. A sketch of the proof is given in Section 3.2 below. We see the central role of the simple condition (µPR): in the not necessarily smooth case, it is almost the only condition, (µ ⊆) being trivial.
PROPOSITION 19.
(1) An operation µ : Y → Y is representable by a preferential structure iff µ satisfies (µ ⊆) and (µPR). The structure can be chosen transitive.
(2) Let Y be closed under finite unions and finite intersections, and µ : Y → Y.
Then there is a Y-smooth preferential structure Z, s.t. for all X ∈ Y µ(X) = µ_Z(X), iff µ satisfies (µ ⊆), (µPR), (µCUM). The structure can be chosen transitive.
The proof makes extensive use of copies.
PROPOSITION 20. Let |∼ be a logic for L.
(1) Then there is a (transitive) definability preserving classical preferential model M s.t. $\overline{\overline{T}} = Th(\mu_M(M(T)))$ iff
(LLE) $\overline{T} = \overline{T'} \rightarrow \overline{\overline{T}} = \overline{\overline{T'}}$,
(CCL) $\overline{\overline{T}}$ is classically closed,
(SC) $T \subseteq \overline{\overline{T}}$,
(PR) $\overline{\overline{T \cup T'}} \subseteq \overline{\overline{\overline{T}} \cup T'}$
for all T, T′ ⊆ L.
(2) The structure can be chosen smooth, iff, in addition
(CUM) $T \subseteq \overline{T'} \subseteq \overline{\overline{T}} \rightarrow \overline{\overline{T}} = \overline{\overline{T'}}$ holds.
The proof is straightforward, going back and forth between model sets and theories, but makes essential use of definability preservation, as the following Example 21 shows. Details can be found e.g. in [Schlechta, 2004].
EXAMPLE 21. This example was first given in [Schlechta, 1992]. It shows that condition (PR) may fail in models which are not definability preserving. Let v(L) := {p_i : i ∈ ω}, and let n, n′ ∈ M_L be defined by n |= {p_i : i ∈ ω}, n′ |= {¬p0} ∪ {p_i : 0 < i < ω}. Let M := ⟨M_L, ≺⟩ where only n ≺ n′, i.e. just two models are comparable. Let µ := µ_M, and |∼ be defined as usual by µ. Set T := ∅, T′ := {p_i : 0 < i < ω}. We have M_T = M_L, µ(M_T) = M_L − {n′}, M_T′ = {n, n′}, µ(M_T′) = {n}. So M is not definability preserving, and, furthermore, $\overline{\overline{T}} = \overline{T}$, $\overline{\overline{T'}} = \overline{\{p_i : i < \omega\}}$, so $p_0 \in \overline{\overline{T \cup T'}}$, but $\overline{\overline{\overline{T}} \cup T'} = \overline{\overline{T} \cup T'} = \overline{T'}$, so $p_0 \notin \overline{\overline{\overline{T}} \cup T'}$, contradicting (PR).

Ranked structures

We turn now to ranked structures. One of the main purposes of this section is to draw the reader’s attention to the fact that seemingly very similar conditions often still differ in their expressive power. We first present some elementary facts about ranked structures, and then discuss a quite long list of conditions, their (in)dependence, and their consequences. The proofs are not always utterly trivial, and the reader is referred to [Schlechta, 2004] for details.
NOTATION 22.
(1) A = B + C stands for: A = B or A = C or A = B ∪ C.
(2) Recall from Definition 1.3: Given ≺, a⊥b means: neither a ≺ b nor b ≺ a.
FACT 23. Let ≺ be an irreflexive, binary relation on X, then the following two conditions are equivalent:
(1) There is Ω and an irreflexive, total, binary relation ≺′ on Ω and a function f : X → Ω s.t. x ≺ y ↔ f(x) ≺′ f(y) for all x, y ∈ X.
(2) Let x, y, z ∈ X and x⊥y wrt. ≺ (i.e. neither x ≺ y nor y ≺ x), then z ≺ x → z ≺ y and x ≺ z → y ≺ z.
(The proof is easy. For (2) → (1), take as Ω the sets {x′ : x⊥x′}.)
DEFINITION 24. Call an irreflexive, binary relation ≺ on X, which satisfies (1) (equivalently (2)) of Fact 23, ranked. By abuse of language, we also call the structure ⟨X, ≺⟩ ranked.
DEFINITION 25. Let Z = ⟨X, ≺⟩ be a preferential structure. Call Z 1−∞ over Z, iff for all x ∈ Z there are exactly one or infinitely many copies of x, i.e. for all x ∈ Z, {u ∈ X : u = ⟨x, i⟩ for some i} has cardinality 1 or ≥ ω.
LEMMA 26. Let Z = ⟨X, ≺⟩ be a preferential structure and f : Y → P(Z) with Y ⊆ P(Z) be represented by Z, i.e. for X ∈ Y f(X) = µ_Z(X), and Z be ranked and free of cycles. Then there is a structure Z′, 1−∞ over Z, ranked and free of cycles, which also represents f.
The proof is more tedious than difficult.
We work now on a (nonempty) set U, and consider functions µ : Y → P(U), where Y ⊆ P(U). We first enumerate some conditions, which we will consider in the sequel. The differences between them are sometimes quite subtle, as will be seen below, e.g. in Fact 30. Facts 28 and 29 collect some positive results, Fact 30 some negative ones.
DEFINITION 27. The conditions for the minimal case are:
(µ∅) X ≠ ∅ → µ(X) ≠ ∅,
(µ∅fin) X ≠ ∅ → µ(X) ≠ ∅ for finite X,
(µ =) X ⊆ Y, µ(Y) ∩ X ≠ ∅ → µ(Y) ∩ X = µ(X),
(µ =′) µ(Y) ∩ X ≠ ∅ → µ(Y ∩ X) = µ(Y) ∩ X,
(µ +) µ(X ∪ Y) = µ(X) + µ(Y) (+ is defined in Notation 22),
(µ∪) µ(Y) ∩ (X − µ(X)) ≠ ∅ → µ(X ∪ Y) ∩ Y = ∅,
(µ∪′) µ(Y) ∩ (X − µ(X)) ≠ ∅ → µ(X ∪ Y) = µ(X),
(µ ∈) a ∈ X − µ(X) → ∃b ∈ X. a ∉ µ({a, b}).
Note that (µ =′) is very close to Rational Monotony. Rational Monotony says: α |∼ β, α |≁ ¬γ → α ∧ γ |∼ β. Or, µ(A) ⊆ B, µ(A) ∩ C ≠ ∅ → µ(A ∩ C) ⊆ B for all A, B, C. This is not quite, but almost: µ(A ∩ C) ⊆ µ(A) ∩ C (it depends on how many B there are; if µ(A) is some such B, the fit is perfect). These properties are somewhat technical, but not complicated, so a more detailed discussion is perhaps not necessary. Their importance lies in the following positive and negative results. The proofs of the following two facts are easy; for lack of space, the reader is referred to [Schlechta, 2004].
FACT 28. In all ranked structures, (µ ⊆), (µ =), (µPR), (µ =′), (µ +), (µ∪), (µ∪′), (µ ∈) will hold, if the corresponding closure conditions (e.g. closure under finite intersections for (µ =′)) are satisfied.
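Fact 28 can be checked by brute force on a small example. The sketch below makes several assumptions that are not in the text: the domain is the full power set of a finite universe (so all closure conditions hold), there are no copies, the ranked relation is generated by a hypothetical rank function `RANK`, and the condition names are ASCII stand-ins for those of Definition 27. A non-ranked relation is included for contrast.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def mu(X, prec):
    """Minimal elements of X with respect to the relation prec."""
    return frozenset(x for x in X if not any((y, x) in prec for y in X))


def is_ranked(universe, prec):
    """Condition (2) of Fact 23, for an irreflexive relation."""
    for x in universe:
        for y in universe:
            if (x, y) in prec or (y, x) in prec:
                continue  # only incomparable pairs are constrained
            for z in universe:
                if (z, x) in prec and (z, y) not in prec:
                    return False
                if (x, z) in prec and (y, z) not in prec:
                    return False
    return True


def check_conditions(universe, prec):
    """Truth value of some conditions of Definition 27 on the power set."""
    S = subsets(universe)
    m = {X: mu(X, prec) for X in S}
    return {
        "mu=": all((m[Y] & X) == m[X]
                   for Y in S for X in S if X <= Y and m[Y] & X),
        "mu='": all(m[Y & X] == (m[Y] & X)
                    for Y in S for X in S if m[Y] & X),
        "mu+": all(m[X | Y] in (m[X], m[Y], m[X] | m[Y])
                   for X in S for Y in S),
        "muU": all(not (m[X | Y] & Y)
                   for X in S for Y in S if m[Y] & (X - m[X])),
        "muU'": all(m[X | Y] == m[X]
                    for X in S for Y in S if m[Y] & (X - m[X])),
        "mu_in": all(any(a not in m[frozenset({a, b})] for b in X)
                     for X in S for a in X - m[X]),
    }


# Hypothetical ranked example: x ≺ y iff x has strictly lower rank.
RANK = {"a": 0, "b": 0, "c": 1, "d": 2}
RANKED_U = set(RANK)
RANKED_PREC = {(x, y) for x in RANK for y in RANK if RANK[x] < RANK[y]}

# Hypothetical non-ranked example: only a ≺ b, c is incomparable to both.
FLAT_U = {"a", "b", "c"}
FLAT_PREC = {("a", "b")}

print(is_ranked(RANKED_U, RANKED_PREC), check_conditions(RANKED_U, RANKED_PREC))
print(is_ranked(FLAT_U, FLAT_PREC), check_conditions(FLAT_U, FLAT_PREC))
```

In the non-ranked example, µ({a, b, c}) = {a, c} meets {b, c} in {c}, but µ({b, c}) = {b, c}, so (µ =) fails there.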
FACT 29. The following properties (2)–(9) hold, provided corresponding closure conditions for the domain Y are satisfied. We first enumerate these conditions.
For (3), (4), (8): closure under finite unions.
For (2): closure under finite intersections.
For (6) and (7): closure under finite unions, and Y contains all singletons.
For (5): closure under set difference.
For (9): sufficiently strong conditions, which are satisfied for the set of models definable by propositional theories.
Note that the closure conditions for (5), (6), (9) are quite different; for this reason, (5) alone is not enough.
(1) (µ =) entails (µPR),
(2) in the presence of (µ ⊆), (µ =) is equivalent to (µ =′),
(3) (µ ⊆), (µ =) → (µ∪),
(4) (µ ⊆), (µ∅), (µ =) entail: (4.1) (µ +), (4.2) (µ∪′), (4.3) (µCUM),
(5) (µ ⊆) + (µ +) → (µ =),
(6) (µ +) + (µ ∈) + (µPR) + (µ ⊆) → (µ =),
(7) (µCUM) + (µ =) → (µ ∈),
(8) (µCUM) + (µ =) + (µ ⊆) → (µ +),
(9) (µPR) + (µCUM) + (µ +) → (µ =).
FACT 30.
(1) (µ ⊆) + (µPR) + (µ =) ↛ (µ +),
(2) (µ ⊆) + (µPR) + (µ +) ↛ (µ =) (without closure under set difference),
(3) (µ ⊆) + (µPR) + (µ +) + (µ =) + (µ∪) ↛ (µ ∈) (and thus (µ ⊆) + (µPR) + (µ +) + (µ =) + (µ∪) do not guarantee representability by ranked structures, by Fact 28).
To give the reader a flavour of argumentation in the area, we present its proof.
Proof.
(1) Consider the following structure without transitivity: U := {a, b, c, d}, c and d have ω many copies in descending order c1 ≻ c2 ≻ ..., etc. a, b have one single copy each. a ≻ b, a ≻ d1, b ≻ a, b ≻ c1. (µ +) does not hold: µ(U) = ∅, but µ({a, c}) = {a}, µ({b, d}) = {b}. (µPR) holds as in all preferential structures. (µ =) holds: If it were to fail, then for some A ⊆ B, µ(B) ∩ A ≠ ∅, so µ(B) ≠ ∅. But the only possible cases for B are now: (a ∈ B, b, d ∉ B) or (b ∈ B, a, c ∉ B). Thus, B can be {a}, {a, c}, {b}, {b, d} with µ(B) = {a}, {a}, {b}, {b}. If A = B, then the result will hold trivially. Moreover, A has to be ≠ ∅. So the remaining cases of B where it might fail are B = {a, c} and {b, d}, and by µ(B) ∩ A ≠ ∅, the only cases of A where it might fail are A = {a} or {b} respectively. So the only cases remaining are: B = {a, c}, A = {a} and B = {b, d}, A = {b}. In the first case, µ(A) = µ(B) = {a}, in the second µ(A) = µ(B) = {b}, so (µ =) holds in both.
(2) Work in the set of theory definable model sets of an infinite propositional language. Note that this is not closed under set difference, and closure properties will play a crucial role in the argumentation. Let U := {y, a} ∪ {x_i : i < ω}, where x_i → a in the standard topology. For the order, arrange s.t. y is minimized by any set iff this set contains a cofinal subsequence of the x_i; this can be done by the standard construction. Moreover, let the x_i all kill themselves, i.e. with ω many copies x_i^1 ≻ x_i^2 ≻ .... There are no other elements in the relation. Note that if a ∉ µ(X), then a ∉ X, and X cannot contain a cofinal subsequence of the x_i, as X is closed in the standard topology. (A short argument: suppose X contains such a subsequence, but a ∉ X. Then the theory of a, Th(a), is inconsistent with Th(X), so already a finite subset of Th(a) is inconsistent with Th(X), but such a finite subset will finally hold in a cofinal sequence converging to a.)
Likewise, if y ∈ µ(X), then X cannot contain a cofinal subsequence of the x_i. Obviously, (µ ⊆) and (µPR) hold, but (µ =) does not hold: Set B := U, A := {a, y}. Then µ(B) = {a}, µ(A) = {a, y}, contradicting (µ =). It remains to show that (µ +) holds. µ(X) can only be ∅, {a}, {y}, {a, y}. As µ(A ∪ B) ⊆ µ(A) ∪ µ(B) by (µPR),
Case 1: µ(A ∪ B) = {a, y} is settled. Note that if y ∈ X − µ(X), then X will contain a cofinal subsequence, and thus a ∈ µ(X).
Case 2: µ(A ∪ B) = {a}.
Case 2.1: µ(A) = {a}: we are done.
Case 2.2: µ(A) = {y}: A does not contain a, nor a cofinal subsequence. If µ(B) = ∅, then a ∉ B, so a ∉ A ∪ B, a contradiction. If µ(B) = {a}, we are done.
If y ∈ µ(B), then y ∈ B, but B does not contain a cofinal subsequence, so A ∪ B does not either, so y ∈ µ(A ∪ B), contradiction.
Case 2.3: µ(A) = ∅: A cannot contain a cofinal subsequence. If µ(B) = {a}, we are done. a ∈ µ(B) does have to hold, so µ(B) = {a, y} is the only remaining possibility. But then B does not contain a cofinal subsequence, and neither does A ∪ B, so y ∈ µ(A ∪ B), contradiction.
Case 2.4: µ(A) = {a, y}: A does not contain a cofinal subsequence. If µ(B) = {a}, we are done. If µ(B) = ∅, B does not contain a cofinal subsequence (as a ∉ B), so neither does A ∪ B, so y ∈ µ(A ∪ B), contradiction. If y ∈ µ(B), B does not contain a cofinal subsequence, and we are done again.
Case 3: µ(A ∪ B) = {y}: To obtain a contradiction, we need a ∈ µ(A) or a ∈ µ(B). But in both cases a ∈ µ(A ∪ B).
Case 4: µ(A ∪ B) = ∅: Thus, A ∪ B contains no cofinal subsequence. If, e.g. y ∈ µ(A), then y ∈ µ(A ∪ B), if a ∈ µ(A), then a ∈ µ(A ∪ B), so µ(A) = ∅.
(3) Let U := {y} ∪ {x_i : i < ω}, x_i a sequence, each x_i kills itself, x_i^1 ≻ x_i^2 ≻ ..., and y is killed by all cofinal subsequences of the x_i. Then for any X ⊆ U µ(X) = ∅ or µ(X) = {y}. (µ ⊆) and (µPR) hold obviously.
(µ +): Let A ∪ B be given. If y ∉ X, then for all Y ⊆ X µ(Y) = ∅. So, if y ∉ A ∪ B, we are done. If y ∈ A ∪ B, if µ(A ∪ B) = ∅, one of A, B must contain a cofinal sequence, it will have µ = ∅. If not, then µ(A ∪ B) = {y}, and this will also hold for the one y is in.
(µ =): Let A ⊆ B, µ(B) ∩ A ≠ ∅, show µ(A) = µ(B) ∩ A. But now µ(B) = {y}, y ∈ A, so B does not contain a cofinal subsequence, neither does A, so µ(A) = {y}.
(µ∪): (A − µ(A)) ∩ µ(A′) ≠ ∅, so µ(A′) = {y}, so µ(A ∪ A′) = ∅, as y ∈ A − µ(A).
But (µ ∈) does not hold: y ∈ U − µ(U), but there is no x s.t. y ∉ µ({x, y}).
We turn to characterizations.

Characterizations

We characterize first the case without copies (Proposition 31), continue with again some negative results for the general case (Proposition 32), and conclude with a characterization of the general case (Proposition 33).
We give two variants for the case without copies, Proposition 31 (1) and (2). The first imposes (µ∅) globally, but does not require the finite subsets to be in the domain; the second needs (µ∅) only for finite sets, i.e. (µ∅fin), but finite sets have to be in the domain. Thus, the first is useful for rules of the form φ |∼ ψ, the second for rules of the form T |∼ φ. Note that the prerequisites of Proposition 31 (2) hold in particular in the case of ranked structures without copies, where all elements of U are present in the structure: we need infinite descending chains to have µ(X) = ∅ for X ≠ ∅.
PROPOSITION 31.
(1) Let Y ⊆ P(U) be closed under finite unions. Then (µ ⊆), (µ∅), (µ =) characterize ranked structures for which for all X ∈ Y X ≠ ∅ → µ<(X) ≠ ∅ holds, i.e. (µ ⊆), (µ∅), (µ =) hold in such structures for µ<, and if they hold for some µ, we can find a ranked relation < on U s.t. µ = µ<. Moreover, the structure can be chosen Y-smooth.
(2) Let Y ⊆ P(U) be closed under finite unions, and contain singletons. Then (µ ⊆), (µ∅fin), (µ =), (µ ∈) characterize ranked structures for which for all finite X ∈ Y X ≠ ∅ → µ<(X) ≠ ∅ holds, i.e. (µ ⊆), (µ∅fin), (µ =), (µ ∈) hold in such structures for µ<, and if they hold for some µ, we can find a ranked relation < on U s.t. µ = µ<.
For lack of space, the reader is referred to [Schlechta, 2004] for a proof.
We turn now to the general case, where every element may occur in several copies.
PROPOSITION 32.
(1) (µ ⊆) + (µPR) + (µ =) + (µ∪) + (µ ∈) do not imply representation by a ranked structure.
(2) The infinitary version of (µ +): (µ + ∞) µ(⋃{A_i : i ∈ I}) = ⋃{µ(A_i) : i ∈ I′} for some I′ ⊆ I will not always hold in ranked structures.
(2) is immediate, when we consider infinite descending chains.
We assume again the existence of singletons for the following representation result.
PROPOSITION 33. Let Y be closed under finite unions and contain singletons.
Then (µ ⊆) + (µP R) + (µ +) + (µ∪) + (µ ∈) characterize ranked structures. For lack of space, the reader is again referred to [Schlechta, 2004] for details.
2.3 (Weak) Filters
Recall the concept of (weak) filters discussed in Section 1.1. We argued there that, if defaults have something to do with “big” subsets - however we measure
them - then weak filters should give some kind of minimal semantics to defaults. This is what we do in this section for first order logic. We introduce a new, generalized quantifier to which we give exactly the desired properties: it should express that a property holds almost everywhere. In particular, the property should hold somewhere if it does so almost everywhere, and, if it holds everywhere, then it holds almost everywhere, and it cannot be that φ and ¬φ hold almost everywhere at the same time. The latter gives a notion of consistency: we cannot write down just anything any more and pretend that it is still a reasonable default theory. Note that the new quantifier ∇ will be fully in the object language, so we can negate it, nest it, mix it with classical quantifiers, and do everything we can do in usual first order logic. The essential axioms are now
1. ∇xφ(x) ∧ ∀x(φ(x) → ψ(x)) → ∇xψ(x),
2. ∇xφ(x) → ¬∇x¬φ(x),
3. ∀xφ(x) → ∇xφ(x) and ∇xφ(x) → ∃xφ(x).
These axioms correspond exactly to the properties of weak filters, so it is not surprising that they (together with the usual ones for classical first order logic) are sound and complete for weak filter models, where the essential supplementary definition of validity for ∇ and the weak filter N(M) over M’s universe is (with M a classical first order logic model): ⟨M, N(M)⟩ |= ∇xφ(x) iff there is A ∈ N(M) s.t. ∀a ∈ A (⟨M, N(M)⟩ |= φ[a]).
There is an easy and natural extension to the relativized version of our generalized quantifier, ∇xφ(x) : ψ(x). This poses the question of coherence between the different normal sets: in our base system, there are simply none, but it is trivial to introduce them, as the correspondence between the semantic and the syntactic versions will be obvious. For instance, we can express that a big subset of X, which is also a subset of Y, will be a big subset of Y, if Y ⊆ X, by ∀x(ψ(x) → φ(x)) ∧ ∀x(φ(x) ∧ σ(x) → ψ(x)) ∧ ∇xφ(x) : σ(x) → ∇xψ(x) : σ(x).
See [Schlechta, 1995] and [Schlechta, 2004] for details.
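The soundness of the three essential axioms can be checked by brute force in a small model. The following sketch rests on assumptions that are not in the text: the universe is a hypothetical five-element set, formulas are identified with their extensions, and N(M) is taken to be the family of strict majorities, which contains the universe, is closed under supersets, and in which any two members intersect.

```python
from itertools import combinations

U = frozenset(range(5))  # hypothetical five-element universe


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_big(A):
    """Hypothetical weak filter N(M): the strict majorities of U."""
    return 2 * len(A) > len(U)


def nabla(ext):
    """∇x φ(x): some big set A lies inside the extension ext of φ."""
    return any(is_big(A) and A <= ext for A in subsets(U))


def axioms_hold():
    """Check axioms 1 to 3 for all extensions of φ and ψ over U."""
    S = subsets(U)
    for phi in S:
        # Axiom 2: ∇xφ(x) → ¬∇x¬φ(x)
        if nabla(phi) and nabla(U - phi):
            return False
        # Axiom 3: ∀xφ(x) → ∇xφ(x) and ∇xφ(x) → ∃xφ(x)
        if phi == U and not nabla(phi):
            return False
        if nabla(phi) and not phi:
            return False
        # Axiom 1: ∇xφ(x) ∧ ∀x(φ(x) → ψ(x)) → ∇xψ(x)
        for psi in S:
            if nabla(phi) and phi <= psi and not nabla(psi):
                return False
    return True


A = frozenset({0, 1, 2})
B = frozenset({2, 3, 4})
print(axioms_hold())
print(is_big(A), is_big(B), is_big(A & B))
```

The last line shows why this family is only a weak filter: A and B are both big and intersect, but their intersection {2} is not big.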
2.4 Theory Revision

We turn now to distance based logics, more precisely, to theory revision. Theory revision is the problem of “fusing” contradictory information to obtain consistent information. The now classical, and most influential, approach is the one by Alchourrón, Gärdenfors, and Makinson, AGM for short. We use “AGM” indiscriminately for the article [Alchourron et al., 1985], for its three authors, and for their approach. We will present it below in very rough outline, but will first take a more general point of view, and briefly discuss some different cases and basic
ideas of the problem. We will not present more complicated approaches to theory revision, where for instance the theory itself contains information about revision.
Let us first state that the problem of theory revision is underspecified, so, just as for nonmonotonic logics, there are different reasonable solutions for different situations. Consider for instance the following cases:
• Two witnesses in court tell different versions of a nighttime accident with poor visibility. We will probably draw conclusions about their reliability from the difference of their testimonies. So the outcome will probably be some “haze” around the OR of the two stories. Of course, there are situations where a contradiction is so coarse that both pieces of information will just be discarded. If, e.g., one witness says to have seen a bicycle, the other an airplane, without further information, we will probably exclude both, and not conclude that there was a means of transportation. This does not interest us here.
• Theories are deontic statements; the old law was as reliable as the new one, but the new one shall nonetheless have priority.
• We have contradictory information from two sources, but have good assumptions where each source might err.
Speaking semantically, we are given (at least) two sets X, Y, and look for a suitable choice function f(X, Y), which captures some of the ideas of revision. There are the following two basic approaches (with their reasons):
• we choose a subset of the union of models, if we do not take any reliability into account,
• we consider the contradictions a sign of lack of precision, and do not really believe any of the sources, so we choose some set not included in the union.
Can we find some postulates for a problem which is posed in so general terms? It seems so:
(1) If we have no good reason for the contrary, each bit of information should have at least potential influence.
(2) We should not throw the baby out with the bathwater:
• The result should not be overly strong, i.e. it should be consistent, if possible.
• The result should not be overly weak, if possible.
This point is usually summarized by the informal postulate of “minimal change”, ubiquitous in common-sense reasoning (counterfactuals, theory update)! But this smells of distance: given situation A, we look for situation B, which is minimally different from A, given some conditions, and some
criterion of difference. If the “distance” between A and B is the amount of change, the cloud of smell concretizes to a formal definition: B is the one among a set of candidates B, which is closest to A. And that is exactly what we will do later on in Section 2.4. (We can also base revision on a notion of size of models, considering the biggest models as most important. This is briefly hinted at at the end of Section 2.4.)
A further major distinction between different approaches to theory revision is whether each bit of information has the same weight, or whether some have a privileged position. Traditional (AGM) theory revision gives more weight to the second information. We have then the following semantical situation: we have two sets, X and Y, perhaps disjoint, and Y should be given more weight than X. The approach by AGM is, for historical reasons, extremist in the following sense: f(X, Y) will always be a subset of Y, so the influence of Y is very strong. In particular, we do not doubt the reliability of Y. The problem is how to choose this subset of Y, recalling that some influence of X should nonetheless be felt. As mentioned above, given a distance, the more we go away from X, the less likely a point y will be, seen from X. So it is a natural idea to take those points in Y which seem most likely, seen from X, i.e. which are closest to X. This is elaborated in Section 2.4.

The AGM approach

We present now the basic definitions and results of AGM style theory revision. All definitions and results in this Section 2.4 can be found in the work of AGM. As the authors also provide ample motivation and discussion, we limit ourselves here to the absolute minimum in comments.
DEFINITION 34. We consider two functions, − and ∗, taking a deductively closed theory and a formula as arguments, and returning a (deductively closed) theory on the logics side. The algebraic counterparts work on definable model sets.
It is obvious that (K − 1), (K ∗ 1), (K − 6), (K ∗ 6) have vacuously true counterparts on the semantical side. Note that K (X) will never change, everything is relative to fixed K (X). In particular, AGM revision will thus not impose any restrictions on changes between different Ks, as they happen e.g. in iterated revision. K ∗ φ is the result of revising K with φ. K − φ is the result of subtracting enough from K to be able to add ¬φ in a reasonable way.
If they satisfy the following “rationality postulates” for − (each logical postulate is followed, indented, by its algebraic counterpart, where there is one):
(K−1) K − A is deductively closed
(K−2) K − A ⊆ K
      (X ⊖ 2) X ⊆ X ⊖ A
(K−3) A ∉ K ⇒ K − A = K
      (X ⊖ 3) X ⊈ A ⇒ X ⊖ A = X
(K−4) ⊬ A ⇒ A ∉ K − A
      (X ⊖ 4) A ≠ U ⇒ X ⊖ A ⊈ A
(K−5) $K \subseteq \overline{(K - A) \cup \{A\}}$
      (X ⊖ 5) (X ⊖ A) ∩ A ⊆ X
(K−6) ⊢ A ↔ B ⇒ K − A = K − B
(K−7) (K − A) ∩ (K − B) ⊆ K − (A ∧ B)
      (X ⊖ 7) X ⊖ (A ∩ B) ⊆ (X ⊖ A) ∪ (X ⊖ B)
(K−8) A ∉ K − (A ∧ B) ⇒ K − (A ∧ B) ⊆ K − A
      (X ⊖ 8) X ⊖ (A ∩ B) ⊈ A ⇒ X ⊖ A ⊆ X ⊖ (A ∩ B)
and for ∗
(K∗1) K ∗ A is deductively closed
(K∗2) A ∈ K ∗ A
      (X | 2) X | A ⊆ A
(K∗3) $K * A \subseteq \overline{K \cup \{A\}}$
      (X | 3) X ∩ A ⊆ X | A
(K∗4) $\neg A \notin K \Rightarrow \overline{K \cup \{A\}} \subseteq K * A$
      (X | 4) X ∩ A ≠ ∅ ⇒ X | A ⊆ X ∩ A
(K∗5) K ∗ A = K⊥ ⇒ ⊢ ¬A
      (X | 5) X | A = ∅ ⇒ A = ∅
(K∗6) ⊢ A ↔ B ⇒ K ∗ A = K ∗ B
(K∗7) $K * (A \wedge B) \subseteq \overline{(K * A) \cup \{B\}}$
      (X | 7) (X | A) ∩ B ⊆ X | (A ∩ B)
(K∗8) $\neg B \notin K * A \Rightarrow \overline{(K * A) \cup \{B\}} \subseteq K * (A \wedge B)$
      (X | 8) (X | A) ∩ B ≠ ∅ ⇒ X | (A ∩ B) ⊆ (X | A) ∩ B
they are called a (syntactical or semantical) contraction and revision function respectively.
The second condition expresses the strength of the second argument, the third and fourth treat and connect to the degenerate case, where consistent addition is possible, and the fifth expresses a limit condition, which is treated in more detail, of course, in the limit version; see below in Section 3.4.
REMARK 35. Note that (X | 7) and (X | 8) express a central condition for ranked structures, see Definition 27: If we write f_X(.) for X | ., we then have: f_X(A) ∩ B ≠ ∅ ⇒ f_X(A ∩ B) = f_X(A) ∩ B.
PROPOSITION 36. Both notions are interdefinable by the following equations:
$K * A := \overline{(K - \neg A) \cup \{A\}}$, K − A := K ∩ (K ∗ ¬A),
X | A := (X ⊖ CA) ∩ A, X ⊖ A := X ∪ (X | CA),
i.e., if the defining side has the respective properties, so will the defined side.
The concept of an epistemic entrenchment relation completes the picture, as such relations take the arbitrariness out of the preceding definitions: they fix the choices.
DEFINITION 37. Let ≤K be a relation on the formulas relative to a deductively closed theory K on the formulas of L, and ≤X a relation on P(U) or a suitable subset of P(U) relative to fixed X s.t. (each logical condition is followed, indented, by its algebraic counterpart):
(EE1) ≤K is transitive
      ≤X is transitive
(EE2) A ⊢ B ⇒ A ≤K B
      A ⊆ B ⇒ A ≤X B
(EE3) ∀A, B (A ≤K A ∧ B or B ≤K A ∧ B)
      ∀A, B (A ≤X A ∩ B or B ≤X A ∩ B)
(EE4) K ≠ K⊥ ⇒ (A ∉ K iff ∀B. A ≤K B)
      X ≠ ∅ ⇒ (X ⊈ A iff ∀B. A ≤X B)
(EE5) ∀B. B ≤K A ⇒ ⊢ A
      ∀B. B ≤X A ⇒ A = U
We then call ≤K (≤X) a relation of epistemic entrenchment for K (X). When the context is clear, we simply write ≤. But we should not forget that ≤K depends on K (X), as the whole AGM approach is relative to a fixed K (X).
A remark on intuition: The idea of epistemic entrenchment is that φ is more entrenched than ψ (relative to K) iff M(¬ψ) is closer to M(K) than M(¬φ) is to M(K). In shorthand, the more we can wiggle K without reaching ¬φ, the more φ is entrenched. Truth is maximally entrenched: no wiggling whatever will reach falsity. The more φ is entrenched, the more we are certain about it. Seen this way, the properties of epistemic entrenchment relations are very natural (and trivial): As only the closest points of M(¬φ) count (seen from M(K)), φ or ψ will be as entrenched as φ ∧ ψ, and there is a logically strongest φ′ which is as entrenched as φ: this is just the sphere around M(K) with radius d(M(K), M(¬φ)).
Again, we have an interdefinability result:
PROPOSITION 38. The function K − (X⊖) and the ordering ≤K (≤X) are interdefinable in the following sense: Define K − A by B ∈ K − A :↔ B ∈ K and (A
Define A ≤K B by A ≤K B :↔ A ∉ K − (A ∧ B) or ⊢ A ∧ B, and A ≤X B :↔ A, B = U or X ⊖ (A ∩ B) ⊈ A.
Then, if the defining side has the respective properties, so will the defined side.
Before we turn to details of a distance semantics for revision, we just mention that we can also give revision a semantics based on the size of model sets. The reader is referred to [Schlechta, 1991] or [Schlechta, 2004] for details. In short, we have shown that it is possible to define theory revision from model size in a uniform way, i.e. for various base theories T. Thus, this approach also gives a semantics to iterated revision. The basic idea is as follows: we associate to each propositional variable (of a countable language) in an independent way a (measurable) subset of the real interval [0, 1], and thus a probability. This probability measure is then extended to arbitrary formulas, and gives a “weight” to each formula. This ordering results in a “pre-EE relation” (Definition 7.4.1 in [Schlechta, 2004]), which does not mention any base theory T, and generates epistemic entrenchment relations for arbitrary T (see Definition 7.4.2 and Proposition 7.4.2 in [Schlechta, 2004]).

Distance semantics for theory revision

We bring now distance and theory revision together and define a distance semantics for theory revision. Note that this semantics defines, by its very nature, also iterated revision, and thus goes significantly beyond AGM revision, in both senses:
first, the system is more expressive; second, one has to pay for it: in general, finite characterization is not possible any more (see Section 2.4 below).
Before we go into details, we would like to emphasize two quite different uses one can make of distances. The first one gives a semantics for theory revision, the second one for counterfactual conditionals (see [Lewis, 1973]), but also for theory update (see Katsuno and co-authors, e.g. [Katsuno and Mendelzon, 1990]).
DEFINITION 39. We define the collective and the individual variant of choosing the closest elements in the second operand by:
X | Y := {y ∈ Y : ∃x_y ∈ X. ∀x′ ∈ X, ∀y′ ∈ Y (d(x_y, y) ≤ d(x′, y′))} (the collective variant, where only the closest elements in X count) and
X ↑ Y := {y ∈ Y : ∃x_y ∈ X. ∀y′ ∈ Y (d(x_y, y) ≤ d(x_y, y′))} (the individual variant, where each x ∈ X has its word to say).
The latter variant is, in particular, monotone in the first argument, which the first is not.
REMARK 40. It is trivial to see that AGM revision cannot be defined by an individual distance: Suppose X | Y := {y ∈ Y : ∃x_y ∈ X (∀y′ ∈ Y. d(x_y, y) ≤ d(x_y, y′))}. Consider a, b, c. {a, b} | {b, c} = {b} by (X | 3) and (X | 4), so d(a, b) < d(a, c). But on the other hand {a, c} | {b, c} = {c}, so d(a, b) > d(a, c), contradiction.
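The two variants of Definition 39 and the argument of Remark 40 can be tried out on finite sets. This is a minimal sketch under assumptions of our own: the function names, the distance `line` on integer points, and the small range of distance values searched in `remark_40_witnesses` are all hypothetical, and distances are assumed to satisfy d(x, x) = 0 < d(x, y) for x ≠ y.

```python
from itertools import product


def collective(X, Y, d):
    """X | Y: the y in Y at the globally minimal distance between X and Y."""
    if not X or not Y:
        return frozenset()
    best = min(d(x, y) for x in X for y in Y)
    return frozenset(y for y in Y if any(d(x, y) == best for x in X))


def individual(X, Y, d):
    """X ↑ Y: the y in Y that are closest to at least one x in X."""
    return frozenset(y for y in Y
                     if any(all(d(x, y) <= d(x, y2) for y2 in Y) for x in X))


def line(x, y):
    """Hypothetical distance: points are integers on a line."""
    return abs(x - y)


def remark_40_witnesses(values=(1, 2, 3)):
    """Search for a distance on {a, b, c} under which the individual variant
    gives {a, b} ↑ {b, c} = {b} and {a, c} ↑ {b, c} = {c}."""
    pairs = [(x, y) for x in "abc" for y in "abc" if x != y]
    found = []
    for choice in product(values, repeat=len(pairs)):
        table = dict(zip(pairs, choice))

        def d(x, y, t=table):
            return 0 if x == y else t[(x, y)]

        if (individual({"a", "b"}, {"b", "c"}, d) == {"b"}
                and individual({"a", "c"}, {"b", "c"}, d) == {"c"}):
            found.append(table)
    return found


# The collective variant is not monotone in its first argument ...
print(collective({0}, {3, 9}, line), collective({0, 10}, {3, 9}, line))
# ... whereas the individual variant is.
print(individual({0}, {3, 9}, line), individual({0, 10}, {3, 9}, line))
# No distance in the searched range realizes both equations of Remark 40.
print(remark_40_witnesses())
```

The search comes back empty for the reason given in the remark: the first equation forces d(a, b) < d(a, c), the second d(a, c) < d(a, b).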
DEFINITION 41. Given a pseudo-distance d : U × U → Z (see Definition 2), let for A, B ⊆ U
A |d B := {b ∈ B : ∃a_b ∈ A ∀a′ ∈ A ∀b′ ∈ B. d(a_b, b) ≤ d(a′, b′)}.
Thus, A |d B is the subset of B consisting of all b ∈ B that are closest to A. Note that, if A or B is infinite, A |d B may be empty, even if A and B are not empty. A condition assuring nonemptiness will be imposed below. (The limit version gets rid of such nonemptiness conditions, forced by property 5 of the AGM theory, see Section 3.4.)
A little example shows the additional strength of distance based revision. Assume the distance to be symmetrical. We then have A | B = (B | A) | B. This is true only for the collective variant of distance, and left as an easy exercise.
DEFINITION 42. An operation | is representable iff there is a pseudo-distance d : U × U → Z such that
(1) A | B = A |d B := {b ∈ B : ∃a_b ∈ A ∀a′ ∈ A ∀b′ ∈ B (d(a_b, b) ≤ d(a′, b′))}.
The following is the central definition; it describes the way a revision ∗d is attached to a pseudo-distance d on the set of models.
DEFINITION 43. T ∗d T′ := Th(M(T) |d M(T′)).
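To make Definition 43 concrete, here is a small sketch under several assumptions of our own: a hypothetical two-variable language, the Hamming distance between valuations as pseudo-distance, and theories given as Python predicates. Since the language is finite, every model set is definable, so the sketch works with the model set M(T) |d M(T′) directly instead of computing Th of it. It also checks the symmetric identity A | B = (B | A) | B mentioned above.

```python
from itertools import combinations, product

VARS = ("p", "q")  # hypothetical two-variable language
MODELS = list(product((True, False), repeat=len(VARS)))


def models_of(formula):
    """M(T) for a theory given as a predicate over a valuation dict."""
    return frozenset(m for m in MODELS if formula(dict(zip(VARS, m))))


def hamming(m, n):
    """Number of variables on which two valuations differ (symmetric)."""
    return sum(a != b for a, b in zip(m, n))


def closest(A, B, d=hamming):
    """A |_d B: the elements of B closest to A (collective variant)."""
    if not A or not B:
        return frozenset()
    best = min(d(a, b) for a in A for b in B)
    return frozenset(b for b in B if any(d(a, b) == best for a in A))


def revise(T, T_new):
    """Model-set version of T *_d T': M(T) |_d M(T')."""
    return closest(models_of(T), models_of(T_new))


def entails(model_set, formula):
    """True iff the formula holds in every model of the set."""
    return all(formula(dict(zip(VARS, m))) for m in model_set)


def symmetric_identity_holds():
    """Check A | B = (B | A) | B for all nonempty A, B."""
    S = [frozenset(c) for r in range(1, len(MODELS) + 1)
         for c in combinations(MODELS, r)]
    return all(closest(A, B) == closest(closest(B, A), B)
               for A in S for B in S)


# Revising p ∧ q by ¬p keeps q, the part of the old theory that can stay.
result = revise(lambda v: v["p"] and v["q"], lambda v: not v["p"])
print(sorted(result), entails(result, lambda v: v["q"]))
print(symmetric_identity_holds())
```

The choice of the Hamming distance is only one convenient symmetric pseudo-distance; the definition itself does not single it out.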
∗ is called representable iff there is a pseudo-distance d on the set of models s.t. T ∗ T′ = Th(M(T) |d M(T′)).
Proposition 44 is the main result for the symmetric case. It gives a characterization of distance definable choice functions |. The central (and almost only) condition is the loop condition (| S1); see the discussion immediately after the proposition.
PROPOSITION 44. Let U ≠ ∅, 𝒴 ⊆ 𝒫(U) be closed under finite ∩ and finite ∪, ∅ ∈ 𝒴. Let A, B, X_i ∈ 𝒴. Let |: 𝒴 × 𝒴 → 𝒴, and consider the conditions
(| 1) A | B ⊆ B,
(| 2) A ∩ B ≠ ∅ → A | B = A ∩ B,
(| S1) (Loop): (X_1 | (X_0 ∪ X_2)) ∩ X_0 ≠ ∅, (X_2 | (X_1 ∪ X_3)) ∩ X_1 ≠ ∅, (X_3 | (X_2 ∪ X_4)) ∩ X_2 ≠ ∅, . . . , (X_k | (X_{k−1} ∪ X_0)) ∩ X_{k−1} ≠ ∅ imply (X_0 | (X_k ∪ X_1)) ∩ X_1 ≠ ∅.
(a) | is representable by a symmetric pseudo-distance d : U × U → Z iff | satisfies (| 1) and (| S1).
(b) | is representable by an identity respecting symmetric pseudo-distance d : U × U → Z iff | satisfies (| 1), (| 2), and (| S1).
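The loop condition (| S1) can be tested mechanically for an operation given by a distance. The sketch below assumes finite nonempty sets; `revise`, `loop_instance_ok` and `loop_holds` are hypothetical helper names, and the asymmetric table at the end is an invented counterexample, not taken from the text.

```python
from itertools import combinations, product


def revise(A, B, d):
    """A |d B: the elements of B at minimal distance from A."""
    best = min(d(a, b) for a in A for b in B)
    return frozenset(b for b in B if any(d(a, b) == best for a in A))


def loop_instance_ok(xs, d):
    """Check one instance of (| S1) for the sequence xs = [X_0, ..., X_k]."""
    k = len(xs) - 1
    # Premises: (X_i | (X_{i-1} ∪ X_{i+1})) ∩ X_{i-1} ≠ ∅, with X_{k+1} read as X_0.
    premises = all(
        revise(xs[i], xs[i - 1] | xs[(i + 1) % (k + 1)], d) & xs[i - 1]
        for i in range(1, k + 1)
    )
    # Conclusion: (X_0 | (X_k ∪ X_1)) ∩ X_1 ≠ ∅.
    conclusion = bool(revise(xs[0], xs[k] | xs[1], d) & xs[1])
    return conclusion or not premises


def loop_holds(points, d, k):
    """Check (| S1) for all sequences of k + 1 nonempty subsets of points."""
    subsets = [
        frozenset(c)
        for r in range(1, len(points) + 1)
        for c in combinations(points, r)
    ]
    return all(loop_instance_ok(xs, d) for xs in product(subsets, repeat=k + 1))


def symmetric(p, q):
    return abs(p - q)


# Invented asymmetric "distance" on {0, 1, 2}: the loop condition fails for it.
ASYMMETRIC = {(1, 0): 1, (1, 2): 2, (2, 1): 1, (2, 0): 2, (0, 1): 5, (0, 2): 1}


def asymmetric(p, q):
    return 0 if p == q else ASYMMETRIC[(p, q)]


print(loop_holds((0, 1, 3, 7), symmetric, 2))
print(loop_instance_ok([frozenset({0}), frozenset({1}), frozenset({2})], asymmetric))
```

For the symmetric distance every instance passes, in line with part (a) of the proposition; for the asymmetric table the two premises hold but the conclusion does not.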
Note that (| 1) corresponds to (*2), (| 2) to (*3), (*0) will hold trivially, (*1) holds by definition of 𝒴 and |, (*4) will be a consequence of representation. (| S1) corresponds to: d(X_1, X_0) ≤ d(X_1, X_2), d(X_2, X_1) ≤ d(X_2, X_3), d(X_3, X_2) ≤ d(X_3, X_4), . . . , d(X_k, X_{k−1}) ≤ d(X_k, X_0) → d(X_0, X_1) ≤ d(X_0, X_k), and, by symmetry, d(X_0, X_1) ≤ d(X_1, X_2) ≤ . . . ≤ d(X_0, X_k) → d(X_0, X_1) ≤ d(X_0, X_k), i.e. transitivity, or to absence of loops involving <.
We give a very short outline of the proof: Define ‖A, B‖ ≤ ‖A, B′‖ iff (A | (B ∪ B′)) ∩ B ≠ ∅. Extend the transitive closure of ≤ to a suitable total preorder, and consider its equivalence classes. The restriction to singletons defines a suitable distance.
This result translates into logic in the following way: We consider the following conditions for a revision function ∗ defined for arbitrary consistent theories on both sides. This is thus a slight extension of the AGM framework, as AGM work with formulas only on the right of ∗.
CONDITION 45.
(*0) If |= T ↔ S, |= T′ ↔ S′, then T ∗ T′ = S ∗ S′,
(*1) T ∗ T′ is a consistent, deductively closed theory,
(*2) T′ ⊆ T ∗ T′,
(*3) If T ∪ T′ is consistent, then T ∗ T′ = T ∪ T′,
(*S1) Con(T_0, T_1 ∗ (T_0 ∨ T_2)), Con(T_1, T_2 ∗ (T_1 ∨ T_3)), Con(T_2, T_3 ∗ (T_2 ∨ T_4)), . . . , Con(T_{k−1}, T_k ∗ (T_{k−1} ∨ T_0)) imply Con(T_1, T_0 ∗ (T_k ∨ T_1)).
We finally have the following representation result for the symmetric case:
PROPOSITION 46. Let L be a propositional language.
(a) A revision operation ∗ is representable by a symmetric consistency and definability preserving pseudo-distance iff ∗ satisfies (*0)–(*2), (*S1).
(b) A revision operation ∗ is representable by a symmetric consistency and definability preserving, identity respecting pseudo-distance iff ∗ satisfies (*0)–(*3), (*S1).
Definability preservation is again not innocent. The following Example 47 (1) shows that, in general, a revision operation defined on models via a pseudo-distance by T ∗ T′ := Th(M(T) |d M(T′)) will not satisfy (*S1), unless we require |d to preserve definability. But this is not peculiar to our new condition (*S1); the same happens to the original AGM postulates, as essentially the same Example 47 (2) shows. To see this, we summarize the AGM postulates (K*7) and (K*8) in (*4):
(*4) If T ∗ T′ is consistent with T′′, then T ∗ (T′ ∪ T′′) = (T ∗ T′) ∪ T′′.
(*4) may fail in the general infinite case without definability preservation.
EXAMPLE 47. Consider an infinite propositional language L. Let X be an infinite set of models, m, m_1, m_2 be models for L. Arrange the models of L in the real plane s.t. all x ∈ X have the same distance < 2 (in the real plane) from m, m_2 has distance 2 from m, and m_1 has distance 3 from m. Let T, T_1, T_2 be complete (consistent) theories, T′ a theory with infinitely many models, M(T) = {m}, M(T_1) = {m_1}, M(T_2) = {m_2}. The two variants diverge now slightly:
(1) M(T′) = X ∪ {m_1}. T, T′, T_2 will be pairwise inconsistent.
(2) M(T′) = X ∪ {m_1, m_2}, M(T′′) = {m_1, m_2}.
Assume in both cases Th(X) = T′, so X will not be definable by a theory.
Now for the results: Then M(T) | M(T′) = X, but T ∗ T′ = Th(X) = T′.
(1) We easily verify Con(T, T_2 ∗ (T ∨ T)), Con(T_2, T ∗ (T_2 ∨ T_1)), Con(T, T_1 ∗ (T ∨ T)), Con(T_1, T ∗ (T_1 ∨ T′)), Con(T, T′ ∗ (T ∨ T)), and conclude by Loop (i.e. (*S1)) Con(T_2, T ∗ (T′ ∨ T_2)), which is wrong.
(2) So T ∗ T′ is consistent with T′′, and (T ∗ T′) ∪ T′′ = T′′. But T′ ∪ T′′ = T′′, and T ∗ (T′ ∪ T′′) = T_2 ≠ T′′, contradicting (*4).
Absence of finite characterization for distance based revision
We conclude this section with a proof that distance based revision has no finite characterization of the usual form. This part belongs half way into the next section, so it is fitting to conclude the present one with this observation. We begin with a seemingly innocent observation which will, however, lead to a strong negative result: there is no finite characterization of distance based revision possible.
Note that even when the pseudo-distance is a real distance, the resulting revision operator |d does not always permit us to reconstruct the relations between the distances: revision is a coarse instrument to investigate distances. Distances with common start (or end, by symmetry) can always be compared by looking at the result of revision: a |d {b, b′} = b iff d(a, b) < d(a, b′), a |d {b, b′} = b′ iff d(a, b) > d(a, b′), a |d {b, b′} = {b, b′} iff d(a, b) = d(a, b′). This is not the case with arbitrary distances d(x, y) and d(a, b), as the following example will show.
EXAMPLE 48. We work in the real plane, with the standard distance, the angles have 120 degrees. a′ is closer to y than x is to y, a is closer to b than x is to y, but a′ is farther away from b′ than x is from y. Similarly for b, b′. But we cannot distinguish the situation {a, b, x, y} and the situation {a′, b′, x, y} through |d. (See Figure 2.) This is easily (and tediously) seen by examining all cases. Thus, the forest can hide the trees: closer points hide those further away. This phenomenon has a positive side, too: we can unify local distance semantics for counterfactual conditionals into one global metric, as was shown in [Schlechta and Makinson, 1994]. In this case, everything which is hidden behind the first elements cannot interfere any more, so a uniform construction is possible. For details, the reader is referred there or to [Schlechta, 2004].
We show now that no finite normal characterization of distance defined revision is possible. We work on the algebraic side. The crucial example (Example 49) can be chosen arbitrarily big. We take care that the important revision results are isolated, i.e. that they have no repercussion on other results. For this purpose, we use the property that closer elements hide those farther away, as we just saw in Example 48. As a result, we obtain structures which are trivially not distance definable, but changing just one “bit” of information makes them distance definable. Consequently, in the limit, the amount of information distinguishing the representable and the non-representable case becomes arbitrarily small, and we need arbitrarily much information to describe the situation. This is made formal in Proposition 50. We will have to modify the general framework described above a little, but the main idea is the same. First, we recall the positive result.
Figure 2. (Diagram of the point configurations {a, b, x, y} and {a′, b′, x, y} of Example 48.)
We have characterized revision representable by distance. The crucial condition was a loop condition, of the type: if d(a_1, b_1) ≤ d(a_2, b_2) ≤ . . . ≤ d(a_n, b_n), then d(a_1, b_1) ≤ d(a_n, b_n), where n is arbitrarily big (but finite). There are no really better, i.e. finite, conditions, and we can prove it. We construct a class of examples, which provides for all n ∈ ω a Y(n) which is not representable by distances, and for all a_1, . . . , a_n in Y(n) a structure X which is representable by distances and agrees with Y(n) on a_1, . . . , a_n. We call the distance representable examples “legal”, and the other ones “illegal”. For didactic reasons, we develop the construction from the end. The problem is to construct revision formalisms which can be transformed by a very minor change from an illegal to a legal case. Thus, the problem is to construct legal examples sufficiently close to illegal ones, and, for this, we have to define a distance. We will construct legal structures where the crucial property (freedom from loops) can be isolated from the rest of the information by a suitable choice of distances. It will then suffice to change just one bit of information to obtain an illegal example, which can be transformed back to a legal one by changing again one bit of the important information (not necessarily the same one).
EXAMPLE 49 (Hamster wheels). Fix n sufficiently big (n > 4 or the like will do). d will be the (symmetrical) distance to be defined now. Take {a_i : 1 ≤ i ≤ n} ∪ {b_i : 1 ≤ i ≤ n} and join them (separately) in two “wheels”, s.t. a_1 is between a_n and a_2, etc.
Let d(a_i, a_j) := d(b_i, b_j) := 1 for any i ≠ j, and d(x, x) = 0 for all x. Call b_i the opposite of a_i, b_{i−1} and b_{i+1} (all modulo n) the 1-opposites of a_i, and b_{i−2} and b_{i+2} the 2-opposites of a_i, etc. Let d(a_i, b_j) := 1.9 if b_j is a 1-opposite of a_i, and d(a_i, b_j) := 1.1 if b_j is a k-opposite of a_i for k > 1. Choose d(a_i, b_i) ∈ [1.2, 1.8] arbitrarily. We call these the “choices”.
Look now at A | B (the set of closest elements in B, seen from A). We show that almost all A, B give the same results, independent of the choices of d(a_i, b_i). The case A ∩ B ≠ ∅ is trivial. If there is some a_i ∈ A, and some a_j ∈ B, then A | B will contain all a_j ∈ B. Likewise for b_i, b_j. In these cases, the distance 1 makes all other cases invisible. Let now A = {a_i : i ∈ I}, and B = {b_j : j ∈ J}. (A = {b_i . . .}, etc. is symmetrical.)
Case 1: A = {a_i}. Then in all choices the k-opposites, k > 1, have precedence over the opposite, which has precedence over the 1-opposites; the result does not depend on the choices.
Case 2: A contains at least three a_i. Assume that B contains at least two b_j. If not, we are in Case 1. In this case, one b_j is a k-opposite, k > 1, and this decides, independent of the choices of the d(a_i, b_i).
Case 3: A = {a_i, a_j}. By Cases 1 and 2 and symmetry, the only interesting case is where B = {b_l, b_m}. If j ≠ i + 1, then b_l or b_m is a k-opposite, k > 1, and the outcome is the same for all choices.
So, finally, all revision information which allows us to differentiate between the different choices is of the type {a_i, a_{i+1}} | {b_i, b_{i+1}}, and it does so: e.g. {a_i, a_{i+1}} | {b_i, b_{i+1}} = {b_i} iff d(a_i, b_i) < d(a_{i+1}, b_{i+1}). But, to see whether we have a legal situation, e.g. of the type d(a_i, b_i) = d(a_j, b_j) for all i, j, or an illegal one of the type d(a_1, b_1) < d(a_2, b_2) < . . . < d(a_n, b_n) < d(a_1, b_1), which cannot be represented by a distance, we need the whole chain of n pieces of information.
This is easy, just construct a legal case for any smaller set of information. More precisely, define a revision operator | as above for all but the crucial sets. The construction indicates how to define a distance which generates these results. For the illegal case, add now a loop, by working with the crucial cases. This operator cannot be generated by a distance. But omitting one step of the loop results in a structure which is distance definable. As we took care to isolate the crucial cases from the rest, the other results stay unchanged. Consequently, all sufficiently small formulas (below the upper bound) are valid (or not) in both cases. We make this formal in the following Proposition 50. PROPOSITION 50. No small (finite) normal characterization of distance representable revision is possible.
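The isolation property of the hamster wheels of Example 49 can be observed directly for a small n. The following sketch makes several assumptions that are not in the text: n = 5, indices run from 0 to n − 1, only pairs with A inside the a-wheel and B inside the b-wheel are enumerated (the mixed cases are decided by the distance 1), and all function names are hypothetical.

```python
from itertools import combinations


def make_distance(n, choices):
    """Distance of Example 49; choices[i] plays the role of d(a_i, b_i) in [1.2, 1.8]."""
    def d(p, q):
        if p == q:
            return 0.0
        (wheel_p, i), (wheel_q, j) = p, q
        if wheel_p == wheel_q:
            return 1.0
        k = min((i - j) % n, (j - i) % n)  # b_j is the k-opposite of a_i
        if k == 0:
            return choices[i]
        return 1.9 if k == 1 else 1.1
    return d


def revise(A, B, d):
    """A | B: the elements of B closest to A."""
    best = min(d(a, b) for a in A for b in B)
    return frozenset(b for b in B if any(d(a, b) == best for a in A))


def nonempty_subsets(points):
    return [
        frozenset(c)
        for r in range(1, len(points) + 1)
        for c in combinations(points, r)
    ]


def differing_pairs(n, choices_1, choices_2):
    """Pairs (A, B), A in the a-wheel and B in the b-wheel, where the two choices disagree."""
    d1, d2 = make_distance(n, choices_1), make_distance(n, choices_2)
    a_sets = nonempty_subsets([("a", i) for i in range(n)])
    b_sets = nonempty_subsets([("b", i) for i in range(n)])
    return {
        (A, B)
        for A in a_sets
        for B in b_sets
        if revise(A, B, d1) != revise(A, B, d2)
    }


def crucial_pairs(n):
    """The pairs ({a_i, a_{i+1}}, {b_i, b_{i+1}}), indices modulo n."""
    return {
        (
            frozenset({("a", i), ("a", (i + 1) % n)}),
            frozenset({("b", i), ("b", (i + 1) % n)}),
        )
        for i in range(n)
    }


legal = [1.5] * 5                     # all choices equal
other = [1.2, 1.3, 1.4, 1.5, 1.6]     # strictly increasing choices
print(differing_pairs(5, legal, other) == crucial_pairs(5))
```

Under these assumptions the two choice vectors disagree exactly on the n crucial pairs, which is the isolation used in the argument: every other revision result is fixed by the distances 1, 1.1 and 1.9.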
3 ADVANCED TOPICS
3.1 Introduction
This section is addressed primarily to the advanced reader, who wants to see more subtle problems, and some techniques used in the field. Of course, there will be many gaps — due to limited space — but we hope these pages can nonetheless give a first impression, and, perhaps, attract the reader to pursue and do his own research in the field. The material is taken from [Schlechta, 2004] (partly also from earlier work of the author), and the interested reader is referred there for details and discussion of the general framework.
3.2 Proof techniques for preferential structures
Introduction
We show here some proof techniques for preferential structures, as they allow us to see essential properties of and ways to handle such structures. We begin with the basic construction, used for general preferential structures, show a generic way to modify it for transitive structures, and then turn to smooth structures. For the latter, it is important to avoid “coming back”; this is coded by the set H(U), a hull around U, and its properties, summarized in Fact 60. Once we have these properties, it is again straightforward, and almost administrative, work to transform the function µ into a representing smooth structure.
General preferential structures
We describe first the basic construction technique for unrestricted preferential structures with copies. The main idea is to use, for given x, functions f which choose for any Y s.t. x ∈ Y − µ(Y) an element y ∈ Y, with the intention to minimize x (more precisely, the copy <x, f>) in Y by this y. This will become clearer in a moment.
DEFINITION 51. Let 𝒴_x := {Y ∈ 𝒴: x ∈ Y − µ(Y)}, and Π_x := Π𝒴_x (recall that ΠX is the cartesian product of X).
The following Claim 52 is the core of the completeness proof. It is a direct consequence of property (µPR), and gives the most general construction possible (apart from the relation, where minimizing by one copy would suffice, and we will modify it here to obtain transitivity), as we do not say more than we are forced to say: If x is not minimal in X, it will be minimized by X, but we do not a priori know by which x′, or set of x′ in X, and we do not say more. We have
CLAIM 52. Let µ : 𝒴 → 𝒴 satisfy (µ ⊆) and (µPR), and let U ∈ 𝒴. Then x ∈ µ(U) ↔ x ∈ U ∧ ∃f ∈ Π_x.ran(f) ∩ U = ∅.
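Claim 52 can be checked exhaustively on a small finite domain. The sketch below is an illustration under assumptions: the domain is the full powerset of a three-element universe, µ is generated by a small (non-transitive) preference relation, and `choice_ranges`, `claim_52_holds`, `mu_pref` and `mu_bad` are hypothetical names. The function `mu_bad` is an invented example that satisfies (µ ⊆) but violates (µPR).

```python
from itertools import combinations, product


def powerset(universe):
    items = sorted(universe)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


def choice_ranges(x, domain, mu):
    """All sets ran(f) for f in Pi_x, the product over {Y : x in Y - mu(Y)}."""
    y_x = [sorted(Y) for Y in domain if x in Y and x not in mu(Y)]
    # product() of no factors yields one empty choice, matching the empty function.
    return {frozenset(choice) for choice in product(*y_x)}


def claim_52_holds(universe, domain, mu):
    """Check: x in mu(U) iff x in U and some f in Pi_x has ran(f) disjoint from U."""
    for U in domain:
        for x in universe:
            lhs = x in mu(U)
            rhs = x in U and any(
                not (r & U) for r in choice_ranges(x, domain, mu)
            )
            if lhs != rhs:
                return False
    return True


UNIVERSE = {"a", "b", "c"}
DOMAIN = powerset(UNIVERSE)
BELOW = {"a": {"b"}, "b": {"c"}, "c": set()}  # b ≺ a and c ≺ b, not transitive


def mu_pref(U):
    """Minimal elements of U under the relation BELOW."""
    return frozenset(x for x in U if not (BELOW[x] & U))


def mu_bad(U):
    """Invented function violating (µPR): b is minimal in {a, b, c} but not in {a, b}."""
    return frozenset({"a"}) if U == frozenset({"a", "b"}) else frozenset(U)


print(claim_52_holds(UNIVERSE, DOMAIN, mu_pref))
print(claim_52_holds(UNIVERSE, DOMAIN, mu_bad))
```

The second call fails the equivalence, which illustrates the role (µPR) plays in the claim.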
Proof. We give only the main argument, case 𝒴_x ≠ ∅:
“→”: Let x ∈ µ(U) ⊆ U. It suffices to show Y ∈ 𝒴_x → Y − U ≠ ∅. But if Y ⊆ U and Y ∈ 𝒴_x, then x ∈ Y − µ(Y), contradicting (µPR).
“←”: If x ∈ U − µ(U), then U ∈ 𝒴_x, so ∀f ∈ Π_x.ran(f) ∩ U ≠ ∅.
Note the decisive role (µPR) plays here. We define the preferential structure as follows: Let 𝒳 := {<x, f> : x ∈ Z ∧ f ∈ Π_x}, and <x′, f′> ≺ <x, f> :↔ x′ ∈ ran(f). Let 𝒵 := <𝒳, ≺>.
CLAIM 53. For U ∈ 𝒴 holds: µ(U) = µ_𝒵(U).
Proof. By Claim 52, it suffices to show that for all U ∈ 𝒴: x ∈ µ_𝒵(U) ↔ x ∈ U and ∃f ∈ Π_x.ran(f) ∩ U = ∅. So let U ∈ 𝒴.
“→”: If x ∈ µ_𝒵(U), then there is <x, f> minimal in 𝒳⌈U (recall from Definition 3 that 𝒳⌈U := {<x, i> ∈ 𝒳 : x ∈ U}), so x ∈ U, and there is no <x′, f′> ≺ <x, f>, x′ ∈ U, so by Π_x′ ≠ ∅ there is no x′ ∈ ran(f), x′ ∈ U, but then ran(f) ∩ U = ∅.
“←”: If x ∈ U, and there is f ∈ Π_x, ran(f) ∩ U = ∅, then <x, f> is minimal in 𝒳⌈U.
We finally have:
PROPOSITION 54. An operation µ : 𝒴 → 𝒴 is representable by a preferential structure iff µ satisfies (µ ⊆) and (µPR).
Transitive preferential structures
We turn to transitivity. If we look at a transitive relation, say we begin with a, then b ≺ a, c ≺ a, we then continue d ≺ b, e ≺ b etc., we build a tree (as branches may join again, it might be a more general graph, but this does not matter, we can think in trees). For transitivity, finite chains suffice, so we build trees of height ≤ ω. The trees contain the direct or indirect successors of the root. Consider now b ≺ a. The tree corresponding to b is just the subtree of a’s tree, beginning at b, and vice versa, these subtrees give us the direct successors of a. We use this fact to refine the construction of the relation, to have better control over successors. Our construction avoids a certain excess in the relation ≺ of the above construction: There, too many elements <y, g> are smaller than some <x, f>, as the relation is independent of g. This excess prevents transitivity.
As it suffices to make one copy of the successor smaller than the element to be minimized, we restrict the relation, using our trees. We can use the element itself to minimize it. This is made precise by the use of the trees t_{f_x} for a given element x and choice function f_x. The trees t_{f_x} are constructed as follows: The root is x, the first branching is done according to f_x, and then we continue with constant choice. Let, e.g., x′ ∈ ran(f_x); we can now always choose x′, as it will be a legal successor of x′ itself, being present in all X′ s.t. x′ ∈ X′ − µ(X′). So we have a tree which branches once, directly above the root, and is then constant without branching. Obviously, this is essentially equivalent to the old construction in the not necessarily transitive case. This shows two things: first, the construction with trees gives the same µ as the old construction with simple choice functions. Second, even if we consider successors of successors, nothing changes: we are still with the old x′. Consequently, considering the transitive closure will not change matters: an element <x, t_{f_x}> will be minimized by its direct successors iff it will be minimized by direct and indirect successors. If you like, the trees t_{f_x} are the mathematical construction expressing the intuition that we know so little about minimization that we have to consider suicide a serious possibility; this is the intuitive reason why transitivity imposes no new conditions. We make this precise in the following
CONSTRUCTION 55.
(1) For x ∈ Z, let T_x be the set of trees t_x s.t. (a) all nodes are elements of Z, (b) the root of t_x is x, (c) height(t_x) ≤ ω, (d) if y is an element in t_x, then there is f ∈ Π_y := Π{Y ∈ 𝒴: y ∈ Y − µ(Y)} s.t. the set of children of y is ran(f).
(2) For x, y ∈ Z, t_x ∈ T_x, t_y ∈ T_y, set t_x ▷ t_y iff y is a (direct) child of the root x in t_x, and t_y is the subtree of t_x beginning at y.
(3) Let 𝒵 := <{<x, t_x> : x ∈ Z, t_x ∈ T_x}, <x, t_x> ≻ <y, t_y> iff t_x ▷ t_y>.
CLAIM 56. ∀U ∈ 𝒴.µ(U) = µ_𝒵(U).
Proof. The proof is straightforward; it makes essential use of the special trees t_{f_x}. By Claim 52, it suffices to show that for all U ∈ 𝒴: x ∈ µ_𝒵(U) ↔ x ∈ U ∧ ∃f ∈ Π_x.ran(f) ∩ U = ∅. Fix U ∈ 𝒴.
“→”: If x ∈ µ_𝒵(U), then there exists <x, t_x> minimal in 𝒵⌈U, thus x ∈ U and there is no <y, t_y> ∈ 𝒵, <y, t_y> ≺ <x, t_x>, y ∈ U. Let f define the set of children of the root x in t_x. If ran(f) ∩ U ≠ ∅, if y ∈ U is a child of x in t_x, and if t_y is the subtree of t_x starting at y, then t_y ∈ T_y and <y, t_y> ≺ <x, t_x>, contradicting minimality of <x, t_x> in 𝒵⌈U. So ran(f) ∩ U = ∅.
“←”: Let x ∈ U. If 𝒴_x = ∅, then the tree x has no ▷-successors, and <x, x> is ≻-minimal in 𝒵. If 𝒴_x ≠ ∅ and f ∈ Π_x s.t. ran(f) ∩ U = ∅, then <x, t_{f_x}> is ≻-minimal in 𝒵⌈U.
We consider now the transitive closure of 𝒵. (Recall that ≺∗ denotes the transitive closure of ≺.) Claim 57 shows that transitivity does not destroy what we have achieved.
CLAIM 57. Let 𝒵′ := <{<x, t_x> : x ∈ Z, t_x ∈ T_x}, <x, t_x> ≻ <y, t_y> iff t_x ▷∗ t_y>. Then µ_𝒵 = µ_𝒵′.
Proof. Again, the t_{f_x} play a special role. Suppose there is U ∈ 𝒴, x ∈ U, x ∈ µ_𝒵(U), x ∉ µ_𝒵′(U). Then there must be an element <x, t_x> ∈ 𝒵 with no <x, t_x> ≻ <y, t_y> for any y ∈ U. Let f ∈ Π_x determine the set of children of x in t_x, then ran(f) ∩ U = ∅; consider t_{f_x}. As all elements ≠ x of t_{f_x} are already in ran(f), no element of t_{f_x} other than x is in U. Thus there is no <z, t_z> ≺∗ <x, t_{f_x}> in 𝒵 with z ∈ U, so <x, t_{f_x}> is minimal in 𝒵′⌈U, contradiction.
We thus have
PROPOSITION 58. An operation µ : 𝒴 → 𝒴 is representable by a transitive preferential structure iff µ satisfies (µ ⊆) and (µPR).
Smooth preferential structures
We turn to smooth structures and cumulativity. We assume now closure of the domain 𝒴 under finite unions and intersections. In the smooth case, we know that if x ∈ X − µ(X), then there must be x′ ≺ x, x′ ∈ µ(X) (or, more precisely, for each copy <x, i> of x, there must be such x′). Thus, the freedom of choice is smaller, and at first sight, the case seems simpler. The problem is to assure that obtaining minimization for x in X does not destroy smoothness elsewhere, or, if it does, we have to repair it. Recall that smoothness says that if some element is not minimal, then there is a minimal element below it; it does not exclude that there are nonminimal elements below it, it only imposes the existence of minimal elements below it. Thus, if, during construction, we put some nonminimal elements below some element, we can and have to repair this by putting a suitable minimal one below it. Of course, we have to take care that this repairing process does not destroy something else, or, we have to repair this again, etc., and have to assure at the same time that we do not alter the choice function.
The basic idea is thus as follows for some given x, and a copy <x, σ> to be constructed (<x, σ> will later be minimized by all elements in the ranges of the σ_i which constitute σ):
• First, we minimize x, where necessary, using the same idea of cartesian product as in the not necessarily smooth case, but this time choosing in µ(Y) for suitable Y: σ_0 ∈ Π{µ(Y) : x ∈ Y − µ(Y)}.
• This might have caused trouble: if X is such that x ∈ µ(X), and ran(σ_0) ∩ X ≠ ∅, we have destroyed minimality of the copy <x, σ> under construction in X, and have to put a new element minimal in this X below it, to preserve smoothness: σ_1 ∈ Π{µ(X) : x ∈ µ(X) and ran(σ_0) ∩ X ≠ ∅}.
• Again, we might have caused trouble, as we might have destroyed minimality in some X, this time by the new ran(σ_1), so we repeat the procedure for σ_1, and so on, infinitely often.
We then show that for each x and U with x ∈ µ(U) there is such <x, σ>, s.t. all ran(σ_i) have empty intersection with U; this guarantees minimality of x in U for some copy. As a matter of fact, we show a stronger property, that ran(σ_i) ∩ H(U) = ∅ for all σ_i, where H(U) is a sufficiently big “hull” around U. The existence of such special <x, σ> will also assure smoothness: Again, in an excess of relation, we make all copies, irrespective of the second coordinate, smaller than a given copy. Thus, if an element <y, τ> for y ∈ µ(Y) is not minimal in the constructed structure, the reason is that for some i ran(τ_i) ∩ Y ≠ ∅. This will be repaired in the next step i + 1, by putting some x minimal in Y below it, and as we do not look at the second coordinate, there will be a minimal copy of x, <x, σ>, below it.
The hull H(U) is defined as ⋃{X : µ(X) ⊆ U}. The motivation for this definition is that anything inside the hull will be “sucked” into U: any element in the hull will be minimized by some element in some µ(X) ⊆ U, and thus by U. More precisely, if u ∈ µ(U), but u ∈ X − µ(X), then there is x ∈ µ(X) − H(U). Consequently, to kill minimality of u in X, we can choose x ∈ µ(X) − H(U), x ≺ u, without interfering with u’s minimality in U. Moreover, if x ∈ Y − µ(Y), then, by x ∉ H(U), µ(Y) ⊄ H(U), so we can kill minimality of x in Y by choosing some y ∉ H(U). Thus, even in the transitive case, we can leave U to destroy minimality of u in some X, without ever having to come back into U; it suffices to choose sufficiently far from U, i.e. outside H(U). H(U) is the right notion of “neighborhood”. (It is easier to stay altogether out of H(U) in the inductive construction of σ than to avoid U directly, which we need for our minimal elements.) Note that H(U) need not be an element of the domain, which is not necessarily closed under arbitrary unions. But this does not matter, as H(U) will never appear as an argument of f.
Obviously, suitable properties of H(U) as shown in Fact 60 are crucial for the inductive construction of the σ used for minimal elements. Closure of the domain under finite unions is used in a crucial way in the proof of this Fact 60, which collects the main properties of H(U), to be defined now.
DEFINITION 59. Define H(U) := ⋃{X : µ(X) ⊆ U}.
FACT 60. Let A, U, U′, Y and all A_i be in 𝒴.
(µ ⊆) and (µPR) entail:
(1) A = ⋃{A_i : i ∈ I} → µ(A) ⊆ ⋃{µ(A_i) : i ∈ I},
(2) U ⊆ H(U), and U ⊆ U′ → H(U) ⊆ H(U′),
(3) µ(U ∪ Y) − H(U) ⊆ µ(Y).
(µ ⊆), (µPR), (µCUM) entail:
(4) U ⊆ A, µ(A) ⊆ H(U) → µ(A) ⊆ U,
(5) µ(Y) ⊆ H(U) → Y ⊆ H(U) and µ(U ∪ Y) = µ(U),
(6) x ∈ µ(U), x ∈ Y − µ(Y) → Y ⊄ H(U),
(7) Y ⊄ H(U) → µ(U ∪ Y) ⊄ H(U).
For a proof, see [Schlechta, 2004].
DEFINITION 61. For x ∈ Z, let W_x := {µ(Y): Y ∈ 𝒴 ∧ x ∈ Y − µ(Y)}, Γ_x := ΠW_x.
We have (slightly simplified)
CLAIM 62. Let U ∈ 𝒴. Then x ∈ µ(U) ↔ x ∈ U ∧ ∃f ∈ Γ_x.ran(f) ∩ H(U) = ∅.
This is a direct consequence of Fact 60 (6). We define the structure 𝒵: 𝒳 := {<x, g>: x ∈ K, g ∈ Γ_x}, <x′, g′> ≺ <x, g> :↔ x′ ∈ ran(g), 𝒵 := <𝒳, ≺>, and have
CLAIM 63. ∀U ∈ 𝒴.µ(U) = µ_𝒵(U).
This follows from Claim 62. The structure will not yet be smooth; we now construct the refined structure 𝒵′.
CONSTRUCTION 64 (Construction of 𝒵′). σ is called an x-admissible sequence iff
1. σ is a sequence of length ≤ ω, σ = {σ_i : i ∈ ω},
2. σ_0 ∈ Π{µ(Y): Y ∈ 𝒴 ∧ x ∈ Y − µ(Y)},
3. σ_{i+1} ∈ Π{µ(X): X ∈ 𝒴 ∧ x ∈ µ(X) ∧ ran(σ_i) ∩ X ≠ ∅}.
By 2., σ_0 minimizes x, and by 3., if x ∈ µ(X), and ran(σ_i) ∩ X ≠ ∅, i.e. we have destroyed minimality of x in X, x will be above some y minimal in X to preserve smoothness.
Let Σ_x be the set of x-admissible sequences, and for σ ∈ Σ_x let σ̂ := ⋃{ran(σ_i) : i ∈ ω}. Let 𝒳′ := {<x, σ> : x ∈ K ∧ σ ∈ Σ_x} and <x′, σ′> ≺′ <x, σ> :↔ x′ ∈ σ̂. Finally, let 𝒵′ := <𝒳′, ≺′>, and µ′ := µ_𝒵′.
It is now easy to show that 𝒵′ represents µ, and that 𝒵′ is smooth. For x ∈ µ(U), we construct a special x-admissible sequence σ^{x,U} using the properties of H(U) as described in Fact 60.
Assume x ∈ µ(U) (so x ∈ K), U ∈ 𝒴; we will construct a minimal σ, i.e. show that there is σ^{x,U} ∈ Σ_x s.t. σ̂^{x,U} ∩ U = ∅. We construct this σ^{x,U} inductively, with the stronger property that ran(σ_i^{x,U}) ∩ H(U) = ∅ for all i ∈ ω.
σ_0^{x,U}: x ∈ µ(U), x ∈ Y − µ(Y) → µ(Y) − H(U) ≠ ∅ by Fact 60, (6) + (5). Let σ_0^{x,U} ∈ Π{µ(Y) − H(U) : Y ∈ 𝒴, x ∈ Y − µ(Y)}, so ran(σ_0^{x,U}) ∩ H(U) = ∅.
σ_i^{x,U} → σ_{i+1}^{x,U}: By induction hypothesis, ran(σ_i^{x,U}) ∩ H(U) = ∅. Let X ∈ 𝒴 be s.t. x ∈ µ(X), ran(σ_i^{x,U}) ∩ X ≠ ∅. Thus X ⊄ H(U), so µ(U ∪ X) − H(U) ≠ ∅ by Fact 60, (7). Let σ_{i+1}^{x,U} ∈ Π{µ(U ∪ X) − H(U) : X ∈ 𝒴, x ∈ µ(X), ran(σ_i^{x,U}) ∩ X ≠ ∅}, so ran(σ_{i+1}^{x,U}) ∩ H(U) = ∅. As µ(U ∪ X) − H(U) ⊆ µ(X) by Fact 60, (3), the construction satisfies the x-admissibility condition.
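The hull H(U) of Definition 59 and several properties listed in Fact 60 can be explored on a small finite model. The following sketch rests on assumptions not made in the text: the structure has no copies, the relation is a strict partial order on four points (finite and transitive, hence smooth), and the domain is the full powerset; `mu`, `hull` and `check_fact_60` are hypothetical names.

```python
from itertools import combinations

UNIVERSE = (0, 1, 2, 3)
# BELOW[x] lists the elements strictly preferred to x (a transitive relation).
BELOW = {0: set(), 1: {0}, 2: {0}, 3: {0, 1}}
DOMAIN = [
    frozenset(c)
    for r in range(len(UNIVERSE) + 1)
    for c in combinations(UNIVERSE, r)
]


def mu(X):
    """Minimal elements of X."""
    return frozenset(x for x in X if not (BELOW[x] & X))


def hull(U):
    """H(U): the union of all X in the domain with mu(X) ⊆ U."""
    result = frozenset()
    for X in DOMAIN:
        if mu(X) <= U:
            result |= X
    return result


def check_fact_60():
    """Brute-force check of properties (2), (3), (5), (6), (7) of Fact 60."""
    for U in DOMAIN:
        H = hull(U)
        if not U <= H:                                    # (2), first half
            return False
        for Y in DOMAIN:
            if not (mu(U | Y) - H <= mu(Y)):              # (3)
                return False
            if mu(Y) <= H and not (Y <= H and mu(U | Y) == mu(U)):   # (5)
                return False
            if any(x in Y and x not in mu(Y) for x in mu(U)) and Y <= H:  # (6)
                return False
            if not Y <= H and mu(U | Y) <= H:             # (7)
                return False
    return True


print(sorted(hull(frozenset({1}))))
print(check_fact_60())
```

For U = {1} the hull is {1, 3}: the set {1, 3} has 1 as its only minimal element, so 3 is “sucked” into U, whereas 0 and 2 stay outside.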
CLAIM 65. For all U ∈ 𝒴 µ(U) = µ_𝒵(U) = µ′(U). 𝒵′ is 𝒴-smooth.
For its proof, we use the special sequences σ^{x,U}; see [Schlechta, 2004] for details. We summarize:
PROPOSITION 66. Let 𝒴 be closed under finite unions and finite intersections, and µ : 𝒴 → 𝒴. Then there is a 𝒴-smooth preferential structure 𝒵, s.t. for all X ∈ 𝒴 µ(X) = µ_𝒵(X) iff µ satisfies (µ ⊆), (µPR), (µCUM).
Smooth and transitive preferential structures
Recall that, in a certain way, it is not surprising that transitivity does not impose stronger conditions in the smooth case either. Smoothness is itself a weak kind of transitivity: If an element is not minimal, then there is a minimal element below it, i.e., x ≻ y with y not minimal is possible, so there might be z′ ≺ y, but then there is z minimal with x ≻ z. This is “almost” x ≻ z′, transitivity. To obtain representation, we combine the ideas of the smooth, but not necessarily transitive case with those of the general transitive case, as the reader will have suspected. Thus, we index again with trees, and work with (suitably adapted) admissible sequences for the construction of the trees. In the construction of the admissible sequences, we were careful to repair all damage done in previous steps. We have to add now repair of all damage done by using transitivity, i.e., the transitivity of the relation might destroy minimality, and we have to construct minimal elements below all elements for which we thus destroyed minimality. Both cases are combined by considering immediately all Y s.t. x ∈ Y − H(U). The properties described in Fact 60 play again a central role. The main part of the argument is in the following construction, and we refer the reader to [Schlechta, 2004] for more details and the rest of the proof.
CONSTRUCTION 67.
(A)
The set T_x of trees t for fixed x:
(1) Construction of the set Tµ_x of trees for those sets U ∈ 𝒴, where x ∈ µ(U): Let U ∈ 𝒴, x ∈ µ(U). The trees t_{U,x} ∈ Tµ_x are constructed inductively, observing simultaneously: If <U_{n+1}, x_{n+1}> is a child of <U_n, x_n>, then (a) x_{n+1} ∈ µ(U_{n+1}) − H(U_n), and (b) U_n ⊆ U_{n+1}.
Set U_0 := U, x_0 := x.
Level 0: <U_0, x_0>.
Level n → n + 1: Let <U_n, x_n> be in level n. Suppose Y_{n+1} ∈ 𝒴, x_n ∈ Y_{n+1}, and Y_{n+1} ⊄ H(U_n). Note that µ(U_n ∪ Y_{n+1}) − H(U_n) ≠ ∅ by Fact 60, (7), and µ(U_n ∪ Y_{n+1}) − H(U_n) ⊆ µ(Y_{n+1}) by Fact 60, (3). Choose f_{n+1} ∈ Π{µ(U_n ∪ Y_{n+1}) − H(U_n) : Y_{n+1} ∈ 𝒴, x_n ∈ Y_{n+1} ⊄ H(U_n)} (for the construction of this tree, at this element), and let the set of children of <U_n, x_n> be {<U_n ∪ Y_{n+1}, f_{n+1}(Y_{n+1})> : Y_{n+1} ∈ 𝒴, x_n ∈ Y_{n+1} ⊄ H(U_n)}. (If there is no such Y_{n+1}, <U_n, x_n> has no children.) Obviously, (a) and (b) hold. We call such trees U, x-trees.
(2) Construction of the set T′_x of trees for the nonminimal elements. Let x ∈ Z. Construct the tree t_x as follows (here, one tree per x suffices for all U):
Level 0: <∅, x>.
Level 1: Choose arbitrary f ∈ Π{µ(U) : x ∈ U ∈ 𝒴}. Let {<U, f(U)> : x ∈ U ∈ 𝒴} be the set of children of <∅, x>. This assures that the element will be nonminimal.
Level > 1: Let <U, f(U)> be an element of level 1; as f(U) ∈ µ(U), there is a t_{U,f(U)} ∈ Tµ_{f(U)}. Graft one of these trees t_{U,f(U)} ∈ Tµ_{f(U)} at <U, f(U)> on the level 1. This assures that a minimal element will be below it to guarantee smoothness.
Finally, let T_x := Tµ_x ∪ T′_x.
(B) The relation ▷ between trees: For x, y ∈ Z, t ∈ T_x, t′ ∈ T_y, set t ▷ t′ iff for some Y, <Y, y> is a child of the root <X, x> in t, and t′ is the subtree of t beginning at this <Y, y>.
(C) The structure 𝒵: Let 𝒵 := <{<x, t_x> : x ∈ Z, t_x ∈ T_x}, <x, t_x> ≻ <y, t_y> iff t_x ▷∗ t_y>.
The rest of the proof then consists of simple observations.
3.3 The importance of domain closure
The attempt to characterize a sequent calculus by a smooth structure gave the author the first hint of the importance of domain closure properties (here under finite unions), and an incentive to look for stronger conditions than Cumulativity to obtain representation by smooth structures. We briefly introduce one such system (for another with similar representation problems, see Arieli and Avron [2000]), redefine preferential structures for such systems and give an example which shows failure of representation by smooth structures. We then indicate how to mend the representation proof for smooth structures as discussed above (which used closure of the domain under finite unions) by a suitable adaptation of H(U), which will be replaced by H(U, x). The presence of the second parameter, x, seems necessary.
It is a matter of ongoing research to characterize transitive smooth structures without closure of the domain under finite unions; our proof of the transitive case did use unions in a crucial way.
Plausibility logic
Plausibility logic was introduced by D. Lehmann [1992a; 1992b] as a sequent calculus in a propositional language without connectives. Thus, a plausibility logic language L is just a set, whose elements correspond to propositional variables, and a sequent has the form X |∼ Y, where X, Y are finite subsets of L; intuitively, X |∼ Y means ⋀X |∼ ⋁Y. Due to its simple language, we have no “or” on the left hand side, so the domain of definable sets is not necessarily closed under finite unions, and this has important repercussions on representation proofs and results, as we will see now. The reader interested in motivation is referred to the original articles [Lehmann, 1992a; Lehmann, 1992b].
We abuse notation, and write X |∼ a for X |∼ {a}, X, a |∼ Y for X ∪ {a} |∼ Y, ab |∼ Y for {a, b} |∼ Y, etc. When discussing plausibility logic, X, Y, etc. will denote finite subsets of L, a, b, etc. elements of L.
DEFINITION 68. X and Y will be finite subsets of L, a, etc. elements of L. The base axiom and rules of plausibility logic are (we use the prefix “Pl” to differentiate them from the usual ones):
(PlI) (Inclusion): X |∼ a for all a ∈ X,
(PlRM) (Right Monotony): X |∼ Y ⇒ X |∼ a, Y,
(PlCLM) (Cautious Left Monotony): X |∼ a, X |∼ Y ⇒ X, a |∼ Y,
(PlCC) (Cautious Cut): X, a_1 . . . a_n |∼ Y, and for all 1 ≤ i ≤ n X |∼ a_i, Y ⇒ X |∼ Y.
We now adapt the definition of a preferential model to plausibility logic. This is the central definition on the semantic side.
DEFINITION 69. A model for a plausibility logic language L is just an arbitrary subset of L. If ℳ := ⟨M, ≺⟩ is a preferential model s.t. M is a set of (indexed) L-models, then for a finite set X ⊆ L (to be imagined on the left hand side of |∼!), we define
(a) m |= X iff X ⊆ m,
(b) M(X) := {m : ⟨m, i⟩ ∈ M for some i and m |= X},
(c) µ(X) := {m ∈ M(X) : ∃⟨m, i⟩ ∈ M. ¬∃⟨m′, i′⟩ ∈ M (m′ ∈ M(X) ∧ ⟨m′, i′⟩ ≺ ⟨m, i⟩)},
(d) X |=ℳ Y iff ∀m ∈ µ(X). m ∩ Y ≠ ∅.
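Definition 69 can be made concrete on a finite example. The following is a minimal sketch, not the author's program: models are frozensets of atoms, copies are pairs (m, i), and the names `models_of`, `mu`, `entails` and the two-model structure are illustrative assumptions.

```python
# Illustrative sketch only: all names and the example structure are assumptions.

def models_of(X, M):
    """M(X): all L-models m that have a copy (m, i) in M and satisfy X ⊆ m."""
    return {m for (m, _i) in M if X <= m}

def mu(X, M, prec):
    """µ(X): models m of X with some copy (m, i) that no copy of a model of X lies below."""
    MX = models_of(X, M)
    return {
        m
        for (m, i) in M
        if m in MX
        and not any(((m2, j), (m, i)) in prec for (m2, j) in M if m2 in MX)
    }

def entails(X, Y, M, prec):
    """X |= Y: every minimal model of X contains at least one element of Y."""
    return all(m & Y for m in mu(X, M, prec))

# Two models with one copy each; prec holds pairs (smaller, bigger).
m1 = frozenset({"a", "b"})
m2 = frozenset({"a", "c"})
M = {(m1, 0), (m2, 0)}
prec = {((m1, 0), (m2, 0))}  # (m1, 0) ≺ (m2, 0)
```

In this toy structure a |∼ b holds while a, c |∼ b fails, so the relation is not monotonic on the left.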
(a) reflects the intuitive reading of X as ⋀X, and (d) that of Y as ⋁Y in X |∼ Y. Note that X is a set of “formulas”, and µ(X) = µℳ(M(X)). It is easy to see:
PROPOSITION 70. (PlI) + (PlRM) + (PlCC) is complete (and sound) for preferential models.
We refer the reader to [Schlechta, 2004] or [Schlechta, 1996] for details, as this is not central to our argument here.
Incompleteness of plausibility logic for smooth structures
We note the following fact for smooth preferential models:
FACT 71. Let U, X, Y be any sets, M be smooth for at least {Y, X} and let µ(Y) ⊆ U ∪ X, µ(X) ⊆ U, then X ∩ Y ∩ µ(U) ⊆ µ(Y).
This is easy to see by drawing a little diagram. Consider now:
EXAMPLE 72. Let L := {a, b, c, d, e, f}, and X := {a |∼ b, b |∼ a, a |∼ c, a |∼ fd, dc |∼ ba, dc |∼ e, fcba |∼ e}. Then X does not entail a |∼ e (the verification is tedious, and was first done by the author using a small computer program). This can be used to show that the condition in Fact 71 above fails.
Discussion and remedy
Our new conditions take care of the “semi-transitivity” of smoothness, coding it directly and not by a simple condition which uses finite unions. For this purpose, we modify the definition of H(U), and replace it by H(U, x), which now depends on U and on x. Research currently under way underlines the necessity to do so.
DEFINITION 73. Definition of H(U, x):
H(U, x)_0 := U,
H(U, x)_{i+1} := H(U, x)_i ∪ ⋃{U′ : x ∈ µ(U′), µ(U′) ⊆ H(U, x)_i}.
We take unions at limits.
H(U, x) := ⋃{H(U, x)_i : i < κ} for κ sufficiently big.
(HU) is the property: x ∈ µ(U), x ∈ Y − µ(Y) → µ(Y) ⊈ H(U, x).
We then have:
FACT 74. (1) x ∈ µ(Y), µ(Y) ⊆ H(U, x) → Y ⊆ H(U, x), (2) (HU) holds in all smooth models.
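On a finite domain, the iteration in Definition 73 stabilizes after finitely many steps, so H(U, x) can be computed as a fixpoint. The sketch below assumes that µ is given as a dictionary on a finite family of sets; the function name `hull_H` and the three-set example are hypothetical.

```python
# Sketch under assumptions: finite domain, mu given as a dict from sets to sets.

def hull_H(U, x, domain, mu):
    """Compute H(U, x): start with U and keep adding every U2 in the domain
    such that x ∈ µ(U2) and µ(U2) already lies inside the current stage."""
    H = set(U)
    changed = True
    while changed:
        changed = False
        for V in domain:
            if x in mu[V] and mu[V] <= H and not V <= H:
                H |= V
                changed = True
    return frozenset(H)

U = frozenset({1})
V = frozenset({1, 2})
W = frozenset({2, 3})
domain = [U, V, W]
mu = {U: frozenset({1}), V: frozenset({1}), W: frozenset({2})}
```

Here `hull_H(U, 1, domain, mu)` absorbs V, because 1 is minimal in V and µ(V) ⊆ U, but not W, because 1 is not in µ(W). This is where the second parameter x makes a difference.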
This suffices for the construction: We patch the proof of the smooth case in Section 3.2 a little. H(U ) is replaced by H(U, x), Fact 60 is replaced by above Fact 74 (seen as a condition), and we avoid unions. For lack of space, the reader is referred to [Schlechta, 2004].
3.4
The limit version
Introduction
The limit version is a natural extension of the minimal version. The basic motivation and idea are as follows: X may be nonempty while µ(X) is empty, because there are no optimal (minimal) elements: they get smaller and smaller, but there is no smallest one, just as an open interval of the reals has no smallest element. In this case, the minimal variant collapses, as we can deduce everything by quantifying over the empty set of models. It is therefore natural to define T |∼ φ iff “from a certain point onward” φ holds. We make this precise by “minimizing initial segments”, or MISE: φ has to hold in a MISE A of M(T) (and no longer in µ(T)), which is a subset of M(T) with the following properties: (1) every model m of T is either in A, or there is m′ ≺ m, m′ ∈ A, and (2) A is downward closed in M(T), i.e., if m ∈ A, m′ ∈ M(T), m′ ≺ m, then m′ ∈ A. This is the natural definition, corresponding to the other uses of the word “limit”: finally, φ has to hold, i.e. below every element there must be one where it holds, and it should not become false again. The definition can be simplified in the case of ranked structures to “all layers from a certain degree onward”. For distance based theory revision, we have to modify a little, and consider all m which have a global distance smaller than a given value; this modification is straightforward, and avoids similar problems when X, Y ≠ ∅, but X | Y = ∅.
Preferential structures
Our main results are that, in an important class of examples, the limit version is equivalent to the minimal version. This holds for transitive structures in the limit interpretation, where
• either the set of definable closed minimizing sets is cofinal (see below), or
• we consider only formulas on the left of |∼.
We show that both satisfy the laws of the minimal variant, so the generated logics can be represented by a minimal preferential structure (but, of course, perhaps with a different relation).
We begin by a modification of the use of the preferential relation:
DEFINITION 75.
(1) The version without copies: Let ℳ := ⟨U, ≺⟩. Define for Y ⊆ X ⊆ U: Y is a minimizing initial segment, or MISE, of X iff:
(a) ∀x ∈ X ∃y ∈ Y. y ⪯ x, where y ⪯ x stands for y ≺ x or y = x, and
(b) ∀y ∈ Y, ∀x ∈ X (x ≺ y → x ∈ Y).
(2) The version with copies: Let ℳ := ⟨U, ≺⟩ be as above. Define for Y ⊆ X ⊆ U: Y is a minimizing initial segment, or MISE, of X iff:
(a) ∀⟨x, i⟩ ∈ X ∃⟨y, j⟩ ∈ Y. ⟨y, j⟩ ⪯ ⟨x, i⟩, and
(b) ∀⟨y, j⟩ ∈ Y, ∀⟨x, i⟩ ∈ X (⟨x, i⟩ ≺ ⟨y, j⟩ → ⟨x, i⟩ ∈ Y).
(3) Finally, we say that a set 𝒳 of MISE is cofinal in another set of MISE 𝒳′ (for the same base set X) iff for all Y′ ∈ 𝒳′, there is Y ∈ 𝒳, Y ⊆ Y′.
In the case of ranked structures (see above Definition 10), we may assume without loss of generality that the MISE sets have a particularly simple form: Given a ranked structure, let for X ⊆ U
Λ(X) := {A ⊆ X : ∀x ∈ X ∃a ∈ A (a ≺ x or a = x) ∧ ∀a ∈ A ∀x ∈ X (x ≺ a ∨ x⊥a → x ∈ A)}
(A minimizes X and is downward and horizontally closed.) Λ(X) is thus wlog. the set of MISE for X. Strictly speaking, we have to index Λ by ≺, but when the context is clear, we omit it. A MISE X is called definable iff {x : ∃⟨x, i⟩ ∈ X} ∈ D_L. We define on the logical level: T |=ℳ φ iff there is a MISE Y ⊆ U⌈M(T) s.t. Y |= φ. (⌈ is defined in Definition 3: U⌈M(T) := {⟨x, i⟩ ∈ U : x ∈ M(T)}; if there are no copies, we simplify in the obvious way.) Fact 76 contains some important facts about MISE.
FACT 76. Let the relation ≺ be transitive.
(1) If X is MISE for A, and X ⊆ B ⊆ A, then X is MISE for B.
(2) If X is MISE for A, and X ⊆ B ⊆ A, and Y is MISE for B, then X ∩ Y is MISE for A.
(3) If X is MISE for A, Y MISE for B, then there is Z ⊆ X ∪ Y MISE for A ∪ B.
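The two clauses of Definition 75 (1), the version without copies, can be checked mechanically on a finite structure. The sketch below is only an illustration: the function name `is_mise` and the four-element chain are assumptions, and a finite example cannot exhibit the infinite descending chains that motivate the limit version.

```python
# Sketch under assumptions: finite carrier, prec holds pairs (a, b) meaning a ≺ b.

def is_mise(Y, X, prec):
    """Return True iff Y is a minimizing initial segment of X (no copies)."""
    if not Y <= X:
        return False
    # (a) every x in X is in Y or has some y in Y with y ≺ x
    minimizes = all(x in Y or any((y, x) in prec for y in Y) for x in X)
    # (b) Y is downward closed inside X
    closed = all(x in Y for y in Y for x in X if (x, y) in prec)
    return minimizes and closed

# Transitive chain 0 ≺ 1 ≺ 2 ≺ 3.
X = frozenset(range(4))
prec = {(a, b) for a in X for b in X if a < b}
```

On this chain {0} and {0, 1} are MISE of X, {1} fails clause (a), and {0, 2} fails clause (b).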
Proof. We give only the (somewhat trickier) argument for (3): Let Z := {⟨x, i⟩ ∈ X : ¬∃⟨b, j⟩ ⪯ ⟨x, i⟩. ⟨b, j⟩ ∈ B − Y} ∪ {⟨y, j⟩ ∈ Y : ¬∃⟨a, i⟩ ⪯ ⟨y, j⟩. ⟨a, i⟩ ∈ A − X}, where ⪯ stands for ≺ or =.
(3.1) Z minimizes A ∪ B: We consider A; the case of B is symmetrical.
(a) We first show: If ⟨a, k⟩ ∈ X−Z, then there is ⟨y, i⟩ ∈ Z with ⟨a, k⟩ ≻ ⟨y, i⟩. Proof: If ⟨a, k⟩ ∈ X−Z, then there is ⟨b, j⟩ ⪯ ⟨a, k⟩, ⟨b, j⟩ ∈ B−Y. Then there is ⟨y, i⟩ ≺ ⟨b, j⟩, ⟨y, i⟩ ∈ Y. But ⟨y, i⟩ ∈ Z, too: If not, there would be ⟨a′, k′⟩ ⪯ ⟨y, i⟩, ⟨a′, k′⟩ ∈ A−X, but ⟨a′, k′⟩ ≺ ⟨a, k⟩, contradicting closure of X.
(b) If ⟨a′′, k′′⟩ ∈ A−X, there is ⟨a, k⟩ ∈ X, ⟨a, k⟩ ≺ ⟨a′′, k′′⟩. If ⟨a, k⟩ ∉ Z, continue with (a).
(3.2) Z is closed in A ∪ B: Let then ⟨z, i⟩ ∈ Z, ⟨u, k⟩ ≺ ⟨z, i⟩, ⟨u, k⟩ ∈ A ∪ B. Suppose ⟨z, i⟩ ∈ X; the case ⟨z, i⟩ ∈ Y is symmetrical.
(a) ⟨u, k⟩ ∈ A − X cannot be, by closure of X.
(b) ⟨u, k⟩ ∈ B − Y cannot be, as ⟨z, i⟩ ∈ Z, and by definition of Z.
(c) If ⟨u, k⟩ ∈ X−Z, then there is ⟨v, l⟩ ⪯ ⟨u, k⟩, ⟨v, l⟩ ∈ B−Y, so ⟨v, l⟩ ≺ ⟨z, i⟩, contradicting (b).
(d) If ⟨u, k⟩ ∈ Y−Z, then there is ⟨v, l⟩ ⪯ ⟨u, k⟩, ⟨v, l⟩ ∈ A−X, contradicting (a).
In the limit variant, the following now holds:
FACT 77. If ≺ is transitive, then (1) (AND) holds, (2) (OR) holds, (3) $\overline{\overline{\phi \wedge \phi'}} \subseteq \overline{\overline{\overline{\phi}} \cup \{\phi'\}}$, (4) Finite cumulativity holds, i.e. if φ |∼ ψ, then $\overline{\overline{\phi}} = \overline{\overline{\phi \wedge \psi}}$.
The proof is a direct consequence of Fact 76. We emphasize that neither the infinitary version of (PR) nor the infinitary version of Cumulativity holds in the general limit case; see Example 3.4.1 and Example 3.4.2 in [Schlechta, 2004]. Consequently: For the transitive case, with only formulas on the left (perhaps the most important case), any limit version structure is equivalent to a minimal version structure. The proof uses closure properties (closure under set difference). Conversely, we can read any smooth minimal version as a trivial limit version, so the two are equivalent in an important class (transitive, formulas on the left). This fact and the next point will be summarized in Proposition 78.
The KLM results show that they are equivalent to a smooth minimal structure. (We work in the other sections with the strong infinitary condition, which fails here, see Example 3.4.2 in [Schlechta, 2004].) Similar considerations as for formulas show: Having cofinally many definable sets trivializes the problem (again in the transitive case).
We summarize our main positive results on the limit variant of general preferential structures:
PROPOSITION 78. Let the relation be transitive. Then
(1) Every instance of the limit version, where the definable closed minimizing sets are cofinal in the closed minimizing sets, is equivalent to an instance of the minimal version.
(2) If we consider only formulas on the left of |∼, the resulting logic of the limit version can also be generated by the minimal version of a (perhaps different) preferential structure. Moreover, the structure can be chosen smooth.
Similar results hold in the ranked case, as we will see now. We consider structures of the type (U, ≺), where ≺ is a ranked relation, without copies. The condition ∅ ≠ X ⊆ U → µ≺(X) ≠ ∅ will not necessarily hold (but it will hold for finite X as we have no copies).
FACT 79. The following laws hold in the limit version of ranked structures: (1) $\overline{\overline{T}}$ is consistent, if T is, (2) $T \subseteq \overline{\overline{T}}$, (3) $\overline{\overline{T}}$ is classically closed, (4) T |∼ φ, T′ |∼ φ → T ∨ T′ |∼ φ, (5) If T |∼ φ, then T |∼ φ′ ↔ T ∪ {φ} |∼ φ′.
This results again in trivialization:
PROPOSITION 80. (1) Having cofinally many definable sets in the Λ’s trivializes the problem, it becomes equivalent to the minimal variant. (2) When considering just formulas, in the ranked case without copies, Λ is equivalent to µ, so Λ is trivialized again in this case. More precisely: Let a logic φ |∼ ψ be given by the limit variant without copies. Then there is a ranked structure, which gives exactly the same logic, but interpreted in the minimal variant.
The following instructive example shows that this is NOT necessarily true if we consider full theories T and T |∼ ψ.
EXAMPLE 81. Let L be given by the propositional variables pi, i < ω. Order the atomic formulas by pi ≺ ¬pi, and then order all sequences s = +/−p0, +/−p1, . . ., i < n ≤ ω lexicographically, identify models with such sequences of length
ω. So, in this order, the biggest model is the one making all pi false, the smallest the one making all pi true. Any finite sequence (an initial segment) s = +/ − p0 , +/ − p1 , . . .+/−pn has a smallest model +/−p0 , +/−p1 , . . .+/−pn , pn+1 , pn+2 , . . ., which continues all positive, call it ms . As there are only countably many such finite sequences, the number of ms is countable, too (and ms = ms′ for different s, s′ can happen). Take now any formula φ, it can be written as a finite disjunction of sequences s of fixed length n +/ − p0 , +/ − p1 , . . . + / − pn , choose wlog. n minimal, and denote sφ the smallest (in our order) of these s. E.g., if φ = (p0 ∧p1 )∨(p1 ∧¬p2 ) = (p0 ∧p1 ∧p2 )∨(p0 ∧p1 ∧¬p2 )∨(p0 ∧p1 ∧¬p2 )∨(¬p0 ∧p1 ∧¬p2 ), and sφ = p0 , p1 , p2 . (1) Consider now the initial segments defined by this order. In this order, the initial segments of the models of φ are fully determined by the smallest (in our order) s of φ, moreover, they are trivial, as they all contain the minimal model ms = sφ + pn+1 , pn+2 , . . . — where + is concatenation. It is important to note that even when we take away ms , the initial segments will still converge to ms — but it is not there any more. Thus, in both cases, ms there or not, φ |=Λ sφ + pn+1 , pn+2 , . . . — written a little sloppily. (A more formal argument: If φ |=Λ ψ, with the ms present, then ψ holds in ms , but ψ has finite length, so beyond some pk the values do not matter, and we can make them negative — but such sequences did not change their rank, they stay there.) (2) Modify the order now. Put all ms on top of the construction. As there are only countably many, all consistent φ will have most of their models in the part left untouched — the ms are not important for formulas and their initial segments. To summarize: φ |=Λ ψ is the same in both structures, as long as we consider just formulas φ. 
Of course, when considering full theories, we will see the difference — it suffices to take theories of exactly two models. Thus, just considering formulas does not suffice to fully describe the underlying structure. Note that we can add to the information about formulas information about full theories, which will contradict rankedness (e.g., in the second variant, take three models, and make m⊥m′ ≺ m′′ , but not m ≺ m′′ ) — but this information will not touch the formula part, as far as formulas are concerned, it stays consistent, as we never miss those models ms . Moreover, the reordered structure (in (2)) is not equivalent to any minimal structure when considering full theories: Suppose it were. We have ∅ |∼ +pi for all i, so the whole structure has to have exactly one minimal model, but this model is minimized by other models, a contradiction. Theory revision We find very similar results for theory revision.
Analogous to the case of preferential, and in particular ranked, structures, we can show that, as long as we consider revisions of the form φ ∗ ψ, the limit version is equivalent to the minimal version: Again, the limit version for formulas has the logical properties of the minimal case, thus a limit distance structure is equivalent to a minimal distance structure (with, perhaps, a different distance). Essential are, here again, closure properties of the domain. The essential point is now: Given two sets X and Y, we are interested in systems of points in Y which are closer and closer to X. So, on the right, we compare d(X, y) with d(X, y′), but X may itself be infinite and get closer and closer to Y without a minimum. Now, if d(X, y) < d(X, y′), then there is a “witness” x ∈ X which shows this: d(X, y) < d(X, y′) iff there is x ∈ X s.t. ∀x′ ∈ X d(x, y) < d(x′, y′); such x will be called a witness for d(X, y) < d(X, y′). Thus, we consider systems Λ(X, Y), where Λ(X, Y) ⊆ P(Y). Given a distance d, the elements of such Λ(X, Y) will be the sets ∅ ≠ {y ∈ Y : d(X, y) ≤ r} for some r (alternatively: d(X, y) < r), or, more generally, for X which get themselves ever closer, ∅ ≠ {y ∈ Y : ∃x ∈ X. d(x, y) ≤ r} (< r respectively). Note that for X, Y ≠ ∅ any A ∈ Λ(X, Y) is nonempty, too, as we do not choose r too small, and that for A, A′ ∈ Λ(X, Y), A ⊆ A′ or A′ ⊆ A. The logical side is then defined by: φ ∈ T ∗ T′ iff there is A ∈ Λ(M(T), M(T′)) s.t. A |= φ. By compactness and inclusion, T ∗ T′ is consistent (if T and T′ are) and deductively closed. We have again a trivialization result:
PROPOSITION 82. The limit variant of a symmetrical distance defined revision is equivalent to the minimal variant, as long as we consider formulas (and not full theories) on the left.
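For finite X and Y, the system Λ(X, Y) described above is a finite chain of nested, nonempty "balls" around X, and its smallest member is the set of closest points, so the limit and minimal readings coincide. The sketch below assumes that d(X, y) is the minimum of d(x, y) over x ∈ X (an assumption that only makes sense for finite X); the names `d_set`, `limit_system` and `witnesses` are hypothetical.

```python
# Sketch under assumptions: finite X and Y, d(X, y) taken as min over x in X.

def d_set(X, y, d):
    """Distance from the set X to the point y."""
    return min(d(x, y) for x in X)

def limit_system(X, Y, d):
    """All nonempty sets {y in Y : d(X, y) <= r}, ordered by increasing radius r."""
    radii = sorted({d_set(X, y, d) for y in Y})
    return [frozenset(y for y in Y if d_set(X, y, d) <= r) for r in radii]

def witnesses(X, y, y2, d):
    """All x in X with d(x, y) < d(x2, y2) for every x2 in X."""
    return [x for x in sorted(X) if all(d(x, y) < d(x2, y2) for x2 in X)]

def dist(x, y):
    return abs(x - y)

X = {0, 1}
Y = {3, 5, 10}
lam = limit_system(X, Y, dist)
```

The list `lam` is linearly ordered by inclusion, matching the remark that A ⊆ A′ or A′ ⊆ A for any two members of Λ(X, Y).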
3.5 The role of definability preservation We discuss in this section two things. First, on the negative side, we will see that general (i.e. not necessarily definability preserving) preferential structures in the minimal variant (smooth or not, ranked or not) do not have any characterization of the usual forms — not even infinitary ones. The same applies to the general limit variant. The limit version will in all cases be a trivial consequence of the minimal version: On the one hand, the constructed structures will give the same results in the minimal and the limit reading (this is due to the simplicity of the relation, where paths will have length at most 1). On the other hand, the logics we define will not be preferentially or distance representable in both readings — this is again trivial. Second, on the positive side, we show how to provide characterizations to the general case, too, using “small” sets of exceptions — small in a topological sense, but arbitrarily big in cardinality.
Negative results
We have seen that distance based revision has no finite characterization, but a countable set of finite conditions suffices, as transitivity speaks about arbitrarily long finite chains. The case of not necessarily definability preserving preferential structures (and, as a consequence, of the limit version of preferential structures) is much worse, as we will see now in Proposition 83. This proposition shows that there is no “normal” characterization of any size of general preferential structures, and consequently of the limit variant. We will not define formally what a “normal” characterization is (essentially to leave more room to adapt our results), we just remind the reader that usual characterizations have the form of universally quantified boolean expressions of set expressions mentioning model sets like M(T), M(φ), and the result of applying an operator like µ to them. A standard example is ∀X∀Y (X ⊆ Y → µ(Y) ∩ X ⊆ µ(X)). This negative result, together with the above reduction results, casts heavy doubt on the utility of the limit version as a reasoning tool. It seems either hopelessly, or unnecessarily, complicated. But it seems useful as a tool for theoretical investigations, as it separates finitary from infinitary versions, see in particular Section 3.4.1 of [Schlechta, 2004]. We go into more detail here, as the proof uses ideas which are perhaps not so common in the field, but first we summarize the results; we will discuss only some of them here.
PROPOSITION 83.
(1) There is no “normal” characterization of any fixed size of not necessarily definability preserving preferential structures.
(2) There is no “normal” characterization of any fixed size of the general limit variant of preferential structures.
(3) There is no “normal” characterization of any fixed size of not necessarily definability preserving ranked preferential structures.
(4) There is no “normal” characterization of any fixed size of the general limit version of ranked preferential structures.
(5) There is no normal characterization of not necessarily definability preserving distance defined revision. The distance can be chosen symmetric or not.
(6) There is no normal characterization of the limit version of distance defined revision. The distance can be chosen symmetric or not.
As an indication of the construction of the counterexamples, we treat the case of general preferential structures. All details can be found in [Schlechta, 2004].
NOTATION 84. (1) We will always work in a propositional language L with κ many (κ an infinite cardinal) propositional variables pi : i < κ. As p0 will have a special role, we will set p := p0. In the revision case, we will use another special variable, which we will call q. (This will just avoid excessive indexing.) (2) In all cases, we will show that there is no normal characterization of size ≤ κ. As κ was arbitrary, we will have shown the results. We will always assume that there is such a characterization Φ of size κ, and derive a contradiction. For this purpose, we construct suitable logics which are not representable, and show that for any instantiation of Φ (i.e. with at most κ theories T or formulas φ) in these logics, we find a “legal” structure where these instances have the same value as in the original logic, a contradiction to the assumed discerning power of Φ. (By hypothesis, at least one instance has to have negative value in the not representable logics, but then it has the same negative value in a legal structure, a contradiction.) To simplify notation, we assume wlog. that the characterization works with theories only; we can always replace a formula φ by the theory {φ}, etc. The structures to be constructed depend of course on the particular instantiation of Φ, a set of theories of size ≤ κ; we will denote this set 𝒯, and construct the structures from 𝒯 and the “illegal” original logic. (3) Given any model set X ⊆ M_L, we define $\overline{X} := M(Th(X))$, the closure of X in the standard topology. We state our main technical lemma.
LEMMA 85. Let L be a language of κ many (κ an infinite cardinal) propositional variables. Let a theory T be given, ℰ_T ⊆ {X ⊆ M_L : card(X) ≤ κ} be closed under unions of size ≤ κ and subsets, and $\overline{\overline{T}}$ be defined by $\overline{\overline{T}} := Th(\bigcap\{\overline{M(T) - A} : A \in \mathcal{E}_T\})$. Then there is a (usually not unique) “optimal” A_T ∈ ℰ_T s.t. (1) $\overline{\overline{T}} = Th(M(T) - A_T)$, (2) for all A ∈ ℰ_T, $\overline{M(T) - A_T} \subseteq \overline{M(T) - A}$.
The proof involves some counting, and is in [Schlechta, 2004]. We are now ready to prove the negative result for general, not necessarily definability preserving preferential structures and the general limit variant. i.e. (1) and (2) of Proposition 83. Proof. Before we begin the proof, we recall that the “small sets of exceptions” we speak about can be arbitrarily big unions of exceptions, this depends essentially
on the size of the language. So there is no contradiction in our results. If you like, the “small” of the “small sets of exceptions” is relative, the κ discussed here is absolute. (2) It is easy to see that (2) is a consequence of (1): Any minimal variant of suitable preferential structures can also be read as a degenerate case of the limit variant: There is a smallest closed minimizing set, so both variants coincide. This is in particular true for the structurally extremely simple cases we consider here — the relation will be trivial, as the paths in the relation have length at most 1, we work with quantity. On the other hand, it is easily seen that the logic we define first is not preferential, neither in the minimal, nor in the limit reading. Proof of (1): Let then κ be any infinite cardinal. We show that there is no characterization of general (i.e. not necessarily definability preserving) preferential structures which has size ≤ κ. We suppose there were one such characterization Φ of size ≤ κ, and construct a counterexample. The idea of the proof is very simple. We show that it suffices to consider for any given instantiation of Φ ≤ κ many pairs m ≺ m− in a case not representable by a preferential structure, and that ≤ κ many such pairs give the same result in a true preferential structure for this instantiation. Thus, every instantiation is true in an “illegal” and a “legal” example, so Φ cannot discern between legal and illegal examples. The main work is to show that ≤ κ many pairs suffice in the illegal example, this was done in Lemma 85. We first note some auxiliary facts and definitions, and then define the logic, which, as we show, is not representable by a preferential structure. 
We then use the union of all the “optimal” sets A_T guaranteed by Lemma 85 to define the preferential structure, and show that in this structure $\overline{\overline{T}}$ for T ∈ 𝒯 is the same as in the old logic, so the truth value of the instantiated expression is the same in the old logic and the new structure. Writing down all details properly is a little complicated. As any formula φ in the language has finite size, φ uses only a finite number of variables, so φ has 0 or 2^κ different models. For any model m with m |= p, let m^- be exactly like m with the exception that m^- |= ¬p. (If m ⊭ p, m^- is not defined.) Let 𝒜 := {X ⊆ M(¬p) : card(X) ≤ κ}. For given T, let 𝒜_T := {X ∈ 𝒜 : X ⊆ M(T) ∧ ∀m^- ∈ X. m ∈ M(T)}. Note that 𝒜_T is closed under subsets and under unions of size ≤ κ. For T, let ℬ_T := {X ∈ 𝒜_T : $\overline{M(T) - X} \neq M(T)$}, the (in the logical sense) “big” elements of 𝒜_T. For X ⊆ M_L, let X⌈⌈M(T) := {m^- ∈ X : m^- ∈ M(T) ∧ m ∈ M(T)}. Thus, 𝒜_T = {X⌈⌈M(T) : X ∈ 𝒜}. Define now the logic |∼ as follows in two steps: (1) $\overline{\overline{Th(\{m, m^-\})}} := Th(\{m\})$ (Speaking preferentially, m ≺ m^-, for all pairs m, m^-, this will be the entire relation. The relation is thus extremely simple, ≺-paths have length
at most 1, so ≺ is automatically transitive.) We now look at (in terms of preferential models only some!) consequences: (2) $\overline{\overline{T}} := Th(\bigcap\{\overline{M(T) - A} : A \in \mathcal{B}_T\}) = Th(\bigcap\{\overline{M(T) - A} : A \in \mathcal{A}_T\})$. We note:
(a) This, with the exception of the size condition, would be exactly the preferential consequence of part (1) of the definition. (b) (1) is a special case of (2), we have separated them for didactic reasons. (c) The prerequisites of Lemma 85 are satisfied for T and 𝒜_T. (d) It is crucial that we close before intersecting. (Remark: We discussed a similar idea, better “protection” of single models by bigger model sets, in Section 2.2, where we gave a counterexample to the KLM characterization.) This logic is not preferential. We give the argument for the minimal case, the argument for the limit case is the same. Take T := ∅. Take any A ∈ 𝒜_T. Then Th(M_L) = Th(M_L − A), as any φ which holds in A will have 2^κ models, so there must be a model of φ in M_L − A, so we cannot separate A or any of its subsets. Thus, $\overline{M(\emptyset) - A} = M(\emptyset)$ for all A of size ≤ κ, so $\overline{\overline{\emptyset}} = \overline{\emptyset}$, which cannot be if |∼ is preferential, for then $\overline{\overline{\emptyset}} = \overline{p}$. Suppose there were a characterization Φ of size ≤ κ. It has to say “no” for at least one instance 𝒯 (i.e. a set of size ≤ κ of theories) of the universally quantified condition Φ. We will show that we find a true preferential structure where this instance 𝒯 of Φ has the same truth value, more precisely, where all T ∈ 𝒯 have the same $\overline{\overline{T}}$ in the old logic and in the preferential structure, a contradiction, as this instance evaluates now to “false” in the preferential structure, too. Suppose T ∈ 𝒯. If $\overline{\overline{T}} = \overline{T}$, we do nothing (or set A_T := ∅). When $\overline{\overline{T}}$ is different from $\overline{T}$, this is because ℬ_T ≠ ∅. By Lemma 85, for each of the ≤ κ T ∈ 𝒯, it suffices to consider a set A_T of size ≤ κ of suitable models of ¬p to calculate $\overline{\overline{T}}$, i.e. $\overline{\overline{T}} = Th(M(T) - A_T)$, so, all in all, we work just with at most κ many such models. More precisely, set B := ⋃{A_T : $\overline{\overline{T}} = Th(M(T) - A_T) \neq \overline{T}$, T ∈ 𝒯}. Note that for each T with $\overline{\overline{T}} \neq \overline{T}$, B⌈⌈M(T) ∈ ℬ_T, as B has size ≤ κ, and B contains A_T, so the closure of M(T) − B⌈⌈M(T) differs from M(T).
But we also have $\overline{\overline{T}}$ = Th(M(T) − A_T) = Th(M(T) − B⌈⌈M(T)), as A_T was optimal in ℬ_T. Consider now the preferential structure where we do not make all m ≺ m^-, but only the κ many of them featuring in B, i.e. those we have used in the instance 𝒯 of Φ. We have to show that the instance 𝒯 of Φ still fails in the new structure. But
this is now trivial. Things like $\overline{T}$, etc. do not change, the only problem might be $\overline{\overline{T}}$. As we work in a true preferential structure, we now have to consider not subsets of size at most κ, but all of B⌈⌈M(T) at once, which also has size ≤ κ. But, by definition of the new structure, $\overline{\overline{T}}$ = Th(M(T) − B⌈⌈M(T)) = Th(M(T) − A_T). On the other hand, if $\overline{\overline{T}} = \overline{T}$ in the old structure, the same will hold in the new structure, as B⌈⌈M(T) is one of the sets considered, and they did not change $\overline{T}$. Thus, the $\overline{\overline{T}}$ in the new and in the old structure are the same. So the instance 𝒯 of Φ fails also in a suitable preferential structure, contradicting its supposed discriminatory power. The limit reading of this simple structure gives the same result. We turn to
Positive results: characterization with “small” exception sets
We characterize in this final section not necessarily definability preserving operators, first for preferential structures, then for distance based revision. The basic idea is the same in both cases. We approximate a given choice function or set operator up to a (logically) small set of exceptions. Suppose that T′ = Th(µ(M(T))), the set of formulas valid in the minimal models of T. If µ is definability preserving, then M(T′) = µ(M(T)), and there is no model m of T′ s.t. there is some model m′ of T with m′ ≺ m. If µ is not definability preserving, there might be a model m of T′, not in µ(M(T)), and a model m′ of T s.t. m′ ≺ m. But there may not be many such models m, many in the logical sense, i.e. that there is φ s.t. T′ ⊬ φ, T′ ⊬ ¬φ, and M(T ∪ {φ}) consists of such models; otherwise µ(M(T)) |= ¬φ. In this sense, the set of such exceptional models is small. Small sets of exceptions can thus be tolerated, they correspond to the coarseness of the underlying language, which cannot describe all sets of models. The quantity of such models can, however, be arbitrarily big, when we measure it by cardinality.
We first define what a “small” subset is, in purely algebraic terms. There will be no particular properties (apart from the fact that small is downward closed), as long as we do not impose any conditions on 𝒴. (Intuitively, 𝒴 is the set of theory definable sets of models.) Let 𝒴 ⊆ P(Z). If B ∈ 𝒴, A ⊆ B is called a small subset of B iff there is no X ∈ 𝒴, B − A ⊆ X ⊂ B. If 𝒴 is closed under arbitrary intersections, Z ∈ 𝒴, A ⊆ Z, $\overline{A}$ will be the smallest X ∈ 𝒴 with A ⊆ X (the closure, hull, or whatever you like). In the intended application, $\overline{A}$ is M(Th(A)). We will show that our laws hold up to such small sets of exceptions. This is reflected, e.g. in condition (PR) for preferential structures without definability preservation: (|∼4) Let T, T_i, i ∈ I be theories s.t. ∀i T_i ⊢ T, then there is no φ s.t. φ ∉ $\overline{\overline{T}}$ and M($\overline{\overline{T}}$ ∪ {¬φ}) ⊆ ⋃{M(T_i) − M($\overline{\overline{T_i}}$) : i ∈ I} (see Conditions 94), by the nonexistence of φ, which corresponds to the nonexistence of intermediate definable subsets.
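The algebraic notion of a small subset, and of the hull, can be tried out on a finite family. The sketch below is an illustration under assumptions: the family, the carrier Z and the names `hull` and `is_small` are hypothetical, and the family is chosen closed under intersections with Z as a member.

```python
from functools import reduce

# Sketch under assumptions: a small finite family closed under intersections.

def hull(A, family, Z):
    """Smallest member of the family that contains A (intersection of all such members)."""
    return reduce(lambda acc, X: acc & X, (X for X in family if A <= X), Z)

def is_small(A, B, family):
    """A ⊆ B is small in B iff no member X of the family satisfies B − A ⊆ X ⊂ B."""
    return not any((B - A) <= X and X < B for X in family)

Z = frozenset({1, 2, 3})
family = {frozenset(), frozenset({1}), frozenset({1, 2}), Z}
```

In this family {1} is a small subset of Z, since no member other than Z itself contains {2, 3}, while {3} is not small, because {1, 2} sits strictly between Z − {3} and Z. In this example, smallness of A in B agrees with the hull of B − A being all of B.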
Note that the index set I may be arbitrary big; this depends on the size of the language. The problem and the remedy for preferential structures and distance based revision are very similar. We begin with Preferential structures We present the technique used to show the results in outline. The results are, literally and abstractly, very close to those used to obtain the results for the definability preserving case. Let Y := D L . For an arbitrary, i.e. not necessarily definability preserving, preferential structure Z of L−models, let for X ∈ Y µ′Z (X) := µZ (X) − {x : ∃Y ∈ Y, Y ⊆ X, x ∈ Y − µ(Y )} = {x ∈ X : ¬∃Y ∈ Y(Y ⊆ X and x ∈ Y − µZ (Y )} µ′ (we omit the index Z, when this does not create any ambiguity), and its adequate modification for the smooth case, are the central definitions, and will replace µ in the technical development. Note that, µ(X) = µ′ (X), i.e. that µ(X)−µ′ (X) is small, and, if Z is definability preserving, then µ′ = µ. For representation, we consider now the Conditions 88 below and show that they — (µ ⊆) and (µ2) in the general case, (µ∅), (µ ⊆), (µ2s),(µCU M ) in the smooth case — imply a list of properties for µ′ and H(U ) := {X ∈ Y : µ(X) ⊆ U }, described in Conditions 88 and 89. We then show that such µ′ can be represented by a (general or smooth) preferential structure, which can be chosen transitive. The strategy and execution is the same as for the definability preserving case. It remains to replace, better approximate, µ′ by µ to obtain representation, this is possible, as they differ only by small sets. We put our results together in Proposition 90 and Proposition 93. We recollect the definition of the hull H, and then define two versions of the approximation µ′ to µ (for the general and the smooth case), and formulate the Conditions 88 for µ, which we will use for characterization. We then give another set of conditions, Conditions 89 for µ′ , which are implied by the first set in Conditions 88. 
The conditions for µ′ allow us to represent µ′ just as we represented µ in Section 2. Propositions 90 and 93 bridge the gap between µ and µ′, and state the results we worked for, i.e. representation of µ. Let in the following 𝒴 ⊆ P(Z) be closed under arbitrary intersections and finite unions, ∅, Z ∈ 𝒴, and let $\overline{\,\cdot\,}$ be defined wrt. 𝒴. Let µ : 𝒴 → 𝒴. Smoothness will also be wrt. 𝒴. Recollect the definition of H(U): H(U) := ⋃{X ∈ 𝒴 : µ(X) ⊆ U} for U ∈ 𝒴.
CONDITION 86.
(H1) U ⊆ H(U),
(H2) U ⊆ U′ → H(U) ⊆ H(U′) for U, U′ ∈ 𝒴.
FACT 87. Conditions (H1) and (H2) hold for H as defined above, if µ(U) ⊆ U.
We now make the conditions for µ, µ′, and H precise.
CONDITION 88.
(µ∅) U ≠ ∅ → µ(U) ≠ ∅,
(µ ⊆) µ(U) ⊆ U,
(µ2) µ(U) − µ′(U) is small, where µ′(U) := {x ∈ U : ¬∃Y ∈ Y (Y ⊆ U and x ∈ Y − µ(Y))},
(µ2s) µ(U) − µ′(U) is small, where µ′(U) := {x ∈ U : ¬∃U′ ∈ Y (µ(U ∪ U′) ⊆ U and x ∈ U′ − µ(U′))},
(µCUM) µ(X) ⊆ Y ⊆ X → µ(X) = µ(Y)
for X, Y, U ∈ Y.
Note that (µ2) contains essentially the fundamental condition X ⊆ Y → µ(Y) ∩ X ⊆ µ(X) of preferential structures. To see this, it suffices to take ∅ as the only small set, or µ(U) = µ′(U). We re-emphasize that “small” does not mean “small by cardinality”.
CONDITION 89.
(µ′ ⊆) µ′(U) ⊆ U,
(µ′2) x ∈ µ′(U), x ∈ Y − µ′(Y) → Y ⊈ U,
(µ′∅) U ≠ ∅ → µ′(U) ≠ ∅,
(µ′4) µ′(U ∪ Y) − H(U) ⊆ µ′(Y),
(µ′5) x ∈ µ′(U), x ∈ Y − µ′(Y) → Y ⊈ H(U),
(µ′6) Y ⊆ H(U) → µ′(U ∪ Y) ⊆ H(U)
for Y, U ∈ Y.
Note that (µ′ 5) implies (µ′ 2) if (H1) holds. Outline of the proofs: In both cases, i.e. the general and the smooth case, we follow the same strategy: First, we show from the conditions on µ — (µ ⊆), (µ2) in the general case, (µ∅), (µ ⊆), (µ2s), (µCU M ) in the smooth case — that certain conditions hold for µ′ (and for H in the smooth case) — (µ′ ⊆), (µ′ 2) in the general case, (µ′ ⊆), (µ′ ∅), (µ′ 4)–(µ′ 6) in the smooth case. We then show that any µ′ : Y → P(Z) satisfying these conditions can be represented by a (smooth) preferential structure Z, and that the structure can be chosen transitive. As the proof for the not necessarily transitive case is easier, we do this proof first, and then the transitive case. The basic ideas are mostly the same as those used for the definability
preserving case. Finally, we show that if Z is a [smooth] preferential structure, (µ ⊆), (µ2) [(µ∅), (µ ⊆), (µ2s), (µCUM) in the smooth case] will hold for µZ. Moreover, if µ′ was defined from µ as indicated, and if in addition (µ2) [or (µ2s)] holds, then µ = µZ. Putting all this together yields representation results for the general and the smooth case, Propositions 90 and 93.
For the details: We first construct µ′ from µ:
µ′(U) := {x ∈ U : ¬∃Y ∈ Y (Y ⊆ U and x ∈ Y − µ(Y))}.
We then see that µ′ satisfies (µ′ ⊆) and (µ′2), if µ satisfies (µ ⊆) and (µ2). (The main part of the argument is to show (µ′2): If x ∈ Y − µ′(Y), then, by definition, there is Y′ ∈ Y, Y′ ⊆ Y and x ∈ Y′ − µ(Y′). If, in addition, Y ⊆ U, then Y′ ⊆ U, so x ∉ µ′(U).) Thus, we can conclude by Proposition 19 that there is a preferential structure Z over Z s.t. µ′ = µZ, where Z can be chosen transitive. We conclude that such µ have a sufficient approximation by a preferential structure:
PROPOSITION 90. Let Z be an arbitrary set, Y ⊆ P(Z), µ : Y → Y, Y closed under arbitrary intersections and finite unions, and ∅, Z ∈ Y, and let . be defined wrt. Y.
(a) If µ satisfies (µ ⊆), (µ2), then there is a transitive preferential structure Z over Z s.t. for all U ∈ Y µ(U) = µZ(U).
(b) If Z is a preferential structure over Z and µ : Y → Y s.t. for all U ∈ Y µ(U) = µZ(U), then µ satisfies (µ ⊆), (µ2).
Proof of (b): (µ ⊆): µZ(U) ⊆ U, so by U ∈ Y, µ(U) = µZ(U) ⊆ U. (µ2): If (µ2) is false, there is U ∈ Y s.t. for U′ := ⋃{Y′ − µ(Y′) : Y′ ∈ Y, Y′ ⊆ U} µ(U) − U′ ⊂ µ(U). By µZ(Y′) ⊆ µ(Y′), Y′ − µ(Y′) ⊆ Y′ − µZ(Y′). No copy of any x ∈ Y′ − µZ(Y′) with Y′ ⊆ U, Y′ ∈ Y can be minimal in Z⌈U. Thus, by µZ(U) ⊆ µ(U), µZ(U) ⊆ µ(U) − U′, so µZ(U) ⊆ µ(U) − U′ ⊂ µ(U), contradiction.
We turn to the smooth case, which works in an analogous way. We first define
DEFINITION 91. µ′(U) := {x ∈ U : ¬∃U′ ∈ Y (x ∈ U′ − µ(U′) and µ(U ∪ U′) ⊆ U)}.
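For a finite toy case, the two approximations µ′ (general and smooth) can be computed directly from their definitions. The following Python sketch is only an illustration under assumptions that are not part of the text: Z = {0, 1, 2}, every subset of Z belongs to Y, and µ selects the least element. The names `powerset`, `mu_prime` and `mu_prime_smooth` are hypothetical. Since every subset is available in this toy family, nothing is lost to definability, and both approximations coincide with µ.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def mu_prime(mu, family, U):
    """General case: keep x in U unless some Y in the family with Y <= U
    has x in Y - mu(Y)."""
    return frozenset(x for x in U
                     if not any(Y <= U and x in Y - mu(Y) for Y in family))

def mu_prime_smooth(mu, family, U):
    """Smooth case (Definition 91): keep x in U unless some V in the family
    has x in V - mu(V) and mu(U | V) <= U."""
    return frozenset(x for x in U
                     if not any(x in V - mu(V) and mu(U | V) <= U
                                for V in family))

# Toy assumption: Z = {0, 1, 2}, and smaller numbers are preferred.
Z = frozenset({0, 1, 2})
FAMILY = powerset(Z)

def mu(X):
    """Minimal elements of X under the usual order on the integers."""
    return frozenset({min(X)}) if X else frozenset()

for U in FAMILY:
    print(sorted(U), sorted(mu(U)), sorted(mu_prime(mu, FAMILY, U)))
```

In a family that does not contain all subsets, the same functions can return a proper subset of µ(U); that difference is what the conditions (µ2) and (µ2s) require to be small.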
Under the prerequisites (µ∅), (µ ⊆), (µ2s), (µCUM) for µ, H and µ′ will satisfy (H1) and (H2) of Conditions 86, and (µ′ ⊆), (µ′∅), (µ′4)–(µ′6) of Conditions 89, and we can conclude:
PROPOSITION 92. Let µ′ : Y → P(Z) and H : Y → P(Z) be two operations satisfying (H1) and (H2) of Conditions 86, and (µ′ ⊆), (µ′∅), (µ′4)–(µ′6) of Conditions 89. Then (a) there is a smooth preferential structure Z over Z s.t. µ′ = µZ, (b) Z can be chosen transitive.
We finish with
PROPOSITION 93. Let Z be an arbitrary set, Y ⊆ P(Z), µ : Y → Y, Y closed under arbitrary intersections and finite unions, and ∅, Z ∈ Y, and let . be defined wrt. Y.
(a) If µ satisfies (µ∅), (µ ⊆), (µ2s), (µCU M ), then there is a transitive smooth preferential structure Z over Z s.t. for all U ∈ Y µ(U ) = µZ (U ) .
(b) If Z is a smooth preferential structure over Z and µ : Y → Y s.t. for all U ∈ Y µ(U ) = µZ (U ), then µ satisfies (µ∅), (µ ⊆), (µ2s), (µCU M ). The proof is similar to that of the general situation. We turn to the logical counterpart: Consider
CONDITION 94.
(CP) Con(T) → Con(T),
(LLE) T = T′ → T = T′,
(CCL) T is classically closed,
(SC) T ⊆ T,
(|∼4) Let T, Ti, i ∈ I be theories s.t. ∀i Ti ⊢ T, then there is no φ s.t. φ ∈ T and M(T ∪ {¬φ}) ⊆ {M(Ti) − M(Ti) : i ∈ I},
(|∼4s) Let T, Ti, i ∈ I be theories s.t. ∀i T ⊆ Ti ∨ T, then there is no φ s.t. φ ∈ T and M(T ∪ {¬φ}) ⊆ {M(Ti) − M(Ti) : i ∈ I},
(|∼5) T ∨ T′ ⊆ T ∨ T′,
(CUM) T ⊆ T′ ⊆ T → T = T′
for all T, T′, Ti.
We then have:
PROPOSITION 95. Let |∼ be a logic for L. Then:
(a.1) If M is a classical preferential model over ML and T = Th(µM(M(T))), then (LLE), (CCL), (SC), (|∼4) hold for the logic so defined.
(a.2) If (LLE), (CCL), (SC), (|∼4) hold for a logic, then there is a transitive classical preferential model M over ML s.t. T = Th(µM(M(T))).
(b.1) If M is a smooth classical preferential model over ML and T = Th(µM(M(T))), then (CP), (LLE), (CCL), (SC), (|∼4s), (|∼5), (CUM) hold for the logic so defined.
(b.2) If (CP), (LLE), (CCL), (SC), (|∼4s), (|∼5), (CUM) hold for a logic, then there is a smooth transitive classical preferential model M over ML s.t. T = Th(µM(M(T))).
Theory Revision
Just as we have approximated µ by µ′ for preferential structures, we approximate | by |′ for revision. We consider |′ s.t. A | B = A |′ B, for | : Y × Y → Y, define A |′ B := A | B − {b ∈ B : ∃B′ ∈ Y (b ∈ B′ ⊆ B and b ∉ A | B′)}, formulate suitable conditions for |′, in particular a loop condition, and show that |′ can be represented by a distance. The logical Conditions 97 describe the logical situation, and we summarize the result in Proposition 98.
DEFINITION 96. If ∗ is a revision function, we define S ∗′ T := M(S ∗ T) − {m ∈ M(T) : ∃T′ (m |= T′, T′ ⊢ T, m ⊭ S ∗ T′)}.
We consider the following conditions for a revision function ∗ defined for arbitrary consistent theories on both sides.
CONDITION 97. (*0)
If |= T ↔ S, |= T′ ↔ S′, then T ∗ T′ = S ∗ S′,
(*1) T ∗ T′ is a consistent, deductively closed theory,
(*2) T′ ⊆ T ∗ T′,
(*3) If T ∪ T′ is consistent, then T ∗ T′ = T ∪ T′,
(*5) Th(T ∗′ T′) = T ∗ T′,
(*’L) M(T0) ∩ (T1 ∗′ (T0 ∨ T2)) ≠ ∅, M(T1) ∩ (T2 ∗′ (T1 ∨ T3)) ≠ ∅, M(T2) ∩ (T3 ∗′ (T2 ∨ T4)) ≠ ∅, . . . M(Tk−1) ∩ (Tk ∗′ (Tk−1 ∨ T0)) ≠ ∅ imply M(T1) ∩ (T0 ∗′ (Tk ∨ T1)) ≠ ∅.
PROPOSITION 98. Let L be a propositional language. A revision function ∗ is representable by a symmetric consistency preserving [identity respecting] pseudodistance iff ∗ satisfies (*0)–(*2), (*5), (*’L) [and (*3)].
The proofs are similar to those for preferential structures.
ACKNOWLEDGEMENTS
I would like to thank David Makinson for valuable discussion and suggestions. Daniel Lehmann kindly gave permission to re-use the material of the joint paper [Lehmann et al., 2001]. The following editors kindly gave permission to re-use the following material published before: Almost all of the material presented here was published before by Elsevier in the author’s [Schlechta, 2004]. The material on a nonsmooth model of cumulativity (in Section 2.2) was published by Oxford University Press in [Schlechta, 1999]. The basic material on distance based revision (Section 2.4) was published in the Journal of Symbolic Logic, see [Lehmann et al., 2001]. The basic proof techniques for preferential structures (Section 3.2) were published by Oxford University Press in [Schlechta, 1992]. The advanced proof techniques for preferential structures (Sections 3.2–3.2) were published in the Journal of Symbolic Logic, see [Schlechta, 2000]. The basic material on Plausibility Logic (Sections 3.3, 3.3) was published by Kluwer, see [Schlechta, 1996].
BIBLIOGRAPHY
[Arieli and Avron, 2000] O. Arieli and A. Avron. General Patterns for Nonmonotonic Reasoning: From Basic Entailment to Plausible Relations, Logic Journal of the Interest Group in Pure and Applied Logics, Vol. 8, No. 2, pp. 119-148, 2000.
[Alchourron et al., 1985] C. Alchourron, P. Gärdenfors, and D. Makinson. On the Logic of Theory Change: partial meet contraction and revision functions, Journal of Symbolic Logic, Vol. 50, pp. 510-530, 1985.
[Audibert et al., 1999] L. Audibert, C. Lhoussaine and K. Schlechta. Distance based revision of preferential logics, Logic Journal of the Interest Group in Pure and Applied Logics, Vol. 7, No. 4, pp. 429-446, 1999.
[Aizerman and Malishevski, 1981] M. A. Aizerman and A. V. Malishevski. General theory of best variants choice: Some aspects, IEEE Transactions on Automatic Control, 26:1030-1040, 1981.
[Aizerman, 1985] M. A. Aizerman. New problems in the general choice theory: Review of a research trend, Social Choice and Welfare, 2:235-282, 1985.
[Arrow, 1959] K. J. Arrow. Rational choice functions and orderings, Economica, 26:121-127, 1959.
[Ben-David and Ben-Eliyahu, 1994] S. Ben-David and R. Ben-Eliyahu. A modal logic for subjective default reasoning, Proceedings LICS-94, 1994.
[Bossu and Siegel, 1985] G. Bossu and P. Siegel. Saturation, Nonmonotonic Reasoning and the Closed-World Assumption, Artificial Intelligence, 25, 13–63, 1985.
[Chernoff, 1954] H. Chernoff. Rational selection of decision functions, Econometrica, 26:121-127, 1954.
[Friedman and Halpern, 1995] N. Friedman and J. Halpern. Plausibility measures and default reasoning, IBM Almaden Research Center Tech. Rept. 1995, to appear in Journal of the ACM.
[Gabbay, 1985] D. M. Gabbay. Theoretical foundations for non-monotonic reasoning in expert systems. In K. R. Apt, ed. Logics and Models of Concurrent Systems, pp. 439–457. Springer, Berlin, 1985.
[Hansson, 1971] B. Hansson. An analysis of some deontic logics, Nous 3, 373-398. Reprinted in R. Hilpinen, ed. Deontic Logic: Introductory and Systematic Readings, pp. 121–137. Reidel, Dordrecht, 1971.
[Kraus et al., 1990] S. Kraus, D. Lehmann, and M. Magidor. Nonmonotonic reasoning, preferential models and cumulative logics, Artificial Intelligence, 44 (1-2), 167–207, 1990.
[Katsuno and Mendelzon, 1990] H. Katsuno and A. O. Mendelzon. On the Difference Between Updating a Knowledge Base and Revising It, Univ. of Toronto Tech. Rept., KRR-TR-90-6.
[Lehmann and Magidor, 1992] D. Lehmann and M. Magidor. What does a conditional knowledge base entail? Artificial Intelligence, 55(1), 1-60, 1992.
[Lehmann et al., 2001] D. Lehmann, M. Magidor, and K. Schlechta. Distance Semantics for Belief Revision, Journal of Symbolic Logic, Vol. 66, No. 1, 295–317, 2001.
[Lehmann, 1992a] D. Lehmann. Plausibility Logic, Proceedings CSL91, 1992.
[Lehmann, 1992b] D. Lehmann. Plausibility Logic, Tech. Rept. TR-92-3, Feb. 1992, Hebrew University, Jerusalem 91904, Israel.
[Lehmann, 2001] D. Lehmann. Nonmonotonic Logics and Semantics, Journal of Logic and Computation, 11(2):229-256, 2001.
[Lewis, 1973] D. Lewis. Counterfactuals, Blackwell, Oxford, 1973.
[Makinson, 1994] D. Makinson. General patterns in nonmonotonic reasoning. In D. Gabbay, C. Hogger, Robinson, eds., Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. III: Nonmonotonic and Uncertain Reasoning, pp. 35–110. Oxford University Press, 1994.
[Reiter, 1980] R. Reiter. A logic for default reasoning, Artificial Intelligence 13 (1-2), 81-132, 1980.
[Schlechta and Makinson, 1994] K. Schlechta and D. Makinson. Local and Global Metrics for the Semantics of Counterfactual Conditionals, Journal of Applied Non-Classical Logics, Vol. 4, No. 2, pp. 129-140, Hermes, Paris, 1994, also LIM Research Report RR 37, 09/94.
[Schlechta, 1991] K. Schlechta. Theory Revision and Probability, Notre Dame Journal of Formal Logic 32, No. 2, 307-319, 1991.
[Schlechta, 1992] K. Schlechta. Some results on classical preferential models, Journal of Logic and Computation, Vol. 2, No. 6, 675-686, 1992.
[Schlechta, 1995] K. Schlechta. Defaults as generalized quantifiers, Journal of Logic and Computation, Vol. 5, No. 4, 473-494, 1995.
[Schlechta, 1996] K. Schlechta. Completeness and incompleteness for plausibility logic, Journal of Logic, Language and Information, 5:2, 177-192, 1996.
[Schlechta, 1997a] K. Schlechta. Nonmonotonic logics: Basic Concepts, Results, and Techniques. Springer Lecture Notes series, LNAI 1187, 1997.
[Schlechta, 1997b] K. Schlechta. Filters and partial orders, Journal of the Interest Group in Pure and Applied Logics, Vol. 5, No. 5, 753-772, 1997.
[Schlechta, 1999] K. Schlechta. A topological construction of a non-smooth model of cumulativity, Journal of Logic and Computation, Vol. 9, No. 4, pp. 457-462, 1999.
[Schlechta, 2000] K. Schlechta. New techniques and completeness results for preferential structures, Journal of Symbolic Logic, Vol. 65, No. 2, pp. 719-746, 2000.
[Schlechta, 2004] K. Schlechta. Coherent Systems, Elsevier, Amsterdam, 2004.
[Sen, 1970] A. K. Sen. Collective Choice and Social Welfare, Holden-Day, San Francisco, CA, 1970.
[Shoham, 1987] Y. Shoham. A semantical approach to nonmonotonic logics. In Proc. Logics in Computer Science, pp. 275-279, Ithaca, N.Y., 1987, and in Proceed. IJCAI 87, pp. 388-392.
[Touretzky, 1986] D. S. Touretzky. The Mathematics of Inheritance Systems, Los Altos/London, 1986.
DEFAULT LOGIC Grigoris Antoniou and Kewen Wang
1 INTRODUCTION: DEFAULT REASONING
When an intelligent system (either computer-based or human) tries to solve a problem, it may be able to rely on complete information about this problem, and its main task is to draw the correct conclusions using classical reasoning. In such cases classical predicate logic may be sufficient. However, in many situations the system has only incomplete information at hand, be it because some pieces of information are unavailable, be it because it has to respond quickly and does not have the time to collect all relevant data. Classical logic has indeed the capacity to represent and reason with certain aspects of incomplete information. But there are occasions in which additional information needs to be “filled in” to overcome the incompleteness, for example because certain decisions must be made. In such cases the system has to make some plausible conjectures, which in the case of default reasoning are based on rules of thumb, called defaults. For example, an emergency doctor has to make some conjectures about the most probable causes of the symptoms observed. Obviously it would be inappropriate to await the results of possibly extensive and time-consuming tests before beginning with the treatment.
When decisions are based on assumptions, these may turn out to be wrong in the face of additional information that becomes available; for example, medical tests may lead to a modified diagnosis. The phenomenon of having to take back some previous conclusions is called nonmonotonicity; it says that if a statement ϕ follows from a set of premises M and M ⊆ M′, ϕ does not necessarily follow from M′. Default Logic, originally presented in [Reiter, 1980], provides formal methods to support this kind of reasoning.
Default Logic is perhaps the most prominent method for nonmonotonic reasoning, basically because of the simplicity of the notion of a default, and because defaults prevail in many application areas.
However, there exist several alternative design decisions which have led to variations of the initial idea; actually we can talk of a family of default reasoning methods because they share the same foundations. In this paper we present the motivations and basic ideas of some of the most important default logic variants, and compare them both with respect to interconnections and the fulfillment of some properties. The key idea underlying all default reasoning methods is the use of rules allowing for “jumping to conclusions” even in the absence of certain information. In other words, this is the proposed solution to the problem of reasoning with incomplete information, as described above. In addition, default reasoning can be viewed as an approach to reasoning with inconsistent information. It is well documented that classical logic “collapses” in the
Handbook of the History of Logic. Volume 8 Dov M. Gabbay and John Woods (Editors) © 2007 Elsevier B.V. All rights reserved.
presence of inconsistent information, in the sense that any conclusion can be drawn. In default reasoning, we can have default rules with conflicting conclusions. One of the main properties of default reasoning is that if the certain knowledge is free of inconsistencies, then the application of default rules cannot lead to inconsistent conclusions holding at the same time.
The particular aims of this chapter are to:
• present the basic ideas of Default Logic.
• equip the readers with skills and methods so that they can apply the concepts of Default Logic to concrete situations.
• give the reader a feeling of the diversity of the topic.
The chapter is organized as follows: Section 2 presents the basics of Reiter’s Default Logic. Section 3 describes some basic default logic variants, based on variations of the original definition. Section 4 studies the notion of preferences in default logics. Section 5 discusses more recent approaches to preferences for logic programs (as a subclass of default theories). No prior knowledge of Default Logic is required, but we assume that the reader is familiar with the notation and the basic concepts of classical logic.
2 DEFAULT LOGIC
2.1 The Notion of a Default
A rule used by football organizers in Germany might be: “A football game shall take place, unless there is snow in the stadium”. This rule of thumb is represented by the default
\[ \frac{\mathit{football} : \neg\mathit{snow}}{\mathit{takesPlace}}. \]
The interpretation of the default is as follows: If there is no information that there will be snow in the stadium, it is reasonable to assume ¬snow and conclude that the game will take place (so preparations can proceed). But if there is a heavy snowfall during the night before the game is scheduled, then this assumption can no longer be made. Now we have definite information that there is snow, so we cannot assume ¬snow, therefore the default cannot be applied. In this case we need to take back the previous conclusion (the game will take place), so the reasoning is nonmonotonic.
Before proceeding with more examples let us first explain why classical logic is not appropriate to model this situation. Of course, we could use the rule football ∧ ¬snow → takesPlace. The problem with this rule is that we have to definitively establish that there will be no snow in the stadium before applying the rule. But that would mean that no game could
be scheduled in the winter, which would create a revolution in Germany! It is important to understand the difference between having to know that it will not snow, and being able to assume that it will not snow. Defaults support the drawing of conclusions based on assumptions.
The same example could have been represented by the default
\[ \frac{\mathit{football} : \mathit{takesPlace}}{\mathit{takesPlace}}, \]
together with the classical rule snow → ¬takesPlace. In case we know snow then we can deduce ¬takesPlace in classical logic, therefore we cannot assume takesPlace, as required by the default. In this representation, the default says “Football matches usually take place”, and exceptions to this rule are represented by classical rules, such as the one above.
Defaults can be used to model prototypical reasoning, which means that most instances of a concept have some property. One example is the statement “Typically, children have (living) parents” which may be expressed by the default
\[ \frac{\mathit{child}(X) : \mathit{hasParents}(X)}{\mathit{hasParents}(X)}. \]
A further form of default reasoning is no-risk reasoning. It concerns situations where we draw a conclusion even if it is not the most probable, because another decision could lead to a disaster. Perhaps the best example is the following main principle of justice in the Western cultures: “In the absence of evidence to the contrary assume that the accused is innocent”. In default form:
\[ \frac{\mathit{accused}(X) : \mathit{innocent}(X)}{\mathit{innocent}(X)}. \]
Defaults naturally occur in many application domains. Let us give an example from legal reasoning. According to German law, a foreigner is usually expelled if they have committed a crime. One of the exceptions to this rule concerns political refugees. This information is expressed by the default
\[ \frac{\mathit{criminal}(X) \wedge \mathit{foreigner}(X) : \mathit{expel}(X)}{\mathit{expel}(X)} \]
in combination with the rule politicalRefugee(X) → ¬expel(X).
Hierarchies with exceptions are commonly used in biology. Here is a standard example:
Typically, molluscs are shell-bearers.
Cephalopods are molluscs.
Cephalopods are not shell-bearers.
It is represented by the default
\[ \frac{\mathit{mollusc}(X) : \mathit{shellBearer}(X)}{\mathit{shellBearer}(X)} \]
together with the rule cephalopod(X) → mollusc(X) ∧ ¬shellBearer(X).
Defaults can be used naturally to model the Closed World Assumption [Reiter, 1977] which is used in database theory, algebraic specification, and logic programming. According to this assumption, an application domain is described by certain axioms (in form of relational facts, equations, rules etc.) with the following understanding: a ground fact (that is, a non-parameterized statement about single objects) is taken to be false in the problem domain if it does not follow from the axioms. The closed world assumption has the simple default representation
\[ \frac{\mathit{true} : \neg\varphi}{\neg\varphi} \]
for each ground atom ϕ. The explanation of the default is: if it is consistent to assume ¬ϕ (which is equivalent to not having a proof for ϕ) then conclude ¬ϕ. Further examples of defaults can be found in, say, [Besnard, 1989; Etherington, 1987b; Łukaszewicz, 1990; Poole, 1994].
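The closed world reading of the default true : ¬ϕ / ¬ϕ can be illustrated with a small Python sketch. It rests on assumptions that go beyond the text: the axioms are ground facts plus ground definite rules (so that "ϕ has no proof" can be decided by forward chaining), and the names `derive` and `closed_world` as well as the family example are hypothetical.

```python
def derive(facts, rules):
    """Forward chaining; each rule is a pair (body, head) of ground atoms."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and set(body) <= known:
                known.add(head)
                changed = True
    return known

def closed_world(ground_atoms, facts, rules):
    """Map every ground atom to True if it is derivable; otherwise the
    closed world default applies and the atom is taken to be False."""
    provable = derive(facts, rules)
    return {atom: atom in provable for atom in ground_atoms}

# Hypothetical toy domain.
atoms = ["parent(ann,bob)", "parent(bob,ann)",
         "ancestor(ann,bob)", "ancestor(bob,ann)"]
facts = ["parent(ann,bob)"]
rules = [(["parent(ann,bob)"], "ancestor(ann,bob)"),
         (["parent(bob,ann)"], "ancestor(bob,ann)")]

for atom, value in closed_world(atoms, facts, rules).items():
    print(atom, value)
```

The restriction to definite rules is a deliberate simplification of this sketch; for arbitrary axioms the consistency test in the default would need a full theorem prover.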
2.2 The Syntax of Default Logic
A default theory T is a pair (W, D) consisting of a set W of predicate logic formulae (called the facts or axioms of T) and a countable set D of defaults. A default δ has the form
\[ \frac{\varphi : \psi_1, \ldots, \psi_n}{\chi} \]
where ϕ, ψ1, . . . , ψn, χ are closed predicate logic formulae, and n > 0. The formula ϕ is called the prerequisite, ψ1, . . . , ψn the justifications, and χ the consequent of δ. Sometimes ϕ is denoted by pre(δ), {ψ1, . . . , ψn} by just(δ), and χ by cons(δ). For a set D of defaults, cons(D) denotes the set of consequents of the defaults in D. A default is called normal iff it has the form $\frac{\varphi : \psi}{\psi}$.
One point that needs some discussion is the requirement that the formulae in a default be ground. This implies that
\[ \frac{\mathit{bird}(X) : \mathit{flies}(X)}{\mathit{flies}(X)} \]
is not a default according to the definition above. Let us call such rules of inference open defaults. An open default is interpreted as a default schema meaning that it represents a set of defaults (this set may be infinite).
A default schema looks like a default, the only difference being that ϕ, ψ1, . . . , ψn, χ are arbitrary predicate logic formulae (i.e. they may contain free variables). A default schema defines a set of defaults, namely
\[ \frac{\varphi\sigma : \psi_1\sigma, \ldots, \psi_n\sigma}{\chi\sigma} \]
for all ground substitutions σ that assign values to all free variables occurring in the schema. That means, free variables are interpreted as being universally quantified over the whole default schema. Given a default schema
\[ \frac{\mathit{bird}(X) : \mathit{flies}(X)}{\mathit{flies}(X)} \]
and the facts bird(tweety) and bird(sam), the default theory represented is
\[ \left( \{\mathit{bird}(\mathit{tweety}), \mathit{bird}(\mathit{sam})\}, \left\{ \frac{\mathit{bird}(\mathit{tweety}) : \mathit{flies}(\mathit{tweety})}{\mathit{flies}(\mathit{tweety})}, \frac{\mathit{bird}(\mathit{sam}) : \mathit{flies}(\mathit{sam})}{\mathit{flies}(\mathit{sam})} \right\} \right). \]
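The instantiation of a default schema by ground substitutions can be sketched in a few lines of Python. This is an illustration only: formulae are represented as plain strings, a default as a triple (prerequisite, justifications, consequent), and the function name `ground` is hypothetical. The set of constants is assumed to be finite, so the sketch does not cover schemata with infinitely many instances.

```python
import re
from itertools import product

def ground(schema, variables, constants):
    """Return all instances of a default schema obtained by assigning a
    constant to every variable (one instance per ground substitution)."""
    pre, justs, cons = schema
    if not variables:
        return [schema]
    # Match variable names as whole words only, and substitute simultaneously.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, variables)) + r")\b")
    instances = []
    for values in product(constants, repeat=len(variables)):
        sigma = dict(zip(variables, values))

        def apply(formula, sigma=sigma):
            return pattern.sub(lambda m: sigma[m.group(1)], formula)

        instances.append((apply(pre), tuple(apply(j) for j in justs), apply(cons)))
    return instances

bird_schema = ("bird(X)", ("flies(X)",), "flies(X)")
for default in ground(bird_schema, ["X"], ["tweety", "sam"]):
    print(default)
```

For the schema and facts above this produces exactly the two defaults for tweety and sam; a schema with three variables over four constants would produce 64 instances.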
2.3 Informal Discussion of the Semantics
Given a default $\frac{\varphi : \psi_1, \ldots, \psi_n}{\chi}$, its informal meaning is the following:
If ϕ is known, and if it is consistent to assume ψ1, . . . , ψn, then conclude χ.
In order to formalize this interpretation we must say in which context ϕ should be known, and with what ψ1, . . . , ψn should be consistent. A first guess would be the set of facts, but this turns out to be inappropriate. Consider the default schema
\[ \frac{\mathit{friend}(X, Y) \wedge \mathit{friend}(Y, Z) : \mathit{friend}(X, Z)}{\mathit{friend}(X, Z)} \]
which says “Usually my friends’ friends are also my friends”. Given the information friend(tom, bob), friend(bob, sally) and friend(sally, tina), we would like to conclude friend(tom, tina). But this is only possible if we apply the appropriate instance of the default schema to friend(sally, tina) and friend(tom, sally). The latter formula stems from a previous application of the default schema (with other instantiations, of course). If we did not admit this intermediate step and used the original facts only, then we could not get the expected result.
Another example is the default theory T = (W, D) with W = {green, aaaMember} and D = {δ1, δ2} with
\[ \delta_1 = \frac{\mathit{green} : \neg\mathit{likesCars}}{\neg\mathit{likesCars}}, \qquad \delta_2 = \frac{\mathit{aaaMember} : \mathit{likesCars}}{\mathit{likesCars}}. \]
If consistency of the justifications was tested against the set of facts, then both defaults could be subsequently applied. But then we would conclude both likesCars and ¬likesCars
which is a contradiction. It is unintuitive to let the application of default rules lead to an inconsistency, even if they contradict each other. Instead, if we applied the first default, and then checked application of the second with respect to the current knowledge collected so far, the second default would be blocked: from the application of the first default we know ¬likesCars, so it is not consistent to assume likesCars. After these examples, here is the formal definition:
A default $\delta = \frac{\varphi : \psi_1, \ldots, \psi_n}{\chi}$ is applicable to a deductively closed set of formulae E iff ϕ ∈ E and ¬ψ1 ∉ E, . . . , ¬ψn ∉ E.
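The applicability test can be made concrete for a restricted setting. The Python sketch below assumes, beyond what the text states, that all formulae are propositional literals written as strings with `~` for negation, that a knowledge base is a set of literals standing for its deductive closure, and that the prerequisite `"true"` always holds. Under these assumptions membership in Th(E) reduces to membership in E as long as E is consistent. The names `neg`, `consistent` and `applicable` are hypothetical.

```python
def neg(literal):
    """Negate a literal written as 'p' or '~p'."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def consistent(E):
    """A set of literals is consistent iff it contains no complementary pair."""
    return not any(neg(l) in E for l in E)

def applicable(default, E):
    """default = (prerequisite, justifications, consequent).
    Applicable iff the prerequisite is in E and no negated justification is.
    An inconsistent E entails every formula, so nothing is applicable to it."""
    pre, justs, _cons = default
    if not consistent(E):
        return False
    return (pre == "true" or pre in E) and all(neg(j) not in E for j in justs)

d1 = ("green", ("~likesCars",), "~likesCars")
d2 = ("aaaMember", ("likesCars",), "likesCars")
facts = {"green", "aaaMember"}

print(applicable(d1, facts), applicable(d2, facts))   # both applicable to the facts
print(applicable(d2, facts | {"~likesCars"}))         # blocked after applying d1
```

This reproduces the observation above: tested against the facts alone both defaults apply, whereas tested against the knowledge collected so far the second one is blocked.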
The example of Greens and AAA members indicates that there can be several competing current knowledge bases which may be inconsistent with one another. The semantics of Default Logic will be given in terms of extensions that will be defined as the current knowledge bases satisfying some conditions. Intuitively, extensions represent possible world views which are based on the given default theories; they seek to extend the set of known facts with “reasonable” conjectures based on the available defaults. The formal definition will be given in the next subsection. Here we just collect some desirable properties of extensions.
• An extension E should include the set W of facts since W contains the certain information available: W ⊆ E.
• An extension E should be deductively closed because we do not want to prevent classical logical reasoning. Actually, we want to draw more conclusions and that is why we apply default rules in addition. Formally: E = Th(E), where Th denotes the deductive closure.
• An extension E should be closed under the application of defaults in D (formally: if $\frac{\varphi : \psi_1, \ldots, \psi_n}{\chi}$ ∈ D, ϕ ∈ E and ¬ψ1 ∉ E, . . . , ¬ψn ∉ E, then χ ∈ E). That is, we do not stop applying defaults until we are forced to. The explanation is that there is no reason to stop at some particular stage if more defaults might be applied; extensions are maximal possible world views.
These properties are certainly insufficient because they do not include any “upper bound”, that is, they don’t provide any information about which formulae should be excluded from an extension. So we should require that an extension E be minimal with respect to these properties. Unfortunately, this requirement is still insufficient. To see this consider the default theory T = (W, D) with W = {aussie} and D = {$\frac{\mathit{aussie} : \mathit{drinksBeer}}{\mathit{drinksBeer}}$}. Let E = Th({aussie, ¬drinksBeer}).
It is easily checked that E is minimal with the three properties mentioned above, but it would be highly unintuitive to accept it as an extension, since that would support the following argument: “If Aussies usually drink Beer and if somebody is an Aussie, then assume that she does not drink Beer”.
2.4 An Operational Definition of Extensions
For a given default theory T = (W, D) let Π = (δ0 , δ1 , . . .) be a finite or infinite sequence of defaults from D without multiple occurrences. Think of Π as a possible order in which
we apply some defaults from D. Of course, we don’t want to apply a default more than once within such a reasoning chain because no additional information would be gained by doing so. We denote the initial segment of Π of length k by Π[k], provided the length of Π is at least k (from now on, this assumption is always made when referring to Π[k]). With each such sequence Π we associate two sets of first-order formulae, In(Π) and Out(Π):
• In(Π) is Th(W ∪ {cons(δ) | δ occurs in Π}). So, In(Π) collects the information gained by the application of the defaults in Π and represents the current knowledge base after the defaults in Π have been applied.
• Out(Π) = {¬ψ | ψ ∈ just(δ) for some δ occurring in Π}. So, Out(Π) collects formulae that should not turn out to be true, i.e. that should not become part of the current knowledge base even after subsequent application of other defaults.
Let us give a simple example. Consider the default theory T = (W, D) with W = {a} and D containing the following defaults:
\[ \delta_1 = \frac{a : \neg b}{\neg b}, \qquad \delta_2 = \frac{b : c}{c}. \]
For Π = (δ1) we have In(Π) = Th({a, ¬b}) and Out(Π) = {b}. For Π = (δ2, δ1) we have In(Π) = Th({a, c, ¬b}) and Out(Π) = {¬c, b}.
Up to now we have not assured that the defaults in Π can be applied in the order given. In the example above, (δ2, δ1) cannot be applied in this order (applied according to the definition in the previous subsection). To be more specific, δ2 cannot be applied, since b ∉ In(()) = Th(W) = Th({a}) which is the current knowledge before we attempt to apply δ2. On the other hand, there is no problem with Π = (δ1); in this case we say that Π is a process of T. Here is the formal definition:
• Π is called a process of T iff δk is applicable to In(Π[k]), for every k such that δk occurs in Π.
Given a process Π of T we define the following:
• Π is successful iff In(Π) ∩ Out(Π) = ∅, otherwise it is failed.
• Π is closed iff every δ ∈ D that is applicable to In(Π) already occurs in Π.
Closed processes correspond to the desired property of an extension E being closed under application of defaults in D. Consider the default theory T = (W, D) with W = {a} and D containing the following defaults:
\[ \delta_1 = \frac{a : \neg b}{d}, \qquad \delta_2 = \frac{\mathit{true} : c}{b}. \]
Π1 = (δ1) is successful but not closed since δ2 may be applied to In(Π1) = Th({a, d}). Π2 = (δ1, δ2) is closed but not successful: both In(Π2) = Th({a, d, b}) and Out(Π2) = {b, ¬c} contain b. On the other hand, Π3 = (δ2) is a closed and successful process of T. According to the following definition, which was first introduced in [Antoniou and Sperschneider, 1994], In(Π3) = Th({a, b}) is an extension of T, in fact its single extension.
DEFINITION 1. A set of formulae E is an extension of the default theory T iff there is some closed and successful process Π of T such that E = In(Π).
In examples it is often useful to arrange all possible processes in a canonical manner within a tree, called the process tree of the given default theory T. The nodes of the tree are labeled with two sets of formulae, an In-set (to the left of the node) and an Out-set (to the right of the node). The edges correspond to default applications and are labeled with the default that is being applied. The paths of the process tree starting at the root correspond to processes of T.
Figure 1. Process tree: the root is labeled with In-set Th(∅) and Out-set ∅; its only child is labeled with In-set Th({¬a}) and Out-set {¬a} and is marked failed.
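The notions In(Π), Out(Π), process, successful and closed can be checked mechanically for small theories. The following Python sketch is a simplification under stated assumptions, not the general definition: formulae are propositional literals (strings, `~` for negation), a set of literals stands for its deductive closure, `"true"` is a prerequisite that always holds, and a sequence Π is a Python list of default triples. All function names are hypothetical.

```python
def neg(literal):
    """Negate a literal written as 'p' or '~p'."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def applicable(default, E):
    """Prerequisite in E, no negated justification in E, and E consistent."""
    pre, justs, _cons = default
    if any(neg(l) in E for l in E):        # inconsistent E entails everything
        return False
    return (pre == "true" or pre in E) and all(neg(j) not in E for j in justs)

def in_set(W, pi):
    """Literals generating In(pi): the facts plus all consequents in pi."""
    return set(W) | {cons for (_pre, _justs, cons) in pi}

def out_set(pi):
    """Out(pi): the negations of all justifications of defaults in pi."""
    return {neg(j) for (_pre, justs, _cons) in pi for j in justs}

def is_process(W, pi):
    """Each default must be applicable to In of the segment preceding it."""
    return all(applicable(d, in_set(W, pi[:k])) for k, d in enumerate(pi))

def is_successful(W, pi):
    return not (in_set(W, pi) & out_set(pi))

def is_closed(W, D, pi):
    return all(d in pi for d in D if applicable(d, in_set(W, pi)))

# The example above: W = {a}, d1 = a : ~b / d, d2 = true : c / b.
W = {"a"}
d1 = ("a", ("~b",), "d")
d2 = ("true", ("c",), "b")
D = [d1, d2]

for pi in ([d1], [d1, d2], [d2]):
    print(len(pi), is_process(W, pi), is_successful(W, pi), is_closed(W, D, pi))
```

On this example the sketch agrees with the discussion: (δ1) is successful but not closed, (δ1, δ2) is closed but failed, and (δ2) is both closed and successful.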
2.5 Some Examples
Let T = (W, D) with W = ∅ and D = {$\frac{\mathit{true} : a}{\neg a}$}. The process tree in Figure 1 shows that T has no extensions. Indeed, the default may be applied because there is nothing preventing us from assuming a. But when the default is applied, the negation of a is added to the current knowledge base, so the default invalidates its own application because both the In-set and the Out-set contain ¬a. This example demonstrates that a default theory need not have an extension.

Let T = (W, D) be the default theory with W = ∅ and D = {δ1, δ2} with

$$\delta_1 = \frac{\mathit{true} : p}{\neg q}, \qquad \delta_2 = \frac{\mathit{true} : q}{r}.$$
The process tree of T is found in Figure 2 and shows that T has exactly one extension, namely Th({¬q}). The right path of the tree shows an example where application of a default destroys the applicability of a previous default: δ1 can be applied after δ2, but then ¬q becomes part of the In-set, whilst it is also included in the Out-set (as the negation of the justification of δ2).

[Figure 2. Process tree: root with In-set Th(∅) and Out-set ∅. Edge δ1 leads to a node with In-set Th({¬q}) and Out-set {¬p}, closed & successful. Edge δ2 leads to a node with In-set Th({r}) and Out-set {¬q}; from there, edge δ1 leads to a node with In-set Th({¬q, r}) and Out-set {¬q, ¬p}, failed.]

Let T = (W, D) with W = {green, aaaMember} and D = {δ1, δ2} with

$$\delta_1 = \frac{\mathit{green} : \neg \mathit{likesCars}}{\neg \mathit{likesCars}}, \qquad \delta_2 = \frac{\mathit{aaaMember} : \mathit{likesCars}}{\mathit{likesCars}}.$$

The process tree in Figure 3 shows that T has exactly two extensions (where g stands for green, a for aaaMember, and l for likesCars).

[Figure 3. Process tree: root with In-set Th({g, a}) and Out-set ∅. One edge leads to a node with In-set Th({g, a, l}) and Out-set {¬l}, closed & successful; the other leads to a node with In-set Th({g, a, ¬l}) and Out-set {l}, closed & successful.]
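The three process trees of this subsection can be explored by a depth-first search. The sketch below is an illustration rather than a reference implementation: it assumes propositional theories, encodes formulae as Python boolean expressions, decides entailment by brute force, and identifies each extension by the set of consequents that its closed and successful process adds to W. The function names are hypothetical.

```python
from itertools import product

def entails(premises, goal, atoms):
    """Brute force: every assignment satisfying all premises satisfies goal."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(eval(p, {}, world) for p in premises) and not eval(goal, {}, world):
            return False
    return True

def extensions(W, D, atoms):
    """For each closed and successful process, the consequents it adds to W."""
    found = set()

    def explore(process):  # process: list of indices into D
        in_gen = list(W) + [D[i][2] for i in process]
        out = [f"not ({psi})" for i in process for psi in D[i][1]]
        if any(entails(in_gen, o, atoms) for o in out):
            return  # failed node; In and Out only grow, so continuations fail too
        candidates = [i for i, (pre, justs, _) in enumerate(D)
                      if i not in process
                      and entails(in_gen, pre, atoms)
                      and not any(entails(in_gen, f"not ({psi})", atoms)
                                  for psi in justs)]
        if not candidates:  # closed and successful
            found.add(frozenset(D[i][2] for i in process))
        for i in candidates:
            explore(process + [i])

    explore([])
    return found

no_ext = extensions([], [("True", ["a"], "not a")], ["a"])
one_ext = extensions([], [("True", ["p"], "not q"), ("True", ["q"], "r")],
                     ["p", "q", "r"])
two_ext = extensions(["green", "aaaMember"],
                     [("green", ["not likesCars"], "not likesCars"),
                      ("aaaMember", ["likesCars"], "likesCars")],
                     ["green", "aaaMember", "likesCars"])
print(no_ext, one_ext, two_ext)
```

The search visits exactly the nodes of the process tree, so failed branches are explored before being discarded, as in Figures 1 to 3.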
2.6 Reiter’s Original Definition of Extensions
In this subsection we present Reiter’s original definition of extensions [Reiter, 1980]. In subsection 2.3 we briefly explained that the most difficult problem in describing the meaning of a default is to determine the appropriate set with which the justifications of the defaults must be consistent. The approach adopted by Reiter is to fix some theory beforehand. That is, choose a theory which plays the role of a context or belief set and always check consistency against this context. Let us formalize this notion:

• A default $\delta = \frac{\varphi : \psi_1, \ldots, \psi_n}{\chi}$ is applicable to a deductively closed set of formulae F with respect to belief set E (the aforementioned context) iff ϕ ∈ F, and ¬ψ1 ∉ E, . . . , ¬ψn ∉ E (that is, each ψi is consistent with E).

Note that the concept “δ is applicable to E” used so far is a special case where E = F.

The next question that arises is which contexts to use. Firstly note that when a belief set E has been established some formulae will become part of the knowledge base by applying defaults with respect to E. Therefore they should be believed, i.e. be members of E. On the other hand, what would be a justification for a belief if it were not obtained from default application w.r.t. E? We require that E contain only formulae that can be derived from the axioms by default application w.r.t. E. Let us now give a formal presentation of these ideas.

For a set D of defaults, we say that F is closed under D with respect to belief set E iff, for every default δ in D that is applicable to F with respect to belief set E, its consequent χ is also contained in F.

Given a default theory T = (W, D) and a set of formulae E, let ΛT(E) be the least set of formulae that contains W, is closed under logical consequence (i.e. first-order deduction), and is closed under D with respect to E. Informally speaking, ΛT(E) is the set of formulae that are sanctioned by the default theory T with respect to the belief set E. Now, according to Reiter’s definition, E is an extension of T iff E = ΛT(E).

This fixpoint definition says that E is an extension iff by deciding to use E as a belief set, exactly the formulae in E will be obtained from default application. But please note the difficulty in applying this definition: we have to guess E and subsequently check for the fulfillment of the fixpoint equation. Having to guess is one of the most serious obstacles in understanding the concepts of Default Logic and in being able to apply them to concrete cases.

The following theorem shows that Reiter’s extension concept is equivalent to the definition in subsection 2.4.

THEOREM 2. Let T = (W, D) be a default theory.
E is an extension of T (in the sense of Definition 1) iff E = ΛT(E).

We conclude by giving a quasi-inductive characterization of extensions, also due to Reiter. Given a default theory T = (W, D), we say that E has a quasi-inductive definition in T iff $E = \bigcup_{i} E_i$, where E0 = Th(W) and Ei+1 = Th(Ei ∪ {cons(δ) | δ ∈ D is applicable to Ei w.r.t. belief set E}).

THEOREM 3. E is an extension of T iff E has a quasi-inductive definition in T.
This characterization replaces the ΛT-operator by a construction, both of them using the set E as context or belief set. Given a set of formulae E, this characterization is intuitively appealing. But notice that it is still necessary to first guess E before checking whether it is an extension. In this sense the characterization is not as easy to apply as the process model from subsection 2.4. The relationship of processes to the quasi-inductive definition is that the traversal of the process tree operationalizes the idea of guessing. More formally: if a branch of the process tree leads to a closed and successful process Π, then the quasi-inductive construction using In(Π) as a belief set yields the same result. But some branches of the process tree can lead to failed processes; this is the price we have to pay if we wish to avoid guessing.
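The guess-and-check character of the quasi-inductive definition can be made concrete. The following sketch assumes a propositional theory, encodes formulae as Python boolean expressions, and represents the guessed belief set E by a finite list of generating formulae; these representation choices and the function names are assumptions made for illustration only.

```python
from itertools import product

def entails(premises, goal, atoms):
    """Brute force: every assignment satisfying all premises satisfies goal."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(eval(p, {}, world) for p in premises) and not eval(goal, {}, world):
            return False
    return True

def has_quasi_inductive_definition(W, D, E, atoms):
    """Check E = union of the E_i for a guessed E given by generating formulae."""
    current = list(W)  # generators of E_0 = Th(W)
    while True:
        new = [cons for (pre, justs, cons) in D
               if cons not in current
               and entails(current, pre, atoms)            # prerequisite in E_i
               and not any(entails(E, f"not ({psi})", atoms)
                           for psi in justs)]              # consistency w.r.t. E
        if not new:
            break
        current += new  # generators of E_{i+1}
    # the union of the E_i and the guess E must be logically equivalent
    return (all(entails(current, e, atoms) for e in E)
            and all(entails(E, c, atoms) for c in current))

# Second example of subsection 2.5: delta_1 = true : p / not q, delta_2 = true : q / r
D = [("True", ["p"], "not q"), ("True", ["q"], "r")]
ATOMS = ["p", "q", "r"]
for guess in (["not q"], ["r"], ["not q", "r"], []):
    print(guess, has_quasi_inductive_definition([], D, guess, ATOMS))
```

Only the guess generated by ¬q passes the check; the other guesses must each be tried and rejected, which is the guessing overhead discussed above.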
2.7 An Argumentation-theoretic Characterization
Argumentation provides an abstract view of nonmonotonic reasoning. It is based on the consideration of arguments and their possible defeat by counterarguments. In the following we briefly describe the characterisation of default logic in the argumentation framework of [Bondarenko et al., 1997].

Underlying any argumentation framework is a deductive basis, which consists of the logical language L and a set R of inference rules. The deductive basis defines a syntactic provability relation ⊢; in this section Th(T) denotes the set {α ∈ L | T ⊢ α}. An argumentation-based framework (w.r.t. a deductive basis) consists of

• a set W of formulae, representing the certain knowledge;

• a set Ab of formulae, representing the possible assumptions;

• a function $\overline{\,\cdot\,}$ : Ab → For, with the idea that $\overline{\alpha}$ represents the contrary of α ∈ Ab.

A set ∆ ⊆ Ab attacks an assumption α ∈ Ab iff W ∪ ∆ ⊢ $\overline{\alpha}$. A set of assumptions ∆ attacks a set of assumptions ∆′ iff there is an α ∈ ∆′ such that ∆ attacks α. A set of assumptions ∆ ⊆ Ab is stable iff

(a) ∆ is closed: ∆ = {α ∈ Ab | W ∪ ∆ ⊢ α}.

(b) ∆ does not attack itself.

(c) ∆ attacks every α ∉ ∆.

If ∆ is stable, then Th(W ∪ ∆) is called a stable extension. The concepts of attack and stability are independent of the particular argumentation-based framework. In fact they have been used to characterise several nonmonotonic reasoning approaches; see [Bondarenko et al., 1997] for details. In the following we show how Default Logic can be embedded into this framework.

Let T = (W, D) be a default theory. We define its translation arg(T) into the argumentation-based framework. First we define the deductive basis. The language L consists of the first-order language L0 of T, extended by additional predicate symbols Mα for every closed formula α in the language of T. Let R0 be a deductively complete set of inference rules for predicate logic (in the language L0). We add the following inference rules, which correspond to each of the defaults in T. Essentially we wish to infer the consequent of a default if we have assumed all its justifications and we have already inferred its prerequisite. Formally:

$$L = L_0 \cup \{ M\alpha \mid \alpha \text{ is a closed formula in } L_0 \},$$

$$R = R_0 \cup \left\{ \frac{\varphi, M\psi_1, \ldots, M\psi_n}{\chi} \;\middle|\; \frac{\varphi : \psi_1, \ldots, \psi_n}{\chi} \in D \right\}.^{2}$$

² Note that the rules in R are not defaults, but rather inference rules in the sense of classical logic: if all formulas above the line have been derived, then we may also derive the formula below the line.
The argumentation-based framework arg(T) = (W, Ab, $\overline{\,\cdot\,}$) is now defined as follows:

• Ab = {Mψ | ψ ∈ just(δ) for δ ∈ D}.

• $\overline{M\alpha} = \neg\alpha$.

THEOREM 4. E is an extension of T iff there is a stable extension E′ of arg(T) such that E = E′ ∩ L0.

The proof is found in [Bondarenko et al., 1997]. As an example, consider the default theory with the two defaults

$$\frac{\mathit{true} : p}{q}, \qquad \frac{\mathit{true} : r}{\neg p}.$$

In the translation we have

$$Ab = \{Mp, Mr\}, \qquad R = R_0 \cup \left\{ \frac{Mp}{q}, \frac{Mr}{\neg p} \right\}.$$

∆ = {Mr} is stable: (i) it is closed since we can only infer the assumption Mr using ∆ (this is true for any default theory); (ii) it does not attack itself; (iii) it attacks the assumption Mp not in ∆: ∆ ⊢ ¬p = $\overline{Mp}$. On the other hand ∆′ = {Mp, Mr} is not stable because it attacks itself: ∆′ ⊢ ¬p = $\overline{Mp}$ and Mp ∈ ∆′. Finally, ∆′′ = {Mp} is not stable because it does not attack Mr, which is not included in ∆′′.
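The stability test of the example can be replayed in a few lines. This is a sketch under simplifying assumptions: the theory is propositional, formulae are Python boolean expressions, an assumption Mψ is represented simply by the string ψ, and provability in the extended rule set is approximated by forward chaining over the translated rules plus brute-force classical entailment. All names are hypothetical.

```python
from itertools import product

def entails(premises, goal, atoms):
    """Brute force: every assignment satisfying all premises satisfies goal."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(eval(p, {}, world) for p in premises) and not eval(goal, {}, world):
            return False
    return True

def derived(W, D, delta, atoms):
    """L0-formulae derivable from W and the assumptions in delta via the rules R."""
    result = list(W)
    changed = True
    while changed:
        changed = False
        for pre, justs, cons in D:
            # a translated rule fires once all M psi_i are assumed and pre is derived
            if set(justs) <= delta and cons not in result and entails(result, pre, atoms):
                result.append(cons)
                changed = True
    return result

def is_stable(W, D, delta, atoms):
    ab = {psi for (_, justs, _) in D for psi in justs}   # psi stands for M psi
    facts = derived(W, D, delta, atoms)
    inconsistent = entails(facts, "False", atoms)
    closed = (not inconsistent) or delta == ab           # M-atoms follow only from delta
    attacked = {psi for psi in ab if entails(facts, f"not ({psi})", atoms)}
    return closed and not (attacked & delta) and (ab - delta) <= attacked

D = [("True", ["p"], "q"), ("True", ["r"], "not p")]
ATOMS = ["p", "q", "r"]
for delta in ({"r"}, {"p", "r"}, {"p"}):
    print(sorted(delta), is_stable([], D, delta, ATOMS))
```

For ∆ = {Mr} the derived L0-part is generated by ¬p, which by Theorem 4 corresponds to the extension Th({¬p}) of the default theory.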
2.8 Operational semantics
In this section we will give a semantic characterisation of default logic in the semantic framework of [Teng, 1996]. Consider the default theory T consisting of the fact p and the defaults $\delta_1 = \frac{p : q}{q}$ and $\delta_2 = \frac{q : r}{r}$. Obviously T has the single extension Th({p, q, r}). It is obtained from the process (δ1, δ2) (see [Antoniou, 1998]). Let W be the set of all possible (total) worlds over the propositional language {p, q, r}. The semantic counterpart of the process above is the following so-called default partition sequence: S = <W0, W1, W2, W3>, where

W0 = {w | w |= ¬p}
W1 = {w | w |= (p ∧ ¬q)}
W2 = {w | w |= (p ∧ q ∧ ¬r)}
W3 = {w | w |= (p ∧ q ∧ r)}.

First note that S forms a partition of W. W0 includes all worlds in which the fact p is false. W1 includes those worlds not in W0 in which the consequent of the first default is false. Equally W2 contains those worlds not in W0 ∪ W1 in which the consequent of the second default is not true. Finally W3 consists of the remaining worlds, in which the fact and the consequents of the defaults applied are true. The set of all formulae true in W3 is an extension of T, indeed its only extension.

Essentially <Wi, . . . , Wl> defines a “frame of reference”, that is, the body of knowledge which builds the current context before applying the i-th rule. Each reasoning step is performed with respect to the current context. Once an inference is made, the frame of reference is updated by pruning out additional worlds, those in which the new conclusion is false.

Please note also that the prerequisite p of the first default applied is true in all worlds in W1 ∪ W2 ∪ W3, and that there is at least one world in W3 in which the justification of δ1 is true. This reflects the property of success in the process model of Default Logic: when a default is applied, its justifications must be consistent not only with the current knowledge base, but also with the final result of the branch of the process tree (that is, with the In-set of the closed process).

For simplicity we give the semantics in the propositional case. Let Σ be a propositional signature, that is, a set of propositional atoms. We call w a world (in Σ) iff w ⊆ Σ ∪ ¬Σ, and for every p ∈ Σ either p ∈ w or ¬p ∈ w. w |= ϕ denotes validity of the formula ϕ in w. Let W be the set of all possible worlds (in Σ). A partition sequence of W is a tuple <W1, . . . , Wl> (l ≥ 1) such that the non-empty elements Wi form a partition of W.

DEFINITION 5. Let T = (W, D) be a default theory. A default partition sequence for T is a partition sequence S = <W0, . . . , Wl> such that there is a sequence P = <δ1, . . . , δl−1> of defaults satisfying the following conditions:

(a) For all i = 1, . . . , l − 1, Wi = {w | w ∉ W0 ∪ . . . ∪ Wi−1 and w ⊭ cons(δi)}.

(b) For all i = 1, . . . , l − 1:
   (i) ∀w ∈ Wi ∪ . . . ∪ Wl : w |= pre(δi)
   (ii) ∀ψ ∈ just(δi) ∃w ∈ Wl : w |= ψ

(c) There is no default δ ∉ {δ1, . . . , δl−1} which is applicable in the sense of (b) (replacing δi by δ, and i by l).

Condition (a) ensures that every time a default is applied, the worlds in which its consequent is false are disregarded from further consideration. Condition (b) ensures applicability of the respective default in the current frame of reference. Finally (c) corresponds to the closure property of processes.

THEOREM 6. If E is an extension of T, then there is a default partition sequence S = <W0, . . . , Wl> of T such that E = {ϕ | ∀w ∈ Wl : w |= ϕ}.
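The partition sequence of the introductory example can be computed by successively pruning worlds. The sketch below assumes the propositional case, represents a world as a dictionary from atoms to truth values, and writes formulae as Python boolean expressions; these choices and the function names are illustrative assumptions, and condition (c) is not checked.

```python
from itertools import product

def holds(formula, world):
    """Evaluate a formula (a Python boolean expression) in a total world."""
    return bool(eval(formula, {}, world))

def partition_sequence(facts, process, atoms):
    """Cells W0, ..., Wl obtained by pruning worlds that falsify each new conclusion."""
    remaining = [dict(zip(atoms, v)) for v in product([False, True], repeat=len(atoms))]
    cells = []
    for formulas in [facts] + [[cons] for (_, _, cons) in process]:
        cells.append([w for w in remaining if not all(holds(f, w) for f in formulas)])
        remaining = [w for w in remaining if all(holds(f, w) for f in formulas)]
    return cells + [remaining]

def satisfies_condition_b(cells, process):
    """Condition (b): prerequisites hold in all later cells, justifications in some final world."""
    final = cells[-1]
    for i, (pre, justs, _) in enumerate(process, start=1):
        later = [w for cell in cells[i:] for w in cell]
        if not all(holds(pre, w) for w in later):
            return False
        if not all(any(holds(psi, w) for w in final) for psi in justs):
            return False
    return True

process = [("p", ["q"], "q"), ("q", ["r"], "r")]   # (delta_1, delta_2)
S = partition_sequence(["p"], process, ["p", "q", "r"])
print([len(cell) for cell in S], satisfies_condition_b(S, process))
```

The final cell contains the single world in which p, q and r are all true, matching W3 in the example.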
3 VARIANTS OF DEFAULT LOGIC
3.1 A Discussion of Properties
Here we discuss some properties of Default Logic. Some of these properties can be interpreted as deficiencies, or they highlight some of Reiter’s original “design decisions” and show alternative ideas that could be followed instead. In this sense the discussion in this section motivates alternative approaches that will be presented in subsequent sections. One point that should be stressed is that there is no single “correct” default logic approach, but rather the one most appropriate for the concrete problem at hand. Different intuitions lead to different approaches that may work better for some applications and worse for others.

Existence of Extensions

We saw that a default theory may not have any extensions. Is this a shortcoming of Default Logic? One might hold the view that if the default theory includes “nonsense” (for example $\frac{\mathit{true} : p}{\neg p}$), then the logic should indeed be allowed to provide no answer. According to this view, it is up to the user to provide meaningful information in the form of meaningful facts and defaults; after all, if a program contains an error, we don’t blame the programming language.

The opposite view regards nonexistence of extensions as a drawback, and would prefer a more “fault-tolerant” logic; one which works even if some pieces of information are deficient. This viewpoint is supported by the trend towards heterogeneous information sources, where it is not easy to identify which source is responsible for the deficiency, or where the single pieces of information are meaningful, but lead to problems when put together.

A more technical argument in favor of the second view is the concept of semi-monotonicity. Default Logic is a method for performing nonmonotonic reasoning, so we cannot expect it to be monotonic when new knowledge is added to the set of facts. However we might expect that the addition of new defaults would yield more, and not less information³.
Formally, semi-monotonicity means the following: Let T = (W, D) and T′ = (W, D′) be default theories such that D ⊆ D′. Then for every extension E of T there is an extension E′ of T′ such that E ⊆ E′.

Default Logic violates this property. For example, T = (∅, {$\frac{\mathit{true} : p}{p}$}) has the single extension E = Th({p}), but T′ = (∅, {$\frac{\mathit{true} : p}{p}$, $\frac{\mathit{true} : q}{\neg q}$}) has no extension. So nonexistence of extensions leads to the violation of semi-monotonicity. Even though the concept of semi-monotonicity is not equivalent to the existence of extensions, these two properties usually come together (for a more formal support of this claim see [Antoniou et al., 1996]).

³ Some researchers would disagree with this view and regard semi-monotonicity as not desirable; see, for example, [Brewka, 1991].

If we adopt the view that the possible nonexistence of extensions is a problem, then there are two alternative solutions. The first one consists in restricting attention to those classes of default theories for which the existence of extensions is guaranteed. Already in his classical paper [Reiter, 1980] Reiter showed that if all defaults in a theory T are normal (in which case T is called a normal default theory), then T has at least one extension. Essentially this is because all processes are successful, as can be easily seen.

THEOREM 7. Normal default theories always have extensions. Furthermore they satisfy semi-monotonicity.

One problem with the restriction to normal default theories is that their expressiveness is limited. In general it can be shown that normal default theories are strictly less expressive than general default theories. Normal defaults have limitations particularly regarding the interaction among defaults. Consider the example

Bill is a high school dropout.
Typically, high school dropouts are adults.
Typically, adults are employed.

These facts are naturally represented by the normal default theory

$$T = \left( \{dropout(bill)\}, \left\{ \frac{dropout(X) : adult(X)}{adult(X)}, \frac{adult(X) : employed(X)}{employed(X)} \right\} \right).$$

T has the single extension Th({dropout(bill), adult(bill), employed(bill)}). It is acceptable to assume that Bill is adult, but it is counterintuitive to assume that Bill is employed! That is, whereas the second default on its own is accurate, we want to prevent its application in case the adult X is a high school dropout. This can be achieved if we change the second default to

$$\frac{adult(X) : employed(X) \wedge \neg dropout(X)}{employed(X)}.$$

But this default is not normal⁴. Defaults of this form are called semi-normal; [Etherington, 1987a] studied this class of default theories, and gave a sufficient condition for the existence of extensions. Another way of expressing interactions among defaults is the use of explicit priorities; this approach will be further discussed in section 4.
Instead of imposing restrictions on the form of defaults in order to guarantee the existence of extensions, the other principal way is to modify the concept of an extension in such a way that all default theories have at least one extension, and that semi-monotonicity is guaranteed. In sections 3.2 and 3.3 we will discuss two important variants with these properties, Łukaszewicz’ Justified Default Logic and Schaub’s Constrained Default Logic.

Joint Consistency of Justifications

It is easy to see that the default theory consisting of the defaults $\frac{\mathit{true} : p}{q}$ and $\frac{\mathit{true} : \neg p}{r}$ has the single extension Th({q, r}). This shows that the joint consistency of justifications is not required. Justifications are not supposed to form a consistent set of beliefs, rather they are used to sanction “jumping” to some conclusions.

⁴ Note that it is unreasonable to add ¬dropout(X) to the prerequisite of the default to keep it normal, because then we would have to definitely know that an adult is not a high school dropout before concluding that the person is employed.
This design decision is natural and makes sense for many cases, but can also lead to unintuitive results. As an example consider the default theory, due to Poole, which says that, by default, a robot’s arm (say a or b) is usable unless it is broken; further we know that either a or b is broken. Given this information, we would not expect both a and b to be usable.

Let us see how Default Logic treats this example. Consider the default theory T = (W, D) with W = {broken(a) ∨ broken(b)} and D consisting of the defaults

$$\frac{\mathit{true} : usable(a) \wedge \neg broken(a)}{usable(a)}, \qquad \frac{\mathit{true} : usable(b) \wedge \neg broken(b)}{usable(b)}.$$

Since we do not have definite information that a is broken we may apply the first default and obtain E′ = Th(W ∪ {usable(a)}). Since E′ does not include broken(b) we may apply the second default and get Th(W ∪ {usable(a), usable(b)}) as an extension of T. This result is undesirable, as we know that either a or b is broken. In section 3.3 we shall discuss Constrained Default Logic as a prototypical Default Logic approach that enforces joint consistency of justifications of defaults involved in an extension.

The joint consistency property gives up part of the expressive power of default theories: under this property any default with several justifications $\frac{\varphi : \psi_1, \ldots, \psi_n}{\chi}$ is equivalent to the modified default $\frac{\varphi : \psi_1 \wedge \ldots \wedge \psi_n}{\chi}$ which has one justification. This is in contrast to a result in [Besnard, 1989] which shows that in Default Logic, defaults with several justifications are strictly more expressive than defaults with just one justification. Essentially, in default logics adopting joint consistency it is impossible to express default rules of the form “In case I am ignorant about p (meaning that I know neither p nor ¬p) I conclude q”.
The natural representation in default form would be $\frac{\mathit{true} : p, \neg p}{q}$, but this default can never be applied if joint consistency is required, because its justifications contradict one another; on the other hand it can be applicable in the sense of Default Logic. Another example for which joint consistency of justifications is undesirable is the following⁵. When I prepare for a trip then I use the following default rules:
If I may assume that the weather will be bad I’ll take my sweater.
If I may assume that the weather will be good then I’ll take my swimsuit.

In the absence of any reliable information about the weather I am cautious enough to take both with me. But note that I am not building a consistent belief set upon which I make these decisions; obviously the assumptions of the default rules contradict each other. So Default Logic will treat this example in the intended way whereas joint consistency of justifications will prevent me from taking both my sweater and my swimsuit with me.

⁵ My thanks go to an anonymous referee.

Cumulativity and Lemmas

Cumulativity is, informally speaking, the property that allows for the safe use of lemmas. Formally: Let D be a fixed, countable set of defaults. For a formula ϕ and a set of formulae W we define W ⊢D ϕ iff ϕ is included in all extensions of the default theory (W, D). Now, cumulativity is the following property: If W ⊢D ϕ, then for all ψ: W ⊢D ψ ⇐⇒ W ∪ {ϕ} ⊢D ψ.

If we interpret ϕ as a lemma, cumulativity says that the same formulae can be obtained from W as from W ∪ {ϕ}. This is the standard basis of using lemmas in, say, mathematics. Default Logic does not respect cumulativity: consider T = (W, D) with W = ∅ and D consisting of the defaults

$$\frac{\mathit{true} : a}{a}, \qquad \frac{a \vee b : \neg a}{\neg a}$$

(this example is due to Makinson). The only extension of T is Th({a}). Obviously, W ⊢D a. From a ∨ b ∈ Th({a}) we get W ⊢D a ∨ b. If we take W′ = {a ∨ b}, then the default theory (W′, D) has two extensions, Th({a}) and Th({¬a, b}); therefore W ∪ {a ∨ b} ⊬D a.

An analysis of cumulativity and other abstract properties of nonmonotonic inference is found in [Makinson, 1994]. Considerable work has been invested in developing default logics that possess the cumulativity property, one notable approach being Brewka’s Cumulative Default Logic [Brewka, 1991]. But it is doubtful whether this is the right way to go, since it carries additional conceptual and computational load, due to the use of assertions rather than plain formulae. One might argue that cumulativity is rather unintuitive because it requires a defeasible conclusion which was based on some assumptions to be represented by a certain piece of information, that is, a fact, and yet exhibit the same behaviour.

From the practical point of view the really important issue is whether we are able to represent and use lemmas in a safe way. How can we do this in Default Logic? [Schaub, 1992] proposed the representation of a lemma by a corresponding lemma default which records in its justifications the assumptions on which a conclusion was based. The formal definition of a lemma default is as follows. Let Πχ be a nonempty, successful process of T, minimal with the property χ ∈ In(Πχ).
A lemma default δχ corresponding to χ is the default

$$\frac{\mathit{true} : \psi_1, \ldots, \psi_n}{\chi}$$

where {ψ1, . . . , ψn} = {ψ | ψ ∈ just(δ) for a δ occurring in Πχ}. This default collects all assumptions that were used in order to derive χ.

THEOREM 8. Let χ be included in an extension of T and δχ a corresponding lemma default. Then every extension of T is an extension of T′ = (W, D ∪ {δχ}), and conversely.

So it is indeed possible to represent lemmas in Default Logic, not as facts (as required by cumulativity) but rather as defaults, which appears more natural anyway, since it highlights the nature of a lemma as having been proven defeasibly and thus as being open to disputation.
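Makinson's counterexample to cumulativity can be verified by enumerating extensions before and after the lemma a ∨ b is added as a fact. The sketch below is an illustration under assumptions: propositional formulae are written as Python boolean expressions, entailment is decided by brute force, each extension is represented by the consequents its process adds to W, and the helper names (`extensions`, `follows`) are hypothetical.

```python
from itertools import product

def entails(premises, goal, atoms):
    """Brute force: every assignment satisfying all premises satisfies goal."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(eval(p, {}, world) for p in premises) and not eval(goal, {}, world):
            return False
    return True

def extensions(W, D, atoms):
    """Consequent sets of all closed and successful processes of (W, D)."""
    found = set()

    def explore(process):
        in_gen = list(W) + [D[i][2] for i in process]
        out = [f"not ({psi})" for i in process for psi in D[i][1]]
        if any(entails(in_gen, o, atoms) for o in out):
            return  # failed
        candidates = [i for i, (pre, justs, _) in enumerate(D)
                      if i not in process and entails(in_gen, pre, atoms)
                      and not any(entails(in_gen, f"not ({psi})", atoms) for psi in justs)]
        if not candidates:
            found.add(frozenset(D[i][2] for i in process))
        for i in candidates:
            explore(process + [i])

    explore([])
    return found

def follows(W, D, phi, atoms):
    """W |-_D phi: phi is contained in every extension of (W, D)."""
    return all(entails(list(W) + list(gen), phi, atoms)
               for gen in extensions(W, D, atoms))

D = [("True", ["a"], "a"), ("a or b", ["not a"], "not a")]
A = ["a", "b"]
print(follows([], D, "a", A), follows([], D, "a or b", A))
print(follows(["a or b"], D, "a", A), extensions(["a or b"], D, A))
```

Adding the skeptical consequence a ∨ b to the facts makes the second default applicable from the start, which creates the second extension and removes a from the skeptical consequences.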
3.2 Justified Default Logic
Motivation and Formal Presentation

Łukaszewicz considered the possible nonexistence of extensions as a representational shortcoming of the original Default Logic, and presented a variant, Justified Default Logic [Łukaszewicz, 1988], which avoids this problem. The essence of his approach is the following: If we have a successful but not yet closed process, and all ways of expanding it by applying a new default lead to a failed process, then we stop and accept the current In-set as an extension. In other words, we take back the final, “fatal” step that causes failure.

Consider the default theory T = (W, D) with W = {holidays, sunday} and D consisting of the defaults

$$\delta_1 = \frac{sunday : goFishing \wedge \neg wakeUpLate}{goFishing}, \qquad \delta_2 = \frac{holidays : wakeUpLate}{wakeUpLate}.$$
It is easily seen that T has only one extension (in the sense of section 2), namely Th({holidays, sunday, wakeUpLate}). But if we apply δ1 first, then δ2 can be applied and leads to a failed process. In this sense we lose the intermediate information Th({holidays, sunday, goFishing}). On the other hand, in Justified Default Logic we would stop after the application of δ1 instead of applying δ2 and running into failure; therefore we accept Th({holidays, sunday, goFishing}) as an additional (modified) extension.

Technically this is achieved by paying attention to maximally successful processes. Let T be a default theory, and let Π and Γ be processes of T. We define Π < Γ iff the set of defaults occurring in Π is a proper subset of the defaults occurring in Γ. Π is called a maximal process of T iff Π is successful and there is no successful process Γ such that Π < Γ. A set of formulae E is called a modified extension of T iff there is a maximal process Π of T such that E = In(Π).

In the example above Π = (δ1) is a maximal process: the only process that strictly includes Π is Γ = (δ1, δ2), which is not successful. Therefore Th({holidays, sunday, goFishing}) is a modified extension of T. T has another modified extension, which is the single extension Th({holidays, sunday, wakeUpLate}) of T.

Obviously every closed and successful process is a maximal process (since no new default can be applied). Therefore we have the following result:

THEOREM 9. Every extension of a default theory T is a modified extension of T.

In the process tree of a default theory T, maximal processes correspond either to closed and successful nodes, or to nodes n such that all immediate children of n are failed. It is instructive to look at a default theory without an extension, for example T = (W, D) with W = ∅ and D = {$\frac{\mathit{true} : p}{\neg p}$}.
The empty process is maximal (though not closed), because the application of the “strange default” would lead to a failed process, therefore T h(∅) is a modified extension of T . Since any branch of the process tree can be extended successfully to a modified extension, the following result can be shown.
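The definition of maximal processes translates into a search that records every successful process and then keeps the ones whose default sets are not properly contained in that of another successful process. The sketch below makes the usual simplifying assumptions (propositional formulae as Python boolean expressions, brute-force entailment); each modified extension is represented by the set of consequents added to W, and the function names are hypothetical.

```python
from itertools import product

def entails(premises, goal, atoms):
    """Brute force: every assignment satisfying all premises satisfies goal."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(eval(p, {}, world) for p in premises) and not eval(goal, {}, world):
            return False
    return True

def modified_extensions(W, D, atoms):
    """Consequent sets of the maximal (successful) processes of (W, D)."""
    successful = set()  # default-index sets of all successful processes

    def explore(process):
        in_gen = list(W) + [D[i][2] for i in process]
        out = [f"not ({psi})" for i in process for psi in D[i][1]]
        if any(entails(in_gen, o, atoms) for o in out):
            return  # failed; a failed prefix can never become successful again
        successful.add(frozenset(process))
        for i, (pre, justs, _) in enumerate(D):
            if (i not in process and entails(in_gen, pre, atoms)
                    and not any(entails(in_gen, f"not ({psi})", atoms) for psi in justs)):
                explore(process + [i])

    explore([])
    maximal = [s for s in successful if not any(s < t for t in successful)]
    return {frozenset(D[i][2] for i in s) for s in maximal}

fishing = modified_extensions(
    ["holidays", "sunday"],
    [("sunday", ["goFishing and not wakeUpLate"], "goFishing"),
     ("holidays", ["wakeUpLate"], "wakeUpLate")],
    ["holidays", "sunday", "goFishing", "wakeUpLate"])
strange = modified_extensions([], [("True", ["p"], "not p")], ["p"])
print(fishing, strange)
```

For the theory with the single default true : p / ¬p the only successful process is the empty one, so the search returns the empty consequent set, corresponding to the modified extension Th(∅).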
THEOREM 10. Every default theory has at least one modified extension. Furthermore Justified Default Logic satisfies semi-monotonicity.

Łukaszewicz’ Original Definition

The original definition given in [Łukaszewicz, 1988] was based on fixpoint equations. Let T = (W, D) be a default theory, and E, F, E′ and F′ sets of formulae. We say that a default $\delta = \frac{\varphi : \psi_1, \ldots, \psi_n}{\chi}$ is applicable to E′ and F′ with respect to E and F iff ϕ ∈ E′ and E ∪ {χ} ⊭ ¬ψ for all ψ ∈ F ∪ {ψ1, . . . , ψn}.

E′ and F′ are closed under the application of defaults in D with respect to E and F iff, whenever a default $\delta = \frac{\varphi : \psi_1, \ldots, \psi_n}{\chi}$ in D is applicable to E′ and F′ with respect to E and F, χ ∈ E′ and {ψ1, . . . , ψn} ⊆ F′.

Define $\Lambda^1_T(E, F)$ and $\Lambda^2_T(E, F)$ to be the smallest sets of formulae such that $\Lambda^1_T(E, F)$ is deductively closed, W ⊆ $\Lambda^1_T(E, F)$, and $\Lambda^1_T(E, F)$ and $\Lambda^2_T(E, F)$ are closed under D with respect to E and F.

The following theorem shows that modified extensions correspond exactly to sets E and F satisfying the fixed-point equations E = $\Lambda^1_T(E, F)$ and F = $\Lambda^2_T(E, F)$. This is not surprising: intuitively, the idea behind the complicated definition of the Λ-operators is to maintain the set of justifications of defaults that have been applied (i.e. the sets F and F′ which, in fact, correspond to ¬Out(Π)), and to avoid applications of defaults if they lead to an inconsistency with one of these justifications.

THEOREM 11. Let T be a default theory. For every modified extension E of T there is a set of formulae F such that E = $\Lambda^1_T(E, F)$ and F = $\Lambda^2_T(E, F)$. Conversely, let E and F be sets of formulae such that E = $\Lambda^1_T(E, F)$ and F = $\Lambda^2_T(E, F)$. Then E is a modified extension of T.
3.3 Constrained Default Logic
Motivation and Definition

Justified Default Logic avoids running into inconsistencies and can therefore guarantee the existence of modified extensions. On the other hand, it does not require joint consistency of default justifications; for example, the default theory T = (W, D) with W = ∅ and D = {$\frac{\mathit{true} : p}{q}$, $\frac{\mathit{true} : \neg p}{r}$} has the single modified extension Th({q, r}). Constrained Default Logic [Schaub, 1992; Delgrande et al., 1994] is a Default Logic approach which enforces joint consistency. In the example above, after the application of the first default the second default may not be applied because p contradicts ¬p. Furthermore, since the justifications are consistent with each other, we test the consistency of their conjunction with the current knowledge base. In the terminology of processes, we require the consistency of In(Π) ∪ ¬Out(Π).

Finally, we adopt the idea from the previous section, namely a default may only be applied if it does not lead to a contradiction (failure) a posteriori. That means, if $\frac{\varphi : \psi_1, \ldots, \psi_n}{\chi}$ is tested for application to a process Π, then In(Π) ∪ ¬Out(Π) ∪ {ψ1, . . . , ψn, χ} must be consistent. We note that the set Out no longer makes sense since we require joint consistency. Instead we have to maintain the set of formulae which consists of W, all consequents and all justifications of the defaults that have been applied.

• Given a default theory T = (W, D) and a sequence Π of defaults in D without multiple occurrences, we define Con(Π) = Th(W ∪ {ϕ | ϕ is the consequent or a justification of a default occurring in Π}).

Sometimes we refer to Con(Π) as the set of constraints or the set of supporting beliefs. Con(Π) represents the set of beliefs supporting Π. For the default theory T = (W, D) with W = ∅ and D = {$\delta_1 = \frac{\mathit{true} : p}{q}$, $\delta_2 = \frac{\mathit{true} : \neg p}{r}$} let Π1 = (δ1). Then Con(Π1) = Th({p, q}).

We say that a default $\delta = \frac{\varphi : \psi_1, \ldots, \psi_n}{\chi}$ is applicable to a pair of deductively closed sets of formulae (E, C) iff ϕ ∈ E and ψ1 ∧ . . . ∧ ψn ∧ χ is consistent with C. A pair (E, C) of deductively closed sets of formulae is called closed under D if, for every default $\frac{\varphi : \psi_1, \ldots, \psi_n}{\chi}$ ∈ D that is applicable to (E, C), χ ∈ E and {ψ1, . . . , ψn, χ} ⊆ C.

In the example above, δ2 is not applicable to (In(Π1), Con(Π1)) = (Th({q}), Th({p, q})) because {¬p ∧ r} ∪ Th({p, q}) is inconsistent.

Let Π = (δ0, δ1, . . .) be a sequence of defaults in D without multiple occurrences.

• Π is a constrained process of the default theory T = (W, D) iff, for all k such that Π[k] is defined, δk is applicable to (In(Π[k]), Con(Π[k])).

• A closed constrained process Π is a constrained process such that every default δ which is applicable to (In(Π), Con(Π)) already occurs in Π.

• A pair of sets of formulae (E, C) is a constrained extension of T iff there is a closed constrained process Π of T such that (E, C) = (In(Π), Con(Π)).

Note that we do not need a concept of success here because of the definition of default applicability we adopted: δ is only applicable to (E, C) if it does not lead to a contradiction.

Let us reconsider the “broken arms” example: T = (W, D) with W = {broken(a) ∨ broken(b)}, and D consisting of the defaults
δ1 = true : usable(a) ∧ ¬broken(a) / usable(a),
δ2 = true : usable(b) ∧ ¬broken(b) / usable(b).
It is easily seen that there are two closed constrained processes, (δ1) and (δ2), leading to two constrained extensions:

(Th(W ∪ {usable(a)}), Th({broken(b), usable(a), ¬broken(a)})), and
(Th(W ∪ {usable(b)}), Th({broken(a), usable(b), ¬broken(b)})).

The effect of the definitions above is that it is impossible to apply both defaults together: after the application of, say, δ1, ¬broken(a) is included in the Con–set; together with
Default Logic
broken(a) ∨ broken(b) this yields broken(b), so δ2 is blocked. The two alternative constrained extensions describe the two possible cases we would have intuitively expected.

For another example consider T = (W, D) with W = {p} and D = {p : ¬r / q, p : r / r}. T has two constrained extensions, (Th({p, q}), Th({p, q, ¬r})) and (Th({p, r}), Th({p, r})). Note that for both constrained extensions, the second component collects the assumptions supporting the first component.

A Fixpoint Characterization

Schaub’s original definition of constrained extensions used a fixed–point equation [Schaub, 1992]: Let T = (W, D) be a default theory. For a set C of formulae let ΘT(C) be the pair of smallest sets of formulae (E′, C′) such that

1. W ⊆ E′ ⊆ C′
2. E′ and C′ are deductively closed
3. For every ϕ : ψ1, ..., ψn / χ ∈ D, if ϕ ∈ E′ and C ∪ {ψ1, ..., ψn, χ} is consistent, then χ ∈ E′ and {ψ1, ..., ψn, χ} ⊆ C′.
The following result shows that this definition is equivalent to the definition of constrained extensions from the previous subsection.

THEOREM 12. (E, C) is a constrained extension of T iff (E, C) = ΘT(C).

THEOREM 13. Every default theory has at least one constrained extension. Furthermore Constrained Default Logic is semi–monotonic.

Interconnections

In the following we describe the relationship among the default logic variants presented so far.

THEOREM 14. Let T be a default theory and E = In(Π) an extension of T, where Π is a closed and successful process of T. If E ∪ ¬Out(Π) is consistent, then (E, Th(E ∪ ¬Out(Π))) is a constrained extension of T.

The converse does not hold since the existence of an extension is not guaranteed. For example T = (∅, {true : p / ¬p}) has the single constrained extension (Th(∅), Th(∅)), but no extension.

THEOREM 15. Let T be a default theory and E = In(Π) a modified extension of T, where Π is a maximal process of T. If E ∪ ¬Out(Π) is consistent, then (E, Th(E ∪ ¬Out(Π))) is a constrained extension of T.

The example T = (W, D) with W = ∅ and D = {true : p / q, true : ¬p / r} shows that we cannot expect the first component of a constrained extension to be a modified extension: T has the single modified extension Th({q, r}), but possesses two constrained extensions, (Th({q}), Th({p, q})) and (Th({r}), Th({¬p, r})). As the following result
demonstrates, it is not accidental that for both constrained extensions, the first component is included in the modified extension.

THEOREM 16. Let T be a default theory and (E, C) a constrained extension of T. Then there is a modified extension F of T such that E ⊆ F.

The following example illustrates well the difference between the three approaches. Consider the default theory T = (W, D) with W = ∅ and

D = {true : p / q, true : ¬p / r, true : ¬q, ¬r / s}.

T has the single extension Th({q, r}), two modified extensions, Th({q, r}) and Th({s}), and three constrained extensions: (Th({q}), Th({q, p})), (Th({r}), Th({r, ¬p})) and (Th({s}), Th({s, ¬q, ¬r})).

This theory illustrates the essential differences of the three approaches discussed. Default Logic does not care about inconsistencies among justifications and may run into inconsistencies. Thus the first two defaults can be applied together, while if the third default is applied first, then the process is not closed and subsequent application of another default leads to failure. Justified Default Logic avoids the latter situation, so we obtain an additional modified extension. Constrained Default Logic avoids running into failure, too, but additionally requires joint consistency of justifications, so the first two defaults cannot be applied together, as they can in the other two approaches. Thus we get three constrained extensions.

We conclude this section by noting that for normal default theories, all default logic approaches discussed are identical. In other words, they coincide for the “well–behaved” class of default theories, and seek to extend it in different directions.

THEOREM 17. Let T be a normal default theory, and E a set of formulae. The following statements are equivalent.
(a) E is an extension of T.
(b) E is a modified extension of T.
(c) There exists a set of formulae C such that (E, C) is a constrained extension of T.
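For small propositional theories, the three constrained extensions of the example D = {true : p / q, true : ¬p / r, true : ¬q, ¬r / s} can be reproduced mechanically. The following is only a minimal sketch under stated assumptions, not an implementation from the literature: formulas are written as Python boolean expressions over a fixed list of atoms, consistency is decided by brute-force truth tables, and each pair (E, C) is reported by a generating set of formulas rather than by its deductive closure. All function names are hypothetical.

```python
from itertools import product

ATOMS = ["p", "q", "r", "s"]

def consistent(formulas):
    """Brute force: does some truth assignment over ATOMS satisfy every formula?"""
    for values in product([False, True], repeat=len(ATOMS)):
        valuation = dict(zip(ATOMS, values))
        if all(eval(f, {}, valuation) for f in formulas):
            return True
    return False

def entails(premises, formula):
    """premises |= formula iff premises plus the negated formula is unsatisfiable."""
    return not consistent(list(premises) + [f"not ({formula})"])

# A default is (prerequisite, justifications, consequent).
D = [
    ("True", ("p",), "q"),
    ("True", ("not p",), "r"),
    ("True", ("not q", "not r"), "s"),
]

def applicable(default, E, C):
    """Applicability to (Th(E), Th(C)) as in Constrained Default Logic."""
    pre, justs, cons = default
    return entails(E, pre) and consistent(C + list(justs) + [cons])

def constrained_extensions(W, D):
    """Explore all constrained processes; collect the closed ones."""
    found = set()

    def grow(applied, E, C):
        candidates = [d for d in D if d not in applied and applicable(d, E, C)]
        if not candidates:  # the process is closed
            found.add((frozenset(E), frozenset(C)))
        for d in candidates:
            pre, justs, cons = d
            grow(applied + [d], E + [cons], C + list(justs) + [cons])

    grow([], list(W), list(W))
    return found

EXTENSIONS = constrained_extensions([], D)
for E, C in sorted(EXTENSIONS, key=lambda pair: sorted(pair[0])):
    print("E = Th(", sorted(E), ")  C = Th(", sorted(C), ")")
```

Because results are compared as sets of formula strings, two syntactically different but logically equivalent generating sets would be listed separately; that does not arise in this example.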
3.4
Rational Default Logic
Constrained Default Logic enforces joint consistency of the justifications of defaults that contribute to an extension, but goes one step further by requiring that the consequent of a default be consistent with the current Con–set. Rational Default Logic [Mikitiuk and Truszczyński, 1995] does not require the latter step. Technically, a default ϕ : ψ1, ..., ψn / χ is rationally applicable to a pair of deductively closed sets of formulae (E, C) iff ϕ ∈ E and {ψ1, ..., ψn} ∪ C is consistent.

As an example, consider the default theory T = (W, D) with W = ∅ and D = {true : b / c, true : ¬b / d, true : ¬c / e, true : ¬d / f}. T has the single extension Th({c, d}), three constrained extensions,

(Th({e, f}), Th({e, ¬c, f, ¬d})),
(Th({c, f}), Th({c, b, f, ¬d})),
(Th({d, e}), Th({d, ¬b, e, ¬c})),

but two rational extensions, Th({c, f}) and Th({d, e}). The first constrained extension is “lost” in Rational Default Logic because it is not closed under application of further defaults: true : b / c and true : ¬b / d are both rationally applicable to Th({e, f}); but once one of them is applied to Th({e, f}) we get a failed situation.

[Mikitiuk and Truszczyński, 1995] shows that if E is an extension of T in Rational Default Logic, then (E, C) is a constrained extension of T for some set C. The converse is true for semi–normal default theories. Rational Default Logic does not guarantee the existence of extensions. For example, the default theory consisting of the single default true : p / ¬p does not have any extensions.
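The difference between the two applicability tests can be checked on the pair (Th({e, f}), Th({e, ¬c, f, ¬d})) from the example. The sketch below is illustrative only; it assumes formulas are Python boolean expressions over the atoms b, c, d, e, f and uses brute-force truth tables, and the function names are hypothetical.

```python
from itertools import product

ATOMS = ["b", "c", "d", "e", "f"]

def consistent(formulas):
    """Brute force: does some truth assignment over ATOMS satisfy every formula?"""
    for values in product([False, True], repeat=len(ATOMS)):
        valuation = dict(zip(ATOMS, values))
        if all(eval(f, {}, valuation) for f in formulas):
            return True
    return False

def entails(premises, formula):
    return not consistent(list(premises) + [f"not ({formula})"])

def constrained_applicable(default, E, C):
    """Justifications AND consequent must be jointly consistent with C."""
    pre, justs, cons = default
    return entails(E, pre) and consistent(C + justs + [cons])

def rationally_applicable(default, E, C):
    """Only the justifications must be jointly consistent with C."""
    pre, justs, cons = default
    return entails(E, pre) and consistent(C + justs)

E = ["e", "f"]                      # generates Th({e, f})
C = ["e", "not c", "f", "not d"]    # generates Th({e, ¬c, f, ¬d})
d1 = ("True", ["b"], "c")           # true : b / c
d2 = ("True", ["not b"], "d")       # true : ¬b / d

for name, d in (("true:b/c", d1), ("true:¬b/d", d2)):
    print(name,
          "rational:", rationally_applicable(d, E, C),
          "constrained:", constrained_applicable(d, E, C))
```

Both defaults pass the rational test but fail the constrained one, because their consequents c and d contradict ¬c and ¬d in the constraint set; this is the failed situation described above.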
3.5
Cumulative Default Logic
As mentioned earlier, Cumulative Default Logic was introduced by Brewka to ensure the property of cumulativity [Brewka, 1991]. The solution he adopted was to use so–called assertions, pairs (ϕ, J) of a formula ϕ and a set of formulae J which collects the assumptions that were used to deduce ϕ. When a default is applied to deduce ϕ the justifications of that default are added to J.

We illustrate this approach by considering the example from subsection 3.3 which showed that Default Logic violates cumulativity. Consider T = (W, D) with W = ∅ and D consisting of the defaults

true : a / a,    a ∨ b : ¬a / ¬a.

In the beginning we can apply only the first default and derive the assertion (a, {a}), meaning that we derived a based on the assumption a. Obviously the second default is not applicable. The violation of cumulativity in Default Logic was caused by the addition of a ∨ b as a new fact which opened the way for the application of the second default instead of the first one. But in Cumulative Default Logic we are allowed to add the assertion (a ∨ b, {a}) to the default theory (if a is derived based on a, then a ∨ b is also
derived based on a), but now the second default is still not applicable because ¬a is not consistent with the set of supporting beliefs {a}. Note that adding a to the default theory as we did in Default Logic corresponds to adding the assertion (a, ∅), which is different from (a, {a}). If we disregard {a}, which is the assumption upon which the deduction of a was based, then indeed we can get more conclusions; this forgetting is the deeper reason for the failure of Default Logic to satisfy cumulativity.

From a technical point of view, the use of assertions is complicated and causes practical problems, for example with regard to implementation; this is the price we have to pay for cumulativity. And the gain is questionable in the light of our discussion in subsection 3.1, which argued that lemmas can and should be represented as defaults, rather than facts. Nevertheless, Cumulative Default Logic was historically an important variant.
3.6
Disjunctive Default Logic
The “broken arm” example from subsection 3.1 shows that Default Logic does not treat disjunctive information correctly. [Gelfond et al., 1991] proposes a way out of these difficulties by the following analysis: if a formula ϕ ∨ ψ becomes part of the current knowledge base (either as a fact or as a consequent of some default), it should not be included as a predicate logic formula. Instead it should have the effect that one of ϕ and ψ becomes part of an extension. In other words, the expression broken(a)|broken(b) should have the effect that an extension contains one of the two disjuncts, rather than the disjunction broken(a) ∨ broken(b).

To see another example, consider the default theory T = (W, D) with W = {p ∨ q} and D = {p : r / r, q : r / r}. In Default Logic we know the formula p ∨ q but are unable to apply either of the two defaults; so we end up with the single extension Th({p ∨ q}). On the other hand, Disjunctive Default Logic leads to two extensions, one in which p is included, and one in which q is included. In the former case p : r / r becomes applicable, in the latter case q : r / r becomes applicable. So we end up with two extensions, Th({p, r}) and Th({q, r}), which is intuitively more appealing. For more details see [Gelfond et al., 1991].
3.7
Weak Extensions
All variants of Default Logic discussed so far share the same idea of treating prerequisites of defaults: in order for a default δ to be applicable, its prerequisite must be proven using the facts and the consequents of defaults that were applied before δ. For example, for the default p : true / p to be applicable, p must follow from the facts, or be the consequent of another default, etc. A default theory consisting only of this default has the single extension Th(∅). This has led researchers to refer to Default Logic as being “strongly grounded” in the given facts. In contrast, Autoepistemic Logic [Moore, 1985] provides more freedom in
choosing what one wants to believe in. Weak extensions of default theories were introduced to capture this intuition in the default logic framework [Marek and Truszczyński, 1993]. In the framework of weak extensions, we can simply decide to believe in some formulae. The only requirement is that this decision can be justified using the facts and default rules.

Reconsider the default theory consisting of the single default p : true / p. We may decide to believe in p or not. Suppose we do believe in p; then the default can be applied and gives us p as a consequence. In this sense the default justifies the decision to believe in p; Th({p}) is thus a weak extension. Of course we could also adopt the more cautious view and decide not to believe in p; then the default is not applicable, so p cannot be proved and our decision is again justified. In general, extensions of a theory T are also weak extensions of T. For a technical discussion see [Marek and Truszczyński, 1993].
4
DEFAULT REASONING WITH PREFERENCE
The notion of preference has long been studied by philosophers, economists and psychologists. In recent years it has become pervasive in a number of areas of artificial intelligence, including nonmonotonic reasoning, constraint problem solving, decision theory, and the design of autonomous agents [Junker et al., 2004]. Preference constitutes a very natural and effective way of resolving indeterminate situations. For example, in scheduling not all deadlines may be simultaneously satisfiable, and in configuration various goals may not be simultaneously met. In legal reasoning, laws may apply in different situations, but laws may also conflict with each other. In such a situation, preferences among desiderata may allow one to come to an appropriate compromise solution. Conflicts may be resolved by principles such as ruling that newer laws have priority over less recent ones, and that laws of a higher authority have priority over laws of a lower authority. For a conflict among these principles one may further decide that the “authority” preference takes priority over the “recency” preference.

The growing interest in preferences is also reflected by the large number of proposals in nonmonotonic reasoning [Delgrande et al., 2004]. In this section we will review some approaches to default reasoning with preference [Baader and Hollunder, 1992; Brewka and Eiter, 2000; Brewka and Eiter, 1999; Delgrande and Schaub, 2000; Delgrande et al., 2002; Wang and Zhou, 2001]. There are also other proposals for preference handling in Default Logic and logic programs, for example [Buccafurri et al., 1999; Dimopoulos and Kakas, 1995; Gelfond and Son, 1997; Grosof, 1997; Rintanen, 1998; Sakama and Inoue, 2000; Zhang and Foo, 1997]. Due to limited space, we have to omit them.
4.1
Some Desiderata on Preference
A preference relation is a binary relation < between objects of a specific type. The objects can be atoms, literals, formulas, or rules. A preference relation < is often a partial order. If δ2 < δ1 for two objects δ1 and δ2, then δ1 has higher preference than δ2. Naturally, the higher-ranked object δ1 will be asserted over
the lower-ranked object δ2 if a conflict arises. However, different approaches have further interpreted or constrained the relation < in a multitude of ways. Most commonly, a preference ordering is imposed “externally” on the rules of a default theory. A default theory (D, W) may be extended to a prioritized default theory (D, W, <) where < ⊆ D × D gives a preference ordering on how rules may be applied.

DEFINITION 18. A prioritized default theory is a triple ∆ = (D, W, <) where (D, W) is a default theory and < is a partial order.

The German-Law example from Section 2.1 is a classical scenario in which preference information is useful for resolving conflicts. In this scenario there are two legal rules: (1) A foreigner will be expelled if he has committed a crime. (2) A foreigner will not be expelled if he is a political refugee. These rules can be easily expressed as two defaults in Default Logic:

δ1 : criminal(X) ∧ foreigner(X) : expel(X) / expel(X)
δ2 : politicalRefugee(X) : / ¬expel(X)
Suppose we also know that Dude is a foreigner who is involved in a crime and is a political refugee. Then we can encode this knowledge base as a default theory (D, W) where D = {δ1, δ2} and W = {foreigner(Dude), criminal(Dude), politicalRefugee(Dude)}. Intuitively, ¬expel(Dude) should be derived from (D, W) while expel(Dude) should not be derived from (D, W). However, under Reiter’s definition of extensions, ∆ = (D, W) is inconsistent and thus no useful information is derived since both expel(Dude) and ¬expel(Dude) can be inferred. Since the rule δ2 has higher priority than δ1, we naturally obtain a prioritized default theory ∆ = (D, W, <) where the preference relation < on D is defined by δ1 < δ2. Under most approaches to prioritized Default Logic, ∆ has a unique preferred extension E = Th(W ∪ {¬expel(Dude)}).

Although most prioritized Default Logics agree on the above example, they differ in general and thus embody different interpretations of preference. To evaluate different approaches to preference, a number of desiderata that an approach may be expected to satisfy have been proposed [Delgrande et al., 2004]. [Brewka and Eiter, 1999] proposed two “principles” argued to constitute a minimal requirement for preference handling in a rule-based system. Thus approaches such as Default Logic are most naturally covered by these principles, although they are also applicable, for example, to a circumscriptive abnormality theory with preferences.

Principle I: Let B1 and B2 be two extensions of a prioritised theory (T, <) generated by rules R ∪ {δ1} and R ∪ {δ2}, where δ1, δ2 ∉ R. If δ1 is preferred over δ2 then B2 is not a preferred extension of T.

The term “generated” is crucial in Principle I: for an extension B, a rule r is a generating rule just if its prerequisites are in B and it is not defeated by B. For a default δ, we say δ is defeated by B if B |= ¬β for some β ∈ just(δ).
Principle II: Let B be a preferred extension of a prioritised theory (T, <) and δ a rule such that at least one prerequisite of δ is not in B. Then B is a preferred extension of (T ∪ {δ}, <′) whenever <′ agrees with < on priorities among rules in T.

Thus adding a rule that is inapplicable in a preferred extension does not make the extension non-preferred, so long as prior preferences are not changed.

Complexity: For the major approaches to nonmonotonic reasoning, the complexity of the general decision problems of interest is known. Arguably, adding preferences to a given approach should not change the complexity of a given problem. Thus, consider a decision problem such as: Is γ a member of all extensions of theory T? Arguably, it would be desirable that the overall complexity not change if “all extensions” is replaced by “all preferred extensions”. The intuition is that if the complexity does change, then substantial additional machinery has been added to the underlying formalism in order to implement preferences. Fortunately, each of the approaches to preferences discussed in this and the next section has the same complexity as its host system.

For simplicity, we consider only propositional Default Logic. Any default with variables is seen as a schema of defaults. In what follows, ∆ = (D, W, <) denotes a prioritized default theory and E a set of formulas.
4.2
Terminological Default Logic
As we have seen in Section 2, there are many ways to define the semantics of Default Logic. In particular, an extension of a default theory can be defined in terms of a quasi-inductive definition (Theorem 2). The prioritized default logic introduced in [Baader and Hollunder, 1992], named terminological Default Logic, incorporates preference information into Reiter’s quasi-inductive definition. The main idea of this approach to preference is that a default can be applied during the iteration step only if the default is active and no other active default has higher priority than it. We say a default δ ∈ D is active in E if δ is applicable to E and cons(δ) ∉ Th(E).

DEFINITION 19. Let ∆ = (D, W, <) be a prioritized default theory and E a set of formulas. We define a sequence E_0, E_1, E_2, ... as follows: E_0 = W, and for all i ≥ 0,

E_{i+1} = E_i ∪ {cons(δ) | δ ∈ D, pre(δ) ∈ Th(E_i), ¬just(δ) ∉ E, and δ′ is not active in E for every δ′ ∈ D with δ < δ′}.

Then E is a P-extension of ∆ = (D, W, <) if and only if E = ∪_{i≥0} Th(E_i).

In the above definition, only active rules with highest priority are applied in each iteration step and thus priority among rules is respected.
Consider a modified version of the classical bird-fly example ∆ = (D, W, <) where W = {penguin(Danny), winged(Danny)}, D consists of the following defaults:

δ1 : penguin(x) : ¬flies(x) / ¬flies(x)
δ2 : bird(x) : winged(x) / winged(x)
δ3 : winged(x) : flies(x) / flies(x)
δ4 : penguin(x) : / bird(x)

and δ2 < δ1 (the more specific rule is preferred). Then ∆ = (D, W, <) has a unique P-extension E = Th({penguin(Danny), winged(Danny), bird(Danny), ¬flies(Danny)}) although the default theory (D, W) has two classical extensions (the other one contains flies(Danny)).

We can prove that each P-extension is also a classical extension.

THEOREM 20. Let (D, W, <) be a prioritized default theory and E a set of formulas. If E is a P-extension of (D, W, <), then E is also an extension of (D, W).

A prioritized default theory may have zero, one or more P-extensions. However, it has been proved that a prioritized normal default theory always has a P-extension.

THEOREM 21. Every prioritized normal default theory has a P-extension.

While prioritized Default Logic under P-extensions satisfies Principle II, it violates Principle I.
4.3
Reduction-based Approach
[Brewka and Eiter, 1999] proposed an alternative approach to preference. One important feature of this approach is its use of reductions. Specifically, two reductions are employed: (1) a general partial order is reduced to a number of total orders; (2) a general default theory is reduced to a prerequisite-free default theory.

Recall that a partial order < on a set S is a well-ordering if every nonempty subset of S has a least element. Thus, a well-ordering is a total order. A fully prioritized default theory is a prioritized theory ∆ = (D, W, <) where < is a well-ordering.

Let ∆ = (D, W, <) be a fully prioritized prerequisite-free default theory. The operator C is defined as C(∆) = ∪_{α≥0} E_α, where E_0 = Th(W) and for every ordinal α ≥ 0,

E_{α+1} = E_α, if no default in D is active in E_α;
E_{α+1} = Th(E_α ∪ {cons(d)}) otherwise, where d is the first active default in D.

For a prioritized default theory ∆ = (D, W, <), its semantics is given by its preferred extensions, which are defined in the following steps.
Step 1 Let ∆ = (D, W, <) be a fully prioritized prerequisite-free default theory. Denote by ∆E the prioritized default theory obtained from ∆ by removing all defaults whose
consequents are in E but which are defeated in E. Then we say E is a preferred extension of ∆ if and only if E = C(∆E).

Step 2 Let ∆ = (D, W, <) be a fully prioritized default theory and E a set of formulas. The default theory ∆E = (DE, W, <E) is obtained from ∆ and E by (1) eliminating every default d ∈ D with pre(d) ∉ E, and (2) replacing pre(d) by the tautology ⊤ in all remaining defaults. Here <E is naturally inherited from <: for any two defaults d′1 and d′2 in DE, d′1 <E d′2 if and only if their corresponding defaults d1 and d2 in D satisfy d1 < d2.

Note that ∆E is prerequisite-free. Thus we can define the notion of preferred extensions as follows.

DEFINITION 22. E is a preferred extension of ∆ = (D, W, <) if and only if (1) E is an extension of (D, W) and (2) E is a preferred extension of ∆E.

As an example, consider the prioritized default theory ∆ = (D, W, <) where W = {}, D consists of the following defaults:

d1 : a : b / b
d2 : ⊤ : ¬b / ¬b
d3 : ⊤ : a / a

and d3 < d2 < d1. (D, W) has two extensions E1 = Th({a, b}) and E2 = Th({a, ¬b}). It can be verified that E1 = C(∆E1) and thus E1 is a preferred extension of ∆. However, E2 is not a preferred extension of ∆.

THEOREM 23. The preferred extensions defined in this section satisfy both Principles I and II.
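The example with d1, d2, d3 can be checked by a short program. The following is only a sketch under several assumptions: the theory is finite and propositional, formulas are Python boolean expressions over the atoms a and b, consistency is decided by truth tables, “first active default” is read as “active default of highest priority”, and condition (1) of Definition 22 (that E is an extension of (D, W)) is taken from the text rather than recomputed. The function names are hypothetical.

```python
from itertools import product

ATOMS = ["a", "b"]

def consistent(formulas):
    """Brute force: does some truth assignment over ATOMS satisfy every formula?"""
    for values in product([False, True], repeat=len(ATOMS)):
        valuation = dict(zip(ATOMS, values))
        if all(eval(f, {}, valuation) for f in formulas):
            return True
    return False

def entails(premises, formula):
    return not consistent(list(premises) + [f"not ({formula})"])

# (prerequisite, justification, consequent), highest priority first: d1, d2, d3.
D = [("a", "b", "b"), ("True", "not b", "not b"), ("True", "a", "a")]

def reduce_by(D, E):
    """Step 2, then Step 1: keep defaults whose prerequisite is in Th(E), drop
    the prerequisite, and remove defaults whose consequent is in Th(E) but
    whose justification is defeated by E."""
    reduced = []
    for pre, just, cons in D:
        if not entails(E, pre):
            continue
        defeated = entails(E, f"not ({just})")
        if entails(E, cons) and defeated:
            continue
        reduced.append(("True", just, cons))
    return reduced

def c_operator(D, W):
    """Repeatedly apply the highest-priority active default (finite case)."""
    E = list(W)
    while True:
        active = [d for d in D
                  if consistent(E + [d[1]]) and not entails(E, d[2])]
        if not active:
            return E
        E.append(active[0][2])

def equivalent(A, B):
    return all(entails(A, f) for f in B) and all(entails(B, f) for f in A)

def is_preferred(E, D, W):
    return equivalent(c_operator(reduce_by(D, E), W), E)

E1, E2 = ["a", "b"], ["a", "not b"]
print("E1 preferred:", is_preferred(E1, D, []))
print("E2 preferred:", is_preferred(E2, D, []))
```

For both candidates the reduced theory lets d1 fire first, so the operator produces a set equivalent to {a, b}; this matches E1 but not E2.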
4.4
Compiling Prioritized Default Logic
In [Delgrande and Schaub, 2000], a methodology based on Default Logic is proposed for expressing general preference information. In this approach, a prioritized default theory (D, W, <) is compiled into a standard default theory (D′, W′) and the preferences are respected in the sense that E is a (preferred) extension of (D, W, <) if and only if E is an extension of (D′, W′). In this section, we show how Brewka and Eiter’s prioritized Default Logic can be compiled into standard Default Logic.

The main idea of this approach is to encode preference information into new defaults. For each default d, we introduce the symbol nd for representing the label of d; bl(nd) denotes that the application of d is blocked (in some iteration step); ok(nd) is used to delay the application of d; ap(nd) denotes that d is applicable; nd ≺ nd′ encodes d < d′. Each default d in D is mapped to three defaults:

pre(d) ∧ ok(nd) : just(d) / cons(d) ∧ ap(nd),
ok(nd) : ¬pre(d) / bl(nd),
¬just(d) ∧ ok(nd) : / bl(nd).
These three rules for d are abbreviated by da, db1, db2, respectively. The rule da represents the original default d while the other two defaults are designed to control the applicability of d. Let (D, W, <) be a prioritized default theory. Usually, if we want to map (D, W, <) into a standard default theory (D′, W′), then

• W′ is the union of W, the set of atoms {nd ≺ nd′ | d < d′, d, d′ ∈ D} and some other formulas including the unique names axioms and the domain closure axioms.
• D′ is the union of the set ∪_{d∈D} {da, db1, db2} and some other defaults for the specific preference approach.

A translation for Brewka and Eiter’s reduction-based approach is provided in [Delgrande et al., 2000a]. In general, the compiled default theory is much larger than the original one. However, the compilation approach allows standard systems to be employed for implementing prioritized Default Logics.
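The purely syntactic part of this mapping, from a default d to the triple da, db1, db2, can be sketched as a string transformation. This is an illustration only: the representation of defaults as (prerequisite, justification, consequent) string triples and the function name `compile_default` are assumptions made here, and the approach-specific defaults and axioms of W′ are not generated.

```python
def compile_default(name, pre, just, cons):
    """Return the three defaults d_a, d_b1, d_b2 for a default named `name`.

    Each result is a (prerequisite, justification, consequent) triple of
    strings; an empty justification is written as the empty string.
    """
    ok, ap, bl = f"ok({name})", f"ap({name})", f"bl({name})"
    return {
        "d_a": (f"({pre}) ∧ {ok}", just, f"({cons}) ∧ {ap}"),
        "d_b1": (ok, f"¬({pre})", bl),
        "d_b2": (f"¬({just}) ∧ {ok}", "", bl),
    }

def preference_atoms(lower_than):
    """Atoms n_d ≺ n_d' for every pair (d, d') with d < d'."""
    return {f"{lo} ≺ {hi}" for lo, hi in lower_than}

# Hypothetical default n1 = a : b / c and a hypothetical preference n2 < n1.
COMPILED = compile_default("n1", "a", "b", "c")
for key, (pre, just, cons) in COMPILED.items():
    print(f"{key}: {pre} : {just} / {cons}")
print(preference_atoms({("n2", "n1")}))
```

The threefold growth per default visible here is one source of the size increase of the compiled theory mentioned above.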
4.5
Dynamic Preferences
In the previous approaches, preference information is expressed as an external relation on defaults (often called static preference). Alternatively, preferences may be imposed at the object level. For example, in [Delgrande and Schaub, 2000] constants representing names are associated with the default rules. Instead of a relation δ2 < δ1 between default rules one can now assert n2 ≺ n1 between the corresponding names, where ≺ is a (new) binary relation in the object language. In this approach, we deal with a standard default theory (D, W) over a language including the predicate ≺, which expresses a preference relation. Since preferences are now available dynamically by inferences from W and D, we cannot simply treat a dynamically prioritized default theory (D, W) as a standard default theory semantically. In particular, the predicate ≺ needs some special treatment. E is an extension of a dynamically prioritized default theory (D, W) if E is a standard extension of the default theory (D′, W′) where D′ = {da, db1, db2 | d ∈ D} ∪ {
: ¬(x ≺ y), (x ≺ y) } (x ≺ y)
and W′ = W ∪ W≺ ∪ DCAN ∪ UNAN. Here DCAN is the domain closure assumption, UNAN is the unique names assumption and W≺ consists of two parts: axioms specifying that ≺ is a strict partial order, and axioms specifying properties of bl, ap and ok such as [(x ≺ y) ⊃ (bl(y) ∨ ap(y))] ⊃ ok(x). In this way, a default theory with dynamic preference can also be compiled into a standard default theory.
5
PRIORITIZED LOGIC PROGRAMS
As a formalism of nonmonotonic reasoning, logic programming under the answer set semantics [Gelfond and Lifschitz, 1990] has a close relation to default logic. In fact, extended logic programs can be embedded into default logic in the sense that each extended logic program can be naturally transformed into an “equivalent” default theory. More recently, the approaches to prioritized Default Logic in Sections 4.2-4.4 have been adapted to logic programs and deeper results have been achieved. In this section, we discuss some of those proposals using a uniform framework which was suggested in [Schaub and Wang, 2001; Schaub and Wang, 2003].
5.1
Answer Sets for Extended Logic Programs
An extended logic program is a finite set of rules of the form

L0 ← L1, ..., Lm, not Lm+1, ..., not Ln,    (1)
where n ≥ m ≥ 0, and each Li (0 ≤ i ≤ n) is a literal, i.e. either an atom A or the negation ¬A of A. The set of all literals is denoted by Lit. Given a rule r as in (1), we let head(r) denote the head, L0, of r and body(r) the body, {L1, ..., Lm, not Lm+1, ..., not Ln}, of r. Further, let body+(r) = {L1, ..., Lm} and body−(r) = {Lm+1, ..., Ln}. A program is called basic if body−(r) = ∅ for all its rules; it is called normal if it contains no classical negation symbol ¬.

The reduct of a rule r is defined as r+ = head(r) ← body+(r); the reduct, Π^X, of a program Π relative to a set X of literals is defined by Π^X = {r+ | r ∈ Π and body−(r) ∩ X = ∅}. A set of literals X is closed under a basic program Π iff for any r ∈ Π, head(r) ∈ X whenever body+(r) ⊆ X. We say that X is logically closed iff it is either consistent (i.e. it does not contain both a literal A and its negation ¬A) or equals Lit. The smallest set of literals which is both logically closed and closed under a basic program Π is denoted by Cn(Π). With these formalities at hand, we can define answer set semantics for extended logic programs: a set X of literals is an answer set of a program Π iff Cn(Π^X) = X. For the rest of this paper, we concentrate on consistent answer sets.

Logic programs can be embedded into Default Logic, Autoepistemic Logic and Circumscription. In particular, each rule r of the form L0 ← L1, ..., Lm, not Lm+1, ..., not Ln in a logic program can be equivalently transformed into a default

d_r = L1 ∧ ... ∧ Lm : ¬Lm+1 ∧ ... ∧ ¬Ln / L0

in the following sense. Note that each literal in a logic program is treated as an atom.

THEOREM 24. Given a logic program Π, let DΠ = {d_r | r ∈ Π}. Then a set S of literals is an answer set of Π if and only if Th(S) is an extension of the default theory (DΠ, ∅).
Similar to the case of Default Logic, we can define the activeness of a rule as follows.

DEFINITION 25. Let X and Y be two sets of literals in a logic program Π. A rule r in Π is active in the pair (X, Y) if body+(r) ⊆ X and body−(r) ∩ Y = ∅.

This definition is a generalization of the activeness defined in the last section. A prioritized logic program is a pair (Π, <) where Π is an extended logic program and < ⊆ Π × Π is a strict partial order. Given r1, r2 ∈ Π, the relation r1 < r2 is meant to express that r2 has higher priority than r1.
Preferred fixpoints

Answer sets are defined via a reduction of extended logic programs to basic programs. Such a reduction is inappropriate when resolving conflicts among rules by means of preferences. Rather, conflict resolution must be addressed among the original rules in order to account for blockage between rules. The intuition behind W-preference in [Schaub and Wang, 2003; Wang and Zhou, 2001] is therefore to characterize preferred answer sets by an inductive development that agrees with the given ordering rather than a simultaneous reduction.

DEFINITION 26. Let (Π, <) be a prioritized program and let X and Y be sets of literals. We define the set of immediate consequences of Y with respect to (Π, <) and X as

T_{(Π,<),X} Y = {head(r) | (I) r ∈ Π is active in (Y, X), and (II) there is no rule r′ ∈ Π with r < r′ such that (a) r′ is active in (X, Y) and (b) head(r′) ∉ Y}

if it is consistent, and T_{(Π,<),X} Y = Lit otherwise.
The idea is to apply a rule r only if it is applicable and every higher-priority rule r′ that is active in (X, Y) has already been applied. The above definition allows us to define a counterpart of the standard consequence operator in the setting of prioritized programs:

DEFINITION 27. Let (Π, <) be a prioritized program and let X be a set of literals. We define C_{(Π,<)}(X) = ∪_{i≥0} T^i_{(Π,<),X} ∅.
Of particular interest in view of an alternating fixpoint theory is that C_{(Π,<)} enjoys anti-monotonicity, i.e. X1 ⊆ X2 implies C_{(Π,<)}(X2) ⊆ C_{(Π,<)}(X1).
DEFINITION 28. Let (Π, <) be a prioritized program and let X be a set of literals. We define X as a preferred answer set of (Π, <) if and only if C_{(Π,<)}(X) = X.

For illustration, consider the following prioritized program, which is a variant of the
classical bird-fly example.

r1 : ¬f ← p, not f
r2 : w ← b, not ¬w
r3 : f ← w, not ¬f
r4 : b ← p
r5 : p ←

r2 < r1
Here f is for “fly”; p for “penguin”; w for “winged”; b for “bird”. This program admits two standard answer sets, X = {p, b, ¬f, w} and X′ = {p, b, f, w}, but only X is preferred.

THEOREM 29. Let (Π, <) be a prioritized program and X a set of literals. If X is a preferred answer set of (Π, <), then X is an answer set of Π.

In particular, the preferred answer sets of (Π, ∅) are all answer sets of Π. In turn, for each answer set X of a logic program Π, there is an ordering < on the rules of Π such that X is the unique preferred answer set of (Π, <).

THEOREM 30. Let Π be a logic program and X an answer set of Π. Then, there is a partial order < such that X is the unique preferred answer set of the prioritized program (Π, <).

Moreover, (Π, <) has at most one preferred answer set whenever the rules in Π are totally ordered.

THEOREM 31. A totally prioritized program has at most one preferred answer set.

Interestingly, stratified programs can be associated with an order on their rules in a canonical way. That is, rules in lower levels are preferred over rules in higher levels. We thus obtain a prioritized program (Π, <s) for any stratified logic program Π with a fixed stratification.

THEOREM 32. Let Π be a stratified logic program and X⋆ be the perfect model of Π. Let <s be an order induced by some stratification of Π. Then (Π, <s) has the unique preferred answer set X⋆.

Various preference strategies can be embedded into the above framework by adjusting the operator C_{(Π,<)}. In Sections 5.2 and 5.3, we show how two existing approaches to preference semantics can be characterized. For clarity, we add the prefix (or superscript) “W” to all names of concepts introduced in this section.
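The claims about the bird-fly program can be checked by direct computation. The sketch below is an illustration under stated assumptions, not the authors’ implementation: literals are strings with a leading "-" for classical negation, answer sets are found by testing every subset of rule heads against the reduct, only consistent sets are considered (an inconsistent result, i.e. Lit, is represented by None), and condition (b) of Definition 26 is read as head(r′) ∉ Y. All names are hypothetical.

```python
from itertools import combinations

# Each rule is (head, positive body, negative body).
RULES = {
    "r1": ("-f", {"p"}, {"f"}),
    "r2": ("w", {"b"}, {"-w"}),
    "r3": ("f", {"w"}, {"-f"}),
    "r4": ("b", {"p"}, set()),
    "r5": ("p", set(), set()),
}
LOWER_THAN = {("r2", "r1")}  # (r, r') means r < r'

def complement(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def consistent(lits):
    return not any(complement(l) in lits for l in lits)

def cn_of_reduct(X):
    """Cn of the reduct of RULES relative to X; None stands for Lit."""
    Y, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in RULES.values():
            if not (neg & X) and pos <= Y and head not in Y:
                Y.add(head)
                changed = True
    return Y if consistent(Y) else None

def answer_sets():
    literals = sorted({head for head, _, _ in RULES.values()})
    candidates = (set(c) for k in range(len(literals) + 1)
                  for c in combinations(literals, k))
    return [X for X in candidates if cn_of_reduct(X) == X]

def active(rule, X, Y):
    """Rule is active in (X, Y): body+ within X and body- disjoint from Y."""
    _, pos, neg = RULES[rule]
    return pos <= X and not (neg & Y)

def T(X, Y):
    """Immediate consequences of Y with respect to the program and X."""
    out = set()
    for r in RULES:
        if not active(r, Y, X):
            continue
        blocked = any(lo == r and active(hi, X, Y) and RULES[hi][0] not in Y
                      for lo, hi in LOWER_THAN)
        if not blocked:
            out.add(RULES[r][0])
    return out if consistent(out) else None

def C(X):
    """Iterate T from the empty set until it stabilises."""
    Y = set()
    while True:
        nxt = T(X, Y)
        if nxt is None or nxt == Y:
            return nxt
        Y = nxt

X = {"p", "b", "-f", "w"}
X_PRIME = {"p", "b", "f", "w"}
print("answer sets:", [sorted(s) for s in answer_sets()])
print("X preferred:", C(X) == X, " X' preferred:", C(X_PRIME) == X_PRIME)
```

For X′ the iteration stops at {p, b}: r2 stays blocked because the higher-ranked r1 is active in (X′, Y) but its head ¬f is never derived.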
5.2
Order preservation
The selection of preferred answer sets can also be characterized by the notion of order preservation [Delgrande et al., 2000b]. The set $\Gamma_\Pi^X$ of all generating rules of a set X of literals from program Π is given by
$$\Gamma_\Pi^X = \{r \in \Pi \mid \mathit{body}^+(r) \subseteq X \text{ and } \mathit{body}^-(r) \cap X = \emptyset\}.$$
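As an illustration of the set of generating rules, the sketch below computes it for the two answer sets of the bird-fly program. The rule encoding (head, positive body, negative body) and the names `RULES` and `generating_rules` are assumptions made for this example only.

```python
# Hypothetical encoding of the bird-fly program:
# name -> (head, positive body, negative body); "-f" stands for ¬f.
RULES = {
    "r1": ("-f", {"p"}, {"f"}),
    "r2": ("w", {"b"}, {"-w"}),
    "r3": ("f", {"w"}, {"-f"}),
    "r4": ("b", {"p"}, set()),
    "r5": ("p", set(), set()),
}


def generating_rules(rules, x):
    """Names of rules whose positive body lies in x and whose negative body misses x."""
    return {
        name
        for name, (_head, pos, neg) in rules.items()
        if pos <= x and not (neg & x)
    }


if __name__ == "__main__":
    print(sorted(generating_rules(RULES, {"p", "b", "-f", "w"})))
    print(sorted(generating_rules(RULES, {"p", "b", "f", "w"})))
```

For X = {p, b, ¬f, w} the rule r3 is excluded because ¬f blocks it; for X′ = {p, b, f, w} the rule r1 is excluded because f blocks it.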
Grigoris Antoniou and Kewen Wang
DEFINITION 33. Let (Π, <) be a prioritized program and let X be an answer set of Π. Then, X is called
5.3
Characterizing Prioritized Reduction
The preference approach introduced in this section differs in two significant ways from the previous ones. First, the construction of answer sets is separated from verifying whether they respect the given preferences. Second, rules that putatively lead to counter-intuitive results are explicitly removed from the inference process:
$$E_X(\Pi) = \Pi \setminus \{r \in \Pi \mid \mathit{head}(r) \in X,\ \mathit{body}^-(r) \cap X \neq \emptyset\}$$
Accordingly, we define $E_X(\Pi, <) = (E_X(\Pi),\ < \cap\ (E_X(\Pi) \times E_X(\Pi)))$. For simplicity, we assume $E_X(\Pi) = \Pi$ in the following.
In [Brewka and Eiter, 1999], a B-preferred answer set of a prioritized program (Π, <) is defined as a B-preferred answer set of a fully prioritized program (Π, ≪) where ≪ extends <. Moreover, a general program is further reduced to a prerequisite-free program by (1) deleting any rule r with $\mathit{body}^+(r) \not\subseteq X$ and (2) deleting $\mathit{body}^+(r)$ for any remaining rule r.
DEFINITION 35. Let (Π, ≪) be a fully prioritized prerequisite-free logic program, let $\langle r_i \rangle_{i \in I}$ be an enumeration of Π according to the ordering ≪, and let X be a set of literals. Then $B_{(\Pi,\ll)}(X)$ is the smallest logically closed set of literals containing $\bigcup_{i \in I} X_i$, where $X_j = \emptyset$ for $j \notin I$ and
$$X_i = \begin{cases} X_{i-1} & \text{if } \mathit{body}^-(r_i) \cap X_{i-1} \neq \emptyset \\ X_{i-1} \cup \{\mathit{head}(r_i)\} & \text{otherwise.} \end{cases}$$
X is a B-preferred answer set of (Π, ≪) if X is an answer set of Π and $X = B_{(\Pi,\ll)}(X)$.
Consider again the bird-fly example. It can be verified that both X and X′ are B-preferred, while only X is D- and W-preferred. If we replace the condition “r ∈ Π is active in (Y, X)” in Definition 26 with “r ∈ Π is active in (X, X)”, then we can define a new consequence operator $C^B_{(\Pi,<)}$ in a similar way as $C^D_{(\Pi,<)}$ and $C^W_{(\Pi,<)}$. Unlike the latter two, $C^B_{(\Pi,<)}$ is not anti-monotonic. This is related to the fact that the “answer set property” of a set is verified separately. B-preference also enjoys a fixpoint characterization similar to those of D-preference and W-preference.
THEOREM 36. Let (Π, <) be a prioritized program over L and let X be an answer set of Π. Then X is B-preferred if and only if $C^B_{(\Pi,<)}(X) = X$.
5.4
Relationships.
First of all, we observe that all three approaches treat the blockage of (higher-ranked) rules in the same way. That is, a rule r′ is found to be blocked if either its prerequisites in $\mathit{body}^+(r')$ are never derivable or some member of $\mathit{body}^-(r')$ has been derived by higher-ranked or unrelated rules. This is reflected by the identity of conditions IIa and 2a/b in all three approaches, respectively.
From the fixpoint characterizations of the three approaches, it can be shown that each D-preferred answer set is a W-preferred answer set and each W-preferred answer set is a B-preferred answer set. Let $AS(\Pi) = \{X \mid C_\Pi(X) = X\}$ and $AS_P(\Pi, <) = \{X \in AS(\Pi) \mid X \text{ is } P\text{-preferred}\}$ for P = W, D, B. Then we obtain the following summarizing result.
THEOREM 37. Let (Π, <) be a prioritized logic program. Then we have
$$AS_D(\Pi, <) \subseteq AS_W(\Pi, <) \subseteq AS_B(\Pi, <) \subseteq AS(\Pi)$$
This hierarchy is primarily induced by a decreasing interaction between groundedness and preference. For stratified logic programs, we have the following result.
THEOREM 38. Let $X^\star$ be the perfect model of a stratified logic program Π and let $<_s$ be an order induced by some stratification of Π. Let (Π, <) be a prioritized logic program such that $< \subseteq <_s$.
Then we have $AS_D(\Pi, <) = AS_W(\Pi, <) = AS_B(\Pi, <) = AS(\Pi) = \{X^\star\}$.
Finally, it should be pointed out that the previous fixpoint characterisations not only provide a unifying framework for various kinds of preferred answer sets but also lead directly to implementations via the compilation techniques introduced in [Delgrande et al., 2000b]. The corresponding compiler, called plp [Delgrande et al., 2001], serves as a front-end to dlv and smodels and is freely available at http://www.cs.uni-potsdam.de/~torsten/plp/.
6
CONCLUSION
Default Logic is an important method of knowledge representation and reasoning, because it supports reasoning with incomplete information, and because defaults can be found naturally in many application domains, such as diagnostic problems, information retrieval, legal reasoning, regulations, and specifications of systems and software. Default Logic can be used either to model reasoning with incomplete information, which was the original motivation, or as a formalism which enables compact representation of information [Cadoli et al., 1994]. Important prerequisites for the development of successful applications in these domains include (i) the understanding of the basic concepts, and (ii) the existence of powerful implementations.
The aim of this paper was to contribute to the first requirement. We have discussed the basic concepts and ideas of Default Logic, and based the presentation on operational interpretations rather than on fixpoints, as is usually done. The operational interpretations allow learners to apply concepts to concrete problems in a straightforward way. This is an important point, because the difficulty of understanding Default Logic should not be underestimated. A survey among students at the University of Toronto regarding their ranking of the difficulty of several nonmonotonic reasoning formalisms resulted in Default Logic (presented based on fixpoints) being perceived as the second most difficult one, surpassed only by the full version of Circumscription [McCarthy, 1980], but well ahead of Autoepistemic Logic [Moore, 1985].
In some cases, standard Default Logic is insufficient to resolve conflicts among defaults, as we have seen in previous sections. Preferences provide a declarative way to solve this problem, and thus many approaches to preference handling in Default Logic have been proposed. Logic programs under the answer set semantics can be seen as an expressive subset of default theories. More recent approaches to preferences are mainly based on logic programs. For this reason, we also introduced preference handling in logic programs.
Implementation aspects were also outside the scope of this paper; some entry points to current work in the area include [Cholewiński et al., 1995; Cholewiński et al., 1996; Courtney et al., 1996; Linke and Schaub, 1995; Niemelä, 1995; Risch and Schwind, 1994; Schaub, 1994].
BIBLIOGRAPHY
[Antoniou and Sperschneider, 1994] G. Antoniou and V. Sperschneider. Operational concepts of nonmonotonic logics. part 1: Default logic. Artificial Intelligence Review, 8:3–16, 1994.
[Antoniou et al., 1996] G. Antoniou, T. O’Neill, and J. Thurbon. Studying properties of classes of default logics - preliminary report. In Proc. 4th Pacific Rim International Conference on Artificial Intelligence, LNAI 1114, pages 558–569. Springer-Verlag, 1996.
[Antoniou, 1998] G. Antoniou. Non-monotonic Reasoning. The MIT Press, 1998.
[Baader and Hollunder, 1992] F. Baader and B. Hollunder. Embedding defaults into terminological knowledge representation formalisms. In Proceedings of the Third International Conference on the Principles of Knowledge Representation and Reasoning, pages 306–317, Cambridge, MA, October 1992.
[Besnard, 1989] P. Besnard. An Introduction to Default Logic. Symbolic Computation - Artificial Intelligence. Springer-Verlag, 1989.
[Bondarenko et al., 1997] A. Bondarenko, P. Dung, R. Kowalski, and F. Toni. An abstract, argumentation-theoretic approach to default reasoning. Artificial Intelligence, 93(1-2):63–101, 1997.
[Brewka and Eiter, 1999] G. Brewka and T. Eiter. Preferred answer sets for extended logic programs. Artificial Intelligence, 109(1-2):297–356, 1999.
[Brewka and Eiter, 2000] G. Brewka and T. Eiter. Prioritizing default logic. In St. Hölldobler, editor, Intellectics and Computational Logic - Papers in Honour of Wolfgang Bibel, pages 27–45. Kluwer Academic Publishers, 2000.
[Brewka, 1991] G. Brewka. Nonmonotonic Reasoning: Logical Foundations of Commonsense. Cambridge University Press, Cambridge, 1991.
[Buccafurri et al., 1999] F. Buccafurri, N. Leone, and P. Rullo. Semantics and expressiveness of disjunctive ordered logic. Annals of Mathematics and Artificial Intelligence, 25:311–337, 1999.
[Cadoli et al., 1994] M. Cadoli, F. Donini, and M. Schaerf. Is intractability of nonmonotonic reasoning a real drawback. In Proceedings of the AAAI National Conference on Artificial Intelligence, pages 946–951. The AAAI Press/The MIT Press, 1994.
[Cholewiński et al., 1995] P. Cholewiński, V. Marek, A. Mikitiuk, and M. Truszczyński. Experimenting with nonmonotonic reasoning. In L. Sterling, editor, Proceedings of the International Conference on Logic Programming, pages 267–281. The MIT Press, 1995.
[Cholewiński et al., 1996] P. Cholewiński, V. Marek, and M. Truszczyński. Default reasoning system DeReS. In Proceedings of the Fifth International Conference on the Principles of Knowledge Representation and Reasoning, pages 518–528. Morgan Kaufmann Publishers, 1996.
[Courtney et al., 1996] A. Courtney, G. Antoniou, and N. Foo. Exten: A system for computing default logic extensions. In Proceedings of the 4th Pacific Rim International Conference on Artificial Intelligence, volume 1114 of Lecture Notes in Artificial Intelligence, pages 208–223. Springer-Verlag, 1996.
[Delgrande and Schaub, 2000] J. Delgrande and T. Schaub. Expressing preferences in default logic. Artificial Intelligence, 123(1-2):41–87, 2000.
[Delgrande et al., 1994] J. Delgrande, T. Schaub, and W. Jackson. Alternative approaches to default logic. Artificial Intelligence, 70(1-2):167–237, 1994.
[Delgrande et al., 2000a] J. Delgrande, T. Schaub, and H. Tompits. A compilation of Brewka and Eiter’s approach to prioritization. In M. Ojeda-Aciego, I. Guzmán, G. Brewka, and L. Pereira, editors, Proceedings of the European Workshop on Logics in Artificial Intelligence (JELIA 2000), volume 1919 of Lecture Notes in Artificial Intelligence, pages 376–390. Springer-Verlag, 2000.
[Delgrande et al., 2000b] J. Delgrande, T. Schaub, and H. Tompits. Logic programs with compiled preferences. In W. Horn, editor, Proceedings of the European Conference on Artificial Intelligence, pages 392–398. IOS Press, 2000.
[Delgrande et al., 2001] J. Delgrande, T. Schaub, and H. Tompits. A generic compiler for ordered logic programs. In T. Eiter, W. Faber, and M. Truszczyński, editors, Proceedings of the Sixth International Conference on Logic Programming and Nonmonotonic Reasoning, volume 2173 of Lecture Notes in Artificial Intelligence, pages 411–415. Springer-Verlag, 2001.
[Delgrande et al., 2002] J. Delgrande, T. Schaub, and H. Tompits. A framework for compiling preferences in logic programs. Theory and Practice of Logic Programming, 2002. To appear.
[Delgrande et al., 2004] J. Delgrande, T. Schaub, H. Tompits, and K. Wang. A classification and survey of preference handling approaches in nonmonotonic reasoning. Computational Intelligence, 20(2):308–334, 2004.
[Dimopoulos and Kakas, 1995] Y. Dimopoulos and C. Kakas. Logic programming without negation as failure. In J. Lloyd, editor, Proceedings of the International Symposium of Logic Programming, pages 369–383. The MIT Press, 1995.
[Etherington, 1987a] D. Etherington. Relating default logic and circumscription. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 489–494, 1987.
[Etherington, 1987b] D. Etherington. A semantics for default logic. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 495–498, 1987.
[Gelfond and Lifschitz, 1990] M. Gelfond and V. Lifschitz. Logic programs with classical negation. In Proceedings of the International Conference on Logic Programming, pages 579–597, 1990.
[Gelfond and Son, 1997] M. Gelfond and T. Son. Reasoning with prioritized defaults. In J. Dix, L. Pereira, and T. Przymusinski, editors, Third International Workshop on Logic Programming and Knowledge Representation, volume 1471 of Lecture Notes in Computer Science, pages 164–223. Springer-Verlag, 1997.
[Gelfond et al., 1991] M. Gelfond, V. Lifschitz, H. Przymusinska, and M. Truszczyński. Disjunctive defaults. In J. Allen, R. Fikes, and E. Sandewall, editors, Proceedings of the Second International Conference on the Principles of Knowledge Representation and Reasoning, pages 230–237. Morgan Kaufmann Publishers, 1991.
[Grosof, 1997] B. Grosof. Prioritized conflict handling for logic programs. In J. Maluszynski, editor, Logic Programming: Proceedings of the 1997 International Symposium, pages 197–211. The MIT Press, 1997.
[Junker et al., 2004] U. Junker, J. Delgrande, J. Doyle, F. Rossi, and T. Schaub. Computational Intelligence, special issue on Preferences in AI, volume 20. 2004.
[Linke and Schaub, 1995] T. Linke and T. Schaub. Lemma handling in default logic theorem provers. In G. Brewka and C. Witteveen, editors, Second Dutch/German Workshop on Non-Monotonic Reasoning Techniques and Their Applications, 1995.
[Łukaszewicz, 1988] W. Łukaszewicz. Considerations on default logic - an alternative approach. Computational Intelligence, 4:1–16, 1988.
[Łukaszewicz, 1990] W. Łukaszewicz. Non-monotonic reasoning: formalizations of commonsense reasoning. Artificial Intelligence. Ellis Horwood, 1990.
[Makinson, 1994] D. Makinson. General patterns in nonmonotonic reasoning. In D. Gabbay, C. Hogger, and J. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, volume 1, pages 35–110. Oxford University Press, 1994.
[Marek and Truszczyński, 1993] V. Marek and M. Truszczyński. Nonmonotonic logic: context-dependent reasoning. Artificial Intelligence. Springer-Verlag, 1993.
[McCarthy, 1980] J. McCarthy. Circumscription - a form of nonmonotonic reasoning. Artificial Intelligence, 13(1-2):27–39, 1980.
[Mikitiuk and Truszczyński, 1995] A. Mikitiuk and M. Truszczyński. Rational versus constrained default logic. In C. Mellish, editor, Proceedings of the International Joint Conference on Artificial Intelligence, pages 1509–1515. Morgan Kaufmann Publishers, 1995.
[Moore, 1985] R. Moore. Semantical considerations on nonmonotonic logics. Artificial Intelligence, 25:75–94, 1985.
[Niemelä, 1995] I. Niemelä. Towards efficient default reasoning. In C. Mellish, editor, Proceedings of the International Joint Conference on Artificial Intelligence, pages 312–318. Morgan Kaufmann Publishers, 1995.
[Poole, 1994] D. Poole. Default logic. In D. Gabbay, C. Hogger, and J. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, volume 1, pages 189–215. Oxford University Press, 1994.
[Reiter, 1977] R. Reiter. On closed world data bases. In H. Gallaire and J.-M. Nicolas, editors, Proceedings of Workshop on Logic and Databases, pages 119–140. Plenum, Toulouse, France, 1977.
[Reiter, 1980] R. Reiter. A logic for default reasoning. Artificial Intelligence, 13(1-2):81–132, 1980.
[Rintanen, 1998] J. Rintanen. Lexicographic priorities in default logic. Artificial Intelligence, 106:221–265, 1998.
[Risch and Schwind, 1994] V. Risch and C. Schwind. Tableau-based characterization and theorem proving for default logic. Journal of Automated Reasoning, 13:223–242, 1994.
[Sakama and Inoue, 2000] C. Sakama and K. Inoue. Prioritized logic programming and its application to commonsense reasoning. Artificial Intelligence, 123(1-2):185–222, 2000.
[Schaub and Wang, 2001] T. Schaub and K. Wang. A comparative study of logic programs with preference: Preliminary report. In A. Provetti and S. Cao, editors, Proceedings of AAAI Spring Symposium on Answer Set Programming, pages 151–157. AAAI Press, 2001.
[Schaub and Wang, 2003] T. Schaub and K. Wang. Towards a semantic framework for preference handling in answer set programming. Theory and Practice of Logic Programming, 3(4-5):569–607, 2003.
[Schaub, 1992] T. Schaub. On constrained default theories. Technical Report AIDA-92-2, FG Intellektik, FB Informatik, TH Darmstadt, Alexanderstraße 10, D-64283 Darmstadt, Germany, January 1992.
[Schaub, 1994] T. Schaub. A new methodology for query-answering in default logics via structure-oriented theorem proving. Technical report, IRISA, Campus de Beaulieu, F-35042 Rennes Cedex, France, January 1994.
[Teng, 1996] C. Teng. Possible world partition sequences: A unifying framework for uncertain reasoning. In Proc. 12th Conference on Uncertainty in Artificial Intelligence, pages 517–524, 1996.
[Wang and Zhou, 2001] K. Wang and L. Zhou. An extension to gcwa and query evaluation for disjunctive deductive databases. Journal of Intelligent Information Systems, 16(3):229–253, 2001.
[Zhang and Foo, 1997] Y. Zhang and N. Foo. Answer sets for prioritized logic programs. In J. Maluszynski, editor, Proceedings of the International Symposium on Logic Programming (ILPS-97), pages 69–84. The MIT Press, 1997.
NONMONOTONIC REASONING
Alexander Bochman
1
WHAT IS NONMONOTONIC REASONING
The field of nonmonotonic reasoning is now an essential part of the logical approach to Artificial Intelligence (AI). There exists a vast literature on the topic, including a number of books [Antoniou, 1997; Besnard, 1989; Bochman, 2001; Bochman, 2005; Brewka, 1991; Lukaszewicz, 1990; Makinson, 2005; Marek and Truszczyński, 1993; Schlechta, 1997; Schlechta, 2004]. Two collections are especially useful: [Ginsberg, 1987] is a primary source for the early history of the subject, while the handbook [Gabbay et al., 1994] provides overviews of important topics and approaches. [Minker, 2000] is a more recent collection of survey papers and original contributions to logic-based AI. This chapter has also benefited from a number of overviews of the field, especially [Reiter, 1987a; Minker, 1993; Brewka et al., 1997; Thomason, 2003].
The relationship between nonmonotonic reasoning and logic is part of a larger story of the relations between AI and logic (see [Thomason, 2003]). John McCarthy, one of the founders of AI, suggested in [McCarthy, 1959], and has consistently developed, a research methodology that uses logic to formalize the reasoning problems in AI.¹ McCarthy’s objective was to formalize the commonsense reasoning used in dealing with everyday problems. In a sense, nonmonotonic reasoning is an outgrowth of McCarthy’s program. But though commonsense reasoning has always appeared to be an attractive standard, the study of ‘artificial reasoning’ need not be, and actually has not been, committed to the latter. The basic formalisms of nonmonotonic reasoning could hardly be called formalizations of commonsense reasoning. Still, in trying to cope with principal commonsense reasoning tasks, the suggested formalisms have succeeded in capturing important features of the latter and thereby have broken new territory for logic.
Artificial Intelligence has practical purposes, which give rise to problems and solutions of a new kind, apparently different from the questions relevant for philosophers. The authors of the first nonmonotonic theories tried, of course, to express their formalisms using available logical means, ranging from the classical first order language to modal logics. McCarthy himself has always believed that anything that can be expressed can be expressed in first order logic (he considered this a kind of Turing thesis for logic). Still, from its very beginning, logical AI has created formalisms and approaches that had no counterpart in existing logical theories, and refined them into sophisticated logical systems. This was achieved mostly by people who were not logicians by primary specialty. It is even advantageous to see nonmonotonic reasoning as a brand new approach to logical reasoning; this would save us from hasty attempts to subsume such reasoning in existing logical formalisms at the price of losing the precious new content. Already at this stage of its development, nonmonotonic reasoning is not yet another application of logic, but a relatively independent field of logical research that has great potential for informing, in turn, future logical theory as well as many areas of philosophical inquiry.
¹ See [Lifschitz, 1991a] for an overview of McCarthy’s research program.
Handbook of the History of Logic. Volume 8 Dov M. Gabbay and John Woods (Editors) © 2007 Elsevier B.V. All rights reserved.
The origins of nonmonotonic reasoning within the broad area of logical AI lay in dissatisfaction with the traditional logical methods for representing and handling the problems posed by AI. Basically, the problem was that the reasoning necessary for intelligent behavior and decision making in realistic situations turned out to be difficult, even impossible, to represent as deductive inferences in some logical system. In commonsense reasoning, we usually have just partial information about a given situation, and we make a lot of assumptions about how things normally are in order to carry out further reasoning. For example, if we learn that Tweety is a bird, we usually assume that it can fly. Without such presumptions, it would be almost impossible to carry out the simplest commonsense reasoning tasks.
Speaking generally, human reasoning is not reducible to collecting facts and deriving their consequences; it embodies an active epistemic attitude that involves making assumptions and wholesale theories about the world and acting in accordance with them.
We do not only perceive the world, we also give it structure in order to make it intelligible and controllable. Commonsense reasoning in this sense is just a rudimentary form of a general scientific methodology.
The way of thinking in partially known circumstances suggested by nonmonotonic reasoning consists in using justified beliefs and reasonable assumptions that can guide us in our decisions. Accordingly, nonmonotonic reasoning can be described as a theory of making and revising assumptions in a reasoned or principled way [Doyle, 1994]. Of course, the latter are only beliefs and assumptions, so they should be abandoned when we learn new facts about the circumstances that contradict them.
The sentence “Birds (normally) fly” is weaker than “All birds fly”; there is a seemingly open-ended list of exceptions: ostriches, penguins, Peking ducks, etc. So, if we tried to use classical logic for representing “Birds fly”, the first problem would be that it is practically impossible to enumerate all exceptions to flight with an axiom of the form
$$(\forall x).\ \mathit{Bird}(x) \,\&\, \neg\mathit{Penguin}(x) \,\&\, \neg\mathit{Emu}(x) \,\&\, \neg\mathit{Dead}(x) \,\&\, \ldots \supset \mathit{Fly}(x)$$
This fact indicates that our commonsense assumptions are often global in character, saying something like “the world is as normal as possible, given the known
facts”. The second crucial problem is that, even if we could enumerate all such exceptions, we still could not derive Fly(Tweety) from Bird(Tweety) alone. This is so since we are not given that Tweety is not a penguin, or dead, etc. The antecedent of the above implication cannot be derived, in which case there is no way of deriving the consequent. Nevertheless, if told only about a particular bird, say Tweety, without being told anything else about it, we would be justified in assuming that Tweety can fly, without knowing that it is not one of the exceptional birds. So the problem is how we can actually make such assumptions in the absence of information to the contrary.
This suppositional character of commonsense reasoning conflicts with the monotonic character of logical derivations. Monotonicity is just a characteristic property of deductive inferences arising from the very notion of a proof being a sequence of steps starting with accepted axioms and proceeding by inference rules that remain valid in any context of its use. Consequently, if a set a of formulas implies a consequence C, then a larger set a ∪ {A} will also imply C. Commonsense reasoning is non-monotonic in this sense, because adding new facts may invalidate some of the assumptions made earlier.
In his influential “frames paper” [Minsky, 1974], Marvin Minsky proposed the notion of a frame, a complex data structure meant to represent stereotyped and default information. While Minsky’s description of a frame was informal, central to his notion were prototypes, default assumptions, and the unsuitability of classical definitions for commonsense concepts. In the appendix entitled “Criticism of the Logistic Approach”, Minsky explained why he thought that logical approaches would not work.
To begin with, he directly questioned the suitability of representing commonsense knowledge in the form of a deductive system:
There have been serious attempts, from as far back as Aristotle, to represent common sense reasoning by a “logistic” system .... No one has been able successfully to confront such a system with a realistically large set of propositions. I think such attempts will continue to fail, because of the character of logistic in general rather than from defects of particular formalisms.
Minsky doubted the feasibility of representing ordinary knowledge effectively in the form of many small, independently true propositions. In his opinion, such “logical” reasoning is not flexible enough to serve as a basis for thinking. The strategy of complete separation of specific knowledge from general rules of inference is much too radical. We need more direct ways for linking fragments of knowledge to advice about how they are to be used. As a result of this deficiency, traditional formal logic cannot discuss what ought to be deduced under ordinary circumstances. Minsky was also one of the first to mention monotonicity as a source of the problem:
MONOTONICITY: ... In any logistic system, all the axioms are necessarily “permissive”: they all help to permit new inferences to be drawn. Each added axiom means more theorems, none can disappear. There simply is no direct way to add information to tell such a system about kinds of conclusions that should not be drawn! To put it simply: if we adopt enough axioms to deduce what we need, we deduce far too many other things.
As yet another problematic feature, Minsky mentioned the requirement of consistency demanded by Logic, which makes the corresponding systems too weak:
I cannot state strongly enough my conviction that the preoccupation with Consistency, so valuable for Mathematical Logic, has been incredibly destructive to those working on models of mind. At the popular level it has produced a weird conception of the potential capabilities of machines in general. At the “logical” level it has blocked efforts to represent ordinary knowledge, by presenting an unreachable image of a corpus of context-free “truths” that can stand separately by themselves.
The first theories of nonmonotonic reasoning could be viewed as providing logical answers to Minsky’s challenge.
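The contrast drawn in this section between monotonic derivation and reasoning with default assumptions can be made concrete with the Tweety example. The following is a deliberately small sketch and not any particular nonmonotonic formalism: the string encoding of facts, the two strict rules about penguins and the function name `conclusions` are assumptions introduced for illustration.

```python
def conclusions(facts):
    """Close a set of facts under two strict rules and one default about Tweety."""
    derived = set(facts)

    # Strict rules: penguins are birds, and penguins do not fly.
    if "penguin(tweety)" in derived:
        derived.add("bird(tweety)")
        derived.add("not flies(tweety)")

    # Default: a bird is assumed to fly unless the contrary is already known.
    if "bird(tweety)" in derived and "not flies(tweety)" not in derived:
        derived.add("flies(tweety)")

    return derived


if __name__ == "__main__":
    before = conclusions({"bird(tweety)"})
    after = conclusions({"bird(tweety)", "penguin(tweety)"})
    print("flies(tweety)" in before)  # the default applies
    print("flies(tweety)" in after)   # the default is withdrawn
```

With only Bird(Tweety) the default yields Fly(Tweety); adding the fact that Tweety is a penguin removes that conclusion, so the set of conclusions does not grow monotonically with the set of facts.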
2
PRE-HISTORY: PROBLEMS AND FIRST SOLUTIONS
Long before the emergence of the first nonmonotonic systems, there were a number of problems and applications in AI that required and used some form of nonmonotonic reasoning. In fact, it is these problems and solutions, rather than the strategic considerations of McCarthy or Minsky, that influenced the actual shape of subsequent nonmonotonic formalisms. Initial solutions to commonsense reasoning tasks worked (though in restricted applications), and this was an incentive for trying to provide them with a more systematic logical basis.
At the most general level, nonmonotonic or default reasoning is intimately connected to the notion of prototypes in psychology and natural kinds in philosophy. Just like the latter, default assumptions cannot be defined via necessary and sufficient conditions, but involve a description of “typical” members. The problem of representing and reasoning with such concepts reappeared in AI as a practical problem of building taxonomic hierarchies for large knowledge bases. The basic reasoning principle in such hierarchies is that subclasses inherit properties from their super-classes. Much more complex logical issues arose when the organization of a domain into hierarchies was allowed to have exceptions. The theory of reasoning in such taxonomies has been called nonmonotonic inheritance (see [Horty, 1994] for an overview). The guiding principle in resolving potential conflicts in such
hierarchies was a specificity principle [Poole, 1985; Touretzky, 1986]: more specific information should override more generic information in cases of conflict. Though obviously related to nonmonotonic reasoning, nonmonotonic inheritance relied more heavily on graph-based representations than on traditional logical tools (see, e.g., [Touretzky, 1986]). Nevertheless, it has managed to provide a plausible analysis of reasoning in this restricted context. The relations of nonmonotonic inheritance to general nonmonotonic formalisms, and especially the role of the specificity principle, have been an active area of research.
Default assumptions of a different kind have been ‘discovered’ in the framework of already existing systems, such as reasoning in databases and planning domains, and even in the formulation of ordinary problems, or puzzles. A common assumption or, better, convention in such systems and problem formulations has been that positive assertions that are not explicitly stated should be considered false. Attempts to formalize this assumption have led to Reiter’s Closed World Assumption principle and McCarthy’s circumscription (see below). But first and foremost, the problem of default assumptions has shown itself in attempts to represent reasoning about actions and change.
2.1
The frame problem
This, then, are the three problems in formalizing action: [the qualification, frame, and ramification problem]. Other than that, no worries. [Ginsberg, 1993]
It is difficult to overestimate the importance of the frame problem for AI. It is central to virtually every interesting area of AI, such as planning, explanation, and diagnosis. As such, solving the frame problem is necessary for the whole logical approach to AI. As was stated in [McCarthy and Hayes, 1969], a computer program capable of acting intelligently in the world must have a general representation of the world in terms of which its inputs are interpreted. It should decide what to do by inferring in a formal language that a certain strategy will achieve its assigned goal. This paper also introduced the main problem that prevented an adequate formalization of this task: the Frame Problem. Basically, the problem was how to efficiently determine which things remain the same in a changing world.
As was rightly noted in [Thomason, 2003], the frame problem arises in the context of predictive reasoning, a type of reasoning that has been neglected in the traditional tense-logical literature, but is essential for planning and formalizing intelligent behavior. Prediction involves the inference of later states from earlier ones. Changes in this setting do not merely occur, but occur for a reason. Furthermore, we usually assume that most things will be unchanged by the performance of an action. It is this inertia assumption that connects reasoning about action and change with nonmonotonic reasoning. In this reformulation, the frame problem is to determine what stays the same about the world as time passes and actions are performed, without having to explicitly state each time all the things that stay the same [Morgenstern, 1996].
The frame problem has a number of formulations and levels of generalization. The initial description of the problem in [McCarthy and Hayes, 1969] was made in the specific context of a situation calculus, an instance of first-order logic especially formulated for reasoning about action. In this model, changes are produced by actions, so the basic relation is Result between an action, an initial situation, and a situation resulting from the performance of the action. In order to specify how propositions (fluents) do not change as actions occur (e.g., a red block remains red after we have put it on top of another block), the authors suggested writing down special axioms they called frame axioms. Of course, a huge number of things stay the same after a particular action, so we would have to add a very large number of frame axioms to the theory. This is precisely the frame problem. More generally, it is the persistence problem [Shoham, 1988]: the general problem of predicting the properties that remain the same as actions are performed, within any reasonable formalism for reasoning about time.
On a more general level, the frame problem has been understood as a temporal projection problem encompassing the persistence, ramification and qualification problems, as well as the problem of backward temporal projection, or retrodiction. The ramification problem was first formulated in [Finger, 1987] and concerns the necessity of taking into account numerous derived effects (ramifications) of actions, effects created by logical and causal properties of a situation. Taking an example from [Lin, 1995], suppose that a certain suitcase has two locks, and is open if both locks are open. Then the action of opening one lock produces an indirect effect of opening the suitcase if and only if the other lock is open. Derived effects should be taken into account when combined with the above mentioned inertia assumption, since they override the latter.
Thus, if an action of opening one lock is performed, its derived effect overrides the default inertia assumption that the suitcase remains closed. The ramification problem has raised general questions on the nature of causation and its role in temporal reasoning. It was an incentive for the causal approach to the frame problem, described later. The qualification problem is the problem of specifying what conditions must be true in the world for a given action to have its intended effect. It was introduced in [McCarthy, 1980] and described as follows: It seemed that in order to fully represent the conditions for the successful performance of an action, an impractical and implausible number of qualifications would have to be included in the sentences expressing them. Common sense reasoning is ordinarily ready to jump to the conclusion that a tool can be used for its intended purpose unless something prevents its use. Considered purely extensionally, such a statement conveys no information; it seems merely to assert that a tool can be used for its intended purpose unless it can’t. Heuristically, the statement is not just a tautologous disjunction; it suggests forming a plan to use the tool.
If I turn the ignition key in my car, I expect the car to start. However, many conditions have to be true in order for this statement to be true: the battery must be alive, the starter must work, there must be gas in the tank, there must be no potato in the tailpipe, etc. The list of qualifications is open-ended. Still, without knowing for certain about most of these facts, I normally assume that turning the key will start the car. The Qualification Problem has turned out to be one of the most stubborn problems for the representation of action and change. The majority of subsequent nonmonotonic formalisms, such as default logic, have failed to deliver the intended conclusions because they do not provide natural ways of formalizing the specificity principle (mentioned earlier in discussing nonmonotonic inheritance), according to which more specific defaults should override more general defaults.
The idea behind the first nonmonotonic solutions to the frame problem was to treat inertia as a default: changes are assumed to occur only if there is some reason for them to occur. In action-based formalisms, the absence of change is inferred when an action is performed unless a reason for the change can be found in axioms for the action. Attempts to formalize such reasoning obviously required a nonmonotonic reasoning system. In one of the earliest such attempts, [Sandewall, 1972] used a modal operator UNLESS A, meaning “A cannot be proved”, for dealing with the frame problem, namely for expressing the inertia claim that every action leaves any fluent unaffected unless it is possible to deduce otherwise. One of the principal motivations of McCarthy and Reiter for the study of nonmonotonic reasoning was the belief that it would provide a solution to the frame problem. The frame problem, however, ‘survived’ the first generation of nonmonotonic formalisms.
[Hanks and McDermott, 1987] presented the Yale Shooting Anomaly and demonstrated that the apparently plausible nonmonotonic approaches to the frame problem fail. In fact, it was a major factor in the conversion of McDermott, one of the founders of nonmonotonic reasoning, to an antilogicist (see [McDermott, 1987]). The Yale Shooting Anomaly has shown, in effect, that a simple-minded combination of default assumptions measured only by the number of defaults that are violated can lead to spurious solutions. Still, the development of the nonmonotonic approach to the frame problem has continued and led to new, more adequate solutions (see, e.g., [Morgenstern and Stein, 1994; Sandewall, 1994; Shanahan, 1997]). In addition, monotonic solutions to the frame problem have been suggested in [Schubert, 1990; Reiter, 1991] and successfully applied to quite complex formalization problems. These solutions were based, however, on writing explicit frame axioms stating what does not change when an action is performed.
2.2
Procedural nonmonotonicity
Procedural solutions to the frame problem have been popular in AI since its earliest days. Perhaps the best known is the planning program STRIPS suggested by Fikes
and Nilsson in 1971. Given an initial state, a goal state, and a list of actions, STRIPS finds a sequence of actions that achieves the goal state. In the course of planning, however, it must reason about what changes and what stays the same after an action is performed. STRIPS avoids (and thereby solves) the frame problem by assuming that if an action is not known to change some feature, it does not. In a setting devoid of deductive inferences, this principle is easy to represent procedurally by associating with each action a list of preconditions that must be satisfied in order for the action to be performed, along with an add list and a delete list. The add list is the set of statements that are added to the current state after an action is performed, and the delete list is the set of statements that are deleted from the current state after the action is performed. By its very nature, STRIPS cannot handle conditional actions, and it works only for limited ontologies devoid of causation and other constraints on combinations of propositional atoms.
In other areas of AI, too, researchers have routinely implemented procedural nonmonotonic reasoning systems, usually without reflecting on the underlying reasoning patterns on which their programs rely. Typically these patterns were implemented using so-called negation as failure, which occurs as an explicit operator in programming languages like PROLOG. In database theory there is an explicit convention about the representation of negative information that provides a specific instance of nonmonotonic reasoning. For example, the database for an airline flight schedule does not include the city pairs that are not connected, which clearly would be an overwhelming amount of information.
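The flight-schedule convention can be illustrated with a minimal sketch. The table contents and the function name `connected` are hypothetical and serve only as an illustration; the point is that a negative answer is read off from the absence of a stored fact, and that this answer is withdrawn once the fact is added.

```python
# Hypothetical flight table: only the connected city pairs are stored.
flights = {("Toronto", "Boston"), ("Boston", "New York")}


def connected(db, origin, destination):
    """Answer a query by lookup; a pair that is not stored counts as not connected."""
    return (origin, destination) in db


# No fact about Toronto -> New York is stored, so the query is answered negatively.
before = connected(flights, "Toronto", "New York")

# Enlarging the database retracts the earlier negative conclusion.
flights_extended = flights | {("Toronto", "New York")}
after = connected(flights_extended, "Toronto", "New York")
```

The negative answer in `before` is not a consequence of the stored facts in the classical sense, which is why adding a fact can overturn it.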
Instead of explicitly representing such negative information, databases implicitly do so by appealing to the so-called closed world assumption (CWA) [Reiter, 1978], which states that if a positive fact is not explicitly present in the database, its negation is assumed to hold. For simple databases consisting of atomic facts only, e.g. relational databases, this approach to negative information is straightforward. In the case of deductive databases, however, it is no longer sufficient that a fact not be explicitly present in order to conjecture its negation; the fact may be derivable. For this case, Reiter defined the closure of a database as follows: CWA(DB) = DB ∪ {¬P(t) | DB ⊬ P(t)}, where P(t) is a ground predicate instance. That is, if a ground atom cannot be inferred from the database, its negation is added to the closure. Under the CWA, queries are evaluated with respect to CWA(DB), rather than DB itself.
Logically speaking, the CWA singles out the least model of a database. Consequently, it works only when the database possesses such a least model, e.g., for Horn databases. Otherwise the closure becomes inconsistent. For example, for a database containing just the disjunctive statement A ∨ B, neither A nor B can be deduced, so both ¬A and ¬B are in the closure, which is then inconsistent with the original database. A suitable generalization of CWA for arbitrary databases, the Generalized Closed World Assumption, has been suggested in [Minker, 1982].
The Prolog programming language developed by Colmerauer and his students
[Colmerauer et al., 1973] and the PLANNER language developed by Hewitt [Hewitt, 1969] were the first languages to have a nonmonotonic component. The not operator in Prolog and the THNOT capability in PLANNER provided default rules for answering questions about data where the facts did not appear explicitly in the program. In Prolog, the goal not G succeeds if the attempt to find a proof of G using the Prolog program as axioms fails. Thus, Prolog’s negation is a nonmonotonic operator: if G is nonprovable from some axioms, it need not remain nonprovable from an enlarged axiom set. The way this procedural negation is actually used in AI programs amounts to invoking the rule of inference “From failure of G, infer ¬G.” This is really the closed world assumption. This procedural negation can also be used to implement other forms of default reasoning; this has led to the development of modern logic programming as a general representation formalism for nonmonotonic reasoning.
A different formalization of the CWA was proposed in [Clark, 1978] in an attempt to give a formal semantics for negation in Prolog. Clark’s idea was that Prolog clauses provide sufficient but not necessary conditions on the predicates in their heads, while the CWA is the assumption that these sufficient conditions are also necessary. Accordingly, for a propositional logic program Π consisting of rules of the form p ← a (with a a set of atoms), queries should be evaluated with respect to its completion, which is a classical logical theory consisting of the following equivalences: p ↔ ⋁{⋀aᵢ | p ← aᵢ ∈ Π}, for any ground atom p. The completion formulas embody two kinds of information. As implications from right to left, they contain the material implications corresponding to the program rules. In addition, left-to-right implications state that an atom belongs to the model only if one of its justifications is also in the model.
The latter justification requirement has become a central part of the truth maintenance system, suggested by Jon Doyle.
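The two formalizations discussed in this section can be compared on a small propositional Horn program. The following is only a sketch under stated assumptions: the program, its atoms, and the helper names `least_model`, `cwa_negations` and `completion_models` are hypothetical, rule bodies are taken to be conjunctions of atoms, and the completion is checked by brute-force enumeration of interpretations rather than by any method from the cited papers.

```python
from itertools import product


def least_model(program):
    """Least model of a propositional Horn program given as (head, body) pairs."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in program:
            if head not in model and all(a in model for a in body):
                model.add(head)
                changed = True
    return model


def cwa_negations(program, atoms):
    """Atoms whose negations the CWA adds: those not derivable from the program."""
    return set(atoms) - least_model(program)


def completion_models(program, atoms):
    """Interpretations in which each atom p is true iff some rule body for p is true."""
    atoms = sorted(atoms)
    found = []
    for bits in product([False, True], repeat=len(atoms)):
        interp = {a for a, b in zip(atoms, bits) if b}
        if all(
            (p in interp)
            == any(all(a in interp for a in body)
                   for head, body in program if head == p)
            for p in atoms
        ):
            found.append(interp)
    return found


# Hypothetical program: q.   r <- q.   p <- p.   (s has no rules)
program = [("q", ()), ("r", ("q",)), ("p", ("p",))]
atoms = {"p", "q", "r", "s"}

derivable = least_model(program)
assumed_false = cwa_negations(program, atoms)
completed = completion_models(program, atoms)
```

On this program the closure makes both p and s false, whereas the completion only yields p ↔ p for the atom p and therefore also admits an interpretation in which p is true and supported by nothing but itself.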
2.3
Justification-based truth maintenance
Doyle’s truth maintenance system (TMS) can be viewed as one of the first rigorous solutions to the problem of representing nonmonotonic reasoning (see [Doyle, 1979]). In particular, it introduced the now familiar notion of nonmonotonic justification, subsequently used in default and modal nonmonotonic logics. The idea of the TMS was to keep track of the support of beliefs, and to use the record of these support dependencies when it is necessary to revise beliefs. In a TMS, part of the support for a belief can consist in the absence of other beliefs. This introduced nonmonotonicity. The TMS represented belief states by structures called nodes that the TMS labeled as either in or out (of the current state). The TMS also recorded sets of justifications or reasons for each node in the form of rules A // B ⊩ c, read as “A without B gives c”, meaning that the node c should be in if each node in the set A
is in and each node in the set B is out. The TMS then seeks to construct labelings for the nodes from these justifications, labelings that satisfy two principles:
• stability: a node is labeled in iff one of its reasons is valid in the labeling (i.e., expresses hypotheses “A without B” that match the labeling);
• groundedness: labelings provide each node labeled in with a noncircular argument in terms of valid reasons.
The TMS algorithm and its refinements had a significant impact on AI applications, and called for a logical analysis. It provided a natural and highly specific challenge for those seeking to develop a nonmonotonic logic. In fact, both the modal nonmonotonic logic of [McDermott and Doyle, 1980] and the default logic of [Reiter, 1980] can be seen as logical formalizations of the above principles. Namely, each of these theories formalized nonmonotonic reasoning by encoding groundedness and the presence and absence of knowledge in terms of logical provability and unprovability (consistency). It is interesting to mention, however, that Jon Doyle himself has always felt a discrepancy between his original formulation and subsequent logical formalizations:
In the first place, the logical formalizations convert what in many systems is a fast and computationally trivial check for presence and absence of attitudes into a computationally difficult or impossible check for provability, unprovability, consistency or inconsistency. This inaptness seems especially galling in light of the initial problem-solving motivations for nonmonotonic assumptions, for which assumptions served to speed inference, not to slow it. [Doyle, 1994]
One of the most important features of Doyle’s system was its emphasis on justifications and the role of argumentation in constructing proper labelings. This theme has been developed in subsequent theories.
3
COMING OF AGE
Nonmonotonic reasoning obtained its impetus in 1980 with the publication of a seminal issue of the Artificial Intelligence Journal, devoted to nonmonotonic reasoning. The issue included papers representing three basic approaches to nonmonotonic reasoning: circumscription [McCarthy, 1980], default logic [Reiter, 1980], and modal nonmonotonic logic [McDermott and Doyle, 1980]. These theories suggested three different ways of meeting Minsky’s challenge by developing formalisms that do not have the monotonicity property. On the face of it, the three approaches were indeed different, beginning with the fact that they were based on three altogether different languages — the classical first order language in the case of circumscription, a set of inference rules in default logic, and modal language in modal nonmonotonic logic. Still, behind these
differences there was a common idea. The idea was that default conditionals or inference rules used in commonsense derivations can be represented, respectively, as ordinary conditionals or inference rules by augmenting their premises with additional assumptions, assumptions that could readily be accepted in the absence of contrary information. In this respect, the differences between the three theories amounted to different mechanisms of making such default assumptions.
3.1
Circumscription
McCarthy’s circumscription was based on classical logic, and focused in large part on representation techniques. He stressed that circumscription is not a “nonmonotonic logic”, but a form of nonmonotonic reasoning augmenting ordinary first order logic. The first paper [McCarthy, 1980] connected the strategic ideas of [McCarthy and Hayes, 1969] with the need for nonmonotonic reasoning, and described the simplest kind of circumscription, namely domain circumscription. The second paper [McCarthy, 1986] provided more thorough logical foundations, and introduced the more general and powerful predicate circumscription approach.
In his first description in [McCarthy, 1980], McCarthy characterized circumscription as a kind of conjectural reasoning by which humans and intelligent computer programs jump to the conclusion that the objects they can determine to have certain properties are the only objects that do. The result of applying circumscription to a collection A of facts is a sentence schema that asserts that the only tuples satisfying a predicate P are those whose doing so follows from the sentences of A. Since adding more sentences to A might make P applicable to more tuples, circumscription is not monotonic. Conclusions derived from circumscription are conjectures that A includes all the relevant facts and that the objects whose existence follows from A are all the relevant objects. Thus, circumscription is a tool for making conjectures. Conjectures may be regarded as expressions of probabilistic notions such as “most birds can fly” or they may be expressions of standard, or normal, cases. Such conjectures sometimes conflict, but there is nothing wrong with having incompatible conjectures on hand. Besides the possibility of deciding that one is correct and the other wrong, it is possible to use one for generating possible exceptions to the other.
Although circumscription was originally presented as a schema for adding more formulas to a theory, just as Reiter’s CWA or Clark’s completion, it can also be described semantically in terms of restricting the models of the theory to those that have minimal extensions of (some of) the predicates and functions. Let P be a set of predicate symbols that we are interested in minimizing and Z another set of predicate symbols that are allowed to vary across compared models. Predicates other than P and Z are called the fixed symbols. Let A(P ; Z) be a first-order sentence containing the symbols P and Z. A (parallel predicate) circumscription
chooses models of A(P; Z) that are minimal in the extension of predicates P, assuming that these models have the same interpretation for all symbols not in P or Z. This characterization can be concisely written as a second-order formula. A more detailed description of circumscription in its different forms can be found in [Lifschitz, 1994b]. The importance of predicates and functions that vary across compared interpretations has been recognized in [Etherington et al., 1985]. Without these, it would be impossible to infer new positive instances of any predicates from the preferred models. Accordingly, the general problem with using circumscription in applications amounted to specifying the circumscription policy: which predicates should be varied, which should be minimized, and with what priority. Widely varying results occur depending on these choices. In fact, as was noted already in [McCarthy, 1980], the results of circumscription depend on the very set of predicates used to express the facts, so the choice of representation has epistemological consequences in making conjectures by circumscription. The above models of circumscription can also be described as preferred models with respect to an appropriate preorder on all models. This view later led to the generalization of circumscription to a general preferential approach to nonmonotonic reasoning.
Abnormality theories
It is somewhat misleading to reduce the essence of circumscription to minimization. Viewed as a formalism for nonmonotonic reasoning, the central concept of McCarthy’s circumscriptive method is an abnormality theory, that is, a set of conditionals containing the abnormality predicate ab that provides a representation for default information.
[McCarthy, 1980] justified the need in introducing such auxiliary predicates as follows: When we circumscribe the first order logic statement of the [missionaries and cannibals] problem together with the common sense facts about boats etc., we will be able to conclude that there is no bridge or helicopter. “Aha”, you say, “but there won’t be any oars either”. No, we get out of that as follows: It is a part of common knowledge that a boat can be used to cross a river unless there is something wrong with it or something else prevents using it, and if our facts don’t require that there be something that prevents crossing the river, circumscription will generate the conjecture that there isn’t. The price is introducing as entities in our language the “somethings” that may prevent the use of the boat... Using circumscription requires that common sense knowledge be expressed in a form that says a boat can be used to cross rivers unless there is something that prevents its use. In particular, it looks like we must introduce into our ontology (the things that exist) a category that includes something wrong with a boat or a category that includes
something that may prevent its use. Incidentally, once we have decided to admit something wrong with the boat, we are inclined to admit a lack of oars as such a something and to ask questions like, “Is a lack of oars all that is wrong with the boat?” Some philosophers and scientists may be reluctant to introduce such things; but since ordinary language allows “something wrong with the boat” we shouldn’t be hasty in excluding it. Making a suitable formalism is likely to be technically difficult as well as philosophically problematical, but we must try. We challenge anyone who thinks he can avoid such entities to express in his favorite formalism, “Besides leakiness, there is something else wrong with the boat”.
In [McCarthy, 1986], McCarthy proposed a uniform principle for representing default claims in circumscription. It turns out that many common sense facts can be formalized in a uniform way. A single predicate ab, standing for “abnormal” is circumscribed with certain other predicates and functions considered as variables that can be constrained to achieve the circumscription subject to the axioms. This also seems to cover the use of circumscription to represent default rules. Many people have proposed representing facts about what is “normally” the case. One problem is that every object is abnormal in some way, and we want to allow some aspects of the object to be abnormal and still assume the normality of the rest. We do this with a predicate ab standing for “abnormal”. We circumscribe ab z. The argument of ab will be some aspect of the entities involved. Some aspects can be abnormal without affecting others. The aspects themselves are abstract entities, and their unintuitiveness is somewhat a blemish on the theory. For example, to say that normally birds fly, we can use ∀x : Bird(x) ∧ ¬ab aspect1(x) ⊃ Fly(x). Here the meaning of ab aspect1(x) is something like “x is abnormal with respect to flying birds”.
There can be many different aspects of abnormality, and they are indexed according to kind. The circumscription would then minimize abnormalities, allowing relevant predicates to vary (e.g., Fly). An important advantage of representation with abnormality predicates is that, by asserting that a certain object is abnormal in some respect, we can block, or defeat, an associated default rule without asserting that its consequent is false. For example, by asserting ab aspect1(Tweety), we block the application of the default “Birds fly” to Tweety without asserting that it cannot fly. This feature has turned out to be useful in many applications.
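The Tweety example can be checked on a propositional miniature of the abnormality theory. This is a sketch only, under the following assumptions: the atoms "bird", "ab" and "fly" stand for Bird(Tweety), ab aspect1(Tweety) and Fly(Tweety), the helper names are hypothetical, and minimal models are found by brute-force enumeration in place of the second-order circumscription formula.

```python
from itertools import product


def models(atoms, theory):
    """All interpretations (sets of true atoms) that satisfy the theory."""
    atoms = sorted(atoms)
    result = []
    for bits in product([False, True], repeat=len(atoms)):
        m = frozenset(a for a, b in zip(atoms, bits) if b)
        if theory(m):
            result.append(m)
    return result


def circumscribe(atoms, theory, minimized, varied):
    """Models that are minimal in the minimized atoms among models agreeing on fixed atoms."""
    minimized = frozenset(minimized)
    fixed = frozenset(atoms) - minimized - frozenset(varied)
    ms = models(atoms, theory)

    def smaller(m1, m2):
        # m1 is preferred to m2: same fixed part, strictly fewer minimized atoms.
        return (m1 & fixed) == (m2 & fixed) and (m1 & minimized) < (m2 & minimized)

    return [m for m in ms if not any(smaller(other, m) for other in ms)]


ATOMS = {"bird", "ab", "fly"}


def birds_fly(m):
    # Bird(Tweety), together with Bird(Tweety) and not ab(Tweety) implies Fly(Tweety).
    return "bird" in m and ("ab" in m or "fly" in m)


def birds_fly_blocked(m):
    # The same theory with the additional assertion ab aspect1(Tweety).
    return birds_fly(m) and "ab" in m


normal = circumscribe(ATOMS, birds_fly, minimized={"ab"}, varied={"fly"})
blocked = circumscribe(ATOMS, birds_fly_blocked, minimized={"ab"}, varied={"fly"})
```

In `normal` the only preferred model makes "fly" true, so the default conclusion is obtained. In `blocked` two preferred models remain, one with "fly" and one without, so neither Fly(Tweety) nor its negation follows.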
Abnormality theories have been widely used both in applications of circumscription, and in other theories. Some major examples are inheritance theories [Etherington and Reiter, 1983], logic-based diagnosis [Reiter, 1987b], naming defaults in the abductive approach of [Poole, 1988a], general representation of defaults in [Konolige and Myers, 1989] and reasoning about time and action. Abnormality theories have brought out, however, several problems in the application of circumscription to commonsense reasoning. One of the most pressing was the already mentioned specificity problem arising when there are conflicting defaults. In combining two defaults, “Birds fly” and “Penguins can’t fly”, the specificity principle naturally suggests that the second, more specific, default should be preferred. A general approach to handling this problem in circumscription, suggested in [Lifschitz, 1985] and endorsed in [McCarthy, 1986], was to impose priorities among minimized predicates and abnormalities. The corresponding variant of circumscription has been called prioritized circumscription. [Grosof, 1991] has generalized prioritized circumscription to a partial order of priorities.
Further developments and applications
Lifschitz [1985] described the concept of parallel circumscription and also treated prioritized circumscription. He also addressed the problem of computing circumscription and showed that, in some cases, circumscription can be replaced by an equivalent first-order formula. Further results in this direction have been obtained in [Doherty et al., 1995]. Independently of McCarthy, [Bossu and Siegel, 1985] have provided a semantic account of nonmonotonic reasoning for a special class of minimal models of a first-order theory. [Lifschitz, 1987a] has proposed a more expressive variation of circumscription, called pointwise circumscription which, instead of minimizing the extension of a predicate P(x) as a whole, minimizes the truth-value of P(a) at each element a.
Perlis [1986] and Poole [1989b] showed the inadequacies of circumscription to deal with counterexamples like the lottery paradox of Kyburg. In a lottery, it is known that some person will win, yet, for any individual x, the default should be that x does not win. If these facts are translated in a straightforward way into circumscription, it is impossible to arrive at the default conclusion, for any given individual x, that x will not win the lottery. To remedy this and similar anomalies, [Etherington et al., 1991] proposed a scoped circumscription in which the individuals over whom minimization proceeds are limited by a scoping predicate. We mentioned earlier that Reiter’s Closed World Assumption works only for theories that have a unique least model. For this case, it has been shown in [Lifschitz, 1985] that CWA is equivalent to circumscription (modulo unique names and domain closure assumptions). Similarly, it has been shown in [Reiter, 1982] that an appropriate circumscription always implies Clark’s predicate completion, but not vice versa. In other words, the minimal models determined by circumscription form in general only a subset of models sanctioned by completion. Recently, it
has been shown in [Lee and Lin, 2004] that circumscription can be obtained from completion by augmenting the latter with so-called ‘loop’ formulas. In the earliest circumscriptive solutions to the frame problem, the inertia rule was stated using an abnormality predicate. This formalization has succumbed, however, to the Yale Shooting Problem, mentioned earlier. Baker [1989] presented another solution to the Yale Shooting problem in the situation calculus, using a circumscriptive inertial axiom. A circumscriptive approach to the qualification problem was presented in [Lifschitz, 1987b]; it used an explicit relation between an action and its preconditions, and circumscriptively minimized preconditions, eliminating thereby unknown conditions that might render an action inefficacious. Further details and references can be found in [Shanahan, 1997].
3.2
Default Logic
A detailed description of default logic, its properties and variations can be found in [Antoniou and Wang, 2006]. Accordingly, we will sketch below only general features of the formalism, sufficient for determining its place in the nonmonotonic reasoning field. In both circumscription and modal nonmonotonic logic (see below), default statements are treated as logical formulas, while in default logic [Reiter, 1980] they are represented as inference rules. In this respect, Reiter’s default logic has been largely inspired by the need to provide logical foundations for the procedural approach to nonmonotonicity found in deductive databases, logic programming and Doyle’s truth maintenance. In particular, default logic begins with interpreting “In the absence of any information to the contrary, assume A” as “If A can be consistently assumed, then assume it.” A default is a rule of the form A : b/C, intended to state something like: ‘if A is believed, and each B ∈ b can be consistently believed, then C should be believed’. A is called the prerequisite of the default rule, and b the set of its justifications. The flying birds default is represented by the rule Bird(x) : Fly(x)/Fly(x).
Reiter defined a default theory to be a pair (D, W), where D is a set of default rules and W is a set of closed first-order sentences. The set W represents what is known to be true of the world. This knowledge is usually incomplete, and default rules act as mappings from this incomplete theory to a more complete extension of the theory. They partly fill in the gaps with plausible beliefs. Extensions are defined by a fixed point construction. For any set S of first-order sentences, Γ(S) is defined as the smallest set satisfying the following three properties:
1. W ⊆ Γ(S);
2. Γ(S) is closed under first-order logical consequence;
3. If A : b/C belongs to D, A ∈ Γ(S), and ¬B ∉ S for every B ∈ b, then C ∈ Γ(S).
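The operator Γ can be made concrete for a very restricted fragment. The sketch below assumes that W, prerequisites, justifications and consequents are all propositional literals (strings, with a leading "-" for negation) and that W is consistent, so that first-order closure of a consistent set adds no new literals. The helper names and the example theories are hypothetical. Candidate fixed points are guessed as W together with the consequents of some subset of the defaults, and each guess S is kept only if Γ(S) = S.

```python
from itertools import combinations


def neg(lit):
    """Complement of a literal written as 'a' or '-a'."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def consistent(lits):
    return not any(neg(l) in lits for l in lits)


def gamma(W, D, S):
    """Smallest literal set containing W and closed under the defaults whose
    justifications are consistent with S (property 3 of the definition)."""
    result = set(W)
    changed = True
    while changed:
        changed = False
        for prerequisites, justifications, consequent in D:
            if (consequent not in result
                    and all(p in result for p in prerequisites)
                    and all(neg(b) not in S for b in justifications)):
                result.add(consequent)
                changed = True
    return result


def fixed_points(W, D):
    """Guess S = W plus consequents of a subset of D; keep S when gamma(S) == S."""
    found = []
    for k in range(len(D) + 1):
        for chosen in combinations(D, k):
            S = set(W) | {consequent for _, _, consequent in chosen}
            if consistent(S) and gamma(W, D, S) == S and S not in found:
                found.append(S)
    return found


# bird : fly / fly, with W = {bird}
birds = fixed_points({"bird"}, [(("bird",), ("fly",), "fly")])

# Two conflicting defaults q : p / p and r : -p / -p, with W = {q, r}
conflict = fixed_points({"q", "r"},
                        [(("q",), ("p",), "p"), (("r",), ("-p",), "-p")])

# A default with no prerequisite whose justification contradicts its consequent
unstable = fixed_points(set(), [((), ("-a",), "a")])
```

Under these assumptions the bird theory has a single fixed point, the conflicting defaults give two, and the last theory has none, because applying the default undermines its own justification.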
Then a set E is an extension of the default theory iff Γ(E) = E, that is, E is a fixed point of the operator Γ. The above definition embodies an idea that an extension must not contain “ungrounded” beliefs, i.e., every formula in it must be derivable from W and the consequents of applied defaults in a non-circular way. It is this property that distinguishes default logic from circumscription. Though a standard way of excluding unwanted elements employs a minimality requirement, minimality alone is insufficient to exclude ungrounded beliefs. Reiter’s idea can also be expressed as follows. At the first stage we assume a conjectured extension and use it to determine the set of applicable inference rules, namely default rules such that their justifications are consistent with the assumption set. Then we take the logical closure of these applicable inference rules, and if it coincides with the candidate extension, the latter is vindicated. In this sense, an extension is a set of beliefs which are in some sense “justified” or “reasonable” in light of what is known about the world (cf. [Etherington, 1987]). This interpretation can be seen as a paradigmatic form of general explanatory nonmonotonic reasoning that we will discuss later.
Extensions are deductively closed sets that are closed also with respect to the rules of the default theory. Moreover, they are minimal such sets, and hence different extensions are incomparable with respect to inclusion. Still, not every minimal theory closed with respect to default rules is an extension. Multiple extensions of a default theory are possible. The perspective adopted on these in [Reiter, 1980] was that any such extension is a reasonable belief set for an agent. A typical example involves Richard Nixon, who is a Quaker and a Republican. Quakers (typically) are pacifists. Republicans (typically) are not pacifists.
One might conclude that Nixon is a pacifist using the first default, but also that Nixon is not a pacifist because he is a Republican. In this situation default logic generates two extensions, one containing the belief that Nixon is a pacifist, the other containing the belief that he is not. Speaking generally, there are many applications where each of the extensions is of interest by itself. For instance, diagnostic reasoning (see below) is usually modeled in such a way that each diagnosis corresponds to a particular extension.
A default theory does not always have extensions, the simplest example being the default theory ({true : ¬A/A}, ∅). However, default theories in the important subclass consisting of normal default rules of the form A : B/B always have an extension. On Reiter’s intended interpretation, normal defaults provided a representation of commonsense claims “If A, then normally B”. Reiter developed a complete proof theory for normal defaults and showed how it interfaces with a top-down resolution theorem prover. He considered this proof theory to be one of the advantages of default logic. Moreover, extensions of normal default theories are semi-monotonic: if E is an extension of a normal default theory (D, W), then the normal default theory (D ∪ D′, W) has an extension E′ such that E ⊆ E′. In other words, additional defaults may augment existing extensions or produce new ones, but they never destroy the extensions obtained before. Reiter has mentioned in [Reiter, 1980] that he knows of no naturally occurring
default which cannot be represented in this form. However, the later paper [Reiter and Criscuolo, 1981] showed that in order to deal with default interactions, we need at least semi-normal defaults of the form A : B ∧ C/C. Moreover, though some authors2 have questioned the usefulness of defaults which are not semi-normal, Paul Morris has shown in [Morris, 1988] that such default rules can provide a solution for the Yale Shooting Anomaly. There has been a number of attempts of providing a direct semantic interpretation of default logic. It was suggested already in [Reiter, 1980] that default rules could be viewed as operators that restrict the models of the set of known facts W . Developing this idea, [Etherington, 1987] has suggested a semantics based on combining preference relations ≥δ on sets of models determined by individual defaults δ. For normal default theories, extensions corresponded to preferred sets of models. For full generality, however, an additional condition of Stability was required which stated, roughly, that the corresponding model set is ‘accessible’ from M OD(W ) via a sequence of ≥δ -links corresponding to defaults δ with unrefuted justifications. This operational, or quasi-inductive description of extensions has been developed later in a number of works — see [Makinson, 2003; Antoniou and Wang, 2006]. Default logic is a more general and more expressive formalism than circumscription. In one direction, [Imielinski, 1987] has shown that even normal default theories cannot be translated in a modular way to circumscription. In the other direction, [Etherington, 1987] has shown that minimizing the predicate P , with all other predicates varying, corresponds to the use of the default : ¬P (x)/¬P (x). 
2 E.g., [Brewka et al., 1997].
Speaking generally, circumscription corresponds to default theories involving only the simplest normal defaults without prerequisites; skeptical reasoning in such theories is equivalent to minimization with respect to the ordering on interpretations determined by the sets of violated defaults (cf. Theorem 4.7 in [Poole, 1994a]). Etherington and Reiter [1983] used default logic to formalize inheritance hierarchies, while [Reinfrank et al., 1989] have shown that justifications of Doyle’s truth maintenance system can be directly translated into default rules, and then extensions of the resulting default theory will exactly correspond to admissible labelings of TMS. A large part of the expressive power of default logic is due to the representation of defaults as inference rules. This representation avoids some problems arising with formula-based interpretations of defaults (such as contraposition) and provides a natural framework for logic programming and reasoning in databases. Unfortunately, this representation also has its problems, which many researchers consider more or less serious drawbacks. One kind of problem concerned an apparent discrepancy between commonsense claims “If A then normally B” and their representation in terms of normal defaults A : B/B. For example, default logic does not allow for reasoning by cases: From “Italians (normally) like wine” and “French (normally) like wine” we cannot conclude that a given person who is Italian or French likes wine. This is because,
in default logic, a default can only be applied if its prerequisite has already been derived. A more profound problem has been pointed out in [Makinson, 1989]. A general class of preferential nonmonotonic inference relations (see below) satisfies a natural Cumulativity postulate stating that, if A entails both B and C, then A ∧ B should entail C. Makinson has shown, however, that default logic does not satisfy Cumulativity (see [Antoniou and Wang, 2006]). This discrepancy was a first rigorous indication that default logic cannot be directly subsumed by the preference-based approach to nonmonotonic reasoning. A problem of a different kind has been noticed in [Poole, 1989b] that concerned joint consistency of justifications of default rules. Namely, the definition of extensions only enforces that each single justification of an applied default is consistent with the generated extension, but nothing guarantees the joint consistency of the justifications of all applied defaults (see again [Antoniou and Wang, 2006] for examples and discussion). These and other problems have led to a number of proposed modifications of default logic. Most of them are described in [Antoniou and Wang, 2006]. Also, driven mainly by the analogy with logic programming, a number of authors have suggested to generalize the notion of extension to that of a partial extension — see [Baral and Subrahmanian, 1991; Przymusinska and Przymusinski, 1994; Brewka and Gottlob, 1997] (cf. also [Antonelli, 1999; Denecker et al., 2003]). Unfortunately, these modifications still have not gained widespread acceptance. A generalization of a different kind has been proposed in [Gelfond et al., 1991], guided by the need to provide a logical basis for disjunctive logic programming, as well as more perspicuous ways of handling disjunctive information. A disjunctive default theory is a set of disjunctive defaults, rules of the form a : b/c, where a, b, c are finite sets of formulas. 
The authors described a semantics for such default theories that constituted a proper generalization of Reiter’s default logic, as well as of the semantics for disjunctive databases from [Gelfond and Lifschitz, 1991]. They also showed how Poole’s above-mentioned problem of joint justifications can be avoided by using disjunctive default rules. Marek and Truszczyński [1989] introduced the notion of a weak extension as a default counterpart of stable expansions in autoepistemic logic and models of Clark’s completion in logic programming (see below). Weak extensions can be defined3 as fixed points of a modified operator Γw, obtained by weakening the third condition in the above definition of Γ to (3′) If A : b/C ∈ D, A ∈ S and ¬B ∉ S, for any B ∈ b, then C ∈ Γ(S).
3 Cf. [Halpern, 1997].
A number of studies have dealt with elaboration of default logic as a logical formalism. Thus, [Marek and Truszczyński, 1989] suggested a more ‘logical’ description of default logic using the notion of a context-dependent proof as a way of formalizing Reiter’s operator Γ. [Marek et al., 1990] described a general theory of nonmonotonic rule systems based on abstract rules a:b/A having an informal interpretation similar to Doyle’s justifications: ‘If all a’s are established, and none
of b’s is established now or ever, conclude A’. The authors studied abstract counterparts of extensions and weak extensions in this framework. An even more general default system in the framework of the domain theory has been developed in [Zhang and Rounds, 1997]. A reconstruction of default derivability in terms of restricted Hilbert-type proofs has been described in [Amati et al., 1994]. Finally, a sophisticated representation of default logic as a sequent calculus based on both provability and unprovability sequents has been given in [Bonatti and Olivetti, 2002]. Reiter has mentioned in [Reiter, 1987a] that, because the defaults are represented as inference rules rather than object language formulas, they cannot be reasoned about within the logic. Thus, from “Normally canaries are yellow” and “Yellow things are never green” we cannot conclude in default logic that “Normally canaries are never green”. As far as we know, a first default system with such meta-rules allowing to infer derived default rules from given ones has been suggested in [Thiele, 1990] (see also [Brewka, 1992]). Bochman [1994] introduced default consequence relations as a generalization of both default and modal formalizations of nonmonotonic reasoning. A default consequence relation is a logical (monotonic) inference system based on default rules of the form a : b ⊩ C similar to that in [Marek et al., 1990]. The basic system was required to satisfy the following postulates:
Monotonicity If a : b ⊩ A and a ⊆ a′, b ⊆ b′, then a′ : b′ ⊩ A.
Cut If a : b ⊩ A and a, A : b ⊩ B, then a : b ⊩ B.
Consistency A : A ⊩ f
together with postulates securing that default rules respect classical entailment both in premises and conclusions. A binary derivability operator associated with a default consequence relation was defined as Cn(u, v) = {A | u : v ⊩ A}. This operator simplified a characterization of extensions and weak extensions, called expansions in [Bochman, 1994], due to their correspondence with stable expansions of autoepistemic logic. Namely, a set u of propositions is an extension of a default consequence relation if u = Cn(∅, ū), and it is an expansion if u = Cn(u, ū) (where ū denotes the complement of u). Reiter’s default rules A : b/C were translated as the rules A : ¬b ⊩ C. Then it was shown that extensions of the resulting default consequence relation correspond precisely to extensions of Reiter’s default logic. Moreover, it has been shown that the following additional postulates also preserve extensions:
Reflexivity A : ⊩ A.
Negative Factoring If a, B : b ⊩ f and a : b, B ⊩ A, then a : b ⊩ A.
The resulting class of default consequence relations can be viewed as an underlying monotonic logic of Reiter’s default logic. In contrast, the logic adequate for reasoning with expansions can be obtained by adopting instead a powerful alternative postulate:
Factoring If a, B : b ⊩ A and a : b, B ⊩ A, then a : b ⊩ A.
Such default consequence relations have been called autoepistemic. In autoepistemic consequence relations, extensions collapse to expansions (weak extensions). It was shown that this logic provides an exact non-modal counterpart of Moore’s autoepistemic logic (see below).
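The difference between extensions and weak extensions (expansions) can be illustrated by a minimal sketch. It is not Reiter's or Bochman's own formulation: it assumes a drastically simplified setting in which W is a consistent set of literals and every default has at most one literal prerequisite, literal justifications and a literal consequent, so that deductive closure reduces to set membership. The function names and the tuple encoding of defaults are hypothetical.

```python
from itertools import combinations

# Assumed encoding: a literal is a string, "-p" being the negation of "p".
# A default A : B1,...,Bn / C is the tuple (A or None, (B1, ..., Bn), C).

def neg(lit):
    """Return the complementary literal."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def consistent(lits):
    return all(neg(l) not in lits for l in lits)

def candidates(W, D):
    """Consistent sets of the form W plus consequents of some subset of D."""
    seen = set()
    for k in range(len(D) + 1):
        for chosen in combinations(D, k):
            cand = frozenset(W) | {c for (_, _, c) in chosen}
            if consistent(cand) and cand not in seen:
                seen.add(cand)
                yield cand

def grounded_closure(W, D, context):
    """Prerequisites are checked against what has been derived so far,
    justifications against the guessed context (Reiter-style groundedness)."""
    derived = set(W)
    changed = True
    while changed:
        changed = False
        for pre, justs, cons in D:
            if ((pre is None or pre in derived)
                    and all(neg(j) not in context for j in justs)
                    and cons not in derived):
                derived.add(cons)
                changed = True
    return frozenset(derived)

def one_step(W, D, context):
    """Weakened condition: prerequisites are also checked against the context."""
    return frozenset(W) | {c for (pre, justs, c) in D
                           if (pre is None or pre in context)
                           and all(neg(j) not in context for j in justs)}

def extensions(W, D):
    return [E for E in candidates(W, D) if grounded_closure(W, D, E) == E]

def weak_extensions(W, D):
    return [S for S in candidates(W, D) if one_step(W, D, S) == S]

# Illustrative theories (hypothetical data).
NIXON_W = {"quaker", "republican"}
NIXON_D = [("quaker", ("pacifist",), "pacifist"),
           ("republican", ("-pacifist",), "-pacifist")]
NO_EXT_D = [(None, ("-a",), "a")]   # the theory (∅, {true : ¬A/A})
SELF_D = [("p", ("q",), "p")]       # a self-supporting default p : q / p
```

Under these assumptions, the Nixon theory yields two extensions, the theory with the single default true : ¬A/A yields none, and the self-supporting default p : q/p has the single extension ∅ but two weak extensions, ∅ and {p}. The last case mirrors the ungrounded expansions that separate autoepistemic from default reasoning.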
3.3 Modal nonmonotonic logics
The third seminal paper in the 1980 issue of Artificial Intelligence, [McDermott and Doyle, 1980], is the beginning of a modal approach to nonmonotonic reasoning. McDermott and Doyle first provided a broad picture of instances of nonmonotonic reasoning in different parts of AI, and stressed the need for a formal logical analysis of these phenomena. The theory was also clearly influenced by the need to provide a formal account of truth maintenance. As a general suggestion, the authors proposed to expand the notation in which logical inference rules are stated by adding premises like ‘unless proven otherwise’. The modal nonmonotonic logic was formulated in a modal language containing a modal operator M, where M p has the intended meaning “p is consistent with everything believed”. In accordance with this understanding, a direct way of making assumptions in this setting might consist in accepting an inference rule of the form “If ⊬ ¬A, then ⊢ M A”. Such a rule, however, would be circular relative to the underlying notion of derivability. The way suggested by the authors was a fixed-point construction very similar to that of Reiter’s default logic, but formulated this time entirely in the modal object language. For a set u of propositions, let us denote by Mu the set {M A | A ∈ u}, and similarly for ¬u, etc. In addition, ū will denote the complement of u with respect to a given modal language. Using this notation, a set s is a fixed point of a modal theory u if it satisfies the equality s = Th(u ∪ M¬s̄). The set of nonmonotonic conclusions of a modal theory was defined in [McDermott and Doyle, 1980] as the intersection of all its fixed points. The initial formulation of modal nonmonotonic logic has turned out to be unsatisfactory, mainly due to the fact that it secured no connection between a modal formula M C and its objective counterpart C, so that even the nonmonotonic theory {M C, ¬C} was consistent.
In response to this latter difficulty, [McDermott, 1982] developed a stronger version of the formalism based on the entailment relation of standard modal logics instead of first-order logic. In fact, the decisive modification was an adoption of the Necessitation rule A ⊢ LA. The corresponding fixed points were defined now as sets satisfying the equality s = CnS(u ∪ M¬s̄), where s̄ is the complement of s and CnS is a provability operator of some modal logic S containing the necessitation rule.
In what follows, we will call such fixed points S-extensions (following [Konolige, 1988] and [Schwarz, 1990], but in contrast with the influential subsequent terminology of Marek and Truszczyński, who have called them S-expansions). As has been shown by McDermott, the stronger the underlying modal logic, the smaller the set of extensions, and hence the larger the set of nonmonotonic consequences of a modal theory. It has turned out, however, that the modal nonmonotonic logic based on the strongest modal logic S5 collapses to a monotonic system. So the resulting suggestion was somewhat indecisive, namely a range of possible modal nonmonotonic logics without clear criteria for evaluating the merits of the alternatives. However, this indecisiveness has turned out to be advantageous in the subsequent development of the modal approach. Neither Doyle nor McDermott pursued the modal approach much beyond these initial stages. Actually, both later expressed criticism of the whole logical approach to AI.
Autoepistemic logic
As an alternative response to the problems with initial formulations of modal nonmonotonic logics, Robert C. Moore proposed his autoepistemic logic (AEL) in [Moore, 1985]. Autoepistemic logic has had a great impact on the development of nonmonotonic reasoning; for a certain period of time it was even considered a correct replacement of the theory of McDermott and Doyle. Moore has suggested that modal nonmonotonic logic should be interpreted not as a theory of default (defeasible) reasoning, but as a model of (undefeasible) autoepistemic reasoning of an ideally rational agent. Following a suggestion made by Robert Stalnaker in 1980 (published much later as [Stalnaker, 1993]), Moore reconstructed nonmonotonic logic as a model of an ideally rational agent’s reasoning about its own beliefs. He argued that purely autoepistemic reasoning is not defeasible; it is nonmonotonic because it is context-sensitive.
He defined a semantics for which he showed that autoepistemic logic is sound and complete. Instead of the modal consistency operator M, he used the dual belief operator L. Moore stressed the epistemic interpretation of a default rule as one that licenses a conclusion unless something that the agent knows blocks it. Nonmonotonicity can be achieved by endowing an agent with an introspective ability, namely an ability to reflect on its own beliefs in order to infer sentences expressing what it doesn’t believe. Introspective here means that the agent is fully aware of its beliefs: if a formula p belongs to the set of beliefs B of the agent, then Lp also has to belong to B, and if p does not belong to B, then ¬Lp must be in B. Sets of formulas satisfying these properties were first discussed by Stalnaker, who called them stable sets. Stable sets are essentially sets of formulas globally valid in S5-models (see [Konolige, 1988]). As has been shown in [Moore, 1985] (and implicitly already in [McDermott, 1982]), a stable set is uniquely determined by its objective (non-modal) propositions. This fact later provided a basic link between modal and non-modal approaches to nonmonotonic reasoning. A stable expansion of a set u of premises was defined as a stable set that is grounded in u. Again, the formal definition was based on a fixed point equation: s = Th(u ∪ Ls ∪ ¬Ls̄), where s̄ denotes the complement of s. The groundedness condition ensured that every member of an expansion has some reason tracing back to u. As in default logic, conflicting AEL rules can lead to alternative stable sets of beliefs a reasoner may adopt. Moore has argued that the original nonmonotonic logic of McDermott and Doyle was simply too weak to capture the notions they wanted, while [McDermott, 1982] strengthened it in a wrong way. He observed that, in the modal nonmonotonic logic, the component Ls is missing from the ‘base’ of the fixed points, which means that McDermott and Doyle’s agents are omniscient as to what they do not believe, but they may know nothing as to what they do believe. A difference between the nonmonotonic modal and autoepistemic logics can be seen on the theory {LP → P}, which has a single fixed point that does not include P, though it also has a stable expansion containing P. As argued by Moore, this makes the interpretation of L in nonmonotonic modal logic more like “justified belief” than simple belief. On the other hand, already [Konolige, 1988] argued that the second expansion is intuitively unacceptable. It corresponds to an agent arbitrarily entering P, hence also LP, into her belief set. Since nonmonotonic S5 collapses to monotonic S5, Moore (following Stalnaker) suggested retreating not to S4, as was suggested in [McDermott, 1982], but to K45, obtained from S5 by dropping reflexivity LP → P. Moore has also shown that the axioms of K45 do not change stable expansions. Levesque [1990] generalized Moore’s notion of a stable expansion to the full first-order case.
He provided a semantic account of stable expansions in terms of a second modal operator O, where O(A) is read as “A is all that is believed.” In this approach, nonmonotonicity was pushed entirely into the scope of the O operator. Levesque’s ideas have been systematically presented and applied to the theory of knowledge bases in [Levesque and Lakemeyer, 2000]. It has been shown in [Konolige, 1989] that autoepistemic logic is a more expressive formalism than circumscription (see also [Niemelä, 1992]). The relation between autoepistemic and default logic, however, has turned out to be more complex, and it has created an important incentive for further development of a theory of nonmonotonic reasoning.
Unified modal theories
Konolige [1988] attempted to translate default logic into autoepistemic logic, and vice versa. To this end, he suggested rephrasing a default rule A : B1, ..., Bn/C as a modal formula (LA ∧ ¬L¬B1 ∧ ... ∧ ¬L¬Bn) ⊃ C. Despite Konolige’s intentions, however, this translation has turned out to be inappropriate for representing default rules. Furthermore, [Gottlob, 1994] has
shown that there can be no modular translation of default logic into autoepistemic logic, even if we restrict ourselves to normal defaults, or just to inference rules without justifications. Still, [Marek and Truszczyński, 1989] have shown that Konolige’s translation works for prerequisite-free defaults. As to the general case, Marek and Truszczyński have shown that the translation provides instead an exact correspondence between stable expansions and weak extensions. A more organized picture started to emerge with the result of [Schwarz, 1990] according to which autoepistemic logic is just one of the nonmonotonic logics in the general approach of [McDermott, 1982]. Namely, it is precisely the nonmonotonic logic based on K45, or even KD45, if we ignore inconsistent expansions.4 This result, as well as difficulties encountered in interpreting default logic, revived interest in the whole range of modal nonmonotonic logics based on ‘negative introspection’ in the works of Marek, Truszczyński and Schwarz. A minimal modal nonmonotonic logic in this range is based on a modal logic N that does not contain modal axioms at all (see [Fitting et al., 1992]). Corresponding N-extensions have been introduced in [Marek and Truszczyński, 1990] and called iterative expansions. A systematic study of modal nonmonotonic logics based on different underlying modal logics can be found in [Marek et al., 1993; Marek and Truszczyński, 1993]. This study has shown the importance of many otherwise esoteric modal logics for nonmonotonic reasoning, such as S4F, SW5, and KD45. All these logics have, however, a common semantic feature: in the terminology of [Segerberg, 1971], each of them is characterized by certain Kripke models of depth two having a unique final cluster (see [Schwarz, 1992a]). An adequate modal interpretation of default logic in the modal nonmonotonic logics of [McDermott, 1982] has been suggested in [Truszczyński, 1991].
The interpretation was based on the following more complex translation: A : B1, ..., Bn/C ⇒ (LA ∧ LM B1 ∧ ... ∧ LM Bn) ⊃ LC. Under this interpretation, a whole range of modal logics between T− and S4F can be used as a “host” logic. In addition, the translation can be naturally extended to disjunctive default rules of [Gelfond et al., 1991]. In this sense, modal nonmonotonic logics also subsumed disjunctive default logic. A different translation of default logic into the modal logic KD4Z has been suggested in [Amati et al., 1997]. Bochman [1994; 1995b] introduced the notion of a modal default consequence relation (see also [Bochman, 1998c]). These consequence relations were defined in a language with a modal operator, but otherwise involved the same rules as general default consequence relations, described earlier. A default consequence relation in a modal language has been called modal if it satisfied the following two modal axioms: A : ⊩ LA and : A ⊩ ¬LA.
4 KD45 is obtained from S5 by replacing the T axiom LA ⊃ A with a weaker D axiom LA ⊃ ¬L¬A.
Basically, the transition from default to modal nonmonotonic logics amounts to adding these default rules to a default theory. A similar translation has been used in [Janhunen, 1996] and rediscovered a number of times in later publications. Modal default consequence relations have turned out to be a convenient tool for studying modal nonmonotonic reasoning. Thus, both autoepistemic reasoning and reasoning with negative introspection acquired a natural characterization in this framework. In particular, S-extensions of a modal theory u have been shown to coincide with extensions of the least modal default consequence relation containing u and the modal axioms of the modal logic S. Under certain reasonable conditions, modal consequence relations have turned out to be reducible to their nonmodal, default sub-relations in a way that preserved the associated nonmonotonic semantics. These results were used in [Bochman, 1998c] for establishing a two-way correspondence between modal and default formalizations. In particular, it has been shown that modal autoepistemic logic is equivalent to a nonmodal autoepistemic consequence relation, described earlier. Lin and Shoham have suggested in [Lin and Shoham, 1992] a bimodal system purported to combine preferential and fixed-point approaches to nonmonotonic reasoning. The bridge was provided by a preference relation on models that minimized knowledge for fixed sets of assumptions. Semantic models of a bimodal system are triples (Mk , Mb , V ) such that Mk and Mb are sets of possible worlds, Mb ⊆ Mk , and V is a valuation function assigning each world a propositional interpretation. Mk and Mb serve as ranges of two modal operators K and B representing, respectively, the knowledge and assumptions (beliefs) of an introspective agent. A model M1 is preferred over a model M2 , if they have the same B-worlds, but M1 has a larger set of K-worlds. 
Finally, a preferred model is a model M = (Mk, Mb, V) such that Mk = Mb, and there is no model M1 that is preferred over M. Lin and Shoham were able to show that, by using suitable translations, default and autoepistemic logic, as well as the minimal belief logic of [Halpern and Moses, 1985], can be embedded into their system. Lifschitz [1991b; 1994a] described a simplified version of such a bimodal logic, called MBNF (Minimal Belief with Negation-as-Failure), extended it to the language with quantifiers, and considered a representation of default logic, circumscription and logic programming in this framework. Yet another bimodal theory, Autoepistemic Logic of Minimal Beliefs (AELB), has been suggested in [Przymusinski, 1994] as an extension of Moore’s autoepistemic logic and has been shown to subsume circumscription, CWA and its generalizations, epistemic specifications and a number of semantics for logic programs. It has been shown in [Lifschitz and Schwarz, 1993] that a significant part of MBNF, namely theories with protected literals, can be embedded into autoepistemic logic (see also [Chen, 1994]). Furthermore, it has been shown in [Schwarz and Truszczyński, 1994] that in most cases bimodal nonmonotonic logics can be systematically reduced to ordinary unimodal nonmonotonic logics by translating the belief (assumption) operator B as ¬K¬K (see also [Bochman, 1995c] for a similar reduction of MBNF). Finally, a partial (four-valued) generalization of modal and default nonmonotonic logics has been studied in [Denecker et al., 2003].
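As a small illustration of the stable expansions discussed in this section, the following sketch enumerates them for a finite propositional autoepistemic theory. It is only a sketch under stated assumptions: the operator L is applied solely to objective formulas (no nesting), formulas are encoded as nested tuples, and the guess-and-check test relies on the known finite characterization under which an expansion corresponds to a guess of believed L-arguments such that an argument is classically entailed exactly when it was guessed to be believed. All names in the code are hypothetical.

```python
from itertools import product

# Assumed encoding: ("atom", "P"), ("L", f), ("not", f),
# ("and", f, g), ("or", f, g), ("imp", f, g).

def atoms(f):
    """Objective atoms occurring in f."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def belief_args(f):
    """Arguments phi of the subformulas L(phi) occurring in f."""
    if f[0] == "atom":
        return set()
    if f[0] == "L":
        return {f[1]}
    return set().union(*(belief_args(g) for g in f[1:]))

def holds(f, world, believed):
    """Truth of f in a world, with L(phi) read off the guessed set `believed`."""
    op = f[0]
    if op == "atom":
        return world[f[1]]
    if op == "L":
        return f[1] in believed
    if op == "not":
        return not holds(f[1], world, believed)
    if op == "and":
        return holds(f[1], world, believed) and holds(f[2], world, believed)
    if op == "or":
        return holds(f[1], world, believed) or holds(f[2], world, believed)
    if op == "imp":
        return (not holds(f[1], world, believed)) or holds(f[2], world, believed)
    raise ValueError(op)

def stable_expansions(theory):
    """Return, for each stable expansion, the set of believed L-arguments."""
    names = sorted(set().union(*(atoms(f) for f in theory)))
    args = sorted(set().union(*(belief_args(f) for f in theory)), key=repr)
    worlds = [dict(zip(names, vals))
              for vals in product([False, True], repeat=len(names))]
    result = []
    for bits in product([False, True], repeat=len(args)):
        believed = {a for a, bit in zip(args, bits) if bit}
        models = [w for w in worlds
                  if all(holds(f, w, believed) for f in theory)]
        # phi must be entailed exactly when L(phi) was guessed to hold
        if all(all(holds(a, w, believed) for w in models) == (a in believed)
               for a in args):
            result.append(believed)
    return result

P = ("atom", "P")
Q = ("atom", "Q")
SELF_BELIEF = [("imp", ("L", P), P)]          # the theory {LP -> P}
DEFAULT_LIKE = [("imp", ("not", ("L", P)), Q)]  # the theory {not LP -> Q}
```

Under these assumptions, the theory {LP → P} has two stable expansions, one without P and one containing P, which is the situation the text describes as intuitively problematic. The default-like theory {¬LP → Q} has a single expansion, in which P is not believed.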
3.4 Logic Programming
Logic programming has been based on the idea that program rules should have both a procedural and a declarative (namely, logical) meaning. This double interpretation was intended to elevate the programming process by basing it on a transparent and systematic logical representation of real world information. Fortunately, it was discovered quite early that logic programming with negation as failure allows us to express significant forms of nonmonotonic reasoning. Moreover, the general nonmonotonic formalisms described earlier inspired the development of new semantics for logic programs. These developments have made logic programming an integral part of nonmonotonic reasoning research, functioning as a general computational mechanism for knowledge representation and nonmonotonic reasoning (see [Baral, 2003]). A normal logic program is a set of program rules of the form A ← a, not b, where A is a propositional atom (the head of the rule), a and b are finite sets of atoms (forming the body of the rule), and not is negation as failure. The semantics for logic programs have gradually evolved from definite rules without not to more general program rules involving negation as failure. For definite logic programs, such a semantics has been identified with the unique minimal model of the corresponding logical theory [van Emden and Kowalski, 1976]. Both the Closed World Assumption (CWA) [Reiter, 1978] and completion theory [Clark, 1978] have provided its syntactic characterization. This semantics has been conservatively extended in [Apt et al., 1988] to stratified normal programs. It has been called the perfect semantics in [Przymusinski, 1988]. See [Shepherdson, 1988] for a detailed description of these initial semantics. Unlike the CWA, Clark’s completion is naturally definable for any normal program. It has turned out, however, that the corresponding supported semantics for logic programs produces inadequate results when applied to arbitrary (i.e., nonstratified) programs.
In response to this problem, two competing semantics have been suggested for arbitrary normal programs: the stable model semantics [Gelfond and Lifschitz, 1988], and the well-founded semantics [van Gelder et al., 1991]. A large literature has been devoted to studying the relationships between these three semantics, as well as between the latter and the nonmonotonic formalisms. Under the stable model semantics, a normal program rule A ← a, not b corresponds to a rule a : ¬b/A of default logic (see [Marek and Truszczyński, 1989]). This translation establishes a one-to-one correspondence between stable models of a normal logic program and extensions of the associated default theory. In this sense, logic programs under the stable semantics capture the main idea, as well as primary representation capabilities, of default logic. Still, this embedding of logic programs into default logic is unidirectional, since not every default theory
corresponds in this sense to a logic program. The stable model semantics is sufficient, in particular, for an adequate representation of Doyle’s truth-maintenance system [Doyle, 1979]. If we translate each nonmonotonic justification as a program rule then there is a 1-1 correspondence between stable models of the resulting logic program and admissible labelings of the original justification network [Elkan, 1990]. Guided partly by the correspondence with default logic, [Gelfond and Lifschitz, 1991] have suggested to include classical negation ¬ into logic programs, in addition to negation-as-failure not. They argued that some facts of commonsense reasoning can be represented more easily when classical negation is available. Alferes and Pereira [1992] defined a parameterizable schema to encompass and characterize a range of proposed semantics for such extended logic programs. By adjusting the parameters they have been able to specify several semantics using two kinds of negation. Gelfond [1994] expanded the syntax and semantics of logic programs even further to an epistemic formalism with modal operators allowing for the representation of incomplete information in the presence of multiple extensions. Gelfond and Lifschitz [1988] showed that the stable model semantics is also equivalent to some translation of logic programs into autoepistemic theories. Pearce [1997] has suggested that a certain three-valued logic called here-and-there (HT) can serve as an underlying logic of the stable model semantics. Continuing this line of research, [Lifschitz et al., 2001] have shown that HT provides a characterization of strong equivalence for logic programs under the stable semantics.5 A stable model of a logic program is also a model of its completion, while the converse does not hold. Marek and Subrahmanian [1992] showed the relationship between supported models of normal programs and expansions of autoepistemic theories. 
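The remark that a stable model is a model of the completion while the converse fails can be checked on small programs. The sketch below is a brute-force illustration for finite propositional normal programs, not an efficient solver. It assumes a hypothetical encoding of a rule A ← a, not b as a triple (head, positive body, negative body), and it tests supportedness directly through the condition that an atom is true exactly when the body of one of its rules is true.

```python
from itertools import chain, combinations

# Assumed encoding: the rule p <- q, not r is ("p", {"q"}, {"r"}).

def least_model(definite_rules):
    """Least model of a not-free program, by iterating rule application."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def reduct(program, interp):
    """Gelfond-Lifschitz reduct: delete rules whose negative body meets
    interp, then delete the negative bodies of the remaining rules."""
    return [(h, pos) for h, pos, negs in program if not (negs & interp)]

def is_stable(program, interp):
    return least_model(reduct(program, interp)) == interp

def is_supported(program, interp):
    """An atom is true exactly when some rule for it has a true body."""
    fired = {h for h, pos, negs in program
             if pos <= interp and not (negs & interp)}
    return fired == interp

def interpretations(program):
    """All subsets of the atoms of the program, smallest first."""
    atoms = sorted({h for h, _, _ in program}
                   | set().union(*(p | n for _, p, n in program)))
    return (set(c) for c in chain.from_iterable(
        combinations(atoms, k) for k in range(len(atoms) + 1)))

def stable_models(program):
    return [m for m in interpretations(program) if is_stable(program, m)]

def supported_models(program):
    return [m for m in interpretations(program) if is_supported(program, m)]

LOOP = [("p", {"p"}, set())]                           # p <- p
CHOICE = [("p", set(), {"q"}), ("q", set(), {"p"})]    # p <- not q ; q <- not p
```

For the non-tight program p ← p, the empty set is the only stable model, whereas both ∅ and {p} are supported. For the program {p ← not q, q ← not p}, the two semantics agree on the models {p} and {q}.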
5 Two programs are strongly equivalent if they have the same stable model semantics for any extension of these programs with additional rules.
Fages [1994] has shown, however, that a natural syntactic acyclicity condition on the rules of a program, subsequently called “tightness”, is sufficient for coincidence of the stable and supported semantics. As a practical consequence, this has reduced the task of computing the stable models for tight programs to classical satisfiability [Babovich et al., 2000]. Finally, as a most important recent development, [Lin and Zhao, 2002] showed that a classical logical description of the stable semantics for an arbitrary normal program can be obtained by augmenting its Clark’s completion with what they called “loop formulas”. This formulation has opened the way to computing the stable semantics of logic programs using SAT solvers. Unlike the stable semantics, the well-founded semantics suggested in [van Gelder et al., 1991] does not correspond to any of the nonmonotonic formalisms described earlier. It determines, however, a unique model for every normal logic program. This semantics has been elaborated upon by Przymusinski to a theory of partial stable models [Przymusinski, 1991b; Przymusinski, 1990]; the well-founded model constituted the least model in this hierarchy. Przymusinski has shown, in effect, that the well-founded semantics can be seen as the three-valued stable semantics
(see also [Dung, 1992]). Fitting [1991] has extended the well-founded semantics to a family of lattice-based logic programming languages. Denecker has characterized the WFS as based on the principle of inductive definitions (see [Denecker et al., 2001]). Finally, [Cabalar, 2001] has shown that a two-dimensional variant of the logic HT, called HT², can serve as the underlying logic of WFS. An argumentation-theoretic analysis of the different semantics for normal logic programs has been suggested in [Dung, 1995a]. Dung has shown, in particular, that the stable, partial stable and well-founded semantics are based on different argumentation principles of rejecting negation-as-failure assumptions (see below). An alternative representation of normal logic programs and their semantics has been given in [Bochman, 1995a; Bochman, 1996b] in the framework of default consequence relations from [Bochman, 1994]. The framework has provided a relatively simple representation of various semantics for such programs, and has made it possible to single out different kinds of logical reasoning that are appropriate with respect to these semantics. It has been shown, in particular, that stable and supported models are precise structural counterparts of, respectively, extensions and expansions of general default consequence relations. An important line of research in logic programming has concentrated on extending the expressive capabilities of program rules by allowing disjunction in their heads, namely to rules of the form c ← a, not b, where c is a set of atoms (treated disjunctively). For such disjunctive logic programs, Reiter’s Closed World Assumption has been generalized to the Generalized Closed World Assumption (GCWA) [Minker, 1982] and its derivatives. The model-theoretic definition of the GCWA states that one can conclude the negation of a ground atom if it is false in all minimal Herbrand models.
This makes disjunctive programming quite close in spirit to McCarthy’s circumscription — see [Oikarinen and Janhunen, 2005] for a detailed comparison. Initial semantics for disjunctive programs containing negation in premises of the rules have been summarized in [Lobo et al., 1992]. Przymusinski [1991a] introduced the stable model semantics for disjunctive logic programs that generalized corresponding semantics for normal logic programs. Depending upon whether only total (2-valued) or partial (3-valued) models are used, one obtains the disjunctive stable semantics or the partial disjunctive stable semantics, respectively. He showed that for locally stratified disjunctive programs, both disjunctive semantics coincide with the perfect model semantics. Unfortunately, absence of clear grounds for adjudicating semantics for logic programs has resulted in a rapid proliferation of suggested semantics, especially for disjunctive programs. This proliferation has obviously created a severe problem. Dix [1991; 1992] has provided a systematic analysis of various semantics for normal and disjunctive programs based on such properties as associated nonmonotonic inference relations, modularity, relevance and the principle of partial evaluation. Even the two-valued stable semantics for disjunctive logic programs is already not subsumed by default logic, but requires its extension to disjunctive default logic [Gelfond et al., 1991]. It can be naturally embedded, however, into more general modal nonmonotonic formalisms such as Lifschitz’ MBNF [Lifschitz, 1994a] that
we mentioned at the end of the preceding section. These formalisms have suggested, in turn, a generalization of disjunctive program rules to rules involving negation as failure in heads, namely rules of the form not d, c ← a, not b, where c and d are sets of atoms (see [Lifschitz and Woo, 1992; Inoue and Sakama, 1994]). The stable model semantics is naturally extendable to such generalized programs in a way that preserves the correspondence with these nonmonotonic formalisms. Moreover, the alternative well-founded and partial stable semantics are representable in a many-valued extension of such a modal nonmonotonic system, suggested in [Denecker et al., 2003]. A yet more powerful generalization has been introduced in [Lifschitz et al., 1999], which made use of program rules with arbitrary logical formulas in their heads and bodies. Such results as the correspondence between the stable semantics and completion with loop formulas have been extended to such programs (see [Erdem and Lifschitz, 2003; Lee and Lifschitz, 2003]).
Bochman [1996a] has suggested a general scheme for constructing semantics of logic programs of a most general kind in the logical framework of biconsequence relations that we will describe later in this study. A detailed description was given in [Bochman, 1998a; Bochman, 1998b]. Briefly, the formalism involved rules (bisequents) of the form a : b ⊩ c : d that directly represented general program rules not d, c ← a, not b. The notion of circumscription of a biconsequence relation was defined, and then a nonmonotonic completion was constructed as a closure of the circumscription with respect to certain coherence rules. The strength of these rules depended on the language L in which they were formulated. Finally, the nonmonotonic semantics of a biconsequence relation was defined as a pair of sets of propositions in the language L that are, respectively, provable and refutable in the nonmonotonic completion.
A uniform representation of major existing semantics for logic programs was provided simply by varying the language L of the representation. A related, though more ‘logical’, representation of general logic programs and their semantics has been suggested in [Bochman, 2004b] in the framework of causal inference relations from [Bochman, 2004a] that will be described later.
3.5 Argumentation theory
Since Doyle’s seminal work on truth maintenance [Doyle, 1979], the importance of argumentation in choosing default assumptions has been a recurrent theme in the literature on nonmonotonic reasoning (see, e.g., [Lin and Shoham, 1989; Geffner, 1992]). In a parallel development, a number of researchers in traditional argumentation theory have shown that, despite influential convictions and prejudices, ordinary human argumentation is within the reach of formal logical methods (see [Chesñevar et al., 2000] for a survey of this development). A powerful version of abstract argumentation theory, especially suitable for nonmonotonic reasoning, has been suggested in [Dung, 1995b]. We will give below a brief description of this theory.
DEFINITION 1. An abstract argumentation theory is a pair ⟨A, ↪⟩, where A is
a set of arguments, while ↪ is a binary relation of attack on A. If α ↪ β holds, then the argument α attacks, or undermines, the argument β.
A general task of argumentation theory consists in determining sets of arguments that are safe (justified) in some sense with respect to the attack relation. To this end, we should first extend the attack relation to sets of arguments: if Γ, ∆ are sets of arguments, then Γ ↪ ∆ is defined to hold if α ↪ β for some α ∈ Γ and β ∈ ∆. Let us say that an argument α is allowable for the set of arguments Γ if Γ does not attack α. For any set of arguments Γ, we will denote by [Γ] the set of all arguments allowable by Γ, that is,
[Γ] = {α | Γ ↪̸ α},
where ↪̸ denotes the absence of attack. An argument α will be said to be acceptable for the set of arguments Γ if Γ attacks any argument against α. As can be easily checked, the set of arguments that are acceptable for Γ coincides with [[Γ]]. Using the above notions, we can give a quite simple characterization of the basic objects of an abstract argumentation theory.
DEFINITION 2. A set of arguments Γ will be called
• conflict-free if Γ ⊆ [Γ];
• admissible if it is conflict-free and Γ ⊆ [[Γ]];
• a complete extension if it is conflict-free and Γ = [[Γ]];
• a preferred extension if it is a maximal complete extension;
• a stable extension if Γ = [Γ].
A set of arguments Γ is conflict-free if it does not attack itself. A conflict-free set Γ is admissible if and only if any argument from Γ is also acceptable for Γ, and it is a complete extension if it coincides with the set of arguments that are acceptable with respect to it. Finally, a stable extension is a conflict-free set of arguments that attacks any argument outside it. Any stable extension is also a preferred extension, any preferred extension is a complete extension, and any complete extension is an admissible set. Moreover, as has been shown in [Dung, 1995b], any admissible set is included in some complete extension.
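The notions of Definition 2 can be checked mechanically on small examples by computing the operator [Γ] directly. The following Python sketch is a brute-force illustration, not an algorithm from [Dung, 1995b]: attacks between single arguments are encoded as pairs, and the three-argument theory, the function names and the constants are hypothetical.

```python
from itertools import combinations

def allowable(args, attacks, gamma):
    """[Γ]: the arguments that no member of gamma attacks."""
    return frozenset(x for x in args
                     if not any((g, x) in attacks for g in gamma))

def acceptable(args, attacks, gamma):
    """[[Γ]]: the arguments all of whose attackers are attacked by gamma."""
    return allowable(args, attacks, allowable(args, attacks, gamma))

def subsets(args):
    """Enumerate all subsets of args (exponential; fine for toy examples)."""
    items = sorted(args)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def extensions(args, attacks):
    """Classify the subsets of args according to Definition 2."""
    conflict_free = {g for g in subsets(args) if g <= allowable(args, attacks, g)}
    admissible = {g for g in conflict_free if g <= acceptable(args, attacks, g)}
    complete = {g for g in conflict_free if g == acceptable(args, attacks, g)}
    preferred = {g for g in complete if not any(g < h for h in complete)}
    stable = {g for g in subsets(args) if g == allowable(args, attacks, g)}
    return {"conflict_free": conflict_free, "admissible": admissible,
            "complete": complete, "preferred": preferred, "stable": stable}

# Hypothetical theory: a and b attack each other, and b attacks c.
ARGS = {"a", "b", "c"}
ATTACKS = {("a", "b"), ("b", "a"), ("b", "c")}
RESULT = extensions(ARGS, ATTACKS)
```

For this theory the complete extensions are ∅, {b} and {a, c}; the last two are both preferred and stable, while {a} is admissible without being complete, since it also defends c.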
Consequently, preferred extensions coincide with maximal admissible sets. In addition, the set of complete extensions forms a complete lower semi-lattice: for any set of complete extensions, there exists a unique greatest complete extension that is included in all of them. In particular, there always exists a least complete extension of an argumentation theory. As has been shown in [Dung, 1995a] the above objects exactly correspond to the semantics suggested for normal logic programs. Thus, stable extensions correspond to stable models, complete extensions correspond to partial stable models,
preferred extensions correspond to regular models, while the least complete extension corresponds in this sense to the well-founded semantics. These results have shown, in effect, that the abstract argumentation theory successfully captures the essence of logical reasoning behind normal logic programs (see also [Kakas and Toni, 1999]).
The notion of an argument is often taken as primitive in argumentation theory, which even allows for the possibility of considering arguments that are not propositional in character (e.g., arguments as inference rules, or derivations). As has been shown in [Bondarenko et al., 1997], however, a powerful instantiation of argumentation theory can be obtained by identifying arguments with propositions of a special kind called assumptions (see also [Kowalski and Toni, 1996]). Thus, for logic programs assumptions can be identified with negation-as-failure literals of the form not A, while consistency-based justifications of default rules (see above) can serve as assumptions in default logic.
Slightly simplifying definitions from [Bondarenko et al., 1997], an assumption-based argumentation framework can be defined as a triple consisting of an underlying deductive system, a distinguished subset of propositions Ab called assumptions, and a mapping from Ab to the set of all propositions of the language that determines the contrary ᾱ of any assumption α. For instance, the contrary of the negation-as-failure literal not A in logic programming is the atom A itself, while ¬A is the contrary of the justification A in default logic. A set of assumptions Γ attacks an assumption α if it implies its contrary ᾱ. It has been shown that the main nonmonotonic formalisms, such as default and modal nonmonotonic logics, as well as the semantics of normal logic programs, are representable in this framework. A dialectical proof procedure for finding admissible sets of assumptions has recently been described in [Dung et al., 2006].
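For logic programs, the assumption-based reading can be sketched in a few lines: assumptions are negation-as-failure literals, the contrary of not A is A, and a set of assumptions attacks not A when A becomes derivable. The Python code below is a simplified illustration under these assumptions; the two-rule program, the rule encoding and the function names are hypothetical and do not reproduce the definitions of [Bondarenko et al., 1997] in full.

```python
from itertools import combinations

# Hypothetical normal program:  p <- not q ;  q <- not p
# Each rule is (head, positive_body, naf_body).
RULES = [("p", (), ("q",)), ("q", (), ("p",))]
ATOMS = {"p", "q"}

def derive(rules, gamma):
    """Atoms derivable when exactly the literals 'not x', for x in gamma, are assumed."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for head, pos, naf in rules:
            if head not in derived and set(pos) <= derived and set(naf) <= gamma:
                derived.add(head)
                changed = True
    return derived

def attacks(rules, gamma, atom):
    """gamma attacks the assumption 'not atom' if it derives the contrary, atom."""
    return atom in derive(rules, gamma)

def stable_assumption_sets(rules, atoms):
    """Conflict-free assumption sets that attack every assumption outside them.

    Returns pairs (gamma, derived): gamma lists the atoms assumed false,
    derived lists the atoms that then follow from the program.
    """
    result = []
    items = sorted(atoms)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            gamma = set(combo)
            conflict_free = not any(attacks(rules, gamma, x) for x in gamma)
            attacks_rest = all(attacks(rules, gamma, x) for x in atoms - gamma)
            if conflict_free and attacks_rest:
                result.append((frozenset(gamma), frozenset(derive(rules, gamma))))
    return result

STABLE = stable_assumption_sets(RULES, ATOMS)
```

In this example the stable assumption sets are {not p} and {not q}, and the atoms they derive, {q} and {p}, are the two stable models of the program.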
In [Bochman, 2003b] an extension of an abstract argumentation framework was introduced in which the attack relation is defined directly among sets of arguments. The extension, called collective argumentation, has turned out to be suitable for representing semantics of disjunctive logic programs. In collective argumentation, ↪ is an attack relation on sets of arguments satisfying the following monotonicity condition:
(Monotonicity) If Γ ↪ ∆, then Γ ∪ Γ′ ↪ ∆ ∪ ∆′.
Dung’s argumentation theory can be identified with a normal collective argumentation theory in which no set of arguments attacks the empty set ∅ and the following condition is satisfied:
(Locality) If Γ ↪ ∆, ∆′, then either Γ ↪ ∆ or Γ ↪ ∆′.
In a normal collective argumentation theory the attack relation is reducible to the relation Γ ↪ α between sets of arguments and single arguments, and the resulting theory coincides with that given in [Dung, 1995a].
The attack relation of an argumentation theory can be given a natural four-valued semantics based on independent evaluations of acceptance and rejection of
arguments. On this interpretation the attack Γ ↪ ∆ means that at least one of the arguments in ∆ should be rejected whenever all the arguments from Γ are accepted. The expressive capabilities of the argumentation theory depend, however, on the absence of the usual ‘classical’ constraints on the acceptance and rejection of arguments, so it permits situations in which an argument is both accepted and rejected or, alternatively, neither accepted nor rejected. Such an understanding can be captured formally by assigning to any argument an arbitrary subset of the set {t, f}, where t denotes acceptance (truth), while f denotes rejection (falsity) (cf. [Jakobovits and Vermeir, 1999]). This interpretation is nothing other than Belnap’s well-known interpretation of four-valued logic (see [Belnap, 1977]).
The language of arguments can be extended with a global negation connective ∼ having the following semantic interpretation:
∼A is accepted iff A is rejected
∼A is rejected iff A is accepted.
An axiomatization of this negation in argumentation theory can be obtained by imposing the following rules on the attack relation (see [Bochman, 2003b]):
(AN)
A ↪ ∼A        ∼A ↪ A
If a ↪ A, b and a, ∼A ↪ b, then a ↪ b
If a, A ↪ b and a ↪ b, ∼A, then a ↪ b
It turns out that the resulting N-attack relations are interdefinable with certain consequence relations. A Belnap consequence relation in a propositional language with a global negation ∼ is a Scott (multiple-conclusion) consequence relation ⊩ satisfying the postulates
(Reflexivity) A ⊩ A;
(Monotonicity) If a ⊩ b and a ⊆ a′, b ⊆ b′, then a′ ⊩ b′;
(Cut) If a ⊩ b, A and a, A ⊩ b, then a ⊩ b,
as well as the following two Double Negation rules for ∼:
A ⊩ ∼∼A        ∼∼A ⊩ A.
For a set u of propositions, ∼u will denote the set {∼A | A ∈ u}. Now, for a given N-attack relation, we can define the following consequence relation:
(CA) a ⊩ b ≡ a ↪ ∼b
Similarly, for any Belnap consequence relation we can define the corresponding attack relation as follows:
(AC) a ↪ b ≡ a ⊩ ∼b
As has been shown in [Bochman, 2003b], the above definitions establish an exact equivalence between N-attack relations and Belnap consequence relations. In this setting, the global negation ∼ serves as a faithful logical formalization of the operation of taking the contrary from [Bondarenko et al., 1997]. Moreover, given an arbitrary language L that does not contain ∼, we can define assumptions as propositions of the form ∼A, where A ∈ L. Then, since ∼ satisfies double negation, a negation of an assumption will be a proposition from L. Such a ‘negative’ representation of assumptions will agree with the applications of the argumentation theory to other nonmonotonic formalisms described in [Bondarenko et al., 1997].
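The four-valued reading of attack and of the global negation ∼ can be checked by enumerating valuations. The Python sketch below is an illustration under stated assumptions: formulas are atom names or nested ("~", formula) tuples, a valuation maps each atom to a subset of {t, f}, and the consequence relation obtained via (CA) is read as preservation of acceptance, which is our gloss on the definitions above rather than a quotation from [Bochman, 2003b]. All function names are hypothetical.

```python
from itertools import product

# The four Belnap values: neither, accepted only, rejected only, both.
VALUES = [frozenset(), frozenset("t"), frozenset("f"), frozenset("tf")]

def neg(formula):
    """Global negation ~formula."""
    return ("~", formula)

def accepted(formula, v):
    """~A is accepted iff A is rejected; an atom is accepted iff t is in its value."""
    if isinstance(formula, tuple):
        return rejected(formula[1], v)
    return "t" in v[formula]

def rejected(formula, v):
    """~A is rejected iff A is accepted; an atom is rejected iff f is in its value."""
    if isinstance(formula, tuple):
        return accepted(formula[1], v)
    return "f" in v[formula]

def attack_holds(a, b, v):
    """a attacks b under v: if all of a are accepted, some member of b is rejected."""
    return (not all(accepted(x, v) for x in a)) or any(rejected(y, v) for y in b)

def consequence_holds(a, b, v):
    """Assumed reading of (CA): if all of a are accepted, some member of b is accepted."""
    return (not all(accepted(x, v) for x in a)) or any(accepted(y, v) for y in b)

def valuations(atoms):
    """All assignments of Belnap values to the given atoms."""
    items = sorted(atoms)
    for combo in product(VALUES, repeat=len(items)):
        yield dict(zip(items, combo))

def negation_rules_valid():
    """Check A attacks ~A, ~A attacks A, and (CA) on every valuation of A and B."""
    return all(
        attack_holds(["A"], [neg("A")], v)
        and attack_holds([neg("A")], ["A"], v)
        and consequence_holds(["A"], ["B"], v) == attack_holds(["A"], [neg("B")], v)
        for v in valuations({"A", "B"})
    )
```

The check confirms that the two unconditional AN rules hold under every four-valued valuation, and that attacking ∼B amounts, on this reading, to implying B.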
3.6 Abduction and causal reasoning
It does not seem necessary to argue that abduction and causation are essential for human commonsense reasoning about the world. It has been gradually realized, however, that these kinds of reasoning are also essential for efficient ‘artificial’ reasoning. Moreover, a new outlook on these kinds of reasoning can be achieved by viewing them as special, though important, instances of nonmonotonic reasoning.
Abductive reasoning
Abduction is the process of finding explanations for observations. The importance of abduction for AI can be seen already in Minsky’s frame paper [Minsky, 1974]. A frame that contains default and stereotypical information should be imposed on a particular situation:
    Once a frame is proposed to represent a situation, a matching process tries to assign values to each frame’s terminals, consistent with the markers at each place. The matching process is partly controlled by information associated with the frame (which includes information about how to deal with surprises) and partly by knowledge about the system’s current goals.
Israel [1980] criticized the first nonmonotonic formalisms by objecting to the centrality of deductive logic in these formalisms as a mechanism for justification. He argued that what we need is not a new logic, but a good scientific methodology. Abductive reasoning to a best explanation requires rational epistemic policies that lie, on Israel’s view, outside nonmonotonic logics. McDermott [1987] levied a similar criticism about the absence of a firm theoretical basis behind diagnostic and other programs dealing with abduction:
    This state of affairs does not stop us from writing medical diagnosis programs. But it does keep us from understanding them. There is no independent theory to appeal to that can justify the inferences a program makes .... these programs embody tacit theories of abduction;
    these theories would be the first nontrivial formal theories of abduction, if only one could make them explicit.
Despite this criticism, however, formal properties of abductive inference methods and relations to other formalisms have been actively explored. de Kleer [1986] has developed the Assumption-based Truth Maintenance System (ATMS), a method for finding explanations in the context of propositional Horn-clause theories, when the hypotheses and observations are positive atoms. Similarly to Doyle’s JTMS [Doyle, 1979], described earlier, the ATMS was developed by de Kleer as a subcomponent of a more general problem-solving system. The main inference mechanism in the ATMS is the computation of a label at each node. A label for an atom c is a set of environments, that is, sets of hypotheses that explain c. The ATMS differed from Doyle’s TMS, however, in keeping track of multiple explanations or contexts, and especially in using only monotonic propositional inference.
More general systems of logical abduction used more general theories. The most prominent of these is the Clause Maintenance System (CMS) of [Reiter and de Kleer, 1987]. To generalize the ATMS definition of explanation, the authors considered a propositional domain theory consisting of general clauses and drew on the concept of prime implicates to extend the inference technique of the ATMS. Prime implicates can be used to find all the parsimonious explanations of unit clauses. It was shown in [Reiter and de Kleer, 1987] that the label in ATMS is exactly the set of such explanations, so the ATMS can be used to compute parsimonious explanations for propositional Horn-clause theories.
Under diagnosis from first principles [Reiter, 1987b], or diagnosis from structure and behavior, the only information at hand is a description of some system, say a physical device, together with an observation of that system’s behavior.
If this observation conflicts with intended system behavior, then the diagnostic problem is to determine which components could by malfunctioning account for the discrepancy. Since components can fail in various and often unpredictable ways, their normal or default behaviors should be described. These descriptions fit the pattern of nonmonotonic reasoning. For example, an AND gate in a digital circuit would have the description: Normally, an AND-gate’s output is the Boolean AND function of its inputs. In diagnosis, such component descriptions are used in the following way: We first assume that all of the system components are behaving normally. Suppose, however, the system behavior predicted by this assumption conflicts with (i.e. is inconsistent with) the observed system behavior. Thus some of the components we assume to be behaving normally must really be malfunctioning. By retracting enough of the original assumptions about correctly behaving components, we can remove the inconsistency between the predicted and observed behavior. The retracted components yield a diagnosis. This approach to diagnosis from first principles was called a consistency-based diagnosis, and it forms the basis for several diagnostic reasoning systems (see also [de Kleer et al., 1992]). From a logical point of view, the consistency-based diagnosis reduces to classical consistency reasoning with respect to a certain abnormality theory, in which
abnormalities are minimized as in circumscription. This kind of diagnosis is still not fully abductive, since it determines only what is a minimal set of abnormalities that is consistent with the observed behavior. In other words, it does not explain observations, but only excuses them. In medical diagnosis, the type of reasoning involved is abductive in nature and consists in explaining observations or the symptoms of a patient. Moreover, as in general nonmonotonic reasoning, adding more information can cause a previously held conclusion to become invalid. If it is known only that a patient has a fever, the most reasonable explanation is that he has the flu. But if we learn that he also has jaundice, then it becomes more likely that he has a disease of the liver. The first, ‘procedural’ formalization of this reasoning was the set-covering model, proposed in [Reggia et al., 1985]. This model had a set of causes and a set of symptoms, along with a relation that maps a cause to the set of symptoms that it induces. Given an observation of symptoms for a particular case, a diagnosis is a set of causes that covers all of the observed symptoms and contains no irrelevant causes. On the face of it, consistency-based and abductive diagnosis appear very different. To begin with, rather than abducing causes that imply the observations, the consistency approach tries to minimize the extent of the causation set by denying as many of its elements as possible. Moreover, in the abductive framework, the causes have implications for the effects, while in the consistency based systems, the most important information seems to be the implication of the observations for possible causes. Despite these differences, it is now known that, under certain conditions, consistency-based explanations in the Clark completion of the domain theory coincide with abductive explanations in the source theory. THEOREM. 
[Poole, 1988b; Console et al., 1991] Let Σ be a set of nonatomic definite clauses whose directed graph of dependencies is acyclic, and let Π be the Clark completion of Σ. Then the consistency-based explanations of an observation O in Π are exactly the abductive explanations of O in Σ. For more complicated domain theories, Clark completion does not give the required closure to abductive explanations. For such more general cases the correct generalization of Clark completion is explanatory closure (see [Konolige, 1992]). Classical abduction has also been used as a practical proof method for circumscription (see, e.g., [Ginsberg, 1989]). Levesque [1989] suggested a knowledge level analysis of abduction in which the domain theory is represented as the beliefs of an agent. Motivated primarily by the intractability of logic-based abduction, this representation allowed for incomplete deduction in connecting hypotheses to observations for which tractable abduction mechanisms can be developed. Poole [1988a] has developed the Theorist system in which abduction is used as an inference method in default theories. In the spirit of David Israel, Poole argued that there is nothing wrong with classical logic; instead, nonmonotonicity is a problem of how the logic is used. Theorist assumed the typical abductive
machinery: a set of hypotheses, a first-order background theory, and the concept of explanation. A scenario is any subset of the hypotheses consistent with the background theory; in the language of Theorist, an explanation for an observation O is a scenario that implies O. Theorist defined the notion of extension as a set of propositions generated by a maximal consistent scenario. In other words, by taking defaults as possible hypotheses, default reasoning has been reduced to a process of theory formation. Defaults of Poole’s abductive system corresponded to a simplest kind of Reiter’s default rules, namely normal defaults of the form : A/A. In addition, Poole employed the mechanism of naming defaults (closely related to McCarthy’s abnormality predicates) that has allowed him to say, in particular, when a default is inapplicable. The resulting system has been shown to capture many of the representative capabilities of Reiter’s default logic in an almost classical logical framework. Systems such as the ATMS and Theorist are popular in many AI applications, because they are easy to understand and relatively easy to implement. The use of abductive methods is growing within AI, and they are now a standard part of most AI representation and reasoning systems. In fact, the study of abduction is one of the success stories of nonmonotonic reasoning, and it has a major impact on the development of an application area. Abductive Logic Programming. One of the important applications of abduction has been developed in the framework of logic programming. A comprehensive survey of the extension of logic programming to perform abductive reasoning (referred to as abductive logic programming) can be found in [Kakas et al., 1992] together with an extensive bibliography on abductive reasoning. As their primary goal, the authors introduced an argumentation theoretic approach to the use of abduction as an interpretation of negation-as-failure. 
Abduction was shown to generalize negation-as-failure to include not only negative but also positive hypotheses, and to include general integrity constraints. They showed that abductive logic programming is related to the justification-based truth maintenance system of Doyle and the assumption-based truth maintenance system of de Kleer. Abductive logic programs are defined as pairs (Π, A), where Π is a logic program, and A is a set of abducible atoms. A formalization of abductive reasoning in this setting is provided by the generalized stable semantics [Kakas and Mancarella, 1990], in which an abductive explanation of a query q is a subset S of abducibles such that there exists a stable model of the program Π ∪ S that satisfies q. Denecker and De Schreye [1992] developed a family of extensions of SLDNF resolution for normal abductive programs. Further details and directions in abductive logic programming can be found, e.g., in [Brewka and Konolige, 1993; You et al., 2000; Lin and You, 2002]. It has been shown in [Inoue and Sakama, 1998] that abductive logic programs under the generalized stable semantics are reducible to general disjunctive logic programs under the stable semantics. The relevant transformation of abductive
programs can be obtained simply by adding to Π the program rules p, not p ←, for any abducible atom p from A. This reduction has confirmed, in effect, that general logic programs are inherently abductive and hence have the same representation capabilities as abductive logic programs.
An abstract abductive framework. To end this section on abduction, we will provide below a brief description of an abstract abductive system that is sufficient for the main applications of abduction in AI. Further details on this system and its expressive capabilities can be found in [Bochman, 2005]. An abductive system is a pair A = (Cn, A), where Cn is a supraclassical consequence relation, while A is a distinguished set of propositions called abducibles. A set of abducibles a ⊆ A is an explanation of a proposition A, if A ∈ Cn(a). In applications, the consequence relation Cn is usually given indirectly by a generating conditional theory ∆, in which case the corresponding abductive system can be defined as (Cn_∆, A). Many abductive frameworks also impose syntactic restrictions on the set of abducibles A (Poole’s Theorist being a notable exception). Thus, A is often restricted to a set of special atoms (e.g., those built from abnormality predicates ab), or to the corresponding set of literals. A restriction of this kind is not essential, however. Indeed, for any abducible proposition A we can introduce a new abducible propositional atom p_A, and add the equivalence A ↔ p_A to the underlying theory. The new abductive system will have much the same properties.
An abductive system (Cn, A) will be called classical if Cn is a classical consequence relation. A classical abductive system can be safely equated with a pair (Σ, A), where Σ is a set of classical propositions (the domain theory). An example of such a system in diagnosis is [de Kleer et al., 1992], a descendant of the consistency-based approach of [Reiter, 1987b].
In abductive systems, acceptance of propositions depends on the existence of explanations, and consequently such systems sanction not only forward inferences determined by the consequence relation, but also backward inferences from facts to their explanations, and combinations of both. All these kinds of inference can be captured formally by considering only theories of Cn that are generated by the abducibles. This suggests the following notion:
DEFINITION 3. The abductive semantics S_A of an abductive system A is the set of theories {Cn(a) | a ⊆ A}.
By restricting the set of theories to theories generated by abducibles, we obtain a semantic framework containing more information. Generally speaking, all the information that can be discerned from the abductive semantics of an abductive system can be seen as abductively implied by the latter.
The information embodied in the abductive semantics can be made explicit by using the associated Scott (multiple-conclusion) consequence relation, defined as follows⁶: for any sets b, c of propositions,
b ⊢_A c ≡ (∀a ⊆ A)(b ⊆ Cn(a) → c ∩ Cn(a) ≠ ∅)
This consequence relation describes not only forward explanatory relations, but also abductive inferences from propositions to their explanations. Speaking generally, it describes the explanatory closure, or completion, of an abductive system, and thereby captures abduction by deduction (cf. [Console et al., 1991; Konolige, 1992]).
⁶ A Tarski consequence relation of this kind has been used for the same purposes in [Lobo and Uzcátegui, 1997].
EXAMPLE 4. The following abductive system describes a variant of Pearl’s well-known example. Assume that an abductive system A is determined by the set ∆ of rules
Rained ⊢ Grasswet        Sprinkler ⊢ Grasswet        Rained ⊢ Streetwet,
and the set of abducibles Rained, ¬Rained, Sprinkler, ¬Sprinkler, ¬Grasswet. Since Rained and ¬Rained are abducibles, Rained is an independent (exogenous) parameter, and similarly for Sprinkler. However, since only ¬Grasswet is an abducible, non-wet grass does not require explanation, but wet grass does. Thus, any theory of S_A that contains Grasswet should contain either Rained or Sprinkler, and consequently we have Grasswet ⊢_A Rained, Sprinkler. Similarly, Streetwet implies in this sense both its only explanation Rained and a collateral effect Grasswet.
Causal reasoning
The last example of the preceding section also illustrates that one of the central applications of abductive inference consists in generating causal explanations: reasons connecting causes and their effects. In this case, the hypotheses, or abducibles, represent primitive causes, observations are about their effects, and the background theory encodes the relation between them. This is just one of the many ways in which the notion of causation, which was once expelled from the exact sciences (see [Russell, 1957]), reappears as an important representation tool in Artificial Intelligence. Moreover, the theories of causality emerging in AI are beginning to illuminate in turn the actual role of causality in our reasoning.
As a matter of fact, causal considerations play an essential role in abduction in general. They determine, in particular, the very choice of abducibles, as well as the right form of descriptions and constraints (even in classical first-order representations). As has been shown already in [Darwiche and Pearl, 1994], system descriptions that do not respect the natural causal order of things can produce inadequate predictions and explanations.
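Returning to Example 4, the abductive semantics and the associated consequence relation can be computed by enumeration. The following Python sketch makes simplifying assumptions that should be kept in mind: propositions are restricted to literals (with "-" marking negation), Cn is approximated by forward chaining over the three rules, and inconsistent sets of abducibles are skipped, which is harmless for nonempty conclusions because an inconsistent theory contains every proposition. The function names are hypothetical.

```python
from itertools import combinations

# Rules and abducibles of Example 4; "-X" stands for the negation of X.
RULES = [("Rained", "Grasswet"), ("Sprinkler", "Grasswet"), ("Rained", "Streetwet")]
ABDUCIBLES = ["Rained", "-Rained", "Sprinkler", "-Sprinkler", "-Grasswet"]

def negate(literal):
    """Complement of a literal."""
    return literal[1:] if literal.startswith("-") else "-" + literal

def closure(literals):
    """Literal part of Cn(a): close the given literals under the rules."""
    theory = set(literals)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in RULES:
            if premise in theory and conclusion not in theory:
                theory.add(conclusion)
                changed = True
    return theory

def consistent(theory):
    """A set of literals is consistent if it contains no complementary pair."""
    return not any(negate(lit) in theory for lit in theory)

def abductive_semantics():
    """Consistent theories generated by subsets of the abducibles (Definition 3)."""
    theories = []
    for size in range(len(ABDUCIBLES) + 1):
        for combo in combinations(ABDUCIBLES, size):
            theory = closure(combo)
            if consistent(theory):
                theories.append(frozenset(theory))
    return theories

def abductively_implies(b, c):
    """b abductively implies c: every theory including b contains a member of c (c nonempty)."""
    return all(set(c) & theory
               for theory in abductive_semantics() if set(b) <= theory)
```

With these definitions, wet grass abductively implies the disjunction of its two explanations but neither of them alone, while a wet street implies both Rained and the collateral effect Grasswet, in agreement with the example.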
The intimate connection between causation and abduction has become especially vivid in the abductive approach to diagnosis (see especially [Cox and Pietrzykowski, 1987; Poole, 1995b; Konolige, 1994]). As has been acknowledged in these studies, reasoning about causes and effects should constitute a logical basis for diagnostic reasoning. Unfortunately, the absence of an adequate logical formalization for causal reasoning has relegated the latter to the role of an informal heuristic background, with classical logic serving as the representation language.
Abduction and diagnosis are not the only areas of AI in which causality has emerged. Thus, causality is an essential part of general qualitative reasoning; see, e.g., [Iwasaki and Simon, 1986; Nayak, 1994], which goes back to Herbert Simon’s important work [Simon, 1952]. Judea Pearl and his students and associates have developed a profound research program in the study of causation, derived from the use of causal diagrams, known as Bayesian Belief Networks, in reasoning about probabilities. A detailed description of the resulting theory of causality and its applications can be found in [Pearl, 2000]. See also [Halpern, 1998; Halpern and Pearl, 2001a; Halpern and Pearl, 2001b; Eiter and Lukasiewicz, 2004].
But perhaps the most significant development concerning the role of causation in our reasoning has occurred in one of the central fields of logical AI, reasoning about action and change. The starting point in this development was the discovery of the Yale Shooting Problem [Hanks and McDermott, 1987]. We are told that a gun is loaded at time 1, and that the gun is fired at Fred at time 5. Loading a gun causes the gun to be loaded, and firing a loaded gun at an individual causes the person to be dead. In addition, the fluents alive and loaded persist as long as possible; that is, these fluents stay true unless an action occurs that is abnormal with respect to these fluents.
Thus, a person who is alive tends to remain alive, and a gun that is loaded tends to remain loaded. What can we conclude about Fred’s status at time 6? Although common sense argues that Fred is dead at time 6, existing nonmonotonic representations of this story supported two models. In one model (the expected model), the fluent loaded persists as long as possible. Therefore, the gun remains loaded until it is fired at Fred, and Fred dies. In this model, at time 5, shooting is abnormal with respect to Fred’s being alive. In the other, unexpected model, the fluent alive persists as long as possible (i.e., Fred is alive after the shooting). Therefore, the fluent loaded did not persist; somehow the gun has become unloaded. That is, in some situation between 2 and 5, the empty action Wait was abnormal with respect to the gun being loaded.
This existence of multiple extensions has created a genuine problem. Hanks and McDermott in fact argued that the Yale Shooting Problem underscored the inadequacy of the whole of logicist temporal reasoning. Early solutions to the Yale Shooting Problem were based on chronological minimization, but they quickly lost favor, mainly due to their inability to handle backward temporal reasoning. As was rightly pointed out in [McDermott, 1987], the main source of the problem was that the previous nonmonotonic logics drew conclusions that minimized disturbances instead of avoiding disturbances with unknown causes. The alternative approaches that emerged have begun to employ
various forms of causal reasoning. Lifschitz [1987b] was the beginning of a sustained line of research in the causal approach. The causal approach argued that we expect Fred to die because there is an action that causes Fred’s death, but there is no action that causes the gun to become unloaded. All the causal representations formalize this principle in some way. Lifschitz also provided a solution to the qualification problem in this framework. Several modifications of Lifschitz’s solution have been suggested (see, e.g., [Baker, 1989]). A broadly explanatory approach to nonmonotonic reasoning was pursued by Hector Geffner in [Geffner, 1992]. In the last chapters of his book, Geffner argued for the necessity of incorporating causation as part of the meaning of default conditionals. He used a unary modal operator C, with Cp meaning ‘p is caused’, and expressed causal claims as conditionals of the form p → Cq. Using this language, he formalized a simple causal solution to the Yale Shooting Problem. In [Geffner, 1992], however, the causal theory was only sketched; in particular, the Ramification Problem was left untouched. One of the first causal approaches, Motivated Action Theory (MAT) [Morgenstern and Stein, 1994], was based on the idea of preferring models in which unexpected actions do not happen. A description of a problem scenario in MAT consisted of a theory and a partial chronicle description. The theory contained causal rules and persistence rules. Causal rules described how actions change the world; persistence rules described how fluents remain the same over time. Central to MAT was the concept of motivation. Intuitively, an action is motivated if there is a “reason” for it to happen. The effects of causal chains are motivated in this sense. A model is preferred if it has as few unmotivated actions as possible. This kind of reasoning was clearly nonmonotonic, and it allowed for a solution to the Yale Shooting Problem. 
In the expected model, where the gun remains loaded and Fred dies, there are no unmotivated actions. In the unexpected model, there is an unmotivated action, namely the Unload action. Thus we prefer the expected models over the unexpected models. However, MAT fell short as a solution to the frame problem due to its persistence rules, which are just another form of frame axioms. A first systematic causal representation of reasoning about action and change was developed by Fangzhen Lin in [Lin, 1995; Lin, 1996]. The representation made it possible, in particular, to handle natural causes in addition to actions, and provided a natural solution to the ramification problem. Lin’s representation was formulated in a purely classical first-order language, but employed a (reified) distinction between facts that hold in a situation and facts that are caused in it. The adequate models were obtained by minimizing (i.e., circumscribing) the caused facts. A formalization of this causal reasoning in action theories was suggested in [McCain and Turner, 1997] in the framework of what they called causal theories. A causal theory is a set of causal rules that express causal relations among propositions. The inertia (or persistence) principle is also expressed as a kind of causal rule. Then a true proposition can be caused either because it is
the direct or indirect effect of an action, or because it involves the persistence of a caused proposition. Initial conditions are also considered to be caused, by stipulation. Finally, the nonmonotonic semantics of a causal theory is determined by causally explained models, namely the models that satisfy the causal rules and in which every fact that holds is caused by some causal rule. In other words, in causally explained models the caused propositions coincide with the propositions that are true, and this must be the only possibility consistent with the extensional part of the model. The resulting nonmonotonic formalism has been shown to provide a plausible and efficient solution to both the frame and ramification problems. See [Lifschitz, 1997; Turner, 1999; Giunchiglia et al., 2004] for a detailed exposition of this theory and applications in representing action domains. Related causal approaches to representing actions and change have been suggested in [Thielscher, 1997; Schwind, 1999; Zhang and Foo, 2001], to mention only a few. The logical foundations of causal reasoning of this kind have been formulated in [Bochman, 2003c; Bochman, 2004a] in the framework of an inference system for causal rules that originated in the input/output logics of [Makinson and van der Torre, 2000]. Formally, a causal inference relation is a binary relation ⇒ on the set of classical propositions satisfying the following postulates:
(Strengthening) If A ⊨ B and B ⇒ C, then A ⇒ C;
(Weakening) If A ⇒ B and B ⊨ C, then A ⇒ C;
(And) If A ⇒ B and A ⇒ C, then A ⇒ B ∧ C;
(Or) If A ⇒ C and B ⇒ C, then A ∨ B ⇒ C;
(Cut) If A ⇒ B and A ∧ B ⇒ C, then A ⇒ C;
(Truth) t ⇒ t;
(Falsity) f ⇒ f.
From a logical point of view, the most significant ‘omission’ of the above set is the absence of the reflexivity postulate A ⇒ A. It is precisely this feature of causal inference that creates a possibility of nonmonotonic reasoning. Causal inference relations can be given a standard possible worlds semantics. Namely, given a relational possible worlds model (W, R, V ), where W is a set of possible worlds, R a binary accessibility relation on W , and V a valuation function, the validity of causal rules can be defined as follows: DEFINITION 5. A rule A ⇒ B is valid in a possible worlds model (W, R, V ) if, for any α, β ∈ W such that αRβ, if A holds in α, then B holds in β. Causal inference relations are determined by possible worlds models in which the relation R is quasi-reflexive, that is, αRβ holds only if αRα.
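Definition 5 can be illustrated with a small brute-force check. The sketch below is only an illustration under stated assumptions: worlds are represented as dictionaries of atom values, propositions as Python predicates on worlds, and the names `rule_valid` and `quasi_reflexive` are hypothetical helpers, not notation from the cited papers.

```python
# A minimal sketch of Definition 5 (all names are illustrative assumptions).

def rule_valid(worlds, R, A, B):
    """A => B is valid iff, for every pair (a, b) in R,
    A holding in world a implies B holding in world b."""
    return all(B(worlds[b]) for (a, b) in R if A(worlds[a]))

def quasi_reflexive(R):
    """Every world that accesses some world must also access itself."""
    return all((a, a) in R for (a, _) in R)

# Two worlds over a single atom p.
worlds = {"w1": {"p": True}, "w2": {"p": False}}
# w1 accesses itself and w2; w2 accesses nothing.
R = {("w1", "w1"), ("w1", "w2")}

p = lambda w: w["p"]
top = lambda w: True

print(quasi_reflexive(R))
print(rule_valid(worlds, R, p, p))    # reflexivity instance p => p
print(rule_valid(worlds, R, p, top))  # p => t
```

In this model the rule p ⇒ p is not valid, because p holds in w1 but fails in the accessible world w2, which shows concretely how the reflexivity postulate can fail on a quasi-reflexive model.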
Causal rules are extended to rules with sets of propositions in premises by stipulating that, for a set u of propositions, u ⇒ A holds if ⋀a ⇒ A for some finite a ⊆ u. C(u) denotes the set of propositions explained by u:
C(u) = {A | u ⇒ A}
The production operator C plays the same role as the usual derivability operator for consequence relations. In particular, it is a monotonic operator, that is, u ⊆ v implies C(u) ⊆ C(v). Still, it does not satisfy inclusion, that is, u ⊆ C(u) does not in general hold.
In causal inference relations, any causal rule is reducible to a set of clausal rules ⋀li ⇒ ⋁lj, where li, lj are classical literals. In addition, any rule A ⇒ B is equivalent to a pair of rules A ∧ ¬B ⇒ f and A ∧ B ⇒ B. The rules A ∧ B ⇒ B are explanatory rules. Though logically trivial, they play an important explanatory role in causal reasoning by saying that, if A holds, B is self-explanatory (and hence does not require explanation). Such rules have been used in [McCain and Turner, 1997] for representing inertia or persistence claims (see an example at the end of this section). On the other hand, the rule A ∧ ¬B ⇒ f is a constraint that does not have an explanatory content, but imposes a factual restriction A → B on the set of interpretations. Causal inference relations also determine a natural nonmonotonic semantics, and thereby provide a logical basis for a particular form of nonmonotonic reasoning. DEFINITION 6. A nonmonotonic semantics of a causal inference relation is the set of all its exact worlds, that is, maximal deductively closed sets u of propositions such that u = C(u). Exact worlds are worlds that are fixed points of the production operator C. An exact world describes a model that is closed with respect to the causal rules and such that every proposition in it is caused by other propositions accepted in the model. Accordingly, exact worlds embody an explanatory closure assumption, according to which any accepted proposition should also have an explanation for its acceptance. Such an assumption is nothing other than the venerable principle of Universal Causation (cf. [Turner, 1999]). The nonmonotonic semantics for causal theories is indeed nonmonotonic in the sense that adding new rules to the causal relation may lead to a nonmonotonic change of the associated semantics, and thereby of derived information. This happens even though causal rules themselves are monotonic, since they satisfy the postulate of Strengthening (the Antecedent). 
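The fixed-point reading of Definition 6 can be sketched by brute force for a very small language. The code below is a hedged illustration, not the construction of the cited papers: it assumes a finite theory whose rules have conjunctions of literals as bodies and single literals as heads, and it uses the McCain and Turner style test that a world is exact when the heads of the rules firing in it are exactly the literals true in it. The function `exact_worlds` and the two-atom theory are hypothetical.

```python
from itertools import product

ATOMS = ["p", "q"]

def holds(body, world):
    """A body is a list of (atom, sign) literals, read conjunctively."""
    return all(world[a] == sign for (a, sign) in body)

def exact_worlds(rules):
    """Worlds in which the caused literals coincide with the true literals.
    Assumption: every rule head is a single literal (atom, sign)."""
    result = []
    for values in product([True, False], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        caused = {head for (body, head) in rules if holds(body, world)}
        if caused == set(world.items()):
            result.append(world)
    return result

theory = [
    ([("p", True)], ("q", True)),    # p => q
    ([("p", True)], ("p", True)),    # p => p      (p is self-explanatory)
    ([("p", False)], ("p", False)),  # not-p => not-p
    ([("q", False)], ("q", False)),  # not-q => not-q
]

before = exact_worlds(theory)
# Adding the explanatory rule q => q admits a further exact world.
after = exact_worlds(theory + [([("q", True)], ("q", True))])
print(before)
print(after)
```

With the original four rules, q is true in an exact world only when p is; once q ⇒ q is added, a world with q and without p also becomes exact, so a conclusion that held in all exact worlds is lost. This is the sense in which adding rules changes the semantics nonmonotonically.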
The nonmonotonic semantics of causal theories coincides with the semantics suggested in [McCain and Turner, 1997]. Moreover, it has been shown in [Bochman, 2003c] that causal inference relations constitute a maximal logic adequate for this kind of nonmonotonic semantics. Causal inference relations and their variations have turned out to provide a new general-purpose formalism for nonmonotonic reasoning with representation
capabilities stretching far beyond reasoning about action. In particular, it has been shown in [Bochman, 2004a] that they provide a natural logical representation of abduction, based on treating abducibles as self-explanatory propositions satisfying reflexivity A ⇒ A. In addition, it has been shown in [Bochman, 2004b] that causal reasoning provides a precise interpretation for general logic programs. Namely, any program rule c, not d ← a, not b can be translated as a causal rule d, ¬b ⇒ ⋀a → ⋁c. Then the reasoning underlying the stable model semantics for logic programs (see above) is captured by augmenting the resulting causal theory with the causal version of the Closed World Assumption stating that all negated atoms are self-explanatory: Default Negation ¬p ⇒ ¬p, for any propositional atom p. The causal nonmonotonic semantics of the resulting causal theory will correspond precisely to the stable semantics of the source logic program. Moreover, unlike the known embeddings of logic programs into other nonmonotonic formalisms, namely default and autoepistemic logics, the causal interpretation of logic programs turns out to be bi-directional in the sense that any causal theory is reducible to a general logic program. As can be shown, a world α is an exact world of a causal inference relation if and only if, for any propositional atom p, p ∈ α if and only if α ⇒ p, and ¬p ∈ α if and only if α ⇒ ¬p. By the above description, the exact worlds of a causal relation are determined ultimately by rules of the form A ⇒ l, where l is a literal. Such rules are called determinate. McCain and Turner have established an important connection between the nonmonotonic semantics of a determinate causal theory and Clark’s completion of the latter. A finite causal theory ∆ is definite if it consists of determinate rules, or rules A ⇒ f, where f is a falsity constant. A completion of such a theory is the set of all classical formulas
p ↔ ⋁{A | A ⇒ p ∈ ∆}
¬p ↔ ⋁{A | A ⇒ ¬p ∈ ∆}
for any propositional atom p, plus the set {¬A | A ⇒ f ∈ ∆}. Then the classical models of the completion precisely correspond to exact worlds of ∆ (see [Giunchiglia et al., 2004]). The completion formulas embody two kinds of information. As (forward) implications from right to left, they contain the material implications corresponding to the causal rules from ∆. In addition, left-to-right implications state that a literal belongs to the model only if one of its causes is also in the model. These implications reflect the impact of causal descriptions using classical logical means. In other words, the completion of a causal theory is a classical logical theory that embodies the required causal content, so in a sense it obliterates the need for causal
representation. In fact, a classical theory of actions and change based directly on such a completion has been suggested by Ray Reiter as a simple solution to the frame problem; see [Reiter, 1991; Reiter, 2001]. This solution can now be reformulated as follows. Reiter’s simple solution. The following toy representation contains the main ingredients of causal reasoning in temporal domains. The temporal behavior of a propositional fluent F is described using two propositional atoms F0 and F1 saying, respectively, that F holds now and F holds in the next moment.
C+ ⇒ F1
C− ⇒ ¬F1
F0 ∧ F1 ⇒ F1
¬F0 ∧ ¬F1 ⇒ ¬F1
F0 ⇒ F0
¬F0 ⇒ ¬F0.
The first pair of causal rules describes the actions or natural factors that can cause F and, respectively, ¬F (C+ and C− normally describe the present situation). Second, we have a pair of inertia axioms, purely explanatory rules stating that if F holds (does not hold) now, then it is self-explanatory that it will hold (resp., not hold) in the next moment. The last pair of initial axioms states that F0 is an exogenous parameter. The above causal theory is determinate, and its completion is as follows:
F1 ↔ C+ ∨ (F0 ∧ F1)
¬F1 ↔ C− ∨ (¬F0 ∧ ¬F1).
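The correspondence between exact worlds and models of the completion can be checked mechanically for this toy theory. The sketch below is an illustration under explicit assumptions: C+ and C− are treated as plain atoms (named `Cp` and `Cm` here) and are declared exogenous, alongside F0, so that every atom has some explanation; exactness is tested in the McCain and Turner style, by comparing the heads of firing rules with the literals true in the world. All identifiers are hypothetical.

```python
from itertools import product

ATOMS = ["Cp", "Cm", "F0", "F1"]  # Cp, Cm stand for C+ and C-

# Rules as (body, head); a body is a list of (atom, sign) literals.
RULES = [
    ([("Cp", True)], ("F1", True)),                   # C+ => F1
    ([("Cm", True)], ("F1", False)),                  # C- => not-F1
    ([("F0", True), ("F1", True)], ("F1", True)),     # inertia
    ([("F0", False), ("F1", False)], ("F1", False)),  # inertia
]
# Exogenous atoms: F0 as in the text; Cp and Cm by assumption.
for atom in ["F0", "Cp", "Cm"]:
    RULES.append(([(atom, True)], (atom, True)))
    RULES.append(([(atom, False)], (atom, False)))

def worlds():
    for values in product([True, False], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))

def is_exact(w):
    """Heads of the rules firing in w must be exactly the literals true in w."""
    caused = {head for body, head in RULES if all(w[a] == s for a, s in body)}
    return caused == set(w.items())

def satisfies_completion(w):
    """The two completion formulas for F1 given in the text."""
    pos = w["Cp"] or (w["F0"] and w["F1"])
    neg = w["Cm"] or ((not w["F0"]) and (not w["F1"]))
    return w["F1"] == pos and (not w["F1"]) == neg

exact = [w for w in worlds() if is_exact(w)]
models = [w for w in worlds() if satisfies_completion(w)]
print(exact == models, len(exact))
```

Under these assumptions the two lists coincide, which is a small instance of the general correspondence between exact worlds of a definite theory and classical models of its completion.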
These formulas are equivalent to the conjunction of ¬(C+ ∧ C−) and
F1 ↔ C+ ∨ (F0 ∧ ¬C−).
The above formulas provide an abstract description of Reiter’s simple solution: the first formula corresponds to his consistency condition, while the last one corresponds to the successor state axiom for F.
4 PREFERENTIAL NONMONOTONIC REASONING
Nonmonotonic reasoning is a theory of the rational use of assumptions, and the main problem it deals with is that assumptions are often incompatible with one another, or with known facts. In such cases of conflict we must make a reasoned choice. The preferential approach follows here the slogan “Choice presupposes preference”. According to this approach, the choice of assumptions should be made by forming the space of options for choice and establishing preference relations among them. This makes the preferential approach a special case of a general methodology that is at least as old as decision theory and the theory of social choice. McCarthy’s circumscription can be seen as the ultimate origin of the preferential approach. A generalization of this approach was initiated by Gabbay in [1985] on the logical side, and by Shoham [1988] on the AI side. A first overview of the
preferential approach has been given in [Makinson, 1994]. A detailed description of the approach as we see it now can be found in [Bochman, 2001; Schlechta, 2004; Makinson, 2005]; see also the paper of Karl Schlechta in this volume. Both the Closed World Assumption and circumscription can be seen as working on the principle of preferring interpretations in which positive facts are minimized; this idea was pursued already in [Lifschitz, 1985]. Generalizing this idea, [Shoham, 1988] argued that any form of nonmonotonicity necessarily involves minimality of one kind or another. He argued also for a shift in emphasis from syntactic characterizations in favor of semantic ones. Namely, the relevant nonmonotonic entailment should be defined in terms of truth in all those models of a given axiomatization minimal with respect to some application dependent criterion. The ability to characterize such minimality criteria axiomatically is not essential. In effect, on Shoham’s view, an axiomatization of an application domain coupled with a characterization of its preferred minimal models is a sufficient specification of the required entailments. Shoham defined a model preference logic by using an arbitrary preference ordering of the interpretations of a language. DEFINITION 7. An interpretation i is a preferred model of A if it satisfies A and there is no better interpretation j > i satisfying A. A preferentially entails B (written A|∼B) iff all preferred models of A satisfy B. In support of his conclusions, Shoham offered his own theory of temporal minimization, as well as a minimal model semantics for a certain simplification of Reiter’s default logic. Shoham’s approach was very appealing, and suggested a unifying perspective on nonmonotonic reasoning. This treatment of nonmonotonicity was also similar to the earlier modal semantic theories of conditionals and counterfactuals, which have been studied in the philosophical literature — see, e.g., [Stalnaker, 1968; Lewis, 1973]. 
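Definition 7 can be sketched directly over a finite set of interpretations. The example below is a hedged illustration: the atoms, the two defaults and the numeric `abnormality` ranking are hypothetical choices made only to obtain a concrete preference ordering, and a lower score is taken to mean a better interpretation.

```python
from itertools import product

ATOMS = ["bird", "penguin", "flies"]

def interpretations():
    for values in product([True, False], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))

def abnormality(i):
    """Hypothetical ranking: a weighted count of violated defaults."""
    score = 0
    if i["bird"] and not i["flies"]:
        score += 1          # birds normally fly
    if i["penguin"] and i["flies"]:
        score += 2          # penguins normally do not fly (weighs more)
    return score

def better(j, i):
    """j is strictly preferred to i."""
    return abnormality(j) < abnormality(i)

def preferred_models(A):
    models = [i for i in interpretations() if A(i)]
    return [i for i in models if not any(better(j, i) for j in models)]

def entails(A, B):
    """A |~ B: every preferred model of A satisfies B."""
    return all(B(i) for i in preferred_models(A))

bird = lambda i: i["bird"]
bird_penguin = lambda i: i["bird"] and i["penguin"]
flies = lambda i: i["flies"]
not_flies = lambda i: not i["flies"]

print(entails(bird, flies))
print(entails(bird_penguin, flies))
print(entails(bird_penguin, not_flies))
```

With this ranking, `bird` preferentially entails `flies`, while the stronger premise `bird ∧ penguin` does not, which is the failure of monotonicity that the preferential semantics is designed to allow.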
Just like nonmonotonic entailment, counterfactual conditionals do not satisfy monotonicity, that is, A > C does not imply A ∧ B > C. The interrelations between these two theories have become an important theme. Thus, [Delgrande, 1987; Delgrande, 1988] employed conditional logics for reasoning about typicality. The paper [Kraus et al., 1990] constituted a turning point in the development of the preferential approach. Based on Gabbay’s earlier description of generalized inference relations, given in [Gabbay, 1985], and on semantic ideas of [Shoham, 1988], the authors described both semantics and axiomatization for a range of nonmonotonic inference relations. They also strongly argued that preferential conditionals provide a more adequate and versatile formalization of the notion of normality than, say, default logic. Kraus, Lehmann and Magidor [1990] established the logical foundations for a research program that has attracted many researchers, both in AI and in logic. It has been found, in particular, that the new approach to nonmonotonic reasoning is also intimately connected with the influential theory of belief change suggested in [Alchourrón et al., 1985]. As a consequence, an
alternative but equivalent formalization of nonmonotonic inference relations was developed in [Gärdenfors and Makinson, 1994] based on an expectation ordering of classical formulas. It is worth mentioning here, however, that despite the general enthusiasm for this new approach, there have also been voices of doubt. [Reiter, 1987a] had already warned against overly hasty generalizations when it comes to nonmonotonic reasoning. Moreover, Reiter has rightly argued that, for the purposes of representing nonmonotonic reasoning, these preferential logics have two fatal flaws: they are (globally) monotonic and extremely weak. On Reiter’s view, nonmonotonicity is achieved in these logics by pragmatic considerations affecting how the logic is used, which destroys the principled semantics on which these logics were originally based.
4.1 Epistemic States
Semantic interpretation constitutes one of the main components of a viable reasoning system, monotonic or not. A formal inference engine, though important, can be effectively used for representing and solving reasoning tasks only if its basic notions have a clear meaning that allows them to be discerned from a description of the situation at hand. The standard semantics of preferential inference relations is based on abstract possible worlds models in which worlds are ordered by a preference relation. A more specific semantic interpretation for such inference relations, suitable for nonmonotonic reasoning, can be obtained, however, based on a preference ordering on sets of default assumptions. This strategy was pursued by Hector Geffner in [Geffner, 1992] in the context of an ambitious general project in nonmonotonic reasoning, which also showed how to apply the preferred model approach to particular reasoning problems. This more specific interpretation also subsumes the approach to nonmonotonic reasoning employed in Poole’s Theorist system [Poole, 1988a] (see above). It has been suggested in [Bochman, 2001] that a general representation framework for preferential nonmonotonic reasoning can be given in terms of epistemic states, defined below. DEFINITION 8. An epistemic state is a triple (S, l, ≺), where S is a set of admissible belief states, ≺ a preference relation on S, while l is a labeling function assigning a deductively closed belief set to every state from S. On the intended interpretation, admissible belief states are generated as logical closures of allowable combinations of default assumptions. Such states are taken to be the options for choice. The preference relation on admissible belief states reflects the fact that not all admissible combinations of defaults constitute equally preferred options for choice. 
For example, defaults are presumed to hold, so an admissible belief state generated by a larger set of defaults is normally preferred to an admissible state generated by a smaller set of defaults. In addition, not all
defaults are born equal, so they may have some priority structure that imposes, in turn, additional preferences among belief states (see below). Epistemic states guide our decisions about what to believe in particular situations. They are epistemic, however, precisely because they say nothing directly about what is actually true, but only what is believed (or assumed) to hold. This makes epistemic states relatively stable entities; a change in facts and situations will not necessarily lead to a change in epistemic states. The actual assumptions made in particular situations are obtained by choosing preferred admissible belief states that are consistent with the facts.
Prioritization
An explicit construction of epistemic states generated by default bases provides us with characteristic properties of epistemic states arising in particular reasoning contexts. An epistemic state is base-generated by a set ∆ of propositions with respect to a classical Tarski consequence relation Th if
• the set of its admissible states is the set P(∆) of subsets of ∆;
• l is a function assigning each Γ ⊆ ∆ a theory Th(Γ);
• the preference order is monotonic on P(∆): if Γ ⊂ Φ, then Γ ≺ Φ.
The preference order on admissible belief states is usually derived in some way from priorities among individual defaults. This task turns out to be a special case of a general problem of combining a set of preference relations into a single ‘consensus’ preference order. Let us suppose that the set of defaults ∆ is ordered by some priority relation ◁, which will be assumed to be a strict partial order: α ◁ β will mean that α is prior to β. Recall that defaults are beliefs we are willing to hold insofar as it is consistent to do so. 
Hence any default δ determines a primary preference relation ≼_δ on P(∆) by which admissible belief sets containing the default are preferred to belief sets that do not contain it:
Γ ≼_δ Φ ≡ if δ ∈ Γ then δ ∈ Φ
Each ≼_δ is a weak order having just two equivalence classes, namely sets of defaults that contain δ, and sets that don’t. In this setting, the problem of finding a global preference order amounts to constructing an operator that maps a set of preference relations {≼_δ | δ ∈ ∆} to a single preference relation on P(∆). As has been shown in [Andreka et al., 2002], any finitary operator of this kind satisfying the so-called Arrow’s conditions is definable using a priority graph (N, ◁, v), where ◁ is a priority order on a set of nodes N, and v is a labeling function assigning each node a preference relation. The priority graph determines a single resulting preference relation via the lexicographic rule, by which t is weakly preferred to s
overall if it is weakly preferred for each argument preference, except possibly those for which there is a prior preference that strictly prefers t to s:
s ≼ t ≡ ∀i ∈ N (s ≼_v(i) t ∨ ∃j ∈ N (j ◁ i ∧ s ≺_v(j) t))
In our case, the prioritized base (∆, ◁) can be viewed as a priority graph in which every node δ is assigned a preference relation ≼_δ. Consequently, we can apply the lexicographic rule and arrive at the following definition:
Γ ≼ Φ ≡ (∀α ∈ Γ\Φ)(∃β ∈ Φ\Γ)(β ◁ α)
Γ ≼ Φ holds when, for each default in Γ \ Φ, there is a prior default in Φ \ Γ. The corresponding strict preference Γ ≺ Φ is defined as Γ ≼ Φ ∧ Γ ≠ Φ. Lifschitz [1985] was apparently the first to use this construction in prioritized circumscription, while [Geffner, 1992] employed it for defining preference relations among sets of defaults (see also [Grosof, 1991]).
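The lexicographic preference on sets of defaults can be written down in a few lines. The sketch below is illustrative only: defaults are plain strings, the priority relation is a set of pairs `(a, b)` read as "a is prior to b", and the function names and the two-default example are assumptions.

```python
def weakly_preferred(gamma, phi, prior):
    """Gamma is weakly below Phi: every default in Gamma minus Phi
    has some prior default in Phi minus Gamma."""
    return all(
        any((b, a) in prior for b in phi - gamma)
        for a in gamma - phi
    )

def strictly_preferred(gamma, phi, prior):
    """Strict version: weakly below and not the same set."""
    return weakly_preferred(gamma, phi, prior) and gamma != phi

# Hypothetical base: d1 ("penguins do not fly") is prior to d2 ("birds fly").
prior = {("d1", "d2")}

only_d1 = frozenset({"d1"})
only_d2 = frozenset({"d2"})
both = frozenset({"d1", "d2"})

print(strictly_preferred(only_d2, only_d1, prior))  # keeping d1 beats keeping d2
print(strictly_preferred(only_d1, only_d2, prior))
print(strictly_preferred(only_d1, both, prior))     # supersets are always preferred
```

When Γ ⊂ Φ the set Γ \ Φ is empty, so the condition holds vacuously and the resulting order is monotonic on P(∆), as required of base-generated epistemic states.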
4.2 Nonmonotonic inference and its kinds
In particular situations, we restrict our attention to admissible belief sets that are consistent with the facts, and choose the preferred ones among them. The latter are used to support the assumptions and conclusions we make about the situation at hand. Accordingly, all kinds of nonmonotonic inference relations described below presuppose a two-step selection procedure: for the current evidence A, we consider admissible belief states that are consistent with A and choose preferred elements in this set. An admissible belief state s ∈ S will be said to be compatible with a proposition A if ¬A does not belong to its belief set, that is, ¬A ∉ l(s). The set of all admissible states that are compatible with A will be denoted by ⟨A⟩. A skeptical inference (or prediction) with respect to an epistemic state is obtained when we infer only what is supported by each of the preferred states. In other words, B will be a skeptical conclusion from the evidence A in an epistemic state E if each preferred admissible belief set in E that is consistent with A, taken together with A itself, implies B. DEFINITION 9. B is a skeptical consequence of A (notation A|∼B) in an epistemic state if A → B is supported by all preferred belief states in ⟨A⟩. A set of conditionals A|∼B that are valid in an epistemic state E will be called a skeptical inference relation determined by E. The above definition generalizes the notion of prediction from [Poole, 1988a], as well as the expectation-based inference of [Gärdenfors and Makinson, 1994]. But in fact, it is much older. While the semantics of nonmonotonic inference from [Kraus et al., 1990] derives from the possible worlds semantics of Stalnaker and Lewis, the above definition can be traced back to the era before the discovery of possible worlds, namely to Frank Ramsey and John S. Mill:
“In general we can say with Mill that ‘If p then q’ means that q is inferrable from p, that is, of course, from p together with certain facts and laws not stated but in some way indicated by the context.” [Ramsey, 1978, page 144] This definition has also been used in the ‘premise-based’ semantics for counterfactuals [Veltman, 1976; Kratzer, 1981] (see also [Lewis, 1981]). A credulous inference (or explanation) with respect to an epistemic state is obtained by assuming that we can reasonably infer (or explain) conclusions that are supported by at least one preferred belief state consistent with the facts. In other words, B will be a credulous conclusion from A if at least one preferred admissible belief set in E that is consistent with A, taken together with A itself, implies B. In still other words, DEFINITION 10. B is a credulous consequence of A in an epistemic state if A → B is supported by at least one preferred belief state in ⟨A⟩ (the set of admissible states compatible with A). The set of conditionals that are credulously valid in an epistemic state E forms a credulous inference relation determined by E. The above definition constitutes a generalization of the corresponding definition of explanation in Poole’s abductive system [Poole, 1988a]. Credulous inference is only one, though important, instance of a broad range of non-skeptical inference relations (see [Bochman, 2003a]).
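Skeptical and credulous inference can be compared on a small base-generated epistemic state. The following is a minimal sketch under stated assumptions: admissible states are all subsets of a two-element default base, the preference order is taken to be plain strict inclusion (the weakest order allowed by monotonicity), and entailment is decided by enumerating truth assignments. The Nixon-diamond style defaults and every identifier are hypothetical.

```python
from itertools import product, combinations

ATOMS = ["quaker", "republican", "pacifist"]

def interpretations():
    for values in product([True, False], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))

# Default base: each default is a classical proposition (a predicate).
DEFAULTS = {
    "d1": lambda i: (not i["quaker"]) or i["pacifist"],            # quakers are pacifists
    "d2": lambda i: (not i["republican"]) or (not i["pacifist"]),  # republicans are not
}

def models(state, A):
    """Interpretations satisfying the evidence A and every default in the state."""
    return [i for i in interpretations()
            if A(i) and all(DEFAULTS[d](i) for d in state)]

def compatible_states(A):
    """Subsets of the base whose theory does not contain the negation of A."""
    names = sorted(DEFAULTS)
    subsets = [frozenset(c) for r in range(len(names) + 1)
               for c in combinations(names, r)]
    return [s for s in subsets if models(s, A)]

def preferred_states(A):
    """Maximal compatible states under strict inclusion."""
    cands = compatible_states(A)
    return [s for s in cands if not any(s < t for t in cands)]

def skeptical(A, B):
    return all(all(B(i) for i in models(s, A)) for s in preferred_states(A))

def credulous(A, B):
    return any(all(B(i) for i in models(s, A)) for s in preferred_states(A))

nixon = lambda i: i["quaker"] and i["republican"]
pacifist = lambda i: i["pacifist"]
not_pacifist = lambda i: not i["pacifist"]

print(skeptical(nixon, pacifist), credulous(nixon, pacifist),
      credulous(nixon, not_pacifist))
```

For the conflicting evidence there are two preferred states, one per default, so neither `pacifist` nor its negation is a skeptical consequence while both are credulous ones; this also shows concretely why credulous inference cannot be closed under conjunction of conclusions.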
4.3 Syntactic characterizations
As we mentioned, [Gabbay, 1985] was a starting point of the approach to nonmonotonic reasoning based on describing associated inference relations. This approach was primarily designed to capture the skeptical view of nonmonotonic reasoning. We have seen, however, that many reasoning tasks in AI, such as abduction and diagnosis, are based on a credulous understanding of nonmonotonic reasoning. A common ground for both skeptical and credulous inference can be found in the logic of conditionals suggested in [van Benthem, 1984]. The main idea behind van Benthem’s approach was that a conditional can be seen as a generalized quantifier representing a relation between the respective sets of instances or situations supporting and refuting it. A situation confirms a conditional A|∼B if it supports the classical implication A → B, and refutes A|∼B if it supports A → ¬B. Then the validity of a conditional in a set of situations is determined by appropriate, ‘favorable’ combinations of confirming and refuting instances. Whatever they are, we can assume that adding new confirming instances to a valid conditional, or removing refuting ones, cannot change its validity. Accordingly, we can accept the following principle: If all situations confirming A|∼B confirm also C|∼D, and all situations refuting C|∼D refute also A|∼B, then validity of A|∼B implies validity of C|∼D. The above principle is already sufficient for justifying the rules of the basic inference relation, given below. It supports also a representation of conditionals in
terms of expectation relations. More exactly, we can say that a conditional is valid if the set of its confirming situations is sufficiently good compared with the set of its refuting situations. Accordingly, A|∼B can be defined as A → ¬B < A → B for an appropriate expectation relation <, while the above principle secures that this is a general expectation relation as defined in [Bochman, 2001]. A basic inference relation B satisfies the following postulates:
(Reflexivity) A|∼A
(Left Equivalence) If ⊨ A ↔ B and A|∼C, then B|∼C
(Right Weakening) If A|∼B and B ⊨ C, then A|∼C
(Antecedence) If A|∼B, then A|∼A ∧ B
(Deduction) If A ∧ B|∼C, then A|∼B → C
(Very Cautious Monotony) If A|∼B ∧ C, then A ∧ B|∼C
A set Γ of conditionals implies a conditional α with respect to basic inference if α belongs to the least basic inference relation containing Γ. Note, however, that all the above postulates involve at most one conditional premise. As a result, the basic entailment boils down to a derivability relation among single conditionals. The following theorem describes this derivability relation in terms of the classical entailment. THEOREM 11. A|∼B ⊢_B C|∼D if and only if either C ⊨ D, or A → B ⊨ C → D and C → ¬D ⊨ A → ¬B. Theorem 11 justifies the principle stated earlier: A|∼B implies C|∼D if and only if either all situations confirm C|∼D, or else all confirming instances of the former are confirming instances of the latter and all refuting instances of the latter are refuting instances of the former. Basic inference is a weak inference system, first of all because it does not allow us to combine different conditionals. Nevertheless, it is in a sense complete so far as we are interested in derivability among individual conditionals. More exactly, basic derivability captures exactly the one-premise derivability of both skeptical and credulous inference relations. As was noted already in [Gabbay, 1985], a characteristic feature of skeptical inference is the validity of the following postulate: (And) If A|∼B and A|∼C, then A|∼B ∧ C. Indeed, in the framework of basic inference, And is all we need for capturing precisely the preferential inference relations from [Kraus et al., 1990]. Such inference relations provide a complete axiomatization of skeptical inference with respect to epistemic states. An important special case of preferential inference, rational inference relations, are determined by linearly ordered epistemic states; they are obtained by adding further
(Rational Monotony) If A|∼B and A|≁ ¬C, then A ∧ C|∼B. In contrast, credulous inference relations do not satisfy And. Still, they are axiomatized as basic inference relations satisfying Rational Monotony. Rational Monotony is not a ‘Horn’ rule, so it does not allow us to derive new conditionals from given ones. In fact, credulous inference relations do not derive many more conditionals than can already be derived by basic inference (see [Bochman, 2001]). This indicates that there is little hope of capturing credulous nonmonotonic reasoning by derivability in some nonmonotonic logic. Something else should be added to the above logical framework in order to represent the relevant form of nonmonotonic reasoning. Though less evident, the same holds for skeptical inference. Both these kinds of inference need to be augmented with an appropriate globally nonmonotonic semantics that would provide a basis for the associated systems of defeasible entailment, as described in the next section.
4.4 Defeasible entailment

A theory of reasoning about default conditionals should occupy an important place in the general theory of nonmonotonic reasoning. Thus, the question whether a proposition B is derivable from evidence A in a default base is reducible to the question whether the conditional A|∼B is derivable from the base, so practically all reasoning problems about default conditionals are reducible to the question of which conditionals can be derived from a conditional default base. The latter problem therefore constitutes the main task of a theory of default conditionals (see [Lehmann and Magidor, 1992]).

For skeptical reasoning, the most plausible understanding of default conditionals is obtained by treating them as skeptical inference rules in the framework of epistemic states. Accordingly, the preferential inference relations of [Kraus et al., 1990] can be considered as a logic behind skeptical nonmonotonic reasoning; the rules of the former should be taken for granted by the latter. This does not mean, however, that nonmonotonic reasoning about default conditionals is reducible to preferential derivability. Preferential inference is severely sub-classical and does not allow us, for example, to infer “Red birds fly” from “Birds fly”. In fact, this is precisely the reason why such inference relations have been called nonmonotonic. Clearly, there are good reasons for not accepting such a derivation as a logical rule for preferential inference; otherwise “Birds fly” would also imply “Birds with broken wings fly” and even “Penguins fly”. Still, this should not prevent us from accepting “Red birds fly” on the basis of “Birds fly” as a reasonable nonmonotonic (or defeasible) conclusion, namely a conclusion made in the absence of information against it. By doing this, we would just follow the general strategy of nonmonotonic reasoning, which involves making reasonable assumptions on the basis of available information.
Thus, the logical core of skeptical inference, preferential inference relations, should be augmented with a mechanism for making nonmonotonic conclusions. This kind of reasoning will of course be defeasible, or globally nonmonotonic, since the addition of new conditionals can block some of the conclusions made earlier. Note in this respect that, though preferential inference is (locally) nonmonotonic with respect to premises of conditionals, it is nevertheless globally monotonic: adding new conditionals does not change the validity of previous derivations.

On the semantic side, default conditionals are constraints on epistemic states in the sense that the latter should make them skeptically valid. Still, there is usually a huge number of epistemic states that satisfy a given set of conditionals, so we have both the opportunity and the necessity to choose among them. Our guiding principle in this choice can be the same basic principle of nonmonotonic reasoning, namely that the intended epistemic states should be as normal as is permitted by the current constraints. By choosing particular such states, we thereby adopt conditionals that would not be derivable from a given base by preferential inference alone.

The above considerations lead to a seemingly inevitable conclusion that default conditionals possess a clear logical meaning and an associated logical semantics based on epistemic states (or possible worlds models), but they still lack a globally nonmonotonic semantics that would provide an interpretation for the associated defeasible entailment. Actually, the literature on nonmonotonic reasoning abounds with such theories of defeasible entailment. A history of studies on this subject could be summarized as follows. The initial formal systems, namely Lehmann’s rational closure [Lehmann, 1989; Lehmann and Magidor, 1992] and Pearl’s system Z [Pearl, 1990], have turned out to be equivalent. This encouraging development was followed by the realization that both theories are insufficient for representing defeasible entailment, since they do not allow us to draw certain intended conclusions.
Hence, they have been refined in a number of ways, giving such systems as lexicographic inference [Benferhat et al., 1993; Lehmann, 1995], and similar modifications of Pearl’s system [Goldszmidt et al., 1993; Tan and Pearl, 1995]. Unfortunately, these refined systems have encountered the opposite problem: together with some desirable properties, they invariably produced some unwanted conclusions. All these systems have been based on the supposition that defeasible entailment should form a rational inference relation. A more general approach in the framework of preferential inference has been suggested in [Geffner, 1992].

Yet another, more syntactic, approach to defeasible entailment has been pursued in the framework of inheritance hierarchies (see [Horty, 1994]). Inheritance reasoning deals with a quite restricted class of conditionals constructed from literals. Nevertheless, in this restricted domain it has achieved a remarkably close correspondence between what is derived and what is expected intuitively. Accordingly, inheritance reasoning has emerged as an important test bed for adjudicating proposed theories.

Despite their diversity, the systems of defeasible entailment have a lot in common, and take as a starting point a few basic principles. Thus, most of them presuppose that intended models should be described, ultimately, in terms of material implications corresponding to a given set of conditionals. More exactly, these classical implications should serve as defaults in the nonmonotonic reasoning sanctioned by a default base. This idea can be made precise as follows:

Default base-generation. The intended epistemic states for a default base B should be base-generated by the corresponding set of material implications. In other words, the admissible belief states of intended epistemic states for a default base B should be formed by subsets of this set of implications, and the preference order should be monotonic on these subsets (see Section 4.1).

In addition, it should be required that all the conditionals from B be skeptically valid in the resulting epistemic state. Already these constraints on intended epistemic states allow us to derive “Red birds fly” from “Birds fly” for all default bases that do not contain conflicting information about redness. The constraints also sanction defeasible entailment across exception classes: if penguins are birds that normally do not fly, while birds normally fly and have wings, then we are able to conclude that penguins normally have wings, despite being abnormal birds. This excludes, in effect, Pearl’s system Z and rational closure, which cannot make such a derivation. Still, these requirements are quite weak and do not produce the problematic conclusions that plagued some stronger systems suggested in the literature.

Unfortunately, though the above constraints deal successfully with many examples of defeasible entailment, they are still insufficient for capturing some important reasoning patterns. What is missing in our construction is a principled way of constructing a preference order on default sets. This problem has turned out to be far from trivial or univocal. At present, the two most plausible solutions to this problem suggested in the literature are Geffner’s conditional entailment and inheritance reasoning.
Conditional entailment [Geffner, 1992] determines a prioritization of default bases by making use of the following relation among conditionals:

DEFINITION 12. A conditional α dominates a set of conditionals Γ if the set of material implications corresponding to Γ ∪ {α} is incompatible with the antecedent of α.

The origins of this relation can be found already in [Adams, 1975], and it has been used in practically all studies of defeasible entailment, including the notion of preemption in inheritance reasoning. A suggestive reading of dominance says that if α dominates Γ, it should have priority over at least one conditional in Γ (this secures that α will be valid in the resulting epistemic state). Accordingly, a priority order on the default base is admissible if it satisfies this condition. Then the intended models can be identified with epistemic states that are generated by all admissible priority orders on the default base (using the lexicographic rule, see above).

Conditional entailment has proved to be a serious candidate for the role of a general theory of defeasible entailment. Still, it does not capture inheritance reasoning. The main difference between the two theories is that conditional entailment is based on absolute priorities among defaults, while inheritance hierarchies determine such priorities in a context-dependent way, namely in the presence of other defaults that provide a (preemption) link between two defaults (see [Dung and Son, 2001]). Indeed, it has been shown in [Bochman, 2001] that inheritance reasoning is representable by epistemic states that are base-generated by default conditionals ordered by certain conditional priority orders. Still, the corresponding construction could hardly be called simple or natural. A more natural representation of inheritance reasoning has been given in [Dung and Son, 2001] as an instantiation of an argumentation theory that already belongs to the explanatory nonmonotonic formalisms discussed in the next section.

Furthermore, Geffner himself has shown in [Geffner, 1992] that conditional entailment still does not capture some important derivations, and that it should be augmented with an explicit representation of causal reasoning. In fact, the causal generalization suggested by Geffner in the last chapters of his book has served as one of the inspirations for a causal theory of reasoning about actions and change (see [Turner, 1999]).

5 EXPLANATORY NONMONOTONIC REASONING
The ‘mainstream’ approach to nonmonotonic reasoning includes default and modal nonmonotonic logics, logic programming, abductive and causal reasoning. We will call this approach explanatory nonmonotonic reasoning, since explanation can be seen as its basic ingredient. Propositions and facts are not only true or false in a model of a problem situation; some of them are also explainable (justified) by other facts and rules that are accepted. In the epistemic setting, some of the propositions are derivable from other propositions using rules that are admissible in the situation. In the objective setting, some of the facts are caused by other facts and causal rules acting in the domain. Furthermore, explanatory nonmonotonic reasoning is based on very strong principles of Explanation Closure or Causal Completeness (see [Reiter, 2001]), according to which any fact holding in a model should be explained, or caused, by the rules that describe the domain. Incidentally, it is these principles that make explanatory reasoning nonmonotonic.

By the above description, abduction, that is, reasoning from facts to their explanations, is an integral part of explanatory nonmonotonic reasoning. Ultimate explanations, or abducibles, correspond not to normality defaults, but to conjectures representing base causes or facts that do not require explanation; we assume the latter only for explaining some evidence.

In some domains, explanatory formalisms adopt simplifying assumptions that exempt, in effect, certain propositions from the burden of explanation. The Closed World Assumption [Reiter, 1978] is the most important assumption of this kind. According to it, negative assertions do not require explanation. Nonmonotonic reasoning in databases and logic programming are domains for which such an assumption turns out to be most appropriate. It is important to note that the minimization principle that has been the source of the preferential approach can also be derived as a result of combining Explanation Closure with the Closed World Assumption (see below). Consequently, it need not be viewed as a principle of scaled preference of negative information; rather, it could be seen as a byproduct of the stipulation that negated propositions can be accepted without any further explanation, while positive assertions always require explanation. This understanding allows us to explain why McCarthy’s circumscription, which is based on the principle of minimization, is also subsumed by explanatory formalisms.

The above principles form an ultimate basis for all formal systems of explanatory nonmonotonic reasoning. They presuppose, however, a richer picture of what is in the world than what is usually captured in logical models of the latter. The traditional understanding of possible worlds in logic stems from the Tractatus’ metaphysics where ‘[e]ach item can be the case or not the case while everything else remains the same’ [Wittgenstein, 1961]. Consequently, there is no way of making an inference from one fact to another, and there is no causal nexus to justify such an inference. The only restriction on the structure of the world is the principle of non-contradiction. Such worlds leave no place for dependencies among facts and related notions, in particular for causation. Wittgenstein himself concluded that belief in such dependencies is a superstition.

An alternative picture depicts the world not as a mere assemblage of unrelated facts, but as something that has a structure. This structure determines dependencies among occurrent facts that serve as a basis for our explanatory and causal claims. It is this structure that makes the world intelligible and, what is especially important for AI, controllable. By this picture, explanatory and causal relations form an integral part of understanding the world and acting in it.
Consequently, such relations should form an essential part of knowledge representation, at least in Artificial Intelligence.
5.1 Biconsequence Relations

A uniform account of explanatory nonmonotonic formalisms can be given in the framework of biconsequence relations described in this section. Biconsequence relations are specialized consequence relations for reasoning with respect to a pair of contexts. On the interpretation suitable for nonmonotonic reasoning, one of these contexts is the main (objective) one, while the other context provides assumptions, or explanations, that justify inferences in the main context. This separation of inferences and their justifications creates the possibility of explanatory nonmonotonic reasoning.

The two contexts will be termed, respectively, the context of truth and the context of falsity. In the truth context the propositions are evaluated as being either true or non-true, while in the falsity context they can be false or non-false. As a benefit of this terminological decision, bi-context reasoning can also be interpreted as reasoning with possibly inconsistent and incomplete information. Furthermore, such reasoning can also be viewed as four-valued reasoning (see [Belnap, 1977]).

A bisequent is an inference rule of the form a : b ⊩ c : d, where a, b, c, d are finite sets of propositions. We will employ two informal interpretations of bisequents. According to the four-valued interpretation, it says ‘If all propositions from a are true and all propositions from b are false, then either one of the propositions from c is true or one of the propositions from d is false’. According to the explanatory (or assumption) interpretation, it says ‘If no proposition from b is assumed, and all propositions from d are assumed, then all propositions from a hold only if one of the propositions from c holds’.

A biconsequence relation is a set of bisequents satisfying the rules:

Monotonicity: if a : b ⊩ c : d, then a′ : b′ ⊩ c′ : d′, whenever a ⊆ a′, b ⊆ b′, c ⊆ c′, d ⊆ d′;

Reflexivity: A : ⊩ A : and : A ⊩ : A;

Cut: if a : b ⊩ A, c : d and A, a : b ⊩ c : d, then a : b ⊩ c : d; if a : b ⊩ c : A, d and a : A, b ⊩ c : d, then a : b ⊩ c : d.
A biconsequence relation can be seen as a fusion, or fibring, of two Scott consequence relations much in the sense of [Gabbay, 1998]. This fusion gives a syntactic expression to combining two independent contexts. The definition of a biconsequence relation is extendable to arbitrary sets of propositions by accepting the compactness requirement:

Compactness: u : v ⊩ w : z iff a : b ⊩ c : d, for some finite sets a, b, c, d such that a ⊆ u, b ⊆ v, c ⊆ w and d ⊆ z.

For a set u of propositions, ū will denote the set of propositions that do not belong to u. A pair (u, v) of sets of propositions will be called a bitheory of a biconsequence relation ⊩ if u : v̄ ⊮ ū : v. A set u of propositions is a (propositional) theory of ⊩ if (u, u) is a bitheory of ⊩. Bitheories can be seen as pairs of sets that are closed with respect to the bisequents of a biconsequence relation. A bitheory (u, v) of ⊩ is positively minimal if there is no bitheory (u′, v) of ⊩ such that u′ ⊂ u. Such bitheories play an important role in describing nonmonotonic semantics.

By a bimodel we will mean a pair of sets of propositions. A set of bimodels will be called a binary semantics.

DEFINITION 13. A bisequent a : b ⊩ c : d is valid in a binary semantics B if, for any (u, v) ∈ B, if a ⊆ u and b ⊆ v̄, then either c ∩ u ≠ ∅, or d ∩ v̄ ≠ ∅.
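Validity in a binary semantics is easy to test on finite examples. The sketch below assumes the reading in which, for a bimodel (u, v), u collects the true propositions and v the propositions that are not false, so that ‘all of b are false’ means b is disjoint from v and ‘some proposition of d is false’ means d is not included in v. Bisequents are encoded as 4-tuples of frozensets; this encoding and all names are hypothetical.

```python
from itertools import combinations


def powerset(atoms):
    """All subsets of a finite set of atoms, as frozensets."""
    items = sorted(atoms)
    return [frozenset(s) for r in range(len(items) + 1)
            for s in combinations(items, r)]


def valid_in(bisequent, semantics):
    """Check validity of a bisequent (a, b, c, d) in a set of bimodels."""
    a, b, c, d = bisequent
    for u, v in semantics:
        premises_hold = a <= u and not (b & v)          # a true, b false
        conclusion_holds = bool(c & u) or not (d <= v)  # some c true or some d false
        if premises_hold and not conclusion_holds:
            return False
    return True


ATOMS = {"p", "q"}
ALL_BIMODELS = [(u, v) for u in powerset(ATOMS) for v in powerset(ATOMS)]
CONSISTENT = [(u, v) for (u, v) in ALL_BIMODELS if u <= v]

E = frozenset()
P = frozenset({"p"})
reflexivity = (P, E, P, E)  # p : ⊩ p :
both_values = (P, P, E, E)  # p : p ⊩   (p cannot be both true and false)

print(valid_in(reflexivity, ALL_BIMODELS))  # True
print(valid_in(both_values, ALL_BIMODELS))  # False
print(valid_in(both_values, CONSISTENT))    # True
```

The last two lines illustrate how a structural rule corresponds to a restriction on bimodels: the bisequent p : p ⊩ fails in the unrestricted semantics but holds once only bimodels with u ⊆ v are admitted.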
The set of bisequents that are valid in a binary semantics forms a biconsequence relation. On the other hand, any biconsequence relation ⊩ is determined in this sense by its canonical semantics, defined as the set of bitheories of ⊩. Consequently, the binary semantics provides an adequate interpretation of biconsequence relations.

According to Belnap’s idea, the four truth-values {⊤, t, f, ⊥} of a four-valued interpretation can be identified with the four subsets of the set {t, f} of classical truth-values, namely {t, f}, {t}, {f} and ∅. Thus, ⊤ means that a proposition is both true and false (i.e., contradictory), t means that it is ‘classically’ true (that is, true without being false), f means that it is classically false, while ⊥ means that it is neither true nor false (undetermined). This representation allows us to see any four-valued interpretation ν as a pair of ordinary interpretations corresponding, respectively, to independent assignments of truth and falsity to propositions:

ν |= A iff t ∈ ν(A)
ν =| A iff f ∈ ν(A).
Now, a bimodel (u, v) can be viewed as a four-valued interpretation, where u is the set of true propositions, while v is the set of propositions that are not false. Biconsequence relations provide in this sense a syntactic formalism for four-valued reasoning.

A bisequent theory is an arbitrary set of bisequents. For any bisequent theory ∆ there is a least biconsequence relation ⊩_∆ containing it, which describes the logical content of ∆. This allows us to extend the notions of a bitheory and a propositional theory to arbitrary bisequent theories.

Two kinds of bisequents are of special interest for nonmonotonic reasoning. The first are default bisequents a : b ⊩ c : without negative conclusions, which are related to rules of default logic. The second are autoepistemic bisequents : b ⊩ c : d, which are related to autoepistemic logic. A default bisequent says that if no proposition from b is assumed, then all propositions from a hold only if one of the propositions from c holds. Such a bisequent can be viewed as a Scott inference rule a ⊢ c that is conditioned by a set of negative assumptions (i.e., absence of bs). Thus, such bisequents involve full inference capabilities with respect to the main context, but permit only negative assumptions. In contrast, an autoepistemic bisequent : b ⊩ c : d says that if no proposition from b is assumed, and all propositions from d are assumed, then one of the propositions from c holds. Such rules have rich assumption capabilities, but allow us to make only unconditional assertions.

In addition to the above distinction, we are often interested in singular bisequents, namely bisequents a : b ⊩ C : d having a single proposition C as a positive conclusion. Such bisequents can be seen as counterparts of Tarski rules. The formalism of default consequence relations, introduced in [Bochman, 1994], provided, in effect, a uniform description of singular default and autoepistemic bisequents as species of default rules of the form a : b ⊢ C. The difference between default and autoepistemic rules was reflected, however, in a difference between the corresponding logics for default rules. This common representation has given a convenient basis for a comparative study of default and modal nonmonotonic reasoning (see [Bochman, 1998c]).

An important feature of biconsequence relations is the possibility of imposing structural constraints on the binary semantics by accepting additional structural rules. Some of them play an important role in nonmonotonic reasoning. Thus, a biconsequence relation is consistent if it satisfies

Consistency: A : A ⊩
On the explanatory interpretation, Consistency says that no proposition can be taken to hold without assuming it. This amounts to restricting the binary semantics to consistent bimodels, that is, bimodels (u, v) such that u ⊆ v. On the four-valued representation, Consistency requires that no proposition can be both true and false, so it determines a semantic setting of partial logic (see, e.g., [Blamey, 1986]) that usually deals only with possible incompleteness of information.

A biconsequence relation is regular if it satisfies

Regularity: if b : a ⊩ a : b, then : a ⊩ : b.
Regularity is a kind of assumption coherence constraint. It says that a coherent set of assumptions should be such that it is compatible with taking these assumptions as actually holding. A semantic counterpart of Regularity is a quasi-reflexive binary semantics in which, for any bimodel (u, v), (v, v) is also a bimodel.

Any four-valued connective is definable in biconsequence relations via a pair of introduction rules and a pair of elimination rules corresponding to the two valuations of a four-valued interpretation. We are primarily interested, however, in the information that bi-context reasoning can give us about ordinary, classical truth and falsity, so we restrict attention to classical connectives that are conservative on the subset {t, f}. Such connectives give classical truth-values when their arguments receive classical values t or f. There are four natural connectives that are jointly sufficient for defining all classical four-valued functions. The first is the well-known conjunction:

ν |= A ∧ B iff ν |= A and ν |= B
ν =| A ∧ B iff ν =| A or ν =| B.

Next, there are two negation connectives that can be seen as two alternative extensions of classical negation to the four-valued setting:

ν |= ¬A iff ν ⊭ A        ν |= ∼A iff ν =| A
ν =| ¬A iff ν ≠| A        ν =| ∼A iff ν |= A.

We will call ¬ and ∼ the local and the global negation, respectively. Each of them can be used together with conjunction to define a disjunction: A ∨ B ≡ ∼(∼A ∧ ∼B) ≡ ¬(¬A ∧ ¬B).
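The behaviour of these connectives can be tabulated directly by representing each four-valued truth-value as a subset of {t, f}, as in Belnap’s identification. The sketch below assumes that the local negation flips membership of t and of f separately (classical negation in each context), while the global negation swaps the two contexts; the function names are hypothetical. It then checks that the two definitions of disjunction agree on all sixteen pairs of values.

```python
T, F = "t", "f"
TOP = frozenset({T, F})   # both true and false
TRUE = frozenset({T})     # classically true
FALSE = frozenset({F})    # classically false
BOT = frozenset()         # neither true nor false
VALUES = (TOP, TRUE, FALSE, BOT)


def make(is_true, is_false):
    """Build a four-valued truth-value from its truth and falsity parts."""
    return frozenset(([T] if is_true else []) + ([F] if is_false else []))


def conj(x, y):
    """Conjunction: true iff both are true, false iff one is false."""
    return make(T in x and T in y, F in x or F in y)


def local_neg(x):
    """Local negation (assumed reading): negate within each context."""
    return make(T not in x, F not in x)


def global_neg(x):
    """Global negation: swap truth and falsity."""
    return make(F in x, T in x)


def disj_global(x, y):
    return global_neg(conj(global_neg(x), global_neg(y)))


def disj_local(x, y):
    return local_neg(conj(local_neg(x), local_neg(y)))


same = all(disj_global(x, y) == disj_local(x, y)
           for x in VALUES for y in VALUES)
print(same)  # True
```

On the classical values both negations coincide with ordinary negation; they differ only on ⊤ and ⊥, where the global negation leaves the value unchanged and the local negation exchanges the two.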
Finally, the unary connective 𝐀 can be seen as a kind of modal operator, closely related to the modal operator L that will be used in the modal extension of our formalism:

ν |= 𝐀A iff ν ≠| A
ν =| 𝐀A iff ν =| A.
From the classical point of view, the most natural subclass of the classical four-valued connectives is formed by connectives that behave as ordinary classical connectives with respect to each of the two contexts. Connectives of this kind will be called locally classical. The conjunction ∧ and local negation ¬ form a functionally complete basis for all such connectives.

Having the four-valued connectives at our disposal, we can transform bisequents into more familiar inference rules, and even into ordinary logical formulas. Let ¬u denote the set {¬A | A ∈ u}, and similarly for ∼u, etc. Then any bisequent a : b ⊩ c : d is equivalent to each of the following:

(1) a, ∼b : ⊩ c, ∼d :
(2) ⊩ ¬a, c : d, ¬b

Bisequents of the form (1) can be seen as ordinary sequents. In fact, this is a common trick used for representing many-valued logics in the form of a sequent calculus. If the language also contains conjunction (or, equivalently, disjunction), the set of premises can be replaced by their conjunction, and the set of conclusions by their disjunction. Consequently, we can transform bisequents into Tarski-type rules A ⊢ B. Actually, the resulting system will coincide with a (flat) theory of relevant entailment [Dunn, 1976].

An important alternative possibility arises from using the representation (2). This time we can use the local connectives {∧, ¬} in order to reduce such bisequents to bisequents of the form ⊩ A : B, where A and B are classical propositions. The latter bisequents will correspond to rules B ⇒ A of production and causal inference relations, discussed later.
5.2 Nonmonotonic Semantics

The nonmonotonic semantics of a biconsequence relation is a certain set of its theories. Namely, such theories are explanatorily closed in the sense that the presence and absence of propositions in the main context is explained (i.e., derived) using the rules of the biconsequence relation when the theory itself is taken as the assumption context.

Exact semantics

The notion of an exact theory of a biconsequence relation provides us with the simplest and most general kind of nonmonotonic reasoning.

DEFINITION 14. A theory u of a biconsequence relation ⊩ is exact if there is no other set v ≠ u such that (v, u) is a bitheory of ⊩. The set of exact theories will be called the exact nonmonotonic semantics of ⊩.
Exact theories correspond to bitheories for which the assumption context determines itself as the unique objective state compatible with it. Such theories can be given the following syntactic description.

LEMMA 15. A set u of propositions is an exact theory of a biconsequence relation ⊩ if and only if, for any proposition A,

A ∈ u iff : ū ⊩ A : u, and
A ∉ u iff A : ū ⊩ : u.
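For a finite bisequent theory over atoms only, exact theories can be found by brute force. The sketch below rests on an explicit assumption: for connective-free bisequents, a pair (v, u) is taken to be a bitheory exactly when it satisfies every bisequent of the theory, with v as the objective context and u as the assumption context. The encoding of bisequents as 4-tuples (a, b, c, d) and the example theory are hypothetical.

```python
from itertools import combinations


def powerset(atoms):
    """All subsets of a finite set of atoms, as frozensets."""
    items = sorted(atoms)
    return [frozenset(s) for r in range(len(items) + 1)
            for s in combinations(items, r)]


def satisfies(obj, assumed, bisequent):
    """Does the pair (obj, assumed) satisfy the bisequent a : b ⊩ c : d?"""
    a, b, c, d = bisequent
    if a <= obj and not (b & assumed):
        return bool(c & obj) or not (d <= assumed)
    return True


def is_bitheory(obj, assumed, theory):
    return all(satisfies(obj, assumed, r) for r in theory)


def exact_theories(theory, atoms):
    """Sets u such that u is the only v making (v, u) a bitheory."""
    subsets = powerset(atoms)
    return [u for u in subsets
            if is_bitheory(u, u, theory)
            and not any(v != u and is_bitheory(v, u, theory)
                        for v in subsets)]


E = frozenset()
p, q = frozenset({"p"}), frozenset({"q"})

base = [
    (E, q, p, E),  # : q ⊩ p :   p holds unless q is assumed
    (p, p, E, E),  # p : p ⊩     p cannot hold unless assumed
    (q, q, E, E),  # q : q ⊩     q cannot hold unless assumed
]
extended = base + [(E, E, q, E)]  # : ⊩ q :   q holds

print(exact_theories(base, {"p", "q"}))      # [frozenset({'p'})]
print(exact_theories(extended, {"p", "q"}))  # [frozenset({'q'})]
```

Adding the single bisequent : ⊩ q : removes the exact theory {p} and creates the new exact theory {q}, so the set of exact theories does not grow or shrink monotonically with the theory.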
The exact semantics is extendable to arbitrary bisequent theories by considering their associated biconsequence relations. Regular biconsequence relations constitute a maximal logic suitable for the exact nonmonotonic semantics.

The above definition of nonmonotonic semantics leaves us much freedom in determining the nonmonotonic consequences of a bisequent theory. The most obvious skeptical choice consists in taking propositions that belong to all exact theories. As a credulous alternative, however, we can consider propositions that belong to at least one theory. An even more general understanding amounts to the view that all the information that can be discerned from the nonmonotonic semantics of a bisequent theory or a biconsequence relation can be seen as nonmonotonically implied by the latter.

Exact theories determine a truly nonmonotonic semantics, since adding new bisequents to a bisequent theory may not only eliminate exact theories, but also add new ones (or both). In other words, the set of exact theories does not change monotonically with the growth of the set of bisequents. Finally, if a bisequent theory contains only default bisequents a : b ⊩ c :, then any of its exact theories will also be a minimal propositional theory.

Default semantics

A more familiar class of nonmonotonic models, extensions, corresponds to extensions of default logic and stable models of logic programs.

DEFINITION 16. A set u is an extension of a biconsequence relation ⊩ if (u, u) is a positively minimal bitheory of ⊩. The default nonmonotonic semantics of a biconsequence relation is the set of its extensions.

Thus, u is an extension if it is a theory of a biconsequence relation such that there is no smaller set v ⊂ u such that (v, u) is a bitheory. Hence any exact theory of a biconsequence relation is an extension, though not vice versa. Let Pr denote the set of all propositions of the language. The following lemma provides a syntactic description of extensions.

LEMMA 17. A set u is an extension of a biconsequence relation ⊩ if and only if u = {A | : ū ⊩ A, ū : u} and either u ≠ Pr, or ⊮ : Pr.

By the above description, extensions are theories of a biconsequence relation that explain only why they have the propositions they have. In other words, for extensions we are relieved from the necessity of explaining why propositions do not belong to the intended theory. This agreement constitutes the essence of Reiter’s Closed World Assumption. Another consequence of the above description is that extensions are determined by the autoepistemic bisequents of a biconsequence relation.

It turns out that the default nonmonotonic semantics is precisely an exact semantics under a stronger logic of consistent biconsequence relations. The effect of Consistency A : A ⊩ amounts to an immediate refutation of any proposition that is assumed not to hold. That is why the absence of propositions from extensions does not require explanation, only their presence. This makes the minimality condition in the definition of extensions a consequence of the logical Consistency postulate, instead of an independent ‘rationality principle’ of nonmonotonic reasoning.

Bisequent interpretation of logic programs

Representations of general logic programs and their semantics in the logical formalism of biconsequence relations establish ways of providing (or, better, restoring) a logical basis for logic programming. Recall that a general logic program is a set of program rules not d, c ← a, not b, where a, b, c, d are finite sets of propositional atoms. Such program rules involve disjunctions and default negations in heads and subsume practically all structural extensions of Prolog rules suggested in the literature.

A program rule not d, c ← a, not b can be directly interpreted as a bisequent a : b ⊩ c : d. Let bc(Π) denote the bisequent theory corresponding to a general program Π under this interpretation. The following result shows that it provides an exact correspondence between stable models of a general program and extensions of the associated bisequent theory.

THEOREM 18. Stable models of a general logic program Π coincide with the extensions of bc(Π).
Moreover, the correspondence turns out to be bidirectional, since any bisequent in a four-valued language is logically reducible to a set of bisequents without connectives; such bisequents can already be viewed as program rules.
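The correspondence of Theorem 18 can be illustrated on a small program by computing extensions by brute force. As a sketch only, it is assumed here that for a connective-free bisequent theory a pair (v, u) is a bitheory exactly when it satisfies every bisequent, with v as the objective context and u as the assumption context. The helper `rule`, which maps a program rule not d, c ← a, not b to the tuple (a, b, c, d), and the example program are hypothetical.

```python
from itertools import combinations


def powerset(atoms):
    """All subsets of a finite set of atoms, as frozensets."""
    items = sorted(atoms)
    return [frozenset(s) for r in range(len(items) + 1)
            for s in combinations(items, r)]


def rule(head=(), not_head=(), body=(), not_body=()):
    """Translate 'not d, c <- a, not b' into the bisequent (a, b, c, d)."""
    return (frozenset(body), frozenset(not_body),
            frozenset(head), frozenset(not_head))


def satisfies(obj, assumed, bisequent):
    """Does the pair (obj, assumed) satisfy the bisequent a : b ⊩ c : d?"""
    a, b, c, d = bisequent
    if a <= obj and not (b & assumed):
        return bool(c & obj) or not (d <= assumed)
    return True


def is_bitheory(obj, assumed, theory):
    return all(satisfies(obj, assumed, r) for r in theory)


def extensions(theory, atoms):
    """Sets u such that (u, u) is a positively minimal bitheory."""
    subsets = powerset(atoms)
    return [u for u in subsets
            if is_bitheory(u, u, theory)
            and not any(v < u and is_bitheory(v, u, theory)
                        for v in subsets)]


# p <- not q.   q <- not p.   r <- p.
program = [
    rule(head={"p"}, not_body={"q"}),
    rule(head={"q"}, not_body={"p"}),
    rule(head={"r"}, body={"p"}),
]

for ext in extensions(program, {"p", "q", "r"}):
    print(sorted(ext))
```

For this program the search returns {q} and {p, r}, which are its two stable models. Fixing the assumption context u and minimizing the objective context plays the role of taking the least model of the reduct in the usual definition of stable models.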
5.3 Causal and Production Inference

As we mentioned earlier, in languages containing certain (four-valued) connectives, bisequents are reducible to much simpler rules. In particular, in a language with local classical connectives, bisequents are reducible to rules of the form A ⇒ B, where A and B are classical logical formulas (cf. representation (2) above). Such rules can be given an informal reading ‘A causes, or explains, B’, and the resulting logical system will turn out to coincide with the system of causal inference described in Section 3.6.

Speaking more formally, for a biconsequence relation ⊩ in a local classical language, we can define the following set of rules, called the production subrelation of ⊩:

⇒_⊩ = {A ⇒ B | ⊩ B : A}
Then the production subrelation of a regular biconsequence relation will form a causal inference relation. Moreover, for any causal inference relation ⇒ there is a unique regular biconsequence relation in the language with the local connectives that has ⇒ as its production subrelation. By this correspondence, a causal rule A ⇒ B can be seen as an assumption-based conditional saying that if A is assumed, then B should hold. In this sense, the system of causal inference constitutes a primary logical system of explanatory nonmonotonic reasoning, whereas biconsequence relations form a structural counterpart of this logical formalism. In particular, in the general correspondence between causal and biconsequence relations, the causal nonmonotonic semantics (see Definition 6) corresponds to the exact nonmonotonic semantics of biconsequence relations.

As for biconsequence relations, the default nonmonotonic semantics of causal theories can be obtained by imposing a causal postulate corresponding to the Consistency postulate for biconsequence relations:

(Default Negation) ¬p ⇒ ¬p, for any propositional atom p.
Default Negation stipulates that negations of atomic propositions are self-explanatory, and hence it provides a simple causal expression for Reiter’s Closed World Assumption. As we already mentioned in Section 3.6, this kind of causal inference can be used as a logical basis for logic programming.

Production inference. A useful generalization of causal inference is obtained by dropping the postulate Or of causal inference, which allows for reasoning by cases. The resulting formalism has been called in [Bochman, 2004a] the system of production inference. For this formalism, we can also generalize the corresponding nonmonotonic semantics as follows:

DEFINITION 19. A nonmonotonic production semantics of a production inference relation is the set of all its exact theories, namely sets u of propositions such that u = C(u).

As before, an exact theory describes an informational state in which every proposition is explained by other propositions accepted in this state. Accordingly, restricting our universe of discourse to exact theories amounts to imposing a kind of explanatory closure assumption on intended models.

The nonmonotonic production semantics allows us to provide a formal representation of abductive reasoning that covers the main applications of abduction in AI. To begin with, abducibles can be identified with self-explanatory propositions of a production relation, that is, propositions satisfying A ⇒ A. Then it turns out that traditional abductive systems are representable via a special class of abductive production inference relations that satisfy

(Abduction) If B ⇒ C, then B ⇒ A ⇒ C, for some abducible A.

It has been shown in [Bochman, 2005] that abductive inference relations provide a generalization of abductive reasoning in causal theories [Konolige, 1992; Poole, 1994b], as well as of abduction in logic programming. On the other hand, any production inference relation includes a greatest abductive subrelation, and in many regular situations (e.g., when the production relation is well-founded) the latter determines the same nonmonotonic semantics. Summing up, the general nonmonotonic semantics of a production relation is usually describable by some abductive system, and vice versa.
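The notions of exact theory, abducible and Default Negation can be illustrated by a small Python sketch. It rests on a strong simplifying assumption that is not part of the formalism above: rules relate finite sets of literals to single literals, and C(u) is approximated by the heads of the rules whose bodies are contained in u, ignoring classical closure and the postulates of production inference. All names (`produced`, `exact_theories`, `RULES`, and so on) and the toy rain/wet theory are hypothetical.

```python
from itertools import combinations

def neg(lit):
    """Complement of a literal: 'p' <-> '-p'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def produced(rules, u):
    """Simplified C(u): heads of all rules whose bodies are contained in u."""
    return {head for body, head in rules if body <= u}

def exact_theories(rules, literals):
    """Brute-force enumeration of all sets u of literals with u == C(u)."""
    lits = sorted(literals)
    return [frozenset(c)
            for k in range(len(lits) + 1)
            for c in combinations(lits, k)
            if produced(rules, set(c)) == set(c)]

def is_world(u, literals):
    """True if u contains exactly one of p, -p for every atom."""
    return all((l in u) != (neg(l) in u) for l in literals)

def abducibles(rules, literals):
    """Literals A with a rule A => A (self-explanatory propositions)."""
    return {l for l in literals if (frozenset([l]), l) in rules}

# Hypothetical toy theory: rain explains wet grass, rain is abducible,
# and Default Negation makes both negative literals self-explanatory.
LITERALS = {"rain", "-rain", "wet", "-wet"}
RULES = {
    (frozenset({"rain"}), "wet"),
    (frozenset({"rain"}), "rain"),
    (frozenset({"-rain"}), "-rain"),
    (frozenset({"-wet"}), "-wet"),
}

EXACT = exact_theories(RULES, LITERALS)
EXACT_WORLDS = {u for u in EXACT if is_world(u, LITERALS)}
```

In this toy theory the set {-rain, wet} is not exact, since nothing explains wet without rain; among complete consistent sets only {rain, wet} and {-rain, -wet} survive, which is the explanatory closure effect described above.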
5.4 Epistemic Explanatory Reasoning

Epistemic formalisms of default and modal nonmonotonic logics find their natural place in the framework of supraclassical biconsequence relations, defined below. An epistemic understanding of biconsequence relations amounts to treating the main and assumption contexts, respectively, as the contexts of knowledge and belief: propositions that hold in the main context can be viewed as known, while propositions of the assumption context form the associated set of beliefs. This understanding will later receive an explicit expression in the modal extension of the formalism. Even in a non-modal setting, however, the epistemic reading implies that both contexts should correspond not to complete worlds, but to incomplete deductive theories.

DEFINITION 20. A biconsequence relation in a classical language is supraclassical if it satisfies

Supraclassicality  If a ⊨ A, then a : ⊩ A : and : A ⊩ : a.
Falsity  f : ⊩ and ⊩ : f.
In supraclassical biconsequence relations both contexts respect classical entailment. In addition, sets of positive premises and negative conclusions can be replaced by their conjunctions, but positive conclusion sets and negative premise sets are not replaceable in this way by classical disjunctions. Also, the deduction theorem, contraposition, and disjunction in the antecedent are not valid, in general, for each of the two contexts. A semantics of supraclassical biconsequence relations is obtained from the general binary semantics by requiring that bimodels are pairs of consistent deductively closed sets. Structural rules for biconsequence relations can also be used in the supraclassical case. As before, Consistency will correspond to the requirement that u ⊆ v, for any bimodel (u, v). Similarly, regular biconsequence relations will be determined by quasi-reflexive binary semantics. A supraclassical biconsequence relation will be called saturated if it is consistent, regular, and satisfies the following postulate:

Saturation  ⊩ A ∨ B, ¬A ∨ B : B.
For a deductively closed set u, let u⊥ denote the set of all maximal sub-theories of u, plus u itself. Then a classical bimodel (u, v) will be called saturated, if u ∈ v⊥. A classical binary semantics B will be called saturated if it is regular, and all its
bimodels are saturated. Such a semantics provides an adequate interpretation for saturated biconsequence relations.

Classical nonmonotonic semantics. The notions of an exact theory and extension can be directly extended to supraclassical consequence relations, with the only, though important, qualification that they now form deductively closed sets. Still, practically all the results about such semantics remain valid for the supraclassical case. The default nonmonotonic semantics of supraclassical biconsequence relations forms a generalization of default logic. Supraclassical biconsequence relations that are consistent and regular constitute a maximal logic adequate for extensions. For such biconsequence relations, extensions are described as sets satisfying the following fixpoint equality (ū being the complement of u):

u = {A | : ū ⊩ A : u}.
Thus, an extension is a set of formulas that are provable on the basis of taking itself as the set of assumptions. As before, classical extensions of a default bisequent theory will be minimal theories. Actually, default bisequent theories under this nonmonotonic semantics give an exact representation for the disjunctive default logic [Gelfond et al., 1991]. For singular default rules a:b ⊩ C:, it reduces to the original default logic of [Reiter, 1980].

The nonmonotonic semantics defined below constitutes an exact non-modal counterpart of Moore’s autoepistemic logic.

DEFINITION 21. A theory u of a supraclassical biconsequence relation ⊩ is a classical expansion of ⊩ if, for any v ∈ u⊥ such that v ≠ u, the pair (v, u) is not a bitheory of ⊩. The set of classical expansions determines the autoepistemic semantics of ⊩.

Any extension of a supraclassical biconsequence relation will be a classical expansion, though not vice versa. In fact, classical expansions can be precisely characterized as extensions of saturated biconsequence relations. The next result states important sufficient conditions for coincidence of classical expansions and extensions of a bisequent theory. A bisequent theory ∆ will be called positively simple if positive premises and positive conclusions of any bisequent from ∆ are sets of classical literals.

THEOREM 22. If a bisequent theory is autoepistemic or positively simple, then its classical expansions coincide with classical extensions.

Bisequents a:b ⊩ c:d such that a, b, c, d are sets of classical literals are logical counterparts of program rules of extended logic programs with classical negation (see [Gelfond and Lifschitz, 1991; Lifschitz and Woo, 1992]). The semantics of such programs is determined by answer sets that coincide with extensions of respective bisequent theories. Moreover, such bisequent theories are positively simple, so by Theorem 22 extended logic programs obliterate the distinction between extensions
and classical expansions. This is the logical basis for the possibility of representing extended logic programs also in autoepistemic logic (see [Lifschitz and Schwarz, 1993]).
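Since answer sets of extended logic programs are said above to coincide with extensions of the corresponding bisequent theories, a brute-force answer-set computation gives a concrete feel for that semantics. The following Python sketch uses the standard reduct construction of [Gelfond and Lifschitz, 1991] under assumptions of our own: rules have a single head literal and no negation as failure in heads (so a rule stands for a bisequent a:b ⊩ {h}: only), literals are strings with a `-` prefix for classical negation, and contradictory candidate sets are simply skipped. All identifiers and both example programs are hypothetical.

```python
from itertools import combinations

def neg(lit):
    """Classical complement of a literal: 'p' <-> '-p'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def reduct(program, s):
    """Drop rules whose 'not' part intersects s, then drop the 'not' parts."""
    return [(h, pos) for h, pos, naf in program if not (naf & s)]

def least_closure(positive_program):
    """Least set of literals closed under the rules of a 'not'-free program."""
    closed, changed = set(), True
    while changed:
        changed = False
        for h, pos in positive_program:
            if pos <= closed and h not in closed:
                closed.add(h)
                changed = True
    return closed

def answer_sets(program, literals):
    """All consistent sets s with s == least_closure(reduct(program, s))."""
    lits = sorted(literals)
    found = []
    for k in range(len(lits) + 1):
        for combo in combinations(lits, k):
            s = set(combo)
            if any(neg(l) in s for l in s):
                continue  # sketch: contradictory candidates are skipped
            if least_closure(reduct(program, s)) == s:
                found.append(frozenset(s))
    return found

# Rule format: (head, positive body, body under negation as failure).
FLY_RULE = ("fly", frozenset({"bird"}), frozenset({"-fly"}))
PENGUIN_PROGRAM = [
    FLY_RULE,                                   # fly <- bird, not -fly
    ("-fly", frozenset({"penguin"}), frozenset()),
    ("bird", frozenset({"penguin"}), frozenset()),
    ("penguin", frozenset(), frozenset()),
]
BIRD_PROGRAM = [FLY_RULE, ("bird", frozenset(), frozenset())]
LITS = {"bird", "penguin", "fly", "-fly"}
```

Adding the fact `penguin` retracts the conclusion `fly` that the smaller program supports, which is the nonmonotonic behaviour the bisequent representation is meant to capture.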
5.5 Modal Nonmonotonic Logics

A general representation of modal nonmonotonic reasoning can be given in the framework of modal biconsequence relations. The role of the modal operator L in this setting consists in reflecting assumptions and beliefs as propositions in the main context.

DEFINITION 23. A supraclassical biconsequence relation in a modal language will be called modal if it satisfies the following postulates:

Positive Reflection  A : ⊩ LA :
Negative Reflection  : LA ⊩ : A
Negative Introspection  : A ⊩ ¬LA :
Any theory of a modal biconsequence relation is a modal stable set in the sense of [Moore, 1985], and hence extensions and expansions of modal biconsequence relations will always be stable theories. For a modal logic M, a modal biconsequence relation will be called an M-biconsequence relation if ⊩ A: holds for every modal axiom A of M. A possible worlds semantics for K-biconsequence relations is obtained in terms of Kripke models having a last cluster, namely models of the form M = (W, R, F, V), where (W, R, V) is an ordinary Kripke model, while F ⊆ W is a non-empty last cluster of the model (see [Segerberg, 1971]). We will call such models final Kripke models. The relevance of such semantics for modal nonmonotonic reasoning has been shown in [Schwarz, 1992a].

For a final Kripke model M = (W, R, F, V), let |M| denote the set of modal propositions that are valid in M, and ‖M‖ the set of propositions valid in the S5-submodel of M generated by F. For a set S of final Kripke models, we define the binary semantics B_S = {(|M|, ‖M‖) | M ∈ S}, and say that a bisequent a:b ⊩ c:d is valid in S if it is valid in B_S.

Biconsequence relations described in the next definition play a crucial role in a modal representation of extension-based nonmonotonic reasoning.

DEFINITION 24. A modal biconsequence relation is an F-biconsequence relation if it is regular and satisfies

F  ⊩ A, LA→B : B.

F-biconsequence relations provide a concise representation for the modal logic S4F obtained from S4 by adding the axiom (A ∧ MLB) → L(MA ∨ B). A semantic characterization of S4F [Segerberg, 1971] is given in terms of final Kripke models (W, R, F, V) such that αRβ iff either β ∈ F or α ∉ F.
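The semantic condition on S4F frames can be checked mechanically on a small example. The Python sketch below builds the accessibility relation αRβ iff β ∈ F or α ∉ F for a hypothetical four-world frame and tests, by enumerating all valuations, that the S4F axiom and the S4 axioms T and 4 hold at every world while the S5 axiom 5 does not. The tuple encoding of formulas and all function names are assumptions made for illustration only.

```python
from itertools import product

def s4f_relation(nonfinal, final):
    """Accessibility from the text: a R b iff b is in F or a is not in F."""
    worlds = list(nonfinal) + list(final)
    return {(a, b) for a in worlds for b in worlds
            if b in final or a not in final}

def holds(f, w, worlds, rel, val):
    """Truth of formula f (a nested tuple) at world w under valuation val."""
    op = f[0]
    if op == "atom":
        return f[1] in val[w]
    if op == "not":
        return not holds(f[1], w, worlds, rel, val)
    if op == "and":
        return holds(f[1], w, worlds, rel, val) and holds(f[2], w, worlds, rel, val)
    if op == "or":
        return holds(f[1], w, worlds, rel, val) or holds(f[2], w, worlds, rel, val)
    if op == "imp":
        return (not holds(f[1], w, worlds, rel, val)) or holds(f[2], w, worlds, rel, val)
    if op == "L":  # true at all accessible worlds
        return all(holds(f[1], v, worlds, rel, val) for v in worlds if (w, v) in rel)
    if op == "M":  # true at some accessible world
        return any(holds(f[1], v, worlds, rel, val) for v in worlds if (w, v) in rel)
    raise ValueError(f"unknown operator: {op}")

def valid_on_frame(f, atoms, worlds, rel):
    """True if f holds at every world under every valuation of the atoms."""
    slots = [(w, a) for w in worlds for a in atoms]
    for bits in product([False, True], repeat=len(slots)):
        val = {w: set() for w in worlds}
        for (w, a), bit in zip(slots, bits):
            if bit:
                val[w].add(a)
        if not all(holds(f, w, worlds, rel, val) for w in worlds):
            return False
    return True

A, B = ("atom", "A"), ("atom", "B")
NONFINAL, FINAL = ["w1", "w2"], ["f1", "f2"]   # hypothetical frame
WORLDS = NONFINAL + FINAL
REL = s4f_relation(NONFINAL, FINAL)

AXIOM_F = ("imp", ("and", A, ("M", ("L", B))), ("L", ("or", ("M", A), B)))
AXIOM_T = ("imp", ("L", A), A)
AXIOM_4 = ("imp", ("L", A), ("L", ("L", A)))
AXIOM_5 = ("imp", ("M", A), ("L", ("M", A)))
```

The failure of axiom 5 on this frame comes from the non-final worlds: a world outside F can see a witness for A that the worlds of the last cluster cannot.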
Any bisequent a:b ⊩ c:d of an F-biconsequence relation is already reducible to a modal formula ⋀(La ∪ L¬Lb) → ⋁(Lc ∪ L¬Ld). Consequently, any bisequent theory ∆ in such a logic is reducible to an ordinary modal theory that we will denote by ∆̃.

Modal nonmonotonic semantics. By varying the underlying modal logic, we obtain a whole range of modal nonmonotonic semantics.

DEFINITION 25. A set of propositions is an M-extension (M-expansion) of a bisequent theory ∆ if it is an extension (resp. expansion) of the least M-biconsequence relation containing ∆.

If ∆ is a plain modal theory, then M-extensions of ∆ coincide with M-expansions in the sense of [Marek et al., 1993].⁸ Recall, however, that modal extensions and expansions are modal stable theories, and hence they are determined by their objective subsets. This opens a possibility of reducing modal nonmonotonic reasoning to a non-modal one, and vice versa. For any set u of propositions, let u^o denote the set of all non-modal propositions in u. Similarly, if ⊩ is a modal biconsequence relation, ⊩^o will denote its restriction to the non-modal sub-language. Then we have

THEOREM 26. If ⊩ is a modal biconsequence relation, and u a stable theory, then u is an extension of ⊩ if and only if u^o is an extension of ⊩^o.

According to the above result, the objective biconsequence relation ⊩^o embodies all the information about the modal nonmonotonic semantics of ⊩. In other words, the net effect of modal reasoning in biconsequence relations can be measured by the set of derived objective bisequents. Consequently, non-modal supraclassical biconsequence relations turn out to be sufficiently expressive to capture modal nonmonotonic reasoning.

In the other direction, in modal F-biconsequence relations any bisequent theory ∆ is reducible to a usual modal theory ∆̃. This allows us to use ordinary modal logical formalisms for representing non-modal nonmonotonic reasoning.
Thus, the following result generalizes the corresponding result of [Truszczyński, 1991] about a modal embedding of default theories.

THEOREM 27. If ∆ is an objective bisequent theory, then classical extensions of ∆ are precisely objective parts of S4F-extensions of ∆̃.

We end with considering modal expansions. Two kinds of expansions are important for a general description. The first is stable expansions of Moore’s autoepistemic logic. They coincide with M-expansions for any modal logic M in the range 5 ⊆ M ⊆ KD45. The second kind of expansions is reflexive expansions of Schwarz’ reflexive autoepistemic logic [Schwarz, 1992b]. They coincide with M-expansions for any modal logic in the range KT ⊆ M ⊆ SW5. Both kinds of expansions can be computed either by transforming a bisequent theory into a modal theory and finding its modal extensions, or by reducing it to an objective bisequent theory and computing its classical expansions. Finally, ‘normal’ expansions in general can be viewed as a combination of these two kinds of expansions:

THEOREM 28. A set of propositions is a K-expansion of a modal bisequent theory ∆ if and only if it is both a stable and reflexive expansion of ∆.

⁸ This creates, of course, an unfortunate terminological discrepancy, which is compensated, however, by the conformity of our terminology with the rest of this paper.

6 CONCLUSIONS
The following passage from [Reiter, 1987a] remains surprisingly relevant today, almost twenty years later:

Nonmonotonicity appears to be the rule, rather than the exception, in much of what passes for human commonsense reasoning. The formal study of such reasoning patterns and their applications has made impressive, and rapidly accelerating progress. Nevertheless, much remains to be done. ... [M]any more non-toy examples need to be thoroughly explored in order for us to gain a deeper understanding of the essential nature of nonmonotonic reasoning. The ultimate quest, of course, is to discover a single theory embracing all the seemingly disparate settings in AI where nonmonotonic reasoning arises. Undoubtedly, there will be surprises en route, but AI will profit from the journey, in the process becoming much more the science we all wish it to be.

Despite clear success, twenty-five years of nonmonotonic reasoning research have shown that we need a deep breath and long-term objectives in order to make nonmonotonic reasoning a viable tool for the challenges posed by AI. There is still much to be done in order to meet the actual complexity of reasoning tasks required by the latter. In particular, the quest for a unified theory of nonmonotonic reasoning has still not been accomplished. The relation between the two principal paradigms of nonmonotonic reasoning, the preferential and the explanatory, has emerged as the main theoretical problem for the future development of the field. By an inspiring analogy, in nonmonotonic reasoning we have both a global relativity theory of preferential reasoning and a local quantum mechanics of explanatory reasoning. So, what we need is a unified theory of nonmonotonic reality. As in physics, however, this unified theory is not going to emerge as a straightforward juxtaposition of these components.
BIBLIOGRAPHY
[Adams, 1975] E. W. Adams. The Logic of Conditionals. Reidel, Dordrecht, 1975.
[Alchourrón et al., 1985] C. Alchourrón, P. Gärdenfors, and D. Makinson. On the logic of theory change: Partial meet contraction and revision functions. Journal of Symbolic Logic, 50:510–530, 1985.
[Alferes and Pereira, 1992] J. J. Alferes and L. M. Pereira. On logic program semantics with two kinds of negation. In K. R. Apt, editor, Proc. Joint Int. Conf. and Symp. on Logic Programming, pages 574–589, Cambridge, Mass., 1992. MIT Press.
[Amati et al., 1994] G. Amati, L. Carlucci Aiello, and F. Pirri. Defaults as restrictions on classical Hilbert-style proofs. Journal of Logic, Language and Information, 3:303–326, 1994.
[Amati et al., 1997] G. Amati, L. Carlucci Aiello, and F. Pirri. Definability and commonsense reasoning. Artificial Intelligence, 93:169–199, 1997.
[Andreka et al., 2002] H. Andreka, M. Ryan, and P.-Y. Schobbens. Operators and laws for combining preference relations. Journal of Logic and Computation, 12:13–53, 2002.
[Antonelli, 1999] G. A. Antonelli. A directly cautious theory of defeasible consequence for default logic via the notion of general extension. Artificial Intelligence, 109:71–109, 1999.
[Antoniou and Wang, 2006] G. Antoniou and K. Wang. Default logic, 2006. This volume.
[Antoniou, 1997] G. Antoniou. Nonmonotonic Reasoning. MIT Press, Cambridge, Mass., 1997.
[Apt et al., 1988] K. Apt, H. Blair, and A. Walker. Towards a theory of declarative knowledge. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming, pages 89–148. Morgan Kaufmann, San Mateo, CA, 1988.
[Babovich et al., 2000] Y. Babovich, E. Erdem, and V. Lifschitz. Fages’ theorem and answer set programming. In Proceedings of the Nonmonotonic Reasoning Workshop, NMR-2000, 2000.
[Baker, 1989] A. B. Baker. A simple solution to the Yale shooting problem. In R. J. Brachman, H. J. Levesque, and R. Reiter, editors, KR’89: Principles of Knowledge Representation and Reasoning, pages 11–20. Morgan Kaufmann, San Mateo, California, 1989.
[Baral and Subrahmanian, 1991] C. Baral and V. S. Subrahmanian. Dualities between alternative semantics for logic programming and nonmonotonic reasoning (extended abstract). In Proc. 1st Int. Workshop on Logic Programming and Nonmonotonic Reasoning, pages 69–86, 1991.
[Baral, 2003] C. Baral. Knowledge Representation, Reasoning and Declarative Problem Solving. Cambridge UP, 2003.
[Belnap, 1977] N. D. Belnap, Jr. A useful four-valued logic. In M. Dunn and G. Epstein, editors, Modern Uses of Multiple-Valued Logic, pages 8–41. D. Reidel, 1977.
[Benferhat et al., 1993] S. Benferhat, C. Cayrol, D. Dubois, J. Lang, and H. Prade. Inconsistency management and prioritized syntax-based entailment. In R. Bajcsy, editor, Proceedings Int. Joint Conf. on Artificial Intelligence, IJCAI’93, pages 640–645, Chambery, France, 1993. Morgan Kaufmann.
[Besnard, 1989] P. Besnard. An Introduction to Default Logic. Springer, 1989.
[Blamey, 1986] S. Blamey. Partial logic. In D. M. Gabbay and F. Guenthner, editors, Handbook of Philosophical Logic, Vol. III, pages 1–70. D. Reidel, 1986.
[Bochman, 1994] A. Bochman. On the relation between default and modal consequence relations. In Proc. Int. Conf. on Principles of Knowledge Representation and Reasoning, pages 63–74, 1994.
[Bochman, 1995a] A. Bochman. Default consequence relations as a logical framework for logic programs. In Proc. Int. Conf. on Logic Programming and Nonmonotonic Reasoning, pages 245–258. Springer, 1995.
[Bochman, 1995b] A. Bochman. Modal nonmonotonic logics demodalized. Annals of Mathematics and Artificial Intelligence, 15:101–123, 1995.
[Bochman, 1995c] A. Bochman. On bimodal nonmonotonic logics and their unimodal and nonmodal equivalents. In Proc. IJCAI’95, pages 1518–1524, 1995.
[Bochman, 1996a] A. Bochman. On a logical basis of general logic programs. In Proc. Workshop on Nonmonotonic Extensions of Logic Programming, Lecture Notes in AI, 1996. Springer Verlag.
[Bochman, 1996b] A. Bochman. On a logical basis of normal logic programs. Fundamenta Informaticae, 28:223–245, 1996.
[Bochman, 1998a] A. Bochman. A logical foundation for logic programming I: Biconsequence relations and nonmonotonic completion. Journal of Logic Programming, 35:151–170, 1998.
[Bochman, 1998b] A. Bochman. A logical foundation for logic programming II: Semantics of general logic programs. Journal of Logic Programming, 35:171–194, 1998.
[Bochman, 1998c] A. Bochman. On the relation between default and modal nonmonotonic reasoning. Artificial Intelligence, 101:1–34, 1998.
[Bochman, 2001] A. Bochman. A Logical Theory of Nonmonotonic Inference and Belief Change. Springer, 2001.
[Bochman, 2003a] A. Bochman. Brave nonmonotonic inference and its kinds. Annals of Mathematics and Artificial Intelligence, 39:101–121, 2003.
[Bochman, 2003b] A. Bochman. Collective argumentation and disjunctive logic programming. Journal of Logic and Computation, 9:55–56, 2003.
[Bochman, 2003c] A. Bochman. A logic for causal reasoning. In Proceedings IJCAI’03, Acapulco, 2003. Morgan Kaufmann.
[Bochman, 2004a] A. Bochman. A causal approach to nonmonotonic reasoning. Artificial Intelligence, 160:105–143, 2004.
[Bochman, 2004b] A. Bochman. A causal logic of logic programming. In D. Dubois, C. Welty, and M.-A. Williams, editors, Proc. Ninth Conference on Principles of Knowledge Representation and Reasoning, KR’04, pages 427–437, Whistler, 2004.
[Bochman, 2005] A. Bochman. Explanatory Nonmonotonic Reasoning. World Scientific, 2005.
[Bonatti and Olivetti, 2002] P. A. Bonatti and N. Olivetti. Sequent calculi for propositional nonmonotonic logics. ACM Trans. Comput. Log., 3:226–278, 2002.
[Bondarenko et al., 1997] A. Bondarenko, P. M. Dung, R. A. Kowalski, and F. Toni. An abstract, argumentation-theoretic framework for default reasoning. Artificial Intelligence, 93:63–101, 1997.
[Bossu and Siegel, 1985] G. Bossu and P. Siegel. Saturation, nonmonotonic reasoning and the closed-world assumption. Artificial Intelligence, 25:13–63, 1985.
[Brewka and Gottlob, 1997] G. Brewka and G. Gottlob. Well-founded semantics for default logic. Fundamenta Informaticae, 31:221–236, 1997.
[Brewka and Konolige, 1993] G. Brewka and K. Konolige. An abductive framework for general logic programs and other nonmonotonic systems. In Proc. IJCAI, pages 9–17, 1993.
[Brewka et al., 1997] G. Brewka, J. Dix, and K. Konolige. Nonmonotonic Reasoning: An Overview. CSLI Publications, Stanford, 1997.
[Brewka, 1991] G. Brewka. Cumulative default logic: In defense of nonmonotonic inference rules. Artificial Intelligence, 50:183–205, 1991.
[Brewka, 1992] G. Brewka. A framework for cumulative default logics. Technical Report TR92-042, ICSI, Berkeley, CA, 1992.
[Cabalar, 2001] P. Cabalar. Well-founded semantics as two-dimensional here-and-there. In Proceedings of ASP 01, 2001 AAAI Spring Symposium Series, pages 15–20, Stanford, 2001.
[Chen, 1994] J. Chen. Relating only knowing to minimal belief and negation as failure. Journal of Experimental and Theoretical Artificial Intelligence, 6:409–429, 1994.
[Chesñevar et al., 2000] C. I. Chesñevar, A. G. Maguitman, and R. P. Loui. Logical models of argument. ACM Computing Surveys, 32:337–383, 2000.
[Clark, 1978] K. Clark. Negation as failure. In H. Gallaire and J. Minker, editors, Logic and Data Bases, pages 293–322. Plenum Press, 1978.
[Colmerauer et al., 1973] A. Colmerauer, H. Kanoui, R. Pasero, and P. Roussel. Un système de communication homme-machine. Technical report, Groupe d’Intelligence Artificielle, Université d’Aix-Marseille II, Marseille, 1973.
[Console et al., 1991] L. Console, D. Theseider Dupre, and P. Torasso. On the relationship between abduction and deduction. Journal of Logic and Computation, 1:661–690, 1991.
[Cox and Pietrzykowski, 1987] P. T. Cox and T. Pietrzykowski. General diagnosis by abductive inference. In Proc. IEEE Symposium on Logic Programming, pages 183–189, 1987.
[Darwiche and Pearl, 1994] A. Darwiche and J. Pearl. Symbolic causal networks. In Proceedings AAAI’94, pages 238–244, 1994.
[de Kleer et al., 1992] J. de Kleer, A. K. Mackworth, and R. Reiter. Characterizing diagnoses and systems. Artificial Intelligence, 52:197–222, 1992.
[de Kleer, 1986] J. de Kleer. An assumption-based TMS. Artificial Intelligence, 28:127–162, 1986.
[Delgrande, 1987] J. P. Delgrande. A first-order conditional logic for prototypical properties. Artificial Intelligence, 33:105–130, 1987.
[Delgrande, 1988] J. Delgrande. An approach to default reasoning based on a first-order conditional logic: revised report. Artificial Intelligence, 36:63–90, 1988.
[Denecker and Schreye, 1992] M. Denecker and D. De Schreye. SLDNFA: an abductive procedure for normal abductive programs. In Proc. Joint Int. Conf. and Symp. on Logic Programming, pages 686–702, 1992.
[Denecker et al., 2001] M. Denecker, M. Bruynooghe, and V. W. Marek. Logic programming revisited: Logic programs as inductive definitions. ACM Trans. Comput. Log., 2:623–654, 2001.
[Denecker et al., 2003] M. Denecker, V. W. Marek, and M. Truszczyński. Uniform semantic treatment of default and autoepistemic logics. Artificial Intelligence, 143:79–122, 2003.
[Dix, 1991] J. Dix. Classifying semantics of logic programs. In A. Nerode, W. Marek, and V. S. Subrahmanian, editors, Proceedings 1st International Workshop on Logic Programming and Nonmonotonic Reasoning, pages 166–180, Cambridge, Mass., 1991. MIT Press.
[Dix, 1992] J. Dix. Classifying semantics of disjunctive logic programs. In K. Apt, editor, Proc. Joint Int. Conf. and Symp. on Logic Programming, ICSLP’92, pages 798–812. MIT Press, 1992.
[Doherty et al., 1995] P. Doherty, W. Lukaszewicz, and A. Szalas. Computing circumscription revisited: Preliminary report. In Proceedings Int. Joint Conf. on Artificial Intelligence, pages 1502–1508, 1995.
[Doyle, 1979] J. Doyle. A truth maintenance system. Artificial Intelligence, 12:231–272, 1979.
[Doyle, 1994] J. Doyle. Reasoned assumptions and rational psychology. Fundamenta Informaticae, 20:35–73, 1994.
[Dung and Son, 2001] P. M. Dung and T. C. Son. An argument-based approach to reasoning with specificity. Artificial Intelligence, 133:35–85, 2001.
[Dung et al., 2006] P. M. Dung, R. A. Kowalski, and F. Toni. Dialectic proof procedures for assumption-based, admissible argumentation. Artificial Intelligence, 170:114–159, 2006.
[Dung, 1992] P. M. Dung. Acyclic disjunctive programs with abductive procedure as proof procedure. In Proc. Int. Conference on Fifth Generation Computer Systems, pages 555–561. ICOT, 1992.
[Dung, 1995a] P. M. Dung. An argumentation-theoretic foundation for logic programming. J. of Logic Programming, 22:151–177, 1995.
[Dung, 1995b] P. M. Dung. On the acceptability of arguments and its fundamental role in non-monotonic reasoning, logic programming and n-persons games. Artificial Intelligence, 76:321–358, 1995.
[Dunn, 1976] J. M. Dunn. Intuitive semantics for first-degree entailment and coupled trees. Philosophical Studies, 29:149–168, 1976.
[Eiter and Lukasiewicz, 2004] T. Eiter and T. Lukasiewicz. Complexity results for explanations in the structural-model approach. Artificial Intelligence, 154(1-2):145–198, April 2004.
[Elkan, 1990] C. Elkan. A rational reconstruction of nonmonotonic truth maintenance systems. Artificial Intelligence, 43:219–234, 1990.
[Erdem and Lifschitz, 2003] E. Erdem and V. Lifschitz. Tight logic programs. Theory and Practice of Logic Programming, 3:499–518, 2003.
[Etherington and Reiter, 1983] D. Etherington and R. Reiter. On inheritance hierarchies with exceptions. In Proceedings of AAAI-83, pages 104–108, 1983.
[Etherington et al., 1985] D. Etherington, R. Mercer, and R. Reiter. On the adequacy of predicate circumscription for closed world reasoning. Computational Intelligence, 1:11–15, 1985.
[Etherington et al., 1991] D. Etherington, S. Kraus, and D. Perlis. Nonmonotonicity and the scope of reasoning. Artificial Intelligence, 52:221–261, 1991.
[Etherington, 1987] D. Etherington. Reasoning with Incomplete Information. Research Notes in AI. Pitman, London, 1987.
[Fages, 1994] F. Fages. Consistency of Clark’s completion and existence of stable models. Journal of Methods of Logic in Computer Science, 1:51–60, 1994.
[Finger, 1987] J. J. Finger. Exploiting Constraints in Design Synthesis. Ph.D. dissertation, Department of Computer Science, Stanford University, Stanford, California, 1987.
[Fitting et al., 1992] M. C. Fitting, W. Marek, and M. Truszczyński. The pure logic of necessitation. Journal of Logic and Computation, 2:349–373, 1992.
[Fitting, 1991] M. C. Fitting. Bilattices and the semantics of logic programming. Journal of Logic Programming, 11:91–116, 1991.
[Gabbay et al., 1994] D. Gabbay, C. J. Hogger, and J. A. Robinson, editors. Handbook of Logic in Artificial Intelligence and Logic Programming. Volume 3: Nonmonotonic Reasoning and Uncertain Reasoning. Oxford UP, Oxford, 1994.
[Gabbay, 1985] D. M. Gabbay. Theoretical foundations for non-monotonic reasoning in expert systems. In K. R. Apt, editor, Logics and Models of Concurrent Systems. Springer, 1985.
[Gabbay, 1998] D. M. Gabbay. Fibring Logics. Oxford University Press, 1998.
[Gärdenfors and Makinson, 1994] P. Gärdenfors and D. Makinson. Nonmonotonic inference based on expectations. Artificial Intelligence, 65:197–245, 1994.
[Geffner, 1992] H. Geffner. Default Reasoning. Causal and Conditional Theories. MIT Press, 1992.
[Gelfond and Lifschitz, 1988] M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In R. Kowalski and K. Bowen, editors, Proc. 5th International Conf./Symp. on Logic Programming, pages 1070–1080, Cambridge, MA, 1988. MIT Press.
[Gelfond and Lifschitz, 1991] M. Gelfond and V. Lifschitz. Classical negation and disjunctive databases. New Generation Computing, 9:365–385, 1991.
[Gelfond et al., 1991] M. Gelfond, V. Lifschitz, H. Przymusińska, and M. Truszczyński. Disjunctive defaults. In Proc. Second Int. Conf. on Principles of Knowledge Representation and Reasoning, KR’91, pages 230–237, Cambridge, Mass., 1991.
[Gelfond, 1994] M. Gelfond. Logic programming and reasoning with incomplete information. Annals of Mathematics and Artificial Intelligence, 12:89–116, 1994.
[Ginsberg, 1987] M. L. Ginsberg, editor. Readings in Nonmonotonic Reasoning. Morgan Kaufmann, Los Altos, Ca., 1987.
[Ginsberg, 1989] M. L. Ginsberg. A circumscriptive theorem prover. Artif. Intell., 39:209–230, 1989.
[Ginsberg, 1993] M. Ginsberg. Essentials of Artificial Intelligence. Morgan Kaufmann, 1993.
[Giunchiglia et al., 2004] E. Giunchiglia, J. Lee, V. Lifschitz, N. McCain, and H. Turner. Nonmonotonic causal theories. Artificial Intelligence, 153:49–104, 2004.
[Goldszmidt et al., 1993] M. Goldszmidt, P. Morris, and J. Pearl. A maximum entropy approach to nonmonotonic reasoning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:220–232, 1993.
[Gottlob, 1994] G. Gottlob. The power of beliefs or translating default logic into standard autoepistemic logic. In G. Lakemeyer and B. Nebel, editors, Foundations of Knowledge Representation and Reasoning, volume 810 of LNAI, pages 133–144. Springer, 1994.
[Grosof, 1991] B. N. Grosof. Generalising prioritization. In J. Allen, R. Fikes, and E. Sandewall, editors, Proc. Second International Conference on Principles of Knowledge Representation and Reasoning (KR’91), pages 289–300. Morgan Kaufmann, 1991.
[Halpern and Moses, 1985] Joseph Y. Halpern and Yoram Moses. Towards a theory of knowledge and ignorance. In Krzysztof R. Apt, editor, Logics and Models of Concurrent Systems, pages 459–476. Springer-Verlag, Berlin, 1985.
[Halpern and Pearl, 2001a] J. Y. Halpern and J. Pearl. Causes and explanations: A structural-model approach. Part I: Causes. In Proc. Seventh Conf. On Uncertainty in Artificial Intelligence (UAI’01), pages 194–202, San Francisco, CA, 2001. Morgan Kaufmann.
[Halpern and Pearl, 2001b] J. Y. Halpern and J. Pearl. Causes and explanations: A structural-model approach. Part II: Explanations. In Proceedings Int. Joint Conf. on Artificial Intelligence, IJCAI-01. Morgan Kaufmann, 2001.
[Halpern, 1997] J. Y. Halpern. A critical reexamination of default logic, autoepistemic logic, and only knowing. Computational Intelligence, 13:144–163, 1997.
[Halpern, 1998] J. Y. Halpern. Axiomatizing causal reasoning. In G. F. Cooper and S. Moral, editors, Uncertainty in Artificial Intelligence, pages 202–210. Morgan Kaufmann, 1998.
[Hanks and McDermott, 1987] S. Hanks and D. McDermott. Non-monotonic logics and temporal projection. Artificial Intelligence, 33:379–412, 1987.
[Hewitt, 1969] C. E. Hewitt. PLANNER: A language for proving theorems in robots. In First International Joint Conference on Artificial Intelligence, pages 295–301, 1969.
[Horty, 1994] J. F. Horty. Some direct theories of nonmonotonic inheritance. In D. M. Gabbay, C. J. Hogger, and J. A. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming 3: Nonmonotonic Reasoning and Uncertain Reasoning. Oxford University Press, Oxford, 1994.
Nonmonotonic Reasoning
627
[Imielinski, 1987] T. Imielinski. Results on translating defaults to circumscription. Artificial Intelligence, 32:131–146, 1987. [Inoue and Sakama, 1994] K. Inoue and C. Sakama. On positive occurrences of negation as failure. In Proc. 4th Int. Conf. on Principles of Knowledge Representation and Reasoning, KR’94, pages 293–304. Morgan Kauffman, 1994. [Inoue and Sakama, 1998] K. Inoue and C. Sakama. Negation as failure in the head. Journal of Logic Programming, 35:39–78, 1998. [Israel, 1980] D. J. Israel. What’s wrong with non-monotonic logic. In Proceedings of the First Annual Conference of the American Association for Artificial Intelligence, Stanford University, 1980. [Iwasaki and Simon, 1986] Yumi Iwasaki and Herbert Simon. Causality in device behavior. Artificial Intelligence, 29(1):3–32, 1986. [Jakobovits and Vermeir, 1999] H. Jakobovits and D. Vermeir. Robust semantics for argumentation frameworks. Journal of Logic and Computation, 9:215–261, 1999. [Janhunen, 1996] T. Janhunen. Representing autoepistemic introspection in terms of default rules. In Proc. European Conf. on AI, ECAI-96, pages 70–74, Budapest, 1996. John Wiley & Sons. [Kakas and Mancarella, 1990] A. C. Kakas and P. Mancarella. Generalized stable models: A semantics for abduction. In Proc. European Conf. on Artificial Intelligence, ECAI-90, pages 385–391, Stockholm, 1990. [Kakas and Toni, 1999] A. C. Kakas and F. Toni. Computing argumentation in logic programming. Journal of Logic and Computation, 9:515–562, 1999. [Kakas et al., 1992] A. C. Kakas, R. A. Kowalski, and F. Toni. Abductive logic programming. Journal of Logic and Computation, 2:719–770, 1992. [Konolige and Myers, 1989] K. Konolige and K. L. Myers. Representing defaults with epistemic concepts. Computational Intelligence, 5:32–44, 1989. [Konolige, 1988] K. Konolige. On the relation between default and autoepistemic logic. Artificial Intelligence, 35:343–382, 1988. [Konolige, 1989] K. Konolige. 
On the relation between autoepistemic logic and circumscription. In Proc. IJCAI-89, pages 1213–1218, 1989. [Konolige, 1992] K. Konolige. Abduction versus closure in causal theories. Artificial Intelligence, 53:255–272, 1992. [Konolige, 1994] K. Konolige. Using default and causal reasoning in diagnosis. Annals of Mathematics and Artificial Intelligence, 11:97–135, 1994. [Kowalski and Toni, 1996] R. A. Kowalski and F. Toni. Abstract argumentation. Artificial Intelligence and Law, 4:275–296, 1996. [Kratzer, 1981] A. Kratzer. Partition and revision: The semantics of counterfactuals. Journal of Philosophical Logic, 10:201–216, 1981. [Kraus et al., 1990] S. Kraus, D. Lehmann, and M. Magidor. Nonmonotonic reasoning, preferential models and cumulative logics. Artificial Intelligence, 44:167–207, 1990. [Lee and Lifschitz, 2003] J. Lee and V. Lifschitz. Loop formulas for disjunctive logic programs. In Proc. Nineteenth Int. Conference on Logic Programming, pages 451–465, 2003. [Lee and Lin, 2004] J. Lee and F. Lin. Loop formulas for circumscription. In Proc AAAI-04, pages 281–286, San Jose, CA, 2004. [Lehmann and Magidor, 1992] D. Lehmann and M. Magidor. What does a conditional knowledge base entail? Artificial Intelligence, 55:1–60, 1992. [Lehmann, 1989] D. Lehmann. What does a conditional knowledge base entail? In R. Brachman and H. J. Levesque, editors, Proc. 2nd Int. Conf. on Principles of Knowledge Representation and Reasoning, KR’89, pages 212–222. Morgan Kaufmann, 1989. [Lehmann, 1995] D. Lehmann. Another perspective on default reasoning. Annals of Mathematics and Artificial Intelligence, 15:61–82, 1995. [Levesque and Lakemeyer, 2000] H. Levesque and G. Lakemeyer. The Logic of Knowledge Bases. The MIT Press, 2000. [Levesque, 1989] H. J. Levesque. A knowledge-level account of abduction. In Proc. IJCAI, pages 1061–1067, 1989. [Levesque, 1990] H. J. Levesque. All I know: A study in autoepistemic logic. Artificial Intelligence, 42:263309, 1990. [Lewis, 1973] D. Lewis. 
Counterfactuals. Harvard University Press, Cambridge, Mass., 1973.
628
Alexander Bochman
[Lewis, 1981] D. Lewis. Ordering semantics and premise semantics for counterfactuals. Journal of Philosophical Logic, 10:217–234, 1981. [Lifschitz and Schwarz, 1993] V. Lifschitz and G. Schwarz. Extended logic programs as autoepistemic theories. In L. M. Pereira and A. Nerode, editors, Proc. Second Int. Workshop on Logic Programming and Nonmonotonic Reasoning, pages 101–114. MIT Press, 1993. [Lifschitz and Woo, 1992] V. Lifschitz and T. Woo. Answer sets in general nonmonotonic reasoning (preliminary report). In Proc. Third Int. Conf. on Principles of Knowledge Representation and Reasoning, KR‘92, pages 603–614. Morgan Kauffman, 1992. [Lifschitz et al., 1999] V. Lifschitz, L. R. Tang, and H. Turner. Nested expressions in logic programs. Annals of Mathematics and Artificial Intelligence, 25:369–389, 1999. [Lifschitz et al., 2001] V. Lifschitz, D. Pearce, and A. Valverde. Strongly equivalent logic programs. ACM Transactions on Computational Logic, 2:526–541, 2001. [Lifschitz, 1985] V. Lifschitz. Computing circumscription. In Proc. 9th Int. Joint Conf. on Artificial Intelligence, IJCAI-85, pages 121–127. Morgan Kaufmann, 1985. [Lifschitz, 1987a] V. Lifschitz. Pointwise circumscription. In M. Ginsberg, editor, Readings in Non-Monotonic Reasoning, pages 179–193. Morgan Kaufmann, San Mateo, CA, 1987. [Lifschitz, 1987b] Vladimir Lifschitz. Formal theories of action: Preliminary report. In John McDermott, editor, Proceedings of the Tenth International Joint Conference on Artificial Intelligence, Los Altos, California, 1987. Morgan Kaufmann. [Lifschitz, 1991a] V. Lifschitz, editor. Artificial Intelligence and Mathematical Theory of Computation, Papers in Honor of John McCarthy. Academic Press, 1991. [Lifschitz, 1991b] V. Lifschitz. Nonmonotonic databases and epistemic queries. In Proceedings Int. Joint Conf. on Artificial Intelligence, IJCAI-91, pages 381–386. Morgan Kaufmann, 1991. [Lifschitz, 1994a] V. Lifschitz. Minimal belief and negation as failure. 
Artificial Intelligence, 70:53–72, 1994. [Lifschitz, 1994b] Vladimir Lifschitz. Circumscription. In Dov Gabbay, C. J. Hogger, and J. A. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, Volume 3: Nonmonotonic Reasoning and Uncertain Reasoning, pages 298–352. Oxford University Press, 1994. [Lifschitz, 1997] V. Lifschitz. On the logic of causal explanation. Artificial Intelligence, 96:451– 465, 1997. [Lin and Shoham, 1989] F. Lin and Y. Shoham. Argument systems: A uniform basis for nonmonotonic reasoning. In Proceedings of 1st Intl. Conference on Principles of Knowledge Representation and Reasoning, pages 245–255, Stanford, CA, 1989. [Lin and Shoham, 1992] F. Lin and Y. Shoham. A logic of knowledge and justified assumptions. Artificial Intelligence, 57:271–289, 1992. [Lin and You, 2002] F. Lin and J.-H. You. Abduction in logic programming: A new definition and an abductive procedure based on rewriting. Artificial Intelligence, 140:175–205, 2002. [Lin and Zhao, 2002] F. Lin and Y. Zhao. ASSAT: Computing answer sets of a logic program by SAT solvers. In Proceedings AAAI-02, 2002. [Lin, 1995] F. Lin. Embracing causality in specifying the inderect effect of actions. In Proc. Int. Joint Conf. on Artificial Intelligence, IJCAI-95, pages 1985–1991, Montreal, 1995. Morgan Kaufmann. [Lin, 1996] F. Lin. Embracing causality in specifying the indeterminate effects of actions. In Proceedings AAAI-96, pages 670–676, 1996. [Lobo and Uzc´ ategui, 1997] J. Lobo and C. Uzc´ ategui. Abductive consequence relations. Artificial Intelligence, 89:149–171, 1997. [Lobo et al., 1992] J. Lobo, J. Minker, and A. Rajasekar. Foundations of Disjunctive Logic Programming. MIT Press, Cambridge, Mass., 1992. [Lukaszewicz, 1990] W. Lukaszewicz. Non-Monotonic Reasoning: Formalization of Commonsense Reasoning. Ellis Horwood, New York, 1990. [Makinson and van der Torre, 2000] D. Makinson and L. van der Torre. Input/Output logics. 
Journal of Philosophical Logic, 29:383–408, 2000. [Makinson, 1989] D. Makinson. General theory of cumulative inference. In M. Reinfrank, editor, Nonmonotonic Reasoning, volume 346 of LNAI, pages 1–18. Springer, 1989.
Nonmonotonic Reasoning
629
[Makinson, 1994] D. Makinson. General patterns in nonmonotonic reasoning. In D. M. Gabbay and Others, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 3, Nonmonotonic and Uncertain Reasoning, volume 2, pages 35–110. Oxford University Press, Oxford, 1994. [Makinson, 2003] D. Makinson. Bridges between classical and nonmonotonic logic. Logic Journal of the IGPL, 11:69–96, 2003. [Makinson, 2005] D. Makinson. Bridges from Classical to Nonmonotonic Logic. King’s College Publications, 2005. [Marek and Subrahmanian, 1992] W. Marek and V. S. Subrahmanian. The relationship between stable, supported, default and autoepistemic semantics for general logic programs. Theoretical Computer Science, 103:365–386, 1992. [Marek and Truszczy´ nski, 1989] W. Marek and M. Truszczy´ nski. Relating autoepistemic and default logics. In Int. Conf. on Principles of Knowledge Representation and Reasoning, KR’89, pages 276–288, San Mateo, Calif., 1989. Morgan Kaufmann. [Marek and Truszczy´ nski, 1990] W. Marek and M. Truszczy´ nski. Modal logic for default reasoning. Annals of Mathematics and Artificial Intelligence, 1:275–302, 1990. [Marek and Truszczy´ nski, 1993] W. Marek and M. Truszczy´ nski. Nonmonotonic Logic, Context-Dependent Reasoning. Springer, 1993. [Marek et al., 1990] W. Marek, A. Nerode, and J. Remmel. A theory of nonmonotonic rule systems. Annals of Mathematics and Artificial Intelligence, 1:241–273, 1990. [Marek et al., 1993] V. W. Marek, G. F. Schwarz, and M. Truszchinski. Modal nonmonotonic logics: ranges, characterization, computation. Journal of ACM, 40:963–990, 1993. [McCain and Turner, 1997] N. McCain and H. Turner. Causal theories of action and change. In Proceedings AAAI-97, pages 460–465, 1997. [McCarthy and Hayes, 1969] J. McCarthy and P. Hayes. Some philosophical problems from the standpoint of artificial intelligence. In B. Meltzer and D. Michie, editors, Machine Intelligence, pages 463–502. Edinburg University Press, Edinburg, 1969. 
[McCarthy, 1959] John McCarthy. Programs with common sense. In Proceedings of the Teddington Conference on the Mechanization of Thought Processes, pages 75–91, London, 1959. Her Majesty’s Stationary Office. [McCarthy, 1980] J. McCarthy. Circumscription — a form of non-monotonic reasoning. Artificial Intelligence, 13:27–39, 1980. [McCarthy, 1986] J. McCarthy. Applications of circumscription to formalizing common sense knowledge. Artificial Intelligence, 13:27–39, 1986. [McDermott and Doyle, 1980] D. McDermott and J. Doyle. Nonmonotonic logic. Artificial Intelligence, 13:41–72, 1980. [McDermott, 1982] D. McDermott. Nonmonotonic logic II: Nonmonotonic modal theories. Journal of the ACM, 29:33–57, 1982. [McDermott, 1987] Drew McDermott. Critique of pure reason. Computational Intelligence, 3(3):149–160, 1987. [Minker, 1982] J. Minker. On indefinite databases and the closed world assumption. In Proceedings of 6th Conference on Automated Deduction, pages 292–308, New York, 1982. [Minker, 1993] Jack Minker. An overview of nonmonotonic reasoning and logic programming. Journal of Logic Programming, 17:95–126, 1993. [Minker, 2000] Jack Minker, editor. Logic-Based Artificial Intelligence. Kluwer Academic Publishers, Dordrecht, 2000. [Minsky, 1974] M. Minsky. A framework for representing knowledge. Tech. Report 306, Artificial Intelligence Laboratory, MIT, 1974. [Moore, 1985] R. C. Moore. Semantical considerations on non-monotonic logic. Artificial Intelligence, 25:75–94, 1985. [Morgenstern and Stein, 1994] L. Morgenstern and L. Stein. Motivated action theory: a formal theory of causal reasoning. Artificial Intelligence, 71(1):1–42, 1994. [Morgenstern, 1996] L. Morgenstern. The problem with solutions to the frame problem. In K. M. Ford and Z. Pylyshyn, editors, The Robot’s Dilemma Revisited: The Frame Problem in Artificial Intelligence, pages 99–133. Ablex Publishing Co., Norwood, New Jersey, 1996. [Morris, 1988] P. H. Morris. 
The anomalous extension problem in default reasoning. Artificial Intelligence, 35:383–399, 1988. [Nayak, 1994] P. P. Nayak. Causal approximations. Artificial Intelligence, 70:277–334, 1994.
630
Alexander Bochman
[Niemel¨ a, 1992] I. Niemel¨ a. A unifying framework for nonmonotonic reasoning. In Proc. 10th European Conference on Artificial Intelligence, ECAI-92, pages 334–338, Vienna, 1992. John Wiley. [Oikarinen and Janhunen, 2005] E. Oikarinen and T. Janhunen. circ2dlp — translating circumscription into disjunctive logic programming. In C. Baral, G. Greco, N. Leone, and G. Terracina, editors, Logic Programming and Nonmonotonic Reasoning, 8th International Conference, LPNMR 2005, Diamante, Italy, September 5-8, 2005, Proceedings, volume 3662 of Lecture Notes in Computer Science, pages 405–409. Springer, 2005. [Pearce, 1997] D. Pearce. A new logical characterization of stable models and answer sets. In J. Dix, L. M. Pereira, and T. Przymusinski, editors, Non-Monotonic Extensions of Logic Programming, volume 1216 of LNAI, pages 57–70. Springer, 1997. [Pearl, 1990] J. Pearl. System Z: A natural ordering of defaults with tractable applications to default reasoning. In Proceedings of the Third Conference on Theoretical Aspects of Reasoning About Knowledge (TARK’90), pages 121–135, San Mateo, CA, 1990. Morgan Kaufmann. [Pearl, 2000] J. Pearl. Causality. Cambridge UP, 2000. [Perlis, 1986] D. Perlis. On the consistency of commonsense reasoning. Computational Intelligence, 2:180–190, 1986. [Poole, 1985] D. Poole. On the comparison of theories: Preferring the most specific explanation. In Proceedings Ninth International Joint Conference on Artificial Intelligence, pages 144–147, Los Angeles, 1985. [Poole, 1988a] D. Poole. A logical framework for default reasoning. Artificial Intelligence, 36:27–47, 1988. [Poole, 1988b] D. Poole. Representing knowledge for logic-based diagnosis. In Proc. Int. Conf. on Fifth Generation Computer Systems, pages 1282–1290, Tokyo, 1988. [Poole, 1989a] D. Poole. Explanation and prediction: An architecture for default and abductive reasoning. Computational Intelligence, 5:97–110, 1989. [Poole, 1989b] D. Poole. 
What the lottery paradox tells us about default reasoning. In R. J. Brachman, H. J. Levesque, and R. Reiter, editors, Proceedings of the First Int. Conf. on Principles of Knowledge Representation and Reasoning, pages 333–340. Morgan Kaufmann, 1989. [Poole, 1994a] D. Poole. Default logic. In D. M. Gabbay, C. J. Hogger, and J. A. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, volume 3, pages 189–215. Oxford University Press, 1994. [Poole, 1994b] D. Poole. Representing diagnosis knowledge. Annals of Mathematics and Artificial Intelligence, 11:33–50, 1994. [Przymusinska and Przymusinski, 1994] H. Przymusinska and T. Przymusinski. Stationary default extensions. Fundamenta Informaticae, 21:67–87, 1994. (In print). [Przymusinski, 1988] T. C. Przymusinski. On the declarative semantics of stratified deductive databases and logic programs. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming, pages 193–216. Morgan Kauffman, 1988. [Przymusinski, 1990] T. C. Przymusinski. The well-founded semantics coincides with the threevalued stable semantics. Fundamenta Informaticae, 13:445–464, 1990. [Przymusinski, 1991a] T. C. Przymusinski. Semantics of disjunctive logic programs and deductive databases. In Proc. Second Int. Conf. On Deductive and Object-Oriented Databases, pages 85–107. Springer, 1991. [Przymusinski, 1991b] T. C. Przymusinski. Three-valued nonmonotonic formalisms and semantics of logic programs. Artificial Intelligence, 49:309–343, 1991. [Przymusinski, 1994] T. C. Przymusinski. A knowledge representation framework based on autoepistemic logic of minimal beliefs. In Proceedings AAAI-94, 1994. [Ramsey, 1978] F. P. Ramsey. Foundations. Routledge & Kegan Paul, London, 1978. [Reggia et al., 1985] J. A. Reggia, D. S. Nau, and P. Y. Wang. A formal model of diagnostic inference. i. problem formulation and decomposition. Inf. Sci., 37:227–256, 1985. [Reinfrank et al., 1989] M. Reinfrank, O. Dressler, and G. 
Brewka. On the relation between truth maintenance and autoepistemic logic. In Proc. Int. Joint Conf. on Artificial Intelligence, pages 1206–1212, 1989. [Reiter and Criscuolo, 1981] R. Reiter and G. Criscuolo. On interacting defaults. In Proceedings of IJCAI81, pages 270–276, 1981.
Nonmonotonic Reasoning
631
[Reiter and de Kleer, 1987] R. Reiter and J. de Kleer. Formal foundations for assumptionbased truth maintenance systems: preliminary report. In Proceedings of AAAI-87, pages 183–188, Seattle, Washington, 1987. [Reiter, 1978] R. Reiter. On closed world data bases. In H. Gallaire and J. Minker, editors, Logic and Data Bases, pages 119–140. Plenum Press, 1978. [Reiter, 1980] R. Reiter. A logic for default reasoning. Artificial Intelligence, 13:81–132, 1980. [Reiter, 1982] R. Reiter. Circumscription implies predicate completion (sometimes). In Proc. AAAI, pages 418–420, 1982. [Reiter, 1987a] R. Reiter. Nonmonotonic reasoning. Annual Review of Computer Science, 2:147– 186, 1987. [Reiter, 1987b] R. Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 32:57–95, 1987. [Reiter, 1991] R. Reiter. The frame problem in the situation calculus: A simple solution (sometimes) and a completeness result for goal regression. In V. Lifschitz, editor, Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of Lohn McCarthy, pages 318–420. Academic Press, 1991. [Reiter, 2001] R. Reiter. Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamic Systems. MIT Press, 2001. [Russell, 1957] B. Russell. Mysticism and Logic, chapter On the notion of cause, pages 180–208. Allen & Unvin, London, 1957. [Sandewall, 1972] E. Sandewall. An approach to the frame problem and its implementation. In Machine Intelligence, volume 7, pages 195–204. Edinburgh University Press, 1972. [Sandewall, 1994] Erik Sandewall. Features and Fluents: A Systematic Approach to the Representation of Knowledge About Dynamical Systems. Oxford University Press, Oxford, 1994. [Schlechta, 1997] K. Schlechta. Nonmonotonic Logics: Basic Concepts, Results, and Techniques, volume 1187 of Lecture Notes in AI. Springer Verlag, 1997. [Schlechta, 2004] Karl Schlechta. Coherent Systems, volume 2 of Studies in Logic and Practical Reasoning. Elsevier, Amsterdam, 2004. 
[Schubert, 1990] L. Schubert. Monotonic solution of the frame problem in the situation calculus; an efficient method for worlds with fully specified actions. In Henry Kyburg, Ronald Loui, and Greg Carlson, editors, Knowledge Representation and Defeasible Reasoning, pages 23–67. Kluwer Academic Publishers, Dordrecht, 1990. [Schwarz and Truszczy´ nski, 1994] G. Schwarz and M. Truszczy´ nski. Minimal knowledge problem: a new approach. Artificial Intelligence, 67:113–141, 1994. [Schwarz, 1990] G. Schwarz. Autoepistemic modal logics. In R. Parikh, editor, Theoretical Aspects of Reasoning about Knowledge, TARK-90, pages 97–109, San Mateo, CA, 1990. Morgan Kaufmann. [Schwarz, 1992a] G. Schwarz. Minimal model semantics for nonmonotonic modal logics. In Proceedings LICS-92, pages 34–43, Santa Cruz, CA., 1992. [Schwarz, 1992b] G. Schwarz. Reflexive autoepistemic logic. Fundamenta Informaticae, 17:157– 173, 1992. [Schwind, 1999] C. Schwind. Causality in action theories. Link¨ oping Electronic Articles in Computer and Information Science, 4(4), 1999. [Segerberg, 1971] K. Segerberg. An Essay in Classical Modal Logic, volume 13 of Filosofiska Studier. Uppsala University, 1971. [Shanahan, 1997] M. P. Shanahan. Solving the Frame Problem. The MIT Press, 1997. [Shepherdson, 1988] J. C. Shepherdson. Negation in logic programming. In J. Minker, editor, Deductive Databases and Logic Programming, pages 19–88. M. Kaufmann, 1988. [Shoham, 1988] Y. Shoham. Reasoning about Change. Cambridge University Press, 1988. [Simon, 1952] Herbert Simon. On the definition of the causal relation. The Journal of Philosophy, 49:517–528, 1952. [Stalnaker, 1968] R. Stalnaker. A theory of conditionals. In N. Rescher, editor, Studies in Logical Theory. Basil Blackwell, Oxford, 1968. [Stalnaker, 1993] Robert C. Stalnaker. A note on non-monotonic modal logic. Artificial Intelligence, 64(2):183–196, 1993. Widely circulated in manuscipt form, 1980 to 1992. [Tan and Pearl, 1995] S.-W. Tan and J. Pearl. 
Specificity and inheritance in default reasoning. In Proceedings Int. Joint Conf. on Artificial Intelligence, IJCAI-95, pages 1480–1486, 1995.
632
Alexander Bochman
[Thiele, 1990] H. Thiele. On generation of cumulative inference operators by default deduction rules. In Nonmonotonic and Inductive Logic, Lecture Notes in Computer Science, pages 100–137. Springer, 1990. [Thielscher, 1997] M. Thielscher. Ramification and causality. Artificial Intelligence, 89:317–364, 1997. [Thomason, 2003] Richmond Thomason. Logic and artificial intelligence. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. 2003. [Touretzky, 1986] D. S. Touretzky. The Mathematics of of Inheritance Systems. Morgan Kaufmann, Los Altos, 1986. [Truszczy´ nski, 1991] M. Truszczy´ nski. Modal interpretations of default logic. In J. Myopoulos and R. Reiter, editors, Proceedings Int. Joint Conf. on Artificial Intelligence, IJCAI’91, pages 393–398, San Mateo, Calif., 1991. Morgan Kaufmann. [Turner, 1999] H. Turner. A logic of universal causation. Artificial Intelligence, 113:87–123, 1999. [van Benthem, 1984] J. van Benthem. Foundations of conditional logic. Journal of Philosophical Logic, 13:303–349, 1984. [van Emden and Kowalski, 1976] M. H. van Emden and R. A. Kowalski. The semantics of predicate logic as a programming language. J. of ACM, 23:733–742, 1976. [van Gelder et al., 1991] A. van Gelder, K. A. Ross, and J. S. Schlipf. The well-founded semantics for general logic programs. J. ACM, 38:620–650, 1991. [Veltman, 1976] F. Veltman. Prejudices, presuppositions and the theory of conditionals. In J. Groenendijk and M. Stokhof, editors, Amsterdam Papers on Formal Grammar, volume 1, pages 248–281. Centrale Interfaculteit, Universiteit van Amsterdam, 1976. [Wittgenstein, 1961] L. Wittgenstein. Tractatus Logico-Philosophicus. Routledge & Kegan Paul, London, 1961. English translation by D. F. Pears and B. F. McGuinness. [You et al., 2000] J.-H. You, L. Y. Yuan, and R. Goebel. An abductive apprach to disjunctive logic programming. Journal of Logic Programming, 44:101–127, 2000. [Zhang and Foo, 2001] D. Zhang and N. Foo. EPDL: A logic for causal reasoning. 
In Proc Int. Joint Conf. on Artificial Intelligence, IJCAI-01, pages 131–136, Seattle, 2001. Morgan Kaufmann. [Zhang and Rounds, 1997] G.-Q. Zhang and W. C. Rounds. Defaults in domain theory. Theor. Comput. Sci., 177:155–182, 1997.
FREE LOGICS Carl J. Posy
1 INTRODUCTION
1.1 Overview
The term “free logic”, or more properly “free logics”, designates an approach to first order quantificational logic that is devoid of presuppositions about, or commitments to, the existence of anything. Now, to be sure, the question of the “existential presuppositions” or “existential commitments” (or “existential import” as it was once called) of formal logic was a special province of Aristotelian logic, and modern quantificational logic is said to neutralize this question by making all existential claims explicit. Here is a paradigmatic example: In the circumference of the traditional square of opposition

All S are P    ↔    All S are not P
     ↓                     ↓
Some S is P    ↔    Some S is not P

the propositions beneath the vertical arrows are supposed to be (“subaltern”) consequences of the ones at the top. The upper horizontal arrows connect “contrary” propositions, which are supposed to be such that they can be simultaneously false but not simultaneously true; and the lower horizontal arrows connect “sub-contraries”, propositions that can be simultaneously true but not simultaneously false. However, it is easy to see that these relations will not hold unless we presuppose that at least one thing exists to which the predicate S applies. In traditional logic this assumption is an implicit presupposition. In modern standard quantificational theory, however, we can explicitly write ∃xSx. If we wanted to preserve the relations on this square we would add this formula as an explicit assumption, and we would thus avoid the scent of implicit presuppositions. Actually, it took a new dual understanding of valid inference and a revolution in logical structure to achieve this insight. On the side of validity, modern logic
Handbook of the History of Logic. Volume 8 Dov M. Gabbay and John Woods (Editors) © 2007 Elsevier B.V. All rights reserved.
replaces the old syllogistic theory with two different, well defined notions of validity: derivability in a syntactic formal system and the semantic notion of truth under all interpretations. And on the side of logical form, instead of “All S are P” and “Some S are P” (and their various negations) we now have a recursive construction built up from singular predications. Moreover, we express “all” and “some” by separate quantifiers that are never part of the basic predication. All this comes together in the observation that you need to assume ∃xSx in order to get from ∀x(Sx → Px) to ∃x(Sx & Px).

In this chapter I will sketch the development of free logic and show its connection to some of the large themes of this modern logic. Indeed, free logicians share both the spirit and the revolutionary letter of this modern enterprise. Philosophically, they believe that neither language nor logic alone should commit us to the existence of any particular objects nor, perhaps, to the existence of anything at all. Technically, they use both formal systems and modern semantics to work out their philosophical views, and they attend carefully to the nuances of logical form. But they apply this enterprise to contemporary logic itself. For free logic starts with the observation that modern logic did not finish the job. Yes, we can now bring hidden existential commitments to the surface, but our standard contemporary logic (both formal systems and semantics) still harbors intrinsic existential assumptions. Indeed, those very same modern notions (singular terms, predication and quantifiers) are the chief culprits.

Though there were isolated prior studies, free logic proper started with syntactic investigations in the mid 1950s and expanded to semantic studies in the mid 1960s. Parts II and III of the chapter will sketch the main points of these stages and will highlight the centrality of singular predication and quantification.
Part IV briefly uses Russell’s theory of descriptions as a historical prism. In one direction this theory and the discussions surrounding it provide technical and philosophical precedents for free logic’s main themes. In the other direction, Russell’s theory directly led to one of the most fruitful applications of free logic, the modern theory of presupposition. Russell’s theory also hardened the extensional stream of modern logic, the stream within which free logics developed. But Part V highlights three studies which show that the free logician’s concern with predication and existence forms a natural and mutually beneficial link with the intensional tradition in modern logic as well.
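The point made in the overview about the square of opposition, namely that its relations fail unless something satisfies S, can be checked mechanically on a small finite domain. The following is only an illustrative sketch: the function names (`all_are`, `square_holds`, and so on), the three-element domain, and the reading of the four corner propositions as quantified formulas over sets are assumptions introduced here, not part of the chapter.

```python
from itertools import combinations


def all_are(domain, S, P):
    """All S are P, read as ∀x(Sx → Px)."""
    return all((x not in S) or (x in P) for x in domain)


def all_are_not(domain, S, P):
    """All S are not P, read as ∀x(Sx → ∼Px)."""
    return all((x not in S) or (x not in P) for x in domain)


def some_is(domain, S, P):
    """Some S is P, read as ∃x(Sx & Px)."""
    return any((x in S) and (x in P) for x in domain)


def some_is_not(domain, S, P):
    """Some S is not P, read as ∃x(Sx & ∼Px)."""
    return any((x in S) and (x not in P) for x in domain)


def exists_s(domain, S):
    """The explicit existential assumption ∃xSx."""
    return any(x in S for x in domain)


def square_holds(domain, S, P):
    """True when subalternation, contrariety and sub-contrariety all hold."""
    a, e = all_are(domain, S, P), all_are_not(domain, S, P)
    i, o = some_is(domain, S, P), some_is_not(domain, S, P)
    subaltern = ((not a) or i) and ((not e) or o)
    contraries = not (a and e)   # never simultaneously true
    subcontraries = i or o       # never simultaneously false
    return subaltern and contraries and subcontraries


def subsets(domain):
    """All subsets of a finite domain (hypothetical helper)."""
    items = sorted(domain)
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


domain = {1, 2, 3}

# With an empty S both universal corners are true and both particular
# corners are false, so every relation on the square breaks down.
print(all_are(domain, set(), {1, 2}), some_is(domain, set(), {1, 2}))

# Over this domain the square holds exactly when ∃xSx is true.
print(all(square_holds(domain, S, P) == exists_s(domain, S)
          for S in subsets(domain) for P in subsets(domain)))
```

On this reading, adding ∃xSx as a premise is exactly what restores the traditional relations, which is the sense in which modern notation makes the presupposition explicit.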
1.2 Modern Morphology
Since free logics are, as I said, highly sensitive to logical form, it is wise to begin with a clear statement about morphologies and languages. A morphology, M, consists of a stock of logical and non-logical symbols together with rules for forming terms and well-formed formulas (wffs). A language L, built from M, is determined by choosing a stock of non-logical symbols from M. Its terms and wffs are then constructed in accordance with M.
Our basic morphology, MC, is built as follows:1

The morphology MC

Singular terms:
  Constants: a, b, c, d, a1, . . ., b1, . . ., . . . (A constant is a “closed” singular term.)
  Variables: x, y, z, x1, . . ., y1, . . ., . . .
Predicates:
  For each i (≥ 0) and each n (≥ 0) we have the n-place predicate letter P_i^n (subscript i, superscript n).
Connectives:
  Sentential connectives: ∼, →
  Quantifier: ∀

A formal language, LC, built from this morphology, will contain all the variables and logical symbols together with a (non-empty) selection of relation symbols and a (possibly empty) collection of constants. In such a language the notion of a well formed formula (“wff”) is defined as follows:

1. If t1, . . ., tn are singular terms and Q is an n-place predicate letter, then Q(t1, . . ., tn) is a wff. Such a wff is an “atomic” wff. (Note that the case in which n = 0 shows that every proposition letter standing alone is an atomic wff.)
2. If A and B are wffs, so too are ∼A and (A → B).
3. If A is a wff and v is a variable, then ∀vA is a wff.
4. There are no other wffs.

The scope of the quantifier in a wff and free and bound occurrences of variables are defined as usual, as are the notions of open and closed wffs. A closed wff is called a sentence.

Abbreviations
  (A ∨ B) abbreviates (∼A → B)
  (A & B) abbreviates ∼(∼A ∨ ∼B)
  ∃vA abbreviates ∼∀v∼A.

At/s is the wff derived from A by replacing all free occurrences of t by s. At//s is the wff achieved by replacing some (or all) occurrences of t by s.

We will need several variations on this basic morphology:

1 I will be loose in what follows about use and mention, and thus will dispense with quotation marks whenever the context is clear. Similarly, I will drop subscripts and superscripts from predicate letters whenever the context allows.
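The recursive definition of wff for a language built from MC, together with the three abbreviations, can be sketched as a small data type. This is a minimal illustration under stated assumptions: the class names `Atom`, `Not`, `Imp`, `Forall`, the helpers `disj`, `conj`, `exists`, `free_vars`, `is_sentence`, and the convention that terms are strings whose first letter decides whether they are variables (x, y, z) or constants are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Tuple, Union


def is_variable(term: str) -> bool:
    # Assumption: x, y, z (optionally indexed, e.g. "x1") are variables;
    # every other term is treated as a constant.
    return term[0] in "xyz"


@dataclass(frozen=True)
class Atom:            # clause 1: Q(t1, ..., tn); n = 0 gives a proposition letter
    pred: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Not:             # clause 2: ∼A
    sub: "Wff"


@dataclass(frozen=True)
class Imp:             # clause 2: (A → B)
    left: "Wff"
    right: "Wff"


@dataclass(frozen=True)
class Forall:          # clause 3: ∀vA
    var: str
    body: "Wff"


Wff = Union[Atom, Not, Imp, Forall]


# The abbreviations are functions that expand into the primitive connectives.
def disj(a, b):
    return Imp(Not(a), b)                 # (A ∨ B) abbreviates (∼A → B)


def conj(a, b):
    return Not(disj(Not(a), Not(b)))      # (A & B) abbreviates ∼(∼A ∨ ∼B)


def exists(v, a):
    return Not(Forall(v, Not(a)))         # ∃vA abbreviates ∼∀v∼A


def free_vars(w):
    """Set of variables with at least one free occurrence in w."""
    if isinstance(w, Atom):
        return {t for t in w.args if is_variable(t)}
    if isinstance(w, Not):
        return free_vars(w.sub)
    if isinstance(w, Imp):
        return free_vars(w.left) | free_vars(w.right)
    if isinstance(w, Forall):
        return free_vars(w.body) - {w.var}
    raise TypeError("not a wff")          # clause 4: there are no other wffs


def is_sentence(w):
    """A closed wff is called a sentence."""
    return not free_vars(w)


open_wff = Forall("x", Atom("P", ("x", "y")))   # y is free
closed_wff = exists("x", Atom("S", ("x",)))     # ∃xSx
print(free_vars(open_wff), is_sentence(closed_wff))
```

Treating ∨, & and ∃ as expanding functions rather than as constructors mirrors the text's choice to regard them as abbreviations; an intuitionistic morphology, where those equivalences are unavailable, would instead need a constructor for each connective.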
The Morphology M=
In the morphology M= we add the two-place predicate = to MC in order to designate the relation of identity. For any pair, {t1, t2}, of singular terms in such a language, t1 = t2 will be an atomic wff. Formal systems formulated in a language L= will generally contain axioms for =. In such a language we can express “individual existence” via the wff ∃x(x = t).

The Morphology ME!
A language LE! built from the morphology ME! will get the same effect by treating “existence” as a predicate in its own right. Specifically, we add the symbol E! to the morphology MC as a specially designated one-place predicate letter. Consequently, for any singular term t, E!t is an atomic wff.2

The Morphology Mι
Languages built out of Mι add the ι (definite description) operator to their basic vocabulary. In such languages we shall have to define the notion of singular term in a recursive fashion:

1. Any individual variable is a singular term.
2. Any individual constant is a closed singular term.
3. If v is a variable and A is a wff, then ιvA is a singular term. (Occurrences of “v” in “ιvA” are bound occurrences.) If no variables occur free in A other than v, then ιvA is a closed singular term.3

The Modal Morphology M□
M□ is exactly like MC except that we now add a new operator, □, to the stock of sentential connectives; and correspondingly we reformulate clause 2 in the definition of a wff for any language L□ formed from M□ as follows:

(2□) If A and B are wffs, so too are ∼A, (A → B) and □A.

We also add the following abbreviation: ♦A abbreviates ∼□∼A. One can combine M□ with other non-modal morphologies, getting M□E!, M□=, etc.

2 Of course, we might simply have designated the one-place predicate letter P_1^1 to do this work. However, the special importance of this predicate warrants allocating it a special symbol of its own. Similarly, we might use the two-place predicate letter P_1^2 instead of = to designate identity; however, once again, the special significance of this notion dictates a special symbol and a separate language. Indeed, as we’ll see below, these special predicate symbols can take on the status of logical symbols.
3 We could introduce a language that has mathematical function symbols in a similar fashion. However, we need not do so; the ι operator can represent functions.
Free Logics
The Morphology MI
This will be the morphology from which we build languages appropriate to intuitionistic logic. The well known equivalences that allow us to abbreviate connectives are not valid in intuitionistic logic, so this morphology contains each of the connectives ∼, →, &, ∨, ∀, and ∃ on its own. The definition of wff for a language LI built from MI will have a clause for each of the connectives. MI admits variations of its own for E!, ι, and =.

2 FORMAL SYSTEMS
2.1 Three Existential Commitments of Classical Logic
Formal systems, purely syntactic creatures that they are, show their existential commitments by the existentially quantified statements derivable from them, and by the interactions that they support between quantifiers and singular terms. Now an axiomatic system, Σ, for some particular subject matter, may well assert or assume the existence of all sorts of things. The question of existential commitment is really only pressing when Σ is a “logical” system: that is, an allegedly topic-neutral system which itself is taken as a basis for any more specialized formal system. As it turns out, our standard modern logic has existential commitments of three sorts.
Here is an axiomatic system for standard first order quantificational logic. Let us call this system ΣC; it is formulated in the language LC.4 The axiom schemata of ΣC are:
A1 A → (B → A)
A2 (A → B) → ((A → (B → C)) → (A → C))
A3 (∼A → B) → ((∼A → ∼B) → A)
A4 ∀v(A → B) → (∀vA → ∀vB) (v need not be free in A or in B)
A5 ∀xA → At/x (where t is a singular term that is free for x in A)
The inference rules of ΣC are:
R1 Modus Ponens: A, A → B / B
R2 Universal Generalization: A / ∀xA
4 The “C” here stands for “classical”. Standard logic is often called “classical logic.”
As usual, a categorical derivation in ΣC is a finite list of wff’s, each element of which is either an axiom or else follows from earlier elements by one of the rules; and we say that A is a theorem of ΣC (and write ⊢C A) if A is the last item in a categorical derivation in ΣC. Similarly, if Γ is a set of wffs (finite or infinite), then
Carl J. Posy
a hypothetical derivation from Γ in ΣC is a finite list of wff’s each of which is either an axiom, an element of Γ, or the consequence of earlier elements by one of the rules of inference; and we say that A is derivable from Γ in ΣC (and write Γ ⊢C A) if A is the last item in a hypothetical derivation from Γ in ΣC.5
5 ΣC satisfies the “deduction” meta-theorem: Γ ∪ {A} ⊢C B ⇒ Γ ⊢C A → B. So do all the other logical systems that I will discuss. So in general I will move freely between A ⊢ B and ⊢ A → B for any of these systems.
Here are the three sorts of existential commitments of ΣC:
1. The fact that existential wff’s are categorically derivable in ΣC,
(i) ∃x(P1x → (P2x → P1x))
for instance, shows that ΣC implicitly assumes that the world is not empty. The fact that
(ii) ∀xAx → ∃xAx
is a theorem expresses the same existential assumption. And of course the assumption will be inherited by any formal system taking ΣC as its logical base. And this is objectionable. We might for instance add to ΣC the axioms for some theory of unicorns, a theory including the claim that all unicorns are horses with a single horn. But we would not want to conclude from such a theory that horses with horns actually exist. A formal system will be free of this sort of assumption if no existential statement is categorically derivable or derivable from premises which contain no existential assumptions or closed singular terms.
2. For the second sort of existential assumption, let us move to the language L=, and let us add the standard axiom schemata for identity to the axioms of ΣC:
E1 t = t
E2 t = s → (At//x → As//x) (where the replacements occur in the identical places in A)
(I’ll call the resulting formal system ΣC=.) Now,
(iii) ∃x(x = t)
is a theorem of ΣC= for any term t. The equality sign in L= allows us to speak of the existence of individual things, and we can ask whether we actually need to go to L= and ΣC= in order formally to express this notion of individual existence. I return to that technical question in section 2.3. But even as things stand (iii) is a very strong existential commitment. It says that everything that has a name must already exist. This is indeed draconian. If we cannot name things that don’t or might not exist, then, on the plausible assumption that thought takes place in language, we will not even be able to think about those things, or at least not as individual things. Indeed, expanding the language by adding names to use in those thoughts won’t help: the commitment still holds in the expanded language, and the new names will still be forced to stand for things that do exist.
3. ΣC has the property that
(iv) A(t1 ... ti ... tn) ⊢C ∃xA(t1 ... x ... tn).
This is somewhat weaker than the second sort of commitment, for it does not by itself prohibit us from forming expressions for non-existent entities. But it does say that, even if we can form such expressions, we can never assert anything true about these non-existent things.
These three existential commitments of standard logic (that there necessarily is something rather than nothing, that we cannot formulate judgments about things that don’t or might not exist, or at least that we cannot say anything true about such things) are the free logician’s foils. As we shall see, free logicians will refine and subdivide these broad types of existential commitment and will devise methods for shedding them.
The proof theoretic approach to free logic aims to produce logical systems that abjure one or more of these existential assumptions. Each system will have its own special axioms, reflecting distinct philosophical views about the basic issues of predication and quantification. Each will thus have its own notion of logical theorem and of logical consequence. Let me turn to the main such systems now.6
2.2 Inclusive Logics
6 In all this, I will concentrate on axiomatic systems. To be sure, natural deduction and tableau versions of all these systems exist. However, they are formally equivalent to the systems I will describe, and so I will not address them independently. Thus, in particular, the system of Jaskowski that I discuss in the next section is actually a natural deduction system, but in presenting it I will make the trivial adjustments needed in order to present it on a par with the other systems that I will discuss.
Inclusive logics are designed to be free of the first sort of existential commitment that I mentioned above. They are compatible with the assumption that nothing exists at all. The first such system actually appeared in a 1934 paper by Jaskowski, who used a meta-theoretical notation, T, in order to flag places in proofs that depend on classical logic’s existential commitment. In particular, he effectively changed axiom schema A5 to:
A5′ ∀xA & Ty → Ay/x.
Now to derive formula (i) in standard logic we would assume
(v) ∀x∼(P1x → (P2x → P1x))
for reductio (since ∃xA is defined as ∼∀x∼A), and then use Axiom A5 in the form
(vi) ∀x∼(P1x → (P2x → P1x)) → ∼(P1y → (P2y → P1y))
in order to get
(vii) ∼(P1y → (P2y → P1y))
by modus ponens. And that, in turn, would contradict axiom A1, in the form
(viii) P1y → (P2y → P1y).
In Jaskowski’s system, we would have to make the additional assumption Ty in order to derive (vii) from (vi) and in order to derive (viii) from the theorem corresponding to A1. But since Ty is not derivable by itself, the proof of (i) is blocked.
Jaskowski’s system had no open wff’s as theorems. Mostowski [1951] attempted to produce an inclusive system which did allow open theorems. This later system had inconveniences of its own, as we’ll see in section 3.4, but together these papers do offer a sustained study of one of modern logic’s existential assumptions.7
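The discussion above is proof theoretic, but the point of an inclusive logic can also be seen semantically: formulas (i) and (ii) hold in every non-empty domain and fail in the empty one. The following is only an illustrative sketch, assuming finite domains represented as Python sets and predicates represented by their extensions; the function names are hypothetical and are not part of any system described in the text.

```python
def implies(p, q):
    """Material conditional."""
    return (not p) or q

def formula_i(domain, P1, P2):
    # (i) ∃x(P1x → (P2x → P1x)), reading ∃ as "some d in the domain"
    return any(implies(d in P1, implies(d in P2, d in P1)) for d in domain)

def formula_ii(domain, A):
    # (ii) ∀xAx → ∃xAx
    return implies(all(d in A for d in domain), any(d in A for d in domain))

nonempty = {"a", "b"}
empty = set()

# On a non-empty domain both formulas come out true, whatever the extensions.
holds_nonempty = formula_i(nonempty, {"a"}, {"b"}) and formula_ii(nonempty, {"a"})

# On the empty domain ∀xAx is vacuously true while ∃xAx is false,
# so both (i) and (ii) fail.
holds_empty = formula_i(empty, set(), set()) or formula_ii(empty, set())

print(holds_nonempty, holds_empty)
```

An inclusive logic admits the second kind of interpretation, which is why neither (i) nor (ii) can remain a theorem.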
7 Interestingly, some authors do not include these systems in the proper canon of free logics. Bencivenga [2002] is paradigmatic here. For one thing, these papers by Jaskowski and Mostowski predate the official birth of free logic in the mid 1950s. But more substantively, this work did not explicitly address questions of singular terms, and some see combating existential commitments of the second and third sort as the true work of free logic. However, I should mention that Jaskowski’s T acts formally like the symbol “E!”, so in fact his proposed instantiation rule is a precursor of the free-logical axiom FA8, below.

2.3 Positive and Negative Free Logics
Positive Free Logics
Now assumptions of the second and third kind are especially troubling when we want to allow ourselves to make assertions about particular things that don’t (or may not) exist. This certainly occurs in fictional discourse: Sherlock Holmes never existed, but he did solve the mystery of the “Red Headed League.” It can occur in the history of science (to use J. Lambert’s favorite example: there is no Vulcan; but had there been, it would be a planet that rotates). It occurs regularly in mathematical discourse: the largest twin prime will be odd. And even in ordinary daily discourse, we may well discover that something or someone of whom we have been speaking does not in fact exist. A main focus for the founders of free logic in the 1950s and early 1960s was to produce formal logical systems designed to support discourse of this sort.
Perhaps the simplest such system is what I shall call ΣPFL. This system is formulated for any language LC and arises from ΣC by substituting the following axiom for A5:
FA5 ∀w(∀vA → Aw/v) (where w is a variable free for the variable v in A)
and adding the following additional axiom:
FA6 ∀w∀vA → ∀v∀wA.
Axiom FA6 is simply a formal device designed to allow the full range of deductions in this system.8 It is that slight change between A5 and FA5 that does the main work. FA5 has a dual effect:
1. It explicitly places the burden of existential weight on the quantifiers (as Quine famously requires).
2. By blocking the inference from ∀xA to arbitrary At/x, it allows the use of singular terms which do not have existential weight. It allows us for instance to affirm that Pegasus is a flying horse, while still saying that real horses don’t fly.
Indeed, this system is a syntactic version of what has come to be called positive free logic; positive, because it allows the affirmation of at least some atomic predications with non-denoting singular terms.
8 Kit Fine, in [1983], proved that this axiom is not eliminable from this and similar formal systems for free logic.
One can readily expand the language in which ΣPFL is expressed to a language L= or to a language LE!, languages that allow singular existence claims. Such expansions actually involve a few niceties. In the former case we add axioms E1 and E2 to the system. We also drop FA5 and FA6, and replace them with the single axiom schema:
FA7 (∀vA & ∃w(w = v)) → Aw/v
In the resulting system, ΣPFL=, FA5 and FA6 are derivable. In the latter case (moving to a system formulated in a language LE!) we once again drop FA5 and FA6. In this case, however, we need to replace them with two new axiom schemata:
FA8 (∀vA & E!v) → Aw/v
and
FA9 ∀vE!v.
We can call the resulting system ΣPFLE!.
One last pair of proof theoretic results needs to be mentioned. First, when we add “=” to the language LE!, together with the axioms for identity, it turns out that we can derive the biconditional
(ix) E!t ↔ ∃x(x = t)
in the resulting version of ΣPFLE!. (This was shown by Hintikka, quite early in the game.) Secondly, quite a bit later Meyer, Bencivenga and Lambert showed that we cannot define E!t in ΣPFL alone. These two results together serve to answer the
question I posed in section 2.1: I asked there whether we need to add = to our language in order to express individual existence. The answer, put simply, is yes. If we want to express this notion of individual existence then we must either add a primitive predicate E! to do that job or else add = to the language and define E!. There is no other way to do so.
Negative Free Logics
Now, the positive systems I’ve just described escape existential commitments of both the second and third kind: they allow us not only to think of non-existent things, but to assert true claims about such things as well. Some free logicians, however, find this too liberal: they hold that nothing can be truly predicated of a non-denoting term. So a popular and important class of free logical systems is the class of so-called “negative free logics”. In these systems, atomic wff’s with non-denoting singular terms are all denied. Note, by the way, that I said “atomic wffs” here. Negative free logics do not generally accept the full classical doctrine (iv). Most negative systems do allow us to assert logical truths such as
(x) Pt ∨ ∼Pt
even when t is a non-denoting term. It is “contingent” truths that have the existential import. So negative systems usually have
(xi) P^n(t1, ..., tn) → ∃x1...∃xn P^n(x1, ..., xn)
as a theorem. This is a refined version of classical logic’s third commitment.
Actually, the most common versions of negative free logic are formulated in a language that can express individual existence; that is, in a language LE! or a language L=. In the first case, one simply adds the axiom schema
FA10 P^n(t1, ..., tn) → (E!t1 & ... & E!tn)
to the axioms of ΣPFLE!, or its contrapositive
FA10′ ∼(E!t1 & ... & E!tn) → ∼P^n(t1, ..., tn).
Either way we can call the resulting system ΣNFLE!. In the second case, when we move to a language L=, instead of FA10 we add the following axiom schema to the axioms of ΣPFL=:
FA11 P^n(t1, ..., tn) → (∃v(v = t1) & ... & ∃v(v = tn))
But now we need to exercise some special care in treating =. One thing we might do is to replace axiom schema E1 with the following axiom:
NE1 ∀x(x = x).
I call the resulting system ΣSNFL=. The prefix “S” here indicates that this is a strongly negative system, for it has the effect of rejecting identity statements, (t = t), for non-denoting terms. Indeed, in this system, we can define E!t by the condition:
(xii) E!t =df (t = t).
However, to some this seemed too draconian a negativity. Identity statements are, after all, tantamount to logical truths; and we do admit such tautologies as (x) even for non-denoting t. So there is also a weaker negative system in which identity statements may be true for non-denoting terms. In such a system, ΣNFL=, we would keep the original axiom schema E1.
Terminological Note: There is no consistency in the use of the expressions positive and negative free logic. Systems with and without NE1 get called negative free logic, and some authors reserve the term positive free logic for systems which affirm all atomic predications with non-denoting terms. But, terminology aside, the important thing to keep in mind is the gradation of free logical systems, and their special sensitivity to the language in which they are formulated.
2.4 The Theory of Descriptions9
9 Free description theory is sometimes called an application of free logic. But it concerns the central issues of singularity and predication, and it developed simultaneously with the main proof theoretic and semantic aspects of free logic. So I treat it as part of the core.
A definite description is an expression of the form “the x, such that ... x ...”, with “the” acting in the singular.10 These expressions are ordinarily embedded in elementary predications. Thus for instance
(xiii) The lion in the MGM logo roars to the left.
As we saw in section 1.2, definite description phrases are formalized as “ιxA”, where A is a wff, ιx is a variable binding operation, and the expression “ιxA” is a singular term. So sentences like (xiii) will be formalized as
(xiv) P1(ιxP2(x)).
10 The restriction to singular “the” is meant to rule out generic sentences such as “The lion hunts by night.”
When the condition
(xv) ∃x(Ax & ∀y(Ay ↔ y = x))
holds, then the expression ιxA is said to be fulfilled. It is unfulfilled if either there is no x such that Ax holds or if there is more than one such x. Now two natural assumptions about definite descriptions are
(xvi) A(ιxA)
and
(xvii) ιxA = ιxA.
The second is a logical truth, the first a paradigmatic analytic truth. Indeed, in classical logic (xvi) is a direct consequence of
(xviii) (t = ιxA) ↔ (At/x & ∀y(Ay/x ↔ y = t)),
which expresses the meaning of the ι operator. But in classical logic these natural assumptions lead quickly to contradiction. For (xvi) implies
(xix) P(ιx(Px & ∼Px))
and also
(xx) ∼P(ιx(Px & ∼Px)),
a contradiction. And (xvii) implies
(xxi) ∃x(Px & ∼Px),
which is logically false, and will lead to a contradiction as well. Clearly classical logic’s third and second existential assumptions are the culprits here, so free logic is a natural remedy. And, indeed, free logicians have proposed a series of formal systems for definite descriptions and corresponding semantics as well. Proof theoretically, an elegant early proposal by Lambert was simply to universally quantify (xviii) and append the result
(FA12) ∀z[(z = ιxA) ↔ (Az/x & ∀y(Ay/x ↔ y = z))]
as an additional axiom to a system of positive free logic, formulated in a language Lι. This captures the meaning of ιxA, but of course neither (xvi) nor (xvii) will be derivable from (FA12) and the axioms of free logic. In fact we now know that there is a pair of hierarchies of formal free description theories. One of these orders such theories according to the strength of the generalizations of (xvi), the other according to generalizations of (xvii).
Historical and Textual Notes
Quine in [1954] coined the term “inclusive logic” to cover systems which are free of the first existential commitment, because the class of semantic interpretations for such a system will include interpretations with an empty domain. The first syntactic treatment of free logic in its own right was Leonard [1956]. Other pioneering syntactic treatments were by Hintikka [1959] and Leblanc and Hailperin [1959] (both of which were natural deduction systems) and by Lambert [1963]. Lambert’s paper formulated axiom FA5, which itself has been the subject of both
technical and philosophical interest. (See Meyer and Leblanc [1970] for the former, and Bencivenga [1989] for the latter.) Lambert coined the term “Free Logic”, and his [1963] was the first formalization without a symbol for individual existence. Hintikka [1959] has the derivation of E!t ↔ ∃x(x = t) in L=E!. The proof that E! cannot be defined in ΣPFL is given in Meyer, Bencivenga and Lambert [1982]. Schock [1964] and [1968] set out the details and motivation for a negative free logic. He provided semantics as well as syntactic formalization. Scales [1969] provides a sustained treatment of the negative approach, and Burge [1974] provided yet another negative system. Grandy [1972] and [1977] advocates a positive free logic, and discusses the connection between intuitions about positive versus negative free logics on the one hand, and intuitions about the nature of predication on the other. Leonard [1956] contains a free description theory in a modal language. Hintikka [1959] initiated the non-modal treatment, which has since dominated the field. In a series of papers (Lambert [1962; 1964; 1967]) Lambert axiomatized the theory and presented a complete semantics. Van Fraassen initially proposed a one-dimensional hierarchy of free description theories. Lambert and Woodruff worked out the dual hierarchies in the 1980s and 1990s. Their work is summarized in Lambert [2003] chapter 5.

3 SEMANTIC APPROACHES
3.1 The Correspondence Theory of Truth
The ranking I just mentioned of free description theories, like many similar proof theoretic rankings, is established by semantic means. And indeed, though semantic approaches to free logic began a bit later than the first proof theoretic studies, they quickly flourished. Semantic notions such as reference, satisfaction, truth and validity provided technical soundness and completeness theorems; but they also provided a flexible context in which logicians could explore and once again refine the philosophical doctrines underlying various principles of free logic.11
11 Though there are both objectual and “substitutional” semantics for free logic in the literature, they are formally equivalent. Thus mainly I’ll present only objectual approaches, and use them to highlight the associated philosophical issues. In section 5.3 the difference between these two approaches is significant, and so I will address it there.
In standard model theory, given a language L, an interpretation I of L will set out a non-empty domain of discourse D, will specify the referents of the closed singular terms within D and of basic n-place predicates as subsets of D^n; and then it will determine the truth values for atomic and compound sentences (and even some open wff’s) according to the well known recursion clauses. Since each of these activities will play a separate role, I’ll use distinct notation for each one. An interpretation, I, will be an ordered quadruple:
I = ⟨D, IS, IB, IT⟩
where D is the domain, IS and IB are denotation functions defined as follows:
S(1a) For each variable v, IS(v) ∈ D
S(1b) For each closed term t, IS(t) ∈ D
B(1) For each n-placed predicate symbol P^n, IB(P^n) ⊆ D^n
And IT, the satisfaction function, is a total function from sentences into {T, F} defined according to the following recursive clauses:
T(1) IT(P^n(t1, ..., tn)) = T iff ⟨IS(t1), ..., IS(tn)⟩ ∈ IB(P^n)
T(2) IT(∼A) = T iff IT(A) = F
T(3) IT(A → B) = T iff IT(B) = T or IT(A) = F
T(4) Let v be a variable. IT(∀vA) = T iff for every d ∈ D, the interpretation I′, which is like I with the possible exception that I′S(v) = d, has I′T(A) = T.12 [From now on I will write I^{d/v} for such an interpretation.]
12 Keep in mind that under our present definition of ∃, this clause entails that IT(∃vA) = T iff there is a d ∈ D such that I^{d/v}T(A) = T.
T(2) and T(3) together with the definitions of the other sentential connectives give the standard sentential truth tables. If L is built from M= then we add the following clause:
T(1a) IT(t1 = t2) = T iff IS(t1) = IS(t2).13
13 As usual, the first occurrence of “=” here is in the object language, while the second occurrence is in the meta-language.
If IT(A) = T we say that I satisfies A, and if A is a closed wff, then we also say that A is true under I. Given a set, Γ, of sentences in L, I satisfies Γ if IT(A) = T for all A ∈ Γ. A wff of L will be logically valid (⊨ A) if and only if it is satisfied by every interpretation of the language. Gödel’s completeness theorem shows that a wff will be logically valid if and only if it is a theorem of ΣC. That is, ⊢C A ⇔ ⊨ A. The ⇒ direction is soundness; the ⇐ direction is completeness. A wff A is a semantic consequence of a set Γ of wffs (Γ ⊨ A) if every interpretation that satisfies Γ also satisfies A. Gödel’s theorem actually shows that Γ ⊢C A ⇔ Γ ⊨ A. The ⇐ direction is called strong completeness (or argument completeness). The completeness theorem proves formally (though it is straightforward to show individually) that the semantics set out in clauses S, B, and T indeed validates all of the classical existential assumptions, (i)–(iv).
Certainly the most natural understanding of this standard model theory is the straightforward correspondence theory of truth: the domain is a set of objects; each singular term denotes one of these objects (its reference or extension); predicate terms denote the extensions of properties and relations; a basic predication is true when the objects denoted stand in the relation denoted; and the recursive clauses tell us how to determine the extensions (truth values) of compound wffs.
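The recursion clauses T(1)–T(4) and T(1a) can be read as a small evaluator over finite interpretations. The sketch below is only an illustration under stated assumptions: wffs are encoded as nested tuples, an interpretation is a dictionary with hypothetical keys "D", "S" and "B", and the example names are invented. It shows the classical commitments (iii) and (iv) at work: because IS is total, the name "Pegasus" is forced to denote some object of D.

```python
def holds(I, wff):
    """Classical satisfaction, following clauses T(1)-T(4) and T(1a)."""
    op = wff[0]
    if op == "pred":                       # T(1)
        _, name, terms = wff
        return tuple(I["S"][t] for t in terms) in I["B"][name]
    if op == "eq":                         # T(1a)
        return I["S"][wff[1]] == I["S"][wff[2]]
    if op == "not":                        # T(2)
        return not holds(I, wff[1])
    if op == "imp":                        # T(3)
        return holds(I, wff[2]) or not holds(I, wff[1])
    if op == "all":                        # T(4): try every d in D for v
        _, v, body = wff
        return all(holds({**I, "S": {**I["S"], v: d}}, body) for d in I["D"])
    raise ValueError(f"unknown operator: {op}")

def exists(v, body):
    """∃vA abbreviates ∼∀v∼A."""
    return ("not", ("all", v, ("not", body)))

# Hypothetical interpretation: every closed term must denote within D.
I = {
    "D": {"secretariat", "bucephalus"},
    "S": {"Pegasus": "bucephalus"},
    "B": {"Horse": {("secretariat",), ("bucephalus",)}},
}

commitment_iii = holds(I, exists("x", ("eq", "x", "Pegasus")))    # ∃x(x = t)
horse_pegasus = holds(I, ("pred", "Horse", ("Pegasus",)))         # A(t)
some_horse = holds(I, exists("x", ("pred", "Horse", ("x",))))     # ∃xA(x)
print(commitment_iii, horse_pegasus, some_horse)
```

The free semantics surveyed next each change one ingredient of this picture: the totality of IS, the atomic clause, or the range of the quantifier.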
Thus the fact that (i)–(iv) are valid principles expresses a consistent metaphysical position; a position challenged by each stripe and version of free logic. Three broad semantic approaches to free logic were developed in the 1960s and 1970s: the single domain approach, the dual domain approach, and the approach of supervaluations. Each of these approaches will validate FA5 and other core axioms of free logic, and each aims to invalidate the classical theorems that express existential commitment. Here’s a brief survey of these approaches with some of their main variants.
3.2 Single Domain Approaches
As in standard semantics, the single domain approach defines an interpretation, I, as given by a domain D, together with the triple of valuation functions IS, IB and IT. But it then tinkers with IS to allow for singular terms that denote no existent object and modifies IT accordingly. There are two main classes of single domain semantics: “partial denotation” and “designated object” approaches.
3.2.1 Partial Denotation Functions
Here the main idea is that we cannot refer to things that do not exist. So if t1, ..., tk are specific terms for non-existent objects (“Pegasus”, for instance, and “Santa Claus” and “Sherlock Holmes”) then we will require that IS(ti) is not defined for any of these terms. And so we get a class of interpretations in which IS is a partial function. This class itself can provide semantics for negative and for positive free logics.
Negative Free Semantics
Now the natural question is: what do we do with IT(P^n(t1, ..., tn)) when IS(ti) is not defined for one or more of the singular terms ti? Clearly in this case ⟨IS(t1), ..., IS(tn)⟩ ∉ IB(P^n), so an equally natural answer is to require that IT(P^n(t1, ..., tn)) is simply false in this circumstance. Those who adopt this approach modify clause T(1) as follows:
FT(1) IT(P^n(t1, ..., tn)) = T if all of IS(t1), ..., IS(tn) are defined and ⟨IS(t1), ..., IS(tn)⟩ ∈ IB(P^n). Otherwise IT(P^n(t1, ..., tn)) = F.
The remaining semantic clauses stand without change. The new notions of validity (⊨′ A) and semantic consequence (Γ ⊨′ A) are now defined in terms of this new class of interpretations (i.e., those in which IS is a partial function and IT uses the modified clause FT(1)). Principle (xi) can be shown to be valid under this notion.
(xxii) At/x ⊨′ ∃xA
does not hold in general, but the special case
(xxiii) P^n(t1, ..., t, ..., tn) ⊨′ ∃xP^n(t1, ..., x, ..., tn)
does hold. So, in fact, this is a semantics for the basic negative free logic. In moving to a language LE!, one adds the following natural clause to the definition of IT:
FT(1b) IT(E!t) = T if IS(t) is defined. IT(E!t) = F otherwise.
Then the resulting semantics validates axiom FA8 (as one expects in any free logic) and gives
(xxiv) P^n(t1, ..., tn) ⊨′ E!t1 & ... & E!tn
which is the mark of a negative free logic.
Positive Free Semantics
Now in order to produce a semantics for a positive free logic (i.e., one in which (xxiv) does not hold) we will have to modify clause FT(1). Since there is of course no way to tamper with IB here, what we have to do is simply to specify by fiat that some particular atomic sentences are true even though they contain non-denoting singular terms. We can give a list of these, say Λ, and then modify clause FT(1) as follows:
FT(1)′ IT(P^n(t1, ..., tn)) = T if (a) all of IS(t1), ..., IS(tn) are defined and ⟨IS(t1), ..., IS(tn)⟩ ∈ IB(P^n), or (b) P^n(t1, ..., tn) ∈ Λ. Otherwise IT(P^n(t1, ..., tn)) = F.
The remaining clauses would be as in the negative case. From a formal point of view the semantics thus derived can be shown strongly complete with respect to the positive free logic ΣPFL.
Often the list Λ is said to be a list of “conventions”. And indeed within the formal semantics itself, this list may have no special justification. It stems from no “ontological” relation between an object and a property, or a relation among the members of a collection of actual objects. However, in actual practice the elements of the list will usually be dictated by our understanding of the language or by the context of discourse. Thus, again, we know exactly when and why we want to affirm that Sherlock Holmes is a detective and that he lives on Baker Street.
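The difference between the negative clause FT(1) and the positive clause FT(1)′ is easy to see in a small sketch. The code below is an illustration only, assuming that a partial IS is modelled as a Python dictionary from which non-denoting terms are simply absent, and that the convention list Λ is a set of (predicate, terms) pairs; all names are hypothetical.

```python
def atomic(I, name, terms, conventions=frozenset()):
    """FT(1) when `conventions` is empty; FT(1)' when it holds the list Λ."""
    all_defined = all(t in I["S"] for t in terms)
    if all_defined and tuple(I["S"][t] for t in terms) in I["B"][name]:
        return True                                   # clause (a)
    return (name, tuple(terms)) in conventions        # clause (b), else F

def exists_pred(I, t):
    """FT(1b): E!t is true exactly when IS(t) is defined."""
    return t in I["S"]

# Hypothetical interpretation: "Pegasus" has no denotation at all.
I = {
    "D": {"secretariat"},
    "S": {"Secretariat": "secretariat"},
    "B": {"Horse": {("secretariat",)}},
}
LAMBDA = {("Horse", ("Pegasus",))}   # a convention: "Pegasus is a horse"

negative = atomic(I, "Horse", ["Pegasus"])                       # FT(1)
positive = atomic(I, "Horse", ["Pegasus"], conventions=LAMBDA)   # FT(1)'
print(negative, positive, exists_pred(I, "Pegasus"))
```

With the empty convention list every true atomic sentence has denoting terms, as (xxiv) requires; with Λ in place the sentence about Pegasus is true although E!Pegasus is false, so (xxiv) fails.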
Identity
In the context of partial denotation functions, the language L= demands some special attention. Specifically, if t is a non-denoting term, then how are we to deal with IT(t = t)? To get a strongly negative free logic we would have to make this atomic sentence come out false when t is a non-denoting term. We would do so by adding the following clause to the semantics:
FT(1a) IT(t1 = t2) = T if IS(t1) and IS(t2) are both defined, and IS(t1) = IS(t2); otherwise IT(t1 = t2) = F.
Once again, one can claim that this clause infringes on the status of (t = t) as a logical truth. Here, indeed, is a prima facie case for one of those “conventions” at work in positive free logic. So one can add instead:
FT(1a)′ IT(t1 = t2) = T if (a) IS(t1) and IS(t2) are both defined, and IS(t1) = IS(t2); or (b) t1 is the same term as t2; otherwise IT(t1 = t2) = F.
This, of course, has the effect of putting every wff of the form (t = t) in the stock, Λ, of conventions. When these are the only wff’s in Λ, then the result is a semantics appropriate to the weakly negative free logic ΣNFL=.
3.2.2 Semantics with a Designated Object
This approach does allow reference to non-existent objects. Suppose once again that t1, ..., tk are specific terms for non-existent objects. If we want to allow that IS(ti) is defined for each of these terms, then perhaps the most straightforward way to do this is to pick some particular d ∈ D and let IS(ti) = d for each 1 ≤ i ≤ k. This way of operating follows a suggestion of Frege’s, and seems straightforward because IS remains a total function, and IT is allowed to operate in the old standard manner according to clauses T(1)–T(4).
However, the fact is that we do need to refine this straightforward treatment in order to avoid some anomalous situations. Thus, for instance, if the domain of discourse consists of human beings and Julius Caesar is the designated object, then, under the natural interpretation of the language, it will turn out that Sherlock Holmes lived in Rome, and that Odysseus was stabbed by Brutus. So clearly we need to pick an object d in such a way that nothing like this occurs. One way to achieve this is simply to add a new object, d∗, to obtain a domain D′ = D ∪ {d∗}, and then modify the denotation functions as follows:
FS(1a)′ For each variable v, IS(v) ∈ D
FS(1bi)′ For each closed term t ∉ {t1, ..., tk}, IS(t) ∈ D
FS(1bii)′ For each closed term t ∈ {t1, ..., tk}, IS(t) = d∗.
FB(1)′ For each n-placed predicate symbol P^n, IB(P^n) ⊆ D^n
If we are working in L= , then would then add clause FT(1a) to deal with identity. Interpretations built in this way will make every atomic statement that contains an occurrence of one or more of the terms {t1 , . . ., tk } come out false. So it would seem as though such interpretations would naturally validate the theorems of a “negative” logic. A negative logic yes; but as it stands not yet a negative free logic. Indeed, if the logic is formulated in our base language LC , a language in which it is impossible to express individual existence, then in fact this semantics will give plain old classical logic. Moreover, moving to LE! this semantics will still validate E!t for every t in the language, unless we make further modifications. If we do not want to revert to the partial denotation function approach, then we must in fact modify clause T(1b) so that FT(1b)′ IT (E!t) = T iff IS (t) = d∗ . This change, together with clause FB(1)′ above will indeed validate (xxv) P j (t1 , . . ., tn ) E!t1 &. . .&E!tn , and thus give us a negative free logic. It will also avoid the anomalies I mentioned above. To be sure, this semantics does equate all non-existents. And so if we move to a language L= , we will have to add an axiom of the form FA12 [∼ ∃x(x = t1 )& ∼ ∃x(x = t2 )] → (t1 = t2 ). And equivalently in a language LE!= we would add FA12′ [∼ E!t1 & ∼ E!t2 ] → (t1 = t2 ). But notice this is in fact avoidable, for what we really have done is that we have separated the domain into two distinct sub-domains, D and {d∗ }. We have thus moved away from the notion of a single domain. We have in effect admitted a special domain of non-existent objects. To be sure we have restricted this secondary domain to a single object, d∗ . 
But once we admit the possibility of an interpretation which allows reference to non-existent objects, we have opened the door to expanding this secondary domain to what I call the “dual domain” approach, an approach that naturally validates positive free logic.
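The added-object construction can be illustrated with a small sketch. It assumes a toy domain of two Romans and treats “holmes” and “odysseus” as the terms for non-existents; the names D_STAR, denot, ext, atomic and exists_pred are illustrative assumptions, not notation from the text. Extensions are drawn from D only, and E!t is taken to be true exactly when t does not denote d∗.

```python
# Sketch of the designated-object semantics with an added object d*.
# All names below are hypothetical, chosen only for this illustration.

D_STAR = object()   # d*, the single added "non-existent" object

D = {"Caesar", "Brutus"}                 # the ordinary domain D
denot = {                                # I_S on closed terms
    "caesar": "Caesar",
    "brutus": "Brutus",
    "holmes": D_STAR,                    # terms for non-existents all denote d*
    "odysseus": D_STAR,
}
ext = {                                  # I_B: extensions are subsets of D^n
    "LivedInRome": {("Caesar",), ("Brutus",)},
    "Stabbed": {("Brutus", "Caesar")},
}

def atomic(pred, *terms):
    """I_T(P(t1..tn)) = T iff <I_S(t1), ..., I_S(tn)> is in I_B(P)."""
    return tuple(denot[t] for t in terms) in ext[pred]

def exists_pred(term):
    """E!t is true exactly when t does not denote d*."""
    return denot[term] is not D_STAR

# The anomalies disappear: neither sentence is true any longer.
holmes_in_rome = atomic("LivedInRome", "holmes")
odysseus_stabbed = atomic("Stabbed", "brutus", "odysseus")
```

Because d∗ never occurs in any extension, every true atomic sentence in this sketch contains only terms for which E! holds, which is the pattern recorded in (xxv).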
3.3 The Dual Domain Approach

“Dual domain semantics” simply split D into a pair of disjoint sets, D = D_E ∪ D_N. D_E (often called the “inner domain”) will contain the actually existing objects. D_N (the “outer domain”, which now may contain more than just one thing) will consist of the “non-existent” objects. Technically, one now can define an interpretation I = ⟨D_E, D_N, I_S, I_B, I_T⟩. The main changes will be
Free Logics
FS(1a)′′ For each variable v, I_S(v) ∈ D_E
FS(1b)′′ For each closed term t, I_S(t) ∈ D_E ∪ D_N
FB(1)′′ For each n-placed predicate symbol P^n, I_B(P^n) ⊆ (D_E ∪ D_N)^n

and

FT(4)′′ Let v be a variable. I_T(∀vA) = T iff for every d ∈ D_E, I_T^{d/v}(A) = T.14

We can define the corresponding notion of validity, and show quite straightforwardly that this class of interpretations is strongly complete with respect to Σ_PFL. And if we go to a language LE! or L=, then we will get the same result regarding the corresponding positive free logics Σ_PFLE! and Σ_PFL=. In a language Lι, the dual domain approach provides the main tools for the ranking of free description systems that I mentioned at the end of section 2.4.

In this dual domain semantics it is a bit disingenuous to speak of a term t such that I_S(t) ∈ D_N as a “non-denoting” term. Au contraire, perhaps the strongest philosophical advantage of this semantic approach is its strong adherence to the letter of the correspondence theory of truth: an elementary predication is true in virtue of a property of the things denoted or a relation among the things denoted.

On the other hand, this same consideration is one of the most controversial points in this semantic approach. For, critics are quick to carp that if an alleged object doesn’t exist then it presents no foundation on which to ground a correspondence, no input on which to predicate a property, and no pole for any polyadic relation. There have been attempts to sidestep these objections. Lambert and Meyer [1968], for instance, have shown how to work with an outer domain containing expressions describing the non-existent objects rather than the objects themselves. But this and similar devices subvert the notion of predication and the intention of FB(1)′′. Thus, for instance, according to this suggestion the extension of the predicate horse will include lots of real horses plus the expression “Pegasus.”15 So the free logician’s real choice is to accept dual domains together with their philosophical foundations, or else to go to some different semantic approach to free logic.
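A minimal sketch may help to fix the division of labour between the two domains. It assumes a toy inner domain containing Bucephalus and an outer domain containing Pegasus; the predicate names and the helper functions holds, forall and exists are illustrative assumptions. Predicates may be true of outer-domain objects, while the quantifiers range over the inner domain alone.

```python
# Sketch of a dual domain interpretation (all names are hypothetical).

D_E = {"Bucephalus"}          # inner domain: existing objects
D_N = {"Pegasus"}             # outer domain: "non-existent" objects
denot = {"bucephalus": "Bucephalus", "pegasus": "Pegasus"}   # I_S, total
ext = {                       # I_B: subsets of (D_E ∪ D_N)^n
    "Horse": {("Bucephalus",), ("Pegasus",)},
    "Real": {("Bucephalus",)},
}

def holds(pred, *objs):
    """Atomic predication: true iff the tuple of objects is in the extension."""
    return tuple(objs) in ext[pred]

def forall(prop):
    """Universal quantifier, ranging over the inner domain only."""
    return all(prop(d) for d in D_E)

def exists(prop):
    """Existential quantifier, ranging over the inner domain only."""
    return any(prop(d) for d in D_E)

pegasus = denot["pegasus"]
horse_pegasus = holds("Horse", pegasus)               # a true predication of a non-existent
all_real = forall(lambda d: holds("Real", d))         # true: every existent is Real
real_pegasus = holds("Real", pegasus)                 # false: instantiation to "pegasus" fails
pegasus_exists = exists(lambda d: d == pegasus)       # false: nothing in D_E is Pegasus
```

In this toy model “Pegasus is a horse” is true although nothing in the range of the quantifiers is identical with Pegasus, and the universal claim about Real does not carry over to the term “pegasus”.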
3.4 Inclusive Semantics
The single and dual domain approaches together with their variations can, if we allow it, admit cases in which the domain (or the domain of existing objects) is empty. The clauses will remain the same. Mostowski pointed out that in an interpretation with an empty domain clause T(4) has the effect of making all universal wffs come out true.16 The same holds for each of the free variants.

14 Notice that if I_S(t) ∈ D_N, then I itself will not be one of the interpretations that we test.
15 Lambert himself comes to admit this in [Lambert, 2003, p. 114].
16 To see this just read the clause as a conditional: “For any d, (d ∈ D ⊃ I_T^{d/x}(A) = T)”; and then read the ⊃ as a material conditional.
Of course if the domain is empty, no closed term can denote, a situation that free logic is well equipped to handle. But no variable can denote either. And this means that we must decide how to interpret open wffs. Some authors treat this by simply leaving the open wffs without truth value, and thus have only closed wff’s as logical truths. Others treat any open wff as though it was already universally quantified, and therefore as true. This latter approach has the drawback of actually restricting the validity of modus ponens. (For, P x will be true, when it is interpreted as ∀xP x on the empty domain. P x → ∃yP y will similarly be true; since it is an open wff. But ∃yP y will certainly not be true.) The various inclusive logics adopt one or the other of these approaches.
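The failure of modus ponens under the second policy can be checked mechanically. The sketch below assumes that an open wff is read as its universal closure over the domain; the helper universal_closure and the variable names are illustrative assumptions.

```python
# Sketch: open wffs read as universally quantified, over an empty domain.
# All names are hypothetical.

D = set()     # the empty domain
P = set()     # extension of P, necessarily empty as well

def universal_closure(body, domain):
    """Truth value of an open wff when it is read as universally quantified."""
    return all(body(d) for d in domain)

exists_P = any(d in P for d in D)                     # ∃yPy

premise_1 = universal_closure(lambda d: d in P, D)    # "Px", read as ∀xPx
premise_2 = universal_closure(                        # "Px → ∃yPy", read as ∀x(Px → ∃yPy)
    lambda d: (d not in P) or exists_P, D
)
conclusion = exists_P
```

Both premises come out vacuously true on the empty domain while the conclusion is false, so detaching ∃yPy from them is not truth preserving here.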
3.5 Supervaluations

The single and dual domain approaches that I have surveyed are all bivalent. That is, each uses interpretations in which I_T is a total function, defined for each wff. This choice has the strong advantage of preserving a classical propositional logic. For, were we to admit for some A that I_T(A) is undefined, then we would also naturally leave I_T(∼A) and consequently the classical tautology I_T(A ∨ ∼A) undefined as well.

This bivalence comes at a cost: For, it is very natural to assume that if s is a non-referring term, then P(s) and P(s, t) should be truth-valueless. For instance, even if we admit “Pegasus is a horse” as true, we still may want to say that “Pegasus was born on a Wednesday” simply lacks truth value. And we may well want to deny that there is any fact of the matter about the claim that Pegasus was taller than Bucephalus.

The semantics of “supervaluations” aims to accommodate these natural non-bivalent instincts while preserving the underlying classical propositional logic. The idea, informally, is that I_T will be a partial function, allowing some wffs with non-denoting terms to come out without a truth value. However, if such a wff, B, comes out true no matter how we might “complete” I_T, then B is taken to be satisfied on its own.

Here are the formal details: To define a supervaluational interpretation I = ⟨D, I_S, I_B, I_T⟩ we will assume initially that our language is LC and will define I from a partial interpretation I∗ = ⟨D, I∗_S, I∗_B⟩ in three steps:

1. Step 1: Definition of I∗
Given the domain D, define I∗_S and I∗_B over D in the manner of a single domain, partial denotation interpretation. Thus in particular we shall assume that I∗_S is not defined for a set of terms {s_i}_{i∈J} and that I∗_B(P^n) ⊆ D^n. (J may be a finite or an infinite set.) I∗ is a “partial interpretation” in the sense that I∗_T is now allowed to be a partial function, defined as follows:

T∗(1.i) I∗_T(P^n(t_1, ..., t_n)) = T if I∗_S(t_1), ..., I∗_S(t_n) are all defined and ⟨I∗_S(t_1), ..., I∗_S(t_n)⟩ ∈ I∗_B(P^n)
T∗(1.ii) I∗_T(P^n(t_1, ..., t_n)) = F if I∗_S(t_1), ..., I∗_S(t_n) are all defined and ⟨I∗_S(t_1), ..., I∗_S(t_n)⟩ ∉ I∗_B(P^n)
T∗(1.iii) I∗_T(P^n(t_1, ..., t_n)) is undefined if not all of I∗_S(t_1), ..., I∗_S(t_n) are defined.

2. Step 2: Definition of the “classical valuations”, I^k_T (where k ∈ K).17
These evaluation functions are “completions” of the partial function I∗_T, defined as follows:

(i) I^k_T(P^n(t_1, ..., t_n)) = I∗_T(P^n(t_1, ..., t_n)) if I∗_S(t_1), ..., I∗_S(t_n) are all defined
(ii) I^k_T(P^n(t_1, ..., t_n)) ∈ {T, F} (arbitrarily chosen) if not all of I∗_S(t_1), ..., I∗_S(t_n) are defined.
(iii) The clauses for compound wffs are the standard ones.

These valuation functions are “classical” in the sense that they are bivalent.

3. Step 3: Definition of the supervaluational interpretation I = ⟨D, I_S, I_B, I_T⟩
(i) I_S = I∗_S
(ii) I_B = I∗_B
(iii) I_T is defined as follows for each wff A:
SVT(i) I_T(A) = T if I^k_T(A) = T for all k ∈ K.
SVT(ii) I_T(A) = F if I^k_T(A) = F for all k ∈ K.
SVT(iii) I_T(A) is undefined otherwise.

Supervaluations provide what is called a “neutral” free logic: For, on this semantics, wffs containing non-denoting terms are neither true (as they might be in a positive free semantics) nor necessarily false (as they must be in any semantics for negative free logic). Such wffs may turn out to be simply truth-valueless.

17 The index set, K, will contain χ + 2^η such classical valuations, where χ is the number of atomic wffs all of whose singular terms denote, and η is the number of atomic wffs containing at least one non-denoting singular term. Thus if the set of non-denoting terms is infinite, K will be an uncountable set.

Conventions

It is now a fairly straightforward matter to extend the notion of a supervaluation to respect what I above called “conventions”. That is, we may append a list, Λ, of wffs, all of which we stipulate must be taken as true in each classical valuation, I^k_T. This is the standard method for preserving the special status of “t = t” in L=, and it is naturally extended once again to other atomic “analytic truths”. Once again, from the point of view of the model theory, these truths do not result from any correspondence or factual predication. But of course in any given classical valuation, I^k_T, that is no different than the attitude towards the status of the empty-term predications. Indeed, the most straightforward technique for treating these conventions is simply to limit the class of “admissible” classical valuations.

Formal Properties of Supervaluations

We can now straightforwardly define a supervaluational notion of validity and consequence. That is: ⊨ A if A is satisfied by every supervaluation, I. And similarly, Γ ⊨ A if every supervaluational interpretation I that satisfies every element of Γ also satisfies A. This readily shows that supervaluational semantics does validate (A ∨ ∼A). And in general this semantic approach does validate the classical propositional logic.

There are soundness and completeness theorems connecting the variations on the supervaluation semantics to corresponding systems of free logic. In particular, van Fraassen showed that when we add the set of conventions of the form “t = t”, then the resulting semantics is complete with respect to Σ_PFL=.

However, the non-bivalence of supervaluations does have a meta-logical cost: We do not have “strong completeness”. This is because a supervaluational interpretation, I, will satisfy P t for any singular term t only if it satisfies ∃x(x = t). That is

(xxvi) P t ⊨ ∃x(x = t).

Actually, we also have

(xxvii) ∼P t ⊨ ∃x(x = t).

But we do not have

(xxviii) P t ⊢ ∃x(x = t).

For, were (xxviii) to hold, we could also show

(xxix) ⊢ P t → ∃x(x = t),

and by soundness we would then be able to show

(xxx) ⊨ P t → ∃x(x = t).

(xxx), however, is certainly not true. For, if in any interpretation I∗_S(t) is undefined, then there will be some k such that I^k_T(P t → ∃x(x = t)) = T, and some k such that I^k_T(P t → ∃x(x = t)) = F, and so I_T(P t → ∃x(x = t)) will be undefined. (And indeed, P t → ∃x(x = t) is not a theorem of Σ_PFL=.)
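The three-step construction lends itself to a small computational sketch for a propositional fragment. Several simplifications are assumptions of the sketch rather than part of the text: formulas are nested tuples, only the gappy atoms that actually occur in a formula are completed (which gives the same verdict for that formula), and the existence claim is represented by an E!-style atom fixed as true exactly when the term denotes, standing in for ∃x(x = t). The function names classical, gappy_atoms and supervalue are hypothetical.

```python
from itertools import product

# Formulas: ("atom", P, (t1, ..., tn)), ("E!", t), ("not", A), ("or", A, B), ("imp", A, B)

def classical(f, denot, ext, completion):
    """One classical valuation: gappy atoms take their value from `completion`."""
    op = f[0]
    if op == "atom":
        _, pred, terms = f
        if all(t in denot for t in terms):
            return tuple(denot[t] for t in terms) in ext[pred]
        return completion[f]                 # arbitrarily chosen, but fixed
    if op == "E!":
        return f[1] in denot                 # same value in every completion
    if op == "not":
        return not classical(f[1], denot, ext, completion)
    if op == "or":
        return (classical(f[1], denot, ext, completion)
                or classical(f[2], denot, ext, completion))
    if op == "imp":
        return ((not classical(f[1], denot, ext, completion))
                or classical(f[2], denot, ext, completion))
    raise ValueError(f"unknown operator: {op}")

def gappy_atoms(f, denot):
    """Atoms of f that contain at least one term without a denotation."""
    if f[0] == "atom":
        return set() if all(t in denot for t in f[2]) else {f}
    if f[0] == "E!":
        return set()
    return set().union(*(gappy_atoms(sub, denot) for sub in f[1:]))

def supervalue(f, denot, ext):
    """True/False if all completions agree, None (undefined) otherwise."""
    gaps = sorted(gappy_atoms(f, denot))
    values = {classical(f, denot, ext, dict(zip(gaps, choice)))
              for choice in product([True, False], repeat=len(gaps))}
    if values == {True}:
        return True
    if values == {False}:
        return False
    return None

denot = {"bucephalus": "Bucephalus"}         # "pegasus" is left without a denotation
ext = {"P": {("Bucephalus",)}}
Pt = ("atom", "P", ("pegasus",))

gap = supervalue(Pt, denot, ext)                                   # undefined
excluded_middle = supervalue(("or", Pt, ("not", Pt)), denot, ext)  # true
conditional = supervalue(("imp", Pt, ("E!", "pegasus")), denot, ext)  # undefined
```

On this toy interpretation P t is truth-valueless, A ∨ ∼A is nevertheless true, and the conditional from P t to the existence claim is undefined, which mirrors the situation described for (xxx).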
Referential variants

Supervaluations heightened attention to the metaphysical issues of reference and predication that underlie free semantics. In particular, some critics objected to the way that classical valuations arbitrarily “complete” the initial partial valuation I∗, without resting such completions on a relation of predication between an object and its properties. The use of conventions here of course further aggravates this complaint. Thus some authors suggested that the classical valuations be completed by adding objects to the domain, and then varying the ways in which I^k_T(P^n(t_1, ..., t_n)) can turn out.

It is worth noting that such an approach does not itself rest upon a uniform ontological notion of predication. For, when we move to a language LE! we will need to add the following clause to the definition of I∗_T:

T(1b)′ I∗_T(E!t) = T if I∗_S(t) is defined; and I∗_T(E!t) = F otherwise.

However, in order to guarantee that I_T(E!s_i) = F for the non-denoting terms {s_i}_{i∈J}, we will need to require

TCV(1b.k) I^k_T(E!t) = I∗_T(E!t)

for each classical valuation I^k. And so we will once again find ourselves artificially assigning a truth value to a basic predication in a way that is not ontologically determined. (Indeed, in this case the truth value actually conflicts with the situation described by that classical valuation.)
Historical and Textual Notes

Schock [1964] and [1968] present a partial-function negative semantics. Burge [1974] has one as well. Lambert and van Fraassen [1972] present a version of the positive partial function semantics. Lambert suggested a designated-object semantics for free-description theory as part of his initial free-definite-description theory in Lambert [1963b]-[1967]. Scott [1967] contains an updated designated-object semantics. Dual domain semantics can be found in Cocchiarella [1966] and Leblanc and Thomason [1968]. The Leblanc and Thomason semantics is also inclusive. Lehmann [1994] provides semantics and a formalization of neutral free logic. Supervaluations were introduced to free logic and initially developed by van Fraassen in [1966a] and [1966b]. Subsequent discussion and refinements can be found in Skyrms [1968], Meyer and Lambert [1968], Bencivenga [1981] and Woodruff [1984].

4 ROOTS AND FRUITS: DESCRIPTIONS AND PRESUPPOSITIONS

In this part I shall briefly show that Russell’s theory of descriptions illuminates free logic’s roots in the early 20th century discussions of logical form, and also provides entree to one of free logic’s richest applications, the theory of presuppositions.
4.1 Russell’s Theory of Descriptions

When Russell first set out his theory of descriptions in [1905] he was already firmly committed to the doctrine that truth ultimately rests on singular predications, as were his philosophical foils, Frege and Meinong. Within this context Russell combated four main doctrines about singular predication, three of them from Frege, one from Meinong:

1. Frege held that definite descriptions and proper names alike are singular terms. So apparent predications in which these expressions appear have the logical form P t (or in general P^n(t_1, ..., t_n)).

2. Frege also held that when such a term, be it a name or a description, fails to denote, then the sentence of which it is a part necessarily lacks truth value. That’s because the reference of the whole sentence (i.e. its truth value) is a function of the references of its components. His example: “Odysseus was set ashore while sleeping”. Odysseus is a non-existent character, and the sentence is truth-valueless. He would treat Meinong’s “The golden mountain is golden” similarly.

3. Famously, however, Frege insists that in oblique contexts a sentence and its parts contribute not their ordinary reference, but rather their senses (which are taken now as “secondary references”). This makes it possible to say “Smith didn’t know that the morning star is the same as the evening star” without implying that Smith failed to know the trivial truth “The morning star is the same as the morning star.” And it also makes it possible to allow that “Smith believes that Penelope awaited Odysseus’s return” even though Penelope and Odysseus do not exist.18

4. Meinong agreed with Frege on (1). For him too, descriptions and names alike are legitimate singular terms. But he was unwilling to allow that a predicative form P t is a legitimate predication if t fails to denote. And he would not turn to ‘senses’ as a stopgap. So instead of denying reference to names like “Odysseus” and descriptions like “the golden mountain”, he posited a special realm of non-existent objects, with their own special non-actual sort of being, to serve as the referents for these terms.

Russell, like Meinong, rejected senses. He too believed that truth, all truth, rests on singular predication and that this predication is impossible without reference to objects. But he certainly rejected any appeal to a special world of non-existents. Indeed, using the argument I set out in 2.4, he claimed that Meinong’s metaphysical excess is self-contradictory.19

18 Frege’s suggestion to use a designated object is his way of treating non-denotation in scientific discourse. In natural discourse he preferred to leave sentences with non-denoting terms as truth-valueless.
19 Parsons [1980] has a sophisticated defense of Meinongian views. Lambert in [1991a] and [1995] established a natural connection between Russell’s argument against Meinong and Russell’s paradox in set theory.
Russell’s solution for definite descriptions

So Russell was left with the problems of how to treat sentences with non-denoting terms and how to account for truth in oblique contexts. He solved both by rejecting (1): A definite description, he claimed, is not a singular term at all, but rather it is an “incomplete symbol” which invites us to recast the apparent logical form of statements in which it appears. Thus, for instance,

(xxxi) “The present King of France is bald”

is to be recast as:

(xxxii) There is someone who is presently the King of France, and anyone who is presently King of France is that very person, and that person is bald.

Russell showed how consideration of scope can help us deal with negations, so we can distinguish between the negation of (xxxi) and the entirely different claim

(xxxiii) The present King of France is not bald

which, on Russell’s view, is consistent with (xxxii). He showed as well how this construal applies to oblique contexts, and to identity claims. Thus, for instance, he points out that when George IV asked whether Scott was the author of Waverley he was inquiring about the truth of

(xxxiv) There is someone who is the unique author of Waverley, and that person is Scott.

There is no problem here about triviality (the king is not asking whether Scott is the same as Scott), but there is also no appeal here to Fregean senses.

The question of proper names

This solved the problem of unfulfilled definite descriptions. But Russell still had to face the question of non-denoting proper names, “Odysseus”, “Pegasus”. His answer in principle: these are disguised definite descriptions. “Pegasus”, for instance, is just shorthand for “the flying horse”.20 Of course in practice, we may well not know whether or not a name actually denotes something or someone. Consequently, we will not in fact know the logical form of sentences containing such names, and indeed logical form will become undecidable.
To avoid this, Russell proposed the criterion of sensory acquaintance: a speaker can use a term as a logically proper name only when the speaker is directly sensing the very thing to which the term purports to refer. On this view it may well turn out that the only logically proper names are reports of immediate sense experience;21 but at least logical form remains decidable.22

The Formal Legacy

It is Russell, in setting out the mathematical version of his theory of descriptions, who introduced our notation of ιxA and E!. To be sure, our modern free-description theory takes

(xxxv) P^1(ιxP^2(a, x))

to be the proper formalization of (xxxi). Russell would not do so. For him (xxxv) gives merely the surface structure of (xxxi). The proper logical structure is given by

(xxxvi) ∃x(P^2(a, x) & ∀y(P^2(a, y) → y = x) & P^1(x)).

This indeed is what allows him to translate (xxxiii) as

(xxxvii) ∃x(P^2(a, x) & ∀y(P^2(a, y) → y = x) & ∼P^1(x)).

And that in turn does not contradict (xxxvi). It is a tribute to Russell’s ingenuity that despite this difference, his notation still prevails.

Philosophical Roots

The debate here is about what I called above the second and third type of existential commitment: Frege and Meinong rejected the second commitment, as does contemporary free logic. They allowed us to talk about individual objects that do not or may not exist. Russell effectively accepted the second commitment: From his point of view, one can name only existents.

Now we already find in Frege’s views the roots of the designated object semantics for scientific language, and the partial valuation semantics for natural language. And Meinong adumbrates something like the dual domain semantics. Indeed, dual domain approaches to the semantics of free logic are often called “Meinongian” semantics. We should however be careful here: Meinong believed that these non-existent objects are “incomplete”: they are undetermined regarding contingent properties. There is no truth of the matter about whether it rained last Tuesday on the golden mountain. So, in fact, his view foreshadows supervaluations with conventions.

As for Russell, he distinguishes the true logical structure of a sentence from its apparent structure. At the level of true logical form, where all the singular terms are logically proper names, Russell is the forerunner of straight classical logic, with all of its existential commitments. At the surface structure level, however, Russell has given us the philosophical basis for a strongly negative free logic.

20 Thus, though Russell rejected Frege’s requirement that even proper names have sense, along with Frege he did believe that the reference of these allegedly proper names is fixed by some description of the object to which the name refers. Thus Kripke attacks both of these views together in [1972].
21 And it may well be that direct sense data are the only true objects.
22 Hilbert and Bernays faced a similar problem in their theory of definite descriptions for a mathematical language. In Hilbert and Bernays [1934] they proposed that a mathematical definite description should count as a singular term only if it can be proved that there exists a unique individual satisfying the description. Lambert (in [2003], chapter 4) criticizes this view because of the undecidability of logical form as well as for some other reasons. Quine (in [1951] ch. 27) proposed to eliminate names altogether from a formal language in favor of predicates, for which existence and uniqueness can be proved. Thus logical form is now uniform.
4.2 Presuppositions
Free logic has found several technical applications.23 But I want here to mention only the application of free logic in the theory of presuppositions. For, this is a direct link from the roots in Russell’s description theory.

Russell’s theory of definite descriptions was a philosophical staple for almost fifty years, until Strawson challenged it in [1950]. Strawson returned to Frege’s view that allowed a sentence containing definite descriptions to lack truth value. Indeed, he held that whether such a sentence is true, false or truth-valueless depends on the context in which the sentence is uttered. Take, for example, the sentence “the present King of France is bald” uttered in 1701, versus the same sentence uttered in 2001. Because of this view he objected to Russell’s account of the logical form of sentences such as (xxxi). Indeed, if as Russell requires, sentences such as this are really to be read as (xxxvii), then we are committed to saying that (xxxi) formally implies

(xxxviii) There exists a present King of France.

Certainly, Strawson agreed, there is a logical connection between the claim that the present King of France is bald and the claim that there is presently a King of France. But, says Strawson, the relation between these two claims cannot be logical implication. For, if (xxxi) implied (xxxviii) in virtue of its logical form, then the falsity of (xxxviii) would require the falsity of (xxxi) and not, as Strawson would have it, the truth-valuelessness of (xxxi). Instead, Strawson suggested that “The present King of France is bald” presupposes “There is a present King of France”, but does not imply “There is a present King of France”.

But how is this relation of presupposition to be formally expressed? Here is where the semantics of free logic comes in.
For, van Fraassen used the fact that supervaluations are not strongly complete with respect to Σ_PFL= in order precisely to define a notion of presupposition that does what Strawson wants. Specifically, (xxvi) and (xxvii) above show that when t is non-denoting, P t will be truth-valueless. And so in general, van Fraassen defined

PS: A presupposes B if the falsity of B semantically entails that neither A nor ∼A will be true.

This simple observation and definition is in fact extraordinarily useful. Lambert, for instance, applied it in order to interpret Reichenbach’s logic of quantum mechanics. And Kit Fine has adapted it in order to provide a semantics for vague predicates which preserves classical logic. For our purposes, the one use I want to note is that this definition brings full circle the original task of free logic: For, we now have a more precise formulation of the notion of “existential presupposition” than was heretofore available; the very notion with which free logic began.

23 Quite prominently free logic has been applied to the theory of partial functions and to the study of programming languages. These studies use free logic to deal with things like the function f(x,y) = x/y, when y=0. See Gumb and Lambert [1997].

Historical and Textual Notes

Russell’s [1905] presents a less formal version of his theory of definite descriptions than the version that appears in Russell and Whitehead [1910]. The later version contains the “ιx” and “E!” notations as eliminable partial symbols given by contextual definitions. Lambert [1992] offers a close reading and criticism of the later presentation. The Meinongian views criticized by Russell are in Meinong [1899] and [1902]. Frege’s views about sense and reference appear in his famous essay, Frege [1892]. This essay also contains an early suggestion of the notion of presupposition. Strawson’s discussion of presupposition can be found in Strawson [1952] as well as [1950]. van Fraassen in [1968] credits Lambert with having suggested that he apply the method of supervaluations to the notion of presupposition. Lambert’s application of supervaluations to Reichenbach’s interpretation of quantum mechanics is in Lambert [1969a]. Kit Fine’s application to the semantics of vague predicates is in Fine [1975].
5 FREE LOGIC AND INTENSIONAL LOGICS

As I mentioned initially, one further consequence of Russell’s analysis was to harden the extensional stance within standard classical semantics. Russell was unwilling to accord semantic weight to Fregean senses, and the logical tradition that he fostered was staunchly referential and extensional. Free logic grew and flourished in this tradition. It is based upon a truth functional propositional base together with set theoretically extensional predicates and quantifiers.

However in this part I want to sketch three ways in which free logic’s concern with predication and existence forms natural links with modern logic’s great non-extensional stream. Free logic, we shall see, helps to explicate the very notion of extensionality, and it interacts subtly and symbiotically with the two chief intensional systems: modal logic and intuitionistic logic.
5.1 Varieties of Non-Extensionality
Here is a standard version of the principle of extensionality for sentences: A sentence will maintain its truth value when it is transformed by substituting a co-referential expression for one of its constituent expressions of the same sort. In a slogan: Co-extensive expressions are substitutable in sentences salva veritate. Lambert calls this notion of extensionality SV-extensionality (for salva veritate). And he argues that under a natural notion of “co-extensive predicates” the various semantics for free logics are not SV-extensional. The natural notion is simply that Q and S are co-extensive if

(xxxix) ∀x(Qx ↔ Sx)

holds.24 Now let A be the predicate “(x = x)”, B be the predicate “E!x & x = x” and C be the predicate “E!x → (x = x)”. Then A is co-extensive with B and with C. Yet B and C cannot always replace A salva veritate in free semantics. Thus in particular, when “t” is the term “Vulcan”,

(xl) (A^{t/x} ↔ B^{t/x})

will fail in any free semantics which validates (t = t), and

(xli) (A^{t/x} ↔ C^{t/x})

will fail in the semantics for strongly negative free logic, or any semantics which leaves “Vulcan = Vulcan” truth-valueless. So

(xlii) Vulcan = Vulcan

is not SV-extensional.

A second common notion of extensionality says that a sentence is extensional if its truth value depends only on the extensions (references) of its parts. And it seems obvious that this second notion of extensionality, which Lambert calls TD-extensionality (for “truth-value dependence”), entails SV-extensionality: If the truth value of a sentence is a function of the extensions of its parts, then substitution of co-extensional parts must preserve truth value. However, Lambert argues that in a strongly negative free logic (one in which even “t = t” is false for non-denoting t) the sentence (xlii) is TD-extensional. For its truth value (false) is determined simply by the fact that “Vulcan” has no extension. So, using free logic, Lambert has shown that TD-extensionality does not imply SV-extensionality.
Clearly, then, free logic does impinge on matters non-extensional; and that alone leads one to wonder whether and how formal free logics interact with the well known formal non-extensional logics.

24 He argues that this is the appropriate notion of co-extensive predicate, even for free logic, and even for dual-domain semantics. That is, the extension of a predicate is the set of existing things that satisfy that predicate.
Indeed the fact is that the motivations for free logics — considerations of predication, singularity and quantifiers — overlap naturally with foundations of modal and intuitionistic logic, and there have been a number of studies tracing the interplay between free logic and these non-extensional theories. These studies show that though we have good reason to produce hybrid formal systems here, in each case it turns out that the accompanying semantics requires a delicate balance of the free and the intensional components. And these studies show that the results in each case are advantageous for both sides of the match.
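The Vulcan example from section 5.1 can be replayed in a few lines. The sketch assumes a partial denotation function in which “mars” denotes and “vulcan” does not, and a flag that fixes the value of t = t for non-denoting t (true for a positive semantics, false for a strongly negative one); the function make_predicates and all names in it are illustrative assumptions.

```python
# Sketch: A is "x = x", B is "E!x & x = x", C is "E!x -> x = x".
# All names are hypothetical.

denot = {"mars": "Mars"}          # "vulcan" has no denotation

def make_predicates(self_identity_of_empty_terms):
    """Return A, B, C for a semantics with the given value of t = t for empty t."""
    def identical(t):
        return True if t in denot else self_identity_of_empty_terms
    def e(t):
        return t in denot
    A = lambda t: identical(t)
    B = lambda t: e(t) and identical(t)
    C = lambda t: (not e(t)) or identical(t)
    return A, B, C

A_pos, B_pos, C_pos = make_predicates(True)     # positive: Vulcan = Vulcan is true
A_neg, B_neg, C_neg = make_predicates(False)    # strongly negative: it is false

# Over denoting terms (standing in for existing objects) the three predicates agree.
coextensive = all(
    A(t) == B(t) == C(t)
    for (A, B, C) in [(A_pos, B_pos, C_pos), (A_neg, B_neg, C_neg)]
    for t in denot
)

xl_fails_positive = A_pos("vulcan") != B_pos("vulcan")     # A true, B false
xli_fails_negative = A_neg("vulcan") != C_neg("vulcan")    # A false, C true
```

The predicates agree on everything that exists, yet substituting B for A changes the truth value of the Vulcan sentence in the positive case, and substituting C for A changes it in the strongly negative case.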
5.2 Free Logic and Modal Logic

Modal logic introduces explicit propositional modal operators: □ for necessity and (dually) ♦ for possibility. Systems of modal logic are non-extensional in both of Lambert’s senses: The truth value of ♦A is independent of whether or not A is actually true; and □A may hold while □B fails even though A and B are both true. Proof theoretically, modal formal systems rest on classical propositional logic, but add a variety of additional modal axioms. The most well known such systems all share the modal axiom

K □A → (□(A → B) → □B)

and the inference rule called “Necessitation”:

NEC A / □A.

They are then distinguished by their additional modal axioms. The basic tool for the semantics for modal logic is a “Kripke-structure” consisting of a collection of “possible worlds” together with an interpretation of the language. The truth at a world of a modal compound will generally depend on the truth of the component sentences at other worlds.

Free logics go together with modality historically: Kant, for instance, links the logic of existence with his modal category of actuality.25 In our own times, modal logics provide a natural platform to discuss objects which don’t exist, but might do so. Indeed, Kripke models force us to ask about terms that have no denotation in a given world and to think about how to define the extensions of a predicate at different worlds. The answers have not always led to modal versions of free logic; but the difficulties the “non-free” answers present do recommend a free modal semantics, and thus corresponding formal systems of free modal logic.

5.2.1 Basic Modal Semantics
Let’s assume that we are dealing with a modal language, L=, that includes the identity symbol. Now an interpretation of such a language will be a 7-tuple I = ⟨W, r, R, D, I_S, I_B, I_T⟩, where W is a non-empty set (intuitively the set of possible worlds), r ∈ W is taken to be the actual world, and R is a binary relation on W (the “accessibility relation”). Intuitively, u_1 R u_2 means that u_2 is possible relative to the state of affairs u_1. The satisfaction function I_T now must always give a “world relative” notion of satisfaction; that is, I_T(u, A) ∈ {T, F}, where u ∈ W. For non-modal sentences I_T is defined just as it is defined in classical logic. And in the modal case I_T(u, □A) will be true just in case I_T(u′, A) is true for every u′ accessible to u. This captures the basic idea that the truth of □A at a world, u, amounts to the truth of A at every world that is possible with respect to u.

We will then say that a wff A is satisfied by the interpretation, I, if I_T(r, A) = T. And we can define validity and semantic entailment for classes of interpretations. Typically a class of interpretations will be the set of all the interpretations in which R has some particular relational feature (e.g. all the interpretations in which R is a symmetric relation). Kripke showed how the nature of R determines the modal axioms that will be valid with respect to that class of interpretations. Thus in particular

• The system T is derived by adding the axiom □A → A to K, and is associated with the class of interpretations in which R is reflexive;
• The system S4, derived by adding the axiom □A → □□A to T, corresponds to the class of interpretations in which R is reflexive and transitive;
• The system B, derived by adding the axiom A → □♦A to T, goes with the class of interpretations in which R is a symmetric relation;
• And the system S5, derived by adding the axiom ♦A → □♦A to T, goes together with the interpretations whose R is an equivalence relation.

25 See the chapter entitled “The Postulates of Empirical Thought” in the Critique of Pure Reason.

Single Domain Semantics

For our purposes the important questions come with the determination of D, and the valuation functions.
Here’s the apparently most straightforward such definition of a modal interpretation: An interpretation I = ⟨W, r, R, D, I_S, I_B, I_T⟩, where:
MD: D is a non-empty set.
MS: For each singular term, t, I_S(t) ∈ D.
MB: For each predicate letter P^n and each world u, I_B(u, P^n) ⊆ D^n.
MT(1): I_T(u, P^n(t1, ..., tn)) = T iff ⟨I_S(t1), ..., I_S(tn)⟩ ∈ I_B(u, P^n).
MT(2): I_T(u, ∼A) = T iff I_T(u, A) = F.
MT(3): I_T(u, A → B) = T iff I_T(u, B) = T or I_T(u, A) = F.
MT(4): Let v be a variable. I_T(u, ∀vA) = T iff for every d ∈ D, I_T^{d/v}(u, A) = T.
The idea here is that D is a single all-encompassing domain, and for each t, I_S(t) is defined once and for all over that domain. All the variation within a model structure takes place in I_B, and I_T is sensitive to that.
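The single-domain clauses can be read almost line by line as a small evaluator. The sketch below is only an illustration under stated assumptions: the tuple encoding of formulas, the function names `holds` and `subsets`, and the restriction to one-place predicates are mine, not the chapter's. It adds the necessity clause given earlier in prose, treats ♦ and ∃ as the usual duals, and then brute-forces every interpretation with two worlds and a two-element D to confirm that ♦∃xPx → ∃x♦Px (an instance of the schema the text discusses next) is never falsified when quantifiers range over the one fixed domain.

```python
from itertools import product

def holds(model, u, f, env=None):
    """Truth of formula f at world u; env maps variables to objects of D."""
    W, R, D, I_S, I_B = model
    env = env or {}
    op = f[0]
    if op == "atom":    # MT(1), restricted here to one-place predicates
        _, pred, term = f
        obj = env[term] if term in env else I_S[term]
        return obj in I_B[(u, pred)]
    if op == "not":     # MT(2)
        return not holds(model, u, f[1], env)
    if op == "imp":     # MT(3)
        return (not holds(model, u, f[1], env)) or holds(model, u, f[2], env)
    if op == "box":     # true at every world accessible from u
        return all(holds(model, v, f[1], env) for v in W if (u, v) in R)
    if op == "dia":     # dual of box
        return any(holds(model, v, f[1], env) for v in W if (u, v) in R)
    if op == "all":     # MT(4): d ranges over the whole of D
        return all(holds(model, u, f[2], {**env, f[1]: d}) for d in D)
    if op == "ex":      # dual of MT(4)
        return any(holds(model, u, f[2], {**env, f[1]: d}) for d in D)
    raise ValueError(f"unknown operator: {op}")

def subsets(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(x for x, keep in zip(items, bits) if keep)
            for bits in product([False, True], repeat=len(items))]

W, D = ("r", "u"), (1, 2)          # hypothetical two-world, two-object setting
PX = ("atom", "P", "x")
XLIV = ("imp", ("dia", ("ex", "x", PX)), ("ex", "x", ("dia", PX)))

falsifiers = 0
for R in subsets(product(W, W)):                          # every accessibility relation
    for ext_r, ext_u in product(subsets(D), repeat=2):    # every extension of P
        model = (W, R, D, {}, {("r", "P"): ext_r, ("u", "P"): ext_u})
        falsifiers += sum(not holds(model, w, XLIV) for w in W)

print(falsifiers)
```

The exhaustive search is a sanity check on small models, not a proof of validity; the general argument is the one given in the chapter's footnote to the formula.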
However, notice that this seemingly straightforward proposal for a quantified modal semantics validates the so-called “Barcan formula”:
(xliii) ∀x□Ax → □∀xAx
which in turn is equivalent to the wff
(xliv) ♦∃xAx → ∃x♦Ax.26
And that is a problem. For you don’t have to be a free logician to wonder why the fact that I might have had a sister (though I don’t) should entail the claim that there already exists someone (or something) that could have been my sister. There are some recent attempts to justify (xliv) by assuming that there are non-concrete “possible objects” in the domain. But, once again, even advocates of standard modal logic may well find this proposal abhorrent.
World-relative domains
These considerations, reminiscent of free logic, recommend that even the standard modal logician who wants to invalidate (xliv) turn to the class of interpretations that assign to each world, u, within the Kripke structure, its own domain, D_u. So now an interpretation is an 8-tuple I = ⟨W, r, R, D, D*, I_S, I_B, I_T⟩, where D* is a function from W to P(D), the power set of D. (I’ll write D*(u) as D_u.) Having decided to move in this direction, we now face four decisions. We must decide how to define the valuation functions I_S, I_B and I_T, decisions that are familiar to us from ordinary non-modal free logic. And we must decide whether and what relation there should be between the domain, D_u, of a world u and the domain, D_u′, of a world u′ that is accessible to u.
Actually simple and standard decisions on I_B, I_T and I_S will suffice to invalidate (xliv). Specifically, we will continue to define I_B(u, P^n) as a subset of D^n, keeping MB as above; but we will restrict the quantifiers at a world, u, to the domain, D_u. So clause MT(4) will be replaced by
MT(4)′ Let v be a variable. I_T(u, ∀vA) = T iff for every d ∈ D_u, I_T^{d/v}(u, A) = T.27
And as for I_S, here too we will go “world relative”. That is, we will speak of I_S(u, t). However, in this case we shall require for all terms, t, and worlds, u, that I_S(u, t) is defined, but we shall not require that I_S(u, t) ∈ D_u. So clause MS is now replaced by the equivalent:
MS′ For each singular term, t, I_S(u, t) ∈ D.
26 You can see readily why (xliv) should be valid in a semantics with a single domain: For, if I_T(u, ♦∃xAx) = T, then there will be a world u′ (accessible to u) such that I_T(u′, ∃xAx) = T. Thus, for some d ∈ D, I_T^{d/x}(u′, Ax) = T. But then I_T^{d/x}(u, ♦Ax) = T, and thus I_T(u, ∃x♦Ax) = T.
27 One must take care here in respecting the interaction between the clause governing I_T(u, □B) and the clause governing I_T(u, ∀xAx). Thus in evaluating I_T(u, ∀x□Ax) one must first build an interpretation I^{d/x} for some d ∈ D_u. Then, with that interpretation fixed, one checks each u′ ∈ W such that uRu′, to see whether I_T^{d/x}(u′, A) = T. One then repeats this process for each other d′ ∈ D_u. On the other hand, in evaluating I_T(u, □∀xAx), one evaluates I_T(u′, ∀xA) for each u′ ∈ W such that uRu′. For each such u′, the variable ranges over D_u′ and not D_u.
Given these decisions, here is the counter-example to (xliv): Let W = {r, u}, and let rRu. Let D_r = {1}, D_u = {1, 2}; I_S(r, b1) = I_S(u, b1) = 2; I_B(r, P) = I_B(u, P) = {2}. Then I_T(r, ♦∃xPx) = T, because I_T(u, Pb1) = T, and thus I_T(u, ∃xPx) = T. But I_T(r, ∃x♦Px) = F, because the only member of D_r is 1, and 1 is not in I_B(u, P).
But the problem is that these conditions on I also serve to invalidate the following wffs:
(xlv) ∀x□∃y(x = y)
(xlvi) ∃x(x = t)
(xlvii) At/x → ∃xA
And, however sweet that might be for a free logician, the fact is that each of these wffs is a theorem of any standard system of modal logic.28 And so the standard modal logician is led to tinker yet more with the basic notion of an interpretation in order to validate these wffs.
It’s the decision about accessible domains that saves (xlv). Specifically we adopt what I call the “monotonicity” condition on domains: one restricts attention to interpretations for which uRu′ entails D_u ⊆ D_u′. (xlv) is valid with respect to the class of “monotonic” interpretations, because in each such interpretation, I, once I_T^{d/x}(r, ∃y(x = y)) = T, then for all u′ such that rRu′, d ∈ D_u′ as well. So I_T^{d/x}(u′, ∃y(x = y)) = T for each such u′, and I_T(r, ∀x□∃y(x = y)) = T.
As for (xlvi) and (xlvii), let me mention two ways that have been adopted to support them.
a) Kripke’s original paper simply used a language without any individual constants or descriptions; and, like Mostowski, Kripke too construed free variables as if they were universally quantified. Thus in effect (xlvi) comes out as
(xlviii) ∀y∃x(x = y)
and (xlvii) as
(xlix) ∀y(A → ∃xAx/y).
These formulas are valid in the world-relative semantics.29
28 (xlv) is a theorem in virtue of E1, necessitation and the classical laws of quantification; (xlvi) and (xlvii) are, of course, products of classical quantification by itself.
29 We should note that Kripke also restricts the rule of necessitation, and allows it to apply only to closed wffs.
b) Another approach is to eschew bivalence. Specifically, one can modify the condition on I_B, so that I_B(u, P^n) ⊆ (D_u)^n, and then change I_T so that I_T(u, A(t1, ..., tn)) is undefined if any of I_S(u, t1), ..., I_S(u, tn) is not in D_u. In this case we will also change the definition of validity: a sentence will be valid relative to a class of interpretations if no interpretation ever makes that sentence come out false. Now ∃x(x = t) will never come out false in any world with this definition of I_T. For, if I_S(u, t) ∉ D_u, then I_T(u, ∃x(x = t)) will be undefined. Similarly, if I_S(u, t) ∈ D_u, then I_T(u, At/x → ∃xA) = T; and if I_S(u, t) ∉ D_u, then I_T(u, At/x → ∃xA) is undefined.
Other variations have been proposed. But the point here, made most forcefully by James Garson in [1991], is that the most natural approach would be simply to jettison the classical quantification rules and go directly for a free modal logic. Formal tricks such as changing the definition of validity are just that: tricks which complicate the semantics unnaturally. And indeed the monotonicity condition runs against the main point of world-relative domains (and, indeed, of modal logic in general): some things that exist might not have existed.30 It seems clear that free modal logic would be a much more natural embodiment of that premise.31
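The world-relative counter-example given above (W = {r, u}, rRu, D_r = {1}, D_u = {1, 2}, b1 denoting 2, P true of 2 only) is small enough to check mechanically. The following is a minimal sketch, assuming a tuple encoding of formulas and a helper named `holds` that are my own conventions and not part of the chapter; quantifiers are restricted to D_u as in MT(4)′, and terms are evaluated world-relatively as in MS′.

```python
W = ("r", "u")
R = {("r", "u")}                          # r sees u, nothing else
DOM = {"r": {1}, "u": {1, 2}}             # D_r and D_u
I_S = {("r", "b1"): 2, ("u", "b1"): 2}    # MS': I_S(u, t) may fall outside D_u
I_B = {("r", "P"): {2}, ("u", "P"): {2}}

def denote(u, term, env):
    """Value of a variable (from env) or of a singular term at world u."""
    return env[term] if term in env else I_S[(u, term)]

def holds(u, f, env=None):
    env = env or {}
    op = f[0]
    if op == "atom":
        return denote(u, f[2], env) in I_B[(u, f[1])]
    if op == "eq":
        return denote(u, f[1], env) == denote(u, f[2], env)
    if op == "imp":
        return (not holds(u, f[1], env)) or holds(u, f[2], env)
    if op == "dia":
        return any(holds(v, f[1], env) for v in W if (u, v) in R)
    if op == "ex":    # dual of MT(4)': witnesses must come from D_u
        return any(holds(u, f[2], {**env, f[1]: d}) for d in DOM[u])
    raise ValueError(f"unknown operator: {op}")

PX = ("atom", "P", "x")
LHS = ("dia", ("ex", "x", PX))                       # possibly, something is P
RHS = ("ex", "x", ("dia", PX))                       # something is possibly P
XLIV = ("imp", LHS, RHS)
XLVI = ("ex", "x", ("eq", "x", "b1"))                # instance of (xlvi)
XLVII = ("imp", ("atom", "P", "b1"), ("ex", "x", PX))  # instance of (xlvii)

print(holds("r", LHS), holds("r", RHS), holds("r", XLIV))
print(holds("r", XLVI), holds("r", XLVII))
```

On this one interpretation the instances of (xliv), (xlvi) and (xlvii) all come out false at r, which is the trade-off the text describes: the same freedom that blocks the Barcan-style formula also blocks the classical quantifier principles.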
5.2.2
Free Modal Logic
The basic idea here is to use a system of free logic as the quantificational side of one or another modal logic, and thereby produce modal systems that invalidate principles (xlvi) and (xlvii) and indeed replace them naturally with (xlviii) and (xlix). Thus for instance in a modal language L_{E!=}, we might adopt axiom schemata FA8 and FA9 instead of the classical A5, and then make the corresponding adjustments in the modal semantics. Certainly such a semantics will accept world-relative domains and will eschew the monotonicity condition. To be sure, the remaining adjustments (the definitions of I_S, I_B, and I_T) must be done delicately, for there are pitfalls here. In this case, though, the moves made to avoid those pitfalls have led to philosophically interesting notions. To display this I shall trace a case in point:
It is natural in such a semantics simply to accept the standard MS, MB, MT(1)–MT(4)′, and then add
FMT(1b) I_T(u, E!t) = T iff I_S(u, t) ∈ D_u.
But, in fact, we can’t do that. For, when A is a modal wff, this semantics actually invalidates the principle
(l) At/x & E!t → ∃xA,
one of the hallmarks of a free logic. Taking A to be □(x = c) and t to be c, so that the antecedent is □(c = c) & E!c and the consequent is ∃x□(x = c), provides a counter-example: For consider an interpretation, I, in which W = {r, u}; rRr and rRu; D_r = D_u = {1, 2}; I_S(r, c) = 1 and I_S(u, c) = 2. Now certainly I_T(r, □(c = c)) = T. (That’s because I_T(r, (c = c)) = I_T(u, (c = c)) = T.) Also clearly I_T(r, E!c) = T. But I_T(r, ∃x□(x = c)) = F.32 This is not an easily deflected example: unless we pick a strongly negative system of free logic (e.g., a variation on ΣSNFL=), □(c = c) will be a theorem of our free modal logic.
Kripke’s notion of rigid designation can explain what’s happening here.33 A rigid designator is a term, t, such that I_S(u, t) = I_S(u′, t) for all u and u′. Following Kripke, it is now generally accepted that proper names are rigid designators but that definite descriptions are not. (Thus the name “Carl Posy” refers to the same person in every world, while “the author of the chapter on ‘Free Logics’ in the Handbook of the History of Logic” does not. Any number of other people could well have written this chapter.) Now we can see that the problem in the counter-example to (l) and similar examples is that c is treated as a non-rigid designator (its denotation varies between r and u) while at the same time the quantifiers are insensitive to this variation. (In evaluating ∀xA at a world, we fix a denotation for x and hold this fixed even if A itself requires us to look at other worlds.)
So a tempting fix is to restrict ourselves to interpretations in which I_S always acts rigidly, just like the quantifiers: that is, for all t, u and u′, I_S(u, t) = I_S(u′, t). In such interpretations, even though the domains are world-relative, we would still speak of I_S(t). This is technically possible, but seems draconian and against the spirit of free logic. For one of the glories of free logics is their ease with non-rigid definite descriptions.
30 Moreover, it is important to note that the monotonicity condition validates the Barcan Formula for the modal semantics for systems B and S5. For in these cases R must be a symmetric relation.
31 And indeed, in this context we should note that Kripke’s semantics manages to avoid invalidating (xlvi) and (xlvii) by validating (xlviii) and (xlix), principles that are accepted in free logic.
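The counter-example to (l) turns on the mismatch between a non-rigid term and quantifiers that hold their value fixed across worlds. A minimal sketch of that interpretation follows; the formula encoding and the names `holds`, `NON_RIGID` and `RIGID` are illustrative assumptions of mine. The term assignment is passed in as a parameter so that the same formula can be evaluated once with the non-rigid c of the text and once with a rigid variant (the "tempting fix").

```python
W = ("r", "u")
R = {("r", "r"), ("r", "u")}
DOM = {"r": {1, 2}, "u": {1, 2}}          # D_r = D_u = {1, 2}

def holds(I_S, u, f, env=None):
    env = env or {}

    def val(term, world):
        # variables keep the object chosen by the quantifier;
        # singular terms are looked up world by world
        return env[term] if term in env else I_S[(world, term)]

    op = f[0]
    if op == "eq":
        return val(f[1], u) == val(f[2], u)
    if op == "E!":      # FMT(1b): t exists at u iff its denotation is in D_u
        return val(f[1], u) in DOM[u]
    if op == "and":
        return holds(I_S, u, f[1], env) and holds(I_S, u, f[2], env)
    if op == "imp":
        return (not holds(I_S, u, f[1], env)) or holds(I_S, u, f[2], env)
    if op == "box":
        return all(holds(I_S, v, f[1], env) for v in W if (u, v) in R)
    if op == "ex":      # witness d is chosen in D_u and then held fixed
        return any(holds(I_S, u, f[2], {**env, f[1]: d}) for d in DOM[u])
    raise ValueError(f"unknown operator: {op}")

ANTECEDENT = ("and", ("box", ("eq", "c", "c")), ("E!", "c"))
CONSEQUENT = ("ex", "x", ("box", ("eq", "x", "c")))
PRINCIPLE_L = ("imp", ANTECEDENT, CONSEQUENT)

NON_RIGID = {("r", "c"): 1, ("u", "c"): 2}   # the interpretation in the text
RIGID = {("r", "c"): 1, ("u", "c"): 1}       # hypothetical rigid variant

print(holds(NON_RIGID, "r", ANTECEDENT), holds(NON_RIGID, "r", CONSEQUENT))
print(holds(NON_RIGID, "r", PRINCIPLE_L), holds(RIGID, "r", PRINCIPLE_L))
```

With the non-rigid assignment the antecedent holds at r and the consequent fails, so the instance of (l) is false there; making c rigid restores it, at the cost the text goes on to describe.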
A more subtle approach is to let the quantifiers act less rigidly. A main tool that has been suggested here is the notion of an “individual concept”: a function from worlds to objects, which picks out an object at each world. Thus, for instance, the term “The author of the ‘Free Logics’ chapter in the Handbook of the History of Logic” would pick out Carl Posy in the actual world, J. K. Lambert in a different world, and E. Bencivenga in yet a third. We would then assign individual concepts to our singular terms. Since the concept is already flexible, the assignment itself need not be world-relative. The satisfaction conditions for quantified wffs would then be:
FMT(4) I_T(u, ∀xA) = T iff for every individual concept f : W → D such that f(u) ∈ D_u, I_T^{f/x}(u, A) = T
32 This is because I_T^{1/x}(u, x = c) = F, so I_T^{1/x}(r, □(x = c)) = F. And similarly I_T^{2/x}(r, x = c) = F, so again I_T^{2/x}(r, □(x = c)) = F. Thus, since there is no d ∈ D_r such that I_T^{d/x}(r, □(x = c)) = T, we get I_T(r, ∃x□(x = c)) = F.
33 See Kripke [1972].
Here, the notation I^{f/x} means that we use the value of f at the world in which we are working in order to evaluate A. When A is a non-modal wff, this operates just as before. But when the structure of A sends us to worlds u′ which are accessible to u, we then use f(u′) rather than f(u) in evaluating A. Certainly this technical approach will once again validate (l) in general, and modal versions in particular. Moreover, the notion “individual concept” is a very rich philosophical tool. Carnap in [1947], for instance, uses it to define the “intension” of a singular term, and takes this to be a more precise rendition of Frege’s notion of the “sense” of such a term.
But this suggestion is too flexible. For, let us assume that each local domain is non-empty. Then for each u and any d ∈ D_u we can easily and artificially construct an individual concept, f, such that f(u) = d, and such that for each u′ accessible to u, f(u′) ∈ D_u′. Such a “choice function” need have no rhyme or reason to it. But I_T^{f/x}(u′, ∃y(x = y)) = T for each accessible u′. So I_T^{f/x}(u, □∃y(x = y)) = T, and thus I_T(u, ∃x□∃y(x = y)) = T. It thus turns out that
(li) ∃x□∃y(x = y)
is a valid wff under this semantics, a wff which claims that there exists a necessary entity. So the semantics seems to give us a cheap version of the ontological argument, certainly an unintended and undesirable consequence.
So finally we are led to an intermediate and more balanced suggestion: assign world-relative functions. That is, once again define I_S(u, t), but have I_S(u, t) itself be a function from worlds to individuals. (These functions are sometimes called “individual substances”.) In this case it is straightforward to validate
(lii) E!c → ∃x(x = c)
and in general
(liii) A & E!t → ∃xAx/t,
and to invalidate the cheap ontological argument, ∃x□∃y(x = y). This is the desired formal result.
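The objection to quantifying over arbitrary individual concepts can be seen on a two-world interpretation whose local domains are disjoint. The sketch below is an illustration under my own assumptions: the model, the encoding, and the trick of representing objectual quantification as quantification over constant concepts are not from the chapter. It evaluates ∃x□∃y(x = y) twice, once with the quantifiers ranging over constant (rigid) concepts, which reproduces the object-based clause, and once over all concepts as in FMT(4).

```python
from itertools import product

W = ("r", "u")
R = {("r", "r"), ("r", "u")}
D = (1, 2)
DOM = {"r": {1}, "u": {2}}       # hypothetical disjoint, non-empty local domains

# Every individual concept: a function from worlds to objects, stored as a dict.
CONCEPTS = [dict(zip(W, values)) for values in product(D, repeat=len(W))]
# Constant concepts behave exactly like a fixed object d held across worlds.
CONSTANT = [c for c in CONCEPTS if len(set(c.values())) == 1]

def holds(u, f, env, candidates):
    """env maps variables to concepts; a variable's value at u is env[x][u]."""
    op = f[0]
    if op == "eq":
        return env[f[1]][u] == env[f[2]][u]
    if op == "box":
        return all(holds(v, f[1], env, candidates) for v in W if (u, v) in R)
    if op == "ex":     # dual of FMT(4): the concept must land in D_u at u
        return any(holds(u, f[2], {**env, f[1]: c}, candidates)
                   for c in candidates if c[u] in DOM[u])
    raise ValueError(f"unknown operator: {op}")

LI = ("ex", "x", ("box", ("ex", "y", ("eq", "x", "y"))))

objectual = holds("r", LI, {}, CONSTANT)    # quantifiers over fixed objects
conceptual = holds("r", LI, {}, CONCEPTS)   # quantifiers over all concepts
print(objectual, conceptual)
```

With fixed objects the formula fails at r, since the only member of D_r does not exist at u. With unrestricted concepts it succeeds, because the "choice function" sending r to 1 and u to 2 counts as a witness, which is exactly the cheap necessary entity the text objects to.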
But more importantly, these formal concerns have led us to an interesting and right notion of entity for the range of our quantifiers, a notion which puts the flexibility in the place where it is needed, and does not force us to entertain pathological entities. So this wedding of free and modal logics revealed the interaction between intension and existence and showed a perhaps unexpected correlation between the action of the quantifiers and the flexibility of reference.
Historical and Textual Notes
Lewis and Langford [1932] set out the main hierarchy of formal systems of modal logic. Kripke’s semantics were initially presented in Kripke [1959] and [1963]. The Barcan formula stems initially from Ruth Barcan Marcus [1946]. Williamson [1998] and [2000] attempt to support the single domain approach. Treatments of the non-bivalent approaches as well as various inclusion relations among world-relative domains can be found in Hughes and Cresswell [1968] and [1996] and in Gabbay [1976]. Carnap [1947] introduced the notion of individual concepts into the semantics of modal logic. Scott [1970] favors this notion. A version of the notion of individual substance can be found in Thomason [1969]. Garson [1991] presents a comprehensive account of free semantics for modal logic together with a discussion of the completeness proofs for this semantics.
5.3
Free Logic and Intuitionism
In the last section we saw a natural alliance between free logic and quantified modal logic. In this section I would like to show an even more strikingly symbiotic relation between free logic and intuitionistic logic. Once again there is a natural philosophical affinity here. Brouwer founded intuitionism to limit what he took to be the excessive existential claims of classical mathematics; and his student, Heyting, axiomatized Intuitionistic Logic in order to formalize the limited circumstances under which Brouwer would allow us to claim that an object exists. Indeed, much like free logic, intuitionistic logic aims to block existential claims that arise from logic or language alone. But we’ll see that the standard proof theoretic formulation of intuitionistic logic does not quite live up to this goal, so that a free intuitionistic formal logic is much to be desired. We will also see that the considerations of free logic drive a wedge between the two standard approaches to intuitionistic semantics, each with flaws of its own, and that a free intuitionistic semantics once again provides both technical and philosophical advantages.
5.3.1
The formal system ΣI
Intuitionism’s strict constraints on existence are reflected in strict constraints on assertability and ultimately in the rules of logic. Thus many of the theorems of ΣC are not intuitionistically provable. (p ∨ ∼p), the law of excluded middle, is the most famous of these, but the equivalences which allowed us heretofore to view dual connectives as abbreviations also fail. Thus for intuitionism we must always choose a language from the full morphology MI, and axiomatizations of intuitionistic logic must have principles governing each one of the connectives. Here is an axiomatization of intuitionistic logic adapted from Kleene [1952] and van Dalen [1986]. Axiom Schemata:
AI1. A → (B → A)
AI2. (A → B) → ((A → (B → C)) → (A → C))
AI3. A → (B → (A & B))
AI4a. (A & B) → A
AI4b. (A & B) → B
AI5a. A → (A ∨ B)
AI5b. B → (A ∨ B)
AI6. (A → C) → ((B → C) → ((A ∨ B) → C))
AI7. (A → B) → ((A → ∼B) → ∼A)
AI8. A(t) → ∃xA(x)
AI9. ∀xA(x) → Ax/t (where t is free for x in A and x is not free in A)
AI10. A → (∼A → B)
The inference rules are:
1. Modus Ponens
2. A → B(x) / A → (∀x)B(x)
3. A(x) → B / (∃x)A(x) → B
The definitions of derivation, theoremhood (⊢I A) and logical consequence (Γ ⊢I A) are standard. This logic is, once again, intensional in both of Lambert’s senses: Gödel proved that it is strongly not truth functional (indeed, that no truth tables with a finite set of truth values can characterize its propositional part). And its application in Brouwer’s theory of “choice sequences” shows that it does not respect substitution of coextensive expressions salva veritate.34
In this system none of the following wffs is provable as a theorem:
(liv) ∼∀x∼A → ∃xA
(lv) (A → ∃xB(x)) → ∃x(A → B(x))
(lvi) (∀xA(x) → B) → ∃x(A(x) → B)
(lvii) [∀x(A(x) ∨ ∼A(x)) & ∼∼∃xA(x)] → ∃xA(x)
(lviii) ∀x∃yA(x, y) ∨ ∃x∀y∼A(x, y)
(lix) ∀xAx ∨ ∃x∼Ax
So the system ΣI does indeed put strong constraints on existential provability. Moreover, the system has the constructive property of explicit definability: If Γ ⊢I ∃xA(x) then there is a closed singular term t such that Γ ⊢I At/x. All this is very much in line with the “spirit” of free logic.
The “non-free” aspects of ΣI: But in fact the spirit of free logic goes only “so far” in this standard intuitionistic logic. For ΣI is formulated so that when Axiom Schema (AI10) is replaced by
AI10′ ∼∼A → A
the result is a system of standard classical logic, equivalent to ΣC. (Similarly, if we go to a language LI= and add the standard axioms for identity, the resulting system (ΣI=) will be equivalent to ΣC=.) And that means that the determining difference between ΣC and ΣI resides on the propositional side and not at the quantificational level. Indeed, ΣI and ΣI= harbor all three of the existential commitments of ΣC and ΣC=.
On the one hand many formalizations of intuitionistic mathematics resort to contrived techniques in order to preserve Heyting’s logic. They tend to admit only provably total functions, and they restrict singular terms to those which are guaranteed to denote. But let me point out that, contrary to Heyting’s intention, these restrictions deviate from Brouwer’s actual mathematical practice. Brouwer often introduced singular terms which are not known to denote any provably existent object, and he quite explicitly denies that every expressible real valued function is total. Clearly the hygiene of a free intuitionistic logic is called for here.
34 Posy [2001] gives examples that demonstrate this point.
5.3.2
Standard Model Theoretic Semantics
At its inception, intuitionistic logic inspired topological and algebraic semantics, but in 1955 E. Beth produced a model theoretic semantics, and ten years later Kripke adapted his modal semantics and gave a Kripke-style semantics for intuitionistic logic.35 This is now the industry standard. So once again an interpretation, I, will be an ordered 8-tuple ⟨W, r, R, D, D*, I_S, I_B, I_T⟩. However, there are important heuristic and formal differences between the modal and the intuitionistic Kripke-style interpretations.
Heuristically, intuitionism rests on an assertability (and not a correspondence) theory of truth, that is, a semantic theory which rests truth on knowability or provability. As Kripke points out, this means that the nodes in an intuitionistic Kripke model represent states of knowledge (epistemic situations) about the world, rather than states of the world per se, and that the accessibility relation represents possible increases in knowledge rather than possible different states of affairs. r represents the current state of knowledge, and a domain D_u consists of the objects known to exist (or in mathematics, the objects that have been constructed) at state u.
Formally, this means that the basic recursion clauses of I_T for some of the logical particles will already refer to accessible nodes. (Thus, for instance, since falsity outright (∼A) is a much stronger claim than mere ignorance, the condition for ∼A will require looking ahead to show that A never comes to be known.) It means that R will always be reflexive and transitive (we assume that knowledge increases consistently) and that the domains will now quite naturally increase monotonically. Indeed, we now will naturally require that I_B “increase monotonically” as well. (A predication, once known to be true, is not forgotten as we learn additional truths.) And finally, it means that all singular terms will refer rigidly; for the nodes here do not represent different possible states of affairs but rather increasing knowledge about a single stable state of affairs.
As I said, this overall semantic framework covers a pair of distinct approaches to the Kripke-style semantics, one standardly objectual, one quasi-substitutional. The first is flawed technically, the second philosophically.
5.3.2.1 The objectual semantics
This approach continues the style of modal semantics we saw above. Here are the clauses:
ID: D is a non-empty set.
ID*: If uRu′ then D_u ⊆ D_u′.
IS: For each singular term, t, I_S(t) ∈ D.
IB: a) For each predicate letter P^n and each world u, I_B(u, P^n) ⊆ (D_u)^n. b) If uRu′ then I_B(u, P^n) ⊆ I_B(u′, P^n).
IT0: I_T(u, A) ∈ {1, 0}.36
IT(1): I_T(u, P^n(t1, ..., tn)) = 1 iff ⟨I_S(t1), ..., I_S(tn)⟩ ∈ I_B(u, P^n).
IT(2): I_T(u, A & B) = 1 iff I_T(u, A) = I_T(u, B) = 1.
IT(3): I_T(u, A ∨ B) = 1 iff either I_T(u, A) = 1 or I_T(u, B) = 1.
IT(4): I_T(u, A → B) = 1 iff for each u′ such that uRu′ and I_T(u′, A) = 1, I_T(u′, B) = 1.
IT(5): I_T(u, ∼A) = 1 iff for each u′ such that uRu′, I_T(u′, A) = 0.
IT(6): Let v be a variable. I_T(u, ∀vA) = 1 iff for every u′ such that uRu′ and every d ∈ D_u′, I_T^{d/v}(u′, A) = 1.
IT(7): Let v be a variable. I_T(u, ∃vA) = 1 iff for some d ∈ D_u, I_T^{d/v}(u, A) = 1.
35 Kripke was inspired by a theorem of Gödel’s showing the connection between intuitionistic logic and S4.
An interpretation I satisfies a wff A if I_T(r, A) = 1. Validity and semantic entailment are defined as usual.
Here is an interpretation, I, which invalidates the principle of excluded middle and formula (liv): W = {r, u, w}; rRu and rRw (plus of course rRr, uRu, and wRw); I_S(a1) = d1, I_S(a2) = d2 and I_S(a3) = d3; D_r = {d1}, D_u = {d1}, D_w = {d1, d2, d3}; I_B(r, P) = ∅, I_B(u, P) = {d1}, and I_B(w, P) = {d2, d3}.
To invalidate excluded middle (in the form of (Pa1 ∨ ∼Pa1)) notice that I_T(r, Pa1) = 0 (in virtue of IB(a) and IT(1)); but that I_T(r, ∼Pa1) = 0 as well. (That’s because of IT(5): Pa1 holds at the accessible node u.) So I_T(r, Pa1 ∨ ∼Pa1) = 0 too.
36 In intuitionism one standardly uses {1, 0} instead of {T, F} to emphasize the fact that I_T(u, A) = 0 indicates our ignorance of A at the state u, and not A’s falsity.
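Clauses IT(1) to IT(7) are directly executable on finite models, so the interpretation just described can be checked by machine. The sketch below is illustrative only: the tuple encoding of formulas and the names `holds`, `reflexive`, `M3` and `M2` are my own. It evaluates the excluded-middle instance at r and, looking ahead, the instance of (liv) and the two axiom instances (for AI(8) and AI(9)) that the text goes on to test, the last on the second, two-node interpretation given there.

```python
def holds(model, u, f, env=None):
    """I_T(u, f) = 1 is rendered as True, I_T(u, f) = 0 as False."""
    W, R, DOM, I_S, I_B = model
    env = env or {}
    later = [v for v in W if (u, v) in R]   # nodes accessible from u, u included
    op = f[0]
    if op == "atom":    # IT(1), one-place predicates only
        obj = env[f[2]] if f[2] in env else I_S[f[2]]
        return obj in I_B[(u, f[1])]
    if op == "and":     # IT(2)
        return holds(model, u, f[1], env) and holds(model, u, f[2], env)
    if op == "or":      # IT(3)
        return holds(model, u, f[1], env) or holds(model, u, f[2], env)
    if op == "imp":     # IT(4): checked at every accessible node
        return all((not holds(model, v, f[1], env)) or holds(model, v, f[2], env)
                   for v in later)
    if op == "not":     # IT(5): f[1] is never verified later
        return all(not holds(model, v, f[1], env) for v in later)
    if op == "all":     # IT(6): every later node, every object known there
        return all(holds(model, v, f[2], {**env, f[1]: d})
                   for v in later for d in DOM[v])
    if op == "ex":      # IT(7): a witness already in D_u
        return any(holds(model, u, f[2], {**env, f[1]: d}) for d in DOM[u])
    raise ValueError(f"unknown operator: {op}")

def reflexive(W, pairs):
    return set(pairs) | {(w, w) for w in W}

def P(term):
    return ("atom", "P", term)

W3 = ("r", "u", "w")
M3 = (W3, reflexive(W3, [("r", "u"), ("r", "w")]),
      {"r": {"d1"}, "u": {"d1"}, "w": {"d1", "d2", "d3"}},
      {"a1": "d1", "a2": "d2", "a3": "d3"},
      {("r", "P"): set(), ("u", "P"): {"d1"}, ("w", "P"): {"d2", "d3"}})

W2 = ("r", "u")
M2 = (W2, reflexive(W2, [("r", "u")]),
      {"r": {"d1"}, "u": {"d1", "d2"}},
      {"a1": "d1", "a2": "d2"},
      {("r", "P"): {"d1"}, ("u", "P"): {"d1", "d2"}})

EM = ("or", P("a1"), ("not", P("a1")))
LIV = ("imp", ("not", ("all", "x", ("not", P("x")))), ("ex", "x", P("x")))
AI8 = ("imp", ("imp", P("a3"), P("a2")), ("ex", "x", ("imp", P("x"), P("a2"))))
AI9 = ("imp", ("all", "x", P("x")), P("a2"))

results = {"EM": holds(M3, "r", EM), "LIV": holds(M3, "r", LIV),
           "AI8": holds(M3, "r", AI8), "AI9": holds(M2, "r", AI9)}
print(results)
```

All four formulas come out unsatisfied at r. The first two are the intended intuitionistic failures; the last two are instances of axioms of ΣI, which is the unsoundness the chapter points to.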
To invalidate formula (liv) (in the form ∼∀x∼Px → ∃xPx), note that I_T(u, ∀x∼Px) = 0,37 and similarly I_T(w, ∀x∼Px) = 0. Thus I_T(r, ∀x∼Px) = 0 as well. (Since rRu and rRw.) But then I_T(r, ∼∀x∼Px) = 1. But I_T(r, ∃xPx) = 0. For there is no d ∈ D_r such that I_T^{d/x}(r, Px) = 1. So I_T(r, (∼∀x∼Px → ∃xPx)) = 0.
These counter-examples use the asymmetry between the conditions on universal and existential quantification, a facet that is well in the spirit of free logic. But in this case this “free” spirit goes deep indeed; so deep that it produces a real discrepancy between the semantics and the less “free” formal system ΣI. For the same interpretation, I, that I sketched above will invalidate axiom AI(8) (in the form (Pa3 → Pa2) → ∃x(Px → Pa2)): we need only note that I_T(r, (Pa3 → Pa2)) = 1,38 but that I_T^{d1/x}(r, (Px → Pa2)) = 0 (in virtue of u).39 Thus I_T(r, ∃x(Px → Pa2)) = 0.
And, indeed, axiom AI(9) of ΣI is also not valid in this semantics. Here’s a simple interpretation I that invalidates that axiom schema (in the form ∀xPx → Pa2): W = {r, u}; rRu (plus of course rRr and uRu); I_S(a1) = d1, I_S(a2) = d2; D_r = {d1}, D_u = {d1, d2}; I_B(r, P) = {d1} and I_B(u, P) = {d1, d2}. Under this interpretation, I_T(r, ∀xPx) = 1, but I_T(r, Pa2) = 0. So I_T(r, ∀xPx → Pa2) = 0. So the system ΣI is not sound with respect to this semantics!
5.3.2.2 The expanding-language semantics
This unsoundness results from a dissonance between those wffs whose satisfaction at a node is determined solely by local conditions and those whose satisfaction may rest on other accessible nodes as well. These latter wffs can be satisfied at a node even though they may contain terms that have no corresponding object at that node. Now many authors avoid this unsoundness result by restricting our ability to form wffs with such “non-denoting” terms. They restrict variables to range over the domain of a node; they introduce “parameters” at each node to stand for the elements at that node, and they increase the class of wffs accordingly.40 Formally, on this approach, each node, u ∈ W, will have its own language, L_I^u, which is gotten from LI by adding constants to denote each of the elements of D_u. (If they have constants at all in the original language, LI, then they insist that the constants all denote at r.) Standardly, then, instead of I_S one simply identifies the parameters of L_I^u with the elements of D_u themselves. The important changes on this approach are that IT(6) becomes
IT(6)′ Let v be a variable. I_T(u, ∀vA) = 1 iff for every u′ such that uRu′ and every d ∈ D_u′, I_T(u′, Ad/v) = 1
and IT(7) becomes
IT(7)′ Let v be a variable. I_T(u, ∃vA) = 1 iff for some d ∈ D_u, I_T(u, Ad/v) = 1
On this approach we simply cannot form the wffs that would invalidate AI(8) and AI(9). Some authors appeal to Brouwer’s aversion to the independence of language in order to justify this semantic approach. We cannot refer, they say, to things that we have not yet constructed. But in fact, this justification is disingenuous: for, recall, Kripke’s semantics doesn’t aim to depict the correspondence theory of truth, and the relation set out by I_S is not ontological reference. We are dealing here simply with thought experiments about possible future states of knowledge.
Indeed, this appeal is doubly improper. For, as I mentioned in §2.1, restrictions of this sort say that at node u we may not even think about those not-yet-proved-to-exist objects; yet, on the other hand, clause IT(6)′ requires us to think already at u about precisely those objects that have not yet come to be constructed or proved to exist. Moreover, this approach is in fact inconsistent with intuitionistic mathematical practice, including Brouwer’s. For, as I said, Brouwer frequently argues by reductio, naming objects which he will later show not to exist. And he often considers functions whose values may not yet be known, or even known to exist at all.
37 Since I_T^{d1/x}(u, Px) = 1, and uRu.
38 Since w is the only node in which Pa3 holds, and in w Pa2 holds as well.
39 This is because there are no other elements of D_r to test.
40 In those languages containing function symbols they once again restrict themselves to provably total functions.
5.3.3
Full Free Intuitionistic Logic
So it would be best simply to expand the logic to an explicitly free logic, and to acknowledge that change by adopting an explicitly free semantics. Several free semantics have been proposed for intuitionistic logic. But in fact adapting the objectual Kripke modeling of 5.3.2.1 gives a straightforward and flexible approach to free intuitionistic semantics. It combines many of the general free-logical approaches, and, as it turns out, this combination actually avoids some of the shortcomings that the various approaches to free semantics had on their own.
The most natural version, a positive free semantics, will drop the restriction of I_B(u, P^n) to (D_u)^n, and simply replace IB with:
IB′ a) For each predicate letter P^n and each world u, I_B(u, P^n) ⊆ D^n
b) If uRu′ then I_B(u, P^n) ⊆ I_B(u′, P^n)
Now, on the one hand, IB′ (a) looks like the single domain approach of modal logic. But — and here is a philosophical advantage — it does not lead to the excessive metaphysics of such modal semantics. For, once again, we must keep in mind that an intuitionistic interpretation concerns the growth of knowledge about the elements of a given objective situation. Non-denotation at a node need not indicate unrealized metaphysical possibility. On the other hand, this understanding of the semantics does not prevent us from expressing and incorporating full fledged non-existence. Indeed, we can move to a language LIE! , and consider a variety of free principles. In general such a language requires, as in the modal case:
FIT(1b) I_T(u, E!t) = 1 iff I_S(u, t) ∈ D_u.
And then I_T(u, ∼E!t) = 1 would say that t is known not to exist. Combining IB′ and FIT(1b) would give a straightforward positive semantics. The corresponding formal system ΣIFP will replace axiom scheme AI(9) with the axiom schemata FI(8) and FI(9). But ΣIFP will also need to replace AI(8) with its free version:
FAI(8) A(t) & E!t → (∃x)A(x).
This version of the semantics is strongly complete with respect to ΣIFP. Similarly, we could return to the more stringent condition IB on predication. With “E!” in the language, however, this is now strongly complete for an explicitly negative free logic ΣIFN which includes the axiom FA10.
We can also move in the intermediate direction, and produce a semantics which allows “analytic” truths for non-denoting terms. In such a semantics we would keep IB, but change IT(1) as follows:
IT(1)′ I_T(u, P^n(t1, ..., tn)) = 1 iff
a. ⟨I_S(t1), ..., I_S(tn)⟩ ∈ I_B(u, P^n), or
b(i). there is a u′ ∈ W such that I_S(t1) ∈ D_u′, and ..., and I_S(tn) ∈ D_u′; and
b(ii). for all u′ ∈ W, if (I_S(t1) ∈ D_u′, and ..., and I_S(tn) ∈ D_u′) then ⟨I_S(t1), ..., I_S(tn)⟩ ∈ I_B(u′, P^n).
IT(1)′ has a dual effect. On the one hand it allows cases in which ∼E!t holds at a node, but so does Pt. On the other hand it restricts those cases to analytic truths, so to speak. Indeed, the intuitionistic language can now make this explicit. For the semantics with IB and IT(1)′ validates the wff:
(lx) P(t1, ..., tn) ∨ (P(t1, ..., tn) → (E!t1 & ... & E!tn))
which says that either P(t1, ..., tn) is analytic or else it can only be true “synthetically”, in virtue of a relation among objects actually known to exist. It is important here that clauses b(i) and b(ii) of IT(1)′ do not stipulate uRu′. In the case that P(t1, ..., tn) is analytic, its truth at a node is determined even by inaccessible nodes. Here, by the way, is yet another philosophical plus.
These inaccessible nodes are the true equivalents of free logic’s outer domains. But they do not represent some metaphysically arcane objects. They simply map out what it is to think now about something that isn’t or isn’t yet constructed.
Finally, let me mention that one can take a partial valuation approach as well, an approach in which I_S(t) might indeed be undefined for some t at a given node, u. In this case one would modify IS as follows:
FIS′(1a) I_S(u, t) ∈ D_u if I_S(u, t) is defined
FIS′(1b) I_S(u, t) = d and uRu′ ⇒ I_S(u′, t) = d
We would then accept the original $I_B$ and adapt $I_T$ to this new situation as follows:

FIT(1b)′ $I_T(u, E!t) = 1$ iff $I_S(u, t)$ is defined.

Technically speaking, this approach adds nothing. For it is formally equivalent to the approach above which validates FA(10). Philosophically, however, this catches the intuition that nothing can be said of a non-existent thing, but it does this without falling into the accompanying sense of deceit or impracticality. This also has the philosophical advantage of giving a non-bivalent positive semantics, which (unlike the combination of supervaluations and conventions or ontological completions) is indeed based upon a uniform concept of predication.

So, as in the modal case, we find in the marriage of free logic with intuitionism a mutually beneficial union. It is a natural logic for intuitionism: sound, doctrinally appropriate and practical. And it presents a field for free semantics that neutralizes some philosophical objections that plagued the extensional approaches.

Historical and Textual Notes
Brouwer set out the basic tenets of intuitionism in his doctoral thesis, Brouwer [1907], and pursued them in a series of papers and lectures mainly collected together in Brouwer [1975]. Heyting’s axiomatization appeared in Heyting [1930]. The present version is derived from Kleene [1952] and van Dalen [2002]. Gödel [1932] showed that intuitionistic logic cannot be characterized by a finite number of truth values. Topological semantics were initiated by Stone [1937] and Tarski [1938]. Algebraic semantics stem from McKinsey and Tarski [1948]. The first multi-nodal model theoretic semantics were in Beth [1956]. Kripke’s version was set out in Kripke [1965]. Gödel [1933] showed the connection between intuitionistic logic and S4 that inspired this modeling. As in the case of modal logic, Kripke’s semantics for intuitionistic logic was presented for a language without individual constants.
Objectual versions of the Kripke semantics (for which the unsoundness result holds) are in Thomason [1968] and Dummett [2000]. (Dummett takes each $D_u$ to be a subset of the natural numbers, while the initial language contains closed terms for all of the natural numbers.) The expanding-language approach can be found in Aczel [1968] and Fitting [1969], both of whom appeal to Brouwerian constructivist views to justify this approach. This approach is prominent in Troelstra and van Dalen [1988], van Dalen [1997] and most recently in Mints [2000]. Leblanc and Gumb [1984] extend this to free semantics. Scott [1979] presents a free intuitionistic semantics, though based on a generalization of the topological interpretation. Stenlund [1975] concentrates on definite descriptions. The objectual free semantics set out above is mainly derived from the approach of Posy [1982].
Summary
Free logics carry Aristotelian logic’s concern with existential commitment into the field of modern logic. They use both syntactic and semantic tools to analyze, to
refine and ultimately to combat modern logic’s own existential commitments; and they are extraordinarily sensitive to the modern view of logical form. Indeed, they and their applications are technical and philosophical heirs to the debates about singular predication and quantification that took place at the dawn of modern logic. Finally, though free logics traditionally developed in the extensional tradition, they in fact interact productively with modal and intuitionistic logics, the staples of modern logic’s intensional stream.

ACKNOWLEDGEMENT
Research for this chapter was supported by grant # 91402-1 from the Israel Science Foundation. The author is also indebted to Mark van Atten for helpful comments and references, and to the editors for their patience.

BIBLIOGRAPHY
[Aczel, 1968] P. Aczel. Saturated intuitionistic theories, in H. Schmidt, K. Schütte and H. Thiele, eds., Contributions to Mathematical Logic, North Holland Publishing Company, 1968.
[Barcan Marcus, 1946] R. Barcan (Marcus). A functional calculus of first order based on strict implication, Journal of Symbolic Logic, 11, 1–16, 1946.
[Beth, 1955] E. Beth. Semantic construction of intuitionistic logic, Mededelingen der Koninklijke Nederlandse Akademie van Wetenschappen, n.s. 18, 572–577, 1955.
[Bencivenga, 1981] E. Bencivenga. Free semantics, Boston Studies in the Philosophy of Science, 47, pp. 31–48, 1981. Reprinted in [Lambert, 1991].
[Bencivenga, 2002] E. Bencivenga. Free Logics, in [Gabbay and Guenthner, 2002, pp. 147–196].
[Brouwer, 1907] L. E. J. Brouwer. Over de Grondslagen der Wiskunde, Ph.D. thesis, University of Amsterdam; translated as “On the foundations of mathematics”, in [Brouwer, 1975, pp. 11–101].
[Brouwer, 1975] L. E. J. Brouwer. Collected Works, volume 1, A. Heyting, ed. North Holland Publishing Company, 1975.
[Burge, 1974] T. Burge. Truth and Singular Terms, Noûs, 8, 309–325, 1974. Reprinted in [Lambert, 1991].
[Carnap, 1947] R. Carnap.
Meaning and Necessity, University of Chicago Press, 1947.
[Cocchiarella, 1966] N. Cocchiarella. A logic of possible and actual objects, Journal of Symbolic Logic, 31, 688, 1966.
[Dummett, 2000] M. Dummett. Elements of Intuitionism, 2nd Edition, Oxford University Press, 2000.
[Fine, 1975] K. Fine. Vagueness, truth and logic, Synthese, 30, 265–300, 1975. Reprinted in Keefe, R. and P. Smith, Vagueness, A Reader, MIT Press, 1997.
[Fine, 1983] K. Fine. The permutation principle in quantificational logic, Journal of Philosophical Logic, 12, 33–37, 1983.
[Fitting, 1969] M. Fitting. Intuitionistic Logic, Model Theory and Forcing, North Holland Publishing Company, 1969.
[Frege, 1892] G. Frege. Über Sinn und Bedeutung, Zeitschrift für Philosophie und philosophische Kritik, 100, 25–50, 1892.
[Gabbay, 1976] D. Gabbay. Investigations in Modal and Tense Logic with Applications to Problems in Philosophy and Linguistics, Reidel, 1976.
[Gabbay and Guenthner, 2002] D. Gabbay and F. Guenthner, eds. Handbook of Philosophical Logic, 2nd Edition, volume 5, Kluwer, 2002.
[Garson, 1991] J. Garson. Applications of free logic to quantified intensional logic, in [Lambert, 1991, pp. 111–144].
[Gödel, 1932] K. Gödel. Zum intuitionistischen Aussagenkalkül, Anzeiger der Akademie der Wissenschaften in Wien, 69, 65–66, 1932. Reprinted and translated in [Gödel, 1986, pp. 222–225].
[Gödel, 1933] K. Gödel. Zur intuitionistischen Arithmetik und Zahlentheorie, Ergebnisse eines mathematischen Kolloquiums, 4, 34–38, 1933. Reprinted and translated in [Gödel, 1986, pp. 286–295].
[Gödel, 1986] K. Gödel. Collected Works, Volume I: Publications 1929–1936, S. Feferman et al., eds. Oxford, Oxford University Press, 1986.
[Grandy, 1972] R. Grandy. A definition of truth for theories with intensional definite description operators, Journal of Philosophical Logic, 1, 135–155, 1972.
[Grandy, 1977] R. Grandy. Predication and singular terms, Noûs, 11, 163–167, 1977.
[Gumb and Lambert, 1997] R. Gumb and J. K. Lambert. Definitions in non-strict positive free logic, Modern Logic, 7, 25–55, 1997.
[Hilbert and Bernays, 1934] D. Hilbert and P. Bernays. Die Grundlagen der Mathematik, v. I, Springer Verlag, 1934.
[Heyting, 1930] A. Heyting. Die formalen Regeln der intuitionistischen Logik, Sitzungsberichte der preussischen Akademie der Wissenschaften, phys. math. Kl., pp. 42–56, 57–71, 1930.
[Hintikka, 1959] J. Hintikka. Existential Presuppositions and Existential Commitments, Journal of Philosophy, 56, 125–137, 1959.
[Hughes and Cresswell, 1968] G. Hughes and M. Cresswell. An Introduction to Modal Logic, Methuen, 1968.
[Hughes and Cresswell, 1996] G. Hughes and M. Cresswell. A New Introduction to Modal Logic, Routledge, 1996.
[Jaskowski, 1934] S. Jaśkowski. On the rules of supposition in formal logic, Studia Logica, 1, 5–32, 1934.
[Kleene, 1952] S. Kleene. Introduction to Metamathematics, van Nostrand, 1952.
[Kripke, 1959] S. Kripke. A completeness theorem in modal logic, Journal of Symbolic Logic, 24, 1–14, 1959.
[Kripke, 1963] S. Kripke. Semantical considerations on modal logics, Acta Philosophica Fennica: Modal and Many Valued Logics, 16, 83–94, 1963.
[Kripke, 1965] S. Kripke. Semantical analysis of intuitionistic logic, I. In J. Crossley and M. Dummett, eds., Formal Systems and Recursive Functions, pp. 92–129. North Holland Publishing Co., 1965.
[Kripke, 1972] S. Kripke. Naming and Necessity, Harvard University Press, 1972.
[Lambert, 1962] J. K. Lambert. Notes on E! III: A theory of descriptions, Philosophical Studies, 13, 5–59, 1962.
[Lambert, 1963] J. K. Lambert. Existential import revisited, Notre Dame Journal of Formal Logic, 4, 288–292, 1963.
[Lambert, 1964] J. K. Lambert. Notes on E! IV: A reduction in free quantification theory with identity and definite descriptions, Philosophical Studies, 15, 85–88, 1964.
[Lambert, 1967] J. K. Lambert. Free logic and the concept of existence, Notre Dame Journal of Formal Logic, 8, 133–144, 1967.
[Lambert, 1969] J. K. Lambert, ed. The Logical Way of Doing Things, Yale University Press, 1969.
[Lambert, 1969a] J. K. Lambert. Logical truth and microphysics, in [Lambert, 1969, pp. 93–118]. Revised and reprinted in [Lambert, 2003].
[Lambert, 1970] J. K. Lambert, ed. Philosophical Problems in Logic, Reidel, 1970.
[Lambert, 1991] J. K. Lambert, ed. Philosophical Applications of Free Logic, Oxford University Press, 1991.
[Lambert, 1991a] J. K. Lambert. A theory about logical theories of ‘expressions of the form “the so and so” where ‘the’ is in the singular’, Erkenntnis, 35, 337–346, 1991.
[Lambert, 1992] J. K. Lambert. Russell’s theory of definite descriptions, Philosophical Studies, 65, 153–167, 1992. Reprinted as Chapter 1 of [Lambert, 2003].
[Lambert, 1995] J. K. Lambert. On the reduction of two paradoxes and the significance thereof. In L. Krüger and B. Falkenburg, eds., Physik, Philosophie, und die Einheit der Wissenschaften, pp. 21–33. Spectrum, Heidelberg, 1995.
[Lambert, 2003] J. K. Lambert. Free Logic: Selected Essays, Cambridge University Press, 2003.
[Lambert and van Fraassen, 1972] J. K. Lambert and B.
van Fraassen, Derivation and Counterexample, Dickenson Publishing Company, 1972.
[Leblanc and Gumb, 1984] H. Leblanc and R. Gumb. Soundness and Completeness proofs for three brands of intuitionistic logic. In H. Leblanc, R. Gumb and R. Stern, eds., Essays in Epistemology and Semantics, Haven Publishing Company, 1984.
[Leblanc and Hailperin, 1959] H. Leblanc and T. Hailperin. Nondesignating Singular Terms, Philosophical Review, 68, 129–136, 1959.
[Leblanc and Thomason, 1968] H. Leblanc and R. Thomason. Completeness theorems for presupposition-free logics, Fundamenta Mathematicae, 62, 125–126, 1968.
[Lehmann, 1994] S. Lehmann. Strict Fregean free logic, Journal of Philosophical Logic, 23, 307–336, 1994.
[Leonard, 1956] H. Leonard. The Logic of Existence, Philosophical Studies, 7, 49–64, 1956.
[Lewis and Langford, 1932] C. I. Lewis and C. H. Langford. Symbolic Logic, Dover Publications, 1932.
[Meinong, 1899] A. Meinong. Über Gegenstände höherer Ordnung und deren Verhältnis zur inneren Wahrnehmung, Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 21, 182–272, 1899.
[Meinong, 1902] A. Meinong. Über Annahmen, Leipzig, 1902.
[Meyer and Lambert, 1968] R. Meyer and J. K. Lambert. Universally free logic and standard quantification theory, Journal of Symbolic Logic, 33, 8–26, 1968.
[Meyer et al., 1982] R. Meyer, E. Bencivenga, and J. K. Lambert. The ineliminability of E! in free quantification theory without identity, Journal of Philosophical Logic, 11, 229–231, 1982.
[McKinsey and Tarski, 1948] J. McKinsey and A. Tarski. Some theorems about the sentential calculi of Lewis and Heyting, Journal of Symbolic Logic, 13, 1–15, 1948.
[Mints, 2000] G. Mints. A Short Introduction to Intuitionistic Logic, Kluwer Academic/Plenum Publishers, 2000.
[Mostowski, 1951] A. Mostowski. On the rules of proof in the pure functional calculus of the first order, Journal of Symbolic Logic, 16, 107–111, 1951.
[Parsons, 1980] T. Parsons. Non-Existent Objects, Yale University Press, 1980.
[Posy, 1982] C. Posy.
A Free IPC is a Natural Logic, Topoi, 1, 30–43, 1982. Reprinted in [Lambert, 1991].
[Posy, 2001] C. Posy. Epistemology, Ontology and the Continuum, in The Growth of Mathematical Knowledge, E. Grossholz and H. Breger, eds., pp. 199–219. Kluwer, 2001.
[Quine, 1951] W. V. O. Quine. Mathematical Logic, Revised Edition, Harvard University Press, 1951.
[Quine, 1954] W. V. O. Quine. Quantification and the empty domain, Journal of Symbolic Logic, 19, 177–179, 1954.
[Russell, 1905] B. Russell. On denoting, Mind, 14, 479–493, 1905.
[Russell and Whitehead, 1910] B. Russell and A. N. Whitehead. Principia Mathematica, Volume 1, second printing, Cambridge University Press, 1910.
[Scales, 1969] R. Scales. Attribution and Existence, Ph.D. Thesis, University of California, Irvine, University of Michigan Microfilms, 1969.
[Schock, 1964] R. Schock. Contributions to syntax, semantics and the philosophy of science, Notre Dame Journal of Formal Logic, 5, 241–290, 1964.
[Schock, 1968] R. Schock. Logics Without Existence Assumptions, Almquist and Wiksells, Uppsala, 1968.
[Scott, 1967] D. Scott. Existence and description in formal logic, in R. Schoenman, ed., Bertrand Russell, Philosopher of the Century, pp. 181–200, 1967. Reprinted in [Lambert, 1991].
[Scott, 1970] D. Scott. Advice on modal logic, in [Lambert, 1970, pp. 143–174].
[Scott, 1979] D. Scott. Identity and existence in intuitionistic logic, in M. Fourman, C. Mulvey, and D. Scott, eds., Applications of Sheaves, Proceedings Durham, 1979, pp. 660–696. Lecture Notes in Mathematics, no. 753, Springer, 1979.
[Skyrms, 1968] B. Skyrms. Supervaluations: Identity, existence and individual concepts, Journal of Philosophy, 69, 477–482, 1968.
[Stenlund, 1975] S. Stenlund. Descriptions in intuitionistic logic. In S. Kanger, ed., Proceedings of the Third Scandinavian Logic Symposium, pp. 197–212. North Holland Publishing Company, 1975.
[Stone, 1937] M. Stone.
Topological representation of distributive lattices and Brouwerian logics, Časopis pro pěstování matematiky a fysiky, 67, 1–25, 1937.
[Strawson, 1950] P. Strawson. On referring, Mind, 59, 320–344, 1950.
[Strawson, 1952] P. Strawson. Introduction to Logical Theory, Methuen, 1952.
[Tarski, 1938] A. Tarski. Der Aussagenkalkül und die Topologie, Fundamenta Mathematicae, 31, 103–134, 1938.
[Thomason, 1968] R. Thomason. On the strong semantical completeness of the intuitionistic predicate calculus, Journal of Symbolic Logic, 33, 1–7, 1968.
[Thomason, 1969] R. Thomason. Modal logic and metaphysics, in [Lambert, 1969].
[Troelstra and van Dalen, 1988] A. Troelstra and D. van Dalen. Constructivism in Mathematics, I and II. North Holland Publishing Company, 1988.
[van Dalen, 1997] D. van Dalen. Logic and Structure, 3rd Edition, Springer, 1997.
[van Dalen, 2002] D. van Dalen. Intuitionistic Logic, in [Gabbay and Guenthner, 2002, pp. 1–114].
[van Fraassen, 1966a] B. C. van Fraassen. The completeness of free logic, Zeitschrift für mathematische Logik und Grundlagen der Mathematik, 121, 219–234, 1966.
[van Fraassen, 1966b] B. C. van Fraassen. Singular terms, truth-value gaps and free logic, Journal of Philosophy, 67, 481–495, 1966. Reprinted in [Lambert, 1991].
[van Fraassen, 1968] B. C. van Fraassen. Presupposition, implication and self reference, Journal of Philosophy, 69, 136–152, 1968. Reprinted in [Lambert, 1991].
[van Fraassen, 1969] B. C. van Fraassen. Presuppositions, supervaluations, and free logic. In [Lambert, 1969, pp. 67–91].
[Williamson, 1998] T. Williamson. Bare possibilities, Erkenntnis, 48, 257–273, 1998.
[Williamson, 2000] T. Williamson. Existence and contingency, Proceedings of the Aristotelian Society, 100, 117–139, 2000.
[Woodruff, 1984] P. Woodruff. On supervaluations in free logic, Journal of Symbolic Logic, 49, 943–950, 1984.