Logic for Philosophy
Theodore Sider June 8, 2007
Preface

This book is an elementary introduction to the logic that students of philosophy ought to know. The goal is to familiarize students with i) basic approaches to logic, including proof theory and especially model theory, ii) extensions of standard logic that are important in philosophy, and iii) some elementary philosophy of logic. The aim is to acquaint students with the logic they need to know in order to be philosophers — in order to understand the logically sophisticated articles one encounters in today's philosophy journals, and in order to avoid being bamboozled by symbol-mongerers.

For better or for worse (I think better), the last century-or-so's developments in logic are part of the shared knowledge base of philosophers, and inform, to varying degrees of directness, work in every area of philosophy. Logic is part of our shared language and inheritance. As a consequence, the standard philosophy curriculum includes a healthy dose of logic. This is a good thing.

But the advanced logic that is part of this curriculum is usually a course in "mathematical logic", which usually means an intensive course in metalogic (for example, a course based on the excellent Boolos and Jeffrey (1989)). I do believe in the value of such a course. But if advanced undergraduate philosophy majors or beginning graduate students are to have one advanced logic course, I think it should not be a course in metalogic. The standard metalogic courses are too mathematically demanding for the average philosophy student, and omit too much material that the average student needs to know. The one standard logic course should rather be a course designed to instill logical literacy.

I begin with a sketch of standard propositional and predicate logic. I briefly discuss a few extensions and variations on each (e.g., three-valued logic, definite descriptions). I then discuss modal logic and counterfactual conditionals in detail.
I presuppose familiarity with: the meanings of the logical symbols of first-order predicate logic without identity or function symbols, truth tables, translations from English into propositional and predicate logic, and derivations (in some suitable natural deduction system) in propositional and predicate logic.

I drew heavily from a number of different sources, which would be good for supplemental reading:

• Propositional logic: Mendelson (1987)
• Descriptions, multi-valued logic: Gamut (1991a)
• Sequents: Lemmon (1965)
• Further quantifiers: Glanzberg (2006); Sher (1991, chapter 2); Westerståhl (1989); Boolos and Jeffrey (1989, chapter 18)
• Modal logic: Gamut (1991b); Cresswell and Hughes (1996a)
• Semantics for intuitionism: Priest (2001)
• Counterfactuals: Lewis (1973)
• Two-dimensional modal logic: Davies and Humberstone (1980)

Another source was Ed Gettier's 1988 modal logic class at the University of Massachusetts. My notes from that class formed the basis of the first version of this document. Marcello Antosh, Josh Armstrong, Gabe Greenberg, Angela Harris, Alex Morgan, Jeff Russell, Crystal Tychonievich, Jennifer Wang, Brian Weatherson, and Evan Williams: thank you for your helpful comments.
Contents

Preface

1 Nature of Logic
  1.1 Logical consequence, logical correctness, logical truth
  1.2 Form and abstraction
  1.3 Formal logic
  1.4 Correctness and application
  1.5 The nature of logical consequence
  1.6 Extensions, deviations, variations
    1.6.1 Extensions
    1.6.2 Deviations
    1.6.3 Variations
  1.7 Metalogic, metalanguages, and formalization

2 Propositional Logic
  2.1 Grammar of PL
  2.2 Provability in PL
    2.2.1 Examples of axiomatic proofs in PL
    2.2.2 The deduction theorem
  2.3 Semantics of PL
  2.4 Soundness and completeness of PL
  2.5 Natural Deduction in PL
    2.5.1 Sequents
    2.5.2 Rules
    2.5.3 Example derivations
    2.5.4 Theoremhood and consequence

3 Variations and Deviations from PL
  3.1 Alternate connectives
    3.1.1 Symbolizing truth functions in propositional logic
    3.1.2 Inadequate connective sets
    3.1.3 Sheffer stroke
  3.2 Polish notation
  3.3 Multi-valued logic
    3.3.1 Łukasiewicz's system
    3.3.2 Kleene's "strong" tables
    3.3.3 Kleene's "weak" tables (Bochvar's tables)
    3.3.4 Supervaluationism
  3.4 Intuitionism

4 Predicate Logic
  4.1 Grammar of predicate logic
  4.2 Set theory
  4.3 Semantics of predicate logic
  4.4 Establishing validity and invalidity

5 Extensions of Predicate Logic
  5.1 Identity
    5.1.1 Grammar for the identity sign
    5.1.2 Semantics for the identity sign
    5.1.3 Translations with the identity sign
  5.2 Function symbols
    5.2.1 Grammar for function symbols
    5.2.2 Semantics for function symbols
    5.2.3 Translations with function symbols: some examples
  5.3 Definite descriptions
    5.3.1 Grammar for ι
    5.3.2 Semantics for ι
    5.3.3 Eliminability of function symbols and definite descriptions
  5.4 Further quantifiers
    5.4.1 Generalized monadic quantifiers
    5.4.2 Generalized binary quantifiers
    5.4.3 Second-order logic

6 Propositional Modal Logic
  6.1 Grammar of MPL
  6.2 Translations in MPL
  6.3 Axiomatic systems of MPL
    6.3.1 System K
    6.3.2 System D
    6.3.3 System T
    6.3.4 System B
    6.3.5 System S4
    6.3.6 System S5
  6.4 Semantics for MPL
    6.4.1 Relations
    6.4.2 Kripke models for MPL
    6.4.3 Validity in MPL
    6.4.4 Semantic validity proofs
    6.4.5 Countermodels in MPL
  6.5 Soundness in MPL
    6.5.1 Soundness of K
    6.5.2 Soundness of D
    6.5.3 Soundness of T
    6.5.4 Soundness of B
    6.5.5 Soundness of S4
    6.5.6 Soundness of S5
  6.6 Completeness of MPL
    6.6.1 Canonical models
    6.6.2 Maximal consistent sets of wffs
    6.6.3 Maximal consistent extensions
    6.6.4 Consistent sets of wffs in modal systems
    6.6.5 Canonical models
    6.6.6 Completeness of systems of MPL

7 Variations on Propositional Modal Logic
  7.1 Propositional tense logic
    7.1.1 Philosophical introduction
    7.1.2 Syntax of tense logic
    7.1.3 Validity and theoremhood in tense logic
  7.2 Intuitionist propositional logic
    7.2.1 Kripke semantics for intuitionist propositional logic
    7.2.2 Examples
    7.2.3 Soundness and other facts about intuitionist validity

8 Counterfactuals
  8.1 Logical Features of English counterfactuals
    8.1.1 Not truth-functional
    8.1.2 Can be contingent
    8.1.3 No augmentation
    8.1.4 No contraposition
    8.1.5 Some validities
    8.1.6 Context dependence
  8.2 The Lewis/Stalnaker approach
  8.3 Stalnaker's system (SC)
    8.3.1 Syntax of SC
    8.3.2 Semantics of SC
  8.4 Validity proofs in SC
  8.5 Countermodels in SC
  8.6 Logical Features of SC
    8.6.1 Not truth-functional
    8.6.2 Can be contingent
    8.6.3 No augmentation
    8.6.4 No contraposition
    8.6.5 No exportation
    8.6.6 No importation
    8.6.7 No hypothetical syllogism (transitivity)
    8.6.8 No transposition
    8.6.9 Some validities
  8.7 Lewis's criticisms of Stalnaker's theory
  8.8 Lewis's System
  8.9 The problem of disjunctive antecedents

9 Quantified Modal Logic
  9.1 Grammar of QML
  9.2 Translations in QML
  9.3 Semantics for QML
  9.4 Countermodels and validity proofs in QML
  9.5 Philosophically interesting formulas of QML
  9.6 Variable domains
    9.6.1 Countermodels to the Barcan and related formulas
    9.6.2 Expanding, shrinking domains
    9.6.3 Properties and nonexistence

10 Two-dimensional modal logic
  10.1 Actuality
    10.1.1 Kripke models with actual worlds
    10.1.2 Semantics for @
    10.1.3 Examples
  10.2 ×
    10.2.1 Two-dimensional semantics for ×
    10.2.2 Examples
  10.3 Fixedly
    10.3.1 Examples
  10.4 A philosophical application: necessity and a priority

References
Chapter 1

Nature of Logic

Since you are reading this book, you are probably already familiar with some logic. You probably know how to translate English sentences into symbolic notation — into propositional logic:

    English                                        Propositional logic
    If snow is white then grass is green           S→G
    Either snow is white or grass is not green     S∨∼G

and into predicate logic:

    English                                        Predicate logic
    If Jones is happy then someone is happy        Hj → ∃xHx
    Anyone who is friends with Jones is either     ∀x[Fxj → (Ix ∨ ∀yFxy)]
    insane or friends with everyone

You are also probably familiar with some basic techniques for evaluating arguments written out in symbolic notation. You have probably encountered truth tables, and some form of proof theory (perhaps a "natural deduction" system; perhaps "truth trees"). You may have even encountered some elementary model theory. In short: you had an introductory course in symbolic logic.

What you already have is literacy in elementary logic. What you will get out of this book is literacy in the rest of the logic that philosophers tend to presuppose, plus a deeper grasp of what logic is all about.

So what is logic all about?
1.1 Logical consequence, logical correctness, logical truth
Logic studies logical consequence. The statement "someone is happy" is a logical consequence of the statement "Ted is happy". If Ted is happy, then it logically follows that someone is happy. Likewise, the statement "Ted is happy" is a logical consequence of the statements "It's not the case that John is happy" and "Either John is happy or Ted is happy"; the first statement follows from the latter two. If it is true that John is not happy, and also true that either John or Ted is happy, then Ted must be happy.

Relatedly, logic studies logical correctness.[1] An argument is logically correct when its conclusion is a logical consequence of its premises. Thus, the following argument is logically correct:

    It's not the case that John is happy
    Ted is happy or John is happy
    Therefore, Ted is happy

Relatedly, logic studies logical truth. A logical truth is a sentence that is a logical consequence of any sentence whatsoever — a sentence that is "true purely by virtue of logic". Examples might include: "it's not the case that snow is white and also not white", "All fish are fish", and "If Ted is happy then someone is happy".
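Consequence claims like the ones above can be checked mechanically, at least for truth-functional cases: a conclusion follows just in case no assignment of truth values makes every premise true and the conclusion false. Here is a minimal sketch in Python; the encoding of sentences as functions on assignments is my own illustration, not anything from this book.

```python
from itertools import product

def is_consequence(premises, conclusion, letters):
    """Brute-force semantic check: the conclusion must be true on every
    assignment of truth values that makes all the premises true."""
    for values in product([True, False], repeat=len(letters)):
        v = dict(zip(letters, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False  # found an assignment that refutes the argument
    return True

# "It's not the case that John is happy"; "John is happy or Ted is happy";
# therefore "Ted is happy".
premises = [lambda v: not v["J"], lambda v: v["J"] or v["T"]]
conclusion = lambda v: v["T"]
print(is_consequence(premises, conclusion, ["J", "T"]))   # True

# Dropping the first premise destroys the entailment:
print(is_consequence(premises[1:], conclusion, ["J", "T"]))  # False
```

The check enumerates only four assignments here, but the same brute-force idea scales to any finite set of sentence letters.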
1.2 Form and abstraction

Logicians focus on the forms of sentences and arguments. Consider the logically correct argument from the last section:

Argument A:
    It's not the case that John is happy
    Ted is happy or John is happy
    Therefore, Ted is happy
It is customary to say that argument A is logically correct because of its form — because its form is:

[Footnote 1: Logical correctness is sometimes called validity. I'm avoiding this term here because I will use it later on for a different concept: that of a sentence that is "true no matter what".]
    It's not the case that φ
    φ or ψ
    Therefore ψ

Likewise, we say that the conclusion of argument A is a logical consequence of its premises in virtue of the forms of these sentences; and we say that "it's not the case that snow is white and also not white" is a logical truth because it has the form: it's not the case that φ and not-φ.

We need to think hard about the idea of form. Apparently, we got the alleged form of argument A by replacing some words with Greek letters and leaving other words as they were. We replaced the sentences 'John is happy' and 'Ted is happy' with φ and ψ, respectively, but left the expressions 'It's not the case that' and 'or' as they were, resulting in the schematic form displayed above. Let's call that form "Form 1".

What's so special about Form 1? Couldn't we make other choices for what to leave and what to replace? For instance, if we replace the predicate 'is happy' with the schematic letter α, leaving the rest intact, we get this:

Form 2:
    It's not the case that John is α
    Ted is α or John is α
    Therefore, Ted is α

And if we replace the 'or' with the schematic letter γ and leave the rest intact, then we get this:

Form 3:
    It's not the case that John is happy
    Ted is happy γ John is happy
    Therefore, Ted is happy

As we saw, if we think of argument A as having form 1, then we can think of it as being logically correct in virtue of its form, since every instance of form 1 is logically correct. If we instead think of argument A's form as being form 2, we can continue to think of argument A as being logically correct in virtue of its form, since, like form 1, every instance of form 2 is logically correct: no matter what predicate we change α to, form 2 becomes a logically correct argument. But if we think of argument A's form as being form 3, then we cannot think of it as being logically correct in virtue of its form, for not every instance of form 3 is a logically correct argument. If we change γ to 'if and only if', for example, then we get the following logically incorrect argument:
    It's not the case that John is happy
    Ted is happy if and only if John is happy
    Therefore, Ted is happy

So, what did we mean when we said that argument A is logically correct in virtue of its form? What is argument A's form? Is it form 1, form 2, or form 3?

There is no such thing as the form of an argument. When we assign an argument a form, what we are doing is focusing on certain words and ignoring others. We leave intact the words we're focusing on, and we insert schematic letters for the rest. Thus, in assigning argument A form 1, we're focusing on the words (phrases) 'it is not the case that' and 'or', and ignoring other words. More generally, in (standard) propositional logic, we focus on the phrases 'if…then', 'if and only if', 'and', 'or', and so on, and ignore others. We do this in order to investigate the relations of logical consequence that hold in virtue of these words' meanings. The fact that argument A is logically correct depends just on the meanings of the phrases 'it is not the case that' and 'or'; it does not depend on the meanings of the sentences 'John is happy' and 'Ted is happy'. We can substitute any sentences we like for φ and ψ in form 1 and still get a logically correct argument.

In predicate logic, on the other hand, we focus on further words: 'all' and 'some'. Broadening our focus in this way allows us to capture a wider range of correct arguments, logical consequences, and logical truths. For example, "If Ted is happy then someone is happy" is a logical truth in virtue of the meaning of 'someone', and not merely in virtue of the meanings of the characteristic words of propositional logic.

Call the words on which we're focusing — that is, the words that we leave intact when we construct the forms of sentences and arguments — the logical constants. (We can speak of natural-language logical constants — 'and', 'or', etc. for propositional logic; 'all' and 'some' in addition for predicate logic — as well as symbolic logical constants: ∧, ∨, etc. for propositional logic; ∀ and ∃ in addition for predicate logic.) What we've seen is that the forms we assign depend on what we're considering to be the logical constants.

We call these expressions logical constants because we interpret them in a constant way in logic, in contrast to other terms. For example, ∧ is a logical constant; it always represents the English 'and'. There are fixed rules governing ∧ in proof systems (the rule that from P∧Q one can infer P, for example), in the rules for constructing truth tables, and so on. Moreover, these rules are distinctive for ∧: there are different rules for other logical constants, ∨ for example. In contrast, the terms in logic that are not logical constants do not have fixed, particular rules governing their meanings. For example, there are no special rules governing what one can do with a P as opposed to a Q in proofs or derivations. That's because P doesn't symbolize any sentence in particular; it can stand for any old sentence.

There isn't anything sacred about the choices of logical constants we make in propositional and predicate logic, and therefore there isn't anything sacred about the customary forms we assign to sentences. We could treat other words as logical constants. We could, for example, stop taking 'or' as a logical constant, and instead take 'It's not the case that John is happy', 'Ted is happy', and 'John is happy' as logical constants. We would thereby view argument A as having form 3. This would not be a particularly productive choice (since it would not help to explain the correctness of argument A), but it's not wrong simply by virtue of the concept of form.

More interestingly, consider the fact that every argument of the following form is logically correct:

    α is a bachelor
    Therefore, α is unmarried

Accordingly, we could treat the predicates 'is a bachelor' and 'is unmarried' as logical constants, and develop a corresponding logic. We could introduce special symbolic logical constants for these predicates, and we could introduce distinctive rules governing these predicates in proofs. (The rule of "bachelor-elimination", for instance, might allow one to infer "α is unmarried" from "α is a bachelor".) As with the choices of the previous paragraph, this choice of what to treat as a logical constant is not ruled out by the concept of form. And it would be more productive than the choices of the last paragraph. Still, it would be far less productive than the usual choices of logical constants in predicate and propositional logic. The word 'bachelor' doesn't have as general application as the words commonly treated as logical constants in propositional and predicate logic; the latter are ubiquitous.

At least, this remark about "generality" is one idea about what should be considered a "logical constant", and hence one idea about the scope of what is usually thought of as "logic". Where to draw the boundaries of logic — and indeed, whether the logic/nonlogic boundary is an important one to draw — is an open philosophical question about logic. At any rate, one thing we'll do in this book is study systems that expand the list of logical constants from standard propositional and predicate logic.
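The contrast between form 1 and form 3 can be made vivid with truth tables: no assignment of truth values to φ and ψ refutes form 1, while the 'if and only if' instance of form 3 does have a refuting assignment. A minimal sketch in Python; the encoding of premises and conclusion as functions on assignments is my own illustration.

```python
from itertools import product

def has_countermodel(premises, conclusion, letters):
    """Return an assignment making all premises true and the conclusion
    false, or None if no such assignment exists."""
    for values in product([True, False], repeat=len(letters)):
        v = dict(zip(letters, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return v
    return None

# Form 1: not-phi; phi or psi; therefore psi.  Quantifying over the truth
# values of phi and psi covers every truth-functional instance at once.
form1 = has_countermodel(
    [lambda v: not v["phi"], lambda v: v["phi"] or v["psi"]],
    lambda v: v["psi"],
    ["phi", "psi"],
)
print(form1)  # None: no countermodel, so every instance is correct

# Form 3 with 'if and only if' put in for gamma:
# not-J; T if and only if J; therefore T.
form3 = has_countermodel(
    [lambda v: not v["J"], lambda v: v["T"] == v["J"]],
    lambda v: v["T"],
    ["J", "T"],
)
print(form3)  # {'J': False, 'T': False}: John and Ted both unhappy
```

The countermodel for the form 3 instance is exactly the situation in which neither John nor Ted is happy: both premises are then true while the conclusion is false.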
1.3 Formal logic
Modern logic is "mathematical" or "formal" logic. This means simply that one studies logic using mathematical techniques. More carefully: in order to develop theories of logical consequence, logical correctness, and logical truth, one develops a formal language (see below); one treats the sentences of the formal language as mathematical objects; one uses the tools of mathematics (especially the tools of very abstract mathematics, such as set theory) to formulate theories about the sentences in the formal language; and one applies mathematical standards of rigor to these theories. Mathematical logic was originally developed to study mathematical reasoning,[2] but its techniques are now applied to reasoning of all kinds.

Think, for example, of propositional logic (this will be our first topic below). The standard approach to analyzing the logical behavior of 'and', 'or', and so on is to develop a certain formal language, the language of propositional logic. The sentences of this language look like this:

    P
    (Q→R)∨(Q→∼S)
    P↔(P∧Q)

The symbols ∧, ∨, etc., are used to represent the English words 'and', 'or', and so on (the logical constants for propositional logic), and the sentence letters P, Q, etc., are used to represent declarative English sentences.

Why 'formal'? Because we stipulate, in a mathematically rigorous way, a grammar for the language; that is, we stipulate a mathematically rigorous definition of the idea of a sentence of this language. Moreover, since we are only interested in the logical behavior of the chosen logical constants 'and', 'or', and so on, we choose special symbols (∧, ∨, …) for these words only; we use P, Q, R, … indifferently to represent any English sentence whose internal logical structure we are willing to ignore.[3]

[Footnote 3: Natural languages like English also have a grammar, and the grammar can be studied using mathematical techniques. But the grammar is much more complicated, and is discovered rather than stipulated; and natural languages lack abstractions like the sentence letters.]

We go on, then, to study (as always, in a mathematically rigorous way) various concepts that apply to the sentences in formal languages. In propositional
logic, for example, one constructs a mathematically rigorous definition of a tautology ("all Trues in the truth table"), and a rigorous definition of a provable formula (e.g., in terms of a system of deduction, using rules of inference, assumptions, and so on).

Of course, the real goal is to apply the notions of logical consequence, logical correctness, and logical truth to sentences of English and other natural languages. The formal languages are merely a tool; we need to apply the tool.
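Both of the rigorous definitions just mentioned — the grammar of the formal language and the notion of a tautology — can be given inductively. Here is a minimal Python sketch; the tuple encoding of wffs is my own illustration, not the book's official notation.

```python
from itertools import product

# A wff is a sentence letter (a string), or a tuple built from wffs:
# ("not", A), ("and", A, B), ("or", A, B), ("->", A, B), ("<->", A, B).
def is_wff(e):
    """Inductive definition of 'sentence of the language'."""
    if isinstance(e, str):
        return True                        # base clause: sentence letters
    if isinstance(e, tuple):
        if e[0] == "not":
            return len(e) == 2 and is_wff(e[1])
        if e[0] in ("and", "or", "->", "<->"):
            return len(e) == 3 and is_wff(e[1]) and is_wff(e[2])
    return False                           # nothing else is a wff

def value(e, v):
    """Truth value of wff e under assignment v (dict: letter -> bool)."""
    if isinstance(e, str):
        return v[e]
    if e[0] == "not":
        return not value(e[1], v)
    a, b = value(e[1], v), value(e[2], v)
    return {"and": a and b, "or": a or b,
            "->": (not a) or b, "<->": a == b}[e[0]]

def letters(e):
    return {e} if isinstance(e, str) else set().union(*map(letters, e[1:]))

def tautology(e):
    """'All Trues in the truth table.'"""
    ls = sorted(letters(e))
    return all(value(e, dict(zip(ls, vs)))
               for vs in product([True, False], repeat=len(ls)))

wff = ("->", ("and", "P", "Q"), "P")       # (P and Q) -> P
print(is_wff(wff), tautology(wff))         # True True
print(tautology(("or", "P", "Q")))         # False
```

The point of the exercise is that both notions are defined by recursion on the construction of the sentence, which is what makes them mathematically tractable.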
1.4 Correctness and application

To apply the tools we develop for formal languages, we need to speak of a formal system as being correct. What does that sort of claim mean?

As we saw, logicians use formal languages and formal structures to study logical consequence, logical correctness, and logical truth. And the range of structures that one could in principle study is very wide. For example, I could introduce a new notion of "provability" by saying: "in Ted Logic, the following rule may be used when constructing proofs: if you have P on a line, you may infer ∼P. The annotation is 'T'." I could then go on to investigate the properties of such a system. Logic can be viewed as a branch of mathematics, and we can mathematically study any system we like, including a system (like Ted Logic) in which one can "prove" ∼P from P.

Of course, such a system would shed no light on the logical correctness of English arguments. That is, it would be implausible to claim that when we translate an English argument A into symbols, the conclusion of the resulting symbolic argument may be derived in Ted Logic from its premises iff the English argument A is logically correct. Thus, the existence of a coherent, specifiable logical system is a completely separate thing from its application. When we say that a logical system is correct, we have in mind some application of that system.

Here is an oversimplified account of one such correctness claim. Suppose we specified some translation scheme from English into the language of propositional logic. In particular, we would want to translate English 'and' into the logical ∧, English 'or' into the logical ∨, etc. Then the claim that standard propositional logic is a correct logic of English 'and', 'or', etc. is the claim that an English argument is logically correct (i.e., its conclusion is a logical consequence of its premises) in virtue of 'and', 'or', etc. iff one can derive the translation of its conclusion from the translations of its premises in the standard system of propositional logic.
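The point that a deviant system can be perfectly well defined while being useless as a model of English consequence can be made concrete. Here is a tiny sketch in Python of a derivability check that includes the "Ted Logic" rule; the encoding is my own illustration, not anything from this book.

```python
# Rule T ("Ted Logic"): from any wff, infer its negation.  Mathematically
# well defined, but it sheds no light on logically correct English arguments.
def ted_rule(line):
    return ("not", line)

def check_proof(premises, lines):
    """A 'proof' is a list of wffs, each of which is either a premise or
    the Ted-rule result of some earlier line."""
    for i, line in enumerate(lines):
        ok = line in premises or any(line == ted_rule(e) for e in lines[:i])
        if not ok:
            return False
    return True

# In Ted Logic, not-P is "provable" from P:
print(check_proof(["P"], ["P", ("not", "P")]))                        # True
# ...and so on: not-not-P follows from not-P.
print(check_proof(["P"], ["P", ("not", "P"), ("not", ("not", "P"))])) # True
# But an arbitrary wff is still not derivable:
print(check_proof(["P"], [("not", "Q")]))                             # False
```

The system has a perfectly precise derivability relation; what it lacks is a plausible application, which is exactly the distinction the passage above draws.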
1.5 The nature of logical consequence
The definition of correctness just given employed the notion of logical consequence, as applied to sentences of English. But what is it for a sentence of English to be a logical consequence of other sentences of English? The question here is a philosophical question, as opposed to a mathematical one. Think of it this way: logicians define various notions concerning sentences of formal languages (e.g., notions of derivability of symbolic formulas; e.g., the notion of a tautology). These notions are good insofar as they correctly model logical consequence, logical truth, and so on; but in what does logical truth consist? This is one of the core questions of philosophical logic.

This book is not primarily a book in philosophical logic, so we won't spend much time on the question. However, I do want to make clear that the question is indeed a question. The question is sometimes obscured by the fact that terms like 'logical consequence', 'logically correct argument', and 'logical truth' are often stipulatively defined in logic books. This can lead to the belief that there are no genuine issues concerning these notions. It is also obscured by the fact that one philosophical theory of these notions — the model-theoretic one — is so dominant that one can forget that it is a nontrivial theory.

Stipulative definitions are of course not things whose truth can be questioned; but stipulative definitions of logical notions are good insofar as the stipulated notions accurately model the real notions: logical consequence, logical correctness, and logical truth of natural-language arguments. Further, the stipulated definitions generally concern formal languages, whereas the ultimate goal is an understanding of correct reasoning of the sort that we actually do, using natural languages.

Let's focus just on logical consequence. There are various possible views about what logical consequence is; here I will give just a quick survey.
Probably the most standard account is the model-theoretic one: φ is a logical consequence of the sentences in set Γ iff the formal translation of φ is true in every model in which the formal translations of the members of Γ are true.

This account needs to be spelled out in various ways. First, "formal translations" are translations into a formal language; but which formal language? It will be a language that has a logical constant for each English logical expression. But that raises the question of which expressions of English are logical expressions. In addition to 'and', 'or', 'all', and so on, are any of the following logical expressions?
CHAPTER 1. NATURE OF LOGIC
necessarily
it will be the case that
most
it is morally wrong that

Further, the notion of translation must be defined; further, an appropriate definition of 'model' must be chosen. Similar issues of refinement confront a second account, the proof-theoretic account: φ is a logical consequence of the members of Γ iff the translation of φ is provable from the translations of the members of Γ. We must decide what formal language to translate into, and we must decide upon an appropriate definition of provability. A third view is Quine's: φ is a logical consequence of the members of Γ iff there is no way to (uniformly) substitute new nonlogical expressions for nonlogical expressions in φ and the members of Γ so that the members of Γ become true and φ becomes false. Three other accounts should be mentioned. The first account is a modal one. Say that Γ modally implies φ iff it is not possible for φ to be false while the members of Γ are true. (What does 'possible' mean here? There are many kinds of possibility one might have in mind: so-called "metaphysical possibility", "absolute possibility", "idealized epistemic possibility"…. Clearly the acceptability of the proposal depends on the legitimacy of these notions. We discuss modality later in the book, beginning in chapter 6.) One might then propose that φ is a logical consequence of the members of Γ iff Γ modally implies φ. (An intermediate proposal: φ is a logical consequence of the members of Γ iff, in virtue of the forms of φ and the members of Γ, Γ modally implies φ. More carefully: φ is a logical consequence of the members of Γ iff Γ modally implies φ, and moreover, whenever Γ′ and φ′ result from Γ and φ by (uniform) substitution of nonlogical expressions, Γ′ modally implies φ′. This is like Quine's definition, but with modal implication in place of truth-preservation.) Second, there is a primitivist account, according to which logical consequence is a primitive notion.
Third, there is a pluralist account, according to which there is no one kind of genuine logical consequence. There are, of course, the various concepts proposed by each account, each of which is trying to capture genuine logical consequence; but in fact there is no further notion of genuine logical consequence at all; there are only the proposed construals.4

4. Notes. Say something about pluralism about consequence as opposed to pluralism about truth. E.g., the former is much easier to swallow than the claim that there's no fact of the matter as to whether 'Jones is bald or Jones is not bald' is true.

As I say, this is not a book on philosophical logic, and so we will not inquire further into which (if any) of these accounts is correct. We will, rather, focus exclusively on two kinds of formal proposals for modeling logical consequence, logical correctness, and logical truth: model-theoretic and proof-theoretic proposals.

1.6 Extensions, deviations, variations5

5. See Gamut (1991a, pp. 156-158).

"Standard logic" is what is usually studied in introductory logic courses. It includes propositional logic (∧, ∨, ∼, →, ↔), and predicate logic (∀, ∃, variables). In this course we'll consider various modifications of standard logic:

1.6.1 Extensions

Here we add to standard logic. We add both:
• symbols
• things we can prove

The reason we add to standard logic is that we want to get a better representation of English arguments. There are more logically correct English arguments than you can show to be logically correct using plain old standard logic. For example, predicate logic is an extension of propositional logic. You can do a lot with propositional logic, but you can't even show that:

Ted is happy
Therefore, someone is happy

is logically correct using just propositional logic. So we added quantifiers, variables, predicates, etc., to propositional logic (added symbols), and added new rules of derivation to yield more things we could prove. We'll want to add even more things. For instance, we will add the "=" sign to predicate logic, and we'll add a sign for "necessarily" when we get to modal logic.
1.6.2 Deviations
Here we change, rather than add. We keep the same symbols, but we change what we say about what the symbols mean. Thus, we get different things we can prove. Why do this? Perhaps because we think that standard logicians are wrong about what the right logic for English is. If we want to correctly model logical consequence in English, therefore, we must construct systems that behave differently from standard logic. For example, we'll talk about multi-valued logic. In standard logic, everything is either true or false. But maybe that's a mistake; maybe some sentences are neither true nor false. E.g.:

The king of the United States is bald
Sherlock Holmes weighs more than 178 pounds
$100 is a lot of money
There will be a sea battle tomorrow

Intuitionism is another case. Intuitionists say, roughly, that something is true if and only if it is provable. This leads them to deny that ∼∼P logically implies P. For it might be provable that there is no proof of ∼P, even if P isn't provable. Likewise, intuitionists deny that P∨∼P is a logical truth, since it might be that neither P nor ∼P is provable. Accordingly, intuitionists develop a nonstandard version of propositional logic.
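One way to make "neither true nor false" precise is with three-valued truth tables. The following is a rough sketch only, using Kleene's "strong" tables (one option among several; multi-valued logics are discussed later in the book), with 1 for true, 0 for false, and 0.5 for "neither"; the function names are my own:

```python
# A sketch of Kleene's strong three-valued connectives.
# Truth values: 1 = true, 0 = false, 0.5 = neither true nor false.
def k_not(a):
    return 1 - a          # negation flips true and false; "neither" stays put

def k_and(a, b):
    return min(a, b)      # a conjunction is only as true as its weakest conjunct

def k_or(a, b):
    return max(a, b)      # a disjunction is as true as its strongest disjunct

# On these tables P∨∼P is no longer guaranteed to be true:
for p in (0, 0.5, 1):
    print(p, k_or(p, k_not(p)))
```

On the classical inputs 0 and 1 these tables agree with the standard two-valued ones; the interesting row is p = 0.5, where P∨∼P itself comes out neither true nor false, so it is not a three-valued tautology.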
1.6.3 Variations
Here we also change standard logic, but we change the notation without changing the content of logic. We study alternate ways of expressing the same thing. For example, in intro logic we show how:

∼(P∧Q)
∼P∨∼Q

are two different ways of saying the same thing. We will study other ways of saying the same thing as these, including:

P|Q
∼∧PQ
In the first case, | is a new symbol for "not both". In the second case ("Polish notation"), the ∼ and the ∧ mean what they mean in standard logic; but instead of going between the P and the Q, the ∧ goes before P and Q. The value of this, as we'll see, is that we no longer will need parentheses.
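The reason Polish notation can drop parentheses is that, reading left to right, each connective announces exactly how many subformulas follow it, so a well-formed string has only one possible reading. A minimal sketch of a prefix parser (the nested-tuple representation and the function name are my own illustrations, not the book's):

```python
# Parse a formula written in Polish (prefix) notation.
# '∼' is unary; '∧', '∨', '→' are binary; anything else is a sentence letter.
# Because each connective fixes how many arguments follow it, no
# parentheses are needed and each well-formed string parses one way.
def parse_polish(tokens):
    head = tokens.pop(0)
    if head == '∼':
        return ('∼', parse_polish(tokens))
    if head in '∧∨→':
        return (head, parse_polish(tokens), parse_polish(tokens))
    return head  # a sentence letter

print(parse_polish(list('∼∧PQ')))  # the Polish way of writing ∼(P∧Q)
```

Running the last line recovers the unique parse tree: negation applied to the conjunction of P and Q.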
1.7 Metalogic, metalanguages, and formalization
In introductory logic, we learned how to use certain logical systems. We learned how to do truth tables, construct derivations, and so on. But logicians do not spend much of their time developing systems only to sit around all day doing derivations in those systems. As soon as a logician develops a new system, he or she will begin to ask questions about that system. For an analogy, imagine people who make up games. They might invent a new version of chess. Now, they might spend some time actually playing the new game. But if they were like logicians, they would soon get bored with this and start asking questions about the game, such as: "Is the average length of this new game longer than the average length of a game of chess?", "Is there any strategy one could pursue which will guarantee a victory?" Logicians might ask, for example: what things can be proved in such and such a system? Can you prove the same things in this system as in system X? Proving things about logical systems is part of "metalogic", which is an important part of logic. In metalogic, we turn our focus to the logical languages themselves, and prove results about those languages. Standard textbooks introduce a pair of methods for evaluating symbolic arguments for logical correctness. One is semantic: an argument is semantically correct iff in its complete truth table, there is no case in which its premises are all true while its conclusion is false. Another is proof-theoretic: an argument is proof-theoretically correct iff there exists a derivation of its conclusion from its premises, where a derivation is then appropriately defined. (Think: introduction- and elimination-rules, conditional and indirect proof, and so on.) Now, an interesting metalogical question is: how do these two methods for evaluating symbolic arguments relate to each other?
The question is answered by the following metalogical results, which are proved in standard books on metalogic: Soundness of propositional logic: Any proof-theoretically correct argument is semantically correct
Completeness of propositional logic: Any semantically correct argument is proof-theoretically correct These are really interesting claims! They show that the method of truth tables and the method of constructing derivations amount to the same thing, as applied to symbolic arguments of propositional logic. A couple remarks about proving things in metalogic. First: what do we mean by “proving”? We do not mean: constructing a derivation in the logical system we’re investigating. We’re trying to construct a proof about the system. We do this in English, informally. Logicians often distinguish the “object language” from the “metalanguage”. The object language is the language that we’re studying: the language of propositional logic in this case. Sentences of the object language look like this: P ∧Q ∼(P ∨Q)↔R The metalanguage is the language we use to talk about the object language. In this case, the metalanguage is English. Here are some example sentences of the metalanguage: ‘P ∧Q’ is a sentence with three symbols, one of which is a logical constant Every sentence of propositional logic has the same number of left parentheses as right parentheses If an argument’s conclusion can be derived from its premises, then there is no case in its truth table where its premises are all true but its conclusion is false (i.e., soundness) Our proofs in metalogic will take place in the metalanguage, English. Second: to get anywhere in metalogic, we will have to get a lot more careful about certain things than we were in intro logic. Let’s look at soundness, for instance. To be able to prove this, in a mathematically rigorous way, we’ll need to have the terms in it defined very carefully. In particular, we’ll need to say exactly what we mean by ‘sentence of propositional logic’, ‘truth tables’, and ‘derived’. Defining these terms precisely (another thing we’ll do using English, the metalanguage!) is known as formalizing logic. Our first task will be to formalize propositional logic.
Chapter 2 Propositional Logic
We begin with the simplest logic commonly studied: propositional logic. Despite its simplicity, it has great power and beauty.
2.1 Grammar of PL
Modern logic has made great strides by treating the language of logic as a mathematical object. To do so, grammar needs to be developed rigorously. (Our study of a new logical system will always begin with grammar.) If all you want to do is understand the language of logic informally, and be able to use it effectively, you don't really need to get so careful about grammar. For even if you haven't ever seen the grammar of propositional logic formalized, you can recognize that things like this make sense:

P→Q
R∧(∼S↔P)

whereas things like this do not:

→P QR∼
(P ∼Q∼(∨

But to make any headway in metalogic, we will need more than an intuitive understanding of what makes sense and what does not; we will need a precise definition that has the consequence that only the strings of symbols in the first group "make sense".
Grammatical formulas (i.e., ones that "make sense") are called well-formed formulas, or "wffs" for short. We define these by first carefully defining exactly which symbols are allowed to occur in wffs (the "primitive vocabulary"), and second, carefully defining exactly which strings of these symbols count as wffs:

Primitive vocabulary:
Sentence letters: P, Q, R, . . . , with or without numerical subscripts
Connectives: →, ∼
Parentheses: ( , )

Definition of wff:
i) Sentence letters are wffs
ii) If φ and ψ are wffs, then (φ→ψ) and ∼φ are wffs
iii) Only strings that can be shown to be wffs using i) and ii) are wffs

What happened to ∧, ∨, and ↔? We included only → and ∼. Answer: if we have → and ∼, we don't need the others; we can define them:

φ∧ψ =df ∼(φ→∼ψ)
φ∨ψ =df ∼φ→ψ
φ↔ψ =df (φ→ψ)∧(ψ→φ)
Note that the choice to begin with → and ∼ as our primitive connectives was arbitrary. We could have started with ∼ and ∧, and defined the others as follows:

φ∨ψ =df ∼(∼φ∧∼ψ)
φ→ψ =df ∼(φ∧∼ψ)
φ↔ψ =df (φ→ψ)∧(ψ→φ)
And other alternate choices are possible. We’ll talk about this later. So: → and ∼ are our primitive connectives; the others are defined. Why do we choose only a small number of primitive connectives? Because, as we will see, it makes meta-proofs easier.
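To see how such a definition pins down exactly which strings are wffs, here is a rough sketch of a recognizer for the official grammar. It is my own illustration, not the book's: I assume sentence letters are single capital letters (ignoring subscripts), and I read the grammar as requiring parentheses around conditionals, since the book later counts strings like P→P→P as ill-formed for missing parentheses:

```python
# A wff is: a capital letter, or ∼ followed by a wff, or (wff→wff).
def scan(s, i):
    """Return the index just past one wff starting at position i."""
    if s[i] == '∼':
        return scan(s, i + 1)
    if s[i] == '(':
        j = scan(s, i + 1)        # antecedent
        if s[j] != '→':
            raise ValueError
        k = scan(s, j + 1)        # consequent
        if s[k] != ')':
            raise ValueError
        return k + 1
    if s[i].isupper():
        return i + 1              # a sentence letter
    raise ValueError

def is_wff(s):
    try:
        return scan(s, 0) == len(s)   # one wff must use up the whole string
    except (ValueError, IndexError):
        return False

print(is_wff('∼(P→Q)'), is_wff('(P→(P→P))'), is_wff('P→P→P'))
```

Note that the recognizer rejects P→P→P: a single wff is consumed ('P') but characters remain, matching the later discussion of why that string is ill-formed.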
2.2 Provability in PL
After grammar, a common step in the study of a language is to give some syntactic definition of 'theorem'. The intuitive notion of a theorem is something one can prove. One method of characterizing theoremhood is called the method of natural deduction. Any system in which one has assumptions for "conditional proof", assumptions for "indirect derivation", etc., is a system of natural deduction. That was our method in intro logic. We will investigate a version of this method below. Natural deduction systems are prized because they are easy to use. That is, it is relatively easy to construct derivations. They are called "natural" deduction systems because the techniques used in constructing derivations are supposed to mirror techniques of actual reasoning. But as easy as they are to use, natural deduction systems are not as useful for metalogic. This is because it is hard to prove things about natural deduction systems. For metalogic, many logicians prefer a different method for defining 'theorem': the axiomatic method. To apply the axiomatic method, we must choose i) a set of rules, and ii) a set of axioms. An axiom is simply any chosen sentence. The idea is to choose axioms that are obvious logical truths. E.g., "P→P" would be a good axiom — it's obviously a logical truth. (As it happens, we won't choose this particular axiom; we'll instead choose other axioms from which this one may be proved.) A rule is simply a permission to infer one sort of sentence from other sentences. For example, modus ponens. It can be stated thus: "From φ→ψ and φ you may infer ψ", and pictured as follows:

MP
φ→ψ
φ
-----
ψ

After the axioms and rules have been chosen, we may offer the following definitions:

A proof is a finite sequence of wffs, in which each member either i) is an axiom, or ii) follows from earlier members via a rule

A theorem is the last member of any proof (we write "⊢ φ" for "φ is a theorem").
One wff φ is provable from a set of wffs Γ ("Γ ⊢ φ") iff there is a sequence of wffs, in which the last is φ, and in which each member is either an axiom, a member of Γ, or follows from earlier wffs in the sequence via a rule. Here are the axioms and rules for PL:1

PL (propositional logic)
Rules: MP
Axioms: Where φ, ψ, and χ are wffs, anything that comes from the following schemas is an axiom:

(A1) φ→(ψ→φ)
(A2) (φ→(ψ→χ))→((φ→ψ)→(φ→χ))
(A3) (∼ψ→∼φ)→((∼ψ→φ)→ψ)

The axiom schemas A1-A3 are called schemas because they aren't in fact axioms; they're rather "recipes" for constructing axioms. They aren't axioms themselves because they aren't wffs. The Greek letters 'φ', 'ψ', and 'χ' aren't in the primitive vocabulary of the grammar of propositional logic, and so no string of symbols containing these Greek letters is a wff. Rather, the idea is that what you get when you fill in φ, ψ, and χ in these schemas with wffs is an axiom. Let me clarify this. Take the first axiom schema, φ→(ψ→φ). This means: "for any wff that you stick in for φ and any wff that you stick in for ψ, the result φ→(ψ→φ) is an axiom." First point of clarification: you can stick in the same thing for φ as for ψ if you want. Thus, P→(P→P) is an axiom. Second point: you don't have to stick in the same thing for φ as for ψ. Thus, P→(Q→P) is an axiom. Third point: you can stick in complex formulas for φ and ψ if you want. Thus, (P→Q)→(∼(R→S)→(P→Q)) is an axiom. Fourth point: within a single axiom, you must substitute the same thing for each Greek letter wherever it occurs. For example, P→(Q→R) is not an axiom; you can't let the first φ be P and the second φ be R. Finally, even though you can't make φ be different things within a single axiom, you can make φ be different things when making new axioms. That is, you can make one set of substitutions for φ and ψ, generate
1. See Mendelson (1987, p. 29).
an axiom, then choose a different set of substitutions, and generate a further axiom, and use each of these axioms in the same axiomatic proof. For instance, you could use each of the following instances of A1 within a single axiomatic proof:

P→(Q→P)
∼P→((Q→R)→∼P)

In the first case, I made φ be P and ψ be Q; in the second case I made φ be ∼P and ψ be Q→R. This is fine because I kept φ and ψ constant within each axiom. A theorem is, in essence, a wff that is provable from no assumptions at all. We also introduced the notion of φ's being provable from some assumptions. Where Γ is any set of wffs, we said that φ is provable from Γ iff there exists some sequence of wffs, each member of which is either i) an axiom, ii) a member of Γ, or iii) follows from earlier lines in the proof via a rule. Such a proof is called a "proof from Γ". Proofs from Γ are just like plain old proofs, except that in addition to axioms you can write down a member of Γ at any time. Example: To show that {(P→Q)} ⊢ (P→P), we construct a proof of P→P from {(P→Q)}:

1. P→(Q→P)                          (A1)
2. (P→(Q→P))→((P→Q)→(P→P))          (A2)
3. (P→Q)→(P→P)                      1,2 MP
4. P→Q                              member of {(P→Q)}
5. P→P                              3,4 MP
This notion of provability-from is the proof-theoretic way of making sense of logical consequence (which, remember, is the main goal of logic). An argument is a good one, in this sense, if its conclusion is provable from the set of its premises.
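The definitions of 'proof' and 'proof from Γ' are mechanical enough that a program can check them. Here is a rough sketch, entirely my own illustration: wffs are encoded as nested tuples, schema letters as strings beginning with '?', and each line of a purported proof must be an instance of A1-A3, a member of Γ, or follow from two earlier lines by MP:

```python
# Wffs as nested tuples: ('→', φ, ψ), ('∼', φ), or a sentence letter.
A1 = ('→', '?f', ('→', '?g', '?f'))
A2 = ('→', ('→', '?f', ('→', '?g', '?h')),
           ('→', ('→', '?f', '?g'), ('→', '?f', '?h')))
A3 = ('→', ('→', ('∼', '?g'), ('∼', '?f')),
           ('→', ('→', ('∼', '?g'), '?f'), '?g'))

def match(schema, wff, env):
    # Instantiate '?'-variables so that schema becomes wff, using the
    # same substitution for repeated variables (the "fourth point" above).
    if isinstance(schema, str) and schema.startswith('?'):
        if schema in env:
            return env[schema] == wff
        env[schema] = wff
        return True
    if isinstance(schema, str):
        return schema == wff
    return (isinstance(wff, tuple) and len(wff) == len(schema)
            and all(match(s, w, env) for s, w in zip(schema, wff)))

def is_axiom(wff):
    return any(match(schema, wff, {}) for schema in (A1, A2, A3))

def is_proof_from(lines, gamma):
    # Each line: an axiom, a member of gamma, or MP from two earlier lines.
    for i, wff in enumerate(lines):
        ok = (is_axiom(wff) or wff in gamma or
              any(lines[j] == ('→', lines[k], wff)
                  for j in range(i) for k in range(i)))
        if not ok:
            return False
    return True

# The example above: a proof of P→P from {(P→Q)}.
proof = [
    ('→', 'P', ('→', 'Q', 'P')),                          # A1
    ('→', ('→', 'P', ('→', 'Q', 'P')),
          ('→', ('→', 'P', 'Q'), ('→', 'P', 'P'))),       # A2
    ('→', ('→', 'P', 'Q'), ('→', 'P', 'P')),              # 1,2 MP
    ('→', 'P', 'Q'),                                      # member of Γ
    ('→', 'P', 'P'),                                      # 3,4 MP
]
print(is_proof_from(proof, {('→', 'P', 'Q')}))
```

The checker accepts the five-line proof above, and rejects, say, the one-line sequence consisting of P→Q alone with empty Γ, since P→Q is neither an axiom instance nor obtainable by MP.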
2.2.1 Examples of axiomatic proofs in PL
I said that doing derivations in natural deduction systems is easy. I meant: easy relative to doing proofs in axiomatic systems, which is extremely hard. Some, of course, are easy, for instance (P →Q)→(P →P ):
1. P→(Q→P)                          (A1)
2. (P→(Q→P))→((P→Q)→(P→P))          (A2)
3. (P→Q)→(P→P)                      1,2 MP

The next example is a little harder: (R→P)→(R→(Q→P))

1. [R→(P→(Q→P))]→[(R→P)→(R→(Q→P))]      A2
2. P→(Q→P)                              A1
3. [P→(Q→P)]→[R→(P→(Q→P))]              A1
4. R→(P→(Q→P))                          2,3 MP
5. (R→P)→(R→(Q→P))                      1,4 MP
Here's how I approached this problem. What I was trying to prove, namely (R→P)→(R→(Q→P)), is a conditional whose antecedent and consequent both begin: (R→. That looks like the consequent of A2. So I wrote out an instance of A2 whose consequent was the formula I was trying to prove; that gave me line 1 of the proof. Then I tried to figure out a way to get the antecedent of line 1, namely R→(P→(Q→P)). And that turned out to be pretty easy. The consequent of this formula, P→(Q→P), is an axiom (line 2 of the proof). And if you can get a formula φ, then you can choose anything you like — say, R — and then get R→φ, by using A1 and modus ponens; that's what I did in lines 3 and 4. In fact, we'll want to make a move like this in many proofs, whenever we have φ on its own and we want to move to ψ→φ. Let's call this move "adding an antecedent"; this is how it is done:

1. φ              (from earlier lines)
2. φ→(ψ→φ)        A1
3. ψ→φ            1,2 MP

In future proofs, instead of repeating such steps, let's just move directly from φ to ψ→φ, with the justification "adding an antecedent". Think of the theorem we just proved, (R→P)→(R→(Q→P)), as "weakening the consequent": Q→P is weaker than P, so if R implies P, it also implies Q→P. Next I will prove a theorem that intuitively corresponds to "strengthening the antecedent": [(P→Q)→R]→(Q→R). The intuitive idea is that if P→Q leads to R, then Q ought to lead to R, since Q is stronger than P→Q. This proof will be harder still. I'll start with a sketch of a new technique, which I'll call the "MP technique". Here's what the technique will let us do. Suppose we can separately prove φ→ψ and φ→(ψ→χ). The technique then shows us how to construct a proof of
φ→χ. I call this the MP technique because its effect is that you can do modus ponens "within the consequent of the conditional φ→". Here's how the MP technique works:

1. φ→ψ                              (from earlier lines)
2. φ→(ψ→χ)                          (from earlier lines)
3. (φ→(ψ→χ))→((φ→ψ)→(φ→χ))          A2
4. (φ→ψ)→(φ→χ)                      2,3 MP
5. φ→χ                              1,4 MP
The MP technique can now be used to prove [(P→Q)→R]→(Q→R), letting φ = (P→Q)→R, ψ = Q→(P→Q), and χ = Q→R.

a. [(P→Q)→R]→[Q→(P→Q)]              see below
b. [(P→Q)→R]→[(Q→(P→Q))→(Q→R)]      see below
c. [(P→Q)→R]→[Q→R]                  a,b MP technique
All that remains is to supply separate proofs of lines a and b. Step a is pretty easy. Its consequent, Q→(P→Q), is an instance of A1, so we can prove it in one line, then use "adding an antecedent" to get a. Line b is a bit harder. It has the form: (α→β)→[(γ→α)→(γ→β)]. Call this theorem (*). The following proof of (*) uses the MP technique again!

1. [γ→(α→β)]→[(γ→α)→(γ→β)]              A2
2. (α→β)→[γ→(α→β)]                      A1
3. (α→β)→{[γ→(α→β)]→[(γ→α)→(γ→β)]}      adding an antecedent to line 1
4. (α→β)→[(γ→α)→(γ→β)]                  2,3 MP technique

For this use of the MP technique, φ = α→β, ψ = γ→(α→β), and χ = (γ→α)→(γ→β). That's it! ⊢ [(P→Q)→R]→(Q→R) has been established. Notice that once the proofs get longer, it gets impractical to actually write out the entire proof. It's easier to do proof sketches. If you wanted to write out the entire proof, you'd need to take each of the bits, fill in the details, and assemble the results into one proof. We can use the techniques we've developed so far to prove some further theorems. For instance, let's show that in a nested conditional, one can swap the antecedents: ⊢ [P→(Q→R)]→[Q→(P→R)]:
a. [P→(Q→R)]→[(P→Q)→(P→R)]                      A2
b. [P→(Q→R)]→{[(P→Q)→(P→R)]→[Q→(P→R)]}          see below
c. [P→(Q→R)]→[Q→(P→R)]                          a,b MP technique
(In this case of the MP technique, φ = P→(Q→R), ψ = (P→Q)→(P→R), and χ = Q→(P→R).) All that remains is to prove [P→(Q→R)]→{[(P→Q)→(P→R)]→[Q→(P→R)]}. But that's pretty easy, given the techniques we've already amassed; here's the sketch:

1. [(P→Q)→(P→R)]→[Q→(P→R)]                      strengthening the antecedent
2. [P→(Q→R)]→{[(P→Q)→(P→R)]→[Q→(P→R)]}          1, adding an antecedent
QED

Next let's show that ⊢ (P→Q)→[(Q→R)→(P→R)], which says in effect that conditionals are "transitive". Here's a sketch:

1. {(Q→R)→[(P→Q)→(P→R)]}→{[(Q→R)→(P→Q)]→[(Q→R)→(P→R)]}      A2
2. (Q→R)→[(P→Q)→(P→R)]                                      (*) (proved above)
3. [(Q→R)→(P→Q)]→[(Q→R)→(P→R)]                              1,2 MP
4. {[(Q→R)→(P→Q)]→[(Q→R)→(P→R)]}→{(P→Q)→[(Q→R)→(P→R)]}      strengthening the antecedent
5. (P→Q)→[(Q→R)→(P→R)]                                      3,4 MP
QED

Given this transitivity theorem, we can always move from φ→ψ and ψ→χ to φ→χ thus:

1. φ→ψ                          (from earlier lines)
2. ψ→χ                          (from earlier lines)
3. (φ→ψ)→[(ψ→χ)→(φ→χ)]          transitivity theorem just proved
4. (ψ→χ)→(φ→χ)                  1,3 MP
5. φ→χ                          2,4 MP

So, such moves may henceforth be justified by appeal to "transitivity". The following proof of the law of contraposition uses this move:
1. (∼Q→∼P)→[(∼Q→P)→Q]           A3
2. [(∼Q→P)→Q]→(P→Q)             strengthening the antecedent
3. (∼Q→∼P)→(P→Q)                1,2 transitivity
QED

One final proof, of ∼P→(P→Q):

1. ∼P→(∼Q→∼P)                   A1
2. (∼Q→∼P)→(P→Q)                contraposition
3. ∼P→(P→Q)                     1,2 transitivity
2.2.2 The deduction theorem
Axiomatic proofs are difficult, in large part, because assumptions are not allowed. In a natural deduction system, one may assume the antecedent, φ, reason one's way to the consequent, ψ, and then conclude the conditional φ→ψ. This form of proof is much easier. While assumptions are not allowed in proofs in our system, in fact the following theorem about our system may be proved:

Deduction theorem: If Γ∪{φ} ⊢ ψ, then Γ ⊢ (φ→ψ)

What the deduction theorem says is that whenever there exists a proof from (Γ and) φ to ψ, then there also exists a proof of φ→ψ (from Γ). That is not to say that one is allowed to assume φ in a proof of φ→ψ. But once one proves the deduction theorem (I won't do that here), then if one can prove ψ after assuming φ, one is entitled to conclude that a proof of φ→ψ exists.
2.3 Semantics of PL
The notion of provability is one conception of logical consequence. An alternate conception is the following: ψ is a logical consequence of φ1 , . . . , φn , iff there is “no way” for φ1 , . . . , φn to be true while ψ is false. This is called a semantic conception of logical consequence, because it appeals to the notion of truth, which in turn involves the notion of meaning. We need to make the semantic conception of logical consequence more precise. We will in effect employ the method of truth tables, but in a formalized way. Let’s define the notion of a propositional assignment function (PL-assignment):
A is a PL-assignment =df A is a function, or rule, that assigns either 1 or 0 to every sentence letter of PL

Think of 0 and 1 as truth values; thus an assignment assigns truth values to sentence letters. Instead of saying "let P be false, and Q be true", we can say: let A be an assignment such that A(P)=0 and A(Q)=1. Once we settle what truth values a given assignment assigns to the sentence letters, the truth values of complex sentences containing those sentence letters are thereby fixed. The usual, informal, method for showing exactly how those truth values are fixed is by giving truth tables. The standard truth tables for the → and ∼ are the following:

φ   ψ   φ→ψ
1   1    1
1   0    0
0   1    1
0   0    1

φ   ∼φ
1    0
0    1

What we will do, instead, is write out a formal definition of a function — the valuation function — that assigns truth values to complex sentences as a function of the truth values of their sentence letters — i.e., as a function of a given PL-assignment A. But as you'll see, the idea is the same as the truth tables: the truth tables are really just pictures of the valuation function:

For any PL-assignment, A, the PL-valuation for A, VA, is defined as the function that assigns to each wff either 1 or 0, and which is such that, for any wffs φ and ψ,
i) if φ is a sentence letter, then VA(φ) = A(φ)
ii) VA(φ→ψ) = 1 iff either VA(φ) = 0 or VA(ψ) = 1
iii) VA(∼φ) = 1 iff VA(φ) = 0

Notice an interesting feature of this definition: the very expression we are trying to define, 'VA', appears in the right hand side of clauses ii) and iii) of the definition. Is that "circular"? Not in any objectionable way. This definition is what is called a "recursive" definition, and recursive definitions are legitimate despite this sort of circularity. The reason is that clauses ii) and iii) define
the valuation of complex formulas (∼φ and φ→ψ) in terms of the valuations of smaller formulas (φ and ψ). These smaller formulas may themselves be complex, and therefore may have their valuations determined, via clauses ii) and iii), in terms of yet smaller formulas, and so on. But eventually (since every formula has a finite length!) the valuation of a formula will be determined by clause i), not clauses ii) and iii). And clause i) is not circular: in that clause, the valuation of φ is determined by a direct appeal to the assignment function A. Recursive definitions always "bottom out" in this way; they always include a clause (called the "base" clause) like i). A digression: notice that in the definition of a valuation function I use the English logical connectives 'either…or' and 'iff'; and in many places in the future I'll be using other English logical words such as 'not', 'all', etc. I use these English connectives rather than the logical connectives ∧, ∨, →, ↔, ∼, ∀, and ∃, to draw attention to the fact that I'm not writing down wffs of the language of study (in this case, the language of propositional logic), but rather writing down sentences of our "metalanguage" — the informal language we use to discuss logical languages. Notice that there's no "circularity" here. In giving these definitions, our goal is not to define logical concepts such as negation, conjunction, quantification, etc. That is probably impossible. What we are doing is i) starting with the assumption that we already understand the logical concepts, and ii) trying to use those notions to provide a formalized semantics for a logical language. This can be put in terms of "object language" — the language under study, in this case the language of propositional logic — and "metalanguage" — the language used to talk about the object language, in this case English. We use metalanguage connectives, such as 'iff' and 'or', to provide a semantics for the object language connectives ∼, →, etc.
Back to the definition. There is no need to give clauses for the other connectives, since they're not officially in our language — "φ∧ψ", for example, is really just an abbreviation we use for the sentence ∼(φ→∼ψ). However, we could derive that the following facts are indeed true of valuation functions:

For any assignment A, and any wffs ψ and χ:
VA(ψ∧χ) = 1 iff VA(ψ) = 1 and VA(χ) = 1
VA(ψ∨χ) = 1 iff either VA(ψ) = 1 or VA(χ) = 1
VA(ψ↔χ) = 1 iff VA(ψ) = VA(χ)
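The recursive definition of VA transcribes almost clause for clause into a recursive function, and a derived fact like the clause for ∧ can then be confirmed by brute force over assignments. A sketch only; the encoding of wffs as nested tuples and the function names are my own:

```python
from itertools import product

# Wffs as nested tuples: ('→', φ, ψ), ('∼', φ), or a sentence letter.
# A is a dict mapping sentence letters to 0 or 1.
def val(wff, A):
    if isinstance(wff, str):                         # clause i)
        return A[wff]
    if wff[0] == '∼':                                # clause iii)
        return 1 if val(wff[1], A) == 0 else 0
    left, right = wff[1], wff[2]                     # clause ii), wff = ('→', left, right)
    return 1 if (val(left, A) == 0 or val(right, A) == 1) else 0

# ψ∧χ abbreviates ∼(ψ→∼χ):
def conj(p, q):
    return ('∼', ('→', p, ('∼', q)))

# Check the derived clause for ∧ on every assignment to P and Q:
for a, b in product((0, 1), repeat=2):
    A = {'P': a, 'Q': b}
    assert val(conj('P', 'Q'), A) == min(a, b)       # = 1 iff both are 1
print('derived clause for ∧ checks out')
```

This is the "brute force" counterpart of the rigorous metalanguage argument given next: instead of reasoning from the clauses, it simply inspects all four rows of the truth table.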
We can show that all three of these hold. I’ll do the first one here; the others
are homework exercises. I'll write it out in a little more detail than you need to use, to make it clear exactly how the reasoning works:

Let ψ and χ be any wffs. The expression ψ∧χ is an abbreviation for the expression ∼(ψ→∼χ). So we want to show that, for any PL-assignment A, VA(∼(ψ→∼χ)) = 1 iff VA(ψ)=1 and VA(χ)=1. Now, in order to show that a statement α holds iff a statement β holds, we must first show that if α holds, then β holds (the forwards, "⇒", direction); then we must show that if β holds, then α holds (the backwards, "⇐", direction):

⇒: First assume that VA(∼(ψ→∼χ))=1. Then, by the definition of the valuation function, clause for ∼, VA(ψ→∼χ)=0. So,2 VA(ψ→∼χ) is not 1. But then, by the clause in the definition of VA for the →, we know that it's not the case that: either VA(ψ)=0 or VA(∼χ)=1. That is: VA(ψ)=1 and VA(∼χ)=0. From the latter, by the clause for ∼, we know that VA(χ)=1. That's what we wanted to show — that VA(ψ)=1 and VA(χ)=1.

⇐: This is sort of like undoing the previous half. Suppose that VA(ψ)=1 and VA(χ)=1. Since VA(χ)=1, by the clause for ∼, VA(∼χ)=0; but now since VA(ψ)=1 and VA(∼χ)=0, by the clause for → we know that VA(ψ→∼χ)=0; then by the clause for ∼, we know that VA(∼(ψ→∼χ))=1, which is what we were trying to show.

It's important to distinguish the kind of argument I just gave from what were called "proofs" in section 2.2. The proofs of section 2.2 are object-language proofs. The notion of a proof was rigorously defined ("a sequence of wffs, each of which is either …") and very restrictive: modus ponens was the only rule, and there was no conditional proof or proof by reductio. The argument just given, on the other hand, was given in the metalanguage, in an unformalized language, and using unformalized techniques of argument. "Unformalized" doesn't imply lack of rigor. The argument was perfectly rigorous: it conforms to the standards of good argumentation that generally prevail in mathematics.
2. The careful reader will note that here (and henceforth), I treat "VA(α)=0" and "VA(α) is not 1" interchangeably (for any wff α). (Similarly for "VA(α)=1" and "VA(α) is not 0".) This is justified as follows. First, if VA(α) is 0, then it can't also be that VA(α) is 1, since VA was stipulated to be a function. Second, since it was stipulated that VA assigns either 0 or 1 to each wff, if VA(α) is not 1, then VA(α) must be 0.

We're free to use reductio ad absurdum, conditional proof, universal proof (to
establish something of the form "everything is thus-and-so", we consider an arbitrary thing and show that it is thus-and-so). We may "skip steps" if it's clear how the argument is supposed to go. In short, what we must do is convince a well-informed and mathematically sophisticated reader that the result we're after is indeed correct. With our definition of an assignment, and of truth-in-an-assignment, in place, we can now define the semantic versions of the notions of logical truth, logical consequence, and logically correct argument. The semantic notion of a logical truth is that of a valid formula:

φ is valid =df for every PL-assignment A, VA(φ)=1

We write "⊨ φ" for "φ is valid". The valid formulas of propositional logic are often called "tautologies". As for logical consequence, the semantic version of this notion is that of one formula's being a semantic consequence of a set of formulas:

φ is a semantic consequence of the wffs in set Γ =df for every PL-assignment A, IF for each θ in Γ, VA(θ)=1, THEN VA(φ)=1

We write "Γ ⊨ φ" for "φ is a semantic consequence of Γ". The intuitive idea is that φ is a semantic consequence of Γ if φ is true whenever each of the members of Γ is true. Finally, we can say that an argument is semantically logically correct iff its conclusion is a semantic consequence of (the set of) its premises. A parenthetical remark: now we can see the importance of setting up the grammar for our system according to precise rules. If we hadn't, the definition of 'truth value' given here would have been impossible. In this definition we defined truth values of complicated formulas based on their form. For example, if a formula has the form (φ→ψ), then we assigned it an appropriate truth value based on the truth values of φ and ψ. But suppose we had a formula in our language that looked as follows:

P→P→P

and suppose that P has truth value 0. What is the truth value of the whole? We can't tell, because of the missing parentheses.
For if the parentheses look like this:

(P→P)→P

then the truth value is 0, whereas if the parentheses look like this:

P→(P→P)

then it is 1. Certain kinds of ambiguity, then, make it impossible to assign truth values. We solve this problem in logic by pronouncing the original string “P→P→P” as ill-formed; it is missing parentheses. Thus, the precise rules of grammar assure us that when it comes time for doing semantics, we’ll be able to assign semantic values (in this case, truth values) in an unambiguous way.

Notice also a fact about validity in propositional logic: it is mechanically “decidable” — a computer program could be written that is capable of telling, for any given formula, whether or not that formula is valid. The program would simply construct a complete truth table for the formula in question. We can observe that this is possible by noting the following: every formula contains a finite number N of sentence letters, and so for any formula, there are only a finite number of different “cases” one needs to check — namely, the 2^N permutations of truth values for the contained sentence letters. But given any assignment of truth values to the sentence letters of a formula, it’s clearly a perfectly mechanical procedure to compute the truth value the formula takes for those truth values — simply apply the rules for ∼ and → repeatedly.

It’s worth being very clear about two assumptions in this proof (which, by the way, is our first bit of metatheory — our first proof about a logical system). They are: that every formula has a finite number of sentence letters, and that the truth values of sentence letters not contained in a formula do not affect the truth value of the formula.
We need the latter assumption to be sure that there are only a finite number of cases we need to check — namely, the permutations of truth values of the contained sentence letters — in order to see whether a formula is valid; for there are infinitely many valuation functions (since there are infinitely many sentence letters in the language of PL). These two assumptions are obviously true, but it would be good to prove them. I’ll prove the first assumption here; the second may be proved by a similar method, namely by induction.

Proof that every wff contains a finite number of sentence letters: In this sort of proof by induction, we’re trying to prove a statement of the form: every wff has property P. The property P in this case is having a finite number of different sentence letters. In order to do this, we must show two separate statements:
base case: we show that every atomic sentence has the property. This is obvious — atomic sentences are just sentence letters, and each of them contains one sentence letter, and thus finitely many different sentence letters.

induction step: we begin by assuming that if formulas φ and ψ have the property, then so will the complex formulas one can form from φ and ψ by the rules of formation, namely ∼φ and φ→ψ. So, we assume that φ and ψ have finitely many different sentence letters; and we show that the same must hold for ∼φ and φ→ψ. That’s obvious: ∼φ has as many different sentence letters as does φ; since φ, by assumption, has only finitely many, then so does ∼φ. As for φ→ψ, by hypothesis, φ and ψ have finitely many different sentence letters, and so φ→ψ has, at most, n + m sentence letters, where n and m are the number of different sentence letters in φ and ψ, respectively.

We’ve shown that every atomic formula has the property having a finite number of different sentence letters; and we’ve shown that the property is inherited by complex formulas built according to the recursion rules. But every wff is either atomic, or built from atomics by a finite series of applications of the recursion rules. Therefore, by induction, every wff has the property. QED.
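The decision procedure sketched above, together with the definitions of validity and semantic consequence, can be made computationally concrete. Here is a minimal Python sketch under encoding assumptions of my own (none of this notation is from the text): a wff is a sentence letter (a string) or a nested tuple built with '~' and '->', and a valuation is modeled by a dictionary over just the letters that occur in the formulas at issue, which suffices given the two assumptions just discussed.

```python
from itertools import product

def V(wff, A):
    """Truth value (0 or 1) of wff under assignment A (dict: letter -> 0/1)."""
    if isinstance(wff, str):                      # sentence letter
        return A[wff]
    if wff[0] == '~':                             # negation
        return 1 - V(wff[1], A)
    if wff[0] == '->':                            # conditional: false only for 1 -> 0
        return 0 if (V(wff[1], A), V(wff[2], A)) == (1, 0) else 1
    raise ValueError('unknown connective: %r' % (wff[0],))

def letters(wff):
    """Set of sentence letters occurring in wff (always finite)."""
    if isinstance(wff, str):
        return {wff}
    return set().union(*(letters(sub) for sub in wff[1:]))

def assignments(ls):
    """All 2^N assignments of truth values to the letters in ls."""
    ls = sorted(ls)
    return [dict(zip(ls, row)) for row in product((1, 0), repeat=len(ls))]

def valid(wff):
    """wff is valid iff it gets value 1 on every assignment."""
    return all(V(wff, A) == 1 for A in assignments(letters(wff)))

def consequence(gamma, wff):
    """wff is a semantic consequence of the set gamma iff every assignment
    making all members of gamma true also makes wff true."""
    ls = letters(wff).union(*(letters(g) for g in gamma))
    return all(V(wff, A) == 1
               for A in assignments(ls)
               if all(V(g, A) == 1 for g in gamma))
```

For example, `valid(('->', 'P', 'P'))` holds while `valid('P')` fails, and `consequence([('->', 'P', 'Q'), 'P'], 'Q')` captures modus ponens; since a formula has finitely many sentence letters, the search always terminates.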
2.4
Soundness and completeness of PL
We have two notions now, the notion of a theorem and the notion of a valid formula. An important accomplishment of metalogic is the establishment of the following two important connections between these notions:

Soundness: every theorem is valid (if ` φ then ⊨ φ)

Completeness: every valid wff is a theorem (if ⊨ φ then ` φ)

It’s pretty easy to establish soundness:

Soundness proof for PL: We’re going to prove this by induction as well. But our inductive proof here is slightly different. We’re not trying to prove something of the form “Every wff has property P”. Instead, we’re trying to prove something of the form “Every
theorem has property P”. In this case, the property P is: being a valid formula. Here’s how induction works in this case. A theorem is the last line of any proof. So, to show that every theorem has a certain property P, all we need to do is show that every time one adds another line to a proof, that line has property P. Now, there are two ways one can add to a proof. First, one can add an axiom. The base case of the inductive proof must show that adding axioms always means adding a line with property P. Second, one can add a formula that follows from earlier lines by a rule. The inductive step of the inductive proof must show that in this case, too, one adds a line with property P, provided all the preceding lines have property P. OK, here goes:

base case: here we need to show that every PL-axiom is valid. This is tedious but straightforward. Take A1, for example. Suppose for reductio that some instance of A1 is invalid, i.e., for some assignment A, VA(φ→(ψ→φ))=0. Thus, VA(φ)=1 and VA(ψ→φ)=0. Given the latter, VA(φ)=0 — contradiction. Analogous proofs can be given that instances of A2 and A3 are also valid.

induction step: here we begin by assuming that every line in a proof up to a certain point is valid (this is the “inductive hypothesis”); we then show that if one adds another line that follows from earlier lines by the rule modus ponens, that line must be valid too. I.e., we’re trying to show that “modus ponens preserves validity”. So, assume the inductive hypothesis: that all the earlier lines in the proof are valid. And now, consider the result of applying modus ponens. That means that the new line we’ve added to the proof is some formula ψ, which we’ve inferred from two earlier lines that have the forms φ→ψ and φ. We must show that ψ is a valid formula, i.e., that VA(ψ)=1 for every assignment A. By the inductive hypothesis, all earlier lines in the proof are valid, and hence both φ→ψ and φ are valid.
Thus, VA(φ)=1 and VA(φ→ψ)=1. But if VA(φ)=1 then VA(ψ) can’t be 0, for if it were, then VA(φ→ψ) would be 0, and it isn’t. Thus, VA(ψ)=1.

We’ve shown that axioms are valid, and that modus ponens preserves validity. So, by induction, every time one adds to a proof, one adds a valid formula. So the last line in a proof is always a valid
formula. Thus, every theorem is valid. QED.

Notice the general structure of this proof: we first showed that every axiom has a certain property, and then we showed that the rule of inference preserves the property. Given the definition of ‘theorem’, it followed that every theorem has the property. We chose our definition of a theorem with just this sort of proof in mind. Remember that this is a proof in the metalanguage, about propositional logic. It isn’t a proof in any system of derivation. Completeness is harder, and I won’t prove it here.

Before we leave this chapter, let me summarize and clarify the nature of proofs by induction. Induction is the method of proof to use whenever one is trying to prove that every member of an infinite class of entities has a certain feature F, where each member of that infinite class of entities is generated from certain “starting points” by a finite number of successive “operations”. To do this, one establishes two things: a) that the starting points have feature F, and b) that the operations preserve feature F — i.e., that if the inputs to the operations have feature F then the output also has feature F.

In logic, it is important to distinguish two different cases where proofs by induction are needed. One case is where one is establishing a fact of the form: every theorem has a certain feature F. (The proof of the soundness theorem is an example of this case.) Here’s why induction is applicable: a theorem is defined as the last line of a proof. So the fact to be established is that every line in every proof has feature F. Now, a proof is defined as a finite sequence, where each member is either an axiom or follows from earlier lines by the rule modus ponens. The axioms are the “starting points” and modus ponens is the “operation”.
So all that’s required to show that every line in every proof has feature F is to show that a) the axioms all have feature F, and b) that if you start with formulas that have feature F, and you apply modus ponens, then what you get is something with feature F. More carefully, b) means: if φ has feature F, and φ→ψ has feature F, then ψ has feature F. Once a) and b) are established, one can conclude by induction that all lines in all proofs have feature F.

A second case in which induction may be used is when one is trying to establish a fact of the form: every formula has a certain feature F. (The proof that every wff has a finite number of sentence letters is an example of this case.) Here’s why induction is applicable: all formulas are built out of sentence letters (the “starting points”) by successive applications of the rules of formation (“operations”) (the rules of formation, recall, say that if φ and ψ are formulas,
then so are (φ→ψ) and ∼φ.) So, to show that all formulas have feature F, we must merely show that a) all the sentence letters have feature F, and b) that if φ and ψ both have feature F, then both (φ→ψ) and ∼φ will also have feature F. In any given proof by induction, it’s important to identify which sort of inductive proof one is dealing with.
2.5
Natural Deduction in Propositional Logic
This section develops the method of natural deduction for propositional logic. This is an alternate method for defining provability. Natural deduction proofs are much easier to construct than axiomatic proofs.
2.5.1
Sequents
The basic notion is that of a sequent. A sequent looks like this:

Γ ` φ

Γ is a set of formulas, and φ is a particular formula. The symbol in the middle, `, is the same symbol we used as the metalanguage symbol for provability in section 2.2; but here it is being used for a (somewhat) different purpose: the symbol itself is a part of what we are now calling sequents. What does a sequent mean? Well, strictly speaking a sequent isn’t a statement, so it doesn’t mean anything; but informally think of the sequent Γ ` φ as meaning that φ is a logical consequence of Γ. Accordingly, we can introduce a notion of logical correctness for sequents: the sequent Γ ` φ is logically correct iff the formula φ is a logical consequence of the formulas in Γ. Let’s call the formulas in Γ the premises of the sequent, and φ the conclusion of the sequent; thus, one is entitled to conclude the conclusion of a logically correct sequent from its premises.

Now, from our investigation of the semantics of propositional logic, we already have the makings of a semantic criterion for when a sequent is logically correct: the sequent Γ ` φ is logically correct iff φ is a semantic consequence of Γ. We could also, if we wanted, use the methods of section 2.2 to give a proof-theoretic criterion: Γ ` φ is logically correct iff φ is provable from Γ. But in this section, we will explore a new sort of proof-theoretic criterion for the logical correctness of sequents.
The purpose of this new criterion is to model the kind of reasoning one employs in everyday life (and which is modeled in the derivation systems of many introductory logic books.) In ordinary reasoning, one reasons by means of assumptions. For example, in order to establish a conditional claim “if P then Q”, one would ordinarily i) assume P, ii) reason one’s way to Q, iii) and on that basis conclude that the conditional claim “if P then Q” is true. Another example: to establish a claim of the form “not P”, one would ordinarily i) assume P, ii) reason one’s way to a contradiction, iii) and on that basis conclude that “not P” is true. Reasoning by conditional proof and by reductio ad absurdum is commonplace in ordinary life, but is not allowed in the proof system of section 2.2 (that is why proofs were so hard to construct in that system!) The natural deduction proofs that we will introduce in this section do allow (analogs of) conditional proof and reductio, and are therefore much easier to construct. Superficially, proofs in our natural deduction system will look a lot like axiomatic proofs. A natural deduction proof is defined as a sequence of sequents; to prove a sequent, one must simply exhibit a proof whose last line is the desired sequent. Rules of inference now allow us to move from sequents to sequents, rather than from formulas to formulas. Some sequents may be entered into a proof at any time; these function like axioms. But as we will see, natural deduction proofs are much easier to construct than axiomatic proofs, since the “conditional” nature of sequents will, in essence, allow us to work with assumptions. (It’s a bit weird to think of proving sequents, since each sequent itself asserts a relation of logical consequence between its premises and its conclusion. But you’ll get used to it.)
2.5.2
Rules
Our rules tell us when we can write down sequents. The first rule, the rule of assumptions (As), says we can always write down any sequent of the following form:

φ ` φ

All such a sequent says is that φ can be proved if we assume φ itself. The rest of the rules consist of introduction rules and elimination rules for each of the connectives ∼, ∧, ∨, → (let’s forget the ↔; φ↔ψ can just be eliminated in favor of (φ→ψ)∧(ψ→φ).) Let’s start with ∧I:
Γ ` φ        ∆ ` ψ
------------------
Γ,∆ ` φ∧ψ

This says that if some assumptions Γ lead to φ, and some assumptions ∆ lead to ψ, then from all the assumptions together, the ones in Γ and the ones in ∆, we can get to φ∧ψ. Next we have ∧E:

Γ ` φ∧ψ
----------------
Γ ` φ      Γ ` ψ

This says that if Γ leads to the conjunction φ∧ψ, then we should be able to infer either of the two sequents below the line, Γ ` φ or Γ ` ψ (or both). Next, ∨I and ∨E:

Γ ` φ            Γ ` φ
---------        ---------
Γ ` φ∨ψ          Γ ` ψ∨φ

Γ ` φ∨ψ      ∆1,φ ` χ      ∆2,ψ ` χ
-----------------------------------
Γ,∆1,∆2 ` χ

Let’s think about what ∨E means. Remember the intuitive meaning of a sequent: its conclusion is a logical consequence of its premises. Another (related) way to think of it is that Γ ` φ means that one can establish that φ if one assumes the members of Γ. So, if the sequent Γ ` φ∨ψ is logically correct, that means we’ve got the disjunction φ∨ψ, assuming the formulas in Γ. Now, suppose we can reason to a new formula χ, assuming φ, plus perhaps some other assumptions ∆1. And suppose we can also reason to χ from ψ, plus perhaps some other assumptions ∆2. Then, since either φ or ψ (plus the assumptions in ∆1 and ∆2) leads to χ, and we know that φ∨ψ is true (conditional on the assumptions in Γ), we ought to be able to infer χ itself, assuming the assumptions we needed along the way (∆1 and ∆2), plus the assumptions we needed to get φ∨ψ, namely, Γ. Next, we have double negation (DN):

Γ ` φ            Γ ` ∼∼φ
---------        ---------
Γ ` ∼∼φ          Γ ` φ

In connection with negation, we also have the rule of reductio ad absurdum, RAA:

Γ,φ ` ψ∧∼ψ
-----------
Γ ` ∼φ
That is, if φ (along with perhaps some other assumptions, Γ) leads to a contradiction, we can conclude that ∼φ is true (given the assumptions in Γ). And finally we have →I and →E:

Γ,φ ` ψ            Γ ` φ→ψ      ∆ ` φ
---------          -------------------
Γ ` φ→ψ            Γ,∆ ` ψ
→E is perfectly straightforward; it’s just modus ponens. But →I requires a bit more thought. →I is just the principle of conditional proof. Suppose you can get to ψ on the assumption that φ (plus perhaps some other assumptions Γ.) Then, you should be able to conclude that the conditional φ→ψ is true (assuming the formulas in Γ). Put another way: if you want to establish the conditional φ→ψ, all you need to do is assume that φ is true, and reason your way to ψ.
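The premise-set bookkeeping these rules perform can be mechanized. Below is a hypothetical Python sketch of a few of the rules (the encoding of sequents as pairs of a frozenset of premises and a conclusion, and all function names, are mine, not the text's; ∨E, DN, and RAA are omitted for brevity):

```python
# A sequent is a pair (premises, conclusion); premises is a frozenset of wffs.
# Wffs: sentence letters are strings; ('&', p, q) and ('->', p, q) are complex.

def As(phi):
    """Rule of assumptions: phi |- phi."""
    return (frozenset([phi]), phi)

def and_I(seq1, seq2):
    """From Gamma |- phi and Delta |- psi, infer Gamma,Delta |- phi & psi."""
    (gamma, phi), (delta, psi) = seq1, seq2
    return (gamma | delta, ('&', phi, psi))

def and_E(seq, side):
    """From Gamma |- phi & psi, infer Gamma |- phi (side=1) or Gamma |- psi (side=2)."""
    gamma, conj = seq
    assert conj[0] == '&', 'not a conjunction'
    return (gamma, conj[side])

def arrow_I(seq, phi):
    """Conditional proof: from Gamma,phi |- psi, infer Gamma |- phi -> psi.
    Discharges phi from the premise set."""
    gamma, psi = seq
    return (gamma - {phi}, ('->', phi, psi))

def arrow_E(seq1, seq2):
    """Modus ponens: from Gamma |- phi -> psi and Delta |- phi, infer Gamma,Delta |- psi."""
    (gamma, cond), (delta, phi) = seq1, seq2
    assert cond[0] == '->' and cond[1] == phi, 'rule does not apply'
    return (gamma | delta, cond[2])

# For instance, the sequent P&Q |- Q&P can be reached step by step:
s1 = As(('&', 'P', 'Q'))                 # P&Q |- P&Q
s2 = and_E(s1, 1)                        # P&Q |- P
s3 = and_E(s1, 2)                        # P&Q |- Q
s4 = and_I(s3, s2)                       # P&Q |- Q&P
```

Notice that arrow_I is the only rule that shrinks a premise set: it models the discharging of an assumption in conditional proof, exactly as in the statement of →I above.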
2.5.3
Example derivations
While this system may not look exactly like the derivation systems familiar from introductory logic textbooks, the essence is the same. Each system is designed to model reasoning with assumptions. The difference is that the derivation systems in introductory textbooks keep track of which assumptions are operative at a given point in a proof by means of the placement of that point in the proof on the page, and by means of drawing lines or boxes around parts of the proof once the assumptions that led to those parts are no longer operative, whereas the current system keeps track of the operative assumptions by explicitly placing them in the premise sets of sequents. A little experimentation will show that all the usual techniques for proving things in the usual systems carry over to the present system. In this section I’ll do a few examples to make this clear.

A first simple example: let’s show that P∧Q ` Q∧P:

1. P∧Q ` P∧Q      As
2. P∧Q ` P        1, ∧E
3. P∧Q ` Q        1, ∧E
4. P∧Q ` Q∧P      2,3 ∧I
Notice the strategy. We first use the rule of assumptions to enter the premise of the sequent we’re trying to prove: P∧Q. We then use the rules of inference to infer the conclusion of that sequent: Q∧P. Since our initial assumption of P∧Q was dependent on the formula P∧Q, our subsequent inferences remain
dependent on that same assumption, and so the final sequent concluded, Q∧P, remains dependent on that assumption.

Let’s write our proofs out in a simpler way. Instead of writing out entire sequents, let’s write out only their conclusions. We can indicate the premises of the sequent using line numbers; the line numbers indicating the premises of the sequent will go to the left of the number indicating the sequent itself. Rewriting the previous proof in this way yields:

1    (1) P∧Q      As
1    (2) P        1, ∧E
1    (3) Q        1, ∧E
1    (4) Q∧P      2,3 ∧I
Next, let’s have an example to illustrate conditional proof. Let’s establish P→Q, Q→R ` P→R:

1. P→Q ` P→Q             As
2. Q→R ` Q→R             As
3. P ` P                 As
4. P→Q, P ` Q            1,3 →E
5. P→Q, Q→R, P ` R       2,4 →E
6. P→Q, Q→R ` P→R        5, →I

It can be rewritten in the simpler style as follows:

1      (1) P→Q       As
2      (2) Q→R       As
3      (3) P         As
1,3    (4) Q         1,3 →E
1,2,3  (5) R         2,4 →E
1,2    (6) P→R       5, →I
Let’s think about this example. We’re trying to establish P →R on the basis of two formulas, P →Q and Q→R, so we start by assuming the latter two formulas. Then, since the formula we’re trying to establish is a conditional, we assume the antecedent of the conditional, in line 3. We then proceed, on that basis, to reason our way to R, the consequent of the conditional we’re trying to prove. (Notice how in lines 4 and 5, we add more line numbers on the very left. Whenever we use → E, we increase dependencies: when we infer
Q from P and P→Q, our conclusion Q depends on all the formulas that P and P→Q depended on, namely, the formulas on lines 1 and 3. Look back to the statement of the rule →E: the conclusion ψ depends on all the formulas that φ and φ→ψ depended on: Γ and ∆.) That brings us to line 5. At that point, we’ve shown that R can be proven, on the basis of various assumptions, including P. The rule →I — that is, the rule of conditional proof — then lets us conclude that the conditional P→R follows merely on the basis of the other assumptions; that rule, note, lets us in line 6 drop line 3 from the list of assumptions on which P→R depends.

Next let’s establish an instance of DeMorgan’s Law, ∼(P∨Q) ` ∼P∧∼Q:

1     (1)  ∼(P∨Q)              As
2     (2)  P                   As (for reductio)
2     (3)  P∨Q                 2, ∨I
1,2   (4)  (P∨Q)∧∼(P∨Q)        1,3 ∧I
1     (5)  ∼P                  4, RAA
6     (6)  Q                   As (for reductio)
6     (7)  P∨Q                 6, ∨I
1,6   (8)  (P∨Q)∧∼(P∨Q)        1,7 ∧I
1     (9)  ∼Q                  8, RAA
1     (10) ∼P∧∼Q               5,9 ∧I
Next let’s establish ∅ ` P∨∼P (∅ is the null set):

1    (1)  ∼(P∨∼P)                 As
2    (2)  P                       As (for reductio)
2    (3)  P∨∼P                    2, ∨I
1,2  (4)  (P∨∼P)∧∼(P∨∼P)          1,3 ∧I
1    (5)  ∼P                      4, RAA
6    (6)  ∼P                      As (for reductio)
6    (7)  P∨∼P                    6, ∨I
1,6  (8)  (P∨∼P)∧∼(P∨∼P)          1,7 ∧I
1    (9)  ∼∼P                     8, RAA
1    (10) ∼P∧∼∼P                  5,9 ∧I
∅    (11) ∼∼(P∨∼P)                10, RAA
∅    (12) P∨∼P                    11, DN
Comment: my overall goal was to assume ∼(P∨∼P) and then derive a contradiction. And my route to the contradiction was to separately establish ∼P (lines 2-5) and ∼∼P (lines 6-9), each by reductio arguments.

Finally, let’s establish a sequent corresponding to a way that ∨E is sometimes formulated: P∨Q, ∼P ` Q:

1      (1)  P∨Q        As
2      (2)  ∼P         As
3      (3)  Q          As (for use with ∨E)
4      (4)  P          As (for use with ∨E)
5      (5)  ∼Q         As (for reductio)
4,5    (6)  ∼Q∧P       4,5 ∧I
4,5    (7)  P          6, ∧E
2,4,5  (8)  P∧∼P       2,7 ∧I
2,4    (9)  ∼∼Q        8, RAA
2,4    (10) Q          9, DN
1,2    (11) Q          1,3,10 ∨E
The basic idea of this proof is to use ∨ E on line 1 to get Q. That calls, in turn, for showing that each disjunct of line 1, P and Q, leads to Q. Showing that Q leads to Q is easy; that was line 3. Showing that P leads to Q took lines 4-10; line 10 states the result of that reasoning, namely that Q follows from P (as well as line 2). I began at line 4 by assuming P . Then my strategy was to establish Q by reductio, so I assumed ∼Q in line 5. At this point, I basically had my contradiction: at line 2 I had ∼P and at line 4 I had P . (You might think I had another contradiction: Q at line 3 and ∼Q at line 5. But at the end of the proof, I don’t want my conclusion to depend on line 3, whereas I don’t mind it depending on line 2, since that’s one of the premises of the sequent I’m trying to establish.) So I want to put P and ∼P together, to get P ∧∼P , and then conclude ∼∼Q by RAA. But there is a minor hitch. Look carefully at how RAA is formulated. It says that if we have Γ,φ ` ψ∧∼ψ, we can conclude Γ ` ∼φ. The first of these two sequents includes φ in its premises. That means that in order to conclude ∼φ, the contradiction ψ∧∼ψ needs to depend on φ. So in the present case, in order to finish the reductio argument and conclude ∼∼Q, the contradiction P ∧∼P needs to depend on the reductio assumption ∼Q (line 5.) But if I just used ∧I to put lines 2 and 4 together, the resulting contradiction will only depend on lines 2 and 4. To get around this, I used a little trick. Whenever you have a sequent Γ ` φ, you can always add any formula ψ you like to the premises on which φ depends, using the following
argument:[3]

Γ ` φ              (begin with this)
ψ ` ψ              As (ψ is any chosen formula)
Γ,ψ ` φ∧ψ          ∧I
Γ,ψ ` φ            ∧E
Lines 4, 6 and 7 in the proof employ this trick: initially, at line 4, P only depends on 4, but then by line 7, P also depends on 5. That way, the move from 8 to 9 by RAA is justified.
2.5.4
Theoremhood and consequence
So far we have defined the notion of a provable sequent. We can now use this to define sequent-proof-theoretic (“SPT”) versions of the notions of logical truth, logical consequence, and logical correctness:

A formula φ is an SPT-logical truth iff the sequent ∅ ` φ is a provable sequent.

Formula φ is an SPT-logical consequence of the formulas in set Γ iff the sequent Γ ` φ is a provable sequent.

An argument is SPT-logically correct iff its conclusion is an SPT-logical consequence of its premises.

Given these definitions, and given the semantics for PL described in section 2.3, one could give soundness and completeness proofs.[4] The soundness proof, for instance, would proceed by proving by induction that whenever sequent Γ ` φ is provable, φ is a semantic consequence of Γ. This would proceed by showing that each rule of inference (As, RAA, ∧I, ∧E, etc.) preserves semantic consequence. The completeness proof is harder.

[3] Adding arbitrary dependencies is not allowed in relevance logic, where a sequent is provable only when all of its premises are, in an intuitive sense, relevant to its conclusion. Relevant logicians modify various rules of classical logic, including the rule of ∧E.

[4] Note: if we allow sequents with infinite premise-sets, then to secure completeness we will need to add a rule for adding dependencies: from Γ ` φ conclude Γ ∪ ∆ ` φ, where ∆ is any set of wffs. The trick described above for adding dependencies (using As, ∧I and ∧E) only allows us to add one wff at a time to the premise set of a given sequent, whereas this new rule allows adding infinitely many.
Chapter 3

Variations and Deviations from Standard Propositional Logic

As promised, we will not stop with the standard logics familiar from introductory textbooks. In this chapter we examine some philosophically important variations and deviations from standard propositional logic.
3.1
Alternate connectives
3.1.1
Symbolizing truth functions in propositional logic
Our propositional logic is complete, in a certain sense that I’ll now explain. The basic idea is that we can say anything we want to say with it. More carefully, let’s introduce the idea of a truth function. A truth function is a (finite-placed) function or rule that maps truth values (i.e., 0s and 1s) to truth values. For example, here is a truth function, f:

f(1) = 0
f(0) = 1

This is called a one-place function because it takes only one truth value as input. In fact, we have a name for this truth function: negation. And we express that truth function with our symbol ∼. So the negation truth function is one we can symbolize in propositional logic. More carefully, what we mean by saying “We can symbolize truth function f in propositional logic” is this:
there is some sentence of propositional logic, φ, containing a single sentence letter, P, which has the following feature: whenever P has a truth value t, then the whole sentence φ has the truth value f(t).

The sentence φ is in fact ∼P.

Here’s another truth function, g. g is a two-place truth function, which means that it takes two truth values as inputs:

g(1,1) = 1
g(1,0) = 0
g(0,1) = 0
g(0,0) = 0

In fact, this is the conjunction truth function. And we have a symbol for this truth function: ∧. And as before, we can symbolize function g in propositional logic; this means that:

There is some sentence of propositional logic, φ, containing two sentence letters, P and Q, which has the following feature: whenever P has a truth value, t, and Q has a truth value, t′, then the whole sentence φ has the truth value g(t, t′).

One such sentence φ is in fact: P∧Q. Notice that the sentence φ that symbolizes[1] the function g had to have two sentence letters rather than one. That’s because the function g was a two-place function. In general, the definition of “can be symbolized” is this:

n-place truth function h can be symbolized in propositional logic =df there is some sentence of propositional logic, φ, containing n sentence letters, P1 . . . Pn, which has the following feature: whenever P1 has a truth value, t1, and, …, and Pn has a truth value, tn, then the whole sentence φ has the truth value h(t1 . . . tn)

Here’s another truth function:

i(1,1) = 0
i(1,0) = 1
i(0,1) = 1
i(0,0) = 1

[1] Strictly speaking, we should speak of a sentence symbolizing a truth function relative to an ordering of its sentence letters.
Think of this truth function as “not both”. Unlike the negation and conjunction truth functions, we don’t have a single symbol for this truth function. Nevertheless, it can be symbolized in propositional logic: by the following sentence: ∼(P∧Q). Now, it’s not too hard to prove that:

All truth functions (of any finite number of places) can be symbolized in propositional logic using just the ∧, ∨, and ∼.

I’ll begin by illustrating the idea of the proof with an example. Suppose we want to symbolize the following three-place truth function:

f(1,1,1) = 0
f(1,1,0) = 1
f(1,0,1) = 0
f(1,0,0) = 1
f(0,1,1) = 0
f(0,1,0) = 0
f(0,0,1) = 1
f(0,0,0) = 0
Since this truth function returns the value 1 in just three cases (rows two, four, and seven), what we want is a sentence containing three sentence letters, P1, P2, P3, that is true in exactly those three cases: (a) when P1, P2, P3 take on the three truth values in the second row (i.e., 1, 1, 0), (b) when P1, P2, P3 take on the three truth values in the fourth row (1, 0, 0), and (c) when P1, P2, P3 take on the three truth values in the seventh row (0, 0, 1). Now, we can construct a sentence that is true in case (a) and false otherwise: P1∧P2∧∼P3. We can also construct a sentence that’s true in case (b) and false otherwise: P1∧∼P2∧∼P3. And we can also construct a sentence that’s true in case (c) and false otherwise: ∼P1∧∼P2∧P3. But then we can simply disjoin these three sentences to get the sentence we want:

(P1∧P2∧∼P3) ∨ (P1∧∼P2∧∼P3) ∨ (∼P1∧∼P2∧P3)
(Strictly speaking the three-way conjunctions, and the three-way disjunction, need parentheses added, but since it doesn’t matter where they’re added — conjunction and disjunction are associative — I’ve left them off.)

This strategy is in fact purely general. Any n-place truth function, f, can be represented by a chart like the one above. Each row in the chart consists of a certain combination of n truth values, followed by the truth value returned by f for those n inputs. For each such row, construct a conjunction whose ith conjunct is Pi if the ith truth value in the row is 1, and ∼Pi if the ith truth value in the row is 0. Notice that the conjunction just constructed is true if and only if its sentence letters have the truth values corresponding to the row in question. The desired formula is then simply the disjunction of all and only the conjunctions for rows where the function f returns the value 1.[2] Since the conjunction for a given row is true iff its sentence letters have the truth values corresponding to the row in question, the resulting disjunction is true iff its sentence letters have truth values corresponding to one of the rows where f returns the value 1, which is what we want.

Say that a set of connectives is adequate iff one can symbolize all the truth functions using a sentence containing only those connectives. What we just showed was that the set {∧, ∨, ∼} is adequate. We can then use this fact to prove that other sets of connectives are adequate. For example, it is easy to prove that φ∨ψ has the same truth table as (is true relative to exactly the same PL-assignments as) ∼(∼φ∧∼ψ). But that means that for any sentence χ whose only connectives are ∧, ∨, and ∼, we can construct another sentence χ′ with the same truth table but whose only connectives are ∧ and ∼: simply begin with χ and use the equivalence between φ∨ψ and ∼(∼φ∧∼ψ) to eliminate all occurrences of ∨ in favor of occurrences of ∧ and ∼. But now consider any truth function f.
Since {∧, ∨, ∼} is adequate, f can be symbolized by some sentence χ; but χ has the same truth table as some sentence χ′ whose only connectives are ∧ and ∼; hence f can be symbolized by χ′ as well. So {∧, ∼} is adequate. Similar arguments can be given to show that other connective sets are adequate as well. For example, the ∧ can be eliminated in favor of the → and the ∼ (since φ∧ψ has the same truth table as ∼(φ→∼ψ)); therefore, since {∧, ∼} is adequate, {→, ∼} is also adequate.

[2] Special case: if there are no such rows — i.e., if the function returns 0 for all inputs — then let the formula be simply any logically false formula containing P1 . . . Pn, for example P1∧∼P1∧P2∧P3∧ · · · ∧Pn.
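The disjunction-of-conjunctions construction just described can be automated. Here is a hypothetical Python sketch (the tuple encoding of wffs and all names are mine, not the text's), together with a brute-force check that the built sentence really symbolizes the given truth function:

```python
from itertools import product

def V(wff, A):
    """Truth value of a wff (letter string, or tuple over '~', '&', 'v')."""
    if isinstance(wff, str):
        return A[wff]
    if wff[0] == '~':
        return 1 - V(wff[1], A)
    if wff[0] == '&':
        return min(V(wff[1], A), V(wff[2], A))
    if wff[0] == 'v':
        return max(V(wff[1], A), V(wff[2], A))

def symbolize(f, n):
    """Build a wff over letters P1..Pn, using only &, v, ~, symbolizing the
    n-place truth function f: disjoin, over rows where f returns 1, the
    conjunction that is true exactly on that row."""
    letters = ['P%d' % (i + 1) for i in range(n)]
    disjuncts = []
    for row in product((1, 0), repeat=n):
        if f(*row) == 1:
            conj = None
            for letter, value in zip(letters, row):
                lit = letter if value == 1 else ('~', letter)
                conj = lit if conj is None else ('&', conj, lit)
            disjuncts.append(conj)
    if not disjuncts:                  # f constantly 0: P1 & ~P1 & P2 & ... & Pn
        wff = ('&', 'P1', ('~', 'P1'))
        for letter in letters[1:]:
            wff = ('&', wff, letter)
        return wff
    wff = disjuncts[0]
    for d in disjuncts[1:]:
        wff = ('v', wff, d)
    return wff

# Check against the three-place function f from the text:
table = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 0, (1, 0, 0): 1,
         (0, 1, 1): 0, (0, 1, 0): 0, (0, 0, 1): 1, (0, 0, 0): 0}
f = lambda *row: table[row]
w = symbolize(f, 3)
for row in product((1, 0), repeat=3):
    assert V(w, dict(zip(['P1', 'P2', 'P3'], row))) == f(*row)
```

The constant-0 fallback mirrors the special case of footnote [2], and the loop at the end confirms row-by-row agreement between the built sentence and the original function.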
3.1.2
Inadequate connective sets
Can we show that certain sets of connectives are not adequate? The answer is yes. Let’s prove that {∧, →} is not an adequate set of connectives. We’ll do this by proving that if those were our only connectives, we couldn’t symbolize the negation truth function. Let’s prove this fact:

For any sentence, φ, containing just the sentence letter P and the connectives ∧ and →, if P is true then so is φ.

We’ll again use the method of induction. We want to show that the assertion is true for all sentences. So we first prove that the assertion is true for all sentences with no connectives (i.e., for sentences containing just sentence letters.) This is the base case, and is very easy here, since if φ has no connectives, then obviously φ is just the sentence letter P itself, in which case, clearly, if P is true then so is φ. Next we assume the inductive hypothesis:

Inductive hypothesis: the assertion holds true for sentences φ and ψ

And we try to show, on the basis of this assumption, that:

The assertion holds true for φ∧ψ and φ→ψ

This is easy to show. We want to show that the assertion holds for φ∧ψ — that is, if P is true then so is φ∧ψ. But we know by the inductive hypothesis that the assertion holds for φ and ψ individually. So we know that if P is true, then both φ and ψ are true. But then, we know from the truth table for ∧ that φ∧ψ is also true in this case. The reasoning is exactly parallel for φ→ψ: the inductive hypothesis tells us that whenever P is true, so are φ and ψ, and then we know that in this case φ→ψ must also then be true, by the truth table for →. Therefore, by induction, the result is proved.
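The inductive fact can also be spot-checked by brute force. The small Python sketch below (mine, not the text's) generates every wff over the single letter P built from ∧ and → up to a fixed nesting depth, and confirms that each comes out true when P is true, so none of them has the truth table of ∼P. This checks only finitely many wffs, of course; it illustrates the claim rather than replacing the induction.

```python
def V(wff, p):
    """Truth value of a {&, ->}-wff over the single letter P, with P set to p."""
    if wff == 'P':
        return p
    op, left, right = wff
    a, b = V(left, p), V(right, p)
    return min(a, b) if op == '&' else (0 if (a, b) == (1, 0) else 1)

def wffs_up_to(depth):
    """All wffs over P using only & and ->, of nesting depth at most `depth`."""
    layer = {'P'}
    for _ in range(depth):
        layer = layer | {(op, a, b) for op in ('&', '->')
                         for a in layer for b in layer}
    return layer

# No such wff can symbolize negation: each is true whenever P is true.
assert all(V(w, 1) == 1 for w in wffs_up_to(3))
```

At depth 3 this already checks several hundred wffs; the induction in the text is what guarantees the pattern continues at every depth.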
3.1.3 Sheffer stroke
We’ve seen how we can choose alternate sets of connectives. Some of these choices are adequate (i.e., allow symbolization of all truth functions), others are not.
As we saw, there are some truth functions that can be symbolized in propositional logic, but not by a single connective (e.g., the not-both function I discussed above). We could change this, by adding a new connective. Let’s use a new connective, the “Sheffer stroke”, |, to symbolize not-both. φ|ψ is to mean that not both φ and ψ are true, so let’s stipulate that φ|ψ will have the same truth table as ∼(φ∧ψ), i.e.:

φ   ψ   φ|ψ
1   1    0
1   0    1
0   1    1
0   0    1
Now here’s an exciting thing about |: it’s an adequate connective all on its own. You can symbolize all the truth functions using just |! Here’s how we can prove this. We showed above that {→, ∼} is adequate; so all we need to do is show how to define the → and the ∼ using just the |. Defining ∼ is easy; φ|φ has the same truth table as ∼φ. As for φ→ψ, think of it this way. φ→ψ is equivalent to ∼(φ∧∼ψ), i.e., φ|∼ψ. But given the method just given for defining ∼ in terms of |, we know that ∼ψ is equivalent to ψ|ψ. Thus, φ→ψ has the same truth table as: φ|(ψ|ψ).
3.2 Polish notation

Alternate connectives, like the Sheffer stroke, are called “variations” of standard logic because they don’t really change what we’re saying with propositional logic; it’s just a change in notation. Another fun change in notation is Polish notation. The basic idea of Polish notation is that the connectives all go before the sentences they connect. Instead of writing P∧Q, we write ∧PQ. Instead of writing P∨Q we write ∨PQ. Formally, here is the definition of a wff:

i) sentence letters are wffs

ii) if φ and ψ are wffs, then so are:

∧φψ
∨φψ
→φψ
↔φψ ∼φ What’s the point? This notation eliminates the need for parentheses. With the usual notation, in which we put the connectives between the sentences they connect, we need parentheses to distinguish, e.g.: (P ∧Q)→R P ∧(Q→R) But with polish notation, these are distinguished without parentheses; they become: →∧P QR ∧P →QR respectively.
3.3 Multi-valued logic³

Logicians have considered adding a third truth value to the usual two. In these new systems, in addition to truth (1) and falsity (0), we have a third truth value, #. There are a number of things one could take # to mean (e.g., “meaningless”, or “undefined”, or “unknown”). Standard logic is “bivalent” — that means that there are no more than two truth values. So, moving from standard logic to a system that admits a third truth value is called “denying bivalence”. One could deny bivalence, and go even further, and admit four, five, or even infinitely many truth values. But we’ll only discuss trivalent systems — i.e., systems with only three truth values. Why would one want to admit a third truth value? There are various philosophical reasons one might give. One concerns vagueness. A person with one dollar is not rich. A person with a million dollars is rich. Somewhere in the middle, there are some people that are hard to classify. Perhaps a person with $100,000 is such a person. They seem neither definitely rich nor definitely not rich. So there’s pressure to say that the statement “this person is rich” is capable of being neither definitely true nor definitely false. It’s vague. Others say we need a third truth value for statements about the future. If it is in some sense “not yet determined” whether there will be a sea battle tomorrow, then (it is argued) the sentence:

³ See Gamut (1991a, pp. 173-183).
There will be a sea battle tomorrow

is neither true nor false. In general, statements about the future are neither true nor false if there is nothing about the present that determines their truth value one way or the other.4 Yet another case in which some have claimed that bivalence fails concerns failed presupposition. Consider this sentence:

Ted stopped beating his dog.

In fact, I have never beaten a dog. I don’t even have a dog. So is it true that I stopped beating my dog? Obviously not. But on the other hand, is this statement false? Certainly no one would want to assert its negation: “Ted has not stopped beating his dog”. The sentence presupposes that I was beating a dog; since this presupposition is false, the question of the sentence’s truth does not arise: the sentence is neither true nor false. For a final challenge to bivalence, consider the sentence:

Sherlock Holmes has a mole on his left leg

‘Sherlock Holmes’ doesn’t refer to a real entity. Further, Sir Arthur Conan Doyle does not specify in his Sherlock Holmes stories whether Holmes has such a mole. Either of these reasons might be argued to result in a truth value gap for the displayed sentence. It’s an interesting philosophical question whether any of these arguments for bivalence’s failing are any good. But we won’t take up that question. Instead, we’ll look at the formal result of giving up bivalence. That is, we’ll introduce some non-bivalent formal systems. We won’t ask whether these systems really model English correctly. These systems all give different truth tables for the Boolean connectives. The original truth tables give you the truth values of complex formulas based on whether their sentence letters are true or false (1 or 0). The new truth tables need to take into account cases where the sentence letters are # (neither 1 nor 0).
4 An alternate view preserves the “openness of the future” as well as bivalence: both ‘There will be a sea battle tomorrow’ and ‘There will fail to be a sea battle tomorrow’ are false. This combination is not contradictory, provided one rejects the equivalence of “It will be the case tomorrow that ∼φ” and “∼ it will be the case tomorrow that φ”.
3.3.1 Łukasiewicz’s system
Here are the new truth tables (let’s skip the ↔):

∼
1  0
0  1
#  #

∧ | 1  0  #
1 | 1  0  #
0 | 0  0  0
# | #  0  #

∨ | 1  0  #
1 | 1  1  1
0 | 1  0  #
# | 1  #  #

→ | 1  0  #
1 | 1  0  #
0 | 1  1  1
# | 1  #  1

(In the two-place tables, the row gives the value of the left-hand formula and the column gives the value of the right-hand formula.)
Using these truth tables, one can calculate truth values of wholes based on truth values of parts. Example: Where P is 1, Q is 0 and R is #, calculate the truth value of (P∧Q)→∼(R→Q). First, what is R→Q? Answer, from the truth table for →: #. Next, what is ∼(R→Q)? From the truth table for ∼, we know that the negation of a # is a #. So, ∼(R→Q) is #. Next, P∧Q: that’s 1∧0 — i.e., 0. Finally, the whole thing: 0→# — i.e., 1. If we wanted, we could formalize all of this a bit more by defining new notions of an assignment, and of truth relative to an assignment. The new notion of an assignment — a trivalent assignment — would be that of a function that assigns to each sentence letter exactly one of the values: 1, 0, #. And the new notion of truth relative to a trivalent assignment would be defined thus:

For any trivalent assignment, A, the valuation for A, VA, is defined as the function that assigns to each wff either 1, 0, or #, and which is such that, for any wffs φ and ψ:

i) if φ is a sentence letter then VA(φ) = A(φ)

ii) VA(φ∧ψ) =
    1 if VA(φ)=1 and VA(ψ)=1
    0 if VA(φ)=0 or VA(ψ)=0
    # otherwise

iii) VA(φ∨ψ) =
    1 if VA(φ)=1 or VA(ψ)=1
    0 if VA(φ)=0 and VA(ψ)=0
    # otherwise

iv) VA(φ→ψ) =
    1 if either: VA(φ)=0, or VA(ψ)=1, or VA(φ)=# and VA(ψ)=#
    0 if VA(φ)=1 and VA(ψ)=0
    # otherwise

v) VA(∼φ) =
    1 if VA(φ)=0
    0 if VA(φ)=1
    # otherwise
Let’s keep the definitions of validity and semantic consequence from before. Thus, φ is a valid formula iff it is true no matter what truth values its sentence letters take on. More formally: φ is valid iff for every trivalent assignment A, VA (φ)=1. φ is a semantic consequence of Γ iff φ is true whenever all the members of Γ are true — that is, iff for each trivalent assignment, if the valuation for that assignment assigns 1 to each member of Γ then it also assigns 1 to φ. So now, there are two ways a formula could fail to be valid. It could be 0 under some PL-valuation, or it could be # under some PL-valuation. “Valid formula” (as I’ve defined it) means always true; it does not mean never false. (A formula could be never false and still not always true, if it sometimes is #.) Example: is P ∨∼P valid? Answer: no, it isn’t. Suppose P is #. Then ∼P is #; but then the whole thing is # (since #∨# is #.) Example: is P →P valid? Answer: yes. P could be either 1, 0 or #. From the truth table for →, we see that P →P is 1 in all three cases.
3.3.2 Kleene’s “strong” tables
This system is like Łukasiewicz’s system, except that the truth table for the → is different:
→ | 1  0  #
1 | 1  0  #
0 | 1  1  1
# | 1  #  #
Here is the intuitive idea behind the Kleene tables. Let’s call the truth values 0 and 1 the “classical” truth values. If a formula’s halves have only classical truth values, then the truth value of the whole formula is just the classical truth value determined by the classical truth values of the halves. But if one or both halves are #, then we must consider the result of turning each # into one of the classical truth values. If the entire formula would sometimes be 1 and sometimes be 0 after doing this, then the entire formula is #. But if the entire formula always takes the same truth value, X, no matter which classical truth value any #s are turned into, then the entire formula gets this truth value X. Intuitively: if there is “enough information” in the classical truth values of a formula’s parts to settle on one particular classical truth value, then that truth value is the formula’s truth value. Take the truth table for φ→ψ, for example. When φ is 0 and ψ is #, the whole formula is 1 — because the false antecedent is sufficient to make the whole formula true, no matter what classical truth value we convert ψ to. On the other hand, when φ is 1 and ψ is #, then the whole formula is #. The reason is that what classical truth value we substitute in for ψ’s # affects the truth value of the whole. If the # becomes a 0 then the whole thing is 0; but if the # becomes a 1 then the whole thing is 1. There are two important differences between Łukasiewicz’s and Kleene’s systems. The first is that, unlike Łukasiewicz’s system, Kleene’s system makes the formula P →P invalid. The reason is that in Kleene’s system, #→# is #; thus, P →P isn’t true in all valuations (it is # in the valuation where P is #.) In fact, it’s easy to prove that there are no valid formulas in Kleene’s system. Consider the valuation that makes every sentence letter #. Here’s an inductive proof that every wff is # in this interpretation. 
Base case: all the sentence letters are # in this interpretation. (That’s obvious.) Inductive case: assume that φ and ψ are both # in this interpretation. We need now to show that φ∧ψ, φ∨ψ, and φ→ψ are all # in this interpretation. But that’s easy — just look at the truth tables for ∧, ∨ and →. #∧# is #, #∨# is #, and #→# is #. QED. Even though there are no valid formulas in Kleene’s system, there are still cases of semantic consequence. Let’s continue to understand semantic consequence as we did above: Γ ⊨ φ iff φ is true whenever every member of Γ is true.
Then in Kleene’s system (and all the other systems we’ll consider), P ∧Q P . That’s because the definition of this is that P is true in every valuation in which P ∧Q is true. But the only way for P ∧Q to be true is for P to be true and Q to be true. The second (related) difference is that in Kleene’s system, → is interdefinable with the ∼ and ∨, in that φ→ψ has exactly the same truth table as ∼φ∨ψ. (Look at the truth tables to verify that this is true.) But that’s not true for Łukasiewicz’s system. In Łukasiewicz’s system, when φ and ψ are both #, then φ→ψ is 1, but ∼φ∨ψ is #.
3.3.3 Kleene’s “weak” tables (Bochvar’s tables)
This final system has a very different intuitive idea: that # is “infectious”. That is, if any formula has a part that is #, then the entire formula is #. Thus, the tables are as follows:

∼
1  0
0  1
#  #

∧ | 1  0  #
1 | 1  0  #
0 | 0  0  #
# | #  #  #

∨ | 1  0  #
1 | 1  1  #
0 | 1  0  #
# | #  #  #

→ | 1  0  #
1 | 1  0  #
0 | 1  1  #
# | #  #  #
So basically, the classical bit of each truth table is what you’d expect; but everything gets boring if any constituent formula is a #. One way to think about these tables is to think of the # as indicating nonsense. The sentence “The sun is purple and blevledgekl;rz”, one might naturally think, is neither true nor false because it is nonsense. It is nonsense even though it has a part that isn’t nonsense.
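The infectiousness of # makes the weak tables especially easy to implement: a single clause handles the third value, and everything else is classical. A small sketch (my own encoding):

```python
# The weak (Bochvar) tables in one rule (illustrative sketch):
# # is infectious; otherwise the classical tables apply.

def weak(op, p, q):
    if "#" in (p, q):
        return "#"
    return {"∧": p and q,
            "∨": p or q,
            "→": 0 if (p == 1 and q == 0) else 1}[op]

print(weak("∨", 1, "#"))  # # — even a true disjunct can't rescue nonsense
print(weak("∨", 1, 0))    # 1 — fully classical when both inputs are classical
```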
3.3.4 Supervaluationism
Recall the guiding thought behind the strong Kleene tables: if a formula’s classical truth values fix a particular truth value, then that is the value that the formula takes on. There is a way to take this idea a step further, which results in a new and interesting way of thinking about three-valued logic. According to the strong Kleene tables, we get a classical truth value for φ◦ψ, where ◦ is any connective, only when we have “enough classical information” in the truth values of φ and ψ to fix a classical truth value for φ◦ψ. Consider φ∧ψ for example: if either φ or ψ is false, then since
falsehood of a conjunct is classically sufficient for the falsehood of the whole conjunction, the entire formula is false. But if, on the other hand, both φ and ψ are #, then neither φ nor ψ has a classical truth value, we do not have enough classical information to settle on a classical truth value for φ∧ψ, and so the whole formula is #. But now consider a special case of the situation just considered, where φ is P , ψ is ∼P , and P is #. According to the strong Kleene tables, the conjunction P ∧∼P is #, since it is the conjunction of two formulas that are #. But there is a way of thinking about truth values of complex sentences according to which the truth value ought to be 0, not #: no matter what classical truth value P were to take on, the whole sentence P ∧∼P would be 0 — therefore, one might think, P ∧∼P ought to be 0. If P were 0 then P ∧∼P would be 0∧∼0 — that is 0; and if P were 1 then P ∧∼P would be 1∧∼1 — 0 again. The general thought here is this: suppose a sentence φ contains some sentence letters P1 . . . Pn that are #. If φ would be false no matter how we assign classical truth values to P1 . . . Pn — that is, no matter how we precisified φ — then φ is in fact false. Further, if φ would be true no matter how we precisified it, then φ is in fact true. But if precisifying φ would sometimes make it true and sometimes make it false, then φ in fact is #. The idea here can be thought of as an extension of the idea behind the strong Kleene tables. Consider a formula φ ψ, where is any connective. If there is enough classical information in the truth values of φ and ψ to fix on a particular classical truth value, then the strong Kleene tables assign φ ψ that truth value. Our new idea goes further, and says: if there is enough classical information within φ and ψ to fix a particular classical truth value, then φ ψ gets that truth value. 
Information “within” φ and ψ includes, not only the truth values of φ and ψ, but also a certain sort of information about sentence letters that occur in both φ and ψ. For example, in P∧∼P, when P is #, there is insufficient classical information in the truth values of P and of ∼P to settle on a truth value for the whole formula P∧∼P (since each is #). But when we look inside P and ∼P, we get more classical information: we can use the fact that P occurs in each to reason as we did above: whenever we turn P to 0, we turn ∼P to 1, and so P∧∼P becomes 0; and whenever we turn P to 1 we turn ∼P to 0, and so again, P∧∼P becomes 0. This new idea — that a formula has a classical truth value iff every way of precisifying it results in that truth value — is known as supervaluationism. Let us lay out this idea formally.
As above, a trivalent assignment is a function that assigns to each sentence letter one of the values 1, 0, #. Where A is a trivalent assignment and C is a PL-assignment (i.e., a bivalent assignment in the sense of section 2.3), say that C is a precisification of A iff: whenever A assigns a sentence letter a classical value (i.e., 1 or 0), C assigns that sentence letter the same classical value. Thus, precisifications of A agree with A on the classical truth values, but in addition — being classical assignments — they also assign classical truth values to sentence letters to which A assigns #. A given precisification of A “decides” all of A’s #s in a certain way. We can now say how the supervaluationist assigns truth values to complex formulas relative to a valuation. When φ is any wff and A is a trivalent assignment, let us define SA(φ) — the “supervaluation of φ relative to A” — thus:

SA(φ) = 1 if VC(φ) = 1 for every precisification, C, of A
SA(φ) = 0 if VC(φ) = 0 for every precisification, C, of A
SA(φ) = # otherwise

Note the appeal here to VC, the valuation for PL-assignment C, as defined in section 2.3. Some common terminology: when SA(φ)=1, we say that φ is supertrue in A, and when SA(φ)=0, we say that φ is superfalse in A. For the supervaluationist, a formula is true when it is supertrue (i.e., true in all precisifications of A), false when it is superfalse (i.e., false in all precisifications of A), and # when it is neither supertrue nor superfalse (i.e., when it is true in some precisifications of A but false in others). To return to the example considered above: the supervaluationist assigns a different truth value to P∧∼P, when P is #, than do the strong Kleene tables (and indeed, than do all the other tables we have considered). The strong Kleene tables say that P∧∼P is # in this case.
But the supervaluationist says that it is 0: each precisification of any trivalent assignment that assigns P # is by definition a PL-assignment, and P ∧∼P is 0 in each PL-assignment. Let us note a few interesting facts about supervaluationism. First, note that every classical tautology turns out to be supertrue in every valuation. For let A be any trivalent assignment and let φ be a tautology. By the definition of supertruth, φ is supertrue iff φ is true in every precisification of A. But precisifications are PL-assignments, and the definition of a tautology
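The supervaluationist recipe is easy to implement by brute force over precisifications. A sketch (the encoding of formulas as Python functions is mine, not the text’s): a formula gets a classical value only when every precisification agrees on it.

```python
# Supervaluations by enumerating precisifications (illustrative sketch).
# A formula is a function from a bivalent assignment (a dict) to 0 or 1.
from itertools import product

def supervaluate(formula, trivalent):
    gaps = [p for p, v in trivalent.items() if v == "#"]
    results = set()
    for values in product((0, 1), repeat=len(gaps)):
        precisification = dict(trivalent)
        precisification.update(zip(gaps, values))  # decide every #
        results.add(formula(precisification))
    return results.pop() if len(results) == 1 else "#"

# With ∧ as min and ∼ as (1 − v):
print(supervaluate(lambda a: min(a["P"], 1 - a["P"]), {"P": "#"}))      # 0: P∧∼P superfalse
print(supervaluate(lambda a: max(a["P"], 1 - a["P"]), {"P": "#"}))      # 1: P∨∼P supertrue
print(supervaluate(lambda a: min(a["P"], a["Q"]), {"P": 1, "Q": "#"}))  # #: P∧Q is gappy
```

The last two lines preview the points made below: P∧∼P and P∧Q have #-valued conjuncts alike, yet get different supervaluational values.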
is a formula that is true in every PL-assignment. Similar reasoning shows that if φ is false in every PL-assignment, then φ is superfalse. Second, note that according to supervaluationism, some formulas are nevertheless neither true nor false in some valuations. For instance, take the formula P∧Q, in any valuation V in which P is 1 and Q is #. Any precisification of V must continue to assign 1 to P. But some will assign 1 to Q, whereas others will assign 0 to Q. Any of the former precisifications will assign 1 to P∧Q, whereas any of the latter will assign 0 to P∧Q. Hence P∧Q is neither supertrue nor superfalse in V — SV(P∧Q) = #. Finally, notice that the propositional connectives are not truth functional according to supervaluationism. To say that a connective ◦ is truth functional is to say that the truth value5 of a complex statement whose major connective is ◦ is a function of the truth values of the immediate constituents of that formula — that is, any two such formulas whose immediate constituents have the same truth values must themselves have the same truth value. According to all of the truth tables for three-valued logic we considered earlier (Łukasiewicz, Kleene strong and weak), the propositional connectives are truth-functional. Indeed, this is in a way trivial: if a connective isn’t truth-functional then one can’t give it a truth table at all: what a truth table does is specify how a connective determines the truth value of entire sentences as a function of the truth values of its parts. But supervaluationism renders the truth functional connectives not truth functional. Consider the following pair of sentences, in a valuation in which P and Q are both #:

P∧Q
P∧∼P

As we have seen, in such valuations, the first formula is # and the second formula is 0 (since it is superfalse). But each of these formulas is a conjunction, each of whose conjuncts is #: the truth values of φ and ψ do not determine the truth value of φ∧ψ.
Similarly, the truth values of other complex formulas are not determined by the truth values of their parts, as the following pairs of formulas show (the indicated truth values, again, are relative to a valuation in which P and Q are #):

P∨Q: #    P∨∼P: 1
P→Q: #    P→P: 1

5 I am counting #, in addition to 1 and 0, as a “truth value”.
3.4 Intuitionism
Intuitionism in the philosophy of mathematics is a view according to which there are no mind-independent mathematical facts. Rather, mathematical facts and entities are mental constructs that owe their existence to the mental activity of mathematicians constructing proofs. This philosophy of mathematics then leads intuitionists to a distinctive form of logic: intuitionist logic. Let P be the statement:

The sequence 0123456789 occurs somewhere in the decimal expansion of π.

How should we think about its meaning? For the classical mathematician, the answer is straightforward. P is a statement about a part of mathematical reality, namely, the infinite decimal expansion of π. Either the sequence 0123456789 occurs somewhere in that expansion, in which case P is true, or it does not, in which case P is false and ∼P is true. For the intuitionist, this whole picture is mistaken, premised as it is on the reality of an infinite decimal expansion of π. Our minds are finite, and so only the finite initial segment of π’s decimal expansion that we have constructed so far is real. The intuitionist’s alternate picture of P’s meaning, and indeed of meaning generally (for mathematical statements), is a radical one.6 The classical mathematician, comfortable with the idea of a realm of mind-independent entities, thinks of meaning in terms of truth and falsity. As we saw, she thinks of P as being true or false depending on the facts about π’s decimal expansion. Further, she explains the meanings of the propositional connectives in truth-theoretic terms: a conjunction is true iff each of its conjuncts is true; a negation is true iff the negated formula is false; and so on. Intuitionists, on the other hand, reject the centrality of truth to meaning, since truth is tied up with the rejected picture of mind-independent mathematical reality. For them, the central semantic concept is that of proof.
They simply do not think in terms of truth and falsity; in matters of meaning, they think in terms of the conditions under which formulas have been proved. Take P, for example. Intuitionists advise us: don’t think in terms of what it would take for P to be true. Think, rather, in terms of what it would take to prove P. And the answer is clear: we would need to actually continue our construction of the decimal expansion of π to a point where we found the sequence 0123456789. What, now, of ∼P? Again, thinking in terms of proof, not truth: what

6 One intuitionist picture, anyway, on which see Dummett (1973). What follows is a crude sketch. It does not do justice to the actual intuitionist position, which is, as they say, subtle.
would it take for ∼P to be proved? The answer here is less straightforward. Since P said that there exists a number of a certain sort, it was clear how it would have to be proved: by actually exhibiting (calculating) some particular number of that sort. But ∼P says that there is no number of a certain sort; how do we prove something like that? The intuitionist’s answer: by proving that the assumption that there is a number of that sort leads to a contradiction. In general, a negation, ∼φ, is proved by proving that φ leads to a contradiction.7 Similarly for the other connectives: the intuitionist explicates their meanings by their role in generating proof conditions, rather than truth conditions. φ∧ψ is proved by separately giving a proof of φ and a proof of ψ; φ∨ψ is proved by giving either a proof of φ or a proof of ψ; φ→ψ is proved by exhibiting a construction whereby any proof of φ can be converted into a proof of ψ. Likewise, the intuitionist thinks of logical correctness as the preservation of provability, not the preservation of truth. For example, φ∧ψ logically implies φ because if one has a proof of φ∧ψ, then one has a proof of φ; and conversely, if one has proofs of φ and ψ separately, then one has the materials for a proof of φ∧ψ. So far, so classical. But ∼∼φ does not logically imply φ, for the intuitionist. Simply having a proof of ∼∼P — a proof that the assumption that 0123456789 occurs nowhere in π’s decimal expansion leads to a contradiction — wouldn’t give us a proof of P , since proving P would require exhibiting a particular place in π’s decimal expansion where 0123456789 occurs. Likewise, intuitionists do not accept the law of the excluded middle, φ∨∼φ, as a logical truth. To be a logical truth, a sentence should be provable from no premises whatsoever. 
But to prove P∨∼P, for example, would require either exhibiting a case of 0123456789 in π’s decimal expansion, or proving that the assumption that 0123456789 occurs in π’s decimal expansion leads to a contradiction. We’re not in a position to do either. Though we won’t consider intuitionist predicate logic, one of its most striking features is easy to grasp informally. Intuitionists say that an existentially quantified sentence is proved iff one of its instances has been proved. Therefore they reject the inference from ∼∀xFx to ∃x∼Fx, for one might be able to prove a contradiction from the assumption of ∀xFx without being able to prove any instance of ∃x∼Fx. We have so far been considering a putative philosophical justification for intuitionist propositional logic. That justification has been rough and ready;

7 Given the contrast with the classical conception of negation, a different symbol (often “¬”) is sometimes used for intuitionist negation.
but intuitionist propositional logic itself is easy to present, perfectly precise, and is a coherent system regardless of what one thinks of its philosophical underpinnings. Two simple modifications to the natural deduction system of chapter 2.5 generate intuitionistic propositional logic. First, we drop one half of double-negation, namely what we might call “double-negation elimination” (DNE):

Γ ⊢ ∼∼φ
-----------
Γ ⊢ φ

Second, we add the rule “ex falso” (EF):

Γ ⊢ φ∧∼φ
-----------
Γ ⊢ ψ

Note that EF can be proved in the original system of chapter 2.5: simply use RAA and then DNE. So, intuitionist logic results from a system for classical logic by simply dropping one rule (DNE) and adding another rule that was previously provable (EF). It follows that every intuitionistically provable sequent is also classically provable (because every intuitionistic proof can be converted to a classical proof). Notice how dropping DNE blocks proofs of various classical theorems the intuitionist wants to avoid. The proof of ∅ ⊢ P∨∼P in chapter 2.5, for instance, used DNE. Of course, for all we’ve said so far, there might be some other way to prove this sequent. Only when we have a semantics for intuitionistic logic, and a soundness proof relative to that semantics, can we show that this sequent cannot be proven without DNE. We will discuss a semantics for intuitionism in section 7.2. It is interesting to note that even though intuitionists reject the inference from ∼∼P to P, they accept the inference from ∼∼∼P to ∼P, since its proof only requires the half of DN that they accept, namely the inference from P to ∼∼P:

1     (1) ∼∼∼P           As
2     (2) P              As (for reductio)
2     (3) ∼∼P            2, DN (accepted version)
1,2   (4) ∼∼P ∧ ∼∼∼P     1,3 ∧I
1     (5) ∼P             4, RAA
Note that you can’t use this sort of proof to establish ∼∼P ` P . Given the way RAA is stated, its application always results in a formula beginning with the ∼.
Chapter 4

Predicate Logic

Let’s now turn from propositional to predicate logic. We’re going to do the same thing we did for propositional logic: formalize predicate logic. We’ll first do grammar. One would normally then go on to define theoremhood, using axioms and rules.1 But we’ll skip that bit; we’ll go straight to semantics — we’ll define validity.

4.1 Grammar of predicate logic
As before, we start by specifying the kinds of symbols that may be used in sentences of predicate logic — primitive vocabulary — and then go on to define the well formed formulas as strings of primitive vocabulary that have the right form. Primitive vocabulary:

i) logical: →, ∼, ∀

ii) nonlogical:

a) for each n > 0, n-place predicates F, G, . . ., with or without subscripts
b) variables x, y, . . ., with or without subscripts
c) individual constants (names) a, b, . . ., with or without subscripts

1 There are, in fact, axiom systems for (first-order) predicate logic that are sound and complete with respect to the semantics I will go on to give.
iii) parentheses

No symbol of one type is a symbol of any other type. Let’s call any variable or constant a term. Definition of wff:

i) if Π is an n-place predicate and α1 . . . αn are terms, then Πα1 . . . αn is a wff

ii) if φ and ψ are wffs, and α is a variable, then ∼φ, (φ→ψ), and ∀αφ are wffs

iii) nothing else is a wff.

We’ll call formulas that are wffs in virtue of clause i) “atomic” formulas. When a formula has no free variables, we’ll say that it is a closed formula, or sentence; otherwise it is an open formula. We have the same defined logical terms: ∧, ∨, ↔. We also add the following definition of the existential quantifier:

∃αφ =df ∼∀α∼φ    (where α is a variable and φ is a wff).
We won’t give any definition of ‘theorem’, but rather move straight to semantics.
4.2 Set theory²

First, though, a digression. We’ll need to talk a lot about sets, so I want to make some introductory remarks about set theory. Sets have members. For example, we may speak of the set of people, the set of cities, the set of real numbers, etc. There is also the empty set: the one set with no members. Though the notion of a set is an intuitive one, it is deeply perplexing. This can be seen by reflecting on the Russell Paradox, discovered by Bertrand Russell, the great philosopher and mathematician. Let us call R the set of all and only those sets that are not members of themselves. For short, R is the set of non-self-members. Russell asks the following question: is R a member of itself? There are two possibilities:

² Supplementary reading: the beginning of Enderton (1977).
i) R ∉ R. Thus, R is a non-self-member. But R was said to be the set of all non-self-members, and so we’d have R ∈ R. Contradiction.

ii) R ∈ R. So R is not a non-self-member. R, by definition, contains only non-self-members. So R ∉ R. Contradiction.

Thus, each possibility leads to a contradiction. But there are no remaining possibilities — either R is a member of itself or it isn’t! So it looks like the very idea of sets is paradoxical. The modern discipline of axiomatic set theory arose in part to develop a notion of sets that isn’t subject to this sort of paradox. This is done by imposing rigid restrictions on when a given “condition” picks out a set. In the example above, the condition “is a non-self-member” will be ruled out — there’s no set of all and only the things satisfying this condition. The specifics of set theory are beyond the scope of this course; for our purposes, we’ll help ourselves to the existence of sets, and not worry about exactly what sets are, or how the Russell paradox is avoided. There are some common notions of sets we’ll need. We say that A is a subset of B (“A ⊆ B”) when every member of A is a member of B. We say that the intersection of A and B (“A ∩ B”) is the set that contains all and only those things that are in both A and B, and that the union of A and B (“A ∪ B”) is the set containing all and only those things that are members of either A or B. Suppose we want to refer to the set of the so-and-sos — that is, the set containing all and only objects, u, that satisfy the condition “so-and-so”. We’ll do this with the term “{u: u is a so-and-so}”. With this locution, we can restate the definitions of ∩ and ∪ from the previous paragraph as follows:

A ∩ B =df {u : u ∈ A and u ∈ B}
A ∪ B =df {u : u ∈ A or u ∈ B}
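These operations line up directly with Python’s built-in sets; a quick illustration, with example sets of my own choosing:

```python
# ⊆, ∩, ∪, and set-builder notation, rendered with Python sets.
A = {1, 2, 3}
B = {2, 3, 4}
print(A & B)        # {2, 3}: intersection A ∩ B
print(A | B)        # {1, 2, 3, 4}: union A ∪ B
print({2, 3} <= A)  # True: {2, 3} ⊆ A
# "{u : u ∈ A and u ∈ B}" as a set comprehension:
print({u for u in A | B if u in A and u in B} == A & B)  # True
```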
Well, you may recall that when we specify a set, we don’t need to specify any particular order An aside: there’s a trick for “reducing” talk of ordered pairs to talk of sets; define 〈u, v〉 as the set {{u},{u, v}}. The trick is that we put the set together in such a way that we can look at the set and tell what the first member of the ordered pair is: it’s the one that “appears twice”. And one can go on to reduce talk of ordered triples, ordered quadruples, etc., to talk of ordered pairs. But the technicalities of these reductions won’t matter for us; I’ll just feel free to go on speaking of ordered pairs, triples, etc. 3
CHAPTER 4. PREDICATE LOGIC
61
of the members. For example, the set containing me and Bill Clinton doesn’t have a “first” member. This is reflected in the fact that “{Ted, Clinton}” and “{Clinton, Ted}” are two different names for the same set — the set containing just Clinton and Ted. But sometimes, we have the need to talk about a set-like thing containing Clinton and Ted, but in a certain order. For this purpose, logicians use ordered pairs. To name the ordered pair of Clinton and Ted, we use: “〈Clinton, Ted〉”. Here, the order is significant, for 〈Clinton, Ted〉 and 〈Ted, Clinton〉 are not the same thing. Similarly, we have ordered triples 〈u, v, w〉, and in general, ordered n-tuples 〈u1 , . . . , un 〉, for any natural number n. Let’s even allow 1-tuples: let’s define the 1-tuple 〈u〉 as being the object u itself.
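The set-theoretic trick for reducing ordered pairs to sets can be checked mechanically. Here is a short Python sketch, purely my own illustration (the helper names `pair` and `first` are invented), using frozensets to encode 〈u, v〉 as {{u},{u, v}}:

```python
# Kuratowski's encoding of ordered pairs as plain sets:
# <u, v> is defined as {{u}, {u, v}}.  Python's frozenset gives us
# hashable (hence nestable) sets to experiment with.

def pair(u, v):
    """Encode the ordered pair <u, v> as the set {{u}, {u, v}}."""
    return frozenset({frozenset({u}), frozenset({u, v})})

# Order matters for pairs encoded this way...
assert pair("Clinton", "Ted") != pair("Ted", "Clinton")

# ...even though the underlying *unordered* sets do not distinguish order:
assert {"Clinton", "Ted"} == {"Ted", "Clinton"}

# We can recover the first coordinate: it is the element that belongs
# to every member of the encoding (the one that "appears twice").
def first(p):
    return next(iter(frozenset.intersection(*p)))

assert first(pair("Clinton", "Ted")) == "Clinton"
```

The `first` helper makes precise the remark that the first member is the one that "appears twice": it is the sole element common to both members of the encoding.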
4.3 Semantics of predicate logic
The semantic definition of validity, intuitively, says that a sentence is valid if it is "true no matter what". In propositional logic, we cashed out "true no matter what" using valuations. A valuation function was thought of as a possibility, since a valuation assigns either true or false to each sentence letter. Then we defined truth-in-a-valuation. So that means we could think of truth-in-all-valuations as "truth in all possibilities" — i.e., truth no matter what.

This procedure needs to get more complicated for the predicate "calculus" (PC), as it is sometimes called. The reason is that the method of truth tables assumes that we can calculate the truth value of a complex formula by looking at the truth values of its parts. But take the sentence ∃x(F x∧Gx). You can't calculate its truth value by looking at the truth values of F x and Gx, since sentences like F x don't have truth values. The letter 'x' doesn't stand for any one thing; it's a variable; so 'F x' doesn't have a truth value.

The following is called "Tarski-style" semantics for PC, in honor of Alfred Tarski who made it up (actually, he made something like it up). It gives an alternate definition of "true no matter what" — true in all models:

A (PC-) model is an ordered pair 〈D,I〉, such that:

i) D is a non-empty set

ii) I is a function such that:

a) if α is a constant then I(α) ∈ D
b) if Π is an n-place predicate, then I(Π) = some set of n-tuples of members of D

(An "n-tuple" is an ordered set of n objects. Ordinary sets are unordered — the set {u, v} is just the same as the set {v, u}; the order doesn't matter. But the ordered set 〈u, v〉 is not the same as the ordered set 〈v, u〉, because the order of u and v is different.)

Think of a model as a possible world. The model is based on D, its domain. D contains, intuitively, the individuals that exist in the possibility. I is the interpretation function, which tells us what the non-logical constants (names and predicates) mean in the possibility. I assigns to each constant ("name") a member of the domain — its denotation. For example, if the domain is the set of persons, then the name 'a' might be assigned me. One-place predicates get assigned sets of 1-tuples of D — that is, just sets of members of D. So, a one-place predicate 'F ' might get assigned a set of persons. That set is called the "extension" of the predicate — if the extension is the set of males, then the predicate 'F ' might be thought of as symbolizing "is male". Two-place predicates get assigned sets of ordered pairs of members of D — that is, binary relations over the domain. If a two-place predicate 'R' is assigned the set of pairs of persons 〈u, v〉 such that u is taller than v, we might think of 'R' as symbolizing "is taller than". Similarly, three-place predicates get assigned sets of ordered triples…

Relative to any PC-model 〈D,I〉, we want to define the notion of truth in a model — the corresponding valuation function. But we'll need some apparatus first. It's pretty easy to see what truth value a sentence like F a should have. I assigns a member of the domain to a — call that member u. I also assigns a subset of the domain to F — let's call that subset S. The sentence F a should be true iff u ∈ S — that is, iff the referent of a is a member of the extension of F . That is, F a should be true iff I(a) ∈ I(F ).
Similarly, Rab should be true iff 〈I(a), I(b )〉 ∈ I(R). And so on. As before, we can give recursive clauses for the truth values of negations and conditionals. φ→ψ, for example, will be true iff either φ is false or ψ is true.

But this becomes tricky when we try to specify the truth value of ∀xF x. It should, intuitively, be true if and only if 'F x' is true, no matter what we put in place of 'x'. But this is vague. Do we mean "whatever name (constant) we put in place of 'x'"? No, because we don't want to assume that we've got a name for everything in the domain, and what if F x is true for all the objects we
have names for, but false for one of the nameless things! Do we mean "true no matter what object from the domain we put in place of 'x'"? No; objects from the domain aren't part of our primitive vocabulary, so the result of replacing 'x' with an object from the domain won't be a formula!4

Tarski's solution to this problem goes as follows. Initially, we don't consider truth values of formulas absolutely. Rather, we let the variables refer to certain things in the domain temporarily. Then, we'll say that ∀xF x will be true iff for all objects u in the domain D: F x is true while x temporarily refers to u. We implement this idea of temporary reference by defining the notion of a variable assignment:

g is a variable assignment for model 〈D,I〉 iff g is a function that assigns to each variable some object in D

The variable assignments give the "temporary" meanings to the variables; when g (x) = u, then u is the temporary denotation of x.

We need a further bit of notation. Let u be some object in D, let g be some variable assignment, and let α be a variable. We then define "g u/α " to be the variable assignment that is just like g , except that it assigns u to α. (If g already assigns u to α, then g u/α will be the same function as g .) Note the following important fact about variable assignments: g u/α , when applied to α, must give the value u. (Work through the definitions to see that this is so.) That is:

g u/α (α) = u

One more bit of apparatus. Given any model M (=〈D,I〉), any variable assignment g , and any term (i.e., variable or name) α, we define the denotation of α relative to M and g , "[α]M,g ", as follows:

[α]M,g = I(α) if α is a constant, and [α]M,g = g (α) if α is a variable

The subscripts M and g on [ ] indicate that denotations are assigned relative to a model (M) and relative to a variable assignment (g ).

Now we are ready to define the valuation function, which assigns truth values relative to variable assignments.
[Footnote 4: Unless the domain happens to contain members of our primitive vocabulary!]

(Relativization to assignments is necessary
because, as we noticed before, F x doesn't have a truth value absolutely. It only has a truth value relative to an assignment of a value to the variable x — i.e., relative to a choice of an arbitrary denotation for x.)

The valuation function, VM,g , for model M (=〈D,I〉) and variable assignment g , is defined as the function that assigns to each wff either 0 or 1 subject to the following constraints:

i) for any n-place predicate Π and any terms α1 …αn , VM,g (Πα1 …αn )=1 iff 〈[α1 ]M,g …[αn ]M,g 〉 ∈ I(Π)

ii) for any wffs φ, ψ, and any variable α:

a) VM,g (∼φ)=1 iff VM,g (φ)=0
b) VM,g (φ→ψ)=1 iff either VM,g (φ)=0 or VM,g (ψ)=1
c) VM,g (∀αφ)=1 iff for every u ∈ D, VM,gu/α (φ)=1

(In understanding clause i), recall that the 1-tuple containing just u, 〈u〉, is just u itself. Thus, in the case where Π is F , a one-place predicate, clause i) says that VM,g (F α)=1 iff [α]M,g ∈ I(F ).)

So far we have defined the notion of truth in a model relative to a variable assignment. But what we really want is a notion of truth in a model, period — that is, absolute truth in a model. (We want this because we want to define, e.g., a valid formula as one that is true in all models.) So, let's define absolute truth in a model in this way:

φ is true in M =df VM,g (φ)=1, for each variable assignment g

It might seem that this is too strict a requirement — why must φ be true relative to each variable assignment? But in fact, it's not too strict at all. The kinds of formulas we're really interested in are formulas without free variables (we're interested in formulas like F a, ∀xF x, ∀x(F x→Gx); not formulas like F x, ∀xRxy, etc.). And if a formula has no free variables, then if it is true relative to even a single variable assignment, it is true relative to every variable assignment. (And so, we could just as well have defined truth in a model as truth relative to some variable assignment.)
I won't prove this fact, but it isn't hard: one would simply show (by induction) that for any wff φ, if variable assignments g and h agree on all variables free in φ, then VM,g (φ)=VM,h (φ).

Now we can go on to give semantic definitions of the other core logical notions:
φ is valid ("⊨ φ") iff φ is true in all PC-models

φ is a semantic consequence of Γ ("Γ ⊨ φ") iff for every model M, if each member of Γ is true in M, then φ is also true in M

Since our new definition of the valuation function treats the propositional connectives → and ∼ in the same way as the propositional logic valuation did, it's easy to prove that it also treats the defined connectives ∧, ∨, and ↔ in the same way:

VM,g (φ∧ψ)=1 iff VM,g (φ)=1 and VM,g (ψ)=1
VM,g (φ∨ψ)=1 iff VM,g (φ)=1 or VM,g (ψ)=1
VM,g (φ↔ψ)=1 iff VM,g (φ) = VM,g (ψ)

Moreover, we can also prove that the valuation function treats ∃ as it should (given its intended meaning):

VM,g (∃αφ)=1 iff there is some u ∈ D such that VM,gu/α (φ)=1

This can be established as follows. The definition of ∃αφ is: ∼∀α∼φ. So, we must show that for any model, and any variable assignment g based on that model, VM,g (∼∀α∼φ)=1 iff there is some u ∈ D such that VM,gu/α (φ)=1. (In arguments like these, I'll sometimes stop writing the subscript M in order to reduce clutter. It should be obvious from the context what the relevant model is.) Here's the argument:

• Vg (∼∀α∼φ)=1 iff Vg (∀α∼φ)=0 (given the clause for ∼ in the definition of the valuation function)

• But Vg (∀α∼φ)=0 iff for some u ∈ D, Vgu/α (∼φ)=0 (given the clause for ∀)

• Given the clause for ∼ again, this can be rewritten as: … iff for some u ∈ D, Vgu/α (φ)=1
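The Tarski-style clauses lend themselves to a short recursive evaluator over a finite model. The following Python sketch is my own illustration: the nested-tuple formula representation, the function names, and the toy model are invented here, not taken from the text.

```python
# A sketch of Tarski-style evaluation over a *finite* model.  Formulas
# are home-made nested tuples, e.g. ('all', 'x', ('->', ('F','x'), ('G','x'))).
# Atomic formulas are (predicate, term1, ..., termn); a term is a
# constant or variable name, looked up in I or in the assignment g.

def denot(term, I, g):
    """[term]: assignment for variables, interpretation for constants."""
    return g[term] if term in g else I[term]

def V(phi, D, I, g):
    """Valuation V_{M,g}(phi) for M = <D, I> relative to assignment g."""
    op = phi[0]
    if op == '~':                       # clause a): negation
        return 1 - V(phi[1], D, I, g)
    if op == '->':                      # clause b): the conditional
        return 1 if V(phi[1], D, I, g) == 0 or V(phi[2], D, I, g) == 1 else 0
    if op == 'all':                     # clause c): check every shifted
        var, body = phi[1], phi[2]      # assignment g_{u/var}
        return 1 if all(V(body, D, I, {**g, var: u}) == 1 for u in D) else 0
    # clause i): atomic predication, extensions as sets of n-tuples
    args = tuple(denot(t, I, g) for t in phi[1:])
    return 1 if args in I[op] else 0

# A toy model: D = {0, 1}, I(F) = {0}, I(G) = {1} (as sets of 1-tuples)
D = {0, 1}
I = {'F': {(0,)}, 'G': {(1,)}}
g = {'x': 0}                            # an arbitrary variable assignment

assert V(('all', 'x', ('F', 'x')), D, I, g) == 0    # not everything is F
assert V(('~', ('all', 'x', ('F', 'x'))), D, I, g) == 1
```

Truth in the model simpliciter could then be defined, as in the text, by quantifying over all variable assignments.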
4.4 Establishing validity and invalidity
Given our definitions, we can establish that particular formulas are valid. For example, let's show that ∀xF x→F a is valid; that is, let's show that for any model 〈D,I〉 and any variable assignment g , Vg (∀xF x→F a)=1.

• Suppose otherwise; then Vg (∀xF x) = 1 and Vg (F a) = 0.

• Given the latter, [a] g ∉ I(F ) — that is, I(a) ∉ I(F ).

• Given the former, for any u ∈ D, Vgu/x (F x) = 1.

• I(a) ∈ D, and so VgI(a)/x (F x) = 1.

• By the truth condition for atomics, [x] gI(a)/x ∈ I(F ).

• By the definition of the denotation of a variable, [x] gI(a)/x = gI(a)/x (x).

• But gI(a)/x (x) = I(a). Thus, I(a) ∈ I(F ). Contradiction.

Notice a couple of things about this proof. First, I claimed that I(a) ∈ D; this comes from the definition of an interpretation function: the interpretation of a constant is always a member of the domain. Second, notice that "I(a)" is a term of our metalanguage; that's why, when I'm given that "for any u in D", I can set u equal to I(a). In general, to show that a formula is valid, what you need to do is reason informally, in the metalanguage, using ordinary, informal, mathematically acceptable patterns of reasoning.

We've seen how to establish that particular formulas are valid. How do we show that a formula is invalid? We simply need to exhibit a single model in which the formula is false. (The definition of validity specifies that a valid formula is true in all models; therefore, it only takes one model in which a formula is false to make that formula invalid.) So let's take one example; let's show that the formula (∃xF x∧∃xGx)→∃x(F x∧Gx) isn't valid. To do this, we must produce a model in which this formula is false. My model will contain numbers in its domain:

D = {0,1}
I(F ) = {0}
I(G) = {1}

It is intuitively clear that the formula is false in this model. In this model, something has F (namely, 0), and something has G (namely, 1), but nothing has both.
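This informal verdict can be double-checked by brute force. Here is a minimal Python sketch (the variable names are mine) that computes the truth values of the antecedent and consequent directly in this model:

```python
# Checking the countermodel directly: D = {0, 1}, I(F) = {0}, I(G) = {1}.
# This is a quick hand calculation in code, not a full Tarski-style
# evaluation.
D = {0, 1}
F = {0}
G = {1}

antecedent = any(x in F for x in D) and any(x in G for x in D)  # ExFx & ExGx
consequent = any(x in F and x in G for x in D)                  # Ex(Fx & Gx)

# The conditional is false in the model: true antecedent, false consequent.
assert antecedent is True
assert consequent is False
```

Since the antecedent is true and the consequent false, the conditional receives the value 0 in this model, which is all invalidity requires.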
Chapter 5

Extensions of Predicate Logic

The predicate logic we considered in the previous chapter is powerful. Much natural language discourse can be represented using it, in a way that reveals logical structure. Nevertheless, it has its limitations. In this chapter we consider some of those limitations, and corresponding additions to predicate logic.
5.1 Identity
“Standard” predicate logic is usually taken to include the identity sign (“=”). “a=b ” means that a and b are one and the same thing.
5.1.1 Grammar for the identity sign
We first need to expand our definition of a well-formed formula. We simply add the following clause:

If α and β are terms, then α=β is a wff

We need to beware of a potential source of confusion. We're now using the symbol '=' as the object-language symbol for identity. But I've also been using '=' as the metalanguage symbol for identity, for instance when I write things like "V(φ)=1". This shouldn't generally cause confusion, but if there's a danger of misunderstanding, I'll write in the metalanguage "is" or "is the same object as" or something like that, instead of '='.
5.1.2 Semantics for the identity sign
We also need to add to our definition of truth-in-a-model; we need to add a clause to the definition of a valuation function telling it what truth values it should give sentences containing the = sign. Here is the clause:

VM,g (α=β) = 1 iff [α]M,g is the same object as [β]M,g

We can then use this clause in establishing the validity and invalidity of formulas. For example, let's show that the formula ∀x∃y x=y is valid:

• We need to show that in any model, and for any variable assignment g in that model, Vg (∀x∃y x=y) = 1. So, suppose for reductio that for some g in some model, Vg (∀x∃y x=y) = 0.

• Given the clause for ∀, for some object in the domain, call it "u", Vgu/x (∃y x=y) = 0.

• Given the clause for ∃, for every v in the domain, Vgu/x v/y (x=y) = 0.

• Letting v be u, we have: Vgu/x u/y (x=y) = 0.

• So, given the clause for "=", [x] gu/x u/y is not the same object as [y] gu/x u/y .

• But [x] gu/x u/y and [y] gu/x u/y are the same object: [x] gu/x u/y is g u/x u/y (x), i.e., u; and [y] gu/x u/y is g u/x u/y (y), i.e., u. Contradiction.
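As a sanity check on this proof (not a substitute for it, since a program can only search finitely many models), one can verify ∀x∃y x=y in a few small domains. Under the identity clause, the formula just says that every member of the domain is identical to something; the sketch below is my own illustration:

```python
# A brute-force check of AxEy x=y on a few small finite domains (my own
# sanity check; a finite search is no substitute for the metalanguage
# proof, which covers all models at once).

def true_in(D):
    """V(AxEy x=y) = 1 in a model with domain D.

    Identity needs no interpretation function: x=y is true relative to
    an assignment iff the temporary denotations of x and y are the same
    object, so the quantifiers simply range over D."""
    return all(any(u == v for v in D) for u in D)

for domain in [{0}, {0, 1}, {"Ted", "Clinton", 3}]:
    assert true_in(domain)
```

The inner `any` mirrors the shifted assignments g u/x v/y of the proof: for each value u of x we search for a value v of y with u = v, and u itself always serves.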
5.1.3 Translations with the identity sign
Why do we ever add anything to our list of logical constants? Why not stick with the tried and true logical constants of propositional and predicate logic? We generally add a logical constant when it has a distinctive inferential and semantic role, and when it has very general application — when, that is, it occurs in a wide range of linguistic contexts. We studied the distinctive semantic role of '=' in the previous section. In this section, we'll look at the range of linguistic contexts that can be symbolized using '='.

The most obvious sentences that may be symbolized with '=' are those that explicitly concern identity, such as:

Mark Twain is identical to Samuel Clemens
t = c

Every man fails to be identical to George Sand
∀x(M x→∼x=s)
(It will be convenient to abbreviate ∼α=β as α ≠ β. Thus, the second symbolization can be rewritten as: ∀x(M x→x ≠ s).)

But many other sentences involve the concept of identity in subtler ways. Consider, for example, "Every lawyer hates every other lawyer". The 'other' signifies nonidentity; we have, therefore:

∀x(Lx→∀y[(Ly∧x ≠ y)→H xy])

Consider next "Only Ted can change grades". This means: "no one other than Ted can change grades", and may therefore be symbolized as:

∼∃x(x ≠ t ∧C x)

(letting 'C x' symbolize "x can change grades").

Another interesting class of sentences concerns number. We cannot symbolize "There are at least two dinosaurs" as "∃x∃y(D x∧D y)", since this would be true even if there were only one dinosaur: x and y could be assigned the same dinosaur. The identity sign to the rescue:

∃x∃y(D x∧D y ∧ x ≠ y)

This says that there are two different objects, x and y, each of which is a dinosaur. To say "There are at least three dinosaurs" we say:

∃x∃y∃z(D x∧D y∧D z ∧ x ≠ y ∧ x ≠ z ∧ y ≠ z)

Indeed, for any n, one can construct a sentence φn that symbolizes "there are at least n Fs":

φn : ∃x1 . . . ∃xn (F x1 ∧ · · · ∧F xn ∧ δ)

where δ is the conjunction of all sentences "xi ≠ xj" where i and j are integers between 1 and n (inclusive) and i < j. (The sentence δ says in effect that no two of the variables x1 . . . xn stand for the same object.)

Since we can construct each φn , we can symbolize other sentences involving number as well. To say that there are at most n Fs, we write: ∼φn+1 . To say that there are between n and m Fs (where m > n), we write: φn ∧∼φm+1 . To say that there are exactly n Fs, we write: φn ∧∼φn+1 .

These methods for constructing sentences involving number will always work; but one can often construct shorter numerical symbolizations by other methods. For example, to say "there are exactly two dinosaurs", instead of saying "there are at least two dinosaurs, and it's not the case that there are at least three dinosaurs", we could say instead:

∃x∃y(D x∧D y ∧ x ≠ y ∧ ∀z[D z→(z=x∨z=y)])
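The numerical constructions can be mirrored computationally. In the Python sketch below (the function names are mine, for illustration only), an extension stands in for the set of dinosaurs, and φn's distinct-witness condition becomes a search over combinations:

```python
# The numerical constructions rendered as counting claims.  "ext" stands
# in for the extension of "is a dinosaur"; this is an illustrative
# sketch, not anything from the text.

from itertools import combinations

def at_least(n, ext, domain):
    """phi_n: n pairwise distinct witnesses, each in the extension."""
    # combinations() yields exactly the pairwise-distinct n-element choices,
    # playing the role of delta (no two variables co-assigned).
    return any(all(x in ext for x in combo)
               for combo in combinations(domain, n))

def exactly(n, ext, domain):
    """phi_n & ~phi_{n+1}: at least n, but not at least n+1."""
    return at_least(n, ext, domain) and not at_least(n + 1, ext, domain)

domain = {0, 1, 2, 3, 4}
dinos = {1, 3}                                          # two dinosaurs

assert at_least(2, dinos, domain)
assert not at_least(3, dinos, domain)
assert exactly(2, dinos, domain)
assert exactly(2, dinos, domain) == (len(dinos) == 2)   # matches counting
```

The last assertion records the point of the construction: on a finite domain, φn ∧∼φn+1 holds exactly when the extension has n members.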
5.2 Function symbols
Given our current predicate logic, our symbolizations of the following English sentences are unsatisfactory:

3 is the sum of 1 and 2
a=b

George W. Bush's father was a politician
P b

The names in predicate logic stand for English singular noun phrases. Thus, "b " in the final sentence stands for "George W. Bush's father". But this symbolization ignores the fact that the noun phrase "George W. Bush's father" contains "George W. Bush" as a significant constituent. It treats "George W. Bush's father" as a black box, so far as logic is concerned. But that's bad. For example, the following is an intuitively valid inference in English:

George W. Bush's father was a politician
Therefore, someone's father was a politician

We should develop a logic that accounts for the validity of this reasoning by "breaking up" noun phrases like "George W. Bush's father". To do this, we'll add function symbols.

Think of "George W. Bush's father" as the result of plugging "George W. Bush" into the blank in "___'s father". "___'s father" is an English function symbol. Function symbols are like predicates in some ways. The predicate "___ is happy" has a blank in it, in which you can put a name. "___'s father" is similar in that you can put a name into its blank. But there is a difference: when you put a name into the blank of a predicate, you get a complete sentence, whereas when you put a name into the blank of "___'s father", you get a noun phrase, such as "George W. Bush's father".

Corresponding to English function symbols, we'll add logical function symbols. We'll symbolize "___'s father" as f ( ). We can put names into the blank here. Thus, we'll symbolize "George W. Bush's father" as "f (a)", where "a" stands for "George W. Bush".

We need to add two more complications. First, what goes into the blank doesn't have to be a name — it could be something that itself contains a function symbol. E.g., in English you can say "George W. Bush's father's father". We'd symbolize this as: f ( f (a)).
Second, just as we have multi-place predicates, we have multi-place function symbols. "The sum of 1 and 2" contains the function symbol "the sum of ___ and ___". When you fill in the blanks with the names "1" and "2", you get the noun phrase "the sum of 1 and 2". So, we symbolize this using the two-place function symbol "s( , )". If we let "a" symbolize "1" and "b " symbolize "2", then "the sum of 1 and 2" becomes: s (a, b ).

The result of plugging names into function symbols in English is a noun phrase. Noun phrases combine with predicates to form complete sentences. Function symbols function analogously in logic. Once you combine a function symbol with a name, you can take the whole thing, apply a predicate to it, and get a complete sentence. Thus, the sentence:

George W. Bush's father was a politician

becomes:

P f (a)

And

3 is the sum of 1 and 2

becomes:

c = s(a, b )

(here "c" symbolizes "3"). We can put variables into the blanks of function symbols, too. Thus, we can symbolize

Someone's father was a politician

as

∃xP f (x)
5.2.1 Grammar for function symbols
We need to update our definition of a wff to allow for function symbols. First, we need to add to our vocabulary. So, the new definition starts like this (the new bit is clause ii.b):

Primitive vocabulary:
i) logical: →, ∼, ∀

ii) nonlogical:

a) for each n > 0, n-place predicates F , G, . . ., with or without subscripts
b) for each n > 0, n-place function symbols f , g , …, with or without subscripts
c) variables x, y, . . ., with or without subscripts
d) individual constants (names) a, b , . . ., with or without subscripts

iii) parentheses

The definition of a wff, actually, stays the same; all that needs to change is the definition of a "term". Before, terms were just names or variables. Now, we need to allow for f (a), f ( f (a)), etc., to be terms. So we need a recursive definition of a term, as follows:

Definition of terms:

i) names and variables are terms
ii) if f is an n-place function symbol, and α1 …αn are terms, then f (α1 ,…,αn ) is a term
iii) nothing else is a term
5.2.2 Semantics for function symbols
We now need to update our definition of a model by saying what the interpretation of a function symbol is. That's easy: the interpretation of an n-place function symbol ought to be an n-place function defined on the model's domain — i.e., a rule that maps any n members of the model's domain to another member of the model's domain. For example, in a model in which the one-place function symbol f ( ) is to represent "___'s father", the interpretation of f will be the function that assigns to any member of the domain that object's father. Here's the general definition of a model:

A PC-Model is defined as an ordered pair 〈D,I〉, such that:

i) D is a non-empty set
ii) I is a function such that:

a) if α is a constant then I(α) ∈ D
b) if Π is an n-place predicate, then I(Π) = some set of n-tuples of members of D
c) if f is an n-place function symbol, then I( f ) is an n-place (total) function defined on D ("total" simply means that the function must yield an output for any n members of D)

The definition of a valuation function stays the same; all we need to do is update the definition of denotation to accommodate our new complex terms. Since we now can have arbitrarily long terms (not just names or variables), we need a recursive definition:

Definition of denotation in a model M (=〈D, I 〉):

i) if α is a constant then [α]M,g is I(α)
ii) if α is a variable then [α]M,g is g (α)
iii) if α is a complex term f (α1 ,…,αn ), then [α]M,g = I( f )([α1 ]M,g ,…,[αn ]M,g )

Note the recursive nature of this definition: the denotation of a complex term is defined in terms of the denotations of its smaller parts.

Let's think carefully about what clause iii) says. It says that, in order to calculate the denotation of the complex term f (α1 ,…,αn ) (relative to assignment g ), we must first figure out what I( f ) is — that is, what the interpretation function I assigns to the function symbol f . This object, the new definition of a model tells us, is an n-place function on the domain. We then take this function, I( f ), and apply it to n arguments: namely, the denotations (relative to g ) of the terms α1 . . . αn . The result is our desired denotation of f (α1 ,…,αn ).

It may help to think about a simple case. Suppose that f is a one-place function symbol; suppose our domain is the set of natural numbers; suppose that the name a denotes the number 3 in this model (i.e., I(a) = 3); and suppose that f denotes the successor function (i.e., I( f ) is the function, successor, that assigns to any natural number n the number n + 1). In that case,
the definition tells us that:

[ f (a)] g = I( f )([a] g ) = I( f )(I(a)) = successor(3) = 4

Here's a sample metalanguage proof that makes use of the new definitions. Let's show that the argument considered earlier is a valid argument, when it is symbolized into our new logical language:

George W. Bush's father was a politician
Therefore, someone's father was a politician

We need to show that P f (a) ⊨ ∃xP f (x) — i.e., that in any model in which P f (a) is true, ∃xP f (x) is true:

• Suppose that P f (a) is true in a model 〈D, I〉 — i.e., Vg (P f (a)) = 1 (where V is the valuation for this model), for each variable assignment g .

• We need to show that Vg (∃xP f (x)) = 1, for each g .

• That is, we must show (for an arbitrarily selected g ) that for some object u ∈ D, Vgu/x (P f (x)) = 1, i.e., [ f (x)] gu/x ∈ I(P ).

• [ f (x)] gu/x is defined as I( f )([x] gu/x ) (clause for function symbols of the definition of denotation).

• But [x] gu/x is just g u/x (x) — i.e., u.

• So what we must show is that for some u ∈ D, I( f )(u) ∈ I(P ):

– Well, since Vg (P f (a)) = 1, [ f (a)] g ∈ I(P ).
– But [ f (a)] g is I( f )([a] g ); and [a] g is just I(a).
– So I( f )(I(a)) ∈ I(P ).
– So I(a) is our desired u.

5.2.3 Translations with function symbols: some examples
Here are some example translations involving function symbols:
Everyone loves his or her father
∀xLx f (x)

No one's father is also his or her mother
∼∃x f (x)=m(x)

No one is his or her own father
∼∃x x= f (x)

Everyone's maternal grandfather hates his or her paternal grandmother
∀x H f (m(x)) m( f (x))

Every even number is the sum of two prime numbers
∀x(E x→∃y∃z(P y∧P z∧x=s (y, z)))
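The recursive denotation clause behind these translations can be sketched in Python. The nested-tuple term representation and the dictionary I are my own illustration; the numerical example is the successor model from section 5.2.2 (I(a) = 3, I(f) = successor):

```python
# The recursive denotation clause for function symbols.  Terms are
# names ('a'), variables ('x'), or tuples ('f', subterm, ...); this
# representation is an invented illustration, not the book's notation.

def denot(term, I, g):
    """[term]_{M,g}, with clause iii) for complex terms f(t1,...,tn)."""
    if isinstance(term, tuple):                     # complex term
        f, args = term[0], term[1:]
        # Apply I(f) to the denotations of the subterms, recursively.
        return I[f](*(denot(t, I, g) for t in args))
    return g[term] if term in g else I[term]        # variable / constant

I = {'a': 3, 'f': lambda n: n + 1}                  # I(a)=3, I(f)=successor
g = {}                                              # no variables needed here

assert denot(('f', 'a'), I, g) == 4                 # [f(a)] = successor(3)
assert denot(('f', ('f', 'a')), I, g) == 5          # [f(f(a))] = 5
```

The nested call for ('f', ('f', 'a')) traces exactly the "smaller parts first" order of the recursive definition: first [a], then [f(a)], then [f(f(a))].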
5.3 Definite descriptions
Our logic has gotten more powerful with the addition of function symbols, but it still isn't perfect. Function symbols let us "break up" certain complex noun phrases — e.g., "Bush's father". But there are other noun phrases we still can't break up — e.g., "The black cat". As it is, we still need to symbolize this as a simple name — "a". But that leaves out the fact that "the black cat" contains "black" and "cat" as semantically significant constituents. And so it misses the validity of certain inferences. For example, the following is a valid argument:

The black cat is happy
Therefore, some cat is happy

If we symbolized this argument given our current logic, we would get the obviously invalid:

H a
Therefore, ∃x(C x∧H x)

Here's how we'll break up "the black cat". We'll introduce a new symbol, ι, to stand for "the". The function of "the" in English is to turn predicates into noun phrases. "Black cat" is a predicate of English; "the black cat" is a noun phrase that refers to the thing that satisfies the predicate "black cat". Similarly, in logic, given a predicate F , we'll let ιxF x be a term that means: the thing that is F .
We’ll want to let ιx attach to complex predicates, not just simple predicates. To symbolize “the black cat” — i.e., the thing that is both black and a cat — we want to write: ιx(B x∧C x). In fact, we’ll let ιx attach to wffs with arbitrary complexity. To symbolize “the fireman who saved someone”, we’ll write: ιx(F x∧∃yS xy).
5.3.1 Grammar for ι
Just as with function symbols, we need to add a bit to the primitive vocabulary, and revise the definition of a term. Here's the new grammar:

Primitive vocabulary:

i) logical: →, ∼, ∀, ι

ii) nonlogical:

a) for each n > 0, n-place predicates F , G, . . ., with or without subscripts
b) variables x, y, . . ., with or without subscripts
c) individual constants (names) a, b , . . ., with or without subscripts

iii) parentheses

Definition of terms and wffs:

i) names and variables are terms
ii) if φ is a wff and α is a variable, then ιαφ is a term
iii) if Π is an n-place predicate and α1 …αn are terms, then Πα1 …αn is a wff
iv) if φ and ψ are wffs, and α is a variable, then ∼φ, (φ→ψ), and ∀αφ are wffs
v) nothing else is a wff or term

Notice how we needed to combine the recursive definitions of term and wff into a single recursive definition of wffs and terms together. The reason is that we need the notion of a wff to define what counts as a term containing the ι operator (clause ii), but we need the notion of a term to define what counts as a wff (clause iii). The way we accomplish this is not circular: we can always decide, using these rules, whether a given string counts as a wff or term by looking at whether smaller strings count as wffs or terms, and the smallest strings are classified as wffs or terms in non-circular ways.
5.3.2 Semantics for ι
We need to update the definition of denotation so that ιxφ will denote the one and only thing in the domain that is φ. This is a little tricky, though. What if there is no such thing? Suppose that 'K' symbolizes "king of" and 'a' symbolizes "USA". Then what should 'ιxK xa' denote? It is trying to denote the king of the USA, but there is no such thing. Further, what if more than one thing satisfies the predicate? In short, what do we say about "empty descriptions"?

One approach would be to say that every atomic sentence with an empty description is false. One way to do this is to include in each model an "emptiness marker", e, which is an object we assign as the denotation for each empty description. The emptiness marker shouldn't be thought of as a "real" denotation; when we assign it as the denotation of a description, this just marks the fact that the description has no real denotation. We will stipulate that the emptiness marker is not in the domain; this ensures that it is not in the extension of any predicate, and hence that atomic sentences containing empty descriptions are always false. Here's how the semantics looks:

A PC-Model is defined as an ordered triple 〈D, I, e〉, such that:

i) D is a non-empty set
ii) e ∉ D
iii) I is a function such that:

a) if α is a constant then I(α) ∈ D
b) if Π is an n-place predicate, then I(Π) = some set of n-tuples of members of D

Definition of denotation and valuation: The denotation and valuation functions, [ ]M,g and VM,g , for model M (=〈D, I, e〉) and variable assignment g , are defined as the functions that satisfy the following constraints:

i) VM,g assigns to each wff either 0 or 1
ii) [ ]M,g assigns to each term either e or a member of D
iii) if α is a constant then [α]M,g is I(α)
iv) if α is a variable then [α]M,g is g (α)
v) if α is a complex term ιβφ, where β is a variable and φ is a wff, then [α]M,g = the unique u ∈ D such that Vgu/β(φ) = 1, if there is a unique such u; otherwise [α]M,g = e

vi) for any n-place predicate Π and any terms α1 , . . . , αn , VM,g (Πα1 . . . αn ) = 1 iff 〈[α1 ]M,g . . . [αn ]M,g 〉 ∈ I(Π)

vii) for any wffs φ, ψ, and any variable α:

a) VM,g (∼φ)=1 iff VM,g (φ)=0
b) VM,g (φ→ψ)=1 iff either VM,g (φ)=0 or VM,g (ψ)=1
c) VM,g (∀αφ)=1 iff for every u ∈ D, VM,gu/α (φ)=1

(As with the grammar, we need to mix together the definition of denotation and the definition of the valuation function. The reason is that we need to define the denotation of definite descriptions using the valuation function (in clause v), but we need to define the valuation function using the concept of denotation (in clause vi). As before, this is not circular.)

An alternate approach would appeal to three-valued logic. We could leave the denotation of ιxφ undefined if there is no unique object in the domain satisfying φ. We could then treat any atomic sentence containing a denotationless term as being neither true nor false — i.e., #. We would then need to update the other clauses to allow for #s, perhaps using the Kleene tables, perhaps some other truth tables. I won't explore this further.
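The emptiness-marker approach can be sketched as follows in Python. Here satisfaction is supplied directly as a predicate on the domain rather than via the valuation function, and all the names (E, denot_desc, the sample extensions) are mine, invented for illustration:

```python
# A sketch of clause v): the denotation of a description ix.phi is the
# unique satisfier if there is one, and otherwise the emptiness marker
# e, an object stipulated to lie outside the domain.

E = object()          # the emptiness marker e; a fresh object, not in D

def denot_desc(satisfies, D):
    """[ix.phi]: the unique u in D with phi(u), else the marker e."""
    witnesses = [u for u in D if satisfies(u)]
    return witnesses[0] if len(witnesses) == 1 else E

def atomic_true(u, extension):
    """An atomic predication is true iff the denotation is in the
    extension; the marker lies outside every extension, so sentences
    with empty descriptions come out false."""
    return u in extension

D = {"Felix", "Tom", "Rex"}
cats = {"Felix", "Tom"}
black = {"Felix"}
happy = {"Felix"}

the_black_cat = denot_desc(lambda u: u in black and u in cats, D)
assert the_black_cat == "Felix"
assert atomic_true(the_black_cat, happy)

the_king_of_usa = denot_desc(lambda u: False, D)   # empty description
assert the_king_of_usa is E
assert not atomic_true(the_king_of_usa, happy)     # atomic comes out false

the_cat = denot_desc(lambda u: u in cats, D)       # non-unique: also e
assert the_cat is E
```

Both failure modes, no satisfier and more than one, funnel into the same marker, matching the "otherwise" branch of clause v).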
5.3.3
Eliminability of function symbols and definite descriptions
In a sense, we don’t really need function symbols or the ι. Take:

The black cat is happy

Given the ι, we can symbolize this as H ιx(B x∧C x). But we could symbolize it without the ι as follows:

∃x[(B x∧C x) ∧ ∀y[(B y∧C y)→y=x] ∧ H x]

That is, “there is something such that: i) it is a black cat, ii) nothing else is a black cat, and iii) it is happy”.
This method for symbolizing sentences containing ‘the’ is called “Russell’s theory of descriptions”, because Bertrand Russell, the 19th and 20th century philosopher and logician, invented it.1 In general, we have the following rule:

“the F is φ” is symbolized: ∃x[F x ∧ ∀y(F y→x=y) ∧ φ]

This method can be iterated so as to apply to sentences with two or more definite descriptions, such as:

The 8-foot tall man drove the 20-foot long limousine.

which becomes, letting ‘E’ stand for ‘is eight feet tall’ and ‘T ’ stand for ‘is twenty feet long’:

∃x[E x∧M x ∧ ∀z([E z∧M z]→x=z) ∧ ∃y[T y∧Ly ∧ ∀z([T z∧Lz]→y=z) ∧ D xy]]

An interesting problem arises with negations of sentences involving definite descriptions, when we use Russell’s method:

The president is not bald.

Does this mean:

The president is such that he’s non-bald.

which is symbolized as follows:

∃x[P x∧∀y(P y→x=y)∧∼B x] ?

Or does it mean:

It is not the case that the president is bald

which is symbolized thus:

∼∃x[P x∧∀y(P y→x=y)∧B x]

1
See Russell (1905).
According to Russell, the original sentence is simply ambiguous. Symbolizing it the first way is called “giving the description wide scope (relative to the ∼)”, since the ∼ is inside the scope of the ∃x. Symbolizing it in the second way is called “giving the description narrow scope (relative to the ∼)”, because the ∃x is inside the scope of the ∼.

What is the difference in meaning between these two symbolizations? The first says that there really is a unique president, and adds that he is not bald. So the first implies that there’s a unique president. The second merely denies that there is a unique president who is bald. That doesn’t imply that there’s a unique president. It would be true if there’s a unique president who is not bald, but it would also be true in two other cases:

i) there’s no president
ii) there is more than one president

A similar ambiguity arises with the following sentence:

The round square does not exist.

We might think to symbolize it:

∃x[Rx∧S x∧∀y([Ry∧S y]→x=y)∧∼E x]

letting “E” stand for “exists”. In other words, we might give the description wide scope. But this is wrong, because it says there is a certain round square that doesn’t exist, and that’s a contradiction. This way of symbolizing the sentence corresponds to reading the sentence as saying:

The thing that is a round square is such that it does not exist

But that isn’t the most natural way to read the sentence. The sentence would usually be interpreted to mean:

It is not true that the round square exists.

— that is, as the negation of “the round square exists”:

∼∃x[Rx∧S x∧∀y([Ry∧S y]→x=y)∧E x]

with the ∼ out in front. Here we’ve given the description narrow scope. Notice also that saying that x exists at the end is redundant, so we could simplify to:
∼∃x[Rx∧S x∧∀y([Ry∧S y]→x=y)] Again, notice the moral of these last two examples: if a definite description occurs in a sentence with a ‘not’, the sentence may be ambiguous: does the ‘not’ apply to the entire rest of the sentence, or merely to the predicate? If we are willing to use Russell’s method for translating definite descriptions, we can drop ι from our language. We would, in effect, not be treating “the F ” as a referring phrase. We would instead be paraphrasing sentences that contain “the F ” into sentences that don’t. “The black cat is happy” got paraphrased as: “there is something that is a black cat, is such that nothing else is a black cat, and is happy”. See? — no occurrence of “the black cat” in the paraphrased sentence. In fact, once we use Russell’s method, we can get rid of function symbols too. Given function symbols, we treated “father” as a function symbol, symbolized it with “ f ”, and symbolized the sentence “George W. Bush’s father was a politician” as P f (b ). But instead, we could treat ‘father of’ as a two-place predicate, F , and regard the whole sentence as meaning: “The father of George W. Bush was a politician.” Given the ι, this could be symbolized as: P ιxF x b But given Russell’s method, we can symbolize the whole thing without using either function symbols or the ι: ∃x(F x b ∧ ∀y(F y b →y=x) ∧ P x) We can get rid of all function symbols this way, if we want. Here’s the method: • Take any n-place function symbol f • Introduce a corresponding n + 1-place predicate R • In any sentence containing the term “ f (α1 . . . αn )”, replace each occurrence of this term with “the x such that R(x, α1 . . . αn )”. • Finally, symbolize the resulting sentence using Russell’s theory of descriptions For example, let’s go back to: Every even number is the sum of two prime numbers
Instead of introducing a function symbol s(x, y) for “the sum of x and y”, let’s introduce a predicate letter R(z, x, y) for “z is a sum of x and y”. We then use Russell’s method to symbolize the whole sentence thus:

∀x(E x→∃y∃z[P y∧P z∧∃w(Rwy z∧∀w1 (Rw1 y z→w1 =w)∧x=w)])

The end of the formula (beginning with ∃w) says “the sum of y and z is identical to x” — that is, that there exists some w such that w is a sum of y and z, and there is no other sum of y and z other than w, and w = x.
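The elimination just described can be checked by brute force on a small finite domain. This is a sketch, not from the text: the function names and the cap at 20 are invented for the illustration.

```python
# Sketch (not from the text): replace the function symbol s(y, z) with a
# three-place relation R(w, y, z), "w is a sum of y and z", and evaluate the
# Russellian expansion over a small finite domain of numbers.

domain = range(2, 21)

def prime(n):
    return n > 1 and all(n % d != 0 for d in range(2, n))

def R(w, y, z):
    return w == y + z   # "w is a sum of y and z"

def is_sum_of_two_primes(x):
    # ∃y∃z[Py ∧ Pz ∧ ∃w(Rwyz ∧ ∀w1(Rw1yz → w1=w) ∧ x=w)]
    return any(
        prime(y) and prime(z) and
        any(R(w, y, z) and
            all(not R(w1, y, z) or w1 == w for w1 in domain) and
            x == w
            for w in domain)
        for y in domain for z in domain)

# ∀x(Ex → ...): every even number from 4 to 20 is a sum of two primes.
assert all(is_sum_of_two_primes(x) for x in domain if x % 2 == 0 and x >= 4)
assert not is_sum_of_two_primes(11)   # 11 = 2 + 9, but 9 is not prime
```

The nested `any(... all(...) ...)` over `w` and `w1` mirrors the Russellian uniqueness clause ∃w(Rwyz ∧ ∀w1(Rw1yz→w1=w) ∧ x=w) exactly.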
5.4
Further quantifiers
Standard logic contains just the quantifiers ∀ and ∃. As we have seen, using just these quantifiers, plus the rest of standard predicate logic, one can represent the truth conditions of a great many sentences of natural language. But not all. For instance, there is no way to symbolize the following sentences in predicate logic:

Most things are massive
Most men are brutes
There are infinitely many numbers
Some critics admire only one another

Like those sentences that are representable in standard logic, these sentences involve quantificational notions: most things, some critics, and so on. In this section we examine broader notions of quantifiers that allow us to symbolize these sentences.
5.4.1
Generalized monadic quantifiers
We need to generalize the idea behind the standard quantifiers ∃ and ∀ in two ways. To approach the first, think about the clauses in the definition of truth in a model, M, with domain D, for ∀ and ∃: VM, g (∀αφ)=1 iff for every u ∈ D, VM, gu/α (φ)=1 VM,g (∃αφ)=1 iff for some u ∈ D, VM, gu/α (φ)=1 Let’s introduce the following bit of terminology. For any model, M (= 〈D,I〉), and wff, φ, let’s introduce a name for (roughly speaking) the set of members of M’s domain of which φ is true:
φM,g ,α = {u : u ∈ D and VM,gu/α (φ) = 1}

Thus, if we begin with any variable assignment g , then φM,g ,α is the set of things u in D such that φ is true, relative to variable assignment g (u/α). Given this terminology, we can rewrite the clauses for ∀ and ∃ as follows:

VM,g (∀αφ) = 1 iff φM,g ,α = D
VM,g (∃αφ) = 1 iff φM,g ,α ≠ ∅

But if we can rewrite the semantic clauses for the familiar quantifiers ∀ and ∃ in this way — as conditions on φM,g ,α — then why not introduce new symbols of the same grammatical type as ∀ and ∃, whose semantics is parallel to ∀ and ∃ except in laying down different conditions on φM,g ,α ? These would be new kinds of quantifiers. For instance, for any integer n, we could introduce a quantifier ∃n , to be read as “there exists at least n”. That is, ∃n φ means: “there are at least n φs.” The definitions of a wff, and of truth in a model, would be updated with the following clauses:

If α is a variable and φ is a wff, then ∃n αφ is a wff
VM,g (∃n αφ)=1 iff |φM,g ,α | ≥ n

The expression |A| stands for the “cardinality” of set A — i.e., the number of members of A. Thus, this definition says that ∃n αφ is true iff the cardinality of φM,g ,α is greater than or equal to n — i.e., this set has at least n members. Now, the introduction of the symbols ∃n does not increase the expressive power of predicate logic, for as we saw in section 5.1.3, we can symbolize “there are at least n Fs” using just standard predicate logic (plus =). The new notation is merely a space-saver. But other such additions are not mere space-savers. For instance, consider the quantifier most:

If α is a variable and φ is a wff, then “most α φ” is a wff
VM,g (most α φ)=1 iff |φM,g ,α | > |D−φM,g ,α |

The minus-sign in the second clause is the symbol for set-theoretic difference: A−B is the set of things that are in A but not in B. Thus, the definition says that most α φ is true iff more things in the domain D are φ than are not φ.
As it turns out (though I won’t prove this here), there is no way to say this using just
standard predicate logic. The introduction of the quantifier most genuinely enhances standard predicate logic. One could add all sorts of additional “quantifiers” Q in this way. Each would be, grammatically, just like ∀ and ∃, in that each would combine with a variable, α, and then attach to a sentence φ, to form a new sentence Qαφ. Each of these new quantifiers, Q, would be associated with a relation between sets, RQ , such that Qαφ would be true in a model, M, with domain D, relative to variable assignment g , iff φM,g ,α bears RQ to D.

If such an added symbol Q is to count as a quantifier in any intuitive sense, then the relation RQ can’t be just any relation between sets. It should be a relation concerning the relative “quantities” of its relata. It shouldn’t, for instance, “concern particular objects” in the way that the following symbol, ∃Ted-loved , concerns particular objects:

VM,g (∃Ted-loved αφ)=1 iff φM,g ,α ∩ {u : u ∈ D and Ted loves u} ≠ ∅

So we should require the following of RQ . Consider any set, D, and any one-one function, f , from D onto another set D*. Then, if a subset X of D bears RQ to D, the set f [X ] must bear RQ to D*. ( f [X ] is the image of X under function f — i.e., {u : u ∈ D* and u = f (v), for some v ∈ X }. It is the subset of D* onto which f projects X .)
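The idea of a generalized monadic quantifier as a condition on the set ⟦φ⟧ can be sketched in code. This is an illustration, not part of the text; the function names are invented, and the domain is a small finite one.

```python
# Sketch (not from the text): generalized monadic quantifiers as conditions
# on the set of members of the domain satisfying φ.

def extension(domain, phi):
    """The set {u in D : phi is true of u} — the analogue of ⟦φ⟧."""
    return {u for u in domain if phi(u)}

def exists_n(n, domain, phi):
    # ∃n: "there are at least n φs" — |⟦φ⟧| ≥ n
    return len(extension(domain, phi)) >= n

def most(domain, phi):
    # most: |⟦φ⟧| > |D − ⟦φ⟧| — more φs than non-φs
    ext = extension(domain, phi)
    return len(ext) > len(domain - ext)

D = {1, 2, 3, 4, 5}
odd = lambda u: u % 2 == 1

assert exists_n(3, D, odd)            # at least three odds: 1, 3, 5
assert not exists_n(4, D, odd)
assert most(D, odd)                   # 3 odds vs. 2 non-odds
assert not most(D, lambda u: u > 3)   # only 2 of 5
```

Both quantifiers consult only the size of the extension (and of its complement), never which particular objects are in it — the "quantity, not particular objects" constraint discussed above.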
5.4.2
Generalized binary quantifiers
We have seen how the standard quantifiers ∀ and ∃ can be generalized in one way: syntactically similar symbols may be introduced and associated with different relations between sets. Our second way of generalizing the standard quantifiers is to allow two-place, or binary quantifiers. ∀ and ∃ are monadic in that ∀α and ∃α attach to a single open sentence φ. Compare the natural language monadic quantifiers ‘everything’ and ‘something’:

Everything is material
Something is spiritual

Here, the predicates (verb phrases) ‘is material’ and ‘is spiritual’ correspond to the open sentences of logic; it is to these that ‘everything’ and ‘something’ attach. But in fact, monadic quantifiers in natural language are atypical. ‘Every’ and ‘some’ typically occur as follows:
Every student is happy
Some fish are tasty

The quantifiers ‘every’ and ‘some’ attach to two predicates. In the first, ‘every’ attaches to ‘[is a] student’ and ‘is happy’; in the second, ‘some’ attaches to ‘[is a] fish’ and ‘[is] tasty’. In these sentences, we may think of ‘every’ and ‘some’ as binary quantifiers. (Indeed, one might think of ‘everything’ and ‘something’ as the result of applying the binary quantifiers ‘every’ and ‘some’ to the predicate ‘is a thing’.) A logical notation can be introduced which exhibits a parallel structure, in which ∀ and ∃ attach to two open sentences. In this notation, the form of quantified sentences is:

(∀α:φ)ψ
(∃α:φ)ψ

The first is to be read: “all φs are ψ”; the second is to be read “there is a φ that is a ψ”. The clauses for these new binary quantifiers are these:

VM,g ((∀α:φ)ψ )=1 iff φM,g ,α ⊆ ψM,g ,α
VM,g ((∃α:φ)ψ )=1 iff φM,g ,α ∩ ψM,g ,α ≠ ∅

As with the introduction of the monadic quantifiers ∃n , the introduction of the binary existential and universal quantifiers does not increase the expressive power of first-order logic, for the same effect can be achieved with monadic quantifiers. (∀α:φ)ψ and (∃α:φ)ψ become, respectively:

∀α(φ→ψ)
∃α(φ∧ψ)

But, as with the monadic quantifier most, there are binary quantifiers one can introduce that genuinely increase expressive power. For example, most occurrences of ‘most’ in English are binary, e.g.:

Most fish swim

To symbolize such sentences, we can introduce a binary quantifier most2 . The sentence (most2 α:φ)ψ is to be read “most φs are ψs”. The semantic clause for most2 is:
VM, g ((most2 α:φ)ψ)=1 iff |φM,g ,α ∩ ψM,g ,α | > |φM, g ,α − ψM, g ,α | The binary most2 increases our expressive power, even relative to the monadic most: not every sentence expressible with the former is equivalent to a sentence expressible with the latter.2
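The binary clauses can likewise be sketched in code. This is an illustration, not part of the text; the model (fish, swimmers, tasty things) is invented.

```python
# Sketch (not from the text): binary quantifiers as relations between the
# two sets ⟦φ⟧ and ⟦ψ⟧, over a finite domain.

def ext(domain, phi):
    return {u for u in domain if phi(u)}

def every(domain, phi, psi):    # (∀α:φ)ψ — ⟦φ⟧ ⊆ ⟦ψ⟧
    return ext(domain, phi) <= ext(domain, psi)

def some(domain, phi, psi):     # (∃α:φ)ψ — ⟦φ⟧ ∩ ⟦ψ⟧ ≠ ∅
    return bool(ext(domain, phi) & ext(domain, psi))

def most2(domain, phi, psi):    # |⟦φ⟧ ∩ ⟦ψ⟧| > |⟦φ⟧ − ⟦ψ⟧|
    f, s = ext(domain, phi), ext(domain, psi)
    return len(f & s) > len(f - s)

D = {'tuna', 'trout', 'shark', 'cow'}
fish = lambda x: x != 'cow'
swims = lambda x: x in {'tuna', 'trout', 'shark'}
tasty = lambda x: x in {'tuna', 'trout'}

assert every(D, fish, swims)       # every fish swims
assert some(D, fish, tasty)        # some fish are tasty
assert most2(D, fish, tasty)       # most fish are tasty (2 of 3)
assert not every(D, fish, tasty)   # though not every fish is
```

Note that `most2` compares ⟦φ⟧∩⟦ψ⟧ with ⟦φ⟧−⟦ψ⟧, not with the whole domain: the cow is irrelevant to whether most fish are tasty, which is exactly what makes the binary quantifier stronger than the monadic one.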
5.4.3
Second-order logic
Standard predicate logic, as described in sections 4.1-4.3, is known as first-order. We’ll now briefly look at second-order predicate logic. The distinction has to do with how variables behave; it has syntactic and semantic sides.

The syntactic part of the idea concerns the grammar of variables. All the variables in first-order logic are grammatical terms. That is, they behave grammatically like names: you can combine them with a predicate to get a wff; you cannot combine them solely with other terms to get a wff; etc. In second-order logic, on the other hand, variables can occupy predicate position. Thus, each of the following sentences is a well-formed formula in second-order logic:

∃X X a
∃X ∃yXy

Here we see the variable X occupying predicate position. Predicate variables, like the normal predicates of standard first-order logic, can be one-place, two-place, three-place, etc.

The semantic part of the idea concerns the interpretation of variables. In first-order logic, a variable-assignment assigns to each variable a member of the domain. A variable assignment in second-order logic assigns to each standard (first-order) variable α a member of the domain, as before, but assigns to each n-place predicate variable a set of n-tuples drawn from the domain. (This is what one would expect: the semantic value of an n-place predicate is its extension, a set of n-tuples, and variable assignments assign temporary semantic values.) Then, the following clauses must be added to the definition of truth in a model:

If π is an n-place predicate variable and α1 …αn are terms, then VM,g (πα1 …αn ) = 1 iff 〈[α1 ]M,g …[αn ]M,g 〉 ∈ g (π)
2
Westerståhl (1989, p. 29).
If π is an n-place predicate variable and φ is a wff, then VM,g (∀πφ) = 1 iff for every set U of n-tuples from D, VM,gU/π (φ) = 1 (where gU/π is the variable assignment just like g except in assigning U to π.)

Second-order logic is different from first-order logic in many ways. For instance, one can define the identity predicate in second-order logic:

x=y =df ∀X (X x↔Xy)

This can be seen to work correctly as follows. A one-place second-order variable X gets assigned a set of things. Thus, the atomic sentence X x says that the object (currently assigned to) x is a member of the set (currently assigned to) X . Thus, ∀X (X x↔Xy) says that x and y are members of exactly the same sets. But since x and only x is a member of {x} (i.e., x’s unit set), the only way for this to be true is for y to be identical to x.

More importantly, the metalogical properties of second-order logic are dramatically different from those of first-order logic. For instance, there is no sound and complete recursive set of axioms for second-order logic; the Löwenheim-Skolem theorem fails for second-order logic; Gödel’s incompleteness result does not apply to second-order logic (i.e., there is a recursive set of second-order axioms whose (second-order, semantic) consequences are all and only the sentences true in the standard model of arithmetic). Such results are studied in depth in mathematical logic.3

Second-order logic also allows us to express claims that cannot be expressed in first-order logic. Consider the “Geach-Kaplan sentence”:4

(GK) Some critics admire only one another

It can be shown that there is no way to symbolize (one reading of) the sentence using just first-order logic and predicates for ‘is a critic’ and ‘admires’. The sentence says that there is a group of critics such that the members of that group admire only other members of the group, but one cannot say this in first-order logic.
However, the sentence can be symbolized in second-order logic:

(GK2 ) ∃X [∃xX x ∧ ∀x(X x→C x) ∧ ∀x∀y([X x∧Axy]→[Xy∧x≠y])]

3
See, for instance, Boolos and Jeffrey (1989, chapter 18).
4
The sentence and its significance were discovered by Peter Geach and David Kaplan. See Boolos (1984).
(GK2 ) “symbolizes” (GK) in the sense that it contains no predicates other than C and A, and for every model 〈D,I〉, the following is true:

(*) (GK2 ) is true in 〈D,I〉 iff D has a nonempty subset, X , such that i) X ⊆ I(C ), and ii) whenever 〈u, v〉 ∈ I(A) and u ∈ X , then v ∈ X as well and v is not u.

No first-order sentence symbolizes the Geach-Kaplan sentence in this sense. However, one can in a sense symbolize the Geach-Kaplan sentence using a first-order sentence, provided the sentence employs, in addition to the predicates C and A, a predicate ∈ for set-membership:

(GK1 ) ∃z[∃x x∈z ∧ ∀x(x∈z→C x) ∧ ∀x∀y([x∈z∧Axy]→[y∈z∧x≠y])]

(GK1 ) doesn’t “symbolize” (GK) in the sense of satisfying (*) in every model, for in some models the two-place predicate ∈ doesn’t mean set-membership. Nevertheless, if we just restrict our attention to models 〈D,I〉 in which ∈ does mean set-membership (restricted to the model’s domain, of course — that is, I(∈) = {〈u, v〉 : u, v ∈ D and u∈v}), then (GK1 ) will indeed satisfy (*). In essence, the difference between (GK1 ) and (GK2 ) is that it is hard-wired into the definition of truth in a model that second-order predications Xy express set-membership, whereas this is not hard-wired into the definition of the first-order predication y ∈ z.5
5
For more on second-order logic, see Boolos (1975, 1984, 1985).
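The truth condition (*) for (GK2 ) can be checked mechanically in a finite model, since the second-order quantifier then ranges over the finitely many subsets of the domain. A sketch, not from the text; the particular model is invented.

```python
# Sketch (not from the text): evaluating the second-order symbolization
# (GK2) in a finite model by letting ∃X range over all subsets of D.

from itertools import chain, combinations

def subsets(domain):
    s = list(domain)
    return (set(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1)))

def gk2(D, C, A):
    # ∃X[∃x Xx ∧ ∀x(Xx → Cx) ∧ ∀x∀y((Xx ∧ Axy) → (Xy ∧ x ≠ y))]
    return any(
        X and X <= C and                              # nonempty, all critics
        all(not (x in X and (x, y) in A) or (y in X and x != y)
            for x in D for y in D)                    # admire only others in X
        for X in subsets(D))

D = {1, 2, 3, 4}
C = {1, 2, 3}                         # the critics
A = {(1, 2), (2, 1), (3, 4)}          # 1 and 2 admire only each other
assert gk2(D, C, A)                   # witness: X = {1, 2}

A_selfish = {(1, 1), (2, 2), (3, 3)}  # every critic admires only himself
assert not gk2(D, C, A_selfish)       # no subset can satisfy x ≠ y
```

The brute-force search over subsets is exactly what makes second-order quantification stronger: a first-order quantifier could only search the four members of D, not its sixteen subsets.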
Chapter 6 Propositional Modal Logic

Modal logic is the logic of necessity and possibility. Thus, in modal logic we will treat as logical constants words like “necessary”, “could be”, “must be”, etc. Here are our new symbols:
2φ: “It is necessary that φ”, “Necessarily, φ”, “It must be that φ”
3φ: “It is possible that φ”, “Possibly, φ”, “It could be that φ”, “It can be that φ”, “It might be that φ”

“φ is possible” is sometimes used in the following sense: “φ could be true, but then again, φ could be false”. For example, if one says “it might rain tomorrow”, one might intend to say not only that there is a possibility of rain, but also that there is a possibility that there will be no rain. This is not the sense of ‘possible’ that we symbolize with the 3. In our intended sense, “possibly φ” does not imply “possibly not-φ”. To get into the spirit of this sense, note the naturalness of saying the following: “well of course 2+2 can equal 4, since it does equal 4”. Here, ‘can’ is used in our intended sense: it is presumably not possible for 2+2 to fail to be 4, and so in this case, ‘it can be the case that 2+2 equals 4’ does not imply ‘it can be the case that 2+2 does not equal 4’.

It is helpful to think of the 2 and the 3 in terms of possible worlds. A possible world is a complete and possible scenario. “Possible” means simply that the scenario could have happened. There are no possible worlds in which it is both raining and also not raining (at the same time and place). But within this limit, we can imagine all sorts of possible worlds: possible worlds with talking donkeys, possible worlds in which I am ten feet tall, and so on. “Complete” means simply that no detail is left out — possible worlds are completely specific
scenarios. There is no possible world in which I am “somewhere between 10 and 11 feet tall”; I must be some particular height in any possible world.1 Likewise, in any possible world in which I am exactly 10 feet, six inches tall (say), I must have some particular weight, must live in some particular place, and so on. One of these possible worlds is the actual world — this is the complete and possible scenario that in fact obtains. The rest of them are merely possible — they do not obtain, but would have obtained if things had gone differently. In terms of possible worlds, we can think of our modal operators thus:

“2φ” is true iff φ is true in all possible worlds
“3φ” is true iff φ is true in at least one possible world

It is necessarily true that all bachelors are male; in every possible world, every bachelor is male. There might have existed a talking donkey; some possible world contains a talking donkey. Possible worlds provide, at the very least, a vivid way to think about necessity and possibility. How much more than a vivid guide they provide is an open philosophical question. Some maintain that possible worlds are the key to the metaphysics of modality, that what it is for a proposition to be necessarily true is for it to be true in all possible worlds.2 Whether this view is defensible is a question beyond the scope of this book; what is important for present purposes is that we distinguish possible worlds as a vivid heuristic from possible worlds as a concern in serious metaphysics.

Our first topic in modal logic is the addition of the 2 and the 3 to propositional logic; the result is modal propositional logic (“MPL”). A further step will be modal predicate logic (chapter 9).
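The possible-worlds gloss on the two operators can be sketched directly in code. This is an illustration, not part of the text: the two "worlds" and their features are invented, and each world is crudely modeled as an assignment of truth values to sentence letters.

```python
# Sketch (not from the text): the possible-worlds reading of the modal
# operators, over an invented two-world model.

worlds = [
    {'bachelors_male': True, 'talking_donkey': False},  # the actual world, say
    {'bachelors_male': True, 'talking_donkey': True},   # a merely possible one
]

def box(phi):      # 2φ: φ is true in all possible worlds
    return all(phi(w) for w in worlds)

def diamond(phi):  # 3φ: φ is true in at least one possible world
    return any(phi(w) for w in worlds)

males = lambda w: w['bachelors_male']
donkey = lambda w: w['talking_donkey']

assert box(males)        # necessarily, all bachelors are male
assert diamond(donkey)   # there might have existed a talking donkey
assert not box(donkey)   # but it is not necessary that there be one

# The duality 3φ = ∼2∼φ holds on this reading:
assert diamond(donkey) == (not box(lambda w: not donkey(w)))
```

The last line previews the official definition of 3 in section 6.1: the diamond is just the dual of the box.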
6.1
Grammar of MPL
We need a new language: the language of propositional modal logic. The grammar of this language is just like the grammar of propositional logic, except that we add the 2 as a new one-place sentence connective:

Primitive vocabulary:
This is not to say that possible worlds exclude vagueness. See, for example, Sider (2003).
Sentence letters: P, Q, R . . . , with or without numerical subscripts
Connectives: →, ∼, 2
Parentheses: ( , )
Definition of wff:

i) Sentence letters are wffs
ii) If φ, ψ are wffs then φ→ψ, ∼φ, and 2φ are wffs
iii) nothing else is a wff

The 2 is the only new primitive connective. But just as we were able to define ∧, ∨, and ↔, we can define new nonprimitive modal connectives:

3φ =df ∼2∼φ  “φ is possible”
φ⇒ψ =df 2(φ→ψ)  “strictly implies”
6.2
Translations in MPL
Modal logic allows us to symbolize a number of sentences we couldn’t symbolize before. The most obvious cases are sentences that overtly involve “necessarily”, “possibly”, or equivalent expressions:

Necessarily, if snow is white, then snow is white or grass is green
2[S→(S∨G)]

I’ll go if I must
2G→G

It is possible that Bush will lose the election
3L

Snow might have been either green or blue
3(G∨B)

If snow could have been green, then grass could have been white
3G→3W

‘Impossible’ and related expressions signify the lack of possibility:
It is impossible for snow to be both white and not white
∼3(W ∧∼W )

If grass cannot be clever then snow cannot be furry
∼3C →∼3F

God’s being merciful is inconsistent with imperfection’s being incompatible with your going to heaven.
∼3(M ∧∼3(I ∧H ))
(M = “God is merciful”, I = “You are imperfect”, H = “You go to heaven”)

As for the strict conditional, it arguably does a decent job of representing certain English conditional constructions:

Snow is a necessary condition for skiing
∼W ⇒∼K

Food and water are required for survival
∼(F ∧W )⇒∼S

Thunder implies lightning
T ⇒L

Once we add modal operators, we can expose an important ambiguity in certain English sentences. The surface grammar of a sentence like “if Ted is a bachelor, then he must be unmarried” is misleading: it suggests the symbolization:

B→2U

But since I am in fact a bachelor, it would follow from this symbolization that the proposition that I am unmarried is necessarily true. But clearly I am not necessarily unmarried — I could have been married! The sentence is not saying that if I am in fact a bachelor, then the following is a necessary truth: I am unmarried. It is rather saying that, necessarily, if I am a bachelor then I am unmarried:

2(B→U )

It is the relationship between my being a bachelor and my being unmarried that is necessary. Think of this in terms of possible worlds: the first symbolization
says that if I am a bachelor in the actual world, then I am unmarried in every possible world (which is absurd); whereas the second one says that in each possible world, w, if I am a bachelor in w, then I am unmarried in w (which is quite sensible). The distinction between φ→2ψ and 2(φ→ψ) is called the distinction between the “necessity of the consequent” (first sentence) and the “necessity of the consequence” (second sentence). It is important to keep the distinction in mind, because of the fact that English surface structure is misleading. English modal words are ambiguous in a systematic way. For example, suppose I say that I can’t attend a certain conference in Cleveland. What is the force of “can’t” here? Probably I’m saying that my attending the conference is inconsistent with honoring other commitments I’ve made at that time. But notice that another sentence I might utter is: “I could attend the conference; but I would have to cancel my class, and I don’t want to do that.” Now I’ve said that I can attend the conference; have I contradicted my earlier assertion that I cannot attend the conference? No — what I mean now is perhaps that I have the means to get to Cleveland on that date. I have shifted what I mean by “can”. In fact, there are a lot of things one could mean by a modal word like ‘can’. Examples: I can come to the party, but I can’t stay late. (“can” = “is not inconvenient”) Humans can travel to the moon, but not Mars. (“can” = “is achievable with current technology”) Objects can move almost as fast as the speed of light, but nothing can travel faster than light. (“can” = “is consistent with the laws of nature”) Objects could have traveled faster than the speed of light (if the laws of nature had been different), but no matter what the laws had been, nothing could have traveled faster than itself. (“can” = “metaphysical possibility”) You can borrow but you can’t steal. 
(“can” = “morally acceptable”) So when representing English sentences using the 2 and the 3, one should keep in mind that these expressions can be used to express different strengths of necessity and possibility. (Though we won’t do this, one could introduce different symbols for different sorts of possibility and necessity.)
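The necessity-of-the-consequent/necessity-of-the-consequence distinction drawn above can be checked in a toy worlds model. A sketch, not from the text; the two worlds are invented.

```python
# Sketch (not from the text): B→2U vs 2(B→U), evaluated in a two-world
# model. In the actual world I am a bachelor and unmarried; in another
# world I married (and so am not a bachelor).

worlds = [
    {'bachelor': True,  'unmarried': True},   # the actual world
    {'bachelor': False, 'unmarried': False},  # a world where I married
]
actual = worlds[0]

def box(phi):
    return all(phi(w) for w in worlds)

B = lambda w: w['bachelor']
U = lambda w: w['unmarried']

# Necessity of the consequence, 2(B→U): in each world, if bachelor
# there, then unmarried there. True in this model.
assert box(lambda w: (not B(w)) or U(w))

# Necessity of the consequent, B→2U: actually a bachelor, so this
# demands unmarried in every world — false, since I married in one.
assert not ((not B(actual)) or box(U))
```

The second world is exactly the "I could have been married" possibility: it falsifies 2U without touching 2(B→U).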
The different strengths of possibility and necessity can be made vivid by thinking, again, in terms of possible worlds. As we saw, we can think of the 2 and the 3 as quantifiers over possible worlds (the former a universal quantifier, the latter an existential quantifier). The very broad sort of possibility and necessity, metaphysical possibility and necessity, can be thought of as a completely unrestricted quantifier: a statement is necessarily true iff it is true in all possible worlds whatsoever. The other kinds of possibility and necessity can be thought of as resulting from various restrictions on the quantifiers over possible worlds. Thus, when ‘can’ signifies achievability given current technology, it means: true in some possible world in which technology has not progressed beyond where it has progressed in fact at the current time; when ‘can’ means moral acceptability, it means: true in some possible world in which nothing morally forbidden occurs; and so on.
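The picture of stronger and weaker modalities as restricted quantification over worlds can be sketched in code. This is an illustration, not part of the text; the worlds and their features are invented.

```python
# Sketch (not from the text): strengths of possibility as restricted
# quantification over an invented set of worlds.

all_worlds = [
    {'ftl_travel': True,  'actual_laws': False},  # different laws of nature
    {'ftl_travel': False, 'actual_laws': True},
    {'ftl_travel': False, 'actual_laws': True},
]

def possible(phi, restriction=lambda w: True):
    # "possibly φ": true in some world meeting the restriction; with no
    # restriction, this plays the role of metaphysical possibility.
    return any(phi(w) for w in all_worlds if restriction(w))

nomological = lambda w: w['actual_laws']   # worlds obeying the actual laws
ftl = lambda w: w['ftl_travel']

assert possible(ftl)                               # metaphysically possible
assert not possible(ftl, restriction=nomological)  # but not nomologically
```

Each further restriction (current technology, moral acceptability, and so on) would just be another predicate of worlds passed in as `restriction`.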
6.3
Axiomatic systems of MPL
We’ll now study provability in modal logic. We’ll approach this axiomatically: we’re going to write down axioms, which are sentences of propositional modal logic that seem clearly to be logical truths, and we’re going to write down rules of inference, which say which sentences can be logically inferred from which other sentences. Unfortunately, this is much less straightforward for modal logic than it is for propositional logic. For propositional logic, pretty much everyone agrees what the right axioms and rules of inference should be. But for modal logic, this is less clear, especially for sentences that contain iterations of modal operators. For instance, should 2P →22P be a logical truth? It’s not obvious what the answer is. A quick peek at the history of modal logic is in order. Modal logic arose from dissatisfaction with the material conditional → of standard propositional logic. The material conditional φ→ψ is true whenever φ is false or ψ is true; but in expressing the conditionality of ψ on φ, we sometimes want to require a tighter relationship: we want it not to be a mere accident that either φ is false or ψ is true. To express this tighter relationship, C. I. Lewis introduced the strict conditional φ⇒ψ, which he defined, as above, as 2(φ→ψ).3 Thus defined, φ⇒ψ isn’t automatically true just because φ is false or ψ is true. It must be necessarily true that either φ is false or ψ is true. 3
See Lewis (1918); Lewis and Langford (1932).
Lewis then asked: what axioms should govern this new symbol 2? Certain axioms seemed clearly appropriate, for instance: 2(φ→ψ)→(2φ→2ψ). Others were less clear. Should 2φ→22φ be an axiom? What about 32φ→φ? What to choose as an axiom is far less obvious in modal logic than it is in standard propositional logic.

Lewis’s solution to this problem was not to choose. Instead, he formulated several different axiomatic systems. The systems differ from one another by containing different axioms; as a result, different theorems are provable in the different systems. In the following sections, we will look at some of these axiomatic systems of Lewis’s.

This of course only postpones the difficult questions. Those questions arise when we want to apply Lewis’s systems, when we ask which system is the correct system — i.e., which one correctly mirrors the logical properties of the English words ‘possibly’ and ‘necessarily’? (Note that since there are different sorts of necessity and possibility, different systems might correctly represent different sorts.) We won’t here be addressing such philosophical questions.

Since we will be formulating multiple axiomatic systems, it is no longer appropriate to use a single symbol “⊢φ” to mean that φ is a theorem. Instead, we will give names to the modal systems: K, D, T, B, S4, S5; and we’ll use these names as subscripts on “⊢” to indicate theoremhood. Thus, ⊢K φ will mean that φ is a theorem of system K.
6.3.1
System K
Our first system, K, is the weakest system — i.e., the system with the fewest theorems. Here are the rules and axioms of K: System K
Rules:
MP: from φ and φ→ψ, infer ψ
NEC (“necessitation”): from φ, infer 2φ
Axioms:

PL axioms: for any MPL-wffs φ, ψ, and χ, the following are axioms:

(A1) φ→(ψ→φ)
(A2) (φ→(ψ→χ))→((φ→ψ)→(φ→χ))
(A3) (∼ψ→∼φ)→((∼ψ→φ)→ψ)

K-schema: for any MPL-wffs φ and ψ, the following is an axiom:

2(φ→ψ)→(2φ→2ψ)

As before, a K-proof is defined as a series of wffs, each of which is either a K-axiom or follows from earlier lines in the proof by a K-rule. A K-theorem is the last line of any K-proof. This axiomatic system (like all the modal systems we will study) is an extension of propositional logic, in the sense that it includes all of the theorems of propositional logic, but then adds more theorems. It includes all of propositional logic because one of its rules is the propositional logic rule MP, and each propositional logic axiom is one of its axioms. It adds theorems by adding a new rule of inference (NEC) and a new axiom schema (the K-schema), as well as adding new wffs — wffs containing the 2 — to the stock of wffs that can occur in the PL axioms.

The rule of inference, NEC (for “necessitation”), says that if you have a formula φ on a line, then you may infer the formula 2φ. This may seem unintuitive. After all, can’t a sentence be true without being necessarily true? Yes — but the rule of necessitation doesn’t contradict this. Remember that every line in every axiomatic proof is a theorem. So whenever one uses necessitation in a proof, one is applying it to a theorem. And necessitation does seem appropriate when applied to theorems: if φ is a theorem, then 2φ ought also to be a theorem. Think of it this way. The worry about the rule of necessitation is that it isn’t a truth-preserving rule: intuitively speaking, its premise can be true when its conclusion is false. The answer to the worry is that while necessitation
doesn’t preserve truth, it does preserve logical truth, which is all that matters in the present context. For in the present context, we’re only using NEC in a definition of theoremhood. We want our theorems to be, intuitively, logical truths; and provided that our axioms are all logical truths and our rules preserve logical truth, the definition will yield only logical truths as theorems. We will return to this issue. Let’s investigate what one can prove in K. The simplest sort of distinctively modal proof consists of first proving something from the PL axioms, and then necessitating it, as in the following proof of 2((P →Q)→(P →P )) 1. 2. 3. 4.
P →(Q→P ) P →(Q→P ))→((P →Q)→(P →P )) (P →Q)→(P →P ) 2((P →Q)→(P →P ))
(A1) (A2) 1,2 MP 3, NEC
Using this technique, we can prove anything of the form 2φ, where φ is provable in PL. And, since the PL axioms are complete (I mentioned but did not prove this fact in chapter 2), that means that we can prove 2φ whenever φ is a tautology — i.e., a valid wff of PL. But constructing proofs from the PL axioms is a pain in the neck! — and anyway not what we want to focus on in this chapter. So let's introduce the following time-saving shortcut. Instead of writing out proofs of tautologies, let's instead allow ourselves to write any PL tautology at any point in a proof, annotating simply "PL".4 Thus, the previous proof could be shortened to:

1. (P→Q)→(P→P)          PL
2. 2((P→Q)→(P→P))       1, NEC

Furthermore, consider the wff 2P→2P. Clearly, we can construct a proof of this wff from the PL axioms: begin with any proof of the tautology Q→Q from the PL axioms, and then construct a new proof by replacing each occurrence of Q in the first proof with 2P. (This is a legitimate proof, even though 2P isn't a wff of propositional logic, because when we stated the system K, the schematic letters φ, ψ, and χ in the PL axioms were allowed to be filled in with any wffs of MPL, not just wffs of PL.) So let us also include lines like this in our modal proofs:

4 How do you know whether something is a tautology? Figure it out any way you like: do a truth table, or a natural deduction derivation — whatever.
2P→2P          PL

Why am I making such a fuss about this? Didn't I just say in the previous paragraph that we can write down any tautology at any time, with the annotation "PL"? Well, strictly speaking, 2P→2P isn't a tautology. A tautology is a valid wff of PL, and 2P→2P isn't even a wff of PL (since it contains a 2). But it is the result of beginning with some PL-tautology (Q→Q, in this case) and uniformly changing sentence letters to chosen modal wffs (in this case, Qs to 2Ps); hence any proof of the PL tautology may be converted into a proof of it; hence the "PL" annotation is just as justified here as it is in the case of a genuine tautology. So in general, MPL wffs that result from PL tautologies in this way may be written down and annotated "PL". Back to investigating what we can prove in K. As we've seen, we can prove that tautologies are necessary — we can prove 2φ whenever φ is a tautology. One can also prove in K that contradictions are impossible. For instance, ∼3(P∧∼P) is a theorem of K:

1. ∼(P∧∼P)                          PL
2. 2∼(P∧∼P)                         1, NEC
3. 2∼(P∧∼P)→∼∼2∼(P∧∼P)             PL
4. ∼∼2∼(P∧∼P)                       2, 3 MP
But line 4 is a definitional abbreviation of ∼3(P∧∼P). Let's introduce another time-saving shortcut. Note that the move from 2 to 4 in the previous proof is just a move from a formula to a propositional logical consequence of that formula. Let's allow ourselves to move directly from any lines in a proof, φ1 . . . φn, to any propositional logical consequence ψ of those lines, by "PL". Thus, the previous proof could be shortened to:

1. ∼(P∧∼P)             PL
2. 2∼(P∧∼P)            1, NEC
3. ∼∼2∼(P∧∼P)          2, PL

Why is this legitimate? Suppose that ψ is a propositional logical semantic consequence of φ1 . . . φn. Then the conditional φ1→(φ2→ · · · (φn→ψ) . . . ) is a PL-valid formula, and so, given the completeness of the PL axioms, is a theorem of K. That means that if we have φ1, . . . , φn in an axiomatic K-proof, then we can always prove the conditional φ1→(φ2→ · · · (φn→ψ) . . . ) using the PL axioms, and then use MP repeatedly to infer ψ. So inferring ψ directly,
and annotating "PL", is justified. (As with the earlier "PL" shortcut, let's use this shortcut when the conditional φ1→(φ2→ · · · (φn→ψ) . . . ) results from some tautology by uniform substitution, even if it contains modal operators and so isn't strictly a tautology.) So far our modal proofs have only used necessitation and the PL axioms. What about the K-axioms? The point of the K-schema is to enable "distribution of the 2 over the →". That is, if you ever have the formula 2(φ→ψ), then you can always move to 2φ→2ψ as follows:

2(φ→ψ)
2(φ→ψ)→(2φ→2ψ)          K axiom
2φ→2ψ                   MP

Putting this together with necessitation, this means that whenever one can prove the conditional φ→ψ, one can prove the modal conditional 2φ→2ψ as well. First prove φ→ψ, then necessitate it to get 2(φ→ψ), then distribute the 2 over the arrow to get 2φ→2ψ. This procedure is one of the core K-strategies, and is featured in the following proof of 2(P∧Q)→(2P∧2Q):

1. (P∧Q)→P                           PL
2. 2[(P∧Q)→P]                        1, NEC
3. 2[(P∧Q)→P]→[2(P∧Q)→2P]           K axiom
4. 2(P∧Q)→2P                         2, 3 MP
5. 2(P∧Q)→2Q                         Insert steps similar to 1–4
6. 2(P∧Q)→(2P∧2Q)                    4, 5, PL
Notice that the preceding proof, like all of our proofs since we introduced the time-saving shortcuts, is not a K-proof in the official defined sense. Lines 1, 5, and 6 are not axioms, nor do they follow from earlier lines by MP or NEC.5 So what kind of "proof" is it? It's a metalanguage proof: an attempt to convince the reader, by acceptable standards of rigor, that some real K-proof exists. A reader could use this metalanguage proof as a blueprint for constructing a real proof. She would begin by replacing line 1 with a proof from the PL axioms of the conditional (P∧Q)→P. (As we know from chapter 2, this could be a real pain in the neck! — but the completeness of PL assures us that it is possible.) She would then replace line 5 with lines parallel to lines 1–4,

5 A further (even pickier) reason: the symbol ∧ isn't allowed in wffs; the sentences in the proof are mere abbreviations for official MPL-wffs.
but which begin with a proof of (P∧Q)→Q rather than (P∧Q)→P. Finally, in place of line 6, she would insert a proof from the PL axioms of the sentence (2(P∧Q)→2P)→[(2(P∧Q)→2Q)→(2(P∧Q)→(2P∧2Q))], and then use modus ponens twice to infer 2(P∧Q)→(2P∧2Q). Another example: (2P∨2Q)→2(P∨Q):

1. P→(P∨Q)               PL
2. 2(P→(P∨Q))            1, NEC
3. 2(Q→(P∨Q))            PL, NEC
4. 2P→2(P∨Q)             2, K
5. 2Q→2(P∨Q)             3, K
6. (2P∨2Q)→2(P∨Q)        4, 5 PL
Here I've introduced another time-saving shortcut that I'll use more and more as we progress: doing two (or more) steps at once. Line 3 is really short for:

3a. Q→(P∨Q)              PL
3b. 2(Q→(P∨Q))           3a, NEC

And line 4 is short for:

4a. 2(P→(P∨Q))→(2P→2(P∨Q))        K axiom
4b. 2P→2(P∨Q)                     2, 4a, MP

One further comment about this last proof: it illustrates a strategy that is common in modal proofs. We were trying to prove a conditional formula whose antecedent is a disjunction of two modal formulas. But the modal techniques we had developed didn't deliver formulas of this form. They only showed us how to put 2s in front of PL-tautologies, and how to distribute 2s over conditionals. They only yield formulas of the form 2φ and 2φ→2ψ, whereas the formula we're trying to prove looks different. To overcome this problem, what we did was to use the modal techniques to prove two conditionals, namely 2P→2(P∨Q) and 2Q→2(P∨Q), from which the desired formula, namely (2P∨2Q)→2(P∨Q), follows by propositional logic. The trick, in general, is this: remember that you have PL at your disposal. Simply look for one or more modal formulas you know how to prove which, by PL, imply the formula you want. Assemble the desired formulas, and then write down your desired formula, annotating "PL". In doing so, it may be helpful to recall PL inferences like the following:
From φ→ψ and ψ→φ, infer φ↔ψ
From φ→(ψ→χ), infer (φ∧ψ)→χ
From φ→ψ and φ→χ, infer φ→(ψ∧χ)
From φ→χ and ψ→χ, infer (φ∨ψ)→χ
From φ→ψ, infer ∼φ∨ψ
From φ→∼ψ, infer ∼(φ∧ψ)
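The two "PL" shortcuts — writing down tautologies and inferring propositional consequences — are mechanically checkable by brute-force truth tables. Here is a minimal sketch in Python (not from the text: the tuple encoding of formulas and all function names are my own illustration):

```python
from itertools import product

def letters(f):
    """Sentence letters occurring in a formula. Formulas are nested tuples:
    a string is a sentence letter, ('not', A) a negation, ('->', A, B) a
    conditional (the only official PL connectives in this book)."""
    if isinstance(f, str):
        return {f}
    return set().union(*(letters(sub) for sub in f[1:]))

def value(f, assign):
    """Truth value (True/False) of f under an assignment to its letters."""
    if isinstance(f, str):
        return assign[f]
    if f[0] == 'not':
        return not value(f[1], assign)
    # conditional: false only when antecedent true and consequent false
    return (not value(f[1], assign)) or value(f[2], assign)

def is_tautology(f):
    """True iff f comes out true on every row of its truth table."""
    ls = sorted(letters(f))
    return all(value(f, dict(zip(ls, row)))
               for row in product([True, False], repeat=len(ls)))

def pl_consequence(premises, conclusion):
    """True iff the conclusion is true under every assignment that makes
    all the premises true — the check behind the second 'PL' shortcut."""
    ls = sorted(set().union(*(letters(p) for p in premises),
                            letters(conclusion)))
    return all(value(conclusion, a)
               for row in product([True, False], repeat=len(ls))
               for a in [dict(zip(ls, row))]
               if all(value(p, a) for p in premises))
```

`is_tautology` backs the first "PL" annotation; `pl_consequence` backs inferring ψ directly from lines φ1 . . . φn.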
The next example illustrates our next major modal proof technique: combining two 2 statements to get a single 2 statement. Let us construct a K-proof of (2P∧2Q)→2(P∧Q):

1. P→(Q→(P∧Q))                       PL
2. 2[P→(Q→(P∧Q))]                    1, NEC
3. 2P→2(Q→(P∧Q))                     2, K
4. 2(Q→(P∧Q))→[2Q→2(P∧Q)]           K axiom
5. 2P→[2Q→2(P∧Q)]                    3, 4 PL
6. (2P∧2Q)→2(P∧Q)                    5, PL
(If you wanted to, you could skip step 5, and just go straight to 6 by propositional logic, since 6 is a propositional logical consequence of 3 and 4; I put it in for perspicuity.) The general technique illustrated by the last problem applies anytime you want to move from several 2 statements to a further 2 statement, where the inside parts of the first 2 statements imply the inside part of the final 2 statement. More carefully: it applies whenever you want to prove a formula of the form 2φ1 →(2φ2 → · · · (2φn →2ψ) . . . ), provided you are able to prove the formula φ1 →(φ2 → · · · (φn →ψ) . . . ). (The previous proof was an instance of this because it involved moving from 2P and 2Q to 2(P ∧Q); and this is a case where one can move from the inside parts of the first two formulas (namely, P and Q), to the inside part of the third formula (namely, P ∧Q) — by PL.) To do this, one begins by proving the conditional φ1 →(φ2 → · · · (φn →ψ) . . . ), necessitating it to get 2[φ1 →(φ2 → · · · (φn →ψ) . . . )], and then distributing the 2 over the arrows repeatedly using K-axioms and PL to get 2φ1 →(2φ2 → · · · (2φn →2ψ) . . . ).
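The recipe just stated — prove the nested conditional, necessitate it, then distribute the 2 over each arrow in turn — is mechanical enough to sketch as a generator of annotated proof lines. This is my own illustration, not the book's; the string encoding and function names are hypothetical:

```python
def nest(antecedents, consequent):
    """Right-nested conditional φ1→(φ2→(…→(φn→ψ)…)) as a string."""
    out = consequent
    for a in reversed(antecedents):
        out = f"({a}→{out})"
    return out

def k_distribution_proof(phis, psi):
    """Annotated proof lines for □φ1→(…→(□φn→□ψ)…), assuming the nested
    conditional over the φi and ψ is PL-provable (we do not check this)."""
    lines = [(nest(phis, psi), "PL")]
    lines.append(("□" + lines[-1][0], "1, NEC"))
    # distribute the box over one arrow per step, via the K-schema and PL
    for i in range(len(phis)):
        boxed = nest(["□" + p for p in phis[:i + 1]],
                     "□" + nest(phis[i + 1:], psi))
        lines.append((boxed, f"{len(lines)}, K, PL"))
    return lines
```

For `k_distribution_proof(["P", "Q"], "(P∧Q)")` the output reproduces the shape of the proof of (2P∧2Q)→2(P∧Q) above, ending in □P→(□Q→□(P∧Q)).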
One cautionary note in connection with this last proof: one might think to make this proof more intuitive by using something like conditional proof (which is, in essence, a use of the deduction theorem for propositional logic):

1. 2P∧2Q                     Assume for conditional proof
2. 2P                        1, PL
3. 2Q                        1, PL
4. P→(Q→(P∧Q))               PL
5. 2[P→(Q→(P∧Q))]            4, NEC
6. 2P→2(Q→(P∧Q))             5, K
7. 2(Q→(P∧Q))                6, 2, MP
8. 2Q→2(P∧Q)                 7, K
9. 2(P∧Q)                    3, 8 MP
10. (2P∧2Q)→2(P∧Q)           1–9, conditional proof
But this is not a legal proof. Not only is conditional proof not allowed by the official rules of the proof system; it couldn't be added to the system, since the deduction theorem (which says that if ψ is provable from φ, then the conditional φ→ψ is provable) does not hold for K. If conditional proof did work, one could show that the following is a K-theorem: P→2P. But clearly, this shouldn't be a K-theorem (and indeed, one can show that it isn't). Here's the bogus proof:

1. P            Assume
2. 2P           1, NEC
3. P→2P         1, 2, Conditional proof

We already knew that conditional proof is not officially allowed in K-proofs. But this shows that we couldn't add conditional proof to our system (at least, not in a straightforward way). Of course, to convince yourself that a given formula is really a tautology of propositional logic, you could sketch a proof of it to yourself using conditional proof, since conditional proof is acceptable for non-modal propositional logic. But as I've mentioned, we can just infer propositional logical consequences directly in our proofs. Another example will illustrate another technique for proving formulas in K with "nested" modal operators; let us prove ⊢K 22(P∧Q)→22P:

1. (P∧Q)→P                  PL
2. 2(P∧Q)→2P                1, NEC, K
3. 2[2(P∧Q)→2P]             2, NEC
4. 22(P∧Q)→22P              3, K
Notice in line 3 that we necessitated something that was not a PL theorem. That's OK; we're allowed to necessitate any K-theorems, even those whose proofs were distinctly modal. Notice also how this proof contains two instances of our basic K-strategy. This strategy involves obtaining a conditional, necessitating it, then distributing the 2 using K. We did this first using the conditional (P∧Q)→P; that led us to a conditional, 2(P∧Q)→2P. Then we started the strategy over again, using this as our initial conditional. So far we have no techniques for dealing with the 3, other than eliminating it by definition. It will be convenient to derive some shortcuts. For example, we should be assured that the following hold (these are often called "modal negation", or MN):

⊢K ∼2φ↔3∼φ
⊢K ∼3φ↔2∼φ

I'll do one of these; the rest can be an exercise:

1. ∼∼φ→φ                 PL
2. 2∼∼φ→2φ               1, NEC, K
3. ∼2φ→∼2∼∼φ             2, PL

The final line, 3, is the definitional equivalent of ∼2φ→3∼φ. Similarly, we should establish that 2φ↔∼3∼φ (i.e., 2φ↔∼∼2∼∼φ) is a theorem:

1. φ→∼∼φ                 PL
2. 2φ→2∼∼φ               1, NEC, K
3. 2φ→∼∼2∼∼φ             2, PL
4. 2∼∼φ→2φ               PL, NEC, K
5. ∼∼2∼∼φ→2φ             4, PL
6. 2φ↔∼∼2∼∼φ             5, 3 PL
We can refer to the two theorems 2φ↔∼3∼φ, and 3φ↔∼2∼φ, as “dual”, because they say that the 2 and the 3 are “duals” (an analogy: the quantifiers are duals, because ∀xφ↔∼∃x∼φ, and ∃xφ↔∼∀x∼φ are both logical truths). It will also be worthwhile to know that an analog of the K axiom holds for the 3: “K3”: 2(φ→ψ)→(3φ→3ψ)
which amounts to showing that the following is a theorem: 2(φ→ψ)→(∼2∼φ→∼2∼ψ). How are we going to do this? We're dealing with an axiomatic system here, so we can't use conditional derivation, indirect derivation, and so on, as we usually would in deriving things in propositional logic. So what to do? What we need to do is look for a tautology that we can necessitate, and which will correspond to this pattern of reasoning. We can be guided as follows: the formula we seek is equivalent, in propositional logic, to the following: 2(φ→ψ)→(2∼ψ→2∼φ). So if we can show this last formula, we can infer our goal by PL reasoning. But this last formula looks like the result of necessitating a tautology, and then distributing the 2 over the → a couple times; and indeed it is:

1. (φ→ψ)→(∼ψ→∼φ)                 PL
2. 2(φ→ψ)→2(∼ψ→∼φ)               1, NEC, K
3. 2(∼ψ→∼φ)→(2∼ψ→2∼φ)            K
4. 2(φ→ψ)→(2∼ψ→2∼φ)              2, 3 PL
5. 2(φ→ψ)→(∼2∼φ→∼2∼ψ)            4, PL
In doing proofs, let's also allow ourselves to refer to earlier theorems proved, rather than repeating their proofs. The importance of K3 may be illustrated by the following proof of 2P→(3Q→3(P∧Q)):

1. P→[Q→(P∧Q)]                       PL
2. 2P→2[Q→(P∧Q)]                     1, NEC, K
3. 2[Q→(P∧Q)]→[3Q→3(P∧Q)]            K3
4. 2P→[3Q→3(P∧Q)]                    2, 3, PL
In general, the K3 rule allows us to complete proofs of the following sort. Suppose we wish to prove a formula of the form:

O1φ1→(O2φ2→( · · · →(Onφn→3ψ) . . . )

where the Oi are modal operators, all but one of which are 2s. (Thus, the remaining Oi is the 3.) This can be done, provided that ψ is provable in K from the φi. The basic strategy is to prove a nested conditional, the antecedents of which are the φi, and the consequent of which is ψ; necessitate it; then
repeatedly distribute the 2 over the →s, once using K3, the rest of the times using K. But there is one catch. We need to make the application of K3 last, after all the applications of K. This in turn requires the conditional we use to have the φi that is underneath the 3 as the last of the antecedents. For instance, suppose that φ3 is the one underneath the 3. Thus, what we are trying to prove is:

2φ1→(2φ2→(3φ3→(2φ4→( · · · →(2φn→3ψ) . . . )

In this case, the conditional to use would be:

φ1→(φ2→(φn→(φ4→( · · · →(φn−1→(φ3→ψ) . . . )

In other words, one must swap one of the other φi (I arbitrarily chose φn) with φ3. What one obtains at the end will therefore have the modal statements out of order:

2φ1→(2φ2→(2φn→(2φ4→( · · · →(2φn−1→(3φ3→3ψ) . . . )

But that problem is easily solved; this is equivalent in PL to what we're trying to get. (Recall that φ→(ψ→χ) is logically equivalent in PL to ψ→(φ→χ).) Why do we need to save K3 for last? The strategy of successively distributing the box over all the nested conditionals comes to a halt as soon as the K3 theorem is used. Let me illustrate with an example. Suppose we wish to prove ⊢K 3P→(2Q→3(P∧Q)). We might think to begin as follows:

1. P→(Q→(P∧Q))               PL
2. 2[P→(Q→(P∧Q))]            1, Nec
3. 3P→3(Q→(P∧Q))             2, K3
4. ?

But now what? What we need to finish the proof is: 3(Q→(P∧Q))→(2Q→3(P∧Q)). But neither K nor K3 gets us this. The remedy is to begin the proof with a different conditional:

1. Q→(P→(P∧Q))                        PL
2. 2(Q→(P→(P∧Q)))                     1, Nec
3. 2Q→2(P→(P∧Q))                      2, K, MP
4. 2(P→(P∧Q))→(3P→3(P∧Q))             K3
5. 2Q→(3P→3(P∧Q))                     3, 4, PL
6. 3P→(2Q→3(P∧Q))                     5, PL
Before completing our discussion of K, it will be useful to note an important theorem:

Substitution of equivalents: If ⊢K α↔β, then for every wff χ, ⊢K χ↔χ^β, where χ^β results from χ by changing αs to βs.

Proof of substitution of equivalents: Suppose (*) ⊢K α↔β. We proceed by induction. Base case: here χ is a sentence letter. Then either i) changing αs to βs has no effect, in which case χ^β is just χ, in which case obviously ⊢K χ↔χ^β; or ii) χ is α, in which case χ^β is β, and we know ⊢K χ↔χ^β since we are given (*). Induction case: we now assume the result holds for formulas χ1 and χ2 — that is, we assume that ⊢K χ1↔χ1^β and ⊢K χ2↔χ2^β — and we show that the result holds for ∼χ1, χ1→χ2, and 2χ1.

Take the first case. We must show that the result holds for ∼χ1 — i.e., we must show that ⊢K ∼χ1↔(∼χ1)^β. (∼χ1)^β is just ∼(χ1^β), so we must show ⊢K ∼χ1↔∼(χ1^β). But this follows by PL from the inductive hypothesis: ⊢K χ1↔χ1^β.

Take the second case. We must show ⊢K (χ1→χ2)↔(χ1→χ2)^β. The inductive hypothesis tells us that ⊢K χ1↔χ1^β, from which it follows by propositional logic that:

⊢K (χ1→χ2)↔(χ1^β→χ2)

The inductive hypothesis also tells us that ⊢K χ2↔χ2^β, from which by propositional logic we obtain:

⊢K (χ1^β→χ2)↔(χ1^β→χ2^β)

Now from the two indented equivalences, by propositional logic we have:

⊢K (χ1→χ2)↔(χ1^β→χ2^β)

But note that (χ1^β→χ2^β) is just the same formula as (χ1→χ2)^β. So we've shown what we wanted to show.
Finally, take the third case. We must show that ⊢K 2χ1↔2(χ1^β). This follows from the inductive hypothesis ⊢K χ1↔χ1^β. For the inductive hypothesis implies ⊢K χ1→χ1^β, by PL; and then by NEC and K we have ⊢K 2χ1→2(χ1^β). A parallel argument establishes ⊢K 2(χ1^β)→2χ1; and then the desired conclusion follows by PL. That completes the inductive proof.

Substitution of equivalents is important because it lets us utilize proved equivalences in later proofs. For example, above, we proved the following two theorems:

2(P∧Q)→(2P∧2Q)
(2P∧2Q)→2(P∧Q)

From these we can infer by PL that 2(P∧Q)↔(2P∧2Q) is a theorem; let us call this 2∧. This means that, in effect, from now on, we can treat 2(P∧Q) and 2P∧2Q interchangeably. We'll allow ourselves to utilize substitution of equivalents within proofs; let's refer to the rule as "eq". Note that the principle of substitution of equivalents, plus the biconditionals of modal negation, MN, that were shown to be theorems, allows us to "swap" negation signs and modal operators at will. For example, we can show ∼223φ↔332∼φ as follows:

1. ∼223φ↔∼223φ            PL
2. 3∼23φ↔∼223φ            1, MN, eq
3. 33∼3φ↔∼223φ            2, MN, eq
4. 332∼φ↔∼223φ            3, MN, eq
Let’s allow ourselves, for brevity, to do several of these steps at once. One can, then, prove a number of theorems in K. But there’s a big limitation of K: you can’t prove 2P →3P . (We’ll show this later, when we study invalidity.) From this, it follows that you can’t prove that tautologies are possible or that contradictions aren’t necessary. So a new system is called for.
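The MN-based swapping of tildes past boxes and diamonds is a purely syntactic rewrite, so it can be sketched as a small string-rewriting function (my own illustration, not the book's; '□', '◇', '~' stand in for 2, 3, ∼):

```python
def push_negation(s):
    """Drive each '~' inward through a modal prefix by the MN
    equivalences ~□φ ⟷ ◇~φ and ~◇φ ⟷ □~φ, applied left to right
    until no '~' immediately precedes a modal operator."""
    swap = {'□': '◇', '◇': '□'}
    changed = True
    while changed:
        changed = False
        for i in range(len(s) - 1):
            if s[i] == '~' and s[i + 1] in swap:
                # one MN step: swap the operator and move the tilde inward
                s = s[:i] + swap[s[i + 1]] + '~' + s[i + 2:]
                changed = True
                break
    return s
```

Applied to '~□□◇p' it yields '◇◇□~p', mirroring the derivation of ∼223φ↔332∼φ above.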
6.3.2 System D
System D results from adding a new axiom to system K: System D
Rules: MP, NEC
Axioms: PL axioms
K-schema: 2(φ→ψ)→(2φ→2ψ)
D-schema: 2φ→3φ

Notice that since D includes all the K axioms, we retain all the K-theorems. And notice that this means that substitution of equivalents still holds. The addition of the new axioms just gives us more theorems. In fact, all of our systems will build on K in this way, by adding new axioms to K. An example of what we can do now is prove that tautologies are possible:

1. P∨∼P                      PL
2. 2(P∨∼P)                   1, NEC
3. 2(P∨∼P)→3(P∨∼P)           D
4. 3(P∨∼P)                   2, 3 MP
Let's do one more theorem, 22P→23P:

1. 2P→3P             D
2. 2(2P→3P)          1, NEC
3. 22P→23P           2, K

This system is also extremely weak. As we will see later, we can't prove 2φ→φ in D. Therefore, D doesn't seem to be a correct logic for metaphysical necessity, for surely, if something's true in all possible worlds, then it is true in the actual world, and so just plain old true. But there is some interest in D anyway: some have suggested that D is a correct logic for some sort of moral necessity: thus, 2φ might be read as "φ is morally necessary", "φ ought to be the case", or "φ ought to be performed"; similarly, 3φ might be read as "φ is morally permitted", or "φ is allowed". Thus, the D axiom corresponds to the principle that if something ought to be performed then it is permitted. That 2φ→φ cannot be proved in D would then be a virtue, for from the fact that something ought to be performed, it certainly doesn't follow that it is performed. But I won't go any further into the question of whether D in fact does give a correct logic for some concept of moral necessity. That's philosophy, not logic.
6.3.3 System T
This new system is the first system we have considered that has any plausibility of being a correct logic for metaphysical necessity. For we add the following
axiom schema: 2φ→φ. In fact, when we do this we can drop the D-schema, since its instances become theorems. (We'll show this below.) Thus, our T system is given by:

System T
Rules: MP, NEC
Axioms: PL axioms
K-schema: 2(φ→ψ)→(2φ→2ψ)
T-schema: 2φ→φ

The first theorem that's worth establishing is this: "T3": ⊢T φ→3φ:

1. 2∼φ→∼φ            T
2. φ→∼2∼φ            1, PL

Line 2 is just the definitional expansion of φ→3φ. And now notice that instances of the D-axioms are theorems: 2φ→φ is a T-axiom, and we just proved that φ→3φ is a theorem; from these two, by PL, we can prove 2φ→3φ.
6.3.4 System B
Our systems so far don't allow us to prove anything interesting about iterated modalities, i.e., sentences with consecutive boxes or diamonds. Which such sentences should be theorems? The B axiom schema decides some of these questions for us:

System B
Rules: MP, NEC
Axioms: PL axioms
K-schema: 2(φ→ψ)→(2φ→2ψ)
T-schema: 2φ→φ
B-schema: 32φ→φ

As in the case of T, let's derive an analog of the B axiom for the 3, B3: φ→23φ:
1. 32∼φ→∼φ           B
2. ∼23φ→∼φ            1, MN, eq (twice)
3. φ→23φ              2, PL

Let's try an example from the sheet of MPL theorems (#41): [2P∧232(P→Q)]→2Q:

1. 32(P→Q)→(P→Q)              B
2. 232(P→Q)→2(P→Q)            1, Nec, K, MP
3. 2(P→Q)→(2P→2Q)             K
4. 232(P→Q)→(2P→2Q)           2, 3 PL
5. [2P∧232(P→Q)]→2Q           4, PL

6.3.5 System S4
Here, we replace the B schema with a different schema:

System S4
Rules: MP, NEC
Axioms: PL axioms
K-schema: 2(φ→ψ)→(2φ→2ψ)
T-schema: 2φ→φ
S4-schema: 2φ→22φ

Again, we have a derived version of the S4 axiom for the diamond. Proof of "S43": 33φ→3φ:

1. 2∼φ→22∼φ              S4
2. ∼22∼φ→∼2∼φ            1, PL
3. ∼∼33φ→∼2∼φ            2, MN, eq (twice)
4. ∼∼33φ→3φ              3, dual, eq
5. 33φ→3φ                4, PL
Let’s do one fairly difficult example: (3P ∧2Q)→3(P ∧2Q) (half of #56 from the theorem sheet). How should we approach this problem? My thinking is as follows. We saw in the K section above that the following sort of thing may always be proved: 2φ→(3ψ→3χ ), whenever the conditional φ→(ψ→χ ) can be proved. So we need to try to work the problem into this form.
As-is, the problem doesn't quite have this form. But something very related does have this form, namely: 22Q→(3P→3(P∧2Q)) (since the conditional 2Q→(P→(P∧2Q)) is a tautology). This thought inspires the following proof:

1. 2Q→(P→(P∧2Q))                      PL
2. 22Q→2(P→(P∧2Q))                    1, Nec, K, MP
3. 2(P→(P∧2Q))→(3P→3(P∧2Q))           K3
4. 22Q→(3P→3(P∧2Q))                   2, 3 PL
5. 2Q→22Q                             S4
6. (3P∧2Q)→3(P∧2Q)                    4, 5 PL

6.3.6 System S5
Here, instead of the B or S4 schemas, we add the S5 schema to T:

System S5
Rules: MP, NEC
Axioms: PL axioms
K-schema: 2(φ→ψ)→(2φ→2ψ)
T-schema: 2φ→φ
S5-schema: 32φ→2φ

Theorem: "S53": 3φ→23φ:

1. 32∼φ→2∼φ              S5
2. ∼2∼φ→∼32∼φ            1, PL
3. 3φ→23φ                2, MN, dual, PL, eq

Note that the B and S4 axioms are now derivable as theorems. The B axiom, 32φ→φ, is trivial:

1. 32φ→2φ        S5
2. 2φ→φ          T
3. 32φ→φ         1, 2 PL

And now the S4 axiom, 2φ→22φ. This is a little harder. I used the B3 theorem, which we can now appeal to since the theoremhood of the B-schema has been established.
1. 2φ→232φ           B3
2. 32φ→2φ             S5
3. 2(32φ→2φ)          2, Nec
4. 232φ→22φ           3, K, MP
5. 2φ→22φ             4, 1, PL
In S5, an important result for strings of modal operators holds: whenever a formula has a string of modal operators in front, it is always equivalent to the formula with only the innermost operator. For example, 223232232323φ is equivalent to 3φ (that is, each is provable from the other). This follows from the fact that the following equivalences are all theorems of S5:

a) 32φ↔2φ
b) 22φ↔2φ
c) 23φ↔3φ
d) 33φ↔3φ

The left-to-right direction of a) is just S5; the right-to-left is T3; b) is T and S4; c) is T and S53; and d) is S43 and T3. Thus, by repeated applications of these equivalences, using substitution of equivalents, we can reduce strings of modal operators to the innermost operator. This fact, in conjunction with the MN and dual equivalences above, allows us to perform reductions on any string of modal operators and tildes (∼).
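The S5 collapse of modal prefixes, combined with MN, can be sketched as a normalizer for strings of boxes, diamonds, and tildes (again my own illustration, not the book's; '□', '◇', '~' stand in for 2, 3, ∼):

```python
def s5_normalize(prefix):
    """Normal form, in S5, of a prefix string over '□', '◇', '~':
    push negations inward by MN (swapping □ and ◇), cancel double
    negations, then collapse the operator string to its innermost
    operator. Returns at most one operator, possibly followed by '~'."""
    swap = {'□': '◇', '◇': '□'}
    ops = []
    neg = False  # parity of tildes waiting to be pushed inward
    for c in prefix:
        if c == '~':
            neg = not neg          # ~~ cancels (PL)
        else:
            ops.append(swap[c] if neg else c)  # MN swap under a tilde
    reduced = ops[-1] if ops else ''           # S5: keep innermost only
    return reduced + ('~' if neg else '')
```

For example, the prefix of ∼223φ normalizes to 2∼ (written '□~'): in S5, 223φ collapses to 3φ, and ∼3φ is 2∼φ by MN.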
6.4 Semantics for MPL
So far, we have ways of establishing that a given wff is a theorem, for each of the various systems. But we don't have a method for showing that a given wff isn't a theorem. Truth tables did this job for us in non-modal propositional logic. To show that φ is not a theorem of PL, we produced an assignment to the sentence letters relative to which φ is false. The existence of such an assignment means that φ is not valid. But by the soundness proof, every theorem is valid. So φ is not a theorem. We will follow the same procedure for MPL: for each system, we will define a notion of validity, and provide a soundness proof. Soundness, recall, means that every theorem is valid. Thus, to establish non-theoremhood, it will suffice to establish invalidity.
Defining validity is important for a couple other reasons. First, it gives us an alternate way of characterizing logical consequence. Whereas theoremhood is a proof-theoretic method of characterizing consequence, validity is a semantic method. Secondly, giving a semantics for a logical system sheds light on meaning. Learning about the truth table for ∧ tells us something about its meaning: the function of ∧ is to produce compound sentences φ∧ψ, which are true iff both conjuncts φ and ψ are true. Likewise, it is plausible to think that constructing a semantics for modal logic will tell us something about what 2 and 3 mean. In constructing a semantics for MPL, we face the following challenge: the modal operators 2 and 3 are not truth-functional. A (sentential) connective is an expression that combines with sentences to make new sentences. A one-place connective combines with one sentence to form a new sentence. 'It is not the case that' is a one-place connective of English — the ∼ is a one-place connective in the language of PL. A connective is truth-functional iff whenever it combines with sentences to form a new sentence, the truth value of the resulting sentence is determined by the truth values of the component sentences. Many think that 'and' is truth-functional, since they think that an English sentence of the form "φ and ψ" is true iff φ and ψ are both true. But 'necessarily' is not truth-functional. Suppose I tell you the truth value of φ; will you be able to tell me the truth value of "Necessarily φ"? Well, if φ is false then presumably you can (it is false), but if φ is true, then you still don't know. If φ is "Ted is a philosopher" then "Necessarily φ" is false, but if φ is "Either Ted is a philosopher or he isn't a philosopher" then "Necessarily φ" is true. So the truth value of "Necessarily φ" isn't determined by the truth value of φ.
Similarly, 'possibly' isn't truth-functional either: 'I might have been six feet tall' is true, whereas 'I might have been a round square' is false, despite the fact that 'I am six feet tall' and 'I am a round square' have the same truth value (they're both false). Since the 2 and the 3 are supposed to represent 'necessarily' and 'possibly', respectively, and since the latter aren't truth-functional, we can't use the method of truth tables to construct the semantics for the 2 and the 3. For the method of truth tables assumes truth-functionality. Truth tables are just pictures of truth functions: they specify what truth value a complex sentence has as a function of what truth values its parts have. Imagine trying to construct a truth table for the 2. It's presumably clear (though see the discussion of systems K, D, and T below) that 2φ should be false if φ is false, but what about when φ is true?
φ | 2φ
1 | ?
0 | 0

There's nothing we can put in this slot in the truth table, since when φ is true, sometimes 2φ is true and sometimes it is false. Our challenge is clear: we need to develop a semantics for the 2 and the 3 which isn't just the method of truth tables.
6.4.1 Relations
Before we investigate how to overcome this challenge, a digression is necessary. Recall our discussion of ordered 'tuples from section 4.2. In addition to their use in constructing models, ordered 'tuples are also useful for constructing relations. We take a binary (2-place) relation to be a set of ordered pairs. For example, the taller-than relation may be taken to be the set of ordered pairs 〈u, v〉 such that u is taller than v. The less-than relation for positive integers is the set of ordered pairs 〈m, n〉 such that m is a positive integer less than n, another positive integer. That is, it is the following set: {〈1, 2〉, 〈1, 3〉, 〈1, 4〉, . . . 〈2, 3〉, 〈2, 4〉, . . . }. When 〈u, v〉 is a member of relation R, we say that u and v stand in R, or that u bears R to v. Most simply, we write "Ruv". (This notation is like that of predicate logic; but here I'm speaking the metalanguage, not displaying sentences of a formalized language.) Some definitions. Let R be any binary relation.

• The domain of R ("dom(R)") is the set {u: for some v, Ruv}
• The range of R ("ran(R)") is the set {u: for some v, Rvu}
• R is over A iff dom(R) ⊆ A and ran(R) ⊆ A

In other words, the domain of R is the set of all things that bear R to something; the range is the set of all things that something bears R to; and R is over A iff the members of the 'tuples in R are all drawn from A. Let R be any binary relation over A. Then we define the following notions with respect to A:

• R is serial (in A) iff for every u ∈ A, there is some v ∈ A such that Ruv
• R is reflexive (in A) iff for every u ∈ A, Ruu
• R is symmetric iff for all u, v, if Ruv then Rvu
• R is transitive iff for any u, v, w, if Ruv and Rvw then Ruw
• R is an equivalence relation (in A) iff R is symmetric, transitive, and reflexive (in A)
• R is total (in A) iff for every u, v ∈ A, Ruv

Notice that we relativize some of these notions to a given set A. We define the notion of reflexivity relative to a set, for example. We do this because the alternative would be to say that a relation is reflexive simpliciter if everything bears R to itself; but that would require the domain and range of any reflexive relation to be the set of absolutely all objects. It's better to introduce the notion of being reflexive relative to a set, which is applicable to relations with smaller domains and ranges. (I will sometimes omit the qualifier 'in A' when it is clear which set that is.) Why don't symmetry and transitivity have to be relativized to a set? Because they only say what must happen if R holds among certain things. Symmetry, for example, says merely that if R holds between u and v, then it must also hold between v and u, and so we can say that a relation is symmetric absolutely, without implying that everything is in its domain and range.
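Representing a binary relation as a finite set of ordered pairs, the properties just defined can be checked directly. A sketch (the function names are mine, not the text's):

```python
def is_serial(R, A):
    """Every member of A bears R to some member of A."""
    return all(any((u, v) in R for v in A) for u in A)

def is_reflexive(R, A):
    """Every member of A bears R to itself."""
    return all((u, u) in R for u in A)

def is_symmetric(R):
    """Whenever Ruv, also Rvu — no relativization to A needed."""
    return all((v, u) in R for (u, v) in R)

def is_transitive(R):
    """Whenever Ruv and Rvw, also Ruw."""
    return all((u, w) in R
               for (u, v) in R for (x, w) in R if v == x)

def is_equivalence(R, A):
    """Symmetric, transitive, and reflexive in A."""
    return is_reflexive(R, A) and is_symmetric(R) and is_transitive(R)
```

As the text notes, only seriality and reflexivity need the set A as a parameter; symmetry and transitivity are conditional in form and can be checked from the pairs alone.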
6.4.2
Kripke models for MPL
Now we’re ready to introduce a semantics for MPL. As we saw, we can’t construct truth tables for the 2 or the 3. Instead, we will pursue an approach called possible-worlds semantics. The intuitive idea is to count 2φ as being true iff φ is true in all possible worlds, and 3φ as being true iff φ is true in some possible worlds. More carefully: we are going to develop models for modal propositional logic. These models will contain objects we will call “possible worlds”. And formulas are going to be true or false “at” these worlds — that is, we are going to assign truth values to formulas in these models relative to possible worlds, rather than absolutely. Truth values of propositional-logic compound formulas — that is, negations and conditionals — will be determined by truth tables within each world; ∼φ, for example, will be true at a world iff φ is false at that world. But the truth value of 2φ at a world won’t be determined by the truth value of φ at that world; the truth value of φ at other worlds will also be relevant.
Specifically, 2φ will count as true at a world iff φ is true at every world that is "accessible" from the first world. What does "accessible" mean? Each model will come equipped with a binary relation, R, that holds between possible worlds; we will say that world v is "accessible from" world w when Rwv. The intuitive idea is that Rwv if and only if v is possible relative to w. That is, if you live in world w, then from your perspective, the events in world v are possible. The idea that what is possible might vary depending on what possible world you live in might at first seem strange, but it isn't really. "It is physically impossible to travel faster than the speed of light" is true in the actual world, but false in worlds where the laws of nature allow faster-than-light travel.
On to the official definitions based on these intuitive ideas. We need some general definitions, which we will re-use in defining validity for the various systems. A frame is an ordered pair 〈W, R〉, such that:

i) W is a non-empty set of objects, the "worlds"
ii) R is a binary relation over W, the "accessibility relation"

An MPL-model ("based on frame 〈W, R〉") is an ordered triple, 〈W, R, A〉, such that:

i) 〈W, R〉 is a frame
ii) A is a two-place function (a "modal assignment function") that assigns a 0 or 1 to each sentence letter, relative to ("at") each world — that is, for any sentence letter α, and any w ∈ W, A(α, w) is either 0 or 1.

Where M (= 〈W, R, A〉) is any MPL-model, the valuation for M, VM, is defined as the two-place function that assigns either 0 or 1 to each wff relative to each member of W, subject to the following constraints, where α is any sentence letter, φ and ψ are any wffs, and w is any member of W:

a) VM(α, w) = A(α, w)
b) VM(∼φ, w) = 1 iff VM(φ, w) = 0
c) VM(φ→ψ, w) = 1 iff either VM(φ, w) = 0 or VM(ψ, w) = 1
d) VM(2φ, w) = 1 iff for each v ∈ W, if Rwv, then VM(φ, v) = 1
Note how each model has two halves, the frame and the assignment. Each frame is a map of the "structure" of the space of possible worlds: it contains information about how many worlds there are, and which worlds are accessible from which. The assignment then adds information about which sentence letters are true at which worlds. And given any model, the valuation function goes further and extends the notion of truth at worlds to complex wffs. What about the truth values for complex formulas that contain ∧, ∨, ↔, and 3? We have as derived conditions the following:

i) VM(φ∧ψ, w) = 1 iff VM(φ, w) = 1 and VM(ψ, w) = 1
ii) VM(φ∨ψ, w) = 1 iff VM(φ, w) = 1 or VM(ψ, w) = 1
iii) VM(φ↔ψ, w) = 1 iff VM(φ, w) = VM(ψ, w)
iv) VM(3φ, w) = 1 iff for some v ∈ W, Rwv and VM(φ, v) = 1

It's important to get clear on the status of possible-worlds lingo here. Where 〈W, R, A〉 is a model, we call the members of W "worlds", and we call R the "accessibility" relation. But there's no heavy-duty or suspect metaphysics going on. We give R and the members of W these names simply because this is a vivid way to think. Officially, W is nothing but a nonempty set, any old nonempty set. Its members needn't be the kinds of things metaphysicians call possible worlds: they can be numbers, people, bananas — whatever you like. Similarly, R is just defined to be any old binary relation on W.
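The clauses above amount to a recursive definition of the valuation, and they can be rendered directly as a short program. Here is an illustrative sketch (mine, not the text's); 'box' and 'dia' stand in for the 2 and the 3, and wffs are represented as nested tuples:

```python
# A sketch of the valuation V_M for an MPL-model M = (W, R, A).
# Wffs are nested tuples, e.g. ('box', ('->', 'P', 'Q')).

def V(M, wff, w):
    W, R, A = M
    if isinstance(wff, str):                 # clause a): sentence letters
        return A[(wff, w)]
    op = wff[0]
    if op == '~':                            # clause b)
        return 1 if V(M, wff[1], w) == 0 else 0
    if op == '->':                           # clause c)
        return 1 if V(M, wff[1], w) == 0 or V(M, wff[2], w) == 1 else 0
    if op == 'box':                          # clause d): all accessible worlds
        return 1 if all(V(M, wff[1], v) == 1 for v in W if (w, v) in R) else 0
    if op == 'dia':                          # derived condition iv)
        return 1 if any(V(M, wff[1], v) == 1 for v in W if (w, v) in R) else 0
    raise ValueError(op)

# A toy two-world model: P is true at w1 only, and w0 sees both worlds.
W = {'w0', 'w1'}
R = {('w0', 'w0'), ('w0', 'w1')}
A = {('P', 'w0'): 0, ('P', 'w1'): 1}
M = (W, R, A)

print(V(M, ('dia', 'P'), 'w0'))   # 1: P holds at the accessible world w1
print(V(M, ('box', 'P'), 'w0'))   # 0: P fails at the accessible world w0
```

The derived conditions for ∧, ∨, and ↔ could be added as further clauses in exactly the same style.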
6.4.3
Validity in MPL
More definitions. A formula φ is valid on a frame 〈W, R〉 iff for every MPL-model M based on this frame, and every w ∈ W, VM(φ, w) = 1.
We can now give our definitions of validity for our various systems:

• φ is K-valid iff φ is valid on all frames
• φ is D-valid iff φ is valid on all serial frames (i.e., all frames 〈W, R〉 where R is serial in W)
• φ is T-valid iff φ is valid on all reflexive frames
• φ is B-valid iff φ is valid on all reflexive and symmetric frames (i.e., on all frames 〈W, R〉 where R is reflexive in W and symmetric)
• φ is S4-valid iff φ is valid on all reflexive and transitive frames
• φ is S5-valid iff φ is valid on all equivalence frames (i.e., on all frames 〈W, R〉 where R is an equivalence relation on W)

As with the ⊢ notation, we subscript ⊨ with the name of a system to indicate validity in that system; thus, "⊨T φ" means that φ is T-valid. Of course, we would only define validity for these systems in this way if we could prove soundness and completeness relative to the definitions. We'll do this below; but first, we'll spend a bit of time working with these definitions.
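Validity proper quantifies over all frames of the relevant kind, so it can't be checked by brute force; but validity on a single finite frame can be, since only the finitely many sentence letters occurring in a formula can affect its truth value. The following sketch (mine, not the text's) illustrates the definition by enumerating all assignments to a formula's letters over one frame:

```python
from itertools import product

# Check whether a wff is valid on one finite frame by trying every
# assignment of 0/1 to its sentence letters at each world.

def V(A, R, W, wff, w):
    if isinstance(wff, str):
        return A[(wff, w)]
    op = wff[0]
    if op == '~':
        return 1 - V(A, R, W, wff[1], w)
    if op == '->':
        return max(1 - V(A, R, W, wff[1], w), V(A, R, W, wff[2], w))
    if op == 'box':   # vacuously true when w sees nothing
        return min([V(A, R, W, wff[1], v) for v in W if (w, v) in R] or [1])
    if op == 'dia':   # vacuously false when w sees nothing
        return max([V(A, R, W, wff[1], v) for v in W if (w, v) in R] or [0])

def letters(wff):
    if isinstance(wff, str):
        return {wff}
    return set().union(*(letters(p) for p in wff[1:]))

def valid_on_frame(W, R, wff):
    cells = [(p, w) for p in sorted(letters(wff)) for w in sorted(W)]
    for values in product((0, 1), repeat=len(cells)):
        A = dict(zip(cells, values))
        if any(V(A, R, W, wff, w) == 0 for w in W):
            return False
    return True

# On a reflexive frame, 2P -> P (the T-axiom) is valid...
W, R = {'a', 'b'}, {('a', 'a'), ('b', 'b'), ('a', 'b')}
print(valid_on_frame(W, R, ('->', ('box', 'P'), 'P')))             # True
# ...but on a frame where a world fails to see itself, it is not.
print(valid_on_frame(W, {('a', 'b')}, ('->', ('box', 'P'), 'P')))  # False
```

This directly mirrors the definition: valid on a frame means true at every world under every assignment based on that frame.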
6.4.4
Semantic validity proofs
Given our definition of validity, one can now prove that a certain formula is valid in a given system. First, a very simple example. Let us prove that the formula 2(P∨∼P) is K-valid. That is, we must prove that this formula is valid on every frame, since that is the definition of K-validity. Being valid on a frame means being true at every world in every model based on that frame. So, consider any frame 〈W, R〉, consider any model, M, based on that frame, and let w be any world in W. We must prove that VM(2(P∨∼P), w) = 1. (As before, I'll start to omit the subscript M on VM when it's unambiguous which model we're talking about.)

i) Suppose for reductio that V(2(P∨∼P), w) = 0
ii) So, by the truth condition for the 2, there is some world, v, such that Rwv and V(P∨∼P, v) = 0
iii) But that's impossible; since the truth conditions for the ∨ and the ∼ are just the usual ones, P∨∼P is true in any world whatsoever

Another example: let's show that ⊨T (32(P→Q)∧2P)→3Q. We must show that V((32(P→Q)∧2P)→3Q, w) = 1, where V is the valuation for an arbitrarily chosen model based on a reflexive frame, and w is any world in that model.

i) Assume for reductio that V((32(P→Q)∧2P)→3Q, w) = 0
ii) So V(32(P→Q)∧2P, w) = 1 and …
iii) …V(3Q, w) = 0
iv) From ii), 32(P→Q) is true at w, and so V(2(P→Q), v) = 1, for some world, call it v, such that Rwv
v) From ii), V(2P, w) = 1. So, by the truth condition for the 2, P is true in every world accessible from w; since Rwv, it follows that V(P, v) = 1.
vi) From iv), P→Q is true in every world accessible from v; since our frame is reflexive, Rvv, and so V(P→Q, v) = 1
vii) From v) and vi), by the truth condition for the →, V(Q, v) = 1
viii) Given iii), Q is false at every world accessible from w; since Rwv, this contradicts vii)

OK, we've shown that the formula (32(P→Q)∧2P)→3Q is valid in T. Suppose we were interested in showing that this formula is also valid in S4. What more would we have to do? Nothing! To be S4-valid is to be valid on every reflexive-and-transitive frame; since every reflexive-and-transitive frame is reflexive, we know automatically that the formula is S4-valid, without doing a separate proof. Think of it another way. To do a proof that the formula is S4-valid, we need to do a proof in which we are allowed to assume that the accessibility relation is both transitive and reflexive. And the proof above did just that. We didn't ever use the fact that the accessibility relation is transitive — we only used the fact that it is reflexive (in step vi). But we don't need to use everything we're allowed to assume. In contrast, the proof above doesn't establish that this formula is, say, K-valid. To be K-valid, the formula would need to be valid on all K-frames. But K-frames needn't have reflexive accessibility relations, whereas the proof we gave assumed that the accessibility relation was reflexive. And the formula isn't in fact K-valid, as we'll show how to demonstrate in the next section.
Consider the following diagram of systems:

          S5
         ↗  ↖
        B    S4
         ↖  ↗
          T
          ↑
          D
          ↑
          K

An arrow from one system to another indicates that validity in the first system implies validity in the second system. For example, if a formula is D-valid, then it's also T-valid. The reason is that if something is valid on all D-frames, then, since every T-frame is also a D-frame (reflexivity implies seriality), it must be valid on all T-frames as well. S5 is the strongest system, since it has the most valid formulas. (That's because it has the fewest frames: it's easier to be S5-valid because there are fewer potentially falsifying frames.)
Notice that the diagram isn't linear. That's because of the following. Both B and S4 are stronger than T; each contains all the T-valid formulas. But neither B nor S4 is stronger than the other; each contains valid formulas that the other doesn't. (They of course overlap, because each contains all the T-valid formulas.) S5 is stronger than each; S5 contains all the valid formulas of each. These relationships between the systems will be exhibited below.
Suppose you are given a formula, and for each system in which it is valid, you want to give a semantic proof of its validity. This needn't require multiple semantic proofs; as we have seen, one semantic proof can do the job. To prove that a certain formula is valid in a number of systems, it suffices to prove that it is valid in the weakest of those systems. Then that very proof will automatically be a proof that it is valid in all stronger systems. For example, a proof that a formula is K-valid would itself be a proof that the formula is D-, T-, B-, S4-, and S5-valid. Why? Because every frame of any kind is a K-frame, so K-valid formulas are always valid in all other systems. In general, then, to show what systems a formula is valid in, it suffices to give a single semantic proof of it: in the weakest system in which it is valid.
There is an exception, however, since neither B nor S4 is stronger than the other. Suppose a formula is not valid in T, but one has given a semantic proof of its validity in B. This proof also establishes that the formula is valid in S5, since every S5-model is a B-model. But one still doesn't yet know whether the formula is S4-valid, since not every S4-model is a B-model. Another semantic proof may be needed: of the formula's S4-validity. (Of course, the formula may not be S4-valid.) So: when a wff is valid in both B and S4, but not in T, two semantic proofs of its validity are needed.
I'll present some more sample validity proofs below, but it's often easier to do proofs of validity when one has failed to construct a countermodel for a formula. So let's look first at countermodeling.
6.4.5
Countermodels in MPL
We have a definition of validity for the various systems, and we've shown how to establish validity of particular formulas. Now we'll investigate establishing invalidity. Let's show that the formula 3P→2P is not K-valid. A formula is K-valid iff it is valid on all frames, so all we must do is find one frame on which it isn't valid. What follows is a procedure for doing this. (The procedure is from Cresswell and Hughes (1996b).)

Place the formula in a box
The goal is to find some frame, some valuation for that frame, and some world in the frame, where the formula is false. Let's start by drawing a box, which represents some chosen world in the frame we'll construct. The goal is to make the formula false in this world. In these examples I'll always call this first world "r":

  r: [ 3P → 2P ]

Now, since the box represents a world, we should have some way of representing the accessibility relation: what worlds does world r "see"? Well, to represent one world (box) seeing another, we'll draw an arrow from the first to the second. However, we don't need to make this world r see anything. After all, we're trying to construct a K-model, and the accessibility relation of a K-model doesn't even need to be serial; nothing need see anything. So we'll forget about arrows for the time being.

Make the formula false in the world
We will indicate a formula's truth value (1 or 0) by writing it above the formula's major connective. So to indicate that 3P→2P is to be false in this world, we'll put a 0 above its arrow:

          0
  r: [ 3P → 2P ]
Enter in forced truth values
If we want to make 3P→2P false in this world, the definition of a valuation function requires us to assign certain other truth values. Whenever a conditional is false at a world, its antecedent is true at that world and its consequent is false at that world. So we've got to enter in more truth values: a 1 over the major connective of the antecedent (3P), and a 0 over the major connective of the consequent (2P):

       1  0 0
  r: [ 3P → 2P ]
Enter asterisks
When we assign a truth value to a modal formula, we thereby commit ourselves to assigning certain other truth values to various formulas at various worlds. For example, when we make 3P true at r, we commit ourselves to making P true at some world that r sees. To remind ourselves of this commitment, we'll put an asterisk (*) below 3P. An asterisk below indicates a commitment to there being some world of a certain sort. Similarly, since 2P is false at r, P must be false in some world that r sees (if it were true in all such worlds, then by the semantic clause for the 2, 2P would be true at r). We again have a commitment to there being some world of a certain sort, so we enter an asterisk below 2P as well:

       1  0 0
  r: [ 3P → 2P ]
       *    *
Discharge bottom asterisks
The next step is to fulfill the commitments we incurred by adding the bottom asterisks. For each, we need to add a world to the diagram. The first asterisk requires us to add a world in which P is true; the second requires us to add a world in which P is false. We do this as follows:

       1  0 0
  r: [ 3P → 2P ]
       *    *
      /      \
     v        v
       1          0
  a: [ P ]   b: [ P ]
What I’ve done is added two more worlds to the diagram: a and b. P is true in a, but false in b. I have thereby satisfied my obligations to the asterisks on my diagram, for r does indeed see a world in which P is true, and another in which P is false. The official model We now have a diagram of a K-model containing a world in which 3P →2P is false. But we need to produce an official model, according to the official definition of a model. A model is an ordered triple 〈W, R,I〉, so we must specify the model’s three members. The set of worlds We first must specify the set of worlds, W. W is simply the set of worlds I invoked: W = {r, a, b} But what are r, a, and b? Let’s just take them to be the letters ‘r’, ‘a’, and ‘b’. No reason not to — the members of W, recall, can be any things whatsoever.
The accessibility relation
Next, for the accessibility relation. Here, we need to write down what the arrows in the diagram represent. What do they represent? They represent that r "sees" a, and that r sees b. There are no other arrows in the diagram, so no other world sees anything else. Now, remember that the accessibility relation, like all relations, is a set of ordered pairs. So we simply write out this set:

  R = {〈r, a〉, 〈r, b〉}

That is, we write out the set of all ordered pairs 〈w1, w2〉 such that w1 "sees" w2.

The assignment function
Finally, we need to specify the modal assignment function, A, which assigns truth values to sentence letters at worlds. In our model, A must assign 1 to P at world a, and 0 to P at world b. Now, our official definition requires an assignment function to assign a truth value to each of the infinitely many sentence letters at each world; but so long as P is true at a and false at b, it doesn't matter what other truth values A assigns. So let's just (arbitrarily) choose to make all other sentence letters false at all worlds in the model. We have, then:

  A(P, a) = 1
  A(P, b) = 0
  for all other sentence letters α and worlds w, A(α, w) = 0

So, that's it — we've produced the model.

Check the model
At the end of this process, it's a good idea to check to make sure that your model is correct. This involves various things. First, make sure that you've succeeded in producing the correct kind of model. For example, if you're trying to produce a T-model, make sure that the accessibility relation you've written down is reflexive (we were only doing a K-model in this case, remember). Second, make sure that the formula in question really does come out false at one of the worlds in your model.
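Such checks can also be mechanized. As an illustration (this sketch is mine, not part of the text's procedure), we can compute the valuation in the official model just given and confirm that 3P→2P really is false at r:

```python
# Check the official K-model above: W = {r, a, b}, R = {<r,a>, <r,b>},
# P true at a and false everywhere else.  'dia'/'box' stand in for 3 and 2.

W = {'r', 'a', 'b'}
R = {('r', 'a'), ('r', 'b')}
A = {('P', w): 1 if w == 'a' else 0 for w in W}

def V(wff, w):
    if isinstance(wff, str):
        return A[(wff, w)]
    op = wff[0]
    if op == '->':
        return max(1 - V(wff[1], w), V(wff[2], w))
    if op == 'box':   # vacuously true at worlds that see nothing
        return min([V(wff[1], v) for v in W if (w, v) in R] or [1])
    if op == 'dia':   # vacuously false at worlds that see nothing
        return max([V(wff[1], v) for v in W if (w, v) in R] or [0])

formula = ('->', ('dia', 'P'), ('box', 'P'))
print(V(('dia', 'P'), 'r'))   # 1: r sees a, where P is true
print(V(('box', 'P'), 'r'))   # 0: r sees b, where P is false
print(V(formula, 'r'))        # 0: the formula is false at r
```

Since the formula is false at a world of this model, it is not K-valid.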
Simplifying models
Sometimes a model can be simplified. Consider the diagram of the final version of the model above:

       1  0 0
  r: [ 3P → 2P ]
       *    *
      /      \
     v        v
       1          0
  a: [ P ]   b: [ P ]
We needn’t have used three worlds in the model. When we discharged the first asterisk, we needed to put in a world that r sees, in which P is true. But we needn’t have made that a new world — we could have simply have made P true in r. Of course we couldn’t haven’t done that for both asterisks, because that would have made P both true and false at r. So, we could make one simplification: 1 1 0 0
r
0
3P →2P ∗ ∗
b
0
P
The official model would then look as follows:

  W = {r, b}
  R = {〈r, r〉, 〈r, b〉}
  A(P, r) = 1; otherwise, everything false

Adapting models to different systems
We have shown that 3P→2P is not K-valid. Now let's show that this formula isn't D-valid, i.e., that it is false in some world of some model with a serial accessibility relation (i.e., some "D-model"). Well, we haven't quite done this, since the model above does not have a serial accessibility relation. But we can easily change this, as follows:
       11 0 0
  r: [ 3P → 2P ]   (r sees itself)
       *    *
            \
             v
             0
       b: [ P ]   (b sees itself)
Official model:

  W = {r, b}
  R = {〈r, r〉, 〈r, b〉, 〈b, b〉}
  A(P, r) = 1; otherwise, everything false

That was easy — adding the fact that b sees itself didn't require changing anything else in the model. Suppose we want now to show that 3P→2P isn't T-valid. Well, we've already done so! Why? Because we've already produced a T-model in which this formula is false. Look back at the most recent model. Its accessibility relation is reflexive. So it's a T-model already. In fact, that accessibility relation is also already transitive, so it's already an S4-model. So we know that the formula isn't S4-valid. It's easy to revise the model to make the accessibility relation symmetric:

       11 0 0
  r: [ 3P → 2P ]   (r sees itself)
       *    *
            ↑
            ↓
             0
       b: [ P ]   (b sees itself)
Official model:

  W = {r, b}
  R = {〈r, r〉, 〈r, b〉, 〈b, b〉, 〈b, r〉}
  A(P, r) = 1; otherwise, everything false
Now, we’ve got a B-model, too. What’s more, we’ve also got an S5-model: notice that the accessibility relation is an equivalence relation. (In fact, it’s also a total relation.) So, we’ve succeeded in establishing that 3P →2P is not valid in any of our systems. However, we could have done the same thing much more quickly, if we had given this final model in the first place. After all, this model is an S5, S4, B, T, D, and K-model. So one model establishes that the formula isn’t valid in any of the systems. In general, in order to establish that a formula is invalid in a number of systems, try to produce a model for the strongest system (i.e., the system with the most requirements on models). If you do, then you’ll automatically have a model for the weaker systems. Keep in mind the diagram of systems: S5 `@ @@ |= @@ || | | @ | S4 aB >B BB ~~ ~ BB B ~~~ TO DO K An arrow from one system to another, recall, indicates that validity in the first system implies validity in the second. The arrows also indicate facts about invalidity, but in reverse: when an arrow points from one system to another, then invalidity in the second system implies invalidity in the first. For example, if a wff is invalid in T, then it is invalid in D. (That’s because every T-frame is a D-frame, and hence every T-countermodel is a D-countermodel.) When our task is to discover which systems a given formula is invalid in, usually only one countermodel will be needed — a countermodel in the strongest system in which the formula is invalid. But there is an exception involving B and S4. Suppose a given formula is valid in S5, but we discover a model showing that it isn’t valid in B. That model is automatically a T, D, and K-model, so we know that the formula isn’t T, D, or K-valid. But we don’t yet know about that formula’s S4-validity. If it is S4-invalid, then we will need to produce a second countermodel, an S4 countermodel. (Notice that
CHAPTER 6. PROPOSITIONAL MODAL LOGIC
129
the B-model couldn’t already be an S4-model. If it were, then its accessibility relation would be reflexive, symmetric, and transitive, and so it would be an S5 model, contradicting the fact that the formula was S5-valid.) Additional steps in countermodelling I gave a list of steps in constructing countermodels: 1. Place the formula in a box 2. Make the formula false in the world 3. Enter in forced truth values 4. Enter asterisks 5. Discharge bottom asterisks 6. The official model We’ll need to adapt this list. Above asterisks Let’s try to get a countermodel for 32P →23P in all the systems in which it is invalid, and a semantic validity proof in all the systems in which it is valid. We always start with countermodelling before doing semantic validity proofs, and when doing countermodelling, we start by trying for a K-model. After the first few steps, we have: 1
        1   0 0
  r: [ 32P → 23P ]
        *     *
       /       \
      v         v
        1            0
  a: [ 2P ]    b: [ 3P ]
At this point, we’ve got a true 2, and a false 3. Take the first: a true 2P . This doesn’t commit us to adding a world in which P is true; rather, it commits us to making P true in every world that a sees. Similarly, a zero over a 3, over 3P in world b in this case, commits us to making P false in every world that b sees. We indicate such commitments, commitments in every world seen, by putting asterisks above the relevant modal operators:
        1   0 0
  r: [ 32P → 23P ]
        *     *
       /       \
      v         v
        *            *
        1            0
  a: [ 2P ]    b: [ 3P ]
Now, how can we discharge these asterisks? In this case, when trying to construct a K-model, we don't need to do anything. Since a, for example, doesn't see any world, automatically P is true in every world it sees; the statement "for every world w, if Raw then V(P, w) = 1" is vacuously true. The same goes for b — P is automatically false in all worlds it sees. So we've got a K-model in which 32P→23P is false. But now suppose we try to turn the model into a D-model. Every world must now see at least one world. Let's try:
        1   0 0
  r: [ 32P → 23P ]
        *     *
       /       \
      v         v
        *            *
        1            0
  a: [ 2P ]    b: [ 3P ]
      |            |
      v            v
        1            0
  c: [ P ]     d: [ P ]
  (c and d each see themselves)
I added worlds c and d, so that a and b would each see at least one world. (Further, worlds c and d each had to see a world, to keep the relation serial. I could have added still more worlds that c and d saw, but then they would themselves need to see some worlds…So I just let c and d see themselves.) But once c and d were added, discharging the upper asterisks in worlds a and b required making P true in c and false in d (since a sees c and b sees d). Let’s now try for a T-model. This will involve, among other things, letting
a and b see themselves. But this gets rid of the need for worlds c and d, since they were added just to make the relation serial. I'll try:

        1   0 0
  r: [ 32P → 23P ]   (r sees itself)
        *     *
       /       \
      v         v
        *            *
        1 1          0 0
  a: [ 2P ]    b: [ 3P ]
  (a and b each see themselves)
When I added arrows, I needed to make sure that I correctly discharged the asterisks. This required nothing of world r, since there were no top asterisks there. There were top asterisks in worlds a and b; but it turned out to be easy to discharge these asterisks — I just needed to let P be true in a, but false in b. Notice that I could have moved straight to this T-model — which is itself a D-model — rather than first going through the earlier mere-D-model. However, this won’t always be possible — sometimes you’ll be able to get a D-model, but no T-model. At this point let’s verify that our model does indeed assign the value 0 to our formula 32P →23P . First notice that 2P is true in a (since a only sees one world — itself — and P is true there). But r sees a. So 32P is true at r. Now, consider b. b only sees one world, itself, and P is false there. So 3P must also be false there. But r sees b. So 23P is false at r. But now, the antecedent of 32P →23P is true, while its consequent is false, at r. So that conditional is false at r. Which is what we wanted. Onward. Our model is not a B-model, since a, for example, doesn’t see r, despite the fact that r sees a. So let’s try to make this into a B-model. This involves making the relation symmetric. Here’s how it looks before I try to
discharge the top asterisks in a and b:

        1   0 0
  r: [ 32P → 23P ]   (r sees itself)
        *     *
       ↑         ↑
       ↓         ↓
        *            *
        1 1          0 0
  a: [ 2P ]    b: [ 3P ]
  (a and b each see themselves, and each now sees r back)
Now I need to make sure that all top asterisks are discharged. For example, since a now sees r, I'll need to make sure that P is true at r. However, since b sees r too, P needs to be false at r. But P can't be both true and false at r. So we're stuck in trying to get a B-model in which this formula is false. This suggests that maybe it is impossible — that is, perhaps this formula is true in all worlds in all B-models — that is, perhaps the formula is B-valid. (In fact, it is; one would suspect this because the formula is the result, roughly, of applying the B-axiom, and then afterwards the theorem B3. But since we haven't proved soundness, we haven't yet proved any connection between theoremhood and validity.) So the thing to do is try to prove this, by supplying a semantic validity proof. So, let 〈W, R〉 be any reflexive-and-symmetric frame, let V be the valuation for any model based on it, and let w be any member of W; we must show that V(32P→23P, w) = 1.

i) Suppose for reductio that V(32P→23P, w) = 0
ii) Then V(32P, w) = 1 and …
iii) …V(23P, w) = 0
iv) By ii), for some v, Rwv and V(2P, v) = 1.
v) By symmetry, Rvw.
vi) From iv), via the truth condition for 2, we know that P is true at every world accessible from v; and so, by v), V(P, w) = 1.
vii) By iii), there is some world, call it u, such that Rwu and V(3P, u) = 0.
viii) By symmetry, w is accessible from u.
ix) By vii), P is false in every world accessible from u; and so by viii), V(P, w) = 0, contradicting vi)

Just as we suspected: the formula is indeed B-valid. So we know that it is S5-valid (the proof we just gave was itself a proof of its S5-validity). But what about S4-validity? Remember the diagram — we don't have the answer yet. The thing to do here is to try to come up with an S4-model, or an S4 semantic validity proof. Usually, the best thing to do is to try for a model. In fact, in the present case this is quite easy: our T-model is already an S4-model. So we're done. Our answer to what systems the formula is valid and invalid in comes in two parts.

Invalidity: we have an S4-model, which we put down officially as follows:

  W = {r, a, b}
  R = {〈r, r〉, 〈a, a〉, 〈b, b〉, 〈r, a〉, 〈r, b〉}
  A(P, a) = 1, all others false

This model is itself also a T, D, and K-model (since its accessibility relation is reflexive and serial), so we've shown that the formula is K-, D-, T-, and S4-invalid.

Validity: we gave a semantic proof of B-validity above. This was itself a proof of S5-validity, as I noted.

I'll do one more example: 32P→3232P. We can get a T-model as follows:
               *
        1   0  0         0
  r: [ 32P → 3232P    232P ]     (r sees itself, a, and b)
        *                *

        *
        1 1      0
  a: [ 2P     232P ]             (a sees itself and b)
                 *

        *
        0 0       1
  b: [ 32P       P ]             (b sees itself and c)
         *

        0
  c: [ P ]                       (c sees itself)

(Notice how commitments to specific truth values for different formulas are recorded by placing the formulas side by side in the box. I discharged the second bottom asterisk in r by letting r see b.)
Official model:

  W = {r, a, b, c}
  R = {〈r, r〉, 〈a, a〉, 〈b, b〉, 〈c, c〉, 〈r, a〉, 〈r, b〉, 〈a, b〉, 〈b, c〉}
  A(P, a) = A(P, b) = 1, all else 0

(Note that P must be true at a as well as at b: a sees itself, and 2P is true at a.) Now consider what happens when we try to turn this model into a B-model. World b must see back to world a. But then the false 32P in b conflicts with the true 2P in a. So it's time for a validity proof. In constructing this validity proof, we can be guided by the failed attempt to construct a countermodel (assuming all of our choices in constructing that countermodel were forced). In the following proof that the formula is B-valid, I chose variables for worlds that match up with the countermodel above:

i) Suppose for reductio that V(32P→3232P, r) = 0, for some world r in some B-model 〈W, R, A〉
ii) So V(32P, r) = 1 and …
iii) …V(3232P, r) = 0
iv) From ii), there's some world, call it a, such that V(2P, a) = 1 and Rra
v) From iii), since Rra, V(232P, a) = 0
vi) And so, there's some world, call it b, such that V(32P, b) = 0 and Rab
vii) By symmetry, Rba. And so, given vi), V(2P, a) = 0. This contradicts iv)

We now have a T-model for the formula, and a proof that it is B-valid. The B-validity proof shows the formula to be S5-valid; the T-model shows it to be K-, D-, and T-invalid. We still don't yet know about S4. So let's return to the T-model above, and see what happens when we try to make its accessibility relation transitive. World a must then see world c, which is impossible since 2P is true in a and P is false in c. So we're ready for an S4-validity proof (the proof looks like the B-validity proof at first, but then diverges):

i) Suppose for reductio that V(32P→3232P, r) = 0, for some world r in some S4-model 〈W, R, A〉
ii) So V(32P, r) = 1 and …
iii) …V(3232P, r) = 0
iv) From ii), there's some world, call it a, such that V(2P, a) = 1 and Rra
v) From iii), since Rra, V(232P, a) = 0
vi) And so, there's some world, call it b, such that V(32P, b) = 0 and Rab
vii) By reflexivity, Rbb, so given vi), V(2P, b) = 0
viii) And so, there's some world, call it c, such that V(P, c) = 0 and Rbc.
ix) From vi) and viii), given transitivity, we have Rac. And so, given iv), V(P, c) = 1, contradicting viii)
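As a machine check on the two countermodels of this section (this sketch is mine, not the text's), we can evaluate each formula at r. Note that in the T-model for 32P→3232P, P must be true at a as well as at b, since a sees itself and 2P is true at a:

```python
# Verify the section's two official countermodels.
# 'dia'/'box' stand in for the 3 and the 2; wffs are nested tuples.

def V(M, wff, w):
    W, R, A = M
    if isinstance(wff, str):
        return A[(wff, w)]
    op = wff[0]
    if op == '->':
        return max(1 - V(M, wff[1], w), V(M, wff[2], w))
    if op == 'box':
        return min([V(M, wff[1], v) for v in W if (w, v) in R] or [1])
    if op == 'dia':
        return max([V(M, wff[1], v) for v in W if (w, v) in R] or [0])

# The S4-model for 32P -> 23P: P true at a only.
W1 = {'r', 'a', 'b'}
R1 = {('r','r'), ('a','a'), ('b','b'), ('r','a'), ('r','b')}
M1 = (W1, R1, {('P', w): 1 if w == 'a' else 0 for w in W1})
f1 = ('->', ('dia', ('box', 'P')), ('box', ('dia', 'P')))
print(V(M1, f1, 'r'))   # 0: false at r, so the formula is S4-invalid

# The T-model for 32P -> 3232P: P true at a and b.
W2 = {'r', 'a', 'b', 'c'}
R2 = {('r','r'), ('a','a'), ('b','b'), ('c','c'),
      ('r','a'), ('r','b'), ('a','b'), ('b','c')}
M2 = (W2, R2, {('P', w): 1 if w in ('a', 'b') else 0 for w in W2})
f2 = ('->', ('dia', ('box', 'P')),
            ('dia', ('box', ('dia', ('box', 'P')))))
print(V(M2, f2, 'r'))   # 0: false at r, so the formula is T-invalid
```

Since both formulas come out false at r, each model is a genuine countermodel in its system.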
Daggers There’s another kind of step in constructing models. When we make an arrow false, then we’re forced to enter certain truth values for its components: 1 on the antecedent, 0 on the consequent. But consider making a disjunction true. This can happen in more than one way. The first disjunct might be true, or the second might be true, or both could be true. Similarly for true or false biconditionals, true conditionals, and false conjunctions. Consider constructing a countermodel for 2(P ∨Q)→2P ∨2Q. We’ll start by making the antecedent true. Let’s try for a T-model; so we must make P ∨Q true at r, since r sees itself. We put a dagger below the ∨, signifying that we’ve got a choice to make there. We’ll wait to make it as long as possible. Now, we must make the consequent false too. That means making both disjuncts false, which gives us two bottom asterisks. Let’s wait to discharge them. And first, let’s try one of the two possible choices for discharging the dagger — making P true at r: 1
r
0
1 1
0
0
0 0
2(P ∨Q)→(2P ∨2Q) † ∗ ∗
Now, we need to think about discharging the asterisks. The second can be discharged by making Q false at r. The first cannot be done that way because we’ve already made P true at r. So we’ll add another world: ∗ 1
[Diagram: world r as before, now with Q marked false (0) and a top ∗ on the true 2(P ∨Q); an arrow from r to a new world a, in which P is marked false (0)]
Now, however, we’ve got to go back and discharge the top asterisk in r — we’ve got to make P ∨Q true in a:
[Diagram: as before, with P ∨Q entered as true (1) in world a, daggered]
Now, however, it's easy to discharge the dagger, because our hand is forced: we've already got P false in a, so Q must be true in a. Moreover, we can make r accessible from a without messing up anything. So, we get this:
[Diagram: the finished construction: at r, P is true and Q false; at a, P is false and Q true, so P ∨Q is true at a via Q; arrows r→r, r→a, a→a, and a→r]
And that's the final picture. So, the official S5-model is:

W = {r, a}
R = {〈r,r〉, 〈a,a〉, 〈r,a〉, 〈a,r〉}
V(P, r) = 1, V(Q, a) = 1, all else false

Summary of steps

Here, then, is a final list of the steps for constructing countermodels:

1. Place the formula in a box
2. Make the formula false in the world
3. Enter in forced truth values
4. Enter in daggers; discharge them only after all forced moves are over
5. Enter asterisks
6. Discharge asterisks (hint: do bottom asterisks first)
7. Go back to step 3
8. The official model
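The finished countermodel can be checked by brute force. The following sketch (mine, not the text's) evaluates the formula in the official model, writing R with the pair 〈a,r〉 included, since the text says we may make r accessible from a:

```python
# The official model from the countermodel construction.
W = {'r', 'a'}
R = {('r', 'r'), ('a', 'a'), ('r', 'a'), ('a', 'r')}
V = {('P', 'r'): True, ('Q', 'a'): True}   # all else false

def true_at(wff, w):
    """Evaluate a wff (nested tuples) at world w."""
    op = wff[0]
    if op == 'atom':  return V.get((wff[1], w), False)
    if op == 'not':   return not true_at(wff[1], w)
    if op == 'or':    return true_at(wff[1], w) or true_at(wff[2], w)
    if op == 'arrow': return (not true_at(wff[1], w)) or true_at(wff[2], w)
    if op == 'box':   return all(true_at(wff[1], v) for (u, v) in R if u == w)

P, Q = ('atom', 'P'), ('atom', 'Q')
formula = ('arrow', ('box', ('or', P, Q)),
                    ('or', ('box', P), ('box', Q)))
```

Running the evaluator confirms that the antecedent is true and the consequent false at r, so the whole conditional is false there.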
6.5
Soundness in MPL
At this point, for each of our modal systems, we have a definition of 'validity'. The hope in giving these definitions is that they'll match up with the axiomatic definitions of theoremhood already given — that is, we hope that a formula is a T-theorem (for example) iff it is T-valid, under the definition of T-validity given above. That is, we hope that soundness and completeness hold for T, and for all the other systems. These hopes are in fact satisfied: the K-theorems are exactly the K-validities, the D-theorems are exactly the D-validities, and so on. This striking correspondence between the formal properties of the accessibility relation (seriality, reflexivity, etc.) and the various systems was discovered by Saul Kripke and others in the late 1950s, and was a major advance in the history of modal logic.

This discovery has practical as well as theoretical value. First, once we've proved soundness, we will for the first time have a method for establishing that a given formula is not a theorem: construct a countermodel for that formula, thus establishing that the formula is not valid, and then conclude via soundness that the formula is not a theorem. Second, given completeness, if we want to show that a given formula is a theorem, it suffices to show that it is valid. Since semantic validity proofs are comparatively easy to construct, it's nice to be able to use them rather than axiomatic proofs to establish logical truth.

We seek, then, to prove soundness and completeness, which are, in the case of K, the following:

Soundness: every K-theorem is K-valid
Completeness: every K-valid formula is a K-theorem

As before, soundness will be much easier, so we'll start with that. We're going to prove a general result, which we'll use in several soundness proofs. Where Γ is any set of modal wffs, let's call "K+Γ" the axiomatic system that consists of
Footnote: Proof adapted from Cresswell and Hughes (1996a).
the same rules of inference as K (MP and NEC), and which has as axioms the axioms of K (the K axioms plus the PL axioms), plus the members of Γ. Here is the theorem:

Theorem 6.1 If Γ is any set of modal wffs and 〈W,R〉 is a frame on which each wff in Γ is valid, then every theorem of K+Γ is valid in 〈W,R〉.

The reason we're interested in this general result is this: nearly all the systems commonly discussed by modal logicians are extensions of K, in that they consist of K, with the same rules, plus some extra axioms. These are called "normal systems". The nice thing about Theorem 6.1 is that it gives us a strategy for constructing soundness proofs for such systems. Consider, for example, the system K + {2φ→φ : φ is an MPL wff} — i.e., system T. To establish soundness for T, all we need to do is show that all the T axioms are valid in all reflexive frames; for we may then conclude by Theorem 6.1 that every theorem of T is valid on all reflexive frames.

To prove Theorem 6.1, we will first prove two lemmas:

Lemma 6.2 All PL and K axioms are valid on all frames
Lemma 6.3 MP and Necessitation preserve validity on any given frame

Theorem 6.1 then follows: Assume that every wff in Γ is valid on a given frame 〈W,R〉, and consider any theorem φ of K+Γ. That theorem is the last line of a proof in which each line is either an axiom or follows from earlier lines by MP or NEC. Axioms of K+Γ are either PL axioms, K axioms, or members of Γ. The first two classes of axioms are valid on all frames, by Lemma 6.2; and the final class of formulas is valid on the frame 〈W,R〉, by assumption. Thus, all axioms in the proof of φ are valid on 〈W,R〉. By Lemma 6.3, the rules of inference in the proof preserve validity on 〈W,R〉. Therefore, φ is valid on 〈W,R〉. It now remains to prove the lemmas.

Proof of Lemma 6.2: We already know that the PL axioms are valid under the original definition of validity for PL. But the truth conditions relative to a world are just like the plain old truth conditions for PL, so the old reasoning carries over world by world.
Thus, the PL axioms are valid on all frames. We need now to show
that any K axiom — i.e., any formula of the form 2(φ→ψ)→(2φ→2ψ) — is valid on any frame. So, let 〈W,R〉 be any frame, and suppose for reductio that such a conditional isn't valid on 〈W,R〉; then there's some valuation V and some w in W such that V(2(φ→ψ)→(2φ→2ψ), w) = 0, which in turn means, by the truth condition for →, that

V(2(φ→ψ), w) = 1
V((2φ→2ψ), w) = 0, and so V(2φ, w) = 1 and V(2ψ, w) = 0

But now let's apply the truth condition for 2 to what we've got here:

At every v such that Rwv ("accessible from w"), φ→ψ is true
At every v accessible from w, φ is true
At some v accessible from w, ψ is false

But that can't be: take such a v at which ψ is false; φ→ψ and φ are both true at v, and a conditional can't be true at a world when its antecedent is true and its consequent is false there.

Proof of Lemma 6.3: First MP. Let φ and φ→ψ be valid on a frame 〈W,R〉; we must show that ψ is also valid on that frame. That is, where V is any valuation function and w is any member of W, we must show that V(ψ, w) = 1. Since φ and φ→ψ are valid on this frame, V(φ→ψ, w) = 1 and V(φ, w) = 1; but then by the truth condition for →, V(ψ, w) must also be 1.

Next NEC. Let φ be valid on a frame 〈W,R〉, let V be any valuation, and let w be any world in W. We must show that V(2φ, w) = 1, which means showing that V(φ, v) = 1 for every world v accessible from w. But V(φ, v) = 1 for any world v and any valuation V, since φ is valid on this frame.

Now for the proofs of soundness for the individual systems:
6.5.1
Soundness of K
K's soundness follows immediately from Theorem 6.1, taking Γ to be the empty set.
6.5.2
Soundness of D
Given Theorem 6.1, since D = K + {the D axioms}, all we must do is show that all the D-axioms are valid on all serial frames.
So, let 〈W,R〉 be any serial frame, let V be any valuation function, and let w be any member of W; we must show that V(2φ→3φ, w) = 1.

i. The truth condition for → says that the only way for a conditional to be false is for it to have a true antecedent and false consequent. So let's assume that the antecedent is true, and show that the consequent must be true as well: assume V(2φ, w) = 1
ii. From i, φ is true at every world accessible from w (truth condition for 2)
iii. But since R is serial, some world v is accessible from w
iv. By ii, V(φ, v) = 1
v. But that means that V(3φ, w) = 1 (truth condition for 3)
vi. Thus, V(2φ→3φ, w) = 1

Note that we are not giving axiomatic proofs of things; we're proving a fact about the system D in the metalanguage. We are free to use whatever pattern of reasoning seems appropriate; the combination of conditional proof, reductio ad absurdum, and free-wheeling intuition used here is perfectly acceptable. Notice too that I appealed to the truth condition for the non-primitive connective 3; that's fine.

Thus, every D axiom is valid on all serial frames. And so we can conclude from Theorem 6.1 that every D-theorem is valid on all serial frames — this is the claim of soundness for D.
6.5.3
Soundness of T
We must show that all T axioms are valid in all reflexive frames. So, we need to show that V(2φ→φ, w) = 1, for an arbitrary reflexive frame 〈W,R〉, valuation V, and w ∈ W. In virtue of the truth condition for →, it suffices to show that if 2φ is true at w, then so is φ:

i. Assume 2φ is true at w
ii. Then φ is true at every world accessible from w (truth condition for 2)
iii. But w is accessible from w (reflexivity)
iv. So φ is true at w
6.5.4
Soundness of B
Choose an arbitrary W, R, V, and w ∈ W; we must show the T and B axioms to be true at w under V, under the assumption that R is reflexive and symmetric. The proof of the previous section shows that the T axiom is true at w. Now for the B axiom:

i. Assume V(32φ, w) = 1
ii. So V(2φ, v) = 1, for some v such that Rwv
iii. Since Rwv, we have Rvw, by symmetry of R
iv. From ii, we know that φ is true at every world accessible from v
v. So, from iii and iv, φ is true at w
6.5.5
Soundness of S4
Choose an arbitrary W, R, V, and w ∈ W; we must show the T and S4 axioms to be true at w under V, under the assumption that R is reflexive and transitive. We know from above that the T axioms are true at w, since R remains reflexive. Now for the S4 axioms:

i. Assume 2φ is true at w. So φ is true at every world accessible from w
ii. To show that 22φ is true at w, we must show that 2φ is true at every world accessible from w. Let v be any such world
iii. To show that 2φ is true at v, we must show that φ is true at every world accessible from v. Let u be any such world
iv. By transitivity, since Rwv and Rvu, we have Rwu
v. From i and iv, φ is true at u
6.5.6
Soundness of S5
Choose an arbitrary W, R, V, and w ∈ W; we must show the T and S5 axioms to be true at w under V, under the assumption that R is reflexive, transitive, and symmetric. We know from above that the T-axioms are true at w, since R remains reflexive. Now for the S5 axioms:
i. Assume V(32φ, w) = 1 — so 2φ is true at some world v accessible from w
ii. To show V(2φ, w) = 1, we must show that V(φ, u) = 1 for an arbitrarily chosen u such that Rwu
iii. By symmetry, Ruw; since we now have Ruw and Rwv, by transitivity we have Ruv; and by symmetry we have Rvu
iv. From i, φ is true at every world accessible from v
v. From iii and iv, we know that φ is true at u

And so we're done; we now know that soundness holds for all the systems in question. The method of countermodels, therefore, is a method for establishing non-theoremhood. If, for example, we can locate a reflexive frame on which a given formula is not valid, we can conclude that the formula is not T-valid, and hence, by soundness, not a T-theorem.
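Soundness results of this kind can be spot-checked mechanically. The following sketch (mine, not the text's) verifies that on one small serial frame, the D axiom 2P→3P comes out true at every world under every valuation of P:

```python
from itertools import product

# A small serial (but not reflexive) frame: every world sees some world.
W = [0, 1]
R = {(0, 1), (1, 1)}

def box_P(V, w):   # 2P: P is true at every world accessible from w
    return all(V[v] for (u, v) in R if u == w)

def dia_P(V, w):   # 3P: P is true at some world accessible from w
    return any(V[v] for (u, v) in R if u == w)

# Check the D axiom 2P -> 3P at every world, under every valuation of P.
d_axiom_holds = all((not box_P(dict(zip(W, vals)), w)) or dia_P(dict(zip(W, vals)), w)
                    for vals in product([False, True], repeat=len(W))
                    for w in W)
```

Since every world has at least one successor, whenever P holds at all of a world's successors it holds at some successor; the brute-force check just confirms this.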
6.6
Completeness of MPL
We will now prove completeness for each modal system. As with the soundness proofs, most of the work will go into developing some general-purpose machinery; at the end we'll use the machinery to construct completeness proofs for each system. We'll be constructing a kind of completeness proof known as a "Henkin proof", after Leon Henkin, who used similar methods to demonstrate completeness for (nonmodal) predicate logic.
6.6.1
Canonical models
Let us extend our terminology by saying that a formula is valid on a model (as opposed to valid on a frame) iff it is true at every world in that model. For certain systems S, we're going to make use of certain special models, canonical models, which have the following feature:

A formula is valid in the canonical model for S iff it is a theorem of S
Footnote: Proof (including the numbering of lemmas and theorems) adapted from Cresswell and Hughes (1996a).
That's an important connection, which the following example brings out. Suppose we can come up with a canonical model for T. Suppose further that we can show that this model has a reflexive frame. Then we know that every T-valid formula is valid in that model, by the definition of T-validity. But then we know that every T-valid formula is a theorem of T, since the model is canonical. So we would have established completeness for T.

We want to construct canonical models for various systems. The trick will be to let the worlds in question be constructed out of sets of formulas of modal logic — remember that worlds are allowed to be anything we like. When we specify the valuation function, we'll say that a formula is true at a world iff the formula is a member of the set that is the world. Working out this idea will occupy us for a while.
6.6.2
Maximal consistent sets of wffs
To carry out this idea of constructing worlds as sets of formulas that are true at those worlds, we'll need to put some constraints on the nature of these sets of wffs. It's part of the definition of a valuation function that for any wff φ and any world w, either φ or ∼φ is true at the world. That means that any set of wffs that we're going to call a world had better contain either φ or ∼φ. Moreover, it can't contain both, since a formula can't be both true and false at a world. Other constraints must be introduced as well. So, let's proceed as follows. Let S be any system.

A set of wffs, Γ, is S-inconsistent iff for some finite sequence φ1, …, φn of its members, ⊢S ∼(φ1 ∧ ··· ∧ φn); Γ is S-consistent iff it is not S-inconsistent

In other words, a set is S-inconsistent if it contains some wffs (finite in number) that are provably (in S) contradictory.

A set of wffs, Γ, is maximal iff for every wff φ, either φ or ∼φ is a member of Γ

A set is maximal S-consistent, then, if it is both maximal and S-consistent. Such sets may be used as worlds, as the following lemma begins to show:

Lemma 6.1 Let Γ be a maximal S-consistent set of wffs. Then:
6.1a for any wff φ, exactly one of φ, ∼φ is in Γ
6.1b φ→ψ is in Γ iff either φ is not in Γ or ψ is in Γ
6.1c if φ and φ→ψ are both members of Γ, then so is ψ

Proof of Lemma 6.1:
6.1a: We know from the definition of maximality that at least one of φ, ∼φ is in Γ. But it cannot be that both are in Γ, for then Γ would be S-inconsistent (it would contain the finite subset {φ, ∼φ}; and since all modal systems incorporate propositional logic, it is a theorem of S that ∼(φ∧∼φ)).

6.1b: Suppose first that φ→ψ is in Γ, and suppose for reductio that φ is in Γ but ψ is not. Then, by 6.1a, ∼ψ is in Γ; but then Γ is S-inconsistent, by containing the subset {φ, φ→ψ, ∼ψ}. Suppose for the other direction that either φ is not in Γ or ψ is in Γ, and suppose for reductio that φ→ψ isn't in Γ. By 6.1a, ∼(φ→ψ) is in Γ. Now, if φ ∉ Γ then ∼φ ∈ Γ, and then Γ contains the S-inconsistent subset {∼(φ→ψ), ∼φ}. If on the other hand ψ ∈ Γ, then Γ again contains an S-inconsistent subset: {∼(φ→ψ), ψ}. Either possibility contradicts Γ's S-consistency.

6.1c: Direct consequence of 6.1b.

Another lemma:

Lemma 6.2 Where Γ is any maximal S-consistent set of wffs,
6.2a if ⊢S φ then φ ∈ Γ
6.2b if φ ∈ Γ and ⊢S φ→ψ then ψ ∈ Γ

Proof of Lemma 6.2:
6.2a: If ⊢S φ then Γ cannot contain ∼φ, for otherwise it would not be S-consistent. Thus, by maximality, Γ contains φ.
6.2b: If ⊢S φ→ψ then by 6.2a, φ→ψ is in Γ; but then by 6.1c, ψ ∈ Γ.
6.6.3
Maximal consistent extensions
Next let's show that if we begin with an S-consistent set ∆, we can expand it into a maximal S-consistent set:

Theorem 6.3 Let ∆ be an S-consistent set of wffs. Then there's some maximal S-consistent set of wffs, Γ, such that ∆ ⊆ Γ

Proof of Theorem 6.3: In outline, we're going to construct Γ as follows. We're going to start with the formulas in ∆, and we're then going to go through all the wffs in the language of MPL, φ1, φ2, …, one at a time. For each of these
wffs, we're going to add either it or its negation, depending on which choice is S-consistent. After we're done, we'll have our set Γ. It will obviously be maximal; it will obviously contain ∆ as a subset; and, we'll show, it will also be S-consistent.

So, let φ1, φ2, … be a list — an infinite list, of course — of all the wffs of MPL. Our strategy, recall, is to construct Γ by starting with ∆ and then going through this list one by one, at each point adding either φi or ∼φi. Here's how we do this more carefully. Let's begin by defining an infinite sequence of sets:

i) Γ0 is defined as ∆
ii) Assuming that Γn has been defined, we define Γn+1 to be Γn ∪ {φn+1} if that is an S-consistent set; otherwise we define Γn+1 to be Γn ∪ {∼φn+1}

Note the recursive nature of the definition: the next member of the sequence, Γn+1, is defined as a function of the previous member, Γn. Next let's prove that each member of this sequence — that is, each Γi — is an S-consistent set. We do this inductively, by first showing that Γ0 is S-consistent, and then showing that if Γn is S-consistent, then so is Γn+1.
Footnote: We need to be sure that there is some way of arranging all the wffs of MPL into such a list. Here is one method. Consider the following list of the primitive expressions of MPL, with the position of each expression listed underneath it (e.g., the position of the 2 is 5):

Expression:  (   )   ∼   →   2   P1   P2   ...
Position:    1   2   3   4   5   6    7    ...

Now, where φ is any wff, call the rating of φ the sum of the positions of its primitive expressions. We can now construct the listing of all the wffs of MPL in an infinite series of stages: stage 1, stage 2, etc. In stage n, we append to our growing list all the wffs of rating n, in alphabetical order. The notion of alphabetical order here is the usual one, given the ordering of the primitive expressions laid out above. (E.g., just as 'and' comes before 'nad' in alphabetical order, since 'a' precedes 'n' in the usual ordering of the English alphabet, ∼2P2 comes before 2∼P2 in alphabetical order, since the ∼ comes before the 2 in the ordering of the alphabet of MPL. Note that each of these wffs is inserted into the list in stage 15.) In stages 1-5 no wffs are added at all, since every wff must contain at least one sentence letter, and the sentence letter with the smallest position, P1, has position 6. In stage 6 there is one wff: P1. Thus, the first member of our list of wffs is P1. In stage 7 there is one wff: P2, so P2 is the second member of the list. Each subsequent stage adds only finitely many wffs to the list, and each wff has some finite rating; so each wff eventually gets added to the list after some finite number of stages.
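The footnote's enumeration-by-rating procedure can be sketched in code. This is a toy version of my own, restricted to the two sentence letters P1 and P2, with 'B' standing in for the 2 and '~' for ∼:

```python
from functools import lru_cache

# Positions of the primitive expressions, as in the footnote.
POS = {'(': 1, ')': 2, '~': 3, '->': 4, 'B': 5, 'P1': 6, 'P2': 7}

@lru_cache(maxsize=None)
def wffs_of_rating(n):
    """All wffs (as tuples of primitives) whose positions sum to exactly n,
    in alphabetical order, i.e. ordered by their sequences of positions."""
    out = []
    for letter in ('P1', 'P2'):
        if POS[letter] == n:
            out.append((letter,))                      # sentence letters
    for phi in (wffs_of_rating(n - POS['~']) if n > POS['~'] else []):
        out.append(('~',) + phi)                       # negations
    for phi in (wffs_of_rating(n - POS['B']) if n > POS['B'] else []):
        out.append(('B',) + phi)                       # boxes
    scaffold = POS['('] + POS[')'] + POS['->']         # "(", "->", ")" cost 7 together
    for k in range(1, n - scaffold):
        for phi in wffs_of_rating(k):
            for psi in wffs_of_rating(n - scaffold - k):
                out.append(('(',) + phi + ('->',) + psi + (')',))   # conditionals
    out.sort(key=lambda w: tuple(POS[s] for s in w))   # alphabetical order
    return out
```

As the footnote says, stage 6 yields just P1, stage 7 just P2, and stage 15 lists ∼2P2 before 2∼P2.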
Obviously, Γ0 is S-consistent, since ∆ was stipulated to be S-consistent. Next, suppose that Γn is S-consistent; we must show that Γn+1 is S-consistent. If Γn+1 was defined as Γn ∪ {φn+1}, then it is S-consistent by definition. Suppose on the other hand that it was defined as Γn ∪ {∼φn+1}; that means that Γn ∪ {φn+1} was S-inconsistent — the conjunction of some finite subset of its members is provably false in S. Since Γn was S-consistent, that finite subset must contain φn+1, and so there exist ψ1 … ψm ∈ Γn such that ⊢S ∼(ψ1 ∧ ··· ∧ ψm ∧ φn+1). Suppose now for reductio that Γn+1 — that is, Γn ∪ {∼φn+1} — is S-inconsistent; then the conjunction of some finite subset of its members is provably false. Again, since Γn was S-consistent, the finite subset must contain ∼φn+1, so there exist χ1 … χp ∈ Γn such that ⊢S ∼(χ1 ∧ ··· ∧ χp ∧ ∼φn+1). But notice that ∼(ψ1 ∧ ··· ∧ ψm ∧ χ1 ∧ ··· ∧ χp) is a PL-consequence of ∼(ψ1 ∧ ··· ∧ ψm ∧ φn+1) and ∼(χ1 ∧ ··· ∧ χp ∧ ∼φn+1). Any system is closed under (finite) PL-consequence (since the systems contain all PL-theorems and are closed under modus ponens). Thus, ⊢S ∼(ψ1 ∧ ··· ∧ ψm ∧ χ1 ∧ ··· ∧ χp); but this contradicts the fact that Γn is S-consistent.

We have shown that all the sets in our sequence of Γi's are S-consistent. Let us now define Γ to be the union of all the sets in the infinite sequence — i.e., {φ : φ ∈ Γi for some i}. We must now show that Γ is the set we're after: that i) ∆ ⊆ Γ, ii) Γ is maximal, and iii) Γ is S-consistent.

Any member of ∆ is a member of Γ0 (since Γ0 was defined as ∆), hence is a member of one of the Γi's, and hence is a member of Γ. So ∆ ⊆ Γ. Any wff of MPL is in the list somewhere — i.e., it is φi for some i. By the definition of Γi, either φi or ∼φi is a member of Γi; and so one of these is a member of Γ. Γ is therefore maximal. Suppose for reductio that Γ is S-inconsistent; there must then exist ψ1 … ψm ∈ Γ such that ⊢S ∼(ψ1 ∧ ··· ∧ ψm).
By the definition of Γ, each of these ψi's is a member of Γj for some j. Let k be the largest such j. Note next that, given the way the Γi's are constructed, each Γi is a subset of all subsequent ones. Thus, all of the ψi's are members of Γk, and so Γk is S-inconsistent. But that can't be — we showed that all the Γi's are S-consistent. QED.
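The recursive construction of the Γn's can be mimicked in code for the propositional fragment. In this sketch (mine, not the text's), truth-table satisfiability stands in for S-consistency, which is adequate only for PL:

```python
from itertools import product

# Wffs of the PL fragment as nested tuples:
# ('P', i) for sentence letters, ('not', phi), ('arrow', phi, psi).
def val(phi, v):
    if phi[0] == 'P':     return v[phi[1]]
    if phi[0] == 'not':   return not val(phi[1], v)
    if phi[0] == 'arrow': return (not val(phi[1], v)) or val(phi[2], v)

def consistent(gamma, letters=(1, 2)):
    """Stand-in for S-consistency: some valuation makes every member true."""
    return any(all(val(phi, dict(zip(letters, row))) for phi in gamma)
               for row in product([False, True], repeat=len(letters)))

def extend(delta, wff_list):
    """Gamma_0 = delta; Gamma_{n+1} adds phi_{n+1} if that stays consistent,
    and otherwise adds its negation."""
    gamma = set(delta)
    for phi in wff_list:
        gamma = gamma | {phi} if consistent(gamma | {phi}) else gamma | {('not', phi)}
    return gamma

# Example: extend {P1} through a short list of wffs.
P1, P2 = ('P', 1), ('P', 2)
gamma = extend({P1}, [P2, ('arrow', P1, P2)])
```

Run on the full enumeration of wffs (rather than a finite list), this is exactly the construction in the proof.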
6.6.4
Consistent sets of wffs in modal systems
We need to start thinking about which worlds in our canonical model will be accessible from which. Given the truth condition for 2, we’ll need to be sure that if ∆ is accessible from Γ, then for every formula 2φ that’s in Γ, φ must be in ∆. In fact, this will be our definition of accessibility. Let’s introduce some
notation:

2−(∆) is defined as the set of wffs φ such that 2φ is a member of ∆

Think of this operation as "stripping off the boxes". Below, we will define accessibility as follows: ∆ is accessible from Γ iff 2−(Γ) ⊆ ∆. We'll thereby be assured that the 2-statements at the various worlds "mesh". But we need to be assured that the same holds for the 3: in order to construct a canonical model, we'll need to be sure that if we include a world (maximal S-consistent set of wffs) containing ∼2φ (which is, in effect, 3∼φ), we'll also be able to include some accessible world containing ∼φ. That's what the next lemma addresses:

Lemma 6.4 Where S is any normal system of MPL, and ∆ is an S-consistent set of wffs containing ∼2φ, 2−(∆) ∪ {∼φ} is S-consistent.

Proof of Lemma 6.4: Let ∆ and S be as described, and suppose for reductio that 2−(∆) ∪ {∼φ} is S-inconsistent; there are then ψ1 … ψn in 2−(∆) such that ⊢S ∼(ψ1 ∧ ··· ∧ ψn ∧ ∼φ). Now begin a proof in S with a proof of ∼(ψ1 ∧ ··· ∧ ψn ∧ ∼φ), and then continue as follows:

  ⋮
  ∼(ψ1 ∧ ··· ∧ ψn ∧ ∼φ)
  ψ1→(ψ2→ ··· (ψn→φ) ... )              PL
  2(ψ1→(ψ2→ ··· (ψn→φ) ... ))           NEC
  ⋮
  2ψ1→(2ψ2→ ··· (2ψn→2φ) ... )          K, PL (×n)
  ∼(2ψ1 ∧ ··· ∧ 2ψn ∧ ∼2φ)              PL
This proof establishes that ⊢S ∼(2ψ1 ∧ ··· ∧ 2ψn ∧ ∼2φ). But since 2ψ1, …, 2ψn, and ∼2φ are all in ∆, this contradicts ∆'s S-consistency (2ψ1 … 2ψn are members of ∆ because ψ1 … ψn are members of 2−(∆)). The proof is done, except for a couple of special cases. First: what if ∆ has no wffs of the form 2ψ? Then 2−(∆) is empty, which would mean that none of
Footnote: "Normal" = includes K, recall.
the ψi's exist. But in that case, the contradiction in 2−(∆) ∪ {∼φ} is due to ∼φ alone; that is, ⊢S φ. But then by NEC, ⊢S 2φ, which contradicts the claim that ∆ is S-consistent but contains ∼2φ. The second special case is this: what if the contradiction in 2−(∆) ∪ {∼φ} doesn't involve ∼φ? In that case, we'd have ⊢S ∼(ψ1 ∧ ··· ∧ ψn). But then, given closure under PL-consequence, we'd have ⊢S ∼(ψ1 ∧ ··· ∧ ψn ∧ ∼φ). The proof then proceeds as before. QED.
6.6.5
Canonical models
We'll now put these lemmas to work; the point of Theorem 6.3 and Lemma 6.4 is that we'll be able to construct worlds as maximal consistent sets of wffs that "mesh", given our definition of the accessibility relation. We'll now show how to construct the canonical model for a given system S. We do this by specifying 〈W, R, A〉:

W is the set of all maximal S-consistent sets of wffs
wRw′ iff 2−(w) ⊆ w′
For any sentence letter φ and any world w, A(φ, w) = 1 iff φ ∈ w

Consider now the valuation V that results from this definition of the canonical model. Given how the assignment A was defined, we know that for any atomic wff α, V(α, w) = 1 iff α ∈ w. We wish now to show that this holds for all wffs, not just atomic wffs — a wff is true at a world iff it is a member of that world:

Theorem 6.5 Where M (= 〈W, R, A〉) is the canonical model for any normal modal system S, for any wff φ and any w ∈ W, VM(φ, w) = 1 iff φ ∈ w

Proof of Theorem 6.5: We'll use induction. The base case is when φ has zero connectives — i.e., φ is a sentence letter. In that case, the theorem holds given the definition of the canonical model. Now the inductive step. We suppose (ih) that the result holds for φ and ψ, and show that it holds for ∼φ, φ→ψ, and 2φ as well:

∼: We must show that ∼φ is true at w iff ∼φ ∈ w.
i. ∼φ ∈ w iff φ ∉ w (6.1a)
ii. φ ∉ w iff φ is not true at w (ih)
iii. φ is not true at w iff ∼φ is true at w (truth condition for ∼)
iv. So, ∼φ is true at w iff ∼φ ∈ w (i, ii, iii)

→: We must show that φ→ψ is true at w iff φ→ψ ∈ w.
i. V(φ→ψ, w) = 1 iff either V(φ, w) = 0 or V(ψ, w) = 1 (truth condition for →)
ii. So, V(φ→ψ, w) = 1 iff either φ ∉ w or ψ ∈ w (ih)
iii. So, V(φ→ψ, w) = 1 iff φ→ψ ∈ w (6.1b)

2: We must show that 2φ is true at w iff 2φ ∈ w. First the forwards direction. Assume 2φ is true at w; then φ is true at every w′ such that wRw′. By the ih, we have (*): φ is a member of every such w′. Now suppose for reductio that 2φ ∉ w; by 6.1a, ∼2φ ∈ w. Since w is S-consistent, by Lemma 6.4, 2−(w) ∪ {∼φ} is S-consistent; by Theorem 6.3 it has a maximal S-consistent extension, v. By the definition of W, v is a world; since 2−(w) ⊆ v, by the definition of R, wRv; and so by (*), v contains φ. But v also contains ∼φ, which contradicts its S-consistency.

Now the backwards direction. Assume 2φ ∈ w. Then by the definition of R, for every w′ such that wRw′, φ ∈ w′. By the ih, φ is true at every such world; hence by the truth condition for 2, 2φ is true at w. QED.

What was the point of proving Theorem 6.5? The whole idea of a canonical model was that a formula is valid in the canonical model for S iff it is a theorem of S. This fact follows immediately from Theorem 6.5:

Corollary 6.6 φ is valid in the canonical model for S iff ⊢S φ

Proof of Corollary 6.6: Let 〈W, R, A〉 be the canonical model for S. Suppose ⊢S φ. Then, by 6.2a, φ is a member of every maximal S-consistent set, and hence φ ∈ w for every w ∈ W. By 6.5, φ is true at every w ∈ W, and so is valid in this model. Now for the other direction: suppose ⊬S φ. Then {∼φ} is S-consistent, and so by Theorem 6.3 it has a maximal S-consistent extension; thus ∼φ ∈ w for some w ∈ W. By Theorem 6.5, ∼φ is therefore true at w, so φ is not true at w, and hence φ is not valid in this model.
So, we've gotten where we wanted to go: we've shown that every normal system has a canonical model, and that a wff is valid in the canonical model iff it is a theorem of the system. We now use this fact to prove completeness for our various systems:
6.6.6
Completeness of systems of MPL
Completeness of K

K's completeness follows immediately. Any K-valid wff is valid on all frames, and thus valid in all models, and thus valid in the canonical model for K; so, by Corollary 6.6, it is a theorem of K. For the other systems, all that's required for completeness is to show that the accessibility relation in the canonical model has the formal property of the accessibility relations in the frames for that system. This will be made clear in the proof of completeness for D.

Completeness of D

Let us show that in the canonical model for D, the accessibility relation R is serial. Let w be any world in that model. We showed above that 3(P→P) is a theorem of D, and so is a member of w by 6.2a, and so is true at w by 6.5. Thus, by the truth condition for 3, there must be some world accessible from w at which P→P is true; and hence there must be some world accessible from w. Now for D's completeness. Let φ be D-valid. It is then valid in all serial models. But we just showed that the canonical model for D is serial. φ is therefore valid in that model, and hence by 6.6, ⊢D φ.

Completeness of T

As is plain from the previous section, all we need to do is prove that the accessibility relation in the canonical model for T is reflexive, for then it will follow that every T-valid formula is valid in the canonical model, and hence by 6.6 is a T-theorem. Let w be any world in the canonical model for T. For any φ, ⊢T 2φ→φ; thus, by 6.2b, for any φ, if 2φ ∈ w then φ ∈ w. But this is just the definition of wRw.

Completeness of B

The accessibility relation can be shown to be reflexive in the same way as for T, since every T-theorem is a B-theorem. Now for symmetry: in the canonical model for B, suppose that wRv. We must show that vRw — that is, that for any 2ψ in v, ψ ∈ w. So, suppose that
2ψ ∈ v. By 6.5, 2ψ is true at v; since wRv, by the truth condition for 3 it follows that 32ψ is true at w, and hence 32ψ is a member of w by 6.5. Since ⊢B 32ψ→ψ, by 6.2b, ψ ∈ w.

Completeness of S4

Reflexivity holds as before; transitivity remains. Suppose wRv and vRu. We must show wRu — that is, for any 2ψ ∈ w, ψ ∈ u. If 2ψ ∈ w, then since ⊢S4 2ψ→22ψ, by 6.2b we have 22ψ ∈ w. By 6.5, 22ψ is true at w; hence by the truth condition for 2, 2ψ is true at v; again by the truth condition for 2, ψ is true at u; so by 6.5, ψ ∈ u.

Completeness of S5

Reflexivity, symmetry, and transitivity hold in virtue of the proofs above, since S5's theorems include all those of T, B, and S4.
Chapter 7

Variations on Propositional Modal Logic

As we have seen, possible worlds are useful for giving a semantics for propositional modal logic. Possible worlds are useful in other areas of logic as well. In this chapter we will briefly examine two other uses for possible worlds: semantics for tense logic, and semantics for intuitionism.
7.1 Propositional tense logic

Footnote: See Gamut (1991b, section 2.4); Cresswell and Hughes (1996a, pp. 127-134).

7.1.1 Philosophical introduction

Propositional modal logic concerned the logic of the non-truth-functional sentential operators "it is necessary that" and "it is possible that". Another set of sentential operators that can be similarly treated is that of the propositional tense operators, such as "it will be the case that", "it has always been the case that", etc. A full logical treatment of natural language obviously requires that we pay attention to temporal notions. It isn't so obvious, though, that we can't just use plain old predicate logic to treat temporal notions. This was the strategy of many early logicians, such as Quine.

Footnote: See, for example, Quine (1953).

Quine's method is roughly this:
Everyone who is now an adult was once a child
∀x(Axn → ∃t[t < n ∧ Cxt])
The alternative, developed by Prior and others, is to add primitive tense operators to propositional logic. On one interpretation:

Gφ: it is, and always is going to be, the case that φ
Hφ: it is, and always has been, the case that φ
Fφ: it either is, or will be at some point in the future, the case that φ
Pφ: it either is, or was at some point in the past, the case that φ

In this language, if the sentence letter P means "there exists a dinosaur", then the sentence PP means that there is, or once existed, a dinosaur — but this isn't reduced in any way to claims about merely past dinosaurs. The tense operator 'P' is primitive. Another interpretation for these tense operators is as follows:

Gφ: it is always going to be the case that φ
Hφ: it always has been the case that φ
Fφ: it will at some point in the future be the case that φ
Pφ: it was at some point in the past the case that φ

The difference is in whether the present time is included in the scope of the tense operators. The difference is reflected in whether the T principle holds for G and H — on the first way of taking the tense operators, Gφ implies φ, but not on the second. It will be convenient to take them in the first way.

Let's simplify things in a couple of ways. Notice that the tense operators come in pairs: the future-looking operators G and F, and the past-looking operators H and P. Let's concentrate on just one pair, the future-looking operators. Further, let's change notation: let's just use our old 2 and 3 for G and F.
7.1.2 Syntax of tense logic
The syntax of our propositional tense logic is the same as before. Note that we can continue to define 3 as ∼2∼ — for something is or will be the case iff it is not the case that it is now, and always will be, false.
7.1.3 Validity and theoremhood in tense logic
What we want to do now is find out which axioms we should use when 2 and 3 are interpreted in this way. What Prior and others did was to utilize a possible worlds semantics for 3 and 2, and then search for the appropriate axioms.
Here's what we mean by a possible worlds semantics for the tense operators. We think of the members of W, the set of worlds, as times. And we think of the accessibility relation as being the at-or-earlier-than relation ("≤"). Thus, "wRw′", or, as I'll often write, "tRt′", means that t ≤ t′. Thus, formulas have truth values at times; 2φ ("φ is and always is going to be the case") is true at a time, in these models, iff φ is true at that time and all later times. (Note that if we had taken our tense operators in the second way mentioned above, so that 2φ means merely that φ is always going to be the case, then the accessibility relation would need to be interpreted as < — strictly earlier than.)

Next, the goal is to find axiomatic systems that are sound and complete with respect to classes of frames that match possible structures for time. For example, some frames one could write down wouldn't seem like representations of what time could be like — if the earlier-than relation wasn't transitive, for example, or if it was symmetric. So, we look for the frames that seem like possibilities for time, and then we look for axioms that characterize those frames — i.e., that result in a sound and complete system relative to those frames.

Some comments. First, on this conception, a formula doesn't have a permanent truth value; rather, its truth value changes from time to time. This is a fundamental break with the Quinean conception of an atemporal logic. Quine's "there is a dinosaur" doesn't vary in truth value. Nor does the truth value of any given symbolization of "I am standing now". For if we name the present moment 'n', the sentence will be symbolized as "Stn". And if I symbolize what I mean by a later utterance at some other time, I'll need to use a new name, "n′", to denote that time, and thus the symbolization will be "Stn′". On the Quinean view, formulas have permanent truth values.
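The clause just given — 2φ is true at a time iff φ is true then and at all later times — can be checked mechanically on a toy timeline. The following Python sketch is mine, not the text's: times are the integers 0 through 4, accessibility is ≤, and a formula is represented as a truth function on times; the "dinosaur" sentence letter is a hypothetical illustration.

```python
# A minimal sketch (not from the text) of the tense semantics on the
# first interpretation: "always" and "sometimes" include the present.

times = range(5)

def box(phi, t):
    # 2phi ("phi is and always is going to be the case") is true at t
    # iff phi is true at t and at all later times
    return all(phi(u) for u in times if t <= u)

def dia(phi, t):
    # 3phi ("phi is or will be the case") is true at t
    # iff phi is true at t or at some later time
    return any(phi(u) for u in times if t <= u)

# Hypothetical example: "there exists a dinosaur" is true only at time 0.
dino = lambda t: t == 0

print(dia(dino, 0))   # True: a dinosaur exists at time 0
print(dia(dino, 3))   # False: no dinosaur at 3 or later
print(box(lambda t: not dino(t), 1))  # True: dinosaur-free from 1 onward

# 3 is definable as ~2~, as noted in the text:
assert all(dia(dino, t) == (not box(lambda u: not dino(u), t)) for t in times)
```

The final assertion corroborates, on this one toy model, the definability of 3 as ∼2∼ mentioned in section 7.1.2.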
Second, it’s an interesting philosophical question why Prior and others regarded our intuitions about possibilities for the structure of the time axis as being a good guide to finding the right axioms for tense logic. After all, for Prior the tenses are primitive — we don’t define “always” as “true at all times”. The issues here are similar to those confronting a possible worlds theorist who doesn’t really believe in possible worlds — an actualist — but who wants to use intuitions about the accessibility relation as a guide to formulating the correct modal system. Third, note that there are philosophers who do accept the conception of space-like time, but nevertheless are interested in tense logic, as a representation
of English tensed speech. These philosophers can with clear conscience consult their intuitions about possible time structures.

Fourth, note that tense logic may not be a plausible model for how English tense actually works; whether it is, is a question for linguistics. Still, even a realistic account of English tense not based on tense logic might contain features analogous to tense logic.

So, what is the right system of modal logic, when we interpret 2 and 3 as expressing these tensed notions? Is it one of our systems already considered? Or some further system we haven't considered?

K, D, T in Tense Logic

As can easily be seen, the K axioms are going to need to be axioms (or theorems) in any system we choose, because if the semantics includes the familiar clause:

2φ is true at t iff φ is true at all t′ such that tRt′

then the K axioms all turn out valid (cf. the proof above that all K-axioms are valid in any frame whatsoever). Also, since we are interpreting tRt′ here as "t ≤ t′", R is automatically reflexive, so we'll want to include the T axioms, and thus the D axioms as theorems. (This would not be the case if we took the tense operators in the second way mentioned above, for then R would not be reflexive. In fact it could naturally be taken to be irreflexive — no time is strictly earlier than itself.)

What more do we want to add? We don't want to go all the way to S5, because ≤ is not an equivalence relation. In terms of theorems, the characteristic S5 axiom is implausible in the tense-logical case. 32φ→2φ now means "if it will be the case that it is (then) always going to be the case that φ, then it is (now) always going to be the case that φ". That's wrong: it will be the case that it is always going to be the case that I'm dead, but nevertheless, it isn't true (now) that it is always going to be the case that I'm dead.
Similarly, it doesn’t follow from it will be the case that it is always going to be the case that I’m dead that I’m dead. Thus, the B axiom 32φ→φ fails. And this is to be expected — ≤ isn’t a symmetrical relation. S4.3 in Tense Logic That relation does, however, appear to be transitive. And the principle 2φ→22φ seems like a good axiom.
But do we want to require more than transitivity? Connectivity should also be required:

Connectivity: if t ≤ t′ and t ≤ t″, then either t′ ≤ t″ or t″ ≤ t′

This makes the set of times more like a line, since it rules out branching. Let's consider the class of frames — call them the S4.3 frames — in which the accessibility relation is reflexive, transitive, and connected. What axioms correspond to this class of frames? As it turns out, the axioms of S4.3, which are the S4 axioms plus the following:

D1 2(2φ→ψ)∨2(2ψ→φ)

Soundness of S4.3

Given theorem 6.1 and our earlier proofs that the T and S4 axioms are valid in all reflexive-and-transitive frames, we can prove soundness for S4.3 if we can show that the D1 axioms are all valid in all reflexive-and-transitive-and-connected frames:

• Assume for reductio that V(2(2φ→ψ), w) = 0 and V(2(2ψ→φ), w) = 0.
• So there are a pair of worlds (times), a and b, accessible from w, at which 2φ→ψ and 2ψ→φ are false, respectively.
• Thus, 2φ is true at a, ψ is false at a, 2ψ is true at b, and φ is false at b.
• But by connectivity, either aRb or bRa. Either way there's a contradiction.

Completeness of S4.3

Given corollary 6.6, all we need to do is show that in the canonical model for S4.3, the accessibility relation is reflexive, transitive, and connected. Reflexivity and transitivity carry over from the proof of S4's completeness: the proof there depended only on the existence of certain theorems in S4, and S4.3 only differs from S4 in adding new theorems. We now just need to show that the accessibility relation is connected:

• Let wRa and wRb; we must show that either aRb or bRa. So suppose for reductio that a doesn't bear R to b and b doesn't bear R to a.
• Since a doesn't bear R to b, for some ψ, 2ψ ∈ a but ψ ∉ b. By 6.5, 2ψ is true at a and ψ is false at b.
• Similarly, since b doesn't bear R to a, for some φ, 2φ is true at b and φ is false at a.
• Thus, 2ψ→φ is false at a, and 2φ→ψ is false at b.
• But ⊢S4.3 2(2φ→ψ)∨2(2ψ→φ), and so by 6.2a, 2(2φ→ψ)∨2(2ψ→φ) ∈ w, and so by 6.5, 2(2φ→ψ)∨2(2ψ→φ) is true at w. Thus, one of its disjuncts is true at w. But this is impossible, since w sees both worlds a and b.

Other constraints on ≤

There are other constraints one could place on the time-line. Even transitivity, reflexivity, and connectedness don't really suffice to characterize the time line, for they allow times to be "tied": it could be that for two distinct times, t and t′, t ≤ t′ and t′ ≤ t. A requirement of anti-symmetry (for any t, t′, if t ≤ t′ and t′ ≤ t, then t = t′) would rule this out. We won't discuss what formulas correspond to anti-symmetry.

Likewise, one could impose the constraint that time has no ending, that time doesn't go in a circle, that time is dense, or that time has a metric like that of the real line. It is easy to express some of these further constraints on frames by means of axioms if the tense operators are given the second interpretation above, where the truth condition for 2φ is that φ is true at all later times. For example, we can ensure that no frames have an end to time by including the axiom 2φ→3φ; for if time had an end, there would be a time that didn't see any other times, at which all 3-statements would be false. If we say that time has an end iff there's some time with no later time, then from the soundness and completeness proofs for D, we see that the D-characteristic axiom characterizes the models where time doesn't end. (Note: these constraints only have the desired impact when we assume in our models that earlier-than is irreflexive.)

Similarly, L. T. F. Gamut shows (claims) that the following formula requires that time is dense: 3P→33P. The claim that time is dense means that between any two times there's a third time. (They must mean this on the assumption that the accessibility relation is also transitive and connected.) Why does this formula correspond to density? Because if time weren't dense, then it could
be that at some time, t, P is future, in virtue of being true at the next moment, t′; but at no future time will P be future, since there's no moment between t and t′, and P is false at all times after t′. Thus, non-dense frames could not be included in our definition of tense-logical validity, for otherwise soundness would fail — this axiom would be a theorem, but not valid.

The formulas we've found to correspond to these conditions on frames don't correspond in the same way if the tense operators are given the first interpretation, specified above. For example, on that interpretation, 3P→33P won't correspond to density. The reason is that on that interpretation, the accessibility relation is naturally taken to be reflexive. But then 3P→33P is automatically valid in reflexive frames, whether or not the accessibility relation is dense. Thus, non-dense frames could still be included in the definition of tense-logical validity.³
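Two of the correspondence claims above can be corroborated by brute force on tiny finite frames. The following Python sketch is mine, not the text's: it checks every valuation of two sentence letters and finds that D1 holds on a three-point linear (reflexive, transitive, connected) frame but fails on a branching one, and that 3P→33P fails on a two-point non-dense frame under the strict earlier-than interpretation.

```python
# A brute-force check (not from the text) of frame-correspondence claims.
# A frame is a dict mapping each world (time) to the set of worlds it sees;
# formulas are truth functions on worlds.

from itertools import product

def box(R, f):
    return lambda w: all(f(v) for v in R[w])

def dia(R, f):
    return lambda w: any(f(v) for v in R[w])

def implies(f, g):
    return lambda w: (not f(w)) or g(w)

def disj(f, g):
    return lambda w: f(w) or g(w)

def valid(worlds, R, schema):
    # schema(R, p, q) yields a truth function; check it at every world
    # under every valuation of the sentence letters p and q
    for pv in product([False, True], repeat=len(worlds)):
        for qv in product([False, True], repeat=len(worlds)):
            p = dict(zip(worlds, pv)).__getitem__
            q = dict(zip(worlds, qv)).__getitem__
            if not all(schema(R, p, q)(w) for w in worlds):
                return False
    return True

def d1(R, p, q):  # 2(2p -> q) v 2(2q -> p)
    return disj(box(R, implies(box(R, p), q)),
                box(R, implies(box(R, q), p)))

def dens(R, p, q):  # 3p -> 33p
    return implies(dia(R, p), dia(R, dia(R, p)))

W = [0, 1, 2]
linear = {w: {v for v in W if w <= v} for w in W}  # reflexive, transitive, connected
branch = {0: {0, 1, 2}, 1: {1}, 2: {2}}            # reflexive, transitive, branching

print(valid(W, linear, d1))   # True: D1 holds on the connected frame
print(valid(W, branch, d1))   # False: D1 fails once branching is allowed

W2 = [0, 1]
strict = {0: {1}, 1: set()}   # strict earlier-than, two times, not dense
print(valid(W2, strict, dens))  # False: 3P -> 33P fails, as described above
```

The failing valuation on the branching frame is exactly the one used in the completeness argument above: 2p→q false at one branch, 2q→p false at the other.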
7.2 Intuitionist propositional logic
7.2.1 Kripke semantics for intuitionist propositional logic⁴
As we saw in section 3.4, thinking of meaning in terms of proof conditions, rather than truth conditions, results in a different propositional logic. The proof conditions associated with the propositional connectives are these:

a proof of ∼φ is a proof that φ leads to a contradiction
a proof of φ∧ψ is a proof of φ and a proof of ψ
a proof of φ∨ψ is a proof of φ or a proof of ψ
a proof of φ→ψ is a construction that can be used to turn any proof of φ into a proof of ψ

Suppose one thinks of a valid formula as one that is provable no matter what. Then these intuitive conditions result in a different set of valid formulas from the usual ones. For instance, the following formula is not a logical truth:

∼∼P→P
³ It is difficult to impose some of these restrictions if one does not assume irreflexivity in the models, and assumes reflexivity instead. I think Cresswell told me that it's impossible to impose certain constraints (maybe density?) if you assume reflexivity.
⁴ See Priest (2001, chapter 6).
for just because one has a proof that ∼P leads to a contradiction, it does not follow that one has a proof of P. Neither is the following a logical truth:

P∨∼P

for one might lack a proof of P as well as a proof that P leads to a contradiction.

As we saw, one can give rules of proof for intuitionism using sequents, simply by beginning with the original sequent calculus and then dropping double-negation elimination while adding ex falso. But we still need a semantics. Here's the Kripke possible worlds semantics for intuitionism. We'll prove soundness (but not completeness) for the proof system of section 3.4 relative to this semantics:

An intuitionist model is a triple 〈W,R,I〉, such that:

i) W is a non-empty set ("worlds")
ii) R is a binary relation over W ("accessibility") that is reflexive, transitive, and obeys the heredity condition: for any sentence letter φ, if I(φ, w) = 1 and Rwv then I(φ, v) = 1
iii) I is a function from sentence letters and worlds to truth values ("interpretation function")

Given any such model, we can define the corresponding valuation function thus:

a) VI(φ, w) = I(φ, w), for any sentence letter φ
b) VI(φ∧ψ, w) = 1 iff VI(φ, w) = 1 and VI(ψ, w) = 1
c) VI(φ∨ψ, w) = 1 iff VI(φ, w) = 1 or VI(ψ, w) = 1
d) VI(∼φ, w) = 1 iff for every v such that Rwv, VI(φ, v) = 0
e) VI(φ→ψ, w) = 1 iff for every v such that Rwv, either VI(φ, v) = 0 or VI(ψ, v) = 1

Note that the truth conditions for → and ∼ at a world w no longer depend exclusively on what w is like; they are sensitive to what happens at worlds accessible from w.

Here is the intuitive idea behind these truth conditions. Think of a world as a state of information at a time. At any world, one has come up with proofs
of some things but not others. When V assigns 1 to a formula at a world, that means intuitively that as of that state of information, the formula has been proven. The assignment of 0 means that the formula has not been proven thus far (though it might nevertheless be proven in the future). The accessibility relation R represents the possible futures. If v is accessible from w, that means that v contains all the proofs in w, plus perhaps more. Given this understanding, reflexivity and transitivity are obviously correct to impose, as is the heredity condition. (Think this through to see that it's so.) Note: the accessibility relation will not in general be symmetric, for sometimes one will come across a new proof that one did not formerly have.

Let's also think through why the truth conditions for →, ∧, ∨ and ∼ are intuitively correct. As of a time, one has proved φ∧ψ iff one has proved both φ and ψ then. As of a time, one has proved φ∨ψ iff one has proved one of the disjuncts. As of a time, one has proved ∼φ iff one has proved that there is no proof of φ. And (this is fudging!) one can in principle prove that there is no proof of φ iff there is no possible extension of one's state of information in which one proves φ.⁵ And: if one has a method of converting proofs of φ into proofs of ψ, then there could never be a possible future in which one has a proof of φ but not one of ψ. Conversely, if one lacks such a method, then it should be possible one day to have a proof of φ without being able to convert it into a proof of ψ, and thus without then having a proof of ψ.⁶

We can now define semantic consequence and validity in the obvious way:

Γ ⊨ φ iff for every model and every world w, if every member of Γ is true in w, so is φ

⊨ φ iff ∅ ⊨ φ
7.2.2 Examples
Let's now examine some examples. First:

Q ⊨ P→Q
⁵ I find a bit of what Priest says about this confusing. Are the worlds idealized so that one has already proven everything one can in principle prove? If so, this makes better sense of the truth condition for ∼. But that's clearly the wrong idea, for then any formula true in any extension of a world should already be assigned truth in that world.
⁶ Again this seems to slip into thinking of the worlds as idealized states.
Take any model and any world w; assume that V(Q, w) = 1 and V(P→Q, w) = 0. Thus, for some v, Rwv and V(P, v) = 1 and V(Q, v) = 0. But this violates heredity.

Next: P→Q ⊨ ∼Q→∼P (contraposition). Suppose V(P→Q, w) = 1 and V(∼Q→∼P, w) = 0. Given the latter, there's some world v such that Rwv and V(∼Q, v) = 1 and V(∼P, v) = 0. Since V(∼P, v) = 0, for some u, Rvu and V(P, u) = 1. Since V(∼Q, v) = 1, V(Q, u) = 0. Given transitivity, Rwu. Given the truth of P→Q at w, either V(P, u) = 0 or V(Q, u) = 1. Contradiction.

Next we will establish two facts connecting semantic consequence to conditionals:

Deduction theorem: if φ ⊨ ψ then ⊨ φ→ψ
Converse deduction theorem: if ⊨ φ→ψ then φ ⊨ ψ

For the first, suppose φ ⊨ ψ, and suppose for reductio that V(φ→ψ, w) = 0. Then for some v (that w sees), V(φ, v) = 1 and V(ψ, v) = 0 — which contradicts φ ⊨ ψ. As for the second, suppose ⊨ φ→ψ, and suppose for reductio that V(φ, w) = 1 while V(ψ, w) = 0. By ⊨ φ→ψ, V(φ→ψ, w) = 1. By reflexivity, either V(φ, w) = 0 or V(ψ, w) = 1. Contradiction.

In section 3.4 we asserted (but did not prove) that ∅ ⊢ P∨∼P is an unprovable sequent. Here we'll establish ⊭ P∨∼P. Here's a model in which P∨∼P is false in world r:

[Diagram: two worlds, r and a, with an arrow from r to a. At r: P is 0 and P∨∼P is 0. At a: P is 1.]

Official model:
W: {r, a}
R: {〈r,r〉, 〈a,a〉, 〈r,a〉}
V(P, a) = 1; all other atomics false everywhere

(I'll skip the official models from now on.) So, ⊭ P∨∼P. Given the soundness of the intuitionist proof system (proved below), it follows that ∅ ⊢ P∨∼P is indeed an unprovable sequent.
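The countermodel can be verified mechanically. Here is a minimal Python sketch (mine, not the text's) of valuation clauses a)–e), run on the official model just given; it also computes ∼∼P at r, which will matter in a moment.

```python
# A sketch (not from the text) of the intuitionist valuation clauses,
# checked against the official model: W = {r, a}, R = {<r,r>, <a,a>, <r,a>},
# and P true only at a.

W = {'r', 'a'}
R = {('r', 'r'), ('a', 'a'), ('r', 'a')}
I = {('P', 'r'): 0, ('P', 'a'): 1}

def V(phi, w):
    # A sentence letter is a string; compound formulas are tuples:
    # ('not', f), ('and', f, g), ('or', f, g), ('->', f, g).
    if isinstance(phi, str):
        return I[(phi, w)]
    seen = [v for v in W if (w, v) in R]
    if phi[0] == 'and':
        return int(V(phi[1], w) == 1 and V(phi[2], w) == 1)
    if phi[0] == 'or':
        return int(V(phi[1], w) == 1 or V(phi[2], w) == 1)
    if phi[0] == 'not':
        return int(all(V(phi[1], v) == 0 for v in seen))
    if phi[0] == '->':
        return int(all(V(phi[1], v) == 0 or V(phi[2], v) == 1 for v in seen))

lem = ('or', 'P', ('not', 'P'))
print(V(lem, 'r'))                    # 0: P v ~P is false at r
print(V(('not', ('not', 'P')), 'r'))  # 1: yet ~~P is true at r
```

Note that ∼P comes out false at r (since r sees a world where P holds) even though P is also false at r, which is exactly why the law of excluded middle fails here.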
Next example: ∼∼P ⊭ P. A countermodel:

[Diagram: worlds r and a, with an arrow from r to a. At r: P is 0 and ∼∼P is 1. At a: P is 1 and ∼P is 0.]
Note: since ∼∼P is true at r, ∼P must be 0 at every world that r sees. Now, Rrr, so ∼P must be false at r. So r must see some world in which P is true. World a takes care of that.

Next: ∼(P∧Q) ⊭ ∼P∨∼Q. A countermodel:

[Diagram: world r, with arrows to worlds a and b. At r: ∼(P∧Q) is 1 and ∼P∨∼Q is 0. At a: P is 1, Q is 0. At b: Q is 1, P is 0. P∧Q is 0 at every world.]

Next example: ∼P∨∼Q ⊨ ∼(P∧Q):
Suppose V(∼P∨∼Q, w) = 1 and V(∼(P∧Q), w) = 0. Given the latter, for some v, Rwv and V(P∧Q, v) = 1. So, V(P, v) = 1 and V(Q, v) = 1. Given the former, either V(∼P, w) = 1 or V(∼Q, w) = 1. If the former, then V(P, v) = 0; if the latter, V(Q, v) = 0. Contradiction either way.

Next: (P∧Q)→S ⊭ (P→S)→(Q→S). We start thus:

[Diagram: world r, at which (P∧Q)→S is 1 and (P→S)→(Q→S) is 0.]

Then we discharge the false conditional in r:

[Diagram: r as before, with an arrow to a new world a, at which P→S is 1 and Q→S is 0.]

and then we discharge the false conditional in a:

[Diagram: r and a as before, with an arrow from a to a new world b, at which Q is 1 and S is 0.]
At this point all the false conditionals have been discharged. But there are two true conditionals, in worlds r and a, that we need to attend to. A true conditional in essence says that the truth condition of the classical material conditional must hold in every accessible world. Thus, let's make our model look like this:

[Diagram: worlds r, a, and b, with arrows from r to a, from a to b, and from r to b. At r: P and Q are 0, (P∧Q)→S is 1, and (P→S)→(Q→S) is 0. At a: P is 0, P→S is 1, and Q→S is 0. At b: P and S are 0, and Q is 1.]
Since P is false in r, P ∧Q is false in r, and so the conditional (P ∧Q)→S is satisfied, so far as r is concerned. Moreover, that conditional is also satisfied so far as a is concerned, since we have made P , and so P ∧Q, false in a. As for the true conditional P →S in a, it is satisfied in a and in b by the fact that P is false in both of those worlds.
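The finished model can be checked mechanically. The following Python sketch is mine, and the exact model is a reconstruction offered as an assumption: r sees a and b, a sees b (plus reflexive loops), Q is true only at b, and every other sentence letter is false everywhere, which matches the verification just given.

```python
# A mechanical check (not from the text) of a model of the kind just
# described; the particular valuation is an assumed reconstruction.

W = {'r', 'a', 'b'}
R = {('r', 'r'), ('a', 'a'), ('b', 'b'), ('r', 'a'), ('a', 'b'), ('r', 'b')}
I = {(p, w): 0 for p in 'PQS' for w in W}
I[('Q', 'b')] = 1  # Q true only at b; P and S false everywhere

def V(phi, w):
    # Sentence letters are strings; ('and', f, g) and ('->', f, g)
    # implement valuation clauses b) and e).
    if isinstance(phi, str):
        return I[(phi, w)]
    seen = [v for v in W if (w, v) in R]
    if phi[0] == 'and':
        return int(V(phi[1], w) == 1 and V(phi[2], w) == 1)
    if phi[0] == '->':
        return int(all(V(phi[1], v) == 0 or V(phi[2], v) == 1 for v in seen))

prem = ('->', ('and', 'P', 'Q'), 'S')              # (P & Q) -> S
conc = ('->', ('->', 'P', 'S'), ('->', 'Q', 'S'))  # (P -> S) -> (Q -> S)
print(V(prem, 'r'))  # 1: the premise is true at r
print(V(conc, 'r'))  # 0: the conclusion is false at r
```

Since P is false everywhere, (P∧Q)→S holds vacuously at every world, while the conclusion fails at r via world a, exactly as in the discharge procedure above.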
7.2.3 Soundness and other facts about intuitionist validity
Let's establish a couple of important facts.

Everything intuitionistically valid is standardly valid. Proof: Suppose Γ ⊨Int φ (that is, φ is a semantic consequence of Γ under the intuitionist definition), and let I be a classical interpretation in which every member of Γ is true; we must show that VI(φ) = 1. Consider the intuitionist model with just one world, in which formulas have the same truth values as they have in the classical interpretation — i.e., 〈{r}, {〈r,r〉}, I*〉, where I*(α, r) = I(α) for each sentence letter α. It's easy to check that since the intuitionist model has only one world, the classical and intuitionist truth conditions collapse in this case, so that for every wff φ, VI*(φ, r) = VI(φ).
So, since every member of Γ is true in I, every member of Γ is true at r in the intuitionist model. Since Γ ⊨Int φ, it follows that φ is true at r in the intuitionist model; and so, φ is true in the classical interpretation — i.e., VI(φ) = 1.

"General heredity": the heredity condition holds for all formulas, not just atomics. (I.e., for any wff φ, whether atomic or not, and any worlds w and v in any model, if V(φ, w) = 1 and Rwv, then V(φ, v) = 1.) Proof: by induction. The base case is just the official heredity condition. Next we make the inductive hypothesis: heredity holds for formulas φ and ψ; we must now show that heredity also holds for ∼φ, φ→ψ, φ∧ψ, and φ∨ψ:

∼: Suppose for reductio that V(∼φ, w) = 1, Rwv, and V(∼φ, v) = 0. Given the latter, for some u, Rvu and V(φ, u) = 1. By transitivity, Rwu. This contradicts V(∼φ, w) = 1.

→: Suppose for reductio that V(φ→ψ, w) = 1, Rwv, and V(φ→ψ, v) = 0. Given the latter, for some u, Rvu, V(φ, u) = 1, and V(ψ, u) = 0; but by transitivity, Rwu — contradicting the fact that V(φ→ψ, w) = 1.

∧: Suppose for reductio that V(φ∧ψ, w) = 1, Rwv, and V(φ∧ψ, v) = 0. Given the former, V(φ, w) = 1 and V(ψ, w) = 1. By the inductive hypothesis, V(φ, v) = 1 and V(ψ, v) = 1 — contradiction.

∨: Suppose for reductio that V(φ∨ψ, w) = 1, Rwv, and V(φ∨ψ, v) = 0. Given the former, either V(φ, w) = 1 or V(ψ, w) = 1; and so, given the inductive hypothesis, either φ or ψ is true in v. That violates V(φ∨ψ, v) = 0.

The proof system of section 3.4 is sound. Proof: Say that a sequent Γ ⊢ φ is intuitionistically valid ("I-valid") iff for every world in every intuitionist model, if every member of Γ is true at that world ("V(Γ, w) = 1", for short), then V(φ, w) = 1. To say that the system of section 3.4 is sound is to say that every sequent that is provable in that system is I-valid.
Since a provable sequent is the last sequent in some proof, all we need to show is that every sequent in any proof is I-valid. And to do that, all we need to
show is that i) the rule of assumptions generates I-valid sequents, and ii) all the other rules preserve I-validity.

The rule of assumptions generates sequents of the form φ ⊢ φ, which are clearly I-valid. As for the other rules:

∧I: Here we assume that the inputs to ∧I are I-valid, and show that its output is I-valid. That is, we assume that Γ ⊢ φ and ∆ ⊢ ψ are I-valid sequents, and we must show that it follows that Γ, ∆ ⊢ φ∧ψ is also I-valid. So, consider any model with valuation V and any world w such that V(Γ ∪ ∆, w) = 1, and suppose for reductio that V(φ∧ψ, w) = 0. Since Γ ⊢ φ is I-valid, V(φ, w) = 1; since ∆ ⊢ ψ is I-valid, V(ψ, w) = 1; contradiction.

∧E: Assume that Γ ⊢ φ∧ψ is I-valid, and suppose for reductio that V(Γ, w) = 1 and V(φ, w) = 0, for some world w in some model. By the I-validity of Γ ⊢ φ∧ψ, V(φ∧ψ, w) = 1, so V(φ, w) = 1. Contradiction. The case of ψ is symmetric.

∨I: Assume that Γ ⊢ φ is I-valid and suppose for reductio that V(Γ, w) = 1 but V(φ∨ψ, w) = 0. By the I-validity of Γ ⊢ φ, V(φ, w) = 1; but since V(φ∨ψ, w) = 0, V(φ, w) = 0 — contradiction.

∨E: Assume that Γ ⊢ φ∨ψ, ∆1, φ ⊢ Π, and ∆2, ψ ⊢ Π are all I-valid, and suppose for reductio that V(Γ ∪ ∆1 ∪ ∆2, w) = 1 but V(Π, w) = 0. The first assumption tells us that V(φ∨ψ, w) = 1, so either φ or ψ is true at w. If the former, then the second assumption tells us that V(Π, w) = 1; if the latter, then the third assumption tells us that V(Π, w) = 1. Either way, we have a contradiction.

DN (the remaining version: if Γ ⊢ φ then Γ ⊢ ∼∼φ): Assume Γ ⊢ φ is I-valid, and suppose for reductio that V(Γ, w) = 1 but V(∼∼φ, w) = 0. From the latter, for some v, Rwv and V(∼φ, v) = 1. So V(φ, v) = 0 (since R is reflexive). From the former and the I-validity of Γ ⊢ φ, V(φ, w) = 1. This violates general heredity.

RAA: Suppose that Γ, φ ⊢ ψ∧∼ψ is I-valid, and suppose for reductio that V(Γ, w) = 1 but V(∼φ, w) = 0. Then for some v, Rwv and V(φ, v) = 1.
Since V(Γ, w) = 1 (i.e., all members of Γ are true at w), by general heredity V(Γ, v) = 1 (all members of Γ are true at v). Thus, since Γ, φ ⊢ ψ∧∼ψ is I-valid, V(ψ∧∼ψ, v) = 1. But that is
impossible. (If V(ψ∧∼ψ, v) = 1, then V(ψ, v) = 1 and V(∼ψ, v) = 1; but from the latter and the reflexivity of R it follows that V(ψ, v) = 0.)

→I: Suppose that Γ, φ ⊢ ψ is I-valid, and suppose for reductio that V(Γ, w) = 1 but V(φ→ψ, w) = 0. Given the latter, for some v, Rwv, V(φ, v) = 1, and V(ψ, v) = 0. Given general heredity, V(Γ, v) = 1. And so, given that Γ, φ ⊢ ψ is I-valid, V(ψ, v) = 1 — contradiction.

→E: Suppose that Γ ⊢ φ and ∆ ⊢ φ→ψ are both I-valid, and suppose for reductio that V(Γ ∪ ∆, w) = 1 but V(ψ, w) = 0. Since Γ ⊢ φ and ∆ ⊢ φ→ψ are I-valid, V(φ, w) = 1 and V(φ→ψ, w) = 1. Given the latter, and given that R is reflexive, either V(φ, w) = 0 or V(ψ, w) = 1. Contradiction.

EF (ex falso): Suppose Γ ⊢ φ∧∼φ is I-valid, and suppose for reductio that V(Γ, w) = 1 but V(ψ, w) = 0. Given the former and the I-validity of Γ ⊢ φ∧∼φ, V(φ∧∼φ, w) = 1, which is impossible.

Note that the rule we dropped, double-negation elimination, does not preserve I-validity. For suppose it did. The sequent ∼∼P ⊢ ∼∼P is obviously I-valid, and thus the sequent ∼∼P ⊢ P, which follows from it by double-negation elimination, would also be I-valid. But it isn't: in the last section we gave a model and a world r in which ∼∼P is true and P is false.
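General heredity, which the induction above establishes once and for all, can also be corroborated by brute force on all two-world models. The following Python sketch is mine, not the text's; it enumerates every reflexive relation on {0, 1} (all of which happen to be transitive), every atomic valuation obeying the heredity condition, and a handful of sample compound formulas.

```python
# A brute-force corroboration (not from the text) of general heredity
# on every intuitionist model with worlds {0, 1}.

from itertools import product

W = [0, 1]

def V(phi, w, R, I):
    # Sentence letters are strings; compound formulas are tuples.
    if isinstance(phi, str):
        return I[(phi, w)]
    seen = [v for v in W if (w, v) in R]
    if phi[0] == 'and':
        return int(V(phi[1], w, R, I) == 1 and V(phi[2], w, R, I) == 1)
    if phi[0] == 'or':
        return int(V(phi[1], w, R, I) == 1 or V(phi[2], w, R, I) == 1)
    if phi[0] == 'not':
        return int(all(V(phi[1], v, R, I) == 0 for v in seen))
    if phi[0] == '->':
        return int(all(V(phi[1], v, R, I) == 0 or V(phi[2], v, R, I) == 1
                       for v in seen))

samples = ['P', ('not', 'P'), ('->', 'P', 'Q'), ('or', 'P', ('not', 'P')),
           ('and', ('not', 'P'), 'Q')]

nonloops = [(0, 1), (1, 0)]
ok = True
for extra in product([0, 1], repeat=2):
    # every reflexive relation on two worlds is automatically transitive
    R = {(w, w) for w in W} | {p for p, on in zip(nonloops, extra) if on}
    for pv in product([0, 1], repeat=4):
        I = {('P', 0): pv[0], ('P', 1): pv[1], ('Q', 0): pv[2], ('Q', 1): pv[3]}
        if any(I[(s, w)] == 1 and (w, v) in R and I[(s, v)] == 0
               for s in 'PQ' for w in W for v in W):
            continue  # atomic heredity violated: not an intuitionist model
        for phi in samples:
            for (w, v) in R:
                if V(phi, w, R, I) == 1 and V(phi, v, R, I) == 0:
                    ok = False
print(ok)  # True: heredity extended to every sampled compound formula
```

A check like this is no substitute for the induction, of course: it covers only two-world models and five formulas, but it is a useful sanity test of the valuation clauses.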
Chapter 8

Counterfactuals¹

There are certain conditionals in English that are not well represented either by the material conditional or by the strict conditional. Some of these have been called "counterfactual" conditionals: conditionals phrased in the subjunctive mood,² roughly of the form:

If it had been that P, then it would have been that Q

for instance:

If I had struck this match, it would have lit

We represent such a conditional as:

P 2→Q

What should the logic for these conditionals be? Our language will be that of MPL, plus a new binary sentential connective 2→. We will develop a definition of validity for this new language, but we won't introduce an axiomatic system; we'll never define 'theorem'.
¹ This section is adapted from my notes from Ed Gettier's fall 1988 modal logic class.
² Thus, counterfactuals must be distinguished from English conditionals phrased in the indicative mood. Counterfactuals are generally thought to differ semantically from indicative conditionals. A famous example: the counterfactual conditional 'If Oswald hadn't shot Kennedy, someone else would have' is false (assuming that certain conspiracy theories are false and Oswald was acting alone); but the indicative conditional 'If Oswald didn't shoot Kennedy then someone else did' is true (we know that someone shot Kennedy, so if it wasn't Oswald, it must have been someone else). The semantics of indicative conditionals is an important topic in its own right, but we won't take up that topic here.
8.1 Logical Features of English counterfactuals
Why do we need a new logic for the counterfactual conditional? Why not just use the truth-functional →? The reason is that there are several ways in which English counterfactual conditionals appear to differ from the material conditional. We'll go through some examples to find out which English sentences involving counterfactuals are, or are not, logically true; then we'll look at a semantics for subjunctive conditionals that has the desired features.
8.1.1 Not truth-functional
Our system for counterfactuals should have the following features:

⊭ ∼P→(P 2→Q)
⊭ Q→(P 2→Q)

I did not strike the match. But it doesn't follow that if I had struck the match, it would have lit. Suppose, for example, that it is underwater. So ∼P→(P 2→Q) shouldn't be valid. Similarly, Clinton won the last election, but it doesn't follow that if the newspapers had discovered beforehand that Clinton had an affair with Al Gore, he would still have won. So the second formula shouldn't be valid either. These wffs are valid for the material conditional, however:

∼P→(P→Q)
Q→(P→Q)

Thus, → does not adequately symbolize the English counterfactual conditional.
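That the two schemas really are valid for the material conditional can be confirmed by a two-line truth-table check. This Python sketch is mine, not the text's:

```python
# A brute-force truth-table check (not from the text) that the two
# schemas are tautologies for the material conditional, which is exactly
# why -> cannot capture the counterfactual.

from itertools import product

imp = lambda a, b: (not a) or b  # the material conditional

rows = list(product([False, True], repeat=2))
assert all(imp(not p, imp(p, q)) for p, q in rows)  # ~P -> (P -> Q)
assert all(imp(q, imp(p, q)) for p, q in rows)      # Q -> (P -> Q)
print("both schemas hold in every row of the truth table")
```

Since each schema is true on all four rows, any truth-functional reading of "if it had been that P..." would make the match and Clinton inferences valid, contrary to the examples above.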
8.1.2 Can be contingent
It's not true, presumably, that if Oswald hadn't shot Kennedy, then someone else would have (assuming that the conspiracy theory is false). But the conspiracy theory might have been true, and in that world it would be true that if Oswald hadn't shot Kennedy, someone else would have. Thus, the following should be possible:

∼(P 2→Q)∧3(P 2→Q)

Likewise, the following should be possible:
(P 2→Q)∧3∼(P 2→Q)

When I say these things should be "possible", I mean that our logic shouldn't rule out such a situation — that is, the following formulas should not be valid formulas:

(P 2→Q)→2(P 2→Q)
∼(P 2→Q)→2∼(P 2→Q)

One reason this is important is that it shows an obstacle to using the strict conditional ⇒ to represent the counterfactual conditional. For remember that P⇒Q is defined as 2(P→Q). Thus, it is a theorem of S4 that:

(P⇒Q)→2(P⇒Q)

Moreover, it is a theorem of S5 that:

∼(P⇒Q)→2∼(P⇒Q)
8.1.3 No augmentation
Consider the argument form called augmentation. Augmentation is valid for → and for ⇒:

P→Q; therefore, (P∧R)→Q
P⇒Q; therefore, (P∧R)⇒Q
The formula corresponding to the second argument (augmentation for ⇒), [P⇒Q]→[(P∧R)⇒Q], is K-valid, as you can easily check, and hence valid in all our systems. However, augmentation is famously not valid for subjunctive conditionals of English. Consider:

If I were to strike the match, it would light. Therefore, if I were to strike the match and I were in outer space, it would light.

So, our next desideratum is that the corresponding argument should not be valid for the counterfactual conditional 2→. That is, the conditional corresponding to this sort of argument, namely:
[P 2→Q]→[(P∧R)2→Q]

should not be a valid formula, when we give our definition of validity below. I'll quit writing these formulas down — I'll just talk about arguments being valid.
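The K-validity claim for augmentation on the strict conditional can be corroborated by brute force. This Python sketch is mine, not the text's: it evaluates 2(P→Q)→2((P∧R)→Q) at every world of every model on two worlds, for every accessibility relation and valuation. (Exhausting tiny models corroborates, but of course does not prove, validity; here the formula holds simply because P∧R entails P at each accessible world.)

```python
# A brute-force check (not from the text) of augmentation for the strict
# conditional, [P => Q] -> [(P & R) => Q], over all two-world models.

from itertools import product

W = [0, 1]
pairs = [(w, v) for w in W for v in W]  # all possible accessibility arrows

holds = True
for accbits in product([0, 1], repeat=len(pairs)):
    acc = {pr for pr, on in zip(pairs, accbits) if on}
    for vals in product([0, 1], repeat=6):
        P = {0: vals[0], 1: vals[1]}
        Q = {0: vals[2], 1: vals[3]}
        Rl = {0: vals[4], 1: vals[5]}  # the sentence letter "R"
        for w in W:
            seen = [v for v in W if (w, v) in acc]
            strict_pq = all((not P[v]) or Q[v] for v in seen)            # 2(P -> Q)
            strict_prq = all((not (P[v] and Rl[v])) or Q[v] for v in seen)
            if strict_pq and not strict_prq:
                holds = False
print(holds)  # True: no countermodel among two-world models
```

By contrast, the match-in-outer-space example shows that no such guarantee should hold once 2→ gets its own semantics.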
8.1.4 No contraposition
Contraposition is valid for → and for ⇒:

φ→ψ; therefore, ∼ψ→∼φ
φ⇒ψ; therefore, ∼ψ⇒∼φ

but not for 2→. Suppose I'm on the firing squad, and we just shot someone. Suppose I learned that my gun was loaded, but so were those of the others. Then the premise of the following argument is true, while its conclusion is false:

If my gun hadn't been loaded, he would still be dead. Therefore, if he weren't dead, my gun would have been loaded.
8.1.5 Some validities
The following argument should be valid:

P 2→Q; therefore, P→Q

That is, the counterfactual conditional should imply the material conditional. It then follows that modus ponens and modus tollens are valid for 2→:

P, P 2→Q; therefore, Q
P 2→Q, ∼Q; therefore, ∼P

(The reason is, of course, that modus ponens and modus tollens are valid for →.) (Note that it's not inconsistent to say that modus tollens holds for 2→ while contraposition fails.) Another validity: the strict conditional should imply the counterfactual:
P⇒Q; therefore, P 2→Q

To see that these implications hold, consider first the argument from the strict conditional to the counterfactual conditional: surely, if P entails Q, then if P were true, Q would be as well. As for the counterfactual implying the material conditional, suppose you think that if P were true, Q would also be true. Now suppose someone tells you that P is true but Q is false. Wouldn't you then need to give up your original claim that if P were true, Q would be true? It seems so. So, the statement P 2→Q isn't consistent with P∧∼Q — that is, it isn't consistent with the denial of P→Q.
8.1.6 Context dependence
This is perhaps the most notorious feature of the counterfactual conditional. Consider the sentence:

If Albert Einstein were to fight Mike Tyson, he would win.

Is this true? Seems not. After all, Mike Tyson is a champion boxer, incarcerated though he is. But wait: Einstein is a smart guy. Einstein wouldn't fight Tyson unless, for some reason, he knew he would win. So the conditional seems true. Another example:

If Syracuse were in Louisiana, it would be warm in Syracuse in the winter.

True or false? It might seem true: Louisiana is in the south. But wait — perhaps Syracuse would be in Louisiana because the borders of Louisiana would then be extended north to our present location.

In these cases, which answer is correct? No one answer is correct. In some contexts, it would be correct to give one answer; in others, the other answer would be correct. Take the first case. When we imagine Einstein fighting Tyson, we imagine reality being different in certain respects from actuality — we imagine Einstein fighting Tyson, for example. In other respects, we imagine a situation that is a lot like reality — we don't imagine a situation, for example, in which people explode instantaneously when they start to fight. Now, when considering counterfactuals, there is a question of what parts of reality we hold constant. In the Einstein-Tyson case, we seem to have at least two choices. Do
CHAPTER 8. COUNTERFACTUALS
175
we hold constant the brains of Einstein, and consider what would then happen if Einstein were to fight Tyson? Or do we hold constant the strength of Tyson? In the second case, we can hold constant either the geographical location of Syracuse, or the border of Louisiana. What determines which things are to be held constant when we evaluate the truth value of a counterfactual? It would seem that the context of utterance of the counterfactual does. Take the second counterfactual. Suppose I say: “You know, I wish Syracuse were in Louisiana. We’d be so warm now….” What I said in this context was true, for I created a context in which we hold constant the features of actuality that we must in order for the counterfactual to be true. In another context, we might speak in such a way as to make the counterfactual false. We might say: “You know, Louisiana is statistically the warmest state in the country. Good thing Syracuse isn’t in Louisiana, because that would ruin the statistic. It would be freezing in Louisiana if Syracuse was in Louisiana.” My utterance seems true here too. What, does just saying this sentence, intending to make it true, make it true? Well, in a sense, yes. When a certain sentence has a meaning that is partly determined by context, then when a person utters that sentence with the intention of saying something true, that tends to create a context in which the sentence is true. Compare ‘flat’ — we’ll say “the table is flat”, and thereby utter a truth. But a scientist looking at the same table might say: “You know, macroscopic objects are far from being flat. Take that table, for instance. It isn’t flat at all — when viewed under a microscope, it can be seen to have a very irregular surface.” The term ‘flat’ has a certain amount of vagueness — how flat does a thing have to be to count as being “flat”? Well, the amount required is determined by context.³
8.2 The Lewis/Stalnaker approach
Here is the core idea of David Lewis’s (1973) and Robert Stalnaker’s (1968) account of how to interpret counterfactual conditionals. Consider a counterfactual conditional P 2→Q. To determine its truth value, Lewis and Stalnaker instruct us to consider the possible world that is as similar to reality as possible in which P is true. The counterfactual is then true in the actual world if and only if Q is true in that possible world. Consider Lewis’s example:
³ See Lewis (1979).
If kangaroos had no tails, they would topple over.

When we consider the possible world that would be actual if kangaroos had no tails, we do not depart gratuitously from actuality. For example, we do not consider a world in which kangaroos have wings, or crutches. We do not consider a world with different laws of nature, in which there is no gravity. We keep the kangaroos as they actually are, but remove the tails, and we keep the laws of nature as they actually are. It seems that the kangaroos would then fall over. Take the examples of the previous section, in which I got you to give differing answers about certain sentences. Consider:

If Einstein fought Tyson, he’d win.

How does the contextual dependence of this sentence work, on the Lewis-Stalnaker view? By supplying different standards of comparison of similarity. Think about similarity for a moment: things can be similar in certain respects while not being similar in other respects. A blue square is similar to a blue circle in respect of color, but not in respect of shape. Now, according to Lewis and Stalnaker, when we answer affirmatively to this counterfactual, we are considering the possible world most similar to the actual world in which Einstein and Tyson fight using a kind of similarity that weights Einstein’s intelligence heavily. When we count the counterfactual false, we are using a kind of similarity that weights Tyson’s physical features very heavily.
8.3 Stalnaker’s system⁴
I now lay out Stalnaker’s system, SC (for “modal Stalnaker Conditionals”):
8.3.1 Syntax of SC
Definition of SC-wffs:

i) sentence letters are wffs
ii) if φ, ψ are wffs, then (φ→ψ), ∼φ, 2φ, and (φ2→ψ) are wffs
iii) nothing else is a wff
⁴ See Stalnaker (1968). The version of the theory I present here is slightly different from Stalnaker’s original version; see Lewis (1973, p. 79).
If we wanted, we could next define a notion of theoremhood for SC, by choosing some axioms and rules of inference. We won’t do this, though; we’ll go straight to semantics.
8.3.2 Semantics of SC
First, let’s define two formal properties for relations: strong connectivity and anti-symmetry. Let R be a binary relation over A:

strong connectivity (in A): for any x, y ∈ A, either Rxy or Ryx
anti-symmetry: for any x, y, if Rxy and Ryx then x = y

Another bit of notation: where R is a three-place relation, let us abbreviate “Rxyz” as “R_z xy”. Then there is a certain two-place relation we can get by “plugging up” one place of R with a certain object u — we define it as follows: R_u is the relation that holds between x and y iff R_u xy. Now we have our definition of an SC-model (for “Stalnaker-counterfactuals”):

An SC-model is an ordered triple 〈W, ≤, V〉, where:

i) W is a nonempty set. (As before, think of the members of W as “worlds”. Also, let’s use the variables w, x, y, z, and so on, as variables for worlds, so we can say “for any w” instead of “for any w ∈ W”.)

ii) V is a function that assigns truth values to formulas relative to members of W (the “valuation function”)

iii) ≤ is a three-place relation over W (the “nearness relation”; read “x ≤_z y” as “x is at least as near to/similar to z as y is”)

iv) V and ≤ satisfy the following conditions:

C1: for any w, ≤_w is strongly connected (in W)
C2: for any w, ≤_w is transitive
C3: for any w, ≤_w is anti-symmetric
C4: for any x, y: x ≤_x y (“Base”)
C5: for any SC-wff φ, provided φ is true in at least one world: for every z, there’s some w such that V(φ, w) = 1, and such that for any x, if V(φ, x) = 1 then w ≤_z x (“Limit”)
v) for all SC-wffs φ, ψ, and for all w ∈ W:

a) V(∼φ, w) = 1 iff V(φ, w) = 0
b) V(φ→ψ, w) = 1 iff either V(φ, w) = 0 or V(ψ, w) = 1
c) V(2φ, w) = 1 iff for any v, V(φ, v) = 1
d) V(φ2→ψ, w) = 1 iff for any x, IF [V(φ, x) = 1 and for any y such that V(φ, y) = 1, x ≤_w y] THEN V(ψ, x) = 1
Phew! Let’s look into what this means. First, notice that much of this is exactly the same as for our MPL models — we still have the set of worlds, and formulas being given truth values at worlds. We’ll still say that φ is true “at w” iff V(φ, w) = 1. What happened to the accessibility relation? It has simply been dropped, in favor of a simplified truth clause for the 2 — clause c): 2φ is true iff φ is true at all worlds in the model, not just all accessible worlds. It turns out that this in effect just gives us an S5 logic for the 2, for you get the same valid MPL formulas whether you make the accessibility relation an equivalence relation or a total relation. Clearly, if φ is valid in all equivalence-relation models, then it is valid in all total models, since every total relation is an equivalence relation. What’s more, the converse is true — if φ is valid in all total models then it’s also valid in all equivalence-relation models. Rough proof: let φ be any formula that’s valid in all total models, and let M be any equivalence-relation model. We need to show that φ is true at an arbitrary world r ∈ W (M’s set of worlds). Now, any equivalence relation partitions its domain into non-overlapping subsets in which each world sees every other world. So W is divided up into one or more non-overlapping subsets. One of these, W_r, contains r. Now, consider a model, M′, just like M, but whose set of worlds is just W_r. M′ is a total model, so φ is valid in it by hypothesis. Thus, in this model, φ is true at r. But then φ is true at r in M as well. Why? Well, this is the rough bit, but it’s intuitively correct: the truth value of φ at r in M isn’t affected by what goes on outside r’s partition, since chains of modal operators just take us to worlds seen by r, and worlds seen by worlds seen by r, and so on. Such chains will never have us “look at” anything outside r’s partition, since those worlds are utterly unconnected to r via the accessibility relation.
So φ’s truth value at r in M is determined by what goes on in W_r, and so is the same as its truth value at r in M′. So, we get the same class of valid formulas whether we require the accessibility relation to be total, or an equivalence relation. Things are easier if we make it a total relation, because then we can simply drop talk of the accessibility
relation, and define necessity as truth at all worlds. The corresponding clause for possibility is:

e) V(3φ, w) = 1 iff for some v, V(φ, v) = 1

The derived clauses for the other connectives remain the same:

f) V(φ∧ψ, w) = 1 iff V(φ, w) = 1 and V(ψ, w) = 1
g) V(φ∨ψ, w) = 1 iff V(φ, w) = 1 or V(ψ, w) = 1
h) V(φ↔ψ, w) = 1 iff V(φ, w) = V(ψ, w)

Next, what about this nearness relation? Think of it as the similarity relation we talked about before. In order to evaluate whether x ≤_w y, we place ourselves in possible world w, and we ask whether x is at least as similar to our world as y is. (Recall the point that counterfactual conditionals are highly context dependent. In a full treatment of counterfactuals, we would complicate our semantics by introducing contexts of utterance, and evaluate sentences relative to these contexts of utterance. The point of this would be to allow different nearness relations in the different contexts. But that is beyond the scope of this course.) I say we “can think of” ≤ as a similarity relation, but take this with a grain of salt — just as the members of W can be any old things, so ≤ can be any old relation over W. Just as the members of W could be fish, so ≤ could be any old relation over fish. In logic, we just use these models to define the notion of validity. (Exactly how these logical systems relate to the problem of giving real truth conditions for English counterfactuals is tricky, and again beyond the scope of this course.) The conditions C1–C5 govern the formal properties of the nearness relation — certain of them, at least, seem plausible constraints on ≤ if it is to be thought of as a similarity relation. C1 simply says that it makes sense to compare any two worlds in respect of similarity to a given world. C2 has a transparent meaning. C3 means “no ties” — it says that, relative to a given base world w, it is never the case that there are two distinct worlds x and y such that each is at least as close to w as the other.
C4 is the “base” condition — it says that every world is at least as close to itself as every other. Given C3, it has the further implication that every world is closer to itself than every other. (We define “x is closer to w than y is” (x ≺_w y) to mean: x ≤_w y and not: y ≤_w x.) C5 is called the “limit” assumption: according to it, for any formula φ and any base world w, there is some world that is a closest world to w at which φ is true (that is, unless φ isn’t true at any worlds at all). This rules out the following possibility: there
are no closest φ worlds, only an infinite chain of φ worlds, each of which is closer than the previous. Certain of these assumptions have been challenged, especially C3 and C5. We will consider those issues below. Note that the valuation function for a model was built directly into the model. This is in contrast to our earlier definitions of models, in which we equipped models with interpretation functions of different sorts, and then defined the resultant valuation functions accordingly. The valuation function is built into the definition of an SC-model because the limit assumption constrains SC-models, and the limit assumption is stated in terms of the valuation function. Given our definitions, we can define SC-validity:

φ is SC-valid =_df φ is true at every world in every SC-model
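The truth clauses above are mechanical enough to be run on a computer. Here is a minimal sketch of such an evaluator — our own illustration, not part of Stalnaker’s system; all names (holds, leq, the tuple encoding of wffs) are invented for the sketch. Formulas are nested tuples, and leq(w, x, y) encodes “x ≤_w y”:

```python
# A toy evaluator for the SC truth clauses a)-d) above.
# Worlds are strings; V maps (sentence letter, world) pairs to True/False;
# leq(w, x, y) encodes the nearness relation: "x is at least as near to w as y".

def holds(wff, w, worlds, V, leq):
    """Evaluate a wff at world w. Wffs are tuples:
    ('atom', 'P'), ('not', A), ('if', A, B), ('box', A), ('cf', A, B)."""
    op = wff[0]
    if op == 'atom':                       # sentence letters: look up V
        return V[(wff[1], w)]
    if op == 'not':                        # clause a)
        return not holds(wff[1], w, worlds, V, leq)
    if op == 'if':                         # clause b): material conditional
        return (not holds(wff[1], w, worlds, V, leq)
                or holds(wff[2], w, worlds, V, leq))
    if op == 'box':                        # clause c): truth at ALL worlds
        return all(holds(wff[1], v, worlds, V, leq) for v in worlds)
    if op == 'cf':                         # clause d): the counterfactual
        A, B = wff[1], wff[2]
        for x in worlds:
            is_nearest_A = holds(A, x, worlds, V, leq) and all(
                leq(w, x, y) for y in worlds if holds(A, y, worlds, V, leq))
            if is_nearest_A and not holds(B, x, worlds, V, leq):
                return False               # a nearest-to-w A-world falsifies B
        return True                        # includes the vacuous case

    raise ValueError('unknown operator: %r' % op)

# Tiny one-world example: P true, Q false at the sole world.
ws = ['w']
V = {('P', 'w'): True, ('Q', 'w'): False}
leq = lambda base, x, y: True              # trivially satisfies C1-C5 here
print(holds(('cf', ('atom', 'P'), ('atom', 'Q')), 'w', ws, V, leq))  # False
```

Of course, nothing checks that leq actually satisfies C1–C5; the sketch just implements the truth clauses, assuming a well-behaved nearness relation is supplied.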
8.4 Validity proofs in SC
Given this semantic system, we can give semantic validity proofs just as we did for the various modal systems. Consider the formula (P∧Q)→(P 2→Q). We want to show that this is SC-valid. Thus, we pick an arbitrary SC-model 〈W, ≤, V〉, pick an arbitrary world r ∈ W, and show that this formula is true at r:

i) Suppose for reductio that V((P∧Q)→(P 2→Q), r) = 0
ii) Then V(P, r) = 1, V(Q, r) = 1, and V(P 2→Q, r) = 0
iii) The truth condition for 2→ says that P 2→Q is true at r iff Q is true at every closest-to-r P world. So since P 2→Q is false at r, there must be a closest-to-r P world at which Q is false — that is, there is some world a such that:
   a) V(P, a) = 1
   b) for any x, if V(P, x) = 1 then a ≤_r x
   c) V(Q, a) = 0
iv) From ii), since V(P, r) = 1, by iii-b) we have a ≤_r r
v) By “base”, r ≤_r a
vi) So, by anti-symmetry, r = a. But now Q is both true and false at r

Here is another validity proof, for the formula {[P 2→Q]∧[(P∧Q) 2→R]}→[P 2→R]. (This formula is worth taking note of, because it is valid despite its similarity to the invalid formula {[P 2→Q]∧[Q 2→R]}→[P 2→R]):
i. Suppose for reductio that P 2→Q and (P∧Q) 2→R are true at r, but P 2→R is false there.
ii. Then there’s a world, a, that is a nearest-to-r P world at which R is false.
iii. Since P 2→Q is true at r, Q is true in all the nearest-to-r P worlds, and so V(Q, a) = 1.
iv. Note now that a is a nearest-to-r P∧Q world:
   • P and Q are both true there, so P∧Q is true there.
   • Let x be any P∧Q world. x is then a P world. But since a is a nearest-to-r P world, we know that a ≤_r x. (Remember: “a is a nearest-to-r P world” means: “V(P, a) = 1, and for any x, if V(P, x) = 1 then a ≤_r x”.)
v. Since (P∧Q) 2→R is true at r, it follows that R is true at a. This contradicts ii.
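For very small cases, validity claims like the first one can also be spot-checked by brute force. The sketch below is our own (it is not from the text, and it covers only two-world models). It exploits the fact that with exactly two worlds, conditions C1–C5 force a unique ordering ≤_w — w itself comes first, the other world second — so we can simply enumerate every two-world SC-model over the letters P and Q and confirm that (P∧Q)→(P 2→Q) is true at every world of every one of them:

```python
# Exhaustively check (P & Q) -> (P cf Q) over all 2-world SC-models.
from itertools import product

worlds = ['r', 'a']

def nearest(w, ant_worlds):
    """Nearest-to-w worlds among ant_worlds, under the forced 2-world
    ordering (w itself is strictly nearest whenever it qualifies)."""
    if not ant_worlds:
        return []
    return [w] if w in ant_worlds else ant_worlds

for vals in product([0, 1], repeat=4):       # P,Q truth values at r and a
    V = {('P', 'r'): vals[0], ('P', 'a'): vals[1],
         ('Q', 'r'): vals[2], ('Q', 'a'): vals[3]}
    for w in worlds:
        p_worlds = [x for x in worlds if V[('P', x)]]
        cf = all(V[('Q', x)] for x in nearest(w, p_worlds))  # P cf Q at w
        if V[('P', w)] and V[('Q', w)]:      # antecedent P & Q true at w?
            assert cf                        # then the counterfactual holds
print("(P & Q) -> (P cf Q): true at all worlds of all 16 models")
```

A passing run is of course no substitute for the proof above — it only rules out two-world countermodels — but it is a useful sanity check when writing proofs of this kind.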
8.5 Countermodels in SC
Our other main concern will be countermodels. As before, we’ll be able to use failed attempts at constructing countermodels as guides to our validity proofs. First, consider the truth condition for φ2→ψ:

V(φ2→ψ, w) = 1 iff for every x, if x is a nearest-to-w φ world, then V(ψ, x) = 1

Now, suppose there are no nearest-to-w φ worlds. Then V(φ2→ψ, w) is automatically 1, since the universally quantified statement is vacuously true. How could it turn out that there are no nearest-to-w φ worlds? One way would be if there were no φ worlds whatsoever — that is, if φ were necessarily false. And in fact, this is the only way there could be no nearest-to-w φ worlds, for the limit assumption guarantees that if there is at least one φ world, then there is a nearest-to-w φ world. Say that a counterfactual φ2→ψ is vacuously true iff φ is false at every world. What we have seen is that there are two ways a counterfactual φ2→ψ can be true at a given world w: i) it can be vacuously true, or ii) ψ can be true in all the nearest-to-w φ worlds (of which, by anti-symmetry, there can be only one). Bear this
in mind when constructing a countermodel — there are two ways to make a 2→ true. On the other hand, there’s only one way to make a counterfactual false. It follows from the truth condition for the 2→ that:

V(φ2→ψ, w) = 0 iff there is a nearest-to-w φ world at which ψ is false

In constructing models, it is often wise to make forced moves first, as we saw for constructing MPL models. This continues to be a good strategy. It now means: take care of false 2→s before taking care of true 2→s, for the latter can come out true in two different ways, whereas the former can be false in only one way. We’ll see how this goes with some concrete examples. Let’s show that transitivity is invalid for 2→. That is, let’s show that the following formula is SC-invalid: [(P 2→Q)∧(Q 2→R)]→(P 2→R). I’m going to do this with diagrams that are a little different from the ones I used for MPL. The new diagrams list the worlds of the model, with the truth values of relevant formulas written beside each world. Since there is no accessibility relation, we don’t need arrows between the worlds. But since we need to represent the nearness relation, we arrange the worlds in a vertical line. At the bottom of the line goes the world, r, of our model in which we’re trying to make a given formula false. We place the other worlds in the diagram above this bottom world r: the farther away a world is from r in the ≤_r ordering, the farther above r we place it in the diagram. In this case we want to make P 2→Q and Q 2→R true, but P 2→R false, so we begin as follows:
r:  P 2→Q: 1    Q 2→R: 1    P 2→R: 0
    (so the target formula [(P 2→Q)∧(Q 2→R)]→(P 2→R) is 0 at r)
In keeping with the advice I gave a moment ago, let’s “take care of” the false counterfactual first: let’s make P 2→R false in r. This means that we need to have R be false in the nearest-to-r P world. I’ll indicate this as follows:
a:  P: 1    Q: 1    R: 0    (nearest-to-r P world)
|
|   (no P worlds here)
|
r:  P 2→Q: 1    Q 2→R: 1    P 2→R: 0
a is the nearest P world to r. I indicate that in the diagram by making P true at a, and reminding myself that I can’t put any P-world anywhere between a and r. Since the counterfactual P 2→R was supposed to be false at r, I made R false in a. Notice that I made Q true in a. This was to take care of the fact that P 2→Q is true in r. This formula says that Q is true in the nearest-to-r P world. Well, a is the nearest-to-r P world. Now for the final counterfactual, Q 2→R. This can be true in two ways — either there is no Q world at all (the vacuous case), or R is true in the nearest-to-r Q world. Well, Q is already true in a, so the vacuous case is ruled out. So we must make R true in the nearest-to-r Q world. Where will we put the nearest-to-r Q world? There are three possibilities. It could be farther away from, tied with, or closer to r than a. Let’s try the first possibility:
b:  Q: 1    R: 1    (attempted nearest-to-r Q world)
|
|   (no Q worlds here)
|
a:  P: 1    Q: 1    R: 0
|
|   (no P worlds here)
|
r:  P 2→Q: 1    Q 2→R: 1    P 2→R: 0
This doesn’t work, because when we make b the nearest-to-r Q world, this contradicts the fact that we’ve got Q true at a nearer world — namely, a. Likewise, we can’t make the nearest-to-r Q world be tied with a. Anti-symmetry would make this world be a. But we need to make R true at the nearest Q world, whereas R is already false in a. But the final possibility works out just fine — let the nearest Q world be closer than a:
a:  P: 1    Q: 1    R: 0    (nearest-to-r P world)
|
|   (no P worlds here)
|
b:  Q: 1    R: 1    P: 0    (nearest-to-r Q world)
|
|   (no Q worlds here)
|
r:  P 2→Q: 1    Q 2→R: 1    P 2→R: 0
Notice that I made P false in b, since I said “no P” of all worlds nearer to r than a. Here’s the official model:

W = {r, a, b}
≤_r = {〈b, a〉, …}
V(P, a) = V(Q, a) = V(Q, b) = V(R, b) = 1; all other atomics are false at all worlds.

A couple comments on the official model. I left out a lot in giving the similarity relation for this model. First, I left out some of the elements of ≤_r — fully written out, it would be:

≤_r = {〈r, b〉, 〈b, a〉, 〈r, a〉, 〈r, r〉, 〈a, a〉, 〈b, b〉}

I left out 〈r, b〉 because it gets included automatically given the “base” assumption (C4). Also, the element 〈r, a〉 is required to make ≤_r transitive. The elements 〈r, r〉, 〈a, a〉, and 〈b, b〉 were entered to make the relation reflexive. Why must it be reflexive? Because reflexivity follows from strong connectivity. (Let w and x be any members of W; we get (x ≤_w x or x ≤_w x) from strong connectivity of ≤_w, and hence x ≤_w x.) My plan was to write out enough of ≤_r so that the rest could be inferred, given the definition of an SC-model. Secondly, this isn’t a complete writing out of ≤ itself; it is just ≤_r. To be complete, we’d need to write out ≤_a and ≤_b. But in this problem, these latter two parts of ≤ don’t matter, so it’s permissible to omit them. We’ll do a problem below where we need to consider more of ≤ than simply ≤_r. Suppose one has a formula, for example:

(P 2→R) → ((P∧Q) 2→R)
(This is the formula corresponding to the inference pattern of augmentation.) Suppose we ask: is it SC-valid? The best way to approach this is as follows. First, try to find a countermodel. If we succeed, then the formula is not valid. If we have trouble finding a countermodel, then we can try giving a semantic validity proof (for if the formula is in fact valid, then no countermodel exists). In this particular case, the formula is invalid, as the following countermodel shows:
a:  P: 1    Q: 1    R: 0    (nearest-to-r P∧Q world)
|
|   (no P∧Q worlds here)
|
b:  P: 1    R: 1    Q: 0    (nearest-to-r P world)
|
|   (no P worlds here)
|
r:  P 2→R: 1    (P∧Q) 2→R: 0
Ok, I began with the false subjunctive: (P∧Q) 2→R. This forced the existence of a nearest P∧Q world, in which R was false. But since P∧Q was true there, P was true there; this ruled out the true P 2→R in r being vacuously true. So I was forced to consider the nearest P world. It couldn’t be farther out than a, since P is true in a. It couldn’t be a, since R was already false there. So I had to put it nearer than a. Notice that I had to make Q false at b. Why? Well, b was in the “no P∧Q zone”, and I had made P true in it. Here’s the official model:

W = {r, a, b}
≤_r = {〈b, a〉, …}
V(P, a) = V(Q, a) = V(P, b) = V(R, b) = 1; all other atomics are 0 at all worlds.

Next example: 3P → [(P 2→Q) → ∼(P 2→∼Q)]. An attempt to find a countermodel fails at the following point:
a:  P: 1    Q: 1    ∼Q: 1   (so Q would be both true and false at a)
|
|   (no P worlds here)
|
r:  3P: 1    P 2→Q: 1    P 2→∼Q: 1    (the whole formula: 0)
At world a, we’ve got Q being both true and false. A word about how we got to that point. I noticed that I had to make two counterfactuals true: P 2→Q and P 2→∼Q. Now, this isn’t a contradiction all by itself. Remember that counterfactuals are vacuously true if their antecedents are impossible. So if P were impossible, then both of these would indeed be true, without any problem. But 3P has to be true at r. This rules out those counterfactuals’ being vacuously true. Since P is possible, the limit assumption has the result that there is a closest P world. This, together with the two true counterfactuals, creates the contradiction. This reasoning is embodied in the following semantic validity proof:

i. Suppose for reductio that 3P is true at some world r, but (P 2→Q)→∼(P 2→∼Q) is false at r.
ii. Then P 2→Q and P 2→∼Q are both true at r as well.
iii. Since 3P is true at r, P is true at some world. So, by the limit assumption, there exists a world, a, such that V(P, a) = 1 and, for any x, if V(P, x) = 1 then a ≤_r x. For short: there is a closest-to-r P world, a.
iv. The truth condition for 2→, applied to P 2→Q, gives us that Q is true at all the closest-to-r P worlds.
v. Similarly, applied to P 2→∼Q, it gives us that ∼Q is true at all the closest-to-r P worlds.
vi. Thus, both Q and ∼Q would be true at a. Impossible.

Note the use of the limit assumption. It is the limit assumption we must use when we need to know that there is a nearest φ-world, and we can’t get this knowledge from other things in the proof. For the last example of a countermodel, I’ll do a new kind of problem: one with a counterfactual nested within another counterfactual. Let’s show that the following formula is SC-invalid: [P 2→(Q 2→R)]→[(P∧Q) 2→R] (this is the
formula corresponding to “importation”). We begin by making the formula false in r, the actual world of the model. This means making the antecedent true and the consequent false. Now, since the consequent is a false counterfactual, we are forced to make there be a nearest P∧Q world in which R is false:
a:  P∧Q: 1    R: 0    (nearest-to-r P∧Q world)
|
|   (no P∧Q worlds here)
|
r:  P 2→(Q 2→R): 1    (P∧Q) 2→R: 0
Now we’ve got to make the first subjunctive conditional true. We can’t make it vacuously true, because we’ve already got a P-world in the model: a. So, we’ve got to put in the nearest-to-r P world. Could it be farther away than a? No, because a would be a closer P world. Could it be a? No, because we’ve got to make Q 2→R true in the closest P world, and since Q is true but R is false in a, Q 2→R is already false in a. So, we do it as follows:
a:  P∧Q: 1    R: 0    (nearest-to-r P∧Q world)
|
|   (no P∧Q worlds here)
|
b:  P: 1    Q: 0    Q 2→R: 1    (nearest-to-r P world)
|
|   (no P worlds here)
|
r:  P 2→(Q 2→R): 1    (P∧Q) 2→R: 0
Why did I make Q false at b? Well, because b is in the “no P∧Q” zone, and P is true at b, so Q had to be false there. Now, the remaining thing to do is to make Q 2→R true at b. This requires some thought. The diagram right now represents “the view from r” — it represents how near the worlds in the model are to r. That is, it represents the ≤_r relation. But the truth value of Q 2→R at b depends on “the view from b” — that is, the ≤_b relation. So we need to consider a new diagram, in which b is the bottom, base, world:
c:  Q: 1    R: 1    (nearest-to-b Q world)
|
|   (no Q worlds here)
|
b:  P: 1    Q: 0    Q 2→R: 1
I made there be a nearest-to-b Q world, and made R true there. Notice that I kept the old truth values of b from the other diagram. This is because this new diagram is a diagram of the same worlds as the old diagram; the difference is that the new diagram represents the ≤_b nearness relation, whereas the old one represented a different relation: ≤_r. Now, this diagram isn’t finished. The diagram is that of the ≤_b relation, and that relation relates all the worlds in the model. So, worlds r and a have to show up somewhere here. But the safest practice is to put them far away from b, so that there isn’t any possibility of conflict with the “no Q” zone that has been established. Thus, the final appearance of this part of the diagram is as follows:
r:  (truth values as in the r-diagram)
a:  (truth values as in the r-diagram)
c:  Q: 1    R: 1    (nearest-to-b Q world)
|
|   (no Q worlds here)
|
b:  P: 1    Q: 0    Q 2→R: 1
The old truth values from r and a still hold (remember that this is another diagram of the same model, but representing a different nearness relation); I left them out because they’ve already been written on the other part of the diagram. Notice that the order of the worlds in the r-diagram does not in any way affect the order of the worlds in the b-diagram. The nearness relations in the two diagrams are completely independent, because in our definition of ‘SC-model’, we entered no conditions constraining the relations between ≤_i and ≤_j when i ≠ j. This sometimes seems unintuitive. For example, we could have two halves of a model looking as follows:

The view from r        The view from a
      c                      r
      b                      b
      a                      c
      r                      a
If this seems odd, remember that only some of the features of a diagram are intended to be genuinely representative. For example, these diagrams are in ink; this is not intended to convey the idea that the worlds in the model are made of ink. This feature of the diagram isn’t intended to convey information about the model. Analogously, the fact that, in the left diagram, b is physically closer to a than c is not intended to convey the information that, in the model, b ≤_a c. In fact, the diagram of the view from r is only intended to convey information about ≤_r; it doesn’t carry any information about ≤_a, ≤_b, or ≤_c. Back to the countermodel. That other part of the diagram, the view from r, must be updated to include world c. The safest procedure is to put c far away on the model to minimize the possibility of conflict. Thus, the final picture of the view from r is:
c:  (truth values as in the b-diagram)
a:  P∧Q: 1    R: 0    (nearest-to-r P∧Q world)
|
|   (no P∧Q worlds here)
|
b:  P: 1    Q: 0    Q 2→R: 1    (nearest-to-r P world)
|
|   (no P worlds here)
|
r:  P 2→(Q 2→R): 1    (P∧Q) 2→R: 0
Again, I haven’t re-written the truth values in world c, because they’re already in the other diagram, but they are to be understood as carrying over. Now for the official model:
W = {r, a, b, c}
V(P, a) = V(Q, a) = V(P, b) = V(Q, c) = V(R, c) = 1; all other atomics are 0 at all worlds
≤_r = {〈b, a〉, 〈a, c〉, …}
≤_b = {〈c, a〉, 〈a, r〉, …}

Notice that we needed to express two of ≤’s subrelations — ≤_r and ≤_b. Remember that any model has got to contain ≤_i for every world i in the model. For example, if we were to write out this model completely officially, we’d have to specify ≤_a and ≤_c. But we don’t bother with those parts of ≤ that don’t matter.
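As a sanity check on all this bookkeeping, the model just given can be encoded and the two counterfactuals recomputed. This is our own sketch, not part of the text; the orderings for worlds a and c, which don’t matter, are filled in arbitrarily, and the names (truth, rank) are invented:

```python
# Recompute the importation countermodel: P cf (Q cf R) is true at r,
# while (P & Q) cf R is false at r. Worlds and valuation as in the model.

worlds = ['r', 'a', 'b', 'c']
V = {'P': {'a', 'b'}, 'Q': {'a', 'c'}, 'R': {'c'}}
rank = {                       # rank[w][x] = x's position in the <=_w order
    'r': {'r': 0, 'b': 1, 'a': 2, 'c': 3},
    'b': {'b': 0, 'c': 1, 'a': 2, 'r': 3},
    'a': {'a': 0, 'r': 1, 'b': 2, 'c': 3},   # arbitrary: never consulted
    'c': {'c': 0, 'r': 1, 'a': 2, 'b': 3},   # arbitrary: never consulted
}

def truth(wff, w):
    """wff: a sentence letter, ('and', A, B), or ('cf', ant, cons)."""
    if isinstance(wff, str):
        return w in V[wff]
    if wff[0] == 'and':
        return truth(wff[1], w) and truth(wff[2], w)
    _, ant, cons = wff
    ant_worlds = [x for x in worlds if truth(ant, x)]
    if not ant_worlds:
        return True                           # vacuously true
    best = min(rank[w][x] for x in ant_worlds)
    return all(truth(cons, x) for x in ant_worlds if rank[w][x] == best)

print(truth(('cf', 'P', ('cf', 'Q', 'R')), 'r'))        # True
print(truth(('cf', ('and', 'P', 'Q'), 'R'), 'r'))       # False
```

Note how the nested counterfactual gets evaluated at b using rank['b'] — the code mirrors the switch from “the view from r” to “the view from b” in the diagrams above.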
8.6 Logical Features of SC
Here we’ll discuss various features of SC, which appear to confirm that it’s a good logic for counterfactual conditionals. Recall that we began our discussion of counterfactuals by noting some features of English counterfactuals that must be reflected in our logical treatment of 2→. In what follows, we’ll show that these features, and more, are preserved in SC.
8.6.1 Not truth-functional
We wanted our system for counterfactuals to have the following features:

⊭ ∼P → (P 2→Q)
⊭ Q → (P 2→Q)

Clearly, it does. Take any model in which P is false at some world, r, and in which there’s a nearest-to-r P world in which Q is false. The conditional ∼P → (P 2→Q) is false at r in this model; hence that conditional isn’t SC-valid. Similarly, the second conditional, Q → (P 2→Q), is shown invalid by a model in which Q is true at r, P is false at r, and in which there’s a nearest-to-r P world in which Q is false.
8.6.2 Can be contingent
We wanted it to turn out that:
⊭ (P 2→Q) → 2(P 2→Q)
⊭ ∼(P 2→Q) → 2∼(P 2→Q)

They’re clearly not valid, in virtue of the fact that the similarity orderings based at different worlds can be very different. Perhaps Q is true in the nearest-to-r P world, while Q is false at the nearest-to-a P world. Then P 2→Q would be true at r but false at a, and so 2(P 2→Q) would be false at r, even though P 2→Q is true there. Similarly for the second formula.
8.6.3 No augmentation
We’ve already shown above that the following conditional isn’t SC-valid: [P 2→Q]→[(P ∧R)2→Q]
8.6.4 No contraposition
Let’s show that ⊭ (P 2→Q) → (∼Q 2→∼P):

a:  ∼Q: 1    P: 1    ∼P: 0    (nearest-to-r ∼Q world)
|
|   (no ∼Q worlds here)
|
b:  P: 1    Q: 1    (nearest-to-r P world)
|
|   (no P worlds here)
|
r:  P 2→Q: 1    ∼Q 2→∼P: 0
I won’t bother with the official model.
8.6.5 No exportation
Exportation is valid for →:

(P∧Q) → R
----------
P → (Q → R)

but not for ⇒ (in any system):
(P∧Q) ⇒ R
----------
P ⇒ (Q ⇒ R)

It is also invalid for the English counterfactual:

If I had married X and Y, I would have been a bigamist.

does not imply

If I had married X, then it would have been the case that if I had married Y I would have been a bigamist.

Fortunately, the formula [(P∧Q) 2→R] → [P 2→(Q 2→R)] is not SC-valid, as can easily be shown with countermodels.
8.6.6 No importation
Importation is valid for →, and valid in T for ⇒:

P → (Q → R)          P ⇒ (Q ⇒ R)
-----------          -----------
(P∧Q) → R            (P∧Q) ⇒ R
but not for the English counterfactual. For consider:

If I had married X, then it would have been the case that if I had married Y I would have been happy.

doesn’t imply

If I had married X and Y, I would have been happy.

(If I had married both, my relatives would have hated me, I would have become a public spectacle, etc.) Fortunately, this argument isn’t valid for the 2→ in SC; above we used countermodels to show that the conditional [P 2→(Q 2→R)] → [(P∧Q) 2→R] isn’t SC-valid.
8.6.7 No hypothetical syllogism (transitivity)
The following inference pattern is valid for → and for ⇒ (in all systems):

P → Q          P ⇒ Q
Q → R          Q ⇒ R
-----          -----
P → R          P ⇒ R
but not for the English counterfactual, for consider:

If my mother and father hadn’t met, I wouldn’t have been born.
If I hadn’t been born, Mike would have been the oldest.
So, if my mother and father hadn’t met, Mike would have been the oldest.

Fortunately, we were able to use countermodels above to show that [(P 2→Q)∧(Q 2→R)] → (P 2→R) is not SC-valid.
8.6.8 No transposition
Transposition is valid for →:

P → (Q → R)
-----------
Q → (P → R)

but not for ⇒ (in any system):

P ⇒ (Q ⇒ R)
-----------
Q ⇒ (P ⇒ R)

Here’s an example that shows that transposition should be invalid for 2→.⁵ It is true that:

If Ted had taken a job in Virginia, then if he had taken a job in Rochester, he’d have lived in New York State.

But it is not true that:

If Ted had taken a job in Rochester, then if Ted had taken a job in Virginia, he’d have lived in New York State.

Fortunately, [P 2→(Q 2→R)] → [Q 2→(P 2→R)] is SC-invalid, as the reader can verify.
⁵ Thanks to Brock Sides for the example.
8.6.9 Some validities
Here are a couple of validity proofs for formulas we wanted to turn out valid.

Proof that (P 2→Q)→(P→Q):

i. Suppose P 2→Q is true at some world r, and that P is true at r.
ii. Given “base”, for every world x, r ≤_r x.
iii. Thus, since P is true at r, r is a closest-to-r P world. So Q is true there too.

Proof that (P⇒Q)→(P 2→Q):

i. Suppose P⇒Q is true at r, and suppose for reductio that P 2→Q is false at r.
ii. Then there’s a nearest-to-r P world, a, at which Q is false.
iii. But that can’t be. “P⇒Q” means 2(P→Q). So P→Q is true at every world. So there can’t be a world like a, in which P is true and Q is false.
8.7
Lewis’s criticisms of Stalnaker’s theory
SC isn’t the only logical system that has been proposed for subjunctive conditionals, and it isn’t even the only theory based on “similarity” of worlds. There are rival “similarity” theories, especially David Lewis’s. So far, I’ve only discussed features of Stalnaker’s system that are shared by Lewis’s. Now I’ll talk about things that are specific to Stalnaker. Lewis challenges Stalnaker’s assumption that ≤_w is always anti-symmetric. Similarity relations permit ties, and it seems very implausible to rule out two worlds’ being exactly as similar to the actual world as each other. So if these conditions were being offered as real truth conditions for English counterfactuals, this would seem to be an objection.6 The validity of certain formulas depends on the “no ties” assumption; the following two wffs are SC-valid, but are challenged by Lewis:

(P 2→Q) ∨ (P 2→∼Q)   (“conditional excluded middle”)

[P 2→(Q∨R)] → [(P 2→Q)∨(P 2→R)]   (“distribution”)

6 For an interesting response, see Stalnaker (1981).
Take the first one, for example. Suppose you gave up anti-symmetry, thereby allowing ties. Then the following would be a countermodel for the law of conditional excluded middle:

[Diagram: a world r at which (P 2→Q)∨(P 2→∼Q) is marked false (0), together with two P-worlds, a and b, tied for closest to r: Q is true at a, and ∼Q is true at b.]
Remember that P 2→Q is true only if Q is true in all the nearest P worlds. In this model, Q is true in one of the nearest P worlds, but not all, so that counterfactual is false at r. Similarly for P 2→∼Q. A similar model shows that distribution fails if the “no ties” assumption is given up. So, should we give up conditional excluded middle? As Lewis notes, the principle is very plausible. First note that it is uncontroversial that the following is valid:

3P →[(P 2→∼Q)→∼(P 2→Q)]

So the validity of conditional excluded middle would imply that, in cases where P is contingent, ∼(P 2→Q) and P 2→∼Q are equivalent to each other. And this is indeed how we ordinarily use such counterfactuals: we often don’t distinguish between P 2→∼Q and ∼(P 2→Q). The following dialogue sounds natural: “If I were to hit you, you’d bleed.” “No! If you hit me, I wouldn’t bleed.” We seem to be treating “If you hit me, I wouldn’t bleed” and “Not: if you hit me, I would bleed” interchangeably. Accepting conditional excluded middle vindicates this ordinary practice. But if conditional excluded middle is invalid, then this practice would not be justified, since P 2→Q and P 2→∼Q could both be false; the second could not be treated as the negation of the first. And take the other formula validated by Stalnaker’s theory, distribution. In reply to: “if the coin had been flipped, it would have come up either heads or tails”, one might ask: “which would it have been, heads or tails?”. The thinking behind the reply is that “if the coin had been flipped, it would have come up heads”, or “if the coin had been flipped, it would have come up tails”, must be true.
So there’s some plausibility to both these formulas. But Lewis says two things. First, if we’re going to accept the similarity analysis, we’ve got to give them up, because ties just are possible. But second, the intuitions aren’t completely compelling. About the coin-flipping case, Lewis denies that if the coin had been flipped, it would have come up heads, and he also denies that if the coin had been flipped, it would have come up tails. Rather, he says, if it had been flipped, it might have come up heads. And if it had been flipped, it might have come up tails. But neither outcome is such that it would have resulted, had the coin been flipped. Concerning excluded middle, Lewis says:

    It is not the case that if Bizet and Verdi were compatriots, Bizet would be Italian; and it is not the case that if Bizet and Verdi were compatriots, Bizet would not be Italian; nevertheless, if Bizet and Verdi were compatriots, Bizet either would or would not be Italian. (Counterfactuals, p. 80)

Lewis can follow this up by noting that if Bizet and Verdi were compatriots, Bizet might be Italian, but it’s not the case that if they were compatriots, he would be Italian. Here is a related complaint of Lewis’s about Stalnaker’s semantics. In the last little bit, I’ve used English phrases of the form “if it were the case that φ, then it might have been the case that ψ”. This conditional Lewis calls “the ‘might’ counterfactual”; he symbolizes it as φ3→ψ, and defines it thus:

φ3→ψ =df ∼(φ2→∼ψ)

Lewis criticizes Stalnaker’s system for the fact that this definition of 3→ doesn’t work in Stalnaker’s system. Why not? Well, since internal negation is valid in Stalnaker’s system, φ3→ψ would always imply φ2→ψ — not good, since the might-conditional in English seems weaker than the would-conditional. So, Lewis’s definition of 3→ doesn’t work in Stalnaker’s system. Moreover, there doesn’t seem to be any other plausible definition.
So, Stalnaker can’t define 3→.7 Lewis also objects to the limit assumption. The following line is less than one inch long:

[Figure: a line segment slightly less than one inch long.]
7 Lewis (1973, p. 80).
Now, consider the counterfactual:

If the line were more than 1 inch long, it would be over 100 miles long.

Seems false. But if we use Stalnaker’s truth conditions as truth conditions for English sentences, and take our intuitive judgments of similarity seriously, we seem to get the result that it is true! The reason is that there doesn’t seem to be a closest world in which the line is more than 1 inch long. For every world in which the line is, say, 1+k inches long, there’s another world in which the line has a length closer to its actual length but still more than 1 inch long: say, 1+k/2 inches. So there doesn’t seem to be any closest world in which the line is over 1 inch long. This has two results. First, the limit assumption is false, for real similarity and English. Second, the counterfactual is vacuously true, given Stalnaker’s truth conditions for the 2→. Lewis accepts neither anti-symmetry nor the limit assumption. Let’s look at Lewis’s system.
8.8
Lewis’s system
To move from Stalnaker’s system to Lewis’s,8 we can start by just dropping the anti-symmetry assumption. We also want to drop the limit assumption. But after dropping the limit assumption, if we made no further adjustments to the system, we would get unwanted vacuous truths, as we did in the example of the 1 inch line above.9 The truth definition for 2→ needs to be changed. Instead of saying that φ2→ψ is true iff ψ is true in all the nearest φ worlds, we will instead say that φ2→ψ is true iff either i) φ is true in no worlds (the vacuous case), or ii) there is some φ world such that for every φ world at least as close, φ→ψ is true there. Here is the new system, LC (Lewis counterfactuals). It is exactly the same as the Stalnaker system except that limit and anti-symmetry are dropped, and the parts indicated in boldface are changed:
8 See Lewis (1973, pp. 48–49). My formulation does away with the accessibility relation (in Lewis’s terminology, S_i, the set of worlds accessible from i, is always W, the set of all worlds in the model), so it is a bit simpler.
9 Actually, dropping the limit assumption doesn’t affect the class of valid formulas, which is the same with or without the limit assumption (see Lewis (1973, p. 121)).
An LC-model is a three-tuple 〈W, ≤, V〉, where:

i) W is a nonempty set
ii) V is a function that assigns either 0 or 1 to each wff relative to each member of W
iii) ≤ is a three-place relation over W (we write “y ≤_w x” to mean that y is at least as close to w as x is)
iv) V and ≤ satisfy the following conditions:
 C1: for any w, ≤_w is strongly connected
 C2: for any w, ≤_w is transitive
 C3: for any x, y, if y ≤_x x then x = y (“base”)
v) For all wffs φ, ψ, and for all w ∈ W:
 a) V(∼φ, w) = 1 iff V(φ, w) = 0
 b) V(φ→ψ, w) = 1 iff either V(φ, w) = 0 or V(ψ, w) = 1
 c) V(2φ, w) = 1 iff for any v, V(φ, v) = 1
 d) V(φ2→ψ, w) = 1 iff EITHER φ is true at no worlds, OR: there is some world, x, such that V(φ, x) = 1 and for all y, if y ≤_w x then V(φ→ψ, y) = 1
It may be verified that every LC-valid wff is SC-valid (although not vice versa).10 Comments on all this: First, notice that the limit and anti-symmetry conditions are simply dropped. Second, the Base condition is modified; now it says that no world other than w is as close to w as w is to itself. Before, it said that each world is at least as close to itself as any other world is. Stalnaker’s Base condition, plus anti-symmetry, entails the present Base condition. But Lewis’s system doesn’t have anti-symmetry, so the Base condition must be stated in the stronger form.11 Third, let’s think about what the truth condition for the 2→ says. First, there’s the vacuous case: if φ is necessarily false then φ2→ψ comes out true. But if φ is possibly true, then what the clause says is this: φ2→ψ is true at
10 Here’s a sketch of an argument. Let’s say that an LC-model is “Stalnaker-acceptable” iff it obeys the limit and anti-symmetry assumptions. Suppose that φ is LC-valid. Then it’s true in all Stalnaker-acceptable LC-models. Now, notice that in Stalnaker-acceptable models, Lewis’s truth conditions for 2→ are identical to Stalnaker’s. So, φ must be true in all SC-models.
11 Why do we want to prohibit other worlds from being just as close to w as w is to itself? So that (P ∧Q)→(P 2→Q) comes out valid. If P ∧Q were true at w, while P ∧∼Q was true at some world as close to w as w is to itself, then (P ∧Q)→(P 2→Q) would be false at w.
w iff there’s some φ world where ψ is true, such that no matter how much closer to w you go, you’ll never get a φ world where ψ is false. If there is a nearest-to-w φ world, then this implies that φ2→ψ is true at w iff ψ is true in all the nearest-to-w φ worlds. So, thinking of these as truth-conditions for English sentences for a moment, recall the sentence: If the line were more than 1 inch long, it would be over 100 miles long. There’s no nearest world in which the line is more than 1 inch long, only an infinite series of worlds where the line has lengths getting closer and closer to 1 inch long. But this doesn’t make the counterfactual true. A counterfactual is true if its antecedent is impossible, but the antecedent isn’t impossible in this case. So the only way the counterfactual could be true is if the second part of the definition is satisfied — if, that is, there is some world, x, such that the antecedent is true at x, and the material conditional (antecedent→consequent) is true at every world at least as similar to the actual world as is x. Since the “at least as similar as” relation is reflexive, this can be rewritten thus: for some world, x, the antecedent and consequent are both true at x, and the material conditional (antecedent→consequent) is true at every world at least as similar to the actual world as is x. So, is there any such world, x? No. For let x be any world at which the antecedent and consequent are both true — i.e., any world in which the line is over 100 miles long. We can always find a world that is more similar to the actual world than x and in which the material conditional (antecedent→consequent) is false: just choose a world just like x but in which the line is only, say, 2 inches long. Let’s see how Lewis’s theory works in the case of a true counterfactual, for instance: If I were more than six feet tall, then I would be less than nine feet tall. (I am, in fact, less than six feet tall.)
The situation here is similar to the previous example in that there is no nearest world in which the antecedent is true. But now, we can find a world x, in which the antecedent and consequent are both true, and such that the material conditional (antecedent→ consequent) is true
in every world at least as similar to the actual world as is x. Simply take x to be a world just like the actual world but in which I am, say, six feet one. Any world that is at least as similar to the actual world as this world must be one in which I’m less than nine feet tall; so in any such world the material conditional (I’m more than six feet tall → I’m less than nine feet tall) is true. Notice that the formulas representing Conditional Excluded Middle and Distribution come out invalid now, because of the possibility of ties. Another thing: Lewis gives the following definition for the ‘might’-counterfactual:

φ3→ψ =df ∼(φ2→∼ψ)

From this we may obtain a derived clause for the truth conditions of φ3→ψ:

V(φ3→ψ, w) = 1 iff for some x, V(φ, x) = 1, and for any x, if V(φ, x) = 1 then there’s some y such that y ≤_w x and V(φ∧ψ, y) = 1

That is, φ3→ψ is true at w iff φ is possible, and for any φ world, there’s a world as close or closer to w in which φ and ψ are both true. In cases where there is a nearest φ world, this means that ψ must be true in at least one of the nearest φ worlds.
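Lewis’s clause, and the failure of conditional excluded middle under ties, can be illustrated with a small finite model. The following sketch — the worlds and closeness rankings are my own toy example, not from the text — implements the modified truth condition for 2→ (evaluated at r) and the derived clause for 3→:

```python
# Toy Lewis-style model (hypothetical): a and b are tied nearest P-worlds to r.
val  = {"r": set(), "a": {"P", "Q"}, "b": {"P"}}
rank = {"r": 0, "a": 1, "b": 1}       # smaller = closer to r; ties allowed

worlds = list(val)
P = lambda w: "P" in val[w]
Q = lambda w: "Q" in val[w]
notQ = lambda w: not Q(w)

def would(ant, cons):
    """Lewis: ant 2→ cons is true at r iff ant holds nowhere, or some
    ant-world x is such that ant→cons holds at every world as close as x."""
    if not any(ant(w) for w in worlds):
        return True
    return any(ant(x) and all((not ant(y)) or cons(y)
                              for y in worlds if rank[y] <= rank[x])
               for x in worlds)

def might(ant, cons):
    # Lewis's definition: ant 3→ cons  =df  ∼(ant 2→ ∼cons)
    return not would(ant, lambda w: not cons(w))

# With a and b tied, neither P 2→ Q nor P 2→ ∼Q is true: CEM fails...
assert not would(P, Q) and not would(P, notQ)
# ...while both 'might' counterfactuals come out true:
assert might(P, Q) and might(P, notQ)
```

This also shows why Lewis’s 3→ is genuinely weaker than his 2→ once ties are allowed.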
8.9
The problem of disjunctive antecedents
Before we leave counterfactual conditionals, I want to talk about one criticism that has been raised against both Lewis’s and Stalnaker’s systems.12 The argument:

(P ∨Q)2→R
∴ P 2→R

is invalid in both systems. (Take a model where there is a unique nearest P ∨Q world to r, in which Q is true but not P ; and make there be a unique nearest P world in which R is false.) But shouldn’t this argument be valid? Imagine Butch Cassidy and the Sundance Kid talking to each other in heaven, after having gotten killed at the shootout. They say:
12 For references, see the bibliography of Lewis (1977).
If we had surrendered or tried to run away, we would have been shot. Intuitively — in English — if this is true, so is this: If we had surrendered, we would have gotten shot. In general, the English pattern:

If P or Q were true, R would be true
∴ If P were true, R would be true

seems valid. Is this a problem for Lewis and Stalnaker? Some have argued this, but others respond as follows. One must take great care in translating from English into logic. For example, no one would want to criticize the law ∼∼P →P on the grounds that “There ain’t no way I’m doing that” doesn’t imply that I might do that.13 And there are notorious peculiar things about the behavior of ‘or’ in similar contexts. Consider:

(1) You are permitted to stay or go.

One can argue that this does not have the form:

(2) Permit(Stay ∨ go)

After all, suppose that you are permitted to stay, but not to go. If you stay, you can’t help doing the following act: staying ∨ going. So, surely, you’re permitted to do that. So, (2) is true. But (1) isn’t; if someone uttered (1) to you when you were in jail, they’d be lying to you! (1) really means:

(3) You are permitted to stay, AND you are permitted to go.

Similarly, in English, “If either P or Q were true then R would be true” seems usually to mean “If P were true then R would be true, and if Q were true then R would be true”. We can’t just expect English always to translate directly into our logical language — sometimes the surface structure of English is misleading.
13 The example is adapted from Loewer (1976).
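The invalidity claimed at the start of this section — that (P ∨Q)2→R does not imply P 2→R in these systems — can be checked on a concrete toy model (the three worlds and closeness rankings below are invented for illustration):

```python
# Hypothetical three-world Stalnaker-style model.
val = {0: set(), 1: {"Q", "R"}, 2: {"P"}}           # 1: nearest P∨Q-world; 2: nearest P-world, no R
order = {0: [0, 1, 2], 1: [1, 0, 2], 2: [2, 0, 1]}  # order[w]: worlds from closest to w outward

def cf(ant, cons, w):
    # nearest-antecedent-world truth condition (vacuously true if no ant-world)
    nearest = next((v for v in order[w] if ant(v)), None)
    return True if nearest is None else cons(nearest)

P = lambda w: "P" in val[w]
Q = lambda w: "Q" in val[w]
R = lambda w: "R" in val[w]

# (P∨Q) 2→ R is true at world 0, but P 2→ R is false there:
assert cf(lambda w: P(w) or Q(w), R, 0)
assert not cf(P, R, 0)
```

This is exactly the recipe given parenthetically above: the nearest P ∨Q world makes Q (and R) true, while the nearest P world makes R false.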
Chapter 9: Quantified Modal Logic

We’re going to look at Kripke-style semantics for quantified modal logic. The language is what you get by adding the 2 and 3 to the language of predicate logic. There are many interesting issues concerning the interaction of modal operators with quantifiers.

9.1
Grammar of QML
Grammatically, the language of QML is the same as that of plain old predicate logic, except that we add the 2. Thus, the one new clause to the definition of a wff is the clause that if φ is a wff, then so is 2φ. The 3 is defined as before: as ∼2∼.
9.2
Translations in QML
Adding quantifiers to our modal language allows us to make new distinctions when symbolizing sentences of English. In particular, it allows us to make the famous distinction between de re and de dicto modal statements. A paradigm instance concerns the ambiguity of the sentence

(1) Some rich person might have been poor.

If we don’t have any quantifiers around, it seems that we have only one possible symbolization: as 3P , where P stands for “Some rich person is poor”. But this symbolization is incorrect, because in no possible world is the statement “Some
rich person is poor” true! In contrast, (1) seems to have a reading on which it is true. When we turn to symbolize it in QML, we have better luck. There seem to be two possibilities:

(1a) 3∃x(Rx∧P x)
(1b) ∃x(Rx∧3P x)

(1a) is just the rejected translation as 3P , with P becoming ∃x(Rx∧P x). But (1b) is better — translating back into English, it says that there is a person who is in fact rich, but is such that s/he might have been poor. That’s what the original sentence meant, on its most natural reading. The ambiguity of (1) is said to be a “de re/de dicto” ambiguity. (1a) is the “de dicto” reading — the modal operator 3 attaches to a complete closed sentence. It’s called “de dicto” because the property of possibility is attributed to the sentence — a dictum. (1b) is the de re reading — there, the modal operator attaches to an open sentence. It’s called “de re” (“of the object”), because 3F x can be thought of as attributing a certain property to an object u when x is assigned the value u — the modal property of possibly being F. The following sentence also exhibits a de re/de dicto ambiguity:

Every bachelor is necessarily male.

The two readings are:

2∀x(B x→M x)   (de dicto)
∀x(B x→2M x)   (de re)

The de dicto reading is true because it is true that in any possible world, anyone that is in that world a bachelor is, in that world, male. The de re reading is false because it makes the claim that if any object, u, is a bachelor in the actual world, that object u is necessarily male — i.e., male in all possible worlds. That claim is at least arguably false: something that is in fact a bachelor, and hence male, might not have been male. Definite descriptions also exhibit a de re/de dicto ambiguity. This may be illustrated by using Russell’s theory of descriptions (section 5.3.3). Russell’s method generates an ambiguity when a sentential operator such as “not” is added to “is G”, for example:

The striped bear is not dangerous
Does this mean:

∼∃x(S x∧B x∧∀y([S y∧B y]→x=y)∧D x)

or does it mean the following?

∃x(S x∧B x∧∀y([S y∧B y]→x=y)∧∼D x)

I.e., is the sentence denying that there is one and only one striped bear that is dangerous, or is it saying that there is one and only one striped bear, and that that bear is non-dangerous? There’s an ambiguity, which is a matter of the relative scopes of the definite description and the ∼. Sentences with definite descriptions and modal operators have an analogous ambiguity. Consider, for instance:

The number of the planets is necessarily odd.

Letting “N x” mean that x numbers the planets, this could be symbolized in either of the following ways:

2∃x(N x∧∀y(N y→x=y)∧O x)
∃x(N x∧∀y(N y→x=y)∧2O x)

The first is de dicto; it says that it’s necessary that: there is one and only one number of the planets, and that number is odd. That’s false (since there could have been eight planets). The second is de re; it says that (in fact) there is one and only one number of the planets, and that that number is necessarily odd. That’s true, since the number that is in fact the number of the planets — the number nine — is indeed necessarily odd.
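The planets example can be checked mechanically in a toy two-world model. This is my own illustration, not from the text: the extensions below are stipulated, with “N” (numbers the planets) world-relative and “O” (is odd) world-invariant:

```python
W = ["w9", "w8"]                     # nine planets at w9 (actual), eight at w8
D = [9, 8]
N = {(9, "w9"), (8, "w8")}           # "x numbers the planets" (world-relative)
O = {(d, w) for d in D for w in W if d % 2 == 1}   # "x is odd" (world-invariant)

def the_number_is_odd(w):            # ∃x(Nx ∧ ∀y(Ny→x=y) ∧ Ox), evaluated at w
    return any((x, w) in N
               and all((y, w) not in N or y == x for y in D)
               and (x, w) in O
               for x in D)

de_dicto = all(the_number_is_odd(w) for w in W)    # 2∃x(Nx ∧ ... ∧ Ox)
de_re = any((x, "w9") in N                         # ∃x(Nx ∧ ... ∧ 2Ox) at w9
            and all((y, "w9") not in N or y == x for y in D)
            and all((x, w) in O for w in W)
            for x in D)

assert not de_dicto and de_re        # de dicto false, de re true — as claimed
```

At w8 the unique “number of the planets” is 8, which is even, so the de dicto reading fails there; but the thing that actually numbers the planets, 9, is odd at every world, so the de re reading holds.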
9.3
Semantics for QML
I’m going to simplify our discussion of QML in a couple of ways. First, I won’t consider axiom systems at all; we’ll go straight to semantics. Second, I’ll simplify the semantics by dropping the accessibility relation from our models. 2φ will be said to be true iff φ is true in all worlds in the model; this means, in effect, that I’m assuming that the underlying propositional modal logic is S5. Third, I’ll start by considering the simplest sort of semantics: one with a “constant domain”. (In a subsequent section we’ll consider models with variable domains.)
A QML-model is an ordered triple 〈W,D,I〉 such that:

i) W is a nonempty set (“possible worlds”)
ii) D is a nonempty set (“domain”)
iii) I is a function (“interpretation function”) such that:
 a) if α is a constant then I(α) ∈ D
 b) if Πn is an n-place predicate then I(Πn) is a set of ordered n+1-tuples 〈u1 , . . . , un , w〉, where u1 , . . . , un are members of D, and w ∈ W.

Recall that as we moved from non-modal propositional logic to modal logic, we assigned formulas semantic values (in that case, truth values) in worlds. We do the same thing here. Well, names (constants) do not get assigned semantic values relative to worlds, actually: they get assigned referents (members of the domain) absolutely. (This reflects the common belief that the parts of natural language that constants most naturally represent — proper names — are rigid designators — terms that have the same denotation relative to every possible world.) But predicates do get assigned semantic values relative to worlds: instead of getting assigned an extension, a predicate now gets assigned an extension relative to each world. This represents the fact that, e.g., the set of tall things in the actual world might be different from the set of tall things in some other world. I’m in fact not a member of the set of tall things, but I might have been a member of the set of tall things — if I had been tall. Relative to the actual world, I’m not a member of the extension of ‘is tall’, but relative to some other world, I am (say, a world in which I’m 6 feet 5 inches tall). We carry over the earlier notion of an assignment, and give the definition of the valuation function, which now is relative in its assignment of truth values in two different ways. As with non-modal predicate logic models, the truth values are assigned relative to variable assignments. But they’re also assigned relative to a given world, since, after all, the sentence ‘F a’, if it represents “Ted is tall”, should vary in truth value from world to world:

Definition of the valuation function, VM,g , for model M (=〈W,D,I〉) and variable assignment g :

i) for any terms α, β, VM,g (α=β, w) = 1 iff [α]M,g = [β]M,g
ii) for any n-place predicate, Π, and any terms α1 , . . . , αn , VM,g (Πα1 . . . αn , w) = 1 iff 〈[α1 ]M,g , . . . , [αn ]M,g , w〉 ∈ I(Π)
iii) for any wffs φ, ψ, and any variable, α,
 a) VM,g (∼φ, w) = 1 iff VM,g (φ, w) = 0
 b) VM,g (φ→ψ, w) = 1 iff either VM,g (φ, w) = 0 or VM,g (ψ, w) = 1
 c) VM,g (∀αφ, w) = 1 iff for every u ∈ D, VM,gu/α (φ, w) = 1
 d) VM,g (2φ, w) = 1 iff for every v ∈ W, VM,g (φ, v) = 1

The derived clauses are what you’d expect, including the following one for 3:

VM,g (3φ, w) = 1 iff for some v ∈ W, VM,g (φ, v) = 1

Finally, we have:

φ is valid in M (=〈W,D,I〉) =df for every variable assignment, g , and every w ∈ W, VM,g (φ, w) = 1
φ is QML-valid =df φ is valid in all QML models.
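The valuation clauses just given are straightforward to prototype. Below is a minimal sketch — my own encoding of wffs as nested tuples, not notation from the text — of the constant-domain semantics:

```python
def ref(I, g, t):
    """Denotation of a term: variables via the assignment g, constants via I."""
    return g[t] if t in g else I[t]

def V(model, g, phi, w):
    W, D, I = model
    op = phi[0]
    if op == "eq":                    # clause i): identity is world-independent
        return ref(I, g, phi[1]) == ref(I, g, phi[2])
    if op == "pred":                  # clause ii): extensions are world-relative
        return tuple(ref(I, g, t) for t in phi[2]) + (w,) in I[phi[1]]
    if op == "not":
        return not V(model, g, phi[1], w)
    if op == "imp":
        return (not V(model, g, phi[1], w)) or V(model, g, phi[2], w)
    if op == "all":                   # clause iii-c): one fixed domain D
        return all(V(model, {**g, phi[1]: u}, phi[2], w) for u in D)
    if op == "box":                   # clause iii-d): truth at every world (S5)
        return all(V(model, g, phi[1], v) for v in W)
    raise ValueError(op)

# A tiny model: Fa holds only at world s.
model = ({"r", "s"}, {"u"}, {"a": "u", "F": {("u", "s")}})
Fa = ("pred", "F", ("a",))
dia_Fa = ("not", ("box", ("not", Fa)))   # 3 defined as ∼2∼
assert not V(model, {}, Fa, "r")
assert V(model, {}, dia_Fa, "r")
```

The two asserts mirror the intended behavior: F a is false at r, but 3F a is true there because F a holds at s.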
9.4
Countermodels and validity proofs in QML
We’ll again be interested in coming up with countermodels for invalid formulas, and validity proofs for valid ones. We can use the same pictorial method for constructing countermodels, asterisks and all, with a few changes. First, there’s no need for the arrows between worlds, since we’ve dropped the accessibility relation, thereby making every world accessible to every other. Secondly, we have predicates and names for atomics instead of sentence letters, so how to account for this? Let’s look at an example: finding a countermodel for the formula (3F a∧3Ga)→3(F a∧Ga). We begin as follows:

[Diagram: a single world r, at which (3F a∧3Ga)→3(F a∧Ga) is marked false (0): 3F a and 3Ga are marked true (1), each with an understar, and 3(F a∧Ga) is marked false (0), with an overstar.]
The understars make us create two new worlds:

[Diagram: as before, plus two new worlds a and b; F a is marked true (1) at a, and Ga is marked true (1) at b.]
We must then discharge the overstar from the false diamond in each world (since every world is accessible to every other world in our models):

[Diagram: F a∧Ga is now marked false (0) at each of r, a, and b; accordingly, F a is marked false (0) at r, Ga is marked false (0) at a, and F a is marked false (0) at b.]
(I had to make either Fa or Ga false in r — I chose Fa arbitrarily.) Now, we’ve indicated the truth-values that we want the atomics to have. How do we make the atomics have the TVs we want in the picture? We do this by introducing a domain for the model, and stipulating what the names refer to and what objects are in the extensions of the predicates. Let’s use letters like ‘u’ and ‘v’ as the members of the domain in our models. Now, if we let the name ‘a’ refer to (the letter) u, and let the extension of F in world r be {} (the empty set), then the truth value of ‘F a’ in world r will be 0 (false), since the denotation of a isn’t in the extension of F at world r. Likewise, we need to put u in the extension of F (but not in the extension of G) in world a, and put u in the extension of G (but not in the extension of F ) in world b. This all may be indicated on the diagram as follows:

[Diagram: at the top, the name assignment a: u; within world r, F : {} and G : {}; within world a, F : {u} and G : {}; within world b, F : {} and G : {u}; truth values as before.]
Within each world I’ve included a specification of the extension of each predicate. But the specification of the referent of the name ‘a’ does not go within any world; it was rather indicated (in boldface) at the top of the model. This is because names, unlike predicates, get assigned semantic values absolutely in a model, not relative to worlds. Time for the official model:

W = {r,a,b}
D = {u}
I(a) = u
I(F ) = {〈u,a〉}
I(G) = {〈u,b〉}

Let’s try one with quantifiers: 2∃xF x→∃x2F x:

[Diagram: world r, at which 2∃xF x→∃x2F x is marked false (0): the antecedent 2∃xF x is marked true (1), with an overstar, and the consequent ∃x2F x is marked false (0), with a + above it.]
The overstar above the 2 in the antecedent must be discharged in r itself, since, remember, every world sees every world in these models. That gives us a true existential. Now, a true existential is a bit like a true 3 — the true ∃xF x means that there must be some object u from the domain that’s in the extension of F in r. I’ll put a + under true ∃s and false ∀s, to indicate a commitment to some instance of some sort or other. Analogously, I’ll indicate a commitment to all instances of a given type (which would arise from a true ∀ or a false ∃) with a + above the connective in question.
OK, how do we make ∃xF x true in r? By making “F x” true for some value of x. Let’s put the letter u in the domain, and make “F x” true when u is assigned to x. We’ll indicate this by putting a 1 overtop of “F ux ” in the diagram. Now, “F ux ” isn’t a formula of our language — what it indicates is that “F x” is to be true when u is assigned to x. And to make this come true, we treat it as an atomic — we put u in the extension of F at r:

[Diagram: world r as before, now with F ux marked true (1) and F : {u}.]
Good. Now we’ve got to attend to the overplus, the + sign overtop the false ∃x2F x. Since it’s a false ∃, we’ve got to make 2F x false for every object in the domain (otherwise — if there were something in the domain for which 2F x was true — ∃x2F x would be true after all). So far, we’ve got only one object in our domain, u, so we’ve got to make 2F x false, when u is assigned to the variable ‘x’. We’ll indicate this on the diagram by putting a 0 overtop of “2F ux ”:

[Diagram: world r as before, now with 2F ux marked false (0), with an understar.]
CHAPTER 9. QUANTIFIED MODAL LOGIC
+
∗ 1 1
r
210
0 0
1
2 ∃ xF x→ ∃ x2F x +
F ux
0
2F ux ∗
F : {u}
0
1
F ux
1
F vx
∃ xF x +
a
F : {v} This move requires some explanation. Why the v? Well, I was required to make F x false, with u assigned to x. Well, that means keeping u out of the extension of F at a. Easy enough, right? Just make F ’s extension {}? Well, no — because of the true 2 in r, I’ve got to make ∃xF x true in a. But that means that something’s got to be in F ’s extension in a! It can’t be u, so I’ll add a new object, v, to the domain, and put it in F ’s extension in a. But adding v to the domain of the model adds a complication. We had an overplus in r — over the false ∃. That meant that, in r, for every member of the domain, 2F x is false. So, 2F x is false in r when v is assigned to x. That creates another understar, requiring the creation of a new world. The model then looks as follows: +
∗ 1 1
r
0 0
2 ∃ xF x→ ∃ x2F x +
1
0
2F ux ∗
F ux
0
2F vx ∗
F : {u}
0
F ux a
1
∃ xF x + F : {v}
1
0
F vx
F vx b
1
∃ xF x + F : {u}
1
F ux
CHAPTER 9. QUANTIFIED MODAL LOGIC
211
(Notice that we needn’t have made another world b — we could simply have discharged the understar on r.) Ok, here’s the official model:

W = {r,a,b}
D = {u,v}
I(F ) = {〈u,r〉,〈u,b〉,〈v,a〉}

Another example: I’ll do a validity proof for the following formula: 3∃x(x=a∧2F x)→F a:

i. Suppose for reductio that (for some model, some world r , and some variable assignment g ) Vg (3∃x(x=a∧2F x), r ) = 1, and …
ii. …that Vg (F a, r ) = 0.
iii. From i), for some w ∈ W, Vg (∃x(x=a∧2F x), w) = 1.
iv. So for some u ∈ D, Vgu/x (x=a∧2F x, w) = 1.
v. Thus, Vgu/x (x=a, w) = 1 and …
vi. …Vgu/x (2F x, w) = 1.
vii. From vi), Vgu/x (F x, r ) = 1.
viii. Thus, 〈[x] gu/x , r 〉 ∈ I(F ) — that is, 〈u, r 〉 ∈ I(F ).
ix. From v), [x] gu/x = [a] gu/x .
x. By the definition of denotation plus facts about variable assignments, u = I(a).
xi. By viii) and x), 〈I(a),r 〉 ∈ I(F ).
xii. Thus, Vg (F a, r ) = 1. Contradicts line ii).

Notice that in line xii I inferred that Vg assigned “F a” truth at r . I could have subscripted V with any variable assignment, since the truth condition for the formula “F a” is the same, regardless of the variable assignment; I picked g because that’s what I needed to get the contradiction.
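The official model can also be verified mechanically. Here is a quick sanity check, written with direct set comprehensions rather than a general evaluator:

```python
# The official countermodel for 2∃xFx→∃x2Fx, transcribed from above.
W = {"r", "a", "b"}
D = {"u", "v"}
F = {("u", "r"), ("u", "b"), ("v", "a")}

# Antecedent 2∃xFx: at every world, something is in F's extension there.
antecedent = all(any((d, w) in F for d in D) for w in W)
# Consequent ∃x2Fx: some one thing is in F's extension at every world.
consequent = any(all((d, w) in F for w in W) for d in D)

assert antecedent and not consequent   # so the conditional is false at r
```

As expected, every world has an F (u at r and b, v at a), but no single object is F at all three worlds.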
9.5
Philosophically interesting formulas of QML
Now, let’s try to come up with a countermodel for the following formula: ∀x∀y(x=y→2(x=y)). When we make the universal false, we get an underplus. So we’ve got to make the inside part, ∀y(x=y→2(x=y)), false for some value of x. We do this by putting some object u in the domain, and letting that be the value of x for which ∀y(x=y→2(x=y)) is false. We get:

[Diagram: world r, at which ∀x∀y(x=y→2(x=y)) is marked false (0) with a + under it, and its instance with u assigned to x, ∀y(x=y→2(x=y)), is also marked false (0) with a + under it.]
Now we need to do the same thing for our new false universal: ∀y(x=y→2(x=y)). For some value of y, the inside conditional has to be false. But that means that the antecedent must be true. So the value for y has to be u again. We get:

[Diagram: as before, plus the instance of the conditional with u assigned to both x and y: the whole conditional is marked false (0), its antecedent x=y is marked true (1), and its consequent 2(x=y) is marked false (0), with an understar.]
The understar now requires creation of a world in which x=y is false, when both x and y are assigned u. But there is no such world! An identity sentence is true (at any world) if the denotations of the terms are identical. Our attempt to find a countermodel has failed; we must do a validity proof. Consider any QML model 〈W,D,I〉, any r ∈ W, and any variable assignment g ; we’ll show that Vg (∀x∀y(x=y→2(x=y)), r ) = 1:

i. Suppose for reductio that Vg (∀x∀y(x=y→2(x=y)), r ) = 0.
ii. Then, for some u ∈ D, Vgu/x (∀y(x=y→2(x=y)), r ) = 0.
iii. So, for some v ∈ D, Vgu/x v/y (x=y→2(x=y), r ) = 0.
iv. Thus, Vgu/x v/y (x=y, r ) = 1, and …
v. …Vgu/x v/y (2(x=y), r ) = 0.
vi. From iv), [x] gu/x v/y = [y] gu/x v/y .
vii. From v), at some world, w, Vgu/x v/y (x=y, w) = 0.
viii. And so, [x] gu/x v/y ≠ [y] gu/x v/y . Contradicts vi).
Notice at the end how the particular world at which the identity sentence was false didn’t matter. The truth condition for an identity sentence is simply that the terms denote the same thing; it doesn’t matter what world this is evaluated relative to. This is a famous formula — it says that if “two” things are identical, then they are necessarily identical. It makes sense: “x=y” says that x and y are one and the same thing. Now, if there were a world in which x was different from y, since x and y are the same thing, this would have to be a world in which x was different from x. How could that be? Still, people found it strange. It was a great discovery that Hesperus = Phosphorus. Surely, it could have turned out the other way — surely, Hesperus might not have turned out identical to Phosphorus! But isn’t this a counterexample to this formula? For a discussion of this example, see Kripke (1972). A note about the variables ‘u’ and ‘v’ here. In validity proofs, I’m using italicized ‘u’ and ‘v’ as variables to range over objects in the domain of the model I’m considering. So, a sentence like ‘u = v’ might be true, just as the sentence ‘x=y’ of our object language can be true. But when I’m doing countermodels, I’m using the roman letters ‘u’ and ‘v’ as themselves being members of the domain, not as variables ranging over members of the domain. Since the letters ‘u’ and ‘v’ are different letters, they are different members of the domain. Thus, in a countermodel with letters in the domain, if the denotation of a name ‘a’ is the letter ‘u’, and the denotation of the name ‘b’ is the letter ‘v’, then the sentence ‘a=b’ has got to be false, since ‘u’ ≠ ‘v’. If I were using ‘u’ and ‘v’ as variables ranging over members of the domain, then the sentence ‘u = v’ might be true!
This just goes to show that it's important to distinguish between the following two sentences:

u = v
'u' = 'v'

The first could be true, depending on what 'u' and 'v' currently refer to, but the second is just plain false, since 'u' and 'v' are different letters.

Another famous formula of QML is the "Barcan Formula", named after Ruth Barcan Marcus. Schematically, it is ∀x2φ→2∀xφ. We'll just consider one of its instances: ∀x2F x→2∀xF x. If we try to find a countermodel for this formula, we get to the following stage:
[Countermodel attempt (diagram): in world r, ∀x2F x→2∀xF x is marked false, so the antecedent ∀x2F x receives an overplus and the consequent 2∀xF x an understar; discharging the understar yields a world a in which ∀xF x is marked false, F ux is false, and F's extension at a is { }.]
When you have a choice between discharging over-things and under-things, whether plusses or stars, always do the under-things first. In this case, this means discharging the understar and ignoring the over-plus for the moment. So, discharging the understar gave us world a, in which we made a universal false. This gave an underplus, and forced us to make an instance false. So I put object u in our domain, and keep it out of the extension of F in a. This makes F x false in a, when x is assigned u. But now, I need to discharge the overplus in r. I must make 2F x true for every member of the domain, including u, which is now in the domain. But then this requires F x to be true, when u is assigned to x, in a:

[Continued countermodel attempt (diagram): in r, 2F ux is now marked true, which requires F ux to be true at a; but F ux was already made false at a, so F's extension at a would have to both contain and exclude u — marked "F : {?}".]
So we fail to get a model. Time for a validity proof:

i. suppose for reductio that Vg (∀x2F x, r ) = 1 and …
ii. …Vg (2∀xF x, r ) = 0
iii. from ii), for some w, Vg (∀xF x, w) = 0
iv. so, for some u in the domain, Vgu/x (F x, w) = 0
v. from i), for every member of the domain, and so for u in particular, Vgu/x (2F x, r ) = 1
vi. thus, for every world, and so for w in particular, Vgu/x (F x, w) = 1. Contradicts iv)

It is a famous fact that the Barcan formula turns out valid (on the current semantics). This is famous because the Barcan formula seems, intuitively,
invalid. Suppose nothing exists besides God. Suppose further that God is essentially good. Then, letting 'F x' mean "x is good", it seems that ∀x2F x is true, since everything there is (namely, God) is necessarily good. And yet it shouldn't follow that 2∀xF x is true, since it should still be possible for there to be non-good things — consider a world like this world, for example.

The fact that the Barcan formula is valid in our present system is due to the assumption of "common domains" in our definition of a model. We have a single domain in the model — this corresponds to an assumption that the set of objects that exist does not vary from world to world. The God example depends on violating this assumption — it assumes the existence of a world with fewer objects than exist in other worlds.

There are various ways to get around this problem. The first one I'll mention is the simplest, and probably the least satisfactory. We'll understand the objects in the domain as being all of the possible objects. Thus, the QML formula '∀xF x', as uttered in the actual world, means that every possible object is F at the actual world, not just the things that exist at the actual world. But in ordinary language, when we say "everything has mass" or "there are no unicorns", we don't mean to be talking about all possible objects, just all existing objects. Ordinary quantifiers are restricted to the existing objects. To translate ordinary claims, then, we must introduce into our language an existence predicate, 'E'. We can use this predicate to translate what would ordinarily be expressed by:

If everything is necessarily good, then necessarily: everything is good.

The correct translation is not the Barcan formula, but rather:

∀x(E x→2F x)→2∀x(E x→F x)

which says: if everything that exists is necessarily good, then necessarily: everything that exists is good. And this formula is not valid, as may be shown by the following countermodel:
[Countermodel (diagram): a designated world r with E: { }, and a second world a with E: {u} and F: { }. In r the whole conditional ∀x(E x→2F x)→2∀x(E x→F x) is marked false: the antecedent comes out true (vacuously, since E's extension at r is empty), while the false consequent's understar yields world a, where E ux is true but F ux is false, making E x→F x false when x is assigned u.]
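The countermodel just sketched can be checked mechanically. The following quick check (the variable names and encoding are mine) assumes the model described above — E empty at r, E = {u} at a, F empty everywhere — and verifies that the antecedent is true at r while the consequent is false:

```python
W = ['r', 'a']
D = ['u']
E = {'r': set(), 'a': {'u'}}     # extension of the existence predicate
F = {'r': set(), 'a': set()}     # nothing is F at any world

# ∀x(Ex→2Fx) at r: vacuously true, since nothing is in E's extension at r
antecedent = all((u not in E['r']) or all(u in F[w] for w in W) for u in D)
# 2∀x(Ex→Fx): fails at a, where u is in E's extension but not F's
consequent = all(all((u not in E[w]) or (u in F[w]) for u in D) for w in W)

assert antecedent and not consequent   # the conditional is false at r
```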
My thinking was to first discharge the under-star under the false 2 in r. This gave me world a. Then, I discharged the under-plus in a by making E x→F x false when x is assigned u. But then I needed to discharge the over-plus in r, the true ∀x(E x→2F x), on u. This gave me the right-hand part of r: the true E x→2F x when x is assigned u. However, I was able to make this true by making both antecedent and consequent false. The false consequent gave me an understar, but this was already discharged in a.

The approach we have been considering gives a formally acceptable solution to the problem of the Barcan formula, but the approach is philosophically questionable. Pretend, again, that only God exists. The approach grants the truth of the following instance of the Barcan formula:

If everything is necessarily good, then necessarily: everything is good.

Clearly, the consequent is false. That means that the antecedent is false. But how can that be, given that only God exists? The reply of this approach is that the antecedent is false because there are many things other than God that are not necessarily good: these are things (like you and me, for instance) that do not exist. Everything that exists is necessarily good, but some of the nonexistent things are not necessarily good. The problem with this approach is now clear: it assumes that there are some things that do not exist. This is a philosophical idea that is quite unpopular. According to most philosophers, "there is something that does not exist" is a contradiction in terms.

The question of the Barcan formula is one interesting question about how quantifiers and modal operators interact. Here are some other formulas
concerning the interaction of the 2 and the quantifiers (I also list the equivalents for the 3 alongside):

2∀xφ→∀x2φ   ∃x3φ→3∃xφ   (Converse Barcan)
∀x2φ→2∀xφ   3∃xφ→∃x3φ   (Barcan)
2∃xφ→∃x2φ   ∀x3φ→3∀xφ
∃x2φ→2∃xφ   3∀xφ→∀x3φ
We have already discussed the Barcan formula, and also the third formula on the list, 2∃xφ→∃x2φ, which, quite properly, turns out invalid on our semantics. Let's look at the other two. First, the converse Barcan formula:

2∀xF x→∀x2F x

This has some plausibility, although it could be challenged on the following grounds. The antecedent, it might be argued, says that in every world, everything that exists in that world is F. Existents are thus always F. It might still be that some object isn't necessarily F: perhaps some object that is F in every world in which it exists fails to be F in worlds in which it doesn't exist. This talk of an object being F in a world in which it doesn't exist may seem strange, but let F stand for 'exists'. In every world, everything that exists in that world exists; but assuming that some things are not necessarily existent, some things are such that at other worlds, they don't have the property of existence. The same counterexample works for, and is perhaps more intuitive with, the 3 version of the converse Barcan formula, ∃x3φ→3∃xφ: let φ be "does not exist" ("there exists something that is possibly nonexistent; therefore it is possible that there exists something that doesn't exist"). This counterexample requires i) the assumption that what exists varies from world to world, ii) the assumption that the quantifiers range only over existents, and iii) the assumption that the wff 2F x is true only if x is F in all worlds, not just all worlds in which x exists. Our current semantics denies i). When we look at alternate semantics for QML below, we'll need to re-evaluate this objection. Notice that our existence-predicate solution to the objection to the Barcan formula works again here; the proper symbolization of "necessarily, everything exists; therefore, everything necessarily exists" would be:
2∀x(E x→E x)→∀x(E x→2E x)

which is invalid. Now for the final formula:

∃x2φ→2∃xφ

In fact, this formula is valid in our semantics, which can be demonstrated as follows:

i. suppose Vg (∃x2φ, w) = 1, and …
ii. …Vg (2∃xφ, w) = 0
iii. by i), for some u ∈ D, Vgu/x (2φ, w) = 1
iv. so, for every world v, Vgu/x (φ, v) = 1
v. by ii), for some world, v, Vg (∃xφ, v) = 0
vi. so, for every member of the domain, and so for u in particular, Vgu/x (φ, v) = 0. Contradicts iv)

But this result appears to be implausible, for reasons similar to the reasons for rejecting the Barcan formula. Let's suppose that physical objects are necessarily physical. Then ∃x2P x seems true, letting P mean 'is physical'. But 2∃xP x seems false — it seems possible that there be no physical objects. This counterexample requires that there be worlds with fewer objects than those that actually exist, whereas the counterexample to the Barcan formula involved the possibility that there be more objects than those that actually exist. Again, our existence-predicate solution solves the problem, if one is willing to ignore its philosophically questionable assumptions. The claim "if something is necessarily physical then necessarily, something is physical" gets symbolized as:

∃x(E x∧2P x)→2∃x(E x∧P x)

which is invalid.
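This invalidity claim can likewise be verified concretely. Here is a small sketch under the chapter's constant-domain semantics (no accessibility relation); the particular model and all names are my own, chosen to match the "physical objects" example:

```python
W = ['r', 'a']
D = ['u']
E = {'r': {'u'}, 'a': set()}      # u "exists" only at r
P = {'r': {'u'}, 'a': {'u'}}      # u is physical at every world

# ∃x(Ex∧2Px) at r: u exists at r and is physical at every world
antecedent = any(u in E['r'] and all(u in P[w] for w in W) for u in D)
# 2∃x(Ex∧Px): fails at a, where nothing is in E's extension
consequent = all(any(u in E[w] and u in P[w] for u in D) for w in W)

assert antecedent and not consequent   # the conditional is false at r
```

World a is one in which nothing "exists", so the consequent fails there even though the one possible object is physical everywhere.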
9.6 Variable domains
We now consider a way of dealing with the problems discussed above that is more philosophically appealing than the existence-predicate approach. According to the approach we will now consider, the philosophically problematic feature of the simple semantics for QML is its single common domain of objects shared by all possible worlds. Once this feature is accepted, one faces a choice between two unappealing alternatives: i) the same individuals exist in each possible world, or ii) some objects do not exist. A more philosophically acceptable semantics avoids each of these alternatives by doing away with a common domain, in favor of different domains for different possible worlds. We will also reinstate the accessibility relation, for reasons to be made clear below:¹

A variable-domain QML model is an ordered 5-tuple 〈W,R,D,Q,I〉 such that:

i) W is a nonempty set ("possible worlds")
ii) R is a binary relation on W ("accessibility relation")
iii) D is a non-empty set ("domain")
iv) Q is a function that assigns to any w ∈ W a subset of D. Let us refer to Q(w) as "Dw". Think of Dw as the set of objects that exist at w.
v) I is a function ("interpretation function") such that:
   a) if α is a constant then I(α) ∈ D
   b) if Π is an n-place predicate then I(Π) is a set of ordered n+1-tuples 〈u1, …, un, w〉, where u1, …, un are members of D, and w ∈ W

¹More care than I take is needed in converting the earlier definition of validity if one is worried about the validity of formulas with free variables. See Cresswell and Hughes (1996a, p. 275).

Denotation is defined as before. Truth in a model may then be defined thus:

Definition of the valuation function VM,g for a variable-domain QML model M (=〈W,R,D,Q,I〉), relative to a variable assignment g:
i) for any terms α and β, VM,g (α=β, w) = 1 iff [α]M,g = [β]M,g

ii) for any n-place predicate Π and any terms α1, …, αn, VM,g (Πα1…αn, w) = 1 iff 〈[α1]M,g, …, [αn]M,g, w〉 ∈ I(Π)

iii) for any wffs φ and ψ, and any variable α:
   a) VM,g (∼φ, w) = 1 iff VM,g (φ, w) = 0
   b) VM,g (φ→ψ, w) = 1 iff either VM,g (φ, w) = 0 or VM,g (ψ, w) = 1
   c) VM,g (∀αφ, w) = 1 iff for each u ∈ Dw, VM,gu/α (φ, w) = 1
   d) VM,g (2φ, w) = 1 iff for each v ∈ W, if Rwv then VM,g (φ, v) = 1

(The obvious derived clauses for the existential and the diamond are as follows:

Vg (∃αφ, w) = 1 iff for some u ∈ Dw, Vgu/α (φ, w) = 1
Vg (3φ, w) = 1 iff for some v ∈ W, Rwv and Vg (φ, v) = 1)

There were two main adjustments. First, we reinstated the accessibility relation. Second, we introduced subdomains. We still have D, the one big domain of all possible individuals. But for each possible world w, we introduce a subset of the domain, Dw, to be the domain for w. When evaluating a quantified sentence at a world w, the quantifier ranges only over Dw.
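The truth definition above can be transcribed almost clause-for-clause into executable form. The following is a minimal sketch (the tuple encoding of wffs and all names are my own, not the text's); I doubles as a lookup table for predicate extensions and constant denotations:

```python
# wffs: ('~',A), ('->',A,B), ('box',A), ('all',var,A), ('=',t1,t2),
# and atomic predications such as ('F','x'); a model is (W, R, D, Q, I)
def den(I, g, t):
    return g[t] if t in g else I[t]

def V(M, g, fml, w):
    W, R, D, Q, I = M
    op = fml[0]
    if op == '~':
        return 1 - V(M, g, fml[1], w)
    if op == '->':
        return max(1 - V(M, g, fml[1], w), V(M, g, fml[2], w))
    if op == 'box':        # only worlds accessible from w matter
        return min([V(M, g, fml[1], v) for v in W if (w, v) in R] or [1])
    if op == 'all':        # the quantifier ranges over Q[w] only
        var, body = fml[1], fml[2]
        return min([V(M, {**g, var: u}, body, w) for u in Q[w]] or [1])
    if op == '=':
        return 1 if den(I, g, fml[1]) == den(I, g, fml[2]) else 0
    # atomic predication: extensions are sets of (u1, ..., un, w) tuples
    args = tuple(den(I, g, t) for t in fml[1:])
    return 1 if args + (w,) in I[op] else 0

# the Barcan countermodel given in the next subsection, checked at r
M = (['r', 'a'],
     {('r', 'r'), ('r', 'a'), ('a', 'a')},
     ['u', 'v'],
     {'r': ['u'], 'a': ['u', 'v']},
     {'F': {('u', 'r'), ('u', 'a')}})
barcan = ('->', ('all', 'x', ('box', ('F', 'x'))),
                ('box', ('all', 'x', ('F', 'x'))))
assert V(M, {}, barcan, 'r') == 0
```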
9.6.1 Countermodels to the Barcan and related formulas in variable-domain QML

What is the effect of this new truth definition on the Barcan formula and related formulas? All of the following come out invalid:

2∀xφ→∀x2φ   ∃x3φ→3∃xφ   (Converse Barcan)
∀x2φ→2∀xφ   3∃xφ→∃x3φ   (Barcan)
2∃xφ→∃x2φ   ∀x3φ→3∀xφ
∃x2φ→2∃xφ   3∀xφ→∀x3φ
The third one on the list was invalid before, and so is still invalid now. For note that every one of our old models is equivalent to one of the new models,
namely, one in which Q is a constant function assigning the whole domain to each world. As for the Barcan formula ∀x2F x→2∀xF x, consider the following model:

W: {r,a}
R: {〈r,r〉, 〈r,a〉, 〈a,a〉}
D: {u,v}
Dr: {u}   Da: {u,v}
I(F ): {〈u,r〉, 〈u,a〉}

Here ∀x2F x is true at r (u, the only thing existing in r, is F at every world accessible from r), but 2∀xF x is false at r (in the accessible world a, v exists but is not F).

As for the fourth schema on the list, we can construct a countermodel to one of its instances, ∃x2F x→2∃xF x, as follows:
W: {r,a}
R: {〈r,r〉, 〈r,a〉, 〈a,a〉}
D: {u,v}
Dr: {u,v}   Da: {v}
I(F ): {〈u,r〉, 〈u,a〉}

Here ∃x2F x is true at r (u exists in r and is F at every world accessible from r), but 2∃xF x is false at r (in the accessible world a, only v exists, and v is not F there).

And for the converse Barcan formula, 2∀xF x→∀x2F x, we have the following countermodel:
W: {r,a}
R: {〈r,r〉, 〈r,a〉, 〈a,a〉}
D: {u,v}
Dr: {u,v}   Da: {v}
I(F ): {〈u,r〉, 〈v,r〉, 〈v,a〉}

Here 2∀xF x is true at r (everything existing in r is F at r, and everything existing in a is F at a), but ∀x2F x is false at r (u exists in r, yet u is not F at the accessible world a).
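All three countermodels can be verified mechanically against the variable-domain truth definition. A compact sketch (the encoding is mine; the 2 is evaluated over the worlds accessible via R, and quantifiers range over each world's own domain):

```python
R = {('r', 'r'), ('r', 'a'), ('a', 'a')}
def acc(w):                                  # worlds accessible from w
    return [v for (x, v) in R if x == w]

def boxF(ext, u, w):                         # V(2Fx, w) with x assigned u
    return all((u, v) in ext for v in acc(w))

# Barcan, ∀x2Fx→2∀xFx, fails at r:
Dw  = {'r': {'u'}, 'a': {'u', 'v'}}
ext = {('u', 'r'), ('u', 'a')}
assert all(boxF(ext, u, 'r') for u in Dw['r'])                       # ∀x2Fx
assert not all(all((u, v) in ext for u in Dw[v]) for v in acc('r'))  # 2∀xFx

# ∃x2Fx→2∃xFx fails at r:
Dw  = {'r': {'u', 'v'}, 'a': {'v'}}
ext = {('u', 'r'), ('u', 'a')}
assert any(boxF(ext, u, 'r') for u in Dw['r'])                       # ∃x2Fx
assert not all(any((u, v) in ext for u in Dw[v]) for v in acc('r'))  # 2∃xFx

# Converse Barcan, 2∀xFx→∀x2Fx, fails at r:
Dw  = {'r': {'u', 'v'}, 'a': {'v'}}
ext = {('u', 'r'), ('v', 'r'), ('v', 'a')}
assert all(all((u, v) in ext for u in Dw[v]) for v in acc('r'))      # 2∀xFx
assert not all(boxF(ext, u, 'r') for u in Dw['r'])                   # ∀x2Fx
```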
9.6.2 Expanding, shrinking domains
There are several comments worth making about these models. First, note that if we made certain restrictions on the models, the counterexamples would go away. For example, the first example, the counterexample to the Barcan formula, required a model in which the domain expanded: world a was accessible from world r, and had a larger domain. But suppose we made the:

Decreasing domains requirement: if Rwv, then Dv ⊆ Dw

The counterexample would then go away. Indeed, the Barcan formula would then become valid, which may be proved as follows:

i. suppose for reductio that Vg (∀x2φ, w) = 1 and…
ii. …Vg (2∀xφ, w) = 0
iii. by ii), for some v, Rwv and Vg (∀xφ, v) = 0
iv. and so, for some u ∈ Dv, Vgu/x (φ, v) = 0
v. given decreasing domains, Dv ⊆ Dw, and so u ∈ Dw
vi. by i), for every object in Dw, and so for u in particular, Vgu/x (2φ, w) = 1
vii. so, Vgu/x (φ, v) = 1. Contradicts iv)
Similarly, notice that the counterexamples to ∃x2F x→2∃xF x and the converse Barcan formula assumed that domains can shrink. Just as above, an added requirement on models will validate these formulas:

Increasing domains requirement: if Rwv, then Dw ⊆ Dv

Let's first show that ∃x2φ→2∃xφ turns out valid under this added requirement:

i. suppose for reductio that Vg (∃x2φ, w) = 1, and…
ii. …Vg (2∃xφ, w) = 0
iii. by i), for some u ∈ Dw, Vgu/x (2φ, w) = 1
iv. by ii), for some world v, Rwv and Vg (∃xφ, v) = 0
v. by the increasing domains requirement, Dw ⊆ Dv, and so u ∈ Dv
vi. by iv), for every object in Dv, and so for u in particular, Vgu/x (φ, v) = 0
vii. by iii), Vgu/x (φ, v) = 1. Contradicts vi)

We now show that the converse Barcan formula turns out valid as well under the increasing domains requirement:

i. suppose for reductio that Vg (2∀xφ, w) = 1, and…
ii. …Vg (∀x2φ, w) = 0
iii. so, for some u ∈ Dw, Vgu/x (2φ, w) = 0
iv. so, for some v, Rwv and Vgu/x (φ, v) = 0
v. by i), Vg (∀xφ, v) = 1
vi. by increasing domains, u ∈ Dv
vii. by v), for every object in Dv, and so for u in particular, Vgu/x (φ, v) = 1. Contradicts iv)
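That the increasing domains requirement closes off all converse-Barcan countermodels, and not just the one displayed, can be spot-checked by brute force over tiny models. The following sketch, with two worlds and two objects (all names mine), finds countermodels in general but none satisfying the requirement:

```python
from itertools import combinations, product

W = [0, 1]
D = ['a', 'b']

def subsets(xs):
    return [set(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def conv_barcan_fails(R, Q, ext, r):
    """True iff 2∀xFx→∀x2Fx is false at r in the variable-domain model."""
    acc = [v for (x, v) in R if x == r]
    ante = all(all((u, v) in ext for u in Q[v]) for v in acc)   # 2∀xFx
    cons = all(all((u, v) in ext for v in acc) for u in Q[r])   # ∀x2Fx
    return ante and not cons

def increasing(R, Q):
    return all(Q[w] <= Q[v] for (w, v) in R)

found_plain = found_increasing = False
pairs = [(w, v) for w in W for v in W]
for R in subsets(pairs):
    for doms in product(subsets(D), repeat=len(W)):
        Q = dict(zip(W, doms))
        for ext in subsets([(u, w) for u in D for w in W]):
            for r in W:
                if conv_barcan_fails(R, Q, ext, r):
                    found_plain = True
                    if increasing(R, Q):
                        found_increasing = True

assert found_plain           # countermodels exist in general ...
assert not found_increasing  # ... but none satisfies increasing domains
```

This is only a finite check, of course; the proof above is what covers models of every size.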
Note further that even after imposing the increasing domains requirement, the Barcan formula remains invalid; and after imposing the decreasing domains requirement, the converse Barcan and also ∃x2φ→2∃xφ remain invalid; this can be seen by the original countermodels for these formulas above, each of which obeys the requirements in question. However, it should also be noted that in systems in which the accessibility relation is symmetric, this collapses: imposing either of these requirements results in imposing the other. That is, in B or S5, imposing either the increasing or the decreasing domains requirement results in imposing both, and hence results in all three formulas being validated.
9.6.3 Properties and nonexistence
There is a question one might have about our models, raised by how the interpretation function behaves with respect to predicates:

if Π is an n-place predicate then I(Π) is a set of ordered n+1-tuples 〈u1, …, un, w〉, where u1, …, un are members of D, and w ∈ W

This in effect allows objects to have properties in worlds in which they do not exist, for nothing rules out Π having in its extension 〈u1, …, un, w〉 even if some of the ui are not in Dw. This might be thought improper: we might for philosophical reasons think that objects cannot have properties in worlds in which they do not exist. This thought would result in the following modification to the definition of an interpretation function:

if Π is an n-place predicate then I(Π) is a set of ordered n+1-tuples 〈u1, …, un, w〉, where u1, …, un are members of Dw, and w ∈ W

This says that the extension of a predicate at a world always consists of n-tuples drawn from the domain of that world. But this raises a number of questions. What do we now say about the truth value of a formula F x at a world where the current referent of x doesn't exist? We could leave the definitions as they were originally stated. This makes an atomic sentence like F x false at any world where the referent of x does not exist. But there is a philosophical problem one can raise at this point. Begin with any language. Now shift to a new language in which the atomic formulas mean the negations of what they meant in the first language. Now we get a problem. In each language, the formula F a is false at any world in which the
referent of 'a' does not exist, despite the fact that 'F a' in the second language is supposed to express the same thing as the negation of 'F a' in the first language. The problem would be avoided if we could always require the atomics to express "positive" properties, but that requires making philosophical sense of the notion of a positive property.

Continuing to assume that objects never have properties at worlds at which they don't exist, an alternate approach would be to say that atomic formulas containing a term that doesn't denote anything at some world lack truth value at that world. Here is a start at defining the valuation function based on this idea:

Relative to any variable-domain QML model M (=〈W,R,D,Q,I〉) and any variable assignment g, the corresponding valuation function VM,g is defined as the possibly partial function that assigns either 0 or 1 or nothing to each wff relative to each member of W, subject to the following constraints:

i) for any terms α and β, VM,g (α=β, w) = 1 iff [α]M,g ∈ Dw, [β]M,g ∈ Dw, and [α]M,g = [β]M,g; VM,g (α=β, w) = 0 iff [α]M,g ∈ Dw, [β]M,g ∈ Dw, and [α]M,g ≠ [β]M,g

ii) for any n-place predicate Π and any terms α1, …, αn, VM,g (Πα1…αn, w) = 1 iff 〈[α1]M,g, …, [αn]M,g, w〉 ∈ I(Π); VM,g (Πα1…αn, w) = 0 iff [α1]M,g, …, [αn]M,g are all members of Dw but 〈[α1]M,g, …, [αn]M,g, w〉 ∉ I(Π)

Note how we need to give conditions for falsity as well as truth. We're no longer assuming that valuations assign either 0 or 1, so specifying the conditions under which a sentence is true no longer suffices to specify the conditions under which it is false. So far, the definition counts atomics as lacking truth value when their terms fail to denote. But now we face a choice of how to continue the definition of the valuation function. What truth values do complex formulas get when their constituents lack truth values?
This brings up some of the issues from section 3.3: what truth tables are appropriate when we allow truth-value gaps? A simple approach is Bochvar's: a complex formula is gappy whenever any of its constituents is gappy. This could be implemented thus:²

²Note: although this gives → the Bochvar truth conditions, it gives the 2 and ∀ strong Kleene-like truth conditions.
iii) for any wffs φ and ψ, and any variable α:
   a) VM,g (∼φ, w) = 1 iff VM,g (φ, w) = 0; VM,g (∼φ, w) = 0 iff VM,g (φ, w) = 1
   b) VM,g (φ→ψ, w) = 1 iff either VM,g (φ, w) = 0 and VM,g (ψ, w) = 0, or VM,g (φ, w) = 0 and VM,g (ψ, w) = 1, or VM,g (φ, w) = 1 and VM,g (ψ, w) = 1; VM,g (φ→ψ, w) = 0 iff VM,g (φ, w) = 1 and VM,g (ψ, w) = 0
   c) VM,g (∀αφ, w) = 1 iff for every u ∈ Dw, VM,gu/α (φ, w) = 1; VM,g (∀αφ, w) = 0 iff for some u ∈ Dw, VM,gu/α (φ, w) = 0
   d) VM,g (2φ, w) = 1 iff for every v ∈ W such that Rwv, VM,g (φ, v) = 1; VM,g (2φ, w) = 0 iff for some v ∈ W such that Rwv, VM,g (φ, v) = 0

But certain problems arise. First, 2φ is true only if φ is true at all accessible worlds. But if φ contains constants or variables that denote things that do not necessarily exist, then φ will lack truth value in, and hence will fail to be true in, some worlds. Thus, the symbolization of "necessarily, if Ted is human then Ted is human" fails to be true. One could revise the truth definition for the 2:

VM,g (2φ, w) = 1 iff for every v ∈ W such that Rwv, VM,g (φ, v) is not 0; VM,g (2φ, w) = 0 iff for some v ∈ W such that Rwv, VM,g (φ, v) = 0

But then the principle 2φ→φ is no longer valid: under this definition, 2φ will be true at w provided φ is not false at any accessible world; but it is consistent with this that φ itself lacks truth value at w. We could eliminate this problem by revising the definition again:

VM,g (2φ, w) = 1 iff VM,g (φ, w) = 1, and for every v ∈ W such that Rwv, VM,g (φ, v) is not 0; VM,g (2φ, w) = 0 iff for some v ∈ W such that Rwv, VM,g (φ, v) = 0

But problems remain. For example, 2(φ∧ψ)→2φ won't be valid: the reason is that 2(φ∧ψ) will only be falsified by worlds containing the referents of the constants in both φ and ψ. There could be a world containing the referents
of the constants in φ, in which φ is false, but this won't make 2(φ∧ψ) false if the constants in ψ don't denote at this world. A move at this point would be to adopt truth tables other than Bochvar's — Kleene's, say. On Kleene's ("strong") tables, if the truth values that are defined for some constituents settle the truth value of the complex, regardless of what truth values the other constituents take on, then that settled truth value is the truth value of the complex. For example, (false ∧ undefined) comes out false. Thus, at a world where φ is false, so is φ∧ψ; and so the counterexample to the validity of 2(φ∧ψ)→2φ no longer works. There is more that can be said here, but we'll stop with that.
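The Bochvar and strong Kleene options just discussed can be sketched as small truth-functions, with None standing in for a truth-value gap (the function names and encoding are mine):

```python
GAP = None   # truth-value gap

def neg(a):
    return GAP if a is GAP else 1 - a

def arrow_bochvar(a, b):          # gappy whenever a constituent is gappy
    if a is GAP or b is GAP:
        return GAP
    return 0 if (a, b) == (1, 0) else 1

def conj_kleene(a, b):            # strong Kleene: a defined part can settle it
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return GAP

def box(vals):                    # clause d): true everywhere / false somewhere
    if all(v == 1 for v in vals):
        return 1
    if any(v == 0 for v in vals):
        return 0
    return GAP

# the problem in the text: a gap at one world keeps 2φ from being true
assert box([1, GAP]) is GAP
# strong Kleene rescues 2(φ∧ψ)→2φ: a world where φ is false
# falsifies φ∧ψ even if ψ is gappy there
assert conj_kleene(0, GAP) == 0
```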
Chapter 10 Two-dimensional modal logic
In this chapter we consider an extension to modal logic with considerable philosophical interest.
10.1 Actuality
The word 'actually', in one of its senses anyway, can be thought of as a one-place sentence operator: "Actually, φ". 'Actually' might at first seem redundant: "Actually, snow is white" basically amounts to "snow is white". But the actuality operator interacts with modal operators in interesting ways. The following two sentences, for example, clearly have different meanings:

Necessarily, if snow is white then snow is white
Necessarily, if snow is white then snow is actually white

The first sentence expresses the triviality that snow is white in any possible world in which snow is white. But the second sentence makes the nontrivial claim that if snow is white in any world, then snow is white in the actual world. So 'actually' is nonredundant and, consequently, worth thinking about. Let's add a symbol to modal logic for it: "@φ" will symbolize "Actually, φ". We can now symbolize the pair of sentences above as 2(S→S) and 2(S→@S),
respectively. For some further examples of sentences we can symbolize using 'actually', consider:¹

It might have been that everyone who is actually rich is poor
3∀x(@Rx→P x)

There could have existed something that does not actually exist
3∃x@∼∃y y=x
10.1.1 Kripke models with actual worlds
For the purposes of this chapter, the logic of iterated boxes and diamonds isn't relevant, so let's simplify things by dropping the accessibility relation from models; we will thereby treat every world as being accessible from every other. Before laying out the semantics of @, let's examine a slightly different way of laying out standard modal logic. For propositional modal logic, instead of defining a model as an ordered pair 〈W,I〉 (no accessibility relation, remember), one could instead define a model as a triple 〈W, r,I〉, where W and I are as before, and r is a member of W, thought of as the actual, or designated, world of the model. The designated world r plays no role in the definition of the valuation for a given model; it plays a role only in the definitions of truth in a model and validity:

φ is true in model M (= 〈W, r,I〉) iff VM (φ, r) = 1
φ is valid in system S iff φ is true in all models for system S

The old definition of validity for a system, recall, never employed the notion of truth in a model; rather, it proceeded via the notion of validity in a frame. The nice thing about the new definition is that it is parallel to the way validity is usually defined in model theory: one first defines truth in a model, and then defines validity as truth in all models. But the new definition doesn't differ in any substantive way from the old definition, in that it yields exactly the same class of valid formulas:

¹In certain special cases, we could do without the new symbol @. For example, instead of symbolizing "Necessarily, if snow is white then snow is actually white" as 2(S→@S), we could symbolize it as 3S→S. But the @ is not in general eliminable; see Hodes (1984b,a).

Proof: It's obvious that everything valid on the old definition is valid on the new definition (the old definition says that validity
is truth in all worlds in all models; the addition of the designated world r doesn’t play any role in defining truth at worlds, so each of the new models has the same distribution of truth values as one of the old models.) Moreover, suppose that a formula is invalid on the old definition — i.e., suppose that φ is false at some world, w, in some model M. Now construct a model of the new variety that’s just like M except that its designated world is w. φ will be false in this model, and so φ turns out invalid under the new definition. In a parallel way, one can also add a designated world to models for quantified modal logic.
10.1.2 Semantics for @
Now for the semantics of @. We can give @ a very simple semantics using models with designated worlds. Further, the designated world will now be involved in the notion of truth in a model, not just in the definition of validity. We'll move straight to quantified modal logic, bypassing propositional logic. To keep things simple, let the models have a constant domain and no accessibility relation. (It will be obvious how to add these complications back in, if they are desired.) Define a designated-world QML model as a four-tuple 〈W, r, D,I〉, where:

i) W is a non-empty set ("worlds")
ii) r is a member of W ("designated/actual world")
iii) D is a non-empty set ("domain")
iv) I is a function that assigns semantic values (as before — names are assigned members of D; predicates are assigned extensions relative to worlds)

In the definition of the valuation for such a model, the semantic clauses for the old logical constants run as before, except that the clause for the 2 no longer mentions accessibility:

VM,g (2φ, w) = 1 iff for every v ∈ W, VM,g (φ, v) = 1

And we now add a clause for the new operator @:

VM,g (@φ, w) = 1 iff VM,g (φ, r) = 1

That is, @φ is true at any world iff φ is true at the designated world of the model.
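The designated-world semantics is easy to prototype. A propositional sketch (the quantificational clauses would be added in the same style; the tuple encoding and names are mine):

```python
def V(M, fml, w):
    W, r, I = M                   # I maps sentence letters to sets of worlds
    op = fml[0]
    if op == 'atom':
        return w in I[fml[1]]
    if op == '~':
        return not V(M, fml[1], w)
    if op == '->':
        return (not V(M, fml[1], w)) or V(M, fml[2], w)
    if op == 'box':               # no accessibility relation: all worlds
        return all(V(M, fml[1], v) for v in W)
    if op == '@':                 # @ always looks at the designated world r
        return V(M, fml[1], r)

S = ('atom', 'S')
# 2(S→@S) is true when S holds at the designated world ...
assert V((['r', 'a'], 'r', {'S': {'r'}}), ('box', ('->', S, ('@', S))), 'r')
# ... and false when S holds only at a non-designated world
assert not V((['r', 'a'], 'r', {'S': {'a'}}), ('box', ('->', S, ('@', S))), 'r')
```

The two assertions mirror the nontriviality point from section 10.1: 2(S→@S) constrains other worlds by what happens at the designated world.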
10.1.3 Examples
Example 1: Show that ⊨ ∀x(F x∨2Gx)→2∀x(Gx∨@F x):

i) Suppose for reductio that this formula is not valid. Then for some model and some variable assignment g, Vg (∀x(F x∨2Gx)→2∀x(Gx∨@F x), r) = 0.
ii) Then Vg (∀x(F x∨2Gx), r) = 1 and…
iii) …Vg (2∀x(Gx∨@F x), r) = 0
iv) Given the latter, there is some world, call it "a", such that Vg (∀x(Gx∨@F x), a) = 0. And so, there is some object, call it "u", in the model's domain, such that Vgu/x (Gx∨@F x, a) = 0
v) And so, Vgu/x (Gx, a) = 0 and…
vi) …Vgu/x (@F x, a) = 0
vii) Given the latter, Vgu/x (F x, r) = 0 (by the clause in the truth definition for @)
viii) Given ii), for every object in the domain, and so for u in particular, Vgu/x (F x∨2Gx, r) = 1
ix) And so, either Vgu/x (F x, r) = 1 or Vgu/x (2Gx, r) = 1
x) From ix) and vii), Vgu/x (2Gx, r) = 1
xi) And so, Vgu/x (Gx, a) = 1. Contradicts v)

Example 2: Show that ⊭ 2∀x(Gx∨@F x)→2∀x(Gx∨F x). Here is a model in which this formula is false at the actual world, r:

W = {r,a}
D = {u}
I(F ) = {〈u, r〉}
I(G) = { }

The formula turns out false in this model: the consequent is false because at world a, something (namely, u) is neither G nor F; but the antecedent is true: since u is F at r, it's necessary that u is either G or actually F.
10.2 ×
Adding @ to the language of quantified modal logic is a step in the right direction, since it allows us to express certain kinds of comparisons between possible worlds that we couldn't express otherwise. But it doesn't go far enough; we need a further addition.² Consider this sentence:

It might have been the case that, if all those then rich might all have been poor, then someone is happy

What it's saying, in possible-worlds terms, is this:

For some world w, if there's a world v such that (everyone who is rich in w is poor in v), then someone is happy in w.

This is a bit like "It might have been that everyone who is actually rich is poor"; in this new sentence the word 'then' plays a role a bit like the role 'actually' played in the earlier sentence. But the 'then' does not take us back to the actual world of the model; it rather takes us back to the world, w, that is introduced by the first possibility operator, 'it might have been the case that'. We cannot, therefore, symbolize our new sentence thus:

3(3∀x(@Rx→P x)→∃xH x)

for this has the truth condition that there is a world w such that, if there's a world v such that (everyone who is rich in r is poor in v), then someone is happy in w. The problem is that the @, as we've defined it, always takes us back to the model's designated world, whereas what we need to do is to "mark" a world, and have @ take us back to the "marked" world:

3×(3∀x(@Rx→P x)→∃xH x)

Here × marks the spot: it indicates a point of reference for subsequent occurrences of @.

²See Hodes (1984a) on the limitations of @; see Cresswell (1990) on × (which he calls "Ref"), and further related additions.
10.2.1 Two-dimensional semantics for ×
So let's further augment the language of QML with another one-place sentence operator, ×. The idea is that ×φ means the same thing as φ, except that subsequent occurrences of @ in φ are to be interpreted as picking out the world that was the "current world of evaluation" when the × was encountered. (This will become clearer once we lay out the semantics for × and @.)

To lay out this semantics, let's return to the old QML models (i.e., without a designated world; and let's continue to omit the accessibility relation). Thus, a model is a triple 〈W, D,I〉, with W a non-empty set, D a non-empty set, and I a function assigning referents to names and extensions to predicates at worlds, as before. But now we change the definition of truth. We no longer evaluate formulas at worlds. Instead we evaluate a formula at a pair of worlds (hence: "two-dimensional semantics"). One world is the world we're used to; it's the world that we're evaluating the formula for truth in. Call this the "world of evaluation". The other world is a "reference world" — it's the world that we're currently thinking of as the actual world, and the world that will be relevant to the evaluation of @. Thus, VM,g (φ, w1, w2) will mean that φ is true at world w2, with reference world w1. We define the denotation function [ ] as before. And VM,g is defined as the function that assigns to each wff, relative to each pair of worlds, either 0 or 1, subject to the following constraints:

i) VM,g (Πα1…αn, v, w) = 1 iff 〈[α1]M,g, …, [αn]M,g, w〉 ∈ I(Π)
ii) VM,g (∼φ, v, w) = 1 iff VM,g (φ, v, w) = 0
iii) VM,g (φ→ψ, v, w) = 1 iff VM,g (φ, v, w) = 0 or VM,g (ψ, v, w) = 1
iv) VM,g (∀αφ, v, w) = 1 iff for all u ∈ D, VM,gu/α (φ, v, w) = 1
v) VM,g (2φ, v, w) = 1 iff for all w′ ∈ W, VM,g (φ, v, w′) = 1
vi) VM,g (@φ, v, w) = 1 iff VM,g (φ, v, v) = 1
vii) VM,g (×φ, v, w) = 1 iff VM,g (φ, w, w) = 1

And then we define validity thus:
φ is valid iff for every model M, every world w in that model, and every assignment g based on that model, VM,g(φ, w, w) = 1

Note what the × does: it changes the reference world. When evaluating a formula, it says to forget about the old reference world, and make the new reference world whatever the current world of evaluation happens to be.

We defined validity as truth at every pair of worlds of the form 〈w, w〉. But this isn't the only notion of validity one could introduce; there is also the notion of truth at every pair of worlds:3

φ is generally valid iff for every model M, all worlds v and w in that model, and every variable assignment g based on that model, VM,g(φ, v, w) = 1

These come apart in various ways, as we'll see below.

As we saw, moving to this new language increases the flexibility of the @; we can symbolize

It might have been the case that, if all those then rich might all have been poor, then someone is happy

as

◇×(◇∀x(@Rx→Px)→∃xHx)

Moreover, it costs us nothing. For we can replace any sentence φ of the old language with ×φ in the new language (i.e., we just put the × operator at the front of the sentence).4 For example, instead of symbolizing

It might have been that everyone who is actually rich is poor

as ◇∀x(@Rx→Px), as we did before, we symbolize it now as ×◇∀x(@Rx→Px).
3 The term 'general validity' is from Davies and Humberstone (1980); the earlier definition of validity corresponds to their "real-world validity".
4 This amounts to the same thing as the old symbolization in the following sense. Let φ be any wff of the old language. Thus, φ may have some occurrences of @, but it has no occurrences of ×. Then, for every QML-model M = 〈W, D, I〉 and any v, w ∈ W, ×φ is true at 〈v, w〉 in M iff φ is true in the designated-world QML model 〈W, w, D, I〉.
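The seven clauses and the two notions of validity can be prototyped directly. The following is an illustrative sketch of my own, not anything from the text: formulas are nested Python tuples, an atom (standing in for a closed atomic wff) is assigned the set of worlds at which it is true, and truth is computed at a pair (reference world, world of evaluation) in a single given model.

```python
from itertools import product

# Sketch (not from the text) of the two-dimensional valuation clauses,
# for a propositional fragment: atoms, ~, ->, <->, box, @ and the marker x.

def ev(phi, v, w, W, V):
    """Truth value of phi at reference world v, world of evaluation w."""
    op = phi[0]
    if op == 'atom':                     # true iff w is in the atom's truth set
        return w in V[phi[1]]
    if op == 'not':
        return not ev(phi[1], v, w, W, V)
    if op == 'imp':
        return (not ev(phi[1], v, w, W, V)) or ev(phi[2], v, w, W, V)
    if op == 'iff':
        return ev(phi[1], v, w, W, V) == ev(phi[2], v, w, W, V)
    if op == 'box':                      # quantify over the world of evaluation
        return all(ev(phi[1], v, w2, W, V) for w2 in W)
    if op == '@':                        # jump back to the reference world
        return ev(phi[1], v, v, W, V)
    if op == 'ref':                      # the marker x: reset the reference world
        return ev(phi[1], w, w, W, V)
    raise ValueError(op)

def valid_in(phi, W, V):                 # truth at every diagonal pair <w, w>
    return all(ev(phi, w, w, W, V) for w in W)

def generally_valid_in(phi, W, V):       # truth at every pair <v, w>
    return all(ev(phi, v, w, W, V) for v, w in product(W, W))

# Two-world model: atom p is true at c only.
W = {'c', 'd'}
V = {'p': {'c'}}
p = ('atom', 'p')
phi = ('iff', p, ('@', p))               # p <-> @p

diagonal_ok = valid_in(phi, W, V)            # true at <c,c> and <d,d>
everywhere_ok = generally_valid_in(phi, W, V)  # fails at <c,d>
```

Here `valid_in` checks truth at every diagonal pair of one fixed model; validity in the chapter's sense quantifies over all models, so functions like these can refute validity claims with a countermodel but cannot establish them.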
10.2.2 Examples
Example 1: show that if ⊨ φ then ⊨ @φ: Suppose for reductio that φ is valid but @φ is not. That means that in some model and at some world, w (and some assignment g, but I'll suppress this since it isn't relevant here), V(@φ, w, w) = 0. Thus, given the truth condition for @, V(φ, w, w) = 0. But that violates the validity of φ.

Example 2: ⊨ φ↔@φ, but ⊭ □(φ↔@φ). (Moral: any proof theory for this logic had better not include the rule of necessitation!) The truth condition for @ insures that for any world w in any model (and any variable assignment), V(@φ, w, w) = 1 iff V(φ, w, w) = 1, and so V(φ↔@φ, w, w) = 1. Thus, ⊨ φ↔@φ. But some instances of □(φ↔@φ) aren't valid. Let φ be 'Fa'; here's a countermodel:

W = {c, d}
D = {u}
I(a) = u
I(F) = {〈u, c〉}

In this model, V(□(Fa↔@Fa), c, c) = 0, because V(Fa↔@Fa, c, d) = 0. For 'Fa' is true at 〈c, d〉 iff the referent of 'a' is in the extension of 'F' at world d (it isn't), whereas '@Fa' is true at 〈c, d〉 iff the referent of 'a' is in the extension of 'F' at world c (it is).

Note that this same model shows that φ↔@φ is not generally valid. General validity is truth at all pairs of worlds, and the formula Fa↔@Fa, as we just showed, is false at the pair 〈c, d〉.

Example 3: ⊨ φ→□@φ. Consider any model, world w (and variable assignment), and suppose for reductio that V(φ, w, w) = 1 but V(□@φ, w, w) = 0. Given the latter, there is some world, v, such that V(@φ, w, v) = 0. And so, given the truth condition for @, V(φ, w, w) = 0. Contradiction.
Example 4: ⊨ □×∀x◇@Fx→□∀xFx

i) Suppose for reductio that for some world w, some variable assignment g, and some model, Vg(□×∀x◇@Fx, w, w) = 1 and …
ii) … Vg(□∀xFx, w, w) = 0.
iii) Given the latter, for some world, call it "a", Vg(∀xFx, w, a) = 0.
iv) And so for some u ∈ D (call it "u"), Vgu/x(Fx, w, a) = 0.
v) Given i), Vg(×∀x◇@Fx, w, a) = 1.
vi) Given the truth condition for ×, Vg(∀x◇@Fx, a, a) = 1.
vii) Thus, for every object in the domain, and so for u in particular, Vgu/x(◇@Fx, a, a) = 1.
viii) Thus, for some world, call it b, Vgu/x(@Fx, a, b) = 1.
ix) Given the truth condition for @, Vgu/x(Fx, a, a) = 1.
x) Given the truth condition for atomics, 〈[x]gu/x, a〉 ∈ I(F).
xi) But given iv), 〈[x]gu/x, a〉 ∉ I(F). Contradiction.
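Example 4 claims validity, which quantifies over all models, but the claim can at least be machine-checked over all small models. In this sketch of mine (the encoding is not from the text), I(F) ranges over every subset of D × W for two worlds and two objects; the formula □×∀x◇@Fx→□∀xFx holds at every diagonal pair of every such model, while the variant with the × deleted fails in some of them, which illustrates what the × is doing.

```python
from itertools import product, chain, combinations

W = [0, 1]
D = ['a', 'b']

def ev(phi, g, v, w, ext):
    """Truth at reference world v, world of evaluation w; ext is I(F)."""
    op = phi[0]
    if op == 'F':            # Fx: <value of x, world of evaluation> in I(F)
        return (g[phi[1]], w) in ext
    if op == 'imp':
        return (not ev(phi[1], g, v, w, ext)) or ev(phi[2], g, v, w, ext)
    if op == 'all':
        return all(ev(phi[2], {**g, phi[1]: u}, v, w, ext) for u in D)
    if op == 'box':
        return all(ev(phi[1], g, v, w2, ext) for w2 in W)
    if op == 'dia':
        return any(ev(phi[1], g, v, w2, ext) for w2 in W)
    if op == '@':            # back to the reference world
        return ev(phi[1], g, v, v, ext)
    if op == 'ref':          # the marker x: reset the reference world
        return ev(phi[1], g, w, w, ext)
    raise ValueError(op)

# box x forall x dia @Fx -> box forall x Fx, and the same with the x deleted
cons = ('box', ('all', 'x', ('F', 'x')))
with_ref = ('imp', ('box', ('ref', ('all', 'x', ('dia', ('@', ('F', 'x')))))), cons)
without_ref = ('imp', ('box', ('all', 'x', ('dia', ('@', ('F', 'x'))))), cons)

pairs = list(product(D, W))
extensions = [set(c) for k in range(len(pairs) + 1) for c in combinations(pairs, k)]
example4_holds = all(ev(with_ref, {}, w, w, e) for e in extensions for w in W)
ref_matters = any(not ev(without_ref, {}, w, w, e) for e in extensions for w in W)
```

Passing an exhaustive check over two-world, two-object models is of course not a proof of validity; the induction in the text is what does that work.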
10.3 Fixedly
The two-dimensional approach to semantics — evaluating formulas at pairs of worlds rather than single worlds — raises an intriguing possibility. The □ is a universal quantifier over the world of evaluation; we might, by analogy, follow Davies and Humberstone (1980) and introduce an operator that is a universal quantifier over the reference world. Davies and Humberstone call this operator F, and read "Fφ" as "fixedly, φ". Grammatically, F is a one-place sentential operator. Its semantic clause is this:

VM,g(Fφ, v, w) = 1 iff for every v′ ∈ W, VM,g(φ, v′, w) = 1

Davies and Humberstone point out that given F, @, and □, we can introduce two new operators: F@ and F□. It's easy to show that:
VM,g(F@φ, v, w) = 1 iff for every v′ ∈ W, VM,g(φ, v′, v′) = 1
VM,g(F□φ, v, w) = 1 iff for every v′, w′ ∈ W, VM,g(φ, v′, w′) = 1

Thus, we can think of F@ and F□, as well as □ and F themselves, as being "kinds of necessities", since their truth conditions introduce universal quantifiers over worlds of evaluation and reference worlds. (What about □F? It's easy to show that □F is just equivalent to F□.)

Davies and Humberstone don't use two-dimensional semantics; they instead use designated-world QML models (and they don't include ×). Say that designated-world QML models are variants iff they are alike except perhaps for the designated world. The truth condition for F is then this:

VM,g(Fφ, w) = 1 iff for every model M′ that is a variant of M, VM′,g(φ, w) = 1

But this isn't significantly different from the 2D approach. Think of it this way: in any 2D model, a choice of a reference world is like pairing an old-style model with a world to be the designated world.
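The displayed equivalences can be spot-checked mechanically; here is a hypothetical sketch of mine (the encoding is not from the text). Over every valuation of a single atom p in a two-world model, and for formulas in which @ makes the reference world matter, F@φ comes out true exactly when φ holds at every diagonal pair, F□φ exactly when φ holds at every pair, and □Fφ agrees with F□φ throughout.

```python
from itertools import product

def ev(phi, v, w, W, V):
    """Truth value at reference world v, world of evaluation w."""
    op = phi[0]
    if op == 'atom':
        return w in V[phi[1]]
    if op == 'iff':
        return ev(phi[1], v, w, W, V) == ev(phi[2], v, w, W, V)
    if op == 'box':
        return all(ev(phi[1], v, w2, W, V) for w2 in W)
    if op == '@':
        return ev(phi[1], v, v, W, V)
    if op == 'fix':                   # F: quantify over the reference world
        return all(ev(phi[1], v2, w, W, V) for v2 in W)
    raise ValueError(op)

W = ['c', 'd']
p = ('atom', 'p')
formulas = [p, ('@', p), ('iff', p, ('@', p))]

ok = True
for truth_set in [set(), {'c'}, {'d'}, {'c', 'd'}]:   # every valuation of p
    V = {'p': truth_set}
    for f in formulas:
        diagonal = all(ev(f, u, u, W, V) for u in W)
        everywhere = all(ev(f, u1, u2, W, V) for u1, u2 in product(W, W))
        for v, w in product(W, W):
            # F@f is truth of f at all diagonal pairs
            ok = ok and ev(('fix', ('@', f)), v, w, W, V) == diagonal
            # F box f is truth of f at all pairs
            ok = ok and ev(('fix', ('box', f)), v, w, W, V) == everywhere
            # box F is equivalent to F box
            ok = ok and ev(('box', ('fix', f)), v, w, W, V) == ev(('fix', ('box', f)), v, w, W, V)
```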
10.3.1 Examples
Example 1: ⊨ F@φ→φ. Suppose otherwise: suppose V(F@φ, w, w) = 1 but V(φ, w, w) = 0. Given the former, V(@φ, w, w) = 1 (given the truth condition for F); but then V(φ, w, w) = 1 (given the truth condition for @). Contradiction.

Example 2: F@φ→φ is not generally valid. General validity requires truth at all pairs 〈v, w〉 in all models. But in the following model, Vg(F@(@Ga↔Ga)→(@Ga↔Ga), c, d) = 0 (for any g):

W = {c, d}
D = {u}
I(a) = u
I(G) = {〈u, c〉}
In this model, the referent of 'a' is in the extension of 'G' in world c, but not in world d. That means that @Ga is true at 〈c, d〉 whereas Ga is false at 〈c, d〉, and so @Ga↔Ga is false at 〈c, d〉. But F@φ means that φ is true at all pairs of the form 〈v, v〉, and the formula @Ga↔Ga is true at any such pair (in any model). Thus, the antecedent of the conditional is true in this model.

Example 3: ⊭ φ→Fφ. In the model of the previous problem, the formula @Ga→F@Ga is false at 〈c, c〉. The antecedent is true because the referent of 'a' is in the extension of 'G' at c. The consequent is false because F@Ga means that 'Ga' is true at all pairs of the form 〈v, v〉, whereas 'Ga' is not true at 〈d, d〉 (since the referent of 'a' is not in the extension of 'G' at d).

Example 4: If φ has no occurrences of @, then ⊨ φ→Fφ. Let's prove by induction that if φ has no occurrences of @, then φ→Fφ is generally valid (i.e., true in any model at any world pair 〈v, w〉 under any variable assignment). The result then follows, because general validity (truth at all pairs) obviously implies validity (truth at pairs 〈w, w〉).

First, let φ be atomic. Then we're trying to show that for any worlds v, w, and any variable assignment g, Vg(Πα1…αn → FΠα1…αn, v, w) = 1. Suppose otherwise — suppose that (i) Vg(Πα1…αn, v, w) = 1 and (ii) Vg(FΠα1…αn, v, w) = 0. Given (ii), for some world, call it v′, Vg(Πα1…αn, v′, w) = 0, and so the ordered n-tuple of the denotations of α1…αn is not in the extension of Π at w, which contradicts (i).

Now the inductive step. We must assume that φ and ψ obey our statement, and show that complex formulas built from φ and ψ also obey our statement. That is, we assume the inductive hypothesis:

(ih) φ and ψ have no occurrences of @, and φ→Fφ and ψ→Fψ are generally valid

and we must show that the following are also generally valid:
∼φ→F∼φ
(φ→ψ)→F(φ→ψ)
∀αφ→F∀αφ
□φ→F□φ
Fφ→FFφ
×φ→F×φ

∼: Suppose otherwise — suppose V(∼φ→F∼φ, v, w) = 0 for some v, w. So V(∼φ, v, w) = 1 and V(F∼φ, v, w) = 0. So V(φ, v, w) = 0, and for some v′, V(∼φ, v′, w) = 0; and so V(φ, v′, w) = 1. By (ih), V(φ→Fφ, v′, w) = 1, and so V(Fφ, v′, w) = 1, and so V(φ, v, w) = 1 — contradiction.

→: Suppose for some v, w, V((φ→ψ)→F(φ→ψ), v, w) = 0. So (i) V(φ→ψ, v, w) = 1 and V(F(φ→ψ), v, w) = 0. So, for some world, call it u, V(φ→ψ, u, w) = 0, and so V(φ, u, w) = 1 and (ii) V(ψ, u, w) = 0. Given the former and the inductive hypothesis, V(Fφ, u, w) = 1, and so V(φ, v, w) = 1. And so, given (i), V(ψ, v, w) = 1, and so, given the inductive hypothesis, V(Fψ, v, w) = 1, and so V(ψ, u, w) = 1, which contradicts (ii).

∀: Suppose for some v, w, Vg(∀αφ, v, w) = 1, but Vg(F∀αφ, v, w) = 0. Given the latter, for some v′, Vg(∀αφ, v′, w) = 0; and so, for some u in the domain, Vgu/α(φ, v′, w) = 0. Given the former, Vgu/α(φ, v, w) = 1; given (ih) it follows that Vgu/α(Fφ, v, w) = 1, and so Vgu/α(φ, v′, w) = 1. Contradiction.

□: Suppose (i) V(□φ, v, w) = 1 and (ii) V(F□φ, v, w) = 0, for some v, w. From (ii), V(□φ, v′, w) = 0 for some v′, and so V(φ, v′, w′) = 0 for some w′. Given (i), V(φ, v, w′) = 1; and so, given (ih), V(Fφ, v, w′) = 1, and so V(φ, v′, w′) = 1. Contradiction.

F: Suppose V(Fφ, v, w) = 1 and V(FFφ, v, w) = 0, for some v, w. From the latter, V(Fφ, v′, w) = 0 for some v′, and so V(φ, v″, w) = 0 for some v″, which contradicts the former.

×: Suppose Vg(×φ, v, w) = 1 but Vg(F×φ, v, w) = 0, for some v, w. Given the latter, Vg(×φ, v′, w) = 0 for some v′, and so Vg(φ, w, w) = 0, which contradicts the former.
10.4 A philosophical application: necessity and a priority
The two-dimensional modal framework has been put to significant philosophical use in the past twenty-five or so years.5 This is not the place for an extended survey; rather, I will briefly present the two-dimensional account of just one philosophical issue: the relationship between necessity and a priority.

In Naming and Necessity, Saul Kripke famously presented putative examples of necessary a posteriori statements and of contingent a priori statements:

Hesperus = Phosphorus
B (the standard meter bar) is one meter long

The first statement, Kripke argued, is necessary because whenever we try to imagine a possible world in which Hesperus is not Phosphorus, we find that we have in fact merely imagined a world in which 'Hesperus' and 'Phosphorus' denote different objects than they in fact denote. Given that Hesperus and Phosphorus are in fact one and the same entity — namely, the planet Venus — there is no possible world in which Hesperus is different from Phosphorus, for such a world would have to be a world in which Venus is distinct from itself. Thus, the statement is necessary, despite its a posteriority: it took astronomical evidence to learn that Hesperus and Phosphorus were identical; no amount of pure rational reflection would have sufficed.

As for the second statement, Kripke argues that one can know its truth as soon as one knows that the phrase 'one meter' has its reference fixed by the description "the length of bar B". Thus it is a priori. Nevertheless, he argues, it is contingent: bar B does not have its length essentially, and thus could have been longer or shorter than one meter.

On the face of it, the existence of necessary a posteriori or contingent a priori statements is paradoxical. How can a statement that is true in all possible worlds be in principle resistant to a priori investigation? Worse, how can a statement that might have been false be known a priori? The two-dimensional framework has been thought by some to shed light on all this.
Let's consider the contingent a priori first. Let's define the following notion of contingency:
5 For work in this tradition, see Stalnaker (1978, 2003a, 2004); Evans (1979); Davies and Humberstone (1980); Hirsch (1986); Chalmers (1996, 2006); Jackson (1998); see Soames (2004) for an extended critique.
φ is superficially contingent in model M at world w iff, for every variable assignment g, VM,g(□φ, w, w) = VM,g(□∼φ, w, w) = 0.

This corresponds, intuitively, to this: if we were sitting at w, and we uttered ◇φ∧◇∼φ, we'd speak the truth.

How should we formalize the notion of a priority? As a rough and ready guide, let's think of a sentence as being a priori iff it is valid — i.e., true at every pair 〈w, w〉 of every model. In defense of this guide: we can think of the truth value of an utterance of a sentence as being the valuation of that sentence at the pair 〈w, w〉 in a model that accurately models the genuine possibilities, and in which w accurately models the (genuine) possible world of the speaker. So any valid sentence is invariably true whenever uttered; hence, if φ is valid, any speaker who understands his or her language is in a position to know that an utterance of φ would be true.

Under these definitions, there are sentences that are superficially contingent but nevertheless a priori. Consider any sentence of the form φ↔@φ. In any model in which φ is true at w and false at some other world, the sentence is superficially contingent. But it is a priori, since, as we showed above, it is valid (though it's not generally valid, as we also showed above).

That was a relatively simple example; but one can give other examples that are similar in spirit both to Kripke's example of the meter bar, and to a related example due to Gareth Evans (1979):

Bar B is one meter long
Julius invented the zip

where bar B is the standard meter bar, and the "descriptive names" 'one meter' and 'Julius' are said to be "rigid designators" whose references are "fixed" by the descriptions 'the length of bar B' and 'the inventor of the zip', respectively.
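Both claims about φ↔@φ, that it is superficially contingent at a world where φ holds contingently and yet true at every diagonal pair, can be confirmed in the familiar two-world countermodel. A sketch of my own (the tuple encoding is not from the text):

```python
# Check that p <-> @p is superficially contingent at c in the model below
# (neither box(phi) nor box(~phi) holds at <c,c>), yet true at every
# diagonal pair, the stand-in for a priority used in the text.

def ev(phi, v, w, W, V):
    """Truth value at reference world v, world of evaluation w."""
    op = phi[0]
    if op == 'atom':
        return w in V[phi[1]]
    if op == 'not':
        return not ev(phi[1], v, w, W, V)
    if op == 'iff':
        return ev(phi[1], v, w, W, V) == ev(phi[2], v, w, W, V)
    if op == 'box':
        return all(ev(phi[1], v, w2, W, V) for w2 in W)
    if op == '@':
        return ev(phi[1], v, v, W, V)
    raise ValueError(op)

W = {'c', 'd'}
V = {'p': {'c'}}                         # p true at c, false at d
phi = ('iff', ('atom', 'p'), ('@', ('atom', 'p')))

superficially_contingent_at_c = (
    not ev(('box', phi), 'c', 'c', W, V)
    and not ev(('box', ('not', phi)), 'c', 'c', W, V)
)
true_on_the_diagonal = all(ev(phi, w, w, W, V) for w in W)
```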
Now, whether or not these sentences, understood as sentences of everyday English, are indeed genuinely contingent and a priori depends on delicate issues in the philosophy of language concerning descriptive names, rigid designation, and reference fixing. Rather than going into all that, let’s construct some examples that are similar to Kripke’s and Evans’s. Let’s simply stipulate that ‘one meter’ and ‘Julius’ are to abbreviate “actualized descriptions”: ‘the actual length of bar B’ and ‘the actual inventor of the zip’. With a little creative reconstruing in the first case, the sentences then have the form: “the actual G is G”:
the actual length of bar B is a length of bar B
the actual inventor of the zip invented the zip

Now, these sentences are not quite a priori, since for all one knows, the G might not exist — there might exist no unique length of bar B, no unique inventor of the zip. So suppose we consider instead the following sentences:

If there is exactly one length of bar B, then the actual length of bar B is a length of bar B
If there is exactly one inventor of the zip, then the actual inventor of the zip invented the zip

Each has the form:

If there is exactly one G, then the actual G is G
∃x(Gx∧∀y(Gy→y=x)) → ∃x(@Gx∧∀y(@Gy→y=x)∧Gx)

Any sentence of this form is valid (though not generally valid), and is superficially contingent. So we have further examples of the contingent a priori in the neighborhood of the examples of Kripke and Evans.

Various philosophers want to concede that these sentences are contingent in one sense — namely, in the sense of superficial contingency. But, they claim, this is a relatively unimportant sense (hence the term 'superficial contingency', which was coined by Evans). In another sense, they're not contingent at all. Evans calls the second sense of contingency "deep contingency", and defines it thus (1979, p. 185):

If a deeply contingent statement is true, there will exist some state of affairs of which we can say both that had it not existed the statement would not have been true, and that it might not have existed.
The intended meaning of ‘the statement would not have been true’ is that the statement, as uttered with its actual meaning, would not have been true. The idea is supposed to be that ‘Julius invented the zip’ is not deeply contingent, because we can’t locate the required state of affairs, since in any situation in which ‘Julius invented the zip’ is uttered with its actual meaning, it is uttered truly. So the Julius example is not one of a deeply contingent a priori truth. Evans’s notion of deep contingency is far from clear. One of the nice things about the two-dimensional modal framework is that it allows us to give a
clear definition of deep contingency. Davies and Humberstone (1980) give a definition of deep contingency which is parallel to the definition of superficial contingency, but with F@ in place of □:

φ is deeply contingent in model M at w iff (for all g) VM,g(F@φ, w, w) = 0 and VM,g(F@∼φ, w, w) = 0.

Under this definition, the examples we have given are not deeply contingent. To be sure, this definition is only as clear as the two-dimensional notions of fixedness and actuality. The formal structure of the two-dimensional framework is of course clear, but one can raise philosophical questions about how that formalism is to be interpreted. But at least the formalism provides a clear framework for the philosophical debate to occur.

Our discussion of the necessary a posteriori will be parallel to that of the contingent a priori. Just as we defined superficial contingency as the falsity of the □, so we can define superficial necessity as the truth of the □:

φ is superficially necessary in model M at w iff (for all g) VM,g(□φ, w, w) = 1

How shall we construe a posteriority? Let's follow our earlier strategy, and take the failure to be valid as our guide. But here we must take a bit more care. It's quite a trivial matter to construct models in which invalid sentences are necessarily true; and we don't need the two-dimensional framework to do it. We clearly don't want to say that 'Everything is a lawyer' is an example of the necessary a posteriori. But let F stand for 'is a lawyer'; we can construct a model in which the predicate F is true of every member of the domain at any world, ∀xFx is true, and so is superficially necessary at every world, despite the fact that it is not valid. But this is too cheap. We began by letting the predicate F stand for a predicate of English, but then constructed our model without attending to the modal fact that it's simply not the case that it's necessarily true that everything is a lawyer.
If F is indeed to stand for 'is a lawyer', we would need to include in any realistic model — any model faithful to the modal facts — worlds in which not everything is in the extension of F. To provide nontrivial models of the necessary a posteriori, when we have chosen to think of the nonlogical expressions of the language of QML as standing for certain expressions of English, our strategy will be to provide realistic
models — models that are faithful to the real modal facts in relevant respects, given the choice of what the nonlogical expressions stand for — in which invalid sentences are necessarily true. Now, since the notion of a "realistic model" has not been made precise, the argument here will be imprecise; but in the circumstances this imprecision is inevitable.

So: consider now, as a schematic example of an a posteriori and superficially necessary sentence:

If the actual F and the actual G exist, then they are identical
[∃x(@Fx∧∀y(@Fy→x=y)) ∧ ∃z(@Gz∧∀y(@Gy→z=y))] → ∃x[@Fx∧∀y(@Fy→x=y) ∧ ∃z(@Gz∧∀y(@Gy→z=y) ∧ z=x)]

This sentence isn't valid. Nevertheless, it is superficially necessary in any model and any world w in which F and G each have a single object — the same one — in their extension, no matter what the extensions of F and G are in other worlds in the model. So whenever such a model is realistic (given what we let F and G stand for), we will have our desired example.

We can fill in this schema and construct an example similar to Kripke's Hesperus and Phosphorus example. Set aside controversies about the semantics of proper names in natural language; let's just stipulate that 'Hesperus' is to be short for 'the actual F', and that 'Phosphorus' is to be short for 'the actual G'. And let's think of F as standing for 'the first heavenly body visible in the evening', and G for 'the last heavenly body visible in the morning'. Then

(HP) If Hesperus and Phosphorus exist then they are identical

has the form 'If the actual F and the actual G exist then they are identical', which was discussed in the previous paragraph. We may then construct a realistic model in which F and G each have a single object in their extension in some world, w, but in which they have different objects in their extensions in other worlds.
In such a model, the sentence

(□HP) □(If Hesperus and Phosphorus exist then they are identical)

is true at 〈w, w〉, and so we again have our desired example: (HP) is superficially necessary, despite the fact that it is a posteriori (invalid).

Isn't it strange that (HP) is both a posteriori and necessary? The two-dimensional response is: no, it's not, since although it is superficially necessary, it isn't deeply necessary in the following sense:
φ is deeply necessary in model M at w iff (for all g) VM,g(F@φ, w, w) = 1

It isn't deeply necessary because in any realistic model (given what F and G currently stand for), there must be worlds and objects other than c and u that are configured as they are in the model below:

W = {c, d}
D = {u, v}
I(F) = {〈u, c〉, 〈u, d〉}
I(G) = {〈u, c〉, 〈v, d〉}

In this model, even though (□HP) is true at 〈c, c〉, still, F@(HP), i.e.:

F@{[∃x(@Fx∧∀y(@Fy→x=y)) ∧ ∃z(@Gz∧∀y(@Gy→z=y))] →
∃x[@Fx∧∀y(@Fy→x=y) ∧ ∃z(@Gz∧∀y(@Gy→z=y) ∧ z=x)]}

is false at 〈c, c〉 (and indeed, at every pair of worlds), since (HP) is false at 〈d, d〉. And so, (HP) is not deeply necessary in this model.

One might try to take this two-dimensional line further, and claim that in every case of the necessary a posteriori (or the contingent a priori), the necessity (contingency) is merely superficial. But defending this stronger line would require more than we have in place so far. To take one example, return again to 'Hesperus = Phosphorus', but now, instead of thinking of 'Hesperus' and 'Phosphorus' as abbreviations for actualized descriptions, let us represent them by names in the logical sense (i.e., the expressions called "names" in the definition of well-formed formulas, which are assigned denotations by interpretation functions in models). Thus, 'Hesperus = Phosphorus' is now represented as a=b. Consider the following model:

W = {c, d}
D = {u, v}
I(a) = u
I(b) = u

The model is apparently realistic; it falsifies no relevant modal facts. But the sentence a=b is deeply necessary (at any world in the model). And yet it is a posteriori (invalid).
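The model-theoretic claims about (HP) can be checked computationally. The following is a hypothetical sketch of mine (the tuple encoding and helper `unique` are not from the text), evaluating (HP) in the model with I(F) = {〈u, c〉, 〈u, d〉} and I(G) = {〈u, c〉, 〈v, d〉}: □(HP) comes out true at 〈c, c〉 while F@(HP) comes out false there, so (HP) is superficially but not deeply necessary in this model.

```python
W = ['c', 'd']
D = ['u', 'v']
I = {'F': {('u', 'c'), ('u', 'd')},
     'G': {('u', 'c'), ('v', 'd')}}

def ev(phi, g, v, w):
    """Truth at reference world v, world of evaluation w, assignment g."""
    op = phi[0]
    if op == 'pred':                  # e.g. ('pred', 'F', 'x')
        return (g[phi[2]], w) in I[phi[1]]
    if op == 'eq':
        return g[phi[1]] == g[phi[2]]
    if op == 'and':
        return ev(phi[1], g, v, w) and ev(phi[2], g, v, w)
    if op == 'imp':
        return (not ev(phi[1], g, v, w)) or ev(phi[2], g, v, w)
    if op == 'all':
        return all(ev(phi[2], {**g, phi[1]: u}, v, w) for u in D)
    if op == 'ex':
        return any(ev(phi[2], {**g, phi[1]: u}, v, w) for u in D)
    if op == 'box':
        return all(ev(phi[1], g, v, w2) for w2 in W)
    if op == '@':                     # back to the reference world
        return ev(phi[1], g, v, v)
    if op == 'fix':                   # F: quantify over the reference world
        return all(ev(phi[1], g, v2, w) for v2 in W)
    raise ValueError(op)

def unique(pred, var, other):
    """@pred(var), and anything that is @pred is identical to var."""
    return ('and', ('@', ('pred', pred, var)),
                   ('all', other, ('imp', ('@', ('pred', pred, other)),
                                          ('eq', other, var))))

# [Ex(@Fx & Ay(@Fy -> x=y)) & Ez(@Gz & Ay(@Gy -> z=y))] ->
#   Ex[@Fx & Ay(@Fy -> x=y) & Ez(@Gz & Ay(@Gy -> z=y) & z=x)]
ante = ('and', ('ex', 'x', unique('F', 'x', 'y')),
               ('ex', 'z', unique('G', 'z', 'y')))
cons = ('ex', 'x', ('and', unique('F', 'x', 'y'),
                    ('ex', 'z', ('and', unique('G', 'z', 'y'),
                                        ('eq', 'z', 'x')))))
HP = ('imp', ante, cons)

superficially_necessary = ev(('box', HP), {}, 'c', 'c')
deeply_necessary = ev(('fix', ('@', HP)), {}, 'c', 'c')   # fails via <d, d>
```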
Bibliography

Benacerraf, Paul and Hilary Putnam (eds.) (1983). Philosophy of Mathematics. 2nd edition. Cambridge: Cambridge University Press.

Boolos, George (1975). "On Second-Order Logic." Journal of Philosophy 72: 509–527. Reprinted in Boolos 1998: 37–53.

— (1984). "To Be Is to Be the Value of a Variable (or to Be Some Values of Some Variables)." Journal of Philosophy 81: 430–49. Reprinted in Boolos 1998: 54–72.

— (1985). "Nominalist Platonism." Philosophical Review 94: 327–44. Reprinted in Boolos 1998: 73–87.

— (1998). Logic, Logic, and Logic. Cambridge, MA: Harvard University Press.

Boolos, George and Richard Jeffrey (1989). Computability and Logic. 3rd edition. Cambridge: Cambridge University Press.

Chalmers, David (1996). The Conscious Mind. Oxford: Oxford University Press.

— (2006). "Two-Dimensional Semantics." In Ernest Lepore and Barry C. Smith (eds.), Oxford Handbook of Philosophy of Language, 574–606. New York: Oxford University Press.

Cresswell, M. J. (1990). Entities and Indices. Dordrecht: Kluwer.

Cresswell, M. J. and G. E. Hughes (1996). A New Introduction to Modal Logic. London: Routledge.
Davies, Martin and Lloyd Humberstone (1980). "Two Notions of Necessity." Philosophical Studies 38: 1–30.

Dummett, Michael (1973). "The Philosophical Basis of Intuitionist Logic." In H. E. Rose and J. C. Shepherdson (eds.), Proceedings of the Logic Colloquium, Bristol, July 1973, 5–49. Amsterdam: North-Holland. Reprinted in Benacerraf and Putnam 1983: 97–129.

Enderton, Herbert (1977). Elements of Set Theory. New York: Academic Press.

Evans, Gareth (1979). "Reference and Contingency." The Monist 62: 161–189. Reprinted in Evans 1985.

— (1985). Collected Papers. Oxford: Clarendon Press.

Gamut, L. T. F. (1991a). Logic, Language, and Meaning, Volume 1: Introduction to Logic. Chicago: University of Chicago Press.

— (1991b). Logic, Language, and Meaning, Volume 2: Intensional Logic and Logical Grammar. Chicago: University of Chicago Press.

Glanzberg, Michael (2006). "Quantifiers." In Ernest Lepore and Barry C. Smith (eds.), The Oxford Handbook of Philosophy of Language, 794–821. Oxford: Oxford University Press.

Harper, William L., Robert Stalnaker and Glenn Pearce (eds.) (1981). Ifs: Conditionals, Belief, Decision, Chance, and Time. Dordrecht: D. Reidel Publishing Company.

Hirsch, Eli (1986). "Metaphysical Necessity and Conceptual Truth." In Peter French, Theodore E. Uehling, Jr. and Howard K. Wettstein (eds.), Midwest Studies in Philosophy XI: Studies in Essentialism, 243–256. Minneapolis: University of Minnesota Press.

Hodes, Harold (1984a). "On Modal Logics Which Enrich First-order S5." Journal of Philosophical Logic 13: 423–454.

— (1984b). "Some Theorems on the Expressive Limitations of Modal Languages." Journal of Philosophical Logic 13: 13–26.

Jackson, Frank (1998). From Metaphysics to Ethics: A Defence of Conceptual Analysis. Oxford: Oxford University Press.
Kripke, Saul (1972). "Naming and Necessity." In Donald Davidson and Gilbert Harman (eds.), Semantics of Natural Language, 253–355, 763–769. Dordrecht: Reidel. Revised edition published in 1980 as Naming and Necessity (Cambridge, MA: Harvard University Press).

Lemmon, E. J. (1965). Beginning Logic. London: Chapman & Hall.

Lewis, C. I. (1918). A Survey of Symbolic Logic. Berkeley: University of California Press.

Lewis, C. I. and C. H. Langford (1932). Symbolic Logic. New York: Century Company.

Lewis, David (1973). Counterfactuals. Oxford: Blackwell.

— (1977). "Possible-World Semantics for Counterfactual Logics: A Rejoinder." Journal of Philosophical Logic 6: 359–363.

— (1979). "Scorekeeping in a Language Game." Journal of Philosophical Logic 8: 339–59. Reprinted in Lewis 1983: 233–249.

— (1983). Philosophical Papers, Volume 1. Oxford: Oxford University Press.

Loewer, Barry (1976). "Counterfactuals with Disjunctive Antecedents." Journal of Philosophy 73: 531–537.

Mendelson, Elliott (1987). Introduction to Mathematical Logic. Belmont, California: Wadsworth & Brooks.

Priest, Graham (2001). An Introduction to Non-Classical Logic. Cambridge: Cambridge University Press.

Quine, W. V. O. (1953). "Mr. Strawson on Logical Theory." Mind 62: 433–451.

Russell, Bertrand (1905). "On Denoting." Mind 14: 479–93. Reprinted in Russell 1956: 41–56.

— (1956). Logic and Knowledge. Ed. Robert Charles Marsh. New York: G.P. Putnam's Sons.

Sher, Gila (1991). The Bounds of Logic: A Generalized Viewpoint. Cambridge, Mass.: MIT Press.
Sider, Theodore (2003). "Reductive Theories of Modality." In Michael J. Loux and Dean W. Zimmerman (eds.), Oxford Handbook of Metaphysics, 180–208. Oxford: Oxford University Press.

Soames, Scott (2004). Reference and Description: The Case against Two-Dimensionalism. Princeton: Princeton University Press.

Stalnaker, Robert (1968). "A Theory of Conditionals." In Studies in Logical Theory: American Philosophical Quarterly Monograph Series, No. 2. Oxford: Blackwell. Reprinted in Harper et al. 1981: 41–56.

— (1978). "Assertion." In Peter Cole and Jerry Morgan (eds.), Syntax and Semantics, Volume 9: Pragmatics, 315–332. New York: Academic Press. Reprinted in Stalnaker 1999: 78–95.

— (1981). "A Defense of Conditional Excluded Middle." In Harper et al. (1981), 87–104.

— (1999). Context and Content: Essays on Intentionality in Speech and Thought. Oxford: Oxford University Press.

— (2003a). "Conceptual Truth and Metaphysical Necessity." In Stalnaker (2003b), 201–215.

— (2003b). Ways a World Might Be. Oxford: Oxford University Press.

— (2004). "Assertion Revisited: On the Interpretation of Two-Dimensional Modal Semantics." Philosophical Studies 118: 299–322. Reprinted in Stalnaker 2003b: 293–309.

Westerståhl, Dag (1989). "Quantifiers in Formal and Natural Languages." In D. Gabbay and F. Guenther (eds.), Handbook of Philosophical Logic, volume 4, 1–131. Dordrecht: Kluwer.