Lecture Notes in Artificial Intelligence
6211
Edited by R. Goebel, J. Siekmann, and W. Wahlster
Subseries of Lecture Notes in Computer Science
FoLLI Publications on Logic, Language and Information
Editors-in-Chief: Luigia Carlucci Aiello, University of Rome "La Sapienza", Italy; Michael Moortgat, University of Utrecht, The Netherlands; Maarten de Rijke, University of Amsterdam, The Netherlands
Editorial Board: Carlos Areces, INRIA Lorraine, France; Nicholas Asher, University of Texas at Austin, TX, USA; Johan van Benthem, University of Amsterdam, The Netherlands; Raffaella Bernardi, Free University of Bozen-Bolzano, Italy; Antal van den Bosch, Tilburg University, The Netherlands; Paul Buitelaar, DFKI, Saarbrücken, Germany; Diego Calvanese, Free University of Bozen-Bolzano, Italy; Ann Copestake, University of Cambridge, United Kingdom; Robert Dale, Macquarie University, Sydney, Australia; Luis Fariñas, IRIT, Toulouse, France; Claire Gardent, INRIA Lorraine, France; Rajeev Goré, Australian National University, Canberra, Australia; Reiner Hähnle, Chalmers University of Technology, Göteborg, Sweden; Wilfrid Hodges, Queen Mary, University of London, United Kingdom; Carsten Lutz, Dresden University of Technology, Germany; Christopher Manning, Stanford University, CA, USA; Valeria de Paiva, Palo Alto Research Center, CA, USA; Martha Palmer, University of Pennsylvania, PA, USA; Alberto Policriti, University of Udine, Italy; James Rogers, Earlham College, Richmond, IN, USA; Francesca Rossi, University of Padua, Italy; Yde Venema, University of Amsterdam, The Netherlands; Bonnie Webber, University of Edinburgh, Scotland, United Kingdom; Ian H. Witten, University of Waikato, New Zealand
Thomas Icard Reinhard Muskens (Eds.)
Interfaces: Explorations in Logic, Language and Computation ESSLLI 2008 and ESSLLI 2009 Student Sessions, Selected Papers
Series Editors:
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors:
Thomas Icard, Stanford University, Stanford, CA, USA
E-mail: [email protected]
Reinhard Muskens, Tilburg University, Tilburg, The Netherlands
E-mail: [email protected]
Library of Congress Control Number: 2010931167
CR Subject Classification (1998): F.4.1, F.3, F.4, I.2.3, I.2, D.3
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-642-14728-3 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-14728-9 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2010
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper 06/3180 543210
Preface
The European Summer School in Logic, Language and Information (ESSLLI) takes place every year, each time at a different location in Europe. With its focus on the large interdisciplinary area where linguistics, logic and computation converge, it has become very popular since it started in 1989, attracting large crowds of students. ESSLLI is where everyone in the field meets, teaches, takes courses, gives talks, dances all night, and generally has a good time. One of the enjoyable features of the School is its recurring Student Session, organized by students along the lines of a conference. The speakers are students too, who are eager to get a chance to present their work. They face stiff competition to get their talks accepted, as the number of papers sent in each year is high and acceptance rates are low. In my experience many of the selected talks contain fresh and surprising insights and are a pleasure to attend. But the reader may judge the quality of the Student Session for himself, as this volume contains a selection of papers from its 2008 and 2009 installments, the first held in Hamburg, the second in Bordeaux. The book is divided into four parts:
– Semantics and Pragmatics
– Mathematical Linguistics
– Applied Computational Linguistics
– Logic and Computation
The first two of these present work in the intersection of logic (broadly conceived) and different parts of linguistics, the third contains papers on the interface of linguistics and computation, while the fourth, as its name suggests, deals with logic and computation. The reader will see a connection with the Venn diagram that functions as ESSLLI's logo. Let me finish by thanking everyone who contributed to making the 2008 and 2009 Student Sessions the successes they were: Kata Balogh, who chaired the 2008 Session, and Thomas Icard, who chaired that of 2009; their Co-chairs Manuel Kirschner, Salvador Mascarenhas, Laia Mayol, Bruno Mery, Ji Ruan, and Marija Slavkovik; all referees and area experts; the speakers, of course; and last but certainly not least Springer, for generously making available the Springer Best Paper Awards.

May 2010
Reinhard Muskens
Organization
The ESSLLI Student Session is part of the European Summer School in Logic, Language and Information, organized by the Association for Logic, Language, and Information. This volume contains papers from the 2008 Student Session in Hamburg and the 2009 Student Session in Bordeaux.
2008 Student Session
Chair: Kata Balogh (Amsterdam)
Co-chair Logic and Language: Laia Mayol (Pennsylvania)
Co-chair Logic and Computation: Ji Ruan (Liverpool)
Co-chair Language and Computation: Manuel Kirschner (Bozen-Bolzano)
Area Experts: Anke Lüdeling (Berlin), Paul Égré (Paris), Guram Bezhanishvili (New Mexico), Alexander Rabinovich (Tel Aviv), Rineke Verbrugge (Groningen)
2009 Student Session
Chair: Thomas Icard (Stanford)
Co-chair Logic and Language: Salvador Mascarenhas (New York)
Co-chair Logic and Computation: Marija Slavkovik (Luxembourg)
Co-chair Language and Computation: Bruno Mery (Bordeaux)
Area Experts: Reinhard Muskens (Tilburg), Nathan Klinedinst (London), Makoto Kanazawa (Tokyo), Jens Michaelis (Bielefeld), Arnon Avron (Tel Aviv), Alexandru Baltag (Oxford)
Table of Contents
Semantics and Pragmatics
Can DP Be a Scope Island? (Simon Charlow)
Semantic Meaning and Pragmatic Inference in Non-cooperative Conversation (Michael Franke)
What Makes a Knight? (Stefan Wintein)
The Algebraic Structure of Amounts: Evidence from Comparatives (Daniel Lassiter)

Mathematical Linguistics
Extraction in the Lambek-Grishin Calculus (Arno Bastenhof)
Formal Parameters of Phonology: From Government Phonology to SPE (Thomas Graf)

Applied Computational Linguistics
Variable Selection in Logistic Regression: The British English Dative Alternation (Daphne Theijssen)
A Salience-Driven Approach to Speech Recognition for Human-Robot Interaction (Pierre Lison)
Language Technologies for Instructional Resources in Bulgarian (Ivelina Nikolova)

Logic and Computation
Description Logics for Relative Terminologies (Szymon Klarman)
Cdiprover3: A Tool for Proving Derivational Complexities of Term Rewriting Systems (Andreas Schnabl)
POP* and Semantic Labeling Using SAT (Martin Avanzini)

Author Index
Can DP Be a Scope Island?
Simon Charlow
New York University
1 Introduction
Sauerland [1] uses data from inverse linking—cf. [2]—to motivate quantifier raising (QR) out of DP, proposing to derive Larson's generalization—cf. [3]—regarding the scopal integrity of DP via an Economy-based constraint on QR (cf. [4]). This squib is in four parts. I first lay out Sauerland's three arguments for QR out of DP. I present (a slightly modified version of) his mechanism for constraining QR. I show that it both over- and under-generates. I conclude by arguing that the readings Sauerland uses to motivate his account don't result from an island-respecting QR mechanism. In short, each of the cases Sauerland considers involves DPs with "special" scopal properties: plural demonstrative DPs, bare plural DPs, and antecedent-contained deletion (ACD)-hosting DPs. The argument that apparent wide-scope readings of plural demonstratives are only apparent is motivated using (so far as I know) new data from English, while the latter two cases receive independent motivation from the literature. The conclusion is that the question posed in the title of this paper can be answered in the affirmative.
2 Sauerland's Data
2.1 Modal Intervention
Sauerland points out that (1) can be true if Mary doesn’t have any specific individuals in mind and doesn’t want to get married twice (say she’s placed a classified ad indicating she wants to meet and marry a man from either Finland or Norway): (1) Mary wants to marry someone from these two countries. ([1]’s ex. 8a)
Sauerland concludes that (a) the non-specificity of Mary's desire suggests that the indefinite remains within the scope of the bouletic operator O, and (b) the fact that Mary needn't desire two marriages requires that these two countries be outside the scope of O. In sum, the scope ordering is 2 > O > ∃ and requires QR out of DP: (2) [these two countries]x [Mary wants [λw′. PRO marry [someone from x] in w′]]
Thanks to Chris Barker, Emma Cunningham, Polly Jacobson, Ezra Keshet, Philippe Schlenker, Uli Sauerland, Anna Szabolcsi, and an anonymous reviewer. This work was supported in part by NSF grant BCS-0902671 to Philippe Schlenker and an NSF Graduate Research Fellowship to the author.
The first of these points seems correct. If the semantics of want involves quantification over want-worlds w′, this scope ordering entails that the individual Mary marries can vary with each of Mary's want-worlds w′. This derives a non-specific desire. Point (b) is more subtle. Depending on the semantics assigned to want, scoping two over it may be insufficient to derive the disjunctive reading. In brief: the existence of two individuals x such that Mary marries x in each of her want-worlds w′ still requires, on a naïve semantics for want, that Mary marry twice in each w′. More sophisticated semantics for want—e.g. [5]—may obviate this worry. Grant that scoping two over want can derive the disjunctive reading. Sauerland's point then requires that leaving two within the scope of O be incompatible with a disjunctive desire. We return to this below.
2.2 Antecedent-Contained Deletion
Sauerland corroborates this observation by noting the grammaticality of “wide” non-specific readings of ACD constructions like (3): (3) Mary wants to marry someone from every country Barry does.
(3) can be true if (a) neither Mary nor Barry has anyone specific in mind and (b) the ACD is resolved "wide"—viz. anaphoric to the larger VP want to... As before, the first of these points suggests that the indefinite remains within the scope of want. Additionally, standard assumptions require that the DP containing the ellipsis site QR past the verb heading the antecedent VP in order to resolve the antecedent-containment paradox—cf. [6]. The scope ordering ∀ > O > ∃ again requires QR out of DP.
2.3 Negation Intervention
Finally, Sauerland echoes [7]’s observation regarding (4): (4) John didn’t see pictures of several baseball players.
(4) has a reading judged true if there are several baseball players x such that John didn’t see any pictures of x—several > ¬ > ∃. Sauerland assumes that the scope of the existential quantifier marks the LF position of the bare plural (a proposition I dispute below) and safely establishes that the cardinal indefinite occupies an LF position above negation. The by-now-familiar conclusion is that this reading requires QR out of DP.
3 Larson's Generalization and Constraints on QR
[3] observes that a QP external to a DP X must scope either below or above all scopal elements in X (i.e. no interleaved scope): (5) Three men danced with a woman from every city. (*∀ > 3 > ∃) (6) Several students ate a piece of every pie. (*∀ > several > ∃)
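For concreteness, the generalization amounts to a simple no-interleaving check. The following sketch (mine, in Python; not part of the original squib) enumerates the six scope orders for a DP-external QP1, a host DP QP2 and a DP-embedded QP3, and stars exactly the interleaved ones:

```python
from itertools import permutations

def obeys_larson(order):
    """order lists scope-takers from widest to narrowest.
    Larson's generalization: the DP-external QP1 may not intervene
    between the host DP (QP2) and the QP embedded in it (QP3)."""
    i1, i2, i3 = (order.index(q) for q in ('QP1', 'QP2', 'QP3'))
    return not (min(i2, i3) < i1 < max(i2, i3))

for order in permutations(('QP1', 'QP2', 'QP3')):
    mark = '' if obeys_larson(order) else '*'
    print(mark + ' > '.join(order))
# Starred: QP2 > QP1 > QP3 and QP3 > QP1 > QP2; the latter is the missing
# *every > three > a reading of (5) and *every > several > a reading of (6).
```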
The conclusion usually drawn from this datum is that QR out of DP is illicit. Inverse linking instead results from QR of the embedded QP to a DP-adjunction position: (7) [DP [every city]x [DP someone from x]] left
This approach is adopted in e.g. [6,8,9]. If QR into DP is likewise illicit, Larson's generalization is derived.
3.1 Superiority
Sauerland rejects DP's scope island-hood, arguing that subjecting QR to Superiority in the sense of [10,4] accounts for generalizations (8) and (9). (8) QP1 [QP2 [QP3]] *QP3 > QP1 > QP2 ([3]) (9) O [DP [QP]] QP > O > DP (Sauerland)
We won’t dwell on the syntactic details of Sauerland’s account here. It will be sufficient to note that Sauerland allows the relative scope of two QPs to be reversed iff this reversal is required for interpretation. This is effected by ordering QR of higher QPs before QR of lower QPs and requiring that QR be to the nearest node of type t (thus the lower QP in general lands below the higher one).1 “Canonical” inverse scope is derived by total reconstruction of the subject QP (which Sauerland conceptualizes in terms of subject movement at PF). Sauerland assumes that absent DP-internal clausal syntax, DP-embedded QPs are uninterpretable in situ. Accordingly, they QR to the nearest node of type t. If the embedding DP is quantificational, this entails a scope inversion (note that surface scope readings of inversely linked constructions are predicted impossible, something we revisit in §4.1). If the QP-containing DP is itself uninterpretable— e.g. in object position—a proper characterization of Superiority (cf. [4,12]) requires that it QR before the embedded QP. This is all that’s needed to derive Larson’s generalization in the extensional case. (10) [vP [QP1 three men] danced with [QP2 a woman from [QP3 every city]] ]
Two scenarios are possible. Either (a) QP1 moves to [Spec,TP] at LF (it QRs), or (b) it moves there at PF (it doesn't). In the first case each QP QRs. The only inversion required for interpretation is between QP2 and QP3, and so the scope ordering 1 > 3 > 2 is derived. In scenario (b) QR applies twice. One inversion comes for free (QP1 and QP2; since QP1 doesn't raise, QP2 goes to the nearest node of type t—viz. above QP1), and one is required for interpretation (QP2 and QP3). Superiority also requires that QP2 raise over QP1 before QP3 raises out of QP2. Thus the scope ordering 3 > 2 > 1 is derived. In both scenarios QP2 and QP3 scope together relative to QP1. Non-QP Operators. The following constructions replace the subject QP with an intensional operator/negation:
1 This oversimplifies the mechanism that [4] and Sauerland propose. I don't think this affects any of my points. See [11,12] for further discussion.
(11) Mary wants [TP PRO to marry [QP1 someone from [QP2 these two countries]] ] (12) [NegP not [vP John see [QP1 pictures of [QP2 several baseball players]] ]]
Both structures require QR of QP1 and QP2 to a TP/vP-adjunction position for interpretation. QP2 may subsequently continue climbing the tree. It’s free to raise over want/not; Superiority doesn’t come into play since these aren’t QPs. Thus the scope ordering 2 > O > 1 is derived (similarly, the ACD example is predicted grammatical).
4 Problems with the Account
4.1 Surface Scope
As noted in §3.1, Sauerland’s account predicts that a DP-embedded QP can never scope inside its embedding DP. (13) John bought a picture of every player on the team. ([11]’s ex. 40a) (14) John bought a picture of each player on the team. ([11]’s ex. 40b) (15) Everyone/no one from a foreign country eats sushi. (after [13] 221, ex. 1)
As [11] notes, example (13) has a reading on which it's true if John bought a single picture with everyone in it and false if he bought many individual pictures but no single picture with everyone. Though this reading seems to require surface scope (viz. ∃ > ∀), [11] suggests it may stem from a "group interpretation" of wide-scoping every player on the team—i.e. roughly equivalent to all the players on the team. If a group interpretation is unavailable for e.g. each player on the team in (14), [11] argues, we have an explanation for why surface scope (viz. ∃ > ∀) is "unavailable" here. A few comments are in order. First, the ungrammaticality of ∃ > ∀ in (14) is actually not clear. Though the surface-scope reading may be marked, this follows from each's oft-noted strong preference for wide scope. Second, the grammaticality of (15) on its surface-scoping reading—viz. every/no x such that x is from a foreign country is such that x eats sushi (∀/¬∃ > ∃)—cannot be answered by appeal to group interpretations of the embedded QP. A theory of inverse linking must, it seems, account for "surface" linking. Absent an ad hoc appeal to abstract clausal syntax inside DP, [11,1] cannot.
4.2 Reliance on Covert Clausal Syntax
[8,3] observe that QPs embedded in nominal intensional complements can be read de dicto: (16) Max needs a lock of mane from every unicorn in an enchanted forest.
(16) ([3]’s ex. 4a) has a reading on which it’s true if Max is trying to perform a spell which requires him to pick an enchanted forest and then procure a lock of mane from every unicorn in it. Max’s need in this scenario is nonspecific with respect to both unicorns and locks of mane, suggesting that each QP remains within the scope of the intensional verb need.
The DP-as-scope-island approach to inverse linking predicts this state of affairs. QR of the embedded QP targets the DP-internal adjunction site rather than the nearest node of type t. The embedded QP can—indeed must—remain within the scope of need. Something more needs to be said on Sauerland's account. Following [14] he proposes that intensional transitives take abstractly clausal complements. Informally, the syntax of (16) is something like Max needs PRO to have... The infinitive clause offers a type-t landing site for the embedded QP below need. Abstract clausal syntax in complements of intensional transitives is thus an essential feature of Sauerland's account.
4.3 Double-Object Behavior in Intensional Cases
Surprisingly, Sauerland’s account predicts that though inversely linked DPs in extensional contexts obey Larson’s generalization, those in intensional contexts do not. Compare the following two cases:2 (17) Two students want to read a book by every author. (*∀ > 2 > ∃) (18) Two boys gave every girl a flower. (∀ > 2 > ∃)
Example (17) lacks the starred reading—unsurprisingly given Larson’s generalization. Example (18)—discussed by Bruening in unpublished work, and given as [11]’s ex. 49—permits an intervening scope reading (i.e. on which boys vary with girls and flowers with boys). (19) [QP1 two students] want [ [QP3 every author]x [ [QP2 a book by x]y [PRO to read y] ] ] (20) [QP1 two boys] gave [QP3 every girl] [QP2 a flower]
(19) and (20) represent intermediate steps in the derivations of (17) and (18), respectively. In (19) QP2 has raised from object position, and QP3 has raised out of QP2. The difficulty for Sauerland here is that (19) and (20) are predicted to license the same subsequent movements—the numbering is intended to highlight this. If in both cases QP1 moves only at PF we may derive the following structures: (21) [QP3 every author]x [ [QP1 two students] want [ x [[QP2 a book by x]y [PRO to read y]] ] ] (22) [QP3 every girl]x [ [QP1 two boys] gave x [QP2 a flower] ]
In short, both (17) and (18) are predicted to permit 3 > 1 > 2. While this is a good result for (18), it’s a bad one for (17). Note, moreover, that appealing to the obligatoriness of the QR in (22) as compared to the non-obligatoriness of the QR in (21) won’t help: (23) A (different) child needed every toy. (∀ > ∃) (24) Two boys want to give every girl a flower. (∀ > 2 > ∃)
(23) possesses an inverse-scope reading (on which children vary with toys), and (24) possesses the interleaved scope reading that (17) lacks. As per Sauerland's assumption, the syntax of (23) is actually as in (25):
2 I thank an anonymous reviewer for comments which helped me sharpen this point.
(25) [a (different) child] needed [PRO to have [every toy] ] (26) [two boys] want [PRO to give [every girl] [a flower] ]
QR of every toy/girl above the subject isn't obligatory in either case. In both instances obligatory QR targets a position below the intensional verb (and thus below the subject QP). In short, Sauerland needs to allow non-obligatory QR to reorder subject and object QPs. Ruling this mechanism out in order to save (17) dooms (23) and (24).
4.4 Under-Generation Issues
The following constructions are grammatical when the ECM indefinite scopes below the matrix-clause intensional operator O (evidenced by NPI grammaticality in 27 and a nonspecifically construed indefinite in 28) and the bracketed QP scopes above O:3 (27) Frege refused to let any students search for proofs of [at least 597 provable theorems] (28) Frege wanted many students to desire clear proofs of [every theorem Russell did]
In (27) the bracketed QP can be (indeed, on its most salient reading is) construed de re. Frege need never have wanted anything pertaining de dicto to ≥597 provable theorems. Say he made a habit of dissuading students from searching for proofs of theorems he considered unprovable, but by our reckoning he engaged in no fewer than 597 erroneous dissuasions. For Sauerland this requires 597 > refused. In (28) wide ACD resolution is permitted. As Sauerland observes, this suggests that the bracketed QP scopes at least as high as wanted. Both of these “wide” readings are compatible with O > ∃, a situation Superiority predicts impossible. Obligatory QR of the bracketed QPs in both cases targets a node below the ECM indefinite (N.B. the verbs in the infinitival complements are intensional transitives; on the [14] analysis of these constructions their complements are clausal; obligatory QR of the bracketed QPs thus targets a position below the infinitival intensional transitive). If the ECM indefinite stays within the scope of O, Superiority predicts—barring total reconstruction of the indefinite4 —that the bracketed QP will be unable to take scope over O, contrary to fact. I return to both of these constructions in §5.
5 Re-evaluating Sauerland's Data
5.1 Modal Intervention?
Does the non-specific disjunctive-desire reading of (1)—repeated here as (29)—require QRing these two countries over the intensional operator? Here's some evidence it doesn't: (29) Mary wants to marry someone from these two countries. (30) (Pointing to "Toccata and Fugue in D minor" and "O Fortuna") When these two songs play in a movie, someone's about to die.
3 For Sauerland, anyway. I discuss below why I don't think de re diagnoses wide scope.
4 The aforementioned anonymous reviewer notes that total reconstruction as generally understood only applies to A-chains, not QR chains. True enough.
(31) The paranoid wizard refuses to show anyone these two amulets. (32) The paranoid wizard refuses to show more than two people these two amulets. (33) You may show a reporter (lacking a security clearance) these two memos. (34) [Ms. Goard] declined to show a reporter those applications. (35) At least some states consider it to be attempted murder to give someone these drugs. (36) When you give someone these viruses, you expect to see a spike as gene expression changes. (37) #Mary wants to marry someone from every Scandinavian country. (38) #When every Stravinsky song plays in a movie, someone’s about to die.
To the extent that (29) can express something felicitous, so can (30), despite the fact that QR of those two songs over the modal operator when is blocked by a tensed clause boundary. Specifically, (30) needn't quantify over situations in which two songs play. The availability of a disjunctive reading in this case (viz. ≈ when either of those two songs plays in a movie, someone's about to die) suggests that QR out of DP may not be required for a felicitous reading of (29). Example (31), whose infinitival complement hosts a double object configuration, corroborates this assessment. Double object constructions are known to disallow QR of the DO over the IO—cf. [4]. Here the NPI/nonspecific IO remains within the scope of the downward-entailing intensional operator refuse. Accordingly, these two amulets cannot QR above refuse. Nevertheless, the most salient reading of (31) involves a wizard who doesn't show anyone either of the two amulets.5 Similarly (32) permits a reading such that the paranoid wizard won't show either of the two amulets to any group of three or more people. Similarly, (33) allows a disjunctive construal of these two memos. On this reading, you are conferred permission to show any reporter lacking a security clearance either of the two memos (and possibly both). So you're being compliant if you show such a reporter memo #1 but not memo #2. This is again despite a nonspecific IO, which should prohibit QR of these two memos to a position over the deontic modal. Examples (35)–(36) likewise permit nonspecific IOs alongside disjunctively construed DOs, despite double object configurations (and a tensed clause boundary in 36).6
5 Superiority theorists may counter that NPIs aren't subject to QR and thus that the DO is free to QR over anyone in (31). This leaves (32) and (33)–(36) mysterious. Additionally, [15] shows that NPIs can host ACD gaps, suggesting they QR, after all—cf. that boy won't show anyone he should his report card.
6 (34)–(36) were obtained via Google search. They can be accessed at the following links:
1. http://www.nytimes.com/2000/11/19/us/counting-vote-seminole-countyjudge-asked-democrats-quash-absentee-ballots.html
2. http://tribes.tribe.net/bdsmtipstechniques/thread/8cd9d057-e54d-4b03-8899-edada3dc33e6
3. http://www.genomics.duke.edu/press/genomelife/current/GL_MarApr09.pdf
—each of which displays the nonspecific-IO/disjunctive-DO reading.
Finally, (37) and (38) lack felicitous readings (given certain norms surrounding marriage and film scores). They are incompatible with scenarios in which Mary wants to marry once, and every Stravinsky song playing in a given situation isn't a clue about anything. This suggests that plural demonstratives may be necessary for disjunctive readings of (29)–(36).7 In sum, (30)–(36) show that QR over an intensional operator cannot be necessary for a disjunctive construal of a plural demonstrative. Examples (37) and (38) show that in certain cases the plural demonstrative is a necessary component of the disjunctive reading. These facts follow if we assume that disjunctive readings in these cases aren't (necessarily) due to QR over an intensional operator but may instead arise when plural demonstratives occur in the scope of modal (or downward-entailing, cf. §5.2) operators.8
5.2 Negation Intervention?
Recall [7]’s negation-intervention cases—e.g. (4), repeated here as (39): (39) John didn’t see pictures of several baseball players (at the auction).
As [7] observes and Sauerland confirms, constructions like (39) allow a reading with several > ¬ > ∃. Several baseball players x, in other words, are such that John didn’t see any pictures of x. Independently motivated semantic apparatus for bare plurals helps explain these data. If DP is a scope island, scoping several over not requires QRing the bare plural over not: (40) [[several baseball players]x pictures of x]y [John didn’t see y]
We assume following [16] that bare plurals sometimes denote kinds and that combining a kind-level argument with a predicate of objects creates a type-mismatch resolved by an operation called ‘D(erived) K(ind) P(redication).’ Following [17], the semantics of DKP is as follows: (41) For any P denoting a predicate of objects: DKP(P ) = λy.[∃x : x ≤ y][P x], where x ≤ y iff x instantiates the kind y.
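As a toy illustration of (41) (a sketch of mine, in Python; modeling a kind as the set of its instances, and all names, are simplifying assumptions for illustration, not Chierchia's formalism):

```python
# DKP as a type-shifter: given an object-level predicate P, return a
# kind-level predicate that existentially quantifies over instances
# (the relation x <= y of (41) becomes set membership here).
def dkp(P):
    return lambda kind: any(P(x) for x in kind)

pictures_of_jeter = {'pic1', 'pic2'}    # hypothetical instances of the kind
seen_by_john = {'pic1'}
P = lambda x: x in seen_by_john         # object-level 'John saw _'

print(dkp(P)(pictures_of_jeter))        # True: John saw some instance
print(dkp(P)({'pic3'}))                 # False: no instance was seen
```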
DKP generalizes to n-ary relations in the usual way (cf. [16] fn. 16), introducing an existential quantifier within the scope of the shifted verb. That DPs of the form pictures of several baseball players denote kinds on their inversely linked readings is confirmed by (a) the felicity of (42) and (b) the absence of a several > ∃ > ¬ reading for (39) (repeated as 43): (42) Pictures of several baseball players are rare. (43) John didn’t see pictures of several baseball players (at the auction). 7 8
7 Though plural indefinites seem to work similarly in certain cases. See §5.2.
8 Disjunctive readings might involve something like a free-choice effect or exceptional scope (i.e. scope out of islands which doesn't require QR out of islands).
Returning to (40), note that the trace y left by QR of the bare plural will (presumably) be kind-level.9 This creates a mismatch between see and the bare plural’s trace y. DKP applies to see y, introducing an ∃ within the scope of a ¬: (44) λz . see y z →DKP λz . [∃x : x ≤ y][see x z]
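Spelling out the result (my notation: ≤ is the instantiation relation of (41), and k_x stands for the pictures-of-x kind), (40) plus (44) assigns (39) truth conditions along these lines:

```latex
% Sketch of the truth conditions derived for (39) under LF (40).
% DKP applies below negation, so the existential lands in negation's scope:
[\textit{several}\ x : \mathrm{ballplayer}(x)]\;
  \neg\,\exists z\,[\, z \le \mathbf{k}_x \wedge \mathrm{see}(\mathrm{john}, z)\,]
```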
This derives several > ¬ > ∃, despite the prohibition on QR out of DP. Plural Indefinites and Demonstratives under Negation. Other factors may be at work in these cases. Recall (31), repeated here as (45): (45) The paranoid wizard refuses to show anyone these two amulets. (46) The paranoid wizard refuses to show anyone several (of his) amulets.
As noted previously, (45) requires the NPI IO to remain under refuse, while permitting a (disjunctive) reading truth-conditionally equivalent to 2 > refuse. Interestingly, the same goes for (46), which replaces the demonstrative with a plural indefinite and admits a (disjunctive) reading equivalent to several > refuse. In both cases scope freezing should prohibit QR of the DO over the IO to a position above refuse. It is hypothesized that these readings instead result from disjunctively construed DOs. Might QR of (39)’s inversely-linked bare plural over negation thus be unnecessary for a reading which gives several apparent scope over negation? Consider the following cases: (47) John didn’t read any books by these two authors. (≈ 2 > ¬ > ∃) (48) John didn’t read any books by several authors. (??? ≈ several > ¬ > ∃)
These examples replace (39)'s bare plural with full determiner phrases. (47) allows a (disjunctive) reading equivalent to 2 > ¬ > ∃, whereas the disjunctive reading of (48) is borderline ungrammatical. Why this might obtain is unfortunately beyond the scope of what I can consider here, but it shows that the QR+DKP story may be necessary for (39) but not for an example with a plural demonstrative in lieu of the plural indefinite.10,11
5.3 ACD and Scope Shift
Sauerland’s ACD data remains to be discussed. Recall that (49) is grammatical with a nonspecific indefinite and wide ACD resolution, suggesting QR out of DP. (49) Mary wants to marry someone from every country Barry does. 9
9 Pictures of several baseball players will denote something like the set of predicates κ of kinds such that for several baseball players x, the kind y = pictures of x is such that κ(y) = 1.
10 The contrast between (46) (permits a disjunctive reading) and (48) (doesn't) is also unexplained.
11 Note also that the contrast between (47) and (48) doesn't follow for Sauerland, who permits the embedded QP to QR over negation in both cases.
I’d like to suggest that QR to resolve ACD is a more powerful scope-shift mechanism than QR which isn’t required for interpretation. A similar claim is made in [18], which distinguishes “ACD-QR” from “Scope-QR”—viz. QR which doesn’t resolve antecedent containment. [18] notes e.g. that ACD licenses QR across a tensed clause boundary and negation, both islands for Scope-QR: (50) John hopes I marry everyone you do (hope...) (51) John said that Mary will not pass every student that we predicted he would (say...)
In the following I consider some additional evidence in favor of an ACD-QR mechanism distinct from and more powerful than Scope-QR. ACD and DE Operators. Examples (52) and (53) differ in that (53) hosts an ACD gap, whereas (52) does not. The reading of (53) we’re interested in involves wide ACD resolution: (52) Mary denied kissing everyone. (??∀ > deny) cf. Mary imagined kissing everyone. (∀ > imagine) (53) Mary denied kissing everyone Barry did. (∀ > deny)
QPs headed by every do not readily QR over downward-entailing operators—cf. [19,18].12 (52) doesn’t permit ∀ > deny without focal stress on everyone. The wide ACD reading of (53), by contrast, permits (actually, requires) ∀ > deny (note that although Barry is focused, everyone is not). Double Object Constructions. Imagine a scenario as follows: a bus full of Red Sox players pulls up. Mary and Barry both mistake them for the Yankees. Each of them wants to give the same presents to some player (or other?) on the bus. (54) Mary wants to give a Yankee everything Barry does.
(54) is grammatical with a Yankee read de dicto—as required by the mistaken beliefs of Mary and Barry in our scenario—and wide ACD resolution, pace [4].13 Whether this reading permits gift recipients to vary with gifts is a more difficult matter.14 Nevertheless, the grammaticality of a wide ACD site hosted in a DO (∀ > O), combined with a de dicto IO (O > ∃), requires subverting double-object scope freezing. (55) The wizard's wife refuses to show anyone the same two amulets her husband does.
12 N.B. these authors only consider QR of every over not.
13 [20] discusses an example like (54) in his fn. 10 but doesn't consider whether it allows the IO to be read de dicto.
14 [4] considers two examples like (54)—his (27a) and (27b)—but concludes they don't admit a de dicto IO, contrary to the judgments of my informants and myself. As [4] correctly notes, Mary gave a child every present Barry did is grammatical but doesn't allow children to vary with presents (*∀ > ∃). He proposes that both IO and DO QR, the DO since it contains the ACD gap and the IO to get out of the DO's way (i.e. to preserve Superiority). Thus IO > DO is preserved. Examples (54) and (55) suggest that this may not be the right approach.
(55) is grammatical with wide ACD. Given the NPI IO, this requires 2 > O > ∃. Again, ACD QR subverts the prohibition on QR of DO over IO in double object configurations. Larson’s Generalization. Recall (28), repeated here (slightly modified) as (56): (56) Frege wanted a student to construct a proof of [every theorem Russell did]
Previously we focused on how (28) represented a problem for Sauerland. Superiority predicts that a nonspecific reading of the ECM indefinite will be incompatible with wide ACD resolution, contrary to fact. Note, however, that this reading represents a problem for just about anybody. Specifically, its grammaticality entails a violation of Larson’s generalization:15 (57) [every theorem Russell wanted a student to construct a proof of x]x [Frege wanted a student to construct a proof of x]
LF (57) entails that a QP intervenes between a DP-embedded QP and its remnant! In other words: the same strategy that Sauerland uses to argue that DP isn't a scope island allows us to construct examples wherein Larson's generalization doesn't hold. But of course we don't want to conclude that Larson's generalization doesn't ever hold. In sum, since ACD-QR can cross islands, Sauerland's ACD examples aren't dispositive for the DP-as-scope-island hypothesis.
6 Conclusions
This squib has offered evidence that the conclusions reached in [11,1] favoring QR out of DP may not be warranted. Most seriously, the mechanism Sauerland proposes to derive Larson's generalization only really works for extensional cases, over-generating when inversely-linked DPs occur in intensional contexts. Sauerland's account also struggles with "surface-linked" interpretations of inversely linked DPs and ECM interveners which should block certain readings but appear not to, as well as a reliance on covert clausal syntax in intensional transitive constructions. On closer inspection, readings analogous to those which Sauerland takes to motivate QR out of DP occur in constructions where (we have independent reason to believe that) QR above a certain relevant operator isn't an option. Importantly, each of Sauerland's arguments for QR out of DP is given a double-object-construction rejoinder. I have speculated that plural demonstratives/indefinites in the scope of modal/negation operators can be construed disjunctively in the absence of QR. Additionally, if (following [16]) the scope of an ∃ quantifier isn't diagnostic of the LF position of a kind-denoting bare plural, [7]'s negation-intervention cases don't require a split DP.
Similar comments don’t apply to (27). As many authors—e.g. [21,22]—have shown, scoping a QP over an intensional operator may be sufficient for a de re reading of that QP, but it cannot be necessary.
Finally, in line with [18], I’ve provided new arguments that ACD-QR can do things Scope-QR can’t: namely, scope an every-phrase over a downward-entailing operator, carry a DO over an IO in double object configurations, and precipitate violations of Larson’s generalization. Some of these criticisms will also militate against [4]’s characterization of QR as Superiority-governed. Additionally, it remains to be determined to what extent plural demonstratives and indefinites behave as a piece with respect to disjunctive readings, why this might be the case, and what any of this has to do with modals/negation. I must leave consideration of these matters to future work.
References
1. Sauerland, U.: DP Is Not a Scope Island. Linguistic Inquiry 36(2), 303–314 (2005)
2. May, R.: The Grammar of Quantification. Ph.D. thesis. MIT Press, Cambridge (1977)
3. Larson, R.K.: Quantifying into NP. Ms. MIT, Cambridge (1987)
4. Bruening, B.: QR Obeys Superiority: Frozen Scope and ACD. Linguistic Inquiry 32(2), 233–273 (2001)
5. Heim, I.: Presupposition Projection and the Semantics of Attitude Verbs. Journal of Semantics 9, 183–221 (1992)
6. May, R.: Logical Form: Its Structure and Derivation. MIT Press, Cambridge (1985)
7. Huang, C.T.J.: Logical relations in Chinese and the theory of grammar. Garland, New York (1998) (Ph.D. thesis, MIT, 1982)
8. Rooth, M.: Association with Focus. Ph.D. thesis, UMass, Amherst (1985)
9. Büring, D.: Crossover Situations. Natural Language Semantics 12(1), 23–62 (2004)
10. Richards, N.: What moves where when in which language? Ph.D. thesis, MIT (1997)
11. Sauerland, U.: Syntactic Economy and Quantifier Raising. Ms., Universität Tübingen (2000)
12. Charlow, S.: Inverse linking, Superiority, and QR. Ms., New York University (2009)
13. Heim, I., Kratzer, A.: Semantics in Generative Grammar. Blackwell, Oxford (1998)
14. Larson, R.K., den Dikken, M., Ludlow, P.: Intensional Transitive Verbs and Abstract Clausal Complementation. Ms., SUNY at Stony Brook, Vrije Universiteit, Amsterdam (1997)
15. Merchant, J.: Antecedent-contained deletion in negative polarity items. Syntax 3(2), 144–150 (2000)
16. Chierchia, G.: Reference to Kinds across Languages. Natural Language Semantics 6, 339–405 (1998)
17. Magri, G.: Constraints on the readings of bare plural subjects of individual-level predicates: syntax or semantics? In: Bateman, L., Ussery, C. (eds.) Proceedings from NELS 35, vol. I, pp. 391–402. GLSA Publications, Amherst (2004)
18. von Fintel, K., Iatridou, S.: Epistemic Containment. Linguistic Inquiry 34(2), 173–198 (2003)
19. Beghelli, F., Stowell, T.: Distributivity and negation: the syntax of each and every. In: Szabolcsi, A. (ed.) Ways of Scope Taking, pp. 71–109. Kluwer, Dordrecht (1997)
20. Larson, R.K.: Double Objects Revisited: Reply to Jackendoff. Linguistic Inquiry 21(4), 589–632 (1990)
21. Farkas, D.: Evaluation indices and scope. In: Szabolcsi, A. (ed.) Ways of Scope Taking, pp. 183–215. Kluwer, Dordrecht (1997)
22. Keshet, E.: Good Intensions: Paving Two Roads to a Theory of the De re/De dicto Distinction. Ph.D. thesis, MIT (2008)
Semantic Meaning and Pragmatic Inference in Non-cooperative Conversation
Michael Franke
Seminar für Sprachwissenschaft, University of Tübingen
Abstract. This paper applies a model of boundedly rational “level-k thinking” [1–3] to a classical concern of game theory: when is information credible and what shall I do with it if it is not? The model presented here extends and generalizes recent work in game-theoretic pragmatics [4–6]. Pragmatic inference is modeled as a sequence of iterated best responses, defined here in terms of the interlocutors’ epistemic states. Credibility considerations are a special case of a more general pragmatic inference procedure at each iteration step. The resulting analysis of message credibility improves on previous game-theoretic analyses, is more general and places credibility in the linguistic context where it, arguably, belongs.
1 Semantic Meaning and Credible Information in Signaling Games
The perhaps simplest game-theoretic model of language use is a signaling game with meaningful signals. A sender S observes the state of the world t ∈ T in private and chooses a message m from a set of alternatives M all of which are assumed to be meaningful in the (unique and commonly known) language shared by S and a receiver R. In turn, R observes the sent message and chooses an action a from a given set A. In general, the payoffs for both S and R depend on the state t, the sent message m and the action a chosen by the receiver. Formally, a signaling game with meaningful signals is a tuple ⟨{S, R}, T, Pr, M, [[·]], A, US, UR⟩ where Pr ∈ Δ(T) is a probability distribution over T; [[·]] : M → P(T) is a semantic denotation function and US,R : M × A × T → ℝ are utility functions for both sender and receiver.1
I would like to thank Tikitu de Jager, Robert van Rooij, Daniel Rothschild, Marc Staudacher and three anonymous referees for insightful comments, help and discussion. I benefited greatly from discussions with Gerhard Jäger. Also, I am thankful to Sven Lauer for waking my interest by first explaining to me with enormous patience some puzzles about credibility that I did not fully understand at the time. Errors are my own.
1 I will assume throughout that (i) all sets T, M and A are non-empty and finite, that (ii) Pr(t) > 0 for all t ∈ T, that (iii) for each state t there is at least one message m which is true in that state and that (iv) no message is contradictory, i.e., there is no m for which [[m]] = ∅.
         a∃¬∀   a∀     msome   mall
t∃¬∀     1,1    0,0     √       −
t∀       0,0    1,1     √       √
Fig. 1. Scalar Implicature
         amate   aignore   mhigh   mlow
thigh    1,1     0,0        √       −
tlow     1,0     0,1        −       √
Fig. 2. Partial Conflict
We can conceive of such signaling games as abstract mathematical models of a conversational context whose most important features they represent: the interlocutors' beliefs, behavioral possibilities and preferences. If a signaling game is a context model, the game's solution concept is what yields a prediction of the behavior of agents in the modelled conversational situation. The following easy example of a scalar implicature, e.g., the inference that not all students came associated with the sentence "Some of the students came", makes this distinction clear. A simple context model for this case is the signaling game called "Scalar Implicature" in Figure 1:2 there are two states t∃¬∀ and t∀, two messages msome and mall with semantic meaning as indicated and two receiver interpretation actions a∃¬∀ and a∀ which correspond one-to-one with the states; sender and receiver payoffs are aligned: an implementation of the standard assumption that conversation and implicature calculation revolve around the cooperative principle [7]. A solution concept, whatever it may be, should then ideally predict that S t∃¬∀ (S t∀) chooses msome (mall) and that the receiver responds with action a∃¬∀ (a∀).3 (The sketch below codes this game and checks that prediction.) It is obvious that in order to arrive at this prediction, a special role has to be assigned to the conventional, semantic meaning of the messages involved. For instance, in the above example anti-semantic play, as we could call it, that simply reverses the use of messages, should be excluded. Most game-theoretic models of language use hard-wire semantic meaning into the game play, either as a restriction on available moves of sender and receiver, or into the payoffs, but in both cases effectively enforcing truthfulness and trust. This is fine as long as conversation is mainly cooperative and preferences aligned. But let's face it: the central Gricean assumption of cooperation is an optimistic idealization after all; conflict, lies and deceit are as ubiquitous as air. But then, hard-wiring of truthfulness and trust limits the applicability of our models as it excludes the possibility that senders may wish to mislead their audience. We should aim for more general models and, ideally, let the agents, not the modeller, decide when to be truthful and what to trust. Opposed to hard-wiring truthfulness and trust, the most liberal case at the other end of the spectrum is to model communication, not considering reputation or further psychological constraints at all, as cheap talk. Here messages do not impose restrictions on the game play and are entirely payoff irrelevant: US,R(m, a, t) = US,R(m′, a, t) for all m, m′ ∈ M, a ∈ A and t ∈ T. However, if talk is cheap, yet exogenously meaningful, the question arises how to integrate semantic meaning into the game.
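Here is that sketch (mine, in Python; the ASCII names stand in for the symbols of Figure 1). It checks that truthful, pragmatically enriched play is a pair of mutual best responses:

```python
# The game of Fig. 1: states, messages, actions, semantics, aligned payoffs.
T, M, A = ['t_en', 't_a'], ['some', 'all'], ['a_en', 'a_a']
sem = {'some': {'t_en', 't_a'}, 'all': {'t_a'}}          # [[.]]
U = lambda t, a: 1 if T.index(t) == A.index(a) else 0    # shared utility

# Intended play: truth plus scalar enrichment of 'some'.
sender = {'t_en': 'some', 't_a': 'all'}
receiver = {'some': 'a_en', 'all': 'a_a'}

for t in T:    # no sender type has a better true message
    assert all(U(t, receiver[sender[t]]) >= U(t, receiver[m])
               for m in M if t in sem[m])
for m in M:    # the receiver best-responds to the types actually sending m
    senders = [t for t in T if sender[t] == m]
    assert all(sum(U(t, receiver[m]) for t in senders) >=
               sum(U(t, a) for t in senders) for a in A)
print('intended play is a pair of mutual best responses')
```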
2 Unless indicated, I assume that states are equiprobable in example games.
3 For t ∈ T, I write S t as an abbreviation for "a sender of type t".
Standard solution concepts, such as sequential equilibrium or rationalizability, are too weak to predict anything reasonable in this case: they allow for nearly all anti-semantic play and also for babbling, where signals are sent, as it were, arbitrarily and therefore ignored by the receiver. In response to this problem, game theorists have proposed various refinements of the standard solution concepts based on the notion of credibility.4 The idea is that semantic meaning should be respected (in the solution concept) wherever this is reasonable in view of the possibly diverging preferences of interlocutors. As an easy example, look at the game "Partial Conflict" in Figure 2 where S is of either a high quality or a low quality type, and where R would like to pair with S thigh only, while S wants to pair with R irrespective of her type. Interests are in partial conflict here and, intuitively, a costless, non-committing message mhigh is not credible, because S tlow would have all reason to send it untruthfully. Therefore, intuitively, R should ignore whatever S says in this game. In general, if nothing prevents S from babbling, lying or deceiving, she might as well do so; whenever she even has an incentive to, she certainly will. For the receiver the central question becomes: when is a signal credible and what should I do if it is not? This paper offers a fresh look at this classical problem of game theory. The novelty is, so to speak, a "linguistic turn": I suggest that credibility considerations are pragmatic inferences, in some sense very much alike—and in another sense very much unlike—conversational implicatures. I argue that this linguistic approach to credibility of information improves on the classical game-theoretic analyses by Farrell and Rabin [8, 9]. In order to implement conventional meaning of signals in a cheap talk model, the present paper takes an epistemic approach to the solution of games: the model presented in this paper spells out the reasoning of interlocutors in terms of their beliefs about the behavior of their opponents as a sequence of iterated best responses (ibr) which takes semantic meaning as a starting point. For clarity: the ibr model places no restriction whatsoever on the use of signals; conventional meaning is implemented merely as a focal element in the deliberation of agents. This way, the ibr model extends recent work in game-theoretic pragmatics [5, 6], to which it adds generality by taking diverging preferences into account and by implementing the basic assumptions of "level-k models" of reasoning in games [1–3]. In particular, agents in the model are assumed to be boundedly rational in the sense that each agent computes only finitely many steps of the best response sequence. Section 2 scrutinizes the notion of credibility, section 3 spells out the formal model and section 4 discusses its properties and predictions.
2 Credibility and Pragmatic Inference
The classical idea of message credibility is due to Farrell [8]. Farrell seeks an equilibrium refinement that pays due respect to the semantic meaning of messages.
4 The standards in the debate about credibility were set by Farrell [8] for equilibrium and by Rabin [9] for rationalizability. I will mainly focus on these two classical papers here for reasons of space.
         a1     a2     a3     m1    m2
t1       4,3    3,0    1,2     √     −
t2       3,0    4,3    1,2     −     √

Fig. 3. Best Message Counts
         a1     a2     a3     a4     m12   m23   m13
t1       4,5    5,4    0,0    1,4     √     −     √
t2       0,0    4,5    5,4    1,4     √     √     −
t3       5,4    0,0    4,5    1,4     −     √     √
Fig. 4. Further Iteration
His notion of credibility is therefore tied to a given reference equilibrium as a status quo. According to Farrell, then, a message m is Farrell-credible with respect to a given equilibrium if all t ∈ [[m]] prefer the receiver to interpret m literally, i.e., to play a best response to the belief Pr(· | [[m]]) that m is true, over the equilibrium play, while no type t ∉ [[m]] does. A number of objections can be raised against Farrell-credibility. First of all, the definition requires all types in [[m]] to prefer a literal interpretation of m over the reference equilibrium. This makes sense under Farrell's Rich Language Assumption (rla) that for every X ⊆ T there is a message m with [[m]] = X. This assumption is prevalent in game-theoretic discussions of credibility, but restricts applicability. I will show in section 4 that this assumption seriously restricts Rabin's account [9]. But for now, suffice it to say that, in particular, the rla excludes context models like the scalar implicature game of Figure 1, used to study pragmatic inference in the light of (partial) inexpressibility. I will drop the rla here to aim for more generality and compatibility with linguistic pragmatics.5 Doing so implies amending Farrell-credibility to require only that some types in [[m]] prefer a literal interpretation of m over the reference equilibrium. Still, there are further problems. Matthews et al. criticize Farrell-credibility as being too strong [12]. Their argument builds on the example in Figure 3. Compared to the babbling equilibrium, in which R performs a3, messages m1 and m2 are intuitively credible: both S t1 and S t2 have good reason to send m1 and m2 respectively. Communication seems possible and utterly plausible. However, neither message is Farrell-credible, because for i, j ∈ {1, 2} and i ≠ j not only S tj but also S ti prefers R to play a best response to a literal interpretation of mj, which would trigger action aj, over the no-communication outcome a3. The problem with Farrell's notion is obviously that just doing better than equilibrium is not enough reason to send a message, when sending another message is even better for the sender. When evaluating the credibility of a message m, we have to take into account alternative forms that t ∈ [[m]] might want to send. Compare this with the scalar implicature game in Figure 1. Intuitively, message msome is interpreted as communicating that the true state of affairs is t∃¬∀, because in t∀ the sender would have used mall.
5 A reviewer points out that the rla has a correspondent in the linguistic world in the "principle of effability" [10]. The reviewer supports dropping the rla, because otherwise pragmatic inferences are limited to context and effort considerations. It is also very common (and, to my mind, reasonable) to restrict attention to certain alternative expressions only, namely those that are salient (in context) after observing a message. Of course, game theory is silent as to where the alternatives come from, since this is a question for the linguist, perhaps even the syntactician [11].
In other words, the receiver discards a state t ∈ [[m]] as a possible sender of m because that type has a better message to send. Of course, such pragmatic enrichment does not make a message intuitively incredible, as it is still used in line with its semantic meaning. Intuitively speaking, in this setting S even wants R to draw this pragmatic inference. This is, of course, different in the "Partial Conflict" game in Figure 2. In general, if S wants to mislead, she intuitively wants the receiver to adopt a certain belief, but she does not want the receiver to realize that this belief might be false: we could say, somewhat loosely, that S wants her purported communicative intention to be recognized (and acted upon), but she does not want her deceptive intention to be recognized. Nevertheless, if the receiver does manage to recognize a deceptive intention, this too may lead to some kind of pragmatic inference, albeit one that the sender did not intend the receiver to draw. While the implicature in the scalar implicature game rules out a semantically feasible possibility, credibility considerations, in a sense, do the exact opposite: message mhigh is pragmatically weakened in the "Partial Conflict" game by ruling in state tlow. Despite the differences, there is a common core to both implicature and credibility inference. Here and there, the receiver seems to reason: which types of senders would send this message given that I believe it literally? Indeed, exactly this kind of reasoning underlies Benz and van Rooij's optimal assertions model of implicature calculation for the purely cooperative case [6]. The driving observation of the present paper is that the same reasoning might not only rule out states t ∈ [[m]] to yield implicatures but may also rule in states t ∉ [[m]]. When the latter is the case, m seems intuitively incredible. Still, the reasoning pattern by which implicatures and credibility-based inferences are computed is the same. On a superficial reading, this view on message credibility can be attributed to Stalnaker [4]:6 call a message m BvRS-credible (Benz, van Rooij, Stalnaker) iff for some types t ∈ [[m]], but for no type t ∉ [[m]], S t's expected utility of sending m given that R interprets literally is at least as great as S t's expected utility of sending any alternative message m′. The notion of BvRS-credibility matches our intuitions in all the cases discussed so far, but it is, in a sense, self-refuting, as the game in Figure 4 from [12] shows. In this game, all the available messages m12, m23 and m13 are BvRS-credible, because if R interprets literally S t1 will use message m12, S t2 will use message m23 and S t3 will use message m13. No message is used untruthfully by any type. However, if R realizes that exactly S t1 uses message m12, he would rather not play a2, but a1.
6 It is unfortunately not entirely clear to me what exactly Stalnaker's proposal amounts to, as insightful as it might be, because the account is not fully spelled out formally. The basic idea seems to be that (something like) the notion of BvRS-credibility, as it is called here, should be integrated as a constraint on receiver beliefs—believe a message iff it is BvRS-credible—into an epistemic model of the game together with some appropriate assumption of (common) belief in rationality. The class of game models that satisfies rationality and credibility constraints would then ultimately define how signals are used and interpreted.
18
M. Franke
rather not play a2 , but a1 . But if the sender realizes that message m12 triggers the receiver to play a1 , suddenly S t3 wants to send m12 untruthfully. This example shows that BvRS-credibility is a reliable start, but stops too short. If messages are deemed credible and therefore believed, this may create an incentive to mislead. What seems needed to rectify the formal analysis of message credibility is a fully spelled-out model of iterated best responses that starts in the Benz-van-Rooij-Stalnaker way and then carries on iterating. Here is such a model.
3
The IBR Model and Its Assumptions
3.1
Assumptions: Focal Meaning and Bounded Rationality
The ibr model presented in this paper rests on three assumptions with which it also sets itself apart from previous best-response models in formal pragmatics [5, 6, 13]. The first assumption is the Focal Meaning Assumption: semantic meaning is focal in the sense that the sequence of best responses starts with a purely semantic truth-only sender strategy. Semantic meaning is also assumed focal in the sense that throughout the ibr sequence R believes messages to be truthful unless S has a positive incentive to be untruthful. This is the second, so called Truth Ceteris Paribus Assumption (tcp). These two (epistemic) assumptions assign semantic meaning its proper place in this model of cheap-talk communication. The third assumption is the Bounded Rationality Assumption: I assume that players in the game have limited resources which allow them to reason only up to some finite iteration depth k. At the same time I take agents to be overconfident : each agent believes that she is smarter than her opponent. Camerer et al. make an empirical case for these assumptions about the psychology of reasoners [3].7 However, for simplicity, I do not implement their Cognitive Hierarchy Model in full. Camerer et al. assume that each agent who is able to reason up to strategic depth k has a proper belief about the population distribution of players who reason up to depth l < k, but I will assume here, just to keep things simple, that each player believes that she is exactly one step ahead of her opponent [2, 15]. (I will discuss this simplifying assumption critically in section 4.) 3.2
Beliefs and Best Responses
Given a signaling game, a sender signaling-strategy is a function σ ∈ S = (Δ(M ))T and a receiver response-strategy is a function ρ ∈ R = 7
A good intuitively accessible example why this should be is a so-called beauty contest game [14]. Each player from a group of size n > 2 chooses a number from 0 to 100. The player closest to 2/3 the average wins. When this game is played with a group of subjects who have never played the game before, the usual group average lies somewhere between 20 to 30. This is quite far from the group average 0 which we would expect from common (true) belief in rationality. Everybody seems to believe that they are just a bit smarter than everybody else, without noticing their own limitations.
Semantic Meaning and Pragmatic Inference in Non-cooperative Conversation
19
(Δ(A))M . In order to define which strategies are best responses to a given belief, we need to define the game-relevant beliefs of both S and R. Since the only uncertainty of S concerns what R will do, the set of relevant sender beliefs ΠS is just the set of receiver response-strategies: ΠS = R. On the receiver’s side, we may say, with some redundancy, that there are three components in any gamerelevant belief [16]: firstly, R has a prior belief Pr(·) about the true state of the world; secondly, he has a belief about the sender’s signaling strategy; and thirdly, he has a posterior belief about the true state after hearing a message. Posteriors should be derived by Bayesian update from the former two components, but also specify R’s beliefs after unexpected surprise messages. Taken 1 together, the set 2 3 of relevant receiver beliefs ΠR is the set of all triples πR , πR , πR for which 1 2 3 = Pr, πR ∈ S = (Δ(M ))T and πR ∈ (Δ(T ))M such that for any t ∈ T and πR 2 m ∈ M if πR (t, m) = 0, then: 3 πR (m, t) =
1 2 (t) × πR (t, m) πR . 1 2 (t , m) π (t ) × πR t ∈T R
Given a sender belief ρ ∈ ΠS , say that σ is a best response signaling strategy to belief ρ iff for all t ∈ T and m ∈ M we have: ρm (a) × US (m , a, t) . σ(t, m) = 0 → m ∈ arg max m ∈M
a∈A
The set of all such best responses to belief ρ is denoted by S(ρ). Given a receiver belief πR ∈ ΠR say that ρ is a best response strategy to belief πR iff for all m ∈ M and a ∈ A we have: 3 ρ(m, a) = 0 → a ∈ arg max πR (m, t) × UR (m, a , t) . a ∈A
t∈T
The set of all such best responses to belief πR isdenoted by R(πR ). Also, if ΠR ⊆ ΠR is a set of receiver beliefs, let R(ΠR ) = πR ∈Π R(πR ). R
3.3
Strategic Types and the IBR Sequence
In line with the Bounded Rationality Assumption of Section 3.1, I assume that senders and receivers are of different strategic types. Strategic types correspond to the level k of strategic reasoning a player in the game performs (while believing she thereby outperfoms her opponent by exactly one step of reasoning). I will give an inductive definition of strategic types in terms of player’s beliefs, starting with a fixed strategy σ0∗ of S0 .8 Then, for any k ≥ 0, Rk is characterized by a ∗ belief set πR ⊆ ΠR that S is a level-k sender and Sk+1 is characterized by a k ∗ belief πSk+1 ∈ ΠS that R is a level-k receiver. I assume that S0 plays according to the signaling strategy σ0∗ which simply sends any true message with equal probability in all states. There need not be any 8
I will write Sk and Rk to refer to a sender or receiver of strategic type k. Likewise, Skt refers to a sender of strategic type k and knowledge type t.
20
M. Franke
belief to which this is a best response, as level-0 senders are (possibly irrational) dummies to implement the Focal Meaning Assumption. R0 then believes that he is facing S0 . With unique σ0∗ , which sends all messages in M with positive probability (M is finite and contains no contradictions), R0 is characterized ∗ entirely by the unique belief πR that S plays σ0∗ . o In general, Rk believes that he is facing a level-k sender. For k > 0, Sk is characterized by a belief πS∗ k ∈ ΠS . Rk consequently believes that Sk plays a best response σk ∈ S(πS∗ k ) to this belief. We can leave this unrestricted and assume that Rk considers any σk ∈ S(πS∗ k ) possible. But it will transpire that for an intuitively appealing analysis of message credibility we need to assume that Rk takes Sk to be truthful all else being equal (see also discussion in section 4). We implement the tcp assumption of Section 3.1 as a restriction S ∗ (πS∗ k ) ⊆ S(πS∗ k ) on signaling strategies held possible by R. Of course, even when restricted, there need not be a unique signaling strategy here. As a general tie-break rule, assume the “principle of insufficient reason” that all σk ∈ S ∗ (πS∗ k ) are equiprobable to Rk . That means that Rk effectively believes that his opponent is playing response strategy ∗ ) σ(t, m) σ∈S ∗ (πS k σk∗ (t, m) = . |S ∗ (πS∗ k )| This fixes Rk ’s beliefs about the behavior of his opponent, but it need not fix Rk ’s 3 about surprise messages. Since this matter is intricate and moreover belief πR Rk ’s counterfactual beliefs do not play a crucial role in any examples discussed in this paper, I will not pursue this issue at all in this paper (but see also footnote 9 below). In general, let us say that Rk is characterized by any belief whose second component is σk∗ and whose third component satisfies some (coherent, but possibly vacuous) assumption about the interpretation of surprise messages. ∗ ⊆ ΠR be the set of all such beliefs. Rk is then fully characterized by Let, πR k ∗ πRk . In turn, Sk+1 believes that her opponent is a level-k receiver who plays a best ∗ response ρk ∈ R(πR ). With the above tie-break rule Sk+1 is fully characterized k by the belief ∗ ) ρ(m, a) ρ∈R(πR ∗ k . ρk (m, a) = ∗ )| |R(πR k 3.4
Credibility and Inference
∗ Define that a signal m is k-optimal in t iff σk+1 (t, m) = 0. The set of kt optimal messages in t are all messages that Rk+1 believes Sk+1 might send (thus taking the tcp assumption into account). Similarly, distill from R’s beliefs his interpretation-strategy δ : M → P(T ) as given by belief πR : δπR (m) = 3 {t ∈ T | πR (m, t) = 0}. This simply is the support of the posterior beliefs of R after receiving message m. Let’s write δk for the interpretation strategy of a level-k receiver.
Semantic Meaning and Pragmatic Inference in Non-cooperative Conversation
21
For any k > 0, since Sk believes to face Rk−1 with interpretation strategy δk−1 , wanting to send message m would intuitively count as an attempt to mislead if sent by Skt just in case t ∈ δk−1 (m). Such an attempt would moreover be untruthful if t ∈ [[m]]. While Rk−1 would be deceived, Rk would see through the attempted deception. From Rk ’s point of view, who adheres to the tcp Assumption, a message m is incredible if it is k − 1-optimal in some t ∈ [[m]]. But then Rk will include t in his interpretation of m: recognizing a deceptive intention leads to pragmatic inference. In general, we should consider a message m credible unless some type t ∈ [[m]] would want to use m somewhere along the ibr sequence; precisely, m is credible iff δk (m) ⊆ [[m]] for all k ≥ 0.9
4
Discussion
The ibr model makes intuitively correct predictions about message credibility for the games considered so far. In the scalar implicature game, R0 responds to msome with the appropriate action a∃¬∀ , but still interprets δ0 (msome ) = {t∃¬∀ , t∀ }. In turn, R1 interprets as δ1 (msome ) = {t∃¬∀ }; he has pragmatically enriched the semantic meaning by taking the sender’s payoff structure and available messages into account. After one round a fixed-point is reached, with fully revealing credible signaling in accordance with intuition. In the game “Partial Conflict”, ibr t predicts that both S1high and S1tlow will use mhigh which is therefore not credible. In the game from Figure 3, also fully revealing communication is predicted and for the game in Figure 4 ibr predicts that all messages are credible for R0 and R1 , but not for R2 , hence incredible as such. In general, the ibr model predicts that communication in games of pure coordination is always credible. a1 a2 m12 √ t1 1,1 0,0 √ t2 0,0 1,1 t3 0,0 1,1 -
m3 − − √
Fig. 5. White Lie 9
Pr(t) a1 a2 a3 m12 √ t1 1/8 1,1 0,0 0,0 √ t2 3/4 0,0 1,1 0,0 t2 1/8 0,0 0,0 1,1 −
m23 − √ √
Fig. 6. Game without Name
It may seem that messages which would not be sent by any type (after the first round or later) come out credible under this definition, which would not be a good prediction. (Thanks to Daniel Rothschild (p.c.) for pointing this out to me.) However, this is not quite right: we get into this predicament only for some versions of the ibr sequence, not for others. It all depends on how the receiver forms his counterfactual beliefs. If, for instance, we assume that R rationalizes observed behavior even if it surprises him, we can keep the definition unchanged: if no type whatsoever has an outstanding reason to send m, the receiver’s posterior beliefs after m will support any type. So, unless m is tautologous, it is incredible. Still, Rothschild’s criticism is appropriate: the definition of message credibility offered here is, in a sense, incomplete as long as we do not properly define the receiver’s counterfactual beliefs; something left for another occasion.
22
M. Franke
Proposition 1. Take a signaling game with T = A and US,R (·, t, t ) = c > 0 if t = t and 0 otherwise. Then δk (m) ⊆ [[m]] for all k and m. Proof. Clearly, δ0 (m) ⊆ [[m]] for arbitrary m. So assume that δk (m) ⊆ [[m]]. In t this case Sk+1 will use m only if t ∈ δk (m). But then t ∈ [[m]] and therefore δk+1 (m) ⊆ [[m]].
However, the ibr model does not guarantee generally that communication is credible even when preferences are perfectly aligned, i.e., US = UR . This may seem surprising at first, but is due naturally to the possibility of, what we could call, white lies: untruthful signaling that is beneficial for the receiver. These may occur if the set of available signals is not expressive enough. As an easy example, consider the game in Figure 5 where S t2 will use m3 untruthfully to induce action a2 , which, however, is best for both receiver and sender. To understand the central role of the tcp assumption in the present proposal, consider the game in Figure 6. Here, R0 has the following posterior beliefs: after hearing message m12 he rules out t3 and believes that t2 is three times as likely as t1 ; similarly, after hearing message m23 he rules out t1 and believes that t2 is three times as likely as t3 . Consequently, R0 responds to both signals with a2 . Now, S1t1 , for instance, does not care which message to choose from, as far as her expected utilities are concerned. But R1 nevertheless assumes that S1t1 speaks truthfully. It’s thanks to the tcp assumption that ibr predicts messages to be credible in this game. This game also shows a difference between the ibr model and Rabin’s model of credible communication [9], which superficially look very similar. Rabin’s model consists of two components: the first component is a definition of message credibility which is almost a two-step iteration of best responses starting from the semantic meaning; the second component is iterated strict dominance around a fixed core set of Rabin-credible messages being sent truthfully and believed. In particular, Rabin requires for m to be credible that m induces, when taken literally, exactly the set of all sender-best actions (from the set of actions that are inducible by some receiver belief) of all t ∈ [[m]]. This is defensible under the Rich Language Assumption, but both messages in the last considered game fail this requirement. Consequently, with no credible message to restrict iterated strict dominance, Rabin’s model predicts a total anything-goes for this game. This shows the limited applicability of approaches to message credibility that are inseparable from the Rich Language Assumption. The present notion of message credibility and the ibr model are not restricted in this sense and fare well with (partial) inexpressibility and the resulting inferences. To wrap up: as a solution concept, the epistemic ibr model offers, basically, a set of beliefs, viz., beliefs obtained under certain assumptions about the psychology of agents from a sequence of iterated best responses. I do not claim that this model is a reasonable model for human reasoning in general. Certainly, the simplifying assumption that players believe that they are facing a level-k opponent, and not possibly a level-l < k opponent, is highly implausible proportional to k, but especially so for agents that have, in a manner of speaking, already reasoned themselves through a circle multiple times. (It is easily verified that
Semantic Meaning and Pragmatic Inference in Non-cooperative Conversation
23
for finite M and T the ibr sequence always enters a circle after some k ∈ IN.)10 Still, I wish to defend that the ibr model does capture (our intuitions about) certain aspects of (idealized) linguistic behavior, namely pragmatic inference in cooperative and non-cooperative situations. Whether it is a plausible model of belief formation and reasoning in the envisaged linguistic situations is ultimately an empirical question. In conclusion, the ibr model offers a novel perspective on message credibility and the pragmatic inferences based on this notion. The model generalizes existing game-theoretical models of pragmatic inference by taking conflicting interests into account. It also generalizes game-theoretic accounts of credibility by giving up the Rich Language Assumption. The explicitly epistemic perspective on agents’ deliberation assigns a natural place to semantic meaning in cheap-talk signaling games as a focal starting point. It also highlights the unity in pragmatic inference: in this model both credibility-based inferences and implicatures are different outcomes of the same reasoning process.
References 1. Stahl, D.O., Wilson, P.W.: On players’ models of other players: Theory and experimental evidence. Games and Economic Behavior 10, 218–254 (1995) 2. Crawford, V.P.: Lying for strategic advantage: Rational and boundedly rational misrepresentation of intentions. American Economic Review 93(1), 133–149 (2003) 3. Camerer, C.F., Ho, T.H., Chong, J.K.: A cognitive hierarchy model of games. The Quarterly Journal of Economics 119(3), 861–898 (2004) 4. Stalnaker, R.: Saying and meaning, cheap talk and credibility. In: Benz, A., J¨ ager, G., van Rooij, R. (eds.) Game Theory and Pragmatics, pp. 83–100. Palgrave MacMillan, Basingstoke (2006) 5. J¨ ager, G.: Game dynamics connects semantics and pragmatics. In: Pietarinen, A.-V. (ed.) Game Theory and Linguistic Meaning, pp. 89–102. Elsevier, Amsterdam (2007) 6. Benz, A., van Rooij, R.: Optimal assertions and what they implicate. Topoi 26, 63–78 (2007) 7. Grice, P.H.: Studies in the Ways of Words. Harvard University Press, Cambridge (1989) 8. Farrell, J.: Meaning and credibility in cheap-talk games. Games and Economic Behavior 5, 514–531 (1993) 9. Rabin, M.: Communication between rational agents. Journal of Economic Theory 51, 144–170 (1990) 10. Katz, J.J.: Language and Other Abstract Objects. Basil Blackwell, Malden (1981) 11. Katzir, R.: Structurally-defined alternatives. Linguistics and Philosophy 30(6), 669–690 (2007) 10
It is tempting to assume that “looping reasoners” may have an Aha-Erlebnis and to extend the ibr sequence by transfinite induction assuming, for instance, that level-ω players best respond to the belief that the ibr sequence is circling. I do not know whether this is necessary and/or desirable for linguistic applications. We should keep in mind though that in some cases human reasoners may not get to the ideal level of reasoning in this model and in others they might even go beyond it.
24
M. Franke
12. Matthews, S.A., Okuno-Fujiwara, M., Postlewaite, A.: Refining cheap talk equilibria. Journal of Economic Theory 55, 247–273 (1991) 13. J¨ ager, G.: Game theory in semantics and pragmatics. Manuscript, University of Bielefeld (February 2008) 14. Ho, T.-H., Camerer, C., Weigelt, K.: Iterated dominance and iterated best response in experimental “p-beauty contests”. The American Economic Review 88(4), 947–969 (1998) 15. Crawford, V.P.: Let’s talk it over: Coordination via preplay communication with level-k thinking (Unpublished Manuscript) (2007) 16. Battigalli, P.: Rationalization in signaling games: Theory and applications. International Game Theory Review 8(1), 67–93 (2006)
What Makes a Knight? Stefan Wintein
1
Introduction
In Smullyan’s well known logic puzzles (see for instance [3]), the notion of a knight, which is a creature that always speaks the truth, plays an important role. Rabern and Rabern (in [2]) made the following observation with respect to knights. They noted that when a knight is asked (1), he gets into serious trouble. Is it the case that: your answer to this question is ‘no’ ?
(1)
Indeed, upon answering (1) with either ‘yes’ or ‘no’, the knight can be accused of lying. How then, does a knight respond to (1)? Rabern and Rabern (henceforth R&R) assume that the knight reacts to questions like (1) with an answer different from ‘yes’ and ‘no’; let’s say that this reaction consists of answering (1) with neither. R&R use their assumption of a third possible reaction to set up an argument with the following intriguing conclusion: it is possible to determine the value of a three valued variable x by asking a single question to a knight (who knows the value of x). R&R’s argument is given in natural language, and is not backed up by formal machinery which rigorously defines the criteria which determine the answer of a knight to an arbitrary question. In [6] we asked under what conditions the informal argument of R&R can be reconstructed as a (formally) valid piece of reasoning. We showed that, under the assumption that it is not allowed to ask questions to the knight in which the “neither predicate” occurs self-referentially, there is a natural notion of validity according to which the reasoning of R&R can be considered as valid1 . The ban on self-referential questions involving the neither predicate excludes that we ask a knight a question such as the following. Is it the case that: you answer this question with ‘neither’ or you answer it with ‘no’ ?
(2)
Questions like (2) cause problems for a certain conception of a knight. According to this conception, a knight reacts to σ by answering with ‘yes’, ‘no’ or ‘neither’ if and only if σ is, respectively, true, false or ungrounded2 . By exploiting wellknown “Strengthened Liar arguments”, a contradiction can be derived from this knight conception and the hypothesis that the knight answers (2) with either 1 2
Thanks to Reinhard Muskens for his comments on this work. Actually, we were concerned with an ungroundedness predicate, equating, in alethic terms, Liars and Truthtellers. By which we mean that it does not receive a classical truth value in Kripke’s Strong Kleene minimal fixed point valuation.
T. Icard, R. Muskens (Eds.): ESSLLI 2008/2009 Student Sessions, LNAI 6211, pp. 25–37, 2010. Springer-Verlag Berlin Heidelberg 2010
26
S. Wintein
‘yes’, ‘no’ or ‘neither’. In order to avoid the derivation of such contradictions, [6] formulated a notion of validity under which R&R’s argument is valid, but which banned self-referential constructions with the neither predicate. Interestingly, the formal reconstruction given in [6] of R&R’s informal argument differs from the intuitive conception of a knight that underlies their paper. For, as Brian Rabern explained in personal communication, it is allowed to ask questions like (2) to a knight; given such questions, a knight answers with ‘neither’, despite the fact that this reaction turns (2) into a true sentence. How then to characterize the process which determines the answer of a knight to an arbitrary question? In this paper, we will give a precise answer to that question by our definition of the knight function. According to our conception of a knight, a knight may react in four distinct ways3 to a question. Besides ‘yes’, ‘no’ and ‘neither’, we assume that a knight may also react with ‘both’. For instance, he will do so on the following question. Is it the case that: your answer to this question is ‘yes’ ?
(3)
Upon being asked (3), a knight may react by answering ‘yes’ or ‘no’ without being accused of lying. We will assume that when it is possible to answer a question σ both with ‘yes’ and ‘no’ in the sense alluded to, the knight will react to σ by answering ‘both’. Hence, we will distinguish between the semantic value of Liar like questions such as (1) and Truthteller like questions such as (3). Formally, this means that we will be concerned with the semantic values of Belnap’s famous t, f , b, n. For us, if a question σ has value logic which are contained in 4 t then the knight will answer it with ‘yes’ while a value of f implies that the knight will answer σ with ‘no’. Likewise, the semantic value n is associated with an answer of ‘neither’ while b is associated with answering ‘both’. In this paper, the questions that can be asked to a knight will be modeled as sentences of a language which has available four unary “answering” predicates, ‘T ’, ‘F ’, ‘N ’ and ‘B’, corresponding4 to an answer of ‘yes’, ‘no’, ‘neither’ and ‘both’ respectively. Importantly, in our interpretation of it is possible to create selfreference with respect to all four “answering” predicates. There is no, as there was in [6], ban on self-reference of any kind. The essence of this paper is to give a characterization of the knight function , i.e., the function which maps an arbitrary question (sentence) σ of , to the answer given to σ by a knight. Exploiting our function , we conclude the paper by showing a further interesting property of a knight; it is possible to determine the value of a four valued variable x by asking a single question to a knight (who knows the value of x). Structure of this paper. In Section 2 we will introduce the formal machinery of this paper. We will work with quantifier free languages which employ two kind 3 4
This is a major difference with [2] and [6], where knights are considered which have an answering repertoire of three distinct reactions. Of course, if the reader likes he may also adopt another (e.g. an alethic one) interpretation of the four predicates.
What Makes a Knight?
27
of constant symbols; quotational constant symbols, used for quoting sentences, and non quotational ones, which are used to generate circular and self-referential sentences. The language will be equipped with assertoric rules which are, formally, tableau rules for the assertoric sentences of , which are sentences that are signed with the symbols in A , D , A , D . Intuitively, Aσ indicates that it is possible to assert σ, while Dσ indicates that it is not possible to deny σ. In section 3 the assertoric rules of are used, in combination with techniques of assertoric semantics (see, e.g., [7], [8]), to define the knight function . In section 4 we apply the knight function to an interesting logic puzzle in the spirit of Smullyan. Section 5 gives conclusions.
2
Preliminaries
2.1
The Language
Ä
We will work with a language that is, in a sense, intermediate between propositional logic and first order logic. Let us explain in what sense. On the one hand, contains a set of propositional constant symbols P pn n N. On the other hand, our language contains four (unary) predicate symbols, ‘T ’, ‘F ’, ‘N ’ and ‘B’ . The set of non quotational constant symbols is given by C cn n N. The quotational constant symbols of are contained in the set Q . This set is defined together with Sen , the set of sentences of , as the smallest sets which satisfy the following three equations. c C, p P T c, F c, N c, B c, p Sen φ, ψ Sen φ, φ ψ , φ ψ Sen , φ Q φ Q T φ, F φ, N φ, B φ Sen 2.2
The Denotation Function π
A denotation function for is a function π C Q Sen which is such that for any φ Q , π φ φ. Thus, a denotation function maps the quotational constants to the associated sentences while the non quotational constants in C may refer to any sentence of . The non quotational constants can be used to create self-reference. In this paper, we assume that constants λ, τ C are always such that π λ T λ and π τ T τ . Hence T λ and T τ model5 the (Strengthened) Liar and the Truthteller respectively. 2.3
Factual Valuations and 4 Valuations
The elements of P are thought of as (possible) facts. A factual valuation for is a 1, 0, 0, 1. We will interpret ‘V p 1, 0’ as function V P 2, where 2
5
Strictly speaking, T λ models the following question. Is it the case that: your answer to this question is not ‘yes’ ? However, for sake of notational convenience we will denote T λ as the Liar. Similar remarks apply to T τ .
28
S. Wintein
‘p can be answered positively and p cannot be answered negatively by a knight’ while ‘V p 0, 1’ is interpreted as ‘p cannot be answered positively and p can be answered negatively, by a knight’. By a 4 valuation for we mean a function 1, 0, 0, 1, 1, 1, 0, 0. We will use t, f , b from the sentences of into 4 and n as abbreviations for 1, 0, 0, 1, 1, 1 and 0, 0 respectively. 2.4
Kripke Correctness
We say that a 4 valuation V is Kripke correct just in case V is compositional with respect to the following truth tables. tf t tf f f f nnf bbf
nb nb f f nf f b
tf nb ttt t t f tf nb ntnn t btb t b
π t T t t t f f n n b b
φ φ t f f t n n b b
π t F t t f f t n n b b
Thus, a 4 valuation V is Kripke correct just in case V is Strong Kleene compositional over 4 (i.e., it satisfies the truth tables displayed above for , , ) and V satisfies the Kripkean fixed point interpretation (see [1]) of a truth and falsity predicate (i.e., it satisfies the truth tables displayed above for T t and F t). Observe that the definition of Kripke correctness does not impose any explicit constraint on the valuation of sentences of form N t and B t. Sentences of form N t and B t will be discussed in 3.7. 2.5
Worlds
A world is the equivalent of a factual valuation. A world w is a set of assertoric sentences and the assertoric world wV that corresponds with factual valuation V is defined as follows: wV
Ap V p
1 Dp V p
The set of all assertoric sentences is denoted as
0
, where:
Xσy X A, D, y , , σ Sen
Alternatively, a world is defined as any w such that Xσy w implies that y and that σ P and such that, for any p P , we have that Ap w Dp w. The set of all worlds will be denoted by W .
What Makes a Knight?
2.6
29
Assertoric Rules
are basically rules of a tableau system signed with The assertoric rules of the elements A , D , A , D for Strong Kleene logic, augmented with rules for T, F, N and B. We follow [4] in distinguishing two types of rules; those of disjunctive type and those of conjunctive type . An assertoric rule associates an assertoric sentence Xσy with its set of immediate semantic sub sentences, Π Xσy . Depending on its type, an assertoric rule is depicted in either one of the following two ways: Xσy Π Xσy
Xσy
Π Xσy
(4)
Here are the assertoric rules for . In the rules for the four predicates, t is a quotational or non quotational constant, i.e., an arbitrary element of C Q .
A
D
A
D
Aαβ
Dαβ
Aαβ
Dαβ
Aα , Aβ
Dα , Dβ
Aαβ
Aα , Aβ
Dαβ
Dα , Dβ
Aαβ
Dαβ
Aα , Aβ
Dα , Dβ
Aα , Aβ
Dα , Dβ
Aα
Dα
Dα
Aα
Aα Dα
Dα Aα
AT
T t F t N t B t
Aπ Aπ
t
Aπ
t
AF
t
Dπ
t
AN
t
, Dπ t
AB
t
t
, Dπ t
DT
t
t
Dπ
t
DF
t
t
DN
t
Aπ
Aπ
Aπ
, Dπ t
DB
t
t
, Dπ t
AT
t
Aπ Aπ
t
Aπ
t
AF
t
Dπ
t
AN
t
, Dπ t
AB
t
t
, Dπ t
DT
t
Aπ Aπ
t
Dπ
t
DT
t
Aπ
t
DN
t
, Dπ t
DB
t
t
t
, Dπ t
The assertoric rules for , , , T and F are associated with a Kripkean Strong Kleene fixed point interpretation of . The (downwards reading of the) assertoric rule for AN t says that if you assert N t you must refuse to assert π t and also, you must refuse to deny π t. The other rules are explained along similar lines. Observe that only the assertoric rules for N and B bring the negative assertoric sentences (those of form Xσ ) in play and that a negative rule X is the dual of the positive rule X in a sense which is clear from the table of rules. When the set of immediate semantic sub sentences associated with a rule X y is
30
S. Wintein
a singleton, it does not matter which type, or , we attribute to the rule. The allotment of types displayed in the table was chosen for sake of symmetry.
3
Constructing a Knight Function
In this section, we will construct the knight function Sen W 4 which is interpreted as follows. For every σ , if σ, w t, respectively f , n or b, then the knight will answer σ in world w with, respectively, ‘yes’, ‘no’, ‘neither’ or ‘both’. 3.1
Assertoric Semantics and Games
In order to define we will adopt the framework of assertoric semantics as developed in ([7]). The function will be constructed in terms of the assertoric 4 valuation , which is a valuation of an assertoric game between two players, whose strategies consist of associating immediate semantic sub sentences with assertoric sentences of type and respectively. Player wins the assertoric game for Xσy just in case he can ensure an open outcome of the game. The closure conditions by which we judge outcomes to be open and closed are discussed below. The function will report, for each sentence σ, whether or not player wins the game for Aσ and also, whether or not he wins the game for Dσ . For instance σ, w 1, 0 tells us that player wins the game for Aσ and that he looses the game for Dσ . In 3.2 until 3.7 we will be concerned with . With at hand, the definition of is easily obtained, as will be discussed in 3.8. 3.2
Expansions
Let S . A function f S is said to respect the immediate semantic sub sentence structure, denoted resf , just in case f assigns to each Xσy S an immediate semantic sub sentence of Xσy . That is: resf
X
y σ
S f Xσy Π Xσy
(5)
With and denoting the sets consisting of all assertoric sentences of type and respectively, we define the sets , and as follows:
f f
, resf
g g
, resg
h h , resh For each h and Xσy , the expansion of Xσy according to h, hX
y σ
by letting, for each n N: hXσy 0
Xσy ,
hXσy n 1
hhXσy n
, is defined
What Makes a Knight?
3.3
31
Outcomes
Observe that a pair f, g induces a function h f g . For any f, g and Xσy , we say that the sequence outXσy , f, g hXσy n n N , where h f g, is the outcome of the interrogation for Xσy in which player plays strategy f and in which player plays strategy g. An outcome out is either open or closed in a world w, which we will denote as Ow out and Cw out respectively. Before we discuss the closure conditions for outcomes, we first describe how the closure conditions for outcomes give rise to closure conditions for assertoric sentences and how the latter conditions can be used to induce a 4 valuation V . 3.4
Inducing a 4 Valuation via Closure Conditions
An assertoric sentence Xσy is said to be open in world w, denoted Ow Xσy just in case player can ensure an open outcome in the game for Xσy . Hence, we have that: Ow Xσy
f g O
y w outXσ , f, g
(6)
When Xσy is not open in a world w, it is closed in that world, denoted Cw Xσy . We are interested, for reasons given below, in closure conditions for assertoric sentences which validate the assertoric rules of . Closure conditions validate the assertoric rules of just in case we have, for each Xσy and each w W , that: Ow Yφz for all Yφz Π Xσy . (7) Xσy is of type : Ow Xσy
Xσy is of type : Ow Xσy
O
z w Yφ
for some Yφz Π Xσy .
Closure conditions induce a 4 valuation V as follows, where VX Sen is the projector function of V on its X coordinate. VA σ, w
1
O
V σ, w
w Aσ ,
x, y
VD σ, w
V
A σ, w
1
O
x, VD σ, w
y
Proof: By an inspection of the assertoric rules, left to the reader. The Cyclical Closure Conditions and
0, 1
w Dσ
Proposition 1. Closure conditions which validate the assertoric rules of duce a 4 valuation V which is Kripke correct.
3.5
(8)
(9) in
Î
The essential element involved in the definition of closure conditions which valis the notion of a cycle. The rationale of our idate the assertoric rules of definition of closure conditions in terms of cycles will be revealed in 3.6. In order to define the notion of a cycle, we enlarge the set of assertoric rules by adding
32
S. Wintein
the (trivial) rules for elements of P : for each p P , we let Π Xpy Xpy . A cycle is a finite sequence of assertoric sentences such that 1) each term of the sequence, except the first, is an immediate sub sentence of its predecessor and 2) the first term is an immediate sub sentence of the last term. The addition of the assertoric rules for elements of P ensures that, for each p P , Xpy is a cycle. With π λ T λ, and π γ N γ , other examples of cycles are AT λ , DT λ , DT λ , AT λ and AN γ , AN γ . There are three types of cycles. A positive cycle is a cycle of which all the terms are positive, whereas in a negative cycle each term is negative and a mixed cycle has both positive and is a cycle, its inverse, denoted as 1 is obtained by negative terms. When performing a “charge swap” (from into or from into ) on each element of . A cycle is either vicious or virtuous relative to a world w.
(a)
is positive, we say that 1. If following two conditions holds:
is vicious in w just in case of one the
Xp and Xp w. (b) For some σ Sen , both Aσ and Dσ are terms of
.
2. If is negative, we say that is vicious in w just in case 1 is virtuous in w. 3. If is mixed, we say that is vicious in w just in case for some X A, D, σ Sen , both Xσ and Xσ are terms of .
An outcome outXσy , f, g is a sequence and the set of terms of an outcome is either of finite or of denumerably infinite cardinality. If the set of terms of an outcome out is finite, then there is a first term in out, say Xσy with index n, for which there exists a term Yβz of out with index m n such that Yβz is an immediate sub sentence of Xσy . We call such a term Xσy a stopping term. The sub sequence of out which consists of the terms up to an including the stopping term is called the initial sequence of out. The initial sequence of an outcome contains a (unique) finite cycle whose last term is the stopping term of the outcome. Call this cycle the measurement cycle of the outcome. An outcome which involves infinitely many terms does not contain a cycle and hence it does not contain a measurement cycle. We semantically valuate an outcome by judging it to be open or closed, based on the “moral” character of its measurement cycle. According to the cyclical closure conditions, an outcome out is (cyclically) closed in a world w, denoted Cw out just in case one of the following two conditions holds: 1. The set of terms of out is infinite and there is a term with index n such that any term with index n, has negative charge, i.e., is of form Xσ . 2. The set of terms of out is finite and the measurement cycle of out is vicious in w. By instantiating schema (9) with the cyclical closure conditions, we obtain the 4 valuation .
What Makes a Knight?
3.6
33
The Compositionality Condition
An important property of the cyclical closure conditions is that they satisfy the compositionality condition Comp. Comp Cw hXαy n n0
C
y n
w hXα n1
The fact that the cyclical closure conditions satisfy Comp implies that along each expansion hXαy of Xαy , closure, i.e. closedness and openess, is preserved along the expansion. That the cyclical closure conditions satisfy Comp is easily seen by an inspection of those conditions. In fact, the cyclical closure conditions have been devised to satisfy Comp, as from Comp we can prove that the cyclical closure conditions validate the assertoric rules. Before we do so, we define Π Xσy as the set consisting of all semantic sub sentences of Xσy . Thus, Π Xσy is obtained by taking the transitive closure of the immediate semantic sub sentence relation. Proposition 2. The cyclical closure conditions validate the assertoric rules of . Proof: We illustrate for Aαβ . Other cases are similar and left to the reader. Suppose that Aαβ is open (in w). Then there is a strategy f such that for all g , the expansion hAαβ , where h f g, is open. Now Aαβ is of type , and the strategies of player can be bi-partitioned into strategies gα , which have g Aαβ Aα and strategies of type gβ , which have g Aαβ Aβ . As f results in an open outcome, no matter whether player plays a strategy of type gα or gβ , it follows that f is such that for all g , we have that Ow outAα , f , g and that Ow outAβ , f , g . Hence, Aα is open and Aβ is open. Suppose that Aα is open and that Aβ is open. This means that there exists a strategy fα such that for all g we have that Ow outAα , fα , g and that there exists a strategy fβ such that for all g we have that Ow outAβ , fβ , g . Let f be any strategy which satisfies:
- Xσy Π Aα , type of Xσy is f Xσy fα Xσy - Xσy Π Aβ Π Aα , type of Xσy is f Xσy
fβ Xσy
From Comp it follows that the constructed f is such that for all g we have that Ow outAαβ , f, g . Proposition 3.
is Kripke correct.
Proof: From Proposition 2 and Proposition 1. 3.7
The Predicates N and B
Proposition 3 tells us that is Kripke correct. However, the notion of Kripke correctness does not tell us anything about the relation between a sentence σ
34
S. Wintein
and a sentence which says of σ that it is neither, respectively both. Let us make some remarks with respect to this relation. As a corollary from Proposition 3, we see that is Truth Correct (T C) with respect to the values t and f . By this, we mean that:
T t, w
t
πt, w
t,
F t, w
t
πt, w
f
Thus, as is T C with respect to t and f , a sentence which says of σ that respectively, σ is true or σ is false, is true just in case σ is true, respectively false. is not T C with respect to the values n and b however, as we can find a denotation function π such that (for some t and any w):
πt, w
n
N t, w
t
(10)
For instance, consider the sentence N η T η , where π η N η T η . By drawing the relevant assertoric expansions of Aπ η and Dπ η , we see that for any world w, V nπ η , w n and N η , w n. Hence, we have a sentence, π η , with semantic value n while N η , a sentence which says that π η has semantic value n, is not true. Thus, is not T C with respect to n. Neither is Falsity Correct (F C) with respect to t and f , for we can find a denotation function π such that (for some t and any w):
πt, w t
T t, w
f
πt, w f
F t, w
f
For instance, the Liar has value n, but so does a sentence which says of the Liar that it is true. Space and time preclude us from continuing our discussion of the interesting notions of Truth and Falsity Correctness. We now turn to the definition of the knight function , which is based on . 3.8
The Knight Function
Ã
In our construction of the knight function, we will make an assumption with respect to the denotation function π. First, we define the language C , which contains the same logical symbolism as , except for the fact that it contains no quotational constant symbols. A denotation function π C Q Sen is said to be C closed just in case for any c C we have that π c Sen C . Thus, if π is C closed it is, for instance, forbidden that π c T T λ, where c C. The assumption of C closedness may be justified as follows. A question to a knight can be addressed in two senses. In a self-referential sense, for instance by letting π c T c p and by asking T c p, i.e., “you will not answer ‘yes’ to this question or p is the case” or in a detached sense, by asking T T c p, i.e., “you will answer ‘yes’ when you are asked the quoted question”. With an C closed denotational function, one can ask a sentence of the first kind by constructing a sentence of C , while a sentence of the second kind, and mixed sentences as T λ T T τ , can be asked by constructing a sentence in Sen Sen C . Thus, in a sense the fact that we can’t have π c T T λ in an C closed valuation function is no real restriction. If
What Makes a Knight?
35
by asking T c we mean asking the self-referential question ‘this sentence is not true’, one can take T λ. If by asking T c we mean asking something about the Liar in a detached sense, we can do so by asking T T λ. Now it may very well be possible to take care of the two senses, detached and self-referential, in the presence of an arbitrary dentation function. In such an approach, the structure of the denotation function itself will determine, for each sentence of , whether it is of self-referential or of detached type. However, for sake of simplicity we will work with an C closed denotation function. That being said, the knight function is defined as follows. We first define the atomic knight function 0 , where
0 P
T c c C F c c C N c c C B c c C
4,
is defined as follows. For each σ in the domain of 0 , we let 0 σ, w σ, w. Next, we define the knight function as the recursive extension of 0 according to the truth tables for , , and the following truth tables for sentences of form T σ , F σ , N σ and B σ . σ T σ t t f f n f b f
σ F σ t f t f n f b f
σ N σ t f f f n t b f
σ B σ t f f f n f b t
A quotational ascription of a semantic value to an atomic sentence in the domain of 0 , is a reflection on the outcome of the downwards expansion process, i.e. a reflection on the values of and is, as illustrated by the truth tables for the semantic predicates, a classical reflection in the sense that all atomic sentences of form T σ , F σ , N σ or B σ have a classical truth value. For instance, we have that T λ, w n and that T T λ, w f and so T T λ, w t. In a sense, we cannot really say that the Liar is not true, while in another sense we can. The two senses alluded to are reflected in ’s valuation of T λ as n and of T T λ as t. The knight function is, from the perspective of formal theories of truth, an interesting function whose formal structure and (alethic) interpretation deserves attention. However, the alethic interpretation of is not the topic of this paper and neither is it the topic of this paper to give a thorough formal analysis of . What is the topic of this paper is the construction of a plausible knight function, which is, so we claim, given by , and an application of this knight function to solve an interesting logic puzzle. It is to this application that we now turn.
4 4.1
Solving Logic Puzzles with
The Three Roads Riddle
Here is the the three roads riddle, which is a version of the riddle presented in ([2]) that was mentioned in the introduction of this paper. Suppose that you are
36
S. Wintein
at a junction point of three roads, the left, right and middle road say. One of the roads is a good road, the other two are bad ones and for each road, you have no clue as to whether it is good or bad. At the junction point, there is a knight who knows which road is good. Can you find out which road is good by asking a single yes-no question to the knight? R&R show that you can, by asking the following question to the knight: Is it the case that: (you will answer this question with ‘no’ and the left road is good) or the middle road is the good? R&R argue, in natural language using reductio ad absurdum, that an answer of ‘yes’ indicates that the middle road is good, an answer of ‘no’ indicates that the right road is good and that an explosion indicates that the left road is the good one. Our formal reconstruction of the question reflects the distribution of answers to the three possible “road situations”. We let pL , pM , pR P stand for the propositions that the left, respectively middle and right, road is good. The question of R&R is formally modeled as π c, where c C is such that π c F c pL pM . That π c is a solution to the riddle is illustrated by observing that, with ‘ ’ the double material implication which is defined as usual, for any world w we have that:
1. 2. 3. 4.2
πc, w πc, w πc, w
t f n
A A A
T πc pM , w F πc pR , w N πc pL , w
w pR w pL w
pM
t t t
The Four Roads Riddle
With the present formalism at hand, it is not hard to see that we can also solve the four roads riddle, which is just like the three roads riddle except for the fact that at the junction point, there are four roads, an east, west, north and south road. We will use pE , pW , pN , pS to denote the proposition that respectively, the east, west, north or south road is good. As demonstrated in ([5]) and ([8]) in detail, we can find out which of the four roads is good by asking the following question to the knight. θ T λ pE T τ pW pN The intuitive counterpart of θ is that we ask the knight: Is it the case that: ( your answer to this question is not ‘no’ and east is good) or (your answer to this question is ‘yes’ and west is good) or north is good? where an occurrence of ‘this question’ refers to the question that is on the same horizontal line as that occurrence of ‘this question’. It is left to the reader to verify, which he can do by drawing the possible assertoric expansions of Aθ and Dθ , that indeed question θ does the job, as we have that: 1. 2. 3. 4.
θ, w θ, w θ, w θ, w
t f n b
A A A A
w w pE w pW w
pN pS
T θ pN , w t F πθ pS , w t N πθ pE , w t B πθ pW , w t
What Makes a Knight?
5
37
Conclusion
Using the framework of assertoric semantics, we extended Smullyan’s notion of a (classical) knight so that the notion is also defined for non classical languages in which we have the power to talk about all four possible answers of like a knight, via the predicates ‘T ’, ‘F ’, ‘B’ and ‘N ’, and in which we may ask arbitrary self referential questions to the knight involving these predicates. We studied a notion of a knight with two non classical answers, ‘neither’ and ‘both’ and we used this conception to formulate a new Smullyan like riddle, which we called the four roads riddle, and we showed that it can be solved in 1 question by this paper’s notion of a knight. Lots of interesting philosophical and technical questions with respect to the function have been left untouched by this paper. Philosophically, I take it that the most interesting question is what (the assertoric interpretation) of tells us about our notion of truth. This paper is not the place to discuss such matters; let me just remark that I am convinced that can be invoked to tell an anti-deflationary story about truth. Let us conclude this paper with another Smullyan like riddle, which adds a twist to the four roads puzzle. Suppose you are at the junction point with the four roads and that there are also two creatures present, a knight and a knave (an inverse knight; someone who always lies). Again, you want to take the good road and you have no idea which of the four roads is good. The knight and the knave know which road is good and you can query them. However, you do not know which of the two creatures is the knight and which is the knave and you are only allowed to ask a single question. Can you find a question that allows you to take the good road with certainty?
References [1] Kripke, S.: Outline of a theory of truth. Journal of Philosophical Logic 72, 690–716 (1975) [2] Rabern, B., Rabern, L.: A simple solution to the hardest logic puzzle ever. Analysis 68, 105–112 (2008) [3] Smullyan, R.: The Lady or the Tiger. Pelican Books (1983) [4] Smullyan, R.: First-order Logic. Dover, New York (1995) [5] Wintein, S.: Computing with self-reference. In: Proceedings of the A.G.P.C. 09 conference, available online (2009) [6] Wintein, S.: On languages that contain their own ungroundedness predicate. To appear in: Logique et Analyse, also available in the online proceedings of ESSLLI’ 09 (2009) [7] Wintein, S.: Assertoric alethic semantics and the six cornerstones of truth (submitted) (2010) [8] Wintein, S.: Assertoric semantics and the computational power of self-referential truth (submitted) (2010)
The Algebraic Structure of Amounts: Evidence from Comparatives Daniel Lassiter New York University
Abstract. Heim [9] notes certain restrictions on quantifier intervention in comparatives and proposes an LF-constraint to account for this. I show that these restrictions are identical to constraints on intervention in wh-questions widely discussed under the heading of weak islands. I also show that Heim’s proposal is too restrictive: existential quantifiers can intervene. Both of these facts follow from the algebraic semantic theory of weak islands in Szabolcsi & Zwarts [25], which assigns different algebraic structures to amounts and counting expressions. This theory also makes novel predictions about the interaction of degree operators with conjunction and disjunction, which I show to be correct. Issues involving modal interveners [9], interval semantics for degrees [23,1], and density [4] are also considered. Keywords: Comparatives, weak islands, degrees, algebraic semantics, quantification, disjunction.
1
Introduction
1.1
Two Puzzles
Consider the sentence in (1): (1) Every girl is less angry than Larry is. A prominent theory of comparatives, associated with e.g. von Stechow [28] and Heim [9], predicts that (1) should be ambiguous between two readings, the first equivalent to (2a) and the second equivalent to (2b). (2)
a. For every girl, she is less angry than Larry is. b. The least angry girl is less angry than Larry is.
But as Kennedy [11] and Heim [9] note, the predicted reading in (2b) is absent: instead, (1) is unambiguously false if there is any girl who is angrier than Larry. The second puzzle involves the relationship between (3) and (4). (3) John is richer than his father was or his son will be. a. OK if John has $1 million, his father has $1,000, and his son has $10,000. T. Icard, R. Muskens (Eds.): ESSLLI 2008/2009 Student Sessions, LNAI 6211, pp. 38–56, 2010. c Springer-Verlag Berlin Heidelberg 2010
The Algebraic Structure of Amounts: Evidence from Comparatives
39
b. OK if John has $10,000, his father has $1,000, and his son has $1 million (e.g., with continuation “... but I’m not sure which one”). (4) John is richer than his father was and his son will be. a. OK if John has $1 million, his father has $1,000, and his son has $10,000. b. * if John has $10,000, his father has $1,000, and his son has $1 million. (3) is ambiguous between (3a) and (3b), while (4) is unambiguous. In other words, both (3) and (4) can be read as meaning that John is richer than the richer of his father and his son, but only (3) can be read as meaning that he is richer than the poorer of the two. This is also a problem for theories of comparatives that follow von Stechow’s and Heim’s assumptions, because they predict that (3) and (4) should both be ambiguous, and have the same two readings. Sentences like (3) have been taken as evidence that or is lexically ambiguous between the standard logical disjunction and an NPI interpreted as conjunction [23]. I will argue, first, that the ambiguity of (3) is a matter of scope rather than lexical ambiguity; and, second, that the fact that (4) does not display a similar ambiguity is explained by the same principles that rule out the unavailable reading of (1) that is paraphrased in (2b). 1.2
The Plan
Heim [9] states an empirical generalization that quantificational DPs may not intervene scopally between a comparative operator and its trace, essentially to explain the absence of reading (2b) of (1). She suggests that this can be accounted for by an intervention constraint at LF. I demonstrate that this generalization holds with some but not all quantifiers, and that the limitations and their exceptions match quite closely the distribution of interveners in weak islands. Following a brief suggestion in Szabolcsi [24], I show that the algebraic semantic theory of weak islands in Szabolcsi & Zwarts [25] predicts the observed restrictions on comparative scope without further stipulation, as well as the exceptions. This extension of the algebraic account to comparatives also predicts the existence of maximum readings of wh-questions and comparatives with existential quantifiers, and I show that this prediction is correct. In addition, it accounts for the asymmetry between conjunction and disjunction in the comparative complement that we saw in (3) and (4). Certain strong modals appear to provide counter-examples; however, I suggest that the problem lies not in the semantics of comparison but in the analysis of modals. A great deal of work has been done since 1993 on both comparative scope and weak islands, and we might suspect that the problem discussed here can be avoided by adopting one of these more recent proposals. In the penultimate section I survey two influential accounts of the semantics of degree, one relating to comparatives [23] and the other to negative islands [4]. I suggest that neither
40
D. Lassiter
of these proposals accounts for the data at issue, but both are compatible with, and in need of, a solution along the lines proposed here.
2
Comparatives and Weak Islands
2.1
Preliminaries
Suppose, following Heim [9], that gradable adjectives like angry denote relations between individuals and degrees. (5) angry = λdλx[angry (d)(x)] [angry(d)(x)] can be read as “x’s degree of anger is (at least) d”. As the presence of “at least” in the previous sentence suggests, we also assume with Heim that expressions of degree in natural language follow the monotonicity condition in (6). (6) A function f
is monotone iff: ∀x∀d∀d [(f (d)(x) ∧ d < d) → f (d )(x)] (5) is not uncontroversial, but it is a reasonably standard analysis, and we will examine its relationship to some alternative accounts in the penultimate section. (6) is required to accommodate, e.g., the acceptability of true but underinformative answers to questions: Is your son three feet tall? Yes – in fact, he is four feet tall. I also assume, following von Stechow [28] and Heim [9], that more/-er comparatives are evaluated by examining the maxima of two sets of degrees: (7) max = λD ιd∀d [D(d ) → d ≥ d ] More takes two sets of degrees as arguments and returns 1 iff the maximum of the second (the main clause) is greater than the maximum of the first (the than-clause). Less does the same, replacing except that the ordering is reversed. (8)
a. more/-er = λDd,t λD d,t [max(D ) > max(D)] b. less = λDd,t λD d,t [max(D ) < max(D)]
Finally, we assume that more/-er forms a constituent with the than-clause to the exclusion of the adjective, and that typical cases of (at least clausal) comparatives involve ellipsis, so that Larry is angrier than Bill is = Larry is angry [-er than Bill is angry]. 2.2
Quantificational Interveners and the Heim-Kennedy Constraint
Once we consider quantifiers, the treatment of gradable adjectives and comparatives outlined briefly in the previous subsection immediately generates the puzzle in (1)-(2). To see this, note first that, in order for more/-er to get its second argument, it must have undergone QR above the main clause. The difference between the two predicted readings depends on whether the quantifier every girl
The Algebraic Structure of Amounts: Evidence from Comparatives
41
raises before or after the comparative clause. So, for example, Heim would assign the first reading of (1) the LF in (9a). The alternative reading is generated when the comparative clause raises to a position higher than the quantifier every girl. (9) Every girl is less angry than Larry is. a. Direct scope: every girl > less > d-angry ∀x[girl(x) → [max(λd.angry(d)(x))] < max(λd.angry(d)(Larry))] “For every girl x, Larry is angrier than she is.” b. Scope-splitting: less > every girl > d-tall max(λd.∀x[girl(x) → angry(d)(x)]) < max(λd.angry(d)(Larry)) * “Larry’s max degree of anger exceeds the greatest degree to which every girl is angry (i.e., he is angrier than the least angry girl).” If (9) had the“scope-splitting” reading in (9b), it would be true (on this reading) if the least angry girl is less angry than Larry. However, (9) is clearly false if any girl is angrier than Larry. Heim [9] suggests that the unavailability of (9b) and related data can be treated as a LF-constraint along the lines of (10) (cf. [26]): (10) Heim-Kennedy Constraint (HK): A quantificational DP may not intervene between a degree operator and its trace. The proposed constraint (10) attempts to account for the unavailability of (9b) (and similar facts with different quantifiers) by stipulating that the quantificational DP every girl may not intervene between the degree operator less and its trace d-tall. The puzzle is what syntactic or semantic principles explain this constraint given that structures such as (9b) are semantically unexceptionable on our assumptions. 2.3
Similarities between Weak Islands and Comparative Scope
As Rullmann [21] and Hackl [7] note, there are considerable similarities between the limitations on the scope of the comparative operator and the core facts of weak islands disussed by Kroch [14] and Rizzi [20], among many others. Rullmann notes the following patterns: (11)
a. b. c. d. e.
I I I I I
wonder wonder wonder wonder wonder
how how how how how
tall tall tall tall tall
Marcus is / # isn’t. this player is / # no player is. every player is / # few players are. most players are / # fewer than ten players are. many players / # at most ten players are.
(12)
a. b. c. d. e.
Marcus is taller than Lou is / # isn’t. Marcus is taller than this player is / # no player is. Marcus is taller than every player is / # few players are. Marcus is taller than most players are / # fewer than ten players are. Marcus is taller than many players are / # at most ten players are.
42
D. Lassiter
These similarities are impressive enough to suggest that a theory of the weak island facts in (11) should also account for the limitations on comparatives in (12). Rullmann suggests that the unavailability of the relevant examples in (11) and (12) is due to semantic, rather than syntactic, facts. Specifically, both whquestions and comparatives make use of a maximality operation, roughly as in (13): (13)
a. I wonder how tall Marcus is. I wonder: what is the degree d s.t. d = max(λd.Marcus is d-tall)? b. Marcus is taller than Lou is. (ιd[d = max(λd.Marcus is d-tall)]) > (ιd[d = max(λd.Lou is dtall)])
With these interpretations of comparatives and questions, we predict that the sentences in (14) should be ill-formed because each contains an undefined description: (14)
a. # I wonder how tall Marcus isn’t. I wonder: what is the degree d s.t. d = max(λd . Marcus is not d-tall)? b. # Marcus is taller than Lou isn’t. (max(λd.Marcus is d-tall) > (max(λd. Lou is not d-tall))
If degrees of height are arranged on a scale from zero to infinity, there can be no maximal degree d such that Marcus or Lou is not d-tall. However, the similarities between comparatives and weak island-sensitive expressions such as how tall go deeper than Rullmann’s discussion would indicate. S&Z point out that several of the acceptable examples in (11) do not have all the readings predicted by the logically possible orderings of every player and how tall. As it turns out, the same scopal orders are also missing in the corresponding comparatives when we substitute -er for how tall. For example, (15) I wonder how tall every player is. a. every player > how tall > d-tall “For every player x, I wonder: what is the max degree d s.t. x is d-tall?” b. how tall > every player > d-tall “I wonder: what is the degree d s.t. d = max(λd. every player is d-tall)?” A complete answer to (15) would involve listing all the players and their heights. In contrast, an appropriate response to (15b) would be to intersect the heights of all the players and give the maximum of this set, i.e. to give the height of the shortest player. This second reading is clearly not available. In fact, although S&Z and Rullmann do not notice, similar facts hold for the corresponding comparative:
The Algebraic Structure of Amounts: Evidence from Comparatives
43
(16) Marcus is taller than every player is. a. every player > -er > d-tall “For every player, Marcus is taller than he is.” b. -er > every player > d-tall # “Marcus’ max height is greater than the max height s.t. every player is that tall, i.e. he is taller than the shortest player.” Rullmann’s explanation does not exclude the unacceptable (16b): unlike comparatives with an intervening negation, there is a maximal degree d s.t. every player is d-tall on Rullmann’s assumptions, namely the height of the shortest player. Note in addition that (16) is identical in terms of scope possibilities to our original comparative scope-splitting example in (1)/(9), although its syntax is considerably different. Like (9), (16) falls under Heim’s proposed LF-constraint (10), which correctly predicts the unavailability of (16b).1
3
Comparative Scope and the Algebraic Structure of Amounts
3.1
Szabolcsi & Zwarts’ (1993) Theory of Weak Islands
Like Rullmann [21], S&Z argue that no syntactic generalization can account for the full range of weak islands, and propose to account for them in semantic terms. They formulate their basic claim as follows: (17) Weak island violations come about when an extracted phrase should take scope over some intervener but is unable to. S&Z explicate this claim in algebraic terms, arguing that weak islands can be understood if we pay attention to the operations that particular quantificational elements are associated with. For instance, (18) Universal quantification involves taking intersections (technically, meets). Existential quantification involves taking unions (technically, joins). Negation involves taking complements. (18) becomes important once we assign algebraic structures as denotations to types of objects, since not all algebraic operations are defined for all structures. The prediction is that a sentence will be semantically unacceptable, even if it can be derived syntactically, if computing it requires performing an operation on a structure for which the operation is not defined. S&Z illustrate this claim with the verb behave: 1
We might think to appeal to the fact that comparative clauses are islands to extraction in order to account for the missing readings in (16), but this would give the wrong results: it would rule out (16a) instead of (16b).
44
(19)
D. Lassiter
a. How did John behave? b. *How didn’t John behave? c. How did everyone behave? i. For each person, tell me: how did he behave? ii. *What was the behavior exhibited by everyone?
Behave requires a complement that denotes a manner. S&Z suggest that manners denote in a free join semilattice, as Landman [15] does for masses. [abc]
(20) Free join semilattice [ab]
[ac]
[bc]
[a]
[b]
[c]
A noteworthy property of (20) is that it is closed under union, but not under complement or intersection. For instance, the union (technically, join) of [a] with [b⊕c] is [a⊕b⊕c], but the intersection (meet) of [a] with [b⊕c] is not defined. The linguistic relevance of this observation is that it corresponds to our intuitions of appropriate answers to questions about behavior. In S&Z’s example, suppose that three people displayed the following behaviors: (21) John behaved kindly and stupidly. Mary behaved rudely and stupidly. Jim behaved loudly and stupidly. If someone were to ask: “How did everyone behave?”, interpreted with how taking wide scope as in (19c-ii), it would not be sufficient to answer “stupidly”. The explanation for this, according to S&Z, is that computing the answer to this question on the relevant reading would require intersecting the manners in which John, Mary and Jim behaved, but intersection is not defined on (20). This, then, is an example of when “an extracted phrase should take scope over some intervener but is unable to”. Similarly, (19b) is unacceptable because complement is not defined on (20). Extending this account to amounts is slightly trickier, since amounts seem to come in two forms. In the first, which S&Z label “counting-conscious”, whexpressions are able to take scope over universal quantifiers. S&Z imagine a situation in which a swimming team is allowed to take a break when everyone has swum 50 laps. In this situation it would be possible to ask: (22) [At least] How many laps has every swimmer covered by now? If the number of laps covered by the slowest swimmer is a possible answer, then counting-conscious amount expressions had better denote in a structure in which
The Algebraic Structure of Amounts: Evidence from Comparatives
45
intersection is defined. 2 The lattice in (23) – essentially the structure normally assumed for all degrees – seems to be an appropriate choice. (23) Lattice
Intersection and union are defined in this structure, though complement is not. This analysis predicts correctly that how many/much should be able to take scope over existential quantification but not negation. (24)
3.2
a. How many laps has at least one swimmer covered by now? [Answer: the number of laps covered by the fastest swimmer.] b. *How many laps hasn’t John covered by now? Extending the Account to Comparatives
Many authors assume that amounts always denote in (23) or some other structure D, ≤ for D ⊆ R. The problem we began with — why can’t Every girl is less tall than John mean “The shortest girl is shorter than John”? — relied tacitly on this assumption, which entails that it should be possible to intersect sets of degrees. I would like to suggest an alternative: heights and similar amounts do not denote in (23), but in a poorer structure for which intersection is not defined, as S&Z claim for island-sensitive amount wh-expressions. As S&Z note, such a structure is motivated already by the existence of non-counting-conscious amount wh-expressions which are sensitive to a wider variety of interveners than how many was in the examples in (22) and (24). This is clear, for example, with amounts that are not associated with canonical measures: (25) How much pain did every student endure? a. “For every student, how much pain did he/she endure?” b. * “What is the greatest amount of pain s.t. every student endured that much, i.e. how much was endured by the one who endured the least?” The unacceptability of (25b) is surprising given that the degree expression is able to take wide scope in the overtly similar (22). S&Z argue that, unless counting is involved, amount expressions denote in a join semilattice: 2
As Anna Szabolcsi (p.c.) suggests, the use of canonical measures like laps, kilos, meters may help bring out this reading.
46
D. Lassiter
(26) Join semilattice
[a + b + c + d] (= 4) [a + b + c] (= 3)
[a + b] (= 2) [a] (= 1)
[d]
[c]
[b]
(26) should be seen as a structure collecting arbitrary unit-sized bits of stuff, abstracting away from their real-world identity, like adding cups of milk to a recipe (S&Z pp.247-8). An important formal property of (26) is that “if p is a proper part of q, there is some part of q (the witness) that does not overlap with p” (p.247). As a result, intersection is not defined unless the objects intersected are identical. S&Z claim that this fact is sufficient to explain the unavailability of (25b), since the heights of the various students, being elements of (26), cannot be intersected. This explanation for the unavailability of reading (25b) relies on a quite general proposal about the structure of amounts. As a result, it predicts that amount-denoting expressions should show similar behavior wherever they appear in natural language, and not only in wh-expressions. The similarities between amount-denoting wh-expressions and comparatives, then, are explained in a most straightforward way: certain operations are not defined on amount-denoting expressions because of the algebraic structure of their denotations, regardless of the other details of the expressions they are embedded in. So, returning to (9), (27) Every girl is less angry than Larry is. Scope-splitting: less > every girl > d-tall max(λd.angry(Larry)(d)) > max(λd.∀x[girl(x) → angry(x)(d)]) * “Larry’s max degree of anger exceeds the greatest degree to which every girl is angry (i.e., he is angrier than the least angry girl).” This interpretation is not available because, in the normal case, max(λd.∀x[girl(x) → tall(x)(d)]) is undefined. I conclude that the puzzle described by the HeimKennedy constraint was not a problem about the scope of a particular type of operator, but was generated by incorrect assumptions about the nature of amounts. Amounts are not simply points on a scale, but elements of (26). This proposal is independently motivated in S&Z, and it explains the restrictions captured in HK as well as other similarities between comparatives and weak islands. A small but important exception to this generalization is that, since A∩A = A for any A, intersections are defined on (26) in the special case in which all individuals in the domain of quantification are mapped to the same point of the join semilattice. The prediction, then, is that the scope-splitting readings of (25) and (27) should emerge just in case it is assumed that all individuals have the property in question to the same degree. And indeed, S&Z note that there is a third reading of sentences like (25) which presupposes that the girls are all
The Algebraic Structure of Amounts: Evidence from Comparatives
47
equally angry (see also Abrus´an [1] for discussion).3 Though S&Z do not make this connection, it seems that their theory makes another correct prediction in this domain: universal quantifiers can intervene when this presupposition is appropriate.4 3.3
Maximum Readings of Existential Quantifiers and Their Kin
The crucial difference between join semilattices (26) and chains (23) is that the latter is closed under intersection while the former is not. However, both are closed under union, which corresponds to existential quantification. Here the predictions of the present theory diverge from those of the LF-constraint in (10): on our theory, existential quantifiers and other quantifiers which are computed using only unions should be acceptable as interveners in both comparatives and amount wh-questions. (10), in contrast, forbids all quantificational DPs from intervening scopally. In fact we have already seen an example which shows a quantificational intervener of this type: the only available reading of (24) is one which requests the number of laps finished by the fastest swimmer. These readings are also available with amount wh-questions, as (28) shows. (28) How tall is at least one boy in your class? [Answer: 6 feet.] This reading does not seem to be available with the quantifier some. However, this is probably due to the fact that, for independent reasons, some in (29) can only have a choice reading. (29) How tall is some professor? a. “Pick a professor and tell me: How tall is (s)he?” b. # “How tall is the tallest professor?” The readings in (24) and (28) may be more robust than the corresponding reading of (29b) because at least, in contrast to some, does not support choice readings (cf. Groenendijk & Stokhof [6]). Maximum readings also appear with a NP when the NP is focused: (30) [The strongest girl lifted 80 kg.] OK, but how much did a BOY lift? [Answer: the amount that the strongest boy lifted.] Since our account emphasizes the similarities between comparatives and whquestions, we expect that similar readings should exist also in comparatives. These readings are indeed attested: for instance, Heim [9, p.223] notes this example. 3 4
We cannot tell whether the comparative in (27) has this reading, since it is truthconditionally equivalent to the direct scope reading. Thanks to an anonymous reviewer for emphasizing the importance of the reading which contains this “very demanding presupposition”, and to Roberto Zamparelli for pointing out that this is an unexpected and welcome prediction of S&Z’s account.
48
D. Lassiter
(31) Jaffrey is closer to an airport than it is to a train station. This is true iff the closest airport to Jaffrey is closer than the closest train station. An additional naturally occurring example which seems to display a maximum reading with an existential quantifier in a comparative complement, viz.: (32) “I made the Yankee hat more famous than a Yankee can.” (Jay-Z, “Empire State of Mind”, The Blueprint 2, Roc Nation, 2009) From the context of this song, the artist is clearly not saying that he made the Yankees hat more famous than some particular Yankee (baseball player) can, or more than a typical Yankee can, but more than any Yankee can. Finally, if any and ever in contexts such as the following has existential semantics, as claimed in Kadmon & Landman [10] and many others, then (37a) and (37b) also involve intervention by an existential quantifier: (33)
a. Larry endured more pain than any professor did. b. My brother is angrier than I ever was.
Maximum readings, then, are well-attested in both amount comparatives and amount wh-questions. This supports the present theory in treating quantifier scope in comparatives and wh-questions in the same way, and is incompatible with the LF constraint (10) proposed by Heim.
4
Conjunction and Disjunction in the Comparative Complement
Noting an ambiguity similar to that in (3) – where a sentence with or in the comparative complement is equivalent on one reading to the same sentence with and replacing or – Schwarzschild & Wilkinson [23] suggest: or in these examples may in fact be a negative polarity item ... which has a conjunctive interpretation in this context, in the way that negative polarity any or ever seem to have universal interpretations in the comparative. The difference between ordinary or meaning ‘∨’ and NPI or meaning ‘∧’, then, is a matter of lexical ambiguity.5 This is a possible analysis, but I think that it 5
This claim is important to Schwarzschild & Wilkinson [23] because their semantics for comparatives prevents scope ambiguities of the type considered here from being computed, so that they have no choice but to treat or as ambiguous. However, Schwarzschild [22] notes that the earlier proposal in Schwarzschild & Wilkinson [23] wrongly predicts that there should be no scope ambiguities of the type considered here, and proposes an enrichment that is compatible with the analysis of the ambiguity of sentences with or and the non-ambiguity of the corresponding sentences with and suggested here. See section (6.1) for details.
The Algebraic Structure of Amounts: Evidence from Comparatives
49
would be desirable to derive the ambiguity of (3) without positing two meanings of or, in line with Grice’s [5] Modified Occam’s Razor (“Senses are not to be multiplied beyond necessity”). The present theory yields an explanation of this fact, and of why similar constructions with and are unambiguous. The only available reading of the sentence in (4) is the one in (34a), where (34) is treated as an elliptical variant of (35). (34) John is richer than his father was and his son will be. a. max(λd[rich(d)(f ather)]) < max(λd[rich(d)(John)]) ∧ max(λd[rich(d)(son)]) < max(λd[rich(d)(John)]) (35) John is richer than his father was and he is richer than his son will be. Why can’t (34) be read as in (36), which ought to mean that John is richer than the poorer of his father and his son? (36) max(λd[rich(d)(f ather) ∧ rich(d)(son)]) < max(λd[rich(d)(John)]) The theory we have adopted offers an explanation: because conjunction, like universal quantification, relies on the operation of intersection, (36) is not available for the same reason that Every girl is less angry than Larry doesn’t mean that the least angry girl is less angry than Larry. Computing (36) would require taking the intersection of the degrees of wealth of John’s father and his son; but this is not possible, because this operation is not defined for amounts of wealth. In contrast, the same sentence with or instead of and is ambiguous because the maximum reading (37a) can be computed using only unions, which are defined in a join semilattice. (37b) is the alternative reading on which (37) is elliptical, like the only available reading of (34). (37) John is richer than his father was or his son will be. a. max(λd[rich(d)(f ather)∨rich(d)(son)]) < max(λd[rich(d)(John)]) [“He is richer than both.”] b. max(λd[rich(d)(f ather)< rich(d)(John)])∨max(λd[rich(d)(John)]) [“He is richer than one or the other, but I don’t remember which.”] Thus it is not necessary to treat or as lexically ambiguous: the issue is one of scope.
5
Modals and Intensional Verbs
On S&Z’s theory, existential quantifiers are able to intervene with amount expressions because join semilattices are closed under unions. This produces the “maximum” readings that we have seen. The analysis predicts, correctly, that (38) is ambiguous, a type of case discussed at length in Heim [9]. (38) (This draft is 10 pages.) The paper is allowed to be exactly 5 pages longer than that. [9, p.224]
50
D. Lassiter
a. allowed > exactly 5 pages -er > that-long ∃w ∈ Acc : max(λd : longw (p, d)) = 15pp “In some accessible world, the paper is exactly 15 pages long, i.e. it may be that long and possibly longer” b. exactly 5 pages -er > allowed > that-long max(λd[∃w ∈ Acc : longw (p, d)) = 15pp “The max length of the paper in any accessible world, i.e. its maximum allowable length, is 15 pages” I am not entirely certain whether the corresponding wh-question is ambiguous: some speakers think it is, and others do not. The robust reading is (39b), which involves scope-splitting; the questionable reading is the choice reading (39a). (39) How long is the paper allowed to be? a. allowed > how long > that-long ? “Pick an accessible world and tell me: what is the length of the paper is that world?” [Answer: “For example, it could be 17 pages long.”] b. how long > allowed > that-long “What is the max length of the paper in any accessible world, i.e. its maximum permissible length?” [Answer: “20 pages – no more will be accepted.”] If the answer given to (39a) is not possible, this may again be related to restrictions on the availability of choice readings, or possibly allowed does not have a quantificational semantics at all (as I will suggest for required ). So far, so good. However, since intersection is undefined with amount expressions, S&Z and the current proposal seem to predict that this ambiguity should be absent with universal modals, so that neither (40b) nor (41b) should be possible. (40) (This draft is 10 pages.) The paper is required to be exactly 5 pages longer than that. [9, p.224] a. required > exactly 5 pages -er > that-long ∀w ∈ Acc : max(λd : longw (p, d)) = 15pp “In every world, the paper is exactly 15 pages long” b. exactly 5 pages -er > required > that-long max(λd[∀w ∈ Acc : longw (p, d)]) = 15pp “The max common length of the paper in all accessible worlds, i.e. its length in the world in which it is shortest, is 15 pages” (41) How long is the paper required to be? a. required > how long > that-long “What is the length s.t. in every accessible world, the paper is exactly that long?” b. how long > required > that-long “What is the max common length of the paper in all accessible worlds, i.e. its length in the world in which it is shortest?”
The Algebraic Structure of Amounts: Evidence from Comparatives
51
But (40b) and (41b) are possible readings, and in fact are probably the most robust interpretations of (40) and (41). S&Z suggest briefly that modals and intensional verbs may be acceptable interveners because they do not involve Boolean operations. I am not sure precisely what they have in mind, but we should consider the possibility that the issue with (40)-(41) is not a problem about the interaction between degree operators and universal quantification, but one about the analysis of modals and intensional verbs. For instance, suppose that we were to treat modals not as quantifiers but as degree words, essentially as minimum- and maximum-standard adjectives as discussed in Kennedy & McNally [13] and Kennedy [12]. This analysis is motivated on independent grounds in Lassiter [17].6 For reasons of space the theory will not be described in detail, but – on one possible implementation of the analysis for allowed and required – its predictions are these: only the (b) readings of (40)-(41) should be present, and it is (40a) and (41a) that need explanation. In the case of (40), at least, the (a) entails the (b) reading, and so we may suppose that only (40b) is generated, and its meaning is “Fifteen pages is the minimum requirement”.7 On such an analysis, (40a) is not a different reading of (40) but merely a special case of (40b), where we have further (e.g., Gricean) reasons to believe that that the minimum is also a maximum.8 The advantage of such an analysis, from the present perspective, is that it explains another stipulative aspect of Heim’s proposed LF-constraint: why is the restriction limited to quantificational DPs? If the proposal I have gestured at in this section is correct, we have an answer: “universal” modals are immune to the prohibition against taking intersections with amounts because they are not really quantifiers. That is, computing them does not involve universal quantification over worlds, but simply checking that the degree (of probability, obligation, etc.) of a set of worlds lies above a particular (relatively high) threshold. Even if this particular suggestion turns out to be incorrect, the difference between universal quantifiers and universal modals noted by S&Z and Heim 6
7
8
A proposal due to van Rooij [27] and Levinson [18] seems to make similar predictions for want. This may account for the fact that want, like require, is a scope-splitting verb. The suggestion made here also recalls a puzzle noted by Nouwen [19] about minimum requirements and what Nouwen calls “existential needs”. Need is another scopesplitting verb, as it happens. I suspect that Nouwen’s problem, and the solution, is the same as in the modal scope-splitting cases discussed here. An anonymous reviewer notes that this account does not extend from required to should, citing the following puzzling data also discussed in Heim [9]: (i) Jack drove faster than he was required to. (ii) Jack drove faster than he should have. If the law requires driving at speeds between 45 mph and 70 mph, (i) is naturally interpreted as saying that Jack drove more than 45 mph, but (ii) says that he drove more than 70 mph. There are various possible analyses of these facts from the current perspective, including treating required as a minimum-standard degree expression, but should as a relative-standard expression (like tall or happy).
52
D. Lassiter
is an unexplained problem for all available theories of comparatives and weak islands, and not just the proposal made here. The general point of this section is simply that the setup of this problem relies on one particular theory of modals which may well turn out to be incorrect. Fleshing out a complete alternative is unfortunately beyond the scope of this paper, however.
6
Comparison with Related Proposals
In this section I compare the proposal made here with two influential proposals in the recent literature. The general conclusion is that the problem in (1) is not resolved by these modifications to the semantics of comparatives and/or whquestions, and that a separate account is needed. However, the line of thought pursued here is essentially compatible with these proposals as well. 6.1
Schwarzschild & Wilkinson
An influential proposal due to Schwarzschild & Wilkinson [23] argues that comparatives have a semantics based on intervals rather than points. They show that, on this assumption, it is possible to derive apparent wide scope of quantifiers in the comparative complement without allowing QR out of the comparative complement. Since the latter would violate the general prohibition against extraction from a comparative complement, this result is welcome. However, it is also empirically problematic, and attempts to cope lead back to a solution of the type given here. Schwarzschild & Wilkinson give the comparative sentence in (42a) a denotation that is roughly paraphrased in (42b): (42)
a. Larry is taller than every girl is. b. The smallest interval containing Larry’s height lies above the largest interval containing the heights of all the girls and nothing else.
(42b) will be true just in case, for every girl, Larry is taller than she is. As a result, the undesired “shortest-girl” reading is not generated. Schwarzschild [22] acknowledges, though, that the proposal in Schwarzschild & Wilkinson [23] is too restrictive: it predicts that there should never be scope ambiguities between more/-er/less and quantifiers in the comparative complement. We have already seen a number of examples where such ambiguities are attested, involving existential quantification and its ilk (in (32)) and existential and (perhaps) universal modals in (38) and (40). In order to maintain the interval-based analysis, Schwarzschild [22] introduces a point-to-interval operator π which can appear in various places in the comparative complement (see also Heim [8] for a closely related proposal and much discussion). In this way, Schwarzschild derives the ambiguities discussed here without raising the QP out of the comparative clause. The important thing to note, for our purposes, is that while Schwarzschild & Wilkinson [23] present a semantics on which the problems discussed here do
The Algebraic Structure of Amounts: Evidence from Comparatives
53
not arise, Schwarzschild’s [22] modification re-introduces scope ambiguities in comparatives in order to deal with the restricted set of cases where they do arise. This modification is valuable because it explains how these ambiguities arise despite the islandhood of the comparative clause; however, as Heim [8, pp.15-16] points out, in order to “prevent massive overgeneration of unattested readings, we must make sure that π never moves over a DP-quantifier, an adverb of quantification, or for that matter, an epistemic modal or attitude verb”. This is essentially Heim’s [9] proposed LF-constraint (10) re-stated in terms of the scope of π. So we are back to square one: the interval-based account, though it has the important virtue of explaining apparent island-violating QR out of the comparative complement, does not explain the core puzzle that we are interested in, why (42a) lacks a “shortest-girl” reading. So the current proposal, or something else which does this job, is still needed in order to explain why (42a) does not (on the high-π reading) have the “shortest-girl” reading.9,10 6.2
Fox & Hackl
Next we turn to an influential proposal by Fox & Hackl [4]. I show that the theory advocated here is not in direct competition with Fox & Hackl’s theory, but that there are some complications in integrating the two approaches which may cause difficulty for Fox & Hackl’s. Fox and Hackl argue that amount-denoting expressions always denote on a dense scale, effectively the lattice in (23) with the added stipulation that, for any two degrees, there is always a degree that falls between them. The most interesting data from the current perspective are in (43) and (44): (43)
a. How fast are we not allowed to drive? b. *How fast are we allowed not to drive?
(44)
a. How fast are we required not to drive? b. *How fast are we not required to drive?
The contrasts in (43) and (44) are surprising from S&Z’s perspective: on their assumptions, there is no maximal degree d such that you are not allowed to drive d-fast, and yet (23a) is fully acceptable. In addition, (43a) and (44a) do not ask for maxima but for minima (the least degree which is unacceptably fast, i.e. the speed limit). Fox and Hackl show that the minimality readings of (43a) and (44a), and the ungrammaticality of (43b) and (44b) follow if we assume (following [3] and [2]) that wh-questions do not ask for a maximal answer but for a maximally informative answer, defined as follows: 9
10
Although intervals are usually assumed to be subsets of the reals – and so fit naturally with totally ordered structures like (22) – there is no barrier in principle to defining an interval-based degree semantics for partially ordered domains. Of course, some issues of detail may well arise in the implementation. A similar point holds for Abrus´ an [1]: her interval-based semantics does well with negative and manner islands, but does not account for quantificational interveners.
54
D. Lassiter
(45) The maximally informative answer to a question is the true answer which entails all other true answers to the question. Fox & Hackl show that, on this definition, upward monotonic degree questions ask for a maximum, since if John’s maximum height is 6 feet, this entails that he is 5 feet tall, and so on for all other true answers. However, downward entailing degree questions ask for a minimum, since if we are not allowed to drive 70 mph, we are not allowed to drive 71 mph, etc. This is not as deep a problem for the present theory as it may appear. S&Z assume that wh-questions look for a maximal answer, but it is unproblematic simply to modify their theory so that wh-questions look for a maximally informative answer. Likewise, we can just as easily stipulate that a join semilattice (26) is dense as we can stipulate that a number line (23) is dense; this maneuver would replicate Fox & Hackl’s result about minima in downward entailing contexts. In this way it is possible simply to combine S&Z’s theory with Fox & Hackl’s. In fact, this is probably independently necessary for Fox & Hackl, since their assumption that amounts always denote in (23) fails to predict the core data of the present paper: the fact that How tall is every girl? and Every girl is less tall than John lack a “shortest-girl” reading. I conclude that the two theories are compatible, but basically independent. Finally, note that the maximal informativity hypothesis in (45), whatever its merit in wh-questions and other environments discussed by Fox & Hackl, is not appropriate for comparatives: here it appears that we need simple maximality.11 (46)
a. How fast are you not allowed to drive? b. *You’re driving faster than you’re not allowed to.
A simple extension of the maximal informativity hypothesis to comparatives would predict that (46b) should mean “You are exceeding the speed limit”. In contrast, the maximality-based account predicts that (46b) is unacceptable, since there is no maximal speed which is not allowed. This appears to be the correct prediction. However, it is worth noting that, because of the asymmetry between (43) and (46), combining the two theories in the way suggested here effectively means giving up the claim that the comparative is a type of wh-operator. Since this idea has much syntactic and semantic support, it is probably worth looking for an alternative explanation of (43) and (44) that does not involve adopting the proposal in (45).
7
Conclusion
To sum up, the traditional approach on which amounts are arranged on a scale of degrees fails to explain why the constraint in (10) should hold. However, the numerous similarities between limitations on comparatives and amount-denoting wh-questions with quantifiers suggest that these phenomena should be explained by a single theory. S&Z’s semantic account of weak islands predicts, to a large 11
Thanks to a PLC reviewer for bringing the contrast in (46) to my attention.
The Algebraic Structure of Amounts: Evidence from Comparatives
55
extent, where quantifier intervention is possible and where it is not. The crucial insight is that intervention effects are due to the kinds of operations that quantifiers need to perform, and not merely the structural configuration of various scope-taking elements. S&Z’s theory also predicts correctly that narrow-scope conjunction is impossible in amount comparatives, but narrow-scope disjunction is possible. To be sure, important puzzles remain; but the algebraic approach to comparative scope offers a promising explanation for a range of phenomena that have not been previously treated in a unified fashion. Furthermore, if S&Z’s theory turns out to be incomplete, all is not lost. The most important lesson of the present paper, I believe, is not that S&Z’s specific theory of weak islands is correct — as we have seen, there are certainly empirical and technical challenges12 — but rather that weak island phenomena are not specific to wh-questions. In fact, we should probably think of the phenomena summarized by the Heim-Kennedy constraint as comparative weak islands. However the theory of weak islands progresses, evidence from comparatives will need to play a crucial role in its development — and vice versa.13
References 1. Abrus´ an, M.: Contradiction and Grammar. PhD thesis, MIT (2007) 2. Beck, S., Rullmann, H.: A flexible approach to exhaustivity in questions. Natural Language Semantics 7(3), 249–298 (1999) 3. Dayal, V.: Locality in wh-quantification. Kluwer, Dordrecht (1996) 4. Fox, D., Hackl, M.: The universal density of measurement. Linguistics and Philosophy 29(5), 537–586 (2006) 5. Grice, H.P.: Studies in the Way of Words. Harvard University Press, Cambridge (1989) 6. Groenendijk, J., Stokhof, M.: Studies in the Semantics of Questions and the Pragmatics of Answers. PhD thesis, University of Amsterdam (1984) 7. Hackl, M.: Comparative quantifiers. PhD thesis, MIT (2000) 8. Heim, I.: Remarks on comparative clauses as generalized quantifiers. Ms. MIT, Cambridge (2006) 9. Heim, I.: Degree operators and scope. In: Fery, Sternefeld (eds.) Audiatur Vox Sapientiae: A Festschrift for Arnim von Stechow. Akademie Verlag, Berlin (2001) 10. Kadmon, N., Landman, F.: Any. Linguistics and philosophy 16(4), 353–422 (1993) 12
13
In particular, S&Z do not give a compositional implementation of their proposal. I do not see any very deep difficulties in doing so, although, as Anna Szabolcsi (p.c.) points out, treating amounts and counters differently in semantic terms despite their similar (or possibly identical) syntax might seem unattractive to some. Thanks to Anna Szabolcsi and an anonymous ESSLLI reviewer for pointing out the need for this note. Thanks to Chris Barker, Anna Szabolcsi, Arnim von Stechow, Emmanuel Chemla, Yoad Winter, Lucas Champillon, Rick Nouwen, Roberto Zamparelli, Roger Schwarzschild, several anonymous reviewers, and audiences at the 33rd Penn Linguistics Colloquium and the 2009 ESSLLI Student Session for helpful discussion and advice. An earlier version of this paper appeared as Lassiter [16].
56
D. Lassiter
11. Kennedy, C.: Projecting the adjective: The syntax and semantics of gradability and comparison. PhD thesis, U.C., Santa Cruz (1997) 12. Kennedy, C.: Vagueness and grammar: The semantics of relative and absolute gradable adjectives. Linguistics and Philosophy 30(1), 1–45 (2007) 13. Kennedy, C., McNally, L.: Scale structure, degree modification, and the semantics of gradable predicates. Language 81(2), 345–381 (2005) 14. Kroch, A.: Amount quantification, referentiality, and long wh-movement. Ms., University of Pennsylvania (1989) 15. Landman, F.: Structures for Semantics. Kluwer, Dordrecht (1991) 16. Lassiter, D.: Explaining a restriction on the scope of the comparative operator. University of Pennsylvania Working Papers in Linguistics, 16(1) (2010) 17. Lassiter, D.: Gradable Epistemic Modals, Probability, and Scale Structure. In: Proceedings from Semantics and Linguistic Theory XX (to appear, 2010) 18. Levinson, D.: Probabilistic Model-theoretic Semantics for want. In: Proceedings from Semantics and Linguistic Theory XIII (2003) 19. Nouwen, R.: Two puzzles about requirements. In: Proceedings of the 17th Amsterdam Colloquium, pp. 326–334 (2009) 20. Rizzi, L.: Relativized Minimality. MIT Press, Cambridge (1990) 21. Rullmann, H.: Maximality in the Semantics of wh-constructions. PhD thesis, University of Massachusetts, Amherst (1995) 22. Schwarzschild, R.: Scope-splitting in the comparative. Handout from MIT colloquium (2004), http://www.rci.rutgers.edu/~ tapuz/MIT04.pdf 23. Schwarzschild, R., Wilkinson, K.: Quantifiers in comparatives: A semantics of degree based on intervals. Natural Language Semantics 10(1), 1–41 (2002) 24. Szabolcsi, A.: Strong vs. weak islands. Blackwell Companion to Syntax 4, 479–531 (2006) 25. Szabolcsi, A., Zwarts, F.: Weak islands and an algebraic semantics for scope taking. Natural Language Semantics 1(3), 235–284 (1993) 26. Takahashi, S.: More than two quantifiers. Natural Language Semantics 14(1), 57–101 (2006) 27. van Rooij, R.: Some analyses of pro-attitudes. In: de Swart, H. (ed.) Logic, Game Theory, and Social Choice. Tilburg University Press (1999) 28. von Stechow, A.: Comparing semantic theories of comparison. Journal of Semantics 3(1), 1–77 (1984)
Extraction in the Lambek-Grishin Calculus Arno Bastenhof Utrecht University
Abstract. We propose an analysis of extraction in the Lambek-Grishin calculus (LG): a categorial type logic featuring subtractions A B and B A, with proof-theoretic behavior dual to that of the usual implications A B, B A. Our analysis rests on three pillars: Moortgat’s discontinuous type constructors ([6]); their decomposition in LG as proposed by Bernardi and Moortgat ([1]); and the polarity-sensitive double negation translations of [3] and [5], inspiring the Montagovian semantics of our analysis. Characteristic of the latter is the use of logical constants for existential quantification and identity to identify the extracted argument with its associated gap.
Being founded upon logics of strings (L) or trees (NL), categorial type logics [7, CTL] do not naturally accommodate a satisfactory treatment of discontinuity. In response, Moortgat ([6]) proposed a type constructor that allows unbounded abstraction over the type of the syntactic context of an expression. Though originally intended for the analysis of scopal ambiguities, we here pursue its application to extraction. We find opportunities for improvement by extending our analysis to the Lambek-Grishin calculus (LG), a conservative extension of NL for which Moortgat’s type constructor was shown derivable in [1]. We conclude by pairing our analysis with a Montagovian semantics, taking inspiration from the double negation translations proposed in [3] and [5]. We proceed as follows. 1 briefly reviews the application of CTL to linguistic analysis. 2 outlines our usage of Moortgat’s type constructor in finding a type schema for extraction. The slight amount of lexical ambiguity thereby posited is shown reducible in 3, where we discuss the decomposition of our schema in LG. 4 couples our analysis with a Montagovian semantics. 5 summarizes our findings and relates our work to the literature.
1
Categorial Analyses
We adopt a logical perspective on natural language syntax: syntactic categories are logical formulas, or (syntactic) types (written A..E), as we shall call them. A..E
n np s
B A
AB
Types n, np and s are atomic, categorizing common nouns, noun phrases and sentences respectively. The interpretations of complex types we derive from the proof-theoretic meanings of their constructors: the implications , . Proofs we T. Icard, R. Muskens (Eds.): ESSLLI 2008/2009 Student Sessions, LNAI 6211, pp. 57–71, 2010. Springer-Verlag Berlin Heidelberg 2010
58
A. Bastenhof
take to establish sequents Γ A, understood as grammaticality judgements: Γ is a well-formed constituent of type A. By a constituent we simply understand a binary-branching tree with types at its leaves: Γ, Δ
A
Γ Δ
On this reading, the slashes embody subcategorization: a constituent Γ of type AB (B A combines with a constituent Δ of type B into Γ Δ (Δ Γ ), assigned the type A. Made explicit in a Natural Deduction presentation: A Γ
A
Ax
AB Δ B Δ B Γ B A Γ B A I E E Γ Δ A ΔΓ A Γ AB
B Γ A I Γ B A
Introduction rules I , I allow the inference of a type, while elimination rules E , E allow the use of a type in an inference. Axioms Ax ensure each constituent corresponding to a single leaf to be a well-formed constituent of the type found at that leaf. Additionally, one may consider a formula counterpart of tree merger Γ Δ: the multiplicative product AB, read as binary merger:1 Δ
A B Γ A B Γ Δ C
C
E
Γ Γ
A Δ B I Δ AB
As an illustration, consider the following type assignments to the lexical items in John offered the lady a drink. Figure 1 derives the corresponding sentence:2 John npsu
offered npsu snpdo npio
the lady a drink npio n n npdo n n
The calculus NL thus established is no stronger than context-free formalisms. The next section discusses Moortgat’s proposal on how to go beyond.
2
Types for Extraction
We discuss Moortgat’s discontinuous type constructors ([6]), proposing an application to extraction. Intuitively, said type constructors allow abstraction over the type of the syntactic contexts of expressions. By a context we shall understand trees Γ with a hole . Substituting for some Δ yields the tree Γ Δ, the notation emphasizing the (distinguished) occurrence of Δ in Γ . Γ , Δ 1 2
Γ Δ Δ Γ
The notation Γ Δ emphasizes the distinguished occurrence of a subtree Δ in Γ . We discuss this in more detail in 2. Subscripts have been added to facilitate matching with grammatical roles: su, do and io abbreviate subject, direct object and indirect object respectively.
Extraction in the Lambek-Grishin Calculus
59
Any subtree Δ of some Γ uniquely determines a context Γ s.t. Γ Γ Δ. This result extends to the level of type assignment: for derivable Γ B, any decomposition of Γ into a subtree Δ and its context Γ allows for a type A of the hole to be determined, in the sense that there are derivable Γ A B and Δ A. Cut recovers the original type assignment Γ B: Δ
A Γ A Γ Δ B
B
Cut
The act of singling out a subtree of Γ is assigned an operational meaning with C : the embedded constituent Δ with syntacMoortgat’s discontinuity types A B tic distribution characterized by A may seize scope over its embedding context Γ A of type B, with outcome an expression of type C.3 Δ
C Γ A AB
B
E C Moortgat proposes assigning types np ss to quantified noun phrases, abstracting over the sentential domain defining their scope. Here, we consider instead a more surface oriented manifestation of discontinuity: extraction. A predicate C an argument of which is extracted we lexically assign the type A B , B parameterizing over the types: A of its syntactic distribution; C of its extracted argument (instantiating B C by B C if it occurs in a right branch, and C B otherwise); and B of the extraction domain. Here, (diamond) and (box) are unary type-forming operators, held subject to:
Γ Δ
A
A
A
thereby preventing overgeneration: constituents of type C do not directly combine with gapped clauses, seeing as they do not derive C. As an example, consider the complex noun phrase lady whom John offered a drink. The object noun phrase (C np) of offered is extracted (from a right branch) at the level of the embedded sentence (B s), read itself locally selecting for a subject and direct object (A npsnp) (Figure 1 gives the derivation): whom nn s
np
io
John
offered
npsu
npsu s npdo s
np s
io
a
drink
npdo n
n
Use of E allows the assignment of snp to the gapped embedded sentence John offered a drink, establishing it as a proper argument for whom. We observe: 1. Our type schema for extraction places no constraints on the location of the gap, seeing as E operates at an arbitrarily deep level within a context.4 3 4
Moortgat writes instead q A, B, C , giving inference rules for a sequent calculus. Our notation is borrowed from [11]. Compare this to the situation in L, where only peripheral extraction is possible, at least as long as one does not wish to resort to the use of types that are highly specialized to their syntactic environments.
60
A. Bastenhof
C 2. We rely on limited lexical ambiguity: types A B are lexically asB signed next to the usual types for when no extraction takes place. Although the ambiguity is well under control (it being of finite quantity), we establish it reducible in the next section for several specific cases. 3. Our analysis is, of course, not limited to the case where the extraction domain is a sentence. For example, with wh-extraction in English the gap shows up at the level of a yes-no question [12, Chapter 3]:
Whom
np
wh q
did q sinf
John np
offer npsinf np q
np q
something? np
Here q, wh are the types of yes-no and wh-questions respectively.
3
The Lambek-Grishin Calculus
C from the We discuss a proposal of Bernardi and Moortgat ([1]) to derive A B more primitive type constructors of the Lambek-Grishin calculus (LG, [8]). The latter is an extension of NL exhibiting an involutive operation on types and sequents, manifesting as an arrow-reversing duality:5 A is derivable A
Γ
Γ
is derivable
More accurately, once we admit some notion of costructure Π naturally occurring on the right hand side of the turnstile: Γ
Π is derivable Π
Γ
is derivable
We realize at the level of types by adding subtractions , to the repertoire of type-forming operators: s np n B A A B AB B A
A..E
Formally,
(Atoms) (Left selection vs. right rejection) (Right selection vs. left rejection)
is then defined by B
AB A
BA A B
AB B A
B A A B
and A A is easily checked. def A for A atomic. Involutivity of (A At the level of sequents, we require a notion of costructure: Π, Σ
A
Π Σ ,
A A
Γ Δ
Δ Γ
Σ
Π Σ Π
Cocontexts Π are defined similarly to contexts Γ . Inference rules are listed in Figure 2. An easy induction shows that gives the requested duality. E.g., 5
In contrast with classical NL [2], this duality is not internalized via linear negation.
Fig. 1. Derivations for examples discussed in sections 1,2. Lexical insertion is realized (informally) by writing axioms A an inference of A from w when w is a word lexically assigned the type A.
Ax something npsinf np npsinf np np John E np np s np something np s inf inf did E offer q sinf John npsinf np something sinf E npsinf np q q np did John npsinf np something q whom E wh q np did John offer something q np E whom did John offer something wh
a drink np n n E Ax nps np nps np a drink np John E offered np nps np a drink nps E nps np s s np John nps np a drink s whom E nn s np John offered a drink s np E whom John offered a drink n n
the lady np n n offered a drink E nps np np the lady np np n n E E offered the lady nps np a drink np John E np offered the lady a drink nps E John offered the lady a drink s
A as
Extraction in the Lambek-Grishin Calculus 61
62
A. Bastenhof
A Δ
A
A Γ Π B I Γ Π AB
Ax
Π A Γ A Σ Cut Γ Δ Π Σ Δ
A B
A
A
Π B Π Σ
Σ
Δ A
B
Σ B A A Σ Π
Π
Γ B
E
Π B I Π B A
Γ B A Π I Γ A B Π
A B
Γ
E
A
Γ B Π A I Γ B A Π
B A
B Γ Δ Γ
Γ Γ
Δ
Fig. 2. Inference rules of LG. For Cut, we require either Γ
Γ
A B
Γ
Δ
Δ A
B
E
B
Δ A
B A Δ Γ
E
E or Π
Γ
.
E
Moortgat ([8]) originally formulated LG as a display calculus, although here we opted for a presentation more Natural Deduction-esque. Note, however, that (Cut) is not eliminable in this presentation, as with usual Natural Deduction. Compared to NL, LG’s expressivity is enhanced through the introductions: the inferred (co-)slash may occur arbitrarily deep inside the right-hand side (left-hand side) of the sequent. Bernardi and Moortgat exploit this property by decomposing types A C B as B C A. Indeed, E is then derivable: Ax Ax B C B C C C E B B B C C Cut Γ A B C C I Γ B C A C Cut C
Γ A Δ
B C A Γ Δ
Our analysis of extraction in 2 thereby immediately extends to LG: translate C AB as B B C A. In doing so, we open the door to B further improvement of our analysis. Recall from previous examples our use of two separate types for offered, depending on whether or not extraction took place. In LG, we can find a type D from which both may be derived : D D
npsu snpdo npio (No extraction) s np io np (Extraction) su snpdo s
whereas D npsu snpio npdo and D npsu snpio npdo , preventing overgeneration. Abbreviating npsu snpdo npio as dtv, we have the following solution for D: s npio DZ np ss dtv io s s s
Extraction in the Lambek-Grishin Calculus s s
s s
Ax E
s s s s npio s s s ss s ss npio s s s s npio s s dtv s s s
s s
bind
np
s s
E
io
np np
io
io
s dtv
s s dtv s
s s dtv s s s s s dtv s s
np np s io
bind
Ax E
s s s s bind s s dtv s dtv s
E
s np np s np s s np np s np s snp np np np s np s s io
su
io
io
63
io
do
su
su
io
do
E I
io
do
Fig. 3. Explaining the intuition behind D. Active formulas are explicitly marked with a suffix in order to faciliate understanding.
thus at least partially eliminating the need for (lexically) ambiguous type assignment. Figure 3 explains the intuition behind D, using the following derived rules in order to abstract away from unnecessary details: Γ A B bind C C Γ A B
Γ
AB C B, B E Γ C A
Γ
Γ B E C C B
B More generally, we can find such a D for types AB and A C provided C
C
headA (e.g., if A D
4
npsnp, then C
s), the solution then being:6
B C B B C C AC C C
Formal Semantics
Montagovian semantics splits the labour between a derivational and a lexical component. The former motivates LG’s inference rules semantically, showing how they combine the denotations associated with their premises into a denotation of their conclusion. The lexical component determines the base of the 6
The case of extraction from a left branch is similar.
64
A. Bastenhof
τ
x
x τ
E Δ N τ σ Γ, τ , σ M τ Γ, Δ case N of x y .M τ Γ
M Γ, Δ
τ
τ
y
σ
I Γ M τ Δ N σ Γ, Δ M N τ σ Γ, τ x M φ Γ λxτ M τ
Δ N τ M N φ x
N τ Γ, τ x M τ Cut Γ, Δ M N x τ
Δ
Ax
E
τ λx M N case N1 N2 of xτ y σ .M
β β
I
M N x M N1 x, N2 y
Fig. 4. Recipes for meaning construction, phrased in linear λ-calculus. In E , E , we require the sets of variables in Γ, Δ to be disjoint.
recursion, specifying the denotations of the individual words. 4.1 gives a derivational semantics of LG, inspired by [5] and [3], while 4.2 illustrates the lexical semantics associated with our extraction type schema. 4.1
Derivational Semantics
We phrase our instructions for meaning composition in a restriction of the linear simply-typed λ-calculus, detailed in Figure 4. The (semantic) type τ of a term M is determined relative to a context Γ , being a multiset of type declarations xσ for the free variables in M . This relation is expressed by a sequent Γ M τ , held subject to various linearity constraints: Γ contains no variables not in M , and no type declaration occurs more than once. In other words, we dispense with weakening and contraction, and end up with a fragment of intuitionistic multiplicative linear logic. We speak of a fragment, in that we require only limited use of implications. Formally, we define semantic types τ, σ, ρ by τ, σ, ρ
s np n φ
τ
τ σ
Note that we considered again atomic types s, np and n: the specification of their referential properties we leave to the lexical component. The distinguished type φ acts as the sole answer type of linear implications: τ then reads as τ φ. The term language thus obtained is essentially the linear counterpart of the restricted λ-calculus explored in [5]. We shall refer to it as LP, , or simply LP when no confusion arises with the full LP , . Finding a derivational semantics for LG now amounts to translating a derivation for a multiple conclusion sequent Γ Π into a single conclusioned Δ M τ . Such problems have been tackled before in the proof theory of classical logic through the familiar double negation translations into minimal logic: roughly, translate the conclusions of a derivation as hypotheses by negating them. This practice typically results in the prefixing of a great deal of subformulas with double negations ; most unpractical as a foundation for a Montagovian semantics. Here, we adapt instead the approach by Girard ([3]), who obtained a
Extraction in the Lambek-Grishin Calculus
65
significantly more efficient translation by taking into account the polarity of a formula. Within LG, we will define the latter concept by stating atomic types to have positive polarity, while complex types are said to have negative polarity:

P, Q ::= s | np | n (positive types)
K, L ::= A ⊗ B | A/B | B\A | A ⊕ B | A ⊘ B | B ⦸ A (negative types)

We define a translation ⌈·⌉ taking LG's types to LP. We set ⌈A⌉ = A for atomic A, whereas ⌈A⌉ for complex A explicitly abstracts over the polarities of its direct subtypes B (B itself if positive, ¬⌈B⌉ if negative):7

[the per-connective translation clauses are not recoverable from the source]

Extending the translation to sequents, we set

⌈A⌉ for positive input A or negative output A
¬⌈A⌉ for negative input A or positive output A
where A is input if it occurs in the antecedent (left-hand side) of a sequent, and output if it occurs in the consequent (right-hand side). A derivation of Γ ⊢ Π we then interpret by a term-in-context ⌈Γ⌉, ⌈Π⌉ ⊢ M : φ, with ⌈Γ⌉ and ⌈Π⌉ denoting pointwise applications of ⌈·⌉, trading in the structure built up by the syntax for LP's multisets. Figure 5 interprets LG's inference rules, restricting the treatment of introductions and eliminations of complex A to the case where each of its direct subtypes is negative, leaving the other cases as an exercise. We note:

1. We mention explicitly only the active and main formulas of each rule.
2. λx^{τ⊗σ}.case x of y^τ ⊗ z^σ.M (x not free in M) is abbreviated λ⟨y^τ, z^σ⟩.M.
3. We use α, β, γ, possibly sub- or superscripted, as variable names for types ⌈A⌉ with A originating in the right-hand side of the translated sequent.

Returning to our analysis of extraction, we illustrate with the slightly simplified

lady : n, whom : (n\n)/(s ⊘ np), John : np, and saw with its extraction type [garbled in the source],

leaving more room for discussion of the lexical semantics associated with the extraction type schema. The derivational semantics is calculated in Figure 5. For reasons of space, we merged the syntactic and semantic derivations into one, presenting it in list format.

7 A less negation-heavy translation is in fact possible if we take subtractions to be positive. However, we would then be unable to give a lexical semantics for our extraction types. See also our discussion in 5.
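Before turning to the figure, the polarity bookkeeping just defined can be made concrete. The following Python sketch is my own illustration (types are represented as plain strings, and the per-connective clauses of the translation are deliberately left abstract):

# Sketch of the polarity classification and the sequent-level sign assignment
# described above. Atoms are positive, all complex types negative; '~T'
# abbreviates the LP negation T -> phi.

ATOMS = {"s", "np", "n"}

def polarity(ty):
    return "positive" if ty in ATOMS else "negative"

def sign(ty, role):
    """role is 'input' (antecedent) or 'output' (consequent).
    Positive inputs and negative outputs keep their translation;
    negative inputs and positive outputs are negated."""
    positive = polarity(ty) == "positive"
    keep = (positive and role == "input") or (not positive and role == "output")
    return ty if keep else "~" + ty

print(sign("np", "input"), sign("np", "output"))   # np ~np
print(sign("np\\s", "input"))                      # ~np\s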
[Fig. 5. Recipes for meaning construction, phrased in linear λ-calculus. Sample derivation included. The interpretation rules and the fourteen-step sample derivation for the relative clause whom John saw, merged from its syntactic and semantic components and presented in list format, could not be recovered from the source.]
4.2 Lexical Semantics
Having established a compositional procedure for determining the denotations of complex constituents from those of their components, we close off the recursion by interpreting the words at the yields. In doing so, we need no longer commit to linearity constraints: while recipes for compositional meaning construction inherit their linearity from the syntactic mechanisms they interpret, our means of referring to the world around us is not so restricted. Thus, we fix as our domain of discourse the simply typed λ-calculus with product and implication types. Moreover, we ask at least for atomic types e, t, characterizing the denotational domains of entities and truth values respectively:

τ, σ ::= e | t | τ × σ | τ → σ
Complex linear types carry over straightforwardly: replace occurrences of τ ⊗ σ and ¬τ by τ × σ and τ → t respectively. Moreover, terms M ⊗ N become pairings ⟨M, N⟩ and case expressions case N of x ⊗ y.M become simultaneous substitutions M[π₁N/x, π₂N/y]. Abbreviating λx^{τ×σ}.M[π₁x/y, π₂x/z] by λ⟨y, z⟩.M and types τ → t by ¬τ, our terms of 4.1 remain practically unchanged. The only non-triviality lies in how we interpret the atoms φ, s, np, n:

φ := t    s := t    np := e    n := e

In particular, input occurrences of s, np, n interpret as t (sentences denote truth values), e (noun phrases denote entities) and e → t (common nouns denote first-order properties). Henceforth, we write ⌈·⌉ for the composition of the translation of 4.1 with the mapping to the type language of the λ-calculus, as outlined above. Now consider again our example:

lady : n, whom : (n\n)/(s ⊘ np), John : np, saw : [extraction type, garbled in the source]
The derivational interpretation of this relative clause we calculated to be

w⟨γ, l, λα₂. s⟨λ⟨x₄, α₁⟩. x₄(λo. α₁⟨o, α₂⟩), j⟩⟩

parameterizing over the following lexical denotations:

Word  Variable  Type
lady  l         n
whom  w         (n\n)/(s ⊘ np)
John  j         np
saw   s         [extraction type, garbled in the source]

Our task is to substitute appropriate terms for l, w, j, s such that the result becomes logically equivalent to:

γ(λy^e. (lady y) ∧ (see y john))
with γ the remaining free variable, of type (e → t) → t. For this task, we assume to have at our disposal the following constants:

Constant(s)  Type         Description (denotation)
∃            (e → t) → t  Existential quantification
=            e → e → t    Equality of entities
∧            t → t → t    Conjunction
john         e            Entity
lady         e → t        (First-order) property
see          e → e → t    Binary relation

In particular, we assume the interpretations of ∃, = and ∧ to uniformly match their semantics in classical first-order predicate logic across (the usual set-theoretic) models for the λ-calculus.8 Suitable witnesses to the variables j, l are now easily provided: just take the constants john and lady respectively. For whom, we seek a term of the type ⌈(n\n)/(s ⊘ np)⌉ (its expansion is garbled in the source). We propose

whom := λ⟨γ, P, Q⟩. γ(λy^e. (P y) ∧ Q⟨λp^t. p, y⟩)

Our real interest, of course, is in saw. Its extraction type [garbled in the source] is negative and occurs in antecedent position, so we seek a term of the corresponding ⌈·⌉-translation. That is, we seek an abstraction λR. N, reducing the problem to that of finding a suitable instantiation of N of type t. The type of the bound variable R further dissects into the translations of saw's two subtypes, so we are to provide R with two arguments, one of type ⌈np\s⌉, the other of type ⌈s ⊘ np⌉. We start with the former:

N1 := λ⟨q^{¬t}, x^e⟩. q(see u x)
Notice the use of a free variable u (type e) as a place-holder for the extracted argument of saw (i.e., the direct object). As for the second argument of R, we set

N2 := λ⟨⟨p^t, q^{¬t}⟩, y^e⟩. q(p ∧ (y = u))

In essence, N2 contains the recipe for construing the denotation of the gapped embedded clause John saw ___. Crucially, it equates the denotation of the extracted argument (the bound variable y) with u (the free variable standing in for the direct object in N1). An existential binding of u concludes the denotation of saw:

N := ∃u^e. R⟨N1, N2⟩ = ∃u^e. R⟨λ⟨q, x⟩. q(see u x), λ⟨⟨p, q⟩, y⟩. q(p ∧ (y = u))⟩

8 In practice, we write ∧ and = in infix notation, and abbreviate ∃(λx^e.M) as ∃x^e.M.
With the denotations of each of the lexical items fixed, we apply β-reduction. We sketch the computation. Starting again from the term below, replace j with john and l with lady:

w⟨γ, lady, λα₂. s⟨λ⟨x₄, α₁⟩. x₄(λo. α₁⟨o, α₂⟩), john⟩⟩

Substituting for s the term we found for saw in

s⟨λ⟨x₄, α₁⟩. x₄(λo. α₁⟨o, α₂⟩), john⟩

yields, after β-reduction,

∃u. (π₁α₂)(see u john) ∧ (π₂α₂ = u)

Finally, replacing w with the term we found for whom in

w⟨γ, lady, λα₂. ∃u. (π₁α₂)(see u john) ∧ (π₂α₂ = u)⟩

gives a term β-reducible to

γ(λy^e. (lady y) ∧ ∃u^e. (see u john) ∧ (y = u))
The desired result we then obtain by the following equivalence, licensed by the laws of predicate logic:9

∃u^e. (see u john) ∧ (y = u)  ⟺  see y john
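Being a validity of first-order logic, the equivalence can also be confirmed mechanically on any finite model. The following Python sketch is my own illustration, with a toy domain and an arbitrary interpretation of see:

# Toy finite-model check of the equivalence
#   exists u. see(u, john) and (y = u)   <=>   see(y, john)
# The domain and the interpretation of 'see' are illustrative assumptions.

domain = ["john", "mary", "lady1"]
see = {("mary", "john"), ("john", "lady1")}   # see(a, b): a sees b

def lhs(y):
    return any(((u, "john") in see) and (y == u) for u in domain)

def rhs(y):
    return (y, "john") in see

assert all(lhs(y) == rhs(y) for y in domain)
print("equivalence verified on this model")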
5 Discussion
We have provided an analysis for extraction in the Lambek-Grishin calculus. Its syntactic component was inspired by Moortgat's discontinuous type constructor, while its semantic component drew upon a double negation translation into the simply-typed λ-calculus. Characteristic of the lexical semantics we proposed was the use of constants for equality and existential quantification to identify the extracted argument with its gap position. Ours is not the only analysis of extraction in LG. We mention several alternatives:

1. Moot ([10]) provides an embedding of lexicalized tree adjoining grammars in LG, illustrating it with a grammar for a language fragment exhibiting discontinuous dependencies. Like ours, Moot's approach relies on (finite) lexical ambiguity, although we have not yet seen it coupled with a Montagovian semantics.

9 Compare this to Montague's PTQ treatment of to be.
2. Moortgat and Pentus ([9]) explore the relation of type similarity in LG, with A, B being type similar (written A ∼ B) in case A, B are derivable from a common D, referred to as their meet (or, equivalently, A ∼ B iff A, B both derive some join C). In particular, they observe ((np\s)/np)/pp ∼ ((np\s)/pp)/np (pp an atomic type for prepositions), suggesting the assignment of their meet to gave so as to make derivable both John gave a book to Mary as well as the gapped book that John gave to Mary from a single type assignment. Our proposal constitutes a refinement of this approach: we observed a similar type similarity (as an instance of the general schema discussed at the end of 3), witnessing it by a meet that we ensured would not lead to overgeneration.

3. A different approach, though not tailored to LG, is that of [12]: extraction derives from 'movement', expressed by permutations of antecedent structures. To prevent a collapse into multisets, the latter operation is licensed only when the phrase undergoing the dislocation is prefixed by ◊ (hence applicable to the type ◊np used for the gap in 3):

[Vermaat's four permutation postulates, garbled in the source]

The current approach, like those discussed above, instead derives Vermaat's postulates as linear distributivity principles between merger ⊗ and the subtractions ⊘, ⦸, albeit with the turnstile turned around. Indeed, the following sequents, studied originally by [4], are derivable in our presentation of LG:

(A ⦸ B) ⊗ C ⊢ A ⦸ (B ⊗ C)    C ⊗ (A ⦸ B) ⊢ A ⦸ (C ⊗ B)
(B ⊘ A) ⊗ C ⊢ (B ⊗ C) ⊘ A    C ⊗ (B ⊘ A) ⊢ (C ⊗ B) ⊘ A

Conversely, one may present LG using these sequents as axioms. We refer to [8] for further details. We conclude with a discussion of LG's derivational semantics. Our proposal in 4.1 'competes' with that of [1], who consider the following two dual translations (named call-by-name and call-by-value, after the evaluation strategies they encode for Cut elimination in sequent calculus):

[the CBN and CBV translation clauses, garbled in the source]

A derivation of Γ ⊢ Π we then interpret by ⌈Γ⌉, ⌈Π⌉ ⊢ M : φ (CBN) or by ⌊Γ⌋, ⌊Π⌋ ⊢ M : φ (CBV). We make two observations: [1] make no explicit reference to polarities; and our own proposal sides with CBN for complex A (for negative direct subtypes), but with CBV for atomic types. The second observation hints at an explanation of the first: CBN considers all types negative, whereas CBV considers all types positive, thus preventing any mixing of polarities. As an illustration of how the two proposals compare in practice, consider again our term for saw in 4.2:
λR. ∃u. R⟨λ⟨q, x⟩. q(see u x), λ⟨⟨p, q⟩, y⟩. q(p ∧ (y = u))⟩
In CBN, we would have had to adopt the more complex

λR. ∃u. R⟨λ⟨k, X⟩. X(λx. k(see u x)), λ⟨⟨w, k⟩, Y⟩. w(λp. k(p ∧ Y(λy. y = u)))⟩

with no term being derivable at all in CBV. We leave the further comparisons between these two proposals for future research.

Acknowledgements. I thank the following people for their comments on earlier drafts of this paper: Michael Moortgat, Raffaella Bernardi, Jan van Eijck, Andres Löh, Gianluca Giorgolo, Christina Unger, as well as two anonymous referees.
References

1. Bernardi, R., Moortgat, M.: Continuation semantics for symmetric categorial grammar. In: Leivant, D., de Queiroz, R.J.G.B. (eds.) WoLLIC 2007. LNCS, vol. 4576, pp. 53–71. Springer, Heidelberg (2007)
2. De Groote, P., Lamarche, F.: Classical non-associative Lambek calculus. Studia Logica 71, 355–388 (2002)
3. Girard, J.-Y.: A new constructive logic: Classical logic. Mathematical Structures in Computer Science 1(3), 255–296 (1991)
4. Grishin, V.N.: On a generalization of the Ajdukiewicz-Lambek system. In: Mikhailov, A.I. (ed.) Studies in Nonclassical Logics and Formal Systems, Nauka, Moscow, pp. 315–334 (1983)
5. Lafont, Y., Reus, B., Streicher, T.: Continuation semantics or expressing implication by negation. Technical report, Ludwig-Maximilians-Universität, München (1993)
6. Moortgat, M.: Generalized quantifiers and discontinuous type constructors. In: Horck, A., Sijtsma, W. (eds.) Discontinuous Constituency, pp. 181–208. Mouton de Gruyter, Berlin (1992)
7. Moortgat, M.: Categorial type logics. In: van Benthem, J.F.A.K., ter Meulen, G.B.A. (eds.) Handbook of Logic and Language, pp. 93–177. Elsevier, Amsterdam (1997)
8. Moortgat, M.: Symmetries in natural language syntax and semantics: The Lambek-Grishin calculus. In: Leivant, D., de Queiroz, R.J.G.B. (eds.) WoLLIC 2007. LNCS, vol. 4576, pp. 264–284. Springer, Heidelberg (2007)
9. Moortgat, M., Pentus, M.: Type similarity for the Lambek-Grishin calculus. In: Proceedings of the 12th Conference on Formal Grammar (2007)
10. Moot, R.: Proof nets for display logic. CoRR, abs/0711.2444 (2007)
11. Shan, C.-C.: A continuation semantics of interrogatives that accounts for Baker's ambiguity. In: Jackson, B. (ed.) Proceedings of SALT XII: Semantics and Linguistic Theory, pp. 246–265 (2002)
12. Vermaat, W.: The logic of variation. A cross-linguistic account of wh-question formation. PhD thesis, Utrecht Institute of Linguistics OTS, Utrecht University (2006)
Formal Parameters of Phonology: From Government Phonology to SPE

Thomas Graf
Department of Linguistics, University of California, Los Angeles
[email protected]
http://tgraf.bol.ucla.edu
Abstract. Inspired by the model-theoretic approach to phonology deployed by Kracht [25] and Potts and Pullum [32], I develop an extendable modal logic for the investigation of phonological theories operating on (richly annotated) string structures. In contrast to previous research in this vein [17, 31, 37], I ultimately strive to study the entire class of such theories rather than merely one particular incarnation thereof. To this end, I first provide a formalization of classic Government Phonology in a restricted variant of temporal logic, whose generative capacity is subsequently increased by the addition of further operators, thereby pushing it up the subregular hierarchy until one reaches the level of the regular stringsets. I identify several other axes along which Government Phonology might be generalized, moving us towards a parametric metatheory of phonology.
Like any other subfield of linguistics, phonology is home to a multitude of competing theories that differ vastly in their conceptual and technical assumptions. Contentious issues are, among others, the relation between phonology and phonetics (and if it is an interesting research question to begin with), if features are privative, binary or attribute valued, if phonological structures are strings, trees or complex matrices, if features can move from one position to another (i.e. if they are autosegments), and what role optimality requirements play in determining well-formedness. Meticulous empirical comparisons carried out by linguists have so far failed to yield conclusive results; it seems that for every phenomenon that lends support to a certain set of assumptions, there is another one that refutes it. The lack of a theoretical consensus should not be taken to indicate that the way phonologists go about their research is flawed. Unless one subscribes to the view that scientific theories can faithfully reflect reality rather than merely approximate it, it is to be expected that one theory may fail where another one succeeds, and vice versa. A similar situation arises in physics, where depending
This paper has benefited tremendously from the comments and suggestions of Bruce Hayes, Ed Keenan, Marcus Kracht, Ed Stabler, Kie Zuraw, the members of the UCLA phonology seminar (winter quarter 2009), and two anonymous reviewers.
on the circumstances light exhibits particle-like or wave-like properties. But faced with this apparent indeterminacy of theory choice, it is only natural for us to ask if there is a principled way to identify interchangeable theories, i.e. proposals which may seem to have little in common yet are underlyingly the same. This requires developing a metatheory of phonology that uses a finite set of parameters to conclusively determine the equivalence class which a given phonological theory belongs to. This paper is intended to lay the basis for such a metatheory, building on techniques and insights from model-theoretic syntax [24, 35, 36]: I develop a modal logic for the formalization of a particular theory, Government phonology (GP), and then use this modal logic and its connections to neighboring areas, foremost formal language theory, to explore natural extensions and their relation to other approaches in phonology. I feel obliged to point out in advance that I have my doubts concerning the feasibility of a formal theory of phonology that is adequate and insightful on both a linguistic and a mathematical level. But this is a problem all too familiar to mathematical linguists: any mathematically natural class of formal languages allows for constructions that never arise in natural language. For example, assignment of primary word stress is sometimes sensitive to whether a syllable is an odd or an even number of syllables away from the edge of a word (see [10] and my remarks in Sec. 2). Now in order to distinguish between odd and even, phonology has to be capable of counting modulo 2. On the other hand, phenomena that involve counting modulo 3, 4 or 21 — which from a mathematical perspective are just as simple as counting modulo 2 — are unheard of. Thus, the problem of mathematical methods in the realm of language is that their grip tends to be too loose, and the more we try to tighten it, the more difficult it becomes to prove interesting results. Undeniably, though, a loose grip is better than no grip at all. I am confident that in attempting to construct the kind of metatheory of phonology I envision, irrespective of any shortcomings it might have, we will gain crucial insights into the core claims about language that are embodied by different phonological assumptions (e.g. computational complexity and memory usage) and how one may translate those claims from one theory into another. Moreover, the explicit logical formalization of linguistic theories makes it possible to investigate various problems in an algorithmic way using techniques from proof theory and model checking. These results are relevant to linguists and computer scientists alike. Linguists get a better understanding of how their claims relate to the psychological reality of language, how the different modules of a given theory interact to yield generalizations, and how they increase the expressivity of a theory (see [32] for such results on optimality theory). To a limited degree, linguists also get the freedom to switch to different theories for specific phenomena without jeopardizing the validity of their framework of choice. Computer scientists, on the other hand, will find that the model-theoretic perspective on phonology eases the computational implementation of linguistic proposals and allows them to gauge their runtime-behavior in advance. Furthermore, they may use the connection between finite model theory and formal language theory to increase the
efficiency of their programs by picking the weakest phonological theory that is expressive enough for the task at hand. This paper is divided into two parts as follows. First, I introduce GP as an example of a weak theory of phonology and show how it can be axiomatized as a theory of richly annotated string structures using modal logic. In the second part, I analyze several parameters that distinguish GP from other proposals and might have an effect on generative capacity. In particular, I discuss how increasing the power of GP’s spreading operation moves us along the subregular hierarchy and why the specifics of the feature system have no effect on expressivity in general. I close with a short discussion of two important areas of future research, the impact of the syllable template on generative capacity and the relation between derivational and representational theories. The reader is expected to have some basic familiarity with phonology, formal language theory, non-classical logics and model-theoretic syntax. There is an abundance of introductory material for the former three, while the latter is cogently summarized in [34] and [35].
1 A Weak Theory of Phonology — Government Phonology

1.1 Informal Overview
Due to space restrictions, I offer but a sketch of the main ideas of Government Phonology (GP). More readily accessible expositions may be found in the User's Guide to Government Phonology [20] and related work of mine [10, 11]. To compensate for the terseness, the reader may want to check the explanation against the examples in Fig. 1. Before we go in medias res, though, a note on my sources is in order. Just like Government-and-Binding theory [4], GP has changed a lot since its inception and practitioners hardly ever fully specify the details of the version of GP they use. However, there seems to be a consensus that a GP-variant is considered canonical if it incorporates the following modules: government, the syllable template, coda licensing and the ECP from [21], magic licensing from [19], and licensing constraints and the revised theory of elements from [20]. My strategy will be to follow the definitions in [20] as closely as possible and fill in any gaps using the literature just cited. In GP, the carrier of all phonological structure is the skeleton, a finite, linearly ordered sequence of nodes (depicted by little crosses in Fig. 1) to which phonological expressions (PEs) can be attached in order to form the melody of the structure. A PE is built from a set E of privative features called elements, yielding a pair ⟨O, H⟩, where O ⊆ E is a set of operators, H ∈ E ∪ {∅} the head, and H ∉ O. It is an open empirical question how many features are needed for an adequate account of phonological behavior [13, 14] — recent incarnations usually set E := {A, I, U, H, L, ʔ}, but for our axiomatization the only requirement is for E to be finite. Some examples of PEs are ⟨{A, H}, ∅⟩, ⟨{L, ʔ}, A⟩, ⟨∅, ∅⟩, ⟨{I}, ∅⟩, and ⟨∅, I⟩. The set of licit PEs is
[Fig. 1. Some phonological structures in GP (with IPA notation). The tree diagrams, pairing onset (O), rhyme (R), nucleus (N) and coda (C) constituents with skeleton nodes, could not be recovered from the source.]
further restricted by language-specific licensing constraints, i.e. restrictions on the co-occurrence of features and their position in the PE. Common licensing constraints are for A to occupy only head positions, ruling out ⟨{A, H}, ∅⟩ in the list above, and for I and U not to occur in the same PE, ruling out the typologically uncommon ⟨{U}, I⟩ and ⟨{I}, U⟩, among others. As the last example above shows, every PE is inherently underspecified: whether ⟨∅, I⟩ is realized as a consonant or a vowel depends on its position in the structure, which is annotated with constituency information. An expression is realized as a vowel if it is associated to a skeleton node contained by a nucleus (N), but as a consonant if the node is contained by an onset (O) or a coda (C). Every N constitutes a rhyme (R), with C an optional subconstituent of R. All O, N and R may branch, that is be associated to up to two skeleton nodes, but a branching R must not contain a branching N. Furthermore, word-initial O can be floated, i.e. be associated to no node at all. The number of PEs per node is limited to one, with the exception of unary branching N, where the limit is two (to model light diphthongs). All phonological structures are obtained from concatenating ⟨O, R⟩ pairs according to constraints imposed by two government relations. Constituent government restricts the distribution of elements within a constituent, requiring that the leftmost PE licenses all other constituent-internal PEs. Transconstituent government enforces dependencies between the constituents themselves. In particular, every branching O has to be licensed by the N immediately following it, and every C has to be licensed by the PE contained in the immediately following O. Even though the precise licensing conditions are not fully worked out for either government relation, the general hypothesis is that PE_i licenses PE_j iff PE_i is leftmost in its constituent and contained by N, or leftmost in its constituent and composed from at most as many elements as PE_j and licenses no PE_k ≠ PE_j
(hence any C has to be followed by a non-branching O, but a branching O might be followed by a branching N or R). GP also features empty categories: a segment does not have to be associated to a PE. Inside a unary branching O, an unassociated node will always be mapped to the empty string. Inside N, on the other hand, it is either mapped to the empty string or the language-specific realization of the PE ⟨∅, ∅⟩. This is determined by the phonological ECP, which allows only p-licensed N to be mapped to the empty string. N is p-licensed if it is followed by a coda containing a sibilant (magic licensing), or in certain languages if it is the rightmost segment of the string (final empty nucleus, abbreviated FEN), or if it is properly governed [18]. N is properly governed if the first N following it is not p-licensed and no government relations hold between or within any Cs or Os in-between the two Ns. Note that segments inside C or a branching O always have to be associated to a PE. Finally, GP allows elements to spread, just as in fully autosegmental theories [9]. All elements, though, are assumed to share a single tier, and association lines are allowed to cross. The properties of spreading have not been explicitly spelled out in the literature, but it is safe to assume that it can proceed in either direction and might be optional or obligatory, depending on the element, its position in the string and the language in question. While there seem to be restrictions on the set of viable targets given a specific source, the only canonical one is a ban against spreading within a branching O.
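The feature calculus just described is compact enough to prototype directly. The following Python sketch is my own illustration (the element names follow the inventory above, with "?" standing in for the glottal element; the two licensing constraints are only the examples cited in the text, not a complete grammar):

# Sketch of GP's phonological expressions (PEs) as <operators, head> pairs.

E = {"A", "I", "U", "H", "L", "?"}   # "?" stands in for the glottal element

def is_pe(operators, head):
    """A PE is a pair <O, H> with O a subset of E, H in E or None (empty
    head), and H not itself among the operators."""
    return operators <= E and (head is None or head in E) and head not in operators

def licensed(operators, head):
    """The two sample licensing constraints mentioned in the text:
    'A must be head' and 'I and U may not co-occur in one PE'."""
    if "A" in operators:                                   # A only as head
        return False
    if {"I", "U"} <= operators | ({head} if head else set()):
        return False
    return True

print(is_pe({"A", "H"}, None), licensed({"A", "H"}, None))   # True False
print(is_pe(set(), "I"), licensed(set(), "I"))               # True True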
1.2 Formalization in Modal Logic
For my formalization, I use a very weak modal logic that can be thought of as the result of removing the "sometime in the future" and "sometime in the past" modalities from restricted temporal logic [6, 7]. Naturally, the tree model property of modal logic implies that the logic is too weak to define the intended class of models, so we are indeed dealing with a formal description rather than a proper axiomatization. Let E be some non-empty finite set of basic elements different from the neutral element v, which represents the empty set of GP's feature calculus. We define the set of elements E := (E × {1, 2} × {head, operator} × {local, spread}) ∪ ({v} × {1, 2} × {head, operator} × {local}). The intended role of the head/operator and local/spread parameter is to distinguish elements according to their position in the PE and whether they arose from a spreading operation, respectively. The second projection is of very limited use and required only by GP's rendition of light diphthongs as two PEs associated to one node in the structure. The set of melodic features M := E ∪ {μ, fake, ℓ} will be our set of propositional variables. The intention is for μ (mnemonic for mute) and ℓ to mark unpronounced and licensed segments, respectively, while fake denotes an unassociated onset. For the sake of increased readability, the set of propositional variables is "sorted" such that x ∈ M is represented by m, m ∈ E by e, heads by h, and operators by o. The variable e_n is taken to stand for any element such that π₂(e) = n, where π_i(x) returns the ith projection of x. On rare occasions, I will also mark a specific element e as occurring in head or operator position.
Furthermore, there are three nullary modalities1, N, O, C, the set of which is designated by S, read skeleton. In addition, we introduce two unary diamond operators ◇← and ◇→, whose duals are denoted by □← and □→. The set of well-formed formulas is built up in the usual way from M, S, ◇←, ◇→, → and ⊥. Our intended models M := ⟨F, V⟩ are built over bidirectional frames F := ⟨D, (R_i)_{i∈S}, R⟩, where D is an initial subset of ℕ, R_i ⊆ D for each i ∈ S, and R is the successor function over ℕ. The valuation function V : M → ℘(D) maps propositional variables to subsets of D. The definition of satisfaction is standard, though it should be noted that our models are "numbered from right to left". That is to say, 0 ∈ D marks the right edge of a structure and n + 1 is to the left of n. This is due to GP's transconstituent government being computed from right to left.

M, w |= ⊥       never
M, w |= p       iff w ∈ V(p)
M, w |= ¬φ      iff M, w ⊭ φ
M, w |= φ ∧ ψ   iff M, w |= φ and M, w |= ψ
M, w |= N       iff w ∈ R_N
M, w |= O       iff w ∈ R_O
M, w |= C       iff w ∈ R_C
M, w |= ◇←φ     iff M, w + 1 |= φ
M, w |= ◇→φ     iff M, w − 1 |= φ
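This satisfaction definition can be implemented directly; the sketch below is my own illustrative Python (the operator names Dl and Dr for the two diamonds, and the reconstructed coda-placement test, are assumptions rather than the paper's own code):

# Minimal model checker for the string logic above. A model is a pair
# (skeleton, valuation): skeleton[w] gives the constituent label of position
# w, positions numbered right to left (0 = right edge, w+1 is left of w).

def sat(model, w, phi):
    skeleton, val = model
    op = phi[0]
    if op == "var":   return w in val.get(phi[1], set())
    if op == "label": return skeleton[w] == phi[1]          # N, O or C
    if op == "not":   return not sat(model, w, phi[1])
    if op == "and":   return sat(model, w, phi[1]) and sat(model, w, phi[2])
    if op == "Dl":    # left-pointing diamond: M, w |= Dl(phi) iff M, w+1 |= phi
        return w + 1 < len(skeleton) and sat(model, w + 1, phi[1])
    if op == "Dr":    # right-pointing diamond: M, w |= Dr(phi) iff M, w-1 |= phi
        return w - 1 >= 0 and sat(model, w - 1, phi[1])
    raise ValueError(op)

# Coda placement: every C is preceded by an N and followed by an O,
# checked on the left-to-right string ONCON.
skeleton = list(reversed("ONCON"))   # index 0 is the rightmost position
model = (skeleton, {})
s7 = lambda w: (not sat(model, w, ("label", "C"))) or (
    sat(model, w, ("Dl", ("label", "N"))) and sat(model, w, ("Dr", ("label", "O"))))
print(all(s7(w) for w in range(len(skeleton))))   # True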
With the logic fully defined, we can turn to the axioms for GP. The formalization of the skeleton is straightforward if one models binary branching constituents as two adjacent unary branching ones and views rhymes as mere notational devices. Recall that Ns containing light diphthongs are implemented as a single N with both e₁ and e₂ elements associated to it.

S1 Unique constituency: ⋀_{i∈S} (i ↔ ⋀_{i≠j∈S} ¬j)
S2 Word edges: (□←⊥ → O) ∧ (□→⊥ → N)
S3 Definition of rhyme: R ↔ (N ∨ C)
S4 Nucleus placement: N → ◇←O ∨ ◇←N
S5 Binary branching onsets: O → ¬◇←O ∨ ¬◇→O
S6 Binary branching rhymes: R → ¬◇←R ∨ ¬◇→R
S7 Coda placement: C → ◇←N ∧ ◇→O

GP's feature calculus is also easy to capture. A propositional formula φ over a set of variables x₁, ..., x_k is called exhaustive iff φ := ⋀_{1≤i≤k} ψ_i, where for every i, ψ_i is either x_i or ¬x_i. A PE φ is an exhaustive propositional formula over E such that φ ∪ {F1, F2, F3, F4, h, o} is consistent.
1 I follow the terminology of [1] here. Nullary modalities correspond to unary relations and can hence be thought of as propositional constants. As far as I can see, nothing hinges on whether we treat constituent labels as nullary modalities, propositional constants, or propositional variables; my motivation in separating them from phonological features stems solely from the parallel distinction between melody and constituency in GP.
F1 Exactly one head: h_n → ⋀_{h'_n ≠ h_n} ¬h'_n
F2 No basic element (except v) twice: ¬v → (h_n → ⋀_{π₁(h)=π₁(o)} ¬o_n)
F3 v excludes other operators: v → ⋀_{o ≠ v} ¬o
F4 Pseudo branching implies first branch: e₂ → (h₁ ∧ o₁)
Let PH be the least set containing all PEs (noting that a PE is now a particular kind of propositional formula), and let lic : PH → ℘(PH) map every PE to its set of melodic licensors. Furthermore, S' ⊆ PH designates the set of PEs occurring in the codas of magic licensing configurations (the letter is mnemonic for "sibilants"). The following five axioms, then, sufficiently restrict the melody.

M1 Universal annotation: ⋁_{i∈S} i → ⋁_{φ∈PH} φ ∨ μ ∨ fake
M2 No pseudo branching for O, C and branching N: (O ∨ C ∨ ◇←N ∨ ◇→N) → ¬e₂
M3 Licensing within branching onsets: (O ∧ ◇←O) → ⋀_{φ∈PH} (φ → ◇←⋁_{ψ∈lic(φ)} ψ)
M4 Melodic coda licensing: (C ∧ ⋀_{i∈S'} ¬i) → ◇→¬μ ∧ ⋀_{φ∈PH} (φ → ◇→⋁_{ψ∈lic(φ)} ψ)
M5 Fake onsets: fake → O ∧ ⋀_{m≠fake} ¬m

Remember that GP allows languages to impose further restrictions on the melody by recourse to licensing constraints. It is easy to see that licensing constraints operating on single PEs can be captured by propositional formulas. The licensing constraint "A must be head", for instance, corresponds to the propositional formula ¬A with A in operator position. Licensing constraints that extend beyond a single segment can be modeled using ◇← and ◇→, provided their domain of application is finitely bounded (see the discussion on spreading below for further details). Thus licensing constraints pose no obstacle to formalization in our logic, either. As mentioned above, I use μ to mark "mute" segments that will be realized as the empty string. The distribution of μ is simple for O and C — the latter never allows it, and the former only if it is unary branching and followed by a pronounced N. For N, on the other hand, we first need to distribute ℓ in a principled manner across the string to mark the licensed nuclei, i.e. those N that may remain unpronounced. Note that unpronounced segments may not contain any other elements (which would affect spreading).

L1 Empty categories: μ → ⋀_{m∉{μ,ℓ}} ¬m ∧ ¬C ∧ (N → ℓ)
L2 No partially mute branching nuclei: (N ∧ ◇←N) → (μ ↔ ◇←μ)
L3 Mute onsets: (O ∧ μ) → ¬◇←O ∧ ◇→(N ∧ ¬μ)
L4 P-licensing: (N ∧ ℓ) ↔ ◇→(C ∧ ⋁_{i∈S'} i)   [Magic Licensing]
   ∨ (¬◇←N ∧ □→⊥)   [FEN]
   ∨ ((◇←¬N → (◇←◇←N ∨ ◇←□←⊥)) ∧ (◇→¬N → ◇→◇→(N ∧ ¬μ)))   [Proper Government]
Axiom L4 looks daunting at first, but it is easy to unravel. The magic licensing condition tells us that N is licensed if it is followed by a sibilant in coda position.2 The FEN condition ensures that word-final N are licensed if they are non-branching. The proper government condition is the most complex one, though it is actually simpler than the original GP definition. Remember that N is properly governed if the first N following it is pronounced and neither a branching onset nor a coda intervenes. Also keep in mind that we treat a binary branching constituent as two adjacent unary branching constituents. The proper government condition then enforces a structural requirement such that N (or the first N if we are talking about two adjacent N) may not be preceded by two constituents that are not N, and (the second) N may not be followed by two constituents that are not N or not pronounced. Together with axioms S1–S7, this gives the same results as the original constraint.3 The last module, spreading, is also the most difficult to accommodate. Most properties of spreading are language specific — only the set of spreadable features and the ban against onset internal spreading are universal. To capture this variability, I define a general spreading scheme σ with six parameters i, j, ω, ρ, min and max.
σ := ⋀_{π₁(i)=π₁(j)} (i ∧ ω → ⋁_{n=min}^{max} ♦ⁿ(j ∧ ρ) ∧ (O ∧ ♦O → ⋁_{n=min+1}^{max} ♦ⁿ(j ∧ ρ)))
The variables i, j ∈ E, coupled with judicious use of the formulas ω and ρ, regulate the optionality of spreading. If spreading is optional, i is a spread element and ω, ρ are formulas describing, respectively, the structural configuration of the target of spreading and the set of licit sources for spreading operations to said target. If spreading is mandatory, then i is a local element and ω, ρ describe the source and the set of targets. If we want spreading to be mandatory in only those cases where a target is actually available, ω has to contain the subformula ⋁_{n=min}^{max} ♦ⁿρ. Observe moreover that we need to make sure that every structural configuration is covered by some ω, so that unwanted spreading can be blocked by making ω not satisfiable. As further parameters, the finite values min, max > 0 encode the minimum and maximum distance of spreading, respectively. Finally, the operator ♦ ∈ {◇←, ◇→} fixes the direction of spreading for the entire formula (♦ⁿ is the n-fold iteration of ♦). With optional spreading, the direction of the operator is opposite to the direction of spreading, otherwise they are identical. The different ways of interaction between the parameters are summarized in Table 1.

2 Note that we can easily restrict the context, if this appears to be necessary for empirical reasons. Strengthening the condition to ◇→(C ∧ ⋁_{i∈S'} i ∧ □→⊥), for example, restricts magic licensing to the N occupying the second position in the string.

3 In this case, the modal logic is once again flexible enough to accommodate various alternatives. For instance, if proper government should be limited to non-branching Ns, one only has to replace both occurrences of → by ∧. Also, my formalization establishes no requirement for a segment to remain silent, because N often are pronounced in magic licensing configurations or at the end of a word in a FEN language. For proper government, however, it is sometimes assumed that licensed nuclei have to remain silent, giving rise to a strictly alternating pattern of realized and unrealized Ns. If we seek to accommodate such a system, we have to distinguish Ns that are magically licensed or FEN licensed from Ns that are licensed by virtue of being properly governed. The easiest way to do so is to split ℓ into two features ℓo and ℓm (optional and mandatory), the latter of which is reserved for properly governed Ns. The simple formula ℓm → μ will force such Ns to remain unpronounced.

Table 1. Parameterization of spreading patterns with respect to σ

Mode       Direction  i       ♦    ω       ρ
optional   left       spread  ◇→   target  source
optional   right      spread  ◇←   target  source
mandatory  left       local   ◇←   source  target
mandatory  right      local   ◇→   source  target
2 2.1
The Parameters of Phonological Theories Elaborate Spreading — Increasing the Generative Capacity
It is easy to see that the modal logic defined in the previous section is powerful enough to account for all finitely bounded phonological phenomena (I hasten to add that this does not imply that GP itself can account for all of them, since
certain phenomena might be ruled out by, say, the syllable template or the ECP). In fact, it is even possible to accommodate many long-distance phenomena in a straight-forward way, provided that they can be reinterpreted as arising from iterated application of finitely bounded processes or conditions. Consider for example a stress rule for language L that assigns primary stress to the last syllable that is preceded by an even number of syllables. Assume furthermore that secondary stress in L is trochaic, that is to say it falls on every odd syllable but the last one. Let 1 and 2 stand for primary and secondary stress, respectively. Unstressed syllables are assigned the feature 0. Then the following formula will ensure the correct assignment of primary stress, even though the notion of being separated from the left word edge by an even number of syllables is unbounded (for the sake of simplicity, I assume that every node in the string represents a syllable; it is an easy but unenlightening exercise to rewrite the formula for a GP syllable template consisting of Os, Ns and Cs).
⋁_{i∈{0,1,2}} i ∧ ⋀_{i≠j∈{0,1,2}} ¬(i ∧ j) ∧ (□←⊥ → 1 ∨ 2) ∧ (2 → ◇→0) ∧ (0 → ◇→(1 ∨ 2) ∨ □→⊥) ∧ (1 → ¬◇→1 ∧ (□→⊥ ∨ ◇→□→⊥))

Other seemingly unbounded phenomena arising from iteration of local processes, most importantly vowel harmony (see [3] for a GP analysis), can be captured in a similar way. However, there are several unbounded phonological phenomena that require increased expressivity, as I discuss in detail in [10]. Since we are only concerned with string structures, it is a natural move to try to enhance our language with operators from more powerful string logics, in particular, linear temporal logic. The first step is the addition of two operators ◇⁺← and ◇⁺→ with the corresponding relation R⁺, the transitive closure of R. This new logic is exactly as powerful as restricted temporal logic [6], which in turn has been shown to exactly match the expressivity of the two-variable fragment of first-order logic ([7]; see [44] for further equivalence results). Among other things, unbounded OCP effects [9, 26] can now be captured in an elegant way. The formula (O ∧ A ∧ L ∧ ʔ) → □⁺→¬(O ∧ A ∧ ʔ), for example, disallows alveolar nasals to be followed by another alveolar stop, no matter how far the two are apart. But ◇⁺← and ◇⁺→ are too coarse for faithful renditions of unbounded spreading. For example, it is not possible to define all intervals of arbitrary size within which a certain condition has to hold (e.g. no b may appear between a and c). As a remedy, we can add to the logic the until and since operators U and S familiar from linear temporal logic, granting us the power of full first-order logic and pushing us to the level of the star-free languages [5, 6, 29, 41]. Star-free languages feature a plethora of properties that make them very attractive for purposes of natural language processing. Moreover, the only phenomenon known to the author that exceeds their confines is stress assignment in Cairene Arabic and Creek, which basically works like the stress assignment system outlined above — with the one exception that secondary stress is not marked overtly [12, 30]. Under these conditions, assigning primary stress involves counting modulo 2,
which is undefinable in first-order logic, whence a more powerful logic is needed. The next step up from the star-free stringsets are the regular stringsets, which can count modulo n. The regular stringsets are identical to the sets of finite strings definable in monadic second order logic (MSO) [2], linear temporal logic with modal fixed point operators [43] or regular linear temporal logic [27]. In linguistic terms, this corresponds to spreading being capable of picking its target based on more elaborate patterns, counting modulo 2 being one of them. For further discussion of the relation between expressivity and phenomena in natural language phonology, the reader is once again referred to [10]. A caveat is in order, though. Thatcher [40] proved that every recognizable set is a projection of some local set. Thus the hierarchy outlined above collapses if we grant ourselves an arbitrary number of additional features to encode all the structural properties our logic cannot express. In the case of primary stress in Cairene Arabic and Creek, for instance, we could just use the feature for secondary stress assignment even though secondary stress seems to be absent in these languages. Generally speaking, we can reinterpret any unbounded dependency as a result of iterated local processes by using "invisible" features. Therefore, all claims about generative capacity hold only under the proviso that all such coding features are being eschewed. We have just seen that the power of GP can be extended along the subregular hierarchy, up to the power of regular languages, and that there seems to be empirical motivation to do so. Interestingly, it has been observed that SPE yields regular languages, too [15, 17]. But even the most powerful rendition of GP defines only a proper subset of the stringsets derivable in SPE, apparently due to its restrictions on the feature system, the syllable template and its government requirements. The question we face, then, is whether we can generalize GP in these regards, too, to push it to the full power of SPE and obtain a multidimensional vector space of phonological theories.
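To see why such stress systems demand counting modulo 2 but nothing stronger, consider the following Python sketch (my own illustration): two states of memory suffice to locate the last syllable preceded by an even number of syllables.

# Illustrative parity computation for the Cairene Arabic/Creek-style rule:
# primary stress falls on the last syllable preceded by an even number of
# syllables, i.e. the last even-indexed (0-based) syllable.

def primary_stress(n_syllables):
    """Return the 0-based index of the stressed syllable by tracking parity."""
    parity, last_even = 0, 0
    for i in range(n_syllables):
        if parity == 0:        # an even number of syllables precede syllable i
            last_even = i
        parity = 1 - parity    # two states suffice: counting modulo 2
    return last_even

print([primary_stress(n) for n in range(1, 7)])   # [0, 0, 2, 2, 4, 4]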
2.2 Feature Systems
It is easy to see that at the level of classes of theories, the restriction to privative features is immaterial. A set of PEs is denoted by some propositional formula over E, and the boolean closure of E is isomorphic to ℘(E). But as shown in [22], a binary feature system using a set of features F can be modeled by the powerset algebra ℘(F), too. So if |E| = |F|, then ℘(E) and ℘(F) are isomorphic, and so are the two feature systems. The same result holds for systems using more than two feature values, provided their number is finitely bounded, since multivalued features can be replaced by a collection of binary valued features given sufficient co-occurrence restrictions on feature values (which can easily be formalized in propositional logic). One might argue, though, that the core restriction of privative feature systems does not arise from the feature system itself but from the methodological principle that absent features, i.e. negative feature values, behave like constituency information and cannot spread. In general, though, this is not a substantial restriction either, as for every privative feature system E we can easily design a
privative feature system F := {e⁺, e⁻ | e ∈ E} such that M, w |= e⁺ iff M, w |= e and M, w |= e⁻ iff M, w |= ¬e. Crucially, though, this does not entail that the methodological principle described above has no impact on expressivity when the set of features is fixed across all theories, which is an interesting issue for future research.
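The encoding is easily made concrete; the snippet below (my own illustration, with a toy inventory E) constructs F from E and shows that a binary feature assignment and its privative image determine each other.

# Illustrative encoding of a binary feature system over E as a privative one:
# F := {e+, e- | e in E}, with e+ holding where e does and e- where it doesn't.

E = {"A", "I", "U"}

def to_privative(segment_true):    # segment_true: binary features set to +
    return {e + "+" for e in segment_true} | {e + "-" for e in E - segment_true}

def from_privative(priv):
    return {e for e in E if e + "+" in priv}

seg = {"A", "U"}
assert from_privative(to_privative(seg)) == seg
print(to_privative(seg))   # e.g. {'A+', 'U+', 'I-'}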
2.3 Syllable Template
While GP's syllable template could in principle be generalized to arbitrary numbers and sizes of constituents, a look at competing theories such as SPE and CVCV [28, 38] shows that the number of different constituents is already more than sufficient. This is hardly surprising, because GP's syllable template is modeled after the canonical syllable template, which isn't commonly considered to be in need of further refinement. Consequently, we only need to lift the restriction on the branching factor and allow theories not to use all three constituent types. SPE then operates with a single N constituent of unbounded size (as no segment in SPE requires special licensing, just like Ns in GP), whereas CVCV uses N and O constituents of size 1. Regarding the government relations, the idea is to let every theory fix the branching factor b for each constituent and the maximum number l of licensees per head. Every node within some constituent has to be constituent licensed by the head, i.e. the leftmost node of said constituent. Similarly, all nodes in a coda or non-head position have to be transconstituent licensed by the head of the following constituent. For every head the number of constituent licensees and transconstituent licensees, taken together, may not exceed l. Even from this basic sketch it should already be clear that the syllable template can have a negative impact on expressivity, but only under the right conditions. For instance, if our feature system is set up in a way such that every symbol of our alphabet is to be represented by a PE in N (as happens to be the case for SPE), restrictions on b and l are without effect. Thus one of the next stages in this project will revolve around determining under which conditions the syllable template has a monotonic effect on generative capacity.
2.4 Representations versus Derivations
One of the most striking differences between phonological theories is the distinction between representational and derivational ones, which begs the question how we can ensure comparability between these two classes. Representational theories are naturally captured by the declarative, model-theoretic approach, whereas derivational theories like SPE are usually formalized as regular relations [17, 31], which resist being recast in logical terms due to their closure properties. This problem is aggravated by the fact that Optimality Theory [33], which provides the predominant framework in contemporary phonology, is also best understood in terms of regular relations [8, 16]. Of course, one can use a coding trick from two-level phonology [23] and use an unpronounced feature like μ to ensure that
all derivationally related strings have the same length, so that the regular relations can be interpreted as languages over pairs and hence cast in MSO terms [42]. Unfortunately, it is far from obvious how this method could be extended to subregular grammars, because Thatcher’s theorem tells us that the projection of a subregular language of pairs might be a regular language. But due to the ubiquity of SPE and OT analyses in phonology, no other open issue is of greater importance to the success of this project.
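The coding trick itself is straightforward to state in code; the following sketch (my own illustration, ignoring the question of how underlying and surface material are aligned) pads the shorter string with a mute symbol and reads the result as a string of symbol pairs.

# Illustrative two-level encoding: pad derivationally related strings to equal
# length with the mute symbol, then read the pair of strings as a single
# string over pair-symbols, so a regular relation becomes a regular language.

MU = "_"   # stands in for the unpronounced feature mu

def as_pair_string(underlying, surface):
    n = max(len(underlying), len(surface))
    u = underlying.ljust(n, MU)
    s = surface.ljust(n, MU)
    return list(zip(u, s))

print(as_pair_string("kat", "ka"))   # [('k','k'), ('a','a'), ('t','_')]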
3 Conclusion
The purpose of this paper was to lay the foundation for a general framework in which string-based phonological theories can be matched against each other. I started out with a modal logic which despite its restrictions was still perfectly capable of defining a rather advanced and intricate phonological theory. I then tried to generalize the theory along several axes, some of which readily lent themselves to conclusive results while others didn't. We saw that the power of spreading, by virtue of being an indicator of the necessary power of the description language, has an immediate and monotonic effect on generative capacity. Feature systems, on the other hand, were shown to be a negligible factor in theory comparisons; it remains an open question if the privativity assumption might affect generative capacity when the set of features is fixed. A detailed study of the effects of the syllable template also had to be deferred to later work. Clearly the most pressing issue, though, is the translation from representational to derivational theories. Not only will it enable us to reconcile two supposedly orthogonal perspectives on phonology, but it also allows us to harvest results on finite-state OT [8] to extend the framework to optimality theory. Even though a lot of work remains to be done and not all of my goals may turn out to be achievable, I am confident that a model-theoretic approach provides an interesting new perspective on long-standing issues in phonology.
References

[1] Blackburn, P., de Rijke, M., Venema, Y.: Modal Logic. Cambridge University Press, Cambridge (2002)
[2] Büchi, J.R.: Weak second-order arithmetic and finite automata. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik 6, 66–92 (1960)
[3] Charette, M., Göksel, A.: Licensing constraints and vowel harmony in Turkic languages. SOAS Working Papers in Linguistics and Phonetics 6, 1–25 (1996)
[4] Chomsky, N.: Lectures on Government and Binding: The Pisa Lectures. Foris, Dordrecht (1981)
[5] Cohen, J.: On the expressive power of temporal logic for infinite words. Theoretical Computer Science 83, 301–312 (1991)
[6] Cohen, J., Perrin, D., Pin, J.E.: On the expressive power of temporal logic. Journal of Computer and System Sciences 46, 271–294 (1993)
[7] Etessami, K., Vardi, M.Y., Wilke, T.: First-order logic with two variables and unary temporal logic. In: Proceedings of the 12th Annual IEEE Symposium on Logic in Computer Science, pp. 228–235 (1997)
[8] Frank, R., Satta, G.: Optimality theory and the generative complexity of constraint violability. Computational Linguistics 24, 307–315 (1998)
[9] Goldsmith, J.: Autosegmental Phonology. Ph.D. thesis, MIT (1976)
[10] Graf, T.: Comparing incomparable frameworks: A model theoretic approach to phonology. In: University of Pennsylvania Working Papers in Linguistics, Article 10, vol. 16 (2010), http://repository.upenn.edu/pwpl/vol16/iss1/10
[11] Graf, T.: Logics of Phonological Reasoning. Master's thesis, University of California, Los Angeles (2010)
[12] Haas, M.R.: Tonal accent in Creek. In: Hyman, L.M. (ed.) Southern California Occasional Papers in Linguistics, vol. 4, pp. 195–208. University of Southern California, Los Angeles (1977), reprinted in [39]
[13] Harris, J., Lindsey, G.: The elements of phonological representation. In: Durand, J., Katamba, F. (eds.) Frontiers of Phonology, pp. 34–79. Longman, Harlow (1995)
[14] Jensen, S.: Is ʔ an element? Towards a non-segmental phonology. SOAS Working Papers in Linguistics and Phonetics 4, 71–78 (1994)
[15] Johnson, C.D.: Formal Aspects of Phonological Description. Mouton, The Hague (1972)
[16] Jäger, G.: Gradient constraints in finite state OT: The unidirectional and the bidirectional case. In: Kaufmann, I., Stiebels, B. (eds.) More than Words. A Festschrift for Dieter Wunderlich, pp. 299–325. Akademie Verlag, Berlin (2002)
[17] Kaplan, R.M., Kay, M.: Regular models of phonological rule systems. Computational Linguistics 20(3), 331–378 (1994)
[18] Kaye, J.: Government in phonology: the case of Moroccan Arabic. The Linguistic Review 6, 131–159 (1990)
[19] Kaye, J.: Do you believe in magic? The story of s+C sequences. Working Papers in Linguistics and Phonetics 2, 293–313 (1992)
[20] Kaye, J.: A user's guide to government phonology (2000) (unpublished manuscript), http://134.59.31.7/~scheer/scan/Kaye00guideGP.pdf
[21] Kaye, J., Lowenstamm, J., Vergnaud, J.R.: Constituent structure and government in phonology. Phonology Yearbook 7, 193–231 (1990)
[22] Keenan, E.: Mathematical structures in language, ms. University of California, Los Angeles (2008)
[23] Koskenniemi, K.: Two-level morphology: A general computational model for word-form recognition and production. Publication 11 (1983)
[24] Kracht, M.: Syntactic codes and grammar refinement. Journal of Logic, Language and Information 4, 41–60 (1995)
[25] Kracht, M.: Features in phonological theory. In: Löwe, B., Malzkorn, W., Räsch, T. (eds.) Foundations of the Formal Sciences II, Applications of Mathematical Logic in Philosophy and Linguistics. Trends in Logic, vol. 17, pp. 123–149. Kluwer, Dordrecht (2003); papers of a conference held in Bonn (November 11–13, 2000)
[26] Leben, W.: Suprasegmental Phonology. Ph.D. thesis, MIT (1973)
[27] Leucker, M., Sánchez, C.: Regular linear temporal logic. In: Jones, C.B., Liu, Z., Woodcock, J. (eds.) ICTAC 2007. LNCS, vol. 4711, pp. 291–305. Springer, Heidelberg (2007)
[28] Lowenstamm, J.: CV as the only syllable type. In: Durand, J., Laks, B. (eds.) Current Trends in Phonology: Models and Methods, pp. 419–421. European Studies Research Institute, University of Salford (1996)
[29] McNaughton, R., Papert, S.: Counter-Free Automata. MIT Press, Cambridge (1971)
[30] Mitchell, T.F.: Prominence and syllabification in Arabic. Bulletin of the School of Oriental and African Studies 23(2), 369–389 (1960)
[31] Mohri, M., Sproat, R.: An efficient compiler for weighted rewrite rules. In: 34th Annual Meeting of the Association for Computational Linguistics, pp. 231–238 (1996)
[32] Potts, C., Pullum, G.K.: Model theory and the content of OT constraints. Phonology 19(4), 361–393 (2002)
[33] Prince, A., Smolensky, P.: Optimality Theory: Constraint Interaction in Generative Grammar. Blackwell, Oxford (2004)
[34] Pullum, G.K.: The evolution of model-theoretic frameworks in linguistics. In: Rogers, J., Kepser, S. (eds.) Model-Theoretic Syntax @ 10, pp. 1–10 (2007)
[35] Rogers, J.: A model-theoretic framework for theories of syntax. In: Proceedings of the 34th Annual Meeting of the ACL, Santa Cruz, USA, pp. 10–16 (1996)
[36] Rogers, J.: Strict LT2: Regular: Local: Recognizable. In: Retoré, C. (ed.) LACL 1996. LNCS (LNAI), vol. 1328, pp. 366–385. Springer, Heidelberg (1997)
[37] Russell, K.: A Constraint-Based Approach to Phonology. Ph.D. thesis, University of Southern California (1993)
[38] Scheer, T.: A Lateral Theory of Phonology: What is CVCV and Why Should it be? Mouton de Gruyter, Berlin (2004)
[39] Sturtevant, W.C. (ed.): A Creek Source Book. Garland, New York (1987)
[40] Thatcher, J.W.: Characterizing derivation trees for context-free grammars through a generalization of finite automata theory. Journal of Computer and System Sciences 1, 317–322 (1967)
[41] Thomas, W.: Star-free regular sets of ω-sequences. Information and Control 42, 148–156 (1979)
[42] Vaillette, N.: Logical specification of regular relations for NLP. Natural Language Engineering 9(1), 65–85 (2003)
[43] Vardi, M.Y.: A temporal fixpoint calculus. In: Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 250–259 (1988)
[44] Weil, P.: Algebraic recognizability of languages. In: Fiala, J., Koubek, V., Kratochvíl, J. (eds.) MFCS 2004. LNCS, vol. 3153, pp. 149–175. Springer, Heidelberg (2004)
Variable Selection in Logistic Regression: The British English Dative Alternation

Daphne Theijssen

Centre for Language Studies, Radboud University Nijmegen, Erasmusplein 1, 6525 HT Nijmegen, The Netherlands
[email protected]
http://lands.let.ru.nl/~daphne
Abstract. This paper addresses the problem of selecting the ‘optimal’ variable subset in a logistic regression model for a medium-sized data set. As a case study, we take the British English dative alternation, where speakers and writers can choose between two – equally grammatical – syntactic constructions to express the same meaning. With 29 explanatory variables taken from the literature, we build two types of models: one with the verb sense included as a random effect, and one without a random effect. For each type, we build three different models by including all variables and keeping the significant ones, by successively adding the most predictive variable (forward selection), and by successively removing the least predictive variable (backward elimination). Seeing that the six approaches lead to six different variable selections (and thus six different models), we conclude that the selection of the ‘best’ model requires a substantial amount of linguistic expertise.
1 Introduction
There are many linguistic phenomena that researchers have tried to explain on the basis of features on several different levels of description (semantic, syntactic, lexical, etc.), and it can be argued that no single level can account for all observations. Probabilistic modelling techniques can help in combining these partially explanatory features and testing the combination on corpus data. A popular – and rather successful – technique for this purpose is logistic regression modelling. However, how exactly the technique is best employed for this type of research remains an open question. Statistical models built using corpus data do precisely what they are designed to do: find the 'best possible' model for a specific data set given a specific set of explanatory features. The issue that probabilistic techniques model data (while one would actually want to model underlying processes) is only aggravated by the fact that the variables are usually not mutually independent. As a consequence, one set of data and explanatory features can result in different models, depending on the details of the model building process. Building a regression model consists of three main steps: (1) deciding which of the available explanatory features should actually be included as variables in
the model, (2) establishing the coefficients (weights) for the variables, and (3) evaluating the model. The first step is generally referred to as variable selection and is the topic of the current paper. Steps (1) and (3) are clearly intimately related. Researchers have employed at least three different approaches to variable selection: (1) first building a model on all available explanatory features and then keeping/reporting those that have a significant contribution (e.g. [3]), (2) successively adding the most explanatory feature (forward), until no significant gain in model accuracy¹ is obtained anymore (e.g. [9]), and (3) starting with a model containing all available features, and (backward) successively removing those that yield the smallest contribution, as long as the accuracy of the model is not significantly reduced (e.g. [2]). In general, researchers report on only one (optimal) model without giving clear motivations for their choice of the procedure used. In this paper, we compare the three approaches in a case study: we apply them to a set of 930 instances of the British English dative alternation, taken from the British component of the ICE Corpus. In the dative alternation, speakers choose between the double object (1) and the prepositional dative construction (2).

1. She handed the student the book.
2. She handed the book to the student.

The explanatory features (explanations suggested in the literature) are taken from Bresnan et al.'s work on the dative alternation in American English [3]. Previous research (e.g. [8,3]) has indicated that the verb or verb sense often predicts a preference for one of the two constructions. However, contrary to the fourteen explanatory features suggested by Bresnan et al., which can be treated as fixed variables because of their small number of values (often only two), verb sense has so many different values that it cannot be treated as a fixed variable in a regression model. Recently developed logistic regression models can handle variables with too many values by treating these as random effects (cf. [18]). In order to examine the effect of building such mixed models, we create models with and without a random effect in each of the three approaches to variable selection described above. This leads to a total of six different models. Our goal is to investigate whether it is justified to report only one 'optimal' regression model, if models can be built in several different ways. We will also pay attention to the role of a random effect in a model of syntactic variation built with a medium-sized set of observations. The case of the British English dative alternation is used to illustrate the issues and results. The structure of this paper is as follows: A short overview of the related work can be found in Section 2. The data is described in Section 3. In Section 4, we explain the method applied. The results are shown and discussed in Section 5. In the final Section (6), we present our conclusions.
¹ Obviously, the accuracy measure will also have considerable impact on the result.
2 Related Work

2.1 The Dative Alternation
Bresnan et al. [3] built various logistic regression models for the dative alternation based on 2360 instances they extracted from the three-million-word Switchboard Corpus of transcribed American English telephone dialogues [5]. With the help of a mixed-effect logistic regression model, or mixed model, with verb sense as a random effect, they were able to explain 95% of the variation. They defined the verb sense as the verb lemma together with its semantic verb class. The semantic verb class is either 'abstract' (e.g. give it some thought), 'communication' (e.g. tell him a story), 'transfer of possession' (e.g. give him the book), 'prevention of possession' (e.g. deny him the money) or 'future transfer of possession' (e.g. promise him help). To test how well the model generalizes to previously unseen data, they built a model on 2000 instances randomly selected from the total set, and tested on the remaining 360 cases. Repeating this 100 times, 94% of the test cases on average were predicted correctly. Many of the variables in the model concern the two objects in the construction (the student and the book in examples 1 and 2). In the prepositional dative construction, the object first mentioned is the theme (the book), and the second object the recipient (the student). In the double object construction, the recipient precedes the theme. Bresnan et al. found that the first object is typically (headed by) a pronoun, mentioned previously in the discourse (given), animate, definite and shorter (in number of words) than the second object. The characteristics of the second object are generally the opposite: non-pronominal, new, inanimate, indefinite and longer. According to Haspelmath [10], there is a slight difference between the dative alternation as it occurs in British English and in American English. When the theme is a pronoun, speakers of American English tend to allow only the prepositional dative construction. In British English, clauses such as She gave me it and even She gave it me are also acceptable. Haspelmath provides no evidence for these claims (neither from corpora nor from psycholinguistic experiments). He refers to Siewierska and Hollmann [17], who present frequency counts in various corpora of Lancashire (British) English: of the 415 instances of the dative alternation they found, 8 were of the pattern She gave me it, and 15 of She gave it me. It must be expected that such differences between language variants result in different behaviour of the variables in the corresponding models, and inappropriate approaches to variable selection may obscure this kind of 'real' difference. Gries [7] performed analyses with multiple variables that are similar to those in Bresnan et al. [3], but applied a different technique (linear discriminant analysis or LDA) on a notably smaller data set consisting of only 117 instances from the British National Corpus [4]. The LDA model is trained on all instances, and is able to predict 88.9% of these cases correctly (with a majority baseline of 51.3%). There is no information on how the model performs on previously unseen data.
Gries and Stefanowitsch [8] investigated the effect of the verb in 1772 instances from the ICE-GB Corpus [6]. When predicting the preferred dative construction for each verb (not taking into account the separate senses), 82.2% of the constructions could be predicted correctly. Using verb bias as a predictor thus outperforms the majority baseline of 65.0%.

2.2 Variable Selection in Logistic Regression
Variable selection in building logistic regression models is an extremely important issue, for which no hard and fast solution is available. In [11, chapter 5] it is explained that variable selection is often needed to arrive at a model that reaches an acceptable prediction accuracy and is still interpretable in terms of some theory about the role of the independent variables. Keeping too many variables may lead to overfitting, while a simpler model may suffer from underfitting. The risk of applying variable selection is that one optimizes the model for a particular data set. Using a slightly different data set may result in a very different variable subset. Previous studies aimed at creating logistic regression models to explain linguistic phenomena have used various approaches to variable selection. Grondelaers and Speelman [9], for instance, successively added the most predictive variables to an empty model, while Blackwell [2] successively eliminated the least predictive variables from the full model. The main criticisms of these methods are (1) that the results are difficult to interpret when the variables are highly correlated, (2) that deciding which variable to remove or add is not trivial, (3) that all methods may result in different models that may be sub-optimal in some sense, and (4) that each provides a single model, while there may be more than one ‘optimal’ subset [11]. A third approach to variable selection used in linguistic research is keeping only the significant variables in a complete model (cf. Bresnan et al. [3]). This is also what Sheather suggests in [16, chapter 8]. Before building a model, however, he studies plots of the variables to select those that he expects to contribute to the model. Where beneficial, he transforms the variables to give them more predictive power (e.g. by taking their log). After these preprocessing steps he builds a model containing all the selected variables, removes the insignificant ones, and then builds a new model. As indicated by Izenman [11], variable selection on the basis of a data set may lead to a model that is specific for that particular set. Since we want to be able to compare our models to those found by Bresnan et al. [3], who did not employ such transformations, we refrain from such preprocessing and we set out using the same set of variables they used in the variable selection process. Yet another approach mentioned in [11] is to build all models with each possible subset and select those with the best trade-off between accuracy, generalisability and interpretability. An important objection to this approach is that it is computationally expensive to carry out, and that decisions about interpretability may suffer from theoretical prejudice. For these reasons, we do not employ this method.
3 Data
Despite the fact that a number of researchers have studied the dative alternation in English (see Section 2.1), none of the larger data sets used is available in such a form that it enables the research in this paper.² We therefore established our own set of instances of the dative alternation in British English. Since we study a syntactic phenomenon, it is convenient to employ a corpus with detailed (manually checked) syntactic annotations. We selected the one-million-word British component of the ICE Corpus, the ICE-GB, containing both written and (transcribed) spoken language [6]. We used a Perl script to automatically extract potentially relevant clauses from the ICE-GB. These were clauses with an indirect and a direct object (double object) and clauses with a direct object and a prepositional phrase with the preposition to (prepositional dative). Next, we manually checked the extracted sets of clauses and removed irrelevant clauses such as those where the preposition to had a locative function (as, for example, in Fold the short edges to the centre.). Following Bresnan et al. [3], we ignored constructions with a preposition other than to, with a clausal object, with passive voice and with reversed constructions (e.g. She gave it me). To further limit the influence of the syntactic environment of the construction, we decided to exclude variants in imperative and interrogative clauses, as well as those with phrasal verbs (e.g. to hand over). Coordinated verbs or verb phrases were also removed. The characteristics of the resulting data sets can be found in Table 1.

Table 1. Characteristics of the 930 instances taken from the ICE-GB Corpus

Medium                    Double object   Prep. dative   Total
Spoken British English    406             152            558
Written British English   266             106            372
Total                     672             258            930
4 Method

4.1 Explanatory Features
We adopt the explanatory features and their definitions from Bresnan et al. [3] (Table 2), and manually annotate our data set following an annotation manual based on these definitions.³ Our set includes one feature that was not used in [3]: medium, which tells us whether the construction was found in written or spoken text. It may well be

² Although most of the data set used in [3] is available through the R package languageR, the original sentences and some annotations are not publicly available because they are taken from an unpublished, corrected version of the Switchboard Corpus.
³ The annotation manual is available online: http://lands.let.ru.nl/~daphne/downloads.html
Table 2. Explanatory features (th=theme, rec=recipient). All nominal explanatory features are transformed into binary variables with values 0 and 1.

Feature                      Values      Description
1. rec = animate             1, 0        human or animal, or not
2. th = concrete             1, 0        with fixed form and/or space, or not
3. rec = definite            1, 0        definite pronoun, proper name or noun preceded by definite determiner, or not
4. th = definite             1, 0        Id.
5. rec = given               1, 0        mentioned/evoked ≤20 clauses before, or not
6. th = given                1, 0        Id.
7. length difference         −3.4–4.2    ln(#words in th) − ln(#words in rec)
8. rec = plural              1, 0        plural in number, or not (singular)
9. th = plural               1, 0        Id.
10. rec = local              1, 0        first or second person (I, you), or not
11. rec = pronominal         1, 0        headed by a pronoun, or not
12. th = pronominal          1, 0        Id.
13. verb = abstract          1, 0        give it some thought is abstract,
    verb = communication     1, 0        tell him a story is communication,
    verb = transfer          1, 0        give him the book is transfer
14. structural parallelism   1, 0        preceding instance is prep. dative, or not
15. medium = written         1, 0        type of data is written, or not (spoken)
that certain variables only play a role in one of the two media. In order to test this, we include the 14 (two-way) interactions between the features taken from Bresnan et al. and the medium.⁴ Together with the feature medium itself, this yields a total number of 29 features. As mentioned in the Introduction, we will build models with and without including verb sense as a random effect. Following [3], we define the verb sense as the lemma of the verb together with its semantic class, e.g. pay a for pay with an abstract meaning (pay attention) and pay t when pay is used to describe a transfer of possession (pay $10). In total, our data set contains 94 different verb senses (derived from 65 different verbs). The distribution of the verb senses with 5 or more occurrences can be found in Table 3. As predicted by Gries and Stefanowitsch [8], many verbs show a bias towards one of the two constructions. The verb give, for instance, shows a bias for the double object construction, and sell for the prepositional dative construction. Only for pay and send does the bias differ between senses. For example, pay shows a clear bias towards the prepositional dative construction when it has an abstract meaning, but no bias when transfer of possession is meant. Nevertheless, we follow the approach in [3] by taking the verb sense, not the verb, as the random effect.

⁴ We are aware of the fact that there are other ways to incorporate the medium in the regression models, for instance by building separate models for the written and the spoken data. Since the focus of this paper is on the three approaches in combination with the presence or absence of a random effect, we will limit ourselves to the method described.
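Models of both types can be fitted directly in R. The sketch below is illustrative rather than the paper's actual code: it assumes a data frame d with a binary response construction (1 = prepositional dative), a few of the predictors from Table 2, and a verb.sense column; in current versions of the lme4 package the binomial mixed model is fitted with glmer() rather than lmer().

```r
library(lme4)

# Ordinary logistic regression, without a random effect:
m.fixed <- glm(construction ~ length.diff + rec.given * medium + rec.local,
               family = binomial, data = d)

# Mixed model with verb sense as a random intercept:
m.mixed <- glmer(construction ~ length.diff + rec.given * medium + rec.local
                 + (1 | verb.sense), family = binomial, data = d)
```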
Table 3. Distribution of verb senses with 5 or more occurrences in the data set. The verb senses in the left-most list have a clear bias towards the double object (d.obj.) construction, those in the right-most list for the prepositional dative (p.dat.) construction, and those in the middle show no clear preference. The a represents abstract, c communication and t transfer of possession.

# d.obj. > # p.dat.          # d.obj. ≈ # p.dat.          # d.obj. < # p.dat.
verb sense  d.obj.  p.dat.   verb sense  d.obj.  p.dat.   verb sense  d.obj.  p.dat.
give a      255     32       do a        8       10       pay a       2       12
give t      56      21       send c      9       7        cause a     5       8
give c      66      10       lend t      8       7        sell t      0       10
tell c      67      1        pay t       6       5        owe a       2       6
send t      42      16       leave a     5       4        explain c   0       6
show c      37      9        write c     4       5        present c   0       6
offer a     24      9        bring t     3       2        read c      1       4
show a      6       1        hand t      3       2
offer t     6       0
tell a      6       0
wish c      6       0
bring a     4       1

4.2 Variable Selection
Using the values of the 29 explanatory features (fixed effect factors), we establish a regression function that predicts the natural logarithm (ln) of the odds that the construction C in clause j is a prepositional dative. The prepositional dative is regarded as a 'success' (with value 1), while the double object construction is considered a 'failure' (0). The regression function for the models without a random effect is:

\[ \ln \mathrm{odds}(C_j = 1) = \alpha + \sum_{k=1}^{29} \beta_k V_{jk} \tag{1} \]
The α is the intercept of the function; β_k and V_jk are the weights and values of the 29 variables k. For the model with the random effect (for verb sense i), the regression function is:

\[ \ln \mathrm{odds}(C_{ij} = 1) = \alpha + \sum_{k=1}^{29} \beta_k V_{jk} + e_{ij} + r_i \tag{2} \]
The random effect r_i is normally distributed with mean zero (r_i ∼ N(0, σ_r²)), independent of the normally distributed error term e_ij (e_ij ∼ N(0, σ_e²)). The optimal values for the function parameters α, β_k and (for models with a random effect) r_i and e_ij are found with the help of Maximum Likelihood Estimation.⁵ The outcome of the regression enables us to use the model as a classifier: all cases with ln odds(C_j = 1) ≥ t (for the models without a random effect) or
⁵ We use the functions glm() and lmer() [1] in R [15].
ln odds(C_ij = 1) ≥ t (for models with a random effect) are classified as prepositional dative, all with ln odds(C_j = 1) < t or ln odds(C_ij = 1) < t as double object, with t the decision threshold, which we set to 0. With this threshold, all instances for which the regression function outputs a negative ln odds are classified as double object constructions, all other instances as prepositional dative.

In the first approach, we include all 29 features in the model formula. We then remove all variables V_k that do not have a significant effect in the model output,⁶ and build a model with the remaining (significant) variables. For the second approach, being forward selection, we start with an empty model and successively add the variable that is most predictive. As Izenman [11] explains, there are several possible criteria for deciding which variable to enter. We decide to enter the variable that yields the highest area under the ROC (Receiver Operating Characteristics) curve of the extended model. The ROC curve shows the proportions of correctly and incorrectly classified instances as a function of the decision threshold. The area under the ROC curve (AUC) gives the probability that the regression function, when randomly selecting a positive (prepositional dative) and a negative (double object) instance, outputs a higher log odds for the positive instance than for the negative instance. The AUC is thus an evaluation measure for the quality of a model. It is calculated with:

\[ \mathrm{AUC} = \frac{\overline{\mathrm{rank}}(x_{C=1}) - \frac{p+1}{2}}{n - p} \tag{3} \]
where average rank(x_{C=1}) is the average rank of the instances x that are prepositional dative (when all instances are ranked numerically according to the log odds), p the number of prepositional dative instances, and n the total number of instances.⁷ We add the next most predictive variable to the model as long as it gives an improvement over the AUC of the model without the variable. An interaction of variable V_k with medium is only included when the resulting AUC is higher than the value reached after adding the main variable V_k.⁸ Two AUC values are considered different when the difference is higher than a threshold. We set the threshold to 0.002.⁹ For the third approach (backward elimination), we use the opposite procedure: we start with the full model, containing all 29 variables, and successively leave out the variable V_k that, after removal, yields the model with the highest AUC value that is not lower than the AUC value for the model with V_k. When the AUC value of a model without variable V_k does not differ from the AUC value of the model without the interaction of V_k with medium, we remove the interaction. Again, AUC values are only considered different when the difference exceeds a threshold (again set to 0.002).

⁶ We use the P-values as provided by glm() and lmer().
⁷ We use the function somers2() created in R [15] by Frank Harrell.
⁸ When including an interaction but not the main variables in it, the interaction will also partly explain variation that is caused by the main variables [14].
⁹ The threshold value has been established experimentally.
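A minimal sketch of this forward selection loop is given below, under the assumption that the response construction is coded 0/1 in a data frame and that candidate predictor names are given as strings; somers2() from the Hmisc package returns the AUC as its C component. Interactions with medium are omitted for brevity, and all names are illustrative.

```r
library(Hmisc)  # provides somers2()

auc.of <- function(vars, data) {
  f <- reformulate(if (length(vars)) vars else "1", response = "construction")
  m <- glm(f, family = binomial, data = data)
  somers2(fitted(m), data$construction)["C"]  # C = area under the ROC curve
}

forward.select <- function(candidates, data, threshold = 0.002) {
  selected <- character(0)
  best <- auc.of(selected, data)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (!length(remaining)) break
    aucs <- sapply(remaining, function(v) auc.of(c(selected, v), data))
    if (max(aucs) - best <= threshold) break  # no sufficient AUC gain: stop
    selected <- c(selected, names(which.max(aucs)))
    best <- max(aucs)
  }
  selected
}
```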
We evaluate the models with and without random effects by establishing the model quality (training and testing on all 930 cases) by calculating the percentage of correctly classified instances (accuracy) and the area under the ROC curve (AUC). Also, we determine the prediction accuracy reached in 10-fold cross-validation (10 sessions of training on 90% of the data and testing on the remaining 10%) in order to establish how well a model generalizes to previously unseen data. In the 10-fold cross-validation setting, we provide the algorithms with the variables selected in the models trained on all 930 cases. The regression coefficients for these subsets of variables are then estimated for each separate training set. The coefficients in the regression models help us understand which variables play what role in the dative alternation. We will therefore compare the coefficients of the significant effects in the models built on all 930 instances.
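The 10-fold scheme can be sketched as follows: the variable subset found on the full data is refitted on each training fold and scored on the held-out fold, with the decision threshold of 0 on the log odds corresponding to a 0.5 threshold on the predicted probability. The function and column names are illustrative.

```r
cv.accuracy <- function(vars, data, k = 10) {
  fold <- sample(rep(1:k, length.out = nrow(data)))
  accs <- sapply(1:k, function(i) {
    m <- glm(reformulate(vars, "construction"), family = binomial,
             data = data[fold != i, ])
    pred <- predict(m, newdata = data[fold == i, ], type = "response") >= 0.5
    mean(pred == (data[fold == i, "construction"] == 1))
  })
  mean(accs)
}
```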
5 Results

5.1 Mixed Models
Table 4 gives the model quality and prediction accuracy for the different regression models we built, including verb sense as a random effect. The prediction accuracy (the percentage of correctly classified cases) is significantly higher than the majority baseline (always selecting the double object construction) in all settings, also when testing on new data (p < 0.001 for the three models, Wilcoxon paired signed rank test).

Table 4. Number of variables selected, model quality and prediction accuracy of the regression models with verb sense as a random effect

                                         model quality (train=test)   10-fold cv
selection        #variables   baseline   AUC       accuracy           aver. accuracy
1. significant   6            0.723      0.979     0.935              0.819
2. forward       4            0.723      0.979     0.932              0.827
3. backward      4            0.723      0.978     0.928              0.833
When training and testing on all 930 instances, the mixed models reach very high AUC and prediction accuracy (model quality). However, seeing the decrease in accuracy in a 10-fold cross-validation setting, it seems that the mixed models do not generalize very well to previously unseen data. The significant effects for the variables selected in the three approaches are presented in Table 5. The directions of the main effects are the same as the results presented in Section 2.1 for American English [3]. The forward selection (2) and backward elimination (3) approaches lead to almost the same regression model. The only difference is that in the backward model, the discourse givenness of the recipient is included as a main effect, while it is included as an interaction with medium in the forward model. Both indicate that the choice for the double object construction is more likely when the
Table 5. Coefficients of significant effects in (mixed) regression models with verb sense as random effect, trained on all 930 instances; *** p<0.001, ** p<0.01, * p<0.05. The (negative) effects above the horizontal line draw towards the double object construction, and the (positive) effects below it toward the prepositional dative construction.

Effect                          1. significant   2. forward   3. backward
length difference                  -2.50 ***       -2.44 ***    -2.39 ***
rec=animate                        -1.01 *
rec=given                                                       -0.94 *
rec=given, medium=spoken                           -1.74 ***
rec=given, medium=written                          -1.82 ***
rec=local                          -2.53 ***       -1.78 ***    -1.44 ***
th=pronominal, medium=written      -1.79 *
---------------------------------------------------------------------------
(intercept)                         2.05 ***        2.38 ***     2.34 ***
th=definite                         1.78 ***
th=given                                            2.33 ***     2.19 ***
th=pronominal                       2.32 ***
recipient has been mentioned previously in the discourse (and is thus given). In the forward model, this effect is a little stronger in writing than in speech. The animacy of the recipient is only found significant in the model obtained by keeping the significant variables (1). The other differences between the two stepwise models and this model are likely to be caused by the fact that the information contained in the variables shows considerable overlap. Pronominal and definite objects are also often discourse given. A significant effect for the one variable may therefore decrease the possibility of regarding the other as significant. This is exactly what we see: the model obtained through the two stepwise approaches contains a variable denoting the givenness of the theme but none describing its pronominality or definiteness, while it is the other way around for the model with the significant variables from the full model. The model obtained by keeping the significant variables in the full model also contains one interaction, namely that between medium and a pronominal theme. The main effect (without medium) is also included, but it shows the opposite effect. When the theme is pronominal, speakers tend to use the prepositional dative construction (coefficient 2.15). This effect seems much less strong in writing (remaining coefficient 2.15 - 2.01 = 0.14). What remains unclear, is which of the three models is more suitable for explaining the British English dative alternation. Seeing the differences between the significant effects in the three models we found, and the relatively low prediction accuracy in 10-fold cross-validation, it seems that the models are modelling the specific data set rather than the phenomenon. A probable cause is that the mixed models are too complex to model a data set consisting of 930 instances. In the next section, we apply the three approaches to build simpler models, namely without the random effect.
5.2 Models without a Random Effect
The model quality and prediction accuracy for the models without a random effect can be found in Table 6.

Table 6. Model fit and prediction accuracy of the regression models without a random effect

                                         model quality (train=test)   10-fold cv
selection        #variables   baseline   AUC       accuracy           aver. accuracy
1. significant   6            0.723      0.938     0.878              0.872
2. forward       7            0.723      0.943     0.878              0.876
3. backward      8            0.723      0.946     0.882              0.876
The estimates of model quality AUC and accuracy are considerably lower than the values obtained with the mixed models (Table 4). On the other hand, the models without a random effect generalize well to new data: the prediction accuracy in 10-fold cross-validation is very similar to the model quality accuracy (when training and testing on all instances). The prediction accuracies reached in 10-fold cross-validation are significantly better than those reached with the best mixed model (p < 0.001 for the three regular models compared to the backward mixed model, following the Wilcoxon paired signed rank test). Apparently the simpler models, without a random effect, outperform the mixed models when applying them to previously unseen data. Table 7 shows the significant effects in the models without random effect. Again, the directions of the coefficients are the same across the three models, but they disagree on the significance of the variables. Three variables are selected in all three approaches: the person of the recipient (local or non-local), the pronominality of the recipient, and the concreteness of the theme. The latter two were not selected at all in the mixed-effect approach of the previous section. Three more variables have significant effects in two of the three models. According to all three models, speakers tend to use the double object construction when the theme is longer than the recipient. The backward elimination model (3), however, shows that the effect of length difference is especially strong in speech. As for the mixed model in the previous section, the forward selection has selected the interaction between the medium and the discourse givenness of the recipient. Writers are thus more likely to choose the double object construction when the recipient has recently been mentioned in the text, than when the recipient is newly (re)introduced. The semantic verb class is only selected in the backward elimination. In the literature (cf. [13]), it is argued that the prepositional dative construction is especially used to express a change of place (moving the theme), and the double object construction a change of state (possessing the theme). In this perspective, we would expect instances with a transfer of possession to be in the prepositional dative construction (give a book to you), and instances with abstract meanings in the double object construction (give you moral support). This is also what
Table 7. Coefficients of significant effects in regression models (without random effect), trained on all 930 instances; *** p<0.001, ** p<0.01, * p<0.05. The (negative) effects above the horizontal line draw towards the double object construction, and the (positive) effects below it toward the prepositional dative construction.

Effect                               1. significant   2. forward   3. backward
length difference                       -1.73 ***       -2.35 ***
length difference, medium=spoken                        -1.71 ***    -2.00 ***
length difference, medium=written                                    -1.15 ***
rec=definite                                            -1.01 **     -1.15 **
rec=given, medium=written                               -0.66 *
rec=local                               -1.22 ***       -0.94 **     -1.25 ***
rec=pronominal                          -1.35 ***       -0.88 **     -0.99 *
verb=abstract, medium=written                                        -1.04 *
verb=transfer, medium=spoken
verb=transfer, medium=written                                        -1.32 *
-------------------------------------------------------------------------------
(intercept)                              1.33 ***        0.82 **      1.56 **
th=concrete                              1.48 ***        1.48 ***     1.63 ***
th=definite                                                           1.16 ***
th=given                                                 1.58 ***     0.98 **
Bresnan et al. [3] found for spoken American English. In the backward model, however, the effect is the opposite: a transfer of possession is more strongly drawn towards the double object construction than an abstract meaning. The problem here is that these two semantic verb classes depend largely on the concreteness of the theme (Pearson correlation = 0.739), a feature that has been selected in all three models in Table 7. When the semantic verb class is transfer of possession, the theme is very likely to be concrete. The backward model thus seems to compensate the positive coefficient of concreteness (1.63) by giving a negative coefficient to the semantic verb class (e.g. -1.32 for transfer of possession in writing). The resulting effect is still directed at the prepositional dative construction (remaining coefficient 1.63 - 1.32 = 0.31), but it is not very strong. In Section 3, we saw that only pay and send showed different biases towards one of the two constructions in different verb senses. It seems that the biases are mostly due to the verb (see also [8]) and the concreteness of the theme, and not so much to their semantic verb classes abstract, communication and transfer of possession.
6 Discussion and Conclusion
In this paper, we built regular and mixed (i.e. containing a random effect) logistic regression models in order to explain the British English dative alternation. We used a data set of 930 instances taken from the ICE-GB Corpus, and took the explanatory factors suggested by Bresnan et al. [3]. The regular and the mixed models were constructed following three different approaches: (1) providing the algorithms with all 29 variables and keeping the significant ones, (2) starting
with an empty model and successively adding the most predictive variables (forward selection), and (3) starting with a model with all 29 features and successively removing the least predictive variables (backward elimination). In total, we thus have built six logistic regression models for the same data set. The six models show some overlap in the variables that are regarded significant. These variables show the same effects as found for American English [3]: pronominal, relatively short, local (first or second person), discourse given, definite and concrete objects typically precede objects with the opposite characteristics. Contrary to the claims in Haspelmath [10], we found no evidence for the hypothesis that the dative alternation in British English differs from that in American English. With respect to medium, there seem to be some differences between the dative alternation in speech and writing. Four variables were selected as interactions with medium. Only one of them, the givenness of the recipient, has been selected in more than one model (i.e. in the two forward selections). As opposed to the mixed models, the models without a random effect generalize well to previously unseen data. This does not necessarily mean that the British English dative alternation is best modelled with logistic regression models without a random effect. The models fit the data better when verb sense is included as a random effect. The fact that the mixed models do not generalize well to new data could be due to the relatively small size of our data set. In the near future, we therefore aim at extending our data set, employing the British National Corpus [4]. Since manually extending the data set in a way similar to that taken to reach the current data set of 930 instances is too labour-intensive, we aim at automatically extending the data set (in an approach similar to that taken in Lapata [12]), and automatically annotating it for the explanatory features in this paper. With the larger set, we hope to be able to model the underlying processes of the dative alternation, rather than modelling the instances that made it into our data set. One of the drawbacks of variable selection is that different selection methods can lead to different models [11]. Accordingly, the six methods we applied have led to six different selections of variables and thus to six different models. How can we decide which is the optimal model for our purpose? Of course, the way to approach this issue depends on the goal of a specific research enterprise. For a researcher building a machine translation system, the best approach is probably to choose the highest prediction accuracy on previously unseen data. For linguists, however, the best approach may be less clear. In our project we want to combine the explanatory features suggested in previous research and test the combination on real data. We thus have hypotheses about what are explanatory features and what kind of effect they show in isolation, but it is unclear how specific features behave in combination with others. Also, we want a model that is interpretable in the framework of some linguistic theory and that, ideally, reflects the processes in human brains. It is uncertain how (and if) we can evaluate a model in this sense. Still, despite these difficulties, using techniques such as logistic regression is very useful for gaining insight into the relative contribution
that different features have on the choices people make when there is syntactic variability. But contrary to what seems to be common in linguistics, researchers should be careful in choosing a single approach and drawing conclusions from one model only. Firm conclusions about mental processes can only be drawn if similar models are obtained with a number of different data sets. In addition, models derived from corpus data should be tested in psycholinguistic experiments. Acknowledgments. I am grateful to Lou Boves, Hans van Halteren and Nelleke Oostdijk for their support and their useful comments on this paper.
References

1. Bates, D.: Fitting linear mixed models in R. R News 5(1), 27–30 (2005)
2. Blackwell, A.: Acquiring the English adjective lexicon: relationships with input properties and adjectival semantic typology. Child Language 32, 535–562 (2005)
3. Bresnan, J., Cueni, A., Nikitina, T., Baayen, H.: Predicting the Dative Alternation. In: Bouma, G., Kraemer, I., Zwarts, J. (eds.) Cognitive Foundations of Interpretation, pp. 69–94. Royal Netherlands Academy of Science, Amsterdam (2007)
4. Burnard, L.: Reference Guide for the British National Corpus (XML Edition). Published for the British National Corpus Consortium. Research Technologies Service at Oxford University Computing Services (2007)
5. Godfrey, J., Holliman, E., McDaniel, J.: Switchboard: Telephone speech corpus for research and development. In: ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 517–520. IEEE Computer Society, Los Alamitos (1992)
6. Greenbaum, S.: Comparing English Worldwide: The International Corpus of English. Clarendon, Oxford (1996)
7. Gries, S.: Towards a corpus-based identification of prototypical instances of constructions. Annual Review of Cognitive Linguistics 1, 1–27 (2003)
8. Gries, S., Stefanowitsch, A.: Extending Collostructional Analysis: A Corpus-based Perspective on 'Alternations'. International Journal of Corpus Linguistics 9, 97–129 (2004)
9. Grondelaers, S., Speelman, D.: A variationist account of constituent ordering in presentative sentences in Belgian Dutch. Corpus Linguistics and Linguistic Theory 3(2), 161–193 (2007)
10. Haspelmath, M.: Ditransitive alignment splits and inverse alignment. Functions of Language 14(1), 79–102 (2007)
11. Izenman, A.: Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. Springer, New York (2008)
12. Lapata, M.: Acquiring lexical generalizations from corpora: a case study for diathesis alternations. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pp. 397–404. Morgan Kaufmann, San Francisco (1999)
13. Pinker, S.: Learnability and Cognition: The Acquisition of Argument Structure. MIT Press, Cambridge (1989)
14. Rietveld, T., van Hout, R.: Statistical Techniques for the Study of Language and Language Behavior. Mouton de Gruyter, Berlin (1993)
15. R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing (2008)
16. Sheather, S.: A Modern Approach to Regression with R. Springer, New York (2009)
17. Siewierska, A., Hollmann, W.: Ditransitive clauses in English with special reference to Lancashire dialect. In: Hannay, M., van der Steen, G.J. (eds.) Structural-functional Studies in English Grammar: In Honor of Lachlan Mackenzie, pp. 83–102. John Benjamins, Amsterdam (2007)
18. West, B.T., Welch, K.B., Galecki, A.T.: Linear Mixed Models: A Practical Guide Using Statistical Software. Chapman & Hall/CRC, Boca Raton (2007)
A Salience-Driven Approach to Speech Recognition for Human-Robot Interaction

Pierre Lison

Language Technology Lab, German Research Centre for Artificial Intelligence (DFKI GmbH), Saarbrücken, Germany
[email protected]
Abstract. We present an implemented model for speech recognition in natural environments which relies on contextual information about salient entities to prime utterance recognition. The hypothesis underlying our approach is that, in situated human-robot interaction, speech recognition performance can be significantly enhanced by exploiting knowledge about the immediate physical environment and the dialogue history. To this end, visual salience (objects perceived in the physical scene) and linguistic salience (previously referred-to objects within the current dialogue) are integrated into a single cross-modal salience model. The model is dynamically updated as the environment evolves, and is used to establish expectations about uttered words which are most likely to be heard given the context. The update is realised by continuously adapting the word-class probabilities specified in the statistical language model. The present article discusses the motivations behind our approach, describes our implementation as part of a distributed, cognitive architecture for mobile robots, and reports the evaluation results on a test suite.
1 Introduction
Recent years have seen increasing interest in service robots endowed with communicative capabilities. In many cases, these robots must operate in open-ended environments and interact with humans using natural language to perform a variety of service-oriented tasks. Developing cognitive systems for such robots remains a formidable challenge. Software architectures for cognitive robots are typically composed of several cooperating subsystems, such as communication, computer vision, navigation and manipulation skills, and various deliberative processes such as symbolic planners [1]. These subsystems are highly interdependent. Incorporating basic functionalities for dialogue comprehension and production is not sufficient to make a robot interact naturally in situated dialogues. Crucially, dialogue managers for human-robot interaction also need to relate language, action and situated reality in a unified framework, and enable the robot to use its perceptual experience to continuously learn and adapt itself to the environment. The first step in comprehending spoken dialogue is automatic speech recognition [ASR]. For robots operating in real-world noisy environments, and dealing
with utterances pertaining to complex, open-ended domains, this step is particularly difficult and error-prone. In spite of continuous technological advances, the performance of ASR remains for most tasks at least an order of magnitude worse than that of human listeners [2]. One strategy for addressing this issue is to use context information to guide the speech recognition by percolating contextual constraints to the statistical language model [3]. In this paper, we follow this approach by defining a context-sensitive language model which exploits information about salient objects in the visual scene and linguistic expressions in the dialogue history to prime recognition. To this end, a salience model integrating both visual and linguistic salience is used to dynamically compute lexical activations, which are incorporated into the language model at runtime. Our approach departs from previous work on context-sensitive speech recognition by modeling salience as inherently cross-modal, instead of relying on just one particular modality such as gesture [4], eye gaze [5] or dialogue state [3]. The Fuse system described in [6] is a closely related approach, but limited to the processing of object descriptions, whereas our system was designed from the start to handle generic situated dialogues. The structure of the paper is as follows: in the next section we briefly introduce the software architecture in which our system has been developed. We then describe in Section 3 our approach, detailing the salience model, and explaining how it is exploited within the language model used for speech recognition. We finally present in Section 4 the empirical evaluation of our approach, followed in Section 5 by conclusions.
2 Background
The approach we present in this paper is fully implemented and integrated into a distributed cognitive architecture for autonomous robots (see [7]). The architecture is divided into a set of subsystems. Each subsystem consists of a number of processes, and a working memory. The processes can access sensors, effectors, and the working memory to share information within the subsystem. The robot is capable of building up visuo-spatial models of a dynamic local scene, and of continuously planning and executing manipulation actions on objects within that scene. The robot can discuss objects and their material and spatial properties for the purpose of visual learning and manipulation tasks. Fig. 1 illustrates the architecture for the communication subsystem. Starting with speech recognition, we process the audio signal to establish a word lattice containing statistically ranked hypotheses about word sequences. Subsequently, parsing constructs grammatical analyses for the given word lattice. A grammatical analysis constructs both a syntactic analysis of the utterance, and a representation of its meaning. The analysis is based on an incremental chart parser¹ for Combinatory Categorial Grammar [8]. These meaning representations are ontologically richly sorted, relational structures, formulated in a
¹ Built using the OpenCCG API: http://openccg.sf.net
Fig. 1. Schema of the communication subsystem (limited to comprehension)
(propositional) description logic, more precisely in Hybrid Logic Dependency Semantics [9]. The parser then compacts all meaning representations into a single packed logical form [10,11]. A packed logical form represents content similar across the different analyses as a single graph, using over- and underspecification of how different nodes can be connected to capture lexical and syntactic forms of ambiguity. At the level of dialogue interpretation, the logical forms are resolved against an SDRS-like dialogue model [12], which is then exploited in various pragmatic interpretation tasks such as reference resolution or dialogue move recognition. Linguistic interpretations must finally be associated with extra-linguistic knowledge about the environment – dialogue comprehension hence needs to connect with other subarchitectures like vision, spatial reasoning or planning. We realise this information binding between different modalities via a specific module, called the "binder", which is responsible for the ontology-based mediation across modalities [13]. Interpretation in context indeed plays a crucial role in the comprehension of an utterance as it unfolds. Human listeners continuously integrate linguistic information with scene understanding (foregrounded entities and events) and world knowledge. This contextual knowledge serves the double purpose of interpreting what has been said, and predicting/anticipating what is going to be said. Their integration is also closely time-locked, as evidenced by analyses of saccadic eye movements in visual scenes [14] and by neuroscience-based studies of event-related brain potentials [15]. Several approaches in situated dialogue for human-robot interaction demonstrated that a robot's understanding can be substantially improved by relating utterances to the situated context [17,18,11]. Contextual knowledge can be fruitfully exploited to guide attention and help disambiguate and refine linguistic input by filtering out unlikely interpretations (see Fig. 2 for an illustration). Our
Fig. 2. Context-sensitivity in processing situated dialogue understanding (the use of contextual knowledge for discriminative parse selection is described in [16])
approach is essentially an attempt to improve the speech recognition by drawing inspiration from the contextual priming effects evidenced in human cognition.
3 Approach

3.1 Salience Modeling
In our implementation, we define salience using two main sources of information:

1. the salience of objects in the perceived visual scene;
2. the linguistic salience or "recency" of linguistic expressions in the dialogue history.

Other information sources could also be easily added in the model. Examples are the presence of gestures [4], eye gaze tracking [5], entities in large-scale space [19], or the integration of a task model – as salience generally depends on intentionality [20].

Visual salience. Via the "binder", we can access the set of objects currently perceived in the visual scene. Each object is associated with a concept name (e.g. printer) and a number of features, for instance spatial coordinates or qualitative properties like colour, shape or size. Several features can be used to compute the salience of an object. The ones currently used in our implementation are (1) the object size and (2) its distance relative to the robot (i.e. spatial proximity). Other features could also prove to be helpful, like the reachability of the object or its distance from the point of visual focus – similarly to the spread of visual acuity across the human retina. To derive the visual salience value for each object, we assign a numeric value for the two variables, and then perform a weighted addition. The associated weights are determined via regression tests.
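As a toy illustration of this weighted addition (the concrete size and proximity measures and the weights below are assumptions, not the values used in the implementation):

```r
# Combine object size and spatial proximity into a single salience score.
visual.salience <- function(size, distance, w.size = 0.5, w.prox = 0.5) {
  proximity <- 1 / (1 + distance)   # closer objects count as more salient
  w.size * size + w.prox * proximity
}
```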
Fig. 3. Example of a visual scene
It is worth noting that the choice of a particular measure for the visual saliency is heavily dependent on the application domain and the properties of the visual scene (typical number of objects, relative distances, recognition capacities of the vision system, angle of view, etc.). For the application domain in which we performed our evaluation (cfr. section 4), the experimental results turned out to be largely insensitive to the choice of a specific method of calculation for the visual saliency. At the end of the processing, we end up with a set E_v of visual objects, each of which is associated with a numeric salience value s(e_k), with e_k ∈ E_v.

Linguistic salience. There is a vast amount of literature on the topic of linguistic salience. Roughly speaking, linguistic salience can be characterised either in terms of hierarchical recency, according to a tree-like model of discourse structure (cfr. [21,22,12]), or in terms of linear recency of mention (see [23] for a discussion). Our implementation can theoretically handle both types of linguistic salience, but for all practical purposes, the system only takes linear recency into account, as it is easier to compute and usually more reliable than hierarchical recency (which crucially depends on having a well-formed discourse structure). To compute the linguistic salience, we extract a set E_l of potential referents from the discourse structure, and for each referent e_k we assign a salience value s(e_k) equal to the distance (measured on a logarithmic scale) between its last mention and the current position in the discourse structure.
Cross-Modal Salience Model
Once the visual and linguistic salience are computed, we can proceed to their integration into a cross-modal statistical model. We define the set E as the union
A Salience-Driven Approach to Speech Recognition
107
of the visual and linguistic entities: E = Ev ∪ El , and devise a probability distribution P (E) on this set: P (ek ) =
δv IEv (ek ) sv (ek ) + δl IEl (ek ) sl (ek ) |E|
(1)
where IA (x) is the indicator function of set A, and δv , δl are factors controlling the relative importance of each type of salience. They are determined empirically, subject to the following constraint to normalise the distribution: δv
ek ∈Ev
s(ek ) + δl
s(ek ) = |E|
(2)
ek ∈El
The statistical model P (E) thus simply reflects the salience of each visual or linguistic entity: the more salient, the higher the probability. 3.3
Lexical Activation
In order for the salience model to be of any use for speech recognition, a connection between the salient entities and their associated words in the ASR vocabulary needs to be established. To this end, we define a lexical activation network, which lists, for each possible salient entity, the set of words activated by it. The network specifies the words which are likely to be heard when the given entity is present in the environment or in the dialogue history. It can therefore include words related to the object denomination, subparts, common properties or affordances. The salient entity laptop will activate words like ‘laptop’, ‘notebook’, ‘screen’, ‘opened’, ‘ibm’, ‘switch on/off’, ‘close’, etc. The list is structured according to word classes, and a weight can be set on each word to modulate the lexical activation: supposing a laptop is present, the word ‘laptop’ should receive a higher activation than, say, the word ‘close’, which is less situation specific. The use of lexical activation networks is a key difference between our model and [6], which relies on a measure of “descriptive fitness” to modify the word probabilities. One advantage of our approach is the possibility to go beyond object descriptions and activate word types denoting subparts, properties or affordances of objects. In the context of a laptop object, words such as ‘screen’, ‘ibm’, ‘closed’ or ‘switch on/off’ would for instance be activated. If the probability of specific words is increased, we need to re-normalise the probability distribution. One solution would be to decrease the probability of all non-activated words accordingly. This solution, however, suffers from a significant drawback: our vocabulary contains many context-independent words like prepositions, determiners or general words like ‘thing’ or ‘place’, whose probability should remain constant. To address this issue, we mark an explicit distinction in our vocabulary between context-dependent and context-independent words. Only the context-dependent words can be activated or deactivated by the context. The context-independent words maintain a constant probability. Fig. 4 illustrates these distinctions.
108
P. Lison
In the current implementation, the lexical activation network is constructed semi-manually, using a simple lexicon extraction algorithm. We start with the list of possible salient entities, which is given by: 1. the set of physical objects the vision system can recognise; 2. the set of nouns specified in the CCG lexicon with ‘object’ as ontological type. For each entity, we then extract its associated lexicon by matching domainspecific syntactic patterns against a corpus of dialogue transcripts.
Fig. 4. Graphical illustration of the word activation network
3.4
Language Modeling
We now detail the language model used for the speech recognition – a class-based trigram model enriched with contextual information provided by the cross-modal salience model. 3.5
Corpus Generation
We need a corpus to train any statistical language model. Unfortunately, no corpus of situated dialogue adapted to our task domain is available to this day. Collecting in-domain data via Wizard of Oz experiments is a very costly and time-consuming process, so we decided to follow the approach advocated in [24] instead and generate a class-based corpus from a task grammar. Practically, we first collected a small set of WoZ experiments, totalling about 800 utterances. This set is of course too small to be directly used as a corpus for language model training, but sufficient to get an intuitive idea of the utterances which are representative of our discourse domain. Based on it, we then designed a domain-specific context-free grammar able to cover most of the utterances. Weights were automatically assigned to each grammar rule by parsing our initial corpus, hence leading to a small stochastic context-free grammar. As a last step, this grammar is randomly traversed a large number of times, which yields the final corpus.
3.6 Salience-Driven, Class-Based Language Models
The objective of the speech recognizer is to find the word sequence W* which has the highest probability given the observed speech signal O and a set E of salient objects:

    W* = arg max_W P(W | O; E)                                        (3)

       = arg max_W P(O | W) × P(W | E)                                (4)

where P(O | W) is the acoustic model and P(W | E) the salience-driven language model. For a trigram language model, the probability of the word sequence, P(w_1^n | E), is:

    P(w_1^n | E) = ∏_{i=1}^{n} P(w_i | w_{i-1}, w_{i-2}; E)           (5)
Our language model is class-based, so it can be further decomposed into word-class and class transition probabilities. The class transition probabilities reflect the language syntax; we assume they are independent of the salient objects. The word-class probabilities, however, do depend on context: for a given class – e.g. noun – the probability of hearing the word ‘laptop’ will be higher if a laptop is present in the environment. Hence:

    P(w_i | w_{i-1}, w_{i-2}; E) = P(w_i | c_i; E) × P(c_i | c_{i-1}, c_{i-2})    (6)

where the first factor is the word-class probability and the second the class transition probability.
We now define the word-class probabilities P(w_i | c_i; E):

    P(w_i | c_i; E) = Σ_{e_k ∈ E} P(w_i | c_i, e_k) × P(e_k)          (7)
To compute P(w_i | c_i, e_k), we use the lexical activation network specified for e_k:

    P(w_i | c_i, e_k) =
        P(w_i | c_i) + α_1   if w_i ∈ activatedWords(e_k)
        P(w_i | c_i) − α_2   if w_i ∉ activatedWords(e_k) ∧ w_i ∈ contextDependentWords    (8)
        P(w_i | c_i)         else

The optimum value of α_1 is determined using regression tests. α_2 is computed relative to α_1 in order to keep the sum of all probabilities equal to 1:

    α_2 = (|activatedWords| × α_1) / (|contextDependentWords| − |activatedWords|)
These word-class probabilities are dynamically updated as the environment and the dialogue evolve, and are incorporated into the language model at runtime.
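A possible rendering of this runtime update, combining equations (7) and (8), is sketched below; the function name and data layout are assumptions made for illustration:

from collections import defaultdict

def salience_adjusted(base, salience, activated, context_dependent, alpha1):
    """base: {(word, cls): P(w|c)}; salience: {entity: P(e)};
    activated: {entity: set of activated words}."""
    adjusted = defaultdict(float)
    for entity, p_e in salience.items():
        act = activated.get(entity, set())
        n_act = len(act & context_dependent)
        # alpha2 keeps the distribution summing to 1 (see the formula above);
        # the max() guard merely protects this sketch against division by zero
        alpha2 = n_act * alpha1 / max(1, len(context_dependent) - n_act)
        for (w, c), p in base.items():
            if w in act:
                p_wce = p + alpha1                 # activated word
            elif w in context_dependent:
                p_wce = p - alpha2                 # deactivated word
            else:
                p_wce = p                          # context-independent word
            adjusted[(w, c)] += p_wce * p_e        # marginalise over entities, eq. (7)
    return dict(adjusted)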
4 Evaluation

4.1 Evaluation Procedure
We evaluated our approach using a test suite of 250 spoken utterances recorded during Wizard-of-Oz experiments (a representative subset of the 800 utterances initially collected). The participants were asked to interact with the robot while looking at a specific visual scene. We designed 10 different visual scenes by systematic variation of the nature, number and spatial configuration of the objects presented. Fig. 5 gives an example of a visual scene.
Fig. 5. Sample visual scene including three objects: a box, a ball, and a chocolate bar
The interactions could include descriptions, questions and commands. No particular tasks were assigned to the participants. The only constraint we imposed was that all interactions with the robot had to be related to the shared visual scene. After being recorded, all spoken utterances were manually segmented one by one and transcribed (without markers or punctuation).

4.2 Results
Table 1 summarises our experimental results. We focus our analysis on the WER of our model compared to the baseline – that is, compared to a class-based trigram model not based on salience. The table details the WER results obtained by comparing the first recognition hypothesis to the gold standard transcription. Below these results, we also indicate the results obtained with NBest 3 – that is, the results obtained by considering the first three recognition hypotheses (instead of the first one). The word error rate is then computed as the minimum value of the word error rates yielded by the three hypotheses².

Table 1. Comparative results of recognition performance

Word Error Rate [WER]        Classical LM                  Salience-driven LM
vocabulary size 200 words    25.04 % (NBest 3: 20.72 %)    24.22 % (NBest 3: 19.97 %)
vocabulary size 400 words    26.68 % (NBest 3: 21.98 %)    23.85 % (NBest 3: 19.97 %)
vocabulary size 600 words    28.61 % (NBest 3: 24.59 %)    23.99 % (NBest 3: 20.27 %)
4.3 Analysis
As the results show, the use of a salience model can enhance recognition performance in situated interactions: with a vocabulary of about 600 words, the WER is indeed reduced by (28.61 − 23.99)/28.61 × 100 = 16.1 % compared to the baseline. According to the Sign test, the differences for the last two tests (400 and 600 words) are statistically significant. As we could expect, the salience-driven approach is especially helpful when operating with a larger vocabulary, where the expectations provided by the salience model can really make a difference in word recognition. The word error rate nevertheless remains quite high, for several reasons. The major issue is that the words causing most recognition problems are – at least in our test suite – function words like prepositions, discourse markers, connectives, auxiliaries, etc., and not content words. Unfortunately, the use of function words is usually not context-dependent, and hence not influenced by salience. By classifying the errors according to the part-of-speech of the misrecognised word, we estimated that 89 % of the recognition errors were due to function words. Moreover, our test suite consists of “free speech” interactions, which often include lexical items or grammatical constructs outside the range of our language model.
² Or to put it slightly differently, the word error rate for NBest 3 is computed by assuming that, out of the three suggested recognition hypotheses, the one finally selected is always the one with the minimal error.
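For concreteness, the scoring just described amounts to word-level edit distance, minimised over the first three hypotheses; a minimal sketch:

def wer(ref, hyp):
    """Word error rate as word-level Levenshtein distance over the reference length."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i-1][j] + 1,                     # deletion
                          d[i][j-1] + 1,                     # insertion
                          d[i-1][j-1] + (r[i-1] != h[j-1]))  # substitution
    return d[len(r)][len(h)] / len(r)

def nbest_wer(ref, hypotheses, n=3):
    """NBest-n score: minimum WER over the first n recognition hypotheses."""
    return min(wer(ref, h) for h in hypotheses[:n])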
5 Conclusion
We have presented an implemented model for speech recognition based on the concept of salience. Salience is defined via visual and linguistic cues and is used to compute degrees of lexical activation, which are in turn applied to dynamically adapt the ASR language model to the robot’s environment and dialogue state. The experimental results demonstrate the effectiveness of our approach. It is worth noting that the primary role of the context-sensitive ASR mechanism outlined in this paper is to establish expectations about which words are most likely to be uttered given the context – that is, to anticipate what will be said. In [16], we move a step further and explain how we can also use the context as a discrimination tool to select the most relevant interpretations of a given utterance.
Acknowledgements. My thanks go to G.-J. Kruijff, H. Zender, M. Wilson and N. Yampolska for their insightful comments. The research reported in this article was supported by the EU FP6 IST Cognitive Systems Integrated Project Cognitive Systems for Cognitive Assistants “CoSy”, FP6-004250-IP. The first version of this paper appeared in [25].
References
1. Langley, P., Laird, J.E., Rogers, S.: Cognitive architectures: Research issues and challenges. Technical report, Institute for the Study of Learning and Expertise, Palo Alto, CA (2005)
2. Moore, R.K.: Spoken language processing: piecing together the puzzle. Speech Communication: Special Issue on Bridging the Gap Between Human and Automatic Speech Processing 49, 418–435 (2007)
3. Gruenstein, A., Wang, C., Seneff, S.: Context-sensitive statistical language modeling. In: Proceedings of INTERSPEECH 2005, pp. 17–20 (2005)
4. Chai, J.Y., Qu, S.: A salience driven approach to robust input interpretation in multimodal conversational systems. In: Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing 2005, Vancouver, Canada, October 2005, pp. 217–224. Association for Computational Linguistics (2005)
5. Qu, S., Chai, J.: An exploration of eye gaze in spoken language processing for multimodal conversational interfaces. In: Proceedings of the Conference of the North America Chapter of the Association of Computational Linguistics, pp. 284–291 (2007)
6. Roy, D., Mukherjee, N.: Towards situated speech understanding: visual context priming of language models. Computer Speech & Language 19(2), 227–248 (2005)
7. Hawes, N., Sloman, A., Wyatt, J., Zillich, M., Jacobsson, H., Kruijff, G.M., Brenner, M., Berginc, G., Skocaj, D.: Towards an integrated robot with multiple cognitive functions. In: AAAI, pp. 1548–1553. AAAI Press, Menlo Park (2007)
8. Steedman, M., Baldridge, J.: Combinatory categorial grammar. In: Borsley, R., Börjars, K. (eds.) Nontransformational Syntax: A Guide to Current Models. Blackwell, Oxford (2009)
9. Baldridge, J., Kruijff, G.J.M.: Coupling CCG and hybrid logic dependency semantics. In: ACL’02: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, pp. 319–326. Association for Computational Linguistics (2002)
10. Carroll, J., Oepen, S.: High efficiency realization for a wide-coverage unification grammar. In: Dale, R., Wong, K.-F., Su, J., Kwong, O.Y. (eds.) IJCNLP 2005. LNCS (LNAI), vol. 3651, pp. 165–176. Springer, Heidelberg (2005)
11. Kruijff, G., Lison, P., Benjamin, T., Jacobsson, H., Hawes, N.: Incremental, multi-level processing for comprehending situated dialogue in human-robot interaction. In: Language and Robots: Proceedings from the Symposium (LangRo’2007), Aveiro, Portugal, December 2007, pp. 55–64 (2007)
12. Asher, N., Lascarides, A.: Logics of Conversation. Cambridge University Press, Cambridge (2003)
13. Jacobsson, H., Hawes, N., Kruijff, G.J., Wyatt, J.: Crossmodal content binding in information-processing architectures. In: Proceedings of the 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI), Amsterdam, The Netherlands, March 12-15 (2008)
14. Knoeferle, P., Crocker, M.: The coordinated interplay of scene, utterance, and world knowledge: evidence from eye tracking. Cognitive Science (2006)
15. Van Berkum, J.: Sentence comprehension in a wider discourse: Can we use ERPs to keep track of things? In: Carreiras Jr., M., Chiarcos, C. (eds.) The on-line study of sentence comprehension: Eyetracking, ERPs and beyond, pp. 229–270. Psychology Press, New York (2004)
16. Lison, P.: Robust processing of situated spoken dialogue. In: Chiarcos, C., de Castilho, R.E., Stede, M. (eds.) Von der Form zur Bedeutung: Texte automatisch verarbeiten / From Form to Meaning: Processing Texts Automatically, Proceedings of the Biennial GSCL Conference 2009, Potsdam, Germany. Narr Verlag (2009)
17. Roy, D.: Semiotic schemas: A framework for grounding language in action and perception. Artificial Intelligence 167(1-2), 170–205 (2005)
18. Brick, T., Scheutz, M.: Incremental natural language processing for HRI. In: Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI’07), pp. 263–270 (2007)
19. Zender, H., Kruijff, G.J.M.: Towards generating referring expressions in a mobile robot scenario. In: Language and Robots: Proceedings of the Symposium, Aveiro, Portugal, December 2007, pp. 101–106 (2007)
20. Landragin, F.: Visual perception, language and gesture: A model for their understanding in multimodal dialogue systems. Signal Processing 86(12), 3578–3595 (2006)
21. Grosz, B.J., Sidner, C.L.: Attention, intentions, and the structure of discourse. Computational Linguistics 12(3), 175–204 (1986)
22. Grosz, B.J., Weinstein, S., Joshi, A.K.: Centering: a framework for modeling the local coherence of discourse. Computational Linguistics 21(2), 203–225 (1995)
23. Kelleher, J.: Integrating visual and linguistic salience for reference resolution. In: Creaney, N. (ed.) Proceedings of the 16th Irish Conference on Artificial Intelligence and Cognitive Science (AICS-05), Portstewart, Northern Ireland (2005)
24. Weilhammer, K., Stuttle, M.N., Young, S.: Bootstrapping language models for dialogue systems. In: Proceedings of INTERSPEECH 2006, Pittsburgh, PA (2006)
25. Lison, P.: A salience-driven approach to speech recognition for human-robot interaction. In: Proceedings of the 13th ESSLLI Student Session, Hamburg, Germany (2008)
Language Technologies for Instructional Resources in Bulgarian

Ivelina Nikolova

Institute for Parallel Processing, Bulgarian Academy of Sciences
25A G. Bonchev Str, 1113 Sofia, Bulgaria
[email protected]
http://lml.bas.bg/~iva
Abstract. This paper describes a language-technology-based system for computer-aided design of test items. Plain-text instructional materials are processed with tools for POS tagging, constituency parsing and term extraction, and lexical and syntactic information is collected. The system compiles a list of terms which are central to the instructional materials, creates drafts of fill-in-the-blank questions and suggests possible distractors. The experiment is carried out on geography, biology and history textbooks used in Bulgarian high schools.

Keywords: Term extraction, multiple-choice test item generation, natural language processing.
1 Introduction and Related Work
Asking questions is a way to keep students’ attention in class and verify their understanding. Depending on the type of education and the goal of the teacher, questions can be asked in different forms – orally, as a short written examination, in a game manner, etc. One common technique is asking multiple-choice questions, which has become even more popular in recent years because it is also applicable in e-learning. However, designing thousands of tests is a time- and effort-consuming educational activity. All questions in a test should be carefully tuned to the target group of test-takers and should not underestimate or overestimate their knowledge. Hence the teaching experts who prepare the tests must have much broader knowledge of the field than the content explicitly included in the particular textbook, and they have to tune the tests to the knowledge of the test-takers. One of the most difficult tasks in producing test items is to decide whether a question really has its answer in the instructional materials. These difficulties gave rise to a relatively new research area dealing with support for the generation of test items, answers and distractor suggestions. Multiple-choice test item (MCTI) generation with the help of NLP technologies is an active area where different text-processing tools are used to transform facts from the instructional materials into questions that can be used for student assessment. One of the most interesting approaches in this respect is described
in detail in [7], where language technologies (LT) are applied to the generation of test items in English, focusing on the automatic choice of distractors. While the initial idea was already presented in [5], the first approach to really use NLP techniques and to feature a proper evaluation was [6]. They report a speed-up of the test development process of about 6–10 times compared to manual test elicitation. Their approach is not domain-specific and can be applied to any area. Other authors actively working in the area are [1], who focus on different types of question models with application primarily in language learning. We are not aware of any related work concerning this activity for learning materials in Bulgarian, except for the author’s previous work [8]. Our efforts are thus strongly inspired by the growing interest in this field, which is due to its significant practical importance. On the other hand, we are motivated and encouraged by the availability of LT for the Bulgarian language, which enable relatively complex text preprocessing, so the automatic acquisition of learning objects from raw texts does not start from scratch. This article presents the author’s master’s thesis work, which is still in progress. The aim is to develop a workbench supporting test designers with language technologies applied to instructional materials. The task has three aspects: (1) suggestion of key terms for (2) question generation and (3) distractor suggestion. For our purpose the text is preprocessed by a number of already available LT modules, and lexical and syntactic features are extracted and kept in a meta-data format. These features are later used for the generation of draft learning objects. The experiment described in this article has been applied to three domain areas: Geography, Biology and History. The materials are taken from textbooks for the 9th, 10th and 11th grade, respectively. The remaining part of this article is organised as follows: we first sketch the general architecture of the system in section 1.1; in section 2 we describe the data processing; section 3 explains in detail the experiment done so far; section 4 concerns the evaluation at the current state; section 5 presents the conclusion and issues for future work.
1.1 Workbench Description
The system takes as input instructional materials supplied by the test designer and returns a set of draft learning objects which are meant to help with test item preparation. As shown in Fig. 1, once the materials are loaded they are preprocessed and two main data sets are created: (a) a list of key terms (terms central to the supplied text), whose generation is explained later in section 3.1, and (b) lexical and syntactic information about the supplied text, which is kept in a metadata format. The user may then obtain all possible questions generated from the supplied material, or the ones related to a certain key term she is interested in. Together with each question the system offers the correct answer and possible distractors. If the system does not find appropriate sentences containing the term which match its internal question templates (explained later in section 3.2), it returns a list of pointers to the text, containing the local context in
Fig. 1. Workbench supporting the development of multiple-choice test items
which the term appears, and a list of related concepts, generated by the same model as the distractors.
2 Data Processing
Our task is to support test makers during the building of educational resources, namely the generation of MCTI and of a vocabulary of important concepts for the domain. We apply language technologies to the raw instructional materials and obtain linguistic resources which are loaded into a workbench that helps the test designers in their work. For this purpose we pass through several phases, as shown in Fig. 2. The instructional material is taken in plain text format and first parsed with an NP extractor. The obtained nouns and noun phrases form a list of potential key terms, which are filtered so that only the most significant of them are suggested to the test designers. During term extraction an inverted index is produced. It contains a list of the extracted NPs (nouns and noun phrases) and their corresponding absolute positions in the text. A threshold for the importance of the extracted terms is set, and all NPs with frequency higher than the threshold are included in the list of key terms. In addition, all NPs that contain a noun which is a key term are also included in the key term list. In the next phase the raw text is tagged for POS categories. We found it practical to use the SVMTool of [4], trained over the newspaper part of BulTreeBank¹. The proper names recognised by the tagger were added to the list of key terms, and the output was then processed with the multilingual statistical parsing engine of Dan Bikel [2], an implementation and extension of the Collins parser [3]. The parsing model was trained on BulTreeBank. All the syntactic and lexical information obtained in these phases is kept in a meta-format and used later to produce draft learning objects (key terms, test items), which are suggested to the test designers.

¹ HPSG-based Syntactic Treebank of Bulgarian (BulTreeBank), http://bultreebank.org/

Fig. 2. Data processing
3 The Experiment

3.1 Key Terms Suggestion
We build our approach on the understanding that questions given to the learner concern terms which are central to the domain. These are the terms which serve as a basis for the learning material and represent a specific domain vocabulary; here we refer to them as key terms. Although verbs might also qualify as good key terms in some studies, in this experiment we consider only nouns and noun phrases as potential key terms. They were extracted by the classic frequency-based approach to automatic term extraction. In order to overcome the problem of inflection in the language, we lemmatized the output of the NP-extractor Morena. Once we obtained a list of nouns (LN) and a list of noun phrases (LNP), we had to rank them in order to extract only the most important ones, which are the focus of our approach and of user queries. We applied two different techniques for measuring term importance over LN: simple frequency counting and tf-idf. As reported in [7], we also noticed that tf-idf produces worse results, as it tends to give a low score to frequently used words (for example stopanstvo – economy) which are
Table 1. Word frequency distribution in a text with length about 1000 words

word frequency f_i    number of words w_f with frequency f_i
55                    1
46                    1
22                    6
20                    1
18                    1
16                    1
14                    1
12                    5
10                    5
8                     6
6                     8
4                     44
2                     174

For the frequencies from 8 upwards w_f ≤ f_i holds, while from 6 downwards w_f ≥ f_i.
actually quite important in the case of instructional materials (it is common to repeat the same information to learners in order to help them remember it better). At the same time, sorting the list of nouns by their frequencies, after removing the stop words, gave us quite satisfactory results. To set a threshold between important and less important terms, in previous experiments we examined test items prepared manually by test designers for the same material as the corpora we are processing. The test items were parsed with an NP extractor, and we checked the popularity, in the whole corpus, of the NPs extracted from the test items. The lowest popularity was accepted as the threshold. After repeating the same procedure for different domain corpora, we noticed that the importance border is near the term frequency which equals the number of words having that count. For example, in a comparatively short text we have the situation shown in Table 1, where the threshold is set to frequency f = 7. Once the threshold is adjusted, we consider all terms above it as key terms which should be suggested to the test-makers. Additionally, we enrich the list of key terms with all NPs which contain key terms. For example, along with the term stopanstvo (economy) from the materials in geography we add the following NPs: agrarno stopanstvo (agrarian economy), svetovno stopanstvo (world economy), nacionalno stopanstvo (national economy), pazarno stopanstvo (market economy), nacionalno pazarno stopanstvo (national market economy), svremenno svetovno stopanstvo (contemporary world economics), ponsko stopanstvo (Japanese economy), naturalno stopanstvo (natural economy), svremenno moderno agrarno stopanstvo (contemporary modern agrarian economy).
To prevent the use of phrases like thnoto stopanstvo (their economy), we removed all phrases containing stop words; in our case stop words most often turn out to be personal and possessive pronouns. After POS tagging, the recognised proper nouns were also added to the list of key terms, and the final list of key terms was formed. To summarise, our list of key terms contains:
(+) all nouns and noun phrases with term frequency higher than the threshold;
(+) all noun phrases which contain a noun that is a key term;
(+) all proper nouns;
(−) excluding noun phrases containing stop words.
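A sketch of this selection procedure, including the threshold heuristic behind Table 1, is given below; the function names and the whitespace-based matching of nouns inside NPs are simplifying assumptions:

from collections import Counter

def frequency_threshold(noun_freqs):
    """Find the frequency at which the number of words having that count
    crosses the count itself (for the data of Table 1 this returns 7)."""
    by_freq = Counter(noun_freqs.values())   # frequency -> how many words have it
    for f in sorted(by_freq, reverse=True):
        if by_freq[f] >= f:                  # words with this count outnumber it
            return f + 1                     # the threshold sits just above
    return 1

def key_terms(nouns, noun_phrases, proper_nouns, stop_words):
    freqs = Counter(nouns)
    t = frequency_threshold(freqs)
    keys = {n for n, f in freqs.items() if f >= t}              # frequent nouns
    keys |= {np for np in noun_phrases
             if any(n in keys for n in np.split())}             # NPs with a key noun
    keys |= set(proper_nouns)                                   # tagger-recognised names
    return {k for k in keys
            if not any(w in stop_words for w in k.split())}     # drop NPs with stop words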
3.2 Question Generation
In order to filter out clauses which are appropriate for question generation, a module processes the lexico-syntactic information collected during the preprocessing phase and decides that a clause is eligible if: (1) it contains at least one key term; (2) the term is in an NPA clause of its VPS² (the NPA clause is the subject daughter of the VPS phrase); and (3) the clause is finite. If the three conditions hold, we consider that the term is in the subject phrase of the sentence, which means that it has central meaning for the sentence, and we apply a rule which replaces the focal term with a blank. The system additionally checks that the sentences do not point to figures, tables or appendixes. For example, in the materials on Biology the terms nasledstvenost (heredity) and unasledvane (inheritance) are key terms, and we have the following constituent information for one of the sentences containing these terms:

(S (VPS (NPA (N (NN Blagodarenie)) (PP (Prep (IN na)) (Ncfsd nasledstvenostta) (CoordP (Conj (C (CC i)) (Ncnsd unasledvaneto)) (ConjArg (NPA (N (NN vidovete)) (PP (Prep (IN v)) (N (NN prirodata)))))))) (VPC (V (T (RP ne)) (Pron (Ppxta se)) (V (VB proment))) (NPA (A (JJ dlgo)) (N (NN vreme))))) (PUNC .))

Whichever of the two terms is chosen by the user, the system will try to produce a stem from this sentence, because it satisfies the three necessary conditions. It will thus replace the suggested key term with a blank; the sentence will then be suggested as a fill-in-the-blank question and the key term as its answer.
² NPA – head-adjunct noun phrase / VPS – head-subject verb phrase; for full definitions see the HPSG-based Syntactic Treebank of Bulgarian (BulTreeBank), BulTreeBank Project Technical Report 05, 2004, http://bultreebank.org/TechRep/BTB-TR05.pdf
E.g. Blagodarenie na ... i unasledvaneto vidovete v prirodata ne se promenqt dlgo vreme. (Due to ... and inheritance the species remain unchanged for long periods.) Correct answer: nasledstvenostta (the heredity).

In the following sentence the key term nasledstvenost is again present:

(S (VPS (NPA (CoordP (ConjArg (NPA (N (NN Izuqavaneto)) (PP (Prep (IN na)) (Ncfsd nasledstvenostta) (CoordP (Conj (C (CC i))) (Ncfsd izmenqivostta))))) (Conj (C (CC i))) (ConjArg (N (NN razkrivaneto)))) (PP (Prep (IN )) (Ncmpd ) (Pron (Ppetdp3 )))) (VPC (V (VB )) (NPA (A (JJ )) (N (NN )) (IN ))) (Ncfsd )) (PUNC .))

The term is part of the subject phrase, so it is possible to make a fill-in-the-blank question in which the blank replaces the focal term nasledstvenostta: Izuqavaneto na ... i izmenqivostta i razkrivaneto na zakonomernostite im sa osnovnite zadaqi na genetikata. (The study of ... and variability and the discovery of their regularities are the basic tasks of genetics.) Correct answer: nasledstvenostta (heredity).

Apart from replacing the focal term with a blank, we do not apply any other transformation to the chosen sentence. In this way our method guarantees the grammatical well-formedness of the suggested questions.
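Schematically, the eligibility test and blank substitution could look like this; the sentence attributes are hypothetical stand-ins for the stored lexico-syntactic metadata:

def make_fib(sentence, key_term):
    """Return a fill-in-the-blank draft for an eligible sentence, else None."""
    if (key_term in sentence.text                  # (1) contains the key term
            and key_term in sentence.subject_np   # (2) term sits in the NPA of the VPS
            and sentence.is_finite):               # (3) finite clause
        stem = sentence.text.replace(key_term, "...", 1)
        return {"question": stem, "answer": key_term}
    return None                                    # fall back to context pointers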
3.3 Distractor Generation
For the purpose of our application we need to suggest distractors in two cases: (1) when questions are generated automatically, and (2) when a key term has been chosen by the designer but no questions could be generated for it; then only related concepts are shown to the user (they are extracted by the same principle as distractors, which is why we explain their construction in this section). In well-designed MCTI, the distractors are always semantically close to the correct answer (as well as to each other, in a sense). To find such distractors, in previous studies we tried paragraph clustering in order to identify groups of text sections with similar topics, but on short texts this methodology does not give promising results. We therefore chose a rather simple working solution. We examined already prepared tests for the beginner level and noticed that most of the distractors looked very similar at first sight. They were mostly phrases containing the same noun with different modifiers, or the opposite, composed of the same modifier and different nouns. That is why we adopted the practice of suggesting as distractors NPs that contain the same noun as the key term chosen by the user but a different modifier, and, the other way round, phrases with the same modifier and a different noun. All these phrases are taken from the NP list generated in the first stage. Examples are shown in Tables 2 and 3.
Table 2. Distractors with constant modifier

priroden kompleks (natural complex)
prirodna zona (natural zone)
priroden komponent (natural component)

Table 3. Distractors with constant noun

agrarno stopanstvo (agrarian economy)
svetovno stopanstvo (world economy)
nacionalno stopanstvo (national economy)
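The heuristic behind Tables 2 and 3 lends itself to a few lines of code; the two-word “modifier noun” shape of the phrases is a simplifying assumption:

def distractors(key_phrase, np_list):
    """NPs sharing either the head noun or the modifier with the key phrase."""
    mod, noun = key_phrase.split()
    same_noun = [np for np in np_list
                 if np.endswith(" " + noun) and np != key_phrase]
    same_mod = [np for np in np_list
                if np.startswith(mod + " ") and np != key_phrase]
    return same_noun + same_mod

nps = ["priroden kompleks", "prirodna zona", "priroden komponent",
       "agrarno stopanstvo", "svetovno stopanstvo"]
print(distractors("priroden komponent", nps))   # ['priroden kompleks']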
As a result, for a given key term like priroden komponent (natural component), we would obtain the following FIB item, correct answer and distractors:

Key term:       priroden komponent (natural component)
FIB phrase:     Kompleksi, koito obedinvat vsiqki ............ na dadena teritori za opredeleno vreme se nariqat plni prirodni kompleksi. (Complexes which unite all ....... on a certain territory for a certain period of time are called full nature complexes.)
Correct answer: priroden komponent (natural component)
Distractors:    priroden kompleks (natural complex), prirodni zoni (nature zones)
4 Evaluation
At the current state the system has been tested by three teachers who are professional test designers. Each of them is a specialist in one of the three areas and also has a degree in one of the others. They experimented with materials in the three domains: Biology, Geography and History. Each designer had to choose 20 key terms in total and to evaluate with a YES/NO mark (YES – acceptable question, with or without a need for changes; NO – not acceptable) the questions produced by the system for the chosen key terms. From the materials in Biology and Geography, where the terminology is more specific, useful definitions were extracted and appreciated by the designers. For the History domain, where the language use is more general, mainly proper names were helpful. In total, an average of 61 % of the generated fill-in-the-blank questions were reported as acceptable by the designers (with or without post-editing). The professionals shared that the context and the distractors turned out to be very helpful, because they gave them more options to seek related information. The main reasons for discarding the remaining questions were that some sentences had general meaning and did not represent a specific definition; others were discarded because the blank was ambiguous – they had too
many possible options for a correct answer; or the chosen term was not central to the selected sentence. The designers were especially satisfied with the high quality of the key terms, which served as a cross-reference over the whole material. They found them useful for systematising the topics on which a student could be examined. This feature has a time-saving effect because the vocabulary of key terms represents a summary of the contents. A deeper analysis of the speed-up of the process will be done after improving the user interface of the system. The test designers were certain that questions prepared in this way are useful only for beginner-level testing, where deep understanding is not required and learners are taught mostly basic definitions.
5 Conclusion and Future Work
This experiment represents a step towards automatic test generation, and it shows the advances gained by using more sophisticated tools and deeper processing of the instructional materials. Although the approach is considered domain-independent, we find Biology and Geography more suitable, producing better results than History. One of the reasons is that in History pure one-sentence definitions are hardly found and many references are normally used. In this domain an important role was played by the proper names, which were also included in the list of key terms. As this article represents work in progress, we plan to go deeper into the data analysis by adding dependency parsing. Then we can observe the subject and object clauses and make additional inferences. We will also try different techniques for distractor selection, such as using term similarity measures over the corpus, and different types of questions. We plan to improve the user interface, because it is a main issue affecting the efficiency of the test designers’ work. Overall, we plan a deeper evaluation of the system, including classical test theory and error analysis, in order to improve the produced items.

Acknowledgements. My thanks go to my supervisor Galia Angelova and to Atanas Chanev, who kindly provided models for the SVMTool and Dan Bikel’s parser for Bulgarian.
References
1. Aldabe, I., et al.: Automatic acquisition of didactic resources: generating test-based questions. In: de Castro, I.F. (ed.) Proceedings of SINTICE ’07, pp. 105–111 (2007)
2. Bikel, D.: A Distributional Analysis of a Lexicalized Statistical Parsing Model. In: Lin, D., Wu, D. (eds.) Proceedings of EMNLP (2004)
3. Collins, M.: Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania (1999)
4. Giménez, J., Márquez, L.: SVMTool: A general POS tagger generator based on Support Vector Machines. In: Proceedings of the 4th International Conference LREC’04 (2004)
5. Mitkov, R.: Computer-aided testing: automatic generation of achievement tests from textbooks. In: Proceedings of the International Conference Nouveaux modes d’acquisition du savoir et travail humain (invited talk), Sfax, Tunisia (1998)
6. Mitkov, R., Ha, L.A.: Computer-aided generation of multiple-choice tests. In: Proceedings of the HLT/NAACL 2003 Workshop on Building Educational Applications Using Natural Language Processing, Edmonton, Canada, pp. 17–22 (2003)
7. Mitkov, R., et al.: A computer-aided environment for generating multiple-choice test items. Natural Language Engineering 12, 177–194 (2006)
8. Nikolova, I.: Supporting the Development of Multiple-Choice Tests in Bulgarian by Language Technologies. In: Paskaleva, E., Slavcheva, M. (eds.) Proceedings of the Workshop A Common Natural Language Processing Paradigm for Balkan Languages, pp. 31–34 (2007)
Description Logics for Relative Terminologies

Szymon Klarman

Vrije Universiteit Amsterdam, Department of Computer Science,
De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
[email protected]
Abstract. Context-sensitivity has long been a subject of study in linguistics, logic and computer science. Recently the problem of reasoning with contextual knowledge has also been taken up by the Semantic Web community. In this paper we introduce a conservative extension of the Description Logic ALC which supports the representation of ontologies containing relative terms, such as ‘big’ or ‘tall’, whose meaning depends on the choice of a particular comparison class (context). We define the language and investigate its computational properties, including the specification of a tableau-based decision procedure and complexity bounds.
1 Introduction
It is a commonplace observation that the same expressions might have different meanings when used in different contexts. A trivial example is the concept The Biggest. Figure 1 presents three snapshots of the same knowledge base which focus on different parts of the domain. The extension of the concept visibly varies across the three takes. Intuitively, there seems to be no contradiction in the fact that the individual Moscow is an instance of The Biggest when considered in the context of European cities, an instance of ¬The Biggest when contrasted with all cities, and, finally, an instance of neither when the focus is only on the cities in Asia. Natural language users resolve such superficial incoherencies simply by recognizing that certain terms – call them relative – such as The Biggest, acquire definite meanings only when put in the context of other denoting expressions¹ – in this case, expressions denoting so-called comparison classes, i.e. collections of objects with respect to which the terms are used [1,2]. The problem of context-sensitivity has long been a subject of study in linguistics, logic and even computer science. Recently, it has also been encountered in research on the Semantic Web [3,4], where the need for representing and reasoning with imperfect information becomes ever more pressing. Relativity of meaning is one of the common types of such imperfection. Alas, Description Logics (DLs), which form the foundation of the Web Ontology Language OWL [5], the basic knowledge representation formalism on the Semantic Web, were
¹ The philosophy of language qualifies them as syncategorematic, i.e. terms that do not form denoting expressions by themselves.
[Figure 1: three snapshots of the same knowledge base. In the context EUROPEAN_CITY, Moscow falls under THE_BIGGEST and Amsterdam under ¬THE_BIGGEST; in the context CITY, Tokyo falls under THE_BIGGEST while Amsterdam, Sydney and Moscow fall under ¬THE_BIGGEST; in the context ASIAN_CITY, Tokyo falls under THE_BIGGEST.]

Fig. 1. Example of a relative concept The Biggest
originally developed for modeling crisp, static and unambiguous knowledge, and as such are incapable of handling the task seamlessly. Consequently, it has become highly desirable to look for more expressive, ideally backward-compatible languages to meet the new application requirements [4,6]. In this paper we define a simple, conservative extension of the DL ALC, intended for representing those relative terminologies in which the nature of contextualization complies strictly with the following assumption:

CONTEXT = COMPARISON CLASS

This understanding of contexts, although very specific, is not uncommon in practical applications. In some domains, for instance geographical or medical ones, the use of qualitative descriptions involving relative terms is a typical way of escaping arbitrary threshold-based classification criteria. Technically, the adopted approach rests on a limited use of two-dimensional modal semantics [7], in which the basic object-oriented DL language can obtain multiple interpretations relative to possible worlds on a separate context dimension. Thus, scenarios like the one shown in Fig. 1 can be represented in an intuitive and elegant manner, conceptually and formally compatible with the model-theoretic paradigm of DLs. This paper is a revised version of [8] and its follow-up [9]. The language presented here is considerably constrained with respect to the original proposal, while a much deeper study of its computational aspects is provided. In the next section we define the syntax and semantics of the extension. Further, we specify a tableau-based decision procedure and derive complexity bounds on the satisfiability problem. In the last two sections, we briefly position our work in a broader perspective and conclude the presentation.
2 Representation Language
For a closer insight into the problem, consider again the scenario from Fig. 1. Apparently, there is no straightforward way of modeling it in a standard DL fashion. Asserting both Moscow : The Biggest and Moscow : ¬The Biggest in the same knowledge base results in an immediate contradiction. On the other hand, using indices for marking versions of a concept in different contexts, such as in Moscow : The Biggest_EC and Moscow : ¬The Biggest_C, indeed
allows one to avoid the inconsistency, but at the price of a full syntactic and semantic detachment of the indexed versions. Thus, the latter strategy makes it impossible to impose global constraints on the contextualized concepts, for instance, to declare that regardless of the context, The Biggest is always a subclass of Big. Moreover, neither of the approaches facilitates the use of knowledge about comparison classes per se, for instance, in order to infer a contradiction in case European City happens to be equivalent to City, and thus denotes exactly the same context. Finding a suitable fix for this kind of flaws motivates our proposal to a large extent. The logic CALC, introduced in this paper, extends the basic DL ALC with a modal-like operator which internalizes the use of comparison classes in the language. The classes are denoted by means of arbitrary DL concepts. Semantically, the operator is grounded in an extra modal dimension incorporated into DL interpretations, whose possible states are subsets of the object domain. We start by recalling the basic nomenclature of DLs and then give a detailed account of the syntax and semantics of CALC.
2.1 Description Logic ALC
A DL language is specified by a signature Σ = (N_I, N_C, N_R), where N_I is a set of individual names, N_C a set of concept names, and N_R a set of role names, and a set of operators enabling construction of complex formulas [10]. The DL ALC permits concept descriptions defined by means of concept names (atomic concepts), the special symbols ⊤, ⊥ and the following constructors:

C, D → ¬C | C ⊓ D | C ⊔ D | ∃r.C | ∀r.C

where C, D are arbitrary concept descriptions and r is a role. A knowledge base K = (T, A) in ALC consists of a terminological and an assertional component. The (general) TBox T contains concept inclusion axioms C ⊑ D (abbreviated to C ≡ D whenever C ⊑ D and D ⊑ C). The ABox A contains axioms of two possible forms: concept assertions C(a) and role assertions r(a, b), where a, b are individual names. The semantics is defined in terms of an interpretation I = (Δ^I, ·^I), where Δ^I is a non-empty domain of individuals, and ·^I is an interpretation function which maps every a ∈ N_I to an element of Δ^I, every C ∈ N_C to a subset of Δ^I and every r ∈ N_R to a subset of Δ^I × Δ^I. The function is inductively extended over complex terms in the usual way, according to the semantics of the operators. An interpretation I satisfies an axiom in the following cases:
– I |= C ⊑ D iff C^I ⊆ D^I,
– I |= C(a) iff a^I ∈ C^I,
– I |= r(a, b) iff (a^I, b^I) ∈ r^I.
An interpretation is a model of a knowledge base iff it satisfies all its axioms.
2.2 Description Logic CALC
The logic CALC adds to the syntax of ALC a new concept constructor, based on the modal-like context operator ⟨·⟩:

C, D → ⟨D⟩C

A contextualized concept description consists of a relative concept C and a specified comparison class D, which co-determines the meaning of C. Intuitively,
⟨D⟩C denotes all objects which are C as considered in the context of all objects which are D. For instance, ⟨City⟩The Biggest describes the individuals that are the biggest as considered in the context of (all and only) cities. Other than that, CALC does not differ from ALC on the syntactic level. Some deeper changes are introduced to the semantics of the language, which is augmented with an extra modal dimension whose possible states – comparison classes/contexts – are defined extensionally as subsets of the (global) domain of interpretation. In each context a relevant part of the vocabulary is freely reinterpreted. Definition 1 introduces the notion of a context structure, which is an interpretation of a CALC language.

Definition 1. A context structure for a CALC language is a triple C = ⟨Δ, W, {I_w}_{w∈W}⟩, where:
– Δ is a global domain of interpretation,
– W ⊆ ℘(Δ) is a set of comparison classes, with Δ ∈ W and ∅ ∉ W,
– I_w = (Δ^I_w, ·^I_w) is an interpretation of the language in the context w:
  • Δ^I_w = w is a non-empty domain of individuals,
  • ·^I_w is an interpretation function defined as usual.

Given a context structure C = ⟨Δ, W, {I_w}_{w∈W}⟩ we can now properly define the semantics of contextualized concept descriptions:

(⟨D⟩C)^I_w = {x ∈ Δ^I_w | x ∈ D^I_w ∧ x ∈ C^I_{w|D}}

where w|D is an operation returning the v ∈ W such that v = D^I_w. The accessibility relation over W, which we leave implicit, visibly follows the ⊇-ordering of the comparison classes, with Δ ∈ W being its least element. Put differently, the context operator might give access only to a world whose domain is a subset of the current one. We also do not introduce the dual ‘box’ operator, as it is not very interesting from the modeling perspective and, moreover, practically redundant, even as an abbreviation for the usual ¬⟨D⟩¬C. Observe that according to our semantics ¬⟨D⟩¬C ≡ ¬D ⊔ ⟨D⟩C; hence a CALC formula in Negation Normal Form does not in fact contain negations in front of ⟨·⟩. For a finer-grained treatment of context-sensitivity we impose a few additional, natural constraints on the local interpretations of the vocabulary. First, we note that in general not the whole language should always be interpreted in a context, but only the part which is deemed meaningful in it. In our case, this is especially apparent with respect to individual names, which are in principle rigid, but in certain contexts might lose their designations. This phenomenon is sanctioned by the following assumption:
(RI) for every a ∈ N_I and w, v ∈ W, if a^I_w and a^I_v are defined, then a^I_w = a^I_v.

Further, we distinguish between local and global concept names (N_C^l and N_C^g, respectively) and roles (N_R^l and N_R^g). While the local terms (relative to contexts) are interpreted freely, the interpretations of the global (context-independent) ones are constrained so as to behave backward-monotonically along the accessibility relation:

(GC) for every C ∈ N_C^g and w ∈ W: C^I_w = C^I_Δ ∩ Δ^I_w,
(GR) for every r ∈ N_R^g and w ∈ W: r^I_w = r^I_Δ ∩ (Δ^I_w × Δ^I_w).

Finally, we allow local and global TBoxes (T^l, T^g). The global axioms hold universally in all contexts, whereas the local ones apply only to the root of the context structure. The intuition here is that some terminological constraints are analytical and thus context-independent (global), whereas others cease to hold when the focus shifts to a specific comparison class (local). For decidability reasons the syntax of global axioms is restricted to the ALC fragment. ABox axioms are left local in the above sense, although it is straightforward to extend their validity to all contexts by means of the global vocabulary. As expected, the notion of satisfaction in CALC is relativized to a context structure and a particular context in it, i.e. C, w |= ϑ iff ϑ is satisfied by I_w. A context structure C is a model of a knowledge base iff the constraints (RI), (GC), (GR) are respected in C, and all the axioms are satisfied with respect to the following contexts:
– C, Δ |= C ⊑ D, if C ⊑ D ∈ T^l,
– C, w |= C ⊑ D for every w ∈ W, if C ⊑ D ∈ T^g,
– C, Δ |= C(a),
– C, Δ |= r(a, b).
It follows that both syntactically and semantically CALC is a conservative extension of ALC, i.e. an ALC knowledge base is satisfiable iff it is satisfiable as a CALC knowledge base.

2.3 Representation of Relative Terminologies
As an example of a CALC knowledge base we will formalize a toy ontology of cities and towns and their relative sizes. On a larger scale, similar conceptualizations are common, for instance, in modeling geographic information, where notions are not seldom defined by means of relative terms referring to comparison classes. Such a strategy allows one to avoid the use of arbitrary value intervals on some physical attributes and to replace them by qualitative and more practical approximations [11].

T^l = { (1) City ≡ European City ⊔ Asian City,
        (2) European City ⊓ Asian City ⊑ ⊥,
        (3) Town ≡ ⟨City⟩Small }

T^g = { (4) The Biggest ⊑ Big,
        (5) Big ⊓ Small ⊑ ⊥ }

A = { (6) ⟨City⟩The Biggest(Tokyo),
      (7) ⟨Asian City⟩The Biggest(Tokyo) }
We assume that the concepts Town, City, European City and Asian City are interpreted globally, whereas the remaining ones locally. The local TBox states that every city is either a European or an Asian city (1) and that these two classes are disjoint (2), and defines towns as the small cities (3). Further, we ensure that regardless of the context, the biggest objects are always big (4), and these in turn are never small (5). Finally, we assert that Tokyo is the biggest as compared to cities (6) and as compared to Asian cities (7). Given this setup it can be shown, for instance, that the following entailments hold:

K |= ⟨City ⊓ ¬European City⟩Big(Tokyo)
K |= ¬Town(Tokyo)

The validity of the first entailment rests on the fact that Asian cities are exactly those that are cities but not European ones (1,2). Hence, the comparison class denoted by Asian City is the same as that described by City ⊓ ¬European City. Consequently, since Tokyo is an instance of The Biggest in the former context (7), this has to be the case in the latter as well. Finally, being the biggest there, it has to be an instance of Big (4). By similar reasoning we can also demonstrate the second claim. Observe that Tokyo is an instance of Big in the context of all cities (4,6), and therefore of ¬Small in that context (5). But then it follows that it cannot be a town, or else it would have to be a small city (3).
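To make the semantics concrete, the following toy sketch evaluates contextualized concepts over a finite domain. It is an illustration only: the object names, the size table and the single local concept Biggest (interpreted as the largest element of the current comparison class) are invented stand-ins, while global concepts are restricted to the current context as required by (GC):

SIZE = {"Tokyo": 38, "Moscow": 12, "Amsterdam": 1}   # hypothetical sizes
GLOBAL = {"City": {"Tokyo", "Moscow", "Amsterdam"},
          "EuropeanCity": {"Moscow", "Amsterdam"},
          "AsianCity": {"Tokyo"}}

def interp(atom, w):
    if atom == "Biggest":                  # local concept, re-interpreted per context
        return {max(w, key=SIZE.get)} if w else set()
    return GLOBAL[atom] & w                # global concept, restricted as per (GC)

def ext(c, w):
    """Extension of concept c in the context (comparison class) w."""
    if not isinstance(c, tuple):
        return interp(c, w)
    if c[0] == "not":
        return w - ext(c[1], w)
    if c[0] == "and":
        return ext(c[1], w) & ext(c[2], w)
    if c[0] == "ctx":                      # <D>C: x is D in w and C in the context D^Iw
        d = ext(c[1], w)
        return {x for x in d if x in ext(c[2], d)}

top = GLOBAL["City"]
print(ext(("ctx", "City", "Biggest"), top))          # {'Tokyo'}
print(ext(("ctx", "EuropeanCity", "Biggest"), top))  # {'Moscow'}

Running the sketch reproduces the pattern of Fig. 1: the extension of the relative concept shifts as the comparison class shrinks.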
3 Reasoning with Comparison Classes
To properly frame the discussion of the computational aspects of CALC, we should first carefully consider the relationship between the syntactic and the semantic view on the contexts involved in our logic. Syntactically, every CALC formula ϑ induces a finite tree of context labels Λ_ϑ = {γ, δ, ...}, isomorphic to the structure of ⟨·⟩-nestings in the formula. For instance, the inclusion ⟨A⟩⊤ ⊑ ⟨B⟩(⟨A⟩⊤ ⊓ ⟨B⟩⊤) gives rise to the tree in Fig. 2.

[Figure 2: a tree rooted in ε (with Δ^I_ε = Δ), with an A-edge to node A (Δ^I_A = A^I) and a B-edge to node B (Δ^I_B = B^I); from B, an A-edge leads to node B|A (Δ^I_{B|A} = A^I_B) and a B-edge to node B|B (Δ^I_{B|B} = B^I_B).]

Fig. 2. A tree of context labels and the context structure

The labels are represented as
finite sequences of concepts separated by vertical lines: γ = C_1 | ... | C_{n-1} | C_n, where every concept is the description occurring in some context operator at a certain depth of the formula. The empty label ε refers to the formula’s root. Labels can then easily be rendered back into the language as CALC concepts of the form γ^L, where γ^L = ⟨C_1⟩···⟨C_{n-1}⟩C_n for γ = C_1 | ... | C_{n-1} | C_n. Let us now shift to the semantic perspective. Clearly, the ⊇-ordering of comparison classes in a context structure does not necessarily have to be tree-shaped. In fact, different descriptions of comparison classes might denote exactly the same subsets of the domain. This characteristic has to be appropriately handled in the reasoning. For that purpose we will find the following notion useful. An assignment of context equalities over a formula ϑ is a set Ω ⊆ {γ ∼ δ | γ, δ ∈ Λ_ϑ}, where ∼ is an equivalence relation and Ω is closed under ∼.
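As an illustration of the syntactic side, the following sketch collects the context labels Λ_ϑ induced by a concept; the tuple encoding, with ("ctx", D, C) standing for ⟨D⟩C, is a hypothetical convention chosen here for illustration:

def labels(concept, prefix=()):
    """All context labels (tuples of comparison-class concepts) in a concept."""
    out = {prefix}
    if isinstance(concept, tuple):
        if concept[0] == "ctx":
            d, c = concept[1], concept[2]
            out |= labels(d, prefix)          # D is evaluated in the current context
            out |= labels(c, prefix + (d,))   # C is evaluated in context prefix | D
        else:
            for sub in concept[1:]:
                out |= labels(sub, prefix)
    return out

# The inclusion of Fig. 2, reading the inclusion as the pair of its sides
# and "T" standing for the top concept:
phi = ("and", ("ctx", "A", "T"),
              ("ctx", "B", ("and", ("ctx", "A", "T"), ("ctx", "B", "T"))))
print(sorted(labels(phi)))   # [(), ('A',), ('B',), ('B', 'A'), ('B', 'B')]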
3.1 Tableau Decision Procedure
The tableau calculus for CALC, presented in this section, is an extension of the well-known procedures for ALC [12]. The proof of satisfiability of a formula ϑ is a process of finding a complete and clash-free constraint system for ϑ (a set of logical constraints) by means of tableau rules. If such a system exists then ϑ is satisfiable – and unsatisfiable otherwise. The constraint systems are constructed by iterative application of inference rules to the constraints in the system. Apart from variables for representing domain objects we also use context labels for marking contexts, and assume that both sets are well-ordered by some relation ≺. By an abuse of notation we write γ ∈ S (or γ : x ∈ S) to say that label γ (or term x within the scope of label γ) occurs in the system S. A proof for ϑ = ⋀_i ϑ_i, where every ϑ_i is a CALC axiom, is initiated by setting up a constraint system containing ε : ϑ_i for all i. For simplicity, we assume that every concept inclusion ε : C ⊑ D added to the tableau is instantaneously rewritten into the equivalent form ε : ⊤ ≡ ¬C ⊔ D, similarly ε : C ≡ D into ε : ⊤ ≡ (¬C ⊔ D) ⊓ (C ⊔ ¬D), and that all concepts occurring on the tableau are in Negation Normal Form. Finally, we allow a special type of constraints γ ∼ δ, which represent designated context equalities. The inference mechanism involves the standard ALC rules along with the CALC-specific rules, presented in the order of application in Tab. 1. The meaning of ⇒⟨·⟩ is straightforward: it introduces a relative concept assertion within the scope of a newly generated context label, thus marking a transition of the proof into a different context. The ⇒∼ rule, for every pair of different context labels occurring in the system, decides non-deterministically whether the contexts denoted by them should be interpreted as equal or not. In either case respective constraints are added to the system to enforce the generation of adequate models; moreover, if the former is chosen, the rule introduces the corresponding equality statement over the context labels, which is used as a reference for the application of the ALC rules. The rules ⇒⊆ and ⇒⊇ jointly ensure that for any context label γ|C used in the system, a variable x occurs within its scope if and only if the system contains a constraint γ : C(x). The remaining rules straightforwardly implement the semantics of global concepts, roles, and local and global TBox axioms.
Table 1. CALC tableau rules

⇒⟨·⟩  if γ : ⟨C⟩D(x) ∈ S then set S := S ∪ {γ|C : D(x)}

⇒∼   if {γ, δ} ⊆ S, where γ ≠ δ, then set S := S ∪ {ε : (γ^L ≡ δ^L)} ∪ {γ ∼ δ} or S := S ∪ {ε : ¬(γ^L ≡ δ^L)}

⇒⊆   if γ|C ∈ S then set S := S ∪ {ε : (γ^L ⊓ C ⊑ ⟨γ^L⟩C)}

⇒⊇   if γ|C : x ∈ S then set S := S ∪ {γ : C(x)}

⇒Cg  if γ : C(x) ∈ S, where C ∈ N_C^g, and δ : x ∈ S then set S := S ∪ {δ : C(x)}

⇒Rg  if γ : r(x, y) ∈ S, where r ∈ N_R^g, and {δ : x, δ : y} ⊆ S then set S := S ∪ {δ : r(x, y)}

⇒≡Tl if ε : (⊤ ≡ C) ∈ S, where ⊤ ≡ C ∈ T^l, and ε : x ∈ S then set S := S ∪ {ε : C(x)}

⇒≡Tg if ε : (⊤ ≡ C) ∈ S, where ⊤ ≡ C ∈ T^g, and γ : x ∈ S then set S := S ∪ {γ : C(x)}

⇒≢   if ε : ¬(C ≡ D) ∈ S then for a new ≺-minimal x set S := S ∪ {ε : (C ⊓ ¬D)(x)} or S := S ∪ {ε : (¬C ⊓ D)(x)}
The rule ⇒≢ applies only to the constraints introduced by ⇒∼, which are interpreted locally. The ALC rules (⇒⊓, ⇒⊔, ⇒∃, ⇒∀, blocking and clash (branch closure); see [12]) are applied locally to the constraints with equal context labels, i.e. to the systems S_γ = {φ(x) | δ : φ(x) ∈ S and δ ∈ [γ]}, where [γ] = {δ ∈ S | δ = γ or δ ∼ γ ∈ S}. The constraints generated by a rule due to its application to the system S_γ are added to S with a ≺-minimal context label from [γ]. As usual, it is required that application of the ⇒∃ rule is deferred until no other rules apply. We say that S contains a clash if and only if there exists a label γ ∈ S such that S_γ contains a clash. In such cases no other rules are applicable to S. The correctness of the algorithm is proven in the appendix.
3.2 Computational Complexity
It turns out that the convenient expressiveness of the language comes at a noticeable expense in the complexity of reasoning. More precisely, we are going to show that any CALC formula ϑ can be translated into an equisatisfiable ALC
formula which, in the worst case, is exponentially larger than ϑ. However, as the exponential blow-up stems exclusively from the fact that one has to account for all possible assignments of context equalities over the formula, we can consider an ‘oracle’ providing a correct assignment and, as a result, obtain a translation only polynomially larger. Since the decision problem in ALC with non-empty TBoxes is ExpTime-complete [10], we will therefore conclude that the upper bound for deciding satisfiability in CALC is NExpTime. The translation of a formula ϑ = ⋀_i ϑ_i, where each ϑ_i is a CALC axiom, is defined as:

    ϑ_ε = ⋁_{Ω ∈ Ω_ϑ} ϑ^Ω_ε = ⋁_{Ω ∈ Ω_ϑ} ⋀_i (ϑ_i)^Ω_ε

where Ω_ϑ is the set of all possible assignments of context equalities for ϑ. The details are presented in Tab. 2. The translation rests on the introduction of fresh atoms, marked as A* for new concept names and r* for new role names, and of supplementary TBox axioms which constrain the interpretation of the added terms. Roughly, the new atoms are used to differentiate between occurrences of the same terms within non-equal contexts, and additionally to abbreviate the references to the comparison classes. The following restrictions are imposed on the translation function ·^Ω:

    A^Ω_γ = A^Ω_δ iff γ ∼ δ ∈ Ω        r^Ω_γ = r^Ω_δ iff γ ∼ δ ∈ Ω

Table 2. Translation ·^Ω_ε from CALC to ALC for a fixed Ω

    (C ⊑ D)_ε := C_ε ⊑ D_ε                                      if C ⊑ D ∈ T^l
    (C ⊑ D)_ε := (⟨γ^L⟩C)_ε ⊑ (⟨γ^L⟩D)_ε for every γ ∈ Λ_ϑ      if C ⊑ D ∈ T^g
    (C(a))_ε := C_ε(a)                                          if C(a) ∈ A
    (r(a, b))_ε := r_ε(a, b)                                    if r(a, b) ∈ A
    A_γ := A if A ∈ N_C^g      A_γ := A* if A ∈ N_C^l      A_ε := A
    ⊥_γ := ⊥      ⊤_γ := ⊤
    r_γ := r if r ∈ N_R^g      r_γ := r* if r ∈ N_R^l      r_ε := r
    (¬A)_γ := ¬A_γ      (C ⊓ D)_γ := C_γ ⊓ D_γ      (C ⊔ D)_γ := C_γ ⊔ D_γ
    (∃r.D)_ε := ∃r_ε.D_ε      (∀r.D)_ε := ∀r_ε.D_ε
    (∃r.D)_{γ|C} := ∃r_{γ|C}.(⌈C⌉_γ ⊓ D_{γ|C})      (∀r.D)_{γ|C} := ∀r_{γ|C}.(⌈C⌉_γ ⊓ D_{γ|C})
    (⟨C⟩D)_γ := ⌈C⌉_γ ⊓ D_{γ|C}
    ⌈C⌉_{γ|D} := A*, setting T := T ∪ {A* ≡ ⌈D⌉_γ ⊓ C_{γ|D}}
    ⌈C⌉_ε := A*, setting T := T ∪ {A* ≡ C_ε}
    for every ⌈C⌉_γ and ⌈D⌉_δ with ⌈C⌉_γ ≠ ⌈D⌉_δ, set:
        T := T ∪ {⌈C⌉_γ ≡ ⌈D⌉_δ}      iff (γ|C) ∼ (δ|D) ∈ Ω
        T := T ∪ {¬(⌈C⌉_γ ≡ ⌈D⌉_δ)}   iff (γ|C) ∼ (δ|D) ∉ Ω
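Note that an assignment of context equalities is exactly an equivalence relation on Λ_ϑ, so the disjunction above ranges over as many translations as there are partitions of the label set, i.e. the Bell number of |Λ_ϑ|. The following small sketch, not part of the paper’s formal apparatus, computes these counts:

from functools import lru_cache
from math import comb

@lru_cache(None)
def bell(n):
    """Number of partitions (equivalence relations) of an n-element set."""
    if n == 0:
        return 1
    return sum(comb(n - 1, k) * bell(k) for k in range(n))

print([bell(n) for n in range(1, 8)])   # [1, 2, 5, 15, 52, 203, 877]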
The following Lemma, which we prove in the appendix, states the general properties of the translation.
Description Logics for Relative Terminologies
133
Lemma 1 (Translation properties). For every CALC formula ϑ it holds that: 1. ϑ is satisfiable iff ϑε is satisfiable; 2. for a fixed assignment of context equalities Ω the size of ϑΩ ε is polynomial in the size of ϑ; 3. the size of ϑε is exponential in the size of ϑ. Based on those we are able to derive the following complexity bounds for the satisfiability problem in CALC . Theorem 1 (Complexity bounds). The upper and the lower bound on the complexity of deciding satisfiability of a CALC formula are NExpTime and ExpTime, respectively. As CALC is a conservative extension of ALC, the lower bound ExpTime carries over directly from ALC [10]. For the upper bound, note that Lemma 1 implies that under a fixed assignment of context equalities the size of the ALC formula resulting from the translation can be at most polynomially larger than that of the original CALC formula. Assuming the correct assignment is given, solving the problem can be at most as hard as in ALC w.r.t. a non-empty TBox, i.e. ExpTime-complete. Deciding satisfiability in CALC is therefore in NExpTime. Apart from indicating the upper bound for complexity, Lemma 1 offers also two additional insights into CALC . Firstly, it provides an alternative to the tableau decision procedure. In fact, both approaches are very closely related. In particular, they involve the same exponential blow-up associated with the number of possible assignments of context equalities, and also the treatment of terms occurring within the scope of context operators is analogical in both cases. Regardless of that, it is worthwhile to study the tableau calculus independently, as some potential extensions of CALC (e.g. allowing anonymous context operators) might easily impede the translation-based strategy, while still remain possible to handle by the tableau augmented with some additional rules. Secondly, it shows that strictly speaking CALC is not more expressive than ALC. Nevertheless, there exists no equivalence-preserving reduction, i.e. formulas of CALC do not have in general equivalent counterparts in ALC. For this reason we conjecture that CALC is strictly more succinct than ALC, a feature very appealing for representation languages.
4
Related Work
The logic CALC can be seen as a special case of multi-dimensional DLs [13], and more generally, as an instance of multi-dimensional modal logics [7], in which next to the standard object dimension we introduce a second one, referring to the subsets of the object domain as the possible states in the model. The scope of multi-dimensionality involved here, however, is very limited, thus discharging certain computational problems inherent to richer multi-dimensional formalisms. Notably, we were able to define a terminating decision procedure without resorting to some more advanced techniques such as based on quasimodels [14].
134
S. Klarman
The problem of representing and reasoning with contextual knowledge, in particular in DLs, has been quite extensively studied in the literature, e.g. in [4,15,3,16,17]. The vast majority of authors, following the tradition of McCarthy and others [18,19,20], consider contexts on a very abstract level, as primitive First-Order objects, which by themselves do not have any properties. Thus the general semantic intuition of introducing an additional dimension, in an explicit (e.g. by listing all contexts [4,15,3]) or an implicit (e.g. by treating subsets of models of a knowledge base as contexts [16,17]) manner, is common with our approach. However, as the problem we address here is more specific, we are also able to offer a stronger explication of what a context is — namely a subset of domain objects — and, consequently a stronger inference mechanism. Some generality is therefore sacrificed for the sake of problem-specific customization. Finally, CALC shares certain similarities with public announcement logic (PAL) [21], which studies the dynamics of information flow in epistemic models. Interestingly, our context operator can be to some extent seen as a PAL announcement, whose role is to reduce the DL (epistemic) model to exactly those individuals (epistemic states) that satisfy given concept (formula). Unlike in PAL, however, we interpret an application of the operator as a leap to a different model, rather than an update of the current one, thus allowing for a change in the meaning of relative terms. Because of that, it is also not possible to reduce reasoning in CALC to PAL, for which tableau proof procedures exist, e.g. [22], or directly transfer other interesting results [23]. Only in a special case (empty TBoxes and only global concepts and roles) is CALC a variant of PAL on unrestricted frames.
5
Conclusions and Future Work
Providing a sound formal account of context-sensitivity and related phenomena is a vital challenge in the field of knowledge representation, and quite recently, also on the Semantic Web. In this paper we have addressed a very specific case of that problem, namely, representation of relative terms, whose meaning depends on the selection of comparison classes to which the terms are applied. Admittedly, the scope of the proposal is quite narrow and it does not pretend to have solved the general problem of context-sensitivity in DL-based representations. Nevertheless, we have showed that by a careful use of supplementary modal dimensions one can obtain extra expressive power, which on the one hand is sufficient to handle certain interesting representation problems, while on the other does not require deep revisions on the syntactic, semantic nor, most importantly, the proof-theoretic side of the basic DL paradigm. Our belief, which we aim to verify in the course of the future work, is that in a similar manner, aspects of multi-dimensionality can offer convenient formal means for addressing other types of context-sensitivity, and other phenomena related to imperfect knowledge, such as uncertainty or vagueness, which currently are approached on the grounds of formalisms involving a thorough reconstruction of the semantics and the proof theory of DLs, e.g. probabilistic, possibilistic or fuzzy DLs [24].
Description Logics for Relative Terminologies
135
Acknowledgements I would like to thank my supervisor Stefan Schlobach, as well as Carsten Lutz and V´ıctor Gut´ıerrez-Basulto, for helpful discussions on the ideas presented in this paper. This research has been funded by Hewlett-Packard Labs.
References 1. Shapiro, S.: Vagueness in Context. Oxford University Press, Oxford (2006) 2. Gaio, S.: Granular Models for Vague Predicates. In: Proceedings of the Fifth International Conference, FOIS 2008 (2008) 3. Bouquet, P., Giunchiglia, F., van Harmelen, F., Serafini, L., Stuckenschmidt, H.: C-OWL: Contextualizing ontologies. In: Fensel, D., Sycara, K.P., Mylopoulos, J. (eds.) ISWC 2003. LNCS, vol. 2870, pp. 164–179. Springer, Heidelberg (2003) 4. Benslimane, D., Arara, A., Falquet, G., Maamar, Z., Thiran, P., Gargouri, F.: Contextual ontologies: Motivations, challenges, and solutions. In: Proceedings of the Advances in Information Systems Conference, Izmir (2006) 5. Horrocks, I., Patel-Schneider, P.F., Harmelen, F.V.: From SHIQ and RDF to OWL: The making of a Web Ontology Language. Journal of Web Semantics 1 (2003) 6. Guha, R., Mccool, R., Fikes, R.: Contexts for the semantic web. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 32–46. Springer, Heidelberg (2004) 7. Kurucz, A., Wolter, F., Zakharyaschev, M., Gabbay, D.M.: Many-Dimensional Modal Logics: Theory and Applications. Studies in Logic and the Foundations of Mathematics, vol. 148. Elsevier, Amsterdam (2003) 8. Klarman, S.: Description logics for relative terminologies or why the biggest city is not a big thing. In: Icard, T. (ed.) Proc. of the ESSLLI 2009 Student Session (2009) 9. Klarman, S., Schlobach, S.: Relativizing concept descriptions to comparison classes. In: Description Logics. CEUR Workshop Proceedings, vol. 477. CEUR-WS.org (2009) 10. Baader, F., Calvanese, D., Mcguinness, D.L., Nardi, D., Patel-Schneider, P.F.: The description logic handbook: theory, implementation, and applications. Cambridge University Press, Cambridge (2003) 11. Third, A., Bennett, B., Mallenby, D.: Architecture for a grounded ontology of geographic information. In: Fonseca, F., Rodr´ıguez, M.A., Levashkin, S. (eds.) GeoS 2007. LNCS, vol. 4853, pp. 36–50. Springer, Heidelberg (2007) 12. Baader, F., Sattler, U.: An overview of tableau algorithms for description logics. Studia Logica 69, 5–40 (2001) 13. Wolter, F., Zakharyaschev, M.: Multi-dimensional description logics. In: The Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pp. 104–109 (1999) 14. Wolter, F., Zakharyaschev, M.: Satisfiability problem in description logics with modal operators. In: Proceedings of the Sixth Conference on Principles of Knowledge Representation and Reasoning, pp. 512–523 (1998) 15. Giunchiglia, F., Ghidini, C.: Local models semantics, or contextual reasoning = locality + compatibility. Artificial Intelligence 127 (2001) 16. Grossi, D.: Desigining Invisible Handcuffs. Formal Investigations in Institutions and Organizations for Multi-Agent Systems. PhD thesis, Utrecht University (2007)
136
S. Klarman
17. Goczyla, K., Waloszek, W., Waloszek, A.: Contextualization of a DL knowledge base. In: The Proc. of the Description Logics Workshop (2007) 18. McCarthy, J.: Notes on formalizing context, pp. 555–560. Morgan Kaufmann, San Francisco (1993) 19. Guha, R.V.: Contexts: A Formalization and Some Applications. PhD thesis, Stanford University (1995) 20. Buvac, S., Mason, I.A.: Propositional logic of context. In: Proc. of the 11th National Conference on Artificial Inteligence, pp. 412–419 (1993) 21. van Ditmarsch, H., van der Hoek, W., Kooi, B.: Dynamic Epistemic Logic. Synthese Library. Springer, Heidelberg (2007) 22. Balbiani, P., Ditmarsch, H., Herzig, A., Lima, T.: A tableau method for public announcement logics. In: Olivetti, N. (ed.) TABLEAUX 2007. LNCS (LNAI), vol. 4548, pp. 43–59. Springer, Heidelberg (2007) 23. Lutz, C.: Complexity and succinctness of public announcement logic. In: AAMAS’06: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, pp. 137–143. ACM, New York (2006) 24. Lukasiewicz, T., Straccia, U.: Managing uncertainty and vagueness in description logics for the semantic web. Web Semantics 6(4), 291–308 (2008)
Appendix We prove that the results of soundness, termination and completeness hold for the tableau calculus presented in Sec. 3. The notation and the structure of the proofs follow closely the presentation in [7]. In particular we use con(ϑ) to denote all subconcepts of ϑ, rol (ϑ) for role names, and obj (ϑ) for all individual names occurring in the formula. Theorem 2 (Soundness). Let S be a complete clash-free constraint system for ϑ. Then ϑ is satisfiable. Proof. We use S as ‘a guide’ to show that there exists a model for ϑ. First, we define a structure C = Δ, W, {Iw }w∈W , using the following abbreviations [γ] = {δ ∈ S | δ = γ or δ ∼ γ ∈ S} and Sγ = {φ(x) | δ : φ(x) ∈ S and δ ∈ [γ]}: – w[γ] ∈ W for every [γ], such that γ ∈ S, where I[γ] comprises all terms x such that x ∈ Sγ w[γ] = Δ – Δ= W – x ∈ AI[γ] for every γ, x and A such that A(x) ∈ Sγ – aI[γ] = a for every γ and a ∈ obj (ϑ) such that a ∈ Sγ – rI[γ] (x, y) for every γ, x, y and r such that r(x, y) ∈ Sγ (or r(z, y) ∈ Sγ in case x is blocked in Sγ by z) It follows that C is a context structure. Note, that we ensure: 1) w[γ] = w[δ] if and only if [γ] = [δ], by the closure of S under ⇒∼ , 2) the rigidity of individual names, by the construction of C, 3) the semantics of global concept and role names, by the closure of S under ⇒C g and ⇒Rg . Next we prove the following: Proposition 1. For every w[γ] ∈ W , concept C ∈ con(ϑ) and x ∈ ΔI[γ] , if γ : C(x) ∈ S then x ∈ C I[γ] .
Description Logics for Relative Terminologies
137
Proof. The proof is by induction on the form of C. – For atomic concepts the claim follows directly from the definition of C. – Let C = ¬A for some atomic concept A and suppose γ : ¬A(x) ∈ S for any γ and x. Notice that there can be no δ : A(x) ∈ S for any δ ∈ [γ] as S is clash-free. Hence x ∈ AI[γ] and x ∈ (¬A)I[γ] . – The cases of C = B D, C = B D, C = ∃r.D and C = ∀r.D are as in ALC (see [7]), modulo proper indexing of the interpretation function and the labeling of contexts. – Let C = BD and suppose γ : BD(x) ∈ S. Then since S is closed under ⇒· , it has to be the case that γ | B : D(x). But then there exists w[γ|B] ∈ W , such that x ∈ DI[γ|B] . Also, since S is closed under ⇒⊆ and ⇒⊇ , it has to be the case that ΔI[γ|B] = B I[γ] . Therefore x ∈ ( BD)I[γ] . By Proposition 1 we can finally show the following: Proposition 2. C is a model of ϑ. Proof. Recall that ϑ = i ϑi and for all i, ε : ϑi is in the initial constraint system for ϑ with empty context labels. Observe that w[ε] is the root of C and consider possible syntactic form of any ϑi : – C(a): then by Prop. 1 a ∈ C I[ε] , hence C, w[ε] |= C(a) – r(a, b): then by definition of C it holds that aI[ε] , bI[ε] ∈ rI[ε] and hence C, w[ε] |= r(a, b) – ≡ C ∈ T l : then since S is closed under ⇒≡T l , it has to be the case that for all x ∈ ΔI[ε] , ε : C(x) ∈ S and by Prop. 1, x ∈ C I[ε] . Hence I[ε] = C I[ε] , and consequently C, w[ε] |= ≡ C. – ≡ C ∈ T g : then since S is closed under ⇒≡T g , it has to be the case that for all γ, x ∈ ΔI[γ] , γ : C(x) ∈ S and by Prop. 1, x ∈ C I[γ] . Hence I[γ] = C I[γ] , and consequently C, w[γ] |= ≡ C for all w[γ] ∈ W . It follows that each conjunct of ϑ is satisfied by C.
This completes the proof of soundness.
Theorem 3 (Termination). There is no infinite sequence of inference steps via the tableau rules. Proof. Consider a formula ϑ in NNF. Clearly, there is only a finite number of
· operators used in ϑ, and hence, there can be only a finite number of unique context labels introduced in the tableau due to application of the ⇒· rule. Given that, there can be also only finite number of inference steps via the rules ⇒∼ and ⇒⊆ , as well as via the ⇒⊇ rule for any individual variable. Note, that other than occurrences of ·, ϑ does not contain any symbols from outside ALC, hence the only problem for termination is posed by application of the ⇒∃ rule (clearly, upon suspending it there can be always only a finite number of possible inference steps). But given a finite number of context labels it has to be the case that at some point the blocking rule applies, and all -minimal individual variables occurring in S are blocked. Hence the tableau procedure for ϑ terminates in finite time.
138
S. Klarman
Theorem 4 (Completeness). If ϑ is satisfiable then there exists a complete clash-free constraint system for ϑ. Proof. Let ϑ be a CALC formula and C = Δ, W, {Iw }w∈W a context structure satisfying ϑ. We use C as an oracle in determining the construction of a complete clash-free constraint system for ϑ. We say that a constraint system S is compatible with C iff there exist mappings π : Λϑ → W and σ : ΛI → Δ, where Λϑ and ΛI are the context labels and the individual terms occurring in S, such that the following conditions are satisfied: – – – – – –
π(ε) = Δ, C, Δ |= φ, for every formula φ whenever ε : φ ∈ S σ(a) = aIw for every a ∈ obj (ϑ) and w ∈ W ; σ(x) ∈ ΔIπ(γ) whenever γ : x ∈ S; σ(x) ∈ C Iπ(γ) whenever γ : C(x) ∈ S;
σ(x), σ(y) ∈ rIπ(γ) whenever γ : r(x, y) ∈ S.
Let S be a constraint system for ϑ compatible with C. We show that if any of the tableau rules is applicable to S, then it can be applied in such a way that the resulting system S is still compatible with C. – The cases of ⇒ , ⇒ , ⇒∃ , ⇒∀ , ⇒≡T g , ⇒≡T l , ⇒≡ are as in ALC (see [7]), modulo relativization of the rules to local constraint systems Sγ , for particular γ ∈ Λϑ . The mapping π remains unmodified. S is compatible with C. – Suppose we apply ⇒· to some γ : CD(x) ∈ S. Then we obtain S by adding γ | C : D(x) to S. We set π(γ | D) := w for w ∈ W such that w = C Iπ(γ) , and leave σ unmodified. S is compatible with C. – Suppose we apply ⇒∼ to some {γ, δ} ⊆ S. It must be that either π(γ) = π(δ) or π(γ) = π(δ) in C. We pick the correct one and obtain S by adding L ε : γ ≡ δ L or ε : γ L ≡ δ L to S, respectively. Clearly the added formula has to be satisfied in C, π(ε). Both mappings remain unmodified. S is compatible with C. – Suppose we apply ⇒⊆ to some γ | C ∈ S. It has to be the case that C Iπ(γ) = ΔIπ(γ|C) . We obtain S by adding ε : γ L C ≡ (γ | C)L to S, which has to be clearly satisfied in C, π(ε). Both mappings remain unmodified. S is compatible with C. – Suppose we apply ⇒⊇ to some γ : x ∈ S. Then we obtain S by adding ε : (x) to S. Since σ(x) ∈ ΔIπ(γ) it must also hold that σ(x) ∈ ΔIπ(ε) . Both mappings remain unmodified. S is compatible with C. – Suppose we apply ⇒≡C g (resp. ⇒≡Rg ) to some {γ : C(x), δ : x} ⊆ S (resp. {γ : r(x, y), δ : x, δ : y} ⊆ S). We obtain S by adding δ : C(x) (resp. δ : r(x, y)) to S. But since C is a global concept (r is a global role) it must be already that σ(x) ∈ C Iπ(δ) (resp. σ(x), σ(y) ∈ rIπ(δ) . Both mappings remain unmodified. S is compatible with C. By Theorem 3 the number of inferences applicable to S is finite, therefore at some point we obtain a complete constraint system, which is clearly clash-free.
Description Logics for Relative Terminologies
Proof of Lemma 1. Let ϑ be a CALC formula and ϑε = lation to ALC. Then:
Ω∈Ωϑ
139
ϑΩ ε its trans-
Claim 1. ϑ is satisfiable iff ϑε is satisfiable; Proof. For proving this claim we establish a correspondence between the models of ϑ and ϑε . For every set of context equivalences Ω over the context labels Λϑ , we relate the context structures C = Δ, W, {Iw }w∈W of ϑ to the interpretations I = (ΔI , ·I ) of ϑΩ ε according to the following constraints. For every γ ∈ Λϑ , Ω Ω concept name A ∈ con(ϑΩ ε ), role name r ∈ rol(ϑε ), individual name a ∈ obj(ϑε ) Ω and finally for every occurrence of a C operator in ϑε we set: Iγ ⊥Iγ ΔI
CIγ aI
= = = = =
Iπ(ε) ⊥Iπ(ε) Δ ΔIπ(γ|C) aIπ(ε)
AIγ AIγ rγI rγI
= = = =
AIπ(γ) AIπ(ε) rIπ(γ) rIπ(ε)
iff iff iff iff
A ∈ NCl A ∈ NCg r ∈ NRl r ∈ NRg
where π is a mapping from Λϑ to W such that π(ε) = Δ and for every γ ∈ Λϑ it holds that π(γ | C) = C Iπ(γ) . In what follows, to simplify the notation, we write ·Iγ instead of ·Iπ(γ) . Clearly, context structures uniquely determine the interpretations and vice versa. The gist of the proof, which we demonstrate in the subsequent steps, lies in that for any CALC concept C and its translation Cε there exist corresponding interpretations C = Δ, W, {Iw }w∈W and I = (ΔI , ·I ) such that C Iε = CεI . Below, we call C a γ-concept whenever it translates to Cγ in ϑε , for some γ ∈ Λϑ . Proposition 3. For every γ-concept P , such that P contains only concept names and symbols ⊥, , ¬, , it holds that P Iγ = ΔIγ ∩ PγI . Proof. Transform P to conjunctive normal form i j Lij , so that every Lij is an atom, negated atom, ⊥ or . Then we have that ΔIγ ∩ ( i j Lij )Iγ = ΔIγ ∩ i j Lij Iγ = i j (ΔIγ ∩ Lij Iγ ). For every possible form of Lij we show that ΔIγ ∩ Lij Iγ = Lij Iγ : – Lij = A: if A is local then by the correspondence AIγ = AIγ and obviously ΔIγ ∩ AIγ = AIγ . If A is global then by the correspondence AIγ = AIε . By the semantics of global concepts AIγ = ΔIγ ∩ AIε . – Lij = ¬A: then ΔIγ ∩ (¬A)Iγ = ΔIγ ∩ (ΔI \ AIγ ) = (ΔIγ ∩ ΔI ) \ (ΔIγ ∩ AIγ ) Hence, by the fact that ΔIγ ⊆ ΔI and by the previous argument this is equivalent to ΔIγ \ AIγ and thus to (¬A)Iγ . – Lij = ⊥: then by the correspondence ΔIγ ∩ ⊥Iγ = ΔIγ ∩ ⊥Iε = ∅ = ⊥Iγ . – Lij = : then by the correspondence ΔIγ ∩ Iγ = ΔIγ ∩ Iε = ΔIγ = Iγ . Hence i j (ΔIγ ∩ Lij Iγ ) = i j Lij Iγ = ( i j Lij )Iγ = P Iγ , which concludes the proof.
140
S. Klarman
Proposition 4. For every γ-concept ∃r.P and ∀r.P , where P is as defined in Proposition 3, it holds that ΔIγ ∩ (∃r.P )Iγ = (∃r.P )Iγ and ΔIγ ∩ (∀r.P )Iγ = (∀r.P )Iγ . Proof. By the translation and the correspondence we have that ΔIγ ∩ (∃r.P )Iγ = ΔIγ ∩ {x | ∃y : rγI (x, y) ∧ y ∈ (ΔIγ ∩ PγI )} and analogically for ∀r.P . By Proposition 3 ΔIγ ∩ PγI = P Iγ . Further, by the correspondence, if r is local then rγI = rIγ which is equivalent to rIγ ∩ ΔIγ × ΔIγ , else if r is global we have rγI = rIε , but then rIε ∩ ΔIγ × ΔIγ = rIγ . Hence ΔIγ ∩ {x | ∃y : rγI (x, y) ∧ y ∈ (ΔIγ ∩ PγI )} = {x | ∃y : rIγ (x, y) ∧ y ∈ P Iγ } = (∃r.P )Iγ and analogically for ∀r.P . Proposition 5. For every ALC γ-concept C it holds that ΔIγ ∩ CγI = C Iγ Proof. Given the translation rules and the correspondence, the claim follows inductively from Propositions 3, 4 and the principle of compositionality. Proposition 6. For every γ-concept CD it holds that ( CD)Iγ = ( CD)Iγ . Proof. By induction over the structure of D. Let D be an ALC concept. Then by I , which by the correspondence the translation we get that ( CD)Iγ = CIγ Dγ|C Iγ|C I is equivalent to Δ ∩Dγ|C and by Proposition 5 to DIγ|C = ( CD)Iγ . Assume D is any CALC concept. Then given the translation rules and the correspondence, the claim follows inductively from Proposition 5, the former argument, and the principle of compositionality. Lemma 2. For every ε-concept C it holds that CεI = C Iε . Proof. Given the translation rules and the correspondence, the claim follows inductively from Propositions 5, 6, and the principle of compositionality. By comparing the semantics of CALC and ALC axioms it follows immediately that if there exists a context structure satisfying a formula ϑ, then there has to exist an assignment Ω and an ALC model of a formula ϑΩ ε , namely the one based on the correspondence: – for TBox axioms and concept assertions directly by Lemma 2; – for role assertions by the correspondence. Likewise, the conditional holds in the opposite direction. Therefore, ϑ is satisfi able if and only if ϑε is. Claim 2. for a fixed assignment of context equalities Ω the size of ϑΩ ε is polynomial in the size of ϑ; Proof. The formula ϑ induces a finite number of context labels Λϑ which is bounded by |ϑ|. According to the translation rules for a fixed assignment of context equivalences Ω we introduce at most |Λϑ | new concept names for representing the labels. Further the input is extended with:
Description Logics for Relative Terminologies
141
– |Λϑ | new TBox axioms, of length bounded by |ϑ|, defining the new concept names; 2 – |Λϑ | 2−|Λϑ | new TBox axioms, of constant length, for representing context equivalences; – |Λϑ | new TBox axioms of length bounded by |ϑ| for every global TBox axiom in ϑ; – a number of new occurrences of concept names representing the contexts, linear in |ϑ|. It follows that the increase in the size of the translation ϑΩ ε is polynomial in the size of the original formula. Claim 3. the size of ϑε is exponential in the size of ϑ. Proof. By Point 2 there has to exist a polynomial function p such that |ϑΩ ε | = p(|ϑ|). However, The whole translation ϑε consists of |Ωϑ | different disjuncts of length |ϑΩ ε |, where Ωϑ is the set of all possible assignments of context equivalences over Λϑ . One can see that the size of Ωϑ is equal to B(|Λϑ |), where B is a function computing so-called Bell numbers (the number of possible partitions of a set of a given cardinality). It can be shown that for any |Λϑ | > 1 it holds 2 that 2|Λϑ | ≤ B(|Λϑ |) < 2|Λϑ | . Therefore, since |Λϑ | is bounded by |ϑ| there exists a polynomial function q, of the degree d, such that |ϑε | = 2q(|ϑ|) and hence d |ϑε | ∈ O(2(|ϑ| ) ).
Cdiprover3: A Tool for Proving Derivational Complexities of Term Rewriting Systems Andreas Schnabl Institute of Computer Science, University of Innsbruck, Austria [email protected]
Abstract. This paper describes cdiprover3, a tool for proving termination of term rewrite systems by polynomial interpretations and context dependent interpretations. The methods used by cdiprover3 induce small bounds on the derivational complexity of the considered system. We explain the tool in detail, and give an overview of the employed proof methods.
1
Introduction
Term rewriting is a Turing complete model of computation, which is conceptually closely related to declarative and (first-order) functional programming. One of its most studied properties, termination, is also a central problem in computer science. This property is undecidable in general, but many partial decision methods have been developed in the last decades. Beyond showing termination of a given rewrite system, some of these methods can also give bounds on various measures of its complexity. As suggested in [13], a natural way of measuring the complexity of a term rewrite system is to analyse its derivational complexity. The derivational complexity is a function which relates the size of a term and the maximal number of rewrite steps that can be executed starting from any term of that size in the given rewrite system. We are particularly interested in small, i.e. polynomial upper bounds on this function. In contrast to our approach of measuring derivational complexity, the constructor discipline is mentioned in [16]. In this field, we look at the complexity of the function that is encoded by a constructor system. It is either measured by the number of rewrite steps needed to bring the term into normal form [4,2], or by counting the number of steps needed by some evaluation mechanism different from standard term rewriting [17,5]. Both of these measures describe the complexity of the computation underlying the given rewrite system in some sense. In this paper, we describe cdiprover3, a tool which uses polynomial and context dependent interpretations in order to prove termination and complexity bounds of term rewrite systems. The tool, its predecessors, and full experimental data are available at http://cl-informatik.uibk.ac.at/~aschnabl/experiments/cdi/ T. Icard, R. Muskens (Eds.): ESSLLI 2008/2009 Student Sessions, LNAI 6211, pp. 142–154, 2010. c Springer-Verlag Berlin Heidelberg 2010
Cdiprover3: A Tool for Proving Derivational Complexities
143
Polynomial interpretations, introduced in [15], are a standard direct termination proof method. Besides showing termination of rewrite systems, they also provide an easy way to extract upper bounds on the derivational complexity [13]. However, as noticed in [12], this often heavily overestimates the derivational complexity. Context dependent interpretations, also introduced in [12], are an effort to improve these upper bounds. The remainder of this paper is organised as follows: Section 2 outlines the basics of term rewriting needed to state all relevant results. In Section 3, we briefly describe polynomial and context dependent interpretations, which are used by cdiprover3. Section 4 describes the implementation of cdiprover3, and mentions some experimental results. In Section 5, we explain the input and output of cdiprover3 in detail. Last, in Section 6, we state conclusions and potential future work.
2
Term Rewriting
In this section, we review some basics of term rewriting. We only cover the concepts which are relevant to this paper. A general introduction to term rewriting can be found in [3,24], for instance. A term rewrite system (TRS) R consists of a signature F , a countably infinite set of variables V disjoint from F , and a finite set of rewrite rules l → r, where l and r are terms such that l ∈ / V and all variables which occur in r also occur in l. The signature F defines a set of function symbols, and assigns to each function symbol f its arity. We assume that every signature contains at least one function symbol of arity 0. The set of terms built from F and V is denoted by T (F , V). The set of terms T (F ) without any variables is called the set of ground terms over F . A function symbol is defined if it occurs at the root of a left hand side of a rewrite rule. All non-defined function symbols are called constructors. A constructor based term is a term containing exactly one defined function symbol, which appears at the root of that term. We call the total number of function symbol and variable occurrences in a term t its size, denoted by |t|. A substitution is a mapping σ : Dom(σ) → T (F , V), where Dom(σ) is a finite subset of V. The result of replacing all occurrences of variables x ∈ Dom(σ) in a term t by σ(x) is denoted by tσ. A context is a term C[] containing a single occurrence of a fresh function symbol of arity 0. If we replace with a term t, we denote the resulting term by C[t]. Given a TRS R and two terms s, t, we say that s rewrites to t (s →R t) if there exist a context C, a substitution σ and a rewrite rule l → r in R such that s = C[lσ] and t = C[rσ]. The transitive closure of this relation is ∗ n →+ R . The reflexive and transitive closure is →R . We write →R to express n-fold composition of →R . A TRS R is terminating if there exists no infinite chain of terms t0 , t1 , . . . such that ti →R ti+1 for each i ∈ N. For a terminating TRS R, the derivation length of a term t is defined as dlR (t) = max{n | ∃s : t →nR s}. The derivational complexity is the function dcR : N → N which maps n to max{dlR (t) | |t| n}.
144
3
A. Schnabl
Used Termination Proof Methods
3.1
Polynomial Interpretations
An F -algebra A for some signature F consists of a carrier A and interpretation functions {fA : An → A | f ∈ F, n = arity(f )}. Given an assignment α : V → A, we denote the evaluation of a term t into A by [α]A (t). It is defined inductively as follows: [α]A (x) = α(x) [α]A (f (t1 , . . . , tn )) = fA ([α]A (t1 ), . . . , [α]A (tn ))
for x ∈ V for f ∈ F
A well-founded monotone F -algebra is a pair (A, >) where A is an F -algebra and > is a well-founded proper order such that for every function symbol f ∈ F, fA is strictly monotone with respect to >, i.e. fA (s1 , . . . , si , . . . , sn ) > fA (s1 , . . . , ti , . . . , sn ) whenever si > ti . A well-founded monotone algebra (A, >) is compatible with a TRS R if for every rewrite rule l → r in R and every assignment α, [α]A (l) > [α]A (r) holds. It is a well-known fact that a TRS R is terminating if and only if there exists a well-founded monotone algebra that is compatible with R. A polynomial interpretation [15] is an interpretation into a well-founded monotone algebra (A, >) such that A ⊆ N, > is the standard order on the natural numbers, and fA is a polynomial for every function symbol f . If a polynomial interpretation is compatible with a TRS R, then we clearly have dlR (t) [α]A (t) for all terms t and assignments α. Example 1. Consider the TRS R with the following rewrite rules over the signature containing the function symbols 0 (arity 0), s (arity 1), + and - (arity 2). This system is example SK90/2.11.trs in the termination problems database1 (TPDB), which is the standard benchmark for termination provers: +(0, y) → y +(s(x), y) → s(+(x, y))
-(0, y) → 0 -(x, 0) → x
-(s(x), s(y)) → -(x, y)
The following interpretation functions build a compatible polynomial interpretation A over the carrier N: +A (x, y) = 2x + y
-A (x, y) = 3x + 3y
sA (x) = x + 2
0A = 1
A strongly linear interpretation is a polynomial interpretation n such that every interpretation function fA has the form fA (x1 , . . . , xn ) = i=1 xi + c, c ∈ N. A surprisingly simple property is that compatibility with a strongly linear interpretation induces a linear upper bound on the derivational complexity. A linear polynomial interpretation is a polynomial interpretation where each n interpretation function fA has the shape fA (x1 , . . . , xn ) = i=1 ai xi +c, ai ∈ N, 1
See http://www.lri.fr/~ marche/tpdb/ and http://termcomp.uibk.ac.at/
Cdiprover3: A Tool for Proving Derivational Complexities
145
c ∈ N. For instance, the interpretation given in Example 1 is a linear polynomial interpretation. Because of their simplicity, this class of polynomial interpretations is the one most commonly used in automatic termination provers. As illustrated by Example 2 below, if only a single one of the coefficients ai in any of the functions fA is greater than 1, there might already exist derivations whose length is exponential in the size of the starting term. Example 2. Consider the TRS S with the following single rule over the signature containing the function symbols a, b (arity 1), and c (arity 0). This system is example SK90/2.50.trs in the TPDB: a(b(x)) → b(b(a(x))) The following interpretation functions build a compatible linear polynomial interpretation A over N: aA (x) = 2x
bA (x) = x + 1
cA = 0
If we start a rewrite sequence from the term an (b(c)), we reach the normal form n b2 (an (c)) after 2n − 1 rewriting steps. Therefore, the derivational complexity of S is at least exponential. 3.2
Context Dependent Interpretations
Even though polynomial interpretations provide an easy way to obtain an upper bound on the derivational complexity of a TRS, they are not very suitable for proving polynomial derivational complexity. Strongly linear interpretations only capture linear derivational complexity, but even a slight generalisation admits already examples of exponential derivational complexity, as illustrated by Example 2. In [12], context dependent interpretations are introduced. They use an additional parameter (usually denoted by Δ) in the interpretation functions, which changes in the course of evaluating the interpretation of a term, thus making the interpretation dependent on the context. This way of computing interpretations also allows us to bridge the gap between linear and polynomial derivational complexity. Definition 1. A context dependent interpretation C for some signature F conn + → R+ sists of functions {fC [Δ] : (R+ 0) 0 | f ∈ F, n = arity(f ), Δ ∈ R } i + + and {fC : R → R | f ∈ F, i ∈ {1, . . . , arity(f )}}. Given a Δ-assignment α : R + × V → R+ 0 , the evaluation of a term t by C is denoted by [α, Δ]C (t). It is defined inductively as follows: [α, Δ]C (x) = α(Δ, x) [α, Δ]C (f (t1 , . . . , tn )) =
fC [Δ]([α, fC1 (Δ)]C (t1 ), . . . , [α, fCn (Δ)]C (tn ))
for x ∈ V for f ∈ F
Definition 2. For each Δ ∈ R+ , let >Δ be the order defined by a >Δ b ⇐⇒ a − b Δ. A context dependent interpretation C is compatible with a TRS R if for all rewrite rules l → r in R, all Δ ∈ R+ , and every Δ-assignment α, we have [α, Δ]C (l) >Δ [α, Δ]C (r).
146
A. Schnabl
Definition 3. A Δ-linear interpretation is a context dependent interpretation C whose interpretation functions have the form fC [Δ](z1 , . . . , zn ) =
n
a(f,i) zi +
i=1
fCi (Δ) =
n
b(f,i) zi Δ + cf Δ + df
i=1
Δ a(f,i) + b(f,i) Δ
with a(f,i) , b(f,i) , cf , df ∈ N, a(f,i) + b(f,i) = 0 for all f ∈ F, 1 i n. If we have a(f,i) ∈ {0, 1} for all f, i, we also call it a Δ-restricted interpretation We consider Δ-linear interpretations because of the similarity between the functions fC [Δ] and the interpretation functions of linear polynomial interpretations. Another point of interest is that the simple syntactical restriction to Δ-restricted interpretations yields a quadratic upper bound on the derivational complexity. Moreover, because of the special shape of Δ-linear interpretations, we need no additional monotonicity criterion for our main theorems: Theorem 1 ([18]). Let R be a TRS and suppose that there exists a compatible Δ-linear interpretation. Then R is terminating and dcR (n) = 2O(n) . Theorem 2 ([21]). Let R be a TRS and suppose that there exists a compatible Δ-restricted interpretation. Then R is terminating and dcR (n) = O(n2 ). Example 3. Consider the TRS given in Example 1 again. A compatible Δrestricted (and Δ-linear) interpretation C is built from the following interpretation functions: +C [Δ](x, y) = (1 + Δ)x + y + Δ -C [Δ](x, y) = x + y + Δ sC [Δ](x) = x + Δ + 1
Δ 1+Δ -1C (Δ) = Δ
+1C (Δ) =
s1C (Δ) = Δ
+2C (Δ) = Δ −2C (Δ) = Δ 0C [Δ] = 0
Note that this interpretation gives a quadratic upper bound on the derivational complexity. However, from the polynomial interpretation given in Example 1, we can only infer an exponential upper bound [13]. Consider the term Pn,n , where we define P0,n = sn (0) and Pm+1,n = +(Pm,n , 0). We have |Pn,n | = 3n + 1. For every m, n ∈ N, Pm+1,n rewrites to Pm,n in n+1 steps. Therefore, Pn,n reaches its normal form sn (0) after n(n+1) rewrite steps. Hence, the derivational complexity is also Ω(n2 ) for this example, so the inferred bound O(n2 ) is tight.
4
Implementation
cdiprover3 is written fully in OCaml2. It employs the libraries of the termination prover TTT23 . From these libraries, functionality for handling TRSs and 2 3
http://caml.inria.fr http://colo6-c703.uibk.ac.at/ttt2
Cdiprover3: A Tool for Proving Derivational Complexities
147
SAT encodings, and an interface to the SAT solver MiniSAT4 are used. Without counting this, the tool consists of about 1700 lines of OCaml code. About 25% of that code are devoted to the manipulation of polynomials and extensions of polynomials that stem from our use of the parameter Δ. Another 35% are used for constructing parametric interpretations and building suitable Diophantine constraints (see below) which enforce the necessary conditions for termination. Using TTT2’s library for propositional logic and its interface to MiniSAT, 15% of the code deal with encoding Diophantine constraints into SAT. The remaining code is used for parsing input options and the given TRS, generating output, and controlling the program flow. In order to find polynomial interpretations automatically, Diophantine constraints are generated according to the procedure described in [6]. Putting an upper bound on the coefficients makes the problem finite. Essentially following [8], we then encode the (finite domain) constraints into a propositional satisfiability problem. This problem is given to MiniSAT. From a satisfying assignment for the SAT problem, we construct a polynomial interpretation which is monotone and compatible with the given TRS. This procedure is also the basis of the automatic search for Δ-linear and Δrestricted interpretations. The starting point of that search is an interpretation with uninstantiated coefficients. If we want to be able to apply Theorem 1 or 2, we need to find coefficients which make the resulting interpretation compatible with the given TRS. Furthermore, we need to make sure that no divisions by zero occur in the interpretation functions. Again, we encode these properties into Diophantine constraints on the coefficients of a Δ-linear or Δ-restricted interpretation. The encoding is an adaptation of the procedure in [6] to context dependent interpretations: to encode the condition that no divisions by zero occur, we use the constraint a(f,i) + b(f,i) > 0 for each function symbol f ∈ F, and 1 i arity(f ). Here, the variables a(f,i) and b(f,i) refer to the (uninstantiated) coefficients of fC [Δ], as presented in Definition 3. If a Δ-restricted interpretation is searched, we also add the constraint a(f,i) − 1 0 for each f ∈ F, and 1 i arity(f ), which enforces the Δ-restricted shape. To ensure compatibility with the given TRS, we use the constraints ∀α∀Δ∀x1 . . . ∀xn [α, Δ]C (l) − [α, Δ]C (r) − Δ 0 for each rule l → r in the given TRS, where x1 , . . . , xn is the set of variables occurring in l → r. We unfold [α, Δ]C according to the equalities given in Definitions 1 and 3. We then use some (incomplete) transformations to obtain a set of constraints using only the variables a(f,i) , b(f,i) , cf , and df introduced in 4
http://minisat.se
148
A. Schnabl
Definition 3. Satisfaction of these transformed constraints then implies satisfaction of the original constraints, which in turn implies compatibility of the induced context dependent interpretation with the given TRS. For a detailed description of this procedure, we refer to [21,18]. Once we have built the constraints, we continue using the same techniques as for searching polynomial interpretations: we encode the constraints in a propositional satisfiability problem, apply the SAT solver, and use a satisfying assignment to construct a context dependent interpretation. Table 1 shows experimental results of applying cdiprover3 on the 957 known terminating examples of version 4.0 of the TPDB. The tests were performed R OpteronTM 2.80 GHz dual single-threaded on a server equipped with 8 AMD
core processors with 64 GB of memory. For each system, cdiprover3 was given a timeout of 60 seconds. All times in the table are given in milliseconds. The first line of the table indicates the used proof technique; SL denotes strongly linear interpretations. The second row of the table specifies the upper bound for the coefficient variables; in all tests, we called cdiprover3 with the options -i -b X (see Section 5 below), where X is the value specified in the second row. As we can see, cdiprover3 is able to prove polynomial derivational complexity for 88 of the 368 known terminating non-duplicating rewrite systems of the TPDB (duplicating rewrite systems have at least exponential derivational complexity, so this restriction is harmless here). The results indicate that an upper bound of 7 on the coefficient variables suffices to capture all examples on our test set. Therefore, 3 and 7 seem to be good candidates for default values of the -b flag. However, it should be noted that our handling of the divisions introduced by the functions fCi is computationally rather expensive, which is indicated by the number of timeouts and the average time needed for successful proofs. This also explains the slight decrease in performance when we extend the search space to Δ-linear interpretations. The amount and average time of successes for Δ-linear interpretations remains almost constant for the tested upper bounds on the coefficient variables. However, raising this upper bound leads to a significant increase in the number of timeouts. There is exactly one system which can be handled by Δ-linear interpretations (even with upper bound 3), but not by Δ-restricted interpretations: system SK90/2.50 in the TPDB, which we mentioned in Example 2. Note that it is theoretically impossible to find a suitable Δ-restricted interpretation for this TRS, since its derivational complexity is exponential. Table 1. Performance of cdiprover3 Method SL SL+Δ-rest. Δ-linear Δ-rest. 31 31 3 7 15 31 3 7 15 -i -b X # success 41 88 82 83 82 83 83 86 86 average success time 15 3287 5256 5566 4974 5847 3425 3935 3837 0 234 525 687 750 797 142 189 222 # timeout
31 86 3845 238
Cdiprover3: A Tool for Proving Derivational Complexities
5
149
Using cdiprover3
cdiprover3 is called from command line. Its basic usage pattern is $ ./cdiprover3 – specifies the maximum number of seconds until cdiprover3 stops looking for a suitable interpretation. – specifies the path to the file which contains the considered TRS. – For , the following switches are available: -c defines the desired subclass of the searched polynomial or context dependent interpretation. The following values of are legal: linear, simple, simplemixed, quadratic These values specify the respective subclasses of polynomial interpretations, as defined in [22]. Linear polynomial interpretations imply an exponential upper bound on the derivational complexity. The other classes imply a double exponential upper bound, cf. [13]. pizerolinear, pizerosimple, pizerosimplemixed, pizeroquadratic For these values, cdiprover3 tries to find a polynomial interpretation with the following restrictions: defined function symbols are interpreted by linear, simple, simple-mixed, or quadratic polynomials, respectively. Constructors are interpreted by strongly linear polynomials. These interpretations guarantee that the derivation length of all constructor based terms is polynomial [4]. sli This option corresponds to strongly linear interpretations. As mentioned in Section 3, they induce a linear upper bound on the derivational complexity of a compatible TRS. deltalinear This value specifies that the tool should search for a Δlinear interpretation. By Theorem 1, compatibility with such an interpretation implies an exponential upper bound on the derivational complexity. deltarestricted This value corresponds to Δ-restricted interpretations. By Theorem 2, they induce a quadratic upper bound. -b sets the upper bound for the coefficient variables. The default value for this bound is 3. -i This switch activates an incremental strategy for handling the upper bound on the coefficient variables. First, cdiprover3 tries to find a solution using an intermediate upper bound of 1 (which corresponds to encoding each coefficient variable by one bit). Whenever the tool fails to find a proof for some upper bound b, it is checked whether b is equal to the bound specified by the -b option. If that is the case, then the search for a proof is given up. Otherwise, b is set to the minimum of the bound specified by the -b option and 2(b + 1) − 1 (which corresponds to increasing the number of bits used for each coefficient variable by 1). If the -c switch is not specified, then the standard strategy for proving polynomial derivational complexity is employed. First, cdiprover3 looks for a strongly
150
A. Schnabl
linear interpretation. If that is not successful, then a suitable Δ-restricted interpretation is searched. The input TRS files are expected to have the same format as the files in the TPDB. The format specification for this database is available at http://www.lri.fr/~marche/tpdb/format.html. The output given by cdiprover3, as exemplified by Example 4, is structured as follows. The first line contains a short answer to the question whether the given TRS is terminating: YES, MAYBE, or TIMEOUT. The latter means that cdiprover3 was still busy after the specified timeout. MAYBE means that a termination proof $ cat tpdb-4.0/TRS/SK90/2.11.trs (VAR x y) (RULES +(0,y) -> y +(s(x),y) -> s(+(x,y)) -(0,y) -> 0 -(x,0) -> x -(s(x),s(y)) -> -(x,y) ) (COMMENT Example 2.11 (Addition and Subtraction) in \cite{SK90}) $ ./cdiprover3 -i tpdb-4.0/TRS/SK90/2.11.trs 60 YES QUADRATIC upper bound on the derivational complexity This TRS is terminating using the deltarestricted interpretation -(delta, X1, X0) = + 1*X0 + 1*X1 + 0 + 0*X0*delta + 0*X1*delta + 1*delta s(delta, X0) = + 1*X0 + 1 + 0*X0*delta + 1*delta 0(delta) = + 0 + 0*delta +(delta, X1, X0) = + 1*X0 + 1*X1 + 0 + 0*X0*delta + 1*X1*delta + 1*delta - tau 1(delta) = delta/(1 + 0 * delta) - tau 2(delta) = delta/(1 + 0 * delta) s tau 1(delta) = delta/(1 + 0 * delta) + tau 1(delta) = delta/(1 + 1 * delta) + tau 2(delta) = delta/(1 + 0 * delta) Time: 0.024418 seconds Statistics: Number of monomials: 187 Last formula building started for bound 1 Last SAT solving started for bound 1 Fig. 1. Output produced by cdiprover3
Cdiprover3: A Tool for Proving Derivational Complexities
151
could not be found, and cdiprover3 gave up before time ran out. The answer YES indicates that an interpretation of the given class has been found which guarantees termination of the given TRS. It is followed by the inferred bound on the derivational complexity and a listing of the interpretation functions. After the interpretation functions, the elapsed time between the call of cdiprover3 and the output of the proof is given. In all cases, the answer is concluded by statistics stating the total number of monomials in the constructed Diophantine constraints, and the upper bound for the coefficients that was used in the last call to MiniSAT. Example 4. Given the TRS shown in Example 1, cdiprover3 produces the output shown in Figure 1. The interpretations in Example 3 and in the output are equivalent. Note that the parameter Δ in the interpretation functions fC [Δ] is treated like another argument of the function. The interpretation functions fCi are represented by f tau i in the output.
6
Discussion
In this paper, we have presented the (as far as we know) first tool which is specifically designed for automatically proving polynomial derivational complexity of term rewriting. We have also given a brief introduction into the applied proof methods. During the almost two years which have passed between the 13th ESSLLI Student Session, where this paper was originally published, and the writing of this version, we have done further work concerning context dependent interpretations and automated complexity analysis. In [20], we have extended Δ-linear interpretations to Δ2 -interpretations, defined by the following shape: fC (Δ, z1 , . . . , zn ) =
n i=1
fCi (Δ) =
a(f,i) zi +
n
b(f,i) zi Δ + gf + hf Δ
i=1
c(f,i) + d(f,i) Δ a(f,i) + b(f,i) Δ
In the same paper, we have established a correspondence result between Δ2 interpretations and two-dimensional matrix interpretations. Matrix interpretations are interpretations into a well-founded F -monotone algebra using vectors of natural numbers as their carrier. Their interpretation functions are based on vector addition and matrix-vector multiplication. See [7] for a more detailed description of matrix interpretations. We have the following theorem: Theorem 3 ([20]). Let R be a TRS and let C be a Δ2 -interpretation such that R is compatible with C. Then there exists a corresponding matrix interpretation A (of dimension 2) compatible with R.
152
A. Schnabl
With some minor restrictions, the theorem also holds in the reverse direction. Also note that one-dimensional matrix interpretations are equivalent to polynomial interpretations as long as all used polynomials are linear. Moreover, in the meantime, Martin Avanzini, Georg Moser, and the author of this paper have also been developing TCT, a more general tool for automated complexity analysis of term rewriting. TCT can be found at http://cl-informatik.uibk.ac.at/software/tct/ While TCT does not apply polynomial and context dependent interpretations anymore, matrix interpretations are one of its most heavily used proof techniques. As suggested by Theorem 3, all examples, where cdiprover3 can show a polynomial upper bound on the derivational complexity of a TRS, can also be handled by TCT with a matrix interpretation of dimension at most 2 (of a restricted shape which induces a quadratic upper bound on the derivational complexity). Further techniques implemented by TCT include arctic interpretations [14] (the basic idea of this technique is to extend matrix interpretations to a domain different from natural numbers), root labeling [23], and rewriting of right hand sides [25]. Currently, TCT can show a polynomial upper bound on the derivational complexity of 212 of the 368 known terminating non-duplicating systems mentioned in Section 4. The average time for a successful complexity proof is 4.89 seconds, and TCT produces a timeout for 122 of the remaining systems. However, it should be noted that TCT was designed to run several termination proof attempts in parallel, and TCT ran on 16 cores in this test (we used the same testing machine as for the tests described in Section 4, and we did not restrict TCT to run single-threaded). Hence, the numbers are not directly comparable. Still, it becomes visible that the power of automated derivational complexity analysis has increased greatly during the last two years. At this point, there exist upper bounds on the derivational complexity induced by most direct termination proof techniques. However, virtually all state-of-theart termination provers employ the dependency pair framework, cf. [9], in their proofs. As shown in [19], not even the most simple version of the dependency pair method, as presented in [1], is suitable for inferring polynomial upper bounds on derivational complexities. There have been efforts in [10,11] to weaken the basic dependency pair method in order to make it usable for bounding the derivation length of constructor based terms (this is called runtime complexity analysis in these papers). A possible avenue for future work would be to develop a restricted version of the dependency pair method (or even the dependency pair framework) which is able to infer polynomial bounds on derivational complexities.
References 1. Arts, T., Giesl, J.: Termination of term rewriting using dependency pairs. Theor. Comp. Sci. 236(1,2), 133–178 (2000) 2. Avanzini, M., Moser, G.: Complexity analysis by rewriting. In: Garrigue, J., Hermenegildo, M.V. (eds.) FLOPS 2008. LNCS, vol. 4989, pp. 130–146. Springer, Heidelberg (2008)
Cdiprover3: A Tool for Proving Derivational Complexities
153
3. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, Cambridge (1998) 4. Bonfante, G., Cichon, A., Marion, J.Y., Touzet, H.: Algorithms with polynomial interpretation termination proof. J. Funct. Program. 11(1), 33–53 (2001) 5. Bonfante, G., Marion, J.Y., P´echoux, R.: Quasi-interpretation synthesis by decomposition. In: Jones, C.B., Liu, Z., Woodcock, J. (eds.) ICTAC 2007. LNCS, vol. 4711, pp. 410–424. Springer, Heidelberg (2007) 6. Contejean, E., March´e, C., Tom´ as, A.P., Urbain, X.: Mechanically proving termination using polynomial interpretations. J. Autom. Reason. 34(4), 325–363 (2005) 7. Endrullis, J., Waldmann, J., Zantema, H.: Matrix interpretations for proving termination of term rewriting. J. Autom. Reason. 40(3), 195–220 (2008) 8. Fuhs, C., Giesl, J., Middeldorp, A., Schneider-Kamp, P., Thiemann, R., Zankl, H.: SAT solving for termination analysis with polynomial interpretations. In: MarquesSilva, J., Sakallah, K.A. (eds.) SAT 2007. LNCS, vol. 4501, pp. 340–354. Springer, Heidelberg (2007) 9. Giesl, J., Thiemann, R., Schneider-Kamp, P.: The dependency pair framework: Combining techniques for automated termination proofs. In: Baader, F., Voronkov, A. (eds.) LPAR 2004. LNCS (LNAI), vol. 3452, pp. 301–331. Springer, Heidelberg (2005) 10. Hirokawa, N., Moser, G.: Automated complexity analysis based on the dependency pair method. In: Armando, A., Baumgartner, P., Dowek, G. (eds.) IJCAR 2008. LNCS (LNAI), vol. 5195, pp. 364–379. Springer, Heidelberg (2008) 11. Hirokawa, N., Moser, G.: Complexity, graphs, and the dependency pair method. In: Cervesato, I., Veith, H., Voronkov, A. (eds.) LPAR 2008. LNCS (LNAI), vol. 5330, pp. 652–666. Springer, Heidelberg (2008) 12. Hofbauer, D.: Termination proofs by context-dependent interpretations. In: Middeldorp, A. (ed.) RTA 2001. LNCS, vol. 2051, pp. 108–121. Springer, Heidelberg (2001) 13. Hofbauer, D., Lautemann, C.: Termination proofs and the length of derivations. In: Dershowitz, N. (ed.) RTA 1989. LNCS, vol. 355, pp. 167–177. Springer, Heidelberg (1989) 14. Koprowski, A., Waldmann, J.: Arctic termination . . . below zero. In: Voronkov, A. (ed.) RTA 2008. LNCS, vol. 5117, pp. 202–216. Springer, Heidelberg (2008) 15. Lankford, D.: On proving term-rewriting systems are noetherian. Tech. Rep. MTP2, Math. Dept., Louisiana Tech. University (1979) 16. Lescanne, P.: Termination of rewrite systems by elementary interpretations. Formal Aspects of Computing 7(1), 77–90 (1995) 17. Marion, J.Y.: Analysing the implicit complexity of programs. Inf. Comput. 183(1), 2–18 (2003) 18. Moser, G., Schnabl, A.: Proving quadratic derivational complexities using context dependent interpretations. In: Voronkov, A. (ed.) RTA 2008. LNCS, vol. 5117, pp. 276–290. Springer, Heidelberg (2008) 19. Moser, G., Schnabl, A.: The derivational complexity induced by the dependency pair method. In: Treinen, R. (ed.) RTA 2009. LNCS, vol. 5595, pp. 255–269. Springer, Heidelberg (2009) 20. Moser, G., Schnabl, A., Waldmann, J.: Complexity analysis of term rewriting based on matrix and context dependent interpretations. In: Hariharan, R., Mukund, M., Vinay, V. (eds.) FSTTCS 2008. LIPIcs, vol. 2, pp. 304–315. Schloss Dagstuhl Leibniz-Zentrum fuer Informatik (2008) 21. Schnabl, A.: Context Dependent Interpretations, Master’s thesis, Universit¨ at Innsbruck (2007), http://cl-informatik.uibk.ac.at/~ aschnabl/
154
A. Schnabl
22. Steinbach, J.: Proving polynomials positive. In: Shyamasundar, R.K. (ed.) FSTTCS 1992. LNCS, vol. 652, pp. 191–202. Springer, Heidelberg (1992) 23. Sternagel, C., Middeldorp, A.: Root-labeling. In: Voronkov, A. (ed.) RTA 2008. LNCS, vol. 5117, pp. 336–350. Springer, Heidelberg (2008) 24. TeReSe: Term Rewriting Systems. Cambridge Tracts in Theoretical Computer Science, vol. 55. Cambridge University Press, Cambridge (2003) 25. Zantema, H.: Reducing right-hand sides for termination. In: Middeldorp, A., van Oostrom, V., van Raamsdonk, F., de Vrijer, R. (eds.) Processes, Terms and Cycles: Steps on the Road to Infinity. LNCS, vol. 3838, pp. 173–197. Springer, Heidelberg (2005)
POP∗ and Semantic Labeling Using SAT Martin Avanzini Institute of Computer Science, University of Innsbruck, Austria [email protected]
Abstract. The polynomial path order (POP∗ for short) is a termination method that induces polynomial bounds on the innermost runtime complexity of term rewrite systems (TRSs for short). Semantic labeling is a transformation technique used for proving termination. In this paper, we propose an efficient implementation of POP∗ together with finite semantic labeling. This automation works by a reduction to the problem of boolean satisfiability. We have implemented the technique and experimental results confirm the feasibility of our approach. By semantic labeling the analytical power of POP∗ is significantly increased.
1 Introduction
Term rewrite systems (TRSs for short) provide a conceptually simple but powerful abstract model of computation. In rewriting, proving termination is a long-standing research field. Consequently, termination techniques applicable in an automated setting were introduced quite early. Early research concentrated mainly on direct termination techniques [24]. One such technique is the use of recursive path orders, for instance the multiset path order (MPO for short) [11]. Nowadays, the emphasis has shifted toward transformation techniques like the dependency pair method [2] or semantic labeling [26]. These methods significantly increase the ability to verify termination automatically. Many termination techniques can also be used to analyse the complexity of rewrite systems. For instance, Hofbauer was the first to observe that termination via MPO implies the existence of a primitive recursive bound on the derivational complexity [15]. Here the derivational complexity of a TRS measures the maximal number of rewrite steps as a function of the size of the initial term. For the study of smaller complexity bounds we recently introduced in [4] the polynomial path order (POP∗ for short). This order is in essence a miniaturization of MPO, carefully crafted to induce polynomial bounds on the number of rewrite steps (cf. Theorem 1), whenever the initial term is argument-normalised (also called basic). In this work, we show how to increase the analytical power of POP∗ by semantic labeling [26]. The idea behind semantic labeling is to label the function symbols of the analysed TRS R with semantic information so that proving termination of the labeled TRS Rlab becomes easier. The transformation is termination preserving and reflecting. More precisely, every derivation of R is simulated
⋆ This research is supported by FWF (Austrian Science Fund) project P20133.
step-by-step by Rlab. Thus, besides analysing the termination behavior of R, the TRS Rlab can also be employed for investigating the complexity of R. In order to obtain the labeled TRS Rlab from R, one needs to define suitable interpretation and labeling functions for all function symbols appearing in R. Naturally, these functions have to be chosen such that the employed direct technique, in our case POP∗, is applicable to the labeled system. To find a properly labeled TRS Rlab automatically, we extend the propositional encoding of POP∗ presented in [4]. Satisfiability of the constructed formula certifies the existence of a labeled system Rlab that is compatible with POP∗. We have implemented the technique, confirming the feasibility of our approach. Moreover, experimental evidence indicates that the analytical power of polynomial path orders is significantly improved. The automation of semantic labeling together with some base order is not new as such. For instance, an automation of semantic labeling together with recursive path orders has already been given in [17]. Unfortunately, this approach is inapplicable in our context, as the resulting TRS is usually infinite. As with many syntactic techniques, the soundness of polynomial path orders is restricted to finite TRSs. To ensure that Rlab is finite, we restrict interpretation and labeling functions to finite domains. We structure the remainder of this paper as follows: In Section 2 we recall basic notions and briefly introduce the reader to polynomial path orders. In Section 3 we show how polynomial path orders together with semantic labeling can be efficiently automated. In Section 4 we present experimental results, and we conclude in Section 5.
2 The Polynomial Path Order
We briefly recall the basic concepts of term rewriting; for details, [8] provides a good resource. Let V denote a countably infinite set of variables and F a signature, that is, a set of function symbols with associated arities. The set of terms over F and V is denoted by T(F, V). We write ⊴ for the subterm relation, its converse is denoted by ⊵, and the strict part of ⊵ by ▷. A term rewrite system (TRS for short) R over T(F, V) is a set of rewrite rules l → r such that l, r ∈ T(F, V), l ∉ V, and all variables of r also appear in l. In the following, R always denotes a TRS. If not mentioned otherwise, R is finite. A binary relation on T(F, V) is a rewrite relation if it is closed under contexts and substitutions. The smallest extension of R that is a rewrite relation is denoted by →R. The innermost rewrite relation →ⁱR is the restriction of →R where innermost redexes are reduced first. The transitive and reflexive closure of a rewrite relation → is denoted by →∗, and we write s →ⁿ t for the contraction of s to t in n steps. We say that R is (innermost) terminating if there exists no infinite chain of terms t0, t1, . . . such that ti →R ti+1 (respectively ti →ⁱR ti+1) for all i ∈ N. The root symbols of left-hand sides of rewrite rules in R are called defined symbols and collected in D(R), while all other symbols are called constructor symbols and collected in C(R).
A term f(s1, . . . , sn) is basic if f ∈ D(R) and s1, . . . , sn ∈ T(C(R), V). We write Tb(R) for the set of all basic terms over R. If every left-hand side of R is basic then R is called a constructor TRS. Constructor TRSs allow us to model the computation of functions in a very natural way.

Example 1. Consider the constructor TRS Rmult defined by

    add(0, y) → y                       mult(0, y) → 0
    add(s(x), y) → s(add(x, y))         mult(s(x), y) → add(y, mult(x, y)).
Rmult defines the function symbols add and mult, i.e., D(R) = {add, mult}. Natural numbers are represented using the constructor symbols C(R) = {s, 0}. Define the encoding function ⌜·⌝ : N → T(C(R), ∅) by ⌜0⌝ = 0 and ⌜n + 1⌝ = s(⌜n⌝). Then for all n, m ∈ N, mult(⌜n⌝, ⌜m⌝) →ⁱ∗R ⌜n ∗ m⌝. We say that Rmult computes multiplication (and addition) on natural numbers. For instance, the system admits the innermost rewrite sequence

    mult(s(0), 0) →ⁱ add(0, mult(0, 0)) →ⁱ add(0, 0) →ⁱ 0,

computing 1 ∗ 0 = 0. Note that in the second term, the innermost redex mult(0, 0) is reduced first.

In [19] it is proposed to conceive the complexity of a rewrite system R as the complexity of the functions computed by R. Whereas this view falls into the realm of implicit complexity analysis, we conceive rewriting under R as the evaluation mechanism of the encoded function. Thus it is natural to define the runtime complexity based on the number of rewrite steps admitted by R. Let |s| denote the size of a term s. The (innermost) runtime complexity of a terminating rewrite system R is defined by

    rcR(m) = max{n | ∃ s, t. s →ⁱⁿ t, s ∈ Tb(R) and |s| ≤ m}.
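To make the rewriting machinery concrete, the following OCaml sketch implements ground terms, the numeral encoding and leftmost-innermost rewriting for Rmult. It is purely illustrative and not the implementation discussed later in this paper; all names are our own. Running it reproduces the three-step derivation of Example 1.

```ocaml
(* ground terms suffice for this demonstration *)
type term = F of string * term list

(* numeral encoding: 0 and n+1 = s(n) *)
let rec num n = if n = 0 then F ("0", []) else F ("s", [ num (n - 1) ])

(* one rewrite step at the root, if the term is a redex of R_mult *)
let root_step = function
  | F ("add", [ F ("0", []); y ]) -> Some y
  | F ("add", [ F ("s", [ x ]); y ]) -> Some (F ("s", [ F ("add", [ x; y ]) ]))
  | F ("mult", [ F ("0", []); _ ]) -> Some (F ("0", []))
  | F ("mult", [ F ("s", [ x ]); y ]) ->
      Some (F ("add", [ y; F ("mult", [ x; y ]) ]))
  | _ -> None

(* leftmost-innermost strategy: rewrite inside the arguments first *)
let rec step (F (_, args) as t) =
  let rec args_step = function
    | [] -> None
    | a :: rest -> (
        match step a with
        | Some a' -> Some (a' :: rest)
        | None -> (
            match args_step rest with
            | Some rest' -> Some (a :: rest')
            | None -> None))
  in
  match args_step args with
  | Some args' -> let (F (f, _)) = t in Some (F (f, args'))
  | None -> root_step t

(* length of the innermost derivation to normal form *)
let rec steps n t = match step t with None -> n | Some t' -> steps (n + 1) t'

let () =
  Printf.printf "mult(1,0) normalises in %d steps\n"
    (steps 0 (F ("mult", [ num 1; num 0 ])))
```

The program prints 3, matching the derivation mult(s(0), 0) →ⁱ add(0, mult(0, 0)) →ⁱ add(0, 0) →ⁱ 0 above.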
To verify whether the runtime complexity of a rewrite system R is polynomially bounded, we employ the polynomial path order. Inspired by the recursion-theoretic characterization of the polytime functions given in [9], polynomial path orders rely on the separation of safe and normal inputs. For this, the notion of safe mappings is introduced. A safe mapping safe associates with every n-ary function symbol f the set of safe argument positions. If f ∈ D(R) then safe(f) ⊆ {1, . . . , n}; for f ∈ C(R) we fix safe(f) = {1, . . . , n}. The argument positions not included in safe(f) are called normal and denoted by nrm(f). A precedence is an irreflexive and transitive order on F. The polynomial path order >pop∗ is an extension of the auxiliary order >pop, both defined in the following two definitions. Here we write ≥ for the reflexive closure of an order >; further, (>)mul denotes its multiset extension (cf. [8]).

Definition 1. Let > be a precedence and let safe be a safe mapping. We define the order >pop inductively as follows: s = f(s1, . . . , sn) >pop t if one of the following alternatives holds:
1. si ≥pop t for some i ∈ {1, . . . , n}, and if f ∈ D(R) then i ∈ nrm(f), or
2. t = g(t1, . . . , tm), f ∈ D(R), f > g and s >pop ti for all 1 ≤ i ≤ m.

Definition 2. Let > be a precedence and let safe be a safe mapping. We define the polynomial path order >pop∗ inductively as follows: s = f(s1, . . . , sn) >pop∗ t if one of the following alternatives holds:
1. s >pop t, or
2. si ≥pop∗ t for some i ∈ {1, . . . , n}, or
3. t = g(t1, . . . , tm), f ∈ D(R), f > g and
   – s >pop∗ tj0 for some j0 ∈ safe(g), and
   – for all j ≠ j0, either s >pop tj, or s ▷ tj and j ∈ safe(g), or
4. t = f(t1, . . . , tn), f ∈ D(R) and
   – [si1, . . . , sip] (>pop∗)mul [ti1, . . . , tip] for nrm(f) = {i1, . . . , ip}, and
   – [sj1, . . . , sjq] (≥pop∗)mul [tj1, . . . , tjq] for safe(f) = {j1, . . . , jq}.

Here [t1, . . . , tn] denotes the multiset with elements t1, . . . , tn. When R ⊆ >pop∗ holds, we say that >pop∗ is compatible with R. The main theorem from [4] states:

Theorem 1. Let R be a finite constructor TRS compatible with >pop∗, i.e., R ⊆ >pop∗. Then the runtime complexity of R is polynomial. The polynomial depends only on the cardinality of F and the sizes of the right-hand sides in R.

We conclude this section by demonstrating the application of POP∗ to the TRS Rmult. Below we write ⟨i⟩ for the i-th case of Definition 2.

Example 2. Reconsider the rewrite system Rmult from Example 1. Consider the safe mapping safe where the second argument of addition is safe (safe(add) = {2}) and all arguments of multiplication are normal (safe(mult) = ∅). Furthermore, let the precedence > be defined as mult > add > s > 0. In order to verify compatibility of this particular instance of >pop∗ we need to show that all rules in Rmult are strictly decreasing, i.e., that l >pop∗ r holds for every l → r ∈ Rmult. To exemplify this, consider the rule add(s(x), y) → s(add(x, y)). From s(x) >pop∗ x by case ⟨2⟩ we infer [s(x)] (>pop∗)mul [x]. Furthermore, [y] (≥pop∗)mul [y] holds, and thus by case ⟨4⟩ we obtain add(s(x), y) >pop∗ add(x, y). From add > s we finally conclude add(s(x), y) >pop∗ s(add(x, y)) by one application of case ⟨3⟩. As a consequence of Theorem 1, the number of rewrite steps starting from mult(⌜n⌝, ⌜m⌝) is polynomially bounded in n and m.
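To illustrate the syntactic nature of the order, here is a direct, unoptimised OCaml reading of the auxiliary order >pop of Definition 1, instantiated with the precedence and safe mapping of Example 2. It is a sketch under our own naming; the full order >pop∗ additionally requires the multiset comparisons of Definition 2 and is omitted here.

```ocaml
(* a direct functional reading of Definition 1; precedence and normal
   positions are the ones chosen in Example 2 *)
type term = V of string | F of string * term list

let defined f = f = "add" || f = "mult"          (* D(R) for R_mult *)

let prec f g =                                   (* mult > add > s > 0 *)
  let rank = function "mult" -> 3 | "add" -> 2 | "s" -> 1 | _ -> 0 in
  rank f > rank g

let nrm f i =                                    (* normal positions *)
  match f with "add" -> i = 1 | "mult" -> true | _ -> false

let rec gt_pop s t =
  match s with
  | V _ -> false
  | F (f, ss) ->
      (* case 1: some argument si satisfies si >=pop t,
         at a normal position if f is defined *)
      List.exists
        (fun (i, si) ->
          (si = t || gt_pop si t) && (not (defined f) || nrm f i))
        (List.mapi (fun i si -> (i + 1, si)) ss)
      ||
      (* case 2: descend through a strictly smaller defined root symbol *)
      (match t with
       | F (g, ts) -> defined f && prec f g && List.for_all (gt_pop s) ts
       | V _ -> false)

let _ = gt_pop (F ("add", [ V "x"; V "y" ])) (V "x")   (* true via case 1 *)
```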
3 A Propositional Encoding of POP∗ and Finite Semantic Labeling
Before we investigate the propositional encoding of polynomial path orders and semantic labeling, we briefly explain the basic notions of semantic labeling as introduced in [26]. Semantics is given to a TRS R by defining a model. A model is an F-algebra A, i.e., a carrier A equipped with operations fA : Aⁿ → A for every n-ary symbol
f ∈ F, such that for every rule l → r ∈ R and every assignment α : V → A, the equality [α]A(l) = [α]A(r) holds. Here [α]A(t) denotes the interpretation of t under the assignment α, recursively defined by

    [α]A(t) = α(t)                                    if t ∈ V,
    [α]A(t) = fA([α]A(t1), . . . , [α]A(tn))          if t = f(t1, . . . , tn).

The system R is labeled according to a labeling for A, i.e., a set of mappings ℓf : Aⁿ → A for every n-ary function symbol f ∈ F. (The definition from [26] allows the labeling of only a subset of F, leaving the other symbols unchanged; in our context this has no consequence and simplifies the translation.) For every assignment α, the labeled term labα(t) is defined by

    labα(t) = t                                       if t ∈ V,
    labα(t) = fa(labα(t1), . . . , labα(tn))          if t = f(t1, . . . , tn),

where a = ℓf([α]A(t1), . . . , [α]A(tn)). The labeled TRS Rlab is obtained by labeling all rules for all assignments α:

    Rlab = {labα(l) → labα(r) | l → r ∈ R and α an assignment}.

The main theorem from [26] states that Rlab is terminating if and only if R is terminating. In particular, it is shown that

    s →R t  ⟺  labα(s) →Rlab labα(t)

holds for an arbitrary assignment α.
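As an illustration of these definitions, the following OCaml sketch labels terms of Rmult over a Boolean carrier, interpreting a term by the question "is its value nonzero?". Both the model and the labeling functions (here: the projection onto the interpretation of the first argument of each defined symbol) are merely one valid choice, picked by hand rather than by the encoding developed below; one easily checks that the model condition holds for all four rules.

```ocaml
(* a sketch of semantic labeling for R_mult over the carrier B = {true, false};
   the model and labeling are illustrative; any model of the rules would do *)
type term = V of string | F of string * term list

let interp f args =                      (* the interpretations f_B *)
  match f, args with
  | "0", [] -> false
  | "s", _ -> true
  | "add", [ x; y ] -> x || y
  | "mult", [ x; y ] -> x && y
  | _ -> failwith "unknown symbol"

let rec eval alpha = function            (* [α]_B(t) *)
  | V x -> alpha x
  | F (f, ts) -> interp f (List.map (eval alpha) ts)

(* label every defined symbol with the value of its first argument *)
let rec label alpha t =
  match t with
  | V x -> V x
  | F (f, ts) ->
      let tag =
        if f = "add" || f = "mult" then
          if eval alpha (List.hd ts) then "_T" else "_F"
        else ""
      in
      F (f ^ tag, List.map (label alpha) ts)

(* labeling mult(s(x), y) → add(y, mult(x, y)) under α(x) = α(y) = false
   yields mult_T(s(x), y) → add_F(y, mult_F(x, y)) *)
```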
To simplify the presentation, we consider only algebras B with carrier B = {⊤, ⊥} here, although in principle the approach works for arbitrary finite carriers. To encode a Boolean function b : Bⁿ → B, we make use of unique propositional atoms b_{w1,...,wn} for every sequence of arguments w1, . . . , wn ∈ Bⁿ. The atom b_{w1,...,wn} denotes the result of applying the arguments w1, . . . , wn to b. For each sequence a1, . . . , an of propositional formulas, we denote by ⟦b(a1, . . . , an)⟧ the following formula: when n = 0, we set ⟦b⟧ = b_ε. For n > 0, we set

    ⟦b(a1, . . . , an)⟧ = ⋀_{w1,...,wn ∈ Bⁿ} (( ⋀_{i=1}^{n} (wi ↔ ai) ) → b_{w1,...,wn}).
Consider the constraint ⟦b(a1, . . . , an)⟧ ↔ r, and suppose ν is a satisfying assignment. One easily verifies that the encoded function b satisfies b(w1, . . . , wn) = ν(b_{w1,...,wn}) = ν(r) for w1 = ν(a1), . . . , wn = ν(an). We use this observation below to impose restrictions on interpretation and labeling functions. For every assignment α : V → B and every term t appearing in R we introduce the atom intα,t, and, for t ∉ V, the atom labα,t. The meaning of intα,t is the result of [α]B(t) in the encoded model B; labα,t denotes the label of the root symbol of the labeled term labα(t).
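The table-based encoding is easily turned into code. The sketch below builds the formula ⟦b(a1, . . . , an)⟧ for a function name and a list of argument formulas; the formula datatype and the naming scheme for the atoms b_{w1,...,wn} are our own assumptions, not the authors' concrete data structures.

```ocaml
(* a sketch of the table-based encoding of a Boolean function over {⊤, ⊥} *)
type formula =
  | Atom of string | Top | Bot
  | And of formula list
  | Iff of formula * formula
  | Imp of formula * formula

(* the atom b_{w1,...,wn}; the naming scheme is an assumption *)
let table_atom b ws =
  Atom (b ^ "_" ^ String.concat ""
                    (List.map (fun w -> if w then "T" else "F") ws))

let rec vectors n =                      (* all argument vectors w ∈ B^n *)
  if n = 0 then [ [] ]
  else List.concat_map (fun ws -> [ true :: ws; false :: ws ]) (vectors (n - 1))

(* ⟦b(a1,...,an)⟧ = ⋀_{w⃗ ∈ B^n} ((⋀_i wi ↔ ai) → b_{w⃗}) *)
let encode_app b args =
  match args with
  | [] -> table_atom b []                (* the n = 0 case: ⟦b⟧ = b_ε *)
  | _ ->
      And
        (List.map
           (fun ws ->
             let guard =
               And (List.map2 (fun w a ->
                        Iff ((if w then Top else Bot), a)) ws args)
             in
             Imp (guard, table_atom b ws))
           (vectors (List.length args)))
```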
To ensure a correct valuation of intα,t and labα,t for terms t = f(t1, . . . , tn) and assignments α, we introduce the constraints

    INTα(t) = intα,t ↔ ⟦fB(intα,t1, . . . , intα,tn)⟧, and
    LABα(t) = labα,t ↔ ⟦ℓf(intα,t1, . . . , intα,tn)⟧.

Furthermore, we set INTα(t) = intα,t ↔ α(t) for t ∈ V. (We also use ⊤ and ⊥ to denote truth and falsity in propositional formulas.) The above constraints have to be enforced for every term appearing in R. This is covered by

    LAB(R) = ⋀_α ( ⋀_{R ⊵ t} (INTα(t) ∧ LABα(t)) ∧ ⋀_{l→r∈R} (intα,l ↔ intα,r) ).
Above, ⊵ is extended to TRSs in the obvious way: R ⊵ t if l ⊵ t or r ⊵ t for some rule l → r ∈ R. Notice that ⋀_{l→r∈R} (intα,l ↔ intα,r) enforces the model condition. Assume ν is a satisfying assignment for LAB(R) and Rlab denotes the system obtained by labeling R according to the encoded labeling and model. In order to show compatibility of Rlab with POP∗, we need to find a precedence > and a safe mapping safe such that Rlab ⊆ >pop∗ holds for the induced order >pop∗. To compare the labeled versions labα(s) and labα(t) of two concrete terms s, t ∈ T(F, V) under a particular assignment α, i.e., to check labα(s) >pop∗ labα(t), we define

    ⟦s >pop∗ t⟧α = ⟦s >⟨1⟩pop∗ t⟧α ∨ ⟦s >⟨2⟩pop∗ t⟧α ∨ ⟦s >⟨3⟩pop∗ t⟧α ∨ ⟦s >⟨4⟩pop∗ t⟧α.

Here ⟦s >⟨i⟩pop∗ t⟧α refers to the encoding of case ⟨i⟩ from Definition 2. We discuss the cases ⟨2⟩–⟨4⟩; case ⟨1⟩, the comparison using the weaker order >pop, is obtained similarly. The above definition relies on the following auxiliary constraints. For every labeled symbol fa ∈ Flab and argument position i of f, we encode i ∈ safe(fa) by a propositional atom safe_{fa,i}. For every unlabeled symbol f ∈ F and formula a representing the label, the formula SF(fa, i) (respectively NRM(fa, i)) asserts that, depending on the valuation of a, the i-th position of f⊤ or f⊥ is safe (respectively normal). Similarly, for f, g ∈ F and propositional formulas a and b, the formula ⟦fa > gb⟧ ensures fν(a) > gν(b) in the precedence for a satisfying assignment ν. For the latter, we follow the standard approach of encoding precedences on function symbols, compare for instance [23]. Notice that si = t if and only if labα(si) = labα(t). Thus case ⟨2⟩ is perfectly captured by ⟦f(s1, . . . , sn) >⟨2⟩pop∗ t⟧α = ⊤ if si = t holds for some si. Otherwise, we define ⟦f(s1, . . . , sn) >⟨2⟩pop∗ t⟧α = ⋁_{i=1}^{n} ⟦si >pop∗ t⟧α. For the encoding of the third clause of Definition 2, we introduce fresh atoms δj for each argument position j of g. The formula one(δ1, . . . , δm) asserts that exactly one atom δj is true. This particular atom marks the unique safe argument position j of g(t1, . . . , tm) for which the strong comparison labα(s) >pop∗ labα(tj) is allowed. We express clause ⟨3⟩ by the propositional formula
    ⟦f(s1, . . . , sn) >⟨3⟩pop∗ g(t1, . . . , tm)⟧α =
        ⟦f_{labα,s} > g_{labα,t}⟧ ∧ one(δ1, . . . , δm)
        ∧ ⋀_{j=1}^{m} ( (δj → ⟦s >pop∗ tj⟧α ∧ SF(g_{labα,t}, j))
                      ∧ (¬δj → ⟦s >pop tj⟧α ∨ (⟦s ▷ tj⟧ ∧ SF(g_{labα,t}, j))) )
for g ∈ D(R). Here ⟦s ▷ ti⟧ = ⊤ when s ▷ ti holds, and ⟦s ▷ ti⟧ = ⊥ otherwise. This is justified as the subterm relation is closed under labeling. Note that in the above encoding of clause ⟨3⟩, we assume that the labeled root symbol f_{labα,s} is a defined symbol of Rlab. For the case that f_{labα,s} is not defined, we add a rule f_{labα,s}(x1, . . . , xn) → c, with c a fresh constant, to the analysed system Rlab. The latter rule is oriented if we additionally require f_{labα,s} > c in the precedence. For instance, the constraint ⟦mult(s(x), y) >⟨3⟩pop∗ add(y, mult(x, y))⟧α unfolds to

    ⟦mult_{l1} > add_{l2}⟧ ∧ one(δ1, δ2)
        ∧ (δ1 → ⟦mult(s(x), y) >pop∗ y⟧α ∧ SF(add_{l2}, 1))
        ∧ (¬δ1 → ⟦mult(s(x), y) >pop y⟧α ∨ (⊤ ∧ SF(add_{l2}, 1)))
        ∧ (δ2 → ⟦mult(s(x), y) >pop∗ mult(x, y)⟧α ∧ SF(add_{l2}, 2))
        ∧ (¬δ2 → ⟦mult(s(x), y) >pop mult(x, y)⟧α ∨ (⊥ ∧ SF(add_{l2}, 2)))

for corresponding labels l1 and l2 depending on the encoded model. Additionally we require mult⊤ > c and mult⊥ > c to orient the added rules. To encode the final clause ⟨4⟩ of Definition 2, we make use of multiset covers [23]. A multiset cover is a pair of total mappings γ : {1, . . . , n} → {1, . . . , n} and ε : {1, . . . , n} → B, encoded using fresh atoms γi,j and εi. The underlying idea is that for the comparison [s1, . . . , sn] (≥pop∗)mul [t1, . . . , tn] to hold, every term tj has to be covered by some term si (encoded by γi,j), either by si = tj or by si >pop∗ tj. The former situation is encoded by εi, the latter by ¬εi. For the case si = tj, si must not cover any element besides tj. We set

    ⟦(γ, ε)⟧ = ⋀_{j=1}^{m} one(γ1,j, . . . , γn,j) ∧ ⋀_{i=1}^{n} (εi → one(γi,1, . . . , γi,m)).
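In code, the constraint ⟦(γ, ε)⟧ amounts to the following sketch, using a naive pairwise encoding of one(. . .); the formula datatype and the atom names are again illustrative assumptions rather than the authors' concrete representation.

```ocaml
(* pairwise "exactly one" and the multiset-cover constraint; a sketch *)
type formula =
  | Atom of string
  | Not of formula
  | And of formula list
  | Or of formula list
  | Imp of formula * formula

let one atoms =                           (* exactly one atom is true *)
  let idx = List.mapi (fun i a -> (i, a)) atoms in
  let at_most =
    List.concat_map
      (fun (i, a) ->
        List.filter_map
          (fun (j, b) -> if i < j then Some (Not (And [ a; b ])) else None)
          idx)
      idx
  in
  And (Or atoms :: at_most)

let gamma i j = Atom (Printf.sprintf "gamma_%d_%d" i j)
let eps i = Atom (Printf.sprintf "eps_%d" i)
let range n = List.init n (fun k -> k + 1)

(* ⟦(γ,ε)⟧ = ⋀_j one(γ_1j,...,γ_nj) ∧ ⋀_i (ε_i → one(γ_i1,...,γ_im)) *)
let cover n m =
  And
    (List.map (fun j -> one (List.map (fun i -> gamma i j) (range n)))
       (range m)
     @ List.map (fun i -> Imp (eps i, one (List.map (gamma i) (range m))))
         (range n))
```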
Based on this encoding of multiset covers, case ⟨4⟩ is now expressible as

    ⟦f(s1, . . . , sn) >⟨4⟩pop∗ f(t1, . . . , tn)⟧α =
        (labα,s ↔ labα,t) ∧ ⟦(γ, ε)⟧ ∧ ⋁_{i=1}^{n} (NRM(f_{labα,s}, i) ∧ ¬εi)
        ∧ ⋀_{i=1}^{n} ⋀_{j=1}^{n} ( γi,j → ( (SF(f_{labα,s}, i) ↔ SF(f_{labα,t}, j))
                      ∧ (εi → ⟦si = tj⟧) ∧ (¬εi → ⟦si >pop∗ tj⟧α) ) ).
The constraint ⋁_{i=1}^{n} (NRM(f_{labα,s}, i) ∧ ¬εi) is used so that at least one normal argument decreases. Assuming that SM(R) and STRICT(R) cover the restrictions on the safe mapping and the precedence, satisfiability of

    POP∗SL(R) = ⋀_α ⋀_{l→r∈R} ⟦l >pop∗ r⟧α ∧ SM(R) ∧ STRICT(R) ∧ LAB(R)

certifies the existence of a model B and a labeling such that the rewrite system

    R′lab = Rlab ∪ {fa(x1, . . . , xn) → c | f ∈ D(R) and fa ∈ C(Rlab)}

is compatible with >pop∗. The encoding is sound in the following sense.

Theorem 2. Suppose the propositional formula POP∗SL(R) is satisfiable. Then R′lab ⊆ >pop∗ for some (finite) labeled rewrite system R′lab and polynomial path order >pop∗.

Since every rewrite sequence of R is simulated step-by-step by Rlab we obtain:

Corollary 1. Let R be a finite constructor TRS. Suppose the propositional formula POP∗SL(R) is satisfiable. Then the (innermost) runtime complexity of R is polynomial.
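Schematically, the top-level constraint is assembled as in the following skeleton. The four stubs merely stand in for the sub-encodings discussed in this section and simply return ⊤; the point is only to show how the conjunction over all assignments and rules is put together, not to give the real encodings.

```ocaml
(* skeleton of the top-level constraint POP*SL(R); the stubs are placeholders *)
type formula = Top | And of formula list

let encode_gt _alpha (_l, _r) = Top   (* stands for ⟦l >pop* r⟧_α *)
let sm _rules = Top                   (* safe-mapping constraints SM(R) *)
let strict _rules = Top               (* precedence constraints STRICT(R) *)
let lab _rules = Top                  (* model and labeling constraints LAB(R) *)

(* POP*SL(R) = ⋀_α ⋀_{l→r∈R} ⟦l >pop* r⟧_α ∧ SM(R) ∧ STRICT(R) ∧ LAB(R) *)
let popstar_sl assignments rules =
  And
    (List.concat_map (fun alpha -> List.map (encode_gt alpha) rules)
       assignments
     @ [ sm rules; strict rules; lab rules ])
```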
4 Experimental Results
We have implemented the encoding of POP∗ with semantic labeling (denoted by POP∗SL below) in OCaml. We compare this implementation to the implementation without labeling from [4] (denoted by POP∗) and to an implementation of a restricted class of polynomial interpretations (denoted by SMC). To check satisfiability of the obtained formulas we employ the MiniSat SAT solver [12]. SMC refers to a restrictive class of polynomial interpretations: every constructor symbol is interpreted by a strongly linear polynomial, i.e., a polynomial of shape P(x1, . . . , xn) = Σ_{i=1}^{n} xi + c with c ∈ N, c ≥ 1. Furthermore, each defined symbol is interpreted by a simple-mixed polynomial P(x1, . . . , xn) = Σ_{ij∈{0,1}} a_{i1...in} x1^{i1} · · · xn^{in} + Σ_{i=1}^{n} bi xi² with coefficients in N. Unlike in the general case, these restricted interpretations induce polynomial bounds on the runtime complexity. To find such interpretation functions automatically, we employ cdiprover3 [20]. Table 1 presents experimental results based on two testbeds. Testbed T consists of the 957 examples from the Termination Problem Database 4.0³ (TPDB) that were automatically verified terminating in the competition of 2007⁴. Testbed C is the restriction of T to constructor TRSs (449 in total). All experiments were conducted on a PC with 512 MB of RAM and a 2.4 GHz Intel Pentium IV processor.
³ Available at http://www.lri.fr/~marche/tpdb
⁴ Cf. http://www.lri.fr/~marche/termination-competition/2007/
Table 1. Experimental results on TPDB 4.0

                              POP∗            POP∗SL          SMC
                              T      C        T      C        T      C
    Yes                       65     41       128    74       156    83
    Maybe                     892    408      800    370      495    271
    Timeout (60 sec.)         0      0        29     5        306    95
    Average Time Yes (sec.)   0.037           0.130           0.183
Table 1 confirms that semantic labeling significantly increases the power of POP∗, yielding results comparable to SMC. It is noteworthy that the union of the yes-instances of the three methods consists of 218 examples for testbed T and 112 for testbed C. For these 112 out of 449 constructor TRSs we are able to conclude a polynomial runtime complexity. Interestingly, POP∗SL and SMC succeed on quite different ranges of systems. There are 29 constructor TRSs that only POP∗SL can deal with, whereas 38 constructor yes-instances of SMC cannot be handled by POP∗SL. Table 1 also reflects that on both suites SMC runs into a timeout for approximately every fourth system. This indicates that purely semantic methods similar to SMC tend to become impractical as the size of the input system increases. Compared to this, the number of timeouts of POP∗SL is rather low. We perform various optimizations in our implementation. First of all, the constraints can be simplified during construction. Further, it is beneficial to construct the overall constraint lazily. For example, the formula ⟦f(s1, . . . , sn) >⟨2⟩pop∗ si⟧α reduces to ⊤. Hence ⟦f(s1, . . . , sn) >pop∗ si⟧α = ⊤ can be concluded without constructing the encodings of the remaining cases of Definition 2. Furthermore, s >pop∗ t is doomed to failure if t contains variables not appearing in s; in this case, we replace the corresponding constraint by ⊥. SAT solvers expect their input in CNF (which is, in the worst case, exponential in size). We employ the transformation proposed in [21] to obtain an equisatisfiable CNF linear in size. This approach is analogous to Tseitin's transformation [25], but the resulting CNF is usually smaller as the polarity of subformulas is taken into account.
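For illustration, here is a self-contained OCaml sketch of a Tseitin-style transformation: every compound subformula is named by a fresh atom, and three clauses per connective enforce the defining equivalence. The Plaisted-Greenbaum variant [21] used in our implementation additionally drops one direction of each equivalence depending on the polarity of the subformula; this sketch keeps both directions.

```ocaml
(* Tseitin-style CNF: fresh atoms name subformulas; a sketch *)
type formula =
  | Atom of string
  | Not of formula
  | And of formula * formula
  | Or of formula * formula

type lit = P of string | N of string            (* literals *)
let neg = function P a -> N a | N a -> P a

let fresh =
  let c = ref 0 in
  fun () -> incr c; Printf.sprintf "d%d" !c

(* returns a literal equivalent to f; definitional clauses go to [out] *)
let rec cnf out f =
  match f with
  | Atom a -> P a
  | Not g -> neg (cnf out g)
  | And (g, h) ->
      let lg = cnf out g and lh = cnf out h and x = fresh () in
      (* x ↔ lg ∧ lh *)
      out [ N x; lg ]; out [ N x; lh ]; out [ P x; neg lg; neg lh ];
      P x
  | Or (g, h) ->
      let lg = cnf out g and lh = cnf out h and x = fresh () in
      (* x ↔ lg ∨ lh *)
      out [ N x; lg; lh ]; out [ P x; neg lg ]; out [ P x; neg lh ];
      P x

(* clause set equisatisfiable with f, linear in the size of f *)
let to_clauses f =
  let cls = ref [] in
  let root = cnf (fun c -> cls := c :: !cls) f in
  [ root ] :: !cls

let _ = to_clauses (And (Atom "p", Or (Atom "q", Not (Atom "p"))))
```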
5 Conclusion
In this paper we have shown how to automatically verify polynomial runtime complexities of rewrite systems. For this we employ semantic labeling and polynomial path orders. Our automation works by a reduction to SAT, employing a state-of-the-art SAT solver. To the best of our knowledge, this is the first SAT encoding of a recursive path order with finite semantic labeling. The experimental results confirm the feasibility of our approach. Moreover, they demonstrate that semantic labeling significantly increases the power of polynomial path orders.
Our research is comparable to [10], where recursive path orders together with strongly linear polynomial quasi-interpretations are employed for complexity analysis. In particular, [10] gives a fully automatable (but of course incomplete) procedure to verify whether the functions computed by the TRS under consideration are feasibly, i.e., polytime, computable. As opposed to [10], we study the length of derivations here. In [7] it is shown that polynomially bounded innermost runtime complexity entails polytime computability of the defined functions. As a by-product, Corollary 1 together with [7] thus gives us a procedure for the complexity analysis of the functions defined. Finally, we also mention that semantic labeling over a Boolean carrier has been implemented in the termination prover TPA [16], where heuristics are used to find an appropriately labeled TRS Rlab. Unlike that approach, we leave all choices concerning the labeling to a state-of-the-art SAT solver.

In the meantime, polynomial path orders have been extended in various ways. Inspired by the concept of predicative recursion with parameter substitution (see [9]), [6] extends polynomial path orders, widening their applicability. Our integration of semantic labeling naturally translates to this extension. Second, polynomial path orders can also be defined over quasi-precedences, compare [5]. Further, in [5] polynomial path orders have been combined with weak dependency pairs [14], a version of the dependency pair method suitably adapted to the study of runtime complexities. In principle, this makes techniques that were developed in the context of dependency pairs for termination analysis available for complexity analysis as well. In [5] we exploit two such techniques, namely argument filterings [18] and the usable rules criterion [2]. All of the above extensions have been implemented in the Tyrolean Complexity Tool, an open source complexity analyser for TRSs.⁵

Finally, we conclude with an application of our research. There has long been interest in the functional programming community in automatically verifying complexity properties of programs; for brevity, we just mention [22,1,10]. Rewriting naturally models the evaluation of functional programs, and analysing the termination behavior of functional programs via transformations to rewrite systems has been studied extensively. For instance, one recent approach is described in [13], where Haskell programs are covered. In joint work with Hirokawa, Middeldorp and Moser [3] we propose a translation from (a pure subset of higher-order) Scheme programs to term rewrite systems. The transformation is designed to be complexity preserving and thus allows the study of the complexity of a Scheme program P via the analysis of the transformed rewrite system R. Hence from compatibility of R with POP∗ we can directly conclude that the number of evaluation steps of the Scheme program P is polynomially bounded in the input sizes. All necessary steps can be performed mechanically, and thus we arrive at a completely automatic complexity analysis for (a pure subset of) Scheme, and for eagerly evaluated functional programs in general.
⁵ For further information, see http://cl-informatik.uibk.ac.at/software/tct/
References

1. Anderson, H., Khoo, S., Andrei, S., Luca, B.: Calculating polynomial runtime properties. In: Yi, K. (ed.) APLAS 2005. LNCS, vol. 3780, pp. 230–246. Springer, Heidelberg (2005)
2. Arts, T., Giesl, J.: Termination of term rewriting using dependency pairs. TCS 236(1–2), 133–178 (2000)
3. Avanzini, M., Hirokawa, N., Middeldorp, A., Moser, G.: Proving termination of Scheme programs by rewriting, http://cl-informatik.uibk.ac.at/~zini/publications/SchemeTR07.pdf
4. Avanzini, M., Moser, G.: Complexity analysis by rewriting. In: Garrigue, J., Hermenegildo, M.V. (eds.) FLOPS 2008. LNCS, vol. 4989, pp. 130–146. Springer, Heidelberg (2008)
5. Avanzini, M., Moser, G.: Dependency pairs and polynomial path orders. In: Treinen, R. (ed.) RTA 2009. LNCS, vol. 5595, pp. 48–62. Springer, Heidelberg (2009)
6. Avanzini, M., Moser, G.: Polynomial path orders and the rules of predicative recursion with parameter substitution. In: Proc. 10th WST (2009)
7. Avanzini, M., Moser, G.: Complexity analysis by graph rewriting. In: Blume, M., Kobayashi, N., Vidal, G. (eds.) FLOPS 2010. LNCS, vol. 6009, pp. 257–271. Springer, Heidelberg (2010)
8. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, Cambridge (1998)
9. Bellantoni, S., Cook, S.A.: A new recursion-theoretic characterization of the polytime functions. CC 2, 97–110 (1992)
10. Bonfante, G., Marion, J.Y., Péchoux, R.: Quasi-interpretation synthesis by decomposition. In: Jones, C.B., Liu, Z., Woodcock, J. (eds.) ICTAC 2007. LNCS, vol. 4711, pp. 410–424. Springer, Heidelberg (2007)
11. Dershowitz, N.: Orderings for term-rewriting systems. In: 20th Annual Symposium on Foundations of Computer Science, pp. 123–131. IEEE, Los Alamitos (1979)
12. Eén, N., Sörensson, N.: An extensible SAT-solver. In: Giunchiglia, E., Tacchella, A. (eds.) SAT 2003. LNCS, vol. 2919, pp. 502–518. Springer, Heidelberg (2004)
13. Giesl, J., Swiderski, S., Schneider-Kamp, P., Thiemann, R.: Automated termination analysis for Haskell: From term rewriting to programming languages. In: Pfenning, F. (ed.) RTA 2006. LNCS, vol. 4098, pp. 297–312. Springer, Heidelberg (2006)
14. Hirokawa, N., Moser, G.: Automated complexity analysis based on the dependency pair method. In: Armando, A., Baumgartner, P., Dowek, G. (eds.) IJCAR 2008. LNCS (LNAI), vol. 5195, pp. 364–380. Springer, Heidelberg (2008)
15. Hofbauer, D.: Termination proofs by multiset path orderings imply primitive recursive derivation lengths. TCS 105(1), 129–140 (1992)
16. Koprowski, A.: TPA: Termination proved automatically. In: Pfenning, F. (ed.) RTA 2006. LNCS, vol. 4098, pp. 297–312. Springer, Heidelberg (2006)
17. Koprowski, A., Middeldorp, A.: Predictive labeling with dependency pairs using SAT. In: Pfenning, F. (ed.) CADE 2007. LNCS (LNAI), vol. 4603, pp. 410–425. Springer, Heidelberg (2007)
18. Kusakari, K., Nakamura, M., Toyama, Y.: Argument filtering transformation. In: Nadathur, G. (ed.) PPDP 1999. LNCS, vol. 1702, pp. 47–61. Springer, Heidelberg (1999)
19. Lescanne, P.: Termination of rewrite systems by elementary interpretations. Formal Aspects of Computing 7(1), 77–90 (1995)
20. Moser, G., Schnabl, A.: Proving quadratic derivational complexities using context dependent interpretations. In: Voronkov, A. (ed.) RTA 2008. LNCS, vol. 5117, pp. 276–290. Springer, Heidelberg (2008)
21. Plaisted, D.A., Greenbaum, S.: A structure-preserving clause form translation. J. Symb. Comput. 2(3), 293–304 (1986)
22. Rosendahl, M.: Automatic complexity analysis. In: Proc. 4th FPCA, pp. 144–156 (1989)
23. Schneider-Kamp, P., Thiemann, R., Annov, E., Codish, M., Giesl, J.: Proving termination using recursive path orders and SAT solving. In: Konev, B., Wolter, F. (eds.) FroCoS 2007. LNCS (LNAI), vol. 4720, pp. 267–282. Springer, Heidelberg (2007)
24. TeReSe: Term Rewriting Systems. CTTCS, vol. 55. Cambridge University Press, Cambridge (2003)
25. Tseitin, G.: On the complexity of derivation in propositional calculus. SCML, Part 2, 115–125 (1968)
26. Zantema, H.: Termination of term rewriting by semantic labelling. FI 24(1/2), 89–105 (1995)
Author Index
Avanzini, Martin 155
Bastenhof, Arno 57
Charlow, Simon 1
Franke, Michael 13
Graf, Thomas 72
Klarman, Szymon 124
Lassiter, Daniel 38
Lison, Pierre 102
Nikolova, Ivelina 114
Schnabl, Andreas 142
Theijssen, Daphne 87
Wintein, Stefan 25