Lecture Notes in Artificial Intelligence
6149
Edited by R. Goebel, J. Siekmann, and W. Wahlster
Subseries of Lecture Notes in Computer Science

FoLLI Publications on Logic, Language and Information

Editors-in-Chief
Luigia Carlucci Aiello, University of Rome "La Sapienza", Italy
Michael Moortgat, University of Utrecht, The Netherlands
Maarten de Rijke, University of Amsterdam, The Netherlands
Editorial Board
Carlos Areces, INRIA Lorraine, France
Nicholas Asher, University of Texas at Austin, TX, USA
Johan van Benthem, University of Amsterdam, The Netherlands
Raffaella Bernardi, Free University of Bozen-Bolzano, Italy
Antal van den Bosch, Tilburg University, The Netherlands
Paul Buitelaar, DFKI, Saarbrücken, Germany
Diego Calvanese, Free University of Bozen-Bolzano, Italy
Ann Copestake, University of Cambridge, United Kingdom
Robert Dale, Macquarie University, Sydney, Australia
Luis Fariñas, IRIT, Toulouse, France
Claire Gardent, INRIA Lorraine, France
Rajeev Goré, Australian National University, Canberra, Australia
Reiner Hähnle, Chalmers University of Technology, Göteborg, Sweden
Wilfrid Hodges, Queen Mary, University of London, United Kingdom
Carsten Lutz, Dresden University of Technology, Germany
Christopher Manning, Stanford University, CA, USA
Valeria de Paiva, Palo Alto Research Center, CA, USA
Martha Palmer, University of Pennsylvania, PA, USA
Alberto Policriti, University of Udine, Italy
James Rogers, Earlham College, Richmond, IN, USA
Francesca Rossi, University of Padua, Italy
Yde Venema, University of Amsterdam, The Netherlands
Bonnie Webber, University of Edinburgh, Scotland, United Kingdom
Ian H. Witten, University of Waikato, New Zealand
Christian Ebert Gerhard Jäger Jens Michaelis (Eds.)
The Mathematics of Language
10th and 11th Biennial Conference
MOL 10, Los Angeles, CA, USA, July 28-30, 2007
and MOL 11, Bielefeld, Germany, August 20-21, 2009
Revised Selected Papers
Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors
Christian Ebert
Department of Linguistics, University of Tuebingen
Wilhelmstrasse 19, 72074 Tuebingen, Germany
E-mail: [email protected]

Gerhard Jäger
Department of Linguistics, University of Tuebingen
Wilhelmstrasse 19, 72074 Tuebingen, Germany
E-mail: [email protected]

Jens Michaelis
Faculty of Linguistics and Literary Studies, University of Bielefeld
Postfach 100131, 33501 Bielefeld, Germany
E-mail: [email protected]
Library of Congress Control Number: 2010930278
CR Subject Classification (1998): F.4.1, F.2, G.2, F.3, F, I.2.3
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-642-14321-0 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-14321-2 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

springer.com
© Springer-Verlag Berlin Heidelberg 2010
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 06/3180
Preface
The Association for Mathematics of Language (MOL) is the ACL special interest group dedicated to the study of Mathematical Linguistics. After its first meeting in 1984, the association has organized meetings on a biennial basis since 1991, with locations usually alternating between Europe and the USA. This volume contains a selection of 19 papers that were presented in contributed talks at the 10th meeting at UCLA, Los Angeles, in 2007 and the 11th meeting at the University of Bielefeld, Germany, in 2009. It furthermore contains three papers by invited speakers from these meetings.

Like each MOL proceedings volume, this collection reflects studies in the wide range of theoretical topics relating to language and computation that the association is devoted to supporting, including papers on the intersection of computational complexity, formal language theory, proof theory, and logic, as well as phonology, lexical semantics, syntax, and typology. This volume is hence of interest not only to mathematical linguists but to logicians, theoretical and computational linguists, and computer scientists alike. It therefore fits very well in the Springer FoLLI/LNAI series, and we are grateful to Michael Moortgat for suggesting this series as a place of publication.

We would furthermore like to thank everyone who played a role in making these meetings possible and helped to make them a success: the reviewers, the invited speakers, the contributors, and the people involved in the local organization.

May 2010
Christian Ebert   Gerhard Jäger   Jens Michaelis
Organization
MOL 10 was held at the University of California, Los Angeles, during July 28–30, 2007.
Local Organizers
Marcus Kracht, University of California, Los Angeles
Gerald Penn, University of Toronto
Edward P. Stabler, University of California, Los Angeles
Referees
Ron Artstein, Patrick Blackburn, Pierre Boullier, Wojciech Buszkowski, David Chiang, Tim Fernando, Markus Egg, Gerhard Jäger, Martin Jansche, David Johnson, Aravind Joshi, András Kornai, Alain Lecomte, Carlos Martín-Vide, Jens Michaelis, Mehryar Mohri, Uwe Mönnich, Michael Moortgat, Drew Moshier, Gerald Penn, Sylvain Pogodalla, Edward P. Stabler, Shuly Wintner
MOL 11 was held at the University of Bielefeld in Germany during August 20–21, 2009.
Local Organizers
Gerhard Jäger, University of Tübingen
Marcus Kracht, University of Bielefeld
Christian Ebert, University of Tübingen
Jens Michaelis, University of Bielefeld
Referees
Patrick Blackburn, Philippe de Groote, Christian Ebert, Gerhard Jäger, Aravind Joshi, Stephan Kepser, Gregory M. Kobele, András Kornai, Marcus Kracht, Natasha Kurtonina, Jens Michaelis, Michael Moortgat, Larry Moss, Richard T. Oehrle, Gerald Penn, Wiebke Petersen, Sylvain Pogodalla, James Rogers, Sylvain Salvati, Edward P. Stabler, Hans-Jörg Tiede
Table of Contents
Dependency Structures Derived from Minimalist Grammars . . . . . . . . . . . . 1
    Marisa Ferrara Boston, John T. Hale, and Marco Kuhlmann
Deforesting Logical Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
    Zhong Chen and John T. Hale
On the Probability Distribution of Typological Frequencies . . . . . . . . . . 29
    Michael Cysouw
A Polynomial Time Algorithm for Parsing with the Bounded Order
Lambek Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
    Timothy A.D. Fowler
LC Graphs for the Lambek Calculus with Product . . . . . . . . . . . . . . . . 44
    Timothy A.D. Fowler
Proof-Theoretic Semantics for a Natural Language Fragment . . . . . . . . . . 56
    Nissim Francez and Roy Dyckhoff
Some Interdefinability Results for Syntactic Constraint Classes . . . . . . . 72
    Thomas Graf
Sortal Equivalence of Bare Grammars . . . . . . . . . . . . . . . . . . . . . . 88
    Thomas Holder
Deriving Syntactic Properties of Arguments and Adjuncts from
Neo-Davidsonian Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
    Tim Hunter
On Monadic Second-Order Theories of Multidominance Structures . . . . . . . . 117
    Stephan Kepser
The Equivalence of Tree Adjoining Grammars and Monadic Linear
Context-Free Tree Grammars . . . . . . . . . . . . . . . . . . . . . . . . . . 129
    Stephan Kepser and James Rogers
A Formal Foundation for A and A-bar Movement . . . . . . . . . . . . . . . . . 145
    Gregory M. Kobele
Without Remnant Movement, MGs Are Context-Free . . . . . . . . . . . . . . . . 160
    Gregory M. Kobele
The Algebra of Lexical Semantics . . . . . . . . . . . . . . . . . . . . . . . 174
    András Kornai
Phonological Interpretation into Preordered Algebras . . . . . . . . . . . . . 200
    Yusuke Kubota and Carl Pollard
Relational Semantics for the Lambek-Grishin Calculus . . . . . . . . . . . . . 210
    Natasha Kurtonina and Michael Moortgat
Intersecting Adjectives in Syllogistic Logic . . . . . . . . . . . . . . . . . 223
    Lawrence S. Moss
Creation Myths of Generative Grammar and the Mathematics of
Syntactic Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
    Geoffrey K. Pullum
On Languages Piecewise Testable in the Strict Sense . . . . . . . . . . . . . 255
    James Rogers, Jeffrey Heinz, Gil Bailey, Matt Edlefsen,
    Molly Visscher, David Wellcome, and Sean Wibel
A Note on the Complexity of Abstract Categorial Grammars . . . . . . . . . . . 266
    Sylvain Salvati
Almost All Complex Quantifiers Are Simple . . . . . . . . . . . . . . . . . . 272
    Jakub Szymanik
Constituent Structure Sets I . . . . . . . . . . . . . . . . . . . . . . . . . 281
    Hiroyuki Uchida and Dirk Bury
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
Dependency Structures Derived from Minimalist Grammars

Marisa Ferrara Boston¹, John T. Hale¹, and Marco Kuhlmann²
¹ Cornell University
² Uppsala University
Abstract. This paper provides an interpretation of Minimalist Grammars [16,17] in terms of dependency structures. Under this interpretation, merge operations derive projective dependency structures, and movement operations introduce both non-projectivity and illnestedness. This new characterization of the generative capacity of Minimalist Grammars makes it possible to discuss the linguistic relevance of non-projectivity and illnestedness. This in turn provides insight into grammars that derive structures with these properties.¹
1 Introduction
This paper investigates the class of dependency structures that Minimalist Grammars (MGs) [16,17] derive. MGs stem from the generative linguistic tradition, and Chomsky's Minimalist Program [1] in particular. The MG formalism encourages a lexicalist analysis in which hierarchical syntactic structure is built when licensed by word properties called "features". MGs facilitate a movement analysis of long-distance dependency that is conditioned by lexical features as well. Unlike unification grammars, but similar to categorial grammar, these features must be cancelled in a particular order specific to each lexical item.

Dependency Grammar [18] is a syntactic tradition that determines sentence structure on the basis of word-to-word connections, or dependencies. DG names a family of approaches to syntactic analysis that all share a commitment to word-to-word connections. [8] relates properties of dependency graphs, such as projectivity and wellnestedness, to language-theoretic concerns like generative capacity. This paper examines these same properties in MG languages. We do so using a new DG interpretation of MG derivations. This tool reveals that syntactic movement as formalized in MGs can derive the sorts of illnested structures attested in Czech comparatives from the Prague Dependency Treebank 2.0 (PDT) [5].

Previous research in the field indirectly links MGs and DGs: [12] proves the equivalence of MGs with Linear Context-Free Rewriting Systems (LCFRS) [19], and the relation of LCFRS to DGs is made explicit in [8]. This paper provides a direct connection between the two formalisms. Using this connection, we investigate the linguistic relevance of structural constraints such as non-projectivity and illnestedness based on MGs that induce structures with these properties. The system proposed is not a new formalism, but a technical tool that we use to gain linguistic insight.

Section 2 describes MGs as they are formalized in [17], and Section 3 translates MG operations into operations on dependency structures. Sections 4 and 5 discuss the structural constraints of projectivity and nestedness in terms of MGs.

¹ The authors thank Joan Chen-Main, Aravind K. Joshi, and audience members at Mathematics of Language 11 for discussion and suggestions.
2 Minimalist Grammars
This section introduces notation particular to the MG formalism. Following [16] and [17], a Minimalist Grammar G is a five-tuple (Σ, F, Types, Lex, 𝓕). Σ is the vocabulary of the grammar, which can include empty elements. Figure 6 exemplifies the use of empty "functional" elements typical in Chomskyan and Kaynian analyses. There, ε denotes an empty element that, while syntactically potent, makes no contribution to the derived string. F is a set of features, built over a non-empty set of base features, which denote the lexical category of the item. If f is a base feature, then =f is a selection feature, which selects for complements with base feature f; a prefixed + or − identifies licensor and licensee features, +f and −f respectively, that license movement.

A Minimalist Grammar distinguishes two types of structure. The "simple" type, flagged by double colons (::), identifies items fresh out of the lexicon. Any involvement in structure-building creates a "derived" item (:), flagged with a single colon. This distinction allows the first derivation of syntactic composition to be handled differently from later episodes. A chain is a triple Σ* × Types × F*, and an expression is a non-empty sequence of chains. The set of all expressions is denoted by E. The lexicon Lex is a finite subset of chains with type ::.

The set 𝓕 consists of two generating functions, merge and move. For simplicity, we focus on MGs that do not incorporate head or covert movement. Table 1 presents the functions in inference-rule form, following [17]. In the table, the juxtaposition st denotes the concatenation of two strings s and t.

Merge is a structure building operation that creates a new, derived expression from two expressions (E × E → E). It is the union of three functions, shown in the upper half of Table 1. Each sub-function applies according to the type and features of the items to be merged. The three merge operations differ with respect to the types of chains they operate on. If s is simple (::), the merge1 operation applies. If it is derived (:), the merge2 operation applies. We write · when the type does not matter. If t has additional features δ, the merge3 operation must apply regardless of the type of s.

The move operation is a structure building operation that creates a new expression from an expression (E → E). It is the union of two functions, move1 and move2, provided in the lower half of Table 1. As with the merge3 operation, move2 only applies when t has additional features δ.
Table 1. Merge and Move

  s :: =f γ        t · f, α1, . . . , αk
  ──────────────────────────────────────── merge1
       st : γ, α1, . . . , αk

  s : =f γ, α1, . . . , αk        t · f, ι1, . . . , ιl
  ────────────────────────────────────────────────────── merge2
       ts : γ, α1, . . . , αk, ι1, . . . , ιl

  s · =f γ, α1, . . . , αk        t · f δ, ι1, . . . , ιl
  ──────────────────────────────────────────────────────── merge3
       s : γ, α1, . . . , αk, t : δ, ι1, . . . , ιl

  s : +f γ, α1, . . . , αi−1, t : −f, αi+1, . . . , αk
  ───────────────────────────────────────────────────── move1
       ts : γ, α1, . . . , αi−1, αi+1, . . . , αk

  s · +f γ, α1, . . . , αi−1, t : −f δ, αi+1, . . . , αk
  ─────────────────────────────────────────────────────── move2
       s : γ, α1, . . . , αi−1, t : δ, αi+1, . . . , αk
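The feature and chain vocabulary just defined can be written down directly as datatypes. The following OCaml sketch is ours, not part of the paper's formal development; the constructor names are illustrative only.

(* Features of an MG: base f, selector =f, licensor +f, licensee -f. *)
type feature =
  | Base of string     (* f  *)
  | Sel of string      (* =f *)
  | Plus of string     (* +f *)
  | Minus of string    (* -f *)

(* Simple (::) versus derived (:) items. *)
type itemtype = Simple | Derived

(* A chain is a triple in Σ* × Types × F*;
   an expression is a non-empty sequence of chains. *)
type chain = string * itemtype * feature list
type expression = chain list

(* Example lexical item from Figure 1(a): the::=n d *)
let the_item : chain = ("the", Simple, [Sel "n"; Base "d"])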
3 MG Operations on Dependency Trees
In this section we introduce a formalism to derive dependency structures from MGs. Throughout the discussion N denotes the set of non-negative integers.

3.1 Dependency Trees
DG is typically discussed in terms of directed dependency graphs. However, the directed nature of dependency arrows and the single-headed condition [13] allow these graphs to also be viewed as trees. We define dependency trees in terms of their nodes, with each node in a dependency tree labeled by an address, a sequence of positive integers. We write λ for the empty sequence of integers. The letters u, v, w are variables for addresses, s, t are variables for sets of addresses, and x, y are variables for sequences of addresses. If u and v are addresses, then the concatenation of the two is as well, denoted by uv. Given an address u and a set of addresses s, we write ↑u s for the set { uv | v ∈ s }. Given an address u and a sequence of addresses x = v1, . . . , vn, we write ↑u x for the sequence uv1, . . . , uvn. Note that ↑u s is a set of addresses, whereas ↑u x is a sequence of addresses.

A tree domain is a set t of addresses such that, for each address u and each integer i ∈ N, if ui ∈ t, then u ∈ t (prefix-closed), and uj ∈ t for all 1 ≤ j ≤ i (left-sibling-closed). A linearization of a finite set S is a sequence of elements of S in which each element occurs exactly once. For the purposes of this paper, a dependency tree is a pair (t, x), where t is a tree domain and x is a linearization of t. A segmented dependency tree is a non-empty sequence (s1, x1), . . . , (sn, xn), where each si is a set of addresses, each xi is a linearization of si, all sets si are pairwise disjoint, and the union of the sets si forms a tree domain. A pair (si, xi) is called a component, which corresponds to a chain in Stabler and Keenan's [17] terminology.
An expression is a sequence of triples (c1, τ1, γ1), . . . , (cn, τn, γn), where (c1, . . . , cn) is a segmented dependency tree, each τi is a type (lexical or derived), and each γi is a sequence of features. We write these triples as ci :: γi (if the type is lexical), ci : γi (if the type is derived), or ci · γi (if the type does not matter). We use the letters α and ι as variables for elements of an expression. Given an element α = ((s, x), τ, γ) and an address u, we write ↑u α for the element ((↑u s, ↑u x), τ, γ). Given an expression d with associated tree domain t, we write next(d) for the minimal positive integer i such that i ∉ t.
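Concretely, the address operations above are simple list manipulations. The sketch below (ours, in OCaml, continuing the datatypes introduced after Table 1) implements ↑u and next; it is illustrative only.

(* Addresses are sequences of positive integers; the root is λ = []. *)
type address = int list

(* lift u s implements ↑u: prefix every address in s with u. *)
let lift (u : address) (s : address list) : address list =
  List.map (fun v -> u @ v) s

(* next t: the minimal positive integer i such that the address [i]
   does not yet occur in the tree domain t. *)
let next (t : address list) : int =
  let rec go i = if List.mem [i] t then go (i + 1) else i in
  go 1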
3.2 Merge
Merge operations allow additional dependency structure to be added to an initially derived tree. These changes are recorded in the manipulation of the dependency tree addresses, as formalized in the previous section. Table 2 provides a dependency interpretation for each of the structure-building rules introduced in Table 1. The mergeDG functions create a dependency between two trees, such that the root of the left tree becomes the head of the root of the right tree, where left and right correspond to the trees in the rules. For example, in merge1DG the ↑1 t notation signifies that t is now the first daughter of s. Its components are similarly updated.

Table 2. Merge in terms of dependency trees

  ({λ}, λ) :: =f γ        (t, x) · f, α1, . . . , αk
  ───────────────────────────────────────────────────── merge1DG
       ({λ} ∪ ↑1 t, λ · ↑1 x) : γ, ↑1 α1, . . . , ↑1 αk

  (s, x) : =f γ, α1, . . . , αk        (t, y) · f, ι1, . . . , ιl
  ──────────────────────────────────────────────────────────────── merge2DG
       (s ∪ ↑i t, ↑i y · x) : γ, α1, . . . , αk, ↑i ι1, . . . , ↑i ιl

  (s, x) · =f γ, α1, . . . , αk        (t, y) · f δ, ι1, . . . , ιl
  ────────────────────────────────────────────────────────────────── merge3DG
       (s, x) : γ, α1, . . . , αk, (↑i t, ↑i y) : δ, ↑i ι1, . . . , ↑i ιl

  where i = next((s, x) · =f γ, α1, . . . , αk)
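Read operationally, merge1DG lifts the selected tree one level and hangs it under a fresh root. A minimal OCaml sketch of this one rule (ours, reusing address and lift from the previous sketch, and eliding all feature bookkeeping):

(* A component pairs a set of addresses with their linearization. *)
type component = { nodes : address list; order : address list }

(* merge1DG: the selected tree t becomes the first daughter of the
   lexical selector, so every address in t (and in the waiting
   components alphas) is prefixed with 1; the new root λ = [] is
   ordered first. *)
let merge1_dg (t : component) (alphas : component list) =
  let lift1 c = { nodes = lift [1] c.nodes; order = lift [1] c.order } in
  let t' = lift1 t in
  ({ nodes = [] :: t'.nodes; order = [] :: t'.order },
   List.map lift1 alphas)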
Applying the merge1DG rule to a simple English grammar creates the dependency tree in Figure 1. Dependency relations between nodes are notated with solid arrows and node labels with dotted lines; the dependency graphs shown in the figures are encoded by the address sets described above in Table 2. A lexicon for these examples is provided in Figure 1(a). As was mentioned above, the merge rules apply in different contexts depending on the tree types and the number of features. Merge1DG can apply in Figure 1 because the selector tree the is simple and the selected tree boat does not have additional features δ. The entire derived tree forms a single component, denoted by the dashed box. Merge2 contrasts with merge1 in the linearized order of the nodes: in this case, the right tree is ordered before the left tree and its children, as in Figure 2(a).
Fig. 1. merge1DG applies to two simple dependency trees. (a) Lexicon 1: the::=n d; boat::n; docked::=d =d v; where::d -wh; ε::=v +wh c. (b)–(c) The simple trees for the::=n d and boat::n. (d) The derived tree the:d with boat as its dependent. [Tree diagrams omitted.]
The rules given in Table 2 are a deduction system for expressions: sequences of triples whose first components together define a segmented dependency tree. Each component, spanning any number of words, has a feature-sequence associated with it. Merge3DG introduces new, unlinearized components into the derivation. These components are unordered with respect to the other components, though the words within the components are ordered. In Figure 2(b), docked and where are represented by separate dashed boxes; this indicates that their relative linear order is unknown. Merge3DG contrasts with applications of merge1DG and merge2DG, where two dependency trees are merged into one fully-ordered component, as demonstrated by Figures 1(d) and 2(a).
3.3 Move
The move operation does not create or destroy dependencies in the derived tree. It re-orders the nodes, and reduces the number of components in the tree by one. Table 3 defines these rules in terms of dependency trees.

Table 3. Move in terms of dependency trees

  (s, x) : +f γ, α1, . . . , αi−1, (t, y) : −f, αi+1, . . . , αk
  ─────────────────────────────────────────────────────────────── move1DG
       (s ∪ t, yx) : γ, α1, . . . , αi−1, αi+1, . . . , αk

  s · +f γ, α1, . . . , αi−1, t : −f δ, αi+1, . . . , αk
  ─────────────────────────────────────────────────────── move2DG
       s : γ, α1, . . . , αi−1, t : δ, αi+1, . . . , αk
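In the same sketch as before, move1DG is just a union of two components in which the mover's material is linearized first; no addresses change, and the component count drops by one. This is ours, not the paper's implementation.

(* move1DG: the -f component t is folded into the head component s,
   with t's linearization y preceding s's linearization x (yx). *)
let move1_dg (s : component) (t : component) : component =
  { nodes = s.nodes @ t.nodes; order = t.order @ s.order }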
Figure 3 demonstrates the move1DG operation on a simple structure. The where node is not only reordered to the front of the tree, but it also becomes part of the ε node's component. Note that for this example we use an ε to denote the structural position that has the +wh feature. This follows from standard linguistic practice; in some languages, this position can be marked by an overt lexical item. In English, it is not. Both ε and overt lexical items can have licensor features.

Unlike the previous merge and move operations described, move2DG does not change the dependency structure or linearization of the tree. Move2DG applies when the licensee component has additional features that require further movements.
Fig. 2. merge2DG and merge3DG. (a) The fully ordered component the boat sailed: v derived by merge2DG. (b) The separate components docked:=d v and where:-wh introduced by merge3DG. [Tree diagrams omitted.]
Fig. 3. move1DG: ε:+wh c the boat docked, with the component where:-wh, becomes where ε:c the boat docked. [Tree diagrams omitted.]
Its sole purpose is to cancel the licensor and licensee features; this feature cancellation is necessary to preserve the linguistic intuition of the intermediate derivation step. Figure 4 demonstrates move2DG. Although the +wh and −wh features are canceled, the linear order is unaffected and the additional licensee feature −t on where remains.

Following [16] and [17], we restrict movement with the Shortest Move Condition (SMC), defined in (1).

(1) None of α1, ..., αi−1, αi+1, ..., αk has −f as its first feature.

Adoption of the SMC guarantees a version of MGs that are weakly equivalent to LCFRS [2].

The rules above derive connected dependency structures. This is demonstrated by induction on the derived structure: single-node trees (i.e., simple lexical items) vacuously satisfy connectedness. All merge rules create dependencies between trees, and movements do not destroy any already-created dependencies. Therefore, a dependency structure at any derivation step will be connected.

Provided that every expression in the lexicon has exactly one base feature, the dependency trees derived from MGs will not contain multi-headed nodes (i.e., nodes with multiple parents). This single-headedness proof follows straightforwardly from two lemmas concerning the role of base features in expressions E = (c1, τ1, γ1), . . . , (cn, τn, γn). Lemma 1 asserts the unique existence of a base feature f in the first feature sequence γ1. Lemma 2 denies the existence of base features in later components.
Fig. 4. move2DG: ε:+wh c the boat docked, with the component where:-wh -t, becomes ε:c the boat docked with the component where:-t; the linear order is unchanged. [Tree diagrams omitted.]
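The SMC in (1) amounts to a distinctness check on the first features of the waiting components. A small sketch (ours, using the feature type from the first sketch):

(* SMC: no two waiting components may begin with the same
   licensee feature -f. *)
let smc_ok (waiting : feature list list) : bool =
  let firsts =
    List.filter_map
      (function Minus f :: _ -> Some f | _ -> None)
      waiting
  in
  List.length firsts = List.length (List.sort_uniq compare firsts)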
The dependency structures derived at intermediate steps need not be totally ordered. Because of the merge3 and move2 rules, components can be introduced into the dependency structure that have not yet moved to their final order in the structure. However, the usual notion of start category [17] in MGs is a single base feature. This implies that in complete derivations all licensee features have been checked. The implication guarantees that dependency trees derived using the system in Tables 2 and 3 are totally ordered.
4 Minimalist Grammars and Block Degree
Projectivity is a constraint on dependency structures that requires subtrees to span intervals. [9] define an interval as the set [i, j] := {k ∈ V | i ≤ k and k ≤ j}, where i and j are endpoints and V is a set of nodes as defined in Section 3.1. Non-projective structures violate this constraint. The node labeled docked in Figure 3 spans two intervals: its child spans interval 0, and the node and its other children span intervals 2–4.

Following [8] we use the notion of block degree to characterize non-projective structures. A tree's block degree is the maximum number of intervals each of its subtrees span. The block degree for Figure 3, repeated in Figure 5, is two: each of the intervals of the node labeled docked forms a block. Shaded boxes notate node blocks.
Fig. 5. The block degree of this structure is 2: in where ε:c the boat docked, the two intervals of the node docked form separate blocks. [Tree diagram omitted.]
By construction, mergeDG always forms dependency relations between the roots of subtrees. All nodes in the resulting expression are part of the same interval. Move1DG has the potential to create non-projective structures: constituents can move away from the interval that the parent node spans to create a separate constituent block, as demonstrated by Figure 5. In this example, the ε element intervenes in the dependency between docked and where. As discussed above, in other languages ε could be replaced by an overt lexical item. Both types of intervention are considered non-projective in this work.

Because only movements can cause non-projectivity, and because all movements are triggered in the MG framework by a licensor and licensee pair, the block degree of the derived tree is bounded by the number of licensees. In other words, the number of licensees determines the maximum block degree of the structure. This number has previously been identified as an upper bound on the complexity of MGs [11,6]. The coincidence of this result follows from work by [14], who attributed the increased parsing complexity of LCFRS to non-projectivity [8].
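Block degree is easy to compute from the linear positions that a subtree's nodes occupy: count maximal runs of consecutive positions, and take the maximum over all subtrees. A sketch of the per-subtree count (ours):

(* Number of blocks (maximal intervals) covered by a set of string
   positions, e.g. block_count [0] = 1 and block_count [0; 2; 3; 4] = 2. *)
let block_count (positions : int list) : int =
  match List.sort_uniq compare positions with
  | [] -> 0
  | p :: rest ->
      fst (List.fold_left
             (fun (n, prev) q ->
                if q = prev + 1 then (n, q) else (n + 1, q))
             (1, p) rest)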
5 Minimalist Grammars and Nestedness
A further constraint on the class of dependency structures is wellnestedness [8]. Wellnested structures prohibit the "crossing" of disjoint subtree intervals. Any structure that is not wellnested is said to be illnested, as in Figure 6(b). Here, the subtree spanning the hearing on the issue crosses the subtree spanning is scheduled today. [8] demonstrates that grammars that derive illnested structures are more powerful than grammars that do not, which leads to higher parsing complexity.
Fig. 6. MGs derive illnested dependency structures. (a) Lexicon: the::=n =p d -z; hearing::n; on::=d p -x; the::=n d; issue::n; is::=v =d t -y; scheduled::=r v; today::r -w; ε::=t +w c; ε::=c +x c; ε::=c +y c; ε::=c +z c. (b) The illnested dependency structure for the hearing is scheduled on the issue today. [Tree diagram omitted.]
Fig. 7. Derivation of the illnested English structure. (a) Merged structure before movement, with components the:-z hearing, on:-x the issue, is:-y scheduled, and today:-w. (b) Merge of ε::=t +w c and is:-y; movement of today:-w. (c) Merge of ε::=c +x c and ε:c; movement of on:-x. (d) Merge of ε::=c +y c and ε:c; movement of is:-y. (e) Merge of ε::=c +z c and ε:c; movement of the:-z. [Tree diagrams omitted.]
We prove that MGs are able to derive illnested structures by example. The grammar in Figure 6(a) derives the illnested English structure in Figure 6(b). The result is a 1-illnested structure, the lowest level of illnestedness in the characterization of [10]. Not all mildly context-sensitive formalisms can derive illnested structures. For example, TAGs can only generate wellnested structures with a block degree of at most two [8]. Our proof demonstrates that MGs derive structures with higher block degrees, which have a potential for illnested structures. This allows MGs to generate the same string languages as LCFRS, which also generate illnested structures [15].

The illnested structure in Figure 6(b) is also interesting from a linguistic perspective. It represents a case of noun-complement clause extraposition [4], where the complement on the issue is extraposed from the determiner phrase the hearing. The additional extraposition of the adverb today from the verb phrase is scheduled leads to the illnested final structure. Several analyses of extraposition are put forth in the literature, but here we choose Kayne's [7] "stranding analysis", where a series of leftward movements leads to modifier stranding. These movements are each motivated by empty functional categories that could be overt in other possible human languages. In this lexicon, first the adverb is moved by the licensor +w (Figure 7(a)), followed by the prepositional phrase, the verb phrase, and finally the noun phrase, as in Figures 7(b) through 7(e).

The analysis of the illnested structure in Figure 6(b) in terms of extraposition provides a first step towards an understanding of the linguistic relevance of illnested structures. This is not only useful for the analysis of dependencies in formal grammars, but also for the analysis of extraposition in linguistics. Investigating the linguistic qualities of illnested structures cross-linguistically also shows promise. For example, treebanks from languages with freer word order, such as Czech, tend to have more illnested structures [8]. The sentence in Figure 8 is sentence number Ln94209_45.a/18 from the PDT 2.0². An English gloss of the sentence is "A strong individual will obviously withstand a high risk better than a weak individual".³ This particular example is a comparative construction (better X than Y) [3], which can give rise to illnestedness in Czech. The MG acknowledges syntactic relationships between the comparative construction and the adjectives weak and strong. The specific analysis stays close to the Kaynian tradition in supposing empty categories that intertwine with the dependencies from two different subtrees. Other constructions that cause illnestedness in the PDT are subject complements, verb-nominal predicates, and coordination (Kateřina Veselá, p.c.). At least in the case of Czech, this evidence suggests that the expressive power of a syntactic movement rule (rather than just attachment) is required.
² Punctuation is removed to simplify the diagram.
³ The authors thank Jiří Havelka (IBM Czech Republic), Kateřina Veselá (Charles University), and E. Wayles Browne (Cornell University) for the translation and linguistic analysis of the illnested structures in the PDT.
Fig. 8. An illnested Czech example from the Prague Dependency Treebank. (a) Lexicon: Vysokému::Atr; se::AuxT; lépe::=AuxC Adv; slabý::ExD; jedinec::=Atr Sb -a; riziku::=Atr Obj -f; samozřejmě::AuxY; než::=ExD AuxC -b; silný::Atr -d; ubrání::=Obj =Adv =AuxY =AuxT =Sb Pred -e; ε::=Pred +a c; ε::=c +d c; ε::=c +f c; ε::=c +b c; ε::=c +e c. (b) Vysokému riziku se samozřejmě lépe ubrání silný než slabý jedinec, glossed 'high risk self obviously better will-defend strong than weak individual'. [Tree diagram omitted.]
6 Conclusion
This paper provides a definition of MG merge and move operations in terms of dependency trees, and examines the properties of these operations in terms of both projectivity and nestedness constraints. We find that MGs with movement rules derive illnested structures of exactly the sort required by Czech comparatives and English noun-complement clause extractions. The work also provides a basis for future research in determining how different types of MG movement, such as head, covert, and remnant movement, interact with dependency constraints and properties like illnestedness. Dependency-generative capacity may also provide a new avenue of research into determining how different types of locality constraints (besides the SMC) interact with generative capacity [2].
References

1. Chomsky, N.: The Minimalist Program. MIT Press, Boston (1995)
2. Gärtner, H.M., Michaelis, J.: Some remarks on locality conditions and Minimalist Grammars. In: Sauerland, U., Gärtner, H.M. (eds.) Interfaces + Recursion = Language?, pp. 161–195. Mouton de Gruyter, Berlin (2007)
3. Goldberg, A.: Constructions at Work: The Nature of Generalization in Language. Oxford University Press, New York (2006)
4. Guéron, J., May, R.: Extraposition and logical form. Linguistic Inquiry 15, 1–32 (1984)
5. Hajič, J., Panevová, J., Hajičová, E., Sgall, P., Pajas, P., Štěpánek, J., Havelka, J., Mikulová, M.: Prague Dependency Treebank 2.0 (2000)
6. Harkema, H.: A characterization of minimalist languages. In: de Groote, P., Morrill, G., Retoré, C. (eds.) LACL 2001. LNCS (LNAI), vol. 2099, p. 193. Springer, Heidelberg (2001)
7. Kayne, R.S.: The Antisymmetry of Syntax. MIT Press, Cambridge (1994)
8. Kuhlmann, M.: Dependency structures and lexicalized grammars. Ph.D. thesis, Universität des Saarlandes (2007)
9. Kuhlmann, M., Nivre, J.: Mildly non-projective dependency structures. In: Proceedings of COLING/ACL 2006, pp. 507–514 (2006)
10. Maier, W., Lichte, T.: Characterizing discontinuity in constituent treebanks. In: Proceedings of the Fourteenth Conference on Formal Grammar (2009), http://webloria.loria.fr/~degroote/FG09/Maier.pdf
11. Michaelis, J.: Derivational minimalism is mildly context-sensitive. In: Moortgat, M. (ed.) LACL 1998. LNCS (LNAI), vol. 2014, p. 179. Springer, Heidelberg (2001)
12. Michaelis, J.: On formal properties of Minimalist Grammars. Linguistics in Potsdam (LiP) 13, Universitätsbibliothek Publikationsstelle, Potsdam (2001)
13. Nivre, J.: Inductive Dependency Parsing. Text, Speech and Language Technology. Springer, New York (2006)
14. Satta, G.: Recognition of linear context-free rewriting systems. In: Proceedings of the Association for Computational Linguistics (ACL), pp. 89–95 (1992)
15. Seki, H., Matsumura, T., Fujii, M., Kasami, T.: On multiple context-free grammars. Theoretical Computer Science 88(2), 191–229 (1991)
16. Stabler, E.P.: Derivational minimalism. In: Retoré, C. (ed.) LACL 1996. LNCS (LNAI), vol. 1328, pp. 68–95. Springer, Heidelberg (1997)
17. Stabler, E.P., Keenan, E.: Structural similarity within and among languages. Theoretical Computer Science 293(2), 345–363 (2003)
18. Tesnière, L.: Éléments de syntaxe structurale. Editions Klincksieck (1959)
19. Vijay-Shanker, K., Weir, D.J., Joshi, A.K.: Characterizing structural descriptions produced by various grammatical formalisms. In: Proceedings of the Association for Computational Linguistics (ACL), pp. 104–111 (1987)
Deforesting Logical Form

Zhong Chen and John T. Hale
Department of Linguistics, Cornell University
Ithaca, NY 14853-4701, USA
{zc77,jthale}@cornell.edu
Abstract. This paper argues against Logical Form (LF) as an intermediate level of representation in language processing. We apply a program transformation technique called deforestation to demonstrate the inessentiality of LF in a parsing system that builds semantic interpretations. We consider two phenomena, Quantifier Raising in English and Wh-movement in Chinese, which have played key roles in the broader argument for LF. Deforestation derives LF-free versions of these parsing systems. This casts doubt on LF’s relevance for processing models, contrary to suggestions in the literature.
1 Introduction
It is the business of the computational linguist, in his role as a cognitive scientist, to explain how a physically-realizable system could ever give rise to the diversity of language behaviors that ordinary people exhibit. In pursuing this grand aspiration, it makes sense to leverage whatever is known about language itself in the pursuit of computational models of language use. This idea, known as the Competence Hypothesis [5], dates back to the earliest days of generative grammar.

Hypothesis 1 (Competence Hypothesis). A reasonable model of language use will incorporate, as a basic component, the generative grammar that expresses the speaker-hearer's knowledge of the language.

The Competence Hypothesis is the point of departure for the results reported in this paper. Sections 4 and 5 show how a syntactic theory that incorporates a level of Logical Form (LF) can be applied fairly directly in a physically-realizable parser. These demonstrations are positive results about the viability of certain kinds of transformational grammars in models of language use. While not strictly novel, such positive results are important for cognitive scientists and others who wish to maintain the Competence Hypothesis about the grammar-parser relationship. In particular, the parser described in Section 4.2 handles a variety of English quantifier-scope examples that served to motivate LF when that level was first introduced. Section 5.2 extends the same technique to Chinese wh-questions, a case that has been widely taken as evidence in favor of such a level.
Sections 4.3 and 5.3 then apply a general program transformation technique called deforestation to each parser. Deforestation, as championed by Wadler [25] and described in more detail in Section 3, is a general method for getting rid of intermediate data structures in functional programs. In the current application, it is the LF representations that are eliminated. The resultant parsing programs do not construct any such representations, although they do compute the same input-output function.

The outcome of deforestation is a witness to the fact that the parser need not construct LFs in order to do the job the grammar specifies. This inessentiality suggests that representations at the LF level should not be viewed as causally implicated in human sentence comprehension, despite suggestions to the contrary in the literature. For instance, Berwick and Weinberg suggest that:

  There is good evidence that the data structures or units of representation posited by theories of transformational grammar are actually implicated causally in online language processing. [1, 197]

Consider this claim in relation to the positive and negative results. If one were to observe a characteristic pattern of errors or response times across two language understanding tasks that vary only in their LF-handling requirements, then a good theory of these results might involve some kind of LF parser whose computational actions causally depend on LF representations. The positive results of Sections 4.2 and 5.2 show how this could be done. However, the negative results to be presented in Sections 4.3 and 5.3 imply that the very same kind of model can, through deforestation, be reformulated to avoid calculating with precisely the LF representations that Berwick and Weinberg suggest are causally implicated in language understanding. This is a paradox. To escape it, such evidence, if it exists, must instead be viewed as evidence for cognitive processes that calculate the language relationships that LF specifies, rather than as evidence for LF per se.

The conjunction of these negative and positive results holds special significance for linguistics because it casts doubt on two widely held hypotheses.

Hypothesis 2 (No levels are irrelevant). To understand a sentence it is first necessary to reconstruct its analysis on each linguistic level. [4, 87]

Hypothesis 3 (LF Hypothesis). LF is the level of syntactic representation that is interpreted by semantic rules. [22, 248]

We present counter-examples where understanding does not require reconstruction of the LF level, and where semantic rules would not apply at LF. With the broader significance of the question in mind, Section 2 identifies the sense of the term "Logical Form" at issue. A brief introduction to deforestation follows in Section 3. Sections 4.2 and 5.2 then define two parsers whose program text closely mirrors the grammar. Sections 4.3 and 5.3 discuss how deforestation applies to both programs. Section 6 makes some concluding remarks, speculating on relationships between this deforestation result and recent developments in linguistic theory.
2 What Is Meant by LF

2.1 LF Is a Level of Representation
Outside of the transformational generative grammar community, the words "Logical Form" refer to a symbolization of a sentence's meaning in some agreed-upon logic. However, within this community, LF is a technical term that refers to a specific level of representation, an obligatory subpart of well-formed structural descriptions. Hornstein writes,

  LF is the level of representation at which all grammatical structure relevant to semantic interpretation is provided. [15, 3]

Hornstein describes LF as being introduced in response to a realization that "surface structure cannot adequately bear the interpretive load expected of it" [15, 2]. Thus, in the transition from Chomsky's Revised Extended Standard Theory to theories based on Government and Binding, an interpretive semantics is retained, while the precise level of representation being most directly interpreted is altered.
2.2 LF Is an Interface Level
In the course of applying LF to problems of quantifier scope, May [23] makes clear that LF is a kind of interface level by writing,

  We understand Logical Form to be the interface between a highly restricted theory of linguistic form and a more general theory of natural language semantics and pragmatics. [23, 2]

The heavy lifting will be done by this more general theory. May emphasizes the mismatch between the limited capabilities of the rules affecting LF, compared to those that will be required to turn LFs into bona fide semantic representations. The latter occupies a different level, LF′, pronounced "LF prime".

  Representations at LF′ are derived by rules applying to the output of sentence grammar . . . Since the rules mapping Logical Form to LF′ are not rules of core grammar, they are not constrained by the restrictions limiting the expressive power of rules of core grammar. [23, 27]

Representations at LF′ are subject to "the recursive clauses of a Tarskian truth-condition theory" [23, 26]. This is to be contrasted with representations at LF, which are not. These representations are phrase markers, just like the immediate constituency trees at surface structure. This sense of LF has been repeatedly invoked in Chomsky's recent work [7, fn 20][8, fn 11].

With this LF′ terminology clarified, subsequent sections apply functional programming techniques to define a surface structure parser and extend it with a transformational rule, Move-α, to build LF representations. We investigate two cases of this rule, Quantifier Raising in Section 4 and Wh-movement in Section 5. The relationship between these levels is depicted in Figure 1.
Fig. 1. The interlevel relationships from a sentence to LF′s: a parser maps sentences to surface structures; move-α (QR, Wh-movement, . . . ) derives Logical Forms; and mapping rules (convert) yield LF′s. [Diagram omitted.]
3 Deforestation
Deforestation is a source-to-source program transformation used to derive programs that are more efficient in the sense that they allocate fewer data structures that only exist ephemerally during a program's execution. Wadler provides a small deforestation example [25], which we repeat below as Listing 1 in the interest of self-contained presentation. In this example, the main function sumSquares calculates the value Σ_{x=1}^{n} x².

let rec upto m n = match (m > n) with
    true -> []
  | false -> m :: (upto (m+1) n)

let square x = x * x

let rec map f xs = match xs with
    [] -> []
  | y :: ys -> (f y) :: (map f ys)

let sum xs =
  let rec sumAux a = function
      [] -> a
    | x :: rest -> sumAux (a+x) rest
  in sumAux 0 xs

let sumSquares x = sum (map square (upto 1 x))
Listing 1. Classic deforestation example
The manner in which sumSquares actually calculates the sum of squares is typical of functional programs. Intuitively, it seems necessary that the upto function must first create the list [1, 2, . . . , n]. Then the map function maps square over this list, yielding a new list: [1, 4, . . . , n²]. These lists are both kinds of intermediate data. They are artifacts of the manner in which the function is calculated. Indeed, the sum function transmutes its input list into a single integer, (2n³ + 3n² + n)/6. This makes clear that the intermediate lists have a limited lifetime during the execution of the function. The idea of deforestation is to translate programs like the one in Listing 1 into programs like the one in Listing 2.

let sumSquaresDeforested x =
  let rec h a m n = if m > n then a else h (a + (square m)) (m+1) n
  in h 0 1 x
Listing 2. Deforested program does not allocate intermediate lists
The program in Listing 2 recurs on an increasing integer m and scrupulously avoids calling the Caml list constructor (::). The name "deforestation" is evocative of the fact that intermediate trees can be eliminated in the same way as lists. Over the years, many deforestation results have been obtained in the programming languages community. The typical paper presents a system of transformation rules that converts classes of program fragments in an input language into related fragments in the output language. An example rule from Wadler's transformation scheme 𝒯 is given below.

  𝒯⟦f t1 . . . tk⟧ = 𝒯⟦t[t1/v1, . . . , tk/vk]⟧   where f is defined by f v1 . . . vk = t   (4)

The deforestation rule above essentially says that if the definition of a function f is a term t in the input language, then the deforestation of f applied to some argument terms t1 through tk is simply the replacement of the function by its definition, subject to a substitution of the formal parameters v1, . . . , vk by the actual parameters t1, . . . , tk. This kind of program transformation is also known as the unfold transformation [3]. Wadler provides six other rules that handle other features of his input language; the original paper [25] should be consulted for full details.

In the present application, neither automatic application nor heightened efficiency is the goal. Rather, at issue is the linguistic question whether or not a parser that uses LF strictly needs to do so. Sections 4.3 and 5.3 argue for negative answers to this question.
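As a concrete instance of rule (4), unfolding replaces a call site by the callee's body with the actual parameters substituted for the formal ones. A minimal illustration (ours), using the square function from Listing 1:

(* Before unfolding: *)
let before y = square (y + 1)

(* After unfolding the definition square x = x * x,
   with (y + 1) substituted for x: *)
let after y = (y + 1) * (y + 1)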
4 Quantifier Raising
Quantifier Raising (QR) is an adjunction transformation¹ proposed by May [23] as part of a syntactic account of quantifier scope ambiguities. May discusses the following sentence (5) with two quantifiers, every and some. It has only one reading; the LF representation is given in (6).

(5) Every body in some Italian city met John.
    'There is an Italian city, such that all the people in it met John.'

(6) ∃x (∀y (Italian-city(x) & (body-in(y, x) → met-John(y))))

4.1 Proper Binding
When May's QR rule freely applies to a sentence with multiple quantifiers, it derives multiple LFs that intuitively correspond to different quantifier scopes. Sometimes this derivational ambiguity correctly reflects scope ambiguity. However, in semantically unambiguous cases like (5), the QR rule overgenerates. A representational constraint, the Proper Binding Condition (PBC) [12], helps to rule out certain bad cases that would otherwise be generated by unfettered application of QR.

Principle 1 (The Proper Binding Condition on QR). Every raised quantified phrase must c-command² its trace.

Perhaps the most direct rendering of May's idea would be a program in which LFs are repeatedly generated but then immediately filtered for Proper Binding. One of the LFs for example (5), shown in Figure 2(b) below, would be ruled out by the PBC because DP1 does not c-command its trace t1.

¹ The word "transformation" here is intended solely to mean a function from trees to trees. Adjunction is a particular kind of transformation such that, in the result, there is a branch whose parent label is the same as the label of the sister of the re-arranged subtree.
Fig. 2. The logical forms of the Quantifier Raising example (5): (a) LF1, in which some Italian city (DP1) scopes over every body in t1 (DP2); (b) *LF2, in which DP1 fails to c-command its trace t1. [Tree diagrams omitted.]
This kind of "Generate-and-Test" application of Proper Binding is obviously wasteful. There is no need to actually create LF trees that are doomed to failure. Consider the origins of possible Proper Binding violations. Because QR stacks up subtrees at the front of the sentence, the main threat is posed by inopportune choice of subtree to QR. A precedence-ordered list of quantified nodes is destined to fail the PBC just in case that list orders a super-constituent before one of its sub-constituents. This observation paves the way for a change of representation from concrete hierarchical trees to lists of tree-addresses³. Principle 2 below reformulates the PBC in terms of this alternative representation.

² C-command is a relationship defined on tree nodes in a syntactic structure. A node α c-commands another node β if the first node above α contains β.
³ The "Gorn address" is used here. It is a method of addressing an interior node within a tree [14]. Here we illustrate the Gorn address as an integer list. The Gorn address of the tree root is an empty list [], with the first child [0] and the second child [1]. The j-th child of the node with the Gorn address [i] has the address [i, j−1].
Principle 2 (Linear Proper Binding Condition on quantified phrases). Let L = n1, . . . , nm be a list of tree-node addresses of quantified phrases. L will violate the Proper Binding Condition if any address ni is a prefix of nj (i < j).

The tree-addresses of every body in some Italian city and some Italian city in the surface structure tree are [0] and [0, 1, 0, 1, 1] respectively. The first address is a prefix of the latter. A list of quantified nodes [[0], [0, 1, 0, 1, 1]], indicating that every body outscopes some Italian city, will violate Principle 2. Recognizing this allows a parser to avoid constructing uninterpretable LFs like the one in Figure 2(b). Note that the effect of this linear PBC is exactly the same as that of its hierarchical cousin, Principle 1.

The following sections take up alternative implementations of the function from sentences to LF′ representations. One of these implementations can be derived from the other. The first implementation obtains LF′s after building LFs. The second one does not allocate any intermediate LF but instead calculates LF′s directly based on surface structure analyses.
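The prefix test of Principle 2 is directly executable. A small OCaml sketch (ours; violates_pbc is a hypothetical helper name, not part of the paper's implementation):

(* is_prefix u v: address u is a prefix of address v. *)
let rec is_prefix u v = match u, v with
  | [], _ -> true
  | x :: us, y :: vs -> x = y && is_prefix us vs
  | _ :: _, [] -> false

(* L violates the linear PBC if some earlier address is a prefix of
   a later one. *)
let rec violates_pbc = function
  | [] -> false
  | n :: rest ->
      List.exists (fun m -> is_prefix n m) rest || violates_pbc rest

(* The example from the text: *)
let _ = violates_pbc [[0]; [0; 1; 0; 1; 1]]   (* = true *)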
4.2 The Quantifier Raising Implementation
The Appendix presents a small Context Free Grammar used in a standard combinator parser⁴ that analyzes example (5). The LF-using implementation takes each PBC-respecting quantifier ordering, applies QR, then converts the resulting LF into an LF′ formula. Illicit LFs, such as the one in Figure 2(b), are never built, due to the linear PBC. Raised quantified phrases are translated into pieces of formulas using two key rules:

  some Italian city . . . ⇒ ∃x (Italian-city(x) & . . . )
  every body in x . . .  ⇒ ∀y (body-in(y, x) → . . . )

All these procedures can be combined in an LF parser as in Listing 3.

let withLF ss =
  Seq.map convert (Seq.map qr (candidate_quantifier_orderings ss))
Listing 3. QR analyzer with plug-in that uses LF
The higher-order function Seq.map applies its first argument to every answer in a stream of alternative answers. Our program faithfully calculates interactions between the constraints that May's competence theory specifies. Its output corresponds to the LF representation in (6).

# Seq.iter formula (analyze withLF "every body in some italian city met john");;
exists v31[forall v32[(italian-city(v31) & (body-in(v32,v31) -> met-john(v32)))]]
- : unit = ()
⁴ A combinator is a function that takes other functions as arguments, yielding more interesting and complicated functions just through function application rather than any kind of variable-binding [10][11]. Parser combinators are higher-order functions that put together more complex parsers from simpler ones. The parsing method is called combinatory parsing [2][13].
4.3 Deforesting LF in Quantifier Raising
Executing analyze withLF allocates an intermediate list of quantifier-raised logical forms. These phrase markers will be used as the basis for LF′ representations by convert. The existence of an equivalent program that does not construct them would demonstrate the inessentiality of LF.

let fmay (ss, places) = convert (qr (ss, places))   (* deforest *)
let withoutLF ss = Seq.map fmay (candidate_quantifier_orderings ss)
Listing 4. LF-free quantifier analyzer
The key idea is to replace the composition of the two functions convert and qr with one deforested function fmay which does the same thing without constructing any intermediate LFs. Algorithm 1 provides pseudocode for this function, which can be obtained via a sequence of deforestation steps⁵, as shown in Table 1. It refers to rules defined by Wadler [25], including the unfold program transformation symbolized in (4). The implementation takes advantage of the fact that the list of quantified nodes, QDPplaces, can be filtered for compliance with the linear PBC ahead of time.

Table 1. Steps in the deforestation of QR example (5)

Wadler's Number | Action in the program
3               | unfold the function application convert with its definition
6               | unfold the function application qr with its definition
7               | broadcast the inner case statement of a QDP outwards
5               | simplify the matching on a constructor, using the fact that NP is the second daughter of DP
5               | get rid of the matching on a Some constructor
Not relevant    | knot-tie in the outermost match, realizing we started by translating (convert (qr (ss,places))) == fmay (ss,places)
Since the PBC applies equally well to lists of "places", it is no longer necessary to actually build tree structures. The deforested procedure recurs down the list of QDPplaces (Line 5), applies QR rules according to the quantifier type (Line 10 or 16), and turns phrases into pieces of formulas using predicateify (Line 7 or 13). No phrase markers are moved and no LF representations are built.
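For concreteness, here is what a deforested fmay can look like in OCaml. This is our own sketch under simplified, hypothetical types: each QDP place carries just its determiner and the predicate contributed by its NP, the surface structure is already reduced to a predicate string, and the substitution of the QDP by an indexed variable is elided.

type det = Every | Some_
type qdp = { det : det; np_pred : string }

let counter = ref 0
let fresh_var () = incr counter; Printf.sprintf "v%d" !counter

let rec fmay (ss : string) (places : qdp list) : string =
  match places with
  | [] -> ss   (* corresponds to predicateify(ss) in Algorithm 1 *)
  | { det; np_pred } :: rest ->
      let v = fresh_var () in
      (match det with
       | Every ->
           Printf.sprintf "forall %s[%s(%s) -> %s]"
             v np_pred v (fmay ss rest)
       | Some_ ->
           Printf.sprintf "exists %s[%s(%s) & %s]"
             v np_pred v (fmay ss rest))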
⁵ The full derivation of the deforestation is omitted in the paper due to space limits. The rule numbers used in Table 1 are consistent with Wadler [25, 238, Figure 4].

Algorithm 1. Pseudocode for the deforested fmay
 1: function fmay(ss, QDPplaces)
 2:   if no more QDPplaces then
 3:     predicateify(ss)
 4:   else
 5:     examine the next QDPplace qdp
 6:     if qdp = [DP [D every] NP] then
 7:       let restrictor = predicateify(NP)
 8:       let v be a fresh variable name
 9:       let body = ss with qdp replaced by an indexed variable
10:       ∀v restrictor(v) → fmay(body, remaining QDPplaces)
11:     end if
12:     if qdp = [DP [D some] NP] then
13:       let restrictor = predicateify(NP)
14:       let v be a fresh variable name
15:       let body = ss with qdp replaced by an indexed variable
16:       ∃v restrictor(v) ∧ fmay(body, remaining QDPplaces)
17:     end if
18:   end if
19: end function

5 Wh-Movement

Apart from quantifier scope, perhaps the most seminal application of LF in syntax is motivated by wh-questions. The same deforestation techniques are equally applicable to this case. They similarly demonstrate that an LF-free semantic interpreter may be obtained by deforesting a straightforward implementation that does use LF.

To understand this implementation, a bit of background on wh-questions is in order. The standard treatment of wh-questions reflects languages like English, where a wh-word overtly moves to a clause-initial position when forming an interrogative. However, it is well-known that in Chinese the same sorts of interrogatives leave their wh-elements in clause-internal positions. This has come to be known as "wh-in-situ" [18]. Although the wh-word remains in its surface position, some syntacticians have argued that movement to a clause-initial landing site does occur, just as in English, but that the movement is not visible on the surface because it occurs at LF [16][17]. This analysis thus makes crucial use of LF as a level of analysis.
5.1 The ECP and the Argument/Adjunct Asymmetry
At the heart of the argument for LF is the idea that this "covert" movement in Chinese interrogatives creates ambiguity. Example (7) below is a case in point: if Move-α were allowed to apply freely, then both readings, (7a) and (7b), should be acceptable interpretations of the sentence. In actuality, only (7a) is acceptable.

(7) ni xiangzhidao wo weishenme mai shenme?
    you wonder I why buy what
    a. 'What is the x such that you wonder why I bought x?'
    b. *'What is the reason x such that you wonder what I bought for x?'
Fig. 3. The surface structure of the Chinese Wh-movement example (7)
The verb xiangzhidao 'wonder' in (7) forms a subordinate question. The presence of two alternative landing sites, shown in Figure 3, allows Wh-movement to derive two different LF possibilities. These two LFs, illustrated in Figure 4, correspond to the alternative readings. However, only reading (7a) is acknowledged by Chinese speakers. Sentence (7) does not question the reason for buying, as in (7b). Rather, it is a direct question about the object of mai 'buy'. Huang argues that the Empty Category Principle (ECP) [6] correctly excludes the unattested interpretation, as shown in Figure 4(b).

Principle 3 (The Empty Category Principle). A non-pronominal empty category (i.e., a trace) is properly governed by either a lexical head or its antecedent.⁶

Figure 4 illustrates the ECP's filtering action on example (7). In this figure, dotted arrows indicate government. In the ECP-respecting LF shown in Figure 4(a), the trace t1 of the moved wh-word shenme is an empty category lexically governed by the verb mai 'buy'. Trace t2 is antecedent governed. In the ECP-failing LF of Figure 4(b), the trace of shenme, t1, is also lexically governed. However, weishenme's trace, t2, is left ungoverned. As an adjunct, it is not lexically governed. Nor is it antecedent governed: the nearest binder weishenme lies beyond a clause boundary.
⁶ The antecedent can be a moved category (i.e., a wh-phrase). We follow Huang et al. [18] in using the classical, "disjunctive" version of the ECP.
Fig. 4. The logical forms of the Chinese Wh-movement example (7): (a) LF1; (b) *LF2
Like the Proper Binding Condition, the ECP can also be reformulated as a constraint on lists, in this case lists of wh-elements. The key requirement is that only wh-arguments should outscope other wh-phrases.

Principle 4 (Linear Empty Category Principle on wh-questions). Let L = n1, . . . , nm be a scope-ordered list of wh-elements where n1 has the widest scope and nm has the narrowest. L violates the Empty Category Principle if any wh-adjunct is at a position ni with i < m.

The correctness of Principle 4 derives from the fact that moved wh-elements always end up at the edge of a clause. If a wh-adjunct scopes over another wh-phrase, it must have crossed a clause boundary in violation of the ECP. We postulate LF representations as in (8), in which wh-phrases denote sets of propositions [21]. The wh-argument shenme scopes over the wh-adjunct weishenme. The answer to weishenme is a set of propositions which CAUSEs the action wo-mai to happen as part of the state s.

(8) λP ∃x (P = ni-xiangzhidao (λQ ∃s (Q = CAUSE (s, wo-mai (x)))))
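Principle 4 amounts to a single pass over the scope-ordered list. A small OCaml sketch follows (our own; the wh-element type is a hypothetical stand-in):

type wh =
  | Arg of string   (* nominal wh, e.g. shenme *)
  | Adj of string   (* adverbial wh, e.g. weishenme *)

(* A scope-ordered list (widest scope first) satisfies the linear ECP
   iff no wh-adjunct outscopes another wh-phrase, i.e. an adjunct may
   occupy only the last (narrowest-scope) position. *)
let rec linear_ecp_ok = function
  | [] | [ _ ] -> true               (* empty, or only position m left *)
  | Adj _ :: _ -> false              (* adjunct with material in its scope *)
  | Arg _ :: rest -> linear_ecp_ok rest

For example (7), [Arg "shenme"; Adj "weishenme"] passes while [Adj "weishenme"; Arg "shenme"] fails, which is exactly the filtering that excludes reading (7b).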
5.2 The Wh-Movement Implementation
The implementation of Wh-movement is mostly analogous to QR in Section 4.2. We define a combinator parser for the Chinese fragment in the Appendix. A Wh-movement function fills a landing site with a wh-phrase and creates a trace. A filter applies the ECP to the derived LFs. This Chinese parser differs from the QR example in its ability to handle subordinate clauses. It obtains LF representations for the main clause and the subordinate clause separately and then encapsulates them together, as shown in Figure 5.

Fig. 5. cpconvert translates the LFs of example (7) into logical formulas: (a) the main clause yields λP∃x[P = ni-xiangzhidao (Subordinate Clause)]; (b) the subordinate clause yields λQ∃s[Q = CAUSE(s, wo-mai (x))]

The cpconvert function applies two mapping rules:

shenme ⇒ λP ∃x (P = . . . x . . . )
weishenme ⇒ λQ ∃s (Q = CAUSE (s, . . . ))

This function plugs in to yield an LF-using analyzer, as shown in Listing 5.

let withCHSLF ss = Seq.map cpconvert (Seq.map mvwh (candidate whps orderings ss))
Listing 5. wh-question analyzer with plug-in that uses LF
This analyzer correctly finds the LF shown in (8).

# Seq.iter formula (analyzeCHS withCHSLF "ni xiangzhidao wo weishenme mai shenme");;
‘lambda h[exists v6[h=[ni-xiangzhidao(lambda i[exists s[i=[CAUSE(s,wo-mai(v6))]]])]]]’
- : unit = ()
5.3 Deforesting LF in Wh-Movement
Executing the program in Section 5.2 causes intermediate wh-moved LF trees to be created. This is similar to the QR program discussed in Section 4.2, which allocates quantifier-raised structures. Using the same deforestation techniques, an equivalent program that does not allocate these intermediate data structures can be obtained.
let fhuang (ss, places) = cpconvert (mvwh (ss, places))
                          (* deforest *)
let withoutCHSLF ss = Seq.map fhuang (candidate whp orderings ss)
Listing 6. LF-free analyzer for Chinese wh-questions
The Wh-movement function mvwh takes a pair consisting of a surface structure and an ECP-compliant list of wh-element addresses. After deforestation, the resultant function fhuang in Algorithm 2 is similar to fmay but also has the ability to handle complex embedded sentences. Apart from deciding the type of wh-phrase (Lines 6 and 20), an additional condition detects whether the current structure is complex (Lines 10 and 24). If it is, the program obtains the LF representation for the first clause and then recursively works on the subordinate clause and the remaining wh-phrases (Lines 17 and 31). Table 2 lists the deforestation steps used in the derivation of this deforested fhuang.

Algorithm 2. Pseudocode for the deforested fhuang

1:  function fhuang(ss, WHPplaces)
2:    if no more WHPplaces then
3:      predicateify(ss)
4:    else
5:      examine the next WHPplace whp
6:      if whp = [DP shenme] (nominal wh) then
7:        let v be a fresh variable name
8:        let P be a fresh variable name
9:        let body = ss with whp replaced by an indexed variable
10:       if body has no subordinate clause then
11:         λP ∃v . P = fhuang(body, remaining WHPplaces)
12:       else
13:         let Q be a fresh variable name
14:         let cp1 = body with the subordinate clause replaced by Q
15:         let cp2 = the subordinate clause of body
16:         let WHPplaces = the wh-phrase places of cp2
17:         λP ∃v . P = (λQ. predicateify(cp1)) (fhuang(cp2, WHPplaces))
18:       end if
19:     end if
20:     if whp = [AdvP weishenme] (adverbial wh) then
21:       let s be a variable name for the state/event
22:       let P be a fresh variable name
23:       let body = ss with whp replaced by an indexed variable
24:       if body has no subordinate clause then
25:         λP ∃s . P = CAUSE(s, fhuang(body, remaining WHPplaces))
26:       else
27:         let Q be a fresh variable name
28:         let cp1 = body with the subordinate clause replaced by Q
29:         let cp2 = the subordinate clause of body
30:         let WHPplaces = the wh-phrase places of cp2
31:         λP ∃s . P = (λQ. CAUSE(s, predicateify(cp1))) (fhuang(cp2, WHPplaces))
32:       end if
33:     end if
34:   end if
35: end function
Table 2. Steps in the deforestation of Wh-movement example (7)

Wadler's Number   Action in the program
3                 unfold the function application cpconvert with its definition
6                 unfold the function application mvwh with its definition
7                 broadcast the inner case statement of a WHP outwards
5                 simplify the matching on a constructor; set the subordinate clause as the current structure; update WHP addresses accordingly
5                 get rid of the matching on a Some constructor
not relevant      knot-tie in the outermost match; we started by translating cpconvert (mvwh (ss, places)) == fhuang (ss, places)
6 Conclusion
Parsing systems initially defined with the help of LF can be reformulated so as to avoid constructing any LFs; so it turns out, at least, in the well-known cases of English quantifier scope and Chinese wh-questions. These findings suggest that LF may not in fact be an essential level of representation in a parser, because it can always be deforested away. They are consistent with the alternative organization depicted in Figure 6.
Fig. 6. The interlevel relationships from a sentence to LF′s (sentences are mapped by parsers to surface structures, by move-α to Logical Forms and by the mapping rules to LF′s; the deforested f takes surface structures to LF′s directly)
Of course, it remains to be seen whether a facet of the LF idea will be uncovered that is in principle un-deforestable. The positive results suggest that such an outcome is unlikely, because the central elements of May's and Huang's proposals are so naturally captured by the programs in Listings 3 and 5. In the end, there is no formal criterion for having adequately realized the spirit of a linguistic analysis in an automaton. The most we can ask is that the grammar somehow be recognizable within the parser (Hypothesis 1). Nevertheless, the result harmonizes with a variety of other research. For instance, our method uses lists of quantified phrases or wh-phrases in a way that is reminiscent of Cooper Storage [9]. Cooper storage stores the denotations of quantified phrases or wh-phrases and retrieves them in a certain order. The linear versions of the PBC and ECP could be construed as constraints on the order of retrieving them from the store.
In addition, our method is consistent with the work of Mark Johnson, who uses the fold/unfold transformation to, in a sense, deforest the S-structure and D-structure levels of representation out of a deductive parser [20]. It also harmonizes with the research program that seeks directly compositional competence grammars [19][24]. The present paper argues only that LF is unnecessary in the processor, leaving open the possibility that LF might be advantageous in the grammar for empirical or conceptual reasons. Proponents of directly compositional grammars have argued that the available evidence fails to motivate LF even in competence. If they are right, then a fortiori there is no need for LF in performance models.
Acknowledgements. The authors would like to thank Hongyuan Dong, Scott Grimm, Tim Hunter, Graham Katz, Greg Kobele, Alan Munn, Michael Putnam, and Chung-chieh Shan for helpful feedback on this material. They are not to be held responsible for any errors, misstatements or other shortcomings that persist in the final version.
References

1. Berwick, R.C., Weinberg, A.S.: The Grammatical Basis of Linguistic Performance. MIT Press, Cambridge (1984)
2. Burge, W.H.: Recursive Programming Techniques. Addison-Wesley, Reading (1975)
3. Burstall, R., Darlington, J.: A transformation system for developing recursive programs. Journal of the Association for Computing Machinery 24(1), 44–67 (1977)
4. Chomsky, N.: Syntactic Structures. Mouton de Gruyter, Berlin (1957)
5. Chomsky, N.: Aspects of the Theory of Syntax. MIT Press, Cambridge (1965)
6. Chomsky, N.: Lectures on Government and Binding. Foris, Dordrecht (1981)
7. Chomsky, N.: Approaching UG from below. In: Sauerland, U., Gärtner, H.M. (eds.) Interfaces + Recursion = Language?: Chomsky's Minimalism and the View from Syntax-Semantics, pp. 1–29. Mouton de Gruyter, Berlin (2007)
8. Chomsky, N.: On phases. In: Freidin, R., Otero, C., Zubizarreta, M.L. (eds.) Foundational Issues in Linguistic Theory: Essays in Honor of Jean-Roger Vergnaud, pp. 133–166. MIT Press, Cambridge (2008)
9. Cooper, R.H.: Montague's Semantic Theory and Transformational Syntax. Ph.D. thesis, University of Massachusetts, Amherst (1975)
10. Curry, H.B., Feys, R.: Combinatory Logic, vol. 1. North-Holland, Amsterdam (1958)
11. Curry, H.B., Hindley, J.R., Seldin, J.P.: Combinatory Logic, vol. 2. North-Holland, Amsterdam (1972)
12. Fiengo, R.: Semantic Conditions on Surface Structure. Ph.D. thesis, MIT, Cambridge (1974)
13. Frost, R., Launchbury, J.: Constructing natural language interpreters in a lazy functional language. The Computer Journal 32(2), 108–121 (1989)
14. Gorn, S.: Explicit definitions and linguistic dominoes. In: Hart, J., Takasu, S. (eds.) Systems and Computer Science. University of Toronto Press (1967)
15. Hornstein, N.: Logical Form: From GB to Minimalism. Blackwell, Oxford (1995)
16. Huang, C.T.J.: Logical relations in Chinese and the theory of grammar. Ph.D. thesis, MIT, Cambridge (1982); edited version published by Garland, New York (1998)
17. Huang, C.T.J.: Move wh in a language without wh-movement. The Linguistic Review 1, 369–416 (1982)
18. Huang, C.T.J., Li, Y.H.A., Li, Y.: The Syntax of Chinese. Cambridge University Press, Cambridge (2009)
19. Jacobson, P.: Paycheck pronouns, Bach-Peters sentences, and variable-free semantics. Natural Language Semantics 8(2), 77–155 (2000)
20. Johnson, M.: Parsing as deduction: the use of knowledge of language. Journal of Psycholinguistic Research 18(1), 105–128 (1989)
21. Karttunen, L.: Syntax and semantics of questions. Linguistics and Philosophy 1, 3–44 (1977)
22. Larson, R., Segal, G.: Knowledge of Meaning. MIT Press, Cambridge (1995)
23. May, R.: The grammar of quantification. Ph.D. thesis, MIT, Cambridge (1977)
24. Steedman, M.: The Syntactic Process. MIT Press, Cambridge (2000)
25. Wadler, P.: Deforestation: transforming programs to eliminate trees. Theoretical Computer Science 73, 231–248 (1990)
Appendix: Grammar of Two Examples

Grammar of QR example (5)

S → DP VP          D → every
DP → D NP          D → some
DP → PN            PN → John
NP → Adj N         N → city
NP → Adj NP        N → body
N → N PP           Adj → Italian
VP → V DP          P → in
VP → V DP PP       V → met
PP → P DP

Grammar of Wh-movement example (7)

CP → Spec C′       Pronoun → ni
C′ → C TP          Pronoun → wo
TP → DP T′         NOMwh → shenme
TP → Spec T′       ADVwh → weishenme
T′ → T VP          V → xiangzhidao
DP → Pronoun       V → mai
DP → NOMwh         NP → NOMwh
AdvP → ADVwh
V′ → V CP
V′ → AdvP V′
V′ → V DP
VP → V′
On the Probability Distribution of Typological Frequencies

Michael Cysouw
Max Planck Institute for Evolutionary Anthropology, Leipzig
[email protected]
Abstract. Some language types are more frequent among the world's languages than others, and the field of linguistic typology attempts to elucidate the reasons for such differences in type frequency. However, there is no consensus in that field about the stochastic processes that shape these frequencies, and there is thus likewise no agreement about the expected probability distribution of typological frequencies. This paper explains the problem and presents a first attempt to build a theory of typological probability based purely on processes of language change.
1 Probability Distributions in Typological Research
A central objective of the typological study of linguistic diversity is to explain why certain kinds of linguistic structures are much more frequently attested among the world's languages than others. Unfortunately, such interpretations of empirically attested frequencies often rely on purely non-mathematical intuitions to judge whether observed frequencies are in any sense noteworthy or not. In the typological literature, this frequently leads to a tacit assumption that typological frequencies are evenly distributed, i.e. that a priori all language types should be equally frequent, and any observed skewing of frequencies is thus in need of an explanation. Such an argumentation can be found, for example, in the widely read typological textbook by Comrie [2]:

"In a representative sample of languages, if no universal were involved, i.e. if the distribution of types along some parameter were purely random, then we would expect each type to have roughly an equal number of representatives. To the extent that the actual distribution departs from this random distribution, the linguist is obliged to state and, if possible, account for this discrepancy" (p. 20)

Various more sophisticated approaches to the interpretation of empirical frequencies often assume an underlyingly normal (or, more precisely, multinomial) distribution, as for example indicated by the regular use of χ2 statistics or Fisher's exact test (e.g. in Cysouw [3]). Janssen et al. [6] and Maslova [12] explicitly discuss the problem of tacit assumptions about the underlying probability distributions as made in linguistic typology. As a practical solution to circumvent this problem for the assessment of statistical significance, Janssen et al. [6] propose to use randomization-based significance tests. Such tests do not make
any assumptions about the underlying probability distribution. This of course still leaves open the question of the nature of these distributions. There is a small literature that explicitly deals with the question of the underlying probability distribution of typological variables, but there is no agreement whatsoever. The first paper to make any such proposal was Lehfeldt [9], who proposed a gamma distribution for the size of phoneme inventories. In a reaction to this claim, Justeson and Stephens [7] proposed a log-normal distribution for the same data. The size of the phoneme inventory is a clear case of linguistic complexity. More generally, Nichols [14] (and more recently Nichols et al. [15]) proposed a normal distribution for linguistic complexity. Similarly, Maddieson [10] hints at a normal distribution for the size of consonant inventories. Finally, Maslova [12] argues that the frequencies of types in the World Atlas of Language Structures (henceforth WALS, [5]) follow a Pareto distribution. There are thus at least proposals for gamma, log-normal, normal and Pareto distributions of typological variables. Most of these proposals arrive at their distribution on the basis of the inspection of empirical values. For example, the only argument Lehfeldt [9] offered for the gamma distribution is a (rather speculative) interpretation of some moment-like characteristics of the empirical frequencies. Nichols et al. [15] only observe a bell-shaped distribution, and propose a normal distribution on that meagre basis. Though empirical distributions might suggest a particular underlying probability distribution, they can never be used to argue for a particular distribution. Both the gamma distribution and the log-normal distribution can be fitted easily to the empirically observed phoneme-size distribution. The proper argument for a particular probability distribution is the explication of the stochastic process that causes the distribution to arise. This approach was used by Justeson and Stephens [7] in their plea for a log-normal distribution of phoneme inventory size. Phoneme inventories, they argue, are based on phonological feature inventories. Given n binary features, it is possible to compose 2ⁿ phonemes. Now, assuming that feature inventories are normally distributed (a claim they do not further elucidate), phoneme inventories will thus be log-normally distributed (i.e. the logarithm of the phoneme inventory size will be normally distributed). Irrespective of the correctness of their claim, this argument is a good example of an attempt to find a stochastic reason for a particular distribution. The actual proposal of Justeson and Stephens is not convincing, because they do not substantiate the normal distribution of feature inventories. Still, their approach is an important step in the right direction.
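In symbols, the step from normally distributed feature inventories to log-normally distributed phoneme inventories is immediate; the following is our reconstruction of the argument, not notation from the original papers:

\[
F \sim \mathcal{N}(\mu, \sigma^2), \qquad S = 2^{F}
\quad\Longrightarrow\quad
\log_2 S = F \sim \mathcal{N}(\mu, \sigma^2),
\]

so the logarithm of the inventory size S is normal, which is the definition of S being log-normally distributed.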
2 The Stochastic Process of Language Change
To investigate the nature of any underlying probability distribution it is necessary to consider the stochastic process that causes the phenomenon at hand. For typological frequencies there are at least two (non-exclusive) kinds of processes that can be considered. The frequencies of linguistic types in the world's languages are partly shaped by cognitive processes, and partly by diachronic
processes. In this paper I will restrict myself to further investigating the latter idea, namely that the process of language change determines the probability distribution of typological frequencies. The synchronic frequencies of a typological variable (for example, the word order of verb and object, cf. [4]) can be seen as the result of the diachronic processes of language change (cf. Plank and Schellinger [16]). More precisely, the current number of languages of a particular linguistic type can be analyzed as the result of a Markov process in which languages change from one type to another sequentially through time (cf. Maslova [11]). For example, a verb-object language can change into an object-verb language, and vice versa, and this process of change from one type to the other determines the probability distribution of the linguistic type. As a first (strongly simplified) approach to the stochastic nature of this process of type-change, I will in this paper consider type-change as a simple birth-death process: a verb-object language is "born" when an object-verb language changes to a verb-object language, and a verb-object language "dies" when this language changes to an object-verb language.¹ Also as a first approximation, I will assume that such type-changes take place according to a Poisson process. A Poisson process is a stochastic process in which events occur continuously and independently of one another, which seems to be a suitable assumption for language change. Such a basic birth-death model with events happening according to a Poisson distribution is known in queueing theory as an M/M/1 process (using the notation from Kendall [8], in which M stands for a "Markovian" process). Normally, queueing models are used to describe the behavior of a queue in a shop. Given a (large) population of potential buyers, some will once in a while come to a cash register to pay for some goods (i.e. a "birth" in the queue) and then, possibly after some waiting time, pay and leave the queue again (i.e. a "death" in the queue). The queueing model also presents a suitable metaphor to illuminate the dynamics of typological variables. Consider all the world's languages throughout the history of homo loquens as the (large) population under investigation. Through time, languages change from one type to another type, and vice versa. Metaphorically, one can then interpret the number of languages of a particular type at a particular point in time as the length of a queue. A central parameter of a queueing model is the traffic rate t, which is defined as the ratio of the arrival rate λ to the departure rate μ: t = λ/μ. The arrival and the departure rate designate the average number of arrivals and departures in the queue per time unit. However, time is factored out in the traffic
¹ Altmann [1] also uses a birth-death model to investigate distributions in a cross-linguistic context. However, he investigates another kind of phenomenon, namely the probability distribution of the number of different types in dialectological maps. This aspect is closely related to the number of types per map in a typological atlas like WALS, though there is more arbitrariness in the number of types in a typological map compared to the number of types in a dialectological map. Altmann convincingly argues that the number of types on dialectological maps should be negatively binomially distributed.
rate t, so this rate is just a general indication of the dynamics of the queue. In a stable queue, the traffic rate must be between 0 and 1 (a traffic rate larger than one would result in unbounded growth of the queue). Now, in an M/M/1 model with traffic rate t, the queue length q is distributed according to (1), which is a slight variation on a regular (negative) exponential distribution (cf. Mitzenmacher and Upfal [13], p. 212). Following this model, typological frequencies should accordingly be roughly (negatively) exponentially distributed.

P(q = n) = (1 − t) · tⁿ     (1)
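Equation (1) is the standard stationary distribution of the M/M/1 queue. For completeness, a sketch of the derivation via detailed balance, with arrival rate λ, departure rate μ and t = λ/μ < 1 (our addition; cf. Mitzenmacher and Upfal [13]):

\[
\lambda\, P(q = n) = \mu\, P(q = n + 1)
\quad\Longrightarrow\quad
P(q = n) = t^{n}\, P(q = 0),
\]
\[
\sum_{n \geq 0} t^{n}\, P(q = 0) = \frac{P(q = 0)}{1 - t} = 1
\quad\Longrightarrow\quad
P(q = 0) = 1 - t,
\]

which together give P(q = n) = (1 − t) · tⁿ.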
3 Meta-typological Fitting
To get an impression of how these assumptions fare empirically, I will present a small meta-typological experiment (for an introduction to meta-typology, cf. Maslova [12]). As described earlier, such an experiment is no argument for a particular distribution; it will only show that it is possible to model empirical frequencies by using a negative exponential distribution. Whether this is indeed the right distribution can never be proved by a well-fitted curve. The theoretical derivation of the distribution has to be convincing, not the empirical adequacy. For the meta-typological experiment, I randomly selected one cross-linguistic type from each chapter of WALS [5]. A histogram of the number of languages per type is shown in Figure 1. Based on a similar distribution, Maslova [12] proposed that the size of cross-linguistic types follows a Pareto distribution. However, as described in the previous section, it seems to make more sense to consider this an exponential distribution as defined in (1). To fit the empirical data to the proposed distribution in (1), I divided the types from WALS into bins of size 10, i.e. all types with 1 to 10 languages were combined into one group, likewise all types with 11–20 languages, etc. For each of these bins, I counted the number of types in it. For example, there are 18 types that have between 1 and 10 languages, which is 12.9% of all 140 types. So, the probability of a "queue" with a length between 1 and 10 languages is 0.129. Fitting these empirical probabilities for type sizes to the proposed distribution in (1) results in a traffic rate t = .85 ± .01.² It is important to note that I took the bare number of languages per type as documented in each chapter of WALS. This decision has some complications, because the set of languages considered (i.e. the "sample") is rather different between the different chapters of WALS. The number of languages in a particular type will normally differ depending on whether the researcher considered 100 languages or 500 languages as a sample. Still, I decided against normalizing the samples, because that would introduce an artificial upper boundary.
² I used the function nls ("non-linear least squares") from the statistics environment R [17] to estimate the traffic rate from the data, given the predicted distribution in (1). Also note that these fitted values represent a random sample of types from WALS, and the results will thus differ slightly depending on the choice of types.
Fig. 1. Histogram of type sizes from WALS
Fig. 2. Fit of empirical distribution to predicted distribution
This decision implies, however, that the distribution of type sizes in the current selection of data from WALS is also influenced by yet another random variable, namely the size of the sample in each chapter. From the perspective of typology, this is a rather strange approach, because typological samples attempt to sample the current world's languages. For the current purpose, however, the population to be sampled is not the current world's languages, but all languages that were ever spoken, or will ever be spoken, through time and space. From that perspective, any restriction on sample size will only restrict the number of languages; it will not influence the traffic rate, nor the type distribution. The relation between the empirically observed probabilities and the fitted probabilities is shown in Figure 2. It is thus easily possible to fit the empirical data nicely to the proposed distribution in (1). However, as argued in the previous section, this is not a proof of the proposal, but only an illustration. The nature of a probability distribution can never be empirically proven, only made more plausible by a solid analysis of the underlying processes.
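The binning-and-fitting procedure is easy to reproduce. The following OCaml sketch is our own: the list of type sizes is a placeholder for the actual WALS counts, and a simple grid search stands in for R's nls.

(* Empirical probability that a type falls in the k-th bin of width 10
   (bin 0 covers sizes 1-10, bin 1 covers 11-20, etc.). *)
let bin_probs type_sizes =
  let n = List.length type_sizes in
  let max_bin = List.fold_left (fun m s -> max m ((s - 1) / 10)) 0 type_sizes in
  Array.init (max_bin + 1) (fun k ->
      let c = List.length (List.filter (fun s -> (s - 1) / 10 = k) type_sizes) in
      float_of_int c /. float_of_int n)

(* Predicted probability of bin k under P(q = n) = (1 - t) t^n,
   summed over the ten queue lengths the bin covers. *)
let predicted t k =
  let rec sum n acc =
    if n > (k + 1) * 10 then acc
    else sum (n + 1) (acc +. (1. -. t) *. (t ** float_of_int n))
  in
  sum (k * 10 + 1) 0.

(* Least-squares grid search for the traffic rate t in (0, 1). *)
let fit_traffic_rate emp =
  let err t =
    snd (Array.fold_left
           (fun (k, e) p -> (k + 1, e +. (p -. predicted t k) ** 2.))
           (0, 0.) emp)
  in
  let best = ref (0.5, err 0.5) in
  for i = 1 to 998 do
    let t = float_of_int i /. 1000. in
    let e = err t in
    if e < snd !best then best := (t, e)
  done;
  fst !best

Applied to the actual WALS counts, fit_traffic_rate (bin_probs sizes) should recover something close to the reported t ≈ .85, modulo the coarser optimizer.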
4 Outlook
The model presented in this paper is restricted to a very simplistic birth-death model of typological change. More complex models will have to be considered to also cover the more interesting cases, like the size of phoneme inventories. Basically, to extend the current approach, Markov models involving multiple states and specified transition probabilities between these states are needed. For example, phoneme inventories can be considered a linearly ordered set of states, and the process of adding or losing one phoneme can also be considered a Poisson process. At this point, I do not know what the resulting probability distribution would be in such a model, but it would not surprise me if Lehfeldt's [9] proposal of a gamma distribution turned out to be in the right direction after all.
References

1. Altmann, G.: Die Entstehung diatopischer Varianten. Zeitschrift für Sprachwissenschaft 4(2), 139–155 (1985)
2. Comrie, B.: Language Universals and Linguistic Typology. Blackwell, Oxford (1989)
3. Cysouw, M.: Against implicational universals. Linguistic Typology 7(1), 89–101 (2003)
4. Dryer, M.S.: Order of object and verb. In: Haspelmath, M., Dryer, M.S., Gil, D., Comrie, B. (eds.) The World Atlas of Language Structures, pp. 338–341. Oxford University Press, Oxford (2005)
5. Haspelmath, M., Dryer, M.S., Comrie, B., Gil, D. (eds.): The World Atlas of Language Structures. Oxford University Press, Oxford (2005)
6. Janssen, D.P., Bickel, B., Zúñiga, F.: Randomization tests in language typology. Linguistic Typology 10(3), 419–440 (2006)
7. Justeson, J.S., Stephens, L.D.: On the relationship between the numbers of vowels and consonants in phonological systems. Linguistics 22, 531–545 (1984)
8. Kendall, D.G.: Stochastic processes occurring in the theory of queues and their analysis by the method of the imbedded Markov chain. The Annals of Mathematical Statistics 24(3), 338–354 (1953)
9. Lehfeldt, W.: Die Verteilung der Phonemanzahl in den natürlichen Sprachen. Phonetica 31, 274–287 (1975)
10. Maddieson, I.: Consonant inventories. In: Haspelmath, M., Dryer, M.S., Gil, D., Comrie, B. (eds.) The World Atlas of Language Structures, pp. 10–13. Oxford University Press, Oxford (2005)
11. Maslova, E.: A dynamic approach to the verification of distributional universals. Linguistic Typology 4(3), 307–333 (2000)
12. Maslova, E.: Meta-typological distributions. Sprachtypologie und Universalienforschung 61(3), 199–207 (2008)
13. Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, Cambridge (2005)
14. Nichols, J.: Linguistic Diversity in Space and Time. University of Chicago Press, Chicago (1992)
15. Nichols, J., Barnes, J., Peterson, D.A.: The robust bell curve of morphological complexity. Linguistic Typology 10(1), 96–106 (2006)
16. Plank, F., Schellinger, W.: Dual laws in (no) time. Sprachtypologie und Universalienforschung 53(1), 46–52 (2000)
17. R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2008)
A Polynomial Time Algorithm for Parsing with the Bounded Order Lambek Calculus

Timothy A.D. Fowler
Department of Computer Science, University of Toronto
10 King's College Rd., Toronto, ON, M5S 3G4, Canada
[email protected]
Abstract. [2] introduced the bounded order Lambek calculus and provided a polynomial time algorithm for its sequent derivability. However, that result is limited because the algorithm requires exponential time in the presence of lexical ambiguity. That is, [2] did not provide a polynomial time parsing algorithm. The purpose of this paper is to provide such an algorithm. We prove an asymptotic bound of O(n⁴) for parsing and improve the bound for sequent derivability from O(n⁵) to O(n³).
1 Introduction
The Lambek calculus (L) [5] is a categorial grammar formalism having a number of attractive properties for modelling natural language. In particular, it is strongly lexicalized, which allows the parsing problem to be a problem over a local set of categories, rather than a global set of rules. Furthermore, there is a categorial semantics accompanying every syntactic parse. Despite these attractive properties, [7] proved that L is weakly equivalent to context-free grammars (CFGs), which are widely agreed to be insufficient for modelling all natural language phenomena. In addition, [8] proved that the sequent derivability problem for L is NP-complete. The weak equivalence to CFGs has been addressed by a number of authors. [4] introduced a mildly context-sensitive extension of L, increasing the weak generative capacity of the formalism, and [9] proved that L is more powerful than CFGs in terms of strong generative capacity. [6] proved that restrictions of the multi-modal non-associative Lambek calculus are mildly context-sensitive, which raises interesting questions as to the importance of associativity to parsing. We will address the issue of parsing complexity by extending the results of [2] and providing a parsing algorithm for the Lambek calculus which runs in O(n⁴) time for an input of size n when the order of categories is bounded by a constant. The key to this result is in restricting Fowler's representation of partial proofs to be local to certain categories rather than global. Our results are for the product-free fragment of the Lambek calculus (L) and the variant that allows empty premises (L∗), for simplicity and because the product connective has limited linguistic application. This work can also be seen as a generalization of
[1], which proved that if the order of categories is less than two then polynomial time sequent derivability is possible.
2 The Lambek Calculus
The set of categories C is built up from a set of atoms (e.g. {S, NP, N, PP}) and the two binary connectives / and \. A Lambek grammar G is a 4-tuple ⟨Σ, A, R, S⟩ where Σ is an alphabet, A is a set of atoms, R is a relation between symbols in Σ and categories in C and S is the set of sentence categories. A sequent is a sequence of categories (called the antecedents) together with the symbol ⊢ and one more category (called the succedent). The parsing problem for a string of symbols s1 . . . sl can be characterized in terms of the sequent derivability problem as follows: s1 . . . sl ∈ Σ∗ is parseable in a Lambek grammar ⟨Σ, A, R, S⟩ if there exist c1, . . . , cl ∈ C and s ∈ S such that ci ∈ R(si) for 1 ≤ i ≤ l and the sequent c1 . . . cl ⊢ s is derivable in the Lambek calculus. The sequent derivability problem can be characterized by conditions on term graphs [3]. A term graph for a sequent is formed in a two-step process. The first step is deterministic and begins with associating a polarity with each category in the sequent: negative for the antecedents and positive for the succedent. Next, we consider the polarized categories as vertices and decompose the slashes according to the following vertex rewriting rules:

(α/β)− ⇒ α− → β+
(β\α)− ⇒ β+ ← α−
(α/β)+ ⇒ β− ⇠ α+
(β\α)+ ⇒ α+ ⇢ β−

The neighbourhood of the polarized category on the left of each rule is assigned to α. The dashed edges (rendered here as ⇠ and ⇢) are called Lambek edges and the non-dashed edges are called regular edges. This process translates categories into trees. Then, additional Lambek edges are introduced from the root of the succedent tree to the roots of the antecedent trees; these are referred to as rooted Lambek edges. The result of the graph rewriting is an ordered sequence of polarized atoms. The second step is non-deterministic and assigns a complete matching of the polarized atoms. The matching must be planar above the atoms and match occurrences of atoms only with other occurrences of the same atom but of opposite polarity. These pairings of atom occurrences are called matches.
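The deterministic first step is easy to make concrete. The following OCaml sketch is our own rendering of the vertex rewriting rules, reading off only the left-to-right order and polarity of the atoms (the regular and Lambek edges are omitted):

type cat =
  | Atom of string
  | Over of cat * cat    (* Over (a, b)  =  a/b *)
  | Under of cat * cat   (* Under (b, a) =  b\a *)

type polarity = Pos | Neg

(* Leaf sequence of the polarized category, left to right. *)
let rec leaves (c, p) =
  match c, p with
  | Atom x, _ -> [ (x, p) ]
  | Over (a, b), Neg -> leaves (a, Neg) @ leaves (b, Pos)    (* (a/b)-  => a- b+ *)
  | Over (a, b), Pos -> leaves (b, Neg) @ leaves (a, Pos)    (* (a/b)+  => b- a+ *)
  | Under (b, a), Neg -> leaves (b, Pos) @ leaves (a, Neg)   (* (b\a)-  => b+ a- *)
  | Under (b, a), Pos -> leaves (a, Pos) @ leaves (b, Neg)   (* (b\a)+  => a+ b- *)

(* E.g. the first antecedent of Figure 1:
   leaves (Over (Atom "S", Under (Atom "NP", Atom "S")), Neg)
   = [("S", Neg); ("S", Pos); ("NP", Neg)] *)

Concatenating the leaf sequences of the polarized antecedents and the succedent yields the ordered sequence of polarized atoms over which the planar matchings of the second step are drawn.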
Fig. 1. An integral term graph for the sequent S/(NP\S), (NP\S)/NP, NP ⊢ S
The edges in the matching are regular edges and are directed from the positive atom to the negative atom. An example of a term graph can be seen in Figure 1. We will prefix the terms regular (resp. Lambek) to the usual definitions of graph theory when we mean to refer to the subgraph of a term graph obtained by restricting the edge set to only the regular (resp. Lambek) edges. A term graph is L∗-integral if it satisfies the following conditions:

1. G is regular acyclic.
2. For every Lambek edge ⟨s, t⟩ in G there is a regular path from s to t.

A term graph is integral if it is L∗-integral and it satisfies the following:

3. For every Lambek edge ⟨s, t⟩ in G, there is a regular path from s to a vertex x in G such that if x has a non-rooted Lambek in-edge ⟨s′, x⟩ then there is no regular path from s to s′.

Theorem 1. A sequent is derivable in L iff it has an integral term graph. A sequent is derivable in L∗ iff it has an L∗-integral term graph.

Proof. See [3].
3 Parsing with L∗
[2] presented a chart parsing algorithm that uses an abstraction over term graphs¹ as a representation and considers the exponential number of matchings by building them incrementally from smaller matchings. The indices of the chart are the atoms of the sequent. Applying this algorithm to the parsing problem takes exponential time because, near the bottom of the chart, there would need to be one entry in the chart for every possible sequence of categories. We generalize this algorithm to the parsing problem by restricting the domain of the abstractions to the categories of the atoms at its endpoints.
3.1 Abstract Term Graphs
In this section, we introduce versions of the L∗-integrity conditions that can be enforced incrementally during chart parsing, using only the information contained in the abstractions. A partial matching is a contiguous sub-matching of a complete matching. The notion of partial term graph is defined in the obvious way. In a partial term graph, a vertex is open if it does not have a match. A partial term graph is L∗-incrementally integral if it satisfies the following:

1. G is regular acyclic.
2. For every Lambek edge ⟨s, t⟩, either there is a regular path from s to t, or there is an open positive vertex u such that there is a regular path from s to u and there is an open negative vertex v such that there is a regular path from t to v.
¹ There are some differences between the presentations. For discussion, see [3].
The intuition is that a partial term graph G is L∗-incrementally integral exactly when extending its matching to obtain an L∗-integral term graph has not already been ruled out. It should be easy to see that L∗-incrementally integral term graphs are L∗-integral. We will only insert representations into the chart for partial term graphs that are L∗-incrementally integral. An abstract term graph A(G) is obtained from an L∗-incrementally integral term graph G by performing the following operations:

1. Deleting Lambek edges ⟨s, t⟩ such that there is a regular path from s to t.
2. Replacing Lambek edges ⟨s, t⟩ where t is not open by ⟨s, r⟩, where r is the open negative vertex from which there is a regular path to t.
3. Replacing Lambek edges ⟨s, t⟩ where s is not open by edges ⟨r, t⟩ for r ∈ R, where R is the set of open positive vertices with regular paths from s.
4. Contracting regular paths between open vertices to a single edge and deleting the non-open vertices on the path.

For a term graph G with the empty matching, A(G) = G. Planar matchings are built out of smaller partial planar matchings in one of two ways: bracketing and adjoining. Bracketing a partial matching extends it by introducing a match between the two atoms on either side of the matching. Adjoining two partial matchings concatenates the two partial matchings to obtain a single larger matching. [2] details two sub-algorithms for bracketing and adjoining ATGs. Given an ATG A(G), Fowler's bracketing algorithm determines whether extending G by bracketing (for any term graph G′ with A(G′) = A(G)) results in an incrementally integral term graph H, and returns the ATG A(H) if it does. Given two ATGs A(G1) and A(G2), the adjoining algorithm does the same for adjoining.
3.2 The Parsing Algorithm
Polynomial time parsing depends on the following key insight: the only information in ATGs needed to run the bracketing and adjoining algorithms is the edges between atoms originating from categories at the endpoints of the ATG. Let G = ⟨Σ, A, R, S⟩ be a Lambek grammar and let s1 . . . sl ∈ Σ∗ be the input string. Because an atom or category can occur more than once in the input, we need the notions of category occurrence and atom occurrence for specific instances of a category and an atom, respectively. Given a category occurrence c, the partial term graph for c, denoted T(c), is the graph obtained from the deterministic step of term graph formation, without the rooted edges. The atom sequence for c, denoted A(c), is the sequence of atom occurrences of T(c). Given an atom occurrence a, C(a) is the category occurrence from which it was obtained, and given a category occurrence c, S(c) is the symbol si from which it was obtained. Our new definition of ATGs is as follows: for a partial term graph G, whose matching has endpoints as and ae, its ATG A(G) is obtained as before but with the following final step:
5. Deleting all vertices v such that C(v) ≠ C(as) and C(v) ≠ C(ae).

The bracketing and adjoining algorithms of [2] require knowledge not only of the ATGs that they are bracketing and adjoining but also of the state of the term graphs before any matchings have been made, which is referred to as the base ATG. In our case, the base ATG is simply the union of the term graphs of all categories of all symbols. In addition, [2] includes labels on the Lambek edges of ATGs that are the same as the target of the Lambek edge in the base ATG. Section 4 details a simpler and more elegant method of representing Lambek edges in ATGs. Finally, since the partial term graphs for categories do not include the rooted Lambek edges, they will need to be inserted whenever an ATG, G2, with a right endpoint in a succedent category is combined with an ATG, G1, with a left endpoint in a different category than the left endpoint of G2. Then, Lambek edges are inserted from the positive vertices with in-degree 0 in G1 to the negative vertices with in-degree 0 in G2. Our chart will be indexed by atom occurrences belonging to categories of symbols in the input string and categories in S. Entries consist of sets of ATGs, and each entry ⟨as, ae⟩ will be specified by its leftmost matched atom as and its rightmost matched atom ae. A chart indexed in this way can be implemented by arbitrarily ordering the categories for a symbol and keeping track of the locations of word and category boundaries in the sequence of atoms. The next two subsections outline the insertion of ATGs into the chart.

Inserting ATGs for Minimal Matchings. A minimal matching is a matching consisting of exactly one match. Minimal matchings are inserted in two distinct steps. First, we insert minimal matchings over atom occurrences as and ae where C(as) = C(ae) = C and as and ae are adjacent in C. The ATG input to the bracketing algorithm is T(C). Second, we insert minimal matchings over atoms as and ae where C(as) ≠ C(ae), as is the rightmost atom in C(as) and ae is the leftmost atom in C(ae). The ATG input to the bracketing algorithm is T(C(as)) ∪ T(C(ae)).

Inserting ATGs for Non-minimal Matchings. Once we have inserted the ATGs for minimal matchings, we must process each ATG in the chart. That is, we consider all possible ways of bracketing and adjoining each ATG in the chart, which may require insertion of more ATGs that, in turn, need to be processed. An entry ⟨as, ae⟩ left subsumes an entry ⟨at, af⟩ if one of the following is satisfied:

1. at is equal to as
2. C(at) = C(as) = C and at appears to the right of as in A(C)
3. C(at) ≠ C(as) and S(C(at)) appears to the right of S(C(as)) in the input string

The notion of right subsumes is defined similarly for af and ae. An entry E subsumes an entry F if it left subsumes and right subsumes F. The intuition is that E subsumes F iff the endpoints of F appear between the endpoints of E.
The size of an entry ⟨as, ae⟩ is the distance between as and ae if C(as) = C(ae), and the distance between S(C(as)) and S(C(ae)) if C(as) ≠ C(ae). However, any entry with endpoints in the same category is smaller than any entry for which they are not. We must take care not to process an entry until all ATGs for that entry have been inserted. All ATGs for an entry E have been inserted only after all entries that E subsumes have been processed. To ensure this, we process entries as follows. First, we process entries whose endpoints are in the same category. We process entries from smallest to largest, and among entries of equal size from left to right. Second, we process entries whose endpoints are in different categories in the same order. To process an entry E = ⟨as, ae⟩, we must consider all possible bracketings and adjoinings. To do this, we first calculate the sets of atoms L(as) and R(ae), which correspond to atoms that could occur to the left of as in a sequent and to the right of ae, respectively. If as is the leftmost atom in its category, then L(as) is the set of atoms that are the rightmost atom in a category occurrence for the symbol occurring to the left of S(C(as)) in the input string. If S(C(as)) is the leftmost symbol in the input string then L(as) = ∅. If as is not the leftmost atom in its category, then L(as) is the singleton consisting of the atom to the left of as in A(C(as)). R(ae) is computed analogously. Then, let G be an ATG in E and let al ∈ L(as) and ar ∈ R(ae). First, we run the bracketing algorithm with input G, al and ar. Then, for each ATG H in an entry F where F = ⟨a, al⟩ or F = ⟨ar, a⟩ for some atom a and such that F is smaller than E, we run the adjoining algorithm with input G and H. Lastly, for each ATG H in an entry the same size as E with endpoint al, we run the adjoining algorithm with input G and H. When running the bracketing and adjoining algorithms, we insert the output into the chart at the appropriate place when they are successful. After processing every entry, we output "YES" if there are any ATGs in any entry ⟨as, ae⟩ such that as is the leftmost atom in a category for s1 and ae is the rightmost atom in a category in S. Otherwise, we output "NO".

Correctness. It is not hard to prove that the chart iteration process in the preceding sections considers every possible matching over every possible sequent for the sentence. Then, the bracketing and adjoining algorithms of [2] ensure that any ATG inserted is the ATG of an L∗-incrementally integral partial term graph. Finally, any ATG in an entry whose left endpoint is the leftmost atom of a category for s1 and whose right endpoint is the rightmost atom of a category in S is the ATG of an L∗-incrementally integral term graph over some sequence of categories, which must be an L∗-integral term graph.
4 Running Time
[2] introduced the bounded order Lambek calculus, where the categories in the input have order bounded by a constant k. In the context of term graphs, bounding order by k is equivalent to bounding the length of paths in the base ATG. The key
Fig. 2. Comparing ATG representations
insight for achieving polynomial time is that bounding the lengths of paths in partial term graphs bounds the lengths of paths in the ATGs in the chart, which, in turn, bounds their variation. To improve the bound and simplify the representation, we introduce a better representation of ATGs. In the original definition of ATGs, the out-neighbourhood of negative vertices is not structured, and the applicability of the integrity conditions during bracketing and adjoining is informed by the labels on Lambek edges. However, from the ATG A(G) on the left of Fig. 2, we can deduce that for Lambek edges ⟨X, H⟩ and ⟨Y, I⟩ in G, there must be a regular path from Y to X. If we maintain this structure explicitly, then we can remove the need for labels on Lambek edges, since the labels are only used to deduce this structure. Furthermore, we can observe that any two positive vertices in an ATG that share an in-neighbour in the base ATG must have identical neighbourhoods in the ATG. Rather than representing each such positive vertex distinctly, we can use a placeholder that points to sets of positive vertices sharing in-neighbours in the base ATG. These two modifications are shown in Fig. 2, where the sets are listed below the placeholders. Finally, observe that any partition of the atoms of a category results in a path in the base ATG. Given an entry ⟨as, ae⟩, the partition of the atoms to the left of as and to the right of ae results in two such paths. Then, any two ATGs in ⟨as, ae⟩ can differ only in the neighbourhoods they assign to vertices incident to edges on one of the two paths. Bounded order gives us bounded paths, and our new representation bounds the branching factor on these paths, which means that the subgraph of an ATG which is manipulated during bracketing and adjoining is of constant size. Therefore, adjoining and bracketing are constant time operations. Also, the number of ATGs for an entry must be constant, since the variation within them is bounded by a constant. Let n be the number of atoms in categories of symbols in the input. Then, the chart is of size O(n²) and, while processing an entry, we bracket O(n) times and adjoin with the ATGs in up to O(n²) entries. Thus, the running time of our algorithm is O(n⁴). Without lexical ambiguity, we only need to adjoin with ATGs in O(n) entries, yielding a time bound of O(n³) for sequent derivability.
5 Parsing with L
To ensure correctness of the algorithm with respect to L, we need a notion of incremental integrity for partial term graphs. A partial term graph G is incrementally integral if it is L∗ -incrementally integral and satisfies the following:
3. For every Lambek edge ⟨s, t⟩ in G, either there is a regular path from s to a negative vertex x such that if x has a non-rooted Lambek in-edge ⟨s′, x⟩ then there is no regular path from s to s′, or there is an open positive vertex u such that there is a regular path from s to u, and there is a negative vertex v and an open negative vertex w such that there is a regular path from w to v and, if v has a non-rooted Lambek in-edge ⟨s′, v⟩, then there is no regular path from w to s′.

To enforce this condition during chart parsing, we associate an empty premise bit with the positive vertices in ATGs. The bit associated with vp represents whether there is a regular path in the term graph from the source of a Lambek edge that has not yet met condition 3 to vp. The empty premise bits for an ATG can be calculated analogously to the calculation of Lambek edges during adjoining and bracketing, since they represent the same notions of paths in underlying term graphs.
6 Conclusion
We have provided a parsing algorithm, running in O(n⁴) time, for both the Lambek calculus and the Lambek calculus allowing empty premises when the order of categories is bounded by a constant. Also, we reduced the asymptotic bound for sequent derivability of [2] from O(n⁵) to O(n³). To the best of our knowledge, no linguistic analysis has called for categories of high order, which means that these results allow for practical parsing with the Lambek calculus.
References

1. Aarts, E.: Proving theorems of the second order Lambek calculus in polynomial time. Studia Logica 53(3), 373–387 (1994)
2. Fowler, T.A.D.: Efficient parsing with the product-free Lambek calculus. In: Proceedings of the 22nd International Conference on Computational Linguistics (2008)
3. Fowler, T.A.D.: Term graphs and the NP-completeness of the product-free Lambek calculus. In: Proceedings of the 14th Conference on Formal Grammar (2009)
4. Kruijff, G.J.M., Baldridge, J.M.: Relating categorial type logics and CCG through simulation. Unpublished manuscript, University of Edinburgh (2000)
5. Lambek, J.: The mathematics of sentence structure. American Mathematical Monthly 65(3), 154–170 (1958)
6. Moot, R.: Lambek grammars, tree adjoining grammars and hyperedge replacement grammars. In: Ninth International Workshop on Tree Adjoining Grammars and Related Formalisms (2008)
7. Pentus, M.: Product-free Lambek calculus and context-free grammars. The Journal of Symbolic Logic 62(2), 648–660 (1997)
8. Savateev, Y.: Product-free Lambek calculus is NP-complete. CUNY Technical Report (September 2008)
9. Tiede, H.J.: Deductive Systems and Grammars: Proofs as Grammatical Structures. Ph.D. thesis, Indiana University (1999)
LC Graphs for the Lambek Calculus with Product

Timothy A.D. Fowler
Department of Computer Science, University of Toronto
10 King's College Rd., Toronto, ON, M5S 3G4, Canada
[email protected]
Abstract. This paper introduces a novel graph representation of proof nets for the Lambek calculus that extends the LC graph representation of [13] to include the product connective. This graph representation specifies the difference between the Lambek calculus with and without product more clearly than other proof net representations, which is important to the search for polynomial time fragments of the Lambek calculus. We use LC graphs to further the efforts to characterize the boundary between polynomial time and NP-complete sequent derivability by analyzing the NP-completeness proof of [14] and discussing a sequent derivability algorithm.
1 Introduction
The Lambek calculus [11] is a categorial grammar having four variants which will be considered in this paper: the Lambek calculus with product (L•), the Lambek calculus without product (L), the Lambek calculus with product allowing empty premises (L•∗) and the Lambek calculus without product allowing empty premises (L∗). These four calculi can be characterized by the inference rules shown in Figure 1. Lowercase Greek letters represent categories built from a set of atoms and the three connectives /, \ and •, and uppercase Greek letters represent sequences of these categories. L•∗ uses these inference rules as they are given. L∗ is identical except that •L and •R are prohibited. L• and L differ from their counterparts that allow empty premises by prohibiting Γ from being empty in /R and \R. A wide variety of work has contributed to the search for the boundary between polynomial time and NP-completeness for sequent derivability in L and L•. In particular, [9], [6], [16], [15], [13] and [4] have contributed to the search for polynomial time by furthering research into proof nets and chart parsing algorithms for the four variants of the Lambek calculus defined here. As an alternative to this approach, [1] provides a polynomial time algorithm for L when the input is restricted to categories of order less than two. In a similar vein, [2] and [7] provide polynomial time algorithms for the non-associative variants of L and L•. In contrast to this work, [14] proved that sequent derivability in both L• and L•∗ is NP-complete. However, because the necessity of product for modelling
α ⊢ α

Γ ⊢ α    Δ β Θ ⊢ γ                 α Γ ⊢ β
------------------- \L             --------- \R
 Δ Γ α\β Θ ⊢ γ                     Γ ⊢ α\β

Γ ⊢ α    Δ β Θ ⊢ γ                 Γ α ⊢ β
------------------- /L             --------- /R
 Δ β/α Γ Θ ⊢ γ                     Γ ⊢ β/α

Γ α β Δ ⊢ γ                        Γ ⊢ α    Δ ⊢ β
--------------- •L                 ----------------- •R
Γ α•β Δ ⊢ γ                        Γ Δ ⊢ α•β

Fig. 1. Inference rules of the Lambek calculus
natural language has not been firmly established, the computational complexity of sequent derivability in both L and L∗ remains an important open problem. This paper will continue this research with the intent of discovering the precise computational differences between L and L• and with an eye towards solving the problem of sequent derivability in L. We will introduce a graph formalism for representing proofs in L• and use it to analyze the NP-completeness proof of [14]. An intuitive graphical presentation is made of Pentus' proof, and we also discuss the possibility of transforming that proof into an NP-completeness proof for L. Then, in the conclusion, we discuss the use of this graph formalism as the basis of a chart parsing algorithm. Beyond purely theoretical interest, the Lambek calculus can be motivated by the practical success of Combinatory Categorial Grammar (CCG) [18,5,3]. However, despite the similarities between the approaches of the Lambek calculus and CCG, it is well known that CCG recognizes languages that are super-context-free [10], whereas the Lambek calculus recognizes only context-free languages. This could be seen to be problematic, since it is also well known that there are natural languages which are not context-free [17]. Despite this, the Lambek calculus is interesting because it is weakly equivalent to context-free grammars, which are widely used in the computational linguistics community for practical parsing, but not strongly equivalent [19], allowing for valuable comparisons. Furthermore, the Lambek calculus has been the basis of more complex systems that recognize languages that are not context-free [12], and any investigation into these more complex systems must begin with the Lambek calculus. The proofs in this paper are kept at a high level. More detail can be found in [8].
2 Proof Nets and Graph Representations
Exploring the sequent derivability problem in the Lambek calculus via the inference rules shown in figure 1 has proven quite cumbersome; as a result, most work in this area is done via proof nets.
Proof nets, as originally introduced by [9], are an extra-logical proof system which eliminates spurious ambiguity. A proof structure consists of a deterministic proof frame and a non-deterministic axiomatic linkage. First, all formulae in the sequent are assigned a polarity: formulae in the antecedent are assigned negative polarity and the formula in the succedent is assigned positive polarity. The proof frame is a proof-like structure built on a sequent using the decomposition rules shown in figure 2.

$$\frac{\alpha^{+}\quad\beta^{-}}{(\alpha\backslash\beta)^{-}}\,\otimes \qquad \frac{\beta^{-}\quad\alpha^{+}}{(\alpha/\beta)^{-}}\,\otimes \qquad \frac{\alpha^{-}\quad\beta^{-}}{(\alpha\bullet\beta)^{-}}\,\wp$$

$$\frac{\alpha^{-}\quad\beta^{+}}{(\alpha\backslash\beta)^{+}}\,\wp \qquad \frac{\beta^{+}\quad\alpha^{-}}{(\alpha/\beta)^{+}}\,\wp \qquad \frac{\alpha^{+}\quad\beta^{+}}{(\alpha\bullet\beta)^{+}}\,\otimes$$

Fig. 2. Proof frame rules
Each connective-polarity pair has a unique rule, which gives us a unique proof frame for a given sequent. The top of the proof frame consists of atoms with polarities, which are called the axiomatic formulae. An axiomatic linkage is a bijection that matches axiomatic formulae with the same atom but opposite polarities. See figure 4 for an example of a proof structure for a sequent. Some proof structures correspond to proofs in the Lambek calculus, and those which do are called proof nets. It should be noted that all proof nets for the Lambek calculus require a planar axiomatic linkage.

A variety of methods have been introduced to determine whether a proof structure is a proof net, all of which are based on graphs. These methods fall into two major categories, described in sections 2.1 and 2.2. The primary difference between the two traditions is that the Girard-style conditions can also be used for non-intuitionistic variants of the Lambek calculus, whereas the Roorda-style conditions take advantage of the intuitionistic nature of the Lambek calculus.
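Since the proof frame is deterministic, the search for a proof net reduces to a search through axiomatic linkages, and for the Lambek calculus only the planar ones can succeed. The following sketch enumerates them; the (atom, polarity) encoding and the function name are our own illustrative choices, not taken from the literature.

```python
def planar_linkages(atoms):
    """Enumerate the planar axiomatic linkages of a polarized atom sequence.

    A sketch with an invented interface: `atoms` is the list of
    (atom, polarity) pairs read off the top of the proof frame, and a
    linkage is returned as a frozenset of index pairs (i, j).  Non-crossing
    matchings are generated by linking the first unmatched position to a
    later one and recursing independently on the enclosed span and on the
    remaining span."""
    def match(lo, hi):                     # positions lo .. hi-1
        if lo == hi:
            yield frozenset()
            return
        a, pol = atoms[lo]
        for k in range(lo + 1, hi, 2):     # the enclosed span must be even
            b, qol = atoms[k]
            if a == b and pol != qol:
                for inner in match(lo + 1, k):
                    for outer in match(k + 1, hi):
                        yield inner | outer | {(lo, k)}
    yield from match(0, len(atoms))
```

For instance, `planar_linkages([("A", "+"), ("A", "-")])` yields the single linkage {(0, 1)}.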
2.1 Girard Style Correctness Conditions
Presentations in this style are based on the original correctness conditions given in [9]. This style is characterized by building graphs from the proof frame rules based only on whether a rule is a ⊗-rule or a ℘-rule. Work in this style includes the graphs of [6], the R&B graphs of [15], the quantum graphs of [16] and the switch graphs of [4].

[6] were the first to formulate graph representations of proof nets in the Girard style. The DR-graph of a proof structure is obtained by translating each formula in the proof structure into a vertex and then inserting edges between each parent-child pair in the proof frame and between each pair of axiomatic formulae in the axiomatic linkage.
Then, a switching of the DR-graph is obtained by taking the set of all ℘-rules in the proof frame and deleting exactly one of the two edges between the conclusion and the two premises of each such rule. [6] proved that a proof structure is a proof net if and only if every switching is acyclic.
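As a sketch of how this criterion can be checked mechanically, the following code tests every switching for acyclicity with a union-find pass per switching. The data layout (an explicit vertex list, and a pair of candidate edges per ℘-rule) is an assumption of ours, and the brute-force loop is exponential in the number of ℘-rules, exactly as the definition suggests.

```python
from itertools import product

def every_switching_acyclic(vertices, fixed_edges, par_pairs):
    """Danos-Regnier test, stated directly from the definition (a sketch).

    `fixed_edges` are the undirected DR-graph edges not attached to any
    p-rule; `par_pairs` holds, for every p-rule, the pair of
    conclusion-premise edges of which a switching keeps exactly one."""
    def acyclic(edge_list):
        parent = {v: v for v in vertices}      # union-find over vertices
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for u, v in edge_list:
            ru, rv = find(u), find(v)
            if ru == rv:
                return False                   # this edge closes a cycle
            parent[ru] = rv
        return True
    return all(acyclic(fixed_edges + list(keep))
               for keep in product(*par_pairs))
```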
2.2 Roorda Style Correctness Conditions
[16] introduced a significantly different method for evaluating the correctness of proof structures. This method requires the annotation of the proof frame with lambda calculus terms, as well as the creation of a set of substitutions, shown in square brackets in figure 3.

$$\frac{\alpha^{+}\!:u \quad \beta^{-}\!:tu}{(\alpha\backslash\beta)^{-}\!:t} \qquad \frac{\beta^{+}\!:u \quad \alpha^{-}\!:tu}{(\alpha/\beta)^{-}\!:t} \qquad \frac{\alpha^{-}\!:(t)_{0} \quad \beta^{-}\!:(t)_{1}}{(\alpha\bullet\beta)^{-}\!:t}$$

$$\frac{\alpha^{-}\!:u \quad \beta^{+}\!:v'}{(\alpha\backslash\beta)^{+}\!:v}\;[v := \lambda u.v'] \qquad \frac{\beta^{-}\!:u \quad \alpha^{+}\!:v'}{(\alpha/\beta)^{+}\!:v}\;[v := \lambda u.v'] \qquad \frac{\alpha^{+}\!:v' \quad \beta^{+}\!:v''}{(\alpha\bullet\beta)^{+}\!:v}\;[v := \langle v', v''\rangle]$$

Fig. 3. Annotation of lambda terms and substitutions
In addition to the substitutions specified above, for each pair X⁺ : α, X⁻ : Δ in the axiomatic linkage we add a substitution of the form [α := Δ]. [16] then provides a method for determining proof structure correctness based on variable substitutions for L, L∗, L• and L•∗.

[13] introduces a graph representation of this method for L, called LC graphs. An LC graph is obtained from a proof structure by taking as the vertex set the set of lambda variables occurring in the proof frame. Then, for each substitution, directed edges are introduced from the lambda variable on the left of the substitution to the lambda variables on its right. [13] then gives the following correctness conditions for these LC graphs:

– I(1) There is a unique vertex s in G with in-degree 0 such that for all v ∈ V, s ⇝ v.¹
– I(2) G is acyclic.
– I(3) For every substitution of the form [v := λu.w], w ⇝ u.
– I(CT) For every substitution of the form [v := λu.w], there exists a negative vertex x in G such that v ⇝ x and there is no substitution of the form [v′ := λx.w′].

[13] proves that a sequent is derivable in L∗ iff it has an LC graph that satisfies I(1-3), and that a sequent is derivable in L iff it has an LC graph that satisfies I(1-3) and I(CT).

¹ ⇝ denotes path accessibility.
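To make the conditions concrete, here is a sketch that builds the edge relation from a list of substitutions and tests I(1)-I(3) by plain reachability. The triple encoding of substitutions is our own invention; I(CT) is omitted here since it matters only for L.

```python
from collections import defaultdict

def check_i1_i3(variables, substitutions):
    """Check Penn's conditions I(1)-I(3) on an LC graph over variables.

    Each substitution is a triple (v, rhs_vars, binder): the graph gets an
    edge v -> x for each x in rhs_vars, and binder is the pair (u, w) when
    the substitution reads [v := lambda u. w], else None."""
    succ = defaultdict(set)
    for v, rhs_vars, _ in substitutions:
        succ[v] |= set(rhs_vars)

    def reaches(x):                       # vertices path-accessible from x
        seen, stack = set(), [x]
        while stack:
            for y in succ[stack.pop()] - seen:
                seen.add(y)
                stack.append(y)
        return seen

    targets = {x for _, rhs, _ in substitutions for x in rhs}
    sources = [v for v in variables if v not in targets]
    i1 = (len(sources) == 1
          and reaches(sources[0]) >= set(variables) - {sources[0]})
    i2 = all(v not in reaches(v) for v in variables)   # I(2): acyclicity
    i3 = all(b is None or b[0] in reaches(b[1])        # I(3): w ~> u
             for _, _, b in substitutions)
    return i1 and i2 and i3
```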
Substitutions := [c := ⟨e, d⟩], [e := λf.g], [g := a b], [b := (f)₁], [d := (f)₀]

Fig. 4. A proof structure for (A/A) ⊢ (A/(A • A)) • A, with annotations
2.3 Evaluation of Girard and Roorda Style Correctness Conditions
Given our goal of investigating the boundary of tractability for the variants of the Lambek calculus, which requires an investigation into the computational differences between L and L•, we must evaluate the two proof net styles. The Girard style conditions have the advantage of being defined for both L and L•, but the significant disadvantage that, by ignoring the differences among ⊗-rules and among ℘-rules, removing product does not simplify these conditions. The Roorda style conditions, on the other hand, do become simpler with the removal of product, since projections and pairings are removed. However, no graph formalism had been introduced for L• in this style until now.
3 LC Graphs for L• and L•∗
We construct our LC graphs for sequents with products in exactly the same way as for those without products, with the obvious difference that we now have the two •-rules and the substitutions associated with the positive •-rule. It turns out that this is all that is necessary to accommodate both L• and L•∗. Then, we add the following correctness condition:

– I(4) For every substitution of the form [v := λu.v′] and for every x ∈ V, either every path from x to u contains v or v ⇝ x.

We can prove that these correctness conditions are sound and complete relative to the correctness conditions for variable substitutions in [16] in a way very similar to the proofs for LC graphs in [13]. Most proofs follow from the close mirroring between the correctness conditions for LC graphs and the correctness conditions for variable substitutions. Proving that I(1) is necessary requires an application of structural induction and some facts about projections in the lambda calculus. Details of these proofs can be found in [8].

It is important to notice that the only difference between LC graphs for L and those for L• is a single correctness condition. This simple difference does not appear in the treatment of proof nets in [16], with the result that we now have a new tool for examining how different the two calculi are in terms of their parsing complexity.
Fig. 5. The LC graph for the proof structure in figure 4, on the vertices a, b, c, d, e, f, g
Figure 4 shows a proof structure for a sequent which is potentially derivable in L•, and figure 5 shows the corresponding LC graph. The path from d to f violates I(4), so this proof structure does not qualify as a proof net.
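Read literally, I(4) is a pair of path queries per λ-substitution, which the following sketch implements over the same invented data layout as before; "every path from x to u contains v" is checked by deleting v and retesting reachability.

```python
def check_i4(variables, succ, lambda_subs):
    """Check condition I(4), as stated above (a sketch).

    `succ` maps each vertex to its out-neighbours, and `lambda_subs` lists
    pairs (v, u) for every substitution of the form [v := lambda u. v']."""
    def reaches(start, goal, banned=None):
        seen, stack = {start}, [start]
        while stack:
            a = stack.pop()
            for b in succ.get(a, ()):
                if b == banned:
                    continue
                if b == goal:
                    return True
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        return False
    # Violation: some x has a path to u avoiding v, and v does not reach x.
    return all(not reaches(x, u, banned=v) or reaches(v, x)
               for v, u in lambda_subs
               for x in variables if x != v)
```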
4 LC Graphs over Atoms
In this section, we introduce a novel version of LC graphs that uses the atom occurrences across the top of the proof frame as vertices, rather than the lambda variables. This new representation is a step towards a chart parsing algorithm based on LC graphs, because by discarding the lambda terms we obtain a closer connection between the axiomatic linkages and the LC graphs. The structure of these new LC graphs closely mirrors the structure of those in the preceding section.

Definition 1. An LC graph over atoms for a sequent is a directed graph whose vertices are category occurrences and whose edges are introduced in four groups. We proceed with a deterministic step first and a non-deterministic step second. First, we assign polarities to category occurrences by assigning negative polarity to occurrences in the antecedent and positive polarity to the succedent. Then, the first two groups of edges are introduced by decomposing the category occurrences via the following vertex rewrite rules, where dashed arrows indicate dashed edges:

$$(\alpha/\beta)^{-} \Rightarrow \alpha^{-} \rightarrow \beta^{+} \qquad (1)$$
$$(\alpha/\beta)^{+} \Rightarrow \beta^{-} \dashleftarrow \alpha^{+} \qquad (2)$$
$$(\beta\backslash\alpha)^{-} \Rightarrow \beta^{+} \leftarrow \alpha^{-} \qquad (3)$$
$$(\beta\backslash\alpha)^{+} \Rightarrow \alpha^{+} \dashrightarrow \beta^{-} \qquad (4)$$
$$(\beta\bullet\alpha)^{-} \Rightarrow \beta^{-} \quad \alpha^{-} \qquad (5)$$
$$(\beta\bullet\alpha)^{+} \Rightarrow \beta^{+} \quad \alpha^{+} \qquad (6)$$

Each vertex rewrite rule specifies how to rewrite a single vertex on the left side to the two vertices on the right side. For rules (1-4), the neighbourhood of the vertex on the left side of each rule is assigned to α on the right side. For rules (5-6), the neighbourhood of the vertex on the left side is copied to both α and β on the right side.
Dashed edges are referred to as Lambek edges and non-dashed edges are referred to as regular edges. These two groups of edges will be referred to as rewrite edges.

After decomposition via the rewrite rules, we have an ordered set of polarized vertices with some edges between them. We say that a vertex belongs to a category occurrence in the sequent if there is a chain of rewrites going back from the rewrite that introduced this vertex to the rewrite of the category occurrence itself. A third group of edges is introduced such that there is one Lambek edge from each vertex with in-degree 0 in the succedent to each vertex with in-degree 0 in each of the antecedent category occurrences. These edges are referred to as rooted Lambek edges. This completes the deterministic portion of term graph formation. An axiomatic linkage is defined exactly as in the preceding section. The fourth group of edges is introduced as regular edges from the positive vertices to the negative vertices they are linked to.

This definition of LC graphs is quite similar to the definition based on lambda variables, which we will now refer to as LC graphs over variables. An example of an LC graph over atoms is shown in figure 7, with the equivalent LC graph over variables shown in figure 6. We define the following correctness conditions on an LC graph over atoms G:

– I′(1) G is acyclic.
– I′(2) For each negative vertex t in G and each vertex x in G, either there is no regular path from x to t, or for some Lambek edge ⟨s, t⟩ there is a regular path from s to x or a regular path from x to s.
– I′(CT) For each positive vertex s in G that is the source of a Lambek edge, there is a regular path from s to some vertex x such that either x has an in-edge that is a rooted Lambek edge, or x has an in-edge that is a Lambek edge ⟨s′, x⟩ such that there is no regular path from s to s′.

These conditions correspond to sequent derivability in the Lambek calculus with product in the following way:

Definition 2. An LC graph over atoms is L•∗-integral if it satisfies I′(1) and I′(2). An LC graph over atoms is integral if it is L•∗-integral and also satisfies I′(CT).

Theorem 1. A sequent is derivable in L•∗ iff it has an L•∗-integral LC graph over atoms.

Proof. We prove this result by a mapping between the variables in an LC graph over variables and the atoms in an LC graph over atoms, with a number of additional considerations. The primary mapping maps an atom to the leftmost variable appearing in its lambda term label. This has the effect that negative vertices in an LC graph over variables are reoriented to have regular out-edges to their positive siblings in an LC graph over atoms. There are four additional structural differences:
(1) Positive occurrences of products. In an LC graph over variables, there are a number of lambda variables that do not appear as the leftmost variable for any axiomatic formula, which means that they have no corresponding atom in an LC graph over atoms. However, these variables serve only to allow branches in the paths of an LC graph over variables, which is encoded similarly in an LC graph over atoms. We then need only ensure that the internal structure relevant to conditions I(3), I(4) and I(CT) is preserved, which is done by the rewrite Lambek edges.

(2) Negative occurrences of products. Negative occurrences of products in a proof structure introduce projections and duplicate terms, which has the effect of having multiple atoms labelled by terms with the same leftmost variable. In an LC graph over atoms, these are necessarily represented by different atoms. To ensure that the two types of graphs are structurally equivalent, we duplicate the regular out-edges during the construction of the LC graph over atoms and do the same for Lambek in-edges, ensuring that each of the different atoms in an LC graph over atoms requires the same paths as in an LC graph over variables.

(3) Rewrite Lambek edges. In an LC graph over variables, conditions I(3) and I(4) are conditions on the substitutions from the proof frame, specifying paths between certain variables in those substitutions. Rather than maintaining these substitutions, LC graphs over atoms simply make these conditions explicit in the form of Lambek edges, requiring Lambek edges to have accompanying regular paths.

(4) Rooted Lambek edges. Since I(1), I(3) and I(4) all specify the existence of certain paths in an LC graph over variables, rooted Lambek edges allow us to combine these three conditions in an LC graph over atoms. This is done by inserting Lambek edges between each source in the succedent category's LC graph and each source in each antecedent category's LC graph. Satisfying I′(2) is then precisely equivalent to satisfying I(1), I(3) and I(4).

Given these structural correspondences, I′(1) and I′(2) are simply the natural translations of I(1-4).

Theorem 2. A sequent is derivable in L• iff it has an integral LC graph over atoms.

Proof. Given the structural correspondences established in the preceding theorem, I′(CT) is the straightforward translation of I(CT) into the language of LC graphs over atoms.
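Like their counterparts over variables, these conditions reduce to reachability queries. I′(1) can be tested by any cycle check over both edge types, and I′(2) is a matter of regular-path queries, as in the following sketch; the layout of `succ_reg`, `lambek_edges` and `negative` is an assumption of ours.

```python
def check_i2_prime(vertices, succ_reg, lambek_edges, negative):
    """Check condition I'(2) on an LC graph over atoms (a sketch).

    `succ_reg` maps a vertex to its regular out-neighbours, `lambek_edges`
    is a set of directed pairs (s, t), and `negative` is the set of
    negative vertices.  For each negative t and each x: either there is no
    regular path x ~> t, or some Lambek edge (s, t) has a regular path
    s ~> x or x ~> s."""
    def reaches(x):
        seen, stack = set(), [x]
        while stack:
            for y in succ_reg.get(stack.pop(), ()):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    reach = {v: reaches(v) for v in vertices}
    for t in negative:
        sources = [s for (s, t2) in lambek_edges if t2 == t]
        for x in vertices:
            if t in reach[x] and not any(x in reach[s] or s in reach[x]
                                         for s in sources):
                return False
    return True
```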
5 The NP-Completeness Proof
[14] showed that sequent derivability in both L• and L•∗ is NP-complete, and our purpose in this section is to analyze that proof using LC graphs, to determine whether it can be adapted to an NP-completeness proof for derivability in L and L∗.
Pentus’ proof is via a reduction from SAT. Given a SAT instance c1 ∧ . . . ∧ cm , [14] introduced the following categories for t ∈ {0, 1}, 1 ≤ i ≤ n and 0 ≤ j ≤ m. ¬1 v is a shorthand for v and ¬0 v is a shorthand for ¬v. Ei0 (t) = p0i−1 \p0i Eij (t) = (pji−1 \Eij−1 (t)) • pji if ¬t xi ∈ cj Eij (t) = pji−1 \(Eij−1 (t) • pji ) otherwise G0 = p00 \p0n Gj = (pj0 \Gj−1 ) • pjn Hi0 = p0i−1 \p0i Hij = pji−1 \(Hij−1 • pji ) Fi = (Eim (1)/Him ) • Him • (Him \Eim (0)) These categories are then used to construct the sequent F1 , . . . , Fn Gm . [14] then proved that F1 , . . . , Fn Gm is derivable in L• if and only if E1 (t1 ), . . . , En (tn ) Gm is derivable in L• for some truth assignment t1 , . . . tn ∈ {0, 1}n . We now want to consider all possible LC graphs for E1 (t1 ), . . . , En (tn ) Gm . Because each atom occurs exactly once with positive polarity and once with negative polarity in E1 (t1 ), . . . , En (tn ) Gm , there is exactly one possibly integral term graph and the axiomatic linkage for that term graph is planar. Given the similarity of these sequents, we can depict the LC graph over variables for an arbitrary truth assignment t1 , . . . , tn in figure 6, given the appropriate variable assignments in the proof frame. For the precise details behind the variables in the LC graphs for these sequents see [8]. In a similar way, the LC graph over atoms can be depicted as in figure 7. The LC graph is independent of t1 , . . . , tn except for an m by n chart of edges (shown as finely dashed edges). Then, in both figures 6 and 7, the edge in column j, row i is not present if and only if ¬ti xi appears in cj . Consider the LC graph over variables in figure 6. It is not difficult to see that no proof structure for this sequent can ever violate I(1), I(2) or I(3) by checking its LC graph. Since ¬ti xi is present in cj if and only if the presence of that variable causes cj to be true for truth assignment t1 , . . . , tn , all of the edges
Fig. 6. LC graph of E₁(t₁), . . . , Eₙ(tₙ) ⊢ G
Fig. 7. LC graph over atoms of E₁(t₁), . . . , Eₙ(tₙ) ⊢ G
With this result, we can see not only that I(4) is an important part of Pentus' NP-completeness proof, but that it is the only correctness condition with any influence on the derivability of F₁, . . . , Fₙ ⊢ Gᵐ, and consequently on the satisfiability of c₁ ∧ . . . ∧ cₘ.

Now, consider the LC graph over atoms in figure 7. The structure is quite similar to that of the LC graph over variables. In particular, the SAT instance is satisfiable if and only if for some 1 ≤ i ≤ m, there is no regular path from pₙⁱ to p₀ⁱ; and I′(2) is satisfied if and only if no such path exists, due to the location of the Lambek edges. With this presentation of LC graphs over atoms, we can see that the important structure that cannot be represented in LC graphs for L is the negative atoms with multiple Lambek in-edges and multiple regular in-edges. This is due to the fact that, without the copying effect of the vertex rewriting rules for product, multiple in-edges of each type are impossible. To adapt this proof to L, some way will need to be found to emulate this structure in LC graphs for L.
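For concreteness, the construction underlying the reduction can be written out mechanically. The sketch below renders the categories as strings, following the definitions above; the encoding of literals and the helper names are our own choices rather than Pentus' notation.

```python
def pentus_sequent(clauses, n):
    """Build the categories of Pentus's reduction (a sketch).

    `clauses` is a list of m sets of signed literals, with (i, True)
    standing for x_i and (i, False) for its negation, so that the side
    condition "not^t x_i in c_j" becomes (i, t == 1) in clauses[j-1].
    Categories use '\\', '/' and '*' for the two slashes and the product."""
    m = len(clauses)
    p = lambda i, j: f"p{i}^{j}"

    def E(i, t):
        cat = f"({p(i-1, 0)}\\{p(i, 0)})"
        for j in range(1, m + 1):
            if (i, t == 1) in clauses[j - 1]:
                cat = f"(({p(i-1, j)}\\{cat})*{p(i, j)})"
            else:
                cat = f"({p(i-1, j)}\\({cat}*{p(i, j)}))"
        return cat

    def H(i):
        cat = f"({p(i-1, 0)}\\{p(i, 0)})"
        for j in range(1, m + 1):
            cat = f"({p(i-1, j)}\\({cat}*{p(i, j)}))"
        return cat

    def G():
        cat = f"({p(0, 0)}\\{p(n, 0)})"
        for j in range(1, m + 1):
            cat = f"(({p(0, j)}\\{cat})*{p(n, j)})"
        return cat

    F = [f"(({E(i, 1)}/{H(i)})*{H(i)}*({H(i)}\\{E(i, 0)}))"
         for i in range(1, n + 1)]
    return F, G()
```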
6 Conclusion
Having introduced LC graphs over variables for L• and L•∗, comparing them with LC graphs for L reveals that the difference is only a single path condition on certain vertices in the graph. Furthermore, by applying this observation to the NP-completeness proof of [14], we can see that this path condition is absolutely essential to that proof. This gives us a graphical insight into the precise differences between L and L•, as well as into precisely the kinds of structures in the LC graphs for L• that we would need to construct in the LC graphs for L in order to prove the NP-completeness of L in a similar way.
In addition to extending the LC graphs of [13], we also introduced a simplified variant which we called LC graphs over atoms. LC graphs over atoms allow a closer tie between the axiomatic linkage and the LC graph, which lets us define a very natural chart parsing algorithm over this representation. Such an algorithm would incrementally build axiomatic linkages, inserting partially completed LC graphs into the chart. In addition, each axiomatic link corresponds to exactly one edge in an LC graph, and that edge is the only unconstrained part of the neighbourhoods of its endpoints. Thus, we can contract the paths around the vertices to allow for a more compact representation. To manipulate this more compact representation, we would need to define incremental versions of our correctness conditions. Further exploration of this algorithm is likely to give us insight into the precise boundary between polynomial time and NP-completeness for the variants of the Lambek calculus. This boundary is not limited to the variants considered here, but extends to restrictions of L and L•, such as those with bounded order as in [1].
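As a point of comparison for the envisioned chart parser, the naive procedure it would improve upon can be stated in a few lines, reusing `planar_linkages` from the sketch in section 2; `build_lc_graph` and `integral` are placeholders for the constructions of sections 3 and 4, assumed rather than implemented here.

```python
def derivable(atoms, build_lc_graph, integral):
    """Naive baseline for the envisioned chart parser (a sketch).

    Enumerate every planar axiomatic linkage, build the LC graph over
    atoms for each, and test integrity.  A chart parser would instead
    share work across linkages by storing contracted partial LC graphs
    keyed on spans of the atom sequence."""
    return any(integral(build_lc_graph(atoms, linkage))
               for linkage in planar_linkages(atoms))
```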
References

1. Aarts, E.: Proving theorems of the second order Lambek calculus in polynomial time. Studia Logica 53(3), 373–387 (1994)
2. Aarts, E., Trautwein, K.: Non-associative Lambek categorial grammar in polynomial time. Mathematical Logic Quarterly 41(4), 476–484 (1995)
3. Bos, J., Markert, K.: Recognising textual entailment with logical inference. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 628–635 (2005)
4. Carpenter, B., Morrill, G.: Switch graphs for parsing type logical grammars. In: Proceedings of IWPT 2005, Vancouver (2005)
5. Clark, S., Curran, J.R.: Parsing the WSJ using CCG and log-linear models. In: Proceedings of the 42nd Meeting of the ACL, pp. 104–111 (2004)
6. Danos, V., Regnier, L.: The structure of multiplicatives. Archive for Mathematical Logic 28(3), 181–203 (1989)
7. de Groote, P.: The non-associative Lambek calculus with product in polynomial time. In: Murray, N.V. (ed.) TABLEAUX 1999. LNCS (LNAI), vol. 1617, pp. 128–139. Springer, Heidelberg (1999)
8. Fowler, T.A.D.: A Graph Formalism for Proofs in the Lambek Calculus with Product. Master's thesis, University of Toronto (2006)
9. Girard, J.Y.: Linear logic. Theoretical Computer Science 50(1), 1–102 (1987)
10. Joshi, A.K., Vijay-Shanker, K., Weir, D.: The convergence of mildly context-sensitive grammar formalisms. In: Foundational Issues in Natural Language Processing, pp. 31–81 (1991)
11. Lambek, J.: The mathematics of sentence structure. American Mathematical Monthly 65(3), 154–170 (1958)
12. Moot, R., Puite, Q.: Proof nets for the multimodal Lambek calculus. Studia Logica 71(3), 415–442 (2002)
13. Penn, G.: A graph-theoretic approach to sequent derivability in the Lambek calculus. Electronic Notes in Theoretical Computer Science 53, 274–295 (2004)
14. Pentus, M.: Lambek calculus is NP-complete. Theoretical Computer Science 357(1–3), 186–201 (2006)
15. Retoré, C.: Perfect matchings and series-parallel graphs: multiplicatives proof nets as R&B-graphs. Electronic Notes in Theoretical Computer Science 3, 167–182 (1996)
16. Roorda, D.: Resource Logics: Proof-Theoretical Investigations. PhD thesis, Universiteit van Amsterdam (1991)
17. Shieber, S.M.: Evidence against the context-freeness of natural language. Linguistics and Philosophy 8(3), 333–343 (1985)
18. Steedman, M.: The Syntactic Process. MIT Press, Cambridge (2000)
19. Tiede, H.J.: Deductive Systems and Grammars: Proofs as Grammatical Structures. PhD thesis, Indiana University (1999)
Proof-Theoretic Semantics for a Natural Language Fragment

Nissim Francez¹ and Roy Dyckhoff²

¹ Computer Science Dept., Technion-IIT, Haifa, Israel
[email protected]
² School of Computer Science, University of St Andrews, Scotland, UK
[email protected]

1 Introduction
We propose a Proof-Theoretic Semantics (PTS) for a (positive) fragment E₀⁺ of Natural Language (NL) (English in this case). The semantics is intended [7] to be incorporated into actual grammars, within the framework of Type-Logical Grammar (TLG) [12]. Thereby, this semantics constitutes an alternative to the traditional model-theoretic semantics (MTS), originating in Montague's seminal work [11], used in TLG. We would like to stress at the outset that this paper is mainly intended to set the stage for the proposed approach, focusing on properties of the system of rules itself. Subsequent work (references provided) focuses on further properties of the semantics itself and expands in depth on several semantic issues. There is no claim that the current paper solves any open semantic problems in MTS. By providing an alternative approach in detail, we prepare the ground for further research that might settle the rivalry between the two approaches. The essence of our proposal is:

– For meanings of sentences, replace truth conditions (in arbitrary models) by canonical derivability conditions (from suitable assumptions). In particular, this involves a "dedicated" proof system (in natural deduction form), based on which the derivability conditions are defined. In a sense, the proof system should reflect the "use" of the sentences in the fragment, and should allow one to recover pre-theoretic properties of the meanings of these sentences such as entailment and assertability conditions. The system should be harmonious, in that its rules have a certain balance between introduction and elimination, in order to qualify as meaning conferring. Two notions of harmony are shown to be satisfied by the proposed rules (see Section 4).
– For subsentential phrases, down to lexical words, replace their denotations (in arbitrary models) as conferring meaning by their contributions to the meanings (derivability conditions) of sentences in which they occur. This adheres to Frege's context principle. As mentioned, this is reported elsewhere ([7]).
The following quotation from [22] (p. 525) emphasizes this lack of applicability to NL, the original reason for considering PTS to start with:

  Although the "meaning as use" approach has been quite prominent for half a century now and provided one of the cornerstones of philosophy of language, in particular of ordinary language philosophy, it has never become prevailing in the formal semantics of artificial and natural languages. In formal semantics, the denotational approach which starts with interpretations of singular terms and predicates, then fixes the meaning of sentences in terms of truth conditions, and finally defines logical consequence as truth preservation under all interpretations, has always dominated.

The main motivation for pursuing PTS originates in the criticism, by several philosophers of language and logicians, of the adequacy of MTS as a theory of meaning, notably Dummett (e.g., [3]), Brandom (e.g., [2]) and others. The most famous criticism is Dummett's manifestation argument, which regards grasping the meaning of a sentence as involving the ability (at least in principle) to verify it, as a condition for its assertability. If, as MTS maintains, meanings are truth conditions (in arbitrary models), manifestation would require deciding whether the truth conditions obtain in a given arbitrary model. This is not the case even for the simplest sentences, involving only predication, as set membership is not decidable¹ in general. In devising a PTS and incorporating it into the grammar of NL, we are not necessarily committing ourselves to the accompanying philosophical positions, such as anti-realism; some of these philosophical principles have been put under scrutiny. Rather, our point of departure is computational linguistics, with its stress on the effectiveness of its methods and theories.

There are several differences in the way the PTS is conceived, owing to the differences between E₀⁺ and the traditional formal calculi for which ND-systems were proposed in logic as a basis for PTS.

– Logical calculi are recursive, in that each operator (connective, quantifier) is applied to (one, two or more) formulas of the calculus to yield another formula. Thus, there is a natural notion of the main operator introduced into/eliminated from a formula. In E₀⁺ there is no such notion of a main operator. In this sense, all E₀⁺ sentences are atomic (in not having a sentence as a constituent).
– Formal calculi are semantically unambiguous, while E₀⁺ (and NL in general) is semantically ambiguous. In a PTS, semantic ambiguity manifests itself via different derivations (from the same assumptions). This will be exemplified below by showing how traditional quantifier scope ambiguity manifests itself (see Section 3.1).
– Formal logical calculi usually have (formal) theorems, each having a proof, i.e. a (closed) derivation from no open assumptions. In natural language (and in particular in the fragment we consider here) there are hardly any formal theorems.
¹ There is no precise statement by Dummett as to what is taken as "decidable". It is plausible, at least in a computational linguistics context, to identify this notion with effectiveness (i.e., algorithmic decidability).
Typically, sentences are contingent and their derivations rely on open assumptions (but see Section 4.2). This difference has a direct influence on the conception of PTS-validity (of arguments, or derivations) (see [5]).
2 The Natural Deduction Proof System

2.1 The NL Core Fragment E₀⁺
We start with the core extensional fragment E₀⁺ of English, with sentences headed by intransitive and transitive verbs, and determiner phrases with a (count) noun² and a determiner. In addition, there is the copula. This is a typical fragment of many NLs, syntactically focusing on subcategorization, and semantically focusing on predication and quantification. Some typical sentences are listed³ below.

(1) every/some girl smiles
(2) every/some girl is a student
(3) every/some girl loves every/some boy

Note the absence of proper names, to be added later in the paper, and of negative determiners like no, not included here (hence the superscript '+' in the names of these positive fragments). Expressions such as every girl, some boy are dps (determiner phrases). Every position that can be filled with a dp is a locus of introduction (of the quantifier corresponding to the determiner of the introduced dp). We note that this fragment contains only two determiners, 'every' and 'some', each treated in a sui generis way. In [1] we present a general treatment of determiners (and dps) in PTS, providing, for example, a proof-theoretic characterization of their monotonicity properties, and capturing proof-theoretically their conservativity, traditionally expressed in model-theoretic terms. A deeper study of negative determiners such as 'no' is also added to the fragment elsewhere.

The Extended Proof-Language L₀⁺. The proof system N₀⁺ is defined over a language L₀⁺, extending E₀⁺, schematizing over it, and disambiguating its sentences. We use X, Y, ... to schematize over nouns⁴, P, Q to schematize over intransitive verbs, and R to schematize over transitive verbs. In addition, L₀⁺ incorporates a countable set P of individual parameters, ranged over by meta-variables (in boldface font) like j, k, r. Syntactically, parameters are also regarded as dps. For simplicity, we consider is a as a single lexical unit, isa. Schematic sentences containing occurrences of parameters, referred to as pseudo-sentences, have a role only in derivations within the proof system; they are artifacts of inference, not of assertion. In the sequel, unless otherwise stated, we use 'sentence' generically both for sentences and for pseudo-sentences.
² Currently, only singular (and not plural) nouns are considered.
³ Throughout, all NL expressions are displayed in the sans-serif font.
⁴ Here nouns are lexical nouns only; later in the paper the language is augmented with compound nouns, also falling under the X, Y, ... schematization.
We use the meta-variable S to range over L₀⁺ sentences. For any dp-expression D having a quantifier, the notation S[(D)ₙ] refers to a sentence S having a designated position filled by D, where n is the scope level (sl) of the quantifier in D. In case D has no quantifier (i.e., it is a parameter), sl = 0. The higher the sl, the higher the scope. For example, S[(some X)₂] has some X in the higher scope, as in (every X)₁ loves (some Y)₂, representing the object wide-scope reading of the E₀⁺ sentence every X loves some Y. We use the conventions that, within a rule, S[D₁] and S[D₂] refer to the same designated position in S, and that when the sl can be unambiguously determined it is omitted. We use r(S) to indicate the rank of S, the highest sl on a dp within S. The notation is extended to more than one dp, where, say, S[(D₁)ᵢ, (D₂)ⱼ] indicates a sentence S with two designated positions filled, respectively, by D₁ and D₂, each with its respective sl. Pseudo-sentences are classified into two groups.

– Ground: Ground pseudo-sentences contain⁵ only parameters in every position that can be filled by a dp. Note that for a ground S, r(S) = 0.
– Non-ground: Non-ground pseudo-sentences contain a dp with a determiner in at least one such position (but not in all).

The ground pseudo-sentences play the role of atomic sentences, and their meaning is assumed given, externally to the ND proof system. The latter defines the sentential meanings of non-ground pseudo-sentences (and, in particular, of E₀⁺ sentences), relative to the given meanings of ground pseudo-sentences.

2.2 The Natural Deduction Proof-System N₀⁺
The presentation is in Gentzen’s “Logistic”-style ND, with shared contexts, single succedent, set antecedent sequents Γ S, formed over contexts of L+ 0 sentences. We enclose discharged assumptions in (indexed) square brackets, using the index to mark the rule-application responsible for the discharge. There are introduction-rules (I-rules) and elimination-rules (E-rules) for each determiner forming a dp, the latter marked for its scope level. The usual notion of (treeshaped) derivation is assumed. We use D for derivations, where DΓ S is a derivation of sentence S∈L+ 0 from context Γ . We use Γ, S for the context extending Γ with sentence S. F (Γ ; j) means j is fresh for Γ . In the rule names, we abbreviate ‘every’ and ‘some’ to ‘e’ and ‘s’, respectively. The meta-rules for N0+ are presented in Figure 1. A word of explanation about the I-rules is due. The scopelevel r(S[j]) is the highest scope of a quantifier already present in S[j]. When a new dp is introduced into the position currently filled by j, it obtains the scope level r(S[j]) + 1, thereby its quantifier becoming the one of the highest scope in the resulting sentence. as for the E-rules, they always eliminate the quantifier of the highest scope. Note that the E-rules are of a format known As generalized elimination, relying on drawing arbitrary consequences from the major premiss. This issue is elaborated upon in Section 4.1, and in a more general setting in [6]. A convenient derived E-rule, shortening derivations is: 5
Note that this use of ‘ground’ is different from the one in logic programming, where it is used for a term without any (free) variables.
$$\frac{}{\Gamma, S \vdash S}\,(\mathrm{Ax}) \qquad \frac{\Gamma, [j\ \text{isa}\ X]^{i} \vdash S[j]}{\Gamma \vdash S[(\text{every}\ X)_{r(S[j])+1}]}\,(eI^{i}) \qquad \frac{\Gamma \vdash j\ \text{isa}\ X \quad \Gamma \vdash S[j]}{\Gamma \vdash S[(\text{some}\ X)_{r(S[j])+1}]}\,(sI)$$

$$\frac{\Gamma \vdash S[(\text{every}\ X)_{r(S[j])+1}] \quad \Gamma \vdash j\ \text{isa}\ X \quad \Gamma, [S[j]]^{i} \vdash S'}{\Gamma \vdash S'}\,(eE^{i})$$

$$\frac{\Gamma \vdash S[(\text{some}\ X)_{r(S[j])+1}] \quad \Gamma, [j\ \text{isa}\ X]^{j}, [S[j]]^{i} \vdash S'}{\Gamma \vdash S'}\,(sE^{i,j})$$

where F(Γ, S[every X]; j) in (eI), and F(Γ, S[some X], S′; j) for (sE).

Fig. 1. The meta-rules for N₀⁺
$$\frac{\Gamma \vdash S[(\text{every}\ X)_{r(S[j])+1}] \quad \Gamma \vdash j\ \text{isa}\ X}{\Gamma \vdash S[j]}\,(e\hat{E})$$

(its derivability is shown in the full paper).

Lemma (weakening⁶): If Γ ⊢ S, then Γ, Γ′ ⊢ S.

The need for a contraction rule and the justification of the freshness condition are in the full paper. Below is an example derivation establishing

some U isa X, (every X)₂ R (some Y)₁, every Y isa Z ⊢ (some U)₁ R (some Z)₂

Let D₁ and D₂ be two sub-derivations. D₁ derives (some U)₂ R (some Y)₁ from some U isa X and (every X)₂ R (some Y)₁: from [r isa X]² and the second premise, (eÊ) yields r R some Y; with [r isa U]¹, (sI) yields (some U)₂ R (some Y)₁; and (sE¹,²) applied to some U isa X discharges both assumptions. D₂ derives (some U)₁ R (some Z)₂ from (some U)₂ R (some Y)₁ and every Y isa Z: from [j isa Y]⁴ and the second premise, (eÊ) yields j isa Z; with [some U R j]³, (sI) yields (some U)₁ R (some Z)₂. The whole derivation combines D₁ and D₂ by an application of (sE³,⁴), concluding (some U)₁ R (some Z)₂.
⁶ Weakening is not really needed, and is introduced here for a technical reason: it yields an easier proof of the termination of proof search in the sequent calculus (see below). Ultimately, there might be a need to remove it.
In PTS in logic, there is a notion of a canonical proof, namely a proof whose last step is an application of an I-rule. In systems where proof normalization obtains, every proof, i.e. every closed derivation (with no open assumptions), can be reduced to a canonical one. Here, in our NL-PTS, we are mainly interested in open derivations, having open assumptions. We extend the notion of canonicity to open derivations and take them (see Section 3) to contribute to sentential meanings. However, as shown in the full paper, not every open derivation can be reduced to a canonical one. We use ⊢ᶜ for canonical derivability, and D : Γ ⊢ᶜ S for a canonical derivation of S from (open) assumptions Γ. Furthermore, ⟦S⟧ᶜ_Γ denotes the collection of all (if any) canonical derivations of S from Γ.
3 The Sentential Proof-Theoretic Meaning
In discussions of PTS in logic, it is usually stated that 'the ND-rules determine the meanings (of the connectives/quantifiers)'. However, no explicit denotational meaning⁷ is defined (a proof-theoretic, not model-theoretic, denotation); in other words, there is no explicit definition of the result of this determination. Thus, one cannot express claims of the form 'the meaning of S has this or that property', or generalizations about all meanings, involving quantification over meanings. In particular, if one wants to apply Frege's context principle to those PTS-meanings, and derive meanings for subsentential phrases (including lexical words) as contributions to sentential meanings, such an explication is needed (see [4] and [7]).

We take the PTS-meaning of an E₀⁺ sentence S, and also of an L₀⁺ non-ground pseudo-sentence S, to be the function from contexts Γ returning the collection of all the canonical derivations in N₀⁺ of S from Γ. For a ground L₀⁺ pseudo-sentence S, its meaning is assumed given, and the meaning of E₀⁺ sentences, as well as of non-ground L₀⁺ pseudo-sentences, is defined relative to the given meanings of ground sentences. In accordance with many views in philosophy of language, every derivation in the meaning of a sentence S can be viewed as providing G⟦S⟧, grounds for asserting S (recall that ground pseudo-sentences do not make any assertion). Semantic equivalence of sentences is based on equality of meaning (and not on inter-derivability). In addition, a weaker semantic equivalence is based on equality of grounds of assertion.

Definition (PTS-meaning, grounds): For a sentence S, or a non-ground pseudo-sentence S, in L₀⁺:

$$[\![S]\!]_{L_0^+} =_{df} \lambda\Gamma.[\![S]\!]^{c}_{\Gamma} \qquad G[\![S]\!] =_{df} \{\Gamma \mid \Gamma \vdash^{c} S\}, \text{ where:}$$

– For S a sentence in E₀⁺, Γ consists of E₀⁺-sentences only. Parameters are not "observable" in grounds for assertion.
– For S a pseudo-sentence in L₀⁺, Γ may also contain pseudo-sentences.

Recall that the meanings of ground pseudo-sentences are given, and meaning for E₀⁺ is defined relative to them.

⁷ Known also as the semantic value.
The main formal property of meanings under this definition is the overcoming (at least for the fragment considered) of the manifestation argument against MTS: asserting a sentence S is based on (algorithmically) decidable grounds (see Section 4.3). A speaker in possession of Γ can decide whether Γ ⊢ᶜ S. Clearly, further research is needed to determine the limits (in terms of the size of the fragment captured) of this effective approach to meaning.

3.1 Interlude: Semantic Ambiguity
Consider one of the well-known features of E₀⁺: quantifier scope ambiguity. The following E₀⁺ sentence is usually attributed two readings, with the following FOL expressions of their respective truth conditions in model-theoretic semantics.

(4) Every girl loves some boy
Subject wide-scope (sws): ∀x.girl(x) → ∃y.boy(y) ∧ love(x, y)
Subject narrow-scope (sns): ∃y.boy(y) ∧ ∀x.girl(x) → love(x, y)

In our PTS, the difference in meanings reflects itself in different derivations, differing in the order of introduction of the subject and object dps. In the subject wide-scope (sws) derivation, (sI) first introduces (some boy)₁ into r loves j (using j isa boy), and (eIⁱ) then discharges [r isa girl]ⁱ to conclude (every girl)₂ loves (some boy)₁. In the subject narrow-scope (sns) derivation, (eIⁱ) first discharges [r isa girl]ⁱ to conclude (every girl)₁ loves j, and (sI) then introduces (some boy)₂ using j isa boy, concluding (every girl)₁ loves (some boy)₂. Note that there is no way to introduce a dp with narrow scope where the dp with the wider scope has already been introduced.

The central pre-theoretic relationship between the two readings is the entailment⁸, present here in the form

(every girl)₁ loves (some boy)₂ ⊢ (every girl)₂ loves (some boy)₁

as shown by the following derivation.
⁸ A more general treatment of truth and entailment among sentences is deferred to [1], where truth under Γ is captured as non-emptiness of the grounds for assertion (for any given Γ).
In schematic form: from [j isa X]¹ and [(every X)₁ R r]², (eÊ) yields j R r; with [r isa Y]³, (sI) yields j R (some Y)₁; (sE²,³) applied to the premise (every X)₁ R (some Y)₂ then discharges assumptions 2 and 3, yielding j R (some Y)₁; and (eI¹) concludes (every X)₂ R (some Y)₁.
Of course, the inverse entailment does not hold.
4 Properties of N₀⁺

4.1 Harmony
The origin of PTS for logic lies already in the work of Gentzen [8], who invented the natural deduction proof system for FOL. He hinted there that the I-rules could be seen as the definition of a logical constant, while the E-rules are nothing more than consequences of this definition. This was later refined into the Inversion Principle by Prawitz ([16]), which shows how the I-rules determine the E-rules. The I-rules were taken as a determination of the meaning of the logical constant under consideration, instead of the model-theoretic interpretation that appeals to truth in a model. However, in view of Prior's [17] attack, presenting a connective 'tonk' whose I-rule was that of disjunction while its E-rule was that of conjunction, thereby trivializing the whole deductive theory by rendering every two propositions inter-derivable, it became apparent that not every combination of ND-rules can serve as a basis for PTS. The notion of harmony of the ND-rules [3], taken in a broad sense to express a certain balance between E-rules and I-rules (absent from the tonk rules), became a serious contender for an appropriateness condition on ND-rules to serve as a basis for a PTS. See [18,20] for a critical discussion of tonk's disharmony. We consider two harmony notions, and show that N₀⁺ satisfies both.

– General-Elimination (GE) harmony: In order to be harmonious, an E-rule has to have a specific form, depending on the corresponding I-rules. This form is known as generalized E-rules, and was considered by [15] as having a better relationship to cut-free sequent-calculus derivations. Such an E-rule allows drawing an arbitrary conclusion, provided it is derivable from the premisses of the corresponding I-rule(s). This form guarantees that the inversion principle obtains, and leads to the availability of proof reduction, the elimination of a detour caused by an introduction immediately followed by an elimination. This underlies proof normalization, and also constitutes a requirement of intrinsic harmony (see below). Proof normalization (in its strong version) requires that there be no possibility of an infinite sequence of such reductions (see [21] for a general discussion of the role of normalization in PTS). In [6], we show that a rule form generalizing a proposal by Read [18] guarantees the availability of the required reduction. All the E-rules in N₀⁺ are of this generalized-elimination form, hence N₀⁺ is GE-harmonious.
– Local Intrinsic harmony: Here, no constraint on the form of the E-rules is imposed, but they have to stand in a certain relationship to the I-rules, directly reflecting the required balance among them. We consider a specific proposal by [14], based on two properties known as local soundness and local completeness.
  – Local Soundness: Every introduction followed directly by an elimination can be reduced. This shows that the elimination rules are not too strong w.r.t. the I-rules.
  – Local Completeness: There is a way to eliminate⁹ and to reintroduce, recovering the original formula. This process is called expansion. It shows that the E-rules are not too weak w.r.t. the I-rules.

In the case of logic, introduction and elimination are of a top-level operator. Here, they refer to the introduction of a dp into every allowable position (and any scope level), and elimination from the same position. We show local intrinsic harmony (in the above sense) for (eI) ((sI) is similar), even though [6] shows this follows from the form of the rules. We do, however, omit showing the reductions/expansions for the extensions of the fragment presented below.

Local soundness. A derivation ending in (eIⁱ) applied to D₁ : Γ, [j isa X]ⁱ ⊢ S[j], immediately followed by (eEʲ) with minor premises D₂ : Γ ⊢ k isa X and D₃ : Γ, [S[k]]ʲ ⊢ S′, reduces (⇝ᵣ) to the derivation of S′ that feeds D₃ with the derivation D₁[k isa X/j isa X, k/j] of S[k]. Here D₁[k isa X/j isa X, k/j] denotes the derivation in which every use of the assumption j isa X is replaced by the derivation D₂ of its variant k isa X. Since j is fresh for the assumptions on which D₁ depends, the replacement of j by k is permissible.

Local completeness. Any derivation D : Γ ⊢ S[(every X)] expands (⇝ₑ) into the derivation that applies (eE²) to D with the assumptions [j isa X]¹ and [S[j]]², obtaining S[j], and then reintroduces the dp by (eI¹), recovering S[(every X)].
There are also other views of harmony, e.g., based on a conservative extension of the theory of the introduced operator [3]. 9
‘Eliminate’ here means applying an E-rule, not necessarily actually eliminating the operator occurrence at hand.
4.2 Closed Derivations
A derivation of Γ ⊢ S is closed iff Γ = ∅. In logic, closed derivations are a central topic, determining the (formal) theorems of the logic. In particular, for bivalent logics, they induce the (syntactic) notions of tautology and contradiction. In L₀⁺, in the absence of negation and of negative determiners (like no), there is no natural notion of a contradiction. Furthermore, the only "positive" closed derivations in N₀⁺ are for sentences of the form every X isa X: from [j isa X]¹ one has j isa X, and (eI¹) concludes every X isa X. In particular, note that ⊢ some X isa X does not hold. We refer to [5] for a discussion of the influence of the (almost) absent closed derivations on the notion of proof-theoretic validity, the PTS counterpart of the model-theoretic view of validity as truth preservation.
4.3 Decidability of N₀⁺ Derivability
We now attend to the decidability of derivability in N₀⁺; it makes PTS-based meaning effective for L₀⁺. Figure 2 displays a sequent calculus SC₀⁺ for L₀⁺, easily shown equivalent to N₀⁺ (in having the same provable sequents). The rules are arranged in the usual way, with L-rules (introduction in the antecedent) and R-rules (introduction in the succedent). The following claims are routinely
$$\frac{}{\Gamma, S \vdash S}\,(\mathrm{ID}) \qquad \frac{\Gamma, j\ \text{isa}\ X, S[j] \vdash S'}{\Gamma, S[(\text{some}\ X)_{r(S[j])+1}] \vdash S'}\,(Ls)$$

$$\frac{\Gamma, j\ \text{isa}\ X, S[(\text{every}\ X)_{r(S[j])+1}], S[j] \vdash S'}{\Gamma, j\ \text{isa}\ X, S[(\text{every}\ X)_{r(S[j])+1}] \vdash S'}\,(Le)$$

$$\frac{\Gamma, j\ \text{isa}\ X \vdash S[j]}{\Gamma \vdash S[(\text{every}\ X)_{r(S[j])+1}]}\,(Re) \qquad \frac{\Gamma \vdash j\ \text{isa}\ X \quad \Gamma \vdash S[j]}{\Gamma \vdash S[(\text{some}\ X)_{r(S[j])+1}]}\,(Rs)$$

where j is fresh in Re and Ls.

Fig. 2. A sequent calculus SC₀⁺ for L₀⁺
established for SC₀⁺:

– The structural rules of weakening (W) and contraction (C) are admissible.
– (Cut) is admissible.

The full paper shows the existence of a terminating proof-search procedure.
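The shape of that proof-search procedure can be sketched as a generic backward search. Everything rule-specific (the substitution notation S[j], scope levels, fresh parameters) is abstracted into an expand function that we merely assume, so this is a skeleton rather than an implementation of SC₀⁺.

```python
def provable(goal, expand, seen=frozenset()):
    """Backward proof search for a sequent calculus (a skeleton sketch).

    `goal` is a sequent; expand(goal) yields, for every applicable rule
    instance of (ID), (Ls), (Le), (Re) and (Rs), the list of its premise
    sequents (the empty list for (ID), which makes all(...) succeed).
    The loop check mirrors the termination argument referred to above."""
    if goal in seen:
        return False
    return any(all(provable(p, expand, seen | {goal}) for p in premises)
               for premises in expand(goal))
```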
5 Extending the Fragment
Next, we consider some simple extensions of E₀⁺ (and the induced extensions of L₀⁺). The first one adds¹⁰ proper names; the other two are related to extending the notion of noun. In E₀⁺ we had only primitive nouns; we now consider two forms of compound noun, one formed by adding adjectives and the other by adding relative clauses. In both cases, in the corresponding extensions of N₀⁺, we let X, Y schematize over compound nouns also in the original rules.
5.1 Adding Proper Names
We extend E₀⁺ with proper names in dp positions. Typical sentences are:

(5) Rachel is a girl
(6) Rachel smiles
(7) Rachel loves every/some boy
(8) every boy loves Rachel

Proper names are strictly distinct from parameters in the way they function in the proof system, as explained below. We retain the name E₀⁺ for this (minor) extension. In L₀⁺, let proper names be schematized by N, and add pseudo-sentences of the forms

(9) j is N, N is j
(10) j is k, N is M

Note that pseudo-sentences having a proper name in any dp-position are not ground! First, we add I-rules and E-rules for is (a disguised identity). We adopt a version of the rules in [19].

$$\frac{\Gamma, [S[j]]^{1} \vdash S[k]}{\Gamma \vdash j\ \text{is}\ k}\,(isI^{1}) \qquad \frac{\Gamma \vdash j\ \text{is}\ k \quad \Gamma \vdash S[j] \quad \Gamma, [S[k]]^{1} \vdash S'}{\Gamma \vdash S'}\,(isE^{1})$$

where S does not occur in Γ. From these, we can derive rules for reflexivity (is-refl), symmetry (is-sym) and transitivity (is-tr). To shorten the presentation of derivations, combinations of these rules are still referred to as applications of (isE). Next, we incorporate I-rules and E-rules for proper names in dp-positions.

$$\frac{\Gamma \vdash j\ \text{is}\ N \quad \Gamma \vdash S[j]}{\Gamma \vdash S[N]}\,(nI) \qquad \frac{\Gamma \vdash S[N] \quad \Gamma, [j\ \text{is}\ N]^{1}, [S[j]]^{2} \vdash S'}{\Gamma \vdash S'}\,(nE^{1,2}), \quad j\ \text{fresh for}\ \Gamma, S'$$
Below are two example derivations.

Rachel isa girl, every girl smiles ⊢ Rachel smiles: Note that Rachel is not a parameter, so (eÊ) is not directly applicable.
¹⁰ This is different from the role of names in [13]; his names are our parameters. He has no proper names provided by the NL fragment itself.
From [r isa girl]² and every girl smiles, (eÊ) yields r smiles; with [r is Rachel]¹, (nI) yields Rachel smiles; finally, (nE¹,²) applied to Rachel isa girl discharges both assumptions, concluding Rachel smiles.

Rachel isa girl, Rachel smiles ⊢ some girl smiles: Again, since Rachel is not a parameter, (sI) is not directly applicable. From [r₁ is Rachel]¹ and [r₂ is Rachel]³, (isE) yields r₁ is r₂; with [r₁ isa girl]², (isE) yields r₂ isa girl; together with [r₂ smiles]⁴, (sI) yields some girl smiles; (nE³,⁴) applied to Rachel smiles and then (nE¹,²) applied to Rachel isa girl discharge the remaining assumptions, concluding some girl smiles.
The corresponding extension of the sequent calculus SC₀⁺ consists of the following rules.

$$\frac{\Gamma, j\ \text{is}\ N, S[j] \vdash S'}{\Gamma, S[N] \vdash S'}\,(Ln) \qquad \frac{\Gamma \vdash j\ \text{is}\ N \quad \Gamma \vdash S[j]}{\Gamma \vdash S[N]}\,(Rn)$$
5.2 Adding Adjectives
We augment E₀⁺ with sentences containing adjectives, schematized by A. We consider here only what are known in MTS as intersective adjectives. Typical sentences are:

(11) Rachel is a beautiful girl/clever beautiful girl/clever beautiful red-headed girl
(12) Rachel/every girl/some girl is beautiful
(13) Rachel/every beautiful girl/some beautiful girl smiles
(14) Rachel/every beautiful girl/some beautiful girl loves Jacob/every clever boy/some clever boy

A noun preceded by an adjective is again a (compound) noun. We denote this extension by E₀,adj⁺. Recall that, in the N₀⁺ rules, the noun schematization should now range over compound nouns too. Note that E₀,adj⁺ is no longer finite, as an unbounded number of adjectives may precede a noun. We augment N₀⁺ with the following ND-rules for adjectives.

$$\frac{\Gamma \vdash j\ \text{isa}\ X \quad \Gamma \vdash j\ \text{is}\ A}{\Gamma \vdash j\ \text{isa}\ A\,X}\,(adjI) \qquad \frac{\Gamma \vdash j\ \text{isa}\ A\,X \quad \Gamma, [j\ \text{isa}\ X]^{1}, [j\ \text{is}\ A]^{2} \vdash S'}{\Gamma \vdash S'}\,(adjE^{1,2})$$
Let the resulting system be N₀,adj⁺. Again, we can obtain the following derived elimination rules, used to shorten the presentation of example derivations.
$$\frac{\Gamma \vdash j\ \text{isa}\ A\,X}{\Gamma \vdash j\ \text{isa}\ X}\,(adj\hat{E}_{1}) \qquad \frac{\Gamma \vdash j\ \text{isa}\ A\,X}{\Gamma \vdash j\ \text{is}\ A}\,(adj\hat{E}_{2})$$
Note that intersectivity here is manifested by the rules themselves (which embody an "invisible" conjunctive operator) at the sentential level. These rules induce intersectivity as a lexical property of (some) adjectives through the way lexical meanings are extracted from sentential meanings, as shown in [7]. The following sequent, the corresponding entailment of which is often taken as the definition of intersective adjectives, is derivable in N₀,adj⁺:

j isa A X, j isa Y ⊢ j isa A Y

by applying (adjÊ₂) to j isa A X to obtain j is A, and then (adjI) to j isa Y and j is A.

As an example of derivations using the rules for adjectives, consider the derivation for j loves every girl ⊢ j loves every beautiful girl. In MTS terminology, the corresponding entailment is a witness to the downward monotonicity of the meaning of every in its second argument. Using an obvious schematization: from [r isa A Y]¹, (adjÊ₁) yields r isa Y; (eÊ) applied to j R every Y then yields j R r; and (eI¹) concludes j R every A Y. A proof-theoretic reconstruction of monotonicity is presented in [1].

Under this definition of the meaning of intersective adjectives, such adjectives are also extensional, in the sense of satisfying the entailment every X isa Y ⊢ every A X isa A Y: from [j isa A X]¹, (adjÊ₁) yields j isa X, whence (eÊ) with every X isa Y yields j isa Y; (adjÊ₂) on the same assumption yields j is A; (adjI) then yields j isa A Y, and (eI¹) concludes every A X isa A Y.

Decidability of derivability remains intact, by adding to SC₀⁺ the following two rules, obtaining thereby a sequent calculus SC₀,adj⁺ for L₀,adj⁺.

$$\frac{\Gamma, j\ \text{is}\ A, j\ \text{isa}\ X \vdash S'}{\Gamma, j\ \text{isa}\ A\,X \vdash S'}\,(Ladj) \qquad \frac{\Gamma \vdash j\ \text{is}\ A \quad \Gamma \vdash j\ \text{isa}\ X}{\Gamma \vdash j\ \text{isa}\ A\,X}\,(Radj)$$
5.3 Adding Relative Clauses
We next add relative clauses (rcs) to the fragment. This fragment transcends the locality of subcategorization in E₀⁺, in having long-distance dependencies. We refer to this (still positive) fragment as E₁⁺. Typical sentences include the following.

(15) Jacob/every boy/some boy loves every/some girl who(m) smiles/loves every flower/Rachel loves
(16) Rachel/every girl/some girl is a girl who loves Jacob/every boy
(17) Jacob loves every girl who loves every boy who smiles (nested relative clause)

So girl who smiles and girl who loves every boy are compound nouns. We treat the case of the relative pronoun somewhat loosely, in the form of who(m), abbreviating either who or whom, as the case requires. Note that E₁⁺, by its nesting of rcs, expands the stock of available positions for dp-introduction/elimination. Thus, in (17), 'every boy who smiles' is the object of the relative clause modifying the object of the matrix clause. In addition, new scope relationships arise among the multitude of dps present in E₁⁺ sentences. Island conditions, preventing some of the scopal relationships, are ignored here. The corresponding ND-system N₁⁺ extends N₀⁺ by adding the following I-rules and E-rules. For their formulation, we extend the distinguished-position notation with S[−], indicating that the position is unfilled. For example, loves every girl and every girl loves have their subject and object dp positions, respectively, unfilled.

$$\frac{\Gamma \vdash j\ \text{isa}\ X \quad \Gamma \vdash S[j]}{\Gamma \vdash j\ \text{isa}\ X\ \text{who}\ S[-]}\,(relI) \qquad \frac{\Gamma \vdash j\ \text{isa}\ X\ \text{who}\ S[-] \quad \Gamma, [j\ \text{isa}\ X]^{1}, [S[j]]^{2} \vdash S'}{\Gamma \vdash S'}\,(relE^{1,2}), \quad j\ \text{fresh}$$
The simplified elimination rules are:

$$\frac{\Gamma \vdash j\ \text{isa}\ X\ \text{who}\ S[-]}{\Gamma \vdash j\ \text{isa}\ X}\,(rel\hat{E}_{1}) \qquad \frac{\Gamma \vdash j\ \text{isa}\ X\ \text{who}\ S[-]}{\Gamma \vdash S[j]}\,(rel\hat{E}_{2})$$
As an example of a derivation in this fragment, consider

some girl who smiles sings ⊢_{N₁⁺} some girl sings

exhibiting the model-theoretic upward monotonicity of some in its first argument. Schematically: from [r isa X who P₁]¹, (relÊ₁) yields r isa X; with [r P₂]², (sI) yields some X P₂; and (sE¹,²) applied to some X who P₁ P₂ discharges both assumptions, concluding some X P₂.
Similarly, the following witness of the downward monotonicity of 'every' (in its first argument) can be derived:

every girl sings ⊢_{N₁⁺} every girl who smiles sings

From [j isa girl who smiles]¹, (relÊ₁) yields j isa girl; (eÊ) applied to every girl sings then yields j sings, and (eI¹) concludes every girl who smiles sings. Once again, decidability of derivability is shown by means of the following additional sequent-calculus rules, added to SC₀⁺ to form SC₁⁺.

$$\frac{\Gamma, j\ \text{isa}\ X, S[j] \vdash S'}{\Gamma, j\ \text{isa}\ X\ \text{who}\ S[-] \vdash S'}\,(Lrel) \qquad \frac{\Gamma \vdash j\ \text{isa}\ X \quad \Gamma \vdash S[j]}{\Gamma \vdash j\ \text{isa}\ X\ \text{who}\ S[-]}\,(Rrel)$$
6 Conclusions
The assignment of proof-theoretic meanings to NL sentences and to subsentential phrases is, to the best of our knowledge, completely new. There is a vast literature on the use of proof theory in deriving meanings; however, the derived meanings are all model-theoretic. Besides the traditional meaning derivation in TLG, relying on the Curry-Howard correspondence, there is also a similar approach in LFG called 'glue', using linear logic for the derivations. There are also approaches, like [9], that read off meanings from proof nets instead of derivations. In all these approaches, the common theme is that some proof-theoretic object is used for deriving meanings, but does not itself constitute the meaning; the latter is usually formulated in some (extension of a) λ-calculus, the terms of which are interpreted model-theoretically in Henkin models. There is also a body of work going under the title of PTS for sentential meanings, based on constructive type theory (MLTT), which is clearly related to, but, we believe, different from our approach to PTS. The differences are discussed in the full paper.
Acknowledgements. The work was supported by EPSRC grant EP/D064015/1, and by grant 2006938 of the Israeli Academy for Sciences (ISF). We thank the following colleagues and students for various illuminating discussions, and for critical remarks on preliminary drafts: Gilad Ben-Avi, Iddo Ben-Zvi, Ole Hjortland, James Mckinna, Larry Moss, Dag Prawitz, Stephen Read.
References

1. Ben-Avi, G., Francez, N.: A proof-theoretic reconstruction of generalized quantifiers (2009) (submitted for publication)
2. Brandom, R.B.: Articulating Reasons. Harvard University Press, Cambridge (2000)
3. Dummett, M.: The Logical Basis of Metaphysics. Harvard University Press, Cambridge (1991)
4. Francez, N., Ben-Avi, G.: Proof-theoretic semantic values for logical operators. Synthese (2009) (under refereeing)
5. Francez, N., Dyckhoff, R.: A note on proof-theoretic validity (2007) (in preparation)
6. Francez, N., Dyckhoff, R.: A note on harmony. Journal of Philosophical Logic (2007) (submitted)
7. Francez, N., Dyckhoff, R., Ben-Avi, G.: Proof-theoretic semantics for subsentential phrases. Studia Logica 94, 381–401 (2010), doi:10.1007/s11225-010-9241-y
8. Gentzen, G.: Investigations into logical deduction. In: Szabo, M. (ed.) The Collected Papers of Gerhard Gentzen, pp. 68–131. North-Holland, Amsterdam (1935) (English translation of the 1935 paper in German)
9. de Groote, P., Retoré, C.: On the semantic readings of proof-nets. In: Kruijff, G.J., Oehrle, D. (eds.) Formal Grammar, pp. 57–70. FOLLI (1996)
10. Kremer, M.: Read on identity and harmony – a friendly correction and simplification. Analysis 67(2), 157–159 (2007)
11. Montague, R.: The proper treatment of quantification in ordinary English. In: Hintikka, J., Moravcsik, J., Suppes, P. (eds.) Approaches to Natural Language. Reidel, Dordrecht (1973); proceedings of the 1970 Stanford workshop on grammar and semantics
12. Moortgat, M.: Categorial type logics. In: van Benthem, J., ter Meulen, A. (eds.) Handbook of Logic and Language, pp. 93–178. North-Holland, Amsterdam (1997)
13. Moss, L.: Syllogistic logics with verbs. Journal of Logic and Information (to appear, 2010)
14. Pfenning, F., Davies, R.: A judgmental reconstruction of modal logic. Mathematical Structures in Computer Science 11, 511–540 (2001)
15. von Plato, J.: Natural deduction with general elimination rules. Archive for Mathematical Logic 40, 541–567 (2001)
16. Prawitz, D.: Natural Deduction: A Proof-Theoretical Study. Almqvist & Wiksell, Stockholm (1965)
17. Prior, A.N.: The runabout inference-ticket. Analysis 21, 38–39 (1960)
18. Read, S.: Harmony and autonomy in classical logic. Journal of Philosophical Logic 29, 123–154 (2000)
19. Read, S.: Identity and harmony. Analysis 64(2), 113–119 (2004); see correction in [10]
20. Read, S.: Harmony and modality. In: Dégremont, C., Kieff, L., Rückert, H. (eds.) Dialogues, Logics and Other Strange Things: Essays in Honour of Shahid Rahman, pp. 285–303. College Publications (2008)
21. Restall, G.: Proof theory and meaning: on the context of deducibility. In: Proceedings of Logica 2007, Hejnice, Czech Republic (2007)
22. Schroeder-Heister, P.: Validity concepts in proof-theoretic semantics. In: Kahle, R., Schroeder-Heister, P. (eds.) Proof-Theoretic Semantics. Synthese, vol. 148, pp. 525–571 (February 2006), special issue
Some Interdefinability Results for Syntactic Constraint Classes

Thomas Graf
Department of Linguistics, University of California, Los Angeles
[email protected] http://tgraf.bol.ucla.edu
Abstract. Choosing as my vantage point the linguistically motivated Müller-Sternefeld hierarchy [23], which classifies constraints according to their locality properties, I investigate the interplay of various syntactic constraint classes on a formal level. For non-comparative constraints, I use Rogers's framework of multi-dimensional trees [31] to state Müller and Sternefeld's definitions in general yet rigorous terms that are compatible with a wide range of syntactic theories, and I formulate conditions under which distinct non-comparative constraints are equivalent. Comparative constraints, on the other hand, are shown to be best understood in terms of optimality systems [5]. From this I derive that some of them are reducible to non-comparative constraints. The results jointly vindicate a broadly construed version of the Müller-Sternefeld hierarchy, yet they also support a refined picture of constraint interaction that has profound repercussions for both the study of locality phenomena in natural language and how the complexity of linguistic proposals is to be assessed.

Keywords: Syntactic constraints, Transderivationality, Economy conditions, Model theoretic syntax, Multi-dimensional trees, Optimality systems.
1 Introduction
Constraints are arguably one of the most prominent tools in modern syntactic analysis. Although the dominance of derivational approaches in the linguistic mainstream since the inception of Chomsky's Minimalist Program [2,3] might suggest otherwise, generative frameworks still feature a dazzling diversity of principles and well-formedness conditions. The array of commonly assumed constraints ranges from the well-established Shortest Move Constraint to the fiercely debated principles of binding theory, but we also find slightly more esoteric proposals such as Rule I [26], MaxElide [35], GPSG's Exhaustive Constant Partial Ordering Axiom [6] or the almost forgotten Avoid Pronoun Principle of classic GB. A closer examination of these constraints shows that they differ significantly in the structures they operate on and how they succeed at restricting the set of expressions. A natural question to ask, then, is if we can identify commonalities
between different constraints, and what the formal and linguistic content of these commonalities might be. The Müller-Sternefeld (MS) hierarchy [23,21] is — to my knowledge — the only articulate attempt at a classification of linguistic constraints so far. Basing their analysis on linguistic reasoning grounded in locality considerations, Müller and Sternefeld distinguish several kinds of constraints, which in turn can be grouped into two bigger classes. The first one is the class of non-comparative constraints (NCCs): representational constraints are well-formedness conditions on standard trees (e.g. ECP, government), derivational constraints restrict the shape of trees that are adjacent in a derivation (e.g. Shortest Move), and global constraints apply to derivationally non-adjacent trees (e.g. Projection Principle). The second class is instantiated by comparative constraints (CCs), which operate on sets of structures. Given a set of structures, a CC returns the best member(s) of this set, which is usually called the optimal candidate. Crucially, the optimal candidate does not have to be well-formed — it just has to be better than the competing candidates. Müller slightly revises this picture in [21] and further distinguishes CCs according to the type of structures they operate on. If the structures in question are trees, the constraint is called translocal (e.g. Avoid Pronoun Principle); if they are derivations, it is called transderivational (e.g. Fewest Steps, MaxElide, Rule I). Finally, it is also maintained in [21] that these five subclasses can be partially ordered by their expressivity: representational = derivational < global < translocal < transderivational. A parametric depiction of the constraint classification and the expressivity hierarchy, which jointly make up the MS-hierarchy, is given in Fig. 1. The MS-hierarchy has a strong intuitive appeal, at least insofar as derivations, long-distance restrictions and operations on sets seem more complex than representations, strictly local restrictions and operations on trees, respectively. However, counterexamples are readily at hand. For instance, it is a simple coding
[Fig. 1. The Müller-Sternefeld hierarchy of constraints — a classification tree: non-comparative constraints divide into representational and derivational constraints on adjacent nodes (Level 1) and global constraints on arbitrary nodes (Level 2); comparative constraints divide into translocal constraints over sets of representations (Level 3) and transderivational constraints over sets of derivations (Level 4).]
exercise to implement any transderivational constraint as a global constraint by concatenating the distinct derivations into one big derivation, provided there are no substantial restrictions on how we may enrich our grammar formalism. As another example, it was shown in [16] that Minimalist Grammars with the Specifier Island Constraint (SPIC) but without the Shortest Move Constraint can generate any type-0 language. But the SPIC is a very simple derivational constraint, so it unequivocally belongs to the weakest class in the hierarchy, which is at odds with its unexpected effects on expressivity. Therefore, the MS-hierarchy makes the wrong predictions in its current form, or rather, it makes no predictions at all, because its notion of complexity and its assumptions concerning the power of the syntactic framework are left unspecified. In this paper, I show how a model theoretically informed perspective does away with these shortcomings and enables us to refine the MS-hierarchy such that the relations between constraint classes can be studied in a rigorous yet linguistically insightful way. In particular, I adopt Rogers’s multi-dimensional trees framework [31] as a restricted metatheory of linguistic proposals in order to ensure that the results hold for a wide range of syntactic theories. We proceed as follows: After a brief discussion of technical preliminaries I move on to the definition of classes of NCCs in Sect. 3 and study their behavior and interrelationship in arbitrary multi-dimensional tree grammars. I show that a proper subclass of the global constraints can be reduced to local constraints. In Sect. 4, I then turn to a discussion of CCs, why they require the model theoretic approach to be supplemented by optimality systems [5], and which CCs can be reduced to NCCs.
2 Preliminaries
Most of my results I couch in terms of the multi-dimensional tree (MDT) framework developed by Rogers [31,32]. The main appeal of MDTs for this endeavor is that they make it possible to abstract away from theory-specific idiosyncrasies. This allows for general characterizations of constraint classes and their reducibility that hold for a diverse range of linguistic theories. MDT renditions of GB [29], GPSG [27] and TAG [30] have already been developed; the translation procedure from HPSG to TAG defined in [14] should allow us to rein in (a fragment of) the former as well. Further, recent results suggest that an approximation of Minimalist Grammars [33] is feasible, too: for every Minimalist Grammar we can construct a strongly equivalent k-MCFG [20], and for each k ≥ 2, the class of 2^{k−1}-MCFLs properly includes the class of level-k control languages [12], which in turn are equivalent to the string yield of the set of (k + 1)-dimensional trees [31]. While initially intimidating due to cumbersome notation, MDTs are fairly easy to grasp at an intuitive level. Looking at familiar cases first, we note that a string can be understood as a unary branching tree, a set of nodes ordered by the precedence relation. But as there is only one axis along which its nodes are ordered, it is reasonable to call a string a one-dimensional tree, rather than a
unary branching one. In a standard tree, on the other hand, the set of nodes is ordered by two relations, usually called dominance and precedence. Suppose s is the mother of two nodes t and u in some standard tree, and also assume that t precedes u. Then we might say that s dominates the string tu. Given our new perspective on strings as one-dimensional trees, this suggests to construe standard trees as relating nodes to one-dimensional trees by immediate dominance. Thus it makes only sense to refer to them as two-dimensional objects. But from here it is only a small step to the concept of MDTs. A three-dimensional tree (see Fig. 2 for an example) relates nodes to two-dimensional, i.e. standard trees (for readers familiar with TAG, it might be helpful to know that three-dimensional trees correspond to TAG derivations). A four-dimensional tree relates nodes to three-dimensional trees, and so on. In general, a d-dimensional tree is a set of nodes ordered by d dominance relations such that the nth dominance relation relates nodes to (n − 1)-dimensional trees (for d = 1, assume that single nodes are zero-dimensional trees).

[Fig. 2. A T³ (with O a foot node), its node addresses and its 2-dimensional yield.]

To make this precise, we define d-dimensional trees as generalizations of Gorn tree domains. First, let a higher-order sequence be defined inductively as follows:

– ⁰1 := {1}
– ⁿ⁺¹1 is the smallest set containing ⟨⟩ and, if both ⟨x₁, …, x_l⟩ ∈ ⁿ⁺¹1 and y ∈ ⁿ1, then ⟨x₁, …, x_l, y⟩ ∈ ⁿ⁺¹1.

Concatenation of sequences is denoted by · and defined only for sequences of the same order. A 0-dimensional tree is either ∅ or {1}. For d ≥ 1, a d-dimensional tree T^d is a set of d-th-order sequences satisfying

– T^d ⊆ ᵈ1, and
– ∀s, t ∈ ᵈ1 [s · t ∈ T^d → s ∈ T^d], and
– ∀s ∈ ᵈ1 [{w ∈ ⁽ᵈ⁻¹⁾1 | s · w ∈ T^d} is a (d − 1)-dimensional tree].

The reader might want to take a look at Fig. 2 again for a better understanding of the correspondence between sequences and tree nodes (first-order sequences are represented by numerals to improve readability; e.g. 0 = ⟨⟩ and 2 = ⟨1, 1⟩). Several important notions are straightforwardly defined in terms of higher-order sequences. The leaves of T^d are the nodes at addresses that are not properly extended by any other address in T^d. The depth of T^d is the length of its longest top-level sequence, which in more intuitive terms corresponds to the length of the longest path of successors at dimension d from the root to a leaf. Given a T^d and some node s of T^d, the child structure of s in T^d is the set {t ∈ T^{d−1} | s · t ∈ T^d}. For example, the child structure of B in Fig. 2 is the T² with its root labeled D. For any T^d and 1 ≤ i ≤ d, its branching factor at dimension i is 1 plus the maximum depth of the T^{i−1} child structures contained by T^d. If the branching factor of some T^d is at most n for all dimensions 1 ≤ i ≤ d, we call it n-branching and write T^d_n. For any non-empty alphabet Σ, T^d_Σ := ⟨T, ℓ⟩, T a T^d and ℓ a function from Σ to ℘(T), is a Σ-labeled d-dimensional tree. In general, we require all trees to be labeled and simply write T^d_Σ. The i-dimensional yield of T^d_Σ is obtained by recursively rewriting all nodes at dimension j > i, starting at dimension d, by their (j − 1)-dimensional child structure. Trees with more than two dimensions have some of their leaves at each dimension i > 2 marked as foot nodes, which are the joints where the (i − 1) child structures are merged together. In forming the 2-dimensional yield of our example tree, K is rewritten by the 2-dimensional tree rooted by M. The daughter of K ends up dominated by O rather than N or P because O is marked as the foot node. For a sufficiently rigorous description of how the i-dimensional yield is computed, see [31, p. 281–283] and [32, p. 301–307]. A sequence s₁, …, s_m of nodes of T^d, m ≥ 1, is an i-path iff, with respect to the i-dimensional yield of T^d, s₁ is the root, s_m a leaf, and for all s_j, s_{j+1}, 1 ≤ j < m, it holds that s_j immediately dominates s_{j+1} at dimension i. The set of all i-paths of T^d is its i-path language. A set of T^d_Σ's is also called a T^d language, denoted L^d_Σ. Unless stated otherwise, the branching factor is assumed to be bounded for every L^d_Σ, that is to say, there is some n ∈ ℕ such that each T ∈ L^d_Σ is n-branching. Call T^d local iff its depth is 1. In Fig. 2, the T³ rooted by K and the T² rooted by M are local; the T³ rooted by B is also local, even though its child structure, the T² rooted by D, is not. A T^d grammar G^d_Σ over an alphabet Σ is a finite language of local T^d_Σ's. Let G^d_Σ(Σ₀) denote the set of T^d_Σ's licensed by a grammar G^d_Σ relative to a set of initial symbols Σ₀ ⊆ Σ, which is the set of all T^d_Σ's with their root labeled by a symbol drawn from Σ₀ and each of their local d-dimensional subtrees contained in G^d_Σ. A language L^d_Σ is a local set iff it is G^d_Σ(Σ₀) for some G^d_Σ and some Σ₀ ⊆ Σ. Intuitively, a local set of T^d's is a T^d language where all trees can be built up from local trees. An important fact about local sets is that they are fully characterized by subtree substitution closure.
Theorem 1 (Subtree substitution closure). L^d_Σ is a local set of T^d_Σ's iff for all T, T′ ∈ L^d_Σ, all s ∈ T and all t ∈ T′, if s and t have the same label, then the result of substituting the subtree rooted by s for the subtree rooted by t is in L^d_Σ.

Proof. An easy lift of the proof in [28] to arbitrary dimension d.
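As a concrete, deliberately toy illustration of local sets and Theorem 1 — our own sketch in Python, with an invented grammar and labels — a 2-dimensional tree grammar can be encoded as a finite set of depth-1 trees, and subtree substitution checked directly:

    # A grammar is a finite set of local (depth-1) trees, encoded as
    # (root_label, tuple_of_child_labels); leaves carry the empty tuple.
    grammar = {("S", ("A", "B")), ("A", ()), ("A", ("A", "B")), ("B", ())}
    initial = {"S"}

    def licensed(tree, root=True):
        # licensed iff the root label is initial and every local
        # depth-1 subtree occurs in the grammar
        label, children = tree
        if root and label not in initial:
            return False
        if (label, tuple(c[0] for c in children)) not in grammar:
            return False
        return all(licensed(c, root=False) for c in children)

    def substitute(tree, target, replacement):
        # replace every outermost subtree whose root is labeled `target`
        label, children = tree
        if label == target:
            return replacement
        return (label, tuple(substitute(c, target, replacement) for c in children))

    t = ("S", (("A", (("A", ()), ("B", ()))), ("B", ())))
    assert licensed(t)
    assert licensed(substitute(t, "A", ("A", ())))  # closure under substitution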
For our logical approach, we interpret a T^d_{n,Σ} as an initial segment of the relational structure 𝕋^d_n := ⟨T^d_n, ◁_i⟩_{1≤i≤d}, where T^d_n is the infinite T^d in which every point has a child structure of depth n − 1 in all its dimensions, and where ◁_i denotes immediate dominance at dimension i; that is, x ◁_i y iff y is the immediate successor of x in the i-th dimension:

  x ◁_d y iff y = x · s
  x ◁_{d−1} y iff x = p · s and y = p · (s · w)
  …
  x ◁_1 y iff x = p · s · ⟨… w …⟩ and y = p · s · ⟨… w · 1 …⟩

The weak monadic second-order logic for 𝕋^d_n is denoted by mso^d and includes — besides the usual connectives, quantifiers and grouping symbols — constants for each ◁_i, 1 ≤ i ≤ d, and two countably infinite sets of variables ranging over individuals and finite subsets, respectively. As usual, we write 𝕋^d_n ⊨ φ[s] to assert that φ is satisfied in 𝕋^d_n under assignment s. For any T^d, all quantifiers are assumed to be implicitly restricted to the initial segment of 𝕋^d_n corresponding to T^d. The set of models of φ is denoted by Mod(φ). This notation extends to sets of formulas in the obvious way. Note that L^d_Σ is recognizable iff L^d_Σ = Mod(Φ) for some set Φ of mso^d formulas.

Let me close this section with several minor remarks. The notation A \ B is used to denote set difference. Regular expressions are employed at certain points in the usual way, with the small addition of x^{≤1} as a stand-in for the empty string and x. Finally, I will liberally drop subscripts and superscripts whenever possible.
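Before moving on, a minimal sketch (ours; the example domains are invented) of how the d = 2 case of these generalized tree domains specializes to ordinary Gorn domains, with the two classical closure conditions checked directly:

    # A node address is a tuple of child indices; a (2-dimensional) tree
    # domain must be closed under prefixes and under smaller sibling indices.
    def is_tree_domain(nodes):
        for addr in nodes:
            if addr and addr[:-1] not in nodes:                         # prefix closure
                return False
            if addr and addr[-1] > 0 and addr[:-1] + (addr[-1] - 1,) not in nodes:
                return False                                            # sibling closure
        return True

    assert is_tree_domain({(), (0,), (1,), (1, 0)})
    assert not is_tree_domain({(), (1,)})   # (1,) requires its sibling (0,)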
3 Non-comparative Constraints

3.1 Logics for Non-comparative Constraints
A short glimpse at the MS-hierarchy in Fig. 1 reveals that NCCs are distinguished by two parameters: the distance between the nodes they restrict (1 versus unbounded) and the type of structure they operate on (representations versus derivations). As I will show now, this categorization can be sharpened by recasting it in logical terms, thereby opening it up to our mathematical explorations in the following section. The distinction between representations and derivations is merely a terminological confusion in our multi-dimensional setup. A two-dimensional tree, for instance, can be interpreted as both a representational tree structure and a string derivation. This ambiguity is particularly salient for higher dimensions, where there are no linguistic preconceptions concerning the type of structure we are operating on. A better solution, then, is to distinguish NCCs according
to the highest dimension they mention in their specification (this will be made precise soon). As for the distance between restricted nodes, it seems to be best captured by the distinction between local and recognizable sets, the latter allowing for unbounded dependencies between nodes while the former are limited to well-formedness conditions that apply within trees of depth 1. As mentioned in Sect. 2, definability in mso is a logical characterization of recognizability, so in conjunction with the MDT framework, this already gives us everything we need to give a theory-neutral definition of global NCCs. For the second, restricted kind of constraints, however, we still need a logical characterization of local sets. Fortunately, this characterization was obtained for two-dimensional trees by Rogers in [28] and can easily be lifted to higher dimensions as follows.

For any D ∈ {◁_i, ▷_i}_{i≥1}, let ⟨D⟩φ(x) abbreviate the mso^k formula ∃y[xDy ∧ φ(y)], where x ▷_i y := y ◁_i x. We require that 𝕋^d_n ⊨ [D]φ(x)[s] iff 𝕋^d_n ⊨ ∀x∃y[xDy ∧ φ(y)][s]. Declaring all other uses of quantification to be illicit yields what may be regarded as a normal modal logic.

Definition 1 (RLOC^k). rloc^k (relaxed loc^k) is the smallest set of mso^k formulas over the boolean operators, individual variables, set variables and all ⟨◁_i⟩, ⟨▷_i⟩, 1 ≤ i ≤ k.

In the next step, we restrict disjunction. Let loc^{k+} be the smallest set of rloc^k formulas such that
– all ⟨◁_i⟩ and ⟨▷_j⟩, i < j ≤ k, are in the scope of exactly one more ⟨◁_k⟩ than ⟨▷_k⟩, and
– all ⟨◁_k⟩ are in the scope of exactly as many ⟨◁_k⟩ as ⟨▷_k⟩.

Similarly, let loc^{k−} be the smallest set of rloc^k formulas such that
– all ⟨◁_i⟩ and ⟨▷_j⟩, i < j ≤ k, are in the scope of exactly as many ⟨◁_k⟩ as ⟨▷_k⟩, and
– all ⟨◁_k⟩ are in the scope of exactly one more ⟨▷_k⟩ than ⟨◁_k⟩.

Definition 2 (LOC^k). The set of loc^k formulas consists of all and only those formulas that are conjunctions of
– disjunctions of formulas in loc^{k+}, and
– disjunctions of formulas in loc^{k−}.

The following lemmata tell us that loc^k restricts only ⟨◁_k⟩ and ⟨▷_k⟩ in a meaningful way. This will also be of use in the next section.

Lemma 1 (RLOC^k and LOC^{k+1}). A formula φ is an rloc^k formula iff it is a loc^{k+1} formula containing no ⟨◁_{k+1}⟩ and no ⟨▷_{k+1}⟩.

Proof. By induction on the complexity of φ. The crucial condition is the first clause in the definition of loc^{k−}.

Lemma 2 (Normal forms). Every loc^{k+} formula is equivalent to a disjunction of conjunctions of loc^{k+} formulas of the form (⟨◁_k⟩{⟨◁_i⟩, ⟨▷_i⟩}*_{1≤i<k} …
Proof. The proof in [28] holds for all k ≥ 1.
With Lemma 2 under our belt, we can proceed to prove the sought-after equivalence of definability in loc^d and locality of sets of d-dimensional trees.

Theorem 2 (Locality and LOC). A set L of finite T^d_n's, d, n ≥ 1, is local iff it is definable in loc^d.

Proof. As the proof in [28] for the correspondence between loc² and the local sets of 2-dimensional trees is easily generalized to all positive d ≠ 2, a short sketch suffices.

⇒ Since L is local, there is a grammar G that derives L, i.e. L can be fully specified by a finite set of trees of depth 1. Assume that T₁, …, T_n are all the trees in G with their root labeled A and φ₁, …, φ_n are rloc^{d−1} formulas describing the child structure of A in T₁, …, T_n, respectively (for d = 1, φ_i is propositional). As there is an upper bound on the size of child structures for all T ∈ G, such φ are guaranteed to exist. Then φ_A := A → ⟨◁_d⟩⋁_{1≤i≤n} φ_i is a loc^{d+} formula, whence φ_Σ := ⋀_{A∈Σ} φ_A is in loc^d. It only remains to conjoin φ_Σ with the loc^{d−} formulas ⋁_{A∈Σ} A(x) and ¬⟨▷_d⟩⊤ → ⋁_{A∈Σ₀} A(x) to ensure that all nodes are labeled and in particular that root nodes are labeled with an initial symbol. The result is a loc^d formula.

⇐ This follows from Lemma 2. It is easy to see that the truth value of loc formulas in normal form at some node t depends only on the local tree rooted at either some t′ in the same local tree as t for loc^{k+} or the parent of t for loc^{k−}. In either case the truth value of the formula remains unaffected by subtree substitution, whence all sets satisfying it are local.

Thus everything is in place now for the logical classification of NCCs we outlined before.

Definition 3 (Classes of non-comparative constraints). A constraint c is
– k-global iff it can be defined by an mso^k formula.
– k-local iff it can be defined by a loc^k formula.
– fully k-local iff
  • for k = 1, c is 1-local;
  • for k > 1, c is definable by a loc^k formula φ built up from loc^{k+} and loc^{k−} formulas φ₁, …, φ_n in normal form such that for each 1 ≤ i ≤ n the formula ψ_i obtained from φ_i by removing all occurrences of ⟨◁_k⟩ and ⟨▷_k⟩ is fully (k − 1)-local.
3.2 Reducibility with and without Fixed Signatures
We now turn to interdefinability results for NCCs. The well-understood relation between local and recognizable sets [4,36] in conjunction with their logical definability [28,31] immediately derives the reducibility of global constraints.
Theorem 3 (Reducibility by features). Let Φ be a set of mso^d formulas and c_g a k-global constraint, 1 ≤ k ≤ d, with Mod(Φ ∪ {c_g}) a recognizable set of Σ-labeled T^d's. Then there is a fully k-local constraint c_l such that Mod(Φ ∪ {c_l}) is a set of Σ∪Ω-labeled T^d's and Mod(Φ ∪ {c_g}) is a projection of Mod(Φ ∪ {c_l}).

The familiar idea underlying the theorem is that we only need to set aside a certain amount of diacritic features to make all the non-local information used in c_g accessible to c_l. The details of this procedure were studied by Marcus Kracht in his work on coding theory [17,18,19].¹

Unfortunately, Theorem 3 is at most of peripheral importance to linguists, who usually do not want their grammar to contain spurious labels or features that have no independent empirical motivation. But comparable results can be obtained if the signature is fixed, at least for all dimensions but the highest one. The trick is to exploit the structure of trees to reencode global constraints as local constraints at higher dimensions. Lemma 1 already hinted at this possibility, but it is too weak to actually derive it. The missing piece of the puzzle is the expressivity of loc^d with respect to rloc^{d−1} and mso^{d−1} at dimension d − 1, which is partially answered by the following two lemmata.

Lemma 3 (RLOC^{d−1} < LOC^d). There is a set Φ of loc^d formulas, d > 1, such that the (d − 1)-dimensional yield of Mod(Φ) is not definable in rloc^{d−1}.

Proof. We already know from Lemma 1 that rloc^{d−1} ≤ loc^d. Now consider the language L := ({a, b, d}* (ad*c)* {a, b, d}*)*. So every string in L with a c also has an a preceding it, and no b may intervene between the two. It is easy to write an rloc¹ formula that requires every node in the string to be labeled with exactly one symbol drawn from {a, b, c, d}. Thus it only remains to sufficiently restrict the distribution of c. If a is at most n steps to the left of c, this can be done by the formula φ := c → ⟨▷₁⟩a ∨ (⟨▷₁⟩⟨▷₁⟩a ∧ ⟨▷₁⟩d) ∨ … ∨ (⟨▷₁⟩ⁿa ∧ ⟨▷₁⟩d ∧ … ∧ ⟨▷₁⟩ⁿ⁻¹d), where ⟨▷₁⟩ⁿ is a sequence of n many ⟨▷₁⟩. But by virtue of our formulas being required to be finite, the presence of an a can be enforced only up to n steps to the left of c. So if a is n + 1 steps away from c, then φ will be false. Similar problems arise if one starts with a moving to the right. Nor is it possible to use d as an intermediary, as in, say, the formula ψ := d → (⟨▷₁⟩a ∨ ⟨▷₁⟩d) ∧ (¬⟨▷₁⟩⟨▷₁⟩(a ∨ ¬a) → ⟨▷₁⟩a), which forces every sequence of ds to be ultimately preceded by an a. The second conjunct is essential, since it rules out strings of the form d*c. But ψ is too strong, because it is not satisfied by any L-strings containing the substrings bd or cd. Note that we cannot limit ψ to ds preceding c, again due to the finite length of rloc¹ formulas, which is also the reason why we cannot write an implicational formula with b as its antecedent that will block bs from occurring between a and c. This
¹ Note that Φ can remain unchanged since the logical perspective allows for a node to be assigned multiple labels l₁, …, l_n instead of the sequence ⟨l₁, …, l_n⟩ (which is the standard procedure in automata theory).
exhausts all possibilities, establishing that rloc¹ fails to define L because it is in general incapable of restricting sequences of unbounded size.² That L is definable in loc² is witnessed by the following grammar of local 2-dimensional trees (with Σ₀ := {b}), which derives L without the use of additional features:

[grammar diagram: a finite set of local (depth-1) 2-dimensional trees over the alphabet {a, b, c, d} whose string yield is L]
This case can be lifted to any dimension k by regarding L as the k-path language of some T^d, 1 ≤ k ≤ d.

Lemma 4 (MSO^d ≮ LOC^{d+1}). There is a set Φ of mso^d formulas, d ≥ 1, such that there is no loc^{d+1}-definable set L^{d+1} whose d-dimensional yield is identical to Mod(Φ).

Proof. Consider the language L := (aa)*. Clearly, this language is definable in mso¹ but not in first-order logic over strings, since it involves modulo counting. Hence it cannot be defined in loc¹ either. We now show that loc² is also too weak to define L. As Σ := {a}, the grammar for the tree language with L as its string yield can only consist of trees of depth 1 with all nodes labeled a. Clearly, none of the trees may have an odd number of leaf nodes, since this would allow us to derive a language with an odd number of as. So assume that all trees in our grammar have only an even number of leaves. But local tree sets are characterized by subtree substitution closure, whence we could rewrite a single leaf in a tree with an even number of leaves by another tree with an even number of leaves, yielding a complex tree with an odd number of leaf nodes. This proves undefinability of L in loc². We can again lift this example to any dimension d ≥ 2 by viewing L as a path language.

We now have loc^k < rloc^k < loc^{k+1} and rloc^k < mso^k and mso^k ≮ loc^{k+1} with respect to expressivity at dimension k, from which it follows immediately that a proper subset of all global constraints can be replaced by local ones.

Theorem 4 (Reducibility at lower dimensions). Let C be the set of all k-global but not k-local constraints. Then C properly includes the set of all c ∈ C for which there is a set Φ of mso^d formulas, k < d, such that Mod(Φ ∪ {c}) is recognizable and there is a (k + 1)-local constraint c′ with Mod(Φ ∪ {c}) = Mod(Φ ∪ {c′}).
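To make the language L from the proof of Lemma 3 concrete, here is a small membership test (our own helper, not from the paper); the unbounded run of ds between a licensing a and its c is exactly what no finite rloc¹ formula can traverse:

    def in_L(s: str) -> bool:
        # L = ({a,b,d}* (a d* c)* {a,b,d}*)*: every c needs an a to its
        # left, separated only by ds
        for i, ch in enumerate(s):
            if ch == 'c':
                j = i - 1
                while j >= 0 and s[j] == 'd':
                    j -= 1
                if j < 0 or s[j] != 'a':
                    return False
        return True

    assert in_L("badddac") and in_L("adddcbadc")
    assert not in_L("abdc") and not in_L("dc")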
² A relaxed version of L is definable by an infinite set of rloc¹ formulas. Let L′ be the set of all strings over {a, b, c, d}* containing no substring of the form ad*b⁺d*c, but where a c does not have to be preceded by an a. Then one may write a formula φ that checks two intervals I, I′ of size m and n, respectively. In particular, φ enforces that no b occurs in I if a is at the left edge of I and no c is contained in I, and c is at the right edge of I′ and no a is contained in I′. Occurrences of b in I′ are banned in a symmetrical way. Pumping m and n independently gives rise to an infinite set of rloc¹ formulas that defines L′.
Since rloc^k is essentially a modal logic, we can even use model theoretic properties of modal logics, e.g. bisimulation invariance, to exhibit sufficient (but not necessary) conditions for reducibility of global constraints. The results in this section have interesting implications for linguistic theorizing. The neutralizing effects of excessive feature coding with respect to NCCs lend support to recent proposals which try to do away with mechanisms of this kind in the analysis of phenomena such as pied-piping (e.g. [1]). That reducibility is limited to a proper subclass of the global constraints, on the other hand, provides us with a new perspective on approaches which severely constrain the size of representational locality domains by recourse to local constraints on derivations (e.g. [22]). In the light of my results, they seem to be less about reducing the size of locality domains — a quantitative notion — than determining the qualitative power of global constraints in syntax.
4 Comparative Constraints

4.1 Model Theory and Comparative Constraints — A Problem
Our interest in NCCs is almost entirely motivated by linguistic considerations. CCs, on the other hand, are intriguing from a mathematical perspective, too, because they make the well-formedness of a structure depend on the presence or absence of other structures, which is uncommon in model theoretic settings, to say the least. As we will see in a moment when we take a look at the properties of various subclasses of CCs, this peculiar trait forces us to move beyond a purely model theoretic approach, but — unexpectedly — not for all CCs. According to Müller [21], CCs are either translocal or transderivational. Several years earlier, however, it had already been noticed by Potts [24] that the metarules of GPSG instantiate a well-behaved subclass of CCs. By definition, metarules are restrictions on the form of a grammar. They specify a template, and a grammar has to contain all rules that can be generated from said template. Metarules can be fruitfully applied towards several ends, e.g. to extract multiple constituent orders from a single rule or to ensure that the legitimacy of one construction in language L entails that another construction is licit in L, too. From these two examples it should already be clear that metarules, although they are stated as restrictions on grammars, serve in restricting entire languages rather than just the structures contained by them. In particular, metarules are a special case of closure conditions on tree languages. From this perspective, it is not too surprising that GPSG-style metarules are provably mso²-definable [24]. In fact, many closure constraints besides metarules can be expressed in mso² as formula schemes [27]. With respect to the MS-hierarchy, this has several implications. First, there are more subclasses of CCs than predicted. Second, metarules instantiate a subclass that despite initial appearance can be represented by global constraints. Third, not all closure constraints are reducible to global constraints, since a formula scheme might give rise to an infinite set of formulas, which cannot be replaced by a single mso formula of finite length.
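A toy rendering (ours — the rules are invented and this is not GPSG's actual metarule format) of the idea that a metarule is a closure condition: the grammar must contain every rule its template generates from the rules already present:

    # Metarule template: for every VP rule with an NP on the right-hand
    # side, the grammar must also contain the NP-less "slashed" variant.
    def metarule(rule):
        lhs, rhs = rule
        if lhs == "VP" and "NP" in rhs:
            return ("VP/NP", tuple(x for x in rhs if x != "NP"))
        return None

    def close_under(rules, mr):
        rules, changed = set(rules), True
        while changed:
            changed = False
            for r in list(rules):
                new = mr(r)
                if new is not None and new not in rules:
                    rules.add(new)
                    changed = True
        return rules

    g = {("VP", ("V", "NP")), ("VP", ("V",))}
    assert ("VP/NP", ("V",)) in close_under(g, metarule)

Note that the closure condition applies to the rule set, i.e. it restricts the language as a whole rather than individual structures, in line with the observation above.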
Given that closure constraints are already more powerful than global constraints, the high position of translocal and transderivational constraints in the MS-hierarchy would be corroborated if closure constraints could be shown to be too weak to faithfully capture either class. This seems to be the case. Consider the translocal constraint Avoid Pronoun. Upon being handed a 2-dimensional tree T ∈ L, Avoid Pronoun computes T's reference set, which is the set of trees in L that can be obtained from T by replacing overt pronouns by covert ones and vice versa. Out of this set, it then picks the tree containing the fewest occurrences of overt pronouns as the optimal output candidate. One might try to formalize Avoid Pronoun as a closure constraint on L such that for every T ∈ L, no T′ ≠ T in the reference set of T is contained in L. This will run into problems when there are several optimal output candidates, but it is only a minor complication compared to the greater, in fact insurmountable challenge a closure constraint implementation of Avoid Pronoun faces: it permits any output candidate T′, not just optimal ones, to be in L as long as no candidates competing with T′ belong to L. In other words, the closure constraint implementation of Avoid Pronoun allows for the selection of any candidate as the optimal output candidate under the proviso that all other output candidates are discarded. This means complete failure at capturing optimality, the very essence of CCs.
4.2 Comparative Constraints as Optimality Systems
From a model theoretic perspective, CCs are a conundrum. In order to verify that a set L of structures satisfies a CC, it does not suffice to look at L in isolation; we also have to consider what L looked like before the CC was applied to it. This kind of temporal reasoning is not readily available in model theory. Admittedly one could meddle with the models to encode such metadata, e.g. by moving to an ordered set of sets, but this is likely to obfuscate rather than illuminate our understanding of CCs in linguistics. Besides, trees are sets of nodes and tree languages are sets of sets of nodes, so our models would be sets of sets of sets of nodes and hence out of the reach of mso, pushing us into the realm of higher-order logics and beyond decidability. For these reasons, then, it seems advisable to approach CCs from a different angle with tools that are already well-adapted to optimality and economy conditions: optimality systems (OSs) [5].

Definition 4 (Optimality system). An optimality system over languages L, L′ is a pair O := ⟨Gen, C⟩ with Gen ⊆ L × L′ and C := ⟨c₁, …, c_n⟩ a linearly ordered sequence of functions c_i : range(Gen) → ℕ. For ⟨i, o⟩, ⟨i, o′⟩ ∈ Gen, the pairs are compared constraint by constraint, in the order given by C. Starting with c₁,
candidates with the fewest violations are kept as possible output candidates. All other candidates are discarded. We then proceed analogously for c₂ until c_n. The remaining output candidates are the optimal output candidates for i. In sum, an OS filters the set of output candidates in a stepwise manner while ensuring all along the way that the set is never emptied. It should be easy to see that without further restrictions, any recursively enumerable language can be derived by an OS. Interestingly, though, it has been established in a series of papers [5,13,37,7,11] that an OS defines a rational transduction if all the conditions below are satisfied (the converse does not hold).³

– L is a recognizable set.
– Gen is a rational relation.
– Every constraint defines a rational relation on the set of competing output candidates.
– Optimality is global: If o ∈ L′ is an optimal output candidate for i ∈ L, then there is no i′ ∈ L such that o is an output candidate for i′ but not an optimal one (see [11] for details).

It is a well-known fact that recognizable sets are closed under rational transductions. Therefore, if an OS is equivalent to a rational transduction, then its output language is recognizable. Recall that recognizability entails definability in mso, so the output language of such an OS has to be definable in terms of global constraints. Equating CCs with the subclass of OSs where Gen ⊆ L × L, this yields an intriguing (albeit partial) characterization of reducibility for CCs. In the following, we let O(Φ) denote the output language of O with L = Mod(Φ).

Theorem 5. Let Φ be a set of mso^k formulas and c_c a comparative constraint obeying the conditions listed above. Then there is an mso^k formula c_g such that O(Φ) = Mod(Φ ∪ {c_g}).

Proof. We know from our previous discussion that L_O = O(Φ) is recognizable. Since recognizable sets are closed under complement, there is an mso^k formula φ with Mod(φ) = Mod(Φ) \ O(Φ). Then c_g := ¬φ.

In other words, a CC of this restricted type can be reduced to a global constraint. On a methodological level, Theorem 5 allows linguists to freely employ a subclass of all CCs without running danger of computational intractability — a lot of the criticism commonly leveled against translocal and transderivational accounts [10,34] should thus turn out to be unfounded.⁴ The astute reader may wonder, though, how many CCs from the linguistic literature are to be found in
³ Kepser and Mönnich [15] define similar conditions for cases where L is a linear context-free language. As a consequence, the results in this section apply just as well to conservative extensions of the recognizable languages of d-dimensional trees, for instance Minimalist languages.
⁴ On a speculative note, one may interpret Theorem 5 on an ontological level, that is as a claim that CCs are prevalent in the early stages of language acquisition but are subsequently recompiled into NCCs. This could offer a new perspective on well-known phenomena such as Principle B delay [9].
said subclass. This is a valid concern. While I do not have any conclusive answers yet, it seems that most syntactic (in contrast to most semantic) CCs satisfy global optimality [8]. Hence their reducibility hinges solely on their reference sets and their economy metric being rational relations, which I expect to hold for many interesting cases.
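The stepwise filtering defined in Sect. 4.2 is straightforward to state operationally. A minimal sketch (ours; the Avoid-Pronoun-flavored toy data are invented) of a finite optimality system:

    def optimal_candidates(gen, constraints, i):
        # gen: finite set of (input, output) pairs; constraints: list of
        # violation-counting functions, ranked c1 before c2 before ...
        candidates = [o for (inp, o) in gen if inp == i]
        for c in constraints:               # filter stepwise
            best = min(c(o) for o in candidates)
            candidates = [o for o in candidates if c(o) == best]
        return candidates                   # the candidate set is never emptied

    # toy reference set: "P" = overt pronoun, "p" = covert pronoun
    gen = {("S", "P P"), ("S", "P p"), ("S", "p p")}
    fewest_overt = lambda o: o.count("P")
    assert optimal_candidates(gen, [fewest_overt], "S") == ["p p"]

This is a direct implementation of the definition; the rationality and global-optimality conditions of Theorem 5 only become substantive in the general, infinite case.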
5 Conclusion
I demonstrated that the intuitive constraint hierarchy of [23,21] can be given a rigorous foundation that mostly confirms the big picture envisioned by these authors (with the addition of closure constraints as a third macro-class, in between non-comparative and comparative constraints). The interesting catch, however, is that certain constraints can be reduced to simpler ones depending on parameters such as the feature signature and the dimensionality and branching factor of our structures. From an application perspective, the reducibility of non-comparative constraints is of limited interest, due to the power of alternative feature coding techniques; the reducibility of comparative constraints, on the other hand, has profound repercussions as it opens up a pathway to their efficient implementation in natural language processing systems. Both types of reducibility results are of eminent importance to linguistic issues, foremost the study of locality phenomena and the Minimalist dictum that language is an optimal cognitive device.

Acknowledgments. I am grateful to Ed Stabler and Sarah Zobel for helpful discussion and their extensive comments on numerous inferior incarnations of this paper.
References

1. Cable, S.: The Grammar of Q: Q-Particles and the Nature of Wh-Fronting. Ph.D. thesis, MIT, Cambridge (2007)
2. Chomsky, N.: A minimalist program for linguistic theory. In: Hale, K., Keyser, S.J. (eds.) The View from Building 20, pp. 1–52. MIT Press, Cambridge (1993)
3. Chomsky, N.: The Minimalist Program. MIT Press, Cambridge (1995)
4. Chomsky, N., Schützenberger, M.P.: The algebraic theory of context-free languages. In: Braffort, P., Hirschberg, D. (eds.) Computer Programming and Formal Systems. Studies in Logic and the Foundations of Mathematics, pp. 118–161. North-Holland, Amsterdam (1963)
5. Frank, R., Satta, G.: Optimality theory and the generative complexity of constraint violability. Computational Linguistics 24, 307–315 (1998)
6. Gazdar, G., Klein, E., Pullum, G.K., Sag, I.A.: Generalized Phrase Structure Grammar. Blackwell, Oxford (1985)
7. Gerdemann, D., van Noord, G.: Approximation and exactness in finite-state phonology. In: Eisner, J., Karttunen, L., Thériault, A. (eds.) Finite State Phonology. Proceedings SIGPHON 2000, ACL (2000)
8. Graf, T.: Reference sets and congruences (in progress), ms. University of California, Los Angeles
9. Grodzinsky, Y., Reinhart, T.: The innateness of binding and coreference. Linguistic Inquiry 24, 69–102 (1993)
10. Johnson, D., Lappin, S.: Local Constraints vs. Economy. CSLI, Stanford (1999)
11. Jäger, G.: Gradient constraints in finite state OT: The unidirectional and the bidirectional case. In: Kaufmann, I., Stiebels, B. (eds.) More than Words. A Festschrift for Dieter Wunderlich, pp. 299–325. Akademie Verlag, Berlin (2002)
12. Kanazawa, M., Salvati, S.: Generating control languages with abstract categorial grammars. In: Kallmeyer, L., Monachesi, P., Penn, G. (eds.) Formal Grammar FG 2007. CSLI Publications, Stanford (2007), http://www.labri.fr/publications/mef/2007/KS07
13. Karttunen, L.: The proper treatment of optimality in computational phonology, manuscript, Xerox Research Center Europe (1998)
14. Kasper, R., Kiefer, B., Netter, K., Vijay-Shanker, K.: Compilation of HPSG to TAG. In: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pp. 92–99 (1995)
15. Kepser, S., Mönnich, U.: Closure properties of linear context-free tree languages with an application to optimality theory. Theoretical Computer Science 354, 82–97 (2006)
16. Kobele, G.M., Michaelis, J.: Two type-0 variants of minimalist grammars. In: FGMoL 2005. The 10th Conference on Formal Grammar and the 9th Meeting on Mathematics of Language, Edinburgh, pp. 81–93 (2005)
17. Kracht, M.: Is there a genuine modal perspective on feature structures? Linguistics and Philosophy 18, 401–458 (1995)
18. Kracht, M.: Syntactic codes and grammar refinement. Journal of Logic, Language and Information 4, 41–60 (1995)
19. Kracht, M.: Inessential features. In: Retoré, C. (ed.) LACL 1996. LNCS (LNAI), vol. 1328, p. 43. Springer, Heidelberg (1997)
20. Michaelis, J.: Derivational minimalism is mildly context-sensitive. In: Moortgat, M. (ed.) LACL 1998. LNCS (LNAI), vol. 2014, pp. 179–198. Springer, Heidelberg (2001)
21. Müller, G.: Constraints in syntax. Lecture Notes, Universität Leipzig (2005)
22. Müller, G.: On deriving CED effects from the PIC. Linguistic Inquiry 41, 35–82 (2010)
23. Müller, G., Sternefeld, W.: The rise of competition in syntax: A synopsis. In: Sternefeld, W., Müller, G. (eds.) Competition in Syntax, pp. 1–68. Mouton de Gruyter, Berlin (2000)
24. Potts, C.: Three kinds of transderivational constraints. In: Mac Bhloscaidh, S. (ed.) Syntax at Santa Cruz, vol. 3, pp. 21–40. Linguistics Department, UC Santa Cruz, Santa Cruz (2001)
25. Prince, A., Smolensky, P.: Optimality Theory: Constraint Interaction in Generative Grammar. Blackwell, Oxford (2004)
26. Reinhart, T.: Anaphora and Semantic Interpretation. Croom Helm / University of Chicago Press (1983)
27. Rogers, J.: Grammarless phrase structure grammar. Linguistics and Philosophy 20, 721–746 (1997)
28. Rogers, J.: Strict LT2 : Regular : Local : Recognizable. In: Retoré, C. (ed.) LACL 1996. LNCS (LNAI), vol. 1328, pp. 366–385. Springer, Heidelberg (1997)
29. Rogers, J.: A Descriptive Approach to Language-Theoretic Complexity. CSLI, Stanford (1998)
30. Rogers, J.: A descriptive characterization of tree-adjoining languages. In: Proceedings of the 17th International Conference on Computational Linguistics (COLING 1998) and the 36th Annual Meeting of the Association for Computational Linguistics (ACL 1998), pp. 1117–1121 (1998)
31. Rogers, J.: Syntactic structures as multi-dimensional trees. Research on Language and Computation 1(1), 265–305 (2003)
32. Rogers, J.: wMSO theories as grammar formalisms. Theoretical Computer Science 293, 291–320 (2003)
33. Stabler, E.P.: Derivational minimalism. In: Retoré, C. (ed.) LACL 1996. LNCS (LNAI), vol. 1328, pp. 68–95. Springer, Heidelberg (1997)
34. Stroik, T.S.: Locality in Minimalist Syntax. MIT Press, Cambridge (2009)
35. Takahashi, S., Fox, D.: MaxElide and the re-binding problem. In: Georgala, E., Howell, J. (eds.) Proceedings of SALT XV, pp. 223–240. CLC Publications, Ithaca (2005)
36. Thatcher, J.W.: Characterizing derivation trees for context-free grammars through a generalization of finite automata theory. Journal of Computer and System Sciences 1, 317–322 (1967)
37. Wartena, C.: A note on the complexity of optimality systems. In: Blutner, R., Jäger, G. (eds.) Studies in Optimality Theory, pp. 64–72. University of Potsdam, Potsdam (2000)
Sortal Equivalence of Bare Grammars

Thomas Holder
[email protected]
Abstract. We discuss a concept of structural equivalence between grammars in the framework of Keenan and Stabler's bare grammars. The definition of syntactic sorts for a grammar L permits the introduction of a sort structure group Aut_π(L). The automorphism group Aut(L) of L is found to be a group extension by Aut_π(L). We then develop a concept of equivalence of grammars based on isomorphisms between the syntactic sort algebras. We study the implications of this equivalence with techniques from category theory: we invert the class of grammar homomorphisms that induce isomorphisms of sort algebras. The resulting category of fractions is found to be equivalent to a category of sortally reduced grammars.
1 Introduction
Keenan and Stabler have proposed in their Bare Grammar framework to apply the structure-and-invariance methodology of Felix Klein's Erlanger Program to linguistics ([17]). They model grammars of natural languages basically as finitely generated partial algebras and consider then the group of automorphisms of the grammar algebra. The aim is to study linguistic properties and relations defined over the expressions of a language L with respect to their invariance under the automorphism group of L. In this paper we focus on the role of syntactic sorts for the grammatical structure. Our approach differs from the line taken by Keenan and Stabler insofar as our bare grammars are initially sort free. They acquire their sorts via a sort assignment map to a sort algebra L/μ. This sort algebra derives from the maximal congruence relation μ on the grammar algebra. Since it inherits a universal property from μ, the sort algebra effectively constrains the behavior of the syntactic automorphisms. Keenan and Stabler have noticed early on that the syntactic invariants show sometimes an unfavorable dependence on the cardinality of the lexical material available in a given grammar. We study this problem from the angle of the sort algebra and explore a proposal initially due to G.Kobele ([18]): we consider two grammars as equivalent if they have isomorphic sort algebras. One implication of this view is that we should be able to treat homomorphisms that induce isomorphisms of sort algebras as isomorphisms. In other
The author would like to thank Hans-Martin Gärtner, Andreas Haida, Ed Keenan, Greg Kobele, Marcus Kracht, Jens Michaelis and Ed Stabler for discussions and support in various forms.
words, we should be able to invert these morphisms. The problem of turning a class of 'weak equivalences' into isomorphisms is of great importance for modern mathematics. Especially, category theory provides a rich environment for this process of localizing with respect to a class of morphisms. We will thus develop our approach to bare grammars in a categorical setting. This has the advantage that the results are less specific to bare grammars but become suggestive for other algebraic grammar frameworks as well. We give a brief outline of the content: Section 2 introduces bare grammars. Section 3 develops the concept of sort algebra via congruence relations. This permits the definition of a group of sort preserving automorphisms Aut_π(L) in section 4. The following section uses Aut_π(L) to display the group of syntactic automorphisms Aut(L) as a group extension. Section 6 defines sortal equivalences as those homomorphisms that induce isomorphisms between sort algebras, and the following section studies the consequences of formally inverting these sortal equivalences. The last section asks the question whether it was a good idea to invert the sortal equivalences, and offers some general remarks on the philosophy of forcing morphisms to be isomorphisms. This paper presupposes acquaintance with category theory, partial algebras and group theory. The appendix contains the definitions of some of the employed concepts. Although the treatment of bare grammars is self-contained, previous exposure to Keenan and Stabler's monograph [17] is certainly helpful.
2 Bare Grammars and Syntactic Automorphisms
Bare grammars were introduced by E.Keenan and E.Stabler to provide a rigorous formalism to accommodate a wide range of analyses of natural languages. They find a natural habitat in the theory of partial algebras, as pointed out by G.Kobele ([18]). Here we give the basic definitions, and collect some useful facts from [17].

Definition 1. A bare grammar (over the signature τ) is a finitely generated partial algebra L = ⟨[L₀], (g¹_L, …, gⁿ_L)⟩_τ over τ. The finite generating set L₀ is called the lexicon of L¹ and the g^i_L generating functions of L.² A homomorphism α : L → L′ between two bare grammars is called a syntactic homomorphism.

This is technically different from but morally close to the definition of Keenan and Stabler ([17]). The main differences are that their grammar tuple contains
2
Note that L0 is not part of the structure. In the practice of linguistics, a lexicon is supposed to contain at least a subset of expressions not in the codomain of any generating function thereby making it a more substantial concept. As defined here the same grammar L might have several different lexica. On this view, theorem 25 in [17], p.161, is then to be expected. To cut down the jungle of diacritics, we usually drop reference to τ and L, and write gi in the following. For notational convenience we will sometimes pretend that every gi is binary. We will distinguish between |L| and L only if necessary, and we will occasionally call the latter by abuse a ’language’. In such cases, the reader should suppose that a concrete grammar is implicitly given.
additional information on syntactic sorts and phonological strings, and that we have hardwired in ab initio certain finiteness restrictions.

Remark 1. Bare grammars and syntactic homomorphisms yield a category of bare grammars Grm.

Example 1. it2:2: An excessively simplified version of 'Italian' noun inflection. The lexicon of it2:2 is {parol, cas, libr, punt, a, o}. There is just a single generating function putting the gender inflection on the noun roots: ⟨parol, a⟩ ↦ parola, ⟨punt, o⟩ ↦ punto, ⟨cas, a⟩ ↦ casa, and ⟨libr, o⟩ ↦ libro.

Definition 2 (cf. [17]). Let L be a bare grammar. The automorphism group Aut(L) of L is defined as the group of syntactic automorphisms α : L → L. The orbit of x ∈ L is defined as orb(x) := {y ∈ L : α(x) = y, α ∈ Aut(L)}. Two expressions x, y ∈ L are called structurally equivalent (written x ∼ y) iff x ∈ orb(y).

Remark 2. Aut(L) is indeed a group, and ∼ is indeed an equivalence relation. Syntactic invariants can now be defined as invariants of the action of Aut(L), e.g. X ⊆ L is invariant if X ⊇ ⋃ orb(x), x ∈ X. Invariant subsets form a Boolean lattice with the orbits as atoms ([17], p. 29).

Example 1 (cont.). The automorphism group of it2:2 is already quite remarkable: The dihedral group D₄ of order 8 is the symmetry group of a square (cf. [20]) and of it2:2: Aut(it2:2) ≅ D₄. To see the correspondence, we label the nominal roots with numbers from 1 to 4 by assigning 1 to /punt/, 3 to /libr/ and 2 to /parol/, 4 to /cas/, thereby making masculine the odd gender: the action on the nominal roots already determines a syntactic automorphism. Then we label the vertices of a square with 1, …, 4. D₄ is generated by a rotation R of the square in the plane through 90° and a reflection D in an axis through one of its vertices. We make the following choices for the generators:

  R = (1 2 3 4 ↦ 4 1 2 3),  D = (1 2 3 4 ↦ 1 4 3 2)   (1)

Note that R mixes the genders thoroughly whereas D just rearranges the feminine sector. In the n-th dihedral group D_n, the following relations hold between the two generators D and R: D² = 1, Rⁿ = 1, and Rⁿ⁻¹D = DR. For n = 4 this results in D₄ = {1, R, R², R³, RD, R²D, R³D, D}. These correspond to the automorphisms in Aut(it2:2). An example for a syntactic invariant subset would be orb(a) = {a, o}.
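That Aut(it2:2) has exactly eight elements can be confirmed by brute force. The Python sketch below is our own illustration; it uses the fact that an automorphism must send lexical items to lexical items and derived forms to derived forms, since the latter are exactly the values of the generating function:

    from itertools import permutations

    lex = ["parol", "cas", "libr", "punt", "a", "o"]
    g = {("parol", "a"): "parola", ("cas", "a"): "casa",
         ("libr", "o"): "libro", ("punt", "o"): "punto"}
    derived = sorted(g.values())

    def is_automorphism(h):
        # h(g(x, y)) = g(h(x), h(y)) on all defined pairs; since h is a
        # bijection and there are exactly four defined pairs, this also
        # guarantees that no undefined pair becomes defined.
        return all((h[x], h[y]) in g and g[(h[x], h[y])] == h[z]
                   for (x, y), z in g.items())

    autos = []
    for p in permutations(lex):
        for q in permutations(derived):
            h = {**dict(zip(lex, p)), **dict(zip(derived, q))}
            if is_automorphism(h):
                autos.append(h)
    print(len(autos))  # 8, the order of the dihedral group D4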
3 Syntactic Sorts
Expressions of a language are from our perspective internally unstructured placeholders that receive their structure from the outside by the way they are shuffled around by the grammar. In this section we show how one can import sortal
structure bringing grammars into a form that is more in line with Keenan and Stabler's approach. With sort information in the grammar in form of a set of sorts, Keenan and Stabler are faced with the problem how to link this sort structure with the structure given by the generating functions. They propose to regulate the relation by imposing several constraints, among them category functionality³:

Definition 3 (cf. [17]). Let L be a bare grammar, π be the function assigning to an expression x ∈ L its sort π(x). L is called category functional iff for every generating function g_i there exists a function ĝ_i on the set of sorts such that ĝ_i(π(x₁), …, π(x_n)) = π(g_i(x₁, …, x_n)).

Remark 3. The internal sort structure of a category functional bare grammar in the sense of Keenan and Stabler amounts to the existence of an algebra structure ĝ_i on the set of sorts such that the sort assignment π is a homomorphism! This gives a first clue to the sortal structure of bare grammars in our sense: homomorphisms α from L to a sort algebra. It is natural to demand that the sort algebra contains no redundant sorts, and to impose, accordingly, a surjectivity requirement on α. This narrows the candidates for sort algebras down to the class of all homomorphic images of L. We can make a best or 'universal' choice in this class: just take the quotient L/ϑ of L under the maximal congruence relation ϑ, and assign sorts with the canonical projection π : L → L/ϑ! Unfortunately, the maximal congruence on L is the total relation on L, so we had better considered only closed homomorphic images and closed congruences⁴.

Definition 4. Let L be a bare grammar, μ the maximal closed congruence relation on L. The quotient L/μ is called the sort algebra of L, and the canonical projection π : L → L/μ is called the sort projection. π(x) is called the sort of the expression x.

Example 1 (cont.). The sorts of it2:2 in this view are {parol, cas}, {libr, punt}, {a}, {o}, and {parola, casa, libro, punto}. The last set might look suspicious, but there is no reason to differentiate between {parola, casa} and {libro, punto}. The quotient gender marking function is defined by ⟨[cas], [a]⟩ ↦ [casa], and ⟨[punt], [o]⟩ ↦ [casa].

The following proposition states the universal property of π : L → L/μ.

Proposition 1 (Universal sort algebra). Let L be a bare grammar with sort projection π : L → L/μ. Given a closed surjective homomorphism ϕ : L → L′
4
Their axioms 1 and 2 (pp.140f) are also relevant here, as they relate to the maximality of the sort assigment π below. Axiom 1 says roughly, that expressions with the same distribution should have the same sort, or don’t make too much distinctions ! Axiom 2 provides roughly the reverse requirement: expressions with different distribution should have different sorts, or make enough distinctions ! Their concept of distribution, however, takes only the domain of the generating functions into account whereas congruence relations consider domain and codomain. The affinity between syntactic sorts and equivalence classes of the maximal closed congruence on L has been pointed out by G.Kobele ([18]).
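As a quick mechanical check of Example 1 (cont.), under the same toy dict encoding as the earlier sketch (the encoding is mine), one can verify that the listed partition is indeed a closed congruence of it2:2:

```python
# Toy check (encoding mine): the five claimed sorts of it2:2 form a closed
# congruence of the single generating function g.
g = {('parol', 'a'): 'parola', ('punt', 'o'): 'punto',
     ('cas', 'a'): 'casa', ('libr', 'o'): 'libro'}
lang = ['parol', 'cas', 'libr', 'punt', 'a', 'o',
        'parola', 'casa', 'libro', 'punto']
sorts = [{'parol', 'cas'}, {'libr', 'punt'}, {'a'}, {'o'},
         {'parola', 'casa', 'libro', 'punto'}]
block = {x: i for i, b in enumerate(sorts) for x in b}

def is_closed_congruence(block):
    for x in lang:
        for y in lang:
            for u in lang:
                for v in lang:
                    if block[x] == block[u] and block[y] == block[v]:
                        # Closedness: definedness must agree across the blocks.
                        if ((x, y) in g) != ((u, v) in g):
                            return False
                        # Congruence: defined results must land in one block.
                        if (x, y) in g and block[g[(x, y)]] != block[g[(u, v)]]:
                            return False
    return True

print(is_closed_congruence(block))   # True
```

The sketch only confirms that the partition is a closed congruence, not that it is the maximal one; maximality follows from the discussion in the main text.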
The following proposition states the universal property of π : L → L/μ.

Proposition 1 (Universal sort algebra). Let L be a bare grammar with sort projection π : L → L/μ. Given a closed surjective homomorphism ϕ : L → L′, there exists a unique closed surjective homomorphism ψ : L′ → L/μ making the following diagram commute:

\[
\begin{array}{ccc}
L & \stackrel{\pi}{\longrightarrow} & L/\mu\\
{\scriptstyle\varphi}\searrow & & \nearrow{\scriptstyle\psi}\\
& L' &
\end{array}
\tag{2}
\]
Proof. This is just the homomorphism theorem ([5], p.35) applied to ϕ and π, taking into account that ker ϕ ⊆ ker π = μ.

Remark 4. π : L → L/μ is the optimal or free sort assignment function in the sense that it is a terminal object in the category L ⇓ Grmc whose objects are all 'possible' sort assignments, i.e. closed surjective homomorphisms α : L → L′. A morphism from α₁ : L → L₁ to α₂ : L → L₂ is a closed surjective homomorphism γ : L₁ → L₂ such that γ ∘ α₁ = α₂. Proposition 1 states precisely that π : L → L/μ is a terminal object in L ⇓ Grmc.

Remark 5. Goguen ([12]) shows in the context of automata theory that there is an adjunction between a behavior functor E mapping a reachable automaton to its behavior and a realization functor N mapping a language to its minimal automaton. The composite functor N ∘ E then maps a reachable automaton to a minimal automaton with the same behavior, which is a terminal object in the category of automata realizing the same behavior and surjective automata homomorphisms. As the minimal automaton is the quotient of the external behavior under a maximal left congruence, the setup is strongly reminiscent of the construction of the sort algebra for a bare grammar, which can also be made functorial by restricting the grammar morphisms appropriately (see Prop. 5 below). In this analogy, reachability of states for an automaton translates to surjectivity of the sort assignment for a bare grammar.

Remark 6. We can internalize the sorting by forming the pullback of the sort projection along the identity of the sort algebra:

\[
\begin{array}{ccc}
L \times_\pi L/\mu & \stackrel{p_2}{\longrightarrow} & L/\mu\\
{\scriptstyle p_1}\downarrow & & \downarrow{\scriptstyle\mathrm{id}}\\
L & \stackrel{\pi}{\longrightarrow} & L/\mu
\end{array}
\tag{3}
\]

The carrier of L ×π L/μ is then sorted à la Keenan and Stabler, its elements having the form ⟨x, π(x)⟩, with computations on the sorts running in parallel with computations on the expressions x ∈ L. Nothing in this construction really hinges on taking π : L → L/μ; in fact, one can take any closed homomorphism α : L → L′ and put a category functional L′-sorting on L by pulling back along id over L′.
4 Sorts and Automorphisms
We discuss the relation between homomorphisms and syntactic sorts. The main result will be that syntactic automorphisms are bound to preserve the sortal structure at least partially. First we introduce an important subgroup of Aut(L).

Definition 5. Let L be a bare grammar with sort projection π : L → L/μ. A function f : L → L is called sort preserving iff π ∘ f = π. The set of all sort preserving syntactic automorphisms is denoted by Autπ(L).

Autπ(L) is easily seen to be a subgroup of Aut(L); in fact, it is even a normal subgroup, as we shall see below (Prop. 9).

Example 1 (cont.). Autπ(it2:2) ≅ D₂, the Klein four group. Consider the generators of Aut(it2:2): D preserves sorts, as does any even power of R; hence Autπ(it2:2) ≅ {1, R², R²D, D}. By setting S := R² one gets {1, S, SD, D}, which is just D₂; the necessary relations S² = D² = 1 and SD = DS between S and D follow from the relations for R, D in D₄. But these are just the relations for the generators of two cyclic groups together with a commutativity relation between the generators; in other words, it is a presentation of the product of two cyclic groups: D₂ = C₂ × C₂. This is easy to understand on the linguistic side: each gender corresponds to a group representing its internal symmetries, and the commutativity describes the decoupling of the genders.

Sort preserving functions map their arguments to elements of the same sort, but functions can also be constrained in weaker forms by sortal information:

Definition 6. A function f : L → L′ is said to preserve sortal equality iff π(x) = π(y) implies π′(f(x)) = π′(f(y)). f is said to reflect sortal equality iff π′(f(x)) = π′(f(y)) implies π(x) = π(y). A function f preserving and reflecting sortal equality is called sort respecting.

A sort preserving function is sort respecting, but a sort respecting endomorphism does not necessarily preserve sorts. The requirements of sort respect express that f can neither fuse nor fission sorts.

Proposition 2 (No Splitting). A surjective closed homomorphism η : L → L′ preserves sortal equality.⁵

Proof. The universal property (2) of π yields the following commutative diagram:

\[
\begin{array}{ccc}
L & \stackrel{\pi}{\longrightarrow} & L/\mu\\
{\scriptstyle\eta}\searrow & & \nearrow{\scriptstyle\psi}\\
& L' &
\end{array}
\tag{4}
\]
⁵ The Span fragment in Keenan and Stabler ([17], p. 149) is not a counterexample to the claim, as there is no reason to distinguish gender at the NP-level in Span; in fact, the fragment violates their axiom 1, p. 140.
Take x, y ∈ L with π(x) = π(y). The commutativity of the diagram implies that ψ identifies η(x) and η(y). Then η(x) ∼ η(y) with respect to the congruence relation ker ψ; but μ′ = ker π′ is the maximal closed congruence on L′, and accordingly ker ψ ⊆ ker π′, hence π′(η(x)) = π′(η(y)).

Proposition 3. If α : L₁ → L₂ reflects sortal equality then ker α is closed.

Proof. If ⟨x, y⟩ ∈ ker α then trivially π₂(α(x)) = π₂(α(y)), and π₁(x) = π₁(y) by equality reflection; whence ker α ⊆ ker π₁ = μ₁, which is a closed congruence. But the closed congruence relations are down closed in the lattice of all congruence relations ([5], p.31), whence ker α is a closed congruence.

A class of injective homomorphisms that preserve sortal equality is provided by the following

Proposition 4. Suppose ρ ∘ σ = id over L with ρ closed. Then ρ : L′ → L and σ : L → L′ are sort respecting.

Proof. Chase x, y ∈ L through the following commutative diagram, which results from (2) and the observation that ρ is surjective, and use (2):

\[
\begin{array}{ccccc}
L & \stackrel{\sigma}{\longrightarrow} & L' & \stackrel{\rho}{\longrightarrow} & L\\
& & {\scriptstyle\pi'}\downarrow & \swarrow{\scriptstyle\psi} & \\
& & L'/\mu' & &
\end{array}
\tag{5}
\]

(the top row composes to the identity on L, and ψ ∘ ρ = π′ by the universal property (2) applied to the closed surjective ρ).

An immediate consequence of Proposition 4 is the useful

Corollary 1. A syntactic automorphism is sort respecting.
Before ending the section we discuss sortal equality preserving homomorphisms from another angle.

Definition 7. The category of sorted grammars Grmπ has as objects sort projections π : L → L/μ. A morphism from π₁ : L₁ → L₁/μ₁ to π₂ : L₂ → L₂/μ₂ is a pair of homomorphisms ⟨α : L₁ → L₂, ᾱ : L₁/μ₁ → L₂/μ₂⟩ making the following diagram commute:

\[
\begin{array}{ccc}
L_1 & \stackrel{\alpha}{\longrightarrow} & L_2\\
{\scriptstyle\pi_1}\downarrow & & \downarrow{\scriptstyle\pi_2}\\
L_1/\mu_1 & \stackrel{\bar\alpha}{\longrightarrow} & L_2/\mu_2
\end{array}
\tag{6}
\]

Remark 7. In ⟨α, ᾱ⟩ both components preserve sortal equality: α because otherwise π₂ ∘ α could not be factored through π₁, and ᾱ because a sort algebra has only singletons as sorts. On the other hand, any sortal equality preserving α : L₁ → L₂ induces a unique homomorphism ᾱ : L₁/μ₁ → L₂/μ₂ by setting ᾱ([x]₁) := [α(x)]₂. Hence Grmπ is isomorphic to the subcategory Grmseq of Grm having the same objects as Grm but only sortal equality preserving homomorphisms as morphisms.
Considering this correspondence, the next proposition comes as no surprise, as in Grmπ it amounts to 'forgetting about upstairs'.

Proposition 5. The assignments L ↦ L/μ and α ↦ ᾱ define a functor, called the functor of sorts, Sμ : Grmseq → Grmseq. □
5 Language Extensions and Group Extensions
This section starts with a discussion of language extensions, leading naturally up to the topic of group extensions.⁶ We finally exhibit Aut(L) as an extension by Autπ(L).

Definition 8. Let L, L′ be two bare grammars with |L| ⊆ |L′|. L′ is called an extension of L iff the inclusion i : L → L′ is a homomorphism.

Demanding that the inclusion be a homomorphism is a rather mild restriction. In order to study the relation between L and L′ we consider the relation between Aut(L) and Aut(L′).

Definition 9. Let L → L′ be a language extension, and H a subgroup of Aut(L). The lift of H is defined as H̄ := {ᾱ ∈ Aut(L′) : ᾱ|L = α ∈ H}.

H̄ contains at least the identity on L′, hence the lift is not empty; in fact, the following proposition is immediate.

Proposition 6. H̄ is a subgroup of Aut(L′), and the restriction ρ : H̄ → H, ᾱ ↦ ᾱ|L, is a group homomorphism. In particular, the set of α ∈ H having a lift ᾱ forms a subgroup of H, namely im(ρ). □

Definition 10. Aut(L′ : L), the lift of the trivial subgroup {idL}, is called the relative automorphism group of the extension L → L′.

Aut(L′ : L) consists of all automorphisms of L′ that fix L pointwise. Intuitively, it represents the relations holding in L′ that are independent of the relations holding in L. Its definition is reminiscent of the Galois group from the theory of field extensions ([4,21]).

Proposition 7. For every subgroup H ⊆ Aut(L): Aut(L′ : L) = ker(ρ) ⊴ H̄. Furthermore, H̄/Aut(L′ : L) ≅ im(ρ).

We would like to lift all α ∈ H. This situation corresponds to the following group extension:

Proposition 8. Let L → L′ be a language extension. Every α ∈ H has a lift ᾱ ∈ Aut(L′) iff the sequence

\[
1 \to \mathrm{Aut}(L' : L) \to \bar H \stackrel{\rho}{\longrightarrow} H \to 1
\tag{7}
\]

is exact, or, in other words, iff H̄ is an extension of H by Aut(L′ : L).

⁶ The necessary background in group theory can be found in [20] or [22].
Remark 8. Ultimately, we are interested in the case H = Aut(L), giving the left exact sequence 1 → Aut(L′ : L) → H̄ →ρ Aut(L). We do not have to look far to find language extensions where ρ fails to be surjective: just throw another feminine noun root into it2:2, with the obvious modification of the gender marking function. In the resulting grammar it3:2, feminine and masculine noun roots cannot be interchanged by a syntactic automorphism. This illustrates the instability of the syntactic invariants alluded to in the introduction. Autπ(L) is a candidate for a subgroup with better extension behavior.

Proposition 9. Autπ(L) ⊴ Aut(L).
Proof. Consider an expression γ⁻¹αγ with γ ∈ Aut(L), α ∈ Autπ(L). As α preserves sorts, the sort exchanges of αγ are the same as the exchanges caused by γ. As γ is sort respecting (Cor. 1), all exchanges are systematic, and hence are undone by γ⁻¹. This implies γ⁻¹αγ ∈ Autπ(L).

There is no 'subgroup of sort exchanging automorphisms', because the complement of Autπ(L) is not multiplicatively closed. Since Autπ(L) is a normal subgroup, we can construct the quotient group Xπ(L) := Aut(L)/Autπ(L), and this is as close to a group of sort exchanges as we can get.

Corollary 2. Aut(L) is an extension of Xπ(L) by Autπ(L); in other words, the following is a short exact sequence:

\[
1 \to \mathrm{Aut}_\pi(L) \to \mathrm{Aut}(L) \stackrel{p}{\longrightarrow} X_\pi(L) \to 1\,,
\tag{8}
\]

where p : Aut(L) → Xπ(L) is the canonical projection to the quotient group.
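Proposition 9 and Corollary 2 can again be checked mechanically on it2:2. The sketch below is self-contained and uses the same dict encoding as the earlier sketches (the encoding is mine); it recomputes Aut(it2:2), filters the sort preserving automorphisms, and confirms normality and the size of the quotient.

```python
from itertools import permutations

# Brute-force check of Prop. 9 and Cor. 2 on it2:2 (encoding mine).
lex = ['parol', 'cas', 'libr', 'punt', 'a', 'o']
g = {('parol', 'a'): 'parola', ('punt', 'o'): 'punto',
     ('cas', 'a'): 'casa', ('libr', 'o'): 'libro'}
lang = lex + list(g.values())
sort_of = {'parol': 0, 'cas': 0, 'libr': 1, 'punt': 1, 'a': 2, 'o': 3,
           'parola': 4, 'casa': 4, 'libro': 4, 'punto': 4}

aut = []
for image in permutations(lex):
    h = dict(zip(lex, image))
    if any((h[x], h[y]) not in g for (x, y) in g):
        continue
    for (x, y), z in g.items():
        h[z] = g[(h[x], h[y])]
    if len(set(h.values())) == len(lang):
        aut.append(h)

aut_pi = [h for h in aut if all(sort_of[h[x]] == sort_of[x] for x in lang)]
compose = lambda f, h: {x: f[h[x]] for x in lang}
inverse = lambda h: {h[x]: x for x in lang}

# Conjugates of sort preserving automorphisms stay sort preserving (Prop. 9) ...
normal = all(compose(inverse(c), compose(a, c)) in aut_pi
             for c in aut for a in aut_pi)
# ... and the quotient X_pi has two elements, i.e. X_pi(it2:2) = C_2 (Cor. 2).
print(len(aut), len(aut_pi), normal, len(aut) // len(aut_pi))   # 8 4 True 2
```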
Example 1 (cont.). In Aut(it2:2) there are two left cosets, namely Autπ(it2:2) and its complement, the set of sort exchanging automorphisms {R, R³, RD, R³D}; hence

\[
X_\pi(it_{2:2}) \cong \{\mathrm{Aut}_\pi(L), \{R, R^3, RD, R^3D\}\} \cong C_2\,.
\tag{9}
\]

The extension Autπ(it2:2) → Aut(it2:2) →p Xπ(it2:2) is split, i.e. we can decompose Aut(it2:2) as Autπ(it2:2) · K with K ≅ Xπ(it2:2) and Autπ(it2:2) ∩ K = 1, for K = {1, RD} or K = {1, R³D}. But since neither K is a normal subgroup of Aut(it2:2), the extension is not trivial, i.e. Aut(it2:2) ≇ Autπ(it2:2) × K. On the linguistic side this makes sense: the coupling of the genders by general automorphisms is real, not trivial like the coupling by sort preserving automorphisms, but it is regular (= split).⁷

⁷ The author suspects that the sequence (8) splits in general, and that over reduced lexica Autπ(L) ≅ ∏ᵢ S(kᵢ), where kᵢ is the number of lexical elements of the ith sort. This gives in the example: Autπ(itn:m) ≅ Sₙ × Sₘ and Xπ(itn:n) ≅ S₂. Here Sₖ denotes the symmetric group of permutations of a k-element set.
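Remark 8's failure of surjectivity, and the footnote's conjecture Autπ(itn:m) ≅ Sₙ × Sₘ, can be spot-checked the same way. In the sketch below the third feminine root ('stell', yielding 'stella') is my own choice of filler material, not taken from the text:

```python
from itertools import permutations

# it3:2: one more feminine root makes gender exchange impossible (Remark 8).
lex = ['parol', 'cas', 'stell', 'libr', 'punt', 'a', 'o']
g = {('parol', 'a'): 'parola', ('cas', 'a'): 'casa', ('stell', 'a'): 'stella',
     ('libr', 'o'): 'libro', ('punt', 'o'): 'punto'}
lang = lex + list(g.values())

aut = []
for image in permutations(lex):
    h = dict(zip(lex, image))
    if any((h[x], h[y]) not in g for (x, y) in g):
        continue
    for (x, y), z in g.items():
        h[z] = g[(h[x], h[y])]
    if len(set(h.values())) == len(lang):
        aut.append(h)

# |S_3 x S_2| = 12 automorphisms, none of which exchanges the genders:
print(len(aut), all(h['a'] == 'a' for h in aut))   # 12 True
```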
6 Defining Sortal Equivalence
We have seen in the previous section that the syntactic invariants are unstable in the sense that they depend on the cardinality of the lexical material. In order to get more stable invariants we have to find a way to filter out the lexical material. One idea is to consider two grammars as equivalent when they have isomorphic sort algebras ([18]): the sort algebras are sortally reduced and yet retain homomorphic information on the grammatical structure. In this section and the next we investigate the internal logic of this decision: what does a world of grammars look like in which we cannot differentiate between grammars that have isomorphic sort algebras?

Remark 9. Since we are interested in the sort structure, we take Grmseq as the ambient category in the following.

Definition 11. A morphism α : L₁ → L₂ in Grmseq is called a sortal equivalence iff α induces an isomorphism of sort algebras ᾱ : L₁/μ₁ → L₂/μ₂. The class of sortal equivalences is denoted by Σ. Two grammars are called sortally equivalent iff there is a sortal equivalence between them.

Proposition 10 (2-out-of-3 property). Given three morphisms α, γ, and α ∘ γ: if any two of them are sortal equivalences, then so is the third.

Proof. We just have to check the 2-out-of-3 property 'downstairs' for the maps induced by α, γ, and α ∘ γ, and these have inverses when their upstairs partner is a sortal equivalence.

The sortal equivalences are the morphisms that should become isomorphisms in the localization. The 2-out-of-3 property guarantees that they are precisely the isomorphisms in the new category ([8]).

Proposition 11. The sort projection π : L → L/μ is a sortal equivalence. In particular, L/μ ≅ (L/μ)/μ over L/μ.

Proof. This follows from the fact (see [5], p.38) that a closed surjective homomorphism α : X → Y induces an isomorphism between the segment ↑ker α = [ker α, μ over X] in the lattice of closed congruence relations on X and the lattice of closed congruence relations on Y. In the case of π this segment is a singleton: [ker π, μ] = [μ, μ]. Whence the maximal closed congruence on L/μ is the identity relation, and L/μ ≅ (L/μ)/μ over L/μ.

Remark 10. An immediate consequence is that every grammar is sortally equivalent to its sort algebra!

In category theory, properties of an object are coded by the morphisms that relate it to other objects. The next proposition singles out the class of objects in Grmseq that entertain equivalent relations with sortally equivalent grammars.

Proposition 12. Let X ∈ Grmseq be a bare grammar. Then every sortal equivalence σ : L₁ → L₂ induces a bijection σ* : Hom(L₂, X) → Hom(L₁, X) by f ↦ f ∘ σ iff X ≅ X/μ.
Proof. "⇒": Suppose X is a bare grammar such that every σ ∈ Σ induces a bijection σ* : Hom(L₂, X) → Hom(L₁, X). By Prop. 11, π : X → X/μ is in Σ. This yields a bijection π* : Hom(X/μ, X) → Hom(X, X). Set f := (π*)⁻¹(id over X), giving f ∘ π = id over X; but this says that f([x]) = x, whence π(f([x])) = π(x) = [x], or π ∘ f = id over X/μ.

"⇐": Suppose X ≅ X/μ. We show first that σ* is surjective. Let g : L₁ → X. Since X is sortally reduced and g preserves sortal equality, we have ker π₁ ⊆ ker g. But then the diagram completion lemma of ([5], p.35) gives a factorization g = h ∘ π₁ with unique h : L₁/μ₁ → X. The commutative square (6) for sortal equality preserving maps gives a factorization π₁ = ᾱ⁻¹ ∘ π₂ ∘ σ with ᾱ the induced isomorphism; combining both results yields g = (h ∘ ᾱ⁻¹ ∘ π₂) ∘ σ, which is just the form of g we want.

Now we show that σ* is injective: when g ∘ σ = f ∘ σ, then the maps induced downstairs by g ∘ σ and f ∘ σ coincide; but downstairs the isomorphism induced by σ can be cancelled, so g and f induce the same map downstairs. On the other hand, suppose g(x) ≠ f(x) for some x ∈ L₂; then, as we argued with the diagram completion lemma above, f and g must factor through π₂, whence g(z) ≠ f(z) for all z ∈ π₂(x), and accordingly g and f induce different maps downstairs; but then g ∘ σ ≠ f ∘ σ, by contraposition of the preceding remark.
7 Localization
In this section we invert the sortal equivalences. We follow the approach to localizations outlined by Dwyer in [8]. The discussion takes the following form: we give Dwyer's general statements and point out in a remark how our special case fits in.

Definition 12. A localization context (C, W) is given by a category C and a subcategory W ⊆ C. The class of morphisms in W is denoted W, and its elements are called weak equivalences.

Remark 11. We take (Grmseq, GrmΣ) as localization context, where GrmΣ is the subcategory of Grmseq with the same objects but only the sortal equivalences as morphisms.

Definition 13. An object X of C is called W-local iff for all morphisms f : A → B in W the induced map f* : Hom(B, X) → Hom(A, X), g ↦ g ∘ f, is a bijection. The full subcategory locW(C) of C with objects the W-local objects is called the category of local objects.

Remark 12. The idea is that weakly equivalent objects always have equivalent views of a W-local object X. Σ-local objects in the localization context (Grmseq, GrmΣ) are the grammars that are isomorphic to some L/μ; this was the content of Prop. 12.

Definition 14. A localization of an object A of C is a morphism λ : A → X with X W-local and λ ∈ W. (C, W) is called a good localization context iff every object A ∈ C has a localization.
Remark 13. (Grmseq, GrmΣ) is a good localization context, since the sort projection π : L → L/μ provides a localization for every L: that π ∈ Σ was the content of Prop. 11.

We have finally reached the point where we can invert the weak equivalences, i.e. we can find a functor Γ : C → CW⁻¹ that maps w ∈ W to isomorphisms in CW⁻¹ in a universal way (see Definition 18 in the appendix for the exact definition). Fortunately, we don't have to bother much with Γ, as the same information is contained, up to equivalence, in the category of local objects:

Proposition 13 (cf. [8]). Let (C, W) be a good localization context, and CW⁻¹ the category of fractions of W with inverting functor Γ. The composition of the inclusion with Γ, Γ ∘ i : locW(C) → C → CW⁻¹, is an equivalence of categories.

Remark 14. In the case of (Grmseq, GrmΣ) this says that if you want to treat sortal equivalences as isomorphisms, i.e. your grammars live in GrmseqΣ⁻¹, you might as well pretend that Grmseq contains only sortally reduced grammars, i.e. your grammars live in locΣ(Grmseq). This is unsurprising: when all you can see is sort algebraic, you can only see sort algebras. But can sort algebras really tell you everything you want to know about the structure of your grammars? This is the question we will turn to in the last section.
8 Conclusion
A drawback of the concept of sortal equivalence has already been pointed out above in Remark 10: all grammars come out as equivalent to their sort algebra. In the case of it2:2 and it3:2 sortal equivalence works fine, but these grammars do not possess recursive sorts! The problem with the sort projections is that they conflate the derivational structure of the expressions. As soon as recursive sorts are present, the sort algebra is no longer well founded in the sense of [17]: actually, our category of grammars locΣ(Grmseq) is populated by a lot of 'grammars' that could not pass the axioms imposed on 'reasonable' bare grammars by Keenan and Stabler. This is not only a matter of taste, as the category of grammars should possess an internal logic that makes certain grammatical constructs possible. If we choose the wrong category we might never be able to prove the things we would like to prove. It seems better, then, to keep our category of grammars free of general sort algebras, and to handle the sort information via a functor to a category of partial algebras. Apparently the effort in the two preceding sections was in vain. Even the category that we got from the localization was to be expected. But although we landed in the end right at the place from where we started, we have seen on the way a highly powerful tool for turning weak equivalences into isomorphisms at work. Situations where conventional isomorphisms yield a too rigid concept of equivalence abound in logic, computer science and linguistics, and various concepts of reduction, bisimulation, etc. are available to supplant isomorphisms. It would be interesting to know if and how they could fit in the context of the category of
fractions. From this rendezvous between model theory and model category theory both sides could profit; after all, at the geometric end the mysterious effectiveness of Galilei's language of lines and triangles in the description of the libro del mondo is just that: a mystery. Mathematicians refer to the categorical approach as localization because in some classical geometric applications the construction corresponds literally to a highlighting of local structure at the expense of global structure; that is, for them the construction is basically a filtering device that makes it possible to neglect certain global parts systematically in order to get a better grip on the surviving local features ([11]). In the special case we have considered, the localization stripped the grammar down to its sort algebra. But there are other situations in linguistics where one is interested in getting rid of some features: e.g. it would be nice to have a tool to extract from a grammar its purely 'context-sensitive aspects'.⁸ There is something to explore here.
Appendix

We provide some background for partial algebras and the category of fractions.

Partial Algebras

Good sources for the theory of partial algebras are [5,6,15].

Definition 15. Let Ω be a finite set (of operator symbols). A signature τ is a function τ : Ω → ℕ assigning an arity n to each operator symbol. A partial algebra over signature τ is a tuple X = ⟨|X|, (ϕⁱ over X)ᵢ∈Ω⟩ where |X| is a set, called the carrier of X, and the ϕⁱ : |X|^τ(i) → |X| are partial functions. X is called finitely generated iff there is a finite subset X₀ ⊆ |X| such that the smallest subalgebra of X containing X₀ is X itself.⁹

Definition 16. Let X, Y be two partial algebras over the signature τ. A function f : |X| → |Y| is called a homomorphism iff whenever ϕⁱ(x) is defined in X for an operation ϕⁱ and a tuple x, then ϕⁱ(f(x)) is defined in Y and f(ϕⁱ(x)) = ϕⁱ(f(x)). A homomorphism f is called full iff whenever ϕⁱ(f(x)) is defined in Y and equals f(z) for some z ∈ X, there is w ∈ X such that ϕⁱ(w) is defined in X and f(xⱼ) = f(wⱼ) for j = 1, ..., τ(i). A homomorphism f is called closed¹⁰ iff the definedness of ϕⁱ(f(x)) in Y implies the definedness of ϕⁱ(x) in X.

⁸ The case of modal variants of categorial grammar that provide for controlled context-sensitivity with modal operators suggests that this parallelism has something to it, since in a topos-theoretic setting geometric modal operators do indeed correspond to a closely related kind of localization (cf. [3,13,14]).
⁹ We abbreviate x₁, ..., xₙ by x, and write x ∈ X for x ∈ Xⁿ. When we write ϕ(x) we presuppose that x is a tuple of the appropriate length. In case X₀ is a finite generating set for X we write [X₀] for |X|.
¹⁰ Strong in the terminology of [15].
Definition 17. Let X be a partial algebra over τ. An equivalence relation ∼ on |X| is called a congruence relation iff whenever xᵢ ∼ yᵢ, i = 1, ..., n, and ϕⁱ(x₁, ..., xₙ) and ϕⁱ(y₁, ..., yₙ) are both defined, then ϕⁱ(x₁, ..., xₙ) ∼ ϕⁱ(y₁, ..., yₙ). A congruence relation is closed¹⁰ iff, whenever xᵢ ∼ yᵢ for i = 1, ..., n, ϕⁱ(x₁, ..., xₙ) is defined precisely in case ϕⁱ(y₁, ..., yₙ) is defined.

Category of Fractions

[2] is a highly recommendable source on category theory in general and on categories of fractions in particular. The articles [8,10] give nice introductions to the inversion of morphisms in the context of abstract homotopy theory, as does [19] for a case close to ours.

Definition 18. Let C be a category, and Σ a class of morphisms in C. A category of fractions for Σ is a category CΣ⁻¹ together with a functor Γ : C → CΣ⁻¹ satisfying: 1) for all s ∈ Σ, Γ(s) is an isomorphism; and 2) for all functors G : C → D such that G(s) is an isomorphism for all s ∈ Σ, there exists a unique functor F : CΣ⁻¹ → D with F ∘ Γ = G.

Remark 15. CΣ⁻¹ is the 'best' solution to the problem of inverting the morphisms in Σ. Formally, such a solution always exists, although it might burst through the roof of your universe. But after taking care of the set theoretician's headache, the category theoretician's headache sets in, because the obtained formal expressions are too unwieldy. There are several techniques to deal with this situation. The oldest is the so-called calculus of fractions, departing from the primordial case of localization: the passage from the integers ℤ to the ring of rationals ℚ. One can manipulate fractions because one can find common denominators and cancel; this suggests imposing similar restrictions on Σ in order to obtain cancellation properties ([2]). The second is Quillen's model category approach to homotopy theory, which supplements Σ with two other classes of morphisms in order to extract information from CΣ⁻¹ ([10,16,19,23]). Unfortunately, the axioms of model categories are themselves hard to check and sometimes demand too much categorical structure. Hence there are attempts to deflate the theory, e.g. Dwyer's approach to localizations ([8]), or the theory of homotopical categories ([9]).
References

1. Adámek, J., Herrlich, H., Strecker, G.E.: Abstract and Concrete Categories. Wiley, New York (1990); available as reprint from Dover, New York (2009), and for download at http://katmat.math.uni-bremen.de/acc
2. Borceux, F.: Handbook of Categorical Algebra 1: Basic Category Theory. Encyclopedia of Mathematics and its Applications, vol. 50. Cambridge University Press, Cambridge (1994)
3. Borceux, F.: Handbook of Categorical Algebra 3: Categories of Sheaves. Encyclopedia of Mathematics and its Applications, vol. 52. Cambridge University Press, Cambridge (1994)
4. Borceux, F., Janelidze, G.: Galois Theories. Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge (2001)
5. Burmeister, P.: A Model Theoretic Oriented Approach to Partial Algebras. Mathematische Forschung, vol. 32. Akademie-Verlag, Berlin (1986), http://www.mathematik.tu-darmstadt.de/~burmeister/
6. Burmeister, P.: Lecture notes on universal algebra - many sorted partial algebras (2002), http://www.mathematik.tu-darmstadt.de/~burmeister/
7. Conway, J.H., Burgiel, H., Goodman-Strauss, C.: The Symmetries of Things. AK Peters, Wellesley (2008)
8. Dwyer, W.G.: Localizations. In: Greenlees, J.P.C. (ed.) Axiomatic, Enriched and Motivic Homotopy Theory. Proceedings of the NATO ASI, pp. 3–28. Kluwer, Dordrecht (2004), http://hopf.math.purdue.edu/
9. Dwyer, W.G., Hirschhorn, P.S., Kan, D.M., Smith, J.H.: Homotopy Limit Functors on Model Categories and Homotopical Categories. AMS Monograph, vol. 113. American Mathematical Society, Providence (2004)
10. Dwyer, W.G., Spalinski, J.: Homotopy theories and model categories. In: James, I.M. (ed.) Handbook of Algebraic Topology, pp. 73–126. North-Holland, Amsterdam (1995), http://hopf.math.purdue.edu/
11. Eisenbud, D.: Commutative Algebra. Graduate Texts in Mathematics, vol. 150. Springer, Heidelberg (2004)
12. Goguen, J.A.: Realization is universal. Math. Sys. Theory 6(4), 359–374 (1973)
13. Goldblatt, R.: Grothendieck topology as geometric modality. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik 27, 495–529 (1981)
14. Goldblatt, R.: Topoi - The Categorial Analysis of Logic. Studies in Logic and the Foundations of Mathematics, vol. 98, 2nd edn. North-Holland, Amsterdam (1984)
15. Grätzer, G.: Universal Algebra. The University Series in Higher Mathematics. Van Nostrand, Princeton (1968)
16. Hovey, M.: Model Categories. AMS Monograph. American Mathematical Society, Providence (1999)
17. Keenan, E., Stabler, E.: Bare Grammar. CSLI, Stanford (2003)
18. Kobele, G.: Structure and similarity. Ms., UCLA (2002)
19. Lárusson, F.: The homotopy theory of equivalence relations (2006), arXiv:math.AT/0611344v1
20. Mac Lane, S., Birkhoff, G.: Algebra. Macmillan, New York (1967)
21. Milne, J.S.: Fields and Galois theory (2003), http://www.jmilne.org/math/
22. Milne, J.S.: Group theory (2003), http://www.jmilne.org/math/
23. Quillen, D.G.: Homotopical Algebra. LNM. Springer, Heidelberg (1967)
24. Scott, W.R.: Group Theory. Prentice-Hall, Englewood Cliffs (1964); available as reprint from Dover, New York (1987)
Deriving Syntactic Properties of Arguments and Adjuncts from Neo-Davidsonian Semantics

Tim Hunter

Department of Linguistics, University of Maryland
Abstract. This paper aims to show that certain syntactic differences between arguments and adjuncts can be thought of as a transparent reflection of differences between their contributions to neo-Davidsonian logical forms. Specifically, the crucial underlying distinction will be that between modifying an event variable directly, and modifying an event variable indirectly via a thematic relation. I note a convergence between the semantic composition of neo-Davidsonian logical forms and existing descriptions of the syntactic properties of adjunction, and then propose a novel integration of syntactic mechanisms with explicit neo-Davidsonian semantics which sheds light on the nature of the distinction between arguments and adjuncts.
This paper aims to show that certain syntactic differences between arguments and adjuncts can be thought of as a transparent reflection of differences between their contributions to neo-Davidsonian logical forms. Specifically, the crucial underlying distinction will be that between modifying an event variable directly, as violently and yesterday do in (1b), and modifying an event variable indirectly via a thematic relation, as b and c do in (1b).

(1) a. Brutus stabbed Caesar violently yesterday
    b. ∃e[stabbing(e) ∧ Stabber(e, b) ∧ Stabbee(e, c) ∧ violent(e) ∧ yesterday(e)]
I note a convergence between the semantic composition of neo-Davidsonian logical forms and the mechanisms added by Frey and Gärtner [6] to Stabler's Minimalist Grammar (MG) formalism [18] to allow adjunction phenomena. Frey and Gärtner provide an accurate formal encapsulation of what the distinctive syntactic properties of adjunction are. This paper contributes a novel integration of syntactic mechanisms with explicit neo-Davidsonian semantics which sheds light on why these properties cluster together.
1 Two Classes of Words
Consider the sentence in (2) and the variants of it in (3–6).
Thanks to Norbert Hornstein, Greg Kobele, Paul Pietroski, Amy Weinberg and Alexander Williams for helpful discussions related to this paper.
(2) Brutus stabbed Caesar

(3) a. Brutus stabbed Caesar violently
    b. Brutus stabbed Caesar yesterday
    c. Brutus stabbed Caesar violently yesterday
    d. Brutus stabbed Caesar yesterday violently

(4) a. * Brutus stabbed Caesar Cassius
    b. * Brutus stabbed Caesar Antony
    c. * Brutus stabbed Caesar Cassius Antony

(5) a. Caesar stabbed Brutus
    b. Brutus stabbed Cassius
    c. Antony stabbed Caesar

(6) a. * Brutus stabbed
    b. * stabbed Caesar
    c. * stabbed
First, we can infer from (3) that there is some class of words, including 'violently' and 'yesterday', which can be boundlessly added to the sentence in (2) without affecting grammaticality, though no word of this class need be present. Let us call this class of words Class 1. We also note that each sentence in (3) implies the one in (2), and infer that any sentence including a word of Class 1 implies the sentence just like it but with that word removed; see Fig. 1. The data in (4) indicate that there is also some other class of words, including at least 'Cassius' and 'Antony', which cannot be added to the sentence in (2) without affecting grammaticality. Let us call this class of words Class 2. Next, we can infer on the basis of (5) that 'Brutus' and 'Caesar' belong to a single class of words, since they can be interchanged without affecting grammaticality, and that this class must be Class 2, that of 'Cassius' and 'Antony'. We also discover a difference between Class 1 and Class 2: while interchanging two Class 1 words does not produce any obvious difference in meaning (compare (3c) and (3d)), interchanging two Class 2 words does (compare (2) and (5a)). Finally, (6) shows that Class 2 words also cannot be removed without affecting grammaticality, just as (4) shows that they cannot be added. In sum, we have discovered two classes of words with the following properties.

(7) Distributional properties:
    a. Of Class 1 words, any number zero or greater can be present in a sentence constructed around 'stabbed'.
    b. Of Class 2 words, exactly two must be present in a sentence constructed around 'stabbed'.
[Fig. 1. The "diamond entailment" pattern among the sentences in (2) and (3): (3c) ≡ (3d) entails both (3a) and (3b), each of which entails (2).]
(8) Semantic properties:
    a. When two Class 1 words are interchanged, no obvious difference in meaning results.
    b. When two Class 2 words are interchanged, an obvious difference in meaning results.¹
    c. Removing a Class 1 word from a sentence weakens its truth condition (i.e. never produces a false sentence from a true one, holding facts about the world constant).

These facts are not news, but the aim of this paper will be to show that the properties we have discovered for Class 1 words go together naturally, and that the properties we have discovered for Class 2 words go together naturally.² This provides an explanation for the lack of an imaginary Class 3, of which any number zero or greater can be present but which produce differences in meaning when interchanged, and the lack of an imaginary Class 4, of which exactly n must be present but which produce no difference in meaning when interchanged. In section 2 I introduce an initially-appealing account of the distributional facts that leaves the semantic facts unexplained. After considering a way to capture the semantic facts alone in section 3, and a more complicated account of the distributional facts alone in section 4, I show in sections 5 and 6 that these two independent proposals can be integrated into a unified account of the underlying distinction between Class 1 and Class 2.

¹ The relevant sense of "interchange" here is to swap the occurrences of two lexical items throughout a derivation, not just to swap their linear order. Thus the synonymy of 'John gave a book to Mary' and 'John gave Mary a book' is not a counterexample.
² Of course, what I have called Class 1 is the class of adjuncts and what I have called Class 2 is the class of arguments. I avoid the terms "argument" and "adjunct" only to prevent familiarity with these properties from obscuring the fact that the observed correlations do not yet follow from anything.
2 A First Approach
An account of the distributional properties in (7) of these two classes of words can be constructed in any formalism where expressions "select" others of certain types. Specifically, one can propose that 'stabbed' selects two words of Class 2 to yield a complete sentential expression, and that a word of Class 1 selects an
expression of some type to yield a new expression of that same type. In categorial grammars, this would be encoded using categories like those in (9) (ignoring directionality); in Minimalist Grammars (MGs) [18], it can be equivalently encoded using the lexicon in (10).

(9)  stabbed :: (v/d)/d    Brutus :: d    yesterday :: v/v
                           Caesar :: d    violently :: v/v

(10) stabbed :: =d =d v    Brutus :: d    yesterday :: =v v
                           Caesar :: d    violently :: =v v
This clearly derives the desired distributional properties in (7). But this "saturation" conception of syntactic composition is standardly taken to be reflected transparently in semantic composition, such that the syntactic selector denotes a function which is applied to the denotation of the selected expression: this is explicit throughout the categorial grammar literature, including categorial treatments of MGs [1,11,20], and (with some exceptions in cases more complex than anything considered here) in Kobele's work [8,9] on MGs as standardly presented. This produces the logical forms in (2′) and (3′) for the sentences in (2) and (3), which leave the semantic properties noted in (8) completely mysterious.³ Why should (3c) and (3d) be synonymous, and why should each of the sentences in (3) imply the one in (2)?

(2′) stab(c)(b)

(3′) a. violently(stab(c)(b))
     b. yesterday(stab(c)(b))
     c. yesterday(violently(stab(c)(b)))
     d. violently(yesterday(stab(c)(b)))

Of course, it is possible to define yesterday and violently such that these implications do hold, but this would be a separate, stipulated property of each such lexical item. Nothing in the formalism precludes lexical items with syntactic and semantic types identical to those of 'yesterday' and 'violently' (and therefore identical distribution) which would produce a difference in meaning when interchanged; that is, nothing in the formalism precludes the imaginary Class 3 mentioned above.

³ To simplify the comparison with the addition of a specialised adjoin operation in MGs, discussed in section 4, I assume that 'yesterday' and 'violently' attach higher than both 'Brutus' and 'Caesar', such that semantically they must be operators on structured propositions. If they were to attach lower, with a category of the form (v/d)/(v/d), semantically VP-operators, the lack of the relevant implications remains.
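The point is easy to see computationally. In the toy sketch below (encoding mine), the operators of (3′) build opaque structured propositions, and nothing identifies (3′c) with (3′d):

```python
# With 'violently' and 'yesterday' as arbitrary proposition operators, (3'c)
# and (3'd) are simply different terms; synonymy is not predicted.
stab = lambda x: lambda y: ('stab', x, y)
violently = lambda p: ('violently', p)
yesterday = lambda p: ('yesterday', p)

c3 = yesterday(violently(stab('c')('b')))   # (3'c)
d3 = violently(yesterday(stab('c')('b')))   # (3'd)
print(c3 == d3)   # False
```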
A second problem with this approach is that the selecting expression is generally taken to contribute the head of a newly-formed expression (such that ‘stabbed’ is the head of ‘stabbed Caesar’), which predicts that the head of ‘Brutus stabbed Caesar yesterday’ should be ‘yesterday’. This conflicts with the general assumption that the head of ‘Brutus stabbed Caesar yesterday’ should be the head of ‘Brutus stabbed Caesar’.4 I return to this in section 4, after considering a way to capture the desired semantic properties in section 3.
3 Neo-Davidsonian Logical Forms
In order to capture the implications which are left unexplained by (2′) and (3′), it has been proposed [4,5,3,13,16,14] that the sentences in (2) and (3) should be associated with the logical forms in (2″) and (3″), where stabbing, violent and yesterday are predicates of events.⁵

(2″) ∃e[stabbing(e) ∧ Stabber(b)(e) ∧ Stabbee(c)(e)]

(3″) a. ∃e[stabbing(e) ∧ Stabber(b)(e) ∧ Stabbee(c)(e) ∧ violent(e)]
     b. ∃e[stabbing(e) ∧ Stabber(b)(e) ∧ Stabbee(c)(e) ∧ yesterday(e)]
     c. ∃e[stabbing(e) ∧ Stabber(b)(e) ∧ Stabbee(c)(e) ∧ violent(e) ∧ yesterday(e)]
     d. ∃e[stabbing(e) ∧ Stabber(b)(e) ∧ Stabbee(c)(e) ∧ yesterday(e) ∧ violent(e)]
The logical form in (3d″) (equivalent to that in (3c″)) says that there exists an event with five properties: it was a stabbing event, its stabber was Brutus, its stabbee was Caesar, it was yesterday, and it was violent. The five words in (3d) correspond one-to-one with these properties. The observation in (8c) of the implications from (3) to (2) then follows trivially. What is particularly crucial for present purposes is to note the differences between the way in which 'Brutus' and 'Caesar', and the way in which 'violently' and 'yesterday', contribute their respective conjuncts. Whereas the lexical contents of 'violent' and 'yesterday' modify the event variable directly, the lexical contents of 'Brutus' and 'Caesar' do so only indirectly, via the thematic relations Stabber and Stabbee. Thus there is no room for 'yesterday' to contribute to the meaning of (3d) in some way that is different from the way it contributes to the meaning of (3c), and likewise for 'violently'. But the more complex relation between 'Brutus' (or 'Caesar') and the event variable leaves room for 'Brutus'

⁴ Or, if 'yesterday' has a type like (v/d)/(v/d), this approach predicts that the head of 'stabbed Caesar yesterday' should be 'yesterday'. The problem remains.
⁵ It is more common to see something like Stabber(e, b) in place of Stabber(b)(e) as I have written here. For reasons that will become clear below, it is useful to think of Stabber and Stabbee as curried two-place functions, i.e. functions from individuals to event predicates (cf. [2]).
T. Hunter
to contribute to the meaning of (5a) in a way that is different from the way it contributes to the meaning of (2).6 These logical forms therefore seem to permit the right two kinds of semantic contributions, given what we observed in section 1: elements that contribute atomic event predicates will be interchangeable in a way that leaves meaning unaffected (property (8a)), and those that contribute parts of complex event predicates will not (property (8b)). But what remains unexplained is why semantic property (8a) is correlated with distributional property (7a), and why semantic property (8b) is correlated with distributional property (7b). As mentioned in section 2, it is possible to choose lexical meanings such that the function-application approach to semantic composition illustrated in (2 ) and (3 ) yields exactly the forms in (2 ) and (3 ), and therefore produces the desired implication relations. The meanings in (11) and (12), for example (with existential closure understood to apply to a sentential event predicate), would suffice. (11) (12)
stab = λxλyλe.stabbing(e) ∧ Stabber(y)(e) ∧ Stabbee(x)(e) violently = λP λe.P (e) ∧ violent(e)
This correctly encodes the fact that ‘violently’ has the semantic properties we observed in section 1. It says nothing, however, about the general correlation between these semantic properties and the distributional properties of optionality and iterability (property (7a)). A formalism where semantics mirrors syntax in the way sketched in section 2 is equally consistent with other lexical items with the same type, and therefore the same distributional properties, as ‘violently’, but with different semantic properties. But we do not find variants with meanings like those shown in (13).7 (13)
violently = λP λe.P (e) ∨ violent(e) violently = λP λe.P (e) → violent(e)
In other words, a word with the distribution of ‘violently’ does not “choose” its own logical connective. 6
7
Granted, the non-interchangeability observed in (8b) only follows if the thematic relations Stabber and Stabbee are necessarily distinct. The integrated proposal in section 5 will, I hope, make this seem reasonable. Even if it must be stipulated that they are distinct, this is certainly no worse than stipulating that stab(c)(b) and stab(b)(c) must be distinct in the system of section 2. This is of course a simplification: there exist adverbs with non-intersective interpretations, such as ‘allegedly’ and ‘apparently’. Whatever the correct logical form turns out to be for sentences such as ‘Brutus walked allegedly’, it seems likely that the relevant logical connective will be conjunction (and not, say, disjunction), although the conjuncts will not be of the simple sort I restrict attention to in this paper. Note that this problem is not confined to event-based adverbial modification: any theory that treats adjectival modification as generally intersective (eg. [7]) will encounter similar problems with ‘fake diamond’, ‘big ant’ and ‘alleged stabbing’. Larson’s [10] approach to some such adjectives treats them as intersections not of sets of individuals but rather of sets of events (see also [12]) — as mentioned above, even these seem to express conjunction of something.
Deriving Syntactic Properties of Arguments and Adjuncts
109
The same argument can be made with respect to the logical connective in the lexical meaning of stab given in (11): we do not find variants with meanings like those shown in (14). (14)
stab = λxλyλe.stabbing(e) ∨ Stabber(y)(e) ∨ Stabbee(x)(e) stab = λxλyλe.stabbing(e) ∧ Stabber(y)(e) → Stabbee(x)(e)
This relies crucially on the assumption that the logical form of ‘Brutus stabbed Caesar’ is not merely ∃e[stabbing(e, b, c)]; but see especially Schein [16,17] for convincing evidence of this. Given this assumption, we can observe that verbs, like adverbs, do not “choose” their own logical connective. In section 5 I will propose an alternative formalism which does not allow us to define the lexical meanings in (13) and (14), but rather permits only the degrees of freedom that natural languages seem to make use of. The new proposal will tie the neo-Davidsonian logical forms discussed here to an account of the syntactic properties of adjunction which I describe in the next section.
4
An Adjunction Operation in MGs
Recall that the simple selection-based proposal in section 2 wrongly establishes Class 1 words as heads — ‘yesterday’ as the head of ‘Brutus stabbed Caesar yesterday’, for example. I now turn to a proposed addition to the MG formalism which captures the distributional facts in (7) while avoiding this problem. This will be shown in the next section to unify elegantly with the event-based logical forms that capture the semantic properties in (8). The operation for combining two expressions in the original MGs [18], merge, is the analogue of slash-elimination in categorial grammars that would derive (2) from the lexicon in (10) in the obvious way.8 (15)
stabbed :: =d =d v Caesar :: d stabbed Caesar : =d v
merge
This is a “symmetric feature checking” operation in that it deletes one feature from ‘stabbed’ and one feature from ‘Caesar’ in the above example. Frey and G¨ artner [6] introduce an asymmetric feature checking counterpart of merge, called adjoin. This checks a feature of a new sort, written ~f, on one of its input expressions, and leaves the features of the other input expression unchanged (though this other expression must have a feature f which “matches” the ~f feature to be checked). (16)
Brutus stabbed Caesar : v yesterday :: ~v Brutus stabbed Caesar yesterday : v
adjoin
Furthermore, the adjoin operation is defined such that the head of the new expression is (the head of) the expression whose features remained unchanged. The 8
Although it will not be crucial for an understanding of what is to come, a short introduction to the MG formalism is given in the appendix.
110
T. Hunter
lexicon shown in (17) will therefore capture the desired distributional properties in (7) while assigning each expression the correct head. (17)
stabbed :: =d =d v
Brutus :: d Caesar :: d
yesterday :: ~v violently :: ~v
As stated at the outset, these mechanisms provide an accurate formal description of the observed distributional properties, but the question remains: why does distributional property (7a) coincide with semantic property (8a), and distributional property (7b) with semantic property (8b)? Put differently, we have observed in section 3 that neo-Davidsonian logical forms permit the right two kinds of semantic composition (atomic event predicates and complex thematic ones), and also that the supplemented MG formalism permits the right two kinds of syntactic composition; we would still like to know why adjoin produces only simple event predicates and merge produces only complex ones. The next section answers this question by showing that the feature-checking patterns of adjoin and merge line up neatly with the differing patterns of semantic composition exhibited by Class 1 and Class 2 words respectively in neo-Davidsonian logical forms.
5
An Integrated Proposal
I note now a convergence between the neo-Davidsonian logical forms presented in section 3 and the MG formalism when supplemented with the adjoin operation presented in section 4. When ‘stabbed’ combines with ‘Caesar’ via the merge operation, as shown in (15), two features are checked, and the predicate of events that is introduced has two “ingredients”, Stabbee and b. When ‘Brutus stabbed Caesar’ combines with ‘yesterday’ via the adjoin operation, as shown in (16), only one feature is checked, and the predicate of events that is introduced has only one “ingredient”, yesterday. Suppose, then, that features are annotated with these semantic ingredients, as shown in (18).
(18)
stabbed :: =d, Stabbee =d, Stabber v, stabbing Brutus :: d, b yesterday :: ~v, yesterday Caesar :: d, c violently :: ~v, violent
The intuition is that c will be established as the Stabbee precisely because the application of merge that checks the d feature of ‘Caesar’ also checks the =d feature of ‘stabbed’ which is annotated with the Stabbee thematic relation. Likewise, b will be established as the Stabber as a result of the second application of merge. These two steps, constituting the derivation of (2), are shown in Fig. 2(a),9 where we define P & Q := λe.P (e) ∧ Q(e). 9
When a sentential expression denotes a predicate P , the sentence’s truth condition is ∃e[P (e)].
stabbed :: =d, Stabbee =d, Stabber v, stabbing Caesar :: d, c stabbed Caesar : =d, Stabber v, stabbing & Stabbee(c) Brutus stabbed Caesar : v, stabbing & Stabbee(c) & Stabber(b) merge
Fig. 2. Derivations referred to in the main text
(c) An application of merge with existential closure of the selected predicate
Brutus who is tall : d, b & tall
stabbed Caesar : =d, Stabber v, stabbing & Stabbee∃ (c) Brutus who is tall stabbed Caesar : v, stabbing & Stabbee∃ (c) & Stabber∃ (b & tall)
(b) An application of adjoin with corresponding conjunctive interpretation
Brutus stabbed Caesar : v, stabbing & Stabbee(c) & Stabber(b)
yesterday :: ~v, yesterday Brutus stabbed Caesar yesterday : v, stabbing & Stabbee(c) & Stabber(b) & yesterday
(a) Two applications of merge with corresponding thematic roles being assigned
Brutus :: d, b
merge
adjoin
merge
Deriving Syntactic Properties of Arguments and Adjuncts 111
112
T. Hunter
This suggests the general schema in (19) for the merge operation, though this will be revised in section 6. (19)
s1 :: =f1 , θ1 . . . =fn , θn g, α s2 :: f1 , β s1 s2 : =f2 , θ2 . . . =fn , θn g, α & θ1 (β)
merge
The derivation can continue with an application of the adjoin operation which adds ‘yesterday’. This does not check any features of ‘Brutus stabbed Caesar’, and the conjunct added is simply yesterday, as shown in Fig. 2(b). This suggests the general schema in (20) for the adjoin operation. s1 :: f, α s2 :: ~f, β (20) adjoin s1 s2 : f, α & β Clearly this requires that α and β be predicates of some common type. Let T be a function mapping syntactic categories to semantic types; and let e be the semantic type of entities/individuals, s the semantic type of events, and t the semantic type of truth values. Then we can set T (v) = s → t, for example, and require that the semantic annotation of any feature f, and of any feature ~f, be of type T (f). Returning now to the schema for merge in (19), the semantic value α & θ1 (β) must be of type T (g), as must therefore θ1 (β) itself. Since β must be of type T (f1 ), θ1 must be of type T (f1 ) → T (g). The semantic annotation of a selecting feature, then, must be a function mapping the semantic type corresponding to the selected syntactic category, to the semantic type corresponding to the syntactic category at the end of its feature sequence. Setting T (d) = e, the Stabbee annotation of the =d feature of ‘stabbed’, for example, is of type T (d) → T (v) or e → (s → t), a function from individuals to event predicates. This can be thought of as an implementation of Carlson’s [2] idea that thematic roles adjust NP denotations to make them suitable for intersection (conjunction) with a set (predicate) of events. This theory accounts for the correlation of properties of Class 1 words and Class 2 words observed in (2-6). Class 1 words are those that have as semantic value a predicate with type T (v), ready for conjunctive interpretation; they are therefore optional but can be added without bound. Class 2 words are those that have a semantic value of some other type, and must therefore interact with a thematic relation for interpretation; since each verb makes available a precise finite number of these thematic relations, a precise finite number of these words will need to appear.
6
Generalising beyond the Verbal Domain
If these schemas in (19) and (20) were to be adopted for the merge and adjoin operations throughout the grammar, a problem would arise. The arbitrary category f in the adjoin schema must correspond some monadic predicate type, in order for the schema to be well-typed; but the nominal expressions of category
Deriving Syntactic Properties of Arguments and Adjuncts
113
d that are selected by verbs, which I have assumed denote individuals (type e), are thought to be possible adjunction sites as well. Indeed, adjunction to any category is generally thought to be possible, suggesting that for any category f, T (f) = τ → t for some semantic type τ . Pietroski [14,15] has suggested exactly this: that every expression of natural language denotes some monadic predicate. Space constraints preclude rehearsing the independent empirical justification for this here, but the proposal allows for the distinction noted in the verbal domain, between the two ways to modify a neo-Davidsonian event variable, to be extended to other categories. A canonical case of adjunction in the nominal domain, with a clear conjunctive interpretation, is the attachment of a relative clause. This is illustrated in the following application of the adjoin schema, where tall and b are both predicates of individuals (type e → t): who is tall : ~d, tall Brutus who is tall : d, b & tall
Brutus :: d, b
adjoin
The interpretation of the checking of selecting features, upon application of merge, must now involve existential closure of the predicate denoted by the selected expression. We define in (21) a unary operator ·∃ that modifies the annotations of selecting features (i.e. generalisations of thematic roles) to include the appropriate existential closure, such that the truth condition in (22b) can be written as in (22c). (21) (22)
R∃ (P ) := λe.∃x[P (x) ∧ R(x)(e)] a. Brutus, who is tall, stabbed Caesar b. ∃e[stabbing(e) ∧ ∃x[(b & tall)(x) ∧ Stabber(x)(e)] ∧ ∃y[c(y) ∧ Stabbee(y)(e)]] c. ∃e[(stabbing & Stabber∃ (b & tall) & Stabbee∃ (c))(e)]
So if R is (the curried characteristic function of) a binary relation between elements of τ1 and τ2 , and P is a predicate of elements of τ1 , then R∃ (P ) is the predicate satisfied by e ∈ τ2 iff there is some x ∈ τ1 satisfying P such that R(x)(e). For example, Stabber∃ (P ) is a predicate satisfied by those events of which a stabber satisfies P . The revised schema for the merge operation in (23) applies this ·∃ operator to the annotation of the selecting features. (23)
s1 :: =f1 , θ1 . . . =fn , θn g, α s2 :: f1 , β s1 s2 : =f2 , θ2 . . . =fn , θn g, α & θ1∃ (β)
merge
Now the semantic annotation θi for each selecting feature =fi is of type dom(T (fi )) → T (g), such that θi∃ is of type T (fi ) → T (g). Then the last step in the derivation of (22a) will proceed as shown in Fig. 2(c).
114
T. Hunter
The complete lexicon for the examples I have considered is given in (24).
(24)
stabbed :: =d, Stabbee , =d, Stabber , v, stabbing Brutus :: d, b yesterday :: ~v, yesterday Caesar :: d, c violently :: ~v, violent who is tall :: ~d, tall stabbing, yesterday, violent ∈ (s → t) b, c, tall ∈ (e → t)
where:
Stabber, Stabbee ∈ (e → (s → t)) If we limit our attention to applications of merge, then syntactic types (i.e. feature sequences) uniquely determine semantic types, just as in familiar categorial grammars. If in addition we assume that no semantic type corresponds to more than one syntactic category (i.e. that T is injective), there is a one-to-one correspondence between syntactic types and semantic types: whereas (most) categorial grammars use distinct slashes to produce two syntactic types for each semantic type as shown in (25), MGs do not distinguish between selection on the left or on the right (see appendix) and therefore we have the single rule in (26). (25)
T (A/B) = T (B) → T (A)
T (B\A) = T (B) → T (A)
(26) T (=f1 . . . =fn g) = (dom(T (f1 )) → T (g)) × . . . × (dom(T (fn )) → T (g)) × T (g) With this in mind, we can understand syntactic composition to be “driven by” semantic types when the merge rule is applied in (27), much as we can think of the composition in (28) as driven by (simpler and more familiar) semantic types. (27)
Brutus :: be→t
Brutus walked : walking & Walker∃ (b)
merge
Brutus :: be walked :: walkede→t Brutus walked :: walked(b)
(28)
7
walked :: Walker, walking(e→(s→t))×(s→t)
Conclusion
This unified system presented in sections 5 and 6 illustrates that the stipulated differences between the two modes of syntactic composition in the MG formalism can be thought of as a transparent reflection of neo-Davidsonian semantic composition. The correlation of distributional properties and semantic properties observed in section 1 is then explained. An expression will either denote a
predicate of the same type as the one it attaches to, in which case its interpretation will be strictly conjunctive and it will be able to attach unassisted; or it will denote a predicate of some different type, in which case it will need to interact with an appropriate “thematic” relation, of which the attachment site makes a precise finite number available, for interpretation to be possible.
A Appendix: (A Tiny Subset of) Minimalist Grammars
The aim of this appendix is to make explicit the way in which the merge and adjoin operations compose the string components of MG expressions, which I have glossed over in this paper. It is certainly not a general introduction to the MG formalism; see [18,19] for that purpose. We assume an alphabet Σ and a set C of categories. The set of features is F = C ∪ {=c : c ∈ C} ∪ {~c : c ∈ C} and the set of expressions is E = Σ∗ × {:, ::} × F∗. Lexical expressions or lexical items are expressions with :: as their second coordinate; expressions with : as their second coordinate are non-lexical or complex. In the original presentation of MGs [18], the non-lexical expressions are binary trees with lexical items at the leaves. These trees allow movement operations to be defined which rearrange the internal structure of a tree, but since I have made no use of these movement operations in this paper we can ignore this additional complexity and instead adopt representations more like, but even simpler than, those of [19]. A language is defined as the closure of a set of lexical items under the structure-building functions: for our purposes, only two are relevant, merge and adjoin, where merge is the union of the two functions merge1 and merge2, defined below. In these definitions c ranges over categories, α ranges over sequences of features, · ranges over {:, ::} and s, t range over strings.

s :: =cα        t · c
───────────────────── merge1
st : α

s : =cα        t · c
──────────────────── merge2
ts : α

s · cα        t · ~c
──────────────────── adjoin
st : cα

In brief, when a lexical expression "selects" another expression, the string yield of the selectee (a complement) is concatenated on the right (merge1), and when a complex expression selects another expression, the string yield of the selectee (a specifier) is concatenated on the left (merge2); and when an expression adjoins to another, its string yield is concatenated on the right. In the main text of the paper I take these rules as operations on categorised strings as a starting point, and propose a way to add semantics to them.
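As a concrete rendering of these three rules, here is a sketch under my own encoding (not code from the paper): an expression is a triple of a string, a lexical flag, and a feature list.

def merge1(e1, e2):
    """Lexical selector: the selectee's string goes on the right."""
    (s, lex, f1), (t, _, f2) = e1, e2
    assert lex and f1[0].startswith('=') and f2 == [f1[0][1:]]
    return (s + ' ' + t, False, f1[1:])

def merge2(e1, e2):
    """Complex selector: the selectee's string goes on the left."""
    (s, lex, f1), (t, _, f2) = e1, e2
    assert not lex and f1[0].startswith('=') and f2 == [f1[0][1:]]
    return (t + ' ' + s, False, f1[1:])

def adjoin(e1, e2):
    """An adjunct with feature ~c attaches on the right of a category-c host."""
    (s, _, f1), (t, _, f2) = e1, e2
    assert f2 == ['~' + f1[0]]
    return (s + ' ' + t, False, f1)

stabbed = ('stabbed', True, ['=d', '=d', 'v'])
caesar = ('Caesar', True, ['d'])
print(merge1(stabbed, caesar))   # ('stabbed Caesar', False, ['=d', 'v'])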
References
1. Amblard, M.: Représentations sémantiques pour les grammaires minimalistes. Tech. Rep. 5360, INRIA (2004)
2. Carlson, G.: Thematic roles and their role in semantic interpretation. Linguistics 22, 259–279 (1984)
3. Castañeda, H.N.: Comments. In: Rescher, N. (ed.) The Logic of Decision and Action. University of Pittsburgh Press, Pittsburgh (1967)
4. Davidson, D.: The logical form of action sentences. In: Rescher, N. (ed.) The Logic of Decision and Action. University of Pittsburgh Press, Pittsburgh (1967)
5. Davidson, D.: Adverbs of action. In: Vermazen, B., Hintikka, M. (eds.) Essays on Davidson: Actions and Events. Clarendon Press, Oxford (1985)
6. Frey, W., Gärtner, H.M.: On the treatment of scrambling and adjunction in minimalist grammars. In: Jäger, G., Monachesi, P., Penn, G., Wintner, S. (eds.) Proceedings of Formal Grammar 2002, pp. 41–52 (2002)
7. Heim, I., Kratzer, A.: Semantics in Generative Grammar. Blackwell, Oxford (1998)
8. Kobele, G.M.: Generating Copies: An Investigation into Structural Identity in Language and Grammar. Ph.D. thesis, UCLA (2006)
9. Kobele, G.M.: Inverse linking in minimalist grammars, ms. (2009)
10. Larson, R.K.: Events and modification in nominals. In: Proceedings of SALT VIII, pp. 145–168. CLC Publications, Ithaca (1998)
11. Lecomte, A., Retoré, C.: Extending Lambek grammars: a logical account of minimalist grammars. In: ACL 2001: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pp. 362–369. Association for Computational Linguistics, Morristown (2001)
12. McNally, L., Boleda, G.: Relational adjectives as properties of kinds. Empirical Issues in Formal Syntax and Semantics 5, 179–196 (2004)
13. Parsons, T.: Events in the Semantics of English. MIT Press, Cambridge (1990)
14. Pietroski, P.M.: Events and Semantic Architecture. Oxford University Press, New York (2005)
15. Pietroski, P.M.: Interpreting concatenation and concatenates. Philosophical Issues 16(1), 221–245 (2006)
16. Schein, B.: Plurals and Events. MIT Press, Cambridge (1993)
17. Schein, B.: Events and the semantic content of thematic relations. In: Preyer, G., Peter, G. (eds.) Logical Form and Language, pp. 263–344. Oxford University Press, Oxford (2002)
18. Stabler, E.P.: Derivational minimalism. In: Retoré, C. (ed.) LACL 1996. LNCS (LNAI), vol. 1328, pp. 68–95. Springer, Heidelberg (1997)
19. Stabler, E.P., Keenan, E.L.: Structural similarity within and among languages. Theoretical Computer Science 293, 345–363 (2003)
20. Vermaat, W.: Controlling Movement: Minimalism in a Deductive Perspective. Master's thesis, Utrecht University (1999)
On Monadic Second-Order Theories of Multidominance Structures

Stephan Kepser
Königgrätzer Str. 11, 42699 Solingen, Germany
[email protected]
1 Introduction
Multidominance structures were introduced by Kracht [4] to provide a data structure for the formalisation of various aspects of GB-Theory. Kracht studied the PDL-theory of MDSes and showed in [5] that this theory is decidable, in fact 2EXPTIME-complete. He went on to conjecture that the MSO-theory of MDSes should thus be decidable, too. We show here the contrary: both the MSO-theory over vertices only and the MSO-theory over vertices and edges turn out to be undecidable.
2 Preliminaries
Graphs

All graphs and structures discussed in this paper are finite. An undirected graph is a pair G = (VG, EG) where VG is a finite set of vertices and EG is a set of edges, a subset of VG × VG. In the graphs (and multi graphs) we consider there is never an edge from any vertex to itself. A graph signature Σ is a finite set of symbols together with an arity. In this paper we do not consider hypergraphs. Consequently, all symbols have arity 0 for vertex labels or 2 for edge labels. Thus we split Σ into Σ0 and Σ2. Edge labels are sometimes also called colours. For the purpose of this paper we are mainly interested in the structure of graphs. We thus assume that Σ0 consists of a single blank symbol, which is suppressed in the following.

A multi graph of signature Σ is a quadruple G = (VG, EG, sig, inc) where VG is a finite set of vertices, EG is a finite set of directed edges, sig : VG ∪ EG → Σ0 ∪ Σ2 is a function assigning each vertex a label and each edge a colour, and inc : EG → VG × VG is a function assigning each edge its starting and ending vertices. Note that there may be more than one edge between two vertices of a multi graph. A multi graph is simple iff each pair of vertices is connected by at most one edge. Paths in a multi graph are uncoloured but directed. A multi graph is acyclic iff no path connects a vertex with itself. A multi graph is rooted iff there is a vertex r such that (i) there is no vertex v and edge e with inc(e, v, r) and (ii) every vertex is reachable from r.
Fig. 1. The complete graphs K3 , K4 , and K5
The underlying undirected graph Gu of a multi graph G = (VG, EG, sig, inc) is Gu = (VG, Eu) where (v, w) ∈ Eu iff there is some e ∈ EG with inc(e, v, w) or inc(e, w, v). That is, we forget about the direction of edges, and multiple edges between two vertices are reduced to one. The complete graph Kk is an undirected graph G = (V, E) where V = {1, . . . , k} and for all 1 ≤ i, j ≤ k: (i, j) ∈ E iff i ≠ j, i.e., there is an edge between each pair of different vertices. Figure 1 shows the complete graphs K3, K4, K5.

Tree width is a notion introduced by Robertson and Seymour [6] to measure how similar to a tree a graph is. It assigns each graph a natural number, where smaller means closer to a tree. The number can be interpreted as the maximal number of independent paths between any two vertices. The following formal definition works independently of whether a graph is directed or not.

Definition 1. A tree decomposition of a multi graph G = (V, E, sig, inc) is a pair (T, S), where T is an unordered tree and S is a family of sets indexed by the vertices of T such that
1. ∪_{Xt ∈ S} Xt = V.
2. For all e ∈ E there is a unique Xt ∈ S such that if inc(e, v, w) then v, w ∈ Xt.
3. For all v ∈ V the subgraph of T induced by {t | v ∈ Xt} is connected.

The width of such a decomposition is max_{Xt ∈ S} |Xt| − 1, i.e., the largest number of vertices in a single set of the decomposition minus 1. A graph G is of tree width k if and only if the smallest width of a tree decomposition of G is k. We will make use of the following two simple observations known from graph theory.

Lemma 1. 1. The complete graph Kk has tree width k − 1.
2. Let G = (V, E, sig, inc) be a multi graph of tree width k. Then its underlying undirected graph Gu has the same tree width k.

The following notion of a graph minor was also introduced and extensively studied by Robertson and Seymour.

Definition 2. A graph G is a minor of H = (V, E, sig, inc) if it is the result of applying a finite sequence of the following three operations.
– Edge deletion: If e is an edge, then this operation removes e from the graph.
– Vertex deletion: If v is an unconnected vertex, then this operation removes v from the graph.
– Edge contraction: Let e be an edge with inc(e, v, w). Then edge e and vertex w are removed from the graph and each occurrence of w in inc is replaced by v. In effect, vertices v and w are fused.

The tree width of a graph minor provides a lower bound for the tree width of a graph.

Lemma 2. ([1], Lemma 16) If G is a minor of H, then tree width(G) ≤ tree width(H).

Monadic Second-Order Logic of Graphs

When talking about monadic second-order theories of graphs one distinguishes whether quantification is restricted to vertices or is applicable to vertices and edges. The first theory is usually denoted MS1, the second MS2. It is known that definability of and decidability over certain classes of finite graphs vary depending on whether MS1 or MS2 is considered.

The MS1-theory has vertex variables only; there are individual and set variables. For each vertex label L ∈ Σ0 and individual variable x there is an atomic formula L(x). For each edge colour C ∈ Σ2 and pair of individual variables x, y there is an atomic formula C(x, y). Furthermore equality and set membership are atomic. Complex formulae are constructed by boolean connectives and first-order and set existential and universal quantification. More precisely, let X0 = {x0, x1, x2, . . . } be a denumerably infinite set of first-order vertex variables and X1 = {X0, X1, X2, . . . } be a denumerably infinite set of vertex set variables. Atomic formulae are
– L(x) for each vertex label L ∈ Σ0,
– C(x, y) for each edge colour C ∈ Σ2,
– x = y,
– x ∈ X,
– X = Y.
Complex formulae are created by boolean operations and quantification (let φ and ψ be formulae):
– ¬φ, φ ∧ ψ, φ ∨ ψ,
– ∀xφ, ∃xφ,
– ∀Xφ, ∃Xφ.
The semantics is the usual one. The MS2-theory has two sorts of variables, namely vertex variables and edge variables. For both sorts there are individual and set variables. For each vertex
label L ∈ Σ0 and individual vertex variable x there is an atomic formula L(x). For each edge colour C ∈ Σ2, individual edge variable e and pair of individual vertex variables x, y there is an atomic formula incC(e, x, y). Furthermore equality and set membership for both sorts of variables are atomic. Complex formulae are constructed by boolean connectives and first-order and set existential and universal quantification for both sorts of variables. Let V be the sort of vertices and E be the sort of edges. Let X0 = {x0, x1, x2, . . . } be a denumerably infinite set of first-order variables of sort V and X1 = {X0, X1, X2, . . . } be a denumerably infinite set of set variables of sort V. Let E0 = {e0, e1, e2, . . . } be a denumerably infinite set of first-order variables of sort E and E1 = {E0, E1, E2, . . . } be a denumerably infinite set of set variables of sort E. The atomic formulae are
– L(x) for each vertex label L ∈ Σ0,
– incC(e, x, y) for each edge colour C ∈ Σ2,
– x = y,
– x ∈ X,
– X = Y,
– e0 = e1,
– e ∈ E,
– E0 = E1.
Quantification can now be applied to vertices and edges. More precisely, the complex formulae are constructed as follows (where φ and ψ are formulae):
– ¬φ, φ ∧ ψ, φ ∨ ψ,
– ∀xφ, ∃xφ,
– ∀Xφ, ∃Xφ,
– ∀eφ, ∃eφ,
– ∀Eφ, ∃Eφ.
The semantics is the usual one.
3 Multidominance Structures
We introduce multidominance structures in this section, quoting the relevant definitions from [5]. MDSes are structures which can be seen as being derived from binary trees.¹ As binary trees, they are rooted directed graphs where each vertex has either 0 or 2 immediate successors. The graph may not contain a loop. In contrast to trees, a vertex may have more than one immediate predecessor. The set of immediate predecessors is linearly ordered. Technically – and we follow here the description by Kracht [5] – the symbol ≫ defines an immediate dominance relation, where x ≫ y is read as x dominates y. Its inverse is denoted by ≺. Nodes are downward binary branching, they have at most two children. The following text is a longer quote from [5].

¹ We will later on see that they differ substantially from trees in important ways.
To implement this we shall assume two relations, ≫0 and ≫1, each of which is a partial function, and ≫ = ≫0 ∪ ≫1. We do not require the two relations to be disjoint. Recall the definition of the transitive closure R+ of a binary relation R ⊆ U × U over a set U. It is the least set S containing R such that if (x, y) ∈ S and (y, z) ∈ S then also (x, z) ∈ S. Recall that R is loop free if and only if R+ is irreflexive. Also, R∗ := {(x, x) | x ∈ U} ∪ R+ is the reflexive, transitive closure of R.

Definition 3. A preMDS is a structure ⟨M, ≫0, ≫1⟩, where the following holds (with ≫ = ≫0 ∪ ≫1):
(1) If y ≫0 x and y ≫0 x′ then x = x′.
(2) If y ≫1 x and y ≫1 x′ then x = x′.
(3) If y ≫1 x then there is a z such that y ≫0 z.
(4) There is exactly one x such that for no y, y ≫ x (this element is called the root).
(5) ≺+ is irreflexive.
(6) The set M(x) := {y : x ≺ y} is linearly ordered by ≺+.

We call a pair ⟨x, y⟩ such that x ≺ y a link. We shall also write x; y to say that ⟨x, y⟩ is a link. An MDS is shown in Figure 2. The lines denote the immediate daughter links. For example, there is a link from a upward to c. Hence we have a ≺ c, or, equivalently, c ≫ a. We also have b ≺ a. We use the standard practice of making the order of the daughters implicit: the leftward link is to the daughter number 0. This means that a ≺0 c and b ≺1 c. Similarly, it is seen that b ≺1 d and b ≺1 h, while c ≺0 d and g ≺0 h. It follows that M(a) = {c}, while M(b) = {c, d, h}. A link ⟨x, y⟩ such that y is minimal in M(x) is called a root link. For example, ⟨b, c⟩ is a root link, since c ≺+ d and c ≺+ h. A link that is not a root link is called derived. A leaf is a node without daughters.

For technical reasons we shall split ≺0 and ≺1 into two relations each. Put x ≺00 y iff (= if and only if) x ≺0 y and y is minimal in M(x); and put x ≺01 y iff x ≺0 y but y is not minimal in M(x). Alternatively, x ≺00 y if x ≺0 y and ⟨x, y⟩ is a root link. Let x ≺01 y iff x ≺0 y but not x ≺00 y. Then by definition ≺00 ∩ ≺01 = ∅ and ≺0 = ≺00 ∪ ≺01. Similarly, we decompose ≺1 into ≺1 = ≺10 ∪ ≺11, where x ≺10 y iff x ≺1 y and y is minimal in M(x) (or, equivalently, ⟨x, y⟩ is a root link), and x ≺11 y iff x ≺1 y and y is not minimal in M(x). We shall define

≺•0 := ≺00 ∪ ≺10        ≺•1 := ≺01 ∪ ≺11
Fig. 2. An MDS (from [5])
We shall spell out the conditions on these four relations in place of just ≺0 and ≺1. The structures we get are called MDSs.

Definition 4. An MDS is a structure ⟨M, ≫00, ≫01, ≫10, ≫11⟩ which, in addition to conditions (1) – (6) of Definition 3, satisfies
(7) If y ∈ M(x) then x ≺•0 y iff x; y is a root link (iff y is the least element of M(x) with respect to ≺+).

This ends the quote from [5]. We observe the following peculiarity. With MDSes defined the way above it is possible that there are two nodes x and y such that x ≫0 y and x ≫1 y, i.e., y is the left and right child of x. This is a deliberate decision on Kracht's side, and justified by the logic chosen by Kracht to describe MDSes. But this decision causes technical problems for us that we will mention later. Since there is hardly any linguistic justification for making a node a left and right child simultaneously, we will also consider simple MDSes. These are MDSes that are simple in the graph-theoretic sense of the word.
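For concreteness, here is a small sketch of the preMDS conditions (1)–(6) under an assumed encoding of my own (not from the paper): M is the set of nodes, and d0, d1 are the relations ≫0, ≫1 as sets of (parent, child) pairs.

def transitive_closure(R):
    R = set(R)
    while True:
        extra = {(a, d) for (a, b) in R for (c, d) in R if b == c} - R
        if not extra:
            return R
        R |= extra

def is_preMDS(M, d0, d1):
    d = d0 | d1
    below = transitive_closure({(c, p) for (p, c) in d})             # the relation ≺+
    parents = lambda x: [p for (p, c) in d if c == x]                # the set M(x)
    ok1 = all(len({c for (p, c) in d0 if p == y}) <= 1 for y in M)   # (1)
    ok2 = all(len({c for (p, c) in d1 if p == y}) <= 1 for y in M)   # (2)
    ok3 = all(any(p == y for (p, _) in d0) for (y, _) in d1)         # (3)
    roots = [x for x in M if not parents(x)]                         # (4)
    ok5 = all((x, x) not in below for x in M)                        # (5)
    ok6 = all((a, b) in below or (b, a) in below                     # (6)
              for x in M for a in parents(x) for b in parents(x) if a != b)
    return ok1 and ok2 and ok3 and len(roots) == 1 and ok5 and ok6

# The fragment a ≺0 c, b ≺1 c of Figure 2 is a preMDS:
print(is_preMDS({'a', 'b', 'c'}, {('c', 'a')}, {('c', 'b')}))   # True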
4 MS1-Axiomatisation of MDSes
The axiomatisability of MDSes (and regular ones) in MS1 follows immediately from the axiomatisation of MDSes in PDL that Kracht provides in [5] and the well-known fact that PDL formulae can be translated into equivalent MS1 formulae. But MS1 being a powerful logic, we can easily translate the defining properties of MDSes directly into MS1.
(1) y ≫0 x ∧ y ≫0 x′ =⇒ x = x′.
(2) y ≫1 x ∧ y ≫1 x′ =⇒ x = x′.
(3) y ≫1 x =⇒ ∃z y ≫0 z.
(4) ∃x (¬∃y y ≫ x) ∧ (∀z (¬∃y y ≫ z) =⇒ z = x).
(5) ¬∃x x ≺+ x.
(6) ∃E ((y ≫ x ⇐⇒ y ∈ E) ∧ ∀z, z′ ((z ∈ E ∧ z′ ∈ E) =⇒ (z ≺+ z′ ∨ z′ ≺+ z))).
(7) ∃E ((y ≫ x ⇐⇒ y ∈ E) ∧ ∃y (y ∈ E ∧ y ≫•0 x ∧ ∀z (z ∈ E =⇒ (z = y ∨ z ≫+ y)) ∧ ∀z ((z ∈ E ∧ z ≠ y) =⇒ z ≫•1 x))).
(8) x ≫0 y ∧ x ≫1 z =⇒ y = z.
Remember that MS1 is capable of defining the transitive closure of an MS1-definable binary relation. We skip this definition here (referring the reader to, e.g., [2]) and just use + as its abbreviation. Finiteness of the MDSes is also MS1-definable. We will not present this fact in detail. Rather we explain the method to be used. One can define a linear order on all the vertices extending ≫. This order has to be discrete and has to have a maximal element. One defines a successor relation on the order and demands that the maximal element is in the set of elements reachable from the minimal element, which is the root, via the transitive closure of the successor relation.
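For concreteness, the standard second-order definition of the transitive closure (a reconstruction of the textbook formula, not quoted from [2]) reads:

x ≺+ y :⟺ ∀X [ ∀u (x ≺ u → u ∈ X) ∧ ∀u ∀v ((u ∈ X ∧ u ≺ v) → v ∈ X) → y ∈ X ]

That is, y lies in every set that contains the immediate successors of x and is closed under ≺.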
5 Undecidability of the MS2-Theory of MDSes
For showing the undecidability of the MS2-theory of MDSes we use the following strategy. We define a sequence (Gk)k∈N of MDSes such that each element in the sequence contains its predecessor as a subgraph and each element Gi has the complete graph Ki as a graph minor. This way we can show that the class of MDSes has unbounded tree width. We then use a criterion by Seese [7] to deduce that its MS2-theory is undecidable.

Definition 5. Recursively define a sequence (Gk)k>1 of MDSes as follows. Define G2 = ({1.1, 2.1}, {(2.1, 1.1)}, ∅, ∅, ∅). For k > 2 set
– Vk = Vk−1 ∪ {k.i | i = 1 . . . k − 2},
– ≫00,k = ≫00,k−1 ∪ {(k.i, k.(i+1)) | i = 1 . . . k − 3} ∪ {(k.(k−2), (k−1).1)},
– ≫01,k = ≫01,k−1 = ∅,
– ≫10,k = ≫10,k−1 = ∅,
– ≫11,k = ≫11,k−1 ∪ {(k.i, i.1) | i = 1 . . . k − 2},
– Gk = (Vk, ≫00,k, ≫01,k, ≫10,k, ≫11,k).
The MDSes G2 , G3 and G4 are depicted in Figure 3. The MDSes G5 and G6 – being rather large – are depicted in Appendix A. It is immediately obvious that Gk−1 is a subgraph of Gk . We quickly check that Gk is indeed an MDS.
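A small sketch of this construction (my own encoding, not from the paper): a node i.j is represented as the pair (i, j), and each dominance relation as a set of (parent, child) pairs.

def build_G(k):
    """Construct the MDS G_k of Definition 5 (k > 1)."""
    assert k > 1
    V = {(1, 1), (2, 1)}
    d00 = {((2, 1), (1, 1))}     # root links
    d11 = set()                  # derived links
    for m in range(3, k + 1):
        V |= {(m, i) for i in range(1, m - 1)}
        d00 |= {((m, i), (m, i + 1)) for i in range(1, m - 2)}
        d00.add(((m, m - 2), (m - 1, 1)))
        d11 |= {((m, i), (i, 1)) for i in range(1, m - 1)}
    return V, d00, set(), set(), d11

V, d00, d01, d10, d11 = build_G(4)
# Every node other than the root (4,1) is the target of a 00-edge, its root link:
print({c for (_, c) in d00} == V - {(4, 1)})   # True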
Fig. 3. The MDSes G2 , G3 and G4
Lemma 3. For k > 1 each graph Gk is a simple MDS.

Proof. Conditions (1), (2), and (4) (rootedness) are obviously true for all Gk. None of the graphs contains a loop (5). And simplicity is also observed. The things to be shown are conditions (6) and (7), stating that the set of parents of each node is linearly ordered and that the lowest element in the order is the only root link. Observe that for each Gk all nodes are linearly ordered by ≫00,k and that by definition of ≫00,k each node different from the root has a single root link. A node i.j with j > 1 has a single parent, which is i.(j−1). For i > 1 all nodes i.1 have M(i.1) = {l.i | i + 1 < l ≤ k} ∪ {(i+1).(i−1)}. For node 1.1, M(1.1) = {l.1 | 2 ≤ l ≤ k}.

Lemma 4. For k > 1 each MDS Gk contains the complete graph Kk as a minor.

Proof. For k = 2, 3 the complete graph Kk is the undirected version of Gk. For k > 3 the lemma is shown by induction on k. The general method is to contract all edges that connect vertices with the same main address. For k = 4 contract 4.1 ≫00,4 4.2. As there is an edge from 4.1 to 1.1 and one each from 4.2 to 3.1 and 2.1, the undirected graph after contraction is K4. For k > 4 contract the set of edges {k.i ≫00,k k.(i+1) | 1 ≤ i ≤ k − 3}. As a result the set of vertices {k.i | 1 ≤ i ≤ k − 2} is fused to a single vertex k. By induction hypothesis, the subgraph Gk−1 can be contracted to Kk−1. Now, since there is an edge from k.i to i.1 for each 1 ≤ i ≤ k − 2 and an edge from k.(k−2) to (k−1).1 (by definition of Gk), each vertex in Kk−1 is connected to some vertex in {k.i | 1 ≤ i ≤ k − 2}. After fusing these into a single vertex k the resulting graph is thus Kk.

Theorem 1. The MS2-theory of the classes of MDSes and simple MDSes is undecidable.

Proof. As a consequence of the above lemma and Lemma 2, the classes of MDSes and simple MDSes have unbounded tree width.
Seese ([7], Theorem 8) showed that if a class of graphs has a decidable MS 2 theory, it has bounded tree width.
6 Equivalence of MS1 and MS2 on MDSes
The aim of this section is to show that MS1 has the same expressive power over MDSes as MS2. In other words, the option of edge set quantification does not extend the expressive power of MSO on MDSes. To show this we use a criterion by Courcelle. He showed in [3] that for uniformly k-sparse classes of simple graphs the two logics MS1 and MS2 have the same expressive power. A class of graphs is uniformly k-sparse if for some fixed k the number of edges of each subgraph of a graph is at most k times the number of vertices.

Definition 6. A finite multi graph G is k-sparse, if there is some natural number k such that Card(EG) ≤ k Card(VG). A finite multi graph G is uniformly k-sparse if each subgraph of G is k-sparse. A class of finite multi graphs is uniformly k-sparse if there is some natural number k such that each multi graph of the class is uniformly k-sparse.

On the basis of the following little lemma it is easy to see that MDSes are uniformly 2-sparse.

Lemma 5. Let G be a multi graph. If the maximal in degree of G is d then G is uniformly d-sparse. If the maximal out degree of G is d then G is uniformly d-sparse.

Proof. We can count edges by counting end points or starting points of edges, i.e.,

Card(EG) = Σ_{v ∈ VG} indeg(v) = Σ_{v ∈ VG} outdeg(v).

If the maximal in (out) degree is d, the above equation can be approximated by Card(EG) ≤ d Card(VG). See also [3], Lemma 3.1.
Corollary 1. The class of MDSes is uniformly 2-sparse. Proof. MDSes share with binary trees the property of having a maximal out degree of 2. Thus simple MDSes fulfil the criterion set out in [3]. Proposition 1. The logics MS 1 and MS 2 have the same expressive power over the class of simple MDSes. Proof. By Theorem 5.1 of [3], the same properties of multi graphs are expressible by MS 1 and MS 2 formulae for the class of finite simple 2-sparse multi graphs.
Corollary 2. The MS1-theory of the class of simple MDSes is undecidable.

Proof. Follows immediately from the above proposition and Theorem 1.

The restriction to simple MDSes can be overcome on the basis of the following observation. Since we have only four colours of edges, simplicity can be defined in first-order logic. The following axiom does this.

∀x, y (x ≫00 y ∨ x ≫01 y) =⇒ ¬(x ≫10 y ∨ x ≫11 y).

Theorem 2. The MS1-theory of the class of MDSes is undecidable.

Proof. Suppose the MS1-theory of the class of MDSes were decidable. Add the above axiom of simplicity to gain a decision procedure for the MS1-theory over simple MDSes. This contradicts Corollary 2.

Theorem 3. The logics MS1 and MS2 have the same expressive power over the class of MDSes.

Proof. Both theories have the same degree of undecidability.
7 Conclusion
We showed that both the MS1-theory and the MS2-theory over MDSes are undecidable – contrary to what Kracht conjectured. There was a good reason for Kracht's conjecture, namely that MS1 is not much more powerful than PDL. So, how can this result be interpreted? We would like to propose the following view. Courcelle showed that the property of being a minor is definable by an MSO-definable transduction. But this property is not PDL-definable. It is not possible to code grids in a direct way with MDSes, basically because any set of parents is linearly ordered by dominance. But grids can be minors of MDSes. There remains the question whether we can find a natural restriction on MDSes that bounds their tree width, so as to regain decidability of the MSO-theories. It is of course possible to just demand this, or to enforce it by, e.g., demanding MDSes to be generable by context-free graph grammars. But these restrictions do not seem to have a motivation different from bounding the tree width and thus seem arbitrary. It would be much nicer if restrictions could be found that relate multidominance to restrictions on movement.
References
1. Bodlaender, H.L.: A partial k-arboretum of graphs with bounded treewidth. Theoretical Computer Science 209, 1–45 (1998)
2. Courcelle, B.: Graph rewriting: An algebraic and logic approach. In: van Leeuwen, J. (ed.) Handbook of Theoretical Computer Science, vol. B, ch. 5, pp. 193–242. Elsevier, Amsterdam (1990)
3. Courcelle, B.: The monadic second-order logic of graphs XIV: uniformly sparse graphs and edge set quantifications. Theoretical Computer Science 299(1-3), 1–36 (2003)
4. Kracht, M.: Syntax in chains. Linguistics and Philosophy 24, 467–529 (2001)
5. Kracht, M.: On the logic of LGB type structures. Part I: Multidominance structures. In: Hamm, F., Kepser, S. (eds.) Logics for Linguistic Structures, pp. 105–142. Mouton de Gruyter, Berlin (2008)
6. Robertson, N., Seymour, P.: Graph minors II. Algorithmic aspects of tree-width. Journal of Algorithms 7(3), 309–322 (1986)
7. Seese, D.: The structure of the models of decidable monadic theories of graphs. Annals of Pure and Applied Logic 53, 169–195 (1991)
A MDSes G5 and G6

Fig. 4. MDS G5

Fig. 5. MDS G6
The Equivalence of Tree Adjoining Grammars and Monadic Linear Context-Free Tree Grammars

Stephan Kepser¹ and James Rogers²

¹ Collaborative Research Centre 441, University of Tübingen, Tübingen, Germany
[email protected]
² Computer Science Department, Earlham College, Richmond, IN, USA
[email protected]
Abstract. It has been observed quite early after the introduction of Tree Adjoining Grammars that the adjoining operation seems to be a special case of the more general deduction step in a context-free tree grammar (CFTG) derivation. TAGs look like special cases of a subclass of CFTGs, namely monadic linear CFTGs. More than a decade ago it was shown that the two grammar formalisms are indeed weakly equivalent, i.e., define the same classes of string languages. This paper now closes the remaining gap showing the strong equivalence for so-called non-strict TAGs, a variant of TAGs where the restrictions for head and foot nodes are slightly generalised.
1 Introduction
Tree Adjoining Grammars [5,6] (TAGs) are a grammar formalism introduced by Joshi to extend the expressive power of context-free string grammars (alias local tree grammars) in a small and controlled way, to render certain known mildly context-sensitive phenomena in natural language. The basic operation in these grammars, the adjunction operation, consists in replacing a node in a tree by a complete tree drawn from a finite collection. Context-free tree grammars (CFTGs, see [4] for an overview) have been studied in informatics since the late 1960s. They provide a very powerful mechanism for defining tree languages. Rules of a CFTG define how to replace non-terminal nodes by complete trees. It has been observed quite early after the introduction of TAGs that the adjoining operation seems to be a special case of the more general deduction step in a CFTG-derivation. TAGs look like special cases of subclasses of CFTGs. This intuition was strengthened by showing that the yield languages definable by TAGs are equivalent to the yield languages definable by monadic linear
non-deleting CFTGs, as was shown independently by Mönnich [9] and Fujiyoshi & Kasai [3]. The question of the strong equivalence of the two formalisms remained unanswered. Rogers [10,11] introduced a variant of TAGs called non-strict TAGs. Non-strict TAGs generalise the definition of TAGs by releasing the conditions that the root node and foot node of an elementary tree must bear equal labels and that the label of the node to be replaced must be equal to the root node of the adjoined tree. The first proposal of such an extension of TAGs was made by Lang [7]. The new variant of TAGs looks even more like a subclass of CFTGs. And indeed, non-strict TAGs and monadic linear CFTGs are strongly equivalent. This is the main result of the present paper.

We would like to point out that there is a small technical issue connected with this result. Call a tree ranked iff for every node the number of its children is a function of its label. It is well known that CFTGs define ranked trees. TAGs on the other hand define unranked trees. A tree generated by a TAG may have a leaf node and an internal node labelled with the same label. Taking the definition of ranked trees strictly, this is not possible with CFTG-generated trees. Our view on this issue is the standpoint taken by practical informatics: a function is not just defined by its name, but by its name and arity. Hence a three-place A can be distinguished from a constant A by their difference in arity, though the function – or label – name is the same. For every label, a TAG only introduces a finite set of arities. Hence we opt for extending the definition of a ranked alphabet to be a function from labels to finite sets of natural numbers.

The equivalence result and a previous result by Rogers [11] provide a new characterisation of the class of tree languages definable by monadic linear CFTGs by means of logic: a tree language is definable by a MLCFTG if and only if it is the two-dimensional yield of an MSO-definable three-dimensional tree language.

The paper is organised as follows. The next section introduces trees, context-free tree grammars, tree-adjoining grammars, and three-dimensional trees. Section 3 introduces a special type of CFTGs called footed CFTGs; these grammars can be seen as the CFTG-counterpart of TAGs. Section 3 also contains the equivalence of monadic linear CFTGs and footed CFTGs. Section 4 shows that footed CFTGs are indeed the CFTG-counterpart of TAGs, providing the equivalence of both grammar types. The next section states the aforementioned logical characterisation of the expressive power of MLCFTGs. Unfortunately, due to space restrictions, technical proofs had to be left out.
2 Preliminaries

2.1 Trees
We consider labelled finite ordered ranked trees. A tree is ordered if there is a linear order on the daughters of each node. A tree is ranked if the label of a node implies the number of daughter nodes. A tree domain is a finite subset of the set of strings over natural numbers that is closed under prefixes and left sisters. Formally, let N∗ denote the set of
all finite sequences of natural numbers including the empty sequence and N+ be the set of all finite sequences of natural numbers excluding the empty sequence. A set D ⊂fin N∗ is called a tree domain iff for all u, v ∈ N∗: uv ∈ D ⇒ u ∈ D (prefix closure) and for all u ∈ N∗, i ∈ N: ui ∈ D ⇒ ∀j < i: uj ∈ D (closure under left sisters). An element of a tree domain is an address of a node in the tree. It is called a position. Let Σ be a set of labels. A tree is a pair (D, λ) where D is a tree domain and λ : D → Σ is a tree labelling function. The set of all trees labelled with symbols from Σ is denoted TΣ. A tree language L ⊆ TΣ is just a subset of TΣ. A set Σ of labels is ranked if there is a function ρ : Σ → ℘fin(N) assigning each symbol an arity. If t = (D, λ) is a tree over a ranked alphabet Σ, then for each position p ∈ D with ρ(λ(p)) = n: p(n − 1) ∈ D and pm ∉ D for every m ≥ n. If X is a set (of symbols) disjoint from Σ, then TΣ(X) denotes the set of trees TΣ∪X where all elements of X are taken as constants. The elements of X are understood to be "variables". Let X = {x1, x2, x3, . . . } be a fixed denumerable set of variables. Let X0 = ∅ and, for k ≥ 1, Xk = {x1, . . . , xk} ⊂ X. For k ≥ 0, m ≥ 0, t ∈ TΣ(Xk), and t1, . . . , tk ∈ TΣ(Xm), we denote by t[t1, . . . , tk] the result of substituting ti for xi in t. Note that t[t1, . . . , tk] is in TΣ(Xm). Note also that for k = 0, t[t1, . . . , tk] = t.
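A quick sketch (my own, not from the paper) of the two closure conditions that make a finite set of positions a tree domain:

def is_tree_domain(D):
    """Check prefix closure and closure under left sisters."""
    D = {tuple(p) for p in D}
    for p in D:
        if p and p[:-1] not in D:                                # prefix closure
            return False
        if p and p[-1] > 0 and p[:-1] + (p[-1] - 1,) not in D:   # left sisters
            return False
    return True

print(is_tree_domain({(), (0,), (1,), (0, 0)}))   # True
print(is_tree_domain({(), (1,)}))                 # False: left sister (0,) missing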
2.2 Context-Free Tree Grammars
We start with the definition of a context-free tree grammar, quoting [1].

Definition 1. A context-free tree grammar is a quadruple G = (Σ, F, S, P) where Σ is a finite ranked alphabet of terminals, F is a finite ranked alphabet of nonterminals or function symbols, disjoint with Σ, S ∈ F^0 is the start symbol, and P is a finite set of productions (or rules) of the form F(x1, . . . , xk) → τ, where F ∈ F^k and τ ∈ TΣ∪F(Xk).

We use the convention that for k = 0 an expression of the form F(τ1, . . . , τk) stands for F. In particular, for F ∈ F^0, a rule is of the form F → τ with τ ∈ TΣ∪F. We sometimes use little superscripts to indicate the arity of a nonterminal, like in F^3. The term context-free tree grammar is abbreviated by CFTG. For a context-free tree grammar G = (Σ, F, S, P) we now define the direct derivation relation. Let n ≥ 0 and let σ1, σ2 ∈ TΣ∪F(Xn). We define σ1 ⇒G σ2 if and only if there is a production F(x1, . . . , xk) → τ, a tree η ∈ TΣ∪F(Xn+1) containing exactly one occurrence of xn+1, and trees ξ1, . . . , ξk ∈ TΣ∪F(Xn) such that σ1 = η[x1, . . . , xn, F(ξ1, . . . , ξk)] and σ2 = η[x1, . . . , xn, τ[ξ1, . . . , ξk]].
In other words, σ2 is obtained from σ1 by replacing an occurrence of a subtree F(ξ1, . . . , ξk) by the tree τ[ξ1, . . . , ξk]. As usual, ⇒∗G stands for the reflexive-transitive closure of ⇒G. For a context-free tree grammar G, we define L(G) = {t ∈ TΣ | S ⇒∗G t}. L(G) is called the tree language generated by G. Two grammars G and G′ are equivalent if they generate the same tree language, i.e., L(G) = L(G′).

We define three subtypes of context-free tree grammars. A production F(x1, . . . , xk) → τ is called linear if each variable x1, . . . , xk occurs at most once in τ. Linear productions do not allow the copying of subtrees. A tree grammar G = (Σ, F, S, P) is called a linear context-free tree grammar if every rule in P is linear. All the CFTGs we consider in this paper are linear. Secondly, a rule F(x1, . . . , xk) → τ is non-deleting if each variable x1, . . . , xk occurs in τ. A CFTG is non-deleting if each rule is non-deleting. Thirdly, a CFTG G = (Σ, F, S, P) is monadic if F^k = ∅ for every k > 1. Non-terminals can only be constants or of rank 1. Monadic linear context-free tree grammars are abbreviated MLCFTGs.

Example 1. Let G1 = ({g^2, a, b, c, d, e}, {S, A^1}, S, P) where P consists of the three rules
S → A(e)
A(x) → g(a, A(g(b, x, c)), d)
A(x) → g(a, g(b, x, c), d)

G1 is monadic, linear, and non-deleting. The tree language generated by G1 is not regular. Its yield language is the non-context-free language {a^n b^n e c^n d^n | n ≥ 1}.
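For concreteness, a small sketch (my own encoding as nested tuples, not code from the paper) that carries out a derivation of G1:

def subst(t, arg):
    """Substitute arg for the variable x in t (rules of G1 are monadic linear)."""
    if t == 'x':
        return arg
    return (t[0],) + tuple(subst(c, arg) for c in t[1:])

# Right-hand sides of the two A-rules:
RHS_REC = ('g', ('a',), ('A', ('g', ('b',), 'x', ('c',))), ('d',))  # A(x) -> g(a, A(g(b,x,c)), d)
RHS_END = ('g', ('a',), ('g', ('b',), 'x', ('c',)), ('d',))         # A(x) -> g(a, g(b,x,c), d)

def rewrite(t, rhs):
    """Rewrite the unique occurrence of A(ξ) in t using the given rule body."""
    if t[0] == 'A':
        return subst(rhs, t[1])
    return (t[0],) + tuple(rewrite(c, rhs) for c in t[1:])

def yield_of(t):
    return ''.join(yield_of(c) for c in t[1:]) if len(t) > 1 else t[0]

t = ('A', ('e',))                      # S -> A(e)
for _ in range(2):                     # apply the recursive rule twice
    t = rewrite(t, RHS_REC)
t = rewrite(t, RHS_END)
print(yield_of(t))                     # aaabbbecccddd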
2.3 Tree Adjoining Grammars
We consider so-called non-strict Tree Adjoining Grammars. Non-strict TAGs were introduced by Rogers [11] as an extension of TAGs that reflects the fact that adjunction (or substitution) operations are fully controlled by obligatory and selective adjoining constraints. There is hence no need to additionally demand the equality of head and foot node labels, or the equality of the labels of the replaced node with the head node of the adjoined tree. Citing [11], a non-strict TAG is a pair (E, I) where E is a finite set of elementary trees in which each node is associated with
– a label – drawn from some alphabet,
– a selective adjunction (SA) constraint – a subset of the set of names of the elementary trees, and
– an obligatory adjunction (OA) constraint – Boolean valued
and I ⊂ E is a distinguished non-empty set of initial trees. Each elementary tree has a foot node.

Formally, let Λ be a set of linguistic labels and Na a finite set of labels disjoint from Λ (the set of names of trees). A tree is a pair (D, λ) where
– D is a tree domain,
– λ : D → Λ × ℘(Na) × {true, false} is a labelling function.
Hence a node is labelled by a triple consisting of a linguistic label, an SA constraint, and an OA constraint. We denote by TΛ,Na the set of all trees. An elementary tree is a triple (D, λ, f) where (D, λ) is a tree and f ∈ D is a leaf node, the foot node.

Definition 2. A non-strict TAG is a quintuple G = (Λ, Na, E, I, name) where
– Λ is a set of labels,
– Na is a finite set of tree names,
– E is a finite set of elementary trees,
– I ⊆ E is a finite set of initial trees, and
– name : E → Na is a bijection, the tree naming function.
An adjunction is the operation of replacing a node n with a non-empty SA constraint by an elementary tree t listed in the SA constraint. The daughters of n become daughters of the foot node of t. A substitution is like an adjunction except that n is a leaf and hence there are no daughters to be moved to the foot node of t. Formally, let t, t′ be two trees and G a non-strict TAG. Then t′ is derived from t in a single step (written t ⇒G t′) iff there is a position p ∈ Dt and an elementary tree s ∈ E with foot node fs such that
– λt(p) = (L, SA, OA) with L ∈ Λ, SA ⊆ Na, OA ∈ {true, false},
– Dt′ = {q ∈ Dt | ¬∃v ∈ N+ : q = pv} ∪ {pv | v ∈ Ds} ∪ {pfs v | v ∈ N+, pv ∈ Dt},
– λt′(q) = λt(q) if q ∈ Dt and ¬∃v ∈ N∗ : q = pv,
  λt′(q) = λs(v) if v ∈ Ds and q = pv,
  λt′(q) = λt(pv) if v ∈ N+, pv ∈ Dt and q = pfs v.

We write t′ = adj(t, p, s) if t′ is the result of adjoining s in t at position p. As usual, ⇒∗G is the reflexive-transitive closure of ⇒G. Note that this definition also subsumes substitution. A substitution is just an adjunction at a leaf node. A tree is in the language of a given grammar if every OA constraint on the way is fulfilled, i.e., no node of the tree is labelled with true as OA constraint. SA and OA constraints only play a role in derivations; they should not appear as labels of trees of the tree language generated by a TAG. Let π1 be the first projection on a triple. It can be extended in a natural way to apply to trees by setting
– Dπ1(t) = Dt, and for each p ∈ Dt,
– λπ1(t)(p) = L if λt(p) = (L, SA, OA) for some SA ⊆ Na, OA ∈ {true, false}.
Now

L(G) = {π1(t) | ∃s ∈ I such that s ⇒∗G t, and there is no p ∈ Dt with λt(p) = (L, SA, true) for some L ∈ Λ, SA ⊆ Na}.
One of the differences between TAGs and CFTGs is that there is no such concept of a non-terminal symbol or node in TAGs. The thing that comes closest is a node labelled with an OA constraint set to true. Such a node must be further expanded. The opposite is a node with an empty SA constraint. Such a node is a terminal node, because it must not be expanded. Nodes labelled with an OA constraint set to false but a non-empty SA constraint may or may not be expanded. They can neither be regarded as terminal nor as non-terminal nodes.

Example 2. Let G2 = ({g, a, b, c, d, e}, {1, 2}, E, {name⁻¹(1)}, name) where E and name are as follows, with the trees given in term notation. (To enhance readability we simplified node labels for all nodes that have an empty SA constraint. For these nodes we only present the linguistic label; the information (∅, false) is omitted. The foot nodes are marked with ∗.)

1:
g(a, (g, {2}, false)(b, e∗, c), d)

2:
g(a, (g, {2}, false)(b, g∗, c), d)
Note that this grammar is even a strict TA grammar. Note also that it generates the same tree language as the MLCFTG from Example 1, i.e., L(G2) = L(G1).
2.4 Three-Dimensional Trees
We introduce the concept of three-dimensional trees to provide a logical characterisation of the tree languages generable by a monadic linear CFTG. Multi-dimensional trees, their logics, grammars and automata are thoroughly discussed in [11]. Here, we just quote those technical definitions needed for our results. The reader who wishes to gain a better understanding of the concepts and formalisms connected with multi-dimensional trees is kindly referred to [11].

Formally, a three-dimensional tree domain T3 ⊂fin (N∗)∗ is a finite set of sequences where each element of a sequence is itself a sequence of natural numbers, such that for all u, v ∈ (N∗)∗ if uv ∈ T3 then u ∈ T3 (prefix closure) and for each u ∈ (N∗)∗ the set {v | v ∈ N∗, uv ∈ T3} is a tree domain in the sense of Subsection 2.1. Let Σ be a set of labels. A three-dimensional tree is a pair (T3, λ) where T3 is a three-dimensional tree domain and λ : T3 → Σ is a (node) labelling function.

For a node x ∈ T3 we define its immediate successors in three dimensions as follows. x ⊳3 y iff y = x · m for some m ∈ N∗, i.e., x is the longest proper prefix of y. x ⊳2 y iff x = u · m and y = u · mj for some u ∈ T3, m ∈ N∗, j ∈ N, i.e., x and y are at the same 3rd-dimensional level, but x is the mother of y in a tree at that level. Finally, x ⊳1 y iff x = u · mj and y = u · m(j + 1) for some u ∈ T3, m ∈ N∗, j ∈ N, i.e., x and y are at the same 3rd-dimensional level and x is the immediate left sister of y in a tree at that level. We consider the weak monadic second-order logic over the relations ⊳3, ⊳2, ⊳1. Explanations about this logic and its relationship to T3 grammars and automata can be found in [11].
3 The Equivalence between MLCFTGs and Footed CFTGs
The equivalence between MLCFTGs and TAGs is proven by showing that both grammar formalisms are equivalent to a third formalism, so-called footed CFTGs.
Fig. 1. The Equivalence of MLCFTGs and footed CFTGs (the arrows established in Propositions 1–5: MLCFTG → non-deleting MLCFTG → non-deleting collapse-free MLCFTG → footed CFTG → spinal-formed CFTG → MLCFTG)
Definition 3. Let G = (Σ, F, S, P) be a linear CFT grammar. A rule F(x1, . . . , xk) → t is footed if there exists a position p ∈ Dt such that p has exactly k daughters, for 0 ≤ i ≤ k − 1: λ(pi) = xi+1, and no position different from {p0, . . . , p(k − 1)} is labelled with a variable. The node p is called the foot node and the path from the root of t to p is called the spine of t. A CFTG G is footed if every rule of G is footed.

Footed CFTGs are apparently the counterpart of non-strict TAGs in the world of context-free grammars. Before we show this, we present in this section that footed CFTGs are actually equivalent to MLCFTGs. This is done in several intermediate steps, which are sketched in Figure 1. Each arrow indicates that a grammar of one type can be equivalently recast as a grammar of the target type. We first show how to convert a MLCFTG into an equivalent footed CFTG in several steps. The start is the observation by Fujiyoshi that deletion rules do not contribute to the expressive power of MLCFTGs.

Proposition 1. [2] For every monadic linear context-free tree grammar there exists an equivalent non-deleting monadic linear context-free tree grammar.

But even in a non-deleting MLCFTG there may still be rules that delete nodes in a derivation step. Let G = (Σ, F, S, P) be a non-deleting MLCFTG. A rule A(x) → x in P is called a collapsing rule. A collapsing rule actually deletes the non-terminal node A in a tree. If a CFTG does not contain a collapsing rule, the grammar is called collapse-free. Note that by definition footed CFTGs are collapse-free, because there is no position p having daughters (cp. Def. 3). Note also that the example MLCFTG in Ex. 1 is non-deleting and collapse-free. The next proposition shows that collapsing rules can be eliminated from MLCFTGs.

Proposition 2. For every non-deleting MLCFTG there exists an equivalent non-deleting collapse-free MLCFTG.

The idea of the proof is to apply the collapsing rule to all right-hand sides. Thus it is no longer needed. Some care has to be taken if there is another way to expand the non-terminal that can collapse.
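A small sketch of the footedness condition of Definition 3 (for k ≥ 1), under an assumed tuple encoding of my own; variables are the strings 'x1', ..., 'xk'.

def variables(t, path=()):
    """Yield (position, name) for every variable leaf of t."""
    if isinstance(t, str):
        yield (path, t)
    else:
        for i, c in enumerate(t[1:]):
            yield from variables(c, path + (i,))

def node_at(t, p):
    for i in p:
        t = t[1 + i]
    return t

def is_footed(rhs, k):
    occ = sorted(variables(rhs))
    if len(occ) != k or any(q == () for q, _ in occ):
        return False
    if len({q[:-1] for q, _ in occ}) != 1:
        return False
    p = occ[0][0][:-1]                              # candidate foot node
    return ([v for _, v in occ] == [f'x{i+1}' for i in range(k)]
            and [q[-1] for q, _ in occ] == list(range(k))
            and len(node_at(rhs, p)) - 1 == k)      # exactly k daughters

# The first ternary rule of Example 3 below has a footed right-hand side:
print(is_footed(('g', ('A',), ('g', 'x1', 'x2', 'x3'), ('D',)), 3))   # True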
MLCFTGs are not necessarily footed CFTGs, even when they are non-deleting and collapse-free. The reason is the following. Every right-hand side of every rule of a non-deleting MLCFTG has exactly one occurrence of the variable x. But this variable may have sisters, i.e., there may be subtrees in the rhs which have the same mother as x. Such a rule is apparently not footed. And its rhs can hardly be used as a base for an elementary tree in a TAG. Fortunately, though, a non-deleting collapse-free MLCFTG can be transformed into an equivalent footed CFTG. The resulting footed CFTG is usually not monadic any more. But this does not constitute any problem when translating the footed CFTG into a TAG.

Proposition 3. For every non-deleting collapse-free MLCFTG there exists an equivalent footed CFTG.

The main idea of the proof is the following. Let B(x) → t be a non-footed grammar rule containing the subtree g(t1, x, t2). The undesirable sister subtrees t1 and t2 are replaced by variables, yielding a new rule B′(x1, x2, x3) → t′ where t′ is the result of replacing the subtree g(t1, x, t2) by g(x1, x2, x3). The new rule is footed, but now ternary, not monadic. So is the non-terminal B′. The original sister subtrees t1 and t2 still have to be dealt with. Suppose there is a grammar rule D(x) → τ such that τ contains a subtree B(θ). In this rhs we replace B(θ) by B′(t1, θ, t2). Now the non-terminal B′ is also ternary in the rhs, and the modified grammar rule can be applied to it. And if we apply the modified grammar rule, the trees t1 and t2 are moved back to being sisters of θ and daughters of the node g below which they were originally found.

Example 3. We convert the non-deleting collapse-free MLCFTG G1 from Example 1 into a footed CFTG G2. Its rules are

S → A′(B, e, C)
A′(x1, x2, x3) → g(A, g(x1, x2, x3), D)
A′(x1, x2, x3) → g(A, A′(B, g(x1, x2, x3), C), D)
A → a    B → b    C → c    D → d
Having shown by now that there is an equivalent footed CFTG for every MLCFTG, we now turn to the inverse direction. This is also done via intermediate steps. The following definitions are quoted from [3, p. 62]. A ranked alphabet is head-pointing if it is a triple (Σ, ρ, h) such that (Σ, ρ) is a ranked alphabet and h is a function from Σ to N such that, for each A ∈ Σ, if ρ(A) ≥ 1 then 0 ≤ h(A) < ρ(A), otherwise h(A) = 0. The integer h(A) is called the head of A.

Definition 4. Let G = (Σ, F, S, P) be a CFTG such that F is a head-pointing ranked alphabet. For n ≥ 1, a production A(x1, . . . , xn) → t in P is spinal-formed if it satisfies the following conditions:
– There is exactly one leaf in t that is labelled by xh(A). The path from the root to that leaf is called the spine of t, or the spine when t is obvious.
– For a node d ∈ Dt, if d is on the spine and λ(d) = B ∈ F with ρ(B) ≥ 1, then d · h(B) is a node on the spine.
– Every node labelled by a variable in Xn \ {xh(A)} is a child of a node on the spine.
A CFTG G = (Σ, F, S, P) is spinal-formed if every production A(x1, . . . , xn) → t in P with n ≥ 1 is spinal-formed.

The intuition behind this definition as well as illustrating examples can be found in [3, p. 63]. We will not quote them here, because spinal-formed CFTGs are just an equivalent form of CFTGs on the way to showing that footed CFTGs can be rendered by MLCFTGs.

Proposition 4. For every footed CFTG there exists an equivalent spinal-formed CFTG.

Note that a rule in a footed CFTG fulfills the first and the third condition of a spinal-formed rule already. What has to be shown is that the set F of non-terminals can be made head-pointing. Since the rhs of a footed CFTG rule has a spine, the non-terminals on the spine can be made head-pointing by following the spine. For all other non-terminals we arbitrarily choose the first daughter to be the head daughter.

Proposition 5. [3] For every spinal-formed CFTG there exists an equivalent MLCFTG.

This is a corollary of Theorem 1 (p. 65) of [3]. The authors see this fact themselves, stating on p. 65 immediately above Theorem 1:
“It follows from Theorem 1 that the class of tree languages generated by spine grammars is the same as the class of tree languages generated by linear non-deleting monadic CFTGs, that is, CFTGs with nonterminals of rank 1 and 0 only, and with exactly one occurrence of x in every right-hand side of a production for a nonterminal of rank 1.”

We are now done showing that MLCFTGs are equivalent to footed CFTGs.

Theorem 1. A tree language is definable by a monadic linear CFTG if and only if it is definable by a footed CFTG.
4 The Equivalence between Footed CFTGs and TAGs
The aim of this section is to show that footed CFTGs are indeed the counterpart of non-strict TAGs. We first translate footed CFTGs into non-strict TAGs.

Proposition 6. For every footed CFTG there exists an equivalent non-strict TAG.

The basic idea here is that every right-hand side of every rule from the CFTG is an elementary tree. The new foot node is the node that is the mother of the variables in the rhs of a rule. Of course, the variables and the nodes bearing them have to be removed from the elementary tree. To construct the TAG, every rhs of the CFTG gets a name. Every non-terminal in a rhs receives an obligatory adjunction constraint. The selective adjunction constraint it receives is the set of names of those rhs that are the rhs of the rules that expand this non-terminal. The initial trees are the rhs of the rules that expand the start symbol of the CFTG.

Formally, let G = (Σ, F, S, P) be a footed CFTG. Let Na be a set of labels such that |Na| = |P|. Define a bijection name : rhs(P) → Na assigning names in Na to right-hand sides of rules in P in some arbitrary way. For a non-terminal A ∈ F^k we define the set RhsA = {name(r) | (A(x1, . . . , xk) → r) ∈ P}. We define a function el-tree : rhs(P) → TΣ∪F,Na by considering two cases. For (A(x1, . . . , xk) → t) ∈ P such that f ∈ Dt with λt(f i) = xi+1, set

D = Dt \ {f i | 0 ≤ i ≤ k − 1}

for each p ∈ D:
λ(p) = (λt(p), ∅, false) if λt(p) ∈ Σ,
λ(p) = (B, RhsB, true) if λt(p) = B ∈ F

el-tree(t) = (D, λ, f)
For (A → t) ∈ P set D = Dt and for each p ∈ D:
λ(p) = (λt(p), ∅, false) if λt(p) ∈ Σ,
λ(p) = (B, RhsB, true) if λt(p) = B ∈ F
f = 0^k for the k ∈ N with 0^k ∈ D and 0^{k+1} ∉ D

el-tree(t) = (D, λ, f)
We let G′ = (Σ, Na, {el-tree(r) | r ∈ rhs(P)}, {el-tree(r) | (S → r) ∈ P}, name) be the non-strict TAG derived from G.

Example 4. To explain the construction we transform the grammar G2 of Example 3. The names are Na = {1, 2, 3, 4, 5, 6, 7} with name defined as

Na  rhs
1   A′(B, e, C)
2   g(A, g(x1, x2, x3), D)
3   g(A, A′(B, g(x1, x2, x3), C), D)
4   a
5   b
6   c
7   d

We obtain the following elementary trees, given here in term notation. (Again we simplify node labels of type (L, ∅, false) to just L. The foot node of each tree, by the definition of el-tree the mother of the removed variables, or the leftmost leaf for trees without variables, is marked with ∗.)

1: (A′, {2, 3}, true)((B, {5}, true)∗, e, (C, {6}, true))
2: g((A, {4}, true), g∗, (D, {7}, true))
3: g((A, {4}, true), (A′, {2, 3}, true)((B, {5}, true), g∗, (C, {6}, true)), (D, {7}, true))
4: a∗   5: b∗   6: c∗   7: d∗
Tree 1 is the only initial tree. If we substitute the substitution nodes 4 – 7 into the other elementary trees, the grammar bears a remarkable similarity to the TA grammar G2 of Example 2. Tree 2 of G2 corresponds to tree 3, and tree 1 of G2 to the result of adjoining tree 2 into 1.

We now show the inverse direction. The idea of the construction is to take the elementary trees as right-hand sides of rules in a footed CFTG to be constructed. The non-terminals that are expanded – and hence the left-hand sides of rules – are those nodes that have an SA constraint that contains the name of the elementary tree under consideration. The arity of the non-terminal is just the number of daughters of such a node.

Proposition 7. For every non-strict TAG there exists an equivalent footed CFTG.

Let G = (Σ, Na, E, I, name) be a non-strict TAG. The construction looks as follows. Let S ∉ Σ be a new symbol (the new start symbol). Set F^k = {(L, SA, v) | ∃t ∈ E ∃p ∈ Dt : λt(p) = (L, SA, v), v ∈ {true, false}, SA ≠ ∅, p(k − 1) ∈ Dt, pk ∉ Dt}. Set F = {S} ∪ ∪_{k≥0} F^k, the set of non-terminals. For an elementary tree t = (Dt, λt, f) ∈ E we define rhs(t, k) by

D = Dt ∪ {f j | 0 ≤ j ≤ k − 1}

for each p ∈ D:
λ(p) = L if λt(p) = (L, ∅, false), L ∈ Σ,
λ(p) = (L, SA, v) if λt(p) = (L, SA, v), L ∈ Σ, SA ≠ ∅, v ∈ {true, false},
λ(p) = xj+1 if p = f j, 0 ≤ j ≤ k − 1

rhs(t, k) = (D, λ)

Note that for k = 0 the tree domain D = Dt. Define P1 as
{(L, SA, v)(x1, . . . , xk) → rhs(t, k) | (L, SA, v) ∈ F^k, t ∈ E : name(t) ∈ SA} ∪ {S → rhs(i, 0) | i ∈ I}
and P2 as
{(L, SA, false)(x1, . . . , xk) → L(x1, . . . , xk) | ∃t ∈ E ∃p ∈ Dt : λt(p) = (L, SA, false), p(k − 1) ∈ Dt, pk ∉ Dt}.
The set P of productions is P1 ∪ P2. Let G′ = (Σ, F, S, P) be a CFTG. A simple check of the definition of the productions shows that G′ is footed. Note that the rules in P1 are used for the derivation proper while those in P2 serve the purpose of stripping off the undesirable SA and OA constraint information coded in the non-terminals.
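A small sketch of rhs(t, k) under an assumed encoding of my own (not the paper's notation): an elementary tree is (label, children) with a foot address, and the variables x1 ... xk are hung under the foot node.

def rhs(tree, foot, k, path=()):
    """Turn an elementary tree into the right-hand side of a CFTG rule."""
    label, kids = tree
    if path == foot:
        assert not kids, "the foot node is a leaf of the elementary tree"
        return (label, [f'x{i+1}' for i in range(k)])
    return (label, [rhs(c, foot, k, path + (i,)) for i, c in enumerate(kids)])

# Elementary tree 2 of Example 4 (constraint labels abbreviated), foot at (1,):
t2 = ('g', [('A', []), ('g', []), ('D', [])])
print(rhs(t2, (1,), 3))
# ('g', [('A', []), ('g', ['x1', 'x2', 'x3']), ('D', [])])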
Example 5. To illustrate the construction we provide an example of transforming a non-strict TAG into an equivalent footed CFTG. The input TAG is G2 from Example 2. The set of non-terminals is F = {S, (g, {2}, false)^3}. Rule set P1 consists of two rules:

(g, {2}, false)(x1, x2, x3) → g(a, (g, {2}, false)(b, g(x1, x2, x3), c), d)
S → g(a, (g, {2}, false)(b, e, c), d)

Rule set P2 consists of a single rule:

(g, {2}, false)(x1, x2, x3) → g(x1, x2, x3)

The footed grammar is given by ({g, a, b, c, d, e}, {S, (g, {2}, false)}, S, P1 ∪ P2).

The above results are accumulated in the following two theorems.

Theorem 2. A tree language is definable by a footed CFTG if and only if it is definable by a non-strict TAG.

We can now present the main result of this paper. It is an immediate consequence of the theorem above and Theorem 1.

Theorem 3. The class of tree languages definable by non-strict Tree Adjoining Grammars is exactly the class of tree languages definable by monadic linear context-free tree grammars.
5 A Logical Characterisation
The aim of this section is to show that the theorem above and results by Rogers [11] on TAGs can be combined to yield a logical characterisation of tree languages
definable by monadic linear CFTGs. A tree language is generable by a MLCFTG iff it is the two-dimensional yield of an MSO-definable three-dimensional tree language.

Let us briefly sketch the two-dimensional yield of a three-dimensional tree. Let (T3, λ) be a three-dimensional tree. A node p ∈ T3 is an internal node iff p is not the empty sequence (p is not the root) and there is a p′ with p ⊳3 p′ (p has an immediate successor in the 3rd dimension). For an internal node we define a fold-in operation that replaces the node by the subtree it roots. Consider the set S of immediate successors of p. By definition it is a two-dimensional tree domain. We demand it to have a foot node, i.e., a distinguished node f ∈ S that has no immediate successors in the second dimension. The operation replaces p by S such that the immediate successors of p in the second dimension become the immediate successors of f in the second dimension. After this short sketch of a two-dimensional yield of a three-dimensional tree we can now state the main theorem of this section. It provides a logical characterisation of the tree languages definable by MLCFTGs.

Theorem 4. A tree language is generable by a monadic linear context-free tree grammar iff it is the two-dimensional yield of an MSO-definable three-dimensional tree language.

Proof. Rogers [11] showed in Theorems 5 and 13 that a tree language is generable by a non-strict TAG iff it is the two-dimensional yield of an MSO-definable three-dimensional tree language. The theorem is an immediate consequence of Rogers' result and our Theorem 3.
6 Conclusion
We showed that non-strict TAGs and monadic linear CFTGs are strongly equivalent. We thereby confirmed an old intuition about TAGs (at least for non-strict ones). The strong equivalence result yields a new logical characterisation of the expressive power of monadic linear CFTGs: a tree language is definable by an MLCFTG iff it is the two-dimensional yield of an MSO-definable three-dimensional tree language.

It is known that there is a whole family of mildly context-sensitive grammar formalisms that all turned out to be weakly equivalent. It would be interesting to compare their relative expressive powers in terms of tree languages, because, ultimately, linguists are interested in linguistic analyses, i.e., tree languages, and not so much in unanalysed utterances. For string-based formalisms, the notion of strong generative capacity has to be extended along the lines proposed by Miller [8]. The current paper is one step in a program of comparing the strong generative capacity of mildly context-sensitive grammar formalisms.
References

1. Engelfriet, J., Schmidt, E.M.: IO and OI. I. Journal of Computer and System Sciences 15(3), 328–353 (1977)
2. Fujiyoshi, A.: Linearity and nondeletion on monadic context-free tree grammars. Information Processing Letters 93(3), 103–107 (2005)
3. Fujiyoshi, A., Kasai, T.: Spinal-formed context-free tree grammars. Theory of Computing Systems 33(1), 59–83 (2000)
4. Gécseg, F., Steinby, M.: Tree languages. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages. Beyond Words, vol. 3, pp. 1–68. Springer, Heidelberg (1997)
5. Joshi, A., Levy, L.S., Takahashi, M.: Tree adjunct grammar. Journal of Computer and System Sciences 10(1), 136–163 (1975)
6. Joshi, A., Schabes, Y.: Tree adjoining grammars. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages. Beyond Words, vol. 3, pp. 69–123. Springer, Berlin (1997)
7. Lang, B.: Recognition can be harder than parsing. Computational Intelligence 10, 486–494 (1994)
8. Miller, P.H.: Strong Generative Capacity: The Semantics of Linguistic Formalism. CSLI Publications, Stanford (1999)
9. Mönnich, U.: Adjunction as substitution. In: Kruijff, G.J., Morill, G., Oehrle, R. (eds.) Formal Grammar 1997, pp. 169–178 (1997)
10. Rogers, J.: A Descriptive Approach to Language-Theoretic Complexity. CSLI Publications, Stanford (1998)
11. Rogers, J.: wMSO theories as grammar formalisms. Theoretical Computer Science 293(2), 291–320 (2003)
A Formal Foundation for A and A-bar Movement

Gregory M. Kobele

University of Chicago
[email protected]
Abstract. It seems a fact that movement dependencies come in two flavours: “A” and “A-bar”. Over the years, a number of apparently independent properties have been shown to cluster together around this distinction. However, the basic structural property relating these two kinds of movement, the ban on improper movement (‘once you go bar, you never go back’), has never been given a satisfactory explanation. Here, I propose a timing-based account of the A/A-bar distinction, which derives the ban on improper movement, and allows for a simple and elegant account of some of their differences. In this account, “A” dependencies are those which are entered into before an expression is first merged into a structure, and “A-bar” dependencies are those an expression enters into after having been merged. The resulting system is mildly context-sensitive, providing therefore a restrictive account of possible human grammars, while remaining expressive enough to be able to describe the kinds of dependencies which are thought to be manifest.
It is common to describe the syntax of natural language in terms of expressions being related to multiple others, or moved from one position to another. Since [25], much effort has been put into determining the limitations on possible movements. A descriptively important step was taken by classifying movement dependencies into two basic kinds: those formed by the rule move NP, and those formed by move wh-phrase [7]. This bipartition of movement dependencies is a formal rendering of the observation that wh-movement, topicalization, and comparative constructions seem to have something in common, that they do not share with passive and raising constructions, which in turn have their own particular similarities. Whereas syntactic theories such as Head-Driven Phrase Structure Grammar [22] and Lexical-Functional Grammar [5] have continued to cash out this intuitive distinction between dependency types formally, in terms of a distinction between lexical operations and properly syntactic operations, this distinction has no formal counterpart in theories under the minimalist rubric [9]. This theoretical lacuna has led some [27] to explore the hypothesis that this perceived distinction between movement types is not an actual one; i.e. that the differences between Wh-construction types on the one hand and passive construction types on the other are not due to differences in the kinds of dependencies involved. A problem besetting those minimalists eager to maintain the traditional perspective on the difference between wh- and NP-movement
dependencies, is that there is no principled distinction between long-distance dependency types available in the theory; a theory with one kind of long-distance dependency does not lie well on the procrustean bed of one with two. The contribution of this paper is to provide a non-ad hoc minimalist theory with two kinds of movement dependencies, which have just the kind of properties which have become standardly associated with move NP- and move wh-phrase-related phenomena, respectively. It is important to note that it is not a particular analysis of a particular language which I will claim has these properties, but rather the theoretical framework itself. Once we are in possession of a theoretical framework in which we have two movement dependency forming operations that interact in the appropriate way, we are in a position to determine whether the old intuitions about movement dependencies coming in two types were right; we can compare the relative elegance of analyses written in one framework to those written in the other. In §1 I describe the kinds of properties which are constitutive of the empirical basis for the bifurcation of movement into two types. Recent minimalist accounts of some of these properties [16,4] will form the conceptual background of my own proposal, developed in §2. The formal architecture of minimalist grammars [28] is presented in §2.1, and it is extended in §2.2 in accord with my proposal. In §2.3, I present an analysis of passivization in English (drawing on the smuggling account proposed in [10]) written within the framework of §2.2.
1 On A and A-bar Movements
Many differences between NP and wh-phrase movement have been suggested in the literature, such as whether they license parasitic gaps, whether they can move out of tensed clauses, whether they bar the application of certain morpho-phonological processes, and whether they incur crossover violations (see e.g. [18]). (These questions uniformly receive a negative answer with respect to NP movements, and a positive one with respect to wh-phrase movements.) A perusal of these properties makes clear that they are highly construction- and analysis-specific. In other words, a theoretical framework cannot derive these differences between NP and wh-phrase movement simpliciter, but may at most derive them relative to particular analyses of these constructions. The only analysis-independent property of NP and wh-phrase movement types is the so-called ‘ban on improper movement’, which states that NP movement of an expression may not follow its movement as a wh-phrase. This relational property of NP (henceforth: ‘A’) and wh-phrase (henceforth: ‘A-bar’) movements is widely accepted, and was motivated by the desire to rule out sentences such as (1) below.

(1). *[S John seems [S t [S t wanted to sleep ] ] ]

In (1), the first movement (to SPEC-S) is an A-bar movement, and the second (to the matrix clause subject position) an A movement. The unacceptability of (1) contrasts with the well-formed (2) below, which one can interpret as suggesting that it is the second movement in (1), from SPEC-S to the subject position in the matrix clause, which leads to the deviance of (1).
(2). [S Who does [S Mary believe [S t [S t wanted to sleep ] ] ] ]

In the government and binding (GB) framework (as described in [8]) and the minimalist program (MP) (as in [9]), the ban on improper movement must simply be stated as such; movement from an A position may target an A-bar position, but movement from an A-bar position may only target other A-bar positions (see [21] for a particularly articulated view). In LFG and HPSG, where A movements are taken to be resolved lexically and A-bar movements resolved grammatically, the ban on improper movement follows from the architecture (grammatically complex expressions are simply not the kinds of things that lexical processes apply to). In the grammatical architecture I will develop in §2, A movements are those which occur before, and A-bar movements those which occur after, an expression has been first merged into a structure. The ban on improper movement is then just a simple consequence of the structure of derivations. Strictly speaking, the ban on improper movement is the only property of movement types which a grammatical framework can be said to derive. However, the following more analysis-specific property of movement types listed above will be shown to follow naturally from the architecture of the system in §2. A and A-bar movements differ systematically as to whether they create new binding possibilities.1 Consider sentences (3) and (4) below. In (3), the reflexive pronoun himself cannot be bound by the quantified noun phrase every boy, whereas in (4), after movement, it can.

(3). *It seems to himself that every boy is wonderful.
(4). Every boy seems to himself to be wonderful.

This situation contrasts with the one laid out in (5) below, where we see that a wh-moved expression is not able to bind the reflexive pronoun. Sentence (6) shows that it is indeed the failed attempt at binding that results in the ungrammaticality of (5), as this movement is otherwise fine.

(5). *Which boy does it seem to himself that Mary loves?
(6). Which boy does it seem that Mary loves?

The difference between these movement types can be summed up in the following diagram, with A movement of XP being able to, while A-bar movement of XP being unable to, bind the pronoun pro:

XP . . . [ . . . pro . . . t . . . ]

Attempts to account for these phenomena have been numerous in the GB framework, and have continued into the MP (see [26] for an accessible typology). One option is to rule out rebinding by A-bar movements by denying the
1 This is called ‘crossover’ in the literature [23]. Strong crossover is when the bound expression c-commands the source position of the movement, and weak crossover is when the bound expression is properly contained in such a c-commanding phrase. Weak crossover violations have been argued to be ameliorable under certain conditions [17].
ability to bind pronouns from A-bar positions, and another is to require that no closer potential binders may intervene between an A-bar trace and its antecedent. Given the framework developed below, we are in a position to stipulate that an expression may bind only those expressions that it c-commands when first merged (in other words, that binding is determined by c-command in the derivation tree).
2 Trace Deletion, Derivationally
Without a formal apparatus on which to hang an account of differences between movement types, researchers in the GB tradition have attempted to capture the difference between A and A-bar movements in terms of properties of source positions: traces. It was discovered that under a certain network of assumptions, A-bar traces behaved as R-expressions, and A traces as anaphors. In the MP, it has been suggested for diverse reasons that A-bar traces should be treated formally as copies of the moved expression, while A traces should be treated formally as unstructured objects [11,16]. This is the idea upon which this paper builds. But currently there is nothing more than arbitrary stipulation (why are some traces copies, and others not? why does the ban on improper movement hold?). To excavate the idea, we should get clear on what, exactly, traces are (for). In mainstream minimalism, movement chains are licensed derivationally: only well-formed chains are built in the first place. Therefore, traces are not needed for evaluating the well-formedness of a syntactic representation (their role in government and binding theory). Instead, traces (qua copies) play a role primarily at the interfaces, in particular the syntax-semantics interface, where they determine the positions in which an expression may take scope, as per [9]. The distinction between structured and unstructured traces (i.e. copies versus ‘traditional’ traces) is intended to indicate the possibility or not of reconstruction (with expressions being reconstructible into structured trace positions, but not into unstructured trace positions). A ‘copy’ indicates that an expression is present at a particular location in the structure for the purposes of reconstruction, while (unstructured) traces indicate that it is not. The intuition is simply that an expression may be interpreted in any position in which it is present; it is present in its A-bar positions, but not (necessarily) in its A positions. This is easier to understand if we think not about derived structures, but about the derivation itself: talk of ‘copies’ versus ‘traces’ is recast in terms of whether (copies) or not (traces) the object which is entering into these various dependencies is already present in the derivation at the time the dependency in question is entered into. The basic formal idea behind this intuition is to incorporate both transformations, as well as ‘slash-feature percolation’ [13], into a single formalism. Then we may have derivations involving slash-features, in which the object entering into the dependency in question is not present at the time the dependency is established:
1. [V write ]
2. [VP/DP was written ]

In addition, we may have derivations using transformations, in which the object entering into the dependency in question is present at the time the dependency is established:

1. [S that [S book was written ] ]
2. [N book [S that [S t was written ] ] ]

The present derivational reconstruction of the representational traces-versus-copies account of the A/A-bar distinction has the distinct advantage of giving a unified and intuitive account of various properties of A and A-bar movement. In particular, the ban on improper movement is forced upon us in this timing-based perspective on long-distance dependency satisfaction. In the next section I show how to incarnate this derivational perspective on A and A-bar movement in a formal system. In so doing we gain a better understanding not only of the mechanisms involved, but also of the various analytical options which the mechanisms put at our disposal.

2.1 Minimalist Grammars
Minimalist grammars [28] provide a formal framework within which the ideas of researchers working within the minimalist program can be rigorously explored. A minimalist grammar is given by a four-tuple ⟨V, Cat, Lex, F⟩, where

– V, the alphabet, is a finite set
– Cat, the set of features, is the union of the following pair of disjoint sets:
  • sel × Bool, where for
    ∗ ⟨x, 0⟩ ∈ sel × Bool, we write =x, and call it a selector feature
    ∗ ⟨x, 1⟩ ∈ sel × Bool, we write x, and call it a selectee feature
  • lic × Bool, where for
    ∗ ⟨y, 0⟩ ∈ lic × Bool, we write +y, and call it a licensor feature
    ∗ ⟨y, 1⟩ ∈ lic × Bool, we write -y, and call it a licensee feature
– Lex, the lexicon, is a finite set of pairs ⟨v, δ⟩, for v ∈ V ∪ {ε}, and δ ∈ Cat∗
– F = {merge, move} is the set of structure building operations

Minimalist expressions are traditionally given in terms of leaf-labelled, doubly ordered (projection and precedence) binary trees. The leaves are labelled with pairs of alphabet symbols (V ∪ {ε}) and feature sequences (Cat∗). A typical expression is given in figure 1, where the precedence relation is indicated with the left-right order, and the projection relation is indicated with less-than (<) and greater-than (>) signs. The projection relation allows for the definition of the important concepts ‘head-of’ and ‘maximal projection’. Intuitively, one arrives at the leaf which is the head of a complex expression by always descending into the daughter which is least according to the projection relation. In the tree in figure 1, its head is
Fig. 1. A minimalist expression: the tree <(⟨will, +k s⟩, >(⟨Mary, -k⟩, <(⟨feed, ε⟩, <(⟨the, ε⟩, ⟨dog, ε⟩))))
⟨will, +k s⟩, which is also (trivially) the head of its root’s left daughter. The head of the root’s right daughter is ⟨feed, ε⟩. Given a tree t with head ⟨v, δ⟩, we write t[δ] to indicate that the head of t has features δ. A proper subtree t′ of tree t is a maximal projection just in case the sister ts of t′ is such that ts < t′ in t. If t′ is a subtree of a tree t, we may write t as C[t′]. C[t′′] then refers to the tree like t but with the subtree t′ replaced by the subtree t′′. Work by [19] has shown that the operations of merge and move can be completely supported by data structures far less structured than doubly ordered leaf-labelled binary trees.2 Accordingly, [29] provide a simplified expression type for minimalist grammars; an expression is a sequence φ0, φ1, . . . , φn, where each φi is a pair ⟨ν, δ⟩, for ν ∈ V∗, and δ ∈ Cat+. The intuition is that each φi, 1 ≤ i ≤ n, represents the phonetic yield of a moving subtree, and that φ0 represents the phonetic yield of the rest of the tree. Let t1[=x δ1] and t2[x δ2] be two minimalist trees with head-features beginning with =x and x respectively. Then the result of merging together t1[=x δ1] and t2[x δ2] is shown in figure 2.3
Fig. 2. merge(t1[=x δ1], t2[x δ2]) = <(t1[δ1], t2[δ2])
From the perspective of the more concise chain-based representation, merge is broken up into two subcases, depending on whether or not the second argument
2 [15] has shown that these trees are also unnecessary for semantic interpretation.
3 The merge operation presented here is non-standard in that it only allows for merger into a complement position (i.e. the merged expression follows the expression to which it is merged). I adopt this simplification only for expository purposes; nothing important hinges on this.
will move (i.e. whether δ2 = ε or not).

merge1(⟨ν1, =x δ1⟩, φ1, . . . , φm; ⟨ν2, x⟩, ψ1, . . . , ψn) = ⟨ν1 ν2, δ1⟩, φ1, . . . , φm, ψ1, . . . , ψn

merge2(⟨ν1, =x δ1⟩, φ1, . . . , φm; ⟨ν2, x δ2⟩, ψ1, . . . , ψn) = ⟨ν1, δ1⟩, φ1, . . . , φm, ⟨ν2, δ2⟩, ψ1, . . . , ψn

Let C[t[-y δ2]][+y δ1] be a minimalist tree with head features beginning with +y, which contains a maximal (wrt projection) subtree t[-y δ2] with head features beginning with -y. Then the result of applying the move operation to C[t[-y δ2]][+y δ1] is shown in figure 3 (where λ = ⟨ε, ε⟩).
Fig. 3. move(C[t[-y δ2]][+y δ1]) = >(t[δ2], C[λ][δ1])
Turning once more to the more concise chain-based representation, move is broken up into two subcases, depending on whether or not the moving subtree will move again (i.e. whether δ2 = ε or not).

move1(⟨ν1, +y δ1⟩, φ1, . . . , ⟨ν2, -y⟩, . . . , φm) = ⟨ν2 ν1, δ1⟩, φ1, . . . , φm

move2(⟨ν1, +y δ1⟩, φ1, . . . , ⟨ν2, -y δ2⟩, . . . , φm) = ⟨ν1, δ1⟩, φ1, . . . , ⟨ν2, δ2⟩, . . . , φm

Since at least [25] it has been observed that movement cannot relate arbitrary tree positions, but rather that there are constraints on which positions a moved item can be construed as originating from. The canonical constraint on movement in minimalist grammars is the SMC [28], intended to be reminiscent of the shortest move constraint of [9].4 Intuitively, the SMC demands that if an expression can move, it must move. This disallows cases in which two or more moving subexpressions ‘compete’ for the same +y feature. The SMC is implemented as a restriction on the domain of move: move(⟨ν, +y δ⟩, φ1, . . . , φm) is defined iff exactly one φi = ⟨νi, δi⟩ is such that δi begins with -y
4 [12] investigate other constraints on movement in minimalist grammars.
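The chain-based operations are simple enough to run directly. The following is a minimal sketch, assuming a naive encoding of expressions as lists of (string, feature-list) pairs with element 0 the head chain; all names and encodings here are mine and not part of the formalism's official presentation.

```python
# A toy rendering of merge1/merge2 and move1/move2 with the SMC.

def cat(a, b):
    return (a + ' ' + b).strip()

def merge(e1, e2):
    """merge1 if the selectee is finished moving, merge2 otherwise."""
    (s1, f1), rest1 = e1[0], e1[1:]
    (s2, f2), rest2 = e2[0], e2[1:]
    assert f1[0] == '=' + f2[0]               # =x checks x
    if len(f2) == 1:                          # merge1
        return [(cat(s1, s2), f1[1:])] + rest1 + rest2
    return [(s1, f1[1:])] + rest1 + [(s2, f2[1:])] + rest2   # merge2

def move(e):
    """move1 or move2, with the SMC as a definedness condition."""
    (s0, f0), movers = e[0], e[1:]
    y = '-' + f0[0][1:]                       # +y licenses -y
    hits = [i for i, (_, f) in enumerate(movers) if f[0] == y]
    assert len(hits) == 1, "SMC: exactly one candidate may move"
    i = hits[0]
    s, f = movers[i]
    rest = movers[:i] + movers[i+1:]
    if len(f) == 1:                           # move1: final landing site
        return [(cat(s, s0), f0[1:])] + rest
    return [(s0, f0[1:])] + rest[:i] + [(s, f[1:])] + rest[i:]   # move2

# e.g. merge([('will', ['=v', '+k', 's'])], [('smile', ['v'])])
#      == [('will smile', ['+k', 's'])]
```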
2.2 Trace Deletion in Minimalist Grammars
Minimalist grammars as presented above allow only for ‘derivational copies’; an expression is present in the derivation at every point in which it enters into a syntactic dependency. This is because we first merge an expression into the derivation, and then satisfy further dependencies by moving it around. In order to allow for ‘derivational traces’, we need an expression to start satisfying dependencies before it is part of the derivation. The mechanism adopted to allow this somewhat paradoxical sounding state of affairs bears strong similarities to ‘slash feature percolation’ in GPSG, as well as to hypothetical reasoning in logic. The intuition is that we will allow ourselves, upon encountering an expression t[⟨z, 0⟩δ1], to assume the existence of an expression with a matching feature ⟨z, 1⟩. This allows us to continue the derivation as if we had successfully checked the first feature of t. However, assumptions, like other forms of credit, must eventually be paid back. This takes the form here of inserting an expression which actually has the features we had theretofore assumed we had, discharging the assumptions. To implement hypothetical reasoning, we introduce another pair of operations, assume and discharge. Informally, assume eliminates features of an expression, and keeps a record of the features so eliminated. An example is given in figure 4, where td represents the information that a d feature was hypothesized.
Fig. 4. assume(⟨smile, =d v⟩) = <(⟨smile, v⟩, td)
To eliminate assumptions, we introduce the discharge operation, which ‘merges’ two expressions together, using the second to satisfy en masse some of the features previously eliminated via assume in the first. An example is shown in figure 5, where the dotted lines indicate the checking relationships between the connected features.
Fig. 5. discharge(<(⟨will, +k s⟩, <(⟨smile, ε⟩, td)), ⟨Mary, d -k⟩) = >(⟨Mary, ε⟩, <(⟨will, s⟩, <(⟨smile, ε⟩, λ)))
While I have described the assume operation in terms of hypothesizing one feature away at a time, it is simpler to have the assume operation hypothesize
an entire feature sequence. We therefore extend the definition of expressions: an expression is a sequence over (V∗ × Cat+) ∪ (Cat+ × Cat+). A subexpression of the form ⟨δ, δ′⟩, where both δ, δ′ ∈ Cat+, indicates a (partially discharged) hypothesis of an expression with feature sequence beginning with δδ′. The first component of such a subexpression records which of the hypothesized features have been checked, and the second component which of the hypothesized features remain unchecked (in this sense, a hypothetical subexpression resembles a dotted item in an Earley parser). Accordingly, we define assume as per the following:5 for δ2 ∈ Cat+

assume(⟨ν1, =x δ1⟩, φ1, . . . , φm) → ⟨ν1, δ1⟩, ⟨x, δ2⟩, φ1, . . . , φm

This definition of assume allows us to simply use move to deal with hypothesis manipulation, which then is subject to the same constraints as normally moving objects:

move3(⟨ν1, +y δ1⟩, φ1, . . . , ⟨δ, -y δ2⟩, . . . , φm) = ⟨ν1, δ1⟩, φ1, . . . , ⟨δ-y, δ2⟩, . . . , φm

Once all but one of the features of a hypothesis have been ‘checked’, it is ready to be discharged. As with merge, the definition of discharge is split into two cases, as determined by whether the second argument will continue moving (i.e. whether it has licensee features in need of checking).

discharge1(⟨ν1, +y δ1⟩, φ1, . . . , ⟨δ, -y⟩, . . . , φm; ⟨ν2, δ-y⟩, ψ1, . . . , ψn) = ⟨ν2 ν1, δ1⟩, φ1, . . . , φm, ψ1, . . . , ψn

discharge2(⟨ν1, +y δ1⟩, φ1, . . . , ⟨δ, -y⟩, . . . , φm; ⟨ν2, δ-y δ2⟩, ψ1, . . . , ψn) = ⟨ν1, δ1⟩, φ1, . . . , φm, ⟨ν2, δ2⟩, ψ1, . . . , ψn

As with move, we require that the arguments to discharge satisfy the SMC: discharge(⟨ν, +y δ⟩, φ1, . . . , φm; Ψ) is defined only if at most one φi = ⟨α, δi⟩ is such that δi begins with -y
5 Note that assume is a relation. I don’t see any obvious way to incorporate hypothetical reasoning into the minimalist grammar framework without some element of non-determinism. The intuitive presentation as given in figure 4, where each application of assume hypothesizes away just a single feature, moves the non-determinism into the bookkeeping for which hypotheses may be eliminated by a single instance of the discharge operation. (Consider how many dotted lines we could have drawn in figure 5 if there had been other td hypotheses in the first argument to discharge.) Here, I have opted to localize all of the non-determinism in the assume operation, in the hope that this will make it easier to optimize away. (An obvious optimization is to limit the choice of δ2 to only those sequences of licensee features that actually occur in the lexicon.) It certainly is easier to present this way.
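As a companion to the earlier sketch, here is a hypothetical rendering (mine, simplified) of assume, move3, and discharge over the same list-of-pairs encoding, with hypotheses represented as ('#', checked, remaining) triples and the nondeterministic choice of δ2 passed in explicitly.

```python
# Hypotheses are ('#', checked, remaining); regular chains are (string, feats).

def assume(e, delta2):
    """Check =x against a hypothesized expression with features [x] + delta2."""
    (s0, f0), rest = e[0], e[1:]
    assert f0[0].startswith('=')
    x = f0[0][1:]
    return [(s0, f0[1:]), ('#', [x], list(delta2))] + rest

def move3(e):
    """Let a hypothesis check a -y feature against the head's +y."""
    (s0, f0), movers = e[0], list(e[1:])
    assert f0[0].startswith('+')
    y = '-' + f0[0][1:]
    i = next(j for j, m in enumerate(movers) if m[0] == '#' and m[2][0] == y)
    _, checked, remaining = movers[i]
    movers[i] = ('#', checked + [y], remaining[1:])
    return [(s0, f0[1:])] + movers

def discharge(e1, e2):
    """Replace a used-up hypothesis in e1 by the real expression e2."""
    (s0, f0), movers = e1[0], list(e1[1:])
    y = '-' + f0[0][1:]
    (s2, f2), rest2 = e2[0], e2[1:]
    i = next(j for j, m in enumerate(movers) if m[0] == '#' and m[2] == [y])
    checked = movers[i][1]
    assert f2[:len(checked) + 1] == checked + [y]   # e2 has the assumed features
    left = f2[len(checked) + 1:]
    movers = movers[:i] + movers[i+1:]
    if not left:   # discharge1: the expression surfaces here
        return [((s2 + ' ' + s0).strip(), f0[1:])] + movers + list(rest2)
    return [(s0, f0[1:])] + movers + [(s2, left)] + list(rest2)   # discharge2
```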
Some comments on the formalism. Before moving on to some examples of the formalism at work, it is worth pointing out the following. First, it is clear that extending minimalist grammars with the operations assume and discharge in the manner described above does not increase the weak generative capacity of the formalism (the proof is a straightforward modification of the embedding given in [19] of minimalist grammars in MCFGs). Second, the particular definitions of assume and discharge given here allow for a certain kind of ‘smuggling’ [10] of moving subexpressions (to be taken up in §2.3). Specifically, certain violations of the SMC can be gotten around by delaying the first merge of an expression containing subexpressions which otherwise would compete with subexpressions of the main expression for checking. The smuggling seems logically independent of the addition of hypothetical reasoning to the minimalist grammar system [1], although it is not immediately obvious how to give a similarly elegant hybrid system without it.

2.3 An Analysis in the Hybrid Framework
In this section I will illustrate the workings of the hybrid framework developed in §2.2 by couching an analysis of passivization and relativization in English in these terms. Passivization is a canonical example of A movement, and relativization of A-bar movement. For relativization, I take as my starting point the raising analysis reanimated recently by [14] (see also [2,3,15,30]), according to which the head noun modified by the relative clause is base generated within the relative clause modifying it, and raised to its surface position. Schematically, one has derivations like the following.

man [that the book will be written by t]

For the analysis of passive, I adopt the smuggling analysis advanced recently by [10], according to which the demoted subject in the by-phrase is base generated in its canonical position. Under Collins’ analysis, a passive sentence is derived by moving the participle phrase to the specifier of the passive voice phrase (which is headed by by), and then exceptionally moving the logical object out from inside the just moved participle phrase into the surface subject position, as schematized below.

[the book] will be [written t] by the man t

Determiner phrases will be assigned the type ‘d -k’, which means that they will surface in a different position (-k) than the one they were merged in (d). This is an implementation of the idea that DPs move for case [8]. There are two ‘determiners’ in our fragment:

i. the, =n d -k
ii. ε, =n d -k -rel

The first is the familiar the, the typing of which lets us know that it selects a noun phrase (of type n), and is then a determiner phrase (of type d -k). The
second is particular to the raising analysis of relative clauses. It allows a noun phrase to behave as a determiner phrase within a clause, and then raise out of the clause (-rel), forming a noun-relative clause compound.

iii. that, =s +rel n

Lexical item iii selects a clause (of type s), and triggers raising of a noun phrase (+rel). The result (N Rel) behaves as a noun phrase (n).

iv. smile, =d v
v. will, =v +k s
vi. man, n

With the addition of lexical items iv, v, and vi, we may derive sentences like the man will smile in the following manner.

1. assume(iv) ⟨smile, v⟩, ⟨d, -k⟩
2. merge(v,1) ⟨will smile, +k s⟩, ⟨d, -k⟩
3. merge(i,vi) ⟨the man, d -k⟩
4. discharge(2,3) ⟨the man will smile, s⟩

The relative clause man that will smile has an initially similar derivation, first diverging at step 3:

3. merge(ii,vi) ⟨man, d -k -rel⟩
4. discharge(2,3) ⟨will smile, s⟩, ⟨man, -rel⟩
5. merge(iii,4) ⟨that will smile, +rel n⟩, ⟨man, -rel⟩
6. move(5) ⟨man that will smile, n⟩

With lexical items i–vi, all and only sentences belonging to the regular set the man (that will smile)∗ will smile are derivable. Expanding our fragment, we turn next to transitive clauses in the active voice, for which we need the following new lexical items.6

vii. write, =d V
viii. ε, =>V +k V
6 The feature type =>x is a variant of =x, one which in addition triggers movement of the selected phrase’s head. For details, see [28,20,15].
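The derivation of the man will smile just given can be replayed with the toy functions of the two earlier sketches (assuming merge, assume, and discharge from those sketches are in scope; the lexicon encodings are mine):

```python
e1 = assume([('smile', ['=d', 'v'])], ['-k'])
#    -> [('smile', ['v']), ('#', ['d'], ['-k'])]
e2 = merge([('will', ['=v', '+k', 's'])], e1)
#    -> [('will smile', ['+k', 's']), ('#', ['d'], ['-k'])]
e3 = merge([('the', ['=n', 'd', '-k'])], [('man', ['n'])])
#    -> [('the man', ['d', '-k'])]
e4 = discharge(e2, e3)
assert e4 == [('the man will smile', ['s'])]
```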
ix. ε, =>V =d v
x. book, n

Lexical item viii allows the object to check its case (-k) within the extended projection of the verb (the head movement is to get the word order right). It is optional, as passivization requires the object to check its case outside of the verb phrase in English. Lexical item ix is the head which selects the external argument of the verb phrase, and changes the category of the verbal projection to ‘little v’ (v). The sentence the man will write the book has the following derivation.

1. assume(vii) ⟨write, V⟩, ⟨d, -k⟩
2. merge(viii,1) ⟨write, +k V⟩, ⟨d, -k⟩
3. merge(i,x) ⟨the book, d -k⟩
4. discharge(2,3) ⟨the book write, V⟩
5. merge(ix,4) ⟨write the book, =d v⟩
6. assume(5) ⟨write the book, v⟩, ⟨d, -k⟩
7. merge(v,6) ⟨will write the book, +k s⟩, ⟨d, -k⟩
8. merge(i,vi) ⟨the man, d -k⟩
9. discharge(7,8) ⟨the man will write the book, s⟩

To derive passive sentences, we require the following five lexical items.

xi. -en, =>V V -part
xii. ε, =v +k x
xiii. by, =x +part pass
xiv. ε, =v +part pass
xv. be, =pass v
Using these lexical items, we may derive the sentence The book will be written by the man in the following manner.

1. assume(ix) ⟨ε, =d v⟩, ⟨V, -part⟩
2. assume(1) ⟨ε, v⟩, ⟨d, -k⟩, ⟨V, -part⟩
3. merge(xii,2) ⟨ε, +k x⟩, ⟨d, -k⟩, ⟨V, -part⟩
4. merge(i,vi) ⟨the man, d -k⟩
5. discharge(3,4) ⟨the man, x⟩, ⟨V, -part⟩
6. merge(xiii,5) ⟨by the man, +part pass⟩, ⟨V, -part⟩
7. assume(vii) ⟨write, V⟩, ⟨d, -k⟩
8. merge(xi,7) ⟨written, V -part⟩, ⟨d, -k⟩
9. discharge(6,8) ⟨written by the man, pass⟩, ⟨d, -k⟩
10. merge(xv,9) ⟨be written by the man, v⟩, ⟨d, -k⟩
11. merge(v,10) ⟨will be written by the man, +k s⟩, ⟨d, -k⟩
12. merge(i,x) ⟨the book, d -k⟩
13. discharge(11,12) ⟨the book will be written by the man, s⟩

Note that if expression 8 were merged directly in step 1 instead of having been assumed in 1 and then discharged in 9, the derivation would have crashed at step 5, as the man and (the hypothesis of) the book would have been competing for the same position. We may derive the relative clause book that will be written by the man by first recycling steps 1–11 of the previous derivation, and then continuing in the following manner.

12. merge(ii,x) ⟨book, d -k -rel⟩
13. discharge(11,12) ⟨will be written by the man, s⟩, ⟨book, -rel⟩
14. merge(iii,13) ⟨that will be written by the man, +rel n⟩, ⟨book, -rel⟩
15. move(14) ⟨book that will be written by the man, n⟩
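The crash the text mentions at step 5 is just an SMC violation; under the toy encoding of the earlier sketches it can be made visible directly (again, the encoding is mine):

```python
# A toy check of the SMC competition: two chains whose next feature is -k
# cannot both be present when +k is checked.

def smc_violated(movers):
    firsts = [f[0] for f in movers]
    return any(firsts.count(x) > 1 for x in set(firsts))

# If 'written' were merged directly, both case-seeking chains would coexist:
assert smc_violated([['-k'], ['-k']])          # crash at step 5

# Under assume(ix), the participle is only a hypothesis <V, -part>; the
# book's -k is not yet part of the expression, so 'the man' checks -k alone:
assert not smc_violated([['-k'], ['-part']])   # steps 3-5 go through
```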
3 Conclusions
I have demonstrated how a straightforward modification of the minimalist grammar framework yields a formal architecture for the description of language in which there exist two kinds of movement dependencies, which obey the ban on improper movement. It is simple and natural to connect the structure of movement chains to semantic interpretation in a way that derives the crossover differences between A and A-bar movement.7 Further semantic asymmetries, such as the hypothesis that A movement systematically prohibits reconstruction while A-bar movement does not [9,16], are also easily incorporable.8 One major difference between the hybrid minimalist framework presented here and the intuitive conception of A and A-bar movement present in the GB/MP literature is the lack of a decision procedure for determining when an expression stops A moving and starts A-bar moving. (This is related to the fact that this decision procedure in the GB/MP relies on a complex network of assumptions which in the MG framework are non-logical, such as universal clausal structure, and crosslinguistic identity of features.) As mentioned in footnote 8, this freedom allows us to pursue hypotheses about language structure that we otherwise could not. It remains to be seen whether these novel hypothesis types prove enlightening, as it does for the broader A/A-bar distinction.
7 An expression may bind all and only those expressions it c-commands in the derivation tree. Note that this is a formal rendering of what [6] calls ‘Reinhart’s Generalization’ (after [24]): pronoun binding can only take place from a c-commanding A position.

8 Two possibilities suggest themselves. First, we might allow an expression to reconstruct into any position through which it has moved (i.e. between where it is first merged/discharged and where it is last moved) [16]. Another option is to abandon the idea that there are dedicated A and A-bar positions, and force an expression to be interpreted exactly where it is first merged/discharged.

References

1. Amblard, M.: Calculs de représentations sémantiques et syntaxe générative: les grammaires minimalistes catégorielles. Ph.D. thesis, Université Bordeaux I (2007)
2. Bhatt, R.: The raising analysis of relative clauses: Evidence from adjectival modification. Natural Language Semantics 10(1), 43–90 (2002)
3. Bianchi, V.: The raising analysis of relative clauses: A reply to Borsley. Linguistic Inquiry 31(1), 123–140 (2000)
4. Boeckx, C.A.: A note on contraction. Linguistic Inquiry 31(2), 357–366 (2000)
5. Bresnan, J.: Lexical-Functional Syntax. Blackwell, Oxford (2001)
6. Büring, D.: Crossover situations. Natural Language Semantics 12(1), 23–62 (2004)
7. Chomsky, N.: On Wh-movement. In: Culicover, P.W., Wasow, T., Akmajian, A. (eds.) Formal Syntax, pp. 71–132. Academic Press, New York (1977)
8. Chomsky, N.: Lectures on Government and Binding. Foris, Dordrecht (1981)
9. Chomsky, N.: The Minimalist Program. MIT Press, Cambridge (1995)
10. Collins, C.: A smuggling approach to the passive in English. Syntax 8(2), 81–120 (2005)
11. Fox, D.: Reconstruction, binding theory, and the interpretation of chains. Linguistic Inquiry 30(2), 157–196 (1999)
12. Gärtner, H.M., Michaelis, J.: A note on the complexity of constraint interaction: Locality conditions and minimalist grammars. In: Blache, P., Stabler, E.P., Busquets, J.V., Moot, R. (eds.) LACL 2005. LNCS (LNAI), vol. 3492, pp. 114–130. Springer, Heidelberg (2005)
13. Gazdar, G.: Unbounded dependencies and coordinate structure. Linguistic Inquiry 12(2), 155–184 (1981)
14. Kayne, R.: The Antisymmetry of Syntax. MIT Press, Cambridge (1994)
15. Kobele, G.M.: Generating Copies: An investigation into structural identity in language and grammar. Ph.D. thesis, University of California, Los Angeles (2006)
16. Lasnik, H.: Chains of arguments. In: Epstein, S.D., Hornstein, N. (eds.) Working Minimalism. Current Studies in Linguistics, vol. 32, pp. 189–215. MIT Press, Cambridge (1999)
17. Lasnik, H., Stowell, T.: Weakest crossover. Linguistic Inquiry 22, 687–720 (1991)
18. Mahajan, A.: The A/A-bar Distinction and Movement Theory. Ph.D. thesis, Massachusetts Institute of Technology (1990)
19. Michaelis, J.: On Formal Properties of Minimalist Grammars. Ph.D. thesis, Universität Potsdam (2001)
20. Michaelis, J.: Notes on the complexity of complex heads in a minimalist grammar. In: Proceedings of the Sixth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+6), Venezia (2002)
21. Müller, G., Sternefeld, W.: Improper movement and unambiguous binding. Linguistic Inquiry 24(3), 461–507 (1993)
22. Pollard, C.J., Sag, I.A.: Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago (1994)
23. Postal, P.M.: Cross-over phenomena. Holt, Rinehart & Winston, New York (1971)
24. Reinhart, T.: Anaphora and Semantic Interpretation. University of Chicago Press, Chicago (1983)
25. Ross, J.R.: Constraints on Variables in Syntax. Ph.D. thesis, Massachusetts Institute of Technology (1967), published in 1986 as Infinite Syntax, Ablex
26. Ruys, E.G.: Weak crossover as a scope phenomenon. Linguistic Inquiry 31(3), 513–539 (2000)
27. Sportiche, D.: Reconstruction, binding and scope. Ms., UCLA (2005)
28. Stabler, E.P.: Derivational minimalism. In: Retoré, C. (ed.) LACL 1996. LNCS (LNAI), vol. 1328, pp. 68–95. Springer, Heidelberg (1997)
29. Stabler, E.P., Keenan, E.L.: Structural similarity within and among languages. Theoretical Computer Science 293, 345–363 (2003)
30. de Vries, M.: The Syntax of Relativization. Ph.D. thesis, Universiteit van Amsterdam (2002)
Without Remnant Movement, MGs Are Context-Free

Gregory M. Kobele

University of Chicago
[email protected]
Abstract. Minimalist grammars offer a formal perspective on a popular linguistic theory, and are comparable in weak generative capacity to other mildly context-sensitive formalisms. Minimalist grammars allow for the straightforward definition of so-called remnant movement constructions, which have found use in many linguistic analyses. It has been conjectured that the ability to generate this kind of configuration is crucial to the super-context-free expressivity of minimalist grammars. This conjecture is here proven.
In the minimalist program of [2], the well-formedness conditions on movement-type dependencies of the previous GB Theory [1] are reimplemented derivationally, so as to render ill-formed movement chains impossible to assemble. For example, the c-command restriction on adjacent chain links is enforced by making movement always to the root of the current subtree – a position c-commanding any other. One advantage of this derivational reformulation of chain well-formedness conditions is that so-called ‘remnant movement’ configurations, as depicted on the left in figure 1, are easy to generate. Remnant movement occurs when, due to previous movement operations, a moving expression does not itself have a grammatical description. Here we imagine that the objects derivable by the grammar in figure 1 include the black triangle and the complex of white and black triangles, but not the white triangle to the exclusion of the black triangle. From an incremental bottom-up perspective, the structure on the left in figure 1 first involves moving the grammatically specifiable black triangle, but then the non-directly grammatically describable white triangle moves. This is to be contrasted with the superficially similar configuration on the right in figure 1, in which, again from an incremental bottom-up perspective, both
Fig. 1. Remnant Movement (left) vs Non-Remnant Movement (right)
movement steps are of grammatically specifiable objects (the first step (here, the dotted line) involves movement of the complex of white and black triangles, and the second step (the solid line) involves movement of the black triangle). In particular, the dependencies generated by remnant movement are ‘crossing’, while those of the permissible type are nested (in the intuitive sense made evident in the figure). The formalism of Minimalist Grammars (MGs) [20] was shown in [15] to be mildly context-sensitive (see also [10]). The MGs constructed in the proof use massive remnant movement to derive the non-context-free patterns, inviting the question as to whether this is necessary. Here we show that it is. MGs without remnant movement derive all and only the context-free languages. This result holds even when the SMC (a canonical constraint on movement, see [7]) is relaxed in such a way as to render the set of well-formed derivation trees non-regular. In this case, the standard proof [15] that MGs are mildly context-sensitive no longer goes through.
1 Mathematical Preliminaries
We assume familiarity with basic concepts of formal language theory. We write 2A for the power set of a set A, and, for f : A → B a partial function, dom(f) denotes the subset of A on which f is defined. Given a set Σ, Σ∗ denotes the set of all finite sequences of elements from Σ, including the empty sequence ε. Σ+ is the set of all finite sequences over Σ of length greater than 0. For u, v ∈ Σ∗, u·v is their concatenation. Often we will simply indicate concatenation via juxtaposition.

A ranked alphabet is a set Σ together with a function rank : Σ → N assigning to each ‘function symbol’ in Σ a natural number indicating the arity of the function it denotes. If Σ is a ranked alphabet, we write Σi for the set {σ ∈ Σ : rank(σ) = i}. If σ ∈ Σi, we write σ(i) to indicate this fact. Let Σ be a ranked alphabet; the set of terms over Σ is written TΣ, and is defined to be the smallest set containing each σ ∈ Σ0, and for each σ ∈ Σn and t1, . . . , tn ∈ TΣ, the term σ(t1, . . . , tn). For X any set, and Σ a ranked alphabet, Σ ∪ X is also a ranked alphabet, where (Σ ∪ X)0 = Σ0 ∪ X, and (Σ ∪ X)i = Σi for all i > 0. We write TΣ(X) for TΣ∪X. A unary context over Σ is C ∈ TΣ({x}), such that x occurs exactly once in C. Given a unary context C and term t, we write C[t] to denote the result of substituting t in for x in C (x[t] = t, σ(t1, . . . , tn)[t] = σ(t1[t], . . . , tn[t])).

A bottom-up tree automaton is given by a quadruple ⟨Q, Σ, →, QF⟩, where Q is a finite set of states, QF ⊆ Q is the set of final states, Σ is a ranked alphabet, and → ⊆fin Σ × Q∗ → Q. A bottom-up tree automaton defines a relation ⇒ ⊆ TΣ(Q) × TΣ(Q). If C is a unary context over Σ ∪ Q, and ⟨σ(n), q1, . . . , qn⟩ → q, then C[σ(q1, . . . , qn)] ⇒ C[q]. The tree language accepted by a bottom-up tree automaton A is defined as L(A) = {t ∈ TΣ : ∃q ∈ QF . t ⇒∗ q}. A set of trees is regular iff it is the language accepted by some bottom-up tree automaton.
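For concreteness, here is a small sketch (mine, not from the paper) of a nondeterministic bottom-up tree automaton over tuple-encoded terms:

```python
from itertools import product

# Terms are tuples ('sigma', t1, ..., tn); a transition table maps a
# (symbol, state-tuple) pair to the set of states it may move to.

def run(trans, term):
    """Return the set of states reachable at the root of term."""
    child_states = [run(trans, c) for c in term[1:]]
    states = set()
    for qs in product(*child_states):
        states |= trans.get((term[0], qs), set())
    return states

def accepts(trans, finals, term):
    return bool(run(trans, term) & finals)

# Example: accept trees over f(2), a(0) with an even number of a-leaves.
trans = {('a', ()): {1},                       # state = parity of a-count
         ('f', (0, 0)): {0}, ('f', (1, 1)): {0},
         ('f', (0, 1)): {1}, ('f', (1, 0)): {1}}
assert accepts(trans, {0}, ('f', ('a',), ('a',)))
assert not accepts(trans, {0}, ('f', ('f', ('a',), ('a',)), ('a',)))
```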
2 Minimalist Grammars
We use the notation of [22]. An MG over an alphabet Σ is a triple G = ⟨Lex, sel, lic⟩ where sel and lic are finite non-empty sets (of ‘selection’ and ‘licensing’ feature types), and for F = {=s, s : s ∈ sel} ∪ {+l, -l : l ∈ lic}, Lex ⊆fin Σ∗ × F∗. Given binary function symbols Δ2 := {mrg1, mrg2, mrg3} and unary Δ1 := {mv1, mv2}, a derivation is a term in der(G) = TΔ2∪Δ1∪Lex, where elements of Lex are treated as nullary symbols. An expression is a finite sequence φ0, φ1, . . . , φn of pairs over Σ∗ × F∗; the first component φ0 represents the yield and features of the expression (qua tree) minus any moving parts, and the remaining components represent the yield and features of the moving parts of the expression. Thus an expression of the form φ0 = ⟨σ, γ⟩ represents a tree with no moving pieces; such an expression is called a complete expression of category γ. Eval : der(G) → 2^((Σ∗×F∗)+) is a partial function mapping derivations to the sets of expressions they are derivations of. Given ℓ ∈ Lex, Eval(ℓ) = {ℓ}, and Eval(mrgi(d1, d2)) and Eval(mvi(d)) are defined as {mergei(e1, e2) : ej ∈ Eval(dj)} and {movei(e) : e ∈ Eval(d)} respectively, where the operations mergei and movei are defined below. In the following, σ, τ ∈ Σ∗, γ, δ ∈ F∗, and φi, ψj ∈ Σ∗ × F∗.

merge1(⟨σ, =c γ⟩; ⟨τ, c⟩, ψ1, . . . , ψn) = ⟨σ τ, γ⟩, ψ1, . . . , ψn (for ⟨σ, =c γ⟩ ∈ Lex)

merge2(⟨σ, =c γ⟩, φ1, . . . , φm; ⟨τ, c⟩, ψ1, . . . , ψn) = ⟨τ σ, γ⟩, φ1, . . . , φm, ψ1, . . . , ψn

merge3(⟨σ, =c γ⟩, φ1, . . . , φm; ⟨τ, c δ⟩, ψ1, . . . , ψn) = ⟨σ, γ⟩, φ1, . . . , φm, ⟨τ, δ⟩, ψ1, . . . , ψn

move1(⟨σ, +c γ⟩, φ1, . . . , φi−1, ⟨τ, -c⟩, φi+1, . . . , φm) = ⟨τ σ, γ⟩, φ1, . . . , φi−1, φi+1, . . . , φm

move2(⟨σ, +c γ⟩, φ1, . . . , φi−1, ⟨τ, -c δ⟩, φi+1, . . . , φm) = ⟨σ, γ⟩, φ1, . . . , φi−1, ⟨τ, δ⟩, φi+1, . . . , φm

The SMC is a restriction on the domains of move1 and move2 which renders these relations functional:

no φj = ⟨σj, γj⟩ is such that γj = -c γj′ unless j = i
(SMC)
The (string) language generated at a category c (for c ∈ sel) by an MG G is defined to be the yields of the complete expressions of category c:1 Lc(G) := {σ : ∃d ∈ der(G). ⟨σ, c⟩ ∈ Eval(d)}.
1 Implicit in [15] is the fact that for any c, domc(Eval) = {d : ∃σ. ⟨σ, c⟩ ∈ Eval(d)} is a regular tree language. This is explicitly shown in [13].
3 A Ban on Remnant Movement
In order to implement a ban on remnant movement, we want to implement a temporary island status on moving expressions: nothing can move out of a moving expression until it has settled down (‘please wait until the train has come to a complete stop before exiting’). Currently, an expression e = φ0, φ1, . . . , φk has the form just given, where φ0 is the ‘head’ of the expression, and the other φi are ‘moving parts’. Importantly, although we view such an expression as a compressed representation of a tree, there is no hierarchical relation among the φi. In order to implement a ban against remnant movement, we need to indicate which of the moving parts are contained in which others. We represent this information by retaining some of the relative dominance relations in the represented tree: e = φ0, T1, . . . , Tn, where each tree Ti pairs a moving part with a (possibly empty) sequence of trees (the set of trees T is the smallest set X such that X = (Σ∗ × F∗) × X∗). We interpret such a structure as a moving part (the features of which are represented by φi) which itself may contain moving subparts T1, . . . , Tm. By allowing these moving subparts to become accessible for movement only after the features of φi have been exhausted, we rule out the crossing, remnant-movement-type dependencies. The revised cases of the operations merge and move, PBC-merge and PBC-move,2 are given below. The function PBC-Eval interprets derivations d ∈ der(G) in ‘PBC-mode’, such that PBC-Eval(ℓ) = {ℓ}, PBC-Eval(mvi(d)) = {PBC-movei(e) : e ∈ PBC-Eval(d)}, and PBC-Eval(mrgi(d1, d2)) = {PBC-mergei(e1, e2) : ej ∈ PBC-Eval(dj)}. In the below, σ, τ are strings, γ, δ are finite sequences of syntactic features, and Si, Tj are trees of the form ⟨⟨σ, γ⟩, S1, . . . , Sn⟩.

PBC-merge1(⟨σ, =c γ⟩; ⟨τ, c⟩, T1, . . . , Tn) = ⟨σ τ, γ⟩, T1, . . . , Tn (for ⟨σ, =c γ⟩ ∈ Lex)

PBC-merge2(⟨σ, =c γ⟩, S1, . . . , Sm; ⟨τ, c⟩, T1, . . . , Tn) = ⟨τ σ, γ⟩, S1, . . . , Sm, T1, . . . , Tn

PBC-merge3(⟨σ, =c γ⟩, S1, . . . , Sm; ⟨τ, c δ⟩, T1, . . . , Tn) = ⟨σ, γ⟩, S1, . . . , Sm, ⟨⟨τ, δ⟩, T1, . . . , Tn⟩

PBC-move1(⟨σ, +c γ⟩, S1, . . . , Si−1, ⟨⟨τ, -c⟩, T1, . . . , Tn⟩, Si+1, . . . , Sm) = ⟨τ σ, γ⟩, S1, . . . , Si−1, T1, . . . , Tn, Si+1, . . . , Sm

PBC-move2(⟨σ, +c γ⟩, S1, . . . , Si−1, ⟨⟨τ, -c δ⟩, T1, . . . , Tn⟩, Si+1, . . . , Sm) = ⟨σ, γ⟩, S1, . . . , Si−1, ⟨⟨τ, δ⟩, T1, . . . , Tn⟩, Si+1, . . . , Sm
2 The ‘PBC’ is named after the proper binding condition of [6], which filters out surface structures in which a trace linearly precedes its antecedent. If the antecedent of a trace left behind by a particular movement step is defined to be the element (trace or otherwise) in the target position of that movement, the present modification to the rules merge and move exactly implements the PBC in the minimalist grammar framework.
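The key mechanical point, that a landed mover's frozen daughters become visible, can be sketched directly; encodings and names below are mine:

```python
# The head is a (string, features) pair; each mover is a (string, features,
# subtrees) triple, and only mover roots are visible to movement.

def pbc_move1(expr):
    (s0, f0), movers = expr[0], list(expr[1:])
    c = '-' + f0[0][1:]                       # f0[0] is the +c being checked
    i = next(j for j, (_, f, _) in enumerate(movers) if f == [c])
    s, _, kids = movers.pop(i)
    # the landed mover's frozen subtrees become visible movers in its place
    return [(s + ' ' + s0, f0[1:])] + movers[:i] + list(kids) + movers[i:]

# e.g. a mover that itself carries a frozen -A subtree:
e = [('e', ['+A', 'y']),
     ('a', ['-A'], [('a f', ['-A'], [])])]
assert pbc_move1(e) == [('a e', ['y']), ('a f', ['-A'], [])]
```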
We will continue to require that these rules satisfy (a version of) the SMC.3 Following [12], we define the SMC over PBC-move as follows:

no Tj = ⟨⟨σj, γj⟩, T1, . . . , Tn⟩ is such that γj = -c γj′ unless j = i (PBC-SMC)

The string language generated in PBC-mode at a category c is defined as usual: L^PBC_c(G) := {σ : ∃d ∈ der(G). ⟨σ, c⟩ ∈ PBC-Eval(d)}.

Observe that the rule PBC-merge3 introduces new tree structure, temporarily freezing the moving pieces within its second argument. The rules PBC-move1 and PBC-move2 enforce that only the root of a tree is accessible to movement operations, and that its daughter subtrees become accessible to movement only once the root has finished moving. Note also that the set of well-formed derivation trees in PBC-mode (the set dom(PBC-Eval)) is not a regular tree language (this is due to the laxness of the PBC-SMC). To see this, consider the MG G1 = ⟨Lex, {x, y}, {A}⟩, where Lex contains the four lexical items below.

a::=x x -A
f::x
c::=y +A y
e::=x y

Derivations of complete expressions of category y begin by starting with f, and repeatedly merging tokens of a. Then e is merged, and for each a, a c is merged, and a move step occurs. In particular, although the yields of these trees form the context-free language c^n e a^n f, the number of mrg3 nodes must be equal to the number of mv1 nodes, as the sketch below illustrates. It is straightforward to show that no finite-state tree automaton can enforce this invariant.

Our main result is that minimalist grammars under the PBC mode of derivation (i.e. using the rules just given above) generate exactly the class of context-free languages.
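The following is one derivation shape consistent with the text's description (the tuple encoding, and exactly which merge steps I take to instantiate mrg3, are my assumptions); it exhibits the equal-counts invariant:

```python
# Derivation terms for G1 as tuples; n >= 1 tokens of 'a'.

def g1_derivation(n):
    d = ('mrg1', ('a',), ('f',))              # first a selects f directly
    for _ in range(n - 1):
        d = ('mrg3', ('a',), d)               # later a's freeze an -A mover
    d = ('mrg3', ('e',), d)                   # e freezes the last -A mover
    for _ in range(n):
        d = ('mv1', ('mrg1', ('c',), d))      # each c licenses one -A move
    return d

def count(d, label):
    return (d[0] == label) + sum(count(c, label)
                                 for c in d[1:] if isinstance(c, tuple))

d = g1_derivation(4)
assert count(d, 'mrg3') == count(d, 'mv1') == 4   # equal counts, for any n
```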
4 MGs with Hypotheses
Because the elimination of remnant movement guarantees that, viewed from a bottom-up perspective, we will finish moving a containing expression before we need to deal with any of its subparts, we can re-represent expressions using ‘slash features’, as familiar from GPSG [8]. Accordingly, we replace (PBC-)merge3,
3 There are two natural interpretations of the SMC on expressions e = φ0, T1, . . . , Tn. First, one might require that no two φi and φj share the same first feature, regardless of how deeply embedded within trees they may be. This perspective views the tree structure as irrelevant for the statement of the SMC. Another reasonable option is to require only that no two φi and φj share the same first feature, where φi and φj are the roots of trees Ti and Tj respectively. This perspective views the tree structure of moving parts as relevant to the SMC, and allows for a kind of ‘smuggling’ [3], as described in [12]. The results of this paper are independent of which of these two interpretations of the SMC we adopt. We adopt the second, because it is more interesting (the derivation tree sets no longer constitute regular tree languages).
which introduces a to-be-moved expression, with a new (non-functional) operation, assume, which introduces a ‘slash feature’, or hypothesis. A hypothesis takes the form of a pair of feature strings ⟨δ, γ⟩. The interpretation of a hypothesis ⟨δ, γ⟩ is such that δ records the originally postulated ‘missing’ feature sequence (and thus is unchanging over the lifetime of the hypothesis), whereas γ represents the remaining features of the hypothesis, which are checked off as the derivation progresses. Move1, which re-integrates a moving part into the main expression, is replaced with another new operation, discharge. Discharge replaces ‘used up’ hypotheses with the expressions that they ‘could have been’. These expressions may themselves contain hypothesized moving pieces. Derivations of minimalist grammars with hypothetical reasoning in this sense are terms d ∈ Hyp-der(G) over the signature {mrg1(2), mrg2(2), assm(1), mv2(1), dschrg(2)} ∪ Lex, and Hyp-Eval partially maps such terms to expressions in the by now familiar manner. In the below, σ, τ are strings over Σ, γ, δ, ζ are finite sequences of syntactic features, and φi, ψj are pairs of the form ⟨δ, γ⟩.

merge1(⟨σ, =c γ⟩; ⟨τ, c⟩, ψ1, . . . , ψn) = ⟨σ τ, γ⟩, ψ1, . . . , ψn (for ⟨σ, =c γ⟩ ∈ Lex)

merge2(⟨σ, =c γ⟩, φ1, . . . , φm; ⟨τ, c⟩, ψ1, . . . , ψn) = ⟨τ σ, γ⟩, φ1, . . . , φm, ψ1, . . . , ψn

assume(⟨σ, =c γ⟩, φ1, . . . , φm) = ⟨σ, γ⟩, φ1, . . . , φm, ⟨c δ, δ⟩

discharge(⟨σ, +c γ⟩, φ1, . . . , φi−1, ⟨δ, -c⟩, φi+1, . . . , φm; ⟨τ, δ⟩, ψ1, . . . , ψn) = ⟨τ σ, γ⟩, φ1, . . . , φi−1, ψ1, . . . , ψn, φi+1, . . . , φm

move2(⟨σ, +c γ⟩, φ1, . . . , φi−1, ⟨ζ, -c δ⟩, φi+1, . . . , φm) = ⟨σ, γ⟩, φ1, . . . , φi−1, ⟨ζ, δ⟩, φi+1, . . . , φm

We subject the operations move2 and discharge to a version of the SMC:

no φj = ⟨ζj, γj⟩ is such that γj = -c γj′ unless j = i
(Hyp-SMC)
The language of a minimalist grammar G at category c using hypothetical reasoning is defined to be: L^Hyp_c(G) := {σ : ∃d ∈ Hyp-der(G). ⟨σ, c⟩ ∈ Hyp-Eval(d)}.

The operation discharge constrains the kinds of assumptions introduced by assume which can be part of a well-formed derivation to be those which are of the form ⟨cδ, δ⟩, where there is some lexical item ⟨σ, γcδ⟩. As there are finitely many lexical items, there are thus only finitely many useful assumptions given a particular lexicon. It will be implicitly assumed in the remainder of this paper that assume is restricted so as to generate only useful assumptions. We henceforth index assm nodes with the features of the hypotheses introduced (writing assmcγ for an assume operation introducing the hypothesis ⟨cγ, γ⟩).
Theorem 1. For any G, and any c ∈ selG, the set domc(Hyp-Eval) = {d : ∃σ. ⟨σ, c⟩ ∈ Hyp-Eval(d)} is a regular tree language.

Proof. Construct a nondeterministic bottom-up tree automaton whose states are (|lic| + 1)-tuples of pairs of suffixes of lexical feature sequences. The Hyp-SMC allows us to devote each component of such a sequence beyond the first to the (if it exists, unique) hypothesis beginning with a particular -c feature, and thus we assume to be given a fixed enumeration of lic. The remarks above guarantee that there are only a finite number of such states needed. Given an expression φ0, φ1, . . . , φn, the state representing it has as its ith component the pair ⟨ε, ε⟩ if there is no φj beginning with the ith -c feature, and the unique φj beginning with the ith -c feature otherwise. The 0th component of a state is always of the form ⟨ε, γ⟩, where γ is the feature sequence of φ0. As we are interested in derivations of complete expressions of category c, the final state is ⟨ε, c⟩, ⟨ε, ε⟩, . . . , ⟨ε, ε⟩. The transitions of the automaton are defined so as to preserve this invariant: at a lexical item ℓ = ⟨σ, γ⟩, the automaton enters the state ⟨ε, γ⟩, ⟨ε, ε⟩, . . . , ⟨ε, ε⟩, and at an internal node σ(n)(q1, . . . , qn), the automaton enters the state q just in case there are expressions e1, . . . , en represented by states q1, . . . , qn which are mapped by the operation denoted by σ to an expression e represented by state q.

We use the facts that linear homomorphisms preserve recognizability and that the yield of a recognizable set of trees is context-free [4] in conjunction with theorem 1 to show that minimalist grammars using hypothetical reasoning define exactly the context-free languages.

Theorem 2. For any G, and any c ∈ selG, L^Hyp_c(G) is context-free.

Proof. Let G and c be given. By theorem 1, D = domc(Hyp-Eval) is recognizable. Let E = f[D], where f is the homomorphism defined as follows (f maps nullary symbols to themselves):

f(σ(e1, . . . , en)) = σ(f(e2), f(e1)) if σ ∈ {mrg2, dschrg}, and σ(f(e1), . . . , f(en)) otherwise

Inspection of f reveals that it is merely putting sister subtrees in the order in which they are pronounced (à la Hyp-Eval) and thus, for any d ∈ D, Hyp-Eval(d) contains ⟨σ, c⟩ iff yield(f(d)) = σ. As f is linear, E is recognizable, and thus yield(E) = L^Hyp_c(G) is context-free.
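The homomorphism f is a one-line tree transformation; here is a tiny sketch of it (tuple encoding mine), together with the leaf reading it is meant to linearize:

```python
# f swaps the daughters of mrg2 and dschrg nodes, so that reading the leaves
# of f(d) left to right gives the pronounced order of the derived string.

def f(d):
    if d[0] in ('mrg2', 'dschrg'):
        return (d[0], f(d[2]), f(d[1]))
    return (d[0],) + tuple(f(child) for child in d[1:])

def leaves(d):
    return [d[0]] if len(d) == 1 else [w for c in d[1:] for w in leaves(c)]

# purely as a tree transformation: the second daughter is pronounced first
assert leaves(f(('mrg2', ('likes',), ('mary',)))) == ['mary', 'likes']
```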
5 Relating the PBC to Hypothetical Reasoning
To show that minimalist grammars in PBC mode are equivalent to minimalist grammars with hypothetical reasoning we will exhibit an Eval-preserving bijection between complete derivation trees of both formalisms.4 The gist of the 4
4 A complete derivation tree is just one which is the derivation of a complete expression. I will in the following use the term in conjunction with derivations in der(G) to refer exclusively to expressions derived in PBC-mode.
Fig. 2. Derivations in G1 of afcace: in PBC mode, mv1(mrg1(c, mv1(mrg1(c, mrg3(e, mrg3(a, mrg1(a, f))))))); with hypothetical reasoning, dschrg(mrg1(c, dschrg(mrg1(c, assmx-A(e)), assmx-A(a))), mrg1(a, f))
transformation is best provided via an example. Consider the trees in figure 2, which are derivations over the MG G1 in PBC mode and using hypothetical reasoning respectively of the string afcace. The dotted lines in the derivation trees in figure 2 indicate the implicit dependencies between the unary operations and other expressions in the derivation. For example, mv nodes are connected via a dotted line to the subtree which ‘moves’. Similarly, assmγ nodes are connected via a dotted line to the expression which ultimately discharges the assumption they introduced. I will call the subtree connected via a dotted line to a mv1 node the subtree targeted by that move operation; it is that subtree whose leftmost leaf introduces the -c feature checked by this move step, and which is the right child of a mrg3 node. Note that if the derivation is well-formed (i.e. is in the domain of PBC-Eval) there is a unique subtree targeted by every mv1 node. The right daughter of a dschrg node connected via a dotted line to an assmγ node is called the hypothesis discharged by that discharge operation, and is connected to the assmγ node which introduces the hypothesis which is discharged at its parent node. Again, if the derivation is well-formed, the assmγ node in question is unique. Note, however, that it is only in the case of complete derivation trees that to every assmγ node there corresponds the hypothesis-discharging discharge node. The major difference between PBC and Hyp MG derivations is that expressions entering into multiple feature checking relationships during the course of the derivation are introduced into the derivation at the point their first feature checking relationship takes place in the case of PBC (and MG derivations more generally), and at the point their last feature checking relationship obtains in the case of Hyp MGs. The relation between the two trees in figure 2, and more generally between PBC derivations and hypothetical derivations, is that the subtree connected via dotted line to mv1 becomes the second argument of dschrg, and the second argument of mrg3 becomes the subtree connected to assm via a dotted line. This is shown in figure 3.
mv1 ≈ dschrg        mrg3 ≈ assm
Fig. 3. Relating PBC derivations and hypothetical derivations
We define a relation Trans ⊂ der(G) × Hyp-der(G) in the following manner:

1. Trans(mv1(d), dschrg(d1, d2)), where, for d′ the (unique) subtree targeted by this instance of mv1, Trans(d, d1) and Trans(d′, d2)
2. Trans(mrg3(d1, d2), assmγ(d)), where PBC-Eval(d2) = {φ0, φ1, . . . , φn}, φ0 = ⟨σ, γ⟩, and Trans(d1, d)
3. Trans(σ(d1, . . . , dn), σ(d′1, . . . , d′n)), where Trans(di, d′i), for all 1 ≤ i ≤ n

By inspection of the above case-wise definition, it is easy to see that

Theorem 3. Trans is a function.

The point of defining Trans as per the above is to use it to show that the structural 'equivalence' sketched in figure 3 preserves relevant aspects of weak generative capacity. Expressions denoted by derivations in both 'formalisms' have been represented here as sequences. However, only the type of the first element of such sequences (a pair ⟨σ, γ⟩ ∈ Σ* × F*) is identical across formalisms (the other elements are trees whose nodes are pairs of the same type in the PBC MGs, but are pairs of feature sequences in Hyp MGs). Accordingly, the relation I will show Trans to preserve is the identity of the first element of the yield of the source and target derivation trees.

Theorem 4. For d ∈ der(G) such that {φ0, T1, . . . , Tn} = PBC-Eval(d), if ⟨ψ0, ψ1, . . . , ψk⟩ ∈ Hyp-Eval(Trans(d)), then φ0 = ψ0, n = k, and for 1 ≤ i ≤ n, Ti = ⟨σi, γi, T^i_1, . . . , T^i_{m_i}⟩ and ψi = ⟨ζi, γi⟩.

Proof. By induction. For the base case, let d be a lexical item. PBC-Eval(d) and Hyp-Eval(d) are both equal to {d}, which, by case 3 of the definition of Trans, is equal to Hyp-Eval(Trans(d)). Now let d1 and d2 be appropriately related to Trans(d1) and Trans(d2) respectively. There are five cases to consider (mrgi, mvj, for 1 ≤ i ≤ 3 and 1 ≤ j ≤ 2).

1. Let d = mrg1(d1, d2). Then by case 3 of the definition of Trans, Trans(d) = mrg1(Trans(d1), Trans(d2)). PBC-Eval(d) is defined if and only if both PBC-Eval(d1) = {⟨σ, =c γ⟩} and PBC-Eval(d2) = {⟨τ, c⟩, T1, . . . , Tn}, in which case it is {⟨στ, γ⟩, T1, . . . , Tn}. By the induction hypothesis, we conclude that Hyp-Eval(Trans(d1)) = {⟨σ, =c γ⟩} and Hyp-Eval(Trans(d2)) = {⟨τ, c⟩, ψ1, . . . , ψn}, which are in the domain of merge1, and thus, by inspection of the definition of this latter, that d and Trans(d) are appropriately related as well.
2. The case where d = mrg2(d1, d2) is not interestingly different from the above.
3. Let d = mrg3(d1, d2). Then by case 2 of the definition of Trans, Trans(d) = assmγ(Trans(d1)). As d1 and Trans(d1) are appropriately related (by the induction hypothesis), and as both merge3 and assumeγ define the first component of their result to be the same as the first component of their leftmost argument minus the first feature, d and Trans(d) are appropriately related as well.
4. Let d = mv1(d1), and let d2 be the unique subtree targeted by this instance of mv1. For PBC-Eval(d) to be defined, PBC-Eval(d1) must be equal to {⟨σ, +c γ⟩, S1, . . . , Si−1, ⟨τ, -c, T1, . . . , Tn⟩, Si+1, . . . , Sm}. In this case, PBC-Eval(d2) = {⟨τ, δ-c⟩, T1, . . . , Tn}. By the induction hypothesis, Hyp-Eval(Trans(d1)) = {⟨σ, +c γ⟩, φ1, . . . , φi−1, ⟨δ, -c⟩, φi+1, . . . , φm}, and in addition Hyp-Eval(Trans(d2)) = {⟨τ, δ-c⟩, ψ1, . . . , ψn}. Thus, we can see that the discharge operation is defined on these arguments, and is equal to {⟨τσ, γ⟩, φ1, . . . , φi−1, ψ1, . . . , ψn, φi+1, . . . , φm}. Applying the operation move1 to PBC-Eval(d1) we obtain the unit set consisting of the single element ⟨τσ, γ⟩, S1, . . . , Si−1, T1, . . . , Tn, Si+1, . . . , Sm, and thus establish that d is appropriately related to Trans(d).
5. Finally, let d = mv2(d1). By case 3, Trans(d) = mv2(Trans(d1)). If PBC-Eval(d) is defined, then there is a unique moving component Ti of PBC-Eval(d1) which has an appropriate first feature. By the induction hypothesis, there is a unique corresponding ψi in Hyp-Eval(Trans(d1)), allowing move2(Hyp-Eval(Trans(d1))) to be defined, and us to see that d and Trans(d) are appropriately related in this case too.

Note that whenever d ∈ der(G) is complete, so too is Trans(d) ∈ Hyp-der(G).

Corollary 1. Trans preserves completeness.

Furthermore, by inspecting the cases of the proof above, we see that the hypothesis introduced by a particular assmγ node which is the translation of a mrg3 node is discharged at the dschrg node which is the translation of the mv1 node which targets the right daughter of that mrg3 node. From theorem 4 follows the following

Corollary 2. For every G, and any feature sequence γ, L^PBC_γ(G) ⊆ L^Hyp_γ(G).
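For concreteness, a toy Python rendering (my own encoding, not the paper's) of the three Trans cases just used: each node is a triple (label, children, info), where info holds the targeted subtree for mv1 nodes (making the dotted lines of figure 2 explicit) and the feature sequence γ for mrg3 nodes, standing in for the PBC-Eval computation of the paper:

    def trans(node):
        label, children, info = node
        if label == "mv1":                 # case 1: mv1 -> dschrg
            (d,) = children
            targeted = info                # the subtree this move targets
            return ("dschrg", [trans(d), trans(targeted)], None)
        if label == "mrg3":                # case 2: mrg3 -> assm_gamma
            d1, _d2 = children             # d2 reappears at its dschrg node
            gamma = info
            return ("assm", [trans(d1)], gamma)
        # case 3: every other operation translates pointwise
        return (label, [trans(c) for c in children], info)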
To prove the inclusion in the reverse direction, I will show that for every complete d′ ∈ Hyp-der(G) there is a d ∈ der(G) such that Trans(d) = d′. I define a function snarT which takes a pair consisting of a derivation tree d ∈ Hyp-der(G) and a set M of pairs of strings over {0, 1}* and derivation trees in der(G). We interpret a pair ⟨p, d⟩ ∈ M as stating that we are to insert tree d as a daughter of the node at address p. (Recall that in translating from Hyp MG derivation trees to PBC trees we need to 'lower' expressions introduced at a dschrg node into the position in which they were assumed.) Given a set of such pairs M, I denote by ⁽ⁱ⁾M (for i ∈ {0, 1}) the set {⟨p, d⟩ : ⟨ip, d⟩ ∈ M}. I will use this notation to keep track of where the trees in M should be inserted into the translated structure. Basically, when an item ⟨ε, d⟩ ∈ M, it indicates that d should be inserted
as a daughter of the current root. I will use the notation M(ε) to denote the unique d such that ⟨ε, d⟩ ∈ M, if one exists.

1. for d = dschrg(d1, d2), and p the address in d of the assmγ node whose hypothesis is discharged at this dschrg node, we define snarT(d, M) = mv1(snarT(d1, ⁽⁰⁾(M ∪ {⟨p, snarT(d2, ⁽¹⁾M)⟩})))
2. for d = assmγ(d1), snarT(d, M) = mrg3(snarT(d1, ⁽⁰⁾M), M(ε))
3. snarT(σ(d1, . . . , dn), M) = σ(snarT(d1, ⁽⁰⁾M), . . . , snarT(dn, ⁽ⁿ⁻¹⁾M))

Note that, although snarT is not defined on all trees in Hyp-der(G) (case 2 is undefined whenever there is no (unique) ⟨ε, d⟩ ∈ M), it is defined on all complete d ∈ Hyp-der(G).

Theorem 5. For all complete d ∈ Hyp-der(G), snarT(d, ∅) ∈ der(G).

Proof. Case 2 is the only potential problem (as snarT(d, M) is undefined whenever M(ε) is). However, in a complete derivation tree, every assmγ node is dominated by a dschrg node, at which is discharged the hypothesis introduced by this former. Moreover, no dschrg node discharges the hypothesis of more than one assmγ node. Thus, we are guaranteed in a complete derivation tree that at each occurrence of an assmγ node M(ε) is defined. That the range of snarT is contained in der(G) is verified by simple inspection of its definition.

Of course, we want not just that snarT map derivations in Hyp-der(G) to ones in der(G), but also that a derivation d in der(G) to which a complete derivation d′ in Hyp-der(G) is mapped by snarT maps back to d′ via Trans. This will allow us to conclude the converse of corollary 2.

Theorem 6. For all complete d ∈ Hyp-der(G), d = Trans(snarT(d, ∅)).

Proof. In order to have a strong enough inductive hypothesis, we need to prove something stronger than what is stated in the theorem. Let d ∈ Hyp-der(G), and M be a partial function with domain {0, 1}* and range der(G), such that p is the address of an assmγ node in d without a corresponding dschrg node iff there is some d′ such that M(p) = d′. (In plain English, M tells us how to translate 'unbound' assmγ nodes in d.) Then d = Trans(snarT(d, M)). Note that the statement of the theorem is a special case, as for d complete there are no unbound assmγ nodes, and thus M can be ∅.

For the base case, let d be a lexical item (and thus complete). Then by case 3 of the definition of snarT, snarT(d, ∅) = d, and by case 3 of the definition of Trans, Trans(snarT(d, ∅)) = Trans(d) = d. Now let d1, d2 be as per the above such that for appropriate M1, M2, Trans(snarT(d1, M1)) = d1 and Trans(snarT(d2, M2)) = d2. There are again five cases to consider.

1. Let d = mrg1(d1, d2), and M an assignment of trees in der(G) to unbound assmγ nodes in d. Then Trans(snarT(d, M)) is, by case 3 of the definition of snarT, Trans(mrg1(snarT(d1, ⁽⁰⁾M), snarT(d2, ⁽¹⁾M))). By
case 3 of the definition of Trans, this is easily seen to be identical to mrg1(Trans(snarT(d1, ⁽⁰⁾M)), Trans(snarT(d2, ⁽¹⁾M))). As, for any i, ⁽ⁱ⁾M is an assignment of trees to unbound assmγ nodes in di, the inductive hypothesis applies, and thus Trans(snarT(d, M)) = d, as desired.
2. The case where d = mrg2(d1, d2) is not interestingly different from the above.
3. Let d = assmγ(d1), and let M assign trees to its unbound assmγ nodes (in particular, M(ε) is defined). Then by case 2 of the definition of snarT, Trans(snarT(d, M)) = Trans(mrg3(snarT(d1, ⁽⁰⁾M), M(ε))). Now, according to case 2 of the definition of Trans, this is seen to be identical to assmγ(Trans(snarT(d1, ⁽⁰⁾M))), which according to our inductive hypothesis is simply assmγ(d1) = d.
4. Let d = dschrg(d1, d2), and let M assign trees in der(G) to all and only unbound assmγ nodes in d. By case 1 of the definition of snarT, we have that Trans(snarT(d, M)) is equal to Trans(mv1(snarT(d1, ⁽⁰⁾(M ∪ {⟨0p, snarT(d2, ⁽¹⁾M)⟩})))), where 0p is the address of the assmγ node in d bound by the dschrg node at its root. Next, we apply the first case of the definition of Trans. This gives us dschrg(Trans(snarT(d1, ⁽⁰⁾(M ∪ {⟨0p, snarT(d2, ⁽¹⁾M)⟩}))), Trans(d′)), where d′ is the unique subtree targeted by the mv1 node at the root of the translated expression. This is the right daughter of the mrg3 node which the assmγ node at position p in d1 translates as, namely, snarT(d2, ⁽¹⁾M). As ⁽⁰⁾M assigns trees to all unbound assmγ nodes in d1 except for the one at location p, ⁽⁰⁾(M ∪ {⟨0p, snarT(d2, ⁽¹⁾M)⟩}) assigns trees to all of d1's unbound assmγ nodes. Therefore, the induction hypothesis applies, and Trans(snarT(d, M)) is seen to be identical to dschrg(d1, d2) = d.
5. Finally, let d = mv2(d1), and M an assignment of trees to unbound assmγ nodes in d. By case 3, Trans(snarT(d, M)) = mv2(Trans(snarT(d1, ⁽⁰⁾M))), which, by the induction hypothesis, is equal to mv2(d1) = d.
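The address bookkeeping in snarT can be sketched in the same toy encoding as the Trans sketch above (again my own rendering, not the paper's); strip plays the role of ⁽ⁱ⁾M, addresses are strings of digits, and a dschrg node's info slot is here assumed to hold the address p of the assm node it binds (which, lying inside the left daughter, starts with "0"):

    def strip(i, M):
        # the paper's (i)M: keep pairs whose address starts with i, shortened
        return {p[1:]: d for p, d in M.items() if p.startswith(str(i))}

    def snart(node, M):
        label, children, info = node
        if label == "dschrg":          # case 1: lower d2 to address p
            d1, d2 = children
            p = info
            return ("mv1",
                    [snart(d1, strip(0, M | {p: snart(d2, strip(1, M))}))],
                    None)
        if label == "assm":            # case 2: reinsert the stored tree;
            (d1,) = children           # M[""] is M(epsilon), and a KeyError
            return ("mrg3",            # here mirrors snarT's partiality
                    [snart(d1, strip(0, M)), M[""]], None)
        # case 3: pass the right slice of M to each daughter
        return (label,
                [snart(c, strip(i, M)) for i, c in enumerate(children)],
                info)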
Theorem 4 allowed us to conclude that for every d ∈ der(G) deriving a complete expression, there was a complete d′ ∈ Hyp-der(G) deriving the same complete expression (whence corollary 2). From theorem 6 we are able to conclude the reverse as well.

Corollary 3. For every G, and any feature sequence γ, L^PBC_γ(G) = L^Hyp_γ(G).
As Hyp MGs were shown in theorem 2 to be context-free, we conclude that MGs subject to the proper binding constraint are as well.
6 Conclusion
We have demonstrated that movement by itself is not enough to describe non-context-free languages; the super-CFness of the MG formalism is essentially tied
to remnant movement. This result confirms the intuition of several (cf. [15,21]), and seems related to the results of [18] in the context of the formalism of the GB theory presented therein. Stabler [21] conjectures that:

Grammars in MG can define languages with more than 2 counting dependencies only when some sentences in those languages are derived with remnant movements.

As we have shown here that MGs without remnant movement can only define context-free languages, we have proven Stabler's conjecture. However, we can in fact show a strengthened version of this conjecture to be true. Beyond the mere existence of remnant movement, where an item moves from which another has already been moved, we can identify hierarchies of such movement, depending on whether the item moved out of the 'remnant mover' is itself a remnant, and if so, whether the item moved out of that item is a remnant, and so on. We could place an upper bound of k on the so-defined degree of remnant movement we were willing to allow by, using the tree-structured representation of moving subpieces from our definition of PBC-MGs, allowing the move operations to target -c features of up to depth k in the tree. In this case, however, we could simply enrich the complexity of our hypotheses in the corresponding Hyp-MGs by a finite amount, which would not change their generative capacity. Thus, in order to derive non-context-free languages, MGs must allow for movement of remnants of remnants of remnants...; in other words, an MG can define languages with more than two counting dependencies only when there is no bound k such that every sentence in the language is assigned a structure with remnant degree less than k.

Given that MGs can analyze non-CF patterns only in terms of unbounded remnant movement, one question these results make accessible is: which such patterns in human languages are naturally so analyzed? Perhaps the most famous of the supra-CF constructions in natural language is given by the relation between embedded verb clusters and their arguments in Swiss German [19]. Koopman and Szabolcsi [14] have provided an elegant analysis of verbal clusters in Germanic and Hungarian using remnant movement.⁵ Patterns of copying in natural language [5,17,11], on the other hand, do not seem particularly naturally treated in terms of unbounded remnant movement. [11] shows how the addition of 'copy movement' (non-linear string manipulation operations) to the MG formalism allows for a natural treatment of these patterns, one that is orthogonal to the question of whether our grammars for natural language should use bounded or unbounded remnant movement.
⁵ Not every linguist working in this tradition agrees that verb clusters are best treated in terms of remnant movement. [9] argues that remnant movement approaches to verb clustering are inferior to one using head movement. Adding head movement to MGs without remnant movement allows the generation of non-context-free languages [16].
References

1. Chomsky, N.: Lectures on Government and Binding. Foris, Dordrecht (1981)
2. Chomsky, N.: The Minimalist Program. MIT Press, Cambridge (1995)
3. Collins, C.: A smuggling approach to the passive in English. Syntax 8(2), 81–120 (2005)
4. Comon, H., Dauchet, M., Gilleron, R., Jacquemard, F., Lugiez, D., Tison, S., Tommasi, M.: Tree automata techniques and applications (2002), http://www.grappa.univ-lille3.fr/tata
5. Culy, C.: The complexity of the vocabulary of Bambara. Linguistics and Philosophy 8(3), 345–352 (1985)
6. Fiengo, R.: On trace theory. Linguistic Inquiry 8(1), 35–61 (1977)
7. Gärtner, H.M., Michaelis, J.: Some remarks on locality conditions and minimalist grammars. In: Sauerland, U., Gärtner, H.M. (eds.) Interfaces + Recursion = Language?, Studies in Generative Grammar, vol. 89, pp. 161–195. Mouton de Gruyter, Berlin (2007)
8. Gazdar, G., Klein, E., Pullum, G., Sag, I.: Generalized Phrase Structure Grammar. Harvard University Press, Cambridge (1985)
9. Haider, H.: V-clustering and clause union - causes and effects. In: Seuren, P., Kempen, G. (eds.) Verb Constructions in German and Dutch, pp. 91–126. John Benjamins, Amsterdam (2003)
10. Harkema, H.: Parsing Minimalist Languages. Ph.D. thesis, University of California, Los Angeles (2001)
11. Kobele, G.M.: Generating Copies: An investigation into structural identity in language and grammar. Ph.D. thesis, University of California, Los Angeles (2006)
12. Kobele, G.M.: A formal foundation for A and A-bar movement in the minimalist program. In: Kracht, M., Penn, G., Stabler, E.P. (eds.) Mathematics of Language, vol. 10. UCLA (2007)
13. Kobele, G.M., Retoré, C., Salvati, S.: An automata theoretic approach to minimalism. In: Rogers, J., Kepser, S. (eds.) Proceedings of the Workshop Model-Theoretic Syntax at 10; ESSLLI 2007, Dublin (2007)
14. Koopman, H., Szabolcsi, A.: Verbal Complexes. MIT Press, Cambridge (2000)
15. Michaelis, J.: On Formal Properties of Minimalist Grammars. Ph.D. thesis, Universität Potsdam (2001)
16. Michaelis, J.: Notes on the complexity of complex heads in a minimalist grammar. In: Proceedings of the Sixth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+6), Venezia (2002)
17. Michaelis, J., Kracht, M.: Semilinearity as a syntactic invariant. In: Retoré, C. (ed.) LACL 1996. LNCS (LNAI), vol. 1328, pp. 37–40. Springer, Heidelberg (1997)
18. Rogers, J.: A Descriptive Approach to Language-Theoretic Complexity. CSLI Publications, Stanford (1998)
19. Shieber, S.M.: Evidence against the context-freeness of natural language. Linguistics and Philosophy 8, 333–343 (1985)
20. Stabler, E.P.: Derivational minimalism. In: Retoré, C. (ed.) LACL 1996. LNCS (LNAI), vol. 1328, pp. 68–95. Springer, Heidelberg (1997)
21. Stabler, E.P.: Remnant movement and complexity. In: Bouma, G., Hinrichs, E., Kruijff, G.J.M., Oehrle, R. (eds.) Constraints and Resources in Natural Language Syntax and Semantics, ch. 16, pp. 299–326. CSLI Publications, Stanford (1999)
22. Stabler, E.P., Keenan, E.L.: Structural similarity within and among languages. Theoretical Computer Science 293, 345–363 (2003)
The Algebra of Lexical Semantics

András Kornai

Institute for Quantitative Social Science, Harvard University, 1737 Cambridge St, Cambridge MA 02138
Hungarian Academy of Sciences, Computer and Automation Research Institute, 13-17 Kende u, H-1111 Budapest
[email protected] http://kornai.com
Abstract. The current generative theory of the lexicon relies primarily on tools from formal language theory and mathematical logic. Here we describe how a different formal apparatus, taken from algebra and automata theory, resolves many of the known problems with the generative lexicon. We develop a finite state theory of word meaning based on machines in the sense of Eilenberg [11], a formalism capable of describing discrepancies between syntactic type (lexical category) and semantic type (number of arguments). This mechanism is compared both to the standard linguistic approaches and to the formalisms developed in AI/KR.
1 Problem Statement
In developing a formal theory of lexicography our starting point will be the informal practice of lexicography, rather than the more immediately related formal theories of Artificial Intelligence (AI) and Knowledge Representation (KR). Lexicography is a relatively mature field, with centuries of work experience and thousands of eminently usable work products in the form of both mono- and multilingual dictionaries. In contrast to this, KR is a rather immature field, with only a few decades of work experience, and few, if any, usable products. In fact, our work continues the trend toward more formalized lexicon-building that started around the Longman Dictionary (Boguraev and Briscoe [6]) and the Collins-COBUILD dictionary (Fillmore and Atkins [14]), but takes it further in that our focus is with the mathematical foundations rather than the domain-specific algorithms. An entry in a standard monolingual dictionary will have several components, such as the etymology of the word in question; part of speech/grammatical category information; pronunciation guidelines in the form of phonetic/phonological transcription; paradigmatic forms, especially if irregular; stylistic guidance and examples; a definition, or several, for different senses of the word; and perhaps even a picture, particularly for plants, animals, and artifacts. It is evident from
the typeset page that the bulk of the information is in the definitions, and this is easily verified by estimating the number of bits required to encode the various components. Also, definitions are the only truly obligatory component, because a definition will be needed even for words lacking in exceptional forms (these are the majority) or an interesting etymology, with a neutral stylistic value, predictable part of speech (most words are nouns), and an orthography sufficiently indicative of pronunciation.

There is little doubt that definitions are central to the description of words, yet we have far richer and better formalized theories of etymology, grammatical category, morphological structure, and phonological transcription than we have theories of word meaning. Of necessity, work such as Dowty [8] concentrates on elucidating the semantic analysis of those terms for which the logic has the resources: since Montague's intensional logic IL includes a time parameter, in-depth analysis of temporal markers (tense, aspect, time adverbials) becomes possible. But as long as the logic lacks analogous resources for space, kinship terms, sensory inputs, or obligations, this approach has no traction, and heaping all these issues on top of what was already a computationally intractable logic calculus has not proven fruitful.

First Order Logic (FOL) is a continental divide in this regard. From a mathematical perspective, FOL is a small system, considering that the language of set theory requires only one binary relation, ∈, and it is evident both from the Peano and the ZF axioms that you will need all well-formed formulas (or at least the fragment that has no atomic sentence lying in the scope of more than three quantifiers, see Tarski and Givant [41]) to do arithmetic. Therefore, those who believe that mathematics is but a small, clean, well-organized segment of natural language will search for the appropriate semantics somewhere upwards of FOL – this is the Montague Grammar (MG) tradition, where higher order intensional logic is viewed as essential. There is already significant work in trying to restrict the power of the Turing-complete higher order intensional apparatus to FOL (Blackburn and Bos [5]) and here we take this further, moving to formalisms that fall at the low end of the complexity scale, well below FOL. At that point, much of what mathematical logic offers is not applicable, and methods of algebra have more traction, as will be discussed in Section 2 in more detail.

It is widely accepted that "people who put knowledge into computers need mathematical logic, including quantifiers, as much as engineers need calculus" (McCarthy [32]) but we claim that these tools are neither available in natural language (as noted repeatedly by the inventors of modern mathematical logic from Frege and Russell to Tarski) nor are they required for the analysis of natural language text – in the MG-style analysis it is the needs of the computer programmer that are being catered to at the expense of modeling the actual cognitive capabilities of the native speaker. This is not to say that such needs, especially for the engineer building knowledge-based systems, are not real, but our thesis is that the formalism appropriate for natural language semantics is too weak to supply this, being capable of natively supporting only a far weaker form of analogical reasoning, discussed in Section 4.
In this paper we offer a formal theory of lexical definitions. A word that is to be defined will be given in italics; its definition will use for the most part unary atoms, given in typewriter font, and to a lesser extent binary atoms, given in small caps; its phonological representation (which we will also call its printname) will be marked by underscoring. Aside from the fancy typography, this is very much in keeping with linguistic tradition where a sign is conceived of as an ordered pair of meaning and form. (The typographical distinctions will pay off only in making the formal parts easier to parse visually – in running text, we will also use italics for emphasis and for the introduction of technical terms.) While we will have little to say about pronunciation, paradigmatic forms, style, or etymology here, the fact that these are important to the practice of lexicography is always kept in mind, and we will make an effort to indicate, however programmatically, how these are to be subsumed under the overall theory presented here.

Given the widely accepted role of the lexicon in grammatical theory as the storage place of last resort, containing all that is idiosyncratic, arbitrary, and language-particular, the question must be asked: why should anyone want to dive in this trashcan? First, we need to see clearly that the lexicon is not trash, but rather it is the essential fuel of all communicative effort. As anyone trying to communicate in a language they mastered only at a tourist level will know, lack of crisp grammar is rarely a huge barrier to understanding. If you can produce the words, native speakers will generally be forgiving if the conjugation is shaky or the proper auxiliary is missing. But if you don't have the words for beef stew or watch repairman, knowing that the analytic present perfect combines stage-level and individual-level predication and thus gives rise to an inchoative meaning will get you nowhere.

A more rigorous estimate of the information content of sentences confirms our everyday experience. The word entropy of natural language is about 12-16 bits/word (see Kornai [26]:7.1 for how this depends on the language in question). The number of binary parse trees over n nodes is C_n ∼ 4^n/(√π · n^1.5), or less than 2 bits per word. Aronoff [4] describes in some detail how the Masoretes used only 2 bits (four levels of symbols) to provide a binary parse tree for nearly every Biblical verse – what we learned of coding since would now enable us to create an equally sparse system that is sufficiently detailed to cover every possible branching structure with slightly less than two bits on the average. Definitions of logical structure other than by parse tree are possible, but they do not alter the picture significantly: logical structure accounts for no more than 12-16% of the information conveyed by a sentence, a number that actually goes down with increased sentence length.
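A quick back-of-the-envelope check of the parse-tree figure (my own computation, not from the text): log2(C_n)/n stays below 2 bits per word and approaches it from below as sentences get longer:

    from math import comb, log2

    for n in (10, 100, 1000):
        catalan = comb(2 * n, n) // (n + 1)  # the Catalan number C_n
        print(n, round(log2(catalan) / n, 2))
    # 10 1.4    100 1.89    1000 1.98 -- under the 2 bits/word ceiling,
    # versus the 12-16 bits/word total word entropy cited above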
Another equally important reason why we need to develop a formal theory of word meaning is that without such a theory it is impossible to treat logical arguments like God cannot create a mountain without creating a valley which are based on the meaning of the predicates rather than on the meaning of the logical connectives. Why is this argument correct, even if we assume an omnipotent God? Because mountain means something like land higher than surrounding land, so for there to be a mountain there needs to be a lower reference land; if there were no such reference 'valley', the purported mountain wouldn't actually be a mountain. For St. Thomas Aquinas the argument serves to demonstrate that even God is bound by the laws of logic, and for us it serves as a reminder that the entire Western philosophical tradition from Aristotle to the Schoolmen considered word meaning an essential part of logic. We should add here that the same is true of the Eastern tradition, starting with Confucius' theory of cheng ming (rectification of names) – for example, one who rules by force, rather than by the decree of heaven, is a tyrant, not a king (see Graham [16]:29).

Modern mathematical logic, starting with De Morgan, could succeed in identifying a formal framework that can serve as a foundation of mathematics without taking the meaning of the basic elements into account because mathematical content differs from natural language content precisely in being lodged in the axioms entirely. However, for machine understanding of natural language text, lacking a proper theory of the meaning of words is far more of a bottleneck than the lack of compositional semantics, as McCarthy [31], and the closely related work on naive physics (Hayes [18]), already made clear.

What does a theory of the lexicon have to provide? First, adequate support for the traditional lexicographic tasks such as distinguishing word senses, deciding whether two words/senses are synonymous or perhaps antonymous, whether one expression can be said to be a paraphrase of another, etc. Second, it needs to connect to a theory of the meaning of larger (non-lexicalized) constructions including, but not necessarily limited to, sentential syntax and semantics. Third, it should provide a means of linking up meanings across languages, serving as a translation pivot. Fourth, it should be coupled to some theory of inference that enables, at the very least, common sense reasoning about objects, people, and natural phenomena. Finally, the theory should offer learning algorithms whereby the representation of meanings can be acquired by the language learner.

In this paper we disown the problem of learning, how an English-speaking child associates water with the sensory input (see Keller [21]), as it belongs more in cognitive science and experimental psychology than in mathematical linguistics, and the problem of pattern recognition: when is a person fat? It is possible to define this as the outcome of some physical measurements such as the Body Mass Index, but we will argue at some length that this is quite misguided. This is not to say that there is no learning problem or pattern recognition problem, but before we can get to these we first need a theory of what to learn and recognize.

This is not the place to survey the history of lexical semantics, and we confine ourselves to numerical estimates of coverage on the core vocabulary. The large body of analytic work on function words such as connectives, modals, temporals, numerals, and quantifiers covers less than 5% of core vocabulary, where 90% are content words. Erring on the side of optimism and assuming that categories of space, case in particular, can be treated similarly, would bring this number up to 6%, but not further, since the remaining large classes of function words, in particular gender and class markers, are clearly non-logical. Another large body of research approaches natural kinds by means of species and genera. But in
spite of its venerable roots, starting with Aristotle's work on eidopoios diaphora, and its current popularity, including WordNet, EuroWordNet, and AsiaWordNet on the one hand and Semantic Web description logic (OWL) on the other, this method covers less than 10% of core vocabulary. This is still a big step forward in that it is imposing a formal theory on some content words, by means of a technique, default inheritance along is a links, that is missing from standard logic, including the high-powered modal intensional logics commonly used in sentential semantics. Perhaps surprisingly, the modern work on verb classification including Gruber [17], Dowty [9], Levin [29], FrameNet (Fillmore [12]), and VerbNet (Kipper et al [24]) has far broader scope, covering about 25% of core vocabulary. Taking all these together, and assuming rather generously that all formal problems concerning these systems have been resolved, this is considerably less than half of the core vocabulary, and when it comes to the operations on these elements, all the classical and modern work on the semantics associated with morphological operations (Pāṇini, Jakobson, Kiparsky) covers numerically no more than 5-10% of the core operations.

That the pickings of the formal theory are rather slim is especially clear if we compare its coverage to that of the less formally stated, but often strikingly insightful work in linguistic semantics, in particular to the work of Wierzbicka, Lakoff, Fauconnier, Langacker, Talmy, Jackendoff, and others often broadly grouped together as 'cognitively inspired'. We believe that part of the reason why the formal theory has so little traction is that it aims too high, largely in response to the well-articulated needs of AI and KR.
2 The Basic Elements
In creating a formal model of the lexicon the key difficulty is the circularity of traditional dictionary definitions – the first English dictionary, Cawdrey [7], already defines heathen as gentile and gentile as heathen. The problem was already noted by Leibniz (quoted in Wierzbicka [45]):

Suppose I make you a gift of a large sum of money saying you can collect it from Titius; Titius sends you to Caius; and Caius, to Maevius; if you continue to be sent like this from one person to another you will never receive anything.

One way out of this problem is to come up with a small list of primitives, and define everything else in terms of these. There are many efforts in this direction (the early history of the subject is discussed in depth in Eco [10]) but the modern efforts begin with Ogden's [35] Basic English. The KR tradition begins with the list of primitives introduced by Schank [40], and a more linguistically inspired list is developed by Wierzbicka and the NSM school. But it is not at all clear how Schank or Wierzbicka would set about defining new words based on their lists (the reader familiar with their systems should try to apply them to any term that is not on their lists such as liver).
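Viewed as a directed graph from each headword to the words of its definiens, Cawdrey's heathen/gentile pair is literally a cycle, and Leibniz's regress becomes mechanically detectable; a minimal sketch (mine, over a two-entry toy dictionary):

    defs = {"heathen": ["gentile"], "gentile": ["heathen"]}

    def circular(word, seen=()):
        """True if chasing definitions from word revisits a word."""
        if word in seen:
            return True
        return any(circular(w, seen + (word,)) for w in defs.get(word, []))

    print(circular("heathen"))  # True: the chain never bottoms out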
As a result, in cognitive science many have practically given up on meaning decomposition as hopeless. For example, Mitchell et al. [33] distinguish words from one another by measuring correlation with their core words in the Google 5-gram data. Such correlations certainly do not constitute a semantic representation in the deductive sense we are interested in, but it requires no artful analysis, indeed, it requires no human labor at all, to come up with numerical values for any new word.

Here we sketch a more systematic approach that exploits preexisting lexicographic work, in particular dictionary definitions that are already restricted to a smaller wordlist such as the Longman Defining Vocabulary (LDV) or Ogden's Basic English (BE). These already have the proven capability to define all other words in the Longman Dictionary of Contemporary English (LDOCE) or the Simple English wikipedia at least for human readers, though not necessarily in sufficient detail and precision for reasoning by a machine. Any defining vocabulary D subdivides the problem of defining the meaning of (English) words in two. First, the definition of other vocabulary elements in terms of D, which is our focus of interest, and second, defining D itself, based perhaps on primary (sensory) data or perhaps on some deeper scientific understanding of the primitives. A complete solution to the dictionary definition problem must go beyond a mere listing D of the defining vocabulary elements: we need both a formal model of each element and a specification of lexical syntax, which regulates how elements of D combine with each other (and possibly with other, already defined, elements) in the definition of new words.

We emphasize that our goal is to provide an algebra of lexicography rather than a generative lexicon (Flickinger [15], Pustejovsky [36]) of the sort familiar from generative morphology. A purely generative approach would start from some primitives and some rules or constraints which, when applied recursively, provide an algorithm that enumerates the lexicon. The algebraic approach is more modest in that it largely leaves open the actual contents of the lexicon. Consider the semantics of noun-noun compounds. As Kiparsky [22] notes, ropeladder is 'ladder made of rope'; manslaughter is 'slaughter undergone by man'; and testtube is 'tube used for test', so the overall semantics can only specify that N1 N2 is 'N2 that is V-ed by N1', i.e. the decomposition is subdirect (yields a superset of the target) rather than direct, as it would be in a fully compositional generative system.

Another difference between the generative and the algebraic approach is that only the former implies commitment to a specific set of primitives. To the extent that work on lexical semantics often gets bogged down in a quest for the ultimate primitives, this point is worth a small illustrative example.

Table 1. Multiplication in Z3

        e   a   b
    e   e   a   b
    a   a   b   e
    b   b   e   a
Consider the cyclic group Z3 on three points given by the elements e, a, b and the preceding multiplication table. The unit element e is unique (being the one and only y satisfying yx = xy = x for all x) but not necessarily irreducible in that if a and b are given, both ab and ba could be used to define it. Furthermore, if a is given, there is no need for b in that aa already defines this element, so the group can be presented simply as a, aa, aaa = e, i.e. a is the 'generator' and a³ = e is the 'defining relation' (as these terms are used in group theory). Note, however, that the exact same group is equally well presented by using b as the generator and b³ = e as the defining relation – there is no unique/distinguished primitive as such. This non-uniqueness is worth keeping in mind when we discuss possible defining vocabularies. In algebra, similar examples abound: for example in a linear space any basis is just as good as any other to define all vectors in the space.
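A small computational check of this non-uniqueness (my own sketch): closing either a or b under the multiplication table recovers all of Z3, so neither generator is privileged:

    mult = {("e", "e"): "e", ("e", "a"): "a", ("e", "b"): "b",
            ("a", "e"): "a", ("a", "a"): "b", ("a", "b"): "e",
            ("b", "e"): "b", ("b", "a"): "e", ("b", "b"): "a"}

    def generated(g):
        """Close {g} under the group multiplication."""
        span = {g}
        while True:
            new = {mult[(x, y)] for x in span for y in span} - span
            if not new:
                return span
            span |= new

    print(generated("a") == generated("b") == {"e", "a", "b"})  # True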
For a lexical example, consider the Hungarian verbal stem toj and the derived tojó 'hen', tojás 'egg', and tojni 'to lay egg'. It is evident that eggs are what hens lay, hens are what lay eggs, and laying of eggs is what hens do. In Hungarian, the interdependence of the definitions is made clear by the fact that all three forms are derived from the same stem by productive processes: -ó is a noun-forming deverbal suffix denoting the agent, -ás denotes the action or the result, and -ni is the infinitival suffix. But the same arbitrariness in the choice of primitives can be just as evident in less transparent examples, where the common stem is lacking: for example in English hen and egg it is quite unclear which one is logically prior. Consider prison 'place where inmates are kept by guards', guard 'person who keeps inmates in prison', and inmate 'person who is kept in prison by guards'. One could easily imagine a language where prison guards are called keepers, inmates keepees, and the prison itself a keep. The mere fact that in English the semantic relationship is not signaled by the morphology does not mean that it's not there – to the contrary, we consider it an accident of history, beyond the reach of explanatory theory, that the current nominal sense of keep, 'fortress', is fortified place to keep the enemy out rather than to keep prisoners in.

What is, then, a reasonable defining vocabulary D? We propose to define one from the outside in, by analyzing the LDV or BE rather than building from the inside out from the putative core lists of Schank or Wierzbicka. This method guarantees that at any given point of reducing D to some smaller D' we remain capable of defining all other words, not just those listed in LDOCE (some 90k items) or the Simple English wikipedia (over 30k entries) but also those that are definable in terms of these larger lists (really, the entire unabridged vocabulary of English). In the computational work that fuels the theoretical analysis presented here we begin with our own version of the LDV, called 4lang, which includes Latin, Hungarian, and Polish translations in the intended senses, both because we do not wish to lose sight of the longer term goal of translation and as a clear means of disambiguation for concepts whose common semantic root, if there ever was one, is no longer transparent, e.g. interest 'usura' v. interest 'studium'. Clearly, a similarly disambiguated version of the BE vocabulary, or any other reasonable starting point, could just as well be used.

We perform the analysis of the starting D in several chunks, many corresponding to what old-fashioned lexicographers would call a semantic field (Trier [42]): conceptually related terms that are likely candidates to be defined in terms of one another, such as color terms, legal terms, and so on. We will not attempt to define the notion of semantic fields in a rigorous fashion, but use an operational definition based on Roget's Thesaurus. For example, for color terms we take about 30 stanzas from Roget 420 Light to Roget 449 Disappearance (numbering follows the 1911 edition of Roget's as this is available as Project Gutenberg etext #10681), and for religious terms we take 25 stanzas, Roget 976 Deity to Roget 1000 Temple. Since the chunking is purely pragmatic, we need not worry about the issues that plague semantic fields: for our purposes it matters but little where the limits of each field are, whether the resulting collections of words and concepts are properly named, or whether some kind of hierarchy can or should be imposed on them – all that matters is that each form a reasonable unit of workable size, perhaps a few dozen to a few hundred stanzas.

We will mostly use the Religion field to illustrate our approach, not because we see it as somehow privileged but rather because it serves as a strong reminder of the inadequacy of the physicalist approach. In discussing color, we may be tempted to dispense with a defining vocabulary D in favor of a more scientifically defined core vocabulary, but in general such core expressions, if truly restricted to measurable qualia, have very limited traction over much of human social activity. The main fields defined through Roget are size R031 – R040a and R192 – R223; econ R775 – R819; emotion/attitude R820 – R936 except R845-852 and R922-927; esthetics R845 – R852; law/morals R937 – R975 plus R922 – 927. In this process, about a quarter of the LDV remains unaffiliated. For Religion we obtain the list anoint, believe, bless, buddhism, buddhist, call, ceremony, charm, christian, christianity, christmas, church, clerk, collect, consecrated, cross, cure, devil, dip, doubt, duty, elder, elect, entrance, fairy, faith, faithful, familiar, fast, father, feast, fold, form, glory, god, goddess, grace, heaven, hinduism, holy, host, humble, jew, kneel, lay, lord, magic, magician, mass, minister, mosque, move, office, people, praise, pray, prayer, preserve, priest, pure, religion, religious, reverence, revile, rod, save, see, service, shade, shadow, solemn, sound, spell, spirit, sprinkle, temple, translate, unity, word, worship. (Entries are lowercased for ease of automated stemming etc.)

Two problems are evident from such a list. First, there are several words that do not fully belong in the semantic field, in that the sense presented in Roget's is different from the sense in the LDV: for example port is not a color term and father is not a religious term in the primary sense used in the LDV. Such words are manually removed, since defining the religious sense of father or the color sense of port would in no way advance the cause of reducing the size of D. Programmatic removal is not feasible at this stage: to see what the senses are, and thus to see that the core sense is not the one used in the field, would require a working theory of lexical semantics of the sort we are developing here. Once such
a theory is at hand, we may use it to verify the manual work performed early on, but this is only a form of error checking, rather than learning something new about the domain. Needless to say, father still needs to be defined or declared a primitive, but the place to do this is among kinship terms, not religious terms. If a word is kept, this does not mean that it is unavailable outside the semantic field: clearly Bob worships the ground Alice walks on does not mean anything religious. However, for words inside the field such as worship even usage external to the field relies on the field-internal metaphor, so the core/defining sense of the word is the one inside. Conversely, if usage does not require the field-internal metaphor, the word/sense need not be treated as part of the size reduction effort: for example, This book fathered a new genre does not mean (or imply) that the object will treat the subject with reverence, so father can be left out of the religion field. Ideally, with a full sense-tagged corpus one could see ways of making such decisions in an automated fashion, but in reality creating the corpus would require far more manual work than making the decisions manually.

Since the issue of different word senses will come up many times, some methodological remarks are in order. Kirsner [25] distinguishes two polarly opposed approaches. The polysemic approach aims at maximally distinguishing as many senses as appear distinct, e.g. bachelor1 'unmarried adult man', bachelor2 'fur seal without a mate', bachelor3 'knight serving under the banner of another knight', and bachelor4 'holder of a BA degree'. The monosemic approach (also called Saussurean and Columbia School approach by Kirsner, who calls the polysemic approach cognitive) searches for a single, general, abstract meaning, and would subsume at least the first three senses above in a single definition, 'unfulfilled in typical male role'. This is not the place to fully compare and contrast the two approaches (Kirsner's work offers an excellent starting point), but we note here a significant advantage of the monosemic approach, namely that it makes interesting predictions about novel usage, while the predictions of the polysemic approach border on the trivial. To stay with the example, it is possible to envision novel usage of bachelor to denote a contestant in a game who wins by default (because no opponent could be found in the same weight class or the opponent was a no-show). The polysemic theory would predict that not just seals but maybe also penguins without a mate may be termed bachelor: true, but not very revealing.

The choice between monosemic and polysemic analysis need not be made on a priori grounds: even the strictest adherent of the polysemic approach would grant that bachelor's degree refers, at least historically, to the same kind of apprenticeship as bachelor knight. Conversely, even the strictest adherent of the monosemic approach must admit that the relationship between 'obtaining a BA degree' and 'being unfulfilled in a male role' is no longer apparent to contemporary language learners. That said, we still give methodological priority to the monosemic approach because of the original Saussurean motivation: if a single form is used, the burden of proof is on those who wish to posit separate meanings (see Ruhl [39]). An important consequence of this methodological stance is
that we will rarely speak of metaphorical usage, assuming instead that the core meaning already extends to such cases.

A second problem, which has notable impact on the structure of the list, is the treatment of natural kinds. By natural kinds here we mean not just biologically defined kinds such as ox or yak, but also culturally defined artifact types like tuxedo or microscope – as a matter of fact the cultural definition has priority over the scientific definition when the two are in conflict. The biggest reason for the inclusion of natural kinds in the LDV is not conceptual structure but rather the eurocentric viewpoint of LDOCE: for the English speaker it is reasonable to define the yak as ox-like, but for a Tibetan defining the ox as yak-like would make more sense. There is nothing wrong with being eurocentric in a dictionary of an Indoeuropean language, but for our purposes neither of these terms can be truly treated as primitive.

So far we discussed the lexicon, the repository of linguistic knowledge about words. Here we must say a few words about the encyclopedia, the repository of world knowledge. While our goal is to create a formal theory of lexical definitions, it must be acknowledged that such definitions can often elude the grasp of the linguist and slide into a description of world knowledge of various sorts. Lexicographic practice acknowledges this fact by providing, somewhat begrudgingly, little pictures of flora, fauna, or plumbers' tools. A well-known method of avoiding the shame of publishing a picture of the yak is to make reference to Bos grunniens and thereby point the dictionary user explicitly to some encyclopedia where better information can be found. We will collect such pointers in a set E, and use curly braces to set them typographically apart from references to lexical content. When we say that light is defined as {flux of photons in the visible band}, what this really means is that light must be treated as a primitive. There is a physical theory of light which involves photons, a biophysical theory of visual perception that involves sensitivity of the retina to photons of specific wavelengths, but we are not interested in these theories, we are just offering a pointer to the person who is. From the linguistic standpoint light is a primitive, irreducible concept, one that people have used for millennia before the physical theory of electromagnetic radiation, or even the very notion of photons, was available.

Ultimately any system of definitions must be rooted in primitives, and we believe the notion light is a good candidate for such a primitive. From the standpoint of lexicography only two things need to be said: first, whether we intend to take the nominal or the verbal meaning as our primitive, and second, whether we believe that the primitive notion light is shared across the oppositions with dark and with heavy or whether we have two different senses of light. In this particular case, we choose the second solution, treating the polysemy as an accident of English rather than a sign of deep semantic relationship, but the issue must be confronted every time we designate an element as primitive. The issue of how to assign grammatical category (also called part of speech or POS) to the primitives will be discussed in Section 3, but we note here in advance that we keep the semantic part of the representation constant across verbs, their substantive forms, and their cognate objects.
The same point needs to be made in regard to ontological primitives like time. While it is true that the time used in the naive physics model is discrete and asynchronous, this is not intended as some hypothesis concerning the ultimate truth about physical time, which appears continuous (except possibly at a Planck scale) and appears distinct from space and matter (but is strongly intertwined with these). We take the appropriate method for deciding such matters to be physical experimentation and theory-making, and we certainly do not propose to find out the truth of the matter by reverse-engineering the lexica of natural languages. Since the model is not intended as a technical tool for the analysis of synchrony or continuous time, we do not wish to burden it with the kind of mechanisms, such as Petri nets or real numbers, that one would need to analyze such matters. Encyclopedic knowledge of time may of course include reference to the real numbers or other notions of continuous time, but our focus is not with a deep understanding of time as with tense marking in natural language, and it is the grammatical model, not the ontology, that carries the burden of recapitulating this.

For the sake of concreteness we will assume a Reichenbachian view, distinguishing four different notions of time: (i) speech time, when the utterance is spoken, (ii) perspective time, the vantage point of temporal deixis, (iii) reference time, the time that adverbs refer to, and (iv) event time, the time the named event unfolds. Typically, these are intervals, possibly open-ended, more rarely points (degenerate intervals), and the hope is that we can eventually express the temporal semantics of natural language in terms of interval relations such as 'event time precedes reference time' (see Allen [1], [2], Kiparsky [23]). The formal apparatus required for this is considerably weaker than that of FOL.

One important use of external pointers worth separate mention is for proper names. By sun we mean primarily the star nearest to us. The common noun usage is secondary, as is clear from the historical fact that people before Giordano Bruno didn't even know that the small points of light visible on the night sky were also suns. That we have a theory of the Sun as {the nearest star} where the, near, -est, and star are all members of the LDV is irrelevant from a lexicographic standpoint – what really matters is that there is a particular object, ultimately identified by deixis, that is a natural kind on its own right. The same goes for natural kinds such as oxygen or bacteria that may not even have a naive lexical theory (it is fair to say that all our knowledge about these belongs in chemistry and the life sciences) and about cultural kinds such as tennis, television, british, or october. In 3.3 we return to the issue of how to formalize those cases when purely lexical knowledge is associated with natural kinds, e.g. that tennis is a game played with a ball and rackets, that November follows October, or that bacteria are small living things that can cause disease, but we wish to emphasize at the outset that there is much in the encyclopedia that our formalism is not intended to cover, e.g. that the standard atomic weight of oxygen is 15.9994(3). Lest the reader feel that any reference to some external encyclopedia is tantamount to shirking of lexicographic duty, it is worth keeping in mind that natural and cultural kinds amount to less than 6% of the LDV.
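Returning to the interval relations mentioned above, a minimal sketch (mine; times as closed integer intervals is an assumption of the toy encoding) of how little machinery 'event time precedes reference time' requires:

    def precedes(i, j):
        """Interval i ends before interval j starts."""
        return i[1] < j[0]

    event, reference, speech = (1, 3), (4, 6), (7, 7)
    print(precedes(event, reference) and precedes(reference, speech))
    # True: an E < R < S pattern, i.e. roughly a past-perfect configuration

Note that nothing here goes beyond comparisons of endpoints, well below the expressive power of FOL.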
Returning to the field of religion, when we define Islam as religion centered on the teachings of {Mohamed}, the curly braces acknowledge the fact that Mohamed (and similarly Buddha, Moses, or Jesus Christ) will be indispensable in any effort aimed at defining Islam (Buddhism, Judaism, or Christianity). The same is true for Hinduism, which we may define as being centered on revealed teachings ({śruti}), but of course to obtain Hinduism as the definiendum the definiens must make it clear that it is not any old set of revealed teachings that are central to it but rather the Vedas and the Upanishads. One way or another, when we wish to define such concepts as specific religions, some reference to specific people and texts designated by proper names is unavoidable. Remarkably, once the names of major religious figures and the titles of sacred texts are treated as pointers to the encyclopedia, there remains nothing in the whole semantic field that is not definable in terms of non-religious primitives. In particular, god can be defined as being, supreme where supreme is simply about occupying the highest position in a hierarchy (being a being has various implications, see Section 3.1, but none of these are particularly religious). The same does not hold for the semantic field of color, where we find irreducible entries such as light.

Needless to say, our interest is not with exegesis (no doubt theologians could easily find fault with the particular definitions of god and the major religions offered here) but with the more mundane aspects of lexicography. Once we have buddhism, christianity, hinduism, islam, and judaism defined, buddhist, christian, hindu, muslim, and jew fall out as adherent of buddhism, ..., judaism for the noun denoting a person, and similarly for the adjectives buddhist, christian, hindu, islamic, jewish which get defined as of or about buddhism, ..., judaism. We are less concerned with the theological correctness of our definitions than with the proper choice of the base element: should we take the -ism as basic and the -ist as derived, should we proceed the other way round, or should we, perhaps, derive both (or, if the adjectival form is also admitted, all three) from a common root? Our general rule is to try to derive the morphologically complex from the morphologically simplex, but exceptions must be made, e.g. when we treat jew as derived (as if the word was *judaist). These are well handled by some principle of blocking (Aronoff [3]), which makes the non-derived jew act as the printname for *judaist.
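A minimal sketch of this blocking pattern (my own toy rule, not a claim about the system's actual implementation): the regular -ism to -ist derivation applies unless a listed printname overrides it:

    def printname(base):
        """Printname of the adherent noun derived from an -ism base."""
        listed = {"christianity": "christian", "hinduism": "hindu",
                  "islam": "muslim", "judaism": "jew"}  # blocked forms
        if base in listed:
            return listed[base]
        return base.removesuffix("ism") + "ist"  # buddhism -> buddhist

    for r in ("buddhism", "judaism"):
        print(printname(r), "= adherent of", r)
    # buddhist = adherent of buddhism
    # jew = adherent of judaism   (jew blocks the regular *judaist)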
(English my, your, ...), are expressed in many other languages by affixation. But polysemy can be present in affixes as well: for example, English and Latin have four affixes -an/anus, -ic/ius, -ical/icus, and -ly/tus where Hungarian and Polish have only one -i/anin, and we have to make sure that no ambiguity is created in the definitions by the use of polysemous affixes. Altogether, affixes and affix-like function words make up about 8–9% of the LDV, and the challenge they pose to the theory developed here is far more significant than that posed by natural kinds, in that their proper analysis involves very little, if any, reference to encyclopedic knowledge.

Finally, there is the issue of the economy afforded by primitive conceptual elements that have no clear exponent in the LDV. For example, we may decide that we feel sorrow when something bad happens to us, gloating when it happens to others, happiness when something good happens to us, and resentment when it happens to others. (The example is from Hobbs [19], and there is no claim here or in the original that these are the best or most adequate emotional responses. Even if we agree that they are not, this does not affect the following point, which is about the economy of the system rather than about morally correct behavior.) Given that good, bad, and happen are primitives we will need in many corners of the system, we may wish to rely on some sociological notion of in-group and out-group rather than on the pronouns us and them in formalizing the above definitions. This has the clear advantage of remaining applicable independent of the choice of in-group (be it family, tribe, nation, colleagues, etc.) and of indexical perspective (be it ours or theirs). Considerations of economy dictate that we use abstract elements as long as we can reduce the defining vocabulary D by more than one item: whether we prefer to use in-group, out-group or us, them as primitives is more a matter of taste than a substantive issue. If two solutions D and D' have the same size, we have no substantive reason to prefer one to the other. That said, for expository convenience we will still prefer non-technical to technical and Anglo-Saxon to latinate vocabulary in our choice of primitives.

To summarize what we have so far: for the sake of concreteness we identified a somewhat reduced version of the LDV, less than 2,000 items, including some bound morphemes and natural kinds, as our defining vocabulary D, but we make no claim that this is in any way superior to some other base list D' as long as D' is not bigger than D.
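Returning for a moment to the blocking principle invoked above for jew/*judaist: one minimal way to picture the mechanism is as a lookup that lets a lexically listed non-derived form override a regularly derived printname. The sketch below is our illustration, not the paper's mechanism, and the stem spellings are artificial.

    BLOCKED = {"judaist": "jew"}        # lexically listed exceptions

    def printname(stem, suffix):
        """Regular derivation, overridden by a listed non-derived synonym."""
        derived = stem + suffix
        return BLOCKED.get(derived, derived)

    assert printname("buddh", "ist") == "buddhist"   # regular derivation
    assert printname("juda", "ist") == "jew"         # blocked by the listed form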
3 The Formal Model
The key issue is not so much the membership of D as the mechanism that regulates how its elements are put together. Here we depart from the practice of the LDOCE, which uses natural language paraphrases, in favor of a fully formal theory. In 3.1 we introduce the elements of this theory, which we will call lexemes. In 3.2 we turn to the issue of how these elements are combined with one another. The semantics of the representations is discussed in 3.3. The formalism is introduced gradually, establishing the intuitive meaning of the various components before the fully formal definitions are given.
3.1 Lexemes
We will call the basic building blocks of our system lexemes because they offer a formal reconstruction of the informal notion of lexicographic lexemes. Lexemes are well modularized knowledge containers, ideally suited for describing our knowledge of words (as opposed to our encyclopedic knowledge of the world, which involves a great deal of non-linguistic knowledge such as motor skills or perceptual inputs for which we lack words entirely). Lexemes come in two main varieties: unary lexemes, which correspond to most nouns, adjectives, verbs, and content words in general (including most transitive and higher arity verbs as well), will be written in typewriter font, and binary lexemes, corresponding to adpositions, case markers, and other linkers, will be written in small caps. Ignoring the printnames, the base of unary lexemes consists of an unordered (conjunctive) list of properties, e.g. the dog is four-legged, animal, hairy, barks, bites, faithful, inferior; the fox is four-legged, animal, hairy, red, clever. Binary lexemes are to be found only among the function words: for example at(x,y) ‘x is at location y’, has(x,y) ‘x possesses y’, cause(x,y), etc. In what follows these will be written infix, which lets us do away with variables entirely. (Thus the notation already assumes that there are no true ditransitives, a position justified in more detail in Kornai [27].) Binary lexemes have two defining lists of properties, one list pertaining to their first (superordinate) argument and another to their second (subordinate) argument – these two are called the base of the lexeme.

We illustrate this on the predicate has, which could be the model for verbs such as owns, has, possesses, rules, etc. The differences between John has Rover and Rover has John are best seen in the implications (defaults) associated with the superordinate (possessor) and subordinate (possessed) slots: the former is assumed to be independent of the latter, the latter is assumed to be dependent on the former, the former controls the latter (and not the other way around), the former can end the possession relationship unilaterally, the latter cannot, etc. The list of definitional properties is thus partitioned in two: those that belong to the superordinate argument are collected in the head partition, those belonging to the subordinate argument are listed on the dependent partition.

The lexical entries in question may also include pointers to sensory data, biological, visual, or other extralinguistic knowledge about dogs and foxes. We assume some set E of external pointers (which may even be two-way in the sense that external sensory data may trigger access to lexical content) to handle these, but here E will not be used for any purpose other than delineating linguistic from non-linguistic concerns. How about the defining elements that we collected in D? These are no different: their definitions can refer to other lexemes that correspond to their essential properties. So definitions can invoke other definitions, but the circularity causes no foundational problems, as argued above. Following Quillian [37], semantic networks are generally defined in terms of some distinguished links: is a to encode facts such as dogs are animals, and attr to encode facts such as their being hairy. Here neither the genus nor the attribution relation is encoded explicitly. Rather, everything that appears on the distinguished (head) partition is attributed (or predicated) directly, and is a is
defined simply by containment of the essential properties. Elementary pieces of link-tracing logic, such as a is a b ∧ b is a c ⇒ a is a c or a is a b ∧ b has c ⇒ a has c, follow without any stipulation if we adopt this definition, but the system becomes more redundant: instead of listing only essential properties of dogs we need to list all the essential properties of the supercategories such as animals as well. Altogether, the use of is a links leads to better modularized knowledge bases, and for this reason we retain them as a presentation device, but without any special status: for us dog is a animal is just as valid as dog is a hairy and dog is a barks. From the KR perspective the main point here is that there is no mixing of strict and default inheritance; in fact there is no strict portion of the system (except possibly in the encyclopedic part, which need not concern us here). If we know that animals are alive then we know that donkeys are alive. If we know that being alive implies life functions such as growth, metabolism, and replication, this implication will again be inherited by animals and thus by mules as well. The encyclopedic knowledge that mules don't replicate has to be learned separately. Once acquired, this knowledge will override the default inheritance, but we are equally interested in the naive world-view where such knowledge has not yet been acquired.

Only the naive lexical knowledge will be encoded by primitives directly: everything else must be given indirectly, by means of a pointer or set of pointers to encyclopedic knowledge. The most essential information that the lexicon has about tennis is that it is a game; all the world knowledge that we have about it (the court, the racket, the ball, the pert little skirts, and so forth) is stored in a non-lexical knowledge base. This is also clear from the evidence from word-formation: clearly table tennis is a kind of tennis, yet it requires no court and has a different racket, ball, and so forth. The clear distinction between essential (lexical) and accidental (encyclopedic) knowledge has broad implications for the contemporary practice of Knowledge Representation, exemplified by systems like CyC (Lenat and Guha [28]) or Mindpixel, in that the current homogeneous knowledge bases need to be refactored, splitting out a small, lexical base that is entirely independent of domain.

The syntax of well-formed lexemes can be summarized in a Context-Free Grammar (V, Σ, R, S) as follows. The nonterminals V are the start symbol S, the binary relation symbols B, and the unary relation symbols collected in U. Variables ranging over V will be taken from the end of the Latin alphabet, v, w, x, y, z. The terminals are the grouping brackets ‘[’ and ‘]’, the derivation history parentheses ‘(’ and ‘)’, and we introduce a special terminating operator ‘;’ to form a terminal v; from any nonterminal v. The rule S → U | B | λ handles the decision to use unary or binary lexemes, or perhaps none at all. The operation of attribution is captured in the rule schema w → w;[S*], which produces the list defining w. This requires the CFG to be extended in the usual sense that regular expressions are permitted on the right hand side, so the rule really means w → w;[] | w;[S] | w;[SS] | ... Finally, the operation of predication is handled by u → u;(S) for unary, and v → S v; S for binary nonterminals. All lexemes are built up recursively by these rules.
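The grammar just given is small enough to prototype directly. The following sketch (ours; the vocabulary and the bounded recursion depth are assumptions, and for brevity binary symbols here use only the infix predication rule) generates strings licensed by the CFG:

    import random

    UNARY, BINARY = ["dog", "animal", "hairy"], ["has", "at"]

    def expand(sym, depth):
        if depth == 0 or sym == "":
            return ""                                # bottom out via S -> lambda
        if sym == "S":                               # S -> U | B | lambda
            return expand(random.choice(UNARY + BINARY + [""]), depth)
        if sym in UNARY:
            if random.random() < 0.5:                # attribution: w -> w;[S*]
                k = random.randint(0, 2)
                inner = "".join(expand("S", depth - 1) for _ in range(k))
                return sym + ";[" + inner + "]"
            return sym + ";(" + expand("S", depth - 1) + ")"   # u -> u;(S)
        # binary predication, written infix: v -> S v; S
        return expand("S", depth - 1) + " " + sym + "; " + expand("S", depth - 1)

    print(expand("S", 3))    # e.g. dog;[animal;[]hairy;[]]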
3.2 Combining the Lexemes
The first level of combining lexemes is morphological. At the very least, we need to account for productive derivational morphology, the prefixes and suffixes that are part of D, but in general we expect a theory that is just as capable of handling cases not easily exemplified in English, such as binyanim. Compounding, to the extent predictable, also belongs here, and so does nominalization, especially as definitions make particularly heavy use of this process. The same is true for inflectional morphology, where the challenge is not so much English (though the core set -s, 's, -ing, -ed must be covered) as languages with more complex inflectional systems. Since certain categories (e.g. gender and class system) can be derivational in one language but inflectional in another, what we really require is coverage of all productive morphology. This is obviously a tall order, and within the confines of this paper all we can do is discuss one example, deriving insecure from in- and secure, as this will bring many of the characteristic features of the system into play. Irrespective of whether secure is primitive (we assume it is not), we need some mechanism that takes the in- lexeme and the secure lexeme and creates an insecure lexeme whose definition and printname are derived from those of the inputs.

To forestall confusion we note here that not every morphologically complex word will be treated as derived. For example, it is clear, e.g. from the strong verb pattern, that withstand is morphologically complex, derived from with and stand (otherwise we would expect the past tense to be *withstanded rather than withstood), yet we do not attempt to describe the operation that creates it. We are content with listing withstand, understand, and other complex forms in the lexicon, though not necessarily as part of D. Similarly, if we have a model capable of accounting for insecure in terms of more primitive elements, we are not required to overapply the technique to inscrutable or ineffable just because these words are also morphologically complex and could well be, historically, the residue of in- prefixation to stems no longer preserved in the language. Our goal is to define meanings, and the structural decomposition of every lexeme to irreducible units is pursued only to the extent it advances this goal.

Returning to insecure, the following facts should be noted. First, the operation resides entirely in in-, because secure is a free form. Second, a great deal of the analysis is best formulated with reference to lexical categories (parts of speech): for example, in- clearly selects for an adjectival base and yields an adjectival output (the category of in- is A/A), because those forms such as income or indeed that are formed from a verbal or nominal base lack the negative meaning of in- that we are concerned with (and are clearly related to the preposition in rather than the prefix in/im that is our target here). Third, the meaning of the operation is exhaustively characterized by the negation: forms like infirm, where the base firm no longer carries the requisite meaning, still carry a clear negative connotation (in this case, ‘lacking in health’ rather than ‘lacking in firmness’). In fact, whatever meaning representation we assign to the lexically listed element insecure must also be available for the non-lexical (syntactically derived) not secure.
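A toy rendering of in- prefixation along these lines is given below. This is our sketch, not the paper's formal operation: lexemes are simplified to (category, printname, property-set) triples, the flat property set glosses over the scope question taken up in the next paragraphs, and the assimilation rule is the standard im-before-labials pattern, simplified.

    def in_prefix(adj):
        """in- : category A/A; adds neg to the semantics and assimilates."""
        cat, phon, sem = adj
        assert cat == "A"                       # in- selects an adjectival base
        prefix = "im" if phon[0] in "pbm" else "in"
        return ("A", prefix + phon, frozenset({"neg"}) | sem)

    secure = ("A", "secure", frozenset({"able to withstand attack"}))
    precise = ("A", "precise", frozenset({"exact"}))
    assert in_prefix(secure)[1] == "insecure"
    assert in_prefix(precise)[1] == "imprecise"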
In much of model-theoretic semantics (the major exception is the work of Turner [43], [44]) preserving the semantic unity of stems like secure, which can be a verb or an adjective, or stems like divorce, which can be both nouns and verbs with no perceptible meaning difference between the two, is extremely hard because of the differences in signature. Here it is clear that the verb is derived from the adjective: the verb to secure x means ‘make x (be) secure’, so when we say that in- selects for an adjectival base, this just means that the part of the POS structure of secure that permits verbal combinatorics is filtered out by application of the prefix. The adjective secure means ‘able to withstand attack’. Prefixation of in- is simply the addition of the primitive neg to the semantic representation plus, on the phonological side, concatenation and assimilation, cf. in+secure and im+precise. (We note here, without going into details, that the phonological changes triggered by the concatenation are also entirely amenable to treatment in finite state terms.) As for the invisible deadjectival verb-forming affix (paraphrased as make) that we posited here to obtain the verbal form, it does two things: first, it brings in a subject slot x, and second, it contributes a change of state predicate – before, there wasn't an object y, and now there is. The first effect, which requires making a distinction between external (subject) and internal (direct object, indirect object, etc.) arguments, follows a long tradition of syntactic analysis going back at least to Williams [46], and will just be assumed without argumentation here, but the latter is worth discussing in greater detail, as it involves a key operation among lexemes, substitution, to which we turn now.

Some form of recursive substitution of definitions in one another is necessary both for work aimed at reducing the size of the DV and for attempts to define non-D elements in terms of the primitives listed in D. When we add an element of negation (here given simply as neg, and a reasonable candidate for inclusion in D) to a definition such as ‘able to withstand attack’, how do we know that the result is ‘not able to withstand attack’ rather than ‘able to not withstand attack’ or even ‘able to withstand not attack’? The question is particularly acute because the head just contains the defining properties as elements of a set, with no order imposed. (We note that this is a restriction that we could trivially give up in favor of ordered lists, but only at a great price: once ordered lists are admitted the system becomes Turing-complete, just as HPSG is.) Another way of asking the same question is to ask how the system deals with iterated substitutions, for even if we assume that able and attack are primitives (they are listed in the LDV), surely withstand is not: x withstands y means something like ‘x does not change from y’ or even ‘x actively opposes y’. Given our preference for a monosemic analysis we take the second of these as our definition, but this makes the problem even more acute: how do we know that the negation does not attach to the actively portion of the definition?

What is at stake here is the single most important property of definitions, that the definiens can be substituted for the definiendum in any context. Since many processes, such as making a common noun definite, which are performed by syntactic means in English, will be performed by inflectional means in
other languages such as Rumanian, complete coverage of productive morphology in the world's languages already implies coverage of a great deal of syntax in English. Ideally, we would wish to take this further, requiring coverage of syntax as a whole, but we could be satisfied with slightly less, covering the meaning of syntactic constructions only to the extent they appear in dictionary definitions. Remarkably, almost all problem cases in syntax are already evident in this restricted domain, especially as we need to make sure that constructions and idioms are also covered. There are forms of grammar which assume all syntax to be a combination of constructions (Fillmore and Kay [13]), and the need to cover the semantics of these is already clear from the lexical domain: for example, a mule is animal, cross between horses and donkeys, stubborn, ... Clearly, a notion such as ‘cross between horses and donkeys’ is not a reasonable candidate for a primitive, so we need a mechanism for feeding back the semantics of nonce constructions into the lexicon. This leaves only the totally non-lexicalized, purely grammatical part of syntax out of scope, cases such as topicalization and other manipulation of given/new structure, as dictionary definitions tend to avoid communicative dynamics. But with this important caveat we can state the requirement that lexical semantics cover not just the lexical, but also the syntactic combination of morphemes, words, and larger units.
3.3 The Semantics of Lexemes
Now that we have seen the basic elements (lexemes) and the basic mode of combination (attribution, modeled as listing in the base of a lexeme), the question will no doubt be asked: how is this different from Markerese (Lewis [30])? The answer is that we will interpret our lexemes in model structures, and make the combination of lexemes correspond to operations on these structures, very much in the spirit of Montague [34]. Formally, we have a source algebra A that is freely generated from some set of primitives D by means of constructions listed in C. An example of such a construction would be x is to y as z is to w, which is used not just in arithmetic (proportions) but also in everyday analogy: Paris is to London as France is to England. In-prefixation would also be a construction of its own. We will also have an algebra M of machines, which will serve as our model structures, and a mapping σ of semantic interpretation that will assign elements of M both to elements of D and to elements of A formed from these in a compositional manner. This can be restated even more compactly in terms of category theory: members of D, plus all other elements of the lexicon, plus all expressions constructed from these, are the objects of some category L of linguistic expressions, whose arrows are given by the constructions and the definitional equations; members of M, and the mappings between them, make up the category M; and semantic interpretation is simply a functor S from L to M. The key observation, which bears repeating at this point, is that S underdetermines the semantics of lexicalized expressions: if noun-noun compounding (obviously a productive construction of English) has the semantics ‘N2 that is
V-ed by N1’ all the theory gives us is that ropeladder is a kind of ladder that has something to do with rope. What we obtain is ladder, rope rather than the desired ladder, material, rope. Regrettably, the theory can take us only so far – the rest has to be done by diving into the trashcan and cataloging historical accidents.

Lexemes will be mapped by S on finite state automata (FSA) that act on partitioned sets of elements of D ∪ D ∪ E (where the second, underlined copy of D supplies the printnames). Each partition contains one or more elements of D ∪ E or the printname of the lexeme (which is, as a matter of fact, just another pointer, to phonetic/phonological knowledge, a domain that we happen to have a highly developed theory of). By action we mean a relational mapping, which can be one to many or many to one, not just permutation. These FSA, together with the mapping associating actions to elements of the alphabet, are machines in the standard algebraic sense (Eilenberg [11]), with one added twist: the underlying set, called the base of the machine, is pointed (one element of it is distinguished). The FSA is called the control, the distinguished point is called the head of the base. Without the control, a system composed of bases would be close to a semantic network, with activations flowing from node to node (Quillian [38]). Without a base, the control networks would just form one big FSA, a primitive kind of deduction system, so it is the combination of these two facets that gives machines their added power and flexibility. Since the definitional burden is carried in the base, and the combinatorial burden in the control, the formal model has the resources to handle the occasional mismatch between syntactic type (part of speech) and semantic type (as defined by function-argument structure).

Let us now survey lexemes in order of increasing base complexity. If the base is empty, it has no relations, so the only FSA that can act on it is the null graph (no states and no transitions). This is called the null lexeme. If the set has one member, the only relations it can have are the identity 1 and the empty relation 0, which combine in the expected manner (0·0 = 0·1 = 1·0 = 0, 1·1 = 1). Note that the identity corresponds to the empty string, usually denoted λ or ε. Since 1^n = 1, the behavior of the machine can only take four forms, depending on whether it contains 0, 1, both, or neither, the last case being indistinguishable from the null lexeme over any size base. If the behavior is given by the empty string alone, we will call the lexeme 1, with the usual abuse of notation, independent of the size of the base set. If the behavior is given by the empty relation alone, we will call the lexeme 0, again independent of the size of the base set. Slightly more complex is the lexeme that contains both 0 and 1, which is rightly thought of as the union of 0 and 1, giving us the first example of an operation on lexemes. To fix the notation, in Table 2 we present the multiplication table of the semigroup R2 that contains all relations over two elements (for ease of typesetting the rows and columns corresponding to 0 and 1 are omitted). The remaining elements are denoted a, b, d, u, p, q, n, p′, q′, a′, b′, d′, u′, t – the prime is also used to denote an involution over the 16 elements which is not a semigroup homomorphism (but does satisfy x′′ = x).
Table 2. Multiplication in R2
[Table body omitted: the 14 × 14 products of the elements a, b, d, u, p, q, n, p′, q′, a′, b′, d′, u′, t under relation composition; the rows and columns of the original table did not survive extraction.]
Under this mapping, 0′ = t and 1′ = n; the rest follows from the naming conventions. To specify an arbitrary lexeme over a two-element base we need to select an alphabet as a subset of these letters, an FSA that generates some language over (the semigroup closure of) this alphabet, and fix one of the two base elements as the head. (To bring this definition in harmony with the one provided by Eilenberg we would also need to specify input and output mappings α and ω, but we omit this step here.) Because any string of alphabetic letters reduces to a single element according to the semigroup multiplication, the actual behavior of the FSA is given by selecting one of the 2^16 subsets of the alphabet {0, 1, a, ..., t}, so over a two-element base there can be no more than 65,536 non-isomorphic lexemes, and in general over an n-element base no more than 2^(n^2), since over n elements there will be n^2 ordered pairs and thus 2^(n^2) relations. While in principle the number of non-isomorphic lexemes could grow faster than exponentially in n, in practice the base can be limited to three (one partition for the printname, one for the subject, and one for the object), so the largest lexeme we need to countenance will have its alphabet size limited to 2^9 = 512. This is still very large, but the upper bound is very crude in that not all conceivable relations over three elements will actually be used: there may be operators that affect subject and object properties at the same time, but there aren't any that directly mix grammatical and phonological properties.

Most nominals, adjectives, adadjectives, and verbs will only need one content partition. Relational primitives such as x at y ‘x is at location y’, x has y ‘x is in possession of y’, x before y ‘x temporally precedes y’ will require two content partitions (plus a printname). As noted earlier, transitive and higher arity verbs will also generally require only one content partition: eats(x,y) may look superficially similar to has(x,y) but will receive a very different analysis. At this point, variables serve only as a convenient shorthand: as we shall see
shortly, specifying the actual combinatorics of the elements does not require parentheses, variables, or an operation of variable binding. Formally we could use more complex lexemes for ditransitives like give or show, or verbs with even higher arity such as rent, but in practice we will treat these as combinations of primitives with smaller arity, e.g. x gives y to z as x cause(z has y). (We will continue using both variables and natural language paraphrases as a convenient shorthand when this does not affect the argument we are making.)

Let us now turn to operations on lexemes. Given a set L of lexemes, each n-ary operation is a function from L^n to L. As is usual, distinguished elements of L such as null, 0, and 1 are treated as nullary operations. The key unary operations we will consider are step, denoted ′; invstep, denoted ‵; and clean, denoted ⁻. ′ is simply an elementary step of the FSA (performed on edges) which acts as a relation on the partition X. As a result of step R, the active state moves from x0 to the image of x0 under R. The inverse step does the opposite. The key binary operation is substitution, denoted by parens. The head of the dependent machine is built into the base of the head machine.

For a simple illustration, recall the definition of mule as animal, cross between horses and donkeys, stubborn, ... So far we said that one partition of the mule lexeme, the head, simply contains the conjunction (unordered list) of these and similar defining (essential) properties. Now assume, for the sake of the argument, that animal is not a primitive, but rather a similar conjunction living, capable of locomotion, ... Substitution amounts to treating some part of the definiens as being a definiendum in its own right, and the substitution operation replaces the atomic animal on the list of essential properties defining mule by the conjunction living, capable of locomotion, ... The internal bracketing is lost; what we have at the end of this step is simply a longer list living, capable of locomotion, cross between horses and donkeys, stubborn, ... By repeated substitution we may remove living, stubborn, etc. – the role of the primitives in D is to guarantee that this process will terminate. But note that the semantic value of the list is not changed if we leave the original animal in place: as long as animals are truly defined as living things capable of locomotion, we have set-theoretical identity between animal, living, capable of locomotion and living, capable of locomotion (cf. our second remark above). Adding or removing redundant combinations of properties makes no difference.

Let us now consider the next term, cross between horses and donkeys. By analyzing what cross means we can obtain the statements father(donkey,mule) and mother(horse,mule). We will ignore all the encyclopedic details (such as the fact that if the donkey is female and the horse male the offspring is called a hinny, not a mule) and concentrate on the syntax: how can we describe a statement such as ∀x mule(x) ∃y,z horse(y) & female(y) & donkey(z) & male(z) & parent(x,y) & parent(x,z) without recourse to variables? First, note that the Boolean connective & is entirely unnecessary, since everything is defined by a conjunction of properties – at
best what is needed is to keep track of which parent has what gender, a matter that is generally handled by packing this information in a single lexical entry. Once we explain ∀x mule(x) ∃y horse(y) female(y) parent(x,y), the rest will be easy. Again, note that it makes no difference whether we consider a female horse or mare which is a parent, or a horse which is a female parent or mother: these combinations will map out the exact same set. Whether primitives such as mother, mare or being are available is a matter of how we design D. Either way, further quantification will enter the picture as soon as we start to unravel parent, a notion defined (at least for this case) by ‘gives genetic material to offspring’, which in turn boils down to ‘causes offspring to have genetic material’. Note that both the quantification and the identity of the genetic material are rather weak: we don't know whether the parent gives all its genetic material or just part of it, and we don't know whether the material is the same or just a copy. But for the actual definition none of these niceties matter: what matters is that mules have horse genes and donkey genes. As a matter of fact, this simple definition applies to hinnies as well, which is precisely the reason why people who lack significant encyclopedic knowledge about this matter don't keep the two apart, and even those who do will generally agree that a hinny is a kind of a mule, and not the other way around (just as bitches are a kind of a dog, i.e. the marked member of the opposition).

After all these substitution steps what remains on the list of essential mule properties includes complex properties such as has(horse genes) and capable of locomotion, but no variable is required as long as we grant that in any definiens the superordinate (subject) slot of has is automatically filled by the definiendum. Readers familiar with the Accessibility Hierarchy of Keenan and Comrie [20] and subsequent work may jump to the conclusion that one way or another the entire hierarchy (handled in HPSG and related theories by an ordered list) will be necessary, but we attempt to keep the mechanism under much tighter control. In particular, we assume no ternary relations whatsoever, so there are no such things as indirect objects, let alone obliques, in definitions.

To get further with capable of locomotion we need to provide at least a rudimentary theory of being capable of doing something, but here we feel justified in assuming that can, change, and place are primitives, so that can(change(place)) is good enough. Notice that what would have been the subject variables, who has the capability, who performs the change, and who has the place, are all implicitly bound to the same superordinate entity, the mule. To make further progress on horse genes we also need a theory of compound nouns: what are horse genes if not genes characteristic of horses, and if they are indeed characteristic of horses how come mules also have them, and in an essential fashion to boot? The key to understanding horse gene and similar compounds such as gold bar is that we need to supply a predicate that binds the two terms together, what classical grammar calls ‘the genitive of material’, which we will write as made of. A full analysis of this notion is beyond the limits of this paper, but we note that the central idea of made of is production, generation: the bar is produced from/of/by gold, and the genes in question are
produced from/of/by horses. This turns the Kripkean idea of defining biological kinds by their genetic material on its head: what we assume is that horse genes are genes defined by their essential horse-ness, rather than that horses are animals defined by carrying the essence of horse-ness in their genes. (Mules are atypical in this respect, in that their essence can't be fully captured without reference to their mixed parentage.)
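The substitution mechanism of this section is easy to prototype for the mule example. In the sketch below (ours; the property names are stand-ins, and unlike the paper's system the toy assumes every non-primitive has an acyclic definition), heads are unordered sets and substitution flattens them down to the primitives of D:

    D = {"living", "can-move", "stubborn", "horse-cross-donkey"}  # toy primitives

    DEFS = {
        "animal": {"living", "can-move"},
        "mule": {"animal", "horse-cross-donkey", "stubborn"},
    }

    def flatten(props):
        out = set()
        for p in props:
            out |= {p} if p in D else flatten(DEFS[p])
        return out

    assert flatten(DEFS["mule"]) == {"living", "can-move",
                                     "horse-cross-donkey", "stubborn"}
    # Leaving the redundant 'animal' in place does not change the value:
    assert flatten(DEFS["mule"] | {"animal"}) == flatten(DEFS["mule"])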
4 Conclusions
In the Introduction we listed some desiderata for a theory of the lexicon. First, adequate support for the traditional lexicographic tasks, such as distinguishing word senses, deciding whether two words/senses are synonymous or perhaps antonymous, whether one expression can be said to be a paraphrase of another, etc. We see how the current proposal does this: two lexemes are synonymous iff they are mapped on isomorphic machines. Since finer distinctions largely rest in the eidopoios diaphora that we blatantly ignore, there are many synonyms: for example we define both poodle and greyhound as dog. Second, we wanted the theory of lexical semantics to connect to a theory of the meaning of larger (non-lexicalized) constructions including, but not necessarily limited to, sentential syntax and semantics. The theory proposed here meets this criterion maximally, since it uses the exact same mechanism to describe meaning starting from the smallest morpheme to the largest construction (but not beyond, as communicative dynamics is left untreated). Third, we wanted the theory to provide a means of linking up meanings across languages, serving as a translation pivot. While making good on this promise is obviously beyond the scope of this paper, it is clear that in the theory proposed here such a task must begin with aligning the primitives D developed for one language with those developed for another, a task we find quite doable at least as far as the major branches of IE (Romance, Slavic, and Germanic) are concerned. Finally, we said that the theory should be coupled to some theory of inference that enables, at the very least, common sense reasoning about objects, people, and natural phenomena. We don't claim to have a full solution, but we conclude this paper with some preliminary remarks on the main issues.

The complexities of the logic surrounding lexemes are not exactly at the same points where we find complexities in mathematical logic. In particular, truth, which is treated as a primitive notion in mathematical logic, will be treated as a derived concept here, paraphrased as ‘internal model corresponds in essence to external state of affairs’. This is almost the standard correspondence theory of truth, but the qualification ‘in essence’ takes away much of the deductive power of the standard theory. The mode of inferencing supported here is not sound. For example, consider the following rule: if A' is part of A and B' is the same part of B and A is bigger than B, then A' is bigger than B'. Let's call this the Rule of Proportional Size, RPS. A specific instance would be that children's feet are smaller than adults' feet, since children are smaller than adults. Note that the rule is only statistically true: we can well imagine e.g. a bigger building with smaller rooms. Note also that both the premises and the conclusion
are defeasible: there may be some children who are bigger than some adults to begin with, and we don't expect the rule to hold for them (this is a (meta)rule of its own, what we will call Specific Application), and even if the premises are met the conclusion need not follow, as the rule is not sound. Nevertheless, we feel comfortable with these rules, because they work most of the time, and when they don't, a specific failure mode can always be found: e.g. we will claim that the small building with the larger rooms, or the large building with the smaller rooms, is somehow not fully proportional, or that there are more rooms in the big building, etc. Also, such rules are statistically true, and they often come from inverting or otherwise generalizing rules which are sound, e.g. the rule that if A is built from parts A' that are bigger than the parts B' from which B is built, then A will be bigger than B. (This follows from our general notion of size, which includes additivity.)

Once we do away with the soundness requirement for inference rules, we are no longer restricted to the handful of rules which are actually sound. We permit our rule base to evolve: for example the very first version of RPS may just say that big things have big parts (so that children's legs also come out smaller than adults' arms, something that will trigger a lot of counterexamples and thus efforts at rule revision); the restriction to its being the same part may come only later. Importantly, the old rule doesn't go away just because we have a better new rule. What happens is that the new rule gets priority in the domain it was devised for, but the old rule is still considered applicable elsewhere.
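One way to picture this coexistence of old and new rules is as a priority-ordered rule list: newer, more specific rules are tried first, each fires only inside the domain it was devised for, and the old rule still applies elsewhere. The sketch below is our illustration, not a mechanism from the paper.

    def infer(facts, rules):
        """Try rules newest-first; a rule fires only inside its own domain."""
        for in_domain, conclude in reversed(rules):
            if in_domain(facts):
                return conclude(facts)
        return None

    # Old rule: big things have big parts (any part at all).
    old = (lambda f: f["A_bigger"], lambda f: "parts of A are bigger")
    # Newer refinement: fires only when we compare the same part.
    new = (lambda f: f["A_bigger"] and f["same_part"],
           lambda f: "A' is bigger than B'")

    rules = [old, new]                       # later rules take priority
    print(infer({"A_bigger": True, "same_part": True}, rules))   # refined RPS
    print(infer({"A_bigger": True, "same_part": False}, rules))  # old rule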
Acknowledgements

We thank Tibor Beke (UMass Lowell) for trenchant criticism of earlier versions.
References

1. Allen, B., Gardiner, D., Frantz, D.: Noun incorporation in Southern Tiwa. IJAL 50 (1984)
2. Allen, J., Ferguson, G.: Actions and events in interval temporal logic. Journal of Logic and Computation 4(5), 531–579 (1994)
3. Aronoff, M.: Word Formation in Generative Grammar. MIT Press, Cambridge (1976)
4. Aronoff, M.: Orthography and linguistic theory: The syntactic basis of masoretic Hebrew punctuation. Language 61(1), 28–72 (1985)
5. Blackburn, P., Bos, J.: Representation and Inference for Natural Language: A First Course in Computational Semantics. CSLI, Stanford (2005)
6. Boguraev, B.K., Briscoe, E.J.: Computational Lexicography for Natural Language Processing. Longman (1989)
7. Cawdrey, R.: A table alphabetical of hard usual English words (1604)
8. Dowty, D.: Word Meaning and Montague Grammar. Reidel, Dordrecht (1979)
9. Dowty, D.: Thematic proto-roles and argument selection. Language 67, 547–619 (1991)
10. Eco, U.: The Search for the Perfect Language. Blackwell, Oxford (1995)
11. Eilenberg, S.: Automata, Languages, and Machines, vol. A. Academic Press, London (1974)
12. Fillmore, C., Atkins, S.: Framenet and lexicographic relevance. In: Proceedings of the First International Conference on Language Resources and Evaluation, Granada, Spain (1998)
13. Fillmore, C., Kay, P.: Berkeley Construction Grammar (1997), http://www.icsi.berkeley.edu/~kay/bcg/ConGram.html
14. Fillmore, C., Atkins, B.: Starting where the dictionaries stop: The challenge of corpus lexicography. Computational Approaches to the Lexicon, 349–393 (1994)
15. Flickinger, D.P.: Lexical Rules in the Hierarchical Lexicon. PhD Thesis, Stanford University (1987)
16. Graham, A.: Two Chinese Philosophers, London (1958)
17. Gruber, J.: Lexical structures in syntax and semantics. North-Holland, Amsterdam (1976)
18. Hayes, P.: The naive physics manifesto. Expert Systems (1979)
19. Hobbs, J.: Deep lexical semantics. In: Gelbukh, A. (ed.) CICLing 2008. LNCS, vol. 4919, pp. 183–193. Springer, Heidelberg (2008)
20. Keenan, E., Comrie, B.: Noun phrase accessibility and universal grammar. Linguistic Inquiry 8(1), 63–99 (1977)
21. Keller, H.: The story of my life. Dover, New York (1903)
22. Kiparsky, P.: From cyclic phonology to lexical phonology. In: van der Hulst, H., Smith, N. (eds.) The Structure of Phonological Representations, vol. I, pp. 131–175. Foris, Dordrecht (1982)
23. Kiparsky, P.: On the Architecture of Pāṇini's grammar. ms., Stanford University (2002)
24. Kipper, K., Dang, H.T., Palmer, M.: Class based construction of a verb lexicon. In: AAAI-2000 Seventeenth National Conference on Artificial Intelligence, Austin, TX (2000)
25. Kirsner, R.: From meaning to message in two theories: Cognitive and Saussurean views of the Modern Dutch demonstratives. Conceptualizations and Mental Processing in Language, 80–114 (1993)
26. Kornai, A.: Mathematical Linguistics. Springer, Heidelberg (2008)
27. Kornai, A.: The treatment of ordinary quantification in English proper. Hungarian Review of Philosophy 51 (2009) (to appear)
28. Lenat, D.B., Guha, R.: Building Large Knowledge-Based Systems. Addison-Wesley, Reading (1990)
29. Levin, B.: English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago (1993)
30. Lewis, D.: General semantics. Synthese 22(1), 18–67 (1970)
31. McCarthy, J.: An example for natural language understanding and the AI problems it raises. Formalizing Common Sense: Papers by John McCarthy. Ablex Publishing Corporation, 355 (1976)
32. McCarthy, J.: Human-level AI is harder than it seemed in 1955 (2005), http://www-formal.stanford.edu/jmc/slides/wrong/wrong-sli/wrong-sli.html
33. Mitchell, T.M., Shinkareva, S., Carlson, A., Chang, K., Malave, V., Mason, R., Just, M.: Predicting human brain activity associated with the meanings of nouns. Science 320(5880), 1191–1195 (2008)
34. Montague, R.: Universal grammar. Theoria 36, 373–398 (1970)
35. Ogden, C.: Basic English: a general introduction with rules and grammar. K. Paul, Trench, Trubner (1944)
36. Pustejovsky, J.: The Generative Lexicon. MIT Press, Cambridge (1995)
37. Quillian, M.R.: Semantic memory. In: Minsky, M. (ed.) Semantic Information Processing, pp. 227–270. MIT Press, Cambridge (1967)
38. Quillian, M.R.: Word concepts: A theory and simulation of some basic semantic capabilities. Behavioral Science 12, 410–430 (1968)
39. Ruhl, C.: On monosemy: a study in linguistic semantics. State University of New York Press (1989)
40. Schank, R.C.: Conceptual dependency: A theory of natural language understanding. Cognitive Psychology 3(4), 552–631 (1972)
41. Tarski, A., Givant, S.: A formalization of set theory without variables. American Mathematical Society (1987)
42. Trier, J.: Der Deutsche Wortschatz im Sinnbezirk des Verstandes. C. Winter (1931)
43. Turner, R.: Montague semantics, nominalisations and Scott's domains. Linguistics and Philosophy 6, 259–288 (1983)
44. Turner, R.: Three theories of nominalized predicates. Studia Logica 44(2), 165–186 (1985)
45. Wierzbicka, A.: Lexicography and conceptual analysis. Karoma, Ann Arbor (1985)
46. Williams, E.: On the notions lexically related and head of a word. Linguistic Inquiry 12, 245–274 (1981)
Phonological Interpretation into Preordered Algebras

Yusuke Kubota and Carl Pollard

The Ohio State University, Columbus OH 43210, USA
Abstract. We propose a novel architecture for categorial grammar that clarifies the relationship between semantically relevant combinatoric reasoning and semantically inert reasoning that only affects surface-oriented phonological form. To this end, we employ a level of structured phonology that mediates between syntax (abstract combinatorics) and phonology proper (strings). To notate structured phonologies, we employ a lambda calculus analogous to the φ-terms of [8]. However, unlike Oehrle’s purely equational φ-calculus, our phonological calculus is inequational, in a way that is strongly analogous to the functional programming language LCF [10]. Like LCF, our phonological terms are interpreted into a Henkin frame of posets, with degree of definedness (‘height’ in the preorder that interprets the base type) corresponding to degree of pronounceability; only maximal elements are actual strings and therefore fully pronounceable. We illustrate with an analysis (also new) of some complex constituent-order phenomena in Japanese.
1 Introduction
Standard denotational semantics of functional programming languages (FPLs) follows [10] in interpreting programs and their parts into a Henkin frame of (complete) posets whose orders correspond to degree of definedness. Maximal members of the posets that interpret the base types are the interpretations of values, the irreducible terms returned by terminating programs. One reasons about meanings of program constructs in an inequational logic that extends the familiar equational logic for reasoning about meanings of lambda-terms. In a separate development initiated by [11] and [5], building on [7], multimodal categorial grammars (MMCGs) are phonologically interpreted by labelling syntactic derivations not just with meaning terms, but also with terms which take their denotations in an algebra whose operations model modes of phonological combination. (Following Oehrle, we call these φ-terms, and write a; m; A for syntactic type A labelled with φ-term a and meaning term m.) Combining these two ideas, we propose φ-term labelling for MMCGs with interpretation into preordered algebras whose ‘modes of phonological combination’ are binary operations monotonic in both arguments. Unlike in the denotational semantics of FPLs, though, deducibility of a ≤ a' means not that a' is at least as defined as a, but rather that a' is at least as pronounceable as a, in the sense that any syntactic derivation that can be phonologically interpreted as a can also
be phonologically interpreted as a'. And the maximal elements of the algebra interpreting the base type phon, the interpretations of the phonological values, are the ones that can be ‘returned’ in the sense of being actual pronunciations. Besides conceptual clarity, the main advantage of this approach is the wholesale elimination from the syntactic type logic of structural axioms or rules that govern semantically inert word order variation; all can be replaced by a single interface schema asserting that if a ≤ a' is provable in the inequational theory associated with the φ-calculus and a; m; A is derivable in the syntactic calculus, then so is a'; m; A.

The paper is organized as follows. Section 2 recalls basic definitions about preordered algebras. Section 3 reviews relevant aspects of the semantics of the programming language PCF. Section 4 sketches our system of φ-labelling in the context of an MMCG for a Japanese fragment. And Section 5 uses this grammar to analyze some complex facts about word order in Japanese.
2 Preordered Algebras
A preorder is a reflexive transitive relation, and an order is an antisymmetric preorder. A preordered set (resp. poset) is a set P together with a preorder (resp. order), but usually we just call a preordered set a preorder. Any preorder ⊑ induces an equivalence relation defined by p ≡ q iff p ⊑ q and q ⊑ p. A function from one preorder to another is monotonic (resp. antitonic) if it preserves (resp. reverses) the preorder, and tonic if it is either monotonic or antitonic. Any collection of preorders generates a Henkin frame by closure under exponentiation, where the exponential P → Q of two preorders is taken to be the set of monotonic functions from P to Q with the pointwise preorder. A preordered (resp. monotonic) algebra is a preorder together with a collection of tonic (resp. monotonic) operations. A simple example of a monotonic algebra is a presemigroup, with one binary operation • which is associative up to equivalence (u.t.e.), i.e. (p • q) • r ≡ p • (q • r). A less familiar example is a monotonic algebra with one operation < which is only left-associative, i.e. (p < q) < r ⊑ p < (q < r). A cpo is a poset with a bottom (⊥) element in which every chain has a least upper bound (lub). A function from one cpo to another is continuous if it preserves lubs of chains. Continuous functions can be shown to be monotonic. The cpo's form a Henkin frame with the sets of continuous functions as exponentials. Every continuous function f from a cpo to itself has a least fixed point, equal to ⊔_{n≥0} f^n(⊥). A flat cpo is one where p ⊑ q implies p = q or p = ⊥.
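For concreteness, here is a small sketch (ours) of the Kleene iteration behind the least-fixed-point statement above, with a cpo of partial functions modelled as finite dicts ordered by inclusion; the bottom element is the nowhere-defined function, and each step yields a more defined approximation of factorial.

    def step(partial):
        """One Kleene iteration, as a transformer of partial functions."""
        new = dict(partial)
        new[0] = 1
        for n, v in partial.items():
            new[n + 1] = (n + 1) * v
        return new

    approx = {}                  # bottom: the nowhere-defined partial function
    for _ in range(5):
        approx = step(approx)    # the chain step^n(bottom)
    print(approx)                # {0: 1, 1: 1, 2: 2, 3: 6, 4: 24}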
3 PCF
As an alternative to early untyped FPLs, [10] proposed a typed lambda calculus, later dubbed LCF (logic of computable functions), with base types nat and bool. LCF includes constants for truth values, natural numbers, successor, predecessor, and zero-test, as well as McCarthy conditionals and recursion operators. [9]
designed a simple LCF-based FPL, PCF, in which programs are modelled as closed terms of a base type, and computation as a form of call-by-name (CBN) evaluation. The irreducible terms returned by terminating programs are called values. Evaluation is confluent, but because of the recursion operators, program termination is not guaranteed. LCF/PCF is equipped with an interpretation I (denotational semantics) into a Henkin frame whose domains are cpo's, with functional terms interpreted as continuous functions. Base types are interpreted as flat domains, with the bottoms being the meanings of nonterminating programs. For two functional terms s and t of the same type, I(s) ⊑ I(t) means that, thought of as partial functions, I(t) is at least as defined as I(s). One reasons about the meanings of (parts of) PCF programs using an inequational logic whose formulas are inequalities s ≤ t between terms of the same type. Predictably, the axioms and rules of the inequational logic include β-conversion, Reflexivity and Transitivity (sound because the domains are cpo's, hence preorders), as well as a form of Monotonicity (of function application only with respect to the first (the function) argument, corresponding to CBN evaluation, sound because the order on functions is defined pointwise).
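A tiny sketch (ours) of this definedness order, again with partial functions as finite dicts: t lies above s exactly when t is defined wherever s is and agrees with s there.

    def leq(s, t):
        """I(s) below I(t): t extends s as a partial function."""
        return all(k in t and t[k] == s[k] for k in s)

    approx = {0: 1, 2: 2}              # undefined at 1
    better = {0: 1, 1: 1, 2: 2}        # at least as defined
    assert leq(approx, better) and not leq(better, approx)
    assert leq({}, approx)             # bottom is below everything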
4 MMCG with Preordered Phonology
Following Morrill and Solias 1993, we assign phonological and semantic labels to types in syntactic derivations, e.g. a; m; A; such triples are called signs. Derivations themselves are written in the familiar natural-deduction style with hypotheses introduced at the top. Besides the usual logical rules (Introduction and Elimination rules for the various flavors of / and \), the syntactic calculus includes an Interface rule, which asserts that if a ≤ a' is deducible in the inequational theory associated with the calculus of phonological terms (hereafter, φ-calculus, to be described below), then any sign with φ-component a has a counterpart with the same meaning and the same syntactic category but with φ-component a'. The Interface rule, together with the inequational φ-theory that governs the various modes of phonological combination and their interactions with each other, obviates the need for any structural rules in the syntax. In short: the φ-calculus governs semantically inert surface-oriented word order variation; the logical rules of the syntax govern the semantically relevant syntactic combinatorics; and the Interface rule is the channel of communication between the syntax and the phonology that enables signs to become pronounceable. The rules of the syntactic calculus are as follows:
(1) a. Forward Slash Elimination

        a; m; A/iB    b; n; B
        --------------------- /iE
           a ◦i b; m(n); A

    b. Backward Slash Elimination

        b; n; B    a; m; B\iA
        --------------------- \iE
           b ◦i a; m(n); A
(2) a. Forward Slash Introduction

        [p; x; A]^n
            :
        b ◦i p; m; B
        -------------- /iI^n
        b; λx.m; B/iA

    b. Backward Slash Introduction

        [p; x; A]^n
            :
        p ◦i b; m; B
        -------------- \iI^n
        b; λx.m; A\iB
(3) Interface Rule

        a; m; A
        -------- PI
        a'; m; A

    (where a ≤ a' is a theorem in the inequational theory of φ-terms)

Although most of the generalizations regarding word order are taken care of in the φ-calculus, the left- vs. right-slash distinction is retained for syntactic types. The semantic and phonological annotations on the rules are mostly as one would expect. The semantics is just function application and lambda abstraction for Elimination and Introduction rules, respectively. For Forward and Backward Slash Elimination, the phonology of the derived expression is obtained by combining the phonologies of the functor and the argument in the right order and in the right mode. And the Forward (respectively, Backward) Slash Introduction rule says roughly that a linguistic expression whose phonology is b alone is of category B/A (resp. A\B), given the hypothetical proof that b concatenated with p (whose category is A) to its right (resp. left) is of category B.

The phonology for the derived expression in the Introduction rules might look somewhat non-standard in that, unlike the semantics, it does not seem to involve lambda abstraction. Instead, the phonology of the hypothetically assumed expression (which is a variable of type phon) is simply stripped off from the phonology of the whole expression. To motivate this, we can think of the derived expression in the Introduction rule as actually having the following phonology:

(4) (λp.[b ◦i p])(ε)

where ε is the identity element (interpreted as a null string). But in the inequational φ-theory (see below), this is provably equal to b (by β-conversion followed by one of the inequalities for the null phonology). That is, unlike in semantics, when the variable is bound, the resulting abstract is immediately applied to the null-phonology term, so that nothing can be ‘reconstructed’ to a gap in the surface string. The Introduction rules as stated can be thought of as implicitly compiling in these reduction steps together with an application of the Interface rule to produce a conclusion with the simplified phonology b.

This is as it should be, given the typical way in which hypothetical reasoning is used in linguistic analyses in categorial grammar. That is, hypothetical
reasoning is used in the analyses of phenomena that involve some kind of displaced constituency (such as long-distance dependencies and verb raising), and the phonologies of the hypothesized expressions should appear in the displaced position on the surface string rather than being reconstructed in the original position. For this reason, the original ‘gap’ of the displaced element is filled by a null element as soon as the hypothetical reasoning step is taken. (We will illustrate how this works in an actual linguistic analysis with the Japanese fragment in the next section.)

As stated above, the Interface rule is the channel between syntax and phonology, and the actual relation of relative pronounceability among phonologies of linguistic expressions is defined by the preorder imposed on the domain that interprets the phonological base type in the Henkin frame that models the inequational φ-theory. Thus this theory is analogous to the inequational theory used to reason about relative definedness of PCF programs, and closely resembles it in form. Specifically, the φ-calculus is a typed lambda calculus with one base type phon, constants of type phon (lexical phonologies and the null phonology ε), and constants of type phon → phon → phon (modes of φ-composition). As with PCF, the formulas of the inequational theory are of the form a ≤ b, but now the inequated terms denote not program constructs with varying degrees of definedness, but rather phonological entities with varying degrees of pronounceability. Also as with PCF, the axioms (or reduction rules) include β-conversion, Reflexivity (5a), and Transitivity (5b), and a form of Monotonicity (for all the φ-modes, in both arguments) (6). Of course PCF also has parochial axioms (reduction rules for the built-in arithmetic operators, McCarthy conditionals, and fixed-point operators); and the φ-calculus does too, namely word-order variation rules for the specific modes ((8) with i and j being the same mode), rules of interaction between modes ((8) with i and j being different modes), and a rule that allows any of the ‘flavored’, unpronounceable modes to be replaced by the ‘plain vanilla’ fully pronounceable mode (concatenation) (9). In the model, phon is interpreted as a monotonic algebra whose maximal members (values) are the pronounceable phonologies (strings). φ-term reduction always terminates, but is non-confluent, corresponding to semantically inert word order variation.

(5) Structured phonologies form a preorder.
a≤a
REFL
b. a ≤ b b ≤ c TRANS a≤c
(6) The ◦i are monotonic in both arguments. a ≤ a’ b ≤ b’ MON a ◦i b ≤ a’ ◦i b’ (7) is a two-sided identity for all the ◦i .
(◦i ∈ {◦ , ◦> , ◦< , ◦× , ◦· })
Phonological Interpretation into Preordered Algebras
a.
IDl
◦i a ≤ a (◦i ∈ {◦ , ◦> , ◦< , ◦× , ◦· })
b.
205
IDr
a ◦i ≤ a (◦i ∈ {◦ , ◦> , ◦< , ◦× , ◦· })
(8) Mode-specific rules: a. (LA; left associativity) (a ◦i b) ◦j c ≤ a ◦i (b ◦j c)
LA
(◦i , ◦j ∈ {◦< , ◦· })2
b. (RA; right associativity) a ◦i (b ◦j c) ≤ (a ◦i b) ◦j c
RA
(◦i , ◦j ∈ {◦> , ◦· })
c. (Perm; permutation) a ◦i (b ◦j c) ≤ b ◦i (a ◦j c)
PERM
(◦i , ◦j ∈ {◦× , ◦· })
(9) Concatenation mode is most pronounceable (phonological values are strings). a ◦i b ≤ a ◦ b
5
CONCAT
(◦i ∈ {◦ , ◦> , ◦< , ◦× , ◦· })
A Japanese Fragment
We illustrate the system developed in the previous section with an analysis of word-order variation found in the -te form complex predicate construction in Japanese. A fuller set of facts and an analysis embodying the same basic idea but formulated within the framework of Multi-Modal Combinatory Categorial Grammar [1] can be found in [3]. We here focus on two facts, the basic scrambling pattern and the patterns of argument cluster coordination in the -te form complex predicate. In both of these cases, the embedded verb (V1) and the embedding verb (V2) cluster together, suggesting that they are combined in a mode that is tighter than the mode in which ordinary nominal arguments are combined with the verbs that subcategorize for them. V1 and V2 are independent words syntactically, however, given that this construction systematically differs from the more typical cases of lexical complex predicate constructions in Japanese in terms of the wordhood of the sequence of the V1 and V2.3 Japanese does not freely allow long-distance scrambling. That is, normally, elements of an embedded clause cannot be scrambled to the domain of the higher clause. However, in the -te form complex predicate construction, such scrambling patterns are perfectly acceptable, suggesting that V1 and V2 form a single clausal domain in this construction. (10b) is a case in which the accusative object pianoo of V1 is scrambled over a matrix dative argument John-ni: 2 3
Note about notation: the modes i and j can (but need not) be identical. For the relevant set of facts, see [3].
206
(10)
Y. Kubota and C. Pollard
a. Mary-ga John-ni piano-o hii-te morat-ta. Mary-NOM John-DAT piano-ACC play-TE BENEF-PAST ‘Mary had John play the piano for her.’ b. Mary-ga piano-o John-ni hii-te morat-ta.
Argument cluster coordination patterns also suggest that, in this construction, (semantic) arguments of V1 and V2 are clausemates. As in (11), argument cluster coordination involving arguments of both V1 and V2 is possible, as long as the cluster of V1 and V2 is not split apart. (11) Mary-ga [John-ni piano-o], [Bill-ni gitaa-o] hii-te Mary-NOM John-DAT piano-ACC Bill-DAT guitar-ACC play-TE morat-ta. BENEF-PAST ‘Mary had John play the piano and Bill play the guitar for her. Under the account we propose, these word-order variation facts receive a straightforward account in terms of φ-modalities. The gist of the analysis is that V1 and V2 combine in a mode tighter than the mode in which ordinary arguments combine with the verbs that subcategorize for them, but looser than the mode in which components of lexical complex predicates combine. Specifically, we specify in the lexicon that V1 and V2 in this construction combine with one another in the left-associative mode ◦< , which is distinct from the default scrambling mode ◦· used for putting together nominal arguments with their verbal heads, which is both left- and right-associative and permutative. A sample lexicon is given in (12): (12) mary-ga; m; NPn piano-o; p; NPa gitaa-o; g; NPa john-ni; j; NPd
bill-ni; b; NPd hii-te; play; NPa \· NPn \· S morat-ta; λP λyλx.benef(x, P (y)); (NPn \· S)\< (NPd \· NPn \· S)
The derivation for (10) is given in (13): (13)
morat-ta; λP λyλx.benef(x, P (y)); piano-o; p; NPa hii-te; play; NPa \· NPn \· S \· E (NPn \· S)\< (NPd \· NPn \· S) piano-o ◦· hii-te; play(p); NPn \· S john-ni; j; NPd \· E (piano-o ◦· hii-te) ◦< morat-ta; λyλx.benef(x, play(p)(y)); NPd \· NPn \· S \· E john-ni ◦· ((piano-o ◦· hii-te) ◦< morat-ta); λx.benef(x, play(p)(j)); NPn \· S PI piano-o ◦ (john-ni ◦ (hii-te ◦ morat-ta)); λx.benef(x, play(p)(j)); NPn \· S
The key step in this derivation is the last one. Two things are happening here: (i) the direct object piano-o of the embedded verb scrambles over the dative argument of the higher verb, resulting in the surface order in which the former linearly precedes the latter and (ii) the modes by which the φ-terms of lexical words are combined are all converted to the concatenation mode ◦, so that we get a pronounceable φ-term for the expression derived. Technically, this last step is an application of the Interface rule, whose validity is supported by the following lemma in the inequational logic for φ-terms:
Phonological Interpretation into Preordered Algebras
207
(14) Lemma: a ◦· ((b ◦· c) ◦< d) ≤ b ◦ (a ◦ (c ◦ d)) Proof: a≤a
RFF
LA
(b ◦· c) ◦< d ≤ b ◦· (c ◦< d) MON PERM a ◦· ((b ◦· c) ◦< d) ≤ a ◦· (b ◦· (c ◦< d)) a ◦· (b ◦· (c ◦< d)) ≤ b ◦· (a ◦· (c ◦< d)) a ◦· ((b ◦· c) ◦< d) ≤ b ◦· (a ◦· (c ◦< d))
... ... a ◦· ((b ◦· c) ◦< d) ≤ b ◦· (a ◦· (c ◦< d)) a ◦· ((b ◦· c) ◦<
b ◦· (a ◦· (c ◦< d)) ≤ b ◦ (a ◦ (c ◦ d)) d) ≤ b ◦ (a ◦ (c ◦ d))
TRANS
CONCAT TRANS
Intuitively, what is going on here is that, due to the fact that the mode employed in combining V1 and V2 is left associative, V1 and V2 can be analyzed as forming a verb cluster by left associativity (8a). Once this verb cluster is formed, the object of the embedded verb has the same status as arguments of the higher verb and thus can scramble over the matrix dative argument by permutation (8c). In short, the less pronounceable modes (ones other than ◦) govern the way abstract ‘structured’ phonologies4 are mapped to other ‘structured’ phonologies until they are ultimately mapped to the actually pronounceable phonologies involving only the ◦ mode. The derivation for the more complex, argument cluster coordination case (11) goes as follows (semantics is omitted for the sake of simplicity of presentation):5 (15)
a.
[p; ; NPa ]1 hii-te; ; NPa \· VP \· E morat-ta ; ; VP\< NPd \· VP p ◦· hii-te; ; VP \< E (p ◦· hii-te) ◦< morat-ta; ; NPd \· VP PI p ◦· (hii-te ◦< morat-ta); ; NPd \· VP \· I1 hii-te ◦< morat-ta; ; NPa \· NPd \· VP
b. piano-o; ; NPa [q; ; NPa \· NPd \· VP]2 \· E piano-o ◦· q; ; NPd \· VP \· E john-ni ◦· (piano-o ◦· q); ; VP .. .. PI . . (john-ni ◦· piano-o) ◦· q; ; VP /· I2 bill-ni ◦· gitaa-o; ; VP/· (NPa \· NPd \· VP) john-ni ◦· piano-o; ; VP/· (NPa \· NPd \· VP) & (john-ni ◦· piano-o) ◦ (bill-ni ◦· gitaa-o); ; VP/· (NPa \· NPd \· VP)
john-ni; ; NPd
c. (john-ni ◦· piano-o) ◦ (bill-ni ◦· gitaa-o); ; VP/· (NPa \· NPd \· VP) hii-te ◦< morat-ta; ; NPa \· NPd \· VP \· E ((john-ni ◦· piano-o) ◦ (bill-ni ◦· gitaa-o)) ◦· (hii-te ◦< morat-ta); ; VP PI ((john-ni ◦ piano-o) ◦ (bill-ni ◦ gitaa-o)) ◦ (hii-te ◦ morat-ta); ; VP
The key point in this derivation is that the sequence of V1 and V2 is analyzed as a derived ditransitive verb. This is shown in the first chunk of the derivation 4 5
Of course, it is actually the φ-terms themselves that are ‘structured’, not their denotations, which are merely algebra elements. The step marked by ‘&’ in (15b) is licensed by a non-logical coordination rule. This is needed because the type of coordination in Japanese under consideration does not involve an overt conjunction. Another possibility would be to posit a phonologically empty conjunction. The choice between these two options does not have any significance to the overall analysis we propose here, but we opt for the treatment in the text in view of avoiding phonologically empty elements whenever possible.
208
Y. Kubota and C. Pollard
(15a). By hypothesizing a direct object for the embedded verb, we can form an embedded VP which can then be given as an argument to the matrix verb. Once the embedded verb combines with the matrix verb, the Interface rule is applicable to shift the phonology of the embedded object to the left edge. This feeds into the Slash Introduction rule at the next step, which assigns a ditransitive verblike category to the string of words composed solely of V1 and V2. The rest of the derivation is straightforward. Two argument clusters are coordinated to form a larger expression of the same category (15b) and this coordinated argument cluster takes as its argument the derived ditransitive verb (i.e. the sequence of V1 and V2) as an argument to produce a VP (15c). In this analysis, the whole derivation crucially depends on the possibility of assigning a derived ditransitive verb category to the sequence of the V1 and V2. This is made possible in the current fragment since instances of the Interface rule can be interleaved with instances of the logical rules in the course of syntactic derivations, licensing hypothetical reasoning that partly depends on (the abstract representations of) the surface phonological forms of linguistic expressions. Note also that the present treatment of the cluster of V1 and V2 as a ‘derived’ ditransitive verb depends on the way the Introduction rules are formulated: the phonology of the derived ditransitive verb at step (15a) is simply of type phon , rather than being of a functional type phon phon . This enables it to be given as an argument to the argument cluster in the application of the Elimination rule at the final step (15c). To summarize, we have demonstrated above that the proposed system that interprets phonological terms in a preordered algebra enables an elegant and formally precise analysis of complex word order facts of the Japanese -te form complex predicate construction.
6
Conclusion
The proposed system of φ-labelling for MMCG most closely resembles that of [6]. In Morrill’s system, in addition to logical rules analogous to ours, the syntactic calculus includes several kinds of structural rules that manipulate only the forms of φ-terms. That is, much of the syntactic logic is given over to modelling reasoning about semantically irrelevant surface-oriented phonological forms of linguistic expressions. Essentially the same idea is present in other variants of Type-Logical Grammar, such as [4] and [2]; in the latter, for example, the relevant idea is implemented via the notion of ‘structured antecedents’ in the sequent-style natural deduction presentation. The main way our approach differs from these is that it more explicitly recognizes such reasoning as pertaining to a separate φ-component through which linguistic expressions are mapped to their surface phonological realizations, while the syntactic calculus (apart from the Interface rule) governs the abstract syntactic combinatorics that guide semantic composition. Moreover, our approach builds straightforwardly on decades-old technology—modelling the approximation of realizable values by interpreting typed lambda calculi into Henkin frames of preorders—in a way that illuminates phonological realization as a kind of computation.
Phonological Interpretation into Preordered Algebras
209
References 1. Baldridge, J.: Lexically Specified Derivational Control in Combinatory Categorial Grammar. Ph.D. thesis, University of Edinburgh (2002), http://comp.ling.utexas.edu/jbaldrid/papers/dissertation.html 2. Bernardi, R.: Reasoning with Polarity in Categorial Type Logic. Ph.D. thesis, University of Utrecht (2002), http://www.inf.unibz.it/~ bernardi/finalthesis.html 3. Kubota, Y.: Solving the morpho-syntactic puzzle of the Japanese -te form complex predicate: A Multi-Modal Combinatory Categorial Grammar analysis. In: Bonami, O., Hofherr, P.C. (eds.) Empirical Issues in Syntax and Semantics, vol. 7 (2008), http://www.cssp.cnrs.fr/eiss6 4. Moortgat, M.: Categorial Type Logics. In: van Benthem, J., ter Meulen, A. (eds.) Handbook of Logic and Language, pp. 93–177. Elsevier, Amsterdam (1997) 5. Morrill, G., Solias, T.: Tuples, discontinuity, and gapping in categorial grammar. In: Proceedings of the Sixth Conference on European Chapter of the Association for Computational Linguistics, pp. 287–296. Association for Computational Linguistics, Morristown (1993) 6. Morrill, G.V.: Type Logical Grammar: Categorial Logic of Signs. Kluwer Academic Publishers, Dordrecht (1994) 7. Oehrle, R.T.: Multi-dimensional compositional functions as a basis for grammatical analysis. In: Oehrle, R.T., Bach, E., Wheeler, D. (eds.) Categorial Grammars and Natural Language Structures, pp. 349–389. Reidel, Dordrecht (1988) 8. Oehrle, R.T.: Term-labeled categorial type systems. Linguistics and Philosophy 17(6), 633–678 (1994) 9. Plotkin, G.: LCF considered as a programming language. Theoretical Computer Science 5(3), 223–255 (1977) 10. Scott, D.: A type-theoretical alternative to iswim, cuch, owhy. Theoretical Computer Science 121, 411–440 (1993) (revision of unpublished 1969 manuscript) 11. Solias, M.T.: Gram´ aticas Categoriales, Coordinaci´ on Generalizada Y Elisi´ on. Ph.D. thesis, Departamento de Ling¨ u´ıstica, L´ ogica, Lenguas Modernas y Filsof´ıa de la Ciencia, Universidad Aut´ onoma de Madrid (1992)
Relational Semantics for the Lambek-Grishin Calculus Natasha Kurtonina1 and Michael Moortgat2, 1
2
Fitchburg State College
[email protected] Utrecht Institute of Linguistics OTS, The Netherlands
[email protected]
Abstract. We study ternary relational semantics for LG: a symmetric version of the Lambek calculus with interaction principles due to Grishin [10]. We obtain completeness on the basis of a Henkin-style weak filter construction.
1
Background, Motivation
Lambek’s Syntactic Calculus in its two incarnations — the basic system NL [13] and the associative variant L [12] — recognizes only context-free languages. To go beyond this expressive limitation, various extended typelogical systems have been proposed: multimodal grammars [22,17], discontinuous calculi [23], etc. These extensions, as well as the original Lambek systems, respect an “intuitionistic” restriction: derivability is seen as a relation between a structured configuration of hypotheses A1 , . . . , An and a single conclusion B. In a paper written in 1983, Grishin [10] develops a symmetric extension of the Lambek calculus, where derivability holds between hypotheses A1 , . . . , An , taken together by means of a multiplicative conjunction, and multiple conclusions B1 , . . . , Bm , combined by means of a multiplicative disjunction. Linguistic exploration of Grishin’s framework has started in recent years. In §2 we present the LambekGrishin calculus LG and illustrate its uses in linguistic analysis. In §3, we turn to ternary relational semantics for LG.
2
The Lambek-Grishin Calculus LG
In Grishin’s extensions of the Lambek calculus, the inventory of type-forming operations is doubled: in addition to the familiar operators ⊗, \, / (product, left and right division), we find a second family ⊕, , : coproduct, right and left
The original version of this paper was presented at the 2007 Los Angeles meeting of the Association for Mathematics of Language. The completeness proof for the Lambek-Grishin calculus in §3 below is taken over unchanged from the 2007 version. The presentation of the Lambek-Grishin calculus in §2 is updated in the light of more recent developments. We thank Anna Chernilovskaya and the anonymous referees for helpful comments on the original version of the paper.
C. Ebert, G. J¨ ager, and J. Michaelis (Eds.): MOL 10/11, LNAI 6149, pp. 210–222, 2010. c Springer-Verlag Berlin Heidelberg 2010
Relational Semantics for LG
211
difference. Some clarification about the notation: we follow [14] in writing ⊕ for the coproduct, which is a multiplicative operation, like ⊗. We read B\A as ‘B under A’, A/B as ‘A over B’, B A as ‘B from A’ and A B as ‘A less B’. For the difference operations, then, the quantity that is subtracted is under the circled (back)slash, just as we have the denominator under the (back)slash in the case of left and right division types. In a formulas-as-types spirit, we will feel free to refer to the division operations as implications, and to the difference operations as coimplications. A, B ::= p |
atoms: s, np, . . .
A ⊗ B | B\A | A/B |
product, left vs right division
A⊕B |AB |BA
coproduct, right vs left difference
(1)
The two families are related by an arrow-reversal symmetry: the Lambek operators form a residuated triple, obeying the laws of (3) on the left; the ⊕ family forms a dual residuated triple, obeying the principles of (3) on the right. The minimal symmetric1 categorial grammar, which we will refer to as LG∅ , consists of just the preorder axioms of (2), i.e. reflexivity and transitivity of the derivability relation, together with the (dual) residuation principles given in (3). AA A C/B rp A⊗B C rp B A\C
AB BC AC BC A drp C B⊕A drp C AB
(2)
(3)
It is well known that there is an alternative way of characterizing (dual) residuated families, using the monotonicity properties of the connectives, and the properties of their compositions (see [7] for discussion). For the compositions, we have the expanding and contracting patterns of (4). The rows here are related by left-right symmetry; the columns by arrow reversal. A ⊗ (A\B) B A\(A ⊗ B) (B ⊕ A) A B (B A) ⊕ A
(B/A) ⊗ A B (B ⊗ A)/A A (A ⊕ B) B A ⊕ (A B)
(4)
The tonicity properties of the type-forming operations can be summarized in the schema (↑ ⊗ ↑), (↑ / ↓), (↓ \ ↑), (↑ ⊕ ↑), (↑ ↓), (↓ ↑), where ↑ (↓) is an isotone (antitone) position; in other words, we have the inference rules of (5) and (6). Given the preorder laws (2) and the (dual) residuation principles (3), 1
‘Symmetric’ here stands for the arrow-reversal symmetry. Elsewhere, the word is sometimes used to refer to commutativity of ⊗/⊕. In LG∅ , these operations are non-commutative (and non-associative): we have the pure logic of residuation here.
212
N. Kurtonina and M. Moortgat
one easily derives (4) and the inference rules of (5) and (6). Conversely, given (4) and (5), (6), one can derive (3). A A B B A ⊗ B A ⊗ B
A A B B A ⊕ B A ⊕ B
A A B B A A B B A A B B A A B B A\B A \B A B A B B/A B /A B A B A
(5) (6)
Interaction principles. The minimal symmetric system LG∅ by itself does not offer the means to overcome the expressive limitations of the original Lambek calculus. For that we must turn to the interaction principles relating the ⊗ and ⊕ families. Grishin discusses two groups of interaction principles. They can be equivalently presented in the form of postulates, or in the form of inference rules. The rule form, which factors out the (dual) residuation principles, is used in [20] to obtain proof nets for LG, and in [18] to prove the decidability of LG by means of a cut elimination argument. The first group of interaction principles consists of the rules in (7). The recipe for these rules is the following: from A ⊗ B C ⊕ D in the premise, one selects a product and a coproduct term; in the conclusion, one simultaneously introduces the residual operations for the remaining two terms. A⊗B C ⊕D (1) C AD/B
A⊗B C⊕D (3) BD A\C
A⊗B C ⊕D (2) C B A\D
A⊗B C⊕D (4) AD C /B
(7)
From (7.1)–(7.4), using the (dual) residuation principles, one easily derives (8.1)– (8.4). In (9), the derivation of (8.1) is given as an illustration. Alternatively, taking (8.1)–(8.4) as primitive postulates, using transitivity and the (dual) residuation principles, (7.1)–(7.4) are obtained as derived rules of inference. We illustrate in (10) with (7.1). (1) (A B) ⊗ C A (B ⊗ C) (2) C ⊗ (A B) A (C ⊗ B)
C ⊗ (B A) (C ⊗ B) A (3) (B A) ⊗ C (B ⊗ C) A (4)
(8)
A (B ⊗ C) A (B ⊗ C) rp B ⊗ C A ⊕ (A (B ⊗ C)) (7.1) A B (A (B ⊗ C))/C rp (A B) ⊗ C A (B ⊗ C)
(9)
A⊗B C ⊕D (8.1) drp (C A) ⊗ B C (A ⊗ B) C (A ⊗ B) D (C A) ⊗ B D rp C A D/B
(10)
Relational Semantics for LG
B= fusion A
== ==
213
D
·
== == =
fission
C
Fig. 1. Interaction principles (7): input configuration A ⊗ B C ⊕ D
These interaction principles, in the form of (8), have been called linear (or weak) distributivity laws (see [6], and other works by these authors) — linear, in the sense that no material is duplicated. Moreover, in a setting where ⊗/⊕ are nonassociative, non-commutative operations, they are structure-preserving, in the sense that the linear order and tree structure of the input configuration is recoverable. This property of structure-preservation is brought out in a compelling way in the graphical format of proof nets for the Lambek-Grishin calculus as defined in [20]. In Fig. 1, we depict the input (premise) configuration of (7), with an orientation which has the antecedent assumptions on the left, and the succedent conclusions on the right. The four ways of rewiring the input configuration of Fig. 1 in such a way that linear order of the resources is respected (assumptions: A before B, conclusions: C before D) are given in Figs. 2 and 3. Of these four combinatory possibilities, earlier presentations of Grishin’s work, such as [14,9], only discuss half: (8.1) and (8.3), or equivalent forms derivable from (7.1) and (7.3), and their converses to be discussed below. B<
<< <<
· A
<< << <
C
D
D
B
>> >> >
· >>
A
>> >
C
Fig. 2. Interaction principles: output (7.1) (left) and (7.3) (right)
Grishin considers a second group of interaction principles. In the rule format, these are the converses of (7), where premise and conclusion change place. Characteristic theorems for this second group are (11), i.e. the converses of (8), and (12). (1) (A B) ⊗ C A (B ⊗ C) (2) C ⊗ (A B) A (C ⊗ B)
C ⊗ (B A) (C ⊗ B) A (3) (11) (B A) ⊗ C (B ⊗ C) A (4)
(1) (A ⊕ B) ⊗ C A ⊕ (B ⊗ C) (2) C ⊗ (A ⊕ B) A ⊕ (C ⊗ B)
C ⊗ (B ⊕ A) (C ⊗ B) ⊕ A (3) (12) (B ⊕ A) ⊗ C (B ⊗ C) ⊕ A (4)
214
N. Kurtonina and M. Moortgat
B
·< <<< << A A AA }}} AA}}} A }} AAA }}
A
B AA D A
D
AA }}} }}A }} AAAA } A <}<} << <<
C
·
C
Fig. 3. Interaction principles: output (7.2) (left) and (7.4) (right)
The general picture that emerges then is the landscape of Figure 4 where the minimal symmetric Lambek calculus can be extended with the irreversible rules of (7) or with their converses (7)−1 , or with the combination of the two, where the latter option turns the interaction principles into reversible rules. It has been shown in [1] that either of the irreversible options constitutes a conservative extension of LG∅ . For the combination, this is no longer true: with reversible interaction principles, one introduces partial associativity and/or commutativity of the ⊗/⊕ operations. Linguistic applications so far have been based on the (7) interaction principles. Henceforth, with LG we will refer to the combination LG∅ + (7). LG∅ + (7) + (7)−1
6 mmm mmm m m mm mmm
hQQQ QQQ QQQ QQQ
LG∅ + (7)−1
hRRR RRR RRR RRR RR
LG∅ + (7)
LG∅
m6 mmm m m m mmm mmm
(= LG)
Fig. 4. The Lambek-Grishin landscape
Illustration. LG has a particularly direct way of capturing phenomena that depend on infixation rather than concatenation. In (13), we show how the same pieces of information (the same premises) can be used either to introduce an implication A\B on the left of the turnstile, or a coimplication A B on the right. The first option leads to a rule of Application — the central composition operation of standard Lambek calculus. The second option leads to a Co-Application variant. Although these two rules are derived from the same premises, there is an important difference between them. When the implication A\B composes with its argument, it must stay external to A . In the case of the coimplication, when A is some product of factors A1 , . . . , An , the conditions for Grishin’s interaction principles (7) are met. This means that the coimplication A B will be able to descend into the phrase A and associate with any of the constituent parts Ai of A into a formula (A B) Ai .
Relational Semantics for LG
A A B B (6) A\B A \B rp A ⊗ (A\B) B
An7 7 .. 77 77 . Ai .. . A1
A A B B (6) A B A B drp A (A B) ⊕ B
B
·
? ?? ?? ??
(13)
An< < .. <<< << . (A B) Ai .. . A1
AB
215
·B
Fig. 5. Infixation
Phenomena of in situ binding exemplify the infixation pattern described above. In [16], in situ binders are characterized in terms of the inference rule (qL) of Fig. 6: an expression of type q(A, B, C) is used locally as an A to build a phrase of type B; it transforms this B phrase into a C. As shown on the right in Fig.6, the (qL) rule is a derived rule of inference in LG. We write Δ for the formula that results from replacing the structural connective of a sequent term Δ by its logical counterpart ⊗. With Γ[D] we denote the slash formula that results from dividing D by the formulas that form the context of C. This formula is obtained from Γ [C] by means of residuation inferences. Γ [C] D Δ[A] B
C Γ[D]
Δ[A] Γ [D] B C Δ[A] (B C) ⊕ Γ[D] Δ[A] ⇒ B
Γ [C] ⇒ D
Γ [ Δ[ q(A, B, C) ]] ⇒ D
qL
Δ[ (B C) A ] Γ [D]
Γ [ Δ[ (B C) A ]] D
rp (6)
drp (7)∗ rp
Fig. 6. In situ binding as a derived rule in LG with q(A, B, C) (B C) A
We illustrate in (14) with an example of in situ question formation from Japanese.2 The direct object wh element ‘nani-o’ is assigned the type (q wh) acc, with q typing yes-no questions, and wh for constituent questions. The particle ‘ka’ marks the main clause as a question q; this is then transformed into a 2
See [24] for a cross-linguistic discussion of question formation from a typelogical perspective.
216
N. Kurtonina and M. Moortgat
constituent question wh by the coimplication (q wh), launched from the direct object in the embedded clause. Miyako-wa Hiromi-ga
nani-o
kai-ta
to
M-top
what-acc
write-past
C
top
H-nom nom
sinzi-ta
ka?
believe-past Q
(q wh) acc acc\(nom\s) s\s s \(top\s)
s\q
‘What did Miyako believe that Hiromi wrote?’ (14) Results. Since this paper was originally written in 2007, a number of results on the formal properties and linguistic applicability of LG have been obtained; see [18] for an up-to-date overview. An analysis of scope construal along the lines of the illustration given above is developed in [3]; compositional interpretation here takes the form of a continuation-passing-style translation of LG derivations into LP. [2] shows how one can assimilate the analysis of wh extraction to that of in situ binding by explicitly coding gap information in lexical type assignments. On the formal side, [19] show that the relation of type similarity (aka conjoinability) of LG coincides with that of LP: types A, B are similar iff their atom counts match (in the case of LG there is also a matching operator count). A general theory of proof nets for extended categorial systems formulated as display calculi is developed in [20]; display rules are compiled away in the proof net representation, but the interaction principles of LG remain as structural conversions. [20] also gives an embedding translation of lexicalized Tree Adjoining Grammars in LG thus showing that this system handles the canonical mildly context sensitive constructions (copy languages, counting and crossed dependencies). The recognizing capacity of LG extends beyond the mildly context sensitive languages: [15] shows that all languages which are the intersection of a context-free language and the permutation closure of a context-free language are recognizable in LG. In this class, we find generalized forms of MIX, with equal multiplicity of k alphabet symbols in any order, and counting dependencies an1 . . . ank for any number k of alphabet symbols. The upper bound for LG recognizing capacity is unknown. This also holds for computational complexity; [21] identifies interesting fragments with polynomial parsability.
3
Relational Semantics
Let us turn now to the frame semantics for LG. In (15) and (16) we compare the truth conditions for the product (fusion) and coproduct (fission) operations. From the modal logic perspective, the multiplicative conjunction ⊗ is interpreted as an existential modality with ternary accessibility relation R⊗ . The residual slashes are the corresponding universal modalities for the rotations of R⊗ . For the multiplicative disjunction ⊕ and its residuals, the dual situation obtains: ⊕ here is the universal modality interpreted w.r.t. an accessibility relation R⊕ ;
Relational Semantics for LG
217
the coimplications are the existential modalities for the rotations of R⊕ . Notice that, in the minimal symmetric logic LG∅ , R⊕ and R⊗ are distinct accessibility relations: the ⊗ and ⊕ families are not interexpressible in terms of De Morgan dualities. Frame constraints corresponding to the Grishin interaction postulates of the group (7) or (7)−1 will determine how their interpretation is related. x A ⊗ B iff ∃yz.R⊗ xyz and y A and z B y C/B iff ∀xz.(R⊗ xyz and z B) implies x C z A\C iff ∀xy.(R⊗ xyz and y A) implies x C
(15)
x A ⊕ B iff ∀yz.R⊕ xyz implies (y A or z B) B and x C y C B iff ∃xz.R⊕ xyz and z z A C iff ∃xy.R⊕ xyz and y A and x C
(16)
Henkin construction. To establish completeness, we use a Henkin construction. In the Henkin setting, “worlds” are (weak) filters: sets of formulas closed under . Let F be the formula language of (1). Let F = {X ∈ P(F ) | (∀A ∈ X)(∀B ∈ F ) A B implies B ∈ X}. The set of filters F is closed under the operations ·), (· ·) defined in (17) below. It is easy to show that X ⊗ Y and X Y (· ⊗ are indeed members of F . Y = {C | ∃A, B (A ∈ X and B ∈ Y and A ⊗ B C)} X⊗ Y = {B | ∃A, C (A ∈ X
X and C ∈ Y and A C B}, Y = {B | ∃A, C (A X ∈ X and C ∈ Y and C A ⊕ B}
alternatively
(17) To lift the type-forming operations to the corresponding operations in F , let A be the principal filter generated by A, i.e. A = {B | A B} and A its principal ideal, i.e. A = {B | B A}. Writing X ∼ for the complement of X, we have (†)
B A ⊗ B = A ⊗
C (‡) A C = A ∼
proof. (†)(⊆) Suppose C ∈ A ⊗ B, i.e. A⊗B C. With A := A and B := B we claim ∃A , B such that A A , B B and A ⊗ B C, which by (Def ⊗) means that C ∈ A ⊗ B as desired. For the (⊇) direction, we will prove the following lemma: B ⊆ X. Lemma 1. A ⊗ B ∈ X implies A ⊗ B ⊆ A ⊗ B. Since A⊗ B ∈ A ⊗ B by definition, we then have A ⊗ B, i.e. ∃A , B such that A ∈ A proof of lemma 1. Suppose C ∈ A ⊗ i.e. A A , B ∈ B i.e. B B , and A ⊗ B C. By Monotonicity, A ⊗ B A ⊗ B . By Transitivity, A ⊗ B C. Together with A ⊗ B ∈ X this implies C ∈ X as desired. The (‡) case is entirely similar. (‡)(⊆) Suppose B ∈ A C, i.e. A C B. With A := A and C := C we claim ∃A , C such that A A, C C and
218
N. Kurtonina and M. Moortgat
means that B ∈ A ∼ C as desired. For the A C B, which by (Def ) (⊇) direction, we show that the folloing holds: C ⊆ X. Lemma 2. A C ∈ X implies A ∼ Since AC ∈ A C by definition, we then have A ∼ C ⊆ A C. C, i.e. ∃A , C such that A ∈ A ∼ proof of lemma 2. Suppose B ∈ A ∼ i.e. A A, C ∈ C i.e. C C , and A C B. By Monotonicity, A C A C . By Transitivity, A C B. Together with A C ∈ X this implies B ∈ X as desired. c c Canonical model. Consider Mc = W c , R⊗ , R⊕ , V c with
W c = F c Z⊆X R⊗ XY Z iff Y ⊗ c X⊆Z R⊕ XY Z iff Y V c (p) = {X ∈ W c | p ∈ X} Truth lemma. We want to show for any formula A ∈ F and filter X ∈ F that X A iff A ∈ X. The proof is by induction on the complexity of A. The base case is handled by V c . Let us look first at the connectives ⊕, , . Coproduct. X A ⊕ B iff A ⊕ B ∈ X (⇒) Suppose X A ⊕ B. We have to show that A ⊕ B ∈ X. By (Def ⊕) we X ⊆ Z and Y have that ∀Y, Z (Y A) implies Z B. Setting Y := A ∼ X, the antecedent (therefore, A ∈ / Y and, by IH for Y , Y A) and Z := Y holds, implying Z B. By IH and the choice of Z we then have B ∈ Z and X. By (Def ) B ∈ A ∼ X means ∃A1 , A2 such that A1 ∈ A ∼ , B ∈ A ∼ ∼ A2 ∈ X and A2 A1 ⊕B. A1 ∈ A means A1 A, hence from A2 A1 ⊕B we get A2 A ⊕ B by Transitivity. Since X is a filter, from A2 ∈ X and A2 A ⊕ B we obtain A ⊕ B ∈ X as desired. c XY Z (⇐) Suppose A ⊕ B ∈ X. We have to show that X A ⊕ B, i.e. ∀Y, Z (R⊕ c and Y A) implies z B. Assume R⊕ XY Z and Y A. We have to show c XY Z and A ∈ Y and Z B. Using IH and the facts we already have (R⊕ A ⊕ B ∈ X) we conclude that A (A ⊕ B) ∈ Z. But A (A ⊕ B) B, so B ∈ Z and by IH Z B. This is what was needed to show.
Left difference. X A B iff A B ∈ X (⇒) Suppose X A B. We have to show that A B ∈ X. X A B means c Z ⊆ X, and Y ∃Y, Z such that R⊕ ZY X, i.e. Y A and Z B. By IH we conclude that A ∈ Y and B ∈ Z. Since also A B A B, from (Def ) A B ∈ Y Z and therefore A B ∈ X as desired. (⇐) Suppose A B ∈ X. We have to show that X A B. It was shown B ⊆ X, which means we have in Lemma 2 that A B ∈ X implies A ∼ c B A ∼ X. Since A ∈ A ∼ and B ∈ B, by IH we claim ∃Y, Z such that R⊕ c R⊕ ZY X and Y A and Z B, which means X A B as desired.
Relational Semantics for LG
219
Right difference. X B A iff B A ∈ X c Z ⊆ Y (Def R⊕ (⇒) Suppose X B A, i.e. ∃Y, Z such that X ) and Y A and Z B, i.e. by IH B ∈ Z. To show that B A ∈ X, we reason by contradiction and assume B A ∈ X. From this assumption and B ∈ Z we have Z by (Def ). Since (B A) B A, A ∈ X Z, so also (B A) B ∈ X A ∈ Y . Contradiction with Y A, hence the assumption B A ∈ X doesn’t hold, as required.
(⇐) Suppose B A ∈ X. To show that X B A we proceed by contraposic ZXY and Y A) implies Z B, tion and assume X B A, i.e. ∀Y, Z (R⊕ alternatively (X Z ⊆ Y and Z B) implies Y A. Setting Y := X Z and Z A and by IH A ∈ X Z. By Z := B, the antecedent holds, hence X ∈ X, A2 ∈ B and A2 A1 ⊕ A. (Def ) this means ∃A1 , A2 such that A1 From A2 ∈ B we have B A2 , so by Transitivity, B A1 ⊕ A, and by Dual residuation, B A A1 . Since A1 ∈ X, B A ∈ X, contradicting our original assumption. For the ⊗, /, \ connectives, we refer to [11] (Theorem 3.3.2, p 75), repeated here for convenience. Product. X A ⊗ B iff A ⊗ B ∈ X Z ⊆ X, Y A and Z B. (⇒) Suppose X A ⊗ B, i.e. ∃Y, Z such that Y ⊗ we have A ⊗ B ∈ X By IH, A ∈ Y and B ∈ Z. Since A ⊗ B A ⊗ B, by (Def ⊗) as desired. (⇐) Suppose A ⊗ B ∈ X. In Lemma 1 we have shown that this implies A ⊗ c c B ⊆ X, i.e. R⊗ XAB by (Def R⊗ ). Since A ∈ A, B ∈ B, by IH we have A A, B B. By the truth condition for ⊗ this means X A ⊗ B as desired. Right division. We do X A\B iff A\B ∈ X. The / case is symmetric. c (⇒) Suppose X A\B, i.e. ∀Y, Z if R⊗ ZY X and Y A then Z B. Putting c X, since A ⊗ X ⊆ A ⊗ X we have R⊗ ZY X by (Def Y := A and Z := A ⊗ c X B, and by IH R⊗ ), and since A ∈ A also A A by IH, hence A ⊗ this means ∃C, D such that C ∈ A i.e. A C, D ∈ X B ∈ A ⊗X. By (Def ⊗) and C ⊗ D B. By Transitivity, A ⊗ D B and by Residuation, D A\B. Hence A\B ∈ X as desired. c ZY X (⇐) Suppose A\B ∈ X. We have to show that X A\B, i.e. ∀Y, Z if R⊗ X ⊆Z and Y A then Z B. Suppose the antecedent holds, which means Y ⊗ c ) and A ∈ Y by IH. Together with A\B ∈ X we have A ⊗ (A\B) ∈ Z by (Def R⊗ by (Def ⊗). Since A ⊗ (A\B) B, also B ∈ Z. By IH Z B which means the consequent of the truth condition for \ holds, hence X A\B as desired. This establishes the Truth Lemma, from which completeness immediately follows.
220
N. Kurtonina and M. Moortgat
Theorem. Completeness of LG∅ . If |= A B, then A B is provable in LG∅ . proof. Suppose A B is not provable. Then, by the Truth Lemma, Mc , A B. Since Mc , A A, we have Mc |= A B, and hence |= A B. Completeness of extensions with interaction principles. In the minimal symmetric system, the R⊗ and R⊕ accessibility relations are distinct. For the extensions with Grishin interaction principles, we have frame constraints relating the interpretation of R⊗ and R⊕ . Consider first the group (7), or a set of postulates equivalent to it such as (8.1)–(8.4). We take (8.1) as a representative: (A B) ⊗ C A (B ⊗ C); the other cases are similar. For (8.1) we have the constraint in (18) (where R(−2) xyz = Rzyx). (−2)
∀xyzwv (R⊗ xyz ∧ R⊕
(†)
W
(‡)
V
Y @ @@ @@ @@ X
(−2)
ywv) ⇒ ∃t (R⊕
~~ ~~ ~ ~~
Z
xwt ∧ R⊗ tvz)
V @ @@ @@ @@ T
W
(18)
Z
X
In (†) we depict X (A B) ⊗ C, with W A, V B and Z C; in (‡) c c X A (B ⊗ C). Dotted lines represent R⊕ , solid lines R⊗ . We have to show that in the Henkin model ∀X, Y, Z, V, W construed as in (†), there is a fresh internal T connecting the root X to the leaves W, V, Z as in c Z ⊆V ⊗ Z. To also Z gives us R⊗ T V Z since V ⊗ (‡). The solution T := V ⊗ c T ⊆ X, suppose A ∈ W T . We need to show that show R⊕ T W X, i.e. W A ∈ W T means ∃A1 , A2 such that A1 ∈ W , A2 ∈ T and A ∈ X. By (Def ) A1 A2 A . Since T := V ⊗Z, A2 ∈ T means ∃B1 , B2 such that B1 ∈ V , B2 ∈ Z ∈ W and and B1 ⊗ B2 A2 . Taking the configuration (†) together with A1 B1 ∈ V , we conclude Y A1 B1 which in (†) together with B2 ∈ Z implies that X (A1 B1 ) ⊗ B2 . By the Truth Lemma, this means that (A1 B1 ) ⊗ B2 ∈ X and since X is a filter and (8.1) an axiom, A1 (B1 ⊗ B2 ) ∈ X. But since B1 ⊗ B2 A2 we conclude that A1 A2 ∈ X. Together with A1 A2 A , since X is a filter, we obtain A ∈ X as desired. Consider next the group of interaction principles (7)−1 , the converses of (7). As a representative, we take (11.1): (A B) ⊗ C A (B ⊗ C). This time, we have to show that in the Henkin model ∀X, T, Z, V, W construed as in (‡), there is a fresh internal Y connecting the root X to the leaves W, V, Z c V . Since W V ⊆W V , R⊕ as in (†). Let Y := W V W Y holds. To show c that also R⊗ XY Z, i.e. Y ⊗ Z ⊆ X, suppose A ∈ Y ⊗ Z, and let us show that A ∈ Y ⊗ Z means ∃A2 B1 such that A2 ∈ Y , B1 ∈ Z A ∈ X. By (Def ⊗),
Relational Semantics for LG
221
V , A2 ∈ Y by (Def ) means and A2 ⊗ B1 A . Since we had Y := W ∃A3 C1 such that A3 ∈ W , C1 ∈ V and A3 C1 A2 . Given that C1 ∈ V and B1 ∈ Z, in the configuration (‡) we have T C1 ⊗ B1 , and since A3 ∈ W, X A3 (C1 ⊗ B1 ). By the Truth Lemma this means that A3 (C1 ⊗ B1 ) ∈ X, and also (A3 C1 ) ⊗ B1 ∈ X, since X is a filter and we have (11.1). Since A3 C1 A2 , we can conclude A2 ⊗ B1 ∈ X, and since A2 ⊗ B1 A , also A ∈ X as desired.
4
Discussion
We have established completeness for the minimal symmetric Lambek calculus LG∅ and for its extension with interaction principles. The construction is neutral with respect to the choice between (7) and (7)−1 : it accommodates (8.1)–(8.4) and the converses in an entirely similar way. In further research, we would like to consider more structured models with a bias towards either (7) or the converse principles; the working hypothesis would be that such a bias would reflect the distinction between discontinuous dependencies of the in situ binding and of the extraction type. Since this paper was originally presented, a number of authors have discussed calculi similar to LG, in a setting where also the lattice operations are considered. We refer to the ‘double residuated algebra’s’ of [5], the generalized Kripke frames of [8], and the symmetric generalized Galois logics of [4]. We defer a comparison of these works with the approach taken here to another occasion.
References 1. Bastenhof, A.: Continuations in natural language syntax and semantics. MPhil Linguistics, Utrecht University (2009) 2. Bastenhof, A.: Extraction in the Lambek-Grishin Calculus. In: Icard, T. (ed.) Proceedings of the 14th Student Session of the European Summer School in Logic, Language and Information, Bordeaux, pp. 106–116 (2009) 3. Bernardi, R., Moortgat, M.: Continuation semantics for the LambekGrishin calculus. Information and Computation 208(5), 397–416 (2010), doi:10.1016/j.ic.2009.11.005 4. Bimb´ o, K., Dunn, J.M.: Symmetric generalized Galois logics. Logica Universalis 3(1), 125–152 (2009) 5. Buszkowski, W.: Interpolation and FEP for logics of residuated algebras. Logic Journal of the IGPL. Special issue Logic, Algebra and Truth Degrees (LATD 2008) (to appear), doi:10.1093/jigpal/jzp094 6. Cockett, J.R.B., Seely, R.A.G.: Proof theory for full intuitionistic linear logic, bilinear logic and mix categories. Theory and Applications of Categories 3, 85–131 (1996) 7. Galatos, N., Jipsen, P., Kowalski, T., Ono, H.: Residuated Lattices: An Algebraic Glimpse at Substructural Logics. Studies in Logic and the Foundations of Mathematics, vol. 151. Elsevier, Amsterdam (2007) 8. Gehrke, M.: Generalized Kripke frames. Studia Logica 84(2), 241–275 (2006)
222
N. Kurtonina and M. Moortgat
9. Gor´e, R.: Substructural logics on display. Logic Journal of IGPL 6(3), 451–504 (1997) 10. Grishin, V.N.: On a generalization of the Ajdukiewicz-Lambek system. In: Mikhailov, A.I. (ed.) Studies in Nonclassical Logics and Formal Systems, Nauka, Moscow, pp. 315–334 (1983); English translation in Abrusci, Casadio (eds.): Proceedings 5th Roma Workshop. Bulzoni Editore, Roma (2002) 11. Kurtonina, N.: Frames and labels. A modal analysis of categorial inference. PhD thesis, OTS Utrecht University, ILLC Amsterdam University (1995) 12. Lambek, J.: The mathematics of sentence structure. American Mathematical Monthly 65, 154–170 (1958) 13. Lambek, J.: On the calculus of syntactic types. In: Jakobson, R. (ed.) Structure of Language and Its Mathematical Aspects, pp. 166–178. American Mathematical Society (1961) 14. Lambek, J.: From categorial to bilinear logic. In: Doˇsen, K., Schr¨ oder-Heister, P. (eds.) Substructural Logics, pp. 207–237. Oxford University Press, Oxford (1993) 15. Melissen, M.: The generative capacity of the Lambek-Grishin calculus: A new lower bound. In: de Groote, P. (ed.) Proceedings 14th Conference on Formal Grammar. LNCS, vol. 5591, Springer, Heidelberg (2010) 16. Moortgat, M.: Generalized quantifiers and discontinuous type constructors. In: Bunt, H., van Horck, A. (eds.) Discontinuous Constituency, pp. 181–207. De Gruyter, Berlin (1996) 17. Moortgat, M.: Multimodal linguistic inference. Journal of Logic, Language and Information 5(3,4), 349–385 (1996) 18. Moortgat, M.: Symmetric categorial grammar. Journal of Philosophical Logic 38(6), 681–710 (2009) 19. Moortgat, M., Pentus, M.: Type similarity for the Lambek-Grishin calculus. In: Proceedings 12th Conference on Formal Grammar, Dublin (2007) 20. Moot, R.: Proof nets for display logic. CoRR, abs/0711.2444 (2007) 21. Moot, R.: Lambek grammars, tree adjoining grammars and hyperedge replacement grammars. In: Proceedings of TAG+9, The 9th International Workshop on Tree Adjoining Grammars and Related Formalisms, T¨ ubingen, pp. 65–72 (2008) 22. Morrill, G.: Type Logical Grammar. Kluwer, Dordrecht (1994) 23. Morrill, G., Fadda, M., Valentin, O.: Nondeterministic discontinuous Lambek calculus. In: Proceedings of the Seventh International Workshop on Computational Semantics (IWCS 2007), Tilburg (2007) 24. Vermaat, W.: The logic of variation. A cross-linguistic account of wh-question formation. PhD thesis, Utrecht Institute of Linguistics OTS, Utrecht University (2006)
Intersecting Adjectives in Syllogistic Logic Lawrence S. Moss Department of Mathematics, Indiana University, Bloomington IN 47405 USA
[email protected]
Abstract. The goal of natural logic is to present and study logical systems for reasoning with sentences of (or which are reasonably close to) ordinary language. This paper explores simple systems of natural logic which make use of intersecting adjectives; these are adjectives whose interpretation does not vary with the noun they modify. Our project in this paper is to take one of the simplest syllogistic fragments, that of all and some, and to add intersecting adjectives. There are two ways to do this, depending on whether one allows iteration or prefers a “flat” structure of at most one adjective. We present rules of inference for both types of syntax, and these differ. The main results are four completeness theorems: for each of the two types of syntax we have completeness for the all fragment and for the full language of this paper. Keywords: syllogistic logic,completeness, adjectives, transitive relations.
1
Introduction: Intersecting Adjectives
By “natural logic” I mean the study of logical systems designed to model linguistic inference in a manner which is as “close to the surface” as possible. The idea is to study inference in language on its own terms, and hopefully to obtain sound and complete systems for linguistic inference which are also decidable. This contrasts with approaches that go via translation to first-order logic because first-order logic is undecidable, and because work done via translation does not yield logical systems in the first place. Among the simplest kind of logical systems of the type studied in this paper are ones derived from the classical syllogistic. These are extremely small logical systems, containing as sentences only expressions of the form all p and q and some p are q. The classical syllogistic can be viewed as a logical system, and then one could study its properties. The earliest work on this topic may be found in L ukasiewicz [2], and the goal there was to propose a modern reconstruction of the ancient sources of logic. In contrast, most of the contemporary interest in the topic is aimed at other matters: decidable fragments of language; alternatives to model-theoretic semantics based on proof theory; and logical systems for human reasoning. For examples, see Nishihara et. al [5] as well as [3,4,6]. The main new point in this paper concerns a class of adjectives call intersecting adjectives. This class includes the color adjectives, also male and female, and frequently also nationality adjectives such as Xhosa and Yoruba. Intersecting C. Ebert, G. J¨ ager, and J. Michaelis (Eds.): MOL 10/11, LNAI 6149, pp. 223–237, 2010. c Springer-Verlag Berlin Heidelberg 2010
224
L.S. Moss
adjectives have two defining features, and as we shall see these features are closely related. The first is a proof-theoretic feature noted by Keenan and Faltz [1], p. 123: The sense in which an intersecting adjective determines a property can be described as follows: If Dana is a female student and Dana is also an athlete, then Dana is a female athlete.
(1)
The second feature of the intersecting adjectives is more semantic: in a standard model-theoretic semantics, the interpretation of a phrase such as female shopkeeper would be the intersection of the interpretation of shopkeeper (some subset of the underlying universe of discourse) with a set of “female individuals”. In this respect, the intersecting adjectives differ from the larger class of adjectives. To recall an oft-made point, consider a (non-intersecting) adjective such as tall. It may well be that a person could simultaneously be a tall student but not a tall basketball player. And this would mean that tall lacks both the proof-theoretic and model-theoretic features of the intersecting adjectives. That is, the statement of Keenan and Faltz above would be false with tall replacing female, and it also would not be sensible to interpret tall student in a model by intersecting the interpretation of student with a fixed set interpreting tall. We are interested in syllogistic inferences using the intersecting adjectives in addition to the determiners all and some. To make things precise, we must settle on a formal syntax and semantics. However, though the fragment of interest is very small indeed, the syntax already gives us pause. For intersecting adjectives can iterate, as in The driver was a gay Albanian with a brown-spotted partlygrey white dog. In fact, although color adjectives do not usually iterate on their own, if one adds words like “partly”, then we do get iteration: The partly blue, partly red, partly green ball was lost in the attic. For this reason, we propose two versions of the syntax. First, a flat syntax where nouns are either basic or contain an iterating adjective (Section 2). We call the languages of that section L(∀, adj) and L(∀, ∃, adj). We give a proof theory and completeness theorem for this language before turning to our second syntax, the languages Lr (∀, adj) and Lr (∀, ∃, adj) (Section 3). It might be interesting to mention that the rules of inference of our systems are indirectly based on the formulation of Keenan and Faltz from (1). (We say “indirectly” because our logical languages do not have proper nouns.) In a sense, one could state the basic issue of this paper: is (1) all that one could generally say about intersecting adjectives with the standard semantics? Does everything else follow from (1), together with more general facts about all and some? Or are there yet other logical principles waiting to be discovered? We shall return to this point at the end of the paper.
2
L(∀, adj) and L(∀, ∃, adj): Non-productive Syntax
Our syntax begins with basic nouns x1 , x2 , . . . and then adds intersecting adjectives a1 , a2 , . . . We then define the set of nouns, and denote nouns by letters like
Intersecting Adjectives in Syllogistic Logic
225
n, p, and q, by saying that the basic nouns are nouns, and if x is a noun and a an intersecting adjective, then a x is a noun. We call nouns of the form a x complex nouns. This is a very simple model of predication. It is also non-productive in the sense that nouns may contain only zero or one adjective, not more. Later in the paper we shall explore the possibility of re-working the syntax so that predication is productive (see Section 3). In what follows, we usually “abbreviate” the intersecting adjectives with color adjectives red , blue, and green. This helps to avoid subscripts, and it seems to improve readability. At first, the only sentences which we consider are those of the form ∀(p, q), read as all p are q. The collection of these sentences is called L(∀, adj). Later, we’ll expand this to a language L(∀, ∃, adj) by adding sentences some p are q. Our semantics for L(∀, adj) is based on models M for the fragment. A model consists of a set M , subsets [[x]] ⊆ M for each basic noun x, and sets [[a]] for the intersecting adjectives. Then we define the semantics of a noun a x by [[a x]] = [[a]] ∩ [[x]]. We define the relation of truth between models and sentences in the obvious way: M |= ∀(p, q) iff [[p]] ⊆ [[q]]. Then we say that M |= Γ if M |= S for all sentences S ∈ Γ . The main semantic definition is given by Γ |= S if for all models M, if M |= Γ , then also M |= S. The first logical question about this semantic notion is whether there is a matching proof-theoretic counterpart. 2.1
L(∀, adj): All and Intersecting Adjectives
The simplest syllogistic fragment “of all” is simply the collection of sentences of the form All n are p, where n and p are nouns. We shall call this language L(∀, adj), and later we shall expand it to L(∀, ∃, adj). Previous work studied the case of nouns without modifiers, and in this paper we allow nouns to be modified by intersecting adjectives. A logical system for L(∀, adj) is presented in Figure 1. The rules (T) and (B) are standard in syllogistic logic. (T) reflects our decision to let all p are p statements be valid (i.e., true in all models), regardless of whether a given model has p or not. The rule (B) gets its name from the classical syllogism Barbara. We shall not give a precise definition of a proof tree in a syllogistic logic, but the idea is that it should be a tree labeled with sentences, all of whose internal nodes match one of the rules of the logic. For a more precise definition, see Pratt-Hartmann and Moss [6]. The examples throughout this paper should help make this clear. If Γ ∪ {S} is a set of sentences in this fragment, we write Γ S to mean that there is a proof tree whose root is labeled S and whose leaves are labeled with sentences in the set Γ . This same definition works for all our fragments and all logics. All of our systems are sound : if Γ S, then Γ |= S. This easy point is shown by induction on derivations. Example 1. ∀(x, red y) ∀(x, red x). In words, if all x are red y, then all x are red x (hence red objects). Here is a derivation:
226
L.S. Moss
∀(n, n)
∀(n, p) ∀(p, q) (B) ∀(n, q)
(T)
∀(red x, x)
(Adj1 )
∀(n, red x) ∀(n, y) (Adj2 ) ∀(n, red y)
Fig. 1. The logic for the fragment L(∀, adj) of sentences ∀(n, p), read as all n are p. Note that x and y denote basic nouns, and n, p, and q denote noun which are either basic or complex. (T)
∀(x, red y) ∀(x, x) (Adj2 ) ∀(x, red x) Perhaps the most interesting single-premise inference available in this system is the following monotonicity fact. Example 2. ∀(x, y) ∀(red x, red y). The derivation is indicated below: (Adj1 )
∀(red x, x) ∀(x, y) (B) ∀(red x, red x) ∀(red x, y) (Adj2 ) ∀(red x, red y) (T)
Theorem 1. The logic of Figure 1 is complete for L(∀, adj): if Γ |= ∀(n, p), then Γ ∀(n, p). Proof. Suppose that Γ |= ∀(n, p); we show that Γ ∀(n, p). Consider a model M whose universe M is a singleton {∗}, and whose structure is given by {∗} if Γ ∀(n, x) [[x]] = ∅ if Γ ∀(n, x) [[red ]]
=
{∗} if for some basic noun x, Γ ∀(n, red x) ∅ otherwise
These definitions are made using the specific noun n from our overall statement of the theorem. We first claim that M |= Γ . Take a sentence in Γ such as ∀(l1 , l2 ). We have four cases, depending on whether l1 and l2 are basic or complex nouns. The most interesting is when l1 is red x and l2 is blue y. Again, we must show that [[red x]] ⊆ [[blue y]]. For this, we may assume that [[red x]] = ∅; otherwise, we trivially have the desired conclusion. Hence [[x]] = {∗}, so Γ ∀(n, x). Also, [[red]] must be {∗}, so for some z, Γ ∀(n, red z). Using (Adj2 ), Γ ∀(n, red x).
Intersecting Adjectives in Syllogistic Logic
227
Thus Γ ∀(n, blue y). Using (Adj1 ) and (B), we have Γ ∀(n, y). So ∗ ∈ [[blue]] ∩ [[y]] = [[blue y]]. This completes the proof of our claim. We have verified that M |= Γ . Recalling that Γ |= ∀(n, p), we have M |= ∀(n, p). We again have four cases, and we only mention two of them. First, in case n is a basic noun x, we have ∗ ∈ [[n]] by (T). Then ∗ ∈ [[p]] as well, and this means that Γ ∀(n, p). Second, assume that n is of the form red z also. Then using (Adj1 ), ∗ ∈ [[n]]. Hence again ∗ ∈ [[p]]. We only deal with the case that p is of the form blue w. So Γ |= ∀(n, w); also, for some basic noun z, Γ ∀(n, blue z). By (Adj2 ), Γ ∀(n, blue w). That is, Γ ∀(n, p). This completes the proof. 2.2
Proof Rules for some and Intersecting Adjectives
Next, we add sentences ∃(p, q) to our fragment. We call the resulting language L(∀, ∃, adj). The obvious semantics is to say that in a model M we have M |= ∃(p, q) just in case [[p]] ∩ [[q]] = ∅. We aim to study the semantic consequence relation Γ |= S, and especially to associate with it a sound and complete proof system. Figure 2 provides some sound inference rules, and the system that we study has as its rules the rules in Figures 1 and 2. The first two rules in Figure 2 come from syllogistic logic. Forgetting the adjectives for a moment, the rules (T), (B), (I), and (D) are complete for the language of sentences all p are q and some p are q; see [3], Theorem 4. There are related results in L ukasiewicz [2] and Westerst˚ ahl [7]. The name (D) comes from its name in classical syllogistics, Darii. The “twisted” form our formulation of (D) implies that the conversion property of some is derivable: ∃(n, p) ∃(p, n). (For this, take p = q in (D).) In the remainder of this paper, we use conversion frequently, and usually without mention. Example 3. ∃(red x, blue y) ∃(blue x, red y). Use (Adj5 ), taking green to be red , and z to be x. Example 4. ∃(x, red y) ∃(y, red x). On the left is a derivation of this reciprocity rule: ∃(x, red y) (Adj3 ) (Adj1 ) ∃(red x, red y) ∀(red y, y) (D) ∃(red x, y) Example 5. ∃(red x, blue y) ∃(blue x, red x). Use (Adj4 ), taking green to be red , and z to be x. Example 6. ∃(x, y), ∀(x, red z) ∃(x, red y). Here is a derivation: ∃(x, y) ∀(x, red z) Example 1 ∃(y, x) ∀(x, red x) (D) ∃(y, red x) Example 4 ∃(x, red y)
228
L.S. Moss ∃(n, p) (I) ∃(n, n)
∃(n, q) ∀(q, p) (D) ∃(p, n)
∃(x, red y) (Adj3 ) ∃(red x, red y)
∃(red x, blue y) ∀(red x, green z) (Adj4 ) ∃(red x, blue z) ∃(red x, blue y) ∀(red x, green z) (Adj5 ) ∃(blue x, green y) Fig. 2. Additions to Figure 1 for the larger language L(∀, ∃, adj) which contains sentences ∃(p, q)
In what follows, a sequent is a pair σ = (Γ, S) consisting of a set of sentences (of some fragment under discussion) and a sentence S of it. The set Γ is the set of premises of the sequent σ. The sequent is valid if Γ |= S. Note that completeness of a logical system is just the statement that every valid sequent is provable in the system. Proposition 1. Let σ be a sequent in L(∀, ∃, adj) with one or two premises. If σ is valid, then it is provable. For the verification, we begin with the one-premise sequents. The only valid ones are listed below, along with reasons for why they are provable: ∀(x, y) |= ∀(x, x): use (Adj1 ). ∀(x, red y) |= ∀(x, red x): use (Adj2 ) as in Example 6. ∃(x, y) |= ∃(y, x): use (D) with n = x and p = y = q. ∃(red x, n) |= ∃(x, x): use (I) to get ∃(red x, red x), and then use (D) and (Adj1 ) to get ∃(red x, x). Then use conversion and (I). 5. ∃(x, red y) |= ∃(y, red x): see Example 4. 6. ∃(red x, blue y) |= ∃(blue x, red y): see Example 3. 7. ∃(red x, blue y) |= ∃(blue x, red x): see Example 5. 1. 2. 3. 4.
We mentioned that these are the “only valid” sequents. To show that a given sequent is not valid, one may use a semantic argument. We shall see some of these below. There are a few others, and they are all minor modifications on the list above. For example, one can conclude ∀(x, x) from any premise. We next turn to the rules with two premises. For rules with two universal premises, the only sound ones are instances of (B), perhaps also involving monotonicity. For example, ∀(x, y), ∀(red y, blue z) |= ∀(red x, blue z). And in our system we have a corresponding proof: use monotonicity (Example 2) and the first premise to see ∀(red x, red y), and then use (B) with this and the other premise.
Concerning two existential premises, it is easy to see that if S, T, and U are existential and S, T |= U, then either S |= U or T |= U. For if not, take a model M of S but not U, and a model N of T but not U, and then take the disjoint union. This would satisfy S and T but not U, a contradiction. The main work concerns the case when one premise is existential and the other is universal. In what follows, we are only going to consider existential consequences. The first two premise forms are as follows:

1. ∃(x, y), ∀(red x, n).
2. ∃(x, y), ∀(x, blue z).

There are no sound conclusions of form (1) beyond what one can infer from the existential premise ∃(x, y). The second form has as a sound conclusion ∃(x, blue y), and this was treated in Example 6. The other sound conclusions are ∃(x, blue z) (this is easy), and ∃(y, blue z) (this comes from (D)). The next forms would involve an existential sentence with one adjective, say ∃(red x, y). But this has the same models as ∃(red x, red y), and it is inter-derivable with it. So we may proceed to forms with premises containing a sentence of the form ∃(red x, blue y). The relevant forms continue thus:

3. ∃(red x, blue y), ∀(x, green z).
4. ∃(red x, blue y), ∀(red x, z).
5. ∃(red x, blue y), ∀(red x, green z).

In form (3), one shows that for all existential sentences S,

∃(red x, blue y), ∀(x, green z) |= S    iff    ∃(red x, blue y), ∀(red x, green z) |= S.
Here again, the argument is semantic. The implication from right to left is trivial, so assume the assertion on the right. Let M satisfy ∃(red x, blue y) and ∀(x, green z). Consider the submodel M_red of M induced by [[red ]]. Then M_red satisfies ∃(red x, blue y) and ∀(red x, green z). So M_red |= S. Since S is existential, we also have M |= S, as desired. The upshot is that form (3) is subsumed by form (5). Form (4) is easier, since ∀(red x, z) and ∀(red x, red z) are inter-derivable. We consider in full detail the premise form ∃(red x, blue y), ∀(red x, green z). Here are all of the sound conclusions of the form ∃(a u, b v), omitting ones which are related by a use of Example 3, together with an accounting of how each is proved in our system:

∃(red x, blue y): this is the first premise.

∃(red x, blue z): use (Adj4).

∃(red y, blue z): first use the second premise to get ∀(red x, red z). Then from this and the first premise and (D) get ∃(blue y, red z). Then use Example 3.

∃(red x, green y): use the first premise to get ∃(red y, red x), and the second premise to get ∀(red x, green x). Then use (D) to get ∃(red y, green x). Finally, use Example 3.
∃(red x, green z): use the second premise, with ∃(red x, red x) from the first.

∃(red y, green z): first use the first premise to get ∃(red y, red x). Then from this and the second premise and (D) get ∃(red y, green z).

∃(blue x, green y): from (Adj5).

∃(blue x, green z): use the first premise to get ∃(blue x, red x); see Example 5. Now use (D).

∃(blue y, green z): use (D) after inferring ∃(blue y, red x) from the first premise.

This concludes our discussion of valid two-premise sequents and Proposition 1.
2.3 The Completeness Theorem
At this point, we have examined the proof system and know that it is strong enough to prove all of the valid two-premise sequents. We are ready to prove that the system is complete.

Notation. If Γ is a set of sentences, we write Γ∀ for the subset of Γ containing only sentences of the form ∀(n, p). We do this for Γ∃, mutatis mutandis.

Theorem 2. The logic of Figures 1 and 2 is complete for L(∀, ∃, adj): if Γ |= S, then Γ ⊢ S.

Proof. Suppose that Γ |= S. There are two overall cases, depending on whether S is of the form ∀(n, m) or of the form ∃(n, m). In the first case, we claim that Γ∀ |= S. To see this, let M |= Γ∀. We get a new model M′ whose universe is M ∪ {∗}, via [[x]]′ = [[x]] ∪ {∗} (and similarly for the adjectives). The model M′ so obtained satisfies Γ∀ and all ∃ sentences whatsoever in the fragment. Hence M′ |= Γ. So M′ |= S. And since S is a universal sentence, M |= S as well. This proves our claim that Γ∀ |= S. By Theorem 1, Γ∀ ⊢ S. Hence Γ ⊢ S.

The second case, where S is an existential sentence, is more interesting. Consider the following model M = M(Γ). Let M be the set of all unordered pairs {p, q} such that p and q are nouns, and Γ ⊢ ∃(p, q). (We may well have p = q in such a pair.) For each basic noun x and each intersecting adjective red we define sets [[x]]_i and [[red ]]_i for i = 0, 1, . . .; the sets we are after are the unions ∪_i [[x]]_i and ∪_i [[red ]]_i, and we take these to be [[x]] and [[red ]]. The sets are defined by:
1. If {p, q} ∈ M and p is basic, then {p, q} ∈ [[p]]_0.
2. If {p, q} ∈ M and p is red x, then {p, q} ∈ [[x]]_0 ∩ [[red ]]_0.
3. If {p, q} ∈ [[x]]_i and Γ ⊢ ∀(x, y), then {p, q} ∈ [[y]]_{i+1}.
4. If {p, q} ∈ [[x]]_i ∩ [[red ]]_i and Γ ⊢ ∀(red x, y), then {p, q} ∈ [[y]]_{i+1}.
5. If {p, q} ∈ [[x]]_i and Γ ⊢ ∀(x, blue y), then {p, q} ∈ [[y]]_{i+1} ∩ [[blue]]_{i+1}.
6. If {p, q} ∈ [[x]]_i ∩ [[red ]]_i and Γ ⊢ ∀(red x, blue y), then {p, q} ∈ [[y]]_{i+1} ∩ [[blue]]_{i+1}.
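The least-fixed-point character of this definition is easy to make concrete. The following sketch (ours, not from the paper) computes the stages above, given the set M of pairs and an oracle proves_all(p, q) for the derivability relation Γ ⊢ ∀(p, q); the oracle is an assumption of the sketch, standing in for whatever proof search one has for the universal fragment.

    # Sketch of the canonical model M(Gamma) of the completeness proof.
    # A noun is 'x' (basic) or ('red', 'x'); a "pair" is a frozenset of nouns.
    # proves_all(p, q) is an assumed oracle for "Gamma |- forall(p, q)".

    def canonical_model(pairs, basic_nouns, adjectives, proves_all):
        sem = {s: set() for s in basic_nouns + adjectives}

        def add(symbol, pair):
            if pair in sem[symbol]:
                return False
            sem[symbol].add(pair)
            return True

        # Clauses (1) and (2): stage 0.
        for pair in pairs:
            for p in pair:
                if isinstance(p, tuple):            # p is of the form "red x"
                    adj, x = p
                    add(x, pair); add(adj, pair)
                else:
                    add(p, pair)

        # Clauses (3)-(6), iterated until nothing changes: since each clause
        # only ever adds pairs, this computes the least fixed point.
        changed = True
        while changed:
            changed = False
            for x in basic_nouns:
                subjects = ([(x, set(sem[x]))] +
                            [((red, x), sem[x] & sem[red]) for red in adjectives])
                for subj, members in subjects:
                    for y in basic_nouns:
                        for blue in [None] + list(adjectives):
                            goal = y if blue is None else (blue, y)
                            if proves_all(subj, goal):
                                for pair in members:
                                    changed |= add(y, pair)
                                    if blue is not None:
                                        changed |= add(blue, pair)
        return sem

Note that the membership snapshots are recomputed on each pass of the while-loop, so pairs added at stage i are seen at stage i + 1, exactly as in the numbered clauses above.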
An easy induction shows that if Γ ⊢ ∀(x, y), then [[x]] ⊆ [[y]]. Moreover, this same fact is true for nouns containing adjectives. These facts imply that if a universal sentence ∀(p, q) belongs to Γ (so that Γ ⊢ ∀(p, q)), then indeed [[p]] ⊆ [[q]]. We also want to check the analogous fact for sentences ∃(p, q). As usual, we have a
number of cases, and we'll only mention the one when p is red x and q is blue y. Then {p, q} belongs to [[x]] ∩ [[red ]] ∩ [[y]] ∩ [[blue]]. Hence M |= ∃(p, q). As a result of these observations, M |= Γ. Since we began with the assumption that Γ |= S, we see that M |= S. Now S is an existential sentence, say ∃(n, m), and our goal is to show that Γ ⊢ ∃(n, m). In fact, we show the following facts:

1. If {u, v} ∈ [[x]] ∩ [[y]], then Γ ⊢ ∃(x, y).
2. If {u, v} ∈ [[x]] ∩ [[y]] ∩ [[red ]] ∩ [[blue]], then Γ ⊢ ∃(red x, blue y).

It is at this point that we use the fact that our semantics of nouns and intersecting adjectives was the least fixed point of a monotone inductive definition, so that we can argue by induction on it. That is, we show by induction on i that (a) if {u, v} ∈ [[x]]_i ∩ [[y]]_i, then Γ ⊢ ∃(x, y); and similarly for the other assertion.

The first base case of this induction is when {u, v} ∈ [[x]]_0 ∩ [[y]]_0 via clause (1) in the definition. Then we have a number of subcases. To mention one, it might be that u = x and v = y. Since {u, v} ∈ M, we have Γ ⊢ ∃(u, v). And thus Γ ⊢ ∃(x, y). For another subcase, it might be that u = x and also u = y. Now as we have seen, Γ ⊢ ∃(u, v), and by our logic, we also have Γ ⊢ ∃(u, u). So in this case, we again have Γ ⊢ ∃(x, y). Another base case in the induction is when {u, v} ∈ [[x]]_0 ∩ [[y]]_0 ∩ [[red ]]_0 via clauses (1) and (2) in the definition of the semantics. For example, we might have u = red y so that {u, v} ∈ [[y]]_0 ∩ [[red ]]_0, and also v = x. Then Γ ⊢ ∃(x, red y). And by the reciprocity fact noted in Example 4 we see that indeed Γ ⊢ ∃(red x, y). The last base case in the induction is when {u, v} ∈ [[x]]_0 ∩ [[y]]_0 ∩ [[red ]]_0 ∩ [[blue]]_0 via clauses (1) and (2) in the definition of the semantics. The arguments would be similar, and Example 3 would also be used.

Next, we turn to the induction steps proper. Here is an example. Suppose that {u, v} ∈ [[x]]_{i+1} ∩ [[y]]_{i+1} ∩ [[red ]]_{i+1} ∩ [[blue]]_{i+1} because {u, v} ∈ [[x]]_i ∩ [[w]]_i ∩ [[red ]]_i ∩ [[green]]_i and also Γ ⊢ ∀(green x, blue z) and Γ ⊢ ∀(w, y). By induction hypothesis, Γ ⊢ ∃(green x, red w). We then have the following derivation from Γ: applying (Adj5) to ∃(green x, red w) and ∀(green x, blue z) gives ∃(red x, blue w), and Proposition 1 applied to this and ∀(w, y) gives ∃(red x, blue y). (We are quoting Proposition 1 mostly because we did all the work to obtain that result.) There are, of course, many more induction steps. These all go through, and the main reason was mentioned before we started in on the proof of this theorem: we have included in the rules all of the sound two-premise rules that are expressible in the language. This fact is not directly used, but all of the reasoning that we have already seen would be used in the full verification here. This completes the proof.
2.4 A Note on (Adj4) and (Adj5)
At this point, we digress from our main line and make a comment on (Adj4) and (Adj5). We shall check that they are not derivable from the other rules in our system. To see this, take the premises ∃(red x, blue y) and ∀(red x, green z), and call them Γ. An easy induction on derivations shows that if S is universal and Γ ⊢ S without (Adj4) or (Adj5), then S must be of one of the following three forms: ∀(u, u) for some u, ∀(red u, u) for some u, or ∀(red x, green z). Now assume that Γ ⊢ ∃(red x, blue z), or that Γ ⊢ ∃(blue z, red x); again without (Adj4) or (Adj5). Take a derivation of minimal height. The last step in the derivation must be an application of (D). There are two cases, depending on the conclusion. They are similar, and we only go into details concerning ∃(blue z, red x). For some noun q, we must have Γ ⊢ ∃(red x, q) and Γ ⊢ ∀(q, blue z). By our observation in the last paragraph, q must be blue z or red x. If q is blue z, we contradict the minimality assertion. And if q is red x, we have Γ ⊢ ∀(red x, blue z), contradicting what we showed in the last paragraph. This shows that without (Adj4) or (Adj5), we cannot derive ∃(red x, blue z) from our premises. Similar work shows the same thing about ∃(blue x, green y). The upshot is that neither the conclusion of (Adj4) nor that of (Adj5) can be proved from these premises on the basis of (T), (B), (I), (D), (Adj1), (Adj2), and (Adj3) alone.
3 L^r(∀, ∃, adj): Productive Predication
Up to now, we have worked with the flat syntax of nouns. We move to the other choice, a recursive syntax. Here we would start with basic nouns x, y, . . ., and (intersecting) adjectives a1, a2, . . ., and then say that basic nouns are nouns, and if n is a noun and red an adjective, then red n is a noun. The semantics of nouns is then given by recursion, using the main clause

[[a n]] = [[a]] ∩ [[n]].
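A minimal sketch (ours) of this recursive denotation, for nouns encoded as nested pairs:

    # A recursive noun is a basic noun 'x' or a pair (adjective, noun), nested freely.
    def den(model, noun):
        if isinstance(noun, tuple):
            adj, rest = noun
            return model[adj] & den(model, rest)     # [[a n]] = [[a]] /\ [[n]]
        return model[noun]

    m = {'x': {1, 2, 3}, 'red': {1, 2}, 'blue': {2, 3}}
    print(den(m, ('red', ('blue', 'x'))))            # {2}
    print(den(m, ('blue', ('red', 'x'))))            # {2} as well

Since the clause is just an intersection, iterated adjectives commute and repetitions collapse; that is the semantic content of Examples 7 and 8 below.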
Then we define M |= S, M |= Γ, and Γ |= S as earlier. (See the end of Section 1.) Our main goal again is to provide a proof system, thereby defining a relation Γ ⊢_r S in a syntactic way, and then to show the connection in a soundness/completeness theorem. The proof system itself is listed in Figure 3. To keep straight the distinction between the proof system for L(∀, ∃, adj) and the one for L^r(∀, ∃, adj), we write Γ ⊢_r S for the derivation relation in this section. Our first examples concern iterated adjectives. Example 7 shows that applying the same adjective twice gives nothing new; perhaps this is a justification for why we never see phrases like red red ball in natural language. The two adjectives in Example 8 are likewise odd, but we encourage the reader to read red and blue as partly red and partly blue, or to remember our semantics.
(T): ∀(n, n).
(B): from ∀(n, p) and ∀(p, q), infer ∀(n, q).
(Adj1): ∀(red n, n).
(Adj2): from ∀(n, red p) and ∀(n, q), infer ∀(n, red q).
(I): from ∃(n, p), infer ∃(n, n).
(D): from ∃(n, q) and ∀(q, p), infer ∃(p, n).
(Adj3): from ∃(p, red q), infer ∃(red p, red q).

Fig. 3. The logical system for L^r(∀, ∃, adj). Note that x and y denote basic nouns, and n, p, and q denote complex nouns in the sense of this section.
Example 7. For all n, ⊢_r ∀(red n, red red n) and ⊢_r ∀(red red n, red n). For the first point, (T) gives ∀(red n, red n), and (Adj2) applied to this sentence (used as both premises) gives ∀(red n, red red n). The second point is an instance of (Adj1).

Example 8. ⊢_r ∀(red blue n, blue red n). Here is the derivation: (Adj1) gives ∀(red blue n, blue n) and ∀(blue n, n), and (B) combines these into ∀(red blue n, n). Also, (T) gives ∀(red blue n, red blue n). Applying (Adj2) to ∀(red blue n, red blue n) and ∀(red blue n, n) gives ∀(red blue n, red n), and applying (Adj2) again to ∀(red blue n, blue n) and ∀(red blue n, red n) gives ∀(red blue n, blue red n).

Theorem 3. The rules (T), (B), (Adj1), and (Adj2) give a complete proof system for L^r(∀, adj): if Γ |= ∀(n, p), then Γ ⊢_r ∀(n, p).

Proof. Suppose that Γ |= ∀(n, p); we show that Γ ⊢_r ∀(n, p). Consider a model M whose universe M is a singleton {∗}, and whose structure is given by

[[x]] = {∗} if Γ ⊢_r ∀(n, x), and [[x]] = ∅ otherwise;

[[red ]] = {∗} if for some basic noun x, Γ ⊢_r ∀(n, red x), and [[red ]] = ∅ otherwise.
These definitions are made using the specific noun n from our overall assumption in this proof.

Claim. For all nouns p,

    [[p]] = {∗} if Γ ⊢_r ∀(n, p), and [[p]] = ∅ otherwise.    (2)
The proof is by induction on p. For p a basic noun, the result is immediate. Assume (2) for p; we show (2) for red p. If ∗ ∈ [[red p]] = [[red ]] ∩ [[p]], then Γ ⊢_r ∀(n, p) and, for some x, Γ ⊢_r ∀(n, red x). By (Adj2), Γ ⊢_r ∀(n, red p), as desired. We now argue the converse. If Γ ⊢_r ∀(n, red p), then Γ ⊢_r ∀(n, p) using (B) and (Adj1). We write p as a1 · · · aj x, so that red p is red a1 · · · aj x, and then we argue by induction on j that ⊢_r ∀(red p, red x). If j = 0, this is immediate. If j ≥ 1, we show that ⊢_r ∀(red a1 a2 · · · aj x, red a2 · · · aj x), using (Adj1) and (B). (See Example 8.) And then by induction hypothesis, we have ⊢_r ∀(red a2 · · · aj x, red x). This concludes the induction showing that ⊢_r ∀(red p, red x), and from this we see that Γ ⊢_r ∀(n, red x). Therefore ∗ ∈ [[red ]]. Overall, ∗ ∈ [[red p]]. This completes the induction on p, hence the proof of this claim.

Continuing with the proof of Theorem 3, we next observe that M |= Γ. Take a sentence in Γ, such as ∀(l1, l2). We must show that [[l1]] ⊆ [[l2]]. For this, we may assume that [[l1]] ≠ ∅. Hence [[l1]] = {∗}, so Γ ⊢_r ∀(n, l1). Using (B), Γ ⊢_r ∀(n, l2), so again ∗ ∈ [[l2]]. We have verified that M |= Γ. Recalling that Γ |= ∀(n, p), we have M |= ∀(n, p). So by our claim, we have the desired conclusion that Γ ⊢_r ∀(n, p). This completes the proof.
3.1 Simulation of L(∀, ∃, adj) in L^r(∀, ∃, adj)
Our goal in the next section is to prove the completeness of L^r(∀, ∃, adj) using the proof system defined in Figure 3. Here are two ways that one could go about this. First, one could basically repeat the proof of Theorem 2. This would be a fairly direct modification. At the same time, it would be uninteresting to read. Instead, we shall present a different approach. For each of the rules in Figure 2, except possibly (Adj4) and (Adj5), the corresponding sequent is provable in the logic for L^r(∀, ∃, adj). In fact, this holds with the basic nouns in Figure 2 replaced by arbitrary nouns.

Proposition 2. Every instance of (Adj4) and (Adj5) in Figure 2 is provable in the logical system for L^r(∀, ∃, adj). Moreover, this holds with the basic nouns replaced by arbitrary nouns.

Proof. Here is the derivation for (Adj4), omitting some routine conversion steps. From the first premise ∃(red n, blue p), Example 5 gives ∃(red n, blue n), and (Adj3) then gives ∃(red n, blue red n). On the other side, (Adj1) gives ∀(green q, q); from this and the second premise ∀(red n, green q), rule (B) gives ∀(red n, q), and monotonicity (Example 2) gives ∀(blue red n, blue q). Finally, (D) applied to ∃(red n, blue red n) and ∀(blue red n, blue q) yields ∃(red n, blue q).
What we mean by Examples 5 and 2 are the obvious versions of those results for the language L^r(∀, ∃, adj): a look back at both derivations shows that they did not use (Adj4) or (Adj5).
For (Adj5), again omitting some routine conversion steps, we have the following. (Adj1) gives ∀(red n, n); from this and the second premise ∀(red n, green q), rule (Adj2) gives ∀(red n, green n). Applying (D) to the first premise ∃(red n, blue p) and ∀(red n, green n) gives ∃(blue p, green n), and (Adj3) turns this into ∃(blue p, blue green n). Using Example 8, we obtain ∃(blue p, green blue n), and another application of (Adj3) gives ∃(green blue p, green blue n). Routine steps at the bottom, which we have left out, then yield ∃(green p, blue n).
3.2 Completeness of L^r(∀, ∃, adj)
Our final result is the completeness of L^r(∀, ∃, adj). We aim to reduce this fact to our earlier completeness result for L(∀, ∃, adj). Some of the work was done in Proposition 2, but there are a few steps to go. Throughout this paper, we have been working with fixed sets of basic nouns and intersecting adjectives. That is, the languages in the paper have been defined in terms of those sets, but we suppressed the sets in our notation. At this time, we must be a little more explicit. Let N be our set of basic nouns and A our set of adjectives. We'll call our languages L(∀, ∃, adj)_{N,A}. Let N∗ be the set of all nouns, allowing for recursion. Let X be a new set, and assume that X is in bijective correspondence with N∗. Write N + X for the disjoint union of N and X. The language L(∀, ∃, adj)_{N+X,A} then has as basic nouns the elements of N together with new basic nouns in X. To be explicit, for every noun n of L^r(∀, ∃, adj)_{N,A}, we have a basic noun v_n of L(∀, ∃, adj)_{N+X,A}. We translate L^r(∀, ∃, adj)_{N,A} into L(∀, ∃, adj)_{N+X,A} via a map S → S∗. For example, if S is ∃(red blue green x, y), then S∗ is ∃(v_{red blue green x}, v_y). Theorem 2, the completeness theorem from earlier in this paper, holds for L(∀, ∃, adj)_{N+X,A}, since it holds for the flat syntax language built from any set of basic nouns. The translation also works in the other direction, taking each v_n to the corresponding n, and also each red v_n to the corresponding red n.

Theorem 4. The logic of Figure 3 is complete for L^r(∀, ∃, adj): if Γ |= S, then Γ ⊢_r S.

Proof. Assume that Γ |= S. Let Γ∗ = {S∗ : S ∈ Γ}. Let

Δ = {∀(v_{red n}, red v_n) : n ∈ N} ∪ {∀(red v_n, v_{red n}) : n ∈ N};
again, N is the set of nouns with which we started. We claim that Γ∗ ∪ Δ |= S∗. To see this, let M |= Γ∗ ∪ Δ. Then an induction on nouns n in the recursive language shows that [[v_n]] (the interpretation of v_n) is the same as [[n]]. This is where we use the clauses in Δ. As a result, truth values of sentences in M are preserved under translation in both directions. Hence M |= Γ. Since Γ |= S, we have M |= S also. And then M |= S∗. Having shown the claim, we see that by completeness, Γ∗ ∪ Δ ⊢ S∗. Let D be a derivation for this in the sense of Section 2. D is in L(∀, ∃, adj)_{N+X,A}, and therefore we must translate it back to a derivation in L^r(∀, ∃, adj)_{N,A}. For this, replace each v_n with the corresponding noun n. Most instances of the proof rules in D translate to the same steps in L^r(∀, ∃, adj). This is true for (T), (B), (Adj1), (Adj2), (I), (D), and (Adj3). For example, the (Adj2) step from ∀(v_{red y}, red v_{red x}) and ∀(v_{red y}, v_{blue x}) to ∀(v_{red y}, red v_{blue x}) translates to the (Adj2) step from ∀(red y, red red x) and ∀(red y, blue x) to ∀(red y, red blue x).
However, some of the steps in D might use (Adj4) or (Adj5). Take these, and replace them with derivations which do not use them, following Proposition 2. Finally, the leaves of D which happen to belong to Δ translate to instances of (T). The conclusion is that we have a derivation in L^r(∀, ∃, adj), as desired.
4 Conclusion
The results in this paper are complete logical systems for some very simple syllogistic systems, those extending the basic syllogistic logic of all and some with intersecting adjectives. These are some of the simplest logical systems of all, and all of the work has been completely elementary. This is not to say that it was obvious: I have found that it is easy in this kind of work to omit "obvious" cases and thereby fail to have a complete system, and on the other hand it is also easy to state redundant rules. My point is that the results here do not depend on any facts from other papers. At the end of the Introduction, we raised the question of whether the principle in (1) was essentially the only new one concerning intersecting adjectives. That is, if one adds it to the logic of all and some, is the resulting system complete? For the purposes of this point, we take (1) to be formalized as (Adj2) and (Adj3). We also assume the extensionality of adjectives, and this is (Adj1). Our results in Section 2.4 indicate that if one adheres to a flat syntax, then two more logical principles are needed to prove completeness: (Adj4) and (Adj5). On the other hand, moving to the larger language that admits recursive modification using intersecting adjectives allows us to prove (Adj4) and (Adj5). So in this sense, Keenan and Faltz's (1) is indeed all that there is to the logic of intersecting adjectives.
My feeling is that the results here should extend to many other syllogistic systems without much change. For example, they should extend to all of the systems in Pratt-Hartmann and Moss [6]. The details on this have yet to be worked out. Another worthwhile project would be to investigate what natural logic would look like for adjectives which are not intersecting.
References

1. Keenan, E.L., Faltz, L.M.: Boolean Semantics for Natural Language. Synthese Language Library, vol. 23. D. Reidel Publishing Co., Dordrecht (1985)
2. Łukasiewicz, J.: Aristotle's Syllogistic from the Standpoint of Modern Formal Logic. Clarendon Press, Oxford (1951)
3. Moss, L.S.: Completeness Theorems for Syllogistic Fragments. In: Logics for Linguistic Structures, vol. 29, pp. 143–173. Mouton de Gruyter, Berlin (2008)
4. Moss, L.S.: Logics for Two Fragments Beyond the Syllogistic Boundary. In: Blass, A., et al. (eds.) Studies in Honor of Yuri Gurevich, August 2009. LNCS. Springer, Heidelberg (2010)
5. Nishihara, N., Morita, K., Iwata, S.: An Extended Syllogistic System with Verbs and Proper Nouns, and its Completeness Proof. Systems and Computers in Japan 21(1), 760–771 (1990)
6. Pratt-Hartmann, I., Moss, L.S.: Logics for the Relational Syllogistic. Review of Symbolic Logic 2(4), 647–683 (2009)
7. Westerståhl, D.: Aristotelian Syllogisms and Generalized Quantifiers. Studia Logica XLVIII(4), 577–585 (1989)
Creation Myths of Generative Grammar and the Mathematics of Syntactic Structures

Geoffrey K. Pullum

School of Philosophy, Psychology and Language Sciences, University of Edinburgh
3 Charles Street, Edinburgh EH8 9AD, UK
[email protected]
http://ling.ed.ac.uk/~gpullum/
Abstract. Syntactic Structures (Chomsky [6]) is widely believed to have laid the foundations of a cognitive revolution in linguistic science, and to have presented (i) the first use in linguistics of powerful new ideas regarding grammars as generative systems, (ii) a proof that English was not a regular language, (iii) decisive syntactic arguments against context-free phrase structure grammar description, and (iv) a demonstration of how transformational rules could provide a formal solution to those problems. None of these things are true. This paper offers a retrospective analysis and evaluation.
1 Introduction
Syntactic Structures (Chomsky [6], henceforth SS) was not just another contribution to the discipline of structural linguistics. In the opinion of many American linguists, it ended the structuralist period. Martin Joos's definitive anthology of structuralist work Readings in Linguistics I first appeared in the same year, and it now looks more like an obituary than a reader. The study of syntax was altered forever by the introduction in SS of transformational generative grammar (TGG). Forty years later, Howard Lasnik's introductory graduate syntax course at the University of Connecticut was still built around the content of SS together with more recent developments that he regarded as flowing directly from it (see Lasnik [20]). But people have come to believe things about SS that were never true. Some linguists encourage such false beliefs. Lightfoot [21] opens his introduction to the 'second edition' of SS (actually just a re-issue of the second printing of the first edition, retaining the typographical errors) by stating that 'Noam Chomsky's Syntactic Structures was the snowball which began the avalanche of the modern "cognitive revolution". . . [which] originated in the seventeenth century and now construes modern linguistics as part of psychology and human biology.'
This paper is based on an invited presentation at the Mathematics of Language conference at UCLA in August 2007. Many of the ideas here have been profitably discussed with my collaborator Barbara Scholz. I am very grateful to her for her generosity with assistance and advice — not that I have taken all of the advice.
There was not even a nod toward the study of cognition in SS, nor a flicker of interest in the 17th century. Lightfoot's psychobiological snowball is just an invention. In this paper I try to counter some of the myth-making about SS, focusing on the mathematical bases for the statement of grammars rather than any anachronistic claims about the philosophical origins or cognitive implications of the proposals in SS. I begin by examining the origins of the conception of grammars that SS introduced.
2 Generative Grammar and the Work of Emil Post
TGG originates in work that was aimed at mathematicizing logical proof. Above all it stems from early work by the Polish-American mathematical logician Emil Leon Post (1897–1954).
2.1 Production Systems
SS defines 'the form of grammar associated with the theory of linguistic structure based upon constituent analysis' thus (SS, p. 29):

Each such grammar is defined by a finite set Σ of initial strings and a finite set F of 'instruction formulas' of the form X → Y interpreted: "rewrite X as Y." Though X need not be a single symbol, only a single symbol of X can be rewritten in forming Y.

As an example, Chomsky gives a grammar where Σ = {Z} and F contains the rules Z → ab and Z → aZb. The stringset generated is {a^n b^n | n ≥ 1}. Chomsky adds (p. 31):

It is important to observe that in describing this language we have introduced a symbol Z which is not contained in the sentences of this language. This is the essential fact about phrase structure which gives it its 'abstract' character.

It will be clear to anyone acquainted with Emil Post's mathematical work that a grammar of the sort Chomsky has defined is a special case of what Post called a production system. Post started out trying to formalize the logic informally assumed in Whitehead and Russell in Principia Mathematica, and ended up with a characterization of the recursively enumerable (r. e.) sets. He formalized inference rules as productions. A production associates a set of given strings (the premises) to a new string (the conclusion), which the premises are said to 'produce'. A production system consists of a set of initial strings (this corresponds to the Σ of SS) and a set of productions (corresponding to the set F of 'instruction formulas' in SS). (Post [28] is the definitive journal article; Brainerd & Landweber [3] provides a very useful elementary exposition with worked examples.) Given a set {φ1, . . . , φn} of initial strings and/or strings derived from
them by the productions (where n ≥ 1), a production saying ‘{φ1 , . . . , φn } produces φn+1 ’ legitimates the addition of φn+1 to the collection of strings that are derived or generated. Twenty years after Post [28], Chomsky and Miller [12] propose (p. 284) that rules of grammar are of this form: (1)
φ1 , . . . , φn → φn+1
They explain: 'each of the φi is a structure of some sort and . . . the relation → is to be interpreted as expressing the fact that if our process of recursive specification generates the structures φ1, . . . , φn then it also generates the structure φn+1.' Clearly, they might just as well have said that they take grammatical rules to be productions in the sense of Post [28].

Generative Capacity

However, Post did more than simply invent what were later to be called generative grammars. He also proved the first theorems concerning generative capacity. The major result of Post [28] was a theorem concerning the expressive power of production systems with a radically limited format for productions. Post's original definition of productions was maximally general, with no limits on number or complexity of premises. The φi are of the form g0 P1 g1 P2 . . . gk−1 Pk gk (for k ≥ 0), where the gi are specified constant strings of symbols over a vocabulary Ω and the Pi are free variables that can take any string in Ω∗ as value, and carry it over to the conclusion if that variable appears there. Post called these maximally general production systems 'canonical systems', but he proved that the same generative power was obtainable with productions of a much simpler form.

Normal systems. The main theorem of Post [28] is that every set generated by a canonical system can also be generated by a system in a much more restricted format called a 'normal system'. In a normal system there is just one axiom, and all productions take the form 'g1 P produces P g2', where P is a free variable and g1 and g2 are specified strings. To be more precise, Post's theorem is this:
(2) Theorem. (Post [28]) Given a canonical system Γ over a finite vocabulary ΩT it is always possible to construct a normal system Γ′ over Ω = ΩT ∪ ΩN (where ΩN is a new set of symbols disjoint from ΩT) such that Γ′ generates x ∈ ΩT∗ iff Γ generates x.
This shows that a radical limitation on rule form, restricting rules to saying ‘Any string beginning with g1 may be rewritten with its g1 prefix erased and g2 added at the end’, has no effect at all on generative capacity. The extra symbols in ΩN that do not appear in generated strings are of course the ones that Chomsky described as essential to the abstract character of phrase structure: they are the symbols he would later call nonterminals.
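The flavor of such systems is easy to convey in code. Here is a sketch (ours, not from either paper) of the SS example grammar, with initial string Z and rules Z → ab and Z → aZb, generated breadth-first; the strings it prints are exactly the a^n b^n strings within the length bound.

    # Breadth-first generation from the rewriting grammar of SS:
    # initial string "Z"; rules Z -> ab and Z -> aZb.
    from collections import deque

    RULES = [("Z", "ab"), ("Z", "aZb")]

    def generate(max_length):
        # Collect all terminal strings (no 'Z') of length <= max_length.
        seen, queue, out = {"Z"}, deque(["Z"]), []
        while queue:
            s = queue.popleft()
            if "Z" not in s:
                out.append(s)
                continue
            for lhs, rhs in RULES:
                i = s.index(lhs)                 # rewrite the one occurrence of Z
                t = s[:i] + rhs + s[i + 1:]
                if len(t) <= max_length and t not in seen:
                    seen.add(t)
                    queue.append(t)
        return sorted(out, key=len)

    print(generate(8))    # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']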
Semi-Thue systems. There is another specially limited form of productions. Chomsky [9] calls these 'rewriting rules', and recognizes explicitly that they are restricted forms of Post's production systems:

A rewriting rule is a special case of a production in the sense of Post; a rule of the form ZXW → ZYW, where Z or W (or both) may be null. (Chomsky [9]: 539)

Productions in this format were called type-0 rules in Chomsky [7]. The number of premises is limited to 1, and all of W, X, Y, Z are specified strings. The only free variables are the flanking ones covering whatever precedes W and whatever follows Z. Thus in Post's notation such a rule would say 'P1 g1 g2 g3 P2 produces P1 g1 g4 g3 P2'. This replaces g2 by g4 if g1 immediately precedes and g3 immediately follows. This restriction originates in a technical paper from ten years before in which Post (following a suggestion by Alonzo Church) tackled an open question posed by Axel Thue [40]. Thue had asked whether there was a decision procedure for determining whether a specified string X could be converted into a given string Y by a set of rules of the form 'W XZ ↔ W Y Z', where W, X, Y, Z are strings over some fixed finite alphabet and φ ↔ ψ is to be read as 'φ may be replaced by ψ or conversely'. Post [30] answers Thue's question by showing first that if there is a decision procedure for Thue-style bidirectional systems (where for every φ → ψ we also have the inverse ψ → φ) there is a decision procedure for unidirectional ones (which do not necessarily have the inverses), and this is known not to be true, so the reduction shows that the decision problem for Thue systems—the type-0 rules of Chomsky—is recursively unsolvable.
2.2 Recursive Enumerability
Post had thus proved the first two theorems in what would later come to be known as the theory of generative power of grammars. Both of his results show that radical limitations on the form of rules may have no effect on what can be generated. The importance of Chomsky [7] was that it showed other restrictions did limit what could be generated (for example, 'P1 g1 g2 g3 P2 produces P1 g1 g4 g3 P2' with the restriction that |g2| ≤ |g4| will generate only context-sensitive stringsets). But the transformations introduced in SS did not entail any such limitations. Hilary Putnam, in a remarkably prescient paper [33], discussed his reasons for thinking that natural languages had to have a decidable membership problem, and then remarked:

Chomsky's general characterization of a transformational grammar is much too wide. It is easy to show that any recursively enumerable set of sentences could be generated by a transformational grammar in Chomsky's sense.

He provided no proof, but his conclusion was surely correct. There were no signs of limitations on the form of transformations that could restrict their expressive power more tightly than that of canonical systems.
There was one element that Chomsky added to production systems in developing generative grammars: the device of ‘extrinsic’ rule ordering. He required that a grammar should define a strict ordering on its rules, so that each rule Ri would be permitted to apply (if at all) only after all the rules ordered before it had applied, and before any of the rules ordered after it had applied. But this had no restrictive effect on generative power. No one ever offered an example of a stringset that can be generated by some unordered set of productions but cannot be generated by any ordered set of productions.1 Chomsky only ever cited one paper of Post’s, an informal paper on r. e. sets of positive integers that Post delivered as a lecture to the American Mathematical Society [29]. In [7] (p. 137n) and [8] (p. 7) this paper is cited as the source of the term ‘generate’. Post is also acknowledged (though without a bibliographical citation) in connection with the form of Type 0 rewriting rules ([9]: 539), and is mentioned once in Aspects of the Theory of Syntax ([10]: 9): ‘The term “generate” is familiar in the sense intended here in logic, particularly in Post’s theory of combinatorial systems’. But Chomsky appears never to have made a bibliographical reference to any of Post’s technical papers on production systems.2 SS, perhaps because its aim was to present transformational generative grammar to undergraduate science and engineering students, has even less referencing: the bibliography includes neither Rosenbloom’s book [34] nor anything by Post.3
3 The Supposed Proof That English Is Not Finite-State
It is very widely believed that SS gives a proof that English is not finite-state. This is not true. A few informal suggestions are made to support the assertion that 'English is not a finite state language' so that 'it is impossible, not just difficult, to construct a device of the [finite automaton] type . . . which will produce all and only the grammatical sentences of English' (p. 23). But there was no proof; and it is not clear that a proof anything like the one Chomsky seems to have had in mind can succeed. Chomsky had given a fuller argument that natural languages are not finite-state in a celebrated technical paper of the year before: [5], cited in SS on p. 22. This is claimed to contain the 'rigorous proof' to which SS alludes on p. 23.
2
3
This is different from saying that ordering cannot restrict what a particular set of rules can generate. Pelletier [27] shows that requiring strict ordering of a set of rules can indeed make some outputs impossible to generate by that set of rules. But as he stresses, this result presumes that the set of rules is fixed, which is not the situation linguists ever find themselves in. Urquhart [41] suggests that this might be because his understanding of Post systems came from a secondary source, namely Rosenbloom [34], which Chomsky cites in [4] and [5]. The contributions of Zellig Harris are also somewhat downplayed in SS. See Seuren [37] for discussion of the way Harris introduced top-down generation — the idea that ‘a deductive system with axiomatically defined initial elements and with theorems concerning the relations among them’ could be used to ‘enable anyone to synthesize or predict utterances in the language.’
But if the 1956 argument is sound, no one (to my knowledge) has confirmed that. I do not understand it, and nor did Daly [13]. In its original form (Chomsky [5]) it depended on a cumbersomely defined relation of '(i, j)-dependency' holding between a string S of length n, two integers i and j such that 1 ≤ i < j ≤ n, and a language L over a vocabulary A. The definitions are changed in the 1965 reprint version of the paper (a footnote credits E. Assmuss for pointing out an error). The 1965 revision relies on a cumbersomely defined ternary relation of 'm-dependency' between a sentence S, an integer m, and a stringset L, where S = x1 a1 x2 a2 . . . xm am z b1 y1 b2 y2 . . . bm ym, and there is a unique permutation of the numbers (1, . . . , m) — a bijective mapping α from {1, . . . , m} to itself — meeting the following condition (I quote from p. 108 of the reprint): "there are c1, . . . , c2m ∈ A such that for each subsequence (i1, . . . , ip) of (1, . . . , m), S1 is not a sentence of L and S2 is a sentence of L, where (10) S1 is formed by substituting cij for aij in S, for each j ≤ p; S2 is formed by substituting cm+α(ij) for bα(ij) in S1, for each j ≤ p." The idea is that if in the string S the symbol ai is replaced by the symbol ci, restoring grammaticality in L necessitates replacing bα(i) by cm+α(i). From there, the crucially relevant mathematical step is to claim that an FSL can only exhibit m-dependencies up to some finite upper bound on m (Chomsky says an m-dependency needs at least 2m states; Svenonius [39] says this is untrue, and m states will suffice). The empirical claim is that English has no such upper bound, and is therefore not an FSL. But Chomsky does not complete the argument by connecting these abstractions to English data; he merely points to some sentence templates ("If S1, then S2"; "Either S3, or S4"; "The man who said that S5, is arriving today" [comma in original]), and asserts that through them "we arrive at subparts of English with . . . mirror image properties" and thus "we can prove the literal inapplicability of this model" (Chomsky [5], 1965 reprinting, p. 109). Daly [13] spends many pages attempting to work out how a sound argument for Chomsky's conclusion might be based on the data that he cites. Chomsky seems to think that pairs like if, then and either, or give rise to m-dependencies. Daly could not see how this could be true. Nor can I. The words in these pairs can occur in sentences without the other member of the pair. (The same is true of other pairs such as neither, nor and both, and.) It is not clear that there is any pair of lexical items σ and τ in English such that if ϕσψ is grammatical then ψ = ψ1 τ ψ2 with |ψ1| > 0. In addition, the reference to finding "various kinds of non-finite state models within English" (SS: 22–23) and the similar remark about "subparts of English with . . . mirror image properties" (Chomsky [5], 1965 reprinting, p. 109) suggest a failure to appreciate that FSLs (or context-free stringsets) can have infinite non-FSL (or non-context-free) subsets. Only if such a subset can be extracted by some regularity-preserving language-theoretic operation like homomorphism or intersection with a regular set does it entail anything about the language as a whole.
Thus it is not at all clear that Chomsky ever had an argument against English being an FSL. Certainly none appears in SS.
4 Justifying Transformations
Even if SS had shown that natural languages were not finite-state, that would not be sufficient to justify the transformational analyses that are thought of as the book's most significant contribution, because context-free phrase structure grammars (CF-PSGs) might have sufficed. It has since been shown to most linguists' satisfaction that natural languages are non-CF (see e.g. Shieber [38]), but there was no hint in SS of any such result. Instead, SS gives three arguments based on descriptive elegance. They hinge on coordination, auxiliaries, and passives. On re-examination, all three arguments look decidedly unconvincing.
4.1 Coordination
Coordination in English is claimed in SS to be governed by a principle informally stated as follows ((26) in SS, p. 36): (3)
“If S1 and S2 are grammatical sentences, and S1 differs from S2 only in that X appears in S1 where Y appears in S2 (i.e., S1 = . . . X . . . and S2 = . . . Y . . .), and X and Y are constituents of the same type in S1 and S2 , respectively, then S3 is a sentence, where S3 is the result of replacing X by X + and + Y in S1 (i.e., S3 = . . . X + and + Y . . .).”
This is not, of course, a transformation. S1 and S2 are required to be 'grammatical sentences'; i.e., strings generated by the grammar. So (3) is quantifying over the entire content of the language. It is what would later be called a transderivational constraint. The claim is not true of English. There are many cases of X and Y such that both can occur in a given context but the coordination X and Y cannot. Perhaps the most obvious is the case of verb agreement controllers. Let X = Don and Y = Phil. Then for I think X was there and I think Y was there, (3) says that I think X and Y was there = *I think Don and Phil was there should be grammatical, but this is not so. Several other such failures of (3) have been noted by Huddleston & Pullum ([19], pp. 1323–1326). Chomsky recognizes that 'additional qualification is necessary', but nonetheless claims that 'the grammar is enormously simplified if we set up constituents in such a way that [(3)] holds even approximately' (SS, 37). In the summary rules at the end of the book (p. 113) he therefore gives a 'generalized transformation' — basically a production with two premises — to capture the effects of (3). His rule statement is given in (4).
(4) Structural analysis: of S1: Z − X − W; of S2: Z − X − W, where X is a minimal element (e.g., NP, VP, etc.) and Z, W are segments of terminal strings.
Structural change: (X1 − X2 − X3; X4 − X5 − X6) → X1 − X2 + and + X5 − X3
Remarkably, despite all the symbols, (4) is less explicit and less accurate than (3). The letter S in the variable names 'S1' and 'S2' might suggest 'Sentence', but S1 and S2 will not in fact be sentences (strings over the terminal vocabulary); they will be sentential forms (possible stages in a derivation, potentially including nonterminals). X is stipulated to be a 'minimal element', but this term is undefined—it appears to mean 'single nonterminal'. Z and W are stipulated to be 'segments of terminal strings', so S1 and S2 are the same string and there was no point in distinguishing them. A case to which (4) can apply will be something like S1 = S2 = Put NP in the truck. But nowhere in (4) is it guaranteed that there is any difference between the terminal strings of the X constituents in S1 and S2: (4) yields *Put it and it in the truck as an output, which is probably unintended (since in (3) it was stated that 'S1 differs from S2'). We can assume that Chomsky intended S1 and S2 to be identical sentential forms that are somehow guaranteed to have distinct generated terminal strings. But nothing hangs on S1 and S2 at all: no use is made of the variables Z and W in the 'structural change' (the output or conclusion) of the rule. Indeed, the structural change throws away all the variables of the input: six new variables X1, . . . , X6 are introduced, and the X in the variable names has no relation to the prior uses of X. SS says nothing about what the Xi range over, and no connection is made between them and Z or X or W. We are left to guess that all the Xi range over terminal strings; that X1 = X4 = Z; that X3 = X6 = W; that X2 ≠ X5; and that X2 and X5 are terminal strings of instances of the category X. None of this is made explicit in (4) or elsewhere. Nine variables are used to hold four values (the terminal strings Z, X2, and W, and the category X), and they have not been explicitly related. This is an inexpert and somewhat pointless deployment of pseudo-mathematical symbolism. The content of the rule appears to be specifiable much more simply. All the rule does is to ensure that a nonterminal symbol X can exhaustively dominate the string 'X and X', in any context whatsoever. And a simple phrase structure rule 'X → X and X' could have done that.4

Nothing is said in SS about multiple coordination. An attempt is made to provide for the generation of sentences like I like indigo and violet, but not of sentences like I like red, orange, yellow, green, blue, indigo, and violet. It is not made clear whether a generalized transformation can reapply to its own output, nor why n − 2 of the coordinators disappear in an n-coordinate structure, nor why the coordinator and can be placed only before the last coordinate, nor how other coordinators are introduced. To summarize, the proposal that SS makes about handling coordination is obscure, incomplete, inadequate, and apparently unnecessary.
It may be that Chomsky ruled out positing such a rule on the grounds that it would not permit the unambiguous reconstruction of a tree from each phrase structure derivation (see McCawley [23] on this point). But as McCawley noted, the background assumption (that trees must be built from derivations rather than licensed by phrase structure rules directly) is a strange and unmotivated one.
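For comparison, here is a sketch (ours; the toy lexicon is an assumption) of how directly a schema X → X and X does the work, and of the overgeneration, such as it and it or agreement mismatches, that any such context-free account still has to qualify:

    # A toy recursive generator: for any category X, allow "X and X".
    import random

    LEXICON = {"NP": ["Don", "Phil", "it"], "V": ["left", "sang"]}

    def expand(cat, depth=0):
        # With some probability, coordinate: X -> X and X, for any category X.
        if depth < 2 and random.random() < 0.4:
            return expand(cat, depth + 1) + " and " + expand(cat, depth + 1)
        return random.choice(LEXICON[cat])

    random.seed(1)
    for _ in range(5):
        print(expand("NP") + " " + expand("V"))
    # Licenses strings like "Don and Phil left", but equally "it and it sang";
    # and it says nothing about agreement (*"Don and Phil was there"),
    # which is exactly the kind of qualification (3) also failed to supply.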
4.2 Auxiliaries
The SS analysis of the English auxiliaries is frequently cited as a novel and impressive achievement. It looks somewhat less novel when we consider the analysis published by Fries [15] five years before:

(5)
    group A   class 1    group B                           class 2
                         (a)    (b)     (c)       (d)
    The       students   may    have    had to    be      moving
Fries’s ‘classes’ are lexical categories like noun (class 1) and verb (class 2), and the ‘groups’ cover syntactically associated minor items like determiners (group A) and auxiliaries (group B). Fries takes the maximal auxiliary cluster to consist of a modal such as may followed by the perfect auxiliary have followed by an instance of have to followed by the progressive auxiliary be, each being optional. And the famous CF-PSG rule (6) of SS follows it, except that it correctly drops have to (not an auxiliary element at all): (6)
(6) Aux → C (M) (have + en) (be + ing)
‘C’ is a tense or concord (agreement) morpheme, and ‘M’ stands for ‘modal’. So the rule lays out the tense or concord morpheme, an optional modal, an optional instance of have accompanied by the past participle suffix -(e)n, and an optional instance of be accompanied by the gerund-participle suffix -ing, strictly in that order. Chomsky accepts Fries’s idea of treating the components of the auxiliary cluster as non-verb dependents. Both defend variants of what [19] calls the dependent-auxiliary analysis. Fries is not explicit about how the successive items get their inflectional properties, but SS provides an answer: there is a transformation in SS (subsequently known as ‘Affix Hopping’) called the Auxiliary Transformation, and it is formulated thus: (7)
(7) Auxiliary Transformation — obligatory:
Structural analysis: X – Af – v – Y (where Af is any C or is en or ing; v is any M or V, or have or be)
Structural change: X1 – X2 – X3 – X4 → X1 – X3 – X2 # – X4
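To see what (6) and (7) jointly do, here is a sketch (ours; the symbol inventory is simplified, and nothing in the code is from SS itself) that enumerates the eight expansions of Aux and then applies the Auxiliary Transformation, hopping each affix onto the verbal element to its right:

    # Expand Aux -> C (M) (have+en) (be+ing), then apply "Affix Hopping":
    # X - Af - v - Y  =>  X - v - Af# - Y, for Af in {C, en, ing} and verbal v.
    from itertools import product

    AFFIXES = {"C", "en", "ing"}
    VERBALS = {"M", "have", "be", "V"}

    def aux_expansions():
        out = []
        for m, perf, prog in product([0, 1], repeat=3):
            seq = (["C"] + (["M"] if m else [])
                   + (["have", "en"] if perf else [])
                   + (["be", "ing"] if prog else []))
            out.append(seq)
        return out

    def affix_hop(seq):
        seq = list(seq)
        i = 0
        while i < len(seq) - 1:
            if seq[i] in AFFIXES and seq[i + 1] in VERBALS:
                seq[i], seq[i + 1] = seq[i + 1], seq[i] + "#"
                i += 2           # the rule is obligatory; move past the pair
            else:
                i += 1
        return seq

    for aux in aux_expansions():
        print(aux + ["V"], "=>", affix_hop(aux + ["V"]))
    # e.g. ['C', 'have', 'en', 'be', 'ing', 'V']
    #      => ['have', 'C#', 'be', 'en#', 'V', 'ing#']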
The use of symbols in the SS analysis is promiscuous and occasionally misleading. For example, SS uses no less than 6 competing and inconsistently defined symbols that might be said to correspond to the informal notion 'verb': Verb, V, v, V1, Va, and V2. The text contradicts itself about several of them. Verb is introduced as a lexical node on p. 28, but is clearly treated as a phrasal node on p. 39. V is introduced as a lexical node on p. 39, and is equated with the
informal term 'verb' on p. 42, but then becomes a phrasal node on p. 79 (where consider a fool is analyzed as a V). The symbol v is an informally introduced abbreviation covering two elements that would be traditionally interpreted as either verb lexemes or verb stems (have and be) together with the category M of modals and the category V, yet it is mentioned in a transformation. And V2 on p. 112 appears to stand for a subcategory of verbs (including consider) for which Va was used on pp. 76–77. The text is similarly inconsistent about Aux. It is referred to as the 'auxiliary phrase' on p. 42, suggesting that it is a phrasal node; but on the next page it is called the 'auxiliary verb', suggesting it is a subcategory of the lexical category of verbs. This is crucially misleading, because what SS actually attempts to do is to analyze the syntax of English auxiliary verbs without making any reference to the notion 'auxiliary verb' at all. Nothing in the SS analysis corresponds to 'auxiliary verb', i.e., lexical item with verbal morphology capable of preceding the subject NP in closed interrogatives. Aux certainly does not correspond to that. In fact it is a very odd constituent indeed: a branching node housing a cluster of up to half a dozen non-verb siblings none of which is a head, which no transformation ever applies to or uses as a context. Aux is never moved, deleted, copied, inserted, targeted by adjunction, or mentioned as the context for the application of some other rule. How or why the SS analysis of auxiliaries came to be regarded as elegant or attractive is not clear. The analysis certainly appears to have a host of quite serious problems, such as various ordering paradoxes. Some of the problems only emerge given later advances in syntactic theory, but many are not anachronistic in this way, and should have been apparent at the time. The most serious of these is that the analysis is simply not compatible with the formal theory of Chomsky's magnum opus The Logical Structure of Linguistic Theory [4]. As noted by Sampson [36], the Auxiliary Transformation is not a legal transformation at all under the theory of LSLT. The reason is the cover symbols v and Af. These are neither terminal symbols nor non-terminal symbols; they function merely to make possible a collapsing of 16 different transformations sharing most of their structure. A less abstract but still theoretical issue is that the grammar proposed in SS assigns such different phrase structures to sequences that we would expect to have very similar structures: is asleep has is as a V but is sleeping has it as a member of the Aux sequence; ought to have left would apparently be monoclausal but thought to have left is biclausal; in has control the word has is a V but in has controlled it is not; and so on. The arbitrary syntactic distinctions drawn have no motivation. The fact is that modern analyses have without exception abandoned the Aux node. All of the items formerly housed in Aux are now treated as heads of projections, just as was always recommended by proponents of the primary alternative to the dependent-auxiliary analysis. That alternative has been presented in many minor variants over the years, going back to classic accounts like that of Jespersen (who referred to the modals as the 'anomalous finites' in the verb system), and
defended by such writers as Ross [35], McCawley [24,25], Newmeyer [26], Pullum & Wilson [32], Gazdar et al. [16], Huddleston [17,18], and many others. The specific version adopted by Huddleston & Pullum ([19], Ch. 14, §4.2, pp. 1209ff) shares with accounts like those of Pullum & Wilson [32] or Gazdar et al. [16] a treatment of the auxiliaries of English as verbs that have certain special behaviors but take complements in the same way that other verbs do. More specifically, The Cambridge Grammar [19] analyzes auxiliaries as verbs that take catenative complements: non-finite, VP-internal, subjectless complements that are neither direct objects nor predicative complements, capable of recursive embedding leading to chains of verbs (may seem to want to avoid appearing to have been . . . , etc.). It is now well known that VP ellipsis phenomena, negation facts, and many other considerations argue for a uniformly right-branching structure of this kind. All in all, the treatment of auxiliaries in SS can hardly be said to be a progressive movement in syntactic theory or a good advertisement for transformations.
4.3 Passives
The analysis of passive clauses in SS is motivated by reference to four alleged problems that arise if passives are treated with phrase structure rules. According to Chomsky these complications ensue:

1. When Verb is expanded as Aux – V, the element be + en can be selected under Aux only if the V is transitive, and stating this would complicate the rule system (a child of Aux is dependent on features of a sibling of the parent of Aux).
2. Even if V is transitive, be + en cannot be selected if V is followed by NP, and stating this condition further complicates the grammar (a child of Aux is disallowed if NP occurs as a sibling of its parent's parent).
3. If V is followed by the PP by + NP, then be + en is obligatory in Aux — a third complex co-occurrence that has to be built into the rules (a child of Aux becomes obligatory given a certain sibling of its parent's parent).
4. Selection restrictions reverse: acceptable subject NPs for passive clauses will be precisely those that would be acceptable as the object in the corresponding active, and acceptable by-phrase objects will be precisely those NPs that would have been acceptable as subjects in active clauses.

The trouble is that all four of these claims are spurious.

Claim 1: Be + en with intransitives — Not all verbs occurring with be and a past participle are transitive:

(8) Man is descended from apes. (← / *Someone descended man from apes.)
(9) Charles is said to be gay. (← / *Somebody says Charles to be gay.)
(10) Antarctica is uninhabited by man. (← / *Man uninhabited Antarctica.)
Claim 2: Be + en with following NP — There can be an NP after the verb in a clause with be + en:

(11) I've often been called an idiot.
(12) He was denied all his legal rights.
(13) We were shown several nice apartments.
Claim 3: By-phrase without be + en — A passive by-phrase complement can occur with no be auxiliary:

(14) a. We had this [done by an expert].
     b. He went and got himself [stung by a wasp].
     c. This car wants [cleaning].
     d. The book needs [revising by an experienced editor].
Claim 4: Selection restriction reversal — It has been clear since McCawley's classic paper of 1968 [23] that selection restriction issues have no place in syntax. SS assumed that English syntax should distinguish John plays golf from Golf plays John (the latter is referred to as a 'non-sentence'). This cannot be right. As McCawley pointed out, every semantic property of noun phrases is capable of being relevant to such putative restrictions: the property of denoting a crustacean (objects of the verb devein); the property of denoting a matrix (for objects of the verb diagonalize); and so on. I would say that selection restrictions do not belong in linguistics, but rather in metaphysics. Which noun phrases can fill the blank in The ___ thinks it is Tuesday or other sentences with the verb think? Would baby be appropriate? What about foetus? Crocodile? Cockroach? Computer? One can readily imagine philosophical debate about the right cutoff point. Neurologists, philosophers of mind, and animal rights advocates might not agree. Turing's famous 1950 paper in Mind set off controversy about whether machines can think; but surely that issue is not to be settled by syntax! This fourth point of Chomsky's is clearly just a conceptual mistake. And the other three are entirely unpersuasive for syntactic reasons.
4.4 Analyzing Passives
The right analysis of auxiliaries in English leads us toward an acceptable analysis of passives too. Auxiliary verbs take non-finite, subjectless, recursively nestable complement clauses with specified inflectional features. Various matrix-clause verbs take passive clauses: be (was examined), intransitive get (got arrested), transitive get (got myself appointed), go (went unnoticed), have (have someone collected), and so on. We are in fact dealing with two dozen distinct constructions. Passive clauses such as liked by his classmates or beaten down by her troubles or irritated by his kids are best regarded as non-finite clauses that have distributions not very different from adjective phrases such as popular with his classmates or weary from her troubles or angry with his kids. They can be found as complements of
ascriptive uses of the copula (compare was liked by his classmates and was popular with his classmates), or in various simple intransitive constructions (compare looked beaten down by her troubles and looked weary from her troubles), or in various complex-transitive constructions (compare got irritated by his kids and got angry with his kids).

(15) a. He was well liked by his classmates. [passive VP]
     b. He was decidedly popular with his classmates. [AdjP]
(16) a. She looked beaten down by her troubles. [passive VP]
     b. She looked weary from her troubles. [AdjP]
(17) a. I often got irritated by his kids. [passive VP]
     b. I often got angry with his kids. [AdjP]
The verbs may be in past-participial or gerund-participial inflected form (the ‘concealed passive’, as in The book merits re-reading); they may be adjectival (as with the ones taking un-) or verbal. And cross-cutting these distinctions are the lines dividing prepositional passives (with stranded prepositions, as in was looked at) from the ordinary kind (was seen), and separating long passives (with the by-phrase complement) from short passive clauses (without it). The full array contains 24 English passive constructions, of which the SS transformation handles just one: the non-concealed non-adjectival non-prepositional long passive clause as complement of the copula. This one has no special priority or importance relative to the others. If the Passive transformation expressed a true generalization (we shall see below that it does not), it would be expressing a generalization holding over only a very small part of the range inherent in the descriptive task of characterizing English passive clauses. The key special property of passive clauses is that their meanings employ the sense of the verb in a way that involves what might be called role reversal: instead of the VP denoting a property of the agent, it denotes a property of the patient. This property is not tied to any of the elements present in the SS Passive transformation.
– it is not tied to the presence of be, as shown by bare passives (Ignored by his workmates, he labored alone);
– it is not tied to the past-participial (en) inflection, as shown by concealed passives (She needs examining by a specialist);
– it is not tied to the presence of an immediately postverbal NP, as shown by prepositional passives (It has often been laughed at);
– it is not tied to the existence of a corresponding active clause, as shown by passives with verbs like rumored and said (He is said to be interested);
– and in fact it is not tied to clauses at all, as we see from the ambiguity of the shooting of the hunters.
4.5 Irregularity in the Set of Passives
Note also that the generalization expressed by the SS Passive transformation is in any case massively false. The rule says:
(18) Passive transformation
     Structural analysis: NP – Aux – V – NP
     Structural change: X1 – X2 – X3 – X4 → X4 – X2 + be + en – X3 – by + X1
This entails very clearly that for any NP immediately after any sequence of Aux – V, a grammatical passive will result from shifting the postverbal NP to subject position and the original subject into a by-phrase and adding be before the head verb and inflecting the head verb in past-participial form. But there are indefinitely many counterexamples, of many interestingly different types. Perhaps the most obvious counterexamples are strings like this:
(19) Everyone – must – hope – things will get better.
     NP       – Aux  – V    – NP
     X1       – X2   – X3   – X4
From this the SS passive transformation (since it is blind to embedded clause boundaries) will generate the ungrammatical string in (20).
(20) *Things are hoped will get better by everyone.
Such trans-clausal cases were treated by Chomsky in [11] as a research problem to be solved by positing a constraint on transformational movement that is violated by any movement of an NP out of a tensed domain. But Chomsky’s proposals fail fairly decisively (see Bach & Horn [2], esp. 284–289). Over and above this class of examples, there are numerous lexical and semantic limitations on passivization. Bach [1] gives a significant number. Postal [31] catalogs many more. They include cases with predicative complement NPs (Mike seemed a nice enough guy ⇒ *A nice enough guy was seemed by Mike); measure NPs (The fish weighed twelve pounds ⇒ *Twelve pounds were weighed by the fish; This matters a lot to me ⇒ *A lot is mattered by this to me); manner of speaking verbs (The old man growled some bitter comments ⇒ *Some bitter comments were growled by the old man); and many other idiosyncratic cases (The train departed the station at dawn ⇒ *The station was departed at dawn by the train; George had several homes ⇒ *Several homes were had by George; Fred lacks finesse ⇒ *Finesse is lacked by Fred; etc.). The rich array of unpassivizable NP – Aux – V – NP sequences tells us much about the sensitivity of passive constructions to lexical factors. The notion that passivization represents some kind of simple, automatic, regular, syntactic modification process, which is the central claim presented in SS, has no plausibility whatsoever, and provides no motivation for transformations.
5 Conclusions
Why care about a retrospective evaluation of a monograph over 50 years old? Because myths about scientific breakthroughs and results can warp perceptions
of the history of a field. Creation myths attributing everything to one individual are known in other fields too. The truth about science is that discoveries and innovations develop over time and build on earlier developments in the field or in adjacent fields, and myths of monogenesis and individual glorification damage contemporary theorizing in at least two ways. First, they encourage scientists in the complacent maintenance of false assumptions: if almost every linguist is convinced that SS showed transformations to be necessary back in 1957, non-transformational research will be underdeveloped or ignored (and indeed I think in general it has been over the past fifty years). Second, they promote biased and lazy citation practices — the same old references passed from paper to paper without anyone checking the sources. Both consequences are worth guarding against.
References
1. Bach, E.: In defense of passive. Linguistics and Philosophy 3, 297–341 (1980)
2. Bach, E., Horn, G.M.: Remarks on Conditions on transformations. Linguistic Inquiry 7, 265–361 (1976)
3. Brainerd, W.S., Landweber, L.H.: Theory of Computation. John Wiley, New York (1974)
4. Chomsky, N.: The Logical Structure of Linguistic Theory. MIT Library, Cambridge (1956) (microfilmed; revised version of a 1955 unpublished manuscript)
5. Chomsky, N.: Three models for the description of language. I.R.E. Transactions on Information Theory 2, 113–123 (1956); reprinted with substantive revisions in Luce, Bush & Galanter [22], pp. 105–124
6. Chomsky, N.: Syntactic Structures. Mouton, The Hague (1957)
7. Chomsky, N.: On certain formal properties of grammars. Information and Control 2, 137–167 (1959); reprinted in Luce, Bush & Galanter [22], pp. 125–155; citation to original is incorrect
8. Chomsky, N.: On the notion ‘rule of grammar’. In: Proceedings of the Twelfth Symposium in Applied Mathematics, pp. 6–24. American Mathematical Society, Providence (1961); reprinted with slight revision in Fodor, J.A., Katz, J.J. (eds.): The Structure of Language: Readings in the Philosophy of Language, pp. 155–210. Prentice-Hall, Englewood Cliffs
9. Chomsky, N.: Explanatory models in linguistics. In: Nagel, E., Suppes, P., Tarski, A. (eds.) Logic, Methodology and Philosophy of Science: Proceedings of the 1960 International Congress, pp. 528–550. Stanford University Press, Stanford (1962)
10. Chomsky, N.: Aspects of the Theory of Syntax. MIT Press, Cambridge (1965)
11. Chomsky, N.: Conditions on transformations. In: Anderson, S.R., Kiparsky, P. (eds.) A Festschrift for Morris Halle. Holt, Rinehart and Winston, New York (1973)
12. Chomsky, N., Miller, G.A.: Introduction to the formal analysis of natural languages. In: Luce, R.D., Bush, R.R., Galanter, E. (eds.) Handbook of Mathematical Psychology, vol. II, pp. 269–321. John Wiley and Sons, New York (1963)
13. Daly, R.T.: Applications of the Mathematical Theory of Linguistics. Mouton, The Hague (1974)
14. Davis, M. (ed.): Solvability, Provability, Definability: The Collected Works of Emil L. Post. Birkhäuser, Boston (1994)
15. Fries, C.C.: The Structure of English. Harcourt Brace, New York (1952)
16. Gazdar, G., Pullum, G.K., Sag, I.A.: Auxiliaries and related phenomena in a restrictive theory of grammar. Language 58, 591–638 (1982)
17. Huddleston, R.: Further remarks on the analysis of auxiliaries as main verbs. Foundations of Language 11, 215–229 (1974)
18. Huddleston, R.: An Introduction to English Transformational Syntax. Longman, London (1976)
19. Huddleston, R., Pullum, G.K.: The Cambridge Grammar of the English Language. Cambridge University Press, Cambridge (2002)
20. Lasnik, H.: Syntactic Structures Revisited: Contemporary Lectures on Classic Transformational Theory. MIT Press, Cambridge (2000)
21. Lightfoot, D.: Introduction. In: Chomsky, N.: Syntactic Structures, 2nd edn., pp. v–xviii. Mouton de Gruyter, Berlin (2002)
22. Luce, R.D., Bush, R.R., Galanter, E. (eds.): Readings in Mathematical Psychology, vol. II. John Wiley & Sons, New York (1965)
23. McCawley, J.D.: Concerning the base component of a transformational grammar. Foundations of Language 4, 243–269 (1968); reprinted in McCawley, J.D.: Grammar and Meaning, pp. 35–58. Academic Press, New York; Taishukan, Tokyo (1973)
24. McCawley, J.D.: Tense and time reference in English. In: Fillmore, C.J., Langendoen, D.T. (eds.) Studies in Linguistic Semantics, pp. 97–113. Holt, Rinehart and Winston, New York (1971)
25. McCawley, J.D.: The category status of English modals. Foundations of Language 12, 597–601 (1975)
26. Newmeyer, F.J.: English Aspectual Verbs. Mouton, The Hague (1975)
27. Pelletier, F.J.: The generative power of rule orderings in formal grammars. Linguistics 18(1/2 (227/228)), 17–72 (1980)
28. Post, E.: Formal reductions of the general combinatory decision problem. American Journal of Mathematics 65, 197–215 (1943); reprinted in Davis [14], pp. 442–460
29. Post, E.: Recursively enumerable sets of positive integers and their decision problems. Bulletin of the American Mathematical Society 50, 284–316 (1944); reprinted in Davis [14], pp. 461–494
30. Post, E.: Recursive unsolvability of a problem of Thue. Journal of Symbolic Logic 12, 1–11 (1947); reprinted in Davis [14], pp. 503–513
31. Postal, P.M.: Skeptical Linguistic Essays. Oxford University Press, New York (2004)
32. Pullum, G., Wilson, D.: Autonomous syntax and the analysis of auxiliaries. Language 53, 741–788 (1977)
33. Putnam, H.: Some issues in the theory of grammar. In: Jakobson, R. (ed.) Proceedings of Symposia in Applied Mathematics: Structure of Language and Its Mathematical Aspects, No. 12, pp. 25–42. American Mathematical Society, Providence (1961)
34. Rosenbloom, P.: The Elements of Mathematical Logic. Dover, New York (1950)
35. Ross, J.R.: Auxiliaries as main verbs. Studies in Philosophical Linguistics 1, 77–102 (1967)
36. Sampson, G.: What was transformational grammar? Lingua 48, 355–378 (1979); reprinted in Empirical Linguistics, Continuum (2001)
37. Seuren, P.: Concerning the roots of transformational generative grammar. Historiographia Linguistica 36(1), 97–115 (2009)
38. Shieber, S.: Evidence against the context-freeness of human language. Linguistics and Philosophy 8, 333–343 (1985)
39. Svenonius, L.: Review of Three models for the description of language by Noam Chomsky. Journal of Symbolic Logic 23, 71–72 (1957)
40. Thue, A.: Probleme über Veränderungen von Zeichenreihen nach gegebenen Regeln. In: Skrifter utgit av Videnskapsselskapet i Kristiana, I. No. 10 in Matematisk-naturvidenskabelig klasse, Norske Videnskaps-Akademi, Oslo (1914)
41. Urquhart, A.: Emil Post. In: Gabbay, D.M., Woods, J. (eds.) Handbook of the History of Logic. Logic from Russell to Church, vol. 5, pp. 617–666. North-Holland, Amsterdam (2009)
On Languages Piecewise Testable in the Strict Sense

James Rogers¹, Jeffrey Heinz², Gil Bailey¹, Matt Edlefsen¹, Molly Visscher¹, David Wellcome¹, and Sean Wibel¹

¹ Dept. of Computer Science, Earlham College
² Dept. of Linguistics and Cognitive Science, University of Delaware
Abstract. In this paper we explore the class of Strictly Piecewise languages, originally introduced to characterize long-distance phonotactic patterns by Heinz [7] as the Precedence Languages. We provide a series of equivalent abstract characterizations, discuss their basic properties, locate them relative to other well-known subregular classes and provide algorithms for translating between the grammars defined here and finite state automata as well as an algorithm for deciding whether a regular language is Strictly Piecewise.
1 Introduction
From the beginning of the generative linguistics program, long-distance dependencies in natural language have attracted considerable interest. For example, [2] establishes that the long-distance dependencies necessary to describe sentence well-formedness are beyond the reach of finite state methods, and later work continues to characterize the kinds of non-local dependencies in natural language in ways which require increasingly expressive formalisms [11,21,12]. Although many long-distance dependencies in natural language require expressive formalisms that are at least context-free [2,11,21,12], some non-local patterns in natural language do not. An example from Heinz [7,8] comes from the sibilant harmony process of Sarcee, where [-anterior] sibilants like [ʃ] and [tʃ] regressively require [+anterior] sibilants like [s] and [z] to assimilate in anteriority, but not vice versa [3,4].¹ As a consequence of this phonological rule, there are no words in Sarcee where [-anterior] sibilants may follow [+anterior] sibilants as in (1b), though the reverse is possible (1a) (data from Cook [3]). In the examples in (1), witness that words are well-formed when the [+anterior] sibilant [z] follows a [-anterior] sibilant like [ʃ], as in (a) ‘my duck,’ but there are no words in Sarcee where [-anterior] sibilants like [ʃ] may follow [+anterior] sibilants like [s], as in the hypothetical example in (1c).
¹ Linguistic descriptions of Sarcee (and many other languages with consonantal harmony [6,18]) are clear that agreeing consonants can be arbitrarily distant. (Jeffrey Heinz acknowledges the support of a 2008–2009 University of Delaware Research Fund Grant.)
(1) (schematically) a. [ʃ . . . z] ‘my duck’   b. [ʃ . . . ʃ] ‘I killed them again’   c. cf. *[s . . . ʃ] (hypothetical)
Heinz [7,8] observes that these kinds of long-distance dependencies can be described according to the well-formedness of subsequences: in Sarcee discontiguous subsequences like [ʃ . . . s] and [ʃ . . . z] are well-formed, but discontiguous subsequences like [s . . . ʃ] and [z . . . ʃ] are not (since the phonological rule requires the [s] to become [ʃ] when followed by [ʃ]). The Piecewise Testable (PT) languages [22] are a subclass of the regular languages that can describe this kind of non-local pattern. These languages are similar, in many respects, to the Locally Testable (LT) languages [16,17], except that the two classes differ in how they determine an expression’s well-formedness: for LT languages, an expression’s well-formedness depends entirely on the set of contiguous subsequences (up to some length k, known as k-factors) in the expression, whereas for PT languages, an expression’s well-formedness depends entirely on the set of subsequences (not necessarily contiguous and up to some length k) found within the expression. In fact, Sarcee-like non-local patterns are describable by a proper subclass of the PT languages, which we call Strictly Piecewise (SP). This name reflects the fact that the relationship between the SP languages and the PT languages is precisely analogous to the relationship between the Strictly Local (SL) languages and the LT languages [16,17]. The SP class completes a dual hierarchy of subregular language classes, with the Local branch being characterized by immediate adjacency (successor) and the Piecewise branch by precedence (less-than):
– SP and SL are the languages definable as intersections of certain simple negative constraints, i.e., as conjunctions of complements of atomic formulae which are satisfied by strings that contain a specified subsequence or, respectively, factor (the so-called forbidden subsequences/factors).
– PT and LT are the languages definable by arbitrary propositional formulae of this sort.
– The Star Free (SF) languages and the Locally Threshold Testable (LTT) languages are the languages that are First-Order (FO) definable over sequences with less-than and successor, respectively. Since successor is FO definable from less-than, LTT is a subclass of SF.
– The Regular languages are those that are Monadic Second-Order (MSO) definable over sequences with either less-than or successor.
Strikingly, SP turns out to be exactly the class of languages which are closed under subsequence. The structure of the paper is as follows. Section 2 defines basic notation. Section 3 reviews the Piecewise Testable languages. Most of the results in this section are well-known (see [22], [19], [15], [13], [23]); the rest are probably best attributed to folklore. The primary contributions of this paper are in Sections 4 and 5. Section 4 defines the Strictly Piecewise Testable languages, explores some of their properties and provides a number of abstract characterizations of the
class. Section 5 presents algorithms for extracting a SPk grammar from a minimal Deterministic Finite-State Automaton (DFA) recognizing a SPk language and for constructing a minimal DFA recognizing an SPk language from its grammar. Together these provide an algorithm for deciding if an arbitrary regular language is SP and, if it is, for determining the least k for which it is SPk . In section 6 we consider the parallels between the Piecewise Testable and Locally Testable hierarchies from a descriptive perspective.
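To make the forbidden-subsequence idea concrete before the formal development, here is a minimal sketch of an SP2-style well-formedness check. The code is ours, not from the paper, and the segment inventory and forbidden pairs are illustrative stand-ins for the Sarcee facts (with "S" standing in for [ʃ]):

```python
# A word is licensed iff none of its (not necessarily contiguous)
# 2-subsequences is forbidden.  Illustrative constraint: a [+anterior]
# sibilant may not be followed, at any distance, by a [-anterior] one.
FORBIDDEN_2_SUBSEQUENCES = {("s", "S"), ("z", "S")}

def sp2_well_formed(word):
    """Return True iff no forbidden pair occurs as a subsequence of word."""
    for i in range(len(word)):
        for j in range(i + 1, len(word)):
            if (word[i], word[j]) in FORBIDDEN_2_SUBSEQUENCES:
                return False
    return True

assert sp2_well_formed("SitSizaz")    # [-anterior] before [+anterior]: fine
assert not sp2_well_formed("sitaSa")  # [s] ... [S]: forbidden subsequence
```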
2 Preliminaries
We start with some mostly standard notation. P(S) denotes the power set of the set S; S1 − S2 set-theoretic difference. Σ denotes a finite set of symbols and a string over Σ is a finite sequence of symbols drawn from that set. Σ^k, Σ^≤k, Σ* denote all strings over this alphabet of length k, of length less than or equal to k, and of any finite length, respectively. ε denotes the empty string. |w| denotes the length of string w and |w|σ denotes the number of occurrences of σ ∈ Σ in w. A language L is a subset of Σ*; L̄ its complement relative to Σ*. Concatenation of sets of strings is denoted L1 · L2 = {uv | u ∈ L1 and v ∈ L2}. When discussing partial functions, we use the notation ↑ and ↓ to indicate that the function is undefined, respectively is defined, for some particular arguments.
A Deterministic Finite-state Automaton (DFA) is a tuple M = ⟨Q, Σ, q0, δ, F⟩ where Q is the state set, Σ is the alphabet, q0 is the start state, δ is a deterministic transition function and F is the set of accepting states. Let δ̂ : Q × Σ* → Q be the (partial) path function of M, i.e., δ̂(q, w) is the (unique) state reachable from state q via the sequence w, if any, or δ̂(q, w)↑ otherwise. The language recognized by a DFA M is L(M) =def {w ∈ Σ* | δ̂(q0, w)↓ ∈ F}.
Two strings w and v over Σ are distinguished by a DFA M iff δ̂(q0, w) ≠ δ̂(q0, v). They are Nerode equivalent with respect to a language L if and only if wu ∈ L ⇐⇒ vu ∈ L for all u ∈ Σ*. All DFAs which recognize L must distinguish strings which are inequivalent in this sense, but no DFA recognizing L necessarily distinguishes any strings which are equivalent. Hence the number of equivalence classes of strings over Σ modulo Nerode equivalence with respect to L gives a (tight) lower bound on the number of states required to recognize L. A DFA is minimal if the size of its state set is minimal among DFAs accepting the same language. A minimal DFA is trimmed if the (unique) sink state has been removed. The reader is referred to [10] for details.
The relation between strings which is fundamental to Piecewise Testability is the subsequence relation ⊑, which is a partial order on Σ*:
w ⊑ v ⇐⇒def w = ε, or w = σ1 · · · σn and (∃w0, . . . , wn ∈ Σ*)[v = w0σ1w1 · · · σnwn],
in which case we say w is a subsequence of v. The subsequence relation is compatible with concatenation: w1 ⊑ v1 and w2 ⊑ v2 implies that w1w2 ⊑ v1v2. For w ∈ Σ*, let Pk(w) =def {v ∈ Σ^k | v ⊑ w} and P≤k(w) =def {v ∈ Σ^≤k | v ⊑ w},
the set of subsequences of length k, respectively length no greater than k, of w. Let Pk (L) and P≤k (L) be the natural extensions of these to sets of strings. Note that P0 (w) = {ε}, for all w ∈ Σ ∗ , that P1 (w) is the set of symbols occurring in w and that P≤k (L) is finite, for all L ⊆ Σ ∗ .
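Since everything that follows is driven by the sets P≤k(w), a direct computation of them may help fix the definitions. This is our illustration, not code from the paper; it enumerates index combinations, so it is exponential in k, which is fine for the small k of interest:

```python
from itertools import combinations

def subseq_le_k(w, k):
    """P_{<=k}(w): all subsequences of w of length at most k (as strings),
    including the empty string."""
    out = {""}
    for length in range(1, k + 1):
        for positions in combinations(range(len(w)), length):
            out.add("".join(w[i] for i in positions))
    return out

# P_0(w) = {""}; P_1(w) is the set of symbols occurring in w.
assert subseq_le_k("ab", 1) == {"", "a", "b"}
assert "ab" in subseq_le_k("acb", 2) and "ba" not in subseq_le_k("acb", 2)
```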
3 Piecewise Testable Languages
The class of Piecewise Testable languages (PT) was introduced by Simon [22] as the Boolean closure of the class of languages of the form Σ*σ1Σ* · · · Σ*σnΣ* where σ1 · · · σn is a possibly empty word in Σ*. Following Sakarovitch and Simon [19], Lothaire [15] and Kontorovich et al. [13], we call the set of strings that contain w as a subsequence the (principal) shuffle ideal² of w:
SI(w) = {v ∈ Σ* | w ⊑ v}.
Then the class of Piecewise Testable (PT) languages is the smallest class of languages including SI(w) for all w ∈ Σ* and closed under Boolean operations. Similarly, the class of k-Piecewise Testable (PTk) languages is the smallest class of languages including SI(w) for all w ∈ Σ^≤k and closed under Boolean operations. Extending the notion of shuffle ideal to languages, SI(L) is the closure of L under the inverse of ⊑:
SI(L) = {v ∈ Σ* | ∃w ∈ L, w ⊑ v}.
From a model-theoretic perspective, PTk is the class of languages definable by propositional formulae in which the atomic formulae are strings in Σ^≤k, with a string w ∈ Σ* satisfying a formula p ∈ Σ^≤k iff w ∈ SI(p). PT is the class of languages definable by arbitrary finite formulae of this type.
The class of Piecewise Testable languages has a well-known characterization (sometimes taken to be its definition):
Theorem 1. L ⊆ Σ* is in the class PT iff there exists some k such that, for all w, v ∈ Σ*, P≤k(w) = P≤k(v) ⇒ (w ∈ L ⇔ v ∈ L).
Since P≤k(Σ*) is finite for all k and Σ, one consequence of this characterization is that a language is in PT iff it is the union of a finite set of equivalence classes modulo the relation w ≡k v ⇐⇒def P≤k(w) = P≤k(v). Given this, we can take a PTk language to be generated by a finite set of subsets of Σ^≤k.
Definition 1 (PTk Grammar). A PTk grammar is a pair G = ⟨Σ, T⟩ where T ⊆ P(Σ^≤k). The language licensed by a PTk grammar is
L(G) =def {w ∈ Σ* | P≤k(w) ∈ T}.
² Properly, SI(w) is the principal ideal generated by {w} wrt the inverse of ⊑.
Note that L(G) = ∅ iff T = ∅, and ε ∈ L(G) iff {ε} ∈ T.
Theorem 2. The classes PTk form a proper hierarchy in k: (∀k)[PTk ⊊ PTk+1].
The inclusion follows from the fact that ≡k+1 is a refinement of ≡k. To see that it is proper, let
L≤kb =def {w ∈ {a, b}* | |w|b ≤ k}.
This is not in PTk since P≤k(b^k) = P≤k(b^{k+1}). On the other hand, it is in PTk+1 since it is the complement of SI(b^{k+1}).
Theorem 3. The class of finite languages is a proper subset of the class of Piecewise Testable languages.
Any singleton set {w} is PT|w|+1, being the intersection of SI(w) and all complements of SI(v) for v ∈ Σ^{|w|+1}. Hence every finite set L is in PTk for every k greater than the length of the longest string in L. On the other hand, there is no k for which the class of finite languages is included in PTk.
Theorem 4. PT and PTk, for any k > 0, are not closed under concatenation.
The languages L≤kb = L≤(k−1)b · L≤1b witness that PTk is not closed under concatenation. For the general case consider the language Lawb = {a} · Σ* · {b}. This is the concatenation of three PT2 languages, but it is not, itself, PT. Suppose, by way of contradiction, that it was PT. Then it would be PTk for some k. But then the string (ab)^k ∈ Lawb while (ab)^k a ∉ Lawb, despite the fact that P≤k((ab)^k) = P≤k({a, b}*) = P≤k((ab)^k a), contradicting Theorem 1.
Theorem 5 (Simon 1975). The class of Piecewise Testable languages is a proper subset of the class of Star-Free languages: PT ⊊ SF.
SI(w), where w = σ1σ2 · · · σ|w|, is denoted by the SF expression ∅̄ · σ1 · ∅̄ · · · ∅̄ · σ|w| · ∅̄, where ∅̄ (the complement of the empty set) denotes Σ*, and SF is closed under Boolean operations. That the inclusion is proper is witnessed by the fact that Lawb ∈ SF but, as just shown, Lawb ∉ PT.
Theorem 6. The class of Star-Free languages is the closure of the class of Piecewise Testable languages under concatenation and Boolean operations.
SF is the closure of the class of Finite languages under union, concatenation and complement (hence concatenation and Boolean operations).
4 Strictly Piecewise Languages
Languages that are Locally Testable in the Strict Sense (Strictly Local, SL) are defined only in terms of the set of k-factors which are licensed to occur in the string (equivalently the complement of that set with respect to Σ ≤k , the forbidden factors) [16]. In this section we introduce the class of languages obtained by the analogous restriction to PT, which we call Piecewise Testable in the Strict Sense (Strictly Piecewise, SP).
Definition 2 (SPk Grammar). A SPk grammar is a pair G = ⟨Σ, T⟩ where T ⊆ Σ^k. The language licensed by a SPk grammar is
L(G) =def {w ∈ Σ* | P≤k(w) ⊆ P≤k(T)}.
A language is SPk iff it is L(G) for some SPk grammar G. It is SP iff it is SPk for some k.
The SP languages have a variety of characteristic properties.
Theorem 7. The following are equivalent:
1. L = ⋂_{w∈S} (Σ* − SI(w)), for some finite S,
2. L ∈ SP,
3. (∃k)[P≤k(w) ⊆ P≤k(L) ⇒ w ∈ L],
4. w ∈ L and v ⊑ w ⇒ v ∈ L (L is subsequence closed),
5. L = Σ* − SI(X) for some X ⊆ Σ* (L is the complement of a shuffle ideal).
Proof. These are each almost immediate consequences of their predecessors. That 1 implies 2 is witnessed by the SPk grammar ⟨Σ, Σ^≤k − S⟩, where k is the maximum length of the strings in S. To see that 2 implies 3, suppose that L ∈ SP. Then L ∈ SPk for some k and there is some Σ and T ⊆ Σ^≤k for which L = {w ∈ Σ* | P≤k(w) ⊆ P≤k(T)}. Then P≤k(L) = ⋃_{w∈L} P≤k(w) ⊆ P≤k(T). That L satisfies Property 3 follows immediately. That 3 implies 4 follows from the fact that v ⊑ w ∈ L ⇒ P≤k(v) ⊆ P≤k(w) ⊆ P≤k(L). That 4 implies 5 follows immediately from the definition of SI(X), since closure of L under subsequence implies that the complement of L is closed under inverse subsequence. Finally, 5 implies 1 by DeMorgan’s theorem and the fact that every shuffle ideal is finitely generated, which is a consequence of the fact that there are no infinite sequences of strings over a fixed alphabet which are pairwise unrelated by subsequence.³
Corollaries: If L ∈ SPk then:
1. wv ∈ L ⇒ w, v ∈ L (Prefix and Suffix closure),
2. P1(L) ⊆ L (Unit strings), and
3. L ≠ ∅ ⇒ ε ∈ L (Empty string).
Theorem 8 (Proper Hierarchy). (∀k)[SPk ⊊ SPk+1].
Inclusion follows from Property 3 of Theorem 7 along with the fact that P≤k(w) = P≤k(P≤k+1(w)).⁴ The same sequence of languages that witnesses separation of the PTk classes witnesses separation of the SPk classes.
³ This is Theorem 6.12 of Lothaire [15], although Lothaire attributes the general principle to Higman [9].
⁴ It should be noted, though, that the language licensed by ⟨Σ, T⟩ as an SPk+1 grammar is not equal to that licensed by ⟨Σ, T⟩ as an SPk grammar, since T ⊆ Σ^k implies that no string of length greater than k will be licensed in the SPk+1 sense.
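Read operationally, Definition 2 gives an immediate (naive) membership test for SPk grammars. The following sketch is ours, not the paper's; the helper p_le_k recomputes P≤k as in the Section 2 example:

```python
from itertools import combinations

def p_le_k(w, k):
    """P_{<=k}(w), as in the preliminaries."""
    return {""} | {"".join(w[i] for i in c)
                   for n in range(1, k + 1)
                   for c in combinations(range(len(w)), n)}

def sp_licensed(w, T, k):
    """w is licensed by the SP_k grammar <Sigma, T> iff
    P_{<=k}(w) is a subset of P_{<=k}(T)."""
    if not T:
        return False  # L(G) = empty when T = empty
    p_T = set().union(*(p_le_k(t, k) for t in T))
    return p_le_k(w, k) <= p_T

# The complement of SI(bb) over {a, b}: at most one b, any number of a's.
T = {"aa", "ab", "ba"}        # T lists the permitted 2-subsequences
assert sp_licensed("aaba", T, 2)
assert not sp_licensed("abab", T, 2)   # contains the forbidden subsequence bb
```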
Theorem 9. SP and SPk, for any k > 0, are closed under intersection and (in a trivial sense) Kleene closure. SPk is not closed under union or concatenation, although SP is closed under both. Neither SP nor SPk is closed under complement or intersection with Regular languages.
Proof. Closure under intersection follows immediately from Property 1 of Theorem 7. Non-closure of SPk under union is witnessed by the language L = L
5 SP and the Regular Languages
Since SP ⊆ PT ⊆ Star-Free ⊆ Regular, every SP language is recognizable by a Deterministic Finite-State Automaton (DFA). Theorem 7 has a number of consequences for the structure of the trimmed, minimal DFAs which recognize SP languages. In particular, let M = ⟨Q, Σ, q0, δ, F⟩ be a trimmed minimal DFA for which L(M) ∈ SPk. Then:
– All states of M are accepting states: F = Q.
– For all q1, q2 ∈ Q and σ ∈ Σ, if δ̂(q1, σ)↑ and δ̂(q1, w) = q2 for some w ∈ Σ*, then δ̂(q2, σ)↑. (Missing edges propagate down.)
– All cycles are self-edges.
– For all q1, q2 ∈ Q and v, w ∈ Σ*, if δ̂(q0, w) = q1, δ̂(q1, v) = q2 and q1 ≠ q2 then:
  • (∃u ∈ Σ*)[δ̂(q0, wu)↓ and δ̂(q0, wvu)↑], and
  • (∀u ∈ Σ*)[δ̂(q0, wvu)↓ ⇒ δ̂(q0, wu)↓].
Lemma 1. Let M = ⟨Q, Σ, q0, δ, F⟩ be a trimmed, minimal DFA for which L(M) ∈ SPk. Then P≤k(L(M)) = {w ∈ Σ^≤k | δ̂(q0, w)↓}.
This follows from closure under subsequence. Lemma 1 provides an algorithm which, given a DFA that recognizes an SPk language, constructs the SPk grammar for that language. One simply does a search of the transition graph of the DFA with the depth limited to k, recording the strings labeling the paths traversed. The time complexity of this algorithm is Θ(card(Σ)^k). Note that this construction will yield some SPk grammar given any M; that grammar will license L(M) iff L(M) is SPk.
Lemma 2. Suppose w ∈ Σ^k, w = σ1 · · · σk. Let M_SI(w) = ⟨Q, Σ, q0, δ, F⟩, where Q = {i | 1 ≤ i ≤ k}, q0 = 1, F = Q and for all qi ∈ Q, σ ∈ Σ:
    δ(qi, σ) = qi+1   if σ = σi and i < k,
    δ(qi, σ) = ↑      if σ = σi and i = k,
    δ(qi, σ) = qi     otherwise.
Then M_SI(w) is a minimal, trimmed DFA that recognizes the complement of SI(w), i.e., Σ* − SI(w) = L(M_SI(w)).
Lemma 2 provides the foundation for an algorithm which, given an SPk grammar ⟨Σ, T⟩ for a language L, constructs a minimal, trimmed DFA which recognizes L. One constructs the trimmed, minimal DFA for the complement of SI(w) for each w ∈ Σ^≤k − P≤k(T), and then constructs the trimmed, minimal DFA for their intersection. The complexity of this algorithm is Θ(card(Σ)^k) (since card(T) = Θ(card(Σ)^k), worst case).
Together, Lemmas 1 and 2, applied alternately for increasingly large k, provide a mechanism for determining the least k for which L(M) ∈ SPk if, in fact, there is such a k. All that remains is to determine a bound on the size of the k for which L(M) could be SPk.
Lemma 3. Suppose L ∈ SPk − SPk−1. Then every DFA that recognizes L has at least k states.
L ∈ SPk − SPk−1 implies that there is at least one w ∈ Σ^k such that SI(w) ∩ L = ∅ but for all proper subsequences v of w it is the case that SI(v) ∩ L ≠ ∅. In fact, since SPk languages are closed under subsequence, v itself must be in L. Suppose v1 and v2 are distinct proper prefixes of w, |v1| < |v2|. Then there is some u such that v2u = w ∉ L. On the other hand, v1u is a proper subsequence of w and thus, by choice of w, v1u ∈ L. Hence none of the k proper prefixes of w are Nerode equivalent (with respect to L) to each other or to w, and any DFA recognizing L will need to distinguish at least k + 1 classes of strings, hence have at least k states.
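The extraction algorithm of Lemma 1 is just a depth-k traversal of the transition graph. The sketch below is our illustration; representing the DFA as a dictionary of its defined transitions is an assumption of the sketch, not the paper's:

```python
def extract_spk_grammar(delta, q0, sigma, k):
    """Lemma 1 as code: collect {w in Sigma^{<=k} | path from q0 labeled w
    is defined} by depth-limited search.  `delta` maps (state, symbol) to a
    state for the defined transitions of a trimmed DFA (all states accept)."""
    found = set()

    def walk(q, w):
        found.add(w)
        if len(w) < k:
            for s in sigma:
                if (q, s) in delta:
                    walk(delta[(q, s)], w + s)

    walk(q0, "")
    return found

# Trimmed minimal DFA for the complement of SI("bb") (cf. Lemma 2, k = 2):
delta = {(1, "a"): 1, (1, "b"): 2, (2, "a"): 2}
assert extract_spk_grammar(delta, 1, "ab", 2) == {"", "a", "b", "aa", "ab", "ba"}
```

The traversal revisits states once per distinct path label, which is exactly why the running time is Θ(card(Σ)^k) rather than linear in the size of the DFA.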
Theorem 11. There is an algorithm which, given any Regular language L, decides if L is SP and, if it is, determines the least k for which L is SPk and returns an SPk grammar for L.
Assume, wlog, that L is presented as a trimmed, minimal DFA. The algorithm constructs potential SPk grammars for increasing k using Lemma 1 and constructs trimmed, minimal DFAs for each using Lemma 2. The first grammar constructed in this way whose DFA is isomorphic to the DFA for L will be an SPk grammar for the least k for which L is SPk. By Lemma 3, if no such DFA is found for k ≤ card(Q) then L is not SPk for any k. The time complexity of this algorithm is Θ(card(Σ)^card(Q)), i.e., it is exponential time. This, however, turns out to be optimal for algorithms that actually construct an SPk grammar for L.
Theorem 12. Suppose L ∈ SP. Let card(Q) be the size of the state set of a trimmed, minimal DFA recognizing L. Then the worst case size of the grammar for L is Θ(card(Σ)^card(Q)).
By Lemma 3, no grammar for an SP language can be larger than card(Σ)^card(Q), where Q is the state set of a trimmed minimal DFA recognizing that language. By Lemma 2, grammars of that size do exist.
6 Dual Subregular Hierarchies
The hierarchy of Local classes of languages has a very attractive model-theoretic characterization. The class of Strictly Local languages is properly extended by the class of Locally Testable languages, which is the class of languages definable by propositional formulae in which the atomic formulae are blocks of symbols interpreted as factors of the string. This is properly extended by the class of Locally Threshold Testable languages, which is the class of languages definable by First-Order formulae with adjacency (successor) but not precedence (less-than). This is properly extended by the class of Regular languages, which is the class of languages definable by Monadic Second-Order formulae with either adjacency or precedence, equivalently with both (since they are MSO definable from each other). As we have seen here, the Piecewise classes provide a parallel sequence of classes. The class of SP languages corresponds to the Strictly Local languages, except that they are defined in terms of subsequences rather than factors. This is extended by the class of PT languages, which is the class of languages definable by propositional formulae in which the atomic formulae are blocks of symbols interpreted as subsequences of the string. Hence PT corresponds to LT except that, again, it is defined in terms of subsequences rather than factors. The class of languages definable by First-Order formulae with precedence, corresponding to LTT on the adjacency side, is the class of Star-Free sets. Since adjacency is FO definable from precedence, LTT is actually a (proper) subclass of SF. Finally, the
[Fig. 1. Parallel Sub-regular Hierarchies — a lattice with Reg (MSO) at the top; below it the FO level with SF on the precedence (<) branch and LTT on the adjacency (+1) branch; below that the propositional (Prop) level with PT and LT; and at the bottom SP and SL.]
two branches become indistinguishable at the MSO level in the class of Regular languages.⁵
⁵ Interestingly, it can also be said that they join at the bottom of the hierarchy as well, since SP1 and SL1 both contain just ∅ and Γ* for each Γ ⊆ Σ.
7 Conclusion
We have characterized the Strictly Piecewise languages and presented their basic properties. Additionally, it was shown that SP languages complete a subregular hierarchy based on precedence in the same way the Local classes form a hierarchy based on adjacency. We have also provided algorithms for translating between the SP grammars defined here and finite state automata, as well as an algorithm for deciding if some regular language is SP. The theoretical contributions above provide a better understanding of linguistic, cognitive, and natural language processing models. We have already mentioned the capability of the SP languages to describe certain kinds of phonotactic patterns [7,8]. The introductory Sarcee pattern, for example, is given by a SP2 grammar which only prohibits subsequences consisting of a [+anterior] sibilant followed by a [-anterior] sibilant. Interestingly, SP languages also appear in models of reading comprehension [5,24] as well as in text classification [14,1] (see also Shawe-Taylor [20, chap. 11]). We hope that this paper continues to spur interest in the utility and beauty of languages described piecewise.
References
1. Cancedda, N., Gaussier, E., Goutte, C., Renders, J.M.: Word-sequence kernels. Journal of Machine Learning Research 3, 1059–1082 (2003)
2. Chomsky, N.: Three models for the description of language. I.R.E. Transactions on Information Theory IT-2, 113–123 (1956); reprinted in Luce, R.D., Bush, R.R., Galanter, E. (eds.) Readings in Mathematical Psychology, vol. II, pp. 113–123. John Wiley & Sons, New York (1965)
3. Cook, E.D.: The synchronic and diachronic status of Sarcee gy. International Journal of American Linguistics 4, 192–196 (1978)
4. Cook, E.D.: A Sarcee Grammar. University of British Columbia Press (1984)
5. Grainger, J., Whitney, C.: Does the huamn mnid raed wrods as a wlohe? Trends in Cognitive Science 8, 58–59 (2004)
6. Hansson, G.: Theoretical and typological issues in consonant harmony. Ph.D. thesis, University of California, Berkeley (2001)
7. Heinz, J.: The Inductive Learning of Phonotactic Patterns. Ph.D. thesis, University of California, Los Angeles (2007)
8. Heinz, J.: Learning long distance phonotactics (2008) (submitted manuscript)
9. Higman, G.: Ordering by divisibility in abstract algebras. Proceedings of the London Mathematical Society 2, 326–336 (1952)
10. Hopcroft, J., Motwani, R., Ullman, J.: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading (2001)
11. Joshi, A.K.: Tree-adjoining grammars: How much context sensitivity is required to provide reasonable structural descriptions? In: Dowty, D., Karttunen, L., Zwicky, A. (eds.) Natural Language Parsing, pp. 206–250. Cambridge University Press, Cambridge (1985)
12. Kobele, G.: Generating Copies: An Investigation into Structural Identity in Language and Grammar. Ph.D. thesis, University of California, Los Angeles (2006)
13. Kontorovich, L.A., Cortes, C., Mohri, M.: Kernel methods for learning languages. Theoretical Computer Science 405(3), 223–236 (2008)
14. Lodhi, H., Cristianini, N., Shawe-Taylor, J., Watkins, C.: Text classification using string kernels. Journal of Machine Learning Research 2, 419–444 (2002)
15. Lothaire, M. (ed.): Combinatorics on Words. Cambridge University Press, Cambridge (1997)
16. McNaughton, R., Papert, S.: Counter-Free Automata. MIT Press, Cambridge (1971)
17. Rogers, J., Pullum, G.: Aural pattern recognition experiments and the subregular hierarchy. In: Kracht, M. (ed.) Proceedings of 10th Mathematics of Language Conference, pp. 1–7. University of California, Los Angeles (2007)
18. Rose, S., Walker, R.: A typology of consonant agreement as correspondence. Language 80(3), 475–531 (2004)
19. Sakarovitch, J., Simon, I.: Subwords. In: Lothaire, M. (ed.) Combinatorics on Words, Encyclopedia of Mathematics and Its Applications, vol. 17, ch. 6, pp. 105–134. Addison-Wesley, Reading (1983)
20. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2005)
21. Shieber, S.: Evidence against the context-freeness of natural language. Linguistics and Philosophy 8, 333–343 (1985)
22. Simon, I.: Piecewise testable events. In: Brakhage, H. (ed.) GI-Fachtagung 1975. LNCS, vol. 33, pp. 214–222. Springer, Heidelberg (1975)
23. Trahtman, A.: Piecewise and local threshold testability of DFA. In: Freivalds, R. (ed.) FCT 2001. LNCS, vol. 2138, pp. 347–358. Springer, Heidelberg (2001)
24. Whitney, C., Cornelissen, P.: SERIOL reading. Language and Cognitive Processes 23, 143–164 (2008)
A Note on the Complexity of Abstract Categorial Grammars

Sylvain Salvati
INRIA Bordeaux Sud-Ouest, LaBRI, Université de Bordeaux
1 Introduction
This paper presents a precise and detailed study of the complexities of the membership and the universal membership problems for Abstract Categorial Grammars (ACG). ACGs have been introduced by Philippe de Groote in [2] as a simplification over categorial grammars which reduces the number of necessary primitives involved in the definition of the formalism. Thus in ACGs, every structure is represented with the help of the linear λ-calculus and the languages defined by means of ACGs are sets of linear λ-terms. The problem under investigation has already been studied in [7], but we give here some more precise results with some arguably simpler proofs. We use the same classification of the grammars in terms of the order of the lexicon and of the order of the abstract language.
2 Preliminaries
A higher order signature Σ is a triple (A, C, τ) such that A is a finite set of atomic types, C is a finite set of constants and τ is a typing function which associates to each element of C an element of TA, the set of types, which itself is defined as the smallest set containing A and closed under the use of the infix binary operator →. The order of a type α is given by ord(α) = 1 when α ∈ A and ord(α) = max(ord(α1) + 1, ord(α2)) when α = α1 → α2. The order of a signature Σ = (A, C, τ) is defined as ord(Σ) = max_{c∈C}(ord(τ(c))). For a given α ∈ TA, the set Λ^α_Σ of linear λ-terms of type α built on the higher order signature Σ is the smallest set verifying:
1. if α ∈ TA then x^α ∈ Λ^α_Σ,
2. if c ∈ C then c ∈ Λ^{τ(c)}_Σ,
3. if M1 ∈ Λ^{α→β}_Σ, M2 ∈ Λ^α_Σ and FV(M1) ∩ FV(M2) = ∅ then (M1 M2) ∈ Λ^β_Σ,¹
4. if M ∈ Λ^β_Σ and x^α ∈ FV(M) then λx^α.M ∈ Λ^{α→β}_Σ.
¹ Here FV(M) denotes the set of free variables of M; this set is defined as usual.
In Abstract Categorial Grammars, the λ-calculus is used to represent both the surface structure and the deep structure. The relation between these two levels is implemented with homomorphisms between higher order signatures. A homomorphism between the signatures Σ1 and Σ2 is a pair of functions (g, h) which respectively map TA1 to TA2 and ΛΣ1 to ΛΣ2. Furthermore, homomorphisms verify the following constraints:
1. g(α → β) = g(α) → g(β),
2. h(x^α) = x^{g(α)},
3. h(c) is a closed (i.e. FV(h(c)) = ∅) element of Λ^{g(τ(c))}_{Σ2},
4. h(M1 M2) = h(M1) h(M2),
5. h(λx^α.M) = λx^{g(α)}.h(M).
One has to note that whenever M ∈ Λ^α_{Σ1} then h(M) ∈ Λ^{g(α)}_{Σ2}. For a given homomorphism H = (g, h) we will in general write H(α) instead of g(α) and H(M) instead of h(M). The order of a homomorphism H between the signatures Σ1 = (A1, C1, τ1) and Σ2 = (A2, C2, τ2) is defined as ord(H) = max_{α∈A1}(ord(H(α))). An Abstract Categorial Grammar G is a 4-tuple (Σ1, Σ2, L, S) such that:
1. Σ1 is a higher order signature, the abstract vocabulary,
2. Σ2 is a higher order signature, the object vocabulary,
3. L is a homomorphism from Σ1 to Σ2, the lexicon,
4. S is an element of A1, the distinguished type.
The ACG G defines two languages:
1. the abstract language, A(G) = {M ∈ Λ^S_{Σ1} | M is closed},
2. the object language, O(G) = {M ∈ Λ^{L(S)}_{Σ2} | ∃N ∈ A(G). L(N) =βη M}.
The abstract language is the language in which deep structures are represented and the object language is the language in which surface structures are represented. We may use the word language instead of the expression object language. An ACG is said to be lexicalized when its lexicon associates to each abstract constant a term that contains at least one constant from the object signature. An ACG is said to be of order k if its abstract signature is of order at most k. Furthermore an ACG G = (Σ1, Σ2, L, S) belongs to the set G(n, m) when ord(Σ1) ≤ n and ord(L) ≤ m. Intuitively this classification leads to a hierarchy of languages; indeed, increasing the order of the abstract signature increases the descriptive complexity of the abstract language and increasing the order of the lexicon gives more possibilities of transformations of the abstract structure into an object one. Concerning the expressiveness of second order ACGs many things are already known. In [3] it is shown how to represent context free grammars, linear context free tree grammars, and linear context free rewriting systems; [6] proves that the string language generated by a second order ACG is always the language of a linear context free rewriting system, and that therefore the hierarchy G(2, m) collapses at G(2, 4); and [4] proved that the tree languages generated by second order ACGs are always the tree languages of hyperedge replacement grammars. In what follows we show that the membership problem for the grammars of G(2, n) is polynomial. We also show that the universal membership problem is NP-complete for lexicalized grammars of G(2, 2) and we exhibit a lexicalized grammar of G(3, 1) whose language is NP-complete. These last results are an improvement over [7], which shows that the universal membership problem is NP-complete for lexicalized ACGs of G(4, 2) and exhibits a lexicalized ACG of
G(4, 3) whose language is NP-complete. Furthermore, if P ≠ NP, these results are optimal with respect to the hierarchy G(n, p). Indeed, since we show that the membership problem for grammars of G(2, n) is polynomial, it is not possible (if P ≠ NP) to find a grammar whose language is NP-complete in G(2, n); and it is obvious that the universal membership problem is polynomial for grammars in G(2, 1).²
² The normal forms of terms in the language, noted with the de Bruijn convention, can easily be shown to be recognized by a bottom-up tree automaton whose size is linear with respect to the size of the grammar. Since normalizing a linear λ-term can be done in polynomial time, this gives the result.
3 Second Order ACGs
In this section we give an original technique to prove that for second order ACGs, the membership problem is in general polynomial. This technique is based on a notion, syntactic descriptions, that has been introduced in [5] in order to address the higher order matching problem in the linear λ-calculus. Syntactic descriptions are in fact linear types that are built from the subterms of a given term. The type system that uses syntactic descriptions differs from the usual type assignment performed for linear λ-terms by the way it treats constants. We denote the subterms of a λ-term u by the pairs (C[], t) where C[] is a context, i.e. a λ-term with a hole (note that when one puts a term t in the hole of a context C[], some variables that are free in t may be bound by C[]; for example if C[] = λx.[] and t = x then C[t] = λx.x), t is a linear λ-term and C[t] is syntactically equal to u. We now define syntactic descriptions for terms in long normal form. Given a higher order signature Σ = (A, C, τ) and u an element of ΛΣ which is in long normal form, the family of sets (D^α_u)_{α∈TA} is defined as the smallest family verifying:
1. if α ∈ A then D^α_u = {(C[], t) ∈ S_u | t ∈ Λ^α_Σ},
2. D^{β→α}_u = D^β_u × D^α_u.
The elements of D^α_u are then used to type terms of Λ^α_Σ; the rules used to type terms are the following:

    d ∈ D^α_u
    ────────────────────── (Axiom)
    u; x^α : d ⊢ x^α : d

    (C[], a) ∈ S_u
    ────────────────────── (Constant)
    u; ⊢ a : θ(C[], a)

    u; Γ, x^α : d ⊢ t : e
    ────────────────────────── (λ-abst.)
    u; Γ ⊢ λx^α.t : d → e

    u; Γ1 ⊢ t1 : d → e    u; Γ2 ⊢ t2 : d
    ───────────────────────────────────── (App.)
    u; Γ1, Γ2 ⊢ t1 t2 : e

Given u an element of Λ^α_Σ in long normal form and (C[], t) ∈ S_u, θ(C[], t) is defined as follows:
1. if C[] = C′[[] t′] then θ(C[], t) = θ(C′[t []], t′) → θ(C′[], t t′),
2. if t = λx.t′ then θ(C[], t) = θ(C[λx.C_{t′,x}[]], x) → θ(C[λx.[]], t′),
3. θ(C[], t) = (C[], t) otherwise.
It is proved in [5] that for u closed and in long normal form, u; ⊢ v : θ([], u) is derivable if and only if v =βη u. Thus to prove that a term u of ΛΣ2 is an element of O(G) it suffices to construct a term t of Λ^S_{Σ1} such that u; ⊢ L(t) : θ([], u) is derivable. To this end we saturate a set H of pairs (α, d) drawn from ({α} × D^{L(α)}_u)_{α∈A1}. During one step, we transform the set H into a set H′ in the following way:
1. if there is c ∈ C1 such that τ1(c) = α with α ∈ A1 and, for d ∈ D^{L(α)}_u, u; ⊢ L(c) : d is derivable, then we let H′ = H ∪ {(α, d)};
2. if there is c ∈ C1 such that τ1(c) = α1 → · · · → αn → α0 with αi ∈ A1 for all i ∈ [0, n], and for all i ∈ [1, n] (αi, di) ∈ H and u; ⊢ L(c) : d1 → · · · → dn → d0 is derivable, then we let H′ = H ∪ {(α0, d0)}.
It is obvious that, with these rules, one may build a set containing the pair (S, θ([], u)) if and only if there is t ∈ Λ^S_{Σ1} such that u; ⊢ L(t) : θ([], u) is derivable, i.e. such that L(t) =βη u, or in other words u ∈ O(G). This algorithm can easily be implemented in polynomial time (parameters of the grammar being allowed to appear as exponents), since the size of an element of D^{L(α)}_u is linearly bounded by the product of the size of α and the size of u. This finally shows that the membership problem for ACGs of G(2, n) is polynomial.
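The saturation procedure is a standard least-fixed-point computation. The sketch below is our abstraction of it, not code from the paper: it assumes the derivable judgements u; ⊢ L(c) : d1 → · · · → dn → d0 have already been computed (that is the type-system part) and are handed in as a table, and it then just iterates the two rules until nothing new is added:

```python
def saturate(constants, derivations, goal):
    """Least fixed point of the two closure rules in the text.

    constants:   {c: (arg_types, result_type)} with atomic types as strings
    derivations: {c: set of tuples (d1, ..., dn, d0)} -- assumed to be the
                 description tuples for which u; |- L(c) : d1 -> ... -> d0
                 is derivable (computed elsewhere by the type system)
    goal:        the pair (S, theta([], u)) we are looking for
    """
    H = set()
    changed = True
    while changed:
        changed = False
        for c, (args, a0) in constants.items():
            for ds in derivations.get(c, ()):
                *d_args, d0 = ds
                if all((a, d) in H for a, d in zip(args, d_args)):
                    if (a0, d0) not in H:
                        H.add((a0, d0))
                        changed = True
    return goal in H

# Toy instance: a 0-ary constant producing (A, d1), a constant of type A -> S.
constants = {"c0": ([], "A"), "c1": (["A"], "S")}
derivations = {"c0": {("d1",)}, "c1": {("d1", "d2")}}
assert saturate(constants, derivations, ("S", "d2"))
```

Since H can only grow and is drawn from a polynomially bounded set of pairs, the loop terminates after polynomially many rounds, matching the complexity claim above.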
4 Universal Membership Problem for Second Order ACGs
We now show that the universal membership problem for lexicalized grammars of G(2, 2) is NP-complete. We reduce the X3C problem, which is known to be NP-complete [1], to this problem. X3C problems have as input a pair (X, B) where X = {a1; . . . ; a3n} is a set of 3n pairwise distinct elements and B = {B1; . . . ; Bm} is a set where Bi = {ai1; ai2; ai3} with 1 ≤ i1 < i2 < i3 ≤ 3n. Solving an X3C problem amounts to finding C ⊆ B such that C is a partition of X. To prove the NP-hardness of the universal membership problem of lexicalized ACGs of G(2, 2), for any instance of an X3C problem (X, B) we give an ACG G_{X,B} and a term t_{X,B} such that t_{X,B} ∈ O(G_{X,B}) if and only if the X3C problem admits a solution. We let G_{X,B} = (Σ1, Σ2, L, D0) where A1 = {D0; . . . ; Dn}, C1 = {E} ∪ {E_{Bi,k1,k2,k3,k} | Bi ∈ B ∧ 0 ≤ k < n ∧ 1 ≤ k1 < k2 < k3 ≤ k}, τ1(E) = Dn and τ1(E_{Bi,k1,k2,k3,k}) = Dk+1 → Dk. We also let A2 = {ι}, C2 = {e} ∪ X with e ∉ X, τ2(ai) = ι for 1 ≤ i ≤ 3n and τ2(e) = ι → · · · → ι → ι (with 3n argument types ι). We then let:
1. L(Dk) = ι → · · · → ι → ι (with 3k argument types ι),
2. L(E) = λx1 . . . x3n.e x1 . . . x3n, and
3. L(E_{Bi,k1,k2,k3,k}) = λg x1 . . . x_{k1−1} x_{k1+1} . . . x_{k2−1} x_{k2+1} . . . x_{k3−1} x_{k3+1} . . . x_{3k}. g x1 . . . x_{k1−1} a_{i1} x_{k1+1} . . . x_{k2−1} a_{i2} x_{k2+1} . . . x_{k3−1} a_{i3} x_{k3+1} . . . x_{3k}.
It is then easy to prove that the term e a1 . . . a3n is in O(G_{X,B}) if and only if (X, B) admits a solution.
5 Universal Problem for Lexicalized ACGs
We now construct a lexicalized ACG of G(3, 1) whose language is NP-complete. The language recognized by this grammar contains an encoding of the set of 3-PARTITION problems that admit a solution. A 3-PARTITION problem is a pair ({s1; . . . ; s3m}, n) where n is an integer and for all i ∈ [1, 3m] si is an integer verifying n/4 < si < n/2. Such a problem is said to admit a solution if there is a partition (Si)_{i∈[1,m]} of {s1; . . . ; s3m} such that for all i ∈ [1, m], Σ_{s∈Si} s = n. Remark that each Si must contain exactly three elements. Determining whether a 3-PARTITION problem admits a solution is known to be NP-complete [1]. We now build G = (Σ1, Σ2, L, S) with the desired properties. We let A1 = {B1; B2; B3; C; D; E; L; S}, C1 = {e; e′; nil; f1; f2; f3; cons; h} with:
1. τ1(e) = (B1 → B2 → B3 → C → D) → S,
2. τ1(e′) = L → S,
3. τ1(f1) = τ1(f2) = τ1(f3) = (B1 → B2 → B3 → C → D) → (B1 → B2 → B3 → C → D),
4. τ1(cons) = E → L → L,
5. τ1(nil) = L, and
6. τ1(h) = (E → E → E → E → S) → (B1 → B2 → B3 → C → D).
We let A2 = {∗}, C2 = {a, b, c, d, o} with τ2(a) = ∗ → ∗ → ∗, τ2(b) = τ2(c) = τ2(d) = ∗ → ∗ and τ2(o) = ∗. Finally we define the lexicon as follows:
1. L(α) = ∗ for all α ∈ A1,
2. L(e) = λf.f o o o o,
3. L(e′) = λx.d x,
4. L(f1) = λf x1 x2 x3 y.f (b x1) x2 x3 (c y),
5. L(f2) = λf x1 x2 x3 y.f x1 (b x2) x3 (c y),
6. L(f3) = λf x1 x2 x3 y.f x1 x2 (b x3) (c y),
7. L(cons) = λx y.a x y,
8. L(nil) = o,
9. L(h) = λf x1 x2 x3 y.f (d x1) (d x2) (d x3) (d y).
A Note on the Complexity of ACGs
271
integer associated to {p1 ; p2 ; p3 } we have n = p1 + p2 + p3 . Thus verifying that a certain 3-PARTITION problem ({s1 ; . . . ; s3m }, n) has a solution amounts to check whether a list that contains each si represented with b’s and m times the integer n represented with c’s is an element of O(G).
References
1. Garey, M.R., Johnson, D.S.: Computers and Intractability – A Guide to the Theory of NP-Completeness. Freeman, San Francisco (1979)
2. de Groote, P.: Towards abstract categorial grammars. In: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics and 10th Conference of the European Chapter, pp. 148–155. Morgan Kaufmann Publishers, San Francisco (2001)
3. de Groote, P., Pogodalla, S.: On the expressive power of abstract categorial grammars: Representing context-free formalisms. Journal of Logic, Language and Information 13(4), 421–438 (2005)
4. Kanazawa, M.: Second-order ACGs as hyperedge replacement grammars. In: Workshop on New Directions in Type-theoretic Grammars, NDTTG (2007)
5. Salvati, S.: Syntactic descriptions: a type system for solving matching equations in the linear λ-calculus. In: Proceedings of the 17th International Conference on Rewriting Techniques and Applications, pp. 151–165 (2006)
6. Salvati, S.: Encoding second order string ACG with deterministic tree walking transducers. In: Wintner, S. (ed.) Proceedings FG 2006: the 11th conference on Formal Grammars. FG Online Proceedings, pp. 143–156. CSLI Publications, Stanford (2007)
7. Yoshinaka, R., Kanazawa, M.: The complexity and generative capacity of lexicalized abstract categorial grammars. In: Blache, P., Stabler, E.P., Busquets, J.V., Moot, R. (eds.) LACL 2005. LNCS (LNAI), vol. 3492, pp. 330–346. Springer, Heidelberg (2005)
Almost All Complex Quantifiers Are Simple

Jakub Szymanik
Department of Philosophy, Utrecht University
Heidelberglaan 6, 3584 CS Utrecht, The Netherlands
[email protected]
Abstract. We prove that PTIME generalized quantifiers are closed under Boolean operations, iteration, cumulation and resumption.
Keywords: generalized quantifiers; computational complexity; polyadic quantifiers; Boolean combinations; iteration; cumulation; resumption.
1 Introduction
Most research in generalized quantifier theory has been directed towards monadic quantification in natural language. The recent monograph [1] bears witness to this tendency, devoting more than 90% of its volume to the discussion of monadic quantifiers. It is then clear that the definability and complexity of monadic quantifiers have been extensively studied. For example, it is known that monadic quantifiers definable in first-order logic, like "all" or "at least 7", are recognizable by acyclic finite automata [2]; that first-order logic enriched by all quantifiers of the form "divisible by n" (e.g., "even" and "odd") corresponds to the class of all regular languages [3]; and that proportional quantifiers, like "most", can be recognized by push-down automata. Those results suggest that the comprehension of monadic quantifiers in natural language should be relatively easy. Indeed, a corpus of empirical studies shows that the cognitive task of recognizing the logical value of sentences with monadic quantifiers is simple. In fact, recently the automata-theoretic model for processing monadic quantifiers has been confronted with human comprehension, and the results show that the model captures many important cognitive aspects of the problem (see e.g. [4,5]). Those studies — linking computational complexity with cognitive processing — lead to natural interest in computational properties of polyadic quantification (multi-quantifier sentences).
In the case of polyadic quantifiers things look different. In the logical literature one can find examples of natural language polyadic quantifiers with computationally intractable model-checking problems. Branching quantifiers are an important example. It has been observed that branching readings of natural language sentences define NP-complete problems (see [6,7,8]). For instance, consider the following sentences:
(1) Some book by every author is referred to in some essay by every critic.
(2) In my class most boys and most girls dated each other.
Sentences with branching interpretations are definitely not a typical part of everyday language (see e.g. [9]), but one can find less controversial examples of semantic constructions with high computational complexity. For example, some reciprocal sentences with quantified antecedents are NP-complete given a plausible interpretation (see [10]), e.g. the following sentence:
(3) Most parliament members refer to each other.
Additionally, collective quantification, like in sentence (4), can also lead to combinatorial explosion when the verification process is considered (see [11]).
(4) Most of the PhD students played Hold'em together.
Those results show that the direct understanding of some polyadic quantifiers in natural language might be difficult, if not impossible (see e.g. [12,8]). However, one might ask whether those examples are representative of natural language. In this paper we study the most common semantic constructions that turn simple quantifiers into complex ones: Boolean operations, iteration, cumulation and resumption. We prove that they do not increase computational complexity when applied to determiners. More precisely, PTIME quantifiers are closed under the application of those lifts. Most of the natural language determiners correspond to monadic quantifiers computable in polynomial time. This observation suggests that polyadic quantifiers in natural language are typically tractable, and that the intractable constructions mentioned in the previous paragraph and extensively studied in logic are rather exceptional.
1.1 Mathematical Preliminaries
Generalized Quantifiers. Usually in linguistic semantics quantifiers are treated as relations between subsets of the universe. However, it is well known, and often used in logic, that generalized quantifiers might equivalently be defined as classes of models (see e.g. [1]). The formal definition is as follows:
Definition 1. Let t = (n1, . . . , nk) be a k-tuple of positive integers. A generalized quantifier of type t is a class Q of models of a vocabulary τt = {R1, . . . , Rk}, such that Ri is ni-ary for 1 ≤ i ≤ k, and Q is closed under isomorphisms, i.e. if M and M′ are isomorphic, then (M ∈ Q ⇐⇒ M′ ∈ Q).
If in the above definition ni ≤ 1 for all i, then we say that the quantifier is monadic; otherwise we call it polyadic. Let us explain this definition further by giving a few examples. Consider a sentence of the form Every A is B, where A stands for poets and B for people having low self-esteem. The sentence is true if and only if A ⊆ B. Therefore, according to the definition, the quantifier "every" is of type (1, 1) and corresponds to the class of models (M, A, B) in which A ⊆ B. For the same reasons the quantifier "an even number of" corresponds to the class of models in which
the cardinality of A ∩ B is an even number. Finally, let us consider the quantifier "most" of type (1, 1). The sentence Most As are B is true if and only if card(A ∩ B) > card(A − B), and therefore the quantifier corresponds to the class of models where this inequality holds. Formally speaking:
∀ = {(M, A) | A = M}.
∃ = {(M, A) | A ⊆ M and A ≠ ∅}.
Every = {(M, A, B) | A, B ⊆ M and A ⊆ B}.
Even = {(M, A, B) | A, B ⊆ M and card(A ∩ B) is even}.
Most = {(M, A, B) | A, B ⊆ M and card(A ∩ B) > card(A − B)}.
Quantifiers in Finite Models. In this paper we are interested in finite models, as arguably they suffice to model typical meanings of natural language expressions (see e.g. [5] for a discussion). Finite models can be encoded as finite strings over some vocabulary as follows. Let K be a class of finite models over some fixed vocabulary τ. We want to treat K as a problem (language) over the vocabulary τ. To do this we need to code τ-models as finite strings. We can assume that the universe of a model M ∈ K consists of natural numbers: U = {1, . . . , n}. A natural way of encoding a model M (up to isomorphism) is by listing its universe, U, and storing the interpretation of the symbols in τ by writing down their truth-values on all tuples of objects from U.
Definition 2. Let τ = {R1, . . . , Rk} be a relational vocabulary and M a τ-model of the following form: M = (U, R1^M, . . . , Rk^M), where U = {1, . . . , n} is the universe of model M and Ri^M ⊆ U^{ni} is an ni-ary relation over U, for 1 ≤ i ≤ k. We define a binary encoding for τ-models. The code for M is a word over {0, 1, #} of length O((card(U))^c), where c is the maximal arity of the predicates in τ (or c = 1 if there are no predicates). The code has the following form: ñ#R̃1^M# . . . #R̃k^M, where:
– ñ is the part coding the universe of the model and consists of n 1s.
– R̃i^M, the code for the ni-ary relation Ri^M, is an n^{ni}-bit string whose j-th bit is 1 iff the j-th tuple in U^{ni} (ordered lexicographically) is in Ri^M.
– # is a separating symbol.
Therefore, according to Definition 1, generalized quantifiers can be treated as classes of such finite strings, i.e., languages. Now we can easily fit these notions into the descriptive complexity paradigm (see e.g. [13]).
Definition 3. By the complexity of a quantifier Q we mean the computational complexity of the corresponding class of finite models.
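To make the encoding concrete, the following minimal Python sketch (ours, not part of the original paper; all names are our own) codes a monadic model (U, A, B) as a word over {0, 1, #} in the sense of Definition 2, and checks the quantifiers "every" and "most" directly on the model, both in linear time.

# A minimal sketch of Definition 2 for a monadic vocabulary {A, B}.
# The universe is U = {1, ..., n}; A and B are subsets of U.

def encode_model(n, A, B):
    """Code a model (U, A, B) as a word over {0, 1, #}:
    n 1s for the universe, then one bit per element for each predicate."""
    universe = "1" * n
    code_A = "".join("1" if i in A else "0" for i in range(1, n + 1))
    code_B = "".join("1" if i in B else "0" for i in range(1, n + 1))
    return universe + "#" + code_A + "#" + code_B

# PTIME (in fact linear) model-checkers for two type (1,1) quantifiers.
def every(A, B):
    return A <= B                      # A is a subset of B

def most(A, B):
    return len(A & B) > len(A - B)     # card(A ∩ B) > card(A − B)

# Example: "Most As are B" on a 5-element universe.
A, B = {1, 2, 3}, {2, 3, 4}
print(encode_model(5, A, B))           # 11111#11100#01110
print(every(A, B), most(A, B))         # False True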
For example, consider a quantifier of type (1, 2): a class of finite colored graphs of the form M = (M, A^M, R^M). Let us take a model of this form, M, and a quantifier Q. Our computational problem is to decide whether M ∈ Q; or, equivalently, to solve the query whether M |= Q[A, R]. This can simply be viewed as the model-checking problem for quantifiers. In the following section we investigate the computational complexity of the model-checking problem for polyadic quantifiers commonly occurring in natural language. But before we turn to it we need to recall some basic concepts of computational complexity theory (see e.g. [14] for an extensive treatment).
Complexity Classes. Let f : ω −→ ω be a natural number function. TIME(f) is the class of languages (problems) which can be recognized by a deterministic Turing machine in time bounded by f with respect to the length of the input. In other words, L ∈ TIME(f) if there exists a deterministic Turing machine M such that for every x ∈ L, the computation path of M on x is shorter than f(n), where n is the length of x. TIME(f) is called a deterministic computational complexity class. A non-deterministic complexity class, NTIME(f), is the class of languages L for which there exists a non-deterministic Turing machine M such that for every x ∈ L all branches in the computation tree of M on x are bounded by f(n) and, moreover, M decides L. One way of thinking about a non-deterministic Turing machine bounded by f is that it first guesses the right answer and then deterministically, in time bounded by f, checks whether the guess is correct. SPACE(f) is the class of languages which can be recognized by a deterministic machine using at most f(n) cells of the working tape. NSPACE(f) is defined analogously. Below we define some well-known complexity classes, i.e., sets of languages of related complexity. In other words, a complexity class is a set of problems that can be solved by a Turing machine using O(f(n)) of a time or space resource, where n is the size of the input.
Definition 4
– LOGSPACE = ⋃_{k∈ω} SPACE(k log n)
– PTIME = ⋃_{k∈ω} TIME(n^k)
– NP = ⋃_{k∈ω} NTIME(n^k)
If L ∈ NP, then we say that L is decidable (computable, solvable) in non-deterministic polynomial time, and likewise for other complexity classes. Moreover, we will need the concept of relativization, defined via oracle machines. An oracle machine can be described as a Turing machine with a black box, called an oracle, which is able to decide certain decision problems in a single step. More precisely, an oracle machine has a separate write-only oracle tape for writing down queries for the oracle. In a single step, the oracle computes the query, erases its input, and writes its output to the tape.
Definition 5. If B and C are complexity classes, then B relativized to C, B^C, is the class of languages recognized by oracle machines which obey the bounds defining B and use an oracle for problems belonging to C.
The question whether PTIME is strictly contained in NP is the famous P versus NP question, a Millennium Prize Problem and one of the most fundamental problems in theoretical computer science, and in mathematics in general. Its importance reaches well outside the theoretical sciences, as the problems in NP are usually taken to be intractable, or not efficiently computable, as opposed to the problems in PTIME, which are conceived of as efficiently solvable. In this paper we take this distinction for granted and investigate semantic constructions in natural language from that perspective.
2 Complexity of Polyadic GQs in Language
One way to deal with polyadic quantification in natural language is to define it in terms of monadic quantifiers using Boolean combinations and so-called polyadic lifts. Below we recall the definitions of the Boolean combinations of quantifiers and of some well-known polyadic lifts: iteration, cumulation, and resumption (see e.g. [15]). We then observe that they do not increase the computational complexity of quantifiers.
2.1 Boolean Combinations
To account for complex noun phrases, like those occurring in sentences (5)–(8), we define disjunction, conjunction, outer negation (complement) and inner negation (post-complement) of generalized quantifiers.
(5) At least 5 or at most 10 departments can win EU grants. (disjunction)
(6) Between 100 and 200 students started in the marathon. (conjunction)
(7) Not all students passed. (outer negation)
(8) All students did not pass. (inner negation)
Definition 6. Let Q, Q′ be generalized quantifiers, both of type (n1, . . . , nk). We define:
(Q ∧ Q′)M[R1, . . . , Rk] ⇐⇒ QM[R1, . . . , Rk] and Q′M[R1, . . . , Rk] (conjunction)
(Q ∨ Q′)M[R1, . . . , Rk] ⇐⇒ QM[R1, . . . , Rk] or Q′M[R1, . . . , Rk] (disjunction)
(¬Q)M[R1, . . . , Rk] ⇐⇒ not QM[R1, . . . , Rk] (complement)
(Q¬)M[R1, . . . , Rk] ⇐⇒ QM[R1, . . . , Rk−1, M − Rk] (post-complement)
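As an illustration (a sketch of ours, not the paper's), the Boolean operations of Definition 6 can be implemented as higher-order functions on quantifier predicates; the particular quantifiers and names below are our own assumptions.

# A sketch of Definition 6 for type (1,1) quantifiers, represented as
# Python predicates Q(M, A, B) over a universe M.

def conj(Q1, Q2):                       # (Q ∧ Q')
    return lambda M, A, B: Q1(M, A, B) and Q2(M, A, B)

def disj(Q1, Q2):                       # (Q ∨ Q')
    return lambda M, A, B: Q1(M, A, B) or Q2(M, A, B)

def outer_neg(Q):                       # (¬Q): complement
    return lambda M, A, B: not Q(M, A, B)

def inner_neg(Q):                       # (Q¬): post-complement, B -> M − B
    return lambda M, A, B: Q(M, A, M - B)

every = lambda M, A, B: A <= B
at_least_5 = lambda M, A, B: len(A & B) >= 5
at_most_10 = lambda M, A, B: len(A & B) <= 10

# Sentence (5): "At least 5 or at most 10 ..." as a lifted quantifier.
sentence_5 = disj(at_least_5, at_most_10)

M = set(range(10)); students = M; passed = {0, 1, 2}
print(outer_neg(every)(M, students, passed))   # True: not all students passed
print(inner_neg(every)(M, students, passed))   # False: some student did pass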
2.2 Iteration
The Fregean nesting of first-order quantifiers, e.g., ∀∃, can be applied to any generalized quantifier by means of iteration. For example, iteration may be used to express the meaning of the following sentence in terms of its constituents.
(9) Most logicians criticized some papers.
The above sentence is true (under one interpretation) iff there is a set containing most logicians such that every logician from that set criticized at least one paper, or equivalently: It(Most, Some)[Logicians, Papers, Criticized]. Of course, the sentence can have a different reading, corresponding to lifts other than iteration; we introduce another possibility in Section 2.3. But first let us define iteration precisely.
Definition 7. Let Q and Q′ be generalized quantifiers of type (1, 1). Let A, B be subsets of the universe and R a binary relation over the universe. Suppressing the universe, we define the iteration operator as follows:
It(Q, Q′)[A, B, R] ⇐⇒ Q[A, {a | Q′[B, R(a)]}], where R(a) = {b | R(a, b)}.
Therefore, the iteration operator produces polyadic quantifiers of type (1, 1, 2) from two monadic quantifiers of type (1, 1). The definition can be extended to cover iteration of monadic quantifiers with an arbitrary number of arguments (see e.g. [1], p. 347).
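The following Python sketch (ours) implements the iteration lift of Definition 7. It also anticipates the polynomial model-checking strategy used later in the proof of Proposition 1: build the set A′ = {a | Q′[B, R(a)]} in one pass over the universe and then check Q[A, A′].

# A sketch of the iteration lift (Definition 7); R is a set of pairs (a, b).

def iteration(Q1, Q2):
    def lifted(M, A, B, R):
        A_prime = set()
        for a in M:                     # one pass over the universe
            R_a = {b for (x, b) in R if x == a}
            if Q2(M, B, R_a):           # Q'[B, R(a)]
                A_prime.add(a)
        return Q1(M, A, A_prime)        # Q[A, A']
    return lifted

most = lambda M, A, B: len(A & B) > len(A - B)
some = lambda M, A, B: len(A & B) > 0

# (9) "Most logicians criticized some papers":
# It(Most, Some)[Logicians, Papers, Criticized]
logicians, papers = {1, 2, 3}, {10, 11}
criticized = {(1, 10), (2, 11)}
M = logicians | papers
print(iteration(most, some)(M, logicians, papers, criticized))  # True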
2.3 Cumulation
Consider the following sentence:
(10) Eighty professors taught sixty courses at ESSLLI'08.
Analyzing this sentence via iteration of the quantifiers "eighty" and "sixty" implies that there were 80 × 60 = 4800 courses at ESSLLI, which is obviously not the meaning we want to account for. The sentence presumably means neither that each professor taught 60 courses (It(80, 60)) nor that each course was taught by 80 professors (It(60, 80)). In fact, this sentence is an example of so-called cumulative quantification, saying that each of the 80 professors taught at least one of the 60 courses and each of the courses was taught by at least one professor. Cumulation is easily definable in terms of iteration and the existential quantifier, as follows.
Definition 8. Let Q and Q′ be generalized quantifiers of type (1, 1). A, B are subsets of the universe and R is a binary relation over the universe. Suppressing the universe, we define the cumulation operator as follows:
Cum(Q, Q′)[A, B, R] ⇐⇒ It(Q, Some)[A, B, R] ∧ It(Q′, Some)[B, A, R⁻¹].
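Cumulation then comes almost for free; the sketch below (ours) reuses the iteration and some definitions from the previous example, which are assumed to be in scope.

# Cumulation (Definition 8) as two iterations with "some",
# with the relation inverted for the second conjunct.

def cumulation(Q1, Q2):
    def lifted(M, A, B, R):
        R_inv = {(b, a) for (a, b) in R}
        return (iteration(Q1, some)(M, A, B, R) and
                iteration(Q2, some)(M, B, A, R_inv))
    return lifted

# Toy check: every professor taught a course, every course was taught.
profs, courses = {"p1", "p2"}, {"c1", "c2"}
taught = {("p1", "c1"), ("p2", "c2")}
every_ = lambda M, A, B: A <= B        # stand-in for the numeral quantifiers
print(cumulation(every_, every_)(profs | courses, profs, courses, taught))  # True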
2.4 Resumption
The next lift we are about to introduce, resumption (vectorization), has found many applications in theoretical computer science (see e.g. [13]). The idea is to lift a monadic quantifier in such a way as to allow quantification over tuples. This is linguistically motivated when ordinary natural language quantifiers are applied to pairs of objects rather than individuals, for instance:
(11) Most twins never separate.
Moreover, resumption is useful for the interpretation of certain cases of adverbial quantification (see e.g. [1], Ch. 10.2). Below we give a formal definition of the resumption operator.
Definition 9. Let Q be any monadic quantifier with n arguments, U a universe, and R1, . . . , Rn ⊆ U^k for k ≥ 1. We define the resumption operator as follows:
Res^k(Q)U[R1, . . . , Rn] ⇐⇒ (Q)U^k[R1, . . . , Rn].
That is, Res^k(Q) is just Q applied to a universe, U^k, containing k-tuples. In particular, Res¹(Q) = Q. Clearly, one can use Res²(Most) to express the meaning of sentence (11).
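For k = 2, resumption can be sketched as follows (ours; the example relations are invented purely for illustration): the quantifier is simply evaluated over the universe of pairs.

# A sketch of resumption (Definition 9) for k = 2.

from itertools import product

def resumption_2(Q):
    def lifted(U, A, B):                # A, B are sets of pairs over U
        U2 = set(product(U, repeat=2))  # the universe U^2 of pairs
        return Q(U2, A, B)
    return lifted

most = lambda M, A, B: len(A & B) > len(A - B)

# (11) "Most twins never separate": Res2(Most) applied to pair predicates.
U = {1, 2, 3}
twins = {(1, 2), (2, 1)}
never_separate = {(1, 2), (2, 1)}
print(resumption_2(most)(U, twins, never_separate))   # True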
2.5 PTIME GQs Are Closed under It, Cum, and Res
When studying the computational complexity of quantifiers, a natural question arises in the context of polyadic lifts: do they increase complexity? For example, is it possible that two tractable determiners can be turned into an intractable quantifier? We show that PTIME computable quantifiers are closed under Boolean combinations and the three lifts defined above. As we are interested in the strategies people may use to comprehend quantifiers, we give a direct construction of the relevant procedures. In other words, we show how to construct a polynomial model-checker for our polyadic quantifiers from PTIME Turing machines computing monadic determiners.
Proposition 1. Let Q and Q′ be monadic quantifiers computable in polynomial time with respect to the size of a universe. Then the quantifiers: (1) ¬Q; (2) Q¬; (3) Q ∧ Q′; (4) It(Q, Q′); (5) Cum(Q, Q′); (6) Res(Q) are PTIME computable.
Proof. Let us assume that there are Turing machines M and M′ computing the quantifiers Q and Q′, respectively, where M and M′ work in polynomial time with respect to any finite universe U.
(1) A Turing machine computing ¬Q is like M; the only difference is that we exchange accepting and rejecting states. In other words, we accept ¬Q whenever M rejects and reject whenever M accepts. The working time of the new Turing machine is exactly the same as that of M. Hence, the outer negation of PTIME quantifiers can be recognized in polynomial time.
(2) Recall that on a given universe U we have the following equivalence: (Q¬)U[R1, . . . , Rk] ⇐⇒ QU[R1, . . . , Rk−1, U − Rk]. Therefore, for the inner negation of a quantifier it suffices to compute U − Rk and then run the polynomial Turing machine M on the input QU[R1, . . . , Rk−1, U − Rk].
(3) To compute Q ∧ Q′ we first compute Q using M and then Q′ using M′. If both machines halt in an accepting state, we accept; otherwise, we reject. This procedure is polynomial, because the sum of the polynomial bounds on the working time of M and M′ is also polynomial.
(4) Recall that It(Q, Q′)[A, B, R] ⇐⇒ Q[A, A′], where A′ = {a | Q′[B, R(a)]}, for R(a) = {b | R(a, b)}. Notice that for every a from the universe, R(a) is a monadic predicate. Now, to construct A′ in polynomial time we execute the following procedure for every element from the universe. We initialize A′ = ∅. Then we repeat, for each a from the universe, the following: first we compute R(a); then, using the polynomial machine M′, we compute Q′[B, R(a)]; if the machine accepts, we add a to A′. Having constructed A′ in polynomial time, we just use the polynomial machine M to compute Q[A, A′].
(5) Notice that cumulation is defined in terms of iteration and the existential quantifier (see Definition 8). Therefore, this point follows from the previous one.
(6) To compute Res^k(Q) over the model M = ({1, . . . , n}, R1, . . . , Rn) for a fixed k, we just use the machine M with the input ñ^k#R̃1# . . . #R̃n instead of ñ# . . . (recall Definition 2).
Additionally, let us argue that the above proposition holds for all generalized quantifiers, not only for the monadic ones. Notice that the Boolean operations as well as iteration and cumulation are definable in first-order logic. Recall that the model-checking problem for first-order sentences is in LOGSPACE ⊆ PTIME (see e.g. [13]). Let A be a set of generalized quantifiers of any type from a given complexity class C. Then the complexity of model-checking for sentences from FO(A) is in LOGSPACE^C (deterministic logarithmic space with an oracle from C): one simply uses a LOGSPACE Turing machine to decide the first-order sentences, invoking the oracle when a quantifier from A appears. Therefore, the complexity of Boolean combinations, iteration and cumulation of PTIME generalized quantifiers has to be in LOGSPACE^PTIME = PTIME. The case of the resumption operation is slightly more complicated. Resumption is not definable in first-order logic for all generalized quantifiers (see [16]). However, notice that our argument given in point (6) of the proof does not make use of any assumption about the arity of the Ri. Therefore, the same proof works for resumption of polyadic quantifiers. The above considerations allow us to formulate the following theorem, which is a generalization of the previous proposition.
Theorem 1. Let Q and Q′ be generalized quantifiers computable in polynomial time with respect to the size of a universe. Then the quantifiers: (1) ¬Q; (2) Q¬; (3) Q ∧ Q′; (4) It(Q, Q′); (5) Cum(Q, Q′); (6) Res(Q) are PTIME computable.
3 Conclusion
We have shown that PTIME quantifiers are closed under Boolean operations as well as under the polyadic lifts occurring frequently in natural language; in other words, these operations do not increase the computational complexity of quantifiers. As we can safely assume that most of the simple determiners in natural language are PTIME computable, the semantics of the polyadic quantifiers studied above is tractable. This seems to be good news for the computational theory of natural language processing.
References
1. Peters, S., Westerståhl, D.: Quantifiers in Language and Logic. Clarendon Press, Oxford (2006)
2. van Benthem, J.: Essays in Logical Semantics. Reidel, Dordrecht (1986)
3. Mostowski, M.: Computational semantics for monadic quantifiers. Journal of Applied Non-Classical Logics 8, 107–121 (1998)
4. McMillan, C.T., Clark, R., Moore, P., Devita, C., Grossman, M.: Neural basis for generalized quantifier comprehension. Neuropsychologia 43, 1729–1737 (2005)
5. Szymanik, J., Zajenkowski, M.: Comprehension of simple quantifiers: Empirical evaluation of a computational model. Cognitive Science 34(3), 521–532 (2010)
6. Mostowski, M., Wojtyniak, D.: Computational complexity of the semantics of some natural language constructions. Annals of Pure and Applied Logic 127(1–3), 219–227 (2004)
7. Sevenster, M.: Branches of Imperfect Information: Logic, Games, and Computation. PhD thesis, Universiteit van Amsterdam (2006)
8. Szymanik, J.: Quantifiers in TIME and SPACE: Computational Complexity of Generalized Quantifiers in Natural Language. PhD thesis, Universiteit van Amsterdam (2009)
9. Gierasimczuk, N., Szymanik, J.: Branching quantification vs. two-way quantification. Journal of Semantics 26(4), 367–392 (2009)
10. Szymanik, J.: The computational complexity of quantified reciprocals. In: Bosch, P., Gabelaia, D., Lang, J. (eds.) LNCS (LNAI), vol. 5422, pp. 139–152. Springer, Heidelberg (2008)
11. Kontinen, J., Szymanik, J.: A remark on collective quantification. Journal of Logic, Language and Information 17(2), 131–140 (2008)
12. Frixione, M.: Tractable competence. Minds and Machines 11(3), 379–397 (2001)
13. Immerman, N.: Descriptive Complexity. Texts in Computer Science. Springer, Heidelberg (1998)
14. Papadimitriou, C.H.: Computational Complexity. Addison-Wesley, Reading (1993)
15. van Benthem, J.: Polyadic quantifiers. Linguistics and Philosophy 12(4), 437–464 (1989)
16. Hella, L., Väänänen, J., Westerståhl, D.: Definability of polyadic lifts of generalized quantifiers. Journal of Logic, Language and Information 6(3), 305–335 (1997)
Constituent Structure Sets I
Hiroyuki Uchida and Dirk Bury
Rensselaer Polytechnic Institute, University of Bangor
Abstract. We replace traditional phrase structure tree representations with a new type of set-based representation. In comparison to labeled trees, in which one category can potentially be copied onto an infinite number of distinct nodes, the proposed representation system significantly restricts syntactic copying; we therefore do not need to filter out copying possibilities by way of additional constraints. Each structure represents a set of constituents with structurally distinguished items. Via our PF interpretation rules, we provide the intended PF structures with regard to which our set-based syntactic representations are sound and complete. The intended PF structures provide enough flexibility to accommodate word order variation across languages relative to the same syntactic structures.
1 Introduction
We present a non-tree-based structure representation system called Constituent Structure Sets (CSS) and outline some of its features. The system is less expressive than common tree-based alternatives. However, the system can still represent syntactic copying with an inherent upper bound. Let us compare different structure representation systems with the same numeration set in (1).
(1) The numeration set, N = {H, A}
We assume that each numeration set is isomorphic to the set of phonological words that appear in the phonological string that the structure in question represents. That is, the numeration set for the syntactic structure for Meg plays tennis hard is Ni = {Meg, plays, tennis, hard} or {D1, V, D2, Adv}. For our discussion, it is not important how to represent each member of this numeration set, since our aim is to compare the expressive powers of different structure representation systems. Also, we are not comparing syntactic theories that might or might not use these structure representations as theoretical tools. Given the same numeration set N in (1), we can have two kinds of structure representations, as shown in (2).
(2) a) Labeled tree: H over A and H   b) Telescope 1: H over A   c) Telescope 2: A over H
We can understand labeled trees in terms of a reflexive dominance (RD) relation1 that partially orders syntactic tree nodes, which in turn are decorated with members of the numeration set N. If we assume that the set of tree nodes is potentially infinite, then we can spread the same member of N onto different nodes, either onto mother nodes, as with H in (2a), or onto a terminal/leaf node, as with A in (2b), a potentially infinite number of times.2 In practice, many syntactic theories have particular generative mechanisms that can only generate finite trees, and most of them do not commit themselves to the full expressive power of labeled trees as described above, since they use labeled trees only as a convenient description tool. But it is interesting to use an alternative structure representation system that simply cannot represent abundant duplication of the same member of the numeration set N in the first place. A benefit of using such a restrictive system is that, given that each N is finite (which we can naturally assume, considering that each phonological string that we consider contains only a finite number of phonological words), each structure with N is provably finite without additional restrictions. Based on such considerations, we could define a reflexive dominance relation RD directly between members of N, instead of going via syntactic nodes. Then, given the numeration set N = {H, A}, we can only have two syntactic structures, represented as (2b) and (2c). Since this representation system shares formal properties with the telescope trees proposed in [1], we call this system telescope structures.3 Note that the notion of tree nodes does not play any role in telescope structures, as becomes clear if we represent (2b) as {⟨H, H⟩, ⟨H, A⟩, ⟨A, A⟩}, where ⟨H, A⟩ (= RD(H, A)) means that H reflexively dominates A. Assuming that each structure comes with exactly one binary relation RD, which is a subset of N × N, if N is finite, then each telescope structure is provably finite. Also, we can have only a finite number of distinct telescope structures; with N as in (1), we can have only two structures.4 The restrictiveness of telescope structures is attractive. However, this system cannot duplicate any item of the numeration set, and a certain degree of syntactic duplication of a lexically provided item is empirically well-motivated.5 Thus, it is worth pursuing a new structure representation system that maintains certain formal properties of telescope structures but can still duplicate a member of N with a clear maximal limit. Given such considerations, we start with the power set of the numeration set N, as shown in (3).
(3) ℘(N) = {{H, A}, {H}, {A}, ∅}
In telescope trees, the reflexive dominance RD is a relation between members of N. Instead, we base this basic syntactic relation on the subset relation between members of ℘(N). If we maintain the same common properties of RD, which we formally define in the main section, then given N in (1), we can have the following structures:6
(4) a. {H, A} over {A} and {H}   b. {H, A} over {A}   c. {H, A} over {H}   d. {H, A}
1 We keep the properties of this basic syntactic relation constant across the different representation systems that we compare. As we see later, the formal relational properties that we assign to this basic relation are the properties of reflexive dominance as in traditional maximally binary branching rooted tree structures.
2 For convenience, we call the duplication of a label both onto a mother node and onto a leaf node a syntactic copying.
3 However, just as we do not discuss GB or Minimalism, which use labeled trees as a descriptive tool, we are not concerned with Brody's syntactic theory itself, either.
4 We assume that each member of N must occur at least once in the syntactic structure, whichever representation system we may use.
5 See [2] for some data that telescope structures can deal with in an adequate manner only by increasing the membership of the numeration set in comparison to alternative representation systems. The paper argues that it is better to duplicate a member of N in the syntax than to increase the membership of the numeration set in an arbitrary manner, if the maximal limit on syntactic duplication falls out from the properties of the representation system.
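The finiteness point can be made concrete with a small sketch (ours, not the authors'): a structure here is a family of dominance sets over ℘(N) that includes the maximal set N, so with N = {H, A} an exhaustive enumeration yields exactly the four structures in (4).

# Enumerating all dominance-set families over N = {H, A} that contain
# the maximal set N; the containment order comes for free from ⊇.

from itertools import combinations

N = frozenset({"H", "A"})
proper = [frozenset({"H"}), frozenset({"A"})]   # non-empty proper subsets

structures = []
for k in range(len(proper) + 1):
    for extra in combinations(proper, k):
        structures.append({N} | set(extra))

for s in structures:
    print(sorted(sorted(d) for d in s))
# 4 structures, matching (4a)-(4d); compare the 2 telescope structures.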
If N is finite, then ℘(N) is provably finite. Since our syntactic relation, based on set containment, directly orders (particular) members of ℘(N), each syntactic structure is provably finite in this representation system, and we can also have only a finite number of distinct structures relative to the same finite N. The Boolean lattice in (5) shows why our structures are finite.
(5) a. N2 = {T, V, D1, D2} (or {can, play, Meg, tennis})
b. ℘(N2), ordered by set containment (top to bottom):
{T,V,D1,D2}
{T,V,D1}  {T,V,D2}  {T,D1,D2}  {V,D1,D2}
{T,V}  {T,D1}  {T,D2}  {V,D1}  {V,D2}  {D1,D2}
{T}  {V}  {D1}  {D2}
∅
We use tree notations in (4) for convenience only and the left-to-right order in the trees is not meaningful, as becomes clearer with (5) below.
Coming back to our structures in (4), we regard each set that appears in each structure as a constituent of that structure. For empirical reasons, we specify the head of each such constituent. For example, if we assume the head of {H, A} is H, we represent it as {H, {H, A}}.7 We call each constituent with its head specified, such as {H, {H, A}}, a treelet. We call {H, A} the dominance set of the treelet. Since every constituent (which corresponds to a member of ℘(N)) must have exactly one head, if the set is a singleton set, that unique member itself must be the head, as in {A, {A}}. Then, (4a) can become (6a) and (4b) can become (6b).8
(6) a. {H, {H, A}} over {A, {A}} and {H, {H}}   b. {H, {H, A}} over {A, {A}}
7 We could represent this as an ordered pair ⟨H, {H, A}⟩. This choice is not crucial.
8 We can of course choose the heads in a different manner, but the number of such choices is still finite. Also, we assume that some system-external considerations tell us which items are heads when the constituent sets contain more than one item.
In (6a), the item H heads two constituents, {H, A} and {H}. This is our syntactic copying of the item H. As we have intuitively shown above, and as we show more formally in the main sections, this copying has a maximal limit falling out from the formal properties of the representation system. Thus, unlike with labeled trees, we do not need to filter out excessive copying possibilities post hoc. Section 2 provides a definition of our structure representation system. Section 3 shows with examples that CSSs have weaker expressive powers than labeled trees. Section 4 shows the intended phonological interpretation of CSSs. Section 5 provides concluding remarks.
2 Definition of CSSs
This section provides the definition of our syntactic structures. Our system can express a limited degree of syntactic copying. We suggest that this restricted amount of syntactic copying is useful in linguistic application. Crucially, the maximal limit on copying falls out from the basic definition of our system. Following [3], we replace each tree by a Constituent Structure Set (CSS), which is a set of 'treelets.' Each structure is given as in (7).
(7) Structure := ⟨Cat, CSS, RC⟩
Cat is the numeration set, or the set of categories/items that appear in the structure. As in the previous section, we assume that for each structure the number of items in Cat corresponds to the number of overt phonological expressions, except for some well-motivated functional heads on the verbal projection line, such as T (for tense) and v (in the double object construction). In the rest of the paper, each numeration set may contain only T as such an additional head.
For each a ∈ Cat, we have at least one treelet of the form in (8a). Also, if a category a heads a treelet, the dominance set of the treelet must contain a as a member, as stated in (8b). CSS is a set of such treelets.
(8) Treelet:
a. In each structure, for each a ∈ Cat, CSS contains at least one treelet x such that x = {a, Dx} and {a} ⊆ Dx ⊆ Cat.
b. In each CSS, for each x ∈ CSS and for each a ∈ Cat, if a is the head of x, then {a} ⊆ Dx ⊆ Cat.
RC (mnemonic for 'reflexive containment') in (7) is a binary relation between treelets, which is analogous to reflexive dominance, though unlike reflexive dominance, which is defined between tree nodes in labeled tree representations, RC is defined between treelets.9 As we explained in Section 1, the containment relation between treelets is isomorphic to the set-containment relation between the dominance sets of those treelets, as specified in (9a). For each treelet x ∈ CSS, Dx represents the dominance set of the treelet x.
(9) a. Reflexive Containment (RC): ∀x, y ∈ CSS. (RC(x, y) ⇔ (Dx ⊇ Dy))
b. Immediate Containment (IC): ∀x, y ∈ CSS. (IC(x, y) ⇔ (RC(x, y) & x ≠ y & ¬∃z ∈ CSS.(RC(x, z) & RC(z, y) & z ≠ x & z ≠ y)))
The basic relation of our structure representation system is RC in (9a); (9b) defines a derived relation of Immediate Containment, IC. We use IC in some proofs and in the generation of phonological strings later. As we discussed in Section 1, for each structure, the set of all the dominance sets in the structure is a subset of ℘(Cat) (i.e., the power set of the numeration set of that structure). Thus, if Cat is finite, then each CSS structure is necessarily finite. Each CSS has a unique 'maximal' treelet with regard to RC.
(10) Maximal treelet: ∃x ∈ CSS.∀y ∈ CSS.RC(x, y)
The dominance set of the maximal treelet is the same as the numeration set Cat of that structure. Reflexive containment RC is a partial order, as in (11).
(11) a. Reflexivity: ∀x ∈ CSS. RC(x, x)
b. Transitivity: ∀x, y, z ∈ CSS. [(RC(x, y) & RC(y, z)) → RC(x, z)]
9 Other than this difference with regard to which elements we partially order, we aim to have reflexive containment and reflexive dominance share basically the same relational properties, as we see shortly. See [4] for some common relational properties of the notion of dominance as in syntactic trees.
c. Antisymmetry: ∀x, y ∈ CSS. [(RC(x, y) & RC(y, x)) → (x = y)]
d. (Corollary A: ∀x, y ∈ CSS. (Dx = Dy) → x = y)
Because of Antisymmetry in (11c), together with the definition of RC in (9a), it follows that a CSS cannot contain two treelets that have the same dominance set but different heads, as specified in Corollary A in (11d). To see this point, suppose that the dominance set Dx of x ∈ CSS and the dominance set Dy of y ∈ CSS contain exactly the same members of Cat. Then Dx ⊆ Dy & Dy ⊆ Dx. By (9a), RC(x, y) & RC(y, x). Then, by (11c), x = y. According to our interpretation of '=,' this means that x and y must be the same with regard to both the dominance set and the head. We also assume Maximally Binary-Branching in (12a). Just as it is debatable whether trees should be at most binary branching, the status of Maximally Binary-Branching in our representation system is provisional, but it plays a non-trivial role when we define the PF structures and the interpretation of our CSSs as such PF structures.
(12) a. Maximally Binary-Branching: ∀x, y, z ∈ CSS. (({x′ ∈ CSS | RC(x′, x) & x′ ≠ x} = {y′ ∈ CSS | RC(y′, y) & y′ ≠ y} = {z′ ∈ CSS | RC(z′, z) & z′ ≠ z}) → ((x = y) ∨ (x = z) ∨ (y = z)))
b. Unique Splittability: ∀x, y ∈ CSS. (({x′ ∈ CSS | RC(x′, x) & x′ ≠ x} = {y′ ∈ CSS | RC(y′, y) & y′ ≠ y} & x ≠ y) → (Dx ∩ Dy = ∅)).
c. Inclusiveness: ∀x ∈ CSS, ∀a ∈ Cat. (a ∈ Dx → ∃y ∈ CSS.(RC(x, y) & head(y) = a)).
Unique Splittability in (12b) prevents one CSS from containing two treelets such as {d, {d, e}} and {c, {c, e}}, in which a category e appears in the dominance sets of two distinct treelets that are not ordered with regard to RC. Together with the other conditions that we have introduced so far, it follows that if there exist treelets x, y and z = {a, Dz} such that IC(z, x) and IC(z, y), then the dominance set Dz must be the union of Dx, Dy and {a}. To prove this, suppose that Dz had two distinct categories a and b which are not members of Dx or Dy, where a is the head of z. Then by (8a), the CSS must have at least one treelet w = {b, {b, ...}}. Now, since b ∈ Dz, because of Unique Splittability in (12b) together with (8a), it follows that (a): RC(z, w). The next two paragraphs prove (a).
If RC(z, w) were not the case, then because of Maximally Binary-Branching in (12a), and because of the presence of the maximal treelet, there would be only two sub-cases, Case A and Case B. In Case A, there would be another treelet u such that u ≠ z, u ≠ w, RC(u, z) and RC(u, w), and z and w are in different 'branches' under the treelet u. That is, ∃m, n ∈ CSS.(IC(u, m) & IC(u, n) & m ≠ n & RC(m, z) & RC(n, w)).10 But this would violate Unique Splittability, since u would immediately contain two treelets (i.e., m and n) both of whose dominance sets contain b: Dz contains b by assumption, which is inherited by Dm, and Dw contains b by assumption, which is inherited by Dn, where m and n are immediately contained by u. In Case B, RC(w, z) and w ≠ z. But then, by the definition of RC, Dw must contain at least one element which does not appear in Dz. Suppose there is exactly one element e ∈ Cat such that e ∈ Dw but e ∉ Dz.11 Because of (8a), the CSS must have a treelet v = {e, {e, ...}}, but since e does not appear in Dz, it must either be the case that there is another treelet s such that s ≠ w, s ≠ v, RC(s, w) and RC(s, v), in which case we would violate Unique Splittability in (12b), or that there is another treelet t which immediately contains w and does not immediately contain any other treelet. But in this latter case, again, because of the definition of RC, Dt must contain some element f ∈ Cat which is not in Dw.12 However, since each Cat is finite, at a certain stage we would reach a situation in which this distinguishing element f could not be supplied from Cat. Suppose that were the case for Dt already (that is, Cat = Dx ∪ Dy ∪ {a, b, e}); then Dt = Dw (i.e., Dw ⊆ Dt and Dt ⊆ Dw). But remember that the head of t must be e, since we have postulated t in order to satisfy the requirement in (8a). This would then violate Antisymmetry in (11c), since RC(t, w) and RC(w, t) but w ≠ t (that is, their dominance sets are the same, but their heads are different). We could have assumed that Dt contains some elements other than those in Dx ∪ Dy ∪ {a, b, e} by expanding Cat, but as we mentioned above, since any Cat must be finite by definition and since those additional elements cannot occur in w or in any treelets that w reflexively contains, we would have the same contradiction at some stage in any case.
This concludes that RC(z, w). But this would violate Maximally Binary-Branching in (12a), since there would then be another treelet immediately contained by z other than x and y. Since the above proof is based on the assumption that Dz contains an element b ∈ Cat which is not the same as the head of z or a member of Dx or Dy, the same proof can be used when Dz contains still another element in addition to b which is not the same as a or a member of Dx or Dy. Thus, Dz must be the union of Dx, Dy and {a}, where a ∈ Cat is the head of z.
Coming back to (12), because of Inclusiveness in (12c) we cannot generate the CSS in (13a), in which the treelet headed by read contains the modal can in its dominance set even though can does not head any treelet that is reflexively contained by that treelet.
10 Note that it can be the case that m = z and/or n = w.
11 Note that b ≠ e, where b is the head of w by assumption, since b ∈ Dz by assumption.
12 Again, it follows that f ≠ e.
(13) a. Undesirable treelet: {{can, {can, read, Meg}}; {read, {read, can}}; {Meg, {Meg}}}
b. (13a) should be:
{{can, {can, read, Meg}}; {read, {read}}; {Meg, {Meg}}}
Instead, as in (13b), can heads the maximal treelet that contains the treelet headed by read. Intuitively, when we scan a CSS from treelets of smaller size (i.e., treelets whose dominance sets contain fewer members, starting with identity treelets), it must be the case that any element a ∈ Cat is introduced as the head of a treelet initially, in the form {a, {a, ...}}, where '...' might be empty (i.e., as in an identity treelet), before being incorporated into the dominance sets of the containing treelets. (12c) requires exactly that to be the case. As a derived property of our structure representation system, Closure in (14) is automatically satisfied by (11a).
(14) Closure (satisfied by (11a)): (∀x ∈ CSS.∃y ∈ CSS.RC(x, y)) & (∀y ∈ CSS.∃x ∈ CSS.RC(x, y))
Also, in combination with the presence of the unique maximal treelet in each CSS, as required by (10), and Unique Splittability in (12b), we have Upward Non-Branching in (15) as a derived theorem.
(15) Upward Non-Branching: ∀x, y, y′ ∈ CSS. [(RC(y, x) & RC(y′, x)) → (RC(y, y′) ∨ RC(y′, y))]
That is, if the unique maximal treelet x immediately contains two distinct treelets, say, y and z, then the dominance sets of y and z do not share any members, because of Unique Splittability. Thus, none of the treelets that y reflexively contains (which includes y itself) can reflexively contain a treelet that is also reflexively contained by a treelet that is reflexively contained by z (which again includes z itself). Since this binary branching is the only way of creating mutually unordered treelets under the maximal treelet, there are no other possibilities of generating an upward branching structure between treelets.
Each CSS is a set, and so is each dominance set Dx. According to the basic properties of sets, we interpret n occurrences of one item in each set as one, as shown in (16).
(16) Denotational interpretation of sets:
For all a ∈ Cat: {a, a, a} = {a}
For all a, b ∈ Cat: {{a, {a, b}}; {a, {a, b}}} = {{a, {a, b}}} ... etc.
Given the restriction in (16), a CSS can still contain more than one treelet for one category, say, a ∈ Cat, such as {a, {a, b, c}}, {a, {a, b}} and {a, {a}}, where Cat = {a, b, c}. As we will see closely in the next section, (16) means that CSSs cannot distinguish some of the copy structures that labeled trees can represent as distinct. This inability to express syntactic copying is not stipulated in our system; it follows from the basic properties of CSSs.
This section has formally defined our structural representation system. Given a finite set of lexicalized categories Cat, each CSS is a particular subset of the power set of Cat, where for each D ∈ ℘(Cat), a member of D is distinguished as its head. Section 3 provides some example structures showing that our representation system has weaker expressive power than labeled trees and that we can represent syntactic copying with an inherent maximal bound.
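As an illustration of how mechanical these conditions are, the following sketch (ours; it covers only (8), Corollary A, (10) and (12c)) validates the CSSs in (13): (13a) fails Inclusiveness, while (13b) passes.

# A sketch of CSSs: a treelet is a pair (head, dominance_set), and
# RC(x, y) holds iff Dx ⊇ Dy.

def rc(x, y):
    return x[1] >= y[1]                 # frozenset superset test

def well_formed(cat, css):
    heads = {h for (h, D) in css}
    cond_8a = heads == cat              # every category heads a treelet
    cond_8b = all(h in D for (h, D) in css)
    unique_D = len({D for (_, D) in css}) == len(css)   # Corollary A
    maximal = any(D == cat for (_, D) in css)           # condition (10)
    inclusive = all(any(rc((h, D), t) and t[0] == a for t in css)
                    for (h, D) in css for a in D)       # condition (12c)
    return cond_8a and cond_8b and unique_D and maximal and inclusive

cat = frozenset({"can", "read", "Meg"})
bad = {("can", frozenset({"can", "read", "Meg"})),
       ("read", frozenset({"read", "can"})),            # (13a): violates (12c)
       ("Meg", frozenset({"Meg"}))}
good = {("can", frozenset({"can", "read", "Meg"})),
        ("read", frozenset({"read"})),                  # (13b)
        ("Meg", frozenset({"Meg"}))}
print(well_formed(cat, bad), well_formed(cat, good))    # False True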
3 Representational Collapsibility and Its Linguistic Implications
This section compares the expressive powers of CSSs and labeled tree representations in terms of syntactic copying or duplication. To compare labeled trees with CSSs, we define a mapping Trans from labeled trees to CSSs in (17).
(17) a. For each maximally binary branching13 labeled tree T, Trans(T) = CSS, where CSS is as in (17b).
b. For each node x of T, we include a treelet of the form {a, Dx} as a member of CSS, where a is the label on the node x and Dx contains the labels on all the nodes that are reflexively dominated by x.
We show that Trans in (17) is a homomorphic mapping by way of crucial examples, rather than formally proving the homomorphism. Remember from (16) in Section 2 that we cannot distinguish multiple occurrences of a category from one occurrence, as in {X, X} = {X}. Now we show that CSS cannot distinguish certain structures that labeled trees can.14 Compare (18) and (19).
(18) Five labeled trees over V and D: a. a single node V; b. V projecting over V; c. V over a Specifier D and V; d. V over a copied D and the structure in (18c); e. the multiple dominance counterpart of (18d). (Tree diagrams not reproduced.)
CSSs in (19a)–(19e) represent the trees in (18a)–(18e) according to (17).
(19) a. {{V, {V}}}
b. {{V, {V, V}}; {V, {V}}} = {{V, {V}}; {V, {V}}} = {{V, {V}}} = (19a)
c. {{V, {V, V, D}}; {V, {V}}; {D, {D}}} = {{V, {V, D}}; {V, {V}}; {D, {D}}}
d. {{V, {V, V, V, D, D}}; {V, {V, V, D}}; {V, {V}}; {D, {D}}; {D, {D}}} = {{V, {V, D}}; {V, {V, D}}; {V, {V}}; {D, {D}}} = (19c)
e. {{V, {V, V, V, D}}; {V, {V, V, D}}; {V, {V}}; {D, {D}}} = (19d) = (19c)
For presentation, multiple occurrences of the same item within one set are collapsed into a single occurrence in (19). The two tree structures in (18a, b) collapse into one CSS, as shown in (19a, b). Thus, projection of V is impossible without a filled Specifier, that is, D in (18c), which produces a different CSS in (19c). Also, in CSS, we cannot fill this Spec position by copying a category from a lower position in the tree as in (18d): in CSS, (18d) is equivalent to (18c), as shown in (19c, d). Moreover, as (19c–e) show, CSS cannot distinguish the multiple dominance structure (MDS) in (18e) from the copy-chain structure in (18d) or from the non-movement structure in (18c) (cf. [5], which shows that copy chains and MDS are formally equivalent). In linguistic applications of CSSs, "self-attachment" of a head (e.g., the copying of the lower V onto the mother node in (18c)) is expressible with a filled Specifier. Remember from Section 1 that (18c) is not a well-formed structure in telescope structures. On the other hand, copying into a Specifier (i.e., a terminal) node (e.g., (18d)) is not expressible in CSSs. Thus, for A/A-bar movement phenomena, we only have an overt item in the 'landing site' of the movement.
13 Remember that we have assumed that both labeled trees and CSSs have this property.
14 The proof of this proposition should not rely on a particular mapping such as Trans, but we only support this proposition relative to Trans. Between (18a) and (18b), it is easy to show that the proposition holds without relying on Trans. The numeration set for both trees is {V}. With this numeration set, we can only generate CSS = {{V, {V}}} according to (7)–(12).
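The collapse under Trans can be verified with a short sketch (ours): translating the copy-chain tree (18d) and the plain tree (18c) yields one and the same CSS, because both the CSS and each dominance set are sets.

# A sketch of the mapping Trans in (17): each tree node becomes a
# treelet (label, labels-of-reflexively-dominated-nodes); since the
# result is a set of treelets, the copy chain (18d) collapses onto (18c).

from collections import Counter

def trans(tree):
    """tree = (label, [subtrees]); returns the CSS as a frozenset."""
    treelets = set()

    def walk(node):
        label, children = node
        dominated = Counter({label: 1})
        for c in children:
            dominated += walk(c)
        # multiple occurrences of a label collapse: keep the key set only
        treelets.add((label, frozenset(dominated)))
        return dominated

    walk(tree)
    return frozenset(treelets)

tree_18c = ("V", [("D", []), ("V", [])])                      # V over D and V
tree_18d = ("V", [("D", []), ("V", [("D", []), ("V", [])])])  # D copied up
print(trans(tree_18c) == trans(tree_18d))                     # True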
4 Interpreting CSSs into Phonological Structures
Section 4.1 provides the PF interpretation of CSSs defined in Section 2. Section 4.2 formally provides the intended PF structures for CSSs.
4.1 PF Interpretation
To present the PF interpretation rules, we use the irreflexive containment relation C and the immediate containment relation IC between treelets in (21). C is simply an irreflexive version of the basic syntactic relation RC as defined in Section 2, and IC is as defined in Section 2. (20) repeats the most basic part of the definition of RC. We assume the further restrictions on RC provided in (8)–(12).
(20) Reflexive Containment (RC): ∀x, y ∈ CSS. (RC(x, y) ⇔ (Dx ⊇ Dy))
(21a) and (21b) define C and IC from the basic syntactic relation RC.
(21) a. Containment (C): ∀x, y ∈ CSS. (C(x, y) ⇔ (RC(x, y) & x ≠ y))
b. Immediate Containment (IC): ∀x, y ∈ CSS. (IC(x, y) ⇔ (RC(x, y) & x ≠ y & ¬∃z ∈ CSS.(RC(x, z) & RC(z, y) & z ≠ x & z ≠ y)))
When we generate a PF structure for a CSS, we successively linearize the members of the dominance sets of the treelets according to the irreflexive constituent containment relation C and the immediate constituent containment relation IC between treelets. Since C and IC are relations derived from RC, they are indirectly constrained by the restrictions on RC assigned in Section 2, though some of the properties that RC has might not apply, because of the definitions in (21). For example, IC is no longer transitive, whereas RC and C are. And C is irreflexive
by definition, whereas RC is reflexive. Note that each CSS is closed and partially ordered in terms of RC, with a maximal treelet. Thus, we can specify the order in which we generate PF (sub)structures for the treelets in each CSS with regard to its RC, but it is easier to provide the order instruction with C, since it is a strict partial order, as we see shortly. We provide the set of the numerated PF lexical items as in (22).
(22) PFNum is the set of phonological expressions whose syntactic categories appear as members of Cat.
As we have discussed, we assume that T may either correspond to a finite auxiliary verb such as could and did or correspond to some invisible phonological item.15 Because of the one-to-one mapping between the members of each Cat and the members of the corresponding PFNum as in (22), and because category names in our structures are used only for distinguishing the lexical items that appear in the structures,16 we can assume Cat = PFNum for each structure ⟨Cat, CSS, RC⟩. Given a PFNum, the set of potential PF structures is as in (23).17
(23) Φmax: the set of potential PF structures, relative to PFNum.
a. If a ∈ PFNum, then a ∈ Φmax. (Each a ∈ PFNum is called an atomic PF item.)
b. For all a, b ∈ Φmax, (a · b) ∈ Φmax.
c. For all a, b, c ∈ Φmax, (a · b · c) ∈ Φmax.
d. Φmax is the smallest set that satisfies (23a)–(23c).
(23) limits possible PF structures to at most ternary bracketed structures made out of the elements of the given PFNum. The set of actual PF structures that can be generated for each PFNum is a subset of this maximal set Φmax, as we see below. Since Φmax covaries with each PFNum, we may specify this dependency with a subscript, as in Φmax_PFNum1, but we omit such details in notation for readability. Given CSS, Cat, PFNum and the partial order between the treelets according to (21a), we successively interpret the treelets as PF structures, starting with the treelets that are lowest in the order in terms of C in (21a) (i.e., the identity treelets) and finishing with the maximal treelet. That is, for all x, y ∈ CSS, if C(x, y), we linearize (the dominance set of) y before linearizing (the dominance set of) x. If ¬C(x, y) & ¬C(y, x), then we can linearize x and y in either order.
15 In languages such as French, T may host an inflected main verb.
16 That is, it does not matter whether we put the phonological word play or its syntactic category V in the CSS.
17 The connective '·' is non-commutative and non-associative, but we sometimes omit the parentheses for presentation reasons.
We first explain the basic ideas of our PF interpretation process in (24a)–(24b) and then provide the PF interpretation rules that implement these basic ideas. Basically, we successively incorporate the output of each treelet y ∈ CSS into the dominance set of the treelet x that immediately contains y (i.e., IC(x, y)), as shown in (24).
(24) a. {output-unit, {input-units}}
b. E.g.: {{(c · (a · b)), {c, (a · b)}}; {(a · b), {a, b}}; {a, {a}}; {b, {b}}; {c, {c}}}
For each treelet, the output PF structure counts as a PF unit, as indicated by the outermost pair of parentheses.18 The PF-unit status may covary with each treelet x that is being processed at each stage of the PF generation. For example, in (24b), (a · b) counts as one PF unit in the dominance set of the maximal treelet, but inside the dominance set of the second-largest treelet {a, {a, b}}, a and b are separate PF units. Each PF unit generated in this way at any stage of PF generation is a member of ΦCSS in (23) and, as we see later, each PF unit generated at any stage forms a PF substructure of the final output structure (which we call a 'PF') for the CSS. We incorporate the output of the PF interpretation of each treelet into the dominance set of the treelet that immediately contains it. The PF interpretation of the maximal treelet of the CSS is the final PF output. Given (21) and (24), we provide the basic PF interpretation rules of our system in (26); (26) is the formal implementation of the basic idea in (25).
(25) Immediate Containment as PF Adjacency (ICPA): Immediate containment between treelets in CSS corresponds to PF adjacency between the corresponding PF units.
(26) PF interpretation rules.
a. For every identity treelet x_id = {A, {A}} ∈ CSS: a is the output PF unit of x_id, where a is the PF item for the category A.
b. ∀x, y ∈ CSS, when we generate PF structures for x, y: if IC(x, y), then either (a · b) or (b · a) is the output PF unit of x, where a is the PF lexical item for head(x) and b is the PF unit output by the PF interpretation of the treelet y.
c. ∀x, y, z ∈ CSS, when we linearize x, y, z: if (IC(x, y) & IC(x, z) & y ≠ z), then either (b · a · c) or (c · a · b) is the output PF unit of x, where a is the PF lexical item for head(x), b is the PF unit output by the PF interpretation of y, and c is the PF unit output by the PF interpretation of z.
d. The final output of a PF interpretation of a CSS is the output of the maximal treelet in the CSS.
Again, (25) expresses the intuitive idea of our PF interpretation of CSSs. With Maximally Binary-Branching in (12a), the number of treelets that one treelet can immediately contain is either one or two. Correspondingly, in this paper, we assume that the generated PF structures are at most ternary bracketed. Also, PF structures are non-commutative and non-associative. (27) explains our treatment of syntactic copies in PF interpretation and the flexible pronunciation positions of verbal heads.
(27) a. ∀A ∈ Cat, if the corresponding PF item a appears more than once in the generated PF string, a can potentially be pronounced in any of those positions, whereas the other positions are PF null.
b. A verbal head item, such as the PF lexical item play for the category V, can be pronounced in the PF position of another head in the same "projection line" (such as T–V).
18 Though we put the output PF unit in the head position of the treelet, this is for notational convenience, and does not mean that the head category in the syntax corresponds to the derived PF output unit.
(26) together with (27) provide flexibility in the syntax-phonology mapping, which overgenerates PF strings without further constraints. However, the idea is that, as at the syntax-semantics interface, independent PF considerations can provide non-trivial constraints on PF linearization. For example, structural DP case assignment by a verbal head may require a certain PF configuration in a morphologically impoverished language such as English.19 We also assume a certain asymmetry between PF and LF, in the sense that PF only linearizes the CSSs which are syntactically and semantically well-formed. Though we have presented (26), (27a) and (27b) as if they had the same status in our system, only the PF interpretation rules in (26) constitute an essential part of our representation system, while (27a) gives an additional pronunciation instruction with regard to the multiple copies that the proposed system may generate. Also, the linearization rules that refer to some semantic concepts (such as the notions of 'verbal head' and 'projection line' in (27b)) on the one hand, and the rules that only refer to the narrow syntactic elements abstracted away from particular category names on the other, are different in nature in the proposed system. For example, we can calculate the PF structures that CSSs can generate via the PF interpretation rules in (26) abstracted away from (the semantics of) the category names. (27a) can be stated abstracted away from such semantics as well, but (27b) is based on a particular relation that holds between particular categories, such as between T and V, but not between V and D. Since the main objective of this paper is to introduce a novel structure representation system abstracted away from applicational details, we mostly ignore the rules that need to refer to particular categories.
19 See [6] for a similar use of phonological constraints on case assignment.
(26)–(27) provide the formally possible phonological structures, which might then be further restricted by way of system-external considerations. We provide a simple example.
(28) a. Cat: {T, V, D1, D2}, PFNum: {can, play, ally, tennis}
CSS: {{T, {T, V, D1, D2}}; {V, {V, D2}}; {V, {V}}; {D1, {D1}}; {D2, {D2}}}
b. PF interpretation:
tT: {(ally · can · (play · tennis)), {can, ally, (play · tennis)}}
tV: {(play · tennis), {play, tennis}}
tVId: {play, {play}}; tD1: {ally, {ally}}; tD2: {tennis, {tennis}}
For convenience, tT represents the treelet headed by T, tD1 represents the treelet headed by D1, and so forth. The category V occurs in two treelets as the head, and we distinguish the identity treelet for V as tVId from the other, which is tV; this is for presentational convenience. (28) is only one formally possible PF interpretation of the given CSS. We assume that some constraints that are external to our representation system allow us to choose this structure for English.
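The following sketch (ours) implements the rules in (26) for this example, fixing one of the admissible orders (the one chosen for English in (28b)) and letting the duplicate output of an identity treelet collapse with its head's own PF item, as in (16)/(27a).

# A sketch of (26), specialized to one admissible choice of orders.

def ic_children(css, x):
    below = [y for y in css if y != x and x[1] > y[1]]   # proper containment
    kids = [y for y in below
            if not any(z != y and y[1] < z[1] < x[1] for z in below)]
    return sorted(kids, key=lambda y: (len(y[1]), sorted(y[1])))

def pf_output(css, x, pfnum):
    head = pfnum[x[0]]
    units = [u for u in (pf_output(css, k, pfnum) for k in ic_children(css, x))
             if u != head]              # a copied head collapses with its item
    if not units:
        return head                     # rule (26a)
    if len(units) == 1:
        return (head, units[0])         # rule (26b), one order fixed
    return (units[0], head, units[1])   # rule (26c), one order fixed

pfnum = {"T": "can", "V": "play", "D1": "ally", "D2": "tennis"}
css = {("T", frozenset({"T", "V", "D1", "D2"})),
       ("V", frozenset({"V", "D2"})), ("V", frozenset({"V"})),
       ("D1", frozenset({"D1"})), ("D2", frozenset({"D2"}))}
root = next(t for t in css if t[1] == frozenset({"T", "V", "D1", "D2"}))
print(pf_output(css, root, pfnum))   # ('ally', 'can', ('play', 'tennis'))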
4.2 Intended PF Structures
This section provides the phonological structures that serve as the intended phonological interpretations of CSSs. We assume that each member of Cat corresponds to exactly one phonological item in the PF string. That is, (29) holds.
(29) Given Cat for a CSS: for each category A ∈ Cat, there is exactly one PF item a ∈ PFNum, where PFNum is the (finite) set of atomic PF items for the CSS.
For our current purpose, we can simply assume that Cat = PFNum for each CSS and each PF structure that we generate for the CSS. Both Cat and PFNum are sets, so one PF item can appear only once in each PFNum. Thus two occurrences of the same phonological word, say smoke as in Meg smokes and Bill smokes, must be stored as different PF items in PFNum. Remember that, given a set of atomic PF items PFNum, the set of maximal PF structures in (23) limits phonological structures to at most ternary bracketed structures. Each PF, i.e., each phonological structure that we can derive from a CSS via the interpretation rules that we have provided, is a member of ΦCSS. Before we define PFs, we define the notion of substructures in (30).
(30) a. 'a ≤ b' means that a is a substructure of b. (We omit 'max' from 'Φmax' in (30).)
∀a, b ∈ Φ. (((a = b) ∨ ∃c ∈ Φ.((a · c) = b) ∨ ∃c ∈ Φ.((c · a) = b) ∨ ∃c, d ∈ Φ.((a · c · d) = b) ∨ ∃c, d ∈ Φ.((c · a · d) = b) ∨ ∃c, d ∈ Φ.((c · d · a) = b)) → (a ≤ b)).
b. ∀a, b, c ∈ Φ.((a ≤ b & b ≤ c) → a ≤ c)
c. Only (30a) together with (30b) defines the substructure relation (that is, '≤' is defined as the transitive closure of (30a)). Transitivity of '≤' is stipulated in (30b).
Given (30), '≤' is reflexive and antisymmetric, as in (31).
(31) a. Reflexivity: ∀a ∈ Φ.(a ≤ a).
b. Antisymmetry: ∀a, b ∈ Φ.((a ≤ b & b ≤ a) → a = b).
Now we define the well-formed PF structures, 'PF,' with the numeration set PFNum.
(32) Given PFNum, each PF is a member of Φmax that satisfies (32a)–(32c).
a. ∀a ∈ PFNum.(a ≤ PF).
b. ∀a ≤ PF.(∃b, c ∈ Φ.(a = (b · c)) → ((b ∈ PFNum ∨ c ∈ PFNum) & ¬∃d ∈ Φ.(d ≤ b & d ≤ c) & (c · b) ≤ PF))
c. ∀a ≤ PF.(∃b, c, d ∈ Φ.(a = (b · c · d)) → (c ∈ PFNum & ¬∃e ∈ Φ.(e ≤ b & e ≤ d) & (d · c · b) ≤ PF))
(32a) means that in each PF, each member of PFNum appears at least once as an atomic substructure. (32c) does not exclude the possibility that the atomic c is a substructure of b or d, as long as c does not occur both in b and in d.
5 Conclusion
We have proposed a structure representation system in which we partially order particular subsets of the numeration set via the basic syntactic relation, rather than directly ordering the members of the numeration set as in telescope trees or ordering the tree nodes decorated with the members of the numeration set, as in phrase structure trees. Our system maintains the basic merit of telescope trees. That is, if the numeration set is finite, then each structure that we can generate is automatically finite and we can generate only a finite number of distinct structures without additional assumptions. On the other hand, unlike telescope trees, our system can still represent syntactic copying with a provable maximal limit, which we argue is linguistically useful. The intended phonological structures are maximally ternary-bracketed structures of phonological items. Unlike tree-based structures, we cannot read the phonological strings off the terminal nodes of the trees left-to-right. However, this flexible PF linearization will generate varying word orders from the same syntactic structure without multiplying the number of the items in the syntactic structures.
References
1. Brody, M.: Mirror theory: syntactic representation in perfect syntax. Linguistic Inquiry 31, 29–56 (2000)
2. Bury, D., Uchida, H.: Constituent Structure Sets 2. Submitted to the proceedings of the Ways of Structure Building Workshop (2009)
3. Bury, D.: Phrase Structure and Derived Heads. PhD thesis, University College London, London (2003)
4. Kepser, S.: Properties of binary transitive closure logic over trees. In: Monachesi, P., Penn, G., Satta, G., Winter, S. (eds.) Proceedings of the 11th Conference on Formal Grammar, pp. 77–90. CSLI Publications, Stanford (2006)
5. Kracht, M.: Syntax in chains. Linguistics and Philosophy 24, 467–529 (2001)
6. Neeleman, A., Weerman, F.: Flexible Syntax: A Theory of Case and Assignment. Kluwer, Dordrecht (1999)
Author Index
Bailey, Gil 255
Boston, Marisa Ferrara 1
Bury, Dirk 281
Chen, Zhong 13
Cysouw, Michael 29
Dyckhoff, Roy 56
Edlefsen, Matt 255
Fowler, Timothy A.D. 238
Francez, Nissim 56
Graf, Thomas 72
Hale, John T. 1, 13
Heinz, Jeffrey 255
Holder, Thomas 88
Hunter, Tim 103
Kepser, Stephan 117, 129
Kobele, Gregory M. 145, 160
Kornai, András 174
Kubota, Yusuke 200
Kuhlmann, Marco 1
Kurtonina, Natasha 210
Moortgat, Michael 210
Moss, Lawrence S. 223
Pollard, Carl 200
Pullum, Geoffrey K. 36, 44
Rogers, James 129, 255
Salvati, Sylvain 266
Szymanik, Jakub 272
Uchida, Hiroyuki 281
Visscher, Molly 255
Wellcome, David 255
Wibel, Sean 255